Abstract

This study proposes an instantaneous summer air temperature (Tair) estimation model using Himawari-8 Advanced Himawari Imager (AHI) brightness temperatures (BTs) in the split-window channels together with auxiliary data. Correlation analysis and stepwise linear regression were used to select the predictors for Tair estimation, and nine predictors, including the AHI BTs in channels 14 and 15, elevation, precipitable water vapor (PWV), and relative humidity (RH), were finally selected. Stepwise linear regression and neural network (NN) methods were then applied to construct summer Tair estimation models over China. The Tair estimated by the linear and NN models was evaluated against observed Tair from 272 meteorological stations over China. The results showed that the AHI BTs in channels 14 and 15, elevation, PWV, and RH were more important for Tair estimation than the other predictors. The NN models were more accurate than the linear models: the correlation coefficient (R), root-mean-square error (RMSE), and bias were 0.97, 1.72°C, and 0.04°C, respectively, for the NN model, compared with 0.89, 3.28°C, and 0.07°C for the linear model. About 75.6% of the Tair differences from the NN model were within 2.0°C, and 45.8% were within 1.0°C. The performance of the Tair estimation model at each site was also investigated, and Tair estimation was more accurate in southeast China than in northwest China.

1. Introduction

Air temperature (Tair), one of the basic meteorological observations, is a key environmental variable in a wide range of applications such as terrestrial hydrology, biosphere processes, climate change, and atmospheric sciences [1–3]. The spatiotemporal pattern of Tair can be highly variable and complex because it is affected by properties that vary greatly in space and time [4, 5].

Tair is typically measured with high accuracy and temporal resolution in thermometer shelters about 1.5–2 m above the ground at meteorological stations. However, meteorological stations are usually unevenly and sparsely distributed, especially in sparsely populated regions such as the Tibetan Plateau [6, 7]. Station observations of Tair therefore have coarse spatial resolution and provide only limited information about Tair spatial patterns over wide areas [6, 8].

Meteorological satellites can provide continuous surface and atmospheric observations over large areas [8–11]. In the last few decades, satellite observations have been widely used for Tair estimation. The proposed methods can be divided into four categories [12, 13]: simple statistical approaches [6, 14], advanced statistical approaches [15–17], the temperature-vegetation index (TVX) approach [4, 9, 18, 19], and energy-balance approaches [20, 21].

Simple statistical approaches directly establish a linear relationship between land surface temperature (LST) and Tair, and their accuracy depends on the data (e.g., the time and location) used to construct the models. Advanced statistical approaches generally include multiple factors, such as LST, elevation, and the normalized difference vegetation index (NDVI), in the Tair estimation model [12]. They usually employ multiple linear regression, and their estimation accuracy is considerably better than that of the simple statistical approaches. The TVX approach assumes that NDVI is linearly correlated with LST and that, for an infinitely thick canopy, the temperature at the top of the canopy equals that within the canopy. The TVX method is not suitable for areas with low vegetation cover or bare land [4, 18]. The energy-balance method is physically based, but it requires many parameters that often cannot be provided directly by remote sensing [13, 20].
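To make the TVX assumption concrete, a minimal Python sketch follows. It fits LST against NDVI over the pixels of one moving window and extrapolates the fitted line to an assumed full-canopy NDVI, taking the extrapolated temperature as Tair. The function name, the full-canopy value `ndvi_max = 0.86`, and the sample numbers are illustrative assumptions; this study itself does not use the TVX method.

```python
import numpy as np

def tvx_air_temperature(lst, ndvi, ndvi_max=0.86):
    """Estimate Tair for one moving window under the TVX assumption.

    lst:      clear-sky LST (deg C) of the pixels in the window
    ndvi:     NDVI of the same pixels
    ndvi_max: assumed full-canopy NDVI (illustrative value, not from this study)
    """
    lst = np.asarray(lst, dtype=float)
    ndvi = np.asarray(ndvi, dtype=float)
    # Least-squares line LST = slope * NDVI + intercept over the window.
    slope, intercept = np.polyfit(ndvi, lst, deg=1)
    # The LST-NDVI slope is expected to be negative (denser canopy, cooler surface);
    # a non-negative slope means the window violates the TVX premise.
    if slope >= 0:
        raise ValueError("TVX needs a negative LST-NDVI slope in the window")
    # Extrapolate to a fully vegetated pixel, whose canopy temperature is taken as Tair.
    return float(slope * ndvi_max + intercept)

print(tvx_air_temperature([40.0, 35.0, 30.0], [0.2, 0.4, 0.6]))
```

For this toy window the fitted line is LST = 45 − 25·NDVI, so the estimate is about 23.5°C. The explicit slope check mirrors the limitation noted above: over sparse vegetation or bare land the regression has little leverage and the extrapolation becomes unreliable.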

In addition, other techniques, such as neural networks (NNs), support vector machines, and random forests, have been used for Tair estimation [15, 16, 22]. These models can represent nonlinear relationships between the inputs and Tair, and they are usually more accurate than multivariate regression models [16, 22]. In summary, the estimation errors of existing algorithms are usually about 2–3°C [12, 13], and methods that provide “accurate” Tair estimates (i.e., errors of 1–2°C) at high temporal and spatial resolution still need to be developed [12, 13, 23].

In past years, Moderate Resolution Imaging Spectroradiometer (MODIS) LST products have been well validated and, owing to their high accuracy, widely used for Tair estimation [8, 13, 24–26]. Several studies have focused on Tair estimation in different regions of China using MODIS data [27–30], and their results show obvious regional differences in accuracy owing to China's extensive and complex topography [27, 30, 31]. MODIS provides observations with high spatial but relatively low temporal resolution and can provide Tair estimates only once or twice a day for a given area. In contrast, geostationary meteorological satellites can continuously monitor the Earth's surface over a fixed geographical area [10, 32]. The new generation of geostationary meteorological satellites, such as Himawari-8, GOES-R, and FY-4A (FengYun-4A), will provide more accurate observations of the Earth's atmosphere, oceans, and land at higher temporal (e.g., 2.5–10 min) and spatial (e.g., 0.5–2 km) resolutions in the coming decades [32–34].

The Advanced Himawari Imager (AHI) onboard Himawari-8 provides continuous observations in 16 spectral bands, covering wavelengths from 0.4 μm to 13.4 μm, over the Asia-Pacific region every 10 min. It offers unprecedented opportunities for Tair estimation over this region at higher spatial and temporal resolutions. However, no operational LST product has yet been released for AHI, so Tair estimation methods based on LST products cannot be applied to AHI data. To our knowledge, few studies have estimated Tair using AHI data. The main objectives of this study are to analyze the feasibility of using AHI brightness temperatures (BTs) for Tair estimation and to construct an instantaneous Tair estimation model for China using an advanced statistical approach.

Section 2 describes the study area and dataset used in this study. Section 3 describes the methodology for Tair estimation including correlation analysis, stepwise linear regression, neural network, and error analysis. Validation of the proposed algorithm for Tair estimation is presented in Section 4. The conclusions are given in Section 5.

2. Study Area and Data

2.1. Study Area

The study area is mainland China, where the terrain varies greatly and gradually descends from west to east (Figure 1). Most of China's population lives in the eastern plains, whereas the vast western region is dominated by plateaus and deserts (e.g., the Tibetan Plateau and the Gobi and Taklamakan Deserts) and is sparsely populated [35]. Correspondingly, meteorological stations are unevenly distributed because of imbalanced population and economic development (Figure 1).

The elevation of the stations ranges from 0 to 4532 m. The six most common land surface types at the meteorological stations are grasslands (∼23.7%), croplands (∼20.34%), woody savannas (∼14.41%), mixed forests (∼12.7%), crop/natural vegetation mosaic (∼9.32%), and barren or sparse land (∼8.47%).

The climate of China differs from region to region owing to the complex terrain [35]. It is dominated by dry seasons and wet monsoons, which lead to significant Tair differences between winter and summer [36]. Summer Tair in China is relatively high (>20°C) except in high-altitude areas (e.g., the Tibetan Plateau); for example, summer Tair at the sites in Figure 1 ranged from −3.7°C to 39.5°C in 2017. Summer heat waves have major impacts on human health, agricultural food production, and water and electricity use, so large-scale monitoring of summer Tair is important. This study therefore focused on summer Tair estimation over China using AHI observations.

2.2. Data

AHI L1b radiance and L2 cloud mask data, MODIS L3 (i.e., NDVI) products, meteorological data, Global Forecast System (GFS) forecast fields, and other auxiliary data such as elevation and latitude were used in this study.

2.2.1. Satellite Data

Himawari-8 (located at 140.7°E) is a Japanese new-generation geostationary meteorological satellite that began operation on July 7, 2015, replacing MTSAT-2 (also known as Himawari-7). AHI, the primary instrument aboard Himawari-8, has 16 multispectral channels (3 visible, 3 near-infrared, and 10 infrared) and mainly captures visible and infrared images of the Asia-Pacific region [32]. The spatial resolutions of the AHI visible, near-infrared, and infrared channels are 0.5 km, 1.0 km, and 2.0 km, respectively, comparable to those of imagers onboard polar-orbiting meteorological satellites. AHI provides full-disk observations every 10 minutes and regional observations at shorter intervals [32].

As mentioned above, no operational LST product is available for Himawari-8. In this study, the BTs in AHI split-window channels 14 and 15 (hereafter BT11μm and BT12μm, respectively) were therefore used as a proxy for LST in Tair estimation (Table 1), and AHI L2 cloud mask products were used to select clear-sky pixels. AHI data for summer 2016 and 2017 were collected four times a day (i.e., 00, 06, 12, and 18 UTC) from the JAXA Himawari Monitor (P-Tree System) (http://www.eorc.jaxa.jp/ptree/index.html). The MODIS Vegetation Indices Monthly L3 Global 0.05 Deg CMG product (MOD13C2) was obtained from NASA's Earth Observing System Data and Information System (EOSDIS) (https://earthdata.nasa.gov/).

2.2.2. Meteorological Data

Tair, RH, and surface pressure observations at synoptic hours (i.e., 00, 06, 12, and 18 UTC) during summer 2016 and 2017 from 272 surface stations in China (Figure 1) were used. The data were downloaded from the China Meteorological Data service Center (CMDC) (http://data.cma.cn/).

2.2.3. Auxiliary Data

Previous studies have shown that adding auxiliary parameters can improve the accuracy of Tair estimation [3, 10, 11, 15, 16, 37–39]. In this study, NDVI, PWV, RH, latitude (hereafter LAT), longitude (hereafter LON), Julian day (JD), hour, elevation (hereafter EL), distance to coast (DTC), surface pressure (SP), and view zenith angle (VZA) were preliminarily selected as candidate predictors.

PWV, RH, and SP were taken from GFS 3-hourly forecast fields and linearly interpolated to the location and time of the AHI observations. GFS data were obtained through the NOAA National Operational Model Archive and Distribution System (NOMADS) (ftp://nomads.ncdc.noaa.gov/GFS/). LAT, LON, EL, and VZA were obtained from the auxiliary geolocation dataset of Himawari-8 provided by the Japan Meteorological Agency. NDVI was derived from MODIS Vegetation Indices Monthly L3 Global 0.05 Deg CMG (MOD13C2). JD is the day of the year, counted from January 1 of each year. DTC was derived from the Global Self-consistent, Hierarchical, High-resolution Geography database (GSHHG) (http://www.soest.hawaii.edu/pwessel/gshhs/index.html). Detailed information on the primary data used in this study is given in Table 2.
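The linear interpolation of the GFS fields can be sketched as trilinear interpolation in time, latitude, and longitude. This is a minimal sketch under stated assumptions: each field is held in a NumPy array on a regular (time, lat, lon) grid with ascending axes (a grid stored north to south would need to be flipped first), and the helper names `interp_gfs` and `_bracket` are hypothetical. The exact interpolation used in this study, for example at grid edges, may differ.

```python
import numpy as np

def _bracket(axis, x):
    """Index i and weight w with x = (1 - w) * axis[i] + w * axis[i + 1].

    Points outside the axis are linearly extrapolated (w < 0 or w > 1);
    an operational version would reject them instead.
    """
    axis = np.asarray(axis, dtype=float)
    i = int(np.clip(np.searchsorted(axis, x) - 1, 0, len(axis) - 2))
    w = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, w


def interp_gfs(field, times, lats, lons, t, lat, lon):
    """Trilinear interpolation of a gridded field to one AHI observation.

    field has shape (len(times), len(lats), len(lons)); every axis is assumed
    ascending (e.g. forecast hours 0, 3, 6, ... and latitudes south to north).
    """
    it, wt = _bracket(times, t)
    iy, wy = _bracket(lats, lat)
    ix, wx = _bracket(lons, lon)
    value = 0.0
    # Weighted sum over the 8 corners of the enclosing (time, lat, lon) cell.
    for dt, ft in ((0, 1 - wt), (1, wt)):
        for dy, fy in ((0, 1 - wy), (1, wy)):
            for dx, fx in ((0, 1 - wx), (1, wx)):
                value += ft * fy * fx * field[it + dt, iy + dy, ix + dx]
    return float(value)

times = np.array([0.0, 3.0, 6.0])            # forecast hours
lats = np.array([30.0, 30.5, 31.0])
lons = np.array([110.0, 110.5, 111.0])
T, Y, X = np.meshgrid(times, lats, lons, indexing="ij")
field = 2 * T + 3 * Y - X                    # synthetic field, linear in each axis
print(interp_gfs(field, times, lats, lons, 1.5, 30.2, 110.7))
```

Because trilinear interpolation reproduces any field that is linear in each coordinate, the synthetic field above is a convenient sanity check: the printed value should match 2·1.5 + 3·30.2 − 110.7 = −17.1 up to rounding.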

3. Methodology

3.1. Correlation Analysis

Correlation analysis was performed using SPSS 20.0. Tair was strongly correlated (i.e., |R| = 0.60–0.80) with BT11μm, BT12μm, EL, and SP and moderately correlated (|R| = 0.40–0.60) with VZA and PWV (Table 3). BT11μm was more strongly correlated with Tair (R = 0.75) than BT12μm (R = 0.69). A possible explanation is that the 11 μm channel lies in a more transparent part of the atmospheric window, so BT11μm is less affected by the atmosphere and closer to LST than BT12μm. In addition, Tair was weakly correlated (|R| = 0.20–0.40) with LAT, LON, and DTC and very weakly correlated (|R| = 0–0.20) with JD, hour, RH, and NDVI.
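The correlation screening itself was done in SPSS; the plain-Python sketch below only illustrates the same computation, ranking candidate predictors by |R| and labeling them with the ranges quoted above. The function names and toy data are hypothetical, and |R| ≥ 0.80, a range the text does not use, is simply lumped into the strong class.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Label |r| with the ranges used in this section."""
    a = abs(r)
    if a >= 0.60:
        return "strong"      # 0.60-0.80 in the text; higher values also land here
    if a >= 0.40:
        return "moderate"
    if a >= 0.20:
        return "weak"
    return "very weak"

def rank_predictors(predictors, tair):
    """Return (name, r, label) tuples sorted by |r|, strongest first."""
    rows = [(name, pearson_r(values, tair)) for name, values in predictors.items()]
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return [(name, r, strength(r)) for name, r in rows]

ranked = rank_predictors({"up": [1, 2, 3, 4], "wiggle": [1, 2, 1, 2]}, [1, 2, 3, 4])
for name, r, label in ranked:
    print(f"{name}: R = {r:.2f} ({label})")
```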

Note that Pearson's correlation coefficient measures only the linear relationship between two variables; a low correlation coefficient does not rule out a nonlinear relationship. Some important variables may thus be omitted if the predictors for Tair estimation are selected by correlation analysis alone. Stepwise linear regression was therefore applied to further evaluate the contributions of the various factors to Tair estimation, as described in the following section.
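A short numerical example shows how a deterministic but nonlinear dependence can yield a Pearson coefficient of zero. The data are synthetic (y = x² on a range symmetric about zero) and serve only as an illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v ** 2 for v in x]                      # y is fully determined by x, but not linearly
print(pearson_r(x, y))                       # 0.0: no linear association at all
print(pearson_r([v ** 2 for v in x], y))     # 1.0 once the nonlinearity is modeled
```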

3.2. Stepwise Linear Regression

Stepwise linear regression fits a multiple regression model while iteratively adding important variables and removing unimportant ones. In this study, the best-fit model was selected using the Akaike information criterion (AIC) [40], AIC = 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The best-fit model minimizes AIC, balancing goodness of fit (a high log-likelihood) against a penalty for the number of predictors; the procedure also involves removing strongly correlated variables and an a priori examination of individual variables [41]. At each step, a variable is considered for addition to (or removal from) the set of explanatory variables based on AIC, and the process is repeated until no further variable can be entered or removed. The stepwise linear regression analysis was performed using SPSS 20.0.
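The stepwise procedure can be sketched in Python as follows. This is an illustration, not the SPSS implementation: it assumes ordinary least squares with Gaussian errors, for which AIC reduces (up to an additive constant) to n ln(RSS/n) + 2k, where RSS is the residual sum of squares and k is the number of coefficients including the intercept. At each step it tries adding every unused predictor and dropping every used one, and it applies the single move that lowers AIC most. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def ols_aic(X, y, cols):
    """AIC of an OLS fit of y on an intercept plus the columns `cols` of X.

    Assumes Gaussian errors; additive constants are dropped, so only
    differences between candidate models are meaningful.
    """
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]


def stepwise_aic(X, y, names):
    """Bidirectional stepwise selection driven by AIC.

    At each step, try adding each unused predictor and dropping each used
    one; apply the move with the lowest AIC, and stop when no move lowers
    AIC any further.
    """
    selected = []
    current = ols_aic(X, y, selected)
    while True:
        candidates = []
        for j in range(X.shape[1]):
            if j in selected:
                trial = [s for s in selected if s != j]   # try dropping j
            else:
                trial = selected + [j]                    # try adding j
            candidates.append((ols_aic(X, y, trial), trial))
        best_aic, best_set = min(candidates, key=lambda c: c[0])
        if best_aic >= current:
            return [names[j] for j in selected]
        current, selected = best_aic, best_set

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(0.0, 0.5, 300)
print(stepwise_aic(X, y, ["x0", "x1", "x2", "x3"]))
```

Because every accepted move strictly lowers AIC and there are finitely many predictor subsets, the loop always terminates. Note that the 2k penalty is fairly lenient, so a pure-noise predictor is occasionally retained.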

3.3. Neural Networks

Artificial neural networks (NNs) are computing systems built from connected artificial neurons and are especially suited to nonlinear and complex problems. NN techniques learn the relationship between inputs and outputs from training data and have been widely used in geophysical parameter estimation.

In this study, a multilayer feedforward neural network was used to construct the Tair estimation model. The 2016 datasets were used to train the neural network, and the 2017 datasets were used to validate the trained neural network.
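The structure of a feedforward network trained on one year and validated on the next can be sketched with NumPy alone. The paper does not report the network configuration, so everything here is an assumption: one hidden layer of 16 tanh units, full-batch gradient descent on the mean squared error, the learning rate and epoch count, and the synthetic "2016" and "2017" samples, whose BT11μm and EL columns and Tair relationship are invented for illustration.

```python
import numpy as np

class TinyMLP:
    """One-hidden-layer feedforward network (tanh hidden units, linear output).

    Hypothetical configuration for illustration only: layer size, learning
    rate and epoch count are not taken from this study.
    """

    def __init__(self, n_in, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
        self.b2 = np.zeros(1)

    def predict(self, X):
        # Inputs are standardized with training-year statistics.
        h = np.tanh(((X - self.mu) / self.sigma) @ self.W1 + self.b1)
        return (h @ self.W2 + self.b2).ravel() * self.y_sd + self.y_mu

    def fit(self, X, y, epochs=2000, lr=0.05):
        self.mu, self.sigma = X.mean(axis=0), X.std(axis=0) + 1e-12
        self.y_mu, self.y_sd = y.mean(), y.std() + 1e-12
        Z = (X - self.mu) / self.sigma
        t = ((y - self.y_mu) / self.y_sd)[:, None]
        n = len(y)
        for _ in range(epochs):
            h = np.tanh(Z @ self.W1 + self.b1)          # forward pass
            err = h @ self.W2 + self.b2 - t             # residuals (standardized units)
            # Gradients of 0.5 * mean squared error, by backpropagation.
            gW2, gb2 = h.T @ err / n, err.mean(axis=0)
            dh = (err @ self.W2.T) * (1.0 - h ** 2)     # derivative of tanh
            gW1, gb1 = Z.T @ dh / n, dh.mean(axis=0)
            self.W2 -= lr * gW2
            self.b2 -= lr * gb2
            self.W1 -= lr * gW1
            self.b1 -= lr * gb1
        return self

rng = np.random.default_rng(1)

def synthetic_year(n):
    """Hypothetical samples: columns BT11 (K) and EL (m); target Tair (deg C)."""
    bt = rng.uniform(280.0, 320.0, n)
    el = rng.uniform(0.0, 4000.0, n)
    tair = (bt - 273.15) - 0.0065 * el + 0.02 * (bt - 300.0) ** 2 + rng.normal(0.0, 0.5, n)
    return np.column_stack([bt, el]), tair

X_2016, y_2016 = synthetic_year(2000)   # stands in for the training year
X_2017, y_2017 = synthetic_year(1000)   # stands in for the validation year
model = TinyMLP(n_in=2).fit(X_2016, y_2016)
rmse = float(np.sqrt(np.mean((model.predict(X_2017) - y_2017) ** 2)))
print(f"validation RMSE: {rmse:.2f} deg C")
```

The key design point carried over from the text is that the 2017 samples never influence the weights or the standardization statistics, so the validation RMSE is an out-of-sample estimate.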

3.4. Error Analysis

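Three statistical measures, the correlation coefficient (R), root-mean-square error (RMSE), and bias, were used to evaluate the accuracy of the Tair estimation models:

$$R = \frac{\sum_{i=1}^{N} (T_{e,i} - \overline{T_e})(T_{o,i} - \overline{T_o})}{\sqrt{\sum_{i=1}^{N} (T_{e,i} - \overline{T_e})^2} \sqrt{\sum_{i=1}^{N} (T_{o,i} - \overline{T_o})^2}}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (T_{e,i} - T_{o,i})^2}$$

$$\mathrm{bias} = \frac{1}{N} \sum_{i=1}^{N} (T_{e,i} - T_{o,i})$$

where $T_{e,i}$ is the estimated Tair, $T_{o,i}$ is the observed Tair at the meteorological stations, $\overline{T_e}$ and $\overline{T_o}$ are their means, and N is the sample size. R measures the degree of linear consistency rather than absolute agreement, a positive (negative) bias indicates overestimation (underestimation) of Tair, and low RMSE values indicate a small discrepancy between the estimated and observed Tair.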
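The three measures, plus the share of errors within 1.0°C and 2.0°C reported in Section 4, can be computed as follows. This is a plain-Python sketch; the function name `evaluate` is hypothetical, bias is taken as estimated minus observed, and counting an error of exactly 1.0°C as "within 1.0°C" (≤) is our assumption.

```python
import math

def evaluate(est, obs):
    """R, RMSE and bias (estimated minus observed) for paired Tair samples,
    plus the fractions of absolute errors within 1 and 2 deg C."""
    n = len(obs)
    me, mo = sum(est) / n, sum(obs) / n
    diffs = [e - o for e, o in zip(est, obs)]
    cov = sum((e - me) * (o - mo) for e, o in zip(est, obs))
    r = cov / math.sqrt(sum((e - me) ** 2 for e in est) * sum((o - mo) ** 2 for o in obs))
    return {
        "R": r,
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "bias": sum(diffs) / n,
        "within_1C": sum(abs(d) <= 1.0 for d in diffs) / n,
        "within_2C": sum(abs(d) <= 2.0 for d in diffs) / n,
    }

stats = evaluate([21.0, 25.5, 30.0, 18.0], [20.0, 26.0, 27.0, 18.5])
print(stats)
```

In this toy example the errors are 1.0, −0.5, 3.0, and −0.5°C, so the bias is 0.75°C and three of the four errors fall within 1.0°C.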

4. Results and Analysis

Instantaneous summer Tair over China was estimated with the linear regression and NN models. The collocated AHI, meteorological, and GFS datasets over China in 2016 were used to construct the models, and the 2017 datasets were used to validate them. The validation results for the linear and NN models are presented in this section.

4.1. Tair Estimation Results
4.1.1. Stepwise Linear Regression Model Results

A total of nine variables were selected for the best-fit model by the AIC-based stepwise regression procedure. To evaluate the contribution of each predictor to Tair estimation, nine linear Tair estimation models (hereafter linear models) were constructed by successively introducing the nine predictors (BT11μm, PWV, RH, EL, BT12μm, LAT, hour, LON, and VZA) (Table 4). Nine NN models (hereafter AHI NN models) were also built using the same predictors as the linear models.

BT11μm was the first predictor introduced into the linear model by the stepwise regression procedure (Model 1), indicating that it contributed more to Tair estimation than any other predictor. The accuracy of Tair estimation improved markedly when PWV, EL, and VZA were included in the model, indicating that BT11μm, PWV, EL, and VZA play more important roles than the other variables in the linear models. PWV contributes to Tair estimation in two ways. First, PWV has a moderate positive correlation (R = 0.57) with Tair. Second, PWV is one of the main parameters needed for atmospheric correction and LST retrieval. It is therefore reasonable that introducing PWV improved the accuracy of models in which AHI BTs were used as a proxy for LST.

However, R and RMSE changed by less than 0.01 and 0.14°C, respectively, when RH, BT12μm, LAT, and hour were added, indicating that these predictors had little influence on Tair estimation in the linear models. Because BT12μm is strongly linearly correlated with BT11μm, introducing BT12μm cannot effectively improve the accuracy once BT11μm is already in the model. Note that SP, JD, NDVI, and DTC were not included in the models by the stepwise procedure. SP is strongly correlated with EL (R = −0.9947), so it was removed owing to multicollinearity. JD, NDVI, and DTC did not significantly reduce AIC and were therefore also excluded, indicating that they provide no additional useful information for summer Tair estimation with the linear model.
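The severity of the SP–EL multicollinearity can be illustrated with the variance inflation factor (VIF), a diagnostic not reported in this study. For SP, VIF = 1/(1 − R²), where R² is the coefficient of determination from regressing SP on the other predictors. Because that R² is at least the squared SP–EL correlation, the correlation quoted above gives a lower bound:

$$\mathrm{VIF}_{SP} = \frac{1}{1 - R_{SP}^2} \ge \frac{1}{1 - (-0.9947)^2} = \frac{1}{1 - 0.98943} = \frac{1}{0.01057} \approx 94.6$$

This is far above 10, a commonly used rule-of-thumb threshold for serious multicollinearity, which is consistent with dropping SP once EL is in the model.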

4.1.2. Neural Network Model Results

BT11μm, PWV, RH, and EL play significant roles in Tair estimation in the AHI NN models. R, RMSE, and bias of the NN model with these four predictors (Model 4) were 0.95, 2.15°C, and 0.12°C, respectively. R and RMSE changed by less than 0.01 and 0.06°C, respectively, when BT12μm, LAT, LON, and VZA were further added, indicating that these factors have little effect on Tair estimation once EL and BT11μm are included. Unlike in the linear models, time information (i.e., hour) effectively improved the accuracy of the NN models: adding hour reduced the RMSE by 0.25°C (Model 7), whereas its effect was negligible in the linear models.

In general, the AHI NN models were more accurate than the corresponding linear models. The RMSE differences between the linear and NN models were larger than 1.0°C and reached up to 2.5°C for Models 3–8. As shown in Table 4, NN Model 9 achieved the highest accuracy; it was therefore selected for Tair estimation, and the following results and discussion are mainly based on this model.

R, RMSE, and bias were 0.89, 3.28°C, and 0.07°C, respectively, for linear Model 9 (Figure 2(a)), compared with 0.97, 1.72°C, and 0.04°C for NN Model 9 (Figure 2(b)). Figure 3 shows two-dimensional histograms of estimated versus in situ Tair. About 45.8% and 75.6% of the Tair errors were within 1.0°C and 2.0°C, respectively, for the NN model, compared with about 26.5% and 50.1% for the linear model.

In addition, the accuracy of GFS Tair over China was evaluated. R, RMSE, and bias between GFS Tair and in situ Tair observations in summer 2017 were 0.91, 2.68°C, and 0.19°C, respectively. GFS Tair was thus less accurate than the NN model but more accurate than the linear model (RMSE of 2.68°C vs. 3.28°C).

4.2. Spatial Distribution of Tair Estimation Errors

To further evaluate the applicability of NN Model 9 for Tair estimation over China, the Tair estimation error (in terms of R, RMSE, and bias) at each site was investigated and its spatial distribution was analyzed.

The results showed that the NN model was more accurate than the linear model at most sites. For the NN model, R mainly ranged from 0.90 to 0.98, RMSE ranged from 1.0°C to 2.5°C (Figure 4), and bias was within ±1°C. The RMSE of the NN model was less than 2.0°C and 2.5°C at ∼75.46% and ∼95.25% of the sites, respectively. This demonstrates the feasibility of estimating Tair with the NN model using AHI BTs as a proxy for LST.
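The per-site statistics can be sketched by grouping the collocated samples by station before computing RMSE. The record layout `(site_id, estimated, observed)` and the function names are hypothetical; per-site R and bias would be grouped the same way.

```python
import math
from collections import defaultdict

def per_site_rmse(records):
    """records: iterable of (site_id, estimated, observed) Tair in deg C.
    Returns {site_id: RMSE}."""
    sq = defaultdict(float)
    count = defaultdict(int)
    for site, est, obs in records:
        sq[site] += (est - obs) ** 2
        count[site] += 1
    return {site: math.sqrt(sq[site] / count[site]) for site in sq}

def share_of_sites_below(rmse_by_site, threshold):
    """Fraction of sites whose RMSE is below `threshold` (deg C)."""
    values = list(rmse_by_site.values())
    return sum(v < threshold for v in values) / len(values)

records = [("A", 20.0, 21.0), ("A", 25.0, 24.0), ("B", 30.0, 27.0), ("B", 18.0, 21.0)]
site_rmse = per_site_rmse(records)
print(site_rmse, share_of_sites_below(site_rmse, 2.0))
```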

Furthermore, the accuracy of the Tair estimation algorithm showed obvious spatial differences. The RMSE of the NN model was less than 1.75°C at most sites in southeast China but about 2.0–2.5°C in northwest China. In contrast, the performance of the linear model varied greatly between sites: its RMSE mainly ranged from 2.0 to 3.0°C in southeast China and was generally larger than 3.25°C in northwest China.

There are three possible explanations for the better performance of the Tair estimation model over southeast China than over northwest China. First, the meteorological stations are unevenly distributed and mainly located in densely populated areas (e.g., eastern China), so the training data are dominated by eastern sites. Second, numerical models (e.g., GFS) are usually more accurate in eastern China than in western areas [42]. Third, the AHI VZAs over western China are larger than those over eastern China and can reach up to 80°, so AHI observations in the west pass through a longer atmospheric path and are more strongly affected by the atmosphere. In addition, the diurnal Tair range in northeast China is larger than that in southeast China because of its higher latitude. Although the correlation coefficients in northeast and southeast China are comparable, the RMSE in northeast China is slightly larger than that in southeast China because of the larger range of temperature variation.
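The VZA argument can be quantified with a simple geometric estimate. Assuming a plane-parallel atmosphere (Earth curvature and refraction ignored), the slant path L(θ) through the atmosphere at view zenith angle θ scales with sec θ relative to the nadir path L(0):

$$\frac{L(\theta)}{L(0)} = \sec\theta, \qquad \sec 80^\circ = \frac{1}{\cos 80^\circ} \approx \frac{1}{0.1736} \approx 5.76$$

At VZA = 80°, the radiance reaching AHI therefore traverses roughly 5.8 times the atmospheric path of a nadir view, compared with sec 40° ≈ 1.31 at a VZA of 40°. At such large angles the plane-parallel value somewhat overstates the true path, which Earth's curvature shortens, but the order of magnitude is unchanged.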

4.3. Effect of RH Uncertainty on Tair Estimation

As mentioned above, introducing PWV and RH information can clearly improve the accuracy of the Tair estimation models. However, errors in these inputs can also be a source of Tair estimation error, so it is necessary to analyze the influence of GFS PWV and RH uncertainty on Tair estimation. Because no synchronous PWV observations (e.g., GPS or radiosonde PWV) were available in this study, only the effect of GFS RH uncertainty was investigated, using the observed RH at the 272 meteorological sites.

The applicability of GFS RH over China was evaluated against the observed RH at the meteorological sites. GFS RH errors over western China were larger than those over eastern China, consistent with a previous study [42]: the RMSE of GFS RH was about 10–15% over eastern China and about 15–20% over western China.

The accuracy of both the linear and NN models improved clearly when the observed RH (regarded as the “true” RH) was used (Figure 5). The changes in R, RMSE, and bias caused by GFS RH uncertainty were about 0.02, 0.25°C, and 0.05°C for the linear model, compared with 0.01, 0.19°C, and 0.01°C for the NN model (Figure 5). This indicates that the linear model is more sensitive to RH errors.

To investigate the spatial distribution of the RH influence on Tair estimation, the RMSE difference caused by GFS RH errors at each site is also presented (Figure 6). The RMSE difference of the linear model mainly ranged from 0 to 1.0°C and reached up to 1.5°C in western areas (e.g., the Tibetan Plateau) because of the larger GFS RH errors there. In contrast, the RMSE difference of the NN model was less than 0.5°C at most sites, indicating that RH uncertainty had a relatively weak effect on the NN model.
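The RH sensitivity experiment amounts to running the same trained model twice per sample, once with GFS RH and once with observed RH, and differencing the per-site RMSEs. The sample layout, the `predict(features, rh)` interface, and the toy model used below are hypothetical stand-ins, not the study's code.

```python
import math
from collections import defaultdict

def rmse(pairs):
    """RMSE of (estimated, observed) pairs."""
    return math.sqrt(sum((e - o) ** 2 for e, o in pairs) / len(pairs))

def rh_rmse_difference(samples, predict):
    """Per-site RMSE(with GFS RH) - RMSE(with observed RH) for one trained model.

    samples: iterable of dicts with keys 'site', 'features', 'rh_gfs',
    'rh_obs', 'tair_obs' (hypothetical layout). Only the RH input differs
    between the two runs, so the difference isolates the RH error effect.
    """
    with_gfs = defaultdict(list)
    with_obs = defaultdict(list)
    for s in samples:
        with_gfs[s["site"]].append((predict(s["features"], s["rh_gfs"]), s["tair_obs"]))
        with_obs[s["site"]].append((predict(s["features"], s["rh_obs"]), s["tair_obs"]))
    return {site: rmse(with_gfs[site]) - rmse(with_obs[site]) for site in with_gfs}

def toy_model(features, rh):
    """Hypothetical stand-in for a trained Tair model."""
    return features - rh / 20.0

samples = [
    {"site": "A", "features": 25.0, "rh_gfs": 70.0, "rh_obs": 60.0, "tair_obs": 22.0},
    {"site": "B", "features": 30.0, "rh_gfs": 40.0, "rh_obs": 40.0, "tair_obs": 28.0},
]
print(rh_rmse_difference(samples, toy_model))
```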

5. Conclusion

The new generation of geostationary meteorological satellites (e.g., Himawari-8/9, GOES-16/17, and FY-4A) provides unprecedented opportunities for Tair estimation at higher temporal and spatial resolutions. Because no operational LST product is currently available for Himawari-8, we constructed an instantaneous Tair estimation model using AHI/Himawari-8 BTs in thermal infrared channels and other auxiliary parameters.

Correlation analysis and stepwise linear regression were used to select predictors for Tair estimation, and nine factors (BT11μm, PWV, RH, EL, BT12μm, LAT, hour, LON, and VZA) were selected. BT11μm, PWV, RH, and EL contributed far more to the accuracy of the estimated Tair than the other predictors. Stepwise linear regression and NN approaches were then applied to construct Tair estimation models using the selected predictors. The collected datasets, including AHI BT11μm, BT12μm, and L2 cloud mask data, meteorological data, GFS forecast fields, and other auxiliary data, at 272 surface stations in China during summer 2016 were used to construct the models, and the data for summer 2017 were used to evaluate them.

The results showed that the NN model estimated Tair considerably more accurately than the linear model. R, RMSE, and bias of the Tair estimated by the NN model were 0.97, 1.72°C, and 0.04°C, respectively, compared with 0.89, 3.28°C, and 0.07°C for the linear model. The accuracy of GFS Tair over China was also evaluated: R, RMSE, and bias of GFS Tair in summer 2017 were 0.91, 2.68°C, and 0.19°C, respectively. The NN model was thus more accurate than GFS Tair, whereas the linear model was less accurate, indicating that the linear model cannot provide Tair estimates more accurate than GFS Tair. About 75.6% of the Tair differences from the NN model were within 2.0°C, and 45.8% were within 1.0°C. It can be concluded that instantaneous summer Tair estimation over China using AHI BTs as a proxy for LST is feasible.

The meteorological sites used in this study were unevenly distributed and mainly located in densely populated areas (e.g., eastern China), so the number and locations of the sites could bias the models. Furthermore, this study focused on summer Tair estimation in China, and the model's accuracy in other seasons requires further analysis. The contributions of the predictors to the Tair estimation model may also differ between areas and seasons; for example, although DTC was not included in the proposed model, it may play an important role over coastal areas. In addition, the influence of GFS PWV errors on Tair estimation needs further investigation.

Data Availability

The Himawari data were downloaded from JAXA Himawari Monitor (P-Tree System) http://www.eorc.jaxa.jp/ptree/index.html. The GFS data were obtained via https://www.nco.ncep.noaa.gov/pmb/products/gfs/, and meteorological station data used in this study were accessed at China Meteorological Data service Center (CMDC) (http://data.cma.cn/).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (41475032) and the National Key Research and Development Program of China (2016YFA0600101). The authors would like to thank JAXA for providing Himawari-8 data, National Centers for Environmental Prediction (NCEP) for GFS data, and China Meteorological Data service Center (CMDC) for the meteorological data.