Article

Inversion of Aerosol Chemical Composition in the Beijing–Tianjin–Hebei Region Using a Machine Learning Algorithm

1 School of Geoscience and Technology, Zhengzhou University, Zhengzhou 450001, China
2 School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
3 Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China
4 Inner Mongolia Autonomous Region Environmental Monitoring Center, Wuhai Branch, Wuhai 016000, China
5 Henan Provincial Climate Center, Zhengzhou 450003, China
6 Weather Modification Center of Henan Province, Zhengzhou 450001, China
7 Hebi Meteorological Bureau, Hebi 458000, China
* Author to whom correspondence should be addressed.
Submission received: 20 November 2024 / Revised: 26 December 2024 / Accepted: 16 January 2025 / Published: 21 January 2025
(This article belongs to the Special Issue Atmospheric Pollution in Highly Polluted Areas)

Abstract

Aerosols and their chemical composition influence the atmospheric environment, global climate, and human health. However, obtaining the chemical composition of aerosols at high spatial and temporal resolution remains challenging. In this study, using NR-PM1 data collected in the Beijing area from 2012 to 2013, we found that the annual average concentration was 41.32 μg·m−3. Organics made up the largest share, accounting for 49.3% of NR-PM1, followed by nitrates, sulfates, and ammonium. We then established models of aerosol chemical composition based on machine learning algorithms. We compared the inversion accuracies of single models, namely the MLR (Multivariable Linear Regression), SVR (Support Vector Regression), RF (Random Forest), KNN (K-Nearest Neighbor), XGBoost (eXtreme Gradient Boosting), and LightGBM (Light Gradient Boosting Machine) models, with that of a combined model (CM) built from the best-performing single models. Although the KNN model was the most accurate of the single models, the CM model was more accurate still. By applying the CM model to spatially and temporally matched AOD (aerosol optical depth) data and meteorological data for the Beijing–Tianjin–Hebei region, the spatial distribution of the annual average concentrations of the four components was obtained. The areas with higher concentrations are mainly situated in the southwest of Beijing, where the annual average concentrations are 28 μg·m−3, 7 μg·m−3, 8 μg·m−3, and 15 μg·m−3 for organics, sulfates, ammonium, and nitrates, respectively. This study provides a new methodological approach for obtaining aerosol chemical composition concentrations from satellite remote sensing data, as well as a data foundation and theoretical support for the formulation of atmospheric pollution prevention and control policies.

1. Introduction

In recent years, air pollution has been a serious environmental problem in China [1,2,3,4,5]. Aerosols play an important role: they seriously affect the quality of the atmospheric environment [6,7,8,9,10] and have a significant impact on climate change [11,12,13,14,15] and human health [16,17,18,19]. At present, the common methods used to obtain the chemical composition of aerosols include sampling analysis [20,21,22,23], electron microscope scanning [24,25], and model simulation [26,27]. Although these methods can accurately characterize the various chemical components of aerosols, they are poorly suited to real-time monitoring and have limited spatial coverage, so aerosol concentration data with long time series and large area coverage remain difficult to obtain. Through global or regional coverage, satellite remote sensing can make up for the spatial discontinuity of ground monitoring stations. However, for some parameters the physical retrieval mechanisms are still difficult to quantify. To address this problem, some scholars have combined satellite remote sensing data with machine learning, which has powerful nonlinear fitting ability [28], to retrieve ground PM2.5 concentration [29,30] and thus fill the spatial and information gaps in ground-based monitoring data. Huttunen et al. compared the advantages and disadvantages of the lookup table method, a nonlinear regression method, and four machine learning methods, and found that the aerosol optical depth (AOD) obtained by most machine learning methods has higher accuracy [31]. Lanzaco et al. used AOD data from AERONET ground-based observations and meteorological factors (temperature, humidity, wind speed, and wind direction) to train a neural network for correcting MODIS AOD, through which the monthly mean error of satellite AOD was reduced to 0.1–0.6 [28].
Sun et al. built a PM2.5 concentration prediction model based on a deep neural network and used the model with 1 km resolution AOD products from the geostationary satellite Himawari-8 and meteorological data (temperature, humidity, wind speed, wind direction, visibility, etc.) to estimate hourly surface PM2.5 concentration in the Beijing–Tianjin–Hebei region [30]. Using MODIS AOD data and meteorological data, Chen et al. constructed a random forest model for retrieving PM2.5 concentration and used it to estimate the PM2.5 concentration in China from 2005 to 2016. The R2 and RMSE of the verified model were 0.83 and 28.1 µg·m−3 [32]. The retrieval of aerosol optical depth and PM2.5 concentration based on machine learning and satellite remote sensing data is thus relatively mature. However, relatively few studies have addressed the inversion of aerosol chemical components based on machine learning.
The Beijing–Tianjin–Hebei (BTH) region, known as China's "capital economic circle", is one of the country's three major economic circles. It is not only a political, economic, and cultural center but also has important international influence. However, rapid economic development and continuous urban expansion have led to air pollution in the Beijing–Tianjin–Hebei region. Although its air quality has improved in recent years, the situation remains serious. According to the national environmental quality status released by the Ministry of Ecology and Environment, in the past two years, cities such as Xingtai, Shijiazhuang, Handan, Baoding, and Tangshan in the Beijing–Tianjin–Hebei region were still among the bottom 20 of China's key cities in terms of air quality. Therefore, retrieving the spatial distribution of aerosol chemical components in the Beijing–Tianjin–Hebei region through machine learning can help identify the sources of air pollutants and provide data support and theoretical guidance for government departments formulating reasonable and effective air pollution prevention and control measures.

2. Data and Method

2.1. Observation Sites and Data Sources

The observational data used in this study were collected from Beijing station (39.97° N, 116.36° E), Xianghe Station (39.96° N, 116.95° E), and Xinglong station (40.40° N, 117.58° E), representing typical urban, suburb, and background environments over the Beijing–Tianjin–Hebei region (as shown in Figure 1). Among them, Beijing station is situated in the Institute of Atmospheric Physics, Chinese Academy of Sciences, approximately 15 m above the ground, with no significant point sources in the vicinity. The Xianghe observation station is located at the Xianghe Atmospheric Sounding Comprehensive Experiment Station of the Chinese Academy of Sciences, also 15 m above the ground. The station is surrounded by dense light industry and residential living areas, and there are no obvious local emissions or tall buildings nearby. Xinglong station is positioned in the Xinglong atmospheric background observation station, located south of Yanshan Mountain and north of the Great Wall, which is mainly encompassed by high mountains and less affected by human activities [33].
The aerosol chemical concentration data at the Beijing site were collected with aerosol mass spectrometers manufactured by Aerodyne. By using an aerosol mass spectrometer to monitor the non-refractory (NR-PM1) portion of submicron aerosol PM1 (particle aerodynamic diameter ≤1 μm), the mass concentrations of organics, nitrates, ammonium, and sulfates can be derived [34]. The sampling period was 2012 to 2013 and the time resolution was 1 h. The aerosol chemical composition concentration data at Xinglong and Xianghe stations were acquired from particulate matter samples collected by the Wuhan Tianhong TH-150C medium flow sampler. The concentrations of organic carbon were measured using the thermal optical carbon analyzer developed by the Desert Research Institute of the United States, and water-soluble inorganic ions such as NH4+, NO3− and SO42− in PM2.5 were measured using ICS-90 ion chromatography. The time resolution of sampling is one week [33]. The proportions of the various chemical components in PM1 and PM2.5 are similar over the BTH region [2,35]. Hence, the NR-PM1 data from the Beijing site were used to establish the models, and the PM2.5 data from Xianghe and Xinglong were used to examine the performance of the models.
Aerosol optical depth (AOD) is the integral of the aerosol extinction coefficient from the earth's surface to the top of the atmosphere. MODIS AOD data were obtained from the LAADS website "https://ladsweb.modaps.eosdis.nasa.gov/ (accessed on 1 October 2024)", and the Terra C6 DB 10 km AOD product was selected. The temporal resolution is 1 day, and the spatial resolution is 0.1° × 0.1° [36]. The meteorological data used in this study are ERA-5 reanalysis data, with a spatial resolution of 0.25° × 0.25° and a temporal resolution of 1 h [37]. The meteorological variables used are Air Temperature at 2 m (2 Meter Temperature, T2M), Relative Humidity (RH), 10 Meter U Wind component (U10), 10 Meter V Wind component (V10), Surface Pressure (SP), and Boundary Layer Height (BLH).
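The definition of AOD given above can be written as a vertical integral. The symbols are notation introduced here for illustration and do not appear in the source: τ is the AOD at wavelength λ, σ_ext is the aerosol extinction coefficient at height z, and z_TOA is the height of the top of the atmosphere.

$$\tau(\lambda) = \int_{0}^{z_{\mathrm{TOA}}} \sigma_{\mathrm{ext}}(\lambda, z)\,\mathrm{d}z$$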
The different components of aerosols are classified according to their extinction properties into scattering and absorbing components. Light-scattering aerosols typically include sulfate, nitrate, ammonium salts, and most organic aerosols, while light-absorbing aerosols generally include black carbon, dust, and brown carbon [38]. The contributions of sulfate, ammonium, nitrate, and organic matter account for 70% of the total extinction [39]. The aerosol pollution in the Beijing–Tianjin–Hebei region primarily results from the rapid growth of secondary aerosols [40,41]. Therefore, our research focused on organics, ammonium, sulfates, and nitrates.

2.2. Analysis Method

2.2.1. Data Preprocessing

Owing to the disparity of temporal and spatial resolution of observation data, namely MODIS AOD data and ERA-5 reanalysis data, it is necessary to align the data in time and space before establishing the model. Firstly, the meteorological variable grid data of T2M, RH, U10, V10, SP, and BLH corresponding to the Beijing station (39.97° N, 116.36° E) were extracted from ERA-5 data, and the meteorological data corresponding to the Beijing station was acquired by averaging within a rectangle of 50 km × 50 km. Secondly, in the process of establishing the model, taking the observation time of AMS as the benchmark, the AOD and meteorological data were matched.
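The two alignment steps described above might be sketched as follows. This is only an illustration under stated assumptions: the function names, the flat-earth conversion from kilometres to degrees, and the rule of pairing each hourly AMS record with the same-day AOD value and the same-hour meteorological record are all hypothetical, since the source does not spell out these details.

```python
import math
from datetime import datetime

def box_mean(cells, lat0, lon0, half_km=25.0):
    """Average grid-cell values whose centres fall in a square box around a station.

    cells: iterable of (lat, lon, value) tuples.
    half_km=25 gives a 50 km x 50 km box. Assumption: 1 degree of latitude
    is about 111.32 km, and longitude spacing shrinks with cos(latitude).
    """
    dlat = half_km / 111.32
    dlon = half_km / (111.32 * math.cos(math.radians(lat0)))
    vals = [v for lat, lon, v in cells
            if abs(lat - lat0) <= dlat and abs(lon - lon0) <= dlon]
    return sum(vals) / len(vals) if vals else None

def match_to_ams(ams, aod_daily, met_hourly):
    """Use the AMS observation times as the benchmark and attach AOD and meteorology.

    ams:        dict {datetime on the hour: concentration}
    aod_daily:  dict {date: AOD}
    met_hourly: dict {datetime on the hour: dict of meteorological variables}
    Records without both an AOD value and a meteorological record are dropped.
    """
    rows = []
    for t in sorted(ams):
        aod = aod_daily.get(t.date())
        met = met_hourly.get(t)
        if aod is None or met is None:
            continue
        rows.append({"time": t, "conc": ams[t], "AOD": aod, **met})
    return rows
```

Dropping unmatched records rather than interpolating them is a design choice of this sketch; days without a valid AOD retrieval simply contribute no training samples.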

2.2.2. Machine Learning Method

This study selected six widely used algorithms: Multivariable Linear Regression (MLR), which features a short modeling duration and excellent interpretability; Support Vector Regression (SVR), the regression form of the Support Vector Machine, which is appropriate for regression prediction with a smaller number of samples and holds an advantage in handling sample data that is linearly inseparable [42]; Random Forest (RF), an ensemble learning algorithm based on Bagging and decision trees [43,44,45]; K-Nearest Neighbor (KNN), a commonly employed algorithm for classification and regression [46]; XGBoost (eXtreme Gradient Boosting), a representative algorithm based on boosting [47]; and LightGBM (Light Gradient Boosting Machine), an algorithm evolved from GBDT (Gradient Boosted Decision Tree) [48]. The combined model is built on the prediction outcomes of the single models: their predictions are combined using weights determined by the grey wolf weight optimization algorithm to obtain the final prediction for each sample [49].
In this study, the coefficient of determination (R-Square, R2), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) were selected as the evaluation indices of model performance. They are calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
where $y_i$ is the observed value, $\bar{y}$ is the average of the observed values, $\hat{y}_i$ is the predicted value of the model or the inversion result, and $n$ is the number of samples.
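As a minimal sketch, the three evaluation indices can be computed in plain Python as shown here. The function names are illustrative, and RMSE follows its standard definition (the square root of the mean squared error).

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))
```

Predicting the observed mean for every sample gives R2 = 0, so a negative R2 indicates a model that does worse than that constant baseline.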

3. Results and Discussion

3.1. Characteristics of Aerosol Chemical Composition Concentration in Beijing

As shown in Figure 2, NR-PM1 exhibits distinct seasonal characteristics. The average concentrations of NR-PM1 in spring, autumn, and winter were 41.98 μg·m−3, 44.93 μg·m−3, and 50.36 μg·m−3, respectively, which are significantly higher than the summer average of 22.32 μg·m−3. The average concentrations of organics, sulfates, ammonium, and nitrates were likewise lower in summer than in the other seasons. Comparing the chemical components shows that organics have the highest concentration and percentage in every season. Although the sulfate concentration is lowest in summer, sulfates account for 23.68% of the summer average concentration, a much higher percentage than in other seasons. This is because the higher temperature and stronger solar radiation in summer are conducive to the transformation of sulfur dioxide into sulfates. The average concentration of nitrates in autumn is 10.14 μg·m−3, the highest among the four seasons. Nitrates form mainly through photochemical reactions but are prone to decomposition at high temperatures. In autumn, the radiation is relatively strong and the temperature is moderate, which favors both the formation of nitrates and their persistence in the atmosphere [23]. The concentrations of organics and sulfates in winter are also higher than those in other seasons, and their percentage is the highest except for summer. This is mainly due to the release of large amounts of sulfate precursors from fossil fuel combustion during winter heating in Beijing, together with poor pollution dispersion conditions and a lower boundary layer height, which increase the concentration of pollutants [23].

3.2. Correlation Between AOD, Meteorological Factors and Aerosol Chemical Component

The correlation coefficients between the concentrations of the four components (organics, sulfates, ammonium, and nitrates) and both AOD and the meteorological factors were calculated for the study period (Table 1). The concentrations of all four components in the Beijing area were linearly correlated with AOD. The correlation between the meteorological factors (T2M, RH, U10, V10, SP, and BLH) and the four chemical components of aerosol was also high. U10, SP, and BLH were negatively correlated with the concentrations of organics, sulfates, ammonium, and nitrates, while T2M, RH, and V10 were positively correlated with the concentrations of organics, sulfates, and ammonium. When the input variables of a machine learning algorithm are strongly correlated with each other, the performance and stability of the constructed model deteriorate [47]. We therefore analyzed the correlations among the meteorological factors and AOD in Beijing and found that only the correlation between BLH and U10 exceeded 0.5. The absolute values of the correlation coefficients among all other features were below 0.5, indicating that the linear correlation between the remaining variables was weak. For the sake of model stability, U10 was excluded when the model was established because it is less strongly correlated with aerosol chemical composition concentration than BLH and is therefore the less important feature. In summary, AOD, T2M, RH, V10, SP, and BLH were used as input features for constructing the models.
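The screening rule described in this paragraph (for any pair of inputs whose correlation exceeds 0.5 in absolute value, drop the one that is less correlated with the target) could be sketched as follows. The function names and the data layout are hypothetical; the 0.5 threshold is taken from the text.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def screen_features(features, target, threshold=0.5):
    """Drop the weaker member of every feature pair with |r| > threshold.

    features: dict {name: list of values}; target: list of values.
    'Weaker' means the lower absolute correlation with the target.
    """
    keep = list(features)
    relevance = {k: abs(pearson(v, target)) for k, v in features.items()}
    changed = True
    while changed:
        changed = False
        for i, a in enumerate(keep):
            for b in keep[i + 1:]:
                if abs(pearson(features[a], features[b])) > threshold:
                    keep.remove(a if relevance[a] < relevance[b] else b)
                    changed = True
                    break
            if changed:
                break  # restart the scan because the list was modified
    return keep
```

With inputs shaped like those in the text, a U10 series that tracks BLH closely but relates less strongly to the component concentration would be the one removed.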

3.3. Establishment of Single Model and Comparison of Results for Different Chemical Components

During the construction of each model, the initial values of the parameters are set in accordance with the algorithmic characteristics and parameter-tuning experience of different models. GridSearch provided by Scikit-learn is adopted for hyperparameter optimization [50]. The optimal parameters of each model are obtained based on the evaluation indicators, and the respective machine learning models are constructed using the optimal parameters and the training set. The scope of optimization and the optimal model parameter combination determined by repeated experiments for each model are shown in Table 2.
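To illustrate the idea behind grid-search tuning, here is a small plain-Python stand-in that tries each candidate value of one hyperparameter and keeps the one with the lowest cross-validated error. It is not the Scikit-learn implementation used in the study: the KNN regressor, the candidate values of k, the interleaved fold assignment, and the use of MAE as the selection score are all assumptions made for this sketch.

```python
import math

def knn_predict(train_X, train_y, x, k):
    """Predict by averaging the targets of the k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def cv_mae(X, y, k, n_folds=5):
    """Cross-validated MAE of the KNN regressor, using interleaved folds."""
    errors = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        tX = [X[i] for i in train_idx]
        ty = [y[i] for i in train_idx]
        errors += [abs(knn_predict(tX, ty, X[i], k) - y[i]) for i in test_idx]
    return sum(errors) / len(errors)

def grid_search_k(X, y, candidates=(1, 3, 5, 7), n_folds=5):
    """Return the candidate k with the lowest cross-validated MAE, plus all scores."""
    scores = {k: cv_mae(X, y, k, n_folds) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```

With several hyperparameters the same loop runs over the Cartesian product of the candidate lists, so the cost grows multiplicatively with each added parameter. In practice the input features would usually be standardized first so that no single variable dominates the distance.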
Figure 3 and Table 3 show that the models for organics and sulfates perform worse than those for ammonium and nitrates. Across the six models for each chemical component, the KNN model has the highest inversion accuracy (highest R2), followed by the XGBoost, RF, and LightGBM models. The MLR and SVR models have the poorest inversion accuracy of the six. The red dots in Figure 3 mark outliers, which correspond to 27 November 2012. From 16 November 2012 to 1 December 2012, Beijing experienced a prolonged period of air pollution, and the maximum PM2.5 concentration reached 300 μg·m−3 on 1 December. A sharp increase in sulfates and organics was the primary cause of this pollution event. The results indicate that the models have relatively low accuracy in simulating the concentrations of organics and sulfates during severe pollution episodes.

3.4. Performance of Single Models in Xianghe and Xinglong

The satellite remote sensing AOD data and ERA-5 data corresponding to the latitude and longitude of Xianghe and Xinglong were extracted. The six machine learning models were used to estimate the concentrations of organics, sulfates, ammonium, and nitrates at Xianghe and Xinglong, and the MAE of each model against the observed data at the two stations is compared in Table 4. For Xianghe station, the LightGBM model gives the smallest MAE between the estimated and measured concentrations of organics, sulfates, ammonium, and nitrates, with values of 17.53 μg·m−3, 9.43 μg·m−3, 5.18 μg·m−3, and 14.07 μg·m−3, respectively. The R2 between the LightGBM estimates and the measured values is also the highest, with values of 0.61, 0.55, 0.72, and 0.73 for organics, sulfates, ammonium, and nitrates. The MLR model gives larger MAE values for organics, sulfates, ammonium, and nitrates: 18.26 μg·m−3, 10.93 μg·m−3, 7.43 μg·m−3, and 14.16 μg·m−3, respectively. Overall, the MLR, SVR, and XGBoost models give larger MAE values, while the RF, KNN, and LightGBM models give smaller ones. For Xinglong station, the single models perform similarly to Xianghe station. The LightGBM model gives the smallest MAE between the estimated and measured concentrations of organics, sulfates, ammonium, and nitrates, with values of 24.19 μg·m−3, 11.93 μg·m−3, 5.08 μg·m−3, and 17.65 μg·m−3, respectively. The R2 between the LightGBM estimates and the measured values is also the highest for every component except nitrates.
The MAE between the MLR model-estimated concentrations of organic matter, sulfate, ammonium, and nitrate and the measured values is relatively large. The MAE values are 25.05 μg·m−3, 13.83 μg·m−3, 7.21 μg·m−3, and 17.41 μg·m−3, respectively.

3.5. Establishment of the Combined Model and Examination of the Performance

The combined model has two basic requirements of high precision and a large difference from the internal single model. According to the above research conclusions, the RF, KNN, XGBoost, and LightGBM models of each component meet the above two basic requirements. Therefore, this paper builds a combined model based on RF, KNN, XGBoost, and LightGBM, and names the combined model CM (Combined Model). In the process of constructing the combined model, the grey wolf weight optimization algorithm was used to optimize the weight of the single model inside the CM model, and then the CM model was constructed [49]. Figure 4 presents the flowchart for establishing the combined model.
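The weighting step can be illustrated with a compact grey wolf optimizer that searches for the weights of a linear combination of single-model predictions. The source does not state the objective function or the constraints used, so this sketch assumes that MAE is minimized and that the weights are non-negative and sum to one. It also keeps a best-so-far copy of the weights, which is an addition to the basic grey wolf update. All names and parameter values are hypothetical.

```python
import random

def combine(weights, preds):
    """Weighted sum of single-model predictions; preds[j][i] is model j, sample i."""
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(len(preds[0]))]

def normalise(w):
    """Clip negative weights to zero and rescale so that the weights sum to one."""
    w = [max(x, 0.0) for x in w]
    s = sum(w)
    return [x / s for x in w] if s > 0 else [1.0 / len(w)] * len(w)

def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def gwo_weights(preds, y, n_wolves=12, n_iter=60, seed=0):
    """Search combination weights with a basic grey wolf optimizer (needs n_wolves >= 3)."""
    rng = random.Random(seed)
    m = len(preds)
    cost = lambda w: mae(y, combine(normalise(w), preds))
    # Initial pack: one equal-weight wolf plus random positions in [0, 1]^m.
    wolves = [[1.0 / m] * m] + [[rng.random() for _ in range(m)] for _ in range(n_wolves - 1)]
    best = min(wolves, key=cost)[:]
    for t in range(n_iter):
        # The three best wolves (alpha, beta, delta) lead the pack.
        alpha, beta, delta = (w[:] for w in sorted(wolves, key=cost)[:3])
        a = 2.0 * (1.0 - t / n_iter)  # exploration factor decays from 2 towards 0
        for w in wolves:
            for j in range(m):
                step = 0.0
                for leader in (alpha, beta, delta):
                    A = a * (2.0 * rng.random() - 1.0)
                    C = 2.0 * rng.random()
                    step += leader[j] - A * abs(C * leader[j] - w[j])
                w[j] = step / 3.0  # move to the average of the three leader-guided positions
        candidate = min(wolves, key=cost)
        if cost(candidate) < cost(best):
            best = candidate[:]
    return normalise(best)
```

Because the equal-weight solution is part of the initial pack and the best-so-far weights are retained, the returned combination is never worse on the fitting data than a plain average of the single models.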
The concentrations of organics, sulfates, ammonium, and nitrates at the Beijing, Xianghe, and Xinglong stations were estimated by the CM model, and the error of the model was compared with the measured values at the Beijing, Xianghe, and Xinglong stations.
Table 5 shows the MAE and R2 between the estimated concentrations of organics, sulfates, ammonium, and nitrates and the measured values at the Beijing, Xianghe, and Xinglong stations based on the CM model. By comparing the mean absolute error (MAE) and R2 of Beijing, Xianghe, and Xinglong, it is found that the magnitude of the error is in the order: Beijing < Xianghe < Xinglong, which is consistent with the result of the single model.
The MAE between the concentrations of organics, sulfates, ammonium, and nitrates estimated by the CM model and the measured values at the Beijing site was 13.08 μg·m−3, 7.87 μg·m−3, 4.65 μg·m−3, and 9.93 μg·m−3, respectively. The MAE between the concentrations of organics, sulfates, ammonium, and nitrates estimated by the CM model and the measured values in the Beijing station is lower than that of the single MLR model, SVR model, RF model, KNN model, and XGBoost model, and slightly higher than that of the LightGBM model.
The MAE between the concentrations of organics, sulfates, ammonium, and nitrates estimated by the CM model and the measured values at the Xianghe site was 17.23 μg·m−3, 10.17 μg·m−3, 5.47 μg·m−3, and 11.61 μg·m−3, respectively. Compared with the single model, it was found that the MAE between the concentration of organics, sulfates, and nitrates estimated by the CM model and the measured value was lower; the MAE between the estimated concentration of ammonium and the measured value was lower than that of the MLR model, SVR model, KNN model, and XGBoost model, and slightly higher than that of the LightGBM model and RF model.
The MAE between the concentrations of organics, sulfates, ammonium, and nitrates estimated by the CM model and the measured values at the Xinglong site was 24.05 μg·m−3, 12.58 μg·m−3, 5.71 μg·m−3, and 16.64 μg·m−3, respectively. Compared with the single model, it is found that the MAE between the organics concentration and the measured value of the Xinglong site estimated by the CM model is lower than that of most single models, while the MAE between the estimated sulfate and ammonium concentration and the measured value is lower than that of other single models except the LightGBM model. The MAE between the estimated nitrate concentration and the measured value was lower than that of the other single models except the RF model.
In summary, the MAE between the CM-estimated concentrations of organics, sulfates, ammonium, and nitrates and the measured values at the Beijing, Xianghe, and Xinglong sites is smaller than that of every single model except the LightGBM and RF models. The MAE increases in the order Beijing < Xianghe < Xinglong, which is consistent with the six single models.
Beijing represents the urban areas of the BTH region, Xianghe represents the suburban areas, and Xinglong represents the rural areas. Since the model was established in Beijing, its accuracy will decline when applied to different sites, especially in Xinglong, which is farther away and has a lower air pollution level. The R2 between the combined model simulated results and observations for organics decreased from 0.84 in Beijing to 0.52 in Xinglong. We evaluate the accuracy level of our model by comparing it with other studies. Zhang et al. estimated the atmospheric columnar organics mass concentration from remote sensing measurements of aerosol spectral refractive indices using bimodal parameters. The R2 between the remote sensing results of organics in Beijing and the ground observation values was 0.35 [51]; Xie et al. established a comprehensive aerosol composition model to quantify black carbon (BC), brown carbon (BrC), mineral dust (DU), particulate organic matters, ammonium sulfate like (AS), sea salt, and aerosol water uptake. The R2 between the remote sensing results of AS in the Beijing suburbs and the ground observed sulfates is 0.67, and the R2 between the remote sensing results of AS and the ground observed nitrates is 0.71 [52]; Van Beelen et al. derived aerosol water and chemical composition using a modeling approach that combines individual measurements of remotely sensed aerosol properties (e.g., optical thickness, single-scattering albedo, refractive index, and size distribution) from an AERONET (Aerosol Robotic Network) Sun-sky radiometer with radiosonde measurements of relative humidity. The R2 between the remote sensing results of organic matter in rural areas and the ground observation values is 0.42, and the R2 between the remote sensing results of inorganic salts and the ground observation values is 0.35 [53].

3.6. Chemical Composition Concentration Estimation Based on Combination Model and Spatial Distribution Analysis

In order to estimate the annual and seasonal mean concentrations of aerosol chemical components in the Beijing–Tianjin–Hebei region, this study took 2012 as an example and estimated the daily concentrations of organics, sulfates, ammonium, and nitrates by using the CM model. By averaging the estimated daily concentrations of organics, sulfates, ammonium, and nitrates, the average annual and seasonal concentrations of organics, sulfates, ammonium, and nitrates in the Beijing–Tianjin–Hebei region in 2012 were finally obtained.
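The averaging step might look like the following sketch. The grouping of months into seasons (December to February as winter, and so on) and the skipping of days without an estimate are assumptions made for illustration; the source does not define either.

```python
from collections import defaultdict
from datetime import date

# Assumed meteorological seasons; the source does not define the month grouping.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}

def seasonal_and_annual_means(daily):
    """Average daily estimates for one grid cell into seasonal and annual means.

    daily: dict {date: concentration or None}. Days with None (for example,
    no valid AOD retrieval) are skipped rather than filled.
    """
    groups = defaultdict(list)
    for day, value in daily.items():
        if value is None:
            continue
        groups[SEASONS[day.month]].append(value)
        groups["annual"].append(value)
    return {name: sum(vals) / len(vals) for name, vals in groups.items()}
```

Applying the function to every grid cell gives the maps of annual and seasonal means. Seasons with no valid days for a cell are simply absent from the result.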
We input the gridded MODIS AOD data and meteorological data into the CM model to obtain the spatial distribution of the mass concentration of aerosol chemical components. Figure 5 shows the spatial distribution of the average annual mass concentration of organics, sulfates, ammonium, and nitrates. From the spatial perspective, these four chemical components in the Beijing–Tianjin–Hebei region exhibit similar distribution patterns, with higher values in the southwest but low values in the northwest. This is attributed to the more economically developed southwestern area of the BTH region, which features more intensive industrial activity, higher population density, and greater pollutant emissions from extensive human activities. Figure 5a shows the spatial distribution of the average annual mass concentration of organics. The organic matter concentration in the southwestern region of Beijing–Tianjin–Hebei exceeded 28 µg·m−3, with some areas even exceeding 33 µg·m−3. Figure 5b shows the spatial distribution of the annual mass concentration of sulfates. The average sulfates concentration in the south part of the BTH region is significantly higher than that in the north part, with the average annual concentration above 7 µg·m−3. Figure 5c shows the spatial distribution of the average annual mass concentration of ammonium. The average annual concentration of ammonium in the southwest of the BTH region is above 8 µg·m−3 and reached more than 10 µg·m−3 in some areas. Figure 5d shows the spatial distribution of the annual mass concentration of nitrates. In the whole study area, the highest average annual nitrates concentration was mainly located in the southwest part, and the average annual nitrates concentration was above 15 µg·m−3 and exceeded 18 µg·m−3 in some areas. Overall, the concentration of nitrates is higher than that of sulfates. Sulfates primarily originate from stationary sources, whereas nitrates are predominantly derived from traffic sources. 
Therefore, the ratio of nitrates to sulfates is often used to indicate the relative contributions of traffic and stationary sources [23]. When this ratio exceeds 1, it suggests that traffic sources contribute more to the total PM concentration in the BTH region.
As observed from Figure 6, the spatial distribution of the concentrations of the four aerosol chemical components exhibits distinct seasonal variations. The average concentrations of organics and sulfates in winter were significantly higher than those in the other three seasons. The areas with higher average concentrations in winter were mainly distributed in the south area of the BTH region, where the concentrations of organics and sulfates were above 30 µg·m−3 and 15 µg·m−3, respectively. The main reason for this phenomenon is that winter heating requires burning a large amount of fossil energy such as coal, which leads to excessive emissions of pollutants. The mass concentrations of the four components were all relatively low in summer. The spatial mean of organics concentration is about 20 µg·m−3, which is associated with the high humidity and precipitation in summer. The spatial distribution of nitrates in the Beijing–Tianjin–Hebei region was significantly higher in autumn and spring than in summer. Nitrates are a typical product of photochemical reactions. In autumn, strong radiation facilitates the formation of nitrate, and moderate temperatures promote its retention in the atmosphere [23].

4. Conclusions

Analysis of the chemical composition of aerosols in the Beijing area from 2012 to 2013 showed that the annual average concentration of NR-PM1 was 41.32 µg·m−3. The annual average concentration of organics was 20.37 µg·m−3, accounting for 49.3% of NR-PM1, the largest share among the four components, followed by nitrates, sulfates, and ammonium. Seasonally, the Beijing area was most polluted in autumn and winter, followed by spring, and least polluted in summer. Comparison of the single models and the combined model showed that four single models (RF, KNN, XGBoost, and LightGBM) performed relatively well, but the CM model had higher accuracy, with smaller MAE and higher R2. The spatial distribution of aerosol chemical composition concentration obtained with the CM model was therefore considered the most reasonable. The concentrations of the four components estimated by the CM model for the Beijing, Xinglong, and Xianghe areas are basically consistent with the measured values. The spatial distribution of the annual average concentrations of organics, sulfates, ammonium, and nitrates shows an overall south-high-north-low pattern. The annual average concentrations of the four components are highest in the southwestern part of Beijing, ranging from more than 28 µg·m−3 for organics to more than 15 µg·m−3 for nitrates. In the seasonal spatial distributions, the concentrations of organics and sulfates are relatively consistent in all seasons, while the distribution characteristics of ammonium and nitrates are similar.

Author Contributions

Conceptualization, L.K.; software, B.L.; validation, C.S. and R.S.; formal analysis, B.L.; investigation, B.L. and C.S.; resources, Z.S. and W.Z.; data curation, R.S. and W.Z.; writing—original draft preparation, B.L. and G.C.; writing—review and editing, B.L., G.C., C.S., R.S., P.Z., W.Z. and L.K.; visualization, G.C.; supervision, L.K. and W.Z.; project administration, L.K.; funding acquisition, Z.S., P.Z. and L.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the General Program of the National Natural Science Foundation of China (42375085), and the Anyang National Climate Observatory open fund (AYNCOF202415).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fan, H.; Zhao, C.; Yang, Y. A comprehensive analysis of the spatio-temporal variation of urban air pollution in China during 2014–2018. Atmos. Environ. 2020, 220, 117066.
  2. Shao, P.; Tian, H.; Sun, Y.; Liu, H.; Wu, B.; Liu, S.; Liu, X.; Wu, Y.; Liang, W.; Wang, Y.; et al. Characterizing remarkable changes of severe haze events and chemical compositions in multi-size airborne particles (PM1, PM2.5 and PM10) from January 2013 to 2016–2017 winter in Beijing, China. Atmos. Environ. 2018, 189, 133–144.
  3. Kong, L.; Xin, J.; Zhang, W.; Wang, Y. The empirical correlations between PM2.5, PM10 and AOD in the Beijing metropolitan region and the PM2.5, PM10 distributions retrieved by MODIS. Environ. Pollut. 2016, 216, 350–360.
  4. Zhang, Q.; Meng, X.; Shi, S.; Kan, L.; Chen, R.; Kan, H. Overview of particulate air pollution and human health in China: Evidence, challenges, and opportunities. Innovation 2022, 3, 100312.
  5. Song, C.; Wu, L.; Xie, Y.; He, J.; Chen, X.; Wang, T.; Lin, Y.; Jin, T.; Wang, A.; Liu, Y.; et al. Air pollution in China: Status and spatiotemporal variations. Environ. Pollut. 2017, 227, 334–347.
  6. Kong, L.; Xin, J.; Liu, Z.; Zhang, K.; Tang, G.; Zhang, W.; Wang, Y. The PM2.5 threshold for aerosol extinction in the Beijing megacity. Atmos. Environ. 2017, 167, 458–465.
  7. Cao, J.; Wang, Q.; Chow, J.C.; Watson, J.G.; Tie, X.-X.; Shen, Z.-X.; Wang, P.; An, Z.-S. Impacts of aerosol compositions on visibility impairment in Xi’an, China. Atmos. Environ. 2012, 59, 559–566.
  8. Tao, J.; Zhang, L.; Ho, K.; Zhang, R.; Lin, Z.; Zhang, Z.; Lin, M.; Cao, J.; Liu, S.; Wang, G. Impact of PM2.5 chemical compositions on aerosol light scattering in Guangzhou—The largest megacity in South China. Atmos. Res. 2014, 135, 48–58.
  9. Wang, Y.H.; Liu, Z.R.; Zhang, J.K.; Hu, B.; Ji, D.S.; Yu, Y.C.; Wang, Y.S. Aerosol physicochemical properties and implications for visibility during an intense haze episode during winter in Beijing. Atmos. Chem. Phys. 2015, 15, 3205–3215.
  10. Che, H.; Xia, X.; Zhao, H.; Li, L.; Gui, K.; Zheng, Y.; Song, J.; Qi, B.; Zhu, J.; Miao, Y.; et al. Aerosol optical and radiative properties and their environmental effects in China: A review. Earth-Sci. Rev. 2024, 248, 104634.
  11. Levy, H.; Horowitz, L.W.; Schwarzkopf, M.D.; Ming, Y.; Golaz, J.-C.; Naik, V.; Ramaswamy, V. The roles of aerosol direct and indirect effects in past and future climate change. J. Geophys. Res. Atmos. 2013, 118, 4521–4532.
  12. Lee, S.Y.; Wang, C. The response of the South Asian summer monsoon to temporal and spatial variations in absorbing aerosol radiative forcing. J. Clim. 2015, 28, 6626–6646.
  13. Li, Z.; Xia, X.; Cribb, M.; Mi, W.; Holben, B.; Wang, P.; Chen, H.; Tsay, S.-C.; Eck, T.F.; Zhao, F.; et al. Aerosol optical properties and their radiative effects in northern China. J. Geophys. Res. Atmos. 2007, 112, D22S01.
  14. Raju, M.P.; Safai, P.D.; Sonbawne, S.M.; Naidu, C.V. Black carbon radiative forcing over the Indian Arctic station, Himadri during the Arctic Summer of 2012. Atmos. Res. 2015, 157, 29–36.
  15. Jia, H.; Ma, X.; Yu, F.; Quaas, J. Significant underestimation of radiative forcing by aerosol–cloud interactions derived from satellite-based methods. Nat. Commun. 2021, 12, 3649.
  16. Bono, R.; Tassinari, R.; Bellisario, V.; Gilli, G.; Pazzi, M.; Pirro, V.; Mengozzi, G.; Bugiani, M.; Piccioni, P. Urban air and tobacco smoke as conditions that increase the risk of oxidative stress and respiratory response in youth. Environ. Res. 2015, 137, 141–146.
  17. Perrone, M.G.; Gualtieri, M.; Consonni, V.; Ferrero, L.; Sangiorgi, G.; Longhin, E.; Ballabio, D.; Bolzacchini, E.; Camatini, M. Particle size, chemical composition, seasons of the year and urban, rural or remote site origins as determinants of biological effects of particulate matter on pulmonary cells. Environ. Pollut. 2013, 176, 215–227.
  18. Tang, G.; Zhao, P.; Wang, Y.; Gao, W.; Cheng, M.; Xin, J.; Li, X.; Wang, Y. Mortality and air pollution in Beijing: The long-term relationship. Atmos. Environ. 2017, 150, 238–243.
  19. Singh, A.; Pant, P.; Pope, F.D. Air quality during and after festivals: Aerosol concentrations, composition and health effects. Atmos. Res. 2019, 227, 220–232.
  20. Ji, D.; Zhang, J.; He, J.; Wang, X.; Pang, B.; Liu, Z.; Wang, L.; Wang, Y. Characteristics of atmospheric organic and elemental carbon aerosols in urban Beijing, China. Atmos. Environ. 2016, 125, 293–306.
  21. Tian, S.; Pan, Y.; Liu, Z.; Wen, T.; Wang, Y. Size-resolved aerosol chemical analysis of extreme haze pollution events during early 2013 in urban Beijing, China. J. Hazard. Mater. 2014, 279, 452–460.
  22. Jeong, C.H.; McGuire, M.L.; Godri, K.J.; Slowik, J.G.; Rehbein, P.J.G.; Evans, G.J. Quantification of aerosol chemical composition using continuous single particle measurements. Atmos. Chem. Phys. 2011, 11, 7027–7044.
  23. Huang, X.; Liu, Z.; Liu, J.; Hu, B.; Wen, T.; Tang, G.; Zhang, J.; Wu, F.; Ji, D.; Wang, L.; et al. Chemical characterization and source identification of PM2.5 at multiple sites in the Beijing–Tianjin–Hebei region, China. Atmos. Chem. Phys. 2017, 17, 12941–12962.
  24. Li, J.; Anderson, J.R.; Buseck, P.R. TEM study of aerosol particles from clean and polluted marine boundary layers over the North Atlantic. J. Geophys. Res. Atmos. 2003, 108, 4189.
  25. Gieré, R.; Blackford, M.; Smith, K. TEM study of PM2.5 emitted from coal and tire combustion in a thermal power station. Environ. Sci. Technol. 2006, 40, 6235–6240.
  26. Ginoux, P.; Chin, M.; Tegen, I.; Prospero, J.M.; Holben, B.; Dubovik, O.; Lin, S.-J. Sources and distributions of dust aerosols simulated with the GOCART model. J. Geophys. Res. Atmos. 2001, 106, 20255–20273.
  27. Huneeus, N.; Chevallier, F.; Boucher, O. Estimating aerosol emissions by assimilating observed aerosol optical depth in a global aerosol model. Atmos. Chem. Phys. 2015, 12, 4585–4606.
  28. Lanzaco, B.L.; Olcese, L.E.; Palancar, G.G.; Toselli, B.M. An Improved Aerosol Optical Depth Map Based on Machine-Learning and MODIS Data: Development and Application in South America. Aerosol Air Qual. Res. 2017, 17, 1623–1636.
  29. Zhan, Y.; Luo, Y.; Deng, X.; Chen, H.; Grieneisen, M.L.; Shen, X.; Zhu, L.; Zhang, M. Spatiotemporal prediction of continuous daily PM2.5 concentrations across China using a spatially explicit machine learning algorithm. Atmos. Environ. 2017, 155, 129–139.
  30. Sun, Y.; Zeng, Q.; Geng, B.; Lin, X.; Sude, B.; Chen, L. Deep learning architecture for estimating hourly ground-level PM2.5 using satellite remote sensing. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1343–1347.
  31. Huttunen, J.; Kokkola, H.; Mielonen, T.; Mononen, M.E.J.; Lipponen, A.; Reunanen, J.; Lindfors, A.V.; Mikkonen, S.; Lehtinen, K.E.J.; Kouremeti, N.; et al. Retrieval of aerosol optical depth from surface solar radiation measurements using machine learning algorithms, non-linear regression and a radiative transfer-based look-up table. Atmos. Chem. Phys. 2016, 16, 8181–8191.
  32. Chen, G.; Li, S.; Knibbs, L.D.; Hamm, N.A.S.; Cao, W.; Li, T.; Guo, J.; Ren, H.; Abramson, M.J.; Guo, Y. A machine learning method to estimate PM2.5 concentrations across China with remote sensing, meteorological and land use information. Sci. Total Environ. 2018, 636, 52–60.
  33. Xin, J.; Wang, Y.; Pan, Y.; Ji, D.; Liu, Z.; Wen, T.; Wang, Y.; Li, X.; Sun, Y.; Sun, J.; et al. The campaign on atmospheric aerosol research network of China: CARE-China. Bull. Am. Meteorol. Soc. 2015, 96, 1137–1155.
  34. Zhang, J.K.; Sun, Y.; Liu, Z.R.; Ji, D.S.; Hu, B.; Liu, Q.; Wang, Y.S. Characterization of submicron aerosols during a month of serious pollution in Beijing, 2013. Atmos. Chem. Phys. 2014, 14, 2887–2903.
  35. Khan, J.Z.; Sun, L.; Tian, Y.; Shi, G.; Feng, Y. Chemical characterization and source apportionment of PM1 and PM2.5 in Tianjin, China: Impacts of biomass burning and primary biogenic sources. J. Environ. Sci. 2021, 99, 196–209.
  36. Bilal, M.; Nazeer, M.; Nichol, J.; Qiu, Z.; Wang, L.; Bleiweiss, M.P.; Shen, X.; Campbell, J.R.; Lolli, S. Evaluation of terra-MODIS C6 and C6.1 aerosol products against Beijing, XiangHe, and Xinglong AERONET sites in China during 2004–2014. Remote Sens. 2019, 11, 486.
  37. Jiao, D.; Xu, N.; Yang, F.; Xu, K. Evaluation of spatial-temporal variation performance of ERA5 precipitation data in China. Sci. Rep. 2021, 11, 17956.
  38. Kaskaoutis, D.G.; Grivas, G.; Stavroulas, I.; Liakakou, E.; Dumka, U.C.; Dimitriou, K.; Gerasopoulos, E.; Mihalopoulos, N. In situ identification of aerosol types in Athens, Greece, based on long-term optical and on online chemical characterization. Atmos. Environ. 2021, 246, 118070.
  39. Kong, L.; Xin, J.; Gao, W.; Tang, G.; Wang, X.; Wang, Y.; Zhang, W.; Chen, W.; Jia, S. A comprehensive evaluation of aerosol extinction apportionment in Beijing using a high-resolution time-of-flight aerosol mass spectrometer. Sci. Total Environ. 2021, 783, 146976.
  40. Ji, D.; Li, L.; Wang, Y.; Zhang, J.; Cheng, M.; Sun, Y.; Liu, Z.; Wang, L.; Tang, G.; Hu, B.; et al. The heaviest particulate air-pollution episodes occurred in northern China in January, 2013: Insights gained from observation. Atmos. Environ. 2014, 92, 546–556.
  41. Ausati, S.; Amanollahi, J. Assessing the accuracy of ANFIS, EEMD-GRNN, PCR, and MLR models in predicting PM2.5. Atmos. Environ. 2016, 142, 465–474.
  42. Hamidi, S.K.; Zenner, E.K.; Bayat, M.; Fallah, A. Analysis of plot-level volume increment models developed from machine learning methods applied to an uneven-aged mixed forest. Ann. For. Sci. 2021, 78, 4.
  43. Hu, X.; Belle, J.H.; Meng, X.; Wildani, A.; Waller, L.A.; Strickland, M.J.; Liu, Y. Estimating PM2.5 concentrations in the conterminous United States using the random forest approach. Environ. Sci. Technol. 2017, 51, 6936–6944.
  44. Huang, K.; Xiao, Q.; Meng, X.; Geng, G.; Wang, Y.; Lyapustin, A.; Gu, D.; Liu, Y. Predicting monthly high-resolution PM2.5 concentrations with random forest model in the North China Plain. Environ. Pollut. 2018, 242, 675–683.
  45. Thanh Noi, P.; Kappas, M. Comparison of random forest, k-nearest neighbor, and support vector machine classifiers for land cover classification using Sentinel-2 imagery. Sensors 2017, 18, 18.
  46. Zhuang, Z.; Zhang, H.; Chan, P.W.; Tai, H.; Deng, Z. A Machine Learning-Based Model for Flight Turbulence Identification Using LiDAR Data. Atmosphere 2023, 14, 797.
  47. Li, K.; Li, L.; Hu, A.; Pan, J.; Ma, Y.; Zhang, M. Research on Modeling Weighted Average Temperature Based on the Machine Learning Algorithms. Atmosphere 2023, 14, 1251.
  48. Huang, R.J.; Zhang, Y.; Bozzetti, C.; Ho, K.-F.; Cao, J.-J.; Han, Y.; Daellenbach, K.R.; Slowik, J.G.; Platt, S.M.; Canonaco, F.; et al. High secondary aerosol contribution to particulate pollution during haze events in China. Nature 2014, 514, 218–222.
  49. Fadil, I.; Helmiawan, M.A.; Sofiyan, Y. Optimization parameters support vector regression using grid search method. In Proceedings of the 9th International Conference on Cyber and IT Service Management (CITSM), Bengkulu, Indonesia, 22–23 September 2021; IEEE: New York, NY, USA, 2021.
  50. Gao, Z.M.; Zhao, J. An improved grey wolf optimization algorithm with variable weights. Comput. Intell. Neurosci. 2019, 2019, 2981282.
  51. Zhang, Y.; Li, Z.; Sun, Y.; Lv, Y.; Xie, Y. Estimation of atmospheric columnar organic matter (OM) mass concentration from remote sensing measurements of aerosol spectral refractive indices. Atmos. Environ. 2018, 179, 107–117.
  52. Xie, Y.S.; Li, Z.Q.; Zhang, Y.X.; Zhang, Y.; Li, D.H.; Li, K.T.; Xu, H.; Zhang, Y.; Wang, Y.Q.; Chen, X.F.; et al. Estimation of atmospheric aerosol composition from ground-based remote sensing measurements of Sun-sky radiometer. J. Geophys. Res. Atmos. 2017, 122, 498–518.
  53. Van Beelen, A.J.; Roelofs, G.J.H.; Hasekamp, O.P.; Henzing, J.S.; Röckmann, T. Estimation of aerosol water and chemical composition from AERONET Sun–sky radiometer measurements at Cabauw, the Netherlands. Atmos. Chem. Phys. 2014, 14, 5969–5987.
Figure 1. Locations and surrounding environments of the three observation sites in the Beijing–Tianjin–Hebei region.
Figure 2. Concentration of organics, sulfates, ammonium, and nitrates in different seasons.
Figure 3. Scatterplots between simulated results of 6 single models and observations for four chemical components in Beijing (red dots represent outliers).
Figure 4. Flowchart for the establishment of the combined model.
Figure 5. Spatial distribution of the average annual concentration of the four components ((a): organics, (b): sulfates, (c): ammonium, (d): nitrates).
Figure 6. Spatial distribution of seasonal mean concentrations of organics, sulfates, ammonium, and nitrates ((a): spring, (b): summer, (c): autumn, (d): winter).
Table 1. Correlation coefficients (CC) and p values between the aerosol chemical components and aerosol optical depth (AOD) as well as meteorological factors.

| Factor | Statistic | ORG | SO₄²⁻ | NH₄⁺ | NO₃⁻ |
|---|---|---|---|---|---|
| AOD | CC | 0.70 | 0.79 | 0.82 | 0.76 |
| | p | 0.0002 | 0.0000 | 0.0000 | 0.0000 |
| T2M | CC | 0.13 | 0.21 | 0.24 | 0.24 |
| | p | 0.02 | 0.003 | 0.001 | 0.001 |
| RH | CC | 0.41 | 0.57 | 0.50 | 0.45 |
| | p | 0.02 | 0.0008 | 0.0006 | 0.0001 |
| U10 | CC | −0.27 | −0.20 | −0.25 | −0.26 |
| | p | 0.09 | 0.9 | 0.08 | 0.08 |
| V10 | CC | 0.31 | 0.28 | 0.33 | 0.32 |
| | p | 0.05 | 0.03 | 0.0002 | 0.0003 |
| SP | CC | −0.14 | −0.13 | −0.13 | −0.13 |
| | p | 0.004 | 0.04 | 0.001 | 0.003 |
| BLH | CC | −0.38 | −0.29 | −0.35 | −0.36 |
| | p | 0.008 | 0.005 | 0.004 | 0.04 |
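
The correlation coefficients in Table 1 can be illustrated with a short sketch. The Python function below computes a Pearson correlation coefficient between two series, on the assumption that CC denotes the Pearson coefficient (the table does not name the estimator). The function name `pearson_cc` and the sample values are hypothetical and are not data from this study. The p values are not reproduced, because they require a significance test that this sketch omits.

```python
import math


def pearson_cc(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical illustrative values (AOD is unitless, organics in µg·m−3).
aod = [0.2, 0.5, 0.9, 1.4, 1.1]
organics = [8.0, 15.0, 22.0, 41.0, 30.0]
cc = pearson_cc(aod, organics)
print(f"CC = {cc:.2f}")
```

A library routine such as `scipy.stats.pearsonr` returns both the coefficient and a p value, which would be the more practical choice for a table of this kind.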
Table 2. Scope of optimization and parameter selection for different models.

| Model | Parameter | ORG | SO₄²⁻ | NH₄⁺ | NO₃⁻ | Scope of Optimization |
|---|---|---|---|---|---|---|
| SVR | kernel | rbf | rbf | rbf | rbf | ['linear', 'poly', 'rbf', 'sigmoid'] |
| | gamma | 6.31 | 0.52 | 2.10 | 2.09 | [1 × 10^−4, 1] |
| | C | 0.73 | 1.31 | 0.72 | 0.74 | [1 × 10^−2, 1 × 10^2] |
| RF | n_estimators | 80 | 50 | 70 | 70 | [50, 1000] |
| | max_depth | 10 | 11 | 10 | 12 | [5, 50] |
| | min_samples_split | 9 | 3 | 2 | 3 | [2, 20] |
| | min_samples_leaf | 1 | 2 | 2 | 1 | [1, 10] |
| | max_features | 5 | 5 | 4 | 4 | [0.1, 1.0] |
| KNN | n_neighbors | 3 | 5 | 3 | 3 | [1, 30] |
| | weights | distance | distance | distance | distance | ['uniform', 'distance'] |
| | algorithm | auto | auto | auto | auto | ['auto', 'ball_tree', 'kd_tree', 'brute'] |
| XGBoost | objective | reg:squarederror | reg:squarederror | reg:squarederror | reg:squarederror | ['reg:squarederror', 'reg:linear', 'reg:pseudohubererror', 'reg:logistic'] |
| | learning_rate | 0.01 | 0.01 | 0.01 | 0.01 | [0.01, 0.3] |
| | n_estimators | 350 | 340 | 355 | 390 | [50, 1000] |
| | max_depth | 5 | 5 | 5 | 6 | [3, 10] |
| | min_child_weight | 1 | 1 | 5 | 5 | [1, 10] |
| | subsample | 0.6 | 0.9 | 0.7 | 0.7 | [0.5, 1.0] |
| | colsample_bytree | 0.9 | 0.9 | 0.7 | 0.9 | [0.5, 1.0] |
| | gamma | 0 | 0 | 0 | 0 | [0, 5] |
| LightGBM | boosting_type | gbdt | gbdt | gbdt | gbdt | ['gbdt', 'dart', 'goss', 'rf'] |
| | objective | regression | regression | regression | regression | ['regression', 'huber', 'fair', 'poisson', 'quantile', 'mape', 'gamma', 'tweedie'] |
| | learning_rate | 0.01 | 0.01 | 0.01 | 0.01 | [0.01, 0.3] |
| | n_estimators | 490 | 160 | 435 | 480 | [50, 1000] |
| | max_depth | 8 | 8 | 8 | 8 | [−1, 15] |
| | num_leaves | 19 | 18 | 20 | 28 | [20, 200] |
| | max_bin | 125 | 55 | 175 | 65 | [25, 500] |
| | feature_fraction | 0.7 | 0.6 | 0.6 | 0.6 | [0.5, 1.0] |
| | bagging_fraction | 0.7 | 0.8 | 0.9 | 0.8 | [0.5, 1.0] |
| | bagging_freq | 40 | 50 | 10 | 40 | [0, 100] |
| | lambda_l1 | 0.1 | 0 | 0.1 | 0.001 | [0, 1] |
| | lambda_l2 | 0.001 | 0 | 1 | 0.5 | [0, 1] |
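
To make the KNN settings in Table 2 more concrete, the sketch below implements k-nearest-neighbor regression with inverse-distance weighting, which is how a `weights` value of 'distance' is commonly interpreted (for example in scikit-learn, whose parameter names Table 2 appears to follow; that correspondence is an assumption). The function `knn_predict`, the two-feature inputs, and all numbers are hypothetical and are not taken from the study.

```python
import math


def knn_predict(train_X, train_y, query, n_neighbors=3):
    """Predict a target as the inverse-distance-weighted mean of the nearest neighbours."""
    if n_neighbors < 1 or n_neighbors > len(train_X):
        raise ValueError("n_neighbors must be between 1 and the number of samples")
    # Euclidean distance from the query to every training sample.
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    nearest = sorted(dists, key=lambda pair: pair[0])[:n_neighbors]
    # If the query coincides with training samples, average their targets
    # directly to avoid dividing by zero.
    exact = [y for d, y in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Hypothetical, already-scaled feature pairs and target concentrations.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
train_y = [10.0, 20.0, 40.0, 100.0]
prediction = knn_predict(train_X, train_y, query=(0.0, 1.0), n_neighbors=3)
print(f"predicted concentration: {prediction:.2f}")
```

With distance weighting, closer samples dominate the estimate, so a small `n_neighbors` such as the values of 3 to 5 in Table 2 yields predictions that follow local structure in the training data closely.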
Table 3. Linear regression functions between simulated results of 6 single models and observations for four chemical components in Beijing.

| Component | | KNN | LightGBM | MLR | RF | SVR | XGBoost |
|---|---|---|---|---|---|---|---|
| organics | fit | y = 0.76x + 4.82 | y = 0.71x + 5.45 | y = 0.49x + 8.75 | y = 0.68x + 6.08 | y = 0.62x + 6.38 | y = 0.75x + 4.80 |
| | statistics | R2 = 0.75, p = 0.00 | R2 = 0.75, p = 0.00 | R2 = 0.53, p = 0.00 | R2 = 0.74, p = 0.00 | R2 = 0.68, p = 0.00 | R2 = 0.75, p = 0.00 |
| sulfates | fit | y = 0.70x + 1.81 | y = 0.59x + 3.36 | y = 0.63x + 2.39 | y = 0.67x + 2.15 | y = 0.67x + 2.15 | y = 0.73x + 1.93 |
| | statistics | R2 = 0.70, p = 0.00 | R2 = 0.61, p = 0.00 | R2 = 0.56, p = 0.00 | R2 = 0.60, p = 0.00 | R2 = 0.50, p = 0.00 | R2 = 0.68, p = 0.00 |
| ammonium | fit | y = 0.86x + 0.61 | y = 0.71x + 1.30 | y = 0.68x + 1.42 | y = 0.74x + 1.19 | y = 0.79x + 1.38 | y = 0.75x + 1.25 |
| | statistics | R2 = 0.84, p = 0.00 | R2 = 0.83, p = 0.00 | R2 = 0.66, p = 0.00 | R2 = 0.81, p = 0.00 | R2 = 0.75, p = 0.00 | R2 = 0.83, p = 0.00 |
| nitrates | fit | y = 0.82x + 1.54 | y = 0.74x + 2.07 | y = 0.59x + 3.00 | y = 0.75x + 2.20 | y = 0.74x + 2.86 | y = 0.81x + 1.77 |
| | statistics | R2 = 0.85, p = 0.00 | R2 = 0.77, p = 0.00 | R2 = 0.70, p = 0.00 | R2 = 0.79, p = 0.00 | R2 = 0.74, p = 0.00 | R2 = 0.77, p = 0.00 |
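
The fitted lines in Table 3 have the form y = ax + b. As a minimal sketch, an ordinary least-squares fit of that form can be computed as follows, assuming that x holds the observations and y the simulated values (the table does not state the axis assignment). The name `fit_line` and the sample values are hypothetical.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observed and simulated concentrations in µg·m−3.
observed = [5.0, 12.0, 20.0, 35.0, 50.0]
simulated = [8.0, 13.0, 19.0, 30.0, 41.0]
slope, intercept = fit_line(observed, simulated)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Under that axis assumption, a slope below 1 combined with a positive intercept, as in every cell of Table 3, would indicate that a model tends to overestimate low concentrations and underestimate high ones.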
Table 4. MAE and R2 between the simulated results of 6 single models and observations for four chemical components in Xianghe and Xinglong.

| Station | Component | Metric | MLR | SVR | RF | KNN | XGBoost | LightGBM |
|---|---|---|---|---|---|---|---|---|
| Xianghe | organics | MAE | 18.26 | 19.05 | 17.53 | 18.87 | 19.72 | 17.35 |
| | | R2 | 0.40 | 0.52 | 0.59 | 0.56 | 0.51 | 0.61 |
| | sulfates | MAE | 10.93 | 10.46 | 10.40 | 10.15 | 10.41 | 9.43 |
| | | R2 | 0.39 | 0.47 | 0.46 | 0.50 | 0.48 | 0.55 |
| | ammonium | MAE | 7.43 | 6.15 | 5.30 | 5.65 | 6.36 | 5.18 |
| | | R2 | 0.49 | 0.70 | 0.70 | 0.60 | 0.54 | 0.72 |
| | nitrates | MAE | 14.16 | 15.08 | 12.57 | 12.63 | 15.13 | 14.07 |
| | | R2 | 0.55 | 0.54 | 0.61 | 0.61 | 0.58 | 0.73 |
| Xinglong | organics | MAE | 25.05 | 24.73 | 24.23 | 24.50 | 24.98 | 24.19 |
| | | R2 | 0.34 | 0.37 | 0.41 | 0.38 | 0.35 | 0.52 |
| | sulfates | MAE | 13.83 | 13.24 | 12.97 | 12.68 | 13.49 | 11.93 |
| | | R2 | 0.29 | 0.32 | 0.37 | 0.33 | 0.31 | 0.42 |
| | ammonium | MAE | 7.21 | 6.86 | 6.60 | 6.55 | 6.89 | 5.39 |
| | | R2 | 0.35 | 0.40 | 0.42 | 0.51 | 0.40 | 0.62 |
| | nitrates | MAE | 17.41 | 17.82 | 16.55 | 17.05 | 17.76 | 17.65 |
| | | R2 | 0.39 | 0.37 | 0.46 | 0.45 | 0.35 | 0.37 |
Table 5. MAE and R2 between the simulated results of the combined model and observations for four chemical components in Beijing, Xianghe, and Xinglong.

| Station | Metric | Organics | Sulfates | Ammonium | Nitrates |
|---|---|---|---|---|---|
| Beijing | MAE | 13.08 | 7.87 | 4.65 | 9.93 |
| | R2 | 0.84 | 0.69 | 0.86 | 0.85 |
| Xianghe | MAE | 17.23 | 10.17 | 5.47 | 13.13 |
| | R2 | 0.63 | 0.54 | 0.72 | 0.73 |
| Xinglong | MAE | 24.05 | 12.58 | 5.71 | 16.64 |
| | R2 | 0.52 | 0.40 | 0.62 | 0.40 |
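
The two metrics reported in Tables 4 and 5 can be sketched in a few lines of Python. The code below computes the mean absolute error and, as an assumption, takes R2 to be the coefficient of determination 1 − SS_res/SS_tot; some studies instead report the squared correlation coefficient, and the tables do not say which definition applies. The function names and sample values are hypothetical.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the observations are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and simulated concentrations in µg·m−3.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"MAE = {mae:.2f}, R2 = {r2:.3f}")
```

A smaller MAE together with a larger R2, computed on the same held-out observations, is the sense in which one model is judged more accurate than another in these tables.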
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, B.; Cheng, G.; Shang, C.; Si, R.; Shao, Z.; Zhang, P.; Zhang, W.; Kong, L. Inversion of Aerosol Chemical Composition in the Beijing–Tianjin–Hebei Region Using a Machine Learning Algorithm. Atmosphere 2025, 16, 114. https://doi.org/10.3390/atmos16020114
