Article

Spatiotemporal Evolution Analysis of PM2.5 Concentrations in Central China Using the Random Forest Algorithm

1
School of Environment and Surveying Engineering, Suzhou University, Suzhou 234000, China
2
3S Technology Application Research Center in Northern Anhui, Suzhou 234000, China
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Submission received: 22 August 2024 / Revised: 30 September 2024 / Accepted: 1 October 2024 / Published: 4 October 2024

Abstract

This study focuses on Central China (CC), including Shanxi, Henan, Anhui, Hubei, Jiangxi, and Hunan provinces. The 2019 average annual precipitation (PRE), average annual temperature (TEM), average annual wind speed (WS), population density (POP), normalized difference vegetation index (NDVI), aerosol optical depth (AOD), gross domestic product (GDP), and elevation (DEM) data were used as explanatory variables to predict the average annual PM2.5 concentrations (PM2.5Cons) in CC. The average annual PM2.5Cons were predicted using different models, including multiple linear regression (MLR), back propagation neural network (BPNN), and random forest (RF) models. The results showed higher prediction accuracy and stability of the RF algorithm (RFA) than those of the other models. Therefore, it was used to analyze the contributions of the explanatory factors to the PM2.5 concentration (PM2.5Con) prediction in CC. Subsequently, the spatiotemporal evolution of the PM2.5Cons from 2010 to 2021 was systematically analyzed. The results indicated that (1) PRE and AOD had the most significant impacts on the PM2.5Cons. Specifically, the PRE and AOD values exhibited negative and positive correlations with the PM2.5Cons, respectively. The NDVI and WS were negatively correlated with the PM2.5Cons; (2) the southern and northern parts of Shanxi and Henan provinces, respectively, experienced the highest PM2.5Cons in the 2010–2013 period, indicating severe air pollution. However, the PM2.5Cons in the 2014–2021 period showed spatial decreasing trends, demonstrating the effectiveness of the implemented air pollution control measures in reducing pollution and improving air quality in CC. The findings of this study provide scientific evidence for air pollution control and policy making in CC. 
To further advance atmospheric sustainability in CC, the study suggested that the government enhance air quality monitoring, manage pollution sources, raise public awareness about environmental protection, and promote green lifestyles.

1. Introduction

PM2.5 refers to particulate matter with an aerodynamic diameter of 2.5 μm or less, commonly known as fine particulates. Due to its small size, long transmission distance, and high mobility [1,2,3,4,5], PM2.5 has become one of the major contributors to air pollution in recent years. These tiny particles can penetrate deeply into the lungs, posing significant health risks [6,7,8,9,10,11,12]. Although PM2.5 concentrations in the Earth’s atmosphere are low, they can greatly affect the global climate and visibility in urban areas [13,14,15]. According to previous studies, PM2.5 can significantly reduce visibility by absorbing or reflecting solar radiation, leading to the occurrence of smog, impeding travel, and increasing the risk of traffic accidents [16,17,18,19]. Prolonged exposure to high PM2.5 concentrations (PM2.5Cons) can also negatively affect crop yields and quality. Because the particles are light and remain in constant movement in the atmosphere, PM2.5 can cause widespread air pollution. With industrial development and increasing environmental degradation, PM2.5 has emerged as a major pollutant affecting air quality in China [20], where atmospheric particulate pollution is at high levels [21].
Numerous studies in China and other countries have employed machine learning algorithms, particularly the random forest algorithm (RFA), to predict PM2.5Cons and analyze their spatiotemporal variations. The RFA has demonstrated superior accuracy and efficiency across various contexts compared to other machine learning models. The relevant literature can be grouped into four themes.
(1) The superiority of the RFA over other algorithms. Compared to other algorithms such as decision trees or support vector machines, the RFA offers superior accuracy, greater robustness against overfitting, and enhanced capability in handling large datasets with numerous variables. Agibayeva et al. [22] applied the RFA and multiple linear regression (MLR) analysis to predict PM2.5Cons in Astana, Kazakhstan, in heating and non-heating periods before evaluating the risk of disability-adjusted life years (DALYs) based on the prediction results. Their findings demonstrated a higher prediction accuracy of the RFA than that of MLR. In addition, the authors highlighted the great contributions of the PM10 and carbon monoxide (CO) concentrations to the PM2.5 prediction results.
(2) The RFA in PM2.5 prediction models. Several studies have successfully developed PM2.5 prediction models using the RFA. Hu et al. [23] used land use data, measured PM2.5Cons, and MODIS-derived 10 km aerosol optical depth (AOD) data to estimate the 24 h average PM2.5Cons in the United States in 2011. The cross-validation (CV) of the RFA showed a coefficient of determination (R2) of 0.80, meeting the expected experimental outcomes. Du et al. [24] developed an RF-based PM2.5 prediction model using 2013–2016 data for Xi’an, China, including wind speed, precipitation amounts, air temperature, CO, nitric oxide (NO), and sulfur dioxide (SO2) concentrations, seasons, and the previous day’s PM2.5Cons. Their results demonstrated the effective capacity of the developed model in predicting PM2.5Cons, with high accuracy and efficiency, outperforming the back propagation neural network (BPNN) algorithm.
(3) Spatiotemporal analysis of PM2.5 using the RFA. The RFA has been instrumental in spatiotemporal analysis, providing valuable insights into PM2.5 variations over time and space. Shi et al. [25] focused on Eastern China as their study area, using PM2.5 monitoring, meteorological, and land use data as inputs. Employing the RFA, they systematically analyzed the spatiotemporal variations of PM2.5Cons for the years 2016, 2018, and 2020. Additionally, they conducted a quantitative assessment of the relationship between the spatial distribution of PM2.5Cons and landscape patterns, offering valuable insights into the environmental and landscape dynamics of the region. Xia et al. [26] analyzed the factors influencing PM2.5Cons across China using the RFA, showing space–time evolution in the contributions of influencing factors to the PM2.5Cons.
(4) The RFA in regional PM2.5 studies. Several regional studies have leveraged the RFA to analyze the dynamics and influencing factors of PM2.5Cons. Jin et al. [27] systematically examined the spatiotemporal changes and factors influencing PM2.5Cons in Anhui Province, demonstrating the strong impacts of natural and anthropogenic factors on the PM2.5Cons and trends. The authors highlighted a gradual decrease in the observed PM2.5Cons, improving air quality across the province. Lei et al. [28] monitored daily air quality in Macau over the 2016–2021 period to predict 24- and 48-h PM2.5, PM10, and CO concentrations using five machine learning algorithms, of which the RFA yielded the best prediction performance. In addition, Su et al. [29] analyzed the spatial distributions of PM2.5Cons in the Yangtze River delta over the 2003–2019 period and explored the key driving factors behind PM2.5 pollution in areas experiencing severe air quality issues using five machine learning algorithms. Their results indicated the high accuracy of the geographically weighted random forest (GWRF) algorithm in predicting the PM2.5Cons using main climate factors and human activities. Xu et al. [30] systematically evaluated the spatiotemporal variations of PM2.5Cons in Heilongjiang Province, China, using multi-source data and the RFA. The authors identified the number of fire points as the primary factor influencing temporal fluctuations in PM2.5 levels during straw burning.
Although the RFA has been extensively used in environmental science research due to its strong predictive capabilities and ability to handle complex data, there is still a lack of in-depth and systematic studies on PM2.5 pollution prediction and its influencing factors in Central China (CC). To better understand and address the impacts of PM2.5Cons in CC, we evaluated the influences of different explanatory variables on the PM2.5Cons, including climate factors (precipitation (PRE), temperature (TEM), and wind speed (WS)), population density (POP), the gross domestic product (GDP), the normalized difference vegetation index (NDVI), AOD, and the digital elevation model (DEM). In addition, the accuracies of the RF, MLR, and BPNN models in predicting the PM2.5Cons in CC were further evaluated and compared. The main objective of this study was to assess the quantitative relationships of the spatial distribution of the PM2.5Cons in CC with the environmental, population, and economic factors. In addition, the spatiotemporal evolution of the PM2.5Cons and their influencing factors in the region were explored over the 2010–2021 period. The objectives of this study are as follows: (1) to validate the prediction accuracy and performance of the RF, MLR, and BPNN models; (2) to explore the spatial influences of the environmental, population, and economic factors on the PM2.5Cons in CC; and (3) to explore the spatiotemporal evolution patterns of the PM2.5Cons in CC.

2. Materials and Methods

2.1. Study Area

This study focuses on CC, which spans from the coastal areas in the east to inland regions in the west. The study area covers Shanxi, Henan, Anhui, Hubei, Jiangxi, and Hunan provinces. The highest and lowest elevations are located in the western and eastern parts of CC, respectively, and the region contains vast plains. Geographically, CC is strategically located, running through the east–west axis of China and connecting the northern and southern parts. It serves as a major transportation hub and a key intersection of economic and cultural exchange. Economically, CC plays a crucial role as a vital node for industrial relocation and resource allocation, contributing significantly to high-quality development. The administrative divisions of CC are shown in Figure 1.

2.2. Data Sources

Precipitation (PRE) and wind speed (WS) data of CC were supplied by the National Earth System Science Data Center. The POP, NDVI, DEM, and GDP data were provided by the Resource and Environment Science Data Center. The TEM and AOD data were obtained from the National Aeronautics and Space Administration Giovanni platform [31]. The 2010–2021 average PM2.5 concentration (PM2.5Con) raster data were sourced from the Atmospheric Composition Analysis Group, Washington University in St. Louis [32]. All collected data are reported in Table 1; the horizontal resolution of the data was 0.1° × 0.1°. With the rapid advancement of industrialization and urbanization, PM2.5 emissions in CC provinces, particularly Hubei, Henan, and Anhui, have increased significantly, exceeding those in the less-developed western regions and coastal areas with stricter environmental regulations. A comprehensive analysis of the spatiotemporal evolution of PM2.5Cons in CC is crucial for identifying local environmental challenges and offering valuable insights to facilitate air quality improvement efforts nationwide.

2.3. Methodology

2.3.1. Data Preprocessing

The raster data were first converted to the China Lambert projection and resampled to a resolution of 10,000 m using ArcGIS 10.7 [26]. Afterward, the clip tool was used to generate the spatial distribution maps of the explanatory variables. The PM2.5Cons raster data were then converted to points before extracting the corresponding values using the points tool to spatially align the explanatory variable raster data with the average annual PM2.5Con data. These data were employed to implement the MLR, BPNN, and RF models. The prediction accuracies of the models on a test dataset were further compared. The RFA was employed to assess the relative importance of each influencing factor in predicting the spatial distribution of the PM2.5Cons in CC over the 2010–2021 period. The detailed data processing steps are illustrated in Figure 2.

2.3.2. Multiple Linear Regression

MLR analysis is a statistical method used to quantitatively describe the linear relationship between a dependent variable and multiple independent variables. This statistical analysis can uncover the specific impacts of each independent variable on the dependent variable and their linear dependencies, according to Equation (1) [33]:
$$g_a = t_0 + t_1 x_1 + t_2 x_2 + \cdots + t_n x_n, \tag{1}$$
where $g_a$ represents the average annual PM2.5Con; $x_n$ denotes the explanatory variables; and $t_n$ represents the model regression coefficients.
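As an illustration of Equation (1), the following minimal sketch fits the coefficients $t_0, \ldots, t_n$ by ordinary least squares using only the Python standard library. The function names (`fit_mlr`, `predict_mlr`, `solve_linear_system`) and the two-variable toy data are hypothetical and are not taken from the study, which does not state how its MLR model was implemented.

```python
# Illustrative sketch only: least-squares fit of g_a = t_0 + t_1*x_1 + ... + t_n*x_n.
# All names and the toy data are hypothetical, not the study's implementation.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_mlr(rows, targets):
    """Return [t_0, t_1, ..., t_n] that minimise the squared error."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept t_0
    k = len(design[0])
    # Normal equations: (X^T X) t = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict_mlr(coeffs, row):
    return coeffs[0] + sum(t * x for t, x in zip(coeffs[1:], row))

# Toy data generated from g = 50 + 40*aod - 0.01*pre (made-up coefficients).
rows = [(0.2, 600.0), (0.5, 900.0), (0.8, 1200.0), (0.4, 1500.0), (0.9, 700.0)]
targets = [50.0 + 40.0 * aod - 0.01 * pre for aod, pre in rows]
coeffs = fit_mlr(rows, targets)
```

Because the toy targets are an exact linear function of the inputs, the fitted coefficients recover the generating values up to rounding error; real PM2.5 data would leave a residual.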

2.3.3. Back Propagation Neural Network

The implementation of the BPNN algorithm involves several key steps. First, the data are normalized to ensure a consistent data scale for analysis. Next, a neural network model is constructed and trained to optimize the network parameters. The trained model is then used for prediction purposes. Finally, the predicted values are denormalized to restore the original data scale. The learning process consists of two steps, namely forward and back propagation. The input data in the forward propagation step are processed through network layers to generate predicted values before computing the prediction errors through the back propagation step. The network weights are subsequently adjusted to continuously improve the prediction accuracy of the algorithm.
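The steps described above (normalize, train by forward and back propagation, predict, denormalize) can be sketched as follows. This is a deliberately tiny network with one input, one tanh hidden layer, and a linear output; the class name `TinyBPNN`, the layer size, the learning rate, and the toy data are all assumptions made for illustration, since the study does not report its network architecture.

```python
import math
import random

def min_max_fit(values):
    return min(values), max(values)

def normalize(v, lo, hi):
    return (v - lo) / (hi - lo)

def denormalize(v, lo, hi):
    return v * (hi - lo) + lo

class TinyBPNN:
    """Hypothetical minimal network: one input, tanh hidden layer, linear output."""

    def __init__(self, hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Forward propagation: input -> hidden activations -> output.
        h = [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]
        return h, self.b2 + sum(w * hj for w, hj in zip(self.w2, h))

    def train_step(self, x, y, lr):
        h, out = self.forward(x)
        err = out - y  # gradient of 0.5 * (out - y)^2 with respect to out
        for j, hj in enumerate(h):
            # Back propagation of the error through the tanh unit.
            grad_hidden = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.w1[j] -= lr * grad_hidden * x
            self.b1[j] -= lr * grad_hidden
        self.b2 -= lr * err

    def mse(self, xs, ys):
        return sum((self.forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data (made up): y = 2x + 1.
raw_x = [float(i) for i in range(10)]
raw_y = [2.0 * x + 1.0 for x in raw_x]
x_lo, x_hi = min_max_fit(raw_x)
y_lo, y_hi = min_max_fit(raw_y)
xs = [normalize(x, x_lo, x_hi) for x in raw_x]
ys = [normalize(y, y_lo, y_hi) for y in raw_y]

net = TinyBPNN(hidden=3)
initial_loss = net.mse(xs, ys)
for _ in range(2000):
    for x, y in zip(xs, ys):
        net.train_step(x, y, lr=0.1)
final_loss = net.mse(xs, ys)

# Predict in normalized space, then restore the original scale.
prediction = denormalize(net.forward(normalize(4.5, x_lo, x_hi))[1], y_lo, y_hi)
```

The same normalization bounds must be reused at prediction time; fitting new bounds on the prediction inputs would silently shift the scale.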

2.3.4. Random Forest

The RFA is an ensemble learning algorithm introduced by Breiman et al. [34,35]. This algorithm can enhance the prediction performance by combining multiple decision trees. The basic unit of this method is the decision tree. Specifically, numerous decision trees are constructed for performing classification or regression tasks. During the training steps, multiple decision trees are generated by randomly selecting features and samples. The predictions from these trees are then averaged to produce the final output. This ensemble method can not only improve the prediction accuracy but can also effectively reduce the risk of overfitting, thereby enhancing the stability and generalization ability of the algorithm.
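The mechanism described above (bootstrap samples, random feature subsets at each split, and averaging of tree outputs) can be sketched in a few lines of plain Python. This is a simplified teaching example under stated assumptions, not the implementation used in the study: the function names, the shallow tree depth, the 25 trees, and the synthetic three-feature data are all hypothetical.

```python
import random

def sum_sq(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, rng, n_features, depth=0, max_depth=3, min_size=2):
    """Grow one regression tree; each split considers a random subset of features."""
    mean = sum(targets) / len(targets)
    if depth >= max_depth or len(rows) < 2 * min_size:
        return mean  # leaf node: predict the mean target
    best = None
    for f in rng.sample(range(len(rows[0])), n_features):
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [t for r, t in zip(rows, targets) if r[f] < threshold]
            right = [t for r, t in zip(rows, targets) if r[f] >= threshold]
            if len(left) < min_size or len(right) < min_size:
                continue
            score = sum_sq(left) + sum_sq(right)  # lower means purer children
            if best is None or score < best[0]:
                best = (score, f, threshold)
    if best is None:
        return mean
    _, f, threshold = best
    lo = [(r, t) for r, t in zip(rows, targets) if r[f] < threshold]
    hi = [(r, t) for r, t in zip(rows, targets) if r[f] >= threshold]
    args = (rng, n_features, depth + 1, max_depth, min_size)
    return (f, threshold,
            build_tree([r for r, _ in lo], [t for _, t in lo], *args),
            build_tree([r for r, _ in hi], [t for _, t in hi], *args))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] < threshold else right
    return node

def fit_forest(rows, targets, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(rows) samples with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], rng, n_features))
    return forest

def predict_forest(forest, row):
    # Regression forests average the predictions of all trees.
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)

# Synthetic data: only feature 0 drives the target; features 1 and 2 are noise.
data_rng = random.Random(1)
rows = [(i / 40.0, data_rng.random(), data_rng.random()) for i in range(40)]
targets = [10.0 if r[0] < 0.5 else 30.0 for r in rows]
forest = fit_forest(rows, targets)
low = predict_forest(forest, (0.1, 0.5, 0.5))
high = predict_forest(forest, (0.9, 0.5, 0.5))
```

Individual trees that happen to split on a noise feature can predict poorly, but averaging across the ensemble pulls `low` toward 10 and `high` toward 30, which is the variance reduction the paragraph refers to.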

2.3.5. Model Construction

The model parameters were evaluated using the PM2.5Cons as the dependent variable and the number of trees as the independent variable. The prediction error of the RFA decreased with increasing numbers of trees before stabilizing at approximately 500 trees. The relationship between the number of trees and the computed error is shown in Figure 3.
Based on the specified model (tree = 500), 10% of the data was randomly selected as the test dataset, and the remaining 90% was used as the training dataset to evaluate the effectiveness of the RFA in predicting the PM2.5Cons. The RFA demonstrated a high prediction performance in the training and test phases. The specific performance metrics are shown in Figure 4 and Figure 5.
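A random 90%/10% partition like the one described above could be produced as follows. The helper name `split_train_test`, the fixed seed, and the sample count of 1000 are assumptions for illustration; the study does not report how the random selection was carried out.

```python
import random

def split_train_test(n_samples, test_fraction=0.10, seed=0):
    """Return (train_indices, test_indices) from a seeded random shuffle."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

# Hypothetical example with 1000 grid-cell samples.
train_idx, test_idx = split_train_test(1000)
```

Fixing the seed makes the split reproducible, so that the MLR, BPNN, and RF models can be compared on exactly the same test samples.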
We further examined the prediction accuracy of the RFA by analyzing the generated residuals, which are defined as the differences between the observed and predicted values. The distribution of residuals was illustrated using residual scatter plots and Q–Q plots, as shown in Figure 6 and Figure 7. The residuals were evenly distributed around zero, indicating no discernible trends, which suggested a good model fit. The points on the Q–Q plot follow a straight line over most of the range but curve away from it toward the ends, which signifies a distribution that is heavier tailed than the normal distribution. However, the distribution of the residuals is nearly symmetric, so this deviation has a negligible effect on results that depend on normality.
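For readers who wish to reproduce this kind of residual check, the coordinates of a normal Q–Q plot can be computed with the standard library alone. The sketch below is illustrative: the function name `qq_points`, the plotting positions $(i + 0.5)/n$, and the six observed and predicted values are assumptions, not data from the study.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Pair each sorted residual with the matching quantile of a fitted normal."""
    n = len(residuals)
    ordered = sorted(residuals)
    dist = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n avoid the undefined 0 and 1 quantiles.
    theoretical = [dist.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

# Made-up observed and predicted concentrations.
observed = [42.0, 55.5, 38.2, 61.0, 47.3, 50.1]
predicted = [40.5, 57.0, 39.0, 58.2, 47.9, 51.0]
residuals = [o - p for o, p in zip(observed, predicted)]
points = qq_points(residuals)
```

Points lying on the line y = x indicate normally distributed residuals; sample quantiles that fall below the line at the low end and above it at the high end are the signature of heavier tails.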

2.3.6. Evaluation Metrics

(1) Model Accuracy Metrics
The accuracies and reliability of the prediction results of the models were assessed using the root mean square error (RMSE) and $R^2$ [26]. The RMSE measures the average deviation between predicted and actual values. The $R^2$ values can be used to evaluate the ability of models to explain the variance in data, reflecting the quality of model fitting. These statistical metrics can, therefore, provide comprehensive assessment results of model performance. The calculation equations for these statistical metrics are presented in Equations (2) and (3) [26]:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( X_{\mathrm{pre},i} - X_{\mathrm{obs},i} \right)^2}{\sum_{i=1}^{n} \left( X_{\mathrm{obs},i} - \frac{1}{n} \sum_{i=1}^{n} X_{\mathrm{obs},i} \right)^2}, \tag{2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_{\mathrm{pre},i} - X_{\mathrm{obs},i} \right)^2}, \tag{3}$$
The $R^2$ has been commonly used to evaluate the goodness of model regression fits, ranging from 0 to 1. An $R^2$ close to 1 indicates a good match between predicted and observed data, whereas the RMSE measures the standard deviation between predicted and observed values. The $X_{\mathrm{obs},i}$ and $X_{\mathrm{pre},i}$ denote the actual and predicted PM2.5Cons, respectively.
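Equations (2) and (3) translate directly into code. The sketch below is a plain-Python rendering; the function names and the four-point example are hypothetical and serve only to show the arithmetic.

```python
import math

def r_squared(observed, predicted):
    """Equation (2): one minus residual sum of squares over total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Equation (3): square root of the mean squared prediction error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))

# Worked example with made-up values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2_value = r_squared(observed, predicted)   # residual SS = 0.10, total SS = 5.0
rmse_value = rmse(observed, predicted)      # sqrt(0.10 / 4)
```

In this example the residual sum of squares is 0.10 and the total sum of squares is 5.0, so $R^2 = 1 - 0.10/5.0 = 0.98$ and $\mathrm{RMSE} = \sqrt{0.025} \approx 0.158$.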
(2) Key Factor Analysis
The percentage increase in Mean Squared Error (%IncMSE) is a metric used in the RFA to assess the relative contributions of explanatory variables to the prediction accuracy. The increase in node purity (IncNodePurity) can be used to assess the improvement in the homogeneity of a target variable after a decision tree node splits, determining the importance of features within decision trees. These metrics are computed using Equations (4) and (5) [26]:
$$\%\mathrm{IncMSE} = \left( MSE_{\mathrm{perm}} - MSE_{\mathrm{orig}} \right) / MSE_{\mathrm{orig}} \times 100, \tag{4}$$
$$\mathrm{IncNodePurity} = \left( P_{\mathrm{left}} + P_{\mathrm{right}} \right) / P_{\mathrm{before}} - 1, \tag{5}$$
where $MSE_{\mathrm{perm}}$ denotes the mean squared error ($MSE$) obtained by retraining the model after randomly shuffling specific feature values; $MSE_{\mathrm{orig}}$ denotes the $MSE$ of the original model; $P_{\mathrm{left}}$ and $P_{\mathrm{right}}$ represent the purity of the two child nodes after the splitting step; and $P_{\mathrm{before}}$ denotes the purity before the data split.
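Equation (4) can be illustrated with a short permutation experiment. Note one simplification, stated here as an assumption: the sketch re-scores a fixed, already fitted model on data in which one feature column has been shuffled, rather than retraining the model after shuffling. The stand-in model `toy_model`, the helper names, and the data are hypothetical.

```python
import random

def mse(model, rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def percent_inc_mse(model, rows, targets, feature, seed=0):
    """Equation (4): relative MSE increase after shuffling one feature column."""
    mse_orig = mse(model, rows, targets)
    column = [r[feature] for r in rows]
    random.Random(seed).shuffle(column)
    # Rebuild each row with the shuffled value in place of the original one.
    permuted = [r[:feature] + (v,) + r[feature + 1:] for r, v in zip(rows, column)]
    mse_perm = mse(model, permuted, targets)
    return (mse_perm - mse_orig) / mse_orig * 100.0

def toy_model(row):
    # Hypothetical fitted model that uses only feature 0 and ignores feature 1.
    return 3.0 * row[0]

rows = [(float(i), float((7 * i) % 5)) for i in range(20)]
# Targets follow 3 * x0 with a small alternating offset so that MSE_orig > 0.
targets = [3.0 * r[0] + (0.5 if i % 2 == 0 else -0.5) for i, r in enumerate(rows)]

inc_x0 = percent_inc_mse(toy_model, rows, targets, feature=0)
inc_x1 = percent_inc_mse(toy_model, rows, targets, feature=1)
```

Shuffling the informative feature inflates the error, so `inc_x0` is large and positive, whereas shuffling the ignored feature leaves the predictions unchanged and `inc_x1` is zero. A high %IncMSE therefore marks a variable the model relies on.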

3. Results and Analysis

3.1. Spatial Analysis of the Explanatory Variables

Figure 8 shows the spatial distributions of the explanatory variables in CC in 2019. We observed higher AOD values in Henan, Hubei, and Anhui provinces than in the central and southwestern parts, indicating more severe air pollution in these areas. Shanxi Province has the highest elevation, with steep terrain, whereas Henan, Hubei, and Anhui provinces are relatively flat. Henan and Shanxi provinces had the highest and lowest GDPs, indicating good and poor economic growth, respectively. Hubei, Hunan, and Jiangxi provinces showed the highest NDVI values, indicating good vegetation cover when compared with Shanxi, Henan, and Anhui provinces. Shanxi exhibited the highest WSs, likely due to its topography when compared with those observed in Henan, Hubei, and Hunan. Jiangxi Province exhibited the highest and most evenly distributed PRE amounts, whereas Shanxi had the lowest PRE amount, mainly concentrated in the southern part. There was an increasing trend in the PRE amount from the northwest to the southeast. Jiangxi and Shanxi had the highest and lowest TEM values, respectively. We also observed an increase in the TEM values from the north to the south. The highest POP density was observed in Henan Province. In addition, Hubei and Anhui exhibited high POP densities with more even distributions. Shanxi exhibited the lowest POP density, reflecting a more sparsely populated area.

3.2. Analysis of the Factors Influencing the PM2.5 Concentrations

According to the comparative analysis results, the RFA outperformed the MLR and BPNN models, as detailed in Table 2. Figure 9 shows the comparison between the observed and RF-based predicted PM2.5Cons. The vertical and horizontal axes represent the predicted and actual values, respectively. The red diagonal line indicates the ideal scenario, where the predicted values perfectly match the actual values. Most data points were plotted around the diagonal line, indicating a high degree of alignment between the predicted and observed values. Indeed, the $R^2$ value of the RFA was 0.9407, suggesting strong explanatory power. Moreover, the RMSE value was 2.5124, indicating a low prediction error. The scatter plot shows that the data points are evenly distributed without significant systematic bias, demonstrating good prediction performance across the entire value range. Overall, the RFA demonstrated high accuracy and stability in predicting the PM2.5Cons. To further enhance the interpretability of the RFA [8], feature importance ranking and partial dependence analysis were conducted to identify the key factors influencing the PM2.5Cons in CC.
The relative importance values of the explanatory variables obtained using the %IncMSE and IncNodePurity methods are shown in Figure 10. According to the results, the PRE exhibited the highest %IncMSE value of approximately 75%. The NDVI and TEM also significantly influenced the prediction performance of the RFA, showing %IncMSE values of approximately 55% and 50%, respectively. The AOD, POP, GDP, and the remaining factors demonstrated lower impacts on the prediction results. The AOD exhibited the greatest contribution to the node purity, showing an IncNodePurity value of over 200,000, followed by the PRE and POP, with IncNodePurity values of 175,000 and 150,000, respectively. In contrast, the GDP and NDVI showed the lowest contributions to the node purity of the RFA.
The correlation and partial correlation coefficients of the influencing factors are reported in Table 3. The AOD and PRE exhibited the most significant impact on the PM2.5Cons. Specifically, the AOD and PRE values showed strong positive and negative correlations with the PM2.5Cons in CC. Even after controlling for other variables, these two variables were the major influencing factors. In addition, although the NDVI and POP affected the PM2.5Cons, their relative importance decreased after considering the other variables. The relationship between the GDPs and PM2.5Cons shifted from a positive to a weak negative correlation after considering other factors. This finding indicates that the direct impact of the GDPs on the PM2.5Cons was explained by the other variables. On the other hand, the WS and TEM showed minimal influences on the PM2.5Cons [26]. In addition, their correlations with the PM2.5Cons became almost negligible after considering the other variables in the analysis.
The impacts of the explanatory variables on the distribution of the PM2.5Cons are shown in Figure 11. The AOD values were positively correlated with the PM2.5Cons. Specifically, the PM2.5Cons fluctuated and generally increased with increasing AOD values. However, the influence of the AOD on the PM2.5Cons weakened at AOD values greater than 0.8. The relationships of the TEM and GDP with the PM2.5Con distribution were less clear. The PM2.5Cons were relatively stable with increasing TEM values. The greatest influence of the TEM on the PM2.5Cons was observed over the 0–15 °C range. The DEM, NDVI, POP, PRE, and WS values showed negative correlations with the PM2.5Cons. Specifically, the PM2.5Cons showed a decreasing trend with increasing DEM values before stabilizing at approximately 2000 m. A sharp decrease in the PM2.5Con was observed at NDVI values over 0.6. In addition, the strongest influence of the POP density on the PM2.5Cons was observed at over 5000 people/km². In contrast, the influence of the PRE on the PM2.5Cons decreased at amounts greater than 500 mm, and the lowest influence of the PRE was observed at amounts greater than 1500 mm [26]. A similar influence pattern was demonstrated by WS. The PM2.5Cons showed a decreasing trend with increasing WS values above 1.0 m/s, reaching the lowest influence at values above 2 m/s.
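Response curves of this kind are usually obtained by partial dependence analysis: one variable is fixed at each value of a grid while all other variables keep their observed values, and the model predictions are averaged. The sketch below shows the computation under stated assumptions. The stand-in model `toy_model`, including its saturation at an AOD of 0.8 and its coefficients, is purely hypothetical and is not the fitted RF model from the study.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value  # overwrite only the feature of interest
            total += model(modified)
        curve.append(total / len(rows))
    return curve

def toy_model(row):
    # Hypothetical response: rises with AOD up to 0.8, then flattens;
    # decreases linearly with precipitation (mm).
    aod, pre = row
    return 20.0 + 30.0 * min(aod, 0.8) - 0.005 * pre

rows = [(0.3, 600.0), (0.5, 900.0), (0.7, 1200.0), (0.9, 1500.0)]
grid = [0.2, 0.4, 0.6, 0.8, 1.0]
curve = partial_dependence(toy_model, rows, feature=0, grid=grid)
```

With this toy model the curve rises from 20.75 at an AOD of 0.2 to 38.75 at 0.8 and stays flat beyond that point, which is how a weakening influence at high AOD values would appear in a partial dependence plot.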
Overall, the AOD and PRE demonstrated the greatest influences on the PM2.5Cons in CC. The AOD and PRE values were positively and negatively correlated with the PM2.5Cons, respectively. The NDVI and WS values also showed negative correlations with the PM2.5Cons. In contrast, the other variables (TEM, POP, DEM, and GDP) had minor effects on the PM2.5Cons.

3.3. Spatiotemporal Analysis of the PM2.5 Concentrations

3.3.1. Temporal Variations in the PM2.5 Concentrations

Figure 12 and Figure 13 show the distribution, trend, and spatial variations in the PM2.5Cons in CC over the 2010–2021 period. The highest PM2.5Cons were observed in 2011 before decreasing over the 2013–2021 period (Figure 13). The PM2.5Cons showed a continuous temporal decreasing trend. Hunan Province showed higher PM2.5Cons than those in Shanxi Province over the 2010–2015 period. After 2015, Hunan experienced a significant decline in the PM2.5Cons. The results showed slight temporal changes in the PM2.5Cons from 2010 to 2014. In contrast, decreasing trends in the PM2.5Cons were observed over the 2017–2021 period across all provinces, which might be due to reduced industrial pollution and increased environmental awareness. Nevertheless, Henan Province had the highest PM2.5Cons in the 2010–2021 period. It is worth noting that the PM2.5Cons in Henan had an M-shaped variation in the 2010–2015 period before exhibiting a decreasing trend over the 2015–2021 period. This finding further demonstrates the benefits of national policies and Henan’s industrial restructuring efforts.

3.3.2. Spatial Analysis of the PM2.5 Concentrations

The average PM2.5Cons in the central provinces are shown in Figure 14 and Table 4. The results showed strong regional variations in the PM2.5Cons (Figure 12 and Figure 14). South Shanxi and north Henan showed the highest PM2.5Cons in the 2010–2012 period, indicating severe air pollution. In contrast, Hunan, Anhui, and Jiangxi exhibited lower PM2.5Cons in the same period, indicating lighter pollution. The high-concentration areas in south Shanxi and central Henan decreased from 2013 to 2015, showing decreased air pollution levels in the study area. The low-concentration areas in Hubei, Hunan, Anhui, and Jiangxi expanded, demonstrating an improvement in air quality. The high-concentration areas in south Shanxi and central Henan also exhibited a decreasing trend over the 2016–2018 period, demonstrating the effectiveness of pollution control measures. In addition, the low-concentration areas in Hubei, Hunan, Anhui, and Jiangxi showed expansion trends, reflecting a great improvement in air quality. In 2019, the high-concentration areas in south Shanxi and north Henan continued to diminish, becoming concentrated in a few regions, while the low-concentration areas in Hubei, Hunan, Anhui, and Jiangxi showed continuous expansion trends. The high-concentration areas in south Shanxi and north Henan significantly decreased over the 2020–2021 period. On the other hand, the low-concentration areas in Hubei, Hunan, Anhui, and Jiangxi expanded in this period, demonstrating the effectiveness of pollution control measures. Henan and Jiangxi exhibited the highest and lowest average PM2.5Cons in the 2010–2021 period, respectively. The decreasing trends of the high-concentration areas in south Shanxi and central Henan demonstrated the effectiveness of air pollution control measures. The expansion of the low-concentration areas in Hubei, Hunan, Anhui, and Jiangxi reflected notable improvements in air quality in these areas. 
This suggests that environmental policies and measures were effective in reducing the PM2.5Cons in the study period, substantially improving air quality across the provinces.

4. Conclusions and Discussion

4.1. Conclusions

(1) We systematically analyzed the spatiotemporal changes in the PM2.5Cons in CC from 2010 to 2021. The results showed a gradually decreasing trend of the high-concentration areas in south Shanxi and central Henan, demonstrating an improvement in air quality. On the other hand, the low-concentration areas in Hubei, Hunan, Anhui, and Jiangxi expanded, further demonstrating an improvement in air quality in CC. Specifically, the PM2.5Cons exhibited a decreasing trend over the 2014–2021 period, demonstrating the effectiveness of air pollution control measures.
(2) The PRE and AOD were the main factors influencing the PM2.5Cons. The PRE and AOD values showed negative and positive correlations with the PM2.5Cons, respectively. In contrast, the NDVI and WS demonstrated lower influences on the PM2.5Cons in CC.
(3) The PM2.5Cons showed a spatial decreasing trend from the northern to the southern part of the study area. Henan and Jiangxi showed the highest and lowest PM2.5Cons, respectively, with a maximum difference of 48.06 μg/m³.
(4) The RFA demonstrated high accuracy and stability in assessing the relative importance of the influencing factors. The results of this study demonstrated the effectiveness of the RFA in predicting the PM2.5Cons in CC. This study provides scientific support for air quality management and policy making in CC, contributing to the improvement of regional air quality.

4.2. Discussion

(1) Due to the limitations in the precision of data quantification, our current understanding of influencing factors remains incomplete, necessitating more in-depth exploration in future research.
(2) The lack of uniformity in data sources introduces errors during resampling and model input processes, highlighting the urgent need for higher precision data sources and advanced processing algorithms.
(3) Recently, China has launched a new generation of hyperspectral satellites capable of capturing polarimetric vector data, which provides enriched aerosol information. In the future, leveraging the combined strengths of polarimetric vector data and traditional scalar data could enhance the quantitative inversion of PM2.5Cons in CC.
(4) It is recommended that the government persistently advance both routine and temporary emission reduction measures to decrease PM2.5 emissions and improve air quality, particularly in Henan Province, which experiences the highest PM2.5 levels. These efforts should focus on enhancing air quality monitoring and data transparency, controlling pollution sources, adjusting industrial structures, raising public health awareness, promoting green lifestyles, and strengthening policy and legal frameworks. Such actions will help establish a strong environmental ethic and support sustained improvements in air quality.
(5) Surface cover and cloud interference have created gaps in aerosol data for certain regions, affecting aerosol inversion accuracy. Future research should prioritize strategies to address missing satellite aerosol data and enhance model capabilities to learn from concentrated abrupt changes and extreme values in the dataset. By combining high-resolution remote sensing imagery with more advanced interpolation algorithms, we can continuously optimize the RFA to improve the accuracy of PM2.5 retrieval, further enhancing air quality monitoring.

Author Contributions

Conceptualization, G.F. and Y.Z.; methodology, G.F.; software, Y.Z. and J.Z.; validation, G.F. and Y.Z.; formal analysis, G.F.; resources, G.F.; data curation, G.F. and Y.Z.; writing—original draft preparation, G.F. and Y.Z.; writing—review and editing, G.F. and J.Z.; visualization, J.Z. and Y.Z.; supervision, G.F.; project administration, G.F.; funding acquisition, G.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the School-enterprise Cooperation Practice Education Base of Anhui Province (2023xqhz065), 3S Technology Application Research Center in Northern Anhui (2021XJPT12), Suzhou University Quality Engineering Surplus Fund Project (szxy2023jyjf61), Teaching Team of Surveying and Mapping Engineering in Anhui Province (2020jxtd285), Professional Leader Fund of Suzhou University (2019XJZY06), Suzhou University Traditional Program Transformation and Upgrade Project (szxy2022ctzy01), and Anhui Province Higher Education Scientific Research Project (Key Natural Science Project: 2024AH051818).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors confirm that all the data are available in this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Peng, H.J.; Zhou, Y.; Hu, X.F.; Zhang, L.; Peng, Y.Z.; Cai, X.X. A PM2.5 prediction model based on deep learning and random forest. Natl. Remote Sens. Bull. 2023, 2, 430–440.
  2. Kang, X.L.; Zhang, W.H.; Liu, Y.P.; Gu, X.F.; Yu, T.; Zhang, L.L.; Xu, H.K. PM2.5 remote sensing retrieval and change analysis in Beijing–Tianjin–Hebei region based on random forest model. Remote Sens. Technol. Appl. 2022, 2, 424–435.
  3. Zheng, J. Study on Aerosol Optical Thickness and Particle Concentration in Xi’an Based on MODIS Inversion. Master’s Thesis, Chang’an University, Xi’an, China, 2019.
  4. Liu, Z. Aerosol Optical Properties Study Based on Ground Observation. Ph.D. Thesis, University of Science and Technology of China, Hefei, China, 2020.
  5. Liu, J.; Li, P.F. Research of characteristics of temporal and spatial variation and the influence factors of PM2.5 in Hanzhong city. J. Shaanxi Univ. Technol. (Nat. Sci. Ed.) 2023, 3, 88–92.
  6. Xie, Y.; Dai, H.C.; Zhang, Y.X.; Wu, Y.Z.; Tatsuya, H.; Toshihiko, M. Comparison of health and economic impacts of PM2.5 and ozone pollution in China. Environ. Int. 2019, 130, 104881.
  7. Al-Kindi, S.G.; Brook, R.D.; Biswal, S.; Rajagopalan, S. Environmental determinants of cardiovascular disease: Lessons learned from air pollution. Nat. Rev. Cardiol. 2020, 10, 656–672.
  8. Yang, D.Y.; Ye, C.; Wang, X.M.; Lu, D.B.; Xu, J.H.; Yang, H.Q. Global distribution and evolvement of urbanization and PM2.5 (1998–2015). Atmos. Environ. 2018, 182, 171–178.
  9. Matus, K.; Nam, K.M.; Selin, N.E.; Lamsal, L.N.; Reilly, J.M.; Paltsev, S. Health damages from air pollution in China. Glob. Environ. Change 2012, 1, 55–66.
  10. Chen, L.; Zhu, J.; Liao, H.; Yang, Y.; Yue, X. Meteorological influences on PM2.5 and O3 trends and associated health burden since China’s clean air actions. Sci. Total Environ. 2020, 744, 140837.
  11. Chen, Y.; Chen, R.J.; Chen, Y.; Dong, X.L.; Zhu, J.F.; Liu, C.; Donkelaar, A.V.; Martin, R.; Li, H.C.; Kan, H.; et al. The prospective effects of long-term exposure to ambient PM2.5 and constituents on mortality in rural East China. Chemosphere 2021, 280, 130740.
  12. Daryanoosh, S.M.; Goudarzi, G.; Mohammadi, M.R.; Armin, H.; Khaniabadi, Y.O.; Sadeghi, S. Exposure to particulate matter and its health impacts an AirQ approach. Arch. Hyg. Sci. 2017, 1, 88–95.
  13. Ye, W.F.; Ma, Z.Y.; Ha, X.Z. Spatial-temporal patterns of PM2.5 concentrations for 338 Chinese cities. Sci. Total Environ. 2018, 631, 524–533.
  14. Deng, D.; Lu, P.L.; Duan, L.F.; Li, Z.L.; Wang, F.W.; Zhai, C.Z. Spatio-temporal characteristics and influencing factors of PM2.5 in Cheng–Yu district. Environ. Impact Assess. 2021, 4, 84–90.
  15. Fang, D.; Wang, Q.G.; Li, H.M.; Yu, Y.Y.; Lu, Y.; Qian, X. Mortality effects assessment of ambient PM2.5 pollution in the 74 leading cities of China. Sci. Total Environ. 2016, 569–570, 1545–1552.
  16. Wu, S.Q.; Yao, J.Q.; Yang, R.; Zhang, S.W.; Zhao, W.J. Spatio-temporal variations in PM2.5 and its influencing factors in the Yangtze River delta urban agglomeration. Environ. Sci. 2023, 44, 5325–5334.
  17. Wang, J.K.; Zhang, H.D.; Gui, H.L.; Rao, X.Q.; Zhang, B.H. Relationship between atmospheric visibility and PM2.5 concentrations and distributions. Environ. Sci. 2019, 7, 2985–2993.
  18. Chen, J.; Zhao, C.S. A review of influence factors and calculation of atmospheric low visibility. Adv. Meteorol. Sci. Technol. 2014, 4, 44–51.
  19. Khanna, I.; Khare, M.; Gargava, P.; Khan, A.A. Effect of PM2.5 chemical constituents on atmospheric visibility impairment. J. Air Waste Manag. Assoc. 2018, 5, 430–437.
  20. Li, Y.H.; Wang, Y.; Yi, Q.C.; Chen, L.F. The study on air quality change of Nanchang city from 2004 to 2015 years based on satellite remote sensing MODIS data. J. Jiangxi Norm. Univ. (Nat. Sci.) 2019, 2, 214–220.
  21. Lu, J.M.; Zeng, S.P.; Zeng, J.; Wang, S.; Song, Y.Z. High resolution simulation of temporal and spatial variation of PM2.5 concentration based on random forest: A case study of central plains urban agglomeration core area. China Environ. Sci. 2023, 7, 3299–3311.
  22. Agibayeva, A.; Khalikhan, R.; Guney, M.; Karaca, F.; Torezhan, A.; Avcu, E. An air quality modeling and disability-adjusted life years (DALY) risk assessment case study: Comparing statistical and machine learning approaches for PM2.5 forecasting. Sustainability 2022, 14, 16641.
  23. Hu, X.F.; Belle, J.H.; Meng, X.; Wildani, A.; Waller, L.A.; Strickland, M.J.; Liu, Y. Estimating PM2.5 concentrations in the conterminous United States using the random forest approach. Environ. Sci. Technol. 2017, 12, 6936–6944.
  24. Du, X.; Feng, J.Y.; Lv, S.Q.; Shi, W. PM2.5 concentration prediction model based on random forest regression analysis. Telecommun. Sci. 2017, 7, 66–75.
  25. Shi, T.T.; Wang, S.; Yang, L.J.; Chen, W.Q.; Wang, Y.; Gao, J.J. The spatial-temporal change of PM2.5 concentration and its relationship with landscape pattern in East China. Remote Sens. Technol. Appl. 2024, 2, 435–446.
  26. Xia, X.S.; Chen, J.J.; Wang, J.J.; Cheng, X.F. PM2.5 concentration influencing factors in China based on the random forest model. Environ. Sci. 2020, 5, 2057–2065.
  27. Jin, Y.P.; Peng, J.; Ling, M.; Zhang, B. Temporal and spatial variation characteristics of PM2.5 mass concentration and its influence factors in Anhui Province. J. Heilongjiang Inst. Technol. 2023, 1, 14–20.
  28. Lei, T.M.T.; Ng, S.C.W.; Siu, S.W.I. Application of ANN, XGBoost, and other ML methods to forecast air quality in Macau. Sustainability 2023, 15, 5341.
  29. Su, Z.; Lin, L.; Xu, Z.; Chen, Y.; Yang, L.; Hu, H.; Lin, Z.; Wei, S.; Luo, S. Modeling the effects of drivers on PM2.5 in the Yangtze River delta with geographically weighted random forest. Remote Sens. 2023, 15, 3826.
  30. Xu, Z.; Liu, B.; Wang, W.; Zhang, Z.; Qiu, W. Assessing the impact of straw burning on PM2.5 using explainable machine learning: A case study in Heilongjiang Province, China. Sustainability 2024, 16, 7315.
  31. National Aeronautics and Space Administration Giovanni (NASA Giovanni). Available online: https://giovanni.gsfc.nasa.gov (accessed on 4 August 2023).
  32. Atmospheric Composition Analysis Group. Washington University in St. Louis (WUSTL ACAG). Available online: https://sites.wustl.edu/acag/datasets/surface-pm2-5 (accessed on 14 July 2023).
  33. Wang, C.; Kan, A.K.; Zeng, Y.L.; Li, G.Q.; Wang, M.; Ci, R. Population distribution pattern and influencing factors in Tibet based on random forest model. Acta Geogr. Sin. 2019, 4, 664–680.
  34. Wang, P.; Zhao, X.Y.; Song, K. Prediction of PM2.5 concentration in Yangtze River delta based on random forest algorithm. Environ. Monit. China 2021, 5, 21–31.
  35. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Figure 1. Geographical location of Central China.
Figure 2. Flowchart.
Figure 3. Relationship between tree and error.
Figure 4. Training set accuracy.
Figure 5. Test set accuracy.
Figure 6. Residual plot.
Figure 7. Q–Q plot of residuals.
Figure 8. Spatial distribution of explanatory variables.
Figure 9. Model accuracy.
Figure 10. Ranking of important factors.
Figure 11. Impact of influencing factors on PM2.5Con (Unit: μg/m³).
Figure 12. PM2.5 distribution from 2010 to 2021.
Figure 13. PM2.5Con trends by province.
Figure 14. Average PM2.5Con.
Table 1. Data table of explanatory variables for 2019.

| Explanatory Variables | Data Name | Spatial Resolution |
|---|---|---|
| AOD | GIOVANNI-g4.timeAvgMap.MOD08_D3_6_1_Aerosol_Optical_Depth_Land_Ocean_Mean.20190101-20191231.72E_17N_146E_57N | |
| DEM | dem_1km | 1 km |
| WS | wnd_2019_01-wnd_2019_12 | 1 km |
| GDP | gdp2019 | 1 km |
| NDVI | ndvi201901-ndvi201912 | 1 km |
| PRE | pre_2019.nc | 1 km |
| POP | pop2019 | 1 km |
| TEM | TEM | 0.5° × 0.625° |
Table 2. Accuracy evaluation.

| Method | R² | RMSE |
|---|---|---|
| MLR | 0.7992 | 4.4564 |
| BPNN | 0.8757 | 3.5109 |
| RF | 0.9407 | 2.5168 |
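
The two metrics reported in Table 2 can be computed along the lines of the following minimal Python sketch. It assumes the common definitions (RMSE as the root of the mean squared residual, R² as one minus the ratio of the residual to the total sum of squares); this section does not state which R² variant the study used. The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical PM2.5 values in ug/m3, for illustration only.
observed = [40.0, 50.0, 60.0, 70.0]
predicted = [42.0, 49.0, 58.0, 71.0]

print("RMSE:", round(rmse(observed, predicted), 4))
print("R2:", round(r_squared(observed, predicted), 4))
```

A lower RMSE and a higher R² on the same held-out samples indicate a better fit, which is the sense in which Table 2 ranks RF above BPNN and MLR.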
Table 3. Correlation and partial correlation coefficients of influencing factors.

| Influencing Factors | AOD | DEM | WS | GDP | NDVI | PRE | POP | TEM |
|---|---|---|---|---|---|---|---|---|
| Correlation coefficients | 0.6961 | −0.4100 | 0.1513 | 0.2074 | −0.4304 | −0.5665 | 0.3503 | 0.0927 |
| Partial correlation coefficients | 0.4088 | −0.2211 | 0.0345 | −0.1383 | −0.1984 | −0.5797 | 0.1957 | 0.1143 |
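
Table 3 shows that a simple correlation and a partial correlation can differ, even in sign (GDP, for example). The sketch below shows one standard way to obtain both: Pearson correlation for the first row, and the inverse of the correlation matrix for the second, so that each factor is evaluated while the remaining factors are held fixed. This is an assumption about the general technique, not a description of the authors' software; all function names and sample values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(target, factors):
    """Partial correlation of each factor with target, given all other factors."""
    columns = [target] + list(factors)
    corr = [[pearson(a, b) for b in columns] for a in columns]
    prec = invert(corr)
    return [-prec[0][j] / math.sqrt(prec[0][0] * prec[j][j])
            for j in range(1, len(columns))]


# Hypothetical samples (PM2.5, AOD, precipitation), for illustration only.
pm = [50.0, 45.0, 60.0, 38.0, 55.0, 42.0]
aod = [0.60, 0.50, 0.80, 0.40, 0.65, 0.50]
pre = [900.0, 1100.0, 800.0, 1400.0, 950.0, 1200.0]

print("simple:", [round(pearson(pm, f), 4) for f in (aod, pre)])
print("partial:", [round(v, 4) for v in partial_correlations(pm, [aod, pre])])
```

With only two factors the result reduces to the familiar first-order partial correlation formula; with the eight factors of Table 3 the same function would control for seven variables at once.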
Table 4. Average annual PM2.5Cons by province and year (Unit: μg/m³).

| Province | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anhui | 52.67 | 53.30 | 48.74 | 54.89 | 56.55 | 49.87 | 45.64 | 46.86 | 40.72 | 39.83 | 34.94 | 32.85 |
| Henan | 64.29 | 71.26 | 66.85 | 72.01 | 61.50 | 63.67 | 57.18 | 53.98 | 49.56 | 47.76 | 43.13 | 38.36 |
| Hubei | 53.02 | 59.11 | 53.56 | 57.45 | 51.78 | 48.95 | 42.41 | 40.23 | 36.10 | 36.43 | 30.63 | 29.98 |
| Hunan | 46.97 | 54.02 | 53.74 | 50.47 | 51.66 | 44.36 | 39.64 | 38.06 | 32.75 | 33.36 | 28.61 | 28.67 |
| Jiangxi | 38.95 | 43.97 | 43.85 | 41.67 | 42.71 | 35.59 | 34.45 | 36.34 | 29.17 | 28.26 | 27.48 | 23.95 |
| Shanxi | 48.70 | 54.75 | 50.37 | 49.15 | 46.02 | 43.56 | 43.60 | 44.32 | 39.17 | 36.63 | 34.16 | 33.04 |
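
The decline over the 2014–2021 period can be quantified directly from the Table 4 values. The following Python sketch fits an ordinary least-squares line per province and reports the relative change between the first and last year of the window. The choice of a linear fit and the function names are assumptions made for illustration; the study itself may have summarized the trend differently.

```python
YEARS = list(range(2010, 2022))

# Average annual PM2.5 concentrations (ug/m3), copied from Table 4.
PM25 = {
    "Anhui": [52.67, 53.30, 48.74, 54.89, 56.55, 49.87, 45.64, 46.86, 40.72, 39.83, 34.94, 32.85],
    "Henan": [64.29, 71.26, 66.85, 72.01, 61.50, 63.67, 57.18, 53.98, 49.56, 47.76, 43.13, 38.36],
    "Hubei": [53.02, 59.11, 53.56, 57.45, 51.78, 48.95, 42.41, 40.23, 36.10, 36.43, 30.63, 29.98],
    "Hunan": [46.97, 54.02, 53.74, 50.47, 51.66, 44.36, 39.64, 38.06, 32.75, 33.36, 28.61, 28.67],
    "Jiangxi": [38.95, 43.97, 43.85, 41.67, 42.71, 35.59, 34.45, 36.34, 29.17, 28.26, 27.48, 23.95],
    "Shanxi": [48.70, 54.75, 50.37, 49.15, 46.02, 43.56, 43.60, 44.32, 39.17, 36.63, 34.16, 33.04],
}


def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def summarize(start_year=2014):
    """Return {province: (slope per year, percent change)} from start_year to 2021."""
    i = YEARS.index(start_year)
    summary = {}
    for province, values in PM25.items():
        slope = linear_slope(YEARS[i:], values[i:])
        change = (values[-1] - values[i]) / values[i] * 100.0
        summary[province] = (slope, change)
    return summary


for province, (slope, change) in summarize().items():
    print(f"{province}: {slope:.2f} ug/m3 per year, {change:.1f}% change since 2014")
```

A negative slope for every province is consistent with the decreasing trend described for the 2014–2021 period.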
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Fang, G.; Zhu, Y.; Zhang, J. Spatiotemporal Evolution Analysis of PM2.5 Concentrations in Central China Using the Random Forest Algorithm. Sustainability 2024, 16, 8613. https://doi.org/10.3390/su16198613
