Climate Change Forecasting Using Machine Learning Algorithms
https://doi.org/10.22214/ijraset.2022.47098
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue X Oct 2022- Available at www.ijraset.com
Abstract: Climate change is an extremely contentious subject of study, as many people around the world do not believe that humans are responsible for it. The effects of climate change on human society will create an urgent need for adaptation, as this is the only way humans can hope to survive the increasingly chaotic weather of the future. The rapid development of machine learning (ML) algorithms has sparked innovations in numerous fields of study, and ML has even been proposed as a way to improve climate studies. Since global warming affects not only people but also many other species, efforts to curb it may benefit everyone on Earth. We use machine learning methods to examine how the global climate has developed since 1800. The Kaggle datasets record a wide range of temperature metrics since 1750, including land average, land maximum, land minimum, and land and ocean average temperatures, each with its uncertainty (the 95% confidence interval around the mean). Another dataset, compiled by the Earth System Research Laboratory of the United States Government, measures the average worldwide concentration of carbon dioxide since 1958. We propose focusing equally on machine learning and artificial intelligence to better interpret and benefit from current data and simulations. Our findings provide insight into the difficulties and potential benefits of data-driven climate modeling, particularly regarding the future integration of increasingly large model datasets.
Keywords: Climate change predictions, Machine learning algorithms, Temperature analysis, CO2 analysis, Artificial intelligence, Global warming.
I. INTRODUCTION
The effects of climate change are becoming increasingly obvious. Extreme weather events such as hurricanes, tornadoes, hail, lightning, fires, and floods have become more frequent and more intense in recent years. As global ecosystems undergo rapid transformation, humanity's access to the natural resources and agricultural practices that sustain it is in jeopardy. According to projections in the 2018 international report on climate change, catastrophic consequences are certain unless greenhouse gas emissions are eliminated within the next thirty years. Yet these emissions continue to climb year after year. Both reducing emissions and adapting to the effects of climate change (preparing for unavoidable consequences) are necessary, and both are complex. Reducing greenhouse gas (GHG) emissions requires changes to electricity systems, transportation, buildings, industry, and land use. Adaptation requires planning for resilience and disaster management, informed by knowledge of the climate and of extreme events. The wide variety of issues can be seen as a positive sign, since it means there are many ways to make a difference. [1]
To help decision makers better manage present and future climate change risks, a deeper understanding of risk interconnections and dynamics is needed. In response, researchers have begun to experiment with novel analytical methods and techniques, such as machine learning (ML), to make the most of the abundance and diversity of spatio-temporal big data in environmental contexts. A recent review [2] covers the state of the art and future possibilities of ML approaches in Climate Change Risk Assessment (CCRA), a growing area of interest. Combining scientometric and systematic analysis, it surveys the literature from 2000 to 2020. It finds that several ML algorithms have been used in CCRA, with Decision Tree, Random Forest, and Artificial Neural Network being the most common, and that these algorithms are mostly applied, in ensemble or hybrid form, to flood and landslide risk. All the CCRA applications examined use ML to process remote sensing data consistently and efficiently, which facilitates the detection of environmental and structural features as well as the identification and classification of targets. Compared with research evaluating risks under present conditions, work addressing future climate change scenarios is much less common. Because such scenarios have only recently emerged in the CCRA literature, they have not yet been combined with ML-based applications.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 881
The book Visualization Techniques for Climate Change with Machine Learning and Artificial Intelligence [3] focuses on the effects of climate change and how they might be avoided or lessened. The hope is that algorithms, mathematical relations, and software models can help us make sense of our world, forecast the weather, and develop novel products and services that reduce our negative impact on the planet while increasing our opportunities for success in areas such as healthcare and environmental sustainability. The book discusses several methods for predicting climate change, as well as alternative systems that might mitigate the risks climate scientists have identified. Such research also advances climate action, one of the 17 Sustainable Development Goals. The Intergovernmental Panel on Climate Change (IPCC), established by the United Nations in 1988 and awarded the 2007 Nobel Peace Prize jointly with former U.S. Vice President Al Gore, evaluates climate projections based on the results of approximately 50 climate models supported by approximately 25 laboratories around the world. We investigate whether, even at our level, some basic machine learning models can be used to evaluate climate change; accordingly, the algorithms we use are much simpler than those used by climate scientists. Big Data may also help the public better understand the gravity of environmental issues.
As more data is collected, we share graphs and plots that even readers without scientific training can understand. In this project we therefore place equal emphasis on the novelty and usability of the plots, created primarily with the interactive plotting library Bokeh. Most of our graphics are included in this report, but we encourage readers to explore the interactive versions to take full advantage of them. The project first required a large database, which we obtained from Kaggle, an online competition platform. The dataset consists of five files, each a record of monthly temperatures beginning in 1750, aggregated (1) globally, or by (2) country, (3) state, (4) major city, or (5) city. Three types of files are used in this project: (4). Finally, we add a new variable, the atmospheric CO2 concentration, and assess its effect on our models.
[9] (Reckien, 2017) Climate change is widely regarded as the greatest threat to our civilizations in the coming decades, and in this century of urbanization it may threaten the lives of the large and diverse populations that have made cities their home. As the impacts of climate change become more severe, discussions about cities are likely to shift from a general perspective to a focus on specific populations and how they will be affected, because urban areas are home to widely varied individuals with differing vulnerabilities. The issue of urban equity is therefore thrust into the spotlight.
[10] (Glavovic, 2021) The contract between science and society has been breached. Weather patterns are shifting. Science provides evidence for the causes, the pace of worsening, the effects on people and on social-ecological systems, and the necessity of taking action, and there is consensus across governments that the issue has been thoroughly researched and settled. Yet despite mounting data, new warnings, and innovative approaches, indicators of harmful global change continue to climb, making climate change science a tragic field.
[11] (Massazza, 2022) The authors highlight areas where new methods may need to be implemented in the near future, such as intervention research, climate change assessments, and the collection of mental health metrics. The article draws on both public mental health and environmental epidemiology frameworks. Its purpose is not to provide precise definitions of each methodology, but rather to highlight opportunities for combining different methods, encourage collaboration across disciplines, and inspire novel approaches to research design.
[12] (L. Kaack, 2022) The possible influence of the growing use of AI and ML on global greenhouse gas emissions is the subject of increasing speculation. Because these effects are complex, they are difficult to quantify and predict. The authors provide a technique for systematically documenting the effects of ML on GHG emissions, grouped into three categories: computing-related impacts, direct effects of applying ML, and system-level impacts. Using this methodology, one can identify which aspects of ML's effect on climate change mitigation require further investigation through impact assessment and scenario analysis, and then recommend appropriate policy levers to bring about the desired changes. Climate change is only one area where the development of AI is having a significant impact on society. In this Perspective, the authors offer a framework for assessing AI's effect on GHG emissions, together with recommendations for aligning AI with efforts to mitigate climate change.
[13] (Tahmineh Ladi, 2022) The climate crisis presents a number of difficult problems; this document outlines some of them and the potential role artificial intelligence might play in solving them. In particular, it highlights a variety of real-world applications of AI and future areas where the technology shows great potential. It serves as an addendum to the Challenge Statement, elaborating on the goals of the AI and Climate Change Impact Weekend 2020. Besides setting the stage for the problem at hand, the reading is meant to spark students' imagination about potential applications of AI in the fight against climate change.
[14] (Tom Beucler, I. Ebert‐Uphoff, 2020) Machine learning (ML) techniques are powerful tools for developing models of clouds and climate that are more faithful to the rapidly increasing amounts of Earth system data than commonly used semi-empirical models. The authors review ML technologies, including interpretable and physics-guided ML, and describe how they can be applied to cloud-related processes in the climate system, including radiation, microphysics, convection, and cloud detection, classification, simulation, and uncertainty quantification.
[15] (Kyle Tilbury, 2020) Machine learning may help mitigate some of the worst effects of climate change. Approaches such as informing individuals of their carbon footprint and of measures to reduce it are examples of how machine learning has been used to tackle the human effects of climate change. These strategies will be most effective if they take into account the specific social and psychological characteristics of each individual. Affect is a social-psychological factor that has been shown to play a significant role in people's perspectives on climate change and their motivation to act to mitigate its effects. The authors propose a study investigating the potential of incorporating emotion into machine learning-based solutions for climate change.
[16] (Zhongkai Shangguan, Zihe Zheng, 2021) The authors assembled a large Twitter dataset about climate change and analyzed it extensively using machine learning. Using topic modeling and natural language processing, they show how the volume of tweets about climate change correlates with the occurrence of major weather events, and reveal the most frequently discussed aspects of the issue and the general tone of online discourse.
After that, we de-noise our data and give it a comprehensible visual representation. The first problem we noticed was the presence of NaN values, which is unsurprising given how much harder it would have been to estimate the global land average temperature in the 18th century. After removing missing values with the pandas dropna() method, we find that the dataset is only usable from 1800 onward. This is consistent with climate experts commonly relying on temperature records compiled from the 19th century onward. Only the Land Average Temperature and its 95% uncertainty are of interest to us here. Because the data are monthly, presenting yearly temperatures in this format would be difficult, so we created a smaller data frame and aggregated the data to yearly values.
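The cleaning step above can be sketched with pandas. This is a minimal illustration, not the authors' code: it assumes the Kaggle file's date column is `dt` and the two columns of interest are named `LandAverageTemperature` and `LandAverageTemperatureUncertainty`, and it assumes that aggregating to yearly values means averaging the months of each year. A small synthetic frame stands in for the real CSV.

```python
import numpy as np
import pandas as pd

# Assumed column names for the two variables kept in the text.
COLS = ["LandAverageTemperature", "LandAverageTemperatureUncertainty"]

def make_demo_frame() -> pd.DataFrame:
    # Synthetic monthly record for 1798-1801 standing in for the Kaggle CSV.
    dates = pd.date_range("1798-01-01", "1801-12-01", freq="MS")
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "dt": dates.strftime("%Y-%m-%d"),
        COLS[0]: rng.normal(8.0, 0.5, len(dates)),
        COLS[1]: rng.uniform(1.0, 3.0, len(dates)),
    })
    # Mimic the sparse 18th-century record: blank out every other early month.
    early_gaps = (dates.year < 1800) & (dates.month % 2 == 0)
    df.loc[early_gaps, COLS[0]] = np.nan
    return df

def yearly_land_average(df: pd.DataFrame, first_year: int = 1800) -> pd.DataFrame:
    # Drop rows with missing values in the columns we keep (pandas dropna()).
    clean = df[["dt"] + COLS].dropna()
    clean = clean.assign(year=pd.to_datetime(clean["dt"]).dt.year)
    # Keep the usable period only, then collapse the monthly rows into yearly means.
    clean = clean[clean["year"] >= first_year]
    return clean.groupby("year")[COLS].mean()

yearly = yearly_land_average(make_demo_frame())
```

With the real data, the synthetic frame would be replaced by the loaded CSV (for example `pd.read_csv("GlobalTemperatures.csv")`, assuming that is the global file's name); the result has one row per year, which is much easier to plot.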
A. Dataset Description
The first (temperature) dataset comes from the online platform Kaggle [17]. The second is a record of the average worldwide concentration of carbon dioxide (CO2) since 1958, compiled by the Global Monitoring Division of the United States Government's Earth System Research Laboratory. The Kaggle data repackage a recent compilation by Berkeley Earth, which is affiliated with the Lawrence Berkeley National Laboratory [18]. For the Earth Surface Temperature Study, Berkeley Earth researchers combined 1.6 billion temperature observations from 16 distinct sources. The data are neatly packaged and can be divided into subsets for further investigation (for example, by country). Berkeley Earth released both the original data and the software used to transform them. It also uses methods that allow weather observations from shorter time series to be included, so that fewer observations have to be discarded.
B. Data Pre-processing
To maximize the effectiveness of the machine learning algorithms, the data must be pre-processed. The first step in the pre-processing pipeline is normalization, a linear transformation that rescales each variable to a common range. Missing data (missing values) occur when no value is recorded for a given attribute of a dataset. Unfortunately, missing data are common and can affect the conclusions that can be drawn from a dataset.
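A minimal sketch of these two steps follows. It assumes that normalizing means min-max scaling to [0, 1], since the text does not name the scheme; the column names and values are illustrative only.

```python
import numpy as np
import pandas as pd

def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    # Linear rescaling of each column to [0, 1]: x' = (x - min) / (max - min).
    # min()/max() skip NaN, so missing values simply stay NaN in the output;
    # a constant column would give 0/0 and therefore NaN as well.
    col_min, col_max = df.min(), df.max()
    return (df - col_min) / (col_max - col_min)

frame = pd.DataFrame({
    "LandMaxTemperature": [10.0, 14.0, np.nan, 18.0],
    "LandMinTemperature": [-2.0, 0.0, 2.0, np.nan],
})
missing_per_column = frame.isna().sum()  # number of missing values per attribute
normalized = min_max_normalize(frame)
```

Counting missing values before scaling makes it easy to decide, per column, whether to drop or otherwise handle the gaps.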
1) Correlation Matrix: To understand the relationships between our variables, we compute their pairwise correlations. Here we consider only the temperature variables and entirely disregard their inherent uncertainty. The variables are interconnected: all except one are somewhat correlated with one another, the exception being Land_OceanTemp.
Here, we use an alternative prediction model, the Random Forest Regressor.
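The correlation step can be illustrated with pandas' `DataFrame.corr`. The frame below is synthetic, and only `Land_OceanTemp` is a column name taken from the text; the other names are hypothetical. As in the text, uncertainty columns are left out.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_years = 200
trend = np.linspace(0.0, 1.5, n_years)  # a shared warming signal (synthetic)

temps = pd.DataFrame({
    "LandAvgTemp": 8.0 + trend + rng.normal(0, 0.1, n_years),
    "LandMaxTemp": 14.0 + trend + rng.normal(0, 0.1, n_years),
    "LandMinTemp": 2.0 + trend + rng.normal(0, 0.1, n_years),
    "Land_OceanTemp": 15.0 + 0.5 * trend + rng.normal(0, 0.3, n_years),
})

# Pairwise Pearson correlation matrix of the temperature variables only.
corr = temps.corr(method="pearson")
```

The resulting square, symmetric matrix can then be drawn as a heatmap (for example with Bokeh) so that weakly correlated variables stand out.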
As the graph shows, the regression fitted to the whole dataset tracks temperatures accurately over a wide range. To use it as a prediction model, we then split the data chronologically into a training set covering 1800 to 1949 and a test set covering 1950 to 2015. The results and the visualization make it clear that linear regression is not useful for this kind of testing: the model fits the temperature data well when its parameters are estimated from the whole dataset, but not when only part of the data is used. Linear regression is therefore not a viable forecasting method here.
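The effect described above can be reproduced on a toy series with NumPy. This is an illustration under stated assumptions, not the authors' pipeline: the temperature series is synthetic (flat until 1900, then warming linearly), and R² (the coefficient of determination) is used as the score. A line fitted to all years scores well in-sample, while a line fitted only to 1800 to 1949 extrapolates poorly to 1950 to 2015.

```python
import numpy as np
from numpy.polynomial import Polynomial

def chronological_split(years, temps, split_year=1950):
    # Train on years before split_year and test on the rest; no shuffling,
    # because a forecast may only use the past.
    train = years < split_year
    return (years[train], temps[train]), (years[~train], temps[~train])

def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic yearly series (not the Kaggle data): flat until 1900, then warming.
years = np.arange(1800, 2016)
temps = 8.0 + 0.01 * np.clip(years - 1900, 0, None)

(train_x, train_y), (test_x, test_y) = chronological_split(years, temps)

# Straight line fitted to the whole record vs. to the 1800-1949 training years only.
full_fit = Polynomial.fit(years, temps, deg=1)
train_fit = Polynomial.fit(train_x, train_y, deg=1)

full_r2 = r2_score(temps, full_fit(years))
test_r2 = r2_score(test_y, train_fit(test_x))
```

On this series the in-sample score is high, while the test score is negative, meaning the extrapolated line does worse than simply predicting the test-period mean.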
2) Polynomial Regression: The standard way to extend linear regression to settings where the relationship between predictors and response is nonlinear is to replace the linear model with a polynomial function. Such a relationship is apparent in our data, so we construct a polynomial regression of degree 3. We first train the model on the whole dataset and, once again, use visual aids and common regression metrics to judge how well it fits the data.
This model appears to fit the data better than the first linear regression. We next test its ability to predict the future: once again, the training set covers 1800 to 1949 and the test set 1950 to 2015. The variance score here is substantially higher than that of the second linear regression (the one trained on 1800 to 1949). Unfortunately, the model is still not suitable for forecasting our data; it suffers from the same problem as linear regression.
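A degree-3 fit can be compared with a straight line in the same way, using NumPy's `Polynomial.fit`. The data are again a synthetic stand-in, and R² is used as the score (the text does not say exactly which variance score was computed). Because a cubic contains the linear model as a special case, its in-sample score on the whole record can never be lower; the score on 1950 onward shows whether that advantage survives extrapolation.

```python
import numpy as np
from numpy.polynomial import Polynomial

def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic yearly series (not the Kaggle data): flat until 1900, then warming.
years = np.arange(1800, 2016)
temps = 8.0 + 0.01 * np.clip(years - 1900, 0, None)
train = years < 1950  # 1800-1949 for training, 1950-2015 for testing

fits = {}
for degree in (1, 3):
    whole = Polynomial.fit(years, temps, deg=degree)               # all years
    past = Polynomial.fit(years[train], temps[train], deg=degree)  # 1800-1949 only
    fits[degree] = {
        "whole_r2": r2_score(temps, whole(years)),
        "test_r2": r2_score(temps[~train], past(years[~train])),
    }
```

`Polynomial.fit` maps the years onto a small interval internally, which keeps a cubic in calendar years numerically well conditioned.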
3) Random Forest: Here we use the RandomForestRegressor from the scikit-learn package. We tried several values for n (the number of trees) and found 10 to be optimal. To improve on the accuracy of our earlier regressions, we use a larger training set. As regression inputs we use the previous month's minimum, maximum, and land and ocean average temperatures; if the inputs (min, max, Land_ocean_average) came only from the same year as the output (average), the output would be skewed. The training set covers 1800 to 1980 and the test set 1981 onward, which lets us fine-tune the model further. Goodness of fit is evaluated with the Mean Euclidean Distance (MED), the metric we defined previously.
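A minimal scikit-learn sketch of this setup follows. The MED definition used here (the average Euclidean distance between true and predicted outputs, which reduces to the mean absolute error for a single output) is an assumption, since the text refers to a definition given elsewhere. The one-month lag, the synthetic monthly series, and `random_state` are also illustrative choices, not the authors' settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def mean_euclidean_distance(y_true, y_pred):
    # Assumed MED: mean Euclidean distance between true and predicted output
    # vectors; for a single output this equals the mean absolute error.
    y_true = np.asarray(y_true, dtype=float).reshape(len(y_true), -1)
    y_pred = np.asarray(y_pred, dtype=float).reshape(len(y_pred), -1)
    return float(np.mean(np.linalg.norm(y_true - y_pred, axis=1)))

# Synthetic monthly series for 1800-2015 (placeholders, not the Kaggle data).
rng = np.random.default_rng(0)
n = 216 * 12
year = 1800 + np.arange(n) // 12
season = np.sin(2 * np.pi * np.arange(n) / 12)
warming = 0.005 * np.clip(year - 1900, 0, None)
land_min = 2.0 + 6.0 * season + warming + rng.normal(0, 0.2, n)
land_max = 14.0 + 6.0 * season + warming + rng.normal(0, 0.2, n)
land_ocean = 15.0 + 2.0 * season + warming + rng.normal(0, 0.2, n)
land_avg = 8.0 + 6.0 * season + warming + rng.normal(0, 0.2, n)

# Inputs: previous month's (min, max, Land_ocean_average); output: this month's
# land average, so no input comes from the same month as its target.
X = np.column_stack([land_min[:-1], land_max[:-1], land_ocean[:-1]])
y = land_avg[1:]
target_year = year[1:]
train = target_year <= 1980  # 1800-1980 for training, 1981 onward for testing

model = RandomForestRegressor(n_estimators=10, random_state=0)  # 10 trees
model.fit(X[train], y[train])
med = mean_euclidean_distance(y[~train], model.predict(X[~train]))
```

Note that a random forest cannot predict values outside the range of its training targets, so on a warming series it tends to underestimate the most recent temperatures.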
4) XGBoost: After the Random Forest Regressor, we apply the eXtreme Gradient Boosting regressor (XGBRegressor). The input and output variables, as well as the training and test sets, are unchanged. We run the model with its default hyperparameter values and evaluate the results; even with these defaults, the model outperforms the Random Forest Regressor.
5) KNN: Next, we use the K-Nearest Neighbors (KNN) regressor. After considerable experimentation, we find that the value of k that minimizes MED is 2. Apart from this change, everything else, including the input and output variables and the training and test sets, stays the same. The MED produced by KNN is higher than that of Random Forest and XGBoost. Since the XGB Regressor is currently the best-performing model, we try to outperform it by tuning the KNN hyperparameters to obtain a lower MED. To find the optimal hyperparameter values, we use both a grid search, through the GridSearchCV class provided by scikit-learn, and a hand-optimized minimization.
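The hand-optimized part of this search can be sketched as a loop over candidate values of k with scikit-learn's `KNeighborsRegressor`, keeping the k with the lowest MED on held-out data. MED is assumed here to be the mean absolute distance for a single output, and the helper name `best_k_by_med` and the four-point toy data are hypothetical, chosen only so the result is easy to check by hand.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def med(y_true, y_pred):
    # Assumed MED for a single output: mean absolute distance.
    return float(np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))))

def best_k_by_med(X_train, y_train, X_test, y_test, k_values):
    # Manual minimization: refit KNN for each candidate k and keep the lowest MED.
    scores = {}
    for k in k_values:
        model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
        scores[k] = med(y_test, model.predict(X_test))
    return min(scores, key=scores.get), scores

# Toy data where averaging the two nearest training points is exact.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.0])
X_test = np.array([[0.5], [2.5]])
y_test = np.array([0.5, 2.5])

best_k, scores = best_k_by_med(X_train, y_train, X_test, y_test, k_values=[1, 2, 3, 4])
```

On this toy data k = 2 gives an MED of zero, because the mean of the two nearest training targets reproduces each test target exactly.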
6) GridSearchCV: GridSearchCV is a technique for determining the optimal values of a model's hyperparameters. As the previous section showed, hyperparameter settings strongly affect a model's output. Because the optimal settings cannot be known in advance, every combination in a predefined grid of candidate values must be tested. We use GridSearchCV to automate this search, since adjusting hyperparameters by hand would require a significant investment of time and effort.
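A grid search of this kind can be sketched with scikit-learn's `GridSearchCV`, here tuning a `KNeighborsRegressor` against a MED-based scorer. The grid values, the use of `TimeSeriesSplit` to keep folds in chronological order, and the single-output MED definition are assumptions, not settings reported in the text; the data are synthetic.

```python
import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neighbors import KNeighborsRegressor

def med(y_true, y_pred):
    # Assumed MED for a single output: mean absolute distance.
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# greater_is_better=False makes GridSearchCV minimize MED (it maximizes -MED).
med_scorer = make_scorer(med, greater_is_better=False)

param_grid = {
    "n_neighbors": [1, 2, 3, 5, 8],
    "weights": ["uniform", "distance"],
    "p": [1, 2],  # Manhattan vs. Euclidean distance between feature vectors
}

# Synthetic, ordered data standing in for the lagged temperature features.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, 300)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.05, 300)

# Every one of the 5 * 2 * 2 = 20 combinations is cross-validated.
search = GridSearchCV(KNeighborsRegressor(), param_grid, scoring=med_scorer,
                      cv=TimeSeriesSplit(n_splits=5))
search.fit(X, y)
```

After fitting, `search.best_params_` holds the winning combination and `search.best_score_` the corresponding negated MED, so values closer to zero are better.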
a) Impact of CO2 on our ML Models: We extend our dataset with a CO2 variable. CO2 measurements are available from 1958 to 2016; we exclude 2016 because the temperature database only contains data through 2015. We again evaluate the two models that have performed best so far, the Random Forest Regressor and the XGBoost Regressor, on the data including CO2, this time explicitly using the optimization techniques described above to find the best hyperparameters. The CO2 series is merged with the existing temperature data so that both cover the same years.
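The merge can be sketched with a pandas inner join on the year. The two frames below are hypothetical yearly stand-ins for the real files, with placeholder values rather than measurements; joining this way drops 2016 automatically, because there is no 2016 temperature row.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly frames: temperatures for 1800-2015, CO2 for 1958-2016.
# The values are placeholders, not measurements.
temps = pd.DataFrame({
    "year": np.arange(1800, 2016),
    "LandAverageTemperature": np.linspace(8.0, 9.5, 216),
})
co2 = pd.DataFrame({
    "year": np.arange(1958, 2017),
    "co2_ppm": np.linspace(315.0, 404.0, 59),
})

# An inner join on year keeps only the years present in both tables (1958-2015).
merged = temps.merge(co2, on="year", how="inner")
features = merged[["LandAverageTemperature", "co2_ppm"]]
```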
Now that the dataset has been updated, we can apply our prediction models. This time, we apply the hyperparameter optimization strategies directly. After a number of attempts, we selected the set of hyperparameter values over which MED is minimized.
V. CONCLUSION
The first part of the results shows that temperatures have been rising, following a trend similar to that of CO2 concentration. In addition, a correlation study between CO2 concentration and temperature shows that increases in CO2 concentration are associated with temperature rise. A number of successful machine learning strategies were examined, and we compared the performance of several machine learning models. On the primary dataset, the most effective models, as measured by MED, are the Random Forest Regressor and the XGBoost Regressor. The output of the Random Forest Regressor improves when we add a new variable, and including CO2 improves the performance of our models in general. The MED is rather small, at roughly 0.1%. A significant problem for us is that both GridSearch and manual optimization are very time-consuming and inefficient approaches to MED reduction; more processing power would have helped us achieve better results. Future work could investigate other machine learning methods, such as ensembles combining models like XGBoost and linear regression, in an attempt to find improved models. To further lessen the impact of the temperature rise, other associated factors must also be accounted for.
VI. ACKNOWLEDGEMENTS
I would like to express my deep gratitude to my professor and research supervisor, Ms. Divya Sharma, for her patient guidance, enthusiastic encouragement and constructive criticism of this research.
REFERENCES
[1] D. Rolnick, P. L. Donti, et al., “Tackling Climate Change with Machine Learning,” 2019, [Online]. Available: https://arxiv.org/pdf/1906.05433.pdf
[2] F. E. et al., “Exploring machine learning potential for climate change risk assessment,” ScienceDirect, 2021, [Online]. Available: https://www.sciencedirect.com/science/article/abs/pii/S0012825221002531
[3] A. Srivastav et al., “Visualization Techniques for Climate Change with Machine Learning and Artificial Intelligence,” Elsevier, 2022, [Online]. Available: https://www.elsevier.com/books/visualization-techniques-for-climate-change-with-machine-learning-and-artificial-intelligence/srivastav/978-0-323-99714-0
[4] O. Adedeji, “Global Climate Change,” ResearchGate, 2014, [Online]. Available: https://www.researchgate.net/publication/276495677_Global_Climate_Change
[5] D. Klingelhöfer et al., “Climate change: Does international research fulfill global demands and necessities?,” SpringerOpen, 2020, [Online]. Available: https://enveurope.springeropen.com/articles/10.1186/s12302-020-00419-1
[6] M. Grasso, M. Manera, et al., “The Health Effects of Climate Change: A Survey of Recent Quantitative Research,” National Library of Medicine, 2012, [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3386570/
[7] M. Oppenheimer, “Climatic Change,” Springer Link, 2021, [Online]. Available: https://www.springer.com/journal/10584
[8] J. A. G. Castro, “Climate change and flood risk: vulnerability assessment in an urban poor community in Mexico,” 2019, [Online]. Available: https://journals.sagepub.com/doi/full/10.1177/0956247819827850
[9] D. Reckien, “Climate change, equity and the Sustainable Development Goals: an urban perspective,” 2017, [Online]. Available: https://journals.sagepub.com/doi/full/10.1177/0956247816677778
[10] B. C. Glavovic, “The tragedy of climate change science,” 2021, [Online]. Available: https://www.tandfonline.com/doi/full/10.1080/17565529.2021.2008855
[11] A. Massazza, “Quantitative methods for climate change and mental health research: current trends and future directions,” [Online]. Available: https://www.thelancet.com/journals/lanplh/article/PIIS2542-5196(22)00120-6/fulltext
[12] L. H. Kaack, P. L. Donti, et al., “Aligning artificial intelligence with climate change mitigation,” Semantic Scholar, 2022, [Online]. Available: https://www.semanticscholar.org/paper/Aligning-artificial-intelligence-with-climate-Kaack-Donti/8e45e5d8e4d7ca24699b516105414b29f71431e2
[13] T. Ladi, S. Jabalameli, et al., “Applications of machine learning and deep learning methods for climate change mitigation and adaptation,” Semantic Scholar, 2022, [Online]. Available: https://www.semanticscholar.org/paper/Applications-of-machine-learning-and-deep-learning-Ladi-Jabalameli/3e44af63f0c79f46b40ac38668ad79f67fd4c383
[14] T. Beucler, I. Ebert-Uphoff, et al., “Machine Learning for Clouds and Climate,” Semantic Scholar, 2020, [Online]. Available: https://www.semanticscholar.org/paper/Machine-Learning-for-Clouds-and-Climate-Beucler-Ebert‐Uphoff/ea0ccde42b1f4517e87ee2e3f77fb6c06671666b
[15] K. Tilbury and J. Hoey, “The Human Effect Requires Affect: Addressing Social-Psychological Factors of Climate Change with Machine Learning,” Semantic Scholar, 2020, [Online]. Available: https://www.semanticscholar.org/paper/The-Human-Effect-Requires-Affect%3A-Addressing-of-Tilbury-Hoey/cf0fd2632dff13bc69daeeb98118d1b7644418a0
[16] Z. Shangguan, Z. Zheng, et al., “Trend and Thoughts: Understanding Climate Change Concern using Machine Learning and Social Media Data,” Semantic Scholar, 2021, [Online]. Available: https://www.semanticscholar.org/paper/Trend-and-Thoughts%3A-Understanding-Climate-Change-Shangguan-Zheng/fd85cdaf951693c76dde2974a9c2502fe6c8bcc9
[17] Berkeley Earth, “Climate Change: Earth Surface Temperature Data,” Kaggle, [Online]. Available: https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data
[18] Berkeley Earth, “Climate data,” [Online]. Available: http://berkeleyearth.org/data/