Study of California Earthquake Prediction Using Machine Learning Approach
Pimpri Chinchwad College of Engineering (PCCOE), Pune, India. Aug 18-19, 2023
Abstract- Earthquake prediction is a highly challenging and complex task that has been the subject of scientific inquiry for many years. Although much has been learned about earthquakes, it is still hard to predict them accurately and reliably. This is because earthquakes are complex phenomena, and because our scientific knowledge and technology are not as advanced as they could be. Many approaches to earthquake prediction have been proposed, such as using seismological and geophysical data, but a lot of work still needs to be done to build a reliable and effective earthquake prediction system that can warn people in time and lessen the damage caused by earthquakes. In this work, an intelligent earthquake prediction model is developed using various machine learning algorithms. Seismological data from the USGS is used for model training and validation. The system has shown a prediction accuracy of 92.7%.

Keywords: Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), Earthquake, California, USGS.

I. INTRODUCTION

An earthquake is the sudden and violent shaking of the ground caused by the release of energy from the Earth's crust [7]. This energy is released when two tectonic plates move against each other or when there is an abrupt movement along a fault. Earthquakes can range from minor vibrations to devastating calamities that cause widespread destruction and loss of life. Between 1998 and 2017, earthquakes caused nearly 750,000 deaths worldwide, more than half of all natural disaster-related deaths. During this period, more than 125 million people were affected by earthquakes. In 1976, a magnitude 7.5 earthquake in the Tangshan region of China killed over 300,000 people. It is one of China's deadliest natural disasters ever. China appears to suffer the most destructive earthquakes, accounting for roughly fifty percent of all earthquake fatalities.

Seismology [8] is the scientific study of earthquakes and of elastic wave propagation through the Earth. To study earthquakes, seismologists employ various methods, such as measuring ground motion, studying seismic waves, and analyzing earthquake data. Seismologists can determine an earthquake's location, magnitude, and depth by analyzing the earthquake and the seismic waves it generates. Seismologists also examine the interior structure of the Earth and the behaviour of tectonic plates to determine the causes of earthquakes and predict their occurrence. Using seismic data, they create models of the interior of the Earth and identify regions at high risk of seismic activity. Understanding earthquakes and seismology is essential for developing earthquake-resistant structures and implementing effective disaster management strategies. By studying earthquakes and their effects, scientists can improve our ability to predict seismic events and mitigate the damage they cause.

Earthquake analysis is essential for understanding the seismic dangers associated with living in earthquake-prone regions. This includes identifying potential earthquake sources, such as faults and seismic zones, and assessing the potential damage that can be caused by seismic activity. The objective of earthquake analysis is to provide reliable and accurate information that can be used to develop effective strategies for mitigating the effects of earthquakes, including the design of earthquake-resistant structures, the development of emergency response plans, and the formulation of public policy.

Seismic models are based on assumptions and simplifications that may not capture the complexity of seismic activity in its entirety. This can result in inaccuracies in earthquake analysis and a lack of confidence in the ensuing forecasts. Despite advancements in earthquake analysis, scientists and engineers are unable to accurately predict when and where earthquakes will occur. This can make it difficult to prepare for and respond effectively to earthquakes. It is important for earthquake analysis to be conducted in an ethical and responsible manner.

By studying seismic activity and analyzing earthquake data, researchers can help communities better prepare for future earthquakes by identifying high-risk areas and developing mitigation strategies. Machine learning (ML) can play a significant role in earthquake analysis by analyzing large datasets and identifying patterns and relationships that conventional statistical methods might not identify. On the basis of historical data, machine learning algorithms can be trained to predict the likelihood of an earthquake occurring in a particular region. These algorithms can account for a variety of variables, including seismic activity, weather patterns, and geological data, in order to make precise predictions.

In this work, we have used the bootstrap aggregation method for better prediction of earthquakes.
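The bootstrap aggregation (bagging) idea mentioned in the introduction can be sketched as follows. This is a minimal illustration and not the model used in this work: the data are synthetic, the two features (magnitude, depth) and the labelling rule are hypothetical, and a one-feature decision stump stands in for the base learner. Each stump is trained on a bootstrap resample of the training set, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a decision stump (one feature, one threshold) by exhaustive search."""
    best = None  # (errors, feature index, threshold, label predicted above threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for hi in (0, 1):
                errors = sum((hi if row[f] > t else 1 - hi) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, hi)
    return best[1:]

def predict_stump(stump, row):
    f, t, hi = stump
    return hi if row[f] > t else 1 - hi

def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train each stump on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, row):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: [magnitude, depth_km]; label 1 when magnitude > 5.0.
rng = random.Random(42)
X = [[rng.uniform(2.0, 8.0), rng.uniform(0.0, 100.0)] for _ in range(200)]
y = [1 if magnitude > 5.0 else 0 for magnitude, _depth in X]

# 80/20 hold-out split (the first 80% of the rows are used for training).
split = len(X) * 8 // 10
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

models = fit_bagging(X_train, y_train)
predictions = [predict_bagging(models, row) for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"hold-out accuracy: {accuracy:.3f}")
```

An odd number of estimators is used so that a binary majority vote cannot tie. With a real catalogue, the stump would normally be replaced by a full decision tree, which is essentially how a Random Forest is built.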
Authorized licensed use limited to: BRACT's Vishwakarma Institute Pune. Downloaded on October 15,2024 at 09:19:45 UTC from IEEE Xplore. Restrictions apply.
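The evaluation reported in this paper rests on confusion-matrix counts (Table III) and on the accuracy, precision and recall expressions of equations (3), (4) and (5). As a minimal sketch, those expressions can be applied directly to the Table III counts, using the paper's notation (a = True Positive, b = False Positive, c = False Negative, d = True Negative). The function and variable names are illustrative only. How the precision and recall columns of Table II were averaged is not stated, so only the accuracy column is compared here.

```python
def classification_metrics(a, b, c, d):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    total = a + b + c + d
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (a + d) / total
    precision = a / (a + b) if (a + b) else 0.0
    recall = a / (a + c) if (a + c) else 0.0
    return accuracy, precision, recall

# Counts copied from Table III (columns FN, TP, TN, FP mapped to c, a, d, b).
table_iii = {
    "Logistic Regression": dict(a=215, b=0, c=18, d=1),
    "Random Forest": dict(a=215, b=0, c=17, d=2),
    "Decision Tree": dict(a=199, b=16, c=14, d=5),
}

results = {name: classification_metrics(**counts)
           for name, counts in table_iii.items()}

for name, (accuracy, precision, recall) in results.items():
    print(f"{name}: accuracy = {accuracy:.1%}")
```

Each row of Table III sums to 234 test samples, and the accuracies computed this way agree with the accuracy column of Table II to within rounding.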
data is 20% of the total data set which is 215. The efficiency of the models is measured using performance parameters such as accuracy, precision, recall and F1-score [15], as shown in Table II. Accuracy is the fraction of all samples that the model classifies correctly. Precision is the fraction of predicted positives that are truly positive. Recall is the fraction of actual positives that the model identifies [16]. The mathematical expressions for Accuracy, Precision and Recall are given in equations (3), (4) and (5).

The Confusion Matrix [14] of each of the 3 algorithms is given in Table III.

$$\text{Accuracy} = \frac{a + d}{a + b + c + d} \tag{3}$$

$$\text{Precision} = \frac{a}{a + b} \tag{4}$$

$$\text{Recall} = \frac{a}{a + c} \tag{5}$$

Where,
a = True Positive
b = False Positive
c = False Negative
d = True Negative

TABLE II. PERFORMANCE ANALYSIS
Algorithm             Accuracy   Precision   Recall
Logistic Regression   92.3%      93%         92%
Random Forest         92.7%      93%         92%
Decision Tree         87.1%      93%         93%

TABLE III. CONFUSION MATRIX
Algorithm             FN    TP     TN    FP
Logistic Regression   18    215    1     0
Random Forest         17    215    2     0
Decision Tree         14    199    5     16

In the case of a real-world crisis, such as an earthquake, the number of False Negatives should be as low as possible in order to save people's lives. On False Negatives alone, the Decision Tree algorithm is better (14, against 17 for Random Forest and 18 for Logistic Regression), but when all other parameters are considered, Random Forest is more effective.

V. CONCLUSION

RF and DT are utilised in the creation of the earthquake prediction system presented in this work. When all aspects of the earthquake are taken into consideration, RF is found to perform exceptionally well in comparison to the other algorithms. The DT model has shown improved performance after taking into account the severe effects of an earthquake. The accurate predictions of RF have implications for real-time systems.

REFERENCES
[1] T. J. Roy, M. A. Mahmood and D. Roy, "A Machine Learning Model to Predict Earthquake Utilizing Neural Network," 2021 International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2), Rajshahi, Bangladesh, 2021, pp. 1-4, doi: 10.1109/IC4ME253898.2021.9768454.
[2] R. Li, X. Lu, S. Li, H. Yang, J. Qiu and L. Zhang, "DLEP: A Deep Learning Model for Earthquake Prediction," 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 2020, pp. 1-8, doi: 10.1109/IJCNN48605.2020.9207621.
[3] B. Bhargava and S. Pasari, "Earthquake Prediction Using Deep Neural Networks," 2022 8th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 2022, pp. 476-479, doi: 10.1109/ICACCS54159.2022.9785011.
[4] W. Li, N. Narvekar, N. Nakshatra, N. Raut, B. Sirkeci and J. Gao, "Seismic Data Classification Using Machine Learning," 2018 IEEE Fourth International Conference on Big Data Computing Service and Applications (BigDataService), Bamberg, Germany, 2018, pp. 56-63, doi: 10.1109/BigDataService.2018.00017.
[5] A. Gaba, A. Jana, R. Subramaniam, Y. Agrawal and M. Meleet, "Analysis and Prediction of Earthquake Impact-a Machine Learning approach," 2019 4th International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS), Bengaluru, India, 2019, pp. 1-5, doi: 10.1109/CSITSS47250.2019.9031026.
[6] R. Mallouhy, C. A. Jaoude, C. Guyeux and A. Makhoul, "Major earthquake event prediction using various machine learning algorithms," 2019 International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), Paris, France, 2019, pp. 1-7, doi: 10.1109/ICT-DM47966.2019.9032983.
[7] https://en.wikipedia.org/wiki/Earthquake
[8] https://en.wikipedia.org/wiki/Seismology
[9] https://www.usgs.gov/
[10] https://realpython.com/logistic-regression-python/
[11] https://www.analyticsvidhya.com/blog/2021/06/understanding-random-forest/
[12] https://scikit-learn.org/stable/modules/tree.html
[13] https://machinelearningmastery.com/gentle-introduction-long-short-term-memory-networks-experts/
[14] https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
[15] https://deepai.org/machine-learning-glossary-and-terms/f-score
[16] https://www.learndatasci.com/glossary/binary-classification/