2023 7th International Conference On Computing, Communication, Control And Automation (ICCUBEA)

Pimpri Chinchwad College of Engineering (PCCOE), Pune, India. Aug 18-19, 2023

Study of California Earthquake Prediction Using Machine Learning Approach

Debabrata Swain, Vyom Shah, Dhruvin Shah, Amol Bhilare
Computer Science and Engineering Department, Pandit Deendayal Energy University, Gandhinagar

979-8-3503-0426-8/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICCUBEA58933.2023.10392096

Abstract- Earthquake prediction is a highly challenging and complex task that has been the subject of scientific inquiry for many years. Even though we have learned a lot about earthquakes, it is still hard to predict them accurately and reliably. This is because earthquakes are complicated phenomena, and because our scientific knowledge and technology are not as advanced as they could be. There have been many ideas about how to predict earthquakes, such as using seismological and geophysical data. But a lot of work still needs to be done to make a reliable and effective earthquake prediction system that can warn people in time and lessen the damage caused by earthquakes. In this work, an intelligent earthquake prediction model is developed using various machine learning algorithms. The seismological data of the USGS is used for model training and validation. The system has shown a prediction accuracy of 92.7%.

Keywords: Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), Earthquake, California, USGS.

I. INTRODUCTION
An earthquake is the sudden and violent shaking of the ground caused by the release of energy from the Earth's crust [7]. This energy is released when two tectonic plates move against each other or when there is an abrupt movement along a fault. Earthquakes can range from minor vibrations to devastating calamities that cause widespread destruction and loss of life. Between 1998 and 2017, earthquakes caused nearly 750,000 deaths worldwide, more than half of all natural disaster-related deaths. During this time period, more than 125 million people were impacted by earthquakes. In 1976, a magnitude 7.5 earthquake in the Tangshan region of China killed over 300,000 people. It is one of China's deadliest natural disasters ever. China appears to have the most destructive earthquakes. This Asian nation accounts for roughly fifty percent of all earthquake fatalities.

Seismology [8] is the scientific study of earthquakes and elastic wave propagation through the Earth. To study earthquakes, seismologists employ various methods, such as measuring ground motion, studying seismic waves, and analyzing earthquake data. Seismologists can determine an earthquake's location, magnitude, and depth by analyzing the seismic waves it generates. Seismologists also examine the interior structure of the Earth and the behaviour of tectonic plates to determine earthquake causes and predict their occurrence. Using seismic data, they create models of the interior of the Earth and identify regions at high risk of seismic activity. Understanding earthquakes and seismology is essential for developing earthquake-resistant structures and implementing effective disaster management strategies. Scientists can improve our ability to predict and mitigate the damage caused by seismic events by studying earthquakes and their effects.

Earthquake analysis is essential for understanding the seismic dangers associated with living in earthquake-prone regions. This includes identifying potential earthquake sources, such as faults and seismic zones, and assessing the potential damage that can be caused by seismic activity. The objective of earthquake analysis is to provide reliable and accurate information that can be used to develop effective strategies for mitigating the effects of earthquakes, including the design of earthquake-resistant structures, the development of emergency response plans, and the formulation of public policy.

Seismic models are based on assumptions and simplifications that may not capture the complexity of seismic activity in its entirety. This can result in inaccuracies in earthquake analysis and a lack of confidence in the ensuing forecasts. Despite advancements in earthquake analysis, scientists and engineers are unable to accurately predict when and where earthquakes will occur. This can make it difficult to prepare for and respond effectively to earthquakes. It is important for earthquake analysis to be conducted in an ethical and responsible manner.

By studying seismic activity and analyzing earthquake data, researchers can help communities better prepare for future earthquakes by identifying high-risk areas and developing mitigation strategies. Machine learning (ML) can play a significant role in earthquake analysis by analyzing large datasets and identifying patterns and relationships that conventional statistical methods might not identify. On the basis of historical data, machine learning algorithms can be trained to predict the likelihood of an earthquake occurring in a particular region. These algorithms can account for a variety of variables, including seismic activity, weather patterns, and geological data, in order to make precise predictions.

In this work, we use the bootstrap aggregation (bagging) method to improve earthquake prediction.
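The bootstrap aggregation (bagging) method named in the introduction can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the one-feature threshold "stump" learner, and the function names `fit_stump`, `bagging_fit` and `bagging_predict` are all hypothetical and chosen only to show the two steps of bagging, namely training each model on a resampled copy of the data and combining the models by majority vote.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Fit a one-feature threshold classifier that predicts 1 when x >= threshold.

    `samples` is a list of (x, label) pairs with labels 0 or 1. The threshold
    with the fewest training errors is returned (first one found on ties).
    """
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in samples}):
        errors = sum((x >= threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(samples, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(fit_stump(bootstrap))
    return models


def bagging_predict(models, x):
    """Combine the individual stump predictions by majority vote."""
    votes = Counter(int(x >= threshold) for threshold in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (feature value, class label).
toy = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
       (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
models = bagging_fit(toy)
print(bagging_predict(models, 1.2), bagging_predict(models, 3.8))
```

Random Forest, described later in the paper, follows the same pattern with decision trees as the base learner and an additional random choice of features for each tree.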

Authorized licensed use limited to: BRACT's Vishwakarma Institute Pune. Downloaded on October 15,2024 at 09:19:45 UTC from IEEE Xplore. Restrictions apply.
II. LITERATURE REVIEW
Tamal Joyti et al. [1] developed a machine-learning seismic analysis and risk identification system. SVM, LML, Bagging, OneR, Random Forest, Logistic Regression, Naive Bayes, and Stacking were employed. A multi-layer neural network predictive model used longitudinal and magnitude regression. Advanced models can predict the next earthquake only with limited accuracy, and they cannot determine when it will occur.

Rui et al. [2] proposed the DLEP model for prediction, in which a convolutional neural network was used for indirect feature extraction. Experimental results on eight datasets with different characteristics showed the promising performance of the proposed DLEP for accurate earthquake prediction.

Bhargava et al. [3] recommended employing deep neural networks comprising long short-term memory (LSTM) [13] units to identify and forecast earthquakes. The data set used covered the Himalayas. The 2012–2018 data trained the LSTM model, whereas the 2019–2020 data tested it. For large earthquakes, the model failed.

Wenrui et al. [4] created machine learning methods that can be used to find earthquakes by looking at data from a continuous time series. Additionally, the data was utilized to estimate the arrival times of the P-wave and S-wave.

Anmol et al. [5] proposed that, among LR, NBC, RFC, and KNN, the Random Forest Classifier is the best method for diagnosing the damage caused by earthquakes. Their reasoning was based on the fact that the RFC approach had the highest F1 score.

Roxane et al. [6] classified big earthquake events using eight machine learning methods, comparing negative and positive outputs by Mean Absolute Error. SVM, KNN, MLP, and Random Forest classify the majority of outputs properly and produce the fewest false outputs.

III. METHODOLOGY
A. Data Collection :
The data-set used for model training in this work is acquired from the U.S. Geological Survey website. The USGS [9] data-set is one of the most reliable, accurate and widely used data-sets. It contains 12575 records and 22 features. Out of the 22 features, 8 are in categorical format and the other 14 are in continuous format.

B. Data Pre-processing
We focused only on records from the "California" region, which reduced the data-set to 1170 records. Features such as latitude, longitude, time, errors, location id, etc., which did not contribute to the analysis, were eliminated from the data-set.

The final selected features are listed in Table I:

TABLE I. FEATURE SELECTION
Features   Description
Depth      Distance from origin of earthquake to surface of Earth
Mag        The size of earthquake
Nst        Number of seismic stations
Gap        The records where seismic activity is more
Dmin       Horizontal distance to the surface of Earth
Status     Automatic or reviewed detection

In the condensed data-set, "status" was the only categorical feature, and using label encoding, its values were changed to 0 and 1, where 0 denotes reviewed and 1 denotes automatic detection of an earthquake.

Data-sets of this kind can contain a considerable number of missing values in continuous features. We checked for them using a count-plot function, but no null values were observed.

In the next step, using the standard deviation approach, we looked for outliers and none were found.

C. Model Selection :
1) Logistic Regression :
Logistic regression is an algorithm used in machine learning for binary classification problems. It models the relationship between a binary dependent variable and one or more independent variables and estimates the likelihood that the dependent variable belongs to a particular class [10].

Using the logistic function, the logistic regression formula transforms a linear combination of the independent variables into a probability value. The formula is given in equation (1):

$$P(y) = \frac{1}{1 + e^{-z}} \tag{1}$$

Where: $P(y) = P(y = 1 \mid x)$ is the probability of the dependent variable $y$ taking the value 1 given the independent variables $x$.

$z$ is the linear combination of the independent variables and their associated weights, expressed as $z = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$.

$b_0$ is the intercept and $b_1, b_2, \dots, b_n$ are the weights assigned to each independent variable.

$e^{-z} = \exp(-z)$, where $\exp()$ is the exponential function.

2) Random Forest :
Random forest is a method for classification based on ensemble learning. It builds a large number of decision trees, each of which is trained on a distinct subset of the data and features. While classifying the test data, it combines the decisions made by the individual classifiers and applies majority voting. It has high precision and low overfitting [10]. While constructing each decision tree, it uses a criterion known as the Gini index, which scores the quality of a candidate split, as shown in equation (2):

$$\mathrm{Gini} = 1 - \sum_{i=1}^{c} (p_i)^2 \tag{2}$$

Here $c$ is the number of classes and $p_i$ is the proportion of samples belonging to class $i$.

3) Decision Tree :
Decision trees are a popular machine learning algorithm for classification and regression tasks. By dividing the data into subsets based on the feature values, a decision tree models the relationship between a dependent variable and one or more independent variables [12].

IV. RESULTS & DISCUSSIONS
Here three different machine learning models, namely Logistic Regression, Random Forest and Decision Tree, are used for predicting the type of the earthquake. Test data is used for the evaluation of the models. The size of the test
data is 20% of the total data set which is 215. The efficiency of the models is measured using different performance parameters such as accuracy, precision, recall and F1-score [15], as shown in Table II. Accuracy is a measure of how well a model or method can correctly predict or classify data. Precision is the fraction of predicted positives that are actually positive. Recall is the fraction of actual positives that the model correctly identifies [16]. The mathematical expressions for Accuracy, Precision and Recall are given in equations (3), (4) and (5).

The confusion matrices [14] of all three algorithms are given in Table III.

$$\mathrm{Accuracy} = \frac{a + d}{a + b + c + d} \tag{3}$$

$$\mathrm{Precision} = \frac{a}{a + b} \tag{4}$$

$$\mathrm{Recall} = \frac{a}{a + c} \tag{5}$$

Where,
a = True Positive
b = False Positive
c = False Negative
d = True Negative

TABLE II. PERFORMANCE ANALYSIS
Algorithm             Accuracy   Precision   Recall
Logistic Regression   92.3%      93%         92%
Random Forest         92.7%      93%         92%
Decision Tree         87.1%      93%         93%

TABLE III. CONFUSION MATRIX
Algorithm             Fn   Tp    Tn   Fp
Logistic Regression   18   215   1    0
Random Forest         17   215   2    0
Decision Tree         14   199   5    16

In the case of a real-world crisis, such as an earthquake, the number of False Negatives should be as low as possible in order to save people's lives. Thus, in terms of False Negatives the Decision Tree algorithm is better, but when all other parameters are considered, Random Forest is more effective.

V. CONCLUSION
RF and DT are utilised in the creation of the earthquake prediction system presented in this work. When all aspects of the earthquake are taken into consideration, RF performs exceptionally well in comparison to the other algorithms. The DT model has shown improved performance after taking into account the severe effects of an earthquake. The accurate prediction results of RF have implications for real-time systems.

REFERENCES
[1] T. J. Roy, M. A. Mahmood and D. Roy, "A Machine Learning Model to Predict Earthquake Utilizing Neural Network," 2021 International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2), Rajshahi, Bangladesh, 2021, pp. 1-4, doi: 10.1109/IC4ME253898.2021.9768454.
[2] R. Li, X. Lu, S. Li, H. Yang, J. Qiu and L. Zhang, "DLEP: A Deep Learning Model for Earthquake Prediction," 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 2020, pp. 1-8, doi: 10.1109/IJCNN48605.2020.9207621.
[3] B. Bhargava and S. Pasari, "Earthquake Prediction Using Deep Neural Networks," 2022 8th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 2022, pp. 476-479, doi: 10.1109/ICACCS54159.2022.9785011.
[4] W. Li, N. Narvekar, N. Nakshatra, N. Raut, B. Sirkeci and J. Gao, "Seismic Data Classification Using Machine Learning," 2018 IEEE Fourth International Conference on Big Data Computing Service and Applications (BigDataService), Bamberg, Germany, 2018, pp. 56-63, doi: 10.1109/BigDataService.2018.00017.
[5] A. Gaba, A. Jana, R. Subramaniam, Y. Agrawal and M. Meleet, "Analysis and Prediction of Earthquake Impact-a Machine Learning approach," 2019 4th International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS), Bengaluru, India, 2019, pp. 1-5, doi: 10.1109/CSITSS47250.2019.9031026.
[6] R. Mallouhy, C. A. Jaoude, C. Guyeux and A. Makhoul, "Major earthquake event prediction using various machine learning algorithms," 2019 International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), Paris, France, 2019, pp. 1-7, doi: 10.1109/ICT-DM47966.2019.9032983.
[7] https://en.wikipedia.org/wiki/Earthquake
[8] https://en.wikipedia.org/wiki/Seismology
[9] https://www.usgs.gov/
[10] https://realpython.com/logistic-regression-python/
[11] https://www.analyticsvidhya.com/blog/2021/06/understanding-random-forest/
[12] https://scikit-learn.org/stable/modules/tree.html
[13] https://machinelearningmastery.com/gentle-introduction-long-short-term-memory-networks-experts/
[14] https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
[15] https://deepai.org/machine-learning-glossary-and-terms/f-score
[16] https://www.learndatasci.com/glossary/binary-classification/
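As a worked check of equations (3), (4) and (5) in the Results section, the sketch below applies them to the confusion-matrix counts of Table III. The function names and the dictionary layout are illustrative assumptions, not part of the paper; only the counts and the a, b, c, d definitions are taken from it.

```python
def accuracy(a, b, c, d):
    """Equation (3): a = TP, b = FP, c = FN, d = TN."""
    return (a + d) / (a + b + c + d)


def precision(a, b):
    """Equation (4): true positives over all predicted positives."""
    return a / (a + b)


def recall(a, c):
    """Equation (5): true positives over all actual positives."""
    return a / (a + c)


# Counts copied from Table III (hypothetical dictionary layout).
TABLE_III = {
    "Logistic Regression": {"fn": 18, "tp": 215, "tn": 1, "fp": 0},
    "Random Forest": {"fn": 17, "tp": 215, "tn": 2, "fp": 0},
    "Decision Tree": {"fn": 14, "tp": 199, "tn": 5, "fp": 16},
}

for name, m in TABLE_III.items():
    acc = accuracy(m["tp"], m["fp"], m["fn"], m["tn"])
    prec = precision(m["tp"], m["fp"])
    rec = recall(m["tp"], m["fn"])
    total = sum(m.values())
    print(f"{name}: n={total}, accuracy={acc:.1%}, "
          f"precision={prec:.1%}, recall={rec:.1%}")
```

With these counts, equation (3) gives accuracies of about 92.3%, 92.7% and 87.2%, in line with the accuracy column of Table II up to rounding. Each row of Table III sums to 234 records. The precision and recall values from equations (4) and (5) are computed for the positive class only, so they need not coincide with the precision and recall columns of Table II if those were averaged over both classes, which the paper does not state.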

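Equations (1) and (2) in the Model Selection section can also be evaluated directly. The following is a minimal sketch under stated assumptions: the function names, the example weights and the example feature values are hypothetical and are not taken from the trained models in the paper.

```python
import math


def logistic_probability(weights, bias, features):
    """Equation (1): P(y = 1 | x) = 1 / (1 + exp(-z)), with z = b0 + sum(b_i * x_i)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Algebraically identical form that avoids overflow for very negative z.
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def gini_index(labels):
    """Equation (2): Gini = 1 - sum over classes of p_i squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Hypothetical weights for two features, for example depth and magnitude.
p = logistic_probability(weights=[0.02, 0.8], bias=-3.0, features=[10.0, 4.5])
print(f"P(y = 1 | x) = {p:.3f}")

# A node holding two samples of each class is maximally impure for two classes.
print(gini_index([0, 0, 1, 1]))  # 0.5
# A node holding a single class is pure.
print(gini_index([1, 1, 1]))  # 0.0
```

A tree-building algorithm would compute the Gini index for the child nodes of each candidate split and prefer the split whose children are least impure.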