Use of Spatio-temporal Features for Earthquake Forecasting of Imbalanced Data


2022 International Conference on Intelligent Innovations in Engineering and Technology (ICIIET) | 978-1-6654-5653-1/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICIIET55458.2022.9967687

Aaditya Sharma, Arnav Ahuja, Sonu Devi, Sumanta Pasari


Department of Mathematics,
Birla Institute of Technology and Science,
Pilani, India

Email: [email protected] [email protected] [email protected] [email protected]

Abstract: Improvements in instrumentation for precisely recording seismic activity are steadily raising the quality of seismic data and yielding more informative data sets. These data sets possess temporal and geospatial patterns that can be extracted by feature engineering of temporal and geospatial factors. However, the less frequent large-magnitude earthquakes often create an imbalance in earthquake data. In this study, we propose three machine learning-based, algorithm-level techniques that transform time-series earthquake data into an equivalent data set with temporal and geospatial features to treat the magnitude class imbalance. Results from several study regions, including the Himalayas, Central Java, Sulawesi, Sumatra, and Southeast Asia, are compared to discuss the efficacy of the proposed algorithms. Accuracy, precision, and F1 score are used as evaluation metrics. The present work thus provides a formulation for using machine learning algorithms on imbalanced data in earthquake forecasting.

Index Terms: Class imbalance, Earthquake forecasting, Feature engineering, Machine learning, Spatio-temporal features

I. INTRODUCTION

In recent years, various scientific endeavors have been made to predict the time of occurrence, size, and epicentral position of future large earthquakes. Several methods, including statistical and modeling approaches, have been implemented for this purpose. These approaches are based on either the examination of previous earthquake events captured in seismic catalogs or the investigation of precursory events preceding earthquakes, such as variations in electromagnetic signals, seismogenic activity, and animal behavior. Existing methodologies for earthquake forecasting may be classified into two broad categories: long-term and short-term earthquake forecasting. The time scales involved are either years or days [1]. In earthquake forecasting, three outcomes usually need to be known: magnitude, region, and time, at least one of which should satisfy the conditions for an earthquake [2], [3]. Most earthquake forecasting algorithms rely on precursors, such as thermal anomalies, variations in seismicity patterns, strain rates, seismic waveforms, radon concentrations in groundwater, soil, or air, and electromagnetic changes on or above the surface of the Earth [4]. Early warning systems predict earthquakes based on the arrival time of P-waves. Most of the energy released by an earthquake is carried by S-waves, which arrive after the much faster P-waves; P-waves cause less damage than S-waves. As a consequence, predicting earthquake magnitude from P-waves helps prevent property loss [5]. For long-term (days to years) earthquake prediction, however, statistical techniques based on past seismic activity along with knowledge of the geophysical system are preferred. The epidemic-type aftershock model is a classic example of such consideration [6]. Here we propose a machine learning setup to carry out earthquake forecasting at several test sites.

The remainder of the paper is organized as follows. In Section II, we provide a brief literature survey. In Section III, we describe the dataset and the proposed methodology, while in Section IV, we discuss the results. Finally, in Section V, we summarize our work, followed by the conclusion and the future scope of this study.

II. RELATED WORKS

A. Early warning systems

Earthquake detection based on seismic waves as a precursor can help in early warning systems. Visual inspection of P-wave arrival time is the most reliable technique in earthquake detection [5]. Such a methodology involves a heavy amount of pattern recognition. The majority of P-wave-based detection systems are based on the ratio of the short-time average (STA) to the long-time average (LTA) [7]. As these algorithms produce a large number of false positives, they often require human supervision. However, in the case of early warning systems, not only the detection time but also the precision is crucial to avoid false negatives. In such cases, the STA/LTA ratio and template matching are the most preferred algorithms for early earthquake detection. The STA/LTA ratio is a general and efficient approach, though it is noise sensitive, insensitive to small events, prone to false positives, and insensitive to events occurring close to each other [8]. Therefore, the use of a CNN-RNN model for detecting earthquakes from waveform data has been suggested in previous studies [9].

B. Feature engineering

Feature extraction or feature engineering is a technique to extract meaningful information from raw data. Ahuja and

Pasari [7] discussed temporal feature extraction from the earthquake catalog by calculating various seismic indicators for every event. They essentially converted a time-series problem into a regular machine learning problem. The seismic indicators were based on the Gutenberg-Richter law, which describes the frequency-magnitude power-law distribution within a specific geographic and temporal boundary. Although the seismicity indicators captured a majority of the earthquake information, the study did not incorporate spatial information. A novel approach to developing a feature that incorporates the distance of events from seismological faults was provided by Yousefzadeh et al. [10]. They used kernel density estimation to calculate fault density from fault location data for all regions of interest. Kernel density estimation is a non-parametric estimation method that usually deals with stochastic datasets [3]. The method (Fig. 1) is useful in a variety of applications, especially in handling geospatial datasets related to road accidents, crime spots, and leakage in gas and oil pipelines [11]-[12]. The estimator function for kernel density estimation is given by equation 1:

$$f(x) = \sum K\left(\frac{x - \text{observation}}{\text{bandwidth}}\right) \qquad (1)$$

Fig. 1. Kernel density estimation on road accident data [10]

To calculate the optimal radius of the kernel function for density estimation, the bivariate Moran's index is often used. Along with spatial parameters such as fault density, temporal parameters are also utilized [13]. For example, Reyes et al. [11] used various features based on the b-value, Bath's law, and the Omori-Utsu law to predict earthquakes in Chile. Indicators like the b-value are crucial because they indicate the tectonics and geophysical properties of pressure variation of rocks and solids. Because a large number of earthquakes occur near geological faults, fault density would be an important parameter [9]. Thus, in this work, we present a methodology to calculate fault density and study its effect on the proposed models. We calculate fault density by applying kernel density estimation to global fault data, and we compute the radius (bandwidth) of the kernel density estimation function using the bivariate Moran's index such that the correlation between the distance from a fault and the event magnitude is maximized [9].

C. Deep learning in earthquake forecasting

With advancements in earthquake catalogs and recording techniques, the scope for algorithms that require extensive datasets is increasing in the domain of earthquake forecasting.

Huang et al. [14] used deep learning with image datasets, where historical seismic data were projected onto images to predict the possibility of an earthquake occurring in Taiwan within a certain period. Jena et al. [15] discussed the use of a convolutional neural network (CNN) along with explicitly defined features, such as distance from faults, fault density, and slope angle, in various regions of the Indian subcontinent. These two approaches involve implicit and explicit feature extraction, respectively. A combination of the two types of feature extraction was discussed by Li et al. [16], where precursory features were explicitly defined, a CNN model was used for implicit feature extraction, an attention-based strategy was used to combine these features, and a dynamic loss function was used to tackle the challenge of class imbalance in seismic data. Earthquakes depend on spatial and temporal parameters, and thus their prediction based on historical data only is not enough. Wang et al. [17] discussed the use of a long short-term memory (LSTM) neural network to learn the spatio-temporal relationship of earthquakes. Two-dimensional input was used to make predictions using spatial and temporal features.

D. Machine learning models on imbalanced data

Most machine learning algorithms assume well-represented classes and concentrate on accuracy as a metric. However, in real-world problems, imbalanced datasets, such as those in medical statistics, clinical trials, and credit card fraud data, are inevitable. Such imbalance harms machine learning algorithms and often causes overfitting. In such a scenario, the models tend to report good accuracy on majority classes but fail to perform on minority classes [18]. In this study, the minority classes are the high-magnitude earthquakes, which play crucial roles in city planning, disaster management, and socioeconomic studies. In this regard, a popular approach with Support Vector Machines (SVMs) is to bias the algorithm toward minority classes by increasing the cost of misclassifying minority classes [19]. Similarly, Random Forest (RF) classifiers with cost-sensitive learning and sampling techniques have been studied in the past and have produced acceptable results [20]. Previous works also suggest using balanced random trees and weighted random forests as cost-sensitive methods for imbalanced datasets [20]. Deep neural networks are sensitive to imbalanced data and may overfit due to class imbalance; hence, the use of simpler algorithms with algorithm-level adjustments for imbalanced data is explored in this research.

III. DATASET AND METHODOLOGY

A. Dataset

Earthquake data from five different regions are used to test the proposed approach. These regions are the Himalayas; Central Java, Sulawesi, and Sumatra in Indonesia; and Southeast Asia. A sample earthquake catalog is shown in Table I. Each dataset contains 20 parameters. The training-to-test ratio was 80%:20% for all the models.
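Fault density, which equation 1 defines through kernel density estimation, is one of the geospatial features attached to each catalog event. The following is a minimal sketch of that estimator, not the authors' implementation. It assumes a Gaussian kernel, planar Euclidean distance in degrees between an epicentre and each fault point, and no normalisation constant (equation 1 as printed has none). The function names, the coordinates, and the bandwidth value are hypothetical; the paper selects the bandwidth with the bivariate Moran's index, which is not reproduced here.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u); the kernel choice is an assumption."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def fault_density(epicentre, fault_points, bandwidth):
    """Sum K(distance / bandwidth) over all fault points, as in equation 1.

    Distances are planar Euclidean distances in degrees (an assumption;
    a geodesic distance may be more appropriate for real catalogs).
    """
    total = 0.0
    for fault in fault_points:
        distance = math.dist(epicentre, fault)
        total += gaussian_kernel(distance / bandwidth)
    return total


# Hypothetical (lat, long) fault samples and two hypothetical epicentres.
faults = [(28.0, 81.0), (28.1, 81.1), (28.2, 80.9)]
near_density = fault_density((28.1, 81.0), faults, bandwidth=0.5)
far_density = fault_density((35.0, 74.0), faults, bandwidth=0.5)
```

An epicentre close to the sampled fault points receives a much larger density than a distant one, which is the behaviour the fault-density feature is meant to capture.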

TABLE I
EXAMPLE OF EARTHQUAKE CATALOG

EVENTID     DATE        TIME      LAT    LONG   DEPTH  MAG
589363      1982-11-21  08:10:21  28.80  81.05  10.0   3.6
495770      1986-05-16  05:16:13  34.00  72.58  15.0   4.0
490430      1986-07-10  07:56:12  34.15  72.69  02.0   4.5
...         ...         ...       ...    ...    ...    ...
617005962   2019-12-31  00:46.57  35.40  74.80  10.0   3.3
617439537   2019-12-31  04:55.79  35.53  74.46  11.0   3.1

Seismic data from the Himalayan region (1963-2020) consist of 9,320 data points with magnitudes (ML) ranging from 3.5 to 7.6. Similarly, the earthquake data from Southeast Asia (1963-2020) comprise 65,000 data points. The Central Java dataset (1963-2020) has 8,000 data points, the Sulawesi region includes 28,000 data points, and the Sumatra catalog (1963-2020) consists of 22,000 data points. These datasets are time-series data. We apply a feature extraction technique to transform each dataset into discrete data points with temporal and geospatial features. To calculate the fault density of the region, fault location data were fitted into the kernel density estimation function for each epicentral location of the catalog. We use the GEM Global Active Fault Database (GEM GAF-DB), which includes the majority of the deforming continental regions in the world (Fig. 2) [21].

Fig. 2. GEM Global Active Fault Database (GEM GAF-DB)

B. Machine learning algorithms

The earthquake catalog, in the form of time-series data, is transformed into another data set with various temporal features. Machine learning algorithms are then implemented, and their robustness on imbalanced data sets is checked using various evaluation metrics.

1) Support Vector Machines (SVMs): An SVM aims to find linearly separating hyperplanes that divide the data set into different classes. In the ideal scenario, the hyperplanes will be at a maximum distance from the nearest points of distinct classes [22]. Though SVMs produce good results on balanced data sets, they provide sub-optimal results on imbalanced data sets. Hence, we have considered a weighted SVM to separate data classes. Other methods, such as z-SVM, fuzzy SVM, one-class learning, and kernel alignment modification, also exist for imbalanced learning [23].

2) Decision Trees (DT): A decision tree is a hierarchical model comprising decision rules that recursively divide the independent variables into homogeneous groups [24]. Using the obtained decision rules, one can assign an output class for given input parameters. In the training step, the model tries to gain maximum information with the minimum entropy in the tree subgroups [25]. All data are first gathered in a root node and are subsequently divided into various subgroups based on their parameter values. The DT model classifies data points in branch-like segments to construct an inverted tree [26].

3) K-Nearest Neighbours (KNN): KNN can serve as a regression as well as a classification algorithm. As a classification algorithm, it decides the class of a new sample from the known classes of its K nearest neighbours [27]. For imbalanced datasets, however, the distance-weighted KNN, which assigns weights to each of the K nearest neighbours, is often recommended [27].

Three other algorithms, namely AdaBoost (AB), Easy Ensemble Classifier (EEC), and AdaCost (AC), were considered in the preliminary step. However, they were eventually rejected due to their poor performance on imbalanced data sets. AdaBoost's learning is observed to be biased toward the majority classes, as they contribute more to the overall classification accuracy. The EEC tries to solve the class imbalance problem by under-sampling, which often reduces the size of the data set and results in overfitting. Similarly, the AC model utilizes misclassification costs to update the training distribution on successive rounds of boosting. Although it shows a lower cumulative misclassification cost than AB, finding a heuristic approach to determine the cost matrix was very challenging [28].

C. Evaluation

We have implemented the above-mentioned algorithms for a 7-day earthquake prediction (class-wise) in the region. The efficiency of the models on four selected data sets is appraised using three evaluation criteria: accuracy, F1 score, and class-wise precision. Unlike accuracy, precision and recall are less sensitive to the distribution of classes [29]. As we have a multi-class output, we report the average value for each classification metric. The accuracy is given by equation 2:

$$\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}} \qquad (2)$$

The precision is given by equation 3:

$$\text{Precision} = \frac{\text{True Positives}}{\text{Total Predicted Positives}} \qquad (3)$$
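To make the distance-weighted KNN of Section III-B and the metrics of equations 2 and 3 concrete, the sketch below classifies a few toy points and scores the predictions. It is an illustration under stated assumptions rather than the authors' code: Euclidean distance, inverse-distance vote weights, k = 3, and the two-dimensional toy features and labels are all hypothetical, since the paper does not state these details. Precision is averaged over classes (macro average), in line with the statement that the average value is reported for each metric.

```python
import math
from collections import defaultdict


def knn_predict(train_X, train_y, query, k=3, eps=1e-9):
    """Distance-weighted KNN: each of the k nearest neighbours votes 1/distance."""
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = defaultdict(float)
    for distance, label in neighbours[:k]:
        votes[label] += 1.0 / (distance + eps)  # eps avoids division by zero
    return max(votes, key=votes.get)


def accuracy(y_true, y_pred):
    """Equation 2: correct predictions divided by total predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_precision(y_true, y_pred):
    """Equation 3 evaluated per class, then averaged over classes."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        predicted_c = sum(p == c for p in y_pred)
        true_pos = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        per_class.append(true_pos / predicted_c if predicted_c else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical imbalanced toy data: class 0 is the majority class.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (6, 5)]
train_y = [0, 0, 0, 0, 1, 1]
test_X = [(0.5, 0.5), (5.5, 5.0), (4.0, 4.0), (2.0, 2.0)]
y_true = [0, 1, 1, 1]

preds = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
acc = accuracy(y_true, preds)
prec = macro_precision(y_true, preds)
```

On this toy set the last test point is misclassified as the majority class, so accuracy and macro precision both come out at 0.75. A per-class breakdown (0.5 for class 0, 1.0 for class 1) shows where the error falls, which is why class-wise metrics are informative on imbalanced data.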

The F1 score is given by equation 4:

$$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)$$

IV. RESULTS

Here we report the results of the three implemented models corresponding to the following earthquake classes (category 0 to category 3).

1) Category 0:
2) Category 1: 3
3) Category 2: 4
4) Category 3:

Evaluation metrics for the weighted SVM, distance-weighted KNN, and weighted DT for the Himalayan, Central Java, Sumatra, and Sulawesi regions are given in Table II, Table III, Table IV, and Table V, respectively.

A. The Himalayan region

TABLE II
RESULTS FOR THE HIMALAYAN DATASET

Algorithm\Metric         Accuracy  Precision  F1 Score
Weighted SVM             68.20%    65.88%     0.63
Distance weighted KNN    86.13%    92.81%     0.89
Weighted Decision Trees  85.17%    85.50%     0.85

B. Central Java region (Indonesia)

TABLE III
RESULTS FOR THE CENTRAL JAVA DATASET

Algorithm\Metric         Accuracy  Precision  F1 Score
Weighted SVM             82.39%    79.64%     0.80
Distance weighted KNN    90.61%    96.01%     0.93
Weighted DT              89.96%    90.17%     0.90

C. Sumatra region (Indonesia)

TABLE IV
RESULTS FOR THE SUMATRA DATASET

Algorithm\Metric         Accuracy  Precision  F1 Score
Weighted SVM             75.56%    71.28%     0.70
Distance weighted KNN    94.13%    97.38%     0.95
Weighted DT              91.91%    91.87%     0.92

D. Sulawesi region (Indonesia)

TABLE V
RESULTS FOR THE SULAWESI DATASET

Algorithm\Metric         Accuracy  Precision  F1 Score
Weighted SVM             69.06%    64.23%     0.62
Distance weighted KNN    91.93%    97.25%     0.94
Weighted DT              90.41%    90.41%     0.90

E. Summary of results

• The distance-weighted KNN is observed to perform best among all three algorithms based on the accuracy, precision, and F1 score metrics.
• The weighted SVM shows lower accuracy than the other algorithms across all data sets.
• All three algorithms have lower accuracy and precision for the Himalayan data set, probably owing to the smaller number of data points.
• The distance-weighted KNN shows less variability in accuracy and precision across all data sets. Thus, it can be said that the distance-weighted KNN has the least sensitivity to data imbalance.

V. CONCLUSION AND FUTURE SCOPE

Reliable forecasting of large earthquakes is one of the greatest challenges in earthquake disaster management. The earthquake catalog is often non-uniform (the class imbalance problem) in the sense that the majority of events are of lesser magnitude. Hence, the class imbalance problem needs to be addressed. Therefore, in addition to the conventional machine learning methods of earthquake forecasting, the need for machine learning algorithms that solve the class imbalance problem is increasing day by day. In this study, we have implemented three machine learning-based techniques, namely weighted support vector machines, distance-weighted K-nearest neighbors, and weighted decision trees, to carry out seven-day-ahead earthquake prediction (class-wise) at five selected study regions. The inclusion of a geospatial parameter, namely the fault density, is one of the unique features of this study. We essentially transformed time-series earthquake data into an equivalent data set with temporal and geospatial features to treat the magnitude class imbalance. The results from five different study sites reveal that the distance-weighted KNN is the most preferable algorithm to achieve high accuracy and high recall for large-magnitude earthquakes.

As a further improvement of the present study, XGBoost (XGB) with weighted and focal losses can be implemented to deal with the class imbalance problem [30]. From the present analysis, we conclude that the problem of imbalanced learning in earthquake prediction based on temporal and geospatial seismic features can be resolved using algorithm-level adjustments. The proposed algorithm may promote several practical applications that deal with the preservation of life and property (e.g., earthquake insurance schemes, disaster management, and infrastructure planning) by forecasting large earthquakes in a given region. In future work, deep learning models for geospatial and temporal data may be considered to improve the accuracy of earthquake prediction.

ACKNOWLEDGEMENTS

Constructive comments and suggestions from two anonymous reviewers are duly acknowledged. Partial financial support was available from a DST-SERB project under the MATRICS scheme (file no: MTR/2021/000458). Sonu Devi acknowledges BITS Pilani for the research fellowship.

REFERENCES

[1] Y. Kagan, D. Jackson, and A. Helmstetter, "Comparison of short term and long term earthquake," Bulletin of the Seismological Society of America, 2006.
[2] N. Grafeeva and A. Galkina, "Machine Learning Methods for Earthquake," SEIM, 2019.
[3] T. K. Anderson, "Kernel density estimation and K-means clustering to profile road accident hotspots," Accident Analysis & Prevention, 2009.
[4] T. H. Jordan, Y. T. Chen, P. Gasparini, R. Madariaga, I. Main, W. Marzocchi, G. Papadopoulos, G. Sobolev, K. Yamaoka, and J. Zschau, "Operational Earthquake Forecasting: State of Knowledge and Guidelines for Implementation," Annals of Geophysics, 2011.
[5] G. Galasso and Cremen, "Earthquake early warning: Recent advances and perspectives," Earth-Science Reviews, 2020.
[6] A. A. Azim, M. S. Soliman, H. Yayama, and A. G. Hafez, "Real-time P-wave picking for earthquake early warning system using discrete wavelet transform," NRIAG Journal of Astronomy and Geophysics, 2019.
[7] G. Madureira, G. M. Ruano, H. R. Khosravani, O. Barros, and A. E. Ruano, "A Support Vector Machine Seismic Detector for Early-Warning Applications," 3rd IFAC International Conference on Intelligent Control, 2013.
[8] A. Ahuja and S. Pasari, "Chapter 10: Earthquake Forecasting in the Himalayas Artificial Neural Networks," Disaster Management in the Complex Himalayan Terrains, 2022.
[9] S. M. Mousavi, W. Zhu, Y. Sheng, and G. C. Beroza, "CRED: A Deep Residual Network of Convolutional and Recurrent Units for Earthquake Signal Detection," Scientific Reports, 2018.
[10] S. A. Yousefzadeh, Hosseini, and Farnaghi, "Spatio-temporally explicit earthquake prediction using deep neural network," Soil Dynamics and Earthquake Engineering, 2021.
[11] J. Reyes, A. Morales-Esteban, and F. Martínez-Álvarez, "Neural networks to predict earthquakes in Chile," Applied Soft Computing, 2013.
[12] A. Okabe, Satoh, and Sugihara, "A kernel density estimation method for networks, its computational method, and a GIS-based tool," International Journal of Geographical Information Science, 2008.
[13] A. Z. Zambom and R. Dias, "A Review of Kernel Density Estimation with Applications to Econometrics," A Review of Kernel Density Estimation with Applications to Econometrics, 2013.
[14] J. Huang, X. Wang, Y. Zhao, C. Xin, and H. Xiang, "Large earthquake magnitude prediction in Taiwan based on deep neural network," Neural Network World, 2018.
[15] R. Jena, B. Pradhan, A. Al-Amri, C. W. Lee, and H. Park, "Earthquake Probability Assessment for the Indian Subcontinent Using Deep Learning," Sensors, 2020.
[16] R. Li, X. Lu, S. Li, H. Yang, J. Qiu, and L. Zhang, "DLEP: A Deep Learning Model for Earthquake Prediction," 2020 International Joint Conference on Neural Networks (IJCNN), 2020.
[17] Q. Wang, Y. Guo, L. Yu, and P. Li, "Earthquake Prediction based on Spatio-Temporal Data Mining: An LSTM Network Approach," IEEE Transactions on Emerging Topics in Computing, 2017.
[18] V. Ganganwar, "An overview of classification algorithms for imbalanced datasets," International Journal of Emerging Technology and Advanced Engineering, 2012.
[19] R. Akbani, S. Kwek, and Japkowicz, "Applying Support Vector Machines," European Conference on Machine Learning, 2004.
[20] A. Chen, Liaw, and Breiman, "Using Random Forest to Learn Imbalanced Data," University of California, Berkeley, 2004.
[21] S. Richard and M. Pagani, Earthquake Spectra, 2020.
[22] E. N. Bui, "Machine learning in the Australian critical zone," Data Science Applied to Sustainability Analysis, 2021.
[23] R. Batuwita and V. Palade, "Class Imbalance Learning Methods for Support Vector Machines," Imbalanced Learning: Foundations, Algorithms, and Applications, 2013.
[24] J. H. Cho and P. U. K., "Decision tree approach for classification and dimensionality reduction of electronic noise data," Sensor Actuator B Chem, 2011.
[25] A. J. Myles, R. N. Feudale, N. A. Liu, and Woody, "An introduction to decision tree modeling," J Chemometr: A Journal of the Chemometrics Society, 2004.
[26] Y. Y. Song and Lu, "Decision tree methods: applications for classification and prediction," Shanghai Arch Psychiatry, 2015.
[27] G. Batista and D. Silva, "How K-Nearest Neighbor Parameters Affect its performance?" 38° JAIIO - Simposio Argentino de Inteligencia Artificial, 2009.
[28] W. Fan, S. J. Stolfo, J. Zhang, and P. K. Chan, "AdaCost: Misclassification Cost-Sensitive Boosting," Proceedings of the Sixteenth International Conference on Machine Learning (ICML'99), 1999.
[29] H. He and E. A. Garcia, "Learning from Imbalanced Data," IEEE Transactions on Knowledge and Data Engineering, 2009.
[30] C. Wang, S. Deng, and Wang, "Imbalance-XGBoost: Leveraging weighted and focal losses for binary label-imbalanced classification with XG-Boost," Pattern Recognition Letters, 2020.

Authorized licensed use limited to: BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on November 30,2024 at 12:36:58 UTC from IEEE Xplore. Restrictions apply.
