Earthquake Prediction Based on Spatio-Temporal Data Mining: An LSTM Network Approach
ABSTRACT Earthquake prediction is a very important problem in seismology, the success of which can potentially save many human lives. Various kinds of technologies have been proposed to address this problem, such as mathematical analysis, machine learning algorithms like decision trees and support vector machines, and precursor signal study. Unfortunately, they usually do not achieve good results due to the seemingly dynamic and unpredictable nature of earthquakes. However, we observe that earthquakes are spatially and temporally correlated because of crustal movement. Therefore, earthquake prediction for a particular location should not be based only on the historical data of that location, but on the historical data of a larger area. In this paper, we employ a deep learning technique called long short-term memory (LSTM) networks to learn the spatio-temporal relationship among earthquakes in different locations and make predictions by taking advantage of that relationship. Simulation results show that the LSTM network with two-dimensional input developed in this paper is able to discover and exploit the spatio-temporal correlations among earthquakes to make better predictions than previous approaches.
INDEX TERMS Earthquake prediction, spatio-temporal data mining, LSTM
I. INTRODUCTION
Earthquakes are one of the most destructive natural disasters. They usually occur without warning and do not allow much time for people to react. Therefore, earthquakes can cause serious injuries and loss of life and destroy a tremendous number of buildings and infrastructure, leading to great economic loss. The prediction of earthquakes is obviously critical to the safety of our society, but it has proven to be a very challenging issue in seismology [1].
Existing works on earthquake prediction can be mainly classified into four categories according to the employed methodologies, i.e., 1) mathematical analysis, 2) precursor signal investigation, 3) machine learning algorithms like decision trees and support vector machines (SVM), and 4) deep learning. The first type of work tries to formulate the earthquake prediction problem by using different mathematical tools [2], like the FDL (Fibonacci, Dual and Lucas) method, various probability distributions, other mathematical proofs, and spatial connection theory [3]. In the second type of work, researchers study earthquake precursor signals to help with earthquake prediction. For example, electromagnetic signals [4], aerosol optical depth (AOD) [5], lithosphere-atmosphere-ionosphere coupling [6], and cloud images [7], [8] have been explored. Even animals' abnormal behavior has been taken into account in this kind of study [9]. The third type of work mainly explores data mining and time series analysis methods, such as J48, AdaBoost, multi-objective info-fuzzy network (M-IFN), k-nearest neighbors (kNN), SVM, and artificial neural networks (ANNs) [10], [11], to predict the magnitude of the largest earthquake in the next year based on the previously recorded seismic events in the same region. In the fourth type of work, deep learning algorithms are utilized to predict both the magnitude and the time of major seismic events. Various kinds of neural networks have been adopted, such as the multi-layer perceptron (MLP) [12], back-propagation (BP) neural network [13], feed-forward neural network (FFNN) [14], and recurrent neural network (RNN) [15], which can work under certain specific circumstances.
Although there have been many works on earthquake prediction, very few of them can predict future seismic events accurately. The reason is that the occurrence of earthquakes involves highly complex processes and depends on a large number of factors that are difficult to analyze. There are evidently complex nonlinear correlations among earthquake occurrences, which traditional mathematical, statistical, and machine learning methods cannot analyze well. Recently, deep learning methods like RNNs have been shown to be able to capture the nonlinear correlations among data [16], [17]. In particular, they are mostly used to analyze time-series data so as to make predictions. As a result, when previous works use deep learning, they predict earthquakes in a particular location based only on the historical time-series data of that location, and hence still cannot get good results. In contrast, we contend that the spatio-temporal correlations among historical earthquake data have to be investigated in order to make more accurate predictions.
To this end, in this paper we investigate earthquake prediction from a spatio-temporal perspective. Specifically, we devise an earthquake prediction scheme by adapting a long short-term memory (LSTM) network. An LSTM network is an advanced RNN with strong nonlinear learning capability, even on data containing long-interval correlations that a standard RNN is not able to capture. We treat the earthquakes in a whole area of interest (e.g., a country) as a single input element to the LSTM network. This differs from common deep learning approaches, which only consider the data in one particular location as an input. Therefore, by having a time series of such input elements, we can construct an LSTM network with two-dimensional input that can learn the correlations among earthquakes in different locations and at different times, and exploit them to make predictions. After building the LSTM network, we find that it is difficult to train well due to its high complexity and the lack of training data. We therefore decompose the original LSTM into several smaller ones to reduce the complexity and the need for larger training data sets.
Our main contributions in this paper can be summarized as follows.
• We investigate the earthquake prediction problem from a spatio-temporal perspective.
• We construct an LSTM network with two-dimensional input, which can discover the spatio-temporal correlations among historical earthquake data and exploit them to make predictions on earthquakes in a large area of interest.
• We decompose the original large LSTM network into several smaller ones, which lowers the complexity and facilitates network training.
• Simulation results show that the proposed LSTM approach achieves good performance.
The rest of this paper is organized as follows. Section II introduces the most closely related work on earthquake prediction methods. Section III describes the proposed system model for earthquake prediction. Section IV details the proposed LSTM-based scheme, which is followed by simulation results and discussions in Section V. Finally, we conclude the paper in Section VI.
II. RELATED WORK
In this section, we introduce in detail the related works on earthquake prediction, which are classified into the four categories mentioned above.
First, some works employ mathematical or statistical tools to make earthquake prediction. Kannan [3] predicts earthquake epicenters according to spatial connection theory, i.e., earthquakes occurring within a fault zone are related to one another. In particular, predictions are made by taking advantage of the Poisson range identifier function (PRI), the Poisson distribution, etc. Boucouvalas et al. [2] improve the Fibonacci, Dual and Lucas (FDL) method and propose a scheme to predict earthquakes. The scheme uses a trigger planetary aspect date prior to a strong earthquake as a seed for the unfolding of the FDL time spiral. However, these works are tested only with a very limited amount of data and do not provide good results (their success rate is low).
Second, some works predict earthquakes based on precursor signal studies. Hayakawa [4] and Jiang [18] take electromagnetic signals as the precursor of significant earthquakes. Thomas et al. [8] and Fan et al. [7] have studied satellite images of clouds before earthquakes. Akhoondzadeh and Chehrebargh [5] claim that unusual aerosol optical depth (AOD) variations before earthquakes could be introduced as an earthquake precursor. Meanwhile, Korepanov [6] proposes an earthquake precursor based on lithosphere-atmosphere-ionosphere coupling and relations. Florido et al. [19] discover precursory patterns for large earthquakes; new attributes based on the well-known b-value are also generated. In addition, Hayakawa et al. [9] study the abnormal behavior of animals about 10 days before earthquakes in order to make earthquake prediction. Unfortunately, it is difficult to draw conclusions on these precursor signals due to very limited data. Besides, these precursor signals alone usually cannot lead to satisfactory prediction results.
Third, machine learning has been employed as an important method to make earthquake prediction. Last et al. [10] compare several data mining and time series analysis methods for predicting the magnitude of the largest upcoming seismic event based on previously recorded seismic events in the same region. These methods include J48, AdaBoost, information network (IN), multi-objective info-fuzzy network (M-IFN), k-nearest neighbors (k-NN), and SVM. Besides, prediction features based on the Gutenberg-Richter ratio, as well as some new seismic indicators, are shown to be much more useful than those traditionally used in the earthquake prediction literature, such as the average number of earthquakes in each region. Asencio-Cortés et al. [20] study the sensitivity of the existing seismicity indicators reported in the literature by changing the input attributes and their parameterization. We notice that most machine learning methods make earthquake prediction based on seismicity indicators, where only time-domain correlations, not space-domain correlations, are studied. Moreover, traditional machine learning methods have limitations in mining data with complex nonlinear correlations.
Fourth, deep learning methods have recently been applied to earthquake prediction. Narayanakumar and Raja [21] evaluate the performance of BP neural network techniques in predicting earthquakes. They gather data with event time, latitude, longitude, depth and magnitude to convert them into
2168-6750 © 2017 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See ht_tp://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 8, NO. 1, JAN.-MAR. 2020
Authorized licensed use limited to: International Islamic University Malaysia. Downloaded on June 22, 2023 at 03:57:06 UTC from IEEE Xplore. Restrictions apply.
Wang et al.: Earthquake Prediction Based on Spatio-Temporal Data Mining: An LSTM Network Approach
we apply an activation function, which is set to the softmax function, and obtain the prediction result $x_{t+1}$.
The architecture of our system is presented in Figure 5. As mentioned above, in our system $X_t$ is a matrix of dimension $M \times L$. As shown in Figure 6, in the training process, the prediction target for the input matrix $X_t$ at time $t$ is $x_{t+1}$. In the prediction phase, $x_{t+1}$ is what needs to be predicted at time $t$. In our architecture, $h_t^L$ is the output of the LSTM layer at time $t$, which is constructed from the memory cells depicted in Figure 3. In particular, the details of our LSTM layer are shown in Figure 7, where there are $L$ memory cells, one for each time slot. The outputs of the $j$th memory cell at time $t$, i.e., $h_t^j$ and $c_t^j$, are part of the input of the next, i.e., the $(j+1)$th, memory cell. Besides, the output of the LSTM layer goes to a dense layer whose output is denoted by $h_t^D$. In the following, we describe in detail what happens after the LSTM layer.
2) DROPOUT
To prevent our system from overfitting, we apply a method called dropout to the output of the LSTM layer. An overfitted system can perform very well in training but poorly in testing. This is because, when overfitting occurs, the system fits the historical data too closely, which makes it too rigid to give satisfactory results on new input. Many works have shown that adding dropout to a neural network can efficiently prevent it from overfitting [25]. In particular, with dropout, a certain number of randomly selected nodes, along with all their connections, are temporarily turned off for each training sample. In our case, we apply dropout between the LSTM layer and the dense network, as shown in Figure 8. Since some of the nodes in the output of the LSTM layer are turned off, the system becomes, to some extent, less sensitive to any single node. Hence it avoids being “too smart”, i.e., overfitted.
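The following sketch illustrates the layer structure and dropout step described above. It chains $L$ memory cells over the columns of an $M \times L$ input matrix and applies dropout to the resulting hidden state. It is a minimal pure-Python sketch under stated assumptions, not the authors' implementation:

- The cell uses the standard LSTM gate equations; the cell of Figure 3 is not reproduced in this section.
- Each column of $X_t$ is assumed to hold the $M$ sub-region values for one time slot.
- The helper names (`lstm_cell`, `lstm_layer`, `dropout`, `init_params`) are hypothetical.
- Inverted-dropout rescaling by $1/(1-\text{rate})$ is our choice.

```python
import math
import random

# Minimal sketch, not the authors' code: standard LSTM gates are assumed, and
# every helper name here is hypothetical.


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, x, b):
    """Return W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def lstm_cell(x, h, c, p):
    """One standard LSTM step: the gates read the input x and the previous h."""
    z = x + h  # list concatenation [x; h]
    f = [sigmoid(v) for v in affine(p["Wf"], z, p["bf"])]    # forget gate
    i = [sigmoid(v) for v in affine(p["Wi"], z, p["bi"])]    # input gate
    o = [sigmoid(v) for v in affine(p["Wo"], z, p["bo"])]    # output gate
    g = [math.tanh(v) for v in affine(p["Wc"], z, p["bc"])]  # candidate cell state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def lstm_layer(X, p, hidden):
    """Chain L memory cells, one per column (time slot) of the M x L matrix X;
    cell j hands its (h, c) to cell j + 1. Returns the last h (h_t^L)."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for j in range(len(X[0])):
        column = [row[j] for row in X]  # the M sub-region values in slot j
        h, c = lstm_cell(column, h, c, p)
    return h


def dropout(h, rate, rng, training=True):
    """Inverted dropout: during training, zero each unit with probability
    `rate` and scale the survivors by 1 / (1 - rate); identity otherwise."""
    if not training or rate == 0.0:
        return list(h)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in h]


def init_params(m, hidden, rng):
    """Small random weights and zero biases for the four gates."""
    p = {}
    for gate in "fioc":
        p["W" + gate] = [[rng.uniform(-0.1, 0.1) for _ in range(m + hidden)]
                         for _ in range(hidden)]
        p["b" + gate] = [0.0] * hidden
    return p


rng = random.Random(0)
M, L, H = 9, 4, 8  # 9 sub-regions, 4 time slots, 8 hidden units (toy sizes)
X_t = [[rng.randint(0, 3) for _ in range(L)] for _ in range(M)]  # toy counts
h_L = lstm_layer(X_t, init_params(M, H, rng), H)
h_train = dropout(h_L, rate=0.5, rng=rng)                 # training pass
h_test = dropout(h_L, rate=0.5, rng=rng, training=False)  # inference pass
```

Rescaling the surviving units during training keeps the expected activation unchanged. As a result, no adjustment is needed at test time, when dropout is switched off.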
3) DENSE NETWORK
After the LSTM layer, the output of the LSTM goes to a dense network, which is essentially a fully connected neural network. In this fully connected neural network, each neuron at each layer is connected to all the neurons of the previous layer. In the dense network, the output of the LSTM is multiplied by a matrix and a bias is added. We need a dense network here because the output of the LSTM contains the feature information required to make a prediction, but it is not yet in the form of a prediction. The dense network therefore learns the function that maps the feature data to the prediction result. In our system, we set up two layers in the dense network. The processing in the fully connected network can be represented as
$$h_t^D = W_D W_P h_t^L + b,$$
where $W_P$ is the weight matrix between the output of the LSTM layer and the dense network, and $h_t^L$ is the output of the LSTM network after the dropout. $W_D$ denotes the weight matrix in the dense network, $h_t^D$ is the output of the dense network, and $b$ is the bias.
4) ACTIVATION FUNCTION
To obtain the final output of the system, we choose softmax as the activation function and apply it to the output of the dense network. In particular, the activation function maps the output vector into a vector of elements between 0 and 1, whose sum equals 1. Each element represents the earthquake probability in one sub-region. The softmax function can be calculated as
$$y_t^m = \frac{e^{z_m}}{\sum_{i=1}^{M} e^{z_i}}, \quad m = 1, \ldots, M,$$
where $z_i$ denotes the $i$th element of the dense-network output $h_t^D$.
We summarize the training process of our proposed LSTM network in Algorithm 2.
5) IMPROVING SYSTEM PERFORMANCE BY DECOMPOSITION
So far we have introduced how our proposed LSTM works. However, two problems remain. First, by considering a large area consisting of many sub-regions, we may obtain a very large system with many variables, which requires a large amount of training data to be fully trained. Second, by considering all the sub-regions together, we make earthquake predictions by exploiting the spatio-temporal correlations among earthquake data in these sub-regions. In practice, however, some sub-regions may not be closely related and will hence hinder correct prediction. The first problem makes the system computationally very expensive, and the second leads to less accurate predictions. In the following, we propose to improve the system efficiency and accuracy by decomposition.
Specifically, we divide all the sub-regions into groups that are mutually exclusive and collectively cover the whole area of interest. We train the groups separately and make earthquake predictions for the sub-regions in each group. How to form the groups is clearly a very important problem. We choose to put the sub-regions within the same fault zone into the same group. In so doing, the disturbance from weakly related sub-regions can be mitigated. The amount of training data required and the computational complexity can also be significantly reduced.
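The dense-network and softmax equations of subsections 3) and 4) above can be checked numerically with the short sketch below. Only the structure $h_t^D = W_D W_P h_t^L + b$ followed by softmax comes from the text. The dimensions, weights, and bias are made-up values for illustration. Subtracting $\max_i z_i$ before exponentiating is a standard numerical-stability step that does not change the softmax output.

```python
import math

# Toy check of h_t^D = W_D W_P h_t^L + b followed by softmax; all numbers
# below are made up for illustration.


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dense(h_L, W_P, W_D, b):
    """Project the (dropped-out) LSTM output with W_P, then W_D, then add b."""
    return [v + bi for v, bi in zip(matvec(W_D, matvec(W_P, h_L)), b)]


def softmax(z):
    """Softmax; subtracting max(z) avoids overflow without changing the result."""
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


h_L = [0.2, -0.1, 0.4, 0.0]        # LSTM output after dropout (4 units)
W_P = [[0.5, 0.1, 0.0, -0.2],
       [0.3, -0.4, 0.2, 0.1],
       [0.0, 0.2, 0.6, 0.3]]       # 3 x 4
W_D = [[1.0, 0.0, 0.5],
       [0.2, 0.3, -0.1],
       [-0.3, 0.8, 0.4]]           # M x 3, with M = 3 sub-regions here
b = [0.0, 0.1, -0.1]
y = softmax(dense(h_L, W_P, W_D, b))  # one probability per sub-region
```

As written, the equation applies no nonlinearity between $W_P$ and $W_D$, so their product acts as a single linear map. This section does not say whether the two dense layers mentioned in the text use an activation between them.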
FIGURE 9. Prediction results when the look back window is 1. The horizontal axis represents time slots and the vertical axis represents the number of earthquakes that have happened in the corresponding time slot.
FIGURE 10. Prediction results when the look back window is 10. The horizontal axis represents time slots and the vertical axis represents the number of earthquakes that have happened in the corresponding time slot.
one-dimensional input, i.e., by exploiting the temporal correlations only.
1) DATA PREPROCESSING
The data that we use is gathered from the USGS (US Geological Survey) website. In particular, we use conterminous U.S. earthquake data from 2006 to 2016 with magnitudes greater than 2.5 in our simulations. We set one time slot to one month. In each time slot, the input is the number of earthquakes that happened in this time slot in a certain sub-region. With one-month time slots, we have 120 data items. As usual, we divide the data into two parts: training data and testing data. In particular, the first two thirds of the data are used for training and the rest for testing.
2) LSTM NETWORK SETTINGS
In this case, we build our LSTM network with one-dimensional input, in the time domain only. The output of the LSTM layer has 4 neurons. The activation function is set by default to the sigmoid function, and the network makes a single-value prediction. The "look back window" of the system is set to 1 and 10. It is the number of most recent data items that we use as input to predict the variables of the next time slot.
Besides, we also employ our proposed LSTM network with one-dimensional input to predict whether there are earthquakes or not. The selected area of interest is in mainland China, particularly between 75°E and 119°E longitude and 23°N and 45°N latitude, as shown in Figure 11. We equally divide this area into nine smaller sub-regions. Using the data collected from 1966 to 2016, we aim to predict whether there are earthquakes with magnitudes greater than 4.5 in each of the sub-regions. In this LSTM network, the LSTM layer has an output of 128 neurons. The dense network has 256 and 64 neurons in the first and second layers, respectively, and the output layer has 9 neurons. The activation function is set to the softmax function. Our results show that the overall prediction accuracy is 63.50 percent, with a true positive accuracy of 46.83 percent and a true negative accuracy of 79.6 percent.
employ our decomposition method to further improve performance.
Specifically, sub-regions 1, 2, 5, and 6 cover Tibet, Sichuan, Xinjiang, Gansu, and Ningxia provinces, which are within the same fault zone [28]. When we consider these four sub-regions as a group, the prediction accuracy is 88.57 percent. This indicates that our system can learn the spatio-temporal correlations among the earthquakes in these four sub-regions well and make accurate predictions. Besides, the 3rd sub-region includes Nepal, which is also a region with intense earthquake activity. However, because of the Himalayas, Nepal is located in a different fault zone from all the other sub-regions within mainland China. It may therefore have only weak spatio-temporal correlations with the other sub-regions. This is confirmed by the overall prediction accuracies of two alternative groups: 52.46 percent for sub-regions 2, 3, and 5, and 56.25 percent for sub-regions 3, 5, and 6.
After the analysis above, our final grouping plan is as follows:
• Group 1 consists of the 1st, 2nd, 5th, and 6th sub-regions, with a prediction accuracy of 88.57 percent.
• Group 2 includes the 4th, 7th, 8th, and 9th sub-regions, with a prediction accuracy of 87.57 percent.
• Group 3 contains the 3rd sub-region, with a prediction accuracy of 61.60 percent.
Combining these results, our overall prediction accuracy is 85.12 percent, with a true positive accuracy of 77.07 percent and a true negative accuracy of 93.49 percent, as also shown in Figure 15. The figure clearly shows that applying our decomposition method improves prediction accuracy, true positive accuracy, and true negative accuracy.
On the other hand, we compare a previous earthquake prediction scheme with ours. Specifically, Moustra et al. [22] make earthquake predictions by using a multi-layer perceptron (MLP), which is a kind of traditional ANN. We run this method on our two-dimensional input data, and its prediction accuracy is 66.99 percent. This is much lower than our results without decomposition (74.81 percent) and with decomposition (85.12 percent).
Furthermore, we evaluate the performance of our system with input of different time slot sizes and different numbers of sub-regions. Note that the previous results are obtained when each time slot is one month. We then conduct simulations with each time slot reduced to two weeks. With the same system settings, our system produces the prediction results shown in Figure 16. We find that without the decomposition method, the system gives lower prediction accuracy than with one-month time slots, as shown in Figure 15. This is because the input data becomes sparser when the time slot is reduced to only two weeks, which makes it more difficult for the system to find the correlations among earthquake occurrences. Nevertheless, when applying the proposed decomposition method, we can still achieve results comparable with those for one-month time slots in Figure 15. Specifically, the overall accuracy becomes 86 percent. The true positive accuracy increases from 60.83 to 69.28 percent, and the true negative accuracy from 77.38 to 94.09 percent. These results show that our proposed LSTM system with the decomposition method works well with different time slot sizes in the temporal domain.
Besides, we also attempt to increase the number of sub-regions to make the spatial prediction more accurate. Specifically, we equally divide the whole area of interest into 5 × 5 sub-regions instead of the previous 3 × 3 sub-regions. To make fair comparisons, we still set each time slot to one month. Without applying the proposed decomposition method, the overall accuracy increases to 82.47 percent, which is better than the 74.81 percent obtained when the whole area is divided into 3 × 3 sub-regions. The complete results are shown in Figure 17. Although the overall accuracy seems good, the true positive accuracy drops from 68.56 to 47.68 percent, which is too low to correctly predict earthquakes. The reason is that, similar to reducing the time slot size, the data becomes much sparser when the number of sub-regions increases from 9 to 25. In particular, in the case of 25 sub-regions, the input vector becomes much longer, which makes mining the correlations among earthquake occurrences much more difficult. To address this issue, we apply our decomposition method. Here, the grouping plan is still based on the fault zone distribution, similar to what we use when there are 3 × 3 sub-regions. From Figure 17, we can find that the overall accuracy increases from 82.47 to 87.59 percent
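The metrics and grouping plan reported above can be made concrete with the following sketch. We assume the following meanings, since the text does not define them explicitly:

- "True positive accuracy" is the fraction of slots with earthquakes that are predicted as such (recall).
- "True negative accuracy" is the fraction of earthquake-free slots that are predicted as earthquake-free (specificity).

The sketch also checks that the three groups are mutually exclusive and cover all nine sub-regions, as the decomposition requires. Finally, it averages the reported group accuracies, weighted by the number of sub-regions per group. The helper names are hypothetical.

```python
# Assumed metric definitions; the grouping plan and group accuracies are the
# values reported in the text (sub-regions numbered 1 to 9).


def binary_metrics(y_true, y_pred):
    """Overall accuracy, recall on earthquake slots ("true positive accuracy")
    and specificity on quiet slots ("true negative accuracy")."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return {
        "overall": (tp + tn) / len(y_true),
        "true_positive": tp / pos if pos else float("nan"),
        "true_negative": tn / neg if neg else float("nan"),
    }


GROUPS = {1: {1, 2, 5, 6}, 2: {4, 7, 8, 9}, 3: {3}}
GROUP_ACCURACY = {1: 88.57, 2: 87.57, 3: 61.60}  # percent


def is_partition(groups, regions):
    """True if the groups are mutually exclusive and jointly cover `regions`."""
    members = [r for g in groups.values() for r in g]
    return len(members) == len(set(members)) and set(members) == set(regions)


def weighted_accuracy(groups, accuracy):
    """Average of group accuracies weighted by the number of sub-regions."""
    total = sum(len(g) for g in groups.values())
    return sum(len(groups[k]) * accuracy[k] for k in groups) / total


combined = weighted_accuracy(GROUPS, GROUP_ACCURACY)
```

This weighted average is about 85.13 percent, close to the reported overall accuracy of 85.12 percent. That closeness suggests, though the text does not state it, that the overall figure is averaged over sub-regions.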