
Received 14 December 2016; revised 27 February 2017; accepted 8 April 2017.

Date of publication 27 April 2017; date of current version 11 March 2020.


Digital Object Identifier 10.1109/TETC.2017.2699169

Earthquake Prediction Based on Spatio-Temporal Data Mining: An LSTM Network Approach

QIANLONG WANG, YIFAN GUO, LIXING YU, AND PAN LI

The authors are with the Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106
CORRESPONDING AUTHOR: P. LI ([email protected])

ABSTRACT Earthquake prediction is a very important problem in seismology, the success of which can potentially save many human lives. Various kinds of techniques have been proposed to address this problem, such as mathematical analysis, machine learning algorithms like decision trees and support vector machines, and precursor signal study. Unfortunately, they usually do not achieve very good results due to the seemingly dynamic and unpredictable nature of earthquakes. In contrast, we notice that earthquakes are spatially and temporally correlated because of crust movement. Therefore, earthquake prediction for a particular location should be conducted not only based on the historical data in that location, but also according to the historical data in a larger area. In this paper, we employ a deep learning technique called long short-term memory (LSTM) networks to learn the spatio-temporal relationships among earthquakes in different locations and make predictions by taking advantage of those relationships. Simulation results show that the LSTM network with two-dimensional input developed in this paper is able to discover and exploit the spatio-temporal correlations among earthquakes to make better predictions than before.

INDEX TERMS Earthquake prediction, spatio-temporal data mining, LSTM

I. INTRODUCTION
Earthquakes are one of the most destructive natural disasters. They usually occur without warning and do not allow much time for people to react. Therefore, earthquakes can cause serious injuries and loss of life and destroy tremendous numbers of buildings and infrastructure, leading to great economic loss. The prediction of earthquakes is obviously critical to the safety of our society, but it has been proven to be a very challenging issue in seismology [1].

Existing works on earthquake prediction can be mainly classified into four categories according to the employed methodologies, i.e., 1) mathematical analysis, 2) precursor signal investigation, 3) machine learning algorithms like decision trees and support vector machines (SVM), and 4) deep learning. The first type of work tries to formulate the earthquake prediction problem by using different mathematical tools [2], like the FDL (Fibonacci, Dual and Lucas) method, various probability distributions, or other mathematical proofs and spatial connection theory [3]. In the second type of work, researchers study earthquake precursor signals to help with earthquake prediction. For example, electromagnetic signals [4], aerosol optical depth (AOD) [5], lithosphere-atmosphere-ionosphere coupling [6] and cloud images [7], [8] have been explored. Even animals' abnormal behavior has been taken into account in this kind of study [9]. The third type of work mainly explores data mining and time series analysis methods, such as J48, AdaBoost, multi-objective info-fuzzy network (M-IFN), k-nearest neighbors (kNN), SVM, and artificial neural networks (ANNs) [10], [11], to predict the magnitude of the largest earthquake in the next year based on the previously recorded seismic events in the same region. In the fourth type of work, deep learning algorithms are utilized to predict both the magnitude and the time of major seismic events. Various kinds of neural networks have been adopted, such as the multi-layer perceptron (MLP) [12], the backward propagation (BP) neural network [13], the feed-forward neural network (FFNN) [14], and the recurrent neural network (RNN) [15], which can work under certain particular circumstances.

Although there have been a lot of works on earthquake prediction, very few of them can predict future seismic events accurately. The reason is that the occurrence of earthquakes involves processes of very high complexity and depends on a large number of factors that are difficult to analyze. There are obviously complex nonlinear correlations among earthquake occurrences, because of which traditional mathematical, statistical, and machine learning methods
2168-6750 © 2017 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 8, NO. 1, JAN.-MAR. 2020

cannot analyze this process well. Recently, deep learning methods like RNNs have been shown to be able to capture the nonlinear correlations among data [16], [17]. Particularly, they are mostly used to analyze time-series data so as to make predictions. As a result, when previous works use deep learning to make predictions, they predict earthquakes in a particular location only based on the historical time-series data in that location, and hence still cannot get good results. In contrast, we contend that the spatio-temporal correlations among historical earthquake data have to be investigated in order to make more accurate predictions.

To this end, in this paper we investigate earthquake prediction from a spatio-temporal perspective. Specifically, we devise an earthquake prediction scheme by adjusting a long short-term memory (LSTM) network, which is an advanced RNN and has strong nonlinear learning capability even on data containing long-term interval correlations that the RNN is not able to handle. We consider as a whole the earthquakes in an area of interest (e.g., a country) to be an input element to the LSTM network, which is different from common deep learning approaches that only consider the data in one particular location as an input. Therefore, by having a time series of such input elements, we can construct an LSTM network with two-dimensional input that can learn the correlations among earthquakes in different locations and at different times, and exploit them to make predictions. After building the LSTM network, we find that it is difficult to train the network well due to its high complexity and the lack of training data. We therefore decompose the original LSTM into several smaller ones to reduce the complexity and the need for larger training data sets.

Our main contributions in this paper can be summarized as follows.
• We investigate the earthquake prediction problem from a spatio-temporal perspective.
• We construct an LSTM network with two-dimensional input, which can discover the spatio-temporal correlations among historical earthquake data and exploit them to make predictions on earthquakes in a large area of interest.
• We decompose the original large LSTM network into several smaller ones, which can lower the complexity and facilitate the network training.
• Simulation results show that the proposed LSTM approach can obtain good performance.

The rest of this paper is organized as follows. Section II introduces the most related work on earthquake prediction methods. Section III describes the proposed system model for earthquake prediction. Section IV details the proposed LSTM based scheme, which is followed by simulation results and discussions in Section V. Finally, we conclude the paper in Section VI.

II. RELATED WORK
In this section, we introduce in detail the related works on earthquake prediction, which are classified into four categories as we mentioned above.

First, some works employ mathematical or statistical tools to make earthquake predictions. Kannan [3] predicts earthquake epicenters according to spatial connection theory, i.e., earthquakes occurring within a fault zone are related to one another. Particularly, predictions are made by taking advantage of the Poisson range identifier function (PRI), the Poisson distribution, etc. Boucouvalas et al. [2] improve the Fibonacci, Dual and Lucas (FDL) method and propose a scheme to predict earthquakes by using a trigger planetary aspect date prior to a strong earthquake as a seed for the unfolding of the FDL time spiral. However, these works are only tested with a very limited amount of data and do not provide good results (the success rate is low).

Second, some works predict earthquakes based on precursor signal studies. Hayakawa [4] and Jiang [18] take electromagnetic signals as the precursor of significant earthquakes. Thomas et al. [8] and Fan et al. [7] have studied satellite images of clouds before earthquakes. Akhoondzadeh and Chehrebargh [5] claim that unusual aerosol optical depth (AOD) variations before earthquakes could be introduced as an earthquake precursor. Meanwhile, Korepanov [6] proposes an earthquake precursor based on lithosphere-atmosphere-ionosphere coupling and relations. Florido et al. [19] discover precursory patterns for large earthquakes; new attributes based on the well-known b-value are also generated. In addition, Hayakawa et al. [9] study the abnormal behavior of animals about 10 days before earthquakes in order to make earthquake predictions. Unfortunately, it is difficult to draw conclusions on these precursor signals due to very limited data. Besides, these precursor signals alone usually cannot lead to satisfactory prediction results.

Third, machine learning has been employed as an important method to make earthquake predictions. Last et al. [10] compare several data mining and time series analysis methods, which include J48, AdaBoost, information network (IN), multi-objective info-fuzzy network (M-IFN), k-nearest neighbors (k-NN) and SVM, for predicting the magnitude of the largest coming seismic event based on previously recorded seismic events in the same region. Besides, the prediction features based on the Gutenberg-Richter ratio as well as some new seismic indicators are proved to be much more useful than those traditionally used in the earthquake prediction literature, i.e., the average number of earthquakes in each region. Asencio-Cortés et al. [20] study the sensitivity of the existing seismicity indicators reported in the literature by changing the input attributes and their parameterization. We notice that most machine learning methods make earthquake predictions based on seismicity indicators, where only time-domain but not space-domain correlations are studied. Moreover, traditional machine learning methods expose their limitations in mining data with complex nonlinear correlations.

Fourth, deep learning methods have recently been applied to earthquake prediction. Narayanakumar and Raja [21] evaluate the performance of BP neural network techniques in predicting earthquakes. They gather data with event time, latitude, longitude, depth and magnitude to convert them into


FIGURE 1. System statuses represented by M × 1 vectors.

input for the neural network. The results show that the BP neural network method can provide better prediction accuracy for earthquakes of magnitudes 3 to 5 than previous works, but still cannot achieve good results for earthquakes of magnitudes 5 to 8 due to the lack of sufficient data. Moustra et al. [22] first make earthquake predictions by using time series magnitude data; they then use seismic electric signals (SES) to further improve their results. Li and Liu [13] develop a particle swarm optimization (PSO) algorithm to optimize the parameters of a BP neural network. Particularly, they improve the PSO algorithm by adding a nonlinearly decreasing inertia weight to enhance the search. Saba et al. [14] predict earthquakes by combining the Bat algorithm and a feed-forward neural network (FFNN). Mahmoudi et al. [12] use an MLP network to predict the magnitudes of earthquakes. With online training, which is superior to batch training for large data sets, as their training method, the MLP has good prediction performance. We notice that most of these neural network methods use various kinds of features as input to predict the time and/or magnitudes of earthquakes, but few of them consider the spatial relations among earthquakes. Moreover, the spatio-temporal correlations among earthquakes are not studied.

III. SYSTEM MODEL
In this study, we propose to make predictions on earthquakes by taking advantage of the spatio-temporal correlations among them. The intuition is that 1) the earth's crust is connected, and hence seismic activities in one location will naturally lead to seismic activities in other locations, and 2) seismic activities tend to have certain patterns in the time domain.

In particular, we divide an area of interest into several sub-regions to facilitate spatio-temporal earthquake data mining. Our objective is to predict earthquakes in each sub-region. We denote the system status at time t by a "multi-hot" vector x_t, each element of which is equal to 1 or 0, representing that there either are earthquakes in the corresponding sub-region in time slot t (being hot) or not. Define M as the total number of sub-regions. Then, x_t is a vector of dimension M × 1 as shown in Figure 1, where the elements equal to 1 are called "hot" elements. Therefore, our goal is to predict the next system status x_{t+1} based on previous system statuses, i.e., x_t, x_{t-1}, etc.

We notice that how to divide the area of interest is an important issue here. Without loss of generality, in this study we choose to divide the area evenly into rectangular sub-regions. Particularly, we consider the area of interest to be a rectangular area with the four vertices denoted by V_nw, V_sw, V_ne and V_se, respectively. The latitude and longitude of a point are represented by La(·) and Lo(·), respectively. Thus, we divide the whole area into M, which is equal to m_h × m_v, sub-regions. The vertical and horizontal edges of each sub-region are as follows:

    SR_ve = |La(V_nw) − La(V_sw)| / m_v,    (1)

    SR_he = |Lo(V_nw) − Lo(V_ne)| / m_h,    (2)

where SR_ve is the length of the vertical edge of each sub-region and SR_he is the length of the horizontal edge.

Thus, when we build the system model and represent the real earthquake data by the multi-hot vectors, we first define the length of a time slot, e.g., a week or a month, and then check which sub-region each earthquake in this time slot happens in. This can easily be done by finding the index k of the sub-region (numbered from top to bottom and from left to right): for an earthquake at location e, the sub-region index k is

    k = m_v · i + j + 1,    (3)

where i is the coordinate on the horizontal axis and j is the coordinate on the vertical axis for each earthquake. Thus, i and j can be calculated by

    i = ⌊|Lo(e) − Lo(V_nw)| / SR_he⌋,    (4)

    j = ⌊|La(e) − La(V_nw)| / SR_ve⌋.    (5)

We describe our system modeling process in detail in Algorithm 1.

IV. EARTHQUAKE PREDICTION USING AN LSTM NETWORK WITH TWO-DIMENSIONAL INPUT
In this section, we describe in detail our proposed earthquake prediction algorithm using an LSTM network with two-dimensional input. The main idea of our algorithm is to develop an LSTM network with two-dimensional input to predict the next system status based on a number of the most recent system statuses. This is achieved by learning the correlations among earthquakes in different locations at different times. In the following, we first introduce the fundamentals of the LSTM architecture and then explain how we devise an LSTM with two-dimensional input to develop our earthquake prediction algorithm.
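To make Eqs. (1)-(5) and the multi-hot status vector concrete, the following Python sketch maps epicenters to sub-region indices and builds one x_t. The grid shape (m_h, m_v), the vertex coordinates, and the sample events are made-up illustrative values, not data from the paper.

```python
import numpy as np

m_h, m_v = 3, 3                    # horizontal and vertical sub-region counts
La_nw, Lo_nw = 40.0, 20.0          # latitude/longitude of the NW vertex V_nw
La_sw, Lo_ne = 34.0, 26.0          # SW latitude and NE longitude

SR_ve = abs(La_nw - La_sw) / m_v   # Eq. (1): vertical edge length
SR_he = abs(Lo_nw - Lo_ne) / m_h   # Eq. (2): horizontal edge length

def sub_region_index(la, lo):
    """Map an epicenter (la, lo) to the sub-region index k of Eq. (3)."""
    i = int(abs(lo - Lo_nw) // SR_he)   # Eq. (4): horizontal coordinate
    j = int(abs(la - La_nw) // SR_ve)   # Eq. (5): vertical coordinate
    return m_v * i + j + 1              # Eq. (3): 1-based index

# Multi-hot status x_t for one time slot: element k-1 is 1 iff at least
# one earthquake fell in sub-region k during the slot.
M = m_h * m_v
events_in_slot = [(39.1, 20.5), (35.2, 25.7)]   # hypothetical (lat, lon) pairs
x_t = np.zeros(M, dtype=int)
for la, lo in events_in_slot:
    x_t[sub_region_index(la, lo) - 1] = 1
```

With this 3 × 3 grid over a 6° × 6° box, the first sample event falls in sub-region 1 and the second in sub-region 9, so x_t has exactly two hot elements.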


A. THE BASIC LSTM ARCHITECTURE
Long short-term memory is a redesigned architecture based on the traditional RNN. It was proposed by Sepp Hochreiter and Jürgen Schmidhuber in 1997 [23]. Notice that in theory an RNN can handle data with time dependencies well, but in practice it struggles with the long-term temporal dependency problem. By adding memory cells that record their states to a traditional RNN, an LSTM is able to learn relations among data over a long time interval.

Algorithm 1. System Modeling from the Spatio-Temporal Perspective
Input: The gathered raw information of earthquake events E = {e_1, e_2, ...}. For each event e, the time (t), latitude (La), longitude (Lo) and magnitude (Ma) information is given.
1: Initialization: Select the study area of rectangular shape with four vertices expressed as V_nw, V_sw, V_ne, V_se.
2: Spatial Segmentation:
3: Divide the area into sub-regions with edge lengths SR_ve = |La(V_nw) − La(V_sw)| / m_v and SR_he = |Lo(V_nw) − Lo(V_ne)| / m_h, as in Eqs. (1) and (2).
4: Allocate each event to its sub-region:
5: for e in E do
6:   i = ⌊|Lo(e) − Lo(V_nw)| / SR_he⌋
7:   j = ⌊|La(e) − La(V_nw)| / SR_ve⌋
8:   Assign the given event e to the sub-region with index k = m_v · i + j + 1, as in Eq. (3).
9: end for
10: Temporal Segmentation:
11: Generate the event frequency of each sub-region in each time interval Δt and generate the multi-hot input vector. For each sub-region, apply the steps below.
12: for each time interval Δt do
13:   if an event e exists then
14:     Frequency in current time interval ← the number of existing events e
15:   else
16:     Frequency in current time interval ← 0
17:   end if
18: end for
19: for each time interval Δt do
20:   for each sub-region do
21:     if CurrentFrequency ≠ 0 then
22:       CurrentFrequency ← 1
23:     else
24:       CurrentFrequency ← 0
25:     end if
26:   end for
27: end for
Output: Multi-hot feature vectors. Each multi-hot vector points out the sub-regions in which earthquakes occurred in the current time interval.

FIGURE 2. The typical RNN architecture with K hidden layers. h_t^k represents the state of the kth hidden layer in time slot t. Solid lines mean the connections by weight.

1) THE TYPICAL RNN ARCHITECTURE
Figure 2 shows the architecture of a general RNN with K hidden layers. Compared with a normal ANN, an RNN tries to take advantage of information from past times. Particularly, in an RNN, the output not only depends on the current input, but also on previous inputs. Denote the input vector at time t by x_t. Then, the RNN updates the hidden layer states h_t^1, ..., h_t^K and computes the output y_t based on the input x_t and the hidden layer statuses at the past time instance. h_t^k denotes the kth hidden layer's state at time t, which is essentially a vector whose number of elements equals the number of nodes at the kth hidden layer. As shown in Figure 2, the past input information is propagated horizontally in each layer through weight matrices and nonlinear functions, and hence can be used for prediction.

Specifically, an RNN works as follows. We usually set the initial input at time t = 0 to 0. Then, at time t, the hidden layer states are updated according to the following equations:

    h_t^1 = F(W_{i h^1} x_t + W_{h^1 h^1} h_{t-1}^1 + b_t^1),

    h_t^k = F(W_{h^{k-1} h^k} h_t^{k-1} + W_{h^k h^k} h_{t-1}^k + b_t^k),

where 2 ≤ k ≤ K. Here, F is a nonlinear hidden layer function that, for example, can be set as a sigmoid function. W_{i h^1} denotes the weight matrix connecting the input to the first hidden layer at time t, W_{h^k h^k} denotes the recurrent connection matrix between the kth hidden layer at time t − 1 and at time t, W_{h^{k-1} h^k} denotes the weight matrix connecting the (k−1)th and the kth hidden layers at time t, and b denotes the bias vector. In particular, suppose that we have x_t ∈ R^{M×1} and that the number of nodes at the kth hidden layer is N^k. Then, the dimensions of the matrices W_{i h^1}, W_{h^k h^k} and W_{h^{k-1} h^k} are N^1 × M, N^k × N^k and N^k × N^{k-1}, respectively, and the dimension of the vector b_t^k is N^k × 1. Note that these parameters will be optimized during the training process.
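The hidden-layer updates above can be sketched in NumPy as follows. The sizes M, N, K and T, the random weights, and the dummy inputs are illustrative assumptions; also, as is standard for RNNs, one weight set per layer is shared across all time steps rather than indexed by t.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K, T = 9, 16, 2, 5          # input size, nodes per layer, layers, steps
F = lambda z: 1.0 / (1.0 + np.exp(-z))   # sigmoid as the hidden-layer function

W_in = rng.normal(scale=0.1, size=(N, M))                       # input -> layer 1
W_rec = [rng.normal(scale=0.1, size=(N, N)) for _ in range(K)]  # h_{t-1}^k -> h_t^k
W_up = [rng.normal(scale=0.1, size=(N, N)) for _ in range(K)]   # h_t^{k-1} -> h_t^k (entry 0 unused)
b = [np.zeros(N) for _ in range(K)]

h_prev = [np.zeros(N) for _ in range(K)]    # hidden states at t = 0
for t in range(T):
    x_t = rng.integers(0, 2, size=M)        # stand-in multi-hot input
    h = [F(W_in @ x_t + W_rec[0] @ h_prev[0] + b[0])]   # first hidden layer
    for k in range(1, K):                               # layers 2..K
        h.append(F(W_up[k] @ h[k - 1] + W_rec[k] @ h_prev[k] + b[k]))
    h_prev = h

W_out = rng.normal(scale=0.1, size=(M, N))  # Kth hidden layer -> output
y_t = W_out @ h_prev[-1]                    # output y_t (bias omitted)
```

Because F is a sigmoid, every hidden state component stays strictly inside (0, 1), while the linear output y_t is unbounded.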


FIGURE 3. The typical LSTM single memory cell.

Besides, the output at time t, denoted by y_t, can be calculated as

    y_t = W_{h^K o} h_t^K + d_t.

Here, y_t ∈ R^{M×1}, W_{h^K o} ∈ R^{M×N^K} is the weight matrix between the Kth hidden layer and the output, and d_t ∈ R^{M×1} is the bias vector for y_t. Similar to the other parameters mentioned above, these parameters will be optimized during the training process.

2) THE TYPICAL LSTM ARCHITECTURE
As mentioned before, RNNs are incapable of handling long-term time dependencies in practice, while LSTMs are explicitly designed to address the long-term dependency problem. In particular, LSTMs have the same chain-like structure, but they implement the function F in a different way in order to store information: they build a memory cell instead, which can be considered as a black box that, for example, at the first layer, takes the previous state h_{t-1} and the current system input x_t as inputs, computes internally to decide what to keep in memory, and outputs the hidden state h_t. Figure 3 shows the typical architecture of a single LSTM memory cell [24]. We can see that the cell state runs straight down the entire path with only some linear interactions, which makes it very easy for information to be propagated in time.

To describe the memory cell in an LSTM in more detail, we have the following equations:

    i_t = σ(W_ix x_t + W_ih h_{t-1} + W_ic c_{t-1} + b_i),
    f_t = σ(W_fx x_t + W_fh h_{t-1} + W_fc c_{t-1} + b_f),
    c_t = f_t ⊙ c_{t-1} + i_t ⊙ φ(W_cx x_t + W_ch h_{t-1} + b_c),
    o_t = σ(W_ox x_t + W_oh h_{t-1} + W_oc c_t + b_o),
    h_t = o_t ⊙ φ(c_t).

Here, i, f, o and c denote the input gate, forget gate, output gate, and cell state, respectively. These gates are all of the same dimension as the hidden vector h (N^k × 1 in the kth cell). σ is a sigmoid function, and φ is a nonlinear function which maps the input to [−1, 1]. W_ic, W_fc and W_oc are the peephole connection matrices, which connect the cell state to the input gate, forget gate, and output gate, respectively. Similarly, W_ix, W_fx, W_ox and W_cx are the weight matrices connecting the input vector x_t to the input gate, forget gate, output gate and cell state, respectively. Besides, since the gates and the input vector x_t have dimensions of N^k × 1 and M × 1, respectively, the dimensions of the matrices W_ih, W_ic, W_fh, W_fc, W_ch, W_oh and W_oc are all the same, i.e., N^k × N^k, and the dimensions of the matrices W_ix, W_fx, W_cx and W_ox are N^k × M.

FIGURE 4. The flow diagram of our system.

B. THE PROPOSED LSTM NETWORK WITH TWO-DIMENSIONAL INPUT
1) SYSTEM ARCHITECTURE
We notice that when previous works employ neural networks to predict earthquakes, they mainly consider a particular location and make predictions based on the historical earthquake data at that location, i.e., the input x_t has only one dimension and concerns only one location. In so doing, they essentially make predictions based on the temporal correlations among historical data. In contrast, we consider x_t to be a vector representing earthquake data at time t at several locations, i.e., the M sub-regions mentioned in our system model. The input to our LSTM network is a series, say L, of x_t's, i.e., a matrix of dimension M × L. Therefore, we can make predictions for earthquakes in a large area based not only on temporal data dependencies, but also on spatial data correlations.

The main process of our system is shown by the flow chart in Figure 4. Specifically, the input matrix first goes through the LSTM layer. Then, a dropout process is applied to the output of the LSTM network, the result of which goes to the dense layer, i.e., a fully connected neural network. Finally,


we apply an activation function, which is set to the softmax function, and obtain the prediction result x_{t+1}.

FIGURE 5. Our system architecture. Dense means a fully connected neural network.

The architecture of our system is presented in Figure 5. Notice that, as mentioned above, in our system X_t is a matrix of dimension M × L. As in Figure 6, in the training process, the target of the prediction based on the input matrix X_t at time t is x_{t+1}, and in the prediction phase, x_{t+1} is what needs to be predicted at time t. In our architecture, h_t^L is an output of the LSTM layer at time t, which is constructed by the memory cells depicted in Figure 3. In particular, the details of our LSTM layer are shown in Figure 7, where there are L memory cells, one for each time slot. The output of the jth memory cell at time t, i.e., h_t^j and c_t^j, is part of the input of the next memory cell. Besides, the output of the LSTM layer goes to a dense layer whose output is denoted by h_t^D. In the following, we describe in detail what happens after the LSTM layer.

FIGURE 6. The input matrix in our system.

FIGURE 7. The zoomed-in LSTM layer architecture with inputs X_t and X_{t+1}, respectively. There are L memory cells in the LSTM layer due to the fact that we need to look back L system statuses to make predictions.

2) DROPOUT
To prevent our system from overfitting, we apply a method called dropout to the output of the LSTM layer. Overfitting can lead to very high performance in training but very low performance in testing. This is because when overfitting occurs, the system focuses too much on the historical data, which makes it too rigid to give satisfactory results on new input. Many works have shown that adding dropout to a system can efficiently prevent a neural network from overfitting [25]. In particular, with dropout, a certain number of randomly selected nodes are temporarily turned off in each training sample, along with all their conjoint connections. Therefore, in our case, we apply dropout between the LSTM layer and the dense network, as shown in Figure 8. Since some of the nodes in the output of the LSTM layer have been turned off, the system becomes insensitive to some extent, and hence avoids being "too smart", i.e., overfitted.

3) DENSE NETWORK
After the LSTM layer, we have the output of the LSTM go to a dense network, which is essentially a fully connected neural network. In this fully connected neural network, at

FIGURE 8. Dropout structure. The left side shows a normal system without dropout, while the right side is the system with dropout applied.
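The L memory-cell steps over the input matrix and the dropout stage described above can be sketched in NumPy as follows. All dimensions, the random weights, and the dropout rate are illustrative assumptions, and the cell is written in its standard form, without the peephole terms shown earlier, for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, L = 9, 16, 4        # sub-regions, hidden size, look-back length (made up)

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))   # gate nonlinearity
phi = np.tanh                                 # maps into [-1, 1]

# One input and one recurrent weight matrix per gate (peepholes omitted).
Wx = {g: rng.normal(scale=0.1, size=(N, M)) for g in "ifco"}
Wh = {g: rng.normal(scale=0.1, size=(N, N)) for g in "ifco"}
bias = {g: np.zeros(N) for g in "ifco"}

def lstm_cell(x, h, c):
    """One memory-cell step (standard form, no peephole connections)."""
    i = sigma(Wx["i"] @ x + Wh["i"] @ h + bias["i"])
    f = sigma(Wx["f"] @ x + Wh["f"] @ h + bias["f"])
    c_new = f * c + i * phi(Wx["c"] @ x + Wh["c"] @ h + bias["c"])
    o = sigma(Wx["o"] @ x + Wh["o"] @ h + bias["o"])
    return o * phi(c_new), c_new

# Input matrix X_t of dimension M x L: the L most recent multi-hot statuses.
X = rng.integers(0, 2, size=(M, L))
h, c = np.zeros(N), np.zeros(N)
for step in range(L):              # one memory cell per past time slot
    h, c = lstm_cell(X[:, step], h, c)

# Training-time dropout on the LSTM output: turn off a random subset of
# nodes and rescale the survivors (the rate 0.2 is an arbitrary choice).
rate = 0.2
h_dropped = h * ((rng.random(N) >= rate) / (1.0 - rate))
```

The vector h_dropped then feeds the dense network and the softmax activation to produce the predicted status x_{t+1}; in practice the whole stack would be trained end-to-end in a deep learning framework rather than hand-rolled like this.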


each layer, each neuron gets connected to all the neurons at the We summarize the training process of our proposed LSTM
previous layer. By going through the dense network, the out- network in Algorithm 2.
put of LSTM is multiplied by a matrix and added with a bias.
The reason for having a dense network here is that the output 5) IMPROVING SYSTEM PERFORMANCE BY
of the LSTM contains the feature information we need to DECOMPOSITION
make prediction, but it is still not exactly what we need. So far we have introduced how our proposed LSTM works.
The dense network is so trying to learn the function between However there are two more problems: first, by considering
feature data and the prediction result. In our system, we set up a large area consisting of many sub-regions, we may have a
two layers in the dense network. The processing in the fully very large system with many variables, which requires a
connected network can represented below large amount of training data to be fully trained, and second,
by considering the sub-regions all together, we make earth-
t ¼ WD WP ht þ b;
hD L
quake predictions by taking advantages of the spatio-tempo-
where WP and hLt are the weight matrix between the output ral correlations among earthquake data in these sub-regions,
of the LSTM layer and the dense network, and the output of while in fact some sub-regions may not be very closely
the LSTM network, respectively, after the dropout. WD related in practice and hence will hinder the correct predic-
denotes the weight matrix in the dense network, hD tion. The first problem makes the system computationally
t is the
output of the dense network, and b is the bias. very expensive, and the second problem leads to less accu-
4) ACTIVATION FUNCTION
To obtain the final output of the system, we choose softmax as the activation function and apply it to the output of the dense network. In particular, the activation function maps the output vector into a vector of elements between 0 and 1, each of which represents the earthquake probability in a sub-region, and whose sum equals 1. The softmax function can be calculated as

y_t^m = e^{z_m} / \sum_{i=1}^{M} e^{z_i}, for m = 1, ..., M.

Here, we use z to represent the output of the dense network h_t^D for simplicity. z_m and y_t^m represent the mth element in the vector z and that in the output y_t, respectively. Note that the result is a vector of probabilities between 0 and 1, but not yet the binary results that we need for earthquake prediction. To map the probabilities into 0s or 1s, we obtain an optimal probability threshold in the training process that minimizes the sum of the absolute values of the differences between the predicted label values and the real label values, which are either 0s or 1s.
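This softmax-plus-threshold step can be sketched as follows. The threshold value 0.3 below is purely illustrative; in the paper the threshold is learned during training:

```python
import math

def softmax(z):
    """Map a raw output vector to probabilities that sum to 1."""
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def binarize(probs, threshold):
    """Map per-sub-region probabilities to 0/1 earthquake labels."""
    return [1 if p >= threshold else 0 for p in probs]

z = [2.0, 1.0, 0.1]          # toy dense-network output for M = 3 sub-regions
probs = softmax(z)
print(binarize(probs, 0.3))  # sub-regions whose probability clears the threshold -> [1, 0, 0]
```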
Moreover, to have the system learn to hit the target value in the training process, we need to define a loss function. Since our problem is essentially a classification into different labels, we employ cross-entropy as the loss function, which is commonly used and has been shown to be appropriate [26]. In particular, cross-entropy can be calculated as

ℓ(x_{t+1}, y_t) = -\sum_{i=1}^{M} x_{t+1}^i \log y_t^i,

where x_{t+1}^i and y_t^i denote the ith element in x_{t+1} and in y_t, respectively.

To train our system, our goal is to minimize the loss function. We use the gradient descent method due to its efficiency. In particular, we utilize RMSprop to minimize the loss function, which has been experimentally shown to be an effective algorithm for RNNs [27].
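The cross-entropy loss above can be sketched as a quick sanity check. The target and prediction vectors below are made up, not taken from the paper's data:

```python
import math

def cross_entropy(x_next, y_pred):
    """Cross-entropy between the true multi-hot label x_{t+1} and prediction y_t."""
    return -sum(x * math.log(y) for x, y in zip(x_next, y_pred) if x > 0)

x_next = [1, 0, 1]        # earthquakes occurred in sub-regions 1 and 3
y_good = [0.9, 0.1, 0.8]  # confident, mostly correct prediction
y_bad = [0.1, 0.9, 0.2]   # confident, mostly wrong prediction
# A better prediction yields a lower loss, which is what the optimizer exploits.
print(cross_entropy(x_next, y_good) < cross_entropy(x_next, y_bad))  # -> True
```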
So far we have introduced how our proposed LSTM works. However, there are two more problems. First, by considering a large area consisting of many sub-regions, we may have a very large system with many variables, which requires a large amount of training data to be fully trained. Second, by considering all the sub-regions together, we make earthquake predictions by taking advantage of the spatio-temporal correlations among the earthquake data in these sub-regions, while in fact some sub-regions may not be very closely related in practice and hence will hinder correct prediction. The first problem makes the system computationally very expensive, and the second problem leads to less accurate predictions. In the following, we propose to improve the system efficiency and accuracy by decomposition.

Specifically, we divide all the sub-regions into groups, which collectively and exclusively cover the whole area of interest. We train the groups separately and make earthquake predictions for the sub-regions in each group respectively. Obviously, how to form the groups is a very important problem. We choose to put the sub-regions within the same fault zone into the same group. In so doing, the disturbance from not-so-related sub-regions can be mitigated, and the amount of training data and the computational complexity can be significantly reduced.

Algorithm 2. The Training Process of the Proposed LSTM
Input: X_1, X_2, ..., X_t
1: Enter the LSTM layer, and calculate h_t = o_t ⊙ f(c_t).
2: Apply dropout.
3: Go through the dense network. Calculate h_t^D = W_D W_P h_t^L + b.
4: Apply softmax as the activation function.
5: Calculate the cross-entropy function as the loss function.
6: Employ the gradient descent method to minimize the loss function, and hence optimize the system parameters.
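The gradient step in step 6 uses RMSprop [27], which divides each gradient by a running average of its recent magnitude. A minimal sketch of one RMSprop update follows; the decay rate, learning rate, and toy objective are illustrative defaults, not values reported in the paper:

```python
import math

def rmsprop_step(theta, grad, avg_sq, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSprop update: scale the gradient by a running average of its square."""
    new_avg = [decay * a + (1 - decay) * g * g for a, g in zip(avg_sq, grad)]
    new_theta = [t - lr * g / (math.sqrt(a) + eps)
                 for t, g, a in zip(theta, grad, new_avg)]
    return new_theta, new_avg

# Toy run: repeatedly step on the gradient of f(theta) = theta^2 (grad = 2 * theta)
theta, avg = [5.0], [0.0]
for _ in range(200):
    theta, avg = rmsprop_step(theta, [2 * theta[0]], avg)
print(abs(theta[0]) < 5.0)  # the parameter moves toward the minimum at 0 -> True
```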
154 VOLUME 8, NO. 1, JAN.-MAR. 2020
Wang et al.: Earthquake Prediction Based on Spatio-Temporal Data Mining: An LSTM Network Approach

V. PERFORMANCE EVALUATION
In this section, we evaluate the performance of the proposed system through two case studies. In the first case, we study the system performance when we use one-dimensional input, as before in our system; in the second case, we explore the system performance when two-dimensional input is used, as we propose in this study.

A. CASE STUDY I: THE PROPOSED LSTM NETWORK WITH ONE-DIMENSIONAL INPUT
As mentioned before, in this case, we make earthquake predictions by using the proposed LSTM network with only one-dimensional input, i.e., by exploiting the temporal correlations only.

FIGURE 9. Prediction results when the look back window is 1. The horizontal axis represents time slots and the vertical axis represents the number of earthquakes that have happened in the corresponding time slot.

FIGURE 10. Prediction results when the look back window is 10. The horizontal axis represents time slots and the vertical axis represents the number of earthquakes that have happened in the corresponding time slot.

1) DATA PREPROCESSING
The data that we use are gathered from the USGS (US Geological Survey) website. In particular, we use Conterminous U.S. earthquake data from 2006 to 2016 with magnitudes greater than 2.5 in our simulations. We set one time slot to one month. In each time slot, the input is the number of earthquakes that happened in this time slot in a certain sub-region. We have 120 data items when one time slot is one month. As usual, we divide the data into two parts: training data and testing data. In particular, the first two-thirds of the data are used for training and the rest are used for testing.
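This preprocessing, a look back window over monthly counts followed by a two-thirds/one-third split, can be sketched as below. The monthly counts are made-up toy values, not USGS data:

```python
def make_windows(series, look_back):
    """Turn a monthly count series into (window, next-value) training pairs."""
    pairs = []
    for i in range(len(series) - look_back):
        pairs.append((series[i:i + look_back], series[i + look_back]))
    return pairs

def split_train_test(pairs, train_fraction=2 / 3):
    """First two-thirds of the samples for training, the rest for testing."""
    cut = int(len(pairs) * train_fraction)
    return pairs[:cut], pairs[cut:]

counts = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]  # 12 months of toy earthquake counts
pairs = make_windows(counts, look_back=1)
train, test = split_train_test(pairs)
print(len(pairs), len(train), len(test))       # -> 11 7 4
```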
2) LSTM NETWORK SETTINGS
In this case, we build our LSTM network with one-dimensional input only in the time domain. The output of the LSTM layer has 4 neurons. The activation function is set by default to the sigmoid function and makes a single-value prediction. The "look back window" of the system, i.e., the number of most recent data items that we consider as input to predict the next time slot's variables, is set to 1 and 10.

3) SIMULATION RESULTS AND DISCUSSIONS
Figures 9 and 10 show the prediction results for the total number of earthquakes when the look back window is equal to 1 and 10, respectively. The blue line shows the real earthquake frequency distribution, while the green line and the red line denote the training results and testing results, respectively. Here, we adopt the mean squared deviation (MSD) as our loss function. We get MSDs of 40.37 and 42.50 for look back windows of 1 and 10, respectively, which represent highly inaccurate predictions.

Besides, we also employ our proposed LSTM network with one-dimensional input to predict whether there are earthquakes or not. The selected area of interest is in mainland China, particularly between 75°E and 119°E longitudes and 23°N and 45°N latitudes, as shown in Figure 11. We equally divide this area into nine smaller sub-regions, and aim to predict whether there are earthquakes with magnitudes greater than 4.5 in each of the sub-regions, with the data collected from 1966 to 2016. Besides, in our LSTM network, the LSTM layer has an output of 128 neurons, the dense network has 256 and 64 neurons in its first and second layers, respectively, and the output layer has 9 neurons. The activation function is set to the softmax function. Our results show that the overall prediction accuracy is 63.50 percent, with a true positive accuracy of 46.83 percent and a true negative accuracy of 79.6 percent.

FIGURE 11. Earthquakes with magnitudes greater than 4.5 from 1966 to 2016. The way we divide the area into 9 sub-regions is shown in the figure, where sub-regions have been numbered from 1 to 9.
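For illustration, assigning an epicenter to one of the nine equal sub-regions can be sketched as follows. The numbering here is an assumption made for the example (rows counted from north to south, columns from west to east); the paper's actual numbering is defined in Figure 11:

```python
def sub_region(lat, lon, lat_min=23.0, lat_max=45.0,
               lon_min=75.0, lon_max=119.0, n=3):
    """Map an epicenter to a 1-based sub-region index on an n-by-n grid.

    Assumption: sub-regions are numbered row by row starting from the
    north-west corner, which may differ from the paper's Figure 11.
    """
    row = min(int((lat_max - lat) / ((lat_max - lat_min) / n)), n - 1)
    col = min(int((lon - lon_min) / ((lon_max - lon_min) / n)), n - 1)
    return row * n + col + 1

print(sub_region(44.0, 76.0))   # far north-west corner -> 1
print(sub_region(24.0, 118.0))  # far south-east corner -> 9
```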


FIGURE 12. Raw data of earthquakes with magnitudes greater than 4.5 from 1966 to 2016.

FIGURE 14. An input matrix with multi-hot vectors.
B. CASE STUDY II: THE PROPOSED LSTM NETWORK WITH TWO-DIMENSIONAL INPUT
In this case, by changing the input to two dimensions, we take advantage of spatio-temporal correlations to make earthquake predictions.

1) DATA PREPROCESSING
The same as in the previous case, we gather data from the USGS website, and use earthquake data in mainland China for our simulations. As shown in Figure 11, our area of interest is still between 75°E and 119°E longitudes and 23°N and 45°N latitudes, and is equally divided into nine smaller sub-regions. We are interested in the earthquakes in this area with magnitudes greater than 4.5 from 1966 to 2016. Some raw data are shown in Figure 12. We define a time slot to be one month. Thus, we have 600 data items, as shown in Figure 13. Here, the number in each data item is the frequency of earthquake events belonging to the corresponding sub-region.

As explained in our system model, we represent the original input by a multi-hot vector, in which an element is set to 1 if the corresponding sub-region has had earthquakes and 0 otherwise. Besides, we choose the look back window to be 12. Therefore, our input matrix is of size 12 × 9. Moreover, 70 percent of our data is used for model training, and the remaining 30 percent is used for testing. Figure 14 shows some of our generated multi-hot vectors.
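Building one such 12 × 9 input matrix from monthly per-sub-region counts can be sketched as below; the counts are toy values, not the real catalog data:

```python
def multi_hot(counts_by_region):
    """Turn one month's per-sub-region earthquake counts into a multi-hot vector."""
    return [1 if c > 0 else 0 for c in counts_by_region]

def input_matrix(monthly_counts, look_back=12):
    """Stack the last `look_back` months into a look_back-by-9 multi-hot matrix."""
    return [multi_hot(month) for month in monthly_counts[-look_back:]]

# Toy data: 12 months, 9 sub-regions each, mostly zeros
months = [[0] * 9 for _ in range(12)]
months[0][2] = 3   # 3 earthquakes in sub-region 3 in the first month
months[11][5] = 1  # 1 earthquake in sub-region 6 in the last month
X = input_matrix(months)
print(len(X), len(X[0]), X[0][2], X[11][5])  # -> 12 9 1 1
```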
2) LSTM NETWORK SETTINGS
We use the same settings as those used for obtaining the second set of simulation results in Section V-A3, i.e., for predicting whether there are earthquakes or not in mainland China with one-dimensional input. Specifically, with two-dimensional input, our LSTM neural network has 128 neurons in the output of the LSTM layer, 256 and 64 neurons in the first and second layers of the dense network, respectively, and 9 neurons in the output layer. As mentioned before, the softmax function is selected as our activation function. In addition, we employ RMSprop as our optimizer, with the learning rate set to 0.01.

3) SIMULATION RESULTS AND DISCUSSIONS
By conducting simulations with the proposed LSTM network with two-dimensional input, we find that the prediction accuracy on the testing data is 74.81 percent, with a true positive accuracy of 68.56 percent and a true negative accuracy of 81.31 percent, as shown in Figure 15. We can easily see that this prediction performance is obviously much better than that in Case I with one-dimensional input. Particularly, the true positive accuracy is much higher, i.e., 68.56 percent compared with 46.83 percent. Consequently, we can conclude that mining spatio-temporal data correlations does provide better prediction results than mining temporal data correlations only.

Moreover, notice that the above results are obtained when the area of interest is studied as a whole, and all the 9 sub-regions are considered to be closely correlated.

FIGURE 13. Data of earthquake frequencies in all the 9 sub-regions.


FIGURE 15. Results comparison between without and with decomposition, with 3 × 3 sub-regions and one-month time slots.

FIGURE 16. Results comparison between without and with decomposition, with 3 × 3 sub-regions and two-week time slots.

Next, we employ our decomposition method to further improve the performance.

Specifically, sub-regions 1, 2, 5, and 6 cover Tibet, Sichuan, Xinjiang, Gansu, and Ningxia provinces, which are within the same fault zone [28]. When we consider these four sub-regions as a group, the prediction accuracy is 88.57 percent. This indicates that our system can well learn the spatio-temporal correlations among the earthquakes in these four sub-regions and make accurate predictions. Besides, the 3rd sub-region includes Nepal, which is also a region with intense earthquake activity. However, because of the Himalayas, Nepal is located in a different fault zone from all the other sub-regions within mainland China. So it may have loose spatio-temporal correlations with the other sub-regions. This has been confirmed by the fact that the overall prediction accuracy of the group of sub-regions 2, 3, and 5 is 52.46 percent, and that of the group of 3, 5, and 6 is 56.25 percent.
After the analysis above, our final grouping plan is as follows. Group 1 consists of the 1st, 2nd, 5th, and 6th sub-regions, with a prediction accuracy of 88.57 percent; group 2 includes the 4th, 7th, 8th, and 9th sub-regions, with a prediction accuracy of 87.57 percent; and group 3 contains the 3rd sub-region, with a prediction accuracy of 61.60 percent. Combining the results together, our overall prediction accuracy is 85.12 percent, with a true positive accuracy of 77.07 percent and a true negative accuracy of 93.49 percent, which is also shown in Figure 15. From the figure, we can clearly see the performance improvement in terms of prediction accuracy, true positive accuracy, and true negative accuracy achieved by applying our decomposition method.
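The decomposition step can be sketched as follows. This is a sketch under stated assumptions: the per-group predictor is a stub standing in for a separately trained LSTM, while the grouping itself follows the final plan above:

```python
# Final grouping plan: sub-regions sharing a fault zone are trained together.
GROUPS = [(1, 2, 5, 6), (4, 7, 8, 9), (3,)]

def predict_group(model, window, regions):
    """Run one group's model on the input-window columns for its sub-regions.

    `model` is any callable taking a (look_back x len(regions)) sub-matrix;
    it stands in for a separately trained per-group LSTM.
    """
    sub_window = [[row[r - 1] for r in regions] for row in window]
    return model(sub_window)

def predict_all(models, window):
    """Merge per-group predictions back into one 9-element label vector."""
    labels = [0] * 9
    for model, regions in zip(models, GROUPS):
        for r, label in zip(regions, predict_group(model, window, regions)):
            labels[r - 1] = label
    return labels

# Stub model: predict 1 for a sub-region iff it had any earthquake in the window.
stub = lambda w: [1 if any(col) else 0 for col in zip(*w)]
window = [[0, 1, 0, 0, 0, 0, 0, 0, 1] for _ in range(12)]
print(predict_all([stub, stub, stub], window))  # -> [0, 1, 0, 0, 0, 0, 0, 0, 1]
```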
On the other hand, we compare a previous earthquake prediction scheme with ours. Specifically, Moustra et al. [22] make earthquake predictions by using a multi-layer perceptron (MLP), which is a kind of traditional ANN. We run this method on our two-dimensional input data, and the prediction accuracy is 66.99 percent, which is much lower than our result without decomposition, i.e., 74.81 percent, and that with decomposition, i.e., 85.12 percent.

Furthermore, we evaluate the performance of our system with inputs of different time slot sizes and different numbers of sub-regions. Note that the previous results are obtained when each time slot is one month. We then conduct simulations by reducing each time slot to two weeks. With the same system settings, our system leads to the prediction results shown in Figure 16. We can find that, without the decomposition method, the system gives lower prediction accuracy than when we set each time slot to one month, as shown in Figure 15. This is because the input data become sparser when the time slot is reduced to only two weeks, which makes it more difficult for the system to find the correlations among earthquake occurrences. Nevertheless, when applying the proposed decomposition method, we can still achieve results comparable with those with one-month time slots in Figure 15. Specifically, the overall accuracy becomes 86 percent, and the true positive accuracy and the true negative accuracy increase from 60.83 to 69.28 percent and from 77.38 to 94.09 percent, respectively. These results show that our proposed LSTM system with the decomposition method can work well with different time slot sizes in the temporal domain.

Besides, we also attempt to increase the number of sub-regions to make the spatial prediction more accurate. Similarly, we equally divide the whole area of interest into 5 × 5 sub-regions instead of the previous 3 × 3 sub-regions. To make fair comparisons, we still set each time slot to one month. Without applying the proposed decomposition method, the overall accuracy increases to 82.47 percent, which is better than the 74.81 percent obtained when the whole area is divided into 3 × 3 sub-regions. The complete results are shown in Figure 17. We can see that although the overall accuracy seems good, the true positive accuracy has dropped from 68.56 to 47.68 percent, which is too low to correctly predict earthquakes. The reason is that, similar to reducing the time slot size, the data become much sparser when the number of sub-regions increases from 9 to 25. Particularly, in the case of 25 sub-regions, the input vector becomes much longer, which makes mining the correlations among earthquake occurrences much more difficult. To address this issue, we apply our decomposition method. Here, the grouping plan is still based on the fault zone distribution, similar to what we use when there are 3 × 3 sub-regions. From Figure 17, we can find that the overall accuracy increases from 82.47 to 87.59 percent


FIGURE 17. Results comparison between without and with decomposition, with 5 × 5 sub-regions and one-month time slots.
with the decomposition method applied. More noticeably, the true positive accuracy significantly increases from 47.68 to 86.48 percent, which is the best result we have obtained so far, and in the meantime, the true negative accuracy remains as high as 87.87 percent. Thus, with more sub-regions in the spatial domain, our LSTM system with the decomposition method can still effectively exploit the correlations among earthquakes to make accurate earthquake predictions.

To sum up, the above results clearly demonstrate the robustness and effectiveness of our new LSTM system. Last but not least, compared with previous earthquake prediction methods, which are mostly based on various seismic indicators, our system has little overhead in obtaining the input data. Particularly, even in areas where there are no seismic sensors and monitors, we are still able to use our system for earthquake prediction.

VI. CONCLUSION
In this paper, we have proposed a new earthquake prediction system from the spatio-temporal perspective. Specifically, we have designed an LSTM network with two-dimensional input, which can discover the spatio-temporal correlations among earthquake occurrences and take advantage of these correlations to make accurate earthquake predictions. The proposed decomposition method has been shown to significantly improve the effectiveness and efficiency of our LSTM network. Simulation results also demonstrate that our system can make accurate predictions with different temporal and spatial prediction granularities.
REFERENCES
[1] G. A. Sobolev, "Methodology, results, and problems of forecasting earthquakes," Herald Russian Academy Sci., vol. 85, no. 2, pp. 107-111, 2015.
[2] A. Boucouvalas, M. Gkasios, N. Tselikas, and G. Drakatos, "Modified-Fibonacci-dual-Lucas method for earthquake prediction," in Proc. 3rd Int. Conf. Remote Sensing Geoinformation Environment, 2015, pp. 95351A-95351A.
[3] S. Kannan, "Innovative mathematical model for earthquake prediction," Eng. Failure Anal., vol. 41, pp. 89-95, 2014.
[4] M. Hayakawa, "Earthquake prediction with electromagnetic phenomena," in Proc. 360 Degree Outlook Critical Scientific Technol. Challenges Sustainable Soc., 2016, Art. no. 020002.
[5] M. Akhoondzadeh and F. J. Chehrebargh, "Feasibility of anomaly occurrence in aerosols time series obtained from MODIS satellite images during hazardous earthquakes," Advances Space Res., vol. 58, pp. 890-896, 2016.
[6] V. Korepanov, "Possibility to detect earthquake precursors using cubesats," Acta Astronautica, vol. 128, pp. 203-209, 2016.
[7] J. Fan, Z. Chen, L. Yan, J. Gong, and D. Wang, "Research on earthquake prediction from infrared cloud images," in Proc. 9th Int. Symp. Multispectral Image Process. Pattern Recog., 2015, pp. 98150E-98150E.
[8] J. Thomas, F. Masci, and J. Love, "On a report that the 2012 M6.0 earthquake in Italy was predicted after seeing an unusual cloud formation," Natural Hazards Earth Syst. Sci., vol. 15, pp. 1061-1068, 2015.
[9] M. Hayakawa et al., "On the precursory abnormal animal behavior and electromagnetic effects for the Kobe earthquake (M~6) on April 12, 2013," Open J. Earthquake Res., vol. 5, no. 3, pp. 165-171, 2016.
[10] M. Last, N. Rabinowitz, and G. Leonard, "Predicting the maximum earthquake magnitude from seismic data in Israel and its neighboring countries," PLoS One, vol. 11, no. 1, 2016, Art. no. e0146101.
[11] G. Asencio-Cortés, F. Martínez-Álvarez, A. Morales-Esteban, J. Reyes, and A. Troncoso, "Improving earthquake prediction with principal component analysis: Application to Chile," in Proc. Int. Conf. Hybrid Artif. Intell. Syst., 2015, pp. 393-404.
[12] J. Mahmoudi, M. A. Arjomand, M. Rezaei, and M. H. Mohammadi, "Predicting the earthquake magnitude using the multilayer perceptron neural network with two hidden layers," Civil Eng. J., vol. 2, no. 1, pp. 1-12, 2016.
[13] C. Li and X. Liu, "An improved PSO-BP neural network and its application to earthquake prediction," in Proc. Chinese Control Decision Conf., 2016, pp. 3434-3438.
[14] S. Saba, F. Ahsan, and S. Mohsin, "BAT-ANN based earthquake prediction for Pakistan region," Soft Comput., vol. 20, pp. 1-9, 2016.
[15] A. Panakkat and H. Adeli, "Recurrent neural network for approximate earthquake time and location prediction using multiple seismicity indicators," Comput.-Aided Civil Infrastructure Eng., vol. 24, no. 4, pp. 280-292, 2009.
[16] X. Wang, L. Gao, and S. Mao, "PhaseFi: Phase fingerprinting for indoor localization with a deep learning approach," in Proc. IEEE Global Commun. Conf., 2015, pp. 1-6.
[17] X. Wang, L. Gao, S. Mao, and S. Pandey, "CSI-based fingerprinting for indoor localization: A deep learning approach," IEEE Trans. Vehicular Technol., vol. 66, no. 1, pp. 763-776, 2017.
[18] M. Jiang, "Easily magnetic anomalies earthquake prediction," in Proc. MATEC Web Conf., 2016, Art. no. 01020.
[19] E. Florido, F. Martínez-Álvarez, A. Morales-Esteban, J. Reyes, and J. Aznarte-Mellado, "Detecting precursory patterns to enhance earthquake prediction in Chile," Comput. Geosciences, vol. 76, pp. 112-120, 2015.
[20] G. Asencio-Cortés, F. Martínez-Álvarez, A. Morales-Esteban, and J. Reyes, "A sensitivity study of seismicity indicators in supervised learning to improve earthquake prediction," Knowl.-Based Syst., vol. 101, pp. 15-30, 2016.
[21] S. Narayanakumar and K. Raja, "A BP artificial neural network model for earthquake magnitude prediction in Himalayas, India," Circuits Syst., vol. 7, no. 11, pp. 3456-3468, 2016.
[22] M. Moustra, M. Avraamides, and C. Christodoulou, "Artificial neural networks for earthquake prediction using time series magnitude data or seismic electric signals," Expert Syst. Appl., vol. 38, no. 12, pp. 15032-15039, 2011.
[23] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780, 1997.
[24] A. Graves, "Generating sequences with recurrent neural networks," arXiv, vol. abs/1308.0850, 2013, http://arxiv.org/abs/1308.0850
[25] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929-1958, 2014.
[26] J. Shore and R. Johnson, "Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy," IEEE Trans. Inf. Theory, vol. 26, no. 1, pp. 26-37, Jan. 1980.
[27] T. Tieleman and G. Hinton, "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude," COURSERA: Neural Netw. Mach. Learn., vol. 4, no. 2, pp. 1-31, 2012.
[28] G. Pararas-Carayannis, The earthquake of May 12, 2008 in the Sichuan province of China, 2008. [Online]. Available: http://www.drgeorgepc.com/Earthquake2008ChinaSichuan.html
