Drought Assessment Based On Data Fusion and Deep Learning
Research Article
Received 23 April 2022; Revised 30 May 2022; Accepted 6 June 2022; Published 31 July 2022
Drought is a major factor affecting the sustainable development of society and the economy. Research on drought assessment is of great significance for formulating drought emergency policies, providing early warning of drought risk, and enhancing the ability to withstand drought risks. Taking the Yellow River Basin as the study area, this paper combines data fusion, the copula function, entropy theory, and deep learning: it fuses the features of meteorological drought and hydrological drought into a drought assessment index and establishes a long short-term memory (LSTM) network for drought assessment. The results show that (1) after the features of meteorological drought and hydrological drought are extracted, the drought convergence index (DCI) built on the fused features by the copula function can accurately reflect the start and duration of drought; (2) the drought assessment indices were effectively screened by judging the causality of the drought system with the transfer entropy; and (3) the LSTM established on the DCI and the drought assessment factors can accurately assess the drought risks of the Yellow River Basin.
encompassing rainfall, runoff, evapotranspiration, and soil water content and found that the index is highly sensitive to mild drought.

Drought cannot be fully described by a single drought index. Therefore, many scholars have worked to construct comprehensive drought indices that cover as many drought variables as possible [9]. Ren et al. [10] combined SPI, PDSI, and SPEI into comprehensive drought indices through fuzzy comprehensive evaluation. Maji and Kanrar [11] proposed a comprehensive drought index by principal component analysis (PCA). Yet combined drought indices based on the weighting method or the fuzzy combination method are subjective in their weighting and therefore prone to error, while those based on PCA combine related variables linearly and cannot reflect their nonlinear effects.

Based on data fusion, this paper constructs the DCI, which combines meteorological and hydrological factors by fusing meteorological and hydrological drought at the feature layer with the copula function. A copula function is simply a specification of how univariate marginal distributions combine to form a multivariate distribution. There is no limitation in choosing the marginal distribution function, and all margin-free characteristics can be fully maintained. The DCI not only shares the ability of the meteorological drought index to capture the onset of drought quickly but also has the advantage of the hydrological drought index in describing the duration of drought, fusing meteorological and hydrological characteristics. The original information can be optimized and combined through data fusion, making it possible to turn multisource information into effective output. This paper extracts the features of hydrological drought and meteorological drought from the data collected at six major hydrological stations and at meteorological stations along the trunk of the Yellow River and fuses the features with the copula function, producing a hybrid drought index composed of both hydrological and meteorological factors. When selecting the drought factors, the influence between variables is measured by transfer entropy, and through this influence, a causal relationship between variables is established. Moreover, the LSTM network, a deep learning model, was adopted to evaluate drought from the drought factors.

2. Study Area and Data Sources

The 5,464 km-long Yellow River (Figure 1) is the sixth longest river in the world and the second longest in China, flowing through 9 provinces. The basin of the river, covering an area of 795,000 km², spans four geomorphic units: the Qinghai-Tibet Plateau, the Inner Mongolia Plateau, the Loess Plateau, and the North China Plain. Since the 1980s, the temperature has risen significantly in the Yellow River Basin, while the rainfall has dropped slightly [12]. Since the 1990s, the basin has been hit by increasingly serious droughts. In 2009, a wide and severe drought plagued the Central Plains region, sounding an alarm about the severity of drought [13]. In fact, the Yellow River Basin is the largest drought-stricken basin in China [14]. Drought has brought multiple problems, such as dried rivers and soil degradation [15]. It is of practical significance to study the drought assessment of the Yellow River Basin, which would aid ecological balance and social development.

[Figure 1: Map of the study area; legend entries: hydrological station, meteorological station, main stream.]

This paper selects the measured monthly runoffs of 1980–2019 at six major hydrological stations and the monthly meteorological data of 1980–2019 at 21 meteorological stations along the trunk of the Yellow River. On this basis, the meteorological and hydrological drought features were fused for assessment. The meteorological data were obtained from the China Meteorological Data Service Center (http://data.cma.cn), and the hydrological data were collected from the official site of the Yellow River Conservancy Commission of the Chinese Ministry of Water Resources (http://www.yrcc.gov.cn/).

3. Methodology

3.1. Data Fusion

3.1.1. Selection of Characteristic Parameters. The effectiveness of the copula function comes from the fact that the function can merge arbitrary marginal distributions, which contain the information of all variables, into a joint distribution, without losing or distorting any information. In 1993, McKee et al. created the SPI [16], a multitimescale drought index. The SPI can effectively assess the features of drought on different scales, using the occurrence probability of precipitation. The standardized runoff index (SRI) measures hydrological drought by river runoff and its statistics. The calculation process of the SRI is similar to that of the SPI [17, 18]. The drought/flood grading standards of SPI and SRI follow the Grades of meteorological drought (GB/T20481-2017), which was formulated by the National Technical Committee on the Standardization of Climate and Climate Change of China.

3.1.2. Copula Function. According to Sklar's theorem [19, 20], for any two one-dimensional (1D) random variables X and Y, if their distribution functions are $F_X(x) = P(X \le x)$ and $F_Y(y) = P(Y \le y)$, and their joint distribution function is $F_C(x, y)$, then there exists a unique copula function $C(\cdot)$ satisfying the following equation:

$$F_C(x, y) = C\left(F_X(x), F_Y(y)\right), \quad x, y \in \mathbb{R}. \tag{1}$$

For two-dimensional (2D) variables, the common forms of copula function include Gaussian, Archimedean, and Student's t. Among them, the Archimedean copula is widely applied because it can reflect the dependence intensity with a single parameter. The Archimedean copula can be further divided into three types: Gumbel, Clayton, and Frank. With asymmetric structures, Gumbel and Clayton copulas can capture the asymmetric properties. With a symmetric dependence structure, the Frank copula permits negative dependence by histogram of variables X and Y. The density functions can be defined as
$$c_{Gu}(u, v; \theta) = \frac{C_{Gu}(u, v; \theta)\,(\ln u \cdot \ln v)^{\theta - 1}}{uv\left[(-\ln u)^{\theta} + (-\ln v)^{\theta}\right]^{2 - 1/\theta}} \left\{\left[(-\ln u)^{\theta} + (-\ln v)^{\theta}\right]^{1/\theta} + \theta - 1\right\}, \tag{2}$$

$$c_{Fr}(u, v; \theta) = \frac{-\theta\left(e^{-\theta} - 1\right)e^{-\theta(u + v)}}{\left[\left(e^{-\theta} - 1\right) + \left(e^{-\theta u} - 1\right)\left(e^{-\theta v} - 1\right)\right]^{2}}, \tag{3}$$

$$c_{Cl}(u, v; \theta) = (1 + \theta)(uv)^{-\theta - 1}\left(u^{-\theta} + v^{-\theta} - 1\right)^{-2 - 1/\theta}, \tag{4}$$

where θ is a parameter; u and v are marginal cumulative probabilities.

The parameter θ can be estimated from the Kendall rank correlation coefficient τ. For the Gumbel copula, the relationship between θ and τ is $\tau = 1 - 1/\theta$, $\theta \in [1, \infty)$. For the Frank copula, that relationship is $\tau = 1 + \frac{4}{\theta}\left(\frac{1}{\theta}\int_{0}^{\theta}\frac{t}{e^{t} - 1}\,dt - 1\right)$, $\theta \in \mathbb{R} \setminus \{0\}$. For the Clayton copula, that relationship is $\tau = \theta/(\theta + 2)$, $\theta \in (0, \infty)$.

For different objects, how well a copula describes the fused feature between SPI and SRI can be measured by the Euclidean distance between the empirical copula function $\hat{C}(u, v)$ and the copula function $C(u, v)$:

$$d = \sqrt{\sum_{i=1}^{n}\left|\hat{C}(u_i, v_i) - C(u_i, v_i)\right|^{2}}, \tag{5}$$

where n is the sample size. The smaller the distance, the better the goodness-of-fit of the copula function for the variables.

3.1.3. Feature Layer Fusion. Let random variable X be the meteorological drought feature SPI, with a marginal distribution of $F_X(x)$, and let random variable Y be the hydrological drought feature SRI, with a marginal distribution of $F_Y(y)$. The joint distribution probability of drought features can be calculated by

$$P(X \le x, Y \le y) = F_C(x, y) = C\left(F_X(x), F_Y(y)\right) = p. \tag{6}$$

The drought convergence index (DCI) can be expressed as

$$\mathrm{DCI} = \varphi^{-1}(p), \tag{7}$$

where φ is the standard normal distribution function.

Referring to the classification standards of SPI [21] and the National Climate Center's standard (GB/T20481-2017), droughts can be divided into different levels (Table 1) by severity.

According to the frequency of historical droughts and the empirical frequency of DCI, DCI = −1 was taken as the threshold of drought occurrence: if the DCI value remains below −1, a drought is deemed to have occurred.

3.2. Transfer Entropy. Considering the transmissibility of information, Schreiber [22] introduced the concept of transfer entropy based on information entropy theory. Building on mutual information, which reflects the correlation between variables, the transfer entropy measures the causality of information transfer in terms of magnitude and direction.
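To make the idea of a direction-sensitive measure concrete, the following is a minimal sketch of how a transfer entropy could be estimated from two discrete series. It is not the authors' implementation: it assumes both series have already been binned into integer states, uses a history length of one step, and estimates all probabilities by simple counting. The function name `transfer_entropy` and the toy series are hypothetical.

```python
import math
from collections import Counter


def transfer_entropy(x, y):
    """Plug-in estimate of the transfer entropy from x to y, in nats.

    x, y: equal-length sequences of discrete (already binned) states.
    Uses one past step of each series (an assumption of this sketch).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    # Observed triples (y_next, x_now, y_now) along the series.
    triples = list(zip(y[1:], x[:-1], y[:-1]))
    n = len(triples)
    c_yxy = Counter(triples)                              # (y_next, x_now, y_now)
    c_xy = Counter((xn, yn) for _, xn, yn in triples)     # (x_now, y_now)
    c_yy = Counter((y1, yn) for y1, _, yn in triples)     # (y_next, y_now)
    c_y = Counter(yn for _, _, yn in triples)             # y_now
    te = 0.0
    for (y1, xn, yn), count in c_yxy.items():
        p_joint = count / n                               # p(y_next, x_now, y_now)
        p_given_xy = count / c_xy[(xn, yn)]               # p(y_next | x_now, y_now)
        p_given_y = c_yy[(y1, yn)] / c_y[yn]              # p(y_next | y_now)
        te += p_joint * math.log(p_given_xy / p_given_y)
    return te


# Toy data (hypothetical): y repeats x with a one-step delay, so x should
# carry more information about the future of y than y does about x.
x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1]
y = [0] + x[:-1]
te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
```

Comparing `te_xy` with `te_yx` mirrors the asymmetry test used later for factor selection: the larger of the two values indicates the dominant direction of information flow. Continuous series such as temperature or runoff would first need a binning step, whose choice affects the estimate.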
Table 1: Division of drought levels.
DCI              Drought level
(−∞, −2.0]       Extreme drought
(−2.0, −1.5]     Severe drought
(−1.5, −1.0]     Moderate drought
(−1.0, ∞)        No drought

3.2.1. Conditional Mutual Information. Conditional mutual information refers to the amount of mutual information acquired about event $y_j$ based on the known event $x_i$, under the given event $z_k$:

$$I(y_j; x_i \mid z_k) = \log\frac{p(y_j \mid x_i, z_k)}{p(y_j \mid z_k)}. \tag{8}$$

Taking the expectation over variables X, Y, and Z, the mean conditional mutual information of X relative to Y under Z can be obtained as follows:

$$\begin{aligned} I(Y; X \mid Z) &= E\left[I(y_j; x_i \mid z_k)\right] \\ &= \sum_{j=1}^{N}\sum_{i=1}^{N}\sum_{k=1}^{N} p(y_j, x_i, z_k)\log\frac{p(y_j \mid x_i, z_k)}{p(y_j \mid z_k)} \\ &= \sum_{j=1}^{N}\sum_{i=1}^{N}\sum_{k=1}^{N} p(y_j, x_i, z_k)\log p(y_j \mid x_i, z_k) - \sum_{j=1}^{N}\sum_{i=1}^{N}\sum_{k=1}^{N} p(y_j, x_i, z_k)\log p(y_j \mid z_k) \\ &= H(Y \mid Z) - H(Y \mid X, Z). \end{aligned} \tag{9}$$

3.2.2. Transfer Entropy. Suppose discrete variables $X_i$ and $Y_i$, $i = 1, 2, \ldots, N$, are of the same length and mutually act on each other. Then, the transfer entropy from X to Y reflects the information transfer from X to Y in the past states:

$$\begin{aligned} TE_{X \to Y} &= \sum P(Y_{i+1}, X_i, Y_i)\log\frac{P(Y_{i+1} \mid X_i, Y_i)}{P(Y_{i+1} \mid Y_i)} \\ &= \sum P(Y_{i+1}, X_i, Y_i)\log P(Y_{i+1} \mid X_i, Y_i) - \sum P(Y_{i+1}, X_i, Y_i)\log P(Y_{i+1} \mid Y_i) \\ &= H(Y_{i+1} \mid Y_i) - H(Y_{i+1} \mid X_i, Y_i). \end{aligned} \tag{10}$$

Conditional mutual information measures the dependence between variables X and Y, in the light of the information provided by variable Z. Transfer entropy considers the past state of variable Y and the dependence between variables X and Y. Therefore, the following formula can be derived from the relationship between conditional mutual information and transfer entropy:

$$TE_{X \to Y} = I(Y_{i+1}; X_i \mid Y_i). \tag{11}$$

3.3. Deep Learning. In deep learning, the recurrent neural network (RNN) has been widely applied in natural language processing (NLP), for tasks such as speech recognition, language modeling, and machine translation. The RNN has a good memory, supports parameter sharing, and is Turing complete. Therefore, it has many advantages in learning the nonlinear features of series.

Proposed by Hochreiter and Schmidhuber [23], the LSTM overcomes the defects of the RNN by adding a memory unit to the recurrent layer, which alleviates the problem of exploding or vanishing gradients. The LSTM relies on the functions of the forget gate, the input gate, the output gate, and the memory unit to propagate and memorize long- and short-term information. Figure 2 shows the structure of the LSTM.

[Figure 2: Structure of the LSTM unit, showing the inputs $x_t$, the hidden outputs $h_{t-1}$, $h_t$, the unit states $C_{t-1}$, $C_t$, and the gates $f_t$, $i_t$, $o_t$.]

During the operation of the LSTM, the forget gate first determines which information to forget in the current unit according to the input at the current state. The output state of the previous unit is controlled by the sigmoid function:

$$f_t = \sigma\left(W_f\left[h_{t-1}, x_t\right] + b_f\right). \tag{12}$$

The input gate consists of two parts. First, the sigmoid layer determines which new information to add into the current unit. Then, the tanh layer obtains the new candidate state of the unit. Finally, the two are combined to obtain the state of the unit at the current moment t:

$$i_t = \sigma\left(W_i\left[h_{t-1}, x_t\right] + b_i\right), \tag{13}$$

$$\tilde{C}_t = \tanh\left(W_c\left[h_{t-1}, x_t\right] + b_c\right). \tag{14}$$

The memory unit C stores the memory in the RNN and represents the long-term memory. The short-term memory is denoted by h. Both memories are passed from unit to unit along the sequence to ensure the memory function of the LSTM:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t. \tag{15}$$

From the past output, current input, and current unit state, the output gate drives the current output:

$$o_t = \sigma\left(W_o\left[h_{t-1}, x_t\right] + b_o\right), \tag{16}$$

$$h_t = o_t \odot \tanh\left(C_t\right). \tag{17}$$
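The gate equations can be traced step by step in a few lines of code. The sketch below is illustrative only and is not the network trained in this paper: it implements one forward step of a single LSTM unit layer with plain Python lists, following Equations (12)–(16) and the standard output rule $h_t = o_t \odot \tanh(C_t)$. The helper names (`affine`, `lstm_step`), the layer sizes, and the random weights are assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, z, b):
    """Return W z + b for a list-of-rows matrix W and vectors z, b."""
    return [sum(w * v for w, v in zip(row, z)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W[k] acts on the concatenation [h_prev, x_t]."""
    z = h_prev + x_t                                              # [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(W["f"], z, b["f"])]         # forget gate, Eq. (12)
    i_t = [sigmoid(a) for a in affine(W["i"], z, b["i"])]         # input gate, Eq. (13)
    c_hat = [math.tanh(a) for a in affine(W["c"], z, b["c"])]     # candidate state, Eq. (14)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]  # Eq. (15)
    o_t = [sigmoid(a) for a in affine(W["o"], z, b["o"])]         # output gate, Eq. (16)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]            # standard output rule
    return h_t, c_t


# Hypothetical sizes: five input factors, four hidden units, 12 time steps.
random.seed(0)
n_in, n_hid = 5, 4
W = {k: [[random.gauss(0.0, 0.1) for _ in range(n_hid + n_in)]
         for _ in range(n_hid)] for k in "fico"}
b = {k: [0.0] * n_hid for k in "fico"}
h, c = [0.0] * n_hid, [0.0] * n_hid
for _ in range(12):
    x_t = [random.gauss(0.0, 1.0) for _ in range(n_in)]   # synthetic inputs
    h, c = lstm_step(x_t, h, c, W, b)
```

Because the unit state is updated additively in Equation (15), information in `c` can persist across many steps when the forget gate stays close to 1, which is the mechanism behind the long-term memory described above.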
In Equations (12)–(16), $W_f$, $W_i$, $W_c$, and $W_o$ are the weight matrices of the forget gate, the input gate, the unit state, and the output gate, respectively; $b_f$, $b_i$, $b_c$, and $b_o$ are the biases of the forget gate, the input gate, the unit state, and the output gate, respectively.

The LSTM network uses the error backpropagation algorithm to update the weights. Each weight matrix is divided into two parts: one part, $W_{fh}$, $W_{ih}$, $W_{ch}$, and $W_{oh}$, acts on the output of the previous neuron, and the other part, $W_{fx}$, $W_{ix}$, $W_{cx}$, and $W_{ox}$, acts on the current input. L is the loss function, and the error at the current moment t is defined as

$$\delta_t \stackrel{\mathrm{def}}{=} \frac{\partial L}{\partial h_t}. \tag{18}$$

The input to the neuron is

$$\mathrm{net}_{f,t} = W_f\left[h_{t-1}, x_t\right] + b_f = W_{fh}h_{t-1} + W_{fx}x_t + b_f, \tag{19}$$

$$\mathrm{net}_{i,t} = W_i\left[h_{t-1}, x_t\right] + b_i = W_{ih}h_{t-1} + W_{ix}x_t + b_i, \tag{20}$$

$$\mathrm{net}_{c,t} = W_c\left[h_{t-1}, x_t\right] + b_c = W_{ch}h_{t-1} + W_{cx}x_t + b_c, \tag{21}$$

$$\mathrm{net}_{o,t} = W_o\left[h_{t-1}, x_t\right] + b_o = W_{oh}h_{t-1} + W_{ox}x_t + b_o, \tag{22}$$

$$\delta_{f,t} \stackrel{\mathrm{def}}{=} \frac{\partial L}{\partial\,\mathrm{net}_{f,t}}, \tag{23}$$

$$\delta_{i,t} \stackrel{\mathrm{def}}{=} \frac{\partial L}{\partial\,\mathrm{net}_{i,t}}. \tag{24}$$

4. Case Study

4.1. Data Fusion. Data fusion intends to combine information in the best possible way to obtain more effective information. This paper adopts data fusion to combine meteorological data and hydrological data and evaluate the drought level comprehensively. Data fusion generally falls into three levels: data layer fusion, feature layer fusion, and decision layer fusion. The fusion by copula function is a feature layer fusion because it operates on SPI and SRI, two characteristic indices that reflect meteorological and hydrological droughts, respectively.

4.1.1. Selection of Copula Function. Taking the Tangnaihai Station in the upper reaches of the Yellow River as an example, the frequency histograms and 2D frequency diagrams of SPI and SRI are displayed in Figure 3, and the cumulative distribution and density function diagrams of the Gumbel copula are displayed in Figure 4.

The SPI and SRI values of the Tangnaihai Station were concentrated in [−2, 2]. The SPI frequency distribution was higher in the middle than on the two sides and symmetric to a certain extent. The SRI frequency distribution was very asymmetric: the left side was taller than the right side. Thus, the distribution feature can be illustrated well by the Gumbel copula, which has the shortest Euclidean distance. Using the Gumbel copula, the authors established the copula function of the hybrid drought index for the Tangnaihai Station:

$$C_{Gu}(u, v; \theta) = \exp\left\{-\left[(-\ln u)^{1.6008} + (-\ln v)^{1.6008}\right]^{0.6247}\right\}. \tag{28}$$
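As a worked illustration of Equations (6), (7), and (28), the sketch below evaluates the fitted Gumbel copula and converts the joint probability into a DCI value and a drought level from Table 1. It is a simplified reading rather than the authors' code: it assumes that SPI and SRI are standard normal, so that their marginal cumulative probabilities are $\Phi(\mathrm{SPI})$ and $\Phi(\mathrm{SRI})$, and the input values and function names are hypothetical.

```python
import math
from statistics import NormalDist

THETA = 1.6008  # Gumbel parameter reported for the Tangnaihai Station, Eq. (28)


def gumbel_cdf(u, v, theta=THETA):
    """Gumbel copula C(u, v) for 0 < u, v < 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))


def dci(spi, sri, theta=THETA):
    """DCI = inverse standard normal CDF of the joint probability p.

    Assumption of this sketch: the margins of SPI and SRI are standard
    normal, so u = Phi(spi) and v = Phi(sri).
    """
    std = NormalDist()
    p = gumbel_cdf(std.cdf(spi), std.cdf(sri), theta)
    return std.inv_cdf(p)


def drought_level(value):
    """Map a DCI value to the drought levels of Table 1."""
    if value <= -2.0:
        return "Extreme drought"
    if value <= -1.5:
        return "Severe drought"
    if value <= -1.0:
        return "Moderate drought"
    return "No drought"


tau = 1.0 - 1.0 / THETA      # Kendall's tau implied by theta for the Gumbel copula
example = dci(-1.2, -0.9)    # hypothetical SPI and SRI values for one period
level = drought_level(example)
```

Since a copula never exceeds the smaller of its two arguments, the DCI computed this way is never larger than either input index; the fused index therefore flags drought at least as strongly as SPI or SRI alone under the stated assumption.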
[Figure 3: Frequency histograms and 2D frequency diagrams of SPI and SRI at the Tangnaihai Station, panels (a) and (b).]

Figure 4: Cumulative distribution function and probability density function of the Gumbel copula.

[Figure 5: Interannual SPI, SRI, and DCI at the Tangnaihai Station; vertical axis: drought index.]
4.1.2. Comparison of Drought Indices. Considering the situation of long-term drought, this paper computes the interannual SPIs and SRIs and conducts the feature layer fusion of meteorological and hydrological droughts by copula function, producing a hybrid drought index coupling long-term meteorological and hydrological information. Taking the Tangnaihai Station as an example, the interannual SPI, SRI, and DCI were compared (Figure 5).

The threshold of moderate drought was set to −1. Normally, drought begins when the drought index falls below −1 and ends when the index rises above −1. Drought occurrence depends directly on rainfall. If the rainfall is consistently abnormal for a period, it will affect the confluence between surface water and groundwater through the natural water circulation and thus impact the hydrological condition. The SPI responds sensitively to the moment of drought occurrence, while the SRI can accurately determine the duration and end time of drought. As shown in Figure 5, the DCI trend was overall similar to the trends of SPI and SRI. The three indices were consistent. The drought start time recognized by the DCI was earlier than that recognized by the SRI, and roughly the same as that identified by the SPI. Besides, the DCI recognized the same end time of drought as the SRI. Therefore, the DCI combines the merits of the SPI and the SRI: like the SPI, it sensitively captures the start time of drought, and like the SRI, it effectively depicts the drought duration. By fusing meteorological information with hydrological information, the DCI can characterize both meteorological and hydrological droughts simultaneously. The advantages of the DCI include high sensitivity, strong recognizability, and wide applicability. This research further validated the finding of Kimaru et al. [21] that meteorological drought occurs and terminates very quickly, while hydrological drought begins and ends with a certain delay in response to meteorological drought.

4.2. Selection of Assessment Factors. When selecting the drought factors, the influence between variables is measured by transfer entropy, and through this influence, a causal relationship between variables is established. Based on conditional mutual information, the transfer entropy is an asymmetric measure [24]. Let $TE_{X \to Y}$ be the information transfer from X to Y and $TE_{Y \to X}$ be the information transfer from Y to X. If $TE_{X \to Y} > TE_{Y \to X}$, the influence of X over Y is stronger than that of Y over X, and X would be regarded as a driver of Y. Two variables were then defined as Y = {DCI12} and X = {Drought factors}. Table 3 shows the values of $TE_{X \to Y}$ and $TE_{Y \to X}$.

Table 3: Transfer entropies between drought influencing factors and the drought index.
Factor            Upper reaches         Middle reaches        Lower reaches
                  TE(X→Y)   TE(Y→X)     TE(X→Y)   TE(Y→X)     TE(X→Y)   TE(Y→X)
Air temperature   0.1414    0.0740      0.1590    0.1022      0.1555    0.1290
Runoff            0.1304    0.1034      0.1394    0.1222      0.1410    0.0880
Rainfall          0.1252    0.1633      0.1717    0.2066      0.1215    0.1947
Humidity          0.1525    0.1059      0.1731    0.1537      0.1790    0.1083
Air pressure      0.1026    0.1049      0.1663    0.1724      0.1535    0.1542
Vapor pressure    0.1412    0.1611      0.1564    0.1638      0.1540    0.1647
Sunshine hours    0.1740    0.1348      0.1494    0.1353      0.1418    0.0838
Wind velocity     0.1360    0.0714      0.1564    0.0874      0.1925    0.1204

Table 3 shows that $TE_{X \to Y} < TE_{Y \to X}$ held for rainfall, air pressure, and vapor pressure in all three reaches. Thus, these factors are largely constrained by drought and are not the causes of drought. In the long run, drought is mainly affected by constantly changing factors. The action of drought will affect the natural water circulation, thereby constraining the formation of clouds, rain, and fog. As a result, meteorological factors like rainfall and air pressure will be influenced by drought. Rainfall, as an instantaneous factor, drives short-term drought more significantly than long-term drought. Drought will be influenced only if the rainfall increases or decreases continuously and breaks the balance of water circulation. Overall, this paper chooses the following factors to evaluate drought: air temperature, runoff, humidity, sunshine hours, and wind velocity.

selected as the activation function. The Adam optimizer was employed to optimize the model. Drawing on the features of momentum and root mean square propagation (RMSProp), Adam adapts the learning rates using the first- and second-order moment estimates of the gradients, so as to update the weights effectively.

4.3.3. Model Evaluation. The model performance was evaluated by the mean squared error (MSE) and the coefficient of determination ($R^2$). The evaluation results are listed in Table 4.

As shown in Figure 6 and Table 4, the LSTM network can accurately assess the drought risks. The MSE was always smaller than 0.0032, with a minimum of 0.0015. The $R^2$ was always greater than 0.96, peaking at 0.9934. The interannual drought fluctuation was gentle, and the drought occurrence probability was low and highly predictable. Thus, the LSTM achieved a high prediction accuracy with strong stability.
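For readers who want to reproduce the two evaluation metrics, a minimal sketch follows. It assumes that $R^2$ denotes the usual coefficient of determination, $1 - SS_{res}/SS_{tot}$, which the text does not state explicitly; the function names and the sample values are hypothetical and are not taken from Table 4.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical DCI values and model outputs, for illustration only.
actual = [-1.2, -0.4, 0.3, 0.9, 1.5]
predicted = [-1.1, -0.5, 0.35, 0.85, 1.45]
sample_mse = mse(actual, predicted)
sample_r2 = r2(actual, predicted)
```

A small MSE together with an $R^2$ close to 1 indicates that the predictions track both the level and the variation of the observed index, which is how the results above are interpreted.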
[Figure 6: Panels (a)–(f); recovered panel titles: Shizuishan Station, Sanmenxia Station, Longmen Station, Lijin Station; horizontal axis: year, 2016–2020.]

[Figure: Map with latitude ticks at 35° N and 40° N and a color scale from −0.5 (low) to 2.5 (high).]
[26] achieved a high prediction accuracy with the LSTM neural network. Wu et al. [27] described monthly rainfall with wavelet transform, ARIMA, and LSTM, and demonstrated that the proposed W-A-L composite model can predict rainfall accurately, providing a good reference for further research into drought prediction.

5. Conclusions

The Yellow River Basin is frequently hit by drought, owing to its complex natural conditions, unique climate features, and special geographical conditions. The drought index and drought evaluation are important issues in the understanding and prevention of drought. Taking rainfall and runoff as meteorological and hydrological factors, this paper explores the nonlinear relationship between meteorological and hydrological series by the copula function and constructs a hybrid drought index through feature layer fusion. Then, the transfer entropy was computed for drought factors and indices, the causality was judged by the direction of information flow, and multiple factors were chosen for drought assessment. Further, the data information was mined through deep learning and used to build a multivariate LSTM network for drought evaluation. The main conclusions are as follows:

(1) Following the idea of the copula function, the hybrid drought index DCI combines the features of rainfall and runoff and integrates the merits of SPI and SRI. The index can detect the drought occurrence as the SPI does and capture the drought duration as the SRI does, providing an effective tool to depict the start, end, and progression of drought.

(2) The transfer entropy was adopted to analyze the causality between climate and hydrological factors and the drought index, and to judge the direction of information flow. In this way, the air temperature, runoff, humidity, sunshine hours, and wind velocity were selected to assess the interannual drought in the Yellow River Basin.

(3) A deep learning model was established for drought assessment. The assessment accuracy of the LSTM network was verified by comparing the predicted values with the actual values and by computing the evaluation metrics.

Drought research is a hot field among researchers. More scientific studies are expected on the driving mechanism, evolution, and evaluation of drought. In the era of big data, data fusion and deep learning could be combined to open a bright new arena of drought research.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Key Science and Technology Projects in Henan Province (212102310306) and
