Water Quality Assessment of A River Using Deep Learning Bi-LSTM Methodology: Forecasting and Validation
Abstract
Water is a prime necessity for the survival and sustenance of all living beings. Over the past few years, the water quality
of rivers has been adversely affected by harmful wastes and pollutants. This ever-increasing water pollution is a matter of
great concern because it is deteriorating the water quality, making it unfit for any type of use. Recently, water quality
modelling using machine learning techniques has generated a lot of interest and can be very beneficial in ecological and
water resources management. However, these techniques often suffer from high computational complexity and high prediction
error. The good performance of deep neural networks such as the long short-term memory network (LSTM) has been exploited
for time-series data. In this paper, a deep learning–based Bi-LSTM model (DLBL-WQA) is introduced to forecast the
water quality factors of the Yamuna River, India. The existing schemes do not perform missing value imputation and focus
only on the learning process without including a loss function pertaining to training error. The proposed model follows a
novel four-phase scheme: the first phase performs missing value imputation, the second phase generates feature maps
from the given input data, the third phase uses a Bi-LSTM architecture to improve the learning process, and finally,
an optimized loss function is applied to reduce the training error. Thus, the proposed model improves forecasting
accuracy. Data comprising monthly samples of different water quality factors were collected for 6 years (2013–2019)
at several locations in the Delhi region. Experimental results reveal that the predicted values of the model and the actual
values were in close agreement and could reveal a future trend. The performance of our model was compared with
various state-of-the-art techniques such as SVR, random forest, artificial neural network, LSTM, and CNN-LSTM. To check
the accuracy, metrics such as root mean square error (RMSE), mean absolute error (MAE), mean square error (MSE),
and mean absolute percentage error (MAPE) have been used. Experimental analysis is carried out by measuring the
COD and BOD levels. COD analysis reveals MSE, RMSE, MAE, and MAPE values of 0.015, 0.117, 0.115, and
20.32, respectively, for the Palla region. Similarly, BOD analysis indicates MSE, RMSE, MAE, and MAPE values of
0.107, 0.108, 0.124, and 18.22, respectively. A comparative analysis reveals that the proposed model outperforms all
other models, with the best forecasting accuracy and the lowest error rates.
Keywords: Water quality prediction · Deep learning · CNN · Bi-LSTM · Yamuna River
Responsible Editor: Xianliang Yi

* Sakshi Khullar
Nanhey Singh

1 Guru Gobind Singh Indraprastha University, West Patel Nagar, New Delhi 110008, India
2 CSE, GGSIPU, AIACTR, Krishna Nagar Road Chacha Nahru Bal Chikitsalaya, Geeta Colony, Delhi, New Delhi 110031, India

Introduction

Water is a dire necessity in all aspects of human, environmental, and social life. It is well established that water covers
71% of the area of the earth. Groundwater is among the prime sources of drinking water for the human race. Similarly,
rivers are important resources of natural water (Adimalla 2019). River water has several uses, such as domestic and
agricultural use. Because of these significant uses, water has gained the most special place among all natural
resources. However, current growth in urbanization and industrialization is responsible for
Author's Personal Copy
Environ Sci Pollut Res
producing hazardous wastes. These wastes are directly discharged into the rivers in various forms such as metals
(copper, cadmium, mercury, lead, arsenic, etc.), organic pollutants (hydrocarbons, pesticides, phenols, insecticides,
etc.), and microbiological bacilli. These wastes have a serious influence on the water by degrading its quality.
Water quality has a direct impact on human wellbeing. Because of the consumption of polluted water, health-related
issues and mortality rates are snowballing universally, especially in developing countries (Panaskar et al. 2016). As per
statistics from developing countries, more than 250 M people get infected every year, of whom 10–20 M die of the
resulting diseases (Mukate et al. 2018). Recently, due to the lack of freshwater resources, most of the population is using
underground water for drinking, agriculture, and industrial use. Underground water is a well-accepted and reliable
source of drinking water; thus, many people depend on groundwater resources such as hand pumps, bore wells, and dug
wells for drinking purposes. However, groundwater quality is currently also degrading due to several factors such as the
composition of the host rock, the interaction between rock and water, climate change, soil matrix, and water table depth.
Moreover, India is a major agroproducer nation that depends on rainwater and surface water resources for agriculture
and drinking. Hence, continuous monitoring of water is required to manage the water quality level for different uses.
Recently, the research community presented the water quality index (WQI) as a standard criterion to monitor the water
quality according to the water quality levels across the globe (Rahman et al. 2019). The value of WQI ranges between 1
and 100, wherein a larger WQI denotes a better level of water quality. Generally, a WQI score of 80 or more denotes
clean water, a score of 40–80 represents slightly polluted water, and a score below 40 indicates polluted water.
However, continuous monitoring of water can be costly and tedious. In various regions, water quality monitoring is
carried out through a time-consuming process. This process includes several steps such as water sample collection from
the site, proper storage, and transportation of these samples to the laboratories for testing. The testing of these water
samples requires time and expensive equipment. During this process, there is ample room for errors and inefficiencies in
handling the water samples. For efficient water management, water quality forecasting is considered a promising
solution. For water quality forecasting, machine learning techniques have gained huge attention from the research
community because of their ability to learn water quality patterns over time. Several techniques have been introduced,
such as artificial neural networks (Chatterjee et al. 2017), support vector regression (Li et al. 2017), the ARIMA model
(Zhang and Xin 2018), and deep learning methods such as LSTM (Wang et al. 2017), Deep Bi-S-SRU Learning Network
(Liu et al. 2020), recurrent neural network (Li et al. 2019), and many more.

Background of water quality prediction

Water pollution is a challenging threat to the environment and human health systems. Recently, water quality monitoring
and quality forecasting have been widely adopted by organizations in various countries. Generally, WQ monitoring
departments use six parameters to evaluate the features of water: pH, chemical oxygen demand (COD), ammoniacal
nitrogen (NH3N), dissolved oxygen (DO), suspended solids (SS), and biochemical oxygen demand (BOD). According to a
study presented in Yahya et al. (2019), dissolved oxygen (DO) plays an important role as it provides information about
the chemical, physical, and biotic characteristics of water. For example, DO represents the presence of oxygen (O2)
molecules in water as a concentration in mg/L. These parameters play a significant role in water quality prediction.

As mentioned before, India is a developing country, and the majority of the population relies on agriculture for their
living. To obtain better crop yield, the quality of underground and river water is an important parameter.
Several studies have been reported recently on water quality prediction for Indian rivers. Kisi and Parmar (2016)
presented a water pollution prediction study on the Yamuna River using SVM and an adaptive regression model. Here,
diverse groups of water quality characteristics are considered, such as free ammonia (AMM), total coliform (TC), total
Kjeldahl nitrogen (TKN), potential of hydrogen (pH), water temperature (WT), and faecal coliform (FC). Kadam et al.
(2019) presented an artificial neural network (ANN) and multiple linear regression (MLR) model for the Shivganga River
using the following parameters: pH, Ca, TDS, SO4, Cl, TH, EC, Na, Mg, HCO3, K, and NO3. Similarly, Bisht et al. (2017)
presented a comparative analysis of various combinations of decision tree–based learning algorithms such as J48
(C4.5), random tree (RT), Hoeffding tree, LMT (logistic model tree), and random forest (RF).

Problem statement

Dissolved oxygen is an important factor which is widely used for water quality forecasting. Regression methods
treat the DO data as time-series data and provide forecasts. Similarly, dynamic linear models are used in the sample
monitoring approach. These models use the time correlation structure to achieve real-time forecasting, which can be
useful for future periods. Maintaining reliability in water quality prediction models has become a hot research topic in
water environmental science. The drawbacks of traditional methods are eliminated by developing improved regression
models. Moreover, the traditional methods do not consider the effect
of physics, chemistry, biology, meteorology, and hydraulics factors (Liu et al. 2019). Currently, several methods such as
fuzzy logic and artificial neural networks have been presented to improve the performance of water quality models.
Similarly, LSTM and Bi-LSTM methods have also gained huge attention in the field of time-series forecasting (Shahid
et al. 2020). The existing techniques fail to deal with the computational complexity of multivariate data. Moreover, in
India, the river Yamuna is considered one of the important sources of water, since it passes through various states;
hence, monitoring the quality of its water becomes a very important task for irrigation and drinking water.

Contribution of the work

In this work, the main aim is to develop a reliable and automated machine learning process to monitor and predict the
water quality of the Yamuna River. The main contributions of this work are as follows:

(a) Collection of various water samples from different sites in the Delhi region, which include several parameters to
indicate the pollution level of water
(b) Implementing a missing value imputation model to deal with the missing values
(c) Development of a CNN model to generate a feature map from the given input data
(d) Introducing a new bi-directional LSTM architecture for better learning
(e) Incorporating an optimization process with the help of a loss function to minimize the training error
(f) Comparative analysis to prove the robustness of the proposed model.

Article organization

The article is structured as follows: sect. “Literature survey” presents a brief literature review of recent techniques
for water quality prediction and forecasting using machine learning, sect. “Proposed DLBL-WQA” presents the proposed
solution for water quality prediction using deep learning techniques, sect. “Results and discussion” presents the detailed
experimental analysis where we evaluate the efficacy of the proposed approach against prevailing techniques, and
finally, sect. “Conclusion and future scope” presents the concluding remarks about the proposed scheme of water
quality prediction.

Literature survey

In the previous section, the need for and advantages of water quality prediction are discussed. However, conventional
methods suffer from implementation costs and require more time to obtain the results due to their time-consuming process.
Thus, the existing techniques are not reliable for water quality prediction. Currently, machine learning and deep learning
have gained attention due to their significant pattern-learning performance.

Avila et al. (2018) presented a comparative study for water quality forecasting. This study includes several models such
as a Naive Bayes (NB)–based model, MLR, Bayesian network, dynamic regression, RT, classification tree, RF, multinomial
logistic regression, discriminant analysis, and Markov chain. The experimental study shows that Bayesian learning
achieves better performance when compared with existing techniques. However, these systems suffer from various
issues such as prediction accuracy; hence, scholars have concentrated on developing machine learning models for
water quality prediction. Li et al. (2017) proposed a regression-based method which uses ensemble empirical
mode decomposition (EEMD) and support vector regression (SVR). The main idea is to decompose and ensemble the
dissolved oxygen into several intrinsic mode functions. These modes are further modelled using support vector
regression. Yaseen et al. (2018) presented an enhanced form of least square SVM integrated with a Bat algorithm
(LSSVM-BA) to predict the DO absorption to approximate the pollution level of river water. Yasin and Karim (2020)
introduced fuzzy weighted multivariate regression analysis to predict the water quality index. The water samples are
examined based on six parameters: pH, chemical oxygen demand, ammoniacal nitrogen, dissolved oxygen,
suspended solids, and biochemical oxygen demand (BOD).

Under the machine learning paradigm, Ahmed et al. (2019) proposed supervised machine learning–based models using
four parameters, namely, temperature, turbidity, pH, and total dissolved solids. In this work, several techniques are
tested. The experimental study shows that the gradient boosting algorithm achieves better performance. Lu and Ma
(2020) introduced a hybrid decision tree model for short-term water quality prediction. Huang et al. (2018) introduced a
hybrid model using genetic algorithm, neural network, fuzzy logic, and wavelet transform. The fuzzy rules are
approximated using self-adaptive fuzzy c-mean clustering.

Liao and Zhao (2016) focused on dissolved oxygen for water quality prediction and proposed a combined model
using principal component analysis (PCA), fuzzy neural network (FNN), and differential evolution combined with BP
algorithm (DEBP). The PCA helps to minimize the dimensions of the input data vector and differential evolution
algorithm. Wang et al. (2017) presented a deep learning–based method using an LSTM (long short-term memory)
neural network. The LSTM NN model is established for prediction; later, training data is collected from Taihu Lake, and
finally, suitable parameters are selected to improve the
accuracy of neural networks. WQ prediction is a tedious task because the water quality parameters are non-linear,
dynamic, changeable, and complex. Due to these properties, traditional forecasting methods suffer from poor accuracy and
higher computational complexity. To overcome these issues, Hu et al. (2019) introduced a deep LSTM network for
water quality prediction in the mariculture environment. First of all, data pre-processing methods such as linear
interpolation, smoothing, filtering, and denoising are applied to correct the collected water quality data. Later,
Pearson’s correlation coefficient is computed to find the correlation between pH, temperature, and other features.
Finally, a prediction model is established to forecast water quality.
Similarly, Li et al. (2019) used recurrent neural networks for WQ prediction. The conventional recurrent NN models use a
single shallow model that cannot reliably capture the long-term relevance between water quality parameters.
This leads to false alarms in the WQP model. To overcome these issues, Li et al. (2019) introduced a combined
model using recurrent neural networks (RNN) with improved Dempster/Shafer (D-S) evidence theory (RNNs-DS). Here,
the RNN handles the long-term dependencies, and improved D-S evidence is used to synthesize the prediction outcome of
the RNN. Furthermore, an enhanced scheme is presented to obtain the number of evidences using correlation analysis,
which helps to reduce the uncertainty in evidence selection. Moreover, a modified softmax function is also presented to
solve the weight allocation problem. Ye et al. (2019) discussed the issues of dynamic non-linearity and the correlation
between water quality features. They use LSTM to optimize the RNN architecture, and a new architecture is introduced
by considering several optimal parameters such as the number of storage units, the number of structural layers, and the
window size. Experimental results show better prediction rates of the pollutant index with LSTM-RNN than with the
conventional grey model and the RNN model.

Proposed DLBL-WQA

This study is mainly focused on forecasting the water quality factors of rivers. The Yamuna River in the Indian region is
chosen as the study area because it is a prominent water resource for about five states, with over fifty million people
dependent on it. This section describes the proposed solution for water quality prediction.

Yamuna River

The Yamuna River is one of the major tributaries of the Ganga in the northern region of India. The origin point of the
Yamuna is Yamunotri, which is located at a height of 6387 m on the Banderpooch peaks (38°59′N 78°27′E) in the lower
region of the Himalayas in Uttarakhand (India). It travels a total of 1376 km while passing through various states such as
Rajasthan, Haryana, Himachal, Uttar Pradesh, and Delhi. The Yamuna River supplies more than 70% of water to Delhi,
and around 57 million people depend on the Yamuna River water for daily usage (Parmar and Bhardwaj 2014). Table 1
presents the total catchment of the Yamuna River.

Table 1 The catchment of Yamuna River

State                 Total area (sq. km)   % Contribution
Rajasthan             102,883               29.80%
UP and Uttaranchal    74,208                21.50%
Haryana               21,265                6.50%
Madhya Pradesh        14,028                40.60%
Himachal Pradesh      5,799                 1.6%
Delhi                 1,485                 0.4%

In this work, the Delhi stretch of the Yamuna River is considered. Water samples were collected from various regions
such as Palla (23 km upstream, near the flood control office), Nizamuddin Bridge (29 km downstream of Palla), Agra Canal
at Kalindi Kunj, Okhla (39 km downstream of Palla), and Agra Canal at Badarpur. Fig. 1 shows
the graphical representation of the considered study area.

The collected samples comprise nine different parameters: water temperature (WT), DO, pH, free ammonia
(AMM), COD, BOD, total coliform (TC), faecal coliform (FC), and conductivity (COND). Table 2 presents a brief
description of these parameters.

Methodology

This section describes the proposed solution for water quality prediction using deep learning techniques. Firstly, several
state-of-the-art models for time-series forecasting are discussed.

Forecasting methods

In this subsection, a brief description of forecasting models such as support vector regression, artificial neural
network, random forest, and deep learning methods is given.

SVR

Support vector machine (SVM) is a machine learning technique that is established on the concept of statistical learning
theory. SVR is a kind of SVM that is widely adopted in time-series prediction applications such as weather forecasting,
load forecasting, and fault prediction. Wang et al. used support vector regression and presented a hybrid structure
(WA-PSO-SVR) based on wavelet analysis (WA) coupled with support vector regression (SVR) and particle swarm
optimization (PSO). The SVR is used to monitor the levels of COD, DO, and
ammonia–nitrogen in river water. Similarly, SVR is also used for water temperature monitoring (Quan et al. 2020).
Thus, SVR can be applied to water quality forecasting. This work adopts the SVR model for Yamuna River water
quality monitoring. Let us consider a time-series data given as:

the bias. W and b are obtained by solving an optimization problem, which minimizes:

$$\frac{1}{2}\lVert W \rVert^{2} + C \sum_{i=1}^{N} \left( \varepsilon_i + \varepsilon_i^{*} \right) \tag{3}$$
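To make the objective in Eq. (3) concrete, the following minimal sketch evaluates it for a linear model f(x) = W·x + b. The half-width of the insensitive tube (`tube`), the way the slack terms are computed from the residuals, and all function and variable names are assumptions borrowed from the standard SVR formulation; the constraints that accompany Eq. (3) are not reproduced in this section, and the toy data are hypothetical.

```python
def svr_primal_objective(W, b, X, y, C=1.0, tube=0.1):
    """Evaluate 0.5 * ||W||^2 + C * sum(slack_upper_i + slack_lower_i).

    The two slack terms play the role of the epsilon_i and epsilon_i^* in
    Eq. (3): how far each target lies above or below an insensitive tube
    of half-width `tube` around the prediction (an assumed, standard choice).
    """
    norm_sq = sum(w * w for w in W)
    slack_total = 0.0
    for x_i, y_i in zip(X, y):
        f_i = sum(w * v for w, v in zip(W, x_i)) + b   # prediction f(x_i)
        slack_upper = max(0.0, y_i - f_i - tube)        # target above the tube
        slack_lower = max(0.0, f_i - y_i - tube)        # target below the tube
        slack_total += slack_upper + slack_lower
    return 0.5 * norm_sq + C * slack_total


# Toy data (hypothetical values, not taken from the Yamuna dataset)
X = [[1.0], [2.0], [3.0]]
y = [1.0, 2.5, 2.0]
print(svr_primal_objective(W=[1.0], b=0.0, X=X, y=y, C=1.0, tube=0.1))
```

A larger C penalizes points outside the tube more heavily, while the first term keeps W small; training an SVR amounts to searching for the W and b that minimize this quantity.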
Parameter   Description
AMM         Free ammonia is the part of ammonia that has not combined with chlorine and remains as NH4+ or NH3, depending on the temperature and pH of the water.
BOD         A measure of the amount of oxygen required by microorganisms to degrade the organic constituents present in water.
COD         An overall measure of all the organic and inorganic chemicals present in the water.
DO          A measure of the free (non-compound) oxygen present in the water. It has a significant influence on water quality because it affects aquatic life. Very high or very low DO can have an adverse effect on the quality of water.
TC and FC   The total coliform and faecal coliform tests are a major indicator of “potability”, i.e. the fitness of water for drinking. They indicate the level of overall coliform bacteria and the potential presence of disease-causing germs.
pH          Indicates the acidity or basicity of the water. It ranges from 0 to 14, where 7 is considered neutral. Values below 7 indicate acidity, while values above 7 indicate a base.
TKN         Total Kjeldahl nitrogen is a measure of organic nitrogen and ammonia. TKN usually lies in the range of 35 to 60 mg/L in influent municipal wastewater.
WT          Water temperature indicates the hotness or coldness of water. It plays a significant role in determining water quality through oxidation–reduction potential, water density, metabolic rates, compound toxicity, conductivity, and salinity.
ANN

ANNs are widely adopted in various applications such as rainfall forecasting (Unnikrishnan and Jothiprakash
2020, Samantaray et al. 2020) and electricity demand forecasting (Anand and Suganthi 2020). ANN is a learning model
motivated by brain function and the inherent nervous system. An ANN contains hidden layers to learn the patterns; the
simplest ANN model can be constructed using a single hidden layer and is known as a single-hidden layer
feedforward neural network (SLFN). This network contains an input layer and a hidden layer with an activation
function. The output of this network is computed as:

$$y = g\left( \sum_{j=1}^{h} w_{jo} v_j + b_j \right) \tag{5}$$

where $v_j$ is the output of the hidden layer, which is given by $v_j = f\left( \sum_{i=1}^{n} w_{ij} x_i + b_i \right)$,
$x_i$ denotes the input to the neuron, f(.) and g(.) represent the non-linear activation functions, n denotes the total
number of features, h denotes the number of hidden layer neurons, $w_{ij}$ is the weight between input variable i and
neuron j of the hidden layer, $w_{jo}$ is the weight of the connection between the hidden layer and the output, and y is
the final outcome of the ANN.

RF

RF is an ensemble learning method that can be utilized for classification and regression tasks. The random forest–based
regression scheme is also adopted in various forecasting systems (Moon et al. 2018). Generally, random forests use
bagging and random subspace methods for these problems. Bagging is a widely used method in random forest models.
According to these ensemble methods, each learning model is trained using bootstrap samples from the original training
samples, and then finally, the outputs are aggregated. In the random forest, each decision tree is constructed such that m
features are selected from a total of n features. Later, a feature impurity criterion is applied to partition the samples
along the feature axis. Table 3 shows the algorithmic representation of the random forest algorithm.

Similarly, the stacked autoencoder is a type of neural network which contains several layers of sparse autoencoders,
and the output layer is linked to the input layers. The encoding of each step is expressed as:

$$\begin{aligned}
a^{(l)} &= f\left(z^{(l)}\right) \\
z^{(l+1)} &= W^{(l,1)} a^{(l)} + b^{(l,1)}
\end{aligned} \tag{6}$$

Similarly, the decoding can be expressed as:

$$\begin{aligned}
a^{(n+l)} &= f\left(z^{(n+l)}\right) \\
z^{(n+l+1)} &= W^{(n-l,2)} a^{(n+l)} + b^{(n-l,2)}
\end{aligned} \tag{7}$$

where $a^{(l)}$ denotes the activation in layer l, $z^{(l)}$ is the weighted sum of inputs for layer l, $W^{(l,k)}$
denotes the weight value, and $b^{(l,k)}$ denotes the bias.

Table 3 Random forest algorithm

Inputs:
  Training dataset X with dimension N×n, where N denotes the number of observations and n represents the number of features.
  Y denotes the output or target values of prediction with dimension N×1.
  L represents the number of trees in the random forest, Ti represents the i-th decision tree in the RF, and m represents the number of randomly selected features in each node of the decision tree.
Step 1: generate the training set for each decision tree Ti in the random forest by sampling (with replacement) from all observations.
Step 2: compute the best split criterion for decision tree Ti by using m randomly selected features.
Step 3: repeat this process until the tree is grown and the full tree is constructed.
Step 4: finally, aggregate the outcome of each decision tree to achieve the final prediction result. This result can be used for classification and regression. The final value is determined using a majority vote for classification, whereas the mean or median of the predicted outputs is used for regression.
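The four steps of the random forest algorithm in Table 3 can be traced in a small regression sketch. This is an illustrative toy under stated assumptions, not the implementation used in the experiments: the depth limit, the minimum leaf size, the squared-error split criterion, the function names, and the synthetic data are all choices made here for brevity.

```python
import random
from statistics import mean


def _sse(values):
    """Sum of squared deviations from the mean (assumed regression impurity)."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)


def grow_tree(X, y, m, rng, depth=0, max_depth=3, min_leaf=2):
    """Steps 2-3: split on the best of m randomly chosen features, recursively."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)                                   # leaf value
    best = None
    for j in rng.sample(range(len(X[0])), m):            # m random features
        for t in sorted({row[j] for row in X}):          # candidate thresholds
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return mean(y)
    _, j, t, left, right = best
    kids = [grow_tree([X[i] for i in idx], [y[i] for i in idx],
                      m, rng, depth + 1, max_depth, min_leaf)
            for idx in (left, right)]
    return (j, t, kids[0], kids[1])


def predict_tree(node, x):
    while isinstance(node, tuple):                       # descend to a leaf
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_forest(X, y, L=25, m=2, seed=0):
    """Step 1: draw a bootstrap sample for each of the L trees, then grow it."""
    rng = random.Random(seed)
    forest = []
    for _ in range(L):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return forest


def predict_forest(forest, x):
    """Step 4 (regression): average the outputs of the individual trees."""
    return mean(predict_tree(tree, x) for tree in forest)


# Synthetic example: the target depends on feature 0 only.
X = [[float(i), float((7 * i) % 5), float((3 * i) % 4)] for i in range(20)]
y = [2.0 * i for i in range(20)]
forest = fit_forest(X, y)
print(predict_forest(forest, [2.0, 1.0, 0.0]), predict_forest(forest, [17.0, 1.0, 0.0]))
```

Because each node sees only m of the n features, individual trees sometimes split on an uninformative feature; averaging over the L trees is what smooths this out.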
Proposed deep learning–based forecasting method for water quality prediction

We propose a deep learning model for predicting the water quality in the Yamuna River using regression analysis.
Generally, the forecasting problem for a regression model can be expressed as:

$$y = F(x; \theta)$$

where x denotes the input vector, y denotes the predicted output for the corresponding input, and F is a mapping function
with parameters θ which are learned from the training sample pairs $\{(x_r, y_r) \mid r = 1, \ldots, R\}$. Thus,
the aim is to minimize the mean squared error (MMSE) of prediction. This MMSE function is defined as:

$$\varepsilon = \frac{1}{R} \sum_{r=1}^{R} \left\lVert \hat{y}_r - y_r \right\rVert_2^2 \tag{8}$$

where $\hat{y}_r$ is the predicted output and $y_r$ denotes the corresponding target output from the training pairs. In
this work, a deep neural network–based learning technique is used to learn the parameters. The deep neural network
adopts a feedforward neural network model which contains one or more hidden units.

Missing value imputation

The Yamuna River data samples were collected from the Central Pollution Control Board (CPCB), Delhi.
The data had some missing values. A few data samples were not present for June and July. Similarly, some of the FC and TC
values were missing. Sometimes, there was no water in a particular region; for instance, there was no water in the
Yamuna River in May 2016. These missing values can lead to inaccurate predictions. To avoid this, missing
value imputation is performed using a simple “mean” imputer, in which missing values are identified and replaced by the
mean of the neighbouring values.

Deep Bi-LSTM network

Recurrent neural networks (RNN) play a significant role in pattern learning. RNN models maintain a memory unit
based on contextual information, which is used for processing time-series data. However, RNNs have difficulty
managing long-range dependencies due to vanishing gradients. To overcome this, researchers have designed the long
short-term memory (LSTM) model, which solves the dependency problem by using memory cells. This model contains
three multiplicative gates which help to forget and store the information in the cell states. Fig. 3 shows the
complete proposed deep Bi-LSTM model to predict the water quality. In this work, CNN is combined with Bi-LSTM. The
input data processed through the missing value imputation model is fed into the CNN layers. The outcome of the CNN
layers is processed through the Bi-LSTM layers, and finally, fully connected layers generate the predicted output.

In the first module, two one-dimensional (1D) convolution layers followed by two max-pooling layers are used. This
combination helps to decrease the computational complexity of feature extraction. Conventional methods use an MLP
(multilayer perceptron) as the feature extractor, which uses a feedforward neural network process, but these methods
fail to attain the anticipated outcome due to the full connectivity between the perceptrons. On the other hand, CNN is
known as a special type of MLP where connectivity between all neurons is not required. These neurons are connected to a
certain region of the input data and arranged using a certain size and stride. Here, CNN filters are exploited, which
allow parameter and weight sharing to identify features at different locations.

In CNN layers, the neurons are combined by using weights and biases which are learned in the training process.
According to this model, we provide several water quality parameters as inputs to the neurons. In the next step, the dot
product operator is applied, followed by a non-linearity function. This model contains a 1D convolution layer, a pooling
layer, and a fully connected layer. The time-series data is given as input to the CNN in the form of one-dimensional data
arranged as sequential time instants. Let us consider that the input vector is given as $x = \{x_1, x_2, x_3, \ldots, x_n\}$,
where $x_n \in \mathbb{R}^{d}$ are the dataset variables. The 1D convolutional layer generates a feature map $f_m$ by
applying convolutional operators to the input water quality data with filter $w \in \mathbb{R}^{fd}$, where f denotes
the inherent features. A new feature map $f_m$ can be obtained from the previous feature map, which is expressed as:

$$hl_i^{f_m} = \tanh\left( w^{f_m} x_{i:i+f-1} + b \right) \tag{9}$$

where b denotes the bias and $hl \in \mathbb{R}^{n-f+1}$; the filter is applied to each set of f features in the input
data, $\{x_{1:f}, x_{2:f+1}, \ldots, x_{n-f+1:n}\}$, which generates a feature map $hl = [hl_1, hl_2, \ldots, hl_{n-f+1}]$.
The output of the convolution layer is obtained by adding the weighted inputs that are combined using a
multilinear transformation. Generally, a linear transformation fails to capture the complex structure of input data;
hence, non-linear functions are preferred over linear functions for better learning. This work uses the ReLU activation
function, which is applied to each input. Further, the output of the convolution layer is processed through the
max-pooling layer, and downsampling is performed, i.e. we apply the max-pooling layer on each feature map as
$\overrightarrow{hl} = \max\{hl\}$. This procedure helps to select the most important features. The outcome
of the max-pooling layer can be expressed as:

$$x'_i = \mathrm{CNN}(x_i) \tag{10}$$
where $x_i$ denotes the input vector which contains the water quality parameters, and $x'_i$ denotes the output of the CNN model, which is later fed to the Bi-LSTM network. Further, we incorporate the LSTM architecture with a forget gate structure, as depicted in Fig. 4. This structure is represented as:

$$
\begin{aligned}
i_t &= \sigma(W_i([x_t, y_{t-1}])) \\
f_t &= \sigma(W_f([x_t, y_{t-1}])) \\
o_t &= \sigma(W_o([x_t, y_{t-1}])) \\
g_t &= \tanh(W_g([x_t, y_{t-1}])) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
y_t &= o_t \odot \tanh(c_t)
\end{aligned}
\qquad (11)
$$

where i, f, o, and g denote the input, forget, output, and input modulation gates, respectively, and c denotes the cell state. σ denotes the sigmoid function; $W_i$, $W_f$, $W_o$, and $W_g$ denote the fully connected layers corresponding to the input, forget, output, and input modulation gates, respectively; and ⊙ denotes the element-wise product. Generally, the conventional LSTM processes the sequence in one direction only, which can reduce its effectiveness, because the opposite direction of the data may also contain valuable information. Hence, to overcome these challenges, we incorporate a bi-directional LSTM which blends both forward and backward directions in the data sequence.

Fig. 4 Bi-LSTM architecture

The processing of data in these two directions helps to capture each change in the information on water quality data. Let us consider that the water quality parameter sequence x has n elements, where the LSTM considers the forward direction elements as $\{x_1, x_2, x_3, \ldots, x_n\}$ and the backward direction as $\{x_n, x_{n-1}, x_{n-2}, \ldots, x_1\}$. These sequences are trained separately during the training process and finally fused by integrating the outputs of the training process for each direction. This can be expressed as:

$y(t) = y_F(t) \oplus y_B(n - t + 1)$ (12)

where $y_F$ and $y_B$ denote the forward and backward directions, respectively; ⊕ is the integration operator; and y(t) is the predicted output at time t.

Optimization model for LSTM

Many times, deep learning methods suffer from learning error-related issues which cause poor performance in classification and forecasting. The conventional methods use a hybrid model by combining the CNN and LSTM linearly; however, computational complexity and parameter optimization remain challenging tasks. To overcome this issue, an optimization module is developed to improve the learning. Reducing the training error can significantly improve forecasting performance. Focusing on this notion, a novel and optimized training module is presented to minimize the training error. This can be obtained by quantifying the uncertainty of prediction values. To solve this problem, identifying the optimal weights for LSTM is an important task to approximate the conditional distribution $p(y_{i,t+k} \mid y_{i,t})$ of the predicted output. Hence, a predictive model is created to estimate the probability distribution of the output. The network is trained to minimize the error by adjusting the parameters to maximize the likelihood L(θ) which is obtained for each output θ of historical observations.
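As a rough illustration of the recurrent part of the model, the gate updates of Eq. (11) and the two-direction fusion of Eq. (12) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the weights are random, biases are omitted as in Eq. (11), the integration operator ⊕ is taken to be concatenation, and the helper names (`lstm_step`, `run_lstm`, `bilstm`, `init_weights`) as well as the input values are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v):
    """Multiply a weight matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x_t, y_prev, c_prev, W):
    """One LSTM update following Eq. (11); W maps gate name to weight matrix."""
    z = x_t + y_prev  # list concatenation, i.e. [x_t, y_{t-1}]
    i = [sigmoid(a) for a in affine(W["i"], z)]    # input gate
    f = [sigmoid(a) for a in affine(W["f"], z)]    # forget gate
    o = [sigmoid(a) for a in affine(W["o"], z)]    # output gate
    g = [math.tanh(a) for a in affine(W["g"], z)]  # input modulation gate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    y = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return y, c


def run_lstm(seq, W, hidden):
    """Run the cell over a sequence, starting from a zero state."""
    y, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x_t in seq:
        y, c = lstm_step(x_t, y, c, W)
        outputs.append(y)
    return outputs


def init_weights(n_in, hidden, rng):
    """Random weights for the four gates (assumption: no trained model is available)."""
    return {gate: [[rng.uniform(-0.5, 0.5) for _ in range(n_in + hidden)]
                   for _ in range(hidden)]
            for gate in "ifog"}


def bilstm(seq, W_fwd, W_bwd, hidden):
    """Fuse forward and backward passes as in Eq. (12), with the fusion taken as concatenation."""
    y_f = run_lstm(seq, W_fwd, hidden)
    y_b = run_lstm(seq[::-1], W_bwd, hidden)  # processes x_n, ..., x_1
    n = len(seq)
    # Backward step n - t + 1 (1-based) is index n - 1 - t when t is 0-based.
    return [y_f[t] + y_b[n - 1 - t] for t in range(n)]


rng = random.Random(0)
seq = [[rng.random() for _ in range(9)] for _ in range(5)]  # 5 months x 9 parameters, made-up values
W_fwd = init_weights(9, 4, rng)
W_bwd = init_weights(9, 4, rng)
fused = bilstm(seq, W_fwd, W_bwd, hidden=4)
print(len(fused), len(fused[0]))
```

Each fused vector holds the forward state followed by the backward state for the same month, so every time step sees context from both ends of the sequence.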
The dataset contains nine parameters, namely WT, DO, pH, AMM, COD, BOD, COND, TC, and FC, to indicate the water quality. These parameters are treated as attributes for learning. Each attribute has 12 values (one per month); hence, for each region, there are 9 × 12 = 108 attribute values per year. The complete catchment of Yamuna River is considered in five regions. Therefore, a total of 5 × 108 = 540 attribute values are collected for each year's data sample. Six-year (from 2013 to 2019) water quality data is collected from the Central Pollution Control Board (CPCB), India. Thus, we have a total of 6 × 540 = 3240 attribute values in the collected data. These data are collected by CPCB through a monthly survey. Table 4 shows the sample data for 3 months in each region.
A deep learning–based model is used for water quality forecasting. In order to train this network, several parameters are used, which are listed in Table 5.
The proposed deep neural network model contains 250 hidden units, and the data is processed in batches with a batch size of 120. The data is divided in a 70–30% ratio, i.e. 70% of the data is used for training and 30% for testing. The complete data is trained using the Bayesian Regularization training function.

Four statistical parameters are employed to evaluate the performance of water quality forecasting using deep learning techniques. These parameters are mean squared error, mean absolute error, root mean square error, and mean absolute percentage error (Table 6).

• MSE (mean squared error): the average of the squared errors between predicted and actual values. A lower MSE indicates a better prediction.
• RMSE (root mean square error): the RMSE represents the standard deviation of the prediction errors.
• MAE (mean absolute error): the average magnitude of the prediction errors.
• MAPE (mean absolute percentage error): the average absolute error relative to the actual values, in percent; it expresses the accuracy of the time-series forecasting model.

These parameters can be computed as follows:
[Table 6: columns Parameter and Computation]

Comparative analysis

In this section, a comparative study of the proposed technique is made with other existing techniques of water quality
Artificial neural network   Dense layer nodes = 32, 64, and 64; optimizer = "rmsprop"
Random forest               n_estimators = 100, max_features = "auto", bootstrap = True
Linear regression           Default
LSTM                        Hidden layer LSTM nodes = 4; activation function = sigmoid, epochs = 100, batch_size = 1, verbose = 2
CNN-LSTM                    Filters = 32, kernel = 2, activation function = "relu", input_shape = (13, 1)
DLBL-WQA                    Filters = 32, kernel = 2, activation function = "relu", input_shape = (13, 1), dropout = 0.25
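To make the convolutional settings listed for CNN-LSTM and DLBL-WQA more concrete, the following minimal sketch applies kernel-size-2 filters with ReLU to a short series and then max-pools each feature map, in the spirit of Eq. (10). It rests on several assumptions: the max-pooling is read as a maximum over the whole feature map, only two hand-picked filters are used instead of the 32 listed in the table, the readings are made up, and the function names are hypothetical.

```python
def conv1d_relu(x, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation) of x with one kernel, followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(kernel[j] * x[i + j] for j in range(k)) + bias)
            for i in range(len(x) - k + 1)]


def max_pool(feature_map):
    """Keep only the strongest activation of a feature map."""
    return max(feature_map)


def cnn_features(x, kernels):
    """One pooled feature per filter; this vector would be passed on to the Bi-LSTM."""
    return [max_pool(conv1d_relu(x, kernel)) for kernel in kernels]


x = [7.2, 7.4, 8.1, 7.9, 6.5, 6.8]   # hypothetical monthly readings, not from the dataset
kernels = [[1.0, -1.0],              # responds to a drop between consecutive months
           [0.5, 0.5]]               # two-month moving average
features = cnn_features(x, kernels)
print(features)
```

The first filter picks out the largest month-to-month drop and the second the highest two-month average, which illustrates how pooling keeps the most prominent response of each filter.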
various parameters, for example, chemical oxygen demand (COD), biochemical oxygen demand (BOD), conductivity, dissolved oxygen (DO), temperature, and pH can be utilized. This section considers the Palla region water samples of 6 years. As mentioned, 70% of the data is used for training and 30% for testing. According to the proposed model, firstly, the missing value imputation method is applied to obtain approximated data for the months where the data is not present.

COD analysis

Table 8 Palla region COD prediction performance
Methods                     MSE    RMSE   MAE    MAPE
Support vector regression   0.491  0.711  0.596  54.28
Artificial neural network   0.566  0.652  0.521  54.21
Random forest               0.487  0.568  0.614  53.24
Linear regression           0.401  0.480  0.556  51.29
LSTM                        0.328  0.401  0.358  46.82
CNN-LSTM                    0.218  0.268  0.214  34.22
Proposed DLBL-WQA           0.015  0.117  0.115  20.32
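From the entries of Table 8, the relative decrease in each error measure obtained by DLBL-WQA with respect to a baseline can be derived as 100 × (baseline − proposed) / baseline. The short sketch below applies this formula to the table values; the function name `percent_decrease` and the dictionary layout are illustrative choices, and the results are rounded to two decimals, so the last digit may differ slightly from figures that were truncated instead.

```python
# Error values copied from Table 8 (Palla region, COD)
baselines = {
    "SVR":      {"MSE": 0.491, "RMSE": 0.711, "MAE": 0.596, "MAPE": 54.28},
    "ANN":      {"MSE": 0.566, "RMSE": 0.652, "MAE": 0.521, "MAPE": 54.21},
    "RF":       {"MSE": 0.487, "RMSE": 0.568, "MAE": 0.614, "MAPE": 53.24},
    "LR":       {"MSE": 0.401, "RMSE": 0.480, "MAE": 0.556, "MAPE": 51.29},
    "LSTM":     {"MSE": 0.328, "RMSE": 0.401, "MAE": 0.358, "MAPE": 46.82},
    "CNN-LSTM": {"MSE": 0.218, "RMSE": 0.268, "MAE": 0.214, "MAPE": 34.22},
}
proposed = {"MSE": 0.015, "RMSE": 0.117, "MAE": 0.115, "MAPE": 20.32}


def percent_decrease(baseline, new):
    """Relative decrease of an error value with respect to a baseline, in percent."""
    return 100.0 * (baseline - new) / baseline


decreases = {
    method: {metric: round(percent_decrease(value, proposed[metric]), 2)
             for metric, value in errors.items()}
    for method, errors in baselines.items()
}

for method, row in decreases.items():
    print(method, row)
```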
[Figure: comparison by method (axis: Methods Used; SVR, RF, LR, LSTM, CNN-LSTM, DLBL-WQA)]
[Figure: COD Values (0 to 35) versus Sample (Month), samples 1 to 33]
This comparative analysis shows that, with the proposed approach, the MSE value is decreased by 96.945%, 97.34%, 96.91%, 96.25%, 95.42%, and 93.11%; the RMSE value is decreased by 83.54%, 82.05%, 79.40%, 75.62%, 70.82%, and 56.34%; the MAE value is decreased by 80.70%, 77.92%, 81.27%, 79.31%, 67.87%, and 46.26%; and the MAPE value is decreased by 62.56%, 62.51%, 61.83%, 60.38%, 56.59%, and 40.61% when the outcome of the proposed approach is compared with SVR, ANN, RF, LR, LSTM, and CNN-LSTM, respectively. Fig. 6 depicts the comparative analysis of dissolved oxygen prediction, where we show the relationship between actual and predicted values.

BOD analysis

BOD is the amount of oxygen used by aerobic microorganisms to break down organic matter, and it serves as an important indicator of water pollution. The BOD of any aquatic system is the foremost parameter needed for assessment of the water quality as well as development of management strategies for the protection of water resources.
In this subsection, the BOD parameter is used to predict the water quality of the river in the Palla region (Fig. 7). The complete experimental analysis and setup are similar to the previous experiment. Table 9 shows a comparative analysis in terms of MSE, RMSE, MAE, and MAPE.
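For reference, the four error measures used in these comparisons can be computed with a few lines of Python. This is a minimal sketch using the standard textbook definitions, which are assumed here to match the ones used in the study; the sample values are hypothetical and serve only to exercise the functions.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [2.0, 4.0, 5.0, 8.0]        # hypothetical observed values
predicted = [2.5, 3.0, 5.0, 10.0]    # hypothetical model outputs

scores = {
    "MSE": mse(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
}
print(scores)
```

Because RMSE is the square root of MSE, the two always rank a set of models in the same order, whereas MAPE can rank them differently since it weights each error by the size of the actual value.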
[Figure: comparison by method (axis: Methods Used; SVR, ANN, RF, LR, LSTM, CNN-LSTM, DLBL-WQA)]
Table 9 Palla region BOD prediction performance
Methods                     MSE    RMSE   MAE    MAPE
Support vector regression   0.580  0.662  0.671  46.88
Artificial neural network   0.580  0.557  0.628  43.91
Random forest               0.540  0.460  0.527  41.01
Linear regression           0.423  0.481  0.413  36.28
LSTM                        0.357  0.358  0.322  33.25
CNN-LSTM                    0.310  0.210  0.266  24.68
Proposed DLBL-WQA           0.107  0.108  0.124  18.22

Based on this analysis of BOD, we present a comparative analysis, as depicted in Fig. 8, which illustrates the overall improvement in the performance of the proposed approach in terms of MSE, RMSE, MAE, and MAPE. According to this experiment, the MSE using the proposed approach is decreased by 81.55%, 81.55%, 80.18%, 74.70%, 70.02%, and 65.48%; RMSE values are decreased by 83.68%, 80.61%, 53.17%, 77.54%, 69.83%, and 48.57%; MAE values are decreased by 81.52%, 80.25%, 76.47%, 69.97%, 61.49%, and 53.38%; and MAPE values are decreased by 61.13%, 58.50%, 55.57%, 49.77%, 45.20%, and 26.17% when the outcome of the proposed approach is compared with SVR, ANN, RF, LR, LSTM, and CNN-LSTM, respectively.
Based on this experiment, we present a comparative analysis of BOD prediction for the considered BOD samples. Figure 8 demonstrates the performance comparison between actual and predicted BOD values. The experimental study shows that the values predicted by the proposed model are close to the actual values.
The abovementioned BOD and COD analyses show that the proposed model provides an accurate estimation of water quality parameters for forecasting.
[Figure: BOD Values (0 to 10) versus Sample (Month), samples 1 to 33]

Conclusion and future scope

This study proposes a water quality forecasting method using deep learning techniques. It explores the application of bidirectional LSTM to forecast water quality factors like COD and BOD well in advance. The technique includes data pre-processing, parameter setting, optimization, and the learning procedure. The conventional CNN and LSTM models have high computational complexity and low prediction accuracy. To overcome this, an optimization model is used that helps to reduce the training error and improve precision. The resulting forecast model can be utilized and adapted for various water quality samples from different sources and accordingly may have broad application scenarios. The performance of the proposed DLBL-WQA model is compared with various state-of-the-art forecasting models. Experimental studies show that the proposed approach attains the best forecasting accuracy and lowest error rates out of all techniques. The COD analysis illustrated that the MSE and RMSE values for the Palla region are 0.015 and 0.117, respectively. Further, the BOD analysis in this area illustrated that the MSE and RMSE values are 0.107 and 0.108, respectively. Consequently, this model could be successfully used to estimate various water quality factors well in advance, thereby helping in water quality monitoring and management.
The present study uses time-series data analysis of a single-dimensional input for water quality monitoring. The future scope of this work can include attribute selection and analysis of different parameters to monitor the water quality. Complex relationships between the different water quality variables can also be exploited to further enhance the accuracy of forecasting. Besides, the present research predicted the water quality data at one monitoring station in the Palla region of Yamuna River. In future research, more monitoring stations
could be included to study the water quality in spatial dimensions under the hydrodynamic principle.

Acknowledgements The authors thank CPCB, New Delhi, for providing Yamuna river water quality data.

Availability of data and materials The data that support the findings of this study are available from the Central Pollution Control Board, New Delhi, India, but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of the Central Pollution Control Board.

Author contribution SK implemented the methodology and prepared the manuscript with the guidance of NS (PhD supervisor).

Declarations

Ethics approval and consent to participate NA

Consent for publication NA

Conflict of interest The authors declare no competing interests.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.