A Predictive Model For Water Quality Index Assessment by Machine Learning Approach
Shri Ramdeobaba College of Engineering and Management, Nagpur, Shree Govindrao Wanjari college of Engineering.
India
Abstract— The deteriorating state of water quality poses a significant threat to human well-being and ecosystems, fueled by contaminants from industrial discharges, agricultural runoff, and urban activities. This pollution directly impacts human health and disrupts aquatic habitats, jeopardizing biodiversity. Conventional assessment methods fail to address this urgent issue, prompting the advocacy for a novel approach that leverages machine-learning algorithms. This research focuses on developing a predictive model for the Water Quality Index (WQI), a key metric for water quality assessment. Utilizing a diverse dataset from multiple monitoring stations, various machine learning algorithms, including XGBoost, Decision Tree, AdaBoost, K-Neighbours, and Support Vector Machine (SVM), analyze complex relationships among water quality parameters to predict the WQI. The model undergoes rigorous validation, comparing predicted WQI values with independent data, using metrics like accuracy, precision, recall, and F1 score. Sensitivity analysis identifies influential factors, revealing that machine-learning algorithms accurately gauge current water quality and demonstrate the ability to forecast future conditions. This research significantly advances water quality monitoring, emphasizing the vital role of integrating machine learning into environmental management. The implications extend to policymakers, environmental scientists, and water managers, providing valuable insights for formulating effective strategies to protect and enhance water quality.

Keywords— Water Quality Index, Machine Learning Algorithms, Predictive Modeling, Feature Selection, Ecological Health, Ecosystems

I. INTRODUCTION

In response to escalating concerns about water quality and its implications for human well-being and ecosystem sustainability, this review paper advocates for an advanced methodology employing machine learning algorithms to predict the WQI. The study utilizes diverse datasets from monitoring stations, employing Decision Trees, Random Forest, SVM and Neural Networks to develop a predictive model for WQI [Shams, M.Y., 2023]. Rigorous testing, validation, and feature selection highlight the efficiency and resilience of machine learning algorithms, emphasizing their potential for anticipating future water quality trends [Hussein EE, 2023]. The practical implications of these findings extend to environmental policy-making, scientific research, and water resource management, contributing significantly to practical strategies in water quality preservation [Kouadri, S. 2021]. WQI research is pivotal for safeguarding human health and environmental ecosystems. Clean water is essential for consumption, and WQI aids in identifying and mitigating potential health hazards associated with waterborne diseases. Moreover, aquatic ecosystems rely on water quality, and disruptions can lead to biodiversity decline. The research addresses declining water quality by proposing an innovative methodology, utilizing machine learning to predict WQI, contributing to effective environmental management strategies [Mengyuan Zhu, 2022].

The historical evolution of WQI research, computational methods, and recent advancements are outlined in the study, providing a comprehensive overview. WQI systems like Canada's National Pollutant Index and the National Sanitation Foundation Index are highlighted. Recent advances in machine learning, including Decision Trees, Random Forests, SVM and Neural Networks, have shown promise in improving WQI calculations and enhancing predictive capabilities in water quality assessment [Kouadri, S. 2021]. The research paper advocates for an innovative approach, leveraging machine learning algorithms to evaluate and predict water quality comprehensively. Unlike traditional methods, this methodology anticipates future trends, facilitating proactive measures. Utilizing diverse datasets, various machine learning algorithms, and rigorous validation, the study underscores the potency of machine learning in accurately gauging current water quality and foreseeing future conditions. The findings extend to policymakers, environmental scientists, and water managers, offering valuable insights for formulating effective strategies to protect and enhance water quality.

II. RELATED WORKS

In recent years, the need for improved water quality management has become increasingly evident due to the growing challenges of water pollution and its adverse effects on ecosystems and public health. This issue is of great concern to environmental scientists and regulatory agencies
Authorized licensed use limited to: Sardar Patel Institute of Technology. Downloaded on November 04,2024 at 14:32:37 UTC from IEEE Xplore. Restrictions apply.
critical indicators of water quality, such as pH, dissolved oxygen, turbidity, and pollutant concentrations. The selection process is crucial, as it determines the scope and coverage of the assessment.

Generation of Parameter Sub-Indices: Once the relevant water quality parameters have been selected, their concentrations are transformed into unitless sub-indices. These sub-indices are created to standardize the measurement units and allow for meaningful aggregation. Each parameter's sub-index quantifies its contribution to the overall water quality assessment.

Assignment of Parameter Weight Values: Parameters are assigned weightings that reflect their relative significance in the assessment. Weight values are determined based on the parameters' importance and impact on water quality. For instance, parameters with more significant environmental or health significance may receive higher weights in the calculation.

Computation of the Water Quality Index: The individual parameter sub-indices are combined using an aggregation function, considering the assigned weightings. This computation results in a comprehensive WQI value. Typically, a rating scale is employed to categorize or classify the water quality based on the overall index value. This classification helps communicate the suitability of water quality for various purposes, such as drinking, recreational use, or ecosystem health.

indexing helps standardize the measurement units and allows for incorporating parameter-specific considerations.

Table. 1. Selection of water quality parameters at Sites

                                        SITE 1                  SITE 2
Parameters                              Mean value   Q-value    Mean value   Q-value
Dissolved Oxygen (mg/L)                 5.25         50         7.91         71
Faecal coliform (MPN/100 ml)            205          33         9.5          74
pH                                      4.6          18         7.6          91
Biological oxygen demand (BOD) (mg/L)   5.28         50         2.57         70
Nitrates (mg/L)                         35.47        22         4.67         75
Phosphates (mg/L)                       3.95         17         0.52         57
Water Temperature                       20.25        21         13.9         32
Turbidity (NTU)                         17.15        65         3.73         87
Total dissolved solids (mg/L)           415          44         66.37        87

Parameter Weighting: In the NSF-WQI model, unequal weight values are assigned to the selected parameters. The original weight values were determined through expert panel consensus. However, subsequent applications have used modified weight values, considering the environmental significance of each parameter. For instance, Dissolved Oxygen (DO), faecal coliforms (FC), pH, and Biochemical Oxygen Demand (BOD) are assigned significant weights to reflect their importance in water quality assessment.
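The weighted aggregation described here can be illustrated with a minimal Python sketch. It assumes the index is the plain weighted sum of Q-values, WQI = sum of Wi x Qi, which is what the WiQi column of the Site 2 table in this paper suggests. The weights and Q-values are the Site 2 figures reported in the paper; the function names `weighted_wqi` and `classify_nsf` are hypothetical, and treating each class's lower bound as an inclusive threshold is an assumption, since the paper lists the classes only as integer ranges.

```python
# Site 2 weights (Wi) and Q-values (Qi) as reported in this paper.
SITE2 = {
    "Dissolved oxygen": (0.17, 71),
    "Faecal coliform": (0.15, 74),
    "pH": (0.12, 91),
    "BOD": (0.10, 70),
    "Nitrates": (0.10, 75),
    "Phosphates": (0.10, 57),
    "Water temperature": (0.10, 32),
    "Turbidity": (0.08, 87),
    "Total dissolved solids": (0.08, 87),
}

# NSF-WQI classes from the paper, as (lower bound, label).
# Assumption: the lower bound of each class is an inclusive threshold.
NSF_CLASSES = [
    (90, "excellent"),
    (70, "good"),
    (50, "medium"),
    (25, "bad"),
    (0, "very bad"),
]


def weighted_wqi(params):
    """Return the weighted sum of Q-values; weights must sum to 1."""
    total_weight = sum(w for w, _ in params.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return sum(w * q for w, q in params.values())


def classify_nsf(wqi):
    """Map a WQI value in [0, 100] to one of the five NSF-WQI classes."""
    if not 0 <= wqi <= 100:
        raise ValueError("WQI must lie between 0 and 100")
    for lower, label in NSF_CLASSES:
        if wqi >= lower:
            return label


site2_wqi = weighted_wqi(SITE2)
print(round(site2_wqi, 2), classify_nsf(site2_wqi))
```

Under these assumptions the sketch reproduces the Site 2 index of 71.41 reported in the paper and places it in the "good" class.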
SITE 2

Parameters                              Weight (Wi)   Mean values   Q-values   WiQi
Dissolved Oxygen (mg/L)                 0.17          7.91          71         12.07
Faecal coliform (MPN/100 ml)            0.15          9.5           74         11.1
pH                                      0.12          7.6           91         10.92
Biological oxygen demand (BOD) (mg/L)   0.1           2.57          70         7
Nitrates (mg/L)                         0.1           4.67          75         7.5
Phosphates (mg/L)                       0.1           0.52          57         5.7
Water Temperature                       0.1           13.9          32         3.2
Turbidity (NTU)                         0.08          3.73          87         6.96
Total dissolved solids (mg/L)           0.08          66.37         87         6.96
Sum of WiQi                                                                    71.41

WQI Evaluation: The resulting WQI value falls within a scale ranging from 0 to 100. A WQI of 0 indicates the poorest water quality, while a value of 100 signifies excellent water quality. The NSF-WQI model defines five water quality classes based on the calculated WQI values: excellent (WQI = 90–100), good (WQI = 70–89), medium (WQI = 50–69), bad (WQI = 25–49), and very bad quality (WQI = 0–24). These classes categorize the water quality assessment, making it easier to interpret and communicate the results to stakeholders and the public.

Table. 4. WQI calculation results for Site 1 and Site 2

SITE     Index Value   Water Quality Status
Site 1   36.52         Poor Quality
Site 2   71.41         Good Quality

The Weighted Arithmetic Index (WAI) is a widely used method for assessing water quality that follows a structured process comprising multiple essential steps:

Parameter Selection: The WAI assessment begins with selecting specific water quality parameters considered pivotal for the evaluation. These parameters are carefully chosen based on their relevance to the assessment's objectives and the specific context in which it is applied. The selected parameters may include indicators such as pH, dissolved oxygen, turbidity, and pollutant concentrations, depending on the assessment's scope.

Sub-index Generation: Sub-index values are generated for each selected parameter. These sub-indices typically range from 0 to 1, with a value of 1 indicating compliance with recommended guidelines and 0 representing non-compliance with these standards. Sub-indexing helps standardize parameter values and allows for meaningful aggregation.

Parameter Weighting: Unequal weight values are assigned to the selected parameters to reflect their relative significance in the assessment. The weight values are typically determined through expert consensus or expert judgment, considering each parameter's environmental and health importance.

Aggregation: The WAI model employs an aggregation function to combine the parameter sub-indices, considering the assigned weight values. This aggregation process results in a single, comprehensive WQI value, which summarizes the overall water quality assessment.

WQI Evaluation: The calculated WQI value is interpreted within a defined scale. The WQI typically ranges from 0 to 100, where 0 signifies the lowest water quality, and 100 indicates the highest water quality. The model provides water quality classes, which categorize the assessment based on the WQI value, making it easier to understand and communicate the water quality results to stakeholders and the public.

Table. 5. WQI evaluation parameters selection results for Site 1, 2 and Site 3

Parameters   Site 1   Site 2   Site 3
pH           7.9      4.6      6.1
EC           100.33   310      122
TDS          67.22    473.7    266.66
TH           40.67    239.33   122.87
Calcium      55.61    45.05    28.17
Magnesium    6.48     16.5     21.83
Iron         0.05     0.38     0.11
Fluoride     0.02     0.06     0.5
Turbidity    1.3      2.48     4.15

Table. 6. WQI calculation results for Site 1, 2 and Site 3

SITE     Index Value   Status
Site 1   15.23         Very Poor
Site 2   97.82         Excellent
Site 3   42.32         Good

IV. RESULTS AND DISCUSSION

The figure delineates the pivotal process of visualizing datasets and pinpointing outliers, employing visual representations and statistical scrutiny for a comprehensive dataset exploration. Simultaneously, techniques for outlier identification are applied, ensuring data quality and refining the dataset for subsequent analysis. Specifically, the visual exploration focuses on probability, visually representing its characteristics and identifying potential outliers within the dataset.

Fig. 3. Visualizing datasets and detecting outliers

Through graphical tools like scatter plots or box plots, the figure enhances the understanding of probability trends,
shedding light on the distribution of probability values and facilitating the detection of anomalies that may impact data integrity. This visual analysis is instrumental in gaining insights into the dataset's intricacies, contributing to a more nuanced comprehension of probability, and enabling effective handling of outliers to foster a more robust dataset for analysis.

strongly in water quality classification. Employing a gradient-boosting framework, it combines weak learners to enhance predictive accuracy. The Decision Tree model, known for its simplicity and interpretability, excels at capturing complex relationships within the data. However, its susceptibility to overfitting warrants careful consideration. AdaBoost, an adaptive boosting algorithm, excels in improving the performance of weak classifiers. Through iterative emphasis on misclassified instances, it enhances overall model accuracy. K-Nearest Neighbours (KNN) relies on proximity-based classification, considering similarities between instances. While effective in capturing local patterns, its performance may be contingent on the choice of the number of neighbours. SVM emerges as a standout performer, with an accuracy of 73%. Recognized for its efficacy in handling high-dimensional data, SVM separates classes within the feature space. The classification report, encompassing precision, recall, and F1-score, underscores the reliability of SVM in evaluating water quality for drinkability.
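To make the reported metrics concrete, the following is a minimal pure-Python sketch of how accuracy, precision, recall, and F1-score are derived from true and predicted labels for a binary drinkable / not drinkable task. The function name `binary_report` and the toy label lists are hypothetical illustrations, not the paper's data or its evaluation code.

```python
def binary_report(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only (1 = drinkable, 0 = not drinkable).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
report = binary_report(y_true, y_pred)
print(report)
```

Accuracy alone can hide poor performance on the minority class, which is why precision, recall, and F1-score are usually reported alongside it.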
REFERENCES
Hussein EE, Jat Baloch MY, Nigar A, Abualkhair HF, Aldawood FK, Tageldin E. Machine Learning Algorithms for Predicting the Water Quality Index. Water. 2023; 15(20):3540. https://doi.org/10.3390/w15203540
Peda Gopi Arepalli, K. Jairam Naik, An IoT based smart water quality assessment framework for aqua-ponds management using Dilated Spatial-temporal Convolution Neural Network (DSTCNN), Aquacultural Engineering, Vol. 104, 2024, https://doi.org/10.1016/j.aquaeng.2023.102373.
Process Safety and Environmental Protection, Vol. 169, pp. 808-828, 2023. https://doi.org/10.1016/j.psep.2022.11.073.
Wei Cong Leong, Alireza Bahadori, Jie Zhang & Zainal Ahmad, Prediction of water quality index (WQI) using support vector machine (SVM) and least square-support vector machine (LS-SVM), International Journal of River Basin Management 19(2):1-8, 2019. https://doi.org/10.1080/15715124.2019.1628030