
Tasks

The document discusses various tasks related to analyzing a water quality dataset. It describes understanding the dataset, attributes of the dataset, and several papers related to water quality prediction, monitoring, and modeling using techniques like machine learning, graph neural networks, and data assimilation.

Uploaded by

Aqsa Aqqa

Tasks of: Data Mining

Submitted to: Dr. Uzma Jamil


Submitted By:
Name: Shazia Riaz
Roll No: 241406
Task #1

Domain of Dataset: Water Quality Data 🌊 (kaggle.com)

Understanding the dataset:
The dataset comprises detailed records of water quality parameters collected at the Refuge by volunteers once every two weeks.

These measurements include turbidity, pH, dissolved oxygen (DO), salinity, and temperature. Sampling is conducted at designated locations
within various water bodies such as the Bay, D-Pool (fishing pond), C-Pool, B-Pool, and A-Pool.

What is the data about?

This file contains detailed records of water quality parameters (turbidity, pH, dissolved oxygen (DO), salinity, and temperature) collected at the Refuge by volunteers once every two weeks. Sampling occurs at designated locations within the Refuge's water bodies: the Bay, D-Pool (fishing pond), C-Pool, B-Pool, and A-Pool. Data collection is managed by volunteers under the supervision of the Refuge authority, and the dataset is carefully maintained to ensure accuracy and reliability in monitoring water quality over time.

Attributes of Dataset:
Date: Indicates the date when the water quality data was recorded.
Salinity: Denotes the concentration of salts in the water.
Dissolved Oxygen: Indicates the amount of oxygen dissolved in the water, crucial for aquatic life.
pH: Represents the acidity or alkalinity level of the water.
Secchi Depth (m): Measures the depth at which a Secchi disk disappears from view, providing insight into water transparency.
Water Depth: Indicates the depth of the water column at the sampling location.
Water Temperature: Provides the thermal condition of the water.
Air Temperature: Represents the ambient air temperature during sampling.
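To make the attribute list concrete, the sketch below parses rows of the file into a typed record using Python's standard csv module. The CSV header names in COLUMNS, the field names, and the month/day/year date format are assumptions for illustration only; the actual Kaggle file may use different headers or formats, so the mapping would need adjusting. Blank cells are kept as None so that missing values can be handled explicitly later.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Hypothetical mapping from CSV header -> record field; the real file may differ.
COLUMNS = {
    "Date": "sample_date",
    "Salinity": "salinity",
    "Dissolved Oxygen": "dissolved_oxygen",
    "pH": "ph",
    "Secchi Depth (m)": "secchi_depth_m",
    "Water Depth": "water_depth",
    "Water Temp": "water_temp",
    "Air Temp": "air_temp",
}


@dataclass
class WaterQualityRecord:
    sample_date: date
    salinity: Optional[float]
    dissolved_oxygen: Optional[float]
    ph: Optional[float]
    secchi_depth_m: Optional[float]
    water_depth: Optional[float]
    water_temp: Optional[float]
    air_temp: Optional[float]


def parse_records(csv_text: str, date_format: str = "%m/%d/%Y") -> list[WaterQualityRecord]:
    """Parse CSV text into records; blank numeric cells become None (missing)."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        fields = {}
        for header, name in COLUMNS.items():
            raw = (row.get(header) or "").strip()
            if name == "sample_date":
                # Assumed date format; a blank date raises ValueError.
                fields[name] = datetime.strptime(raw, date_format).date()
            else:
                fields[name] = float(raw) if raw else None
        records.append(WaterQualityRecord(**fields))
    return records
```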

Number of classes: 10

Size of dataset: 284,808

Source of dataset: Kaggle

Task #2
Fields for each entry: Serial No., Reference/Citation, Problem Addressed, Techniques/Methodology, Results, Limitation/Future Work.

1. Reference/Citation: Rodríguez, Rafael, et al. "Water-quality data imputation with a high percentage of missing values: A machine learning approach." Sustainability 13.11 (2021): 6318.
   Problem Addressed: Water-quality data imputation with a high percentage of missing values: a machine learning approach.
   Techniques/Methodology: The competing algorithms implement univariate and multivariate imputation methods: inverse distance weighting (IDW), Random Forest Regressor (RFR), Ridge (R), Bayesian Ridge (BR), AdaBoost (AB), Huber Regressor (HR), Support Vector Regressor (SVR), and K-nearest neighbours Regressor (KNNR).
   Results: 76%
   Limitation/Future Work: The proposed approach is expected to help water-resource researchers and managers augment water-quality datasets and overcome the missing-data issue, increasing the number of future studies on water quality.

2. Reference/Citation: Maiolo, Mario, and Daniela Pantusa. "Multivariate analysis of water quality data for drinking water supply systems." Water 13.13 (2021): 1766.
   Problem Addressed: Multivariate analysis of water quality data for drinking water supply systems.
   Techniques/Methodology: Principal component analysis (PCA) and cluster analysis (CA) were used to process the data and reduce its dimensionality, to highlight the parameters with the greatest influence on the quality of the supplied water, and to identify clusters.
   Results: 77%
   Limitation/Future Work: The vulnerability of drinking water supply systems (DWSSs) depends on factors such as failures, loss of security, man-made threats, and changes in and deterioration of supply-water quality. The lifespan of several DWSSs worldwide has already been exceeded, exacerbating these issues.

3. Reference/Citation: Li, Zilin, et al. "Real-time water quality prediction in water distribution networks using graph neural networks with sparse monitoring data." Water Research 250 (2024): 121018.
   Problem Addressed: Real-time water quality prediction in water distribution networks (WDNs) using graph neural networks with sparse monitoring data.
   Techniques/Methodology: A gated graph neural network (GGNN) model for real-time water quality prediction in WDNs. The GGNN integrates hydraulic flow directions and water quality data to represent the network topology and system dynamics, and uses a masking operation during training to improve prediction accuracy.
   Results: Not stated
   Limitation/Future Work: This research advances water quality prediction in WDNs by offering a practical and effective machine learning solution to challenges related to limited sensor data and network complexity.

4. Reference/Citation: Chidiac, Sandra, et al. "A comprehensive review of water quality indices (WQIs): History, models, attempts and perspectives." Reviews in Environmental Science and Bio/Technology 22.2 (2023): 349-395.
   Problem Addressed: A comprehensive review of water quality indices (WQIs): history, models, attempts, and perspectives.
   Techniques/Methodology: Water quality indices (WQIs).
   Results: Not stated
   Limitation/Future Work: Not stated

6. Reference/Citation: Wang, Zhaocai, Qingyu Wang, and Tunhua Wu. "A novel hybrid model for water quality prediction based on VMD and IGOA optimized for LSTM." Frontiers of Environmental Science & Engineering 17.7 (2023): 88.
   Problem Addressed: A novel hybrid model for water quality prediction based on VMD and IGOA optimized for LSTM.
   Techniques/Methodology: Long short-term memory (LSTM).
   Results: Not stated
   Limitation/Future Work: The study presented a VMD-IGOA-LSTM water quality prediction model and used it to predict the short-term dissolved oxygen (DO) content of the Ganjiang River.

7. Reference/Citation: Hemdan, Ezz El-Din, et al. "An efficient IoT based smart water quality monitoring system." Multimedia Tools and Applications 82.19 (2023): 28827-28851.
   Problem Addressed: An efficient IoT-based smart water quality monitoring system.
   Techniques/Methodology: Experimental work to forecast water quality at scale; the study proposes measuring the Water Quality Index (WQI) for drinking water and labels the dataset with WQI values.
   Results: 80%
   Limitation/Future Work: The authors plan to apply and integrate the proposed system in smart agriculture systems for water quality assessment, and to use deep learning models to analyse satellite imagery and estimate water quality for IoT-based smart city applications.

8. Reference/Citation: Shim, Jaegyu, et al. "Deep learning with data preprocessing methods for water quality prediction in ultrafiltration." Journal of Cleaner Production 428 (2023): 139217.
   Problem Addressed: Deep learning with data preprocessing methods for water quality prediction in ultrafiltration (UF).
   Techniques/Methodology: Predictive models using deep learning algorithms, specifically convolutional neural network (CNN) and long short-term memory (LSTM) structures.
   Results: 84%
   Limitation/Future Work: Preliminary detection of abnormal water quality after UF is vital for cost-efficient operation, but current predictive models lack accuracy.

9. Reference/Citation: Cho, Kyung Hwa, et al. "Data assimilation in surface water quality modeling: A review." Water Research 186 (2020): 116307.
   Problem Addressed: Data assimilation in surface water quality modelling: a review.
   Techniques/Methodology: Mathematical techniques of data assimilation (DA) and their computational implementations.
   Results: Not stated
   Limitation/Future Work: Expanding the use of DA in water quality management.

10. Reference/Citation: Zou, Xing-Yun, et al. "The impact of extreme weather events on water quality: International evidence." Natural Hazards 115.1 (2023): 1-21.
   Problem Addressed: The impact of extreme weather events on water quality: international evidence.
   Techniques/Methodology: Empirical research using panel data.
   Results: Not stated
   Limitation/Future Work: The authors recommend using this research as a paradigm to summarize experience and further expand climate change adaptation, believe their findings can help governments cope with extreme weather and improve water quality, and put forward policy implications on this basis.

11. Reference/Citation: Schubert, Alyssa, et al. "Perceptions of drinking water: Understanding the role of individualized water quality data in Detroit, Michigan." PLOS Water 3.4 (2024): e0000188.
   Problem Addressed: Perceptions of drinking water: understanding the role of individualized water quality data in Detroit, Michigan.
   Techniques/Methodology: Interviews.
   Results: Not stated
   Limitation/Future Work: The proposed future research directions are expected to further advance new approaches for access to drinking water quality data.

12. Reference/Citation: Zhi, Wei, et al. "Deep learning for water quality." Nature Water (2024): 1-14.
   Problem Addressed: Deep learning for water quality.
   Techniques/Methodology: A review positing that deep learning is an underutilized yet promising approach that can unravel intricate structures and relationships in high-dimensional data.
   Results: Not stated
   Limitation/Future Work: The review highlights the strengths and limitations of deep learning methods relative to traditional approaches, and underscores their potential as an emerging and indispensable approach to overcoming challenges and discovering new knowledge in water-quality sciences.
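Inverse distance weighting (IDW), the simplest of the imputation methods compared by Rodríguez et al. (2021), fills a missing value at one station with a distance-weighted average of the values observed at other stations on the same date. The sketch below is a minimal illustration, assuming stations are given as planar (x, y) coordinates and using the common power parameter p = 2; neither choice is taken from the cited study.

```python
import math
from typing import Optional, Sequence


def idw_impute(
    target_xy: tuple[float, float],
    stations_xy: Sequence[tuple[float, float]],
    values: Sequence[Optional[float]],
    power: float = 2.0,
) -> Optional[float]:
    """Estimate a missing value at target_xy from stations with observed values.

    Each observed value gets weight 1 / distance**power; stations whose value
    is None (also missing) are skipped. Returns None if nothing is observed.
    """
    num = 0.0
    den = 0.0
    for (x, y), v in zip(stations_xy, values):
        if v is None:
            continue
        d = math.hypot(x - target_xy[0], y - target_xy[1])
        if d == 0.0:
            return v  # a co-located observation is used directly
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den if den > 0 else None
```

Closer stations dominate the estimate: a station twice as far away receives a quarter of the weight when p = 2.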

Task #3
Introduction
Water resources are crucial for human survival and social development. However, with rapid industrial growth, river water pollution has become
a growing problem. Monitoring water quality and analysing its trends are becoming more important. By modelling and analysing river water
quality data, we can predict future changes in these indicators, which helps protect river ecosystems and manage water resources effectively.

Predicting water quality is vital for protecting watershed environments. Current prediction methods often fail to analyse future changes in water
quality indicators. This paper introduces a new model for predicting water quality parameters using a dual-attention mechanism. The model uses
an Encoder-Decoder structure to predict data series and combines attention to both dimension and time step to improve the accuracy of
predictions for future data.
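The two attention steps described above (one over input dimensions, one over time steps) can be sketched as follows. This is a simplified dot-product version in NumPy, assumed for illustration; the paper's exact scoring functions are not given here, and the query vector stands in for parameters that would be learned during training.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def feature_attention(x_window: np.ndarray, query: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Dimension attention: weight each input feature (column) by relevance.

    x_window: (T, n_features) input window; query: (T,) hypothetical learned vector.
    Returns (weights of shape (n_features,), reweighted window of shape (T, n_features)).
    """
    scores = x_window.T @ query           # one score per feature
    weights = softmax(scores)
    return weights, x_window * weights    # broadcast the weights over all time steps


def temporal_attention(enc_states: np.ndarray, dec_state: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Time-step attention: dot-product scores of encoder states against the decoder state.

    enc_states: (T, d) encoder hidden states; dec_state: (d,) current decoder state.
    Returns (weights of shape (T,), context vector of shape (d,)).
    """
    weights = softmax(enc_states @ dec_state)
    return weights, weights @ enc_states
```

Both weight vectors sum to 1, so the decoder receives a convex combination of encoder states that emphasises the time steps most similar to its current state.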

Background Summary
Water quality is crucial for protecting watershed ecosystems, affecting nature, animals, and human activities. This paper presents a prediction
model that forecasts various water quality data, analyses the relationship between environmental factors, and predicts future trends. The model
uses a Seq2Seq framework with a dual attention mechanism for dimensions and time steps, improving prediction accuracy. It employs artificial
intelligence techniques like LSTM and ED-LSTM for comparison and modelling. Future research will explore how multiple factors, including
climate change and human activities, influence water quality data to enhance prediction reliability.

Recent global warming has increased the instability of the climate system and is an important background to the frequent occurrence of extreme weather and climate events. The results show that these events have a significant impact on water quality, especially in non-high-income countries and countries with a low level of water innovation. This paper proposes an IoT-based water quality monitoring system that uses machine learning algorithms to forecast water quality. Forecasting is an indispensable step in data prediction: it helps water providers plan better, set goals, and detect abnormal events, supporting decision-making in smart city contexts.
Problem Statement:
Volunteers supervised by the Refuge authority collect water quality data every two weeks. The dataset includes detailed records of parameters such as turbidity, pH, dissolved oxygen (DO), salinity, and temperature, sampled at designated spots in different water bodies, including the Bay and the A-, B-, C-, and D-Pools. This carefully maintained dataset supports accurate and reliable monitoring of water quality over time.

Research Questions:
How would you rate the quality of the water at your faucet? In your view, how safe or unsafe is the water coming from your pipes?

Scope
Water resources are crucial for the environment and health worldwide. Accurate water quality forecasting is essential for better water
management. This work aims to identify water quality impacts and provide an automated monitoring system to ensure water safety globally. It
describes experiments to forecast water quality on a large scale and proposes measuring the Water Quality Index (WQI) for drinking water,
labelling the dataset with WQI values. Additionally, it compares LSTM and Facebook Prophet models for water quality forecasting, helping data
providers and analysts make better decisions.
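Labelling a dataset with WQI values requires a concrete index formula. One widely used formulation, the weighted arithmetic WQI, is sketched below as an assumption; the cited work may use a different index. Each parameter gets a quality rating q_i = 100 (V_i - V_ideal) / (S_i - V_ideal) and a unit weight w_i = K / S_i with K = 1 / sum(1 / S_i), and WQI = sum(w_i q_i) / sum(w_i). The permissible limits and ideal values in EXAMPLE_STANDARDS are illustrative only and should be replaced by the applicable drinking water guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Standard:
    permissible: float  # S_i: permissible limit for the parameter
    ideal: float        # V_ideal: ideal value in pure water


# Illustrative values only; check the applicable drinking water guidelines.
EXAMPLE_STANDARDS = {
    "pH": Standard(permissible=8.5, ideal=7.0),
    "DO": Standard(permissible=5.0, ideal=14.6),  # mg/L
}


def weighted_arithmetic_wqi(measured: dict[str, float], standards: dict[str, Standard]) -> float:
    """Weighted arithmetic WQI over the parameters present in `measured`.

    q_i = 100 * (V_i - V_ideal) / (S_i - V_ideal)   quality rating
    w_i = K / S_i, K = 1 / sum(1 / S_i)             unit weights (sum to 1)
    WQI = sum(w_i * q_i) / sum(w_i)
    """
    k = 1.0 / sum(1.0 / standards[p].permissible for p in measured)
    num = 0.0
    den = 0.0
    for param, value in measured.items():
        s = standards[param]
        q = 100.0 * (value - s.ideal) / (s.permissible - s.ideal)
        w = k / s.permissible
        num += w * q
        den += w
    return num / den
```

A sample with pH 7.75 alone scores 50, halfway between the ideal value (0) and the permissible limit (100); higher WQI values indicate poorer water quality under this formulation.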
Objective of Research
This research helps monitor and analyse water quality at the Refuge, providing valuable insights into environmental health and aquatic
conditions. By accessing this dataset, users can make informed decisions and better manage the environment.

Motivation:
In the modelling process, we built a prediction model for water quality using multiple factors and analysed what affects the data to improve
prediction reliability. This helps identify health risks, manage drinking water systems safely, and increase user confidence in tap water. Accurate,
real-time water quality predictions enable practitioners to respond quickly to unexpected pollution events and protect river ecosystems.
Advanced sensors are used to detect, measure, and transmit complex water quality data.

Proposed Solution:
Encoder-Decoder Architecture for Water Quality Data

An encoder-decoder architecture can be highly effective for analysing and predicting water quality from large datasets. This architecture, commonly used in neural networks, encodes the input data into a compressed representation and then decodes that representation into the output, making it suitable for tasks such as time series prediction, anomaly detection, and spatio-temporal analysis in water quality management.
Introduction

Objective: To utilize an encoder-decoder neural network architecture for predicting water quality parameters and identifying potential pollution
sources.

Scope: Focus on key water quality parameters such as pH, dissolved oxygen, nitrates, phosphates, heavy metals, and microbial indicators.

Study Area: The water bodies sampled at the Refuge, namely the Bay, D-Pool (fishing pond), C-Pool, B-Pool, and A-Pool.

Encoder-Decoder Architecture Overview

• Encoder: Transforms the input water quality data into a fixed-size context vector (latent representation).
• Decoder: Uses the context vector to generate the predicted water quality parameters.

Model Structure

Encoder:
• Input Layer: Receives water quality data, which could include historical measurements, spatial data, and auxiliary data (e.g., weather conditions, land use).
• LSTM/GRU Layers: Employ Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) layers to capture temporal dependencies in the data.
• Dense Layer: Compresses the output of the LSTM/GRU layers into a fixed-size context vector.

Decoder:
• Dense Layer: Expands the context vector back into a sequence format.
• LSTM/GRU Layers: Decode the context vector into the predicted sequence of water quality parameters.
• Output Layer: Generates the final predicted values for the water quality parameters.
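The layer stack above can be sketched as an untrained forward pass in NumPy, which makes the tensor shapes explicit: an input window of shape (T_in, n_features) becomes a fixed-size context vector and then a forecast of shape (horizon, n_features). This is a minimal illustration under stated assumptions: it uses a GRU rather than an LSTM for brevity, the layer sizes are hypothetical, and the weights are random. A real model would be built and trained in a framework such as Keras or PyTorch.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Single GRU cell with randomly initialised (untrained) weights."""

    def __init__(self, n_in: int, n_hidden: int, rng: np.random.Generator):
        scale = 1.0 / np.sqrt(n_hidden)
        # Stacked weights for the update (0), reset (1) and candidate (2) gates.
        self.W = rng.uniform(-scale, scale, size=(3, n_hidden, n_in))
        self.U = rng.uniform(-scale, scale, size=(3, n_hidden, n_hidden))
        self.b = np.zeros((3, n_hidden))
        self.n_hidden = n_hidden

    def step(self, x: np.ndarray, h: np.ndarray) -> np.ndarray:
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])  # update gate
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])  # reset gate
        h_tilde = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
        return (1.0 - z) * h + z * h_tilde


class Seq2SeqForecaster:
    """Encoder-decoder forward pass: (T_in, n_features) -> (horizon, n_features)."""

    def __init__(self, n_features: int, n_hidden: int, n_context: int, horizon: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.encoder = GRUCell(n_features, n_hidden, rng)
        self.W_ctx = rng.normal(0.0, 0.1, size=(n_context, n_hidden))   # encoder dense layer
        self.W_exp = rng.normal(0.0, 0.1, size=(n_hidden, n_context))   # decoder dense layer
        self.decoder = GRUCell(n_hidden, n_hidden, rng)
        self.W_out = rng.normal(0.0, 0.1, size=(n_features, n_hidden))  # output layer
        self.horizon = horizon

    def encode(self, x_seq: np.ndarray) -> np.ndarray:
        h = np.zeros(self.encoder.n_hidden)
        for x_t in x_seq:                      # consume the window one time step at a time
            h = self.encoder.step(x_t, h)
        return np.tanh(self.W_ctx @ h)         # fixed-size context vector

    def decode(self, context: np.ndarray) -> np.ndarray:
        expanded = np.tanh(self.W_exp @ context)  # expand the context for the decoder
        h = np.zeros(self.decoder.n_hidden)
        outputs = []
        for _ in range(self.horizon):          # same expanded vector fed at every step
            h = self.decoder.step(expanded, h)
            outputs.append(self.W_out @ h)
        return np.stack(outputs)

    def predict(self, x_seq: np.ndarray) -> np.ndarray:
        return self.decode(self.encode(x_seq))
```

For example, Seq2SeqForecaster(n_features=5, n_hidden=16, n_context=8, horizon=3).predict(window) maps a 10-step window of 5 parameters to a 3-step forecast of the same 5 parameters.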

Methodology
Data Collection:

Historical Data: Collect historical water quality measurements from various sources (e.g., government databases, research institutions).

Auxiliary Data: Gather additional data such as weather conditions, land use patterns, and industrial activity that might influence water quality.

Data Preprocessing:

Normalization: Normalize the data to ensure that all parameters are on a comparable scale.

Handling Missing Data: Use techniques such as interpolation or imputation to handle missing values.
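The two preprocessing steps can be sketched in NumPy as linear interpolation for gaps followed by per-column min-max scaling. The interpolation runs over the sample index, which assumes roughly regular sampling (the dataset is described as biweekly); irregular gaps would call for interpolating over the actual dates instead. Function names are illustrative.

```python
import numpy as np


def interpolate_missing(series: np.ndarray) -> np.ndarray:
    """Fill NaNs in a 1-D series by linear interpolation over the sample index.

    Leading/trailing gaps take the nearest observed value (np.interp's edge behaviour).
    """
    series = np.asarray(series, dtype=float)
    idx = np.arange(series.size)
    observed = ~np.isnan(series)
    if not observed.any():
        raise ValueError("series has no observed values")
    filled = series.copy()
    filled[~observed] = np.interp(idx[~observed], idx[observed], series[observed])
    return filled


def min_max_scale(data: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Scale each column to [0, 1]; return scaled data plus per-column min and range.

    Keep the min/range computed on the training split to scale new data
    and to map predictions back to original units.
    """
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return (data - col_min) / col_range, col_min, col_range
```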

Model Training:

Training Data: Use a portion of the collected data to train the model, ensuring that it learns the underlying patterns.

Validation Data: Use another portion of the data to validate the model and guard against overfitting (for example, by stopping training when the validation error stops improving).

Loss Function: Use Mean Squared Error (MSE) or another suitable loss function to measure prediction error.
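For a sequence model, the training and validation portions are usually taken in time order rather than at random, so that validation data lie in the "future" relative to training data; this choice is an assumption here, not something stated above. The sketch below builds encoder input and decoder target windows, splits chronologically, and computes MSE. Window lengths and split fractions are hypothetical parameters.

```python
import numpy as np


def make_windows(data: np.ndarray, n_in: int, n_out: int) -> tuple[np.ndarray, np.ndarray]:
    """Slide over a (T, n_features) array to build encoder inputs and decoder targets.

    X has shape (N, n_in, n_features); Y has shape (N, n_out, n_features).
    """
    X, Y = [], []
    for start in range(len(data) - n_in - n_out + 1):
        X.append(data[start:start + n_in])
        Y.append(data[start + n_in:start + n_in + n_out])
    return np.array(X), np.array(Y)


def chronological_split(data: np.ndarray, train_frac: float = 0.7, val_frac: float = 0.15):
    """Split in time order (no shuffling) into train, validation and test portions."""
    n = len(data)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]


def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error over all elements."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
```

Splitting before windowing keeps a window from straddling the train/validation boundary.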

Prediction and Analysis:

Prediction: Use the trained model to predict future water quality parameters.

Anomaly Detection: Identify anomalies in the predicted data that might indicate potential pollution events.
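One common way to turn predictions into anomaly flags, assumed here rather than taken from the text, is to compare each new observation with the model's prediction and flag readings whose error is unusually large. The threshold below is k standard deviations of the residuals on validation data; k = 3 is a tunable, illustrative choice, and in practice the threshold would be set per parameter.

```python
import numpy as np


def residual_threshold(val_true: np.ndarray, val_pred: np.ndarray, k: float = 3.0) -> float:
    """Threshold = k standard deviations of the validation residuals."""
    residuals = np.asarray(val_true, dtype=float) - np.asarray(val_pred, dtype=float)
    return k * float(np.std(residuals))


def flag_anomalies(observed: np.ndarray, predicted: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask of readings whose absolute prediction error exceeds the threshold."""
    error = np.abs(np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float))
    return error > threshold
```

A flagged reading is only a candidate pollution event; it may equally indicate a sensor fault or a data-entry error and needs follow-up.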

Implementation Plan

Tasks: Develop the encoder-decoder model, then train and validate it using the collected data.

Deployment and Monitoring: Deploy the model in a real-time monitoring system and continuously update it with new data.

Evaluation and Adaptation

Real-time Predictions: Integrate the model into a real-time monitoring system to provide ongoing predictions of water quality parameters.
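In a real-time setting, the model needs the most recent input window each time a new reading arrives. A minimal sketch of that loop keeps a fixed-length buffer and calls the model once the buffer is full. Here predict_fn is a hypothetical callable (for example, a trained encoder-decoder's predict method); periodic retraining on new data is not shown.

```python
from collections import deque
from typing import Callable, Optional, Sequence

import numpy as np


class RollingForecaster:
    """Keep the most recent n_in readings and forecast once the window is full."""

    def __init__(self, n_in: int, predict_fn: Callable[[np.ndarray], np.ndarray]):
        self.window: deque = deque(maxlen=n_in)  # oldest reading drops out automatically
        self.predict_fn = predict_fn             # hypothetical model callable

    def update(self, reading: Sequence[float]) -> Optional[np.ndarray]:
        """Add one new reading; return a forecast when enough history is available."""
        self.window.append(np.asarray(reading, dtype=float))
        if len(self.window) < self.window.maxlen:
            return None
        return self.predict_fn(np.stack(list(self.window)))  # shape (n_in, n_features)
```

With a simple stand-in model such as lambda w: w.mean(axis=0), the first two updates of a 3-reading window return None and later updates return the mean of the last three readings.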

Conclusion

Summary: The encoder-decoder architecture provides a robust framework for predicting water quality and identifying potential pollution
sources.

Future Research: Explore the integration of more complex models such as attention mechanisms or convolutional layers to further enhance
prediction accuracy.

By implementing this encoder-decoder architecture, we can improve water quality prediction and management, leading to proactive measures
and better environmental outcomes.
