
Appendices

Multi-Task Deep Learning Model Code


This appendix provides the complete implementation of the multi-task deep
learning model for water quality prediction. It includes data preprocessing, feature
engineering, model architecture, training, and evaluation.

Data Preprocessing and Cleaning


The following code standardizes column names, handles missing values, and extracts
relevant features for model training:

import pandas as pd

import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

import tensorflow as tf

from tensorflow.keras.models import Model

from tensorflow.keras.layers import Input, Dense, Dropout

# Load your data

file_path = "/content/Water_creek_marine_seawater_beach_2023.csv"

water = pd.read_csv(file_path, encoding='latin-1')

# Standardize column names

water.columns = water.columns.str.strip().str.lower()

# Drop the first row (it appears to hold units/header text rather than data)
water = water.iloc[1:].reset_index(drop=True)
# Rename columns for clarity

water.rename(columns={

'monitoring location': 'location',

'temperature (°c)': 'min_temperature',

'temperature (°c).1': 'max_temperature',

'dissolved oxygen (mg/l)': 'min_do',

'dissolved oxygen (mg/l).1': 'max_do',

'ph': 'min_ph',

'ph.1': 'max_ph',

'conductivity (µmho/cm)': 'min_conductivity',

'conductivity (µmho/cm).1': 'max_conductivity',

'bod\n(mg/l)': 'min_bod',

'bod\n(mg/l).1': 'max_bod',

'nitraten (mg/l)': 'min_nitrate',

'nitraten (mg/l).1': 'max_nitrate',

'fecal coliform (mpn/100ml)': 'min_fecal_coliform',

'fecal coliform (mpn/100ml).1': 'max_fecal_coliform',

'total coliform (mpn/100ml)': 'min_total_coliform',

'total coliform (mpn/100ml).1': 'max_total_coliform'

}, inplace=True)

# Define numeric columns for cleaning

numeric_columns = ['min_temperature', 'max_temperature', 'min_do', 'max_do',
                   'min_ph', 'max_ph', 'min_conductivity', 'max_conductivity',
                   'min_bod', 'max_bod', 'min_nitrate', 'max_nitrate',
                   'min_fecal_coliform', 'max_fecal_coliform',
                   'min_total_coliform', 'max_total_coliform']

# Convert numeric columns and coerce errors to NaN

water[numeric_columns] = water[numeric_columns].apply(pd.to_numeric, errors='coerce')

# Remove rows with NaN in any of the numeric columns

water.dropna(subset=numeric_columns, inplace=True)

# Optionally, remove rows with infinite values

water = water.replace([np.inf, -np.inf], np.nan).dropna(subset=numeric_columns)

# Recompute the mean values after cleaning

water['mean_temperature'] = (water['min_temperature'] + water['max_temperature']) / 2

water['mean_do'] = (water['min_do'] + water['max_do']) / 2

water['mean_ph'] = (water['min_ph'] + water['max_ph']) / 2

water['mean_bod'] = (water['min_bod'] + water['max_bod']) / 2

water['mean_nitrate'] = (water['min_nitrate'] + water['max_nitrate']) / 2

Multi-Task Learning Model Implementation


The following code defines a deep learning model to predict multiple water quality
parameters simultaneously.
TensorFlow Model
# Define input features and targets

X = water[['mean_temperature', 'mean_nitrate']] # example features

y_ph = water['mean_ph']

y_bod = water['mean_bod']

y_do = water['mean_do']

# Split the data into training and testing sets

(X_train, X_test,
 y_ph_train, y_ph_test,
 y_bod_train, y_bod_test,
 y_do_train, y_do_test) = train_test_split(
    X, y_ph, y_bod, y_do, test_size=0.2, random_state=42
)

# Normalize the features

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)

X_test_scaled = scaler.transform(X_test)

# Build the multi-task model using TensorFlow/Keras

inputs = Input(shape=(X_train_scaled.shape[1],))

x = Dense(64, activation='relu')(inputs)

x = Dropout(0.2)(x)

x = Dense(32, activation='relu')(x)

# Define separate output layers for each target


ph_output = Dense(1, name='ph_output')(x)

bod_output = Dense(1, name='bod_output')(x)

do_output = Dense(1, name='do_output')(x)

model = Model(inputs=inputs, outputs=[ph_output, bod_output, do_output])

model.compile(optimizer='adam', loss='mse')

model.summary()

# Train the model

history = model.fit(
    X_train_scaled,
    [y_ph_train, y_bod_train, y_do_train],
    validation_data=(X_test_scaled, [y_ph_test, y_bod_test, y_do_test]),
    epochs=50,
    batch_size=16
)
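A quick held-out check after training is useful; this is a minimal sketch reusing the variables defined above. For a multi-output Keras model compiled with a single loss, model.evaluate returns the total loss followed by the per-output losses in output order:

# Held-out evaluation (minimal sketch): total loss, then per-output MSEs
results = model.evaluate(
    X_test_scaled, [y_ph_test, y_bod_test, y_do_test], verbose=0
)
print(dict(zip(['total_loss', 'ph_mse', 'bod_mse', 'do_mse'], results)))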

import matplotlib.pyplot as plt

import seaborn as sns

# Optional: Set a Seaborn style for better aesthetics

sns.set(style="darkgrid")

# Extract loss values from the history object

epochs = range(1, len(history.history['loss']) + 1)


# Plot Overall Loss

plt.figure(figsize=(10, 6))

plt.plot(epochs, history.history['loss'], label='Training Loss', marker='o')

plt.plot(epochs, history.history['val_loss'], label='Validation Loss', marker='o')

plt.title('Overall Loss vs. Epochs')

plt.xlabel('Epoch')

plt.ylabel('Loss')

plt.legend()

plt.tight_layout()

plt.show()

# Plot pH Output Loss

plt.figure(figsize=(10, 6))

plt.plot(epochs, history.history['ph_output_loss'], label='Training pH Loss', marker='o')

plt.plot(epochs, history.history['val_ph_output_loss'], label='Validation pH Loss', marker='o')

plt.title('pH Output Loss vs. Epochs')

plt.xlabel('Epoch')

plt.ylabel('Loss')

plt.legend()

plt.tight_layout()

plt.show()

# Plot BOD Output Loss


plt.figure(figsize=(10, 6))

plt.plot(epochs, history.history['bod_output_loss'], label='Training BOD Loss', marker='o')

plt.plot(epochs, history.history['val_bod_output_loss'], label='Validation BOD Loss', marker='o')

plt.title('BOD Output Loss vs. Epochs')

plt.xlabel('Epoch')

plt.ylabel('Loss')

plt.legend()

plt.tight_layout()

plt.show()

# Plot DO Output Loss

plt.figure(figsize=(10, 6))

plt.plot(epochs, history.history['do_output_loss'], label='Training DO Loss', marker='o')

plt.plot(epochs, history.history['val_do_output_loss'], label='Validation DO Loss', marker='o')

plt.title('DO Output Loss vs. Epochs')

plt.xlabel('Epoch')

plt.ylabel('Loss')

plt.legend()

plt.tight_layout()

plt.show()

PyTorch Model: -
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
import seaborn as sns

#
# 1. DATA PREPARATION
#
# Assume 'water' DataFrame is already loaded and cleaned

X = water[['mean_temperature', 'mean_nitrate']]
y_ph = water['mean_ph']
y_bod = water['mean_bod']
y_do = water['mean_do']

(X_train, X_test,
 y_ph_train, y_ph_test,
 y_bod_train, y_bod_test,
 y_do_train, y_do_test) = train_test_split(
    X, y_ph, y_bod, y_do, test_size=0.2, random_state=42
)

# Normalize the features


scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Custom Dataset wrapping the scaled features and the three targets
class WaterDataset(Dataset):
    def __init__(self, X, y_ph, y_bod, y_do):
        self.X = torch.tensor(X, dtype=torch.float32)
        self.y_ph = torch.tensor(y_ph.values, dtype=torch.float32).view(-1, 1)
        self.y_bod = torch.tensor(y_bod.values, dtype=torch.float32).view(-1, 1)
        self.y_do = torch.tensor(y_do.values, dtype=torch.float32).view(-1, 1)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y_ph[idx], self.y_bod[idx], self.y_do[idx]

train_dataset = WaterDataset(X_train_scaled, y_ph_train, y_bod_train, y_do_train)
test_dataset = WaterDataset(X_test_scaled, y_ph_test, y_bod_test, y_do_test)

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False)

#
# 2. MODEL DEFINITION
#
class MultiTaskNet(nn.Module):
    def __init__(self, input_dim):
        super(MultiTaskNet, self).__init__()
        # Shared trunk: two hidden layers with dropout
        self.shared = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(64, 32),
            nn.ReLU()
        )
        # One linear head per target
        self.ph_head = nn.Linear(32, 1)
        self.bod_head = nn.Linear(32, 1)
        self.do_head = nn.Linear(32, 1)

    def forward(self, x):
        x = self.shared(x)
        ph = self.ph_head(x)
        bod = self.bod_head(x)
        do = self.do_head(x)
        return ph, bod, do

input_dim = X_train_scaled.shape[1]
model = MultiTaskNet(input_dim)

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

#
# 3. TRAINING + VALIDATION LOOP (with Loss Tracking)
#
num_epochs = 50

# Lists to store losses per epoch


train_losses_ph = []
train_losses_bod = []
train_losses_do = []

val_losses_ph = []
val_losses_bod = []
val_losses_do = []

for epoch in range(num_epochs):
    #
    # TRAINING
    #
    model.train()
    running_loss_ph = 0.0
    running_loss_bod = 0.0
    running_loss_do = 0.0

    for X_batch, y_ph_batch, y_bod_batch, y_do_batch in train_loader:
        optimizer.zero_grad()

        pred_ph, pred_bod, pred_do = model(X_batch)

        loss_ph = criterion(pred_ph, y_ph_batch)
        loss_bod = criterion(pred_bod, y_bod_batch)
        loss_do = criterion(pred_do, y_do_batch)

        # Equal-weight sum of the three task losses
        loss = loss_ph + loss_bod + loss_do

        loss.backward()
        optimizer.step()

        # Accumulate individual losses
        running_loss_ph += loss_ph.item()
        running_loss_bod += loss_bod.item()
        running_loss_do += loss_do.item()

    # Average losses over the training set
    train_loss_ph_epoch = running_loss_ph / len(train_loader)
    train_loss_bod_epoch = running_loss_bod / len(train_loader)
    train_loss_do_epoch = running_loss_do / len(train_loader)
    train_losses_ph.append(train_loss_ph_epoch)
    train_losses_bod.append(train_loss_bod_epoch)
    train_losses_do.append(train_loss_do_epoch)

    #
    # VALIDATION
    #
    model.eval()
    val_running_loss_ph = 0.0
    val_running_loss_bod = 0.0
    val_running_loss_do = 0.0

    with torch.no_grad():
        for X_val, y_ph_val, y_bod_val, y_do_val in test_loader:
            pred_ph_val, pred_bod_val, pred_do_val = model(X_val)

            val_loss_ph = criterion(pred_ph_val, y_ph_val)
            val_loss_bod = criterion(pred_bod_val, y_bod_val)
            val_loss_do = criterion(pred_do_val, y_do_val)

            val_running_loss_ph += val_loss_ph.item()
            val_running_loss_bod += val_loss_bod.item()
            val_running_loss_do += val_loss_do.item()

    val_loss_ph_epoch = val_running_loss_ph / len(test_loader)
    val_loss_bod_epoch = val_running_loss_bod / len(test_loader)
    val_loss_do_epoch = val_running_loss_do / len(test_loader)

    val_losses_ph.append(val_loss_ph_epoch)
    val_losses_bod.append(val_loss_bod_epoch)
    val_losses_do.append(val_loss_do_epoch)

    # Print combined (sum) loss for clarity
    train_sum_loss = (train_loss_ph_epoch + train_loss_bod_epoch
                      + train_loss_do_epoch)
    val_sum_loss = (val_loss_ph_epoch + val_loss_bod_epoch
                    + val_loss_do_epoch)

    print(f"Epoch {epoch+1}/{num_epochs} | "
          f"Train Loss (sum): {train_sum_loss:.4f} | "
          f"Val Loss (sum): {val_sum_loss:.4f}")
#
# 4. VISUALIZATION
#
# Let's plot the training vs. validation losses for pH, BOD, and DO
sns.set(style="darkgrid") # Make plots look nicer

epochs_range = range(1, num_epochs + 1)

fig, axes = plt.subplots(1, 3, figsize=(20, 6))

# --- pH ---
axes[0].plot(epochs_range, train_losses_ph, label='Train pH Loss', marker='o')
axes[0].plot(epochs_range, val_losses_ph, label='Val pH Loss', marker='o')
axes[0].set_title('pH Loss Over Epochs')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('MSE Loss')
axes[0].legend()

# --- BOD ---
axes[1].plot(epochs_range, train_losses_bod, label='Train BOD Loss', marker='o')
axes[1].plot(epochs_range, val_losses_bod, label='Val BOD Loss', marker='o')
axes[1].set_title('BOD Loss Over Epochs')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('MSE Loss')
axes[1].legend()

# --- DO ---
axes[2].plot(epochs_range, train_losses_do, label='Train DO Loss', marker='o')
axes[2].plot(epochs_range, val_losses_do, label='Val DO Loss', marker='o')
axes[2].set_title('DO Loss Over Epochs')
axes[2].set_xlabel('Epoch')
axes[2].set_ylabel('MSE Loss')
axes[2].legend()

plt.tight_layout()
plt.show()
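As a numeric complement to the loss curves, per-target test RMSE can be computed from the trained network. This is a minimal sketch reusing the model, scaler output, and test targets defined above:

# Minimal sketch: per-target test RMSE from the trained network
model.eval()
with torch.no_grad():
    X_all = torch.tensor(X_test_scaled, dtype=torch.float32)
    pred_ph_t, pred_bod_t, pred_do_t = model(X_all)

for name, pred, target in [('pH', pred_ph_t, y_ph_test),
                           ('BOD', pred_bod_t, y_bod_test),
                           ('DO', pred_do_t, y_do_test)]:
    true = torch.tensor(target.values, dtype=torch.float32).view(-1, 1)
    rmse = torch.sqrt(torch.mean((pred - true) ** 2)).item()
    print(f"{name} test RMSE: {rmse:.4f}")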
WQI Computation and Predictive Modeling Code

This section includes code for computing the Water Quality Index, as well as machine
learning models for regression and classification.

Computation
Each parameter is converted into a subindex using a linear scaling method:
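That is, WQI = sum_i w_i * SI_i, where each subindex SI_i rescales a parameter from 100 at its ideal value to 0 at its permissible limit. The helper compute_wqi_for_row and the param_info table used below are not reproduced in this listing; a minimal sketch consistent with the linear-scaling description might look like the following. The ideal values, permissible limits, and equal weights here are placeholders, not the study's actual parameters:

# Hypothetical sketch: linear-scaling subindices with equal weights.
# The actual param_info table is not shown in this listing, so these
# ideal/limit/weight values are placeholders only.
param_info = {
    'mean_ph':          {'ideal': 7.0,  'limit': 8.5,  'weight': 0.2},
    'mean_do':          {'ideal': 14.6, 'limit': 5.0,  'weight': 0.2},
    'mean_bod':         {'ideal': 0.0,  'limit': 5.0,  'weight': 0.2},
    'mean_temperature': {'ideal': 25.0, 'limit': 35.0, 'weight': 0.2},
    'mean_nitrate':     {'ideal': 0.0,  'limit': 45.0, 'weight': 0.2},
}

def compute_wqi_for_row(row, param_info):
    """Weighted average of linearly scaled subindices (0-100)."""
    wqi = 0.0
    for col, info in param_info.items():
        # Linear scaling: 100 at the ideal value, 0 at the permissible limit
        span = info['limit'] - info['ideal']
        sub = 100.0 * (1.0 - (row[col] - info['ideal']) / span)
        sub = min(max(sub, 0.0), 100.0)  # clamp to the 0-100 range
        wqi += info['weight'] * sub
    return wqi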

water['WQI'] = water.apply(lambda row: compute_wqi_for_row(row, param_info), axis=1)

print("Head of the DataFrame with computed WQI:")

print(water[['mean_ph', 'mean_do', 'mean_bod', 'mean_temperature',
             'mean_nitrate', 'WQI']].head(25))

# Optionally, you can plot a histogram of WQI values

import matplotlib.pyplot as plt

plt.figure(figsize=(6,4))

water['WQI'].hist(bins=20)

plt.title("Distribution of Computed WQI")

plt.xlabel("WQI (0-100 scale)")

plt.ylabel("Frequency")

plt.show()

Random Forest Regression and WQI Classification: -
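The listing reproduced below is the classification pipeline; the Random Forest regressor itself is not included. A minimal regression sketch consistent with this subsection's title might look like the following, reusing the mean_* features and the computed WQI column (the hyperparameters are illustrative defaults, not the study's settings):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Features and target, mirroring the classification code below
X_reg = water[['mean_ph', 'mean_do', 'mean_bod', 'mean_temperature', 'mean_nitrate']]
y_reg = water['WQI']

Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)

rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(Xr_train, yr_train)
yr_pred = rf.predict(Xr_test)

print("RMSE:", mean_squared_error(yr_test, yr_pred) ** 0.5)
print("R^2:", r2_score(yr_test, yr_pred))

The pipeline that follows categorizes the computed WQI into Poor/Moderate/Good and fits a logistic regression classifier: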

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.model_selection import train_test_split

from sklearn.pipeline import Pipeline

from sklearn.impute import SimpleImputer

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

######################################

# 1. Categorize the WQI into classes

######################################

def categorize_wqi(wqi):
    if wqi < 50:
        return "Poor"
    elif wqi < 75:
        return "Moderate"
    else:
        return "Good"

# Assume 'water' DataFrame already exists and has a 'WQI' column

water['Quality_Category'] = water['WQI'].apply(categorize_wqi)
######################################

# 2. Prepare features and target

######################################

X_class = water[['mean_ph', 'mean_do', 'mean_bod', 'mean_temperature', 'mean_nitrate']]

y_class = water['Quality_Category']

# Drop rows with missing target values

data_class = water.dropna(subset=['Quality_Category'])

X_class = data_class[['mean_ph', 'mean_do', 'mean_bod', 'mean_temperature', 'mean_nitrate']]

y_class = data_class['Quality_Category']

######################################

# 3. Split dataset

######################################

Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_class,
    y_class,
    test_size=0.2,
    random_state=42
)

######################################

# 4. Build and train the pipeline

######################################
pipeline_clf = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('clf', LogisticRegression(max_iter=300))  # Increased max_iter to reduce convergence warnings
])

pipeline_clf.fit(Xc_train, yc_train)

######################################

# 5. Make predictions

######################################

yc_pred = pipeline_clf.predict(Xc_test)

######################################

# 6. Print metrics

######################################

print("Classification Accuracy:", accuracy_score(yc_test, yc_pred))

print(classification_report(yc_test, yc_pred))

######################################

# 7. Create a comparison DataFrame

######################################

df_results = Xc_test.copy()

df_results['Actual_Quality'] = yc_test.values

df_results['Predicted_Quality'] = yc_pred
print("\nSample rows from the Actual vs. Predicted Quality table:")

print(df_results.head(10))

######################################

# 8. Distribution of predicted quality

######################################

predicted_distribution = (
    df_results.groupby('Predicted_Quality').size().reset_index(name='Count')
)

print("\nDistribution of Predicted Quality in the Test Set:")

print(predicted_distribution)

######################################

# 9. Confusion Matrix

######################################

labels = ["Poor", "Moderate", "Good"]

cm = confusion_matrix(df_results['Actual_Quality'],
                      df_results['Predicted_Quality'], labels=labels)

cm_df = pd.DataFrame(cm, index=[f"Actual_{l}" for l in labels],
                     columns=[f"Pred_{l}" for l in labels])

print("\nConfusion Matrix:")

print(cm_df)

######################################

# 10. Visualizations
######################################

# Set up a figure with 2 subplots: 1) bar chart for the predicted-category
# distribution, 2) heatmap for the confusion matrix

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14, 5))

# --- (A) Bar Chart: Distribution of Predicted Categories ---

axes[0].bar(predicted_distribution['Predicted_Quality'],
            predicted_distribution['Count'], color=['green', 'orange', 'red'])

axes[0].set_title("Distribution of Predicted Water Quality", fontsize=14)

axes[0].set_xlabel("Predicted Category", fontsize=12)

axes[0].set_ylabel("Number of Samples", fontsize=12)

# Annotate bar chart

for i, count in enumerate(predicted_distribution['Count']):
    axes[0].text(i, count + 0.5, str(count), ha='center', va='bottom',
                 fontsize=12)

# --- (B) Heatmap: Confusion Matrix ---

sns.heatmap(cm_df, annot=True, cmap='Blues', fmt='d', cbar=False, ax=axes[1])

axes[1].set_title("Confusion Matrix", fontsize=14)

axes[1].set_xlabel("Predicted Category", fontsize=12)

axes[1].set_ylabel("Actual Category", fontsize=12)

plt.tight_layout()

plt.show()
