Akshar AI Assignment
1. Model Code:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
# Step 2: Preprocessing
# Convert month names to numbers
month_map = {'June': 6, 'July': 7, 'August': 8}
data['Month'] = data['Month'].map(month_map)
# Step 5: Evaluate
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
# Colored output
GREEN = "\033[92m"
YELLOW = "\033[93m"
CYAN = "\033[96m"
RESET = "\033[0m"
B.Tech-CSE 2
print(f"{GREEN}Model R2 Score: {r2:.2f}{RESET}")
print(f"{YELLOW}Mean Absolute Error: {mae:.2f} mm{RESET}")
2. Description of Code:
The code predicts monthly rainfall with a random forest regression model. The
implementation works as follows:
The code begins by importing necessary libraries: pandas for data manipulation,
matplotlib.pyplot for visualization, sklearn.ensemble for the RandomForestRegressor model,
and other scikit-learn components for data splitting, evaluation metrics, and feature scaling.
The rainfall dataset for Ahmedabad is loaded using pandas.read_csv(). This dataset spans
2010 to 2023 and contains monthly rainfall data for June, July, and August, the primary
monsoon months in Ahmedabad. The implementation follows a structured approach:
1. Enhanced Preprocessing: The code maps month names to numerical values (June→6,
July→7, August→8) so the model can use them as numeric input, and uses three
features for prediction: 'Year', 'Month', and 'Avg_Rain_1994_2023' (the historical
average rainfall of 816 mm).
2. Feature Scaling: Unlike the previous approach, this implementation applies a
StandardScaler to standardize the feature values. Tree-based models such as random
forests are largely insensitive to feature scale, so this step mainly matters if a
scale-sensitive model is substituted later.
3. Proper Data Splitting: The data is split into training (80%) and testing (20%) sets
using train_test_split, which allows for more reliable model evaluation.
4. Improved Model Configuration: The RandomForestRegressor is configured with 200
estimators (decision trees) and a maximum depth of 10. Averaging over many trees
makes predictions more robust, and the depth limit helps reduce overfitting.
5. Comprehensive Model Evaluation: The code calculates two performance metrics on
the test set:
o R² score: Measures the proportion of variance explained by the model
o Mean Absolute Error (MAE): The average absolute difference between
predicted and actual values
6. Feature Importance Analysis: The code examines which features contribute most to
the predictions, providing insights into the rainfall patterns.
7. Future Predictions: The model generates predictions for June, July, and August 2025
after properly scaling the input features.
8. Visualization: The implementation includes two visualizations:
o A line chart comparing actual vs. predicted values for test data
o A color-coded bar chart showing the rainfall predictions for the three months in
2025
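The steps listed above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the assignment's exact code: the data is synthetic (the original CSV is not reproduced here), the target column name Rainfall_mm is hypothetical, and fitting the scaler on the training split only is one reasonable choice that the description does not confirm.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Ahmedabad CSV: made-up values, NOT real rainfall data.
rng = np.random.default_rng(0)
rows = [
    {"Year": year, "Month": month, "Avg_Rain_1994_2023": 816,
     "Rainfall_mm": base + rng.normal(0, 40)}  # hypothetical target column
    for year in range(2010, 2024)
    for month, base in [("June", 120), ("July", 300), ("August", 280)]
]
data = pd.DataFrame(rows)

# Preprocessing: convert month names to numbers.
month_map = {"June": 6, "July": 7, "August": 8}
data["Month"] = data["Month"].map(month_map)

features = ["Year", "Month", "Avg_Rain_1994_2023"]
X = data[features]
y = data["Rainfall_mm"]

# 80/20 train/test split (random_state is an assumption, for repeatability).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling: fit on the training rows only, then reuse the same scaler everywhere.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Model: 200 trees, maximum depth 10, as in the description.
model = RandomForestRegressor(n_estimators=200, max_depth=10, random_state=42)
model.fit(X_train_scaled, y_train)

# Evaluation on the held-out rows.
y_pred = model.predict(X_test_scaled)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"R2: {r2:.2f}, MAE: {mae:.2f} mm")

# Future predictions: 2025 inputs must pass through the same fitted scaler.
future = pd.DataFrame(
    {"Year": [2025] * 3, "Month": [6, 7, 8], "Avg_Rain_1994_2023": [816] * 3}
)[features]
future_pred = model.predict(scaler.transform(future))
print(dict(zip(["June", "July", "August"], future_pred.round(1))))
```

With 14 years of three months each there are only 42 rows, so the test set holds 9 samples; metrics computed on so few points can shift noticeably with a different split.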
3. Model Dataset:
4. Model Output:
Feature Importances:
o Year: 0.145
o Month: 0.855
o Avg_Rain_1994_2023: 0.000
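A zero importance for Avg_Rain_1994_2023 is what one would expect if that column holds the same 816 value in every row, as the description suggests: a decision tree cannot split on a feature that never varies. The sketch below illustrates this on made-up data; the target values and random seeds are assumptions chosen only for the demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Made-up data with the same three features; NOT the assignment's dataset.
rng = np.random.default_rng(1)
n = 42
year = rng.integers(2010, 2024, size=n)
month = rng.integers(6, 9, size=n)      # 6, 7 or 8
avg_rain = np.full(n, 816.0)            # constant column (assumed)
X = np.column_stack([year, month, avg_rain])

# Hypothetical target that depends mostly on the month, plus noise.
y = 100.0 * (month - 5) + rng.normal(0, 20, size=n)

model = RandomForestRegressor(n_estimators=200, max_depth=10, random_state=0)
model.fit(X, y)

importances = model.feature_importances_
for name, value in zip(["Year", "Month", "Avg_Rain_1994_2023"], importances):
    print(f"{name}: {value:.3f}")
```

Because the constant column carries no information, dropping it from the feature list should leave the fitted trees unchanged.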
Actual vs. Predicted Rainfall (test data):
o Strong correlation between actual (blue line with circle markers) and predicted
(red dashed line with x markers) values
o Generally close prediction patterns across all test samples
o Slight underestimation in some cases, particularly at samples 4 and 5
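A line chart in the style described above could be produced roughly as follows. The numbers are invented placeholders arranged to mimic the described pattern (the real test values are not reproduced in this report), and the axis labels and figure size are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Placeholder values in mm: invented for illustration, NOT the model's output.
y_test = [110.0, 305.0, 290.0, 140.0, 330.0, 360.0, 95.0, 280.0, 310.0]
y_pred = [112.0, 306.0, 291.0, 125.0, 300.0, 362.0, 97.0, 281.0, 311.0]
samples = list(range(1, len(y_test) + 1))

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(samples, y_test, color="blue", marker="o", label="Actual")
ax.plot(samples, y_pred, color="red", linestyle="--", marker="x", label="Predicted")
ax.set_xlabel("Test sample")
ax.set_ylabel("Rainfall (mm)")
ax.set_title("Actual vs. predicted rainfall (test data)")
ax.legend()
fig.tight_layout()

# Samples where the prediction falls below the actual value.
underestimated = [s for s, a, p in zip(samples, y_test, y_pred) if p < a]
print("Underestimated samples:", underestimated)
```

Listing the underestimated samples alongside the chart makes a visual impression such as "slightly low at samples 4 and 5" checkable against the numbers.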
Rainfall Forecast for 2025:
The bar chart of predicted rainfall for June, July, and August 2025 shows that the
model expects August to be the wettest month in Ahmedabad (351.950 mm), followed
closely by July (333.575 mm), with June receiving significantly less rainfall
(213.850 mm). This pattern aligns with historical monsoon trends in the region, where
rainfall typically increases as the monsoon season progresses from June through
August.
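The forecast figures quoted above can be turned into a bar chart and a quick share-of-total check as sketched below. The three values come from the report; the bar colors, labels, and layout are assumptions, since the original plotting code is not shown.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Predicted rainfall for 2025 in mm, as reported in the output above.
forecast_mm = {"June": 213.850, "July": 333.575, "August": 351.950}

months = list(forecast_mm)
values = list(forecast_mm.values())
total = sum(values)

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(months, values, color=["skyblue", "orange", "green"])  # assumed colors
for bar, value in zip(bars, values):
    # Write each value just above its bar.
    ax.text(bar.get_x() + bar.get_width() / 2, value, f"{value:.1f}",
            ha="center", va="bottom")
ax.set_ylabel("Predicted rainfall (mm)")
ax.set_title("Predicted rainfall, Ahmedabad, 2025")
fig.tight_layout()

wettest = max(forecast_mm, key=forecast_mm.get)
for month, value in forecast_mm.items():
    print(f"{month}: {value:.3f} mm ({value / total:.1%} of the three-month total)")
print(f"Wettest month: {wettest}; three-month total: {total:.3f} mm")
```

Expressed as shares, July and August together account for roughly three quarters of the predicted three-month total, which is the gap the bar chart makes visible.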