Rainfall Prediction using Machine Learning
Today there is no certain method by which we can predict whether there will be rainfall on a given day; even the meteorological department’s predictions fail sometimes. In this article, we will learn how to build a machine-learning model that can predict whether there will be rainfall today based on some atmospheric factors. This problem is a good fit for machine learning because such models tend to perform well on previously known tasks that used to require highly skilled individuals.
Pandas – This library helps to load the data frame in a 2D array format and has multiple functions to perform analysis tasks in one go.
Numpy – Numpy arrays are very fast and can perform large computations in a very short time.
Matplotlib/Seaborn – These libraries are used to draw visualizations.
Sklearn – This module contains multiple libraries having pre-implemented functions to perform tasks from data preprocessing to model development and evaluation.
XGBoost – This contains the eXtreme Gradient Boosting machine learning algorithm, one of the algorithms which helps us achieve high accuracy on predictions.
Imblearn – This module contains functions for handling problems related to data imbalance.
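If any of these packages are missing from your environment, they can be installed with pip (the standard PyPI package names are assumed here):
pip install numpy pandas matplotlib seaborn scikit-learn xgboost imbalanced-learn
Now let’s import the libraries we’ll use throughout this article.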
Python Code:-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn import metrics
from xgboost import XGBClassifier
import warnings
warnings.filterwarnings('ignore')
Now let’s load the dataset into a pandas DataFrame and print its first five rows.
Python Code:-
df = pd.read_csv('Rainfall.csv')
df.head()
Output:-
Now let’s check the size of the dataset.
Python Code:-
df.shape
Let’s check which column of the dataset contains which type of data.
Python Code:-
df.info()
Output:-
As per the above information regarding the data in each column, it looks like there are no null values, but we will verify this explicitly during data cleaning.
Let’s look at the descriptive statistical measures of the data.
Python Code:-
df.describe().T
Data Cleaning
The data obtained from primary sources is termed raw data and requires a lot of preprocessing before we can derive any conclusions from it or do any modeling on it. Those preprocessing steps are known as data cleaning, and they include outlier removal, null value imputation, and the removal of discrepancies of any sort in the data inputs.
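First, let’s count the null values in each column (a minimal pandas check):
Python Code:-
# Number of missing entries per column.
df.isnull().sum()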
Output:-
So there is one null value in the ‘winddirection’ as well as the ‘windspeed’ column. But what’s up with the column name ‘winddirection’?
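Let’s print the column names to take a closer look.
Python Code:-
# Inspect the raw column names.
df.columns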
Output:-
Index(['day', 'pressure ', 'maxtemp', 'temperature', 'mintemp', 'dewpoint', 'humidity ', 'cloud ',
'rainfall', 'sunshine', ' winddirection', 'windspeed'], dtype='object')
Here we can observe that there are unnecessary spaces in the names of the columns, so let’s remove them.
Python Code:-
# Strip stray whitespace from every column name.
df.rename(str.strip,
          axis='columns',
          inplace=True)
df.columns
Now, as there are null values in the ‘winddirection’ and ‘windspeed’ columns, let’s impute them with the column mean.
Python Code:-
for col in df.columns:
    if df[col].isnull().sum() > 0:
        # Impute missing values with the column mean.
        val = df[col].mean()
        df[col] = df[col].fillna(val)
df.isnull().sum().sum()
Output: 0
Exploratory Data Analysis
EDA is an approach to analyzing data using visual techniques. It is used to discover trends and patterns, or to check assumptions, with the help of statistical summaries and graphical representations. Here we will see how to check the data imbalance and the skewness of the data.
Python Code:-
# Pie chart of the target's class distribution to check for imbalance.
plt.pie(df['rainfall'].value_counts().values,
        labels=df['rainfall'].value_counts().index,
        autopct='%1.1f%%')
plt.show()
Output:-
Python Code:-
df.groupby('rainfall').mean()
The observations we have drawn from the above dataset are very much similar to what is observed in real life as well.
Now let’s make a list of the continuous features, that is, every numeric column except ‘day’.
Python Code:-
# Collect the numeric columns; 'day' is only an index-like field.
features = list(df.select_dtypes(include=np.number).columns)
features.remove('day')
print(features)
Let’s check the distribution of the continuous features given in the dataset.
Python Code:-
plt.subplots(figsize=(15, 8))
for i, col in enumerate(features):
    plt.subplot(3, 4, i + 1)
    # sb.distplot is deprecated in newer seaborn;
    # sb.histplot(df[col], kde=True) is the modern equivalent.
    sb.distplot(df[col])
plt.tight_layout()
plt.show()
Output:-
Let’s draw boxplots for the continuous variables to detect the outliers present in the data.
Python Code:-
plt.subplots(figsize=(15, 8))
for i, col in enumerate(features):
    plt.subplot(3, 4, i + 1)
    sb.boxplot(df[col])
plt.tight_layout()
plt.show()
There are outliers in the data, but sadly we do not have much data, so we cannot remove them.
Before computing correlations, the target must be numeric. Assuming the ‘rainfall’ column holds ‘yes’/‘no’ labels (as seen in the pie chart above), let’s encode them as 1/0.
Python Code:-
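# Encode the 'yes'/'no' target labels as 1/0 so the column can take
# part in the numeric correlation analysis below.
df.replace({'yes': 1, 'no': 0}, inplace=True)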
Sometimes there are highly correlated features that just increase the dimensionality of the feature space and do no good for the model’s performance. So we must check whether there are highly correlated features in this dataset or not.
Python Code:-
plt.figure(figsize=(10, 10))
# Flag feature pairs whose correlation exceeds 0.8
# (an assumed threshold for 'highly correlated').
sb.heatmap(df.corr() > 0.8,
           annot=True,
           cbar=False)
plt.show()
Output:-
Now we will remove the highly correlated features ‘maxtemp’ and ‘mintemp’. But why not ‘temperature’ or ‘dewpoint’? This is because temperature and dewpoint provide distinct information regarding the weather and atmospheric conditions, so we keep them and drop only the temperature extremes, as sketched below.
Python Code:-
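# Drop the two highly correlated columns identified above.
df.drop(['maxtemp', 'mintemp'], axis=1, inplace=True)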
Model Training
Now we will separate the features and the target variable and split the data into training and validation sets, by using which we will select the model that performs best on the validation data.
Python Code:-
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import RandomOverSampler

# Separate the independent features from the target column.
features = df.drop(['day', 'rainfall'], axis=1)
target = df['rainfall']

X_train, X_val, Y_train, Y_val = train_test_split(features,
                                                  target,
                                                  test_size=0.2,
                                                  stratify=target,
                                                  random_state=2)
# The target classes are imbalanced, so oversample the minority class.
ros = RandomOverSampler(sampling_strategy='minority',
                        random_state=22)
X, Y = ros.fit_resample(X_train, Y_train)
The features of the dataset were on different scales, so normalizing them before training will help us obtain optimum results faster, along with more stable training.
Python Code:-
from sklearn.preprocessing import StandardScaler

# Standardize the features to zero mean and unit variance.
scaler = StandardScaler()
X = scaler.fit_transform(X)
X_val = scaler.transform(X_val)
Now let’s train some state-of-the-art models for classification on our training data:
LogisticRegression
XGBClassifier
SVC
Python Code:-
models = [LogisticRegression(), XGBClassifier(), SVC(probability=True)]

for i in range(3):
    models[i].fit(X, Y)
    print(f'{models[i]} : ')
    train_preds = models[i].predict_proba(X)
    # ROC-AUC is assumed to be the accuracy metric reported below.
    print('Training Accuracy : ', metrics.roc_auc_score(Y, train_preds[:, 1]))
    val_preds = models[i].predict_proba(X_val)
    print('Validation Accuracy : ', metrics.roc_auc_score(Y_val, val_preds[:, 1]))
    print()
Output:-
LogisticRegression() :
XGBClassifier() :
SVC(probability=True) :
Model Evaluation
From the above accuracies, we can say that Logistic Regression and the support vector classifier are satisfactory, as the gap between the training and the validation accuracy is low. Let’s plot the confusion matrix as well for the validation data using the SVC model.
Python Code:-
# Confusion matrix for the SVC model on the validation data
# (ConfusionMatrixDisplay is the current sklearn API for this plot).
metrics.ConfusionMatrixDisplay.from_estimator(models[2], X_val, Y_val)
plt.show()
Let’s print the classification report as well for the validation data using the SVC model.
Python Code:-
print(metrics.classification_report(Y_val,
models[2].predict(X_val)))
Output:-
precision recall f1-score support
accuracy 0.85 74