Rain Prediction Using Random Forest
20MCA245
● INTRODUCTION
● LITERATURE REVIEW
● DATA PREPROCESSING
● MODEL SELECTION
● ACCURACY RATE
● PRODUCT BACKLOG AND SPRINTS
● CONCLUSION
● REFERENCES
INTRODUCTION
● Accurate rain prediction is crucial for various sectors,
including agriculture, disaster management, and water
resource management.
● Timely predictions can help to avoid risks and optimize
resource usage.
OBJECTIVE
The goal of this project is to develop a machine learning
model that predicts rainfall using historical weather
data. The model will utilize the Random Forest algorithm to
efficiently handle the data.
Dataset Overview
Source: Kaggle
Name of dataset: weatherAUS.csv
No. of rows: 145460
No. of columns: 23
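The row and column counts above can be checked right after loading the file. The sketch below is illustrative only: the helper name `summarize_weather_csv` and the tiny in-memory sample are assumptions made so the example runs without the real Kaggle file.

```python
import io
import pandas as pd

def summarize_weather_csv(source):
    """Load a weather CSV and report its size and missing values per column."""
    df = pd.read_csv(source)
    n_rows, n_cols = df.shape
    missing = df.isna().sum()  # number of missing entries in each column
    return df, n_rows, n_cols, missing

# Tiny made-up sample standing in for weatherAUS.csv (values are illustrative only).
sample_csv = io.StringIO(
    "Date,WindGustDir,Humidity3pm,RainToday,RainTomorrow\n"
    "2020-01-01,W,22.0,No,No\n"
    "2020-01-02,NNW,,Yes,No\n"
    "2020-01-03,,30.0,No,Yes\n"
)
df, n_rows, n_cols, missing = summarize_weather_csv(sample_csv)
print(n_rows, n_cols)
print(missing)
```

Passing the path to the downloaded file instead of the sample, for example `summarize_weather_csv("weatherAUS.csv")`, should report the 145460 rows and 23 columns listed on this slide if the file matches.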
Attribute Description
Date The date of observation.
WindGustDir Direction of the strongest wind gust.
Humidity3pm Humidity (%) at 3pm.
RainToday Yes if precipitation (mm) in the 24 hours to 9am exceeds 1 mm, otherwise No.
RainTomorrow Target variable of the prediction model, predicted from the other weather attributes.
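The RainToday rule can be written as a small function. This is a minimal sketch under stated assumptions: the input is a rainfall measurement in mm (the dataset column holding it is not shown on these slides), the helper name `rain_flag` is hypothetical, and deriving the target by shifting the flag one row assumes consecutive daily rows for a single location.

```python
import pandas as pd

def rain_flag(rainfall_mm, threshold_mm=1.0):
    """Return "Yes" where rainfall strictly exceeds the threshold, else "No".

    Missing rainfall readings stay missing instead of being labelled "No".
    """
    flags = rainfall_mm.gt(threshold_mm).map({True: "Yes", False: "No"})
    return flags.where(rainfall_mm.notna())

rainfall = pd.Series([0.0, 1.0, 1.2, 5.4])  # made-up daily rainfall in mm
rain_today = rain_flag(rainfall)

# Assumption: rows are consecutive days at one location, so tomorrow's label
# is the next row's RainToday value; the last day has no known tomorrow.
rain_tomorrow = rain_today.shift(-1)
print(pd.DataFrame({"RainToday": rain_today, "RainTomorrow": rain_tomorrow}))
```

Note that exactly 1.0 mm maps to "No", because the definition says the amount must exceed 1 mm.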
LITERATURE REVIEW
1. Paper: Rainfall Prediction Using Machine Learning Techniques (2020)
Authors: Moulana Mohammed, Kolapalli, Niharika Golla and Siva Sai Maturi
Findings:
● Data Source: Historical rainfall data (1901-2015), 25919 instances and 11 features.
● Data Pre-processing: Missing values were filled with mean values and the dataset was split 80:20.
● Classification Techniques: Multiple Linear Regression (MLR), Support Vector Regression (SVR) and Random Forest.
Result: Training accuracy 99.58%; testing accuracy 98.59%.

2. Paper: Rainfall Prediction Using Machine Learning Techniques (2019)
Authors: Arnav Garg, Himanshu Pandey
Findings:
● Data Source: Historical rainfall data (1951-2015), sourced from the National Data Sharing and Accessibility Policy (NDSAP).
● Data Pre-processing: The dataset was cleaned and divided into training (1951-2014) and testing (2015) subsets.
● Classification Techniques: Support Vector Regression, K-Nearest Neighbors.
Result: Training accuracy 85%; testing accuracy 80%.
DATA PREPROCESSING
1. Feature Classification
● The features were classified into two main categories:
● Numerical features
● Categorical features
3. Outlier Detection
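The slides do not say which outlier rule was used. One common option is the interquartile range (IQR) rule, sketched below as an assumption rather than as the project's actual method; the function name `clip_outliers_iqr` and the sample values are made up for illustration.

```python
import pandas as pd

def clip_outliers_iqr(values, k=1.5):
    """Clip values lying more than k * IQR outside the first/third quartile."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return values.clip(lower=lower, upper=upper), lower, upper

# Made-up humidity readings with one implausible value (120).
humidity = pd.Series([40.0, 42.0, 45.0, 47.0, 50.0, 120.0])
clipped, lower, upper = clip_outliers_iqr(humidity)
print(lower, upper)
print(clipped.tolist())
```

Clipping keeps every row and only caps extreme values. Dropping the flagged rows instead is an equally valid choice when the extreme readings are believed to be measurement errors.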
4. Converting categorical variables to numerical format
5. Feature Selection
Selecting the most relevant features helps by:
● Reducing overfitting.
● Enhancing accuracy.
● Reducing training time.
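Steps 4 and 5 can be sketched together. The slides name neither the encoding scheme nor the selection criterion, so the integer category codes and the correlation-based ranking below are assumptions chosen for simplicity; the helper names and sample values are hypothetical.

```python
import pandas as pd

def encode_categoricals(df):
    """Replace each non-numeric column with integer codes (one per distinct value)."""
    encoded = df.copy()
    for col in encoded.columns:
        if not pd.api.types.is_numeric_dtype(encoded[col]):
            # Categories are ordered alphabetically, so "No" -> 0 and "Yes" -> 1.
            encoded[col] = encoded[col].astype("category").cat.codes
    return encoded

def rank_features(encoded, target):
    """Rank features by absolute correlation with the encoded target column."""
    scores = encoded.drop(columns=[target]).corrwith(encoded[target]).abs()
    return scores.sort_values(ascending=False)

# Made-up rows using attribute names from the dataset overview.
sample = pd.DataFrame({
    "Humidity3pm": [20.0, 25.0, 80.0, 85.0, 30.0, 90.0],
    "WindGustDir": ["N", "S", "N", "S", "N", "S"],
    "RainTomorrow": ["No", "No", "Yes", "Yes", "No", "Yes"],
})
encoded = encode_categoricals(sample)
ranking = rank_features(encoded, "RainTomorrow")
print(ranking)
```

Integer codes impose an arbitrary order on values such as wind directions. Tree-based models tend to tolerate this, but one-hot encoding is a reasonable alternative if the ordering proves misleading.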
6. Splitting the Dataset
● Training and Testing Split: Divided the dataset into 80% for training and
20% for testing.
● This ensures that the model is trained on the majority of the data and
evaluated on data it has not seen during training.
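The 80:20 split described above can be sketched with scikit-learn's `train_test_split`. The data here is synthetic, and `stratify=y` is an added assumption (not stated in the slides) that keeps the class proportions the same in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows with 2 features, 20% of rows in the positive class.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # 20% of rows held out for testing
    random_state=42,   # fixed seed so the split is reproducible
    stratify=y,        # assumption: preserve class balance in both subsets
)
print(X_train.shape, X_test.shape)
```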
MODEL SELECTION:
RANDOM FOREST
Random Forest is an ensemble method that builds multiple
decision trees for classification or regression and combines their
predictions (majority vote for classification, averaging for
regression) to improve accuracy.
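A minimal sketch of training and scoring a Random Forest classifier with scikit-learn follows. It uses synthetic data from `make_classification` rather than weatherAUS.csv, and the hyperparameters (100 trees, fixed seeds) are illustrative assumptions, not the project's settings; the printed accuracy says nothing about the real model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the weather features.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Each of the 100 trees is fit on a bootstrap sample of the training rows;
# the forest combines the trees' outputs into one class prediction.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy on synthetic data: {accuracy:.2f}")
```

After fitting, `model.feature_importances_` gives a per-feature importance score, which is one way to revisit the feature selection step.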
LITERATURE REVIEW AND INSPIRATION
● Handling Missing Data: Effectively manages datasets with missing
values
● Scalability: Efficient with large datasets and big data applications.
● Visualization: Easy to visualize individual trees within the forest.
Model Performance
PRODUCT BACKLOG
BACKLOG ID    USER STORIES    TASKS
SPRINT BURN DOWN CHART 1
DAY    DATE              REMAINING EFFORT
DAY-0  Initial estimate  8
DAY-1  Aug-01            6
DAY-2  Aug-02            5
DAY-3  Aug-05            4
DAY-4  Aug-06            3
DAY-5  Aug-07            1
DAY-6  Aug-08            1
DAY-7  Aug-09            1
DAY-8  Aug-10            0
SPRINT BURN DOWN CHART 2
DATE              REMAINING EFFORT
Initial estimate  14
Aug-12            13
Aug-13            12
Aug-14            10
Aug-15            8
Aug-16            5
Aug-17            2
Aug-19            0
SPRINT BURN DOWN CHART 3
DATE              REMAINING EFFORT
Initial estimate  12
Aug-20            9
Aug-21            7
Aug-22            4
Aug-23            3
Aug-26            1
Aug-27            0
SPRINT BURN DOWN CHART 4
DAY    DATE              REMAINING EFFORT
DAY-0  Initial estimate  10
DAY-1  Aug-28            9
DAY-2  Aug-29            6
DAY-3  Aug-30            4
DAY-4  Aug-31            3
DAY-5  Oct-03            0
CONCLUSION
We have completed the second interim evaluation for our Rain Prediction
System project. Key accomplishments include:
Tasks Completed:
● Data Preprocessing
● Model Development
Future Steps:
We aim to improve the model's accuracy beyond 90% using advanced methods.
REFERENCES
https://youtu.be/dv2TruzOOmU?si=86mVgALYHtSzugle
https://docs.python.org/3/reference
LITERATURE REVIEW
https://tinyurl.com/4fxab5r6
https://tinyurl.com/25kr559k
THANK YOU