Rain Prediction Using Random Forest

Uploaded by josemonjohn10
Copyright © All Rights Reserved

MINI PROJECT - 20MCA245

RAIN PREDICTION USING RANDOM FOREST

JOSEMON JOHN (MUT23MCA-2039) | Project Guide: Dr Smitha Anu


CONTENTS

● INTRODUCTION
● LITERATURE REVIEW
● DATA PREPROCESSING
● MODEL SELECTION
● ACCURACY RATE
● PRODUCT BACKLOG AND SPRINTS
● CONCLUSION
● REFERENCES

INTRODUCTION
● Accurate rain prediction is crucial for various sectors,
including agriculture, disaster management, and water
resource management.
● Timely predictions help reduce weather-related risks and optimize
resource usage.

OBJECTIVE
The goal of this project is to develop a machine learning
model that predicts whether it will rain the next day using historical
weather data. The model uses the Random Forest algorithm to
handle the data efficiently.
Dataset Overview

Source: Kaggle
Dataset name: weatherAUS.csv

This dataset contains about 10 years of daily weather observations from numerous Australian weather stations.

RainTomorrow is the target variable to predict. It answers the question: did it rain the next day, Yes or No?

Number of rows: 145460
Number of columns: 23

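A minimal sketch of loading the file with Python's standard csv module and checking its size. It assumes missing readings appear as empty cells or the literal string "NA", and the function name load_weather_csv is hypothetical, not the project's code.

```python
import csv

# Assumption: missing readings appear as empty cells or the literal string "NA".
MISSING_MARKERS = {"", "NA"}


def load_weather_csv(lines):
    """Read weatherAUS-style CSV lines into (column names, list of row dicts).

    `lines` can be an open file object or any iterable of CSV text lines.
    Missing markers are replaced by None so later steps can detect them.
    """
    reader = csv.DictReader(lines)
    rows = [
        {col: (None if isinstance(value, str) and value in MISSING_MARKERS else value)
         for col, value in row.items()}
        for row in reader
    ]
    return reader.fieldnames, rows


# Usage (file path assumed):
# with open("weatherAUS.csv", newline="") as f:
#     columns, rows = load_weather_csv(f)
# print(len(rows), len(columns))  # the slide reports 145460 rows and 23 columns
```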
Attribute        Description
Date             The date of observation.
Location         The common name of the weather station's location.
MinTemp          Minimum temperature (°C).
MaxTemp          Maximum temperature (°C).
Rainfall         The amount of rainfall recorded for the day (mm).
Evaporation      Class A pan evaporation (mm) in the 24 hours to 9am.
Sunshine         Number of hours of bright sunshine in the day.
WindGustDir      Direction of the strongest wind gust.
WindGustSpeed    Speed (km/h) of the strongest wind gust.
WindDir9am       Wind direction at 9am.
WindDir3pm       Wind direction at 3pm.
WindSpeed9am     Wind speed (km/h) averaged over the 10 minutes prior to 9am.
WindSpeed3pm     Wind speed (km/h) averaged over the 10 minutes prior to 3pm.
Humidity9am      Humidity (%) at 9am.
Humidity3pm      Humidity (%) at 3pm.
Pressure9am      Atmospheric pressure (hPa) at 9am.
Pressure3pm      Atmospheric pressure (hPa) at 3pm.
Cloud9am         Fraction of sky obscured by cloud at 9am (oktas).
Cloud3pm         Fraction of sky obscured by cloud at 3pm (oktas).
Temp9am          Temperature (°C) at 9am.
Temp3pm          Temperature (°C) at 3pm.
RainToday        Yes if precipitation in the 24 hours to 9am exceeds 1 mm, otherwise No.
RainTomorrow     Target variable of the prediction model, predicted from the other weather attributes.

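The RainToday rule and the Yes/No target can be expressed directly in code. This is a hedged sketch: the function names are hypothetical, and dropping rows whose RainTomorrow value is missing is an assumption rather than a step stated in the slides.

```python
# Mapping of the Yes/No labels used by RainToday and RainTomorrow.
YES_NO = {"Yes": 1, "No": 0}


def rain_today_from_rainfall(rainfall_mm):
    """Apply the RainToday definition: Yes if rainfall in the 24 hours to 9am exceeds 1 mm."""
    return "Yes" if rainfall_mm > 1.0 else "No"


def encode_target(rows, target="RainTomorrow"):
    """Keep rows whose target is Yes/No and return them with 1/0 labels.

    Assumption: rows with a missing target are dropped, since they cannot be used for training.
    """
    kept = [row for row in rows if row.get(target) in YES_NO]
    labels = [YES_NO[row[target]] for row in kept]
    return kept, labels
```

For example, rain_today_from_rainfall(1.0) returns "No", because the definition requires rainfall strictly above 1 mm.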
LITERATURE REVIEW
1. Rainfall Prediction using Machine Learning Techniques (2020)
Authors: Moulana Mohammed, Kolapalli, Niharika Golla and Siva Sai Maturi
Findings:
● Data Source: Historical rainfall data (1901-2015), 25919 instances and 11 features.
● Data Pre-processing: Missing values were filled with mean values, and the dataset was split 80:20.
● Classification Techniques: Multiple Linear Regression (MLR), Support Vector Regression (SVR) and Random Forest.
Result: Training accuracy 99.58%, testing accuracy 98.59%.

2. Rainfall Prediction using Machine Learning Techniques (2019)
Authors: Arnav Garg, Himanshu Pandey
Findings:
● Data Source: Historical rainfall data (1951-2015) was sourced from the National Data Sharing and Accessibility Policy (NDSAP).
● Data Pre-processing: The dataset was cleaned and divided into training (1951-2014) and testing (2015) subsets.
● Classification Techniques: Support Vector Regression, K-Nearest Neighbors, …
Result: Training accuracy 85%, testing accuracy 80%.
DATA PREPROCESSING
1. Feature Classification
● The features were classified into two main categories: numerical features and categorical features.

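One simple way to reproduce this classification, assuming rows are dictionaries of strings with None marking missing cells, is to check whether every observed value in a column parses as a number. The helper names are hypothetical; with pandas, DataFrame.select_dtypes serves a similar purpose.

```python
def is_number(text):
    """True if the string parses as a float (e.g. "13.4"), False otherwise."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def classify_features(rows, columns):
    """Split columns into numerical and categorical lists.

    A column is treated as numerical if every non-missing (non-None) value parses
    as a number; everything else, including Date, is treated as categorical.
    """
    numerical, categorical = [], []
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        if observed and all(is_number(v) for v in observed):
            numerical.append(col)
        else:
            categorical.append(col)
    return numerical, categorical
```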
2. Handling Missing Values

● Missing values were imputed using median imputation for numerical features and random value imputation (filling each gap with a randomly chosen observed value) for categorical features.

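Both imputation strategies can be sketched on rows stored as dictionaries with None for missing cells (an assumed representation; the function name is hypothetical). Numerical gaps take the column median, which is less sensitive to extreme values than the mean; categorical gaps take a randomly chosen observed value, so category frequencies are roughly preserved.

```python
import random
import statistics


def impute(rows, numerical, categorical, seed=0):
    """Fill None values in place: median for numerical columns,
    a random observed value for categorical columns.

    Assumes every column has at least one observed (non-None) value.
    """
    rng = random.Random(seed)  # fixed seed for reproducible random imputation
    for col in numerical:
        observed = [float(r[col]) for r in rows if r[col] is not None]
        median = statistics.median(observed)
        for r in rows:
            r[col] = median if r[col] is None else float(r[col])
    for col in categorical:
        observed = [r[col] for r in rows if r[col] is not None]
        for r in rows:
            if r[col] is None:
                r[col] = rng.choice(observed)
    return rows
```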
3. Outlier Detection

● Identified outliers using box plots.
● Removed the outliers to improve the model's performance.

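Box plots conventionally flag as outliers any point lying more than 1.5 × IQR beyond the first or third quartile. The sketch below applies that rule with statistics.quantiles; the 1.5 multiplier is the usual convention rather than a value stated in the slides, the row format and function names are assumptions, and quartiles can differ slightly from a plotting library's because quartile conventions vary.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) whisker limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, columns, k=1.5):
    """Keep only rows whose values lie within the IQR bounds of every given column."""
    bounds = {c: iqr_bounds([r[c] for r in rows], k) for c in columns}
    return [r for r in rows
            if all(lo <= r[c] <= hi for c, (lo, hi) in bounds.items())]
```

For the values 10, 11, 12, 13, 14, 15 and 100, the quartiles are 11 and 15, so the bounds are (5.0, 21.0) and 100 is removed.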
4. Converting Categorical Variables to Numerical Format

● Most machine learning libraries expect numeric inputs, so categorical features were converted using label encoding, which maps each category to a unique integer.

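A minimal label-encoding sketch. It sorts categories before numbering them so the codes are reproducible (scikit-learn's LabelEncoder also numbers classes in sorted order), assumes missing values have already been imputed, and uses a hypothetical function name.

```python
def label_encode(rows, categorical):
    """Replace each categorical value in place with an integer code.

    Returns {column: {category: code}} so the mapping can be reused on new data.
    """
    encoders = {}
    for col in categorical:
        categories = sorted({r[col] for r in rows})
        mapping = {cat: code for code, cat in enumerate(categories)}
        for r in rows:
            r[col] = mapping[r[col]]
        encoders[col] = mapping
    return encoders
```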
5. Feature Selection

● Selected the most relevant features from the dataset and dropped irrelevant ones.

It improves model performance by:

● Reducing overfitting.
● Enhancing accuracy.
● Reducing training time.
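The slides do not say how relevance was measured. As one illustrative option, not necessarily the method used in this project, the sketch below ranks numeric features by the absolute Pearson correlation with the 0/1 target and keeps the top k; k and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_top_features(rows, features, labels, k):
    """Keep the k features most strongly correlated (in absolute value) with the labels."""
    scores = {f: abs(pearson([r[f] for r in rows], labels)) for f in features}
    return sorted(features, key=lambda f: scores[f], reverse=True)[:k]
```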

6. Splitting the Dataset

● Training and testing split: the dataset was divided into 80% for training and 20% for testing.

● This ensures that the model is trained on the majority of the data, while the held-out 20% is used to evaluate it on unseen examples.

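An 80:20 split can be sketched by shuffling row indices with a fixed seed and taking the first 20% as the test set. The function name and the seed value are illustrative assumptions; scikit-learn's train_test_split with test_size=0.2 does the same job.

```python
import random


def split_dataset(rows, labels, test_size=0.2, seed=42):
    """Shuffle indices with a fixed seed and return (X_train, X_test, y_train, y_test)."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)  # 20% of the rows by default
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])
```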
MODEL SELECTION: RANDOM FOREST
Random Forest is an ensemble method that builds multiple
decision trees for classification or regression and combines their
predictions (majority vote for classification, averaging for regression)
to improve accuracy.

● It often achieves high predictive accuracy compared to many other algorithms.
● It requires minimal data preprocessing; for example, features do not
need to be normalized or scaled.
● Depending on the implementation, it can maintain good accuracy even when some data is missing.

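To make the ensemble idea concrete, here is a minimal from-scratch sketch in pure Python. It is not the project's code, and the class and function names are hypothetical. Each tree is grown on a bootstrap sample, each split considers only a random subset of about √d features and picks the threshold with the lowest Gini impurity, and the forest predicts by majority vote.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return (feature, threshold) with the lowest weighted Gini impurity, or None."""
    best_score, best = None, None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best_score is None or score < best_score:
                best_score, best = score, (f, t)
    return best


def build_tree(X, y, max_depth, n_features, rng):
    """Grow a tree; each node considers only a random subset of n_features features."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority  # leaf: predicted class
    split = best_split(X, y, rng.sample(range(len(X[0])), n_features))
    if split is None:
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]

    def subtree(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          max_depth - 1, n_features, rng)

    return (f, t, subtree(left), subtree(right))


def predict_tree(node, row):
    """Follow internal (feature, threshold, left, right) nodes down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class RandomForest:
    """Bagged decision trees with random feature subsets, combined by majority vote."""

    def __init__(self, n_trees=25, max_depth=5, n_features=None, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.n_features, self.seed = n_features, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        d = len(X[0])
        k = min(d, self.n_features or max(1, int(d ** 0.5)))  # default: about sqrt(d)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.max_depth, k, rng))
        return self

    def predict(self, X):
        return [Counter(predict_tree(tree, row) for tree in self.trees).most_common(1)[0][0]
                for row in X]
```

In practice, scikit-learn's RandomForestClassifier implements the same idea far more efficiently; its n_estimators, max_depth and max_features parameters play the roles of n_trees, max_depth and n_features here.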
LITERATURE REVIEW AND INSPIRATION
● Handling Missing Data: Effectively manages datasets with missing values.
● Scalability: Efficient with large datasets and big data applications.
● Visualization: Easy to visualize individual trees within the forest.

Model Performance

Current Model Accuracy

● The current accuracy of the model is 85%.
● This was achieved through data preprocessing and model selection.

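Accuracy is the fraction of predictions that match the true labels, so an accuracy of 85% means 85 out of every 100 evaluated RainTomorrow predictions were correct. A minimal sketch (function name hypothetical):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

For example, accuracy([1, 0, 1, 1], [1, 0, 0, 1]) returns 0.75, since three of the four predictions are correct.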
PRODUCT BACKLOG

Backlog ID 101
User story: As a data analyst, I want to import the dataset and take an initial look at it.
Tasks:
1. Literature review.
2. Write code to import the dataset.
3. Display the dataset.

Backlog ID 102
User story: As a data scientist, I want to understand the features and their data types.
Tasks:
1. Review the feature names and write descriptions.
2. Print each feature and its data type.

Backlog ID 103
User story: As a data analyst, I want to classify features as numerical or categorical.
Tasks:
1. Review the data to classify features as numerical or categorical.
2. Count and display the numerical features.
3. Count and display the categorical features.

Backlog ID 104
User story: As a data scientist, I want to identify and impute missing values in the dataset.
Tasks:
1. Identify the missing values.
2. Choose appropriate imputation methods.
3. Apply the methods to categorical and numerical values.

Backlog ID 105
User story: As a data scientist, I want to visualize features and detect outliers.
Tasks:
1. Use visualization libraries to create graphs.
2. Create box plots to find the outliers.
3. Apply outlier detection methods and record the findings.

Backlog ID 106
User story: As a data engineer, I want to handle detected outliers in the dataset.
Tasks:
1. Select an outlier handling strategy.
2. Implement the chosen strategy.
3. Check that all outliers have been removed.

Backlog ID 107
User story: As a data scientist, I want to convert categorical values into numerical values and perform feature selection.
Tasks:
1. Label encoding: convert each category to a unique integer.
2. Drop irrelevant features.

Backlog ID 108
User story: As a data scientist, I want to split the dataset so that I can prepare the data for machine learning models.
Tasks:
1. Define the split strategy.
2. Define the split ratio (80:20).
3. Perform the dataset split.

Backlog ID 109
User story: As a data scientist, I want to train and evaluate machine learning models so that I can assess their performance and select the best model for deployment.
Tasks:
1. Select and configure different machine learning algorithms.
2. Monitor performance metrics during training.
3. Analyse the accuracy and take steps to improve it if needed.
SPRINT BURN DOWN CHART 1

Backlog ID  User Story          Initial Estimate  Daily Effort Entries
101         LITERATURE REVIEW   2                 2
101         IMPORT DATASET      1                 1
101         INITIAL VIEW        1                 1
102         REVIEW FEATURES     2                 1, 1
102         LIST DATA TYPES     2                 1, 1

Day               DAY-0     DAY-1   DAY-2   DAY-3   DAY-4   DAY-5   DAY-6   DAY-7   DAY-8
Date              Initial   Aug-01  Aug-02  Aug-05  Aug-06  Aug-07  Aug-08  Aug-09  Aug-10
Remaining Effort  8         6       5       4       3       1       1       1       0

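The remaining-effort row of a burndown chart is the initial estimate minus the cumulative effort completed so far. The sketch below recomputes Sprint 1's row from the daily decrements implied by that row (2, 1, 1, 1, 2, 0, 0, 1). The ideal_burndown helper, a straight line from the estimate to zero that burndown charts commonly draw for comparison, is an added illustration; both function names are hypothetical.

```python
from itertools import accumulate


def remaining_effort(initial_estimate, daily_effort):
    """Remaining effort for DAY-0..DAY-n: the estimate minus the cumulative effort done."""
    return [initial_estimate] + [initial_estimate - done for done in accumulate(daily_effort)]


def ideal_burndown(initial_estimate, n_days):
    """Straight line from the initial estimate on DAY-0 down to zero on the last day."""
    return [initial_estimate - initial_estimate * day / n_days for day in range(n_days + 1)]
```

For Sprint 1, remaining_effort(8, [2, 1, 1, 1, 2, 0, 0, 1]) returns [8, 6, 5, 4, 3, 1, 1, 1, 0], matching the Remaining Effort row.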
[Figure: Sprint Burn Down Chart 1]
SPRINT BURN DOWN CHART 2

Backlog ID  User Story                      Initial Estimate  Daily Effort Entries
103.1       Classify Features               3                 1, 1, 1
103.2       Rationale for Classifications   2                 1, 1
104.1       Identify Missing Values         3                 1, 2
104.2       Choose Imputation Methods       3                 1, 2
104.3       Apply Imputation                3                 1, 2

Day               DAY-0     DAY-1   DAY-2   DAY-3   DAY-4   DAY-5   DAY-6   DAY-7
Date              Initial   Aug-12  Aug-13  Aug-14  Aug-15  Aug-16  Aug-17  Aug-19
Remaining Effort  14        13      12      10      8       5       2       0

[Figure: Sprint Burn Down Chart 2]
SPRINT BURN DOWN CHART 3

Backlog ID  User Story           Initial Estimate  Daily Effort Entries
105.1       DATA VISUALIZATION   2                 2
105.2       OUTLIER DETECTION    2                 1, 1
105.3       RECORD FINDINGS      2                 1, 1
106.1       OUTLIER HANDLING     2                 2
107.1       DATA ENCODING        2                 1, 1
107.2       DOCUMENT RESULTS     2                 1, 1

Day               DAY-0     DAY-1   DAY-2   DAY-3   DAY-4   DAY-5   DAY-6
Date              Initial   Aug-20  Aug-21  Aug-22  Aug-23  Aug-26  Aug-27
Remaining Effort  12        9       7       4       3       1       0
[Figure: Sprint Burn Down Chart 3]
SPRINT BURN DOWN CHART 4

Backlog ID  User Story            Initial Estimate  Daily Effort Entries
108.1       DEFINE SPLIT          1                 1
108.2       PERFORM SPLIT         2                 2
108.3       SELECT ALGORITHM      3                 1, 2
109.1       MONITOR PERFORMANCE   3                 1, 2
109.2       ANALYZE ACCURACY      1                 1

Day               DAY-0     DAY-1   DAY-2   DAY-3   DAY-4   DAY-5
Date              Initial   Aug-28  Aug-29  Aug-30  Aug-31  Oct-03
Remaining Effort  10        9       6       4       3       0

[Figure: Sprint Burn Down Chart 4]
CONCLUSION
We have completed the second interim evaluation of our Rain Prediction
System project. Key accomplishments:

Tasks Completed:

● Data Preprocessing
● Model Development

Future Steps:

We aim to improve the model's accuracy beyond 90% using more advanced methods.

REFERENCES
https://youtu.be/dv2TruzOOmU?si=86mVgALYHtSzugle

https://docs.python.org/3/reference

LITERATURE REVIEW
https://tinyurl.com/4fxab5r6

https://tinyurl.com/25kr559k

THANK YOU
