Rain Prediction Using Random Forest


MINI PROJECT - 20MCA245

RAIN PREDICTION USING RANDOM FOREST

JOSEMON JOHN (MUT23MCA-2039) | Project Guide: Dr. Smitha Anu

CONTENTS

● INTRODUCTION
● LITERATURE REVIEW
● DATA PREPROCESSING
● MODEL SELECTION
● ACCURACY RATE
● PRODUCT BACKLOG AND SPRINTS
● CONCLUSION
● REFERENCES

INTRODUCTION
● Accurate rain prediction is crucial for various sectors,
including agriculture, disaster management, and water
resource management.
● Timely predictions help reduce risk and optimize resource usage.

OBJECTIVE
The goal of this project is to develop a machine learning model that predicts whether it will rain the next day from historical weather data. The model uses the Random Forest algorithm, which scales well to large tabular datasets.

Dataset Overview

Source: Kaggle
Dataset name: weatherAUS.csv

This dataset contains about 10 years of daily weather observations from numerous Australian weather stations.

RainTomorrow is the target variable to predict. It answers the question: did it rain the next day, Yes or No?

No. of Rows    No. of Columns
145460         23
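
A first look at the dataset could be sketched as below. This is a minimal illustration using only the Python standard library; the `summarize` helper and the four sample rows are hypothetical stand-ins, not the project's code or real observations.

```python
import csv
import io
from collections import Counter

# Tiny in-memory stand-in for weatherAUS.csv (illustrative values only).
SAMPLE_CSV = """Date,Location,MinTemp,MaxTemp,Rainfall,RainToday,RainTomorrow
2008-12-01,Albury,13.4,22.9,0.6,No,No
2008-12-02,Albury,7.4,25.1,0.0,No,No
2008-12-03,Albury,12.9,25.7,0.0,No,Yes
2008-12-04,Albury,9.2,28.0,1.2,Yes,No
"""

def summarize(csv_file):
    """Return (row count, column count, target class counts) for a CSV file object."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    n_cols = len(reader.fieldnames or [])
    target_counts = Counter(row["RainTomorrow"] for row in rows)
    return len(rows), n_cols, target_counts

n_rows, n_cols, target_counts = summarize(io.StringIO(SAMPLE_CSV))
print(n_rows, n_cols, dict(target_counts))
```

With the real file, the same helper could be called as `summarize(open("weatherAUS.csv", newline=""))`; the slides report 145460 rows and 23 columns for it.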

Dataset Overview

Attribute       Description
Date            The date of observation.
Location        The common name of the weather station's location.
MinTemp         Minimum temperature (°C).
MaxTemp         Maximum temperature (°C).
Rainfall        The amount of rainfall recorded for the day (mm).
Evaporation     Class A pan evaporation (mm) in the 24 hours to 9am.
Sunshine        Number of hours of bright sunshine in a day.
WindGustDir     Direction of the strongest wind gust.
WindGustSpeed   Speed (km/h) of the strongest wind gust.
WindDir9am      Wind direction at 9am.
WindDir3pm      Wind direction at 3pm.
WindSpeed9am    Wind speed (km/h) averaged over 10 minutes prior to 9am.
WindSpeed3pm    Wind speed (km/h) averaged over 10 minutes prior to 3pm.
Humidity9am     Humidity (%) at 9am.
Humidity3pm     Humidity (%) at 3pm.
Pressure9am     Atmospheric pressure (hPa) at 9am.
Pressure3pm     Atmospheric pressure (hPa) at 3pm.
Cloud9am        Fraction of sky obscured by cloud at 9am (oktas).
Cloud3pm        Fraction of sky obscured by cloud at 3pm (oktas).
Temp9am         Temperature (°C) at 9am.
Temp3pm         Temperature (°C) at 3pm.
RainToday       Yes if precipitation in the 24 hours to 9am exceeds 1 mm, otherwise No.
RainTomorrow    Target variable of the prediction model, predicted from the other weather attributes.
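
The RainToday rule in the table can be written as a one-line check. This is a sketch only: the function name `rain_today` and the handling of missing readings are assumptions, not part of the dataset's documentation.

```python
def rain_today(rainfall_mm):
    """Return "Yes" if rainfall exceeds 1 mm, "No" otherwise, None if the reading is missing."""
    if rainfall_mm is None:
        return None
    return "Yes" if rainfall_mm > 1.0 else "No"

print(rain_today(0.6), rain_today(1.2))
```

Note that exactly 1 mm maps to No, because the rule says "exceeds".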

LITERATURE REVIEW

1. Paper: Rainfall prediction using Machine Learning Techniques
   Authors: Moulana Mohammed, Kolapalli, Niharika Golla and Siva Sai Maturi (2020)
   Findings:
   ● Data Source: Historical rainfall data (1901-2015), 25919 instances and 11 features.
   ● Data Pre-processing: Missing values were filled with mean values and the dataset was split 80:20.
   ● Classification Techniques: Multiple Linear Regression (MLR), Support Vector Regression (SVR) and Random Forest.
   Result: Training accuracy 99.58%, testing accuracy 98.59%.

2. Paper: Rainfall prediction using Machine Learning Techniques
   Authors: Arnav Garg, Himanshu Pandey (2019)
   Findings:
   ● Data Source: Historical rainfall data (1951-2015), sourced from the National Data Sharing and Accessibility Policy (NDSAP).
   ● Data Pre-processing: The dataset was cleaned and divided into training (1951-2014) and testing (2015) subsets.
   ● Classification Techniques: Support Vector Regression, K-Nearest Neighbors.
   Result: Training accuracy 85%, testing accuracy 80%.

DATA PREPROCESSING
1. Feature Classification
● The features were classified into two main categories: numerical features and categorical features.

2. Handling Missing Values

● Missing numerical values were filled with the column median, and missing categorical values were filled by random-value imputation.
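
The two imputation strategies can be sketched as follows. This is an illustration under assumptions: "random-value imputation" is read here as drawing a replacement from the observed values of the same column, and the helper names and sample values are hypothetical.

```python
import random
import statistics

def impute_median(values):
    """Replace missing (None) numerical entries with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def impute_random(values, rng):
    """Replace missing (None) categorical entries with a randomly drawn observed value."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]

# Illustrative columns, not real observations.
min_temp = [13.4, None, 12.9, 9.2, 17.5]
wind_dir = ["W", "WNW", None, "NE", "W"]

rng = random.Random(0)  # fixed seed so the fill is reproducible
min_temp_filled = impute_median(min_temp)
wind_dir_filled = impute_random(wind_dir, rng)
print(min_temp_filled)
print(wind_dir_filled)
```

The median is less sensitive to extreme readings than the mean, and drawing from observed categories keeps the category frequencies roughly unchanged.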

3. Outlier Detection

● Identified outliers using box plots.

● Outliers were removed to improve the model’s performance.
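
One common way to flag and remove outliers is the 1.5 x IQR rule that box-plot whiskers are usually based on. The slides do not state the exact rule used, so the sketch below is an assumption; the helper names and the sample rainfall values are hypothetical.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(values, k=1.5):
    """Split values into those inside the fences and those outside."""
    low, high = iqr_bounds(values, k)
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers

# Illustrative rainfall readings in mm, not real observations.
rainfall = [0.0, 0.2, 0.0, 1.4, 0.6, 3.0, 0.0, 45.0]
kept, outliers = split_outliers(rainfall)
print(kept, outliers)
```

Removing rows this way shrinks the dataset; clipping values to the fences is an alternative when the rows should be kept.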

4. Converting Categorical Variables to Numerical Format

● Categorical values were label encoded (each category mapped to a unique integer), because the model needs numerical input.
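
Label encoding, which the product backlog names as the conversion method, can be sketched as below. The helper names are hypothetical, and assigning codes in sorted order is a choice made here for reproducibility.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer code, assigned in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}

def encode(values, mapping):
    """Replace every category with its integer code."""
    return [mapping[v] for v in values]

rain_today_column = ["No", "Yes", "No", "No"]
mapping = fit_label_encoder(rain_today_column)
encoded = encode(rain_today_column, mapping)
print(mapping, encoded)
```

The mapping should be fitted on the training data and reused unchanged on the test data, so the same category always gets the same code.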

5. Feature Selection

● Selected the most relevant features from the dataset and dropped irrelevant ones.

Feature selection improves model performance by:

● Reducing overfitting.
● Enhancing accuracy.
● Reducing training time.

6. Splitting the Dataset

● Training and Testing Split: Divided the dataset into 80% for training and 20% for testing.

● The model is trained on the majority of the data, while the held-out 20% gives an estimate of its performance on unseen data.
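
An 80:20 split can be sketched as follows. The function name `split_train_test`, the shuffle, and the fixed seed are assumptions made for illustration; the slides only state the ratio.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and hold out test_fraction of the rows for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test

rows = list(range(100))  # stand-in for 100 dataset rows
train, test = split_train_test(rows)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, so accuracy figures from different runs are comparable.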

MODEL SELECTION: RANDOM FOREST
Random Forest is an ensemble method that builds multiple decision trees for classification and regression and combines their predictions (a majority vote for classification, an average for regression) to improve accuracy.

● It often achieves high predictive accuracy, typically better than a single decision tree.
● It needs little data preprocessing: tree-based splits do not require normalization or scaling.
● It can remain accurate when some values are missing, provided they are imputed first or the implementation supports missing values.
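
The ensemble idea can be illustrated with a deliberately simplified forest. This is a sketch under stated assumptions, not the project's model: each "tree" is a one-split decision stump trained on a bootstrap sample with one randomly chosen feature, whereas a real Random Forest grows deeper trees and samples features at every split. The feature values (humidity, pressure) are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest training rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # feature is constant in this sample: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (feature, float("inf"), majority, majority)
    return (feature, best[1], best[2], best[3])

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # draw rows with replacement
        feature = rng.randrange(len(rows[0]))              # pick one feature at random
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], feature))
    return forest

def predict_forest(forest, row):
    """Combine the stumps' predictions by majority vote."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical (Humidity3pm, Pressure3pm) rows with RainTomorrow labels.
X = [(85, 1005), (90, 1002), (78, 1008), (88, 1004),
     (40, 1020), (35, 1022), (50, 1018), (45, 1019)]
y = ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"]

forest = fit_forest(X, y)
print(predict_forest(forest, (92, 1001)), predict_forest(forest, (30, 1025)))
```

Because every stump sees a different sample and feature, individual mistakes tend to be outvoted, which is the reason the combined prediction is usually more stable than any single tree.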

LITERATURE REVIEW AND INSPIRATION
● Handling Missing Data: Copes with datasets that contain missing values once they are imputed.
● Scalability: Trees are built independently, so training scales to large datasets.
● Visualization: Individual trees within the forest can be plotted and inspected.

Model Performance

Current Model Accuracy

● The model's current accuracy is 85%.

● This was achieved through:
  ● Data Preprocessing
  ● Model Selection
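
Accuracy is the share of test rows whose predicted label matches the true label. The sketch below shows the calculation together with confusion-matrix counts; the label lists are made up for illustration and are not the project's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def confusion_counts(y_true, y_pred, positive="Yes"):
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

# Illustrative labels only.
y_true = ["No", "No", "Yes", "No", "Yes", "No", "No", "Yes", "No", "No"]
y_pred = ["No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "No", "No"]

print(accuracy(y_true, y_pred))
print(confusion_counts(y_true, y_pred))
```

When one class is much more frequent than the other, the confusion counts show whether the rarer class is actually being predicted, which a single accuracy figure can hide.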

PRODUCT BACKLOG

Backlog ID 101
User story: As a data analyst, I want to import the dataset and perform an initial view.
Tasks:
1. Literature review.
2. Write code to import the dataset.
3. Display the dataset.

Backlog ID 102
User story: As a data scientist, I want to understand the features and their data types.
Tasks:
1. Review feature names and write descriptions.
2. Print each feature and its data type.

Backlog ID 103
User story: As a data analyst, I want to classify features as numerical or categorical.
Tasks:
1. Review the data to classify features as numerical or categorical.
2. Count and display the numerical features.
3. Count and display the categorical features.

Backlog ID 104
User story: As a data scientist, I want to identify and impute missing values in the dataset.
Tasks:
1. Identify the missing values.
2. Choose appropriate imputation methods.
3. Apply the methods to categorical and numerical values.

Backlog ID 105
User story: As a data scientist, I want to visualize features and detect outliers.
Tasks:
1. Use visualization libraries to create graphs.
2. Create a box plot to find the outliers.
3. Apply outlier detection methods and record findings.

Backlog ID 106
User story: As a data engineer, I want to handle detected outliers in the dataset.
Tasks:
1. Select an outlier handling strategy.
2. Implement the chosen strategy.
3. Check that all outliers have been removed.

Backlog ID 107
User story: As a data scientist, I want to convert categorical values into numerical values and do feature selection.
Tasks:
1. Label encoding: convert each category to a unique integer.
2. Drop irrelevant features.

Backlog ID 108
User story: As a data scientist, I want to split the dataset, so that I can prepare the data for machine learning models.
Tasks:
1. Define the split strategy.
2. Define the split ratio (80:20).
3. Perform the dataset split.

Backlog ID 109
User story: As a data scientist, I want to train and evaluate machine learning models, so that I can assess their performance and select the best model for deployment.
Tasks:
1. Select and configure different machine learning algorithms.
2. Monitor performance metrics during training.
3. Analyse the accuracy and the steps to improve it if needed.

SPRINT BURN DOWN CHART 1

Dates (DAY-1 to DAY-8): Aug-01, Aug-02, Aug-05, Aug-06, Aug-07, Aug-08, Aug-09, Aug-10
DAY-0 is the initial estimate.

BACKLOG ID   USER STORIES        INITIAL ESTIMATE   EFFORT LOGGED
101          LITERATURE REVIEW   2                  2
101          IMPORT DATASET      1                  1
101          INITIAL VIEW        1                  1
102          REVIEW FEATURES     2                  1, 1
102          LIST DATA TYPES     2                  1, 1

REMAINING EFFORT (DAY-0 to DAY-8): 8, 6, 5, 4, 3, 1, 1, 1, 0
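
The remaining-effort row is the total initial estimate minus the cumulative effort burned so far. The sketch below reproduces the row for sprint 1; the per-day burned values are derived from the differences between consecutive remaining-effort figures, and the function name is hypothetical.

```python
from itertools import accumulate

def remaining_effort(initial_estimates, burned_per_day):
    """Return remaining effort for DAY-0 onward: total estimate minus cumulative burn."""
    total = sum(initial_estimates)
    return [total] + [total - done for done in accumulate(burned_per_day)]

estimates = [2, 1, 1, 2, 2]        # initial estimates of the five sprint 1 tasks
burned = [2, 1, 1, 1, 2, 0, 0, 1]  # effort completed on DAY-1 to DAY-8

print(remaining_effort(estimates, burned))
```

The same function applies to the other sprints, since each sprint's DAY-0 figure equals the sum of its initial estimates.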

[Figure: Sprint burn down chart 1]

SPRINT BURN DOWN CHART 2

Dates (DAY-1 to DAY-7): Aug-12, Aug-13, Aug-14, Aug-15, Aug-16, Aug-17, Aug-19
DAY-0 is the initial estimate.

BACKLOG ID   USER STORIES                    INITIAL ESTIMATE   EFFORT LOGGED
103.1        Classify Features               3                  1, 1, 1
103.2        Rationale for Classifications   2                  1, 1
104.1        Identify Missing Values         3                  1, 2
104.2        Choose Imputation Methods       3                  1, 2
104.3        Apply Imputation                3                  1, 2

REMAINING EFFORT (DAY-0 to DAY-7): 14, 13, 12, 10, 8, 5, 2, 0

[Figure: Sprint burn down chart 2]

SPRINT BURN DOWN CHART 3

Dates (DAY-1 to DAY-6): Aug-20, Aug-21, Aug-22, Aug-23, Aug-26, Aug-27
DAY-0 is the initial estimate.

BACKLOG ID   USER STORIES         INITIAL ESTIMATE   EFFORT LOGGED
105.1        DATA VISUALIZATION   2                  2
105.2        OUTLIER DETECTION    2                  1, 1
105.3        RECORD FINDING       2                  1, 1
106.1        OUTLIER HANDLING     2                  2
107.1        DATA ENCODING        2                  1, 1
107.2        DOCUMENT RESULT      2                  1, 1

REMAINING EFFORT (DAY-0 to DAY-6): 12, 9, 7, 4, 3, 1, 0

[Figure: Sprint burn down chart 3]

SPRINT BURN DOWN CHART 4

Dates (DAY-1 to DAY-5): Aug-28, Aug-29, Aug-30, Aug-31, Oct-03
DAY-0 is the initial estimate.

BACKLOG ID   USER STORIES          INITIAL ESTIMATE   EFFORT LOGGED
108.1        DEFINE SPLIT          1                  1
108.2        PERFORM SPLIT         2                  2
108.3        SELECT ALGORITHM      3                  1, 2
109.1        MONITOR PERFORMANCE   3                  1, 2
109.2        ANALYZE ACCURACY      1                  1

REMAINING EFFORT (DAY-0 to DAY-5): 10, 9, 6, 4, 3, 0

[Figure: Sprint burn down chart 4]

CONCLUSION
We have completed the second interim evaluation of our Rain Prediction System project. Key accomplishments include:

Tasks Completed:

● Data Preprocessing
● Model Development

Future Steps:

● We aim to raise the model's accuracy above 90% using more advanced methods.

REFERENCES
https://youtu.be/dv2TruzOOmU?si=86mVgALYHtSzugle

https://docs.python.org/3/reference

LITERATURE REVIEW
https://tinyurl.com/4fxab5r6

https://tinyurl.com/25kr559k

THANK YOU
