# For Linear Algebra Import Numpy As NP # For Data Processing Import Pandas As PD

This document outlines steps to build a machine learning model to predict rainfall using a weather dataset. It includes:

1. Importing libraries and loading/preprocessing the weather data, which involves removing unnecessary variables, null values, and outliers.
2. Exploratory data analysis using SelectKBest to identify the top three predictor variables of rainfall as rainfall, humidity, and whether it rained the previous day.
3. Building classification models using logistic regression, random forest, decision tree, and support vector machine to predict rainfall, and evaluating model accuracy on test data. Logistic regression results in 83% accuracy with a runtime of 0.17 seconds.

Uploaded by

Dilip Kumar


Step 1: Import the required libraries

# For linear algebra

import numpy as np
# For data processing
import pandas as pd

Step 2: Load the data set

#Load the data set
df = pd.read_csv('. . . Desktop/weatherAUS.csv')
#Display the shape of the data set
print('Size of weather data frame is :',df.shape)
#Display data
print(df[0:5])

Step 3: Data Preprocessing

# Checking for null values

print(df.count().sort_values())

[5 rows x 24 columns]
Sunshine          75625
Evaporation       82670
Cloud3pm          86102
Cloud9am          89572
Pressure9am      130395
Pressure3pm      130432
WindDir9am       134894
WindGustDir      135134
WindGustSpeed    135197
Humidity3pm      140953
WindDir3pm       141232
Temp3pm          141851
RISK_MM          142193
RainTomorrow     142193
RainToday        142199
Rainfall         142199
WindSpeed3pm     142398
Humidity9am      142806
Temp9am          143693
WindSpeed9am     143693
MinTemp          143975
MaxTemp          144199
Location         145460
Date             145460
dtype: int64

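During data preprocessing it helps to remove the variables that are not significant, since unnecessary data only increases our computations. Here we drop the four columns with the fewest non-null values (Sunshine, Evaporation, Cloud3pm, Cloud9am) along with Location, RISK_MM and Date.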

df = df.drop(columns=['Sunshine','Evaporation','Cloud3pm','Cloud9am','Location','RISK_MM','Date'])
print(df.shape)

(145460, 17)

Next, we will remove all the null values in our data frame.

#Removing null values

df = df.dropna(how='any')
print(df.shape)

(112925, 17)
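
The counting and null-removal steps above can be tried on a small frame first. This is only a sketch: the toy data frame and its values are hypothetical and merely borrow a few column names from the weather data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are made up for illustration
toy = pd.DataFrame({
    "MinTemp": [13.4, np.nan, 12.9, 9.2],
    "Rainfall": [0.6, 0.0, np.nan, 1.0],
    "RainToday": ["No", "No", "No", "Yes"],
})

# count() reports the non-null entries per column, so low counts flag sparse columns
non_null = toy.count().sort_values()
print(non_null)

# how='any' drops a row as soon as one of its values is missing
cleaned = toy.dropna(how="any")
print(cleaned.shape)
```

Only the rows with no missing value in any column survive, which is why the real data frame shrinks so much after this step.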

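After removing null values, we should also check the data set for outliers. An outlier is a data
point that differs significantly from the other observations. Outliers often result from errors made
while collecting or recording the data. Here we compute the absolute z-score of each numeric value
and keep only the rows in which every z-score is below 3.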

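from scipy import stats
# Absolute z-score of every numeric column
z = np.abs(stats.zscore(df.select_dtypes(include=[np.number])))
print(z)
# Keep only the rows where all numeric values lie within 3 standard deviations
df = df[(z < 3).all(axis=1)]
print(df.shape)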

[[0.11756741 0.10822071 0.20666127 ... 1.14245477 0.08843526 0.04787026]
[0.84180219 0.20684494 0.27640495 ... 1.04184813 0.04122846 0.31776848]
[0.03761995 0.29277194 0.27640495 ... 0.91249673 0.55672435 0.15688743]
...
[1.44940294 0.23548728 0.27640495 ... 0.58223051 1.03257127 0.34701958]
[1.16159206 0.46462594 0.27640495 ... 0.25166583 0.78080166 0.58102838]
[0.77784422 0.4789471 0.27640495 ... 0.2085487 0.37167606 0.56640283]]
(107868, 17)
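
To see what the z-score filter does, here is a minimal sketch on hypothetical data. It computes the z-score with plain pandas arithmetic instead of scipy; using ddof=0 matches the default of scipy.stats.zscore. The frame and its column names "a" and "b" are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "b" holds one extreme value (100)
toy = pd.DataFrame({
    "a": np.arange(20, dtype=float),
    "b": [10.0] * 19 + [100.0],
})

# Absolute z-score per column: |x - mean| / standard deviation
z = np.abs((toy - toy.mean()) / toy.std(ddof=0))

# Keep a row only if every column lies within 3 standard deviations
filtered = toy[(z < 3).all(axis=1)]
print(toy.shape, "->", filtered.shape)
```

Note that the whole row is dropped when a single column exceeds the threshold, so the filter removes complete observations rather than individual values.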

Next, we’ll replace ‘No’ and ‘Yes’ with 0 and 1.

#Change No and Yes to 0 and 1 respectively for the RainToday and RainTomorrow variables
df['RainToday'] = df['RainToday'].map({'No': 0, 'Yes': 1})
df['RainTomorrow'] = df['RainTomorrow'].map({'No': 0, 'Yes': 1})
Normalise The Data
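
The code for this step is not present in the document. As a rough sketch only, one common way to normalise is min-max scaling, which maps every column to the range 0 to 1. This is an assumption about what the step might look like, not the author's code; the toy frame is hypothetical. Scaling to non-negative values also matters for the next step, because the chi2 scoring function does not accept negative feature values.

```python
import pandas as pd

# Hypothetical numeric columns; note the negative MinTemp value
toy = pd.DataFrame({
    "MinTemp": [-2.0, 5.0, 12.0],
    "Humidity3pm": [20.0, 60.0, 100.0],
})

# Min-max scaling: (x - min) / (max - min), applied column by column
scaled = (toy - toy.min()) / (toy.max() - toy.min())
print(scaled)
```

This arithmetic only works on numeric columns, so any remaining text columns would have to be encoded or left out before scaling. scikit-learn's MinMaxScaler performs the same transformation.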

Step 4: Exploratory Data Analysis (EDA)

Now that we’re done pre-processing the data set, it’s time to perform analysis and identify
the significant variables that will help us predict the outcome. To do this we will make use of the
SelectKBest function with the chi-squared (chi2) scoring function.

#Using SelectKBest to get the top features!

from sklearn.feature_selection import SelectKBest, chi2
X = df.loc[:,df.columns!='RainTomorrow']
y = df[['RainTomorrow']]
selector = SelectKBest(chi2, k=3)
selector.fit(X, y)
X_new = selector.transform(X)
print(X.columns[selector.get_support(indices=True)])

Index(['Rainfall', 'Humidity3pm', 'RainToday'], dtype='object')

The output gives us the three most significant predictor variables:

1. Rainfall
2. Humidity3pm
3. RainToday
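
How SelectKBest ranks features can be illustrated on a tiny example. This is a sketch under stated assumptions: the columns "signal" and "noise" and the target values are hypothetical, chosen so that one feature follows the target exactly and the other is unrelated to it.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical features: "signal" copies the target, "noise" ignores it
X_toy = pd.DataFrame({
    "signal": [0, 0, 0, 0, 1, 1, 1, 1],
    "noise":  [1, 0, 1, 0, 1, 0, 1, 0],
})
y_toy = pd.Series([0, 0, 0, 0, 1, 1, 1, 1])

# Score each feature against the target and keep the single best one
selector = SelectKBest(chi2, k=1).fit(X_toy, y_toy)
selected = X_toy.columns[selector.get_support(indices=True)]

print(dict(zip(X_toy.columns, selector.scores_)))
print(list(selected))
```

A higher chi-squared score means the feature's values differ more between the two classes, which is why the feature that tracks the target is the one kept.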

The main aim of this demo is to show how Machine Learning works. Therefore, to simplify the
computations, we will use only one of these significant variables as the input.

#The important features are put in a data frame

df = df[['Humidity3pm','Rainfall','RainToday','RainTomorrow']]

#To simplify computations we will use only one feature (Humidity3pm) to build the model

X = df[['Humidity3pm']]     # input
y = df[['RainTomorrow']]    # output
Step 5: Building a Machine Learning Model

At this step, we will build the Machine Learning model by using the training data set and
evaluate the accuracy of the model by using the testing data set.

We’ll be building classification models, by using the following algorithms:

1. Logistic Regression
2. Random Forest
3. Decision Tree
4. Support Vector Machine

Logistic Regression

#Logistic Regression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import time

#Calculating the accuracy and the time taken by the classifier
t0=time.time()
#Data splitting
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.25)
clf_logreg = LogisticRegression(random_state=0)
#Building the model using the training data set
#(ravel() turns the one-column y data frame into the 1-D array scikit-learn expects)
clf_logreg.fit(X_train,y_train.values.ravel())

#Evaluating the model using testing data set
y_pred = clf_logreg.predict(X_test)
score = accuracy_score(y_test,y_pred)

#Printing the accuracy and the time taken by the classifier
print('Accuracy using Logistic Regression:',score)
print('Time taken using Logistic Regression:' , time.time()-t0)

Accuracy using Logistic Regression: 0.8330181332740015
Time taken using Logistic Regression: 0.1741015911102295
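
The same fit, predict and score pattern applies to the other three algorithms listed at the start of this step. The following is a sketch only, not the author's code: it runs on synthetic data standing in for Humidity3pm and RainTomorrow, so its scores say nothing about the weather data. The variable names, the random seed, and the linear SVC kernel are assumptions.

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for (Humidity3pm, RainTomorrow); NOT the weather data
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(400, 1))
y_demo = (X_demo[:, 0] + rng.normal(0, 0.2, size=400) > 0.6).astype(int)

# Fixing random_state makes the split repeatable between runs
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": SVC(kernel="linear"),
}

# Fit each classifier on the training data, then score it on the test data
results = {}
for name, model in models.items():
    t0 = time.time()
    model.fit(X_train, y_train)
    score = accuracy_score(y_test, model.predict(X_test))
    results[name] = score
    print(f"{name}: accuracy={score:.3f}, time={time.time() - t0:.3f}s")
```

Because every scikit-learn classifier exposes the same fit and predict methods, swapping algorithms only requires changing the entry in the dictionary.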
