Logistic Regression

The document outlines a data mining project to predict election winners in India using state-level polling data from 2004, 2008, and 2012. It describes cleaning and normalizing the data, building logistic regression models on the training data from 2004 and 2008, and evaluating the models' accuracy on the 2012 test data, finding an accuracy of 96.77%. The conclusion is that the model performs well for predicting state winners.

Uploaded by

Harshal Kolhatkar


Data Mining Project

(Predict Election Winners)

- By Harshal Kolhatkar
Problem Statement

• An election is to be held next month. ABC Corporation, a data analytics company, wants to predict the outcome for the two largest parties in the country.

• The two major parties are BJP and Congress.

• Goal: Use polling data to predict the winner in each state.

Given Dataset

Each instance represents a state in a given election year.

• State: Name of the state

• Year: Election year (2004, 2008, 2012)

Dependent Variable

• BJP: 1 if BJP won the state, 0 if Congress won.

Independent Variables

• Times Now, India Today: Polled BJP % minus polled Congress % in each agency's poll

• DiffCount: Number of polls with BJP as winner minus number of polls with Congress as winner

• PropBJP: Number of polls with BJP as winner divided by the total number of polls
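The two count-based variables follow directly from their definitions. Below is a minimal Python sketch of that arithmetic. The function name `poll_features` and the sample margins are hypothetical, and treating a margin of exactly zero as a win for neither party is an assumption; the slides do not say how ties were handled.

```python
def poll_features(poll_margins):
    """Compute DiffCount and PropBJP from a list of poll margins.

    Each margin is polled BJP % minus polled Congress % for one poll.
    Assumption: a positive margin is a poll with BJP as winner, a negative
    margin is a poll with Congress as winner, and zero counts for neither.
    """
    if not poll_margins:
        raise ValueError("at least one poll is required")
    bjp_wins = sum(1 for m in poll_margins if m > 0)
    congress_wins = sum(1 for m in poll_margins if m < 0)
    diff_count = bjp_wins - congress_wins
    prop_bjp = bjp_wins / len(poll_margins)
    return diff_count, prop_bjp


# Hypothetical margins for one state in one election year.
margins = [4.0, -2.0, 6.5, 1.0]
diff_count, prop_bjp = poll_features(margins)
print(diff_count, prop_bjp)
```

Here three of the four polls favour BJP and one favours Congress, so DiffCount is 3 - 1 and PropBJP is 3 / 4.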

Data Cleaning
• Summary of Polling Data

Data Cleaning – Packages to Handle Missing Values

List of R Packages
1. mice (Multivariate Imputation by Chained Equations)
2. Amelia
3. missForest
4. Hmisc
5. mi
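The packages listed above are R libraries, and the slides do not say which one was applied. As a rough illustration of the cleaning step only, the Python sketch below counts missing values and fills them with the column mean. Mean imputation is a swapped-in, much simpler technique than the multiple imputation that packages such as mice or Amelia perform, and the helper names and sample rows are hypothetical.

```python
from statistics import mean


def count_missing(rows, columns):
    """Return the number of missing (None) values per column."""
    return {c: sum(1 for r in rows if r[c] is None) for c in columns}


def impute_mean(rows, column):
    """Return copies of the rows with None in `column` replaced by the column mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [dict(r, **{column: fill if r[column] is None else r[column]}) for r in rows]


# Hypothetical poll margins with gaps (None marks a missing poll).
rows = [
    {"TimesNow": 5.0, "IndiaToday": None},
    {"TimesNow": None, "IndiaToday": -3.0},
    {"TimesNow": 1.0, "IndiaToday": 7.0},
]
columns = ["TimesNow", "IndiaToday"]
missing_before = count_missing(rows, columns)
cleaned = impute_mean(impute_mean(rows, "TimesNow"), "IndiaToday")
missing_after = count_missing(cleaned, columns)
print(missing_before, missing_after)
```

Comparing the missing-value counts before and after imputation is the tabular counterpart of a before/after missingness plot.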
Data Cleaning
• Graphical Representation of Missing Values

[Figure: missing-value plots, before cleansing and after cleansing]

Data Visualization
Before Normalizing: Mean = 0.2525253, Standard Deviation = 14.27238
After Normalizing: Mean = 0.02858385, Standard Deviation = 1.026924

Data Visualization
Before Normalizing: Mean = 0.3838384, Standard Deviation = 15.45745
After Normalizing: Mean = 0.02858385, Standard Deviation = 1.026924
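The slides do not state which normalization was used. A common choice that moves the mean near 0 and the standard deviation near 1 is z-score standardization, sketched below in Python as an assumption rather than as the author's method. The function name `standardize` and the sample margins are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Z-score a list of numbers: subtract the mean, divide by the sample SD."""
    mu = mean(values)
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical poll margins (BJP % minus Congress %).
margins = [-12.0, 3.0, 8.0, -5.0, 15.0]
scaled = standardize(margins)
print(round(mean(scaled), 6), round(stdev(scaled), 6))
```

A variable standardized against its own mean and standard deviation comes out with mean 0 and standard deviation 1 up to rounding. The reported figures above differ slightly from those values, and the slides do not explain why.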

Data Visualization – Checking Normality (Times Now)
[Figure: distribution before normalizing and after normalizing]

Data Visualization – Checking Normality (India Today)
[Figure: distribution before normalizing and after normalizing]
Data Modeling

• Collinearity is a linear association between two explanatory variables.

• Two variables are perfectly collinear if there is an exact linear relationship between them.
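One simple way to screen a pair of explanatory variables for collinearity is the Pearson correlation coefficient, which equals +1 or -1 exactly when one variable is an exact linear function of the other. The Python sketch below shows that check. The slides do not say which diagnostic was used, so this is an illustrative assumption, and the data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical values: y is an exact linear function of x (perfect collinearity),
# while z is only loosely related to x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2 * v + 1 for v in x]
z = [2.0, 1.0, 4.0, 3.0]
print(pearson(x, y), pearson(x, z))
```

When two predictors correlate very strongly, a usual remedy is to keep only one of them in the model.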
Data Modeling (Using Train & Test)

Years: 2004, 2008, 2012

Train: 2004, 2008

Test: 2012
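Splitting by election year, rather than at random, keeps the most recent election fully unseen during training. A minimal Python sketch of that split follows. The field names mirror the dataset description, while the records and the helper name `split_by_year` are hypothetical.

```python
TRAIN_YEARS = {2004, 2008}
TEST_YEARS = {2012}


def split_by_year(rows):
    """Split records into train and test lists using the Year field."""
    train = [r for r in rows if r["Year"] in TRAIN_YEARS]
    test = [r for r in rows if r["Year"] in TEST_YEARS]
    return train, test


# Hypothetical records; real rows would also carry the poll variables.
rows = [
    {"State": "A", "Year": 2004, "BJP": 1},
    {"State": "A", "Year": 2008, "BJP": 0},
    {"State": "A", "Year": 2012, "BJP": 1},
    {"State": "B", "Year": 2012, "BJP": 0},
]
train, test = split_by_year(rows)
print(len(train), len(test))
```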
Data Model (Logistic Regression)
[Figure: model summaries for two candidate models, one with India Today + PropBJP and one with PropBJP only]

Data Model (Logistic Regression)
• Training data (2004, 2008): accuracy = 94.11%
• Test data (2012): accuracy = 96.77%
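The slides do not show the fitting code. As a rough sketch of what a single-predictor logistic regression on PropBJP and its accuracy calculation involve, the Python below fits the model by plain batch gradient descent. That is a swapped-in technique for illustration, since statistical packages typically use iteratively reweighted least squares. The data, learning rate, epoch count and 0.5 threshold are all assumptions.

```python
from math import exp


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss w.r.t. the logit
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def accuracy(xs, ys, b0, b1, threshold=0.5):
    """Share of cases where the thresholded probability matches the label."""
    preds = [1 if sigmoid(b0 + b1 * x) >= threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


# Hypothetical PropBJP values and outcomes (1 = BJP won, 0 = Congress won).
train_x = [0.0, 0.1, 0.3, 0.4, 0.6, 0.7, 0.9, 1.0]
train_y = [0, 0, 0, 1, 0, 1, 1, 1]
test_x = [0.2, 0.8, 0.25, 0.95]
test_y = [0, 1, 0, 1]

b0, b1 = fit_logistic(train_x, train_y)
train_acc = accuracy(train_x, train_y, b0, b1)
test_acc = accuracy(test_x, test_y, b0, b1)
print(train_acc, test_acc)
```

Accuracy here is simply the number of correctly predicted states divided by the total number of states, computed separately for the training years and the test year.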
Conclusion

• The model performs well on the held-out 2012 data, with 96.77% accuracy.

• It can therefore be used to predict state winners.
