IIMT2641
Introduction to Business Analytics

Tutorial 04 - Logistic Regression


United States Presidential Elections


• A president is elected every four years
• Generally, only two competitive candidates
– Republican
– Democratic

The University of Hong Kong


The Electoral College


• The United States has 50 states
• Each state is assigned a number of electoral votes based on population
– Most votes: 55 (California)
– Fewest votes: 3 (multiple states)
– Reassigned periodically based on population change
• Winner takes all: the candidate with the most votes in a state gets all of its electoral votes
• The candidate with a majority of electoral votes (at least 270 of 538) wins the election
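The winner-takes-all rule can be sketched in a few lines of Python (the course itself uses R). The state names and electoral-vote counts below are hypothetical placeholders, not real allocations, and the sketch does not model exceptions to winner-takes-all.

```python
def tally_electoral_votes(state_winners, electoral_votes):
    """Winner takes all: each state's electoral votes go entirely to the
    candidate who won that state."""
    totals = {}
    for state, winner in state_winners.items():
        totals[winner] = totals.get(winner, 0) + electoral_votes[state]
    return totals


def election_winner(totals):
    """Return the candidate holding a majority of the electoral votes counted,
    or None if nobody does (e.g. an exact tie)."""
    total = sum(totals.values())
    for candidate, votes in totals.items():
        if votes > total / 2:
            return candidate
    return None


# Hypothetical example: four made-up states with made-up electoral-vote counts.
electoral_votes = {"StateA": 55, "StateB": 20, "StateC": 3, "StateD": 10}
state_winners = {"StateA": "Democratic", "StateB": "Republican",
                 "StateC": "Republican", "StateD": "Republican"}

totals = tally_electoral_votes(state_winners, electoral_votes)
print(totals)                   # {'Democratic': 55, 'Republican': 33}
print(election_winner(totals))  # Democratic
```

Carrying more states is not enough here: one large state outweighs three smaller ones, which is why the prediction task is framed state by state.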

2000 Election: Bush vs. Gore

[Figure: 2000 U.S. election map (public domain; source: Wikimedia Commons)]

Election Prediction
• Goal: Use polling data to predict state winners
• Then-New York Times columnist Nate Silver famously took on this task for the 2012 election

The Dataset
• Data from RealClearPolitics.com
• Each observation represents a state in a given election
– State: Name of the state
– Year: Election year (2004, 2008, or 2012)
• Dependent variable
– Republican: 1 if the Republican won the state, 0 if the Democrat won
• Independent variables
– Rasmussen, SurveyUSA: Each pollster's polled R% minus polled D%
– DiffCount: Number of polls with a Republican winner minus number of polls with a Democratic winner
– PropR: Number of polls with a Republican winner divided by the total number of polls
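The poll-based variables can be computed from raw poll numbers roughly as below. This is a Python sketch, not the RealClearPolitics processing; the poll percentages are hypothetical, and counting a tied poll (margin of 0) as a win for neither side is an assumption.

```python
def poll_margin(r_pct, d_pct):
    """Rasmussen / SurveyUSA-style value: polled R% minus polled D%."""
    return r_pct - d_pct


def diff_count(margins):
    """Polls with a Republican winner minus polls with a Democratic winner.
    Assumption: a tied poll (margin == 0) counts for neither side."""
    r_wins = sum(1 for m in margins if m > 0)
    d_wins = sum(1 for m in margins if m < 0)
    return r_wins - d_wins


def prop_r(margins):
    """Share of all polls that have a Republican winner."""
    if not margins:
        raise ValueError("need at least one poll")
    return sum(1 for m in margins if m > 0) / len(margins)


# Hypothetical (R%, D%) results from four polls in one state and year.
polls = [(48, 45), (44, 47), (50, 42), (46, 46)]
margins = [poll_margin(r, d) for r, d in polls]
print(margins)              # [3, -3, 8, 0]
print(diff_count(margins))  # 1
print(prop_r(margins))      # 0.5
```

A positive margin or DiffCount points toward the Republican; a negative one points toward the Democrat.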

Simple Approaches to Missing Data


• Delete the observations with missing values
– We would be throwing away more than 50% of the data
– We want to predict for all states
• Delete the variables with missing values
– We want to retain the data from Rasmussen/SurveyUSA
• Fill missing data points with average values
– The average value for a poll will be close to 0 (a tie between Democrat and Republican)
– If other polls in a state favor one candidate, the missing one probably would have, too
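A toy example makes these trade-offs concrete. The states and poll values below are hypothetical (None marks a missing poll); the sketch shows how much deleting observations discards and how mean imputation pulls a missing poll toward a tie.

```python
from statistics import mean

# Hypothetical rows: (State, Rasmussen, SurveyUSA); None marks a missing poll.
rows = [
    ("StateA", 20.0, 18.0),
    ("StateB", -15.0, -17.0),
    ("StateC", 25.0, None),
    ("StateD", -10.0, None),
    ("StateE", None, 2.0),
]


def complete_cases(rows):
    """Delete observations: keep only rows with no missing values."""
    return [row for row in rows if None not in row]


def mean_impute_surveyusa(rows):
    """Fill each missing SurveyUSA value with the column average."""
    avg = mean(sur for _, _, sur in rows if sur is not None)
    return [(state, ras, avg if sur is None else sur) for state, ras, sur in rows]


print(len(complete_cases(rows)), "of", len(rows), "rows kept")  # 2 of 5 rows kept
for row in mean_impute_surveyusa(rows):
    print(row)
```

StateC's Rasmussen poll shows a 25-point Republican lead, yet its filled-in SurveyUSA value is 1.0, essentially a tie; StateD's strongly Democratic Rasmussen value gets the same 1.0.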

Multiple Imputation
• Fill in missing values based on the non-missing values
– If Rasmussen is very negative (a strong Democratic lead), then a missing SurveyUSA value will likely be negative
– Just like sample.split, results will differ between runs unless you fix the random seed
• Although the method is complicated, we can use it easily through R’s libraries
• We will use the Multiple Imputation by Chained Equations (mice) package
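The core idea (predict a missing poll from the polls that are present, then add random noise) can be sketched for a single variable as below. This is a simplified stand-in written in Python, not the mice algorithm: mice cycles through every incomplete variable with its own model (the "chained equations") and repeats the process to build several completed datasets. The function names and poll values here are hypothetical.

```python
import random
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def impute_surveyusa(rasmussen, surveyusa, seed):
    """Fill missing SurveyUSA values from Rasmussen: fitted line plus random noise.
    Rows where Rasmussen is also missing are left as None in this sketch."""
    rng = random.Random(seed)  # fixed seed -> reproducible imputations
    pairs = [(x, y) for x, y in zip(rasmussen, surveyusa)
             if x is not None and y is not None]
    a, b = fit_line([x for x, _ in pairs], [y for _, y in pairs])
    noise_sd = stdev(y - (a + b * x) for x, y in pairs)  # spread of residuals
    filled = []
    for x, y in zip(rasmussen, surveyusa):
        if y is None and x is not None:
            y = a + b * x + rng.gauss(0.0, noise_sd)
        filled.append(y)
    return filled


# Hypothetical poll margins (R% minus D%); None marks a missing poll.
rasmussen = [20.0, -15.0, 8.0, -5.0, 25.0, -10.0]
surveyusa = [18.0, -17.0, 6.0, -3.0, None, None]

# "Multiple" imputation: several completed versions, one per seed.
imputations = [impute_surveyusa(rasmussen, surveyusa, seed=s) for s in range(3)]
for version in imputations:
    print([round(v, 1) for v in version])
```

The imputed SurveyUSA values follow the sign of Rasmussen (positive in the fifth row, negative in the sixth). Reusing a seed reproduces the same values, while a different seed gives slightly different ones.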
