0% found this document useful (0 votes)
7 views8 pages

Tutorial04_Logistic Regression

Uploaded by

1135399568
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views8 pages

Tutorial04_Logistic Regression

Uploaded by

1135399568
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

IIMT2641

Introduction to Business Analytics

Introduction to Business Analytics


IIMT2641

Tutorial 04 - Logistic Regression


IIMT2641
Introduction to Business Analytics

United States Presidential Elections


• A president is elected every four years
• Generally, only two competitive candidates
– Republican
– Democratic

Official presidential photos are in the public domain

The University of Hong Kong 2


IIMT2641
Introduction to Business Analytics

The Electoral College


• The United States have 50 states
• Each assigned a number of electoral votes based
on population
– Most votes: 55 (California)
– Least votes: 3 (multiple states)
– Reassigned periodically based on population change
• Winner takes all: candidate with the most votes in a
state gets all its electoral votes
• Candidate with most electoral votes wins election

The University of Hong Kong 3


IIMT2641
Introduction to Business Analytics

2000 Election: Bush vs. Gore

2000 U.S. election map is in the public domain. Source: Wikimedia Commons.

The University of Hong Kong 4


IIMT2641
Introduction to Business Analytics

Election Prediction
• Goal: Use polling data
to predict state winners

• Then-New York Times


columnist Nate Silver
famously took on this
task for the 2012
election

The University of Hong Kong 5


IIMT2641
Introduction to Business Analytics

The Dataset
• Data from RealClearPolitics.com
• Instances represent a state in a given election
– State: Name of state
– Year: Election year (2004, 2008, 2012)
• Dependent variable
– Republican: 1 if Republican won state, 0 if Democrat
won
• Independent variables
– Rasmussen, SurveyUSA: Polled R% - Polled D%
– DiffCount: Polls with R winner – Polls with D winner
– PropR: Polls with R winner / # polls

The University of Hong Kong 6


IIMT2641
Introduction to Business Analytics

Simple Approaches to Missing Data


• Delete the missing observations
– We would be throwing away more than 50% of the
data
– We want to predict for all states
• Delete variables with missing values
– We want to retain data from Rasmussen/SurveyUSA
• Fill missing data points with average values
– The average value for a poll will be close to 0 (tie
between Democrat and Republican)
– If other polls in a state favor one candidate, the
missing one probably would have, too

The University of Hong Kong 8


IIMT2641
Introduction to Business Analytics

Multiple Imputation
• Fill in missing values based on non-missing values
– If Rasmussen is very negative, then a missing
SurveyUSA value will likely be negative
– Just like sample.split, results will differ between runs
unless you fix the random seed
• Although the method is complicated, we can use it
easily through R’s libraries
• We will use Multiple Imputation by Chained
Equations (mice) package

The University of Hong Kong 9

You might also like