Tutorial04_Logistic Regression
2000 U.S. election map is in the public domain. Source: Wikimedia Commons.
Election Prediction
• Goal: Use polling data to predict state winners
The Dataset
• Data from RealClearPolitics.com
• Each instance represents one state in a given election year
– State: Name of state
– Year: Election year (2004, 2008, 2012)
• Dependent variable
– Republican: 1 if Republican won state, 0 if Democrat won
• Independent variables
– Rasmussen, SurveyUSA: Polled R% - Polled D%
– DiffCount: Polls with R winner - Polls with D winner
– PropR: Polls with R winner / total number of polls
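To make the variable definitions above concrete, here is a minimal R sketch that computes the poll margin, DiffCount, and PropR for a single state. The poll numbers, the `polls` data frame, and its column names are hypothetical and not taken from the RealClearPolitics data. The sketch also assumes that a tied poll counts for neither party but still counts toward the total number of polls.

```r
# Hypothetical polls for one state in one election year (made-up numbers).
polls <- data.frame(
  pollster = c("Rasmussen", "SurveyUSA", "Other A", "Other B"),
  r_pct    = c(52, 49, 47, 51),  # polled Republican share (%)
  d_pct    = c(45, 47, 50, 46)   # polled Democratic share (%)
)

# Margin as defined on the slide: Polled R% - Polled D%.
polls$margin <- polls$r_pct - polls$d_pct

# Count the polls that each party leads (assumption: ties count for neither).
r_wins <- sum(polls$margin > 0)
d_wins <- sum(polls$margin < 0)

# DiffCount: polls with R winner minus polls with D winner.
diff_count <- r_wins - d_wins

# PropR: share of all polls that have the Republican ahead.
prop_r <- r_wins / nrow(polls)

print(polls)
print(c(DiffCount = diff_count, PropR = prop_r))
```

In the actual dataset, Rasmussen and SurveyUSA each hold one pollster's margin, while DiffCount and PropR summarize the polls taken in the state.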
Multiple Imputation
• Fill in missing values based on non-missing values
– If Rasmussen is very negative, then a missing SurveyUSA value will likely be negative
– Just like sample.split, results will differ between runs unless you fix the random seed
• Although the method is complicated, we can use it easily through R’s libraries
• We will use the Multiple Imputation by Chained Equations (mice) package
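The workflow described above might look roughly like the following R sketch. It assumes the mice package is installed and runs on a small synthetic data frame that merely stands in for the four polling variables. The data, the helper `impute_once`, and the seed value 144 are all illustrative choices, not part of the original material. It fills in the missing values and then checks that fixing the seed makes two runs agree.

```r
library(mice)  # assumed to be installed: install.packages("mice")

# Synthetic stand-in for the polling variables (NOT the real data).
set.seed(1)
n <- 40
rasmussen <- round(rnorm(n, mean = 0, sd = 10))
polls <- data.frame(
  Rasmussen = rasmussen,
  SurveyUSA = round(rasmussen + rnorm(n, sd = 3)),
  DiffCount = round(rasmussen / 3 + rnorm(n, sd = 2)),
  PropR     = round(pnorm(rasmussen / 10 + rnorm(n, sd = 0.3)), 2)
)

# Knock out a few values so there is something to impute.
polls$Rasmussen[c(3, 17, 29)] <- NA
polls$SurveyUSA[c(5, 11, 23, 36)] <- NA

# Hypothetical helper: fix the seed, run mice, return one completed data set.
impute_once <- function(data, seed) {
  set.seed(seed)
  complete(mice(data, printFlag = FALSE))
}

imputed_a <- impute_once(polls, seed = 144)
imputed_b <- impute_once(polls, seed = 144)

# Number of missing values before and after imputation.
print(c(before = sum(is.na(polls)), after = sum(is.na(imputed_a))))

# With the same seed, both runs should produce the same imputed values.
print(identical(imputed_a, imputed_b))
```

Only the four numeric variables are passed to mice here. Identifier columns such as State and Year, and the outcome Republican, would typically be left out of the imputation step and joined back afterwards.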