Election Forecasting: 15.071 - The Analytics Edge

This document discusses election forecasting and predicting winners before votes are cast. It covers key aspects of US presidential elections, including how the electoral college system works. It then discusses Nate Silver's efforts to predict state winners in 2012 using polling data. The dataset used contains polling data from various sources for states in recent elections. Methods discussed to handle missing data in the dataset include listwise deletion, variable deletion, mean imputation, and multiple imputation using chained equations.

Uploaded by

Aman Gabbar
ELECTION FORECASTING

Predicting the Winner Before any Votes are Cast

15.071 – The Analytics Edge


United States Presidential Elections
•  A president is elected every four years
•  Generally, only two competitive candidates
•  Republican
•  Democratic

15.071x –Election Forecasting: Predicting the Winner Before any Votes are Cast 1
The Electoral College
•  The United States has 50 states
•  Each is assigned a number of electoral votes based on population
•  Most votes: 55 (California)
•  Fewest votes: 3 (multiple states)
•  Reassigned periodically based on population change
•  Winner takes all: the candidate with the most votes in a state gets all its electoral votes
•  The candidate with the most electoral votes wins the election
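
The winner-take-all rule can be sketched in a few lines of Python. This is only an illustration: the state names, electoral vote counts, and popular vote counts below are made up, and ties are ignored. It shows how a candidate can win the election while losing the overall popular vote.

```python
def electoral_tally(electoral_votes, popular_votes):
    """Give each state's electoral votes to its popular-vote leader."""
    totals = {}
    for state, votes_by_party in popular_votes.items():
        # Winner takes all: the party with the most votes in the state.
        state_winner = max(votes_by_party, key=votes_by_party.get)
        totals[state_winner] = totals.get(state_winner, 0) + electoral_votes[state]
    return totals


# Hypothetical states and vote counts (not real election data).
electoral_votes = {"A": 10, "B": 6, "C": 6}
popular_votes = {
    "A": {"Democratic": 900, "Republican": 100},
    "B": {"Democratic": 45, "Republican": 55},
    "C": {"Democratic": 48, "Republican": 52},
}

totals = electoral_tally(electoral_votes, popular_votes)
overall_winner = max(totals, key=totals.get)

# Nationwide popular vote, summed over states, for comparison.
popular_total = {}
for votes_by_party in popular_votes.values():
    for party, count in votes_by_party.items():
        popular_total[party] = popular_total.get(party, 0) + count

print(totals, overall_winner, popular_total)
```

In this toy example the Republican candidate narrowly carries two small states and wins the electoral vote, even though the Democratic candidate has far more votes nationwide.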
2000 Election: Bush vs. Gore

Election Prediction
•  Goal: Use polling data to predict state winners
•  Then-New York Times columnist Nate Silver famously took on this task for the 2012 election

The Dataset
•  Data from RealClearPolitics.com
•  Each instance represents a state in a given election
•  State: Name of state
•  Year: Election year (2004, 2008, 2012)
•  Dependent variable
•  Republican: 1 if Republican won state, 0 if Democrat won
•  Independent variables
•  Rasmussen, SurveyUSA: Polled R% - Polled D%
•  DiffCount: Polls with R winner – Polls with D winner
•  PropR: Polls with R winner / # polls
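
As a rough illustration of how DiffCount and PropR follow from the definitions above, the Python sketch below computes both from a list of poll margins (R% minus D%) for one state in one election. The margins are invented, and treating an exactly tied poll as a win for neither side is an assumption, not something the slides specify.

```python
def poll_summary(margins):
    """Summarize polls for one state; each margin is polled R% minus polled D%."""
    r_wins = sum(1 for m in margins if m > 0)  # polls with a Republican lead
    d_wins = sum(1 for m in margins if m < 0)  # polls with a Democratic lead
    diff_count = r_wins - d_wins               # DiffCount
    prop_r = r_wins / len(margins)             # PropR
    return diff_count, prop_r


# Hypothetical margins from four polls of the same state.
margins = [5, 3, -1, 7]
diff_count, prop_r = poll_summary(margins)
print(diff_count, prop_r)
```

Three of the four polls favor the Republican, so DiffCount is 3 - 1 = 2 and PropR is 3 / 4 = 0.75.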

Simple Approaches to Missing Data
•  Delete observations with missing values
•  We would be throwing away more than 50% of the data
•  We want to predict for all states
•  Delete variables with missing values
•  We want to retain data from Rasmussen/SurveyUSA
•  Fill missing data points with average values
•  The average value for a poll will be close to 0 (a tie between Democrat and Republican)
•  If other polls in a state favor one candidate, the missing one probably would have, too
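
The trade-offs above can be seen on a toy table. The Python sketch below uses made-up rows with the same column names as the dataset (the values are hypothetical) and applies two of the simple approaches: deleting incomplete rows and filling with the column mean.

```python
from statistics import mean

# Hypothetical rows; None marks a missing poll value.
rows = [
    {"State": "A", "Rasmussen": -12, "SurveyUSA": None},
    {"State": "B", "Rasmussen": 9, "SurveyUSA": 11},
    {"State": "C", "Rasmussen": None, "SurveyUSA": -8},
    {"State": "D", "Rasmussen": 10, "SurveyUSA": None},
]
poll_columns = ["Rasmussen", "SurveyUSA"]

# Approach 1: delete every observation that has a missing value.
complete_rows = [r for r in rows if all(r[c] is not None for c in poll_columns)]
kept_fraction = len(complete_rows) / len(rows)


# Approach 3: fill missing values with the mean of the observed values.
def mean_impute(rows, column):
    observed = [r[column] for r in rows if r[column] is not None]
    fill_value = mean(observed)
    return [
        {**r, column: fill_value if r[column] is None else r[column]}
        for r in rows
    ]


filled = mean_impute(rows, "SurveyUSA")
print(kept_fraction, filled[0])
```

Deleting incomplete rows keeps only one of the four states here. Mean imputation gives state A a SurveyUSA value slightly above 0 even though its Rasmussen poll is strongly Democratic, which is the weakness the last two bullets describe.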

Multiple Imputation
•  Fill in missing values based on non-missing values
•  If Rasmussen is very negative, then a missing SurveyUSA value will likely be negative
•  Just like sample.split, results will differ between runs unless you fix the random seed
•  Although the method is complicated, we can use it easily through R's libraries
•  We will use the Multiple Imputation by Chained Equations (mice) package
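
The core idea can be sketched in Python with a single regression step. This is a deliberately simplified stand-in, not the mice algorithm: it fills a missing SurveyUSA value from Rasmussen using a fitted line plus random noise, whereas real multiple imputation by chained equations cycles through all variables and produces several completed datasets. The data, the helper names, and the noise level are all assumptions made for illustration.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def impute_from(rows, target, predictor, noise_sd, rng):
    """Fill missing target values from the predictor, adding random noise."""
    pairs = [
        (r[predictor], r[target])
        for r in rows
        if r[predictor] is not None and r[target] is not None
    ]
    slope, intercept = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    completed = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left unchanged
        if r[target] is None and r[predictor] is not None:
            r[target] = intercept + slope * r[predictor] + rng.gauss(0, noise_sd)
        completed.append(r)
    return completed


# Hypothetical poll margins; the last row is missing SurveyUSA.
rows = [
    {"Rasmussen": -10, "SurveyUSA": -12},
    {"Rasmussen": -4, "SurveyUSA": -3},
    {"Rasmussen": 5, "SurveyUSA": 6},
    {"Rasmussen": 12, "SurveyUSA": 10},
    {"Rasmussen": -15, "SurveyUSA": None},
]

# The same seed reproduces the imputation; a different seed changes it.
run_a = impute_from(rows, "SurveyUSA", "Rasmussen", 1.0, random.Random(144))
run_b = impute_from(rows, "SurveyUSA", "Rasmussen", 1.0, random.Random(144))
run_c = impute_from(rows, "SurveyUSA", "Rasmussen", 1.0, random.Random(7))
print(run_a[-1], run_c[-1])
```

Because Rasmussen is very negative in the last row, the imputed SurveyUSA value is also negative. The two runs with the same seed agree exactly, while the run with a different seed gives a slightly different value.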
