R Tutorial Slides
Capital One Data Mining Cup
UW Statistics Club
Saturday, March 23, 2013
Have the statistical background but lack the (R) modelling expertise
Have never taken a linear regression course (or simply forgot the one they took!)
Walkthrough example of a statistical prediction problem using Kaggle test data (Titanic problem)
The goal is to predict who will survive given different factors such as:
Age
Ticket Fare
Sex
Cabin
Number of family aboard
R Basics
Opening R (RStudio)
Navigating to the working directory
Running commands
Installing packages
Loading packages
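A minimal sketch of the basics listed above, typed at the R console. The folder (tempdir()) and the package (MASS) are stand-ins chosen for illustration, not the ones used in the tutorial; substitute your own project folder and packages.

```r
# Show the current working directory, then move to another one.
# tempdir() is only a placeholder; point this at your own project folder.
old_wd <- getwd()
setwd(tempdir())
print(getwd())

# Running a command: type it and press Enter (or Ctrl+Enter in RStudio).
x <- c(1, 2, 3)
print(mean(x))

# Install a package only if it is missing (needs an internet connection).
# "MASS" is an arbitrary example package.
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS", repos = "https://cloud.r-project.org")
}

# Load the package so its functions and data sets become available.
library(MASS)

# Go back to the directory we started in.
setwd(old_wd)
```

Installing is done once per machine; loading with library() is needed in every new R session.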
8. Interpret results
9. Challenge results
10. Synthesize/write up results
11. Create reproducible code
Fix variable names
Merge data sets
Fix missing content
Fix inconsistent data
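The four cleaning steps above can be sketched on a small made-up data set. The data frames, column names, and the choice of median imputation are all assumptions for illustration; the real Titanic files and the tutorial's own code may differ.

```r
# Toy data (invented): messy column names, one missing age,
# and inconsistent coding of sex.
passengers <- data.frame(
  `Passenger Id` = c(1, 2, 3, 4),
  `SEX`          = c("male", "Female", "F", "M"),
  `Age`          = c(22, NA, 26, 35),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
tickets <- data.frame(
  id   = c(1, 2, 3, 4),
  fare = c(7.25, 71.28, 7.93, 53.10)
)

# 1. Fix variable names: short, lower case, no spaces.
names(passengers) <- c("id", "sex", "age")

# 2. Merge data sets on the shared key column.
titanic <- merge(passengers, tickets, by = "id")

# 3. Fix missing content: replace missing ages with the median age
#    (one simple option among many).
titanic$age[is.na(titanic$age)] <- median(titanic$age, na.rm = TRUE)

# 4. Fix inconsistent data: map every spelling of sex to two levels.
first_letter <- tolower(substr(titanic$sex, 1, 1))
titanic$sex <- factor(ifelse(first_letter == "m", "male", "female"))

print(titanic)
```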
Make use of
Leave-one-out cross-validation
Easy to implement in R
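A minimal sketch of leave-one-out in R, assuming a linear regression fitted with lm(). The built-in mtcars data and the formula mpg ~ wt + hp are stand-ins chosen so the example runs anywhere; they are not the tutorial's Titanic model.

```r
# Leave-one-out: refit the model n times, each time holding out one row,
# and predict the held-out row.
model_formula <- mpg ~ wt + hp   # illustrative model, not from the slides
n <- nrow(mtcars)
loo_errors <- numeric(n)

for (i in seq_len(n)) {
  fit_i <- lm(model_formula, data = mtcars[-i, ])              # train without row i
  pred_i <- predict(fit_i, newdata = mtcars[i, , drop = FALSE]) # predict row i
  loo_errors[i] <- mtcars$mpg[i] - pred_i
}

# Average squared prediction error over all held-out rows.
loo_mse <- mean(loo_errors^2)
print(loo_mse)
```

A lower leave-one-out error suggests a model that predicts unseen rows better, so the same loop can be rerun with different formulas to compare candidate models.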
Examine Residuals plot
Examine Q-Q plot
Use the Model Testing process to pick a proper model
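The two diagnostic plots above are available directly from a fitted lm object. This is a sketch on the built-in mtcars data with an illustrative formula; the nested-model F-test at the end is one possible way to compare models and is an assumption, not necessarily the Model Testing process the slides refer to.

```r
# Fit an illustrative linear model (stand-in data and formula).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Residuals vs fitted values: look for curvature or a funnel shape.
plot(fit, which = 1)

# Normal Q-Q plot of the standardized residuals: points should lie
# close to the straight line if the normality assumption is reasonable.
plot(fit, which = 2)

# One way to compare candidates: F-test between nested models.
fit_small <- lm(mpg ~ wt, data = mtcars)
comparison <- anova(fit_small, fit)
print(comparison)
```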
Consequence: the standard errors of the coefficient estimates blow up
Use R to compute the correlation between all predictors.
If any pair of predictors has a correlation above 0.90 to 0.95, then either:
What Next?
Data Transformations
Check for multicollinearity
Different Types of Models (not covered here but check the R Code!)
Ensemble Methods
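For the multicollinearity check listed above, a minimal sketch of computing the correlation between all predictors and flagging highly correlated pairs. The mtcars columns and the 0.90 cutoff are illustrative assumptions; use your own predictors and whichever threshold in the 0.90 to 0.95 range you prefer.

```r
# Numeric predictors only (stand-in columns from a built-in data set).
predictors <- mtcars[, c("cyl", "disp", "hp", "wt", "qsec")]

# Pairwise correlation matrix.
cor_matrix <- cor(predictors)
print(round(cor_matrix, 2))

# Flag pairs whose absolute correlation exceeds the threshold.
# upper.tri() keeps each pair once and skips the diagonal.
threshold <- 0.90
high <- which(abs(cor_matrix) > threshold & upper.tri(cor_matrix),
              arr.ind = TRUE)

flagged <- data.frame(
  var1        = rownames(cor_matrix)[high[, "row"]],
  var2        = colnames(cor_matrix)[high[, "col"]],
  correlation = cor_matrix[high]
)
print(flagged)
```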