Linear Regression Review
What we will cover
Become familiar with a data set
• What variables are in it?
• Where did it come from?
• Any initial patterns or relationships?
Data pre-processing
• What to do with missing info
• Transform variables
• Get it ready for analysis
Dimension Reduction
• Perform PCA to reduce # of variables in data set
Cluster Analysis
• Generate useful segments within the data
Predictive Models
• Linear regression (covered in STAT202)
• Logistic Regression
• Decision Trees
Evaluate Model
• Training and test sets
• Performance metrics
Interpret
• Draw conclusions
• Provide recommendations
Linear Regression
Using the equation for a line to estimate/predict a target variable
Simple Linear Regression: using a single predictor variable
Multiple Linear Regression: using a set of predictor variables
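For reference, the two model forms can be written out as below. The symbols are assumptions, since the slides do not fix a notation: y is the target, x (or x_1, ..., x_p) are the predictors, the betas are the coefficients, and epsilon is the error term.

```latex
% Simple linear regression: one predictor
y = \beta_0 + \beta_1 x + \varepsilon

% Multiple linear regression: p predictors
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```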
Assumptions about the error terms:
• Have a mean of zero
• Have constant variance
• Are independent of each other
• Are normally distributed
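The four assumptions above can be checked roughly from the residuals of a fitted model. The sketch below uses NumPy only and synthetic data made up for illustration (it is not one of the course data sets). The specific checks, a split-half variance comparison, lag-1 autocorrelation, and skewness/kurtosis, are one informal choice among many, not a prescribed procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, made up for illustration: y = 2 + 3x + noise
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n)

# Design matrix with an intercept column, then a least-squares fit
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ beta
residuals = y - fitted

# Mean of zero: with an intercept in the model this holds by construction
resid_mean = residuals.mean()

# Constant variance: compare residual spread for low vs. high fitted values
order = np.argsort(fitted)
low_var = residuals[order[: n // 2]].var(ddof=1)
high_var = residuals[order[n // 2:]].var(ddof=1)

# Independence: lag-1 correlation of residuals in observation order
lag1_corr = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

# Normality: skewness and excess kurtosis should both be near 0
z = (residuals - resid_mean) / residuals.std()
skewness = np.mean(z ** 3)
excess_kurtosis = np.mean(z ** 4) - 3.0

print("intercept, slope:", beta)
print("residual mean:", resid_mean)
print("variance (low half, high half):", low_var, high_var)
print("lag-1 correlation:", lag1_corr)
print("skewness, excess kurtosis:", skewness, excess_kurtosis)
```

In practice these numbers are usually read alongside a residuals-vs-fitted plot and a normal Q-Q plot rather than on their own.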
Linear Regression
Tests/Statistics
Each coefficient is being tested against the following hypotheses:
H0: coefficient is equal to 0
H1: coefficient is not equal to 0
Adjusted R² is a measure of fit that is adjusted for the number of variables and observations used in the model
• A penalty is applied for each additional predictor, so adding a variable raises adjusted R² only if it improves the fit enough
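A minimal sketch of how the coefficient t-statistics and adjusted R² are computed, again with NumPy and synthetic data made up for illustration. The variable names (x1, x2) are hypothetical: x1 is built to be related to y, while x2 is pure noise whose true coefficient is 0.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data, made up for illustration
n = 200
x1 = rng.normal(size=n)  # related to y (true coefficient 3)
x2 = rng.normal(size=n)  # unrelated to y (true coefficient 0)
y = 1.0 + 3.0 * x1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
p = X.shape[1] - 1  # number of predictors, excluding the intercept

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Residual variance estimate with n - p - 1 degrees of freedom
dof = n - p - 1
ss_res = residuals @ residuals
sigma2 = ss_res / dof

# Standard errors: square roots of the diagonal of sigma^2 * (X'X)^-1
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# t-statistic for H0: coefficient = 0 vs. H1: coefficient != 0
t_stats = beta / std_err

# R^2 and adjusted R^2
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof

for name, b, t in zip(["intercept", "x1", "x2"], beta, t_stats):
    print(f"{name}: coef={b:.3f}, t={t:.2f}")
print(f"R^2={r2:.4f}, adjusted R^2={adj_r2:.4f}")
```

Each t-statistic is compared against a t distribution with n - p - 1 degrees of freedom to get a p-value. For a sample this size, an absolute value above roughly 2 corresponds to rejecting H0 at about the 5% level.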
Example of Simple Linear Regression
Baseball data set: estimating home runs with batting average
Example of Multiple Linear Regression
House price data set: estimating house price with a set of predictor variables