Linear Regression Review

Uploaded by

medhavipandit3
Intro to Data Mining for Business

STAT331
LINEAR REGRESSION REVIEW
What we will cover
Become familiar with a data set
• What variables are in it?
• Where did it come from?
• Any initial patterns or relationships?

Data pre-processing
• What to do with missing info
• Transform variables
• Get it ready for analysis

Dimension Reduction
• Perform PCA to reduce # of variables in data set

Cluster Analysis
• Generate useful segments within the data

Predictive Models
• Linear regression (covered in STAT202)
• Logistic Regression
• Decision Trees

Evaluate Model
• Training and test sets
• Performance metrics

Interpret
• Draw conclusions
• Provide recommendations
Linear Regression
Using the equation for a line to estimate/predict a target variable
Simple Linear Regression: using a single predictor variable
Multiple Linear Regression: using a set of predictor variables

Target variable: must be numerical
Predictor variables: can be numerical or categorical
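
As a minimal sketch of simple linear regression, the least-squares line for one predictor can be computed directly from sums. The function name `fit_simple_linear_regression` and the numbers are made up for illustration; they are not taken from the course data sets.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data that lies exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_simple_linear_regression(x, y)
predicted = intercept + slope * 6.0  # estimate the target for a new x value
print(intercept, slope, predicted)
```

Multiple linear regression follows the same idea with one coefficient per predictor; in practice a statistics package does that fit.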
Linear Regression

Y = the target variable
Xi = a predictor variable
Bi = the coefficient of a predictor variable
e = error term
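
The equation itself did not survive on this slide. Using the symbols defined above, the standard form of the model is sketched below; the intercept B_0 and the number of predictors k are assumptions, since neither appears in the list of symbols.

```latex
Y = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_k X_k + e
```

With k = 1 this is simple linear regression; with k > 1 it is multiple linear regression.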
Linear Regression Assumptions
Relationship between target and predictor variables is linear
• If it is not, the model can be adjusted (for example, by transforming a variable)

Error terms:
• Have mean of zero
• Have constant variance
• Are independent of each other
• Are normally distributed
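
The error-term assumptions are usually checked on the residuals (actual minus fitted values). The sketch below is one rough way to do that, assuming a simple linear regression on made-up numbers; the helper names are hypothetical, and the Durbin-Watson statistic is a technique brought in for illustration, not one named on the slides.

```python
def residuals_of_fit(x, y):
    """Fit a least-squares line and return the residuals (actual - fitted)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def durbin_watson(residuals):
    """Values near 2 suggest little correlation between neighbouring residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(r ** 2 for r in residuals)
    return num / den


# Hypothetical data, already sorted by x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

res = residuals_of_fit(x, y)
mean_res = sum(res) / len(res)   # mean of zero
var_low = variance(res[:4])      # constant variance: compare the spread for
var_high = variance(res[4:])     # small x values against large x values
dw = durbin_watson(res)          # independence
print(mean_res, var_low, var_high, dw)
```

A least-squares fit with an intercept always gives residuals that average to zero on the data it was fit to, so the other checks are the informative ones. Normality is usually judged with a histogram or a normal probability plot of the residuals.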
Linear Regression Tests/Statistics
Each coefficient is tested against the following hypotheses:
• H0: coefficient is equal to 0
• H1: coefficient is not equal to 0

Each coefficient gets a p-value
• If the p-value is small (commonly below 0.05), we reject H0: that variable has a significant effect

Each coefficient comes with a confidence interval

R² is a measure of fit that ranges from 0% to 100%
• Proportion of variation in the target variable that is explained by the set of predictor variables

Adjusted R² is a measure of fit that is adjusted for the # of variables/observations used in the model
• Penalty assigned for adding more predictor variables
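
To make the two fit measures concrete, here is a small sketch using the usual textbook formulas. The function names are hypothetical, and both the actual and fitted values are made-up numbers rather than the output of a real model.

```python
def r_squared(y, fitted):
    """Share of the variation in y that the fitted values explain."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)               # total
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R-squared with a penalty that grows with the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical actual and fitted values for four observations.
y = [3.0, 5.0, 7.0, 10.0]
fitted = [3.0, 5.0, 8.0, 9.0]

r2 = r_squared(y, fitted)
adj_one = adjusted_r_squared(r2, n_obs=4, n_predictors=1)
adj_two = adjusted_r_squared(r2, n_obs=4, n_predictors=2)
print(r2, adj_one, adj_two)
```

For the same R², the adjusted value drops as the number of predictors goes up, which is the penalty described above.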
Example of Simple Linear Regression
BASEBALL DATA SET: ESTIMATING HOME RUNS WITH BATTING AVERAGE
Example of Multiple Linear Regression
HOUSE PRICE DATA SET: ESTIMATING HOUSE PRICE WITH SET OF PREDICTOR VARIABLES
