1.1 Regression Analysis
• Predicting numbers
• Supervised Learning
• Simple linear regression
• Multiple linear regression
• K-Nearest Neighbors
• Decision tree-based methods
• Artificial neural networks
Simple Linear Regression
• A straightforward approach for predicting a quantitative
response Y on the basis of a single predictor
variable X
• The model: Y ≈ β0 + β1X, with intercept β0 and slope β1
• Residual: eᵢ = yᵢ − ŷᵢ, the difference between the i-th observed and fitted values
• Variance σ²: the square of the standard
deviation of each of the
observations yᵢ of Y
• Standard error: the average amount an estimate differs from the true value,
e.g. SE(μ̂)² = σ²/n for the sample mean
• The true regression line: Y = β0 + β1X + ε
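As a minimal sketch of these quantities, the least-squares estimates, residuals, and residual-variance estimate can be computed directly with NumPy. The toy data below (true line y = 2 + 3x plus unit-variance noise) are an assumption for illustration, not data from the slides:

```python
import numpy as np

# Hypothetical toy data; x is the single predictor, y the response
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=50)  # true line: beta0=2, beta1=3

# Least-squares estimates of the coefficients
x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

y_hat = beta0_hat + beta1_hat * x
residuals = y - y_hat                # e_i = y_i - yhat_i
rss = np.sum(residuals ** 2)         # residual sum of squares
sigma2_hat = rss / (len(x) - 2)      # estimate of Var(eps), n - 2 df

print(beta0_hat, beta1_hat, sigma2_hat)
```

The divisor n − 2 in the variance estimate anticipates the degrees-of-freedom point made later: two parameters are estimated from the data.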
• The standard errors of the coefficients of the simple regression line can
be computed as:
SE(β̂0)² = σ² [1/n + x̄² / Σ(xᵢ − x̄)²],   SE(β̂1)² = σ² / Σ(xᵢ − x̄)²
• Confidence interval: a range of values such that with 95% probability, the range will contain
the true unknown value of a parameter
• For β1, the approximate 95% confidence interval is β̂1 ± 2 · SE(β̂1)
Why? Because for a normal distribution, roughly 95% of the probability lies within
two standard errors of the mean
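The standard-error formulas and the ±2·SE interval can be sketched numerically as follows; the toy data (true slope 3) are an assumption for illustration:

```python
import numpy as np

# Hypothetical toy data (assumed, not from the slides)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=50)

n = len(x)
x_bar = x.mean()
sxx = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y.mean())) / sxx
beta0_hat = y.mean() - beta1_hat * x_bar
rss = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2)
sigma2_hat = rss / (n - 2)          # residual variance estimate

# Standard errors of the two coefficients
se_beta1 = np.sqrt(sigma2_hat / sxx)
se_beta0 = np.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / sxx))

# Approximate 95% confidence interval for beta1
ci_low, ci_high = beta1_hat - 2 * se_beta1, beta1_hat + 2 * se_beta1
print((ci_low, ci_high))
```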
• The t-statistic: t = (β̂1 − 0) / SE(β̂1)
The number of standard
deviations β̂1 is away from
zero
• Measures whether β̂1 is sufficiently far away from 0 to conclude that the true value of β1 is
non-zero
Are X and Y related? The p-value
• A p-value computed from the t-distribution
• If there is no relationship between X and Y, t follows a t-distribution with n − 2 degrees
of freedom (n − 2 because we estimate
two parameters, β0 and β1)
• n > 30 makes it close to a normal distribution
• Compute the probability of observing any value equal to |t| or larger, assuming β1 = 0
• The p-value
• Provides the smallest level of significance at which the null hypothesis
would be rejected
• A smaller p-value suggests stronger evidence of an association between X and Y
• Typical cutoff values of p are 5% and 1%
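Putting the t-statistic and p-value together as a sketch: since the slide notes that for n > 30 the t-distribution is close to normal, the normal CDF from the standard library is used here as an approximation. The data are hypothetical:

```python
import numpy as np
from statistics import NormalDist

# Hypothetical toy data with a strong true relationship (slope 3)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=50)

n = len(x)
x_bar = x.mean()
sxx = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y.mean())) / sxx
beta0_hat = y.mean() - beta1_hat * x_bar
rss = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2)
se_beta1 = np.sqrt(rss / (n - 2) / sxx)

t = (beta1_hat - 0) / se_beta1                 # std. deviations beta1_hat is from 0
p_value = 2 * (1 - NormalDist().cdf(abs(t)))   # P(|T| >= |t|) under H0: beta1 = 0
print(t, p_value)
```

A p-value below the usual 5% or 1% cutoff would lead us to reject the null hypothesis of no relationship.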
Is there a relationship between TV
advertising and sales?
🠶 What would you infer?
How well does the model fit the data?
• The F-statistic here is way above 1:
at least one
advertising medium is
related to sales
• What if n is larger?
• An F value slightly above 1 is sufficient to reject the null hypothesis
• What’s the right F value to reject the null hypothesis?
• The p-value from the F-distribution (when the errors are normally distributed)
• A smaller p-value suggests a relationship between the predictors and the
response
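The overall F-test can be sketched numerically; the data below are synthetic with p = 3 predictors (only one actually related to the response), not the Advertising data:

```python
import numpy as np

# Sketch of the F-test for H0: beta_1 = ... = beta_p = 0 (hypothetical data)
rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)   # only predictor 0 matters

# Fit by least squares with an intercept column
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
rss = np.sum((y - A @ coef) ** 2)              # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)              # total sum of squares

# F = ((TSS - RSS) / p) / (RSS / (n - p - 1))
F = ((tss - rss) / p) / (rss / (n - p - 1))
print(F)   # far above 1: reject H0, at least one predictor is related to y
```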
Can we test the null hypothesis for a
subset of coefficients?
• The corresponding null hypothesis for a subset of the last q coefficients:
H0: β(p−q+1) = β(p−q+2) = … = βp = 0
• Can test models with such combinations and choose the best model
• Methods used:
• Mallows’ Cp
• Akaike Information Criterion (AIC)
• Bayesian Information Criterion (BIC)
• Adjusted R2
• What would be the challenge here when combining variables? With p variables
there are 2^p possible subsets to compare
Variable selection
• Automated and efficient methods to choose smaller yet effective
subsets of variables
• Common approaches:
• Forward selection
• Start with the null model (only intercept)
• Evaluate p simple linear regression models and add the variable with the lowest
RSS to the null model
• Evaluate the new set of two variable models and add the variable with the lowest
RSS to the model
• Continue until a particular stopping criterion is met
• Backward selection
• Start with all variables
• Remove the variable with the highest p-value
• Re-fit the model, again remove the variable with the highest p-value, and so on
Variable selection
• Common approaches
• Mixed selection
• Start with a no-variable model
• Keep adding one by one like in the forward selection
• In case the p-value of any variable is above a certain threshold, remove that
variable
• Continue this back-and-forth process until a stopping criterion is met
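The forward-selection loop described above can be sketched as a greedy search on RSS. The helper names (`fit_rss`, `forward_selection`) and the synthetic data are assumptions, and a fixed number of variables stands in for a real stopping criterion such as adjusted R²:

```python
import numpy as np

def fit_rss(X, y):
    """Least-squares fit with an intercept; return the residual sum of squares."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return np.sum(resid ** 2)

def forward_selection(X, y, k):
    """Greedy forward selection: repeatedly add the variable that lowers
    RSS the most, stopping after k variables (a stand-in criterion)."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < k:
        best = min(remaining, key=lambda j: fit_rss(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical data: only columns 0 and 2 actually drive y
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 4.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(0, 0.5, size=200)
print(forward_selection(X, y, 2))
```

Backward and mixed selection reuse the same skeleton, removing or re-checking variables by p-value instead of only adding by RSS.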
• The model, once fitted to training data, can be used for predictions
• Challenges
• Coefficient estimates are distant from the actual population parameter values
• The least-squares plane differs from the true population regression plane
• How to handle qualitative predictors: male/female?
• Or three levels: large, medium, small?
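One common answer to the male/female question is to encode a qualitative predictor as a 0/1 dummy variable; a three-level factor (large/medium/small) needs two dummies, with one level as the baseline. The data and level names below are made up for illustration:

```python
import numpy as np

# Hypothetical qualitative predictors
gender = np.array(["male", "female", "female", "male", "female"])
size = np.array(["large", "small", "medium", "large", "medium"])

is_female = (gender == "female").astype(int)   # 1 dummy for 2 levels
is_medium = (size == "medium").astype(int)     # 2 dummies for 3 levels;
is_small = (size == "small").astype(int)       # "large" is the baseline

# Design matrix of dummy variables, ready for least squares
X = np.column_stack([is_female, is_medium, is_small])
print(X)
```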
Issues with standard linear regression