BA501 Week5 Linear Regression

BA501 Mastery Program

Copyright Policy
All content included on the Site or third-party platforms as part of the class, such as text,
graphics, logos, button icons, images, audio clips, video clips, live streams, digital
downloads, data compilations, and software, is the property of BitTiger or its content
suppliers and protected by copyright laws.

Any attempt to redistribute or resell BitTiger content will result in appropriate legal action being taken.

We thank you in advance for respecting our copyrighted content.

For more info, see https://www.bittiger.io/termsofuse and https://www.bittiger.io/termsofservice
Summary
● Simple linear regression
○ Modeling E(Y|X)
○ Assumptions
○ Coefficient estimation
■ Least square error estimation
■ Maximum likelihood estimation
■ Hypothesis testing
Summary
● Multiple linear regression
○ Coefficient estimation
○ Evaluate model performance
○ ANOVA and F test
● Residual
○ Residual diagnostics
○ Leverage, standardizing
Simple Linear Regression
Classical example
● Let yi be the height of child i; xi the height of child i's parent
● How can we use xi to predict yi?
● What are we predicting?
○ Expected Y given X
● Simple linear regression
○ Y = β0 + β1 X + ε
○ E[Y] = β0 + β1 X
Simple linear regression model

Y = β0 + β1 X + ε
○ Y: dependent / response variable
○ X: independent variable
○ β0, β1: coefficients
○ ε: error term

● Assumptions
○ 1. Linear relationship: Y = β0 + β1 X + ε
○ 2. The errors ε are independent and identically distributed (i.i.d.)
○ 3. Homoscedasticity (constant variance): Var[ε | X = x] = σ², no matter what x is
○ 4. Gaussian noise: normal distribution ε ~ N(0, σ²)
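These assumptions can be made concrete by simulating a dataset that satisfies all four. A minimal sketch; the parameter values (β0 = 10, β1 = 0.8, σ = 2) and the height range are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
beta0, beta1, sigma = 10.0, 0.8, 2.0        # assumed "true" parameters

x = rng.uniform(50, 80, size=n)             # parent heights (arbitrary range)
eps = rng.normal(0.0, sigma, size=n)        # i.i.d. Gaussian noise, constant variance
y = beta0 + beta1 * x + eps                 # linear relationship in X

# By construction the noise is centered at 0 and homoscedastic in x
print(eps.mean(), eps.std())
```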
Assumption Violation

● Linear Relationship
Assumption Violation

● Errors ε dependent on each other


Assumption Violation
● Heteroscedasticity (non-constant variance)
Some error term example

Why Gaussian noise?

● Central limit theorem
○ Noise might be the sum of many little random noises from different sources, independent and of similar magnitude.
● Mathematical convenience
○ Closed-form estimation
Under Gaussian Assumption

Y = β0 + β1 X + ε
E(Y) = β0 + β1 X

ε = Y - β0 - β1 X ~ Normal(0, σ²)
⇒ Y = β0 + β1 X + ε ~ Normal(β0 + β1 X, σ²)
Coefficient estimation
● Some definitions
○ Sale price Y: dependent variable, output, response
○ Lot area X: predictor, independent variable, covariate, input
● Coefficient estimation
○ Remember the assumption Y = β0 + β1 X + ε
○ How do we find the optimal (β0, β1)?
■ Least square error estimation (LSE)
■ Maximum likelihood estimation (MLE)
■ Compare LSE and MLE
Least square error estimation
● Mean square error (MSE): MSE(β0, β1) = (1/n) Σi (yi - β0 - β1 xi)²
● The LSE estimates minimize the MSE
Least square error estimation (cont'd)
● The estimates β̂0, β̂1 are estimators of β0, β1
● Criteria for good estimators (week 1)
○ Unbiased
○ Variance decreases with more data

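The closed-form LSE solution for simple linear regression can be sketched as follows; the synthetic data and its "true" parameters are assumptions, and the `np.polyfit` call is only a sanity cross-check:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(50, 80, size=300)
y = 10.0 + 0.8 * x + rng.normal(0, 2.0, size=300)   # synthetic data

xbar, ybar = x.mean(), y.mean()
Sxx = ((x - xbar) ** 2).sum()
Sxy = ((x - xbar) * (y - ybar)).sum()

beta1_hat = Sxy / Sxx                # slope: sample cov(x, y) / var(x)
beta0_hat = ybar - beta1_hat * xbar  # intercept: line passes through (x̄, ȳ)

# Cross-check against numpy's least-squares polynomial fit
slope, intercept = np.polyfit(x, y, 1)
```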

What is likelihood
● Probability
○ If we know β0 = 10, β1 = 0.8, E(Y) = 10 + 0.8X, possibility to observe (x1=70,
y1=72), …, (xn, yn)?
● Reverse above logic
○ If we observe (x1=70, y1=72), …, (xn, yn)
○ Likelihood (L) of β0 = 10, β1 = 0.8?
● Connection
○ L(β0,β1|{X, Y}) = P({X, Y}|β0, β1)

● How to use likelihood to estimate β0, β1


○ Maximizes Likelihood function L(β0,β1|{X, Y})
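The idea can be illustrated by evaluating the Gaussian log-likelihood at two candidate coefficient pairs on synthetic data; the data-generating parameters below are assumptions for illustration:

```python
import numpy as np

def log_likelihood(beta0, beta1, x, y, sigma=2.0):
    """Gaussian log-likelihood of (beta0, beta1) given observed (x, y)."""
    resid = y - (beta0 + beta1 * x)
    n = len(y)
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - 0.5 * (resid**2).sum() / sigma**2

rng = np.random.default_rng(2)
x = rng.uniform(50, 80, size=200)
y = 10.0 + 0.8 * x + rng.normal(0, 2.0, size=200)

ll_true = log_likelihood(10.0, 0.8, x, y)   # parameters close to the truth
ll_off  = log_likelihood(10.0, 0.5, x, y)   # clearly wrong slope
```

Parameters near the truth yield a much larger log-likelihood, which is exactly what MLE exploits.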
Maximum likelihood estimation (MLE)
● Under a Gaussian error distribution, maximizing the likelihood is equivalent to minimizing the sum of squared errors
● MLE and LSE are identical when ε ~ N(0, σ²)
Hypothesis testing for coefficient
● Test assumption
○ Null hypothesis β1 = 0 (what type of test is this?)
● Calculate the statistic: compare the observed value against H0
○ Why a t distribution? Recall the t score. How do we decide the d.f. (degrees of freedom)?
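A manual sketch of the t test for H0: β1 = 0 on synthetic data (the data-generating parameters are assumed). The degrees of freedom are n - 2 because two coefficients are estimated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(50, 80, size=n)
y = 10.0 + 0.8 * x + rng.normal(0, 2.0, size=n)

xbar = x.mean()
Sxx = ((x - xbar) ** 2).sum()
beta1_hat = ((x - xbar) * (y - y.mean())).sum() / Sxx
beta0_hat = y.mean() - beta1_hat * xbar

resid = y - (beta0_hat + beta1_hat * x)
s2 = (resid ** 2).sum() / (n - 2)      # unbiased noise-variance estimate, df = n - 2
se_beta1 = np.sqrt(s2 / Sxx)           # standard error of the slope

t_stat = beta1_hat / se_beta1          # H0: beta1 = 0, two-sided t test
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
```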
How to predict for new data?
● Given a new data point x, how do we predict y?
● Predicted (fitted) value at x: ŷ = β̂0 + β̂1 x
● Is ŷ a single value?
○ The larger σ² is, the larger the variance: more noise in predictions
○ The larger n is, the smaller the variance: more precise predictions
Distribution of predicted value

● Under the Gaussian error assumption, the predicted value ŷ follows a normal distribution
○ What are its expectation, variance, and confidence interval?
● Visualization
○ Observations as circles
○ Mean prediction as a solid line
○ [5th, 95th] quantiles of the predicted value
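The variance of the mean prediction at x is σ²(1/n + (x - x̄)²/Sxx), so the confidence band is narrowest at x̄ and widens away from the data. A sketch on synthetic data (parameters and the query point x = 90 are assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 80
x = rng.uniform(50, 80, size=n)
y = 10.0 + 0.8 * x + rng.normal(0, 2.0, size=n)

xbar = x.mean()
Sxx = ((x - xbar) ** 2).sum()
b1 = ((x - xbar) * (y - y.mean())).sum() / Sxx
b0 = y.mean() - b1 * xbar
s2 = ((y - b0 - b1 * x) ** 2).sum() / (n - 2)

def mean_ci(x_new, alpha=0.10):
    """90% CI for E[Y | X = x_new]; width grows with distance from x̄."""
    yhat = b0 + b1 * x_new
    se = np.sqrt(s2 * (1.0 / n + (x_new - xbar) ** 2 / Sxx))
    tcrit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return yhat - tcrit * se, yhat + tcrit * se

lo_c, hi_c = mean_ci(xbar)    # narrowest interval, at the center of the data
lo_e, hi_e = mean_ci(90.0)    # wider interval, far from the data
```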
Bias and variance tradeoff
● Prediction error: suppose Y = f(X) + ε; then the expected squared prediction error decomposes as E[(Y - f̂(x))²] = Bias[f̂(x)]² + Var[f̂(x)] + σ²
Bias and variance
● Accurate (small bias)
● Precise (small variance)
Multiple Linear Regression
Multiple linear regression
● More than one predictor, say p. For the ith data point:

Yi = β0 + β1 X1i + β2 X2i + … + βp Xpi + εi

● How to get the estimates (LSE)?
○ MSE: (1/n) Σi (Yi - β0 - β1 X1i - … - βp Xpi)²
○ (β̂0, β̂1, …, β̂p) minimize the MSE (set the derivatives to 0).
● What about MLE?
LSE for multiple regression
● Matrix form
○ n×1 vector Y, n×(p+1) matrix X, (p+1)×1 vector β, n×1 vector ε
Y = Xβ + ε
○ MSE: (1/n)(Y - Xβ)ᵀ(Y - Xβ)
○ Optimal estimator: β̂ = (XᵀX)⁻¹XᵀY
○ Fitted values: Ŷ = Xβ̂ = HY, where H = X(XᵀX)⁻¹Xᵀ is the projection (hat) matrix
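The matrix-form estimator can be verified numerically in a few lines of NumPy; the design and true coefficients below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 120, 3
beta_true = np.array([5.0, 1.0, -2.0, 0.5])         # intercept + p slopes (assumed)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # n x (p+1) design matrix
y = X @ beta_true + rng.normal(0, 1.0, size=n)

# Normal equations: beta_hat = (X'X)^{-1} X'y (solve instead of inverting)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat (projection) matrix H = X (X'X)^{-1} X'; fitted values y_hat = H y
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
```

Note that H is idempotent (H @ H = H), the defining property of a projection matrix.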
How do you know if your model works?
ANOVA
● ANOVA (Analysis of variance)
○ Decompose the variance into sources
○ F test
■ Null hypothesis: β1 = β2 = … = βp = 0
■ Calculation (p does not include β0):

Source of variance | df      | Sum of squares
Regression model   | p       | SSR = Σi (ŷi - ȳ)²
Residual           | n-1-p   | SSE = Σi (yi - ŷi)²
Total              | n-1     | SST = Σi (yi - ȳ)²

F = (SSR / p) / (SSE / (n-1-p)) follows an F(p, n-1-p) distribution under the null hypothesis.
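The decomposition SST = SSR + SSE and the F statistic can be checked numerically; the synthetic data with p = 2 predictors is an assumption for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

SST = ((y - y.mean()) ** 2).sum()      # total sum of squares,      df = n - 1
SSR = ((y_hat - y.mean()) ** 2).sum()  # regression sum of squares, df = p
SSE = ((y - y_hat) ** 2).sum()         # residual sum of squares,   df = n - 1 - p

F = (SSR / p) / (SSE / (n - 1 - p))
p_value = stats.f.sf(F, p, n - 1 - p)  # H0: beta1 = ... = betap = 0
```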
How to evaluate model performance?
● R² = SSR/SST = 1 - SSE/SST; in simple linear regression, R² equals the squared correlation between x and y

● Properties
○ R²: percentage of variance explained by the model
○ 0 <= R² <= 1
○ R² can be misleading
■ Adding features never decreases R²
■ Hence the need for adjusted R²
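A small sketch showing that R² never decreases when a pure-noise feature is added, while adjusted R² applies a penalty; the data and feature choices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 60
x1 = rng.normal(size=n)
noise_feature = rng.normal(size=n)      # pure noise, unrelated to y
y = 3.0 + 2.0 * x1 + rng.normal(0, 1.0, size=n)

def fit_r2(X, y):
    """R^2 and adjusted R^2 for an OLS fit with intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    sse = ((y - Xd @ beta) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    r2 = 1 - sse / sst
    p = X.shape[1]
    adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - 1 - p)
    return r2, adj

r2_base, adj_base = fit_r2(x1.reshape(-1, 1), y)                     # 1 real feature
r2_aug, adj_aug = fit_r2(np.column_stack([x1, noise_feature]), y)    # + 1 noise feature
```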
How do you know your model meets its assumptions?
Residual Diagnostics
Residual vs. Error Term
● Residual: ei = yi - ŷi
● Error (noise): εi = yi - β0 - β1 xi
○ ei serves as an estimate of εi
● Properties of residuals (if the model assumptions hold)
○ Mean 0
○ Constant variance, unchanging with x
○ Uncorrelated with each other (or only extremely weakly correlated)
○ Normally distributed
Residual v.s. Fitted value
● To check unbiasedness (linearity) and homoscedasticity
• The residuals spread randomly around the 0 line, indicating that the relationship is linear.
• The residuals form an approximately horizontal band around the 0 line, indicating homogeneity of the error variance.
• No single residual stands visibly apart from the random pattern, indicating that there are no outliers.
Residual v.s. Fitted value – Violation Examples
QQ Plot
● To check for normality
• The points should roughly follow the diagonal straight line
• You can also check normality using the Shapiro-Wilk test or the Kolmogorov-Smirnov test
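The Shapiro-Wilk test can be sketched on two synthetic residual sets, one normal and one clearly skewed; the sample size is arbitrary. A small p-value means rejecting normality:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
normal_resid = rng.normal(0, 1, size=200)        # residuals consistent with normality
skewed_resid = rng.exponential(1, size=200) - 1  # centered but clearly non-normal

_, p_normal = stats.shapiro(normal_resid)
_, p_skewed = stats.shapiro(skewed_resid)
# Expect p_skewed to be tiny (normality rejected) and p_normal much larger
```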
QQ Plot – Violation Examples
Outlier, High Leverage, Influential
• An outlier is a data point whose response y does not follow the general trend of the rest of the data.
• A data point has high leverage if it has "extreme" predictor x values. Including or excluding such a point can strongly affect the coefficient estimates.
• A data point is influential if it unduly influences any part of a regression analysis, such as the predicted responses, the estimated slope coefficients, or the hypothesis test results. Outliers and high-leverage data points have the potential to be influential, but we generally have to investigate further to determine whether or not they are actually influential.
Standardized residuals vs. Leverage
● To check for high-leverage and influential data points

• Leverage: think of the fitted line as a lever passing through the center point (x̄, ȳ). Points further from the center have larger leverage. hii = [H]ii measures the distance of xi from the center of the x's.

• Rule of thumb: Cook's distance > 1 flags an influential point
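Leverage and Cook's distance can be computed directly from the hat matrix. The sketch below plants one extreme-x point to show it receives the largest leverage; Cook's distance uses the p + 1 = 2 parameters of simple regression, and all values are synthetic:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 50
x = rng.uniform(0, 10, size=n)
x[0] = 30.0                                   # one extreme-x (high leverage) point
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=n)

X = np.column_stack([np.ones(n), x])          # simple regression: p + 1 = 2 columns
H = X @ np.linalg.solve(X.T @ X, X.T)         # hat matrix
h = np.diag(H)                                # leverages h_ii

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2 = (resid ** 2).sum() / (n - 2)

# Cook's distance D_i = e_i^2 h_ii / ((p+1) s^2 (1 - h_ii)^2):
# how much deleting point i would move all fitted values
cooks_d = (resid ** 2 / (2 * s2)) * h / (1 - h) ** 2
```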


How to Deal with Problematic Data?
Do NOT simply delete them by default!!!
You must have a good, objective reason for deleting data points

1. Check for obvious data errors; correct or delete them

2. Consider the possibility of a mis-formulated model:


• Did you leave out any important predictors?
• Should you consider adding some interaction terms?
• Is there any nonlinearity that needs to be modeled?

3. If you delete any data after you've collected it, justify and describe it in your reports.
If you are not sure what to do about a data point, analyze the data twice — once with and once
without the data point — and report the results of both analyses.
Interview Questions
Basic

• What model will you use? Linear regression suits a continuous Y variable with a linear relationship; it is not appropriate for a categorical y variable

• What are the assumptions of linear regression? How do you check them?

• How are coefficients estimated?

• What kind of diagnostics will you do?

Advanced

• What is the estimate of the coefficients? How do you derive it?

• What is the variance of β̂ and of the error term? What are the degrees of freedom?

• What do you do with non-constant variance? Non-linearity? Non-normality?


Interview Questions
1. Take-home project report
2. Product sense questions (part of a 2-hour session)
3. Industry functions: dynamic pricing, marketing analytics, growth, risk
4. What BAs do; how to grow into a DS role / work experience, day-to-day work
5. Q&A
Appendix
Leverage

hii: leverage of the ith data point

● Properties of hii
○ hii measures the distance between xi and the mean of the x values
○ hii is a number between 0 and 1
○ The hii sum to p+1, the number of parameters (regression coefficients including the intercept)
Standardizing
● Standardized residual
○ What: residuals rescaled to have a mean of 0 and a variance of 1
○ Why:
■ Data points far from the center have a larger impact on the coefficient estimates (leverage), so the residual variance at these points is smaller
○ How to standardize: ri = ei / (s √(1 - hii)), where s² is the estimated residual variance
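The standardization above can be sketched directly; the synthetic simple-regression data is assumed, and s² is the usual residual variance estimate with df = n - 2:

```python
import numpy as np

rng = np.random.default_rng(10)
n = 100
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=n)

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)                                # leverages h_ii

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s = np.sqrt((resid ** 2).sum() / (n - 2))

# Var(e_i) = sigma^2 (1 - h_ii): high-leverage points have smaller raw residuals,
# so dividing by sqrt(1 - h_ii) puts all residuals on a common scale
std_resid = resid / (s * np.sqrt(1 - h))
```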
Standardizing in simple linear regression
● Residual and leverage
○ In simple linear regression, hii = 1/n + (xi - x̄)² / Σj (xj - x̄)²
Should we standardize features?
● Standardizing won't change coefficient significance
● For comparing coefficients of different predictors within a model, standardizing helps
● For comparing coefficients of the same predictor across different data sets, don't standardize
