Linear Regression Model: Man - PN@VNP - Edu.vn
𝑒𝑥𝑝𝑑𝑟𝑖𝑛𝑘 = 𝜷𝑖𝑛𝑐𝑜𝑚𝑒
Dependent variable Independent variable
𝑌 = 𝜷𝑋
PRACTICE
• Identify the dependent variable and the independent variable
(3) the number of air conditioners sold per month and the price of an air conditioner
(mil. VND/air conditioner)
TWO-VARIABLE: WHY DO WE NEED TO ESTIMATE 𝛽?
𝑒𝑥𝑝𝑑𝑟𝑖𝑛𝑘 = 𝜷𝑖𝑛𝑐𝑜𝑚𝑒
Dependent variable Independent variable
TWO-VARIABLE: ESTIMATING 𝛽 USING RSTUDIO
THE COEFFICIENTS 𝛽
▪ 𝛽: regression/estimated coefficients
▪ 𝛽 shows the effect of the independent variable on the dependent variable.
▪ 𝛽: how much the dependent variable changes when the independent variable
increases/decreases by 1 unit.
TWO-VARIABLE: WHAT DOES 𝛽 MEAN?
We have estimated the above function. The estimated regression function is
𝑒𝑥𝑝𝑑𝑟𝑖𝑛𝑘 = 83.93 + 16.007𝑖𝑛𝑐𝑜𝑚𝑒
▪ 16.007 is the slope coefficient (𝜷𝟏): it measures the steepness of the line relative to
the horizontal (x) axis, that is, the tangent of the angle between them rather than the angle itself.
▪ The slope coefficient indicates that when income increases by 1 million VND/month,
the average expenditure on drinking increases by about 16 thousand VND per month.
▪ When income is 10 mil. VND per month, the average expenditure on drinking is
244 thousand VND per month.
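The figure of 244 thousand VND can be checked by plugging income = 10 into the estimated function. A minimal sketch in R, assuming the coefficients 83.93 and 16.007 reported in the lecture; the variable names b0, b1 and expdrink_hat are illustrative only:

```r
# Coefficients taken from the estimated function in the lecture
b0 <- 83.93    # intercept (thousand VND per month)
b1 <- 16.007   # slope (thousand VND per 1 mil. VND of income)

# Predicted average expenditure on drinking at an income of 10 mil. VND/month
income <- 10
expdrink_hat <- b0 + b1 * income
print(round(expdrink_hat))   # about 244 (thousand VND per month)
```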
TWO-VARIABLE: 𝛽 MEANING IN THE GRAPH
[Figure: regression line 𝑒𝑥𝑝𝑑𝑟𝑖𝑛𝑘 = 83.93 + 16.007𝑖𝑛𝑐𝑜𝑚𝑒, with income on the horizontal axis and expdrink on the vertical axis; intercept 𝛽0 = 83.93, slope 𝛽1 = 16.007]
▪ If 𝛽 changes (for example, from positive to negative), the regression line (the blue line in the graph) changes.
▪ The 𝛽 coefficients determine the location of the regression line.
REGRESSION…
▪ evaluates the relationship between the dependent variable and one or more
independent variable(s).
▪ is the most important technique in econometrics.
TWO-VARIABLE REGRESSION MODEL
Theory
• Economic theories
• Logical thinking
THEORY AND PRACTICE: REGRESSION MODEL
▪ In economic theory, when income increases, expenditure also increases.
▪ In practice, we need a clear, quantitative answer rather than
a purely theoretical statement.
TWO-VARIABLE (SIMPLE) LINEAR REGRESSION MODEL
Regression function
𝑒𝑥𝑝𝑑𝑟𝑖𝑛𝑘 = 𝛽𝑖𝑛𝑐𝑜𝑚𝑒
THE PURPOSE OF TODAY'S LECTURE
▪ The purpose of today's lecture is to help you learn how to
▪ The multiple regression model allows us to
▪ estimate the partial effect of each independent variable on the dependent
variable, holding the other variables unchanged,
▪ and improve the quality of the regression model.
▪ The general form of the linear regression model (LRM) is:
𝑦𝑖 = 𝛽0 + 𝛽1 𝑥1𝑖 + 𝛽2 𝑥2𝑖 + ⋯ + 𝛽𝑘 𝑥𝑘𝑖 + 𝑒𝑖
where 𝑖 indicates the observation.
▪ Or, as written in short form:
𝑌 = 𝑋𝛽 + 𝑒
▪ 𝑌 is the regressand, or dependent/explained variable
▪ 𝑒 is an error term/residual.
▪ OLS requires linearity in the coefficients.
▪ The linear regression
𝑌𝑖 = 𝛽0 + 𝛽1 𝑋1𝑖 + 𝑒𝑖
2
𝑙𝑛𝑌𝑖 = 𝛽0 + 𝛽1 𝑋1𝑖 + 𝑒𝑖
$\hat{\beta}_0$
ORDINARY LEAST SQUARES (OLS) METHOD
RESIDUALS
▪ Why do we need the residual?
▪ Omitted independent variables
▪ Measurement error
▪ Random external effects
OPTIONAL
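▪ The residual
$e = Y - X\hat{\beta}$
▪ The sum of squares
$e'e = (Y - X\hat{\beta})'(Y - X\hat{\beta})$
▪ To minimize $e'e$ we need to find $\hat{\beta}$ such that
$\frac{\partial e'e}{\partial \hat{\beta}} = -2X'Y + 2X'X\hat{\beta} = 0$
▪ Solving this first-order condition gives
$\hat{\beta} = (X'X)^{-1}X'Y$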
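The step from the sum of squares to the derivative can be written out as follows. This is a sketch of the standard expansion, assuming that $X'X$ is invertible (that is, $X$ has full column rank), which the slides do not state explicitly:

```latex
\begin{align*}
e'e &= (Y - X\hat{\beta})'(Y - X\hat{\beta}) \\
    &= Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta} \\
\frac{\partial e'e}{\partial \hat{\beta}} &= -2X'Y + 2X'X\hat{\beta} = 0 \\
X'X\hat{\beta} &= X'Y \\
\hat{\beta} &= (X'X)^{-1}X'Y
\end{align*}
```

The second line uses the fact that $Y'X\hat{\beta}$ is a scalar and therefore equals its transpose $\hat{\beta}'X'Y$.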
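$se = \sqrt{\operatorname{diag}(VCV)}$, where $VCV$ is the variance-covariance matrix of the estimated coefficients.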
OPTIONAL
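$t = b \div se$
where $b$ is the vector of estimated coefficients ($\hat{\beta}$) and ÷ indicates element-wise division of vectors.
▪ Each element $j$ of $pvalue$ is given by
$p_j = 2\left(1 - F(|t_j|, df)\right)$
where $df = N - k$ is the degrees of freedom, $k$ is the number of estimated coefficients (# columns of $X$), and $F$ is the Student's t cumulative distribution function.
Note that $N$ is the number of observations (# rows of $X$).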
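The matrix formulas for the coefficients, standard errors, t statistics and p-values can be tried by hand in R and compared with lm(). The sketch below is illustrative only: it assumes R's built-in mtcars data (not the lecture's data) and the usual homoskedastic estimate VCV = s2 * (X'X)^(-1), with s2 = e'e / (N - k); all object names are hypothetical.

```r
# Illustrative data: built-in mtcars (an assumption, not the lecture's dataset)
Y <- mtcars$mpg
X <- cbind(1, mtcars$wt)          # column of ones for the intercept, then the regressor
N <- nrow(X)                      # number of observations
k <- ncol(X)                      # number of estimated coefficients

# OLS coefficients: solves (X'X) b = X'Y
b <- solve(t(X) %*% X, t(X) %*% Y)

# Residuals and the estimated error variance
e  <- Y - X %*% b
s2 <- sum(e^2) / (N - k)

# Variance-covariance matrix, standard errors, t statistics, p-values
VCV     <- s2 * solve(t(X) %*% X)
se      <- sqrt(diag(VCV))
t_stat  <- as.vector(b) / se
p_value <- 2 * (1 - pt(abs(t_stat), df = N - k))

# Reference values from lm() for comparison
fit <- summary(lm(mpg ~ wt, data = mtcars))$coefficients
print(cbind(b = as.vector(b), se = se, t = t_stat, p = p_value))
print(fit)
```

The two printed tables should agree up to rounding, which is a quick way to check that the formulas were applied correctly.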
OPTIONAL
▪ 𝑁 is the number of observations
▪ 𝑑𝑓 for 𝑅𝑆𝑆 = 𝑁 − 𝑘
OPTIONAL
▪ Sometimes researchers play the game of “maximizing” 𝑅² (some people think that the higher
the 𝑅², the better the model. BUT THIS IS NOT NECESSARILY TRUE!)
▪ To avoid this temptation, 𝑅² should take into account the number of regressors
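A common way to make 𝑅² account for the number of regressors is the adjusted 𝑅². The definition below is the standard one and is given as a sketch; it assumes that $k$ counts all estimated coefficients, including the intercept, in line with $df = N - k$ for the residual sum of squares:

```latex
\[
\bar{R}^2 = 1 - (1 - R^2)\,\frac{N - 1}{N - k}
\]
```

Because $(N - 1)/(N - k)$ grows as $k$ grows, adding a regressor raises $\bar{R}^2$ only if it improves the fit by enough to offset the lost degree of freedom.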
▪ Note that the data is not the first argument of lm(), so if you use it in a magrittr pipe (%>%),
you have to write: model = lm(<formula>, data = .)
▪ The formula is: dependent variable ~ independent variable
▪ Try: model = lm(wage ~ edu, data = Z)
▪ Run the regression and display the result in one line of code
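The calls above can be tried as follows. This is a minimal sketch: the lecture's data frame Z is not available here, so a small simulated Z with hypothetical columns edu and wage stands in for it, and summary() is assumed as the way to display the result.

```r
# Simulated stand-in for the lecture's data frame Z (hypothetical values)
set.seed(1)
Z <- data.frame(edu = sample(6:18, 100, replace = TRUE))
Z$wage <- 2 + 0.5 * Z$edu + rnorm(100)

# Formula: dependent variable ~ independent variable
model <- lm(wage ~ edu, data = Z)
print(coef(model))

# Regression and display of the result in one line of code
summary(lm(wage ~ edu, data = Z))

# With the native pipe (R >= 4.2), the data placeholder is "_" instead of ".":
# Z |> lm(wage ~ edu, data = _)
```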
[Annotated regression output. Labels: Section 1: a summary of the distribution of residuals from the regression model; Total variance; P-value/2]
$E(ee' \mid X) = \sigma^2 I$
▪ This assumption is used to calculate the VCV of coefficients (which is used to calculate
standard errors).
▪ When this assumption is violated, for example under heteroskedasticity, the usual estimates of
the standard errors are biased.
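The link between this assumption and the VCV of the coefficients can be sketched as below. The derivation is the standard one and assumes the model $Y = X\beta + e$ with $E(e \mid X) = 0$; the estimator $\hat{\sigma}^2$ in the last line is the usual choice and is not taken from the slides:

```latex
\begin{align*}
\hat{\beta} &= (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'e \\
\operatorname{Var}(\hat{\beta} \mid X)
  &= (X'X)^{-1}X'\,E(ee' \mid X)\,X(X'X)^{-1} \\
  &= (X'X)^{-1}X'\,\sigma^2 I\,X(X'X)^{-1} \\
  &= \sigma^2 (X'X)^{-1} \\
\widehat{VCV} &= \hat{\sigma}^2 (X'X)^{-1},
  \qquad \hat{\sigma}^2 = \frac{\hat{e}'\hat{e}}{N - k},
  \qquad \hat{e} = Y - X\hat{\beta}
\end{align*}
```

If $E(ee' \mid X) \neq \sigma^2 I$, the third line no longer simplifies, which is why the usual standard errors are unreliable in that case.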
OPTIONAL