Lecture 3 - LRM

The document discusses linear regression analysis. It introduces linear regression, the idea of fitting a line of best fit to relate two variables, and finding the slope coefficient. It covers topics like the linear regression model and ordinary least squares estimation, goodness of fit measures, assumptions, hypothesis testing using t-tests and F-tests, and illustrates with an example dataset.


LINEAR REGRESSION

Nguyen Quang
[email protected]
THE IDEA BEHIND REGRESSION

• We want to relate two different variables – how does one affect the other?
• In particular, we want to know how much Y changes when X increases or decreases by 1 unit.
• To do so, we need a function of the form $Y = \beta X$, which tells us that when X increases by 1 unit, Y changes by $\beta$.

REGRESSION ANALYSIS

• Example:
  • What is your monthly income?
  • How much do you spend on bubble milk tea?
• Below is the data from a sample of 100 students.
• How much more does a student spend on bubble tea each month if his/her income increases by 1 mil. VND?
REGRESSION ANALYSIS

• How do we find the value of $\beta$ in this case?
• By fitting a line to the data.
• In particular, we try to find the line of best fit.
• What does best fit mean?
REGRESSION FUNCTION

• The most basic regression does exactly this.
• The method of ordinary least squares (OLS) minimizes the sum of the squared "distances".
THE LINEAR REGRESSION MODEL (LRM)

• The general form of the LRM is:

  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + e_i$

• Or, written in short form:

  $Y_i = \beta X_i + e_i$

• $Y$ is the regressand, or dependent/explained variable.
• $X$ is a vector of regressors, or independent/explanatory variables.
• $e$ is an error term/residual.
REGRESSION COEFFICIENTS

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + e_i$

• $\beta_0$ is the intercept/constant.
• $\beta_1$ to $\beta_k$ are the slope coefficients.
• In general, the $\beta$ are the regression coefficients or regression parameters. THEY ARE WHAT WE NEED TO ESTIMATE!
• Each slope coefficient measures the (partial) rate of change in the mean value of $Y$ for a unit change in the value of a regressor, ceteris paribus.
• Roughly speaking: $\beta_1$ tells us that when $X_1$ increases by one unit, $Y$ changes by $\beta_1$, other things (all other Xs) unchanged.
METHOD OF ORDINARY LEAST SQUARES

• The method of ordinary least squares (OLS) searches for the coefficients that minimize the residual sum of squares (RSS):

  $RSS = \sum_i e_i^2$

• We need a data set of Y and X to find $\beta$.
• Finding $\beta$ is an optimization problem (see the sketch below).
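
A minimal sketch of OLS in R, using simulated data so the example is self-contained (the true coefficients 2 and 3 are chosen arbitrarily for illustration):

# Minimal OLS sketch: simulate data with known coefficients, then fit by OLS
set.seed(1)
x <- rnorm(100)                # regressor
y <- 2 + 3 * x + rnorm(100)    # Y = beta0 + beta1*X + e, with beta0 = 2, beta1 = 3
fit <- lm(y ~ x)               # lm() computes the OLS solution
coef(fit)                      # estimates should be close to (2, 3)
sum(resid(fit)^2)              # the minimized residual sum of squares

lm() solves this minimization analytically; the estimates recover the simulated coefficients up to sampling noise.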
GOODNESS OF FIT: R²

• $R^2$, the coefficient of determination, is an overall measure of the goodness of fit of the estimated regression line.
• $R^2$ gives the percentage of the total variation in the dependent variable explained by the regressors:
  • Explained Sum of Squares: $ESS = \sum_i (\hat{Y}_i - \bar{Y})^2$
  • Residual Sum of Squares: $RSS = \sum_i e_i^2$
  • Total Sum of Squares: $TSS = \sum_i (Y_i - \bar{Y})^2$
  • Then: $R^2 = ESS/TSS = 1 - RSS/TSS$
• It is a value between 0 (no fit) and 1 (perfect fit); a higher $R^2$ indicates a better fit.
• When $R^2 = 1$, $RSS = 0$ and $\sum_i e_i^2 = 0$.
DEGREES OF FREEDOM

• $n$ is the total number of observations.
• $k$ is the total number of estimated coefficients.
• $df$ for $RSS$ = $n - k$.
GOODNESS OF FIT: ADJUSTED R²

• $R^2$ never decreases when more regressors are added.
• Sometimes researchers play the game of "maximizing" $R^2$ (some believe that the higher the $R^2$, the better the model. BUT THIS IS NOT NECESSARILY TRUE!)
• To avoid this temptation, $R^2$ should take into account the number of regressors.
• Such an $R^2$ is called an adjusted $R^2$, denoted $\bar{R}^2$ (R-bar squared), and is computed from the original (unadjusted) $R^2$ as follows (a numeric check follows below):

  $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k}$
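
Continuing the simulated fit from the OLS sketch above, both measures can be computed by hand and checked against summary():

# Compute R-squared and adjusted R-squared by hand (uses fit, x, y from above)
n   <- nobs(fit)                     # number of observations
k   <- length(coef(fit))             # number of estimated coefficients
tss <- sum((y - mean(y))^2)          # total sum of squares
rss <- sum(resid(fit)^2)             # residual sum of squares
r2     <- 1 - rss / tss              # R-squared
r2_adj <- 1 - (1 - r2) * (n - 1) / (n - k)   # adjusted R-squared
c(r2, r2_adj)                        # match summary(fit)$r.squared and $adj.r.squared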
ILLUSTRATION

DATA

A survey of 20,306 individuals in the U.S. Data file: lrm.xlsx.
• male     1 = male; 2 = female
• age      age (years)
• wage     wage (US$/hour)
• tenure   # years working for current employer
• union    1 = union member, 0 otherwise
• edu      years of schooling (years)
• married  1 = married or living together with a partner, 0 otherwise
IMPORTING DATA
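
The original slides show screenshots of this step. A minimal equivalent sketch in R, assuming the readxl package is installed and lrm.xlsx sits in the working directory:

# Read the survey data from Excel into a data frame Z
library(readxl)
Z <- as.data.frame(read_excel("lrm.xlsx"))
head(Z)   # quick look at the first few rows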
PREPARING AND DESCRIBING DATA
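
The corresponding slide shows the output of R's standard inspection functions; a sketch of that step:

str(Z)      # variable types and dimensions
summary(Z)  # mean, quartiles, and range for each variable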
DESCRIBING DUMMY VARIABLES

• For dummy variables, the mean and sd do not make a lot of sense.
• We present the frequency of each outcome instead.
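
For example, for the union dummy (a sketch using base R's table()):

table(Z$union)               # counts of each outcome
prop.table(table(Z$union))   # shares instead of counts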
MORE DETAILED DESCRIPTION: HISTOGRAM

hist(Z$wage, main = "Histogram of wage", xlab = "Wage", col = "yellow", breaks = 100, freq = TRUE)

Limit the range of the x axis to (0,100):

hist(Z$wage, main = "Histogram of wage", xlab = "Wage", col = "yellow", breaks = 1000, xlim = c(0,100))
SCATTER PLOT
plot(Z$edu,Z$wage, ylab = "Wage (US$/hour)", xlab = "Schooling years")
SCATTER PLOT
plot(Z$age,Z$wage, ylab = "Wage (US$/hour)", xlab = "Age (years)")
SCATTER PLOT
plot(Z$age,Z$wage, ylab = "Wage (US$/hour)", xlab = "Age (years)", ylim = c(0,100))
COMPARING WAGE BETWEEN GROUPS
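
The slide presents this comparison as a figure. A sketch of one way to produce it, using the union dummy as an assumed example grouping:

# Compare wage between union members and non-members (assumed grouping)
boxplot(wage ~ union, data = Z, xlab = "Union member", ylab = "Wage (US$/hour)")
tapply(Z$wage, Z$union, mean)   # group means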
REGRESSION RESULTS

One more schooling year results in a wage increase of about US$2/hour.
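
The estimation behind this slide can be reproduced along the following lines (a sketch; the exact regressor set used on the slide is not recoverable from the text, so a specification with the listed survey variables is assumed):

# Wage regression (assumed specification using the survey variables)
fit <- lm(wage ~ edu + age + tenure + union + married + male, data = Z)
summary(fit)   # the edu coefficient is the wage change per extra schooling year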
REGRESSION WITHOUT OUTLIERS
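
A sketch of one common approach, assuming "outliers" refers to the extreme hourly wages visible in the histogram (the US$100/hour cutoff is an assumption for illustration):

# Re-estimate after dropping extreme wage observations (cutoff is an assumption)
Z_trim <- subset(Z, wage < 100)
fit_trim <- lm(wage ~ edu + age + tenure + union + married + male, data = Z_trim)
summary(fit_trim)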
ASSUMPTIONS OF THE CLASSICAL LRM

1. Linear in parameters
2. Full rank
3. Regressors X are fixed (non-stochastic)
4. Exogeneity of X
5. Normal distribution of the error term
6. Homoskedasticity of the error term
7. No autocorrelation
8. No specification error
ASSUMPTIONS OF THE CLASSICAL LRM

• A1: The model is linear in the parameters.
• A2: The number of observations must be greater than the number of parameters, and there is no perfect multicollinearity, i.e., no perfect linear relationships among the $X$ variables.
• A3: The regressors $X$ are fixed or nonstochastic.
• A4: No correlation between $X$ and $e$, i.e., $E(eX) = 0$.
• A5: Given $X$, the expected value of the error term is zero, $E(e_i \mid X) = 0$, and the errors follow $N(0, \sigma^2)$.
• A6: Homoskedastic (constant) variance of the error term: $var(e_i \mid X) = \sigma^2$ is a constant.
• A7: No autocorrelation: $cov(e_i, e_j \mid X) = 0$ for $i \neq j$.
• A8: No specification bias.
GAUSS-MARKOV THEOREM

• On the basis of assumptions A1 to A8, the OLS method gives best linear unbiased estimators (BLUE):
• (1) The estimators are linear functions of the dependent variable $Y$.
• (2) The estimators are unbiased: in repeated applications of the method, the estimators equal their true values on average.
• (3) In the class of linear estimators, OLS estimators have minimum variance; i.e., they are efficient, or the "best" estimators.
HYPOTHESIS TESTING

• Testing an individual coefficient: t-test
• Testing multiple coefficients: F-test
TESTING INDIVIDUAL COEFFICIENT: T-TEST

• To test the following hypothesis:
  • $H_0: \beta_k = 0$
  • $H_1: \beta_k \neq 0$
• Calculate the following statistic and use the $t$ table to obtain the critical value with $n - k$ degrees of freedom for a given level of significance ($\alpha$, conventionally chosen at 10%, 5%, or 1%):

  $t = \frac{b_k}{se(b_k)}$

• If this value is greater than the critical $t$ value, we can reject $H_0$.
TESTING INDIVIDUAL COEFFICIENT: T-TEST

• Step 1: Form hypotheses
  • $H_0: \beta_j = 0$
  • $H_1: \beta_j \neq 0$
• Step 2: Determine the confidence interval, critical value $t^*$, region of rejection, and region of acceptance.
• Step 3: Calculate the test statistic:

  $t = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)}$

• Step 4: Decide.
TESTING INDIVIDUAL COEFFICIENT: T-TEST

• If $|t| > t^*_{\alpha/2,\,n-k}$, reject $H_0$ at the significance level $\alpha$ (see the sketch below).
• If $P_{value} < \alpha$, reject $H_0$ at the significance level $\alpha$.
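
A sketch of this decision rule in R, using the hypothetical wage regression fit from the regression-results sketch above and the edu coefficient:

# Manual t-test for the edu coefficient (hypothetical fit from above)
est <- coef(summary(fit))["edu", ]             # estimate, se, t value, p value
t_stat <- est["Estimate"] / est["Std. Error"]  # t = b / se(b)
t_crit <- qt(0.975, df = df.residual(fit))     # two-sided critical value, alpha = 5%
abs(t_stat) > t_crit                           # TRUE => reject H0: beta_edu = 0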
TESTING INDIVIDUAL COEFFICIENT: T-TEST

The hypothesis that schooling years have no impact on wage is rejected at the 10% level (and even at stricter significance levels).
TESTING MULTIPLE COEFFICIENTS: F-TEST

• Step 1: Form hypotheses
  • $H_0: \beta_{m+1} = \beta_{m+2} = \cdots = \beta_k = 0$
  • $H_1$: At least one $\beta$ is different from 0
• Step 2: Calculate the test statistic:

  $F = \frac{(RSS_R - RSS_U)/(df_R - df_U)}{RSS_U/df_U}$

  where $df_U = n - k$ and $df_R = n - m$ ($U$: unrestricted model with $k$ coefficients, $R$: restricted model with $m$ coefficients).
TESTING MULTIPLE COEFFICIENTS: F-TEST

• Step 3: Determine the critical value

  $F^*_{k-m,\,n-k}(\alpha)$

  • $(k - m)$ degrees of freedom for the numerator
  • $(n - k)$ degrees of freedom for the denominator
• Step 4: Decide
  • If $F_{calc} > F^*$, or
  • If $P_{value} = P(F > F_{calc}) < \alpha$,
  => reject $H_0$ at the significance level $\alpha$.
TESTING MULTIPLE COEFFICIENTS: F-TEST

The hypothesis that the coefficients on male, married, and age are simultaneously equal to zero is rejected at the 1% level (a sketch of the test follows).
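
In R, this joint test can be run by comparing nested models with base R's anova() (a sketch; the unrestricted specification is the assumed wage regression from above):

# Joint F-test: H0: the coefficients on male, married, and age are all zero
fit_u <- lm(wage ~ edu + age + tenure + union + married + male, data = Z)  # unrestricted
fit_r <- lm(wage ~ edu + tenure + union, data = Z)                         # restricted
anova(fit_r, fit_u)   # reports the F statistic and p-value for the restriction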
F-TEST FOR OVERALL SIGNIFICANCE

• The F-test for overall significance tests the null hypothesis that all slope coefficients are simultaneously equal to zero.

The hypothesis that all coefficients are equal to zero simultaneously is rejected at the 1% level.
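
This statistic is reported at the bottom of summary() output for an lm fit; it can also be extracted directly (a sketch, reusing fit_u from above):

summary(fit_u)$fstatistic   # F value, numerator df, denominator df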
