
ECONOMETRICS

SIMPLE REGRESSION
WEEK2

FALL 2024

Prof. Dr. Burç Ülengin


The Simple Regression
Model

y = β0 + β1x + u
Types of Data – Cross Sectional
➢ Cross-sectional data is a random sample
𝑦𝑖 𝑥𝑖
𝑦𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 + 𝑢𝑖
➢ Each observation is a new individual, firm,
etc., with information at a point in time

➢ If the data is not a random sample, we have


a sample-selection problem
3
Types of Data – Time Series
➢ Time series data has a separate observation for
each time period – e.g., stock prices, GDP
𝑦𝑡 𝑥𝑡
𝑦𝑡 = 𝛽0 + 𝛽1 𝑥𝑡 + 𝑢𝑡
➢ Since it is not a random sample, different
problems to consider

➢ Trends, seasonality, and dynamic behavior


will be important 4
Types of Data – Panel
➢ Can pool random cross-sections and treat
them similarly to a typical cross-section. We
will need to account for time differences.
𝑦𝑖𝑡 𝑥𝑖𝑡
𝑦𝑖𝑡 = 𝛽0 + 𝛽1 𝑥𝑖𝑡 + 𝑢𝑖𝑡
➢ Can follow the same random individual
observations over time – known as panel data
or longitudinal data
5
Some Terminology
➢In the simple linear regression model,
where y = β0 + β1x + u, we typically refer
to y as the
▪ Dependent Variable, or
▪ Left-Hand Side Variable, or
▪ Explained Variable, or
▪ Regressand

6
Some Terminology, cont.
➢In the simple linear regression of y on x, we
typically refer to x as the
▪ Independent Variable, or
▪ Right-Hand Side Variable, or
▪ Explanatory Variable, or
▪ Regressor, or
▪ Covariate, or
▪ Control Variable
7
EXAMPLE 1: Weight vs. Height
[Scatter plot: KILO (weight, kg) against BOY (height, cm)]
EXAMPLE 1: Weight vs. Height
[Scatter plot: KILO (weight, kg) against BOY (height, cm)]
Variance-covariance matrix (BOY, KILO):
   66.173    86.493
   86.493   156.353
Correlation = 0.85
EXAMPLE 1: Weight vs. Height
[Scatter plot of KILO against BOY with the fitted regression line]
EXAMPLE 1: Weight vs. Height vs. Gender
[Scatter plot of KILO against BOY, with males (▲) and females (●) marked separately]
Variance-covariance matrices by gender:
   25.464    19.953          79.524    35.147
   19.953    42.437          35.147    27.061
   Correlation = 0.61        Correlation = 0.76
EXAMPLE 1: Regression Lines – Weight vs. Height vs. Gender
Weight = β0 + β1 Height + u
[Scatter plot of KILO against BOY with a separate regression line fitted for each gender]
EXAMPLE 2: Capital Asset Pricing Model (CAPM)
ri = return of asset i, rm = return of the market, rf = risk-free interest rate
(ri − rf) = α + β(rm − rf) + ε
Var(ri − rf) = Var[α + β(rm − rf) + ε]
             = Var[α] + Var[β(rm − rf)] + Var[ε]
             = β² Var[(rm − rf)] + Var[ε]
Total Risk = Risk Originating from the Market + Risk Originating from the Firm
EXAMPLE 2: Capital Asset Pricing Model (CAPM)
[Scatter plot of the excess return RPET−RF against the market excess return RM−RF, with a fitted line; the slope of the line is β]
EXAMPLE 2: CAPM – Turkish Stock Exchange: BIST100, Akbank and Garanti stocks, 19/10/2020–18/10/2021
[Time-series plot of the BIST100 index and the Akbank and Garanti stock prices over the sample period]
EXAMPLE 2: CAPM – Turkish Stock Exchange: BIST100, Akbank and Garanti stocks, 19/10/2020–18/10/2021
[Scatter plots of DLOG(AKBANK) and DLOG(GARANTI) against DLOG(BIST100), each with a fitted regression line]
Return(AKBANK) = -0.00089 + 1.186*Return(BIST100)
Return(GARANTI) = -0.000055 + 1.253*Return(BIST100)


EXAMPLE 2: Risk Decomposition – Akbank and Garanti stocks, 19/10/2020–18/10/2021
Var(ri − rf) = β² Var[(rm − rf)] + Var[ε]
Total Risk = Risk Originating from the Market + Risk Originating from the Firm
AKBANK:  4.41 = 1.19² × 1.97 = 2.79  +  1.62 (37%)
GARANTI: 5.75 = 1.25² × 1.97 = 3.08  +  2.67 (46%)
Garanti is riskier than Akbank.
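The decomposition above can be reproduced mechanically. Below is a minimal Python/NumPy sketch, using simulated excess returns rather than the actual BIST100/Akbank series (which are not reproduced here), so the numbers are placeholders; β is estimated with the OLS formulas derived later in these slides.

import numpy as np

rng = np.random.default_rng(0)

# Simulated daily excess returns standing in for the real data
market_excess = rng.normal(0.0005, 0.014, 250)                    # r_m - r_f
stock_excess = 1.2 * market_excess + rng.normal(0, 0.012, 250)    # r_i - r_f

# OLS estimates: beta = Cov(x, y) / Var(x), alpha = ybar - beta * xbar
beta = np.cov(market_excess, stock_excess, ddof=1)[0, 1] / np.var(market_excess, ddof=1)
alpha = stock_excess.mean() - beta * market_excess.mean()

# Risk decomposition: Var(r_i - r_f) = beta^2 Var(r_m - r_f) + Var(eps)
total_risk = np.var(stock_excess, ddof=1)
market_risk = beta ** 2 * np.var(market_excess, ddof=1)
firm_risk = total_risk - market_risk

print(f"beta = {beta:.3f}, alpha = {alpha:.5f}")
print(f"total = {total_risk:.6f}, market = {market_risk:.6f}, firm = {firm_risk:.6f}")
print(f"firm-specific share = {firm_risk / total_risk:.0%}")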
Simple Regression Assumptions

𝑦𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 + 𝑢𝑖
➢The average value of u, the error term, in
the population is 0. That is,

◼ E(u) = 0

➢This is not a restrictive assumption, since we can always use β0 to normalize E(u) to 0

18
Simple Regression Assumptions
➢ Zero Conditional Mean
➢ We need to make a crucial assumption about
how u and x are related
➢ We want it to be the case that knowing
something about x does not give us any
information about u, so that they are
completely unrelated. That is, that
➢ E(u|x) = E(u) = 0, which implies
➢ E(y|x) = β0 + β1x
Simple Regression Assumptions
➢ E(y|x) as a linear function of x, where for any x
the distribution of y is centered about E(y|x)

20
Parameter Estimation:
Ordinary Least Squares
➢ Basic idea of regression is to estimate the
population parameters from a sample
➢ Let {(xi,yi): i=1, …,n} denote a random
sample of size n from the population
➢ For each observation in this sample, it will
be the case that
yi = β0 + β1xi + ui

21
SIMPLE REGRESSION MODEL
y = α + βx
[Diagram: the points Q1, Q2, Q3, Q4 at x1, x2, x3, x4 lie exactly on the line y = α + βx; α is the intercept]
If the relationship were an exact, deterministic one, the observations would lie on a straight line and we would have no trouble obtaining accurate estimates of α and β.
SIMPLE REGRESSION MODEL – Random Disturbances
[Scatter diagram: age of first child (y) against age of mother (x), with the line y = α + βx; the observations scatter around the line]
SIMPLE REGRESSION MODEL – Population
[Diagram: actual values P1–P4 scattered around the line y = α + βx; Q1–Q4 are the corresponding points on the line and u1–u4 are the vertical distances between them]
Actual value P = Deterministic part Q + Stochastic part u
To allow for such divergences, we will write the model as y = α + βx + u, where u is a disturbance term.
SIMPLE REGRESSION MODEL – Sample
y (actual value), ŷ (fitted value), y − ŷ = e (residual)
[Diagram: actual values y1–y4 and fitted values ŷ1–ŷ4 on the fitted line ŷ = a + bx; the residuals e1–e4 are the vertical distances between them; the intercept a and slope b are to be determined]
The discrepancies between the actual and fitted values of y are known as the residuals.
Deriving Linear
Regression Coefficients

27
DERIVING THE LINEAR REGRESSION COEFFICIENTS a AND b
Ordinary Least Squares (OLS)
yi = ŷi + ei = a + bxi + ei
Least squares criterion: minimize S, where
S = Σ ei² = e1² + ... + en²
To begin with, we will draw the fitted line so as to minimize the sum of the squares of the residuals. This is described as the least squares criterion.
HISTORICAL BACKGROUND
➢ GAUSS 1795 – early user, not published
➢ LEGENDRE 1805 – an explicit mathematical representation
➢ GAUSS 1809 – integrated with probability theory
➢ GALTON 1900 – first used the word "regression."

[Illustration: orbit estimation – estimation of the coefficients of the ellipse function]
DERIVING ORDINARY LEAST SQUARES (OLS)
Least squares criterion: minimize S, where
S = Σ ei² = e1² + ... + en²
Why not minimize Σ ei = e1 + ... + en?
➢ Why the squares of the residuals? Why not just minimize the sum of the residuals itself?
1. To avoid positive and negative residuals canceling each other out, and
2. To penalize large residuals more heavily than small ones.
DERIVING THE LINEAR REGRESSION COEFFICIENTS
True model: y = α + βx + u
Fitted line: ŷ = a + bx
[Diagram: the three observations plotted in the (x, y) plane]
This sequence shows how the regression coefficients for a simple regression model are derived, using the least squares criterion (OLS, for ordinary least squares).
We will start with a numerical example with just three observations: (1,3), (2,5), and (3,6).
DERIVING THE LINEAR REGRESSION COEFFICIENTS
True model: y = α + βx + u
Fitted line: ŷ = a + bx
[Diagram: for a candidate line with intercept a and slope b, the fitted values are ŷ1 = a + b, ŷ2 = a + 2b, ŷ3 = a + 3b]
Writing the fitted regression as ŷ = a + bx, we will determine the values of a and b that minimize the sum of the squares of the residuals.
DERIVING THE LINEAR REGRESSION COEFFICIENTS
y = ŷ + e,  y = (a + bx) + e,  e = y − a − bx
e1 = y1 − ŷ1 = 3 − a − b
e2 = y2 − ŷ2 = 5 − a − 2b
e3 = y3 − ŷ3 = 6 − a − 3b
Given our choice of a and b, the residuals are as shown.
DERIVING THE LINEAR REGRESSION COEFFICIENTS
S = e1² + e2² + e3² = (3 − a − b)² + (5 − a − 2b)² + (6 − a − 3b)²
  = 9 + a² + b² − 6a − 6b + 2ab
  + 25 + a² + 4b² − 10a − 20b + 4ab
  + 36 + a² + 9b² − 12a − 36b + 6ab
  = 70 + 3a² + 14b² − 28a − 62b + 12ab
The sum of the squares of the residuals is thus as shown above. The quadratics have been expanded and like terms added together.
DERIVING THE LINEAR REGRESSION COEFFICIENTS
S = 70 + 3a² + 14b² − 28a − 62b + 12ab
∂S/∂a = 0  ⇒  6a + 12b − 28 = 0
For a minimum, the partial derivatives of S with respect to a and b should be zero. (We should also check a second-order condition.)
DERIVING THE LINEAR REGRESSION COEFFICIENTS
S = 70 + 3a² + 14b² − 28a − 62b + 12ab
∂S/∂a = 0  ⇒  6a + 12b − 28 = 0
∂S/∂b = 0  ⇒  12a + 28b − 62 = 0
The first-order conditions give us two equations in two unknowns.
DERIVING THE LINEAR REGRESSION COEFFICIENTS
∂S/∂a = 0  ⇒  6a + 12b = 28
∂S/∂b = 0  ⇒  12a + 28b = 62
Solving them, we find that S is minimized when a = 1.67 and b = 1.50.
DERIVING THE LINEAR REGRESSION COEFFICIENTS
True model: y = α + βx + u
Fitted line: ŷ = 1.67 + 1.50x
Fitted values: ŷ1 = 3.17, ŷ2 = 4.67, ŷ3 = 6.17
The fitted line and the fitted values of y are as shown.
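The arithmetic of this three-observation example can be checked with a few lines of Python/NumPy; this is only a verification sketch, not part of the original slides.

import numpy as np

# Normal equations from the slides: 6a + 12b = 28 and 12a + 28b = 62
A = np.array([[6.0, 12.0],
              [12.0, 28.0]])
rhs = np.array([28.0, 62.0])
a, b = np.linalg.solve(A, rhs)
print(f"a = {a:.2f}, b = {b:.2f}")        # a = 1.67, b = 1.50

# Fitted values and residuals for the three observations (1,3), (2,5), (3,6)
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 6.0])
y_hat = a + b * x                          # 3.17, 4.67, 6.17
print(y_hat, ((y - y_hat) ** 2).sum())     # minimized sum of squared residuals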
DERIVING THE LINEAR REGRESSION COEFFICIENTS
Now we will do the same thing for the general case with n observations.
True model: y = α + βx + u
Fitted line: ŷ = a + bx
[Diagram: n observations (x1, y1), ..., (xn, yn) and a candidate fitted line with intercept a and slope b, so that ŷ1 = a + bx1 and ŷn = a + bxn]
Given our choice of a and b, we will obtain a fitted line as shown.
DERIVING THE LINEAR REGRESSION COEFFICIENTS
e1 = y1 − ŷ1 = y1 − a − bx1
.....
en = yn − ŷn = yn − a − bxn
The residual for the first observation is defined. Similarly, we define the residuals for the remaining observations.
DERIVING THE LINEAR REGRESSION COEFFICIENTS
Numerical example:
S = e1² + e2² + e3² = (3 − a − b)² + (5 − a − 2b)² + (6 − a − 3b)² = 70 + 3a² + 14b² − 28a − 62b + 12ab
General case:
S = e1² + ... + en² = (y1 − a − bx1)² + ... + (yn − a − bxn)²
  = y1² + a² + b²x1² − 2ay1 − 2bx1y1 + 2abx1
  + ...
  + yn² + a² + b²xn² − 2ayn − 2bxnyn + 2abxn
  = Σyi² + na² + b²Σxi² − 2aΣyi − 2bΣxiyi + 2abΣxi
The sum of the squares of the residuals is defined for the general case. The data for the numerical example are shown for comparison. The quadratics are expanded and like terms added together.
DERIVING THE LINEAR REGRESSION COEFFICIENTS
Numerical example: S = 70 + 3a² + 14b² − 28a − 62b + 12ab
∂S/∂a = 0 ⇒ 6a + 12b − 28 = 0;  ∂S/∂b = 0 ⇒ 12a + 28b − 62 = 0  ⇒  a = 1.67, b = 1.50
General case: S = Σyi² + na² + b²Σxi² − 2aΣyi − 2bΣxiyi + 2abΣxi
The first derivatives of S with respect to a and b provide us with two equations that can be used to determine a and b.
Note that in this situation the observations on x and y are just data which determine the coefficients in the expression for S.
The choice variables in the expression are a and b. This may seem a bit strange because in elementary calculus courses a and b are always constants and x and y are variables.
DERIVING THE LINEAR REGRESSION COEFFICIENTS
S = Σyi² + na² + b²Σxi² − 2aΣyi − 2bΣxiyi + 2abΣxi
∂S/∂a = 0  ⇒  2na − 2Σyi + 2bΣxi = 0
            ⇒  na = Σyi − bΣxi  ⇒  a = ȳ − bx̄
The first derivative with respect to a. With some simple manipulation we obtain a tidy expression for a.
DERIVING THE LINEAR REGRESSION COEFFICIENTS
The first derivative with respect to b:
S = Σyi² + na² + b²Σxi² − 2aΣyi − 2bΣxiyi + 2abΣxi
∂S/∂b = 0  ⇒  2bΣxi² − 2Σxiyi + 2aΣxi = 0
Divide through by 2:
bΣxi² − Σxiyi + aΣxi = 0
DERIVING THE LINEAR REGRESSION COEFFICIENTS
We now substitute for a using the expression obtained for it, and we thus obtain an equation that contains b only.
a = ȳ − bx̄
bΣxi² − Σxiyi + aΣxi = 0
bΣxi² − Σxiyi + (ȳ − bx̄)Σxi = 0
DERIVING THE LINEAR REGRESSION COEFFICIENTS
The definition of the sample mean has been used: x̄ = Σxi / n, so Σxi = nx̄.
bΣxi² − Σxiyi + (ȳ − bx̄)nx̄ = 0
Terms not involving b have been transferred to the right side and the equation has been divided through by n:
b(Σxi² − nx̄²) = Σxiyi − nx̄ȳ
b[(1/n)Σxi² − x̄²] = (1/n)Σxiyi − x̄ȳ
DERIVING THE LINEAR REGRESSION COEFFICIENTS
b[(1/n)Σxi² − x̄²] = (1/n)Σxiyi − x̄ȳ
b Var(x) = Cov(x, y)
Hence we obtain a tidy expression for b:
b = Cov(x, y) / Var(x)
DERIVING THE LINEAR REGRESSION COEFFICIENTS
True model: y = α + βx + u
Fitted line: ŷ = a + bx
a = ȳ − bx̄
b = Cov(x, y) / Var(x)
a = ȳ − bx̄  ⇒  ȳ = a + bx̄, so the fitted line passes through the point of sample means (x̄, ȳ).
The expression for a is standard, and we will soon see that it generalizes easily. There are various ways of writing the expression for b.
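For concreteness, here is a minimal Python/NumPy sketch applying these formulas to made-up data; the data and the cross-check against np.polyfit are illustrative assumptions, not part of the slides.

import numpy as np

def ols_simple(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Intercept and slope from a = ybar - b*xbar and b = Cov(x, y) / Var(x)."""
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    a = y.mean() - b * x.mean()
    return a, b

# Made-up data: any x with nonzero sample variance will do
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)

a, b = ols_simple(x, y)
print(f"a = {a:.3f}, b = {b:.3f}")

# Cross-check against NumPy's built-in least-squares line fit
b_np, a_np = np.polyfit(x, y, deg=1)
print(np.isclose(a, a_np), np.isclose(b, b_np))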
Summary of OLS Slope Estimate

➢The slope estimate is the sample covariance


between x and y divided by the sample
variance of x
◼ If x and y are positively correlated, the

slope will be positive


◼ If x and y are negatively correlated, the

slope will be negative


➢ Only need x to vary in our sample
58
Interpretation of Regression Equation

INTERPRETATION OF A LINEAR REGRESSION EQUATION
ŷ = β0 + β1x
x* = x + 1
ŷ* = β0 + β1x* = β0 + β1(x + 1) = (β0 + β1x) + β1 = ŷ + β1
If x increases by 1 unit, ŷ changes by β1 units.
INTERPRETATION OF A REGRESSION EQUATION
[Scatter diagram: hourly earnings ($) against highest grade completed]
The scatter diagram shows hourly earnings in 1994 plotted against highest grade completed for a sample of 570 respondents from the National Longitudinal Survey of Youth.
Highest grade completed means just that for elementary and high school. Grades 13, 14, and 15 mean completion of one, two, and three years of college.
Grade 16 means completion of four-year college. Higher grades indicate years of postgraduate education.
INTERPRETATION OF A REGRESSION
EQUATION
. reg earnings hgc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 1, 568) = 65.64
Model | 3977.38016 1 3977.38016 Prob > F = 0.0000
Residual | 34419.6569 568 60.5979875 R-squared = 0.1036
---------+------------------------------ Adj R-squared = 0.1020
Total | 38397.0371 569 67.4816117 Root MSE = 7.7845

------------------------------------------------------------------------------
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | 1.073055 .1324501 8.102 0.000 .8129028 1.333206
_cons | -1.391004 1.820305 -0.764 0.445 -4.966354 2.184347
------------------------------------------------------------------------------

This is the output from a regression of earnings on highest grade completed, using Stata.
Units: HGC = years of schooling (highest grade completed); EARNINGS = hourly earnings in $.
INTERPRETATION OF A REGRESSION EQUATION
. reg earnings hgc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 1, 568) = 65.64
Model | 3977.38016 1 3977.38016 Prob > F = 0.0000
Residual | 34419.6569 568 60.5979875 R-squared = 0.1036
---------+------------------------------ Adj R-squared = 0.1020
Total | 38397.0371 569 67.4816117 Root MSE = 7.7845

------------------------------------------------------------------------------
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | 1.073055 .1324501 8.102 0.000 .8129028 1.333206
_cons | -1.391004 1.820305 -0.764 0.445 -4.966354 2.184347
------------------------------------------------------------------------------

For the time being, we will be concerned only with the estimates of the
parameters. The variables in the regression are listed in the first
column and the second column gives the estimates of their coefficients.
In this case there is only one variable, HGC, and its coefficient is 1.073.
The estimate of the intercept is -1.391. _cons, in Stata, refers to the
constant. 63
INTERPRETATION OF A REGRESSION EQUATION
EARNINGS^ = -1.391 + 1.073 HGC
[Scatter diagram of hourly earnings against highest grade completed, with the fitted regression line]
Here is the scatter diagram again, with the regression line shown. What do the coefficients actually mean?
INTERPRETATION OF A REGRESSION EQUATION
EARNINGS^ = -1.391 + 1.073 HGC
[Scatter diagram with the fitted regression line]
To answer this question, you must refer to the units in which the variables are measured.
HGC is measured in years (strictly speaking, grades completed), EARNINGS in dollars per hour. So the slope coefficient implies that hourly earnings increase by $1.07 for each extra year of schooling.
INTERPRETATION OF A REGRESSION EQUATION
EARNINGS^ = -1.391 + 1.073 HGC
[Scatter diagram with the fitted regression line; a section of the diagram is marked for enlargement]
We will look at a geometrical representation of this interpretation. To do this, we will enlarge the marked section of the scatter diagram.
INTERPRETATION OF A REGRESSION EQUATION
[Enlarged section of the diagram: moving from grade 11 to grade 12 raises the fitted value of hourly earnings from $10.41 to $11.49, an increase of $1.07]
The regression line indicates that completing 12th grade instead of 11th grade would increase earnings by $1.073, from $10.413 to $11.486, as a general tendency.
INTERPRETATION OF A REGRESSION EQUATION
EARNINGS^ = -1.391 + 1.073 HGC
[Scatter diagram with the fitted regression line]
You should ask yourself whether this is a plausible figure. If it is implausible, this could be a sign that your model is misspecified in some way.
For low levels of education it might be plausible, but for high levels it would seem to be an underestimate.
INTERPRETATION OF A REGRESSION EQUATION
EARNINGS^ = -1.391 + 1.073 HGC
[Scatter diagram with the fitted regression line]
What about the constant term? (Try to answer this question yourself before continuing with this sequence.)
Literally, the constant indicates that an individual with no years of education would have to pay $1.39 per hour to be allowed to work.
INTERPRETATION OF A REGRESSION EQUATION
EARNINGS^ = -1.391 + 1.073 HGC
[Scatter diagram with the fitted regression line]
This does not make any sense at all. In former times craftsmen might require an initial payment when taking on an apprentice, and might pay the apprentice little or nothing for quite a while, but an interpretation of negative payment is impossible to sustain.
INTERPRETATION OF A REGRESSION EQUATION
EARNINGS^ = -1.391 + 1.073 HGC
[Scatter diagram with the fitted regression line]
A safe solution to the problem is to limit the interpretation to the range of the sample data, and to refuse to extrapolate on the ground that we have no evidence outside the data range.
With this explanation, the only function of the constant term is to enable you to draw the regression line at the correct height on the scatter diagram. It has no meaning of its own.
INTERPRETATION OF A REGRESSION EQUATION
[Scatter diagram of hourly earnings against highest grade completed]
Another solution is to explore the possibility that the true relationship is nonlinear. We will soon extend the regression technique to fit nonlinear models.
Algebraic Properties of OLS
➢ The sum of the OLS residuals is zero.
Thus, the sample average of the OLS
residuals is zero as well
➢ The sample covariance between the
regressors and the OLS residuals is zero
➢ The OLS regression line always goes
through the mean of the sample

77
More Terminology
We can think of each observation as being made up of an explained part and an unexplained part,
yi = ŷi + ûi. We then define the following:
Σ(yi − ȳ)² is the total sum of squares (SST)
Σ(ŷi − ȳ)² is the explained sum of squares (SSE)
Σûi² is the residual sum of squares (SSR)
Then SST = SSE + SSR
Goodness-of-Fit R²

• How do we think about how well our sample regression line fits our sample data?

• We can compute the fraction of the total sum of squares (SST) that is explained by the model; call this the R-squared of the regression:

R² = SSE / SST = 1 − SSR / SST
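As an illustration of the decomposition and of the two equivalent ways of computing R², here is a minimal Python/NumPy sketch on made-up data (the data-generating line and sample size are arbitrary assumptions):

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 20, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)

# OLS fit using the formulas from the slides
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
y_hat = a + b * x
resid = y - y_hat

# Sum-of-squares decomposition
sst = ((y - y.mean()) ** 2).sum()       # total sum of squares
sse = ((y_hat - y.mean()) ** 2).sum()   # explained sum of squares
ssr = (resid ** 2).sum()                # residual sum of squares

print(np.isclose(sst, sse + ssr))       # SST = SSE + SSR
print(sse / sst, 1 - ssr / sst)         # two equivalent expressions for R^2
print(np.corrcoef(y, y_hat)[0, 1] ** 2) # equals the squared correlation of y and y-hat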
Goodness-of-Fit R²
ei = yi − ŷi  ⇒  yi = ŷi + ei
Var(y) = Var(ŷ + e) = Var(ŷ) + Var(e) + 2Cov(ŷ, e) = Var(ŷ) + Var(e)
(1/n)Σ(y − ȳ)² = (1/n)Σ(ŷ − ȳ)² + (1/n)Σ(e − ē)²
Σ(y − ȳ)² = Σ(ŷ − ȳ)² + Σe²
TSS = ESS + RSS
R² = ESS/TSS = Σ(ŷi − ȳ)² / Σ(yi − ȳ)² = 1 − Σei² / Σ(yi − ȳ)²
The main criterion of goodness of fit, formally described as the coefficient of determination but usually referred to as R², is defined as the ratio of ESS to TSS, that is, the proportion of the variance of y explained by the regression equation.
Goodness-of-Fit R²
r(y, ŷ) = Cov(y, ŷ) / √[Var(y) Var(ŷ)]
        = Cov([ŷ + e], ŷ) / √[Var(y) Var(ŷ)]
        = [Cov(ŷ, ŷ) + Cov(e, ŷ)] / √[Var(y) Var(ŷ)]
        = Var(ŷ) / √[Var(y) Var(ŷ)]
        = √[Var(ŷ) / Var(y)]
R² = ESS/TSS = Σ(ŷi − ȳ)² / Σ(yi − ȳ)² = [(1/n)Σ(ŷi − ȳ)²] / [(1/n)Σ(yi − ȳ)²] = Var(ŷ) / Var(y)
Thus the correlation coefficient between y and ŷ is the square root of R². It follows that R² is maximized by the use of the least squares principle to determine the regression coefficients.
Goodness-of-Fit R2
. reg earnings hgc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 1, 568) = 65.64
Model | 3977.38016 1 3977.38016 Prob > F = 0.0000
Residual | 34419.6569 568 60.5979875 R-squared = 0.1036
---------+------------------------------ Adj R-squared = 0.1020
Total | 38397.0371 569 67.4816117 Root MSE = 7.7845

------------------------------------------------------------------------------
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | 1.073055 .1324501 8.102 0.000 .8129028 1.333206
_cons | -1.391004 1.820305 -0.764 0.445 -4.966354 2.184347
------------------------------------------------------------------------------

In this case there is only one variable, HGC, and its coefficient is 1.073. _cons, in Stata, refers to the constant. The estimate of the intercept is -1.391.
Variation in HGC explains 10.4% of the variation in EARNINGS (R-squared = 0.1036).
Properties of OLS
yi = β0 + β1xi + ui
If the assumptions hold, the OLS estimator is
◼ Unbiased
◼ Efficient
➢ BLUE – Best Linear Unbiased Estimator
UNBIASEDNESS AND EFFICIENCY
[Diagram: four panels of estimates scattered around the true value.
 Top left – unbiased and efficient (OLS): estimates tightly clustered around the true value.
 Top right – unbiased but inefficient: centered on the true value but widely scattered.
 Bottom left – biased but efficient: tightly clustered away from the true value.
 Bottom right – biased and inefficient: widely scattered away from the true value.]
OLS ASSUMPTIONS
1. Assumptions on the disturbances
   a. Random disturbances have zero mean: E[ui] = 0
   b. Homoskedasticity: Var(ui) = σ²
   c. No serial correlation: Cov(ui, uj) = 0 for i ≠ j
2. Assumptions on the model and its parameters
   a. Constant parameters
   b. Linear model
3. Assumption on the probability distribution
   a. Normal distribution: ui ~ N(0, σ²)
4. Assumptions on the regressors
   a. Fixed (nonstochastic) regressors
REGRESSION
COEFFICIENTS

93
REGRESSION COEFFICIENTS AS RANDOM VARIABLES
y = α + βx + u
ŷ = a + bx
b = Cov(x, y) / Var(x) = Cov(x, [α + βx + u]) / Var(x)
  = [Cov(x, α) + Cov(x, βx) + Cov(x, u)] / Var(x)
  = [0 + βCov(x, x) + Cov(x, u)] / Var(x)
b = β + Cov(x, u) / Var(x)
The error component Cov(x, u)/Var(x) depends on the value of the disturbance term in every observation in the sample, and thus b is a special type of random variable.
We will investigate its effect on b in two ways: first, directly, using a Monte Carlo experiment, and, second, analytically.
REGRESSION COEFFICIENTS AS RANDOM VARIABLES
Choose a model in which y is determined by x, the parameter values, and u: y = α + βx + u
Choose the data for x: x = 1, 2, ..., 20
Choose the parameter values: α = 2.0, β = 0.5
Choose a distribution for u: independent N(0, 1)
Model: y = 2.0 + 0.5x + u
Generate the values of y.
Estimators: b = Cov(x, y)/Var(x);  a = ȳ − bx̄
Estimate the values of the parameters.
We will then regress y on x using the OLS estimation technique and see how well our estimates a and b correspond to the true values α and β.
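A minimal Python/NumPy sketch of this Monte Carlo experiment, assuming the same design as the slides (x = 1, ..., 20, α = 2.0, β = 0.5, u ~ N(0, 1)); the number of replications is an arbitrary choice:

import numpy as np

rng = np.random.default_rng(42)
x = np.arange(1, 21, dtype=float)     # x = 1, 2, ..., 20
alpha, beta = 2.0, 0.5                # true parameter values

def one_replication() -> tuple[float, float]:
    """Generate y = 2.0 + 0.5x + u with u ~ N(0, 1) and return the OLS estimates (a, b)."""
    u = rng.normal(0.0, 1.0, x.size)
    y = alpha + beta * x + u
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    a = y.mean() - b * x.mean()
    return a, b

estimates = np.array([one_replication() for _ in range(10_000)])
print(estimates.mean(axis=0))   # close to (2.0, 0.5): the estimator is unbiased
print(estimates.std(axis=0))    # sampling standard deviations of a and b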
REGRESSION COEFFICIENTS AS RANDOM VARIABLES
y = 2.0 + 0.5x + u
x    2.0+0.5x   u   y        x    2.0+0.5x   u   y
1    2.5                     11   7.5
2    3.0                     12   8.0
3    3.5                     13   8.5
4    4.0                     14   9.0
5    4.5                     15   9.5
6    5.0                     16   10.0
7    5.5                     17   10.5
8    6.0                     18   11.0
9    6.5                     19   11.5
10   7.0                     20   12.0
Given our choice of numbers for α and β, we can derive the nonstochastic component of y. (The u and y columns are filled in once the disturbances are generated.)
REGRESSION COEFFICIENTS AS RANDOM VARIABLES
y = 2.0 + 0.5x
[Plot of the nonstochastic component y = 2.0 + 0.5x against x]
The nonstochastic component is displayed graphically.
REGRESSION COEFFICIENTS AS RANDOM VARIABLES
y = 2.0 + 0.5x + u
x    2.0+0.5x   u      y         x    2.0+0.5x   u      y
1    2.5       -0.59   1.91      11   7.5        1.59   9.09
2    3.0       -0.24   2.76      12   8.0       -0.92   7.08
3    3.5       -0.83   2.67      13   8.5       -0.71   7.79
4    4.0        0.03   4.03      14   9.0       -0.25   8.75
5    4.5       -0.38   4.12      15   9.5        1.69  11.19
6    5.0       -2.19   2.81      16   10.0       0.15  10.15
7    5.5        1.03   6.53      17   10.5       0.02  10.52
8    6.0        0.24   6.24      18   11.0      -0.11  10.89
9    6.5        2.53   9.03      19   11.5      -0.91  10.59
10   7.0       -0.13   6.87      20   12.0       1.42  13.42
Next, we generate randomly a value of the disturbance term for each observation using an N(0, 1) distribution (normal with zero mean and unit variance). We generate values of y for all 20 observations.
REGRESSION COEFFICIENTS AS RANDOM VARIABLES
[Scatter plot of the 20 generated observations (x, y)]
The 20 observations are displayed graphically.
REGRESSION COEFFICIENTS AS RANDOM VARIABLES
ŷ = 2.52 + 0.48x
[Scatter plot of the 20 observations with the fitted line ŷ = 2.52 + 0.48x]
This time the slope coefficient has been underestimated and the intercept overestimated.
REGRESSION COEFFICIENTS AS RANDOM VARIABLES
ŷ = 2.13 + 0.45x
[Scatter plot of the 20 observations with the fitted line ŷ = 2.13 + 0.45x]
As last time, the slope coefficient has been underestimated and the intercept overestimated.
REGRESSION COEFFICIENTS AS RANDOM VARIABLES
replication    a      b
1              1.63   0.54
2              2.52   0.48
3              2.13   0.45
4              2.14   0.50
5              1.71   0.56
6              1.81   0.51
7              1.72   0.56
8              3.18   0.41
9              1.26   0.58
10             1.94   0.52
[Histogram of the slope estimates b from these replications]
The table summarizes the results of the three regressions and adds those obtained by repeating the process a further seven times.
REGRESSION COEFFICIENTS AS RANDOM VARIABLES

1-10 11-20 21-30 31-40 41-50

0.54 0.49 0.54 0.52 0.49


0.48 0.54 0.46 0.47 0.50
0.45 0.49 0.45 0.54 0.48
0.50 0.54 0.50 0.53 0.44
0.56 0.54 0.41 0.51 0.53
0.51 0.52 0.53 0.51 0.48
0.56 0.49 0.53 0.47 0.47
0.41 0.53 0.47 0.55 0.50
0.58 0.60 0.51 0.51 0.53
0.52 0.48 0.47 0.58 0.51

Here are the estimates of β obtained with 40 further replications of the process. (Each column lists the 10 slope estimates for the replications indicated in its header.)
REGRESSION COEFFICIENTS AS RANDOM VARIABLES
[Histogram of the slope estimates from 50 replications]
The histogram is beginning to display a central tendency.
REGRESSION COEFFICIENTS AS RANDOM VARIABLES
[Histogram of the slope estimates from 100 replications]
REGRESSION COEFFICIENTS AS RANDOM VARIABLES
[Histogram of the slope estimates from 100 replications, with the limiting distribution drawn as a red curve]
This is the histogram with 100 replications. We can see that the distribution appears to be symmetrical around the true value, implying that the estimator is unbiased.
The red curve shows the limiting shape of the distribution. It is symmetrical around the true value, confirming that the estimator is unbiased.
The distribution is normal because the disturbance term was drawn from a normal distribution.
OLS ASSUMPTIONS
1. Assumptions on the regressors
   a. Fixed (nonstochastic) regressors
2. Assumptions on the disturbances
   a. Random disturbances have zero mean: E[ui] = 0
   b. Homoskedasticity: Var(ui) = σ²
   c. No serial correlation: Cov(ui, uj) = 0 for i ≠ j
3. Assumptions on the model and its parameters
   a. Constant parameters
   b. Linear model
4. Assumption on the probability distribution
   a. Normal distribution: u ~ N(0, σ²)
Variance of the OLS Estimators
➢ Now we know that the sampling distribution of our estimate is centered around the true parameter
➢ We want to think about how spread out this distribution is
➢ It is much easier to think about this variance under an additional assumption, so
➢ Assume Var(u|x) = σ² (homoskedasticity)

Variance of OLS (cont.)
➢ Var(u|x) = E(u²|x) − [E(u|x)]²
➢ E(u|x) = 0, so σ² = E(u²|x) = E(u²) = Var(u)
➢ Thus σ² is also the unconditional variance, called the error variance
➢ σ, the square root of the error variance, is called the standard deviation of the error
➢ We can say: E(y|x) = β0 + β1x and Var(y|x) = σ²
Homoskedastic Case
[Diagram: conditional densities f(y|x) at x1 and x2, centered on the line E(y|x) = β0 + β1x, all with the same spread]
Heteroskedastic Case
[Diagram: conditional densities f(y|x) at x1, x2, x3, centered on the line E(y|x) = β0 + β1x, with the spread increasing in x]
PRECISION OF THE REGRESSION COEFFICIENTS
Simple regression model: y = α + βx + u
Variances of the regression coefficients:
pop.var(a) = (σu²/n) [1 + x̄²/Var(x)]
pop.var(b) = σu² / (n Var(x))
The variances are inversely proportional to n, the number of observations in the sample. The more information you have, the more accurate your estimates are likely to be.
The variances are proportional to σu², the variance of the disturbance term. The bigger the luck factor, the worse the estimates are likely to be, other things being equal.
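The formula pop.var(b) = σu²/(n Var(x)) can be checked by simulation. A minimal Python/NumPy sketch, reusing the y = 3.0 + 0.8x design mentioned on the next slide (the number of replications and σu = 1 are arbitrary assumptions):

import numpy as np

rng = np.random.default_rng(3)
x = np.arange(1, 21, dtype=float)
sigma_u = 1.0
n = x.size

def slope_estimate() -> float:
    y = 3.0 + 0.8 * x + rng.normal(0.0, sigma_u, n)
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

b_draws = np.array([slope_estimate() for _ in range(20_000)])

# Theoretical variance: sigma_u^2 / (n * Var(x)), with Var(x) = (1/n) * sum((x - xbar)^2)
var_b_theory = sigma_u ** 2 / (n * np.var(x))   # np.var uses ddof=0, i.e. divides by n
print(b_draws.var(), var_b_theory)              # the two should be close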
PRECISION OF THE REGRESSION COEFFICIENTS
Simple regression model: y = α + βx + u;  pop.var(b) = σu² / (n Var(x))
[Two scatter diagrams with the same x values and the same underlying line y = 3.0 + 0.8x (dotted), each with a fitted regression line (solid)]
This is illustrated by the diagrams above. The nonstochastic component of the relationship, y = 3.0 + 0.8x, represented by the dotted line, is the same in both diagrams.
The values of x are the same, and the same random numbers have been used to generate the values of the disturbance term in the 20 observations.
However, in the right-hand diagram the random numbers have been multiplied by a factor of 5. As a consequence, the regression line, the solid line, is a much poorer approximation to the nonstochastic relationship.
PRECISION OF THE REGRESSION COEFFICIENTS
Simple regression model: y = α + βx + u;  pop.var(b) = σu² / (n Var(x))
[Two scatter diagrams with the same disturbances but with the x values much more spread out in the left-hand diagram than in the right-hand one]
In the diagrams above, the nonstochastic component of the relationship is the same and the same random numbers have been used for the 20 values of the disturbance term.
However, Var(x) is much smaller in the right-hand diagram because the values of x are much closer together.
Hence in that diagram the position of the regression line is more sensitive to the values of the disturbance term, and as a consequence the regression line is likely to be relatively inaccurate.
PRECISION OF THE REGRESSION COEFFICIENTS
Simple regression model: y = α + βx + u
Variances of the regression coefficients
. reg earnings hgc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 1, 568) = 65.64
Model | 3977.38016 1 3977.38016 Prob > F = 0.0000
Residual | 34419.6569 568 60.5979875 R-squared = 0.1036
---------+------------------------------ Adj R-squared = 0.1020
Total | 38397.0371 569 67.4816117 Root MSE = 7.7845

------------------------------------------------------------------------------
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | 1.073055 .1324501 8.102 0.000 .8129028 1.333206
_cons | -1.391004 1.820305 -0.764 0.445 -4.966354 2.184347
------------------------------------------------------------------------------

The standard errors of the coefficients always appear as part of the


output of a regression. Here is the regression of hourly earnings on
years of schooling discussed in a previous sequence. The standard
errors appear in a column to the right of the coefficients.
116
Variance of OLS: Summary
➢ The larger the error variance, σ², the larger the variance of the slope estimate
➢ The larger the variability in the xi, the smaller the variance of the slope estimate
➢ As a result, a larger sample size should decrease the variance of the slope estimate
➢ Problem: the error variance is unknown
Estimating the Error Variance
➢ We don't know what the error variance, σ², is, because we don't observe the errors, ui
➢ What we observe are the residuals, ûi
➢ We can use the residuals to form an estimate of the error variance
Error Variance Estimate (cont.)
ûi = yi − β̂0 − β̂1xi
   = (β0 + β1xi + ui) − β̂0 − β̂1xi
   = ui − (β̂0 − β0) − (β̂1 − β1)xi
Then an unbiased estimator of σ² is
σ̂² = [1/(n − 2)] Σûi² = SSR / (n − 2)
Error Variance Estimate (cont.)
σ̂ = √σ̂² is the standard error of the regression.
Recall that sd(β̂1) = σ / sx, where sx = [Σ(xi − x̄)²]^(1/2).
If we substitute σ̂ for σ, then we have the standard error of β̂1:
se(β̂1) = σ̂ / [Σ(xi − x̄)²]^(1/2)
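A minimal Python/NumPy sketch of these estimates on made-up data (the data-generating process is an arbitrary assumption for illustration):

import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 80)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 80)

n = x.size
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

ssr = (resid ** 2).sum()
sigma2_hat = ssr / (n - 2)                               # unbiased estimator of the error variance
se_reg = np.sqrt(sigma2_hat)                             # standard error of the regression
se_b1 = se_reg / np.sqrt(((x - x.mean()) ** 2).sum())    # se(beta1-hat)

print(f"sigma-hat = {se_reg:.3f}, se(b1) = {se_b1:.4f}")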
TESTING A HYPOTHESIS
RELATING TO A
REGRESSION COEFFICIENT
TESTING A HYPOTHESIS RELATING TO A REGRESSION COEFFICIENT
Model:                    y = α + βx + u
Null hypothesis:          H0: β = β0
Alternative hypothesis:   H1: β ≠ β0
This sequence describes the testing of a hypothesis at the 5% and 1% significance levels. It also defines what is meant by a Type I error.
We will suppose that we have the standard simple regression model and that we wish to test the hypothesis H0 that the slope coefficient is equal to some value β0.
The hypothesis being tested is described as the null hypothesis. We test it against the alternative hypothesis H1, which is simply that β is not equal to β0.
TESTING A HYPOTHESIS RELATING TO A REGRESSION COEFFICIENT
Model:                    y = α + βx + u
Null hypothesis:          H0: β = β0
Alternative hypothesis:   H1: β ≠ β0
Example model:            p = α + βw + u
Null hypothesis:          H0: β = 1.0
Alternative hypothesis:   H1: β ≠ 1.0
As an illustration, we will consider a model relating price inflation to wage inflation. p is the rate of growth of prices and w is the rate of growth of wages.
We will test the hypothesis that the rate of price inflation is equal to the rate of wage inflation. The null hypothesis is therefore H0: β = 1.0. (We should also test α = 0.)
TESTING A HYPOTHESIS RELATING TO A REGRESSION COEFFICIENT
Decision rule (5% significance level): Reject H0: β = β0
• if b > β0 + 1.96 s.d.
• if b < β0 − 1.96 s.d.
Equivalently, with Z = (b − β0)/s.d., reject H0 if |Z| > 1.96.
[Diagram: probability density function of b under H0, with 2.5% rejection regions in each tail beyond β0 ± 1.96 s.d.]
Thus we would reject H0 if the estimate were 1.96 standard deviations (or more) above or below the hypothetical mean.
With the present test, if the null hypothesis is true, a Type I error will occur 5% of the time because 5% of the time we will get estimates in the upper or lower 2.5% tails.
TESTING A HYPOTHESIS RELATING TO A REGRESSION COEFFICIENT
Type I error: rejection of H0 when it is in fact true.
Probability of a Type I error: in this case, 5%. The significance level of the test is 5%.
[Diagram: density of b under H0, with the acceptance region β0 ± 1.96 s.d. and 2.5% rejection regions in each tail]
The significance level of a test is defined to be the probability of making a Type I error if the null hypothesis is true.
TESTING A HYPOTHESIS RELATING TO A REGRESSION COEFFICIENT
Decision rule (1% significance level): Reject H0: β = β0
(1) if b > β0 + 2.58 s.d.   (2) if b < β0 − 2.58 s.d.
(1) if Z > 2.58             (2) if Z < −2.58
Acceptance region for b: β0 − 2.58 s.d. < b < β0 + 2.58 s.d., i.e. −2.58 < Z < 2.58, where Z = (b − β0)/s.d.
[Diagram: density of b under H0, with 0.5% rejection regions in each tail beyond β0 ± 2.58 s.d.]
The 0.5% tails of a normal distribution start 2.58 standard deviations from the mean, so we now reject the null hypothesis if Z is greater than 2.58 in absolute terms.
TESTING A HYPOTHESIS RELATING TO A REGRESSION COEFFICIENT
Type I error: rejection of H0 when it is in fact true.
Probability of a Type I error: in this case, 1%. The significance level of the test is 1%.
[Diagram: density of b under H0, with the acceptance region β0 ± 2.58 s.d. and 0.5% rejection regions in each tail]
Since the probability of making a Type I error, if the null hypothesis is true, is now only 1%, the test is said to be a 1% significance test.
t TEST OF A HYPOTHESIS
RELATING TO A REGRESSION
COEFFICIENT
t TEST OF A HYPOTHESIS RELATING TO A REGRESSION COEFFICIENT
s.d. of b known: the discrepancy between the hypothetical value and the sample estimate, in terms of s.d., is
  Z = (b − β0) / s.d.
  5% significance test: reject H0: β = β0 if Z > 1.96 or Z < −1.96
s.d. of b not known: the discrepancy between the hypothetical value and the sample estimate, in terms of s.e., is
  t = (b − β0) / s.e.
  5% significance test: reject H0: β = β0 if t > tcrit or t < −tcrit
We look up the critical value of t, and if the t statistic is greater than it, positive or negative, we reject the null hypothesis. If it is not, we do not.
t TEST OF A HYPOTHESIS RELATING TO A REGRESSION COEFFICIENT
[Diagram: the standard normal density together with t densities for n = 10 and n = 5]
So why do we make such a fuss about referring to the t distribution rather than the normal distribution? Would it really matter if we always used 1.96 for the 5% test and 2.58 for the 1% test?
The answer is that it does make a difference. Although the distributions are generally quite similar, the t distribution has longer tails than the normal distribution, the difference being the greater, the smaller the number of degrees of freedom.
t TEST OF A HYPOTHESIS RELATING TO A REGRESSION COEFFICIENT
[Diagram: close-up of the tails of the normal density and the t densities for n = 10 and n = 5]
As a consequence, the probability of obtaining a high test statistic on a pure chance basis is greater with a t distribution than with a normal distribution.
This means that the rejection regions have to start more standard deviations away from zero for a t distribution than for a normal distribution.
t TEST OF A HYPOTHESIS RELATING TO A
REGRESSION COEFFICIENT
t Distribution: Critical values of t
Degrees of Two-tailed test 10% 5% 2% 1% 0.2% 0.1%
freedom One-tailed test 5% 2.5% 1% 0.5% 0.1% 0.05%
1 6.314 12.706 31.821 63.657 318.31 636.62
2 2.920 4.303 6.965 9.925 22.327 31.598
3 2.353 3.182 4.541 5.841 10.214 12.924
4 2.132 2.776 3.747 4.604 7.173 8.610
5 2.015 2.571 3.365 4.032 5.893 6.869
… … … … … … …
… … … … … … …
18 1.734 2.101 2.552 2.878 3.610 3.922
19 1.729 2.093 2.539 2.861 3.579 3.883
20 1.725 2.086 2.528 2.845 3.552 3.850
… … … … … … …
… … … … … … …
120 1.658 1.980 2.358 2.617 3.160 3.373
 1.645 1.960 2.326 2.576 3.090 3.291
If we were performing a regression with 20 observations, as in the price
inflation/wage inflation example, the number of degrees of freedom
would be 18 and the critical value of t for a 5% test would be 2.101. 132
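Both the t statistic and its critical value can be computed directly. Here is a minimal Python sketch using scipy.stats, with placeholder values for the estimate and its standard error in the spirit of the price/wage inflation example (20 observations, 18 degrees of freedom):

from scipy import stats

b, se_b = 0.82, 0.10      # placeholder slope estimate and standard error
beta_0 = 1.0              # hypothetical value under H0: beta = 1.0
df = 18                   # n - 2 degrees of freedom for n = 20

t_stat = (b - beta_0) / se_b
t_crit = stats.t.ppf(0.975, df)           # two-tailed 5% critical value (2.101 for 18 df)
p_value = 2 * stats.t.sf(abs(t_stat), df)

print(f"t = {t_stat:.2f}, critical value = {t_crit:.3f}, p-value = {p_value:.3f}")
print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")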
ONE-TAILED TESTS
Model:                    y = α + βx + u
Null hypothesis:          H0: β ≤ 0
Alternative hypothesis:   H1: β > 0
This occurs when you wish to demonstrate that a variable x influences another variable y. You set up the null hypothesis of no effect and try to reject H0.
ONE-TAILED TESTS
Null hypothesis:          H0: β ≤ 0
Alternative hypothesis:   H1: β > 0
[Diagram: density of b under H0, with a single 5% rejection region starting 1.65 s.d. above 0]
However, if you can justify the use of a one-tailed test, for example with H1: β > 0, your estimate only has to be 1.65 standard deviations above β0.
This makes it easier to reject H0 and thereby demonstrate that y really is influenced by x (assuming that your model is correctly specified).
CONFIDENCE INTERVALS

CONFIDENCE INTERVALS
Null hypothesis: H0: β = β0
[Diagram: probability density function of b conditional on β = β0 being true, with the acceptance region β0 ± 1.96 s.d. and 2.5% tails]
We ended by deriving the range of estimates that are compatible with H0 and called it the acceptance region.
CONFIDENCE INTERVALS
95% confidence interval:  b − tcrit(5%) × s.e. < β < b + tcrit(5%) × s.e.
99% confidence interval:  b − tcrit(1%) × s.e. < β < b + tcrit(1%) × s.e.
This implies that the standard error should be multiplied by the critical value of t, given the significance level and the number of degrees of freedom, when determining the limits of the interval.
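A minimal Python sketch of the 95% interval, using the coefficient and standard error from the earnings regression reported later in these slides (scipy.stats supplies the exact t critical value for 568 degrees of freedom):

from scipy import stats

b, se_b, df = 1.073055, 0.1324501, 568

t_crit = stats.t.ppf(0.975, df)                 # about 1.96 for 568 degrees of freedom
low, high = b - t_crit * se_b, b + t_crit * se_b
print(f"95% confidence interval for beta: [{low:.3f}, {high:.3f}]")   # roughly [0.813, 1.333]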
CONFIDENCE INTERVALS
[Diagram: the probability density functions of b (1) conditional on β = βmax being true and (2) conditional on β = βmin being true]
The diagram shows the limiting values of the hypothetical values of β, together with their associated probability distributions for b.
EXAMPLE: HYPOTHESIS TESTING
EARNINGS = α + β HGC + u
H0: β = 0,  H1: β ≠ 0

. reg earnings hgc

Source | SS df MS Number of obs = 570
---------+------------------------------ F( 1, 568) = 65.64
Model | 3977.38016 1 3977.38016 Prob > F = 0.0000
Residual | 34419.6569 568 60.5979875 R-squared = 0.1036
---------+------------------------------ Adj R-squared = 0.1020
Total | 38397.0371 569 67.4816117 Root MSE = 7.7845
------------------------------------------------------------------------------
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | 1.073055 .1324501 8.102 0.000 .8129028 1.333206
_cons | -1.391004 1.820305 -0.764 0.445 -4.966354 2.184347
------------------------------------------------------------------------------

5% significance test: reject H0: β = 0 if t > tcrit or t < −tcrit.
t = (b − β0) / s.e. = (1.073 − 0) / 0.132 = 8.102
Since t = 8.102 > 1.96, reject H0: β = 0.
HGC has a significant effect on EARNINGS at the 5% significance level.
EXAMPLE: CONFIDENCE INTERVALS
𝐄𝐀𝐑𝐍𝐈𝐍𝐆𝐒 = 𝛂 + 𝛃 𝐇𝐆𝐂 + 𝐮
. reg earnings hgc

Source | SS df MS Number of obs = 570


---------+------------------------------ F( 1, 568) = 65.64
Model | 3977.38016 1 3977.38016 Prob > F = 0.0000
Residual | 34419.6569 568 60.5979875 R-squared = 0.1036
---------+------------------------------ Adj R-squared = 0.1020
Total | 38397.0371 569 67.4816117 Root MSE = 7.7845
------------------------------------------------------------------------------
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
hgc | 1.073055 .1324501 8.102 0.000 .8129028 1.333206
_cons | -1.391004 1.820305 -0.764 0.445 -4.966354 2.184347
------------------------------------------------------------------------------
95% confidence interval for β:
b − tcrit(5%) × s.e. < β < b + tcrit(5%) × s.e.
1.073 − 1.96 × 0.132 < β < 1.073 + 1.96 × 0.132
0.813 < β < 1.333


F TEST
OF GOODNESS OF FIT
F TEST OF GOODNESS OF FIT
Var(y) = Var(ŷ) + Var(e)
Σ(y − ȳ)² = Σ(ŷ − ȳ)² + Σe²
TSS = ESS + RSS
R² = ESS/TSS = Σ(ŷi − ȳ)² / Σ(yi − ȳ)²
y = α + βx + u
H0: β = 0,  H1: β ≠ 0
Since x is the only explanatory variable at the moment, the null hypothesis is that y is not determined by x. Mathematically, we have H0: β = 0.
F TEST OF GOODNESS OF FIT
y = α + βx + u
Var(y) = Var(ŷ) + Var(e)
Σ(y − ȳ)² = Σ(ŷ − ȳ)² + Σe²
TSS = ESS + RSS
H0: β = 0, H1: β ≠ 0   (equivalently H0: R² = 0, H1: R² ≠ 0)
R² = ESS/TSS = Σ(ŷi − ȳ)² / Σ(yi − ȳ)²
F(k, n − k − 1) = [ESS/k] / [RSS/(n − k − 1)]
               = [(ESS/TSS)/k] / [(RSS/TSS)/(n − k − 1)]
               = [R²/k] / [(1 − R²)/(n − k − 1)]
If the calculated F test value is greater than the F table value, reject the null hypothesis concerning goodness of fit.
The F statistic is defined as shown. k is the number of explanatory variables, which at present is just 1.
F TEST OF GOODNESS OF FIT AND R²
F(k, n − k − 1) = [R²/k] / [(1 − R²)/(n − k − 1)]
[Plot of F against R²: F starts at 0 and increases without bound as R² approaches 1]
F is a monotonically increasing function of R². As R² increases, the numerator increases and the denominator decreases, so for both of these reasons F increases.
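A minimal Python sketch computing the F statistic from R² for the earnings regression (R² = 0.1036, n = 570, k = 1), with scipy.stats supplying the critical value and p-value:

from scipy import stats

r2, n, k = 0.1036, 570, 1

f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
f_crit = stats.f.ppf(0.95, k, n - k - 1)      # 5% critical value of F(1, 568)
p_value = stats.f.sf(f_stat, k, n - k - 1)

print(f"F = {f_stat:.2f} (Stata reports 65.64), critical value = {f_crit:.2f}, p = {p_value:.4f}")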
Evaluation of the Regression Results
1. Is the equation supported by sound theory?
2. How well does the estimated regression fit the data?
3. Is the data set reasonably large and accurate?
4. Is OLS the best estimator to be used for this equation?
5. How well do the estimated coefficients correspond to the expectations developed by the researcher before the data were collected?
6. Are all the obviously important variables included in the equation?
7. Has the most theoretically logical functional form been used?
8. Does the regression appear to be free of major econometric problems?
Example 1: Height vs. Weight
[Scatter plot of KILO (weight) against BOY (height); correlation = 0.85]
Example 1: Height vs. Weight
WEIGHTi = a + b·HEIGHTi + ei
[Scatter plot of HEIGHT against WEIGHT]
Example 1: Height vs. Weight
Dependent Variable: KILO
Method: Least Squares
Sample (adjusted): 1 44
Included observations: 44 after adjustments

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -161.6304     21.73056     -7.437930     0.0000
BOY         1.307075     0.124817     10.47192      0.0000

R-squared           0.723067    Mean dependent var       65.68182
Adjusted R-squared  0.716473    S.D. dependent var       12.64869
S.E. of regression  6.735079    Akaike info criterion    6.696925
Sum squared resid   1905.174    Schwarz criterion        6.778025
Log likelihood      -145.3324   Hannan-Quinn criter.     6.727001
F-statistic         109.6611    Durbin-Watson stat       2.648693
Prob(F-statistic)   0.000000

KILO = -161.63 + 1.307·BOY
Interpretation:
• Generally the constant term is not interpreted; mostly it is meaningless.
• If height increases by 1 cm, weight increases by about 1.3 kg.
• Height explains 72% of the variation of weight (R² = 0.72).
• The absolute value of the t statistic is greater than the table critical value (1.96), and the p-value is less than 5%, so reject H0: β1 = 0. The coefficient is significant: height has a significant effect on weight.
Example 2: College Applications
➢ Suppose that you work in the admissions office of a college that doesn't allow prospective students to apply by using the Common Application. How might you go about estimating the number of extra applications that your college would receive if it allowed the use of the Common Application?
◼ APPLICATIONi = the number of applications received by the ith college
◼ RANKi = the U.S. News rank of the ith college (1 = best)

COLLEGE                  APPLICATION   RANK   SIZE
Amherst College          6680          2      1648
Bard College             4980          36     1641
Bates College            4434          23     1744
Bowdoin College          5961          7      1726
Bucknell University      8934          29     3529
Carleton College         4840          6      1966
Centre College           2159          44     1144
Colby College            4679          20     1865
Colgate University       8759          16     2754
Colorado College         4826          26     1939
Connecticut College      4742          39     1802
Davidson College         3992          10     1667
Denison University       5196          48     2234
DePauw University        3624          48     2294
Dickinson College        5844          41     2372
Furman University        3879          41     2648
Gettysburg College       6126          45     2511
Grinnell College         3077          14     1556
Hamilton College         4962          17     1802
Haverford College        3492          9      1168

1. Is there any relationship between ranking and the number of applications?
2. Interpret the coefficient of RANK.
3. Is it significant?
4. Interpret the R² and F statistics.
Example 2: College Applications
[Scatter plot of APPLICATION against RANK]
➢ It looks like there is a negative relationship between the number of applications and the ranking.
➢ If a college is near the top of the ranking, the number of applications is higher.
➢ Both variables are approximately normally distributed.
➢ There is no obvious outlier.
Example 2: College Applications
Applicationi = β0 + β1·Ranki + ui
[Scatter plot of APPLICATION against RANK]

Dependent Variable: APPLICATION
Method: Least Squares
Sample: 1 49
Included observations: 49

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          5956.410      444.8246     13.39047      0.0000
RANK       -33.52880     13.39669     -2.502768     0.0159

R-squared           0.117600    Mean dependent var       4992.286
Adjusted R-squared  0.098826    S.D. dependent var       1640.112
S.E. of regression  1556.962    Akaike info criterion    17.57882
Sum squared resid   1.14E+08    Schwarz criterion        17.65604
Log likelihood      -428.6811   Hannan-Quinn criter.     17.60812
F-statistic         6.263847    Durbin-Watson stat       1.943843
Prob(F-statistic)   0.015857
Example 2: College Applications
Applicationi = β0 + β1·Ranki + ui
[Same regression output as on the previous slide]
Interpretation:
• Generally the constant term is not interpreted; mostly it is meaningless.
• If the ranking drops one place, the number of applications decreases by about 34.
• The absolute value of the t statistic is greater than the table critical value (1.96), and the p-value is less than 5%, so reject H0: β1 = 0. The coefficient is significant: RANK has a significant effect on the number of applications.
Example 2: College Applications
[Same regression output as above]
• Variation in ranking explains about 12% of the variation in the number of applications (R² = 0.1176).
• The whole equation is significant at the 95% confidence level (F = 6.26, Prob(F-statistic) = 0.0159).
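As a closing illustration of how the estimated equation would be used, here is a tiny Python sketch that plugs hypothetical rank values into APPLICATION-hat = 5956.41 − 33.53·RANK (the chosen ranks are arbitrary, and extrapolation beyond the sample range of ranks should be avoided, as discussed earlier):

def predicted_applications(rank: float) -> float:
    """Fitted value from the estimated equation reported in the slides."""
    return 5956.410 - 33.52880 * rank

for rank in (5, 25, 45):
    print(rank, round(predicted_applications(rank)))

# Moving one place down the ranking lowers predicted applications by about 34
print(round(predicted_applications(11) - predicted_applications(10), 2))   # -33.53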
