Chapter 2
SIMPLE LINEAR REGRESSION MODEL
1.2. Population Regression Function (PRF)
• E(Y | Xi) = f(Xi) → the conditional expectation function (CEF) or
population regression function (PRF) → how the mean or
average response of Y varies with X.
• The functional form of the PRF is an empirical question. For
example, assume: E(Y | Xi) = β1 + β2 Xi. Other candidate forms include:
Yi = β1 + β2 (1/Xi) + ui
ln Yi = β1 + β2 ln Xi + ui
Yi = β1 + β2 Xi + β3 Xi² + ui
• On the basis of the SRF: Yi = β̂1 + β̂2 Xi + ûi
2.1 The method of Ordinary Least Squares (OLS)
• “The method of least squares is the
automobile of modern statistical analysis;
despite its limitations, occasional accidents,
and incidental pollution, it and its numerous
variations, extensions and related
conveyances carry the bulk of statistical
analysis, and are known and valued by all”.
Stephen M. Stigler
• Two-variable PRF: Yi = β1 + β2 Xi + ui
• The PRF is not directly observable. We estimate
it from the SRF: Yi = β̂1 + β̂2 Xi + ûi
• Or ûi = Yi − Ŷi = Yi − β̂1 − β̂2 Xi
The residuals are the differences between the
actual and the estimated Y values.
2.1 The Method of Ordinary Least Squares (OLS)
uˆi 2 = (Yi − ˆ1 − ˆ2 X i ) 2 (3.1.2 )
( )
n n 2
i i 1 2 i
ˆ
u 2
i =1
= Y − ˆ − ˆ X → min
i =1
i 1 2 Xi
Y = nˆ + ˆ (3.1.4)
i i 1 i 2 i
X Y = ˆ
X + ˆ
X 2 (3.1.5)
▪ Solving the normal equations simultaneously, we obtain:
(3.1.6)
(3.1.7)
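As a numerical sketch, the normal equations (3.1.4) and (3.1.5) can be written as a 2×2 linear system and solved directly; the data below are hypothetical, made up purely for illustration, and the result is checked against the closed-form solutions (3.1.6) and (3.1.7).

```python
import numpy as np

# Hypothetical sample (illustration only).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
n = len(X)

# Normal equations in matrix form:
# [ n       sum(X)   ] [b1]   [ sum(Y)  ]
# [ sum(X)  sum(X^2) ] [b2] = [ sum(XY) ]
A = np.array([[n, X.sum()],
              [X.sum(), (X ** 2).sum()]])
b = np.array([Y.sum(), (X * Y).sum()])
b1_hat, b2_hat = np.linalg.solve(A, b)

# Closed-form solutions (3.1.6)-(3.1.7) give the same answer.
x_dev, y_dev = X - X.mean(), Y - Y.mean()
b2_closed = (x_dev * y_dev).sum() / (x_dev ** 2).sum()
b1_closed = Y.mean() - b2_closed * X.mean()
print(b1_hat, b2_hat)
```

Solving the system and using the closed form agree, which is the point of deriving (3.1.6) and (3.1.7) from the normal equations.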
2.1. The Method of Ordinary Least Squares
Review:
▪ X, Y independent: var(X + Y) = var(X) + var(Y)
▪ X, Y dependent: var(X + Y) = var(X) + var(Y) + 2Cov(X, Y)
▪ Covariance and the slope estimator:
β̂2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = (ΣXiYi − n·X̄·Ȳ) / (ΣXi² − n·X̄²) = cov(X, Y) / Var(X)
2.1 The method of Ordinary Least Squares (OLS)
β̂2 = Sxy / Sxx    β̂1 = Ȳ − β̂2 X̄
• Where
Sxy = Σ(Xi − X̄)(Yi − Ȳ) = ΣXiYi − nX̄Ȳ
Sxx = Σ(Xi − X̄)² = ΣXi² − nX̄²
Syy = Σ(Yi − Ȳ)² = ΣYi² − nȲ²
Example 1
• The effect of working time on income. Data on income (in
dollars) and working time of ten workers; X = working time,
Y = income.
• With n = 10, X̄ = 8 and Ȳ = 9.6:
Sxx = 668 − 10 · 8² = 28
Sxy = 789 − 10 · 8 · 9.6 = 21
Syy = 952 − 10 · 9.6² = 30.4
β̂2 = Sxy / Sxx = 21 / 28 = 0.75
β̂1 = Ȳ − β̂2 X̄ = 9.6 − 0.75 · 8 = 3.6
• SRF: Ŷ = 3.6 + 0.75 X
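The Example 1 estimates can be reproduced from the summary sums used on the slide (the ten raw observations themselves are not listed, so we work from n, the sample means, and the sums of squares and cross-products):

```python
# Summary statistics from Example 1 (working time X, income Y).
n = 10
x_bar, y_bar = 8.0, 9.6
sum_x2, sum_y2, sum_xy = 668.0, 952.0, 789.0

S_xx = sum_x2 - n * x_bar ** 2         # 668 - 640 = 28
S_xy = sum_xy - n * x_bar * y_bar      # 789 - 768 = 21
S_yy = sum_y2 - n * y_bar ** 2         # 952 - 921.6 = 30.4

beta2_hat = S_xy / S_xx                # slope: 21 / 28 = 0.75
beta1_hat = y_bar - beta2_hat * x_bar  # intercept: 9.6 - 6.0 = 3.6
print(f"SRF: Y-hat = {beta1_hat:.2f} + {beta2_hat:.2f} X")
```

This matches the hand computation and the SRF Ŷ = 3.6 + 0.75X above.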
Example 1
• EViews: Quick/Estimate Equation, or Object/New Object/Equation
• Stata: reg Y X
[Regression output table: Variable, Coefficient, Std. Error, t-Statistic, Prob.]
Example 2
• The effect of rice price on rice demand. Y = demand (tons
per month), X = price (thousand dong per kg).
Interpretation of coefficient estimates
• Relation between working time and income:
Ŷ = 3.6 + 0.75 X
→ If working time increases by 1 hour, the estimated
increase in income is about 0.75 dollars (75 cents).
2.2. Properties of OLS statistics
• The sum, and therefore the sample average, of the OLS
residuals is zero: Σᵢ ûi = 0
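This property can be checked numerically; as a minimal sketch with hypothetical data (illustration only), the OLS residuals sum to zero and are also orthogonal to the regressor:

```python
import numpy as np

# Hypothetical sample (illustration only).
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
Y = np.array([3.1, 4.8, 7.2, 8.9, 11.0])

# OLS estimates via the closed-form solutions.
b2 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
b1 = Y.mean() - b2 * X.mean()
residuals = Y - (b1 + b2 * X)

print(residuals.sum())        # ~0 up to floating-point error
print((residuals * X).sum())  # ~0: residuals orthogonal to X
```

Both sums are zero up to rounding, as the normal equations require.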
2.2. Properties of least-squares estimators
Properties of OLS Estimators under the
Normality Assumption
• With the assumption that the ui follow the normal
distribution, the OLS estimators have the following
properties:
Precision or standard errors of Least-Squares estimates
• The least-squares estimates are a function of the sample data. But since
the data change from sample to sample, the estimates will change.
Therefore, what is needed is some measure of “reliability” or precision of
the estimators β̂1 and β̂2. In statistics the precision of an estimate is
measured by its standard error (se), which can be obtained as follows
(with xi = Xi − X̄):
var(β̂2) = σ² / Σxi²    se(β̂2) = σ / √(Σxi²)
var(β̂1) = (ΣXi² / (n Σxi²)) σ²    se(β̂1) = √(ΣXi² / (n Σxi²)) · σ
Precision or standard errors of Least-Squares estimates
Where:
• σ̂² = Σûi² / (n − 2) is the OLS estimator of the true but unknown
σ², where n − 2 is the number of degrees of freedom (df).
• Σûi² is the residual sum of squares (RSS).
See Section 3.5.2, p. 83.
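Putting the pieces together, a sketch of the standard-error computation with hypothetical data (illustration only): estimate σ² by RSS/(n − 2), then plug it into the variance formulas above.

```python
import numpy as np

# Hypothetical sample (illustration only).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.2, 3.9, 6.1, 7.8, 10.2, 11.9])
n = len(X)

x_dev = X - X.mean()
S_xx = (x_dev ** 2).sum()
b2 = (x_dev * (Y - Y.mean())).sum() / S_xx
b1 = Y.mean() - b2 * X.mean()

rss = ((Y - b1 - b2 * X) ** 2).sum()  # residual sum of squares
sigma2_hat = rss / (n - 2)            # df = n - 2
se_b2 = np.sqrt(sigma2_hat / S_xx)
se_b1 = np.sqrt(sigma2_hat * (X ** 2).sum() / (n * S_xx))
print(se_b1, se_b2)
```

Note that ΣXi² / (n Σxi²) equals 1/n + X̄²/Σxi², a form often quoted for var(β̂1).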
Example
• Wage and education (WAGE1.dta)
2.3. The assumptions underlying the OLS
Assumption 1: Linear in parameters. The
regression model is linear in the parameters:
Yi = β1 + β2 Xi + ui
• Keep in mind that the regressand Y and the
regressor X themselves may be nonlinear.
2.3. The assumptions underlying the OLS
Assumption 2: X values are fixed in repeated
sampling. X is assumed to be non-stochastic.
E(ui | Xi) = 0
• Each Y population corresponding to a given X is
distributed around its mean value, with some Y
values above the mean and some below it. The
mean value of these deviations corresponding to
any given X should be zero.
• Note that the assumption E(ui | Xi) = 0 implies
that E(Yi | Xi) = β1 + β2 Xi.
Assumption 3: Zero mean of the disturbance.
E(ui | Xi) = 0
2.3. The assumptions underlying the OLS
When Assumption 3 is satisfied, the covariance between ui and
Xi is zero:
Cov(ui, Xi) = 0
2.3. The assumptions underlying the OLS
Assumption 8: The regression model is correctly
specified. There is no specification bias or error in the
model used in empirical analysis. For example, fitting
Yi = β1 + β2 (1/Xi) + ui
when the true relationship is linear in Xi (or vice versa)
would be a specification error.
3. A measure of “Goodness of fit”
The goodness of fit: how “well” the sample
regression line fits the data.
3. A measure of “Goodness of fit”
• To compute this r², we proceed as follows. Recall that
Yi = Ŷi + ûi
or, in deviation form, yi = ŷi + ûi
Squaring both sides and summing over the sample (the
cross-product term Σŷiûi vanishes), we obtain
Σyi² = Σŷi² + Σûi², that is, TSS = ESS + RSS.
3. A measure of “Goodness of fit”
• Total sum of squares (TSS): total variation of
the actual Y values about their sample mean,
TSS = Σ(Yi − Ȳ)² = Σyi²
3. A measure of “Goodness of fit”
• The coefficient of determination:
r² = ESS / TSS = 1 − RSS / TSS
3. A measure of “Goodness of fit”
• The coefficient of determination r² is a measure of the
goodness of fit of a regression line: the proportion of the
total variation in Y explained by the regression.
→ r² is a nonnegative quantity.
→ 0 ≤ r² ≤ 1
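As a sketch with hypothetical data (illustration only), r² can be computed as ESS/TSS, and in the two-variable model it equals the squared sample correlation between X and Y:

```python
import numpy as np

# Hypothetical sample (illustration only).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([1.8, 4.3, 5.7, 8.4, 9.8])

# OLS fit via the closed-form solutions.
b2 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
b1 = Y.mean() - b2 * X.mean()
Y_hat = b1 + b2 * X

tss = ((Y - Y.mean()) ** 2).sum()      # total sum of squares
ess = ((Y_hat - Y.mean()) ** 2).sum()  # explained sum of squares
rss = ((Y - Y_hat) ** 2).sum()         # residual sum of squares

r2 = ess / tss
print(r2)
```

Since TSS = ESS + RSS, the two expressions ESS/TSS and 1 − RSS/TSS necessarily agree.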
Example
• CEO salary and ROE (CEOSAL1.DTA)
. reg salary roe
Example: CEO salary and ROE