
Classical Linear Regression Model

Ratjomose P. Machema
[email protected]

Department of Economics
National University of Lesotho (NUL)

EC6041:
Econometric Theory and Applications



Table of contents

1 Introduction

2 Bivariate Regression Model

3 The Classical Multiple Linear Regression Model

4 Matrix Representation

5 CLRM Assumptions

6 Least Squares Regression



Introduction

An econometric study begins with a set of propositions about some aspect of the
economy. The theory specifies a set of precise, deterministic relationships among
variables.
The empirical investigation provides estimates of unknown parameters in the
model, such as elasticities or the effects of monetary policy, and usually attempts
to measure the validity of the theory against the behaviour of observable data.
Once suitably constructed, the model might then be used for prediction or
analysis of behaviour.



Simple Regression Model

Recall that in a simple regression the dependent variable y is related to only one
explanatory variable x.
Therefore we would like to “study how y varies with changes in x.”
For example:
x is amount of fertilizer, and y is soybean yield; or
x is years of schooling, and y is hourly wage.
The simple linear regression model is a specification of the process that we
believe describes the relationship between the two variables.
For each level of xi, we assume that yi is generated by the following
simple linear regression model (or two-variable regression model)

yi = β0 + β1 xi + ui (1)



Simple Regression Model

The variable u, called the error term or disturbance, represents factors other
than x that affect y .
The SRM treats all factors affecting y other than x as being unobserved.
We call β0 the intercept parameter and β1 the slope parameter.
These describe a population, and our ultimate goal is to estimate them.
y is assumed to be linearly related to x.
If the other factors in u are fixed, meaning ∆u = 0, then

∆y = β1 ∆x (2)

Thus, β1 is the slope parameter.

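The following short Python sketch (not from the lecture; numpy assumed, with made-up values β0 = 2 and β1 = 0.5) simulates data from equation (1) and checks that, with the other factors held fixed (∆u = 0), a one-unit increase in x changes y by exactly β1, as in equation (2):

```python
import numpy as np

rng = np.random.default_rng(0)

beta0, beta1 = 2.0, 0.5          # hypothetical population parameters
n = 1000

x = rng.uniform(0, 10, size=n)   # observed explanatory variable
u = rng.normal(0, 1, size=n)     # unobserved disturbance
y = beta0 + beta1 * x + u        # equation (1)

# Holding the other factors fixed (delta u = 0), a one-unit increase in x
# changes y by exactly beta1, as in equation (2)
y_shifted = beta0 + beta1 * (x + 1) + u
print(np.allclose(y_shifted - y, beta1))   # True
```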


Example - Yield and Fertilizer
A model relating crop yield to fertilizer use is

yield = β0 + β1 fertilizer + u, (3)


where u contains land quality, rainfall on a plot of land, and so on.
The slope parameter, β1 , is of primary interest: it tells us how yield
changes when the amount of fertilizer changes, holding all else fixed.
If yi is income and xi is years of schooling, the interpretation is such
that if a person had 0 years-of-schooling (x = 0), they would be
expected to have income of β0 , and each extra year of schooling
would be associated with β1 higher income, on average.
The linearity of (3) implies that a one-unit change in x has the same effect
on y , regardless of the initial value of x.

Note: The linear function is probably not realistic here. The effect of
fertilizer is likely to diminish at large amounts of fertilizer.
Simple Linear Regression Model
How can we hope to estimate the ceteris paribus effect of x on y when
we have assumed all other factors affecting y are unobserved and lumped into u?
When it comes to estimating β1 (and β0 ) using a random sample of data, we
must restrict how u and x are related to each other.
First, we assume that the average, or expected, value of u is zero in the
population:
E (u) = 0
Second, assume u and x uncorrelated in the population:

Corr (x, u) = 0 (4)

which implies that u and x are not linearly related.


A stronger assumption is that u is mean independent of x:

E (u|x) = E (u), all values x,

where E (u|x) means “the expected value of u given x.”


Simple Linear Regression Model
Suppose u is “land quality” and x is fertilizer amount. Then
E (u|x) = E (u) says fertilizer amounts are chosen independently of
quality.
This assumes fertilizer amounts are assigned at random.
Combining E (u|x) = E (u) with E (u) = 0 gives

E (u|x) = 0, all values x (5)

called the zero conditional mean assumption.


Because the expected value is a linear operator, E (u|x) = 0
implies

E (y |x) = β0 + β1 x + E (u|x) = β0 + β1 x, (6)

which shows the population regression function is a linear


function of x.
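As a rough numerical illustration of the zero conditional mean assumption (a sketch with hypothetical parameters, not part of the slides), the conditional averages of y at each value of x should track the population regression function in equation (6) when u is drawn independently of x:

```python
import numpy as np

rng = np.random.default_rng(1)

beta0, beta1 = 2.0, 0.5             # hypothetical parameters
n = 100_000

x = rng.integers(0, 10, size=n)     # discrete x so conditional means are easy to read
u = rng.normal(0, 1, size=n)        # drawn independently of x, so E(u|x) = 0
y = beta0 + beta1 * x + u

# The average of y at each value of x should track the population
# regression function E(y|x) = beta0 + beta1 * x (equation 6)
for xv in range(10):
    print(xv, round(y[x == xv].mean(), 2), beta0 + beta1 * xv)
```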
Multiple Linear Regression Model
The baseline standard of econometric analysis is the classical linear regression
model. At its simplest, the model assumes that each observation in the sample is
generated by a process that can be represented as

yi = xi1 β1 + xi2 β2 + · · · + xik βk + εi (7)

where the x variables are assumed to be independent of the error terms, the βs
are fixed, and each εi is distributed independently and identically with a mean of
0 and variance σ² (i.e. there is no heteroscedasticity and no autocorrelation).
The observed value of yi is the sum of two parts, the regression function
and the disturbance, εi .
Our objective is to estimate the unknown parameters of the model,
use the data to study the validity of the theoretical propositions, and
perhaps use the model to predict the variable y .
If in addition we make the assumption that the errors are normally distributed,
then the model is known as the classical normal linear regression model.



Multiple Linear Regression Model

The error term ε arises for several reasons, primarily because we


cannot capture every influence on an economic variable in a model,
no matter how elaborate.
It can consist of a variety of things, including omitted variables
and measurement error.
It includes all of the things that we did not think of including on
the right-hand side of equation (7), as well as all of the things
that we could not have included.
The error term ε thus embodies our ignorance about the relationship
between y and the explanatory variables.



Matrix representation

Since equation (7) holds for every observation, we can stack
the observations and get the equivalent expression
\[
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{pmatrix}\beta_1
+
\begin{pmatrix} x_{12} \\ x_{22} \\ \vdots \\ x_{n2} \end{pmatrix}\beta_2
+ \cdots +
\begin{pmatrix} x_{1k} \\ x_{2k} \\ \vdots \\ x_{nk} \end{pmatrix}\beta_k
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\]

which we can write in vector form as

y = x1 β1 + x2 β2 + · · · + xk βk + ε



Matrix representation

This can be written more compactly still in matrix form as


      
\[
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1k} \\
x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\]

or in short
y = Xβ + ε



Matrix representation

y = Xβ + ε is the fundamental equation of the linear regression


model:
y is the (n × 1) column vector of the observations on the
dependent variable, X is an (n × k) matrix in which each column
represents the observations on one of the explanatory variables,
β is a (k × 1) vector of parameters and ε is the (n × 1)
vector of stochastic error terms (or disturbance terms).
Typically the first column of X will be a column of 1s, so that β1 is the
intercept in the model.

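A minimal Python sketch of this construction, using hypothetical dimensions n = 5 and k = 3 (an intercept plus two regressors):

```python
import numpy as np

rng = np.random.default_rng(2)

n, k = 5, 3
beta = np.array([1.0, 0.5, -2.0])        # hypothetical (k x 1) parameter vector

# First column of ones for the intercept; the rest are explanatory variables
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
eps = rng.normal(0, 1, size=n)           # (n x 1) vector of disturbances

y = X @ beta + eps                       # y = X beta + eps, one row per observation
print(X.shape, beta.shape, y.shape)      # (5, 3) (3,) (5,)
```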


Assumption A1: Linearity in β and additivity of ε -
y = Xβ + ε
Linearity refers to the manner in which the parameters and the disturbance
enter the equation, not necessarily to the relationship among the variables.

We have seen examples where y and the xj can be nonlinear functions of


underlying variables, and so the model is flexible. Examples

\[
E(y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_2^2 \tag{8}
\]
\[
E(y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \tag{9}
\]
\[
\ln y = \beta_0 + \beta_1 \ln x_1 + \beta_2 \ln x_2 + \cdots + \beta_k \ln x_k \tag{10}
\]
Equations (8) and (9) are nonlinear in x, which has important implications for
interpreting the βs, but not for estimating them. Equation (10) is known as the
constant elasticity model, often used in models of demand and production; the
elasticity of y with respect to changes in xj is βj.
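To see why equation (10) is still linear in the parameters, the following sketch (with made-up elasticities β1 = 0.3 and β2 = 0.7) generates data from a constant-elasticity relationship and confirms that taking logs produces a model that is linear in the βs:

```python
import numpy as np

rng = np.random.default_rng(3)

beta0, beta1, beta2 = 1.0, 0.3, 0.7        # hypothetical elasticities
n = 1000

x1 = rng.uniform(1, 10, size=n)
x2 = rng.uniform(1, 10, size=n)
eps = rng.normal(0, 0.1, size=n)

# Constant-elasticity DGP: y = exp(beta0) * x1^beta1 * x2^beta2 * exp(eps)
y = np.exp(beta0) * x1**beta1 * x2**beta2 * np.exp(eps)

# After taking logs, the model is linear in the parameters (equation 10)
ln_y = np.log(y)
check = beta0 + beta1 * np.log(x1) + beta2 * np.log(x2) + eps
print(np.allclose(ln_y, check))            # True
```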
Assumption A2: Full Rank - X is an n × k matrix
with rank k
None of the columns of the X matrix should be able to be written as
a linear combination of the other columns.
The number of observations must be at least as large as the number
of variables.
Example:
Suppose that a cross-section model specifies that consumption, C , relates
to income as follows:

C = β1 + β2 non-labour income + β3 salary + β4 total income + ε

where total income is exactly equal to salary plus non-labor income.


There is an exact linear relationship among the variables in the model,
so the model makes no sense from a ceteris paribus perspective.
Instead, the share of non-labour income, which captures the effect of its
relative size, could be included in the regression.
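A quick numerical check of this example (hypothetical numbers): when one column of X is an exact linear combination of the others, X loses full column rank and X'X becomes singular:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50

const = np.ones(n)
nonlabour = rng.uniform(0, 5, size=n)       # non-labour income
salary = rng.uniform(10, 50, size=n)        # salary
total = nonlabour + salary                  # exactly salary + non-labour income

X = np.column_stack([const, nonlabour, salary, total])   # k = 4 columns

print(np.linalg.matrix_rank(X))             # 3, not 4: X is not of full column rank
print(np.linalg.det(X.T @ X))               # numerically zero: X'X is singular
```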
Assumption A3: Zero Conditional mean -
E [εi |X] = 0
\[
E[\varepsilon \mid X] =
\begin{pmatrix}
E[\varepsilon_1 \mid X] \\ E[\varepsilon_2 \mid X] \\ \vdots \\ E[\varepsilon_n \mid X]
\end{pmatrix}
= 0
\]

No observations on x convey information about the expected value of the
disturbance.
This restricts the relationship between the unobserved factors in εi and the
explanatory variables.

Assumption A3 implies that

Cov[εi, X] = 0 for all i.
E[y|X] = Xβ
When the Zero Conditional Mean assumption holds, we say x1, ..., xk are
exogenous explanatory variables. If xj is correlated with ε, we often say xj is
an endogenous explanatory variable.
Assumptions of Linear Regression Model

The three assumptions of Linearity, Full-Rank, and Zero-Conditional


Mean comprise the linear regression model.
The regression of y on X is the conditional mean, E [y |X ], so
that without the zero-conditional mean assumption, X β is not
the conditional mean function.
The remaining assumptions will more completely specify the
characteristics of the disturbances in the model and state the
conditions under which the sample observations on x are
obtained.



Assumption A4: Spherical disturbances

The variance of the error term/disturbance is constant.

This is known as homoskedasticity, which refers to a situation in which the
error has the same variance regardless of the value(s) taken by the
independent variable(s):

Var[εi | X] = σ² for all i = 1, . . . , n

and
Cov[εi, εj | X] = 0 for all i ≠ j
If the error term is heteroskedastic, the dispersion changes over the range
of observations. Heteroskedasticity occurs when the variance of the error
term changes in response to a change in the value(s) of the independent
variable(s):
Var[εi | X] = σi² for all i = 1, . . . , n

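The contrast can be simulated directly (a sketch with arbitrary numbers): under homoskedasticity the error variance is the same at every x, while under heteroskedasticity it grows with x:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

x = rng.uniform(1, 10, size=n)

u_homo = rng.normal(0, 2.0, size=n)      # Var(u|x) = 4 at every x
u_hetero = rng.normal(0, 0.5 * x)        # Var(u|x) = (0.5 x)^2 grows with x

# Compare the error variance for small x versus large x
lo, hi = x < 3, x > 8
print("homoskedastic :", u_homo[lo].var().round(2), u_homo[hi].var().round(2))
print("heteroskedastic:", u_hetero[lo].var().round(2), u_hetero[hi].var().round(2))
```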


Assumption A4: Spherical disturbances

[Figure: scatter plots contrasting (a) homoskedastic and (b) heteroskedastic disturbances]


Assumption A4: Spherical disturbances

Furthermore, errors that happen on one observation have no influence on
what happens to the next observation. That is, because the observations
are assumed to be randomly drawn, the error values should be
independent and not related to one another.
This independence property will frequently be violated in practice.
For instance, if farmers imitate the behaviour of their neighbours, this will
induce (for example) correlations in their farming patterns which may go
beyond the observables (i.e. X) that we can control for in the standard
regressions.
Put differently, the correlation and covariance between disturbances on
different observations are zero:

Corr[εi, εj | X] = 0 for all i ≠ j

Uncorrelatedness across observations is labelled generically non-autocorrelation.

If the errors are related, then you have autocorrelation (or serial
correlation), and:
Corr[εi, εj | X] ≠ 0 for some i ≠ j



Assumption A4: Spherical disturbances
The assumption implies that
\[
\operatorname{Var}[\varepsilon \mid X] = E[\varepsilon\varepsilon' \mid X] =
\begin{pmatrix}
E[\varepsilon_1\varepsilon_1 \mid X] & E[\varepsilon_1\varepsilon_2 \mid X] & \cdots & E[\varepsilon_1\varepsilon_n \mid X] \\
E[\varepsilon_2\varepsilon_1 \mid X] & E[\varepsilon_2\varepsilon_2 \mid X] & \cdots & E[\varepsilon_2\varepsilon_n \mid X] \\
\vdots & & \ddots & \vdots \\
E[\varepsilon_n\varepsilon_1 \mid X] & E[\varepsilon_n\varepsilon_2 \mid X] & \cdots & E[\varepsilon_n\varepsilon_n \mid X]
\end{pmatrix}
\]
\[
=
\begin{pmatrix}
\operatorname{Var}(\varepsilon_1 \mid X) & \operatorname{Cov}(\varepsilon_1, \varepsilon_2 \mid X) & \cdots & \operatorname{Cov}(\varepsilon_1, \varepsilon_n \mid X) \\
\operatorname{Cov}(\varepsilon_2, \varepsilon_1 \mid X) & \operatorname{Var}(\varepsilon_2 \mid X) & \cdots & \operatorname{Cov}(\varepsilon_2, \varepsilon_n \mid X) \\
\vdots & & \ddots & \vdots \\
\operatorname{Cov}(\varepsilon_n, \varepsilon_1 \mid X) & \operatorname{Cov}(\varepsilon_n, \varepsilon_2 \mid X) & \cdots & \operatorname{Var}(\varepsilon_n \mid X)
\end{pmatrix}
=
\begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}
= \sigma^2 I
\]
This matrix is the variance-covariance matrix of the disturbance term. The variances are
displayed on the main diagonal and the covariances in the off-diagonal positions.
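As a sanity check (a sketch with a hypothetical σ = 2 and n = 4), the sample covariance matrix of many independent draws of the disturbance vector should be close to σ²I:

```python
import numpy as np

rng = np.random.default_rng(6)

sigma, n = 2.0, 4
reps = 200_000

# Each row is one draw of the (n x 1) disturbance vector with iid N(0, sigma^2) entries
eps = rng.normal(0, sigma, size=(reps, n))

print(np.round(np.cov(eps, rowvar=False), 2))   # approximately sigma^2 * I = 4 * I
print(sigma**2 * np.eye(n))
```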
Assumption A5: Exogeneity of X

(a) X is fixed (non-stochastic); or
(b) X is independent of ε.

Version (a) of this assumption (fixed regressors) is unlikely to be


ever met in economic research.
In practice the crucial assumption will therefore be version (b),
i.e. that the regressors are generated independently of ε.



Assumption A6: Normality of ε

It is convenient to assume that the disturbances are normally distributed, with
zero mean and constant variance:

ε | X ∼ N(0, σ²I)

This assumption implies that observations on εi are statistically independent


as well as uncorrelated. That is, for any given X value, the error term
follows a normal distribution with a zero mean and constant variance.
An important characteristic of the normality assumption is that it isn't
required for performing OLS estimation, but only for producing
confidence intervals and/or performing hypothesis tests with your OLS
estimates.
With the Central Limit Theorem, however, almost any model that has an
approximately continuous dependent variable and at least 200 observations
should have an approximately normal distribution of error terms – that is, the
errors would be asymptotically normal.



Introduction
The Classical Linear Regression Model represents the data generating process as

y = Xβ + ε

If we pick any estimate (however arbitrary) of β and denote it by β̂(1), then
we can get a set of fitted values corresponding to these estimates:

ŷ(1) = Xβ̂(1)

Corresponding to these fitted values, we will get a vector of residuals

e(1) = y − ŷ(1)

If we pick another estimate, say β̂(2), we would get a different set of fitted
values ŷ(2) = Xβ̂(2) and residuals e(2) = y − ŷ(2).
In fact it is clear that there can be infinitely many fitted values and
residuals.
The problem is how to pick “sensible” estimates.
Introduction
We choose a vector β̂ so that the fitted line X β̂ is close to the data
points. The measure of closeness constitutes a fitting criterion.
There are a number of different approaches to estimation of the
parameters of the model. Nonetheless, the method of least squares
has long been the most popular.
The method of OLS chooses to estimate the coefficient vector,
β, by minimizing the sum of squared residuals (or residual sum
of squares, RSS):
\[
\hat\beta = \arg\min_{b} \sum_{i=1}^{n} (y_i - x_i b)^2
\]
where xi is the i-th row of the matrix X.
It is important to distinguish between population quantities, such
as β and ε, and sample estimates of them, denoted b and e.
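The fitting criterion itself can be illustrated numerically (a sketch, assuming numpy and scipy are available and using made-up parameters): minimising the residual sum of squares over b with a general-purpose optimiser lands on essentially the same coefficients as a library least-squares fit:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])           # hypothetical parameters
y = X @ beta_true + rng.normal(0, 1, size=n)

def rss(b):
    # Residual sum of squares for a candidate coefficient vector b
    e = y - X @ b
    return e @ e

b_hat = minimize(rss, x0=np.zeros(k)).x          # numerical arg min of the RSS
b_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]   # library least-squares fit

print(np.round(b_hat, 3))
print(np.round(b_lstsq, 3))                      # the two should agree
```

This is only to illustrate the arg min; the closed-form solution derived on the following slides is what is used in practice.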
Residuals
The residual sum of squares can be written as:
\[
\mathrm{RSS} = \sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + \cdots + e_n^2
= \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}
\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}
= e'e
\]
So the problem of OLS estimation is to minimise
\[
\mathrm{RSS} = e'e
\]
where
\[
e = y - X\hat\beta
\]
The solution to the OLS problem

The residual sum of squares is
\[
\mathrm{RSS} = e'e = (y - X\hat\beta)'(y - X\hat\beta)
= y'y - y'X\hat\beta - \hat\beta'X'y + \hat\beta'X'X\hat\beta
\]
Since y'Xβ̂ is a scalar, it equals its own transpose β̂'X'y, so
\[
\mathrm{RSS} = y'y - 2\hat\beta'X'y + \hat\beta'X'X\hat\beta \tag{11}
\]



The solution to the OLS problem
Recall that we need the value of β̂ that minimises the RSS, so we need to differentiate
with respect to each of the elements of the vector and set the derivatives equal
to zero. That is,
\[
\frac{\partial\, \mathrm{RSS}}{\partial \hat\beta_1} = 0, \qquad
\frac{\partial\, \mathrm{RSS}}{\partial \hat\beta_2} = 0, \qquad \ldots, \qquad
\frac{\partial\, \mathrm{RSS}}{\partial \hat\beta_k} = 0
\]
We can simultaneously solve these equations by writing this system of equations
in vector form:
\[
\frac{\partial\, \mathrm{RSS}}{\partial \hat\beta} = 0
\]



The solution to the OLS problem
Therefore the first-order conditions are
\[
\frac{\partial\, \mathrm{RSS}}{\partial \hat\beta}
= \frac{\partial \left( y'y - 2\hat\beta'X'y + \hat\beta'X'X\hat\beta \right)}{\partial \hat\beta}
= -2X'y + 2X'X\hat\beta
\]
Setting the derivative equal to zero:
\[
-2X'y + 2X'X\hat\beta = 0
\quad\Longrightarrow\quad
X'X\hat\beta = X'y \tag{12}
\]
These are the normal equations. Provided that X'X has rank k (which it will
do, by the identification condition A2 that we imposed), we can solve out for β̂:
\[
\hat\beta = (X'X)^{-1}X'y
\]

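A minimal sketch of the closed-form solution (simulated data, hypothetical parameters): solve the normal equations X'Xβ̂ = X'y directly and compare with numpy's least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(8)

n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])   # hypothetical parameters
y = X @ beta_true + rng.normal(0, 1, size=n)

# Solve the normal equations X'X beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(np.round(beta_hat, 3))
print(np.round(np.linalg.lstsq(X, y, rcond=None)[0], 3))   # should agree
```

Solving the linear system directly is numerically preferable to forming the explicit inverse (X'X)⁻¹ and multiplying.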


The solution to the OLS problem

For the OLS solution
\[
\hat\beta = (X'X)^{-1}X'y
\]
to define the unique minimum (and not some other stationary point),
the Hessian, given by
\[
\frac{\partial^2\, \mathrm{RSS}}{\partial \hat\beta\, \partial \hat\beta'} = 2X'X,
\]
must be a positive definite matrix, which it is because X has full rank.
Therefore, the least squares solution is unique and minimizes the sum
of squared residuals.

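A quick numerical check of this second-order condition (a sketch on simulated data): when X has full column rank, 2X'X has strictly positive eigenvalues and is therefore positive definite; with an exactly collinear column it is not:

```python
import numpy as np

rng = np.random.default_rng(9)

n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

hessian = 2 * X.T @ X
print(np.all(np.linalg.eigvalsh(hessian) > 0))   # True: positive definite

# With an exactly collinear column the Hessian is only positive semi-definite
X_bad = np.column_stack([X, X[:, 1] + X[:, 2]])
print(np.min(np.linalg.eigvalsh(2 * X_bad.T @ X_bad)))   # approximately 0
```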
