
Econometric Methods

Notes

Compiled By

Andrew Muganga Kizito

February 2019

Contents

Part I: The Basics of Regression Analysis
1 Introduction to Econometrics
1.1 Definition
1.2 Some History
1.3 The Econometric Approach
1.4 Types of Models
1.5 Structure of Economic Data
1.6 Other Important Concepts
1.7 Uses of Econometrics
1.8 Data Generating Mechanisms
1.9 An Example
1.10 Assignment: An Econometric Model Example
1.11 Key Terms
2 Basic Mathematical Concepts
2.1 Vectors and Matrices
2.2 Matrix Operations
2.3 Matrix Inversion
2.4 Quadratic Forms
2.5 Calculus of Optimization
3 Elementary Statistics: A Review
3.1 Random Variables
3.2 Estimation
3.3 Properties of the Estimators
3.4 Probability Distributions
3.5 Hypothesis Testing, P-Values and Confidence Intervals
3.6 Descriptive Statistics
4 Introduction to the Regression Model
4.1 Curve Fitting
4.2 Derivation of Least Squares
5 The Two-Variable Regression Model (Simple Linear Regression)
5.1 The Model
5.2 The Best Linear Unbiased Estimation
5.3 Hypothesis Testing and Confidence Intervals
5.4 Analysis of Variance: Goodness of Fit
5.5 Examples of Commands in Stata
6 Multiple Linear Regression
6.1 The Model
6.2 In Matrix Notation
6.3 Properties of OLS Estimators / Gauss-Markov Theorem
6.4 OLS Estimation in Matrix Notation
6.5 Unbiased Estimation
6.6 Variance of the Least Squares
6.7 The Distribution of OLS Estimators / Regression Statistics
6.8 F-tests, R-Square and Corrected R-Square
7 Using the Multiple Regression Model
7.1 The General Linear Model
7.2 Use of Dummy Variables
7.3 Other Cases
7.4 Interpretation of Some Examples of Transformations
Part 2: Single Equation Regression Models
8 Multicollinearity
8.1 Definition
8.2 Perfect Collinearity as an Identification Problem
8.3 High but Imperfect Collinearity
8.4 Consequences of Multicollinearity
8.5 Detection of Multicollinearity
8.6 Remedial Measures
8.7 Concluding Remarks
9 Heteroskedasticity
9.1 Definition of Heteroskedasticity
9.2 Forms of Heteroskedasticity
9.3 Causes of Heteroskedasticity
9.4 Consequences of Heteroskedasticity
9.5 Detection of Heteroskedasticity
9.6 Tests for Heteroskedasticity
9.7 What to Do If You Find Evidence of Heteroskedasticity
9.8 Conclusions about Heteroskedasticity
9.9 Some Stata Commands
10 Serial Correlation (Autocorrelation)
10.1 Definition of Autocorrelation
10.2 Autocorrelation vs. Heteroskedasticity
10.3 Causes of Autocorrelation
10.4 Types of Autocorrelation
10.5 Detection of Autocorrelation
10.6 Correcting for Autocorrelation
10.7 Cochrane-Orcutt Iterative Procedure
10.8 Concluding Remarks on Autocorrelation
11 Forecasting with a Single Equation Regression Model
11.1 Definition
11.2 Point and Interval Forecasts
11.3 Ex-post and Ex-ante Forecasts
11.4 Conditional vs. Unconditional Forecasts
11.5 The Forecast Error
11.6 Forecast Evaluation
12 Elements of Model Specification (Gujarati, Chapter 13)
12.1 Basic Facts about Model Specification Analysis
12.2 Types of Specification Errors and Their Consequences
12.3 Specification Testing
12.4 Box-Cox Transformation and the Box-Cox Test (graduate)
12.5 Non-nested Hypothesis Tests
12.6 Alternative Philosophies
12.7 Conclusions
12.8 Model Specification and Evaluation Analysis (to revisit during TS analysis)
12.9 Take Note of Spurious Regression (to revisit during TS analysis; graduate)
13 Maximum Likelihood Estimation Method
13.1 Maximum Likelihood Estimation
14 Models of Qualitative Choice
14.1 Discrete Dependent Variable Models
14.2 The Basic Model
14.3 A Generic Model
14.4 Linear Probability Model
14.5 Logit Model
14.6 Probit Model
14.7 Estimating the Logit and Probit Models
14.8 Using STATA
14.9 Logit versus Probit
14.10 Interpretation of Results
14.11 Tobit Models
Part I: The Basics of Regression Analysis

1 Introduction to Econometrics

1.1 Definition

Econo = Economics
Metrics = Measurement
Econometrics: the quantitative measurement of economic concepts and relationships

Econometrics is the application of statistical and mathematical methods to analyse economic data.

Examples

Analysing the changes in quantities produced and consumed due to changes in prices and
income (price and income elasticities of demand and supply) for agricultural commodities.

Impact of policy interventions


- Health: Distribution of mosquito nets on malaria incidence
- Education: Training teachers on student completion rates or grades
- Infrastructure: Opening or tarmacking new roads on agricultural production
- Monetary and Fiscal Policy: Changes in interest rates on inflation

Econometrics has spread into other areas such as the social sciences, health, engineering,
psychology, geography, etc.

1.2 Some History

- Developed as a result of the application of statistics, mathematics, and economics to economic data that was becoming readily available in the early 1900s
- Early work was based on a "classical" approach which grew out of experimental statistics

1.2.1 Highlights of Classical Approach
1. State theory or hypothesis
2. Specify mathematical model of the theory
3. Specify econometric model
4. Obtain data via experiment
5. Estimate parameters
6. Test hypothesis
7. Forecast or predict
8. Draw conclusions (control or policy purposes)

Works well where experimental data can be obtained (e.g., in studying effects of fertilizer
application on maize yield)

Does not work well in other cases (e.g., studying the effects of changes in the money supply on inflation)

1.2.2 Highlights of the modern Approach


1. Recognize that economics is not an experimental science
2. Instead of controlling for background factors, include them in the model and measure
them
3. Treat all variables as random
4. Models are often incomplete and data is “noisy” and “messy”
5. Draw conclusions or make predictions based on incomplete and evolving theories

1.3 The Econometric approach


1. Identify the question to be answered
2. Specify a tentative mathematical / statistical model (use economic theory or any other
available information)
3. Collect data
4. Choose an appropriate estimator and estimate the parameters of the model
5. Evaluate the estimated model and return to the specification stage if necessary
6. Evaluate the final model specification (hypothesis and specification tests)
7. Answer the questions you started out with. This may involve a forecast, a hypothesis
test, or an economic interpretation of the results.

1.4 Types of models
1. Time-Series Models
2. Single-Equation Models
3. Multi-Equation Models

1.5 Structure of economic data


1. Time series data
a. Observe variable(s) over a long period
2. Cross-sectional data
a. A sample of units (individuals, households, holdings) at a given point in time
3. Pooled Cross sections
a. Combines both cross-sectional and time series features
b. Units may be different
4. Panel or longitudinal data
a. Time series for each cross sectional member

1.6 Other Important Concepts


1. Causality
a. Statistical relations do not establish causal connection.
b. Causation comes from some theory
c. One variable has causal effect on another (rain and yield)
d. Association does not necessarily imply causality
2. The notion of Ceteris Paribus
a. Other relevant factors remaining constant
b. E.g., change in prices on quantity demanded holding other factors constant
3. If other factors are not kept constant, causal effects cannot be established

1.7 Uses of Econometrics
1. Structural analysis: What is the price elasticity of export demand for Uganda grains?
2. Forecasting: What will be the value of coffee exports from Uganda in 2020?
3. Policy Analysis (Conditional Forecasting): What will be the value of coffee exports
in 2020 from Uganda following the introduction of a tax on agricultural inputs in
2015/2016 budget?
4. Other examples
Is the generic advertising of mosquito nets effective in increasing usage and controlling malaria?
(What is "effective"? How do you test for effectiveness?)
1.8 Data Generating Mechanisms

The DGM is an important concept in econometric model building and inference.

The DGM is an abstract concept, not a tangible, known process.

There must be some mechanism that generates the economic data that we observe. We may
not know much about the process, or what makes it work, but there will be a process

When we specify an econometric model, we are hypothesizing what the DGM looks like
(but we will never know for sure)

When we engage in specification testing, we are testing whether our hypothesized DGM fits
the data or not.

Two Phases

We approach this course in two phases

1. We begin by assuming we have the correct DGM (model specification), and all we have to do is use the data to estimate the parameters and test hypotheses.

2. Then we move on to the more difficult question of how we determine whether our model specification is consistent with the true DGM, and what effect it might have on our estimation and hypothesis testing if our model specification is incorrect.

1.9 An example
a. Research Question:

To what extent does maize price, sex of farmer, land under maize production, and transport
costs influence maize sales in Uganda?

b. Model specification:
\[ \ln Q_i = \beta_0 + \beta_1 \ln P_i + \beta_2 S_i + \beta_3 \ln L_i + \beta_4 \ln T_i + \varepsilon_i \]

Where
$\ln$ = natural logarithm,
$Q_i$ = quantity of maize sold in kilograms by the $i$-th household,
$P_i$ = price of maize in shillings per kilogram for the $i$-th household,
$S_i$ = sex of the household head of the $i$-th household (Female = 1, Male = 0),
$L_i$ = land under maize production by the $i$-th household,
$T_i$ = transport cost per 100-kilogram bag of maize from home to the market for the

c. Data:

Collected from a random sample of households in Uganda in 2013

d. Estimation Results

\[ \widehat{\ln Q_i} = 6.965 + 0.407\,\ln P_i + 0.609\,S_i + 0.201\,\ln L_i + 0.430\,\ln T_i \]
\[ \phantom{\widehat{\ln Q_i} = {}} (1.387)\quad (0.1876)\quad\;\; (0.128)\quad\;\; (0.0698)\quad\;\; (0.0813) \]
\[ F = 16.21 \qquad R^2 = 0.1618 \qquad ESS = 456.62 \qquad T = 341 \]

Standard errors are reported in parentheses under the coefficient estimates. The regression F-statistic, the
model $R^2$, the error sum of squares (ESS), and the sample size ($T$) are also reported.
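For reference, a minimal Stata sketch that would produce output of this kind, assuming hypothetical variable names q, p, sex, land, and tcost (not from the original survey):

* a sketch only; variable names are assumed
generate lnq = ln(q)          // log of quantity sold (kg)
generate lnp = ln(p)          // log of price (shillings/kg)
generate lnland = ln(land)    // log of land under maize
generate lntcost = ln(tcost)  // log of transport cost
regress lnq lnp sex lnland lntcost   // OLS: reports coefficients, SEs, F, R-squared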

e. Model Evaluation

“Goodness of fit”

Other specification tests to determine the "best" model if we had more than one candidate model

f. Interpretation of Results and answer the research question

a) Interpret the intercept and the slope parameter estimate on the sex variable.

b) Interpret the slope parameters on price, land and transport, and state whether you
think the estimated signs for the parameters estimates for price, land, and transport variables
conform to what we would expect from the economics underlying the model.

g. Inference

c) Calculate and interpret a 95% confidence interval on the slope parameter associated
with price variable.

d) Using a 1% significance level, formally test the null hypothesis that the slope
parameter on the transport is 0.

e) Predict the quantity of maize in kilograms that a female farmer would sell if she plants
2 acres of maize, sells at 600 shillings per kilogram, and incurs a transport cost of 10,000
shillings per 100 kg bag.

1.10 Assignment: An Econometric Model Example


Use font size 12, Times New Roman, with margins of 1 inch on the top, bottom, left, and right.
On two pages, give an example of the following.
a. Research question
b. Specify the model
c. State the data you will use
d. What results will you estimate?
e. How will you evaluate the model?
f. How will you interpret the results and answer your research question in (a)?
g. What inference will you make?

Hand in on February 25, 2019.

1.11 Key terms


Causal Effect
Ceteris Paribus
Cross-Sectional Data Set
Econometric Model
Economic Model
Empirical Analysis
Experimental Data
Non-experimental Data
Observational Data
Panel Data
Pooled Cross Section
Time Series Data
2 Basic Mathematical Concepts

2.1 Vectors and Matrices


A matrix $A$ of order (or dimension) $M \times N$ ($M$ rows and $N$ columns) is given by
\[ A = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{M1} & a_{M2} & \cdots & a_{MN} \end{bmatrix} \]
Scalar: a single (real) number, or a matrix of order $1 \times 1$.
Column Vector: a matrix of $M$ rows and 1 column, e.g.,
\[ x_{5 \times 1} = \begin{bmatrix} 3 \\ 1 \\ 5 \\ 0 \\ 2 \end{bmatrix} \]

Row Vector: a matrix of 1 row and $N$ columns, e.g.,
\[ x_{1 \times 4} = \begin{bmatrix} 1 & 1 & 4 & 9 \end{bmatrix}. \]
Transposition: the transpose of a matrix $A$, denoted by $A'$, is obtained by interchanging the
rows and the columns. For instance:
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \;\Rightarrow\; A' = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix} \]
Similarly,
\[ x = \begin{bmatrix} 1 \\ 2 \\ 5 \end{bmatrix} \quad\text{and}\quad x' = \begin{bmatrix} 1 & 2 & 5 \end{bmatrix}. \]
Commonly the convention is to indicate row vectors by primes, as in $x' = [1\;\; 2\;\; 5]$.
Sub-matrix
A matrix resulting from the deletion of the $i$-th row and $j$-th column. If
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \]
then the submatrix $A(1,2)$ of $A$ is
\[ \begin{bmatrix} 4 & 6 \\ 7 & 9 \end{bmatrix}, \]
which was obtained by deleting the first row and the second column.

Types of Matrices

Square Matrix: has the same number of rows as columns, e.g.,
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \quad\text{or}\quad B = \begin{bmatrix} 1 & 3 \\ 0 & 11 \end{bmatrix}. \]
Diagonal Matrix: a square matrix with at least one nonzero element on the main diagonal and zeros elsewhere, e.g.,
\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 3 \end{bmatrix} \]
Scalar Matrix: a diagonal matrix whose diagonal elements are all equal (e.g., the variance-covariance matrix of homoskedastic, uncorrelated errors):
\[ \operatorname{var\text{-}cov}(u) = \Sigma = \begin{bmatrix} \sigma^2 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 & \sigma^2 \end{bmatrix} \]
Identity or Unit Matrix: a diagonal matrix with the elements of the leading diagonal equal to 1 and the rest zeros, e.g.,
\[ I_5 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \]
Symmetric Matrix: a square matrix whose elements above the main diagonal are mirror
images of the elements below the main diagonal.

Triangular Matrix: a square matrix that has all zeros either above or below the main (nonzero) diagonal:
\[ \text{Upper triangular} = \begin{bmatrix} 1 & 4 & 6 \\ 0 & 2 & 8 \\ 0 & 0 & 3 \end{bmatrix} \quad\text{or}\quad \text{Lower triangular} = \begin{bmatrix} 1 & 0 & 0 \\ 4 & 2 & 0 \\ 6 & 8 & 3 \end{bmatrix}. \]
Null Matrix: a matrix with all elements zero, denoted by $O$.
Equal Matrices: must be of the same order with corresponding elements equal; we write $A = B$.

2.2 Matrix Operations
2.2.1 Addition and Subtraction:
\[ C = A + B = [a_{ik} + b_{ik}] \]
\[ A - B = [a_{ik} - b_{ik}] \]

2.2.2 Scalar Multiplication


Multiply each element by the scalar, i.e., $\lambda A = [\lambda a_{ij}]$. For example,
\[ A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}, \quad\text{then}\quad 2A = \begin{bmatrix} 2 \cdot 1 & 2 \cdot 1 \\ 2 \cdot 0 & 2 \cdot 2 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 0 & 4 \end{bmatrix} \]
2.2.3 Matrix Multiplication
To multiply matrix $A$ by $B$ as $AB$, the matrices $A$ and $B$ must be multiplicatively
conformable; that is, the number of columns of $A$ must equal the number of rows of
$B$. For instance,
\[ A_{2\times 2}\, B_{2\times 3} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 3 & 1 & 0 \end{bmatrix} = \begin{bmatrix} (1\cdot 1)+(2\cdot 3) & (1\cdot 0)+(2\cdot 1) & (1\cdot 2)+(2\cdot 0) \\ (3\cdot 1)+(4\cdot 3) & (3\cdot 0)+(4\cdot 1) & (3\cdot 2)+(4\cdot 0) \end{bmatrix} \]
Properties of Matrix Multiplication
a. Matrix multiplication is not always commutative, i.e., $AB \neq BA$.
b. A row vector times a column vector is a scalar. Consider the ordinary least squares residuals:
\[ u'u = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + u_n^2 = \sum_i u_i^2, \text{ a scalar.} \]
i

c. A column vector of order $n$ times a row vector of order $n$ yields an $n \times n$ matrix:
\[ uu' = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} = \begin{bmatrix} u_1^2 & u_1 u_2 & \cdots & u_1 u_n \\ u_2 u_1 & u_2^2 & \cdots & u_2 u_n \\ \vdots & \vdots & \ddots & \vdots \\ u_n u_1 & u_n u_2 & \cdots & u_n^2 \end{bmatrix}, \]
which is symmetric.
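These operations can be checked in Stata's matrix language; a small sketch using the 2 x 2 and 2 x 3 example above:

matrix A = (1, 2 \ 3, 4)        // 2 x 2 matrix; \ separates rows
matrix B = (1, 0, 2 \ 3, 1, 0)  // 2 x 3 matrix
matrix C = A*B                  // conformable: columns of A = rows of B
matrix list C                   // displays the 2 x 3 product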

2.3 Matrix Inversion
An inverse of a square matrix $A$, denoted by $A^{-1}$, if it exists, is the unique square matrix such
that
\[ AA^{-1} = A^{-1}A = I. \]
Properties
a. $(AB)^{-1} = B^{-1} A^{-1}$.
b. $(A')^{-1} = (A^{-1})'$.

Determinants
To every square matrix $A$ there corresponds a scalar known as its determinant, denoted by
$\det(A)$ or $|A|$. For a $2 \times 2$ matrix
\[ A_{2\times 2} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \]
the determinant is
\[ |A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} - a_{12} a_{21}. \]

Properties of Determinants
a. A singular matrix is one whose determinant is zero; a nonsingular matrix is one whose
determinant is nonzero. The inverse of a singular matrix is therefore indeterminate.
b. If all elements of any row of a matrix are zeros, its determinant is zero.
c. $|A'| = |A|$.
d. If any row (or column) is a multiple (linear combination) of other rows (or columns),
the determinant of the matrix is zero. This is helpful in demonstrating multicollinearity of
variables.
e. $|AB| = |A|\,|B|$.

Rank of a Matrix: the order of the largest square submatrix whose determinant is not zero.

As noted earlier, the inverse of a singular matrix does not exist. Therefore, for an $n \times n$ matrix
$A$, its rank must be $n$ for its inverse to exist; if the rank is less than $n$, $A$ is singular. This is an
important condition for valid estimation.

Minor
If the $i$-th row and $j$-th column of an $n \times n$ matrix $A$ are deleted, the determinant of the resulting
submatrix is called the minor of the element $a_{ij}$ and is denoted by $M_{ij}$.

Cofactor
The cofactor of element $a_{ij}$ of an $n \times n$ matrix $A$, denoted by $c_{ij}$, is defined as
\[ c_{ij} = (-1)^{i+j} M_{ij}. \]

Cofactor Matrix
Replacing each element $a_{ij}$ of a matrix $A$ by its cofactor $c_{ij}$, we obtain the cofactor matrix,
$\operatorname{cof} A$.

Adjoint Matrix: the transpose of the cofactor matrix, denoted by $\operatorname{adj} A$.

Finding the Inverse of a Square Matrix

If $A$ is a nonsingular square matrix (i.e., $|A| \neq 0$), its inverse $A^{-1}$ is given by
\[ A^{-1} = \frac{1}{|A|} (\operatorname{adj} A). \]
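In practice the determinant and inverse are computed numerically; a minimal Stata check using the 2 x 2 example from the previous section:

matrix A = (1, 3 \ 0, 11)   // a nonsingular square matrix
display det(A)              // determinant: (1)(11) - (3)(0) = 11
matrix Ainv = inv(A)        // inverse exists because det(A) != 0
matrix I2 = A*Ainv          // should reproduce the 2 x 2 identity matrix
matrix list I2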

2.4 Quadratic Forms


Let $A$ denote an $n \times n$ symmetric matrix with real entries and let $x$ denote an $n \times 1$ column
vector. Then
\[ Q(x) = x'Ax \]
is said to be a quadratic form:
\[ Q(x) = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \sum_i \sum_j a_{ij}\, x_i x_j \]
Note: Weighted sum of squares and cross products of x .

Example: the one-variable quadratic $y = ax^2$.

If $a > 0$, then $ax^2$ is always positive and zero only when $x = 0$; therefore $ax^2$ is positive
definite and $x = 0$ is a global minimizer (optimum).
If $a < 0$, then $ax^2$ is always negative and zero only when $x = 0$; therefore $ax^2$ is negative
definite and $x = 0$ is a global maximizer (optimum).

Classification of Quadratic Forms

a. Positive definite if $Q(x) = x'Ax > 0$ for all $x \neq 0$ in $R^n$;
b. Positive semidefinite if $Q(x) = x'Ax \geq 0$ for all $x \neq 0$ in $R^n$;
c. Negative definite if $Q(x) = x'Ax < 0$ for all $x \neq 0$ in $R^n$;
d. Negative semidefinite if $Q(x) = x'Ax \leq 0$ for all $x \neq 0$ in $R^n$;
e. Indefinite if $Q(x) = x'Ax > 0$ for some $x$ in $R^n$ and $Q(x) = x'Ax < 0$ for
some other $x$ in $R^n$.
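A quadratic form can be evaluated directly in Stata's matrix language; a small sketch with an assumed symmetric matrix A (not from the notes):

matrix A = (2, 1 \ 1, 2)    // symmetric and positive definite
matrix x = (1 \ -1)         // a 2 x 1 column vector
matrix Q = x'*A*x           // the scalar x'Ax, held as a 1 x 1 matrix
matrix list Q               // Q[1,1] = 2 > 0; positive for every nonzero x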

2.5 Calculus of Optimization

The first-order (necessary) condition for an optimum (maximum or minimum) is
\[ \frac{dy}{dx} = 0 \]
A sufficient (second-order) condition for an optimum is:
\[ \text{for a maximum,}\quad \frac{d^2 y}{dx^2} < 0; \qquad \text{for a minimum,}\quad \frac{d^2 y}{dx^2} > 0. \]
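For instance, a quick check with $y = (x - 2)^2$: the first-order condition $dy/dx = 2(x - 2) = 0$ gives $x = 2$, and $d^2 y / dx^2 = 2 > 0$, so $x = 2$ is a minimum.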

3 Elementary Statistics: A Review

3.1 Random Variables


- A random variable is a variable that can take on different numerical values, each with some probability lying in [0, 1].
- A random variable (RV) may be discrete or continuous.
- A random variable Y is said to be discrete if it can only assume a finite or countably infinite number of distinct values.
- A random variable that can take on any value in an interval is called continuous.

3.1.1 Probability Functions and Densities


Definition: A probability function maps every possible outcome from a discrete RV into a
probability value in [0, 1].

These probability values add to one.

These are point probabilities of random variables.

Definition: A probability density function maps every possible outcome from a continuous
RV into a value that defines the probability of any interval as the area under the PDF.

PDFs integrate to one.

3.1.2 Expected Values


Every random variable can be written as: $Y = \mu + \varepsilon$.

The expected value (or mean) $\mu$ is calculated as:
\[ \mu = E(Y) = \sum_y y\, p(y) \quad \text{for discrete RVs} \]
\[ \mu = E(Y) = \int y\, f(y)\, dy \quad \text{for continuous RVs} \]
$\varepsilon$ is another RV with $E(\varepsilon) = 0$.

Expectation is a Linear Operator

Result 1: $E(a + bY) = a + bE(Y)$

Result 2: $E(a + bY_1 + cY_2) = a + bE(Y_1) + cE(Y_2)$

But this only holds when the relationship is linear. Let's look at some examples where this does NOT work:

$E(Y_1 Y_2) \neq E(Y_1)\,E(Y_2)$ unless $Y_1$ and $Y_2$ are independent;
$E(1/Y) \neq 1 / E(Y)$;
$E(\ln Y) \neq \ln E(Y)$.
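A quick example of the failure of linearity: if $Y$ takes the values $-1$ and $1$ with probability $1/2$ each, then $E(Y) = 0$, but $E(Y^2) = 1 \neq [E(Y)]^2 = 0$.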

Variance

The variance of an RV, $Y$, is a measure of "dispersion" around the mean and is defined as:
\[ \operatorname{Var}(Y) = \sigma^2 = E[(Y - \mu)^2] \]
The standard deviation of $Y$ is then the square root of the variance:
\[ \sigma = \sqrt{\operatorname{Var}(Y)} \]
and is a measure of the "typical" distance between a $Y$ outcome and its mean $\mu$.

3.1.3 Joint Distribution, Independence, Covariance, and Correlation


Joint Distribution

Two RVs, Y1 and Y2, are said to have a joint distribution if knowledge of the outcome of
one of them influences the probabilities that the other takes particular values.

Example: Maize and Beans yields in Busoga next season are related. If you know what the
maize yield will be, that helps you predict the bean yield. Hence, these RVs are jointly
distributed.

Independence
If knowing the value of one RV does not provide ANY information for predicting the other, i.e.,
\[ P(Y_1 \mid Y_2 = y) = P(Y_1) \quad \text{for all } y, \]
then the RVs are said to be independent.

Covariance
Covariance is a measure of the tendency of two jointly distributed RVs to "move together":
\[ \operatorname{Cov}(Y_1, Y_2) = \sigma_{12} = E[(Y_1 - \mu_1)(Y_2 - \mu_2)] = E(Y_1 Y_2) - \mu_1 \mu_2 \]

Note: Independence implies covariance is zero but zero covariance does not imply
independence.

Variance Is Not a Linear Operator
\[ \operatorname{Var}(a + bY) = b^2 \operatorname{Var}(Y) \]
Assignment: find
\[ \operatorname{Var}(Y_1 + Y_2) \quad\text{and}\quad \operatorname{Var}(aY_1 + bY_2). \]

Correlation
Correlation is related to covariance and both are measures of the tendency of two RVs to
move together.

But: Covariance depends on the units of measurement of the RVs while correlation is
independent of the units of measurement.

This characteristic makes correlation more useful as a measure of co-movement.

The correlation coefficient of the two RVs $Y_1$ and $Y_2$ is defined as:
\[ \rho(Y_1, Y_2) = \rho_{12} = \frac{\sigma_{12}}{\sigma_1 \sigma_2} \]
Correlation coefficients always satisfy $-1 \leq \rho \leq 1$, and:
$\rho = 0$ indicates no relationship (zero covariance);
$\rho = -1$ indicates a complete negative relationship;
$\rho = 1$ indicates a complete positive relationship.

3.2 Estimation

3.2.1 Data Samples and Statistical Inference


Consider the random variable, Y (e.g., the yield on a particular kind of maize plot in
Kapchorwa).

Suppose we make one draw from Y (e.g., We measure the yield for one maize plot in one
season). This is one data point on Y.

Now run the same experiment N times providing a sample of N data points (e.g., We
measure yield from 12 different plots but all of the same plot type).

We want to do statistical inference (learn about the properties of the underlying random
variable Y from observing the data sample {y1, y2, ..., yN}).

3.2.2 Statistics and Estimation


Definition: A statistic is any function of the data sample.
Definition: An estimator is any statistic that provides information about the underlying
population probability distribution.
Note: The data are random draws from the underlying probability distribution. Therefore,
statistics are themselves RVs with their own probability distributions.

3.2.3 Estimators of the Mean, Variance and Co-Variance

The population mean is given as:
\[ \mu_Y = \frac{1}{N} \sum_{i=1}^{N} y_i \]
The estimate of the mean is:
\[ \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i \]
The population variance is given as:
\[ \sigma_Y^2 = E[(Y - \mu_Y)^2] = \frac{1}{N} \sum_{i=1}^{N} (y_i - \mu_Y)^2 \]
The analogous sample estimator with divisor $n$ is biased; we normally use the divisor $n - 1$:
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 \quad\text{because}\quad E(s^2) = \sigma_Y^2. \]
The estimate of the variance is therefore:
\[ s^2 = v(y) = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 \]
The covariance of $y_1$ and $y_2$ is the expectation of the product of $y_1$ and $y_2$ when both are measured as deviations from their means:
\[ \operatorname{Cov}(y_1, y_2) = E[(y_{1i} - \mu_1)(y_{2i} - \mu_2)] = \frac{1}{N} \sum_{i=1}^{N} (y_{1i} - \mu_1)(y_{2i} - \mu_2) \]
The estimate of the covariance is:
\[ \widehat{\operatorname{Cov}}(y_1, y_2) = \frac{1}{n-1} \sum_{i=1}^{n} (y_{1i} - \bar{y}_1)(y_{2i} - \bar{y}_2) \]
The correlation coefficient between variables $y_1$ and $y_2$ is given as:
\[ \rho(y_1, y_2) = \frac{\operatorname{Cov}(y_1, y_2)}{\sqrt{V(y_1)\, V(y_2)}} \]
The sample correlation coefficient is:
\[ r(y_1, y_2) = \frac{\widehat{\operatorname{Cov}}(y_1, y_2)}{\sqrt{v(y_1)\, v(y_2)}} = \frac{\sum_{i=1}^{n} (y_{1i} - \bar{y}_1)(y_{2i} - \bar{y}_2)}{\sqrt{\sum_{i=1}^{n} (y_{1i} - \bar{y}_1)^2 \sum_{i=1}^{n} (y_{2i} - \bar{y}_2)^2}} \]
$\rho = 0$ indicates no relationship (zero covariance);
$\rho = -1$ indicates a complete negative relationship;
$\rho = 1$ indicates a complete positive relationship.
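These estimators are what Stata's summary commands report; a minimal sketch, assuming two variables y1 and y2 in memory (hypothetical names):

summarize y1                    // sample mean and s (the n-1 divisor is used)
correlate y1 y2                 // sample correlation coefficient r
correlate y1 y2, covariance     // sample variances and covariance instead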

3.2.4 The Central Limit Theorem


If the random variable $Y$ has mean $\mu$ and variance $\sigma^2$, then the sampling distribution of $\bar{Y}$
becomes approximately normal with mean $\mu$ and variance $\sigma^2 / N$ as $N$ increases.
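The theorem is easy to see by simulation; a sketch (an assumed setup, not from the notes) drawing repeated samples from a skewed chi-square(1) population, so any normality of the sample mean comes from the CLT rather than from the data:

program define onemean, rclass
    drop _all
    set obs 100                // sample size N
    generate y = rchi2(1)      // draws from a skewed population
    summarize y
    return scalar ybar = r(mean)
end
simulate ybar = r(ybar), reps(2000) nodots: onemean
histogram ybar, normal         // sampling distribution of the mean looks near-normal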

3.3 Properties of the Estimators


3.3.1 Unbiasedness
\[ E(\bar{Y}) = \mu \]

3.3.2 Efficiency
\[ \operatorname{Var}(\bar{Y}) < \operatorname{Var}(Y^*) \quad \text{for some other estimator } Y^* \]

3.3.3 Consistency
\[ \operatorname{plim}(\bar{Y}) = \mu, \quad \text{i.e.,} \quad \lim_{N \to \infty} \operatorname{prob}\big( |\bar{Y} - \mu| < \delta \big) = 1 \]

3.4 Probability Distributions


You need to be familiar with the following four probability distributions
1. Normal
2. Chi-Square
3. t distribution
4. F distribution

3.4.1 The Standard Normal (Z-distribution)


A standard normal RV, $Z$, is continuous over $(-\infty, +\infty)$, has zero mean and unit variance,
and has a pdf of the form:
\[ f(z) = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\tfrac{1}{2} z^2 \right) \]
It is denoted $Z \sim N(0, 1)$, with $Z = \dfrac{Y - \mu}{\sigma_Y}$.
A general normal RV, $Y$, is continuous over $(-\infty, +\infty)$ and can be written as:
\[ Y = \mu + \sigma_Y Z \]
where $\mu = E(Y)$, $\sigma_Y^2 = \operatorname{var}(Y)$, and $Z \sim N(0, 1)$.
It is denoted $Y \sim N(\mu, \sigma_Y^2)$.

3.4.2 Chi-Square distribution


An RV defined as the sum of squares of $N$ independent standard normal RVs is distributed as
chi-square with $N$ degrees of freedom:
\[ W = \sum_{i=1}^{N} Z_i^2 \sim \chi^2(N), \]
where the $Z_i$ are independent and distributed $Z_i \sim N(0, 1)$ for all $i$.
Note: $W$ is defined over $[0, +\infty)$.

3.4.3 The t-distribution


An RV defined as the ratio of a standard normal RV to the square root of an independent
chi-square RV divided by its $N$ degrees of freedom is distributed as $t$ with $N$ degrees of
freedom:
\[ V = \frac{Z}{\sqrt{W/N}} \sim t(N), \]
where $Z \sim N(0, 1)$ and $W \sim \chi^2(N)$.
Note: $V$ is defined over $(-\infty, +\infty)$, and looks a lot like a normal, but with "fatter tails."

3.4.4 The F-distribution


If $W_1$ and $W_2$ are two independent chi-square RVs with degrees of freedom $N_1$ and $N_2$,
then the RV defined by the ratio
\[ U = \frac{W_1 / N_1}{W_2 / N_2} \sim F(N_1, N_2) \]
is said to follow an F-distribution.

Note: $U$ is defined over $[0, +\infty)$.

3.5 Hypothesis Testing, P-Values and Confidence Intervals

3.5.1 Hypothesis Testing


Hypothesis testing involves the following steps:
1) State the null and alternative hypotheses, e.g., $H_0: \mu = 10$, $H_1: \mu \neq 10$.
2) Define an appropriate test statistic (e.g., the sample mean, sample variance, or
estimated coefficients from a regression model).
3) Compute the distribution of the statistic under the null:
\[ \bar{Y} \sim N(10, \sigma^2 / N) \]
4) Standardize the statistic so as to use published tables:
\[ Z = \frac{\bar{Y} - 10}{\sigma / \sqrt{N}} \sim N(0, 1) \quad\text{and}\quad \hat{t} = \frac{\bar{Y} - 10}{s / \sqrt{N}} \sim t(N - 1) \]
5) Compute the p-value of the computed test statistic.
6) Reject if the p-value is less than the chosen significance level; otherwise fail to reject.
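In Stata these steps are bundled into a single command; e.g., for the $H_0: \mu = 10$ example above, assuming a variable y in memory:

ttest y == 10    // t test of H0: mean(y) = 10; reports t, df, and p-values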

3.5.2 P-Values
A p-value is the minimum significance level at which a null hypothesis can be rejected.

P-values are sometimes called the exact level of significance of a test statistic.

E.g., If a p-value=0.04 this means the null can be rejected at the 5% level but not the 1%
level.

3.5.3 Confidence Intervals


If you can do hypothesis tests you can do confidence intervals. We know that:
\[ t = \frac{\bar{Y} - \mu}{s / \sqrt{N}} \sim t(N - 1) \]
Therefore, for 20 df:
\[ P(-1.725 \leq t \leq 1.725) = 0.9 \]
i.e.,
\[ P\!\left( \bar{Y} + 1.725\, \frac{s}{\sqrt{N}} \;\geq\; \mu \;\geq\; \bar{Y} - 1.725\, \frac{s}{\sqrt{N}} \right) = 0.9 \]
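The 1.725 critical value can be recovered in Stata rather than from published tables:

display invttail(20, 0.05)    // upper 5% point of t(20), approximately 1.725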

3.5.4 The power of a test
Power is the probability of rejecting the null hypothesis when it is in fact false:
\[ \text{power} = P(\text{reject } H_0 \mid H_1 \text{ is true}) \]
It is 1 minus the probability of a Type II error, i.e., 1 minus the probability that one
will accept the null hypothesis as true when it is in fact false.

Table on Power and Type I and Type II Errors

Decision           | H0 True                 | H0 False
Fail to reject H0  | Correct decision        | Type II error (1 - power)
Reject H0          | Type I error (p-value)  | Correct decision

3.6 Descriptive Statistics


- Histogram: tabulates the frequency distribution of the data.
- The Median:
  o For an odd number of observations, the middle observation when the data are ranked from low to high (or high to low).
  o For an even number of observations, the average of the two middle observations.
- Skewness: measures the symmetry of the probability distribution:
\[ S = \frac{1}{N} \sum_{i=1}^{N} \frac{(x_i - \bar{x})^3}{s^3} \]
  o $s$ is the standard deviation.
  o $S = 0$ for all symmetric distributions.
  o $S > 0$ when the upper tail is thicker.
  o $S < 0$ when the negative (lower) tail is thicker.

[Figures: a negatively skewed distribution and a positively skewed distribution.]

- Kurtosis: measures the "thickness" of the tails of the distribution:
\[ K = \frac{1}{N} \sum_{i=1}^{N} \frac{(x_i - \bar{x})^4}{s^4} \]
  o $K = 3$ for a normal distribution.
  o $K > 3$ for tails thicker than the normal.
  o $K < 3$ for tails thinner than the normal.
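Stata reports both moments directly; a one-line check, assuming a variable x in memory:

summarize x, detail    // reports skewness S and kurtosis K (K = 3 for a normal)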

4 Introduction to the Regression Model

4.1 Curve Fitting


Time series data: Describes movement over time (e.g., Daily, weekly, monthly, quarterly,
annual)

Cross-section data: Describes activities of individuals, firms, or households at a given point
in time.

Pooled data: Combines time series and cross-section data, e.g., a study of firms or households
over time.

Sample: a set of observations.

Grade Point Average and Family Income


Grade Point Average | Income of parents ($1,000)
4                   | 21
3                   | 15
3.5                 | 15
2                   | 9
3                   | 12
3.5                 | 18
2.5                 | 6
2.5                 | 12
Source: Pindyck and Rubinfeld

Assume a linear relation between X and Y


Objective: Obtain “Best” straight line relating X and Y
Method 1: Connect a line between the lowest and highest points.
Method 2: Draw a line which seems to pass through most points.
Method 3: Draw a line so that the sum of vertical distances from the line (deviations) is zero.
Method 4: Draw a line that minimizes the sum of absolute deviations from the fitted line.
Method 5: Least squares: the "line of best fit" minimises the sum of squared vertical deviations
of the points from the straight line.

(In the scatter diagram, negative deviations lie below the fitted line and positive deviations lie above it.)

4.2 Derivation of Least Squares

Fitted values: the least-squares line is
\[ \hat{y}_i = \hat{\alpha} + \hat{\beta} x_i \]
and the deviation (residual) is
\[ \hat{\varepsilon}_i = y_i - \hat{y}_i \]

The ordinary least squares (OLS) criterion for picking estimators in the simple linear
regression model is to choose the $(\hat{\alpha}, \hat{\beta})$ that minimise the sum of the squared residuals:
\[ \min_{\hat{\alpha}, \hat{\beta}} \sum_i \hat{\varepsilon}_i^2 = \sum_i (y_i - \hat{\alpha} - \hat{\beta} x_i)^2 \]
The first-order conditions for this quadratic programming problem are:
\[ -2 \sum_i (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0 \]
\[ -2 \sum_i x_i (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0 \]
These are the normal equations. Solving for $(\hat{\alpha}, \hat{\beta})$ gives the OLS formulas:
\[ \hat{\beta} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} \quad\text{and}\quad \hat{\alpha} = \bar{y} - \hat{\beta} \bar{x} \]
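The formulas can be verified on the GPA and family income data from Section 4.1; a minimal Stata sketch:

clear
input gpa income
4   21
3   15
3.5 15
2   9
3   12
3.5 18
2.5 6
2.5 12
end
regress gpa income               // OLS intercept and slope
correlate gpa income, covariance // slope = cov(gpa, income) / var(income)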

5 The Two-Variable Regression Model (Simple Linear Regression)

5.1 The model


The model is
\[ y_i = \alpha + \beta x_i + \varepsilon_i \]
where
$y_i$ = the $i$-th observation on the dependent variable $y$;
$x_i$ = the $i$-th observation on the explanatory variable $x$;
$\varepsilon_i$ = the $i$-th error term;
$\alpha$ and $\beta$ are the parameters (intercept and slope) to be estimated.

5.2 The Best Linear Unbiased Estimation

5.2.1 Properties of OLS estimators / The Gauss-Markov Theorem


(1) If the DGM is truly linear, of the form $y_i = \alpha + \beta x_i + \varepsilon_i$;
(2) the $x_i$ are not random variables (they are fixed/controlled, as in experiments);
(3) $E(\varepsilon_i) = 0$ for all $i$ (the error term has zero expected value);
(4) $\operatorname{Var}(\varepsilon_i) = E(\varepsilon_i^2) = \sigma^2$ for all $i$ (the error term has constant variance for all
observations: homoskedasticity);
(5) $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = E(\varepsilon_i \varepsilon_j) = 0$ for $i \neq j$ (the error terms are statistically independent: no
autocorrelation); and
(6) the error term is normally distributed;

then the OLS estimators $(\hat{\alpha}, \hat{\beta})$ are the Best Linear Unbiased Estimators (BLUE):
1. "Linear" means that the OLS estimators are linear functions of the observations on the
dependent variable.
2. "Unbiased" means that $E(\hat{\alpha}) = \alpha$ and $E(\hat{\beta}) = \beta$.
3. "Best" means that of all estimators that are linear and unbiased, the OLS estimators
have the smallest variance (they are the most efficient).

5.2.2 Sources of Errors


1. Errors in optimization (behavioral errors)
2. Omitted variables (no data, immeasurable variable, incomplete theory).
3. Errors of measurement (poor proxy variables)
4. Intrinsic randomness in human behavior
5. Principle of parsimony (use only as many variables as necessary)
6. Wrong functional form. Functional form is only an approximation to the true
relationship between the variables (linear approximation)

5.2.3 Conditional Mean Interpretation of Regression
Notice that if we assume $E(\varepsilon_i) = 0$ for all $i$ (a very unrestrictive assumption; why?) then:
\[ E(y_i \mid x_i) = \alpha + \beta x_i \]
is the conditional mean of $y_i$ (conditional on observing $x_i$).

That implies that:

1. The intercept $\alpha$ is the expected value of $y_i$ given $x_i = 0$:
\[ \alpha = E(y_i \mid x_i = 0) \]
2. The slope $\beta$ is the expected change in $y_i$ given a marginal (unit) change in $x_i$:
\[ \beta = \frac{\partial E(y_i \mid x_i)}{\partial x_i} \]
5.2.4 Estimation
We don't know $\alpha$ and $\beta$ but want to estimate them from the observed data on $(x_i, y_i)$ for
$i = 1, 2, 3, \ldots, N$.

Definition: An estimator $(\hat{\alpha}, \hat{\beta})$ of $(\alpha, \beta)$ is any pair of values that depend on the data and
satisfy $y_i = \hat{\alpha} + \hat{\beta} x_i + \hat{\varepsilon}_i$ for some set of residuals $\hat{\varepsilon}_i$.

Note that $\varepsilon_i$ is a population quantity that you don't see (the disturbance), while $\hat{\varepsilon}_i$ is a sample
quantity that is observable.

Note: Clearly there are many, many such potential estimators $(\hat{\alpha}, \hat{\beta})$. How do we make sure
we are picking "good" ones?

Some Potential Estimators


1. Graph the data and draw in a “line of best fit.”
2. Set = ∑ ⁄ and = 0
3. Set = 0 and = 1
4. OLS
Which of these are “good” and which are “bad?”

5.2.5 The Distribution of OLS Estimators


To test hypotheses about $\hat{\alpha}$ and $\hat{\beta}$ we need to know their probability distributions.
Suppose that the DGM is:
\[ Y_i = \alpha + \beta X_i + \varepsilon_i, \qquad \varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2) \]
It can be shown that:
\[ \hat{\alpha} \sim N\!\left( \alpha,\; \frac{\sigma^2 \sum_i X_i^2}{N \sum_{i=1}^{N} (X_i - \bar{X})^2} \right) \text{ (check)} \]
\[ \hat{\beta} \sim N\!\left( \beta,\; \frac{\sigma^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2} \right) \]
\[ \operatorname{Cov}(\hat{\alpha}, \hat{\beta}) = -\frac{\sigma^2 \bar{X}}{\sum_{i=1}^{N} (X_i - \bar{X})^2} \text{ (cross-check)} \]
\[ s^2 = \hat{\sigma}^2 = \frac{\sum_i \hat{\varepsilon}_i^2}{N - 2} \]
where $s = \hat{\sigma}$ is the standard error of the regression and $\sum_i \hat{\varepsilon}_i^2$ is the residual sum of squares.

Proof (sketch): let
\[ k_i = \frac{X_i - \bar{X}}{\sum_{i=1}^{N} (X_i - \bar{X})^2} = \frac{x_i}{\sum_i x_i^2}, \]
where lowercase $x_i = X_i - \bar{X}$ denotes deviations from the mean, and write $\hat{\beta} = \sum_i k_i Y_i$.

5.3 Hypothesis Testing and Confidence Intervals

5.3.1 Hypothesis Testing

Hypothesis testing involves the following steps:

1) State the null and alternative hypotheses, e.g., $H_0: \beta = 0$, $H_1: \beta \neq 0$.
2) Define an appropriate test statistic: $\hat{\beta}$.
3) Compute the distribution of the statistic under the null:
\[ \hat{\beta} \sim N\!\left( 0,\; \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2} \right) \]
4) Standardize the statistic so as to use published tables:
\[ Z = \frac{\hat{\beta}}{\sqrt{\sigma^2 / \sum_i (x_i - \bar{x})^2}} \sim N(0, 1) \quad\text{and}\quad \hat{t} = \frac{\hat{\beta}}{\sqrt{s^2 / \sum_i (x_i - \bar{x})^2}} \sim t(N - 2) \]
5) Compute the p-value of the computed test statistic $\hat{t}$.
6) Reject if the p-value is less than the chosen significance level; otherwise fail to reject.

5.3.2 Confidence Intervals


If you can do hypothesis tests you can do confidence intervals:
\[ \operatorname{Prob}\!\left( \hat{\beta} - t_{(\alpha/2,\, n-k)}\, s_{\hat{\beta}} \;\leq\; \beta \;\leq\; \hat{\beta} + t_{(\alpha/2,\, n-k)}\, s_{\hat{\beta}} \right) = 1 - \alpha \]
where $1 - \alpha$ is the desired level of confidence and $n - k$ the degrees of freedom.

We know that:
\[ \hat{t} = \frac{\hat{\beta} - \beta}{s_{\hat{\beta}}} \sim t(N - 2), \quad\text{where}\quad s_{\hat{\beta}}^2 = \frac{s^2}{\sum_i (x_i - \bar{x})^2} \]
Therefore, for 20 degrees of freedom:
\[ P(-1.725 \leq \hat{t} \leq 1.725) = 0.9 \]
i.e.,
\[ P\!\left( \hat{\beta} + 1.725\, s_{\hat{\beta}} \;\geq\; \beta \;\geq\; \hat{\beta} - 1.725\, s_{\hat{\beta}} \right) = 0.9 \]

5.4 Analysis of Variance: Goodness of Fit


How do we determine whether the model is a "good" fit for the data?
1. Determine the $y$ values predicted by the model:
\[ \hat{y}_i = \hat{\alpha} + \hat{\beta} x_i \]

2. Decompose the total sum of squares (TSS) in $y$ into two components: the "regression
sum of squares" (RSS) explained by the regression line, and the "error sum of squares"
(ESS) that is not explained by the regression line:

TSS = RSS + ESS

\[ \sum_{i=1}^{N} (y_i - \bar{y})^2 = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \]
3. Define the "coefficient of determination":
\[ R^2 = \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\text{ESS}}{\text{TSS}} \]
\[ R^2 = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} = 1 - \frac{\sum_{i=1}^{N} \hat{\varepsilon}_i^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \]

Note:
$R^2$ lies between 0 and 1.
Values "close to" 1 indicate a good fit, but:
A low (high) $R^2$ does not necessarily indicate the model is "bad" ("good"). Why not?

Note: Be careful about notation. In some books the Error Sum of Squares (ESS) is called
the RSS (Residual Sum of Squares), and the Regression Sum of Squares (RSS) is called the
ESS (Explained Sum of Squares).

5.5 Examples of Commands in Stata

summarize gpa income      // descriptive statistics
correlate gpa income      // correlation matrix
regress gpa income        // OLS regression of gpa on income
test income               // Wald test that the coefficient on income is zero
ttest gpa == 2            // t test of H0: mean gpa = 2

