Econometrics Chapters 1 & 2
The first task an econometrician faces is that of formulating an econometric model. What
is a model?
A model is a simplified representation of actual phenomena. For instance, saying that 'the quantity demanded of oranges depends on the price of oranges' is a simplified representation because there are a host of other variables that one can think of that determine the demand for oranges. For example, the price of related goods such as apples, the income of the consumer, and the taste and preference of the consumer are determinants of the demand for oranges. However, there is no end to this stream of other variables; in a remote sense even the price of gasoline can affect the demand for oranges.
In fact, u is the source of inexactness in this model. Assume u is a random variable taking the value +100 or −100 with equal probability (0.5, 0.5). Let also β₀ = 50 and β₁ = 1. For a given value of X (say X₀), Y is likely to take one of two values: β₀ + β₁X₀ + 100 or β₀ + β₁X₀ − 100.
For example, in economics there is a theory that states a positive relationship between household consumption expenditure and the level of income, other things being constant: as income increases, consumption expenditure will also increase.
Other accepted theories in economics include: a negative relation between investment and the interest rate, a positive relation between the demand for a normal good and the level of income, a negative relation between the quantity demanded and the price of a product, etc.
As noted above, the disturbance term is a surrogate for all those variables that are omitted from the model but that collectively affect the dependent variable Y. The question, then, is: why not introduce all those variables into the model explicitly? Some of the reasons are:
i) Effects of omitted variables
ii) Statistical errors
iii) Vagueness of theory
iv) Unavailability of data
v) Intrinsic randomness in human behavior
vi) Wrong functional form, etc.
4. Obtaining the data - To estimate the econometric model specified above, that is, to obtain the numerical values of β₀ and β₁, we need data. Thus, the econometrician or researcher can collect either primary or secondary data depending on his interest or
justification.
justification. Of course, we will have more to say about the nature and sources of data for
econometric analysis in the next chapter.
5. Estimation of the parameters of the econometric model - After obtaining the data,
the next task is to estimate the parameters of the consumption function. The numerical
estimates of the parameters give empirical content to the consumption function. The
actual mechanics of estimating the parameters will be discussed in the next chapter.
6. Hypothesis Testing - Assuming that the fitted model is a reasonably good approximation of reality, we have to develop suitable criteria to find out whether the estimates (of β₀ and β₁) obtained from the econometric model are in accordance with the expectations of the theory being tested.
Given the model: Y = β₀ + β₁X + u, where Y is consumption and X is income.
According to accepted economic theory, there is a positive relationship between consumption expenditure and income: the marginal propensity to consume (MPC, measured by β₁) has to be positive, with 0 < β₁ < 1.
Thus, the value of β₁ is expected to lie between zero and one; an estimate with a negative sign (β̂₁ < 0) contradicts the theory.
Before we accept the finding, we must enquire whether this estimate is a valid result rather than a chance outcome. We test whether the estimated coefficients are statistically significant, whether the estimated model is a close approximation of the actual phenomenon, etc. The confirmation or refutation of economic theories on the basis of sample evidence is based on a branch of statistical theory known as statistical inference (hypothesis testing).
Hypothesis testing enables the researcher to decide whether to use the model for prediction or further analysis, and whether the interpretation of the estimation results is valid.
7. Forecasting or prediction - If the chosen model does not refute the hypothesis or theory under consideration, we may use it to predict the future value(s) of the dependent variable Y on the basis of the known or expected future value(s) of the independent variable X. For example, from our simple investment model, one can predict that for a one-unit fall (rise) in the real interest rate, gross investment will increase (decrease) by an amount approximately equal to β₁.
If the regression result seems reasonable, a policy maker or the government might use an appropriate fiscal and monetary policy mix to manipulate the control variable X (say, the interest rate) to produce the desired level of the target variable Y (say, investment).
1.3 The Types of Data for Econometric Analysis
Data can be primary or secondary. Primary data are collected by the researcher for his/her study from a selected sample using different instruments such as interviews, questionnaires, focus group discussions, experiments, etc.
Secondary data are data that are available from documents, reports or different organizations; they were not originally collected by the researcher but by someone else. For example, in Ethiopia, secondary data on macroeconomic variables can be obtained from the CSA (Central Statistical Authority), NBE (National Bank of Ethiopia), MOFD (Ministry of Finance and Development), EEA (Ethiopian Economic Association), etc.
International organizations such as the World Bank (WB database), the IMF, etc., are also major sources of macroeconomic data on almost every country.
In general, there are three types of data used in empirical analysis: cross-section data, time series data and pooled data (including longitudinal or panel data).
1) Cross-section data: data on one or more variables collected at the same point in time. The most important example of cross-sectional data is the population census, which provides data on the demographic and socio-economic characteristics of households at a given point in time. In Ethiopia, the population census is taken once every 10 years; the most recent was taken in 2007.
Example 1 - monthly income of selected households in 2005

Subject | Income in birr | Sex | Schooling | Work experience in years
1       | 600            | M   | Secondary | 4
2       | 1200           | M   | Secondary | 2
3       | 900            | F   | Tertiary  | 3
⋮       | ⋮              | ⋮   | ⋮         | ⋮
100     | 5000           | M   | Tertiary  | 12
We have 100 sampled households (subjects) and 4 cross-sectional observations for each subject. Thus, we have a total of 400 cross-sectional data points (4 × 100).
Example 2 - output and prices for sampled fruits in 2005

Items   | Output in tons | Price per kg (in birr)
Banana  | 200            | 8
Orange  | 60             | 12
Avocado | 30             | 10
Apple   | 5              | 60
Mango   | 75             | 8

We have two cross-sectional observations for 5 sampled fruits; hence, we have 2 × 5 = 10 cross-sectional data points.
Cross-sectional data suffer from the problem of heterogeneity.
2) Time series data: a time series is a set of observations on the values that a variable takes at different times. Such data may be collected at regular time intervals: daily, weekly, monthly, quarterly, annually, etc. Among the most important time series data are macroeconomic series such as GDP, GNP and PCI (per capita income), which are reported annually.
Example - Ethiopian per capita income at constant 2005 US dollars, 2000-2012 (World Bank database, 2014):
Year | PCI
2000 | 135
2001 | 142
2002 | 140
2003 | 133
2004 | 147
2005 | 160
2006 | 172
2007 | 187
2008 | 202
2009 | 214
2010 | 234
2011 | 253
2012 | 269

(Ethiopian PCI at constant 2005 US dollars)

In time series data the problem of non-stationarity is often encountered, where the mean and variance of the data vary over time. A time series is said to be stationary if its mean and variance do not vary over time (i.e., if they are constant).
3) Pooled data: in this case the data have elements of both time series and cross-section data. For example, the following are macroeconomic data for the Ethiopian economy, 2000-2012 (source: World Bank 2014 database):

Year | PCI | g     | I     | S
2000 | 135 | 3.0   | 23.11 | 11.0
2001 | 142 | 5.22  | 24.53 | 12.68
2002 | 140 | 1.38  | 27.31 | 13.16
2003 | 133 | -5.0  | 25.0  | 10.74
2004 | 147 | 10.4  | 29.76 | 12.86
2005 | 160 | 8.73  | 26.53 | 5.92
2006 | 172 | 7.83  | 27.94 | 4.98
2007 | 187 | 8.48  | 24.46 | 4.92
2008 | 202 | 7.86  | 24.7  | 4.88
2009 | 214 | 6.0   | 25.59 | 7.0
2010 | 234 | 9.63  | 27.43 | 7.55
2011 | 254 | 8.32  | 27.87 | 12.73
2012 | 269 | 6.0   | 33.08 | 15.0

PCI - per capita income at constant 2005 US$; g - GDP per capita income growth rate in %; I - gross domestic investment as % of GDP; S - gross domestic saving as % of GDP.

We have 4 cross-sectional variables (per capita income, growth rate, investment and saving), and for each variable we have 13 time series observations; in total, 4 × 13 = 52 observations.
4) Panel or longitudinal data: a special type of pooled data in which the same cross-sectional unit is surveyed over time, e.g., the Ethiopian Rural Household Survey data collected since 1989.
Chapter Two - Classical Linear Regression Model (CLRM)
2.1. The concept of Regression
Definition: Regression analysis is the study of the relationship between two or more variables. There are dependent variables and explanatory (independent) variables; regression is used to study the dependence of one variable (the dependent variable) on one or more explanatory variables.
We regress the dependent variable on the explanatory variables (regress Y on X), and we estimate or predict the mean value of the dependent variable in terms of the known (fixed) values of the independent variables.
Yᵢ = β₀ + β₁Xᵢ + uᵢ
The Population Regression Function (PRF), which shows the expected value of the dependent variable conditional upon X (the independent variable), is given as:
E(Y|X) = β₀ + β₁X
Consider the relation between the age and height of children, where the height of children is the dependent variable (Y) and the age of children (X) is the explanatory variable; that is, height increases as the age of children increases:
Yᵢ = β₀ + β₁Xᵢ + uᵢ
where Yᵢ is the actual height of a child measured in inches, Xᵢ is the age of the child in years, and uᵢ denotes the error term. However, the expected height of children for any given age is given by the conditional mean of Y at X: E(Y|X) = β₀ + β₁X.
Using the population regression line, the PRF is graphically shown as follows
[Figure: the PRF line E(Y|X) = β₀ + β₁X, with height (Y, in inches, 40-70) on the vertical axis and age (X, in years, 10-16) on the horizontal axis; points A, B, C and D mark the conditional means.]
The population regression line gives the average value of height (Y) at each given age (X) of children; on average, the height of children increases as age increases.
Note that at any given age (for any fixed value of the explanatory variable X), a child's height is random; it could be above or below the regression line. That is, Y is randomly distributed while X is statistically fixed. However, the conditional mean of height, E(Y|X), in each age group is predictable, and it is given by points on the regression line such as A, B, C and D, where E(Y|X) = β₀ + β₁X.
For example, at age 10, let us assume the expected height of children is 48 inches (point A on the PRF line). But the actual height of a child can be anywhere above or below point A (it could be 50, 53, 58 inches, or 38, 40, 45 inches, etc.).
Since we don't have data on the entire population, we must rely on sample data to estimate the population mean value. Thus, the sample counterpart of the population regression function is referred to as the Sample Regression Function (SRF), which is given as:
Ŷᵢ = β̂₀ + β̂₁Xᵢ
where Ŷᵢ is the estimator of the population conditional mean E(Y|X) (simply, the estimator of Yᵢ), β̂₀ is the estimator of β₀, and β̂₁ is the estimator of β₁.
If we draw the line of the estimated mean values (the SRF), it will not necessarily overlap with the PRF line; it is only approximately close to it (in the figure below, the broken line shows the estimated SRF).
[Figure: the SRF (broken line) and the PRF (solid line) plotted together, with height (Y) on the vertical axis and age (X) on the horizontal axis; the SRF lies close to, but does not coincide with, the PRF.]
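To make the PRF/SRF distinction concrete, here is a minimal Python simulation sketch (not from the original notes; the PRF values β₀ = 40 and β₁ = 1.5 and the error spread are illustrative assumptions): we fix a hypothetical PRF, draw one random sample of children's heights, and fit an SRF by OLS. The fitted line comes close to, but does not coincide with, the PRF.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical PRF: E(Y|X) = 40 + 1.5*X. The values beta0 = 40, beta1 = 1.5
# and the error spread are illustrative assumptions, not from the notes.
beta0, beta1 = 40.0, 1.5

# Ages fixed in repeated sampling, as the classical assumptions require.
X = np.repeat([10.0, 12.0, 14.0, 16.0], 25)    # 100 children
u = rng.normal(0.0, 2.0, size=X.size)          # disturbances with E(u) = 0
Y = beta0 + beta1 * X + u                      # actual heights scatter around the PRF

# SRF by OLS in deviation form: b1 = sum(x*y)/sum(x^2), b0 = Ybar - b1*Xbar.
x, y = X - X.mean(), Y - Y.mean()
b1_hat = (x * y).sum() / (x ** 2).sum()
b0_hat = Y.mean() - b1_hat * X.mean()

print(f"PRF: E(Y|X) = {beta0} + {beta1}X")
print(f"SRF: Y_hat = {b0_hat:.2f} + {b1_hat:.2f}X (close to, not equal to, the PRF)")
```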
Note: the dependent variable is also called the explained variable, regressand, predictand, endogenous variable, outcome, etc. The explanatory variable is also called the independent variable, predictor, regressor, exogenous variable, stimulus variable, etc.
Regression analysis deals with the dependence of one variable on other variables; it does not necessarily imply causation (in the sense that one variable is a cause of another). It only shows a statistical relationship between the dependent and explanatory variables.
In Regression Analysis the primary objective is to estimate or predict the average value
of one variable on the basis of the fixed values of other variables (explanatory).
In regression analysis there is an asymmetry in the way the dependent and explanatory
variables are treated. The dependent variable is assumed to be statistical, random, or
stochastic, that is, to have a probability distribution. The explanatory variables, on the
other hand, are assumed to have fixed values (in repeated sampling).
The simple linear regression model is a regression model that contains only two
variables: one dependent and one explanatory variable.
Y = β₀ + β₁X + u
Our discussion in this chapter focuses on the two-variable model (the simple regression model).
But:
Y₁ = β₀ + β₁²X + u₁ and Y₂ = α₀ + (α₂/α₁)X + u₂
are non-linear-in-parameters regression functions.
Assumption 2): X values are fixed in repeated sampling. Values taken by the regressor X are considered fixed in repeated samples. More technically, X is assumed to be non-stochastic or non-random.
Assumption 3): Zero mean value of the disturbance uᵢ. This assumption states that the factors not explicitly included in the model (which are captured by uᵢ) do not systematically affect the mean value of Y; i.e., the positive uᵢ values cancel out the negative uᵢ values, so that on average their effect on Y is zero. Symbolically: E(uᵢ|Xᵢ) = 0.
Assumption 4): Homoscedasticity: the variance of uᵢ is the same for all observations. Symbolically:
Var(uᵢ|Xᵢ) = E[uᵢ − E(uᵢ|Xᵢ)]² = E(uᵢ²) = σ², for i = 1, …, n
i.e., var(uᵢ) = σ².
The opposite of homoscedasticity is heteroscedasticity which means the variance of the
error term is not constant.
Assumption 5): No autocorrelation between the disturbance (error) terms. Given any two X values Xᵢ and Xⱼ (i ≠ j), the correlation between the corresponding error terms uᵢ and uⱼ (i ≠ j) is zero, i.e.:
cov(uᵢ, uⱼ) = E{[uᵢ − E(uᵢ)][uⱼ − E(uⱼ)]} = E[(uᵢ − 0)(uⱼ − 0)] = E(uᵢuⱼ) = 0
Assumption 6): Zero covariance between the error term uᵢ and the explanatory variable Xᵢ, i.e., E(uᵢXᵢ) = 0:
cov(uᵢ, Xᵢ) = E{[uᵢ − E(uᵢ)][Xᵢ − E(Xᵢ)]} = 0
If the error terms are correlated with the explanatory variables, it is difficult to isolate the separate influences of X and u on the dependent variable Y. Assumption 6 is automatically fulfilled if the X variable is non-random (non-stochastic) and Assumption 3 holds.
Assumption 7): The number of observations n must be greater than the number of explanatory variables; in other words, n must be greater than the number of parameters to be estimated.
If important variables are omitted from the model, if the wrong functional form is chosen, or if wrong stochastic assumptions are made about the variables of the model, the validity of interpreting the estimated regression will be highly questionable.
Normality Assumption: The disturbance (error) terms are normally distributed for all i, i.e., uᵢ ∼ N(0, σ²): the disturbance term uᵢ is normally distributed with zero mean and variance σ².
The normality assumption is motivated by the Central Limit Theorem (CLT) of statistics, which implies that as the sample size increases the distribution of the variable becomes approximately normal. With the inclusion of the normality assumption about the probability distribution of the disturbance term, the model is called the Classical Normal Linear Regression Model (CNLRM).
The probability distribution of the OLS (Ordinary Least Squares) estimators depends on the distribution of the error. Since knowledge of the probability distributions of the OLS estimators is necessary to draw inferences about their population values, the nature of the probability distribution of uᵢ assumes an extremely important role in hypothesis testing.
Now, given data (observations) on Y and X, we would like to determine the SRF in such a manner that the estimated value Ŷᵢ is as close as possible to the actual Y. To this end, the OLS method estimates β̂₀ and β̂₁ in such a way that the Residual Sum of Squares (RSS) is minimized:
min over β̂₀, β̂₁: ∑uᵢ² = ∑(Yᵢ − β̂₀ − β̂₁Xᵢ)²
F.O.C.: take the partial derivatives with respect to β̂₀ and β̂₁ and set them equal to zero:
∂(∑uᵢ²)/∂β̂₀ = −2∑(Yᵢ − β̂₀ − β̂₁Xᵢ) = −2∑uᵢ = 0 … (1)
∂(∑uᵢ²)/∂β̂₁ = −2∑(Yᵢ − β̂₀ − β̂₁Xᵢ)Xᵢ = −2∑Xᵢuᵢ = 0 … (2)
From equation (1), we obtain:
β̂₀ = Ȳ − β̂₁X̄
β̂₀ is the least squares point estimator for β₀, the intercept term, where Ȳ and X̄ are the average (mean) values of Y and X respectively.
From equation (2), we have:
−2∑(YᵢXᵢ − β̂₀Xᵢ − β̂₁Xᵢ²) = 0
∑XᵢYᵢ − β̂₀∑Xᵢ − β̂₁∑Xᵢ² = 0
Substituting β̂₀ = Ȳ − β̂₁X̄ and ∑Xᵢ = nX̄:
∑XᵢYᵢ − (Ȳ − β̂₁X̄)nX̄ − β̂₁∑Xᵢ² = 0
∑XᵢYᵢ − nȲX̄ + β̂₁nX̄² − β̂₁∑Xᵢ² = 0
Hence:
β̂₁ = (∑XᵢYᵢ − nX̄Ȳ)/(∑Xᵢ² − nX̄²)
β̂₁ is the least squares point estimator for β₁. Equivalently, in deviation form:
β̂₁ = ∑xᵢyᵢ/∑xᵢ²
The lower-case letters yᵢ and xᵢ denote deviations from the mean, i.e., the deviation of each observation from its mean (average) value:
yᵢ = Yᵢ − Ȳ - the deviation of the Y values from their mean
xᵢ = Xᵢ − X̄ - the deviation of the Xᵢ values from their mean
[Note that in most of our subsequent discussion we write variables in deviation form].
The estimators as derived above are called point estimators; that is, given the sample,
each estimator provides only a single (point) value of the relevant population parameter.
Numerical Example 1: consider hypothetical data on output (Y) produced and labor input (Xᵢ) used by a firm, given as follows:

Y  11 10 12 6 10 7 9 10 11 10
X  10 7 10 5 8 8 6 7 9 10

We have two variables, Y (the dependent variable) and X (the explanatory variable), with sample size n = 10, and:
∑Yᵢ = 96, ∑Xᵢ = 80, ∑Yᵢ² = 952, ∑Xᵢ² = 668, ∑YᵢXᵢ = 789, Ȳ = 9.6, X̄ = 8
The model is specified as: Y = β₀ + β₁X + u
Then estimate the values of the regression coefficients based on the data:
β̂₁ = (∑YᵢXᵢ − nȲX̄)/(∑Xᵢ² − nX̄²) = (789 − (10)(8)(9.6))/(668 − (10)(8)²) = 21/28 = 0.75
β̂₀ = Ȳ − β̂₁X̄ = 9.6 − 0.75(8) = 3.6
Interpretation: for a one-unit increase in labor employment, total output will increase by 0.75 units. Note also the following summations in deviation form:
∑yᵢxᵢ = 21, ∑xᵢ² = 28, ∑yᵢ² = 30.4, ∑ŷᵢ² = β̂₁²∑xᵢ² = 0.75² × 28 = 15.75
Note: β̂₁ = ∑xᵢyᵢ/∑xᵢ² = 21/28 = 0.75
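As a quick arithmetic check, a short Python sketch (not part of the original notes) reproduces β̂₁ = 0.75 and β̂₀ = 3.6 from the ten observations in the table:

```python
import numpy as np

# Output (Y) and labor input (X) for the ten observations in Example 1.
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)

# Deviation form: x_i = X_i - Xbar, y_i = Y_i - Ybar.
x, y = X - X.mean(), Y - Y.mean()

b1 = (x * y).sum() / (x ** 2).sum()   # sum(x*y)/sum(x^2) = 21/28 = 0.75
b0 = Y.mean() - b1 * X.mean()         # 9.6 - 0.75*8 = 3.6

print(f"sum(xy) = {(x * y).sum():.1f}, sum(x^2) = {(x ** 2).sum():.1f}")
print(f"beta1_hat = {b1:.2f}, beta0_hat = {b0:.2f}")
```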
2.2.4 Important Properties of OLS Estimators
1. The regression line passes through the sample means of Y and X (Ȳ and X̄ respectively); in other words, the SRF contains the point of means. Since
β̂₀ = Ȳ − β̂₁X̄, it follows that Ȳ = β̂₀ + β̂₁X̄,
so the point (X̄, Ȳ) lies on the SRF line Ŷ = β̂₀ + β̂₁Xᵢ.
In the above example we have X̄ = 8; then we can determine the mean of Y as follows: Ŷ = β̂₀ + β̂₁X̄ = 3.6 + 0.75(8) = 9.6. Hence Ȳ = 9.6 given X̄ = 8.
2) The mean value of the estimated Ŷᵢ is equal to the mean value of the actual Y:
∑Yᵢ = ∑(Ŷᵢ + uᵢ)
Dividing by n: ∑Yᵢ/n = ∑Ŷᵢ/n + ∑uᵢ/n, where ∑uᵢ/n = 0 by assumption.
Hence Ȳ = mean(Ŷᵢ) = β̂₀ + β̂₁X̄.
3. The sum of the residuals is zero, which can be proved as follows:
uᵢ = Yᵢ − β̂₀ − β̂₁Xᵢ
∑uᵢ = ∑Yᵢ − ∑(β̂₀ + β̂₁Xᵢ) = ∑Yᵢ − ∑Ŷᵢ = nȲ − n·mean(Ŷᵢ) = 0 (since the two means are equal). Hence ∑uᵢ = 0.
4. The OLS model can be written in deviation form as follows. Starting from Yᵢ = Ŷᵢ + uᵢ and subtracting Ȳ from both sides: Yᵢ − Ȳ = (Ŷᵢ − Ȳ) + uᵢ, i.e.
yᵢ = ŷᵢ + uᵢ
Again, observe that
ŷᵢ = Ŷᵢ − Ȳ = (β̂₀ + β̂₁Xᵢ) − (β̂₀ + β̂₁X̄) = β̂₁(Xᵢ − X̄) = β̂₁xᵢ
5. The estimated values Ŷᵢ are uncorrelated with the residuals uᵢ:
cov(Ŷᵢ, uᵢ) = ∑Ŷᵢuᵢ/n = ∑uᵢ(β̂₀ + β̂₁Xᵢ)/n = (β̂₀∑uᵢ + β̂₁∑uᵢXᵢ)/n = 0 + 0 = 0
Hence there is no correlation between the estimated values of the dependent variable and the error terms.
6. The residual term is not correlated with the explanatory variable, i.e., cov(Xᵢ, uᵢ) = 0. This implies that the error term has no systematic relation or correlation with the explanatory variable.
Having estimated a particular linear model, a natural question that comes up is: how well does the estimated regression line fit the observations? The coefficient of determination r² is a summary measure that tells how well the sample regression line fits the data.
Since the population regression function (PRF) is estimated using a sample, there is always some deviation from the actual values. Using the sample observations we produce the SRF (Sample Regression Function). The measure of 'goodness of fit', denoted by r² in the simple two-variable regression, helps us to see how close the estimated sample regression line is to the population regression line. r² measures the proportion of the variation in the dependent variable that can be attributed to the explanatory variable; it is computed from sample information and the amount of error made. Recall that:
Yᵢ = Ŷᵢ + uᵢ … (a)    Ȳ = mean(Ŷᵢ) … (b)
In deviation form, yᵢ = ŷᵢ + uᵢ. Squaring and summing over all observations:
∑yᵢ² = ∑ŷᵢ² + 2∑ŷᵢuᵢ + ∑uᵢ²
Since ∑ŷᵢuᵢ = 0,
∑yᵢ² = ∑ŷᵢ² + ∑uᵢ² … (c)
TSS = ESS + RSS
where:
ESS = ∑ŷᵢ² = β̂₁²∑xᵢ² = ∑yᵢ² − ∑uᵢ²
RSS = ∑uᵢ² = ∑yᵢ² − ∑ŷᵢ² = ∑yᵢ² − β̂₁²∑xᵢ²
The coefficient of determination is then defined as:
r² = ESS/TSS
where:
TSS = total sum of squares (the total variation of the dependent variable);
ESS = explained sum of squares (the explained variation, i.e., the variation of the dependent variable accounted for by the explanatory variable);
RSS = residual sum of squares (the unexplained variation, i.e., the variation in the dependent variable that is not explained by the explanatory variables in the model).
Since TSS = ESS + RSS, dividing both sides of equation (c) by TSS gives:
TSS/TSS = ESS/TSS + RSS/TSS
r² = 1 − RSS/TSS - another way of computing the goodness of fit.
Note: r² measures the proportion or percentage of the total variation in the dependent variable (Y) that is explained by the regression model (the regressor); r² is taken as a measure of the 'goodness of fit' of the model.
The correlation coefficient r is the (signed) square root of r². When r = −1 there is perfect negative correlation between X and Y; when r = 1 there is perfect positive correlation between X and Y.
For the firm example:
r² = β̂₁∑yᵢxᵢ/∑yᵢ² = (0.75)(21/30.4) = 0.52
r = √0.52 = 0.72, or r = ∑yᵢxᵢ/√(∑yᵢ²∑xᵢ²) = 21/√(30.4 × 28) = 21/29.175 = 0.72
Interpretation: r² = 0.52 means that about 52% of the variation in output is explained by the variation in labor input.
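Continuing the same sketch, the sums of squares and r² for the firm data can be verified as follows (TSS = 30.4, ESS = 15.75, RSS = 14.65, r² ≈ 0.52):

```python
import numpy as np

Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
x, y = X - X.mean(), Y - Y.mean()

b1 = (x * y).sum() / (x ** 2).sum()
b0 = Y.mean() - b1 * X.mean()
u = Y - (b0 + b1 * X)                 # residuals

TSS = (y ** 2).sum()                  # 30.4
ESS = (b1 ** 2) * (x ** 2).sum()      # 15.75
RSS = (u ** 2).sum()                  # 14.65

r2 = ESS / TSS                        # equivalently 1 - RSS/TSS
r = np.sign(b1) * np.sqrt(r2)         # correlation coefficient, sign of the slope
print(f"TSS = {TSS:.2f}, ESS = {ESS:.2f}, RSS = {RSS:.2f}")
print(f"r^2 = {r2:.2f}, r = {r:.2f}")
```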
Student activity: show that
r² = β̂₁∑xᵢyᵢ/∑yᵢ² = (∑xᵢyᵢ)²/(∑xᵢ²∑yᵢ²)
2.3. The BLUE Property of OLS Estimators: Gauss-Markov Theorem
The Gauss-Markov theorem states that, given the assumptions of the classical linear regression model, the OLS estimators satisfy the Best Linear Unbiased Estimator (BLUE) property.
An estimator, say the OLS estimator β̂₁, is said to be a Best Linear Unbiased Estimator (BLUE) of β₁ if it is linear, unbiased, and has the minimum variance in the class of all linear unbiased estimators.
2.4. Variance and Covariance of the OLS Estimators β̂₀ and β̂₁
The estimates β̂₀ and β̂₁ differ from sample to sample. Since we have only one sample at a time, we rely on the precision of these estimates in representing the true parameters (β₀ and β₁). The measure of such precision is the standard error. It can be shown that the variances of the estimators are:
a) var(β̂₀) = σ²[1/n + X̄²/∑xᵢ²] = σ²∑Xᵢ²/(n∑xᵢ²), and se(β̂₀) = σ√(∑Xᵢ²/(n∑xᵢ²))
b) var(β̂₁) = σ²/∑xᵢ², and se(β̂₁) = σ/√(∑xᵢ²)
c) cov(β̂₀, β̂₁) = −X̄ · var(β̂₁)
d) The population variance σ² (the variance of the disturbances) is unknown; it is estimated from the sample by:
σ̂² = RSS/(n − 2) = ∑uᵢ²/(n − 2)
where RSS is the residual sum of squares and n − 2 is the degrees of freedom. More variation in X (the explanatory variable) and an increase in the sample size n raise the precision of the estimators β̂₀ and β̂₁; this is so because they reduce the variances of the estimators (recall Assumption 8, which states that var(X) has to be positive).
Therefore, σ̂² = ∑uᵢ²/(n − 2) = 14.65/8 = 1.83 - the estimate of the population variance.
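The error-variance estimate and the standard errors for the firm example can be verified with the same sketch; the values σ̂² ≈ 1.83, se(β̂₁) ≈ 0.256 and se(β̂₀) ≈ 2.09 are the ones used in the intervals and tests below:

```python
import numpy as np

Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
n = Y.size
x, y = X - X.mean(), Y - Y.mean()

b1 = (x * y).sum() / (x ** 2).sum()
b0 = Y.mean() - b1 * X.mean()
u = Y - (b0 + b1 * X)

sigma2_hat = (u ** 2).sum() / (n - 2)                  # RSS/(n-2) = 14.65/8 = 1.83
var_b1 = sigma2_hat / (x ** 2).sum()                   # sigma^2 / sum(x^2)
var_b0 = sigma2_hat * (X ** 2).sum() / (n * (x ** 2).sum())

print(f"sigma2_hat = {sigma2_hat:.2f}")
print(f"se(b1) = {np.sqrt(var_b1):.3f}, se(b0) = {np.sqrt(var_b0):.3f}")
# se(b1) is about 0.256 and se(b0) about 2.09
```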
β̂₁ ∼ N(β₁, σ²/∑xᵢ²) and β̂₀ ∼ N(β₀, σ²[1/n + X̄²/∑xᵢ²])
With the assumption that uᵢ follows the normal distribution, the OLS estimators have the sampling distributions given above. Then, by the properties of the normal distribution (using the standard normal Z and the t distribution), the standardized values of the coefficients are computed as follows:
(β̂₁ − β₁)/se(β̂₁) ∼ t(n−2), and similarly (β̂₀ − β₀)/se(β̂₀) ∼ t(n−2)
In interval estimation we try to construct an interval around the point estimator (β̂₀ or β̂₁), say within two or three standard errors on either side of the point estimator, such that this interval has, say, a 95% probability of including the true parameter value (β₀ or β₁). Symbolically:
Pr(β̂₁ − δ ≤ β₁ ≤ β̂₁ + δ) = 1 − α, where 0 < α < 1
and α is the level of significance (the probability of committing a Type I error).
Note: Type I error: rejecting a true hypothesis. Type II error: accepting a false hypothesis.
Suppose we have estimated ŷᵢ = β̂₀ + β̂₁xᵢ. Then:
(β̂₀ − β₀)/se(β̂₀) ∼ t(n−2), where se(β̂₀) = √(σ̂²[1/n + X̄²/∑xᵢ²])
Thus we have:
Pr[−t(α/2, n−2) ≤ (β̂₀ − β₀)/se(β̂₀) ≤ t(α/2, n−2)] = 1 − α
Here n − 2 denotes the degrees of freedom (in the two-variable model two parameters are estimated, hence df = n − 2). If α = 5%, then a 95% confidence interval for β₀ is given as follows:
P[β̂₀ − t(α/2, n−2)se(β̂₀) ≤ β₀ ≤ β̂₀ + t(α/2, n−2)se(β̂₀)] = 95%
Using the previous firm example, the estimated model is yᵢ = 3.6 + 0.75xᵢ + uᵢ, so:
P[β̂₀ − t(0.025, 8)se(β̂₀) ≤ β₀ ≤ β̂₀ + t(0.025, 8)se(β̂₀)] = 95%
In 95 out of 100 cases, intervals like (−1.22, 8.42) will contain the true β₀.
Similarly, for β₁:
P[β̂₁ − t(α/2, n−2)se(β̂₁) ≤ β₁ ≤ β̂₁ + t(α/2, n−2)se(β̂₁)] = 95%
With yᵢ = 3.6 + 0.75xᵢ + uᵢ, n = 10 and df = n − 2 = 8, the 95% confidence interval for β₁ uses α = 0.05, α/2 = 0.025 and t(0.025, 8) = 2.306:
P[β̂₁ − t(0.025, 8)se(β̂₁) ≤ β₁ ≤ β̂₁ + t(0.025, 8)se(β̂₁)] = 95%
[0.16, 1.34] - the interval estimate of the true value of β₁ at the 95% confidence level.
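A sketch of the interval computation using scipy's t distribution (the standard errors are taken from the computation above):

```python
from scipy import stats

n = 10
b1, se_b1 = 0.75, 0.256        # slope estimate and its standard error
b0, se_b0 = 3.6, 2.09          # intercept estimate and its standard error

t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)   # t(0.025, 8) = 2.306

ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
print(f"t(0.025, 8) = {t_crit:.3f}")
print(f"95% CI for beta1: ({ci_b1[0]:.2f}, {ci_b1[1]:.2f})")   # about (0.16, 1.34)
print(f"95% CI for beta0: ({ci_b0[0]:.2f}, {ci_b0[1]:.2f})")   # about (-1.22, 8.42)
```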
Confidence interval for the variance:
χ²_cal = (n − 2)σ̂²/σ² follows the chi-square distribution with n − 2 df,
where σ̂² is the variance estimated from the sample data and σ² is a hypothesized value of the true population variance. Rearranging this statement, we write the confidence interval for the variance at the 1 − α level of confidence as:
P[(n − 2)σ̂²/χ²(α/2, n−2) ≤ σ² ≤ (n − 2)σ̂²/χ²(1−α/2, n−2)] = 1 − α
where χ²(α/2, n−2) and χ²(1−α/2, n−2) are obtained from the χ² table.
Example: returning to the above numerical illustration, construct a 95% confidence interval for the true variance. We have the estimated sample variance σ̂² = 1.83, n = 10, df = n − 2 = 8, α = 0.05. From the table: χ²(0.025, 8) = 17.5346 and χ²(0.975, 8) = 2.1797.
P[(8)(1.83)/17.5346 ≤ σ² ≤ (8)(1.83)/2.1797]
P[0.835 ≤ σ² ≤ 6.72]
The true variance lies between 0.835 and 6.72 with 95% confidence.
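The same interval can be reproduced with scipy's chi-square quantiles (note that the table notation χ²(0.025, 8) corresponds to scipy's upper quantile chi2.ppf(0.975, 8)):

```python
from scipy import stats

n, sigma2_hat = 10, 1.83
df = n - 2

# The table's chi2(0.025, 8) is the upper critical value, i.e. scipy's ppf(0.975, 8).
chi2_upper = stats.chi2.ppf(0.975, df)   # 17.5346
chi2_lower = stats.chi2.ppf(0.025, df)   # 2.1797

lower = df * sigma2_hat / chi2_upper
upper = df * sigma2_hat / chi2_lower
print(f"95% CI for the true variance: ({lower:.3f}, {upper:.2f})")   # about (0.835, 6.72)
```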
Hypothesis testing could be a two- or one-tail test. Whether one uses a two- or one-tail
test of significance will depend upon how the alternative hypothesis is formulated, which,
in turn, may depend upon some a priori considerations or prior empirical experience.
The t-test is used to test hypotheses about the statistical significance of individual estimated coefficients. In the language of significance tests, a statistic (a particular value of an estimated coefficient) is said to be statistically significant if the value of the test statistic lies in the critical region; in this case the null hypothesis is rejected. Similarly, a test is said to be statistically insignificant if the value of the test statistic lies in the acceptance region; in this case the null hypothesis can't be rejected.
[Figure: the t distribution density f(x) with two-tailed rejection regions beyond the critical values −t(α/2, df) and t(α/2, df).]
−t(α/2, df) and t(α/2, df) are the critical t-values obtained from the t-table.
From the sample information, the t-value of each estimated coefficient is computed as follows:
t = (β̂₁ − β₁)/se(β̂₁) - follows the t distribution with n − 2 df
t_c = (β̂₁ − β₁)/se(β̂₁) - the test statistic computed from the data (t-calculated)
t_c = (β̂₀ − β₀)/se(β̂₀)
Example: suppose the production function is specified as
Y = β₀ + β₁Xᵢ + uᵢ, where MP_L = β₁, and the estimated result is:
Yᵢ = 3.6 + 0.75Xᵢ + uᵢ
Suppose we want to test the following at α = 5%:
H₀: β₁ = 0 and H₁: β₁ ≠ 0
t_c = (β̂₁ − β₁)/se(β̂₁) = (0.75 − 0)/0.256 = 2.929
The t-tabulated (critical) value from the table at α/2 = 0.025 and df = 8 is ±2.306, giving the acceptance region [−2.306, 2.306]. But the computed t-value is 2.929, which is outside the interval; in other words, the estimated value lies in the critical (rejection) region.
Conclusion: since t_c lies outside the acceptance region, we reject H₀; the marginal product of labor (MP_L, or β₁) is significantly different from zero.
Note: a larger computed |t| value is always stronger evidence against the null hypothesis. As a rule of thumb, if the level of significance is set at 0.05, the null hypothesis β₁ = 0 can be rejected if |t_c| = |β̂₁/se(β̂₁)| > 2.
The exact level of significance (P-value): the P-value is defined as the lowest significance level at which the null hypothesis can be rejected. From our illustrative firm example, given the null hypothesis that β₁ = 0, we computed a t-value of 2.929. What is the P-value of obtaining a t-value as large as or larger than 2.929? Using a computer, it can be shown that the probability of obtaining a t-value of 2.929 or greater in absolute value (for 8 df) is about 0.019, i.e., Pr(|t| > 2.929) ≈ 0.02. Since this is a low probability, we reject the null hypothesis.
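A sketch of the t-test and its exact P-value with scipy (reproducing t_c ≈ 2.93 and P ≈ 0.019):

```python
from scipy import stats

n = 10
b1, se_b1 = 0.75, 0.256

t_c = (b1 - 0.0) / se_b1                      # test H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_c), df=n - 2)  # two-sided: Pr(|t| > t_c), 8 df

print(f"t_c = {t_c:.3f}, p-value = {p_value:.3f}")   # about 2.930 and 0.019
```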
The F-test of overall significance is based on the decomposition of the total sum of squares:
TSS/σ² = ESS/σ² + RSS/σ², i.e., ∑yᵢ²/σ² = ∑ŷᵢ²/σ² + ∑uᵢ²/σ²
χ²(n−1) = χ²(k−1) + χ²(n−k)
Thus,
(ESS/df)/(RSS/df) = [ESS/(k−1)]/[RSS/(n−k)] ∼ F(k−1, n−k)
Decision rule: the decision is based on a comparison of the F-value computed from the sample estimation using the above formula (denoted F_c) with the F-value from the table (denoted simply F) at the given level of significance α and degrees of freedom (k−1 and n−k). Here k denotes the number of variables in the model and n the sample size. The F-test has two degrees of freedom:
- one for the numerator (ESS), which is k − 1 (in some statistical tables denoted by v₁);
- another for the denominator (RSS), which is n − k (in statistical tables denoted by v₂).
Hence, we reject H₀ if F_c > F and conclude that the regression coefficients are statistically significantly different from zero. If F_c < F, we do not reject the null hypothesis and conclude that the regression coefficients are not statistically significantly different from zero.
Example: given the previously estimated model for output and labor, test the overall significance of the model at the 0.05 level of significance.
Yᵢ = 3.6 + 0.75Xᵢ + uᵢ
We have: ESS = 15.75 with k − 1 = 1, and RSS = 14.65 with n − k = 8.
H₀: β₀ = β₁ = 0, and H₁: the coefficients are not both zero.
The null hypothesis asserts that the coefficients are not significant, while the alternative hypothesis states that the coefficients of the model are significantly different from zero. The overall significance of the model is tested with the F-test. First compute the F-value from the sample data:
F_c(1, 8) = [ESS/(k−1)]/[RSS/(n−k)] = (15.75/1)/(14.65/8) = 1.075 × 8 = 8.6
Next, read the F-value from the F table at df = (1, 8) and the α = 0.05 level of significance: F(1, 8) at the 5% level is 5.32, which is lower than the computed F; hence we reject H₀ and conclude that the coefficients, taken simultaneously, are significant.
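A sketch of the F-test computation with scipy (reproducing F_c = 8.6 and the 5% critical value 5.32):

```python
from scipy import stats

ESS, RSS = 15.75, 14.65
k, n = 2, 10                                        # 2 parameters, 10 observations

F_c = (ESS / (k - 1)) / (RSS / (n - k))             # = 8.6
F_crit = stats.f.ppf(0.95, dfn=k - 1, dfd=n - k)    # F(1, 8) at the 5% level = 5.32

print(f"F_c = {F_c:.2f}, F_crit = {F_crit:.2f}")
print("Reject H0" if F_c > F_crit else "Do not reject H0")
```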
β̂₁ = ∑yᵢxᵢ/∑xᵢ² = 16800/33000 = 0.5091 - the estimated value of the slope coefficient.
b) Compute TSS, ESS and RSS
se(β̂₁) = √var(β̂₁) = √(σ̂²/∑xᵢ²) = √(42.125/33,000) = 0.0357; var(β̂₁) = σ̂²/∑xᵢ² = 0.00128
se(β̂₀) = √var(β̂₀) = √(σ̂²∑Xᵢ²/(n∑xᵢ²)) = √(42.125 × 322,000/(10 × 33,000)) = √(13,564,250/330,000) = √41.104 = 6.411; var(β̂₀) = 13,564,250/330,000 = 41.104
Y Coef. Std. Err. t P>|t| [95% Conf. Interval]
X 0.5090909 0.0357428 14.24 0.000 0.4266678 0.591514
cons 24.45455 6.413817 3.81 0.005 9.664256 39.24483
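Output tables of this form are produced by standard regression software. As a sketch (the raw data behind the table above are not listed in the notes, so the numbers will differ), statsmodels generates an analogous table for the firm example used throughout this chapter:

```python
import numpy as np
import statsmodels.api as sm

# Firm example data (Y = output, X = labor input).
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)

# Add the constant so the model is Y = b0 + b1*X + u, then fit by OLS.
model = sm.OLS(Y, sm.add_constant(X)).fit()
print(model.summary())   # reports Coef., Std. Err., t, P>|t| and the 95% CI
```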