
Bule Hora University

College of Business and Economics


Department of Economics

Chapter 2: Simple Linear Regression Model

Introduction
• Regression analysis is concerned with describing and evaluating the relationship between a given variable (often called the dependent variable) and one or more variables which are assumed to influence it (often called independent or explanatory variables).
• The simplest economic relationship is represented through a two-variable model (also called the simple linear regression model), which is given by:

$$Y = a + bX$$

where a and b are unknown parameters (also called regression coefficients) that we estimate using sample data. Here Y is the dependent variable and X is the independent variable.

Cont….

Example: Suppose the relationship between advertisement expenditure (X) and sales volume (Y) of a firm is expressed as: Y = 0.6X + 120.

Here, on the basis of advertisement expenditure, we can predict sales volume. For instance, if the advertisement expenditure of a certain firm is 1500 Birr, then the estimated sales volume will be:

sales volume = 0.6(1500) + 120 = 1020 Birr

Note that since sales volume is estimated on the basis of advertisement expenditure, sales volume is the dependent variable and advertisement expenditure is the independent variable.
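Since the fitted line is just a formula, the prediction above is a one-line computation. A minimal sketch in Python (the function name is ours, purely for illustration):

```python
# Predicting sales volume (Birr) from the fitted line Y = 0.6X + 120.
def predicted_sales(advertising_birr: float) -> float:
    return 0.6 * advertising_birr + 120.0

print(predicted_sales(1500))  # 1020.0 Birr, as computed above
```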
Cont….

The error term

Consider a model such as Y = 0.6X + 120 relating, say, the consumption expenditure (Y) of a household (HH) to its income (X). This relationship is deterministic or exact; that is, given income we can determine the exact expenditure of a HH. But in reality this rarely happens: different HHs with the same income are not expected to spend equal amounts, due to habit persistence, geographical and time variation, etc. Thus, we should express the regression model as:

$$Y_i = \alpha + \beta X_i + \varepsilon_i$$

where $\varepsilon_i$ is the random error term (also called the disturbance term).
Why do we need to include the stochastic (random) component, for example in the consumption function?

1. Omission of variables: leads to a misspecification problem. For example, income is not the only determinant of consumption.

2. Vagueness of theory: The theory, if any, determining the behavior of Y may be, and often is, incomplete. We might know for certain that weekly income X influences weekly consumption expenditure Y, but we might be ignorant or unsure about the other variables affecting Y. Therefore, $\varepsilon_i$ may be used as a substitute for all the excluded or omitted variables from the model.

3. Measurement error: There may be measurement error in collecting data. We may use poor proxy variables, or there may be inaccuracy in the collection and measurement of sample data.
Cont…
4. Erratic (random, i.e., unpredictable) human behaviour: Even if we succeed in introducing all the relevant variables into the model, there is bound to be some "intrinsic" randomness in the individual Y's that cannot be explained no matter how hard we try. The disturbances may very well reflect this intrinsic randomness.

5. Error of aggregation: the sum of the parts may be different from the whole.
Cont…

6. Sampling error: Consider a model relating consumption (Y) to income (X) of HHs. The sample we randomly choose to examine the relationship may turn out to be predominantly poor HHs. In such cases, our estimates of α and β from this sample may not be as good as those from a balanced sample group.

7. Unavailability of data: Even if we know what some of the excluded variables are, and therefore consider a multiple regression rather than a simple regression, we may not have quantitative information about these variables.
Cont….

• Thus, a full specification of a regression model should include a specification of the probability distribution of the disturbance (error) term. This information is given by what we call the basic assumptions of the Classical Linear Regression Model (CLRM).
• Consider the model

$$Y_i = \alpha + \beta X_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$

Here the subscript i refers to the i-th observation. In the CLRM, $Y_i$ and $X_i$ are observable while $\varepsilon_i$ is not.
• If i refers to some point or period of time, then we speak of time series data.
• On the other hand, if i refers to the i-th individual, object, geographical region, etc., then we speak of cross-sectional data.
Assumptions of the classical linear regression model
• The linear regression model is based on certain assumptions: some refer to the distribution of the random variable $\varepsilon$, some to the relationship between $\varepsilon$ and the explanatory variables, and some to the relationship between the explanatory variables themselves. We will group the assumptions in two categories: (a) stochastic assumptions, and (b) other assumptions.

Stochastic assumptions of OLS
• Assumption 1: The true model is

$$Y_i = \alpha + \beta X_i + \varepsilon_i$$

• This assumption states that the relationship between $Y_i$ and $X_i$ is linear, and that the deterministic component ($\alpha + \beta X_i$) and the stochastic component ($\varepsilon_i$) are additive.
Cont…
Assumption 2: The mean of $\varepsilon_i$ in any particular period is zero, i.e., $E(\varepsilon_i) = 0$.
This means that for each value of X, $\varepsilon$ may assume various values, some greater than zero and some smaller than zero, but if we consider all the possible values of $\varepsilon$ for any given value of X, they would have an average value equal to zero.

Assumption 3: The variance of $\varepsilon_i$ is constant in each period (homoscedasticity).
The variance of $\varepsilon_i$ about its mean is constant at all values of X. In other words, for all values of X, the $\varepsilon$'s will show the same dispersion around their mean.

Assumption 4: The variable $\varepsilon_i$ has a normal distribution with mean zero and variance $\sigma^2$ for all i, often written as $\varepsilon_i \sim N(0, \sigma^2)$.
Cont…
• Assumption 5: The random terms of different observations ($\varepsilon_i, \varepsilon_j$) are independent (no error autocorrelation):

$$\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = E(\varepsilon_i \varepsilon_j) = 0 \quad \text{for } i \neq j$$

• This means that all the covariances of any $\varepsilon_i$ with any other $\varepsilon_j$ are equal to zero. The value which the random term assumes in one period does not depend on the value which it assumed in any other period.

Assumption 6: $\varepsilon$ is independent of the explanatory variable(s).
• The disturbance term is not correlated with the explanatory variable(s). The $\varepsilon$'s and the X's do not tend to vary together; their covariance is zero:

$$\mathrm{Cov}(X_i, \varepsilon_i) = E\{[X_i - E(X_i)][\varepsilon_i - E(\varepsilon_i)]\} = 0$$
Cont…

$$\begin{aligned}
\mathrm{Cov}(X_i, \varepsilon_i) &= E\{[X_i - E(X_i)][\varepsilon_i - E(\varepsilon_i)]\} \\
&= E\{[X_i - E(X_i)]\,\varepsilon_i\} \quad \text{given } E(\varepsilon_i) = 0 \\
&= E(X_i \varepsilon_i) - E(X_i)E(\varepsilon_i) \\
&= E(X_i \varepsilon_i) \\
&= X_i E(\varepsilon_i) \quad \text{given that the } X_i\text{'s are fixed} \\
&= 0
\end{aligned}$$

• Assumption 7: The explanatory variable(s) are measured without error.
• $\varepsilon$ absorbs the influence of omitted variables and possibly errors of measurement in the Y's. That is, we assume that the regressors are error-free, while the Y values may or may not include errors of measurement.
Cont…

• Other assumptions
Assumption 8: The explanatory variables are not perfectly linearly correlated.
• If there is more than one explanatory variable in the relationship, it is assumed that they are not perfectly correlated with each other. Indeed, the regressors should not even be strongly correlated; they should not be highly multicollinear.

Assumption 9: The macro variables should be correctly aggregated.
• Usually the variables X and Y are aggregative variables, representing the sum of individual items. For example, in the consumption function C = b₀ + b₁Y + u, C is the sum of the expenditures of all consumers and Y is the sum of all individual incomes. It is assumed that the appropriate aggregation procedure has been adopted in compiling the aggregate variables.
Cont…
• Assumption 10: The relationship being estimated is identified.
• It is assumed that the relationship whose coefficients we want to estimate has a unique mathematical form, that is, it does not contain the same variables as any other equation related to the one being investigated.

Assumption 11: The relationship is correctly specified (no specification bias or error).

Assumption 12: The number of observations n must be greater than the number of parameters to be estimated.
Methods of estimation

• Specifying the model and stating its underlying assumptions is the first stage of any econometric application. The next step is the estimation of the numerical values of the parameters of economic relationships. The parameters of the simple linear regression model can be estimated by various methods. Three of the most commonly used methods are:
• Ordinary least squares (OLS)
• Maximum likelihood method (MLM)
• Method of moments (MM)
• Here we will deal with the OLS and MLM methods of estimation.
The Ordinary Least Squares (OLS) Method of Estimation
• In the regression model $Y_i = \alpha + \beta X_i + \varepsilon_i$, the values of the parameters $\alpha$ and $\beta$ are not known. When they are estimated from a sample of size n, we obtain the sample regression line given by:

$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i, \quad i = 1, 2, \ldots, n$$

• where $\alpha$ and $\beta$ are estimated by $\hat{\alpha}$ and $\hat{\beta}$, respectively, and $\hat{Y}$ is the estimated value of Y.
• The dominant and powerful estimation method for the parameters (or regression coefficients) $\alpha$ and $\beta$ is the method of least squares.
• The deviations between the observed and estimated values of Y are called the residuals $\hat{\varepsilon}_i$, that is:

$$\hat{\varepsilon}_i = Y_i - \hat{Y}_i, \quad i = 1, 2, \ldots, n$$
Cont….
• The magnitude of each residual is the vertical distance between the actual observed point and the estimating line.
• The estimating line will have a 'good fit' if it minimizes the error between the estimated points on the line and the actual observed points that were used to draw it.
• Our aim is then to determine the equation of such an estimating line in such a way that the error in estimation is minimized.
Cont…
• The Sum of Squared Errors (SSE) is:

$$SSE = \sum \hat{\varepsilon}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$$

• Partially differentiating the SSE with respect to $\hat{\alpha}$ and $\hat{\beta}$ and equating the results to zero, we get:

$$\frac{\partial SSE}{\partial \hat{\alpha}} = -2 \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0$$

$$\frac{\partial SSE}{\partial \hat{\beta}} = -2 \sum X_i (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0$$

Rearranging the two equations, we get the so-called normal equations.
Cont…

$$\sum Y_i = n\hat{\alpha} + \hat{\beta} \sum X_i$$

$$\sum X_i Y_i = \hat{\alpha} \sum X_i + \hat{\beta} \sum X_i^2$$

Thus, we have two equations with two unknowns, $\hat{\alpha}$ and $\hat{\beta}$. Solving for $\hat{\alpha}$ and $\hat{\beta}$ we get:

$$\hat{\beta} = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2} = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$$

$$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$$

where $\bar{X}$ and $\bar{Y}$ are the mean values of the independent and dependent variables, respectively, that is $\bar{X} = \frac{1}{n}\sum X_i$ and $\bar{Y} = \frac{1}{n}\sum Y_i$.
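For concreteness, the normal equations can also be solved numerically as a 2×2 linear system. A minimal sketch in Python (assuming NumPy is available; the data values are made up purely for illustration):

```python
import numpy as np

# Solving the two normal equations as A · [α̂, β̂]ᵀ = b.
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # hypothetical regressor
Y = np.array([5.1, 9.8, 14.6, 19.9, 24.7]) # hypothetical dependent variable
n = len(X)

A = np.array([[n,       X.sum()],
              [X.sum(), (X ** 2).sum()]])
b = np.array([Y.sum(), (X * Y).sum()])
alpha_hat, beta_hat = np.linalg.solve(A, b)
print(alpha_hat, beta_hat)
```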
Cont…
$\hat{\alpha}$ and $\hat{\beta}$ are said to be the ordinary least squares (OLS) estimators of α and β, respectively. The line $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$ is called the least squares line or the estimated regression line of Y on X.

Note: Model in deviation form
Consider the regression model:

$$Y_i = \alpha + \beta X_i + u_i \quad (1)$$

Applying summation to both sides of the equation and dividing by n we have:

$$\frac{1}{n}\sum_{i=1}^{n} Y_i = \alpha + \beta \, \frac{1}{n}\sum_{i=1}^{n} X_i + \frac{1}{n}\sum_{i=1}^{n} u_i \;\;\Rightarrow\;\; \bar{Y} = \alpha + \beta \bar{X} + \bar{u} \quad (2)$$

Subtracting equation (2) from (1) we get:

$$Y_i - \bar{Y} = \beta (X_i - \bar{X}) + (u_i - \bar{u}) \quad (3)$$

Letting $x_i = X_i - \bar{X}$, $y_i = Y_i - \bar{Y}$ and $\varepsilon_i = u_i - \bar{u}$, we can write equation (3) as:

$$y_i = \beta x_i + \varepsilon_i \quad (4)$$
Cont…
• Equation (4) is the simple regression (or two-variable) model in deviations form.
• The OLS estimator of β from equation (4) is given by:

$$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2}$$
• The Coefficient of Determination

Consider the model $Y_i = b_0 + b_1 X_i + u_i$. The variation in $Y_i$ can be split into two parts:

(Variation in $Y_i$) = (Systematic variation) + (Random variation)

(Variation in $Y_i$) = (Explained variation) + (Unexplained variation)
Cont.
• In other words, the Total Sum of Squares (TSS) is decomposed into the Regression (explained) Sum of Squares (RSS) and the Error (residual or unexplained) Sum of Squares (ESS):

TSS = RSS + ESS

Computational formulas
• The TSS is a measure of the dispersion of the observed values of Y about their mean. It is computed as:

$$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} y_i^2$$

• The regression (explained) sum of squares (RSS) measures the amount of the total variability in the observed values of Y that is accounted for by the linear relationship between the observed values of X and Y. It is computed as:

$$RSS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = \hat{\beta}^2 \sum_{i=1}^{n} (X_i - \bar{X})^2 = \hat{\beta}^2 \sum_{i=1}^{n} x_i^2$$
Cont…
• The error (residual or unexplained) sum of squares (ESS) measures the amount of the total variability in the observed values of Y about the regression line. It is computed as:

$$ESS = \sum (Y_i - \hat{Y}_i)^2 = TSS - RSS$$

• If a regression equation does a good job of describing the relationship between two variables, the explained sum of squares should constitute a large portion of the total sum of squares.
• Thus, it is of interest to determine the magnitude of this proportion by computing the ratio of the explained sum of squares to the total sum of squares. This proportion is called the sample coefficient of determination, $R^2$. That is:

$$R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS} = \frac{\hat{\beta} \sum x_i y_i}{\sum y_i^2}, \quad \text{where } x_i = X_i - \bar{X} \text{ and } y_i = Y_i - \bar{Y}$$
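As a quick check on these identities, here is a short continuation of the earlier Python sketch (same hypothetical data), verifying that TSS = RSS + ESS and computing $R^2$:

```python
import numpy as np

# Same hypothetical data and fit as in the earlier sketches.
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
Y = np.array([5.1, 9.8, 14.6, 19.9, 24.7])
x, y = X - X.mean(), Y - Y.mean()
beta_hat = (x * y).sum() / (x ** 2).sum()
alpha_hat = Y.mean() - beta_hat * X.mean()
Y_hat = alpha_hat + beta_hat * X

TSS = (y ** 2).sum()                    # total variation in Y
RSS = ((Y_hat - Y.mean()) ** 2).sum()   # explained (regression) SS
ESS = ((Y - Y_hat) ** 2).sum()          # unexplained (residual) SS
R2 = RSS / TSS                          # coefficient of determination
print(np.isclose(TSS, RSS + ESS), R2)   # True, and R² close to 1 here
```

Note that this chapter uses RSS for the explained and ESS for the residual sum of squares; some other textbooks reverse these two labels.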
Cont…
• Note
1) The proportion of the total variation in the dependent variable (Y) that is explained by changes in the independent variable (X), or by the regression line, is equal to $R^2 \times 100\%$.
2) The proportion of the total variation in the dependent variable (Y) that is due to factors other than X (e.g., due to excluded variables, chance, etc.) is equal to $(1 - R^2) \times 100\%$.

Test for the coefficient of determination ($R^2$)
The largest value that $R^2$ can assume is 1 (in which case all observations fall on the regression line), and the smallest it can assume is zero.
Cont…
A low value of $R^2$ is an indication that:
• X is a poor explanatory variable, in the sense that variation in X leaves Y unaffected, or
• while X is a relevant variable, its influence on Y is weak as compared to some other variables that are omitted from the regression equation, or
• the regression equation is mis-specified (e.g., an exponential relationship might be more appropriate).
• Thus, a small value of $R^2$ casts doubt on the usefulness of the regression equation. We do not, however, pass final judgment on the equation until it has been subjected to an objective statistical test. Such a test is accomplished by means of analysis of variance (ANOVA), which enables us to test the significance of $R^2$ (i.e., the adequacy of the linear regression model).
The ANOVA table for simple linear regression

Source of variation | Sum of squares | Degrees of freedom | Mean square | Variance ratio
Regression          | RSS            | 1                  | RSS/1       | F_cal = (RSS/1) / (ESS/(n-2))
Residual            | ESS            | n-2                | ESS/(n-2)   |
Total               | TSS            | n-1                |             |

To test for the significance of $R^2$, we compare the variance ratio with the critical value from the F distribution with 1 and (n−2) degrees of freedom in the numerator and denominator, respectively, for a given significance level α.

Decision: if the calculated variance ratio exceeds the tabulated value, that is, if $F_{cal} > F_\alpha(1, n-2)$, we conclude that $R^2$ is significant (or that the linear regression model is adequate).
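A minimal sketch of this decision rule in Python (assuming SciPy is available for the F distribution; the helper function name is ours):

```python
from scipy import stats

def anova_f_test(RSS: float, ESS: float, n: int, alpha: float = 0.05) -> bool:
    """Return True if R² is significant (model adequate) at level alpha."""
    F_cal = (RSS / 1) / (ESS / (n - 2))        # variance ratio from the ANOVA table
    F_crit = stats.f.ppf(1 - alpha, 1, n - 2)  # tabulated F_alpha(1, n-2)
    return F_cal > F_crit
```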
Cont…
Note: the F test is designed to test the significance of all variables, or a set of variables, in a regression model. In the two-variable model, however, it is used to test the explanatory power of a single variable (X), and at the same time it is equivalent to the test of significance of $R^2$.

Illustrative example
Consider the following data on the percentage rate of change in electricity consumption (millions of kWh) (Y) and the rate of change in the price of electricity (Birr/kWh) (X) for the years 1979-1994.

Summary statistics (note here that $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$):

$$n = 16, \quad \bar{X} = 1.280625, \quad \bar{Y} = 23.42688, \quad \sum x_i^2 = 92.20109, \quad \sum y_i^2 = 13228.7, \quad \sum x_i y_i = -779.235$$
27
Cont…
• Estimation of regression coefficients
 

The slope  and the intercept are computed as:



  xy 779.235
  8.45147
 x 92.20109 2

 
  Y   X  23.42688  ( 8.45147)(1.280625)  34.25004
Therefore, the estimated regression equation is :
   
Y     X  Y  34.25004  8.45147 X

Test of nmodel adequacy


n
TSS   (Yi  Y ) 2   yi 2  13228.7
i 1 i 1

n 
 n 
  n
RSS   (Yi  Y ) 2   2   ( X i  X ) 2    2  xi 2  (8.45147) 2 (92.20109)  6585.679
i 1  i 1  i 1
28
Cont…

$$ESS = TSS - RSS = 13228.7 - 6585.679 = 6643.016$$

$$R^2 = \frac{RSS}{TSS} = \frac{6585.679}{13228.7} = 0.4978$$

Thus, we can conclude that:
• About 50% of the variation in electricity consumption is due to changes in the price of electricity.
• The remaining 50% of the variation in electricity consumption is not due to changes in the price of electricity, but instead due to chance and other factors not included in the model.
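The whole example can be reproduced from the summary statistics alone. A short sketch in Python:

```python
# Reproducing the electricity example from its summary statistics.
n = 16
X_bar, Y_bar = 1.280625, 23.42688
Sxx, Syy, Sxy = 92.20109, 13228.7, -779.235   # Σx², Σy², Σxy

beta_hat = Sxy / Sxx                  # ≈ -8.45147
alpha_hat = Y_bar - beta_hat * X_bar  # ≈ 34.25004
RSS = beta_hat ** 2 * Sxx             # ≈ 6585.679
ESS = Syy - RSS                       # ≈ 6643.02
R2 = RSS / Syy                        # ≈ 0.4978
print(alpha_hat, beta_hat, R2)
```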
ANOVA table

Source of variation | Sum of squares | Degrees of freedom | Mean square          | Variance ratio
Regression          | RSS = 6585.679 | 1                  | RSS/1 = 6585.679     | F_cal = 13.87916
Residual            | ESS = 6643.016 | 16-2 = 14          | ESS/(n-2) = 474.5011 |
Total               | TSS = 13228.7  | 16-1 = 15          |                      |

$$F_\alpha(1, n-2) = F_{0.05}(1, 14) = 4.60$$
Cont…

• Decision: Since the calculated variance ratio exceeds the critical value (13.879 > 4.60), we reject the null hypothesis of no linear relationship between price and consumption of electricity at the 5% level of significance. We therefore conclude that $R^2$ is significant, that is, the linear regression model is adequate and is useful for prediction purposes.
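Plugging the example's sums of squares into the F-test helper sketched earlier reproduces this decision:

```python
# Using the anova_f_test helper defined in the earlier sketch:
print(anova_f_test(RSS=6585.679, ESS=6643.016, n=16))  # True ⇒ R² significant
```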
Estimation of the standard error of β̂ and test of its significance
• An unbiased estimator of the error variance $\sigma^2$ is given by:

$$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \frac{ESS}{n-2} = \frac{6643.016}{16-2} = 474.5011$$

Thus, an unbiased estimator of $\mathrm{Var}(\hat{\beta})$ is given by:

$$\hat{V}(\hat{\beta}) = \frac{\hat{\sigma}^2}{\sum x_i^2} = \frac{474.5011}{92.20109} = 5.146372$$

The standard error of $\hat{\beta}$ is:

$$s.e.(\hat{\beta}) = \sqrt{\hat{V}(\hat{\beta})} = \sqrt{5.146372} = 2.268562$$

The hypothesis of interest is:

$$H_0: \beta = 0 \quad \text{vs.} \quad H_1: \beta \neq 0$$

We calculate the test statistic:

$$t = \frac{\hat{\beta}}{s.e.(\hat{\beta})} = \frac{-8.45147}{2.268562} = -3.72548$$
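Continuing the sketch from the example's summary statistics, the standard error and t statistic are one-liners:

```python
import math

# Standard error of β̂ and its t statistic for the electricity example.
n, Sxx = 16, 92.20109
ESS, beta_hat = 6643.016, -8.45147

sigma2_hat = ESS / (n - 2)             # σ̂² ≈ 474.5011
se_beta = math.sqrt(sigma2_hat / Sxx)  # ≈ 2.268562
t_stat = beta_hat / se_beta            # ≈ -3.72548
print(se_beta, t_stat)
```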
Cont…
• For α = 0.05, the critical value from the Student's t distribution with (n−2) degrees of freedom is $t_{\alpha/2}(n-2) = t_{0.025}(14) = 2.145$.
• Decision: Since $|t| = 3.72548 > 2.145$, we reject the null hypothesis and conclude that β is significantly different from zero. In other words, the price of electricity significantly and negatively affects electricity consumption ($\hat{\beta} = -8.45147$).
• The interpretation of the estimated regression coefficient is that for a one percent drop (increase) in the growth rate of the price of electricity, there is an 8.45 percent increase (decrease) in the growth rate of electricity consumption.
Properties of OLS Estimators
• The ideal or optimum properties that the OLS estimates possess may be summarized by the well-known Gauss-Markov theorem.
• According to this theorem, under the basic assumptions of the classical linear regression model, the least squares estimators are linear, unbiased and have minimum variance (i.e., they are the best of all linear unbiased estimators).
• Sometimes the theorem is referred to as the BLUE theorem, i.e., Best, Linear, Unbiased Estimator. An estimator is called BLUE if it is:
• Linear: a linear function of a random variable, such as the dependent variable Y.
• Unbiased: its average or expected value is equal to the true population parameter.
• Minimum variance: it has minimum variance in the class of linear and unbiased estimators. An unbiased estimator with the least variance is known as an efficient estimator.
Cont….

• According to the Gauss-Markov theorem, the OLS estimators possess all the BLUE properties. The detailed proofs of these properties are presented below.
• Here, we will prove that $\hat{\beta}$ is the BLUE of β. The proof for $\hat{\alpha}$ can be done similarly.

(a) To show that $\hat{\beta}$ is a linear estimator
The OLS estimator of β can be expressed as:

$$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2} = \sum a_i y_i$$

where $a_i = \frac{x_i}{\sum x_i^2}$, $x_i = X_i - \bar{X}$, and $y_i = Y_i - \bar{Y}$. Thus, we can see that $\hat{\beta}$ is a linear estimator, as it can be written as a weighted average of the individual observations on Y.
Cont….

(b) To show that $\hat{\beta}$ is an unbiased estimator of β
Note: An estimator $\hat{\beta}$ of β is said to be unbiased if $E(\hat{\beta}) = \beta$.
Consider the model in deviations form: $y_i = \beta x_i + \varepsilon_i$. Then:

$$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum x_i (\beta x_i + \varepsilon_i)}{\sum x_i^2} = \frac{\beta \sum x_i^2}{\sum x_i^2} + \frac{\sum x_i \varepsilon_i}{\sum x_i^2} = \beta + \frac{\sum x_i \varepsilon_i}{\sum x_i^2} \quad (*)$$

Now we have:
$E(\beta) = \beta$ (since β is a constant), and
$E(\sum x_i \varepsilon_i) = \sum x_i E(\varepsilon_i) = \sum x_i (0) = 0$ (since $x_i$ is non-stochastic and $E(\varepsilon_i) = 0$).
Thus:

$$E(\hat{\beta}) = E(\beta) + E\left(\frac{\sum x_i \varepsilon_i}{\sum x_i^2}\right) = \beta + 0 = \beta$$

Therefore $\hat{\beta}$ is an unbiased estimator of β.
Cont….

(c) To show that $\hat{\beta}$ has the smallest variance of all linear unbiased estimators of β
Note:
1. The OLS estimators $\hat{\alpha}$ and $\hat{\beta}$ are calculated from a specific sample of observations of the dependent and independent variables. If we consider a different sample of observations, $\hat{\alpha}$ and $\hat{\beta}$ may vary from one sample to another, and hence they are random variables.
2. The variance of an estimator (a random variable) such as $\hat{\beta}$ is given by:

$$\mathrm{Var}(\hat{\beta}) = E(\hat{\beta} - \beta)^2$$

3. The expression $\left(\sum_{i=1}^{n} x_i\right)^2$ can be written in expanded form as:

$$\left(\sum_{i=1}^{n} x_i\right)^2 = \sum_{i=1}^{n} x_i^2 + \sum_{i \neq j} x_i x_j$$

This is simply the sum of squares ($\sum x_i^2$) plus the sum of cross-product terms ($x_i x_j$ for $i \neq j$).
• Using (*) and these notes, the variance of the OLS estimator is:

$$\mathrm{Var}(\hat{\beta}) = E(\hat{\beta} - \beta)^2 = E\left(\frac{\sum x_i \varepsilon_i}{\sum x_i^2}\right)^2 = \frac{\sigma^2}{\sum x_i^2} \quad (**)$$

• Note that (**) follows from the assumptions that the error terms have constant variance and no autocorrelation, that is, $E(\varepsilon_i^2) = \sigma^2$ for all i and $E(\varepsilon_i \varepsilon_j) = 0$ for all $i \neq j$.
• We have seen above (in proof (a)) that the OLS estimator of β can be expressed as $\hat{\beta} = \sum a_i y_i$, where $a_i = \frac{x_i}{\sum x_i^2}$.
• Now let $\beta^*$ be another linear unbiased estimator of β, given by $\beta^* = \sum c_i y_i$ with weights $c_i = a_i + d_i$, where the $d_i$ are arbitrary constants.
Cont….

$$\beta^* = \sum c_i y_i = \sum \left(\frac{x_i}{\sum x_i^2} + d_i\right)(\beta x_i + \varepsilon_i) \quad (\text{since } y_i = \beta x_i + \varepsilon_i)$$

$$= \beta \frac{\sum x_i^2}{\sum x_i^2} + \beta \sum d_i x_i + \frac{\sum x_i \varepsilon_i}{\sum x_i^2} + \sum d_i \varepsilon_i = \beta + \beta \sum d_i x_i + \frac{\sum x_i \varepsilon_i}{\sum x_i^2} + \sum d_i \varepsilon_i$$

Taking expectations we have:

$$E(\beta^*) = \beta + \beta \sum d_i x_i \quad (\text{since } E(x_i \varepsilon_i) = x_i E(\varepsilon_i) = 0 \text{ and } E(d_i \varepsilon_i) = d_i E(\varepsilon_i) = 0)$$

Thus, for $\beta^*$ to be unbiased (that is, for $E(\beta^*) = \beta$ to hold) we should have:

$$\sum d_i x_i = 0 \quad (***)$$
The variance of $\beta^*$ is given by:

$$\mathrm{Var}(\beta^*) = E(\beta^* - \beta)^2 = \frac{\sigma^2}{\sum x_i^2} + \sigma^2 \sum d_i^2 = \mathrm{Var}(\hat{\beta}) + \sigma^2 \sum d_i^2$$

using (***) together with the constant-variance and no-autocorrelation assumptions.
Cont'd

• Since $\sum d_i^2$ (a summation of squares of real numbers) is always greater than or equal to zero:

$$\mathrm{Var}(\beta^*) \geq \mathrm{Var}(\hat{\beta})$$

• This implies that $\mathrm{Var}(\hat{\beta})$ is the smallest compared to the variance of any other linear unbiased estimator of β.
• Hence, we conclude that $\hat{\beta}$ is the best linear unbiased estimator (BLUE) of β.
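The unbiasedness property is easy to see in a small Monte Carlo experiment. A sketch in Python (assuming NumPy; all parameter values here are ours, chosen purely for illustration):

```python
import numpy as np

# With true β = 2, the average OLS estimate over many simulated
# samples should be close to 2, illustrating E(β̂) = β.
rng = np.random.default_rng(0)
alpha_true, beta_true, n = 1.0, 2.0, 50
X = rng.uniform(0, 10, n)       # regressor values, held fixed across samples
x = X - X.mean()

estimates = []
for _ in range(5000):
    eps = rng.normal(0, 1, n)   # classical errors: mean 0, constant variance
    Y = alpha_true + beta_true * X + eps
    estimates.append((x * (Y - Y.mean())).sum() / (x ** 2).sum())

print(np.mean(estimates))       # ≈ 2.0
```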
The Confidence Interval Approach to Hypothesis Testing

• A confidence interval is a range of values of a sample statistic that is likely (at a given level of probability, i.e., confidence level) to contain a population parameter.
• It is the interval that will include the population parameter a certain percentage (= confidence level) of the time.
• An example of its usage: we estimate a parameter, say $\hat{\beta}$, to be 0.93, and a "95% confidence interval" to be (0.77, 1.09). This means that we are 95% confident that the interval contains the true (but unknown) value of β.
• Confidence intervals are almost invariably two-sided, although in theory a one-sided interval can be constructed.
How to Carry out a Hypothesis Test Using Confidence Intervals

1. Calculate $\hat{\alpha}$, $\hat{\beta}$ and $SE(\hat{\alpha})$, $SE(\hat{\beta})$ as before.
2. Choose a significance level, α (again the convention is 5%). This is equivalent to choosing a (1−α)×100% confidence interval, i.e., 5% significance level = 95% confidence interval.
3. Use the t-tables to find the appropriate critical value, which will again have T−2 degrees of freedom.
4. The confidence interval is given by:

$$\left(\hat{\beta} - t_{crit} \cdot SE(\hat{\beta}), \;\; \hat{\beta} + t_{crit} \cdot SE(\hat{\beta})\right)$$

5. Perform the test: if the hypothesised value of β (β*) lies outside the confidence interval, then reject the null hypothesis that β = β*; otherwise do not reject the null.
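These five steps fit in a few lines of code. A sketch in Python using the example figures from the slides below (β̂ = 0.5091, SE = 0.2561, T = 22; SciPy assumed for the t distribution):

```python
from scipy import stats

beta_hat, se, T = 0.5091, 0.2561, 22
beta_star = 1.0                          # hypothesised value under H0

t_crit = stats.t.ppf(0.975, T - 2)       # 5% two-sided test: t(20) ≈ 2.086
ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)   # ≈ (-0.0251, 1.0433)

reject = not (ci[0] <= beta_star <= ci[1])
print(ci, reject)                        # 1 lies inside ⇒ do not reject H0
```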
Confidence Intervals Versus Tests of Significance
• Note that the test of significance and confidence interval approaches always give the same answer.
• Under the test of significance approach, we would not reject H₀: β = β* if the test statistic lies within the non-rejection region, i.e., if:

$$-t_{crit} \leq \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})} \leq t_{crit}$$

• Rearranging, we would not reject if:

$$-t_{crit} \cdot SE(\hat{\beta}) \leq \hat{\beta} - \beta^* \leq t_{crit} \cdot SE(\hat{\beta})$$

$$\hat{\beta} - t_{crit} \cdot SE(\hat{\beta}) \leq \beta^* \leq \hat{\beta} + t_{crit} \cdot SE(\hat{\beta})$$

• But this is just the rule under the confidence interval approach.
Constructing Tests of Significance and Confidence Intervals: An Example

• Using the regression results given below (standard errors in parentheses):

$$\hat{y}_t = 20.3 + 0.5091 x_t, \quad T = 22$$
$$\qquad\;\; (14.38) \;\; (0.2561)$$

• Using both the test of significance and confidence interval approaches, test the hypothesis that β = 1 against a two-sided alternative.
• The first step is to obtain the critical value. We want $t_{crit} = t_{20;5\%}$.
Determining the Rejection Region

[Figure: density of the t(20) distribution with 2.5% rejection regions in each tail, beyond the critical values −2.086 and +2.086.]
Performing the Test

$$\hat{\beta} \pm t_{crit} \cdot SE(\hat{\beta}) = 0.5091 \pm 2.086 \times 0.2561 = (-0.0251, \; 1.0433)$$

Since the hypothesised value β = 1 lies inside this interval, we do not reject H₀ at the 5% level.
Changing the Size of the Test

• Suppose we now repeat the test with a 10% size (significance level) instead of 5%. The critical values, and possibly the conclusion, change.

Changing the Size of the Test: The New Rejection Regions

f(x)

5% rejection region 5% rejection region

-1.725 +1.725
Changing the Size of the Test: The Conclusion

• $t_{20;10\%} = 1.725$. The test statistic is $t = (0.5091 - 1)/0.2561 \approx -1.92$, which now lies in the rejection region, so we would reject H₀.
• Caution should therefore be used when placing emphasis on or making decisions in marginal cases (i.e., in cases where we only just reject or do not reject).
• If we reject the null hypothesis at the α level, we say that the result of the test is statistically significant.
The Size of the Hypothesis Test and the Type I and Type II Errors
• When using sample statistics to draw conclusions about the parameters of the population as a whole, there is always the possibility that the sample collected does not accurately represent the population.
• Consequently, statistical tests carried out using such sample data may yield incorrect results and may lead to an erroneous rejection (or non-rejection) of the null hypothesis. We have two types of errors:
Cont'd

• Type I error
A Type I error occurs when we reject a true null hypothesis. For example, a Type I error would manifest in the form of rejecting H₀: β = 0 when β is actually zero.

• Type II error
A Type II error occurs when we fail to reject a false null hypothesis. In such a scenario, the test provides insufficient evidence to reject the null hypothesis when it is false.
Cont'd
• The level of significance, denoted by α, represents the probability of making a Type I error, i.e., rejecting the null hypothesis when, in fact, it is true. It trades off against β, the probability of making a Type II error (not to be confused with the regression coefficient β).
• The ideal, but practically impossible, statistical test would be one that simultaneously minimizes α and β.
• We use α to determine the critical values that subdivide the distribution into the rejection and the non-rejection regions.
End of chapter 2.
Thank you!

