
CHAPTER 6  SIMPLE REGRESSION ANALYSIS


THE TWO-VARIABLE LINEAR MODEL
The two-variable linear model, or simple regression analysis, is used for
testing hypotheses about the relationship between a dependent variable 𝑌
and an independent or explanatory variable 𝑋 and for prediction.

Simple linear regression analysis usually begins by plotting the set of


𝑋𝑌 values on a scatter diagram and determining by inspection whether there
exists an approximate linear relationship:

𝑌𝑖 = 𝑏0 + 𝑏1 𝑋𝑖 … … (6.1)

Since the points are unlikely to fall precisely on the line, the exact linear
relationship in Eq. (6.1) must be modified to include a random disturbance,
error, or stochastic term, 𝑢𝑖 :

𝑌𝑖 = 𝑏0 + 𝑏1 𝑋𝑖 + 𝑢𝑖 … … (6.2)

The error term is assumed to be


(1) normally distributed, with
(2) zero expected value or mean, and
(3) constant variance, and it is further assumed
(4) that the error terms are uncorrelated or unrelated to each other, and
(5) that the explanatory variable assumes fixed values in repeated sampling
(so that 𝑋 and 𝑢 are also uncorrelated).

State each of the five assumptions of the classical regression model (OLS)
and give an intuitive explanation of the meaning and need for each of them.

(1) The first assumption of the classical linear regression model (OLS) is
that the random error term 𝑢 is normally distributed. As a result, 𝑌 and
the sampling distributions of the regression parameters are also normally
distributed, so that tests can be conducted on the significance of the
parameters.
(2) The second assumption is that the expected value of the error term or its
mean equals zero:
𝐸(𝑢) = 0

Because of this assumption, Eq. (6.1) gives the average value of 𝑌.
Specifically, since 𝑋 is assumed fixed, the value of 𝑌 in Eq. (6.2) varies
above and below its mean as 𝑢 exceeds or falls short of 0; since the average
value of 𝑢 is assumed to be 0, Eq. (6.1) gives the average value of 𝑌.

(3) The third assumption is that the variance of the error term is constant in
each period and for all values of 𝑋:

𝐸(𝑢𝑖²) = 𝜎𝑢²

This assumption ensures that each observation is equally reliable, so that


estimates of the regression coefficients are efficient and tests of
hypotheses about them are not biased. These first three assumptions
about the error term can be summarized as

𝑢~𝑁(0, 𝜎𝑢2 )

(4) The fourth assumption is that the value which the error term assumes in
one period is uncorrelated or unrelated to its value in any other period:

𝐸(𝑢𝑖 𝑢𝑗 ) = 0   for 𝑖 ≠ 𝑗;  𝑖, 𝑗 = 1, 2, ⋯ , 𝑛

This ensures that the average value of 𝑌 depends only on 𝑋 and not on 𝑢,
and it is, once again, required in order to have efficient estimates of the
regression coefficients and unbiased tests of their significance.

(5) The fifth assumption is that the explanatory variable assumes fixed
values that can be obtained in repeated samples, so that the explanatory
variable is also uncorrelated with the error term:

𝐸(𝑋𝑖 𝑢𝑖 ) = 0

This assumption is made to simplify the analysis.



EXAMPLE 1.
Table-1 gives the bushels of corn per acre, 𝒀, resulting from the use
of various amounts of fertilizer in pounds per acre, 𝑿, produced on a farm
in each of 10 years from 1971 to 1980. These are plotted in the scatter
diagram of Fig. 6-1. The relationship between 𝑿 and 𝒀 in Fig. 6-1 is
approximately linear (i.e., the points fall on or near a straight line).

Corn Produced with Fertilizer Used

Year 𝑛 𝑌𝑖 𝑋𝑖
1971 1 40 6
1972 2 44 10
1973 3 46 12
1974 4 48 14
1975 5 52 16
1976 6 58 18
1977 7 60 22
1978 8 68 24
1979 9 74 26
1980 10 80 32

MATLAB implementation of the problem: the data are entered as column
vectors Y and X, and the scatter diagram of the data points is drawn with
the MATLAB plot command (the command-window screenshots are not reproduced
here).
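Since the original MATLAB session is shown only as screenshots, here is a plain-Python sketch of the same first step: entering the Table-1 data and computing the summary quantities used later (variable names are illustrative, not from the text).

```python
# Corn yield (Y, bushels/acre) and fertilizer (X, lb/acre) from Table 1.
Y = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]
X = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]

n = len(Y)                      # number of observations
sum_X, sum_Y = sum(X), sum(Y)   # column totals
mean_X, mean_Y = sum_X / n, sum_Y / n

print(n, sum_X, sum_Y, mean_X, mean_Y)  # 10 180 570 18.0 57.0
```

These totals and means match the values used in Table 2 below.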

THE ORDINARY LEAST SQUARES METHOD

The ordinary least-squares method (OLS) is a technique for fitting the


"best" straight line to the sample of XY observations.

It involves minimizing the sum of the squared (vertical) deviations of the
points from the line:

𝑀𝑖𝑛 ∑(𝑌𝑖 − 𝑌̂𝑖 )²

where 𝑌𝑖 refers to the actual observations and 𝑌̂𝑖 refers to the
corresponding fitted values, so that 𝑌𝑖 − 𝑌̂𝑖 = 𝑒𝑖 , the residual.

This gives the following two normal equations:

∑𝑌𝑖 = 𝑛𝑏̂0 + 𝑏̂1 ∑𝑋𝑖 … … (1)
∑𝑋𝑖 𝑌𝑖 = 𝑏̂0 ∑𝑋𝑖 + 𝑏̂1 ∑𝑋𝑖² … … (2)

where 𝑛 is the number of observations and 𝑏̂0 and 𝑏̂1 are estimators of the
true parameters 𝑏0 and 𝑏1 .

Solving Eqs. (1) and (2) simultaneously, we get

𝑏̂1 = [𝑛∑𝑋𝑖 𝑌𝑖 − ∑𝑋𝑖 ∑𝑌𝑖 ] / [𝑛∑𝑋𝑖² − (∑𝑋𝑖 )²] … … (2A)

The value of 𝑏̂0 is then given by

𝑏̂0 = 𝑌̅ − 𝑏̂1 𝑋̅ … … (2B)

It is often useful to use an equivalent formula for estimating 𝑏̂1 :

𝑏̂1 = ∑𝑥𝑖 𝑦𝑖 / ∑𝑥𝑖² … … (3)

where 𝑥𝑖 = 𝑋𝑖 − 𝑋̅ and 𝑦𝑖 = 𝑌𝑖 − 𝑌̅. The estimated least-squares
regression (OLS) equation is then

𝑌̂𝑖 = 𝑏̂0 + 𝑏̂1 𝑋𝑖 … … (4)
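Equations (2A) and (2B) can be turned directly into code. The following Python sketch (the helper name `ols_fit` is illustrative, not from the text) fits a line to any sample and, as a sanity check, exactly recovers a line from data generated by it:

```python
# Sketch of solving the two normal equations via Eqs. (2A) and (2B).
def ols_fit(X, Y):
    n = len(X)
    sum_X, sum_Y = sum(X), sum(Y)
    sum_XY = sum(x * y for x, y in zip(X, Y))
    sum_X2 = sum(x * x for x in X)
    b1 = (n * sum_XY - sum_X * sum_Y) / (n * sum_X2 - sum_X ** 2)  # Eq. (2A)
    b0 = sum_Y / n - b1 * sum_X / n                                # Eq. (2B)
    return b0, b1

# On data generated exactly by Y = 3 + 2X the fit recovers the line.
b0, b1 = ols_fit([1, 2, 3, 4], [5, 7, 9, 11])
print(b0, b1)  # 3.0 2.0
```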

EXAMPLE 2. Table-2 shows the calculations to estimate the regression
equation for the corn-fertilizer data in Table-1, using Eq. (3).
Table-2 Corn Produced with Fertilizer Used: Calculations

𝒏 𝒀(𝑪𝒐𝒓𝒏) 𝑿 (𝑭𝒆𝒓𝒕𝒊𝒍𝒊𝒛𝒆𝒓) 𝒚𝒊 𝒙𝒊 𝒙 𝒊 𝒚𝒊 𝒙𝟐𝒊


1 40 6 −17 −12 204 144
2 44 10 −13 −8 104 64
3 46 12 −11 −6 66 36
4 48 14 −9 −4 36 16
5 52 16 −5 −2 10 4
6 58 18 1 0 0 0
7 60 22 3 4 12 16
8 68 24 11 6 66 36
9 74 26 17 8 136 64
10 80 32 23 14 322 196
𝒏 = 𝟏𝟎 ∑𝒀𝒊 = 𝟓𝟕𝟎 ∑𝑿𝒊 = 𝟏𝟖𝟎 ∑𝒚𝒊 = 𝟎 ∑𝒙𝒊 = 𝟎 ∑𝒙𝒊 𝒚𝒊 = 𝟗𝟓𝟔 ∑𝒙𝟐𝒊 = 𝟓𝟕𝟔
𝒀̅ = 𝟓𝟕  𝑿̅ = 𝟏𝟖

𝑏̂1 = ∑𝑥𝑖 𝑦𝑖 / ∑𝑥𝑖² = 956/576 ≅ 1.66 (the slope of the estimated regression line)

𝑏̂0 = 𝑌̅ − 𝑏̂1 𝑋̅ = 57 − (1.66)(18) ≅ 57 − 29.88 ≅ 27.12 (the 𝑌 intercept)

𝑌̂𝑖 = 27.12 + 1.66 𝑋𝑖 (the estimated regression equation)

Thus, when 𝑋𝑖 = 0, 𝑌̂𝑖 = 27.12 = 𝑏̂0 . When 𝑋𝑖 = 18 = 𝑋̅,
𝑌̂𝑖 = 27.12 + 1.66(18) = 57 = 𝑌̅.

As a result, the regression line passes through the point (𝑋̅, 𝑌̅).
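Both slope formulas from Example 2 can be checked numerically. This Python sketch (standing in for the MATLAB screenshots) computes 𝑏̂1 by the raw-sum formula (2A) and by the deviation formula (3), confirms the two agree, and confirms the fitted line passes through (X̄, Ȳ):

```python
# Corn-fertilizer data from Table 1.
Y = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]
X = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n

# First method: Eq. (2A), raw sums.
b1_raw = (n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)) / \
         (n * sum(x * x for x in X) - sum(X) ** 2)

# Second method: Eq. (3), deviations from the means.
xd = [x - Xbar for x in X]
yd = [y - Ybar for y in Y]
b1_dev = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

b0 = Ybar - b1_dev * Xbar          # Eq. (2B); b1 ≈ 1.66, b0 ≈ 27.12
print(abs(b1_raw - b1_dev) < 1e-9) # True: both methods agree
```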

MATLAB APPLICATION OF THE PROBLEM

Vectors X and Y are already defined in MATLAB; we now have to calculate
𝑏̂0 and 𝑏̂1 .

Since there are two ways to calculate 𝑏̂1 , we shall learn both methods one
by one.

First Method

We need to calculate 𝑛, ∑𝑋𝑖 𝑌𝑖 , ∑𝑋𝑖 , ∑𝑌𝑖 , ∑𝑋𝑖², 𝑋̅ and 𝑌̅.



Using equation (2A), we have

𝑏̂1 = [𝑛∑𝑋𝑖 𝑌𝑖 − ∑𝑋𝑖 ∑𝑌𝑖 ] / [𝑛∑𝑋𝑖² − (∑𝑋𝑖 )²]

Using equation (2B), we have

𝑏̂0 = 𝑌̅ − 𝑏̂1 𝑋̅

Second Method

We need to calculate 𝑥𝑖 = 𝑋𝑖 − 𝑋̅, 𝑦𝑖 = 𝑌𝑖 − 𝑌̅, ∑𝑥𝑖 𝑦𝑖 and ∑𝑥𝑖².



Using equation (3), we have

𝑏̂1 = ∑𝑥𝑖 𝑦𝑖 / ∑𝑥𝑖²

The result for 𝑏̂1 obtained by the second method can easily be verified
against the result obtained from the first method.

Note: Since 𝒙𝒊 and 𝒚𝒊 are deviations taken from the means of 𝑿𝒊 and 𝒀𝒊
respectively, their sums will always be zero.

All the statements above are executed one by one at the command prompt in
the command window. We can instead combine all the statements in one script
file in the editor window and execute them in a single click.

Save the script file with an appropriate name (here 'First_Method') and
press the Run button; the results displayed in the command window can be
verified against the previous results.

For the Second Method, the following script is written and executed:

The executed results are displayed in a more printable format by using the
fprintf() command.

Following is the graph of the scattered points and the regression line,
together with the MATLAB code for plotting it (figure not reproduced here).


TESTS OF SIGNIFICANCE OF PARAMETER ESTIMATES

In order to test the statistical significance of the parameter estimates of
the regression, the variances of 𝑏̂0 and 𝑏̂1 are required:

𝑉𝑎𝑟(𝑏̂0 ) = 𝜎𝑢² ∑𝑋𝑖² / (𝑛∑𝑥𝑖²)

𝑉𝑎𝑟(𝑏̂1 ) = 𝜎𝑢² / ∑𝑥𝑖²

Since 𝜎𝑢² is unknown, the residual variance 𝑠² is used as an (unbiased)
estimate of 𝜎𝑢²:

𝑠² = 𝜎̂𝑢² = ∑𝑒𝑖² / (𝑛 − 𝑘)

where 𝑘 represents the number of parameter estimates.

Unbiased estimates of the variances of 𝑏̂0 and 𝑏̂1 are then given by

𝑠²𝑏̂0 = [∑𝑒𝑖² / (𝑛 − 𝑘)] ⋅ [∑𝑋𝑖² / (𝑛∑𝑥𝑖²)]

𝑠²𝑏̂1 = [∑𝑒𝑖² / (𝑛 − 𝑘)] ⋅ [1 / ∑𝑥𝑖²]

so that 𝑠𝑏̂0 and 𝑠𝑏̂1 are the standard errors of the estimates. Since 𝑢𝑖 is
normally distributed, 𝑌, and therefore 𝑏̂0 and 𝑏̂1 , are also normally
distributed, so that we can use the 𝑡 distribution with 𝑛 − 𝑘 degrees of
freedom to test hypotheses about, and construct confidence intervals for,
𝑏̂0 and 𝑏̂1 .

EXAMPLE 3
Table 3 (an extension of Table 2) shows the calculations required to test
the statistical significance of 𝑏̂0 and 𝑏̂1 .

The values of 𝑌̂𝑖 in Table 3 are obtained by substituting the values of 𝑋𝑖
into the estimated regression equation found in Example 2. (The values of
𝑦𝑖² are obtained by squaring 𝑦𝑖 from Table 2.)

The variances of 𝑏̂0 and 𝑏̂1 are as follows:

𝑠²𝑏̂0 = [∑𝑒𝑖² / (𝑛 − 𝑘)] ⋅ [∑𝑋𝑖² / (𝑛∑𝑥𝑖²)] = [47.3056 / (10 − 2)] ⋅ [3816 / (10 × 576)] ≅ 3.92

𝑠²𝑏̂1 = [∑𝑒𝑖² / (𝑛 − 𝑘)] ⋅ [1 / ∑𝑥𝑖²] = 47.3056 / [(10 − 2)(576)] ≅ 0.01

The standard errors are:

𝑠𝑏̂0 = √3.92 ≅ 1.98
𝑠𝑏̂1 = √0.01 ≅ 0.1

The calculated 𝑡 values are:

𝑡0 = (𝑏̂0 − 𝑏0 ) / 𝑠𝑏̂0 = (27.12 − 0) / 1.98 ≅ 13.7
𝑡1 = (𝑏̂1 − 𝑏1 ) / 𝑠𝑏̂1 = (1.66 − 0) / 0.1 ≅ 16.6
Table 3  Corn-Fertilizer Calculations to Test Significance of Parameters

𝑌𝑒𝑎𝑟  𝑌𝑖  𝑋𝑖  𝑌̂𝑖  𝑒𝑖  𝑒𝑖²  𝑋𝑖²  𝑥𝑖²  𝑦𝑖²
1  40  6  37.08  2.92  8.5264  36  144  289
2  44  10  43.72  0.28  0.0784  100  64  169
3  46  12  47.04  −1.04  1.0816  144  36  121
4  48  14  50.36  −2.36  5.5696  196  16  81
5  52  16  53.68  −1.68  2.8224  256  4  25
6  58  18  57.00  1.00  1.0000  324  0  1
7  60  22  63.64  −3.64  13.2496  484  16  9
8  68  24  66.96  1.04  1.0816  576  36  121
9  74  26  70.28  3.72  13.8384  676  64  289
10  80  32  80.24  −0.24  0.0576  1024  196  529

𝑛 = 10  ∑𝑒𝑖 = 0  ∑𝑒𝑖² = 47.3056  ∑𝑋𝑖² = 3816  ∑𝑥𝑖² = 576  ∑𝑦𝑖² = 1634

MATLAB APPLICATION TO THE PROBLEM

We need to calculate
𝑛, ∑𝑒𝑖 , ∑𝑒𝑖² , ∑𝑋𝑖² , ∑𝑥𝑖² , ∑𝑦𝑖² , 𝑠²𝑏̂0 , 𝑠²𝑏̂1 , 𝑠𝑏̂0 , 𝑠𝑏̂1 , 𝑡0 and 𝑡1 .

Here 𝑘 = 2, i.e., the two estimates 𝑏̂0 and 𝑏̂1 .

Variances of 𝑏̂0 and 𝑏̂1

Standard Errors of 𝑏̂0 and 𝑏̂1

Estimation of 𝑡0 and 𝑡1

Compare these 𝑡 values with the 𝑡-distribution table with 𝑛 − 𝑘 degrees of
freedom given in Appendix V at the end of the book.
Here 𝑛 − 𝑘 = 10 − 2 = 8 degrees of freedom at the 5% level of significance.
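The whole significance test of Example 3 can be reproduced from scratch. This plain-Python sketch (mirroring the MATLAB session, which is not reproduced here) computes the residual variance, standard errors, and t statistics:

```python
import math

# Corn-fertilizer data from Table 1.
Y = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]
X = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]
n, k = len(X), 2                                    # k = 2 parameter estimates
Xbar, Ybar = sum(X) / n, sum(Y) / n
xd = [x - Xbar for x in X]
b1 = sum(a * (y - Ybar) for a, y in zip(xd, Y)) / sum(a * a for a in xd)
b0 = Ybar - b1 * Xbar

e = [y - (b0 + b1 * x) for x, y in zip(X, Y)]       # residuals
s2 = sum(ei * ei for ei in e) / (n - k)             # residual variance
var_b0 = s2 * sum(x * x for x in X) / (n * sum(a * a for a in xd))
var_b1 = s2 / sum(a * a for a in xd)

t0 = b0 / math.sqrt(var_b0)   # ≈ 13.70
t1 = b1 / math.sqrt(var_b1)   # ≈ 16.38 (the text's 16.6 uses s_b1 rounded to 0.1)
```

At full precision 𝑡1 comes out near 16.4 rather than the text's 16.6; the difference is purely the rounding of 𝑠𝑏̂1 to 0.1 in the hand calculation.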

STUDENT’S T-DISTRIBUTION

df \ level of significance (one-tail)  0.1  0.05  0.025  0.01  0.005


1 3.0777 6.3138 12.7062 31.8205 63.6567
2 1.8856 2.9200 4.3027 6.9646 9.9248
3 1.6377 2.3534 3.1824 4.5407 5.8409
4 1.5332 2.1318 2.7764 3.7469 4.6041
5 1.4759 2.0150 2.5706 3.3649 4.0321
6 1.4398 1.9432 2.4469 3.1427 3.7074
7 1.4149 1.8946 2.3646 2.9980 3.4995
8 1.3968 1.8595 2.3060 2.8965 3.3554
9 1.3830 1.8331 2.2622 2.8214 3.2498
10 1.3722 1.8125 2.2281 2.7638 3.1693
11 1.3634 1.7959 2.2010 2.7181 3.1058
12 1.3562 1.7823 2.1788 2.6810 3.0545
13 1.3502 1.7709 2.1604 2.6503 3.0123
14 1.3450 1.7613 2.1448 2.6245 2.9768
15 1.3406 1.7531 2.1314 2.6025 2.9467
16 1.3368 1.7459 2.1199 2.5835 2.9208
17 1.3334 1.7396 2.1098 2.5669 2.8982
18 1.3304 1.7341 2.1009 2.5524 2.8784
19 1.3277 1.7291 2.0930 2.5395 2.8609
20 1.3253 1.7247 2.0860 2.5280 2.8453
21 1.3232 1.7207 2.0796 2.5176 2.8314
22 1.3212 1.7171 2.0739 2.5083 2.8188
23 1.3195 1.7139 2.0687 2.4999 2.8073
24 1.3178 1.7109 2.0639 2.4922 2.7969
25 1.3163 1.7081 2.0595 2.4851 2.7874
26 1.3150 1.7056 2.0555 2.4786 2.7787
27 1.3137 1.7033 2.0518 2.4727 2.7707
28 1.3125 1.7011 2.0484 2.4671 2.7633
29 1.3114 1.6991 2.0452 2.4620 2.7564
30 1.3104 1.6973 2.0423 2.4573 2.7500
31 1.3095 1.6955 2.0395 2.4528 2.7440
32 1.3086 1.6939 2.0369 2.4487 2.7385
33 1.3077 1.6924 2.0345 2.4448 2.7333
34 1.3070 1.6909 2.0322 2.4411 2.7284
35 1.3062 1.6896 2.0301 2.4377 2.7238

It can be seen that the value of 𝑡 at 8 degrees of freedom and the 5% level
of significance is 2.306, which is smaller than both 𝑡0 and 𝑡1 , so both
parameters 𝑏̂0 and 𝑏̂1 are statistically significant.

We can find the 𝑡-distribution value in MATLAB for a specified df and level
of significance as follows.

In the above example df is 8 and the level of significance is 5% two-tail,
which is half of that in each tail, i.e., 0.025, so the percentile is
1 − 0.025 = 0.975.

The MATLAB statement for finding the 𝑡 value is tinv(0.975, 8) (from the
Statistics Toolbox), which returns approximately 2.306.
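The same critical value can be reproduced without any statistics library, as an illustration of what tinv does: integrate the t density numerically and bisect for the 97.5th percentile (a teaching sketch, not production code):

```python
import math

def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom.
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=4000):
    # CDF(x) = 0.5 + integral of the density from 0 to x (trapezoid rule).
    h = x / steps
    area = sum(t_pdf(i * h, df) for i in range(1, steps)) * h
    area += (t_pdf(0, df) + t_pdf(x, df)) * h / 2
    return 0.5 + area

def t_inv(p, df):
    # Invert the CDF by bisection on [0, 50].
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_inv(0.975, 8), 3))  # 2.306
```

With df = 10 the same routine gives 2.228, the value used in the confidence intervals below the t table.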

TEST OF GOODNESS OF FIT AND CORRELATION

The closer the observations fall to the regression line (i.e., the smaller
the residuals), the greater is the variation in 𝑌 "explained" by the
estimated regression equation. The total variation in 𝑌 is equal to the
explained plus the residual variation:

∑(𝑌𝑖 − 𝑌̅)²  =  ∑(𝑌̂𝑖 − 𝑌̅)²  +  ∑(𝑌𝑖 − 𝑌̂𝑖 )²
Total variation in 𝑌 (total sum of squares) = Explained variation in 𝑌
(regression sum of squares) + Residual variation in 𝑌 (error sum of squares)

𝑇𝑆𝑆 = 𝑅𝑆𝑆 + 𝐸𝑆𝑆

Dividing both sides by TSS gives:

1 = 𝑅𝑆𝑆/𝑇𝑆𝑆 + 𝐸𝑆𝑆/𝑇𝑆𝑆

The coefficient of determination, or 𝑅2 , is then defined as the proportion of


the total variation in Y "explained" by the regression of Y on X:

𝑅² = 𝑅𝑆𝑆/𝑇𝑆𝑆 = 1 − 𝐸𝑆𝑆/𝑇𝑆𝑆

𝑅² can be calculated by

𝑅² = ∑𝑦̂𝑖² / ∑𝑦𝑖² = 1 − ∑𝑒𝑖² / ∑𝑦𝑖²

where ∑𝑦̂𝑖² = ∑(𝑌̂𝑖 − 𝑌̅)².

𝑅² ranges in value from 0 (when the estimated regression equation explains
none of the variation in 𝑌) to 1 (when all points lie on the regression
line).

The correlation coefficient 𝑟 is given by

𝑟 = √𝑅² = √(𝑏̂1 ∑𝑥𝑖 𝑦𝑖 / ∑𝑦𝑖²)

𝑟 ranges in value from −1 (for perfect negative linear correlation) to +1
(for perfect positive linear correlation) and does not imply causality or
dependence. The sign of 𝑟 depends on the sign of 𝑏̂1 (the slope parameter).

Positive Correlation: Variables which have a direct relationship (a


positive correlation) increase together and decrease together.

Negative Correlation: In an inverse relationship (a negative


correlation), one variable increases while the other decreases.

EXAMPLE. The coefficient of determination for the corn-fertilizer example
can be found from Table 3:

𝑅² = 1 − ∑𝑒𝑖² / ∑𝑦𝑖² = 1 − 47.31/1634 ≅ 1 − 0.0290 ≅ 0.9710, or 97.10%

Thus the regression equation explains about 97% of the total variation in
corn output. The remaining 3% is attributed to factors captured by the
error term.


Then 𝑟 = √𝑅2 = √0.9710 ≅ 0.9854, 𝑜𝑟 98.54%, and is positive because 𝑏̂1 is
positive.
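Both 𝑅² formulas and 𝑟 can be checked numerically for the corn regression. A plain-Python sketch (again standing in for the omitted MATLAB statements):

```python
# Goodness of fit for the corn regression of Examples 2-3.
Y = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]
X = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n
b1 = 956 / 576                      # slope from Example 2
b0 = Ybar - b1 * Xbar

yhat = [b0 + b1 * x for x in X]
ess = sum((y - yh) ** 2 for y, yh in zip(Y, yhat))   # residual (error) SS
tss = sum((y - Ybar) ** 2 for y in Y)                # total SS
rss = sum((yh - Ybar) ** 2 for yh in yhat)           # explained SS

R2_a = rss / tss            # first formula
R2_b = 1 - ess / tss        # second formula; both agree
r = R2_b ** 0.5
print(round(R2_b, 4), round(r, 4))  # 0.971 0.9854
```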

Following Figure shows the total, the explained, and the residual variation of
Y.

The data in Table-4 report the aggregate consumption (𝑌, in billions of
U.S. dollars) and disposable income (𝑋, also in billions of U.S. dollars)
for a developing economy for the 12 years from 1988 to 1999.
Draw a scatter diagram for the data and determine by inspection whether
there exists an approximate linear relationship between 𝑌 and 𝑋.

Table - 4 Aggregate Consumption (Y) and Disposable Income (X)

𝑌𝑒𝑎𝑟 𝑛 𝑌𝑖 𝑋𝑖
1988 1 102 114
1989 2 106 118
1990 3 108 126
1991 4 110 130
1992 5 122 136
1993 6 124 140
1994 7 128 148
1995 8 130 156
1996 9 142 160
1997 10 148 164
1998 11 150 170
1999 12 154 178
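The near-linearity that the scatter diagram suggests can also be checked numerically: the sample correlation for the Table-4 data is close to 1 (an illustrative check, computed here in plain Python):

```python
# Consumption (Y) and disposable income (X) from Table 4.
Y = [102, 106, 108, 110, 122, 124, 128, 130, 142, 148, 150, 154]
X = [114, 118, 126, 130, 136, 140, 148, 156, 160, 164, 170, 178]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n
sxy = sum((x - Xbar) * (y - Ybar) for x, y in zip(X, Y))
sxx = sum((x - Xbar) ** 2 for x in X)
syy = sum((y - Ybar) ** 2 for y in Y)
r = sxy / (sxx * syy) ** 0.5     # sample correlation coefficient
print(round(r, 3))  # 0.984
```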

From the figure above it can be seen that the relationship between
consumption expenditures 𝑌 and disposable income 𝑋 is approximately linear,
as required by the linear regression model.

State the general relationship between consumption 𝒀 and disposable


income 𝑿 in

(a) exact linear form and

(b) stochastic form.

(c) Why would you expect most observed values of 𝒀 not to fall exactly on
a straight line?

(a) The exact or deterministic general relationship between aggregate


consumption expenditures Y and aggregate disposable income X can
be written as:
𝑌𝑖 = 𝑏0 + 𝑏1 𝑋𝑖

where 𝑖 refers to each year in time-series analysis (as with the data in
Table 4) or to each economic unit (such as a family) in cross-sectional
analysis.

In Eq. (6.1), 𝑏0 and 𝑏1 are unknown constants called parameters.
Parameter 𝑏0 is the constant or 𝑌 intercept, while 𝑏1 measures ∆𝑌/∆𝑋,
which, in this context, refers to the marginal propensity to consume (MPC).

The specific linear relationship corresponding to the general linear


relationship in Eq. (6.1) is obtained by estimating the values of 𝑏0 and
𝑏1 (represented by 𝑏̂0 , and 𝑏̂1 , and read as "b sub zero hat" and "b sub
one hat").

(b) The exact linear relationship in Eq. (6.1) can be made stochastic by
adding a random disturbance or error term, 𝑢𝑖 , giving
𝑌𝑖 = 𝑏0 + 𝑏1 𝑋𝑖 + 𝑢𝑖

(c) Most observed values of 𝑌 are not expected to fall precisely on a


straight line

(1) because even though consumption 𝑌 is postulated to depend


primarily on disposable income 𝑋, it also may depend on numerous
other omitted variables with only slight and irregular effect on 𝑌 (if
some of these other variables had instead a significant and regular
effect on Y, then they should be included as additional explanatory
variables, as in a multiple regression model);

(2) because of possible errors in measuring Y; and

(3) because of inherent random human behavior, which usually leads


to different values of 𝑌 for the same value of 𝑋 under identical
circumstances.

THE ORDINARY LEAST-SQUARES METHOD

(a) What is meant by the ordinary least-squares (OLS) method of estimating


the "best" straight line that fits the sample of 𝑿𝒀 observations?

The OLS method gives the best straight line that fits the sample of 𝑋𝑌
observations in the sense that it minimizes the sum of the squared
(vertical) deviations of each observed point on the graph from the
straight line.

(b) Why do we take vertical deviations?



We take vertical deviations because we are trying to explain or predict


movements in Y, which is measured along the vertical axis.

(c) Why do we not simply take the sum of the deviations without squaring
them?

We cannot take the sum of the deviations of each of the observed points
from the OLS line because deviations that are equal in size but opposite
in sign cancel out, so the sum of the deviations equals 0.

(d) Why do we not take the sum of the absolute deviations?

Taking the sum of the absolute deviations avoids the problem of having
the sum of the deviations equal to 0. However, the sum of the squared
deviations is preferred so as to penalize larger deviations relatively more
than smaller deviations.
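Points (c) and (d) can be seen numerically: around the OLS line the raw residuals cancel to exactly zero, while the squared-residual sum sits at a minimum, so perturbing the slope in either direction increases it. A small illustrative check on the corn data:

```python
# Raw residuals cancel; squared residuals are minimized at the OLS slope.
Y = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]
X = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n
b1 = sum((x - Xbar) * (y - Ybar) for x, y in zip(X, Y)) / \
     sum((x - Xbar) ** 2 for x in X)
b0 = Ybar - b1 * Xbar

def sse(a, b):
    # Sum of squared residuals for the line Y = a + b*X.
    return sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

resid_sum = sum(y - (b0 + b1 * x) for x, y in zip(X, Y))
print(abs(resid_sum) < 1e-9)            # True: raw deviations cancel out
print(sse(b0, b1) < sse(b0, b1 + 0.1))  # True: SSE rises if slope increased
print(sse(b0, b1) < sse(b0, b1 - 0.1))  # True: SSE rises if slope decreased
```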
Starting from the expression for the sum of the squared deviations or
residuals and minimizing it, derive (a) normal Eq. (1) and (b) normal
Eq. (2).

The sum of squared residuals is

∑𝑒𝑖² = ∑(𝑌𝑖 − 𝑌̂𝑖 )² = ∑(𝑌𝑖 − 𝑏̂0 − 𝑏̂1 𝑋𝑖 )²

(a) Normal Eq. (1) is derived by minimizing ∑𝑒𝑖² with respect to 𝑏̂0 :

∂∑𝑒𝑖² / ∂𝑏̂0 = ∂∑(𝑌𝑖 − 𝑏̂0 − 𝑏̂1 𝑋𝑖 )² / ∂𝑏̂0 = 0
2∑(𝑌𝑖 − 𝑏̂0 − 𝑏̂1 𝑋𝑖 )(−1) = 0
∑(𝑌𝑖 − 𝑏̂0 − 𝑏̂1 𝑋𝑖 ) = 0
∑𝑌𝑖 = 𝑛𝑏̂0 + 𝑏̂1 ∑𝑋𝑖 … … (𝐵1)

(b) Normal Eq. (2) is derived by minimizing ∑𝑒𝑖² with respect to 𝑏̂1 :

∂∑𝑒𝑖² / ∂𝑏̂1 = ∂∑(𝑌𝑖 − 𝑏̂0 − 𝑏̂1 𝑋𝑖 )² / ∂𝑏̂1 = 0
2∑(𝑌𝑖 − 𝑏̂0 − 𝑏̂1 𝑋𝑖 )(−𝑋𝑖 ) = 0
∑(𝑌𝑖 𝑋𝑖 − 𝑏̂0 𝑋𝑖 − 𝑏̂1 𝑋𝑖² ) = 0
∑𝑋𝑖 𝑌𝑖 = 𝑏̂0 ∑𝑋𝑖 + 𝑏̂1 ∑𝑋𝑖² … … (𝐵2)



Solve (B1) and (B2) simultaneously to get the values of 𝑏̂1 and 𝑏̂0 .
(a) Multiplying Eq. (B2) by 𝑛 and Eq. (B1) by ∑𝑋𝑖 , we get
𝑛∑𝑋𝑖 𝑌𝑖 = 𝑏̂0 𝑛∑𝑋𝑖 + 𝑏̂1 𝑛∑𝑋𝑖² … … (𝐴1)
∑𝑋𝑖 ∑𝑌𝑖 = 𝑏̂0 𝑛∑𝑋𝑖 + 𝑏̂1 (∑𝑋𝑖 )² … … (𝐴2)
Subtracting Eq. (A2) from Eq. (A1), we get
𝑛∑𝑋𝑖 𝑌𝑖 − ∑𝑋𝑖 ∑𝑌𝑖 = 𝑏̂1 [𝑛∑𝑋𝑖² − (∑𝑋𝑖 )²] … … (𝐴3)
Solving Eq. (A3) for 𝑏̂1 , we get
𝑏̂1 = [𝑛∑𝑋𝑖 𝑌𝑖 − ∑𝑋𝑖 ∑𝑌𝑖 ] / [𝑛∑𝑋𝑖² − (∑𝑋𝑖 )²] … … (𝐴4)
(b) Equation (A5) is obtained by simply solving Eq. (B1) for 𝑏̂0 :

∑𝑌𝑖 = 𝑛𝑏̂0 + 𝑏̂1 ∑𝑋𝑖 … … (𝐵1)

𝑏̂0 = ∑𝑌𝑖 /𝑛 − 𝑏̂1 ∑𝑋𝑖 /𝑛 = 𝑌̅ − 𝑏̂1 𝑋̅ … … (𝐴5)
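The derivation above reduces the minimization to a 2×2 linear system in 𝑏̂0 and 𝑏̂1. Solving that system directly, say by Cramer's rule, reproduces Eqs. (A4) and (A5), as this Python sketch on the corn data verifies:

```python
# Solve the two normal equations (B1)-(B2) as a 2x2 system (Cramer's rule).
Y = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]
X = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]
n = len(X)
sX, sY = sum(X), sum(Y)
sXY = sum(x * y for x, y in zip(X, Y))
sX2 = sum(x * x for x in X)

# System:  n*b0  + sX*b1  = sY
#          sX*b0 + sX2*b1 = sXY
det = n * sX2 - sX * sX
b0 = (sY * sX2 - sX * sXY) / det       # Cramer's rule for b0
b1 = (n * sXY - sX * sY) / det         # identical to Eq. (A4)

assert abs(b1 - (n * sXY - sX * sY) / (n * sX2 - sX ** 2)) < 1e-12
assert abs(b0 - (sY / n - b1 * sX / n)) < 1e-9   # matches Eq. (A5)
```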
EXAMPLE:
(a) Find the regression equation for the consumption schedule in Table 4,
using Eq. (2A) to find 𝒃̂𝟏 .

The table below shows the calculations to find 𝑏̂1 and 𝑏̂0 for the data in
Table 4.

𝑏̂1 = [𝑛∑𝑋𝑖 𝑌𝑖 − ∑𝑋𝑖 ∑𝑌𝑖 ] / [𝑛∑𝑋𝑖² − (∑𝑋𝑖 )²]
   = [(12)(225,124) − (1740)(1524)] / [(12)(257,112) − (1740)²]
   = (2,701,488 − 2,651,760) / (3,085,344 − 3,027,600)
   = 49,728 / 57,744 ≅ 0.86

𝑏̂0 = 𝑌̅ − 𝑏̂1 𝑋̅ ≅ 127 − 0.86(145) ≅ 127 − 124.70 ≅ 2.30

Thus the equation for the estimated consumption regression is

𝑌̂𝑖 = 2.30 + 0.86𝑋𝑖

𝑛  𝑌𝑖  𝑋𝑖  𝑋𝑖 𝑌𝑖  𝑋𝑖²
1  102  114  11,628  12,996
2  106  118  12,508  13,924
3  108  126  13,608  15,876
4  110  130  14,300  16,900
5  122  136  16,592  18,496
6  124  140  17,360  19,600
7  128  148  18,944  21,904
8  130  156  20,280  24,336
9  142  160  22,720  25,600
10  148  164  24,272  26,896
11  150  170  25,500  28,900
12  154  178  27,412  31,684

𝑛 = 12  ∑𝑌𝑖 = 1524  ∑𝑋𝑖 = 1740  ∑𝑋𝑖 𝑌𝑖 = 225,124  ∑𝑋𝑖² = 257,112
𝑌̅ = 127  𝑋̅ = 145
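A quick numeric check of this fit, in plain Python. Note that the text computes 𝑏̂0 with the already-rounded slope 0.86, which gives 2.30; carrying full precision gives 𝑏̂0 ≈ 2.13. Both are shown:

```python
# Consumption regression from the sums in the table above.
n, sX, sY = 12, 1740, 1524
sXY, sX2 = 225124, 257112

b1 = (n * sXY - sX * sY) / (n * sX2 - sX ** 2)   # 49,728 / 57,744
b0_exact = sY / n - b1 * sX / n                  # full-precision intercept
b0_text = 127 - round(b1, 2) * 145               # the text's rounded arithmetic
print(round(b1, 2), round(b0_exact, 2), round(b0_text, 2))  # 0.86 2.13 2.3
```

The small discrepancy is harmless here, but it is why software output can differ slightly from hand calculations.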

(b) Plot the regression line and show the deviations of each 𝒀𝒊 from the
corresponding 𝒀̂𝒊 .

To plot the regression equation, we need to define any two points on the
regression line.
For example, when 𝑋𝑖 = 114, 𝑌̂𝑖 = 2.30 + 0.86(114) = 100.34.
When 𝑋𝑖 = 178, 𝑌̂𝑖 = 2.30 + 0.86(178) = 155.38.

The consumption regression line is plotted in the accompanying figure,
where the positive and negative residuals are also shown.
The regression line represents the best fit to the random sample of
consumption-disposable income observations in the sense that it
minimizes the sum of the squared (vertical) deviations from the line.
Assignment
(a) Starting with Eq. (2A), derive the equation for 𝒃̂𝟏 in deviation form
for the case where 𝑿̅ = 𝒀̅ = 𝟎.
(b) What is the value of 𝒃̂𝟎 when 𝑿̅ = 𝒀̅ = 𝟎?

For the aggregate consumption-income observations in Table 4, find
(a) 𝒔², (b) 𝒔²𝒃̂𝟎 and 𝒔𝒃̂𝟎 , (c) 𝒔²𝒃̂𝟏 and 𝒔𝒃̂𝟏 , and (d) test at the 5% level
of significance for 𝒃𝟎 and 𝒃𝟏 .

Construct the 95% confidence interval for (a) 𝒃𝟎 and (b) 𝒃𝟏 in the above
problem.
(a) The 95% confidence interval for 𝑏0 is given by
𝑏0 = 𝑏̂0 ± 2.228𝑠𝑏̂0 = 2.30 ± 2.228(7.17) = 2.30 ± 15.97

So 𝑏0 is between −13.67 and 18.27 with 95% confidence. Note how wide
(and meaningless) the 95% confidence interval for 𝑏0 is, reflecting the
fact that 𝑏̂0 is highly insignificant.

(b) The 95% confidence interval for 𝑏1 is given by
𝑏1 = 𝑏̂1 ± 2.228𝑠𝑏̂1 = 0.86 ± 2.228(0.05) = 0.86 ± 0.11
So 𝑏1 is between 0.75 and 0.97 (i.e., 0.75 < 𝑏1 < 0.97) with 95%
confidence.
Here 2.228 is the 𝑡 value obtained from the 𝑡-distribution table at the 5%
level of significance and 12 − 2 = 10 degrees of freedom. This value can
also be obtained by using the MATLAB function tinv(0.975, 10).
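The interval arithmetic is easy to verify. This sketch takes the critical value 2.228 and the standard errors quoted in the text (𝑠𝑏̂0 = 7.17, 𝑠𝑏̂1 = 0.05, from the earlier significance-test problem) as given:

```python
# 95% confidence intervals for b0 and b1 (t = 2.228 at df = 10).
t = 2.228
b0_hat, s_b0 = 2.30, 7.17
b1_hat, s_b1 = 0.86, 0.05

ci_b0 = (b0_hat - t * s_b0, b0_hat + t * s_b0)
ci_b1 = (b1_hat - t * s_b1, b1_hat + t * s_b1)
print(tuple(round(v, 2) for v in ci_b0))  # (-13.67, 18.27)
print(tuple(round(v, 2) for v in ci_b1))  # (0.75, 0.97)
```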

Assignment:
C1.1 Find 𝑹² for the estimated consumption regression of the previous
problem using the equations (a) 𝑹² = ∑𝒚̂𝒊² / ∑𝒚𝒊² and (b) 𝑹² = 𝟏 − ∑𝒆𝒊² / ∑𝒚𝒊² .
Also find the results using MATLAB statements.

C1.2 Find 𝒓 for the estimated consumption regression in the previous
problem using (a) 𝒓 = √𝑹² , (b) 𝒓 = ∑𝒙𝒊 𝒚𝒊 / (√∑𝒙𝒊² √∑𝒚𝒊²) and
(c) 𝒓 = √(𝒃̂𝟏 ∑𝒙𝒊 𝒚𝒊 / ∑𝒚𝒊²) . Also find the results using MATLAB statements.



C1.3 Table (C1) gives the per capita income to the nearest $100 (𝑌) and
the percentage of the economy represented by agriculture (𝑋),
reported by the World Bank World Development Indicators for 1999
for 15 Latin American countries.
(a) Estimate the regression equation of 𝑌 on 𝑋.
(b) Test at the 5% level of significance for the statistical significance
of the parameters.
(c) Find the coefficient of determination.
(d) Use MATLAB statements to carry out all the computations in
(a), (b) and (c).
(e) Report the results obtained in parts (a), (b) and (c) in standard
summary form.

Table (C1)
𝐶𝑜𝑢𝑛𝑡𝑟𝑦 (1) (2) (3) (4) (5) (6) (7) (8)
𝑛 1 2 3 4 5 6 7 8
𝑌𝑖 76 10 44 47 23 19 13 19
𝑋𝑖 6 16 9 8 14 11 12 10

𝐶𝑜𝑢𝑛𝑡𝑟𝑦 (9) (10) (11) (12) (13) (14) (15)


𝑛 9 10 11 12 13 14 15
𝑌𝑖 8 44 4 31 24 59 37
𝑋𝑖 18 5 26 8 8 9 5

*Key: (1) Argentina; (2) Bolivia; (3) Brazil; (4) Chile; (5) Colombia; (6)
Dominican Republic; (7) Ecuador; (8) El Salvador; (9) Honduras; (10)
Mexico; (11) Nicaragua; (12) Panama; (13) Peru; (14) Uruguay; (15)
Venezuela.
Source: World Bank World Development Indicators.
C1.4 Draw a scatter diagram for the data in Table(C2) and determine by
inspection if there is an approximate linear relationship between 𝑌𝑖 ,
and 𝑋𝑖 .

C1.5 For the data in Table (C2), find the value of (a) 𝑏̂1 and (b) 𝑏̂0 .
(c) Write the equation for the estimated OLS regression line.

Table (C2)
Observations on variables Y and X
𝑛 𝑌𝑖 𝑋𝑖
1 20 2
2 28 3
3 40 5
4 45 4
5 37 3
6 52 5
7 54 7
8 43 6
9 65 7
10 56 8

C1.6 (a) On a set of axes, plot the data in Table (C2), plot the estimated
OLS regression line and show the residuals.
(b) Show algebraically that the regression line goes through the point
(𝑋̅, 𝑌̅).

C1.7 For the data in Table (C2), find (a) 𝑠², (b) 𝑠²𝑏̂0 and 𝑠𝑏̂0 , and
(c) 𝑠²𝑏̂1 and 𝑠𝑏̂1 .

Test at the 5% level of significance for (a) 𝑏0 and (b) 𝑏1 .

Construct the 95% confidence interval for (a) 𝑏0 and (b) 𝑏1 .
For the estimated OLS regression equation, find (a) 𝑅² and (b) 𝑟.
