2.simple Regression Analysis Chapter 6

Uploaded by

Zakir Hussain Dohat

CHAPTER 6 SIMPLE REGRESSION ANALYSIS 1/27

SIMPLE REGRESSION ANALYSIS

THE TWO-VARIABLE LINEAR MODEL
The two-variable linear model, or simple regression analysis, is used for
testing hypotheses about the relationship between a dependent variable 𝑌
and an independent or explanatory variable 𝑋 and for prediction.

Simple linear regression analysis usually begins by plotting the set of
XY values on a scatter diagram and determining by inspection whether there
exists an approximate linear relationship:

$$Y_i = b_0 + b_1 X_i \qquad (6.1)$$

Since the points are unlikely to fall precisely on the line, the exact linear
relationship in Eq. (6.1) must be modified to include a random disturbance,
error, or stochastic term, $u_i$:

$$Y_i = b_0 + b_1 X_i + u_i \qquad (6.2)$$

The error term is assumed to be

(1) normally distributed, with
(2) zero expected value or mean, and
(3) constant variance, and it is further assumed
(4) that the error terms are uncorrelated or unrelated to each other, and
(5) that the explanatory variable assumes fixed values in repeated sampling
(so that X and u are also uncorrelated).

State each of the five assumptions of the classical regression model (OLS)
and give an intuitive explanation of the meaning and need for each of them.

(1) The first assumption of the classical linear regression model (OLS) is that
the random error term u is normally distributed.

As a result, Y and the sampling distribution of the parameters of the
regression are also normally distributed, so that tests can be conducted
on the significance of the parameters.
(2) The second assumption is that the expected value of the error term, or its
mean, equals zero:

$$E(u_i) = 0$$

Because of this assumption, Eq. (6.1) gives the average value of Y.
Specifically, since X is assumed fixed, the value of Y in Eq. (6.2) varies
above and below its mean as u is greater or smaller than 0.

(3) The third assumption is that the variance of the error term is constant in
each period and for all values of X:

$$E(u_i^2) = \sigma_u^2$$

This assumption ensures that each observation is equally reliable, so that
estimates of the regression coefficients are efficient and tests of
hypotheses about them are not biased. These first three assumptions
about the error term can be summarized as

$$u \sim N(0, \sigma_u^2)$$

(4) The fourth assumption is that the value which the error term assumes in
one period is uncorrelated or unrelated to its value in any other period:

$$E(u_i u_j) = 0 \quad \text{for } i \neq j; \quad i, j = 1, 2, \ldots, n$$

This ensures that the average value of Y depends only on X and not on u,
and it is, once again, required in order to have efficient estimates of the
regression coefficients and unbiased tests of their significance.

(5) The fifth assumption is that the explanatory variable assumes fixed
values that can be obtained in repeated samples, so that the explanatory
variable is also uncorrelated with the error term:

$$E(X_i u_i) = 0$$

This assumption is made to simplify the analysis.

EXAMPLE 1.
Table-1 gives the bushels of corn per acre, Y, resulting from the use
of various amounts of fertilizer in pounds per acre, X, produced on a farm
in each of 10 years from 1971 to 1980. These are plotted in the scatter
diagram of Fig. 6-1. The relationship between X and Y in the figure is
approximately linear (i.e., the points would fall on or near a straight line).

Table-1 Corn Produced with Fertilizer Used

Year    n     Y_i    X_i
1971    1     40      6
1972    2     44     10
1973    3     46     12
1974    4     48     14
1975    5     52     16
1976    6     58     18
1977    7     60     22
1978    8     68     24
1979    9     74     26
1980   10     80     32

MATLAB implementation of the problem

INPUT COLUMN VECTOR Y

OUTPUT COLUMN VECTOR Y


INPUT COLUMN VECTOR X

OUTPUT COLUMN VECTOR X

SCATTER DIAGRAM OF THE DATA POINTS

MATLAB PLOT COMMAND
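
The original screenshots for these steps did not survive, so the following is only a minimal sketch of what the input and plot commands might look like. The variable names X and Y follow the text; the axis labels, title and marker style are assumptions made for illustration.

```matlab
% Table-1 data entered as column vectors (values copied from the table).
Y = [40; 44; 46; 48; 52; 58; 60; 68; 74; 80];   % corn, bushels per acre
X = [6; 10; 12; 14; 16; 18; 22; 24; 26; 32];    % fertilizer, pounds per acre

% Scatter diagram of the data points.
figure;
scatter(X, Y, 'filled');
xlabel('Fertilizer (pounds per acre), X');       % label text is illustrative
ylabel('Corn (bushels per acre), Y');
title('Scatter diagram of corn yield against fertilizer');
grid on;
```

Omitting the semicolon after a vector definition makes MATLAB echo the vector in the Command Window, which is presumably what the "OUTPUT COLUMN VECTOR" captions above referred to.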

THE ORDINARY LEAST SQUARES METHOD

The ordinary least-squares (OLS) method is a technique for fitting the
"best" straight line to the sample of XY observations.

It involves minimizing the sum of the squared (vertical) deviations of
points from the line:

$$\min \sum (Y_i - \hat{Y}_i)^2$$

where $Y_i$ refers to the actual observations and $\hat{Y}_i$ refers to the
corresponding fitted values, so that $Y_i - \hat{Y}_i = e_i$, the residual.

This gives the following two normal equations:

$$\sum Y_i = n\hat{b}_0 + \hat{b}_1 \sum X_i \qquad (1)$$
$$\sum X_i Y_i = \hat{b}_0 \sum X_i + \hat{b}_1 \sum X_i^2 \qquad (2)$$

where $n$ is the number of observations and $\hat{b}_0$ and $\hat{b}_1$ are
estimators of the true parameters $b_0$ and $b_1$.

Solving Eqs. (1) and (2) simultaneously, we get

$$\hat{b}_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2} \qquad (2A)$$

The value of $\hat{b}_0$ is then given by

$$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X} \qquad (2B)$$

It is often useful to use an equivalent formula for estimating $\hat{b}_1$:

$$\hat{b}_1 = \frac{\sum x_i y_i}{\sum x_i^2} \qquad (3)$$

where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$. The estimated
least-squares (OLS) regression equation is then

$$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i \qquad (4)$$

EXAMPLE 2. Table-2 shows the calculations to estimate the regression
equation for the corn-fertilizer data in Table-1, using Eq. (3).

Table-2 Corn Produced with Fertilizer Used: Calculations

 n    Y (corn)   X (fertilizer)    y_i    x_i    x_i y_i   x_i^2
 1      40            6            −17    −12      204      144
 2      44           10            −13     −8      104       64
 3      46           12            −11     −6       66       36
 4      48           14             −9     −4       36       16
 5      52           16             −5     −2       10        4
 6      58           18              1      0        0        0
 7      60           22              3      4       12       16
 8      68           24             11      6       66       36
 9      74           26             17      8      136       64
10      80           32             23     14      322      196

n = 10   ∑Y_i = 570   ∑X_i = 180   ∑y_i = 0   ∑x_i = 0   ∑x_i y_i = 956   ∑x_i^2 = 576
Ȳ = 57   X̄ = 18

$$\hat{b}_1 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{956}{576} \approx 1.66 \quad \text{(the slope of the estimated regression line)}$$

$$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X} = 57 - (1.66)(18) \approx 57 - 29.88 \approx 27.12 \quad \text{(the Y intercept)}$$

$$\hat{Y}_i = 27.12 + 1.66 X_i \quad \text{(the estimated regression equation)}$$

Thus, when $X_i = 0$, $\hat{Y} = 27.12 = \hat{b}_0$. When $X_i = 18 = \bar{X}$,
$\hat{Y} = 27.12 + 1.66(18) = 57 = \bar{Y}$.

As a result, the regression line passes through the point $(\bar{X}, \bar{Y})$.

MATLAB APPLICATION OF THE PROBLEM

Since vector X and vector Y are already defined in MATLAB, we now have
to calculate $\hat{b}_0$ and $\hat{b}_1$.

There are two ways to calculate $\hat{b}_1$, so we shall go through both
methods one by one.

First Method

We need to calculate $n$, $\sum X_i Y_i$, $\sum X_i$, $\sum Y_i$, $\sum X_i^2$, $\bar{X}$ and $\bar{Y}$.

Using equation (2A), we have

$$\hat{b}_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2}$$

Using equation (2B), we have

$$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$$
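
The First Method can be sketched in MATLAB as below. This is an illustration rather than the original script (which was a screenshot and is not recoverable); the intermediate names sumX, sumY, sumXY, sumX2, Xbar and Ybar are assumptions chosen to mirror the sums listed above.

```matlab
% Table-1 data (repeated here so the script runs on its own).
Y = [40; 44; 46; 48; 52; 58; 60; 68; 74; 80];
X = [6; 10; 12; 14; 16; 18; 22; 24; 26; 32];

n     = length(X);        % number of observations
sumX  = sum(X);           % sum of X_i
sumY  = sum(Y);           % sum of Y_i
sumXY = sum(X .* Y);      % sum of X_i * Y_i (element-wise product)
sumX2 = sum(X .^ 2);      % sum of X_i squared
Xbar  = mean(X);
Ybar  = mean(Y);

b1 = (n*sumXY - sumX*sumY) / (n*sumX2 - sumX^2);   % Eq. (2A)
b0 = Ybar - b1*Xbar;                               % Eq. (2B)

fprintf('b1 = %.4f, b0 = %.4f\n', b1, b0);
```

Because MATLAB carries the unrounded slope (about 1.6597) into Eq. (2B), the intercept comes out near 27.125 rather than the hand-calculated 27.12, which used the slope rounded to 1.66.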

Second Method

We need to calculate $x_i = X_i - \bar{X}$, $y_i = Y_i - \bar{Y}$, $\sum x_i y_i$ and $\sum x_i^2$.

Using equation (3), we have

$$\hat{b}_1 = \frac{\sum x_i y_i}{\sum x_i^2}$$

The value of $\hat{b}_1$ obtained by the second method can easily be checked
against the value obtained by the first method.

Note: Since $x_i$ and $y_i$ are deviations from the means of $X_i$ and $Y_i$
respectively, their sums will always be zero.
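
A possible MATLAB version of the Second Method is sketched here, again as an assumption about what the lost screenshot showed. It also checks the note above by summing the deviations; the names x, y, sum_xy and sum_x2 are illustrative.

```matlab
% Table-1 data (repeated here so the script runs on its own).
Y = [40; 44; 46; 48; 52; 58; 60; 68; 74; 80];
X = [6; 10; 12; 14; 16; 18; 22; 24; 26; 32];

x = X - mean(X);          % deviations of X from its mean
y = Y - mean(Y);          % deviations of Y from its mean

sum_xy = sum(x .* y);     % sum of x_i * y_i
sum_x2 = sum(x .^ 2);     % sum of x_i squared

b1 = sum_xy / sum_x2;            % Eq. (3)
b0 = mean(Y) - b1*mean(X);       % Eq. (2B)

% The deviations sum to zero, as stated in the note.
fprintf('sum(x) = %g, sum(y) = %g\n', sum(x), sum(y));
fprintf('b1 = %.4f, b0 = %.4f\n', b1, b0);
```

The two sums reproduce the column totals of Table-2 (956 and 576), so the slope agrees with the First Method.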
The statements above were executed one at a time at the prompt in the
Command Window.

Alternatively, all of the statements can be combined in a single script file
in the Editor window and executed with a single click.

Save the script file under an appropriate name (here, 'First_Method') and
press the Run button. The results are then displayed in the Command Window
and can be verified against the previous results.

For the Second Method, a similar script is written and executed. Its results
are displayed in a more readable format by using the fprintf() command.
Following is the graph of the scattered points and regression line.

MATLAB coding for plotting above graph:
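
The original plotting code was an image and is not available; one way to draw the scatter points together with the fitted line is sketched below. The colours, line width, labels and legend text are assumptions.

```matlab
% Table-1 data and the OLS estimates from Eq. (3) and Eq. (2B).
Y = [40; 44; 46; 48; 52; 58; 60; 68; 74; 80];
X = [6; 10; 12; 14; 16; 18; 22; 24; 26; 32];
x  = X - mean(X);
b1 = sum(x .* (Y - mean(Y))) / sum(x .^ 2);
b0 = mean(Y) - b1*mean(X);

Yhat = b0 + b1*X;         % fitted values on the regression line, Eq. (4)

figure;
scatter(X, Y, 'filled');                 % observed points
hold on;
plot(X, Yhat, 'r-', 'LineWidth', 1.5);   % estimated regression line
hold off;
xlabel('Fertilizer (pounds per acre), X');
ylabel('Corn (bushels per acre), Y');
legend('Observed data', 'OLS regression line', 'Location', 'northwest');

fprintf('Estimated regression: Yhat = %.2f + %.2f X\n', b0, b1);
```

Plotting Yhat against the same X values is enough here because a straight line is fully determined by any two of its points.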


TESTS OF SIGNIFICANCE OF PARAMETER ESTIMATES

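In order to test for the statistical significance of the parameter estimates
of the regression, the variances of $\hat{b}_0$ and $\hat{b}_1$ are required:

$$\operatorname{Var}(\hat{b}_0) = \sigma_u^2 \frac{\sum X_i^2}{n \sum x_i^2}$$

$$\operatorname{Var}(\hat{b}_1) = \sigma_u^2 \frac{1}{\sum x_i^2}$$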

Since $\sigma_u^2$ is unknown, the residual variance $s^2$ is used as an
(unbiased) estimate of $\sigma_u^2$:

$$s^2 = \hat{\sigma}_u^2 = \frac{\sum e_i^2}{n - k}$$

where $k$ represents the number of parameter estimates.

Unbiased estimates of the variances of $\hat{b}_0$ and $\hat{b}_1$ are then given by

$$s^2_{\hat{b}_0} = \frac{\sum e_i^2}{n - k} \cdot \frac{\sum X_i^2}{n \sum x_i^2}$$

$$s^2_{\hat{b}_1} = \frac{\sum e_i^2}{n - k} \cdot \frac{1}{\sum x_i^2}$$

so that $s_{\hat{b}_0}$ and $s_{\hat{b}_1}$ are the standard errors of the
estimates. Since $u_i$ is normally distributed, $Y_i$, and therefore
$\hat{b}_0$ and $\hat{b}_1$, are also normally distributed, so that we can use
the t distribution with $n - k$ degrees of freedom to test hypotheses about,
and construct confidence intervals for, $b_0$ and $b_1$.

EXAMPLE 3
Table 3 (an extension of Table 2) shows the calculations required to test
the statistical significance of 𝑏̂0 and 𝑏̂1 .

The values of $\hat{Y}_i$ in Table 3 are obtained by substituting the values
of $X_i$ into the estimated regression equation found in Example 2. (The
values of $y_i^2$ are obtained by squaring $y_i$ from Table 2.)

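The variances of $\hat{b}_0$ and $\hat{b}_1$ are as follows:

$$s^2_{\hat{b}_0} = \frac{\sum e_i^2}{n - k} \cdot \frac{\sum X_i^2}{n \sum x_i^2} \approx \frac{47.3056}{10 - 2} \cdot \frac{3816}{10(576)} \approx 3.92$$

$$s^2_{\hat{b}_1} = \frac{\sum e_i^2}{n - k} \cdot \frac{1}{\sum x_i^2} \approx \frac{47.3056}{(10 - 2)576} \approx 0.01$$

The standard errors are:

$$s_{\hat{b}_0} = \sqrt{3.92} \approx 1.98$$
$$s_{\hat{b}_1} = \sqrt{0.01} \approx 0.1$$

The calculated t values are:

$$t_0 = \frac{\hat{b}_0 - b_0}{s_{\hat{b}_0}} = \frac{27.12 - 0}{1.98} \approx 13.7$$

$$t_1 = \frac{\hat{b}_1 - b_1}{s_{\hat{b}_1}} = \frac{1.66 - 0}{0.1} \approx 16.6$$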
Table 3 Corn-Fertilizer Calculations to Test Significance of Parameters

Year   Y_i   X_i   Ŷ_i      e_i      e_i^2     X_i^2   x_i^2   y_i^2
  1    40     6    37.08    2.92     8.5264      36     144     289
  2    44    10    43.72    0.28     0.0784     100      64     169
  3    46    12    47.04   −1.04     1.0816     144      36     121
  4    48    14    50.36   −2.36     5.5696     196      16      81
  5    52    16    53.68   −1.68     2.8224     256       4      25
  6    58    18    57.00    1.00     1.0000     324       0       1
  7    60    22    63.64   −3.64    13.2496     484      16       9
  8    68    24    66.96    1.04     1.0816     576      36     121
  9    74    26    70.28    3.72    13.8384     676      64     289
 10    80    32    80.24   −0.24     0.0576    1024     196     529

n = 10   ∑e_i = 0   ∑e_i^2 = 47.3056   ∑X_i^2 = 3816   ∑x_i^2 = 576   ∑y_i^2 = 1634

MATLAB APPLICATION TO THE PROBLEM

We need to calculate
$n$, $\sum e_i$, $\sum e_i^2$, $\sum X_i^2$, $\sum x_i^2$, $\sum y_i^2$, $s^2_{\hat{b}_0}$, $s^2_{\hat{b}_1}$, $s_{\hat{b}_0}$, $s_{\hat{b}_1}$, $t_0$ and $t_1$.

Here $k = 2$, since two parameters ($\hat{b}_0$ and $\hat{b}_1$) are estimated.

Variances of 𝑏̂0 and 𝑏̂1

Standard Errors of 𝑏̂0 and 𝑏̂1

Estimation of 𝑡0 and 𝑡1
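
The three steps captioned above (variances, standard errors, t values) might be coded as in the following sketch. It is not the original script; the variable names s2, var_b0, var_b1, se_b0 and se_b1 are assumptions, and the formulas are those given earlier for $s^2$, $s^2_{\hat{b}_0}$ and $s^2_{\hat{b}_1}$.

```matlab
% Table-1 data and OLS estimates.
Y = [40; 44; 46; 48; 52; 58; 60; 68; 74; 80];
X = [6; 10; 12; 14; 16; 18; 22; 24; 26; 32];
n = length(X);
k = 2;                                   % number of estimated parameters
x  = X - mean(X);
b1 = sum(x .* (Y - mean(Y))) / sum(x .^ 2);
b0 = mean(Y) - b1*mean(X);

e  = Y - (b0 + b1*X);                    % residuals e_i
s2 = sum(e .^ 2) / (n - k);              % residual variance s^2

% Variances of b0 and b1
var_b0 = s2 * sum(X .^ 2) / (n * sum(x .^ 2));
var_b1 = s2 / sum(x .^ 2);

% Standard errors of b0 and b1
se_b0 = sqrt(var_b0);
se_b1 = sqrt(var_b1);

% t statistics for the null hypotheses b0 = 0 and b1 = 0
t0 = b0 / se_b0;
t1 = b1 / se_b1;

fprintf('s2 = %.4f, se_b0 = %.4f, se_b1 = %.4f\n', s2, se_b0, se_b1);
fprintf('t0 = %.2f, t1 = %.2f\n', t0, t1);
```

Working with unrounded intermediate values, $t_1$ comes out at roughly 16.4 rather than the hand-calculated 16.6, because the hand calculation rounds $s_{\hat{b}_1}$ to 0.1; the conclusion of the test is unchanged.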

Compare these t values with the t-distribution table, at $n - k$ degrees of
freedom, given in Appendix V at the end of the book. Here
$n - k = 10 - 2 = 8$ degrees of freedom at the 5% level of significance.

STUDENT’S T-DISTRIBUTION

df/ls 0.1 0.05 0.025 0.01 0.005

1 3.0777 6.3138 12.7062 31.8205 63.6567
2 1.8856 2.9200 4.3027 6.9646 9.9248
3 1.6377 2.3534 3.1824 4.5407 5.8409
4 1.5332 2.1318 2.7764 3.7469 4.6041
5 1.4759 2.0150 2.5706 3.3649 4.0321
6 1.4398 1.9432 2.4469 3.1427 3.7074
7 1.4149 1.8946 2.3646 2.9980 3.4995
8 1.3968 1.8595 2.3060 2.8965 3.3554
9 1.3830 1.8331 2.2622 2.8214 3.2498
10 1.3722 1.8125 2.2281 2.7638 3.1693
11 1.3634 1.7959 2.2010 2.7181 3.1058
12 1.3562 1.7823 2.1788 2.6810 3.0545
13 1.3502 1.7709 2.1604 2.6503 3.0123
14 1.3450 1.7613 2.1448 2.6245 2.9768
15 1.3406 1.7531 2.1314 2.6025 2.9467
16 1.3368 1.7459 2.1199 2.5835 2.9208
17 1.3334 1.7396 2.1098 2.5669 2.8982
18 1.3304 1.7341 2.1009 2.5524 2.8784
19 1.3277 1.7291 2.0930 2.5395 2.8609
20 1.3253 1.7247 2.0860 2.5280 2.8453
21 1.3232 1.7207 2.0796 2.5176 2.8314
22 1.3212 1.7171 2.0739 2.5083 2.8188
23 1.3195 1.7139 2.0687 2.4999 2.8073
24 1.3178 1.7109 2.0639 2.4922 2.7969
25 1.3163 1.7081 2.0595 2.4851 2.7874
26 1.3150 1.7056 2.0555 2.4786 2.7787
27 1.3137 1.7033 2.0518 2.4727 2.7707
28 1.3125 1.7011 2.0484 2.4671 2.7633
29 1.3114 1.6991 2.0452 2.4620 2.7564
30 1.3104 1.6973 2.0423 2.4573 2.7500
31 1.3095 1.6955 2.0395 2.4528 2.7440
32 1.3086 1.6939 2.0369 2.4487 2.7385
33 1.3077 1.6924 2.0345 2.4448 2.7333
34 1.3070 1.6909 2.0322 2.4411 2.7284
35 1.3062 1.6896 2.0301 2.4377 2.7238
It can be seen that the value of t at 8 degrees of freedom and the 5% level
of significance is 2.306, which is smaller than both $t_0$ and $t_1$, so both
parameters $\hat{b}_0$ and $\hat{b}_1$ are statistically significant.

The t-distribution value can also be found in MATLAB for a specified df and
level of significance, as follows.

In the above example, df is 8 and the level of significance is 5% over two
tails, which is half of that in each tail, i.e. 0.025, so the percentile is
1 − 0.025 = 0.975.

So, the MATLAB statement for finding the t value is:
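
The original statement was shown as an image. A likely candidate is the tinv function, which returns the inverse of Student's t cumulative distribution; note that tinv belongs to the Statistics and Machine Learning Toolbox, so this sketch assumes that toolbox is installed. The names df, alpha and t_crit are illustrative.

```matlab
df    = 8;        % degrees of freedom, n - k = 10 - 2
alpha = 0.05;     % two-tailed level of significance

% Percentile 1 - alpha/2 = 0.975 leaves 2.5% in the upper tail.
t_crit = tinv(1 - alpha/2, df);

fprintf('Critical t for df = %d at the %.0f%% level (two-tailed): %.4f\n', ...
        df, 100*alpha, t_crit);
```

The result should agree with the 0.025 column of the table above at df = 8, i.e. about 2.306.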

TEST OF GOODNESS OF FIT AND CORRELATION

The closer the observations fall to the regression line (i.e., the smaller the
residuals), the greater is the variation in Y "explained" by the estimated
regression equation. The total variation in Y is equal to the explained plus the
residual variation:

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$

Total variation in Y (total sum of squares, TSS)
  = explained variation in Y (regression sum of squares, RSS)
  + residual variation in Y (error sum of squares, ESS)

$$TSS = RSS + ESS$$

Dividing both sides by TSS gives:

$$1 = \frac{RSS}{TSS} + \frac{ESS}{TSS}$$

The coefficient of determination, or $R^2$, is then defined as the proportion of
the total variation in Y "explained" by the regression of Y on X:

$$R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}$$

$R^2$ can be calculated by

$$R^2 = \frac{\sum \hat{y}_i^2}{\sum y_i^2} = 1 - \frac{\sum e_i^2}{\sum y_i^2}$$

where $\sum \hat{y}_i^2 = \sum (\hat{Y}_i - \bar{Y})^2$.

$R^2$ ranges in value from 0 (when the estimated regression equation explains
none of the variation in Y) to 1 (when all points lie on the regression line).

The correlation coefficient $r$ is given by

$$r = \sqrt{R^2} = \sqrt{\hat{b}_1 \frac{\sum x_i y_i}{\sum y_i^2}}$$

$r$ ranges in value from −1 (for perfect negative linear correlation) to +1 (for
perfect positive linear correlation) and does not imply causality or
dependence. The sign of $r$ is the same as the sign of $\hat{b}_1$ (the slope
parameter).

Positive Correlation: Variables which have a direct relationship (a
positive correlation) increase together and decrease together.

Negative Correlation: In an inverse relationship (a negative
correlation), one variable increases while the other decreases.

EXAMPLE The coefficient of determination for the corn-fertilizer example
can be found from Table 3.

$$R^2 = 1 - \frac{\sum e_i^2}{\sum y_i^2} \approx 1 - \frac{47.31}{1634} \approx 1 - 0.0290 \approx 0.9710 \text{ or } 97.10\%$$

Thus the regression equation explains about 97% of the total variation in
corn output.

The remaining 3% is attributed to factors included in the error term.

Then $r = \sqrt{R^2} = \sqrt{0.9710} \approx 0.9854$, or 98.54%, and is positive
because $\hat{b}_1$ is positive.
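
As a hedged illustration, the same $R^2$ and $r$ can be computed in MATLAB as follows. The variable names are assumptions; corrcoef is a built-in MATLAB function and is used here only as an independent cross-check on $r$.

```matlab
% Table-1 data and OLS estimates.
Y = [40; 44; 46; 48; 52; 58; 60; 68; 74; 80];
X = [6; 10; 12; 14; 16; 18; 22; 24; 26; 32];
x  = X - mean(X);
y  = Y - mean(Y);
b1 = sum(x .* y) / sum(x .^ 2);
b0 = mean(Y) - b1*mean(X);
e  = Y - (b0 + b1*X);                      % residuals

R2 = 1 - sum(e .^ 2) / sum(y .^ 2);        % 1 - ESS/TSS
r  = sign(b1) * sqrt(R2);                  % r takes the sign of the slope

C = corrcoef(X, Y);                        % 2-by-2 correlation matrix
r_check = C(1, 2);                         % off-diagonal element is r

fprintf('R2 = %.4f, r = %.4f (corrcoef gives %.4f)\n', R2, r, r_check);
```

Multiplying by sign(b1) matters only when the slope is negative, since the square root alone would always return a positive value.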

Following Figure shows the total, the explained, and the residual variation of
Y.

The data in Table-4 reports the aggregate consumption (Y, in billions of U.S.
dollars) and disposable income (X, also in billions of U.S. dollars) for a
developing economy for the 12 years from 1988 to 1999.
Draw a scatter diagram for the data and determine by inspection if there
exists an approximate linear relationship between Y and X.

Table - 4 Aggregate Consumption (Y) and Disposable Income (X)

𝑌𝑒𝑎𝑟 𝑛 𝑌𝑖 𝑋𝑖
1988 1 102 114
1989 2 106 118
1990 3 108 126
1991 4 110 130
1992 5 122 136
1993 6 124 140
1994 7 128 148
1995 8 130 156
1996 9 142 160
1997 10 148 164
1998 11 150 170
1999 12 154 178

From above Fig. it can be seen that the relationship between consumption
expenditures 𝑌 and disposable income 𝑋 is approximately linear, as required
by the linear regression model.

State the general relationship between consumption Y and disposable
income X in (a) exact linear form and (b) stochastic form.

(a) The exact or deterministic general relationship between aggregate
consumption expenditures Y and aggregate disposable income X can
be written as:

$$Y_i = b_0 + b_1 X_i$$

where 𝑖 refers to each year in time-series analysis (as with the data in
Table ) or to each economic unit (such as a family) in cross-sectional
analysis.

In Eq. (6.1), $b_0$ and $b_1$ are unknown constants called parameters.

Parameter $b_0$ is the constant or Y intercept, while $b_1$ measures
$\Delta Y / \Delta X$, which, in the context of this consumption example,
refers to the marginal propensity to consume (MPC).

The specific linear relationship corresponding to the general linear
relationship in Eq. (6.1) is obtained by estimating the values of $b_0$ and
$b_1$ (represented by $\hat{b}_0$ and $\hat{b}_1$ and read as "b sub zero hat"
and "b sub one hat").

(b) The exact linear relationship in Eq. (6.1) can be made stochastic by
adding a random disturbance or error term, $u_i$, giving

$$Y_i = b_0 + b_1 X_i + u_i$$

(c) Most observed values of Y are not expected to fall precisely on a
straight line

(1) because even though consumption Y is postulated to depend
primarily on disposable income X, it also may depend on numerous
other omitted variables with only slight and irregular effect on Y (if
some of these other variables had instead a significant and regular
effect on Y, then they should be included as additional explanatory
variables, as in a multiple regression model);

(2) because of possible errors in measuring Y; and

(3) because of inherent random human behavior, which usually leads
to different values of Y for the same value of X under identical
circumstances.

THE ORDINARY LEAST-SQUARES METHOD

(a) What is meant by the ordinary least-squares (OLS) method of estimating
the "best" straight line that fits the sample of XY observations?

The OLS method gives the best straight line that fits the sample of XY
observations in the sense that it minimizes the sum of the squared
(vertical) deviations of each observed point on the graph from the
straight line.

(b) Why do we take vertical deviations?

We take vertical deviations because we are trying to explain or predict
movements in Y, which is measured along the vertical axis.

(c) Why do we not simply take the sum of the deviations without squaring them?

We cannot take the sum of the deviations of each of the observed points
from the OLS line because deviations that are equal in size but opposite
in sign cancel out, so the sum of the deviations equals 0.

(d) Why do we not take the sum of the absolute deviations?

Taking the sum of the absolute deviations avoids the problem of having
the sum of the deviations equal to 0. However, the sum of the squared
deviations is preferred so as to penalize larger deviations relatively more
than smaller deviations.
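Starting from Eq. (6.3), calling for the minimization of the sum of the squared
deviations or residuals, derive (a) normal Eq. (6.4) and (b) normal Eq. (6.5).

The quantity to be minimized is

$$\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{b}_0 - \hat{b}_1 X_i)^2$$

(a) Normal Eq. (6.4) is derived by minimizing $\sum e_i^2$ with respect to $\hat{b}_0$:

$$\frac{\partial \sum e_i^2}{\partial \hat{b}_0} = \frac{\partial \sum (Y_i - \hat{b}_0 - \hat{b}_1 X_i)^2}{\partial \hat{b}_0} = 0$$
$$2 \sum (Y_i - \hat{b}_0 - \hat{b}_1 X_i)(-1) = 0$$
$$\sum (Y_i - \hat{b}_0 - \hat{b}_1 X_i) = 0$$
$$\sum Y_i = n\hat{b}_0 + \hat{b}_1 \sum X_i \qquad (B1)$$

(b) Normal Eq. (6.5) is derived by minimizing $\sum e_i^2$ with respect to $\hat{b}_1$:

$$\frac{\partial \sum e_i^2}{\partial \hat{b}_1} = \frac{\partial \sum (Y_i - \hat{b}_0 - \hat{b}_1 X_i)^2}{\partial \hat{b}_1} = 0$$
$$2 \sum (Y_i - \hat{b}_0 - \hat{b}_1 X_i)(-X_i) = 0$$
$$\sum (Y_i X_i - \hat{b}_0 X_i - \hat{b}_1 X_i^2) = 0$$
$$\sum Y_i X_i = \hat{b}_0 \sum X_i + \hat{b}_1 \sum X_i^2 \qquad (B2)$$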

Solve (B1) and (B2) simultaneously to get the values of $\hat{b}_1$ and $\hat{b}_0$.

(a) Multiplying Eq. (?) by $n$ and Eq. (?) by $\sum X_i$, we get

$$n \sum X_i Y_i = \hat{b}_0 n \sum X_i + \hat{b}_1 n \sum X_i^2 \qquad (A1)$$
$$\sum X_i \sum Y_i = \hat{b}_0 n \sum X_i + \hat{b}_1 (\sum X_i)^2 \qquad (A2)$$

Subtracting Eq. (A2) from Eq. (A1), we get

$$n \sum X_i Y_i - \sum X_i \sum Y_i = \hat{b}_1 [n \sum X_i^2 - (\sum X_i)^2] \qquad (A3)$$

Solving Eq. (A3) for $\hat{b}_1$, we get

$$\hat{b}_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2} \qquad (A4)$$

(b) Equation (A5) is obtained by simply solving Eq. (B1) for $\hat{b}_0$:

$$\sum Y_i = n\hat{b}_0 + \hat{b}_1 \sum X_i \qquad (B1)$$

$$\hat{b}_0 = \frac{\sum Y_i}{n} - \hat{b}_1 \frac{\sum X_i}{n} = \bar{Y} - \hat{b}_1 \bar{X} \qquad (A5)$$
EXAMPLE:
(a) Find the regression equation for the consumption schedule in Table
6.4, using Eq. (6.6) to find $\hat{b}_1$.

Table 6.5 shows the calculations to find $\hat{b}_1$ and $\hat{b}_0$ for the
data in Table 6.4.

$$\hat{b}_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2} = \frac{(12)(225{,}124) - (1740)(1524)}{(12)(257{,}112) - (1740)^2} = \frac{2{,}701{,}488 - 2{,}651{,}760}{3{,}085{,}344 - 3{,}027{,}600} = \frac{49{,}728}{57{,}744} \approx 0.86$$

$$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X} \approx 127 - 0.86(145) \approx 127 - 124.70 \approx 2.30$$

Thus the equation for the estimated consumption regression is

$$\hat{Y}_i = 2.30 + 0.86 X_i$$

Table 6.5

 n    Y_i    X_i    X_i Y_i    X_i^2
 1    102    114    11,628    12,996
 2    106    118    12,508    13,924
 3    108    126    13,608    15,876
 4    110    130    14,300    16,900
 5    122    136    16,592    18,496
 6    124    140    17,360    19,600
 7    128    148    18,944    21,904
 8    130    156    20,280    24,336
 9    142    160    22,720    25,600
10    148    164    24,272    26,896
11    150    170    25,500    28,900
12    154    178    27,412    31,684

n = 12   ∑Y_i = 1524   ∑X_i = 1740   ∑X_i Y_i = 225,124   ∑X_i^2 = 257,112
Ȳ = 127   X̄ = 145
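
The same estimates can be reproduced in MATLAB; the sketch below is an illustration only, with assumed variable names. It applies the raw-sums formula used above and cross-checks the result with the built-in polyfit function, which fits a first-degree polynomial by least squares and returns the slope first.

```matlab
% Consumption (Y) and disposable income (X), 1988 to 1999.
Y = [102; 106; 108; 110; 122; 124; 128; 130; 142; 148; 150; 154];
X = [114; 118; 126; 130; 136; 140; 148; 156; 160; 164; 170; 178];
n = length(X);

% Slope from the raw sums, then intercept from the means.
b1 = (n*sum(X .* Y) - sum(X)*sum(Y)) / (n*sum(X .^ 2) - sum(X)^2);
b0 = mean(Y) - b1*mean(X);

% Cross-check: polyfit returns [slope, intercept] for degree 1.
p = polyfit(X, Y, 1);

fprintf('b1 = %.4f, b0 = %.4f\n', b1, b0);
fprintf('polyfit: slope = %.4f, intercept = %.4f\n', p(1), p(2));
```

With the unrounded slope (about 0.8612), the intercept is close to 2.13; the value 2.30 in the hand calculation results from rounding the slope to 0.86 before multiplying by 145.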

(b) Plot the regression line and show the deviations of each $Y_i$ from the
corresponding $\hat{Y}_i$.

To plot the regression equation, we need to define any two points on the
regression line.
For example, when $X_i = 114$, $\hat{Y}_i = 2.30 + 0.86(114) = 100.34$.
When $X_i = 178$, $\hat{Y}_i = 2.30 + 0.86(178) = 155.38$.

The consumption regression line is plotted in Fig. (?), where the positive
and negative residuals are also shown.
The regression line represents the best fit to the random sample of
consumption-disposable income observations in the sense that it
minimizes the sum of the squared (vertical) deviations from the line.
Assignment
(a) Starting with Eq. (?), derive the equation for $\hat{b}_1$ in deviation
form for the case where $\bar{X} = \bar{Y} = 0$.
(b) What is the value of $\hat{b}_0$ when $\bar{X} = \bar{Y} = 0$?

For the aggregate consumption-income observations in Table (?), find
(a) $s^2$, (b) $s^2_{\hat{b}_0}$ and $s_{\hat{b}_0}$, and (c) $s^2_{\hat{b}_1}$ and $s_{\hat{b}_1}$.
(d) Test at the 5% level of significance for $b_0$ and $b_1$.
Construct the 95% confidence interval for (a) $b_0$ and (b) $b_1$ in the above
problem.

(a) The 95% confidence interval for $b_0$ is given by

$$b_0 = \hat{b}_0 \pm 2.228 s_{\hat{b}_0} = 2.30 \pm 2.228(7.17) = 2.30 \pm 15.97$$

So $b_0$ is between −13.67 and 18.27 with 95% confidence. Note how wide
(and meaningless) the 95% confidence interval for $b_0$ is, reflecting the fact
that $\hat{b}_0$ is highly insignificant.

(b) The 95% confidence interval for $b_1$ is given by

$$b_1 = \hat{b}_1 \pm 2.228 s_{\hat{b}_1} = 0.86 \pm 2.228(0.05) = 0.86 \pm 0.11$$

So $b_1$ is between 0.75 and 0.97 (i.e., $0.75 < b_1 < 0.97$) with 95%
confidence.
Here 2.228 is the t value obtained from the t-distribution table at the 5%
level of significance and 12 − 2 = 10 degrees of freedom. This value can also
be obtained by using a MATLAB function:
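
The function in question is probably tinv from the Statistics and Machine Learning Toolbox; that is an assumption, since the original statement was an image. The sketch below obtains the critical value and then rebuilds both intervals from the rounded estimates and standard errors quoted in the text (2.30, 7.17, 0.86 and 0.05).

```matlab
n = 12;  k = 2;
t_crit = tinv(0.975, n - k);     % two-tailed 5% critical value, df = 10

% Estimates and standard errors as quoted in the text (rounded values).
b0 = 2.30;   se_b0 = 7.17;
b1 = 0.86;   se_b1 = 0.05;

ci_b0 = b0 + [-1 1] * t_crit * se_b0;    % [lower, upper] bound for b0
ci_b1 = b1 + [-1 1] * t_crit * se_b1;    % [lower, upper] bound for b1

fprintf('t = %.3f\n', t_crit);
fprintf('95%% CI for b0: [%.2f, %.2f]\n', ci_b0(1), ci_b0(2));
fprintf('95%% CI for b1: [%.2f, %.2f]\n', ci_b1(1), ci_b1(2));
```

Small differences in the last decimal place from the hand-calculated bounds can arise because tinv returns the critical value to more digits than the printed table.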

Assignment:
C1.1 Find $R^2$ for the estimated consumption regression of the previous
problem using the equations (a) $R^2 = \frac{\sum \hat{y}_i^2}{\sum y_i^2}$
and (b) $R^2 = 1 - \frac{\sum e_i^2}{\sum y_i^2}$. Also find
the results using MATLAB statements.

C1.2 Find $r$ for the estimated consumption regression in the previous
problem using (a) $r = \sqrt{R^2}$,
(b) $r = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2} \sqrt{\sum y_i^2}}$
and (c) $r = \sqrt{\hat{b}_1 \frac{\sum x_i y_i}{\sum y_i^2}}$. Also
find the results using MATLAB statements.

C1.3 Table (C1) gives the per capita income to the nearest $100 (𝑌) and
the percentage of the economy represented by agriculture (𝑋)
reported by the World Bank World Development Indicators for 1999
for 15 Latin American countries.
(a) Estimate the regression equation of Y on X.
(b) Test at the 5% level of significance for the statistical significance
of the parameters.
(c) Find the coefficient of determination.
(d) Use MATLAB statements to carry out all the computations in
(a), (b) and (c).
(e) Report the results obtained in parts (a), (b) and (c) in standard
summary form.

Table (C1)
𝐶𝑜𝑢𝑛𝑡𝑟𝑦 (1) (2) (3) (4) (5) (6) (7) (8)
𝑛 1 2 3 4 5 6 7 8
𝑌𝑖 76 10 44 47 23 19 13 19
𝑋𝑖 6 16 9 8 14 11 12 10

𝐶𝑜𝑢𝑛𝑡𝑟𝑦 (9) (10) (11) (12) (13) (14) (15)

𝑛 9 10 11 12 13 14 15
𝑌𝑖 8 44 4 31 24 59 37
𝑋𝑖 18 5 26 8 8 9 5

*Key: (1) Argentina; (2) Bolivia; (3) Brazil; (4) Chile; (5) Colombia; (6)
Dominican Republic; (7) Ecuador; (8) El Salvador; (9) Honduras; (10)
Mexico; (11) Nicaragua; (12) Panama; (13) Peru; (14) Uruguay; (15)
Venezuela.
Source: World Bank World Development Indicators.
C1.4 Draw a scatter diagram for the data in Table (C2) and determine by
inspection if there is an approximate linear relationship between $Y_i$
and $X_i$.

C1.5 For the data in Table (C2), find the value of (a) $\hat{b}_1$ and (b) $\hat{b}_0$.
(c) Write the equation for the estimated OLS regression line.
Table (C2)
Observations on variables Y and X
𝑛 𝑌𝑖 𝑋𝑖
1 20 2
2 28 3
3 40 5
4 45 4
5 37 3
6 52 5
7 54 7
8 43 6
9 65 7
10 56 8

C1.6 (a) On a set of axes, plot the data in Table (C2), plot the estimated
OLS regression line and show the residuals.
(b) Show algebraically that the regression line goes through the point
$(\bar{X}, \bar{Y})$.

C1.7 For the data in Table (C2), find (a) $s^2$, (b) $s^2_{\hat{b}_0}$ and
$s_{\hat{b}_0}$, and (c) $s^2_{\hat{b}_1}$ and $s_{\hat{b}_1}$.

Test at the 5% level of significance for (a) $b_0$ and (b) $b_1$.

Construct the 95% confidence interval for (a) $b_0$ and (b) $b_1$.
For the estimated OLS regression equation, find (a) $R^2$ and (b) $r$.
