Topic Three Sta450 (Part1)
The simple linear regression model is a basic regression model with only one independent (predictor) variable and one response variable:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where,
$Y_i$ = the value of the response variable in the $i$th trial
$\beta_0$ and $\beta_1$ = the parameters of the regression equation
$X_i$ = a known constant, the value of the predictor variable in the $i$th trial
$\varepsilon_i$ = the random error term, with mean $E(\varepsilon_i) = 0$ and variance $\sigma^2(\varepsilon_i) = \sigma^2$
Note:
1. The above regression model is said to be simple, linear in the parameters, and linear in
the independent variable.
“simple” – only one predictor variable
“linear in the parameters” – no parameter appears as an exponent or is multiplied or divided by another parameter.
“linear in the predictor variable” – the predictor variable appears only in the first power.
2. A model that is linear in the parameters and in the independent variable is also called a first-order model.
3. The above model is subject to the following conditions.
i. The relationship between X and Y must be linear.
ii. The error variable must be normally distributed
iii. The error variance must be constant
iv. The errors must be independent
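Generating data from the model makes the model and conditions i to iv concrete. The sketch below is a minimal illustration. Its parameter values (β0 = 3, β1 = 1.2, σ = 5), its X values and the helper name `simulate_slr` are all hypothetical and do not come from these notes. Each error is drawn independently from a normal distribution with mean 0 and the same standard deviation. The simulated relationship is therefore linear, and the errors are normal, have constant variance and are independent.

```python
import random


def simulate_slr(xs, beta0, beta1, sigma, seed=None):
    """Generate responses Y_i = beta0 + beta1 * X_i + eps_i.

    Each eps_i is drawn independently from a normal distribution with
    mean 0 and the same standard deviation sigma (conditions i-iv).
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


if __name__ == "__main__":
    xs = [25, 30, 35, 40, 45, 50]
    # Hypothetical parameter values, chosen only for illustration.
    ys = simulate_slr(xs, beta0=3.0, beta1=1.2, sigma=5.0, seed=1)
    for x, y in zip(xs, ys):
        print(x, round(y, 2))
```

If you set `sigma=0`, every point falls exactly on the line $\beta_0 + \beta_1 X$. The scatter around the line comes entirely from the error term.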
STA450: FUNDAMENTALS OF REGRESSION ANALYSIS
Example:
A study was undertaken to examine the number of machines sold (Y) per month by a sales representative and its relationship to the number of sales calls (X) made in a month, for a random sample of ten representatives. The data obtained are shown in the following table.
X (number of sales calls)     50  35  40  50  40  50  30  35  25  50
Y (number of machines sold)   70  50  45  60  55  65  40  50  30  60
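Correlation coefficient

$$r = \frac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{n}\right)}}$$

$$r = \frac{22200 - \frac{(405)(525)}{10}}{\sqrt{\left(17175 - \frac{(405)^2}{10}\right)\left(28875 - \frac{(525)^2}{10}\right)}} = \frac{937.5}{\sqrt{(772.5)(1312.5)}} = 0.931$$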
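A few lines of Python can check the hand calculation of $r$ using the same shortcut sums. This is a minimal sketch. It assumes the ten observations are the X and Y columns of the residual table under Method 1, which reproduce ΣX = 405, ΣY = 525, ΣXY = 22200, ΣX² = 17175 and ΣY² = 28875. The function name `corr_coef` is illustrative.

```python
import math

# Sales-call example: X = number of calls made, Y = number of machines sold (n = 10).
X = [50, 35, 40, 50, 40, 50, 30, 35, 25, 50]
Y = [70, 50, 45, 60, 55, 65, 40, 50, 30, 60]


def corr_coef(xs, ys):
    """Correlation r = SS_XY / sqrt(SS_XX * SS_YY), computed with the shortcut sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)


print(round(corr_coef(X, Y), 3))  # 0.931
```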
From the above calculation, we have established that the relationship between the number of calls made and the number of machines sold is linear. The correlation coefficient (r = 0.931) indicates that this relationship is strong and positive. The next step is to construct the model, so we need to estimate the parameters of the model.
The scatter diagram for the above data is shown below. When we fit a regression line, we
want the line of best fit. That is, we want a line that is as close as possible to the actual
data.
The double-sided arrows show the vertical distance between each actual data value and the regression line. This distance is the error (residual), $e_i$, and we want to make these errors as small as possible.
$$e_i = Y_i - \hat{Y}_i$$
where $Y_i$ is the actual (observed) value and $\hat{Y}_i$ is the value of Y estimated from the regression equation. Positive and negative errors cancel out, so we minimize the sum of the squared errors instead, by differentiating this function.
$$\sum e_i^2 = \sum \left(Y_i - \hat{Y}_i\right)^2 = \sum \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right)^2$$
The objective of the least squares method is to find the estimates of $\beta_0$ ($\hat{\beta}_0$) and of $\beta_1$ ($\hat{\beta}_1$) for which $\sum e_i^2$ is a minimum.
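The idea of "the line with the smallest $\sum e_i^2$" can be seen numerically before any calculus is used. The sketch below is a brute-force grid search, not the least squares formulas. It evaluates $\sum e_i^2$ for every candidate line on a coarse grid of intercepts and slopes and keeps the smallest. The grid ranges, the step sizes and the function names `sse` and `grid_search` are arbitrary illustrative choices. The data are assumed to be the X and Y columns of the residual table under Method 1.

```python
# Sales-call example data (n = 10).
X = [50, 35, 40, 50, 40, 50, 30, 35, 25, 50]
Y = [70, 50, 45, 60, 55, 65, 40, 50, 30, 60]


def sse(b0, b1, xs, ys):
    """Sum of squared errors for the candidate line Y-hat = b0 + b1 * X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def grid_search(xs, ys):
    """Try every (b0, b1) on a coarse grid and keep the pair with the smallest SSE."""
    best = None
    for j in range(41):          # b1 = 1.00, 1.01, ..., 1.40
        b1 = 1 + j / 100
        for i in range(141):     # b0 = 0.00, 0.05, ..., 7.00
            b0 = i / 20
            value = sse(b0, b1, xs, ys)
            if best is None or value < best[0]:
                best = (value, b0, b1)
    return best


best_sse, best_b0, best_b1 = grid_search(X, Y)
print(best_b0, best_b1, round(best_sse, 4))
```

On this grid, the best line is $b_0 = 3.5$, $b_1 = 1.21$. This is the grid point closest to the least squares estimates $\hat{\beta}_0 = 3.3492$, $\hat{\beta}_1 = 1.2136$ obtained below, and its SSE is only slightly above the minimum. Differentiating instead of searching gives the exact minimizing values in closed form.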
The formulas for the estimates are found by differentiating the above function with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$ and setting each derivative equal to zero. The formulas are as follows:
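$$\hat{\beta}_1 = \frac{SS_{XY}}{SS_{XX}} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$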
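The differentiation step is stated above but not shown. Here it is written out: setting both partial derivatives of $\sum e_i^2$ to zero gives the two normal equations, and solving them gives the formulas above.

```latex
\begin{aligned}
\frac{\partial \sum e_i^2}{\partial \hat{\beta}_0} &= -2\sum\left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right) = 0
  &&\Rightarrow\quad \sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_i \\
\frac{\partial \sum e_i^2}{\partial \hat{\beta}_1} &= -2\sum X_i\left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right) = 0
  &&\Rightarrow\quad \sum X_i Y_i = \hat{\beta}_0 \sum X_i + \hat{\beta}_1 \sum X_i^2 \\
\text{(first equation} \div n\text{)} &\quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \\
\text{(substitute into second)} &\quad \sum X_i Y_i - \bar{Y}\sum X_i = \hat{\beta}_1\left(\sum X_i^2 - \bar{X}\sum X_i\right) \\
&\quad \hat{\beta}_1 = \frac{\sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}} = \frac{SS_{XY}}{SS_{XX}}
\end{aligned}
```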
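To compute $\hat{\beta}_1$:

$$SS_{XY} = \sum XY - \frac{\sum X \sum Y}{n} = 22200 - \frac{(405)(525)}{10} = 937.5$$

$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n} = 17175 - \frac{(405)^2}{10} = 772.5$$

$$\hat{\beta}_1 = \frac{SS_{XY}}{SS_{XX}} = \frac{937.5}{772.5} = 1.2136$$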
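To compute $\hat{\beta}_0$:

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = \frac{\sum Y}{n} - \hat{\beta}_1 \frac{\sum X}{n} = \frac{525}{10} - 1.2136\left(\frac{405}{10}\right) = 3.3492$$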
Thus, the estimated regression function ($\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$) is:

$$\hat{Y}_i = 3.3492 + 1.2136 X_i$$
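The two formulas translate directly into a short function. This is a minimal sketch. The name `least_squares` is illustrative, and the data are assumed to be the X and Y columns of the residual table under Method 1. The function keeps full precision throughout. The notes compute $\hat{\beta}_0$ with the rounded slope 1.2136 and get 3.3492, while the unrounded slope gives about 3.3495. The fitted values in the residual table agree with the unrounded line.

```python
# Sales-call example data (n = 10).
X = [50, 35, 40, 50, 40, 50, 30, 35, 25, 50]
Y = [70, 50, 45, 60, 55, 65, 40, 50, 30, 60]


def least_squares(xs, ys):
    """Return (b0, b1) with b1 = SS_XY / SS_XX and b0 = Y-bar - b1 * X-bar."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares(X, Y)
print(f"Y-hat = {b0:.4f} + {b1:.4f} X")  # Y-hat = 3.3495 + 1.2136 X
```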
Method 1
The standard error of the estimate, $s_e$, measures how far the data deviate from the regression line. It is a measure of error.
$$e_i = Y_i - \hat{Y}_i$$
The sum of the $e_i$ is always zero, so we take the sum of the squared deviations, $\sum e_i^2$, instead. This is also known as the error sum of squares, SSE. Its value is calculated in the following table, using the example on the number of calls and the number of machines sold.
X  Y  $\hat{Y}$  $e_i = Y_i - \hat{Y}_i$  $e_i^2$
50 70 64.0291 5.9709 35.6513
35 50 45.8252 4.1748 17.4286
40 45 51.8932 -6.8932 47.5163
50 60 64.0291 -4.0291 16.2339
40 55 51.8932 3.1068 9.6522
50 65 64.0291 0.9709 0.9426
30 40 39.7573 0.2427 0.0589
35 50 45.8252 4.1748 17.4286
25 30 33.6893 -3.6893 13.6111
50 60 64.0291 -4.0291 16.2339
$$SSE = \sum e_i^2 = 174.7573$$

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{174.7573}{10-2}} = 4.6738$$
Mean of Y: $\bar{Y} = 52.5$
A good model should have $s_e$ less than 10% of the mean of Y.
10% of 52.5 = 5.25
Thus, our model can be said to be good, because $s_e$ is less than 10% of the mean of Y (4.6738 < 5.25).
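The whole of Method 1 can be reproduced in one short script: fitted values, residuals, SSE, $s_e$ and the 10%-of-mean check. This is a minimal sketch. It assumes the data in the X and Y columns of the table above and uses unrounded estimates. It writes the slope in the equivalent deviation form $\sum (X - \bar{X})(Y - \bar{Y}) / \sum (X - \bar{X})^2$, which equals $SS_{XY}/SS_{XX}$.

```python
import math

# Sales-call example data (n = 10).
X = [50, 35, 40, 50, 40, 50, 30, 35, 25, 50]
Y = [70, 50, 45, 60, 55, 65, 40, 50, 30, 60]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# Unrounded least squares estimates (deviation form of SS_XY / SS_XX).
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum((x - x_bar) ** 2 for x in X)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in X]                # Y-hat_i
residuals = [y - f for y, f in zip(Y, fitted)]   # e_i = Y_i - Y-hat_i
sse = sum(e ** 2 for e in residuals)             # SSE = sum of e_i^2
s_e = math.sqrt(sse / (n - 2))                   # standard error of the estimate

print(f"sum of residuals: {sum(residuals):.6f}")  # zero up to floating-point rounding
print(f"SSE = {sse:.4f}, s_e = {s_e:.4f}")
print("good model" if s_e < 0.10 * y_bar else "s_e is not below 10% of the mean of Y")
```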
Method 2
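$$SSE = SS_{YY} - \frac{SS_{XY}^2}{SS_{XX}} = SS_{YY} - \hat{\beta}_1 SS_{XY}$$

$$SS_{YY} = \sum Y^2 - \frac{(\sum Y)^2}{n} = 28875 - \frac{(525)^2}{10} = 1312.5$$

$$SS_{XY} = 937.5, \qquad \hat{\beta}_1 = 1.2136$$

$$SSE = SS_{YY} - \hat{\beta}_1 SS_{XY} = 1312.5 - 1.2136(937.5) = 174.75$$

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{174.75}{10-2}} = 4.6737$$

The small difference from Method 1 arises because the rounded value of $\hat{\beta}_1$ is used here.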
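The shortcut needs only the sums of squares, not the residual table. This is a minimal sketch; the helper name `sse_shortcut` is illustrative, and the data are assumed to be the same ten observations. With the unrounded slope, the shortcut gives the same SSE as Method 1. With the rounded slope 1.2136, it gives the 174.75 computed by hand.

```python
import math

# Sales-call example data (n = 10).
X = [50, 35, 40, 50, 40, 50, 30, 35, 25, 50]
Y = [70, 50, 45, 60, 55, 65, 40, 50, 30, 60]


def sse_shortcut(xs, ys):
    """SSE = SS_YY - SS_XY**2 / SS_XX, i.e. SS_YY - b1 * SS_XY with an unrounded b1."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return ss_yy - ss_xy ** 2 / ss_xx


n = len(X)
sse = sse_shortcut(X, Y)
s_e = math.sqrt(sse / (n - 2))

# The hand calculation above uses the rounded slope 1.2136 instead.
sse_rounded = 1312.5 - 1.2136 * 937.5

print(f"SSE (unrounded slope) = {sse:.4f}, s_e = {s_e:.4f}")
print(f"SSE (rounded slope)   = {sse_rounded:.2f}")
```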
Class Exercise:
The director of admissions of a small college administered a newly designed entrance test to 20 students selected at random from the new freshman class. The aim of the study was to determine whether a student's grade point average (GPA) at the end of the freshman year (Y) can be predicted from the entrance test score (X). The results of the study are as follows.
i 1 2 3 4 5 6 7 8 9 10
Xi 5.5 4.8 4.7 3.9 4.5 6.2 6.0 5.2 4.7 4.3
Yi 3.1 2.3 3.0 1.9 2.5 3.7 3.4 2.6 2.8 1.6
i 11 12 13 14 15 16 17 18 19 20
Xi 4.9 5.4 5.0 6.3 4.6 4.3 5.0 5.9 4.1 4.7
Yi 2.0 2.9 2.3 3.2 1.8 1.4 2.0 3.8 2.2 1.5
a. Obtain the least squares estimates of β0 and β1, and state the estimated regression function.
b. What is the point estimate of the change in the mean response when the entrance test
score increases by one point?
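After working the exercise by hand, you can check your answers with a short script. This is a minimal sketch that reuses the same formulas, $\hat{\beta}_1 = SS_{XY}/SS_{XX}$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$. The helper name `least_squares` is illustrative. For part b, the point estimate of the change in the mean response per one-point increase in the test score is $\hat{\beta}_1$.

```python
# Entrance test scores (X) and freshman-year GPA (Y) for the 20 students in the exercise.
X = [5.5, 4.8, 4.7, 3.9, 4.5, 6.2, 6.0, 5.2, 4.7, 4.3,
     4.9, 5.4, 5.0, 6.3, 4.6, 4.3, 5.0, 5.9, 4.1, 4.7]
Y = [3.1, 2.3, 3.0, 1.9, 2.5, 3.7, 3.4, 2.6, 2.8, 1.6,
     2.0, 2.9, 2.3, 3.2, 1.8, 1.4, 2.0, 3.8, 2.2, 1.5]


def least_squares(xs, ys):
    """Return (b0, b1) with b1 = SS_XY / SS_XX and b0 = Y-bar - b1 * X-bar."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares(X, Y)
print(f"(a) Y-hat = {b0:.4f} + {b1:.4f} X")
print(f"(b) change in mean GPA per one-point increase in test score: {b1:.4f}")
```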