Topic Three STA450 (Part 1)

This document discusses simple linear regression analysis. It defines the simple linear regression model as having one independent variable and one response variable related by a linear equation with an error term. Least squares estimation is used to estimate the model parameters by minimizing the sum of squared errors between the observed and predicted response values. The example shows calculating the correlation coefficient between number of sales calls and machines sold, plotting the scatter diagram, and using the least squares method to estimate the intercept and slope parameters of the linear regression model relating these two variables. The estimated regression equation is then interpreted to describe how machines sold change with the number of sales calls.

STA450: FUNDAMENTALS OF REGRESSION ANALYSIS

TOPIC THREE: SIMPLE LINEAR REGRESSION

3.1 General Concepts

Simple Linear Regression Model is a basic regression model where there is only one independent
variable and one response variable.

The simple linear regression model can be stated as follows:

Yi = β0 + β1 Xi + εi
where,
Yi = the value of the response variable in the ith trial
β0 and β1 = the parameters of the regression equation
Xi = a known constant, the value of the predictor variable in the ith trial
εi = the random error term, with mean E(εi) = 0 and variance σ²(εi) = σ²

Note:
1. The above regression model is said to be simple, linear in the parameters, and linear in
the independent variable.
"simple" – only one predictor variable.
"linear in the parameters" – no parameter appears as an exponent or is multiplied or
divided by another parameter.
"linear in the predictor variable" – the predictor variable appears only to the first power.
2. A model that is linear in the parameters and in the independent variable is also called a
first-order model.
3. The above model is subject to the following conditions.
i. The relationship between X and Y must be linear.
ii. The error variable must be normally distributed
iii. The error variance must be constant
iv. The errors must be independent
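These four conditions can be made concrete with a short simulation. The sketch below is not part of the original notes; the parameter values (β0 = 10, β1 = 1.2, σ = 3) are invented purely for illustration. It generates data from a first-order model whose errors are independent, normally distributed, and of constant variance:

```python
import random
import statistics

# Hypothetical illustration: simulate data satisfying the model conditions.
random.seed(1)  # arbitrary seed, for reproducibility only

beta0, beta1, sigma = 10.0, 1.2, 3.0                # invented parameter values
x = [random.uniform(25, 50) for _ in range(200)]    # predictor values (known constants)
eps = [random.gauss(0, sigma) for _ in range(200)]  # normal, constant variance, independent
y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, eps)]  # Yi = beta0 + beta1*Xi + ei

# The sample mean of the errors should be close to E(ei) = 0.
print(round(statistics.mean(eps), 2))
```

Fitting a regression to simulated data like this and recovering values near β0 and β1 is a useful sanity check of any least squares calculation.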

3.2 Least Square Estimation of the parameters

As we recall the model for simple linear regression:


Yi = β0 + β1 Xi + εi
• β0 is known as the intercept.
If x = 0 is within the range of the data, then β0 is the mean of the distribution of the
response Y when x = 0.
If x = 0 is not within the range, then β0 has no practical interpretation.
• β1 is known as the slope: the change in the mean of the distribution of the response
produced by a unit change in x.
• εi is the random error term.


Example:

A study was undertaken to examine the number of machines sold (Y) per month by a
sales representative and its relationship to the number of sales calls (X) made in a month
for a random sample of ten representatives. The data obtained is shown in the following
table.

Sales Representative Number of calls Number of machines sold


1 50 70
2 35 50
3 40 45
4 50 60
5 40 55
6 50 65
7 30 40
8 35 50
9 25 30
10 50 60

Correlation coefficient

∑x = 405   ∑y = 525   ∑xy = 22200
∑x² = 17175   ∑y² = 28875   n = 10

r = SSXY / √(SSXX · SSYY)

where

SSXY = ∑xy - (∑x)(∑y)/n = 22200 - (405)(525)/10 = 937.5
SSXX = ∑x² - (∑x)²/n = 17175 - (405)²/10 = 772.5
SSYY = ∑y² - (∑y)²/n = 28875 - (525)²/10 = 1312.5

r = 937.5 / √(772.5 × 1312.5) = 0.931

From the above calculation, we have established that the relationship between the number
of calls made and the number of machines sold is linear, and the correlation coefficient
indicates that this relationship is strong. The next step is to construct the model. Hence,
we need to estimate the parameters of the model.
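As a quick check, the value r = 0.931 can be reproduced from the raw data in the table. This sketch is ours, not part of the notes:

```python
import math

# Raw data from the table: number of calls (x) and machines sold (y).
x = [50, 35, 40, 50, 40, 50, 30, 35, 25, 50]
y = [70, 50, 45, 60, 55, 65, 40, 50, 30, 60]
n = len(x)

# Corrected sums of squares and cross-products.
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n  # 937.5
ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n                 # 772.5
ss_yy = sum(b * b for b in y) - sum(y) ** 2 / n                 # 1312.5

r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(round(r, 3))  # 0.931
```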


The scatter diagram for the above data is shown below. When we fit a regression line, we
want the line of best fit. That is, we want a line that is as close as possible to the actual
data.

Best fitted regression line (estimated regression line):

Ŷi = β̂0 + β̂1 X   or   Ŷi = b0 + b1 X

The double-sided arrows in the diagram show the distance between the actual data values
and the regression line, which is the error, ei. We want to minimize this error.

ei = Yi - Ŷi

where Yi is the actual or observed value and Ŷi is the estimated value of Y from the
regression equation. Since we want to minimize the errors, we take the sum of the squared
errors and then differentiate the function:

∑ei² = ∑(Yi - Ŷi)² = ∑(Yi - β̂0 - β̂1 Xi)²

The objective of the least squares method is to find the estimate β̂0 of β0 and the
estimate β̂1 of β1 for which ∑ei² is minimum.
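The "minimum" claim can be illustrated numerically: the least squares estimates (computed here with the closed-form expressions derived next in the notes) give a smaller ∑ei² than any nearby perturbed line. The helper name `sse_of` is our own:

```python
# Illustrating that the least squares line minimizes the sum of squared errors.
x = [50, 35, 40, 50, 40, 50, 30, 35, 25, 50]
y = [70, 50, 45, 60, 55, 65, 40, 50, 30, 60]

def sse_of(b0, b1):
    """Sum of squared errors for the line y = b0 + b1*x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

b1_hat = 937.5 / 772.5           # slope from the closed-form solution
b0_hat = 52.5 - b1_hat * 40.5    # intercept: mean(y) - b1 * mean(x)

best = sse_of(b0_hat, b1_hat)
# Perturbing either coefficient can only increase the objective.
print(best < sse_of(b0_hat + 0.5, b1_hat))   # True
print(best < sse_of(b0_hat, b1_hat + 0.05))  # True
```

Because the objective is a quadratic in (b0, b1) with a unique minimum, any perturbation strictly increases ∑ei².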


The formulas for the parameters of the regression line are found by differentiating the
above function and equating the derivatives to zero. The formulas are as follows:

β̂1 = SSXY / SSXX = (∑xy - (∑x)(∑y)/n) / (∑x² - (∑x)²/n)

β̂0 = Ȳ - β̂1 X̄

Back to the example:

∑x = 405   ∑y = 525   ∑xy = 22200   ∑x² = 17175   ∑y² = 28875   n = 10

To compute β̂1:

β̂1 = SSXY / SSXX = (22200 - (405)(525)/10) / (17175 - (405)²/10) = 937.5 / 772.5 = 1.2136

To compute β̂0:

β̂0 = Ȳ - β̂1 X̄ = ∑y/n - β̂1(∑x/n) = 525/10 - 1.2136(405/10) = 52.5 - 1.2136(40.5) = 3.3492

Thus, the estimated regression function (Ŷi = β̂0 + β̂1 X) is:

Ŷi = 3.3492 + 1.2136X

Interpretation of the regression function:


For every one-unit increase in the number of sales calls, the number of machines sold
increases by 1.2136 units, on average.
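The two estimates can be reproduced from the summary statistics alone. A small sketch (variable names are ours); note that carrying the slope at full precision gives an intercept of about 3.3495, while the notes round the slope to 1.2136 first and obtain 3.3492:

```python
# Recomputing the least squares estimates from the summary statistics.
n = 10
sum_x, sum_y, sum_xy, sum_x2 = 405, 525, 22200, 17175

ss_xy = sum_xy - sum_x * sum_y / n   # 937.5
ss_xx = sum_x2 - sum_x ** 2 / n      # 772.5

b1 = ss_xy / ss_xx                   # slope estimate
b0 = sum_y / n - b1 * sum_x / n      # intercept estimate

print(round(b1, 4))  # 1.2136
print(round(b0, 4))  # 3.3495 (3.3492 in the notes, from rounding b1 first)
```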


The standard error of the estimate

This measures the deviation of the observations from the regression line. If all the
points on the scatter diagram fell on the regression line, then se would be zero, giving
us a perfect forecast; this never occurs in reality. The standard error of the estimate is
actually a measure of the variability of the error terms. The smaller the value of se, the
better the model. We can compare the value of se with the mean of the dependent
variable: if se is less than 10% of the mean, the model can be considered good.
So how do we measure se? There are two ways of getting this value.

Method 1
As stated above the standard error of the estimate measures the deviation of the data
from the regression line. This is a measure of error.

ei = Yi - Ŷi

The sum of the ei will always be zero. Hence we take the sum of the squared deviations,
∑ei². This is also known as the sum of squares error, SSE. This value is calculated
as shown in the following table (refer to the example of number of calls and number of
machines sold).

X     Y     Ŷ          ei = Yi - Ŷi     ei²
50 70 64.0291 5.9709 35.6513
35 50 45.8252 4.1748 17.4286
40 45 51.8932 -6.8932 47.5163
50 60 64.0291 -4.0291 16.2339
40 55 51.8932 3.1068 9.6522
50 65 64.0291 0.9709 0.9426
30 40 39.7573 0.2427 0.0589
35 50 45.8252 4.1748 17.4286
25 30 33.6893 -3.6893 13.6111
50 60 64.0291 -4.0291 16.2339
∑ei² = 174.7573

SSE = ∑ei² = 174.7573

se = √(SSE/(n - 2)) = √(174.7573/(10 - 2)) = 4.6738

Mean of Y = 52.5
A good model should have se less than 10% of the mean of Y.
10% of 52.5 = 5.25
Thus, our model can be said to be good, since se is less than 10% of the mean of Y (4.6738 < 5.25).
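Method 1 can be verified in a few lines by recomputing the fitted values, residuals, SSE, and se from the raw data. Using unrounded coefficients reproduces SSE = 174.7573:

```python
import math

# Method 1 check: residuals from the fitted line, SSE, and se.
x = [50, 35, 40, 50, 40, 50, 30, 35, 25, 50]
y = [70, 50, 45, 60, 55, 65, 40, 50, 30, 60]
n = len(x)

b1 = 937.5 / 772.5                 # unrounded slope
b0 = sum(y) / n - b1 * sum(x) / n  # unrounded intercept

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e * e for e in residuals)

se = math.sqrt(sse / (n - 2))
print(round(sse, 4))  # 174.7573
print(round(se, 4))   # 4.6738
```

The residuals also sum to zero (up to floating-point error), as claimed above.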


Method 2

We can also work out SSE by using the formula method.

SSE = SSYY - SSXY²/SSXX = SSYY - β̂1 SSXY

SSYY = ∑y² - (∑y)²/n = 28875 - (525)²/10 = 1312.5

SSXY = 937.5

β̂1 = 1.2136

SSE = SSYY - β̂1 SSXY = 1312.5 - 1.2136(937.5) = 174.75

se = √(SSE/(n - 2)) = √(174.75/(10 - 2)) = 4.6738

Class Exercise:
The director of admissions of a small college administered a newly designed entrance test
to 20 students selected at random from the new freshman class, in a study to determine
whether a student's grade point average (GPA) at the end of the freshman year (Y) can be
predicted from the entrance test score (X). The results of the study are as follows.
i 1 2 3 4 5 6 7 8 9 10
Xi 5.5 4.8 4.7 3.9 4.5 6.2 6.0 5.2 4.7 4.3
Yi 3.1 2.3 3.0 1.9 2.5 3.7 3.4 2.6 2.8 1.6

i 11 12 13 14 15 16 17 18 19 20
Xi 4.9 5.4 5.0 6.3 4.6 4.3 5.0 5.9 4.1 4.7
Yi 2.0 2.9 2.3 3.2 1.8 1.4 2.0 3.8 2.2 1.5

a. Obtain the least square estimates of β0 and β1, and state the estimated regression
function.
b. What is the point estimate of the change in the mean response when the entrance test
score increases by one point?
