Cea Ece069 Sas-17-1
1) Introduction (2 mins)
Simple linear regression is a statistical method for obtaining a formula to predict the scores on one
variable from the scores on a second variable. The variable we are predicting is called the criterion
variable and is referred to as Y. The variable we are basing our predictions on is called the predictor
variable and is referred to as X. When there is only one predictor variable, the prediction method is
called simple regression.
In simple linear regression, the predictions of Y when plotted as a function of X form a straight line.
Linear regression consists of finding the best-fitting straight line through the points. The best-fitting line
is called a regression line.
B. MAIN LESSON
The residual error is $\varepsilon_i = Y_i - \hat{Y}_i$, where $Y_i$ is the observed value and $\hat{Y}_i$ is the
predicted value. The error term accounts for the variability in y that cannot be explained by the linear
relationship between x and y. If ε were not present, knowing x would provide enough information to
determine the value of y exactly.
$\beta_0$ (the intercept of the regression line) and $\beta_1$ (the coefficient of $X_i$, or the slope of the
regression line) are estimated by minimizing the sum of the squares of the residual errors. This
procedure is known as the Method of Least Squares.
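$$\text{minimize } \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2$$

$$\beta_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
Equation 1

$$\beta_0 = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i\right)$$
Equation 2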
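For readers who want to see where Equations 1 and 2 come from, the following is a brief sketch of the standard least-squares derivation (it is not part of the original lesson). Write the sum of squared residuals as a function $S$ of the two coefficients, set both partial derivatives to zero, and solve the resulting pair of equations:

```latex
\begin{align*}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2 \\
\frac{\partial S}{\partial \beta_0} = -2\sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right) = 0
  &\;\Longrightarrow\; \sum_{i=1}^{n} y_i = n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i \\
\frac{\partial S}{\partial \beta_1} = -2\sum_{i=1}^{n} x_i\left(y_i - \beta_0 - \beta_1 x_i\right) = 0
  &\;\Longrightarrow\; \sum_{i=1}^{n} x_i y_i = \beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2
\end{align*}
```

Solving the first condition for $\beta_0$ gives the intercept formula; substituting that result into the second condition and solving for $\beta_1$ gives the slope formula.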
We then substitute the values of $\beta_0$ and $\beta_1$ into the equation below to obtain the regression line
equation.
$$Y = \beta_0 + \beta_1 X$$
Equation 3
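The slope and intercept formulas translate directly into a few lines of code. The following is a minimal Python sketch, not part of the original lesson; the function name `fit_line` and the sample data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope (Equation 1 in the lesson)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept (Equation 2 in the lesson)
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


# Made-up data lying exactly on y = 1 + 2x (not taken from the lesson).
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```

Because the sample points lie exactly on a line, the fitted coefficients recover that line; with real data the line will only approximate the points.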
This model assumes that the relationship between the independent and dependent variables is linear;
that is, the line of best fit through the data points is a straight line rather than a curve.
Correlation Coefficient, r
One of the most commonly used correlation coefficients is Pearson's correlation coefficient, r.
The correlation coefficient, r, measures the strength of the linear relationship between the response
variable and the explanatory variable.
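$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
Equation 4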
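As an illustration of the correlation formula, here is a short Python sketch (not from the original lesson). The function name `pearson_r` and the two small data sets are assumptions chosen so that the results are easy to check by hand.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from the raw sums (Equation 4)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Made-up data: a perfectly increasing line and a perfectly decreasing line.
r_up = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
r_down = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])
print(r_up, r_down)  # 1.0 -1.0
```

The sign of r follows the sign of the slope, and its magnitude reaches 1 only when every point lies exactly on a straight line.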
Coefficient of Determination, $r^2$
The square of the correlation coefficient.
It is the proportion of variation in the response variable explained by the regression model.
The most common interpretation of the coefficient of determination is how well the regression model fits
the observed data. For example, a coefficient of determination of 60% means that 60% of the variation in
the response variable is explained by the regression model. Generally, a higher coefficient indicates a
better fit for the model.
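To make the "proportion of variation explained" reading concrete, the sketch below (an illustration, not part of the original lesson) computes the coefficient of determination as 1 − SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares about the mean of y. For a simple linear regression with an intercept, this equals the square of r. The function name `r_squared` and the data are assumptions.

```python
def r_squared(xs, ys):
    """Proportion of the variation in ys explained by the least-squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Least-squares slope and intercept
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = (sum_y - b1 * sum_x) / n

    mean_y = sum_y / n
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - mean_y) ** 2 for y in ys)                     # total
    return 1 - sse / sst


# Made-up data (not from the lesson) for which the line explains 60% of the variation.
value = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(value, 2))  # 0.6
```

Here SST is 6 and SSE is 2.4, so 60% of the variation in y is accounted for by the fitted line and 40% is left in the residuals.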
Example 1. A study was conducted on the effect of ambient temperature, x, on the electric power
consumed, y, by an industrial plant. Other factors were held constant. Below are the data collected from
the experiment. Find the equation of the regression line and estimate the electric power consumption
when x = 70 °F.
Trial   y (BTU)   x (°F)
1       250       27
2       285       45
3       320       72
4       295       58
5       265       31
6       298       60
7       267       31
8       321       74
From this table, we have $\sum y_i = 2{,}301$; $\sum x_i = 398$; $\sum x_i y_i = 117{,}851$; $\sum x_i^2 = 22{,}300$; and
$\sum y_i^2 = 666{,}509$. We then substitute these values into Equation 1, and then into Equation 2, to solve
for $\beta_1$ and $\beta_0$.
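$$\beta_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{8(117{,}851) - 398(2{,}301)}{8(22{,}300) - 398^2} = 1.35$$

$$\beta_0 = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i\right) = \frac{1}{8}\left[2{,}301 - 1.35(398)\right] = 220.5$$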
Substituting the values of $\beta_0$ and $\beta_1$ into Equation 3, the regression line equation is
y = 220.5 + 1.35 x.
To predict the power consumption at x = 70 °F, we substitute this value into the regression line:
y = 220.5 + 1.35 (70) = 315 BTU.
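The arithmetic in Example 1 can be checked numerically. The following Python sketch (an illustration, not part of the original lesson) recomputes the sums, the coefficients, and the prediction from the table. The variable names are assumptions. Note that the lesson rounds the slope to 1.35 before computing the intercept, so the unrounded intercept here comes out near 220.4 rather than 220.5; the prediction still rounds to 315 BTU.

```python
# Data from the Example 1 table
x = [27, 45, 72, 58, 31, 60, 31, 74]            # ambient temperature, deg F
y = [250, 285, 320, 295, 265, 298, 267, 321]    # power consumed, BTU

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Slope and intercept from the least-squares formulas, without intermediate rounding
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = (sum_y - b1 * sum_x) / n

prediction = b0 + b1 * 70  # estimated consumption at 70 deg F

print(sum_x, sum_y, sum_xy, sum_x2)  # 398 2301 117851 22300
print(round(b1, 2))                  # 1.35
print(round(prediction))             # 315
```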
The value of r = 0.99 indicates a very strong positive linear relationship between electric power
consumption and ambient temperature: power consumption increases as ambient temperature
increases. Furthermore, the coefficient of determination of 0.98 ($r^2 = 0.99^2$) indicates that about
98% of the variation in electric power consumption is explained by the regression line.
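The value of r is stated without its substitution. As a sketch of that step, the correlation formula (Equation 4) can be evaluated with the sums from the Example 1 table, together with $\sum y_i^2 = 666{,}509$, obtained here by squaring and summing the y column:

```latex
r = \frac{8(117{,}851) - 398(2{,}301)}
         {\sqrt{\left[8(22{,}300) - 398^2\right]\left[8(666{,}509) - 2{,}301^2\right]}}
  = \frac{27{,}010}{\sqrt{(19{,}996)(37{,}471)}}
  \approx 0.99
```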
2) Activity 3: Skill-building Activities (with answer key) (18 mins + 2 mins checking)
1. Given below is a data set on y and x. Let y be the response variable and x be the predictor variable.
Find the equation of the regression line and the value of the correlation coefficient, r. Interpret
your result.
x y
0 2
1 3
2 5
3 4
4 6
2. If $r^2$ = 0.99, how confident are you in using the regression line to estimate the response variable given
the predictor variable?
a. not confident
b. very confident
c. the relationship is too weak to predict
d. the relationship cannot be predicted
3. If the correlation coefficient is 0.90, the percentage of variation in the response variable explained by
the variation in the predictor variable is . . .
a. 0.90 % b. 90% c. 81% d. 0.81%
5. Larger values of $r^2$ imply that the observations are more closely grouped about the . . .
a. average value of the independent variables.
b. average value of the dependent variable
c. least squares line.
d. none of the above.
C. LESSON WRAP-UP
1) Activity 6: Thinking about Learning (5 mins)
You are done with the session! Let's track your progress.
Period 1 Period 2 Period 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Form groups of three. Search for a problem (with given data points) related to your profession
that uses regression analysis. Solve for the regression line and the correlation coefficient, then interpret
your result.
KEY TO CORRECTION
Activity #3
Extending the columns of the preceding table.
       x     y     xy    x²    y²
       0     2     0     0     4
       1     3     3     1     9
       2     5     10    4     25
       3     4     12    9     16
       4     6     24    16    36
SUM    10    20    49    30    90
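$$\beta_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{5(49) - 10(20)}{5(30) - 10^2} = 0.9$$

$$\beta_0 = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i\right) = \frac{1}{5}\left[20 - 0.9(10)\right] = 2.20$$

The regression line equation is y = 2.20 + 0.9 x.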
The value of r = 0.90 indicates a very strong positive linear relationship between the y and the x
variables. Furthermore, the coefficient of determination of 0.81 ($r^2 = 0.90^2$) indicates that 81% of the
variation in y is explained by the regression line.
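The key for Activity #3 can be verified with a short script. This is a minimal Python sketch, not part of the original answer key; the variable names are assumptions. It recomputes the slope, intercept, r, and r squared directly from the five data points.

```python
import math

# Activity #3 data
x = [0, 1, 2, 3, 4]
y = [2, 3, 5, 4, 6]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Least-squares slope and intercept
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = (sum_y - b1 * sum_x) / n

# Pearson's correlation coefficient
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)                 # 10 20 49 30 90
print(round(b1, 2), round(b0, 2), round(r, 2), round(r * r, 2))  # 0.9 2.2 0.9 0.81
```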