Lecture 2
Lesson 1
intercept of a Regression Line
The regression line is also called the line of best fit. It lets us interpret trends in the data and
make predictions based on that data; prediction is discussed further in the next lesson.
Again, note that before doing a regression you first need to confirm the following
assumptions:
a. There exists a relationship between the variables; and
b. The relationship has been tested and found to be significant.
Both conditions must be met first; otherwise, a regression analysis would be pointless.
A scatterplot is one way of illustrating a line of best fit. The figure below shows a scatterplot
of data on two variables. Notice that several lines can be drawn on the graph near the points. Of
these, the line of best fit is the one for which the sum of the squares of the vertical distances
from each point to the line is at a minimum.
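Written as a formula, the best-fit condition above might look like the following. This is a sketch in standard notation; the symbols n (the number of points), (x_i, y_i) (the i-th data point), and SSE (the sum of squared vertical distances) are introduced here for illustration and do not appear in the original text.

```latex
\[
\mathrm{SSE}(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2
\]
```

The line of best fit is the choice of a and b that makes this sum as small as possible.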
Example:
Given the data below, find the equation of the regression line and provide an interpretation of
the results.
Student No. of Study Hours (𝑥) Final Grade in Math (𝑦)
A 2 79
B 3 83
C 5 85
D 9 88
E 11 89
F 15 93
NORHAN A. SARIP 1
Solution
Before solving for the equation of the regression line, we first need to compute the necessary
summations. A completed table like the one shown below is of great help.
Student No. of Study Hours (𝑥) Final Grade in Math (𝑦) 𝑥𝑦 𝑥²
A 2 79 158 4
B 3 83 249 9
C 5 85 425 25
D 9 88 792 81
E 11 89 979 121
F 15 93 1395 225
Total 45 517 3998 465
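The computation that turns these totals into the slope and y-intercept is not shown in this copy of the lesson. As a sketch, assuming the usual least-squares formulas for a line 𝑦′ = 𝑎 + 𝑏𝑥 (the formulas themselves are an assumption, although they reproduce the values reported in this lesson), the totals in the last row of the table, with n = 6 students, can be substituted as follows:

```latex
\[
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}
  = \frac{6(3998) - (45)(517)}{6(465) - (45)^2}
  = \frac{723}{765} \approx 0.945
\]
\[
a = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - \left(\sum x\right)^2}
  = \frac{(517)(465) - (45)(3998)}{6(465) - (45)^2}
  = \frac{60495}{765} \approx 79.078
\]
```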
Hence, the equation of the regression line 𝑦′ = 𝑎 + 𝑏𝑥 is 𝑦′ = 79.078 + 0.945𝑥, where the slope is 0.945
and the y-intercept is 79.078. The y-intercept is the value of 𝑦′ when 𝑥 = 0; that is, it is the value
at which the line intersects the y-axis.
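To check the arithmetic, the summations and the resulting equation can be reproduced with a few lines of Python. This is a minimal sketch, not the lesson's own procedure: the variable names are illustrative, and the formulas in the comments are the standard least-squares formulas, assumed here because the lesson reports only the final values.

```python
# Data from the example: study hours (x) and final grades in math (y).
hours = [2, 3, 5, 9, 11, 15]
grades = [79, 83, 85, 88, 89, 93]

# The summations shown in the last row of the completed table.
n = len(hours)
sum_x = sum(hours)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(hours, grades))
sum_x2 = sum(x * x for x in hours)

# Assumed standard least-squares formulas:
#   b = (n*sum_xy - sum_x*sum_y) / (n*sum_x2 - sum_x**2)
#   a = (sum_y*sum_x2 - sum_x*sum_xy) / (n*sum_x2 - sum_x**2)
denominator = n * sum_x2 - sum_x ** 2
slope = (n * sum_xy - sum_x * sum_y) / denominator
intercept = (sum_y * sum_x2 - sum_x * sum_xy) / denominator

print(sum_x, sum_y, sum_xy, sum_x2)            # 45 517 3998 465
print(f"y' = {intercept:.3f} + {slope:.3f}x")  # y' = 79.078 + 0.945x
```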
Interpretation
Marginal change is the magnitude of the change in one variable when the other variable
changes by exactly one unit. In this problem, the value of the slope 𝑏, which is 0.945, is the marginal
change. This means that for every one-unit increase in 𝑥, the number of study hours, the
value of 𝑦, the grade, increases by 0.945 unit on average. Similarly, the value of the y-intercept 𝑎
is 79.078. This means that a student with zero hours of study would be expected to have a grade
of 79.078.
Today, you will learn how to use the equation of a regression line to make predictions
about the value of the dependent variable. That's right, prediction, or rather estimation, of
the value of the dependent variable for a value of the independent variable that is not
present in your data, based on the relationship you have found.
To give you an idea of how to make such a prediction (or estimate), let me start by showing you a
sample problem.
Example:
Below are sample data on the top-achieving students of a school, giving their number of
study hours (𝑥) and their score in the math final exam (𝑦). Find the equation of the regression line
and predict the value of the dependent variable when the value of the independent variable is 14.
Student No. of Study Hours Score (out of 100)
A 5 83
B 7 87
C 8 89
D 11 93
E 13 96
Before we proceed with our initial computation, we must remember that in a regression
analysis the variables must be correlated and the correlation must be significant. For the sake of this
discussion, let us assume that these requirements have been met.
Now, as in the previous module, we first need to solve for the values required to
find the slope 𝑏 and the y-intercept 𝑎. Hence, we should come up with the following:
Student No. of Study Hours (x) Score out of 100 (y) xy x^2
A 5 83 415 25
B 7 87 609 49
C 8 89 712 64
D 11 93 1023 121
E 13 96 1248 169
Total 44 448 4007 428
Hence, the equation of the regression line 𝑦 ′ = 𝑎 + 𝑏𝑥 is 𝑦 ′ = 75.667 + 1.583𝑥 where the slope
is 1.583 and the y-intercept is 75.667.
Interpretation
In the regression line equation, the slope 𝑏 is 1.583, which means that for every one-unit increase
in 𝑥, the number of study hours, the value of 𝑦, the score, increases by 1.583
units on average. Similarly, the value of the y-intercept 𝑎 is 75.667. This means that a student
with zero hours of study would be expected to have a score of 75.667.
Now, since our main objective is to predict the value of 𝑦 when the value of 𝑥 is 14, we will now
use our newfound equation. We will replace 𝑥 with 14.
𝑦 ′ = 75.667 + 1.583𝑥
𝑦 ′ = 75.667 + 1.583(14)
𝑦 ′ = 75.667 + 22.162
𝑦 ′ = 97.829
Hence, if a student studies for 14 hours, his/her expected score in the math exam would be
97.829.
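The fit and the prediction can also be checked in Python. This is a sketch under stated assumptions: `fit_line` is a hypothetical helper, not part of the lesson, and it uses the deviation-from-the-mean form of the least-squares formulas, which is algebraically equivalent to the summation form. It also shows that substituting the rounded coefficients gives 97.829, while carrying full precision gives a slightly different value.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y' = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Data from the example: study hours (x) and exam scores (y).
hours = [5, 7, 8, 11, 13]
scores = [83, 87, 89, 93, 96]

intercept, slope = fit_line(hours, scores)
print(f"y' = {intercept:.3f} + {slope:.3f}x")  # y' = 75.667 + 1.583x

# Prediction at x = 14 with the rounded coefficients used in the lesson...
predicted_rounded = 75.667 + 1.583 * 14
# ...and with the unrounded coefficients.
predicted_unrounded = intercept + slope * 14
print(f"{predicted_rounded:.3f}")    # 97.829
print(f"{predicted_unrounded:.3f}")  # 97.833
```

The small gap between the two predictions comes only from rounding 𝑎 and 𝑏 to three decimal places before substituting.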
When using a regression line, you can only apply the interpretations of the slope and
y-intercept over the range of 𝑥 values observed in the data. It is dangerous to make predictions or
statements beyond the scope of what you observed in the data set.
In our example, we found that a student who studies for 14 hours would be expected to have
a score of 97.829. But should we use that same equation to predict a score when the number of
study hours is very large, say 100? Definitely not.
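To see why, it may help to substitute 𝑥 = 100 into the equation. The sketch below uses the rounded coefficients from the lesson; `predict_score` and the constant names are hypothetical, introduced only for this illustration. The function returns the prediction together with a flag saying whether the input lies within the study hours actually observed.

```python
# Rounded coefficients of the regression line found in the lesson.
INTERCEPT = 75.667
SLOPE = 1.583

# Study hours observed in the sample data, and the exam's maximum score.
OBSERVED_HOURS = [5, 7, 8, 11, 13]
MAX_SCORE = 100


def predict_score(hours):
    """Return (predicted score, True if hours is within the observed range)."""
    predicted = INTERCEPT + SLOPE * hours
    in_range = min(OBSERVED_HOURS) <= hours <= max(OBSERVED_HOURS)
    return predicted, in_range


predicted, in_range = predict_score(100)
print(f"{predicted:.3f}", in_range)  # 233.967 False
print(predicted > MAX_SCORE)         # True
```

A predicted score of 233.967 on an exam scored out of 100 is impossible, which shows how the equation breaks down far outside the observed data.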