What Is Simple Linear Regression?
Note that the observed (x, y) data points fall directly on a line. As you may remember, the
relationship between degrees Fahrenheit and degrees Celsius is known to be:
$$F = \frac{9}{5} \times C + 32$$
That is, if you know the temperature in degrees Celsius, you can use this equation to determine the
temperature in degrees Fahrenheit exactly.
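As a small illustration of what "deterministic" means here, the conversion can be written as a function that returns exactly one Fahrenheit value for each Celsius value, with no scatter. The function name `celsius_to_fahrenheit` is only an illustrative choice.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Deterministic relationship: F = (9/5) * C + 32."""
    return celsius * 9 / 5 + 32


# Every Celsius value maps to exactly one Fahrenheit value.
for c in (0, 25, 100):
    print(f"{c} degrees C -> {celsius_to_fahrenheit(c)} degrees F")
```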
Here are some examples of other deterministic relationships that you have studied in high school.
You might anticipate that the higher the latitude at which you live in the northern U.S., the less exposed
you'd be to the harmful rays of the sun, and therefore the lower your risk of death due to skin
cancer. The scatter plot supports such a hypothesis. There appears to be a negative linear relationship
between latitude and mortality due to skin cancer, but the relationship is not perfect. Indeed, the plot
exhibits some "trend," but it also exhibits some "scatter." Therefore, it is a statistical relationship,
not a deterministic one.
Some other examples of statistical relationships might include:
Height and weight — as height increases, you'd expect weight to increase, but not perfectly.
Alcohol consumed and blood alcohol content — as alcohol consumption increases, you'd
expect one's blood alcohol content to increase, but not perfectly.
Vital lung capacity and pack-years of smoking — as amount of smoking increases (as
quantified by the number of pack-years of smoking), you'd expect lung function (as quantified by
vital lung capacity) to decrease, but not perfectly.
Driving speed and gas mileage — as driving speed increases, you'd expect gas mileage to
decrease, but not perfectly.
Okay, so let's study statistical relationships between one response variable y and one predictor
variable x!
2.2 - What is the "Best Fitting Line"?
Since we are interested in summarizing the trend between two quantitative variables, the natural
question arises — "what is the best fitting line?"
Let me give you an example: the chart shows the heights (x) and weights (y) of 10 students.
Looking at the plot below, which line — the solid line or the dashed line — do you think best
summarizes the trend between height and weight?
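To compare the two lines, we need some notation. For experimental unit i, let $x_i$ denote the
predictor value, $y_i$ the observed response, and $\hat{y}_i$ the predicted response (or fitted value).
The equation of a candidate line is then:
$$\hat{y}_i = b_0 + b_1 x_i$$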
In our height and weight example, the experimental units ("experimental unit" is the object or
person on which the measurement is made) are students.
Let's try out the notation on our example with the trend summarized by the line w = -266.53 +
6.1376 h. The first data point in the list indicates that student 1 is 63 inches tall and weighs 127
pounds. That is, $x_1 = 63$ and $y_1 = 127$. Do you see this point on the plot? If we know this student's
height but not his or her weight, we could use the equation of the line to predict his or her weight.
We'd predict the student's weight to be -266.53 + 6.1376(63) or 120.1 pounds. That is, $\hat{y}_1 = 120.1$.
Clearly, our prediction wouldn't be perfectly correct — it has some "prediction error" (or "residual
error"). In fact, the size of its prediction error is 127-120.1 or 6.9 pounds.
The table below lists, for each of the 10 data points, the predictor value, the observed response and
the predicted response:
 i   $x_i$   $y_i$   $\hat{y}_i$
 1    63     127     120.1
 2    64     121     126.3
 3    66     142     138.5
 4    69     157     157.0
 5    69     162     157.0
 6    71     156     169.2
 7    71     169     169.2
 8    72     165     175.4
 9    73     181     181.5
10    75     208     193.8
As you can see, the size of the prediction error depends on the data point. If we didn't know the
weight of student 5, the equation of the line would predict his or her weight to be -266.53 +
6.1376(69) or 157 pounds. The size of the prediction error here is 162-157, or 5 pounds.
In general, when we use $\hat{y}_i = b_0 + b_1 x_i$ to predict the actual response $y_i$, we make a
prediction error (or residual error) of size:
$$e_i = y_i - \hat{y}_i$$
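The prediction errors for all 10 students can be computed in a few lines. This is a minimal sketch that uses the heights and weights from the table above and the solid line quoted in the text. The names `heights`, `weights`, `predict` and `residuals` are illustrative choices.

```python
# Heights (inches) and weights (pounds) of the 10 students in the table.
heights = [63, 64, 66, 69, 69, 71, 71, 72, 73, 75]
weights = [127, 121, 142, 157, 162, 156, 169, 165, 181, 208]

# Coefficients of the solid line quoted in the text: w = -266.53 + 6.1376 h.
B0, B1 = -266.53, 6.1376


def predict(height: float) -> float:
    """Predicted weight for a given height, using the solid line."""
    return B0 + B1 * height


# Residual e_i = observed response minus predicted response.
residuals = [y - predict(x) for x, y in zip(heights, weights)]

for i, (x, y, e) in enumerate(zip(heights, weights, residuals), start=1):
    print(f"{i:2d}  x={x}  y={y}  y_hat={predict(x):6.1f}  e={e:6.1f}")
```

Positive residuals mark students who weigh more than the line predicts, and negative residuals mark students who weigh less.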
A line that fits the data "best" will be one for which the n prediction errors — one for each
observed data point — are as small as possible in some overall sense. One way to achieve this goal
is to invoke the "least squares criterion," which says to "minimize the sum of the squared prediction
errors." That is:
The equation of the best fitting line is: $\hat{y}_i = b_0 + b_1 x_i$
We just need to find the values $b_0$ and $b_1$ that make the sum of the squared prediction errors
the smallest it can be.
That is, we need to find the values $b_0$ and $b_1$ that minimize:
$$Q = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
[Tables: prediction errors for the dashed line, w = -331.2 + 7.1 h, and for the solid line, w = -266.53 + 6.1376 h. Columns in each table: $i$, $x_i$, $y_i$, $\hat{y}_i$, $(y_i - \hat{y}_i)$, $(y_i - \hat{y}_i)^2$.]
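The least squares criterion can be applied to the two candidate lines directly. The sketch below recomputes Q for each line from the 10 data points listed earlier. The helper name `sum_squared_errors` is an illustrative choice, not something defined in the text.

```python
# Heights (inches) and weights (pounds) of the 10 students.
heights = [63, 64, 66, 69, 69, 71, 71, 72, 73, 75]
weights = [127, 121, 142, 157, 162, 156, 169, 165, 181, 208]


def sum_squared_errors(b0, b1, xs, ys):
    """Q = sum over i of (y_i - (b0 + b1 * x_i))^2."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


q_dashed = sum_squared_errors(-331.2, 7.1, heights, weights)
q_solid = sum_squared_errors(-266.53, 6.1376, heights, weights)

print(f"dashed line: Q = {q_dashed:.1f}")
print(f"solid line:  Q = {q_solid:.1f}")
```

With these data the solid line gives the smaller sum (about 597.4 against about 766.5), so it fits better under the least squares criterion.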
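Substituting the equation of the line into Q gives:
$$Q = \sum_{i=1}^{n} \left(y_i - (b_0 + b_1 x_i)\right)^2$$
We minimize Q using calculus (that is, take the derivative with respect to $b_0$ and $b_1$, set to 0, and
solve for $b_0$ and $b_1$) and get the "least squares estimates" for $b_0$ and $b_1$:
$$b_0 = \bar{y} - b_1 \bar{x}$$
and:
$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$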
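For readers who want to see the calculus step written out, the following is a sketch of the standard derivation. It assumes only the definition of Q above and that the $x_i$ are not all equal, so the denominator is nonzero.

```latex
\begin{aligned}
\frac{\partial Q}{\partial b_0} &= -2 \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right) = 0
  \;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x}, \\
\frac{\partial Q}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i \left(y_i - b_0 - b_1 x_i\right) = 0
  \;\Longrightarrow\; \sum_{i=1}^{n} x_i \left( (y_i - \bar{y}) - b_1 (x_i - \bar{x}) \right) = 0, \\
b_1 &= \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}
     = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.
\end{aligned}
```

The last equality holds because $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ and $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, so replacing $x_i$ with $x_i - \bar{x}$ in each sum changes nothing.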
Because the formulas for $b_0$ and $b_1$ are derived using the least squares criterion, the resulting equation,
$\hat{y}_i = b_0 + b_1 x_i$, is often referred to as the "least squares regression line," or simply the "least
squares line." It is also sometimes called the "estimated regression equation." Incidentally, note
that in deriving the above formulas, we made no assumptions about the data other than that they
follow some sort of linear trend.
We can see from these formulas that the least squares line passes through the point $(\bar{x}, \bar{y})$, since
when $x = \bar{x}$, then $\hat{y} = b_0 + b_1 \bar{x} = \bar{y} - b_1 \bar{x} + b_1 \bar{x} = \bar{y}$.
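The closed-form estimates can be evaluated directly on the height and weight data. This is a minimal sketch in plain Python whose variable names (`x_bar`, `s_xy`, `s_xx`) are illustrative. It also checks numerically that the fitted line passes through the point of means.

```python
# Heights (inches) and weights (pounds) of the 10 students.
heights = [63, 64, 66, 69, 69, 71, 71, 72, 73, 75]
weights = [127, 121, 142, 157, 162, 156, 169, 165, 181, 208]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Numerator and denominator of the least squares slope formula.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
s_xx = sum((x - x_bar) ** 2 for x in heights)

b1 = s_xy / s_xx          # slope
b0 = y_bar - b1 * x_bar   # intercept

print(f"b0 = {b0:.2f}, b1 = {b1:.4f}")

# The fitted value at x_bar should equal y_bar (up to rounding error).
print(abs((b0 + b1 * x_bar) - y_bar) < 1e-9)
```

Up to rounding, the estimates should agree with the solid line quoted in the text, w = -266.53 + 6.1376 h.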
In practice, you won't really need to worry about the formulas for b0 and b1. Instead, you are
going to let statistical software, such as R or Minitab, find least squares lines for you.
One thing the estimated regression coefficients, b0 and b1, allow us to do is to predict future
responses — one of the most common uses of an estimated regression line. This use is rather
straightforward:
A common use of the estimated regression line: $\hat{y}_{i,wt} = -266.53 + 6.1376\, x_{i,ht}$
Predict (mean) weight of 66-inch tall people: $\hat{y}_{i,wt} = -266.53 + 6.1376(66) = 138.55$
Predict (mean) weight of 67-inch tall people: $\hat{y}_{i,wt} = -266.53 + 6.1376(67) = 144.69$
Now, what does b0 tell us? The answer is obvious when you evaluate the estimated regression
equation at x = 0. Here, it tells us that a person who is 0 inches tall is predicted to weigh -266.53
pounds! Clearly, this prediction is nonsense. This happened because we "extrapolated" beyond the
"scope of the model" (the range of the x values). It is not meaningful to have a height of 0 inches,
that is, the scope of the model does not include x = 0. So, here the intercept b0 is not meaningful. In
general, if the "scope of the model" includes x = 0, then b0 is the predicted mean response when x =
0. Otherwise, b0 is not meaningful.
And, what does b1 tell us? The answer is obvious when you subtract the predicted weight of 66-inch
tall people from the predicted weight of 67-inch tall people. We obtain 144.69 - 138.55 = 6.14
pounds, the value of b1. Here, it tells us that we predict the mean weight to increase by 6.14 pounds
for every additional one-inch increase in height. In general, we can expect the mean response to
increase or decrease by b1 units for every one unit increase in x.
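Both interpretations can be checked numerically with the estimated line from the text. In this short sketch, `predict_weight` is an illustrative helper name. It shows that predictions one inch apart differ by b1, and that evaluating the line at a height of 0 inches returns b0, which lies outside the scope of the model.

```python
B0, B1 = -266.53, 6.1376  # estimated intercept and slope quoted in the text


def predict_weight(height_in: float) -> float:
    """Predicted mean weight (pounds) for a given height (inches)."""
    return B0 + B1 * height_in


w66 = predict_weight(66)
w67 = predict_weight(67)

print(f"66 inches: {w66:.2f} lb")
print(f"67 inches: {w67:.2f} lb")
print(f"difference: {w67 - w66:.2f} lb")        # equals the slope b1, rounded
print(f"0 inches:  {predict_weight(0):.2f} lb")  # extrapolation: not meaningful
```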
https://drive.google.com/file/d/1ZoaueunP6p0d_--0CHm1QzWdLKXJR3pz/view?usp=sharing