Notes 2
Further Topics...
3 Properties of b0 and b1
Yi = β0 + β1 xi + εi , i = 1, . . . , n.
[Figure: scatter plot of Weight (pounds) vs. Height (inches), with labeled points (63, 127), (73, 181), (75, 208) and the horizontal line Ȳ = 158.8]
By differentiation of the least squares criterion

Q = Σ_{i=1}^{n} [Yi − (b0 + b1 xi)]²

we can get

Σ_{i=1}^{n} (Yi − b0 − b1 xi) = 0
Σ_{i=1}^{n} xi (Yi − b0 − b1 xi) = 0

and hence

b0 = Ȳ − b1 x̄
1 Because the formulas for b0 and b1 are derived using the least squares criterion, the resulting fitted equation

Ŷi = b0 + b1 xi

combined with

b0 = Ȳ − b1 x̄

gives

Ȳ = b0 + b1 x̄,

which means that the least squares line passes through the point (x̄, Ȳ).
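As a sanity check, the derivation above can be verified numerically. The sketch below uses a small hypothetical height/weight data set (not the notes' 10-student data) and confirms both normal equations and the mean-point property:

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
x = np.array([63.0, 67.0, 70.0, 73.0, 75.0])   # heights
Y = np.array([127.0, 140.0, 160.0, 181.0, 208.0])  # weights

xbar, Ybar = x.mean(), Y.mean()
# Least squares estimates from the derivation above.
b1 = np.sum((x - xbar) * (Y - Ybar)) / np.sum((x - xbar) ** 2)
b0 = Ybar - b1 * xbar

resid = Y - (b0 + b1 * x)
# Both normal equations hold (numerically zero).
print(np.sum(resid))        # ~ 0
print(np.sum(x * resid))    # ~ 0
# The fitted line passes through (xbar, Ybar).
print(np.isclose(b0 + b1 * xbar, Ybar))  # True
```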
Some Notations
We use the notations:
1 Sum of squares for x:

Sxx = Σ_{i=1}^{n} (xi − x̄)² = Σ_{i=1}^{n} xi² − n x̄²
2 b1 is the estimate of the change in mean response value E(Y ) for every
additional one-unit increase in the predictor x.
1. In the example of 10 students’ height and weight, b1 tells us that we predict the
mean weight to increase by 6.14 pounds for every additional one-inch increase
in height.
3 The numerator of b1 = Sxy/Sxx, namely Sxy = Σ_{i=1}^{n} (xi − x̄)(Yi − Ȳ), tells us, for each data point, to sum up the product of two distances – the distance of the x value from x̄ (the mean of all of the x values) and the distance of the Y value from Ȳ (the mean of all of the Y values).
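A minimal sketch of this notation on made-up data: it computes Sxx both by the definition and by the shortcut formula, forms the numerator Sxy, and checks the resulting slope against numpy's least squares fit.

```python
import numpy as np

# Made-up data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 2.5, 3.9, 4.1, 5.5])

xbar, Ybar = x.mean(), Y.mean()
Sxx = np.sum((x - xbar) ** 2)           # sum of squares for x
Sxy = np.sum((x - xbar) * (Y - Ybar))   # the numerator of b1
b1 = Sxy / Sxx

# The shortcut formula agrees with the definition of Sxx.
print(np.isclose(Sxx, np.sum(x ** 2) - len(x) * xbar ** 2))  # True
# b1 matches the slope from numpy's degree-1 polynomial fit.
print(np.isclose(b1, np.polyfit(x, Y, 1)[0]))  # True
```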
When is the Slope b1 > 0?
1 Is the trend in the following plot positive, i.e., as x increases, Y tends to increase?
[Figure: the same Weight vs. Height scatter plot, with points (63, 127), (73, 181), (75, 208) and the horizontal line Ȳ = 158.8]
When is the Slope b1 < 0?
1 Is the trend in the following plot negative, i.e., as x increases, Y tends to decrease?
[Figure: scatter plot of Mortality (deaths per 10 million) vs. the predictor x, with points (33, 219), (34.5, 160), (43, 134), (44.8, 86) and the reference lines Ȳ = 152.9 and x̄ = 39.5]
We have two thermometer brands, (A) and (B). The predictor is Celsius and the
response is Fahrenheit. Will thermometer brand (A) or brand (B) yield more
precise future predictions?
[Figure: two scatter plots of Fahrenheit vs. Celsius (0 to 50), one for brand (A) and one for brand (B)]
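One way to think about the question is through simulation. The error standard deviations below (1 °F for one brand, 8 °F for the other) are assumed values chosen only for illustration; the brand whose readings scatter less around the fitted line yields more precise predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
celsius = np.linspace(0, 50, 25)
true_f = 32 + 9 / 5 * celsius  # exact conversion

# Hypothetical measurement noise levels (assumed, not from the notes).
f_a = true_f + rng.normal(0, 1.0, size=celsius.size)  # precise brand
f_b = true_f + rng.normal(0, 8.0, size=celsius.size)  # noisy brand

def resid_sd(x, y):
    """Residual standard deviation around the fitted line."""
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    return np.sqrt(np.sum(resid ** 2) / (len(x) - 2))

# The less noisy brand has smaller scatter, hence more precise predictions.
print(resid_sd(celsius, f_a) < resid_sd(celsius, f_b))
```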
Review of Sample Variance
When there is no predictor x, we use Ȳ to estimate E(Y ), and we use the sample
variance s2 to estimate σ 2 .
The sample variance:

s² = Σ_{i=1}^{n} (Yi − Ȳ)² / (n − 1)
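A one-line check of this formula against numpy's built-in estimator (on made-up data; `ddof=1` gives the n − 1 denominator):

```python
import numpy as np

# Made-up responses for illustration.
Y = np.array([127.0, 140.0, 160.0, 181.0, 208.0])
s2 = np.sum((Y - Y.mean()) ** 2) / (len(Y) - 1)
print(np.isclose(s2, np.var(Y, ddof=1)))  # True
```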
[Figure: a probability density curve]
In the simple linear regression setting, there is a predictor x. At each x
value, there is a sub-group of data points, and we use
Ŷi = b0 + b1 xi
to estimate
E(Yi ) = β0 + β1 xi .
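This distinction can be illustrated by simulation. Here β0 = 2, β1 = 5, and σ = 1 are assumed population values chosen only for illustration: the sample regression line fit to noisy draws comes close to, but does not equal, the population regression line.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 2.0, 5.0, 1.0   # assumed population values

# Simulate responses from the population regression line plus noise.
x = rng.uniform(1.0, 4.0, size=30)
Y = beta0 + beta1 * x + rng.normal(0, sigma, size=x.size)

# Sample regression line: estimates (b0, b1) of (beta0, beta1).
b1, b0 = np.polyfit(x, Y, 1)
print(b0, b1)   # close to, but not exactly, (2.0, 5.0)
```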
[Figure: two panels of College entrance test score vs. a predictor on the 1.0–4.0 scale, comparing the population regression line with the sample regression line]
1 The numerator, Σ_{i=1}^{n} (Yi − Ŷi)², again adds up, in squared units, how far each response Yi is from its estimated mean Ŷi.
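A small sketch on made-up data: the sum of squared residuals is nonnegative, and because the least squares line minimizes the criterion Q, it can be no larger than the total sum of squares around Ȳ.

```python
import numpy as np

# Made-up data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 2.5, 3.9, 4.1, 5.5])

b1, b0 = np.polyfit(x, Y, 1)
Yhat = b0 + b1 * x
sse = np.sum((Y - Yhat) ** 2)   # the numerator: squared distances from Yhat

# Least squares minimizes Q, so SSE cannot exceed the spread around Ybar.
print(sse <= np.sum((Y - Y.mean()) ** 2))  # True
```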