Chapter 12: Regression
Material from Devore’s book (Ed 8), and Cengagebrain.com
Simple Linear Regression

[Scatterplot: Rating (vertical axis) versus Sugar (horizontal axis), shown on three consecutive slides; the last marks individual points]
The Simple Linear Regression Model

The simplest deterministic mathematical relationship between two variables x and y is a linear relationship: y = β₀ + β₁x.

The objective of this section is to develop an equivalent linear probabilistic model.
A Linear Probabilistic Model
The points (x₁, y₁), …, (xₙ, yₙ) resulting from n independent observations will then be scattered about the true regression line.

[Figure not recoverable from source: observations scattered about the true regression line]
A Linear Probabilistic Model
How do we know simple linear regression is appropriate?

- Theoretical considerations
- Scatterplots
A Linear Probabilistic Model
If we think of an entire population of (x, y) pairs, then μ_Y·x* is the mean of all y values for which x = x*, and σ²_Y·x* is a measure of how much these y values spread out about the mean value.
A Linear Probabilistic Model
Interpreting the parameters: the slope β₁ is the expected change in Y associated with a one-unit increase in x, and the intercept β₀ is the expected value of Y when x = 0.
A Linear Probabilistic Model
What is σ²_Y·x? How do we interpret σ²_Y·x?

Homoscedasticity: we assume the variance (amount of variability) of the distribution of Y values to be the same at each different value of fixed x (i.e., the homogeneity of variance assumption).
When errors are normally distributed…

When ε is normally distributed with mean 0 and variance σ², the distribution of Y at any fixed x is normal, with mean β₀ + β₁x and variance σ².
Estimating Model Parameters
The values of β₀, β₁, and σ² will almost never be known to an investigator.
Estimating Model Parameters
The data consist of pairs (xᵢ, Yᵢ), where

Yᵢ = β₀ + β₁xᵢ + εᵢ  for i = 1, 2, …, n
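The model above is easy to simulate. A minimal sketch in Python (the parameter values, sample size, and seed are arbitrary choices for illustration, not from the slides):

```python
import random

# "True" parameters -- in practice unknown; arbitrary values for illustration.
beta0, beta1, sigma = 2.0, 0.5, 1.0

random.seed(42)
x = [float(i) for i in range(1, 21)]
# Y_i = beta0 + beta1 * x_i + eps_i, with eps_i ~ N(0, sigma^2)
y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]

for xi, yi in zip(x[:3], y[:3]):
    print(f"x = {xi:4.1f}  y = {yi:6.2f}")
```

Plotting these (x, y) pairs would reproduce the kind of scatter about the true line shown in the earlier figure.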
Estimating Model Parameters
The “best fit” line is motivated by the principle of least squares, which can be traced back to the German mathematician Gauss (1777–1855):
Estimating Model Parameters
The sum of squared vertical deviations from the points (x₁, y₁), …, (xₙ, yₙ) to a candidate line y = b₀ + b₁x is then

f(b₀, b₁) = Σᵢ [yᵢ − (b₀ + b₁xᵢ)]²
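This criterion translates directly into code. A sketch on made-up data (the numbers are invented for illustration):

```python
def sse(b0, b1, xs, ys):
    """Sum of squared vertical deviations from the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
# The least squares line is the (b0, b1) minimizing this criterion;
# for these data it works out to b0 = 0.5, b1 = 1.4.
print(sse(0.5, 1.4, xs, ys))  # the minimizing candidate
print(sse(0.0, 1.0, xs, ys))  # a worse candidate gives a larger SSE
```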
Estimating Model Parameters
The fitted regression line or least squares line is then the line whose equation is ŷ = β̂₀ + β̂₁x.
Estimating Model Parameters
The least squares estimate of the slope coefficient
1 of the true regression line is
(Typically columns for xi, yi, xiyi and xi2 and constructed
and then 20
S and S are calculated.)
Estimating Model Parameters
The least squares estimate of the intercept β₀ of the true regression line is

b₀ = β̂₀ = ȳ − β̂₁x̄
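The two formulas above can be coded in a few lines. A sketch on invented data (not the textbook's example):

```python
def least_squares(xs, ys):
    """Return (b0, b1): least squares intercept and slope estimates."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # Sxy
    sxx = sum((x - xbar) ** 2 for x in xs)                      # Sxx
    b1 = sxy / sxx            # slope estimate
    b0 = ybar - b1 * xbar     # intercept estimate
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4], [2, 3, 5, 6])
print(f"y-hat = {b0:.2f} + {b1:.2f} x")  # y-hat = 0.50 + 1.40 x
```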
Example (fitted regression line)
Fitted Values
Fitted values: the fitted (or predicted) values ŷ₁, …, ŷₙ are obtained by substituting x₁, …, xₙ into the equation of the estimated regression line: ŷᵢ = β̂₀ + β̂₁xᵢ.

Residuals: the differences yᵢ − ŷᵢ between the observed and fitted y values.

Residuals are estimates of the true error – WHY?
Sum of the residuals
When the estimated regression line is obtained via the principle of least squares, the sum of the residuals is zero:

Σᵢ êᵢ = Σᵢ (yᵢ − ŷᵢ) = 0

(This follows from the least squares normal equations whenever the fitted line includes an intercept.)
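This property is easy to verify numerically. A sketch with invented data (the small least squares helper repeats the slope/intercept formulas from earlier slides):

```python
def least_squares(xs, ys):
    """Least squares intercept and slope: b1 = Sxy/Sxx, b0 = ybar - b1*xbar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    return ybar - b1 * xbar, b1

xs = [1, 2, 3, 4, 5]
ys = [2.1, 2.9, 5.2, 5.8, 8.0]
b0, b1 = least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
# The residual sum is zero up to floating point rounding.
print(abs(sum(residuals)) < 1e-9)
```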
Example (fitted values)
Fitted Values
We interpret the fitted value ŷᵢ as the value of y that we would predict or expect when using the estimated regression line with x = xᵢ; thus ŷᵢ is the estimated true mean for that population when x = xᵢ (based on the data).

The residual is the gap between the observed and fitted values: Yᵢ − Ŷᵢ = ε̂ᵢ
Residual Plots
Prediction equation: Revenue = 2.7 × Temperature − 35

Temperature (Celsius) | Revenue (Observed) | Revenue (Predicted) | Residual (Observed − Predicted)
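With the prediction equation above, such a residual table can be computed directly. The observed revenues below are invented placeholders, since the slide's table values did not survive extraction:

```python
# Prediction equation from the slide: Revenue = 2.7 * Temperature - 35
def predict(temp):
    return 2.7 * temp - 35

# Hypothetical (temperature, observed revenue) pairs -- not the slide's data.
observations = [(20, 20.5), (25, 31.0), (30, 47.5), (35, 58.0)]

print(f"{'Temp (C)':>8} {'Observed':>9} {'Predicted':>10} {'Residual':>9}")
for temp, obs in observations:
    pred = predict(temp)
    print(f"{temp:>8} {obs:>9.1f} {pred:>10.1f} {obs - pred:>9.1f}")
```

A residual plot (residual versus temperature) built from the last column is the usual diagnostic for checking the linearity and constant-variance assumptions.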