BES - Lecture 10 - Simple Linear Regression
Lecture 10:
Simple Linear Regression
and Correlation
Outline
Introduction
8/25/22
Linear relationship?
The model
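• The first-order linear model, or simple linear
regression model:
$y = \beta_0 + \beta_1 x + \varepsilon$
where $\beta_0$ is the y-intercept, $\beta_1$ is the slope, and
$\varepsilon$ is the error variable.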
[Figure: scatter plot of data points, including (1, 2), (3, 1.5), and (4, 3.2), with a fitted line.]
The smaller the sum of squared differences,
the better the fit of the line to the data.
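The idea of choosing the line with the smallest sum of squared differences can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own computation: the data values and the function names `sum_squared_differences` and `least_squares_line` are hypothetical, and the slope and intercept use the usual least squares formulas.

```python
def sum_squared_differences(xs, ys, b0, b1):
    """Sum of squared vertical distances between the points and the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (b0, b1), the intercept and slope that minimize the sum of squared differences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Illustrative data only (hypothetical, not the lecture's data set).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_line(xs, ys)
sse_fit = sum_squared_differences(xs, ys, b0, b1)
# Compare with a horizontal line through the mean of y (slope fixed at zero).
sse_flat = sum_squared_differences(xs, ys, sum(ys) / len(ys), 0.0)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"SSE of fitted line: {sse_fit:.3f}; SSE of horizontal line: {sse_flat:.3f}")
```

Any other choice of intercept and slope gives a larger sum of squared differences on the same data, which is what "better fit" means here.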
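[Figure: distributions of y at $x_1$, $x_2$, and $x_3$, centered at $E(y \mid x_1) = \beta_0 + \beta_1 x_1 = \mu_1$, $E(y \mid x_2) = \beta_0 + \beta_1 x_2 = \mu_2$, and $E(y \mid x_3) = \beta_0 + \beta_1 x_3 = \mu_3$.]
The standard deviation remains constant ...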
Coefficient of determination
• When we want to measure the strength of
the linear relationship, we use the
coefficient of determination.
Coefficient of determination
• Overall variability in y:
– is explained, in part, by the regression model
– remains, in part, unexplained (the error)
Coefficient of determination
• $R^2$ measures the proportion of the variation
in y that is explained by the variation in x.
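As a rough sketch of this definition, the proportion can be computed as 1 minus the ratio of unexplained variation (SSE) to total variation in y (SST). The function name `r_squared` and the data values are hypothetical, chosen only for illustration.

```python
def r_squared(xs, ys):
    """Coefficient of determination for the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Unexplained variation: squared residuals around the fitted line.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared deviations of y around its mean.
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - sse / sst


# Illustrative data only (hypothetical).
r2_noisy = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 1.0, 3.0])
r2_exact = r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(f"R^2 for scattered data: {r2_noisy:.2f}")
print(f"R^2 for points exactly on a line: {r2_exact:.2f}")
```

A value near 1 means most of the variation in y is accounted for by the line; a value near 0 means the line explains little.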
[Figure: two scatter plots, one showing a relationship and one showing no relationship.]
• Relationship. Different inputs (x) yield different outputs (y).
The slope is not equal to zero.
• No relationship. Different inputs (x) yield the same output (y).
The slope is equal to zero.
$H_0: \beta_1 = 0$
$H_A: \beta_1 \neq 0$ (or $< 0$, or $> 0$)
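– The test statistic is
$t = \dfrac{b_1 - \beta_1}{s_{b_1}}$, with $n - 2$ degrees of freedom,
where $b_1$ is the sample slope, $\beta_1$ is the population slope under the null hypothesis, $s_{b_1} = \dfrac{s_\varepsilon}{\sqrt{\sum (x_i - \bar{x})^2}}$ is the standard error of the slope, and $s_\varepsilon = \sqrt{SSE / (n - 2)}$ is the standard error of estimate.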
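A test of the slope can be sketched as follows, assuming the standard t statistic for the slope: the sample slope divided by its standard error, with n - 2 degrees of freedom. The function name `slope_t_statistic` and the data are hypothetical; the critical value 3.182 is the two-tailed 5% value for 3 degrees of freedom taken from a standard t table.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (t, degrees_of_freedom) for testing H0: slope = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_eps = math.sqrt(sse / (n - 2))      # standard error of estimate
    s_b1 = s_eps / math.sqrt(s_xx)        # standard error of the slope
    return b1 / s_b1, n - 2


# Illustrative data only (hypothetical).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

t_stat, df = slope_t_statistic(xs, ys)
T_CRITICAL = 3.182  # two-tailed, alpha = 0.05, 3 degrees of freedom (t table)

print(f"t = {t_stat:.2f} with {df} degrees of freedom")
if abs(t_stat) > T_CRITICAL:
    print("Reject H0: there is evidence of a linear relationship.")
else:
    print("Do not reject H0: no evidence of a linear relationship.")
```

For a different sample size the critical value has to be looked up again, since it depends on the degrees of freedom.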
Coefficient of correlation
Regression diagnostics
• The three important conditions required for
the validity of the regression analysis are:
– The error variable is normally distributed.
– The error variance is constant for all values of x.
– The errors are independent of each other.
• How can we diagnose violations of these
conditions?
Regression diagnostics
• Examining the residuals (or standardized
residuals), we can identify violations of the
required conditions.
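Residuals and standardized residuals can be computed along the following lines. This is a sketch under one stated assumption: each residual is standardized by dividing it by the standard error of estimate, which is a common simple version (other definitions also adjust for the position of x). The function name `residual_analysis` and the data are hypothetical.

```python
import math


def residual_analysis(xs, ys):
    """Return (fitted, residuals, standardized) for the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    # Assumption: standardize by the standard error of estimate, sqrt(SSE / (n - 2)).
    s_eps = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    standardized = [e / s_eps for e in residuals]
    return fitted, residuals, standardized


# Illustrative data only (hypothetical).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

fitted, residuals, standardized = residual_analysis(xs, ys)
for x, f, e, z in zip(xs, fitted, residuals, standardized):
    print(f"x = {x:.0f}  fitted = {f:.2f}  residual = {e:+.2f}  standardized = {z:+.2f}")
```

Plotting these residuals against the fitted values, or against time, is the usual next step when checking the required conditions.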
Example: Heteroscedasticity
When the requirement of a constant variance is
violated, we have heteroscedasticity.
[Figure: plot of the residuals against the predicted values ŷ.]
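One informal way to look for a non-constant variance is to compare the spread of the residuals for small and large predicted values. The sketch below does this on simulated data; the function names `fit_line` and `spread_ratio`, the simulated data, and the idea of splitting the residuals into two halves are all assumptions made for illustration, not a formal test.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least squares intercept and slope for y = b0 + b1*x."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def spread_ratio(xs, ys):
    """Variance of the residuals in the upper half of the fitted values divided by
    the variance in the lower half. Values far from 1 suggest heteroscedasticity."""
    b0, b1 = fit_line(xs, ys)
    fitted = [b0 + b1 * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    # Order the residuals by their fitted value, then split into two halves.
    ordered = [e for _, e in sorted(zip(fitted, residuals))]
    half = len(ordered) // 2
    return statistics.variance(ordered[half:]) / statistics.variance(ordered[:half])


rng = random.Random(0)  # fixed seed so the sketch is repeatable
xs = [float(x) for x in range(1, 201)]

# Simulated data (hypothetical): the error standard deviation grows with x.
ys_hetero = [5 + 2 * x + rng.gauss(0, 0.1 * x) for x in xs]
# Simulated data (hypothetical): the error standard deviation is constant.
ys_homo = [5 + 2 * x + rng.gauss(0, 10.0) for x in xs]

print(f"spread ratio, variance growing with x: {spread_ratio(xs, ys_hetero):.2f}")
print(f"spread ratio, constant variance:       {spread_ratio(xs, ys_homo):.2f}")
```

A ratio well above 1 corresponds to a residual plot that fans out as ŷ increases; a ratio near 1 corresponds to an even band of residuals.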
Example: Heteroscedasticity
When the requirement of a constant variance is
not violated, we have homoscedasticity.
[Figure: plot of the residuals against the predicted values ŷ.]
The spread of the data points
does not change much.
Example: Heteroscedasticity
When the requirement of a constant variance is
not violated, we have homoscedasticity.
[Figure: plot of the residuals against the predicted values ŷ.]
In terms of even spread, this is
a much better situation.
[Figure: two plots of the residuals against time, each with a horizontal reference line at 0.]
Outliers
• An outlier is an observation that is unusually
small or large.
• Several possibilities need to be investigated when
an outlier is observed:
– There was an error in recording the value.
– The point does not belong in the sample.
– The observation is valid.
• Identify outliers from the scatter diagram.
• It is customary to suspect that an observation is an
outlier if |standardized residual| > 2.
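The rule of thumb above can be sketched as a small check. As an assumption, each residual is standardized by dividing it by the standard error of estimate; the function name `flag_outliers` and the data, which contain one deliberately unusual point, are hypothetical.

```python
import math


def flag_outliers(xs, ys, threshold=2.0):
    """Return the indices of observations whose |standardized residual| exceeds threshold."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # Assumption: standardize by the standard error of estimate, sqrt(SSE / (n - 2)).
    s_eps = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [i for i, e in enumerate(residuals) if abs(e / s_eps) > threshold]


# Illustrative data only (hypothetical): y = 2x, except the fifth value is 18 instead of 10.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [2.0, 4.0, 6.0, 8.0, 18.0, 12.0, 14.0, 16.0, 18.0, 20.0]

suspects = flag_outliers(xs, ys)
for i in suspects:
    print(f"Possible outlier: x = {xs[i]}, y = {ys[i]}")
```

A flagged point is only a suspect. Whether it is a recording error, does not belong in the sample, or is a valid observation still has to be investigated.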
[Figure: scatter plot with outlying points.]
… but some outliers may be very influential.
Summary