Lec1 ppt2019
Lecture 1
¹ Heights of women were adjusted by multiplying by 1.08 so that men’s and women’s heights would have the same mean.
     Child (inch)   Midparent (inch)
1    61.57220       70.07404
2    61.24382       68.22505
3    61.90968       65.12639
4    61.85769       64.23529
5    61.44986       63.88177
6    62.00005       67.02702
...
Figure: Scatter plot of child’s height against parent’s height (x-axis: midparent height (inch); y-axis: child height (inch))
• Football-shaped scatter plot ⇒ the relationship between child’s height (Y) and parent’s height (X) appears to be approximately linear.
• Fitted regression line:
Y = 24.54 + 0.637X
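The fitted line can be used to compute a fitted child height from a given midparent height. The following is a minimal sketch: the coefficients are the ones quoted in the lecture, while the function name `predict_child_height` and the example inputs are illustrative choices, not part of the original material.

```python
# Coefficients of the fitted regression line quoted in the lecture:
# Y = 24.54 + 0.637 X (both heights in inches).
INTERCEPT = 24.54
SLOPE = 0.637


def predict_child_height(midparent_height):
    """Return the fitted child height (inch) for a midparent height (inch)."""
    return INTERCEPT + SLOPE * midparent_height


# Illustrative inputs chosen inside the plotted range (64 to 72 inches).
for x in (64, 68, 72):
    fitted = predict_child_height(x)
    print(f"midparent {x} in -> fitted child height {fitted:.2f} in")
```

Because the slope is below 1, a one-inch difference in midparent height corresponds to a 0.637-inch difference in fitted child height.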
Figure: Scatter plot of final score (z-value) against midterm score (z-value)
Salary
Salary survey of professional organizations relates salary to years of experience.²
• Variables: Years of experience (X) and salary (Y).
• Cases: 143 organizations.
Figure: Scatter plot of salary (thousand $) against years of experience
² Source of data: Tryfos (1998), Methods for Business Analysis and Forecasting.
Case   Salary   Experience
1      71       26
2      69       19
3      73       22
4      69       17
5      65       13
6      75       25
...
³ Source of data: lib.stat.cmu.edu/datasets/
Questions to Be Studied
From "Applied Linear Statistical Models" by Kutner, Nachtsheim, Neter and Li
Heights
Figure: Scatter plot of child height (inch) against midparent height (inch)
• The average of the points falling in each vertical strip (bin) lies approximately on a straight line.
The technique used here is called "binning". Can you think of another application of binning?
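Binning can be sketched in a few lines: group the points into vertical strips by their X value and average Y within each strip. The example below is only an illustration. The helper name `bin_means`, the strip width of 1 inch, and the synthetic data (generated around the fitted line with an arbitrary noise level) are assumptions and are not the lecture's data.

```python
import math
import random
from collections import defaultdict
from statistics import mean


def bin_means(xs, ys, width=1.0):
    """Average y within vertical strips of the given width along x.

    Returns a dict mapping each strip's left edge to the mean of the
    y values whose x falls in [left edge, left edge + width).
    """
    strips = defaultdict(list)
    for x, y in zip(xs, ys):
        left_edge = math.floor(x / width) * width
        strips[left_edge].append(y)
    return {edge: mean(values) for edge, values in sorted(strips.items())}


# Synthetic data for illustration only (NOT the height data):
# points scattered around the line 24.54 + 0.637 x with made-up noise.
rng = random.Random(0)
xs = [rng.uniform(64, 72) for _ in range(500)]
ys = [24.54 + 0.637 * x + rng.gauss(0, 2.0) for x in xs]

for edge, avg in bin_means(xs, ys).items():
    print(f"strip [{edge:.0f}, {edge + 1:.0f}): mean y = {avg:.2f}")
```

Plotting the strip means against the strip centers gives the "graph of averages" that the slide compares with a straight line.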
Notations and definitions.
• Mean of a random variable Y, denoted by E(Y).
• Variance of a random variable Y, denoted by Var(Y) or σ²{Y}.
• Covariance between two random variables Y, Z, denoted by Cov(Y, Z) or σ{Y, Z}.
Check out appendix A.3 to review definitions of random variables, mean (a.k.a. expected value), variance and covariance.
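As a quick refresher, the three quantities can be computed directly from their standard definitions, Var(Y) = E[(Y − E(Y))²] and Cov(Y, Z) = E[(Y − E(Y))(Z − E(Z))], for a discrete joint distribution. The sketch below uses a small made-up joint distribution; the function names and the probabilities are illustrative assumptions.

```python
def expectation(joint_pmf, g):
    """E[g(Y, Z)] for a discrete joint distribution {(y, z): probability}."""
    return sum(p * g(y, z) for (y, z), p in joint_pmf.items())


def mean_variance_covariance(joint_pmf):
    """Return (E(Y), Var(Y), Cov(Y, Z)) computed from the definitions."""
    mean_y = expectation(joint_pmf, lambda y, z: y)
    mean_z = expectation(joint_pmf, lambda y, z: z)
    var_y = expectation(joint_pmf, lambda y, z: (y - mean_y) ** 2)
    cov_yz = expectation(joint_pmf, lambda y, z: (y - mean_y) * (z - mean_z))
    return mean_y, var_y, cov_yz


# Made-up joint distribution of two 0/1 variables that tend to agree.
example_pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

mean_y, var_y, cov_yz = mean_variance_covariance(example_pmf)
print(f"E(Y) = {mean_y:.2f}, Var(Y) = {var_y:.2f}, Cov(Y, Z) = {cov_yz:.2f}")
```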
Simple Linear Regression Model
n cases (trials/subjects): Yi – the value of the response variable in
the ith case; Xi – the value of the predictor variable in the ith case.
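• Model equation: Yi = β0 + β1 Xi + εi, i = 1, ..., n.
• Model assumptions: the random errors εi satisfy E(εi) = 0, Var(εi) = σ², and Cov(εi, εj) = 0 for i ≠ j.
• Unknown parameters: β0, β1 and σ².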
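Given the Xi s, the distributions of the responses Yi have the following properties:
• The response Yi is the sum of two terms:
• the term β0 + β1 Xi, which is constant (non-random) given Xi;
• the random error εi, which has mean 0 and variance σ².
• The εi s are uncorrelated ⇒ the responses Yi are uncorrelated.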
In summary, the simple linear regression model says that the responses Yi are
• random variables,
• whose means are E(Yi) = β0 + β1 Xi,
• whose variances are Var(Yi) = σ²,
• Moreover, two responses Yi and Yj (i ≠ j) are uncorrelated.
Are the distributions of the responses Yi fully specified by this model?
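One way to get a feel for the model is to simulate responses as a linear term in X plus a random error with mean 0 and constant variance. The sketch below is an illustration under stated assumptions: the function name `simulate_responses` and the parameter values are hypothetical, and drawing the errors from a normal distribution is an extra assumption made here for convenience, since the model itself only constrains the mean, variance and correlation of the errors.

```python
import random


def simulate_responses(xs, beta0, beta1, sigma, seed=None):
    """Simulate Yi = beta0 + beta1 * Xi + error_i for each Xi in xs.

    The errors are drawn independently from Normal(0, sigma^2); the normal
    shape is an assumption of this sketch, not of the regression model.
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values, chosen only for illustration.
xs = [64, 66, 68, 70, 72]
ys = simulate_responses(xs, beta0=24.54, beta1=0.637, sigma=2.0, seed=1)

for x, y in zip(xs, ys):
    print(f"X = {x}, mean response = {24.54 + 0.637 * x:.2f}, simulated Y = {y:.2f}")
```

Rerunning with a different seed changes the simulated Y values but not the mean responses, which depend only on X and the two coefficients.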
Regression Function
y = β0 + β1 x
• A straight line.
• β1 is the slope of the regression line: the change in E(Y) per unit change of X.
• β0 is the intercept of the regression line: the value of E(Y) when X = 0.
We will study how to model and fit the regression function from
data.
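The roles of the two coefficients in y = β0 + β1 x can be checked numerically: evaluating the line at x = 0 returns β0, and moving x by one unit changes y by β1. This is a small sketch with hypothetical coefficient values; the function name `regression_function` is an illustrative choice.

```python
def regression_function(x, beta0, beta1):
    """Evaluate the straight line y = beta0 + beta1 * x."""
    return beta0 + beta1 * x


# Hypothetical coefficients, used only to illustrate the interpretation.
beta0, beta1 = 1.0, 0.5

value_at_zero = regression_function(0, beta0, beta1)
change_per_unit = regression_function(3, beta0, beta1) - regression_function(2, beta0, beta1)

print(f"value at x = 0: {value_at_zero}")
print(f"change in y from x = 2 to x = 3: {change_per_unit}")
```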
Figure: Regression line y = β0 + β1 x, with the intercept β0 marked on the y-axis and the slope β1 marked as the rise in y per 1 unit of x