Categorical Slide2024
► The value of ‘b’ shows the slope of the regression line and gives
us a measure of the change in y for a unit change in x.
► If we know the values of ‘a’ and ‘b’, we can easily compute the
value of Ŷ for any given value of X.
OLS estimates of coefficients
Based on the least squares estimation, the coefficients of
the estimated regression line y= a + bx are given by:
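$$b = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sum_{i=1}^{n}x_i y_i - \left(\sum_{i=1}^{n}x_i\right)\left(\sum_{i=1}^{n}y_i\right)/n}{\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2/n}$$
$$a = \bar{y} - b\bar{x}$$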
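The least-squares formulas for b and a can be sketched in Python as follows. This is a minimal illustration rather than part of the original slides: the function name `ols_fit` and the sample numbers are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
def ols_fit(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Deviation form of the slope: sum of cross-products over sum of squares of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = s_xy / s_xx
    a = y_bar - b * x_bar  # the fitted line passes through (x_bar, y_bar)
    return a, b


# Hypothetical data (not from the slides): points lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
a, b = ols_fit(x, y)
print(a, b)  # 1.0 2.0
```

Because the hypothetical points lie exactly on a line, the fit recovers the intercept 1 and slope 2 with no residual error.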
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sum (X-\bar{X})^2}$$
$$a = \frac{679}{10} - 0.77\left(\frac{676}{10}\right) = 67.9 - 52.05 = 15.85$$
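The intercept arithmetic can be reproduced with a few lines of Python. The sketch below reads 679/10 and 676/10 as the sample means of Y and X (an interpretation based on a = ȳ − b·x̄), and the helper name `predict` is hypothetical.

```python
# Values taken from the worked example; 679 and 676 are read as the sums of Y and X.
n = 10
sum_y, sum_x = 679, 676
b = 0.77

y_bar = sum_y / n       # mean of Y, 67.9
x_bar = sum_x / n       # mean of X, 67.6
a = y_bar - b * x_bar   # intercept


def predict(x_value, intercept=a, slope=b):
    """Fitted value Y-hat = a + b*x for a given x (hypothetical helper)."""
    return intercept + slope * x_value


print(round(a, 2))               # 15.85
print(round(predict(x_bar), 1))  # 67.9
```

Predicting at the mean of X returns the mean of Y, which is a quick check that a and b are consistent with each other.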
Assumption 1: Linear relationship
Assumption 2: Y normally distributed at each value of x
Assumption 3: Same variance at each value of x
Checking Assumptions:
Assumption 1: linear relationship
Plot y against x to check for linearity
Checking Assumptions:
Assumption 2: Normality
Histogram of residuals
[Figure: Normal P-P plot of standardized residuals; dependent variable: BMI]
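A normal P-P plot pairs the observed cumulative proportion of each standardized residual with the cumulative probability expected under a normal distribution; points close to the diagonal are consistent with normality. The sketch below computes those pairs with the standard library only. It makes two assumptions that the slides do not state: the plotting position (i − 0.5)/n, and standardizing by the residuals' own mean and standard deviation. The function name and the residual values are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def pp_plot_points(residuals):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    n = len(residuals)
    mu = mean(residuals)
    sigma = pstdev(residuals)
    if sigma == 0:
        raise ValueError("residuals have zero spread; cannot standardize")
    standardized = sorted((r - mu) / sigma for r in residuals)
    # ASSUMPTION: plotting positions (i - 0.5) / n; other conventions exist.
    observed = [(i - 0.5) / n for i in range(1, n + 1)]
    # Expected cumulative probability under the standard normal distribution.
    expected = [NormalDist().cdf(z) for z in standardized]
    return list(zip(observed, expected))


# Hypothetical residuals, for illustration only (not the BMI data).
residuals = [-1.2, 0.4, 0.1, -0.3, 1.0]
points = pp_plot_points(residuals)
for obs, exp in points:
    print(f"{obs:.2f}  {exp:.2f}")
```

Plotting the second column against the first and comparing with the 45-degree line gives the same visual check as the figure.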
Exercise
Suppose we have the following dataset with the weight
and height of seven individuals.
• Hypothesis testing
• Correlation coefficient
• Coefficient of determination
Interval estimation of the regression parameters
Where
Example
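$$r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}} = \frac{\sum XY - \left[\sum X \sum Y\right]/n}{\sqrt{\left[\sum X^2 - \left(\sum X\right)^2/n\right]\left[\sum Y^2 - \left(\sum Y\right)^2/n\right]}}$$
where $x_i$ and $y_i$ are the values of X and Y for the $i$th individual.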
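The deviation form of the correlation coefficient translates directly into code. The sketch below is illustrative only: `pearson_r` is a hypothetical name and the five data points are made up for the example, not taken from the slides.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when x or y is constant")
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data: s_xy = 6, s_xx = 10, s_yy = 6, so r = 6 / sqrt(60).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```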
[Figure: scatter plot of Y against X]
Calculating r
The correlation coefficient for the data on fathers and sons is
calculated as follows:
Basic values from the data
$$\sum X = 800,\quad \sum X^2 = 53{,}418,\quad \sum Y = 811,\quad \sum Y^2 = 54{,}849,\quad \sum XY = 54{,}107$$
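The computational form of r needs only these five sums and the sample size n. The sketch below plugs the basic values into that form. The sample size is not stated in this section, so `n_assumed = 12` is an assumption made purely to let the example run, and the function name `r_from_sums` is hypothetical; replace n with the actual number of father and son pairs.

```python
from math import sqrt


def r_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Correlation coefficient from summary sums (computational form)."""
    s_xy = sum_xy - (sum_x * sum_y) / n
    s_xx = sum_x2 - (sum_x ** 2) / n
    s_yy = sum_y2 - (sum_y ** 2) / n
    if s_xx <= 0 or s_yy <= 0:
        raise ValueError("sums are inconsistent with the given n")
    return s_xy / sqrt(s_xx * s_yy)


# ASSUMPTION: n is not given in this section; 12 is used only for illustration.
n_assumed = 12

r = r_from_sums(
    n_assumed,
    sum_x=800, sum_x2=53418,
    sum_y=811, sum_y2=54849,
    sum_xy=54107,
)
print(round(r, 2))
```

The result depends on n, so the printed value is meaningful only if the assumed sample size matches the data.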