Week 03 Regression
TOD 533
Correlation, Introduction to Regression
Amit Das
TODS / AMSOM / AU
[email protected]
$$r = \frac{\sum Z_x Z_y}{N}, \qquad -1 \le r \le 1$$
• When high (low) z-scores of the two variables co-occur, the
correlation coefficient is larger
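As a rough illustration of the z-score definition of r, the sketch below standardizes two variables with the population standard deviation (divisor N) and averages the products of the z-scores. The function name `pearson_r` and the data are made up for this example and are not from the course material.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson r as the mean product of z-scores (population SD, divisor N)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    zx = [(v - mx) / sx for v in x]   # z-scores of X
    zy = [(v - my) / sy for v in y]   # z-scores of Y
    return sum(a * b for a, b in zip(zx, zy)) / n

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

Pairs where both z-scores are positive, or both negative, contribute positive products and pull r upward; pairs with opposite signs pull it down.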
22-08-2024
Eyeballing correlation
Statistical significance of r
• Null hypothesis: r = 0
• Compute test statistic $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
• Compare against t-distribution with df = n-2
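A minimal sketch of the test statistic for r, assuming a sample correlation and sample size that are invented for the example. The helper name `t_for_r` is hypothetical; the p-value lookup is left to a t table or a statistics package.

```python
import math

def t_for_r(r, n):
    """t statistic for H0: r = 0, to be compared with a t-distribution on n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical values: r = 0.5 observed in a sample of n = 27
r_sample, n = 0.5, 27
t = t_for_r(r_sample, n)
df = n - 2
print(round(t, 3), df)
```

With df = 25, a standard t table gives a two-tailed 5% critical value of about 2.06, so a statistic above that in absolute value would lead to rejecting the null hypothesis in this made-up case.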
Spearman rank correlation:
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
• where d is the difference in the ranks of a given individual for the two
variables
• suitable for ordinal data
• less affected than Pearson r by outliers
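The rank-difference formula can be sketched as follows, assuming there are no tied values (ties need average ranks, which this sketch does not handle). The helper names `ranks` and `spearman_rs` and the data are invented for illustration.

```python
def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rs(x, y):
    """Spearman r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = rank difference."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical data; 1000 is a deliberate outlier in x
x = [10, 20, 30, 40, 1000]
y = [1, 3, 2, 5, 4]
rs = spearman_rs(x, y)
print(round(rs, 4))
```

Because only the ranks enter the calculation, replacing 1000 with 50 leaves the result unchanged, which is one way to see why the rank correlation is less sensitive to outliers.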
[Figure: scatter plot of data points (X) with a fitted line; annotations p, q, and intercept a]
$$B = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2}$$
$$A = \bar{Y} - B\bar{X}$$
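The slope and intercept formulas can be tried out with a short sketch like the one below. The function name `fit_line` and the data are assumptions made for the example.

```python
def fit_line(x, y):
    """Least-squares intercept A and slope B from the raw-sum formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * sum_x / n   # A = mean(Y) - B * mean(X)
    return a, b

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
A, B = fit_line(x, y)
print(round(A, 3), round(B, 3))
```

For these numbers the sums are N = 5, ΣX = 15, ΣY = 20, ΣXY = 66, ΣX² = 55, so B = 30/50 and A follows from the two means.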
[Figure: three scatter plots of Y against X, illustrating B < 0, B > 0, and B = 0]
Predictive power
• R = bivariate correlation between Yobserved and Ypredicted
(how well do they agree?)
• Consider the proportionate reduction in prediction
error (PRE) using the model
$$\text{PRE} = \frac{\sum (Y_{obs} - \bar{Y})^2 - \sum (Y_{obs} - Y_{pred})^2}{\sum (Y_{obs} - \bar{Y})^2}$$
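The proportionate reduction in error can be sketched numerically by comparing the squared error of always predicting the mean of Y with the squared error of the model's predictions. The function name `pre` and the data are hypothetical; the predictions assume a fitted line Y = 2.2 + 0.6X, which is the least-squares line for these made-up points.

```python
from statistics import mean

def pre(y_obs, y_pred):
    """Proportionate reduction in squared prediction error relative to mean(Y)."""
    y_bar = mean(y_obs)
    error_baseline = sum((y - y_bar) ** 2 for y in y_obs)           # predict mean(Y)
    error_model = sum((y - p) ** 2 for y, p in zip(y_obs, y_pred))  # predict with model
    return (error_baseline - error_model) / error_baseline

# Hypothetical data and an assumed fitted line Y = 2.2 + 0.6 X
x = [1, 2, 3, 4, 5]
y_obs = [2, 4, 5, 4, 5]
y_pred = [2.2 + 0.6 * v for v in x]
r2 = pre(y_obs, y_pred)
print(round(r2, 4))
```

For a least-squares line with an intercept, this quantity equals the square of the correlation between the observed and predicted values, which is why it is usually reported as R².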
Hypothesis-testing in regression
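• Consider Y = A + B1X1 + B2X2 + … + BkXk, with k predictors and n observations
• For the null hypothesis H0 that ALL the coefficients Bi
are zero, B1 = B2 = … = Bk = 0
• and the alternate hypothesis Ha that at least one Bi is
NOT zero, Bi ≠ 0
• the test statistic is
$$F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}$$
• compared against the F-distribution with k and n-k-1 df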
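A small sketch of the overall F statistic, taking k as the number of predictors and n as the number of observations. The function name `f_statistic` and the input values are invented for the example; the comparison against the F-distribution is left to a table or a statistics package.

```python
def f_statistic(r2, n, k):
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) for H0: all slopes are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical values: R^2 = 0.6 from one predictor and five observations
r2, n, k = 0.6, 5, 1
f = f_statistic(r2, n, k)
df_numerator, df_denominator = k, n - k - 1
print(round(f, 3), df_numerator, df_denominator)
```

With a single predictor, F is the square of the t statistic for the correlation, so the two tests lead to the same conclusion in that case.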
Significance of coefficients
• Whether each coefficient Bi differs significantly from
zero is tested using the test statistic $B_i / \sigma_{B_i}$
(value of coefficient / standard error)
• compared against t-distribution with n-(k+1) df
• Each coefficient can be tested in this manner
• H0: coefficient is zero vs. Ha: coefficient is not zero
• When a coefficient Bi fails this test, it is not significantly
different from zero, and the term involving Xi can be
dropped from the model
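For the single-predictor case (k = 1, so df = n - 2), the coefficient test can be sketched as below. The standard-error formula used here, sqrt(MSE / Σ(X − X̄)²) with MSE = SSE / (n − 2), is the usual textbook expression for simple regression and is an assumption of this sketch rather than something stated above; the function name `slope_t_test` and the data are also made up.

```python
import math

def slope_t_test(x, y):
    """Return slope B, its standard error, and t = B / SE(B) for simple regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    se_b = math.sqrt(sse / (n - 2) / sxx)                        # assumed textbook formula
    return b, se_b, b / se_b

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, se_b, t = slope_t_test(x, y)
print(round(b, 3), round(se_b, 4), round(t, 3))
```

The resulting t would be compared with a t-distribution on n - 2 = 3 df; in multiple regression the standard errors come from the fitted model's coefficient covariance matrix, which statistical software reports directly.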