Log Reg
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$
• Given a set of points (Xi,Yi), we wish to find
a linear function (or line in 2 dimensions)
that “goes through” these points.
• In general, the points are not exactly
aligned:
– Find line that best fits the points
Residue
• Error or residue:
– Observed value - Predicted value
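As a small illustration of the definition above, the sketch below computes residuals for a few points against a candidate line. The data and the line y = 2x + 1 are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical observations and a hypothetical candidate line y = 2x + 1.
xs = [0.5, 1.0, 1.5, 2.0]
observed = [2.1, 2.9, 4.2, 4.8]


def predict(x, slope=2.0, intercept=1.0):
    """Predicted value on the (assumed) line for a given x."""
    return slope * x + intercept


# Residue = observed value - predicted value, one per point.
residuals = [y - predict(x) for x, y in zip(xs, observed)]

for x, y, r in zip(xs, observed, residuals):
    print(f"x={x:.1f} observed={y:.1f} predicted={predict(x):.1f} residual={r:+.1f}")
```

A positive residue means the point lies above the line, a negative one means it lies below.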
[Figure: scatter plot of the observed values with a fitted linear trend line, labelled "Linear (Observed)"]
Sum-squared Error (SSE)
$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{TSS}}$
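The two sums in this ratio are not written out on the slide. Under the usual definitions (an assumption here, with $\hat{y}_i$ the predicted value for point $i$ and $\bar{y}$ the mean of the observed values), they can be stated as:

```latex
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\qquad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2
\qquad
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{TSS}}
```

SSE adds up the squared residues around the fitted line, while TSS adds up the squared deviations around the mean, so $R^2$ approaches 1 as the line accounts for more of the spread in the data.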
What is Best Fit?
$\beta_0 = \frac{\sum y - \beta_1 \sum x}{n}$

$\beta_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$
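One way to see where these two formulas come from, assuming "best fit" means the line that minimizes the sum-squared error, is to set both partial derivatives of SSE to zero and solve the resulting pair of equations:

```latex
\mathrm{SSE}(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2

\frac{\partial\,\mathrm{SSE}}{\partial \beta_0} = -2 \sum (y_i - \beta_0 - \beta_1 x_i) = 0
\;\Longrightarrow\; \sum y = n \beta_0 + \beta_1 \sum x

\frac{\partial\,\mathrm{SSE}}{\partial \beta_1} = -2 \sum x_i (y_i - \beta_0 - \beta_1 x_i) = 0
\;\Longrightarrow\; \sum xy = \beta_0 \sum x + \beta_1 \sum x^2

\beta_0 = \frac{\sum y - \beta_1 \sum x}{n}
\qquad
\beta_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}
```

The first condition gives $\beta_0$ directly in terms of $\beta_1$. Substituting it into the second condition and multiplying through by $n$ gives the expression for $\beta_1$.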
Example (I)
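x      y       x^2     xy
1.20   4.00    1.44    4.80
2.30   5.60    5.29    12.88
3.10   7.90    9.61    24.49
3.40   8.00    11.56   27.20
4.00   10.10   16.00   40.40
4.60   10.40   21.16   47.84
5.50   12.00   30.25   66.00
Sum:
24.10  58.00   95.31   223.61

$\beta_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2} = \frac{7 \times 223.61 - 24.10 \times 58.00}{7 \times 95.31 - 24.10^2} = \frac{1565.27 - 1397.80}{667.17 - 580.81} = \frac{167.47}{86.36} = 1.94$

$\beta_0 = \frac{\sum y - \beta_1 \sum x}{n} = \frac{58.00 - 1.94 \times 24.10}{7} = \frac{11.27}{7} = 1.61$

Target: y = 2x + 1.5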
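The arithmetic in this example can be re-checked with a few lines of Python. This is a minimal sketch that uses the seven (x, y) pairs from the table and the closed-form slope and intercept formulas. The variable names are illustrative.

```python
# The seven (x, y) observations from the example table.
xs = [1.20, 2.30, 3.10, 3.40, 4.00, 4.60, 5.50]
ys = [4.00, 5.60, 7.90, 8.00, 10.10, 10.40, 12.00]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Closed-form least-squares slope (beta1) and intercept (beta0).
beta1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
beta0 = (sum_y - beta1 * sum_x) / n

print(f"sum x = {sum_x:.2f}, sum y = {sum_y:.2f}")
print(f"sum x^2 = {sum_x2:.2f}, sum xy = {sum_xy:.2f}")
print(f"beta1 = {beta1:.2f}, beta0 = {beta0:.2f}")
```

The fitted line, roughly y = 1.94x + 1.61, is close to the target y = 2x + 1.5.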
Example (II)
[Figure: scatter plot of the observed (x, y) values; x axis from 0.00 to 6.00, y axis from 0.00 to 14.00]
Example (III)
SSE = 0.975
TSS = 47.369

$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{TSS}} = 1 - \frac{0.975}{47.369} = 0.98$
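These figures can be reproduced approximately from the example data. The sketch below assumes the usual definitions (SSE as the sum of squared residues, TSS as the sum of squared deviations from the mean of y) and uses the rounded coefficients 1.94 and 1.61 from Example (I), so the last digit of SSE may differ slightly from the value quoted above.

```python
# Observations from Example (I) and the rounded fitted line y = 1.94x + 1.61.
xs = [1.20, 2.30, 3.10, 3.40, 4.00, 4.60, 5.50]
ys = [4.00, 5.60, 7.90, 8.00, 10.10, 10.40, 12.00]
beta1, beta0 = 1.94, 1.61

predicted = [beta0 + beta1 * x for x in xs]
mean_y = sum(ys) / len(ys)

# SSE: squared residues around the fitted line.
sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))
# TSS: squared deviations around the mean of y.
tss = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - sse / tss

print(f"SSE = {sse:.3f}, TSS = {tss:.3f}, R^2 = {r_squared:.2f}")
```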
Logistic Regression
Observations: For each value of SurvRate, the number of dots is the number of patients with that value of NewOut
Regression: Standard linear regression
Problem: extending the regression line a few units left or right along
the X axis produces predicted probabilities that fall outside of [0,1]
A Better Solution
Regression Curve: Sigmoid function! (bounded by asymptotes y=0 and y=1)
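To make the contrast concrete, the sketch below evaluates a straight line and the standard sigmoid, 1 / (1 + e^(-z)), at a few points. The line's intercept and slope (0.5 and 0.2) are hypothetical values picked for illustration only.

```python
import math


def linear(x, intercept=0.5, slope=0.2):
    """Hypothetical straight-line 'probability' model."""
    return intercept + slope * x


def sigmoid(z):
    """Standard logistic (sigmoid) function, bounded by 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


for x in (-6, -3, 0, 3, 6):
    print(f"x={x:+d} linear={linear(x):+.2f} sigmoid={sigmoid(x):.4f}")
```

The line leaves the interval [0,1] a few units away from the centre, while the sigmoid only approaches its asymptotes.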
Odds
• Given an event that occurs (takes the value 1) with
probability p, the odds of that event are given by:
odds = p / (1–p)
• Consider the following data
                    Delinquent
Testosterone     Yes     No     Total
Normal           402     3614   4016
High             101     345    446
Total            503     3959   4462
• The odds of being delinquent if you are in the
Normal group are:
p_delinquent / (1 – p_delinquent) = (402/4016) / (1 – (402/4016)) =
0.1001 / 0.8999 = 0.111
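The conversion between a probability and its odds can be sketched as a pair of small functions. The function names are illustrative. The counts are the Normal-group figures from the table.

```python
def odds_from_probability(p):
    """odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1.0 - p)


def probability_from_odds(odds):
    """Inverse conversion: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


# Normal group: 402 delinquent out of 4016.
p_normal = 402 / 4016
odds_normal = odds_from_probability(p_normal)

print(f"p = {p_normal:.4f}, 1 - p = {1 - p_normal:.4f}, odds = {odds_normal:.3f}")
```

The odds also equal the ratio of the two counts, 402/3614, because the group total cancels.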
Odds Ratio
• The odds of not being delinquent in the Normal
group are the reciprocal of this:
– 0.8999/0.1001 = 8.99
• Now, for the High testosterone group
– odds(delinquent) = 101/345 = 0.293
– odds(not delinquent) = 345/101 = 3.416
• When we go from Normal to High, the odds of
being delinquent nearly triple:
– Odds ratio: 0.293/0.111 = 2.64
– The odds of being delinquent are 2.64 times as large
with high testosterone levels (a ratio of odds, not of
probabilities)
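The odds ratio can be recomputed directly from the counts in the table. This sketch uses a hypothetical dictionary layout for the 2x2 table. Working from unrounded counts gives about 2.63, and the 2.64 quoted above comes from dividing the already rounded odds 0.293 and 0.111.

```python
# Counts from the table, keyed by testosterone group.
table = {
    "Normal": {"yes": 402, "no": 3614},
    "High": {"yes": 101, "no": 345},
}


def odds(group):
    """Odds of delinquency within a group: delinquent / not delinquent."""
    return table[group]["yes"] / table[group]["no"]


odds_ratio = odds("High") / odds("Normal")

print(f"odds(Normal) = {odds('Normal'):.3f}")
print(f"odds(High)   = {odds('High'):.3f}")
print(f"odds ratio   = {odds_ratio:.2f}")
```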
Logit Transform
$\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \beta_1 X$
• That is, the log odds (logit) is assumed
to be linearly related to the
independent variable X
• So, now we can focus on solving an
ordinary (linear) regression!
Recovering Probabilities
SSE = 0.0028
TSS = 0.5265
$R^2 = 0.9946$
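The formula for this step did not survive on the slide. Assuming the logit model $\operatorname{logit}(p) = \beta_0 + \beta_1 X$ from the previous slide, the probability can be recovered by inverting the transform:

```latex
\ln\frac{p}{1-p} = \beta_0 + \beta_1 X
\;\Longrightarrow\;
\frac{p}{1-p} = e^{\beta_0 + \beta_1 X}
\;\Longrightarrow\;
p = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
```

The result is the sigmoid function, so the recovered probabilities always lie between 0 and 1.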
Summary