Classification Methods
Logistic Regression
Outline
The Default Dataset
Why not Linear Regression?
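Briefly, the issue: modelling the probability directly with a linear model,

$$\Pr(\text{default} = \text{Yes} \mid \text{balance}) = \beta_0 + \beta_1 \cdot \text{balance},$$

places no constraint on the fitted values, so they can fall below 0 for small balances and exceed 1 for large ones. The logistic function introduced on the next slides fixes exactly this.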
Solution: Use Logistic Function
[Figure: the logistic (sigmoid) function plotted for X from -5 to 5; the vertical axis shows the resulting probability, ranging from 0 to 1.]
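As a quick numerical check of the curve above, here is a minimal Python sketch (NumPy only; the helper name `sigmoid` is my own) that evaluates the logistic function over the same range:

```python
import numpy as np

def sigmoid(x):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 11)          # same range as the X axis in the figure
print(np.round(sigmoid(x), 3))      # ~0 at the left end, 0.5 at x = 0, ~1 at the right end
```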
Generalized Linear Model
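For reference, logistic regression is the generalized linear model obtained by pairing a Bernoulli response with the logit link:

$$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}, \qquad \log\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X .$$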
Logistic Function on Default Data
Now the probability of default is close to, but never below, zero for low balances, and close to, but never above, one for high balances.
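A minimal sketch of how this fit could be reproduced in Python with statsmodels, assuming the ISLR Default data is saved locally as `Default.csv` with columns `default` ("Yes"/"No") and `balance` (the file name and column names are assumptions):

```python
import pandas as pd
import statsmodels.api as sm

# Assumed: ISLR Default data as "Default.csv" with columns "default" and "balance".
default = pd.read_csv("Default.csv")
y = (default["default"] == "Yes").astype(int)   # 1 = default, 0 = no default
X = sm.add_constant(default[["balance"]])

fit = sm.Logit(y, X).fit()        # simple logistic regression of default on balance
print(fit.summary())              # coefficients, standard errors, z-statistics, p-values

p_hat = fit.predict(X)            # fitted probabilities
print(p_hat.min(), p_hat.max())   # all strictly between 0 and 1, unlike linear regression
```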
Review of Bernoulli Trials
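As a reminder, a Bernoulli trial has a binary outcome $Y \in \{0, 1\}$ with success probability $p$:

$$P(Y = y) = p^{\,y}(1 - p)^{1 - y}, \qquad E[Y] = p, \qquad \mathrm{Var}(Y) = p(1 - p).$$

In logistic regression each observation is treated as a Bernoulli trial whose success probability depends on the predictors.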
Interpreting the logistic regression coefficients
Odds ratio
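The algebra behind the odds-ratio interpretation:

$$\frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X},$$

so a one-unit increase in $X$ multiplies the odds by $e^{\beta_1}$ (equivalently, adds $\beta_1$ to the log-odds).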
Interpreting the logistic regression coefficients
Intercept
The intercept can only be interpreted assuming zero values for the
predictors
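Concretely, with every predictor set to zero the model reduces to

$$p(0) = \frac{e^{\beta_0}}{1 + e^{\beta_0}},$$

so the intercept fixes the baseline probability when all predictors equal zero.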
Are the coefficients significant?
We still want to perform a hypothesis test to check whether β0 and β1 are significantly different from zero.
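The test statistic used here is the Wald $z$-statistic,

$$z = \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)},$$

which is compared to a standard normal distribution; a large $|z|$ (equivalently, a small p-value) is evidence against the null hypothesis $H_0 : \beta_1 = 0$.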
Example
Here the p-value for balance is very small and the estimated coefficient b1 is positive, so we can be confident that as the balance increases, the probability of default increases as well.
The column labelled “Z-statistic” is the Wald test statistic. What conclusions can we make here?
Making Predictions
Suppose an individual has an average balance of $1000. What is their
probability of default?
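If the coefficient estimates from the ISLR fit are used ($\hat{\beta}_0 \approx -10.6513$, $\hat{\beta}_1 \approx 0.0055$, as in ISLR Table 4.1; the slide's estimates may differ), the calculation is

$$\hat{p}(1000) = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 \cdot 1000}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 \cdot 1000}} = \frac{e^{-10.6513 + 0.0055 \times 1000}}{1 + e^{-10.6513 + 0.0055 \times 1000}} \approx 0.00576,$$

i.e. well under a 1% chance of default.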
Qualitative Predictors in Logistic Regression
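For a qualitative predictor such as student status, a dummy (indicator) variable is used. With $x = 1$ for students and $x = 0$ for non-students, the model becomes

$$\Pr(\text{default} = \text{Yes} \mid \text{student}) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}, \qquad x = \begin{cases} 1 & \text{if student} \\ 0 & \text{otherwise.} \end{cases}$$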
Multiple Logistic Regression
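With $p$ predictors, the model generalizes to

$$\log\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p, \qquad p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}.$$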
Multiple Logistic Regression - Default Data
Income (quantitative)
Student (qualitative)
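A sketch of the corresponding fit in Python, assuming (as in ISLR Table 4.3) that balance is included alongside income and student, and assuming the same `Default.csv` layout as before (file and column names are assumptions):

```python
import pandas as pd
import statsmodels.api as sm

# Assumed: "Default.csv" with columns "default", "student", "balance", "income".
default = pd.read_csv("Default.csv")
y = (default["default"] == "Yes").astype(int)

X = default[["balance", "income"]].copy()
X["student_yes"] = (default["student"] == "Yes").astype(int)  # dummy-coded qualitative predictor
X = sm.add_constant(X)

fit = sm.Logit(y, X).fit()
print(fit.summary())   # note the sign of the student coefficient once balance is controlled for
```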
An Apparent Contradiction!
Students (Orange) vs. Non-students (Blue)
To whom should credit be offered?
A student is riskier than a non-student if no information about the credit card balance is available. However, that student is less risky than a non-student with the same credit card balance!
For example, compare a student with a credit card balance of $1,500 and an income of $40,000 to a non-student with the same balance and income: the student has the lower estimated probability of default of the two.
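Using the multiple-regression estimates reported in ISLR Table 4.3 ($\hat{\beta}_0 \approx -10.869$, $\hat{\beta}_{\text{balance}} \approx 0.00574$, $\hat{\beta}_{\text{income}} \approx 0.003$ with income measured in thousands of dollars, $\hat{\beta}_{\text{student}} \approx -0.6468$; the slide's numbers may differ), the two probabilities work out to roughly

$$\hat{p}_{\text{student}} = \frac{e^{-10.869 + 0.00574 \times 1500 + 0.003 \times 40 - 0.6468}}{1 + e^{-10.869 + 0.00574 \times 1500 + 0.003 \times 40 - 0.6468}} \approx 0.058, \qquad \hat{p}_{\text{non-student}} = \frac{e^{-10.869 + 0.00574 \times 1500 + 0.003 \times 40}}{1 + e^{-10.869 + 0.00574 \times 1500 + 0.003 \times 40}} \approx 0.105.$$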
Confounding
A confounder is a third variable, not itself of primary interest, that distorts the observed relationship between the independent variable and the outcome.
In the Default data, balance plays this role: students tend to carry higher balances, which is why student status looks risky on its own but protective once balance is held fixed.
Logistic regression with more than two classes
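In the usual multinomial (baseline-category) extension, one class, say the $K$th, is chosen as a baseline and

$$\Pr(Y = k \mid X = x) = \frac{e^{\beta_{k0} + \beta_{k1} x_1 + \cdots + \beta_{kp} x_p}}{1 + \sum_{l=1}^{K-1} e^{\beta_{l0} + \beta_{l1} x_1 + \cdots + \beta_{lp} x_p}} \quad \text{for } k = 1, \ldots, K-1, \qquad \Pr(Y = K \mid X = x) = \frac{1}{1 + \sum_{l=1}^{K-1} e^{\beta_{l0} + \beta_{l1} x_1 + \cdots + \beta_{lp} x_p}}.$$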
Softmax Coding
In the softmax coding, rather than selecting a baseline class, we treat all K classes symmetrically and assume that for k = 1, …, K,

$$\Pr(Y = k \mid X = x) = \frac{e^{\beta_{k0} + \beta_{k1} x_1 + \cdots + \beta_{kp} x_p}}{\sum_{l=1}^{K} e^{\beta_{l0} + \beta_{l1} x_1 + \cdots + \beta_{lp} x_p}}.$$

The log odds ratio between the kth and k′th classes then equals

$$\log\!\left(\frac{\Pr(Y = k \mid X = x)}{\Pr(Y = k' \mid X = x)}\right) = (\beta_{k0} - \beta_{k'0}) + (\beta_{k1} - \beta_{k'1})\, x_1 + \cdots + (\beta_{kp} - \beta_{k'p})\, x_p.$$