Logistic Regression
Multivariate Analysis
Some interesting videos
• http://www.jmp.com/en_us/learning-library/correlation-and-regression.html
• Check:
– Simple Logistic Regression
– Multiple Logistic Regression
Definition: Logistic regression
• Logistic regression is one of the dependence techniques in which the
dependent variable is discrete and, more specifically, binary. That is, it
takes on only two possible values.
– Here are some examples:
– Will a credit card applicant pay off a bill or not?
– Will a mortgage applicant default?
– Will someone who receives a direct mail solicitation respond to the solicitation?
– In each of these cases, the answer is either “yes” or “no.”
• Such a categorical variable cannot directly be used as a dependent
variable in a regression.
• But a simple transformation solves the problem: Let the dependent
variable Y take on the value 1 for “yes” and 0 for “no.”
Logistic Regression
• Purposes
– To explain the behavior of a QUALITATIVE dependent variable (Y)
• Why Y is not equal for all the observations
(why not all the observations belong to the same group)
– To estimate the effect of one or more quantitative or qualitative
explanatory variables (X)
• Which X's explain the behavior of Y
• How each X influences Y
– To predict the Y value
• How well we have explained Y
(how many observations are well classified)
• Logical problems with a linear model:
– The estimate of P(Y=1) is not guaranteed to lie between 0 and 1
– The relationship between X and Y, through the probability, may be nonlinear
Solution: look for a nonlinear function that can represent the relation
between X and the probability that Y=1
① Logistic function (LOGIT):
$$F(z) = \frac{e^{z}}{1+e^{z}}$$
② Cumulative normal distribution (PROBIT):
$$F(z) = \int_{-\infty}^{z} \frac{1}{(2\pi)^{1/2}}\, e^{-t^{2}/2}\, dt$$
[Figure: both functions plotted as S-shaped curves rising from 0 to 1]
• Consider taking the natural logarithm of both sides. The left side becomes
$\ln[p/(1-p)]$, and the log of the odds is called the logit. The right side
becomes $z$ (since $\ln(e^{z}) = z$), so that we have the relation
$$\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X$$
• We could estimate this model by linear regression and obtain estimates $b_0$
of $\beta_0$ and $b_1$ of $\beta_1$ if only we knew the log of the odds for each
observation. Since we do not know the log of the odds for each
observation, we will use a form of nonlinear regression called logistic
regression to estimate the model above.
• In so doing, we obtain the desired estimates $b_0$ of $\beta_0$ and $b_1$ of $\beta_1$. The
estimated probability for an observation $X_i$ will be
$$\hat{p}_i = \frac{e^{b_0 + b_1 X_i}}{1 + e^{b_0 + b_1 X_i}}$$
The Logistic function
• and the corresponding estimated logit will be
$$\hat{z}_i = b_0 + b_1 X_i$$
• which leads to a natural interpretation of the estimated coefficient in a
logistic regression: $b_1$ is the estimated change in the logit (log odds) for a
one-unit change in X.
LOGIT with a binary variable
$$\mathrm{Prob}(y_i = 1) = p_i = \frac{e^{z_i}}{1+e^{z_i}} = \frac{1}{1+e^{-z_i}}$$
where $z_i = B_0 + B_1 X_{1i} + \dots + B_k X_{ki}$
$$\mathrm{Prob}(y_i = 0) = 1 - p_i = \frac{1}{1+e^{z_i}}$$
$$\mathrm{odds} = \frac{\mathrm{Prob}(y_i = 1)}{\mathrm{Prob}(y_i = 0)} = \frac{p_i}{1-p_i} = e^{z_i}$$
$$\ln\!\left(\frac{p_i}{1-p_i}\right) = B_0 + B_1 X_{1i} + \dots + B_k X_{ki}$$
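These identities are easy to check numerically. Below is a minimal Python sketch (the coefficients B0 and B1 and the value of x are hypothetical, chosen only for illustration):

```python
import numpy as np

def logistic_prob(z):
    """P(y = 1) = e^z / (1 + e^z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

B0, B1 = 0.5, -1.2        # hypothetical coefficients
x = 2.0                   # hypothetical observation
z = B0 + B1 * x           # linear predictor z_i
p = logistic_prob(z)      # Prob(y_i = 1)
odds = p / (1 - p)        # equals e^z
logit = np.log(odds)      # equals z

print(p, odds, logit)     # odds == exp(z) and logit == z, up to rounding
```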
Odds and probabilities relating country of
origin and fuel consumption (1st example)
P(Low Consumption) = 202/406 = 49.75%
P(Low Consumption | USA) = 76/253 = 30.03%
P(Low Consumption | Other) = 126/153 = 82.35%
Odds(Low vs High Consumption) = 202/204 = 0.9901
Odds(Low vs High Consumption | USA) = 76/177 = 0.3003/(1 − 0.3003) = 0.4293
Odds(Low vs High Consumption | Other) = 126/27 = 4.6667
Odds(Low vs High Consumption | USA vs Other) = 0.4293/4.6667 = 0.092
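These figures can be reproduced directly from the cell counts; a short Python sketch using the counts above:

```python
# Cell counts from the fuel-consumption example
low_usa, high_usa = 76, 177      # USA: low vs high consumption
low_other, high_other = 126, 27  # Other: low vs high consumption

p_low = (low_usa + low_other) / (low_usa + high_usa + low_other + high_other)
odds_low = (low_usa + low_other) / (high_usa + high_other)  # 202/204 = 0.9901
odds_usa = low_usa / high_usa                               # 0.4293
odds_other = low_other / high_other                         # 4.6667
odds_ratio = odds_usa / odds_other                          # 0.092

print(round(p_low, 4), round(odds_low, 4), round(odds_ratio, 3))
```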
Odds and logit
Odds(Low vs High Consumption | USA vs Other) = 0.4293/4.6667 = 0.092
The coefficients B are estimated by maximizing the likelihood function:
$$L = \prod_{i=1}^{n} p_i^{\,y_i}\,(1-p_i)^{1-y_i} = \prod_{i=1}^{n} \left(\frac{e^{BX_i}}{1+e^{BX_i}}\right)^{\!y_i} \left(\frac{1}{1+e^{BX_i}}\right)^{\!1-y_i}$$
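As a sketch of what maximizing this likelihood involves (synthetic data and scipy, not JMP's internal algorithm):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(beta, X, y):
    """Negative log of L = prod p_i^y_i (1 - p_i)^(1 - y_i)."""
    z = X @ beta
    # log p_i = z - log(1 + e^z); log(1 - p_i) = -log(1 + e^z)
    return -np.sum(y * z - np.log1p(np.exp(z)))

# Synthetic data with known coefficients (-0.5, 1.5)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
p_true = 1 / (1 + np.exp(-(-0.5 + 1.5 * x)))
y = (rng.uniform(size=200) < p_true).astype(float)

X = np.column_stack([np.ones_like(x), x])  # intercept + one regressor
result = minimize(neg_log_likelihood, x0=np.zeros(2), args=(X, y))
print(result.x)  # estimates b0, b1: close to (-0.5, 1.5)
```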
• For a discussion of other statistics found here, such as BIC and Entropy RSquare, see the JMP Help.
Example: global significance
Other measures of the global usefulness of the model
$$Z^2 = \sum \frac{\left(Y_{\text{observed}} - Y_{\text{predicted}}\right)^2}{P(1-P)}$$
• Some measures similar to the R² of the linear regression model
(proportion of explained variance):
Cox & Snell:
$$R^2 = 1 - \left[\frac{L_0}{L^*}\right]^{2/N}$$
Nagelkerke:
$$R^2_{N} = \frac{R^2}{R^2_{\max}}$$
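A minimal sketch of these measures in Python, assuming the log-likelihoods of the null model (L0) and the fitted model (L*) are available; the numbers in the example call are hypothetical:

```python
import numpy as np

def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell R^2 = 1 - (L0 / L*)^(2/N), computed from log-likelihoods."""
    return 1.0 - np.exp((2.0 / n) * (ll_null - ll_model))

def nagelkerke_r2(ll_null, ll_model, n):
    """Rescale Cox & Snell by its maximum attainable value R^2_max."""
    r2_max = 1.0 - np.exp((2.0 / n) * ll_null)
    return cox_snell_r2(ll_null, ll_model, n) / r2_max

# Hypothetical log-likelihoods for n = 406 observations
print(cox_snell_r2(-281.0, -220.0, 406), nagelkerke_r2(-281.0, -220.0, 406))
```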
Significance of each variable
• H0: Bi = 0 (Xi does not explain Y)
Ha: Bi ≠ 0 (Xi explains Y)
• This hypothesis is tested with a χ² statistic
The fourth column (Most Likely PassClass) classifies the observation as either 1
or 0, depending upon whether the probability is greater than or less than 50%.
Confusion Matrix
• We can observe how well our model classifies all the observations (using this cut-off
point of 50%) by producing a confusion matrix: Click the red triangle and click
Confusion matrix.
• It compares the predictions with the sample data:
– For each observation, Prob(Y=1) is computed
– An observation is assigned to the group Y=1 if Prob(Y=1) exceeds a
specified value (usually, the cut-off is 0.5)
– The percentage of correctly classified cases is computed
• The rows of the confusion matrix are the actual
classification.
• The columns are the predicted classification from
the model (that is, the predicted 0/1 values from that
last fourth column using our logistic model and a
cutpoint of .50).
• Correct classifications are along the main diagonal
from upper left to lower right.
• The values on the other diagonal are
misclassifications.
Predictive effectiveness of the
classification
• The classification obtained with the model can be compared with the
classification that would be obtained at random, using the Huberty
test, which is normally distributed:
$$H = \frac{(o - e)\sqrt{n}}{\sqrt{e\,(n - e)}}\,;\quad \text{where } e = \frac{1}{n}\left(n_1^2 + n_2^2\right)$$
o: correctly classified observations
n: total number of observations; $n_i$: number of observations in group i
In our example:
$$e = \frac{1}{406}\left(204^2 + 202^2\right) = 203.004\,;\quad H = \frac{(355 - 203.004)\sqrt{406}}{\sqrt{203.004\,(406 - 203.004)}} = 15.08 > 1.96$$
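The calculation is easy to verify; this Python sketch reproduces the numbers from the example:

```python
import math

def huberty(o, n, n1, n2):
    """H = (o - e) * sqrt(n) / sqrt(e * (n - e)), with e = (n1^2 + n2^2) / n."""
    e = (n1 ** 2 + n2 ** 2) / n
    return (o - e) * math.sqrt(n) / math.sqrt(e * (n - e))

# 355 of 406 observations correctly classified; group sizes 204 and 202
print(huberty(o=355, n=406, n1=204, n2=202))  # about 15.08 > 1.96
```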
Second example
• Suppose we open a small data set toylogistic.jmp, containing students’
midterm exam scores (MidtermScore) and whether the student passed
the class (PassClass=1 if pass, PassClass=0 if fail).
• A passing grade for the midterm is 70. The first thing to do is create a
dummy variable to indicate whether the student passed the midterm:
PassMidterm = 1 if MidtermScore ≥ 70 and PassMidterm = 0 otherwise:
– Select Cols→New Column to open the New Column dialog box.
– In the Column Name text box, for our new dummy variable, type PassMidterm.
– Click the drop-down box for modeling type and change it to Nominal.
– Click the drop-down box for Column Properties and select Formula. The Formula dialog
box appears.
– Under Functions, click Conditional→If.
– Under Table Columns, click MidtermScore so that it appears in the top box to the right
of the If.
– Under Functions, click Comparison→“a>=b”.
– In the formula box to the right of >=, enter 70. Press the Tab key.
– Click in the box to the right of the ⇒, and enter the number 1.
– Similarly, enter 0 for the else clause and accept twice.
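Outside JMP, the same dummy variable can be created in one step with pandas; the scores below are hypothetical stand-ins for the toylogistic data:

```python
import pandas as pd

# Hypothetical midterm scores (placeholders for toylogistic.jmp)
df = pd.DataFrame({"MidtermScore": [65, 72, 88, 69, 91]})

# PassMidterm = 1 if MidtermScore >= 70, else 0
df["PassMidterm"] = (df["MidtermScore"] >= 70).astype(int)
print(df)
```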
The Logistic function: example
• First, let us use a traditional contingency
table analysis to determine the odds ratio.
• Make sure that both PassClass and
PassMidterm are classified as nominal
variables.
– Right-click in the data grid of the column
PassClass and select Column Info.
– Click the black triangle next to Modeling Type
and select Nominal→OK. Do the same for
PassMidterm.
• Select Analyze→Tabulate to open the
Control Panel. It shows the general layout
for a table.
– Drag PassClass into the Drop zone for columns
and select Add Grouping Columns.
– Now that data have been added, the words
Drop zone for rows will no longer be visible, but
the Drop zone for rows will still be in the lower
left panel of the table.
The Logistic function: example
• Drag PassMidterm to the panel
immediately to the left of the 8 in the table.
• Select Add Grouping Columns. Click
Done.
• A contingency table will appear.
The probability of passing the class when you did not pass the midterm is
The probability of not passing the class when you did not pass the midterm is
(similar to row percentages).
The odds of passing the class given that you have failed the midterm are
The Logistic function: example
Similarly, we calculate the odds of passing the class given that you have passed the
midterm as:
Of the students that did pass the midterm, the odds are the number of students that
pass the class divided by the number of students that did not pass the class.
In the above paragraphs, we spoke only of odds. Now let us calculate an odds ratio.
Suppose we want to know the odds ratio of passing the class by comparing those who
pass the midterm (PassMidterm=1 in the numerator) to those who fail the midterm
(PassMidterm=0 in the denominator). The usual calculation leads to:
$$\text{odds ratio} = \frac{\text{odds(pass class} \mid \text{pass midterm)}}{\text{odds(pass class} \mid \text{fail midterm)}} = 8.33$$
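As a Python sketch of this calculation: the 2×2 counts below are a reconstruction chosen to be consistent with the coefficients reported later (intercept 0.9163 ≈ ln(5/2), slope −2.1203), so treat them as illustrative rather than as the actual table:

```python
# Reconstructed 2x2 counts (illustrative): class outcome by midterm outcome
pass_class_pass_mid, fail_class_pass_mid = 10, 3  # passed the midterm
pass_class_fail_mid, fail_class_fail_mid = 2, 5   # failed the midterm

odds_given_pass_mid = pass_class_pass_mid / fail_class_pass_mid  # 10/3 = 3.33
odds_given_fail_mid = pass_class_fail_mid / fail_class_fail_mid  # 2/5  = 0.40

print(odds_given_pass_mid / odds_given_fail_mid)  # odds ratio = 8.33
```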
The Logistic function: example
which has the following interpretation: the odds of passing the class if you pass
the midterm are 8.33 times the odds of passing the class if you fail the midterm.
Of course, the user doesn’t have to perform all these calculations by hand; JMP
will do them automatically. When a logistic regression has been run, simply
clicking the red triangle and selecting Odds Ratios will do the trick.
The Logistic function: example
• Equivalently, we could compare those who fail the midterm
(PassMidterm=0 in the numerator) to those who pass the midterm
(PassMidterm=1 in the denominator) and calculate 1/8.33 = 0.12,
• which tells us that the odds of passing the class for a student who fails the
midterm are 0.12 times the odds of passing the class for a student who
passes the midterm.
• It is easier to interpret the odds ratio when it is less than 1 by using the
following transformation: (OR – 1)*100%.
– Compared to a person who passes the midterm, a person who fails the midterm is 12%
as likely to pass the class; or equivalently, a person who fails the midterm is 88% less
likely, (OR – 1)*100% = (0.12 – 1)*100% = -88%, to pass the class than someone who
passed the midterm. Note that the log-odds are ln(0.12) = -2.12.
The Logistic function: example
• The relationships between probabilities, odds (ratios), and log-odds (ratios) are
straightforward.
• An event with a small probability has small odds, and also has small log-odds.
• An event with a large probability has large odds and also large log-odds.
• Probabilities are always between zero and unity; odds are bounded below by zero
but can be arbitrarily large; log-odds can be positive or negative and are not
bounded. In particular, if the odds ratio is 1 (so the probability of either event is
0.50), then the log-odds equal zero.
• Suppose π = 0.55, so the odds are 0.55/0.45 = 1.222. Then we say that the
event in the numerator is (1.222 – 1)*100% = 22.2% more likely to occur than the
event in the denominator.
Odds ratio in Logistic regression
• Different software applications adopt different conventions for handling the
expression of odds ratios in logistic regression. By default, JMP uses the “log
odds of 0/1” convention, which puts the 0 in the numerator and the 1 in the
denominator. This is a consequence of the sort order of the columns, which we
will address shortly.
• To see the practical importance of this, we can simply run a logistic regression. It
is important to make sure that PassClass is nominal and that PassMidterm is
continuous.
– If you have been following along with the book, both variables ought to be classified as nominal, so
PassMidterm needs to be changed to continuous. Right-click in the column PassMidterm in the
data grid and select Column Info. Click the black triangle next to Modeling Type and select
Continuous, and then click OK.
• From the top menu, select Analyze→Fit Model. Select PassClass→Y. Select
PassMidterm→Add. Click Run.
Odds ratio in Logistic regression
• The intercept is 0.91629073, and the slope is -2.1202635.
• The slope gives the expected change in the logit for a one-unit change in the
independent variable (i.e., the expected change on the log of the odds ratio).
• However, if we simply exponentiate the slope (i.e., compute $e^{-2.1202635} = 0.12$),
then we get the 0/1 odds ratio.
• There is no need for us to exponentiate the coefficient manually. JMP will do this
for us:
– Click the red triangle and click Odds Ratios. The Odds Ratios tables are added to the JMP output.
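For comparison outside JMP, here is a Python sketch using statsmodels; the data are rebuilt from the counts implied by the reported coefficients (so treat them as illustrative), and statsmodels models P(y = 1), so the signs are flipped relative to JMP's 0/1 convention:

```python
import numpy as np
import statsmodels.api as sm

# Rebuilt data (illustrative): 7 students failed the midterm (2 passed the
# class, 5 did not); 13 passed the midterm (10 passed the class, 3 did not)
pass_midterm = np.array([0] * 7 + [1] * 13, dtype=float)
pass_class = np.array([1] * 2 + [0] * 5 + [1] * 10 + [0] * 3)

X = sm.add_constant(pass_midterm)           # intercept + PassMidterm
fit = sm.Logit(pass_class, X).fit(disp=0)   # models P(PassClass = 1)

print(fit.params)          # about [-0.916, 2.120]; JMP's 0/1 convention flips signs
print(np.exp(fit.params))  # about [0.40, 8.33]: baseline odds and odds ratio
```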
The fourth column (Most Likely PassClass) classifies the observation as either 1
or 0, depending upon whether the probability is greater than or less than 50%.
The Logistic function: example
• We can observe how well our model classifies all the
observations (using this cut-off point of 50%) by
producing a confusion matrix: Click the red triangle and
click Confusion matrix.
• The rows of the confusion matrix are the actual
classification (that is, whether PassClass is 0 or 1).
• The columns are the predicted classification from the
model (that is, the predicted 0/1 values from that last
fourth column using our logistic model and a cutpoint of
.50).
• Correct classifications are along the main diagonal
from upper left to lower right.
– We see that the model has classified 6 students as not passing
the class, and actually they did not pass the class.
– The model also classifies 10 students as passing the class when
they actually did.
• The values on the other diagonal, both equal to 2, are
misclassifications.
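A minimal sketch of how such a confusion matrix is assembled from predicted probabilities and a 0.50 cut-off (the probabilities and outcomes below are hypothetical):

```python
import numpy as np

# Hypothetical predicted probabilities and actual outcomes
p_hat = np.array([0.20, 0.35, 0.45, 0.55, 0.60, 0.80, 0.90, 0.30])
actual = np.array([0, 0, 1, 0, 1, 1, 1, 0])

predicted = (p_hat > 0.5).astype(int)  # cut-off point of 50%

# Rows = actual classification, columns = predicted classification
matrix = np.zeros((2, 2), dtype=int)
for a, p in zip(actual, predicted):
    matrix[a, p] += 1

print(matrix)
print("correctly classified:", np.trace(matrix) / len(actual))
```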
Model’s assumptions
• Before we can use the model, we have to check the model’s assumptions.
The first step is to verify the linearity of the logit.
• This can be done by plotting the estimated logit against MidtermScore.
– Select Graph→Scatterplot Matrix.
– Select Lin[0]→Y, columns.
– Select MidtermScore→X.
– Click OK.
• The linearity assumption appears to be perfectly satisfied.
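A hypothetical Python sketch of this diagnostic plot (coefficients invented for illustration); note that with a single predictor, the saved logit Lin[0] is b0 + b1·MidtermScore, a linear function of the score by construction:

```python
import numpy as np
import matplotlib.pyplot as plt

b0, b1 = -25.0, 0.36                  # hypothetical fitted coefficients
score = np.linspace(50, 100, 20)      # range of midterm scores
lin0 = b0 + b1 * score                # estimated logit (Lin[0]) per score

plt.scatter(score, lin0)
plt.xlabel("MidtermScore")
plt.ylabel("Lin[0] (estimated logit)")
plt.title("Linearity of the logit")
plt.show()
```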
Variable Y with different categories
When Y has more than two categories we find different
models:
• Multinomial Logit: Y is nominal
• Conditional Logit: the choice of one alternative of Y depends
on a previous choice
• Ordinal Logit: Y is ordinal
• Sequential Logit: reaching the second level of Y requires
passing through the first
• Poisson regression: count variables (0, 1, 2, ...)