Logistic Regression

Multivariate  Analysis
Some  interesting  videos
• http://www.jmp.com/en_us/learning-library/correlation-and-regression.html

• Check:  
– Simple  Logistic Regression
– Multiple Logistic Regression
Definition Logistic regression
• Logistic  regression  is  one  of  the  dependence  techniques  in  which  the  
dependent  variable  is  discrete  and,  more  specifically,  binary.  That  is,  it  
takes  on  only  two  possible  values.  
– Here  are  some  examples:  
– Will  a  credit  card  applicant  pay  off  a  bill  or  not?  
– Will  a  mortgage  applicant  default?  
– Will  someone  who  receives  a  direct  mail  solicitation  respond  to  the  solicitation?  
– In  each  of  these  cases,  the  answer  is  either  “yes”  or  “no.”  
• Such  a  categorical  variable  cannot  directly  be  used  as  a  dependent  
variable  in  a  regression.  
• But  a  simple  transformation  solves  the  problem:  Let  the  dependent  
variable  Y  take  on  the  value  1  for  “yes”  and  0  for  “no.”  
Logistic  Regressions*
• Purposes
– To explain the behavior of a QUALITATIVE dependent variable (Y)
• Why Y is not equal for all the observations
(why not all the observations belong to the same group)
– To estimate the effect of one or more quantitative or qualitative
explanatory variables (X)
• Which X's explain the behavior of Y
• How each X influences Y
– To predict the Y value
• How well we have explained Y
(how many observations are well classified)

• Notes based on professor Teresa Obis’ material


Linear  regression  model
• When Y is binary
– 0 = something is not happening
– 1 = something is happening
• The probability of obtaining 1 is what is estimated:

Ŷ = E(Y) = 0·P(Y=0) + 1·P(Y=1) = P(Y=1)

• A linear regression model could be used:
Y = B0 + B1 X1 + ... + Bk Xk + ε
• B1 shows the change in the probability of Y=1 when X1
increases by 1 unit
• From the theory of regression, we also know that E[Yi] = a +
b*Xi. (Here we use simple regression, but the same holds true
for multiple regression).
Linear  regression  model

• Combining these two results, we have P[Yi=1] = a + b*Xi.


• We can see that, in the case of a binary dependent variable, the
regression may be interpreted as a probability.
• We then seek to use this regression to estimate the probability
that Y takes on the value 1. If the estimated probability is high
enough, say above 0.5, then we predict 1; conversely, if the
estimated probability of a 1 is low enough, say below 0.5, then
we predict 0.
The  Linear  Probability  Model  
(LPM)
• When linear regression is applied to a binary dependent variable, it
is commonly called the Linear Probability Model (LPM).
• Traditional linear regression is designed for a continuous
dependent variable, and is not well-suited to handling a binary
dependent variable.
• Three primary difficulties arise in the LPM.
• First, the predictions from a linear regression do not necessarily fall
between zero and one.
– What are we to make of a predicted probability greater than one? How do we
interpret a negative probability? A model that is capable of producing such
nonsensical results does not inspire confidence.
Example: eating fish depends on age

The estimation does not fit
between 0 and 1:
Y = −0.164528 + 0.0187457·X
If X = 65,
Y estimated = 1.0539
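A quick check of this calculation, a minimal Python sketch using the fitted coefficients shown on this slide:

```python
# Coefficients taken from the fitted line on this slide.
b0, b1 = -0.164528, 0.0187457
x = 65
y_hat = b0 + b1 * x      # the LPM "probability" of eating fish at age 65
print(round(y_hat, 4))   # 1.0539 -- above 1, outside the admissible [0, 1] range
```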
The  Linear  Probability  Model  
(LPM)
• Second, for any given predicted value of y (denoted ŷ), the residual
(resid = y − ŷ) can take only two values. For example, if ŷ = 0.37, then the
only possible values for the residual are resid = −0.37 or resid = 0.63 (= 1
– 0.37), because it has to be the case that ŷ + resid equals zero or one.
• Clearly, the residuals will not be normal.
• Plotting a graph of ŷ versus resid will produce not a nice scatter of points,
but two parallel lines.
– The reader should verify this assertion by running such a regression and making the
requisite scatterplot.
• A further implication of the fact that the residual can take on only two
values for any ŷ is that the residuals are heteroscedastic.
– This  violates  the  linear  regression  assumption  of  homoscedasticity  (constant  variance).  
• The  estimates  of  the  standard  errors  of  the  regression  coefficients  will   not  
be  stable  and  inference  will   be  unreliable.  
The  Linear  Probability  Model  
(LPM)
• Third,  the  linearity  assumption  is  likely  to  be  invalid,  especially   at  the  
extremes  of  the  independent  variable.  
– Suppose  we  are  modeling  the  probability  that  a  consumer  will  pay  back  a  $10,000  loan  
as  a  function  of  his/her  income.  
– The  dependent  variable  is  binary,  1  =  the  consumer  pays  back  the  loan,  0  =  the  
consumer  does  not  pay  back  the  loan.  
– The  independent  variable  is  income,  measured  in  dollars.  
– A  consumer  whose  income  is  $50,000  might  have  a  probability  of  0.5  of  paying  back  
the  loan.  
– If  the  consumer’s  income  is  increased  by  $5,000,  then  the  probability  of  paying  back  
the  loan  might  increase  to  0.55,  so  that  every  $1,000  increase  in  income  increases  the  
probability  of  paying  back  the  loan  by  1%.  
– A  person  with  an  income  of  $150,000  (who  can  pay  the  loan  back  very  easily)  might  
have  a  probability  of  0.99  of  paying  back  the  loan.  
– What  happens  to  this  probability  when  the  consumer’s  income  is  increased  by  $5,000?  
Probability cannot increase by 5%, because then it would exceed 100%; yet according
to the linearity assumption of linear regression, it must do so.
Therefore…LPM  presents  
different  limitations
• Econometric problems:
– ε does not follow a normal distribution
– The variance of ε is not constant (heteroscedasticity),
so the estimators are not efficient

• Logical problems:
– The estimation, P(Y=1), is not constrained between 0 and 1
– The relationship between X and Y, expressed as a
probability, may not be linear
Solution: look for a non-linear function that can represent
the relation between X and the probability that Y=1

① Logistic function (LOGIT):
F(z) = e^z / (1 + e^z)

② Cumulative normal distribution (PROBIT):
F(z) = ∫ from −∞ to z  [ 1 / (2π)^(1/2) ] · e^(−t²/2) dt

③ Others (see Maddala or Greene: Cauchy, Burr, ...)

[Plot: both F(z) curves are S-shaped, rising from 0 to 1.]
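A minimal Python sketch of these two link functions (the probit written with the error function), checking that both stay between 0 and 1:

```python
import math

def logit_link(z):
    """Logistic function: F(z) = e^z / (1 + e^z)."""
    return math.exp(z) / (1.0 + math.exp(z))

def probit_link(z):
    """Cumulative standard normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-4, -2, 0, 2, 4):
    # Both links are S-shaped and always return values strictly between 0 and 1.
    print(z, round(logit_link(z), 3), round(probit_link(z), 3))
```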
The  Logistic  function
• A better way to model P[Yi=1] would be to use a function that is
not linear, one that increases slowly when P[Yi=1] is close to
zero or one, and that increases more rapidly in between. It would
have an "S" shape. One such function is the logistic function:
G(z) = e^z / (1 + e^z)

[Figure: cumulative distribution function of the logistic function.]

• Another useful representation of the logistic function is
G(z) = 1 / (1 + e^(−z))
The  Logistic  function
• Recognize that the y-axis, G(z), is a probability and let G(z) = π, the
probability of the event occurring.
• We can form the odds ratio (the probability of the event occurring divided
by the probability of the event not occurring) and do some simplifying:
π / (1 − π) = e^z
• Consider taking the natural logarithm of both sides. The left side will
become log[π / (1 − π)], and the log of the odds ratio is called the logit. The
right side will become z (since log(e^z) = z), so that we have the relation
log[π / (1 − π)] = z
• and this is called the logit transformation.
The  Logistic  function
• If we model the logit as a linear function of X (i.e., let z = β0 + β1X), then we
have
log[π / (1 − π)] = β0 + β1X
• We could estimate this model by linear regression and obtain estimates b0
of β0 and b1 of β1 if only we knew the log of the odds ratio for each
observation. Since we do not know the log of the odds ratio for each
observation, we will use a form of nonlinear regression called logistic
regression to estimate the model below:
π = e^(β0 + β1X) / (1 + e^(β0 + β1X))
• In so doing, we obtain the desired estimates b0 of β0 and b1 of β1. The
estimated probability for an observation Xi will be
p̂i = e^(b0 + b1·Xi) / (1 + e^(b0 + b1·Xi))
The  Logistic  function
• and the corresponding estimated logit will be
log[p̂i / (1 − p̂i)] = b0 + b1·Xi
• which leads to a natural interpretation of the estimated coefficient in a
logistic regression: b1 is the estimated change in the logit (log odds) for a
one-unit change in X.
LOGIT    with  a  binary  variable
Prob(yi = 1) = pi = e^(zi) / (1 + e^(zi)) = 1 / (1 + e^(−zi))

where zi = B0 + B1·X1i + ... + Bk·Xki

Prob(yi = 0) = 1 − pi = 1 / (1 + e^(zi))

odds = Prob(yi = 1) / Prob(yi = 0) = pi / (1 − pi) = e^(zi)

ln( pi / (1 − pi) ) = B0 + B1·X1i + ... + Bk·Xki
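A minimal Python sketch of these formulas; the coefficient and predictor values in the usage line are made up purely for illustration:

```python
import math

def logit_model(x, b):
    """P(y=1), P(y=0), odds and log-odds for z = B0 + B1*x1 + ... + Bk*xk."""
    z = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
    p1 = 1.0 / (1.0 + math.exp(-z))      # Prob(y = 1) = e^z / (1 + e^z)
    p0 = 1.0 - p1                        # Prob(y = 0) = 1 / (1 + e^z)
    odds = p1 / p0                       # = e^z
    return p1, p0, odds, math.log(odds)  # the log-odds equals z itself

# Illustrative coefficients [B0, B1] and a single predictor value (not from the slides):
print(logit_model([1.0], [0.5, 1.2]))
```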
Odds  and  probabilities  relating  country  of  
origin  and  fuel  consumption  (1st example)

P(Low Consu) = 202/406 = 49.75%
P(Low Consu | EEUU) = 76/253 = 30.03%
P(Low Consu | Other) = 126/153 = 82.35%
Odds(Low Consu vs High Consu) = 202/204 = 0.9901
Odds(Low Consu vs High Consu | EEUU) = 76/177 = 0.3003/(1 − 0.3003) = 0.4293
Odds(Low Consu vs High Consu | Other) = 126/27 = 4.6667
Odds(Low Consu vs High Consu | EEUU vs Other) = 0.4293/4.6667 = 0.092
Odds  and  logit
Odds(Low Consu vs High Consu | EEUU vs Other) = 0.4293 / 4.6667 = 0.092

ln(odds(Low Consu vs High Consu | EEUU)) = ln(0.4293) = −0.8455
ln(odds(Low Consu vs High Consu | Other)) = ln(4.6667) = 1.540

Logit result:

ln(odds(Low Consu)) = 1.540 − 2.386·origen

If origen = 0 (Other): ln(odds(Low Consu | Other)) = 1.54
If origen = 1 (EEUU): ln(odds(Low Consu | EEUU)) = 1.54 − 2.386 = −0.846

odds(Low Consu | EEUU vs Other) = e^(−2.386) = 0.0919
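A Python sketch that reproduces these numbers directly from the contingency-table counts of the previous slide (76/177 for EEUU, 126/27 for Other), recovering the logit intercept and slope:

```python
import math

# Counts from the contingency table: Low vs. High consumption, by origin.
low_usa, high_usa = 76, 177
low_other, high_other = 126, 27

odds_usa = low_usa / high_usa          # 0.4293
odds_other = low_other / high_other    # 4.6667

print(round(math.log(odds_other), 3))                       # 1.540  = intercept (origen = 0, Other)
print(round(math.log(odds_usa) - math.log(odds_other), 3))  # -2.386 = coefficient of origen
print(round(odds_usa / odds_other, 3))                       # 0.092  = e^(-2.386), the odds ratio
```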


Estimation  of  the  Logit  model
• The B coefficients are estimated by maximum likelihood:
the aim is to maximize the likelihood (probability) that the
observations fit the estimated model

L = ∏(i=1..n) pi^yi · (1 − pi)^(1−yi)
  = ∏(i=1..n) [ e^(BXi) / (1 + e^(BXi)) ]^yi · [ 1 / (1 + e^(BXi)) ]^(1−yi)

In fact, the logarithm of L is maximized, through an iterative
process.
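A sketch of this maximum-likelihood estimation in Python, using Newton-Raphson as the iterative process (one common choice; JMP's exact algorithm may differ):

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson iterations.
    Assumes numeric X (n x k) and y in {0, 1}, with no perfect separation."""
    X = np.column_stack([np.ones(len(y)), X])   # prepend a column of ones for B0
    B = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ B))        # current estimate of p_i
        W = p * (1.0 - p)
        # Newton step: B <- B + (X' W X)^(-1) X' (y - p)
        B += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ B))
    log_L = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))   # maximized log-likelihood
    return B, log_L
```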
Degree of fit of the model
• Likelihood value (a probability): 0 ≤ L ≤ 1
• −2·log L follows a χ² distribution with (n − p) degrees of
freedom
• When L → 0: log L → −∞, so −log L → +∞
• When L → 1: log L → 0, so −log L → 0
• Therefore, the smaller −log L is, the better the fit of
the model to the data (better estimation)
• The following are computed:
−Log L0: model with only the constant
−Log L*: model with the explanatory variables
−Log L0 − (−Log L*): improvement of the model
Global significance of the model
• The analog to the ANOVA F-test for linear regression is found
under the Whole Model Test, in which the Full and Reduced
models are compared.
• The null hypothesis for this test is that all the slope parameters are
equal to zero.
• H0: All Bi = 0 (the model does not explain anything)
Ha: Some Bi ≠ 0 (Y is explained by some or all X's)
• This hypothesis is tested by comparing −Log L0 and −Log L*: the statistic
2·(Log L* − Log L0) follows a χ² distribution with degrees of freedom equal
to the number of explanatory variables
• Significance: the probability of being wrong if we reject H0
– When the significance is small (smaller than 0.05 or 0.1) we can reject H0:
• Some of the X variables explain Y
• There is a relationship between at least one X and Y

• For  a  discussion  of  other  statistics  found  here,  such  as  BIC  and  Entropy  RSquare,  see  the  JMP  Help.
Example:  global  significance
Other measures of the global usefulness of the model

• Goodness of fit: compare the observed probabilities with the
expected ones

Z² = Σ (Y_observed − Y_predicted)² / ( P(1 − P) )

• Some measures similar to the R² of the linear regression model:
the proportion of explained variance

Cox & Snell: R²_C&S = 1 − (L0 / L*)^(2/N)

Nagelkerke: R² = R²_C&S / R²_MAX , where R²_MAX = 1 − (L0)^(2/N)
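A small Python helper for the two pseudo-R² measures, written in terms of the log-likelihoods (log L0 for the constant-only model, log L* for the full model):

```python
import math

def pseudo_r2(logL0, logL_star, n):
    """Cox & Snell and Nagelkerke R^2 from the log-likelihood of the
    constant-only model (logL0) and of the full model (logL_star)."""
    r2_cs = 1 - math.exp((2 / n) * (logL0 - logL_star))   # 1 - (L0 / L*)^(2/N)
    r2_max = 1 - math.exp((2 / n) * logL0)                # 1 - L0^(2/N)
    return r2_cs, r2_cs / r2_max                          # (Cox & Snell, Nagelkerke)
```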
Significance  of  each  variable
• H0: Bi = 0 (Xi does not explain Y)
Ha: Bi ≠ 0 (Xi explains Y)
• This hypothesis is tested with a χ² statistic

• When the significance is small (smaller than 0.05 or 0.1)
we can reject H0:
– Xi explains Y
– There is a relationship between Xi and Y
Example: which variables explain the
gas consumption of a car
Lack of  fit Test
• When we use more than one independent variable, we can check the Lack of Fit
test.
• It  compares  the  model  actually  fitted  to  the  saturated  model.  
• The  saturated  model  is  a  model  generated  by  JMP  that  contains  as  many  
parameters  as  there  are  observations.  So  it  fits  the  data  very  well.  
• The  null  hypothesis  for  this  test  is  that  there  is  no  difference  between  the  
estimated  model  and  the  saturated  model.  
• If  this  hypothesis  is  rejected,  then  more  variables  (such  as  cross-­product  or  
squared  terms)  need  to  be  added  to  the  model.  
• In  the  present  case,  as  can  be  seen,  Prob>ChiSq=  1.  We  can  therefore  conclude  
that  we  do  not  need  to  add  more  terms  to  the  model.
Additional  comments

• We have to analyze possible multicollinearity (correlation
among the independent variables).
– If there is excessive multicollinearity, the standard errors of the B's
will be overestimated and the test statistics will not be well
computed
– Some variables, despite being individually statistically significant,
could appear as not significant. These variables don't add
more explanation to the explanation already given by the
other variables.
Interpreting  the  coefficients
• Based on the sign of the estimated coefficient Bi:
– Positive coefficient → Prob(Y=1) increases
– Negative coefficient → Prob(Y=1) decreases
• Based on exp(Bi):
– It gives the odds ratio (how much more likely Y=1
becomes) when Xi increases by 1 unit (see
next slide)
• To compute the odds or the probability for some concrete
values of the independent variables
• Effect on the probability:
– Difference in the probability (for qualitative variables)
– Derivative: ∂P(Y=1)/∂Xi = p(1−p)·Bi (for quantitative
variables)
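A minimal Python sketch of these two readings of a coefficient; the coefficient −2.386 is taken from the fuel-consumption example, while the baseline probability p = 0.30 is an illustrative value chosen here:

```python
import math

def coefficient_effects(b_i, p):
    """Two readings of a logistic coefficient b_i at a baseline probability p."""
    odds_ratio = math.exp(b_i)       # multiplicative change in the odds per unit of Xi
    marginal = p * (1 - p) * b_i     # dP(Y=1)/dXi evaluated at probability p
    return odds_ratio, marginal

# b_i = -2.386 from the earlier example; p = 0.30 is a made-up baseline.
print(coefficient_effects(-2.386, 0.30))   # ~ (0.092, -0.50)
```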
Odds  ratios
Click  the  red  triangle  and  click  Odds  
Ratios.  The  Odds  Ratios  tables  are  
added  to  the  JMP  output.  

Unit  Odds  Ratios  refers  to  the  


expected  change  in  the  odds  ratio  for  a  
one-­unit  change  in  the  independent  
variable.  
Range  Odds  Ratios  refers  to  the  
expected  change  in  the  odds  ratio  when  
the  independent  variable  changes  from  
its  minimum  to  its  maximum.  
Since  the  present  independent  variable  
is  a  binary  0-­1  variable,  these  two  
definitions  are  the  same.  
We  get  not  only  the  odds  ratio,  but  a  
confidence interval, too. Notice the right-
skewed confidence interval; this is
typical of confidence intervals for odds
ratios.
Probabilities for each observation
• Finally,  we  can  use  the  logistic  regression  to  compute  probabilities  for  each  
observation.  
• As  noted,  the  logistic  regression  will  produce  an  estimated  logit for  each  
observation.  These  estimated  logits can  be  used,  in  the  obvious  way,  to  compute  
probabilities  for  each  observation.  
• We  can  obtain  the  estimated  logits and  probabilities  by  clicking  the  red  triangle  on  
Nominal  Logistic  Fit  and  selecting  Save  Probability  Formula.  
• Four  columns  will  be  added  to  the  worksheet:  Lin[0],  Prob[0],  Prob[1],  and  Most  
Likely  PassClass.  
• For  each  observation,  these  give  the  estimated  logit,  the  probability  of  low  
consumption,  and  the  probability  of  high  consumption.  
Probabilities for each observation

The  fourth  column  (Most  Likely  PassClass)  classifies  the  observation  as  either  1  
or  0,  depending  upon  whether  the  probability  is  greater  than  or  less  than  50%.  
Confusion  Matrix
• We can observe how well our model classifies all the observations (using this cut-off
point of 50%) by producing a confusion matrix: Click the red triangle and click
Confusion matrix.
• It compares the predictions vs. the sample data
– For each observation, Prob(Y=1) is computed
– An observation is classified into the group Y=1 if Prob(Y=1) is bigger than a
chosen cutoff value (usually, the cut point is 0.5)
– The percentage of correctly classified cases is computed
• The rows of the confusion matrix are the actual
classification.
• The columns are the predicted classification from
the model (that is, the predicted 0/1 values from that
last fourth column using our logistic model and a
cutpoint of .50).
• Correct classifications are along the main diagonal
from upper left to lower right.
• The values on the other diagonal are
misclassifications.
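A minimal Python sketch of building such a confusion matrix from actual classes and predicted probabilities, with the usual 0.5 cutoff (the data in the usage line are made up):

```python
def confusion_matrix(y_actual, p_hat, cutoff=0.5):
    """2x2 confusion matrix: rows = actual class (0/1), columns = predicted class (0/1)."""
    m = [[0, 0], [0, 0]]
    for y, p in zip(y_actual, p_hat):
        pred = 1 if p > cutoff else 0
        m[y][pred] += 1
    correct = m[0][0] + m[1][1]
    return m, correct / len(y_actual)   # matrix and share of well-classified cases

# Toy data: two actual 0's and two actual 1's with estimated probabilities.
print(confusion_matrix([0, 0, 1, 1], [0.2, 0.7, 0.6, 0.9]))   # ([[1, 1], [0, 2]], 0.75)
```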
Predictive  effectiveness  of  the  
classification
• The classification obtained with the model can be compared
with the classification expected by chance, using the
Huberty test, which is approximately normally distributed:

H = (o − e)·√n / √( e·(n − e) ) ,  where e = (n1² + n2²) / n

o: correctly classified observations
n: total number of observations; ni: number of observations in group i
In our example:
e = (204² + 202²) / 406 = 203.004
H = (355 − 203.004)·√406 / √( 203.004·(406 − 203.004) ) = 15.08 > 1.96
Second  example
• Suppose  we  open  a  small  data  set  toylogistic.jmp,  containing  students’
midterm  exam  scores  (MidtermScore)  and  whether  the  student  passed  
the  class  (PassClass=1  if  pass,  PassClass=0  if  fail).  
• A  passing  grade  for  the  midterm  is  70.  The  first  thing  to  do  is  create  a  
dummy  variable  to  indicate  whether  the  student  passed  the  midterm:  
PassMidterm  =  1  if  MidtermScore  ≥  70  and  PassMidterm  =  0  otherwise:
– Select  Cols→New  Column  to  open  the  New  Column  dialog  box.  
– In  the  Column  Name  text  box,  for  our  new  dummy  variable,  type  PassMidterm.  
– Click  the  drop-­down  box  for  modeling  type  and  change  it  to  Nominal.  
– Click  the  drop-­down  box  for  Column  Properties  and  select  Formula.  The  Formula  dialog  
box  appears.  
– Under  Functions,  click  Conditional→If.  
– Under  Table  Columns,  click  MidtermScore so  that  it  appears  in  the  top  box  to  the  right  
of  the  If.  
– Under Functions, click Comparison → "a>=b".
– In the formula box to the right of >=, enter 70. Press the Tab key.
– Click in the box to the right of the ⇒ (the "then" clause), and enter the number 1.
– Similarly,  enter  0  for  the  else  clause  and  accept  twice.  
The  Logistic  function:  example
The  Logistic  function:  example
• First,  let  us  use  a  traditional  contingency  
table  analysis  to  determine  the  odds  ratio.  
• Make  sure  that  both  PassClass  and  
PassMidterm  are  classified  as  nominal  
variables.  
– Right-­click  in  the  data  grid  of  the  column  
PassClass and  select  Column  Info.  
– Click  the  black  triangle  next  to  Modeling  Type  
and  select  Nominal→OK.  Do  the  same  for  
PassMidterm.
• Select  Analyze→Tabulate  to  open  the  
Control  Panel.  It  shows  the  general  layout  
for  a  table.
– Drag  PassClass  into  the  Drop  zone  for  columns  
and  select  Add  Grouping  Columns.  
– Now  that  data  have  been  added,  the  words  
Drop  zone  for  rows will  no  longer  be  visible,  but  
the  Drop  zone  for  rows  will  still  be  in  the  lower  
left  panel  of  the  table.  
The  Logistic  function:  example
• Drag  PassMidterm  to  the  panel  
immediately  to  the  left  of  the  8  in  the  table.  
• Select  Add  Grouping  Columns.  Click  
Done.  
• A  contingency  table  will   appear.

The  probability  of  passing  the  class  when  you  did  not  pass  the  midterm  is  

The  probability  of  not  passing  the  class  when  you  did  not  pass  the  midterm  is  
(similar  to  row  percentages).  

The  odds  of  passing  the  class  given  that  you  have  failed  the  midterm  are
The  Logistic  function:  example
Similarly,  we  calculate  the  odds  of  passing  the  class  given  that  you  have  passed  the  
midterm  as:

Of  the  students  that  did  pass  the  midterm,  the  odds  are  the  number  of  students  that  
pass  the  class  divided  by  the  number  of  students  that  did  not  pass  the  class.

In  the  above  paragraphs,  we  spoke  only  of  odds.  Now  let  us  calculate  an  odds  ratio.  
Suppose  we  want  to  know  the  odds  ratio  of  passing  the  class  by  comparing  those  who  
pass  the  midterm  (PassMidterm=1  in  the  numerator)  to  those  who  fail  the  midterm  
(PassMidterm=0  in  the  denominator).  The  usual  calculation  leads  to:
The  Logistic  function:  example
which  has  the  following  interpretation:  the  odds  of  passing  the  class  if  you  pass  
the  midterm  are  8.33  times  the  odds  of  passing  the  class  if  you  fail  the  midterm.  

Note  that  the  log-­odds  are  ln(8.33)  =  2.120.  

Of course, the user doesn't have to perform all these calculations by hand; JMP
will do them automatically. When a logistic regression has been run, simply
clicking the red triangle and selecting Odds Ratios will do the trick.
The  Logistic  function:  example
• Equivalently,  we  could  compare  those  who  fail  the  midterm  
(PassMidterm=0  in  the  numerator)  to  those  who  pass  the  midterm  
(PassMidterm=1  in  the  denominator)  and  calculate:

• which tells us that the odds of passing the class when failing the midterm are
0.12 times the odds of passing the class for a student who passes the
midterm.
• It is easier to interpret the odds ratio when it is less than 1 by using the
following transformation: (OR – 1)*100%.
– Compared to a person who passes the midterm, a person who fails the midterm is 12%
as likely to pass the class; or equivalently, a person who fails the midterm is 88% less
likely, (OR – 1)*100% = (0.12 – 1)*100% = −88%, to pass the class than someone who
passed the midterm. Note that the log-odds are ln(0.12) = −2.12.
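A one-line check of this transformation in Python, using the 8.33 odds ratio computed from the contingency table:

```python
odds_ratio_pass_vs_fail = 8.33                          # from the contingency table above
odds_ratio_fail_vs_pass = 1 / odds_ratio_pass_vs_fail   # ~0.12, the reciprocal comparison
pct = (odds_ratio_fail_vs_pass - 1) * 100               # (OR - 1)*100%
print(round(odds_ratio_fail_vs_pass, 2), round(pct))    # 0.12 and -88
```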
The  Logistic  function:  example
• The  relationships  between  probabilities,  odds  (ratios),  and  log-­odds  (ratios)  are  
straightforward.  
• An  event  with  a  small  probability  has  small  odds,  and  also  has  small  log-­odds.  
• An  event  with  a  large  probability  has  large  odds  and  also  large  log-­odds.  
• Probabilities are always between zero and unity; odds are bounded below by zero
but can be arbitrarily large; log-odds can be positive or negative and are not
bounded. In particular, if the odds ratio is 1 (so the probability of either event is
0.50), then the log-odds equal zero.

• Suppose  π =  0.55,  so  the  odds  ratio  0.55/0.45  =  1.222.  Then  we  say  that  the  
event  in  the  numerator  is  (1.222-­1)  =  22.2%  more  likely  to  occur  than  the  event  in  
the  denominator.
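A small Python sketch of the probability → odds → log-odds conversions, reproducing the π = 0.55 example:

```python
import math

def odds_and_log_odds(p):
    """Convert a probability into its odds and log-odds."""
    odds = p / (1 - p)
    return odds, math.log(odds)

print(odds_and_log_odds(0.55))   # (1.222..., 0.2007): the event is 22.2% "more likely"
print(odds_and_log_odds(0.50))   # (1.0, 0.0): odds of 1 correspond to log-odds of zero
```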
Odds  ratio  in  Logistic  regression
• Different software applications adopt different conventions for handling the
expression of odds ratios in logistic regression. By default, JMP uses the "log
odds of 0/1" convention, which puts the 0 in the numerator and the 1 in the
denominator. This is a consequence of the sort order of the columns, which we
will address shortly.
• To  see  the  practical  importance  of  this,  we  can  simply  run  a  logistic  regression.  It  
is  important  to  make  sure  that  PassClass  is  nominal  and  that  PassMidterm  is  
continuous.  
– If you have been following along with the book, both variables ought to be classified as nominal, so
PassMidterm needs to be changed to continuous. Right-click in the column PassMidterm in the
data grid and select Column Info. Click the black triangle next to Modeling Type and select
Continuous, and then click OK.
• From  the  top  menu,  select  Analyze→Fit  Model.  Select  PassClass→Y.  Select  
PassMidterm→Add.  Click  Run.  
Odds ratio in
Logistic regression
Odds ratio in Logistic regression
• The  intercept  is  0.91629073,  and  the  slope  is  -­2.1202635.  
• The  slope  gives  the  expected  change  in  the  logit  for  a  one-­unit  change  in  the  
independent  variable  (i.e.,  the  expected  change  on  the  log  of  the  odds  ratio).  
• However, if we simply exponentiate the slope (i.e., compute exp(−2.1202635) = 0.12),
then we get the 0/1 odds ratio.
• There  is  no  need  for  us  to  exponentiate  the  coefficient  manually.  JMP  will  do  this  
for  us:  
– Click the red triangle and click Odds Ratios. The Odds Ratios tables are added to the JMP output.

Unit  Odds  Ratios  refers  to  the  expected  change  in  


the  odds  ratio  for  a  one-­unit  change  in  the  
independent  variable.  
Range  Odds  Ratios  refers  to  the  expected  change  
in  the  odds  ratio  when  the  independent  variable  
changes  from  its  minimum  to  its  maximum.  
Since  the  present  independent  variable  is  a  binary  0-­
1  variable,  these  two  definitions  are  the  same.  
We get not only the odds ratio, but a confidence
interval, too. Notice the right-skewed confidence
interval; this is typical of confidence intervals for
odds ratios.
Odds ratio in Logistic regression
• To  change  from  the  default  convention  (log  odds  of  0/1,  which  puts  the  0  in  the  
numerator  and  the  1  in  the  denominator),  in  the  data  table,  right-­click  to  select  
the  name  of  the  PassClass column.  Under  Column  Properties,  select  Value  
Ordering.  Click  on  the  value  1  and  click  Move  Up.

When  you  re-­run  the  logistic  


regression,  although  the  
parameter  estimates  will  not  
change,  the  odds  ratios  will  
change  to  reflect  the  fact  that  the  
1  is  now  in  the  numerator  and  the  
0  is  in  the  denominator.
The  Logistic  function:  example
• The  independent  variable  is  not  limited  to  being  only  a  nominal  (or  ordinal)  
variable;;  it  can  be  continuous.  
• In  particular,  let’s  examine  the  results  using  the  actual  score  on  the  midterm,  with  
MidtermScore  as  an  independent  variable:
• Select  Analyze→Fit  Model.  Select  PassClass→Y and  then  select  
MidtermScore→Add.  Click  Run.  
• This  time  the  intercept  is  25.6018754,  and  the  slope  is  -­0.3637609.  So  we  expect  
the  log-­odds  to  decrease  by  0.3637609  for  every  additional  point  scored  on  the  
midterm. To  view  the  effect  on  the  odds  ratio  itself,  
as  before  click  the  red  triangle  and  click  
Odds  Ratios.  
The  Logistic  function:  example
• For a one-unit increase in the midterm score, the new odds ratio will be 69.51% of
the old odds ratio. Or, equivalently, we expect to see a 30.5% reduction in the
odds ratio ((0.695057 – 1)*100% = −30.5%).
– For example, suppose a hypothetical student has a midterm score of 75. The student's log odds of
failing the class would be 25.6018754 – 0.3637609*75 = −1.680192.
– So the student's odds of failing the class would be exp(−1.680192) = 0.1863382. That is, the student
is much more likely to pass than fail.
– Converting odds to probabilities (0.1863382/(1 + 0.1863382) = 0.15707), we
see that the student's probability of failing the class is 0.15707, and the probability of
passing the class is 0.84293.
– Now, if the student's score increased by one point to 76, then the log odds of failing the
class would be 25.6018754 – 0.3637609*76 = −2.043953.
– Thus, the student's odds of failing the class become exp(−2.043953) = 0.1295157. So, the probability
of passing the class would rise to 0.885334, and the probability of failing the class would fall to
0.114666.
• With  respect  to  the  Unit  Odds  Ratio,  which  equals  0.695057,  we  see  that  a  one-­
unit  increase  in  the  test  score  changes  the  odds  ratio  from  0.1863382  to  
0.1295157.  In  accordance  with  the  estimated  coefficient  for  the  logistic  regression,  
the  new  odds  ratio  is  69.5%  of  the  old  odds  ratio  because  0.1295157/0.1863382  =  
0.695057.
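A Python sketch reproducing these calculations from the fitted intercept and slope quoted above:

```python
import math

# Intercept and slope from the JMP fit above (failing the class is the modeled event).
b0, b1 = 25.6018754, -0.3637609

def odds_of_failing(score):
    logit = b0 + b1 * score        # estimated log-odds of failing the class
    return math.exp(logit)

odds_75 = odds_of_failing(75)
odds_76 = odds_of_failing(76)
print(odds_75, odds_76)            # ~0.1863 and ~0.1295, as on the slide
print(odds_76 / odds_75)           # ~0.695057, the Unit Odds Ratio
print(odds_75 / (1 + odds_75))     # ~0.15707, probability of failing with a score of 75
```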
The  Logistic  function:  example
• Finally,  we  can  use  the  logistic  regression  to  compute  probabilities  for  each  
observation.  
• As  noted,  the  logistic  regression  will  produce  an  estimated  logit  for  each  
observation.  These  estimated  logits  can  be  used,  in  the  obvious  way,  to  compute  
probabilities  for  each  observation.  
• Consider a student whose midterm score is 70. The student's estimated logit is
25.6018754 – 0.3637609(70) = 0.1386124. Since exp(0.1386124) = 1.148679 =
p/(1 − p), we can solve for p (the probability of failing) = 0.534597.
• We can obtain the estimated logits and probabilities by clicking the red triangle on
Nominal Logistic Fit and selecting Save Probability Formula.
• Four  columns  will  be  added  to  the  worksheet:  Lin[0],  Prob[0],  Prob[1],  and  Most  
Likely  PassClass.  
• For  each  observation,  these  give  the  estimated  logit,  the  probability  of  failing  the  
class,  and  the  probability  of  passing  the  class,  respectively.  
The  Logistic  function:  example
• Observe  that  the  sixth  student  
has  a  midterm  score  of  70.  
• Look  up  this  student’s  estimated  
probability  of  failing  (Prob[0]);;  it  
is  very  close  to  what  we  just  
calculated  above.  The  difference  
is  that  the  computer  carries  16  
digits  through  its  calculations,  
but  we  carried  only  six.

The  fourth  column  (Most  Likely  PassClass)  classifies  the  observation  as  either  1  
or  0,  depending  upon  whether  the  probability  is  greater  than  or  less  than  50%.  
The  Logistic  function:  example
• We  can  observe  how  well  our  model  classifies  all  the  
observations  (using  this  cut-­off  point  of  50%)  by  
producing  a  confusion  matrix:  Click  the  red  triangle  and  
click  Confusion  matrix.  
• The  rows  of  the  confusion  matrix  are  the  actual  
classification  (that  is,  whether  PassClass  is  0  or  1).  
• The  columns  are  the  predicted  classification  from  the  
model  (that  is,  the  predicted  0/1  values  from  that  last  
fourth  column  using  our  logistic  model  and  a  cutpoint  of  
.50).  
• Correct  classifications  are  along  the  main  diagonal  
from  upper  left  to  lower  right.  
– We  see  that  the  model  has  classified  6  students  as  not  passing  
the  class,  and  actually  they  did  not  pass  the  class.  
– The  model  also  classifies  10  students  as  passing  the  class  when  
they  actually  did.  
• The  values  on  the  other  diagonal,  both  equal  to  2,  are  
misclassifications.  
Model’s  assumptions
• Before  we  can  use  the  model,  we  have  to  check  the  model’s  assumptions,  etc.  
The  first  step  is  to  verify  the  linearity  of  the  logit.  
• This can be done by plotting the estimated logit against MidtermScore.
– Select  Graph→Scatterplot  Matrix.  
– Select  Lin[0]→Y,  columns.  
– Select  MidtermScore→X.  
– Click  OK.  
• The  linearity  assumption  appears  to  be  perfectly  satisfied.
Variable  Y  with  different  categories
When Y has more than two categories we find different
models:
• Multinomial Logit: Y is nominal
• Conditional Logit: the choice of one alternative of Y depends
on a previous choice
• Ordinal Logit: Y is ordinal
• Sequential Logit: reaching the second level of Y requires
passing the first
• Poisson regression: count variables (0, 1, 2, ...)
