
Chapter 14

Logistic Regression Models

In the linear regression model $y = X\beta + \varepsilon$, there are two types of variables: the explanatory variables $X_1, X_2, \ldots, X_k$ and the study variable $y$. These variables can be measured on a continuous scale as well as through indicator variables. When the explanatory variables are qualitative, their values are expressed as indicator variables, and dummy variable models are used.

When the study variable is a qualitative variable, its values can be expressed using an indicator variable taking only two possible values, 0 and 1. In such a case, logistic regression is used. For example, $y$ can denote outcomes like success or failure, yes or no, like or dislike, which can be coded by the two values 0 and 1.

Consider the model

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i = x_i'\beta + \varepsilon_i, \qquad i = 1, 2, \ldots, n,$$

where $x_i' = (1, x_{i1}, x_{i2}, \ldots, x_{ik})$ and $\beta' = (\beta_0, \beta_1, \beta_2, \ldots, \beta_k)$.

The study variable takes the two values $y_i = 0$ or $1$. Assume that $y_i$ follows a Bernoulli distribution with parameter $\pi_i$, so its probability distribution is

$$y_i = \begin{cases} 1 & \text{with } P(y_i = 1) = \pi_i \\ 0 & \text{with } P(y_i = 0) = 1 - \pi_i. \end{cases}$$

Assuming $E(\varepsilon_i) = 0$,

$$E(y_i) = 1 \cdot \pi_i + 0 \cdot (1 - \pi_i) = \pi_i.$$

From the model $y_i = x_i'\beta + \varepsilon_i$, we have

$$E(y_i) = x_i'\beta = \pi_i = P(y_i = 1).$$

Thus the response function $E(y_i)$ is simply the probability that $y_i = 1$.

Note that $\varepsilon_i = y_i - x_i'\beta$, so

- when $y_i = 1$, then $\varepsilon_i = 1 - x_i'\beta$;
- when $y_i = 0$, then $\varepsilon_i = -x_i'\beta$.

Recall that earlier $\varepsilon_i$ was assumed to follow a normal distribution when $y$ was not an indicator variable. When $y$ is an indicator variable, $\varepsilon_i$ takes only two values, so it cannot be assumed to follow a normal distribution.

In the usual regression model the errors are homoskedastic, i.e., $\mathrm{Var}(\varepsilon_i) = \sigma^2$, and so $\mathrm{Var}(y_i) = \sigma^2$. When $y$ is an indicator variable, then

$$\begin{aligned}
\mathrm{Var}(y_i) &= E\left[y_i - E(y_i)\right]^2 \\
&= (1 - \pi_i)^2 \pi_i + (0 - \pi_i)^2 (1 - \pi_i) \\
&= \pi_i (1 - \pi_i)\left[(1 - \pi_i) + \pi_i\right] \\
&= \pi_i (1 - \pi_i) \\
&= E(y_i)\left[1 - E(y_i)\right] \\
&= \sigma^2_{y_i}.
\end{aligned}$$

Thus $\mathrm{Var}(y_i)$ depends on $y_i$ and is a function of its mean. Moreover, since $E(y_i) = \pi_i$ and $\pi_i$ is a probability, $0 \le \pi_i \le 1$, and thus there is a constraint on $E(y_i)$, namely $0 \le E(y_i) \le 1$. This puts a severe constraint on the choice of the linear response function: one cannot fit a model whose predicted values lie outside the interval $[0, 1]$.
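As a quick numerical illustration of these two formulas, the following sketch simulates Bernoulli draws and checks that the sample mean approaches $\pi$ and the sample variance approaches $\pi(1-\pi)$; the value $\pi = 0.3$ and the sample size are arbitrary choices, not from the text.

```python
import numpy as np

# Verify E(y) = pi and Var(y) = pi * (1 - pi) by simulation;
# pi = 0.3 is an arbitrary illustrative value.
rng = np.random.default_rng(0)
pi = 0.3
y = rng.binomial(n=1, p=pi, size=100_000)  # Bernoulli(pi) draws

print(y.mean())   # close to pi = 0.3
print(y.var())    # close to pi * (1 - pi) = 0.21
```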

When $y$ is a dichotomous variable, empirical evidence suggests that a function $E(y)$ defined on the whole real line and mapping into $[0, 1]$ has a sigmoid shape, i.e., a nonlinear S-shaped curve.

[Figure: two panels showing S-shaped curves of $E(y)$ against $x$, rising monotonically from 0 to 1.]

A natural choice for $E(y)$ is the cumulative distribution function of a random variable. In particular, the logistic distribution, whose cumulative distribution function is the simple logistic function $F(t) = \exp(t)/\left[1 + \exp(t)\right]$, yields a good link and gives

$$E(y) = \frac{\exp(x'\beta)}{1 + \exp(x'\beta)} = \frac{1}{1 + \exp(-x'\beta)}.$$
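The S-shape can be seen numerically. Below is a minimal sketch of the logistic response function; the grid of linear-predictor values is illustrative.

```python
import numpy as np

# Logistic response E(y) = 1 / (1 + exp(-eta)), evaluated on an
# illustrative grid of linear-predictor values eta = x'beta.
def logistic(eta):
    return 1.0 / (1.0 + np.exp(-eta))

eta = np.linspace(-6, 6, 7)   # -6, -4, ..., 6
print(logistic(eta))          # rises from near 0 to near 1: the S-shape
```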

Linear predictor and link functions:

The systematic component in $E(y)$ is the linear predictor, denoted as

$$\eta_i = \sum_{j=0}^{k} \beta_j x_{ij} = x_i'\beta, \qquad i = 1, 2, \ldots, n, \quad x_{i0} = 1.$$

The link function in a generalized linear model relates the linear predictor $\eta_i$ to the mean response $\mu_i = E(y_i)$. Thus

$$g(\mu_i) = \eta_i \quad \text{or} \quad \mu_i = g^{-1}(\eta_i).$$

In the usual linear model based on a normally distributed study variable, the link $g(\mu_i) = \mu_i$ is used and is called the identity link. A good link function maps the range of $\mu_i$ onto the whole real line, provides a good empirical approximation, and carries a meaningful interpretation in real applications.

In the case of logistic regression, the link function is defined as

$$\eta = \ln\frac{\pi}{1 - \pi}.$$

This transformation is called the logit transformation of the probability $\pi$, and the ratio $\dfrac{\pi}{1 - \pi}$ is called the odds. The link $\eta$ is therefore also called the log-odds. This link function is obtained as follows:

$$\pi = \frac{1}{1 + \exp(-\eta)}$$

or $\pi\left[1 + \exp(-\eta)\right] = 1$

or $\exp(-\eta) = \dfrac{1 - \pi}{\pi}$

or $\eta = \ln\dfrac{\pi}{1 - \pi}.$
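The derivation above says the logit and the logistic function are inverses of each other; the sketch below checks this numerically for a few illustrative probabilities.

```python
import numpy as np

# The logit link and the logistic function undo each other.
def logit(pi):
    return np.log(pi / (1.0 - pi))     # eta = ln(pi / (1 - pi)), the log-odds

def inv_logit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

pi = np.array([0.1, 0.5, 0.9])         # illustrative probabilities
print(inv_logit(logit(pi)))            # recovers [0.1, 0.5, 0.9]
```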

Note: Similar to the logit function, there are other functions that have the same shape as the logistic function, and $\pi$ can also be transformed through them. Two such popular functions are the probit transformation and the complementary log-log transformation. The probit transformation is based on transforming $\pi$ using the cumulative distribution function of the normal distribution, and the probit regression model is based on it.

The complementary log-log transformation of $\pi$ is $\ln\left[-\ln(1 - \pi)\right]$.
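For comparison, a small sketch evaluating all three transformations at an arbitrary probability; `scipy.stats.norm.ppf` (the inverse normal CDF) plays the role of the probit transformation here.

```python
import numpy as np
from scipy.stats import norm

# Three common transformations of a probability pi; pi = 0.7 is arbitrary.
pi = 0.7
logit   = np.log(pi / (1 - pi))      # log-odds
probit  = norm.ppf(pi)               # inverse normal CDF
cloglog = np.log(-np.log(1 - pi))    # complementary log-log
print(logit, probit, cloglog)
```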

Maximum likelihood estimation of parameters:

Consider the general form of the logistic regression model

$$y_i = E(y_i) + \varepsilon_i$$

where the $y_i$'s are independent Bernoulli random variables with parameter $\pi_i$, so that

$$E(y_i) = \pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}.$$

The probability mass function of $y_i$ is

$$f_i(y_i) = \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}, \qquad i = 1, 2, \ldots, n, \quad y_i = 0 \text{ or } 1.$$

The likelihood function is

$$L(y_1, y_2, \ldots, y_n; \beta_0, \beta_1, \ldots, \beta_k) = L = \prod_{i=1}^{n} f_i(y_i) = \prod_{i=1}^{n} \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}.$$

$$\begin{aligned}
\ln L &= \sum_{i=1}^{n} \left[\ln \pi_i^{y_i} + \ln(1 - \pi_i)^{1 - y_i}\right] \\
&= \sum_{i=1}^{n} \left[y_i \ln \pi_i + (1 - y_i)\ln(1 - \pi_i)\right] \\
&= \sum_{i=1}^{n} y_i \ln\left(\frac{\pi_i}{1 - \pi_i}\right) + \sum_{i=1}^{n} \ln(1 - \pi_i).
\end{aligned}$$

Since

$$\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}, \qquad
1 - \pi_i = \frac{1}{1 + \exp(x_i'\beta)}, \qquad
\frac{\pi_i}{1 - \pi_i} = \exp(x_i'\beta), \qquad
\ln\frac{\pi_i}{1 - \pi_i} = x_i'\beta,$$

we obtain

$$\ln L = \sum_{i=1}^{n} y_i x_i'\beta - \sum_{i=1}^{n} \ln\left[1 + \exp(x_i'\beta)\right].$$

Suppose repeated observations are available at each level of the $x$-variables. Let $y_i$ be the number of 1's observed at the $i$th level and $n_i$ the number of trials there. Then

$$\ln L = \sum_{i=1}^{n} y_i \eta_i + \sum_{i=1}^{n} n_i \ln(1 - \pi_i).$$

The maximum likelihood estimate $\hat\beta$ of $\beta$ is obtained by numerical maximization. If $V(\varepsilon) = \Delta$, then asymptotically

$$E(\hat\beta) = \beta, \qquad V(\hat\beta) = (X'\Delta^{-1}X)^{-1}.$$
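Since the text leaves the maximization to numerical methods, here is a minimal sketch using a general-purpose optimizer on the negative log-likelihood; the simulated data and the true $\beta$ used to generate them are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Fit logistic regression by numerically maximizing ln L.
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.2])                       # illustrative truth
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

def neg_log_lik(beta):
    eta = X @ beta
    # ln L = sum(y_i * eta_i) - sum(ln(1 + exp(eta_i))); negate for minimize()
    return -(y @ eta - np.sum(np.log1p(np.exp(eta))))

beta_hat = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS").x
print(beta_hat)   # close to beta_true for large n
```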

After obtaining $\hat\beta$, the linear predictor is estimated by

$$\hat\eta_i = x_i'\hat\beta.$$

The fitted value is

$$\hat y_i = \hat\pi_i = \frac{\exp(\hat\eta_i)}{1 + \exp(\hat\eta_i)} = \frac{1}{1 + \exp(-\hat\eta_i)} = \frac{1}{1 + \exp(-x_i'\hat\beta)}.$$
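A small sketch of this last step, with an illustrative $\hat\beta$ and a single observation vector:

```python
import numpy as np

# Fitted linear predictor and fitted probability for one observation;
# beta_hat and x_i are illustrative stand-ins.
beta_hat = np.array([-0.5, 1.2])
x_i = np.array([1.0, 0.8])                  # (1, x_i1)
eta_hat = x_i @ beta_hat                    # fitted linear predictor
y_hat = 1.0 / (1.0 + np.exp(-eta_hat))      # fitted probability pi_hat
print(eta_hat, y_hat)
```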

Interpretation of parameters:

To understand the interpretation of the $\beta$'s in the logistic regression model, first consider the simple case with only one explanatory variable:

$$\eta(x) = \beta_0 + \beta_1 x.$$

After fitting the model, $\hat\beta_0$ and $\hat\beta_1$ are obtained as the estimators of $\beta_0$ and $\beta_1$, respectively. The fitted linear predictor at $x = x_i$ is

$$\hat\eta(x_i) = \hat\beta_0 + \hat\beta_1 x_i,$$

which is the log-odds at $x = x_i$. The fitted value at $x = x_i + 1$ is

$$\hat\eta(x_i + 1) = \hat\beta_0 + \hat\beta_1 (x_i + 1),$$

which is the log-odds at $x = x_i + 1$. Thus

$$\begin{aligned}
\hat\beta_1 &= \hat\eta(x_i + 1) - \hat\eta(x_i) \\
&= \ln\left[\mathrm{odds}(x_i + 1)\right] - \ln\left[\mathrm{odds}(x_i)\right] \\
&= \ln\left[\frac{\mathrm{odds}(x_i + 1)}{\mathrm{odds}(x_i)}\right],
\end{aligned}$$

so that

$$\frac{\mathrm{odds}(x_i + 1)}{\mathrm{odds}(x_i)} = \exp(\hat\beta_1).$$

This is termed the odds ratio: the estimated multiplicative change in the odds of success when the explanatory variable increases by one unit.
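The sketch below verifies this identity numerically; the values of $\hat\beta_0$, $\hat\beta_1$, and $x_i$ are arbitrary illustrative numbers.

```python
import numpy as np

# Check that exp(beta1_hat) equals the odds ratio between x and x + 1.
beta0_hat, beta1_hat, x = -0.5, 0.8, 2.0   # illustrative values

def odds(xv):
    pi = 1.0 / (1.0 + np.exp(-(beta0_hat + beta1_hat * xv)))
    return pi / (1.0 - pi)

print(odds(x + 1) / odds(x))   # equals exp(beta1_hat)
print(np.exp(beta1_hat))
```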

When there is more than one explanatory variable in the model, the interpretation of the $\beta_j$'s is similar to the single-variable case: $\exp(\hat\beta_j)$ is the odds ratio associated with the explanatory variable $x_j$, keeping the other explanatory variables constant. This parallels the interpretation of $\beta_j$ in the multiple linear regression model.

If there is an $m$-unit change in the explanatory variable, then the estimated odds ratio is $\exp(m\hat\beta_j)$.

Test of hypothesis:

The test of hypothesis for the parameters in the logistic regression model is based on asymptotic theory. It is a large-sample likelihood ratio test based on a statistic termed the deviance.

A model that fits the sample data perfectly, containing as many parameters as observations, is termed a saturated model.

The statistic that compares the log-likelihoods of the fitted and saturated models is called the model deviance. It is defined as

$$\lambda(\beta) = 2\ln L(\text{saturated model}) - 2\ln L(\hat\beta)$$

where $\ln L(\cdot)$ is the log-likelihood and $\hat\beta$ is the maximum likelihood estimate of $\beta$.

In the case of the logistic regression model, $y_i = 0$ or $1$ and the $\pi_i$'s are completely unrestricted. So the likelihood is maximized at $\pi_i = y_i$, and the maximum value of $L(\text{saturated model})$ is

$$\max L(\text{saturated model}) = 1 \;\Rightarrow\; \ln\left[\max L(\text{saturated model})\right] = 0.$$
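A one-line numerical confirmation of this fact (the response vector is illustrative; Python's convention $0^0 = 1$ matches the one used here):

```python
import numpy as np

# With pi_i = y_i, each factor pi_i**y_i * (1 - pi_i)**(1 - y_i) equals 1,
# so the saturated model's likelihood is 1 and its log-likelihood is 0.
y = np.array([1, 0, 1, 1, 0], dtype=float)   # illustrative responses
pi = y                                        # unrestricted MLE: pi_i = y_i
L = np.prod(pi**y * (1 - pi)**(1 - y))
print(L)   # 1.0, so ln L(saturated model) = 0
```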

Let $\hat\beta$ be the maximum likelihood estimator of $\beta$; then the log-likelihood is maximized at $\beta = \hat\beta$, and

$$\ln L(\hat\beta) = \sum_{i=1}^{n} y_i x_i'\hat\beta - \sum_{i=1}^{n} \ln\left[1 + \exp(x_i'\hat\beta)\right] \le \ln L(\text{saturated model}).$$

Assuming that the logistic regression function is correct, the large-sample distribution of the likelihood ratio test statistic $\lambda(\beta)$ is approximately $\chi^2(n - p)$, where $p$ is the number of parameters in the fitted model.

A large value of $\lambda(\beta)$ implies that the model is incorrect; a small value implies that the model fits well, nearly as well as the saturated model. Note that the fitted model generally has fewer parameters than the saturated model, which uses all $n$ parameters. Thus, at the $\alpha$ level of significance, the fitted model is judged inadequate if $\lambda(\beta)$ exceeds the upper $\alpha$ point $\chi^2_\alpha(n - p)$.
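Putting the pieces together, here is a sketch of the deviance test: since $\ln L(\text{saturated model}) = 0$ here, the deviance reduces to $-2\ln L(\hat\beta)$. The simulated data, the 0.05 significance level, and the optimizer choice are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

# Deviance test: lambda(beta) = -2 ln L(beta_hat), compared to chi2(n - p).
rng = np.random.default_rng(3)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([0.3, -0.7])))))

def neg_log_lik(beta):
    eta = X @ beta
    return -(y @ eta - np.sum(np.log1p(np.exp(eta))))

fit = minimize(neg_log_lik, x0=np.zeros(p), method="BFGS")
deviance = 2.0 * fit.fun                 # 2[ln L(sat) - ln L(beta_hat)]
critical = chi2.ppf(0.95, df=n - p)      # upper 5% point of chi2(n - p)
print(deviance, critical, deviance > critical)   # True would suggest lack of fit
```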

