Logistic Regression

Uploaded by

Ayesha Niazi

The nature of the dependent variable (DV)
• Suppose we want to study the labor force participation (LFP) decision
of adult males:
• An adult is either in the labor force or not.
• LFP is a yes-or-no decision.
• Y = 1 if the person is in the labor force (LF) and Y = 0 if the person is not.
• General elections:
• Two political parties: Islamic and Secular
• DV = Y = vote choice
• Y = 1 if the vote is for the Islamic party and Y = 0 if the vote is for the
Secular party.
Objective
• In regression analysis, when DV is qualitative/binary/dichotomous,
our objective is to find the probability of something happening, for
example:
• voting for a candidate,
• owning a house,
• belonging to a union,
• participating in a sport, etc.
• These regression models are called Probability Models.
The Logistic Distribution Function
• There are various approaches to developing a probability model.
• We will use the following logistic distribution function to model
regressions where the response variable is dichotomous / binary:
Pi = 1 / (1 + e^(-Zi)), where Zi = β1 + β2 Xi
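As a rough illustration of the logistic distribution function, the R sketch below evaluates P = 1 / (1 + e^(-Z)) for a few values of a regressor. The coefficients b1 and b2 and the values in x are made up for illustration and do not come from the slides; plogis() is base R's logistic distribution function.

```r
# Logistic distribution function: P = 1 / (1 + exp(-Z))
logistic <- function(z) 1 / (1 + exp(-z))

# Hypothetical coefficients and regressor values (not from the slides)
b1 <- -1.5
b2 <- 0.08
x  <- c(0, 10, 20, 40, 80)

z <- b1 + b2 * x   # linear index Z, unbounded
p <- logistic(z)   # probability, always strictly between 0 and 1

print(data.frame(x = x, z = z, p = round(p, 4)))

# The hand-written function agrees with base R's plogis()
stopifnot(isTRUE(all.equal(p, plogis(z))))
```

Whatever values Z takes, the computed probabilities stay inside the unit interval and rise monotonically with Z.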
The Logit Model (L)
• Zi ranges from -∞ to +∞
• Pi ranges between 0 and 1
• Pi is nonlinearly related to Zi (i.e., to Xi)
• We cannot use the familiar OLS procedure to estimate the parameters.
• We first linearize the model.
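The linearization step can be written out as a short derivation. This is a sketch that assumes a single regressor, so that the index is Z_i = β1 + β2 X_i; with more regressors the same steps apply with a longer linear index.

```latex
\begin{aligned}
P_i &= \frac{1}{1+e^{-Z_i}} = \frac{e^{Z_i}}{1+e^{Z_i}}, \qquad Z_i = \beta_1 + \beta_2 X_i \\
1 - P_i &= \frac{1}{1+e^{Z_i}} \\
\frac{P_i}{1-P_i} &= e^{Z_i} \\
L_i &= \ln\!\left(\frac{P_i}{1-P_i}\right) = Z_i = \beta_1 + \beta_2 X_i
\end{aligned}
```

The log of the odds, L, is called the logit; it is linear in X and in the parameters.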

Features of the Logit Model
• As P goes from 0 to 1 (i.e., as Z varies from -∞ to +∞), the logit L goes
from -∞ to +∞. That is, although the probabilities lie between 0 and 1, the
logits are not so bounded.
• Although L is linear in X, the probabilities themselves are not.
• If L is positive, then as the value of the regressor(s) increases, the odds
that the regressand equals 1 (meaning some event of interest happens)
increase.
• If L is negative, the odds that the regressand equals 1 decrease as
the value of X increases.
• The slope (β) measures the change in L for a unit change in X.
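To make these features concrete, the sketch below uses hypothetical coefficients (b1 and b2 are assumptions, not estimates from the slides). It shows that a unit increase in X always changes the logit by b2, while the change in the probability depends on where X starts.

```r
# Hypothetical logit coefficients (illustrative only)
b1 <- -4
b2 <- 0.5

x <- c(2, 8, 14)   # three starting values of X

logit_at <- function(x) b1 + b2 * x
prob_at  <- function(x) plogis(logit_at(x))   # plogis() is the logistic CDF

# Change in the logit for a one-unit increase in X: constant, equal to b2
d_logit <- logit_at(x + 1) - logit_at(x)

# Change in the probability for the same increase: varies with X
d_prob <- prob_at(x + 1) - prob_at(x)

print(data.frame(x = x, d_logit = d_logit, d_prob = round(d_prob, 4)))

# The odds are multiplied by exp(b2) for each unit increase, whatever X is
odds_at <- function(x) prob_at(x) / (1 - prob_at(x))
odds_factor <- odds_at(x + 1) / odds_at(x)
print(odds_factor)
```

The constant change in the logit and the constant odds factor are two views of the same slope; only the probability scale responds unevenly.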
Estimation of the Logit Model
• The following table gives data on several families grouped according to
income level and the number of families owning a house at each income level.
Corresponding to each income level Xi, there are Ni families, ni of
whom are home owners (ni ≤ Ni).

• If Ni is fairly large and if each observation in a given income
class Xi is distributed independently as a binomial variable, then
ui ~ N[0, 1 / (Ni Pi (1 - Pi))].
• The disturbance term in the logit model is therefore heteroscedastic. Thus,
instead of using OLS we will have to use weighted least squares (WLS).
Estimation of WLS

• Keep in mind that all the conclusions will be valid, strictly speaking, only
if the sample is reasonably large.
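The weighted least squares procedure for grouped data can be sketched in R as below. The data are made-up illustrative numbers, not the table from the slides, and the names X, N, n and w are chosen to follow the notation used here. The weights w = N * P_hat * (1 - P_hat) are the inverse of the estimated disturbance variance under the binomial assumption.

```r
# Illustrative grouped data (hypothetical, not the table from the slides)
X <- c(5, 10, 15, 20, 25, 30)        # income level, thousands of dollars
N <- c(50, 60, 80, 100, 70, 40)      # families at each income level
n <- c(10, 18, 32, 55, 45, 30)       # home owners among them

# Step 1: relative frequency as the estimate of P_i
P_hat <- n / N

# Step 2: estimated logit for each income class
L_hat <- log(P_hat / (1 - P_hat))

# Step 3: weights, the inverse of the estimated disturbance variance
w <- N * P_hat * (1 - P_hat)

# Step 4a: WLS via lm() with weights
wls <- lm(L_hat ~ X, weights = w)

# Step 4b: the same fit as OLS on data multiplied by sqrt(w), no intercept
sw  <- sqrt(w)
ols <- lm(I(sw * L_hat) ~ 0 + sw + I(sw * X))

print(coef(wls))
print(unname(coef(ols)))
```

Both routes give the same intercept and slope. The weighted call is shorter, while the transformed regression makes explicit what "weighted income" and "weighted log of the odds" mean in the interpretation of the results.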
The result of WLS regression:

• Logit interpretation: the estimated slope coefficient suggests that for
a unit ($1,000) increase in weighted income, the weighted log of the
odds in favor of owning a house goes up by about 0.08 units.

• Odds interpretation: taking the antilog of the estimated logit gives the
odds in favor of owning a house, Pi/(1-Pi); taking the antilog of the slope
coefficient gives the factor by which these odds change.

• This means that for a unit increase in weighted income, the (weighted)
odds in favor of owning a house are multiplied by 1.0817, an increase of
about 8.17 percent.
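The odds interpretation follows from a short calculation, sketched here for a logit model with intercept β1 and slope β2. The figure 1.0817 is the antilog of the slope reported in the slides.

```latex
\begin{aligned}
\frac{P_i}{1-P_i} &= e^{\beta_1 + \beta_2 X_i} \\
\frac{\text{odds at } X_i + 1}{\text{odds at } X_i}
  &= \frac{e^{\beta_1 + \beta_2 (X_i + 1)}}{e^{\beta_1 + \beta_2 X_i}} = e^{\beta_2} \\
\text{percentage change in odds} &= \left(e^{\beta_2} - 1\right) \times 100
  = (1.0817 - 1) \times 100 \approx 8.17
\end{aligned}
```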
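R code
# Read the grouped home-ownership data and list its column names
d1 <- read.csv("D:/eBooks/Statistical modelling/SMC MS/glogit.csv", header = TRUE)
names(d1)

# Fit the logit model; passing data = d1 avoids the need for attach()
gglm <- glm(Y ~ income, family = binomial("logit"), data = d1)
summary(gglm)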
Data on the Effect of Personalized System of Instruction
(PSI) on Course Grades
• Letting Y = 1 if a student’s final grade in an intermediate
microeconomics course was an A and Y = 0 if the final grade was a B or a
C, Spector and Mazzeo used grade point average (GPA), TUCE, and
Personalized System of Instruction (PSI) as the grade predictors. The
logit model here can be written as:
Li = ln(Pi/(1-Pi)) = β1 + β2 GPAi + β3 TUCEi + β4 PSIi + ui
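# Read the grade data and list its column names
d2 <- read.csv("D:/eBooks/Statistical modelling/SMC MS/gradeData.csv", header = TRUE)
names(d2)

# Fit the logit model; passing data = d2 avoids the need for attach()
gglm2 <- glm(Grade ~ PSI + GpaGrade + TuceGrade, family = binomial("logit"), data = d2)
summary(gglm2)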
• The GPA coefficient of 2.8261 means that, with other variables held
constant, if GPA increases by a unit, the estimated logit increases on
average by about 2.83 units, suggesting a positive relationship
between the two.
• All the other regressors also have a positive effect on the logit, although
the effect of TUCE is not statistically significant.
• If you take the antilog of the PSI coefficient of 2.3786, you get
10.7897.
• This suggests that the odds of getting an A are more than 10 times as
high for students who are exposed to the new method of teaching as for
students who are not exposed to it, other things remaining the same.
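The arithmetic behind these interpretations can be checked in R. The sketch uses only the two coefficients quoted above (2.8261 for GPA and 2.3786 for PSI); the vector name b is an illustrative choice.

```r
# Coefficients quoted in the text
b <- c(GPA = 2.8261, PSI = 2.3786)

# Antilog of each coefficient: the factor by which the odds of an A
# are multiplied for a one-unit increase in that regressor
odds_factor <- exp(b)
print(round(odds_factor, 4))

# With a fitted model object, the same conversion is exp(coef(model))
```

Because PSI is a 0/1 variable, its factor compares the odds for students taught under PSI with the odds for those who were not, holding GPA and TUCE fixed.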
