
CHAPTER FOUR: MULTIPLE REGRESSION ANALYSIS WITH QUALITATIVE INFORMATION

After learning the contents of this chapter, students will be able to:
1. Describe qualitative information
2. Use dummy variables as independent variables
3. Use a dummy variable as the dependent variable
   1. The Linear Probability Model (LPM)
   2. The Logit and Probit Models
   3. Interpreting Probit and Logit Model Estimates
4.1 Describing Qualitative Information
• Qualitative information describes qualities or characteristics. It is non-measurable information that we obtain or gather for a given variable.
• Such a variable is an indicator variable in nature. The terms indicator variable, binary variable, categorical variable, and dichotomous variable are used interchangeably.
• It is collected using questionnaires, interviews, or observation.
• In regression analysis the dependent variable can be influenced by variables that are essentially qualitative in nature, such as sex, race, color, religion, nationality, geographical region, political upheavals, and party affiliation.
• One way we can "quantify" such attributes is by constructing artificial variables that take on the values 1 or 0, with 1 indicating the presence (or possession) of the attribute and 0 indicating its absence.
4.1 Describing Qualitative Information
 Variables that assume such 0 and 1 values are called dummy/indicator/binary/categorical/dichotomous variables.
 Such variables are essentially a device to classify data into mutually exclusive categories.
 Dummy variables are a data-classifying device in that they divide a sample into various subgroups based on qualities or attributes, and implicitly allow one to run individual regressions for each subgroup.
 The category that receives the value of zero is called the base/reference/benchmark group, and all comparisons are made in relation to that benchmark category.
 Dummy variables can be incorporated in regression models just as easily as quantitative variables.
Examples of qualitative variables:
 Gender may play a role in determining salary levels.
 Different ethnic groups may follow different consumption patterns.
 Educational level can affect earnings from employment.
Other examples of qualitative variables are:
 Marital status (single, married, separated, divorced)
 Employment status (employed, unemployed)
 Union membership
 Owning a house
 Voting in elections (no, yes, undecided)
 Political party membership (Republican, Democrat, other)
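As a minimal sketch (not part of the chapter), attributes like these can be encoded as 0/1 dummy variables in Python with pandas; the data and column names below are hypothetical.

import pandas as pd

# Hypothetical survey data with qualitative attributes
df = pd.DataFrame({
    "gender": ["male", "female", "female", "male"],
    "marital_status": ["single", "married", "divorced", "married"],
    "salary": [45.4, 35.2, 38.1, 47.0],   # in thousands
})

# Binary attribute: one dummy suffices (1 = male, 0 = female, so female is the base)
df["male"] = (df["gender"] == "male").astype(int)

# m-category attribute: create m-1 dummies, dropping one category as the base
marital_dummies = pd.get_dummies(df["marital_status"], prefix="ms", drop_first=True, dtype=int)
df = pd.concat([df, marital_dummies], axis=1)
print(df)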
4.2 The nature of dummy variables
 In regression analysis the dependent variable is frequently influenced not only by variables that can be readily quantified on some well-defined scale (e.g., income, output, prices, costs, height and temperature),
 but also by variables that are essentially qualitative in nature (e.g., sex, race, color, religion, nationality, wars, earthquakes, strikes, political upheavals, and changes in government economic policy).
 For example, holding all other factors constant, female college professors are found to earn less than their male counterparts, and non-whites are found to earn less than whites. This pattern may result from sex or racial discrimination.
4.2 The nature of dummy variables
 Such qualitative variables usually indicate the presence or absence of a "quality" or an attribute.
o In regression analysis, dummy variables are therefore mainly used to capture qualitative attributes or characteristics.
o Dummy variables thus sort the data into mutually exclusive categories.
4.2 The nature of dummy variables
 A regression model may contain regressors
that are all exclusively dummy, or qualitative,
in nature. Such models are called Analysis of
Variance (ANOVA) models.
 The significance of the difference between the means of two samples can be judged through either the z-test or the t-test.
 But, when we want to examine the significance
of the difference amongst more than two sample
means at the same time, the ANOVA technique
enables us to perform this simultaneous test.
4.2 The nature of dummy variables
 On the other hand, regression models
containing a mixture of quantitative
and qualitative variables are called
analysis of covariance (ANCOVA)
models.
 The interpretation of dummy variables
remains the same in both the ANCOVA
and ANOVA models.
4.3.1 Regression with only qualitative regressors
 As a matter of fact, a regression model may contain explanatory variables that are exclusively dummy, or qualitative, in nature.
 Example: Consider the following model for the salary of college professors as a function of gender:
   Yi = α + βDi + ui -----------------------------(1)
where Yi = annual salary of a college professor
      Di = 1 if male college professor
         = 0 otherwise (i.e., female professor)
      α = intercept term and β = the slope coefficient of the dummy.
 Model (1) may enable us to find out whether gender makes any difference in a college professor's salary.
4.3.1 Regression with only qualitative regressors
 The slope coefficient β tells by how much the mean/average salary of a male college professor deviates from that of a female college professor.
 If D = 0, E(Y) = E(Y|D = 0) = α
 If D = 1, E(Y) = E(Y|D = 1) = α + β
 Thus, the difference between the two groups (in mean/average values of Y) is: E(Y|D=1) - E(Y|D=0) = β.
 The significance of this difference is tested by a t-test of β = 0.
 Therefore, mean salary of a female college professor: E(Yi|Di=0) = α.
 Mean salary of a male college professor: E(Yi|Di=1) = α + β. Female is the base category in the sense that comparisons are made with that category.
 The coefficient attached to the dummy variable D can be called the differential intercept coefficient, because it tells by how much the intercept of the category that receives the value of 1 differs from the intercept of the base category.
Conti…
4.3.1 Regression with only qualitative regressors
 If the estimate of β is positive and statistically significant, the average salary of male college professors exceeds the average salary of female college professors by an amount equal to β.
 On the other hand, if the estimate of β is negative and statistically significant, the average salary of female college professors exceeds that of male college professors by an amount equal to the absolute value of β.
 If the estimate of β is statistically insignificant, the average salary of female college professors does not differ significantly from the average salary of male college professors.
Conti…
Example
1. Suppose the following regression of the income (in thousands) of 22 accountants is given (standard errors are in parentheses):
   Ŷ = 35.20 + 10.25D
   (se) (43.82)  (3.45)
o Where Y is income in thousands and D is a gender dummy taking the value 1 if male. Answer the following questions:
A. Find the average salary of a male accountant.
B. Find the average salary of a female accountant.
C. Find the difference in the average salaries of male and female accountants.
D. Test whether the gender differential is statistically significant.
Solution
A. The estimated mean salary of a male accountant is the sum of the intercept and the dummy coefficient (i.e., α + β = 35.20 + 10.25 = 45.45, or $45,450).
B. The intercept term gives the estimated mean salary of a female accountant (i.e., α = 35.20, or $35,200).
C. The difference between males and females is given by the coefficient of the dummy variable, which equals 10.25 (= 45.45 - 35.20).
D. The t-statistic shows that the gender differential is statistically significant, since β is positive and statistically significant (the calculated t-value = β/se(β) = 10.25/3.45 = 2.97 is greater than the tabulated t-value at the 5% significance level for a two-tailed test).
E. Therefore, the average salary of male accountants exceeds the average salary of female accountants by an amount equal to the estimate of β (10.25 thousand).
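As an illustrative sketch only (the actual 22-accountant data are not reproduced here), the following Python code fits such a single-dummy regression on simulated data; with one dummy and an intercept, OLS reproduces the two group means exactly.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data standing in for the accountants example (hypothetical values)
rng = np.random.default_rng(0)
n = 22
male = rng.integers(0, 2, size=n)                   # D = 1 if male, 0 if female
salary = 35.2 + 10.25 * male + rng.normal(0, 5, n)  # income in thousands

X = sm.add_constant(pd.DataFrame({"male": male}))
res = sm.OLS(salary, X).fit()
print(res.summary())

# Intercept = sample mean salary of females; coefficient on 'male' = male-female differential
print("mean salary, females:", salary[male == 0].mean())
print("mean salary, males:  ", salary[male == 1].mean())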
Conti…
Figure 4.1: Average salary as shown by dummy regressors (figure omitted; vertical axis: average salary)
4.3.2 Regression on one quantitative variable and one qualitative variable with two classes, or categories
 Consider the model:
   Yi = α1 + α2 Di + βXi + ui ------------(2)
 Where: Yi = annual salary of a college professor
        Xi = years of teaching experience
        Di = 1 if male; = 0 otherwise
Model (2) contains one quantitative variable (years of teaching experience) and one qualitative variable (gender) that has two classes (or levels, classifications, or categories), namely, male and female.
4.3.2 Regression on one quantitative variable and one qualitative variable with two classes, or categories
What is the meaning of this equation?
   Yi = α1 + α2 Di + βXi + ui ------------(2)
 Assuming, as usual, that E(ui) = 0, we see that the mean salary of a female college professor is:
   E(Yi|Xi, Di = 0) = α1 + βXi ----------------------------------(3)
o Mean salary of a male college professor:
   E(Yi|Xi, Di = 1) = (α1 + α2) + βXi -----------------(4)
o Geometrically, we have the situation shown in the figure below (for illustration, it is assumed that α2 > 0).
4.3.2 Regression on one quantitative variable and one qualitative variable with two classes, or categories
 Model (2) postulates that the male and female college professors' salary functions in relation to years of teaching experience have the same slope (β) but different intercepts.
 In other words, it is assumed that the level of the male professor's mean salary differs from that of the female professor's mean salary (by α2), but that the rate of change in mean annual salary with years of experience is the same for both sexes.
Example
1. Assume the following regression result from a model given by the above equation, with Y being the hourly wage rate, D a dummy for men, and X years of schooling. The dependent variable is expressed in US dollars ($). Standard errors are given in parentheses:
   Ŷ = 338.5 - 165.5 D + 59.6 X
       (244.7)  (81.6)    (17.04)
A. Find the slope and the intercepts of the two groups (male and female).
B. Find the difference in average hourly wages between males and females.
C. Interpret the estimated coefficients and the model result (a numerical sketch follows below).
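A hedged worked sketch for questions A and B, using only the reported estimates (this is arithmetic on the given coefficients, not a re-estimation):

# Worked sketch for the example above, using the reported estimates as given
b0, b_d, b_x = 338.5, -165.5, 59.6

def expected_wage(male: int, schooling: float) -> float:
    """Fitted hourly wage from Y-hat = 338.5 - 165.5*D + 59.6*X."""
    return b0 + b_d * male + b_x * schooling

# Intercepts: females (D=0) -> 338.5, males (D=1) -> 338.5 - 165.5 = 173.0
print("female intercept:", expected_wage(0, 0))
print("male intercept:  ", expected_wage(1, 0))

# Common slope: each extra year of schooling adds 59.6 to the fitted wage.
# The male-female gap at any schooling level equals the dummy coefficient, -165.5
print("gap at 12 years of schooling:", expected_wage(1, 12) - expected_wage(0, 12))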
Regression on one quantitative variable and one qualitative
variable with more than two classes
 Suppose that, on the basis of the cross-sectional data, we
want to regress the annual expenditure on health care by an
individual on the income and education of the individual.
 Since the variable education is qualitative in nature,
suppose we consider three mutually exclusive levels of
education: less than high school, high school, and college.
Now, unlike the previous case, we have more than two
categories of the qualitative variable education.
 Therefore, following the rule that the number of dummies
be one less than the number of categories of the variable
(m-1), we should introduce two dummies to take care of the
three levels of education.
Conti…
 Assuming that the three educational groups have a common slope
but different intercepts in the regression of annual expenditure on
health care on annual income, we can use the following model:
Yi = α1 + α2 D2i + α3 D3i + βXi + ui ----------(5)
Where Yi = annual expenditure on health care
Xi= annual income
D2= 1 if high school education
= 0 otherwise
D3 = 1 if college education
= 0 otherwise
• Note: The intercept α1 will reflect “less than high school
education” category as the base category.
• The differential intercepts α2 and α3 tell by how much the
intercepts of the other two categories differ from the intercept of the
base category, which can be readily checked as follows:
Conti…
• Assuming E(ui) = 0, we obtain
   E(Yi | D2 = 0, D3 = 0, Xi) = α1 + βXi
   E(Yi | D2 = 1, D3 = 0, Xi) = (α1 + α2) + βXi
   E(Yi | D2 = 0, D3 = 1, Xi) = (α1 + α3) + βXi
which are, respectively, the mean health care expenditure functions for the three levels of education, namely, less than high school, high school, and college. Geometrically, the situation is shown in fig 1.2 (for illustrative purposes it is assumed that α3 > α2).
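A minimal sketch of model (5) in Python, using hypothetical simulated data and variable names; it shows how the m-1 = 2 education dummies are generated with "less than high school" as the base category.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data for model (5): health expenditure on income and education level
rng = np.random.default_rng(1)
n = 200
educ = rng.choice(["less_than_hs", "high_school", "college"], size=n)
income = rng.normal(50, 10, n)
y = (100 + 30 * (educ == "high_school") + 60 * (educ == "college")
     + 2.0 * income + rng.normal(0, 10, n))

df = pd.DataFrame({"healthexp": y, "income": income, "educ": educ})

# C(educ, Treatment('less_than_hs')) creates m-1 = 2 dummies with
# "less than high school" as the omitted (base) category
model = smf.ols(
    "healthexp ~ C(educ, Treatment('less_than_hs')) + income", data=df
).fit()
print(model.params)  # differential intercepts for high school and college, plus the slope on income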

Regression on one quantitative variable and two qualitative variables, each with two categories
 The technique of dummy variables can be easily extended to handle more than one qualitative variable.
 Let us revert to the college professors' salary regression, but now assume that, in addition to years of teaching experience and sex, the skin color of the teacher is also an important determinant of salary.
 For simplicity, assume that color has two categories, black and white, and that sex has two categories, male and female. We can now write:
   Yi = α1 + α2 D2i + α3 D3i + βXi + ui ------(6)
Where Yi = annual salary
      Xi = years of teaching experience
      D2 = 1 if male; = 0 otherwise
      D3 = 1 if white; = 0 otherwise
Conti…
 Notice that each of the two qualitative variables, sex and color, has two categories and hence needs one dummy variable each.
 Note also that the omitted, or base, category now is "black female professor." Assuming E(ui) = 0, we obtain the following from equation (6):
Mean salary for a black female professor:
   E(Yi | D2 = 0, D3 = 0, Xi) = α1 + βXi
Mean salary for a black male professor:
   E(Yi | D2 = 1, D3 = 0, Xi) = (α1 + α2) + βXi
Mean salary for a white female professor:
   E(Yi | D2 = 0, D3 = 1, Xi) = (α1 + α3) + βXi
Mean salary for a white male professor:
   E(Yi | D2 = 1, D3 = 1, Xi) = (α1 + α2 + α3) + βXi
Example
1. Now, suppose we run the regression of Y on the four explanatory variables and a constant:
o Y = 2736 + 12598 D1 + 10969 D2 + 5.197 X1 + 10.562 X2
o Where Y is the price of the house,
o D1 = 1 if the house has a driveway, 0 if it does not,
o D2 = 1 if the house has a recreation room, 0 otherwise,
o X1 is the size of the garden, and X2 is the land rent.
Required: Calculate the expected value if the house has no driveway and no recreation room, and if it has both a driveway and a recreation room, ceteris paribus, and interpret the results for all explanatory variables.
Solution
I. If the house has no driveway (D1 = 0) and no recreation room (D2 = 0), its value will be Y = 2736.
II. If the house has a driveway, its value will be, ceteris paribus, $12,598 more.
III. If the house has a recreation room, its value will be, ceteris paribus, $10,969 more.
IV. If the house has both a driveway and a recreation room, its value will be, ceteris paribus, 12598 + 10969 = $23,567 more.
V. Increasing the size of the garden by 1 square foot will increase the price of the house by $5.197, whether or not the house has a driveway or a recreation room.
VI. If the land rent increases by one birr, the price of the house will rise by $10.56, ceteris paribus.
4.3.5 Dummy variable trap
 First, if the regression contains a constant term, the number of dummy variables must be one less than the number of classes of each qualitative variable.
 If all categories of a qualitative variable are included together with the intercept, there will be perfect multicollinearity and the regression cannot be estimated. This is called the dummy variable trap.
 The dummy variable trap occurs when the dummy variables created by one-hot encoding are perfectly collinear with each other and the constant.
 This means that one variable can be predicted exactly from the others, making it impossible to estimate the regression coefficients separately.
Conti…
 There are two ways to avoid the dummy variable trap.
 First, introduce as many dummy variables as there are categories of the variable and omit the intercept term: Yi = β1D1i + β2D2i + β3D3i + ui.
 Second, include the intercept term and introduce only (m-1) dummies, where m is the number of categories of the dummy variable. In this case the coefficients attached to the dummy variables must always be interpreted in relation to the base, or reference, group.
 For example, suppose we want to look at the effect of location (Addis Ababa, Hawassa, Arba Minch) on a person's salary in thousands of Birr (Y). If Arba Minch is dropped, the multiple regression model is: Y = β0 + β1D1 + β2D2 + e (see the sketch below).
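A minimal Python sketch of the trap itself, using a hypothetical location variable: with all m dummies plus a constant the design matrix is rank-deficient (perfect collinearity), while m-1 dummies plus a constant have full rank. Note that pandas' drop_first drops the first category alphabetically, not necessarily Arba Minch.

import numpy as np
import pandas as pd

# Hypothetical location variable with m = 3 categories
city = pd.Series(["Addis Ababa", "Hawassa", "Arba Minch", "Addis Ababa", "Hawassa"])

# Trap: all m dummies plus a constant -> the dummies sum to the constant (perfect collinearity)
all_dummies = pd.get_dummies(city, dtype=float)
X_trap = np.column_stack([np.ones(len(city)), all_dummies.to_numpy()])
print("rank with all dummies + constant:", np.linalg.matrix_rank(X_trap), "of", X_trap.shape[1])

# Fix: drop one category (the base group) and keep the constant
m_minus_1 = pd.get_dummies(city, drop_first=True, dtype=float)
X_ok = np.column_stack([np.ones(len(city)), m_minus_1.to_numpy()])
print("rank with m-1 dummies + constant:", np.linalg.matrix_rank(X_ok), "of", X_ok.shape[1])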
Conti…
 To distinguish the two categories, male and female, we introduced only one dummy variable Di. If Di = 1 always denotes a male, then when Di = 0 we know the observation is a female, since there are only two possible outcomes.
 Hence, one dummy variable suffices to distinguish two categories.
 The general rule is this: if a qualitative variable has "m" categories, introduce only "m-1" dummy variables.
 In the above example, sex has two categories, and hence we introduced only a single dummy variable. If this rule is not followed, we fall into what is called the dummy variable trap, that is, the situation of perfect multicollinearity.
Conti…
4.3.6 ANOVA and ANCOVA MODELS
1. ANOVA stands for Analysis of Variance. It is a regression
model in which the dependent variable is quantitative in
nature, but all the explanatory variables are qualitative in
nature (dummies).
There are two major types of ANOVA models:
ANOVA model with one qualitative variable
ANOVA model with two qualitative variables
2. ANCOVA stands for analysis of covariance. It is a regression model that contains a mixture of qualitative and quantitative variables.
 NB: The interpretation of dummy variables remains the same in both the ANCOVA and ANOVA models.
4.4 Dummy as Dependent Variable
 A qualitative response model describes situations in which the dependent variable in a regression equation represents a discrete choice assuming only a limited number of values; alternatively, it is defined as a dependent variable whose range of values is substantively restricted.
 On occasion the variable that we are trying to explain may be discrete rather than continuous.
 Models that involve such variables are called:
 Qualitative response models
 Discrete choice models
 Categorical dependent variable models
 Dummy dependent variable models
 Dichotomous dependent variable models
 Limited dependent variable models
Conti…
 If the dependent variable of the model is a dummy, the usual OLS technique is no longer appropriate. Instead, the maximum likelihood estimation technique is used, because when the dependent variable is a dummy, the objective is to find the maximum probability of something happening for the given values of the regressors.
 In regression analysis we often face a qualitative response (dependent) variable of the "yes" or "no" type.
 Discrete choice models dealing with such binary responses are called binary choice models.
 At this juncture, it is important to distinguish between:
 Binary choices: the dependent variable can take two values.
 Multiple choices: the dependent variable can take more than two values.
 Multinomial choices: e.g., work as a teacher, a clerk, a self-employed professional, or a factory worker.
 Multinomial ordered choices: e.g., strongly agree, agree, neutral, disagree.
Conti…
 There are several types of such models. Some of them include:
 The Linear Probability Model (LPM)
 The Probit model
 The Logit model
 The Tobit (censored regression) model
 The Heckman two-stage model, etc.
 Technically, it is possible to estimate binary choices using OLS.
 Such a linear model for binary choices estimated by OLS is called the linear probability model (LPM).
 The primary objective in categorical response models is to explain how observations fall into each category.
• For example, in the labor market case we may wish to explain the labor force participation decision of a woman by linking the dependent variable to explanatory variables such as age, education, marital status, etc.
Basic framework of binary models
4.4.1 The Linear Probability Model (LPM)
 It is a multiple regression model with a dependent variable that is binary rather than continuous.
 The term linear probability model comes from the fact that the right-hand side of the equation is linear.
 Because the dependent variable Y is binary, the population regression function corresponds to the probability that the dependent variable equals 1 given the explanatory variables Xs, i.e.
   P(Y = 1 | X1, ..., Xk) = β0 + β1X1 + ... + βkXk
 β1 is the change in the probability that Y = 1 associated with a unit change in X1, holding the other Xs constant, i.e.
   β1 = ΔP(Y = 1 | X) / ΔX1
Interpreting the coefficients of an LPM
Conti…
 The regression coefficients in the LPM are estimated by OLS.
 The usual (heteroscedasticity-robust) OLS standard errors can be used to construct confidence intervals and hypothesis tests.
 Let Pi be the probability that Y = 1 (probability of success); then 1 - Pi is the probability that Y = 0 (probability of failure).
 Therefore, Yi follows the Bernoulli distribution:
   Yi = 1 with probability Pi
   Yi = 0 with probability 1 - Pi
Conti…
Simple Linear Probability Model (LPM)
   Y = β0 + β1X + ε, so that P(Y = 1 | X) = β0 + β1X
Probability of being approved for a loan:
• P(Loan Approved = 1 | Income) = β0 + β1·Income
• Let's say that, after estimating the model using data, you get: β0 = 0.1, β1 = 0.04.
• So the model becomes: P(Loan Approved) = 0.1 + 0.04·Income
• Interpretation of coefficients:
• Intercept (β0 = 0.1): a person with $0 income has a 10% chance of getting approved (probably only theoretical; banks rarely approve $0 income).
• Slope (β1 = 0.04): for every $1,000 increase in income (i.e., income measured in thousands), the probability of being approved increases by 4 percentage points.
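A minimal sketch of estimating an LPM by OLS with heteroscedasticity-robust standard errors on simulated loan data (hypothetical variable names and values); with an S-shaped true relationship, the fitted straight line typically produces some "probabilities" outside [0, 1] at extreme incomes.

import numpy as np
import statsmodels.api as sm

# Simulated binary approval data (hypothetical, for illustration only)
rng = np.random.default_rng(42)
n = 500
income = rng.normal(20, 10, n)                       # in thousands of dollars
p_true = 1 / (1 + np.exp(-(-6 + 0.3 * income)))      # true probabilities, S-shaped
approved = rng.binomial(1, p_true)                   # 0/1 outcome

# LPM: regress the 0/1 outcome on income by OLS with robust (HC1) standard errors
X = sm.add_constant(income)
lpm = sm.OLS(approved, X).fit(cov_type="HC1")
print(lpm.params)   # slope = change in approval probability per $1,000 of income

fitted = lpm.predict(X)
print("share of fitted 'probabilities' outside [0, 1]:",
      np.mean((fitted < 0) | (fitted > 1)))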
Advantages of the linear probability model
 It is easy to estimate and the results are easy to interpret.
Drawbacks of the LPM
I. The partial effect of any explanatory variable is constant, even though the dependent variable is discrete and the independent variables may be a combination of discrete and continuous variables.
II. The disturbances are not normally distributed. Since the dependent variable Yi assumes only two values (0 or 1), the disturbance ui also takes only two values; that is, the error term follows the Bernoulli distribution. As a result, ui is not normally distributed.
Conti…
III. The disturbances are heteroscedastic: the variance of ui is
   var(ui) = Pi(1 - Pi), where Pi = β1 + β2Xi,
so the variance depends on the values of the regressors and is not constant. In addition, since ui takes only two values, its distribution is non-normal.
Conti….
IV. R² as a measure of goodness of fit is questionable. Corresponding to the values of the regressors (X's), the dependent variable (Y) is either 0 or 1. Therefore, all the Y values lie either along the X axis or along the line corresponding to Y = 1, so in general no LPM is expected to fit such a scatter well. As a result, the computed R² is of limited value in dichotomous response (qualitative dependent variable) models, whether constrained or unconstrained.
V. The restriction 0 ≤ P(Y = 1|X) ≤ 1 is not guaranteed: OLS estimation of the LPM gives no guarantee that the estimated probabilities lie between 0 and 1, because the probability increases linearly with the regressors. In principle we can constrain the LPM predictions to lie between 0 and 1, or use estimation techniques other than OLS that guarantee this restriction. This is the real problem with OLS estimation of the LPM.
 It is this weakness that gives rise to better methods of estimating binary dependent variable models (the logit and probit models).
Conti….
Figure: LPM, observed vs. predicted — observed Y (0 or 1) and predicted probabilities (ŷ) plotted against X (figure omitted).
Conti….
 We want a probability model that has the following two features:
 As Xi increases, Pi = E(Y = 1|X) increases but never steps outside the 0-1 interval.
 The relationship between Pi and Xi is nonlinear, that is, "one which approaches zero at slower and slower rates as Xi gets small and approaches one at slower and slower rates as Xi gets very large".
 Such an S-shaped curve is very similar to the cumulative distribution function (CDF) of a random variable.
 The CDF of a random variable X is simply the probability that it takes a value less than or equal to x₀, where x₀ is some specified numerical value of X.
4.5.2 Binary Logit model
 The logit model uses the cumulative logistic distribution to transform the model so that the probabilities follow the S-shape shown on the previous slide.
 The binomial logit is an estimation technique for equations with dummy dependent variables that avoids the unboundedness problem of the linear probability model.
 The binomial logit model is non-linear; it achieves this by using a variant of the cumulative logistic function.
Conti..
 Note that both the response and non-response probabilities lie in the interval [0, 1] and hence are interpretable.
 Odds ratio: the ratio of the response probability (Pi) to the non-response probability (1 - Pi), i.e., Pi/(1 - Pi).
Cont….
 L, the log of the odds ratio, is linear in X as well as in the parameters. L is called the logit, and hence the name logit model:
   Li = ln[Pi/(1 - Pi)] = β1 + β2Xi
 Thus, the log-odds ratio is a linear function of the explanatory variables.
 For the LPM it is Pi itself that is assumed to be a linear function of the explanatory variables.
 If the odds ratio is equal to 1, then both outcomes have equal probability.
 If the odds ratio is equal to 2, then the outcome Yi = 1 is twice as likely as the outcome Yi = 0.
 The odds ratio is always non-negative.
Features of the logit model
 As Pi goes from 0 to 1 (i.e., as Zi varies from -∞ to +∞), the logit Li goes from -∞ to +∞. Although the probabilities lie between 0 and 1, the logits are not so bounded.
 The logit is linear in X, but the probabilities themselves are not. This property contrasts with the LPM, where the probabilities increase linearly with X.
 The logit becomes increasingly large and positive as the value of the explanatory variable(s) increases and the odds ratio increases from 1 to infinity; the logit becomes increasingly large and negative as the odds ratio decreases from 1 to 0.
 The LPM assumes that Pi is linearly related to Xi; the logit model assumes that the log of the odds ratio is linearly related to Xi.
 Interpretation: keep in mind that we do not directly interpret the coefficients of the variables; rather, we interpret their marginal effects or odds ratios.
Conti…
 The coefficient β in the logit (a non-linear model) is not a measure of the change in probability for a unit change in the covariate X. It is interpreted in terms of odds ratios.
 The coefficient β measures the change in the log-odds for a unit change in a covariate. Equivalently, a unit increase in X1 multiplies the odds by e^β1, which for small β1 is approximately an increase of 100·β1% in the odds (see the sketch below).
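A short sketch (illustrative values only) comparing the exact odds multiplier exp(β) with the 100·β% approximation for a few coefficient sizes:

import numpy as np

# A unit increase in X multiplies the odds P/(1-P) by exp(beta);
# for small beta this is roughly a 100*beta % increase in the odds.
for beta in [0.05, 0.2, 0.5, 1.0]:
    odds_factor = np.exp(beta)
    print(f"beta = {beta:4.2f}: odds multiplied by {odds_factor:5.3f} "
          f"(~{100 * (odds_factor - 1):5.1f}% change; approximation 100*beta = {100 * beta:5.1f}%)")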
I. Individual data
 The standard errors are asymptotic; hence we use the Z statistic instead of the t statistic.
 R-squared is not meaningful in binary response models.
 The LR test, which is a chi-square test with df equal to the number of regressors, plays the same role in the logit model as the F-test for joint significance does in the multiple regression model.
II. Grouped data
Marginal Effect
 Reporting marginal effects instead of odds ratios is more popular in economics. In most applications, the primary goal is to explain the effect of Xj on the response probability Pr(Y = 1), not on the log odds-ratio.
 The changes in probabilities (slopes) can be computed, though they are not constant, and are termed marginal effects (see the sketch below).
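A minimal sketch, with hypothetical coefficients, of why the logit marginal effect is not constant: in a simple logit, dP/dx = β·P·(1 - P), so the effect depends on where it is evaluated.

import numpy as np

# Marginal effect of X in a simple logit P = 1/(1 + exp(-(b0 + b1*x)))
b0, b1 = -2.0, 0.8          # hypothetical coefficients, for illustration only

def prob(x):
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

# dP/dx = b1 * P * (1 - P): largest near P = 0.5, small in the tails
for x in [0.0, 2.5, 5.0]:
    p = prob(x)
    print(f"x = {x:3.1f}: P = {p:.3f}, marginal effect = {b1 * p * (1 - p):.3f}")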
Cont…
 Each slope coefficient shows how the log of the odds in favor of the outcome changes as the value of the corresponding X variable changes by one unit.
 βi, the slope, measures the marginal effect of Xi on the log odds-ratio in favor of Y = 1.
 The intercept β0 is the value of the log odds when all the Xi's are zero.
 The coefficient β therefore measures the change in the log-odds, or approximately the percentage change in the odds, for a unit change in a covariate.
Merits of Logit Model
• Logit analysis produces statistically sound results. By
allowing for the transformation of a dichotomous
dependent variable to a continuous variable ranging
from - ∞ to + ∞, the problem of out-of-range estimates
is avoided.
Conti…
 The logit analysis provides results that can be easily interpreted, and the method is simple to apply.
 It gives parameter estimates which are asymptotically consistent,
efficient and normal, so that the analogue of the regression t-test can
be applied.
Demerits of Logit Model
Difference between Logit and LPM
 In the LPM the slope coefficient measures the marginal effect
of a unit change in the explanatory variable on the probability
of the outcome, holding other variables constant.
 In the logit model, the marginal effect of a unit change in the explanatory variable depends not only on the coefficient of that variable but also on the level of probability from which the change is measured.
 That probability, in turn, depends on the values of all the explanatory variables in the model.
 The LPM assumes that Pi is linearly related to Xi, whereas the logit model assumes that the log of the odds ratio is linearly related to Xi.
4.5.3 The Probit model
 The probit model uses the cumulative normal distribution function, hence it is sometimes referred to as the normit model.
 The probit model is similar to the logit model except that the logistic function is replaced by the normal distribution function.
 The estimating model that emerges from the normal cumulative distribution function is popularly known as the probit model.
 In the probit model, G is the standard normal cumulative distribution function (cdf), which is expressed as an integral:
   G(z) = Φ(z) = ∫ from -∞ to z of (1/√(2π)) e^(-v²/2) dv
Conti…
 The latent variable is assumed to be a linear function of the observed X's through the structural model.
 However, since the latent dependent variable is unobserved, the model cannot be estimated using OLS.
 Maximization of the likelihood function for either the probit or the logit model is accomplished by nonlinear estimation methods.
 Maximum likelihood is used instead: the choice is between normal errors and logistic errors, resulting in the probit (normit) and logit models, respectively.
• These models are used to predict an outcome variable that is categorical (which violates the linearity assumption of normal regression) from one or more explanatory variables (a comparison sketch follows below).
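A small sketch comparing the two link functions behind the probit and logit models, the standard normal CDF and the logistic CDF; both are S-shaped and map the real line into (0, 1).

import numpy as np
from scipy.stats import norm, logistic

# Standard normal CDF (probit link) vs. logistic CDF (logit link);
# the logistic has slightly fatter tails.
z = np.linspace(-4, 4, 9)
print("z       probit Phi(z)   logit Lambda(z)")
for zi in z:
    print(f"{zi:5.1f}   {norm.cdf(zi):12.4f}   {logistic.cdf(zi):14.4f}")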
Similarities between Logit and Probit Models
 Both models give qualitatively similar results.
 In both models we interpret the sign of a coefficient but not its magnitude; the magnitudes cannot be compared directly because the two models scale the coefficients differently.
 In both cases, as with the LPM, it is assumed that E[εi|Xi] = 0.
 Both the logit and probit models are S-shaped functions.
 Both the probit and the logit models are estimated by maximum likelihood.
 Both the logit and probit models are non-linear response functions.
 Both the probit and logit models have the same basic structure:
 Estimate a latent variable Y* using a linear model; Y* ranges from negative infinity to positive infinity.
 Use a non-linear function to transform Y* into a predicted Y that lies between 0 and 1.
Differences between Logit and Probit Models
 The main difference is that the logistic distribution has slightly fatter tails (the conditional probability Pi approaches zero or one at a slower rate in the logit than in the probit).
 In practice many researchers choose the logit model because of its comparative mathematical simplicity.
 The parameters of the two models are scaled differently: the parameter estimates in a logistic regression tend to be about 1.6 to 1.8 times larger than those in the corresponding probit model.
 The coefficients derived from the maximum likelihood (ML) function will be the coefficients of the probit model if we assume a normal distribution for the error term.
 If we assume that the appropriate distribution of the error term is the logistic distribution, the coefficients obtained from the ML function will be those of the logit model.
Differences and similarities between Logit and Probit Models
 Both use a CDF (the cumulative logistic and cumulative normal functions, respectively) to transform the model so that the probabilities follow the S-shape, but they differ in the relative thickness of the tails: the logit has relatively thicker tails than the probit.
 This difference would, however, disappear as the sample size gets large (a comparison of the two fits is sketched below).
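A hedged sketch fitting logit and probit to the same simulated binary data: the coefficients differ in scale (roughly by the 1.6-1.8 factor mentioned above), while the fitted probabilities are nearly identical.

import numpy as np
import statsmodels.api as sm

# Simulated binary data (hypothetical), then fit both models for comparison
rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x)))   # data generated from a logistic model
y = rng.binomial(1, p)

X = sm.add_constant(x)
logit_res = sm.Logit(y, X).fit(disp=0)
probit_res = sm.Probit(y, X).fit(disp=0)

print("logit coefficients: ", logit_res.params)
print("probit coefficients:", probit_res.params)
print("ratio logit/probit: ", logit_res.params / probit_res.params)  # roughly 1.6-1.8

# Predicted probabilities from the two models are nearly identical
print("max difference in fitted probabilities:",
      np.max(np.abs(logit_res.predict(X) - probit_res.predict(X))))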

How to interpret coefficients in both models?
 β cannot be interpreted as a simple slope as in ordinary regression, because the rate at which the curve ascends and descends changes according to the value of x. In other words, it is not a constant change as in ordinary regression.
 In both logit and probit models, the sign of the coefficient (β) indicates the direction of the relationship between the independent variable (X) and the probability (P): if β > 0, the probability P increases as X increases; if β < 0, the probability P decreases as X increases.
Example on Logit and Probit
1. Suppose that we want to examine the effect of routine weekly exercises on the performance of students.
 To this end, suppose we gave routine exercises to second-year Section A students and, at the end of the semester, computed the average score in exercises (ASE) for each student.
 The dependent variable in this example is dichotomous: Yi = 1 for students scoring A and Yi = 0 for students scoring other grades (B, C, D, F and FX).
 There are two continuous variables (GPA and ASE) and one categorical variable, PC ownership, where PC = 1 for students with a PC and PC = 0 for students without a PC.
Example on Logit and Probit
A. Interpretation of the Logit Model

. logit grade gpa ase pc

Logistic regression                      Number of obs =     32
                                         LR chi2(3)    =  15.40
                                         Prob > chi2   = 0.0015
Log likelihood = -12.889633              Pseudo R2     = 0.3740

------------------------------------------------------------------------
   grade |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------
     gpa |  2.826113   1.262941     2.24   0.025    .3507938    5.301432
     ase |  .0951577   .1415542     0.67   0.501   -.1822835    .3725988
      pc |  2.378688   1.064564     2.23   0.025      .29218    4.465195
   _cons | -13.02135   4.931325    -2.64   0.008   -22.68657    -3.35613
------------------------------------------------------------------------
Interpretation of the Logit Coefficients
 GPA: for every one-unit increase in GPA, we expect a 2.826113 increase in the log-odds of getting an A grade, holding all other independent variables constant.
 ASE: for every one-unit increase in ASE (i.e., for every additional point scored in the exercises), we expect a 0.0951577 increase in the log-odds of getting an A grade, holding all other explanatory variables constant.
 PC: for a one-unit increase in PC (in other words, for an individual going from no PC to PC ownership), we expect a 2.378688 increase in the log-odds of getting an A grade, holding all other independent variables constant.
Interpretation of the Logit Model
B. Odds Ratio Interpretation of the Logit Model

. logit grade gpa ase pc, or

Logistic regression                      Number of obs =     32
                                         LR chi2(3)    =  15.40
                                         Prob > chi2   = 0.0015
Log likelihood = -12.889633              Pseudo R2     = 0.3740

------------------------------------------------------------------------
   grade | Odds Ratio   Std. Err.      z    P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
     gpa |   16.87972   21.31809     2.24   0.025   1.420194    200.6239
     ase |   1.099832   .1556859     0.67   0.501   .8333651    1.451502
      pc |   10.79073   11.48743     2.23   0.025   1.339344    86.93802
   _cons |   2.21e-06   .0000109    -2.64   0.008   1.40e-10      .03487
------------------------------------------------------------------------
Odds Ratio Interpretation
• GPA: as GPA increases by one point, the odds of getting an A (relative to getting other grades: B, C, D, F and FX) are multiplied by 16.87.
• ASE: as ASE increases by one point, the odds of getting an A are multiplied by 1.09.
• PC: for PC owners, the odds of getting an A are 10.79 times as large as the odds for non-owners.
Interpretation of the Logit Model
C. Probability (Marginal Effect) Interpretation of the Logit Model

. mfx

Marginal effects after logit
      y = Pr(grade) (predict)
        = .25282025

------------------------------------------------------------------------
variable |     dy/dx   Std. Err.     z    P>|z|  [   95% C.I.  ]       X
---------+--------------------------------------------------------------
     gpa |  .5338589     .23704    2.25   0.024   .069273  .998445  3.11719
     ase |  .0179755     .02624    0.69   0.493   -.03344  .069399  21.9375
     pc* |  .4564984     .18105    2.52   0.012    .10164  .811357    .4375
------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
Marginal Effect (mfx) Interpretation
• Both logit and probit give us similar results.
• GPA: as GPA increases by one point, the probability of a student getting an A grade increases by about 0.53 (53 percentage points).
• ASE: as ASE increases by one point, the probability of getting an A grade increases by about 0.018 (1.8 percentage points).
• PC: for a student with a PC (i.e., a change from no PC ownership to PC ownership), the probability of getting an A grade increases by about 0.456 (45.6 percentage points).
Interpretation of the Probit Model
D. Probit Estimation

. probit grade gpa ase pc

Probit regression                        Number of obs =     32
                                         LR chi2(3)    =  15.55
                                         Prob > chi2   = 0.0014
Log likelihood = -12.818803              Pseudo R2     = 0.3775

------------------------------------------------------------------------
   grade |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------
     gpa |   1.62581   .6938825     2.34   0.019    .2658255    2.985795
     ase |  .0517289   .0838903     0.62   0.537   -.1126929    .2161508
      pc |  1.426332   .5950379     2.40   0.017    .2600795    2.592585
   _cons |  -7.45232   2.542472    -2.93   0.003   -12.43547           …
------------------------------------------------------------------------
Interpretation of the Probit Model
GPA: for a one-unit increase in GPA, the probit index (Z-score) increases by 1.62581.
ASE: for a one-unit increase in ASE, the probit index increases by 0.0517.
PC: owning a PC increases the Z-score by 1.426332.
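The coefficient estimates above match those obtained from the Spector-Mazzeo dataset distributed with statsmodels (where ASE and PC appear to correspond to its TUCE and PSI variables). Assuming that is the underlying data, a rough Python equivalent of the Stata workflow is sketched below.

import statsmodels.api as sm

# Spector-Mazzeo data bundled with statsmodels: GRADE on GPA, TUCE, PSI.
# Presumed to be the same data as in the example above (TUCE ~ ASE, PSI ~ PC).
data = sm.datasets.spector.load_pandas()
X = sm.add_constant(data.exog)          # GPA, TUCE, PSI plus a constant
y = data.endog                          # GRADE (1 = got an A)

logit_res = sm.Logit(y, X).fit(disp=0)
probit_res = sm.Probit(y, X).fit(disp=0)

print(logit_res.params)    # const -13.02, GPA 2.826, TUCE 0.095, PSI 2.379 (cf. the Stata output)
print(probit_res.params)   # const -7.45,  GPA 1.626, TUCE 0.052, PSI 1.426

# Marginal effects at the means, similar to Stata's mfx
# (pass dummy=True to treat PSI as a discrete 0-to-1 change)
print(logit_res.get_margeff(at="mean").summary())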
The End