Stata, continued
Yrgalem D., Raya University
Discrete response models
So far the dependent variable (Y) has been continuous, as in models of the determinants of income, education, or agricultural output. What if the dependent variable is discrete? Discrete dependent variables are generally classified into three types:
1. Binary response models (e.g. whether a firm exports, or whether a farmer has adopted a technology)
2. Multinomial response models (e.g. whether an individual is in wage employment, unemployed, or self-employed)
3. Ordered response models (e.g. credit ratings from A to D)

Binary dependent variable
Today we look at the binary dependent variable, in order to understand the effect of X on a binary outcome:
• Y = gets into college, or not
• Y = willing to pay, or not
• Y = adoption, or not

The linear probability model
• The linear probability model (LPM) is what we get when we apply OLS to a binary dependent variable.
• That is, when Y is binary, the linear regression model Yi = β0 + β1Xi + ui is called the linear probability model.
• The predicted value is a probability:
• E(Y|X=x) = Pr(Y=1|X=x) = the probability that Y = 1 given x
• Ŷ = the predicted probability that Yi = 1, given X

The linear probability model, continued
• Simple to estimate and to interpret.
• OLS is inefficient here: the LPM errors are heteroskedastic and non-normal, so the homoskedasticity and normality assumptions behind the usual OLS results fail.
• It is possible to get a predicted probability < 0 or > 1. This makes no sense: a probability cannot be below 0 or above 1.
• This is a fundamental problem with the LPM that we can't patch up.
• These disadvantages can be addressed by using a nonlinear probability model: probit or logit regression.

LPM vs. probit and logit
• Figure: a case where the LPM goes wrong, compared with the logit (probit).

Probit model
• Models the probability that Y = 1 using the cumulative standard normal distribution function, evaluated at z = β0 + β1X:
• Pr(Y = 1|X) = Φ(β0 + β1X)
• Φ is the cumulative standard normal distribution function.
• z = β0 + β1X is the "z-value" or "z-index" of the probit model.
• Since it is a nonlinear model, it cannot be estimated by our usual (OLS) methods.
• Use maximum likelihood estimation instead.

Logit model
• Write the response probability as Pr(Y = 1|X) = G(z). Another common choice for G(z) is the logistic function, which is the cdf of a standard logistic random variable.
• The logit model gives the probability that Y = 1 as the cumulative standard logistic distribution function, evaluated at z = β0 + β1X:
• Pr(Y = 1|X) = F(β0 + β1X)
• F is the cumulative logistic distribution function:
• F(β0 + β1X) = 1 / (1 + e^(-(β0 + β1X)))
• This case is referred to as a logit model, or sometimes as a logistic regression.

Probit and logit
• Both functions have similar shapes.
• Both the probit and the logit are nonlinear and require maximum likelihood estimation.
• There is no real reason to prefer one over the other.
• Traditionally the logit was seen more often, mainly because the logistic function leads to a more easily computed model.
• Today the probit is also easy to compute with standard packages, so it has become more popular.

Probit and logit, continued
• In general we care about the effect of x on P(y = 1|x), that is, about ∂p/∂x.
• For the linear case, this is easy to compute: it is the coefficient on x.
• For the nonlinear probit and logit models, it is more complicated:
• ∂p/∂xj = g(β0 + xβ)βj, where g(z) = dG/dz is the density corresponding to G.

Probit and logit, continued
• In both cases the estimated coefficients look very different from the OLS linear probability estimates.
• This is because they do not have the same interpretation as OLS coefficients (they are simply the values that maximize the likelihood function).
• To obtain quantities that can be interpreted in a similar way to OLS coefficients, we need marginal effects.
• To obtain marginal effects in Stata, run either the logit or the probit command and then simply type mfx (in Stata 11 and later, mfx has been superseded by margins, e.g. margins, dydx(*) atmeans).

Probit and logit, continued
• In both cases the marginal effect is interpreted as the impact of a unit change in an explanatory variable on the probability of belonging to the treatment group (Y = 1), just like an OLS coefficient.
• Because OLS is a linear estimator, its estimated marginal effect is the same at every set of X values (reg and mfx give equal results), unlike with logit and probit.
• Whereas the coefficients from logit and probit will differ due to scaling, their marginal effects should be almost identical.

Logit model in Stata
• Stata provides two equivalent commands for the binary logit model that present the same result in different ways.
• The logit command reports coefficients on the logit (log-odds) scale, while logistic reports odds ratios.
• Each logit coefficient is the natural logarithm of the corresponding odds ratio reported by logistic.
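
The marginal-effect formula ∂p/∂xj = g(β0 + xβ)βj from the probit and logit slides can be sketched numerically. The block below is a Python illustration, not Stata output: the coefficients beta0 and beta1 are hypothetical values chosen only for the example, and the same pair is used for both models purely to compare the shapes of the two functions (in practice the estimated probit and logit coefficients differ in scale).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def probit_prob(z):
    """Pr(Y = 1 | X) under probit: the standard normal cdf at z."""
    return _STD_NORMAL.cdf(z)


def logit_prob(z):
    """Pr(Y = 1 | X) under logit: the standard logistic cdf at z."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_marginal_effect(z, beta_j):
    """dp/dx_j = phi(z) * beta_j, with phi the standard normal pdf."""
    return _STD_NORMAL.pdf(z) * beta_j


def logit_marginal_effect(z, beta_j):
    """dp/dx_j = F(z) * (1 - F(z)) * beta_j, since F'(z) = F(z)(1 - F(z))."""
    p = logit_prob(z)
    return p * (1.0 - p) * beta_j


# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -1.0, 0.5

for x in (0.0, 2.0, 6.0):
    z = beta0 + beta1 * x
    print(
        f"x={x:3.1f}  probit p={probit_prob(z):.4f}  "
        f"dp/dx={probit_marginal_effect(z, beta1):.4f}  |  "
        f"logit p={logit_prob(z):.4f}  "
        f"dp/dx={logit_marginal_effect(z, beta1):.4f}"
    )
```

Unlike the LPM, where the effect of x is the constant β1, the printed marginal effect changes with x, and the predicted probabilities always stay strictly between 0 and 1.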
Logit model in Stata, continued
• Both commands are followed by the dependent variable, a set of independent variables, and, after a comma, a series of options.
• Example in our case:
• logit wtp GEN AG MRS
• logistic wtp GEN AG MRS
• For example, a logit coefficient of 2.07321 corresponds to an odds ratio of 7.950299, since 2.07321 = ln(7.950299).
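
The relationship between the two numbers quoted above can be checked with a few lines of arithmetic. This is a minimal Python sketch rather than Stata code. The two values are taken from the example, and the variable names are assumptions made for illustration.

```python
import math

logit_coef = 2.07321    # coefficient on the log-odds scale, from the example
odds_ratio = 7.950299   # matching odds ratio, from the example

# The log of the odds ratio recovers the logit coefficient ...
print(f"ln(odds ratio) = {math.log(odds_ratio):.5f}")

# ... and exponentiating the coefficient recovers the odds ratio.
print(f"exp(coef)      = {math.exp(logit_coef):.4f}")
```

The two results agree with the quoted values up to the rounding of the reported coefficient.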
Logit model in Stata, continued
• To see the magnitude of the effect of the explanatory variables on the household's willingness to pay (WTP), we have to look at the computed marginal effects.
• For continuous explanatory variables, the marginal effect is the change in the probability of the event due to a unit change in the variable.
• For dummy (discrete) variables, it is the change in the probability when the variable changes from 0 to 1.

Probit model in Stata
• Stata provides the probit command for the binary probit model:
• probit wtp GEN AG MRS
• To see the magnitude of the effect of the explanatory variables on the household's WTP, we have to look at the computed marginal effects:
• mfx

Tobit model
• A limited dependent variable may take either zero or positive continuous values. In such a situation Tobit estimation is appropriate.
• We can also have latent variable models that don't involve binary dependent variables.
• Say y* = xβ + u, with u|x ~ Normal(0, σ²).
• But we only observe y = max(0, y*).
• The Tobit model uses MLE to estimate both β and σ for this model.
• It is important to realize that β estimates the effect of x on y*, the latent variable, not on y.
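
The distinction between the latent y* and the observed y = max(0, y*) can be illustrated with simulated data. The block below is a Python sketch under stated assumptions: the parameter values, sample size, and the uniform distribution of x are all hypothetical. It writes out the Tobit log-likelihood for censoring at zero but does not maximize it, so it is not a Tobit estimator.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_tobit(n, beta0, beta1, sigma, seed=0):
    """Draw (x, y_star, y) with y_star = beta0 + beta1*x + u, y = max(0, y_star)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        y_star = beta0 + beta1 * x + rng.gauss(0.0, sigma)
        rows.append((x, y_star, max(0.0, y_star)))
    return rows


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def tobit_loglik(rows, beta0, beta1, sigma):
    """Tobit log-likelihood for data censored from below at zero."""
    total = 0.0
    for x, _, y in rows:
        xb = beta0 + beta1 * x
        if y > 0.0:
            # Uncensored: normal density of y around xb with sd sigma.
            total += math.log(STD_NORMAL.pdf((y - xb) / sigma) / sigma)
        else:
            # Censored: Pr(y* <= 0) = Phi(-xb / sigma).
            total += math.log(STD_NORMAL.cdf(-xb / sigma))
    return total


# Hypothetical true parameters, for illustration only.
TRUE_BETA0, TRUE_BETA1, TRUE_SIGMA = 0.5, 1.0, 1.0

rows = simulate_tobit(5000, TRUE_BETA0, TRUE_BETA1, TRUE_SIGMA, seed=42)
xs, y_stars, ys = zip(*rows)

share_censored = sum(y == 0.0 for y in ys) / len(ys)
slope_latent = ols_slope(xs, y_stars)   # uses y*, which is unobserved in practice
slope_observed = ols_slope(xs, ys)      # uses the censored y

print(f"share of observations censored at 0: {share_censored:.3f}")
print(f"OLS slope of y* on x: {slope_latent:.3f}")
print(f"OLS slope of y  on x: {slope_observed:.3f}")
print(f"log-likelihood at true parameters: "
      f"{tobit_loglik(rows, TRUE_BETA0, TRUE_BETA1, TRUE_SIGMA):.1f}")
```

In this setup the slope of the censored y on x comes out smaller than the slope of y* on x, which is why β should be read as the effect on the latent variable and why OLS on the observed y does not recover it.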