
ECONOMETRICS

UNIT 7

DUMMY VARIABLES

Amado Peiró (Universitat de València)

7.1 Dummy variables

Frequently we may want to include as a regressor a qualitative or categorical (non-quantitative) variable. This is accomplished by using dummy variables, which are variables that take value 1 in a certain case and 0 otherwise. For example, let us define a dummy variable GENDER in the following way:

$$GENDER_i = \begin{cases} 1 & \text{if } i \text{ is a man} \\ 0 & \text{if } i \text{ is a woman} \end{cases}$$

and we can build the following model:

$$SALARY_i = \beta_1 + \beta_2 EDUC_i + \beta_3 GENDER_i + u_i$$

To interpret $\beta_3$, let us analyse the model for each gender:

♂: $E[SALARY_i \mid GENDER_i = 1] = \beta_1 + \beta_2 EDUC_i + \beta_3$
♀: $E[SALARY_i \mid GENDER_i = 0] = \beta_1 + \beta_2 EDUC_i$

$E[SALARY_i \mid ♂] - E[SALARY_i \mid ♀] = \beta_3$

$\beta_3$ reflects the difference in expected salary between a man and a woman with the same education.
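
As an illustration (not from the original slides), here is a minimal Python sketch of this model with statsmodels; the data and column names below are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "SALARY": [1200, 1500, 1100, 1700, 1300, 1600],
    "EDUC":   [10, 14, 9, 16, 12, 15],
    "SEX":    ["man", "woman", "woman", "man", "woman", "man"],
})

# Build the dummy: 1 for men, 0 for women.
df["GENDER"] = (df["SEX"] == "man").astype(int)

# SALARY_i = beta1 + beta2*EDUC_i + beta3*GENDER_i + u_i
results = smf.ols("SALARY ~ EDUC + GENDER", data=df).fit()
print(results.params)  # the GENDER coefficient estimates beta3
```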

► What would happen if the dummy variable is defined in the opposite way (1 for women, 0 for men)?

► What would happen if two dummy variables are used?

7.2 Dummy variables with any number of categories

If the qualitative variable admits $m$ categories ($m > 1$), one should define $m - 1$ dummy variables, one for each category except one; the remaining category will be the reference category. Each coefficient will reflect the difference between the corresponding category and the reference one.
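
A minimal Python sketch of this rule, assuming a hypothetical qualitative variable REGION with $m = 3$ categories; pandas drops one category so that it acts as the reference:

```python
import pandas as pd

# Hypothetical qualitative variable with m = 3 categories.
df = pd.DataFrame({"REGION": ["north", "south", "centre", "north", "south"]})

# drop_first=True keeps m - 1 = 2 dummies; the dropped category
# ("centre", first in alphabetical order) becomes the reference.
dummies = pd.get_dummies(df["REGION"], prefix="REGION", drop_first=True)
print(dummies)  # columns REGION_north and REGION_south
```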

7.3 Interactions with dummy variables

With the dummy defined in 7.1, let us consider the model

$$SALARY_i = \beta_1 + \beta_2 EDUC_i + \beta_3 GENDER_i + \beta_4 EDUC_i \cdot GENDER_i + u_i$$

where $EDUC_i \cdot GENDER_i$ is an interaction term. Then:

♂: $E[SALARY_i \mid ♂] = \beta_1 + \beta_2 EDUC_i + \beta_3 + \beta_4 EDUC_i = (\beta_1 + \beta_3) + (\beta_2 + \beta_4) EDUC_i$
♀: $E[SALARY_i \mid ♀] = \beta_1 + \beta_2 EDUC_i$

$E[SALARY_i \mid ♂] - E[SALARY_i \mid ♀] = \beta_3 + \beta_4 EDUC_i$

Then, the difference in salary between men and women will consist of a fixed quantity ($\beta_3$) and a variable quantity ($\beta_4 EDUC_i$) that depends on education.
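
A minimal Python sketch of the interaction model, again with hypothetical data; the coefficient names follow the formula interface of statsmodels:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data, as in the sketch of 7.1 (GENDER: 1 = man, 0 = woman).
df = pd.DataFrame({
    "SALARY": [1200, 1500, 1100, 1700, 1300, 1600],
    "EDUC":   [10, 14, 9, 16, 12, 15],
    "GENDER": [1, 0, 0, 1, 0, 1],
})

# EDUC:GENDER is the product term EDUC_i * GENDER_i.
results = smf.ols("SALARY ~ EDUC + GENDER + EDUC:GENDER", data=df).fit()
b3 = results.params["GENDER"]        # fixed part of the gap
b4 = results.params["EDUC:GENDER"]   # part that varies with education
print(f"expected gap at EDUC = 12: {b3 + b4 * 12:.2f}")
```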

7.4 Dummy variables and structural break
A structural break may be studied with the Chow test or with dummy variables. For example, to analyse the differences between genders:

CHOW TEST
MEN: $SALARY_i = \beta_{1,I} + \beta_{2,I} EDUC_i + u_{i,I}$   (sample size: $n_1$; residual sum of squares: $SSR_I$)
WOMEN: $SALARY_i = \beta_{1,II} + \beta_{2,II} EDUC_i + u_{i,II}$   (sample size: $n_2$; $SSR_{II}$)
ALL: $SALARY_i = \beta_1 + \beta_2 EDUC_i + u_i$   (sample size: $n_1 + n_2 = n$; $SSR$)

H0: $\beta_{1,I} = \beta_{1,II}$ and $\beta_{2,I} = \beta_{2,II}$ ▶ there are no differences between genders
H1: H0 not true ▶ there are differences between genders

and the test statistic will be:


$$\frac{\left[SSR - (SSR_I + SSR_{II})\right]/2}{(SSR_I + SSR_{II})/(n-4)} \sim F_{2,\,n-4}$$
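
A minimal Python sketch of the Chow statistic computed from the three regressions, using simulated (hypothetical) data for the two groups:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data for the two groups (all values are hypothetical).
rng = np.random.default_rng(0)
educ_m, educ_w = rng.uniform(8, 18, 30), rng.uniform(8, 18, 30)
sal_m = 500 + 90 * educ_m + rng.normal(0, 50, 30)   # men, group I
sal_w = 450 + 80 * educ_w + rng.normal(0, 50, 30)   # women, group II

def ssr(y, x):
    """Residual sum of squares of a simple regression with intercept."""
    return sm.OLS(y, sm.add_constant(x)).fit().ssr

ssr_I, ssr_II = ssr(sal_m, educ_m), ssr(sal_w, educ_w)
ssr_all = ssr(np.concatenate([sal_m, sal_w]),
              np.concatenate([educ_m, educ_w]))

n = len(sal_m) + len(sal_w)
F = ((ssr_all - (ssr_I + ssr_II)) / 2) / ((ssr_I + ssr_II) / (n - 4))
print(F)  # compare with the critical value of F(2, n-4)
```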

DUMMY VARIABLES
Now our model will be:
$$SALARY_i = \beta_1 + \beta_2 EDUC_i + \beta_3 GENDER_i + \beta_4 EDUC_i \cdot GENDER_i + u_i$$
The sample size is $n$, and GENDER takes value 1 for men and 0 otherwise. To analyse the differences in salaries between genders:

H0: $\beta_3 = 0$ and $\beta_4 = 0$ ▶ there are no differences between genders
H1: H0 not true ▶ there are differences between genders

and the test statistic will be:

$$\frac{(SSR_R - SSR_G)/2}{SSR_G/(n-4)} \sim F_{2,\,n-4}$$

where $SSR_R$ is the residual sum of squares of the restricted model (without $GENDER_i$ and the interaction term) and $SSR_G$ that of the general model above.
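
A minimal Python sketch of this equivalent test, reusing the simulated arrays (sal_m, educ_m, sal_w, educ_w) from the Chow sketch above:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Pool the two simulated samples from the Chow sketch above into one
# data set with a GENDER dummy (1 = man, 0 = woman).
df = pd.DataFrame({
    "SALARY": np.concatenate([sal_m, sal_w]),
    "EDUC":   np.concatenate([educ_m, educ_w]),
    "GENDER": np.concatenate([np.ones(30), np.zeros(30)]),
})

results = smf.ols("SALARY ~ EDUC + GENDER + EDUC:GENDER", data=df).fit()

# Joint test of beta3 = beta4 = 0; with the full set of interactions the
# unrestricted model matches the two separate regressions, so this F
# statistic reproduces the Chow one.
print(results.f_test("GENDER = 0, EDUC:GENDER = 0"))
```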

ECONOMETRICS

UNIT 8

EXTENSIONS AND PROBLEMS IN THE MLRM (I)

Amado Peiró (Universitat de València)

8.1. Collinearity

An MLRM presents collinearity or multicollinearity if there exists linear dependence among the regressors.
If the linear dependence is exact, there will be perfect or exact collinearity.
If the linear dependence is approximate (but not exact), there will be high (but non-perfect) collinearity.

8.1.1. Consequences of collinearity

If an MLRM presents perfect collinearity, then one cannot properly estimate the model, as the system of $k$ normal equations has infinitely many solutions.

If an MLRM presents high (but non-perfect) collinearity, then:

1. The model can be estimated, but the variances of the estimators will be large.
2. The standard deviations of the estimators will be large.
3. The standard errors of the estimators will be large.
4. The t-statistics will be close to zero.
5. In particular, non-significance of the regressors will not be rejected.
6. In addition, the estimators will be very sensitive to the sample; small changes in the sample can affect the estimates deeply.

8.1.2. Detection of collinearity

The existence of perfect collinearity is evident, as the model cannot be estimated.

One can suspect strong (but non-perfect) collinearity when:

1. The model is highly significant, but the variables are not significant individually.
2. There is a strong correlation (close to −1 or close to +1) between two regressors.
3. The regression of one regressor against the other regressors yields a high $R^2$ (the sketch below illustrates this check).
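
A minimal Python sketch of check 3 with hypothetical, nearly collinear regressors; the VIF reported by statsmodels is simply $1/(1 - R^2)$ of this auxiliary regression:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical regressors: x2 is almost a linear function of x1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=100)
X = sm.add_constant(np.column_stack([x1, x2]))

# Check 3: regress one regressor (column 1, i.e. x1) on the others.
r2 = sm.OLS(X[:, 1], np.delete(X, 1, axis=1)).fit().rsquared
print(r2)                               # close to 1 -> strong collinearity
print(variance_inflation_factor(X, 1))  # equivalently, VIF = 1/(1 - R^2)
```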

8.1.3. Solutions to collinearity

As the problem of collinearity is due to the existence of linear dependence between regressors, possible solutions to break this linear dependence are:

► To drop regressors involved in the linear dependence.
► To increase the sample with observations that do not exhibit this linear dependence.

8.2. Heteroscedasticity

If all the errors have the same variance,

$$Var(u_1) = Var(u_2) = \cdots = Var(u_n) = \sigma^2,$$

the model is homoscedastic. Otherwise, it is heteroscedastic. That is, if not all the errors have the same variance, the model presents heteroscedasticity.

8.2.1. Consequences of heteroscedasticity

If the regression model has heteroscedasticity, the OLS estimators are still linear and unbiased, but they are no longer best (minimum-variance) estimators, and the conventional tests are no longer valid.

8.2.2. Detection of heteroscedasticity
One of the most usual tests for heteroscedasticity is the White test. The hypotheses are:

H0: $Var(u_i) = \sigma^2$
H1: $Var(u_i) = \sigma_i^2$

The test statistic is obtained with the following steps (see the sketch after this list):

1. The original regression is run.
2. The squared residuals of the preceding regression are regressed against a constant, the regressors of the original regression, their squares and, possibly, their cross products, without duplicate terms.
3. The test statistic will be $N \cdot R^2$, where $R^2$ is the coefficient of determination of the preceding auxiliary regression. Under the null hypothesis of homoscedasticity, this statistic follows asymptotically a $\chi^2$ distribution with a number of degrees of freedom equal to the number of regressors of the auxiliary regression excluding the intercept.
