
ECONOMETRICS

UNIT 7

DUMMY VARIABLES

Amado Peiró (Universitat de València)

7.1 Dummy variables

Frequently we may want to include as a regressor a qualitative or categorical (non-quantitative) variable. This is accomplished by using dummy variables, which are variables that take value 1 in a certain case and 0 otherwise. For example, let us define a dummy variable GENDER in the following way:

$$GENDER_i = \begin{cases} 1 & \text{if } i \text{ is a man} \\ 0 & \text{if } i \text{ is a woman} \end{cases}$$

and we can build the following model:

$$SALARY_i = \beta_1 + \beta_2 EDUC_i + \beta_3 GENDER_i + u_i$$

To interpret $\beta_3$, let us analyse the model for each gender:

♂: $E[SALARY_i \mid GENDER_i = 1] = \beta_1 + \beta_2 EDUC_i + \beta_3$
♀: $E[SALARY_i \mid GENDER_i = 0] = \beta_1 + \beta_2 EDUC_i$

$E[SALARY_i \mid ♂] - E[SALARY_i \mid ♀] = \beta_3$

$\beta_3$ reflects the difference in expected salary between a man and a woman with the same education.
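
As an illustration (not from the original slides), here is a minimal Python sketch of this model with statsmodels; the data and column names below are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "SALARY": [1200, 1500, 1100, 1700, 1300, 1600],
    "EDUC":   [10, 14, 9, 16, 12, 15],
    "SEX":    ["man", "woman", "woman", "man", "woman", "man"],
})

# Build the dummy: 1 for men, 0 for women.
df["GENDER"] = (df["SEX"] == "man").astype(int)

# SALARY_i = beta1 + beta2*EDUC_i + beta3*GENDER_i + u_i
results = smf.ols("SALARY ~ EDUC + GENDER", data=df).fit()
print(results.params)  # the GENDER coefficient estimates beta3
```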

► What would happen if the dummy variable is defined in the opposite way (1 for women, 0 for men)?

► What would happen if two dummy variables are used?

7.2 Dummy variables with any number of categories

If the qualitative variable admits $m$ categories ($m > 1$), one should define $m - 1$ dummy variables, one for each category except one; the remaining category will be the reference category. Each coefficient will reflect the difference between the corresponding category and the reference one.
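
A minimal Python sketch of this rule, assuming a hypothetical qualitative variable REGION with $m = 3$ categories; pandas drops one category so that it acts as the reference:

```python
import pandas as pd

# Hypothetical qualitative variable with m = 3 categories.
df = pd.DataFrame({"REGION": ["north", "south", "centre", "north", "south"]})

# drop_first=True keeps m - 1 = 2 dummies; the dropped category
# ("centre", first in alphabetical order) becomes the reference.
dummies = pd.get_dummies(df["REGION"], prefix="REGION", drop_first=True)
print(dummies)  # columns REGION_north and REGION_south
```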

7.3 Interactions with dummy variables

With the dummy defined in 7.1, let us consider the model

$$SALARY_i = \beta_1 + \beta_2 EDUC_i + \beta_3 GENDER_i + \beta_4 EDUC_i \cdot GENDER_i + u_i$$

where $EDUC_i \cdot GENDER_i$ is an interaction term. Then:

♂: $E[SALARY_i \mid ♂] = \beta_1 + \beta_2 EDUC_i + \beta_3 + \beta_4 EDUC_i = (\beta_1 + \beta_3) + (\beta_2 + \beta_4) EDUC_i$
♀: $E[SALARY_i \mid ♀] = \beta_1 + \beta_2 EDUC_i$

$E[SALARY_i \mid ♂] - E[SALARY_i \mid ♀] = \beta_3 + \beta_4 EDUC_i$

Then, the difference in salary between men and women will consist of a fixed quantity ($\beta_3$) and a variable quantity ($\beta_4 EDUC_i$) that depends on education.
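
A minimal Python sketch of the interaction model, again with hypothetical data; the coefficient names follow the formula interface of statsmodels:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data, as in the sketch of 7.1 (GENDER: 1 = man, 0 = woman).
df = pd.DataFrame({
    "SALARY": [1200, 1500, 1100, 1700, 1300, 1600],
    "EDUC":   [10, 14, 9, 16, 12, 15],
    "GENDER": [1, 0, 0, 1, 0, 1],
})

# EDUC:GENDER is the product term EDUC_i * GENDER_i.
results = smf.ols("SALARY ~ EDUC + GENDER + EDUC:GENDER", data=df).fit()
b3 = results.params["GENDER"]        # fixed part of the gap
b4 = results.params["EDUC:GENDER"]   # part that varies with education
print(f"expected gap at EDUC = 12: {b3 + b4 * 12:.2f}")
```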

7.4 Dummy variables and structural break
A structural break may be studied with the Chow test or with dummy variables. For example, to analyse the differences between genders:

CHOW TEST
MEN: $SALARY_i = \beta_{1,I} + \beta_{2,I} EDUC_i + u_{i,I}$   (sample size: $n_1$; residual sum of squares: $SSR_I$)
WOMEN: $SALARY_i = \beta_{1,II} + \beta_{2,II} EDUC_i + u_{i,II}$   (sample size: $n_2$; $SSR_{II}$)
ALL: $SALARY_i = \beta_1 + \beta_2 EDUC_i + u_i$   (sample size: $n_1 + n_2 = n$; $SSR$)

H0: $\beta_{1,I} = \beta_{1,II}$ and $\beta_{2,I} = \beta_{2,II}$ ▶ there are no differences between genders
H1: H0 not true ▶ there are differences between genders

and the test statistic will be:


$$\frac{\left[SSR - (SSR_I + SSR_{II})\right]/2}{(SSR_I + SSR_{II})/(n-4)} \sim F_{2,\,n-4}$$
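
A minimal Python sketch of the Chow statistic computed from the three regressions, using simulated (hypothetical) data for the two groups:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data for the two groups (all values are hypothetical).
rng = np.random.default_rng(0)
educ_m, educ_w = rng.uniform(8, 18, 30), rng.uniform(8, 18, 30)
sal_m = 500 + 90 * educ_m + rng.normal(0, 50, 30)   # men, group I
sal_w = 450 + 80 * educ_w + rng.normal(0, 50, 30)   # women, group II

def ssr(y, x):
    """Residual sum of squares of a simple regression with intercept."""
    return sm.OLS(y, sm.add_constant(x)).fit().ssr

ssr_I, ssr_II = ssr(sal_m, educ_m), ssr(sal_w, educ_w)
ssr_all = ssr(np.concatenate([sal_m, sal_w]),
              np.concatenate([educ_m, educ_w]))

n = len(sal_m) + len(sal_w)
F = ((ssr_all - (ssr_I + ssr_II)) / 2) / ((ssr_I + ssr_II) / (n - 4))
print(F)  # compare with the critical value of F(2, n-4)
```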

DUMMY VARIABLES
Now our model will be:
$$SALARY_i = \beta_1 + \beta_2 EDUC_i + \beta_3 GENDER_i + \beta_4 EDUC_i \cdot GENDER_i + u_i$$
The sample size is $n$, and GENDER takes value 1 for men and 0 otherwise. To analyse the differences in salaries between genders:

H0: $\beta_3 = 0$ and $\beta_4 = 0$ ▶ there are no differences between genders
H1: H0 not true ▶ there are differences between genders

and the test statistic will be:

$$\frac{(SSR_R - SSR_G)/2}{SSR_G/(n-4)} \sim F_{2,\,n-4}$$

where $SSR_R$ is the residual sum of squares of the restricted model (without $GENDER_i$ and the interaction term) and $SSR_G$ that of the general model above.
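
A minimal Python sketch of this equivalent test, reusing the simulated arrays (sal_m, educ_m, sal_w, educ_w) from the Chow sketch above:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Pool the two simulated samples from the Chow sketch above into one
# data set with a GENDER dummy (1 = man, 0 = woman).
df = pd.DataFrame({
    "SALARY": np.concatenate([sal_m, sal_w]),
    "EDUC":   np.concatenate([educ_m, educ_w]),
    "GENDER": np.concatenate([np.ones(30), np.zeros(30)]),
})

results = smf.ols("SALARY ~ EDUC + GENDER + EDUC:GENDER", data=df).fit()

# Joint test of beta3 = beta4 = 0; with the full set of interactions the
# unrestricted model matches the two separate regressions, so this F
# statistic reproduces the Chow one.
print(results.f_test("GENDER = 0, EDUC:GENDER = 0"))
```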

ECONOMETRICS

UNIT 8

EXTENSIONS AND PROBLEMS IN THE MLRM (I)

Amado Peiró (Universitat de València)

8.1. Collinearity

An MLRM presents collinearity or multicollinearity if there exists linear dependence among the regressors.
If the linear dependence is exact, there will be perfect or exact collinearity.
If the linear dependence is approximate (but not exact), there will be high (but non-perfect) collinearity.

8.1.1. Consequences of collinearity

If an MLRM presents perfect collinearity, then one cannot properly estimate the model, as the system of $k$ normal equations has infinitely many solutions.

If an MLRM presents high (but non-perfect) collinearity, then:

1. The model can be estimated, but the variances of the estimators will be large.
2. The standard deviations of the estimators will be large.
3. The standard errors of the estimators will be large.
4. The t-statistics will be close to zero.
5. In particular, non-significance of the regressors will not be rejected.
6. In addition, the estimators will be very sensitive to the sample; small changes in the sample can affect the estimates deeply.

8.1.2. Detection of collinearity

The existence of perfect collinearity is evident, as the model cannot be estimated.

One can suspect strong (but non-perfect) collinearity when:

1. The model is highly significant, but the variables are not significant individually.
2. There is a strong correlation (close to −1 or close to +1) between two regressors.
3. The regression of one regressor against the other regressors yields a high $R^2$ (the sketch below illustrates this check).
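
A minimal Python sketch of check 3 with hypothetical, nearly collinear regressors; the VIF reported by statsmodels is simply $1/(1 - R^2)$ of this auxiliary regression:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical regressors: x2 is almost a linear function of x1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=100)
X = sm.add_constant(np.column_stack([x1, x2]))

# Check 3: regress one regressor (column 1, i.e. x1) on the others.
r2 = sm.OLS(X[:, 1], np.delete(X, 1, axis=1)).fit().rsquared
print(r2)                               # close to 1 -> strong collinearity
print(variance_inflation_factor(X, 1))  # equivalently, VIF = 1/(1 - R^2)
```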

8.1.3. Solutions to collinearity

As the problem of collinearity is due to the existence of linear dependence between regressors, possible solutions to break this linear dependence are:

► To drop regressors involved in the linear dependence.
► To increase the sample with observations that do not exhibit this linear dependence.

8.2. Heteroscedasticity

If all the errors have the same variance,

$$Var(u_1) = Var(u_2) = \cdots = Var(u_n) = \sigma^2,$$

the model is homoscedastic. Otherwise, it is heteroscedastic. That is, if not all the errors have the same variance, the model presents heteroscedasticity.

8.2.1. Consequences of heteroscedasticity

If the regression model has heteroscedasticity, the OLS estimators are still linear and unbiased, but they are no longer best (minimum-variance) estimators, and the conventional tests are no longer valid.

8.2.2. Detection of heteroscedasticity
One of the most usual tests for heteroscedasticity is the White test. The hypotheses are:

H0: $Var(u_i) = \sigma^2$
H1: $Var(u_i) = \sigma_i^2$

The test statistic is obtained with the following steps (see the sketch after this list):

1. The original regression is run.
2. The squared residuals of the preceding regression are regressed against a constant, the regressors of the original regression, their squares and, possibly, their cross products, without duplicate terms.
3. The test statistic will be $N \cdot R^2$, where $R^2$ is the coefficient of determination of the preceding auxiliary regression. Under the null hypothesis of homoscedasticity, this statistic follows asymptotically a $\chi^2$ distribution with a number of degrees of freedom equal to the number of regressors of the auxiliary regression excluding the intercept.
