Dummy Variable Regression Models: Dichotomous Variables

Dummy variable regression models allow qualitative explanatory variables to be used in regression analysis. Dummy variables assign numeric values (usually 0 and 1) to qualitative categories to quantify them. For example, a regression model with gender as a dummy explanatory variable would assign males as 0 and females as 1. Introducing dummy variables allows estimating different intercepts for each qualitative group while maintaining the same slope. The statistical significance and values of dummy variable coefficients indicate differences between group means for the dependent variable.

Uploaded by

Arunima

Dummy Variable Regression Models

-Yogita Yadav

In all the regression models discussed so far, the dependent and explanatory variables were
quantitative. This is not always the case: explanatory variables are often qualitative, for example
gender, religion, color or nationality. Such qualitative variables enter a regression through
Dummy Variables (AKA indicator/ categorical/ binary/ dichotomous variables).
A qualitative variable is quantified with an artificial variable that takes the value 1 when an
attribute is present and 0 when it is absent.

Suppose we have a regression model where Y is dependent on one qualitative variable – ‘Gender’

Yi = B1 + B2 Di + ui (1.1)

Yi = Annual food expenditure (in $)

Di = 0, for males

Di = 1, for females

Here, Di is a dummy variable representing gender.

Regression models that contain only dummy explanatory variables, such as (1.1), are called
analysis-of-variance (ANOVA) models.

Model (1.1) is similar to the two-variable regression model discussed earlier, except that the
explanatory variable is qualitative (Di instead of Xi). Since Di remains fixed from sample to
sample (just like Xi), and assuming ui satisfies the usual assumptions of the CLRM, the OLS method
can be used to estimate the parameters of model (1.1).

Now,

Average annual food expenditure for males:

E(Yi | Di = 0) = B1

And, average annual food expenditure for females:

E(Yi | Di = 1) = B1 + B2

So, B2 measures by how much the average annual food expenditure of females differs from that of
males.
Since Di takes only the values 0 and 1, there is no continuous regression line, so it is not
appropriate to call B2 the slope coefficient. B2, in this case, is called the differential intercept term.

We know that, using the OLS method,

b2 = ∑ di yi / ∑ di²

and,

b1 = Y̅ - b2 D̅

where di = Di - D̅ and yi = Yi - Y̅
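The formulas above can be checked numerically. The sketch below uses a small made-up data set (the six expenditure figures are hypothetical, not taken from these notes) and suggests that, with a single 0/1 dummy, b1 equals the mean of the base group and b2 equals the difference between the two group means.

```python
import numpy as np

# Hypothetical data (not from the notes): annual food expenditure in $
# D = 0 for males, D = 1 for females
Y = np.array([3000.0, 3200.0, 3400.0, 2500.0, 2700.0, 2900.0])
D = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Deviations from the sample means
d = D - D.mean()
y = Y - Y.mean()

# OLS formulas from the text
b2 = (d * y).sum() / (d ** 2).sum()
b1 = Y.mean() - b2 * D.mean()

# Group means, for comparison with the OLS estimates
mean_male = Y[D == 0].mean()
mean_female = Y[D == 1].mean()

print("b1 =", b1, " male mean =", mean_male)
print("b2 =", b2, " female mean - male mean =", mean_female - mean_male)
```

In this sketch the intercept reproduces the male (base category) mean and b1 + b2 reproduces the female mean.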

* The category for which dummy takes the value 0 is known as the Benchmark/ Base/ Reference
Category. For this particular example, Male is the base category.

Let the estimated regression line for model (1.1) be given by

Ŷi = 3176 – 506 Di

(SE) (233) (329)

(t-ratio) (13.6) (-1.52)

r² = 0.1890

So, average food expenditure of males = $3176 ( Di = 0 for males )

average food expenditure of females = 3176 – 506 = $2670 ( Di = 1 for females )

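However, if we check the statistical significance of B2, the null hypothesis

H0: B2 = 0

will not be rejected. (WHY?) The absolute t-ratio of b2, 1.52, is below the 5% two-tailed critical
value, which is never smaller than 1.96 whatever the degrees of freedom.

This means that there is no statistically significant difference between the average annual
expenditure on food of males and females.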
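As a rough check of the significance test, the t-ratios can be recomputed from the reported coefficients and standard errors. The sample size is not given in these notes, so the sketch below assumes a rule-of-thumb critical value of 2; the name `T_CRIT` is introduced only for illustration. Because the reported figures are rounded, the recomputed ratios may differ slightly from the printed ones.

```python
# Reported estimates and standard errors for model (1.1)
b1, se_b1 = 3176.0, 233.0
b2, se_b2 = -506.0, 329.0

# t-ratio = estimate / standard error (under H0: coefficient = 0)
t_b1 = b1 / se_b1
t_b2 = b2 / se_b2

# Assumption: rule-of-thumb critical value of about 2 (5% level, two-tailed)
T_CRIT = 2.0

reject_b1 = abs(t_b1) > T_CRIT
reject_b2 = abs(t_b2) > T_CRIT

print("t(b1) =", round(t_b1, 2), "reject H0:", reject_b1)
print("t(b2) =", round(t_b2, 2), "reject H0:", reject_b2)
```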

What if we change the base category (i.e. we swap which group is coded 0 and which is coded 1)?

Suppose now,

Di = 0 for female

and Di = 1 for males

How will the regression results change?

• b1 will be different (WHY?)

• The sign of b2 will change but its absolute value will remain the same (WHY?)

• r² and the absolute value of the t-ratio for b2 will remain the same; the t-ratio changes sign along with b2 (WHY?)

Suppose, the newly estimated regression line is given by

Ŷi = 2670 + 506 Di

(SE) (233) (329)

(t-ratio) (11.4) (1.52)

r² = 0.1890

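Therefore, changing the base category does not affect the substance of the regression results: only the intercept and the sign of the differential intercept change, while the estimated group means and the fit remain the same.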
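The effect of switching the base category can be illustrated with a short sketch. The data and the helper `ols_simple` are hypothetical, introduced only for this example; the point is that b2 flips sign, the new intercept equals the old b1 + b2, and r² is unchanged.

```python
import numpy as np

def ols_simple(Y, D):
    """Return (b1, b2, r2) for the regression Y = b1 + b2*D + e."""
    d = D - D.mean()
    y = Y - Y.mean()
    b2 = (d * y).sum() / (d ** 2).sum()
    b1 = Y.mean() - b2 * D.mean()
    resid = Y - (b1 + b2 * D)
    r2 = 1.0 - (resid ** 2).sum() / (y ** 2).sum()
    return b1, b2, r2

# Hypothetical data (not from the notes)
Y = np.array([3000.0, 3200.0, 3400.0, 2500.0, 2700.0, 2900.0])
female = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
male = 1.0 - female

b1_f, b2_f, r2_f = ols_simple(Y, female)  # base category: male
b1_m, b2_m, r2_m = ols_simple(Y, male)    # base category: female

print("male as base:  ", b1_f, b2_f, round(r2_f, 4))
print("female as base:", b1_m, b2_m, round(r2_m, 4))
```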

What if we introduce two different dummy variables for the two categories, i.e. males and females?

Let the new regression line be given by

Yi = B1 + B2 D2i + B3 D3i + ui

Where, D2i = 0 for males

D2i = 1 for females

And, D3i = 0 for females

D3i = 1 for males

Estimating this particular regression will not be possible. (WHY?)

Because one of the assumptions of the CLRM is that there is no perfect multicollinearity. However, for
this model

D2i + D3i = 1

for every observation, which means D2 and D3 are perfectly linearly related to each other and to the
intercept term (whose regressor always equals 1).

This situation is called the DUMMY VARIABLE TRAP, and we would not be able to estimate the
regression coefficients.

GENERAL RULE: If a model has a common intercept, B1, and a qualitative variable has M categories,
then introduce only M-1 dummy variables. {The base category should remain the same for each
dummy variable.}
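The dummy variable trap can be seen by checking the rank of the regressor matrix. The sketch below is illustrative only, with a made-up six-observation sample: with an intercept and both dummies there are three columns but only two independent ones, so the normal equations have no unique solution. Dropping one dummy restores full column rank.

```python
import numpy as np

# Hypothetical sample: three males followed by three females
female = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # D2
male = 1.0 - female                                 # D3
const = np.ones_like(female)                        # regressor for the intercept B1

# Intercept plus both dummies: D2 + D3 equals the constant column
X_trap = np.column_stack([const, female, male])
# Intercept plus M-1 = 1 dummy
X_ok = np.column_stack([const, female])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print("columns:", X_trap.shape[1], "rank:", rank_trap)  # rank below number of columns
print("columns:", X_ok.shape[1], "rank:", rank_ok)      # full column rank
```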

Question: Instead of (1.1), suppose we estimate the following regression,

Ŷi = b1’ + b2’ Di’

Where, Di’ = 0 for males

Di’ = 2 for females

How will b2’ differ from b2? What would happen to SE(b2’) and the t-ratio?

{HINT: Di’ = 2Di and we have done similar questions for change in Xi}
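One way to explore this question is to run both regressions on the same sample and compare. The sketch below uses hypothetical data and a hypothetical helper `ols_with_se` that applies the usual two-variable OLS formulas; it is meant as a numerical check of your own reasoning, not as a substitute for the algebra suggested in the hint.

```python
import numpy as np

def ols_with_se(Y, D):
    """Return (b1, b2, se_b2, t_b2) for Y = b1 + b2*D + e."""
    n = len(Y)
    d = D - D.mean()
    y = Y - Y.mean()
    b2 = (d * y).sum() / (d ** 2).sum()
    b1 = Y.mean() - b2 * D.mean()
    resid = Y - (b1 + b2 * D)
    sigma2 = (resid ** 2).sum() / (n - 2)     # estimate of the error variance
    se_b2 = np.sqrt(sigma2 / (d ** 2).sum())  # standard error of b2
    return b1, b2, se_b2, b2 / se_b2

# Hypothetical data (not from the notes)
Y = np.array([3000.0, 3200.0, 3400.0, 2500.0, 2700.0, 2900.0])
D = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

b1, b2, se, t = ols_with_se(Y, D)
b1_scaled, b2_scaled, se_scaled, t_scaled = ols_with_se(Y, 2.0 * D)  # Di' = 2 Di

print("original:", b1, b2, se, t)
print("rescaled:", b1_scaled, b2_scaled, se_scaled, t_scaled)
```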

ANOVA regression models, although useful, are not so common in the field of economics. In most
economic research a regression model contains a combination of qualitative and quantitative
variables. Such regression models (containing both types of variables) are called
analysis-of-covariance (ANCOVA) models.

Next we will extend (1.1) by adding an explanatory variable.

Let

PRF: Yi = B1 + B2 Di + B3 Xi + ui (1.2)

Where Yi = Annual expenditure on food

Xi = after tax income

and Di = 0 for males

Di = 1 for females

Suppose the estimated regression equation is

Ŷi = 1506 – 228.98 Di + 0.06 Xi

(t-ratio) (8.01) (-2.38) (9.64)

R² = 0.9268

For (1.2),

• H0: B2 = 0 will be rejected (WHY?)

Therefore, B2 is statistically significant. This means gender influences food expenditure: there is
a significant difference between male and female expenditure on food.

• H0: B3 = 0 will be rejected (WHY?)

As after-tax income increases, the level of expenditure on food increases (which makes sense).

So, B3 is also statistically significant.

• R² has increased substantially (compared with the r² value of model 1.1)

CONCLUSION: MODEL 1.2 seems better than MODEL 1.1

In model 1.1, we were committing a mis-specification error, i.e. omission of a relevant explanatory
variable.

INTERPRETATION OF REGRESSION COEFFICIENTS:

B2 – If we keep after-tax income constant, the mean food expenditure of females is less than
that of males by $228.98.

B3 – If after-tax income increases by $1, mean food expenditure increases by $0.06 (6 cents),
keeping gender constant.

Here, B3 is the marginal propensity of food consumption.

For model (1.2), we can have two different regression equations for the two categories.

Mean food expenditure regression for males:

Ŷi = 1506 + 0.06 Xi

Mean food expenditure regression for females:

Ŷi = 1277.02 + 0.06 Xi

Notice that the intercepts differ but the slope is the same for the two regression lines. Thus, we
have a case of parallel regressions.
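The parallel-lines property can be verified with simple arithmetic on the estimated coefficients of model (1.2). In the sketch below, the function name and the two income levels are hypothetical choices for illustration; the female intercept is computed as B1 + B2, and the male-female gap comes out the same at every income level.

```python
# Estimated coefficients of model (1.2) as reported in the text
B1, B2, B3 = 1506.0, -228.98, 0.06

def predicted_food_expenditure(income, female):
    """Fitted value for a given after-tax income; female is 1 or 0."""
    return B1 + B2 * female + B3 * income

# Intercepts of the two group-specific lines (income = 0)
intercept_male = predicted_food_expenditure(0.0, 0)
intercept_female = predicted_food_expenditure(0.0, 1)

# Gap between males and females at two illustrative income levels
gap_low = predicted_food_expenditure(20000.0, 0) - predicted_food_expenditure(20000.0, 1)
gap_high = predicted_food_expenditure(40000.0, 0) - predicted_food_expenditure(40000.0, 1)

print("male intercept:", intercept_male)
print("female intercept:", round(intercept_female, 2))
print("gap at two incomes:", round(gap_low, 2), round(gap_high, 2))
```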

DUMMY INTERACTING WITH SLOPE:

Next we introduce a new term in the model, in which the dummy interacts with the quantitative regressor Xi and so allows the slope to differ between the two categories.

Yi = B1 + B2 Di + B3 Xi + B4 (Di Xi ) + ui (1.3)

For (1.3),

Average male food consumption expenditure:

E(Yi | Di = 0, Xi) = B1 + B3 Xi

Average female food consumption expenditure:

E(Yi | Di = 1, Xi) = (B1 + B2) + (B3 + B4) Xi

Here, B2 – Differential intercept term

B4 – Differential slope term / slope drifter

Notice: When we add the dummy in additive form (as we did in model 1.2), we look at differences in
the intercepts of the two categories; when we add the dummy in multiplicative / interactive
form (as in model 1.3), we look at differences in the slopes of the two categories.
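A minimal sketch of how model (1.3) can be set up for estimation: the regressor matrix has four columns (constant, Di, Xi, Di Xi). The data below are hypothetical and noise-free, generated from a male line with intercept 1400 and slope 0.06 and a female line with intercept 1300 and slope 0.05, so the least-squares fit should recover B2 and B4 as the differences in intercept and slope.

```python
import numpy as np

# Hypothetical, noise-free data (not from the notes)
X = np.array([10000.0, 20000.0, 30000.0, 10000.0, 20000.0, 30000.0])  # income
D = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])                           # 1 = female
Y = np.where(D == 0, 1400.0 + 0.06 * X, 1300.0 + 0.05 * X)

# Columns: constant, dummy, income, dummy * income
design = np.column_stack([np.ones_like(X), D, X, D * X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
b1, b2, b3, b4 = coef

print("male line:   intercept", round(b1, 2), "slope", round(b3, 4))
print("female line: intercept", round(b1 + b2, 2), "slope", round(b3 + b4, 4))
print("differential intercept b2:", round(b2, 2))
print("differential slope b4:", round(b4, 4))
```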

Let the estimated regression equation be given by:

Ŷi = 1432 – 67.89 Di + 0.06 Xi – 0.006 (Di Xi)

(t-ratio) (5.76) (-0.193) (7.31) (-0.484)

R² = 0.93

For the above regression line,

• H0: B2 = 0 will not be rejected (WHY?)

Di is statistically insignificant

• H0: B4 = 0 will not be rejected (WHY?)

(Di Xi) is also statistically insignificant

• R² has increased only marginally (the small increase is due to the addition of an explanatory
variable)

CONCLUSION: Model 1.2 is better than Model 1.3. In model 1.3 we are committing a mis-specification
error, i.e. inclusion of an unnecessary variable.

Therefore, model 1.2 seems to be the most relevant among the three models discussed so
far.

To summarize,

• H0 : B2 = 0 → Checks for same intercept

• H0 : B4 = 0 → Checks for same slope

So, we have the following four cases

1) H0 : B2 = 0 → Reject

H0 : B4 = 0 → Reject

We will get dissimilar regressions.

2) H0 : B2 = 0 → Reject

H0 : B4 = 0 → Don’t Reject

We will get parallel regressions.

3) H0 : B2 = 0 → Don’t Reject

H0 : B4 = 0 → Reject

We will get concurrent regressions.

4) H0 : B2 = 0 → Don’t Reject

H0 : B4 = 0 → Don’t Reject

We will get coincident regressions.
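The four cases can be written as a small decision rule. The function below is a hypothetical helper, not part of the original notes; it takes the outcomes of the two tests as booleans and returns the corresponding label.

```python
def classify_regressions(reject_b2, reject_b4):
    """Map the two test outcomes to one of the four cases.

    reject_b2: True if H0: B2 = 0 (same intercept) is rejected
    reject_b4: True if H0: B4 = 0 (same slope) is rejected
    """
    if reject_b2 and reject_b4:
        return "dissimilar"   # different intercepts and different slopes
    if reject_b2:
        return "parallel"     # different intercepts, same slope
    if reject_b4:
        return "concurrent"   # same intercept, different slopes
    return "coincident"       # same intercept and same slope

for r2_flag in (True, False):
    for r4_flag in (True, False):
        print(r2_flag, r4_flag, "->", classify_regressions(r2_flag, r4_flag))
```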
