Lecture7 - Regression Extensions

The document discusses regression analysis techniques for categorical variables, including the creation of dummy variables, interactions between variables, and the effects of seasonality and time trends. It explains how to interpret coefficients for categorical variables and the importance of reference groups in regression models. Additionally, it provides practical examples and applications of these concepts in data analysis.

Uploaded by

JackCaizhizhao


Regression Extensions

Chu Junhong
[email protected]
HKU Business School
Road Map
Regression on categorical variables
Interactions
Different slopes
Different intercepts
Seasonality
Day of week effect
Month of year effect
Hour of day effect
Time trend
Categorical Variables
A categorical variable, also called a qualitative variable, takes one of a fixed, countable number of distinct groups (attribute levels) and assigns each individual to a particular group on the basis of some qualitative property.
Examples: religion, gender, undergraduate university, college major, product brand, distribution channel, season, day of the week, …
The groups have no natural order, i.e., you cannot say that winter is better/worse than summer.
Categorical Variables
In the regressions covered here, categorical variables cannot be the Y variable; they can only be X variables
However, statistical models can only work on numerical data
If you have categorical variables (qualitative data), what should you do?

But the numbers 1, …, 7 are just labels. They do not carry any numerical value (e.g., 7 is not larger than 6 by 1), and can easily be relabeled as, say, A, …, G without any loss of information.
We cannot simply put these numbers into a regression equation.
Solution: Create Dummy Variables
A dummy variable is an indicator that
takes only two values: 0 and 1
is used to represent each response or attribute level of a categorical variable
# of dummy variables needed = # of levels – 1
Examples
Gender: 1) male, 2) female => need 1 dummy variable
Education: 1) no schooling, 2) primary, 3) middle school, 4) high school, 5) college+ => need 4 dummy variables
Religion: 1) Catholic, 2) Christian, 3) Buddhist, 4) Muslim, 5) Free thinker, 6) others => need 5 dummy variables
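The "levels minus one" rule can be sketched in a few lines of Python. This is only an illustration: the function name `make_dummies`, the choice of "no schooling" as the reference level, and the six sample observations are assumptions made for the example, not part of the lecture material.

```python
def make_dummies(values, reference):
    """Return (kept_levels, rows): one 0/1 column per level except the reference."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in the data")
    kept = [lev for lev in levels if lev != reference]
    rows = [[1 if v == lev else 0 for lev in kept] for v in values]
    return kept, rows


# Hypothetical observations of the 5-level education variable.
education = ["primary", "college+", "high school", "no schooling",
             "middle school", "college+"]

names, rows = make_dummies(education, reference="no schooling")
print(names)  # the 4 levels that get a dummy
for level, row in zip(education, rows):
    print(f"{level:15s} {row}")
```

An observation in the reference level has all dummies equal to 0, which is why no separate column is needed for it.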
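Why Only K-1 Dummies for K Attributes?
What is the corresponding x for the intercept? (a vector of 1’s)
For every observation, the K dummies add up to 1, so together they reproduce this vector of 1’s exactly. With an intercept in the model, a full set of K dummies is perfectly collinear with it, so one dummy must be dropped.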
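A quick numerical check of this point, assuming NumPy is available: with an intercept column of 1's, adding all K dummies gives a design matrix that is not of full column rank, while K-1 dummies do. The group labels below are made up for the illustration.

```python
import numpy as np

# Hypothetical categorical variable with K = 3 levels, coded 0, 1, 2.
group = np.array([0, 1, 2, 0, 1, 2, 0, 1])
K = 3

dummies = (group[:, None] == np.arange(K)).astype(float)  # n x K indicator matrix
ones = np.ones((len(group), 1))                           # the x for the intercept

X_all = np.hstack([ones, dummies])          # intercept + K dummies
X_drop = np.hstack([ones, dummies[:, 1:]])  # intercept + K-1 dummies (level 0 is the base)

rank_all = np.linalg.matrix_rank(X_all)
rank_drop = np.linalg.matrix_rank(X_drop)

print("columns:", X_all.shape[1], "rank:", rank_all)    # rank is below the column count
print("columns:", X_drop.shape[1], "rank:", rank_drop)  # full column rank
```

Because the K dummy columns sum to the column of 1's, the regression coefficients in the first design are not uniquely determined.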
Treat Numeric Var as Categories
We can also treat numeric variables (e.g., income bracket, distance segment) as categorical
To examine the effect of each group separately
To allow for non-linear effects (will use examples)
Linear effect: the effect of increasing from 1 to 2 is the same as increasing from 2 to 3, or from 3 to 4, …
Nonlinear effect: the effect of increasing from 1 to 2 ≠ from 2 to 3 ≠ from 3 to 4, …
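The difference between the two treatments can be seen on a small made-up example (the bracket means below are invented, and NumPy is assumed). Entering the bracket as a number forces one common step size; entering it as dummies lets each step differ.

```python
import numpy as np

# Hypothetical data: 4 brackets, 5 observations each, with unequal steps between brackets.
bracket = np.repeat([1, 2, 3, 4], 5).astype(float)
true_means = {1: 10.0, 2: 11.0, 3: 15.0, 4: 15.5}
y = np.array([true_means[int(b)] for b in bracket])

# (a) Numeric treatment: a single slope for every one-bracket increase.
X_lin = np.column_stack([np.ones_like(bracket), bracket])
b_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)

# (b) Categorical treatment: dummies for brackets 2, 3, 4 (bracket 1 is the base).
X_cat = np.column_stack(
    [np.ones_like(bracket)] + [(bracket == k).astype(float) for k in (2, 3, 4)]
)
b_cat, *_ = np.linalg.lstsq(X_cat, y, rcond=None)

# Step from each bracket to the next implied by the dummy coefficients.
steps = np.diff(np.concatenate([[0.0], b_cat[1:]]))

print("single linear slope:", b_lin[1])
print("step-by-step effects from dummies:", steps)
```

The dummy specification uses more parameters, so it is usually worth it only when the steps are suspected to be unequal.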
How to Interpret Dummy Variables?
For k levels (attributes, responses), we can only have k-1 dummy variables in a regression as X variables if the variable enters alone.
One level (attribute/response) is reserved as the “base”; all interpretation is relative to the base/reference, whose coefficient is 0
If the coefficient is positive, that level is higher/larger than the base
If the coefficient is negative, that level is lower/smaller than the base
If female is the base:
When the male coefficient is positive, it means that “compared to females, males are on average taller by…”
When the male coefficient is negative, it means that “compared to females, males on average have lower xxx by…”
Whichever group is used as the reference, the coefficients on the dummies will differ, but the implied differences between groups, and hence the interpretation, will be the same
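The last point can be checked directly. The sketch below fits the same tiny, invented height data twice, once with each gender as the reference; the helper `fit_with_reference` is a hypothetical name and NumPy is assumed.

```python
import numpy as np


def fit_with_reference(group, y, levels, reference):
    """OLS of y on an intercept plus dummies for every level except `reference`."""
    kept = [lev for lev in levels if lev != reference]
    X = np.column_stack(
        [np.ones(len(y))] + [(group == lev).astype(float) for lev in kept]
    )
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    coefs = {reference: 0.0}                 # the base level has coefficient 0
    coefs.update(dict(zip(kept, beta[1:])))
    return beta[0], coefs


# Hypothetical heights in cm.
group = np.array(["female", "female", "female", "male", "male", "male"])
height = np.array([160.0, 162.0, 164.0, 172.0, 174.0, 176.0])
levels = ["female", "male"]

results = {ref: fit_with_reference(group, height, levels, ref) for ref in levels}

for ref, (intercept, coefs) in results.items():
    group_means = {lev: intercept + c for lev, c in coefs.items()}
    print(f"base={ref}: intercept={intercept:.1f}, coefs={coefs}, means={group_means}")
```

The intercept and the sign of the dummy coefficient flip with the base, but intercept plus coefficient gives the same group mean either way.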
Check Whether WTP Depends on Your Undergraduate Major
Count the frequency of each major and sort in descending order
Create IDs for undergraduate majors
Merge the undergraduate major IDs back to the data
Generate dummies for undergraduate majors
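The four steps above can be sketched in plain Python. This is not the code used in the lecture; the sample rows, the field names (`major`, `wtp`, `major_id`, `maj2`, …) and the tie-breaking rule are assumptions made for the illustration.

```python
from collections import Counter

# Hypothetical survey rows; not taken from the lecture data.
rows = [
    {"major": "Economics", "wtp": 30.0},
    {"major": "Engineering", "wtp": 25.0},
    {"major": "Economics", "wtp": 35.0},
    {"major": "Finance", "wtp": 40.0},
    {"major": "Economics", "wtp": 28.0},
    {"major": "Engineering", "wtp": 22.0},
]

# Step 1: count the frequency of each major and sort in descending order
# (ties broken alphabetically so the result is reproducible).
freq = Counter(r["major"] for r in rows)
ordered = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))

# Step 2: create IDs, with 1 for the most frequent major.
major_id = {major: i + 1 for i, (major, _) in enumerate(ordered)}

# Step 3: merge the IDs back to the data.
# Step 4: generate dummies for IDs 2..K (ID 1 serves as the reference).
for r in rows:
    r["major_id"] = major_id[r["major"]]
    for k in range(2, len(major_id) + 1):
        r[f"maj{k}"] = int(r["major_id"] == k)

print(major_id)
print(rows[3])
```

Numbering the majors by frequency makes the largest group the natural default reference.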
Different Intercepts
Using “array” to generate dummies for major IDs
Run regressions on dummies
Use MajorID=1 as the reference
Use MajorID=4 as the reference
It’s quite cumbersome to create dummy variables, esp. when you have a large number of attributes. You can use Proc GLM to treat each level of attribute as a dummy.
Change the default reference group
Different Intercepts, Same Slope for each Undergraduate major: Parallel Lines
[Figure: WTP for wine (vertical axis) against Height (horizontal axis), one line per major (maj)]
Interactions (1): categorical*continuous
In data analysis, we often interact two independent variables.
If $x_1$ is categorical (undergraduate major) and $x_2$ is continuous (height, price), then $x_1 \times x_2$ means that you will have one slope for each level of $x_1$. You will have K slopes.
The different effects of your height on your WTP by undergraduate majors
The different effects of father’s height on your height by undergraduate majors
3 outlet types in the IRI data (grocery store, mass merchandisers, and drug store): you will have a slope for each of them
Different Slopes for each Major
[Figure: WTP for wine (vertical axis) against Height (horizontal axis), one line per major (maj)]
Interactions (2): categorical + categorical*continuous
In data analysis, we often interact two independent variables.
If $x_1$ is categorical (undergraduate major) and $x_2$ is continuous (height, price), then $x_1$ means that you will have one intercept for each level of $x_1$ (K-1 in total); $x_1 \times x_2$ means that you will have one slope for each level of $x_1$. You will have K slopes.
Different intercepts + different slopes
[Figure: WTP for wine (vertical axis) against Height (horizontal axis), one line per major (maj)]
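A minimal sketch of a model with both group-specific intercepts and group-specific slopes, assuming NumPy and two invented majors "A" (base) and "B" whose WTP lies exactly on two different lines. The design matrix has four columns: intercept, major dummy, height, and dummy times height.

```python
import numpy as np

# Hypothetical data: two majors, each lying exactly on its own line.
height = np.array([160.0, 165.0, 170.0, 175.0, 180.0] * 2)
is_b = np.array([0.0] * 5 + [1.0] * 5)  # dummy: 1 for major B, 0 for the base major A
wtp = np.where(is_b == 1, 40.0 + 0.2 * height, 10.0 + 0.5 * height)

# Columns: intercept, major dummy, height, major dummy * height (the interaction).
X = np.column_stack([np.ones_like(height), is_b, height, is_b * height])
beta, *_ = np.linalg.lstsq(X, wtp, rcond=None)
b0, b_major, b_height, b_inter = beta

# Each group's line is recovered by adding its own terms to the base terms.
intercepts = {"A": b0, "B": b0 + b_major}
slopes = {"A": b_height, "B": b_height + b_inter}

print("intercepts:", intercepts)
print("slopes:", slopes)
```

Dropping the `is_b` column while keeping the interaction would give a common intercept with different slopes; dropping the interaction instead gives the parallel-lines model.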
Interactions: Categorical*Categorical
In data analysis, we often interact two independent variables.
If $x_1$ and $x_2$ are both categorical, then $x_1 \times x_2$ means that you will have one intercept for each combination of these two categorical variables.
3 outlet types: grocery store, drug store, mass merchandisers; 2 markets: Eau Claire and Pittsfield
4 majors, 2 genders
One intercept for each combination
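For the outlet-by-market example, the combinations can be enumerated explicitly. The sketch below is illustrative only: the helper `cell_dummies` and the choice of the first cell as the base are assumptions.

```python
from itertools import product

outlets = ["grocery store", "drug store", "mass merchandiser"]
markets = ["Eau Claire", "Pittsfield"]

# Every outlet-market combination gets its own intercept: 3 x 2 = 6 cells.
cells = list(product(outlets, markets))


def cell_dummies(outlet, market, base):
    """0/1 indicators for every cell except the base cell."""
    return {cell: int(cell == (outlet, market)) for cell in cells if cell != base}


base_cell = cells[0]  # arbitrary choice of reference combination
row = cell_dummies("drug store", "Pittsfield", base_cell)

print(len(cells), "cells;", len(row), "dummies when the model has an intercept")
print(row)
```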
Interactions: Continuous*Continuous
In data analysis, we often interact two independent variables.
If $x_1$ and $x_2$ are both continuous (e.g., advertising and discount), then $x_1 \times x_2$ means that the marginal effect of $x_1$ on y depends on the value of $x_2$, and vice versa.
$x_1$, $x_2$: main effects
$x_1 \times x_2$: interaction effect
Main effects + Interaction Effects
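Writing the two continuous variables as $x_1$ and $x_2$ and the coefficients as $\beta_0, \dots, \beta_3$ (notation assumed here for illustration), the dependence of each marginal effect on the other variable follows from differentiating the regression equation:

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3\, x_1 x_2 + \varepsilon \\
\frac{\partial y}{\partial x_1} &= \beta_1 + \beta_3 x_2 \\
\frac{\partial y}{\partial x_2} &= \beta_2 + \beta_3 x_1
\end{align*}
```

So $\beta_1$ alone is the effect of $x_1$ only when $x_2 = 0$, and $\beta_3$ measures how much that effect changes per unit of $x_2$.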
Seasonality and Time Trend
Seasonality means y differs by season
Season can be year, quarter, month, day of week, hour of day, etc.
Need to create dummies to check seasonality
Time trend means a “long-term” increase or decrease (can be nonlinear) in y
Need to create a continuous variable and include it in the regression model
Seasonality and time trend can both be present in the same data
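A small sketch of how the two pieces enter one regression, assuming NumPy and a noise-free, invented daily series: seasonality is captured by day-of-week dummies and the trend by a running day counter.

```python
import numpy as np

# Hypothetical daily series: 10 weeks, a linear trend plus a day-of-week pattern.
n_days = 70
t = np.arange(n_days, dtype=float)  # continuous time variable (day counter)
dow = (np.arange(n_days) % 7)       # day of week, 0..6
dow_effect = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 20.0, 15.0])
y = 100.0 + 0.5 * t + dow_effect[dow]

# Seasonality: 6 dummies (day 0 is the base). Time trend: the counter t itself.
dow_dummies = np.column_stack([(dow == d).astype(float) for d in range(1, 7)])
X = np.column_stack([np.ones(n_days), dow_dummies, t])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, dow_coefs, trend = beta[0], beta[1:7], beta[7]

print("day-of-week effects relative to day 0:", np.round(dow_coefs, 3))
print("time trend per day:", round(float(trend), 3))
```

A nonlinear trend could be allowed for by adding, for example, a squared time term as a further column.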
Seasonality with no time trend
Upward time trend with No seasonality
Downward time trend with No seasonality
Nonlinear time trend, No seasonality
Upward Time Trend with Seasonality
Downward Time Trend with Seasonality
Use Ride Data to Practice Seasonality + Time Trend
Seasonality – Day of Week effect
Seasonality – Month of Year Effect
Seasonality – Day of Week & Month of Year Effect
Time Trend
Seasonality (DoW+Month) + Time Trend

After we control for the month effect and the day of week effect, there is no more time trend.
If we examine the month effect, October is the highest and August is the lowest, which likely captures the time trend.
To check on this, let’s drop the month effect in our model.
Drop Month Effect

The time trend becomes significant: the calls grow over time.
This confirms our guess that the month effect absorbs the time trend.
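Why month dummies can absorb a trend over a short sample can be illustrated with a simulation. Everything here is assumed for the sketch (NumPy, 120 simulated days split into four crude 30-day "months", a true upward trend, and normal noise); it is not the Ride data. Once month dummies are included, the trend is identified only from within-month movement, so its standard error grows.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and their conventional standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


rng = np.random.default_rng(0)
n_days = 120
t = np.arange(n_days, dtype=float)
month = (t // 30).astype(int)  # crude month index 0..3
y = 50.0 + 0.10 * t + rng.normal(0.0, 3.0, n_days)  # simulated series with a true trend

ones = np.ones(n_days)
month_dummies = np.column_stack([(month == m).astype(float) for m in (1, 2, 3)])

X_full = np.column_stack([ones, month_dummies, t])  # month effects + trend
X_trend = np.column_stack([ones, t])                # trend only

b_full, se_full = ols(X_full, y)
b_trend, se_trend = ols(X_trend, y)

print("trend with month dummies:    coef", b_full[-1], "s.e.", se_full[-1])
print("trend without month dummies: coef", b_trend[-1], "s.e.", se_trend[-1])
```

With several years of data the month dummies repeat while time keeps increasing, so the two effects are much easier to separate.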
Let’s also practice interactions with the Ride data
Price sensitivity by distance
The longer the distance, the more price sensitive
Price sensitivity by distance
Main effect of distance: when distance increases by 1km, the demand increases by exp(0.05458)-1 = 5.61%
Main effect of price: when price increases by 1%, the demand will decrease by 0.0166%
Interaction effect of price*distance: when distance increases by 1km, the price elasticity becomes more negative by 0.0744.
When distance = 1km, the total price elasticity = -0.0166-0.0744 = -0.0910
When distance = 10km, the total price elasticity = -0.0166-0.0744*10 = -0.7606
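The arithmetic on this slide can be recomputed from the three coefficients. The sketch assumes the fitted model has the form log(demand) = … + b_distance × distance + b_price × log(price) + b_inter × distance × log(price), which is inferred from the interpretation above rather than stated in the slides; the constant and function names are hypothetical.

```python
import math

# Coefficients quoted on the slide (signs follow the stated interpretation).
B_DISTANCE = 0.05458     # main effect of distance (km)
B_LOG_PRICE = -0.0166    # main effect of log(price)
B_INTERACTION = -0.0744  # distance * log(price)


def price_elasticity(distance_km):
    """Total price elasticity at a given distance: main effect plus interaction."""
    return B_LOG_PRICE + B_INTERACTION * distance_km


# Percentage change in demand per extra km implied by the distance main effect.
pct_per_km = (math.exp(B_DISTANCE) - 1) * 100

print(f"demand change per extra km: {pct_per_km:.2f}%")
for d in (1, 5, 10):
    print(f"distance = {d:2d} km -> price elasticity = {price_elasticity(d):.4f}")
```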
Demand levels + price sensitivity by distance segment
Optional: I also use the IRI data to practice seasonality, time trend, and interactions. For those who are interested in learning more, please take a look by yourself.
IRI Coffee Purchases: Prepare Data
Import panelists’ coffee purchases in 2004
Import panelists’ coffee purchases in 2005
Combine two years’ data
Convert package sizes into equivalent units
Import demographics
Merge demographics with purchase data
Convert IRI week into calendar week for seasonality check
Demand Analysis: log-log regression

R2 = 0.1476 R2 = 0.4378
Whether Purchase Q and Price Sensitivity vary by outlet type
(base: MA)
The average purchase Q in “GR” is exp(-0.436) = 64.7% of that in MA; the average purchase Q in “DR” is exp(-1.047724) = 35.1% of that in MA.
The price elasticity in “DR” is -0.41; it is -0.93 in “GR”, and -1.11 in “MA”.

R2 = 0.4427
Whether Purchase Q and Price Sensitivity vary by outlet type
(base: DR)
The average purchase Q in “GR” is exp(0.6116) = 1.84 times that in DR; the average purchase Q in “MA” is exp(1.047724) = 2.85 times that in DR.
The price elasticity in “DR” is -0.41; it is -0.93 in “GR”, and -1.11 in “MA”.

R2 = 0.4427
Compare the Results with Different Bases (References)
The slopes are identical
The intercepts: Add the intercept to the coefficients of the 3 outlets to see whether they are identical.
Conclusion: whichever group is used as the base, it will not affect the interpretation.
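The link between the two sets of outlet coefficients can be verified with a few lines of arithmetic. The sketch uses the coefficients quoted for the MA-base regression and re-expresses them relative to DR; the helper name `rebase` is hypothetical. Since the dependent variable is in logs, exp(coefficient) is the ratio of average purchase quantity to the base.

```python
import math

# Outlet coefficients relative to MA, as quoted in the text (MA is the base, so 0).
coef_vs_ma = {"MA": 0.0, "GR": -0.436, "DR": -1.047724}


def rebase(coefs, new_base):
    """Re-express dummy coefficients relative to a different reference level."""
    shift = coefs[new_base]
    return {level: value - shift for level, value in coefs.items()}


coef_vs_dr = rebase(coef_vs_ma, "DR")
ratio_vs_ma = {level: math.exp(value) for level, value in coef_vs_ma.items()}
ratio_vs_dr = {level: math.exp(value) for level, value in coef_vs_dr.items()}

print("coefficients relative to DR:", coef_vs_dr)
print("quantity ratios relative to MA:", ratio_vs_ma)
print("quantity ratios relative to DR:", ratio_vs_dr)
```

The rebased GR coefficient agrees with the 0.6116 reported for the DR-base regression up to rounding of the quoted inputs.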
“noint” option allows all groups to have coefficients
Seasonality, No Time Trend
Seasonality and Time Trend
Check Seasonality in Price Sensitivity and Purchase Quantity

Parameter    Est. of mean Q    S.E.      Elasticity    S.E.
Intercept     1.3024           0.0191
month 1      -0.1026           0.0264    -0.9280       0.0147
month 2      -0.0709           0.0289    -0.9497       0.0164
month 3       0.0400           0.0264    -0.9842       0.0148
month 4      -0.0233           0.0297    -0.9584       0.0172
month 5      -0.0099           0.0286    -0.9653       0.0159
month 6       0.0723           0.0325    -1.0006       0.0191
month 7      -0.0925           0.0311    -0.9230       0.0183
month 8      -0.0078           0.0286    -0.9973       0.0158
month 9       0.0034           0.0307    -0.9887       0.0177
month 10     -0.0865           0.0292    -0.9513       0.0164
month 11     -0.2438           0.0249    -0.8485       0.0128
month 12      0.0000           .         -1.0078       0.0140
time          0.0025           0.0001
