Lecture 23

Uploaded by

Sarah Alkindi
Copyright
© © All Rights Reserved
Regression with Categorical Variables

• Regression analysis requires numerical data.
• Categorical data can be included as independent variables but must be coded numerically using dummy variables.
• For variables with 2 categories, code as 0 and 1.
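As a small illustration of 0/1 coding, the sketch below maps a two-level column to a dummy variable. The function name `encode_binary` and the sample values are hypothetical and are not taken from the lecture's data file.

```python
def encode_binary(values, positive="yes"):
    """Map a two-level categorical column to a 0/1 dummy variable.

    Entries equal to `positive` (case-insensitive) become 1, all others 0.
    """
    target = positive.strip().lower()
    return [1 if str(v).strip().lower() == target else 0 for v in values]


# Hypothetical sample column; the real data file may use other labels.
mba_column = ["Yes", "no", "yes", "No"]
mba_dummy = encode_binary(mba_column)
print(mba_dummy)  # [1, 0, 1, 0]
```

Which category is coded as 1 is a convention; swapping it only flips the sign of the dummy's coefficient and shifts the intercept.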
Example 8.15: A Model with
Categorical Variables (1 of 2)

• Employee Salaries provides data for 35 employees.

• Predict Salary using Age and MBA (coded as yes = 1, no = 0).

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
where
$Y$ = salary
$X_1$ = age
$X_2$ = MBA indicator (0 or 1)
Example 8.15: A Model with
Categorical Variables (2 of 2)

• Salary = 893.59 + 1,044.15 × Age + 14,767.23 × MBA
– If MBA = 0, Salary = 893.59 + 1,044.15 × Age
– If MBA = 1, Salary = 15,660.82 + 1,044.15 × Age
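The two lines above differ only in their intercept, which can be checked numerically. The sketch below hard-codes the fitted coefficients quoted on this slide; the function name `predict_salary` and the age of 30 are assumptions for illustration.

```python
# Fitted coefficients quoted in Example 8.15.
INTERCEPT = 893.59
AGE_COEF = 1044.15
MBA_COEF = 14767.23


def predict_salary(age, mba):
    """Predicted salary for a given age and MBA indicator (0 or 1)."""
    return INTERCEPT + AGE_COEF * age + MBA_COEF * mba


# With Age held fixed, the MBA = 1 line sits MBA_COEF above the MBA = 0 line.
gap_at_30 = predict_salary(30, 1) - predict_salary(30, 0)
# Intercept of the MBA = 1 line (the prediction at Age = 0).
intercept_with_mba = predict_salary(0, 1)
print(round(gap_at_30, 2), round(intercept_with_mba, 2))  # 14767.23 15660.82
```

Because the model has no interaction term, the gap between the two groups is the same at every age.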
Interactions

• An interaction occurs when the effect of one variable depends on another variable.
• We can test for interactions by defining a new variable as the product of the two variables, $X_3 = X_1 \times X_2$, and testing whether this variable is significant, leading to an alternative model:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$
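To make the construction of the product variable concrete, here is a minimal sketch that builds the interaction column and estimates the four coefficients by ordinary least squares. Everything in it is an assumption for illustration: the data are made up and noise-free, the "true" coefficients are arbitrary, and the helper names (`design_row`, `fit_ols`, `solve_linear_system`) are hypothetical. It does not compute p-values, so the significance test itself would still come from regression software.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def design_row(x1, x2):
    """Row [1, X1, X2, X3] where X3 = X1 * X2 is the interaction term."""
    return [1.0, float(x1), float(x2), float(x1 * x2)]


# Hypothetical, noise-free data: (X1, X2) pairs with X2 a 0/1 dummy.
true_beta = [100.0, 2.0, -5.0, 0.5]
data = [(25, 0), (30, 1), (35, 0), (40, 1), (45, 0), (50, 1), (55, 0), (60, 1)]
rows = [design_row(x1, x2) for x1, x2 in data]
y = [sum(b * v for b, v in zip(true_beta, row)) for row in rows]

# Since y was generated without noise, the fit recovers true_beta
# up to floating-point error.
beta_hat = fit_ols(rows, y)
```

The normal equations are used here only to keep the sketch dependency-free; numerical libraries typically solve least-squares problems with more stable factorizations.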
Example 8.16: Incorporating Interaction
Terms in a Regression Model (1 of 3)

• Define an interaction between Age and MBA and re-run the regression.

The MBA indicator is not significant; we would typically drop it and re-run the regression analysis.
Example 8.16: Incorporating Interaction
Terms in a Regression Model (2 of 3)
This results in the model:
salary = 3,323.11 + 984.25 × age + 425.58 × MBA × age
Example 8.16: Incorporating Interaction
Terms in a Regression Model (3 of 3)

• However, statisticians recommend that if interactions are significant, first-order terms should be kept in the model regardless of their p-values.
• Thus, using the first regression model, we have:

salary = 3,902.51 + 971.31 × Age − 2,971.08 × MBA + 501.85 × MBA × Age
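With the interaction kept in the model, the MBA indicator changes both the intercept and the slope on Age. The sketch below only rearranges the coefficients quoted above; the variable and function names are hypothetical.

```python
# Coefficients of the full interaction model quoted in Example 8.16.
B0, B_AGE, B_MBA, B_INTERACTION = 3902.51, 971.31, -2971.08, 501.85


def predict_salary(age, mba):
    """Predicted salary with an Age x MBA interaction (mba is 0 or 1)."""
    return B0 + B_AGE * age + B_MBA * mba + B_INTERACTION * mba * age


# MBA = 0: salary = B0 + B_AGE * Age
# MBA = 1: salary = (B0 + B_MBA) + (B_AGE + B_INTERACTION) * Age
intercept_mba = B0 + B_MBA
slope_mba = B_AGE + B_INTERACTION
print(round(intercept_mba, 2), round(slope_mba, 2))  # 931.43 1473.16
```

Read this way, the interaction coefficient is the extra salary per year of age associated with holding an MBA.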
Categorical Variables with More Than Two
Levels

• When a categorical variable has k > 2 levels, we need to add k − 1 additional variables to the model.
Example 8.17: A Regression Model with Multiple
Levels of Categorical Variables (1 of 4)

• The Excel file Surface Finish provides measurements of the surface finish of 35 parts produced on a lathe, along with the revolutions per minute (RPM) of the spindle and one of four types of cutting tools used.
Example 8.17: A Regression Model with Multiple
Levels of Categorical Variables (2 of 4)

• Because we have k = 4 levels of tool type, we will define a regression model of the form
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$
where
$Y$ = surface finish
$X_1$ = RPM
$X_2$ = 1 if tool type is B and 0 if not
$X_3$ = 1 if tool type is C and 0 if not
$X_4$ = 1 if tool type is D and 0 if not
• Tool type A is the baseline: it corresponds to $X_2 = X_3 = X_4 = 0$.
Example 8.17: A Regression Model with Multiple
Levels of Categorical Variables (3 of 4)

• Add 3 columns to the data, one for each of the tool type variables.
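The three added columns can be generated mechanically from the tool-type label. The sketch below assumes the four levels are labeled "A" through "D" with "A" as the baseline (the slides name only B, C, and D explicitly); the function name `tool_dummies` is hypothetical.

```python
# Assumed level labels; "A" is treated as the baseline level.
TOOL_LEVELS = ("A", "B", "C", "D")


def tool_dummies(tool_type):
    """Return the k - 1 = 3 indicators for tool types B, C, and D."""
    if tool_type not in TOOL_LEVELS:
        raise ValueError(f"unknown tool type: {tool_type!r}")
    # Skip the baseline so that type A maps to all zeros.
    return [1 if tool_type == level else 0 for level in TOOL_LEVELS[1:]]


for tool in TOOL_LEVELS:
    print(tool, tool_dummies(tool))
```

Using only k − 1 columns avoids making the dummies sum to the intercept column, which would leave the coefficients not uniquely determined.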
Example 8.17: A Regression Model with Multiple
Levels of Categorical Variables (4 of 4)

• Regression results:

Surface finish = 24.49 + 0.098 RPM − 13.31 type B − 20.49 type C − 26.04 type D
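Each tool-type coefficient shifts the prediction relative to the omitted type. The sketch below evaluates the fitted equation above; it assumes the omitted level is labeled "A", and the function name and the RPM value of 200 are hypothetical inputs chosen for illustration.

```python
# Coefficients quoted in Example 8.17.
INTERCEPT = 24.49
RPM_COEF = 0.098
# Shift relative to the baseline tool type (assumed to be "A").
TOOL_EFFECT = {"A": 0.0, "B": -13.31, "C": -20.49, "D": -26.04}


def predict_surface_finish(rpm, tool_type):
    """Predicted surface finish for a spindle speed and tool type."""
    return INTERCEPT + RPM_COEF * rpm + TOOL_EFFECT[tool_type]


# At the same RPM, predictions differ only by the tool-type shift.
finish_a = predict_surface_finish(200, "A")
finish_d = predict_surface_finish(200, "D")
print(round(finish_a, 2), round(finish_d, 2))  # 44.09 18.05
```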
Regression Models with Nonlinear
Terms

• Curvilinear models may be appropriate when scatter charts or residual plots show nonlinear relationships.
• A second-order polynomial might be used:
$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$
• Here $\beta_1$ represents the linear effect of X on Y and $\beta_2$ represents the curvilinear effect.
• This model is linear in the β parameters so we can use linear regression methods.
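One way to see why linear methods still apply is to treat the squared term as a second predictor. The substitution below introduces a helper symbol $Z$ that is not part of the lecture's notation.

```latex
Z = X^2
\quad\Longrightarrow\quad
Y = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon
```

Written this way, the model has the same form as a two-variable linear regression, with $X$ and $Z$ as the two data columns.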
Example 8.18: Modeling Beverage Sales
Using Curvilinear Regression (1 of 2)

• The U-shape of the residual plot (a second-order polynomial trendline was fit to the residual data) suggests that a linear relationship is not appropriate.
Example 8.18: Modeling Beverage Sales
Using Curvilinear Regression (2 of 2)

• Add a variable for temperature squared.
• The model is: Sales = 142,850 − 3,643.17 × Temperature + 23.3 × Temperature²
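The fitted quadratic can be evaluated directly. The sketch below uses the coefficients quoted above; the function name, the temperature of 80, and the turning-point calculation are illustrative additions, and because the quoted coefficients are rounded, the outputs should be read as approximate.

```python
# Coefficients quoted in Example 8.18.
B0, B1, B2 = 142850.0, -3643.17, 23.3


def predict_sales(temperature):
    """Predicted sales from the second-order polynomial model."""
    return B0 + B1 * temperature + B2 * temperature ** 2


# A parabola b0 + b1*t + b2*t^2 turns at t = -b1 / (2*b2); with b2 > 0
# this is where predicted sales are lowest.
turning_point = -B1 / (2 * B2)
print(round(turning_point, 2))  # 78.18
```

Predictions far from the temperatures in the data should be treated with caution, since a quadratic can behave unrealistically outside the observed range.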
