Module 6B Regression - Modelling Possibilities

The regression analysis is significant overall and has a good fit, with the model explaining 92.5% of the salary variation. The regression equation includes years of experience and MBA as having a positive influence on salaries, while being over 50 has a negative influence. The interaction between MBA and years of experience is also positive but not statistically significant.

REGRESSION ANALYSIS: MODELING POSSIBILITIES
Learning Objective:
At the end of the lesson, the student should be able to:
• Use different types of explanatory variables for inclusion in the regression
equation.
• Interpret the regression results with these types of predictor variables.
• Familiarize oneself with the different types of nonlinear regression.
• Identify when binary logistic regression is appropriate.
• Formulate the binary logistic regression from the software output.
• Predict using the binary logistic regression equation.
Modeling Possibilities

Business Analytics: Data analysis and decision-making by Albright and Winston, 2020
DUMMY VARIABLES
Some potential variables are categorical and cannot be measured on a
quantitative scale. To include them in a regression equation, the trick is
to use dummy variables, also called indicators or 0-1 variables.
If a dummy variable for a given category equals 1, the observation is in
that category; if it equals 0, the observation is not in that category.
For example, if gender is a predictor variable, a single dummy variable is
required. Code Gender as 1 for females and 0 for males (or vice versa).

Business Analytics by Jaggia et al 2021


Example 1: How to create, use and interpret dummy variables in
regression analysis
Fifth National Bank of Springfield is facing a gender
discrimination suit. The charge is that its female
employees receive substantially smaller salaries than its
male employees. Refer to Bank Salaries.xlsx for the data
file.
Analyze whether the bank discriminates against females
in terms of salary. Include years of experience with the
current bank and with the previous bank as predictors.

Doane and Seward 2018


The following are the variables:

Doane and Seward 2018


Step 1: Code Gender to Female.
The first task is to create a dummy variable for the variable
gender, using the function =IF(G2="Female",1,0).

Note:
The number of dummy variables is equal to the number of
categories of the original variable minus 1.
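
The coding rule above can be sketched in Python. This is a minimal illustration, not part of the original lesson: the helper name `make_dummies` and the sample rows are hypothetical, and the reference category is simply the one left without a dummy column.

```python
def make_dummies(values, reference):
    """Build 0/1 dummy columns for a categorical variable.

    One column is created per category except the reference category,
    so k categories yield k - 1 dummy variables.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not found in data")
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
        if category != reference
    }


# Hypothetical rows, for illustration only (not taken from Bank Salaries.xlsx).
gender = ["Female", "Male", "Female", "Male"]
gender_dummies = make_dummies(gender, reference="Male")

# A three-category variable needs two dummies.
region = ["North", "South", "East", "South"]
region_dummies = make_dummies(region, reference="East")

print(gender_dummies)
print(region_dummies)
```

With two categories (Female, Male) the result is a single column, which mirrors the Excel formula in Step 1.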

Business Analytics by Jaggia et al 2021


Step 2: Run the regression analysis in Excel.
If you are using Excel, see to it that you have placed the predictors
next to each other. Use the “Paste Special: Values” feature of Excel to do
this.

Business Analytics by Jaggia et al 2021


Run the regression analysis in Excel, JASP, or Gretl.

Business Analytics by Jaggia et al 2021


Excel Output
The regression is significant overall, F(3,204) = 65.93, p < .01.
However, R-square = 0.492. Hence, 49.2% of the variation in
salaries can be explained by the 3 predictor variables together in
the model.

The regression equation is:

$$\hat{y} = 35491.66 - 8080.21\,\text{Female} + 987.99\,\text{Years1} + 131.34\,\text{Years2}$$

Business Analytics by Jaggia et al 2021


Excel Output
The coefficients suggest that, holding experience constant, a female (Female = 1) makes $8080.21 less than a
male. The coefficient of “Female” differs significantly from zero at p < 0.01.
Each additional year of experience with the current bank is associated with $987.99 more in salary. The
coefficient of Years1 is also significant at p < .01.
Years of experience with the current bank and gender both contribute significantly to
salary.
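
To make the interpretation concrete, the estimated equation can be evaluated for two employees who differ only in gender. This is a sketch: the function name `predict_salary` and the experience values (10 and 5 years) are assumptions chosen for illustration, while the coefficients are the ones reported in the Excel output.

```python
def predict_salary(female, years1, years2):
    """Predicted salary from the estimated regression equation.

    female : 1 for a female employee, 0 for a male employee
    years1 : years of experience with the current bank
    years2 : years of experience with the previous bank
    """
    return 35491.66 - 8080.21 * female + 987.99 * years1 + 131.34 * years2


# Hypothetical employees with identical experience.
male_salary = predict_salary(female=0, years1=10, years2=5)
female_salary = predict_salary(female=1, years1=10, years2=5)
gap = male_salary - female_salary

print(round(male_salary, 2), round(female_salary, 2), round(gap, 2))
```

Whatever experience values are used, the gap equals the coefficient of Female, because a model with only a dummy variable shifts the intercept and leaves the slopes unchanged.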

Business Analytics by Jaggia et al 2021


INTERACTION VARIABLES
When only a dummy variable is included in a regression equation, the
regression lines are forced to be parallel. To be more realistic, an
interaction variable must be included in the model.

Business Analytics by Jaggia et al 2021


INTERACTION VARIABLES
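We can test for interaction between two predictors by including their
product in the regression model. For example, we might hypothesize
that Y depends on X1, X2, and their product X1X2. To test for interaction, we
estimate the model:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$$

and check whether the coefficient of the product term differs significantly from zero.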

Business Analytics by Jaggia et al 2021


INTERACTION VARIABLES
Given the data in Slide 16, we think that there is an interaction between
having an MBA degree and years of experience. Let us predict the salary
of a graduate who has an MBA and 15 years of work experience.
In the Excel file, add another column that multiplies the “MBA” data values
by “Years of Experience”. Run the analysis again.
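
The extra column is just the row-by-row product of the two predictors. A minimal sketch follows; the three rows and the key names are hypothetical and are not taken from the data file.

```python
# Hypothetical rows: MBA is a 0/1 dummy, years is years of experience.
rows = [
    {"mba": 1, "years": 15},
    {"mba": 0, "years": 15},
    {"mba": 1, "years": 4},
]

# The interaction variable is the product of the two predictors.
for row in rows:
    row["mba_x_years"] = row["mba"] * row["years"]

interaction_column = [row["mba_x_years"] for row in rows]
print(interaction_column)
```

For employees without an MBA the product is 0, so the interaction term only changes the slope on years of experience for MBA holders.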

Business Analytics by Jaggia et al 2021


SUMMARY OUTPUT (Excel Output)

The regression is significant overall, F(5,19) = 46.64, p < .01, and has a good fit, R-square = 0.925. Further, 92.5% of the salary variation can be explained by the 5 predictor variables (including the interaction variable) together in the model.

Regression Statistics
Multiple R          0.962
R Square            0.925
Adjusted R Square   0.905
Standard Error      7.116
Observations        25

ANOVA
             df   SS            MS         F            Significance F
Regression   5    11809.38747   2361.877   46.6377332   5.11641E-10
Residual     19   962.2181284   50.64306
Total        24   12771.6056

                      Coefficients   Standard Error   t Stat   P-value      Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept             30.621         4.819            6.355    4.2621E-06   20.536      40.707      20.536        40.707
Male                  3.082          3.342            0.922    0.36797016   -3.913      10.078      -3.913        10.078
MBA                   3.300          6.049            0.546    0.59171448   -9.360      15.961      -9.360        15.961
Years of Experience   2.890          0.251            11.524   5.1202E-10   2.365       3.415       2.365         3.415
MBA x Years           1.321          0.778            1.698    0.10583088   -0.307      2.950       -0.307        2.950
Over 50               -7.306         6.093            -1.199   0.24527495   -20.059     5.448       -20.059       5.448

Business Analytics by Jaggia et al 2021


SUMMARY OUTPUT

The regression equation is:

$$\hat{y} = 30.621 + 2.890\,\text{Years} + 3.300\,\text{MBA} + 3.082\,\text{Male} - 7.306\,\text{Over50} + 1.321\,(\text{MBA} \times \text{Years})$$

Except for “Over50”, the rest of the predictor variables exert a positive influence on salaries. The interaction variable MBA x Years is also positive but not statistically significant even at the 10% level (p = 0.106). Among the individual predictors, only Years of Experience is significant at p < .01.

Business Analytics by Jaggia et al 2021


Practice
Given the estimated regression in the previous slide, predict the
salary of a 45-year-old male graduate who has an MBA and 15
years of work experience.
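
One way to check a hand computation for this practice item is to code the estimated equation directly. This is a sketch using the coefficients from the summary output; the function name `predict` and the dictionary keys are hypothetical. Because the employee is 45, the Over 50 dummy is set to 0.

```python
# Coefficients copied from the Excel summary output.
COEF = {
    "intercept": 30.621,
    "male": 3.082,
    "mba": 3.300,
    "years": 2.890,
    "mba_x_years": 1.321,
    "over50": -7.306,
}


def predict(male, mba, years, over50):
    """Predicted salary, in the same units as the salary column of the data."""
    return (
        COEF["intercept"]
        + COEF["male"] * male
        + COEF["mba"] * mba
        + COEF["years"] * years
        + COEF["mba_x_years"] * mba * years  # interaction term
        + COEF["over50"] * over50
    )


# Male, MBA, 15 years of experience, aged 45 (so Over 50 = 0).
practice_prediction = predict(male=1, mba=1, years=15, over50=0)

# Slope on years of experience for an MBA holder: main effect plus interaction.
slope_years_with_mba = COEF["years"] + COEF["mba_x_years"]

print(round(practice_prediction, 3), round(slope_years_with_mba, 3))
```

The prediction works out to about 100.168, and each additional year of experience adds about 4.211 for an MBA holder, compared with 2.890 without an MBA.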

Business Analytics by Jaggia et al 2021


Regression Models for Nonlinear relationships
▪ There are numerous applications where the relationship between the
predictor variable and the response variable cannot be represented by
a straight line and, therefore, must be captured by an appropriate
curve.
▪ Some common nonlinear regression models result from making simple
transformations of the variables. These transformations include
squares and natural logarithms, which capture nonlinear relationships
while still allowing easy estimation within the framework of the linear
regression model.
▪ Goodness-of-fit measures are used to choose between alternative
model specifications.
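
The idea of transforming a variable and then comparing goodness of fit can be sketched as follows. The data are synthetic and chosen only for illustration, and `fit_simple` is a hypothetical helper that fits a one-predictor least-squares line. Comparing R-square directly is reasonable here because both models have the same response variable y.

```python
import math


def fit_simple(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return b0, b1, 1 - sse / sst


# Synthetic data: y grows with ln(x), plus a small fixed disturbance.
x = [1, 2, 4, 8, 16, 32]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
y = [3.0 + 2.0 * math.log(xi) + e for xi, e in zip(x, noise)]

# Model 1: linear in x.
_, _, r2_linear = fit_simple(x, y)

# Model 2: logarithmic, which is still linear in the transformed predictor ln(x).
log_x = [math.log(xi) for xi in x]
_, slope_log, r2_log = fit_simple(log_x, y)

print(round(r2_linear, 3), round(r2_log, 3), round(slope_log, 3))
```

The logarithmic specification is estimated with ordinary least squares after transforming x, which is what "within the framework of the linear regression model" means in practice.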
Business Analytics by Jaggia et al 2021
The Quadratic Regression Model
▪ The quadratic regression model is appropriate when the slope,
capturing the influence of x on y, changes in magnitude as well as sign.

Business Analytics by Jaggia et al 2021


The Quadratic Regression Model
▪ The following scatterplots of sample data with superimposed trendlines
show that the quadratic regression model provides a better fit for both
scatterplots.

Business Analytics by Jaggia et al 2021


Interpretation of coefficients in the quadratic regression model
▪ In nonlinear models, the sample regression equation is best interpreted
by calculating, and even graphing, the predicted effect on the response
variable over a range of values for the predictor variable.
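
As a sketch of this approach, the predicted values and their year-to-year changes can be tabulated for a quadratic equation. The coefficients below are hypothetical (they do not come from any data set in this module), and the turning point uses the standard result that a quadratic reaches its maximum or minimum at x = -b1 / (2 b2).

```python
# Hypothetical estimated quadratic equation: y-hat = 10 + 4x - 0.5x^2.
b0, b1, b2 = 10.0, 4.0, -0.5


def predict(x):
    """Predicted response from the quadratic regression equation."""
    return b0 + b1 * x + b2 * x ** 2


xs = list(range(0, 9))
predictions = [predict(x) for x in xs]

# Predicted effect of a one-unit increase in x, evaluated along the range.
changes = [predictions[i + 1] - predictions[i] for i in range(len(predictions) - 1)]

# x value at which the predicted response peaks (b2 < 0) or bottoms out (b2 > 0).
turning_point = -b1 / (2 * b2)

for x, y_hat in zip(xs, predictions):
    print(x, y_hat)
print("changes:", changes)
print("turning point:", turning_point)
```

The one-unit effect shrinks from 3.5 to 0.5 and then turns negative after x = 4, which is why a single slope cannot summarize the influence of x in a quadratic model.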

Business Analytics by Jaggia et al 2021


Regression Models with Logarithms

The Log-Log Regression Model
▪ Both the response and the predictor variable are transformed into natural
logs: ln(y) = β0 + β1 ln(x) + ε.

Semi-Log Regression Models
▪ Only one of the two variables is transformed into natural logs.

The Logarithmic Regression Model
▪ Only the predictor variable is transformed: y = β0 + β1 ln(x) + ε.

The Exponential Regression Model
▪ Only the response variable is transformed: ln(y) = β0 + β1 x + ε.

Business Analytics by Jaggia et al 2021



Example
Refer to Module 6B Cost of Power data set.
Tasks:
1. Create a scatterplot and determine if a linear or nonlinear function is
appropriate.
2. Run a linear regression analysis and determine if a nonlinear
transformation is necessary based on the residual plot.
3. Run a quadratic regression analysis and assess based on goodness of
fit.
4. Run a logarithmic regression analysis and assess based on goodness
of fit.
Business Analytics by Jaggia et al 2021
Exercise (In Gretl)
Using the MLRA data, do the following:
1. Create scatterplots between the different independent variables and
level of disclosures.
2. Describe the scatterplots in terms of linearity of the relationships.
3. Assess the normality of the different variables using either Shapiro-
Wilk or Jarque-Bera tests.
4. Add the log of the Firm size. Create the scatterplot between log firm
size and level of disclosure. Describe the result.
5. Do the same procedure for leverage.
6. Compute the correlation coefficients.
Business Analytics by Jaggia et al 2021
In Gretl
Scatterplot in Gretl

Business Analytics by Jaggia et al 2021


In Gretl
Test of normality of variables in Gretl (do this per variable)

Business Analytics by Jaggia et al 2021


In Gretl
Add the log of firm size. Create the scatterplot.

Business Analytics by Jaggia et al 2021


Exercise (In Gretl)
Using the MLRA data, do the following:
7. Perform a multiple regression analysis to determine if there is a
significant relationship between the different predictors and level of
voluntary disclosure.
8. Interpret the results.
9. Determine if the assumptions of regression analysis were satisfied
(normality of residuals, constant variance (homoskedasticity), absence of
multicollinearity)

Business Analytics by Jaggia et al 2021


Logistic Regression Analysis
▪ Logistic regression is a variation of the regression model.
▪ It is used when the dependent or response variable is binary in nature.
▪ Logistic regression predicts the probability that the dependent variable equals 1,
rather than the value of the response (as in simple linear regression).
▪ For example, will a bank debtor default on a loan (Y = 1, missing at
least one payment on an account) or not (Y = 0, not missing a loan
payment)?
▪ Also, will a bank customer use online banking (Y = 1) or not (Y = 0)?

Statistical Analysis with Software Applications, Mc Graw Hill


Logistic Regression Analysis
▪ Will a LAZADA customer make another purchase within the next 3
months (Y = 1) or not (Y = 0)?

Such research questions would seem to be candidates for
regression modeling because we could define possible predictors such
as a customer’s age, gender, length of time as an existing customer, or
past transaction history.

Statistical Analysis with Software Applications, Mc Graw Hill


Logistic Regression Analysis
Why not use the Least Squares Method?
If you perform an ordinary least-squares regression with a binary
response variable, there will be complications.
1. While the actual value of Y can only be 1 (if the event occurs) or 0 (if the
event does not occur), the predicted value of Y should be a number between 0
and 1, denoting the probability of the event of interest. Using linear
regression, the predicted Y values could be greater than 1 or less than 0.
2. Regression errors will violate the assumption of homoskedasticity because,
as the predicted Y values move away from 0.50 (in either direction), the variance of
the errors will decrease and approach 0.

Statistical Analysis with Software Applications, Mc Graw Hill


Logistic Regression Analysis
Why not use the Least Squares Method?
If you perform an ordinary least-squares regression with a binary
response variable, there will be complications.
3. Finally, significance tests assume normally distributed errors, which cannot
be the case when Y has only two values (Y = 0 or Y = 1). Therefore, tests
for significance would be in doubt if you used linear regression with a
binary response variable.
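
The first complication is easy to reproduce. The sketch below fits an ordinary least-squares line to a binary response and inspects the fitted values. The ten observations are hypothetical and chosen only to show the effect.

```python
# Hypothetical data: x is a predictor, y is a binary (0/1) response.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least-squares slope and intercept.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Fitted values are meant to be probabilities, but nothing keeps them in [0, 1].
fitted = [intercept + slope * xi for xi in x]

print(round(intercept, 4), round(slope, 4))
print([round(f, 3) for f in fitted])
```

The fitted value at x = 1 is negative and the fitted value at x = 10 exceeds 1, so neither can be read as a probability.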

The solution is to use logistic regression using the nonlinear regression
model shown in the next slide.
Statistical Analysis with Software Applications, Mc Graw Hill
The Logistic Regression model
The logistic regression equation predicts the probability that
Y = 1 for any specified value of the independent variable. The
model form ensures predictions within the range 0 < 𝑦ො < 1.

Statistical Analysis with Software Applications, Mc Graw Hill


The Logistic Regression model
The logistic regression model has an S-shaped form. When the slope
coefficient is positive, the logistic function approaches 1 as the value of
the independent variable increases and approaches 0 as it decreases.
An example is shown.
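
The S-shape can be seen by evaluating the logistic function over a range of x values. This is a sketch: the coefficients b0 = -4 and b1 = 1 are hypothetical, and the function uses the form 1 / (1 + exp(-z)), which is algebraically the same as exp(z) / (1 + exp(z)).

```python
import math


def logistic(z):
    """Logistic function; equal to exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(x, b0, b1):
    """Predicted P(Y = 1) for a single predictor value x."""
    return logistic(b0 + b1 * x)


# Hypothetical coefficients with a positive slope.
b0, b1 = -4.0, 1.0
xs = list(range(-5, 16))
probabilities = [predicted_probability(x, b0, b1) for x in xs]

for x, p in zip(xs, probabilities):
    print(x, round(p, 4))
```

Every value lies strictly between 0 and 1, the curve rises slowly at first, most steeply near the point where the probability is 0.5 (here x = 4), and then flattens as it approaches 1.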

Statistical Analysis with Software Applications, Mc Graw Hill


Estimating a Logistic Regression model
The underlying model is the Bernoulli (binary) distribution.
1. The event of interest either occurs (probability 𝜋) or does
not occur (probability 1 - 𝜋).
2. Instead of the least squares method, the parameters are
estimated using the method of maximum likelihood. This
method chooses values of the regression parameters that
will maximize the probability of obtaining the observed
sample data.
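
A small sketch of maximum likelihood estimation for one predictor is given below. It uses Newton-Raphson iterations, which is one common way to maximize the likelihood and is an assumption here, not necessarily the algorithm that JASP or Gretl use internally. The data are hypothetical and are not the 20 bank customers of Example 1.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, x, y):
    """Bernoulli log-likelihood of the sample for given parameters."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


def score(b0, b1, x, y):
    """Gradient of the log-likelihood with respect to (b0, b1)."""
    residuals = [yi - logistic(b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(residuals), sum(r * xi for r, xi in zip(residuals, x))


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson search for the maximum likelihood estimates."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0, g1 = score(b0, b1, x, y)
        weights = [p * (1 - p) for p in (logistic(b0 + b1 * xi) for xi in x)]
        # Entries of the 2x2 information matrix.
        h00 = sum(weights)
        h01 = sum(w * xi for w, xi in zip(weights, x))
        h11 = sum(w * xi * xi for w, xi in zip(weights, x))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information matrix times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data with overlapping outcomes, so finite estimates exist.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0_hat, b1_hat = fit_logistic(x, y)
score0, score1 = score(b0_hat, b1_hat, x, y)
ll_start = log_likelihood(0.0, 0.0, x, y)
ll_fit = log_likelihood(b0_hat, b1_hat, x, y)

print(round(b0_hat, 4), round(b1_hat, 4), round(ll_start, 4), round(ll_fit, 4))
```

At the estimates the gradient is essentially zero and the log-likelihood is higher than at the starting values, which is what "maximize the probability of obtaining the observed sample data" means numerically.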

Statistical Analysis with Software Applications, Mc Graw Hill


Example 1.
We might expect that the size of a bank customer’s savings
balance (in thousands) would predict whether (Y = 1 ) or not (Y
= 0) the customer also would have a brokerage account with the
bank to facilitate making investments. A sample of 20 bank
customers is shown.

Statistical Analysis with Software Applications, Mc Graw Hill


Example 1.
Step 1: Encode the data in Excel, as shown. Save in the csv format.

Step 2: Open in JASP and run Logistic Regression.

Statistical Analysis with Software Applications, Mc Graw Hill


Example 1.
Step 3: The dependent variable is “Broker?” while the covariate is Savings balance.

Step 4: Under “Statistics”, check estimates. For Plots, check “display conditional estimates plots”.
Statistical Analysis with Software Applications, Mc Graw Hill
Example 1. Output from JASP

Statistical Analysis with Software Applications, Mc Graw Hill


The estimated logistic equation is:

$$\hat{y} = P(\text{Broker}) = \frac{\exp(-4.001 + 0.001\,\text{Savings})}{1 + \exp(-4.001 + 0.001\,\text{Savings})}$$

Statistical Analysis with Software Applications, Mc Graw Hill


We can use the fitted model to estimate the probability that a given customer will have a broker account. For example, if a customer has a savings account balance of 6000 (in thousands), the predicted probability is

$$\hat{y} = P(\text{Broker}) = \frac{\exp(-4.001 + 0.001\,\text{Savings})}{1 + \exp(-4.001 + 0.001\,\text{Savings})} = \frac{\exp(-4.001 + 0.001 \times 6000)}{1 + \exp(-4.001 + 0.001 \times 6000)} = \frac{7.3817}{8.3817} = 0.8807$$

On the other hand, if a customer has a savings account balance of 4000 (in thousands), the predicted probability is

$$\hat{y} = P(\text{Broker}) = \frac{\exp(-4.001 + 0.001\,\text{Savings})}{1 + \exp(-4.001 + 0.001\,\text{Savings})} = \frac{\exp(-4.001 + 0.001 \times 4000)}{1 + \exp(-4.001 + 0.001 \times 4000)} = \frac{0.999}{1.999} = 0.4998$$
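
The same predictions can be reproduced in a few lines of Python. This is a sketch using the rounded coefficients reported above; the function name `p_broker` is hypothetical.

```python
import math

# Rounded coefficients from the estimated logistic equation.
B0 = -4.001
B1 = 0.001


def p_broker(savings):
    """Predicted probability that a customer has a brokerage account.

    savings : savings balance, in the same units as the data (thousands).
    """
    z = B0 + B1 * savings
    return math.exp(z) / (1 + math.exp(z))


p_6000 = p_broker(6000)
p_4000 = p_broker(4000)

print(round(p_6000, 4), round(p_4000, 4))
```

A balance of about 4001 is where the predicted probability crosses 0.5, since that is where -4.001 + 0.001 × Savings equals zero.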

Statistical Analysis with Software Applications, Mc Graw Hill


References
• Business Analytics by Jaggia et al
• Business Analytics: Data analysis and decision-making by Albright and
Winston
• Applied Statistics for Business and Economics by Doane and Seward
