
Problem Set 10 (With Instructions) : Regression Analysis

This document provides instructions for completing a problem set involving multiple regression analysis to predict revenue for Southwest Airlines. Students are asked to: 1) Conduct a multiple regression using four independent variables to predict revenue and calculate various regression statistics; 2) Interpret the results of the regression; 3) Conduct hypothesis tests on the regression coefficients; 4) Determine if any variables should be dropped from the model; and 5) Check assumptions of the regression analysis. The document provides the necessary data and output from the regression in a spreadsheet to allow students to complete the required analysis and reporting.

Uploaded by

Lily Tran

Problem Set 10 (With instructions)

Chapter 13.

Individual or group. Use SWA_PS9_S14 file on Laulima PS9 folder

1. Do a multiple regression using JetFuel ($/gal), DPI ($B), Recession and AirTran to predict Revenue
for Southwest Airlines using data in the Data2 tab. Then use the results to predict Revenue in q3
2012 with a 95% prediction interval estimate when JetFuel = 3.07, DPI = 10337, Recession = 0 and
AirTran = 1. Write up a report of your analysis. You should include the following in your report:

Regression Analysis

Regression Statistics
Multiple R 0.9715
R Square 0.9438
Adjusted R Square 0.9348
Standard Error 192.2927
Observations 30

ANOVA
             df   SS              MS             F          Significance F
Regression    4   15533797.2889   3883449.3222   105.0248   0.0000
Residual     25   924412.1778     36976.4871
Total        29   16458209.4667

                 Coefficients   Standard Error   t Stat    P-value   Lower 95%     Upper 95%
Intercept        -10322.0817    1559.6093        -6.6184   0.0000    -13534.1571   -7110.0062
JetFuel($/gal)   60.3484        87.0957          0.6929    0.4948    -119.0284     239.7253
DPI($B)          1.3014         0.1684           7.7292    0.0000    0.9546        1.6481
Recession        -251.2149      97.4119          -2.5789   0.0162    -451.8385     -50.5913
Airtran          1118.5423      129.5954         8.6310    0.0000    851.6355      1385.4490

Calculations
                               b4            b3            b2       b1            b0
b4 through b0 intercepts       1118.5423     -251.2149     1.3014   60.34843881   -10322.08165
b4 through b0 Standard Error   129.5954      97.4119       0.1684   87.09566371   1559.609282
R Square, Standard Error       0.9438        192.2927      #N/A     #N/A          #N/A
F, Residual df                 105.0248      25            #N/A     #N/A          #N/A
Regression SS, Residual SS     15533797.29   924412.1778   #N/A     #N/A          #N/A

Confidence level 95%


t Critical Value 2.0595
Half Width b0 3212.0754
Half Width b1 179.3769
Half Width b2 0.3468
Half Width b3 200.6236
Half Width b4 266.9068
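
As a quick check on the half widths listed above, each one is the t critical value times the coefficient's standard error, and the 95% interval is the coefficient plus or minus that half width. The sketch below only re-does that arithmetic with the rounded values printed in the output, so the last digits can differ slightly from Excel; the variable names are illustrative and not part of the assignment.

```python
# Values copied from the regression output above (rounded as printed there).
t_critical = 2.0595  # 95% confidence, 25 residual degrees of freedom

# term -> (coefficient, standard error)
coefficients = {
    "Intercept": (-10322.0817, 1559.6093),
    "JetFuel": (60.3484, 87.0957),
    "DPI": (1.3014, 0.1684),
    "Recession": (-251.2149, 97.4119),
    "AirTran": (1118.5423, 129.5954),
}

half_widths = {}
intervals = {}
for name, (estimate, std_error) in coefficients.items():
    # Half width of the confidence interval for this coefficient.
    half_width = t_critical * std_error
    half_widths[name] = half_width
    intervals[name] = (estimate - half_width, estimate + half_width)
    low, high = intervals[name]
    print(f"{name:10s} half width {half_width:10.4f}   95% CI ({low:.4f}, {high:.4f})")
```

An interval that contains zero (JetFuel here) matches a p-value above .05 for that coefficient.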

a. Interpretation of R-squared, Adjusted R-squared, and standard error and what these say about
how good the regression is for predicting Revenue from the independent variables.

R squared tells us how much of the variation in Revenue is explained by jet fuel cost, DPI,
Recession and AirTran together, which in this case is about 94%. Adjusted R squared (93.5%) is R
squared adjusted for the sample size and the number of independent variables, so it does not rise
just because another variable is added. The standard error of 192.3 tells us the typical difference
between the predicted values and the true values, about $192 million of Revenue. The regression
looks good for predicting Revenue because the R squared and adjusted R squared values are both
very high while the standard error is low.
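
The three measures discussed above can be recomputed from the ANOVA table, which may help show what each one measures. This is a minimal sketch that assumes only the sums of squares, n = 30 observations and k = 4 independent variables from the output; the variable names are illustrative.

```python
import math

# Sums of squares from the ANOVA table above.
ss_regression = 15533797.2889
ss_residual = 924412.1778
n = 30  # observations
k = 4   # independent variables

ss_total = ss_regression + ss_residual

# Share of the variation in Revenue explained by the model.
r_squared = ss_regression / ss_total

# Penalizes R squared for the number of independent variables used.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Typical size of a residual, in the units of Revenue.
ms_residual = ss_residual / (n - k - 1)
standard_error = math.sqrt(ms_residual)

# Overall F statistic for the whole equation.
f_stat = (ss_regression / k) / ms_residual

print(f"R squared          {r_squared:.4f}")
print(f"Adjusted R squared {adj_r_squared:.4f}")
print(f"Standard error     {standard_error:.4f}")
print(f"F                  {f_stat:.4f}")
```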

b. The model, the equation and interpretation of the 4 slopes. Which of these variables decrease
revenues and which increase revenues? Use the proper units when you do the interpretations.
Take care to interpret slopes for dummy variables appropriately.

Recession decreases revenue: in a recession quarter, revenue is about $251 million lower than in a
non-recession quarter, holding the other variables constant. The other slopes have a positive effect
on revenue. Every $1 per gallon increase in jet fuel increases revenue by about $60 million (although
this slope is not significant, p-value 0.4948). Every $1 billion increase in DPI increases revenue by
about $1.3 million. If Southwest merges with AirTran (AirTran = 1), revenue goes up by about $1118
million. Recession and AirTran are dummy variables, so their slopes are the shift in revenue between
the 0 and 1 cases rather than a change per unit.

c. Tests including hypotheses and confidences for the entire equation and each variable separately,
5 sets of tests in all. State hypotheses and write a one sentence interpretation for each of the five.
Use α =.05 for all.

a. Y = β0 + β1(X1) + β2(X2) + β3(X3) + β4(X4), where X1 = JetFuel, X2 = DPI, X3 = Recession, X4 = AirTran
b. Y = -10322.08 + 60.35(X1) + 1.30(X2) - 251.21(X3) + 1118.54(X4)
c. Hypotheses
i. Entire equation: H0: β1 = β2 = β3 = β4 = 0, H1: at least one βj ≠ 0
- The Significance F is nearly zero (F = 105.02), so we reject H0 and conclude that the equation
as a whole is useful for predicting Revenue.
ii. Jet Fuel: H0: β1 = 0, H1: β1 ≠ 0
- The p-value of 0.4948 is greater than .05, so we do not reject H0; there is no evidence that
Jet Fuel is related to Revenue when the other variables are in the model.
iii. DPI: H0: β2 = 0, H1: β2 ≠ 0
- The p-value is nearly zero, so we reject H0 and conclude that DPI is related to Revenue.
iv. Recession: H0: β3 = 0, H1: β3 ≠ 0
- The p-value of 0.0162 is less than .05, so we reject H0 and conclude that Recession is
related to Revenue.
v. AirTran: H0: β4 = 0, H1: β4 ≠ 0
- The p-value is nearly zero, so we reject H0 and conclude that AirTran is related to Revenue.
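
For the tests on individual variables, each t Stat in the output is the coefficient divided by its standard error, and a two-tailed test at α = .05 rejects H0 when the absolute t Stat exceeds the t critical value of 2.0595. The sketch below repeats that comparison with the rounded values from the output; the names are illustrative and the results agree with the reported p-values.

```python
# Coefficients and standard errors copied from the regression output (rounded).
t_critical = 2.0595  # two-tailed, alpha = .05, 25 residual degrees of freedom

terms = {
    "Intercept": (-10322.0817, 1559.6093),
    "JetFuel": (60.3484, 87.0957),
    "DPI": (1.3014, 0.1684),
    "Recession": (-251.2149, 97.4119),
    "AirTran": (1118.5423, 129.5954),
}

t_stats = {}
reject_h0 = {}
for name, (estimate, std_error) in terms.items():
    t_stats[name] = estimate / std_error
    # Reject H0: beta = 0 when |t| is beyond the critical value.
    reject_h0[name] = abs(t_stats[name]) > t_critical
    decision = "reject H0" if reject_h0[name] else "do not reject H0"
    print(f"{name:10s} t = {t_stats[name]:8.4f}   {decision}")
```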

d. Should any terms be dropped? If so, which ones and why?

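The Jet Fuel term should be dropped since its p-value (0.4948) is well above α = .05, so there is no
evidence of a relationship between Jet Fuel and Revenue once the other variables are in the model.
The other three variables all have p-values below .05 and should be kept.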

e. Check the assumptions of linearity for JetFuel and DPI (no need for the 2 dummy variables),
constant variability, normality and independence. All plots are given to you in the Residuals tab in
the excel results file. PHStat does not create the linearity plots. Write a statement detailing what
you found and recommendations for addressing any issues present.

Durbin-Watson Calculations

Sum of Squared Difference of Residuals   1657297.667
Sum of Squared Residuals                 924412.1778
Durbin-Watson Statistic                  1.792812456
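
The Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below reproduces the value from the two sums in the table and also shows the calculation from a list of residuals; the helper name and the four toy residuals are made up for illustration and are not the Southwest data.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    sum_sq_diff = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    sum_sq_resid = sum(e ** 2 for e in residuals)
    return sum_sq_diff / sum_sq_resid

# From the two sums reported in the table above.
dw_from_table = 1657297.667 / 924412.1778
print(f"Durbin-Watson from the table sums: {dw_from_table:.4f}")

# Toy residuals (made up) that alternate in sign, which pushes the statistic up.
toy_residuals = [1.0, -1.0, 1.0, -1.0]
dw_toy = durbin_watson(toy_residuals)
print(f"Durbin-Watson for the toy residuals: {dw_toy:.1f}")
```

Values near 2 suggest little first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation.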

[Figure: Residual Plot for DPI($B). Residuals (axis from -500 to 400) plotted against DPI($B) (axis from 9000 to 10400); trend line R² = 0.00569.]

[Figure: Residual Plot for JetFuel($/gal). Residuals (axis from -500 to 400) plotted against JetFuel($/gal) (axis from 0 to 4); trend line R² = 0.02551.]

Descriptive Summary

                      Residual
Mean                  -5.45697E-13
Median                -32.81958812
Mode                  #N/A
Minimum               -389.238334
Maximum               300.1617437
Range                 689.4000776
Variance              31876.2820
Standard Deviation    178.5393
Coeff. of Variation   -32717672906252100.00%
Skewness              -0.1622
Kurtosis              -0.6260
Count                 30
Standard Error        32.5967
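
A few rows of the residual summary can be tied together with simple arithmetic: the variance is the standard deviation squared, and the standard error of the mean is the standard deviation divided by the square root of the count. The huge coefficient of variation is only an artifact of dividing by a residual mean that is essentially zero. The sketch below uses the rounded values from the summary; the names are illustrative.

```python
import math

# Values copied from the residual summary above.
std_dev = 178.5393
count = 30
mean = -5.45697e-13  # residuals from a regression with an intercept average to about zero

variance = std_dev ** 2
std_error_of_mean = std_dev / math.sqrt(count)

# Coefficient of variation in percent; it blows up because the mean is about zero.
coeff_of_variation_pct = std_dev / mean * 100

print(f"Variance               {variance:.4f}")
print(f"Standard error of mean {std_error_of_mean:.4f}")
print(f"Coeff. of variation    {coeff_of_variation_pct:.3e} %")
```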

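Based on the residual plots, there is no pattern in the residuals for Jet Fuel (R² = 0.02551) or DPI
(R² = 0.00569), so the linearity assumption looks reasonable. Normality looks good for the data
looking at the skewness (-0.1622) and kurtosis (-0.6260), which are both close to zero, and since the
sample size is 30. The Durbin-Watson statistic has a value of 1.79, so we can say the residuals are
independent since it is greater than 1.3.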

f. A statement explaining the results of the 95% prediction interval.

I can be 95% confident that the actual Revenue for q3 2012 will fall within the prediction interval
calculated at JetFuel = 3.07, DPI = 10337, Recession = 0 and AirTran = 1.
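
The center of that prediction interval is the fitted equation evaluated at the given values. The sketch below computes only the point prediction from the rounded coefficients in the output, so the last digits may differ from Excel. The exact half width also depends on how far these values sit from the data, which cannot be recovered from the summary output alone; the code therefore reports only the smallest it could be, the t critical value times the standard error.

```python
# Coefficients copied from the regression output (rounded as printed).
intercept = -10322.0817
slopes = {"JetFuel": 60.3484, "DPI": 1.3014, "Recession": -251.2149, "AirTran": 1118.5423}

# Values given in the problem for q3 2012.
x_new = {"JetFuel": 3.07, "DPI": 10337, "Recession": 0, "AirTran": 1}

# Point prediction: intercept plus each slope times its value.
predicted_revenue = intercept + sum(slopes[name] * x_new[name] for name in slopes)

# Lower bound on the 95% prediction interval half width (not the exact half width).
t_critical = 2.0595
standard_error = 192.2927
min_half_width = t_critical * standard_error

print(f"Predicted Revenue: {predicted_revenue:.1f}")
print(f"Half width is at least: {min_half_width:.1f}")
```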

Make sure you include relevant excel results you used for each part above. (6 pts)

Note: writing a case report like this may be a problem on the final exam.

2. Going back to predicting Revenue from Flights in the Data1 worksheet, answer the following (3 pts)
Use the results from your SWA_PS9_S14 excel file from PS9
a. In PS9 you predicted the revenue in 2012 q1 to be $3,611M. We see the actual revenue was
$3,991M (from the Data2 tab). Given that the standard error was about $306M, what might
explain why the prediction was off by more than this? (Hint: Do you know how many flights SWA
actually had in 2012 q1? How accurate did you expect this prediction to be anyway?) Does
this indicate that the prediction model was bad? (Hint: look at the 95% prediction interval you
found in PS9 and see if it is consistent.)

There was a difference of $380 million between the predicted revenue for 2012 q1 ($3,611M) and
the actual revenue ($3,991M). That miss is $74 million more than the standard error of $306
million, or about 1.24 standard errors. The standard error is only the typical size of an error,
so a single prediction can miss by more, and a miss of less than about two standard errors is
roughly consistent with a 95% prediction interval. The miss can also be due to the low R squared
value, which means there would be other factors to take into account when looking at the
variation in Revenue. The model did not have a 100% level of confidence so there was bound to be
some error, and this by itself does not indicate the model was bad.
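
To put the size of the miss in context, it helps to express it in standard errors. The sketch below uses only the three figures quoted in the question (in $M); the names are illustrative.

```python
# Figures quoted in the question, in $M.
predicted = 3611
actual = 3991
standard_error = 306

error = actual - predicted
error_in_std_errors = error / standard_error

print(f"Prediction error: {error} ($M)")
print(f"Error in standard errors: {error_in_std_errors:.2f}")
```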

b. In PS9 you found the R-squared for predicting Revenue from Flights was about 66%.
Explain why this indicates that there might be other variables that could be added to improve
this model. Look at the variables in problem (1) and explain why adding Airtran to this model
would be a good choice. Hint: use the p-values you obtained in (1) and logic about the
possible relationship between Airtran and Revenue.

Adding AirTran would be a good choice because the merger with AirTran added to revenue. In the
regression in problem (1), revenue was an additional $1118 million higher with the merger between
AirTran and Southwest, holding the other variables constant. The p-value for AirTran is nearly
zero, so there is strong evidence that the AirTran merger is related to Revenue.

c. Add the dummy variable Airtran data from Data2!G1:G29 to Data1!E1:E29 (note that you are
skipping the last two Airtran data points). Run the regression for predicting Revenue from
Flights and Airtran. Is this a better model than just using Flights? Explain!

According to the p-values, both coefficients are clearly significant, with almost 100%
confidence levels. The ANOVA calculated that a merger would add another 42,066 flights per
year with an error of 10,119 flights. This model is better because it shows how revenue will
increase as well as flights. However, it will only be effective at showing how a merger would
positively benefit Southwest if all other variables are held constant.

d. Explain from the logic of the problem why you might expect a significant interaction between
Flights and Airtran.

Flights and AirTran would have a significant interaction because each company serves
different routes, so if a merger were to occur, the number of flights would also increase
due to the increase in resources and routes. An increase in the number of flights would
also result in an increase in revenue.

e. Now test statistically if there is evidence of an interaction for this by adding the interaction
term Flights x Airtran and run the regression with all the terms. Explain the results and if this
suggests you should add the interaction term to your model. Don't forget to explain why or
why not!

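This model shows evidence of an interaction between Flights and AirTran: the interaction term
has a low p-value of 0.0290, which is less than α = .05. This gives us about a 97% level of
confidence that the interaction is real, so the interaction term should be added to the model,
which should allow us to predict future revenue with better accuracy than the last model.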
