Practice Final Part 1

The document outlines a practice final exam for a course on Data and Decisions, focusing on analyzing a dataset of 542 Action/Adventure movies to predict box office success based on various features. It presents three linear regression models with different predictor variables and their respective performance metrics, along with a series of questions and solutions related to the models' outputs and implications. The exam assesses understanding of concepts such as model selection, statistical significance, and the relationship between production budgets and box office revenue.

Uploaded by

oliviadtush

Practice Final Exam: Part 1

MGMTFT 402 - Data and Decisions

Data and Analysis

Think back to the first class, where we discussed which movies make the largest return
on investment. Suppose you want to better understand what features lead to box office
success vs. box office failure. We are going to focus on a subset of movies in the genre
“Action/Adventure” and with a total budget greater than $10 million.
Specifically, we have a dataset of 542 movies, with the following features as predictor variables:
Predictors / features:
• ProdBudget: Production budget (in millions of $)
• Year: Year movie was made
• MarBudget: Marketing budget (in millions of $)
• Runtime: Length of movie (in minutes)
• Sequel: Dummy variable indicating whether movie is a sequel (1) or not (0)
• CriticRating: Average critic rating at the end of the opening weekend (out of 100)
• AudienceRating: Average audience rating at the end of the opening weekend (out of
100)
We will also consider an interaction term between Sequel and MarBudget.
Our goal is to make good predictions for the following variable:
Target / label:
• BoxOffice: Total box office revenue (in millions of $)

We split the data into a training dataset (442 movies) and a testing dataset (100 movies) and
consider three different linear models. These three models use variables selected by LASSO
for three different values of λ.

Here is Model 1:
## MODEL INFO:
## Observations: 442
## Dependent Variable: BoxOffice
## Type: OLS linear regression
##
## MODEL FIT:
## F(1,440) = 682.97, p = 0.00
## R² = 0.61
## Adj. R² = 0.61
##
## Standard errors: OLS
## --------------------------------------------------
##                       Est.    S.E.   t val.      p
## ----------------- -------- ------- -------- ------
## (Intercept)         -42.29   16.45    -2.57   0.01
## ProdBudget            5.57    0.21    26.13   0.00
## --------------------------------------------------
For Model 1, the average prediction error on training data is 23.1. The standard deviation
of these prediction errors is 18.7. The average prediction error on testing data is 24.2. The
standard deviation of these prediction errors is 19.2. You can assume that a linear model
appears to be appropriate.

Here is Model 2:
## MODEL INFO:
## Observations: 442
## Dependent Variable: BoxOffice
## Type: OLS linear regression
##
## MODEL FIT:
## F(3,438) = 307.83, p = 0.00
## R² = 0.68
## Adj. R² = 0.68
##
## Standard errors: OLS
## -------------------------------------------------------
##                            Est.    S.E.   t val.      p
## ---------------------- -------- ------- -------- ------
## (Intercept)              -56.50   17.46    -3.24   0.00
## Sequel                    79.30   36.63     2.16   0.03
## MarBudget                  9.19    0.46    19.96   0.00
## Sequel:MarBudget           3.66    0.85     4.30   0.00
## -------------------------------------------------------
For Model 2, the average prediction error on training data is 17.7. The standard deviation
of these prediction errors is 16.9. The average prediction error on testing data is 18.5. The
standard deviation of these prediction errors is 17.8. You can assume that a linear model
appears to be appropriate.

Here is Model 3:
## MODEL INFO:
## Observations: 442
## Dependent Variable: BoxOffice
## Type: OLS linear regression
##
## MODEL FIT:
## F(6,435) = 160.83, p = 0.00
## R² = 0.69
## Adj. R² = 0.68
##
## Standard errors: OLS
## -------------------------------------------------------
##                            Est.    S.E.   t val.      p
## ---------------------- -------- ------- -------- ------
## (Intercept)              -99.00   53.10    -1.86   0.06
## ProdBudget                -3.35    0.88    -3.80   0.00
## Sequel                   101.73   36.71     2.77   0.01
## MarBudget                 14.59    1.49     9.80   0.00
## CriticRating               0.45    0.72     0.63   0.53
## AudienceRating             0.24    1.00     0.24   0.81
## Sequel:MarBudget           5.59    0.98     5.72   0.00
## -------------------------------------------------------
For Model 3, the average prediction error on training data is 13.1. The standard deviation
of these prediction errors is 11.0. The average prediction error on testing data is 22.9. The
standard deviation of these prediction errors is 19.2. You can assume that a linear model
appears to be appropriate.

Questions
1. What percent of variability in BoxOffice can be explained by ProdBudget?
a) 5.6%
b) 61%
c) 69%
d) Not enough information.

2. Suppose a movie that is a sequel (Sequel = 1) has a marketing budget of $20 million.
Using Model 2, what would you predict for box office revenue?
a) $22.8 million
b) $127.3 million
c) $206.6 million
d) $279.8 million

3. Based on Model 3 and using a significance level of α = 0.05, is there a statistically
significant difference in the impact of marketing budget on films that are sequels
compared to films that are NOT sequels?
a) Yes, because the relevant p-value is below 0.05
b) Yes, because the relevant p-value is NOT below 0.05
c) No, because the relevant p-value is below 0.05
d) No, because the relevant p-value is NOT below 0.05

4. Does ProdBudget appear to be collinear with other predictor variables?
a) Yes
b) No
c) Not enough information.

5. Recall that the three models all come from LASSO regressions with different values of
λ. Which of the three models comes from the highest value of λ?
a) Model 1
b) Model 2
c) Model 3
d) Not enough information.

6. Suppose we use forward selection instead of LASSO, and suppose it suggests the same
model as Model 2. Which of the following statements is NOT true?
a) The first variable selected might be ProdBudget, which was also the first variable
selected by LASSO.
b) Adding any other variable to Model 2 would increase AIC.
c) Adding any other variable to Model 2 would increase R².

7. Consider Model 1. What is your interpretation of the coefficient on ProdBudget? Is
this coefficient statistically significant at the α = 0.05 level?

8. If you were a studio executive, does the regression output in Model 1 mean that
increasing the production budget will cause an increase in profits? Explain why or why
not in at most 3 sentences.

9. If you could only use one of these models to make predictions about BoxOffice, which
model would you use? Explain your answer in at most 2 sentences.

10. How confident are you that the model you chose in the previous answer will make better
predictions than Model 1 (or Model 2, if you chose Model 1)? The best answers will
use a hypothesis test to compare the two models (or identify a plausible hypothesis test
but describe why its results may not tell the whole story).

11. Is there any other information you would want to have to answer these questions? If so,
feel free to make an assumption in order to answer the question, and explain what you
assumed here. If not, leave this blank.

Solutions
1:b. This is the definition of R² applied to Model 1.
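
As a quick consistency check, the reported R² values can be recovered from the F statistics in the regression output. For an OLS fit with an intercept and k predictors on n observations, R² = F·k / (F·k + n − k − 1). The sketch below applies this identity to the three models. The helper names `r2_from_f` and `adjusted_r2` are hypothetical and not part of the exam.

```python
def r2_from_f(f_stat: float, df_model: int, df_resid: int) -> float:
    """Recover R^2 from the overall F statistic of an OLS fit with an intercept."""
    return f_stat * df_model / (f_stat * df_model + df_resid)


def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2: R^2 penalized for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# (F statistic, model df, residual df) as reported for each model
reported = {
    "Model 1": (682.97, 1, 440),
    "Model 2": (307.83, 3, 438),
    "Model 3": (160.83, 6, 435),
}

r2_by_model = {}
adj_r2_by_model = {}
for name, (f_stat, df_model, df_resid) in reported.items():
    r2 = r2_from_f(f_stat, df_model, df_resid)
    r2_by_model[name] = r2
    adj_r2_by_model[name] = adjusted_r2(r2, 442, df_model)
    print(name, round(r2, 2), round(adj_r2_by_model[name], 2))
```

The rounded values agree with the R² and adjusted R² lines printed for each model, which is why answer b) reads 61% directly off Model 1.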

2:d. This involves plugging values into Model 2: $y = -56.5 + 9.19 \cdot 20 + 79.3 \cdot 1 + 3.66 \cdot 20 \cdot 1 = 279.8$.
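
The same plug-in calculation can be written as a small function. This is a minimal sketch that uses the Model 2 coefficients from the output above. The function name `predict_model2` is hypothetical.

```python
def predict_model2(sequel: int, mar_budget: float) -> float:
    """Predicted BoxOffice (millions of $) from the Model 2 coefficients."""
    intercept = -56.50
    b_sequel = 79.30
    b_mar_budget = 9.19
    b_interaction = 3.66  # coefficient on Sequel:MarBudget
    return (intercept
            + b_sequel * sequel
            + b_mar_budget * mar_budget
            + b_interaction * sequel * mar_budget)


sequel_prediction = predict_model2(sequel=1, mar_budget=20)
non_sequel_prediction = predict_model2(sequel=0, mar_budget=20)
print(round(sequel_prediction, 1))      # sequel with a $20 million marketing budget
print(round(non_sequel_prediction, 1))  # same budget, not a sequel
```

Setting `sequel=0` drops both the Sequel term and the interaction term, which gives the value in option b) rather than the correct answer d).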

3:a. The relevant coefficient is the one on the interaction term between Sequel and
MarBudget. This p-value is below 0.05, which means it is statistically significant at the 5%
level.
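
To see where such a p-value comes from, the sketch below recomputes the test statistic as estimate divided by standard error and converts it to a two-sided p-value. It assumes a standard normal approximation to the t distribution, which is reasonable with 435 residual degrees of freedom. The helper name is hypothetical, and results differ slightly from the table because the inputs are rounded.

```python
import math


def two_sided_p_normal(estimate: float, std_error: float):
    """Return (test statistic, two-sided p-value) under a normal approximation."""
    z = estimate / std_error
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Model 3: interaction term Sequel:MarBudget
z_interaction, p_interaction = two_sided_p_normal(5.59, 0.98)
# Model 3: CriticRating, for contrast
z_critic, p_critic = two_sided_p_normal(0.45, 0.72)

print(round(z_interaction, 2), p_interaction < 0.05)
print(round(z_critic, 2), p_critic < 0.05)
```

The interaction term is far below the 0.05 threshold, while CriticRating is not, matching the p column in the Model 3 output.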

4:a. Yes, definitely. Looking at Models 1 and 3, we see that the coefficient on ProdBudget
changes signs from positive to negative. This is a strong sign that it is collinear with other
predictor variables, though it is not clear which of the variables it is collinear with.

5:a. The largest value of λ imposes the largest penalty on the size of the coefficients, which
shrinks more of them to exactly zero, so fewer variables are selected in that case. The correct
answer is then Model 1, which has the fewest variables.
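
The mechanism can be illustrated in a special case. With standardized, uncorrelated predictors, each LASSO coefficient is the OLS coefficient soft-thresholded at λ. The sketch below uses hypothetical coefficients for generic variables x1 to x4. They are not the exam's estimates, and the exam's predictors are correlated, so this only illustrates why a larger λ keeps fewer variables.

```python
def soft_threshold(beta_ols: float, lam: float) -> float:
    """Shrink a coefficient toward zero by lam, setting it to zero if |beta| <= lam."""
    magnitude = max(abs(beta_ols) - lam, 0.0)
    return magnitude if beta_ols >= 0 else -magnitude


# Hypothetical OLS coefficients (illustration only, not from the exam data)
hypothetical_ols = {"x1": 9.0, "x2": 4.0, "x3": -2.5, "x4": 0.4}


def selected_variables(lam: float) -> list:
    """Names of the variables whose coefficient is still nonzero at this lam."""
    return [name for name, beta in hypothetical_ols.items()
            if soft_threshold(beta, lam) != 0.0]


for lam in (0.5, 3.0, 5.0):
    print(lam, selected_variables(lam))
```

As λ grows from 0.5 to 5.0, the selected set shrinks from three variables to one, mirroring the move from Model 3 to Model 1.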

6:a. This cannot be true. Once forward selection picks a variable, it remains selected. So if it
arrives at a model with 3 other variables (and not ProdBudget), it could not have selected
ProdBudget first. This is a big difference between LASSO and forward selection.
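
The "once selected, always selected" property is easy to see in code. Below is a minimal sketch of greedy forward selection with a generic score where lower is better (for example AIC). The scoring function and its numbers are made up for illustration and are not computed from the exam data.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_selection(candidates: Sequence[str],
                      score: Callable[[FrozenSet[str]], float]) -> List[Tuple[str, ...]]:
    """Greedy forward selection; returns the model chosen at each step."""
    selected: List[str] = []
    path: List[Tuple[str, ...]] = []
    current = score(frozenset(selected))
    while len(selected) < len(candidates):
        remaining = [v for v in candidates if v not in selected]
        best = min(remaining, key=lambda v: score(frozenset(selected + [v])))
        best_score = score(frozenset(selected + [best]))
        if best_score >= current:
            break  # no remaining variable improves the score
        selected.append(best)  # once added, a variable is never removed
        current = best_score
        path.append(tuple(selected))
    return path


# Hypothetical per-variable improvements (illustration only)
HYPOTHETICAL_GAIN = {
    "MarBudget": 30.0, "Sequel": 10.0, "Sequel:MarBudget": 8.0,
    "ProdBudget": 1.0, "CriticRating": 0.5,
}


def toy_score(variables: FrozenSet[str]) -> float:
    """Made-up AIC-like score: fit improvement minus a penalty of 2 per variable."""
    return 100.0 - sum(HYPOTHETICAL_GAIN[v] for v in variables) + 2.0 * len(variables)


path = forward_selection(list(HYPOTHETICAL_GAIN), toy_score)
for step in path:
    print(step)
```

Every model on the path contains all variables of the previous one, so a final model without ProdBudget cannot have started from ProdBudget.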

7: If the production budget were to increase by $1 million, we would predict box office
revenue to increase by $5.57 million. This coefficient is statistically significant at the 5%
level because the p-value is below 0.05.
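
An equivalent way to check significance is an approximate 95% confidence interval for the coefficient. The sketch below assumes the normal critical value 1.96, which is a close approximation with 440 residual degrees of freedom. The function name is hypothetical.

```python
def approx_95_ci(estimate: float, std_error: float, z_crit: float = 1.96):
    """Approximate 95% confidence interval (normal critical value assumed)."""
    margin = z_crit * std_error
    return estimate - margin, estimate + margin


# Model 1: coefficient on ProdBudget and its standard error
lower, upper = approx_95_ci(5.57, 0.21)
print(round(lower, 2), round(upper, 2))
print("excludes zero:", lower > 0 or upper < 0)
```

Because the interval lies entirely above zero, the coefficient is statistically significant at the 5% level, in line with the p-value argument.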

8: No! Correlation does not imply causation. For example, it could be that movies that are
expected to do well based on the quality of the scripts are given larger budgets.

9: I would suggest using Model 2, as it has the lowest error on test data (18.5, versus 24.2
for Model 1 and 22.9 for Model 3). It seems that this model finds the sweet spot of testing
error. It uses enough variables to reduce training and testing error, but not so many that it
overfits. The avoidance of overfitting can be seen by noting that the training and testing
errors are quite similar.
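
The comparison can be laid out explicitly using the average prediction errors reported for each model. This is a small sketch, and the variable names are hypothetical.

```python
# Average prediction errors reported for each model
errors = {
    "Model 1": {"train": 23.1, "test": 24.2},
    "Model 2": {"train": 17.7, "test": 18.5},
    "Model 3": {"train": 13.1, "test": 22.9},
}

# Choose the model with the lowest error on held-out test data
best_model = min(errors, key=lambda m: errors[m]["test"])

# A large gap between test and training error is a sign of overfitting
gaps = {m: round(e["test"] - e["train"], 1) for m, e in errors.items()}
most_overfit = max(gaps, key=gaps.get)

print(best_model)
print(gaps)
print(most_overfit)
```

Model 3 has the lowest training error but by far the largest gap between training and testing error, which is the overfitting pattern described above.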

10: We want to assess whether the true average prediction errors will be different for Models 1
and 2. In other words, we want to compare the test data prediction errors using a two-sample
hypothesis test for means. Our null hypothesis is that the prediction errors are equal.

For Model 1, the test error has an average of $\bar{x}_1 = 24.2$ with a standard deviation of
$s_1 = 19.2$. For Model 2, the test error has an average of $\bar{x}_2 = 18.5$ with a standard
deviation of $s_2 = 17.8$. The sample sizes are $n_1 = n_2 = 100$.

Using our formula for the appropriate test statistic, we get:

$$\frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{24.2 - 18.5}{\sqrt{19.2^2/100 + 17.8^2/100}} = 2.177.$$

Because this is slightly over 2, we can reject the null hypothesis that the prediction
errors are equal. There is a less than 5% chance that we would see this data if the two
models had equal prediction errors.
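
The calculation can be reproduced in a few lines. This sketch follows the solution in treating the two sets of test errors as independent samples, and it assumes a normal approximation for the p-value. The function name is hypothetical.

```python
import math


def two_sample_statistic(mean1: float, sd1: float, n1: int,
                         mean2: float, sd2: float, n2: int) -> float:
    """Test statistic for a difference in means with unpooled variances."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Test-set prediction errors: Model 1 versus Model 2
z = two_sample_statistic(24.2, 19.2, 100, 18.5, 17.8, 100)
# Two-sided p-value under a standard normal approximation (assumption)
p_value = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 3))
print(p_value < 0.05)
```

The statistic exceeds the rough cutoff of 2 and the approximate p-value is below 0.05, consistent with rejecting the null hypothesis of equal prediction errors.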

If you provide a compelling reason why a different model might be better, based on some
contextual knowledge or other rationale, that can be entirely acceptable, but you should
identify this as the relevant hypothesis test to conduct.
