Extra Practice #3 Making A Movie
Extra Practice #3 Making A Movie
Use JMP13, Excel and the data to answer the following questions. Where appropriate,
provide clear, succinct justifications of your rational.
Background
In 2016, the film industry in the US was over $11.5 Billion dollars.
Since talented directors and lead actors often consistently create blockbuster movies,
the name of the director and cast are generally good predictive variables for a film’s
success. However, a number of other variables may also be important to consider
including the film budget, country of origin, and plot (is the movie a sequel?).
In the past several years, social media has emerged as an important method of
marketing and advertising. In this assignment, we attempt to predict whether a movie
will be a “financial success” by considering a number of factors pertaining to the social
media presence, the duration of the movie, and the overall popularity of the cast and
director etc. We say a movie is a “financial success” if the movie yields a 30% profit or
higher.
The dataset movies.jmp consists of a subset of 2,823 movies produced from 1936 to
2010. A detailed description of the variables and their meanings are available at the
end of this document.
Instructions
Download the dataset “movies.jmp” from BB. The “.jmp” version has already been split
into a training set and a test set (the first 1000 movies are considered testing data).
You should use JMP, Excel or both to answer the following questions. Be sure to detail
your work and approach. Where appropriate, justify your responses quantitatively. You
may work in teams, but everyone must write-up their own solutions (and perform their
own computations) separately.
Movie_title
Title_year
Content_rating
Country
Director_name
NOTICE: This case and its solutions are COPYRIGHTED. They may not be copied, sold, published, disseminated, shared, or other wise
communicated to third parties whether in person, online or otherwise and whether or not for a profit or nonprofit purpose (2016).
1
BUAD 425
Actor_main
Testing_Data
2. (2 pts) Use stepwise and the “Go” option to build another logistic regression
model to predict whether a movie will be a financial success. Allow JMP to use
the same variables as in Q1. What variables does JMP ultimately pick? What is
the R^2 of this model?
3. (2 pts) Compare your answers in Q1, and Q2. Which do you think is a better
model for the business? Justify your response using the JMP output.
4. (6 pts) In the stepwise model you created in Q2, what is the coefficient (or
weight) of “director_facebook_likes”? Comparatively, what is the coefficient (or
weight) of the “actor_main_facebook_likes”?
5. (4 pts) Using the stepwise model from Q2 and a probability threshold of .57,
what is the resulting confusion matrix? (Remember to only use the testing data).
6. (4 pts) What is the accuracy of the classifier in Q5 (on the testing data)?
7. (6 pts) As a conservative estimate, let’s assume that we invest $1M for every
movie we predict is going to be a financial success, and on average, we earn a
profit of $4M every time we are correct (i.e. when we correctly invest in movies
which are financial successes).
According to the testing data in our model, how much money would we have
lost given the scenario above? Can we adjust our model to lower this number?
Briefly explain.
NOTICE: This case and its solutions are COPYRIGHTED. They may not be copied, sold, published, disseminated, shared, or other wise
communicated to third parties whether in person, online or otherwise and whether or not for a profit or nonprofit purpose (2016).
2
BUAD 425
8. (4 pts) Use the “Go” option and fit a decision tree model with the same variables
as in Q1. How many times did JMP split the data? Comment on why JMP
stopped at this number?
9. (4 pts) What is the R Square value of your model on the training data? On the
testing data?
10. (4 pts) A retired movie producer comments that social media has changed the
movie production world, and at this stage, the main actor having high facebook
likes is the most important indication of whether a movie will be successful or
not.
Based on the decision tree model in Q8, do you agree? If not, which variable is
the most important indication?
11. (4 pts) Describe the characteristics of a movie which is most likely NOT to be a
financial success (in other words, describe the characteristics of a movie which
would lead us to most confidently assume that the movie will not be a success)
12. (4pts) What is the confusion matrix for this model? Use a probability threshold
of .65.
13. (4 pts) What is the overall accuracy of this model? Is this model more accurate
than the Logistic Regression model from the previous parts?
Based on the assumptions stated in Q7, which model would you suggest the
company use? State clearly any assumptions you are making and justify your
response quantitatively.
Appendix:
NOTICE: This case and its solutions are COPYRIGHTED. They may not be copied, sold, published, disseminated, shared, or other wise
communicated to third parties whether in person, online or otherwise and whether or not for a profit or nonprofit purpose (2016).
4