Movie Data
All content following this page was uploaded by Concetta Depaolo on 24 May 2016.
Constance H. McLaren
Concetta A. DePaolo
Indiana State University
Copyright © 2009 by Constance H. McLaren and Concetta A. DePaolo, all rights reserved. This text may be freely
shared among individuals, but it may not be republished in any medium without express written consent from the
authors and advance notification of the editor.
Key Words: Time Series; Movie Box Office; Forecasting; Graphical Display of Data; Curve Fitting; Rate of
Change
Abstract
The Movie dataset contains weekend and daily per theater box office receipt data as well as total U.S. gross
receipts for a set of 49 movies. Dates are provided for all time series values. The diverse list of movies was
selected, not at random, but to spark student interest and to provide a range of box office values. The values
provide a rich dataset to use for applications such as simple graphical analysis, a variety of time series and causal
forecasting models, curve-fitting, and rate of change analysis. A series of assignment questions is included and the
accompanying Instructor’s Manual provides representative solutions.
1. Introduction
Because time series forecasting is such a universal topic in business statistics classes, we have been intrigued with
finding data sets that are both current and meaningful for our students. Although there is certainly a huge amount
of financial time series data available, we have found that the movie box office data sets provide excellent
examples of those forecasting features typically emphasized in business statistics textbooks: trend, seasonality,
cycles, and randomness. Most students in our required business statistics classes are sophomores who have not yet
studied finance. Using data that is familiar to them—they understand that receipts are higher on weekends, they
know how blockbusters are released—ties statistical concepts from their classes to experiences in their lives. The
accompanying data provides information on a wide variety of movies. Instructors who wish to track other movies
or future releases are encouraged to visit the site from which these time series were obtained.
The dataset contains both weekend and daily per theater box office receipts and total US gross receipts for the 49
movies shown in Table 1. To increase student interest, movies were chosen from lists of recent Academy Award
Best Picture winners, highest grossing movies, series movies (e.g. the Harry Potter series, the Spiderman series),
and from the Sundance Film Festival. Values have been retrieved from http://www.the-numbers.com. Movies
selected include big budget as well as smaller, independent films. Receipts vary widely as well. In some cases,
only weekend data is available.
[Table 1. Movies in the dataset. Only one row is recoverable here: 26, Lord of the Rings: The Return of the King, 2003, Best Picture]
At our university, all business majors are required to complete a two-course introductory (non-calculus based)
business statistics sequence, typically in their sophomore year. The first course covers data presentation, random
variables and probability distributions, and inference. The second course covers tests of independence, ANOVA,
regression, forecasting, and decision analysis as well as a brief unit on business applications of calculus. Typical
business statistics texts include coverage of regression analysis and time series forecasting (see, for example,
Anderson, Sweeney, & Williams, 2008; Bowerman, O’Connell, & Murphree, 2009; Groebner, Shannon, Fry, &
Smith, 2008; and Levine, Stephan, Krehbiel, & Berenson, 2008). We have found that the use of real data increases
student interest in the topics we teach in business statistics courses and in an upper level forecasting elective, and
we anticipate that this would be the case in other statistics courses. Students seem to enjoy data tied to the
entertainment industry, and they are quick to connect the time series patterns they find to their own social
activities.
In addition to the specific analytical questions provided in the assignments below, the data can support classroom
discussions about analytical decision making. Even without additional research into the entertainment industry,
students can use the data to make comparisons of similar movies, evaluate timing decisions for DVD releases, and
look at the impact of holidays and award nominations on box office receipts.
A useful classroom discussion can center on "new product" forecasting. In this area, analysts usually look at
analogies to learn how similar products performed in the past (Makridakis, Wheelwright & Hyndman, 1998, page
466). Students can brainstorm about whether similar movies (genre, actors, release timing, etc.) have similar
patterns of receipts. Validation for this comparison process is supported by the charts created for industry watchers
at The Numbers site. A typical chart, comparing major summer releases for 2008, is shown in Figure 1 below.
2. Data Sources
The data in the Movie data set were retrieved from http://www.the-numbers.com, a site that presents box office
receipt data for hundreds of movies. For each movie, the site provides information on the number of theaters, the
movie’s rank, and total receipts as well as the per theater information. We have chosen to concentrate on the per
theater information as it is more useful for classroom assignments, but instructors who want more detailed
information or want to collect data on future releases are encouraged to visit the Movie Archive section of this site.
Information on movie characteristics, such as a list of Academy Award winners, was found through various sites
(www.oscars.org/awardsdatabase, www.afi.com/tvevents/100years/100yearslist.aspx,
http://www.imdb.com/Sections/Awards/Sundance_Film_Festival).
The daily and weekend time series files have five variables. The first variable is the movie’s number in the
alphabetical list, the second is the movie title, the third is an index for the observation number, the fourth is the per
theater box office receipt amount in dollars, and the fifth is the date (mm/dd/yyyy). For weekend data, the date is
that of the Friday that begins the Friday-through-Sunday period making up the weekend total. If daily data is missing for a
title, the third, fourth, and fifth variables are coded as NA. Movie titles are arranged alphabetically. The day of the
week is not provided in the daily data; students who take the data into Excel can use the
WEEKDAY function to determine the day of the week.
Some movies opened in limited release; in those cases we did not record values until the movie
was in general release. For some titles, the site does not report receipts every day and/or weekend near the end of
the movie’s run. It is a good exercise for students to look for missing entries in the time series and determine what
to do about those instances. Alternatively, instructors might decide to cleanse the data in advance.
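The file layout and the missing-entry check described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of the dataset documentation: the file is taken to be comma-separated, the column names MOVIE_NUM and TITLE are hypothetical (DAY_NUM, DAILY_PER_THEATER, and DATE follow the variable names used for the daily data), and the sample rows are made up. The sketch skips titles coded NA, derives the day of the week from the date, and lists dates missing between consecutive observations.

```python
import csv
import io
from datetime import datetime, timedelta

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

# Made-up rows for illustration only; these are NOT values from the dataset.
# MOVIE_NUM and TITLE are hypothetical column names, and the comma-separated
# layout is an assumption about how the file is stored.
SAMPLE = """MOVIE_NUM,TITLE,DAY_NUM,DAILY_PER_THEATER,DATE
1,Example Movie A,1,5000,05/02/2008
1,Example Movie A,2,6200,05/03/2008
1,Example Movie A,3,2100,05/06/2008
2,Example Movie B,NA,NA,NA
"""


def parse_daily(text):
    """Return one dict per observation, skipping titles whose daily data is NA."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        if rec["DATE"] == "NA":
            continue
        day = datetime.strptime(rec["DATE"], "%m/%d/%Y").date()
        rows.append({
            "title": rec["TITLE"],
            "per_theater": float(rec["DAILY_PER_THEATER"]),
            "date": day,
            # date.weekday() returns 0 for Monday through 6 for Sunday
            "weekday": DAY_NAMES[day.weekday()],
        })
    return rows


def find_missing_dates(dates, step_days=1):
    """List dates absent between consecutive observations.

    Use step_days=1 for a daily series and step_days=7 for a weekend
    series, which is dated by its Friday.
    """
    missing = []
    ordered = sorted(dates)
    for prev, nxt in zip(ordered, ordered[1:]):
        expected = prev + timedelta(days=step_days)
        while expected < nxt:
            missing.append(expected)
            expected += timedelta(days=step_days)
    return missing


rows = parse_daily(SAMPLE)
for r in rows:
    print(r["date"], r["weekday"], r["per_theater"])
print("missing:", find_missing_dates([r["date"] for r in rows]))
```

With real data the gap check would be run separately for each title, since each movie has its own run of dates. What to do with the gaps found (interpolate, truncate the series, or leave them) remains the judgment call the exercise is meant to raise.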
4. Pedagogical Uses
This dataset can support exercises relating to visual display of data, descriptive statistics, trend analysis, and the
forecasting concepts commonly found in an introductory business statistics class. It is also appropriate for a class
in operations management or a class dedicated to forecasting. If more than just a few of the observations are used,
students should have access to software. Basic analyses such as graphing and descriptive statistics can be done
with Excel, although use of Minitab, SPSS, or another statistical software package is preferred for many of the
exercises.
Our approach to statistics follows typical business statistics books such as the widely used texts referenced above.
These books commonly include at least one chapter on forecasting in addition to several chapters on regression
analysis. In our approach, we first present the mathematical and statistical foundations for topics such as least
squares calculations with normal equations, the relationships among entries in ANOVA tables, trend analysis,
seasonal decomposition steps, and smoothing methods, so students understand the theoretical underpinnings of
statistical methods before using software tools to perform calculations. When software output is presented, we
focus on interpretation and analysis so that students are required to think critically about their results rather than
simply reporting output without understanding.
We offer the following successive assignments for use in the classroom. Instructors should choose
the assignments that fit the educational objectives of the class and the abilities of the students. A detailed set of
assignment questions and solutions is found in the accompanying Instructor’s Manual.
Students will locate data for a specific movie, bring the data to the software package, format it, and create a time
series plot. We use this in the first days of the introductory business statistics class; it would also be suitable for an
information literacy class.
Students will compute descriptive statistics for several different types of movies using software, and examine these
statistics to draw conclusions about the movie types. We use this exercise in the early part of the introductory
business statistics class. It could also be used to illustrate the difficulty of using descriptive statistics to draw
conclusions about time series data.
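The difficulty of summarizing a time series with descriptive statistics can be shown with a small Python sketch. The receipts below are made up for illustration and are not taken from the dataset. The two series contain the same values in opposite order, so their mean, median, and standard deviation agree even though one series declines and the other grows.

```python
import statistics


def describe(values):
    """Basic descriptive statistics; none of them depends on the time order."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


# Made-up weekend per-theater receipts, for illustration only.
declining = [8000, 4000, 2000, 1000, 500]
growing = list(reversed(declining))

print("declining:", describe(declining))
print("growing:  ", describe(growing))
```

A time series plot of the two series separates them immediately, which is the point a class discussion can draw out.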
Students will create time series plots using daily and weekend movie box office data. Using visual analysis and
software tools, they will prepare a discussion of the features of the plots. We use this exercise at the beginning of
the forecasting unit to help students recognize trend and seasonality in time series data.
Using software, students will fit several nonlinear trend equations to the weekend per theater box office receipts
and determine their suitability as forecasting models. We have used this exercise to illustrate nonlinear regression,
trend fitting, and concepts of rate of change. It also provides the basis for a discussion of overfitting models when
we ask students to consider whether their models are reasonable and appropriate.
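As one possible instance of such a trend fit, the sketch below fits an exponential trend y = a·exp(b·t) by regressing ln(y) on t with ordinary least squares. This is an assumption made for illustration: the assignment does not prescribe a particular equation, the log-linear fit is not the same as a direct nonlinear least squares fit, and the receipts are made-up values constructed to decline by exactly 40% per week.

```python
import math


def fit_linear(x, y):
    """Least squares line y = b0 + b1*x (closed-form solution of the normal equations)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def fit_exponential(t, y):
    """Fit y = a * exp(b * t) by regressing ln(y) on t. Requires every y > 0."""
    ln_a, b = fit_linear(t, [math.log(v) for v in y])
    return math.exp(ln_a), b


# Made-up weekend per-theater receipts, for illustration only.
weeks = [1, 2, 3, 4, 5, 6]
receipts = [10000 * 0.6 ** (w - 1) for w in weeks]

a, b = fit_exponential(weeks, receipts)
print(f"a = {a:.1f}, b = {b:.4f}, weekly change = {math.exp(b) - 1:.1%}")
```

Because the fit minimizes squared errors in ln(y) rather than in y, it weights late, small receipts more heavily than a fit on the original scale would. Students can compare it with linear or polynomial trends when judging which model is reasonable.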
This project duplicates the activities of previous exercises, combining them into one project, and adds a calculus-
based activity for rate of change. We have had good results using this exercise as an out-of-class group project in
the second required statistics course.
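A rate of change calculation of this kind might look like the following sketch, which assumes an exponential trend y = a·exp(b·t), so that dy/dt = a·b·exp(b·t). The trend form and the parameter values are hypothetical and are not taken from the project or the dataset. The analytic derivative is checked against a central finite difference.

```python
import math


def exp_trend(a, b, t):
    """Trend value y = a * exp(b * t)."""
    return a * math.exp(b * t)


def exp_trend_rate(a, b, t):
    """Analytic rate of change dy/dt = a * b * exp(b * t)."""
    return a * b * math.exp(b * t)


def central_difference(f, t, h=1e-6):
    """Numerical approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


# Hypothetical trend parameters, for illustration only.
a, b = 12000.0, -0.5

for week in (1, 2, 3):
    analytic = exp_trend_rate(a, b, week)
    numeric = central_difference(lambda t: exp_trend(a, b, t), week)
    print(f"week {week}: dy/dt = {analytic:.2f} (numeric check {numeric:.2f})")
```

The negative values indicate dollars per theater lost per week at each point, and their shrinking magnitude shows the decline slowing over the run.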
Students will examine the seasonal patterns in the daily per theater box office receipts. Using software tools
available, they will create seasonal forecasting models and evaluate them. We have used this exercise in both the
second required business statistics class, where we generally rely on seasonal decomposition, and in the
specialized forecasting class, where we ask students to develop and compare results from several more advanced
seasonal forecasting procedures.
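One common form of seasonal decomposition, the multiplicative ratio-to-moving-average method, can be sketched as follows. This is an illustration under stated assumptions and not necessarily the procedure used in the exercise: the period is seven days, the series is assumed to start on a Friday, and the receipts are made up.

```python
from collections import defaultdict


def seasonal_indices(values, period=7):
    """Multiplicative seasonal indices by the ratio-to-moving-average method.

    Assumes an odd period, so a simple centered moving average is enough.
    Index s refers to position s within the period (0 is the first day).
    """
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    if len(values) < 2 * period - 1:
        raise ValueError("need at least 2 * period - 1 observations")
    half = period // 2
    ratios = defaultdict(list)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        moving_avg = sum(window) / period
        ratios[i % period].append(values[i] / moving_avg)
    raw = [sum(ratios[s]) / len(ratios[s]) for s in range(period)]
    scale = period / sum(raw)  # normalize so the indices average to 1
    return [r * scale for r in raw]


# Made-up daily per-theater receipts, Friday through Thursday, repeated
# for three weeks with no trend. For illustration only.
week_pattern = [3000, 5000, 4000, 1000, 900, 800, 1300]
series = week_pattern * 3

indices = seasonal_indices(series)
for name, idx in zip(("Fri", "Sat", "Sun", "Mon", "Tue", "Wed", "Thu"), indices):
    print(f"{name}: {idx:.4f}")
```

An index above 1 marks a day that runs above the weekly average. Dividing each observation by its index gives a deseasonalized series to which a trend can then be fitted.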
This is a more advanced exercise and could be used in our second course or a business strategy class. Students will
play the role of a movie industry analyst who must predict box office revenue for a new movie. In order to find
similar movies to use for comparison, they will need to determine which factors are appropriate. Data from the
comparison group will be used to develop a model for the new release. We recommend this as a group exercise for
upper level students.
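One simple way to turn a comparison group into a model is sketched below. It is only an assumed approach for illustration, not the method given in the assignment: each comparison movie's weekend receipts are expressed relative to its opening weekend, those ratios are averaged across movies, and the average profile is scaled by an assumed opening value for the new release. All numbers are made up.

```python
def average_decay_profile(comparison_series):
    """Average, across movies, of each weekend's receipts relative to the opening weekend."""
    n_weeks = min(len(s) for s in comparison_series)
    profile = []
    for w in range(n_weeks):
        ratios = [s[w] / s[0] for s in comparison_series]
        profile.append(sum(ratios) / len(ratios))
    return profile


def forecast_by_analogy(opening_value, profile):
    """Scale the average profile by the new movie's assumed opening weekend value."""
    return [opening_value * p for p in profile]


# Made-up weekend per-theater receipts for two comparison movies.
comparison = [
    [10000, 5000, 2500, 1250],
    [8000, 4800, 2400, 1200],
]

profile = average_decay_profile(comparison)
forecast = forecast_by_analogy(9000, profile)
print("profile: ", [round(p, 4) for p in profile])
print("forecast:", [round(f, 1) for f in forecast])
```

The choice of comparison movies drives the result, so the factors used to select them (genre, release timing, and so on) deserve as much discussion as the arithmetic.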
5. Conclusion
The Movie data sets provide interesting data for use in a wide variety of statistics classes. In our business statistics
classes we have found that using data from familiar products piques student interest. They are quick to see the
relationship between their analysis and business decision making. By choosing those assignments that fit the
learning objectives of their classes, instructors can provide examples and exercises that augment material included
with textbooks. The data can be used for activities as simple as plotting and finding descriptive statistics, but it
also supports more advanced analysis.
Acknowledgments
The authors wish to thank Bruce Nash, The-Numbers.com, for supplying Figure 1. Similar charts are posted at the
site.
Movies with missing daily data show NA for DAY_NUM, DAILY_PER_THEATER, and DATE.
Appendix B
The Movie Data Instructor’s Manual, containing all exercise assignments and solutions, is available in the file
Appendix B Instructors Manual Assignments and Solutions.doc.
Data Sources
For movie box office data: http://www.the-numbers.com/
References
Anderson, D., D. Sweeney, & T. Williams (2008). Statistics for Business and Economics, 10th edition. Thomson
South-Western, Mason, OH.
Bowerman, B., R. O’Connell, & E. Murphree (2009). Business Statistics in Practice, McGraw Hill/Irwin, New
York.
Groebner, D., P. Shannon, P. Fry, & K. Smith (2008). Business Statistics, 7th edition. Pearson Education, Upper
Saddle River, NJ.
Levine, D., D. Stephan, T. Krehbiel, & M. Berenson (2008). Statistics for Managers, 5th edition. Pearson
Education, Upper Saddle River, NJ.
Makridakis, S., S. Wheelwright, & R. Hyndman (1998). Forecasting: Methods and Applications, 3rd edition. John
Wiley and Sons, New York.
Constance H. McLaren
Analytical Department
Indiana State University
Terre Haute, IN 47809
[email protected]
Concetta A. DePaolo
Analytical Department
Indiana State University
Terre Haute, IN 47809
Volume 17 (2009)
http://www.amstat.org/publications/jse/v17n1/datasets.mclaren.html