
Volume 10, Issue 2, February – 2025 International Journal of Innovative Science and Research Technology

ISSN No: 2456-2165 | https://doi.org/10.5281/zenodo.14987459

Predicting Film Box Office Performance Using Wikipedia Edit Data

Niraj Patel¹
¹St. Clair College

Publication Date: 2025/03/11

Abstract: This study explores the potential of Wikipedia edit data as a predictor of opening box office revenues for films
released in the US. After analyzing films from 2007 to 2011, we developed a predictive model based on Wikipedia article
edits using gradient boosting trees as the primary algorithm. Our model incorporates features such as the frequency of
Wikipedia edits, the size and content of article revisions, and the revenues of similar films. The results demonstrate that
Wikipedia activity can serve as a rough indicator of film popularity, though the model’s predictive accuracy is limited. We
find that Wikipedia-based features, particularly edit runs and content changes, significantly contribute to the model’s
performance, achieving an R² of 0.54 for films released in 2012. This suggests that while Wikipedia data offers valuable
insights into social interest, it is best used in conjunction with other predictors for more reliable revenue estimates.

How to Cite: Niraj Patel (2025). Predicting Film Box Office Performance Using Wikipedia Edit Data. International Journal of Innovative Science and Research Technology, 10(2), 1951-1956. https://doi.org/10.5281/zenodo.14987459

I. INTRODUCTION

 Wikipedia as a Gauge of Social Interest

Fig 1 Count of Wikipedia article edits for the films used in this paper’s training dataset over the 4 weeks prior to each film’s
respective release date, bucketed by days before the release date that the edits occurred. This graph shows the uptick in editing
activity that typically accompanies a film’s release.

 According to its article about itself (as of this writing), Wikipedia is "a collaboratively edited, multilingual, free Internet encyclopedia" launched in January 2001. [6] Its articles can be edited by anyone, either anonymously (though the editor's IP address is logged) or with a registered user account. The edit history of each article is saved with a timestamp. Interested users can view any past version of an article, and an article's edit history exhibits an evolving record of Wikipedia's "knowledge" of its subject.

IJISRT25FEB802 www.ijisrt.com 1951


 As such, Wikipedia's edit history can be viewed as a barometer of social interest. For example, when a person is in the news, editing activity in his or her article often spikes. In fact, Wikipedia has template warnings indicating when an article is likely to be in flux due to a relevant current event. Edit activity on Wikipedia, in this sense, is akin to mentions on social networks like Facebook or Twitter, although perhaps with a smaller participating audience (although many people read Wikipedia, not nearly so many participate in its creation).

 One area where we can try to gauge the degree to which Wikipedia activity reflects social interest is film box office performance. Films have relatively well-defined release dates prior to which we can measure activity on Wikipedia. They also have well-defined, measurable outcomes - revenues at the ticket booth - that are clearly sensitive to popular interest. Theater owners obviously have a direct financial interest in knowing how well a film is going to perform. Advertisers and publicists, sellers of tie-in products, and film journalists have a slightly more indirect but still strong interest; they will want to know how they should spend their time and money. Can we use Wikipedia to usefully predict films' opening box office performances?

II. FORMULATION OF PROBLEM AND DATA SOURCES

 The specific question I set out to answer was how accurately, with Wikipedia's help, we can predict the domestic per-theater box office gross of a film released widely in the US over the first three days of its release.

 Of course, Wikipedia's highly open policy means that it contains a stunning breadth of information from contributors with wide-ranging expertise, and that said information is sometimes unreliable. For an example that was in the news not long before this paper was written, see [5], or for Wikipedia's own list of Wikipedia hoaxes, see [4].

 Films traditionally open on Friday, and their "opening" often refers to their gross over the first Friday, Saturday, and Sunday that they are playing. However, there are plenty of non-Friday openings. Consequently, I've stated the problem in terms of the first three days' worth of grosses.

 The data sources I used to answer this question were:

 Box Office Mojo (http://www.boxofficemojo.com/) - contains detailed box office data. I used it to select the universe of films to analyze and as my source for theatrical release dates, number of opening theaters, and revenues. There is no API - I scraped the data with the Python package Beautiful Soup.

 Rotten Tomatoes (http://www.rottentomatoes.com/) - a popular movie review aggregator. I used it to obtain descriptive information about films: genres, runtime, MPAA rating, cast and directors, and so on. It offers an API if you register for a key (which is free as of the present writing).

 Wikipedia (English-language) (http://en.wikipedia.org/) - MediaWiki, the name of the web application upon which Wikipedia is based, offers an API; no registration or key is necessary.

 Much of the work involved in data retrieval and formatting was to ensure that data retrieved from these three sources corresponded to the same film; data from Rotten Tomatoes and Wikipedia was obtained by using their APIs' search functionalities, which can lead to incorrect hits if you are not careful. For example, we want to make sure that Rotten Tomatoes data for the 2012 film "The Lucky One" is not mapped to the 2008 film "The Lucky Ones," or that for the 2010 film "Salt" we do not examine the Wikipedia article for salt, the mineral.

 The universe of films I considered consisted of those listed on Box Office Mojo as having opened in at least 1000 theaters. I manually excluded a handful of films that were re-released or were limited-engagement special features. I trained my algorithms on films released between 2007 and 2011, inclusive. In total, 689 films were in the training dataset. Data from films as far back as 2002 were used for some of the feature calculations; see the next section for more details. I tested my algorithm on films released in 2012, of which there were 124.

 Box Office Mojo data had to be scraped from HTML, but the HTML was regular and consistent. Rotten Tomatoes has a nice JSON-based API for data retrieval, but its ranking of returns is quirky, sometimes retrieving obscure films or films with similar names (example: Oliver Stone's 2008 biopic "W." was unfindable through a search query, even through the website's front end; I had to go to Stone's Rotten Tomatoes page just to find the relevant web page). Wikipedia has a nice API and solid, consistent lookup, which is all the more impressive given that it contains articles on anything, not just films.

III. FEATURES

A. Descriptive Features

 The descriptive features considered were the year of release, runtime, MPAA rating, whether the film was released on a Friday, and membership in genres as defined by Rotten Tomatoes. Rotten Tomatoes has 18 genre labels. A film can belong to any number of these genres.
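To make the descriptive features concrete, the sketch below flattens a film's descriptive data into a fixed-length numeric vector of the kind a tree model can consume. The genre subset, field names, and encoding are my own illustration (the actual Rotten Tomatoes list has 18 genre labels), not the paper's code:

```python
# Illustrative subset of genre labels; the real Rotten Tomatoes list has 18.
GENRES = ["Action & Adventure", "Comedy", "Drama", "Horror"]
MPAA_RATINGS = ["G", "PG", "PG-13", "R", "NC-17"]

def descriptive_features(film):
    """Flatten descriptive data into a fixed-length numeric row."""
    row = [film["year"], film["runtime"], 1 if film["friday_release"] else 0]
    # Genre membership is multi-label: a film can be in any number of genres.
    row += [1 if g in film["genres"] else 0 for g in GENRES]
    # MPAA rating is one-hot: exactly one indicator is set.
    row += [1 if film["mpaa"] == r else 0 for r in MPAA_RATINGS]
    return row

film = {"year": 2012, "runtime": 143, "friday_release": True,
        "genres": {"Action & Adventure"}, "mpaa": "PG-13"}
print(descriptive_features(film))
# [2012, 143, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
```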


B. Wikipedia-Based Features

 For each Wikipedia article, I measured the number of edit runs that occurred during the period 0 to 7 days prior to midnight on the day of the film's release, as well as during the period 7 to 28 days prior. I defined an edit run as a sequence of consecutive edits from the same author (identified by IP address if anonymous). Sometimes, on Wikipedia, the same author commits several edits in a row, presumably as part of a single effort to edit the page, which I wanted to correspondingly treat as a single edit. I generally found this to be a slight improvement over raw edit count in terms of predictive power.

 I also extracted a few features from the content of the article revisions themselves. One feature I used was the average size, in bytes, of revisions in the 28-day window. Other features were obtained by scanning the text of the revisions for certain textual patterns. One was a count of the number of article section headings, another was a count of the number of external file references (typically an image or sound file inserted into the article), and the last was a case-insensitive search for the word "IMAX".

C. Revenues of Similar Films

 A natural approach to predicting the box office performance of a film is to look at comparable films; in particular, the natural benchmark for a sequel is its predecessor. To this end, I created a feature consisting of revenues of "similar" films released in the five years preceding each film's release (hence, data as far back as 2002 was involved, even though the training dataset extended only as far back as 2007). The five-year window was arbitrary, but I think it forms a reasonable basis for comparing expected box office performance.
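The edit-run counting described above can be sketched in Python. This is an illustrative helper of my own (not the paper's code), assuming revisions arrive as chronologically sorted (timestamp, author) pairs, with anonymous authors represented by their IP address:

```python
from datetime import datetime, timedelta

def count_edit_runs(revisions, release_midnight, start_days, end_days):
    """Count edit runs starting in [start_days, end_days) days before release.

    A run is a maximal sequence of consecutive revisions by the same author,
    so several back-to-back edits by one editor count only once.
    """
    lo = release_midnight - timedelta(days=end_days)
    hi = release_midnight - timedelta(days=start_days)
    runs, prev_author = 0, object()  # sentinel that matches no real author
    for timestamp, author in revisions:
        new_run = author != prev_author
        prev_author = author
        if new_run and lo <= timestamp < hi:
            runs += 1
    return runs

release = datetime(2012, 5, 4)  # midnight on the release day
revisions = [
    (datetime(2012, 4, 10, 9, 0), "203.0.113.7"),  # falls in 7-28 day window
    (datetime(2012, 4, 30, 12, 0), "EditorA"),     # 0-7 day window...
    (datetime(2012, 4, 30, 12, 5), "EditorA"),     # ...same run as above
    (datetime(2012, 5, 1, 18, 0), "EditorB"),
]
print(count_edit_runs(revisions, release, 0, 7))   # 2
print(count_edit_runs(revisions, release, 7, 28))  # 1
```

The paper does not specify how runs straddling a window boundary are attributed; here a run is credited to the window containing its first edit.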

Fig 2 Example similarity scores for "The Avengers" (2012).
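Similarity scores like those in Figure 2 can be sketched as below. The helper names are mine, and the weighted-average form of the revenue feature is an assumption on my part - the paper says only that earlier films' revenues were "weighted by similarity":

```python
from math import sqrt

def jaccard(a, b):
    """Jaccard similarity |A intersect B| / |A union B| of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def film_similarity(film, other):
    """Geometric mean of Jaccard similarities over genres and cast/directors."""
    return sqrt(jaccard(film["genres"], other["genres"]) *
                jaccard(film["people"], other["people"]))

def similar_revenue_feature(film, earlier_films):
    """Similarity-weighted average of earlier films' opening revenues
    (assumed form; the paper specifies only similarity weighting)."""
    pairs = [(film_similarity(film, f), f["revenue"]) for f in earlier_films]
    total_weight = sum(w for w, _ in pairs)
    if total_weight == 0:
        return 0.0
    return sum(w * r for w, r in pairs) / total_weight
```

Under this sketch, a sequel sharing all of its genres and listed cast with its predecessor gets similarity 1.0, so the predecessor's opening revenue dominates the feature, matching the paper's intuition that a sequel's natural benchmark is its predecessor.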

 Similarity between two films was defined as the geometric mean of the Jaccard4 similarity measures of the films' 1) Rotten Tomatoes genre information and 2) Rotten Tomatoes cast/director information. The Rotten Tomatoes API only returns the first few starring members of each film's cast, so the metric is not distorted by differing cast sizes. Directors were always treated as single people, even if there were co-directors, so for our purposes, the Coen brothers, for example, count as a single person.

 The feature incorporated into the algorithms was, for each film, the opening revenue of all other films in our universe released up to five years prior to that film, weighted by similarity. See Figure 2 for an example of similarity scores for one of the films in the test dataset.

IV. ANALYSIS AND PREDICTION

 I tried a few different prediction algorithms; the one that proved the most effective on the test set, as measured by R2, was gradient boosting trees.5 Gradient boosting is a general predictive technique pioneered by Jerome Friedman of Stanford in which a predictive formula is generated by summing so-called "weak predictors" that are sequentially fit to the gradient of a specified loss function (for example, squared error). The overall model may be accurate and robust even if each individual weak predictor is very simplistic. Gradient boosting trees refers to gradient boosting with decision trees as the weak predictors. For details, see Friedman's article [2], and also Wikipedia's own page on gradient boosting [3].

 I used the Python statistical package scikit-learn's implementation of gradient boosting trees, using the default learning rate and least squares as my loss function. There are a few other model parameters that can be controlled by the user; the most important ones are the number of estimators (the number of weak predictors to fit) and the depth of the trees (how many leaves are in each decision tree - this parameterizes the complexity of each individual weak predictor).

 Adapting the example in scikit-learn's documentation [1], I calculated the R2 of gradient boosting trees at different iterations and tree depths. I fit the model using different parameterizations to the test data. Figure 3 illustrates the results and shows that this model fits the test data best with about 100 iterations (this is, in fact, scikit-learn's default value) and a very simple 2-leaf functional form for its weak predictors.

Fig 3 R2 of gradient boosting tree models on the test dataset as a function of the number of estimator iterations. The different curves represent different numbers of leaves in the weak learner decision trees. The simplest weak learner, a 2-leaf tree, performs the best. Using stochastic gradient boosting trees, in which a subsample of the features is used to fit the decision trees, improved the high-leaf models to some degree. This suggests that the inferior performance of the higher-leaf models may be due to overfitting.

 Using a gradient boosting tree model with 100 estimators and two leaves in each weak learner and training on films from 2007 to 2011, as mentioned previously, I was able to achieve an R2 of 0.5400 on the 2012 dataset. The predictions and results are listed in an appendix at the end of this paper. Figure 4 shows a scatter of predictions and actual values.

 The frequency with which a feature is used in the model's decision trees is representative of its importance in generating predictions; highly relevant features will be frequently involved in trees, and irrelevant features will be involved rarely or not at all. Table 1 shows the top 10 features. Several features had frequencies of 0, in particular the boolean variables for several of the genre categories, indicating that they could have been completely omitted without impacting the outcomes of this model.

4 The Jaccard similarity of two sets A and B is defined as |A ∩ B| / |A ∪ B|.

5 Random forests and ordinary linear regression performed worse, but not by much. Despite the clearly non-normal distribution of the revenue per theater (it has a positive skew), I did not have better success with a generalized linear regression than with ordinary linear regression.

Fig 4 Predicted values vs. actual values.
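To make the boosting procedure concrete, here is a toy pure-Python version of gradient boosting with 2-leaf regression stumps and squared-error loss: each stump is fit to the current residuals (the negative gradient of squared error) and added to the ensemble with a shrinkage factor. This is a sketch of the technique, not the paper's actual model, which used scikit-learn; all function names are mine:

```python
def fit_stump(X, residuals):
    """Fit a 2-leaf regression stump minimizing squared error on residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate split between adjacent values
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)

def fit_gbt(X, y, n_estimators=100, learning_rate=0.1):
    """Sequentially fit stumps to residuals of the squared-error loss."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        residuals = [yi - p for yi, p in zip(y, preds)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        preds = [p + learning_rate * (lv if row[j] <= t else rv)
                 for row, p in zip(X, preds)]
    return base, learning_rate, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (lv if row[j] <= t else rv)
                      for j, t, lv, rv in stumps)
```

In scikit-learn, roughly the same configuration would be GradientBoostingRegressor(n_estimators=100, max_leaf_nodes=2) with the default learning rate and squared-error loss.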

Table 1 Top 10 Features in the Gradient Boosting Tree Model.


Feature Frequency (%)
Wikipedia edit runs 7-28 days prior 18.31
Film runtime 14.60
Opening per-theater revenue of similar films 13.30
Wikipedia frequency of headers/subheaders 12.07
Wikipedia edit runs 0-7 days prior 10.97
Wikipedia average size of revisions 9.73
Wikipedia frequency of word “IMAX” 5.07
Wikipedia frequency of external files 4.62
Is comedy 3.74
MPAA rating is PG-13 3.17
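Frequencies like those in Table 1 can be computed by counting which feature each weak learner's split uses; features never chosen get 0% and could be dropped. A self-contained sketch with made-up split data (the split list and feature names are illustrative only):

```python
from collections import Counter

def feature_frequency(splits, feature_names):
    """Percent of tree splits using each feature, rounded to 2 decimals.

    splits: one feature index per weak learner's split in the fitted model.
    """
    counts = Counter(splits)
    total = len(splits)
    return {name: round(100.0 * counts[i] / total, 2)
            for i, name in enumerate(feature_names)}

# Hypothetical feature indices chosen by ten 2-leaf weak learners:
splits = [0, 0, 1, 2, 0, 1, 0, 0, 1, 2]
print(feature_frequency(splits, ["edit_runs_7_28", "runtime", "similar_revenue"]))
# {'edit_runs_7_28': 50.0, 'runtime': 30.0, 'similar_revenue': 20.0}
```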

 The importance of the Wikipedia data in this model can also be seen by removing the Wikipedia features and rerunning the model, which produces a considerably lower R2 of 0.3434.

V. CONCLUSION AND AVENUES FOR FURTHER EXPLORATION

 While the results above do show that Wikipedia activity has some ability to predict box office returns, I do not think the model in this paper is precise enough to be used as anything but a very rough forecasting tool. Wikipedia is just one possible source of data for quantifying social interest; social networks such as Twitter or Facebook are another; frequency of appearance in news headlines is another. There are many conceivable metrics to gauge popular interest in seeing a film, and a comprehensive model would include data from many sources.

 In particular, a many-source approach will help overcome the biases that any one source would have. Although Wikipedia is widely known, read, and edited by a wide variety of people, it will still be biased to whatever extent Wikipedia editors do not reflect the population of people who go to the movies. It is my opinion that the best way to improve this model would be to obtain more measurements of popular interest, particularly from data sources whose audiences overlap little with Wikipedia editors - measurements of interest among moviegoing demographics that use the Internet relatively infrequently, for example.

 Nevertheless, the partial success in predicting box office revenues with Wikipedia demonstrates that it is one potential source of data to consider when gauging interest - and not just in films, but anywhere popular interest is a concern. Wikipedia could be used as input for predictions related to interest in news and current events, ticket sales for events other than films, investor sentiments, and many other areas.

6 In fact, I found that the number of opening theaters itself has significant predictive power on per-theater revenue. I omitted it mainly because I wanted to specifically examine Wikipedia's ability to measure social interest.

Table 2 2012 Predictions and Errors, Sorted by Actual Revenue per theater.
Title Actual Predicted Error (actual - predicted)
Marvel’s The Avengers 47698 26452 21247
The Hunger Games 36871 22247 14624
The Dark Knight Rises 36532 19194 17338
The Twilight Saga: Breaking Dawn Part 2 34660 11890 22770
Skyfall 25211 31496 -6285
The Hobbit: An Unexpected Journey 20919 18152 2767
Dr. Seuss’ The Lorax 18830 7018 11812
The Amazing Spider-Man 17176 21054 -3877
Ted 16800 8127 8673
Think Like a Man 16693 5536 11157

Table 3 2012 Predictions and Errors, Sorted by Actual Revenue per theater (Part 1).
Title Actual Predicted Error
Abraham Lincoln: Vampire Hunter 5247 5668 -421
The Cabin in the Woods 5245 6979 -1734
Sparkle 5189 4511 677
Mirror Mirror 5032 3589 1444
Red Dawn 4916 7430 -2514
The Three Stooges 4892 5981 -1089
Rise of the Guardians 4869 8725 -3856
End of Watch 4818 2503 2315
Cloud Atlas 4787 8046 -3259
Step Up Revolution 4570 4409 162

Table 4 2012 Predictions and Errors, Sorted by Actual Revenue per theater (Part 2).
Title Actual Predicted Error
Alex Cross 4489 3955 533
That’s My Boy 4440 6258 -1818
Parental Guidance 4392 4140 252
Diary of a Wimpy Kid: Dog Days 4312 5826 -1514
The Dictator 4245 7210 -2965
The Secret World of Arrietty 4235 6930 -2695
The Man with the Iron Fists 4235 6053 -1818
One For the Money 4207 4619 -411
Rock of Ages 4161 7405 -3244
ParaNorman 4108 6899 -2791

REFERENCES

[1]. "Ensemble methods." Retrieved 13 Jan 2012. http://scikit-learn.org/stable/modules/ensemble.html
[2]. Friedman, Jerome H. (19 Apr 2001). "Greedy Function Approximation: A Gradient Boosting Machine." Retrieved 10 Jan 2012. http://www-stat.stanford.edu/~jhf/ftp/trebst.pdf
[3]. "Gradient boosting." Retrieved 13 Jan 2012. http://en.wikipedia.org/wiki/Gradient_boosting
[4]. "List of hoaxes on Wikipedia." Retrieved 10 Jan 2012. http://en.wikipedia.org/wiki/Wikipedia:List_of_hoaxes_on_Wikipedia
[5]. Pfeiffer, Eric (4 Jan 2013). "War is over: Imaginary 'Bicholim' conflict removed from Wikipedia after five years." Retrieved 10 Jan 2013.
[6]. "Wikipedia." Retrieved 10 Jan 2012. http://en.wikipedia.org/wiki/Wikipedia
