Python

Using Python to solve business problems

Uploaded by

Satwik jain
Copyright
© All Rights Reserved

Movie Box Office Analysis

PYTHON PROJECT
Group 8 MBA -BA(A)
1. Harshada Wandhare A020
2. Manaswini Voolapalli A028
3. Muskan Khandelwal A032
4. Satwik Jain A055
5. Shivjeet Patil A060
6. Shubham Yadav A062

DATE: 21st August, 2024


Project Overview
The Python project leverages `matplotlib`, `seaborn`,
`pandas`, and `numpy` to conduct a comprehensive
analysis of Bollywood box office performance. It
examines metrics like worldwide gross, India net,
India gross, overseas revenue, budget, and verdict to
identify trends and success factors. Using `pandas`
for data manipulation, `numpy` for numerical
analysis, and `matplotlib` and `seaborn` for data
visualization, the project provides insights into the
financial dynamics of Bollywood films and predicts
potential box office outcomes.
Data Import
The code imports the necessary libraries (`numpy`, `pandas`, `matplotlib`, and `seaborn`), reads
a CSV file containing data on the top 1000 Bollywood movies, and loads it into a
DataFrame named `movie_data`. The `head()` function then displays the first few rows of
the dataset for an overview. The code also removes any unnamed columns.
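The import step described above might look roughly like the following sketch. The CSV is inlined so the example is self-contained; the rows are made-up placeholders, the `Movie` column name is an assumption, and the real project reads a file whose name is not given here.

```python
import io
import pandas as pd

# Made-up rows standing in for the real CSV. The empty first header cell
# mimics an index column that was saved along with the data.
csv_text = """,Movie,Worldwide,India Net,India Gross,Overseas,Budget,Verdict
0,Film A,500,250,300,200,100,Hit
1,Film B,120,70,85,35,90,Flop
2,Film C,900,420,510,390,150,Blockbuster
"""

# pd.read_csv accepts a file path in exactly the same way.
movie_data = pd.read_csv(io.StringIO(csv_text))

# pandas names header-less columns "Unnamed: 0", "Unnamed: 1", ...; drop them.
unnamed = [col for col in movie_data.columns if col.startswith("Unnamed")]
movie_data = movie_data.drop(columns=unnamed)

print(movie_data.head())
```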
Data Preprocessing
The code preprocesses the dataset
by converting the categorical
`Verdict` column into numerical
codes, making it usable for analysis. It
ensures that columns such as
`Worldwide`, `India Net`, `India Gross`,
`Overseas`, and `Budget` are numeric
for consistency. It also removes rows
with missing values, cleaning the
dataset for accurate analysis and
visualization.
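A minimal sketch of these preprocessing steps, on made-up values, is shown below. Writing the codes to a separate `Verdict_Code` column (a hypothetical name) rather than overwriting `Verdict` is a choice made for this example only, and only two of the numeric columns are shown.

```python
import pandas as pd

# Made-up sample; "n/a" stands in for an entry that cannot be parsed as a number.
movie_data = pd.DataFrame({
    "Movie": ["Film A", "Film B", "Film C", "Film D"],
    "Worldwide": ["500", "120", "900", "n/a"],
    "Budget": ["100", "90", "150", "60"],
    "Verdict": ["Hit", "Flop", "Blockbuster", "Hit"],
})

# Encode verdict labels as integer codes (categories are sorted alphabetically).
movie_data["Verdict_Code"] = movie_data["Verdict"].astype("category").cat.codes

# Coerce earnings columns to numbers; unparseable entries become NaN.
for col in ["Worldwide", "Budget"]:
    movie_data[col] = pd.to_numeric(movie_data[col], errors="coerce")

# Remove rows that still contain missing values.
movie_data = movie_data.dropna().reset_index(drop=True)

print(movie_data)
```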
Exploratory Analysis

The code provides a thorough exploratory analysis by first
summarizing statistics for numerical features to
understand their distribution and identify outliers. It then
checks for missing values, ensuring data completeness
and accuracy. By isolating numerical columns, it focuses
on relevant metrics like earnings and budgets, which is
essential for accurate financial analysis and strategic
business decisions. This step helps in ensuring that the
data is well-prepared for further analysis and visualization.
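The three checks described above map onto standard `pandas` calls. The sketch below uses made-up values and an assumed `Movie` column; it is an illustration, not the project's exact code.

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing earnings value.
movie_data = pd.DataFrame({
    "Movie": ["Film A", "Film B", "Film C", "Film D"],
    "Worldwide": [500.0, 120.0, 900.0, np.nan],
    "Budget": [100.0, 90.0, 150.0, 60.0],
})

summary = movie_data.describe()                        # count, mean, std, min, quartiles, max
missing = movie_data.isnull().sum()                    # missing values per column
numeric = movie_data.select_dtypes(include="number")   # numerical columns only

print(summary)
print(missing)
print(list(numeric.columns))
```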
WordCloud
A word cloud visually represents the frequency of
words in a dataset, with word size corresponding to
frequency or importance. Common words appear
larger, while less frequent words are smaller. This
graphical tool is useful for quickly identifying key
themes or prominent terms in text data, making it a
popular choice for text analysis and summarization.
The code generates and displays a
word cloud of movie names from the
dataset. After installing the
`wordcloud` library, it concatenates
all movie names into a single string
and creates a word cloud using
`WordCloud`. The resulting
visualization, shown with `matplotlib`,
highlights the most frequently
occurring words in movie titles, with
larger words representing higher
frequency. The plot is titled "Word
Cloud of Movie Names" and provides
a visual representation of the most
common movie title terms.
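The word sizes in such a cloud are driven by word frequencies. As a rough stand-in that avoids the `wordcloud` dependency, the sketch below joins a few made-up titles into one string and counts the words with the standard library; the real library additionally filters stop words, so its ranking can differ.

```python
from collections import Counter

# Made-up titles; the project joins the movie-name column of the dataset.
titles = ["Love Story", "Love Again", "Final Story", "Love Returns"]

# Concatenate all titles into a single string, as the project does.
text = " ".join(titles)

# Count case-insensitive word occurrences; higher counts would be drawn larger.
word_counts = Counter(text.lower().split())

print(word_counts.most_common(2))
```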
Data Visualization
The project employs data visualization to explore Bollywood box office metrics using various
graphs, including bar charts, scatter plots, and heatmaps. Leveraging `matplotlib` and
`seaborn`, these visualizations reveal trends in worldwide gross, India net, overseas revenue,
and budget, offering insights into the factors influencing a movie's financial success.
Distribution of Indian Gross

The code shows the distribution of Indian Gross earnings
across movies using a line plot. It plots a line graph with movies
on the x-axis and their corresponding Indian Gross earnings on
the y-axis.

BUSINESS FINDINGS: Earnings vary widely. A few blockbusters
dominate the market, generating massive revenue, while most
films earn significantly less. This extreme distribution
highlights the high-risk, high-reward nature of the Indian film
industry.
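The "few blockbusters dominate" reading can also be checked numerically. The sketch below, on made-up earnings, compares mean and median, computes the skewness, and measures the share of the total earned by the top two titles; these particular checks are a suggestion and are not part of the original project.

```python
import pandas as pd

# Made-up earnings: eight modest titles and two very large ones.
india_gross = pd.Series([20, 25, 30, 35, 40, 45, 50, 60, 400, 900], name="India Gross")

mean = india_gross.mean()
median = india_gross.median()
skewness = india_gross.skew()  # positive for a long right tail

# Share of total earnings captured by the two highest-grossing titles.
top2_share = india_gross.nlargest(2).sum() / india_gross.sum()

print(f"mean={mean:.1f}, median={median:.1f}, skew={skewness:.2f}, top-2 share={top2_share:.0%}")
```

A mean well above the median, together with positive skewness, is the numerical signature of the long right tail seen in the plot.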
Distribution of Worldwide Gross

The code shows the distribution of Worldwide Gross earnings
across movies using a line plot. It plots a line graph with movies
on the x-axis and their corresponding Worldwide Gross
earnings on the y-axis.

BUSINESS FINDINGS: The chart shows a highly skewed
distribution of worldwide gross earnings across movies. A few
movies generate massive revenue, while the majority earn
significantly less. This indicates a high-risk, high-reward
industry.
Budget Vs. Box Office Success

The code creates a scatter plot with a regression line to
show the relationship between movie budgets and
worldwide gross earnings. The blue dots represent
individual data points, while the red line indicates the
overall trend.
BUSINESS FINDINGS: The chart shows a positive
correlation between movie budget and worldwide gross
earnings, suggesting that higher budget films tend to
earn more. However, there's significant variation,
indicating other factors influence box office success.
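The trend line and the strength of the relationship can be computed directly. The sketch below uses `numpy.polyfit` for an ordinary least-squares line and `numpy.corrcoef` for Pearson's r on made-up values; it is a numerical stand-in for the fitted line on the plot, not the project's plotting code.

```python
import numpy as np

# Made-up budgets and worldwide grosses for five films.
budget = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
worldwide = np.array([25.0, 35.0, 80.0, 70.0, 140.0])

# Degree-1 least-squares fit: worldwide ≈ slope * budget + intercept.
slope, intercept = np.polyfit(budget, worldwide, deg=1)

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(budget, worldwide)[0, 1]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
```

A positive slope with r below 1 matches the finding above: bigger budgets tend to earn more, but the scatter around the line leaves room for other factors.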
BUDGET VS. INDIAN EARNINGS &
BUDGET VS. WORLDWIDE EARNINGS

The code creates a figure with two subplots. The first subplot uses `sns.regplot` to show Budget versus India
Gross, and the second shows Budget versus Worldwide Gross, each with a regression line to illustrate the
relationship between the variables.

BUSINESS FINDINGS: Both plots show a positive correlation, indicating that higher budget films tend to
have higher gross earnings in both India and worldwide. However, the worldwide gross plot shows a
stronger correlation and a wider range of earnings.
INDIAN GROSS VS. OVERSEAS GROSS

The code melts the `movie_data` DataFrame to reshape
it from wide to long format, focusing on `India Gross` and
`Overseas`. The `melt()` function transforms these
columns into a format where each row represents a type
of earnings. The `sns.boxplot` then visualizes this data,
showing the distribution, median, and variability of gross
earnings for both Indian and overseas markets.

BUSINESS FINDINGS: The median Overseas Gross is significantly higher than the median India Gross, with a wider range and
more outliers. This suggests that overseas markets contribute substantially to overall gross earnings, with a few movies
generating exceptionally high revenue.
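The wide-to-long reshape is the least obvious step here, so a small sketch may help. The values are made up, and the `Movie`, `Market`, and `Gross` names are assumptions chosen for this example.

```python
import pandas as pd

# Made-up wide-format data: one row per film, one column per market.
movie_data = pd.DataFrame({
    "Movie": ["Film A", "Film B", "Film C"],
    "India Gross": [300.0, 85.0, 510.0],
    "Overseas": [200.0, 35.0, 390.0],
})

# Long format: one row per (film, market) pair.
long_df = movie_data.melt(
    id_vars="Movie",
    value_vars=["India Gross", "Overseas"],
    var_name="Market",
    value_name="Gross",
)

# The per-market medians are what the center line of each box shows.
medians = long_df.groupby("Market")["Gross"].median()

print(long_df)
print(medians)
# With seaborn installed: sns.boxplot(data=long_df, x="Market", y="Gross")
```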
DISTRIBUTION OF MOVIE VERDICTS

The code generates a pie chart to show the distribution of
movie verdicts. It first counts the occurrences of each verdict
using `value_counts()`. The `plt.pie()` function then creates the
chart, showing each verdict's proportion with percentage
labels and a pastel color palette, making the distribution of
movie verdicts easy to interpret.

BUSINESS FINDINGS: "Hit" and "Flop" are the largest categories, together accounting for nearly 40% of all
films. A smaller percentage achieve "SuperHit" or "All Time Blockbuster" status, indicating a highly
competitive industry with varying levels of success.
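The percentages on the pie chart come straight from the verdict counts. A minimal sketch on made-up verdicts:

```python
import pandas as pd

# Made-up verdict labels for ten films.
verdicts = pd.Series(
    ["Hit", "Flop", "Hit", "Average", "Flop", "SuperHit", "Hit", "Flop", "Average", "Hit"],
    name="Verdict",
)

counts = verdicts.value_counts()                  # occurrences per verdict, largest first
shares = (counts / counts.sum() * 100).round(1)   # percentage of all films

print(shares)
# With matplotlib: plt.pie(counts, labels=counts.index, autopct="%1.1f%%")
```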
Budget Ranges & Frequencies
The code creates a histogram to visualize the
distribution of movie budgets. The
`sns.histplot()` function plots the budget values
with 30 bins and includes a kernel density
estimate (KDE) to show the data's distribution.
The plot displays the frequency of budget
ranges, aiding in understanding the spread
and concentration of movie budgets.

BUSINESS FINDINGS: A significant majority of
movies have budgets clustering around the
lower end of the scale, with a long tail
extending towards higher budget productions,
indicating disparity in budget allocation
across the film industry.
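The binning behind a 30-bin histogram can be reproduced with `numpy.histogram`. The budgets below are synthetic, drawn from a log-normal distribution purely to imitate a right-skewed shape; they are not the project's data.

```python
import numpy as np

# Synthetic right-skewed "budgets" (an assumption made for illustration only).
rng = np.random.default_rng(0)
budget = rng.lognormal(mean=3.0, sigma=0.8, size=500)

# 30 equal-width bins between the minimum and maximum budget.
counts, edges = np.histogram(budget, bins=30)

# Compare how many films fall in the lowest third of the bins versus the rest.
low_end = counts[:10].sum()
tail = counts[10:].sum()

print(f"films in lowest 10 bins: {low_end}, in remaining 20 bins: {tail}")
```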
BoxPlot
A box plot visually summarizes the distribution of a
dataset by displaying its minimum, first quartile (Q1),
median, third quartile (Q3), and maximum values. It
highlights the range, interquartile range (IQR), and
potential outliers. Box plots provide a clear view of data
spread and central tendency, making them useful for
comparing distributions across different groups.

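IQR = Q3 − Q1
Upper whisker limit = Q3 + 1.5 × IQR

Lower whisker limit = Q1 − 1.5 × IQR
Each whisker ends at the most extreme data point inside its limit; points beyond the limits are drawn as outliers.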
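A worked example of the box plot quantities on a small made-up sample is sketched below. The quartiles use NumPy's default linear interpolation, and plotting libraries may use a slightly different quartile convention.

```python
import numpy as np

# Made-up sample with one unusually large value.
values = np.array([10, 12, 14, 15, 16, 18, 20, 22, 60], dtype=float)

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Limits beyond which a point is treated as an outlier.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

inside = values[(values >= lower_limit) & (values <= upper_limit)]
outliers = values[(values < lower_limit) | (values > upper_limit)]

# Whiskers end at the most extreme points that are still inside the limits.
lower_whisker, upper_whisker = inside.min(), inside.max()

print(q1, median, q3, iqr, lower_limit, upper_limit, lower_whisker, upper_whisker, outliers)
```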


Budget Vs. Verdict

The code generates a box plot to compare movie
budgets across different verdicts. The `sns.boxplot()`
function plots `Budget` on the y-axis and `Verdict` on
the x-axis, with the `viridis` color palette. The plot
shows the distribution of budgets for each verdict
category.

BUSINESS FINDINGS: The plot reveals that the median budget generally increases with verdict category, with "All Time
Blockbuster" having the highest median budget. However, there's significant overlap, indicating budget
isn't the sole determinant of success.
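The medians that the box plot compares can be tabulated with a group-by. The sketch uses made-up budgets and only three verdict labels:

```python
import pandas as pd

# Made-up budgets for three verdict categories.
movie_data = pd.DataFrame({
    "Verdict": ["Flop", "Flop", "Hit", "Hit", "Blockbuster", "Blockbuster"],
    "Budget": [40.0, 60.0, 70.0, 90.0, 120.0, 200.0],
})

# Median budget per verdict: the center line of each box.
medians = movie_data.groupby("Verdict")["Budget"].median()

# Min/max per verdict give a rough sense of how much the groups overlap.
spread = movie_data.groupby("Verdict")["Budget"].agg(["min", "max"])

print(medians.sort_values())
print(spread)
```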
Correlation HeatMap
A correlation heatmap is a graphical representation
of the relationships between numerical variables in a
dataset. It displays the correlation coefficients,
ranging from -1 to 1, which indicate the strength and
direction of the linear relationship between pairs of
variables. Positive values suggest a direct
relationship, while negative values indicate an
inverse relationship. The heatmap uses color
gradients to visually convey these relationships,
making it easier to identify patterns and correlations.
Feature Correlations
The code calculates the correlation matrix for key
numerical features (Worldwide Gross, Indian Gross,
India Net, Overseas, and Budget) and renders it as a
heatmap. The `sns.heatmap()` function displays
correlation coefficients with color-coding, indicating
the strength and direction of relationships between
these features.

BUSINESS FINDINGS: The correlation heatmap shows
strong positive relationships between worldwide,
overseas, and India gross earnings. Budget also
correlates positively with these metrics. However,
India net earnings have weak correlations with other
variables.
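The matrix behind such a heatmap is a single `pandas` call. The sketch below uses made-up values for four of the features, with `Worldwide` constructed as the sum of `India Gross` and `Overseas` (an assumption about how the columns relate).

```python
import pandas as pd

# Made-up values; Worldwide = India Gross + Overseas in every row.
movie_data = pd.DataFrame({
    "Worldwide": [500.0, 120.0, 900.0, 300.0],
    "India Gross": [300.0, 85.0, 510.0, 200.0],
    "Overseas": [200.0, 35.0, 390.0, 100.0],
    "Budget": [100.0, 90.0, 150.0, 60.0],
})

# Pairwise Pearson correlation coefficients, each between -1 and 1.
corr = movie_data.corr()

print(corr.round(2))
# With seaborn installed: sns.heatmap(corr, annot=True, cmap="coolwarm")
```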
TOP MOVIES BY OVERSEAS &
INDIAN EARNINGS
The code first sorts the movie
dataset based on overseas
earnings to identify the top 10
highest-earning movies. It then
creates a side-by-side bar plot
where each movie’s Indian Gross
and Overseas Gross are displayed
using distinct colors. The
`bar_width` ensures bars for
different earnings are placed next
to each other, and `xticks` are
adjusted for clear movie labels.
BUSINESS FINDINGS: The chart
compares the Indian and
overseas gross earnings of the
top 10 movies. Overseas gross
significantly outperforms Indian
gross for most movies, with
"Avatar: The Way of Water" and
"Avengers: End Game" as notable
exceptions.
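A sketch of the sort-then-plot approach follows, using made-up data and the top 3 instead of the top 10. The `Movie` column name, colors, and figure size are assumptions; the off-screen `Agg` backend is used only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up earnings for four films.
movie_data = pd.DataFrame({
    "Movie": ["Film A", "Film B", "Film C", "Film D"],
    "India Gross": [300.0, 85.0, 510.0, 200.0],
    "Overseas": [200.0, 35.0, 390.0, 100.0],
})

top_n = 3  # the project uses 10
top = movie_data.nlargest(top_n, "Overseas")  # highest overseas earners first

x = np.arange(len(top))  # one slot per film
bar_width = 0.4

fig, ax = plt.subplots(figsize=(8, 4))
# Shift each series by half a bar width so the pair sits side by side.
ax.bar(x - bar_width / 2, top["India Gross"], width=bar_width, label="India Gross")
ax.bar(x + bar_width / 2, top["Overseas"], width=bar_width, label="Overseas")
ax.set_xticks(x)
ax.set_xticklabels(top["Movie"], rotation=45, ha="right")
ax.set_ylabel("Gross earnings")
ax.legend()
fig.tight_layout()
```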
Indian Net Vs. Indian Gross
The code plots two side-by-side
histograms for analyzing the
distribution of Indian Net and Indian
Gross earnings. The first subplot shows
the frequency distribution of Indian
Net earnings, while the second subplot
displays the distribution for Indian
Gross earnings. Each histogram
includes a Kernel Density Estimate
(KDE) to visualize the distribution more
smoothly. The `tight_layout()` function
ensures that the subplots are neatly
arranged without overlapping.
BUSINESS FINDINGS: The two histograms show the distribution of Indian Net Earnings and Indian Gross
Earnings. Both distributions are right-skewed, with a long tail of high earners. The Indian Gross Earnings
distribution has a higher peak and a wider range than the Indian Net Earnings distribution. This
suggests that there are more movies with high gross earnings than high net earnings.
Worldwide Gross Vs. Overseas
The code generates two histograms
side by side to compare the distribution
of Worldwide Gross and Overseas
earnings. The first subplot visualizes the
frequency distribution of Worldwide
Gross earnings, while the second
subplot displays the distribution of
Overseas earnings. Each histogram
includes a Kernel Density Estimate
(KDE) for a smoother representation of
the data distribution. The
`tight_layout()` function ensures proper
spacing between plots.
BUSINESS FINDINGS: Both distributions are right-skewed, with a long tail of high earners. The Worldwide
Gross Earnings distribution has a higher peak and a wider range than the Overseas Earnings
distribution. This suggests that there are more movies with high worldwide gross earnings than high
overseas earnings.
Worldwide Vs. Indian Gross

The code generates a pie chart comparing the total earnings from Worldwide Gross and Indian
Gross. It first calculates the total sum of Worldwide Gross and Indian Gross from the dataset.
Then, it creates a pie chart to visually represent the proportion of each type of earnings, with
percentage labels and distinct colors for clear differentiation. The `startangle=140` rotates the
chart for better readability.
BUSINESS FINDINGS: The pie chart illustrates
the split of the combined totals between
Worldwide Gross and Indian Gross. Worldwide
Gross accounts for 74.9% of the combined total,
while Indian Gross constitutes 25.1%. If Worldwide
Gross already includes Indian Gross, as is the usual
box office convention, the two slices overlap, and
the shares imply that Indian Gross is about one
third of Worldwide Gross (25.1 / 74.9 ≈ 0.34). On
that reading, the majority of earnings originate
from outside India.
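The difference between a pie slice and a share of worldwide earnings is easy to see in a small calculation. The values are made up, and the sketch assumes that `Worldwide` already includes `India Gross`.

```python
import pandas as pd

# Made-up totals; each Worldwide figure includes the film's India Gross.
movie_data = pd.DataFrame({
    "Worldwide": [500.0, 120.0, 900.0],
    "India Gross": [300.0, 85.0, 510.0],
})

totals = movie_data[["Worldwide", "India Gross"]].sum()

# What the pie chart shows: each total as a share of the two totals combined.
pie_shares = totals / totals.sum() * 100

# What the business question needs: India's share of worldwide earnings.
india_share_of_worldwide = totals["India Gross"] / totals["Worldwide"] * 100

print(pie_shares.round(1))
print(round(india_share_of_worldwide, 1))
```

The two numbers answer different questions, so the second is the safer basis for statements about where earnings originate.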
CUMULATIVE EARNINGS BY BUDGET RANGE

The code calculates and visualizes
the average cumulative earnings
across different budget ranges. It
first computes cumulative earnings
from Worldwide Gross, Indian Gross,
and Overseas earnings. Next, it bins
the budget into ranges and
calculates the average cumulative
earnings for each range. Finally, it
creates a scatter plot with a line
connecting the points to show how
average cumulative earnings vary
with budget ranges.
BUSINESS FINDINGS: The line
chart shows that the average
cumulative earnings increase
with budget range, but the
relationship is not linear. There
is a sharp increase in average
cumulative earnings between
budget ranges 4 and 5.
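The binning and averaging steps can be sketched with `pd.cut` and a group-by. The data is made up, three equal-width ranges are used instead of the project's ranges, and the `Cumulative` and `Budget Range` column names are assumptions.

```python
import pandas as pd

# Made-up budgets and earnings for six films.
movie_data = pd.DataFrame({
    "Budget": [10.0, 20.0, 45.0, 55.0, 80.0, 100.0],
    "Worldwide": [30.0, 50.0, 90.0, 110.0, 300.0, 500.0],
    "India Gross": [20.0, 35.0, 60.0, 70.0, 180.0, 260.0],
    "Overseas": [10.0, 15.0, 30.0, 40.0, 120.0, 240.0],
})

# Cumulative earnings as described above: the sum of the three earnings columns.
earnings_cols = ["Worldwide", "India Gross", "Overseas"]
movie_data["Cumulative"] = movie_data[earnings_cols].sum(axis=1)

# Split the budget into three equal-width ranges labeled 1 to 3.
movie_data["Budget Range"] = pd.cut(movie_data["Budget"], bins=3, labels=[1, 2, 3])

# Average cumulative earnings within each budget range.
avg_by_range = movie_data.groupby("Budget Range", observed=True)["Cumulative"].mean()

print(avg_by_range)
```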
GROUP DETAILS

1. Harshada Wandhare A020
2. Manaswini Voolapalli A028
3. Muskan Khandelwal A032
4. Satwik Jain A055
5. Shivjeet Patil A060
6. Shubham Yadav A062
