Python
PYTHON PROJECT
Group 8, MBA-BA (A)
1. Harshada Wandhare A020
2. Manaswini Voolapalli A028
3. Muskan Khandelwal A032
4. Satwik Jain A055
5. Shivjeet Patil A060
6. Shubham Yadav A062
The code creates a figure with two subplots. The first subplot uses `sns.regplot` to show Budget versus India
Gross, and the second shows Budget versus Worldwide Gross, each with a regression line to illustrate the
relationship between the variables.
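A minimal sketch of the two-panel figure described above. It assumes a pandas DataFrame with columns named `Budget`, `India Gross` and `Worldwide Gross` (the project's actual column headers may differ), and the small DataFrame is made-up placeholder data, not the project's dataset.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend: renders without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_budget_vs_gross(df: pd.DataFrame) -> plt.Figure:
    """Side-by-side regression plots of Budget vs India Gross and Worldwide Gross."""
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    # regplot draws the scatter of points plus a fitted linear regression line
    # (with a shaded confidence band) on the given Axes
    sns.regplot(data=df, x="Budget", y="India Gross", ax=axes[0])
    axes[0].set_title("Budget vs India Gross")
    sns.regplot(data=df, x="Budget", y="Worldwide Gross", ax=axes[1])
    axes[1].set_title("Budget vs Worldwide Gross")
    fig.tight_layout()
    return fig


# Hypothetical placeholder figures, NOT the project's dataset
sample = pd.DataFrame(
    {
        "Budget": [20, 35, 50, 80, 120, 150],
        "India Gross": [25, 40, 70, 110, 160, 210],
        "Worldwide Gross": [35, 60, 95, 160, 250, 330],
    }
)
fig = plot_budget_vs_gross(sample)
```

Drawing both panels on a shared `plt.subplots(1, 2)` figure keeps the two budget relationships side by side, so the difference in slope and spread is easy to compare.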
BUSINESS FINDINGS: Both plots show a positive correlation, indicating that higher-budget films tend to
earn more both in India and worldwide. The worldwide gross plot, however, shows a stronger correlation
and a wider range of earnings.
INDIAN GROSS VS. OVERSEAS GROSS
BUSINESS FINDINGS: The median Overseas Gross is significantly higher than the median India Gross, and Overseas
Gross also has a wider range and more outliers. This suggests that overseas markets contribute substantially to
overall gross earnings, with a few movies generating exceptionally high revenue.
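The comparison above reads like a pair of box plots. A minimal sketch of how such a chart and the two medians could be produced, assuming columns named `India Gross` and `Overseas Gross` (hypothetical headers) and using made-up placeholder values:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend: renders without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def compare_markets(df: pd.DataFrame):
    """Box plots of India Gross vs Overseas Gross, plus the median of each."""
    # Reshape to long format: one row per (film, market) pair
    long_df = df.melt(
        value_vars=["India Gross", "Overseas Gross"],
        var_name="Market",
        value_name="Gross",
    )
    fig, ax = plt.subplots(figsize=(7, 5))
    # Each box spans Q1..Q3 with a line at the median; points beyond the
    # whiskers are drawn individually as outliers
    sns.boxplot(data=long_df, x="Market", y="Gross", ax=ax)
    ax.set_title("India Gross vs Overseas Gross")
    medians = long_df.groupby("Market")["Gross"].median()
    return fig, medians


# Hypothetical placeholder figures, NOT the project's dataset
sample = pd.DataFrame(
    {
        "India Gross": [10, 20, 30, 40, 50],
        "Overseas Gross": [15, 30, 45, 60, 300],
    }
)
fig, medians = compare_markets(sample)
print(medians)
```

Melting to long format lets a single `sns.boxplot` call place both markets on one shared axis, which is what makes the difference in median and spread visible at a glance.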
DISTRIBUTION OF MOVIE VERDICTS
BUSINESS FINDINGS: "Hit" and "Flop" are the most common verdict categories, accounting for nearly 40% of all
films. A smaller share of films achieves "SuperHit" or "All Time Blockbuster" status, indicating a highly
competitive industry with varying levels of success.
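The percentages behind a verdict-distribution chart can be computed with pandas alone. A minimal sketch, assuming the verdicts are held in a pandas Series of category labels; the labels below are made-up placeholders, not the project's data:

```python
import pandas as pd


def verdict_shares(verdicts: pd.Series) -> pd.Series:
    """Percentage of films in each verdict category, most common first."""
    # value_counts(normalize=True) returns proportions that sum to 1
    return (verdicts.value_counts(normalize=True) * 100).round(1)


# Hypothetical verdict labels, NOT the project's dataset
sample = pd.Series(
    ["Hit", "Flop", "Flop", "Hit", "Average", "SuperHit",
     "Flop", "All Time Blockbuster", "Average", "Hit"]
)
shares = verdict_shares(sample)
print(shares)
```

The resulting Series can be passed straight to a bar or pie chart, since its index holds the category names and its values the percentages.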
Budget Ranges & Frequencies
The code creates a histogram to visualize the
distribution of movie budgets. The
`sns.histplot()` function plots the budget values
with 30 bins and includes a kernel density
estimate (KDE) to show the data's distribution.
The plot displays the frequency of budget
ranges, aiding in understanding the spread
and concentration of movie budgets.
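IQR = Q3 − Q1
Upper whisker limit = Q3 + 1.5 × IQR (the whisker extends to the largest data point within this limit; points beyond it are plotted as outliers)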
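The IQR rule above can be worked through numerically. A minimal sketch using NumPy, assuming the default linear-interpolation percentiles; the input list is a made-up set of budgets for a single verdict category:

```python
import numpy as np


def box_plot_limits(values, whis=1.5):
    """Quartiles, IQR, fences, whisker ends and outliers for one group of values."""
    data = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])  # linear interpolation by default
    iqr = q3 - q1
    upper_fence = q3 + whis * iqr
    lower_fence = q1 - whis * iqr
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "upper_fence": upper_fence,
        # Whiskers stop at the most extreme observations still inside the fences
        "upper_whisker": data[data <= upper_fence].max(),
        "lower_whisker": data[data >= lower_fence].min(),
        "outliers": data[(data > upper_fence) | (data < lower_fence)],
    }


# Hypothetical budgets for one verdict category
limits = box_plot_limits([10, 12, 14, 15, 18, 20, 22, 25, 90])
print(limits["iqr"], limits["upper_fence"], limits["upper_whisker"])  # 8.0 34.0 25.0
```

Here Q1 = 14 and Q3 = 22, so IQR = 8 and the upper limit is 22 + 1.5 × 8 = 34. The whisker therefore ends at 25, the largest value not above 34, and 90 is flagged as an outlier.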
BUSINESS FINDINGS: The box plot reveals that budget generally increases with verdict category, with "All Time
Blockbuster" films having the highest median budget. However, the boxes overlap significantly, indicating that budget
isn't the sole determinant of success.
Correlation Heatmap
A correlation heatmap is a graphical representation
of the relationships between numerical variables in a
dataset. It displays the correlation coefficients,
ranging from -1 to 1, which indicate the strength and
direction of the linear relationship between pairs of
variables. Positive values suggest a direct
relationship, while negative values indicate an
inverse relationship. The heatmap uses color
gradients to visually convey these relationships,
making it easier to identify patterns and correlations.
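The coefficients shown in such a heatmap are Pearson correlations (the default for pandas `DataFrame.corr()`). A minimal from-scratch sketch of that coefficient, included only to show what each cell measures:

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation: covariance of x and y scaled by both spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect direct relationship)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect inverse relationship)
```

Because the deviations are divided by their own spreads, the result is always between -1 and 1 regardless of the units, which is why budgets and grosses of very different magnitudes can share one colour scale.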
Feature Correlations
The code calculates the correlation matrix for key
numerical features (Worldwide Gross, Indian Gross,
India Net, Overseas, and Budget) and displays it as a
heatmap. The `sns.heatmap()` function displays
correlation coefficients with color-coding, indicating
the strength and direction of relationships between
these features.
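A minimal sketch of the correlation heatmap described above. The column names follow the list in the text but are assumptions about the dataset's exact headers; the colour map and `fmt` choices are illustrative, and the sample values are made-up placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend: renders without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Column names as listed in the text; the real dataset's headers may differ
FEATURES = ["Worldwide Gross", "Indian Gross", "India Net", "Overseas", "Budget"]


def plot_correlation_heatmap(df: pd.DataFrame):
    """Pearson correlation matrix of FEATURES, drawn as an annotated heatmap."""
    corr = df[FEATURES].corr()  # DataFrame.corr() uses Pearson by default
    fig, ax = plt.subplots(figsize=(7, 6))
    # annot=True writes each coefficient in its cell; vmin/vmax pin the
    # colour scale to the full -1..1 range so colours are comparable
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Feature Correlations")
    return corr, fig


# Hypothetical placeholder figures, NOT the project's dataset
indian = pd.Series([30, 50, 95, 100, 160])
overseas = pd.Series([10, 25, 20, 60, 70])
sample = pd.DataFrame(
    {
        "Worldwide Gross": indian + overseas,
        "Indian Gross": indian,
        "India Net": indian * 0.8,
        "Overseas": overseas,
        "Budget": [20, 40, 60, 80, 100],
    }
)
corr, fig = plot_correlation_heatmap(sample)
```

Fixing `vmin=-1` and `vmax=1` keeps zero at the centre of the diverging colour map, so strong positive and strong negative relationships stand out equally.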
The code generates a pie chart comparing the total earnings from Worldwide Gross and Indian
Gross. It first sums the Worldwide Gross and Indian Gross columns across the dataset, then draws
a pie chart showing each total's share, with percentage labels and distinct colors for clear
differentiation. The `startangle=140` argument rotates the chart's starting angle so the slices
and labels are easier to read.
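A minimal sketch of the pie chart described above, using plain matplotlib. The column names, colours and sample values are assumptions and placeholders, not the project's actual code or data.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend: renders without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_earnings_pie(df: pd.DataFrame):
    """Pie chart of total Worldwide Gross vs total Indian Gross."""
    totals = [df["Worldwide Gross"].sum(), df["Indian Gross"].sum()]
    fig, ax = plt.subplots(figsize=(6, 6))
    # autopct labels each slice with its percentage of the combined total;
    # startangle=140 rotates where the first slice begins (degrees, counter-clockwise)
    _, _, pct_labels = ax.pie(
        totals,
        labels=["Worldwide Gross", "Indian Gross"],
        autopct="%1.1f%%",
        colors=["#1f77b4", "#ff7f0e"],
        startangle=140,
    )
    ax.axis("equal")  # keep the pie circular
    return totals, [t.get_text() for t in pct_labels]


# Hypothetical placeholder figures, NOT the project's dataset
sample = pd.DataFrame({"Worldwide Gross": [120, 180], "Indian Gross": [40, 60]})
totals, percentages = plot_earnings_pie(sample)
print(percentages)  # ['75.0%', '25.0%']
```

Note that `autopct` percentages are shares of the sum of the slices passed in, so each label reflects the two totals relative to each other.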
BUSINESS FINDINGS: The pie chart illustrates
the distribution of total earnings between
Worldwide Gross and Indian Gross. Worldwide
Gross accounts for a significantly larger
portion (74.9%) than Indian Gross (25.1%).
Since Worldwide Gross normally includes
Indian Gross, the two slices overlap rather
than split total earnings; the ratio implies
that Indian Gross is about one third
(25.1 / 74.9 ≈ 33.5%) of Worldwide Gross,
so the majority of earnings originate
outside India.
CUMULATIVE EARNINGS BY BUDGET RANGE