Bollywood and Heart Data Analysis

The document presents a data analysis of Bollywood movies and heart disease data using Python libraries such as pandas and seaborn. It includes various analyses such as movie counts by genre and month, return on investment (ROI) calculations, and correlations between different variables like budget and box office collections. Additionally, it explores heart disease data, providing insights into the structure and content of the datasets.

Uploaded by

karmaarules


9/20/22, 8:59 PM Bollywood and Heart Data Analysis

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(color_codes=True)

import warnings
warnings.filterwarnings('ignore')
warnings.warn('DelftStack')
warnings.warn('Do not show this message')
print("No Warning Shown")

No Warning Shown

In [2]:
BW = pd.read_csv('bollywood.csv')
BW.head()

Out[2]:
   SlNo  Release Date  MovieName           ReleaseTime  Genre     Budget  BoxOfficeCollection  YoutubeViews  You
0  1     18-Apr-14     2 States            LW           Romance   36      104.00               8576361
1  2     4-Jan-13      Table No. 21        N            Thriller  10      12.00                1087320
2  3     18-Jul-14     Amit Sahni Ki List  N            Comedy    10      4.00                 572336
3  4     4-Jan-13      Rajdhani Express    N            Drama     7       0.35                 42626
4  5     4-Jul-14      Bobby Jasoos        N            Comedy    18      10.80                3113427

In [3]:
print(BW.shape)
BW.info()

(149, 10)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 149 entries, 0 to 148
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 SlNo 149 non-null int64
1 Release Date 149 non-null object
2 MovieName 149 non-null object
3 ReleaseTime 149 non-null object
4 Genre 149 non-null object
5 Budget 149 non-null int64
6 BoxOfficeCollection 149 non-null float64
7 YoutubeViews 149 non-null int64
8 YoutubeLikes 149 non-null int64
9 YoutubeDislikes 149 non-null int64

localhost:8888/nbconvert/html/Bollywood and Heart Data Analysis.ipynb?download=false 1/15

dtypes: float64(1), int64(5), object(4)
memory usage: 11.8+ KB

In [4]:
Movies_by_genre = BW.groupby('Genre')['MovieName'].count().reset_index(name="MovieName_count")
print(Movies_by_genre.sort_values('MovieName_count',ascending = False))
sns.set_context("paper", font_scale= 1.5)
plt.title("MovieName_count vs Genre")
sns.barplot(x='Genre',y ='MovieName_count', data = Movies_by_genre)
plt.xticks(rotation= 80)
plt.show()
Movies_by_genre['MovieName_count'].max()

Genre MovieName_count
3 Comedy 36
0 Drama 35
5 Thriller 26
4 Romance 25
1 Action 21
2 Action 3
6 Thriller 3

Out[4]: 36
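
The repeated Action and Thriller rows in the output above, together with the padded label ' Drama ' used in cell In [11], suggest that some Genre values differ only by surrounding whitespace. A minimal sketch of normalizing the labels before counting is shown here; the rows and padded labels in `sample` are assumptions made up for illustration, not values read from bollywood.csv.

```python
import pandas as pd

# Hypothetical rows; the padded labels mimic the suspected problem.
sample = pd.DataFrame({
    "MovieName": ["A", "B", "C", "D", "E"],
    "Genre": ["Action", "Action ", " Drama ", "Thriller", "Thriller "],
})

# Strip leading/trailing whitespace so label variants collapse into one.
sample["Genre"] = sample["Genre"].str.strip()

genre_counts = sample["Genre"].value_counts()
print(genre_counts)
```

If the real column behaves the same way, applying `str.strip()` to `BW['Genre']` before the groupby would likely merge the duplicate rows, and later filters could then compare against 'Drama' without padding.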

In [5]:
cross_tab = pd.crosstab(BW.Genre, BW.ReleaseTime)
cross_tab

Out[5]:
ReleaseTime  FS  HS  LW   N
Genre
Drama         4   6   1  24
Action        3   3   3  12
Action        0   0   0   3
Comedy        3   5   5  23
Romance       3   3   4  15
Thriller      4   1   1  20
Thriller      0   0   1   2

In [6]:
BW['Month'] = pd.DatetimeIndex(BW['Release Date']).month
BW.head(2)
Movies_by_month = BW.groupby('Month')['MovieName'].count().reset_index(name="Movie_count")
print(Movies_by_month.sort_values('Movie_count',ascending = False))
sns.set_context("paper", font_scale= 1.5)
plt.title("Movie Count vs Month")
sns.barplot(x='Month',y ='Movie_count', data = Movies_by_month)
plt.show()
Movies_by_month['Movie_count'].max()

Month Movie_count
0 1 20
2 3 19
4 5 18
1 2 16
6 7 16
3 4 11
5 6 10
8 9 10
10 11 10
9 10 9
7 8 8
11 12 2

Out[6]: 20
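
Cell In [6] lets pandas infer the date format of the Release Date column. As a hedged alternative, the sketch below parses the same kind of strings with an explicit format, which makes the day-month-year order unambiguous. The date strings are the five values visible in Out[2]; the variable names are illustrative.

```python
import pandas as pd

# Date strings copied from the Release Date column shown in Out[2].
release_dates = pd.Series(["18-Apr-14", "4-Jan-13", "18-Jul-14", "4-Jan-13", "4-Jul-14"])

# %d = day, %b = abbreviated month name, %y = two-digit year.
parsed = pd.to_datetime(release_dates, format="%d-%b-%y")

months = parsed.dt.month.tolist()
print(months)
```

Note that `%b` relies on English month abbreviations, so this assumes an English locale.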

In [7]:
High_budget = BW[(BW['Budget'] > 25)]
HighBudgetMovies_by_month = High_budget.groupby('Month')['MovieName'].count().reset_index(name="Movie_count")
print(HighBudgetMovies_by_month.sort_values('Movie_count',ascending = False))
sns.set_context("paper", font_scale= 1.5)
plt.title("Movie Count vs Month")
sns.barplot(x='Month',y ='Movie_count', data = HighBudgetMovies_by_month)

plt.show()
HighBudgetMovies_by_month['Movie_count'].max()

Month Movie_count
1 2 9
7 8 7
0 1 6
2 3 6
6 7 6
10 11 6
5 6 5
3 4 4
8 9 4
9 10 4
4 5 3
11 12 2

Out[7]: 9

In [8]:
BW['ROI'] = (BW['BoxOfficeCollection']-BW['Budget'])/BW['Budget']
Top10_ROI = BW.sort_values('ROI',ascending = False)
Top10 = Top10_ROI[['MovieName','ROI','ReleaseTime']].head(10)
Top10

Out[8]:
     MovieName                  ROI       ReleaseTime
64   Aashiqui 2                 8.166667  N
89   PK                         7.647059  HS
132  Grand Masti                7.514286  LW
135  The Lunchbox               7.500000  N
87   Fukrey                     6.240000  N
58   Mary Kom                   5.933333  N
128  Shahid                     5.666667  FS
37   Humpty Sharma Ki Dulhania  5.500000  N
101  Bhaag Milkha Bhaag         4.466667  N
115  Chennai Express            4.266667  FS
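
To make the ROI formula from cell In [8] concrete, here is a small worked sketch using only the five rows visible in Out[2]. An ROI of 1.0 means the collection was twice the budget, and a negative ROI means the film earned less than it cost. `DataFrame.nlargest` is used here as a compact alternative to sorting and taking the head; it is a swapped-in convenience, not what the notebook itself does.

```python
import pandas as pd

# Budget and collection values copied from the five rows shown in Out[2].
movies = pd.DataFrame({
    "MovieName": ["2 States", "Table No. 21", "Amit Sahni Ki List",
                  "Rajdhani Express", "Bobby Jasoos"],
    "Budget": [36, 10, 10, 7, 18],
    "BoxOfficeCollection": [104.00, 12.00, 4.00, 0.35, 10.80],
})

# ROI = (collection - budget) / budget, as in cell In [8].
movies["ROI"] = (movies["BoxOfficeCollection"] - movies["Budget"]) / movies["Budget"]

# nlargest sorts by ROI and keeps the top rows in one step.
top2 = movies.nlargest(2, "ROI")[["MovieName", "ROI"]]
print(top2)
```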

In [9]:
cross_tab_ROI = pd.crosstab(Top10.ROI,Top10.ReleaseTime)
print(cross_tab_ROI)
Avg_ROI = Top10.groupby('ReleaseTime')['ROI'].mean()
Avg_ROI

ReleaseTime FS HS LW N
ROI
4.266667 1 0 0 0
4.466667 0 0 0 1
5.500000 0 0 0 1
5.666667 1 0 0 0
5.933333 0 0 0 1
6.240000 0 0 0 1
7.500000 0 0 0 1
7.514286 0 0 1 0
7.647059 0 1 0 0
8.166667 0 0 0 1
Out[9]: ReleaseTime
FS 4.966667
HS 7.647059
LW 7.514286
N 6.301111
Name: ROI, dtype: float64
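
Because ROI is continuous, the crosstab above produces one row per movie rather than a summary. A possibly more readable view, sketched here under the assumption that the goal is to compare release times, is to report the count and mean ROI per ReleaseTime in one table. The values are copied from the Top10 table in Out[8].

```python
import pandas as pd

# ROI and ReleaseTime values copied from the Top10 table in Out[8].
top10 = pd.DataFrame({
    "ROI": [8.166667, 7.647059, 7.514286, 7.500000, 6.240000,
            5.933333, 5.666667, 5.500000, 4.466667, 4.266667],
    "ReleaseTime": ["N", "HS", "LW", "N", "N", "N", "FS", "N", "N", "FS"],
})

# Number of top-10 movies and their average ROI for each release time.
summary = top10.groupby("ReleaseTime")["ROI"].agg(["count", "mean"])
print(summary)
```

The counts matter when reading the means: HS and LW each rest on a single movie, so their averages are just that movie's ROI.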

In [27]:
sns.set_context("paper", font_scale= 1.5)
plt.title("Histogram+Density Plot(Budget)")
sns.distplot(BW['Budget'], hist = True, color ='r')

### Most of the movies are in the range of 2-50 crores
### There are few movies above 50 crores
### The distribution looks roughly normal with slight right skewness

Out[27]: <AxesSubplot:title={'center':'Histogram+Density Plot(Budget)'}, xlabel='Budget', ylabel='Density'>
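
The comments in this cell read the range and skewness off the plot. Both can also be checked numerically; the sketch below shows the calls on the five Budget values visible in Out[2]. That is far too few rows to say anything about the full dataset, so treat it only as an illustration of `Series.skew` and `Series.between`.

```python
import pandas as pd

# Budget values (in crores) from the five rows shown in Out[2].
budget = pd.Series([36, 10, 10, 7, 18], name="Budget")

# Sample skewness: a value above 0 indicates a longer right tail.
skewness = budget.skew()

# Fraction of movies whose budget lies in the 2-50 crore range (inclusive).
share_2_to_50 = budget.between(2, 50).mean()

print(skewness, share_2_to_50)
```

On the full data the same two lines would run against `BW['Budget']`.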

In [11]:
Comedy_ROI = BW[(BW['Genre'] == 'Comedy')]
Drama_ROI = BW[(BW['Genre'] == ' Drama ')]
Drama_ROI.head(2)

plt.figure(figsize=(12,10))
sns.distplot(Drama_ROI['ROI'], hist = True, color = 'r', label = 'Drama')
sns.distplot(Comedy_ROI['ROI'], hist = True, color = 'b', label = 'Comedy')
plt.title('Drama vs Comedy', fontsize = 16)
plt.xlabel('ROI', fontsize = 14)
plt.ylabel('Density', fontsize = 14)
plt.legend(loc = 'upper left', fontsize = 13)
plt.show()

In [12]:
sns.set_context("paper", font_scale= 1.5)
sns.lmplot(y="YoutubeLikes", x="BoxOfficeCollection", data=BW)
### There is a positive correlation between BoxOfficeCollection and YoutubeLikes (r = 0.68 in Out[14])

Out[12]: <seaborn.axisgrid.FacetGrid at 0x1ef2dfdb970>

In [13]:
### Box Plots ###
plt.figure(figsize=(10,8))
sns.set_context("paper", font_scale= 1.5)
sns.boxplot(x="Genre", y="YoutubeLikes", data= BW, palette="Set3")
plt.xticks(rotation= 80)
plt.show()

### Action movies tend to have more YoutubeLikes than the other genres

In [14]:
plt.figure(figsize=(10,8))
Numerical_Variables = BW[['Budget','BoxOfficeCollection','YoutubeViews','YoutubeLikes','YoutubeDislikes','ROI']]
sns.set_context("paper", font_scale= 1.5)
sns.heatmap(Numerical_Variables.corr(), cmap= 'YlGnBu', annot=True)
plt.show()
Numerical_Variables.corr().T

### Yes There is a Positive high Correlation among Budget, BoxOfficeCollection, Youtube

Out[14]:
                     Budget    BoxOfficeCollection  YoutubeViews  YoutubeLikes  YoutubeDislikes  ROI
Budget               1.000000  0.650401             0.589038      0.608916      0.665343         0.072050
BoxOfficeCollection  0.650401  1.000000             0.588632      0.682517      0.623941         0.585042
YoutubeViews         0.589038  0.588632             1.000000      0.884055      0.846739         0.252847
YoutubeLikes         0.608916  0.682517             0.884055      1.000000      0.859730         0.291302
YoutubeDislikes      0.665343  0.623941             0.846739      0.859730      1.000000         0.201533
ROI                  0.072050  0.585042             0.252847      0.291302      0.201533         1.000000
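
A heatmap is easy to scan but awkward to rank. As a rough sketch, the strongest and weakest pairs can be listed directly by keeping only the upper triangle of the matrix and sorting. The coefficients below are copied from the Out[14] table, with the ROI column taken from the ROI row since the matrix is symmetric; the helper names `upper` and `pairs` are illustrative.

```python
import numpy as np
import pandas as pd

cols = ["Budget", "BoxOfficeCollection", "YoutubeViews",
        "YoutubeLikes", "YoutubeDislikes", "ROI"]

# Coefficients copied from the Out[14] table.
corr = pd.DataFrame([
    [1.000000, 0.650401, 0.589038, 0.608916, 0.665343, 0.072050],
    [0.650401, 1.000000, 0.588632, 0.682517, 0.623941, 0.585042],
    [0.589038, 0.588632, 1.000000, 0.884055, 0.846739, 0.252847],
    [0.608916, 0.682517, 0.884055, 1.000000, 0.859730, 0.291302],
    [0.665343, 0.623941, 0.846739, 0.859730, 1.000000, 0.201533],
    [0.072050, 0.585042, 0.252847, 0.291302, 0.201533, 1.000000],
], index=cols, columns=cols)

# Keep each pair once: strictly above the diagonal, everything else NaN.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna().sort_values(ascending=False)

print(pairs.head(3))  # strongest pairs
print(pairs.tail(1))  # weakest pair
```

Read this way, the three YouTube metrics are most strongly related to each other, while Budget and ROI are almost uncorrelated.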

In [15]:
Heart = pd.read_csv('SAheart.csv')
Heart.head()

Out[15]:
   sbp  tobacco  ldl   adiposity  famhist  typea  obesity  alcohol  age  chd
0  160  12.00    5.73  23.11      Present  49     25.30    97.20    52   Si
1  144  0.01     4.41  28.61      Absent   55     28.87    2.06     63   Si
2  118  0.08     3.48  32.28      Present  52     29.14    3.81     46   No
3  170  7.50     6.41  38.03      Present  51     31.99    24.26    58   Si
4  134  13.60    3.50  27.78      Present  60     25.99    57.34    49   Si

In [16]:
print(Heart.shape)
Heart.info()

(462, 10)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 462 entries, 0 to 461
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sbp 462 non-null int64
1 tobacco 462 non-null float64
2 ldl 462 non-null float64
3 adiposity 462 non-null float64
4 famhist 462 non-null object
5 typea 462 non-null int64
6 obesity 462 non-null float64
7 alcohol 462 non-null float64
8 age 462 non-null int64
9 chd 462 non-null object
dtypes: float64(5), int64(3), object(2)
memory usage: 36.2+ KB

In [17]:
Group_data = Heart.groupby('chd')['famhist'].count().reset_index(name="famhist_count")
sns.set_context("paper", font_scale= 1.5)
plt.title('famhist_count vs chd')
sns.barplot(x = 'chd',y='famhist_count',data = Group_data)
plt.show()
Group_data.head()

Out[17]:
   chd  famhist_count
0  No   302
1  Si   160
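
Since famhist has no missing values, counting it per chd group simply gives the size of each chd class. A short sketch of the same figures as counts and proportions follows; the Series is rebuilt from the two counts reported in Out[17] rather than read from SAheart.csv.

```python
import pandas as pd

# Rebuild the chd column from the counts in Out[17]: 302 "No" and 160 "Si".
chd = pd.Series(["No"] * 302 + ["Si"] * 160, name="chd")

counts = chd.value_counts()
shares = chd.value_counts(normalize=True).round(3)

print(counts)
print(shares)
```

So roughly one third of the 462 patients are labelled "Si". To relate chd to family history itself, something like `pd.crosstab(Heart.famhist, Heart.chd)` would be the natural next step.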

In [18]:
sns.set_context("paper", font_scale= 1.5)
sns.lmplot(y="age", x="sbp", data= Heart)
# There is a moderate positive correlation between age and sbp (r = 0.39 in Out[20])

Out[18]: <seaborn.axisgrid.FacetGrid at 0x1ef2e298c10>

In [19]:
yes_chd = Heart[(Heart['chd'] == 'Si')]
No_chd = Heart[(Heart['chd'] == 'No')]
No_chd.head(2)

plt.figure(figsize=(12,10))
sns.distplot(yes_chd['tobacco'], hist = True, color = 'r', label = 'yes_chd')
sns.distplot(No_chd['tobacco'], hist = True, color = 'b', label = 'No_chd')
plt.title('yes_chd vs No_chd', fontsize = 16)
plt.xlabel('Tobacco', fontsize = 14)
plt.ylabel('Density', fontsize = 14)
plt.legend(loc = 'upper left', fontsize = 13)
plt.show()

### Distribution show that those who consume tobacco there are higher chances of gettin

In [20]:
plt.figure(figsize=(10,8))
Numerical_Variables1 = Heart[['sbp','obesity','age','ldl']]
sns.set_context("paper", font_scale= 1.5)
sns.heatmap(Numerical_Variables1.corr(), cmap= 'YlGnBu', annot=True)
plt.show()
Numerical_Variables1.corr().T

Out[20]:
         sbp       obesity   age       ldl
sbp      1.000000  0.238067  0.388771  0.158296
obesity  0.238067  1.000000  0.291777  0.330506
age      0.388771  0.291777  1.000000  0.311799
ldl      0.158296  0.330506  0.311799  1.000000

In [21]:
# here we define the thresholds for our age groups
age_groups = [0,15,35,55,64]

# and for convenience we give each of them a handy label

age_group_names = ['Young','adults','mid','old']

Heart['Age_group'] = pd.cut(Heart['age'], bins = age_groups, labels = age_group_names)

Heart.head(5)

Out[21]:
   sbp  tobacco  ldl   adiposity  famhist  typea  obesity  alcohol  age  chd  Age_group
0  160  12.00    5.73  23.11      Present  49     25.30    97.20    52   Si   mid
1  144  0.01     4.41  28.61      Absent   55     28.87    2.06     63   Si   old
2  118  0.08     3.48  32.28      Present  52     29.14    3.81     46   No   mid
3  170  7.50     6.41  38.03      Present  51     31.99    24.26    58   Si   old
4  134  13.60    3.50  27.78      Present  60     25.99    57.34    49   Si   mid
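
By default `pd.cut` builds right-inclusive intervals, so the bins above are (0, 15], (15, 35], (35, 55] and (55, 64]. The sketch below reproduces the labels for the five ages shown in Out[21] and then checks two boundary cases; the ages 55 and 65 are hypothetical inputs chosen to show edge behaviour.

```python
import pandas as pd

age_groups = [0, 15, 35, 55, 64]
age_group_names = ['Young', 'adults', 'mid', 'old']

# Ages from the five rows shown in Out[21].
ages = pd.Series([52, 63, 46, 58, 49])
labels = pd.cut(ages, bins=age_groups, labels=age_group_names).tolist()
print(labels)

# Hypothetical edge cases: 55 sits on a right edge, 65 is above the last edge.
edges = pd.cut(pd.Series([55, 65]), bins=age_groups, labels=age_group_names).tolist()
print(edges)
```

An age of exactly 55 lands in 'mid', and any age above 64 becomes NaN, so the last edge would need raising if the data ever contained older patients.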

In [22]:
chd_cases = Heart[(Heart['chd'] == 'Si')]
Group_data1 = chd_cases.groupby('Age_group')['chd'].count().reset_index(name="chd_count")
sns.set_context("paper", font_scale= 1.5)
plt.title('chd_count vs Age_group')
sns.barplot(x = 'Age_group',y='chd_count',data = Group_data1)
plt.show()
Group_data1.head(4)

Out[22]:
   Age_group  chd_count
0  Young      0
1  adults     18
2  mid        81
3  old        61
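
Raw case counts depend on how many patients fall into each age group, so a larger group can show more cases even with a lower risk. One hedged way to account for that is to report the share of patients with chd in each group. The notebook does not print group sizes, so the sketch below uses a small made-up frame; every row in `patients` is hypothetical.

```python
import pandas as pd

# Hypothetical mini-dataset, for illustration only.
patients = pd.DataFrame({
    "Age_group": ["adults", "adults", "adults", "adults",
                  "mid", "mid", "mid", "old", "old"],
    "chd": ["No", "No", "No", "Si", "No", "Si", "Si", "Si", "Si"],
})

# Flag chd cases, then aggregate cases, group size and case rate per group.
rate = (
    patients.assign(has_chd=patients["chd"].eq("Si"))
            .groupby("Age_group")["has_chd"]
            .agg(cases="sum", total="count", rate="mean")
)
print(rate)
```

Applied to the `Heart` frame, the same aggregation would show whether the 'old' group, with fewer cases than 'mid', nonetheless has a higher case rate.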

In [23]:
sns.set_context("paper", font_scale= 1.5)
plt.figure(figsize=(10,8))
sns.boxplot(x="Age_group", y="ldl", data= Heart, palette="Set3")
plt.xticks(rotation= 80)
plt.show()
