Marvel Vs DC
In [5]: data.head()
Out[5]:
Company Film Release Adjusted Worldwide Domestic Foreign
In [6]: data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37 entries, 0 to 36
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Company 37 non-null object
1 Film 37 non-null object
2 Release 37 non-null int64
3 Adjusted 37 non-null float64
4 Worldwide 37 non-null float64
5 Domestic 37 non-null float64
6 Foreign 37 non-null float64
dtypes: float64(4), int64(1), object(2)
memory usage: 2.1+ KB
In [9]: data2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Company 26 non-null object
1 Film 26 non-null object
2 Release 26 non-null int64
3 Adjusted 26 non-null float64
4 Worldwide 26 non-null float64
5 Domestic 26 non-null float64
6 Foreign 26 non-null float64
dtypes: float64(4), int64(1), object(2)
memory usage: 1.5+ KB
In [10]: data1.describe()
Out[10]:
Release Adjusted Worldwide Domestic Foreign
In [11]: # Print the movie with the minimum worldwide gross, and its release year
In [12]: d = data1['Worldwide']
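The lookup described in the comment above can be sketched as follows. It assumes `data1` has the `Film`, `Release` and `Worldwide` columns listed by `info()`. `Series.idxmin()` returns the index label of the smallest value, and `.loc` then fetches that row. The three-row frame uses made-up placeholder films and numbers, not values from the dataset.

```python
import pandas as pd

# Placeholder rows (hypothetical values) standing in for data1
data1 = pd.DataFrame({
    "Film": ["Film A", "Film B", "Film C"],
    "Release": [2008, 2012, 2015],
    "Worldwide": [100.0, 250.0, 50.0],
})

# idxmin() gives the index label of the smallest worldwide gross;
# .loc uses that label to pull the film name and release year
lowest = data1.loc[data1["Worldwide"].idxmin(), ["Film", "Release", "Worldwide"]]
print(f"{lowest['Film']} ({lowest['Release']}): {lowest['Worldwide']}")
```

If several films tie for the minimum, `idxmin()` returns only the first one. `data1.nsmallest(1, "Worldwide", keep="all")` would keep all of them.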
In [17]: data1.describe()
Out[17]:
Release Adjusted Worldwide Domestic Foreign
In [18]: data2.describe()
Out[18]:
Release Adjusted Worldwide Domestic Foreign
In [ ]: # Outliers
# Example income data:
#   16 values between 5,000 and 8,000
#   2 low values: 500 and 1,000
#   2 high values: 50,000 and 100,000
In [ ]: # 500 ---- 1,000 ---- 5,000 ---- 8,000 ---- 50,000 ---- 100,000
In [22]: (5000 + 8000) / 2  # midpoint of the typical 5,000-8,000 range
Out[22]: 6500.0
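The income notes above describe a skewed sample, and a small computation shows why the extreme values matter. The sketch below builds a hypothetical version of that sample. The notes give only a range for the typical values, so the 16 values are assumed to be evenly spaced from 5,000 to 8,000. The two large values pull the mean far above the 5,000-8,000 band. The median stays at 6,500, the midpoint of that band.

```python
import statistics

# Hypothetical sample following the notes: 16 typical values spread
# evenly over 5,000-8,000 (an assumption), plus two low and two high values
typical = list(range(5000, 8001, 200))   # 5000, 5200, ..., 8000 -> 16 values
incomes = typical + [500, 1000, 50_000, 100_000]

mean = statistics.mean(incomes)      # sensitive to the 50,000 and 100,000 values
median = statistics.median(incomes)  # average of the two middle values
print(len(incomes), mean, median)
```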
In [23]: ax = sns.boxplot(x=data['Worldwide'])
ax.set_title('Whisker plot', color='red')
In [ ]: # Percentile
In [29]: upper
Out[29]: 1415.1825549999999
In [31]: L
Out[31]: -639.215885
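The notebook does not show the cells that compute `upper` and `L`. A common way to get such bounds is Tukey's fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, where IQR = Q3 - Q1. Q1 and Q3 correspond to the 25% and 75% rows of `describe()`. Treat this as an assumption about the method, not something taken from the notebook.

If it is the method, the negative lower bound explains the empty `lower` result below: no gross can fall under -639.2. The sketch applies the fences to a hypothetical sample that matches the outlier notes, with 16 values evenly spaced over 5,000-8,000 plus 500, 1,000, 50,000 and 100,000. The helper name `iqr_fences` is illustrative.

```python
import pandas as pd

def iqr_fences(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Hypothetical helper: return (lower, upper) Tukey fences Q1 - k*IQR, Q3 + k*IQR."""
    q1, q3 = values.quantile([0.25, 0.75])  # linear interpolation by default
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical sample following the outlier notes
incomes = pd.Series(list(range(5000, 8001, 200)) + [500, 1000, 50_000, 100_000])

low_fence, high_fence = iqr_fences(incomes)
outliers = incomes[(incomes < low_fence) | (incomes > high_fence)]
print(low_fence, high_fence)
print(sorted(outliers.tolist()))
```

Here Q1 = 5,550 and Q3 = 7,450, so the fences are 2,700 and 10,300. All four extreme values fall outside them.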
In [33]: out = data1.loc[data1['Worldwide'] > upper, ['Release', 'Film', 'Worldwide']]  # Identifying the outliers in the data
In [34]: out
Out[34]:
Release Film Worldwide
In [36]: lower
Out[36]:
Release Film Worldwide
Out[37]:
Release Film Worldwide
In [40]: w = data1['Worldwide']
data1['World_Gross']= ' '
d = data1['World_Gross']
In [ ]: #<500 : <500
#500 - 1000: 500-1000
#>1000: >1000
In [41]: j = 0
In [42]: for i in w:
    # Assign through data1.loc so the label is written into data1 itself,
    # instead of into the separately extracted Series d (chained assignment)
    if i < 500:
        data1.loc[j, 'World_Gross'] = '<500'
    elif 500 <= i <= 1000:
        data1.loc[j, 'World_Gross'] = '500-1000'
    else:
        data1.loc[j, 'World_Gross'] = '>1000'
    j = j + 1
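The row-by-row loop can also be written as a single vectorized step. This sketch uses `numpy.select`, which picks the label of the first condition that holds and falls back to `default`. The conditions reproduce the loop's boundaries, so both 500 and 1000 land in `500-1000`. The four-row frame is hypothetical; in the notebook the same assignment would run on `data1`.

```python
import numpy as np
import pandas as pd

# Hypothetical grosses chosen to hit each boundary of the loop's bins
df = pd.DataFrame({"Worldwide": [499.9, 500.0, 1000.0, 1000.1]})

w = df["Worldwide"]
df["World_Gross"] = np.select(
    [w < 500, w <= 1000],   # checked in order; the first match wins
    ["<500", "500-1000"],
    default=">1000",        # everything above 1000
)
print(df["World_Gross"].tolist())
```

`pd.cut` is the usual alternative. However, its bins are closed on only one side, so it cannot make both 500 and 1000 inclusive in the middle bin the way this loop does.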
In [43]: data1.head()
Out[43]:
Company Film Release Adjusted Worldwide Domestic Foreign World_Gross
In [45]: N = data1['World_Gross'].value_counts()
In [46]: N
Out[46]: <500 25
500-1000 10
>1000 2
Name: World_Gross, dtype: int64
In [49]: ax = sns.countplot(x=data1['World_Gross'], order=['<500', '500-1000', '>1000'])
ax.set_title('Distribution of Worldwide Gross (Marvel)', color='blue')
for container in ax.containers:
    ax.bar_label(container)  # annotate each bar with its count
In [ ]: # Release: display the count of movies released before 1990, 1990-2000, 2000-2010, and after 2010
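The release-year grouping in the comment above can be sketched with `pd.cut`. The comment's ranges share endpoints (1990-2000, 2000-2010), so this sketch assumes each period includes its first year and excludes the next boundary. Under that assumption, 2000 counts in 2000-2009 and 2010 counts in the last group. The years are hypothetical; in the notebook the input would be `data1['Release']`.

```python
import numpy as np
import pandas as pd

# Hypothetical release years (not taken from the dataset)
release = pd.Series([1978, 1989, 1990, 1999, 2000, 2008, 2010, 2019])

labels = ["before 1990", "1990-1999", "2000-2009", "2010 onward"]
# right=False makes each bin include its left edge: [1990, 2000), [2000, 2010), ...
era = pd.cut(release, bins=[-np.inf, 1990, 2000, 2010, np.inf],
             labels=labels, right=False)

counts = era.value_counts().reindex(labels)  # keep chronological order
print(counts)
```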