Matplotlib 1722309886
Types of Data
Numerical Data
Categorical Data
Two columns: Bivariate Analysis (Analyzing the relationship between two variables)
* 2D line Plot
In [4]: batsman
Out[4]:
index RG Sharma V Kohli
colors(hex)
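The cells in this section rely on a `batsman` DataFrame with the columns `index`, `RG Sharma`, and `V Kohli`, but the code that loads it is not shown. As a minimal sketch, the block below builds a small stand-in (the run totals are invented, not real statistics) and draws a basic 2D line plot with a hex color.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the `batsman` DataFrame; the run totals are
# invented for illustration and are not real statistics.
batsman = pd.DataFrame({
    'index': [2015, 2016, 2017],
    'RG Sharma': [480, 490, 330],
    'V Kohli': [500, 970, 310],
})

plt.figure()
# Any '#RRGGBB' hex string is accepted as a color.
lines = plt.plot(batsman['index'], batsman['V Kohli'], color='#ADAA19')
plt.xlabel('Season')
plt.ylabel('Runs Scored')
```

In a notebook the figure renders at the end of the cell; in a script, call `plt.show()` afterwards.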
Dashed Lines
In [9]:
plt.plot(batsman['index'],batsman['V Kohli'],color='#ADAA19',
linestyle = 'dashdot') # dashdot
plt.plot(batsman['index'],batsman['RG Sharma'],color='#FC00D6')
plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
plt.xlabel('Season')
plt.ylabel('Runs Scored')
dotted lines
In [10]:
plt.plot(batsman['index'],batsman['V Kohli'],color='#ADAA19',
linestyle = 'dotted') # dotted
plt.plot(batsman['index'],batsman['RG Sharma'],color='#FC00D6',
linestyle ='dotted') # dotted
plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
plt.xlabel('Season')
plt.ylabel('Runs Scored')
line width
In [12]:
plt.plot(batsman['index'],batsman['V Kohli'],color='#D9F10F'
,linestyle='solid',linewidth=3) # linewidth
plt.plot(batsman['index'],batsman['RG Sharma'],
color='#FC00D6',linestyle='dashdot',linewidth=2)
plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
plt.xlabel('Season')
plt.ylabel('Runs Scored')
marker(size)
Legend
In [17]: # legend -
plt.plot(batsman['index'],batsman['V Kohli'],
color='#D9F10F',linestyle='solid',
linewidth=3,marker='D',markersize=10,label='Virat')
plt.plot(batsman['index'],batsman['RG Sharma'],
color='#FC00D6',linestyle='dashdot',
linewidth=2,marker='o',label='Rohit')
plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
plt.xlabel('Season')
plt.ylabel('Runs Scored')
plt.legend() # loc = best
limiting axes
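No cell survives under this heading, so here is a minimal sketch of limiting the axes with `plt.xlim` and `plt.ylim`. The data is hypothetical and contains one deliberate outlier so the effect of clipping the y-axis is visible.

```python
import matplotlib.pyplot as plt

# Hypothetical data with one outlier that would stretch the y-axis.
seasons = [1, 2, 3, 4, 5]
runs = [300, 450, 5000, 520, 610]

plt.figure()
plt.plot(seasons, runs, marker='o')
plt.xlim(1, 5)       # show only seasons 1 to 5
plt.ylim(0, 1000)    # clip the view so the outlier does not dominate

# Calling xlim/ylim without arguments returns the current limits.
limits = (plt.xlim(), plt.ylim())
```

The outlier is still in the data; it is only pushed outside the visible range.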
grid
show
In [21]:
plt.plot(batsman['index'],batsman['V Kohli'],color='#D9F10F',linestyle='solid'
plt.plot(batsman['index'],batsman['RG Sharma'],color='#FC00D6',linestyle='dash
plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
plt.xlabel('Season')
plt.ylabel('Runs Scored')
plt.grid()
plt.show()
Scatter plot
A scatter plot in matplotlib is a type of plot used to visualize the relationship between two
continuous variables. It displays individual data points as markers on a two-dimensional
coordinate system, with one variable represented on the x-axis and the other variable
represented on the y-axis.
Bivariate Analysis
numerical vs numerical
Use case - Finding correlation
In [24]: plt.scatter(x,y)
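The `x` and `y` used above are not defined in the surviving cells. A minimal sketch, assuming synthetic data in which `y` grows roughly linearly with `x`, shows how a scatter plot and the Pearson correlation coefficient support the "finding correlation" use case.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: y grows roughly linearly with x, plus some noise.
rng = np.random.default_rng(0)
x = np.linspace(-10, 10, 50)
y = 10 * x + 3 + rng.normal(0, 10, size=50)

plt.figure()
plt.scatter(x, y)

# Pearson correlation coefficient as a numeric check of what the plot shows.
r = np.corrcoef(x, y)[0, 1]
print(round(r, 2))
```

A value of `r` close to 1 matches the upward-sloping cloud of points in the plot.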
Out[26]:
batter runs avg strike_rate
In [27]: plt.scatter(df['avg'],df['strike_rate'],color='red',marker='+')
plt.title('Avg and SR analysis of Top 50 Batsman')
plt.xlabel('Average')
plt.ylabel('SR')
Size
In [31]:
import seaborn as sns
tips = sns.load_dataset('tips')
In [38]:
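plt.plot(tips['total_bill'],tips['tip']) # plt.plot is faster than plt.scatter, but draws a connected line by default
In [37]: plt.plot(tips['total_bill'],tips['tip'],'o') # 'o' draws markers only, giving a scatter-like plot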
Bar chart
Bivariate Analysis
Numerical vs Categorical
Use case - Aggregate analysis of groups
In [41]:
plt.barh(colors,children,color='gold')
Out[45]:
batsman 2015 2016 2017
Out[68]: 5
Colors
Overlapping problem
In [75]: plt.bar(df['batsman'],df['2017'],label='2017')
plt.bar(df['batsman'],df['2016'],bottom=df['2017'],label='2016')
plt.bar(df['batsman'],df['2015'],bottom=(df['2016'] + df['2017'])
,label='2015')
plt.legend()
plt.show()
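Stacking is one way around the overlapping problem. Another common option is to place the bars side by side by shifting each series along the x-axis. The sketch below assumes a hypothetical table shaped like the batsman/season DataFrame above; the names and values are invented.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the batsman/season table; values are invented.
df = pd.DataFrame({
    'batsman': ['A', 'B', 'C'],
    '2015': [30, 45, 20],
    '2016': [50, 35, 40],
    '2017': [25, 60, 55],
})

width = 0.25                      # width of a single bar
pos = np.arange(len(df))          # one numeric slot per batsman

plt.figure()
plt.bar(pos - width, df['2015'], width=width, label='2015')
plt.bar(pos,         df['2016'], width=width, label='2016')
plt.bar(pos + width, df['2017'], width=width, label='2017')
plt.xticks(pos, df['batsman'])    # put the names back on the x-axis
plt.legend()
```

Plotting against numeric positions instead of the names is what makes the offsets possible; `plt.xticks` restores the labels afterwards.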
Histogram
Univariate Analysis
Numerical col
Use case - Frequency Count
Out[77]: (array([2., 0., 0., 1., 1., 0., 1., 0., 0., 2.]),
array([10. , 15.1, 20.2, 25.3, 30.4, 35.5, 40.6, 45.7, 50.8, 55.9, 61. ]),
<BarContainer object of 10 artists>)
bins
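The cell for this heading is missing. As a sketch with a hypothetical seven-value sample, `bins` can be either an integer (number of equal-width bins) or a list of explicit bin edges.

```python
import matplotlib.pyplot as plt

# Hypothetical sample of seven scores.
data = [32, 45, 56, 10, 15, 27, 61]

plt.figure()
# An integer asks for that many equal-width bins between min and max.
counts_auto, edges_auto, _ = plt.hist(data, bins=5)

plt.figure()
# A list sets the bin edges explicitly; the last bin includes its right edge.
counts, edges, _ = plt.hist(data, bins=[10, 25, 40, 55, 70])
print(counts)
```

`plt.hist` returns the counts, the bin edges, and the bar container, which is the three-part tuple seen in the `Out` cells of this section.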
In [80]: # on Data
df = pd.read_csv("vk.csv")
df
Out[80]:
match_id batsman_runs
0 12 62
1 17 28
2 20 64
3 27 0
4 30 10
136 624 75
138 632 54
139 633 0
140 636 54
In [85]: plt.hist(df['batsman_runs'])
plt.show()
Logarithmic scale
Out[92]: (array([ 12., 60., 109., 4039., 6003., 230., 410., 744., 291.,
51.]),
array([10. , 15.9, 21.8, 27.7, 33.6, 39.5, 45.4, 51.3, 57.2, 63.1, 69. ]),
<BarContainer object of 10 artists>)
In [93]: # Solution
plt.hist(arr,bins=[10,20,30,40,50,60,70],log=True)
plt.show()
Pie Chart
Univariate/Bivariate Analysis
Categorical vs numerical
In [95]: # On data
df =pd.read_csv("gayle-175.csv")
df
Out[95]:
batsman batsman_runs
0 AB de Villiers 31
1 CH Gayle 175
2 R Rampaul 0
3 SS Tiwary 2
4 TM Dilshan 33
5 V Kohli 11
In [97]: plt.pie(df['batsman_runs'],labels=df['batsman'])
plt.show()
Percentages
Colours
In [101]: plt.pie(df['batsman_runs'],labels=df['batsman'],
autopct='%0.1f%%',
colors=['blue','gold','green','orange','cyan','pink'])
plt.show()
Explode shadow
In [107]: plt.pie(df['batsman_runs'],labels=df['batsman'],
autopct='%0.1f%%',
explode=[0.3,0,0,0,0,0.1],shadow=True)
plt.show()
Changing Styles
In [108]: plt.style.available
Out[108]: ['Solarize_Light2',
'_classic_test_patch',
'bmh',
'classic',
'dark_background',
'fast',
'fivethirtyeight',
'ggplot',
'grayscale',
'seaborn',
'seaborn-bright',
'seaborn-colorblind',
'seaborn-dark',
'seaborn-dark-palette',
'seaborn-darkgrid',
'seaborn-deep',
'seaborn-muted',
'seaborn-notebook',
'seaborn-paper',
'seaborn-pastel',
'seaborn-poster',
'seaborn-talk',
'seaborn-ticks',
'seaborn-white',
'seaborn-whitegrid',
'tableau-colorblind10']
In [109]: # style
plt.style.use('Solarize_Light2')
In [110]: # Example
plt.hist(arr,bins=[10,20,30,40,50,60,70],log=True)
plt.show()
In [111]: # Style 2
plt.style.use('_classic_test_patch')
In [113]: plt.pie(df['batsman_runs'],labels=df['batsman'],
autopct='%0.1f%%',
explode=[0.3,0,0,0,0,0.1],shadow=True)
plt.show()
In [114]: # Style 3
plt.style.use('dark_background')
In [115]: # Example
plt.plot(batsman['index'],batsman['V Kohli'],color='#D9F10F',linestyle='solid'
plt.plot(batsman['index'],batsman['RG Sharma'],color='#FC00D6',linestyle='dash
plt.title('Rohit Sharma Vs Virat Kohli Career Comparison')
plt.xlabel('Season')
plt.ylabel('Runs Scored')
plt.grid()
plt.show()
In [116]: # Style 4
plt.style.use('seaborn-darkgrid') # named 'seaborn-v0_8-darkgrid' in Matplotlib 3.6 and later
In [120]: plt.pie(df['batsman_runs'],labels=df['batsman'],
autopct='%0.1f%%',
explode=[0.3,0,0,0,0,0.1],shadow=True)
plt.show()
In [121]: # Style 5
plt.style.use('ggplot')
Save Figure
In [ ]:
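The cell above is empty, so here is a minimal sketch of saving a figure with `plt.savefig`. The file name and location are hypothetical, and `dpi` and `bbox_inches` are optional.

```python
import os
import tempfile
import matplotlib.pyplot as plt

plt.figure()
plt.plot([1, 2, 3, 4], [5, 6, 7, 8])
plt.title('Sample figure')

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), 'sample.png')

# The file extension selects the format (png, pdf, svg, ...).
plt.savefig(out_path, dpi=150, bbox_inches='tight')
```

In a script it is safer to call `plt.savefig` before `plt.show()`, because the figure may no longer be available once the window is closed.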
Colored Scatterplots
In [2]: iris = pd.read_csv("iris.csv")
iris.sample(5)
Out[2]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
In [3]: # Replacing
iris['Species'] = iris['Species'].replace({'Iris-setosa':0 ,
'Iris-versicolor':1 ,
'Iris-virginica':2})
iris.sample(5)
Out[3]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
C - colors
In [4]:
plt.scatter( iris['SepalLengthCm'],iris['PetalLengthCm'],
c= iris['Species']) # c
plt.xlabel('Sepal length')
plt.ylabel('petal length')
plt.show()
cmap
color bar
plt.show()
alpha
Plot size
In [9]: plt.figure(figsize=(15,7)) # 15 -width , 7 -height
plt.scatter( iris['SepalLengthCm'],iris['PetalLengthCm'],
c= iris['Species'] ,cmap ='jet', alpha =0.5) # alpha = marker transparency
plt.xlabel('Sepal length')
plt.ylabel('petal length')
plt.colorbar()
plt.show()
Annotations
In [10]: # sample annotation - naming
x = [1,2,3,4]
y = [5,6,7,8]
plt.scatter(x,y)
plt.text(1,5,'Point 1',fontdict={'size':12,'color':'green'})
plt.text(2,6,'Point 2',fontdict={'size':12,'color':'red'})
plt.text(3,7,'Point 3',fontdict={'size':12,'color':'black'})
plt.text(4,8,'Point 4',fontdict={'size':12,'color':'brown'})
Out[11]: (605, 4)
In [13]: sample_df.shape
Out[13]: (25, 4)
In [14]: sample_df
Out[14]:
batter runs avg strike_rate
In [16]: sample_df
Out[16]:
batter runs avg strike_rate
In [17]: plt.figure(figsize=(18,10))
plt.scatter(sample_df['avg'],sample_df['strike_rate'] ,
s =sample_df['runs']) # s = size
for i in range(sample_df.shape[0]):
plt.text(sample_df['avg'].values[i],
sample_df['strike_rate'].values[i],
sample_df['batter'].values[i])
In [148]: plt.figure(figsize=(18,10))
plt.axvline(30, color ='red',linewidth=5) # Vertical line
plt.scatter(sample_df['avg'],sample_df['strike_rate'] ,
s =sample_df['runs'], color ='cyan')
for i in range(sample_df.shape[0]):
plt.text(sample_df['avg'].values[i],
sample_df['strike_rate'].values[i],
sample_df['batter'].values[i])
In [149]: plt.figure(figsize=(18,10))
plt.axhline(130, color ='red',linewidth=5) # Horizontal line
plt.axvline(30, color ='red',linewidth=5) # Vertical line
plt.scatter(sample_df['avg'],sample_df['strike_rate'] ,
s =sample_df['runs'], color ='yellow')
for i in range(sample_df.shape[0]):
plt.text(sample_df['avg'].values[i],
sample_df['strike_rate'].values[i],
sample_df['batter'].values[i])
In [150]: plt.figure(figsize=(18,10))
plt.axhline(130, color ='red',linewidth=5) # Horizontal line
plt.axhline(140, color ='green',linewidth=5) # EXTRA Horizontal line
plt.axvline(30, color ='red',linewidth=5)
plt.scatter(sample_df['avg'],sample_df['strike_rate'] ,
s =sample_df['runs'],color='pink')
for i in range(sample_df.shape[0]):
plt.text(sample_df['avg'].values[i],
sample_df['strike_rate'].values[i],
sample_df['batter'].values[i])
Subplots
In [22]: batter.head()
Out[22]:
batter runs avg strike_rate
C:\Users\user\AppData\Local\Temp/ipykernel_14332/2843873373.py:8: UserWarning: Matplotlib is currently using module://matplotlib_inline.backend_inline, which is a non-GUI backend, so cannot show the figure.
  fig.show()
In [37]: # on Data
fig, ax = plt.subplots(nrows=2,ncols=1,sharex=True,figsize=(10,6))
# sharex=True makes the two subplots share one x-axis (sharey does the same for y)
ax[0].scatter(batter['avg'],batter['strike_rate'],color='red')
ax[1].scatter(batter['avg'],batter['runs'],color ='green')
ax[0].set_title('Avg Vs Strike Rate')
ax[0].set_ylabel('Strike Rate')
ax[1].set_title('Avg Vs Runs')
ax[1].set_ylabel('Runs')
ax[1].set_xlabel('Avg')
C:\Users\user\AppData\Local\Temp/ipykernel_14332/3464065883.py:16: UserWarning: Matplotlib is currently using module://matplotlib_inline.backend_inline, which is a non-GUI backend, so cannot show the figure.
  fig.show()
C:\Users\user\AppData\Local\Temp/ipykernel_14332/2368418411.py:14: UserWarning: Matplotlib is currently using module://matplotlib_inline.backend_inline, which is a non-GUI backend, so cannot show the figure.
  fig.show()
3D scatter plots
In [62]: batter
fig = plt.figure(figsize=(10,7))
ax = plt.subplot(projection ='3d')
ax.scatter3D(batter['runs'],batter['avg'],batter['strike_rate'],
color='red' , marker = '*')
ax.set_title('IPL batsman analysis')
ax.set_xlabel('Runs')
ax.set_ylabel('Avg')
ax.set_zlabel('SR')
3D Line plot
In [61]: x = [0,1,5,25]
y = [0,10,13,0]
z = [0,13,20,9]
fig = plt.figure(figsize=(10,7))
ax = plt.subplot(projection='3d')
ax.scatter3D(x,y,z,s=[100,100,100,100])
ax.plot3D(x,y,z,color='blue')
3D surface Plot
In [63]: # Loss function z = x**2 + y**2
# Helpful in Machine learning
x = np.linspace(-10,10,100)
y = np.linspace(-10,10,100)
In [64]: x
In [65]: y
In [66]: np.meshgrid(x,y)
In [68]: xx.shape
In [69]: yy.shape
In [71]: z.shape
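The cells that define `xx`, `yy`, and `z` did not survive. A minimal sketch, assuming they come from `np.meshgrid` and the loss function z = x² + y² named in the comment above, could look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 100)
y = np.linspace(-10, 10, 100)

# meshgrid expands the two 1D axes into 2D coordinate grids.
xx, yy = np.meshgrid(x, y)
z = xx**2 + yy**2          # evaluated at every point of the grid
print(xx.shape, yy.shape, z.shape)

fig = plt.figure(figsize=(12, 8))
ax = plt.subplot(projection='3d')
surface = ax.plot_surface(xx, yy, z, cmap='viridis')
fig.colorbar(surface)
```

All three arrays have shape `(100, 100)`: one value per combination of an `x` and a `y` sample.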
Contour Plots
In [84]: # A contour plot represents a 3D surface in 2D
# 3D surface
fig = plt.figure(figsize=(12,8))
ax = plt.subplot(projection='3d')
p = ax.plot_surface(xx,yy,z,cmap='viridis')
fig.colorbar(p)
Contourf Plot
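The contour cells themselves are missing. The sketch below rebuilds the same x² + y² grid (an assumption based on the surface plot section) and draws `contour` and `contourf` side by side to show the difference.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 100)
y = np.linspace(-10, 10, 100)
xx, yy = np.meshgrid(x, y)
z = xx**2 + yy**2

fig, (ax_lines, ax_filled) = plt.subplots(1, 2, figsize=(12, 5))

# contour draws only the level lines.
lines = ax_lines.contour(xx, yy, z, cmap='viridis')
ax_lines.set_title('contour')

# contourf fills the regions between the levels.
filled = ax_filled.contourf(xx, yy, z, cmap='viridis')
ax_filled.set_title('contourf')
fig.colorbar(filled, ax=ax_filled)
```

Both calls take the same 2D grids as `plot_surface`, so one `meshgrid` result can feed all three plot types.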
Heat map
In [91]: delivery.head()
Out[91]:
        ID  innings  overs  ballnumber       batter          bowler  non-striker extra_type  batsman_run  ...
0  1312200        1      0           1  YBK Jaiswal  Mohammed Shami   JC Buttler        NaN            0
1  1312200        1      0           2  YBK Jaiswal  Mohammed Shami   JC Buttler    legbyes            0
2  1312200        1      0           3   JC Buttler  Mohammed Shami  YBK Jaiswal        NaN            1
3  1312200        1      0           4  YBK Jaiswal  Mohammed Shami   JC Buttler        NaN            0
4  1312200        1      0           5  YBK Jaiswal  Mohammed Shami   JC Buttler        NaN            0
In [94]: temp_df.sample()
Out[94]:
            ID  innings  overs  ballnumber    batter   bowler  non-striker extra_type  batsman_run  ...
134037  598062        1     15           3  MS Dhoni  A Nehra    RA Jadeja        NaN            6
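The cell that builds `grid` is not shown. The sample row suggests `temp_df` holds only deliveries hit for six, but that is an assumption. Under it, one plausible sketch is to filter the sixes and count them per over and ball number with `pivot_table`. The miniature `delivery` table below is invented.

```python
import pandas as pd

# Hypothetical miniature of the ball-by-ball table; rows are invented.
delivery = pd.DataFrame({
    'overs':       [0, 0, 0, 1, 1, 1, 1],
    'ballnumber':  [1, 2, 2, 1, 1, 2, 2],
    'batsman_run': [6, 6, 6, 4, 6, 6, 1],
})

# Assumption: keep only the sixes, then count them per (over, ball) cell.
temp_df = delivery[delivery['batsman_run'] == 6]
grid = temp_df.pivot_table(index='overs', columns='ballnumber',
                           values='batsman_run', aggfunc='count')
print(grid)
```

The result has overs as rows and ball numbers as columns, which is the 2D shape `plt.imshow` expects for a heat map.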
In [96]: grid
Out[96]:
ballnumber 1 2 3 4 5 6
overs
0 9 17 31 39 33 27
1 31 40 49 56 58 54
2 75 62 70 72 58 76
3 60 74 74 103 74 71
4 71 76 112 80 81 72
5 77 102 63 86 78 80
6 34 56 49 59 64 38
7 59 62 73 70 69 56
8 86 83 79 81 73 52
9 54 62 86 61 74 67
10 82 92 83 69 72 70
11 91 72 87 79 87 70
13 101 101 99 97 90 88
In [99]: plt.figure(figsize=(20,10))
plt.imshow(grid)
In [100]: plt.figure(figsize=(20,10))
plt.imshow(grid)
plt.yticks(delivery['overs'].unique(), list(range(1,21)))
plt.xticks(np.arange(0,6), list(range(1,7)))
plt.colorbar()
Pandas Plot
In [101]: # on a series
s = pd.Series([1,2,3,4,5,6,7])
s.plot(kind='pie')
Out[101]: <AxesSubplot:ylabel='None'>
Out[106]:
total_bill tip sex smoker day time size
In [109]: tips.head()
Out[109]:
total_bill tip sex smoker day time size
In [110]: # Scatter plot -> labels -> markers -> figsize -> color -> cmap
tips.plot(kind='scatter',x='total_bill',y='tip',
title='Cost Analysis',marker='+',
figsize=(10,6),s='size',c='sex',
cmap='viridis')
2d plot
In [111]:
# dataset = 'https://raw.githubusercontent.com/m-mehdi/pandas_tutorials/main/w
stocks = pd.read_csv('https://raw.githubusercontent.com/m-mehdi/pandas_tutoria
stocks.head()
Out[111]:
Date MSFT FB AAPL
Out[118]: <AxesSubplot:>
In [113]: stocks.plot(kind='line',x='Date')
Out[113]: <AxesSubplot:xlabel='Date'>
In [114]: stocks[['Date','AAPL','FB']].plot(kind='line',x='Date')
Out[114]: <AxesSubplot:xlabel='Date'>
bar chart
Out[115]:
batsman 2015 2016 2017
Out[119]: <AxesSubplot:xlabel='sex'>
Out[117]: <AxesSubplot:>
In [120]: temp.plot(kind='bar')
Out[120]: <AxesSubplot:>
Out[126]: <AxesSubplot:>
Histogram
Out[125]: <AxesSubplot:ylabel='Frequency'>
Pie chart
Out[127]:
batsman match1 match2 match3
0 Dhawan 120 0 50
1 Rohit 90 1 24
3 SKY 45 130 45
4 Pandya 12 34 10
In [130]: df['match1'].plot(kind='pie',
labels=df['batsman'].values,autopct='%0.1f%%',
colormap= 'tab10')
Out[130]: <AxesSubplot:ylabel='match1'>
on multiindex dataframes
In [ ]:
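The cell above is empty. One common way to plot a MultiIndex result, sketched here with invented rows shaped like the `tips` dataset, is to `unstack` the inner index level so that each of its values becomes its own column (and bar series).

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical rows shaped like the tips dataset; values are invented.
tips = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Sat', 'Sat'],
    'sex': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'total_bill': [18.0, 16.0, 20.0, 14.0, 22.0, 19.0],
})

# groupby on two keys yields a Series with a two-level MultiIndex.
grouped = tips.groupby(['day', 'sex'])['total_bill'].mean()

# unstack moves the inner level ('sex') into columns: one bar group per day.
ax = grouped.unstack().plot(kind='bar')
ax.set_ylabel('Mean total bill')
```

Calling `grouped.plot(kind='bar')` directly also works, but it draws one bar per (day, sex) pair with tuple labels, which is usually harder to read.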