Unit 1 Pandas - Charts
"A picture is worth of a thousand words". Most of us are familiar with this expression. Data visualization plays an essential role
in their presentation of both small and large-scale data. It especially applies when trying to explain the analysis of increasingly
large datasets.
Data visualization is the discipline of understanding data by placing it in a visual context. Its main goal is to condense
large datasets into visual graphics so that complex relationships within the data are easy to understand. Several data
visualization libraries are available in Python, such as Matplotlib, Seaborn and Folium.
Purpose of Data visualization
Better analysis and Quick action
Identifying patterns and Finding errors
Understanding the story, Exploring business insights and Grasping the Latest Trends
Plotting library
Matplotlib is a Python library used to create 2D graphs and plots from Python scripts. Pyplot is a module in Matplotlib that
supports a wide variety of graphs and plots, such as histograms, bar charts, power spectra and error charts. Used along with
NumPy, it provides a MATLAB-like environment. The statement import matplotlib.pyplot as plt imports it for charting.
Pyplot provides the state-machine interface to the plotting library in matplotlib. It means that figures and axes are implicitly and
automatically created to achieve the desired plot. For example, calling plot from pyplot will automatically create the necessary
figure and axes to achieve the desired plot. Setting a title will then automatically set that title to the current axes object. The
pyplot interface is generally preferred for non-interactive plotting (i.e., scripting).
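A minimal sketch of the implicit behaviour described above: no figure or axes is created by hand, yet after plt.plot() pyplot holds one figure with one current axes, and plt.title() is applied to that axes. The data values are arbitrary.

```python
import matplotlib.pyplot as plt

plt.close('all')  # start from a clean state so the counts below are unambiguous

# No figure or axes is created explicitly; plt.plot() creates both on demand.
plt.plot([1, 2, 3], [2, 4, 1])
plt.title('Implicitly created axes')  # applied to the current axes

fig = plt.gcf()  # the figure pyplot created behind the scenes
ax = plt.gca()   # the current axes inside that figure
print(len(fig.axes), ax.get_title())
```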
The following features are provided in the Matplotlib library for data visualization.
Drawing – plots are drawn from the data passed to specific plotting functions.
Customization – plots can be customized by passing arguments to those functions, for example color, line style (dashed,
dotted) and line width; labels, titles and legends can also be added.
Saving – after drawing and customization, plots can be saved in formats such as .pdf, .png and .eps for future use.
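The three features can be seen together in one short sketch. The data and file name are hypothetical, and the file is written to the system temporary directory (an assumption made so the example runs on any OS) rather than to a drive such as d:\.

```python
import os
import tempfile

import matplotlib.pyplot as plt

months = [1, 2, 3, 4]      # hypothetical data for illustration
sales = [10, 14, 9, 17]

# Drawing: pass the data to a plotting function
# Customization: colour, line style and width are set through arguments
plt.plot(months, sales, color='r', linestyle='dashed', linewidth=2)
plt.title('Monthly sales')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.legend(['Sales'], loc='best')

# Saving: the file extension selects the output format (.pdf, .png, .eps, ...)
out_path = os.path.join(tempfile.gettempdir(), 'sales_chart.png')
plt.savefig(out_path)
plt.close()
```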
Customizing / Adding details of the plots
(Figure: a chart annotated with its customizable parts: Y limit range, Title, Legend, Y label.)
In each example below, x is a DataFrame with a 'sales' column and matplotlib.pyplot is imported as plt.
Cumulative histogram:
plt.hist(x['sales'], bins=[2, 4, 6, 8], cumulative=True)
plt.title('Cumulative Histogram Chart of Sales of 2016')
plt.xlabel('Sales Bins or Interval')
plt.ylabel('Sales Frequency values')
plt.legend(['Sales Frequencies'], loc='best')

Frequency polygon (step outline):
plt.hist(x['sales'], bins=[2, 4, 6, 8], histtype='step')
plt.title('Frequency Polygon Chart of Sales of 2016')
plt.xlabel('Sales Bins or Interval by 2')
plt.ylabel('Sales Frequency values')
plt.legend(['Sales Frequencies'], loc='upper left')

Histogram with gaps between bars (rwidth, bar graph style):
plt.hist(x['sales'], bins=[2, 4, 6, 8], rwidth=0.9)
plt.title('Histogram Chart of Sales using rwidth - bar graph style')
plt.xlabel('Sales Bins or Interval by 3')
plt.ylabel('Sales Frequency values')
plt.legend(['Sales Frequencies'], loc='best')
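The three variants can be compared side by side in a sketch like the following. The sales values are hypothetical (the source does not show how x was built), and plt.subplots is used only to put the three charts in one figure.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales values; the source does not show how x was built
x = pd.DataFrame({'sales': [2, 3, 3, 5, 6, 6, 7, 7, 7]})
bins = [2, 4, 6, 8]

# One panel per variant so the three can be compared side by side
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

n_cum, _, _ = axes[0].hist(x['sales'], bins=bins, cumulative=True)
axes[0].set_title('cumulative=True')

n_step, _, _ = axes[1].hist(x['sales'], bins=bins, histtype='step')
axes[1].set_title("histtype='step'")

n_gap, _, _ = axes[2].hist(x['sales'], bins=bins, rwidth=0.9)
axes[2].set_title('rwidth=0.9')

print(n_cum, n_step, n_gap)
```

With cumulative=True each bar also counts every earlier bin, so the heights can only grow; histtype and rwidth change the appearance, not the counts.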
Marker codes (marker=...):
'x' x marker, 'H' hexagon2 marker, '<' triangle left marker
'D' diamond marker, '1' tri down marker, '>' triangle right marker
'd' thin diamond marker, '2' tri up marker, '|' and '_' vline and hline markers
Legend location codes (loc=...): 'center right', 'lower center', 'upper center', 'center'
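A small sketch that draws one series per marker code listed above and places the legend with one of the listed loc values. The temperature and sales values are hypothetical; each series is shifted upward only so that the markers do not overlap.

```python
import matplotlib.pyplot as plt

temp = [40, 42, 45]     # hypothetical values
sales = [10, 12, 15]
markers = ['x', 'H', '<', 'D', '1', '>', 'd', '2', '|', '_']

plt.figure()
for offset, m in enumerate(markers):
    # shift each series up so the markers do not overlap
    plt.plot(temp, [s + offset for s in sales], marker=m,
             linestyle='none', markersize=10, label=repr(m))

plt.legend(loc='center right', ncol=2)
ax = plt.gca()
```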
Difference between bar graph and histogram
In a histogram, the bars are always filled with color. In a frequency polygon (drawn with histtype='step'), the bars are not filled; only their outline is drawn.
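To see the difference in practice, the sketch below draws the same hypothetical weights both ways and checks whether the first patch of each chart is filled. Values and bin edges are made up for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical weights (kg) and bin edges, for illustration only
weight = [45, 48, 50, 52, 52, 55, 58, 60, 61, 65]
bins = [40, 50, 60, 70]

fig, (left, right) = plt.subplots(1, 2, figsize=(9, 3))

# Default histtype='bar': each bin is a filled Rectangle
_, _, bar_patches = left.hist(weight, bins=bins)
left.set_title('Histogram: filled bars')

# histtype='step': a single unfilled outline (Polygon)
_, _, step_patches = right.hist(weight, bins=bins, histtype='step')
right.set_title("histtype='step': outline only")

print(bar_patches[0].get_fill(), step_patches[0].get_fill())
```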
import matplotlib.pyplot as plt
# x is a DataFrame with 'temp' and 'sales' columns
plt.scatter(x['temp'], x['sales'], color='b', marker='x')
plt.xlim(35, 50)    # show x values from 35 to 50
plt.ylim(5, 20)     # show y values from 5 to 20
plt.xlabel('temperature', fontsize=16)
plt.ylabel('Sales', fontsize=16)
plt.title('scatter plot - temperature vs sales', fontsize=20)
plt.legend(['Sales'], loc='best')
plt.savefig(r'd:\scatterchart.pdf')   # raw string keeps the backslash literal
plt.show()
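A sketch of what xlim() and ylim() change, reusing the temperature and sales values from the xticks example: the left chart keeps Matplotlib's automatic range, the right one fixes it. It uses the object-oriented set_xlim/set_ylim, which are the axes-level counterparts of plt.xlim/plt.ylim.

```python
import matplotlib.pyplot as plt
import pandas as pd

x = pd.DataFrame({'temp': [40, 42, 45], 'sales': [10, 12, 15]})

fig, (auto_ax, fixed_ax) = plt.subplots(1, 2, figsize=(9, 3))

auto_ax.scatter(x['temp'], x['sales'], color='b', marker='x')
auto_ax.set_title('autoscaled range')

fixed_ax.scatter(x['temp'], x['sales'], color='b', marker='x')
fixed_ax.set_xlim(35, 50)   # same effect as plt.xlim(35, 50)
fixed_ax.set_ylim(5, 20)    # same effect as plt.ylim(5, 20)
fixed_ax.set_title('xlim(35, 50), ylim(5, 20)')

print(auto_ax.get_xlim(), fixed_ax.get_xlim())
```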
Changing the default tick labels of a chart with xticks() / yticks()
Calling xticks() / yticks() with only tick positions places ticks at those values and labels them with the numbers themselves.
Passing a second list of labels replaces the numbers with labels of our choice.
Default – x axis: 40, 42, 45; y axis: 10, 12, 15
User-defined – x axis: t1 at 40, t2 at 42, t3 at 45; y axis: s1 at 10, s2 at 12, s3 at 15
import matplotlib.pyplot as plt
import pandas as pd

x = pd.DataFrame({
    'temp': [40, 42, 45],
    'sales': [10, 12, 15]})

plt.scatter(x['temp'], x['sales'], color='b', marker='x')
# Without labels: ticks at the data values, labelled 40, 42, 45 and 10, 12, 15
plt.xticks(x['temp'])
plt.yticks(x['sales'])
# With labels: replace the two calls above with
# plt.xticks(x['temp'], ['t1', 't2', 't3'])
# plt.yticks(x['sales'], ['s1', 's2', 's3'])
plt.xlabel('temperature', fontsize=16)
plt.ylabel('Sales', fontsize=16)
plt.title('scatter plot - temperature vs sales', fontsize=20)
plt.legend(['Sales'], loc='best')
plt.savefig(r'd:\scatterchart.pdf')
plt.show()
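One way to confirm what the user-defined version produces is to read the ticks back after drawing. This is a sketch using the same data; the canvas is drawn once only so that the tick label text is populated before it is read.

```python
import matplotlib.pyplot as plt
import pandas as pd

x = pd.DataFrame({'temp': [40, 42, 45], 'sales': [10, 12, 15]})

plt.figure()
plt.scatter(x['temp'], x['sales'], color='b', marker='x')
plt.xticks(x['temp'], ['t1', 't2', 't3'])   # ticks at 40, 42, 45 labelled t1..t3
plt.yticks(x['sales'], ['s1', 's2', 's3'])  # ticks at 10, 12, 15 labelled s1..s3

ax = plt.gca()
ax.figure.canvas.draw()   # render once so the tick label text is populated
x_labels = [t.get_text() for t in ax.get_xticklabels()]
y_labels = [t.get_text() for t in ax.get_yticklabels()]
print(list(ax.get_xticks()), x_labels, y_labels)
```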
(Figures: Sin line chart, Cos line chart, Log line chart, Exp line chart.)
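The four charts themselves are not reproduced here. A sketch that redraws them with NumPy might look like this, assuming a shared x range of 0.1 to 5 (chosen so that log(x) is defined) and a 2 x 2 grid of subplots.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.1, 5, 100)   # start above 0 so np.log(x) is defined

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
curves = [('Sin', np.sin), ('Cos', np.cos), ('Log', np.log), ('Exp', np.exp)]

for ax, (name, func) in zip(axes.flat, curves):
    ax.plot(x, func(x))               # one line chart per function
    ax.set_title(f'{name} line chart')

fig.tight_layout()
```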
In these examples, df is a DataFrame with 'App Name', 'App Prince in Rs' and 'Total Downloads' columns.
import matplotlib.pyplot as plt
import numpy as np

Simple line chart:
plt.plot(df['App Name'], df['App Prince in Rs'])
plt.title('Simple Line Chart')
plt.xlabel('App Name')
plt.ylabel('App Prince in Rs')
plt.legend(['App Prince in Rs'], loc='best')
plt.savefig(r'd:\chart.pdf')
plt.show()

Simple bar chart:
plt.bar(df['App Name'], df['Total Downloads'])
plt.title('Simple Bar Chart')
plt.xlabel('App Name')
plt.ylabel('Total Downloads')
plt.legend(['Total Downloads'], loc='best')
plt.savefig(r'd:\chart.pdf')
plt.show()

Multiple bar chart:
df['Est Downloads'] = df['Total Downloads'] / 1000    # downloads in thousands
x = np.arange(len(df['App Name']))                    # one position per app
plt.bar(x, df['App Prince in Rs'], width=.25)
plt.bar(x + 0.25, df['Est Downloads'], width=.25)     # shifted right, beside the first bars
plt.xticks(x, df['App Name'])
plt.title('Multiple Bar Chart')
plt.xlabel('App Name')
plt.ylabel('App Prince in Rs and Est Downloads (thousands)')
plt.legend(['App Prince in Rs', 'Est Downloads'], loc='best')
plt.savefig(r'd:\chart.pdf')
plt.show()
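Since the source does not show the app data, here is a self-contained sketch of the multiple bar chart with a small hypothetical DataFrame (column names copied from the examples, values and app names invented). It passes label= to each bar() call so that legend() picks up one entry per series, and centres each tick label under its pair of bars.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical app data; only the column names come from the examples
df = pd.DataFrame({
    'App Name': ['App 1', 'App 2', 'App 3'],
    'App Prince in Rs': [75, 120, 190],
    'Total Downloads': [197000, 209000, 414000],
})
df['Est Downloads'] = df['Total Downloads'] / 1000   # downloads in thousands

x = np.arange(len(df['App Name']))   # one position per app
width = 0.25

fig, ax = plt.subplots()
ax.bar(x, df['App Prince in Rs'], width=width, label='App Prince in Rs')
ax.bar(x + width, df['Est Downloads'], width=width, label='Est Downloads')
ax.set_xticks(x + width / 2)                   # centre each label under its pair of bars
ax.set_xticklabels(list(df['App Name']))
ax.set_title('Multiple Bar Chart')
ax.set_ylabel('App Prince in Rs and Est Downloads (thousands)')
ax.legend(loc='best')
```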
3. Given a data frame df1 as shown below:
      1990  2000  2010
  a     52   340   890
  b     64   480   560
  c     78   688  1102
  d     94   766   889

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df1 = pd.DataFrame({1990: [52, 64, 78, 94],
                    2000: [340, 480, 688, 766],
                    2010: [890, 560, 1102, 889]}, index=['a', 'b', 'c', 'd'])

Write code to create:
(a) A scatter chart from the 1990 and 2010 columns of dataframe df1.
(b) A line chart from the 1990 and 2010 columns of dataframe df1.
(c) A bar chart plotting the three columns of dataframe df1.
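One possible answer, as a sketch. It assumes (a) means plotting the 1990 values against the 2010 values (one point per row), and it uses the pandas DataFrame.plot method for (c), which groups the bars of the three columns by row label. Call plt.show() afterwards to display the three figures.

```python
import matplotlib.pyplot as plt
import pandas as pd

df1 = pd.DataFrame({1990: [52, 64, 78, 94],
                    2000: [340, 480, 688, 766],
                    2010: [890, 560, 1102, 889]},
                   index=['a', 'b', 'c', 'd'])

# (a) Scatter chart: each row is one point, 1990 value on x, 2010 value on y
plt.figure()
plt.scatter(df1[1990], df1[2010], color='b', marker='o')
plt.title('Scatter chart of 1990 vs 2010')
plt.xlabel('1990')
plt.ylabel('2010')
ax_a = plt.gca()

# (b) Line chart: one line per column, plotted against the row labels a to d
plt.figure()
plt.plot(df1.index, df1[1990], marker='o')
plt.plot(df1.index, df1[2010], marker='o')
plt.title('Line chart of 1990 and 2010')
plt.xlabel('Row label')
plt.ylabel('Value')
plt.legend(['1990', '2010'], loc='best')
ax_b = plt.gca()

# (c) Bar chart of all three columns; pandas groups the bars by row label
ax_c = df1.plot(kind='bar', title='Bar chart of 1990, 2000 and 2010')
ax_c.set_xlabel('Row label')
ax_c.set_ylabel('Value')
```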
Four histogram charts of weight (titles and legend arguments are cut off in the source):
Chart 1:
plt.title('Simple Histogram Chart of …')
plt.xlabel('weight Bins or Interval')
plt.ylabel('weight Frequency values')
plt.legend(['weight Frequencies'], …)
Chart 2:
plt.title('Simple Histogram Chart of weight …')
plt.xlabel('weight Frequency values')
plt.ylabel('weight Bins or Interval')
plt.legend(['weight Frequencies'], …)
Chart 3:
plt.title('Simple Histogram Chart of weight …')
plt.xlabel('weight Bins or Interval')
plt.ylabel('weight Frequency values')
plt.legend(['weight Frequencies'], …)
Chart 4:
plt.title('Simple Histogram Chart of …')
plt.xlabel('weight Bins or Interval')
plt.ylabel('weight Frequency values')
plt.legend(['weight Frequencies'], …)
plt.legend()
plt.show()
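Chart 2 swaps the axis labels (frequencies on x, bins on y), which matches a horizontal histogram. Assuming that is what it showed, a sketch with hypothetical weights would pass orientation='horizontal' to plt.hist.

```python
import matplotlib.pyplot as plt

weight = [45, 48, 50, 52, 52, 55, 58, 60, 61, 65]   # hypothetical weights
bins = [40, 50, 60, 70]

plt.figure()
counts, edges, bars = plt.hist(weight, bins=bins, orientation='horizontal')
plt.title('Simple Histogram Chart of weight (horizontal)')
plt.xlabel('weight Frequency values')   # counts now run along the x axis
plt.ylabel('weight Bins or Interval')   # bins now run along the y axis
plt.legend(['weight Frequencies'], loc='best')
```

Each bar's width is now its count and its height is the bin width.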
Syntax and examples of various Pandas charts
import matplotlib.pyplot as plt
plt.title('Simple Histogram Chart of weight in frequency polygon')
plt.xlabel( 'weight Bins or Interval' )
plt.ylabel( 'weight Frequency values' )
plt.legend([ 'weight Frequencies' ], loc='best')
plt.savefig(r'd:\histchart.pdf')
plt.show()
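histtype='step' draws only the outline of the bars. Statistics texts often draw a frequency polygon differently, by joining the midpoints of the bin tops with straight lines. A sketch of that alternative, using hypothetical weights and np.histogram to compute the counts and bin edges, overlays both forms for comparison.

```python
import matplotlib.pyplot as plt
import numpy as np

weight = [45, 48, 50, 52, 52, 55, 58, 60, 61, 65]   # hypothetical weights
bins = [40, 50, 60, 70]

counts, edges = np.histogram(weight, bins=bins)   # counts per bin, same binning as plt.hist
midpoints = (edges[:-1] + edges[1:]) / 2          # centre of each bin

plt.figure()
plt.hist(weight, bins=bins, histtype='step', label="histtype='step' outline")
plt.plot(midpoints, counts, marker='o', label='frequency polygon (midpoints)')
plt.title('Simple Histogram Chart of weight in frequency polygon')
plt.xlabel('weight Bins or Interval')
plt.ylabel('weight Frequency values')
plt.legend(loc='best')
```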
Line chart:
plt.plot(x['month'], x['sales1'], color='g', marker='X', markersize=15, markeredgecolor='blue',
         linestyle='dashdot', linewidth=5)
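A runnable version of this call, with a small hypothetical DataFrame x (the source does not show the month and sales values). The returned Line2D object is read back to show how Matplotlib stores the styling; note that 'dashdot' is normalised to its short code '-.'.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly data; the source does not show the DataFrame x
x = pd.DataFrame({'month': ['Jan', 'Feb', 'Mar', 'Apr'],
                  'sales1': [12, 18, 15, 21]})

plt.figure()
lines = plt.plot(x['month'], x['sales1'], color='g', marker='X', markersize=15,
                 markeredgecolor='blue', linestyle='dashdot', linewidth=5)
line = lines[0]   # plt.plot returns a list with one Line2D per plotted series
print(line.get_linestyle(), line.get_linewidth(), line.get_marker())
```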