MCA - S3 - Data Visualisation - U5
Unit-05
Visualisation Using Pandas
Semester-03
Master of Computer Application
UNIT
Names of Sub-Units
Setting Up the Environment, Line Plot, Bar Plot, Stacked Plot, Histogram, Box Plot, Area Plot, Scatter
Plot, Hex Plot, Pie Plot, Scatter Matrix, Subplots
Overview
The unit begins by setting up the pandas environment for visualising data. Next, the unit discusses
how to create a line plot, bar plot and stacked plot in Python using pandas. Further, the unit discusses
the functions for creating a histogram, box plot and area plot using pandas. The unit also discusses how
to create a scatter plot, hex plot and pie plot using pandas. Towards the end, the unit explains how to
build a scatter matrix and subplots in Python using pandas.
Learning Objectives
Learning Outcomes
http://blaqueyard.com/download/Python%20Data%20Visualization%20Cookbook.pdf
5.1 INTRODUCTION
Data visualisation is perhaps the most critical phase in the whole data science, big data, or machine
learning life cycle. When we use colours and images to present a study or analysis, it becomes more
striking, engaging, and understandable. Using visual components such as graphs, charts, and maps,
clients can better comprehend the underlying structure, trends, patterns, and correlations among the
parameters in a dataset. A good visualisation gives us a clear and precise picture of what the data is
trying to tell us. It simplifies the data so that we can grasp the insights it holds.
Pandas is an open-source Python package for data manipulation and analysis. It is a fast and
powerful tool that lets you manipulate numerical tables and time series using its data structures and
operations. Combining, restructuring, selecting, data cleansing, and data wrangling are forms of data
manipulation. With this library, data can be imported from a variety of sources and file formats,
including SQL, MS Excel, and comma-separated values (CSV).
Linux & Mac.
Canopy is available as a free as well as a commercial distribution with the full SciPy stack
for Windows, Linux & Mac.
Python (x,y) is a free Python distribution with the SciPy stack & Spyder IDE for Windows OS.
After this, matplotlib is installed for creating charts.
For Ubuntu Users: The command to install pandas on Ubuntu is as follows:
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose
For Fedora Users: The command to install pandas on Fedora is as follows:
sudo yum install numpy scipy python-matplotlib ipython python-pandas sympy python-nose atlas-devel
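Once the packages are installed, a quick import check can confirm that the environment is ready. The following is a minimal sketch, not part of the original unit; the list of packages checked (pandas, matplotlib and numpy) is an assumption based on what the rest of this unit uses.

```python
import importlib

# Packages assumed to be needed for this unit (an assumption, not a list from the text)
required = ["pandas", "matplotlib", "numpy"]

versions = {}
for name in required:
    try:
        module = importlib.import_module(name)
        # Record the installed version, if the package exposes one
        versions[name] = getattr(module, "__version__", "unknown")
    except ImportError:
        versions[name] = None  # the package is not installed

for name, version in versions.items():
    print(name, version if version else "NOT INSTALLED")
```

If any package is reported as not installed, rerun the installation command for your platform before continuing.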
Line charts are used to plot continuous data in the form of lines. Each point on a line chart
corresponds to a value. A line chart can show any number of data series (that is, continuous related data
in a column), and you can distinguish the lines by using different colours or line styles. For instance,
plotting the budget and expenses of an organisation as a line chart may enable you to identify cost
fluctuations. To represent data, a line chart uses a horizontal axis (x-axis) and a vertical axis (y-axis).
Line plots can be created directly from pandas DataFrames using the DataFrame.plot.line() function. The
syntax for the line plot is as follows:
DataFrame.plot.line(x=None, y=None, **kwargs)
where,
x: Represents the x-axis data
y: Represents the y-axis data
**kwargs: Refers to the additional keyword arguments that are documented in DataFrame.plot().
The output of the given program is as follows:
[Figure: line plot; y-axis from 120 to 190, x-axis from 0.0 to 4.0]
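The budget-and-expenses case mentioned earlier can be sketched with DataFrame.plot.line() as follows. This is an illustrative example only: the column names and figures are hypothetical and do not come from the unit's dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import pandas as pd

# Hypothetical data: yearly budget and expenses of an organisation
df = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2021, 2022],
    "Budget": [120, 135, 150, 170, 190],
    "Expenses": [110, 140, 145, 160, 185],
})

# x selects the column for the horizontal axis; y lists the series to draw as lines
ax = df.plot.line(x="Year", y=["Budget", "Expenses"])
ax.set_ylabel("Amount")
```

Each column named in y becomes one line, so the two series can be compared directly. In an interactive session, matplotlib.pyplot.show() displays the chart.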
The output of the given program is as follows:
[Figure: plot output; x-axis from 0 to 4, image not recovered]
5.6 HISTOGRAM
A histogram shows data in the form of frequencies within a distribution. Each column in
a histogram is known as a bin. Histograms are used to represent continuous data, and they
make it easy to analyse how the data falls within various ranges. The function syntax
for creating a histogram is as follows:
DataFrame.plot.hist(by=None, bins=10, **kwargs)
where,
by [str or sequence, optional]: Refers to the column in the DataFrame by which the data is grouped.
bins [int, default 10]: Refers to the number of histogram bins to be used.
**kwargs: Refers to the additional keyword arguments that are documented in DataFrame.plot().
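A minimal sketch of DataFrame.plot.hist() is given below to show the effect of the bins argument. The marks data is hypothetical and is used only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import pandas as pd

# Hypothetical data: marks scored by 15 students
df = pd.DataFrame({
    "Marks": [35, 42, 47, 51, 55, 58, 60, 62, 65, 68, 71, 74, 78, 83, 91],
})

# Split the range of marks into 5 equal-width bins and count the values in each
ax = df.plot.hist(bins=5)
ax.set_xlabel("Marks")
```

Increasing bins gives narrower ranges and a more detailed view of the distribution; decreasing it gives a coarser summary.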
backend [str, default None]: Specifies the backend to use in place of the one set in the plotting.backend option
**kwargs: Refers to the additional plotting keyword arguments that are passed to matplotlib.pyplot.boxplot().
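As an illustration of a box plot drawn from a DataFrame, the sketch below uses DataFrame.boxplot(). The subject names and marks are hypothetical, and only the column and grid arguments are shown.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import pandas as pd

# Hypothetical data: marks in two subjects
df = pd.DataFrame({
    "Maths": [45, 52, 58, 61, 64, 70, 73, 88],
    "Science": [50, 55, 57, 60, 66, 68, 75, 79],
})

# Draw one box per selected column; grid=False hides the background grid
ax = df.boxplot(column=["Maths", "Science"], grid=False)
ax.set_ylabel("Marks")
```

Each box spans the first to the third quartile, the line inside it marks the median, and the whiskers show the spread of the remaining values.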
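import pandas as pd
# Load the records from the CSV file into a DataFrame
dataframe = pd.read_csv("BP_Record.csv")
# Draw a stacked area plot with one area per numeric column
dataframe.plot.area()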
The output of the given program is as follows:
[Figure: area plot]
where,
x: Refers to the column name to be used as the horizontal coordinate of each point
y: Refers to the column name to be used as the vertical coordinate of each point
s: Specifies the size of the dots
c: Specifies the colour of the dots
**kwargs: Refers to the additional keyword arguments that are documented in DataFrame.plot().
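The parameters listed above can be tried out with DataFrame.plot.scatter(), as in this minimal sketch. The height and weight values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import pandas as pd

# Hypothetical data: height (cm) and weight (kg) of six people
df = pd.DataFrame({
    "Height": [150, 158, 163, 170, 176, 182],
    "Weight": [48, 55, 60, 68, 74, 81],
})

# x and y name the columns to plot; s sets the dot size and c the dot colour
ax = df.plot.scatter(x="Height", y="Weight", s=40, c="red")
```

Each row of the DataFrame becomes one dot, so the chart has as many dots as the DataFrame has rows.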
5.11 PIE PLOT
A pie chart is used to show the relative proportion that each value in a single data series contributes
to a whole. Pie charts are most effective when representing a small amount of data.
The chart presents information and statistics as pie slices. Each slice represents a value
as a percentage, and the total of all slices should equal 100%.
The output of the given program is as follows:
[Figure: pie chart titled "Partywise Votes"]
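A pie chart of this kind can be sketched with Series.plot.pie(). The example below is not the program that produced the figure above; the party names and vote counts are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import pandas as pd

# Hypothetical data: votes received by four parties
votes = pd.Series(
    [420, 310, 180, 90],
    index=["Party A", "Party B", "Party C", "Party D"],
    name="Votes",
)

# Share of each party as a percentage of the whole
shares = votes / votes.sum() * 100

# autopct prints the percentage on every slice
ax = votes.plot.pie(autopct="%1.1f%%", title="Party-wise votes")
```

Because each slice is drawn in proportion to its share of the total, the percentages printed on the slices add up to 100%.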
It is important to check for correlation among the independent variables used in regression analysis
during data pre-processing. Scatter plots make it very easy to understand the correlation
between the features. Pandas provides analysts with the scatter_matrix() function to create these
plots. It is also used to determine whether the correlation is positive or negative.
ax [Matplotlib axis object, optional]: Specifies the axis object for matplotlib
grid [bool, optional]: Set this option to True to display the grid
diagonal [{‘hist’, ‘kde’}]: Allows you to select either hist for a histogram plot or kde for a kernel density
estimation plot in the diagonal
marker [str, optional]: Specifies the marker type for matplotlib
density_kwds [keywords]: Specifies the keyword arguments that are passed to the kernel density estimate plot
range_padding [float, default 0.05]: Specifies the relative extension of the axis range in x and y
**kwargs: Refers to the additional keyword arguments that are passed to the scatter function
[Figure: scatter matrix of the Pulse, Maxpulse and Calories columns]
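A scatter matrix can be sketched with pandas.plotting.scatter_matrix() as follows. The column names mirror the labels in the figure, but the values are hypothetical and are not taken from the unit's dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data: three numeric columns from an exercise log
df = pd.DataFrame({
    "Pulse": [90, 95, 100, 105, 110, 115],
    "Maxpulse": [120, 128, 131, 140, 146, 150],
    "Calories": [200, 240, 250, 300, 330, 360],
})

# One scatter plot per pair of columns, with a histogram of each column on the diagonal
axes = scatter_matrix(df, diagonal="hist", grid=True, range_padding=0.05)

# The correlation matrix gives the sign and strength of each relationship as numbers
corr = df.corr()
```

The function returns a grid of axes with one row and one column per numeric variable. An upward-sloping cloud of points in a cell indicates a positive correlation, which corr confirms numerically.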
5.13 SUBPLOTS
Python subplots are a great tool for data visualisation because they give you a lot of flexibility
in how data is shown. The subplots() function generates a figure and a grid of subplots. It is a
wrapper function that makes it easy to generate standard subplot layouts in a single call, including the
containing figure object.
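The following minimal sketch uses matplotlib.pyplot.subplots() to create a figure with two subplots and draws a pandas plot in each through the ax argument. The sales data and titles are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: quarterly sales of two products
df = pd.DataFrame({
    "Product A": [20, 25, 30, 28],
    "Product B": [15, 18, 22, 27],
})

# One call creates the figure and a 1 x 2 grid of axes
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

# Pass each axes object to a pandas plotting function through the ax argument
df.plot.line(ax=axes[0], title="Line plot")
df.plot.bar(ax=axes[1], title="Bar plot")
```

For the simpler case of one subplot per column, DataFrame.plot() also accepts subplots=True.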
and verify assumptions.
7. Prepare the different types of plot or chart that are best suited on the imported dataset.
Data visualisation is perhaps the most critical phase in the whole data science, big data, or machine
learning life cycle.
Pandas is an open-source Python package for data manipulation and analysis.
Combining, restructuring, choosing, data cleansing, and data wrangling are forms of data
manipulation procedures.
The standard Python distribution does not come bundled with the pandas module.
Line charts are used to plot continuous data in the form of lines.
Line plots can be created directly from pandas DataFrames using the DataFrame.plot.line() function.
A bar chart is a visual presentation of categorical data.
Bar plots are created directly from pandas DataFrames using the DataFrame.plot.bar() function.
A stacked bar graph is another name for a stacked bar chart. It is a graph that compares different
sections of a whole.
A histogram shows data in the form of frequencies within a distribution.
A boxplot, often known as a box and whisker plot, is a visual representation of a data set’s spread
and centres.
In an area chart, areas are used to represent values. It is similar to a line chart in that it displays a
series as a set of points connected by a line.
A scatter plot is a diagram drawn between a pair of variables X and Y on a
2-dimensional plane.
A hex plot (hexagonal bin plot) groups the points of a scatter plot into hexagonal bins and colours
each bin according to the number of points it contains.
A pie chart is used to show relative proportions or contributions to a whole, which is contributed by
each value in a single data series.
A scatter matrix, also referred to as a pairs plot, succinctly plots all the numeric
variables in a dataset against one another.
Python subplots are a great tool for data visualisation because they give you a lot of flexibility
in how data is shown.
5.16 GLOSSARY
Chart: A graphical representation of data in which the data is represented
by indicators or symbols
Scatter chart: It is used to show the relationship between the numeric values in two data series
Histogram chart: It is used to show data in the form of frequencies within a distribution
scatter matrix technique to examine the dataset. Refer to Section Scatter Matrix
Discuss with your friends how to create different plots or charts in Python using pandas.