Class 1 Data Visualization in Python using matplotlib

This document provides a comprehensive tutorial on data visualization in Python using the matplotlib library. It explains the importance of data visualization, the advantages of using Python for this purpose, and details the various types of plots that can be created with matplotlib. Additionally, it outlines the steps for creating different types of charts, including line charts, pie charts, and scatter plots, along with example code snippets.

Uploaded by

Sanjay Kumar
Copyright
© All Rights Reserved

Data Visualization in Python using matplotlib

Objectives
In this tutorial, we will learn to:

• Explain what data visualization is and why it is important today
• Understand why Python is considered one of the best tools for data visualization
• Describe matplotlib and its data visualization features in Python
• List the types of plots and the steps involved in creating them

Data Visualization
Data visualization is the technique of presenting data in a pictorial or graphical format. It
enables stakeholders and decision makers to analyze data visually. Data in a graphical
format allows them to identify new trends and patterns easily.

Example
Let's look at an example.
Suppose you are a sales manager in a leading global organization. The organization plans
to study the sales details of each product across all regions and countries in order
to identify the product with the highest sales in a particular region and step up
its production. This research will enable the organization to increase the
manufacturing of that product in that region. The data involved in this
research might be huge and complex, and analyzing such large numeric data
manually is difficult and time-consuming.
When this numerical data is plotted on a graph or converted to charts, it is easy
to identify the patterns and predict the results accurately.

The main benefits of data visualization are as follows:

• It simplifies complex quantitative information
• It helps analyze and explore big data easily
• It identifies the areas that need attention or improvement
• It identifies the relationships between data points and variables
• It explores new patterns and reveals hidden patterns in the data

Three major considerations for Data Visualization:
• Clarity
• Accuracy
• Efficiency
Clarity ensures that the data set is complete and relevant. This enables the data
scientist to use the new patterns yielded by the data in the relevant places.
Accuracy ensures that an appropriate graphical representation is used to convey the right
message.
Efficiency means using a visualization technique that highlights all the data
points.
There are some basic factors that one needs to be aware of before
visualizing the data:
• Visual Effect
• Coordinate System
• Data Types and Scale
• Informative Interpretation

Visual Effect includes the use of appropriate shapes, colors, and sizes to
represent the analyzed data.
The Coordinate System helps to organize the data points within the provided
coordinates.
Data Types and Scale concerns choosing the type of data, such as numeric or
categorical.
Informative Interpretation helps create visuals in an effective and easily
interpretable manner using labels, titles, legends, and pointers.
So far we have covered what data visualization is and how it helps interpret results with
large and complex data. With the help of the Python programming language, we can
perform this data visualization.
We'll learn more about how to visualize data using the Python programming language
below.

Python Libraries

Many new Python data visualization libraries have been introduced recently, such as matplotlib,
VisPy, Bokeh, Seaborn, Pygal, Folium, and NetworkX. Of these, matplotlib has emerged as the
main data visualization library.

matplotlib
matplotlib is a two-dimensional plotting library for Python, used for data visualization and for creating
interactive graphics or plots. Using Python's matplotlib, the data visualization of large and
complex data becomes easy.

matplotlib Advantages
There are several advantages of using matplotlib to visualize data.

• It is a multi-platform data visualization tool built on NumPy and designed to work with the
SciPy stack, which makes it fast and efficient.
• It works well with many operating systems and graphics
backends.
• It produces high-quality graphics and plots to print and view for a range of graphs
such as histograms, bar charts, pie charts, scatter plots, and heat maps.
• With Jupyter notebook integration, developers are free to spend their
time implementing features rather than struggling with compatibility.
• It has large community support and cross-platform support, as it is an open source
tool.
• It gives full control over graph or plot styles such as line properties, fonts, and
axes properties.

Understanding the Plot

Let's now try to understand a plot. A plot is a graphical representation of
data, which shows the relationship between two variables or
the distribution of data.
Example
Consider a line plot with random numbers on the y-axis and their range on the x-axis. The
background of the plot is called a grid. The text "First Plot" is the title of the plot, and the
text "line one" is the legend.

We can create a plot using four simple steps.

The first step is to import the required libraries.

Here we import numpy, and pyplot and style from matplotlib:
numpy is used to generate the random numbers.
pyplot is used to plot the numbers.
The style module is used for setting the grid style.
The %matplotlib inline magic command is required to display the plot within a Jupyter notebook.

The second step is to define or import the required data set.

Here we define the data set of random numbers using numpy's random method. Note that
the range is ten, so ten numbers are generated. We use the print method to view the created random numbers.

The third step is to set the plot parameters.

In this step, we set the style of the plot, the labels of the coordinates, the title of the plot, the
legend, and the line width.
In this example, we use ggplot as the plot style. The plot method is used to plot the
graph against the random numbers. In the plot method, the string 'g' sets the plot line
color to green, the label argument sets the legend label, here named line one, and
linewidth=2 sets the line width. Note that we have also labeled the x-axis as range, labeled the
y-axis, and set the title to First Plot.

The last step is to display the created plot.

Use the legend method to draw the legend from the labels set earlier, and the show method
to display the created plot.
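
The code for the four steps is not reproduced in this document, so the following is only a minimal sketch of what they might look like together. The alias pl, the variable name random_numbers, the use of np.random.rand, and the y-axis label "Numbers" are assumptions made for illustration.

```python
# Step 1: import the required libraries
import numpy as np
import matplotlib.pyplot as pl
from matplotlib import style
# In a Jupyter notebook, also run the magic command: %matplotlib inline

# Step 2: define the data set (ten random numbers between 0 and 1)
random_numbers = np.random.rand(10)
print(random_numbers)

# Step 3: set the plot parameters
style.use("ggplot")                      # ggplot style draws a grid background
pl.plot(random_numbers, "g", label="line one", linewidth=2)
pl.xlabel("Range")
pl.ylabel("Numbers")                     # assumed label, not given in the text
pl.title("First Plot")

# Step 4: draw the legend and display the plot
pl.legend()
pl.show()
```

Because only the y values are passed to plot( ), matplotlib uses the positions 0 to 9 as the x values, which is the range shown on the x-axis.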

Importing PyPlot:
In order to use pyplot on your computer for data visualization, you first need to import it into
your Python environment by issuing one of the following commands:
import matplotlib.pyplot
This requires you to refer to every pyplot command as matplotlib.pyplot.<command>.
For example: matplotlib.pyplot.plot(x,y)

import matplotlib.pyplot as pl
With this form, you can refer to every pyplot command as pl.<command>.
For example: pl.plot(x,y)

Working with PyPlot Methods


The PyPlot interface provides many methods for 2D plotting of data. The matplotlib’s
PyPlot interface lets one plot data in multiple ways such as:
i. Line Charts
ii. Bar Charts
iii. Pie Charts
iv. Scatter Charts
v. Histogram Plot
vi. BoxPlot
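You can easily plot data available in the form of NumPy arrays (ndarrays) or
pandas DataFrames.
For example:
import numpy as np
import matplotlib.pyplot as pl
x=np.linspace(1,5,6)   # six evenly spaced values from 1 to 5
y=np.log(x)            # natural logarithm of each value
pl.plot(x,y)           # line chart of log(x) against x
pl.show()              # display the chart
Output is: a line chart of the curve y = log(x) for x from 1 to 5.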

Line Chart:
Line charts work out of the box with matplotlib. You can have multiple lines in a line
chart, change the color, change the type of line, and much more. Line charts are one of
the many chart types matplotlib can create.
The plot( ) function is used to create a line chart using PyPlot.
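
As an illustration of several lines on one chart, here is a small sketch. The data and the names x, squares, and doubles are made up for this example.

```python
import matplotlib.pyplot as pl

x = [1, 2, 3, 4, 5]
squares = [n ** 2 for n in x]   # 1, 4, 9, 16, 25
doubles = [2 * n for n in x]    # 2, 4, 6, 8, 10

# Each call to plot() adds one more line to the same chart
pl.plot(x, squares, color="b", linestyle="solid", label="squares")
pl.plot(x, doubles, color="r", linestyle="dashed", label="doubles")
pl.legend()                     # shows which line is which
pl.show()
```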

PieChart:
A pie chart shows how the categories of a data set make up a whole, that is,
the composition of something. A pie chart represents numbers as percentages, and the
segments together always add up to 100%.
The pie( ) function is used to create a pie chart using PyPlot.
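
A minimal pie chart might look like the sketch below. The subjects and study hours are invented sample data; pie( ) scales the values itself so that the wedges together make up the whole circle.

```python
import matplotlib.pyplot as pl

subjects = ["Maths", "Science", "English", "Art"]
hours = [8, 6, 4, 2]            # made-up weekly study hours

# autopct prints the percentage share on each wedge
pl.pie(hours, labels=subjects, autopct="%1.1f%%")
pl.title("Share of Study Time")
pl.show()
```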

Scatterplot:
A scatter chart shows the relationship between two different variables, and it can reveal
distribution trends. It should be used when there are many different data points and you
want to highlight similarities in the data set. This is useful when looking for outliers and for
understanding the distribution of your data.
The scatter( ) function is used to create a scatter plot using PyPlot.
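
The following sketch shows one possible scatter plot. The two lists are invented sample data, chosen so that the last point stands out from the general trend.

```python
import matplotlib.pyplot as pl

hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [35, 42, 50, 48, 62, 70, 75, 95]   # made-up marks

# One marker is drawn for each (hours, marks) pair
pl.scatter(hours_studied, marks, color="b", marker="o")
pl.xlabel("Hours Studied")
pl.ylabel("Marks")
pl.title("Marks against Hours Studied")
pl.show()
```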

Bar Chart:
A bar chart or bar graph presents categorical data with rectangular
bars whose heights or lengths are proportional to the values they represent. The bars can be
plotted vertically or horizontally.
The bar( ) function is used to create a vertical bar chart and barh( ) a horizontal one using PyPlot.
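
Here is a short sketch of both orientations, using invented regional sales figures. Each chart is drawn in its own figure so that the two do not overlap.

```python
import matplotlib.pyplot as pl

regions = ["North", "South", "East", "West"]
sales = [250, 180, 310, 220]    # made-up sales figures

# Vertical bars: categories on the x-axis, values as bar heights
pl.figure()
pl.bar(regions, sales, color="c")
pl.title("Sales by Region (vertical)")

# Horizontal bars: categories on the y-axis, values as bar lengths
pl.figure()
pl.barh(regions, sales, color="m")
pl.title("Sales by Region (horizontal)")

pl.show()
```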

Histogram Plot:
A histogram is a type of graph that provides a visual
interpretation of numerical data by indicating the number of data points that lie within each
range of values. The hist( ) function is used to create a histogram plot.
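
As a sketch, the histogram below counts invented marks in ranges of ten. The list marks and the chosen bin edges are assumptions for illustration.

```python
import matplotlib.pyplot as pl

marks = [45, 52, 58, 61, 64, 67, 68, 70, 72, 75, 78, 81, 85, 90, 96]

# bins lists the edges of the ranges: 40-50, 50-60, ..., 90-100
counts, bin_edges, bars = pl.hist(marks, bins=[40, 50, 60, 70, 80, 90, 100],
                                  edgecolor="black")
print(counts)                   # number of marks that fall in each range
pl.xlabel("Marks")
pl.ylabel("Number of Students")
pl.title("Distribution of Marks")
pl.show()
```

hist( ) returns the count for each range, the bin edges, and the drawn bars, so the counts can be inspected as well as plotted.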

BoxPlot Charts:
A BoxPlot is the visual representation of the statistical five-number
summary of a given data set (minimum, first quartile, median, third quartile, and maximum).
The boxplot( ) function is used to create a boxplot using PyPlot.
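
The sketch below draws a boxplot for invented marks and also prints the five-number summary with NumPy, so that the numbers can be compared with the picture. The data and variable names are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as pl

marks = [45, 52, 58, 61, 64, 67, 68, 70, 72, 75, 78, 81, 85, 90, 96]

# Five-number summary: minimum, first quartile, median, third quartile, maximum
summary = np.percentile(marks, [0, 25, 50, 75, 100])
print(summary)

# The box spans the quartiles and the line inside the box marks the median
pl.boxplot(marks)
pl.ylabel("Marks")
pl.title("Boxplot of Marks")
pl.show()
```

By default the whiskers stop at the furthest data points within 1.5 times the box height from the box, and any points beyond that are drawn as individual outliers.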

Creating Line Chart using plot( )

Steps for Creating Line Chart using plot( ):
i. Import Matplotlib
Syntax is:
import matplotlib.pyplot as pl
ii. Specify the data and plot it:
For example:
a=[20,40,60,80]
b=[1,2,3,4]
pl.plot(a,b)

iii. xlabel("String")- To specify the label for the x-axis or horizontal axis.
Syntax:
pl.xlabel("Qtr")
iv. ylabel("String")- To specify the label for the y-axis or vertical axis.
Syntax:
pl.ylabel("Sales")
v. title( )- To display the title of the chart.
Syntax:
pl.title("String value")
vi. figure(figsize=(<width>,<height>))- To change the plot size as per your requirement.
Width and height are specified in inches. Call figure( ) before plot( ), because it creates a new, empty figure.
For example:
pl.figure(figsize=(20,15))
vii. grid(True)- To show the grid on the plot.
For example:
pl.grid(True)
viii. legend( )- To display legends on the chart. It uses the label given to each plot( ) call.
Syntax is:
pl.legend()
ix. show( )- To show the plot as per the given specification. Call it last.
Syntax is:
pl.show()
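
Put together, the steps above might look like the following sketch. The lists qtr and sales and the title text are made up for illustration. Note the order of the calls: figure( ) comes first because it starts a new, empty figure, and show( ) comes last.

```python
import matplotlib.pyplot as pl

qtr = [1, 2, 3, 4]
sales = [20, 40, 60, 80]                 # made-up sales per quarter

pl.figure(figsize=(8, 4))                # new figure, 8 x 4 inches
pl.plot(qtr, sales, label="Sales")       # label is picked up by legend()
pl.xlabel("Qtr")
pl.ylabel("Sales")
pl.title("Quarterly Sales")
pl.grid(True)
pl.legend()
pl.show()
```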

Applying Various Settings in plot( ) function:

plot( ) allows you to specify multiple settings for a chart/graph, such as:

i. color- used to specify the colour of the line.
Syntax is:
pl.plot(x, y, color='r')
Colour codes for different colours in Matplotlib are:
'b' blue, 'g' green, 'r' red, 'c' cyan, 'm' magenta, 'y' yellow, 'k' black, 'w' white

ii. linewidth- used to specify the width of the line in a chart.
Syntax is:
pl.plot(x, y, linewidth=<value in points>)
For example:
pl.plot(x, y, linewidth=2)
iii. linestyle- used to specify the line style in a chart.
Syntax is:
pl.plot(x, y, linestyle='style')
Here style may be:
a. solid for a solid line.
b. dashed for a dashed line.
c. dotted for a dotted line.
d. dashdot for a dash-dotted line.
For example:
pl.plot(x, y, linestyle='dotted')
iv. markers- The symbols drawn at the data points on a chart/graph are called markers.
a. marker- used to specify a valid marker type.
b. markersize- used to specify the size of the marker in points.
c. markeredgecolor- used to specify the edge colour of the marker.
Syntax is:
pl.plot(x, y, marker='valid type', markersize=<value in points>, markeredgecolor='valid colour')
For example:
pl.plot(x, y, marker='x', markersize=4, markeredgecolor='blue')
Different marker types for plotting include:
'.' point, 'o' circle, 'x' x, '+' plus, '*' star, 's' square, 'D' diamond, '^' triangle up, 'v' triangle down, 'P' filled plus
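
The colour, line width, line style, and marker settings can all be given in a single plot( ) call. The sketch below uses invented data in x and y to show one such combination.

```python
import matplotlib.pyplot as pl

x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 6, 5]             # made-up values

# Red dotted line of width 2, with x-shaped markers outlined in blue
pl.plot(x, y, color="r", linewidth=2, linestyle="dotted",
        marker="x", markersize=8, markeredgecolor="blue")
pl.show()
```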

When you do not specify markeredgecolor separately in plot( ), by default it takes the
same colour as that of the line. Also, if you give a combined line-colour-and-marker-style
string such as 'r+' without specifying a linestyle, Python will plot only the
markers and not the line. To get the line as well, specify the linestyle argument.
For example, when linestyle is specified:
import matplotlib.pyplot as pl
marks=[78,67,98,55,88]
test=[1,2,3,4,5]
pl.xlabel('test')
pl.ylabel('Marks')
pl.title('Marks Analysis')
# yellow solid line of width 4 with filled-plus markers outlined in red
pl.plot(test,marks,color='y',linestyle='solid',
        linewidth=4,marker='P',markeredgecolor='r')
pl.show()
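
For contrast, the following sketch draws the same marks twice with the combined string 'r+': once on its own, which gives markers only, and once with a linestyle added, which gives markers joined by a line. The two titles are made up for illustration.

```python
import matplotlib.pyplot as pl

test = [1, 2, 3, 4, 5]
marks = [78, 67, 98, 55, 88]

# 'r+' alone: red plus markers, no connecting line
pl.figure()
pl.plot(test, marks, "r+")
pl.title("Markers only")

# 'r+' with a linestyle: red plus markers joined by a solid line
pl.figure()
pl.plot(test, marks, "r+", linestyle="solid")
pl.title("Markers and line")

pl.show()
```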

Program Based on Line Chart:
Q1 Write a program to plot a line chart to depict the changing weekly sales for four
weeks. Give appropriate axes labels.
import matplotlib.pyplot as pl
WeekNo=[1,2,3,4]
Sales=[1000,3000,5000,3000]
pl.figure(figsize=(10,5))   # create the figure first so the labels below apply to it
pl.xlabel("WeekNo")
pl.ylabel("Sales")
pl.title("Weekly Sales Chart")
pl.grid(True)
pl.plot(WeekNo,Sales,color='g')
pl.show()

Q2 Marks is a list that stores the marks of a student in 10 unit tests. Write a program to
plot the student's performance in these 10 unit tests.
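
No solution is given for Q2 in this document. One possible approach is sketched below; the values in marks are hypothetical sample data, and the names unit_tests and marks are chosen only for illustration.

```python
import matplotlib.pyplot as pl

marks = [62, 70, 58, 75, 81, 66, 90, 85, 78, 88]   # hypothetical marks
unit_tests = list(range(1, 11))                     # unit tests 1 to 10

pl.figure(figsize=(10, 5))
pl.plot(unit_tests, marks, color="b", marker="o")
pl.xticks(unit_tests)           # one tick for every unit test
pl.xlabel("Unit Test")
pl.ylabel("Marks")
pl.title("Performance in Unit Tests")
pl.grid(True)
pl.show()
```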

Q3 Write a program in Python to draw line charts from the given financial data of
ABC Co. for 5 days, in the form of a DataFrame fdf as shown below:
    Day1   Day2   Day3   Day4   Day5
0  74.25  56.03  59.30  69.00  89.65
1  76.06  68.71  72.07  78.47  79.65
2  69.50  62.89  77.65  65.53  80.75
3  72.55  56.42  66.46  76.85  85.08
Solution:
import matplotlib.pyplot as pl
import pandas as pd
d1=[74.25,76.06,69.50,72.55]
d2=[56.03,68.71,62.89,56.42]
d3=[59.30,72.07,77.65,66.46]
d4=[69.00,78.47,65.53,76.85]
d5=[89.65,79.65,80.75,85.08]
day1=['Day1','Day2','Day3','Day4','Day5']
dict1={'Day1':d1,'Day2':d2,'Day3':d3,'Day4':d4,'Day5':d5}
fdf=pd.DataFrame(dict1,columns=day1)
print(fdf)
fdf.plot()   # one line per column (Day1 to Day5), plotted against the row index
pl.xlabel('Row index')
pl.ylabel('Sales')
pl.title('Multiple lines on same plot with suitable legends')
pl.legend()
pl.show()
