Data Science with Python
Data Visualization in Python using
matplotlib
Learning Objectives
By the end of this lesson, you will be able to:
Explain data visualization and its importance
Illustrate why Python is considered one of the best data
visualization tools
Describe matplotlib and its data visualization features in
Python
List the types of plots and the steps involved in creating
these plots
Data Visualization
Data visualization is a technique to present the data in a pictorial or graphical format.
Why is data visualization important? Consider the following scenario.
You are a Sales Manager in a leading global organization. The organization plans to study the sales details of each
product across all regions and countries in order to identify the product with the highest sales in a particular
region. This research will enable the organization to increase the production of that product in that region.
Data Visualization
The main benefits of data visualization are:
Considerations of Data Visualization
Three major considerations for data visualization:
• Ensure the dataset is complete and relevant. This enables the Data Scientist to use the new
patterns obtained from the data in the relevant places.
• Ensure you use appropriate graphical representation to convey the intended message.
• Use efficient visualization techniques that highlight all the data points.
Factors of Data Visualization
There are some basic factors that one needs to be aware of before visualizing the data:
• Visual effect: the use of appropriate shapes, colors, and sizes to represent the analyzed data.
• Coordinate system: helps organize the data points within the provided coordinates.
• Data types and scale: the type of data chosen, for example, numeric or categorical.
• Informative interpretation: helps create visuals in an effective and easily interpretable manner using labels,
title, legends, and pointers.
Data Visualization Tool: Python
So far, the lesson has covered:
• What data visualization is
• How data visualization helps interpret results with large data
How is data visualization performed for large and complex data?
Python Libraries
Many new Python data visualization libraries have been introduced recently, such as:
• matplotlib
• vispy
• pygal
• bokeh
• folium
• seaborn
• networkx
Python Libraries: matplotlib
matplotlib is a Python 2D plotting library. Using matplotlib, the data visualization of large and complex data becomes easy.
Advantages of using matplotlib to visualize data:
• Is a multi-platform data visualization tool; therefore, it is fast and efficient
• Can work well with many operating systems and graphics back ends
• Has high-quality graphics and plots to print and view a range of graphs
• Has large community support and cross-platform support as it is an open source tool
• Has full control over graphs or plot styles
• With Jupyter notebook integration, the developers are free to spend their time implementing features
The Plot
A plot is a graphical representation of data, which shows the relationship between two variables or the distribution of
data.
[Figure: anatomy of a plot titled “First Plot”, with callouts for the title, legend, grid, y-axis (“Numbers”), and x-axis (“Range”)]
Steps to Create a Plot
You can create a plot using four simple steps.
Step 01: Import the required libraries
Step 02: Define or import the required dataset
Step 03: Set the plot parameters
Step 04: Display the created plot
Steps to Create Plot: Example
[Figure: line plot titled “First Plot”; y-axis “Numbers” (0.2 to 1.1), x-axis “Range” (0 to 7)]
Step 01: Import the required libraries
• numpy, pyplot, and style
Step 02: Define or import the required dataset
• Generate random numbers using the numpy random method
• View the created random numbers using the print() method
Step 03: Set the plot parameters
• Set the style (ggplot) and the grid style
• Plot the numbers and set the line width
• Set the legend, the coordinate labels, and the title
Step 04: Display the created plot
• Plot the graph
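The four steps can be sketched as follows. This is a minimal illustration rather than the lesson's own notebook: the function name first_plot, the number of points, and the legend label “Line 1” are assumptions, while the ggplot style, the title “First Plot”, and the axis labels come from the slide.

```python
# Step 01: import the required libraries.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style


def first_plot(n_points=8):
    """Build the example plot and return its Axes (hypothetical helper)."""
    # Step 02: define the dataset with the numpy random method and view it.
    numbers = np.random.rand(n_points)
    print(numbers)

    # Step 03: set the plot parameters.
    style.use("ggplot")                      # ggplot style, includes a grid
    fig, ax = plt.subplots()
    ax.plot(numbers, label="Line 1", linewidth=2)
    ax.set_xlabel("Range")
    ax.set_ylabel("Numbers")
    ax.set_title("First Plot")
    ax.legend()
    return ax


if __name__ == "__main__":
    first_plot()
    # Step 04: display the created plot.
    plt.show()
```

Because the numbers are random, the line looks different on every run.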
Line Properties
Line properties control the graphics of a plotted line. They include alpha (sets the transparency of the line), animated, linestyle, linewidth, and marker style.
matplotlib also offers various line colors.
Line properties and their value types:
Property             Value Type
alpha                float
animated             [True | False]
antialiased or aa    [True | False]
clip_box             a matplotlib.transform.Bbox instance
clip_on              [True | False]
clip_path            a Path instance and a Transform instance, a Patch
color or c           any matplotlib color
contains             the hit testing function
dash_capstyle        ['butt' | 'round' | 'projecting']
linestyle or ls      [ '-' | '--' | '-.' | ':' | 'steps' | ...]
linewidth or lw      float value in points
marker               [ '+' | ',' | '.' | '1' | '2' | '3' | '4' ]
Color aliases:
Alias    Color
b        Blue
r        Red
c        Cyan
m        Magenta
g        Green
y        Yellow
k        Black
w        White
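As a rough illustration of how these properties are passed, the sketch below sets several of them as keyword arguments to plot(). The helper name styled_line and the data values are made up for the example.

```python
import matplotlib.pyplot as plt


def styled_line():
    """Draw one line using several line properties (hypothetical helper)."""
    fig, ax = plt.subplots()
    (line,) = ax.plot(
        [0, 1, 2, 3],
        [0, 1, 4, 9],       # illustrative values
        color="r",          # color alias for red
        linestyle="--",     # dashed line
        linewidth=2.5,      # float value in points
        marker="+",         # marker style
        alpha=0.6,          # transparency of the line
    )
    return line
```

The short aliases from the table (c, ls, lw) can be used in place of the long keyword names.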
Plot with (X,Y)
A leading global organization wants to know how many people visit its website at a particular time. This
analysis helps it control and monitor the website traffic.
[Figure: 2D plot of users against time]
Define the list of users and the corresponding time values, then plot one against the other.
Use %matplotlib inline to display the plot in a Jupyter notebook.
[Figure: “Web site traffic” line plot; y-axis “Number of users” (0 to 1800), x-axis “Hrs” (6 to 18)]
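A plot of this kind might be produced along the following lines. The hour and user values are illustrative placeholders, since the lesson's actual data is not reproduced here, and plot_traffic is a hypothetical helper name.

```python
import matplotlib.pyplot as plt

# Illustrative values only, not the lesson's dataset.
HOURS = [6, 8, 10, 12, 14, 16, 18]
USERS = [250, 600, 1100, 1500, 1700, 1300, 800]


def plot_traffic(hours=HOURS, users=USERS):
    """Plot users (y) against time (x) and return the Axes."""
    fig, ax = plt.subplots()
    ax.plot(hours, users)
    ax.set_title("Web site traffic")
    ax.set_xlabel("Hrs")
    ax.set_ylabel("Number of users")
    return ax
```

In a Jupyter notebook, run %matplotlib inline in a cell before calling plot_traffic() so that the figure appears below the cell.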
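Controlling Line Patterns and Colors
You can control the color and the pattern of the plotted line, for example a blue line color with a dashed (--) pattern.
[Figure: “Web site traffic” drawn as a blue dashed line; y-axis “Number of users” (0 to 1800), x-axis “Hrs” (6 to 18)]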
Set Axis, Labels, and Legend Property
Using matplotlib, you can also set the axis range to make the result easier to interpret.
The axis setting defines the range on the x axis and the y axis.
[Figure: “Web site traffic” plot with the axis range set and a “Web traffic” legend; y-axis “Number of users” (0 to 2000), x-axis “Hrs” (8 to 16)]
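One way to set the axis range, labels, and legend is sketched below. The data values and the helper name plot_with_axis are assumptions. The axis range [8, 16, 0, 2000] mirrors the tick ranges visible in the figure.

```python
import matplotlib.pyplot as plt


def plot_with_axis(hours, users):
    """Plot web traffic with an explicit axis range, labels, and legend."""
    fig, ax = plt.subplots()
    ax.plot(hours, users, label="Web traffic")
    ax.axis([8, 16, 0, 2000])        # [xmin, xmax, ymin, ymax]
    ax.set_title("Web site traffic")
    ax.set_xlabel("Hrs")
    ax.set_ylabel("Number of users")
    ax.legend()
    return ax
```

For example, plot_with_axis([8, 10, 12, 14, 16], [500, 900, 1400, 1800, 1200]) draws the line inside the fixed range. Points outside the range would simply not be visible.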
Alpha and Annotation
Alpha is an attribute that controls the transparency of the line.
The lower the alpha value, the more transparent the line is.
The annotate() method is used to annotate the graph. It has several attributes which help annotate the plot.
In the lesson's example:
• “Max” denotes the annotation text,
• “ha” indicates the horizontal alignment,
• “va” indicates the vertical alignment,
• “xytext” indicates the text position,
• “xy” indicates the arrow position, and
• “arrowprops” indicates the properties of the arrow.
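The sketch below combines both ideas: a semi-transparent line and a “Max” annotation pointing at the highest value. The helper name annotate_peak, the offsets used for the text position, and the arrow color are illustrative choices, not taken from the lesson.

```python
import matplotlib.pyplot as plt


def annotate_peak(hours, users):
    """Plot a semi-transparent line and annotate its maximum with 'Max'."""
    fig, ax = plt.subplots()
    ax.plot(hours, users, alpha=0.4)          # lower alpha = more transparent
    peak = users.index(max(users))            # index of the highest value
    note = ax.annotate(
        "Max",                                # annotation text
        ha="center",                          # horizontal alignment
        va="bottom",                          # vertical alignment
        xytext=(hours[peak] - 3, users[peak] + 100),  # text position
        xy=(hours[peak], users[peak]),        # arrow position
        arrowprops={"facecolor": "green"},    # arrow properties
    )
    return ax, note
```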
Multiple Plots
You can draw several lines on the same plot, for example the web traffic data for different days.
Set different colors and line widths for different days.
[Figure: “Web site traffic” plot with lines for Monday, Tuesday, and Wednesday; y-axis “Number of users” (0 to 2000), x-axis “Hrs” (8 to 16)]
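A multi-line plot of this kind could be built as in the sketch below. The traffic numbers, the colors, and the line widths are invented for illustration. Only the day names, title, and axis labels come from the figure.

```python
import matplotlib.pyplot as plt

HOURS = [8, 10, 12, 14, 16]
# day: (users per hour, color, line width), all illustrative values
DAYS = {
    "Monday": ([500, 900, 1400, 1800, 1200], "r", 1.0),
    "Tuesday": ([400, 800, 1000, 1500, 1100], "g", 2.0),
    "Wednesday": ([600, 1000, 1600, 1300, 900], "b", 3.0),
}


def plot_days(hours=HOURS, days=DAYS):
    """Draw one line per day on the same Axes."""
    fig, ax = plt.subplots()
    for day, (users, color, width) in days.items():
        ax.plot(hours, users, color=color, linewidth=width, label=day)
    ax.set_title("Web site traffic")
    ax.set_xlabel("Hrs")
    ax.set_ylabel("Number of users")
    ax.legend()
    return ax
```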
Subplots
Subplots are used to display multiple plots in the same window.
With subplots, you can arrange plots in a regular grid.
The syntax for subplot is subplot(m,n,p).
It divides the current window into an m-by-n grid and
creates an axis for a subplot in the position specified by p.
For example:
• subplot(2,1,1) and subplot(2,1,2) create two subplots which are stacked vertically on a grid.
• subplot(2,2,1) through subplot(2,2,4) create four subplots in one window.
[Figure: a grid divided into two vertically stacked plots, subplot(2,1,1) and subplot(2,1,2), and a grid divided into four plots, subplot(2,2,1) to subplot(2,2,4)]
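As a small sketch of the subplot(m,n,p) syntax, the helper below (a hypothetical name) fills a 2-by-2 grid by calling plt.subplot once per position. Positions are numbered from 1, left to right and then top to bottom.

```python
import matplotlib.pyplot as plt


def two_by_two():
    """Create four subplots in one window using subplot(2, 2, p)."""
    fig = plt.figure()
    axes = []
    for p in range(1, 5):
        ax = plt.subplot(2, 2, p)          # m=2 rows, n=2 columns, position p
        ax.set_title(f"subplot(2, 2, {p})")
        axes.append(ax)
    return fig, axes
```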
Layout
Layout and spacing adjustments are two important factors to be considered while creating subplots.
Use the plt.subplots_adjust() method with the parameters hspace and wspace to adjust the distances
between the subplots and move them around on the grid.
[Figure: subplot grid showing hspace (spacing between the top and bottom rows) and wspace (spacing between columns)]
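The effect of hspace and wspace can be tried with a sketch like the one below. The helper name spaced_grid and the default values are arbitrary examples.

```python
import matplotlib.pyplot as plt


def spaced_grid(hspace=0.5, wspace=0.3):
    """Create a 2-by-2 grid and adjust the spacing between its subplots."""
    fig, axes = plt.subplots(2, 2)
    # hspace: height reserved between rows; wspace: width reserved between
    # columns, both given as fractions of the average subplot size.
    plt.subplots_adjust(hspace=hspace, wspace=wspace)
    return fig, axes
```

Larger values push the subplots further apart, which helps when titles or tick labels overlap.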
Types of Plots
You can create different types of plots using matplotlib:
• Histogram
• Scatter Plot
• Heat Map
• Pie Chart
• Error Bar
Histogram
Histograms are graphical representations of a probability distribution. A histogram is a kind of
bar chart. Using matplotlib's hist() method or its bar chart function, you can create histogram charts.
Advantages of histogram charts:
• They display the number of values within a specified interval.
• They are suitable for large datasets as they can be grouped within the intervals.
[Figure: histogram of “Frequency” against “Age”, with the bins marked]
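A histogram like the one in the figure might be drawn as follows. This sketch uses matplotlib's hist() method. The helper name age_histogram and the default of five bins are illustrative assumptions.

```python
import matplotlib.pyplot as plt


def age_histogram(ages, bins=5):
    """Group ages into bins and plot the frequency of each bin."""
    fig, ax = plt.subplots()
    counts, edges, _ = ax.hist(ages, bins=bins)
    ax.set_xlabel("Age")
    ax.set_ylabel("Frequency")
    return counts, edges
```

The returned counts hold the number of values in each interval, and edges holds the interval boundaries.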
Scatter Plot
A scatter plot is used to graphically display the relationships between variables.
For finer control over the plot, it is recommended to use the scatter() method.
It has several advantages:
• Shows the correlation between variables
• Is suitable for large datasets
• Makes it easy to find clusters
• Represents each piece of data as a point on the plot
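A minimal scatter() sketch is shown below. The helper name scatter_xy and the styling values are illustrative.

```python
import matplotlib.pyplot as plt


def scatter_xy(x, y):
    """Draw each (x, y) pair as a point on the plot."""
    fig, ax = plt.subplots()
    points = ax.scatter(x, y, c="b", marker="o", alpha=0.7)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    return points
```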
Heat Map
A heat map is a way to visualize two-dimensional data. Using heat maps, you can gain
deeper and faster insights about data than other types of plots.
It has several advantages:
• Draws attention to the risk-prone area
• Uses the entire dataset to draw meaningful insights
• Is used for cluster analysis and can deal with large datasets
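The lesson does not name a specific method for heat maps. One common option in matplotlib is imshow(), sketched below with a hypothetical helper name and an arbitrary color map.

```python
import numpy as np
import matplotlib.pyplot as plt


def heat_map(matrix):
    """Show a two-dimensional array as a grid of colored cells."""
    fig, ax = plt.subplots()
    image = ax.imshow(np.asarray(matrix), cmap="hot")
    fig.colorbar(image, ax=ax)     # legend mapping colors to values
    return image
```

For example, heat_map(np.arange(12).reshape(3, 4)) colors a 3-by-4 grid from the lowest to the highest value.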
Pie Chart
Pie charts are used to show percentage or proportional data.
matplotlib provides the pie() method to create pie charts.
It has several advantages:
• Summarizes a large dataset in visual form
• Displays the relative proportions of multiple classes of data
• Size of the circle is made proportional to the total quantity
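A small pie() sketch follows. The helper name pie_chart is hypothetical, and the labels and shares in the example call are placeholders rather than real data.

```python
import matplotlib.pyplot as plt


def pie_chart(labels, shares):
    """Draw one wedge per class, sized by its share of the total."""
    fig, ax = plt.subplots()
    ax.pie(shares, labels=labels, autopct="%1.1f%%")  # print each percentage
    ax.axis("equal")                                  # keep the pie circular
    return ax
```

Calling pie_chart(["A", "B", "C"], [50, 30, 20]) draws three wedges labeled with their percentages.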
Error Bar
An error bar is used to graphically represent the variability of data. It is used mainly to
identify errors. It builds confidence about the data analysis by revealing the statistical
difference between the two groups of data.
It has several advantages:
• Shows the variability in data and indicates the errors.
• Depicts the precision in the data analysis.
• Demonstrates how well a function and model are used in the data analysis.
• Describes the underlying data.
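Error bars can be drawn with matplotlib's errorbar() method, as in the sketch below. The helper name error_bar_plot, the marker format, and the cap size are illustrative choices.

```python
import matplotlib.pyplot as plt


def error_bar_plot(x, y, yerr):
    """Plot points with vertical bars showing the variability of each one."""
    fig, ax = plt.subplots()
    container = ax.errorbar(x, y, yerr=yerr, fmt="o", capsize=4)
    return container
```

Here yerr gives the half-length of each vertical bar, so a larger value marks a less certain measurement.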
Seaborn
Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface to draw
attractive statistical graphics.
There are several advantages:
• Possesses built-in themes for better visualizations
• Has built-in statistical functions which reveal hidden patterns in the dataset
• Has functions to visualize matrices of data
Analyzing the “auto mpg” Data
Problem Statement:
Analyze the “auto mpg data” and draw a pair plot using seaborn library for mpg, weight,
and origin.
Sources:
(a) Origin: This dataset was taken from the StatLib library maintained at Carnegie Mellon
University.
• Number of Instances: 398
• Number of Attributes: 9 including the class attribute
• Attribute Information:
o mpg: continuous
o cylinders: multi-valued discrete
o displacement: continuous
o horsepower: continuous
o weight: continuous
o acceleration: continuous
o model year: multi-valued discrete
o origin: multi-valued discrete
o car name: string (unique for each instance)
Listing Ohio State’s Leading Causes of Death
Problem Statement:
You have been provided with a dataset that lists Ohio State’s leading causes of death from
the year 2012.
Using the two data points:
• Cause of deaths and
• Percentile
Draw a pie chart to visualize the dataset.
Instructions to perform the assignment:
• Download the dataset “Ohio_State_data”. Use the data provided to create relevant and
required variables.
Common instructions:
• If you are new to Python, download the “Anaconda Installation Instructions” document
from the “Resources” tab to view the steps for installing Anaconda and the Jupyter
notebook.
• Download the “Assignment 02” notebook and upload it to the Jupyter notebook to access
it.
• Follow the provided cues to complete the assignment.
Key Takeaways
You are now able to:
Explain what data visualization is and its importance
Illustrate why Python is considered one of the best data
visualization tools
Describe matplotlib and its data visualization features in
Python
List the types of plots and the steps involved in creating
these plots
Knowledge Check
Knowledge Check 1
Which of the following methods is used to set the title?
a. plot()
b. plt.title()
c. plot.title()
d. title()
The correct answer is b
plt.title() is used to set the title.
Knowledge Check 2
Which of the following methods is used to adjust the distances between the subplots?
a. plot.subplots_adjust()
b. plt.subplots_adjust()
c. subplots_adjust()
d. plt.subplots.adjust()
The correct answer is b
plt.subplots_adjust() is used to adjust the distances between the subplots.
Knowledge Check 3
Which of the following is used to display the plot in a Jupyter notebook?
a. %matplotlib
b. %matplotlib inline
c. import matplotlib
d. import style
The correct answer is b
Use the %matplotlib inline command to display the plot in a Jupyter notebook.
Knowledge Check 4
Which of the following keywords is used to decide the transparency of the plot line?
a. Legend
b. Alpha
c. Animated
d. Annotation
The correct answer is b
Alpha decides the line transparency in line properties while plotting a line plot or chart.
Knowledge Check 5
Which of the following plots is used to represent data in a two-dimensional manner?
a. Histogram
b. Heat Map
c. Pie Chart
d. Scatter Plot
The correct answer is b
Heat Maps are used to represent data in a two-dimensional manner.
Knowledge Check 6
Which of the following statements limits both x and y axes to the interval [0, 6]?
a. plt.xlim(0, 6)
b. plt.ylim(0, 6)
c. plt.xylim(0, 6)
d. plt.axis([0, 6, 0, 6])
The correct answer is d
The plt.axis([0, 6, 0, 6]) statement limits both x and y axes to the interval [0, 6].
Thank You