Class 1 Data Visualization in Python using matplotlib
Objectives
In this tutorial, you will learn to:
Explain what data visualization is and why it is important today
Understand why Python is considered one of the best data visualization tools
Describe matplotlib and its data visualization features in Python
List the types of plots and the steps involved in creating these plots
Data Visualization
Data visualization is the technique of presenting data in a pictorial or graphical format. It
enables stakeholders and decision makers to analyze data visually. Data in a graphical
format allows them to identify new trends and patterns easily.
Example
Let's look at an example below.
Suppose you are a sales manager in a leading global organization. The organization plans
to study the sales details of each product across all regions and countries, to
identify the product that has the highest sales in a particular region. This research
will enable the organization to increase the manufacturing of that product in that
region. The data involved in this research might be huge and complex, and analyzing
this large amount of numeric data manually is difficult and time-consuming.
When this numerical data is plotted on a graph or converted to charts, it becomes easy
to identify patterns and predict results accurately.
Three major considerations for Data Visualization:
Clarity
Accuracy
Efficiency
Clarity ensures that the data set is complete and relevant. This enables the data
scientist to apply the new patterns yielded by the data in the relevant places.
Accuracy ensures that an appropriate graphical representation is used to convey the
right message.
Efficiency means using an efficient visualization technique that highlights all the
data points.
There are some basic factors that one would need to be aware of before
visualizing the data.
Visual Effect
Coordinate System
Data Types and Scale
Informative Interpretation
Visual Effect includes the usage of appropriate shapes, colors, and size to
represent the analyzed data.
The Coordinate System helps to organize the data points within the provided
coordinates.
The Data Types and Scale factor concerns choosing the type of data, such as numeric
or categorical.
The Informative Interpretation helps create visuals that are effective and easily
interpreted, using labels, titles, legends, and pointers.
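As a rough illustration of informative interpretation, the sketch below adds a title, axis labels, a legend and a pointer (an arrow annotation) to a simple chart. The monthly figures and product names are hypothetical, chosen only for the example; `pl` is the pyplot alias used later in this tutorial.

```python
import matplotlib.pyplot as pl

# Hypothetical monthly figures, used only for illustration
months = [1, 2, 3, 4, 5]
product_a = [10, 14, 13, 18, 21]
product_b = [8, 9, 12, 11, 15]

# Each line gets a label so that the legend can identify it
pl.plot(months, product_a, label='Product A')
pl.plot(months, product_b, label='Product B')

pl.title('Monthly Sales')      # title
pl.xlabel('Month')             # x-axis label
pl.ylabel('Units sold')        # y-axis label
leg = pl.legend()              # legend built from the line labels

# A pointer: an arrow from the text 'Peak' to the data point (5, 21)
pl.annotate('Peak', xy=(5, 21), xytext=(3.5, 20),
            arrowprops=dict(arrowstyle='->'))
pl.show()
```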
So far we have covered what data visualization is and how it helps interpret results with
large and complex data. With the help of the Python programming language, we can
perform this data visualization.
We'll learn more about how to visualize data using the Python programming language
below.
Python Libraries
Many Python data visualization libraries are available, such as matplotlib, VisPy, Bokeh,
Seaborn, pygal, folium, and NetworkX. Of these, matplotlib has emerged as the main data
visualization library.
matplotlib
matplotlib is a Python two-dimensional plotting library for data visualization and for creating
interactive graphics or plots. Using Python's matplotlib, visualizing large and complex data
becomes easy.
matplotlib Advantages
There are several advantages of using matplotlib to visualize data.
It is a multi-platform data visualization tool built on NumPy and the SciPy stack, which
makes it fast and efficient.
It works well with many operating systems and graphics backends.
It produces high-quality graphics, for printing or viewing, for a range of plots such as
histograms, bar charts, pie charts, scatter plots and heat maps.
Because it integrates with Jupyter notebooks, developers can spend their time
implementing features rather than struggling with compatibility.
As an open-source tool, it has large community support and works across platforms.
It gives full control over graph or plot styles, such as line properties, fonts, and
axes properties.
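To make the last point concrete, here is a small sketch that sets one example of each kind of style: line properties (colour, width, style), font properties (title size and weight) and axes properties (axis limits and grid). The data values are arbitrary and only serve the illustration.

```python
import matplotlib.pyplot as pl

x = [1, 2, 3, 4]
y = [2, 4, 3, 5]

# Line properties: colour, width and style of the plotted line
line, = pl.plot(x, y, color='purple', linewidth=3, linestyle='dashed')

# Font properties: size and weight of the title text
pl.title('Styled Plot', fontsize=16, fontweight='bold')

# Axes properties: limits of each axis and a background grid
pl.xlim(0, 5)
pl.ylim(0, 6)
pl.grid(True)
pl.show()
```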
We can create a plot using four simple steps.
Importing PyPlot:
In order to use pyplot on your computer for data visualization, you need to first import it in
your Python environment by issuing one of the following commands:
import matplotlib.pyplot
This requires you to refer to every pyplot command as matplotlib.pyplot.<command>.
For example: matplotlib.pyplot.plot(x,y)
import matplotlib.pyplot as pl
With this form, you can refer to every pyplot command as pl.<command>.
For example: pl.plot(x,y)
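The two import forms can be checked side by side. In this sketch both names end up pointing at the same pyplot module, so the two plot calls draw on the same current figure. The alias is your choice; `pl` matches this tutorial, while many other examples use `plt`.

```python
# Form 1: import the module under its full name
import matplotlib.pyplot
matplotlib.pyplot.plot([1, 2, 3], [2, 4, 6])

# Form 2: import the same module under the short alias pl
import matplotlib.pyplot as pl
pl.plot([1, 2, 3], [3, 6, 9])

# Both names refer to one module object, so both lines
# are drawn on the same current figure
pl.show()
```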
Line Chart:
Line charts work out of the box with matplotlib. You can draw multiple lines in one line
chart, change the color and type of each line, and much more. Line charts are one of the
many chart types that matplotlib can create.
The plot( ) function is used to create a line chart using PyPlot.
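A minimal sketch of a line chart with two lines, each given its own colour and line type through the color and linestyle arguments of plot( ). The values (squares of x and x itself) are chosen only for illustration.

```python
import matplotlib.pyplot as pl

x = [1, 2, 3, 4, 5]
y1 = [1, 4, 9, 16, 25]   # squares of x
y2 = [1, 2, 3, 4, 5]     # x itself

# Two lines in one chart, each with its own colour and line type
pl.plot(x, y1, color='r', linestyle='solid', label='x squared')
pl.plot(x, y2, color='b', linestyle='dashed', label='x')
pl.xlabel('x')
pl.ylabel('y')
pl.legend()
pl.show()
```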
Pie Chart:
A pie chart shows a static number and how its categories represent parts of a whole, that
is, the composition of something. A pie chart represents numbers as percentages, and the
sum of all segments must equal 100%.
The pie( ) function is used to create a pie chart using PyPlot.
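The sketch below draws a pie chart from hypothetical budget shares. The autopct argument prints each slice's percentage on the chart; by default pie( ) normalises the values by their total, so the slices add up to 100% even if the raw numbers do not.

```python
import matplotlib.pyplot as pl

# Hypothetical share of a budget across four departments
labels = ['Sales', 'R&D', 'Support', 'Admin']
values = [40, 30, 20, 10]

# With autopct set, pie() returns the wedges, the label texts
# and the percentage texts drawn inside each slice
wedges, texts, autotexts = pl.pie(values, labels=labels, autopct='%1.1f%%')
pl.title('Budget Share')
pl.show()
```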
Scatter Plot:
A scatter chart shows the relationship between two variables and can reveal distribution
trends. It should be used when there are many different data points and you want to
highlight similarities in the data set. It is also useful when looking for outliers and for
understanding the distribution of your data.
The scatter( ) function is used to create a scatter plot using PyPlot.
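A minimal scatter plot sketch. The study hours and scores are hypothetical; the last point is deliberately placed away from the rising trend so that it stands out as an outlier, which is the kind of pattern a scatter plot makes visible.

```python
import matplotlib.pyplot as pl

# Hypothetical study hours and exam scores for eight students
hours = [1, 2, 2, 3, 4, 5, 6, 7]
scores = [35, 40, 48, 52, 60, 66, 75, 30]   # the last point is an outlier

# Each (hours, score) pair becomes one marker on the chart
pts = pl.scatter(hours, scores, color='g', marker='o')
pl.xlabel('Hours studied')
pl.ylabel('Score')
pl.title('Hours vs Score')
pl.show()
```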
Bar Chart:
A bar chart (or bar graph) presents categorical data with rectangular bars whose heights
or lengths are proportional to the values they represent. The bars can be plotted
vertically or horizontally.
The bar( ) or barh( ) function is used to create a bar chart using PyPlot.
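The sketch below plots the same hypothetical regional sales twice, once with bar( ) for vertical bars and once with barh( ) for horizontal bars, using two side-by-side subplots so the difference is easy to compare.

```python
import matplotlib.pyplot as pl

regions = ['North', 'South', 'East', 'West']
sales = [250, 180, 320, 210]   # hypothetical sales figures

# Left panel: vertical bars; each bar's height is its value
pl.subplot(1, 2, 1)
vbars = pl.bar(regions, sales, color='orange')
pl.title('bar()')

# Right panel: horizontal bars; each bar's length (width) is its value
pl.subplot(1, 2, 2)
hbars = pl.barh(regions, sales, color='teal')
pl.title('barh()')

pl.tight_layout()
pl.show()
```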
Histogram Plot:
A histogram is a type of graph that provides a visual interpretation of numerical data by
indicating the number of data points that lie within each range of values. The hist( )
function is used to create a histogram plot.
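A small histogram sketch using hypothetical ages. Passing explicit bin edges makes the ranges easy to read: each bin includes its left edge and excludes its right edge, except the last bin, which includes both. hist( ) returns the count per bin, so the counts can be checked directly.

```python
import matplotlib.pyplot as pl

# Hypothetical ages of 12 survey respondents
ages = [22, 25, 27, 31, 33, 35, 38, 41, 44, 45, 52, 58]

# Edges [20, 30, 40, 50, 60] give four ranges: 20-30, 30-40, 40-50, 50-60
counts, edges, patches = pl.hist(ages, bins=[20, 30, 40, 50, 60],
                                 edgecolor='black')
pl.xlabel('Age')
pl.ylabel('Number of people')
pl.title('Age Distribution')
pl.show()
```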
import matplotlib.pyplot as pl
a=[20,40,60,80]
b=[1,2,3,4]
# a supplies the x-values and b the y-values
pl.plot(a,b)
pl.show()
ii. linewidth: used to specify the width of the line in a chart.
Syntax is:
pl.plot(linewidth=value in points)
For example:
pl.plot(a,b,linewidth=2)
iii. linestyle: used to specify the line style in a chart.
Syntax is:
pl.plot(linestyle='style')
Here style may be:
a. solid for a solid line.
b. dashed for a dashed line.
c. dotted for a dotted line.
d. dashdot for a dash-dotted line.
For example:
pl.plot(a,b,linestyle='dotted')
iv. markers: the data points plotted on a chart or graph are called markers.
a. marker: a valid marker type.
b. markersize: used to specify the size of the marker in points.
c. markeredgecolor: used to specify the colour of the marker's edge.
Syntax is:
pl.plot(marker='valid type', markersize=value in points, markeredgecolor='valid colour')
For example:
pl.plot(a,b,marker='x',markersize=4,markeredgecolor='blue')
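Some commonly used marker types for plotting are '.' (point), 'o' (circle), 's' (square),
'^' (triangle up), 'D' (diamond), '*' (star), '+' (plus), 'x' (cross) and 'P' (filled plus).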
When you do not specify markeredgecolor separately in plot( ), by default it takes the
same colour as the line. Also, if you pass a combined line-colour-and-marker-style
string such as 'r+' without specifying linestyle separately, Python will plot only the
markers and not the line. To get the line as well, specify the linestyle argument.
For example:
When linestyle is specified
import matplotlib.pyplot as pl
marks=[78,67,98,55,88]
test=[1,2,3,4,5]
pl.xlabel('test')
pl.ylabel('Marks')
pl.title('Marks Analysis')
pl.plot(test,marks,color='y',linestyle='solid',
linewidth=4,marker='P',markeredgecolor='r')
pl.show()
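The effect of a combined format string can be seen by plotting the same marks twice in two subplots: once with 'r+' alone, which gives red '+' markers with no connecting line, and once with 'r+' plus linestyle='solid', which joins the markers with a line. The keyword argument takes precedence over the line style implied by the format string.

```python
import matplotlib.pyplot as pl

marks = [78, 67, 98, 55, 88]
test = [1, 2, 3, 4, 5]

# Left: format string 'r+' only, so red '+' markers and no line
pl.subplot(1, 2, 1)
markers_only, = pl.plot(test, marks, 'r+')
pl.title("'r+'")

# Right: same format string plus linestyle, so markers joined by a line
pl.subplot(1, 2, 2)
with_line, = pl.plot(test, marks, 'r+', linestyle='solid')
pl.title("'r+' with linestyle='solid'")

pl.show()
```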
Program Based on Line Chart:
Q1 Write a program to plot a line chart to depict the changing weekly sales for four
weeks. Give appropriate axes labels.
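import matplotlib.pyplot as pl
WeekNo=[1,2,3,4]
Sales=[1000,3000,5000,3000]
# Create the figure first; calling pl.figure() after the labels
# would start a new, unlabelled figure for the plot
pl.figure(figsize=(10,5))
pl.xlabel("WeekNo")
pl.ylabel("Sales")
pl.title("Weekly Sales Chart")
pl.grid(True)
pl.plot(WeekNo,Sales,color='g')
pl.show()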
Q2 Marks is a list that stores the marks of a student in 10 unit tests. Write a program to
plot the student's performance in these 10 unit tests.
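One possible solution sketch for Q2 follows. The marks are hypothetical, since the question does not give them; the test numbers 1 to 10 go on the x-axis, and circular markers make each test result visible on the line.

```python
import matplotlib.pyplot as pl

# Hypothetical marks of one student in 10 unit tests
Marks = [65, 72, 58, 80, 77, 84, 69, 90, 88, 93]
Tests = list(range(1, 11))   # unit test numbers 1 to 10

pl.figure(figsize=(10, 5))
pl.plot(Tests, Marks, color='b', marker='o', linestyle='solid')
pl.xlabel('Unit Test')
pl.ylabel('Marks')
pl.title('Student Performance in 10 Unit Tests')
pl.xticks(Tests)             # show every test number on the x-axis
pl.grid(True)
pl.show()
```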
Q3 Write a program in Python to draw a line chart from the given financial data of
ABC Co. for 5 days, in the form of a DataFrame fdf as shown below:
Day1 Day2 Day3 Day4 Day5
0 74.25 56.03 59.30 69.00 89.65
1 76.06 68.71 72.07 78.47 79.65
2 69.50 62.89 77.65 65.53 80.75
3 72.55 56.42 66.46 76.85 85.08
Solution:
import matplotlib.pyplot as pl
import pandas as pd
# Values for each day, one list per column of the DataFrame
d1=[74.25,76.06,69.50,72.55]
d2=[56.03,68.71,62.89,56.42]
d3=[59.30,72.07,77.65,66.46]
d4=[69.00,78.47,65.53,76.85]
d5=[89.65,79.65,80.75,85.08]
days=['Day1','Day2','Day3','Day4','Day5']
dict1={'Day1':d1,'Day2':d2,'Day3':d3,'Day4':d4,'Day5':d5}
fdf=pd.DataFrame(dict1,columns=days)
print(fdf)
# DataFrame.plot() draws one line per column, i.e. one line per day
fdf.plot()
pl.xlabel('Day')
pl.ylabel('Sales')
pl.title('Multiple lines on same plot with suitable legends')
pl.legend()
pl.show()
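Because a DataFrame's plot( ) method draws one line per column against the row index, the x-axis of that chart shows the row positions 0 to 3 rather than the days. If you would rather have the days along the x-axis, one option (a sketch, not part of the original exercise) is to transpose the DataFrame with .T first, so that each of the four rows becomes a line drawn across Day1 to Day5. The block rebuilds fdf from the table above so that it runs on its own.

```python
import matplotlib.pyplot as pl
import pandas as pd

fdf = pd.DataFrame({'Day1': [74.25, 76.06, 69.50, 72.55],
                    'Day2': [56.03, 68.71, 62.89, 56.42],
                    'Day3': [59.30, 72.07, 77.65, 66.46],
                    'Day4': [69.00, 78.47, 65.53, 76.85],
                    'Day5': [89.65, 79.65, 80.75, 85.08]})

# Transposing turns the day columns into the index, so each of the
# four rows of fdf becomes one line drawn across Day1 ... Day5
ax = fdf.T.plot()
ax.set_xlabel('Day')
ax.set_ylabel('Sales')
ax.set_title('Each row of fdf plotted across the five days')
pl.show()
```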