5 Plotting With Matplotlib
In today's business world, so much information is gathered through data analysis that we need a way to paint a picture of that data so that we
can interpret it. Data visualization shows what the information means by providing visual context through maps and graphs. This makes
data easier for the human mind to understand, so you can readily spot trends, patterns, and outliers in large datasets.
Human perception takes in figures and graphs far more readily than raw numbers, even for people who are very comfortable with math.
To give an example, imagine sitting in front of a 30-million-row table of data and trying to notice anything insightful with your bare eyes. What
could you notice?
Now reimagine this scenario with the same gigantic table expressed as figures and graphs. Is it easier for your eyes to catch something
insightful?
If your answer is yes, as it should be, then this is a clear example of the importance of visualization in general and in
data analysis specifically.
What is matplotlib?
Matplotlib is a Python library used for plotting and visualizing data. It is the most widely used of the Python visualization
packages and offers a wide range of functions. It can produce high-quality figures in various formats, such as PDF, SVG, JPG, PNG, and
TIFF. Matplotlib also offers a variety of plot types including line plots, scatter plots, histograms, bar charts, error charts, pie charts, and box plots.
https://colab.research.google.com/drive/1sYUfVTNB5gLADf46firzr8ckLucqzeZm#printMode=true 1/27
7/19/24, 9:57 AM 5-plotting-with-matplotlib.ipynb - Colab
It also supports 3D plotting, and many other libraries are built on top of it, such as Seaborn and the plotting API of pandas, allowing access to
Matplotlib's methods with less code.
The open-source tool Matplotlib was started by John Hunter in 2002 for his post-doctoral research in Neurobiology to visualize
Electrocorticography (ECoG) data from epilepsy patients.
The library itself is huge, with about 70,000 lines of code in total.
Matplotlib offers several different interfaces (ways of constructing a chart) and can work with several different backends. (The
backend handles how the chart is actually rendered, for example to a window or to an image file.)
Although comprehensive, some of matplotlib's public documentation is very outdated.
Matplotlib is built on NumPy. Each plot is made up of a tree-like hierarchy of objects.
The Figure in matplotlib can be understood as the outermost container of the graph. A Figure can contain multiple Axes objects.
An Axes can be understood as a part of a Figure, a subplot in matplotlib terms. It can be used to manipulate any aspect of the chart it
contains. In short, a Figure object is a box that contains one or more Axes objects. The following graph summarizes this.
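As a minimal sketch of this hierarchy, the snippet below creates one Figure holding two Axes and draws on each of them separately. The data values and titles are made up for illustration only.

```python
import matplotlib.pyplot as plt

# One Figure (the outer container) holding two Axes (the individual plots)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

# Each Axes is manipulated independently of the other
axes[0].plot([0, 1, 2], [0, 1, 4])
axes[0].set_title("First Axes")
axes[1].plot([0, 1, 2], [4, 1, 0])
axes[1].set_title("Second Axes")

# The Figure keeps a list of the Axes it contains
print(len(fig.axes))

plt.show()
```

Here fig is the Figure and axes is an array of Axes objects, so anything you change on axes[0] leaves axes[1] untouched.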
Importing Matplotlib
Before using matplotlib, we need to import it. We can import Matplotlib as follows:
import matplotlib
Most of the time, we work with the pyplot interface of matplotlib. So, we will import the pyplot interface as follows:
import matplotlib.pyplot
To make things simpler, we will use the standard shorthand for matplotlib imports as follows:
import matplotlib.pyplot as plt
The Jupyter Notebook (formerly the IPython Notebook) is a comprehensive data analysis and visualization platform, offering multiple features
in one place. It allows users to execute code, create graphical plots, display rich text and media, write mathematical equations and much more -
all within a single executable document.
To enable plotting within the notebook, use the %matplotlib magic command. There are two ways to work with graphics in
Jupyter Notebook:
%matplotlib notebook – produces interactive plots embedded within the notebook.
%matplotlib inline – outputs static images of the plot embedded in the notebook.
After running %matplotlib inline (it needs to be done only once per kernel per session), any cell within the notebook that creates a plot will
embed a PNG image of the graphic.
Histogram plt.hist()
Although the syntax is very simple, it has several interfaces and can be coded in different ways.
# import packages
import pandas as pd
import numpy as np
# setting random seed
np.random.seed(101)
# import matplotlib
import matplotlib.pyplot as plt
This can be done easily by calling plt.plot() multiple times before calling plt.show().
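A short sketch of drawing several lines on the same plot: each plt.plot() call adds one line to the current axes, and plt.show() displays them together. The sine and cosine curves are just sample data chosen for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# sample data: 200 evenly spaced points between 0 and 10
x = np.linspace(0, 10, 200)

# each call adds one more line to the same axes
line_sin, = plt.plot(x, np.sin(x), 'b-', label='sin(x)')
line_cos, = plt.plot(x, np.cos(x), 'r--', label='cos(x)')

# the legend uses the labels given above
plt.legend()
plt.show()
```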
Advantages
Very powerful for representing continuous data, such as change over time.
Allows possible extrapolation of data.
Having a line constructed from multiple data points can allow you to make estimates of missing data.
Allows comparison of two or more features to see if there is any kind of connection or relationship.
Disadvantages
It can be challenging to determine exact values at a given point of the graph.
Two lines with very similar values can be hard to tell apart, which makes comparing the data difficult.
In our example, we will follow the same methodology: create a figure, create axes, then create the plot itself. In general, one of the most
common things you will need to do with plots is to change their size. Let's see the syntax.
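One common way to set the plot size is the figsize argument of plt.figure(), which takes (width, height) in inches. The sketch below follows the figure, axes, plot order described above; the size (10, 4) is an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# create a figure; figsize is (width, height) in inches
fig = plt.figure(figsize=(10, 4))

# create a single axes inside the figure
ax = fig.add_subplot(1, 1, 1)

# create the plot itself
x = np.linspace(0, 10, 1000)
ax.plot(x, np.cos(x), 'b-')

plt.show()
```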
# create rnd_x :: 1000 evenly spaced points between 0 and 10
rnd_x = np.linspace(0, 10, 1000)
# create rnd_y :: cos value of each point in rnd_x
rnd_y = np.cos(rnd_x)
# plot a solid blue line
plt.plot(rnd_x, rnd_y, 'b-');
Advantages
Visualizes large data sets.
Easy for most audiences to understand.
Shows relative numbers or proportions of multiple categories.
Disadvantages
Easily manipulated to give wrong impressions.
Not suitable if there are huge numbers of categories.
The arguments of plt.bar() are very straightforward. However, I recommend reading the documentation.
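As a minimal sketch of plt.bar(), the example below passes the category positions first and the bar heights second. The category names and values are invented for illustration.

```python
import matplotlib.pyplot as plt

# hypothetical categories and their values
categories = ['A', 'B', 'C', 'D']
values = [23, 45, 12, 36]

# first argument: x positions (here, category names); second: bar heights
bars = plt.bar(categories, values, color='tab:blue', width=0.6)

plt.xlabel('Category')
plt.ylabel('Value')
plt.show()
```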
matplotlib.pyplot.errorbar(x, y, yerr=None, xerr=None, fmt='', ecolor=None, elinewidth=None, capsize=None, barsabove=False, lolims=False, uplims=F
The syntax of plt.errorbar() is very similar to that of plt.bar(). The most important arguments are:
Check documentation
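A small sketch of plt.errorbar() using a few of the arguments from the signature above: yerr for the vertical error sizes, fmt for the marker and line format, ecolor for the error bar color, and capsize for the cap length. The measurements are made-up values.

```python
import numpy as np
import matplotlib.pyplot as plt

# hypothetical measurements with their uncertainties
x = np.arange(1, 6)
y = np.array([2.0, 3.5, 3.0, 4.5, 5.0])
y_err = np.array([0.3, 0.4, 0.2, 0.5, 0.3])

# yerr: size of the vertical error bars, fmt: marker/line format,
# ecolor: error bar color, capsize: length of the caps in points
container = plt.errorbar(x, y, yerr=y_err, fmt='o-', ecolor='gray', capsize=4)

plt.show()
```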
Advantages
Can represent multiple categories in a tight space.
Good for showing change over time of category sub-components.
Disadvantages
Can be challenging to understand.
Can become very crowded.
# let's create two random lists :: they will be stacked on top of each other
rnd_a = list(np.random.randint(1,200,10))
rnd_b = list(np.random.randint(1,200,10))
# use a simple range for marking the x steps
x_steps = range(10)
# for a stacked bar we use two separate plt.bar() calls before calling plt.show()
plt.bar(x_steps, rnd_a, color = 'b')
# bottom = rnd_a starts each red bar where the blue bar ends
plt.bar(x_steps, rnd_b, color = 'r', bottom = rnd_a)
plt.show()
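# create two random lists that represent the amounts bought of product a and product b respectively
prod_a = list(np.random.randint(1,200,5))
prod_b = list(np.random.randint(1,200,5))
# x positions of the five groups and the width of each bar
x = np.arange(5)
width = 0.35
# create two sets of bars side by side
plt.bar(x - width/2, prod_a, width, label='Product A')
plt.bar(x + width/2, prod_b, width, label='Product B')
# display the labels in a legend
plt.legend()
# show
plt.show()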
Advantages
Shows the part-to-whole relationship.
Disadvantages
A pie plot drawn with a 3D orientation (tilted or with depth) can be very misleading, because the perspective distorts the apparent size of the slices.
matplotlib.pyplot.pie(x, explode=None, labels=None, colors=None, autopct=None, pctdistance=0.6, shadow=False, labeldistance=1.1, startangle=0, rad
Check documentation
# sample values (fixed, not random)
rndnums = np.array([35, 28, 25, 10])
# labels
labels = ['A', 'B', 'C', 'D']
# create pie plot
plt.pie(rndnums, labels = labels)
# show
plt.show()
# sample values (fixed, not random)
rndnums = np.array([35, 28, 25, 10])
# labels
labels = ['A', 'B', 'C', 'D']
# create pie plot
plt.pie(rndnums, explode=(0.1, 0, 0, 0), startangle=18, labels = labels)
# show
plt.show()
When dealing with numerical data, a box plot is one of the best choices, as it reveals a lot of information that helps you gain insight into the
distribution, range, outliers, and skewness.
matplotlib.pyplot.boxplot(x, notch=None, sym=None, vert=None, whis=None, positions=None, widths=None, patch_artist=None, bootstrap=None, usermedia
Check documentation
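A minimal sketch of plt.boxplot() on two numerical features. The data is randomly generated here purely for illustration, and the names Feature 1 and Feature 2 are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# two hypothetical numerical features drawn from normal distributions
rng = np.random.default_rng(101)
data = [rng.normal(loc=0, scale=1, size=200),
        rng.normal(loc=1, scale=2, size=200)]

# one box per feature; the returned dict holds the drawn artists
result = plt.boxplot(data)

# boxes are placed at x = 1, 2, ... so we name those positions
plt.xticks([1, 2], ['Feature 1', 'Feature 2'])
plt.show()
```

In each box the line inside marks the median, the box spans the interquartile range, and the individual points beyond the whiskers are drawn as outliers.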
x : x values
y1 : first boundary curve of the filled area
y2 : second boundary curve of the filled area (defaults to 0 if omitted)
Check documentation
# evenly spaced x values
rndx = np.linspace(0, 10, 2000)
# y1 = sin(rndx)
y1 = np.sin(rndx)
# y2 = cos(rndx)
y2 = np.cos(rndx)
# create first plot
plt.plot(rndx, y1 , label = 'Sin(X)')
# create second plot
plt.plot(rndx, y2 , label = 'Cos(X)')
# fill between the x-axis (y=0) and y1
plt.fill_between(rndx, y1=0, y2=y1)
# fill between the x-axis (y=0) and y2
plt.fill_between(rndx, y1=0, y2=y2)
# display the labels defined above
plt.legend()
# show
plt.show()
Histogram
Histograms, or frequency distribution plots, indicate how often values fall within each range (bin) of a data set. Histograms are the most commonly
used charts for showing frequency distributions. Although they are very similar to bar charts, there are important differences between the two.
Histograms visualize quantitative (numerical) data, whereas bar charts display categorical variables.
Histograms are best used when you have a range of numerical data and want to examine its distribution. Knowing the
distribution of a numerical feature can tell you a lot about the quality of the process that produced the feature, and it can also strongly guide your
choice of machine learning model, if you proceed to that stage of the data science pipeline.
Let's see the histogram syntax:
matplotlib.pyplot.hist(x, bins=None, range=None, density=False, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientat
Check documentation
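A short sketch of plt.hist() with the bins argument from the signature above. The samples are randomly generated for illustration; the mean of 50 and spread of 10 are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# 1000 hypothetical samples from a normal distribution
rng = np.random.default_rng(101)
samples = rng.normal(loc=50, scale=10, size=1000)

# bins controls how many intervals the value range is split into;
# the call returns the count per bin, the bin edges, and the drawn bars
counts, bin_edges, patches = plt.hist(samples, bins=20,
                                      color='tab:blue', edgecolor='white')

plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
```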
Advantages
Creating scatter diagrams is a simple task, as all that needs to be done is to place and mark points on a graph.
Scatter plots are simple to comprehend and decipher.
Points that lie far from the majority of points are easy to spot, so outliers can be identified and handled before conducting a correlation
analysis.
Disadvantages
Scatter plots can become cluttered and difficult to read when they contain a large number of data points.
It can be difficult to determine the exact correlation between variables using a scatter plot, especially if the data points are widely
dispersed.
Scatter plots are not effective for displaying certain types of data, such as categorical data.
It can be difficult to accurately determine the distribution of the data using a scatter plot.
matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, *, edgecolor
Check documentation
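A minimal sketch of plt.scatter() using a few arguments from the signature above: s for the marker size, c for the color, alpha for transparency, and marker for the shape. The two variables are synthetic and roughly linearly related, just for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# two hypothetical, roughly linearly related variables
rng = np.random.default_rng(101)
x = rng.uniform(0, 10, size=100)
y = 2 * x + rng.normal(scale=2, size=100)

# s: marker size, c: color, alpha: transparency, marker: marker shape
points = plt.scatter(x, y, s=20, c='tab:green', alpha=0.7, marker='o')

plt.xlabel('x')
plt.ylabel('y')
plt.show()
```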
matplotlib.pyplot.hexbin(x, y, C=None, gridsize=100, bins=None, xscale='linear', yscale='linear', extent=None, cmap=None, norm=None, vmin=None, vm
# create data
x = np.random.normal(size=20000)
y = (x * 0.1 + np.random.normal(size=20000)) * 5
The brighter areas indicate the densest regions in the distribution of the points.
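As a sketch of how such a density plot might be produced, the snippet below builds the same kind of data and passes it to plt.hexbin(). The gridsize of 30 and the colorbar are choices made for this illustration, and it assumes the default colormap, in which higher counts are drawn brighter.

```python
import numpy as np
import matplotlib.pyplot as plt

# create data: 20000 points with a weak linear relationship
np.random.seed(101)
x = np.random.normal(size=20000)
y = (x * 0.1 + np.random.normal(size=20000)) * 5

# gridsize sets the number of hexagons along the x direction
hb = plt.hexbin(x, y, gridsize=30)

# the colorbar maps each color to the number of points in a hexagon
plt.colorbar(hb, label='points per bin')
plt.show()
```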
Subplots
Plots in matplotlib reside within a Figure object. We can create a new figure with plt.figure() as follows:
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
The above command means that there are four plots (2 * 2 = 4). We selected the first of four subplots. We can create the next three subplots
using the fig.add_subplot() commands as follows:
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax4 = fig.add_subplot(2, 2, 4)
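Putting these commands together, a minimal sketch might look like the following. The functions drawn on each subplot and the figure size are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 6))
x = np.linspace(-2, 2, 100)

# add_subplot(rows, cols, index): the index counts from 1, row by row
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax4 = fig.add_subplot(2, 2, 4)

# each Axes object is drawn on separately
ax1.plot(x, x)
ax2.plot(x, x ** 2)
ax3.plot(x, x ** 3)
ax4.plot(x, np.exp(x))

# reduce overlap between neighbouring subplots
fig.tight_layout()
plt.show()
```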
There is also another way of creating subplots, and it may be easier. Its syntax is as follows:
fig, ax = plt.subplots(nrows=2, ncols=2)
This creates a figure and an array of axes that can be indexed just like a NumPy array.
# display multiple plots in one figure using subplots()
# create range of numbers from -50 to 50
x = np.arange(-50,50)
# square function
y_sq = np.power(x,2)
# cubic function
y_cub = np.power(x,3)
# sin function
y_sin = np.sin(x)
# cos function
y_cos = np.cos(x)
# tan function
y_tan = np.tan(x)
# tanh function
y_tanh = np.tanh(x)
# sinh function
y_sinh = np.sinh(x)
# cosh function
y_cosh = np.cosh(x)
# exp function
y_exp = np.exp(x)
# lets create figure and subplot
fig , ax = plt.subplots(nrows=3, ncols=3 , figsize = (20,20))
# plot y_sq
ax[0,0].plot(x,y_sq,"tab:blue")
# plot y_cub
ax[0,1].plot(x,y_cub,"tab:orange")
# plot y_sin
ax[0,2].plot(x,y_sin,"tab:green")
# plot y_cos
ax[1,0].plot(x,y_cos,"b-")
# plot y_tan
ax[1,1].plot(x,y_tan,"r-")
# plot y_tanh
ax[1,2].plot(x,y_tanh,"g-")
# plot y_sinh
ax[2,0].plot(x,y_sinh,"m-")
# plot y_cosh
ax[2,1].plot(x,y_cosh,"y-")
# plot y_exp
ax[2,2].plot(x,y_exp,"k-")
# show
plt.show()
print(plt.style.available)
plt.style.use('style_name')
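As a sketch of how a style might be applied, the example below uses 'ggplot', one of the styles that ships with Matplotlib, in place of 'style_name'. It uses plt.style.context(), which applies the style only inside the with block; plt.style.use() would instead change the style for every plot that follows.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)

# apply the 'ggplot' style to this plot only
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))
    ax.set_title('Plot drawn with the ggplot style')

plt.show()
```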
Check documentation.
Any color in matplotlib can be specified by its hex code. You can look up the hex code of any color you want with a color picker such as the
Adobe color wheel.
It can help to make the plot easier to read and interpret by providing a frame of reference for the data.
It can help to highlight patterns or trends in the data that might not be immediately apparent otherwise.
It can make it easier to compare values between different data points.
It can improve the overall aesthetics of the plot.
Overall, adding a grid to a plot can be a useful tool for helping to communicate the key insights and findings of your data analysis.
Syntax of grid:
plt.grid(True)
Check documentation
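A minimal sketch of adding a grid. It calls grid() on an Axes object, which has the same effect as plt.grid(True) on the current plot; the dashed line style and transparency are optional choices made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# turn the grid on; linestyle and alpha only affect the grid lines
ax.grid(True, linestyle='--', alpha=0.5)

plt.show()
```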
# calling plot
plt.plot(rndx, rndx*1.5)
# setting xmin, xmax, ymin, ymax
plt.axis([0, 5, -1, 13])
# show
plt.show()
Sometimes we need to control the x-axis or the y-axis separately. We can use the following:
plt.xlim([xmin, xmax])
plt.ylim([ymin, ymax])
# calling plot
plt.plot(rndx, rndx*1.5)
# setting xlim
plt.xlim([1.0, 4.0])
# setting ylim
plt.ylim([0.0, 12.0])
# show
plt.show()
Syntax:
plt.xticks(list_of_ticks)
plt.yticks(list_of_ticks)
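A short sketch of setting the tick positions explicitly with plt.xticks() and plt.yticks(). The tick values are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
plt.plot(x, x * 1.5)

# place ticks only at the listed positions
plt.xticks([0, 2.5, 5, 7.5, 10])
plt.yticks([0, 5, 10, 15])

# keep a handle to the current axes for later inspection
ax = plt.gca()
plt.show()
```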
The plt.xlabel() method lets you add an x-axis label and control its font, its location (loc), and so on. Check documentation
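As a minimal sketch, axis labels can be added with plt.xlabel() and plt.ylabel(). The label texts and font size are made up for illustration, and the loc argument assumes a reasonably recent Matplotlib version (3.3 or later).

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
plt.plot(x, x ** 2)

# fontsize controls the text size; loc moves the label along the axis
plt.xlabel('x value', fontsize=12, loc='right')
plt.ylabel('x squared', fontsize=12)

# keep a handle to the current axes for later inspection
ax = plt.gca()
plt.show()
```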
Syntax:
matplotlib.pyplot.legend(*args, **kwargs)
legend()
legend(handles, labels)
legend(handles=handles)
legend(labels)
# generate x range
x = np.arange(0,5)
# cubic power function
y_cub = np.power(x,3)
# sqrt function
y_sqrt = np.sqrt(x)
# cos function
y_cos = np.cos(x)
# plot
plt.plot(x, y_cub)
plt.plot(x, y_sqrt)
plt.plot(x, y_cos)
# adding a legend
plt.legend(['Cubic','Sqrt','Cosine'])
# show
plt.show()
# generate x range
x = np.arange(0,5)
# cubic power function
y_cub = np.power(x,3)
# sqrt function
y_sqrt = np.sqrt(x)
# cos function
y_cos = np.cos(x)
# plot
plt.plot(x, y_cub, label='Cubic')
plt.plot(x, y_sqrt, label='Sqrt')
plt.plot(x, y_cos, label='Cosine')
# adding a legend
plt.legend()
# show
plt.show()
plt.legend() also has an argument called loc for controlling the location of the legend. loc can take different values, including the following:
best
upper left
upper right
lower left
lower right
upper center
lower center
center left
# generate x range
x = np.arange(0,5)
# cubic power function
y_cub = np.power(x,3)
# sqrt function
y_sqrt = np.sqrt(x)
# cos function
y_cos = np.cos(x)
# plot
plt.plot(x, y_cub, label='Cubic')
plt.plot(x, y_sqrt, label='Sqrt')
plt.plot(x, y_cos, label='Cosine')
# adding a legend
plt.legend(loc='lower left')
# show
plt.show()
Color
There are different ways to control the plot color in matplotlib:
You can use a single color string such as red , blue , or green .
You can use a short color string such as r , b , or g .
You can use a hex string such as #FF0000 for red or #0000FF for blue.
You can use a tuple of values in the range 0-1, such as (0.1, 0.2, 0.5) for a custom color.
# plot
plt.plot(rnd_x, rnd_y, color = 'g')
# show
plt.show();
# plot
plt.plot(rnd_x, rnd_y, color = '#0000FF')
# show
plt.show();
# plot
plt.plot(rnd_x, rnd_y, color = (0.1, 0.2, 0.5))
# show
plt.show();
# plot
plt.plot(rnd_x, rnd_y, linestyle = 'dashdot')
# show
plt.show();