Matplotlib
Matplotlib is a popular data visualization library in Python. It provides a variety of tools
to create static, animated, and interactive plots. With it, you can generate graphs such as line
charts, bar charts, histograms, scatter plots, and more. It is widely used in scientific
computing, data science, and machine learning.
Features of Matplotlib:
Highly customizable: You can control every aspect of a plot, including colors, labels,
and layout.
Multiple plot types: Supports line plots, bar plots, histograms, scatter plots, pie
charts, etc.
Integration with NumPy and Pandas: Works seamlessly with these libraries for
data manipulation and visualization.
Interactive plots: Can create interactive figures using matplotlib.pyplot or
integrate with GUI backends.
Exporting capabilities: Plots can be saved as various file formats, including PNG,
PDF, and SVG.
The Figure is the final image, and may contain one or more Axes.
The Axes represents an individual plot (not to be confused with
Axis, which refers to the x-, y-, or z-axis of a plot).
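To make the Figure / Axes / Axis distinction concrete, here is a minimal sketch. It uses the object-oriented plt.subplots() call, which this section does not otherwise introduce, to build one Figure holding two Axes. Each Axes owns its own x-Axis and y-Axis objects. The plotted values are arbitrary.

```python
import matplotlib.pyplot as plt

# One Figure (the whole image) containing two Axes (two individual plots)
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

ax_left.plot([0, 1, 2], [0, 1, 4])
ax_left.set_title("First Axes")

ax_right.bar(["a", "b"], [3, 5])
ax_right.set_title("Second Axes")

# The Figure keeps a list of the Axes it contains
print(len(fig.axes))                   # 2
# Each Axes owns an x-Axis and a y-Axis object
print(type(ax_left.xaxis).__name__)    # XAxis
print(type(ax_left.yaxis).__name__)    # YAxis
```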
Matplotlib Architecture
Back-end layer:
The back-end layer of Matplotlib provides concrete implementations of three abstract interface classes.
FigureCanvas:
Provides the area onto which the figure is drawn.
Example: to draw in the real world we need paper to draw on. FigureCanvas plays the role of
that paper.
matplotlib.backend_bases.FigureCanvasBase
Renderer:
Does the actual drawing on the FigureCanvas. A Renderer instance knows how to draw on the
FigureCanvas.
Example: just as we need a paintbrush, pen or pencil to draw on paper, we use a Renderer object
to draw on the FigureCanvas.
matplotlib.backend_bases.RendererBase
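A minimal sketch of how the canvas and renderer relate, assuming the non-interactive Agg back-end that ships with Matplotlib. The concrete canvas class subclasses FigureCanvasBase and hands out a renderer that subclasses RendererBase. The figure size and dpi are arbitrary choices.

```python
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.backend_bases import FigureCanvasBase, RendererBase

# A 4 x 3 inch figure at 100 dots per inch
fig = Figure(figsize=(4, 3), dpi=100)

# FigureCanvasAgg is the Agg back-end's concrete FigureCanvas (the "paper")
canvas = FigureCanvasAgg(fig)

# The canvas provides a Renderer (the "paintbrush") that knows how to draw on it
renderer = canvas.get_renderer()

print(isinstance(canvas, FigureCanvasBase))  # True
print(isinstance(renderer, RendererBase))    # True
print(canvas.get_width_height())             # (400, 300)
```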
Event:
An Event instance represents user input such as keystrokes and mouse clicks.
matplotlib.backend_bases.Event
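The following is a minimal sketch of the event mechanism, assuming the headless Agg back-end. In a real interactive window the back-end creates the MouseEvent when you click. Here the click is synthesised by hand and passed through the canvas's callback registry so the example runs without a display. The handler name on_click and the clicks list are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless back-end so the sketch runs without a window
import matplotlib.pyplot as plt
from matplotlib.backend_bases import MouseEvent

clicks = []

def on_click(event):
    # Record which button was pressed and where (in display pixels)
    clicks.append((event.button, event.x, event.y))

fig, ax = plt.subplots()
# Register the handler; mpl_connect returns an integer connection id
cid = fig.canvas.mpl_connect("button_press_event", on_click)

# Simulate a left-button click at pixel (100, 150)
event = MouseEvent("button_press_event", fig.canvas, 100, 150, button=1)
fig.canvas.callbacks.process("button_press_event", event)

print(clicks)
```

Calling fig.canvas.mpl_disconnect(cid) removes the handler again.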
Artist layer:
The artist layer is built around one main object, the Artist.
If the FigureCanvas from the back-end layer is the paper, the Artist is the object that knows how
to use the Renderer to draw on that canvas.
Everything we see on a Matplotlib figure is an instance of Artist, for example the title, lines,
labels and images.
There are two types of Artist object:
Primitive:
Line2D, Rectangle, Circle, Text.
Composite:
Axis, Axes, Tick, and Figure.
Each composite artist may contain other composite artists as well as primitive artists.
The artist layer is best suited to programmers who need full control, and it is the appropriate
paradigm when writing a web application server or a UI application.
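A minimal sketch of working purely in the artist layer, without pyplot, assuming the Agg back-end for rendering. The Figure and Axes are composite artists, the plotted line is a primitive Line2D, and the result is written to an in-memory PNG rather than a window. The data values and title are arbitrary.

```python
import io

from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.axes import Axes
from matplotlib.lines import Line2D

# Artist layer: build the Figure directly, no pyplot involved
fig = Figure(figsize=(4, 3))
canvas = FigureCanvasAgg(fig)            # back-end layer: the canvas to draw on

ax = fig.add_subplot(1, 1, 1)            # composite artist (Axes)
line, = ax.plot([1, 2, 3], [1, 4, 9])    # primitive artist (Line2D)
ax.set_title("Drawn through the artist layer")

# The composite Axes contains the primitive Line2D among its children
print(isinstance(ax, Axes), isinstance(line, Line2D))  # True True
print(line in ax.get_children())                       # True

# Render to an in-memory PNG instead of a window
buffer = io.BytesIO()
canvas.print_png(buffer)
```

This style is what a web server would use: the PNG bytes in buffer can be returned directly in an HTTP response.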
Scripting layer:
The artist layer is syntactically heavy for everyday purposes, especially for exploratory work, so
Matplotlib provides a lighter scripting interface to simplify common tasks.
The scripting layer consists mainly of pyplot, which is lighter to use than the artist layer.
matplotlib.pyplot
Example:
import matplotlib.pyplot as plt
import numpy as np
# Two points: (0, 0) and (6, 250)
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])
# Title aligned to the left, in a larger blue font
plt.title("Density Function", fontsize=16, color="blue", loc="left")
# A font dictionary can be used to style an axis label
font1 = {'family': 'serif', 'color': 'blue', 'size': 20}
plt.xlabel("Average Pulse", fontdict=font1)
plt.ylabel("Calorie Burnage")
plt.plot(xpoints, ypoints)
plt.show()
The plot() function is used to draw points (markers) in a diagram.
By default, the plot() function draws a line from point to point.
Pyplot functions to customise plots
Add Grid Lines to a Plot
With Pyplot, you can use the grid() function to add grid lines to the plot.
You can use the axis parameter in the grid() function to specify which grid
lines to display.
Valid values are 'x', 'y', and 'both'. The default value is 'both'.
plt.grid(axis='x')
You can also set the line properties of the grid, like this: grid(True, color='color',
linestyle='linestyle', linewidth=number).
plt.grid(True)  # add both sets of grid lines to the background
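A short sketch combining the axis selection with line properties. It calls the Axes method ax.grid(), which accepts the same arguments as plt.grid(). The data values and styling choices are arbitrary.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 20, 25, 30])

# axis='x' draws the grid lines attached to the x ticks (vertical lines only),
# here green, dashed and thin
ax.grid(axis='x', color='green', linestyle='--', linewidth=0.5)
```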
Markers
You can use the keyword argument marker to emphasize each point with a
specified marker:
plt.plot(xpoints,ypoints, marker = 'o')
Format Strings fmt
You can also use the shortcut string notation parameter to specify the
marker.
This parameter is also called fmt, and is written with this syntax:
marker|line|color
plt.plot(ypoints, 'o:r')
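A small sketch showing that a format string is shorthand for the marker, linestyle and color keyword arguments. The ypoints values are arbitrary, and the second line is shifted up by 1 only so both lines are visible.

```python
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])
fig, ax = plt.subplots()

# 'o:r' = circle markers, dotted line, red colour
short, = ax.plot(ypoints, 'o:r')

# The same style written out with keyword arguments
explicit, = ax.plot(ypoints + 1, marker='o', linestyle=':', color='r')

# When only y values are given, x defaults to 0, 1, 2, ...
print(short.get_xdata())
```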
Line Reference
Line syntax    Description
'-'            Solid line
':'            Dotted line
'--'           Dashed line
'-.'           Dashed/dotted line
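Colour Reference
Colour syntax  Description
'r'            Red
'g'            Green
'b'            Blue
'c'            Cyan
'm'            Magenta
'y'            Yellow
'k'            Black
'w'            White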
Linewidth and Line Style
The linewidth and linestyle properties can be used to change the width and the style of the line
in a line chart. linewidth is specified in points (1 point = 1/72 inch), and the default is 1.5 points,
taken from rcParams["lines.linewidth"].
A number greater than the default therefore draws a thicker line, and a smaller number a thinner one.
We can also set the line style of a line chart using the linestyle parameter. It accepts strings
such as "solid", "dotted", "dashed" or "dashdot".
import matplotlib.pyplot as plt
import pandas as pd
height= [121.9,124.5,129.5,134.6,139.7,147.3,152.4,157.5,162.6]
weight= [19.7,21.3,23.5,25.9,28.5,32.1,35.7,39.6,43.2]
df=pd.DataFrame({"height": height,"weight": weight})
#Set xlabel for the plot
plt.xlabel('Weight in kg')
#Set ylabel for the plot
plt.ylabel('Height in cm')
#Set chart title:
plt.title('Average weight with respect to average height')
# plot using star markers ('*') and a green dash-dot line
plt.plot(df.weight, df.height, marker='*', markersize=10, color='green',
         linewidth=2, linestyle='dashdot')
plt.show()
Legend
In Matplotlib, a legend is a section describing the graph's components. It helps us understand
the various curves and plots in our figures.
plt.legend(handles=[line_one, line_two], loc='lower right', shadow=True)
# line_one, line_two: the Line2D objects returned by plt.plot()
To place a legend outside the axes, use the bbox_to_anchor parameter of the legend() function.
Other useful legend() parameters:
fontsize: changes the font size.
title: adds a title to the legend.
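A minimal sketch combining handles, bbox_to_anchor, fontsize and title. The sine and cosine curves are illustrative data. With loc='upper left' and bbox_to_anchor=(1.02, 1), the legend's upper-left corner is pinned just to the right of the top-right corner of the axes (coordinates are fractions of the axes).

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)
fig, ax = plt.subplots()

# plot() returns a list of Line2D objects; unpack the single line from each call
line_one, = ax.plot(x, np.sin(x), label='sin(x)')
line_two, = ax.plot(x, np.cos(x), label='cos(x)')

# Place the legend just outside the right edge of the axes
legend = ax.legend(handles=[line_one, line_two], loc='upper left',
                   bbox_to_anchor=(1.02, 1), fontsize=9,
                   title='Functions', shadow=True)
```

When saving such a figure, fig.savefig(..., bbox_inches='tight') enlarges the saved area so the outside legend is not cut off.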
savefig()
The savefig() function is used to save the plot as a file. You can specify
the file format (e.g., .png, .jpg, .pdf) and other options like the DPI (dots
per inch).
plt.savefig("plot.png") # Save the plot as a PNG file
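A short sketch of the main savefig() options. The file names are arbitrary. The output format is inferred from the file extension unless format= is given, which is required when writing to an in-memory buffer. With figsize=(4, 3) and dpi=200, the PNG should be 800 x 600 pixels.

```python
import io

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))   # size in inches
ax.plot([1, 2, 3], [2, 4, 1])

fig.savefig("plot.png", dpi=200)         # raster: 4 x 200 by 3 x 200 pixels
fig.savefig("plot.pdf")                  # vector output, format from the extension

buf = io.BytesIO()
fig.savefig(buf, format="svg")           # explicit format when there is no file name
```

Call savefig() before plt.show(): with interactive back-ends the figure may be closed once the window is dismissed, leaving an empty image to save.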
xticks() and yticks()
These functions are used to customize the tick marks on the x-axis and y-axis of the plot,
respectively. You can specify the locations of the ticks, the labels, or both.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y1 = [1, 4, 9, 16, 25]  # the squares of x
plt.plot(x, y1)
plt.xticks([1, 2, 3, 4, 5], ['one', 'two', 'three', 'four', 'five'])
plt.yticks([1, 4, 9, 16, 25], ['one', 'four', 'nine', 'sixteen', 'twenty-five'])
plt.show()
Display Multiple Plots
With the subplot() function you can draw multiple plots in one figure.
The subplot() function takes three arguments that describe the layout of the figure.
The layout is organized in rows and columns, which are represented by
the first and second arguments.
The third argument represents the index of the current plot. Indices start at 1 and run
left to right, then top to bottom.
plt.subplot(1, 2, 1)
# the figure has 1 row, 2 columns, and this plot is the first plot.
plt.subplot(1, 2, 2)
# the figure has 1 row, 2 columns, and this plot is the second plot.
import matplotlib.pyplot as plt
import numpy as np
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 3, 1)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 3, 2)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 3, 3)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 3, 4)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 3, 5)
plt.plot(x,y)
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 3, 6)
plt.plot(x,y)
plt.show()
You can add a title to each plot with the title() function:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x, y)
plt.title("SALES")
You can add a title to the entire figure with the suptitle() function:
plt.suptitle("MY SHOP")
plt.show()
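The same layout can also be built with the object-oriented plt.subplots() function, which returns the Figure and an array of Axes in one call. This is a sketch: set_title() plays the role of plt.title() for a single Axes, and fig.suptitle() that of plt.suptitle(). The second title "INCOME" is illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])

# 1 row, 2 columns: axs[0] is the left plot, axs[1] the right plot
fig, axs = plt.subplots(1, 2, figsize=(8, 3))

axs[0].plot(x, np.array([3, 8, 1, 10]))
axs[0].set_title("SALES")              # per-plot title (like plt.title)

axs[1].plot(x, np.array([10, 20, 30, 40]))
axs[1].set_title("INCOME")

title = fig.suptitle("MY SHOP")        # title for the whole figure (like plt.suptitle)
```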
Bar Chart
matplotlib.pyplot.bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs)
Make a bar plot. The bars are positioned at x with the given alignment. Their dimensions are
given by height and width.
x: The x coordinates of the bars.
height: The height(s) of the bars.
width: The width(s) of the bars (default: 0.8)
bottom: The y coordinate(s) of the bottom side(s) of the bars.
align {'center', 'edge'}, default: 'center': Alignment of the bars to the x coordinates:
'center': Center the base on the x positions.
'edge': Align the left edges of the bars with the x positions.
Other Parameters:
color: color or list of color, optional
The colors of the bar faces.
edgecolor: color or list of color, optional
The colors of the bar edges.
linewidth: float or array-like, optional
Width of the bar edge(s). If 0, don't draw edges.
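A sketch exercising width, bottom, color, edgecolor and linewidth together as a stacked bar chart. The quarterly numbers are illustrative only. Passing the first series as bottom for the second series stacks the second bars on top of the first.

```python
import matplotlib.pyplot as plt
import numpy as np

labels = ['Q1', 'Q2', 'Q3', 'Q4']
x = np.arange(len(labels))
online = [20, 35, 30, 35]   # illustrative values
store = [25, 32, 34, 20]

fig, ax = plt.subplots()
bars_online = ax.bar(x, online, width=0.6, color='tab:blue',
                     edgecolor='black', linewidth=1, label='Online')
# bottom= starts each bar where the corresponding 'online' bar ends
bars_store = ax.bar(x, store, width=0.6, bottom=online, color='tab:orange',
                    edgecolor='black', linewidth=1, label='Store')

ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
```

With the default align='center', each bar spans x - width/2 to x + width/2, so the second bar starts at 1 - 0.3 = 0.7.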
import matplotlib.pyplot as plt
import numpy as np
plt.rcdefaults()  # reset any customised rcParams to Matplotlib's defaults
objects = ('Python', 'C++', 'Java', 'Perl', 'Scala', 'Lisp')
y_pos = np.arange(len(objects))
performance = [10, 8, 6, 4, 2, 1]
# alpha=0.5 draws semi-transparent bars
plt.bar(y_pos, performance, align='center', alpha=0.5)
plt.xticks(y_pos, objects)  # label each bar with its language
plt.ylabel('Usage')
plt.title('Programming language usage')
plt.show()
import numpy as np
import matplotlib.pyplot as plt
# data to plot
n_groups = 4
means_frank = (90, 55, 40, 65)
means_guido = (85, 62, 54, 20)
# create plot
index = np.arange(n_groups)
bar_width = 0.35
plt.bar(index, means_frank, bar_width,color='b',label='Frank')
plt.bar(index + bar_width, means_guido,
bar_width,color='g',label='Guido')
plt.xlabel('Person')
plt.ylabel('Scores')
plt.title('Scores by person')
plt.xticks(index + bar_width/2, ('A', 'B', 'C', 'D'))
plt.legend()
plt.show()
Scatter Plot
matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None,
vmax=None, alpha=None, linewidths=None, *, edgecolors=None, plotnonfinite=False, data=None, **kwargs)
A scatter plot of y vs. x with varying marker size and/or color.
x, y: The data positions.
s: The marker size in points**2 (that is, the marker area).
c: The marker colors.
marker: (default 'o') The marker style.
alpha: The alpha blending value, between 0 (transparent) and 1 (opaque).
linewidths: (default 1.5) The linewidth of the marker edges.
edgecolors: The edge color of the marker.
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y, color = 'hotpink')
x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y, color = '#88c999')
plt.show()
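A sketch of the "varying marker size and/or color" case, using randomly generated illustrative data. Here s gives each point its own area in points**2, and c gives each point a number that the colormap (cmap) turns into a colour. The colorbar shows that mapping.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(50)
y = rng.random(50)
sizes = 300 * rng.random(50)    # s: one marker area (points**2) per point
values = rng.random(50)         # c: one number per point, mapped through cmap

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=sizes, c=values, cmap='viridis', alpha=0.6,
                edgecolors='black', linewidths=0.5)
fig.colorbar(sc, ax=ax, label='value')
```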
Histogram
A histogram is a graph showing frequency distributions.
It is a graph showing the number of observations within each given interval.
matplotlib.pyplot.hist(x, bins=None, range=None, density=False, weights=None, cumulative=False,
bottom=None, histtype='bar', align='mid', orientation='vertical', rwidth=None, log=False,
color=None, label=None, stacked=False, *, data=None, **kwargs): Compute and plot a histogram.
This method uses numpy.histogram to bin the data in x and count the number of
values in each bin, then draws the distribution either as
a BarContainer or Polygon.
x: (n,) array or sequence of (n,) arrays
bins: int or sequence or str (default: 10)
If bins is an integer, it defines the number of equal-width bins in the
range.
If bins is a sequence, it defines the bin edges, including the left edge
of the first bin and the right edge of the last bin; in this case, bins may
be unequally spaced. All but the last (righthand-most) bin is half-open. In
other words, if bins is:
[1, 2, 3, 4]
then the first bin is [1, 2) (including 1, but excluding 2) and the
second [2, 3). The last bin, however, is [3, 4], which includes 4.
range: tuple or None. The lower and upper range of the bins. Lower and upper
outliers are ignored. If not provided, range is (x.min(), x.max())
density: If True, draw and return a probability density: each bin will display the
bin's raw count divided by the total number of counts and the bin
width (density = counts / (sum(counts) * np.diff(bins))), so that the area
under the histogram integrates to 1 (np.sum(density * np.diff(bins)) == 1).
histtype: {'bar', 'barstacked', 'step', 'stepfilled'}, default: 'bar'
The type of histogram to draw.
'bar' is a traditional bar-type histogram. If multiple data are given
the bars are arranged side by side.
'barstacked' is a bar-type histogram where multiple data are
stacked on top of each other.
'step' generates a lineplot that is by default unfilled.
'stepfilled' generates a lineplot that is by default filled.
align: {'left', 'mid', 'right'}, default: 'mid'
The horizontal alignment of the histogram bars.
'left': bars are centered on the left bin edges.
'mid': bars are centered between the bin edges.
'right': bars are centered on the right bin edges.
orientation: {'vertical', 'horizontal'}, default: 'vertical'
If 'horizontal', barh will be used for bar-type histograms and
the bottom kwarg will be the left edges.
color: color or list of color or None, default: None
Color or sequence of colors, one per dataset. Default (None) uses the
standard line color sequence.
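A small sketch checking the bins and density rules above on a hand-made dataset. With bin edges [1, 2, 3, 4], the bins are [1, 2), [2, 3) and [3, 4], so the value 4 falls in the last bin. With density=True, each height is count / (total * bin width), so the bar areas sum to 1.

```python
import matplotlib.pyplot as plt
import numpy as np

data = [1, 1.5, 2, 2.5, 3, 4]
fig, ax = plt.subplots()

# bins as a sequence of edges: [1, 2), [2, 3), [3, 4]
counts, edges, patches = ax.hist(data, bins=[1, 2, 3, 4], edgecolor='black')
print(counts)   # [2. 2. 2.]

# density=True: each height is 2 / (6 * 1), and the areas add up to 1
density, edges, _ = ax.hist(data, bins=[1, 2, 3, 4], density=True, histtype='step')
print(np.sum(density * np.diff(edges)))   # approximately 1.0
```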
import matplotlib.pyplot as plt
# create data
data = [32, 96, 45, 67, 76, 28, 79, 62, 43, 81, 70,
61, 95, 44, 60, 69, 71, 23, 69, 54, 76, 67,
82, 97, 26, 34, 18, 16, 59, 88, 29, 30, 66,
23, 65, 72, 20, 78, 49, 73, 62, 87, 37, 68,
81, 80, 77, 92, 81, 52, 43, 68, 71, 86]
# create histogram
plt.hist(data)
# display histogram
plt.show()
import matplotlib.pyplot as plt
import numpy as np
# Generate random data
data = np.random.randn(1000) # 1000 samples from a normal distribution
# Create a histogram
plt.hist(data, bins=30, color='blue', alpha=0.7, edgecolor='black')
# Add labels and title
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram Example')
# Display the histogram
plt.show()
Box Plots
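Matplotlib's boxplot provides a graphical summary of a data set based on
features such as the minimum, first quartile, median, third quartile, and maximum.
By default, the whiskers extend to the most extreme data points within 1.5 times the
interquartile range (IQR) from the box, and points beyond them are drawn individually as outliers.
Note: A quartile is a statistical term for each of the three values that divide the
ordered observations into four equal-sized groups.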
Syntax:
matplotlib.pyplot.boxplot(x, notch=None, vert=None, patch_artist=None, widths=None, ...)
(only the most commonly used parameters are listed here)
x: The input data. If a 2D array, a boxplot is drawn for each column in x. If a
sequence of 1D arrays, a boxplot is drawn for each array in x.
notch: (default: False) Whether to draw a notched boxplot (True), or a
rectangular boxplot (False)
vert: (default: True) If True, draws vertical boxes. If False, draw horizontal
boxes.
patch_artist: (default: False) If False, boxes are drawn with the Line2D artist.
Otherwise, boxes are drawn with Patch artists.
widths: The widths of the boxes. The default is 0.5,
or 0.15*(distance between extreme positions), if that is smaller.
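A sketch using notch, patch_artist and widths on two synthetic samples drawn from normal distributions (illustrative data only). boxplot() returns a dictionary of the artists it created. Because patch_artist=True, the boxes are Patch objects whose face colour can be set. The median line sits at the same value np.percentile(..., 50) gives.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = [rng.normal(50, 10, 200), rng.normal(60, 15, 200)]   # two synthetic samples

fig, ax = plt.subplots()
parts = ax.boxplot(data, notch=True, patch_artist=True, widths=0.4)

# patch_artist=True makes the boxes fillable Patch objects
for box, colour in zip(parts['boxes'], ['lightblue', 'lightgreen']):
    box.set_facecolor(colour)

# The quartiles summarised by the first box
q1, median, q3 = np.percentile(data[0], [25, 50, 75])
print(q1, median, q3)
```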
Box Plot Figure
# Library Import (matplotlib)
import matplotlib.pyplot as plot
value_A = [62,77,44,41,77,69,79,79,75,35,48,81,75,64,72,88,81,56,54,59]
value_B = [ 64,52,81,45,35,31,91,98,19,91,90,52,29,53,103,16,75,13,33,23]
value_C = [25,29,19,98,52,81,21,61,65,85,12,54,12,56,36,55,35,32,22,82]
value_D = [89,75,71,19,88,66,89,99,70,80,89,78,14,29,75,86,79,91,73,90]
value_E = [90,57,76,40,18,88,65,81,58,19,47,89,32,36,43,52,18,58,19,95]
box_plot_data=[value_A,value_B,value_C,value_D,value_E]
plot.title("Fruit Growth Distribution")
# labels names each box (Matplotlib 3.9+ calls this parameter tick_labels)
plot.boxplot(box_plot_data, patch_artist=True,
             labels=['Banana', 'Pineapple', 'Peach', 'Grapes', 'Muskmelon'])
plot.show()
Heatmaps
A heatmap is a graphical representation of data where individual values are represented as
colors. In the context of a scatter dataset, a heatmap can show the density of data points in
different regions of the plot. This can be particularly useful for identifying clusters, trends,
and outliers in the data.
import matplotlib.pyplot as plt
import numpy as np
x = np.random.randn(1000)
y = np.random.randn(1000)
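heatmap, xedges, yedges = np.histogram2d(x, y, bins=50)
# Plot the heatmap: histogram2d indexes counts as [x, y], so transpose for imshow,
# and use extent so the axes show data coordinates rather than bin indices
plt.imshow(heatmap.T, origin='lower', cmap='viridis', aspect='auto',
           extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])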
plt.colorbar(label='Density')
plt.title('Heatmap')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
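As an alternative sketch, Matplotlib's hist2d() combines the binning and drawing steps. It calls np.histogram2d internally and draws the counts as a colour mesh, with the axes already in data coordinates. The random data here are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)

fig, ax = plt.subplots()
# hist2d bins the points and draws the counts in one call
counts, xedges, yedges, image = ax.hist2d(x, y, bins=50, cmap='viridis')
fig.colorbar(image, ax=ax, label='Count')
ax.set_title('Heatmap with hist2d')
```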