Module-5 DSV
Introduction
Matplotlib is probably the most popular plotting library for Python. It is
used for data science and machine learning visualizations all around the
world.
John Hunter was an American neurobiologist who began developing
Matplotlib in 2003.
It aimed to emulate the commands of the MATLAB software, which was
the scientific standard back then.
Several features, such as the global style of MATLAB, were introduced
into Matplotlib to make the transition to Matplotlib easier for MATLAB
users.
Figure
The Figure is the outermost container and allows you to draw
multiple plots within it. It not only holds the Axes objects but also lets
you configure the Figure title.
Axes
An Axes is an actual plot, or subplot, depending on whether you
want to plot a single visualization or multiple ones. Its sub-objects include
the x-axis, y-axis, spines, and legends.
Matplotlib gives us the ability not only to display data, but also design the
whole Figure around it by adjusting the Grid, X and Y ticks, tick labels, and
the Legend.
This implies that we can modify every single bit of a plot, starting from
the Title and Legend, right down to the major and minor ticks on the spines:
Taking a deeper look into the anatomy of a Figure object, we can observe the
following components:
Pyplot Basics
pyplot provides a simpler interface for creating visualizations that allows
users to plot data without explicitly configuring
the Figure and Axes themselves. They are configured automatically to achieve
the desired output. It is handy to use the alias plt to reference the imported
submodule: import matplotlib.pyplot as plt
Creating Figures
You can use plt.figure() to create a new Figure. This function returns a
Figure instance, but it is also passed to the backend. Every Figure-related
command that follows is applied to the current Figure and does not need to
know the Figure instance.
By default, the Figure has a width of 6.4 inches and a height of 4.8 inches
with a dpi (dots per inch) of 100. To change the default values of the Figure,
we can use the parameters figsize and dpi.
The dpi parameter controls the resolution of the plot when it is saved to a file
or displayed on screen. Higher values produce higher-resolution images with more
detail.
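As a minimal sketch of the figsize and dpi parameters described above (the specific values 10 x 5 inches and 300 dpi are chosen only for illustration):

```python
import matplotlib.pyplot as plt

# Default Figure: 6.4 x 4.8 inches at 100 dpi, unless the defaults were changed
default_fig = plt.figure()

# Custom Figure: 10 x 5 inches at 300 dpi (illustrative values)
custom_fig = plt.figure(figsize=(10, 5), dpi=300)

custom_size = tuple(custom_fig.get_size_inches())  # (width, height) in inches
custom_dpi = custom_fig.dpi
```

A Figure of this size saved at 300 dpi would be 3000 x 1500 pixels.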
Closing Figures
The number of the current Figure can be obtained with plt.gcf().number.
The plt.close('all') command is used to close all active Figures. The
following example shows how a Figure can be created and closed:
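A minimal sketch of creating and closing Figures, assuming no other Figures are open when it starts (the initial plt.close('all') makes sure of that):

```python
import matplotlib.pyplot as plt

plt.close('all')               # start from a clean state

fig = plt.figure()             # create a new Figure; it becomes the current one
fig_number = plt.gcf().number  # number of the current Figure
plt.figure()                   # create a second Figure

plt.close(fig_number)          # close one Figure by its number
open_after_one = plt.get_fignums()

plt.close('all')               # close every remaining Figure
open_after_all = plt.get_fignums()
```

plt.get_fignums() is used here only to inspect which Figures are still open.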
Format Strings
Colors can be specified in several formats, including:
RGB or RGBA float tuples (for example, (0.2, 0.4, 0.3) or (0.2, 0.4, 0.3,
0.5))
RGB or RGBA hex strings (for example, '#0F0F0F' or '#0F0F0F0F')
The following table is an example of how a color can be represented in one
particular format:
Figure 3.3: Color specified in string format
All the available marker options are illustrated in the following figure:
All the available line styles are illustrated in the following diagram. In general,
solid lines should be used. We recommend restricting the use of dashed and
dotted lines to either visualize some bounds/targets/goals or to depict
uncertainty, for example, in a forecast:
Figure 3.5: Line styles
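The color, marker, and line style options can be combined in a single format string or passed as separate keyword arguments. The following is a small sketch with made-up data values:

```python
import matplotlib.pyplot as plt

plt.figure()
x = [0, 1, 2, 3]

# Format string: color 'r' (red), marker 'o' (circle), line style '--' (dashed)
(dashed,) = plt.plot(x, [2, 4, 6, 8], 'ro--')

# The same kinds of properties as keyword arguments, using an RGB float tuple
(solid,) = plt.plot(x, [1, 3, 5, 7], color=(0.2, 0.4, 0.3),
                    marker='s', linestyle='-')

# A hex string color with a dotted line and no markers
(dotted,) = plt.plot(x, [0, 2, 4, 6], color='#0F0F0F', linestyle=':')
```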
What are the different ways of plotting data points in Matplotlib? Explain
with examples.
Plotting
With plt.plot([x], y, [fmt]), you can plot data points as lines and/or
markers. The function returns a list of Line2D objects representing the plotted
data. By default, if you do not provide a format string (fmt), the data points
will be connected with straight, solid lines. plt.plot([0, 1, 2, 3], [2,
4, 6, 8]) produces a plot, as shown in the following diagram. Since x is
optional and the default values are [0, …, N-1], plt.plot([2, 4, 6,
8]) results in the same plot:
Figure 3.6: Plotting data points as a line
If you want to plot markers instead of lines, you can just specify a format
string with any marker type. For example, plt.plot([0, 1, 2, 3], [2,
4, 6, 8], 'o') displays data points as circles, as shown in the following
diagram:
To plot multiple data pairs, the syntax plt.plot([x], y, [fmt], [x], y2,
[fmt2], …) can be used. plt.plot([2, 4, 6, 8], 'o', [1, 5, 9, 13],
's') results in the following diagram. Similarly, you can
use plt.plot multiple times, since we are working on the same Figure and
Axes:
Figure 3.8: Plotting data points with multiple markers
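The three plt.plot variants discussed in this section can be collected in one short sketch, using the same data values as in the text:

```python
import matplotlib.pyplot as plt

plt.figure()
# x given explicitly; no format string, so a straight, solid line is drawn
line_xy = plt.plot([0, 1, 2, 3], [2, 4, 6, 8])
# x omitted, so it defaults to [0, 1, 2, 3]
line_y = plt.plot([2, 4, 6, 8])

plt.figure()
# Two data series in one call, drawn as circles and squares
markers = plt.plot([2, 4, 6, 8], 'o', [1, 5, 9, 13], 's')
```

Each call returns a list of Line2D objects, one per plotted series.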
Ticks
Tick locations and labels can be set manually if Matplotlib's default isn't
sufficient. Considering the previous plot, it might be preferable to have
ticks only at integer positions on the x-axis. One way to accomplish this is to
use plt.xticks() and plt.yticks() to either get or set the ticks manually.
Parameters:
import numpy as np
import matplotlib.pyplot as plt
plt.figure(figsize=(6, 3))
plt.plot([2, 4, 6, 8], 'o', [1, 5, 9, 13], 's')
plt.xticks(ticks=np.arange(4))
This will result in the following plot:
plt.figure(figsize=(6, 3))
plt.plot([2, 4, 6, 8], 'o', [1, 5, 9, 13], 's')
plt.xticks(ticks=np.arange(4), \
labels=['January', 'February', 'March', 'April'], \
rotation=20)
This will result in the following plot:
Figure 3.10: Plot with custom tick labels
Displaying Figures
Saving Figures
plt.savefig(fname) saves the current Figure. There are some useful
optional parameters you can specify, such as dpi, format, or transparent.
The following code snippet gives an example of how you can save a Figure:
plt.figure()
plt.plot([1, 2, 4, 5], [1, 3, 4, 3], '-o')
#bbox_inches='tight' removes the outer white margins
plt.savefig('lineplot.png', dpi=300, bbox_inches='tight')
The following is the output of the code:
Figure 3.11: Saved Figure
In this exercise, we will create our first simple line plot using
Matplotlib, including the customization of the plot with format strings.
Labels
Matplotlib provides a few label functions that we can use for setting labels to
the x- and y-axes. The plt.xlabel() and plt.ylabel() functions are used to
set the label for the current axes.
The set_xlabel() and set_ylabel() functions are used to set the label for
specified axes.
Example:
fig, ax = plt.subplots()
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
You should (always) add labels to make a visualization more self-explanatory.
The same is valid for titles, which will be discussed now.
Titles
A title describes a particular chart/graph. The titles are placed above the axes
in the center, left edge, or right edge. There are two options for titles – you can
either set the Figure title or the title of an Axes.
The suptitle() function sets the title for the current or a specified Figure.
The title() function sets the title for the current or a specified
Axes.
Example:
fig = plt.figure()
fig.suptitle('Suptitle', fontsize=10, fontweight='bold')
This creates a bold Figure title with the text 'Suptitle' and a font size of 10.
plt.title('Title', fontsize=16)
The plt.title function adds a title to the current Axes, with the text 'Title'
and a font size of 16 in this case.
Text
There are two options for text: you can either add text to a Figure or text to
an Axes. The figtext(x, y, text) function adds text at position (x, y) of the
Figure, and text(x, y, text) adds text at position (x, y) of an Axes.
Example:
Example:
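A minimal sketch of both text functions follows. The positions and strings are made up for illustration. Note that text() takes data coordinates, whereas figtext() takes Figure coordinates from 0 to 1:

```python
import matplotlib.pyplot as plt

plt.figure()
plt.plot([0, 2, 4, 6, 8, 10], [0, 2, 4, 6, 8, 10])

# Axes text: x and y are given in data coordinates
axes_text = plt.text(4, 6, 'Text in data coordinates')

# Figure text: x and y are fractions of the Figure (0 = left/bottom, 1 = right/top)
figure_text = plt.figtext(0.5, 0.02, 'Text in Figure coordinates', ha='center')
```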
Legends
A legend describes the content of the plot. To add a legend to your Axes, we
have to specify the label parameter at the time of plot creation.
Calling plt.legend() for the current Axes or Axes.legend() for a specific
Axes will add the legend. The loc parameter specifies the location of the
legend.
Example:
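A short sketch of the label parameter together with plt.legend(), using made-up data and label names:

```python
import matplotlib.pyplot as plt

plt.figure()
plt.plot([1, 2, 3], label='Label 1')   # label is set when the line is created
plt.plot([2, 4, 3], label='Label 2')

# loc controls where the legend is placed inside the Axes
legend = plt.legend(loc='upper left')
legend_labels = [text.get_text() for text in legend.get_texts()]
```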
Let's look at the following scenario: you are interested in investing in stocks.
You downloaded the stock prices for the "big five": Amazon, Google, Apple,
Facebook, and Microsoft. You want to visualize the closing prices in dollars to
identify trends. This dataset is available in the Datasets folder that you had
downloaded initially. The following are the steps to perform:
Use Matplotlib to create a line chart visualizing the closing prices for the past five years
(whole data sequence) for all five companies. Add labels, titles, and a legend to make the
visualization self-explanatory. Use plt.grid() to add a grid to your plot.
# Import statements
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# load datasets
google = pd.read_csv('../../Datasets/GOOGL_data.csv')
facebook = pd.read_csv('../../Datasets/FB_data.csv')
apple = pd.read_csv('../../Datasets/AAPL_data.csv')
amazon = pd.read_csv('../../Datasets/AMZN_data.csv')
microsoft = pd.read_csv('../../Datasets/MSFT_data.csv')
# Use Matplotlib to create a line chart visualizing the closing prices for the past
# five years (whole data sequence) for all five companies. Add labels, titles, and a
# legend to make the visualization self-explanatory. Use plt.grid() to add a grid.
# Create figure
plt.figure(figsize=(16, 8), dpi=300)
# Plot data
plt.plot('date', 'close', data=google,
label='Google')
plt.plot('date', 'close', data=facebook,
label='Facebook')
plt.plot('date', 'close', data=apple,
label='Apple')
plt.plot('date', 'close', data=amazon,
label='Amazon')
plt.plot('date', 'close', data=microsoft,
label='Microsoft')
# Add grid
plt.grid()
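# Add title and axis labels (the texts are illustrative)
plt.title('Stock trend')
plt.xlabel('Date')
plt.ylabel('Closing price in $')
# Add legend
plt.legend()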
# Show plot
plt.show()
Use matplotlib code to demonstrate creating figures, closing figures, using format strings,
plotting, displaying figures, and saving figures with examples.
Summary:
Creating Figures: figsize, dpi
Closing Figures: closing by Figure number
Format Strings: color, line style, markers
Plotting: lines, markers, multiple markers, DataFrame input
Ticks: x-ticks, y-ticks, ticks with labels
Displaying Figures: show, inline
Saving Figures
Basic Text and Legend functions: Labels, Titles, Text, Annotations, Legend
Basic Plots
Bar Chart
Important parameters:
If you want to have subcategories, you have to call the plt.bar() function
multiple times with shifted x-coordinates. This is done in the following
example and illustrated in the figure that follows. The arange() function from
the NumPy package returns evenly spaced values within a
given interval. The gca() function returns the current Axes
of the current Figure. The set_xticklabels() function sets the
x-tick labels from a list of strings.
Example:
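The shifted x-coordinates can be sketched as follows. The category names and bar heights are hypothetical and serve only to show the technique:

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D']   # hypothetical categories
x = np.arange(len(labels))      # one position per category
width = 0.4                     # width of a single bar

plt.figure()
# Shift each subcategory by half a bar width to the left or right
left_bars = plt.bar(x - width / 2, [20, 25, 40, 10], width=width, label='Group 1')
right_bars = plt.bar(x + width / 2, [30, 15, 30, 20], width=width, label='Group 2')

# Put one tick per category and replace the numbers with the category names
plt.xticks(x)
ax = plt.gca()
ax.set_xticklabels(labels)
plt.legend()
```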
Use Matplotlib to create a bar plot for movie comparison (or) What is the
use of a bar plot? Explain with examples.
Activity 3.02: Creating a Bar Plot for Movie Comparison
We will use a bar plot to compare movie scores. You are given five movies
with scores from Rotten Tomatoes. The Tomatometer is the percentage of
approved Tomatometer critics who have given a positive review for the movie.
The Audience Score is the percentage of users who have given a score of 3.5 or
higher out of 5. Compare these two scores among the five movies.
# Create figure
plt.figure(figsize=(10, 5), dpi=300)
# Create bar plot
pos = np.arange(len(movie_scores['MovieTitle']))
width = 0.3
plt.bar(pos - width / 2, movie_scores['Tomatometer'], \
width, label='Tomatometer')
plt.bar(pos + width / 2, movie_scores['AudienceScore'], \
width, label='Audience Score')
# Specify ticks
plt.xticks(pos, rotation=10)
plt.yticks(np.arange(0, 101, 20))
# Get current Axes for setting tick labels and horizontal grid
ax = plt.gca()
# Set tick labels
ax.set_xticklabels(movie_scores['MovieTitle'])
ax.set_yticklabels(['0%', '20%', '40%', '60%', '80%', '100%'])
# Add minor ticks for y-axis in the interval of 5
ax.set_yticks(np.arange(0, 100, 5), minor=True)
# Add major horizontal grid with solid lines
ax.yaxis.grid(which='major')
# Add minor horizontal grid with dashed lines
ax.yaxis.grid(which='minor', linestyle='--')
# Add title
plt.title('Movie comparison')
# Add legend
plt.legend()
# Show plot
plt.show()
Use the Matplotlib library to create a pie chart for water usage.
Pie Chart
Important parameters:
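A pie chart can be created with plt.pie(x), where labels and autopct are optional parameters for the wedge labels and the percentage format. The following sketch uses hypothetical water-usage shares, since the actual dataset is not part of these notes:

```python
import matplotlib.pyplot as plt

# Hypothetical water-usage shares in percent (made up for illustration)
usage = [30, 25, 20, 15, 10]
labels = ['Shower', 'Toilet', 'Laundry', 'Kitchen', 'Other']

plt.figure(figsize=(6, 6))
# autopct formats the percentage printed inside each wedge
result = plt.pie(usage, labels=labels, autopct='%.0f%%')
wedges = result[0]
plt.title('Water usage')
```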
Stacked Bar Chart
A stacked bar chart uses the same plt.bar function as a regular bar chart. The
plt.bar function must be called once for each stacked layer, and from the
second layer onward the bottom parameter must be specified. This will become clear
with the following example:
plt.bar(x, bars1)
plt.bar(x, bars2, bottom=bars1)
plt.bar(x, bars3, bottom=np.add(bars1, bars2))
The result of the preceding code is visualized in the following diagram:
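For reference, a runnable version of the snippet above might look as follows. The values of x, bars1, bars2, and bars3 are hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(3)
bars1 = np.array([1, 2, 3])   # hypothetical values for the bottom layer
bars2 = np.array([4, 1, 2])   # middle layer
bars3 = np.array([2, 2, 2])   # top layer

plt.figure()
plt.bar(x, bars1)
# Each further layer starts where the layers below it end
second = plt.bar(x, bars2, bottom=bars1)
third = plt.bar(x, bars3, bottom=np.add(bars1, bars2))

second_bottoms = [rect.get_y() for rect in second]
third_bottoms = [rect.get_y() for rect in third]
```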
Let's look at the following scenario: you are the owner of a restaurant and, due
to a new law, you have to introduce a No Smoking Day. To make as few losses
as possible, you want to visualize how many sales are made every day,
categorized by smokers and non-smokers.
Use the dataset tips from Seaborn, which contains multiple entries of
restaurant bills, and create a matrix where the elements contain the sum of
the total bills for each day and smokers/non-smokers:
Note
For this exercise, we will import the Seaborn library as import seaborn as
sns. The dataset can be loaded using this code: bills =
sns.load_dataset('tips').
1. Import all the necessary dependencies and load the tips dataset.
Note that we have to import the Seaborn library to load the dataset.
2. Use the given dataset and create a matrix where the elements
contain the sum of the total bills for each day and split according to
smokers/non-smokers.
3. Create a stacked bar plot, stacking the summed total bills separated
according to smoker and non-smoker for each day.
4. Add a legend, labels, and a title.
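Step 2 can be sketched with a pandas groupby. To stay self-contained, the example below uses a small hypothetical DataFrame with the same column names as the tips dataset (day, smoker, total_bill) instead of loading the real data. This is one possible approach, not necessarily the one intended by the exercise:

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset, using the same column names
bills = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Sat', 'Sat', 'Sun', 'Sun'],
    'smoker': ['Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes', 'No'],
    'total_bill': [10.0, 20.0, 5.0, 15.0, 30.0, 40.0, 25.0, 35.0],
})

days = ['Thur', 'Fri', 'Sat', 'Sun']

# Sum the bills per (day, smoker) pair and arrange them as a matrix:
# one row per day, column 0 = smokers, column 1 = non-smokers
totals = (
    bills.groupby(['day', 'smoker'])['total_bill']
    .sum()
    .unstack('smoker')
    .reindex(index=days, columns=['Yes', 'No'])
    .to_numpy()
)
```

With a matrix of this shape, totals[:, 0] holds the smoker sums and totals[:, 1] the non-smoker sums for each day.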
The plot for steps 3 and 4 can be created as follows, assuming days, days_range,
and totals have been prepared in step 2:
# Create figure
plt.figure(figsize=(10, 5), dpi=300)
# Create stacked bar plot
plt.bar(days_range, totals[:, 0], label='Smoker')
plt.bar(days_range, totals[:, 1], bottom=totals[:, 0], \
label='Non-smoker')
# Add legend
plt.legend()
# Add labels and title
plt.xticks(days_range)
ax = plt.gca()
ax.set_xticklabels(days)
ax.yaxis.grid()
plt.ylabel('Daily total sales in $')
plt.title('Restaurant performance')
# Show plot
plt.show()
Stacked Area Chart
Important parameters:
Let's get some more practice with stacked area charts in the following
activity.
# Create figure
plt.figure(figsize=(10, 6), dpi=300)
# Create stacked area chart
labels = sales.columns[2:]
plt.stackplot('Quarter', 'Apple', 'Samsung', 'Huawei', \
'Xiaomi', 'OPPO', data=sales, labels=labels)
# Add legend
plt.legend()
# Add labels and title
plt.xlabel('Quarters')
plt.ylabel('Sales units in thousands')
plt.title('Smartphone sales units')
# Show plot
plt.show()
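For a self-contained illustration of plt.stackplot, here is a minimal sketch with made-up values in place of the sales dataset, passing the series directly rather than through the data parameter:

```python
import matplotlib.pyplot as plt

quarters = [1, 2, 3, 4]
# Hypothetical sales units per manufacturer (made up for illustration)
series_a = [10, 12, 14, 16]
series_b = [8, 9, 11, 12]
series_c = [5, 6, 6, 7]

plt.figure()
# Each series is stacked on top of the previous ones
areas = plt.stackplot(quarters, series_a, series_b, series_c,
                      labels=['A', 'B', 'C'])
plt.legend(loc='upper left')
```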
Histogram
Box Plot
The box plot shows multiple statistical measurements. The box extends from
the lower to the upper quartile values of the data, thereby allowing us to
visualize the interquartile range. For more details regarding the plot, refer to
the previous chapter. The plt.boxplot(x) function creates a box plot.
Important parameters:
Now that we've introduced histograms and box plots in Matplotlib, our
theoretical knowledge can be practiced in the following activity, where both
charts are used to visualize data regarding the intelligence quotient.
Note
The plt.axvline(x, [color=…], [linestyle=…]) function draws a
vertical line at position x.
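The histogram, plt.axvline, and box plot functions can be tried out together in a short sketch. The sample below is hypothetical: 200 random values drawn from a normal distribution with mean 100 and standard deviation 15, standing in for IQ scores:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical IQ-like sample (seeded so the result is reproducible)
rng = np.random.default_rng(0)
scores = rng.normal(100, 15, size=200)

# Histogram with 10 bins and a dashed vertical line at the assumed mean
plt.figure()
counts, bin_edges, _ = plt.hist(scores, bins=10)
plt.axvline(x=100, color='black', linestyle='--')

# Box plot of the same sample; the returned dict holds the drawn artists
plt.figure()
box = plt.boxplot(scores)
median_line = box['medians'][0]
```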
Scatter Plot
Scatter plots show data points for two numerical variables, displaying
one variable on each axis. plt.scatter(x, y) creates a scatter plot of y versus x,
with optionally varying marker size and/or color.
Important parameters:
plt.scatter(x, y)
The result of the preceding code is shown in the following diagram:
Let's visualize the correlation between body mass and maximum longevity for
various animals with the help of a scatter plot:
1. Create an Exercise3.03.ipynb Jupyter Notebook in
the Chapter03/Exercise3.03 folder to implement this exercise.
2. Import the necessary modules and enable plotting within the Jupyter
Notebook:
# Import statements
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
3. Use pandas to read the data located in the Datasets folder:
# Load dataset
data = pd.read_csv('../../Datasets/anage_data.csv')
4. The given dataset is not complete. Filter the data so that you end up
with samples containing a body mass and a maximum longevity, then group
the data by animal class. Here, np.isfinite() checks element-wise whether
a value is finite, so rows with missing (NaN) values are dropped:
# Preprocessing
longevity = 'Maximum longevity (yrs)'
mass = 'Body mass (g)'
data = data[np.isfinite(data[longevity]) \
& np.isfinite(data[mass])]
# Sort according to class
amphibia = data[data['Class'] == 'Amphibia']
aves = data[data['Class'] == 'Aves']
mammalia = data[data['Class'] == 'Mammalia']
reptilia = data[data['Class'] == 'Reptilia']
5. Create a scatter plot visualizing the correlation between the body
mass and the maximum longevity. Use different colors to group data
samples according to their class. Add a legend, labels, and a title. Use
a log scale for both the x-axis and y-axis:
# Create figure
plt.figure(figsize=(10, 6), dpi=300)
# Create scatter plot
plt.scatter(amphibia[mass], amphibia[longevity], \
label='Amphibia')
plt.scatter(aves[mass], aves[longevity], \
label='Aves')
plt.scatter(mammalia[mass], mammalia[longevity], \
label='Mammalia')
plt.scatter(reptilia[mass], reptilia[longevity], \
label='Reptilia')
# Add legend
plt.legend()
# Log scale
ax = plt.gca()
ax.set_xscale('log')
ax.set_yscale('log')
# Add labels
plt.xlabel('Body mass in grams')
plt.ylabel('Maximum longevity in years')
# Show plot
plt.show()
The following is the output of the code:
From the preceding output, we can see how maximum longevity in years relates to
body mass in grams across the different animal classes.
Bubble Plot
Example:
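A bubble plot is a scatter plot in which the marker size (and optionally the color) encodes an additional variable. The following sketch uses random, made-up data, and the scaling factor of 500 for the marker size is an arbitrary choice:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.random(30)
y = rng.random(30)
z = rng.random(30)   # third variable, encoded as marker size
c = rng.random(30)   # fourth variable, encoded as color

plt.figure()
# s sets the marker area per point, c the color value per point
bubbles = plt.scatter(x, y, s=z * 500, c=c, alpha=0.5)
plt.colorbar()       # legend for the color encoding
sizes = bubbles.get_sizes()
```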
Layouts
There are multiple ways to define a visualization layout in Matplotlib. By
layout, we mean the arrangement of multiple Axes within a Figure. We will
start with subplots and how to use the tight layout to create visually
appealing plots and then cover GridSpec, which offers a more flexible way to
create multi-plots.
Subplots
It is often useful to display several plots next to one another. Matplotlib offers
the concept of subplots, which are multiple Axes within a Figure. These plots
can be grids of plots, nested plots, and so on.
Example 1:
Example 2:
Subplots are an easy way to create a Figure with multiple plots of the same
size placed in a grid. They are not really suited for more sophisticated layouts.
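A grid of equally sized subplots can be sketched with plt.subplots. The plotted series are hypothetical, and sharing the axes is just one possible choice:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)
# Hypothetical data: one series per subplot
series = [np.sin(x), np.cos(x), np.sin(2 * x), np.cos(2 * x)]

# 2 x 2 grid of Axes; sharex/sharey make all plots use the same axis limits
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for ax, y in zip(axes.ravel(), series):
    ax.plot(x, y)
```

plt.subplots returns the Figure and an array of Axes, so each subplot can be addressed individually.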
Tight Layout
Examples:
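plt.tight_layout() adjusts the spacing between subplots so that titles and tick labels do not overlap neighbouring Axes. A minimal sketch, with placeholder data and titles:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2)
for i, ax in enumerate(axes.ravel()):
    ax.plot([0, 1], [0, 1])
    ax.set_title(f'Plot {i + 1}')

# Recompute the subplot spacing so that titles and tick labels fit
plt.tight_layout()
```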
Radar Charts
GridSpec
Example:
import matplotlib.gridspec
import matplotlib.pyplot as plt
gs = matplotlib.gridspec.GridSpec(3, 4)
ax1 = plt.subplot(gs[:3, :3])
ax2 = plt.subplot(gs[0, 3])
ax3 = plt.subplot(gs[1, 3])
ax4 = plt.subplot(gs[2, 3])
ax1.plot(series[0])
ax2.plot(series[1])
ax3.plot(series[2])
ax4.plot(series[3])
plt.tight_layout()
The result of the preceding code is shown in the following diagram:
Images
If you want to include images in your visualizations or work with image data,
Matplotlib offers several functions for you. In this section, we will show you
how to load, save, and plot images with Matplotlib.
Loading Images
import os
import matplotlib.image as mpimg
img_filenames = os.listdir('../../Datasets/images')
imgs = \
[mpimg.imread(os.path.join('../../Datasets/images', \
img_filename)) \
for img_filename in img_filenames]
The os.listdir() function returns the names of all files and
directories in the specified directory, and the os.path.join() function
joins one or more path components using the correct path separator.
Saving Images
Sometimes, it might be helpful to get an insight into the color values. We can
simply add a color bar to the image plot. It is recommended to use a colormap
with high contrast—for example, jet:
plt.imshow(img, cmap='jet')
plt.colorbar()
The preceding example is illustrated in the following figure:
Figure 3.44: Image with a jet colormap and color bar
Another way to get insight into the image values is to plot a histogram, as
shown in the following diagram. To plot the histogram for an image array, the
array has to be flattened using numpy.ravel:
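A sketch of the image histogram, assuming a hypothetical grayscale image given as a 64 x 64 array of random values between 0 and 1 (a real image would be loaded as shown earlier):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical grayscale image: a 64 x 64 array of values in [0, 1)
rng = np.random.default_rng(2)
img = rng.random((64, 64))

plt.figure()
# ravel() flattens the 2D array into 1D so every pixel counts as one sample
counts, bin_edges, _ = plt.hist(img.ravel(), bins=256, range=(0, 1))
```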
To plot multiple images in a grid, we can simply use plt.subplots and plot
an image per Axes:
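A minimal sketch of a 2 x 2 image grid, with random arrays as hypothetical stand-ins for the loaded images:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
# Hypothetical stand-ins for four loaded images
imgs = [rng.random((32, 32)) for _ in range(4)]

fig, axes = plt.subplots(2, 2)
for ax, img in zip(axes.ravel(), imgs):
    ax.imshow(img, cmap='gray')
    ax.axis('off')   # hide ticks and spines around each image
```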
In this activity, we will plot images in a grid. You are a developer in a social
media company. Management has decided to add a feature that helps the
customer to upload images in a 2x2 grid format. Develop some standard code
to generate grid-formatted images and add this new feature to your company's
website.
plt.xlabel('$x$')
plt.ylabel(r'$\cos(x)$')
The following diagram shows the output of the preceding code:
Figure 3.49: Diagram demonstrating mathematical expressions
TeX examples:
'$\alpha_i>\beta_i$' produces αᵢ > βᵢ.
Figure 3.53: Stacked area chart comparing sales units of different smartphone
manufacturers
Let's visualize the IQ of different groups using a histogram and a box plot: