Python Support

The document discusses the creation and customization of scatter plots using Matplotlib, highlighting the differences between the plt.plot and plt.scatter functions. It also covers the importance of visualizing errors in data representation, including basic error bars and continuous error visualization techniques. The document emphasizes the flexibility of Matplotlib in creating informative visualizations for multidimensional data.

Uploaded by

Boris KOUDAYA


Simple Scatter Plots

Another commonly used plot type is the simple scatter plot, a close cousin of the line
plot. Instead of points being joined by line segments, here the points are represented
individually with a dot, circle, or other shape. We’ll start by setting up the notebook
for plotting and importing the functions we will use:
In[1]: %matplotlib inline
       import matplotlib.pyplot as plt
       plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
       import numpy as np

Scatter Plots with plt.plot

In the previous section, we looked at plt.plot/ax.plot to produce line plots. It turns
out that this same function can produce scatter plots as well (Figure 4-20):
In[2]: x = np.linspace(0, 10, 30)
       y = np.sin(x)

       plt.plot(x, y, 'o', color='black');

Figure 4-20. Scatter plot example

The third argument in the function call is a character that represents the type of symbol
used for the plotting. Just as you can specify options such as '-' and '--' to control
the line style, the marker style has its own set of short string codes. The full list of
available symbols can be seen in the documentation of plt.plot, or in Matplotlib’s
online documentation. Most of the possibilities are fairly intuitive, and we’ll show a
number of the more common ones here (Figure 4-21):
In[3]: rng = np.random.RandomState(0)
       for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
           plt.plot(rng.rand(5), rng.rand(5), marker,
                    label="marker='{0}'".format(marker))
       plt.legend(numpoints=1)
       plt.xlim(0, 1.8);

Figure 4-21. Demonstration of point markers
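The marker codes can also be listed programmatically rather than looked up in the documentation. The following is a minimal sketch, assuming that the `MarkerStyle.markers` dictionary in `matplotlib.markers` is the registry of interest; the filtering of non-string and "nothing" entries is our own choice for readability:

```python
from matplotlib.markers import MarkerStyle

# MarkerStyle.markers maps each marker code to a short description.
# Keep only the string codes; the dictionary also holds integer codes
# and several "no marker" entries described as 'nothing'.
marker_codes = {code: name for code, name in MarkerStyle.markers.items()
                if isinstance(code, str) and name != 'nothing'}

for code, name in sorted(marker_codes.items()):
    print(repr(code), name)
```

Any of the printed codes can be passed as the third argument to plt.plot, exactly as in the loop above.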

For even more possibilities, these character codes can be used together with line and
color codes to plot points along with a line connecting them (Figure 4-22):
In[4]: plt.plot(x, y, '-ok'); # line (-), circle marker (o), black (k)

Figure 4-22. Combining line and point markers
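To make the shorthand concrete, the sketch below draws the same style twice: once with the format string '-ok' and once with the equivalent keyword arguments. The vertical offset of 0.5 on the second curve is an arbitrary choice so that the two lines do not overlap:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 30)
y = np.sin(x)

fig, ax = plt.subplots()

# Shorthand: line style, marker, and color packed into one format string
short, = ax.plot(x, y, '-ok')

# The same request spelled out with keyword arguments
explicit, = ax.plot(x, y + 0.5, linestyle='-', marker='o', color='k')

# Both Line2D objects report the same style properties
for line in (short, explicit):
    print(line.get_linestyle(), line.get_marker(), line.get_color())
```

The keyword form is longer, but it is often easier to read when several properties are set at once.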

Additional keyword arguments to plt.plot specify a wide range of properties of the
lines and markers (Figure 4-23):
In[5]: plt.plot(x, y, '-p', color='gray',
                markersize=15, linewidth=4,
                markerfacecolor='white',
                markeredgecolor='gray',
                markeredgewidth=2)
       plt.ylim(-1.2, 1.2);


Figure 4-23. Customizing line and point markers

This type of flexibility in the plt.plot function allows for a wide variety of possible
visualization options. For a full description of the options available, refer to the
plt.plot documentation.

Scatter Plots with plt.scatter

A second, more powerful method of creating scatter plots is the plt.scatter function,
which can be used very similarly to the plt.plot function (Figure 4-24):
In[6]: plt.scatter(x, y, marker='o');

Figure 4-24. A simple scatter plot

The primary difference between plt.scatter and plt.plot is that plt.scatter can create
scatter plots where the properties of each individual point (size, face color, edge color,
etc.) can be individually controlled or mapped to data.
Let’s show this by creating a random scatter plot with points of many colors and sizes.
In order to better see the overlapping results, we’ll also use the alpha keyword to
adjust the transparency level (Figure 4-25):


In[7]: rng = np.random.RandomState(0)
       x = rng.randn(100)
       y = rng.randn(100)
       colors = rng.rand(100)
       sizes = 1000 * rng.rand(100)

       plt.scatter(x, y, c=colors, s=sizes, alpha=0.3,
                   cmap='viridis')
       plt.colorbar();  # show color scale

Figure 4-25. Changing size, color, and transparency in scatter points

Notice that the color argument is automatically mapped to a color scale (shown here
by the colorbar() command), and the size argument is given as a marker area in points
squared. In this way, the color and size of points can be used to convey information in
the visualization, in order to illustrate multidimensional data.
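The two mappings can be inspected directly. The sketch below is illustrative only: the three values and the desired marker widths are made up. It relies on the fact that s is an area in points squared, so a marker that should be twice as wide needs four times the s value, and that each color is obtained by passing the data through a normalization and then a colormap:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
values = np.array([0.0, 0.5, 1.0])     # data mapped to color
widths = np.array([10.0, 20.0, 40.0])  # desired marker widths in points (made up)

# s is an area (points squared), so square the desired width
pts = ax.scatter([0, 1, 2], [0, 0, 0], c=values, s=widths**2, cmap='viridis')

# The color of each point is cmap(norm(value)); by default the norm
# rescales the data linearly from [min, max] to [0, 1].
rgba = pts.cmap(pts.norm(values))

print(pts.get_sizes())
print(rgba)
```

Passing explicit vmin and vmax arguments to plt.scatter changes the range that the normalization maps onto the colormap.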
For example, we might use the Iris data from Scikit-Learn, where each sample is one
of three types of flowers that has had the size of its petals and sepals carefully
measured (Figure 4-26):
In[8]: from sklearn.datasets import load_iris
       iris = load_iris()
       features = iris.data.T

       plt.scatter(features[0], features[1], alpha=0.2,
                   s=100*features[3], c=iris.target, cmap='viridis')
       plt.xlabel(iris.feature_names[0])
       plt.ylabel(iris.feature_names[1]);


Figure 4-26. Using point properties to encode features of the Iris data

We can see that this scatter plot has given us the ability to simultaneously explore
four different dimensions of the data: the (x, y) location of each point corresponds to
the sepal length and width, the size of the point is related to the petal width, and the
color is related to the particular species of flower. Multicolor and multifeature scatter
plots like this can be useful for both exploration and presentation of data.

plot Versus scatter: A Note on Efficiency

Aside from the different features available in plt.plot and plt.scatter, why might
you choose to use one over the other? While it doesn’t matter as much for small
amounts of data, as datasets get larger than a few thousand points, plt.plot can be
noticeably more efficient than plt.scatter. The reason is that plt.scatter has the
capability to render a different size and/or color for each point, so the renderer must
do the extra work of constructing each point individually. In plt.plot, on the other
hand, the points are always essentially clones of each other, so the work of determining
the appearance of the points is done only once for the entire set of data. For large
datasets, the difference between these two can lead to vastly different performance,
and for this reason, plt.plot should be preferred over plt.scatter for large
datasets.
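One way to check this claim on your own machine is to time both functions on the same data. The following is a rough sketch, not a rigorous benchmark: the helper time_draw and the point count of 20,000 are our own choices, and the measured times depend on the Matplotlib version, the backend, and the hardware:

```python
import time
import numpy as np
import matplotlib.pyplot as plt

def time_draw(draw, n=20_000):
    """Return the seconds needed to create and render one figure with n points."""
    rng = np.random.RandomState(0)
    x, y = rng.rand(n), rng.rand(n)
    fig, ax = plt.subplots()
    start = time.perf_counter()
    draw(ax, x, y)
    fig.canvas.draw()  # force the renderer to do its work
    elapsed = time.perf_counter() - start
    plt.close(fig)
    return elapsed

t_plot = time_draw(lambda ax, x, y: ax.plot(x, y, 'o'))
t_scatter = time_draw(lambda ax, x, y: ax.scatter(x, y))
print(f"plot: {t_plot:.3f} s   scatter: {t_scatter:.3f} s")
```

Running the comparison several times, and with larger values of n, gives a more reliable picture than a single measurement.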

Visualizing Errors
For any scientific measurement, accurate accounting for errors is nearly as important as,
if not more important than, accurate reporting of the number itself. For example,
imagine that I am using some astrophysical observations to estimate the Hubble Constant,
the local measurement of the expansion rate of the universe. I know that the
current literature suggests a value of around 71 (km/s)/Mpc, and I measure a value of
74 (km/s)/Mpc with my method. Are the values consistent? The only correct answer,
given this information, is this: there is no way to know.


Suppose I augment this information with reported uncertainties: the current literature
suggests a value of around 71 ± 2.5 (km/s)/Mpc, and my method has measured a
value of 74 ± 5 (km/s)/Mpc. Now are the values consistent? That is a question that
can be quantitatively answered.
In visualization of data and results, showing these errors effectively can make a plot
convey much more complete information.
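As a worked version of the Hubble Constant comparison, the sketch below computes how many combined standard deviations separate the two values. It assumes, as the text does not state, that the two uncertainties are independent and Gaussian, so that they add in quadrature:

```python
import math

literature, literature_err = 71.0, 2.5  # (km/s)/Mpc
measured, measured_err = 74.0, 5.0      # (km/s)/Mpc

# Assumption: independent Gaussian uncertainties add in quadrature
combined_err = math.hypot(literature_err, measured_err)
n_sigma = abs(measured - literature) / combined_err

print(f"difference = {measured - literature:.1f} +/- {combined_err:.2f} (km/s)/Mpc")
print(f"separation = {n_sigma:.2f} sigma")
```

Under that assumption the difference of 3 (km/s)/Mpc carries an uncertainty of about 5.59 (km/s)/Mpc, a separation of roughly 0.54 sigma, so the two values are consistent.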

Basic Errorbars
A basic errorbar can be created with a single Matplotlib function call (Figure 4-27):
In[1]: %matplotlib inline
       import matplotlib.pyplot as plt
       plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
       import numpy as np
In[2]: x = np.linspace(0, 10, 50)
       dy = 0.8
       y = np.sin(x) + dy * np.random.randn(50)

       plt.errorbar(x, y, yerr=dy, fmt='.k');

Figure 4-27. An errorbar example

Here the fmt is a format code controlling the appearance of lines and points, and has
the same syntax as the shorthand used in plt.plot, outlined in “Simple Line Plots”
on page 224 and “Simple Scatter Plots” on page 233.
In addition to these basic options, the errorbar function has many options to fine-tune
the outputs. Using these additional options you can easily customize the aesthetics
of your errorbar plot. I often find it helpful, especially in crowded plots, to make
the errorbars lighter than the points themselves (Figure 4-28):
In[3]: plt.errorbar(x, y, yerr=dy, fmt='o', color='black',
                    ecolor='lightgray', elinewidth=3, capsize=0);


Figure 4-28. Customizing errorbars

In addition to these options, you can also specify horizontal errorbars (xerr), one-sided
errorbars, and many other variants. For more information on the options available,
refer to the docstring of plt.errorbar.
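The horizontal and one-sided variants can be combined in a single call. This is a small sketch with made-up error values: xerr takes a scalar for symmetric horizontal bars, and yerr takes a sequence of two arrays (lower errors first, then upper errors), so setting the lower errors to zero yields bars that extend upward only:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
x = np.linspace(0, 10, 10)
y = np.sin(x)

dx = 0.3                              # symmetric horizontal error
lower = np.zeros_like(x)              # no downward error: one-sided bars
upper = 0.2 + 0.3 * rng.rand(x.size)  # made-up upward errors

fig, ax = plt.subplots()
# A (2, N) yerr gives separate lower and upper errors for each point
bars = ax.errorbar(x, y, xerr=dx, yerr=[lower, upper],
                   fmt='o', color='black', ecolor='lightgray', capsize=3)
```

The same two-row form works for xerr when the horizontal errors are asymmetric.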

Continuous Errors
In some situations it is desirable to show errorbars on continuous quantities. Though
Matplotlib does not have a built-in convenience routine for this type of application,
it’s relatively easy to combine primitives like plt.plot and plt.fill_between for a
useful result.
Here we’ll perform a simple Gaussian process regression (GPR), using the Scikit-Learn
API (see “Introducing Scikit-Learn” on page 343 for details). This is a method of fitting
a very flexible nonparametric function to data with a continuous measure of the
uncertainty. We won’t delve into the details of Gaussian process regression at this
point, but will focus instead on how you might visualize such a continuous error
measurement:
In[4]: # GaussianProcessRegressor replaces the legacy GaussianProcess class,
       # which was removed in scikit-learn 0.20
       from sklearn.gaussian_process import GaussianProcessRegressor

       # define the model and draw some data
       model = lambda x: x * np.sin(x)
       xdata = np.array([1, 3, 5, 6, 8])
       ydata = model(xdata)

       # Compute the Gaussian process fit (default kernel, not the legacy cubic correlation)
       gp = GaussianProcessRegressor()
       gp.fit(xdata[:, np.newaxis], ydata)

       xfit = np.linspace(0, 10, 1000)
       yfit, std = gp.predict(xfit[:, np.newaxis], return_std=True)
       dyfit = 2 * std  # 2*sigma ~ 95% confidence region


We now have xfit, yfit, and dyfit, which sample the continuous fit to our data. We
could pass these to the plt.errorbar function as above, but we don’t really want to
plot 1,000 points with 1,000 errorbars. Instead, we can use the plt.fill_between
function with a light color to visualize this continuous error (Figure 4-29):
In[5]: # Visualize the result
       plt.plot(xdata, ydata, 'or')
       plt.plot(xfit, yfit, '-', color='gray')

       plt.fill_between(xfit, yfit - dyfit, yfit + dyfit,
                        color='gray', alpha=0.2)
       plt.xlim(0, 10);

Figure 4-29. Representing continuous uncertainty with filled regions

Note what we’ve done here with the fill_between function: we pass an x value, then
the lower y-bound, then the upper y-bound, and the result is that the area between
these two bounds is filled.
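The same call pattern works without any fitted model. The following minimal sketch uses a sine curve and a made-up uncertainty that grows with x, purely to show the order of the arguments:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
y = np.sin(x)
dy = 0.1 + 0.05 * x  # made-up uncertainty that grows with x

fig, ax = plt.subplots()
ax.plot(x, y, '-', color='gray')

# Arguments: x values, lower y-bound, upper y-bound
band = ax.fill_between(x, y - dy, y + dy, color='gray', alpha=0.2)
```

Because the bounds are ordinary arrays, the band does not have to be symmetric about the central curve.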
The resulting figure gives a very intuitive view into what the Gaussian process regression
algorithm is doing: in regions near a measured data point, the model is strongly
constrained and this is reflected in the small model errors. In regions far from a
measured data point, the model is not strongly constrained, and the model errors
increase.
For more information on the options available in plt.fill_between() (and the
closely related plt.fill() function), see the function docstring or the Matplotlib
documentation.
Finally, if this seems a bit too low level for your taste, refer to “Visualization with Seaborn”
on page 311, where we discuss the Seaborn package, which has a more streamlined
API for visualizing this type of continuous errorbar.
