Tufte in Python - Part One

The document discusses implementing Edward Tufte's principles for data visualization in Python using Matplotlib. It summarizes Tufte's recommendations from his book "The Visual Display of Quantitative Information". It then shows how to create minimal line plots in the style of Tufte's designs using Matplotlib's object-oriented interface. This includes modifying font settings and adding data labels and lines to the plot. In the next post, the author plans to demonstrate additional Tufte-inspired plots like minimal boxplots and dot-dash plots.

Tufte in Python – Part One

anishazaveri1 May 29, 2020

Edward Tufte recommends several principles for
representing data in his book “The Visual Display of
Quantitative Information”. Here are some that I find
especially useful:

1. The representation of numbers, as physically
measured on the surface of the graphic itself, should
be directly proportional to the numerical quantities
represented (p. 77)
2. Write out explanations of the data on the graphic
itself. Label important events in the data (p. 77)
3. Show data variation, not design variation (p. 77)
4. Maximize the data-ink ratio. Erase non-data ink and
redundant data-ink (p. 105)

Inspired by Lukasz Piwek’s implementation of Tufte’s
design principles in R, I decided to attempt the same in
Python. In this post, I will show you how to create Tufte-
style line plots, such as this:

Python has an array of visualization packages (here’s a
good overview). My go-to package till now has been
plotnine, which provides a ggplot2 interface in Python. It’s
elegant and has allowed me to escape learning Python’s
infamously annoying package, matplotlib. However, for all
the flak it gets, matplotlib is the most customizable and
powerful visualization package Python offers. It also
serves as the base for most other Python visualization
tools (including plotnine). So, here’s a guide to
implementing Tufte’s principles in Python using matplotlib.

A very short introduction to Matplotlib


A confusing aspect of matplotlib is the existence of two
APIs. This is possibly one of its biggest flaws. It makes
troubleshooting bugs very difficult, since answers on
StackOverflow frequently jump between the two APIs.

1. MATLAB-style API: Matplotlib was originally written
to mimic MATLAB, and the pyplot (plt) interface
provides a collection of MATLAB-like commands.
2. Object-oriented API: This API is more flexible, and the
one to use if you want better control and
customization.

I recommend using the object-oriented interface. This
ensures that you have standard syntax irrespective of
whether your plot is simple or complex.
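
To make the difference concrete, here is a minimal sketch that draws the same toy line with both APIs. The data and labels are made up for illustration, and the `Agg` backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Toy data, invented purely for this comparison.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# MATLAB-style: pyplot keeps track of a "current" figure and axes for you.
plt.figure()
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("x squared")
pyplot_axes = plt.gca()  # the implicit axes pyplot was drawing on

# Object-oriented: hold explicit Figure and Axes objects and call their methods.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("x")
ax.set_ylabel("x squared")
```

Both halves produce an equivalent plot. The object-oriented half keeps working unchanged once a figure holds several axes, which is why the rest of this post sticks to it.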

Here are the highlights of my matplotlib reading. It took
me about 30 min to get familiar with the basic syntax.

1. Lifecycle of a plot: A good matplotlib primer using the
object-oriented interface. Pay special attention to the
definition of Fig and Axes under ‘A note on the
Object-Oriented API vs. Pyplot’. I recommend
keeping this figure open on the side as reference.
After reading this you should feel comfortable with
~80% of the visualizations in this post.
2. Get familiar with Artists: Everything in your plot is
basically an Artist. Recognizing this is useful when
you want to customize the default instances of your
objects.
3. Get familiar with GridSpec: This will be useful for the
scatter-histogram plots, which we will create in Part 2
of this post.
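
As a rough illustration of the last two points, the sketch below lays out three axes with GridSpec and then customizes a few Artists directly. The 2x2 layout, the ratios, and the toy data are my own assumptions for this example, not the layout used in Part 2.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(6, 6))

# A 2x2 grid with a large lower-left cell (hypothetical ratios).
gs = GridSpec(2, 2, figure=fig, width_ratios=[4, 1], height_ratios=[1, 4])
ax_main = fig.add_subplot(gs[1, 0])
ax_top = fig.add_subplot(gs[0, 0], sharex=ax_main)
ax_right = fig.add_subplot(gs[1, 1], sharey=ax_main)

# plot() returns a list of Line2D objects; unpack the single line.
(line,) = ax_main.plot([0, 1, 2], [0, 1, 4])

# The figure, the axes, the line, a spine and an axis are all Artists.
parts = [fig, ax_main, line, ax_main.spines["top"], ax_main.xaxis]
all_are_artists = all(isinstance(part, Artist) for part in parts)

# Because they are Artists, each one can be customized through setter methods.
line.set_linewidth(0.8)
line.set_color("black")
ax_main.spines["top"].set_visible(False)
```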

Minimal lineplot

The plot we will replicate is found in The Visual Display of
Quantitative Information, p. 68.

First, let’s modify the default matplotlib font. The font in
Tufte’s plot is an oldstyle serif font, one where the
numerals don’t line up at the top and the bottom. After
some Googling I settled on ‘Sabon Roman OsF’ and
downloaded and installed the .ttf version of the font.

To make the font available to matplotlib, I deleted the
fontList file from matplotlib’s font cache (find the cache
directory by running print(matplotlib.get_cachedir())), so
that the font list is rebuilt on the next import. Next, I
modified the rcParams to use the new font as the default
serif font (you may have to restart your kernel to get this
to work).
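
A minimal sketch of the rcParams step follows. It assumes the installed font registers under the family name ‘Sabon Roman OsF’; the `DejaVu Serif` fallback is my addition so that text still renders if the font is not found.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Location of matplotlib's cache directory, where the font list file lives.
print(matplotlib.get_cachedir())

# Use serif fonts by default and put the new font first in the serif list.
# "Sabon Roman OsF" is assumed to be the family name matplotlib sees;
# "DejaVu Serif" ships with matplotlib and acts as a fallback.
plt.rcParams["font.family"] = "serif"
plt.rcParams["font.serif"] = ["Sabon Roman OsF", "DejaVu Serif"]
```

If the new font does not show up in the plot, matplotlib will warn that the family was not found and use the next entry in the list.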

Next, using the original plot as reference, I created some
data:

Now let’s initialize and modify the figure and axes:
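
Here is a minimal sketch of what this step can look like, not necessarily the exact settings behind the figures in this post. The year range, figure size, and line widths are placeholder assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Placeholder x-axis values (hypothetical; substitute your own data).
years = list(range(1967, 1978))

fig, ax = plt.subplots(figsize=(8, 4))

# Erase non-data ink: hide the top and right spines, thin out the others.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
for side in ("left", "bottom"):
    ax.spines[side].set_linewidth(0.5)

# Small, thin tick marks and one tick per year.
ax.tick_params(axis="both", direction="out", length=3, width=0.5)
ax.set_xticks(years)
ax.set_xlim(years[0] - 0.5, years[-1] + 0.5)
```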

At this point, here’s how our plot looks:

Time to add some data!
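
The sketch below shows one way to draw the line, a reference line, and a data label with the object-oriented API. The values are invented placeholders, not the numbers from Tufte's plot, and the styling choices are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Placeholder data (made up for illustration only).
years = list(range(1967, 1978))
values = [310, 330, 320, 340, 335, 350, 360, 355, 370, 380, 400]

fig, ax = plt.subplots(figsize=(8, 4))
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# The data itself: a thin black line with small markers.
ax.plot(years, values, color="black", linewidth=0.8, marker="o", markersize=4)

# A dashed horizontal reference line at the first year's value.
ax.axhline(values[0], color="black", linewidth=0.5, linestyle="dashed")

# Label the last point directly on the graphic instead of using a legend.
ax.annotate(
    f"{values[-1]}",
    xy=(years[-1], values[-1]),
    xytext=(6, 0),
    textcoords="offset points",
    va="center",
)
ax.set_xticks(years)
```

Writing the label next to the data follows Tufte's second principle above: explanations go on the graphic itself.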

Here’s the final plot:


Now that we know our way around Tufte-style line plots,
we can replicate a more complicated example from The
Visual Display of Quantitative Information, p. 75. The dark
line segment between 1955 and 1956 indicates stricter
enforcement by Connecticut policemen against cars
exceeding the speed limit. Data from other states is
provided for comparison.
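
A rough sketch of the idea: draw every series as a thin line, label each line at its right end, and overdraw the 1955 to 1956 segment of the Connecticut series with a heavier stroke. All numbers below are placeholders, and "State B" and "State C" are hypothetical stand-ins for the comparison states.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

years = list(range(1951, 1960))

# Placeholder values, invented for illustration only.
series = {
    "Connecticut": [13, 12, 12, 11, 14, 12, 11, 10, 10],
    "State B": [14, 13, 14, 13, 13, 14, 13, 12, 13],
    "State C": [10, 9, 10, 9, 10, 9, 9, 8, 9],
}

fig, ax = plt.subplots(figsize=(8, 5))
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

for name, values in series.items():
    ax.plot(years, values, color="black", linewidth=0.6)
    # Direct label at the end of each line, in place of a legend.
    ax.annotate(name, xy=(years[-1], values[-1]), xytext=(5, 0),
                textcoords="offset points", va="center")

# Overdraw the highlighted segment with a thicker line.
start, end = years.index(1955), years.index(1956)
(highlight,) = ax.plot(
    years[start:end + 1],
    series["Connecticut"][start:end + 1],
    color="black",
    linewidth=2.5,
)
ax.set_xticks(years)
```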

Final figure:

In the next post we will look at some other Tufte-style
plots, including a minimal boxplot and a dot-dash plot.