seaborn: statistical data visualization

Michael L. Waskom¹
¹ Center for Neural Science, New York University

DOI: 10.21105/joss.03021
Software: Review, Repository, Archive
Editor: Lorena Pantano
Reviewers: @dangeles, @Sara-ShiHo
Submitted: 29 January 2021
Published: 06 April 2021
License: Authors of papers retain copyright and release the work under a Creative Commons
Attribution 4.0 International License (CC BY 4.0).

Summary

seaborn is a library for making statistical graphics in Python. It provides a high-level interface
to matplotlib and integrates closely with pandas data structures. Functions in the seaborn
library expose a declarative, dataset-oriented API that makes it easy to translate questions
about data into graphics that can answer them. When given a dataset and a specification
of the plot to make, seaborn automatically maps the data values to visual attributes such
as color, size, or style, internally computes statistical transformations, and decorates the plot
with informative axis labels and a legend. Many seaborn functions can generate figures with
multiple panels that elicit comparisons between conditional subsets of data or across different
pairings of variables in a dataset. seaborn is designed to be useful throughout the lifecycle of
a scientific project. By producing complete graphics from a single function call with minimal
arguments, seaborn facilitates rapid prototyping and exploratory data analysis. And by
offering extensive options for customization, along with exposing the underlying matplotlib
objects, it can be used to create polished, publication-quality figures.

Statement of need

Data visualization is an indispensable part of the scientific process. Effective visualizations
will allow a scientist both to understand their own data and to communicate their insights to
others (Tukey, 1977). These goals can be furthered by tools for specifying a graph that provide
a good balance between efficiency and flexibility. Within the scientific Python ecosystem, the
matplotlib (Hunter, 2007) project is very well established, having been under continuous
development for nearly two decades. It is highly flexible, offering fine-grained control over the
placement and visual appearance of objects in a plot. It can be used interactively through GUI
applications, and it can output graphics to a wide range of static formats. Yet its relatively
low-level API can make some common tasks cumbersome to perform. For example, creating
a scatter plot where the marker size represents a numeric variable and the marker shape
represents a categorical variable requires one to transform the size values to graphical units
and to loop over the categorical levels, separately invoking a plotting function for each marker
type.
The seaborn library offers an interface to matplotlib that permits rapid data exploration
and prototyping of visualizations while retaining much of the flexibility and stability that are
necessary to produce publication-quality graphics. It is domain-general and can be used to
visualize a wide range of datasets that are well-represented within a tabular format.
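
To make the contrast above concrete, the following sketch draws the same scatter plot twice: once with matplotlib alone, rescaling the size values and looping over the categorical levels, and once with a single seaborn call. The data frame and its column names (x, y, weight, group), the marker choices, and the size range are all invented for illustration and do not come from the paper.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: two positions, one numeric size variable, one category.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=60),
    "y": rng.normal(size=60),
    "weight": rng.uniform(1, 10, size=60),
    "group": rng.choice(["a", "b", "c"], size=60),
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(8, 3.5))

# matplotlib only: convert the size variable to marker areas (points^2) ...
weight_range = df["weight"].max() - df["weight"].min()
sizes = 20 + 180 * (df["weight"] - df["weight"].min()) / weight_range

# ... then loop over the levels, calling scatter once per marker type.
markers = {"a": "o", "b": "s", "c": "^"}
for level, marker in markers.items():
    rows = df["group"] == level
    ax_mpl.scatter(df.loc[rows, "x"], df.loc[rows, "y"],
                   s=sizes[rows], marker=marker, label=level)
ax_mpl.legend(title="group")

# seaborn: name the columns and let the library handle the mapping.
sns.scatterplot(data=df, x="x", y="y", size="weight", style="group", ax=ax_sns)
```

The seaborn call also labels the axes and builds the legend for both the size and the marker semantics, which the matplotlib version would need further code to match.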

Example

The following example demonstrates the creation of a figure with seaborn. The example
makes use of one of the built-in datasets that are provided for documentation and generation of
reproducible bug reports. It illustrates several of the features described in the Overview section,
including the declarative API, semantic mappings, faceting across subplots, aggregation with
error bars, and visual theme control.

import seaborn as sns

sns.set_theme(context="paper")
fmri = sns.load_dataset("fmri")
g = sns.relplot(
    data=fmri, kind="line",
    x="timepoint", y="signal",
    hue="event", style="event", col="region",
    height=3.5, aspect=.8,
)
g.savefig("paper_demo.pdf")

[Figure image: line plots of signal against timepoint in two panels (region = parietal, region = frontal), with lines distinguished by event (stim, cue).]

Figure 1: An example seaborn figure demonstrating some of its key features. The image was
generated using seaborn v0.11.1.

Overview

Users interface with seaborn through a collection of plotting functions that share a common
API for plot specification and offer many more specific options for customization. These
functions range from basic plot types such as scatter and line plots to functions that apply
various transformations and abstractions, such as histogram binning, kernel density estimation,
and regression model fitting. Functions in seaborn are classified as either “axes-level” or
“figure-level.” Axes-level functions behave like most plotting functions in the
matplotlib.pyplot namespace. By default, they hook into the state machine that tracks a “current”
figure and add a layer to it, but they can also accept a matplotlib axes object to control
where the plot is drawn, similar to using the matplotlib “object-oriented” interface. Figure-
level functions create their own figure when invoked, allowing them to “facet” the dataset
by creating multiple conditional subplots, along with adding conveniences such as putting
the legend outside the space of the plot by default. Each figure-level function corresponds
to several axes-level functions that serve similar purposes, with a single parameter selecting
the kind of plot to make. For example, the displot function can produce several different
representations of a distribution, including a histogram, kernel density estimate, or empirical
cumulative distribution function. The figure-level functions make use of a seaborn class that
controls the layout of the figure, mediating between the axes-level functions and matplotlib.
These classes are part of the public API and can be used directly for advanced applications.
One of the key features in seaborn is that variables in a dataset can be automatically
“mapped” to visual attributes of the graph. These transformations are referred to as
“semantic” mappings because they endow the attributes with meaning vis-à-vis the dataset. By
freeing the user from manually specifying the transformations – which often requires looping
and multiple function invocations when using matplotlib directly – seaborn allows rapid
exploration of multidimensional relationships. To further aid efficiency, the default parameters
of the mappings are opinionated. For example, when mapping the color of the elements in a
plot, seaborn infers whether to use a qualitative or quantitative mapping based on whether
the input data are categorical or numeric. This behavior can be further configured or even
overridden by setting additional parameters of each plotting function.
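
A small sketch of this inference, using an invented data frame with one categorical column (species) and one numeric column (dose): the same hue parameter yields a few discrete colors in the first case and a continuous gradient in the second, and the palette and hue_norm parameters override the defaults.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with one categorical and one numeric candidate for hue.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "x": rng.normal(size=80),
    "y": rng.normal(size=80),
    "dose": rng.uniform(0, 5, size=80),
    "species": rng.choice(["cat", "dog"], size=80),
})

fig, axes = plt.subplots(1, 3, figsize=(10, 3))

# Categorical hue: one discrete color per level.
sns.scatterplot(data=df, x="x", y="y", hue="species", ax=axes[0])

# Numeric hue: colors vary continuously with the value.
sns.scatterplot(data=df, x="x", y="y", hue="dose", ax=axes[1])

# Overriding the defaults: choose the colormap and fix its data range.
sns.scatterplot(data=df, x="x", y="y", hue="dose",
                palette="viridis", hue_norm=(0, 5), ax=axes[2])
```
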
Several seaborn functions also apply statistical transformations to the input data before
plotting, ranging from estimating the mean or median to fitting a general linear model. When
data are transformed in this way, seaborn automatically computes and shows error bars to
provide a visual cue about the uncertainty of the estimate. Unlike many graphical libraries,
seaborn shows 95% confidence interval error bars by default, rather than standard errors. The
confidence intervals are computed with a bootstrap algorithm, allowing them to generalize over
many different statistics, and the default level allows the user to perform “inference by eye”
(Cumming & Finch, 2005). Historically, error bar specification has been relatively limited, but
a forthcoming release (v0.12) will introduce a new configuration system that makes it possible
to show nonparametric percentile intervals and scaled analytic estimates of standard error or
standard deviation statistics.
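
The idea behind a bootstrap confidence interval can be sketched in a few lines of NumPy. This is a generic percentile bootstrap written for illustration, not seaborn's internal code; the function name bootstrap_ci and its parameters are hypothetical. Because the statistic is passed in as a function, the same procedure works for the mean, the median, or any other estimator, which is the generality the paragraph above refers to.

```python
import numpy as np


def bootstrap_ci(values, stat=np.mean, n_boot=1000, level=95, seed=None):
    """Percentile bootstrap confidence interval for stat(values).

    Illustrative helper only; the name and signature are assumptions.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    # Draw n_boot resamples (with replacement) of the same size as the data.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    # Recompute the statistic on every resample.
    boot_stats = np.array([stat(values[row]) for row in idx])
    # Take the central `level` percent of the bootstrap distribution.
    tail = (100 - level) / 2
    return np.percentile(boot_stats, [tail, 100 - tail])


rng = np.random.default_rng(3)
sample = rng.normal(loc=1.0, scale=2.0, size=50)

low, high = bootstrap_ci(sample, seed=0)                          # mean
low_med, high_med = bootstrap_ci(sample, stat=np.median, seed=0)  # median
```
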
seaborn aims to be flexible about the format of its input data. The most convenient usage
pattern provides a pandas (McKinney, 2010) dataframe with variables encoded in a long-
form or “tidy” (Wickham, 2014) format. With this format, columns in the dataframe can
be explicitly assigned to roles in the plot, such as specifying the x and y positions of a
scatterplot along with size and shape semantics. Long-form data supports efficient exploration
and prototyping because variables can be assigned different roles in the plot without modifying
anything about the original dataset. But most seaborn functions can also consume and
visualize “wide-form” data, typically producing similar output to how the analogous matplotlib
function would interpret a 2D array (e.g., producing a boxplot where each box represents a
column in the dataframe) while making use of the index and column names to label the graph.
Using the label information in a pandas object can help make plots that are interpretable
without further tweaking – reducing the chance of interpretive errors – but seaborn also
accepts data from a variety of more basic formats, including numpy (Harris et al., 2020) arrays
and simple Python collection types.
seaborn also offers multiple built-in themes that users can select to modify the visual
appearance of their graphs. The themes make use of the matplotlib rcParams system, meaning
that they will take effect for any figure created using matplotlib, not just those made by
seaborn. The themes are defined by two disjoint sets of parameters that separately control
the style of the figure and the scaling of its elements (such as line widths and font sizes). This
separation makes it easy to generate multiple versions of a figure that are scaled for different
contexts, such as written reports and slide presentations. The theming system can also be
used to set a default color palette. As color is particularly important in data visualization and
no single set of defaults is universally appropriate, every plotting function makes it easy to
choose an alternate categorical palette or continuous gradient mapping that is well-suited for
the particular dataset and plot type. The seaborn documentation contains a tutorial on the
use of color in data visualization to help users make this important decision.
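
The separation between style and scaling can be inspected directly, as in the sketch below. The specific theme names chosen here ("whitegrid", "paper", "talk", "colorblind") are just examples. axes_style and plotting_context each return a dictionary of matplotlib rcParams, and set_theme applies a combination of the two together with a default palette.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib as mpl
import seaborn as sns

# Each function returns a dict of rcParams without changing global state.
style = sns.axes_style("whitegrid")    # colors, grid, spines, tick direction
paper = sns.plotting_context("paper")  # font sizes and line widths, small
talk = sns.plotting_context("talk")    # the same keys, scaled up for slides

# The style and context parameter sets do not share any keys.
shared_keys = set(style) & set(paper)

# Apply one style, one context, and a default palette globally. Because this
# goes through rcParams, it affects every matplotlib figure created afterwards.
sns.set_theme(style="whitegrid", context="talk", palette="colorblind")
font_size_after = mpl.rcParams["font.size"]
```

Switching only the context argument rescales the figure elements for a different medium while leaving the style untouched.
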
seaborn does not aim to completely encapsulate or replace matplotlib. Many useful
graphs can be created through the seaborn interface, but more advanced applications –
such as defining composite figures with multiple arbitrary plot types – will require importing
and using matplotlib as well. Even when calling only seaborn functions, deeper
customization of the plot appearance is achieved by specifying parameters that are passed
through to the underlying matplotlib functions, and tweaks to the default axis limits,
ticks, and labels are made by calling methods on the matplotlib object that axes-level
seaborn functions return. This approach is distinct from other statistical graphing
systems, such as ggplot2 (Wickham, 2016). While seaborn offers some similar features
and, in some cases, uses similar terminology to ggplot2, it does not implement the
formal Grammar of Graphics and cannot be used to produce arbitrary visualizations. Rather, its
aim is to facilitate rapid exploration and prototyping through named functions and
opinionated defaults while allowing the user to leverage the considerable flexibility of matplotlib
to create more domain-specific graphics and to polish figures for publication. An
example of a successful use of this approach to produce reproducible figures can be found at
https://github.com/WagnerLabPapers/Waskom_PNAS_2017 (Waskom & Wagner, 2017).
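
A minimal sketch of this workflow, with made-up data and labels: extra keyword arguments to scatterplot are handed on to matplotlib's scatter, the returned Axes object is then adjusted with ordinary matplotlib methods, and a plain matplotlib plot shares the same figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data for illustration.
rng = np.random.default_rng(5)
df = pd.DataFrame({"x": rng.normal(size=100), "y": rng.normal(size=100)})

# A composite figure: one seaborn panel and one plain matplotlib panel.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# marker, edgecolor, and linewidth are not seaborn parameters; they are
# passed through to matplotlib's Axes.scatter.
ax = sns.scatterplot(data=df, x="x", y="y", ax=ax_left,
                     marker="D", edgecolor="black", linewidth=0.5)

# The axes-level function returns the matplotlib Axes, so limits, labels,
# and extra artists are set with the usual matplotlib methods.
ax.set(xlim=(-3, 3), xlabel="position (a.u.)", title="seaborn layer")
ax.axhline(0, color="0.5", linestyle="--")

# A plot type that seaborn is not used for here, drawn with matplotlib.
ax_right.hist2d(df["x"], df["y"], bins=10)
ax_right.set_title("matplotlib layer")
```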

Acknowledgements

M.L.W. has been supported by the National Science Foundation IGERT program (0801700)
and by the Simons Foundation as a Junior Fellow in the Simons Society of Fellows (527794).
Many others have helped improve seaborn by asking questions, reporting bugs, and
contributing code; thank you to this community.

References

Cumming, G., & Finch, S. (2005). Inference by eye: confidence intervals and how to read
pictures of data. The American Psychologist, 60(2), 170–180.
https://doi.org/10.1037/0003-066X.60.2.170
Harris, C. R., Millman, K. J., Walt, S. J. van der, Gommers, R., Virtanen, P., Cournapeau,
D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., Kerkwijk,
M. H. van, Brett, M., Haldane, A., Río, J. F. del, Wiebe, M., Peterson, P., … Oliphant,
T. E. (2020). Array programming with NumPy. Nature, 585(7825), 357–362.
https://doi.org/10.1038/s41586-020-2649-2
Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science &
Engineering, 9(3), 90–95. https://doi.org/10.1109/MCSE.2007.55
McKinney, W. (2010). Data structures for statistical computing in python. In S. van der Walt
& J. Millman (Eds.), Proceedings of the 9th Python in Science Conference (pp. 51–56).
https://doi.org/10.25080/Majora-92bf1922-00a
Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley. ISBN: 978-0201076165
Waskom, M. L., & Wagner, A. D. (2017). Distributed representation of context by intrinsic
subnetworks in prefrontal cortex. Proceedings of the National Academy of Sciences, 2030–
2035. https://doi.org/10.1073/pnas.1615269114
Wickham, H. (2014). Tidy data. Journal of Statistical Software, Articles, 59(10), 1–23.
https://doi.org/10.18637/jss.v059.i10
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag.
ISBN: 978-3-319-24277-4

Waskom, M. L., (2021). seaborn: statistical data visualization. Journal of Open Source Software, 6(60), 3021. https://doi.org/10.21105/joss.03021
