ML Visualization NeurIPS Tutorial

Visualization for Machine Learning is a document that discusses how data visualization can be applied to machine learning. It provides an overview of the history and goals of data visualization, how visualizations work by encoding data visually, and some common techniques like color scales, guiding attention, interactive exploration, and faceting. It also discusses opportunities for applying visualization to machine learning, such as visualizing training data, model performance, interpretability, and high-dimensional data. The document aims to understand the state of the art in visualization and how those techniques can help apply machine learning models and communicate their results.

Visualization for Machine Learning

Fernanda Viégas @viegasf


Martin Wattenberg @wattenberg
Google Brain
PAIR
People + AI Research
Bringing Design Thinking and HCI
to Machine Learning
google.ai/pair
Today's Agenda
What is data visualization?
How does it work? What are some best practices?

How has visualization been applied to ML?


Overview of the landscape
Special case: high-dimensional data
Goals
Understand state of the art
Known best practices in visualization
Broad survey of existing applications to ML

Apply visualizations in your own situation


References to tools and libraries
References to literature
What is data visualization?
Transform data into visual encodings

What is it good for?


Data exploration
Scientific insight
Communication
Education

How to ensure it works well?


Engage the visual system in smart ways
Take advantage of pre-attentive processing
What is data visualization?
Transform data into visual marks

What is it good for? How is it different from statistics?


Data exploration (Vis: no specific question necessary)
Scientific insight (Classic Stats: you investigate a specific question*)
Communication (Vis & Stats: wonderful, complementary partners)
Education

How to ensure it works well?


Engage the visual system in smart ways
Take advantage of pre-attentive processing
*OK, maybe not in EDA, but visualization is the
key technique there anyway!
Predates computers...
William Playfair (1786)
Line, bar, pie charts were all
invented by the same person!

Aside from revolutionizing


graphics, Playfair was an
economist, engineer, and even
a secret agent.

(Image: Wikipedia)
Florence Nightingale (1858)
These charts led to the
adoption of better hygiene /
sanitary practices in military
medicine, saving millions of
lives.

Arguably the most effective


visualization ever!

This particular visualization


technique would be frowned
on today. Lesson: technique is
less important than having
the right data and right
message.

(Image: Wikipedia)
W. E. B. Du Bois (1900)
For the 1900 World's Fair, a
compendium of
visualizations. Many new
chart types!

Excellent example of
visualization aimed at
political change.

(Quartz)
What do these have in common?
Using special properties of the visual system to help us think.
What do these have in common?
Using special properties of the visual system to help us think.

Our visual system is like a GPU


- Incredibly good at a few special tasks
- With work, can be repurposed for more general situations
What do these have in common?
Using special properties of the visual system to help us think.

Our visual system is like a GPU


- Incredibly good at a few special tasks
- With work, can be repurposed for more general situations

All visualizations are made from a series of compromises.


How do visualizations work?
How do visualizations work?
Find visual encodings that
● Guide viewer's attention
● Communicate data to the viewer
● Let viewer calculate with data

On computer
● Interactive exploration
How do visualizations work?
Find visual encodings that
● Guide viewer's attention
● Communicate data to the viewer
● Let viewer calculate with data

On computer
● Interactive exploration
Encodings: some examples

Edmund Halley, 1686


Comparison A (2012): US Wind Map

Comparison B (2013): Earth.nullschool


Encodings: some theory
From perceptual psychology:
different encodings have different properties.

32.1

59.7

20.8

Position Length Area Slope Brightness Hue Text


Encodings: some theory
Good for communicating exact values...

32.1

59.7

20.8

Position Length Area Slope Brightness Hue Text


Encodings: some theory
Good for communicating ratios...

32.1

59.7

20.8

Position Length Area Slope Brightness Hue Text


Encodings: some theory
Good for drawing attention...

32.1

59.7

20.8

Position Length Area Slope Brightness Hue Text
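One concrete coding consequence of these rankings: if you encode a value as area, the radius must grow as the square root of the value, or ratios get wildly exaggerated. A minimal Python sketch, using the three values from the figure:

```python
import numpy as np

values = np.array([32.1, 59.7, 20.8])

# Position / length encodings: map the value linearly onto the axis.
lengths = values / values.max()

# Area encoding: perceived size tracks area, so radius must be sqrt(value).
# Scaling the radius linearly would show the 59.7 : 20.8 ratio (~2.9x)
# as an ~8.2x difference in area.
radii = np.sqrt(values / values.max())
areas = np.pi * radii ** 2

# The circle areas now preserve the data ratios.
print(areas[1] / areas[2])   # ~2.87, same as 59.7 / 20.8
```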


Special case: color scales
Intensively studied for decades…
Rogowitz & Treinish (1996)
Web article:

“Why Should Engineers and


Scientists Be Worried About Color?”

Conclusions:
● Rainbow scales: bad
● There is no “best” scale
Practically speaking...
When in doubt, use the "Color Brewer" site:
http://colorbrewer2.org

(Built by Cynthia Brewer, a cartographer)
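In code, "avoid rainbow scales" usually means reaching for a perceptually uniform colormap; matplotlib ships several, plus many ColorBrewer palettes under their original names. A small sketch (the specific colormap choices here are illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")   # headless backend; no display needed
from matplotlib import colormaps

values = np.linspace(0, 1, 5)

# Perceptually uniform sequential scale: brightness rises monotonically.
good = colormaps["viridis"](values)      # (5, 4) array of RGBA colors

# A ColorBrewer diverging scale, for data with a meaningful midpoint.
diverging = colormaps["RdBu"](values)

# The classic rainbow scale the literature warns against: its brightness
# is non-monotonic, which creates false boundaries in the data.
bad = colormaps["jet"](values)

print(good.shape)
```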


And study continues to this day...

A dive into a very recent paper (CHI 2018)


Color scales
Color scales
Uh oh, colorblindness… (very common!)

Red-blind protanopia. See http://www.color-blindness.com/coblis-color-blindness-simulator/


Guiding attention
Pre-attentive processing
Count the 5s
Count the 5s
Theory: attention
(Colin Ware, Visual Thinking for Design)

Pre-attentive processing / "popout"

Under the right circumstances, visual search


can be parallel, rather than serial

Time to find target does not increase as


number of distractors increases
Pre-Attentive Processing

Color Shape
Layering & separation

after Tufte
Layering & separation

after Tufte
Theory: calculation
Calculation
Example: we naturally average sizes.
“Seeing Sets: Representation by Statistical Properties.” Dan Ariely (2001)
Calculation
We can do weighted averages, too!
Example
Calculation
Hertzsprung-Russell diagram (via Wikipedia)

Your eye is doing something like kernel density


estimation...

Source: Wikipedia
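The analogy can be made literal: the density your eye reads off a scatter plot is roughly what a kernel density estimator computes. A sketch with synthetic two-cluster data (not the H-R diagram's data):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Two blobs of points, like two clusters in a scatter plot.
pts = np.vstack([rng.normal(0, 1, (200, 2)),
                 rng.normal(5, 1, (200, 2))])

# Fit a KDE to the 2-D point cloud (scipy expects shape (d, n)).
kde = gaussian_kde(pts.T)

# Density is high at the blob centers, low in the gap between them,
# which is exactly the structure a viewer perceives.
centers = kde([[0, 5], [0, 5]])   # density at (0, 0) and (5, 5)
gap = kde([[2.5], [2.5]])         # density at (2.5, 2.5)
print(centers.min() > gap[0])
```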
How do visualizations work
- on computers?
How do visualizations work
- on computers?
Beyond static representations
● Interaction
● Conversation and collaboration
Theory: interaction
Shneiderman “mantra”:
(1996: “The Eyes Have It: A Task by Data Type
Taxonomy for Information Visualizations”)
● Overview first
● Zoom and filter
● Details on demand
Theory: interaction
Shneiderman “mantra”:
(1996: “The Eyes Have It: A Task by Data Type
Taxonomy for Information Visualizations”)
● Overview first
● Zoom and filter
● Details on demand

Example: dot maps


The Racial Dot Map: One Dot Per Person for the Entire U.S.
demographics.virginia.edu/DotMap/
Recap: How do visualizations work?
Find visual encodings that
● Guide viewer's attention
● Communicate data to the viewer
● Let viewer calculate with data

On computer
● Interactive exploration
Some common techniques
That could help in the ML context…

From the simple...


Case study: the humble table
We've talked to many, many ML teams

Every one of them displayed data in tables

Good design can make a huge difference


Design thinking in action, a little movie:

Remove to improve data tables


Joey Cherdarchuk
DarkHorse Analytics
Key points
- Structure & hierarchy
- Alignment
- Typography
- Color

These all apply to more complicated visualizations!


Some common techniques
That could help in the ML context…
Data density:
small multiples

Drought’s Footprint
Haeyoun Park, Kevin Quealy
NY Times
Data
faceting

Across U.S. Companies,


Tax Rates Vary Greatly
M. Bostock, M. Ericson, D.
Leonhardt, B. Marsh
NY Times
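Both techniques reduce to one idea in code: split the data on a category and repeat the identical chart. A minimal matplotlib sketch with made-up data and hypothetical group names:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")   # headless backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
groups = ["A", "B", "C", "D"]   # hypothetical facet categories

# Small multiples: shared axes so the panels are directly comparable.
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(6, 4))
for ax, g in zip(axes.flat, groups):
    ax.plot(rng.normal(size=50).cumsum())   # same encoding in every panel
    ax.set_title(g)
fig.tight_layout()

print(axes.shape)
```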
Back to machine learning!
Opportunities for Vis

Vis Opportunities

Source: Yannick Assogba


Framework: visualization uses in ML
1. Training Data
2. Model Performance
3. Interpretability + model inspection
4. High-dimensional data
5. Education and communication
1. Visualizing training data
Visualizing CIFAR-10
CIFAR-10 Facets Demo
Facets
Open-source visualization
pair-code.github.io/facets
Google Creative Lab
https://quickdraw.withgoogle.com/
Quick Draw, the data
https://quickdraw.withgoogle.com/data
When things look alike
across cultures

Machine Learning for Visualization


Let’s Explore the Cutest Big Dataset
Ian Johnson
And when they don’t

South Africa Russia Korea Brazil United States Germany

Visual Averages by Country


Kyle McDonald
Outlets

Germany Japan Malaysia Sweden

Visual Averages by Country


Kyle McDonald
Finding Nemo:
small multiples

Visual Averages by Country


Kyle McDonald
2. Performance monitoring (very briefly!)
Monitoring dashboards - apply standard visualization tools!

TensorBoard Visdom

Two examples among many...


3. Interpretability + model inspection
Convolutional NNs
Image classification: interpretability petri dish
Image classifiers are effective in practice

Exactly what they're doing is somewhat mysterious


- And their failures (e.g. adversarial examples) add to mystery

But: Way easier to inspect what’s going on in artificial classifiers than in human
classifiers ;-)

Since these are visual systems, it's natural to use visualization to inspect them
- What features are these networks really using?
- Do individual units have meaning?
- What roles are played by different layers?
- How are high-level concepts built from low-level ones?
Saliency maps - examples

More comparisons: https://pair-code.github.io/saliency/


Saliency maps
(a.k.a. "Sensitivity maps")

Idea: consider sensitivity of class to each pixel


i.e. grad(f), where f is function from pixels to class score.

Many ways to extend basic idea!


- Layer-wise relevance propagation (Binder et al.)
- Integrated gradients (Sundararajan et al.)
- Guided backprop (Springenberg et al.)
- etc.

Yet interpretation is slippery (Adebayo et al., Kindermans et al.)


- Tend to be visually noisy. Are these sometimes Rorschach tests?
- Are some of these methods essentially edge detectors?
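The core recipe, sensitivity of the class score to each pixel, fits in a few lines even without a deep-learning framework. Below is a toy linear "model" with a finite-difference gradient; the weights, image, and sizes are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 8
w = rng.normal(size=(H * W,))   # toy linear "model": score = w . x
x = rng.random(H * W)           # toy "image" of H x W pixels

def score(x):
    return w @ x

# Finite-difference approximation of grad(score) w.r.t. each pixel.
eps = 1e-4
grad = np.array([
    (score(x + eps * np.eye(H * W)[i]) - score(x - eps * np.eye(H * W)[i]))
    / (2 * eps)
    for i in range(H * W)
])

# Saliency map: gradient magnitude, reshaped to the image layout.
saliency = np.abs(grad).reshape(H, W)

# Sanity check: for a linear model, the gradient is the weight vector.
assert np.allclose(grad, w, atol=1e-6)
```

In a real network you would get the gradient from autodiff rather than finite differences; the extensions listed above then modify how that gradient is propagated or accumulated.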
Visualizing arbitrary neurons along the way to the top...

Gray: trying to maximize neural response. Colorful squares: maximal examples from an image data set
Visualizing and Understanding Convolutional Networks
Zeiler & Fergus, 2013
Understanding Neural Networks Through Deep Visualization
Yosinski et al. , 2015
http://yosinski.com/deepvis
drawNet
Torralba
Deep Dream

deepdream
Mordvintsev, Tyka, Olah
Combining these
interpretability
ideas to create new
visualizations

The Building Blocks of


Interpretability
Olah, Satyanarayan, Johnson, Carter,
Schubert, Ye, Mordvintsev
Interpreting Deep Visual Representations
Bau, Khosla, Oliva, Torralba
RNNs
Visualizing text sequences, colored by activations of a cell

The Unreasonable Effectiveness of Recurrent Neural Networks


Karpathy, 2015
The Unreasonable Effectiveness of Recurrent Neural Networks
Karpathy, 2015
Seq2Seq-Vis:
Visual Debugging
Tool for Sequence-
to- Sequence
Models
Strobelt, 2018

Examine model
decisions
Connect decisions to
previous examples
Test alternative
decisions
Linking multiple
views...

DQNViz: A Visual
Analytics Approach
to Understand
Deep Q-Networks

Wang et al.,
VAST 2018.
4. High-dimensional data
Why high-dimensional data?
Vector spaces are the lingua franca of much of ML these days
- Data such as images, audio, video is naturally high-dimensional
- Dense representations of discrete data (e.g. word embeddings) have had major
successes
Why is it hard? Because it's impossible
Why is it hard? Because it's impossible

See Every Map Projection, Bostock.


Main approaches
Linear
- Principal Component Analysis
- Visualization of Labeled Data Using Linear Transformations (Koren & Carmel)

Non-linear (just a few of many)


- Multidimensional scaling
- Sammon mapping
- Isomap
- t-SNE
- UMAP
Main approaches
Linear
- Principal Component Analysis (show as much variation in data as possible)
- Visualization of Labeled Data Using Linear Transformations (clusters match labels)

Non-linear (just a few of many)


- Multidimensional scaling
- Sammon mapping
- Isomap Minimize distortion, according to some metric
- t-SNE
- UMAP
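As a concrete starting point, here is a PCA projection to 2-D with scikit-learn; the data is synthetic, with sizes and variances chosen only for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 500 points in 50-D, with variance concentrated in a few directions.
X = rng.normal(size=(500, 50)) * 10.0 ** np.linspace(1, -2, 50)

pca = PCA(n_components=2)
X2 = pca.fit_transform(X)   # 2-D coordinates to feed a scatter plot

# The explained-variance ratio tells you how faithful the 2-D picture is.
print(X2.shape, pca.explained_variance_ratio_.sum())
```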
t-SNE
t-SNE
Fairly complex non-linear technique

Uses an adaptive sense of "distance." Translates well between geometry of high- and
low-dimensional space

Has become a standard tool, so we'll spend some time discussing how to read it.
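A minimal way to try it: scikit-learn's implementation on its small built-in digits set, a stand-in for MNIST. The perplexity value here is just one choice; as the next slides argue, you should try several:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)   # 1797 8x8 digit images
X, y = X[:500], y[:500]               # subsample to keep the demo fast

# Perplexity sets the effective neighborhood size; it "really matters".
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(X)

print(emb.shape)   # 2-D coordinates for a scatter plot, colored by y
```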
Demo: MNIST visualization

Embedding Projector
Open Source visualization tool
Also available on Tensorboard
projector.tensorflow.org/
"Close reading" a visualization technique
What's the right way to understand
a "magic" visualization technique?

See Distill article


"Close reading" a visualization technique
What's the right way to understand
a "magic" visualization technique?

More visualization, of course!


Those hyperparameters really matter
Those hyperparameters really matter
Cluster sizes in a t-SNE plot mean nothing
Cluster sizes in a t-SNE plot mean nothing
Distances between clusters may not mean much
Distances between clusters may not mean much
You can see some shapes, sometimes
You can see some shapes, sometimes
Let's try this out with MNIST
Stopping too soon yields weird
artifacts.
The 4's may not be separated into
two clusters.

Clusters seem about equally far


apart in 3D; may not actually be.
The cluster of 1's probably is long
and thin.
UMAP: New kid on the block
UMAP: New kid on the block
Practical value
- Faster than t-SNE
- Can efficiently embed into high dimensions (i.e. useful not just for visualization)
- Often seems to capture global structure better
UMAP: New kid on the block
Practical value
- Faster than t-SNE
- Can efficiently embed into high dimensions (i.e. useful not just for visualization)
- Often seems to capture global structure better

Theory
- Roughly: manifold learning combined with explicit topology
- In detail: I don't completely understand the theory!
- This note does an amazing job of extracting key bits of UMAP paper:
https://www.math.upenn.edu/~jhansen/2018/05/04/UMAP/
UMAP: New kid on the block
Comparison of UMAP (left) and t-SNE (right) from McInnes
& Healy.

Global structure does seem to emerge more in UMAP.

For more
Let's compare in real-time on an audio data set!
Comparative Audio Analysis With Wavenet, MFCCs, UMAP,
t-SNE and PCA
(Leon Fedden)
Putting this together
The Beginner's Guide to Dimensionality Reduction
Matthew Conlen and Fred Hohman

https://idyll.pub/post/dimensionality-reduction-293e465c2a3443e8941b016d/
(just Google "Beginner's Guide to Dimensionality Reduction")
Pitfalls of high-dimensional space
Geometry of high-dimensional space holds many surprises…
Be careful about interpreting visualizations!

Adding "usually," "most," and "approximately" where appropriate:

- Two random vectors are perpendicular


- A standard Gaussian distribution is just a uniform distribution on a sphere
- A random matrix is a scalar multiple of an orthogonal matrix
- Random walks all have the same shape
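The first two surprises are easy to check numerically; a sketch with an arbitrary dimension d = 1000:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000

# "Two random vectors are (approximately) perpendicular."
u, v = rng.normal(size=d), rng.normal(size=d)
cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
print(abs(cos))   # close to 0; concentration tightens as d grows

# "A standard Gaussian is (approximately) uniform on a sphere."
# Equivalently: the norm of a d-dim standard Gaussian concentrates
# tightly around sqrt(d).
norms = np.linalg.norm(rng.normal(size=(200, d)), axis=1)
print(norms.std() / norms.mean())   # tiny relative spread
```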
Example: PCA of gradient descent trajectories

Lorch, Visualizing Deep Network Training Trajectories, 2017
Li et al., Visualizing the Loss Landscape of Neural Nets, 2018
How to interpret? Compare random walks
It turns out that principal components of a random walk in a
high-dimensional space are (probably, approximately) cosines of
various frequencies! (Antognini, Sohl-Dickstein)

Can also see this via Karhunen-Loeve theorem for Brownian


motion.

Important: This doesn't invalidate work that uses PCA to look at


SGD trajectories. But it changes how we read the visualizations:
the interesting parts are differences from Lissajous patterns,
not similarities.

Antognini, Sohl-Dickstein. 2018
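A toy reproduction of the effect (not the authors' code; the sizes are arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
T, D = 500, 100

# A random walk: cumulative sum of Gaussian steps in D dimensions.
walk = rng.normal(size=(T, D)).cumsum(axis=0)

# Project the trajectory onto its top principal component...
pc1 = PCA(n_components=1).fit_transform(walk)[:, 0]

# ...and compare against a half-period cosine over the trajectory.
t = np.arange(T)
cosine = np.cos(np.pi * (t + 0.5) / T)
corr = abs(np.corrcoef(pc1, cosine)[0, 1])
print(corr)   # typically close to 1, despite the walk being pure noise
```

Seeing a clean curve here is the baseline, not a discovery; deviations from it are what carry information about SGD trajectories.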


Lesson
If you see something interesting in
high-dimensional space…

compare to a random baseline!


Model interpretability example
Multi-lingual translation
What does the language embedding space look like?

https://arxiv.org/abs/1611.04558
Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas,
Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean
Training: English ← → Japanese
English ← → Korean
Japanese ← → Korean (zero shot)
Visualize internal representation ("embedding space")
Research question
What does the multi language embedding space look like?

or

Note: not real data


What does a sentence look like in embedding space?
(points in 1024-dim space: the data that the decoder receives)

E.g. “The stratosphere extends from 10km to 50km in altitude”


What does a sentence look like in embedding space?

Note: simplification of real situation!


What does a sentence look like in embedding space?
What do parallel sentences look like in embedding space?
(same meaning, different language)

like this?
<2en>

<2pt>

English
Portuguese
What do parallel sentences look like in embedding space?
(same meaning, different language)

or like this?

English
Portuguese
Interlingua?
Sentences with the same meaning mapped to similar regions regardless of language!
Distance between bridge / non-bridge sentences is inversely related to translation quality
5. Education and communication
Education & communication
for technical audiences
TensorFlow Playground
playground.tensorflow.org
GAN Lab
https://poloclub.github.io/ganlab/
Distill.pub
Editors: Carter, Olah, Satyanarayan
Görtler, Kehlbeck,
Deussen. 2018
Education & communication
for non-technical audiences
Attacking discrimination with smarter machine learning
research.google.com/bigpicture/attacking-discrimination-in-ml

Transform math into a visual,


interactive simulation that can be
used by a broader set of stakeholders
such as policymakers and regulators.

Wattenberg, Viégas, Hardt. 2016


Google Creative Lab
https://quickdraw.withgoogle.com/
On Quickdraw, users draw
common objects (e.g.
avocado), then see if the
algorithm has correctly
recognized the object.

You were asked to draw avocado, and


the neural net did not recognize it.
After users see the
recognition result,
Quickdraw shows
visual examples to help
users understand the
algorithm’s reasoning.

For example, it shows


examples of what
typical avocados look like.
It also shows a visual diff
between the user’s
drawing and the
most-similar drawings
from alternative classes.
Compare user input to
classes system thought
were closest
Show examples of what the
system expected for the class
in question

Illustrate latent space to users


Visual Analytics in Deep
Learning: An Interrogative
Survey for the Next
Frontiers
Hohman, Kahng, Pienta, Chau
Resources

ML-specific
- Stanford CS 231
- Sequences: Seq2Seq-vis, LSTMvis
- Embedding Projector
- Facets
- Lobe.ai
- A Survey: Visual Analytics in Deep Learning (Hohman et al)

General visualization & design
- Tableau (desktop app): commercial, state of the art, industrial-strength
- RawGraphs (web)
- Flourish.studio (web)
- Color Brewer
- Coblis: colorblindness simulator

Implementation
- D3 (see also bl.ocks.org)
- Notebooks: Observable, Jupyter
- Matplotlib
- Three.js
- Kepler.gl
- Plotly
Visualization for Machine Learning

Fernanda Viégas @viegasf


Martin Wattenberg @wattenberg
Google Brain
