06-Visualization Compressed

The document outlines the process of Exploratory Data Analysis (EDA) and Data Visualization, emphasizing the importance of asking scientific questions, sampling data, and exploring patterns. It discusses effective visualization techniques, such as maintaining graphical integrity, simplicity, and strategic use of color, while contrasting exploratory and explanatory goals. The document also highlights the significance of storytelling with data and provides examples of effective and ineffective visualizations.

Exploratory Data Analysis & Data Visualization

Ask an interesting question.
What is the scientific goal?
What would you do if you had all the data?
What do you want to predict or estimate?

Get the data.
How were the data sampled?
Which data are relevant?
Are there privacy issues?

Explore the data.
Plot the data.
Are there anomalies?
Are there patterns?

Model the data.
Build a model.
Fit the model.
Validate the model.

Communicate and visualize the results.
What did we learn?
Do the results make sense?
Can we tell a story?
Data Exploration
Not always sure what we are looking for
(until we find it)
Example: Antibiotics
Will Burtin, 1951
Data: bacterium (genus, species), Gram stain (+ / -), and minimum inhibitory concentration [ml/g]
What Questions?
How effective are the drugs?

If the bacteria are Gram-positive, Penicillin & Neomycin are most effective
If the bacteria are Gram-negative, Neomycin is most effective

M. Bostock, Protovis, after W. Burtin, 1951
How do the bacteria compare?

Not a streptococcus!
(realized ~30 years later)

Really a streptococcus!
(realized ~20 years later)

Wainer & Lysen, “That’s funny...”, American Scientist, 2009
Adapted from Brian Schmotzer
How do the bacteria compare?

Wainer & Lysen, “That’s funny...”, American Scientist, 2009
Exploratory Data Analysis
“The greatest value of a picture is when
it forces us to notice what we never
expected to see.”
John Tukey
Visualization
To convey information through
graphical representations of data
Visualization Goals
Communicate (Explanatory)
Present data and ideas
Explain and inform
Provide evidence and support
Influence and persuade
Analyze (Exploratory)
Explore the data
Assess a situation
Determine how to proceed
Decide what to do
Visualization vs. statistics
Visualization almost always presents a more informative (though less
quantitative) view of your data than statistics (the noun, not the field)

[Source: https://twitter.com/JustinMatejka/status/770682771656368128 Credit: @JustinMatejka, @albertocairo]

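This is a mathematical property: a dataset has n data points, but its summary statistics impose only m equations to satisfy, with n > m, so many different datasets share the same statistics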
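A minimal sketch of this point, using made-up synthetic data rather than the dataset from the source above: two samples with very different shapes are shifted and scaled so that they share the same mean and standard deviation. The function name `match_moments`, the target values, and the random samples are illustrative assumptions.

```python
import random
import statistics


def match_moments(values, target_mean, target_std):
    """Shift and scale values to the requested mean and population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [target_mean + target_std * (v - mean) / std for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Two hypothetical samples: one single cluster, one split into two clusters.
unimodal = [rng.gauss(0.0, 1.0) for _ in range(200)]
bimodal = ([rng.gauss(-3.0, 0.3) for _ in range(100)]
           + [rng.gauss(3.0, 0.3) for _ in range(100)])

a = match_moments(unimodal, 50.0, 10.0)
b = match_moments(bimodal, 50.0, 10.0)


def near_mean(values, center=50.0, radius=5.0):
    """Count the points lying within radius of center."""
    return sum(1 for v in values if abs(v - center) <= radius)


for name, data in (("unimodal", a), ("bimodal", b)):
    print(f"{name}: mean={statistics.fmean(data):.2f} "
          f"std={statistics.pstdev(data):.2f} "
          f"points near the mean={near_mean(data)}")
```

Both samples report the same mean and standard deviation, yet one has many points near the mean and the other has almost none. Two numbers cannot pin down 200 values, which is why a plot shows what the statistics hide.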
Communicate

New York Times
Explore
MizBee [Meyer et al. 2009]

http://www.cs.utah.edu/~miriah/mizbee
Effective Visualizations
Not Effective...

Sources: US Treasury and WHO reports
http://viz.wtf
Effective Visualizations
1. Have graphical integrity
2. Keep it simple
3. Use the right display
4. Use color strategically
5. Tell a story with data
Graphical Integrity
Graphical Integrity

Flowing Data
Scale Distortions

Flowing Data
Scale Distortions
Scale Distortions

A. Kriebel, VizWiz
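One way to put a number on a scale distortion is Tufte's lie factor: the size of the effect shown in the graphic divided by the size of the effect in the data. The sketch below applies it to a bar chart whose axis does not start at zero. The values 50 and 55 and the baseline of 45 are hypothetical, chosen only for illustration, and the function names are assumptions.

```python
def relative_change(first, second):
    """Relative size of the change from first to second."""
    return abs(second - first) / abs(first)


def lie_factor(first, second, axis_baseline=0.0):
    """Change drawn as bar heights divided by the change in the data.

    Bars are drawn upward from axis_baseline, so a value v is shown
    as a bar of height v - axis_baseline.
    """
    shown = relative_change(first - axis_baseline, second - axis_baseline)
    actual = relative_change(first, second)
    return shown / actual


# Hypothetical values: a 10% increase from 50 to 55.
print(lie_factor(50.0, 55.0))                      # axis starts at 0
print(lie_factor(50.0, 55.0, axis_baseline=45.0))  # axis starts at 45
```

With the axis starting at zero the bars grow by the same 10% as the data, so the factor is 1. Starting the axis at 45 makes the second bar twice as tall as the first, a factor of about 10.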
[Figure: screenshot of a Yale Daily News chart, "Yale Graduates Majors Class 2011", comparing science, technology, engineering and math degrees with non-STEM degrees; September 13, 2011]
Keep It Simple
Edward Tufte
Maximize Data-Ink Ratio
Data-Ink Ratio = Data ink / Total ink used in graphic
[Figure: chart with categories 0-$24,999 and $25,000+]


Maximize Data-Ink Ratio
Data-Ink Ratio = Data ink / Total ink used in graphic
[Figure: bar chart for Males and Females, each split into 0-$24,999 and $25,000+, y-axis from 0 to 700]


Why 3D pie charts are bad

Kevin Fox
Avoid Chartjunk
Extraneous visual elements that distract from the message

ongoing, Tim Brey
Avoid Chartjunk

ongoing, Tim Brey
Avoid Chartjunk

ongoing, Tim Brey
Avoid Chartjunk

ongoing, Tim Brey
Avoid Chartjunk

ongoing, Tim Brey
Don’t!

matplotlib gallery

Excel Charts Blog


Use The Right Display
http://extremepresentation.typepad.com/blog/files/choosing_a_good_chart.pdf
Comparisons
Bar Chart
How Much Does Beer Consumption Vary by Country?

Bottles per person per week
Bars vs. Lines

Zacks 1999
Nathan Yau
Trends
Yahoo! Finance
Proportions
Pie Charts
eagerpies.com
Stacked Bar Chart

S. Few
Stacked Area Chart

S. Few
Don’t!
Correlations
Scatterplots

http://xkcd.com/388/
Don’t!

matplot3d tutorial
Distributions
Histogram

ggplot2
Bin Width

binwidth = 0.1 binwidth = 0.01


ggplot2
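The effect of bin width can be sketched without any plotting library. The example below bins the same synthetic sample at the two widths shown above and reports how the counts spread out. The sample (500 draws from a standard normal with a fixed seed) and the helper name `histogram` are assumptions made for illustration, and this is not ggplot2's own binning code.

```python
import math
import random
from collections import Counter


def histogram(values, binwidth):
    """Count values per bin; bin k covers [k * binwidth, (k + 1) * binwidth)."""
    return Counter(math.floor(v / binwidth) for v in values)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]  # hypothetical data

coarse = histogram(sample, 0.1)
fine = histogram(sample, 0.01)

for name, counts in (("binwidth = 0.1", coarse), ("binwidth = 0.01", fine)):
    print(f"{name}: {len(counts)} non-empty bins, "
          f"tallest bin holds {max(counts.values())} points")
```

Narrow bins spread the same points over many more bins with only a few points each, so the histogram becomes spiky and the overall shape is harder to read. Wide bins give a smoother outline but can hide fine structure.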
Density Plots
2D Density Plots
Seaborn Tutorial
Design Exercise
Hands-On Exercise
How do you feel about doing science?
Table
Interest            Before  After
Excited             19      38
Kind of interested  25      30
OK                  40      14
Not great           5       6
Bored               11      12

Data courtesy of Cole Nussbaumer


After the pilot program, 68% of kids expressed interest towards science, compared to 44% going into the program.
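The 44% and 68% figures can be traced back to the table in this exercise. The sketch below assumes that "expressed interest" means the two rows "Excited" and "Kind of interested"; that grouping is an inference from the numbers, not something the slides state.

```python
# (before, after) values from the table in this exercise.
responses = {
    "Excited": (19, 38),
    "Kind of interested": (25, 30),
    "OK": (40, 14),
    "Not great": (5, 6),
    "Bored": (11, 12),
}

# Assumption: these two rows are the ones counted as "interested".
INTERESTED = ("Excited", "Kind of interested")


def share_interested(column):
    """Percentage of responses in the interested rows (0 = before, 1 = after)."""
    total = sum(counts[column] for counts in responses.values())
    hits = sum(responses[label][column] for label in INTERESTED)
    return 100.0 * hits / total


before = share_interested(0)
after = share_interested(1)
print(f"Before: {before:.0f}%  After: {after:.0f}%")
```

Each column sums to 100, so the result is the same whether the table holds counts or percentages.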
Perceptual Effectiveness
Stevens’ Power Law, 1961

J. Bertin, 1967

Cleveland / McGill, 1984

J. Mackinlay, 1986

Heer / Bostock, 2010


How much longer?

B 4x
How much steeper slope?

A B
4x
How much larger area?

A B
10x
How much darker?

A B
2x
How much bigger value?

A B
4x

2 16
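Stevens' power law models this kind of misjudgment: perceived magnitude grows as the physical magnitude raised to an exponent that depends on the visual channel. The sketch below uses exponents of about 1.0 for length and 0.7 for area, values often quoted from Stevens' work. Treat them as approximate assumptions, since measured exponents vary between experiments.

```python
# Approximate, commonly quoted exponents (assumptions, not exact constants).
EXPONENTS = {"length": 1.0, "area": 0.7}


def perceived_ratio(true_ratio, exponent):
    """Perceived ratio under a power law: (I2 / I1) ** exponent.

    The scale constant of the power law cancels when taking a ratio.
    """
    return true_ratio ** exponent


for channel, true_ratio in (("length", 4.0), ("area", 10.0)):
    seen = perceived_ratio(true_ratio, EXPONENTS[channel])
    print(f"{channel}: true ratio {true_ratio:.0f}x, "
          f"perceived about {seen:.1f}x")
```

Under these assumptions a length that is 4 times longer is perceived as about 4 times longer, while an area that is 10 times larger is perceived as only about 5 times larger. This is one reason position and length encodings tend to rank above area encodings.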
[Figure: visual encodings ranked from most efficient to least efficient for quantitative, ordered, and categorical data]
C. Mulbrandon
VisualizingEconomics.com
Most Effective

VisualizingEconomics.com
Less Effective

VisualizingEconomics.com
Pie vs. Bar Charts
Least Effective

Cliff Mass
Use Color Strategically
Color Discriminability

Sinha 2007
Colors for Categories
Do not use more than 5-8 colors at once

Ware, “Information Visualization”
Colors for Ordinal Data
Vary luminance and saturation

Zeileis et al., 2009, “Escaping RGBland: Selecting Colors for Statistical Graphics”
Colors for Quantitative Data

Hue (Rainbow)
Luminance
Luminance & Hue

Rogowitz and Treinish, “Why should engineers and scientists be worried about color?”
Rainbow Colormap
Rainbow Colormap
Perceptually nonlinear

R. Simmon
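A rough way to see the nonlinearity is to track brightness along a rainbow ramp. The sketch below sweeps hue from red to blue with the standard-library `colorsys` module and estimates brightness with the Rec. 709 luma weights applied directly to RGB values. That is a crude stand-in for perceived lightness, used here as an assumption, and the 9-step ramp is arbitrary.

```python
import colorsys


def luma(rgb):
    """Approximate brightness of an (r, g, b) triple with components in [0, 1]."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def is_monotonic(values):
    """True if values never decrease or never increase."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


STEPS = 9  # arbitrary number of samples along each colormap

# Rainbow ramp: hue from red (0) to blue (2/3) at full saturation and value.
rainbow = [colorsys.hsv_to_rgb((2 / 3) * i / (STEPS - 1), 1.0, 1.0)
           for i in range(STEPS)]
# Grayscale ramp: equal red, green, and blue from black to white.
gray = [(i / (STEPS - 1),) * 3 for i in range(STEPS)]

rainbow_luma = [luma(c) for c in rainbow]
gray_luma = [luma(c) for c in gray]

print("rainbow:", [round(v, 2) for v in rainbow_luma])
print("gray:   ", [round(v, 2) for v in gray_luma])
print("rainbow monotonic:", is_monotonic(rainbow_luma))
print("gray monotonic:   ", is_monotonic(gray_luma))
```

The rainbow ramp gets brighter toward yellow, dips at green, rises again at cyan, and ends dark at blue, so equal steps in the data do not look like equal steps in the picture. The gray ramp changes brightness in one direction only.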
Avoid Rainbow Colors!

matplotlib gallery
Color Blindness

Protanope, Deuteranope: red / green deficiencies
Tritanope: blue / yellow deficiency

Based on slide from Stone


Color Blindness

Normal Protanope Deuteranope Lightness

Based on slide from Stone


Color Brewer

Nominal

Ordinal

Cynthia Brewer, Color Use Guidelines for Data Representation


Effective Visualizations
1. Have graphical integrity
2. Keep it simple
3. Use the right display
4. Use color strategically
5. Tell a story with data
Data types
Nominal: categorical data, no ordering
Example – Pet: {dog, cat, rabbit, …}
Operations: =, ≠

Ordinal: categorical data, with ordering
Example – Rating: {1, 2, 3, 4, 5}
Operations: =, ≠, ≥, ≤, >, <

Interval: numerical data, zero has no fixed meaning
Example – Temperature Fahrenheit
Operations: =, ≠, ≥, ≤, >, <, +, −

Ratio: numerical data, zero has special meaning
Example – Temperature Kelvin
Operations: =, ≠, ≥, ≤, >, <, +, −, ÷
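The four scales above build on each other: each one keeps the operations of the previous scale and adds new ones. The sketch below encodes that nesting and then shows why division is reserved for ratio data, using the temperature examples from the list. The names `NEW_OPERATIONS` and `allowed_operations` are illustrative assumptions, and the operators are written in ASCII.

```python
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Operations that each scale adds on top of the previous one.
NEW_OPERATIONS = {
    "nominal": {"==", "!="},
    "ordinal": {">=", "<=", ">", "<"},
    "interval": {"+", "-"},
    "ratio": {"/"},
}


def allowed_operations(scale):
    """All operations meaningful for a scale, including inherited ones."""
    ops = set()
    for name in SCALES[: SCALES.index(scale) + 1]:
        ops |= NEW_OPERATIONS[name]
    return ops


def fahrenheit_to_kelvin(deg_f):
    """Convert an interval-scale temperature to the ratio-scale Kelvin."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


print(sorted(allowed_operations("interval")))

# Dividing Fahrenheit readings suggests 80 F is "twice as hot" as 40 F.
naive_ratio = 80.0 / 40.0
# On the Kelvin scale, where zero is meaningful, the ratio is far smaller.
kelvin_ratio = fahrenheit_to_kelvin(80.0) / fahrenheit_to_kelvin(40.0)
print(f"Fahrenheit ratio: {naive_ratio:.2f}, Kelvin ratio: {kelvin_ratio:.2f}")
```

The Fahrenheit ratio of 2 is an artifact of where that scale places its zero, while the Kelvin ratio of about 1.08 reflects the physical quantity. Differences remain meaningful on both scales.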
1D DATA
Bar Chart

Data
Nominal
Ordinal
Interval
Ratio

Suggestions, not rules


Pie Chart

Data
Nominal
Ordinal
Interval
Ratio
Histogram

Data
Nominal
Ordinal
Interval
Ratio
2D DATA
Scatter plot
Dim 1 Dim 2
Nominal
Ordinal
Interval
Ratio

Why not ordinal data in first dimension?
Line plot

Dim 1 Dim 2
Nominal
Ordinal
Interval
Ratio

Why not ordinal data in first dimension?
Box and whiskers

Dim 1 Dim 2
Nominal
Ordinal
Interval
Ratio
Heatmap (matrix)

Dim 1 Dim 2
Nominal
Ordinal
Interval
Ratio
Heatmap (density, or 2D histogram)

Dim 1 Dim 2
Nominal
Ordinal
Interval
Ratio
3D+ DATA
3D scatter plot

Dim 1 Dim 2 Dim 3


Nominal
Ordinal
Interval
Ratio
Scatter plot matrix

Dim 1 Dim 2 Dim 3


Nominal
Ordinal
Interval
Ratio
Bubble plot

Dim 1 Dim 2 Dim 3


Nominal
Ordinal
Interval
Ratio
Color scatter plot

Dim 1 Dim 2 Dim 3


Nominal
Ordinal
Interval
Ratio
Further Reading
Edward Tufte
Stephen Few
