Course Notes: Data Visualization
COURSE
CHARTS AND DATA TYPES
THERE IS NEVER ONLY ONE RIGHT VISUALIZATION
CLIENTS REQUEST SPECIFIC COLORS
WITH THE AID OF ONLINE TOOLS TO BUILD YOUR OWN GRAPHS AND CHARTS
Bar Chart
[Figure: bar chart, "Car Listings by Brand". Y-axis up to 1,000; visible data labels: 875, 636, 509.]
• INTUITIVE
• APPROPRIATE FOR NON-TECHNICAL AUDIENCES
• ONE OF THE MOST COMMONLY USED CHARTS
Pie Chart
[Figure: pie chart, "Cars by Engine Fuel Type". Legend: Diesel, Gas, Other. Slice labels: 46%, 36%, 14%, 4%.]
• DON’T USE WHEN THE DATA DOESN’T ADD UP TO 100% OR WHEN THERE ARE TOO MANY CATEGORIES
• AVOID WITH CATEGORIES OF SIMILAR SIZE: IT IS DIFFICULT TO DETERMINE THE SIZE OF NON-RECTANGULAR SHAPES
• ORDER CATEGORIES BY SIZE TO IMPROVE READABILITY
Area Chart
[Figure: area chart by year, 1982 to 2016. Y-axis from 0 to 60,000. Series: Gas, Petrol, Diesel. Annotation on the slide: LINE CHART WORKS BETTER.]
• COMPARE VOLUME AMONG FEATURES
• AT LEAST THREE FEATURES
• ORDERING FOR AT LEAST TWO OF THEM
• TIME SERIES DATA
• Y-AXIS MUST START AT 0, SINCE WE’RE MEASURING VOLUME
Line Chart
[Figure: line chart, "S&P vs FTSE Returns (H2 2008)". Y-axis from -10.00% to 15.00%. Series: GSPC500, FTSE100.]
• WHEN YOU HAVE A LARGE PERIOD OF TIME, NARROW IT DOWN TO GAIN MORE INSIGHT
• UP TO SEVERAL CATEGORIES
• TIME SERIES DATA
• Y-AXIS DOESN’T HAVE TO START AT 0
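The advice to narrow a long period down before plotting can be sketched in a few lines of Python. This is only an illustration: the helper name `narrow_window`, the monthly series, and its values are hypothetical placeholders, not the index data behind the figure.

```python
from datetime import date


def narrow_window(series, start, end):
    """Keep only the (date, value) points with start <= date <= end."""
    return [(d, v) for (d, v) in series if start <= d <= end]


# Hypothetical monthly series, Jan 2007 to Dec 2009. The values are
# placeholders for illustration, not real index returns.
months = [(y, m) for y in range(2007, 2010) for m in range(1, 13)]
full_series = [
    (date(year, month, 1), 100.0 + i)
    for i, (year, month) in enumerate(months)
]

# Narrow three years of data down to the second half of 2008.
h2_2008 = narrow_window(full_series, date(2008, 7, 1), date(2008, 12, 31))
print(len(full_series), "points narrowed to", len(h2_2008))
```

Plotting only the narrowed list keeps short-lived movements visible that would be flattened on a multi-year axis.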
Histogram
• START WITH A VERY LARGE NUMBER OF BINS TO OBSERVE THE DATA PATTERN
• REDUCE THE NUMBER
• CHOOSE SEVERAL BINS, SUCH THAT THE PATTERN IN THE DATA IS VISIBLE
There are scientific approaches; however, they are not often used in practice. The reason is that real data has noise, is discrete, etc.
Scott’s rule: $3.49\,\sigma\,n^{-1/3}$
Sturges’ rule: $K = 1 + 3.322 \log N$
Doane’s rule: $\log_2(n) + 1 + \log_2\left(1 + \frac{b}{\sigma_b}\right)$
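For comparison with the trial-and-error approach, the three rules can be computed as in the rough sketch below. Several details are assumptions, since the notes do not define the symbols: the logarithm in Sturges' rule is taken as base 10 (3.322 is approximately 1/log10(2)), Scott's rule is read as a bin width rather than a bin count, and in Doane's rule b is taken to be the sample skewness, used in absolute value, with σ_b = sqrt(6(n-2)/((n+1)(n+3))). The function names and the sample data are hypothetical.

```python
import math
import random
import statistics


def sturges_bins(n):
    """Sturges' rule: K = 1 + 3.322 * log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))


def scott_bin_width(data):
    """Scott's rule, read as a bin width: 3.49 * sigma * n^(-1/3)."""
    sigma = statistics.stdev(data)  # sample standard deviation
    return 3.49 * sigma * len(data) ** (-1 / 3)


def scott_bins(data):
    """Turn Scott's bin width into a bin count over the data range."""
    return math.ceil((max(data) - min(data)) / scott_bin_width(data))


def doane_bins(data):
    """Doane's rule: 1 + log2(n) + log2(1 + |b| / sigma_b), rounded up.

    Assumes b is the sample skewness; needs n >= 3 and non-constant data.
    """
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    b = m3 / m2 ** 1.5
    sigma_b = math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
    return math.ceil(1 + math.log2(n) + math.log2(1 + abs(b) / sigma_b))


# Synthetic, seeded sample purely for illustration.
rng = random.Random(42)
sample = [rng.gauss(50, 10) for _ in range(500)]
print("Sturges:", sturges_bins(len(sample)))
print("Scott:", scott_bins(sample))
print("Doane:", doane_bins(sample))
```

The rules will generally disagree with each other, which is one practical reason to treat their output as a starting point and then adjust the bin count by eye.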
Scatter Plot
[Figure: scatter plot, "Relationship between Area and Price of California Real Estate". X-axis: Area (sq. ft.), 0 to 2,500. Y-axis: Price (thousands of $), 0 to 600.]
• USE TRANSPARENCY TO AVOID OVERPLOTTING
• A THIRD VARIABLE CAN BE ENCODED WITH A COLOR PARAMETER
20
15
10