Data Visualization with ggplot2: Case Study I, Bag Plot
Case Study I
Bag Plot
Data Visualization with ggplot2
ggplot2 2.0
● Write your own extensions
● Extremely flexible
● Create bag plot
● John Tukey (box plots)
● 2D box plot
data set
> dim(df)
[1] 202 2
> head(df)
type Value
1 1 99.43952
2 1 99.76982
3 1 101.55871
4 1 100.07051
5 1 100.12929
6 1 101.71506
2 box plots
> ggplot(df, aes(x = type, Value)) +
    geom_boxplot() +
    facet_wrap(~type, ncol = 2, scales = "free")
[Figure: box plots of Value for type 1 and type 2, one facet per type with free scales]
slope plot
> df$ID <- seq_len(nrow(df) / 2)
> ggplot(df, aes(x = type, Value, group = ID)) +
    geom_line(alpha = 0.3)
[Figure: slope plot of Value against type, one line per ID]
Distribution of slope
[Figure: distribution of slope, annotated "Box plot?"]
2 distinct variables
> head(dat)
group1 group2
1 99.43952 149.2896
2 99.76982 150.2569
3 101.55871 149.7533
4 100.07051 149.6525
5 100.12929 149.0484
6 101.71506 149.9550
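The step from the long df to the wide dat, and the slope that the earlier plot draws, can be sketched in base R. This is a minimal sketch on simulated data, not the slides' data: it assumes df is sorted by type, so that row i and row i + 101 belong to the same ID, and the means used in the simulation are only chosen to resemble the printed values.

```r
# Simulated stand-in for the slides' long-format df (101 pairs, 202 rows).
set.seed(123)
n <- 101
df <- data.frame(
  type  = rep(1:2, each = n),
  Value = c(rnorm(n, mean = 100), rnorm(n, mean = 150))
)
df$ID <- seq_len(nrow(df) / 2)  # recycled, so row i and row i + n share an ID

# One row per ID: the type-1 value next to the type-2 value.
# Assumption: df is sorted by type, so both subsets are in ID order.
dat <- data.frame(
  group1 = df$Value[df$type == 1],
  group2 = df$Value[df$type == 2]
)

# type runs from 1 to 2, so the run is 1 and the slope is the difference.
slope <- dat$group2 - dat$group1

dim(dat)
summary(slope)
```

Each line in the slope plot corresponds to one row of dat, which is why the two groups can be treated as two distinct variables.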
Scatter plot
> ggplot(dat, aes(x = group1, y = group2)) +
    geom_point()
[Figure: scatter plot of group2 against group1]
2D density plot
> library(viridis)
> ggplot(dat, aes(x = group1, y = group2)) +
    stat_density_2d(geom = "tile", aes(fill = ..density..),
                    contour = FALSE) +
    scale_fill_viridis()
[Figure: 2D density plot of group2 against group1, fill mapped to density]
Bag plot
> library(aplpack)
> bagplot(dat[1:2])
[Figure: bag plot of group2 against group1, with the bag, loop, and hull labeled]
aplpack
> library(aplpack)
> names(plot_data)
[1] "center" "hull.center" "hull.bag" "hull.loop"
[5] "pxy.bag" "pxy.outer" "pxy.outlier" "hdepths"
[9] "is.one.dim" "prdata" "xy" "xydata"
ggplot2
> ggplot(dat, aes(x = group1, y = group2)) +
    geom_point()
[Figure: scatter plot of group2 against group1]
ggplot2
> ggplot(dat, aes(x = group1, y = group2)) +
    stat_bag(alpha = 0.2)
[Figure: bag drawn by stat_bag() on the group1 and group2 axes]
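stat_bag() is not part of ggplot2 itself, so it has to be defined as an extension. The sketch below shows the general mechanism with a simpler stat that draws the convex hull of all points, following the pattern in the ggplot2 extension vignette. It is not the definition of stat_bag(): the names StatChull and stat_chull and the simulated dat are assumptions for illustration, and a real bag would need the depth computation that aplpack performs.

```r
library(ggplot2)

# A ggproto Stat that reduces each group to the points on its convex hull.
StatChull <- ggproto("StatChull", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  }
)

# Layer constructor, so the stat can be added to a plot with `+`.
stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Simulated stand-in for the slides' dat.
set.seed(42)
dat <- data.frame(group1 = rnorm(101, mean = 100),
                  group2 = rnorm(101, mean = 150))

p <- ggplot(dat, aes(x = group1, y = group2)) +
  geom_point() +
  stat_chull(fill = NA, colour = "black")
```

Printing p draws the points with the hull outline on top. A bag-style stat would keep the same structure and replace the body of compute_group.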
Remarks
● Useful but not popular
● Poorly understood
● Learn to use ggplot2 extensions
Let’s practice!
Case Study II
Weather (Part 1)
Weather
Source: http://www.edwardtufte.com/
present
> dim(present)
[1] 153 5
> head(present, n = 4)
month day year temp new_day
1 1 1 2016 41 1
2 1 2 2016 37 2
3 1 3 2016 40 3
4 1 4 2016 33 4
> tail(present, n = 4)
month day year temp new_day
148 5 28 2016 79 148
149 5 29 2016 80 149
150 5 30 2016 73 150
151 5 31 2016 76 151
Time series
> ggplot(present, aes(x = new_day, y = temp)) +
    geom_line()
[Figure: line plot of temp against new_day]
past
> str(past)
'data.frame': 7645 obs. of 11 variables:
$ month : num 1 1 1 1 1 1 1 1 1 1 ...
$ day : num 1 2 3 4 5 6 7 8 9 10 ...
$ year : num 1995 1995 1995 1995 1995 ...
$ temp : num 44 41 28 31 21 27 42 35 34 29 ...
$ new_day : int 1 2 3 4 5 6 7 8 9 10 ...
$ upper : num 51 48 57 55 56 62 52 57 54 47 ...
$ lower : num 17 15 16 15 21 14 14 12 21 8.5 ...
$ avg : num 35.6 35.4 34.9 35.1 35.9 ...
$ se : num 2.19 1.83 2.46 2.53 1.92 ...
$ avg_upper: num 40.2 39.2 40 40.5 39.9 ...
$ avg_lower: num 31 31.5 29.7 29.8 31.9 ...
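The summary columns in past (upper, lower, avg, se, avg_upper, avg_lower) are per-day statistics across years. How they might be derived can be sketched in base R. This is a sketch under assumptions: the data are simulated, summarise_day is a hypothetical helper, and the 95% interval is assumed to be t-based, which is consistent with the printed values but not confirmed by the slides.

```r
# Simulated stand-in for past: 21 years x 151 days of daily temperatures.
set.seed(1)
past <- expand.grid(new_day = 1:151, year = 1995:2015)
past$temp <- round(40 + 0.25 * past$new_day + rnorm(nrow(past), sd = 8))

# Per-day summaries across years (hypothetical helper).
summarise_day <- function(temp) {
  n   <- length(temp)
  avg <- mean(temp)
  se  <- sd(temp) / sqrt(n)
  half_width <- qt(0.975, df = n - 1) * se  # assumed t-based 95% interval
  c(upper = max(temp), lower = min(temp), avg = avg, se = se,
    avg_upper = avg + half_width, avg_lower = avg - half_width)
}

# One row of statistics per day, keyed by new_day.
stats <- do.call(rbind, lapply(split(past$temp, past$new_day), summarise_day))
stats <- data.frame(new_day = as.integer(rownames(stats)), stats)

# Attach the per-day columns to every row, as in the layout of str(past).
past <- merge(past, stats, by = "new_day")
head(past, n = 3)
```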
present + past
> ggplot(past, aes(x = new_day, y = temp, group = year)) +
    geom_line(alpha = 0.4) +
    geom_line(data = present, aes(group = 1), col = "red")
[Figure: temp against new_day, one line per past year, with the present year in red]
Linerange
[Figure: temp against new_day, drawn with line ranges]
Records
[Figure: temp against new_day, with record temperatures marked as points]
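Finding the record days amounts to comparing each present temperature with the past extremes for the same day. A minimal base R sketch follows; the upper and lower values are the first five shown in str(past), while the present temperatures are made up so that the example contains one record high and one record low.

```r
# Past extremes per day (first five values from str(past)).
past_extremes <- data.frame(
  new_day = 1:5,
  upper   = c(51, 48, 57, 55, 56),
  lower   = c(17, 15, 16, 15, 21)
)

# Hypothetical present-year temperatures, chosen to produce two records.
present <- data.frame(
  new_day = 1:5,
  temp    = c(41, 50, 40, 14, 33)
)

# Line up each present day with the past extremes for the same day.
both <- merge(present, past_extremes, by = "new_day")

# A record is a present temperature outside the past range.
both$record <- ifelse(both$temp > both$upper, "high",
               ifelse(both$temp < both$lower, "low", "none"))

records <- both[both$record != "none", c("new_day", "temp", "record")]
records
```

The resulting records data frame is the kind of small summary table that can be passed to a point layer on top of the line plot.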
Custom legend
[Figure: temp against new_day with a custom legend: New record high, past record high, 95% CI range, Current year, past record low, New record low]
Let’s practice!
Case Study II
Weather (Part 2)
Up to now
[Figure: the weather plot so far, temp against new_day with legend entries New record high, past record high, 95% CI range, Current year, past record low, New record low]
Situation
● Many data frames
● Plot summary data frame as a layer
● stat_summary()
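As a point of comparison for the custom stats that follow, the sketch below lets stat_summary() compute a per-day summary inside the layer instead of plotting a separate summary data frame. It uses simulated data, and day_range is a hypothetical helper; it is one possible way to draw the historical range, not the course's implementation.

```r
library(ggplot2)

# Simulated stand-in for past: 21 years x 151 days of daily temperatures.
set.seed(1)
past <- expand.grid(new_day = 1:151, year = 1995:2015)
past$temp <- 40 + 0.25 * past$new_day + rnorm(nrow(past), sd = 8)

# Summary for one day across years, in the shape stat_summary() expects
# from fun.data: a one-row data frame with y, ymin, and ymax.
day_range <- function(v) {
  data.frame(y = mean(v), ymin = min(v), ymax = max(v))
}

# The summary is computed per x value (per new_day) inside the layer.
p <- ggplot(past, aes(x = new_day, y = temp)) +
  stat_summary(fun.data = day_range, geom = "linerange", colour = "grey60")
```

Printing p draws one vertical range per day. The limitation is that each summary needs its own stat_summary() call, which is the motivation for wrapping the logic in dedicated stats.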
stat_historical()
> ggplot(my_data, aes(x = new_day, y = temp, fill = year)) +
    stat_historical()
[Figure: historical layer only, temp on the y-axis]
stat_present()
> ggplot(my_data, aes(x = new_day, y = temp, fill = year)) +
    stat_historical() +
    stat_present()
[Figure: historical and present layers, temp on the y-axis]
stat_extremes()
> ggplot(my_data, aes(new_day, temp, fill = year)) +
    stat_historical() +
    stat_present() +
    stat_extremes(aes(colour = ..record..))
[Figure: historical and present layers with record extremes marked as points, temp on the y-axis]
Specific layers
> ggplot(my_data, aes(new_day, temp, fill = year)) +
    stat_historical() +
    # stat_present() +
    stat_extremes(aes(colour = ..record..))
[Figure: historical layer with record extremes marked as points, without the present layer]
Facetting
[Figure: weather plot faceted by city, panels PARIS and REYKJAVIK, temp on the y-axis]
Let’s practice!
Wrap-up
● Statistics
● Design
● Graphical Data Analysis
● Communication & Perception
Explore: Confirm and Analyse
Explain: Inform and Persuade
Element Description
[Figure: plot by eating habits (Carnivore, Herbivore, Insectivore, Omnivore)]
[Figure: Yield (bushels/acre) against Year (1931, 1932) by Site (Waseca, Crookston, Morris, University Farm, Duluth, Grand Rapids)]
[Figure: proportions (0.00 to 1.00) of Under-weight, Healthy-weight, Over-weight, and Obese by age 18 to 84, shaded by residual]
[Figure: density of eruptions against waiting]
[Figure: Unemployment (%)]
[Figure: ternary plot of Sand, Silt, and Clay]
[Figure: Iris Sepals, Width against Length by Species (setosa, versicolor, virginica); Anderson, 1936]
[Figure: recap of the two case studies, the bag plot of group2 and the weather plot of temp with its custom legend]
Thank you!