
Data Visualization with ggplot2: Case Study I, Bag Plot

This document discusses data visualization using ggplot2. It demonstrates how to create box plots, slope plots, scatter plots, density plots, and bag plots using ggplot2 on sample datasets. The bag plot is a 2D extension of the box plot created by John Tukey to visualize the distribution of two variables.

Uploaded by

Sxk 333

DATA VISUALIZATION WITH GGPLOT2

Case Study I
Bag Plot

ggplot2 2.0
● Write your own extensions
● Extremely flexible
● Create bag plot
● John Tukey (box plots)
● 2D box plot

data set
> dim(df)
[1] 202 2

> head(df)
  type     Value
1    1  99.43952
2    1  99.76982
3    1 101.55871
4    1 100.07051
5    1 100.12929
6    1 101.71506
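
The six values printed above are consistent with 100 plus standard normal draws under set.seed(123), so a data frame of the same shape can be simulated. This is only a sketch: the seed, the group size of 101, the mean of 150 for the second group, and storing type as a factor are assumptions, not details given on the slide.

```r
# Assumption: two groups of 101 values each, stacked into one long data frame.
set.seed(123)
n <- 101

df <- data.frame(
  type  = factor(rep(c(1, 2), each = n)),   # assumed to be a factor
  Value = c(rnorm(n, mean = 100, sd = 1),   # group 1, centred on 100
            rnorm(n, mean = 150, sd = 1))   # group 2, centred on 150 (assumed)
)

dim(df)
head(df)
```

A stand-in like this is enough to run the box plot and slope plot code on the following slides.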

2 box plots
> ggplot(df, aes(x = type, Value)) +
    geom_boxplot() +
    facet_wrap(~type, ncol = 2, scales = "free")

[Figure: box plots of Value for type 1 and type 2, one panel per type with free scales]

slope plot
> df$ID <- seq_len(nrow(df) / 2)
> ggplot(df, aes(x = type, Value, group = ID)) +
    geom_line(alpha = 0.3)

[Figure: slope plot of Value against type, one line per ID]

Distribution of slope

[Figure: distribution of slope (y-axis: slope, roughly 40 to 50)]

Box plot?
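
One way to obtain the slopes summarised on this slide is to subtract each ID's type 1 value from its type 2 value, since the two types sit one unit apart on the x-axis. The sketch below uses a simulated stand-in for df (the seed and group means are assumptions) and base R only.

```r
# Simulated stand-in for the slide's df (assumed seed and means).
set.seed(123)
n <- 101
df <- data.frame(type  = rep(c(1, 2), each = n),
                 Value = c(rnorm(n, mean = 100), rnorm(n, mean = 150)))
df$ID <- seq_len(nrow(df) / 2)   # recycled over both types, as on the slide

# One row per ID, with columns Value.1 and Value.2
wide <- reshape(df, idvar = "ID", timevar = "type", direction = "wide")

# Slope of each line: change in Value over a change of 1 in type
slope <- wide$Value.2 - wide$Value.1

summary(slope)
boxplot.stats(slope)$stats   # the five numbers a box plot would draw
```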

2 distinct variables
> head(dat)
     group1   group2
1  99.43952 149.2896
2  99.76982 150.2569
3 101.55871 149.7533
4 100.07051 149.6525
5 100.12929 149.0484
6 101.71506 149.9550

Scatter plot
> ggplot(dat, aes(x = group1, y = group2)) +
    geom_point()

[Figure: scatter plot of group2 against group1]

2D density plot
> library(viridis)
> ggplot(dat, aes(x = group1, y = group2)) +
    stat_density_2d(geom = "tile", aes(fill = ..density..),
                    contour = FALSE) +
    scale_fill_viridis()

[Figure: 2D density of group2 against group1, tiles filled by density]

Bag plot
> library(aplpack)
> bagplot(dat[1:2])

[Figure: bag plot of group2 against group1, with the hull, bag, and loop labelled]

aplpack
> library(aplpack)

> plot_data <- compute.bagplot(x = dat$group1, y = dat$group2)

> names(plot_data)
 [1] "center"      "hull.center" "hull.bag"    "hull.loop"
 [5] "pxy.bag"     "pxy.outer"   "pxy.outlier" "hdepths"
 [9] "is.one.dim"  "prdata"      "xy"          "xydata"
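
The hull.bag and hull.loop components can be turned into data frames and drawn as ggplot2 polygons. The following is a minimal sketch, assuming both components are two-column matrices of polygon vertices; dat is a simulated stand-in, and the helper as_xy and the fill colours are illustrative choices, not part of the slides.

```r
library(aplpack)
library(ggplot2)

# Simulated stand-in for the slide's `dat` (assumed seed and means).
set.seed(123)
dat <- data.frame(group1 = rnorm(101, mean = 100),
                  group2 = rnorm(101, mean = 150))

plot_data <- compute.bagplot(x = dat$group1, y = dat$group2)

# Hypothetical helper: matrix of vertices -> data frame with the plot's column names
as_xy <- function(m) data.frame(group1 = m[, 1], group2 = m[, 2])
loop_df <- as_xy(plot_data$hull.loop)
bag_df  <- as_xy(plot_data$hull.bag)

p <- ggplot(dat, aes(x = group1, y = group2)) +
  geom_polygon(data = loop_df, fill = "grey80") +  # outer loop
  geom_polygon(data = bag_df, fill = "grey50") +   # inner bag
  geom_point()
```

Printing p draws the loop first, the bag on top of it, and the raw points last.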

ggplot2
> ggplot(dat, aes(x = group1, y = group2)) +
    geom_point()

[Figure: scatter plot of group2 against group1]

ggplot2
> ggplot(dat, aes(x = group1, y = group2)) +
    stat_bag(alpha = 0.2)

[Figure: bag plot of group2 against group1 drawn with stat_bag()]
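
stat_bag() is not part of ggplot2 itself. A layer like it can be written with the ggproto extension mechanism. The sketch below is one possible, simplified version that draws only the bag polygon. The names StatBag and stat_bag, the choice of the polygon geom, and the simulated dat are all assumptions, and the course's own definition may differ.

```r
library(ggplot2)
library(aplpack)

# Hypothetical Stat: replaces each group's points with the vertices of its bag.
StatBag <- ggproto("StatBag", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    bag <- compute.bagplot(x = data$x, y = data$y)
    data.frame(x = bag$hull.bag[, 1], y = bag$hull.bag[, 2])
  }
)

# Hypothetical layer function wrapping the Stat; extra arguments such as
# alpha are passed through to the polygon geom.
stat_bag <- function(mapping = NULL, data = NULL, geom = "polygon",
                     position = "identity", na.rm = FALSE,
                     show.legend = NA, inherit.aes = TRUE, ...) {
  layer(stat = StatBag, data = data, mapping = mapping, geom = geom,
        position = position, show.legend = show.legend,
        inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...))
}

# Simulated stand-in for the slide's `dat` (assumed seed and means).
set.seed(123)
dat <- data.frame(group1 = rnorm(101, mean = 100),
                  group2 = rnorm(101, mean = 150))

p <- ggplot(dat, aes(x = group1, y = group2)) +
  stat_bag(alpha = 0.2)

bag_vertices <- layer_data(p)   # the polygon vertices computed by the Stat
```

Keeping the computation in compute_group means the stat works per group, so it should also respect facets and grouping aesthetics.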

Remarks
● Useful but not popular
● Poorly understood
● Learn to use ggplot2 extensions

Let’s practice!

Case Study II
Weather (Part 1)

Weather

Source: http://www.edwardtufte.com/

present
> dim(present)
[1] 153 5

> head(present, n = 4)
  month day year temp new_day
1     1   1 2016   41       1
2     1   2 2016   37       2
3     1   3 2016   40       3
4     1   4 2016   33       4

> tail(present, n = 4)
    month day year temp new_day
148     5  28 2016   79     148
149     5  29 2016   80     149
150     5  30 2016   73     150
151     5  31 2016   76     151

Time series
> ggplot(present, aes(x = new_day, y = temp)) +
    geom_line()

[Figure: time series of temp against new_day for the present year]

past
> str(past)
'data.frame': 7645 obs. of 11 variables:
$ month : num 1 1 1 1 1 1 1 1 1 1 ...
$ day : num 1 2 3 4 5 6 7 8 9 10 ...
$ year : num 1995 1995 1995 1995 1995 ...
$ temp : num 44 41 28 31 21 27 42 35 34 29 ...
$ new_day : int 1 2 3 4 5 6 7 8 9 10 ...
$ upper : num 51 48 57 55 56 62 52 57 54 47 ...
$ lower : num 17 15 16 15 21 14 14 12 21 8.5 ...
$ avg : num 35.6 35.4 34.9 35.1 35.9 ...
$ se : num 2.19 1.83 2.46 2.53 1.92 ...
$ avg_upper: num 40.2 39.2 40 40.5 39.9 ...
$ avg_lower: num 31 31.5 29.7 29.8 31.9 ...
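
The printed values are roughly consistent with avg_upper and avg_lower being avg plus or minus about 2.1 standard errors (for example 35.6 ± 2.1 × 2.19 gives about 40.2 and 31.0). The sketch below shows how such per-day summary columns could be derived, assuming a t-based 95% interval. The raw temperatures are simulated and are not the course data.

```r
# Simulated stand-in for raw daily temperatures (not the real data).
set.seed(1)
raw <- expand.grid(new_day = 1:365, year = 1995:2015)
raw$temp <- 55 - 25 * cos(2 * pi * raw$new_day / 365) +
  rnorm(nrow(raw), sd = 8)

# One row per day of the year: record range, mean, and standard error.
summ <- do.call(rbind, lapply(split(raw, raw$new_day), function(d) {
  n    <- nrow(d)
  avg  <- mean(d$temp)
  se   <- sd(d$temp) / sqrt(n)
  crit <- qt(0.975, df = n - 1)   # assumption: t-based 95% interval
  data.frame(new_day   = d$new_day[1],
             upper     = max(d$temp),   # record high for that day
             lower     = min(d$temp),   # record low for that day
             avg       = avg,
             se        = se,
             avg_upper = avg + crit * se,
             avg_lower = avg - crit * se)
}))

# Attach the per-day summaries to every yearly observation.
past <- merge(raw, summ, by = "new_day")
str(past)
```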

Each year separately


> ggplot(past, aes(x = new_day, y = temp, group = year)) +
    geom_line(alpha = 0.2)

[Figure: temp against new_day, one line per year]

present + past
> ggplot(past, aes(x = new_day, y = temp, group = year)) +
    geom_line(alpha = 0.4) +
    geom_line(data = present, aes(group = 1), col = "red")

[Figure: temp against new_day, past years as faint lines with the present year in red]


Linerange

[Figure: temp against new_day, drawn with line ranges]
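
The slide shows no code for the line ranges. One plausible construction uses geom_linerange() with the upper/lower and avg_upper/avg_lower columns seen in past. This is a sketch under that assumption, with a simulated per-day summary (past_summ and its values are hypothetical) and arbitrary colours.

```r
library(ggplot2)

# Simulated per-day summary, one row per new_day (hypothetical values).
new_day <- 1:365
avg <- 55 - 25 * cos(2 * pi * new_day / 365)
past_summ <- data.frame(new_day,
                        lower = avg - 20, upper = avg + 20,
                        avg_lower = avg - 4, avg_upper = avg + 4)

p <- ggplot(past_summ, aes(x = new_day)) +
  # full historical range: record low to record high
  geom_linerange(aes(ymin = lower, ymax = upper), colour = "wheat2") +
  # narrower band around the daily average
  geom_linerange(aes(ymin = avg_lower, ymax = avg_upper), colour = "wheat4") +
  labs(y = "temp")
```

Drawing the wide range first and the narrow band second keeps the inner band visible on top.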

Records

[Figure: temp against new_day, with record points marked]

Custom legend

[Figure: temp against new_day with a custom legend: New record high, past record high, 95% CI range, Current year, past record low, New record low]

Let’s practice!

Case Study II
Weather (Part 2)

Up to now

[Figure: temp against new_day with the custom legend: New record high, past record high, 95% CI range, Current year, past record low, New record low]

Situation
● Many data frames
● Plot summary data frame as a layer
● stat_summary()

stat_historical()
> ggplot(my_data, aes(x = new_day, y = temp, fill = year)) +
    stat_historical()

[Figure: temp against new_day, historical layer only]

stat_present()
> ggplot(my_data, aes(x = new_day, y = temp, fill = year)) +
    stat_historical() +
    stat_present()

[Figure: temp against new_day, historical and present layers]

stat_extremes()
> ggplot(my_data, aes(new_day, temp, fill = year)) +
    stat_historical() +
    stat_present() +
    stat_extremes(aes(colour = ..record..))

[Figure: temp against new_day, historical and present layers with record points]

Specific layers
> ggplot(my_data, aes(new_day, temp, fill = year)) +
    stat_historical() +
    # stat_present() +
    stat_extremes(aes(colour = ..record..))

[Figure: temp against new_day, historical layer and record points, without the present layer]

Faceting

[Figure: temp against new_day in four panels: PARIS, REYKJAVIK, NEW YORK, LONDON]

Let’s practice!

Wrap-up

Statistics                 Design

Graphical Data Analysis    Communication & Perception

Explore                Explain

Confirm and Analyse    Inform and Persuade

Element        Description

Data           The dataset being plotted.
Aesthetics     The scales onto which we map our data.
Geometries     The visual elements used for our data.
Facets         Plotting small multiples.
Statistics     Representations of our data to aid understanding.
Coordinates    The space on which the data will be plotted.
Themes         All non-data ink.

[Figure: total sleep time (h) by eating habits (Carnivore, Herbivore, Insectivore, Omnivore)]
[Figure: yield (bushels/acre) by year (1931, 1932) for each site (Waseca, Crookston, Morris, University Farm, Duluth, Grand Rapids)]
[Figure: proportions (0.00 to 1.00) of Under-weight, Healthy-weight, Over-weight, and Obese across categories 18 to 84, with a residual scale]
[Figure: 2D density of eruptions against waiting]
[Figure: Unemployment (%)]
[Figure: ternary plot with Sand, Silt, and Clay axes]
[Figure: Iris Sepals, Width against Length by Species (setosa, versicolor, virginica); Anderson, 1936]
[Figure: bag plot of group2 against group1]
[Figure: temp against new_day with the custom legend]

Thank you!
