Ggplot2_advancedTP.Rmd
```{r}
library(rmarkdown)
library(epuRate)
# If necessary
# library(devtools)
# install_github("holtzy/epuRate")
```
<style>
.questionNumber{
color: #69b3a2;
border: solid;
border-color: #69b3a2;
padding: 3px;
border-width: 1px;
border-radius: 2px;
margin-top: 200px;
}
.code-folding-btn {
display: none;
}
</style>
<br><br>
# Get ready
***
The following libraries are needed throughout this practical. Install them with
`install.packages()` if you do not have them already, then load them with
`library()`.
```{r, echo=TRUE}
# Load it
library(ggplot2)
library(dplyr)
library(hrbrthemes)
library(viridis)
library(plotly)
```
# 1- General appearance
***
## → Titles
All `ggplot2` chart components, titles included, can be customized with the `theme()`
function. The complete list of components is in the official
[documentation](https://ggplot2.tidyverse.org/reference/theme.html).
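As a minimal sketch of this idea, the chunk below adds a title with `labs()` and restyles it through `theme()`. The built-in `iris` dataset and the specific size and color values are illustrative assumptions, not part of the practical's data.

```r
library(ggplot2)

# Illustrative data: the built-in iris dataset (an assumption for this sketch)
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(color = "#69b3a2") +
  # Add a title and axis labels
  labs(title = "Petal length vs sepal length",
       x = "Sepal length (cm)", y = "Petal length (cm)") +
  # Restyle the title through theme(); the values are arbitrary examples
  theme(plot.title = element_text(size = 16, face = "bold", color = "#69b3a2"))

p
```

Other components such as `axis.title` or `legend.position` are set the same way, each with the element function listed for it in the `theme()` documentation.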
## → Themes
- `theme_bw()`
- `theme_dark()`
- `theme_minimal()`
- `theme_classic()`
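The built-in themes listed above are added to a plot like any other layer. The sketch below is one possible illustration, again using the built-in `iris` dataset as an assumed example; the object names are hypothetical.

```r
library(ggplot2)

# Base plot on the built-in iris dataset (illustrative choice)
base <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point()

# Each built-in theme is added with `+`
p_bw      <- base + theme_bw()
p_minimal <- base + theme_minimal()

# A theme() call placed after a complete theme overrides that theme's defaults
p_custom <- base + theme_classic() + theme(legend.position = "bottom")

p_custom
```

Order matters here: a complete theme added after `theme()` would reset the customization.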
# 2- Annotation
***
Annotation is a crucial component of a good dataviz. It can turn a boring graphic
into an interesting and insightful way to convey information. Dataviz is often
divided into two main types: exploratory and explanatory. Annotation mostly serves
the second type, where the chart has a specific message to deliver.
## → Text
The most common type of annotation is text. Let's say you have a spike in a line
plot. It makes sense to highlight it and explain in more detail what it is about.
```{r, eval=FALSE}
# plot
...
```
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header=TRUE)
data$date <- as.Date(data$date)
# plot
data %>%
ggplot( aes(x=date, y=value)) +
geom_line(color="#69b3a2")
```
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# plot
data %>%
ggplot( aes(x=date, y=value)) +
geom_line(color="#69b3a2") +
annotate(geom="text", x=as.Date("2017-01-01"), y=19000,
label="Bitcoin price reached 20k $\nat the end of 2017")
```
## → Shape
```{r, eval=FALSE}
# plot
data %>%
ggplot( aes(x=date, y=value)) +
geom_line(color="#69b3a2") +
annotate(geom="text", ...) +
annotate(geom="point", ...)
```
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Find spike date and value:
# data %>% arrange(desc(value)) %>% head(1)
# plot
data %>%
ggplot( aes(x=date, y=value)) +
geom_line(color="#69b3a2") +
ylim(0,22000) +
annotate(geom="text", x=as.Date("2017-01-01"), y=20089,
label="Bitcoin price reached 20k $\nat the end of 2017") +
annotate(geom="point", x=as.Date("2017-12-17"), y=20089, size=10, shape=21,
fill="transparent")
```
## → Abline
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Find spike date and value:
# data %>% arrange(desc(value)) %>% head(1)
# plot
data %>%
ggplot( aes(x=date, y=value)) +
geom_line(color="#69b3a2") +
ylim(0,22000) +
annotate(geom="text", x=as.Date("2017-01-01"), y=20089,
label="Bitcoin price reached 20k $\nat the end of 2017") +
annotate(geom="point", x=as.Date("2017-12-17"), y=20089, size=10, shape=21,
fill="transparent") +
geom_hline(yintercept=5000, color="orange", size=.5)
```
## → Color
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Data are available in the gapminder package
library(gapminder)
data <- gapminder %>% filter(year==2007) %>% select(-year)
# Basic scatterplot
ggplot( data, aes(x=gdpPercap, y=lifeExp, size = pop, color = continent)) +
geom_point(alpha=0.7)
```
- create a new column with `mutate`: this new column has the value `yes` if
`country=="South Africa"`, `no` otherwise. This is possible thanks to the `ifelse`
function.
- in the aesthetics part of the ggplot call, use this new column to control dot
colors
- use `scale_color_manual()` to control the color of both groups. Use a bright color
for the country to highlight, and grey for the others.
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Basic scatterplot
data %>%
mutate(isSouthAfrica = ifelse(country=="South Africa", "yes", "no")) %>%
ggplot( aes(x=gdpPercap, y=lifeExp, size = pop, color = isSouthAfrica)) +
geom_point(alpha=0.7) +
scale_color_manual(values=c("grey", "red")) +
theme(legend.position="none")
```
```{r, eval=FALSE}
# prepare data
tmp <- data %>%
mutate( annotation = ifelse(...))
# plot
tmp %>%
ggplot( ...) +
geom... +
theme(...) +
geom_text_repel(data=tmp %>% filter(annotation=="yes"), aes(label=country),
size=4 )
```
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# ggrepel
library(ggrepel)
# prepare data
tmp <- data %>%
mutate( annotation = ifelse(gdpPercap > 5000 & lifeExp < 60, "yes", "no"))
# plot
tmp %>%
ggplot( aes(x=gdpPercap, y=lifeExp, size = pop, color = continent)) +
geom_point(alpha=0.7) +
theme(legend.position="none") +
geom_text_repel(data=tmp %>% filter(annotation=="yes"), aes(label=country),
size=4 )
```
# 3- Faceting
***
## → facet_wrap()
```{r, eval=FALSE}
...
```
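```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Libraries
library(babynames)
```
## → facet_grid()
<br><br><br><span class="questionNumber">Bonus</span>Load
[this](https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/10_OneNumSevCatSubgroupsSevObs.csv)
dataset in R. Build a histogram for every combination of day and sex using
`facet_grid()`.
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Load the dataset (assumed here to be comma-separated with a header row)
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/10_OneNumSevCatSubgroupsSevObs.csv", header=TRUE, sep=",")
# Plot: one histogram per sex (rows) and day (columns)
ggplot(data, aes(x=total_bill)) +
  geom_histogram() +
  facet_grid(sex~day)
```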
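For comparison, here is a minimal sketch of the two faceting functions side by side. It assumes the `mpg` dataset that ships with `ggplot2` rather than the datasets of this practical, and the object names are hypothetical.

```r
library(ggplot2)

# Illustrative data: the mpg dataset shipped with ggplot2
base <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(bins = 15)

# facet_wrap(): one variable, panels wrapped into rows and columns
p_wrap <- base + facet_wrap(~class, ncol = 3)

# facet_grid(): two variables laid out as rows ~ columns
p_grid <- base + facet_grid(drv ~ cyl)

p_grid
```

`facet_wrap()` suits a single variable with many levels, while `facet_grid()` shows every combination of two variables, including empty ones.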
# 4- Saving plots
***
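Once a plot is stored in an object, `ggsave()` can write it to disk. The sketch below is a hedged illustration: the `iris` data, the temporary output path and the dimensions are assumptions chosen so that the example runs anywhere.

```r
library(ggplot2)

# Illustrative plot stored in an object (iris is an assumption for this sketch)
p <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 20)

# Write to a temporary file; replace with a real path such as "my_plot.pdf".
# The output format is inferred from the file extension.
out_file <- file.path(tempdir(), "my_plot.pdf")
ggsave(filename = out_file, plot = p, width = 6, height = 4)

file.exists(out_file)
```

Width and height are in inches by default; passing `plot = p` explicitly avoids relying on the last plot displayed.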
```{r}
# save the plot in an object called p
p <- ggplot(data, aes(x=total_bill)) +
geom_histogram() +
facet_grid(sex~day)
```
# 5- Colors
***
Picking the right colors is a crucial step for a good dataviz. R offers awesome
options and packages to make the right choices. Here is an overview of the main
options.
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Species)) +
geom_point()
```
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Species)) +
geom_point() +
scale_color_manual( values=c("red","green","blue"))
```
Pick the one you like the most and apply it to the previous scatterplot. Use it to
color the `Species`.
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Species)) +
geom_point(size=4) +
scale_color_brewer(palette = "Set3")
```
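Another option is the viridis family of palettes. The sketch below uses `scale_color_viridis_d()`, which ships with `ggplot2`; it is offered as one possible alternative to the palette above, not as the expected answer to the exercise.

```r
library(ggplot2)

# Same iris scatterplot, colored with a discrete viridis palette
p_viridis <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(size = 4) +
  scale_color_viridis_d()

p_viridis
```

The `viridis` package loaded at the start of the practical offers a similar scale through `scale_color_viridis(discrete = TRUE)`.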
## → Plotly
```{r, eval=FALSE}
# Basic ggplot
p <- data %>%
ggplot( ...
p
```
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# load data
library(gapminder)
data <- gapminder %>% filter(year==2007) %>% select(-year)
# Basic ggplot
p <- data %>%
ggplot( aes(x=gdpPercap, y=lifeExp, size = pop, color = continent)) +
geom_point(alpha=0.7)
p
```
- build a new column called `myText`. Fill it with whatever you want to show in the
tooltip.
- add a new aesthetics: `text=myText`
- in the `ggplotly()` call, add `tooltip="text"`
```{r, eval=FALSE}
ggplotly(p, tooltip="text")
```
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
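# Basic ggplot, with a custom tooltip text stored in myText
p <- data %>%
  mutate(myText=paste("This country is:", country)) %>%
  ggplot( aes(x=gdpPercap, y=lifeExp, size = pop, color = continent, text=myText)) +
  geom_point(alpha=0.7)
# Turn it interactive, showing only the custom text in the tooltip
ggplotly(p, tooltip="text")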
```
## → Leaflet
```{r}
# Library
library(leaflet)
# load example data (Fiji Earthquakes) + keep only 100 first lines
data(quakes)
quakes <- head(quakes, 100)
# Color palette mapping magnitude to color. The original definition of
# `mypalette` is missing here; this colorNumeric() call is an assumed stand-in.
mypalette <- colorNumeric(palette="YlOrRd", domain=quakes$mag)
# Final Map
leaflet(quakes) %>%
  addTiles() %>%
  setView( lat=-27, lng=170 , zoom=4) %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addCircleMarkers(~long, ~lat,
    fillColor = ~mypalette(mag), fillOpacity = 0.7, color="white", radius=8,
    stroke=FALSE
  ) %>%
  addLegend( pal=mypalette, values=~mag, opacity=0.9, title = "Magnitude",
    position = "bottomright" )
```
## → Heatmap
The `d3heatmap` package lets you build interactive heatmaps in a few lines of code.
Let's see how it works.
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Load data
data <- read.csv("http://datasets.flowingdata.com/ppg2008.csv", row.names = 1)
# head(data)
# summary(data)
```
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Make the heatmap
heatmap( as.matrix(data), scale = "column")
```
```{r, eval=FALSE}
# Build heatmap
d3heatmap(...)
```
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Load library
library(d3heatmap)
# Build heatmap
d3heatmap(data, scale = "column")
```
```{r, eval=FALSE}
# Then you can create the xts format, and thus use dygraph
library(xts)
don <- xts(x = data$value, order.by = data$date)
```
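The conversion can be sketched on a small self-contained example. The data frame `df` below is hypothetical, made up for illustration, and the `dygraphs` call is only attempted if that package happens to be installed.

```r
library(xts)

# Hypothetical data frame with a Date column and a numeric column
df <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "day", length.out = 10),
  value = c(5, 7, 6, 9, 12, 11, 15, 14, 18, 20)
)

# xts() takes the values in `x` and the time index in `order.by`
don <- xts(x = df$value, order.by = df$date)

# If the dygraphs package is available, build an interactive chart object
if (requireNamespace("dygraphs", quietly = TRUE)) {
  g <- dygraphs::dygraph(don)
}
```

The `order.by` column must be a date or time class, which is why a character date column has to go through `as.Date()` first.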
# 7- Scales
***
Scales control the details of how data values are translated to visual properties.
ggplot2 offers [many different scales](https://ggplot2.tidyverse.org/reference/index.html#section-scales).
The most widely used one is probably the log scale.
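A minimal sketch of a log scale follows. It assumes the `diamonds` dataset that ships with `ggplot2`, subsampled to keep the plot light; the object names are hypothetical.

```r
library(ggplot2)

# Illustrative data: every 50th row of the diamonds dataset shipped with ggplot2
small <- diamonds[seq(1, nrow(diamonds), by = 50), ]

# Default linear scales
p_linear <- ggplot(small, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3)

# Same plot with both axes on a log10 scale
p_log <- p_linear +
  scale_x_log10() +
  scale_y_log10()

p_log
```

The data are unchanged; only the mapping from values to positions is transformed, so axis labels still show the original units.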