Ggplot2_advancedTP.Rmd

This document is a practical guide on advanced data visualization using R and ggplot2, focusing on enhancing charts through annotations, themes, and customizations. It includes code examples and exercises for users to practice creating and modifying visualizations, such as histograms and line plots. The guide also emphasizes the importance of annotation in data visualization to convey insights effectively.

---

title: "Advanced data visualization with R and ggplot2"
author: "a practical by [Yan Holtz](https://github.com/holtzy)"
date: "`r format(Sys.time(), '%d %B %Y')`"
mail: "[email protected]"
linkedin: "yan-holtz-2477534a"
twitter: "r_graph_gallery"
github: "holtzy"
home: "www.yan-holtz.com"
output:
  epuRate::epurate:
    toc: TRUE
    number_sections: FALSE
    code_folding: "show"
---

```{r global options, include = FALSE}


knitr::opts_chunk$set( warning=FALSE, message=FALSE)

library(rmarkdown)
library(epuRate)

# If necessary
# library(devtools)
# install_github("holtzy/epuRate")
```

<style>
.questionNumber{
color: #69b3a2;
border: solid;
border-color: #69b3a2;
padding: 3px;
border-width: 1px;
border-radius: 2px;
margin-top: 200px;
}
.code-folding-btn {
display: none;
}
</style>

<br><br>

> This practical follows the previous basic [introduction to
ggplot2](https://www.yan-holtz.com/teaching). It goes further with
`ggplot2`: annotation, theme customization, color palettes, output formats, scales,
and more.

# Get ready
***
The following libraries are needed throughout the practical. Install them with
`install.packages()` if you do not have them already. Then load them with
`library()`.
```{r, echo=TRUE}
# Load it
library(ggplot2)
library(dplyr)
library(hrbrthemes)
library(viridis)
library(plotly)
```

# 1- General appearance
***

## &rarr; Titles

<br><span class="questionNumber">Q1.1</span> The code below builds a basic
histogram for Airbnb apartment prices on the French Riviera. It shows only values
under 300 euros. Add code to:

- add a title with `ggtitle()`
- change axis labels with `xlab()` and `ylab()`
- change axis limits with `xlim()` and `ylim()`

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram


data %>%
filter( price<300 ) %>%
ggplot( aes(x=price)) +
geom_histogram() +
...
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE, fig.show='asis'}


# Libraries
library(ggplot2)

# Load dataset from github


data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram
data %>%
  filter( price<300 ) %>%
  ggplot( aes(x=price)) +
    geom_histogram() +
    ggtitle("Night price distribution of Airbnb apartments") +
    xlab("Night price") +
    ylab("Number of apartments") +
    xlim(0,400)
```

## &rarr; Chart components

All `ggplot2` chart components can be changed using the `theme()` function. You can
see a complete list of components in the official
[documentation](https://ggplot2.tidyverse.org/reference/theme.html).

<u>Note</u>: each component is changed with a matching function: `element_text()`
for text, `element_line()` for lines, and so on.
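
As a minimal sketch of how the `element_*()` functions plug into `theme()`, the snippet below restyles a scatterplot of the built-in `mtcars` dataset. The dataset, colors and sizes are arbitrary choices made for illustration, not part of the exercises.

```r
library(ggplot2)

# Scatterplot on the built-in mtcars dataset (illustrative data only)
p_theme <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  ggtitle("Fuel efficiency vs weight") +
  theme(
    plot.title       = element_text(size = 14, color = "steelblue"), # text component
    panel.grid.major = element_line(color = "grey80"),               # line component
    panel.grid.minor = element_blank()                               # removes the component
  )
p_theme
```

`element_blank()` is the odd one out: it draws nothing, which is handy to drop a component entirely.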

<br><br><br><span class="questionNumber">Q1.2</span> Reproduce the previous
histogram and change:

- plot title size and color with `plot.title`
- X axis title size and color with `axis.title.x`
- grid appearance with `panel.grid.major`

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Make the histogram
data %>%
filter( price<300 ) %>%
ggplot( aes(x=price)) +
... +
theme(
plot.title = element_text(size=..., color=...),
...,
...
)
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE, fig.show='asis'}


# Make the histogram
data %>%
filter( price<300 ) %>%
ggplot( aes(x=price)) +
geom_histogram() +
ggtitle("Night price distribution of Airbnb apartments") +
xlab("Night price") +
ylab("Number of apartments") +
xlim(0,400) +
theme(
plot.title = element_text(size=13, color="orange"),
axis.title.x = element_text(size=13, color="purple"),
panel.grid.major = element_line(colour = "red")
)
```

## &rarr; Themes

<br><span class="questionNumber">Q1.3</span> `ggplot2` offers a set of pre-built
themes. Try the following to see which one you like the most:

- `theme_bw()`
- `theme_dark()`
- `theme_minimal()`
- `theme_classic()`

See a complete list [here](https://www.r-graph-gallery.com/192-ggplot-themes/).

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram


data %>%
... +
theme_classic()
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE, fig.show='asis'}


# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram


data %>%
filter( price<300 ) %>%
ggplot( aes(x=price)) +
geom_histogram(fill="#69b3a2", color="#e9ecef", alpha=0.9) +
ggtitle("Night price distribution of Airbnb apartments") +
theme_classic()
```

<br><br><br><span class="questionNumber">Q1.4</span> The `hrbrthemes` package
provides my favourite style. Install the package, load it, and apply
`theme_ipsum()`. Documentation is [here](https://github.com/hrbrmstr/hrbrthemes).

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)

# Load dataset from github


data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram


data %>%
filter( price<300 ) %>%
ggplot( aes(x=price)) +
stat_bin(breaks=seq(0,300,10), fill="#69b3a2", color="#e9ecef", alpha=0.9) +
ggtitle("Night price distribution of Airbnb apartments") +
theme_ipsum()
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE, fig.show='asis'}


# Libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)

# Load dataset from github


data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram


data %>%
filter( price<300 ) %>%
ggplot( aes(x=price)) +
stat_bin(breaks=seq(0,300,10), fill="#69b3a2", color="#e9ecef", alpha=0.9) +
ggtitle("Night price distribution of Airbnb apartments") +
theme_ipsum()
```

# 2- Annotation
***
Annotation is a crucial component of a good dataviz. It can turn a boring graphic
into an interesting and insightful way to convey information. Dataviz is often
separated into two main types: exploratory and explanatory analysis. Annotation is
used for the second type.

## &rarr; Text
The most common type of annotation is text. Let's say you have a spike in a line
plot. It makes sense to highlight it and explain in more detail what it
is about.

<br><br><br><span class="questionNumber">Q2.1</span> Build a line plot showing the
bitcoin price evolution between 2013 and 2018. The dataset is located
[here](https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv)
and can be read directly with `read.table()`. What part of the chart would you
highlight?

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header=TRUE)
data$date <- as.Date(data$date)

# plot
...
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header=TRUE)
data$date <- as.Date(data$date)

# plot
data %>%
ggplot( aes(x=date, y=value)) +
geom_line(color="#69b3a2")
```

<br><br><br><span class="questionNumber">Q2.2</span> Use the `annotate()` function
to add text. `annotate()` requires several arguments:

- `geom`: type of annotation, use `text`
- `x`: position on the X axis
- `y`: position on the Y axis
- `label`: what you want to write
- Optional: `color`, `size`, `angle` and
[more](https://www.r-graph-gallery.com/233-add-annotations-on-ggplot2-chart/).
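
The arguments above can be sketched on another time series before you try them on the bitcoin data. The example below assumes the `economics` dataset that ships with `ggplot2`; the position and the label are arbitrary values picked for illustration.

```r
library(ggplot2)

# economics ships with ggplot2: monthly US data with a Date column
p_text <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(color = "#69b3a2") +
  annotate(
    geom  = "text",
    x     = as.Date("1975-01-01"),  # position on the X axis (a Date, like the axis)
    y     = 14000,                  # position on the Y axis, in data units
    label = "Text placed with annotate()",
    color = "grey30", size = 4
  )
p_text
```

Note that `x` and `y` are given in data units, so `x` must be a `Date` when the X axis shows dates.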

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# plot
data %>%
ggplot( aes(x=date, y=value)) +
geom_line(color="#69b3a2") +
annotate( x=as.Date("2017-01-01"), ...)
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# plot
data %>%
ggplot( aes(x=date, y=value)) +
geom_line(color="#69b3a2") +
annotate(geom="text", x=as.Date("2017-01-01"), y=19000,
label="Bitcoin price reached 20k $\nat the end of 2017")
```

## &rarr; Shape

<br><span class="questionNumber">Q2.3</span> Find the exact spike `date` and its
`value`. Use this information to add a circle around the spike. This is done with
the `annotate()` function once more:

- `geom`: use `point`
- `x`: position on the X axis
- `y`: position on the Y axis
- `shape`: use 21, to be able to change the `fill` and the `color` arguments
(fill=inside, color=stroke)
- `size`: size of the circle
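
Here is a hedged sketch of the same idea on the `economics` dataset bundled with `ggplot2` (an assumption made to keep the example self-contained): locate the row holding the maximum, then pass its coordinates to `annotate()`. The object names `peak` and `p_point` are hypothetical.

```r
library(ggplot2)

# Locate the row holding the maximum value (base R, no extra package needed)
peak <- economics[which.max(economics$unemploy), ]

p_point <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(color = "#69b3a2") +
  annotate(
    geom  = "point",
    x     = peak$date,
    y     = peak$unemploy,
    shape = 21,             # circle with a separate stroke and fill
    size  = 10,
    color = "black",        # stroke
    fill  = "transparent"   # inside
  )
p_point
```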

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Find spike date and value:
# data %>% arrange(...) ...

# plot
data %>%
ggplot( aes(x=date, y=value)) +
geom_line(color="#69b3a2") +
annotate(geom="text", ...) +
annotate(geom="point", ...)
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Find spike date and value:
# data %>% arrange(desc(value)) %>% head(1)

# plot
data %>%
ggplot( aes(x=date, y=value)) +
geom_line(color="#69b3a2") +
ylim(0,22000) +
annotate(geom="text", x=as.Date("2017-01-01"), y=20089,
label="Bitcoin price reached 20k $\nat the end of 2017") +
annotate(geom="point", x=as.Date("2017-12-17"), y=20089, size=10, shape=21,
fill="transparent")
```

## &rarr; Abline

<br><span class="questionNumber">Q2.4</span> Add a horizontal line to show what
part of the curve is over 5000 $. This is possible thanks to the `geom_hline()`
function, which requires a `yintercept` argument.
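
A minimal sketch of `geom_hline()`, again assuming the `economics` dataset from `ggplot2`; the threshold of 10000 is an arbitrary value chosen for illustration.

```r
library(ggplot2)

p_hline <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(color = "#69b3a2") +
  # yintercept is expressed in the units of the Y axis
  geom_hline(yintercept = 10000, color = "orange", linetype = "dashed")
p_hline
```

Its sibling `geom_vline()` works the same way with an `xintercept` argument.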

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Find spike date and value:
# data %>% arrange(desc(value)) %>% head(1)
# plot
data %>%
...
annotate(...) +
annotate(...) +
geom_hline(..., color=..., size=...)
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Find spike date and value:
# data %>% arrange(desc(value)) %>% head(1)

# plot
data %>%
ggplot( aes(x=date, y=value)) +
geom_line(color="#69b3a2") +
ylim(0,22000) +
annotate(geom="text", x=as.Date("2017-01-01"), y=20089,
label="Bitcoin price reached 20k $\nat the end of 2017") +
annotate(geom="point", x=as.Date("2017-12-17"), y=20089, size=10, shape=21,
fill="transparent") +
geom_hline(yintercept=5000, color="orange", size=.5)
```

## &rarr; Color

<br><span class="questionNumber">Q2.5</span> Build a scatterplot based on the
`gapminder` dataset. Use `gdpPercap` for the X axis, `lifeExp` for the Y axis, and
`pop` for bubble size. Keep only the year 2007.

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Data are available in the gapminder package
library(gapminder)
data <- gapminder %>% filter(year==2007) %>% select(-year)

# Basic scatterplot
ggplot( data, aes(x=gdpPercap, y=lifeExp, size = pop, color = continent)) +
geom_point(alpha=0.7)

```

<br><br><br><span class="questionNumber">Q2.6</span> Highlight South Africa in the
chart: draw it in red, with all other circles in grey. Follow these steps:

- create a new column with `mutate()`: this new column has the value `yes` if
`country=="South Africa"`, `no` otherwise. This is possible thanks to the `ifelse()`
function.
- in the aesthetics part of the ggplot call, use this new column to control dot
colors
- use `scale_color_manual()` to control the color of both groups. Use a bright color
for the country to highlight, and grey for the others.
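
The three steps can be sketched on the built-in `mtcars` dataset, highlighting a single car instead of a country. The column names `car` and `isToyota` are hypothetical, and the named vector passed to `values` is one possible way to make the color mapping explicit rather than relying on alphabetical order.

```r
library(ggplot2)
library(dplyr)

p_highlight <- mtcars %>%
  mutate(
    car      = rownames(mtcars),                              # car names live in the row names
    isToyota = ifelse(car == "Toyota Corolla", "yes", "no")   # step 1: flag column
  ) %>%
  ggplot(aes(x = wt, y = mpg, color = isToyota)) +            # step 2: map the flag to color
  geom_point(size = 3) +
  scale_color_manual(values = c("yes" = "red", "no" = "grey")) +  # step 3: one color per group
  theme(legend.position = "none")
p_highlight
```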

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Basic scatterplot
data %>%
mutate(isSouthAfrica = ... ) %>%
ggplot( .., color = isSouthAfrica)) +
geom... +
scale_color_manual(values=c("grey", "red")) +
theme(legend.position="none")
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Basic scatterplot
data %>%
mutate(isSouthAfrica = ifelse(country=="South Africa", "yes", "no")) %>%
ggplot( aes(x=gdpPercap, y=lifeExp, size = pop, color = isSouthAfrica)) +
geom_point(alpha=0.7) +
scale_color_manual(values=c("grey", "red")) +
theme(legend.position="none")

```

## &rarr; Multiple text

<br><span class="questionNumber">Q2.7</span> Highlight every country with
`gdpPercap > 5000` & `lifeExp < 60` in red. Write their names using the
`geom_text_repel()` function of the `ggrepel` package to avoid text overlapping.

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# ggrepel
library(ggrepel)

# prepare data
tmp <- data %>%
mutate( annotation = ifelse(...))

# plot
tmp %>%
ggplot( ...) +
geom... +
theme(...) +
geom_text_repel(data=tmp %>% filter(annotation=="yes"), aes(label=country),
size=4 )
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# ggrepel
library(ggrepel)

# prepare data
tmp <- data %>%
mutate( annotation = ifelse(gdpPercap > 5000 & lifeExp < 60, "yes", "no"))

# plot
tmp %>%
ggplot( aes(x=gdpPercap, y=lifeExp, size = pop, color = continent)) +
geom_point(alpha=0.7) +
theme(legend.position="none") +
geom_text_repel(data=tmp %>% filter(annotation=="yes"), aes(label=country),
size=4 )
```

# 3- Faceting
***

Faceting is a very powerful data visualization technique. It splits the figure into
small multiples, usually one per level of a categorical variable. `ggplot2` offers 2
functions to build small multiples: `facet_wrap()` and `facet_grid()`.
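
To give a first feel for both functions, here is a short sketch on the `mpg` dataset bundled with `ggplot2`. The dataset and the variables used for splitting are assumptions made for illustration only.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# facet_wrap(): one variable, panels wrapped row after row
p_wrap <- base_plot + facet_wrap(~class)

# facet_grid(): one variable for the rows, another for the columns
p_grid <- base_plot + facet_grid(drv ~ cyl)

p_wrap
p_grid
```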

## &rarr; facet_wrap()

<br><span class="questionNumber">Q3.1</span> Build a [spaghetti
chart](https://www.data-to-viz.com/caveat/spaghetti.html) showing the evolution of
9 baby names in the US. (See code
[here](https://www.data-to-viz.com/caveat/spaghetti.html)). What's wrong with this
chart?

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Libraries
library(babynames)

# Load dataset from the babynames package


data <- babynames %>%
filter(name %in% c("Ashley", "Amanda", "Jessica", "Patricia", "Linda",
"Deborah", "Dorothy", "Betty", "Helen")) %>%
filter(sex=="F")

...

```

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Libraries
library(babynames)

# Load dataset from the babynames package


data <- babynames %>%
filter(name %in% c("Ashley", "Amanda", "Jessica", "Patricia", "Linda",
"Deborah", "Dorothy", "Betty", "Helen")) %>%
filter(sex=="F")
# line plot = spaghetti chart
data %>%
ggplot( aes(x=year, y=n, group=name, color=name)) +
geom_line() +
ggtitle("Popularity of American names in the previous 30 years")
```

<br><br><br><span class="questionNumber">Q3.2</span> Use the `facet_wrap()`
function to build one area chart for each name. Basically, you have to provide a
categorical variable to the function. It will build a chart for each of its levels.

Have a look at the Y axis. What do you observe? Is it a good option?

```{r, eval=FALSE, class.source="Question",echo=TRUE }


...
geom_area() +
... +
facet_wrap(~name)
```

You should get something like this:

```{r, class.source="Correction",fig.show="hide",echo=FALSE, fig.show='asis'}


data %>%
ggplot( aes(x=year, y=n, group=name, fill=name)) +
geom_area() +
ggtitle("Popularity of American names in the previous 30 years") +
theme(
legend.position="none"
) +
facet_wrap(~name)
```

<br><br><br><span class="questionNumber">Q3.3</span> Find out how to use the
`scales` option to have different Y axis limits for each subset. Does it make sense?
Under which conditions?
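
As a hedged illustration of what a free Y axis changes, the sketch below uses `economics_long`, a dataset shipped with `ggplot2` whose series live on very different ranges. It relies on the `scales` argument of `facet_wrap()`; the object names are hypothetical.

```r
library(ggplot2)

# Shared Y axis (default): small series are flattened by the large ones
p_fixed <- ggplot(economics_long, aes(x = date, y = value)) +
  geom_line() +
  facet_wrap(~variable)

# Free Y axis: each panel gets its own limits
p_free <- ggplot(economics_long, aes(x = date, y = value)) +
  geom_line() +
  facet_wrap(~variable, scales = "free_y")

p_fixed
p_free
```

Comparing the two outputs side by side is a good way to decide when independent limits help and when they mislead.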

```{r, eval=FALSE, class.source="Question",echo=TRUE }


...
geom_area() +
... +
facet_wrap(~name, scales=)
```

You should get something like this:


```{r, class.source="Correction",fig.show="hide",echo=FALSE, fig.show='asis'}
data %>%
ggplot( aes(x=year, y=n, group=name, fill=name)) +
geom_area() +
ggtitle("Popularity of American names in the previous 30 years") +
theme(
legend.position="none"
) +
facet_wrap(~name, scales="free_y")
```

## &rarr; facet_grid()

<br><span class="questionNumber">Bonus</span> Find out what the `facet_grid()`
function does. How does it differ from `facet_wrap()`?

<br><br><br><span class="questionNumber">Bonus</span> Load
[this](https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/10_OneNumSevCatSubgroupsSevObs.csv)
dataset in R. Build a histogram for every combination of day and sex using
`facet_grid()`.

You should get something like:

```{r, class.source="Correction",fig.show="hide",echo=FALSE, fig.show='asis'}


# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/10_OneNumSevCatSubgroupsSevObs.csv", header=TRUE, sep=",")

# Plot
ggplot(data, aes(x=total_bill)) +
geom_histogram() +
facet_grid(sex~day)
```
# 4- Saving plots
***

<br><br><br><span class="questionNumber">Q4.1</span> - Save the previous chart as a
`PNG` file using the `ggsave()` function. Where is the file saved?

```{r}
# save the plot in an object called p
p <- ggplot(data, aes(x=total_bill)) +
geom_histogram() +
facet_grid(sex~day)

# Save the plot


ggsave(p, filename = "chartFromRPractical.png")
```

<br><br><br><span class="questionNumber">Q4.2</span> - Specify the complete path
before the file name to save the chart at a specific location.
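
One possible way to build the complete path is `file.path()`. In the sketch below, `tempdir()` is only a stand-in folder and the dimensions are arbitrary; replace them with your own location and size.

```r
library(ggplot2)

p_save <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# file.path() joins folder and file name with the right separator
out_file <- file.path(tempdir(), "chartFromRPractical.png")

# width and height are in inches by default
ggsave(filename = out_file, plot = p_save, width = 6, height = 4, dpi = 150)

# Check that the file was written where expected
file.exists(out_file)
```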

# 5- Colors
***
Picking the right colors is a crucial step for a good dataviz. R offers awesome
options and packages to make the right choices. Here is an overview of the main
options.

## &rarr; One color

<br><span class="questionNumber">Q5.1</span> Several options exist to pick one
color. Change the histogram color using the `fill` argument on the chart below,
trying each of the following options:

- a plain color name. Type `colors()` to see all the options.
- `rgb()`. This function takes the quantity of red, green and blue used to
build the color, plus an optional argument for the opacity. For example, try
`rgb(.7, .6, .3, .2)`
- `HTML` colors. Use [this
website](https://www.w3schools.com/colors/colors_picker.asp) to pick one you like.
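
The three options all end up as a value you can pass to `fill`. A minimal sketch, assuming the built-in `faithful` dataset and hypothetical object names:

```r
library(ggplot2)

# 1. Plain color name: colors() lists every name R knows
head(colors())
"steelblue" %in% colors()

# 2. rgb(): red, green, blue and opacity, each between 0 and 1.
#    It returns a hexadecimal string with two extra digits for the opacity.
semi_transparent <- rgb(.7, .6, .3, .2)
semi_transparent

# 3. HTML (hexadecimal) code, written as a string
html_color <- "#69b3a2"

# Any of the three can be given to fill
p_color <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(bins = 20, fill = html_color, color = "white")
p_color
```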

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram


data %>%
filter( price<300 ) %>%
ggplot( aes(x=price)) +
geom_histogram(fill="steelblue") +
ggtitle("Night price distribution of Airbnb apartments") +
theme_ipsum()
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE, fig.show='asis'}


# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram


data %>%
filter( price<300 ) %>%
ggplot( aes(x=price)) +
geom_histogram(fill="steelblue") +
ggtitle("Night price distribution of Airbnb apartments") +
theme_ipsum()
```

## &rarr; Discrete color palette

<br><span class="questionNumber">Q5.2</span> Build a scatterplot based on the
`iris` dataset. Use `Sepal.Length` for the X axis, `Petal.Length` for the Y axis.
Use `color=Species` to color groups.

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Species)) +
geom_point()
```

<br><br><br><span class="questionNumber">Q5.3</span> It is possible to set the
color scale manually using `scale_color_manual()`. Use the hint below to see how to
use it and apply it to the previous scatterplot.

<u>Note</u>: it is bad practice to pick colors randomly. Your palette will probably
be ugly and not colorblind friendly.

```{r, eval=FALSE, class.source="Question",echo=TRUE }


ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Species)) +
geom_point() +
scale_color_manual( values=c("red","green","blue"))
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Species)) +
geom_point() +
scale_color_manual( values=c("red","green","blue"))
```

<br><br><br><span class="questionNumber">Q5.4</span> Fortunately, people have already
tackled this issue for us and created packages offering nice color palettes. The
most famous one is `RColorBrewer`. Its palettes are already available in `ggplot2`. See
all of them [here](https://www.r-graph-gallery.com/38-rcolorbrewers-palettes/), and
use one on your chart using `scale_color_brewer()`.

Pick the one you like the most and apply it to the previous scatterplot. Use it to
color the `Species`.

```{r, eval=FALSE, class.source="Question",echo=TRUE }


... +
scale_color_brewer(palette = )
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Species)) +
geom_point(size=4) +
scale_color_brewer(palette = "Set3")
```

## &rarr; Continuous color palette

<br><span class="questionNumber">Q5.5</span> `RColorBrewer` also offers continuous
color palettes. However, they must be called through the `scale_color_distiller()`
function. Use the palette you like the most to color circles depending on
`Sepal.Length`.
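
A short sketch of `scale_color_distiller()` on the built-in `mtcars` dataset, where a numeric column drives the color. The palette name and the `direction` value are arbitrary choices for illustration.

```r
library(ggplot2)

p_distiller <- ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point(size = 3) +
  # direction = 1 keeps the palette order; the default (-1) reverses it
  scale_color_distiller(palette = "YlGnBu", direction = 1)
p_distiller
```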

```{r, eval=FALSE, class.source="Question",echo=TRUE }


... +
scale_color_distil...
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE, fig.show='asis'}


ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Sepal.Length)) +
geom_point() +
scale_color_distiller(palette = "RdPu")
```
# 6- Interactive charts
***
An interactive chart is a chart on which you can zoom, hover over shapes to get
tooltips, click to trigger actions and more. Building interactive charts requires
javascript under the hood, but it is relatively easy to do using R packages
that wrap the javascript for you. These packages are called [HTML
widgets](https://www.htmlwidgets.org).

## &rarr; Plotly

<br><span class="questionNumber">Q6.1</span> Build the `gapminder` bubble plot
you've already done in the annotation part of this practical. Store it in an object
called `p`.
```{r, eval=FALSE, class.source="Question",echo=TRUE }
# load data
library(gapminder)
data <- gapminder %>% filter(year==2007) %>% select(-year)

# Basic ggplot
p <- data %>%
ggplot( ...
p
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# load data
library(gapminder)
data <- gapminder %>% filter(year==2007) %>% select(-year)

# Basic ggplot
p <- data %>%
ggplot( aes(x=gdpPercap, y=lifeExp, size = pop, color = continent)) +
geom_point(alpha=0.7)
p
```

<br><br><br><span class="questionNumber">Q6.2</span> Install and load the `plotly`
package. Build an interactive chart using the `ggplotly()` function. What are the
new functionalities of this chart? Is it useful? What could be better?
```{r, fig.align="center"}
# Interactive version
library(plotly)
ggplotly(p)
```
<br><br><br><span class="questionNumber">Q6.3</span> Let's improve the tooltip of
the chart:

- build a new column called `myText`. Fill it with whatever you want to show in the
tooltip.
- add a new aesthetic: `text=myText`
- in the `ggplotly()` call, add `tooltip="text"`
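
The three steps can be tried first on the built-in `mtcars` dataset. This is only a sketch: the tooltip wording and the object names `p_tip` and `widget` are made up for the example.

```r
library(ggplot2)
library(dplyr)
library(plotly)

p_tip <- mtcars %>%
  # step 1: one string per row, holding what the tooltip should display
  mutate(myText = paste0(rownames(mtcars), "\n", mpg, " miles per gallon")) %>%
  # step 2: hand that column to the text aesthetic
  ggplot(aes(x = wt, y = mpg, text = myText)) +
  geom_point()

# step 3: tell ggplotly() to show only the text aesthetic in the tooltip
widget <- ggplotly(p_tip, tooltip = "text")
widget
```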

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Basic ggplot
p <- data %>%
mutate(myText=...) %>%
ggplot( aes(...text=myText)) +
...

ggplotly(p, tooltip="text")
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Basic ggplot
p <- data %>%
  mutate(myText=paste("This country is: " , country )) %>%
  ggplot( aes(x=gdpPercap, y=lifeExp, size = pop, color = continent, text=myText)) +
    geom_point(alpha=0.7)

ggplotly(p, tooltip="text")

```

## &rarr; Leaflet

<br><span class="questionNumber">Q6.4</span> Use the HTML widget called `leaflet`
to build an interactive map showing the earthquakes described in the dataset called
`quakes`. Code is fully provided here, since cartography with R could deserve an
entire practical. The idea is just to discover the potential offered in a few lines
of code:

```{r}
# Library
library(leaflet)

# load example data (Fiji Earthquakes) + keep only 100 first lines
data(quakes)
quakes = head(quakes, 100)

# Create a color palette with handmade bins.


mybins=seq(4, 6.5, by=0.5)
mypalette = colorBin( palette="YlOrBr", domain=quakes$mag, na.color="transparent",
bins=mybins)

# Final Map
leaflet(quakes) %>%
addTiles() %>%
setView( lat=-27, lng=170 , zoom=4) %>%
addProviderTiles("Esri.WorldImagery") %>%
addCircleMarkers(~long, ~lat,
fillColor = ~mypalette(mag), fillOpacity = 0.7, color="white", radius=8,
stroke=FALSE
) %>%
addLegend( pal=mypalette, values=~mag, opacity=0.9, title = "Magnitude", position
= "bottomright" )
```

## &rarr; Heatmap
The `d3heatmap` package lets you build interactive heatmaps in a few lines of code.
Let's see how it works.

<br><br><br><span class="questionNumber">Q6.5</span> Load
[this](http://datasets.flowingdata.com/ppg2008.csv) dataset in R. Have a look at
the first rows. Describe it.
([source](https://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/))

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Load data
data <- read.csv(...)
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Load data
data <- read.csv("http://datasets.flowingdata.com/ppg2008.csv", row.names = 1)

# head(data)
# summary(data)
```

<br><br><br><span class="questionNumber">Q6.6</span> R offers a `heatmap()`
function to build... heatmaps! Apply it to the dataset. What do you observe? Are
you happy with this heatmap? What's wrong with it? How can we solve the issue?

<u>Note</u>: the input dataset must be in `matrix` format to be accepted by the
function. Use `as.matrix()` to get this format.
```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Make the heatmap
heatmap( as.matrix(data) )
```

<br><br><br><span class="questionNumber">Q6.7</span> Check the `scale` option of
the `heatmap()` function. What is it for? Can it help us? How? Use it to improve
the heatmap.
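
To see what column scaling does to the numbers themselves, here is a small sketch on the built-in `mtcars` dataset (an assumption made for illustration). It uses base R's `scale()`, which centers each column and divides it by its standard deviation, the same operation `heatmap()` applies with `scale = "column"`.

```r
# Built-in dataset whose columns live on very different ranges
m <- as.matrix(mtcars)

# Center and standardize every column
scaled <- scale(m)
round(colMeans(scaled), 10)  # each column mean is numerically zero
apply(scaled, 2, sd)         # each column standard deviation is one

# Compare the heatmap without and with column scaling
heatmap(m, scale = "none")
heatmap(m, scale = "column")
```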

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Make the heatmap
heatmap( as.matrix(data), scale = "column")
```

<br><br><br><span class="questionNumber">Q6.8</span> `d3heatmap()` uses exactly the
same syntax as `heatmap()`. Use the function to get an interactive version of the
previous heatmap!

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Load library
library(d3heatmap)

# Build heatmap
d3heatmap(...)
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE}
# Load library
library(d3heatmap)

# Build heatmap
d3heatmap(data, scale = "column")
```

## &rarr; Time series

<br><span class="questionNumber">Q6.9</span> - Use the HTML widget called
`dygraphs` to build an interactive line plot of the [bitcoin price
evolution](https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv).
Try to reproduce the example below.

```{r, fig.align="center", out.width="100%"}


# Library
library(dygraphs)
library(xts) # To make the conversion data-frame / xts format

# Load dataset from github


data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header=TRUE)
data$date <- as.Date(data$date)

# Then you can create the xts format, and thus use dygraph
don <- xts(x = data$value, order.by = data$date)

# Use the dygraph HTML widget


dygraph(don) %>%
dyOptions(labelsUTC = TRUE, fillGraph=TRUE, fillAlpha=0.1, drawGrid = FALSE,
colors="#D8AE5A") %>%
dyRangeSelector() %>%
dyCrosshair(direction = "vertical") %>%
dyHighlight(highlightCircleSize = 5, highlightSeriesBackgroundAlpha = 0.2,
hideOnMouseOut = FALSE) %>%
dyRoller(rollPeriod = 1)
```

## &rarr; HTML widgets

<br><span class="questionNumber">BONUS</span> - The packages showcased above are
just a sample of the possibilities offered by the HTML widgets. Visit [this
website](https://www.htmlwidgets.org/showcase_leaflet.html) to get an overview of
the kinds of interactive charts you can build with `R`. Pick your favorite example and
try to reproduce it.

# 7- Scales
***
Scales control the details of how data values are translated to visual properties.
[Many different scales](https://ggplot2.tidyverse.org/reference/index.html#section-scales)
are offered by ggplot2. The most widely used one is probably the log scale.

<br><br><br><span class="questionNumber">Q7.1</span> Build a histogram showing the
night price distribution of the French Riviera apartments ([data
here](https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv)).
Keep all the data, including extreme values.

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram


data %>%
...
geom_histogram() +
...
```

```{r, class.source="Correction",fig.show="hide",echo=FALSE, fig.show='asis'}


# Libraries
library(ggplot2)

# Load dataset from github


data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram


data %>%
ggplot( aes(x=price)) +
geom_histogram(color="white", fill="steelblue4") +
ggtitle("Night price distribution of Airbnb apartments") +
xlab("Night price") +
ylab("Number of apartments")
```

<br><br><br><span class="questionNumber">Q7.2</span> A common practice to avoid the
effect of extreme values is to filter data, or use `xlim()` to zoom on a part of the
axis. Another approach is to use `scale_x_log10()` to apply a log transformation.
Apply this function to the histogram.
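
As a minimal sketch of the log-scale approach on another skewed variable, the example below assumes the `diamonds` dataset bundled with `ggplot2`; the number of bins is arbitrary.

```r
library(ggplot2)

# diamonds ships with ggplot2; its price column is strongly right-skewed
p_raw <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(bins = 30)

# Same chart, with a log10-transformed X axis
p_log <- p_raw + scale_x_log10()

p_raw
p_log
```

Look closely at the axis labels of the second chart: they are worth keeping in mind for the next question.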

```{r, eval=FALSE, class.source="Question",echo=TRUE }


# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram


data %>%
...
geom_histogram() +
... +
scale_x_log10()
```
```{r, class.source="Correction",fig.show="hide",echo=FALSE, fig.show='asis'}
# Libraries
library(ggplot2)

# Load dataset from github


data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)

# Make the histogram


data %>%
ggplot( aes(x=price)) +
geom_histogram(color="white", fill="steelblue4") +
ggtitle("Night price distribution of Airbnb apartments") +
xlab("Night price") +
ylab("Number of apartments") +
scale_x_log10()
```

<br><br><br><span class="questionNumber">Q7.3</span> What's the difference between
`scale_x_log10()` and applying the `log()` function on the dataset before doing the
chart? Why is it better?
