04 Data Visualization
Luiza Andrade, Rob Marty, Rony Rodriguez-Ramirez, Luis Eduardo San Martin, Leonardo Viotti, Marc-Andrea Fiorina
The World Bank – DIME | WB Github
March 2023
Introduction
Initial Setup
If you attended Session 2:
1. Go to the dime-r-training-mar2023 folder that you created yesterday, and open the dime-r-training-mar2023 R project that you created there.
If you did not attend Session 2:
2. Go to the OSF page of the course and download the file in: R for Stata Users - 2023 March > Data > dime-r-training-mar2023.zip.
3. Unzip dime-r-training-mar2023.zip.
Today's session
Exploratory Analysis v. Publication/Reporting
For this session, you’ll use the ggplot2 package from the tidyverse meta-package.
Similarly to previous sessions, you can find some references at the end of this
presentation that include a more comprehensive discussion on data
visualization.
Introduction
Before we start
Make sure the ggplot2 package is installed and loaded. You can load it on its own with library(ggplot2) or together with the rest of the tidyverse with library(tidyverse).
Load the whr_panel data set we created last week (remember to use the here package).
# Packages
library(tidyverse)
library(here)
We’ll do this using ggplot2 with more customization. The idea is to create beautiful graphs.
Exploratory Analysis
Plot with Base R
First, we’re going to use base plotting, i.e., the graphics functions that ship with base R. They are easy to use and can produce useful graphs with very few lines of code.
(1) Create a vector called vars with the variables: economy_gdp_per_capita , happiness_score ,
health_life_expectancy , and freedom .
(2) Select all the variables from the vector vars in the whr_panel dataset and assign to the object whr_plot .
(3) Use the plot() function: plot(whr_plot)
# Vector of variables
vars <- c("economy_gdp_per_capita", "happiness_score", "health_life_expectancy", "freedom")
# Create a subset with only those variables, let's call this subset whr_plot
whr_plot <- whr_panel %>%
  select(all_of(vars))
Base Plot
plot(whr_plot)
The beauty of ggplot2
1. Consistency with the Grammar of Graphics
The Grammar of Graphics book is the foundation of several data viz applications: ggplot2, polaris-tableau, vega-lite
2. Flexibility
3. Layering and theme customization
4. Community
It is a powerful and easy to use tool (once you understand its logic) that produces
complex and multifaceted plots.
ggplot2: basic structure (template)
The basic ggplot structure is:
ggplot(data = DATA) +
  GEOM_FUNCTION(mapping = aes(AESTHETIC MAPPINGS))
We are going to learn how we connect our data to the components of a ggplot
ggplot2: full structure
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(
    mapping = aes(<MAPPINGS>),
    stat = <STAT>,
    position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION> +
  <SCALE_FUNCTION> +
  <THEME_FUNCTION>
1. Data: The data that you want to visualize
2. Layers: geom_ and stat_ → The geometric shapes and statistical summaries representing the data
3. Aesthetics: aes() → Aesthetic mappings of the geometric and statistical objects
4. Scales: scale_ → Maps between the data and the aesthetic dimensions
5. Coordinate system: coord_ → Maps data into the plane of the data rectangle
6. Facets: facet_ → The arrangement of the data into a grid of plots
7. Visual themes: theme() and theme_ → The overall visual defaults of a plot
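As a concrete instance of the seven components, here is a minimal sketch that fills in every slot of the template. It assumes a small stand-in data frame, whr_toy, with made-up values (not real World Happiness Report data) and the same column names as whr_panel; the object names whr_toy and p_full are hypothetical.

```r
library(ggplot2)

# Toy stand-in for whr_panel: made-up values, NOT real World Happiness Report data
whr_toy <- data.frame(
  happiness_score        = c(3.5, 4.2, 5.1, 5.8, 6.4, 7.3),
  economy_gdp_per_capita = c(0.3, 0.5, 0.9, 1.0, 1.3, 1.5),
  region                 = rep(c("Region A", "Region B"), times = 3)
)

p_full <- ggplot(data = whr_toy) +               # 1. Data
  geom_point(                                    # 2. Layer (a geom)
    mapping = aes(                               # 3. Aesthetics
      x = happiness_score,
      y = economy_gdp_per_capita,
      color = region
    ),
    stat = "identity",                           # plot the values as they are
    position = "identity"                        # no position adjustment
  ) +
  coord_cartesian() +                            # 5. Coordinate system
  facet_wrap(~ region) +                         # 6. Facets
  scale_color_discrete(name = "Region") +        # 4. Scale
  theme_minimal()                                # 7. Visual theme

p_full
```

Most of these slots have sensible defaults, which is why the shorter template with only data, a geom, and aes() is usually enough to get started.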
ggplot2: decomposition
Exploratory Analysis
Let's start making some plots.
ggplot(data = whr_panel) +
  geom_point(mapping = aes(x = happiness_score, y = economy_gdp_per_capita))
We can also set up our mapping in the ggplot() function.
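A minimal sketch of this variant, assuming a toy data frame whr_toy with made-up values (not real data) and the same column names as whr_panel. The mapping is declared once in ggplot() and geom_point() inherits it.

```r
library(ggplot2)

# Toy stand-in for whr_panel: made-up values, NOT real World Happiness Report data
whr_toy <- data.frame(
  happiness_score        = c(3.5, 4.2, 5.1, 5.8, 6.4, 7.3),
  economy_gdp_per_capita = c(0.3, 0.5, 0.9, 1.0, 1.3, 1.5)
)

# The mapping lives in ggplot(), so every layer added later can reuse it
p_global <- ggplot(
  data = whr_toy,
  mapping = aes(x = happiness_score, y = economy_gdp_per_capita)
) +
  geom_point()

p_global
```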
We can also set up the data outside the ggplot() function as follows:
whr_panel %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point()
I prefer to use the second way of structuring our ggplot.
Both structures work, but the choice matters if you want to use more than one dataset in the same plot or combine several geoms in the same ggplot. More on this in the following slides.
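To illustrate the point, here is a sketch with several layers. It assumes a toy data frame whr_toy with made-up values and a hypothetical second dataset, top_toy, holding the two highest-scoring rows. Layers without their own data or mapping inherit both from ggplot(), while a layer can override the data when needed.

```r
library(ggplot2)
library(dplyr)

# Toy stand-in for whr_panel: made-up values, NOT real World Happiness Report data
whr_toy <- data.frame(
  happiness_score        = c(3.5, 4.2, 5.1, 5.8, 6.4, 7.3),
  economy_gdp_per_capita = c(0.3, 0.5, 0.9, 1.0, 1.3, 1.5)
)

# Hypothetical second dataset: the two rows with the highest happiness score
top_toy <- whr_toy %>%
  slice_max(happiness_score, n = 2)

p_layers <- whr_toy %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point() +                                         # inherits data and mapping
  geom_line() +                                          # inherits data and mapping
  geom_point(data = top_toy, color = "red", size = 4)    # same mapping, other data

p_layers
```

Had the mapping been written inside the first geom_point() only, the other two layers would have no x and y and the plot would fail.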
Solution:
whr_panel %>%
  ggplot() +
  geom_point(aes(x = freedom, y = economy_gdp_per_capita))
The most common geoms are:
If you want to know more about layers, you can refer to this.
In summary, our basic plots should have the following:
We can also map colors.
whr_panel %>%
  ggplot(
    aes(
      x = happiness_score,
      y = economy_gdp_per_capita,
      color = region
    )
  ) +
  geom_point()
Let's try something different: instead of region, add color = "blue" inside aes().
whr_panel %>%
  ggplot(
    aes(
      x = happiness_score,
      y = economy_gdp_per_capita,
      color = "blue"
    )
  ) +
  geom_point()
In ggplot2 , these settings are called aesthetics.
Notice that it is important to know where we are setting our aesthetics. For example:
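A minimal sketch of the difference, assuming a toy data frame whr_toy with made-up values (not real data). Inside aes(), "blue" is treated as data; outside aes(), it is a fixed setting. layer_data() is used here only to inspect the colors that ggplot2 actually ends up drawing.

```r
library(ggplot2)

# Toy stand-in for whr_panel: made-up values, NOT real World Happiness Report data
whr_toy <- data.frame(
  happiness_score        = c(3.5, 4.2, 5.1, 5.8, 6.4, 7.3),
  economy_gdp_per_capita = c(0.3, 0.5, 0.9, 1.0, 1.3, 1.5)
)

# Inside aes(): "blue" is mapped like a one-level variable, so ggplot2
# picks a color from its default discrete scale and adds a legend
p_mapped <- ggplot(whr_toy, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point(aes(color = "blue"))

# Outside aes(): "blue" is a constant applied to every point
p_set <- ggplot(whr_toy, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point(color = "blue")

# Colors actually used by each plot
unique(layer_data(p_mapped)$colour)  # a default palette color, not "blue"
unique(layer_data(p_set)$colour)     # "blue"
```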
Let's modify our last plot. Let's add color = "blue" inside geom_point() .
whr_panel %>%
  ggplot(
    aes(
      x = happiness_score,
      y = economy_gdp_per_capita
    )
  ) +
  geom_point(color = "blue")
Exercise 3: Map colors per year for the freedom and gdp plot we did before. Keep in mind the type of the variable
year .
Solution:
whr_panel %>%
  ggplot(
    aes(
      x = freedom,
      y = economy_gdp_per_capita,
      color = year
    )
  ) +
  geom_point()
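How do you think we could solve it?
If year is stored as a number, ggplot2 treats it as continuous and draws a color gradient. To get one color per year, convert it to a factor: as.factor(year).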
whr_panel %>%
  ggplot(
    aes(
      x = freedom,
      y = economy_gdp_per_capita,
      color = as.factor(year)
    )
  ) +
  geom_point()
ggplot2: settings
Now, let's try to modify our plots. In the following slides, we are going to:
1. Change shapes.
3. Separate by regions.
5. Change scales.
ggplot2: shapes
whr_panel %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point(shape = 5)
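To see which number corresponds to which shape, one option is to draw all of the built-in point shapes (codes 0 to 25) with their codes underneath. This is a sketch; the data frame shapes and its grid columns col and row are hypothetical helpers used only for layout.

```r
library(ggplot2)

# One row per built-in point shape code, arranged on a 7-column grid
shapes <- data.frame(
  code = 0:25,
  col  = 0:25 %% 7,
  row  = -(0:25 %/% 7)
)

p_shapes <- ggplot(shapes, aes(x = col, y = row)) +
  # scale_shape_identity() makes ggplot2 use the codes directly as shapes
  geom_point(aes(shape = code), size = 5, fill = "grey70") +
  geom_text(aes(label = code), nudge_y = -0.3) +
  scale_shape_identity() +
  theme_void()

p_shapes
```

Shapes 21 to 25 have both a border (color) and an interior (fill), which is why the fill argument only shows up on those.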
ggplot2: including more geoms
whr_panel %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point() +
  geom_smooth()
ggplot2: Facets
whr_panel %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point() +
  facet_wrap(~ region)
ggplot2: Colors and facets
Exercise 4: Use the last plot and add a color aesthetic per region.
Solution:
whr_panel %>%
  ggplot(
    aes(
      x = happiness_score,
      y = economy_gdp_per_capita,
      color = region
    )
  ) +
  geom_point() +
  facet_wrap(~ region)
ggplot2: Pipe and mutate before plotting
Now imagine that we would like to transform a variable before plotting.
whr_panel %>%
  mutate(
    latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
  ) %>%
  ggplot(
    aes(
      x = happiness_score, y = economy_gdp_per_capita,
      color = latam
    )
  ) +
  geom_point()
ggplot2: geom's sizes
We can also specify the size of a geom, either by a variable or just a number.
whr_panel %>%
  filter(year == 2017) %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point(aes(size = economy_gdp_per_capita))
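The "just a number" case can be sketched as follows, next to the mapped version for comparison. The data frame whr_toy holds made-up values (not real data), and the object names are hypothetical. As with color, a fixed size goes outside aes().

```r
library(ggplot2)

# Toy stand-in for whr_panel: made-up values, NOT real World Happiness Report data
whr_toy <- data.frame(
  happiness_score        = c(3.5, 4.2, 5.1, 5.8, 6.4, 7.3),
  economy_gdp_per_capita = c(0.3, 0.5, 0.9, 1.0, 1.3, 1.5)
)

# Size set to a constant: every point gets the same size, no legend
p_size_fixed <- ggplot(whr_toy, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point(size = 4)

# Size mapped to a variable: point size varies with the data, legend added
p_size_mapped <- ggplot(whr_toy, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point(aes(size = economy_gdp_per_capita))

p_size_fixed
p_size_mapped
```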
ggplot2: Changing scales
Linear:
ggplot(data = whr_panel,
       aes(x = happiness_score,
           y = economy_gdp_per_capita)) +
  geom_point()
Log:
ggplot(data = whr_panel,
       aes(x = happiness_score,
           y = economy_gdp_per_capita)) +
  geom_point() +
  scale_x_log10()
ggplot2: Themes
Let's go back to our plot with the latam dummy.
ggplot2: Labels
whr_panel %>%
  mutate(
    latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
  ) %>%
  filter(year == 2015) %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita,
             color = latam)) +
  geom_point() +
  labs(
    x = "Happiness Score",
    y = "GDP per Capita",
    title = "Happiness Score vs GDP per Capita, 2015"
  )
ggplot2: Legends
whr_panel %>%
  mutate(
    latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
  ) %>%
  filter(year == 2015) %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita,
             color = latam)) +
  geom_point() +
  scale_color_discrete(labels = c("No", "Yes")) +
  labs(
    x = "Happiness Score",
    y = "GDP per Capita",
    color = "Country in Latin America\nand the Caribbean",
    title = "Happiness Score vs GDP per Capita, 2015"
  )
ggplot2: Themes
whr_panel %>%
  mutate(
    latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
  ) %>%
  filter(year == 2015) %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita,
             color = latam)) +
  geom_point() +
  scale_color_discrete(labels = c("No", "Yes")) +
  labs(
    x = "Happiness Score",
    y = "GDP per Capita",
    color = "Country in Latin America\nand the Caribbean",
    title = "Happiness Score vs GDP per Capita, 2015"
  ) +
  theme_minimal()
The theme() function allows you to modify each aspect of your plot. Some arguments are:
theme(
  # Plot title: font color, size, and face
  plot.title = element_text(color, size, face),
  # Legend title: font color, size, and face
  legend.title = element_text(color, size, face),
  # Legend title alignment. Number from 0 (left) to 1 (right)
  legend.title.align = NULL,
  # Legend text labels: font color, size, and face
  legend.text = element_text(color, size, face),
  # Legend text alignment. Number from 0 (left) to 1 (right)
  legend.text.align = NULL
)
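The listing above is schematic. A runnable sketch with concrete values could look like the following, assuming a toy data frame whr_toy with made-up values (not real data); the specific colors and sizes are arbitrary choices for illustration. Alignment is set here through hjust inside element_text(), because, to the best of my knowledge, recent ggplot2 releases (3.5.0 onwards) deprecate legend.title.align and legend.text.align in favor of it.

```r
library(ggplot2)

# Toy stand-in for whr_panel: made-up values, NOT real World Happiness Report data
whr_toy <- data.frame(
  happiness_score        = c(3.5, 4.2, 5.1, 5.8, 6.4, 7.3),
  economy_gdp_per_capita = c(0.3, 0.5, 0.9, 1.0, 1.3, 1.5),
  region                 = rep(c("Region A", "Region B"), times = 3)
)

p_themed <- ggplot(
  whr_toy,
  aes(x = happiness_score, y = economy_gdp_per_capita, color = region)
) +
  geom_point() +
  labs(title = "Toy example", color = "Region") +
  theme(
    # Plot title: dark blue, larger, bold
    plot.title = element_text(color = "navy", size = 16, face = "bold"),
    # Legend title: italic and centered (hjust runs from 0 = left to 1 = right)
    legend.title = element_text(size = 11, face = "italic", hjust = 0.5),
    # Legend labels: slightly smaller text
    legend.text = element_text(size = 9),
    # Put the legend below the plot
    legend.position = "bottom"
  )

p_themed
```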
ggplot2: Color palettes
We can also add color palettes using other packages such as: RColorBrewer,
viridis or funny ones like the wesanderson package. So, let's add new colors.
# install.packages("RColorBrewer")
library(RColorBrewer)
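As a sketch of how these palettes plug into a plot, the example below assumes a toy data frame whr_toy with made-up values (not real data). It shows three routes: a Brewer palette through scale_color_brewer(), a viridis palette through scale_color_viridis_d() (included in ggplot2), and hand-picked hex codes through scale_color_manual().

```r
library(ggplot2)
library(RColorBrewer)

# Toy stand-in for whr_panel: made-up values, NOT real World Happiness Report data
whr_toy <- data.frame(
  happiness_score        = c(3.5, 4.2, 5.1, 5.8, 6.4, 7.3),
  economy_gdp_per_capita = c(0.3, 0.5, 0.9, 1.0, 1.3, 1.5),
  region                 = rep(c("Region A", "Region B"), times = 3)
)

# Hex codes of a Brewer palette (brewer.pal() needs n of at least 3)
dark2 <- brewer.pal(n = 3, name = "Dark2")

p_base <- ggplot(
  whr_toy,
  aes(x = happiness_score, y = economy_gdp_per_capita, color = region)
) +
  geom_point()

# Route 1: let ggplot2 pull colors from a named Brewer palette
p_brewer <- p_base + scale_color_brewer(palette = "Dark2")

# Route 2: a discrete viridis palette
p_viridis <- p_base + scale_color_viridis_d()

# Route 3: assign specific colors to specific values by hand
p_manual <- p_base +
  scale_color_manual(values = c("Region A" = dark2[1], "Region B" = dark2[2]))
```

RColorBrewer::display.brewer.all() draws every available Brewer palette, which helps when choosing one.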
whr_panel %>%
  mutate(
    latam = ifelse(region == "Latin America and Caribbean", TRUE, FALSE)
  ) %>%
  filter(year == 2015) %>%
  ggplot(aes(x = happiness_score, y = economy_gdp_per_capita,
             color = latam)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2", labels = c("No", "Yes")) +
  labs(
    x = "Happiness Score",
    y = "GDP per Capita",
    color = "Country in Latin America\nand the Caribbean",
    title = "Happiness Score vs GDP per Capita, 2015"
  ) +
  theme_minimal()
Other packages that provide color palettes include:
1. ghibli
2. LaCroixColoR
3. NineteenEightyR
4. nord
5. palettetown
6. quickpalette
7. wesanderson
Saving a plot
Remember that in R we can always assign the result of a function call to an object. In this case, we can assign our ggplot2 code to an object called fig.
Therefore, if you want to plot it again, you can just type fig in the console.
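A minimal sketch of storing a plot in fig, assuming a toy data frame whr_toy with made-up values (not real data) in place of whr_panel. The object fig_minimal is a hypothetical name used to show that a stored plot can still be extended.

```r
library(ggplot2)

# Toy stand-in for whr_panel: made-up values, NOT real World Happiness Report data
whr_toy <- data.frame(
  happiness_score        = c(3.5, 4.2, 5.1, 5.8, 6.4, 7.3),
  economy_gdp_per_capita = c(0.3, 0.5, 0.9, 1.0, 1.3, 1.5)
)

# Assigning the plot to an object does not draw it
fig <- ggplot(whr_toy, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point() +
  labs(x = "Happiness Score", y = "GDP per Capita")

# Typing the object name draws the plot
fig

# A stored plot can be reused and extended with more layers or themes
fig_minimal <- fig + theme_minimal()
```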
Exercise 5: Save a ggplot under the name fig_*YOUR INITIALS*. Use the ggsave() function. You can either include the function after your plot or save the ggplot first as an object and then save the plot.
The syntax is ggsave(OBJECT, filename = FILEPATH, height = ..., width = ..., dpi = ...).
Solution:
ggsave(
  fig,
  filename = here("DataWork","Output","Raw","fig_MA.png"),
  dpi = 750,
  scale = 0.8,
  height = 8,
  width = 12
)
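Both ways of calling ggsave() mentioned in the exercise can be sketched as below. The example assumes a toy data frame whr_toy with made-up values (not real data) and writes to temporary files (out_file and out_file2 are hypothetical names) so that it does not depend on any folder structure. Width and height are in inches by default.

```r
library(ggplot2)

# Toy stand-in for whr_panel: made-up values, NOT real World Happiness Report data
whr_toy <- data.frame(
  happiness_score        = c(3.5, 4.2, 5.1, 5.8, 6.4, 7.3),
  economy_gdp_per_capita = c(0.3, 0.5, 0.9, 1.0, 1.3, 1.5)
)

fig <- ggplot(whr_toy, aes(x = happiness_score, y = economy_gdp_per_capita)) +
  geom_point()

# Option 1: pass the stored plot explicitly through the plot argument
out_file <- tempfile(fileext = ".png")
ggsave(filename = out_file, plot = fig, width = 12, height = 8, dpi = 150)

# Option 2: omit plot; ggsave() then falls back to last_plot(),
# the plot most recently created or modified (here, fig)
out_file2 <- tempfile(fileext = ".png")
ggsave(filename = out_file2, width = 12, height = 8, dpi = 150)

file.exists(c(out_file, out_file2))
```

Naming the plot argument, as in option 1, avoids relying on argument order, since the first positional argument of ggsave() is the file name.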
And that's it for this session. Join us tomorrow for data analysis. Remember to submit your feedback!
References and recommendations
ggplot tricks:
Websites:
Online courses:
Books:
Interactive graphs
There are several packages to create interactive or dynamic data visualizations with R. Here are a few:
leaflet - R integration to one of the most popular open-source libraries for interactive maps.
highcharter - cool interactive graphs.
plotly - interactive graphs with integration to ggplot.
gganimate - ggplot GIFs.
DT - interactive tables.
These are generally HTML widgets that can be incorporated into HTML documents and websites.
Now we’ll use the ggplotly() function from the plotly package to create an interactive graph!
# Load package
library(plotly)
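A minimal sketch of the conversion step, assuming a toy data frame whr_toy with made-up values (not real data) in place of whr_panel. ggplotly() takes an existing ggplot object and returns an interactive HTML widget; the object names fig and fig_interactive are hypothetical.

```r
library(ggplot2)
library(plotly)

# Toy stand-in for whr_panel: made-up values, NOT real World Happiness Report data
whr_toy <- data.frame(
  happiness_score        = c(3.5, 4.2, 5.1, 5.8, 6.4, 7.3),
  economy_gdp_per_capita = c(0.3, 0.5, 0.9, 1.0, 1.3, 1.5),
  region                 = rep(c("Region A", "Region B"), times = 3)
)

# Build a regular (static) ggplot first
fig <- ggplot(
  whr_toy,
  aes(x = happiness_score, y = economy_gdp_per_capita, color = region)
) +
  geom_point() +
  labs(x = "Happiness Score", y = "GDP per Capita")

# Convert it to an interactive plotly widget
fig_interactive <- ggplotly(fig)

# In an interactive session, typing fig_interactive opens the widget
# in the viewer, with hover tooltips, zoom, and a clickable legend
```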
[Plot: scatter plot of Happiness Score (x-axis) against GDP per Capita (y-axis), with points colored by "Country in Latin America and the Caribbean" (FALSE / TRUE)]