Data Visualization With R - Principles and Practice

This document provides an introduction to data visualization techniques using R and the ggplot2 package. It discusses different types of variables and the appropriate visualization geom for different variable combinations. Key geoms include geom_point() for scatter plots, geom_bar() for bar charts, geom_histogram() for histograms, and geom_line() for line plots. The document emphasizes that ggplot2 follows the grammar of graphics, where data, aesthetics, geometries and other elements are layered to create visualizations.

Uploaded by

APPIAH ELIJAH

DATA VISUALIZATION WITH R
PRINCIPLES & PRACTICE

HELLO!
I am Elijah Appiah from Ghana.
I am an Economist by profession.
I love everything data, so I love R!
secret behind the smile!
You can reach me: [email protected]

Lesson Goals
Provide a compact introduction that lets readers learn about visualization techniques.

Emphasize the strong connection between visualizations and insight.

Datasets

mtcars: base R
wage1: wooldridge package
diamonds: ggplot2 package
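
As a minimal sketch of how these three datasets might be loaded, assuming ggplot2 is installed and treating the wooldridge package as optional (the guard around it is an addition, not part of the slides):

```r
library(ggplot2)   # attaching ggplot2 also makes the diamonds dataset available

# mtcars ships with base R (the datasets package), so no extra package is needed.
str(mtcars)

# diamonds is bundled with ggplot2.
dim(diamonds)

# wage1 lives in the wooldridge package; load it only if that package is installed.
if (requireNamespace("wooldridge", quietly = TRUE)) {
  data("wage1", package = "wooldridge")
  str(wage1)
}
```

If wooldridge is missing, install.packages("wooldridge") should make wage1 available.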

Variables

Categorical
Nominal – names, labels, categories with no natural order, e.g. gender, countries
Ordinal – categories with an order, e.g. Likert scales

Numeric
Discrete – counts, e.g. number of cylinders of a vehicle
Continuous – measured, can take any value within an interval, e.g. height, weight

Variables (e.m)
Discrete – represents counts or distinct categories
e.g. number of students, grade levels, gender, number of blue marbles in a jar, etc.
Continuous – represents measurable amounts
e.g. height, weight, temperature, distance, etc.
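
One way these variable types could be represented in R is sketched below. The object names (transmission, satisfaction, cylinders, weight) and the Likert responses are made up for illustration; only mtcars comes from the lesson.

```r
# Nominal: an unordered factor (mtcars$am is coded 0 = automatic, 1 = manual).
transmission <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("automatic", "manual"))

# Ordinal: an ordered factor, here with hypothetical Likert-style responses.
satisfaction <- factor(c("agree", "disagree", "neutral", "agree"),
                       levels = c("disagree", "neutral", "agree"),
                       ordered = TRUE)

# Discrete: whole-number counts, such as the number of cylinders.
cylinders <- as.integer(mtcars$cyl)

# Continuous: measured values, such as vehicle weight.
weight <- mtcars$wt

table(transmission)                  # counts per category
satisfaction[1] > satisfaction[2]    # comparisons are meaningful for ordered factors
class(cylinders)
class(weight)
```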

GGPLOT2
GRAMMAR OF GRAPHICS PLOTS

GGPLOT2 LAYERS

DATA – dataset to be visualized
AESTHETICS – scales onto which data is mapped
GEOMETRIES – visual elements for the data
FACETS – create multiple plots
STATISTICS – data representations to aid understanding
COORDINATES – the space on which data is plotted
THEMES – plot appearance (all non-data ink)
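
To make the seven layers concrete, here is a rough sketch of one plot that touches each of them. The choice of variables, the linear smoother, the axis limits, and the theme are illustrative assumptions rather than anything prescribed by the slides.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +               # DATA + AESTHETICS
  geom_point() +                                          # GEOMETRIES
  facet_wrap(~ cyl) +                                     # FACETS: one panel per cylinder count
  stat_smooth(method = "lm", formula = y ~ x,
              se = FALSE) +                               # STATISTICS: fitted line per panel
  coord_cartesian(ylim = c(10, 35)) +                     # COORDINATES: zoom the y axis
  theme_minimal()                                         # THEMES: non-data ink

print(p)
```

Only the first three layers are required for a basic plot; the rest have sensible defaults.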

GGPLOT2 LAYERS

DATA – dataset to be visualized
AESTHETICS – scales onto which data is mapped
GEOMETRIES – visual elements for the data

GGPLOT2
The package is ggplot2
The function is ggplot()

Layer: DATA
ggplot(data = df)

Blank canvas with grey background
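
A small sketch of the data layer on its own, using mtcars in place of the placeholder df (an assumption made so the example runs):

```r
library(ggplot2)

# Only the data layer is supplied: no aesthetics and no geoms yet.
p <- ggplot(data = mtcars)

length(p$layers)   # no geometry layers have been added so far
print(p)           # draws the empty grey canvas
```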

Layer: AESTHETICS
The aesthetic attributes include:
x, y, colour (or color), shape, size, fill, alpha, etc.
Aesthetics are mapped with the aes() function inside the ggplot() call.

Layer: AESTHETICS
ggplot(data = df, mapping = aes())

Aesthetic attributes

Layer: AESTHETICS
ggplot(data = df, aes())

Aesthetic attributes

Layer: AESTHETICS
ggplot(df, aes())

Aesthetic attributes

Layer: AESTHETICS
ggplot(mtcars, aes(x = mpg))

Layer: AESTHETICS
ggplot(mtcars, aes(x = mpg, y = hp))
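
The progressively shorter calls above all describe the same mapping. The sketch below checks that on mtcars and adds one extra aesthetic (colour by cylinder count) purely as an illustration; the object names p1 to p3 are arbitrary.

```r
library(ggplot2)

# Fully named arguments and the positional shorthand give the same mapping.
p1 <- ggplot(data = mtcars, mapping = aes(x = mpg, y = hp))
p2 <- ggplot(mtcars, aes(mpg, hp))

names(p1$mapping)   # the aesthetics that have been mapped
names(p2$mapping)

# Further aesthetics can be mapped to other variables, e.g. colour by cylinders.
p3 <- ggplot(mtcars, aes(x = mpg, y = hp, colour = factor(cyl)))

# With aesthetics but no geom, the axes are drawn but no data appears.
print(p1)
```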

Layer: GEOMETRIES
The visual elements of plots are defined by geoms.
They are specified as geom_*(), where * denotes the specific type of plot to create.
A bar plot will be geom_bar()
A histogram will be geom_histogram()
A scatter plot will be geom_point()

Don’t worry... we will be going into details soon.

Layer: GEOMETRIES
The geometric objects (or geoms) are added (+) to the ggplot() call.

Example:
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram()

ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point()

Layer: GEOMETRIES
ggplot(mtcars, aes(x = mpg)) + geom_histogram()

Layer: GEOMETRIES
ggplot(mtcars, aes(x = mpg, y = hp)) + geom_point()

Now, let’s practice



VARIABLES & PLOTS - GEOMS


ONE VARIABLE

Discrete
Bar plot – geom_bar()

Continuous
Histogram – geom_histogram()
Density plot – geom_density()
Dot plot – geom_dotplot()
Frequency polygon – geom_freqpoly()

VARIABLES & PLOTS - GEOMS


geom_bar() – display distribution of discrete variables.

geom_histogram() – bin and count continuous variable, display with bars.

geom_density() – smoothed density estimate.

geom_dotplot() – stack individual points into a dot plot.

geom_freqpoly() – bin and count continuous variable, display with lines.
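
The one-variable geoms listed above could be tried on mtcars roughly as follows. The bin counts and the dot-plot binwidth are arbitrary choices for this sketch, and cyl is wrapped in factor() so that it is treated as discrete.

```r
library(ggplot2)

# Discrete variable: counts of cars per number of cylinders.
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

# Continuous variable: four views of the distribution of mpg.
p_hist <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)
p_dens <- ggplot(mtcars, aes(x = mpg)) + geom_density()
p_dot  <- ggplot(mtcars, aes(x = mpg)) + geom_dotplot(binwidth = 1.5)
p_freq <- ggplot(mtcars, aes(x = mpg)) + geom_freqpoly(bins = 10)

print(p_bar)
print(p_hist)
```

Printing any of the other objects (p_dens, p_dot, p_freq) draws that plot in the same way.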

Now, let’s practice



VARIABLES & PLOTS - GEOMS


TWO VARIABLES

Both Continuous
Scatter plot – geom_point()
Quantile plot – geom_quantile()
Rug plot – geom_rug()
Text labels – geom_text()

One Continuous, One Discrete
Bar plot – geom_col() or geom_bar(stat = "identity")
Box plot – geom_boxplot()
Violin plot – geom_violin()

VARIABLES & PLOTS - GEOMS


geom_point() – scatterplot.

geom_quantile() – smoothed quantile regression.

geom_rug() – marginal rug plots.

geom_text() – text labels.

geom_col()/geom_bar(stat = "identity") – bar chart of precomputed summaries.

geom_boxplot() – boxplots.

geom_violin() – show density of values in each group.
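
A possible sketch of these two-variable geoms on mtcars is shown below. The summary table mean_mpg is a helper built only for this example, and geom_quantile() is left out because it needs the separate quantreg package.

```r
library(ggplot2)

# Both continuous: scatter plot with marginal rugs, and a text-label version.
p_point <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + geom_rug()
p_text  <- ggplot(mtcars, aes(wt, mpg, label = rownames(mtcars))) +
  geom_text(size = 3)

# One continuous, one discrete: bars of a precomputed summary (mean mpg per cyl).
mean_mpg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
p_col  <- ggplot(mean_mpg, aes(factor(cyl), mpg)) + geom_col()
p_col2 <- ggplot(mean_mpg, aes(factor(cyl), mpg)) + geom_bar(stat = "identity")

# Distribution of mpg within each cylinder group.
p_box    <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
p_violin <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_violin()

print(p_point)
print(p_col)
```

geom_col() and geom_bar(stat = "identity") draw the same bars; geom_col() is simply the shorter spelling.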



Now, let’s practice



VARIABLES & PLOTS - GEOMS


TWO VARIABLES

At Least One Discrete
Count plot – geom_count()
Jitter plot – geom_jitter()

Show Distribution (continuous)
Hexagonal heatmap – geom_hex()
Heatmap – geom_bin2d()
Density plot – geom_density2d()

VARIABLES & PLOTS - GEOMS


geom_count() – count number of points at distinct locations.

geom_jitter() – randomly jitter overlapping points.

geom_hex() – bin into hexagons and count.

geom_bin2d() – bin into rectangles and count.

geom_density2d() – smoothed 2d density estimate.
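
These geoms might be explored with diamonds and mtcars along the following lines. The jitter width and bin count are arbitrary, and the two guarded plots assume the optional MASS and hexbin packages, which ggplot2 needs for geom_density2d() and geom_hex() respectively.

```r
library(ggplot2)

# Two discrete variables: point size shows how many diamonds share each combination.
p_count <- ggplot(diamonds, aes(cut, color)) + geom_count()

# One discrete, one continuous: jitter horizontally to separate overlapping points.
p_jitter <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_jitter(width = 0.2, height = 0)

# Two continuous variables: rectangular bins with counts.
p_bin2d <- ggplot(diamonds, aes(carat, price)) + geom_bin2d(bins = 30)

# Contours of a smoothed 2d density estimate (requires MASS).
if (requireNamespace("MASS", quietly = TRUE)) {
  p_dens2d <- ggplot(diamonds, aes(carat, price)) + geom_density2d()
}

# Hexagonal bins with counts (requires hexbin).
if (requireNamespace("hexbin", quietly = TRUE)) {
  p_hex <- ggplot(diamonds, aes(carat, price)) + geom_hex()
}

print(p_count)
print(p_bin2d)
```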



Now, let’s practice



VARIABLES & PLOTS - GEOMS


TWO VARIABLES

One Time, One Continuous
Line plot – geom_line()
Area plot – geom_area()
Step plot – geom_step()

Display Uncertainty
geom_crossbar()
geom_errorbar()
geom_linerange()
geom_pointrange()

VARIABLES & PLOTS - GEOMS


geom_line() – line plot.

geom_area() – area plot.

geom_step() – step plot.

geom_crossbar() – vertical bar with center.

geom_errorbar() – error bars.

geom_linerange() – vertical line.

geom_pointrange() – vertical line with center.
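
A sketch of both groups follows. The time-series plots use the economics dataset that ships with ggplot2, which is not one of the lesson's datasets and is assumed here only because it has a date column. The uncertainty plots use a small summary table (summ, a name invented for this example) holding the mean and standard deviation of mpg per cylinder group.

```r
library(ggplot2)

# One time variable, one continuous variable.
p_line <- ggplot(economics, aes(date, unemploy)) + geom_line()
p_area <- ggplot(economics, aes(date, unemploy)) + geom_area()
p_step <- ggplot(economics, aes(date, unemploy)) + geom_step()

# Summary statistics needed for the uncertainty geoms: mean and sd of mpg by cyl.
means <- tapply(mtcars$mpg, mtcars$cyl, mean)
sds   <- tapply(mtcars$mpg, mtcars$cyl, sd)
summ  <- data.frame(cyl  = factor(names(means)),
                    mean = as.numeric(means),
                    sd   = as.numeric(sds))

base <- ggplot(summ, aes(x = cyl, y = mean, ymin = mean - sd, ymax = mean + sd))

p_errorbar   <- base + geom_errorbar(width = 0.2) + geom_point()
p_pointrange <- base + geom_pointrange()
p_crossbar   <- base + geom_crossbar(width = 0.4)
p_linerange  <- base + geom_linerange()

print(p_line)
print(p_pointrange)
```

Using one standard deviation for the interval is only one option; standard errors or confidence limits would be mapped to ymin and ymax in the same way.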



Now, let’s practice



VARIABLES & PLOTS - GEOMS


TWO VARIABLES
geom_map() – for map data

THREE VARIABLES
geom_contour() – contours.
geom_tile() – tile the plane with rectangles.
geom_raster() – equal-sized tiles (fast version of geom_tile())
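
The three-variable geoms need a value z at each (x, y) position. The sketch below builds a small synthetic grid for that purpose; the grid and the bell-shaped surface are invented for illustration and are not part of the lesson's datasets. geom_map() is omitted because it needs separate map data.

```r
library(ggplot2)

# Synthetic surface: z is highest near the origin and falls off smoothly.
grid <- expand.grid(x = seq(-2, 2, length.out = 50),
                    y = seq(-2, 2, length.out = 50))
grid$z <- exp(-(grid$x^2 + grid$y^2))

# Third variable shown as fill colour.
p_tile   <- ggplot(grid, aes(x, y, fill = z)) + geom_tile()
p_raster <- ggplot(grid, aes(x, y, fill = z)) + geom_raster()

# Third variable shown as contour lines.
p_contour <- ggplot(grid, aes(x, y, z = z)) + geom_contour()

print(p_raster)
print(p_contour)
```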

THANKS!
Any questions?
You can find me at: [email protected]
