R Module 4

This document provides an outline for a presentation on visualizing data using ggplot2 in R. It introduces ggplot2 and its underlying grammar of graphics. It covers the basic components of plots in ggplot2 including data, layers, scales, facets, and themes. It also demonstrates how to create different types of plots, customize aesthetics, and practice implementing ggplot2 in R. The document concludes with an assignment asking attendees to create several plots using the mpg and diamond datasets.

Uploaded by

Damai Arum

Visualizing Data Using ggplot2: Introduction

Jesita Wida Ajani


Presentation Outline
• ggplot2
• Basic Plotting
• Grammar of Graphics
• Histograms and Box Plot
• Axes and Legends
• R Practice Session: Implementing what we have learned!
• Homework
Welcome to ggplot2
• ggplot2 is an R package for producing statistical, or data, graphics.
• Unlike most other graphics packages, ggplot2 has a deep underlying
grammar, based on the Grammar of Graphics
• All plots are mainly composed of:
1. Data
2. Layers
3. Scales
4. Facet
5. Theme
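
As a rough sketch of how these five components show up in code, the example below builds one plot from the mpg data (bundled with ggplot2) and adds one call per component. The particular scale label, facet variable, and theme are illustrative choices, not taken from the slides.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +      # 1. data (with its mapping)
  geom_point() +                                               # 2. layer
  scale_x_continuous(name = "Engine displacement (litres)") +  # 3. scale
  facet_wrap(~year) +                                          # 4. facet
  theme_minimal()                                              # 5. theme
p
```

Removing any of the last three lines still gives a valid plot, because ggplot2 falls back to default scales, no faceting, and the default theme.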
Basic Plotting
• We use the ggplot() function to create the base plot object
• Using the mpg dataset that ships with ggplot2, we will plot highway mileage (hwy)
against engine displacement in liters (displ)

Syntax (both lines draw the same plot):
ggplot(mpg) + geom_point(aes(x = displ, y = hwy))
ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

• Pay attention to the “+” sign: it adds a new layer to the plot. As we
learn more about ggplot2 we will make more sophisticated plots.
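
To make the role of “+” concrete, here is a small sketch that stores the base object and then adds layers one at a time. The object names (base, with_points, with_trend) and the choice of a linear trend line are assumptions made for illustration.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy))  # no layers yet: draws empty axes
with_points <- base + geom_point()            # "+" adds a point layer
with_trend <- with_points +
  geom_smooth(method = "lm", se = FALSE)      # "+" again adds a second layer

# Count the layers held by each object
length(base$layers)
length(with_points$layers)
length(with_trend$layers)
```

Because each “+” returns a new plot object, the base object stays unchanged and can be reused with different geoms.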
Grammar of Graphics
• Layer
    • Data
    • Mapping
    • Statistical transformation (stat)
    • Geometric object (geom)
    • Position adjustment (position)
• Scale
• Coordinate system (coord)
• Faceting (facet)
• Defaults
    • Data
    • Mapping
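
The components listed above can be written out explicitly with ggplot2's layer() function. The sketch below compares the usual short form with a long form in which every piece is named. The long form is shown only to expose the grammar and is not how plots are normally written.

```r
library(ggplot2)

# Short form used elsewhere in these slides
p_short <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Long form: every component of the layer is named explicitly
p_long <- ggplot() +
  layer(
    data = mpg,                         # data
    mapping = aes(x = displ, y = hwy),  # mapping
    stat = "identity",                  # statistical transformation
    geom = "point",                     # geometric object
    position = "identity"               # position adjustment
  ) +
  scale_x_continuous() +                # scales
  scale_y_continuous() +
  coord_cartesian()                     # coordinate system
```

Both objects should draw the same scatter plot. The short form simply relies on defaults for the stat, position, scales, and coordinate system.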
Aesthetic Attributes
• We can distinguish groups in the data by mapping a variable to an aesthetic such as color, shape, or size, for example:

1. ggplot(mpg) + geom_point(aes(x = displ, y = hwy, col = class))


2. ggplot(mpg) + geom_point(aes(x = displ, y = hwy, shape = drv))
3. ggplot(mpg) + geom_point(aes(x = displ, y = hwy, size = cyl))

• Each point gets a color, shape, or size that corresponds to its value of the
mapped variable (class, drv, or cyl). The legend allows us to read data values
back from those attributes
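
One point that often trips up beginners is the difference between mapping an aesthetic to a variable (inside aes()) and setting it to a fixed value (outside aes()). The sketch below contrasts the two. The object names and the choice of "blue" are illustrative assumptions.

```r
library(ggplot2)

# Mapped: colour varies with the class variable, and a legend is drawn
mapped <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, colour = class))

# Set: every point gets the same colour, so no legend is needed
fixed <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy), colour = "blue")

# Number of distinct colours actually used in each plot
length(unique(layer_data(mapped)$colour))
length(unique(layer_data(fixed)$colour))
```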
Plot
[Figure: output of examples 1, 2, and 3]
Facet Wrap
• Another technique for displaying an additional categorical variable on a
plot is faceting. It splits the data into subsets and displays the same
graph for each subset, for example:

ggplot(mpg) + geom_point(aes(x = displ, y = hwy)) + facet_wrap(~class)


Facet Wrap
• Faceting can be used to split the data up into subsets of the entire
dataset. This is a powerful tool when investigating whether patterns
are the same or different across conditions, and allows the subsets to
be visualized on the same plot (known as conditioned or trellis plots).
The faceting specification describes which variables should be used to
split up the data, and how they should be arranged.
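
The "which variables" and "how arranged" parts of a faceting specification can be sketched as follows. facet_wrap() and facet_grid() are both part of ggplot2. The variables and the ncol value chosen here are only examples.

```r
library(ggplot2)

base <- ggplot(mpg) + geom_point(aes(x = displ, y = hwy))

# One variable, with panels wrapped into a fixed number of columns
wrapped <- base + facet_wrap(~class, ncol = 4)

# Two variables, arranged as rows (drv) by columns (cyl)
gridded <- base + facet_grid(drv ~ cyl)

# Number of panels that contain data in the wrapped version
length(unique(layer_data(wrapped)$PANEL))
```

facet_wrap() suits a single variable with many levels, while facet_grid() is a natural fit when two variables should form the rows and columns.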
Box Plot
• A box plot summarises the centre and spread of a continuous variable
across the levels of a categorical variable
• ggplot2 offers three main ways to display this kind of data:
• Jittering, geom_jitter(), adds a little random noise to the data which can avoid
overplotting;
• Boxplots, geom_boxplot(), summarise the shape of the distribution with a
handful of summary statistics;
• Violin plots, geom_violin(), show a compact representation of the density of
the distribution.
1. ggplot(mpg) + geom_jitter(aes(drv, hwy))
2. ggplot(mpg) + geom_boxplot(aes(drv, hwy))
3. ggplot(mpg) + geom_violin(aes(drv, hwy))
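
To see the "handful of summary statistics" behind a boxplot, the sketch below pulls the values ggplot2 computed for each box using layer_data(), and compares the middle line with the group medians computed directly in base R. The object names are illustrative.

```r
library(ggplot2)

box <- ggplot(mpg) + geom_boxplot(aes(drv, hwy))

# Values computed for each box: whisker ends, hinges, and the middle line
stats <- layer_data(box)[, c("ymin", "lower", "middle", "upper", "ymax")]
stats

# The middle line of each box is the median of hwy within that drv group
medians <- tapply(mpg$hwy, mpg$drv, median)
medians
```

The lower and upper hinges are the first and third quartiles, and by default the whiskers reach the most extreme points within 1.5 times the interquartile range of the hinges.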
Plot
Box Plot and Position Adjustment
• Sometimes with dense data we need to adjust the position of
elements on the plot, otherwise data points might obscure one
another. Bar plots frequently stack or dodge the bars to avoid overlap:
library(dplyr)  # provides count() and %>%
count(x = mpg, class, cyl) %>%
  ggplot(mapping = aes(x = cyl, y = n, fill = class)) + geom_bar(stat = "identity") +
  ggtitle(label = "A stacked bar chart")
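
Stacking is only one of the available position adjustments. As a sketch, the same bars can be stacked, dodged, or rescaled to proportions through the position argument. This version lets geom_bar() count the rows itself, so dplyr is not needed. Wrapping cyl in factor() is a choice made here so the x axis is treated as discrete.

```r
library(ggplot2)

bars <- ggplot(mpg, aes(x = factor(cyl), fill = class))

stacked <- bars + geom_bar(position = "stack")  # default: segments on top of each other
dodged <- bars + geom_bar(position = "dodge")   # segments side by side
filled <- bars + geom_bar(position = "fill")    # stacked, rescaled so each bar has height 1
```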
Box Plot and Position Adjustment
Histograms
• Histograms show the distribution of a single numeric variable
ggplot(mpg) + geom_histogram(aes(hwy))

• We can control the width of the bins using the binwidth option of
geom_histogram(). Choosing it well matters, because the bin width determines
how much detail of the distribution the graph shows
ggplot(mpg) + geom_histogram(aes(hwy), binwidth = 2)
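
To illustrate the effect of binwidth, the sketch below builds the same histogram with a narrow and a wide bin and counts the resulting bins. The two widths (1 and 5) are arbitrary values picked for the comparison.

```r
library(ggplot2)

narrow <- ggplot(mpg) + geom_histogram(aes(hwy), binwidth = 1)
wide <- ggplot(mpg) + geom_histogram(aes(hwy), binwidth = 5)

# Number of bins produced by each setting
nrow(layer_data(narrow))
nrow(layer_data(wide))

# Every observation is counted exactly once, whatever the bin width
sum(layer_data(narrow)$count)
sum(layer_data(wide)$count)
```

A narrow bin shows more detail but also more noise. A wide bin smooths the shape and can hide features such as a second peak.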
Geometric Objects and Graph Title
• Each geom can display only certain aesthetics (visual attributes).
For example, a point geom has position, color, shape, and
size aesthetics.

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point() + ggtitle("A point geom with position and color aesthetics")
Geometric Objects and Graph Title
Color and Fill
• There is a significant difference between color and fill in ggplot2.
• Color sets the outline (edge) of an object, whereas fill sets the color of
its interior.
• Color suits line charts, while fill is better suited to boxplots, histograms,
and bar charts. For example, compare these two lines of code:
1. ggplot(mpg) + geom_violin(aes(x = drv, y = hwy), fill = "dark blue")
2. ggplot(mpg) + geom_violin(aes(x = drv, y = hwy), color = "dark blue")

• For a colour reference, please visit this link:
https://www.r-graph-gallery.com/ggplot2-color.html
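
Color and fill can also be combined on the same geom. The sketch below, using a boxplot and two colour names chosen only for illustration, sets the outline and the interior separately and then checks what ggplot2 stored for each.

```r
library(ggplot2)

both <- ggplot(mpg) +
  geom_boxplot(aes(x = drv, y = hwy),
               colour = "darkblue",  # outline, whiskers, and median line
               fill = "lightblue")   # interior of the box

# Outline and interior colours recorded for the boxes
unique(layer_data(both)$colour)
unique(layer_data(both)$fill)
```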
Plot
Axes and Legends
• We can use xlab() and ylab() to set the axis titles, and ggtitle() to
set the title of the graph.
• Legends can be shown or hidden with the show.legend option of each
geom. To set the legend position, use the legend.position option of
the theme() function. For example:
diaplot <- ggplot(diamonds, aes(x = carat, y = price, col = cut)) +
  geom_point(alpha = 0.2, show.legend = TRUE) +
  geom_smooth(se = FALSE) +
  xlab("The Weight of the Diamond") +
  ylab("The Price of the Diamond, in Dollars") +
  ggtitle("Carat and Price of the Diamonds over Cut") +
  theme(legend.position = "bottom")
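
As an alternative sketch, ggplot2's labs() function can set the axis titles, the plot title, and the legend title in a single call. The label wording and the object name diaplot_labs are illustrative assumptions. The legend is moved to the top here just to show another legend.position value.

```r
library(ggplot2)

diaplot_labs <- ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +
  geom_point(alpha = 0.2) +
  labs(
    x = "Carat",
    y = "Price (US dollars)",
    colour = "Cut quality",  # legend title, named after the mapped aesthetic
    title = "Carat and price by cut"
  ) +
  theme(legend.position = "top")
diaplot_labs
```

Other accepted values for legend.position include "left", "right", "bottom", and "none" (which hides the legend).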
Plot
R Practice Session
R Practice Session
• You will be supplied with a dataset called “Movie Rating”.
• Try to generate general statistics with:
head()
tail()
summary()
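
A minimal sketch of these three functions follows. Since the “Movie Rating” file itself is not included here, the example uses a tiny stand-in data frame. The column names follow the practice code in this session, but every value is made up.

```r
# Hypothetical stand-in for the "Movie Rating" data; all values are invented
movies <- data.frame(
  Genre = factor(c("Action", "Comedy", "Drama", "Comedy",
                   "Action", "Drama", "Horror")),
  CriticRating = c(87, 45, 72, 30, 66, 91, 52),
  AudienceRating = c(81, 52, 68, 44, 70, 85, 49),
  BudgetMillions = c(120, 35, 20, 18, 95, 12, 8)
)

head(movies, 3)  # first three rows
tail(movies, 3)  # last three rows
summary(movies)  # per-column summary: counts for factors, quartiles for numbers
```

With the real dataset, the same calls apply once the file has been read into a data frame named movies.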
R Practice Session
• Try to create a graph with aesthetics
ggplot(data=movies, aes(x=CriticRating, y=AudienceRating))
#Add geometry
ggplot(data=movies, aes(x=CriticRating, y=AudienceRating)) +
geom_point()
#Add colour
ggplot(data=movies, aes(x=CriticRating, y=AudienceRating,
colour=Genre)) + geom_point()

#Add size
ggplot(data=movies, aes(x=CriticRating, y=AudienceRating,
colour=Genre, size=BudgetMillions)) +
geom_point()
R Practice Session
# Plotting with Layers
# Base object only; geoms are added as layers below
p <- ggplot(data=movies, aes(x=CriticRating, y=AudienceRating,
                             colour=Genre, size=BudgetMillions))

#Point
p + geom_point()

#Lines
p + geom_line()

#Multiple Layers
p + geom_point() + geom_line()
p + geom_line() + geom_point()
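
The last two lines differ only in drawing order: layers are drawn in the order they are added, so the later layer sits on top. The sketch below checks this using the mpg data, so that it runs without the movies file. The object names are illustrative.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv))

lines_on_top <- p + geom_point() + geom_line()   # points first, lines over them
points_on_top <- p + geom_line() + geom_point()  # lines first, points over them

# Class of each layer's geom, listed in drawing order
order_a <- sapply(lines_on_top$layers, function(l) class(l$geom)[1])
order_b <- sapply(points_on_top$layers, function(l) class(l$geom)[1])
order_a
order_b
```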
Post-tutorial Assignment
Tutorial Assignment
You have been supplied with the mpg data.
Please create five plots as follows:
1. Create a basic plot using these criteria: x = cty and y = hwy
2. Create a basic plot using x = displ, y = cyl, and a color of blue
3. Create a violin plot using x = displ, y = cyl, and a fill color of red
4. Create a plot using x = cty, y = displ, with color mapped to
“manufacturer”, and give the graph the title “A point geom with
position and color aes”
Tutorial Assignment
5. Using the diamonds data:
• Please create a ggplot similar to the earlier diamonds example, but with x = carat,
y = depth, an alpha of 0.15, the x-axis title “The Carat of Diamond”, the y-axis
title “The Depth of Diamond”, and the plot title “Depth and Carat of
the Diamonds over Cut”. Place the legend at the top of the
graph.
You need to submit the Homework to obtain a training certificate!
Thank You and See you on the next tutorial!
