
Histograms with ggplot2 geom_histogram()

The document provides examples of creating histograms with ggplot2 and base R. It demonstrates how to:
1. Create a basic histogram with ggplot2 using geom_histogram().
2. Control the bin width with the binwidth argument of geom_histogram().
3. Plot several histograms on the same axis, or use small multiples with facets.
4. Add elements such as a boxplot, or color portions of the histogram.

Uploaded by

Luis Emilio
Copyright
© All Rights Reserved

Basic histogram with ggplot2

Basic histogram with geom_histogram

It is relatively straightforward to build a histogram with ggplot2 thanks to the geom_histogram() function. Only one numeric variable is needed as input. Note that this code triggers a message: no bin width is specified, so geom_histogram() falls back to its default of 30 bins. Choosing the bin width is covered in the next section.
# library
library(ggplot2)

# dataset:
data <- data.frame(value = rnorm(100))

# basic histogram
p <- ggplot(data, aes(x = value)) +
  geom_histogram()

# p  # uncomment to display the plot
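
The message can be avoided by stating the number of bins or the bin width explicitly. The following is a minimal sketch, assuming simulated data and an arbitrary choice of 20 bins. bins and binwidth are both arguments of geom_histogram(); the object names p_bins and bin_table are illustrative only.

```r
library(ggplot2)

set.seed(123)  # arbitrary seed, only for reproducibility
data <- data.frame(value = rnorm(100))

# Setting bins (or binwidth) explicitly avoids the default-bins message
p_bins <- ggplot(data, aes(x = value)) +
  geom_histogram(bins = 20)

# Inspect the bins computed for the layer without drawing the plot
bin_table <- layer_data(p_bins)
sum(bin_table$count)  # every observation falls into one of the bins
```

layer_data() returns one row per bin, which is a convenient way to check what a given bins or binwidth value actually produces.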

Control bin size with binwidth

A histogram takes a numeric variable as input and cuts it into several bins. Choosing the bin size is an important step, since it can have a big impact on the appearance of the histogram and thus on the message you're trying to convey. This concept is explained in depth in data-to-viz.
ggplot2 makes it easy to change the bin size thanks to the binwidth argument of the geom_histogram() function. See below the impact it can have on the output.
# Libraries
library(tidyverse)
library(hrbrthemes)

# Load dataset from github
data <- read.table(
  "https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv",
  header = TRUE
)

# plot
p <- data %>%
  filter(price < 300) %>%
  ggplot(aes(x = price)) +
  geom_histogram(binwidth = 3, fill = "#69b3a2", color = "#e9ecef", alpha = 0.9) +
  ggtitle("Bin size = 3") +
  theme_ipsum() +
  theme(
    plot.title = element_text(size = 15)
  )
# p  # uncomment to display the plot
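
To compare several bin widths side by side, the same plot can be built in a loop. This is only a sketch under stated assumptions: the prices data frame is simulated as a stand-in for the price column (so that the example runs offline), and the three widths are arbitrary.

```r
library(ggplot2)

set.seed(123)
# Hypothetical stand-in for the price column of the dataset above
prices <- data.frame(price = rexp(1000, rate = 1 / 60))

bin_widths <- c(3, 15, 30)

# One plot per bin width, each titled with the width it uses
plots <- lapply(bin_widths, function(bw) {
  ggplot(prices, aes(x = price)) +
    geom_histogram(binwidth = bw, fill = "#69b3a2", color = "#e9ecef", alpha = 0.9) +
    ggtitle(paste("Bin size =", bw))
})
names(plots) <- paste0("bw_", bin_widths)

# Number of bins computed for each width: wider bins mean fewer of them
n_bins <- sapply(plots, function(p) nrow(layer_data(p)))
```

Printing plots$bw_3, plots$bw_15 and plots$bw_30 in turn shows how the same data can look noisy with narrow bins and over-smoothed with wide ones.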

Several histograms on the same axis

If the number of groups or variables you have is relatively low, you can display all of them on the same axis, using a bit of transparency so that no data is hidden.
Note: with 2 groups, you can also build a mirror histogram.
# library
library(ggplot2)
library(dplyr)
library(hrbrthemes)

# Build dataset with different distributions
data <- data.frame(
  type = c(rep("variable 1", 1000), rep("variable 2", 1000)),
  value = c(rnorm(1000), rnorm(1000, mean = 4))
)

# Represent it
p <- data %>%
  ggplot(aes(x = value, fill = type)) +
  geom_histogram(color = "#e9ecef", alpha = 0.6, position = 'identity') +
  scale_fill_manual(values = c("#69b3a2", "#404080")) +
  theme_ipsum() +
  labs(fill = "")
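
The mirror histogram mentioned in the note can be sketched with two geom_histogram() layers, one of which has its counts negated. This is one possible approach rather than the original author's code; the data frame df, its columns and the bin width are assumptions made for illustration.

```r
library(ggplot2)

set.seed(42)
# Hypothetical data: two numeric variables stored in separate columns
df <- data.frame(var1 = rnorm(1000), var2 = rnorm(1000, mean = 2))

p_mirror <- ggplot(df) +
  # Upper half: counts of var1 drawn upwards
  geom_histogram(aes(x = var1, y = after_stat(count)),
                 binwidth = 0.25, fill = "#69b3a2") +
  # Lower half: counts of var2 negated so the bars point downwards
  geom_histogram(aes(x = var2, y = -after_stat(count)),
                 binwidth = 0.25, fill = "#404080") +
  labs(x = "value", y = "count")
```

Because the lower bars use negative heights, the y axis labels below zero should be read as absolute counts.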

Using small multiple

If the number of groups you need to represent is high, drawing them on the same axis often results in a cluttered and unreadable figure.
A good workaround is to use small multiples, where each group is drawn in its own panel of the plot window, making the figure easy to read. This is easy to build thanks to the facet_wrap() function of ggplot2.
Note: read more about the dataset used in this example here.
# Libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)
library(forcats)

# Load dataset from github
data <- read.table(
  "https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv",
  header = TRUE, sep = ","
)

# Reshape to long format: one row per (text, value) pair
data <- data %>%
  gather(key = "text", value = "value") %>%
  mutate(text = gsub("\\.", " ", text)) %>%
  mutate(value = round(as.numeric(value), 0))

# plot
p <- data %>%
  mutate(text = fct_reorder(text, value)) %>%
  ggplot(aes(x = value, color = text, fill = text)) +
  geom_histogram(alpha = 0.6, binwidth = 5) +
  scale_fill_viridis(discrete = TRUE) +
  scale_color_viridis(discrete = TRUE) +
  theme_ipsum() +
  theme(
    legend.position = "none",
    panel.spacing = unit(0.1, "lines"),
    strip.text.x = element_text(size = 8)
  ) +
  xlab("") +
  ylab("Assigned Probability (%)") +
  facet_wrap(~text)
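
gather() still works, but tidyr marks it as superseded in favour of pivot_longer(). The sketch below shows the same reshaping step with pivot_longer() on a small hypothetical table (the column names and values are made up for illustration), followed by the same facet_wrap() idea.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide table: one column per phrase, one row per respondent
wide <- data.frame(
  Almost.Certainly = c(95, 98, 90, 99),
  About.Even = c(50, 50, 45, 55),
  Highly.Unlikely = c(5, 10, 2, 8)
)

# Long format: one row per (text, value) pair, as facet_wrap() expects
long <- pivot_longer(wide, cols = everything(),
                     names_to = "text", values_to = "value")
long$text <- gsub("\\.", " ", long$text)

# One small histogram per phrase
p_facets <- ggplot(long, aes(x = value, fill = text)) +
  geom_histogram(binwidth = 5, alpha = 0.6) +
  facet_wrap(~text) +
  theme(legend.position = "none")
```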

Two Histograms with melt colors


Histograms are commonly used in data analysis to observe the distribution of variables. A common task in data visualization is to compare the distributions of 2 variables simultaneously.
Here is a tip to plot 2 histograms together (using the add argument of hist()) with transparency (using the rgb() function), so that information is kept where the shapes overlap.
# Create data
set.seed(1)
Ixos <- rnorm(4000, 120, 30)
Primadur <- rnorm(4000, 200, 30)

# First distribution
hist(Ixos, breaks=30, xlim=c(0,300), col=rgb(1,0,0,0.5), xlab="height",
     ylab="nbr of plants", main="distribution of height of 2 durum wheat varieties")

# Second with add=TRUE to plot on top
hist(Primadur, breaks=30, xlim=c(0,300), col=rgb(0,0,1,0.5), add=TRUE)

# Add legend
legend("topright", legend=c("Ixos","Primadur"), col=c(rgb(1,0,0,0.5),
       rgb(0,0,1,0.5)), pt.cex=2, pch=15)
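
With breaks=30, hist() treats the number as a suggestion and picks the break points separately for each variable, so the two sets of bars are not guaranteed to line up. One way to force aligned bins is to compute a shared vector of breaks and pass it to both calls. The bin width of 10 and the names common_breaks, h1 and h2 are assumptions for this sketch.

```r
set.seed(1)
Ixos <- rnorm(4000, 120, 30)
Primadur <- rnorm(4000, 200, 30)

# Shared break points covering both variables, in steps of 10
rng <- range(c(Ixos, Primadur))
common_breaks <- seq(floor(rng[1] / 10) * 10, ceiling(rng[2] / 10) * 10, by = 10)

# Compute both histograms on the same breaks without drawing them
h1 <- hist(Ixos, breaks = common_breaks, plot = FALSE)
h2 <- hist(Primadur, breaks = common_breaks, plot = FALSE)

# Draw the first, then overlay the second with add = TRUE
plot(h1, col = rgb(1, 0, 0, 0.5), xlim = range(common_breaks),
     ylim = c(0, max(h1$counts, h2$counts)),
     xlab = "height", ylab = "nbr of plants", main = "")
plot(h2, col = rgb(0, 0, 1, 0.5), add = TRUE)
```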

Note: this is what the figure looks like when the groups are drawn side by side:
par(
mfrow=c(1,2),
mar=c(4,4,1,0)
)
hist(Ixos, breaks=30 , xlim=c(0,300) , col=rgb(1,0,0,0.5) , xlab="height" , ylab="nbr of plants" , main="" )
hist(Primadur, breaks=30 , xlim=c(0,300) , col=rgb(0,0,1,0.5) , xlab="height" , ylab="" , main="")

Boxplot on top of histogram


This example illustrates how to split the plotting window in base R with the layout() function. Compared with the par(mfrow=...) approach, layout() gives finer control over the size of each panel.
Here a boxplot is added on top of the histogram, which makes the summary statistics of the distribution quick to read.

# Create data
my_variable <- c(rnorm(1000, 0, 2), rnorm(1000, 9, 2))

# Layout to split the screen
layout(mat = matrix(c(1,2), 2, 1, byrow=TRUE), heights = c(1,8))

# Draw the boxplot and the histogram
par(mar=c(0, 3.1, 1.1, 2.1))
boxplot(my_variable, horizontal=TRUE, ylim=c(-10,20), xaxt="n", col=rgb(0.8,0.8,0,0.5), frame=FALSE)
par(mar=c(4, 3.1, 1.1, 2.1))
hist(my_variable, breaks=40, col=rgb(0.2,0.8,0.5,0.5), border=FALSE, main="", xlab="value of the variable", xlim=c(-10,20))
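
Before drawing into a layout, it can help to preview how the window is split. The short sketch below uses the same 2-row matrix and relative heights as the example above; layout.show() outlines each panel with its number, and layout(1) restores a single panel afterwards. The name n_panels is illustrative.

```r
# Same split as above: a thin top panel (1 part) over a tall bottom panel (8 parts)
n_panels <- layout(mat = matrix(c(1, 2), nrow = 2, ncol = 1), heights = c(1, 8))

# Outline each panel with its number to preview the split
layout.show(n_panels)

# Restore a single-panel layout so later plots are not affected
layout(1)
```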

Histogram with colored tail


This example demonstrates how to color parts of a histogram. First, call hist() with plot=FALSE so that the result is computed but not drawn; this stores the position of each bin in an object (my_hist here). The bin borders are then available in the $breaks element of that object, which makes it possible to build a color vector with ifelse() statements. Finally, pass this color vector to plot().
# Create data
my_variable <- rnorm(2000, 0, 10)

# Calculate histogram, but do not draw it
my_hist <- hist(my_variable, breaks=40, plot=FALSE)

# Color vector
my_color <- ifelse(my_hist$breaks < -10, rgb(0.2,0.8,0.5,0.5),
                   ifelse(my_hist$breaks >= 10, "purple", rgb(0.2,0.2,0.2,0.2)))

# Final plot
plot(my_hist, col=my_color, border=FALSE, main="", xlab="value of the variable", xlim=c(-40,40))
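
Note that $breaks contains one more value than there are bars, so a color vector built from it has one unused entry at the end. A variant, sketched here as an alternative rather than as the original method, builds the colors from $mids (the bin midpoints), which has exactly one value per bar. The seed is arbitrary.

```r
set.seed(1)
my_variable <- rnorm(2000, 0, 10)

# Compute the histogram without drawing it
my_hist <- hist(my_variable, breaks = 40, plot = FALSE)

# One color per bar, chosen from the midpoint of each bin
my_color <- ifelse(my_hist$mids < -10, rgb(0.2, 0.8, 0.5, 0.5),
                   ifelse(my_hist$mids >= 10, "purple", rgb(0.2, 0.2, 0.2, 0.2)))

plot(my_hist, col = my_color, border = FALSE, main = "",
     xlab = "value of the variable", xlim = c(-40, 40))
```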

Mirrored histogram in base R


The mirrored histogram makes it possible to compare the distributions of 2 variables.
First split the screen with the par(mfrow=...) command. The top histogram needs xaxt="n" to suppress its X axis. For the second one, reverse the values of the ylim argument to flip it upside down. Use the mar parameter of par() to adjust the position of the 2 charts.
# Create data
x1 <- rnorm(100)
x2 <- rnorm(100) + rep(2, 100)
par(mfrow=c(2,1))

# Make the plot
par(mar=c(0,5,3,3))
hist(x1, main="", xlim=c(-2,5), ylab="Frequency for x1", xlab="", ylim=c(0,25), xaxt="n", las=1,
     col="slateblue1", breaks=10)
par(mar=c(5,5,0,3))
hist(x2, main="", xlim=c(-2,5), ylab="Frequency for x2", xlab="Value of my variable", ylim=c(25,0), las=1,
     col="tomato3", breaks=10)

Histogram without border

To remove the borders of the histogram bars, add border=FALSE to the hist() call.
# Create data
my_variable <- c(rnorm(1000, 0, 2), rnorm(1000, 9, 2))

# Draw the histogram with border=FALSE
hist(my_variable, breaks=40, col=rgb(0.2,0.8,0.5,0.5), border=FALSE, main="")
