Histogram with ggplot2 and geom_histogram()
# Library
library(ggplot2)
# Dataset: 100 draws from a standard normal distribution
data <- data.frame(value = rnorm(100))
# Basic histogram
p <- ggplot(data, aes(x = value)) +
  geom_histogram()
# p
A histogram takes a numeric variable as input and cuts it into several bins. Choosing the bin size is an
important step, since it can strongly affect the appearance of the histogram and thus the message you are
trying to convey. This concept is explained in depth in data-to-viz.
ggplot2 makes it easy to change the bin size thanks to the binwidth argument of the geom_histogram() function.
See below the impact it can have on the output.
# Libraries
library(tidyverse)
library(hrbrthemes)
# Plot (`data` is expected to contain a numeric `price` column; the loading step is not shown here)
p <- data %>%
  filter(price < 300) %>%
  ggplot(aes(x = price)) +
  geom_histogram(binwidth = 3, fill = "#69b3a2", color = "#e9ecef", alpha = 0.9) +
  ggtitle("Bin size = 3") +
  theme_ipsum() +
  theme(
    plot.title = element_text(size = 15)
  )
# p
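To make the effect of binwidth concrete, here is a minimal sketch that draws the same variable with two bin widths and counts the bars in each version. The `prices` data frame is simulated and the helper name `price_hist` is hypothetical; neither comes from the original example, and the default theme is used instead of theme_ipsum() so that only ggplot2 is needed.

```r
library(ggplot2)

# Simulated stand-in for the price data (an assumption, not the original dataset)
set.seed(42)
prices <- data.frame(price = rexp(1000, rate = 1 / 80))
prices <- subset(prices, price < 300)

# Hypothetical helper: draws the histogram for a given bin width
price_hist <- function(width) {
  ggplot(prices, aes(x = price)) +
    geom_histogram(binwidth = width, fill = "#69b3a2", color = "#e9ecef", alpha = 0.9) +
    ggtitle(paste("Bin size =", width))
}

p_narrow <- price_hist(3)   # many thin bars, more noise
p_wide   <- price_hist(15)  # fewer, wider bars, smoother shape

# Number of bins actually computed for each version
n_narrow <- nrow(layer_data(p_narrow))
n_wide   <- nrow(layer_data(p_wide))
```

Printing p_narrow and p_wide one after the other shows how the same data can look noisy or smooth depending on this single argument.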
Several histograms on the same axis
If the number of groups or variables is relatively low, you can display all of them on the same axis, using a
bit of transparency to make sure you do not hide any data.
Note: with 2 groups, you can also build a mirror histogram.
# library
library(ggplot2)
library(dplyr)
library(hrbrthemes)
# Represent it (`data` is expected to contain a numeric `value` column and a two-level `type` column)
p <- data %>%
  ggplot(aes(x = value, fill = type)) +
  geom_histogram(color = "#e9ecef", alpha = 0.6, position = "identity") +
  scale_fill_manual(values = c("#69b3a2", "#404080")) +
  theme_ipsum() +
  labs(fill = "")
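Since the data set behind this plot is not included here, the following self-contained sketch uses simulated data (an assumption) to show both the overlapping version and one possible way to build the mirror histogram mentioned in the note. The names `two_groups`, `p_overlap` and `p_mirror` are made up for illustration.

```r
library(ggplot2)

# Simulated two-group data (an assumption standing in for the original `data`)
set.seed(1)
two_groups <- data.frame(
  type  = rep(c("variable 1", "variable 2"), each = 1000),
  value = c(rnorm(1000), rnorm(1000, mean = 4))
)

# Overlapping histograms: position = "identity" keeps the bars from being stacked
p_overlap <- ggplot(two_groups, aes(x = value, fill = type)) +
  geom_histogram(bins = 30, color = "#e9ecef", alpha = 0.6, position = "identity") +
  scale_fill_manual(values = c("#69b3a2", "#404080")) +
  labs(fill = "")

# Mirror variant: the first group is drawn upwards, the second one downwards
p_mirror <- ggplot(mapping = aes(x = value)) +
  geom_histogram(data = subset(two_groups, type == "variable 1"),
                 aes(y = after_stat(count)), bins = 30, fill = "#69b3a2") +
  geom_histogram(data = subset(two_groups, type == "variable 2"),
                 aes(y = -after_stat(count)), bins = 30, fill = "#404080")
```

With position = "identity" every bar starts at zero, so transparency is what keeps the group in the background visible. The mirror version avoids the overlap entirely, at the cost of only working for two groups.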
If the number of groups you need to represent is high, drawing them on the same axis often results in
a cluttered and unreadable figure.
A good workaround is to use small multiples, where each group is represented in a fraction of the plot window,
making the figure easy to read. This is easy to build thanks to the facet_wrap() function of ggplot2.
Note: read more about the dataset used in this example here.
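As a rough illustration of the small multiples approach, the sketch below applies facet_wrap() to simulated data with six groups. The data frame `many_groups` is an assumption made for this example and is not the dataset referred to in the note; only ggplot2 is required.

```r
library(ggplot2)

# Simulated data with six groups (an assumption, not the original dataset)
set.seed(123)
many_groups <- data.frame(
  group = rep(LETTERS[1:6], each = 300),
  value = rnorm(1800, mean = rep(1:6, each = 300))
)

# One panel per group: facet_wrap() splits the plot window into small multiples
p_facets <- ggplot(many_groups, aes(x = value, fill = group)) +
  geom_histogram(bins = 25, color = "#e9ecef", alpha = 0.8) +
  facet_wrap(~group) +
  theme(legend.position = "none")  # panel titles already identify the groups
```

All panels share the same axes by default, which keeps the distributions directly comparable from one panel to the next.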
# Libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)
library(forcats)
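# First distribution (`Ixos` and `Primadur` are expected to be numeric vectors of plant heights; their creation is not shown here)
hist(Ixos, breaks=30, xlim=c(0,300), col=rgb(1,0,0,0.5), xlab="height",
     ylab="nbr of plants", main="distribution of height of 2 durum wheat varieties" )
# Second distribution, drawn on top of the first one with add=TRUE
hist(Primadur, breaks=30, xlim=c(0,300), col=rgb(0,0,1,0.5), add=TRUE)
# Add legend
legend("topright", legend=c("Ixos","Primadur"), col=c(rgb(1,0,0,0.5),
       rgb(0,0,1,0.5)), pt.cex=2, pch=15 )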
Note: this is what the figure looks like when the groups are drawn side by side:
par(
mfrow=c(1,2),
mar=c(4,4,1,0)
)
hist(Ixos, breaks=30 , xlim=c(0,300) , col=rgb(1,0,0,0.5) , xlab="height" , ylab="nbr of plants" , main="" )
hist(Primadur, breaks=30 , xlim=c(0,300) , col=rgb(0,0,1,0.5) , xlab="height" , ylab="" , main="")
# Create data
my_variable=c(rnorm(1000 , 0 , 2) , rnorm(1000 , 9 , 2))
# Layout to split the screen into a short top panel and a tall bottom panel
layout(mat = matrix(c(1,2),2,1, byrow=TRUE), heights = c(1,8))
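# Histogram object computed without plotting (assumed step: `my_hist` is used below but not defined in this excerpt)
my_hist <- hist(my_variable, plot=FALSE)
# Color vector: one color per bin, depending on where the bin starts
my_color= ifelse(my_hist$breaks < -10, rgb(0.2,0.8,0.5,0.5) , ifelse (my_hist$breaks >=10, "purple",
                 rgb(0.2,0.2,0.2,0.2) ))
# Final plot
plot(my_hist, col=my_color , border=F , main="" , xlab="value of the variable", xlim=c(-40,40) )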
To remove the border of the histogram bars, simply add border=F to the call to the hist() function.
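A minimal sketch of the border option, using simulated values (an assumption made for illustration, not the data of the original example). The variable name `plant_height` is hypothetical.

```r
# Simulated values, for illustration only
set.seed(7)
plant_height <- rnorm(2000, mean = 150, sd = 30)

# border = FALSE (the long form of border=F) removes the outline of the bars;
# hist() also returns the histogram object invisibly, stored here in `h`
h <- hist(plant_height, breaks = 30, col = rgb(0.2, 0.8, 0.5, 0.5),
          border = FALSE, main = "", xlab = "value of the variable")
```

Without borders, adjacent bars merge into a single filled shape, which tends to look cleaner when the number of bins is high.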
# Create data
my_variable=c(rnorm(1000 , 0 , 2) , rnorm(1000 , 9 , 2))