Simple Rmarkdown
28 August 2024
Contents
Introduction to Rmarkdown
Do silly math (“inline chunks”)
Better maths (code chunks)
Discussion and conclusions
More things to do
Introduction to Rmarkdown
Rmarkdown really is a form of literate programming using markdown and R.
The gist is to write a “normal” markdown document (with a special header) that incorporates R code and that
is compiled to produce the final output document. If using RStudio, the compilation is simply a matter of
hitting the Knit button (or the corresponding keyboard shortcut, Command + Shift + K on macOS)¹. The output
is usually an HTML document that can be styled using various themes (see below), but it can also be a PDF or even a
Word document! Various customisations can be done via the document header², but the main ones are:
• title, author and date: the document’s metadata (within double quotes " ");
• output: controls various aspects of the output, such as:
– its type: here, it is HTML (html_document), and for this, we can specify (among others):
∗ how the R code (if shown) is to be highlighted (here, textmate)
∗ what theme to use for the whole document (I like readable, but there are many, including
flatly)
∗ whether to show the table of contents (toc: yes)
• ignore the rest for now...
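Put together, a header using the options listed above might look like the sketch below. It is written in R so that it can be run as-is: it saves a minimal .Rmd file to a temporary location and prints it. The author value and the one line of body text are placeholders, and the exact layout of this document's own header is an assumption, since only its options are described here.

```r
# Header lines for a minimal Rmarkdown document (YAML between two "---" lines).
# The author and the body text are placeholders, not taken from the original.
header <- c(
  "---",
  'title: "Simple Rmarkdown"',
  'author: "Your Name"',
  'date: "28 August 2024"',
  "output:",
  "  html_document:",
  "    highlight: textmate",
  "    theme: readable",
  "    toc: yes",
  "---"
)

# Save the header plus one line of markdown to a temporary .Rmd file.
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(header, "", "Some *markdown* text."), rmd_file)

# Show what was written. Opening this file in RStudio and hitting Knit
# should then produce an HTML document (that step is not run here).
cat(readLines(rmd_file), sep = "\n")
```

Note that the indentation under output: matters, because the header is YAML and nesting is expressed through leading spaces.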
The results of running the code are used to replace the code chunk in the text, here resulting in:
[...] lifting for us) is 4. The magic [...]
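To make the replacement step concrete, here is a rough sketch in plain R of what happens to an inline chunk. An inline chunk is written as a backtick, the letter r, a space, an R expression and a closing backtick. The expression 2 + 2 is an assumption (the original inline code is elided above; only its result, 4, is visible), and the code below only imitates the idea. It is not how knitr is actually implemented.

```r
# A sentence containing one inline chunk. The expression 2 + 2 is assumed.
sentence <- "Two plus two is `r 2 + 2`."

# Locate the inline chunks in the text.
matches <- gregexpr("`r [^`]+`", sentence)
chunks <- regmatches(sentence, matches)[[1]]

# Evaluate each expression and convert its value to text.
values <- vapply(chunks, function(chunk) {
  code <- sub("`$", "", sub("^`r ", "", chunk))  # strip the delimiters
  as.character(eval(parse(text = code)))
}, character(1), USE.NAMES = FALSE)

# Replace each inline chunk with its value.
knitted <- sentence
regmatches(knitted, matches) <- list(values)
print(knitted)
```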
¹ This automatically launches a very complex process of “knitting” the R code with the markdown text using the knitr package
Better maths (code chunks)
We can also write larger chunks of code, which appear on separate lines from the markdown text and are enclosed
by two sets of three backticks (```), the first being immediately followed by “{r”, then a list of possible options, then
closed by “}”. The R code is included within and, as above, its results are inserted in the resulting document
at the place where the code is, but the computations done and their results can now be much more complex
and may include, besides text or numbers, tables and even plots.
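As a sketch of the syntax just described, the R code below prints the source of a small document body containing one chunk, and then evaluates the chunk's code the way knitting would. The chunk label ages and the option echo=TRUE are illustrative choices, not taken from this document.

```r
# Three backticks, built with strrep() so they are easy to see and count.
fence <- strrep("`", 3)

# Lines of a minimal Rmarkdown body with one code chunk.
# The label "ages" and the option echo=TRUE are illustrative assumptions.
body <- c(
  "Some markdown text before the chunk.",
  "",
  paste0(fence, "{r ages, echo=TRUE}"),
  "mean_age <- 22.5",
  "mean_age * 2",
  fence,
  "",
  "Some markdown text after the chunk."
)
cat(body, sep = "\n")

# The R code is what sits between the opening and the closing fence.
chunk_code <- body[4:5]

# Knitting runs that code and inserts its result where the chunk is.
result <- eval(parse(text = chunk_code))
print(result)
```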
Complex text
# This is a comment and is used to help people (including a future YOU) understand what's going on.
# We define one VARIABLE named mean_age as having value 22.5:
mean_age <- 22.5; # please note the <- symbol which means that what's on the right is stored in the vari
# We define a second variable named sd_age as having value 5.3:
sd_age <- 5.3; # the final ; is not obligatory but a very good idea (IMHO)
# For reproducibility, when using random numbers, it is a very good idea to use the same random seed to
set.seed(22092019); # whichever numeric seed you like (here, the date when I wrote this script), but ma
# We generate 10 extractions from a normal distribution with mean mean_age and standard deviation sd_
random_ages <- rnorm(10, mean=mean_age, sd=sd_age); # this tells R to "call" the function "rnorm" to gen
?rnorm # this is the standard way to get help in R (you have to run it from the console)
## The 10 generated ages are: 22.41466 19.57289 21.54989 28.88311 28.5857 14.77194 18.90499 25.97774 18
As you can see above, the results are accurate (10 random numbers normally distributed around 22.5 and
with standard deviation 5.3) but, oh boy, they’re ugly :(
Can we do better?
# Note that the variables defined and computed above, mean_age, sd_age and random_ages, are still with u
# Prettier display using text pasting and specifying in the R code chunk that the results are results='a
cat("The 10 generated ages are: ", # note that arguments to a function can appear on separate lines (and
    paste0(# paste0 "glues" together without any separator its arguments: try paste0("I","am"," using ",
           round(random_ages, digits=2), # round keeps only the given number of digits (here, 2) from it
           collapse=", "), # the collapse argument is used when pasting multiple values and says how to
    "\n");
The 10 generated ages are: 22.41, 19.57, 21.55, 28.88, 28.59, 14.77, 18.9, 25.98, 18.29, 20.38
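Note that round() returns numbers, so trailing zeros are dropped when they are pasted, which is why 18.9 appears above rather than 18.90. If a fixed number of decimals is wanted, one option is sprintf(). A minimal sketch, using three of the values shown above typed in by hand rather than the full random_ages vector:

```r
# Three of the rounded ages shown above, typed in by hand for illustration.
ages <- c(22.41, 18.9, 20.38)

# round() returns numbers, so pasting them drops trailing zeros.
rounded <- paste0(round(ages, digits = 2), collapse = ", ")
print(rounded)

# sprintf() formats each number as text with exactly two decimals.
fixed <- paste0(sprintf("%.2f", ages), collapse = ", ")
print(fixed)
```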
Plots
How about plotting the normal distribution of ages and the actually generated ages?
# We'll use here what is called the "base" plotting capabilities of R because they are simpler and need
# Let's show the actual numbers in order (not very informative really) but very easy to do:
plot( # plot is the most basic plotting function and takes at least two parameters: the horizontal and t
     1:length(random_ages), # the horizontal coordinates are here just the order of the generated ages;
     random_ages, # the vertical coordinates are the actually generated values
     pch=21, col="blue", # show them as hollow circles (symbol code 21) using blue color
     main="Plot of ages", xlab="Sequential order", ylab="Age"); # title and labels for the axes
abline(h=mean_age, # this draws a single horizontal line at the mean value mean_age
       col="red", lty="solid"); # shown as a solid red line
abline(h=c(mean_age - sd_age, mean_age + sd_age), # as above, but this draws two horizontal lines, one o
       # the c() construction is extremely useful in R as it builds vectors from components (here, we bu
       col="green", lty="dotted"); # show these two lines as dotted green ones
[Figure: scatter plot titled “Plot of ages”; x-axis “Sequential order”, y-axis “Age”.]
Figure 1: Plot of the actually generated ages (not the best plot, but still).
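A side note on the 1:length(random_ages) idiom used for the horizontal coordinates: it works fine here, but it misbehaves when the vector is empty, because 1:0 counts down from 1 to 0. The base R function seq_along() is a common, safer alternative. A small sketch with made-up vectors:

```r
ages <- c(22.41, 19.57, 21.55)  # a few values for illustration
empty <- numeric(0)             # a vector with no elements

# For a non-empty vector the two idioms give the same indices.
idx_colon <- 1:length(ages)
idx_along <- seq_along(ages)
print(idx_colon)
print(idx_along)

# For an empty vector they differ: 1:0 counts down from 1 to 0,
# while seq_along() correctly returns no indices at all.
empty_colon <- 1:length(empty)
empty_along <- seq_along(empty)
print(empty_colon)
print(empty_along)
```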
Generate more ages
However, there are too few ages for these to be meaningful, so let’s generate 10,000 of
them:
# We generate 10000 ages: note that the previous value of random_ages is LOST:
random_ages <- rnorm(10000, mean=mean_age, sd=sd_age);
# Bad idea to display all these, but let's see their summaries:
summary(random_ages);
## 5.283147
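To see why more values help, one can compare the sample mean and standard deviation for 10 and for 10,000 generated ages against the values we asked for (22.5 and 5.3). The sketch below is self-contained, so it re-creates the variables instead of reusing the ones above. The names small_sample, large_sample and comparison are made up for this illustration.

```r
mean_age <- 22.5
sd_age <- 5.3
set.seed(22092019)  # a fixed seed, so the numbers are the same on every run

# Draw a small and a large sample from the same normal distribution.
small_sample <- rnorm(10, mean = mean_age, sd = sd_age)
large_sample <- rnorm(10000, mean = mean_age, sd = sd_age)

# Put the sample sizes, means and standard deviations side by side.
comparison <- data.frame(
  n    = c(length(small_sample), length(large_sample)),
  mean = c(mean(small_sample), mean(large_sample)),
  sd   = c(sd(small_sample), sd(large_sample))
)
print(round(comparison, 2))
```

With 10,000 values, the sample mean and standard deviation should land much closer to 22.5 and 5.3 than they typically do with only 10.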
[Figure: histogram titled “Histogram of random_ages”; x-axis “random_ages”, y-axis “Frequency”.]
plot(1:length(random_ages), random_ages,
pch=21, col="blue", main="Plot of ages", xlab="Sequential order", ylab="Age");
abline(h=mean_age, col="red", lty="solid");
abline(h=c(mean_age - sd_age, mean_age + sd_age), col="green", lty="dotted");
[Figure: scatter plot titled “Plot of ages”; x-axis “Sequential order”, y-axis “Age”.]
Figure 3: Plot of the actually generated ages (not the best plot, but still).
[Figure: histogram titled “Histogram of random_ages”; x-axis “random_ages”, y-axis “Frequency”.]
The ideal normal distribution (from Wikipedia):
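The ideal curve can also be drawn directly in R and laid over the histogram of the generated ages. This is not the figure from Wikipedia, just a minimal sketch using base graphics. It is self-contained, so it regenerates the 10,000 ages, and the number of bins (breaks = 40) and the plot title are arbitrary choices.

```r
mean_age <- 22.5
sd_age <- 5.3
set.seed(22092019)
random_ages <- rnorm(10000, mean = mean_age, sd = sd_age)

# Histogram on the density scale (freq = FALSE), so that it is comparable
# with the theoretical curve; hist() also returns the bin information.
h <- hist(random_ages, freq = FALSE, breaks = 40,
          main = "Generated ages vs. the ideal normal distribution",
          xlab = "Age")

# Overlay the ideal normal density with the same mean and standard deviation.
curve(dnorm(x, mean = mean_age, sd = sd_age),
      add = TRUE, col = "red", lwd = 2)
```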
More things to do
• change the style: go to the “gear” icon → “Output options...” and select something else in
“Apply theme:”, then re-knit it (I like readable)
• produce something other than HTML: you can knit it to PDF or even to a Word document by going to the
Knit menu and selecting Knit to PDF or Knit to Word; for knitting to various formats, please see,
for example, https://rmarkdown.rstudio.com/lesson-9.html and https://rmarkdown.rstudio.com/articles_docx.html
If you really want to master Rmarkdown, please read the free e-book R Markdown: The Definitive Guide
(obviously, written in RMarkdown) – it’s fun to read and very useful!