R Cheat Sheet
Hurst
Contents
1 Data I/O and manipulation
1.1 Reading in data
1.2 (Very) basic manipulation of a data frame
1.3 Creating factors
2 Data summary
2.1 Summary statistics
2.1.1 Summarizing categorical variables
2.1.2 Descriptive statistics for a continuous variable
2.2 Basic graphics
4 Modelling
4.1 Linear regression and the general linear model
4.2 Logistic regression and other GLMs
4.3 Longitudinal and other correlated outcomes
4.3.1 Continuous longitudinal or otherwise correlated outcomes
4.3.2 Categorical longitudinal or otherwise correlated outcomes
Preamble
This document provides generic syntax for many of the more common R functions and analyses you are likely to use. Any identifier (name) prefixed with "my" is generic and is not to be taken literally (i.e. your code shouldn't have these names in it); instead, adapt the code for your own purposes. Specifically:
• my.y, my.x1 and my.x2 are three continuous variables, with my.y assumed to be the outcome (endpoint) variable and my.x1 and my.x2 assumed to be predictors.
• my.a and my.b are two categorical variables that are initially just number-coded (i.e. R doesn't yet know they are categorical).
• my.a.fac and my.b.fac are the factors corresponding to my.a and my.b (see code below).
• my.c.fac.within is a within-subject effect (e.g. for longitudinal data).
• mydata.df is the data frame that holds these variables.
R cheat sheet: C.Hurst
READING IN DATA
Also, see the help for the foreign library to learn how to read in files from other statistics packages such as Stata or SPSS.
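Here is a minimal sketch of reading a comma-separated file, assuming the first row holds the variable names. To keep the example self-contained, it first writes a small hypothetical file to a temporary location. With real data you would pass your own file path to read.csv() instead.

```r
# Write a small hypothetical CSV file to a temporary location
my.file <- tempfile(fileext = ".csv")
writeLines(c("my.y,my.x1,my.a",
             "10.2,1.5,1",
             "11.8,2.1,2",
             "9.7,0.8,1"), my.file)

# header = TRUE: the first row holds the variable names
mydata.df <- read.csv(my.file, header = TRUE)

str(mydata.df)    # variable names and types
head(mydata.df)   # first few rows

# A tab-delimited text file would be read with:
# read.table(my.file, header = TRUE, sep = "\t")
```

From the foreign library, read.dta() reads Stata files and read.spss(..., to.data.frame = TRUE) reads SPSS files into a data frame.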
BASIC SUBSETTING
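This sketch shows the most common ways to pick out columns and rows, using a small hypothetical mydata.df. Square brackets take [rows, columns]. Leaving one side empty keeps everything on that side.

```r
# Hypothetical example data frame
mydata.df <- data.frame(my.y  = c(10.2, 11.8, 9.7, 12.5, 8.9),
                        my.x1 = c(1.5, 2.1, 0.8, 2.6, 0.4),
                        my.a  = c(1, 2, 1, 2, 1))

mydata.df$my.y                     # one variable (column) as a vector
mydata.df[2, ]                     # second row (observation)
mydata.df[, c("my.y", "my.x1")]    # selected columns, by name

# Rows meeting a condition (logical indexing)
high.x1.df <- mydata.df[mydata.df$my.x1 > 1, ]

# The same idea with subset(), which can also select columns
group1.df <- subset(mydata.df, my.a == 1, select = c(my.y, my.x1))
```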
CREATING FACTORS
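This sketch turns the number-coded my.a and my.b into the factors my.a.fac and my.b.fac. The codes (1, 2, 3) and the labels (Control/Treatment, Low/Medium/High) are hypothetical, so substitute your own coding scheme.

```r
# Hypothetical number-coded categorical variables
mydata.df <- data.frame(my.a = c(1, 2, 1, 2, 1, 2),
                        my.b = c(1, 1, 2, 3, 3, 2))

# Tell R these are categorical; each code in 'levels' gets the matching label
mydata.df$my.a.fac <- factor(mydata.df$my.a, levels = c(1, 2),
                             labels = c("Control", "Treatment"))
mydata.df$my.b.fac <- factor(mydata.df$my.b, levels = c(1, 2, 3),
                             labels = c("Low", "Medium", "High"))

levels(mydata.df$my.b.fac)   # "Low" "Medium" "High"
table(mydata.df$my.b, mydata.df$my.b.fac)   # check codes map to the right labels

# Make "High" the reference (first) level
mydata.df$my.b.fac <- relevel(mydata.df$my.b.fac, ref = "High")
```

With R's default (treatment) contrasts, the first level of a factor is the reference category in regression models. This is why relevel() is useful.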
2 Data summary
2.1 Summary statistics
2.1.1 Summarizing categorical variables
To tabulate a single categorical variable (frequency table):
FREQUENCY TABLES
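This sketch uses table(), which counts the observations in each level of a factor. prop.table() then converts the counts to proportions. The data are hypothetical.

```r
# Hypothetical factor
mydata.df <- data.frame(my.a.fac = factor(c("Control", "Treatment", "Control",
                                            "Control", "Treatment")))

my.tab <- table(mydata.df$my.a.fac)   # counts per level
my.tab
prop.table(my.tab)                    # proportions
round(100 * prop.table(my.tab), 1)    # percentages to 1 decimal place
```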
CROSS TABULATIONS
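Passing two categorical variables to table() gives a cross tabulation. This sketch, on hypothetical data, also adds margins and row proportions.

```r
# Hypothetical factors
mydata.df <- data.frame(
  my.a.fac = factor(c("Control", "Treatment", "Control", "Treatment", "Control", "Treatment")),
  my.b.fac = factor(c("Low", "High", "High", "High", "Low", "Low")))

# Rows = levels of my.a.fac, columns = levels of my.b.fac
my.xtab <- table(mydata.df$my.a.fac, mydata.df$my.b.fac)
my.xtab
addmargins(my.xtab)               # add row and column totals
prop.table(my.xtab, margin = 1)   # row proportions (each row sums to 1)

# Formula interface; the variable names become the dimension names
xtabs(~ my.a.fac + my.b.fac, data = mydata.df)
```

prop.table(my.xtab, margin = 2) gives column proportions instead. chisq.test(my.xtab) tests for association, but with counts as small as these R warns that the chi-squared approximation may be incorrect.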
Note that the describe function gives you most summary statistics (mean, median, SD, IQR, range, min, max, n, number missing, etc.).
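describe() is not part of base R; add-on libraries such as Hmisc and psych each provide a describe() function. As a base-R-only sketch, the same kind of statistics can be computed directly. The variable here is simulated (hypothetical) and includes two missing values to show the effect of na.rm = TRUE.

```r
set.seed(1)
# Hypothetical continuous outcome with two missing values
mydata.df <- data.frame(my.y = c(rnorm(18, mean = 50, sd = 10), NA, NA))

# Min, quartiles, median, mean, max and number of NAs
summary(mydata.df$my.y)

# na.rm = TRUE drops missing values before each statistic is computed
my.stats <- c(n      = sum(!is.na(mydata.df$my.y)),
              nmiss  = sum(is.na(mydata.df$my.y)),
              mean   = mean(mydata.df$my.y, na.rm = TRUE),
              sd     = sd(mydata.df$my.y, na.rm = TRUE),
              median = median(mydata.df$my.y, na.rm = TRUE),
              IQR    = IQR(mydata.df$my.y, na.rm = TRUE),
              min    = min(mydata.df$my.y, na.rm = TRUE),
              max    = max(mydata.df$my.y, na.rm = TRUE))
round(my.stats, 2)
```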
To examine the bivariate relationship between a continuous outcome and a categorical explanatory variable (factor), summarize the outcome within each level of the factor:
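This base-R sketch uses simulated (hypothetical) data. tapply() applies a function to my.y separately within each level of my.a.fac. aggregate() returns several statistics per group as a data frame.

```r
set.seed(2)
# Hypothetical outcome in two groups of 10
mydata.df <- data.frame(
  my.y     = c(rnorm(10, mean = 50, sd = 10), rnorm(10, mean = 60, sd = 10)),
  my.a.fac = factor(rep(c("Control", "Treatment"), each = 10)))

# Mean and SD of my.y within each level of my.a.fac
my.means <- tapply(mydata.df$my.y, mydata.df$my.a.fac, mean)
my.means
tapply(mydata.df$my.y, mydata.df$my.a.fac, sd)

# Several statistics at once, one row per group
my.group.stats <- aggregate(my.y ~ my.a.fac, data = mydata.df,
                            FUN = function(v) c(n = length(v), mean = mean(v), sd = sd(v)))
my.group.stats
```

boxplot(my.y ~ my.a.fac, data = mydata.df) shows the same comparison graphically.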
3.3 Correlation
CORRELATION
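This sketch uses simulated (hypothetical) data. cor() computes Pearson's correlation by default, and given a data frame or matrix it returns the full correlation matrix.

```r
set.seed(3)
# Hypothetical continuous variables; my.x2 is built to correlate with my.x1
my.x1 <- rnorm(30)
mydata.df <- data.frame(my.x1 = my.x1,
                        my.x2 = 0.6 * my.x1 + rnorm(30, sd = 0.8),
                        my.y  = rnorm(30))

# 1. Pearson's (the default method)
cor(mydata.df$my.x1, mydata.df$my.x2)

# Correlation matrix of several variables at once;
# use = "complete.obs" drops rows with any missing value first
my.cor.mat <- cor(mydata.df[, c("my.y", "my.x1", "my.x2")], use = "complete.obs")
round(my.cor.mat, 2)
```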
2. Spearman's
cor(mydata.df$my.x1, mydata.df$my.x2, method = "spearman")
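cor() returns only the coefficient. As a sketch on simulated (hypothetical) data, cor.test() adds a p-value for the null hypothesis of no correlation. For Pearson's r it also gives a 95% confidence interval.

```r
set.seed(4)
# Hypothetical correlated variables
my.x1 <- rnorm(25)
mydata.df <- data.frame(my.x1 = my.x1, my.x2 = my.x1 + rnorm(25))

# Pearson (the default): estimate, 95% CI and p-value
my.pearson <- cor.test(mydata.df$my.x1, mydata.df$my.x2)
my.pearson$estimate   # r
my.pearson$conf.int   # 95% CI for the population correlation
my.pearson$p.value

# Spearman; exact = FALSE uses the asymptotic approximation
# (and avoids the warning R gives when there are ties)
my.spearman <- cor.test(mydata.df$my.x1, mydata.df$my.x2,
                        method = "spearman", exact = FALSE)
my.spearman$estimate  # rho
my.spearman$p.value
```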
4 Modelling
One of the things you will notice about R is that modelling is very simple. Once you understand the basics of modelling in one situation (e.g. a continuous outcome), extending to other types of outcomes and situations is easy.
Modelling in R is based on the concept of a formula:
y ~ x1 + x2
in a linear regression (assuming y, x1 and x2 are continuous) implies:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
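This sketch fits that model with lm() on simulated (hypothetical) data, where the true coefficients are 2, 1.5 and -0.5. The fitted model is stored as my.model, the name used elsewhere in this sheet.

```r
set.seed(5)
n <- 50
# Hypothetical predictors and outcome
mydata.df <- data.frame(my.x1 = rnorm(n), my.x2 = rnorm(n))
mydata.df$my.y <- 2 + 1.5 * mydata.df$my.x1 - 0.5 * mydata.df$my.x2 + rnorm(n)

# my.y ~ my.x1 + my.x2 fits my.y = b0 + b1*my.x1 + b2*my.x2 (+ error)
my.model <- lm(my.y ~ my.x1 + my.x2, data = mydata.df)

summary(my.model)   # estimates, SEs, t-tests, R-squared
confint(my.model)   # 95% confidence intervals for the betas
anova(my.model)     # sequential (type I) ANOVA table
```

A factor on the right-hand side (e.g. my.y ~ my.x1 + my.a.fac) is expanded into dummy variables automatically. This means ANOVA and ANCOVA models (the general linear model) are fitted with the same lm() call.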
If this same formula is used (for example) in a Poisson regression (for count data) with a log link, then
y ~ x1 + x2
implies
$$\log(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
where μ is the expected (mean) count of y.
anova(my.model)
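Logistic regression and other generalized linear models use glm() with the same kind of formula plus a family argument. Here is a sketch on simulated data. The outcome names my.count and my.binary are hypothetical.

```r
set.seed(6)
n <- 200
mydata.df <- data.frame(my.x1 = rnorm(n), my.x2 = rnorm(n))

# Hypothetical count outcome and binary (0/1) outcome
mydata.df$my.count  <- rpois(n, lambda = exp(0.5 + 0.3 * mydata.df$my.x1))
mydata.df$my.binary <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * mydata.df$my.x2))

# Poisson regression (log link is the default for the poisson family)
my.pois <- glm(my.count ~ my.x1 + my.x2, family = poisson, data = mydata.df)
summary(my.pois)

# Logistic regression (logit link is the default for the binomial family)
my.logit <- glm(my.binary ~ my.x1 + my.x2, family = binomial, data = mydata.df)
summary(my.logit)

# Odds ratios with Wald 95% confidence intervals
my.or <- exp(cbind(OR = coef(my.logit), confint.default(my.logit)))
my.or

anova(my.logit, test = "Chisq")   # sequential likelihood-ratio (deviance) tests
```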
Here, I will just give an example of a longitudinal analysis (using the patient identifier my.pat.id), but the approach for clustered data (e.g. hospital ID) is very similar.
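Longitudinal models in R expect the data in long format: one row per patient per visit, with my.pat.id repeated on each row. If your data are in wide format (one row per patient, one column per visit), base R's reshape() can convert them. The column names my.y.1 and my.y.2 and the time variable visit are hypothetical.

```r
# Hypothetical wide-format data: one row per patient, one column per visit
wide.df <- data.frame(my.pat.id = 1:3,
                      my.y.1    = c(5.1, 4.8, 6.0),
                      my.y.2    = c(5.5, 5.0, 6.2))

# Stack the visit columns into one outcome column (my.y), with a new
# column 'visit' recording which original column each value came from
mydata.df <- reshape(wide.df, direction = "long",
                     varying = list(c("my.y.1", "my.y.2")),
                     v.names = "my.y", timevar = "visit", times = 1:2,
                     idvar = "my.pat.id")

# Sort by patient, then visit, and reset the row names
mydata.df <- mydata.df[order(mydata.df$my.pat.id, mydata.df$visit), ]
rownames(mydata.df) <- NULL
mydata.df
```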
Note that there are several R libraries for linear mixed models (LMMs). My preferred library is called lme4. Again, the first time you use it you may have to download it from CRAN, e.g. with install.packages("lme4").
library(lme4)
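This sketch fits a random-intercept linear mixed model with lme4's lmer() on simulated data: 30 hypothetical patients, each measured at 4 visits. The term (1 | my.pat.id) gives each patient their own intercept, which accounts for the correlation between repeated measurements on the same patient.

```r
library(lme4)

set.seed(7)
n.pat   <- 30
n.visit <- 4

# Hypothetical long-format data: one row per patient per visit
mydata.df <- data.frame(
  my.pat.id       = factor(rep(1:n.pat, each = n.visit)),
  my.c.fac.within = factor(rep(paste0("visit", 1:n.visit), times = n.pat)),
  my.x1           = rnorm(n.pat * n.visit))

# A patient-level random effect makes visits from the same patient correlated
my.u <- rnorm(n.pat, sd = 2)
mydata.df$my.y <- 10 + 0.5 * mydata.df$my.x1 +
  as.numeric(mydata.df$my.c.fac.within) +
  my.u[as.integer(mydata.df$my.pat.id)] + rnorm(n.pat * n.visit)

# Random intercept for each patient: (1 | my.pat.id)
my.model <- lmer(my.y ~ my.x1 + my.c.fac.within + (1 | my.pat.id), data = mydata.df)

summary(my.model)   # fixed effects and variance components
fixef(my.model)     # fixed-effect estimates only
VarCorr(my.model)   # between-patient and residual SDs
anova(my.model)     # F statistics for the fixed effects (lme4 gives no p-values here)
```

Replacing (1 | my.pat.id) with (my.x1 | my.pat.id) would also let the slope of my.x1 vary between patients.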
library(lme4)
summary(my.model)
anova(my.model)
# Note: I wrote the function below myself; get the R code from me
print.ORCIs.glmm.wald(my.model)
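print.ORCIs.glmm.wald() is the author's own function and is available only from the author. Its name suggests odds ratios with Wald confidence intervals. The hypothetical helper wald.or.ci() below is a rough base-R sketch of that idea, not the author's function. It takes log-odds estimates and their standard errors. For a glmer() fit you could pass fixef(my.model) and sqrt(diag(vcov(my.model))). It is demonstrated on an ordinary logistic glm so that it runs without lme4.

```r
# Hypothetical helper: odds ratios with Wald confidence intervals
# from log-odds estimates and their standard errors
wald.or.ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)                   # about 1.96 for a 95% interval
  out <- cbind(OR    = exp(est),
               lower = exp(est - z * se),
               upper = exp(est + z * se),
               p     = 2 * pnorm(-abs(est / se)))   # Wald test of log(OR) = 0
  round(out, 4)
}

# Demonstration on an ordinary logistic regression (simulated data)
set.seed(8)
my.x1     <- rnorm(100)
my.binary <- rbinom(100, size = 1, prob = plogis(0.7 * my.x1))
my.fit    <- glm(my.binary ~ my.x1, family = binomial)

my.or <- wald.or.ci(coef(my.fit), sqrt(diag(vcov(my.fit))))
my.or

# For a glmer() model from lme4 the equivalent call would be:
# wald.or.ci(fixef(my.model), sqrt(diag(vcov(my.model))))
```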