Basic R Commands For Data Analysis
David Lorenz
English Department, Albert-Ludwigs-Universität Freiburg
Basic maths
> 1+1              # addition
[1] 2
> 6/2              # division
[1] 3
> 2*10+5           # multiplication and division before addition and subtraction
[1] 25
> 2*(10+5)         # bracketing
[1] 30
> 3^2              # exponentiation
[1] 9
> sqrt(9)          # square root
[1] 3
> round(10/3, digits=2)   # rounding (here to two decimal places)
[1] 3.33
> abs(-5)          # absolute value
[1] 5
Creating ‘objects’
> x <- 43          # assigning a value; x is now an object (this also
                   # works the other way around: 43 -> x).
> venus <- "evening star"
> x
[1] 43
> venus
[1] "evening star"
> y <- c(1:5)      # ‘c’ is for “combine” (often read as “concatenate”) and is
                   # used to create vectors (here a sequence from 1 to 5).
> y
[1] 1 2 3 4 5
> elements <- c("fire", "earth", "air", "water")
> elements
[1] "fire" "earth" "air" "water"
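A short sketch of what c() does with vectors. The object `more`, the value "aether" and the object `mixed` are hypothetical additions for illustration. The last lines anticipate the rule under ‘Data types’ that a vector holds only one type of data.

```r
# 1:5 already creates a vector, so wrapping it in c() changes nothing
y <- 1:5
identical(y, c(1:5))          # TRUE

# c() also joins existing vectors into one longer vector
elements <- c("fire", "earth", "air", "water")
more <- c(elements, "aether")  # hypothetical fifth element
length(more)                  # 5

# A vector holds only one type: mixing numbers and text turns everything into text
mixed <- c(1, "two", 3)
class(mixed)                  # "character"
```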
The working directory is the folder on your computer that R currently ‘works’ in. Files are read from and saved to it unless you give a full path. (For how to save items there, see the end of ‘Data frames’.)
> getwd()                  # shows the current working directory
> setwd(dir="~/Desktop")   # changes the working directory (here to Desktop;
                           # the path given has to be an existing folder).
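Since setwd() only accepts an existing folder, a script can check the folder first. The following is a minimal sketch that assumes a "Desktop" folder in your home directory; `target` and `old` are illustrative names.

```r
# Build the path from its parts; path.expand() turns "~" into your home folder
target <- path.expand(file.path("~", "Desktop"))

# setwd() stops with an error if the folder does not exist, so check first
if (dir.exists(target)) {
  old <- setwd(target)   # setwd() invisibly returns the previous directory
  print(getwd())
  setwd(old)             # switch back to the original working directory
} else {
  message("Folder not found: ", target)
}
```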
R Glossary – David Lorenz, January 2017
Data frames
Reading files works best with .csv format (‘comma-separated values’). Excel tables can be saved as .csv (choose UTF-8 encoding). The separator can also be “;” or a tab stop; in that case use read.csv2() or read.delim(), or set sep=";" or sep="\t". Alternatively, read.table() reads tables in plain-text (.txt) format.
=> RStudio lets you read in data with “Import Dataset” in the “Environment” panel – always note
the filename in your R script!
> mydata <- read.csv(file.choose())   # reads a .csv file (which you select in a
                                      # dialog) into the data frame ‘mydata’;
> mydata <- read.csv(file="~/Desktop/mydata.csv")
                                      # reads the file at the given path (this
                                      # must be an existing file on your computer);
> is.data.frame(mydata)
[1] TRUE
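The separator must match the file, or the columns will not be split correctly. This sketch passes small hypothetical file contents through the `text=` argument, so no real file is needed. It shows that the comma, semicolon and tab versions all give the same data frame when read with the matching function.

```r
# Hypothetical file contents, passed with text= so that no real file is needed
comma  <- "variant,gender\nshort,f\nlong,m"
semi   <- "variant;gender\nshort;f\nlong;m"
tabbed <- "variant\tgender\nshort\tf\nlong\tm"

d1 <- read.csv(text = comma)             # separator ","
d2 <- read.csv2(text = semi)             # separator ";" (decimal mark ",")
d3 <- read.delim(text = tabbed)          # separator: tab stop
d4 <- read.csv(text = semi, sep = ";")   # or set the separator yourself

# All four calls produce the same data frame
identical(d1, d2) && identical(d1, d3) && identical(d1, d4)
```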
> write.csv(mydata, file="~/Desktop/mydata.csv")
                             # writes the data frame ‘mydata’ to a .csv file at
                             # the specified path (add row.names=FALSE to leave
                             # out the row numbers). (Note: if this file already
                             # exists, it will be overwritten!)
> save(mydata, file="~/Desktop/myworkspace.RData")
                             # saves the data to a workspace file (.RData).
                             # (Note: if this file already exists, it will be
                             # overwritten!)
> save("mydata", "elements", file="~/Desktop/myworkspace.RData")
                             # saves the listed items to the specified file;
> save.image(file="~/Desktop/myworkspace.RData")
                             # saves the entire workspace to the specified file.
> load("~/Desktop/myworkspace.RData")
                             # loads the data in the specified file. (You can
                             # also just double-click the file.)
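A minimal round-trip sketch of the writing and saving commands above. It uses small hypothetical data, and tempfile() paths stand in for "~/Desktop/…" so that no existing file is overwritten. The names `csv_path`, `rdata_path`, `from_csv` and `restored` are illustrative.

```r
# Hypothetical example data
mydata <- data.frame(variant  = c("short", "long", "short"),
                     duration = c(120.5, 310, 95.25))
elements <- c("fire", "earth", "air", "water")

# tempfile() gives throwaway paths, so no existing file is overwritten
csv_path   <- tempfile(fileext = ".csv")
rdata_path <- tempfile(fileext = ".RData")

# .csv round trip; row.names = FALSE leaves out the column of row numbers
write.csv(mydata, file = csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)
isTRUE(all.equal(from_csv, mydata))   # TRUE if nothing changed on the way

# .RData round trip: objects can be named directly...
save(mydata, elements, file = rdata_path)
# ...or as a character vector: save(list = c("mydata", "elements"), file = rdata_path)

rm(mydata, elements)          # remove both objects from the workspace
restored <- load(rdata_path)  # load() brings them back and returns their names
restored
```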
Data types
Each column in a data frame is a ‘vector’, and a vector can contain data of only one type. The main types are ‘factor’ (for categorical data), ‘numeric’ and ‘character’; factors may be ordered.
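A short sketch showing how to check these types with class() and how an ordered factor differs from a plain one. All values are hypothetical.

```r
num <- c(1.5, 2, 3)
chr <- c("fire", "earth", "air")
fac <- factor(c("short", "long", "short"))

class(num)    # "numeric"
class(chr)    # "character"
class(fac)    # "factor"
levels(fac)   # "long" "short" (sorted alphabetically by default)

# An ordered factor: the order given in 'levels' is meaningful
size <- factor(c("low", "high", "mid"),
               levels = c("low", "mid", "high"), ordered = TRUE)
size < "high"   # TRUE FALSE TRUE
```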
Categorical data:
> xtabs(~ variant + gender, mydata)
                       # shows a contingency table of ‘variant’ by the variable
                       # ‘gender’ in the data frame ‘mydata’ (the variables
                       # must be columns in the data frame).
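A minimal sketch of xtabs() on a hypothetical six-row data frame. It shows how to read a single cell of the table and how prop.table() turns the counts into proportions within each gender.

```r
# Hypothetical toy data with two categorical columns
mydata <- data.frame(
  variant = c("short", "long", "short", "short", "long", "long"),
  gender  = c("f", "f", "m", "f", "m", "m")
)

tab <- xtabs(~ variant + gender, data = mydata)
tab
tab["long", "m"]      # 2: tokens of 'long' from speakers coded 'm'
prop.table(tab, 2)    # proportions within each gender (columns sum to 1)
```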
Multifactorial:
# Logistic regression (binary dependent variable):
> ftable(mydata$gender, mydata$sentence_type, mydata$variant)
                       # a ‘flat’ table of all combinations of several factors.
> mydata$variant <- relevel(mydata$variant, ref="short")
                       # defines the reference level of a factor (the column
                       # must be a factor; convert it with factor() first if
                       # necessary).
> mymodel <- glm(variant ~ gender + age + sentence_type, data=mydata,
                 family="binomial")
                       # a logistic regression model; here ‘variant’ is the
                       # dependent variable and ‘gender’, ‘age’ and
                       # ‘sentence_type’ are independent variables. The model
                       # predicts the non-reference level of ‘variant’. Use...
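> summary(mymodel)               # ...to see the coefficients and p-values.
> anova(mymodel, test="Chisq")   # analysis of deviance table for the model, with
                                 # sequential likelihood-ratio tests (terms added
                                 # in the order given in the formula).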
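The workflow above can be tried out on simulated data. This is a sketch under stated assumptions: the column names follow the glossary, but all values are randomly generated. The effect sizes in `p_long` are invented for the demo and say nothing about real speakers. exp(coef()) is added to show the coefficients as odds ratios.

```r
set.seed(42)
n <- 200
# Simulated (hypothetical) data with the column names used above
mydata <- data.frame(
  gender        = factor(sample(c("f", "m"), n, replace = TRUE)),
  age           = round(runif(n, min = 18, max = 80)),
  sentence_type = factor(sample(c("main", "sub"), n, replace = TRUE))
)
# Assumption for the demo: older speakers and subordinate clauses favour "long"
p_long <- plogis(-2 + 0.03 * mydata$age + 0.8 * (mydata$sentence_type == "sub"))
mydata$variant <- factor(ifelse(runif(n) < p_long, "long", "short"))

levels(mydata$variant)                      # "long" "short"
mydata$variant <- relevel(mydata$variant, ref = "short")
levels(mydata$variant)                      # "short" "long": "short" is the reference

mymodel <- glm(variant ~ gender + age + sentence_type,
               data = mydata, family = "binomial")
summary(mymodel)                  # estimates are log-odds of "long" versus "short"
anova(mymodel, test = "Chisq")    # sequential likelihood-ratio tests
exp(coef(mymodel))                # coefficients converted to odds ratios
```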
Packages
Packages are specialized ‘add-ons’ that contain useful functions for data analysis, plotting, etc.
> install.packages("effects")   # installs the package ‘effects’ (needed only once);
> library(effects)              # loads the package into the current R session
                                # (needed in every new session). To see all
                                # the functions in a package, use help():
> help(package=effects)
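In scripts that others will run, installing and loading are often combined so that a package is installed only when it is missing. The following is a minimal sketch; `use_package()` is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper, not part of any package
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)                 # only runs if pkg is not installed
  }
  library(pkg, character.only = TRUE)     # attach it to the current session
}

# use_package("effects")
```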
ggplot2
## contains the functions qplot() and ggplot() for creating sophisticated
graphs (qplot() is deprecated in recent versions of ggplot2). See http://ggplot2.org
plyr
## Tools for splitting, applying and combining data, e.g. ddply()
> ddply(mydata, .(variant, gender), summarize, mean=mean(duration))
                       # a data frame showing the mean ‘duration’ for each
                       # combination of ‘variant’ and ‘gender’ in ‘mydata’
                       # (other measures such as sd() can be added).
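The same group summary can be computed without plyr. The following sketch swaps in base R's aggregate() as a stand-in for the ddply() call above; it is not how plyr works internally. The data are hypothetical.

```r
# Hypothetical toy data
mydata <- data.frame(
  variant  = c("short", "long", "short", "long", "short", "long"),
  gender   = c("f", "f", "m", "m", "f", "m"),
  duration = c(100, 300, 120, 280, 140, 320)
)

# Mean 'duration' for each combination of 'variant' and 'gender'
means <- aggregate(duration ~ variant + gender, data = mydata, FUN = mean)
means

# Other measures work the same way, e.g. the standard deviation
# (NA where a group contains only one value)
aggregate(duration ~ variant + gender, data = mydata, FUN = sd)
```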
rms
## “regression modeling strategies” – contains the function lrm() for logistic
regression with a few more features than glm()
swirl
## A tutorial for working with R, for beginners and advanced learners – see
http://swirlstats.com
Further readings
R and statistics textbooks for linguists:
Baayen, Harald. 2008. Analyzing Linguistic Data. Cambridge: Cambridge University Press.
Gries, Stefan Th. 2013. Statistics for Linguistics with R. 2nd edition. Berlin, New York: De Gruyter.
Johnson, Keith. 2008. Quantitative Methods in Linguistics. Malden: Blackwell.
Levshina, Natalia. 2015. How to do Linguistics with R: Data exploration and Statistical Analysis.
Amsterdam: John Benjamins.
Introductory R books:
Adler, Joseph. 2010. R in a Nutshell: A Desktop Quick Reference. Sebastopol, CA: O'Reilly Media.
Crawley, Michael J. 2012. The R Book. 2nd edition. Hoboken, NJ: Wiley.
Kabacoff, Robert I. 2011. R in Action: Data Analysis and Graphics with R. Greenwich, CT:
Manning Publications.
Teetor, Paul. 2011. R Cookbook. Sebastopol, CA: O'Reilly Media.
Verzani, John. 2014. Using R for Introductory Statistics. 2nd edition. Boca Raton: Chapman &
Hall / CRC Press.
Check the official R bibliography for more...
Online help
http://www.statmethods.net      # a very good overview of the possibilities for
                                # statistics and graphics in R
https://groups.google.com/forum/#!forum/corpling-with-r
                                # Stefan Gries' Corpus linguistics with R forum