R/Rpad Reference Card: Slicing and Extracting Data
Data creation
c(...) generic function to combine arguments with the default forming a
vector; with recursive=TRUE descends through lists combining all
elements into one vector
from:to generates a sequence; : has operator priority; 1:4 + 1 is 2,3,4,5
seq(from,to) generates a sequence; by= specifies the increment; length=
specifies the desired length
seq(along=x) generates 1, 2, ..., length(x); useful for for loops
rep(x,times) replicate x times; use each= to repeat each element of x that many times; rep(c(1,2,3),2) is 1 2 3 1 2 3;
rep(c(1,2,3),each=2) is 1 1 2 2 3 3
data.frame(...) create a data frame of the named or unnamed
arguments; data.frame(v=1:4,ch=c("a","B","c","d"),n=10);
shorter vectors are recycled to the length of the longest
list(...)
create a list of the named or unnamed arguments;
list(a=c(1,2),b="hi",c=3i);
array(x,dim=) array with data x; specify dimensions like
dim=c(3,4,2); elements of x recycle if x is not long enough
matrix(x,nrow=,ncol=) matrix; elements of x recycle
factor(x,levels=) encodes a vector x as a factor
gl(n,k,length=n*k,labels=1:n) generate levels (factors) by specifying the pattern of their levels; n is the number of levels, and k is
the number of replications
expand.grid() a data frame from all combinations of the supplied vectors or factors
rbind(...) combine arguments by rows for matrices, data frames, and
others
cbind(...) id. by columns
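
The data-creation functions above can be tried out with a short sketch like the following. The variable names are illustrative only, and length.out is the full name of the argument abbreviated as length= above.

```r
# Sequences and replication
s1 <- seq(0, 1, by = 0.25)        # 0.00 0.25 0.50 0.75 1.00
s2 <- seq(0, 1, length.out = 3)   # 0.0 0.5 1.0
r1 <- rep(c(1, 2, 3), times = 2)  # 1 2 3 1 2 3
r2 <- rep(c(1, 2, 3), each = 2)   # 1 1 2 2 3 3

# gl(2, 3) gives a factor with values 1 1 1 2 2 2
g <- gl(2, 3)

# The scalar n is recycled to the length of the longest column
df <- data.frame(v = 1:4, ch = c("a", "B", "c", "d"), n = 10)

# All combinations of the supplied vectors: 3 x 2 = 6 rows
grid <- expand.grid(x = 1:3, y = c("a", "b"))

# rbind() stacks rows, cbind() binds columns
m_rows <- rbind(1:3, 4:6)   # 2 x 3 matrix
m_cols <- cbind(1:3, 4:6)   # 3 x 2 matrix
```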
Variable conversion
as.array(x), as.data.frame(x), as.numeric(x),
as.logical(x), as.complex(x), as.character(x),
... convert type; for a complete list, use methods(as)
Variable information
is.na(x), is.null(x), is.array(x), is.data.frame(x),
is.numeric(x), is.complex(x), is.character(x),
... test for type; for a complete list, use methods(is)
length(x) number of elements in x
dim(x) Retrieve or set the dimension of an object; dim(x) <- c(3,2)
dimnames(x) Retrieve or set the dimension names of an object
nrow(x) number of rows; NROW(x) is the same but treats a vector as a one-column matrix
ncol(x) and NCOL(x) id. for columns
class(x) get or set the class of x; class(x) <- "myclass"
unclass(x) remove the class attribute of x
attr(x,which) get or set the attribute which of x
attributes(obj) get or set the list of attributes of obj
match(x, y) returns a vector of the same length as x giving, for each element
of x, the position of its first match in y (NA otherwise)
which(x == a) returns a vector of the indices of x for which the comparison is TRUE, in this example the values of i for which x[i]
== a (the argument of this function must be of mode logical)
choose(n, k) computes the number of combinations of k events among n repetitions
= n!/[(n-k)!k!]
na.omit(x) removes the observations with missing data (NA) (removes the corresponding row if x is a matrix or a data frame)
na.fail(x) returns an error message if x contains at least one NA
unique(x) if x is a vector or a data frame, returns a similar object but with
the duplicate elements suppressed
table(x) returns a table with the counts of the different values of x
(typically for integers or factors)
subset(x, ...) returns a selection of x with respect to criteria (...,
typically comparisons: x$V1 < 10); if x is a data frame, the option
select gives the variables to be kept or dropped using a minus sign
sample(x, size) randomly draws size elements from the vector x without replacement; the option replace = TRUE samples
with replacement
prop.table(x,margin=) table entries as fraction of marginal table
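
A few of the lookup and selection functions above, in a small sketch. The vectors and the data frame (with columns V1, V2, V3) are made up for illustration.

```r
x <- c(5, 3, 8, 3, NA)
y <- c(3, 8, 10)

m  <- match(x, y)     # NA 1 2 1 NA
w  <- which(x == 3)   # 2 4 (NA comparisons are dropped)
u  <- unique(x)       # 5 3 8 NA
cl <- na.omit(x)      # 5 3 8 3, plus an attribute recording what was dropped
tb <- table(x)        # counts per distinct value; NA is excluded by default

# subset() with a row criterion and a column dropped via a minus sign
df  <- data.frame(V1 = c(2, 15, 7), V2 = c("a", "b", "c"), V3 = 1:3)
sub <- subset(df, V1 < 10, select = -V3)   # rows 1 and 3, columns V1 and V2
```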
Math
sin,cos,tan,asin,acos,atan,atan2,log,log10,exp
max(x) maximum of the elements of x
min(x) minimum of the elements of x
range(x) same as c(min(x), max(x))
sum(x) sum of the elements of x
diff(x) lagged and iterated differences of vector x
prod(x) product of the elements of x
mean(x) mean of the elements of x
median(x) median of the elements of x
quantile(x,probs=) sample quantiles corresponding to the given probabilities (defaults to 0,.25,.5,.75,1)
weighted.mean(x, w) mean of x with weights w
rank(x) ranks of the elements of x
var(x) or cov(x) variance of the elements of x (calculated on n-1); if x is
a matrix or a data frame, the variance-covariance matrix is calculated
sd(x) standard deviation of x
cor(x) correlation matrix of x if it is a matrix or a data frame (1 if x is a
vector)
var(x, y) or cov(x, y) covariance between x and y, or between the
columns of x and those of y if they are matrices or data frames
cor(x, y) linear correlation between x and y, or correlation matrix if they
are matrices or data frames
round(x, n) rounds the elements of x to n decimals
log(x, base) computes the logarithm of x with base base
scale(x) if x is a matrix, centers and scales the data; to center only use
the option scale=FALSE, to scale only center=FALSE (by default
center=TRUE, scale=TRUE)
pmin(x,y,...) a vector whose ith element is the minimum of x[i],
y[i], ...
pmax(x,y,...) id. for the maximum
cumsum(x) a vector whose ith element is the sum from x[1] to x[i]
cumprod(x) id. for the product
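
As a rough illustration of the elementwise and cumulative functions above, with made-up vectors x and y:

```r
x <- c(2, 4, 4, 6)
y <- c(3, 1, 5, 6)

rg <- range(x)     # 2 6
d  <- diff(x)      # 2 0 2
cs <- cumsum(x)    # 2 6 10 16
cp <- cumprod(x)   # 2 8 32 192
lo <- pmin(x, y)   # 2 1 4 6
hi <- pmax(x, y)   # 3 4 5 6
v  <- var(x)       # 8/3: squared deviations from the mean, divided by n - 1

# Center the columns of a matrix without rescaling them
m <- cbind(a = x, b = y)
centered <- scale(m, scale = FALSE)   # each column now has mean 0
```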
Matrices
t(x) transpose
diag(x) diagonal
%*% matrix multiplication
solve(a,b) solves a %*% x = b for x
solve(a) matrix inverse of a
rowsum(x, group) sums of the rows of a matrix-like object within each level of
group; rowSums(x) is a fast sum of each row
colSums(x) id. for columns
rowMeans(x) fast version of row means
colMeans(x) id. for columns
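
A minimal sketch of the matrix operations above; the matrix a and vector b are arbitrary illustrative values.

```r
a <- matrix(c(2, 0, 1, 3), nrow = 2)   # filled column by column: rows are (2, 1) and (0, 3)
b <- c(3, 5)

x_sol <- solve(a, b)    # solves a %*% x = b for x
a_inv <- solve(a)       # inverse of a
prod1 <- a %*% a_inv    # numerically the 2 x 2 identity matrix

ta <- t(a)              # transpose
rs <- rowSums(a)        # 3 3
cs <- colSums(a)        # 2 4
rm <- rowMeans(a)       # 1.5 1.5
```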
Strings
paste(...) concatenate vectors after converting to character; sep= is the
string to separate terms (a single space is the default); collapse= is
an optional string to separate collapsed results
substr(x,start,stop) substrings in a character vector; can also assign, as substr(x, start, stop) <- value
strsplit(x,split) split x according to the substring split
grep(pattern,x) searches for matches to pattern within x; see ?regex
gsub(pattern,replacement,x) replacement of matches determined
by regular expression matching; sub() is the same but only replaces
the first occurrence
tolower(x) convert to lowercase
toupper(x) convert to uppercase
match(x,table) a vector of the positions of first matches for the elements
of x among table
x %in% table id. but returns a logical vector
pmatch(x,table) partial matches for the elements of x among table
nchar(x) number of characters
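
The string functions above could be exercised along these lines; the sample strings are made up.

```r
s <- c("Hello World", "foo bar")

p1 <- paste("a", 1:3, sep = "-")                # "a-1" "a-2" "a-3"
p2 <- paste(c("x", "y", "z"), collapse = "+")   # "x+y+z"
sb <- substr(s, 1, 5)                           # "Hello" "foo b"
sp <- strsplit(s, " ")                          # a list with one character vector per string
g1 <- grep("o", s)                              # 1 2: both strings contain "o"
r1 <- sub("o", "0", s)                          # "Hell0 World" "f0o bar"
r2 <- gsub("o", "0", s)                         # "Hell0 W0rld" "f00 bar"
up <- toupper(s)                                # "HELLO WORLD" "FOO BAR"
n  <- nchar(s)                                  # 11 7
in_tab <- c("foo", "baz") %in% c("foo", "bar")  # TRUE FALSE
```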
In date-time formats (see ?strftime), where leading zeros are shown they are
used on output but are optional on input.
Graphics devices
x11(), windows() open a graphics window
postscript(file) starts the graphics device driver for producing PostScript graphics; use horizontal = FALSE, onefile =
FALSE, paper = "special" for EPS files; family= specifies the
font (AvantGarde, Bookman, Courier, Helvetica, Helvetica-Narrow,
NewCenturySchoolbook, Palatino, Times, or ComputerModern);
width= and height= specify the size of the region in inches (for
paper="special", these specify the paper size).
ps.options() set and view (if called without arguments) default values
for the arguments to postscript
pdf, png, jpeg, bitmap, xfig, pictex; see ?Devices
dev.off() shuts down the specified (default is the current) graphics device;
see also dev.cur, dev.set
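
A typical device workflow is to open a device, draw, then close it. The sketch below assumes a PDF device and writes to a temporary file; the path is a placeholder, not something the card prescribes.

```r
out <- tempfile(fileext = ".pdf")   # placeholder output path
pdf(out, width = 6, height = 4)     # size in inches
plot(1:10)
invisible(dev.off())                # close the device so the file is written
```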
Plotting
plot(x) plot of the values of x (on the y-axis) ordered on the x-axis
plot(x, y) bivariate plot of x (on the x-axis) and y (on the y-axis)
hist(x) histogram of the frequencies of x
barplot(x) bar plot of the values of x; use horiz=TRUE for horizontal
bars
dotchart(x) if x is a data frame, plots a Cleveland dot plot (stacked plots
line-by-line and column-by-column)
pie(x) circular pie-chart
boxplot(x) box-and-whiskers plot
sunflowerplot(x, y) id. to plot() but points with similar coordinates are drawn as flowers whose number of petals represents the number of points
stripchart(x) plot of the values of x on a line (an alternative to
boxplot() for small sample sizes)
coplot(x~y | z) bivariate plot of x and y for each value or interval of
values of z
interaction.plot(f1, f2, y) if f1 and f2 are factors, plots the
means of y (on the y-axis) with respect to the values of f1 (on the
x-axis) and of f2 (different curves); the option fun selects
the summary statistic of y (by default fun=mean)
matplot(x,y) bivariate plot of the first column of x vs. the first one of y,
the second one of x vs. the second one of y, etc.
fourfoldplot(x) visualizes, with quarters of circles, the association between two dichotomous variables for different populations (x must
be an array with dim=c(2, 2, k), or a matrix with dim=c(2, 2) if
k = 1)
assocplot(x) Cohen-Friendly graph showing the deviations from independence of rows and columns in a two dimensional contingency table
mosaicplot(x) mosaic graph of the residuals from a log-linear regression of a contingency table
pairs(x) if x is a matrix or a data frame, draws all possible bivariate plots
between the columns of x
plot.ts(x) if x is an object of class "ts", plot of x with respect to time, x
may be multivariate but the series must have the same frequency and
dates
ts.plot(x) id. but if x is multivariate the series may have different dates
and must have the same frequency
qqnorm(x) quantiles of x with respect to the values expected under a normal distribution
qqplot(x, y) quantiles of y with respect to the quantiles of x
axis(side) adds an axis at the bottom (side=1), on the left (2), at the
top (3), or on the right (4); at=vect (optional) gives the abscissa (or
ordinates) where tick-marks are drawn
box() draw a box around the current plot
rug(x) draws the data x on the x-axis as small vertical lines
locator(n, type="n", ...) returns the coordinates (x, y) after the
user has clicked n times on the plot with the mouse; also draws symbols (type="p") or lines (type="l") with respect to optional graphic
parameters (...); by default nothing is drawn (type="n")
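
Several of the plotting functions above, drawn on simulated data. This is only a sketch; it draws to a temporary PDF so that it also runs without a display.

```r
set.seed(1)
x <- rnorm(50)                      # simulated data for illustration

out <- tempfile(fileext = ".pdf")   # placeholder output path
pdf(out)
h <- hist(x)                        # also returns the breaks and counts invisibly
rug(x)                              # add the raw data as ticks on the x-axis
boxplot(x)
qqnorm(x)
box()                               # draw a box around the current plot
invisible(dev.off())
```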
Graphical parameters
These can be set globally with par(...); many can be passed as parameters
to plotting commands.
adj controls text justification (0 left-justified, 0.5 centred, 1 right-justified)
bg specifies the colour of the background (e.g., bg="red", bg="blue", ...;
the list of the 657 available colours is displayed with colors())
bty controls the type of box drawn around the plot, allowed values are: "o",
"l", "7", "c", "u" or "]" (the box looks like the corresponding character); if bty="n" the box is not drawn
cex a value controlling the size of texts and symbols with respect to the default; the following parameters have the same control for numbers on
the axes, cex.axis, the axis labels, cex.lab, the title, cex.main,
and the sub-title, cex.sub
col controls the color of symbols and lines; use color names such as "red" or "blue"
(see colors()) or a string of the form "#RRGGBB"; see also rgb(), hsv(), gray(), and
rainbow(); as for cex there are: col.axis, col.lab, col.main,
col.sub
font an integer which controls the style of text (1: normal, 2: bold, 3:
italics, 4: bold italics); as for cex there are: font.axis, font.lab,
font.main, font.sub
las an integer which controls the orientation of the axis labels (0: parallel to
the axes, 1: horizontal, 2: perpendicular to the axes, 3: vertical)
lty controls the type of lines, can be an integer or string (1: "solid",
2: "dashed", 3: "dotted", 4: "dotdash", 5: "longdash", 6:
"twodash"), or a string of up to eight characters (between "0" and
"9") which specifies alternately the length, in points or pixels, of
the drawn elements and the blanks, for example lty="44" will have
the same effect as lty=2
lwd a numeric which controls the width of lines, default 1
mar a vector of 4 numeric values which control the space between the axes
and the border of the graph of the form c(bottom, left, top,
right), the default values are c(5.1, 4.1, 4.1, 2.1)
mfcol a vector of the form c(nr,nc) which partitions the graphic window
as a matrix of nr rows and nc columns, the plots are then drawn in
columns
mfrow id. but the plots are drawn by row
pch controls the type of symbol, either an integer between 1 and 25, or any
single character within ""
[Figure: chart of the plotting symbols for pch = 1 to 25, followed by the character symbols "*", ".", "X", "a", and "?"]
tcl a value which specifies the length of tick-marks on the axes as a fraction
of the height of a line of text (by default tcl=-0.5)
xaxs, yaxs style of axis interval calculation; default "r" for an extra
space; "i" for no extra space
xaxt if xaxt="n" the x-axis is set but not drawn (useful in conjunction with
axis(side=1, ...))
yaxt if yaxt="n" the y-axis is set but not drawn (useful in conjunction with
axis(side=2, ...))
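
One possible way to combine these parameters is sketched below: layout and margins are set globally through par(), the rest are passed to individual plotting calls. The values are arbitrary, and output goes to a temporary PDF so the example runs without a display.

```r
out <- tempfile(fileext = ".pdf")   # placeholder output path
pdf(out)

# par() returns the previous values, so they can be restored afterwards
old <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1), las = 1)

x <- 1:10
# Suppress the default x-axis, then draw a custom one with axis()
plot(x, x^2, pch = 19, col = "blue", cex = 1.2, xaxt = "n", bty = "l")
axis(side = 1, at = c(1, 5, 10))

# Dashed line of double width
plot(x, sqrt(x), type = "l", lty = 2, lwd = 2)

par(old)                            # restore the previous settings
invisible(dev.off())
```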
Statistics
aov(formula) analysis of variance model
anova(fit,...) analysis of variance (or deviance) tables for one or more
fitted model objects
density(x) kernel density estimates of x
binom.test(),
pairwise.t.test(),
power.t.test(),
prop.test(), t.test(), ... use help.search("test")
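
A small sketch of how these fit together, using simulated data. The data frame d and its columns y and g are invented for the example.

```r
set.seed(1)
# Two groups of 10 observations; the second group has its mean shifted by 1
d <- data.frame(y = c(rnorm(10, mean = 0), rnorm(10, mean = 1)),
                g = gl(2, 10))

fit <- aov(y ~ g, data = d)      # analysis of variance model
tab <- anova(fit)                # analysis of variance table for the fit
tt  <- t.test(y ~ g, data = d)   # two-sample t test of the same comparison
dn  <- density(d$y)              # kernel density estimate of y
```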
Distributions
rnorm(n, mean=0, sd=1) Gaussian (normal)
rexp(n, rate=1) exponential
rgamma(n, shape, scale=1) gamma
rpois(n, lambda) Poisson
rweibull(n, shape, scale=1) Weibull
rcauchy(n, location=0, scale=1) Cauchy
rbeta(n, shape1, shape2) beta
rt(n, df) Student (t)
rf(n, df1, df2) Fisher-Snedecor (F)
rchisq(n, df) Pearson (χ²)
rbinom(n, size, prob) binomial
rgeom(n, prob) geometric
rhyper(nn, m, n, k) hypergeometric
rlogis(n, location=0, scale=1) logistic
rlnorm(n, meanlog=0, sdlog=1) lognormal
rnbinom(n, size, prob) negative binomial
runif(n, min=0, max=1) uniform
rwilcox(nn, m, n), rsignrank(nn, n) Wilcoxon's statistics
All these functions can be used by replacing the letter r with d, p or q to
get, respectively, the probability density (dfunc(x, ...)), the cumulative
probability (pfunc(x, ...)), and the quantile (qfunc(p,
...), with 0 < p < 1).
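
For the normal distribution, for instance, the four variants look like this (a brief sketch; the seed is arbitrary):

```r
set.seed(42)
x  <- rnorm(5, mean = 0, sd = 1)   # five random draws
d0 <- dnorm(0)                     # density at 0, equal to 1/sqrt(2*pi)
p  <- pnorm(1.96)                  # P(X <= 1.96), about 0.975
q  <- qnorm(0.975)                 # about 1.96; qnorm() inverts pnorm()
```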
Programming
function( arglist ) expr function definition
return(value)
if(cond) expr
if(cond) cons.expr else alt.expr
for(var in seq) expr
while(cond) expr
repeat expr
break
next
Use braces {} around statements
ifelse(test, yes, no) a value with the same shape as test filled
with elements from either yes or no
do.call(funname, args) executes a function call from the name of
the function and a list of arguments to be passed to it
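
The constructs above can be combined as in the following sketch. The function name clamp and its arguments lo and hi are made up for illustration.

```r
# A function with default arguments and an explicit return value
clamp <- function(x, lo = 0, hi = 1) {
  return(pmin(pmax(x, lo), hi))
}

# for loop with next and break
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  if (i > 7) break        # stop once i exceeds 7
  total <- total + i      # adds 1, 3, 5, 7
}
# total is now 16

# Vectorized conditional: the result has the same shape as the test
signs <- ifelse(c(-2, 0, 3) < 0, "neg", "non-neg")   # "neg" "non-neg" "non-neg"

# Call a function by name with a list of arguments
s <- do.call("sum", list(1, 2, 3))                   # same as sum(1, 2, 3)
```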
Rpad utilities
RpadURL(filename) returns the URL for the given filename
RpadBaseURL(filename) returns the base URL for the given filename
RpadBaseFile(filename) returns the file name relative to the base R
directory
RpadIsLocal() returns TRUE if run locally (rather than the client-server
version)