Data Transformation with data.table: Cheat Sheet
dt[a == 1, c := 1 + 2] – create a new column based on an expression, but only for a subset of rows.

Subset rows using i
dt[1:2, ] – subset rows based on row numbers.
dt[a > 5, ] – subset rows based on the values in one or more columns.

UNIQUE CASES
unique(dt, by = c("a", "b")) – extract the rows with a unique combination of values in the columns specified in "by".
uniqueN(dt, by = c("a", "b")) – return the number of unique rows, based on the columns specified in "by". Leave out "by" to use all columns.

Group according to by
dt[, j, by = .(a)] – group rows by the values in one or more columns.
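The subsetting, unique-case, and grouping calls above can be tried on a small table. This is a minimal sketch; the table contents and the column name v are assumptions made up for illustration.

```r
library(data.table)

# Toy table; the values and the column v are invented for this example.
dt <- data.table(a = c(2, 6, 6, 5, 2),
                 b = c("x", "y", "y", "x", "x"),
                 v = 1:5)

first_two <- dt[1:2, ]    # subset by row number
big_a     <- dt[a > 5, ]  # subset by a condition on column a

# Rows holding a unique combination of a and b, and how many there are.
uniq   <- unique(dt, by = c("a", "b"))
n_uniq <- uniqueN(dt, by = c("a", "b"))

# j is evaluated once per distinct value of a.
sums <- dt[, .(total = sum(v)), by = .(a)]
```

With by, groups appear in order of first occurrence, so sums lists a = 2, then 6, then 5.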
Use "keyby = .(a)" for grouping and simultaneously sorting according to group column(s).

LOGICAL OPERATORS TO USE IN i
<   <=   is.na()    %in%   |   %like%
>   >=   !is.na()   !      &   %between%

SET FUNCTIONS
data.table provides a collection of functions beginning with "set". They alter data.tables in place and work without "<-". For instance, "setDT(dt)" works like "dt <- as.data.table(dt)" but without creating any copies in memory.
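As a rough sketch of the points above, the following converts a data.frame in place with setDT, filters with two of the listed operators, and groups with keyby. The data.frame contents are assumptions chosen for illustration.

```r
library(data.table)

# A plain data.frame with made-up values.
df <- data.frame(a = c(3, 1, 2, 1),
                 b = c("z", "x", "y", "x"),
                 stringsAsFactors = FALSE)

setDT(df)  # converts df to a data.table by reference; no "df <- ..." needed

# %between% and %in% used inside i, combined with &.
picked <- df[a %between% c(1, 2) & b %in% c("x", "y")]

# keyby groups like by and also sorts the result by the group column.
counts <- df[, .N, keyby = .(a)]
```

Because keyby sorts, counts is ordered by a (1, 2, 3) even though 3 appears first in the data.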
CC BY SA Erik Petrovski • [email protected] • www.petrovski.dk • Learn more with the data.table webpage or vignette • data.table version 1.11.4 • Updated: 2018-08
RENAME COLUMNS
setnames(dt, c("a", "b"), c("x", "y")) – rename multiple columns.

BIND
rbind(dt_a, dt_b) – combine rows of two data.tables.

.SD
Refer to a Subset of the Data within a data.table with .SD.
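A short sketch of renaming, binding, and .SD together. The table contents and the grouping column g are hypothetical, added only so that .SD has groups to work on.

```r
library(data.table)

# Two small tables with made-up values.
dt_a <- data.table(a = 1:2, b = c("u", "v"))
dt_b <- data.table(a = 3:4, b = c("w", "x"))

both <- rbind(dt_a, dt_b)                 # stack the rows: 4 rows in total
setnames(both, c("a", "b"), c("x", "y"))  # rename in place, no "<-" needed

# Hypothetical grouping column for the .SD example.
both[, g := c("p", "p", "q", "q")]

# Within each group, .SD is the subset of rows for that group;
# .SD[1] keeps the first row of each.
firsts <- both[, .SD[1], by = g]
```

.SD is most useful with by: here it picks one row per group, but it can equally be passed to lapply to apply a function to every column.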
DCAST
dcast(dt, id ~ y, value.var = c("a", "b")) – reshape a data.table from long to wide format.
dt – a data.table.
id ~ y – formula with a LHS: id column(s) containing id(s) for multiple entries, and a RHS: column(s) with value(s) to spread in column headers.
value.var – column(s) containing values to fill into cells.

fread & fwrite
fread & fwrite are data.table's fast and multithreaded functions for importing from and exporting to flat files, such as csv and tsv.

IMPORT
fread("file.csv") – read a flat file into R.
fread("file.csv", select = c("a", "b")) – read two columns named "a" and "b" from a file named "file.csv" in the working directory.

JOIN
dt_a[dt_b, on = .(b = y)] – join two data.tables based on rows with equal values. setkey() can be used instead of "on".
dt_a[dt_b, on = .(b = y, c > z)] – join two data.tables based on rows with equal and unequal values.
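The reshape, import, and join calls above can be tried end to end. This sketch reuses the small tables from the sheet's illustrations; the temporary file path is an assumption standing in for "file.csv".

```r
library(data.table)

# Long-to-wide reshape: one row per id, one column per (value, y) pair.
long <- data.table(id = c("A", "A", "B", "B"),
                   y  = c("X", "Z", "X", "Z"),
                   a  = c(1, 2, 1, 2),
                   b  = c(3, 4, 3, 4))
wide <- dcast(long, id ~ y, value.var = c("a", "b"))

# Round trip through a temporary csv, reading back only columns a and b.
path <- tempfile(fileext = ".csv")
fwrite(long, path)
two_cols <- fread(path, select = c("a", "b"))
unlink(path)

# Equi join: for each row of dt_b, look up the rows of dt_a where b equals y.
dt_a <- data.table(a = c(1, 2, 3), b = c("c", "a", "b"))
dt_b <- data.table(x = c(3, 2, 1), y = c("b", "c", "a"))
joined <- dt_a[dt_b, on = .(b = y)]

# Non-equi join: additionally require dt_a's c to exceed dt_b's z.
dt_c <- data.table(a = c(1, 2, 3), b = c("c", "a", "b"), c = c(7, 5, 6))
dt_d <- data.table(x = c(3, 2, 1), y = c("b", "c", "a"), z = c(4, 5, 8))
nonequi <- dt_c[dt_d, on = .(b = y, c > z)]
```

In dt_a[dt_b, ...] the result has one row per row of dt_b, so a row of dt_b without a match shows NA in the columns taken from dt_a, as in the last row of nonequi.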
ROLLING JOIN