Notes On Data Table
Makar Pravosud
28 08 2019
Using i:
We can also sort a data.table using order(), which internally uses data.table’s fast order for performance.
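As a minimal sketch of sorting in i, assuming a small made-up table (the names DT, id and x are illustrative only), order() can be called directly inside the square brackets, with a minus sign for descending order:

```r
library(data.table)

# Toy data; table and column names are hypothetical.
DT <- data.table(id = c("b", "a", "c", "a"), x = c(3, 1, 2, 4))

# Sort by id ascending, then by x descending, directly in i.
sorted <- DT[order(id, -x)]
print(sorted)
```

Columns are referred to by bare name inside the brackets, so there is no need to write DT$id.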
We can do much more in i by keying a data.table: a key sorts the table by the key columns, which allows fast binary-search-based subsets and joins.
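The following sketch illustrates a keyed subset and a keyed join. The tables flights and airports and their contents are invented for illustration; only setkey(), .() and the X[Y] join form come from data.table itself.

```r
library(data.table)

# Hypothetical toy tables.
flights <- data.table(
  origin = c("JFK", "LGA", "JFK", "EWR"),
  dest   = c("LAX", "ORD", "SFO", "LAX"),
  delay  = c(10, 5, 20, 0)
)
airports <- data.table(
  origin = c("EWR", "JFK", "LGA"),
  city   = c("Newark", "New York", "New York")
)

# setkey() sorts the table by the key column and marks it as the key.
setkey(flights, origin)
setkey(airports, origin)

# Keyed subset: rows of flights whose key column equals "JFK".
jfk <- flights[.("JFK")]

# Keyed join: for each row of flights, look up the matching row in airports.
joined <- airports[flights]
print(joined)
```

The same join can also be written without keys by passing on = "origin", which may read more clearly when the tables are not reused.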
Using j:
Using by:
Using by, we can group by columns by specifying a list of columns, a character vector of column names, or
even expressions. The flexibility of j, combined with by and i, makes for a very powerful syntax.
by can handle multiple columns and also expressions.
We can keyby grouping columns to automatically sort the grouped result.
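The forms of by described above can be sketched as follows, on a made-up table (DT, g, h and v are hypothetical names chosen for this example):

```r
library(data.table)

# Hypothetical toy data.
DT <- data.table(
  g = c("b", "a", "b", "a", "c"),
  h = c(1, 1, 2, 2, 1),
  v = c(10, 20, 30, 40, 50)
)

# Group by a list of columns.
by_list <- DT[, .(total = sum(v)), by = .(g, h)]

# Same grouping, given as a character vector of column names.
by_chr <- DT[, .(total = sum(v)), by = c("g", "h")]

# Group by an expression; the result column is named "big".
by_expr <- DT[, .N, by = .(big = v > 25)]

# keyby sorts the result by the grouping column and sets it as the key.
sorted <- DT[, .(total = sum(v)), keyby = g]
print(sorted)
```

With plain by, groups appear in order of first occurrence in the data; keyby additionally orders them.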
We can use .SD and .SDcols in j to operate on multiple columns using already familiar base functions. Here
are some examples:
DT[, lapply(.SD, fun), by = ..., .SDcols = ...] - applies fun to all columns specified in .SDcols while
grouping by the columns specified in by.
DT[, head(.SD, 2), by = ...] - returns the first two rows for each group.
DT[col > val, head(.SD, 1), by = ...] - combines i along with j and by.
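To make the three patterns above concrete, here is a small sketch with placeholders filled in by invented values: the table DT, the columns grp, x, y, z, the function mean and the threshold 1 are all assumptions for illustration.

```r
library(data.table)

# Hypothetical toy data.
DT <- data.table(
  grp = c("a", "a", "a", "b", "b"),
  x   = c(1, 2, 3, 4, 5),
  y   = c(10, 20, 30, 40, 50),
  z   = c("p", "q", "r", "s", "t")
)

# Apply mean to the columns named in .SDcols, per group.
means <- DT[, lapply(.SD, mean), by = grp, .SDcols = c("x", "y")]

# First two rows of each group (.SD holds all non-grouping columns).
first2 <- DT[, head(.SD, 2), by = grp]

# Filter rows in i first, then take the first remaining row per group.
first_big <- DT[x > 1, head(.SD, 1), by = grp]
print(first_big)
```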
And remember the tip: As long as j returns a list, each element of the list will become a column in the
resulting data.table.
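A short sketch of that tip, again on invented data (DT, grp, x and the result column names are hypothetical): each named element of the list returned by j becomes a column of the result.

```r
library(data.table)

# Hypothetical toy data.
DT <- data.table(grp = c("a", "a", "b"), x = c(1, 3, 5))

# .() is an alias for list(); each element becomes one result column.
res <- DT[, .(n = .N, mean_x = mean(x), range_x = max(x) - min(x)), by = grp]
print(res)
```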