Notes on Data Table

Makar Pravosud
28 08 2019

The general form of data.table syntax is:


DT[i, j, by]
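
The form can be read as: take DT, subset or reorder rows using i, then compute j, grouped by by. A minimal sketch of all three parts together follows; the table and its column names (grp, x) are hypothetical and chosen only for illustration.

```r
library(data.table)

# Hypothetical toy table; the column names are illustrative only.
DT <- data.table(
  grp = c("a", "a", "b", "b", "b"),
  x   = c(1, 2, 3, 4, 5)
)

# i: keep rows where x > 1
# j: compute the sum of x
# by: produce one result row per value of grp
res <- DT[x > 1, .(total = sum(x)), by = grp]
print(res)
```

Any of the three parts can be left out; DT[, .(total = sum(x))] would skip both the row filter and the grouping.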

Using i:

We can subset rows in i with a logical expression on columns, and sort a data.table using order(), which internally uses data.table's fast ordering for performance.
We can do much more in i by keying a data.table with setkey(), which allows fast binary-search-based subsets and joins.
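
A short sketch of these uses of i, assuming a hypothetical table with columns id and val (neither name comes from the notes):

```r
library(data.table)

# Hypothetical example data.
DT <- data.table(id = c("c", "a", "b", "a"), val = c(30, 10, 20, 40))

# Subset rows: columns are referenced directly, no DT$ prefix needed.
big <- DT[val > 15]

# Sort by id ascending, then by val descending.
sorted <- DT[order(id, -val)]

# setkey() sorts DT by reference and marks id as its key,
# so subsets on id can then use binary search.
setkey(DT, id)
a_rows <- DT["a"]
```

Note that setkey() modifies DT in place, so the row order of DT itself changes after the call.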

Using j:

Select columns the data.table way: DT[, .(colA, colB)].
Select columns the data.frame way: DT[, c("colA", "colB")].
Compute on columns: DT[, .(sum(colA), mean(colB))].
Provide names if necessary: DT[, .(sA = sum(colA), mB = mean(colB))].
Combine with i: DT[colA > value, sum(colB)].
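
The five forms above can be tried on a small table. This is only a sketch: the data and the threshold stored in value are made up for the example.

```r
library(data.table)

# Hypothetical data using the column names from the notes.
DT <- data.table(colA = c(1, 2, 3, 4), colB = c(10, 20, 30, 40))
value <- 2  # hypothetical threshold used in i

sel_dt  <- DT[, .(colA, colB)]        # data.table way
sel_df  <- DT[, c("colA", "colB")]    # data.frame way
unnamed <- DT[, .(sum(colA), mean(colB))]            # columns get default names V1, V2
named   <- DT[, .(sA = sum(colA), mB = mean(colB))]  # explicit names
filtered_sum <- DT[colA > value, sum(colB)]          # j is not a list, so a plain vector is returned
```

The last call returns a bare number rather than a data.table because j is not wrapped in .().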

Using by:

Using by, we can group by a list of columns, a character vector of column names, or even expressions. The flexibility of j, combined with by and i, makes for a very powerful syntax.
We can use keyby instead of by to automatically sort the grouped result by the grouping columns.
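
As an illustration of the grouping forms, here is a sketch on a hypothetical table with grouping columns g1 and g2 and a value column v (all names are assumptions made for the example):

```r
library(data.table)

# Hypothetical example data.
DT <- data.table(
  g1 = c("b", "a", "b", "a"),
  g2 = c("x", "x", "y", "y"),
  v  = c(1, 2, 3, 4)
)

# Group by a list of columns.
by_list <- DT[, .(total = sum(v)), by = .(g1, g2)]

# Same grouping, given as a character vector of column names.
by_chr <- DT[, .(total = sum(v)), by = c("g1", "g2")]

# Group by an expression; .N counts the rows in each group.
by_expr <- DT[, .N, by = .(big = v > 2)]

# keyby sorts the result by g1 and sets g1 as its key.
sorted <- DT[, .(total = sum(v)), keyby = g1]
```

With plain by, groups appear in order of first occurrence in the data; keyby returns them sorted.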
We can use .SD and .SDcols in j to operate on multiple columns using already familiar base functions. Here are some examples:
DT[, lapply(.SD, fun), by = ..., .SDcols = ...] - applies fun to all columns specified in .SDcols while grouping by the columns specified in by.
DT[, head(.SD, 2), by = ...] - returns the first two rows for each group.
DT[col > val, head(.SD, 1), by = ...] - combines i with j and by.
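
To make the three patterns concrete, the sketch below fills in the elided arguments with hypothetical choices: mean for fun, grp for by, and x and y for .SDcols. None of these choices come from the notes.

```r
library(data.table)

# Hypothetical example data.
DT <- data.table(
  grp = c("a", "a", "a", "b", "b"),
  x   = c(1, 2, 3, 4, 5),
  y   = c(10, 20, 30, 40, 50)
)

# Apply mean to the columns named in .SDcols, once per group.
means <- DT[, lapply(.SD, mean), by = grp, .SDcols = c("x", "y")]

# First two rows of each group.
first_two <- DT[, head(.SD, 2), by = grp]

# Filter rows in i first, then take the first remaining row per group.
first_big <- DT[x > 1, head(.SD, 1), by = grp]
```

Within each group, .SD holds that group's rows for all columns except the grouping ones, unless .SDcols narrows the set.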
And remember the tip: as long as j returns a list, each element of the list will become a column in the resulting data.table.
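
A small sketch of this tip, using a hypothetical table with columns grp and x: each named element of the list returned by j shows up as its own column, and .() is simply shorthand for list().

```r
library(data.table)

# Hypothetical example data.
DT <- data.table(grp = c("a", "a", "b"), x = c(1, 5, 3))

# j returns a list with two elements, so the result has two value columns.
res <- DT[, list(lo = min(x), hi = max(x)), by = grp]

# For contrast: a bare column in j gives a vector, a list gives a data.table.
as_vector <- DT[, x]
as_table  <- DT[, .(x)]
```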
