
Data Table Syntax and Usage Guide

The document discusses the syntax of the data.table package in R. The general syntax is DT[i, j, by], where i selects rows, j selects or transforms columns, and by names the grouping columns. It gives examples of using i to subset and order rows, using j to select, compute on, and name columns, and using by to group by one or more columns or expressions. Combining i, j, and by allows operations such as applying functions to grouped data or subsetting within groups.

Notes on Data Table

Makar Pravosud
28 08 2019

The general form of data.table syntax is:


DT[i, j, by]
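
As a rough illustration of the general form, the sketch below reads as "take the rows matching i, compute j, grouped by by". The table and its column names (grp, x) are hypothetical and chosen only for this example.

```r
library(data.table)

# Hypothetical toy table; the column names are illustrative only.
DT <- data.table(
  grp = c("a", "a", "b", "b", "b"),
  x   = c(1, 2, 3, 4, 5)
)

# i: keep rows where x > 1
# j: compute the sum of x and name it total
# by: produce one result row per value of grp
res <- DT[x > 1, .(total = sum(x)), by = grp]
print(res)
```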

Using i:

We can also sort a data.table using order(), which internally uses data.table’s fast order for performance.
We can do much more in i by keying a data.table, which allows blazing fast subsets and joins.
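
A minimal sketch of these uses of i, assuming a small hypothetical table with columns id and value. setkey() is the usual way to key a table, and it reorders the table by reference.

```r
library(data.table)

# Hypothetical toy table; the column names are illustrative only.
DT <- data.table(
  id    = c("b", "a", "c", "a"),
  value = c(10, 40, 30, 20)
)

# Subset rows: columns are referenced directly inside [ ].
big <- DT[value > 15]

# Sort by id ascending, then by value descending.
sorted <- DT[order(id, -value)]

# Key the table on id (sorts DT by reference), then subset by key value.
setkey(DT, id)
a_rows <- DT["a"]
```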

Using j:

Select columns the data.table way: DT[, .(colA, colB)].
Select columns the data.frame way: DT[, c("colA", "colB")].
Compute on columns: DT[, .(sum(colA), mean(colB))].
Provide names if necessary: DT[, .(sA = sum(colA), mB = mean(colB))].
Combine with i: DT[colA > value, sum(colB)].
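
The forms above can be tried on a small table. This is only a sketch: the data and the threshold stored in value are made up for illustration, while colA and colB follow the names used in the notes.

```r
library(data.table)

# Hypothetical toy data; colA and colB follow the names used in the notes.
DT <- data.table(
  colA = c(1, 2, 3, 4),
  colB = c(10, 20, 30, 40)
)

# Select columns the data.table way: .() is an alias for list().
sel_dt <- DT[, .(colA, colB)]

# Select columns the data.frame way: a character vector of names.
sel_df <- DT[, c("colA", "colB")]

# Compute on columns; unnamed results get the default names V1 and V2.
agg <- DT[, .(sum(colA), mean(colB))]

# Provide names for the computed columns.
named <- DT[, .(sA = sum(colA), mB = mean(colB))]

# Combine with i: j is not wrapped in .(), so the result is a plain vector.
value <- 2  # hypothetical threshold
total <- DT[colA > value, sum(colB)]
```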

Using by:

Using by, we can group by columns by specifying a list of columns, a character vector of column names, or
even expressions. The flexibility of j, combined with by and i, makes for a very powerful syntax.
by can handle multiple columns and also expressions.
We can keyby grouping columns to automatically sort the grouped result.
We can use .SD and .SDcols in j to operate on multiple columns using already familiar base functions. Here
are some examples:
DT[, lapply(.SD, fun), by = ..., .SDcols = ...] - applies fun to all columns specified in .SDcols while
grouping by the columns specified in by.
DT[, head(.SD, 2), by = ...] - returns the first two rows for each group.
DT[col > val, head(.SD, 1), by = ...] - combines i along with j and by.
And remember the tip: As long as j returns a list, each element of the list will become a column in the
resulting data.table.
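
The grouping patterns above can be sketched as follows. The table, its column names (g1, g2, x, y), the choice of mean as fun, and the filter x > 1 are all assumptions made for illustration; they fill in the elided by and .SDcols arguments with example values only.

```r
library(data.table)

# Hypothetical toy table; all column names are illustrative only.
DT <- data.table(
  g1 = c("b", "b", "a", "a", "a"),
  g2 = c(1L, 2L, 1L, 1L, 2L),
  x  = c(5, 4, 3, 2, 1),
  y  = c(10, 20, 30, 40, 50)
)

# Group by a list of columns, or by a character vector of column names.
by_list <- DT[, .(n = .N), by = .(g1, g2)]
by_chr  <- DT[, .(n = .N), by = c("g1", "g2")]

# Group by an expression; the name big_x labels the grouping column.
by_expr <- DT[, .(n = .N), by = .(big_x = x > 2)]

# keyby sorts the grouped result by the grouping column.
sorted <- DT[, .(sum_x = sum(x)), keyby = g1]

# Apply a function (here mean) to the columns listed in .SDcols, per group.
means <- DT[, lapply(.SD, mean), by = g1, .SDcols = c("x", "y")]

# First two rows of each group.
first2 <- DT[, head(.SD, 2), by = g1]

# Combine i with j and by: filter rows first, then take the first row per group.
first_big <- DT[x > 1, head(.SD, 1), by = g1]
```

In each call j returns a list (.SD is itself a list of columns), so every element becomes a column of the result.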
