
data.table Cheat Sheet, R For Data Science: Doing j by Group, Advanced Data Table Operations

This document provides a summary of key operations and syntax for working with data.tables in R. Key points covered include:
1) How to calculate summaries, such as sums and means of columns, grouped by other columns using j and by.
2) How to subset rows using i and select columns using j.
3) How to update columns by reference in j using := to modify the data.table.
4) An overview of the general syntax for data.table operations, DT[i, j, by].
5) How to work with .SD (Subset of Data) to access the columns within each group.
So in summary …

Uploaded by mohitosh deb
Copyright © All Rights Reserved

R For Data Science
data.table Cheat Sheet
Learn data.table online at www.DataCamp.com

data.table
data.table is an R package that provides a high-performance version of base R’s data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.

General form: DT[i, j, by]
“Take DT, subset rows using i, then calculate j grouped by by”

Load the package:
> library(data.table)

> Creating A data.table
> set.seed(45L)
> DT <- data.table(V1=c(1L,2L), #Create a data.table and call it DT
                   V2=LETTERS[1:3],
                   V3=round(rnorm(4),4),
                   V4=1:12)

> Subsetting Rows Using i
> DT[3:5,] #Select 3rd to 5th row
> DT[3:5] #Select 3rd to 5th row
> DT[V2=="A"] #Select all rows that have value A in column V2
> DT[V2 %in% c("A","C")] #Select all rows that have value A or C in column V2

> Manipulating on Columns in j
> DT[,V2] #Return V2 as a vector
[1] "A" "B" "C" "A" "B" "C" ...
> DT[,.(V2,V3)] #Return V2 and V3 as a data.table
> DT[,sum(V1)] #Return the sum of all elements of V1 in a vector
[1] 18
> DT[,.(sum(V1),sd(V3))] #Return the sum of all elements of V1 and the std. dev. of V3 in a data.table
   V1        V2
1: 18 0.4546055
> DT[,.(Aggregate=sum(V1), #The same as the above, with new names
        Sd.V3=sd(V3))]
   Aggregate     Sd.V3
1:        18 0.4546055
> DT[,.(V1,Sd.V3=sd(V3))] #Select column V1 and compute std. dev. of V3, which returns a single value & gets recycled
> DT[,.(print(V2), #Print column V2 and plot V3
        plot(V3),
        NULL)]

> Doing j by Group
> DT[,.(V4.Sum=sum(V4)),by=V1] #Calculate sum of V4 for every group in V1
   V1 V4.Sum
1:  1     36
2:  2     42
> DT[,.(V4.Sum=sum(V4)), by=.(V1,V2)] #Calculate sum of V4 for every group in V1 and V2
> DT[,.(V4.Sum=sum(V4)), by=sign(V1-1)] #Calculate sum of V4 for every group in sign(V1-1)
   sign V4.Sum
1:    0     36
2:    1     42
> DT[,.(V4.Sum=sum(V4)), by=.(V1.01=sign(V1-1))] #The same as the above, with new name for the variable you’re grouping by
> DT[1:5,.(V4.Sum=sum(V4)), by=V1] #Calculate sum of V4 for every group in V1 after subsetting on the first 5 rows
> DT[,.N,by=V1] #Count number of rows for every group in V1

> Adding/Updating Columns By Reference in j Using :=
> DT[,V1:=round(exp(V1),2)] #V1 is updated by what is after :=
> DT #Return the result by calling DT
     V1 V2      V3 V4
1: 2.72  A -0.1107  1
2: 7.39  B -0.1427  2
3: 2.72  C -1.8893  3
4: 7.39  A -0.3571  4
...
> DT[,c("V1","V2"):=list(round(exp(V1),2), #Columns V1 & V2 are updated by what is after :=
                         rep(LETTERS[4:6], length.out=.N))]
> DT[,':='(V1=round(exp(V1),2), #Alternative to the above one. With [], you print the result to the screen
           V2=rep(LETTERS[4:6], length.out=.N))][]
        V1 V2      V3 V4
1:   15.18  D -0.1107  1
2: 1619.71  E -0.1427  2
3:   15.18  F -1.8893  3
4: 1619.71  D -0.3571  4
...
> DT[,V1:=NULL] #Remove V1
> DT[,c("V1","V2"):=NULL] #Remove columns V1 and V2
> Cols.chosen=c("A","B")
> DT[,Cols.chosen:=NULL] #Delete the column with column name Cols.chosen
> DT[,(Cols.chosen):=NULL] #Delete the columns specified in the variable Cols.chosen

> Indexing And Keys
> setkey(DT,V2) #A key is set on V2; output is returned invisibly
> DT["A"] #Return all rows where the key column (set to V2) has the value A
   V1 V2      V3 V4
1:  1  A -0.2392  1
2:  2  A -1.6148  4
3:  1  A  1.0498  7
4:  2  A  0.3262 10
> DT[c("A","C")] #Return all rows where the key column (V2) has value A or C
> DT["A",mult="first"] #Return first row of all rows that match value A in key column V2
> DT["A",mult="last"] #Return last row of all rows that match value A in key column V2
> DT[c("A","D")] #Return all rows where key column V2 has value A or D
   V1 V2      V3 V4
1:  1  A -0.2392  1
2:  2  A -1.6148  4
3:  1  A  1.0498  7
4:  2  A  0.3262 10
5: NA  D      NA NA
> DT[c("A","D"),nomatch=0] #Return all rows where key column V2 has value A or D, dropping values with no match
   V1 V2      V3 V4
1:  1  A -0.2392  1
2:  2  A -1.6148  4
3:  1  A  1.0498  7
4:  2  A  0.3262 10
> DT[c("A","C"),sum(V4)] #Return total sum of V4, for rows of key column V2 that have values A or C
> DT[c("A","C"),sum(V4),by=.EACHI] #Return sum of column V4 for rows of V2 that have value A, and another sum for rows of V2 that have value C
   V2 V1
1:  A 22
2:  C 30
> setkey(DT,V1,V2) #Sort by V1 and then by V2 within each group of V1 (invisible)
> DT[.(2,"C")] #Select rows that have value 2 for the first key (V1) & the value C for the second key (V2)
   V1 V2      V3 V4
1:  2  C  0.3262  6
2:  2  C -1.6148 12
> DT[.(2,c("A","C"))] #Select rows that have value 2 for the first key (V1) & within those rows the value A or C for the second key (V2)
   V1 V2      V3 V4
1:  2  A -1.6148  4
2:  2  A  0.3262 10
3:  2  C  0.3262  6
4:  2  C -1.6148 12

> Advanced Data Table Operations
> DT[.N-1] #Return the penultimate row of the DT
> DT[,.N] #Return the number of rows
> DT[,.(V2,V3)] #Return V2 and V3 as a data.table
> DT[,list(V2,V3)] #Return V2 and V3 as a data.table
> DT[,mean(V3),by=.(V1,V2)] #Return the result of j, grouped by every combination of the groups in by that occurs in the data
   V1 V2      V1
1:  1  A  0.4053
2:  1  B  0.4053
3:  1  C  0.4053
4:  2  A -0.6443
5:  2  B -0.6443
6:  2  C -0.6443

> .SD & .SDcols
> DT[,print(.SD),by=V2] #Look at what .SD contains
> DT[,.SD[c(1,.N)],by=V2] #Select the first and last row grouped by V2
> DT[,lapply(.SD,sum),by=V2] #Calculate sum of columns in .SD grouped by V2
> DT[,lapply(.SD,sum),by=V2, #Calculate sum of V3 and V4 in .SD grouped by V2
     .SDcols=c("V3","V4")]
   V2     V3 V4
1:  A -0.478 22
2:  B -0.478 26
3:  C -0.478 30
> DT[,lapply(.SD,sum),by=V2, #Calculate sum of V3 and V4 in .SD grouped by V2
     .SDcols=paste0("V",3:4)]

> Chaining
> DT <- DT[,.(V4.Sum=sum(V4)), by=V1] #Calculate sum of V4, grouped by V1
   V1 V4.Sum
1:  1     36
2:  2     42
> DT[V4.Sum>40] #Select the group whose sum is >40
> DT[,.(V4.Sum=sum(V4)), #Select the group whose sum is >40 (chaining)
     by=V1][V4.Sum>40]
   V1 V4.Sum
1:  2     42
> DT[,.(V4.Sum=sum(V4)), by=V1][order(-V1)] #Calculate sum of V4, grouped by V1, ordered on V1 (decreasing)
   V1 V4.Sum
1:  2     42
2:  1     36

> set()-Family
set()
Syntax: for (i in from:to) set(DT, row, column, new value)
> rows <- list(3:4,5:6)
> cols <- 1:2
> for(i in seq_along(rows))
  {set(DT,
       i=rows[[i]],
       j=cols[i],
       value=NA)}
#Sequence along the values of rows, and for the values of cols, set the values of those elements equal to NA (invisible)

setnames()
Syntax: setnames(DT,"old","new")[]
> setnames(DT,"V2","Rating") #Set name of V2 to Rating (invisible)
> setnames(DT, #Change 2 column names (invisible)
           c("V2","V3"),
           c("V2.rating","V3.DC"))

setcolorder()
Syntax: setcolorder(DT,"neworder")
> setcolorder(DT, #Change column ordering to contents of the specified vector (invisible)
              c("V2","V1","V4","V3"))
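The general form DT[i, j, by] ("subset rows using i, then calculate j grouped by by") can be checked by doing the two steps separately. The sketch below assumes data.table is installed. It builds a DT with the same 12 rows as the cheat sheet, but uses rep() so that the recycling of V1, V2 and V3 is explicit. The filter V4 > 6 and the names one_step, rows_kept and two_step are illustrative only.

```r
library(data.table)

# Same shape as the cheat sheet's DT; rep() spells out the recycling to 12 rows
set.seed(45L)
DT <- data.table(V1 = rep(c(1L, 2L), times = 6),
                 V2 = rep(LETTERS[1:3], times = 4),
                 V3 = rep(round(rnorm(4), 4), times = 3),
                 V4 = 1:12)

# One call: subset rows with i, then compute j within each group given by `by`
one_step <- DT[V4 > 6, .(V4.Sum = sum(V4)), by = V1]

# The same work written as two explicit steps
rows_kept <- DT[V4 > 6]
two_step  <- rows_kept[, .(V4.Sum = sum(V4)), by = V1]

print(one_step)
```

The subset happens before the grouping, so only rows 7 to 12 contribute. The sums are 7 + 9 + 11 = 27 for V1 = 1 and 8 + 10 + 12 = 30 for V1 = 2.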
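For the "Subsetting Rows Using i" examples, the logical expressions in i can also be combined. This sketch drops V3 because it is not needed, and spells out the recycling with rep(). It shows three things: %in% selects the same rows as a chain of == tests joined by |, & narrows the selection, and DT[3:5] matches DT[3:5,]. The variable names are illustrative.

```r
library(data.table)

DT <- data.table(V1 = rep(c(1L, 2L), times = 6),
                 V2 = rep(LETTERS[1:3], times = 4),
                 V4 = 1:12)

# %in% and an explicit OR of == tests select the same rows
in_form <- DT[V2 %in% c("A", "C")]
or_form <- DT[V2 == "A" | V2 == "C"]

# & keeps only rows that meet both conditions
a_and_2 <- DT[V2 == "A" & V1 == 2L]

# Integer row numbers in i, without the trailing comma
rows_3_5 <- DT[3:5]
```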
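In "Manipulating on Columns in j", whether j returns a vector or a data.table depends on whether the expression is wrapped in .(). A length-1 result inside .() is recycled against a full-length column. This is a minimal check, assuming the explicitly recycled DT; the result names are hypothetical.

```r
library(data.table)

set.seed(45L)
DT <- data.table(V1 = rep(c(1L, 2L), times = 6),
                 V2 = rep(LETTERS[1:3], times = 4),
                 V3 = rep(round(rnorm(4), 4), times = 3),
                 V4 = 1:12)

as_vector <- DT[, V2]                      # bare column name: a character vector
as_table  <- DT[, .(V2, V3)]               # wrapped in .(): a data.table
total_v1  <- DT[, sum(V1)]                 # a single number: six 1s plus six 2s
recycled  <- DT[, .(V1, Sd.V3 = sd(V3))]   # sd(V3) has length 1 and is recycled to 12 rows
```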
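The grouped sums in "Doing j by Group" follow directly from the data. V1 alternates 1, 2, so the odd rows give 1 + 3 + ... + 11 = 36 and the even rows give 2 + 4 + ... + 12 = 42. sign(V1-1) maps V1 = 1 to 0 and V1 = 2 to 1, so it produces the same two groups. The sketch below checks this and leaves out V3, which is not involved. It assumes data.table is installed.

```r
library(data.table)

DT <- data.table(V1 = rep(c(1L, 2L), times = 6),
                 V2 = rep(LETTERS[1:3], times = 4),
                 V4 = 1:12)

by_v1   <- DT[, .(V4.Sum = sum(V4)), by = V1]
by_sign <- DT[, .(V4.Sum = sum(V4)), by = .(V1.01 = sign(V1 - 1))]
counts  <- DT[, .N, by = V1]                       # .N: rows per group
by_both <- DT[, .(V4.Sum = sum(V4)), by = .(V1, V2)]
```

by_both has one row per (V1, V2) pair that occurs in the data: 2 x 3 = 6 rows here.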
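"By reference" in the := section means the table is changed in place rather than returned as a modified copy. As a result, a plain assignment such as alias <- DT does not protect the original, while copy() does. In this sketch, the names alias and snapshot are illustrative and V4 is multiplied by 10 only to make the change visible.

```r
library(data.table)

DT <- data.table(V1 = rep(c(1L, 2L), times = 6), V4 = 1:12)

alias    <- DT        # plain assignment: both names refer to the same table
snapshot <- copy(DT)  # copy(): an independent deep copy

DT[, V4 := V4 * 10L]  # update V4 in place; no reassignment of DT needed

print(alias$V4)       # shows the updated values too
print(snapshot$V4)    # still 1:12
```

The same applies inside functions: := on a data.table argument changes the caller's table, unless the function works on copy(DT).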
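The two Cols.chosen deletion lines differ only in the parentheses. A bare name on the left of := is taken literally as a column name. (Cols.chosen) is evaluated to obtain the column names. The table below is made up for the illustration and deliberately includes a column named Cols.chosen.

```r
library(data.table)

# Illustrative table with a column that shares the variable's name
DT <- data.table(Cols.chosen = 1:3, V1 = 4:6, V2 = 7:9, V3 = 10:12)
Cols.chosen <- c("V1", "V2")

DT[, Cols.chosen := NULL]    # bare name: removes the column called "Cols.chosen"
DT[, (Cols.chosen) := NULL]  # parenthesised: removes the columns listed in the variable

print(names(DT))
```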
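The "Indexing And Keys" calls can be checked on integer data, where the expected rows are easy to work out. setkey() sorts stably, so rows with the same key value keep their original order. This sketch assumes the explicitly recycled DT without V3. It uses 2L instead of 2 for the integer key V1, and the result names are hypothetical.

```r
library(data.table)

DT <- data.table(V1 = rep(c(1L, 2L), times = 6),
                 V2 = rep(LETTERS[1:3], times = 4),
                 V4 = 1:12)

setkey(DT, V2)                              # sort by V2 and mark it as the key
rows_a    <- DT["A"]                        # join on the key column
first_a   <- DT["A", mult = "first"]
last_a    <- DT["A", mult = "last"]
a_or_d    <- DT[c("A", "D")]                # "D" has no match: one row of NAs
a_or_d_0  <- DT[c("A", "D"), nomatch = 0]   # unmatched values are dropped
sums_each <- DT[c("A", "C"), sum(V4), by = .EACHI]

setkey(DT, V1, V2)                          # two-column key
two_c     <- DT[.(2L, "C")]
two_a_c   <- DT[.(2L, c("A", "C"))]         # 2L is recycled against "A" and "C"
```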
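Among the "Advanced Data Table Operations", .N is the row count: for the whole table when there is no by, and per group when there is one. That is why it works in i (DT[.N-1]) as well as in j. In this short check, V4 stands in for V3 so that no random numbers are involved; the names are illustrative.

```r
library(data.table)

DT <- data.table(V1 = rep(c(1L, 2L), times = 6),
                 V2 = rep(LETTERS[1:3], times = 4),
                 V4 = 1:12)

n_rows      <- DT[, .N]                        # .N in j: number of rows
penultimate <- DT[.N - 1]                      # .N in i: row 11 of 12
dot_form    <- DT[, .(V2, V4)]                 # .() is an alias for list()
list_form   <- DT[, list(V2, V4)]
group_means <- DT[, mean(V4), by = .(V1, V2)]  # one row per (V1, V2) pair present
```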
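In the ".SD & .SDcols" section, .SD holds the non-grouping columns of the current group, and .SDcols restricts which columns it contains. This sketch uses .SDcols = c("V1","V4") in place of V3 and V4 so that the group sums are exact integers that can be checked. It assumes the explicitly recycled DT.

```r
library(data.table)

DT <- data.table(V1 = rep(c(1L, 2L), times = 6),
                 V2 = rep(LETTERS[1:3], times = 4),
                 V4 = 1:12)

# Sum each selected column within each V2 group
col_sums <- DT[, lapply(.SD, sum), by = V2, .SDcols = c("V1", "V4")]

# First and last row of each V2 group
ends <- DT[, .SD[c(1, .N)], by = V2]
```

Group A holds rows 1, 4, 7 and 10, so its V4 sum is 22 and its first and last V4 values are 1 and 10.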
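The first "Chaining" example overwrites DT with the two-row summary. Chaining a second [...] onto the first gives the same result without that reassignment. This sketch compares the two routes on a minimal DT; the intermediate name sums is hypothetical.

```r
library(data.table)

DT <- data.table(V1 = rep(c(1L, 2L), times = 6), V4 = 1:12)

# Two separate steps with an intermediate table ...
sums    <- DT[, .(V4.Sum = sum(V4)), by = V1]
step2   <- sums[V4.Sum > 40]

# ... or one chained expression; DT itself is left unchanged
chained <- DT[, .(V4.Sum = sum(V4)), by = V1][V4.Sum > 40]
ordered <- DT[, .(V4.Sum = sum(V4)), by = V1][order(-V1)]
```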
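set() is a low-overhead, loopable version of := that assigns by reference without going through [.data.table. setnames() and setcolorder() also modify the table in place. This sketch runs the cheat sheet's set() loop on a small DT (V3 dropped, recycling spelled out with rep()) so that the NA positions can be checked. The final column order is an arbitrary illustration.

```r
library(data.table)

DT <- data.table(V1 = rep(c(1L, 2L), times = 6),
                 V2 = rep(LETTERS[1:3], times = 4),
                 V4 = 1:12)

rows <- list(3:4, 5:6)
cols <- 1:2
for (i in seq_along(rows)) {
  # pass 1: rows 3:4 of column 1 (V1); pass 2: rows 5:6 of column 2 (V2)
  set(DT, i = rows[[i]], j = cols[i], value = NA)
}

setnames(DT, "V2", "Rating")               # rename in place
setcolorder(DT, c("Rating", "V1", "V4"))   # reorder columns in place
```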
Learn Data Skills Online at www.DataCamp.com
