data.table Cheat Sheet
R For Data Science

Doing j by Group

> DT[,.(V4.Sum=sum(V4)),by=V1] #Calculate sum of V4 for every group in V1
   V1 V4.Sum
1:  1     36
2:  2     42
   sign V4.Sum
#The same as the above, with new name for the variable you’re grouping by
> DT[1:5,.(V4.Sum=sum(V4)),by=V1] #Calculate sum of V4 for every group in V1 after subsetting on the first 5 rows
> DT[,.N,by=V1] #Count number of rows for every group in V1

Advanced Data Table Operations

> DT[.N-1] #Return the penultimate row of the DT
> DT[,list(V2,V3)] #Return V2 and V3 as a data.table

   V1 V2      V1
1:  1  A  0.4053
2:  1  B  0.4053
3:  1  C  0.4053
5:  2  B -0.6443
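
The grouping calls in this section can be tried on a small table. The sketch below is illustrative only: the table contents and the grouping expression `sign(V1 - 1)` are assumptions chosen so that the grouping column is named `sign`, and the result names (`by_v1`, `by_sign`, `by_named`, `first5`, `counts`) are hypothetical.

```r
library(data.table)

# Assumed example table (not taken from the sheet): V1 alternates 1/2, V4 is 1..12.
DT <- data.table(V1 = rep(c(1L, 2L), 6),
                 V2 = rep(LETTERS[1:3], 4),
                 V4 = 1:12)

# Sum of V4 for every group in V1.
by_v1 <- DT[, .(V4.Sum = sum(V4)), by = V1]

# Group by an expression; sign(V1 - 1) is an assumed choice that maps V1 to 0/1.
by_sign <- DT[, .(V4.Sum = sum(V4)), by = sign(V1 - 1)]

# Same grouping, with an explicit name for the grouping column.
by_named <- DT[, .(V4.Sum = sum(V4)), by = .(V1.01 = sign(V1 - 1))]

# Subset the first 5 rows in i, then group.
first5 <- DT[1:5, .(V4.Sum = sum(V4)), by = V1]

# Number of rows per group.
counts <- DT[, .N, by = V1]
```

Subsetting in `i` happens before grouping, so `first5` only sums the rows that survive the subset.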
data.table is an R package that provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.
> library(data.table)

General form: DT[i, j, by]
“Take DT, subset rows using i, then calculate j grouped by by”

Creating A data.table

> DT <- data.table(..., V3=round(rnorm(4),4), ...)

Adding/Updating Columns By Reference in j Using :=

> DT[,V1:=round(exp(V1),2)]
     V1 V2      V3 V4
1: 2.72  A -0.1107  1
2: 7.39  B -0.1427  2
3: 2.72  C -1.8893  3
4: 7.39  A -0.3571  4
...
> DT[,c("V1","V2"):=list(round(exp(V1),2),
                         LETTERS[4:6])]
#Alternative to the above one. With [], you print the result to the screen
> DT[,':='(V1=round(exp(V1),2),
           V2=LETTERS[4:6])][]
        V1 V2      V3 V4
2: 1619.71  E -0.1427  2
3:   15.18  F -1.8893  3
> Cols.chosen=c("A","B")

.SD & .SDcols

> DT[,print(.SD),by=V2] #Look at what .SD contains
> DT[,.SD[c(1,.N)],by=V2] #Select the first and last row grouped by V2
> DT[,lapply(.SD,sum),by=V2, #Calculate sum of V3 and V4 in .SD grouped by V2
     .SDcols=c("V3","V4")]
   V2     V3 V4
1:  A -0.478 22
2:  B -0.478 26
3:  C -0.478 30
> DT[,lapply(.SD,sum),by=V2, #Calculate sum of V3 and V4 in .SD grouped by V2
     .SDcols=paste0("V",3:4)]

> setkey(DT,V2) #A key is set on V2; output is returned invisibly
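
As a minimal sketch of updating by reference and of `.SD`/`.SDcols`, the block below uses an assumed example table (its values are not the ones printed in the sheet). `V1` is stored as double so that the rounded `exp()` results fit the column type, and `cols_chosen`, `sums` and `ends` are hypothetical names.

```r
library(data.table)

# Assumed example table; V3 values are made up for illustration.
DT <- data.table(V1 = rep(c(1, 2), 6),
                 V2 = rep(LETTERS[1:3], 4),
                 V3 = rep(c(0.5, -0.5, 1.5, -1.5), 3),
                 V4 = 1:12)

# := updates DT in place; nothing needs to be assigned back.
DT[, V1 := round(exp(V1), 2)]

# .SD holds the rows of the current group; .SDcols limits it to the named columns.
sums <- DT[, lapply(.SD, sum), by = V2, .SDcols = c("V3", "V4")]

# First and last row of each V2 group.
ends <- DT[, .SD[c(1, .N)], by = V2]

# Remove the columns whose names are stored in a character vector:
# the parentheses make data.table evaluate the variable instead of
# treating cols_chosen as a column name.
cols_chosen <- c("V3", "V4")
DT[, (cols_chosen) := NULL]
```

Without the parentheses, `DT[, cols_chosen := NULL]` would look for a column literally called `cols_chosen`.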
> DT[c("A","C")] #Return all rows where the key column (V2) has value A or C
> DT[V2 %in% c("A","C")] #Select all rows that have value A or C in column V2
   V1 V2      V3 V4
1:  1  A -0.2392  1
2:  2  A -1.6148  4
3:  1  A  1.0498 NA
> DT[c("A","D"),nomatch=0] #Return all rows where key column V2 has value A or D; nomatch=0 drops values without a match
   V1 V2      V3 V4
1:  1  A -0.2392  1
#Return total sum of V4, for rows of key column V2 that have values A or C
> DT[c("A","C"),sum(V4)]
#Return sum of V4 for rows of V2 that have value A, and another sum for rows of V2 that have value C
> DT[c("A","C"),sum(V4),by=.EACHI]
   V2 V1
1:  A 22
2:  C 30

> DT[,sum(V1)]
[1] 18
#Return the sum of all elements of V1 and the std. dev. of V3 in a data.table
> DT[,.(sum(V1),sd(V3))]
   V1        V2
1: 18 0.4546055
> DT[,.(Aggregate=sum(V1),
        Sd.V3=sd(V3))]
   Aggregate     Sd.V3
1:        18 0.4546055
#Select column V1 and compute std. dev. of V3, which returns a single value & gets recycled
> DT[,.(V1,Sd.V3=sd(V3))]

set()-Family

set()
#Sequence along the values of rows, and for the values of cols, set those elements to NA
> for(i in seq_along(rows))
   {set(DT,
        i=rows[[i]],
        j=cols[i],
        value=NA)}
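
The key lookups and the `set()` loop can be sketched end to end as follows. The table is an assumed stand-in for the sheet's `DT`, and the values given to `rows` and `cols` are assumptions too, since the sheet's own definitions are not shown here; `cols` deliberately avoids the key column.

```r
library(data.table)

# Assumed example table, standing in for the sheet's DT.
DT <- data.table(V1 = rep(c(1L, 2L), 6),
                 V2 = rep(LETTERS[1:3], 4),
                 V4 = 1:12)

setkey(DT, V2)  # sorts DT by V2 and marks V2 as the key

total  <- DT[c("A", "C"), sum(V4)]               # one sum over all matched rows
each   <- DT[c("A", "C"), sum(V4), by = .EACHI]  # one sum per value looked up in i
padded <- DT[c("A", "D")]                        # D has no match: one row of NA
inner  <- DT[c("A", "D"), nomatch = 0]           # the unmatched D is dropped

# set() assigns by reference inside a loop; rows and cols are assumed values.
rows <- list(3:4, 5:6)
cols <- c(1L, 3L)  # V1 and V4
for (i in seq_along(rows)) {
  set(DT, i = rows[[i]], j = cols[i], value = NA)
}
```

With `by = .EACHI`, `j` is evaluated once per row of `i`, which is why `each` has one line for A and one for C while `total` is a single number.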
setnames()
Syntax: setnames(DT,"old","new")[]
> setnames(DT,
           c("V2","V3"),
           c("V2.rating","V3.DC"))

setcolorder()
Syntax: setcolorder(DT,"neworder")
> setcolorder(DT, ...) #Change column ordering to contents of the specified vector (invisible)

#Select rows that have value 2 for the first key (V1) & the value C for the second key (V2)
> DT[.(2,"C")]
#Select rows that have value 2 for the first key (V1) & within those rows the value A or C for the second key (V2)
> DT[.(2,c("A","C"))]
   V1 V2      V3 V4
1:  2  A -1.6148  4
3:  2  C  0.3262  6
4:  2  C -1.6148 12

Chaining

> DT[,.(V4.Sum=sum(V4)),by=V1]
   V1 V4.Sum
1:  1     36
2:  2     42
> DT[,.(V4.Sum=sum(V4)),
     by=V1][V4.Sum>40]
   V1 V4.Sum
1:  2     42

   V1 V4.Sum
1:  2     42
2:  1     36
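
A minimal sketch of chaining, two-column keys and the renaming helpers, again on an assumed example table. The `order(-V1)` step, the new column names and the column order passed to `setcolorder()` are illustrative choices, not taken from the sheet.

```r
library(data.table)

# Assumed example table, standing in for the sheet's DT.
DT <- data.table(V1 = rep(c(1L, 2L), 6),
                 V2 = rep(LETTERS[1:3], 4),
                 V4 = 1:12)

# Chaining: aggregate, then filter the aggregate in a second [].
big <- DT[, .(V4.Sum = sum(V4)), by = V1][V4.Sum > 40]

# Chaining: aggregate, then sort by V1 in descending order.
desc <- DT[, .(V4.Sum = sum(V4)), by = V1][order(-V1)]

# Two-column key: .() supplies one value (or vector) per key column.
setkey(DT, V1, V2)
two_c  <- DT[.(2, "C")]
two_ac <- DT[.(2, c("A", "C"))]

# Rename and reorder columns by reference (both return DT invisibly).
setnames(DT, c("V2", "V4"), c("V2.rating", "V4.DC"))
setcolorder(DT, c("V2.rating", "V1", "V4.DC"))
```

Each `[]` in a chain works on the result of the previous one, so `V4.Sum` is available to the second bracket even though it is not a column of `DT`.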
Learn Data Skills Online at www.DataCamp.com