data.table Cheat Sheet (R For Data Science)

This document summarizes key operations and syntax for working with data.tables in R. Key points covered include: 1) calculating summaries such as sums and means on columns grouped by other columns, using j and by; 2) subsetting rows and selecting columns using i, j, and by; 3) updating columns by reference in j using :=, which modifies the data.table in place; 4) the general syntax for data.table operations, DT[i, j, by]; 5) working with .SD (Subset of Data) to access the columns within each group.
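The points in the summary can be tried out with a short sketch like the one below. It assumes the data.table package is installed, rebuilds the example table used in the cheat sheet, and introduces a column V5 and the result names `res` and `sums` purely for illustration; none of those three names appear in the original.

```r
library(data.table)

# Same table as in the cheat sheet; shorter vectors are recycled to 12 rows.
set.seed(45L)
DT <- data.table(V1 = c(1L, 2L),
                 V2 = LETTERS[1:3],
                 V3 = round(rnorm(4), 4),
                 V4 = 1:12)

# i, j and by together: keep rows with V4 > 2, then sum V4 within each V1 group.
res <- DT[V4 > 2, .(V4.Sum = sum(V4)), by = V1]

# := adds a column by reference: DT itself is modified, no reassignment needed.
# V5 is a hypothetical column used only in this sketch.
DT[, V5 := V4 * 2L]

# .SD holds the columns of each group; .SDcols restricts it to the named ones.
sums <- DT[, lapply(.SD, sum), by = V2, .SDcols = c("V4", "V5")]
```

Because `:=` works by reference, the call that creates V5 returns its result invisibly; appending `[]` to it would print the updated table.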

Uploaded by

mohitosh deb

R For Data Science
data.table Cheat Sheet
Learn data.table online at www.DataCamp.com

data.table

data.table is an R package that provides a high-performance version of base R’s data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.

Load the package:
> library(data.table)

General form: DT[i, j, by]
“Take DT, subset rows using i, then calculate j grouped by by”

> Creating A data.table

> set.seed(45L) #Create a data.table and call it DT
> DT <- data.table(V1=c(1L,2L),
                   V2=LETTERS[1:3],
                   V3=round(rnorm(4),4),
                   V4=1:12)

> Subsetting Rows Using i

> DT[3:5,] #Select 3rd to 5th row
> DT[3:5] #Select 3rd to 5th row
> DT[V2=="A"] #Select all rows that have value A in column V2
> DT[V2 %in% c("A","C")] #Select all rows that have value A or C in column V2

> Manipulating on Columns in j

> DT[,V2] #Return V2 as a vector
[1] "A" "B" "C" "A" "B" "C" ...
> DT[,.(V2,V3)] #Return V2 and V3 as a data.table
> DT[,sum(V1)] #Return the sum of all elements of V1 in a vector
[1] 18
#Return the sum of all elements of V1 and the std. dev. of V3 in a data.table
> DT[,.(sum(V1),sd(V3))]
   V1        V2
1: 18 0.4546055
> DT[,.(Aggregate=sum(V1), #The same as the above, with new names
        Sd.V3=sd(V3))]
   Aggregate     Sd.V3
1:        18 0.4546055
#Select column V1 and compute std. dev. of V3, which returns a single value & gets recycled
> DT[,.(V1,Sd.V3=sd(V3))]
> DT[,.(print(V2), #Print column V2 and plot V3
        plot(V3),
        NULL)]

> Doing j by Group

> DT[,.(V4.Sum=sum(V4)),by=V1] #Calculate sum of V4 for every group in V1
   V1 V4.Sum
1:  1     36
2:  2     42
> DT[,.(V4.Sum=sum(V4)), by=.(V1,V2)] #Calculate sum of V4 for every group in V1 and V2
> DT[,.(V4.Sum=sum(V4)), by=sign(V1-1)] #Calculate sum of V4 for every group in sign(V1-1)
   sign V4.Sum
1:    0     36
2:    1     42
#The same as the above, with new name for the variable you’re grouping by
> DT[,.(V4.Sum=sum(V4)), by=.(V1.01=sign(V1-1))]
#Calculate sum of V4 for every group in V1 after subsetting on the first 5 rows
> DT[1:5,.(V4.Sum=sum(V4)), by=V1]
> DT[,.N,by=V1] #Count number of rows for every group in V1

> Adding/Updating Columns By Reference in j Using :=

> DT[,V1:=round(exp(V1),2)] #V1 is updated by what is after :=
> DT #Return the result by calling DT
     V1 V2      V3 V4
1: 2.72  A -0.1107  1
2: 7.39  B -0.1427  2
3: 2.72  C -1.8893  3
4: 7.39  A -0.3571  4
...
> DT[,c("V1","V2"):=list(round(exp(V1),2), #Columns V1 & V2 are updated by what is after :=
                         LETTERS[4:6])]
#Alternative to the above one. With [], you print the result to the screen
> DT[,':='(V1=round(exp(V1),2),
           V2=LETTERS[4:6])][]
        V1 V2      V3 V4
1:   15.18  D -0.1107  1
2: 1619.71  E -0.1427  2
3:   15.18  F -1.8893  3
4: 1619.71  D -0.3571  4
> DT[,V1:=NULL] #Remove V1
> DT[,c("V1","V2"):=NULL] #Remove columns V1 and V2
> Cols.chosen=c("A","B")
> DT[,Cols.chosen:=NULL] #Delete the column with column name Cols.chosen
> DT[,(Cols.chosen):=NULL] #Delete the columns specified in the variable Cols.chosen

> Indexing And Keys

> setkey(DT,V2) #A key is set on V2; output is returned invisibly
> DT["A"] #Return all rows where the key column (set to V2) has the value A
   V1 V2      V3 V4
1:  1  A -0.2392  1
2:  2  A -1.6148  4
3:  1  A  1.0498  7
4:  2  A  0.3262 10
> DT[c("A","C")] #Return all rows where the key column (V2) has value A or C
> DT["A",mult="first"] #Return first row of all rows that match value A in key column V2
> DT["A",mult="last"] #Return last row of all rows that match value A in key column V2
> DT[c("A","D")] #Return all rows where key column V2 has value A or D
   V1 V2      V3 V4
1:  1  A -0.2392  1
2:  2  A -1.6148  4
3:  1  A  1.0498  7
4:  2  A  0.3262 10
5: NA  D      NA NA
> DT[c("A","D"),nomatch=0] #Return all rows where key column V2 has value A or D, dropping values without a match
   V1 V2      V3 V4
1:  1  A -0.2392  1
2:  2  A -1.6148  4
3:  1  A  1.0498  7
4:  2  A  0.3262 10
#Return total sum of V4, for rows of key column V2 that have values A or C
> DT[c("A","C"),sum(V4)]
#Return sum of column V4 for rows of V2 that have value A,
#and another sum for rows of V2 that have value C
> DT[c("A","C"),sum(V4),by=.EACHI]
   V2 V1
1:  A 22
2:  C 30
> setkey(DT,V1,V2) #Sort by V1 and then by V2 within each group of V1 (invisible)
#Select rows that have value 2 for the first key (V1) &
#the value C for the second key (V2)
> DT[.(2,"C")]
   V1 V2      V3 V4
1:  2  C  0.3262  6
2:  2  C -1.6148 12
#Select rows that have value 2 for the first key (V1) &
#within those rows the value A or C for the second key (V2)
> DT[.(2,c("A","C"))]
   V1 V2      V3 V4
1:  2  A -1.6148  4
2:  2  A  0.3262 10
3:  2  C  0.3262  6
4:  2  C -1.6148 12

> Advanced Data Table Operations

> DT[.N-1] #Return the penultimate row of the DT
> DT[,.N] #Return the number of rows
> DT[,.(V2,V3)] #Return V2 and V3 as a data.table
> DT[,list(V2,V3)] #Return V2 and V3 as a data.table
#Return the result of j, grouped by all possible combinations of groups specified in by
> DT[,mean(V3),by=.(V1,V2)]
   V1 V2      V1
1:  1  A  0.4053
2:  1  B  0.4053
3:  1  C  0.4053
4:  2  A -0.6443
5:  2  B -0.6443
6:  2  C -0.6443

.SD & .SDcols

> DT[,print(.SD),by=V2] #Look at what .SD contains
> DT[,.SD[c(1,.N)],by=V2] #Select the first and last row grouped by V2
> DT[,lapply(.SD,sum),by=V2] #Calculate sum of columns in .SD grouped by V2
> DT[,lapply(.SD,sum),by=V2, #Calculate sum of V3 and V4 in .SD grouped by V2
     .SDcols=c("V3","V4")]
   V2     V3 V4
1:  A -0.478 22
2:  B -0.478 26
3:  C -0.478 30
> DT[,lapply(.SD,sum),by=V2, #Calculate sum of V3 and V4 in .SD grouped by V2
     .SDcols=paste0("V",3:4)]

> Chaining

> DT <- DT[,.(V4.Sum=sum(V4)), by=V1] #Calculate sum of V4, grouped by V1
   V1 V4.Sum
1:  1     36
2:  2     42
> DT[V4.Sum>40] #Select that group of which the sum is >40
> DT[,.(V4.Sum=sum(V4)), #Select that group of which the sum is >40 (chaining)
     by=V1][V4.Sum>40]
   V1 V4.Sum
1:  2     42
> DT[,.(V4.Sum=sum(V4)), by=V1][order(-V1)] #Calculate sum of V4, grouped by V1, ordered on V1 (descending)
   V1 V4.Sum
1:  2     42
2:  1     36

> set()-Family

set()

Syntax: for (i in from:to) set(DT, row, column, new value)
> rows <- list(3:4,5:6)
> cols <- 1:2
#Sequence along the values of rows, and for the values of cols,
#set the values of those elements equal to NA (invisible)
> for(i in seq_along(rows))
   {set(DT,
        i=rows[[i]],
        j=cols[i],
        value=NA)}

setnames()

Syntax: setnames(DT,"old","new")[]
> setnames(DT,"V2","Rating") #Set name of V2 to Rating (invisible)
> setnames(DT, #Change 2 column names (invisible)
           c("V2","V3"),
           c("V2.rating","V3.DC"))

setcolorder()

Syntax: setcolorder(DT,"neworder")
> setcolorder(DT, #Change column ordering to contents of the specified vector (invisible)
              c("V2","V1","V4","V3"))
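The keyed lookups, chaining, and set() calls described in the cheat sheet can be exercised together in a minimal sketch like the following. It assumes the data.table package is installed and uses a reduced version of the example table without the random V3 column, so that every result is deterministic; the result names (`with_na`, `matched`, `each`, `big`) are illustrative and not part of the original.

```r
library(data.table)

# Reduced version of the cheat sheet table (no random V3 column).
DT <- data.table(V1 = c(1L, 2L),
                 V2 = LETTERS[1:3],
                 V4 = 1:12)

# Setting a key sorts DT by V2 and enables lookups by key value in i.
setkey(DT, V2)

# "D" is not a value of V2: by default it yields a row of NAs,
# while nomatch=0 drops it from the result.
with_na <- DT[c("A", "D")]
matched <- DT[c("A", "D"), nomatch = 0]

# by=.EACHI evaluates j once for each value passed in i.
each <- DT[c("A", "C"), sum(V4), by = .EACHI]

# Chaining: the second [] operates on the result of the first.
big <- DT[, .(V4.Sum = sum(V4)), by = V1][V4.Sum > 40]

# set() assigns by reference; here the first two rows of V4 become NA.
set(DT, i = 1:2, j = "V4", value = NA_integer_)
```

The set() call is placed last because it changes DT in place; running it earlier would alter the sums computed by the other statements.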
