Data Transformation with data.table: Cheat Sheet

data.table is an R package for transforming and manipulating data efficiently. It converts R's native data frames into data.tables with enhanced functionality. With the syntax dt[i, j, by], users can subset rows, select and manipulate columns, and group and summarize data. data.tables provide fast operations for subsetting, grouping, joining, and updating data.

Data Transformation with data.table :: CHEAT SHEET

Basics

data.table is an extremely fast and memory-efficient package for transforming data in R. It works by converting R's native data frame objects into data.tables with new and enhanced functionality. The basics of working with data.tables are:

dt[i, j, by]

Take data.table dt, subset rows using i, and manipulate columns with j, grouped according to by.

data.tables are also data frames – functions that work with data frames therefore also work with data.tables.

Create a data.table

data.table(a = c(1, 2), b = c("a", "b")) – create a data.table from scratch. Analogous to data.frame().
setDT(df)* or as.data.table(df) – convert a data frame or a list to a data.table.

Subset rows using i

dt[1:2, ] – subset rows based on row numbers.
dt[a > 5, ] – subset rows based on the values in one or more columns.

LOGICAL OPERATORS TO USE IN i
<   <=   is.na()    %in%   |   %like%
>   >=   !is.na()   !      &   %between%

Manipulate columns with j

EXTRACT
dt[, c(2)] – select column(s) by number. Prefix column numbers with "-" to deselect.
dt[, .(b, c)] – select column(s) by name.

SUMMARIZE
dt[, .(x = sum(a))] – create a data.table with new columns based on the summarized values of rows.
Summary functions such as mean(), median(), min(), max(), etc. may be used to summarize rows.

ADD COLUMN
dt[, c := 1 + 2] – create a new column based on an expression.
dt[a == 1, c := 1 + 2] – create a new column based on an expression but only for a subset of rows; the remaining rows are set to NA.

Group according to by

dt[, j, by = .(a)] – group rows by the values in one or more columns.
Use "keyby = .(a)" for grouping and simultaneously sorting according to group column(s).

COMMON GROUPED OPERATIONS
dt[, .(sum_b = sum(b)), by = .(a)] – summarize rows within groups.
dt[, c := sum(b), by = .(a)] – create a new column whose values are computed within groups.
dt[, .SD[1], by = .(a)] – extract first row of groups.
dt[, .SD[.N], by = .(a)] – extract last row of groups.

Chaining

dt[…][…] – perform a sequence of data.table operations by chaining multiple "[]".

Functions for data.tables

ARRANGE ROWS
setorder(dt, a, -b) – arrange the rows of a data.table. Prefix variable names with "-" for descending order.

UNIQUE CASES
unique(dt, by = c("a", "b")) – extract a subset of the data based on a unique combination of values.
uniqueN(dt, by = c("a", "b")) – return the number of unique rows, based on columns specified in "by". Leave out "by" to use all columns.

* SET FUNCTIONS
data.table provides a collection of functions beginning with "set". They work without "<-" to alter data.tables in place. For instance, "setDT(dt)" works like "dt <- as.data.table(dt)" but without creating any copies in memory.
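
The i and j operations above can be tried on a small table. The following is a minimal sketch, assuming the data.table package is installed; the table dt and its values are hypothetical and exist only for illustration.

```r
library(data.table)

# Hypothetical example data (not from the cheat sheet).
dt <- data.table(a = c(1, 2, 6, 1), b = c(10, 20, 30, 40))

# Subset rows using i.
first_two <- dt[1:2, ]       # rows 1 and 2
big_a     <- dt[a > 5, ]     # rows where a is greater than 5

# Manipulate columns with j.
second_col <- dt[, c(2)]             # select by column number
col_b      <- dt[, .(b)]             # select by column name
total      <- dt[, .(x = sum(a))]    # one-row data.table with the sum of a

# Add a column by reference, only for rows where a == 1.
# Rows that do not match are filled with NA.
dt[a == 1, c := 1 + 2]
```

Note that := changes dt in place, so no assignment with "<-" is needed for the last step.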

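Grouping, chaining, and the ordering and uniqueness helpers can be combined in the same way. This is a sketch under the same assumptions (data.table installed, hypothetical data); copy() is used so that setorder() does not reorder the original table.

```r
library(data.table)

# Hypothetical example data with one duplicated (a, b) pair.
dt <- data.table(a = c(2, 1, 2, 1, 1), b = c(5, 3, 4, 3, 9))

# Summarize within groups; keyby also sorts the result by a.
sums <- dt[, .(sum_b = sum(b)), keyby = .(a)]

# First and last row of each group (groups appear in order of first occurrence).
first_rows <- dt[, .SD[1], by = .(a)]
last_rows  <- dt[, .SD[.N], by = .(a)]

# Chaining: summarize, then filter the summary.
chained <- dt[, .(sum_b = sum(b)), by = .(a)][sum_b > 10]

# Arrange rows in place on a copy: a ascending, b descending.
sorted <- copy(dt)
setorder(sorted, a, -b)

# Unique combinations of a and b, and how many there are.
uniq   <- unique(dt, by = c("a", "b"))
n_uniq <- uniqueN(dt, by = c("a", "b"))
```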
CC BY SA Erik Petrovski • [email protected] • www.petrovski.dk • Learn more with the data.table webpage or vignette • data.table version 1.11.4 • Updated: 2018-08
RENAME COLUMNS
setnames(dt, c("a", "b"), c("x", "y")) – rename multiple columns.

SET KEYS
setkey(dt, a, b) – set keys in a data.table to enable faster repeated lookups in specified columns using "dt[.(value), ]" or for merging without specifying merging columns "dt_a[dt_b]".

.SD

Refer to a Subset of the Data within a data.table with .SD.

MULTIPLE COLUMN TYPE CONVERSION
dt[, lapply(.SD, as.character), .SDcols = c("a", "b")] – convert designated columns to character. The converted columns are returned as a new data.table.

GROUP OPTIMA
dt[, .SD[which.max(a)], by = b] – within each group of b, select the row with the highest value of column a. Also works with which.min() and which(). Similar to .SD[.N] and .SD[1] under common grouped operations.

Combine data.tables

BIND
rbind(dt_a, dt_b) – combine rows of two data.tables.
cbind(dt_a, dt_b) – combine columns of two data.tables.
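
The .SD, key, rename, and bind entries can be exercised together on one small table. This is a minimal sketch, assuming data.table is installed; the table and its values are hypothetical.

```r
library(data.table)

# Hypothetical example data.
dt <- data.table(a = c(1, 2, 3, 4), b = c("u", "u", "v", "v"))

# Convert the designated columns to character; returns a new data.table.
as_char <- dt[, lapply(.SD, as.character), .SDcols = c("a", "b")]

# Group optima: the row with the highest a within each group of b.
top <- dt[, .SD[which.max(a)], by = b]

# Set a key on b, then look up rows by key value.
setkey(dt, b)
v_rows <- dt[.("v"), ]

# Rename columns in place.
setnames(dt, c("a", "b"), c("x", "y"))

# Bind rows and columns.
stacked <- rbind(dt, dt)
wide    <- cbind(dt, data.table(z = 1:4))
```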

JOIN
dt_a[dt_b, on = .(b = y)] – join two data.tables based on rows with equal values. setkey() can be used instead of "on".
dt_a[dt_b, on = .(b = y, c > z)] – join two data.tables based on rows with equal and unequal values.

ROLLING JOIN
By default, a rolling join matches rows, defined by an id variable, but only keeps the most recent preceding match with the left table, defined by a date variable.

dt_a:
a  id  date
1  A   01-01-2010
2  A   01-01-2012
3  A   01-01-2014
1  B   01-01-2010
2  B   01-01-2012

dt_b:
b  id  date
1  A   01-01-2013
1  B   01-01-2013

Result:
a  id  date        b
2  A   01-01-2013  1
2  B   01-01-2013  1

# first set keys
setkey(dt_a, id, date)
setkey(dt_b, id, date)
# then roll
dt_a[dt_b, roll = TRUE]

dt_a[dt_b, roll = -Inf] – reverse the direction of the rolling join.

Reshape a data.table

DCAST
dcast(dt, id ~ y, value.var = c("a", "b")) – reshape a data.table from long to wide format.

Long:
id  y  a  b
A   X  1  3
A   Z  2  4
B   X  1  3
B   Z  2  4

Wide:
id  a_X  a_Z  b_X  b_Z
A   1    2    3    4
B   1    2    3    4

dt – a data.table.
id ~ y – formula with a LHS: id column(s) containing id(s) for multiple entries. And a RHS: column(s) with value(s) to spread in column headers.
value.var – column(s) containing values to fill into cells.

MELT
melt(dt, id.vars = c("id"), measure.vars = patterns("^a", "^b"), variable.name = "y", value.name = c("a", "b")) – reshape a data.table from wide to long format.

Wide:
id  a_X  a_Z  b_X  b_Z
A   1    2    3    4
B   1    2    3    4

Long:
id  y  a  b
A   1  1  3
B   1  1  3
A   2  2  4
B   2  2  4

dt – a data.table.
id.vars – id column(s) with id(s) for multiple entries.
measure.vars – column(s) containing values to fill into cells (often in pattern form).
variable.name, value.name – name(s) of new column(s) for variables and values derived from old headers.

fread & fwrite

fread & fwrite are data.table's fast and multithreaded functions for importing from and exporting to flat files – such as csv and tsv.

IMPORT
fread("file.csv") – read a flat file into R.
fread("file.csv", select = c("a", "b")) – read two columns named "a" and "b" from a file named "file.csv" in the working directory.

EXPORT
fwrite(dt, file = "") – write a data.table to a flat file. Give the output path in "file"; an empty string prints to the console.

MULTITHREADING
setDTthreads() – set the number of threads that data.table functions such as fread may use. In version 1.11.4 the default is all available threads, as appropriate for the task at hand.
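
The join entries can be checked on small tables. The sketch below assumes data.table is installed. The rolling join reuses the dates from the example in this section, written in ISO format, and passes the join columns through "on" instead of setting keys so that the matching columns are explicit; the table names join_a, join_b, roll_a, and roll_b are hypothetical.

```r
library(data.table)

# Equi join: match dt_a's column b against dt_b's column y.
join_a <- data.table(a = 1:3, b = c("c", "a", "b"))
join_b <- data.table(x = c(3, 2, 1), y = c("b", "c", "a"))
joined <- join_a[join_b, on = .(b = y)]   # one result row per row of join_b

# Rolling join data: observations in roll_a, lookup dates in roll_b.
roll_a <- data.table(
  a    = c(1, 2, 3, 1, 2),
  id   = c("A", "A", "A", "B", "B"),
  date = as.IDate(c("2010-01-01", "2012-01-01", "2014-01-01",
                    "2010-01-01", "2012-01-01"))
)
roll_b <- data.table(
  b    = c(1, 1),
  id   = c("A", "B"),
  date = as.IDate(c("2013-01-01", "2013-01-01"))
)

# Roll forward: take the most recent preceding observation per id.
rolled <- roll_a[roll_b, on = .(id, date), roll = TRUE]

# Roll backward: take the next following observation per id, if any.
rolled_back <- roll_a[roll_b, on = .(id, date), roll = -Inf]
```

With roll = -Inf, id B has no observation after the lookup date, so its value of a is NA.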

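Reshaping and file input/output can be sketched as a round trip. The example assumes data.table is installed, uses the long-format values from the dcast example above, and writes to a temporary file rather than a hypothetical "file.csv".

```r
library(data.table)

# Long-format data as in the dcast example.
long <- data.table(
  id = c("A", "A", "B", "B"),
  y  = c("X", "Z", "X", "Z"),
  a  = c(1, 2, 1, 2),
  b  = c(3, 4, 3, 4)
)

# Long to wide: one column per combination of value.var and y.
wide <- dcast(long, id ~ y, value.var = c("a", "b"))

# Wide to long: columns starting with "a" and "b" are stacked into a and b.
# The new y column holds the position within each pattern (1, 2), not X and Z.
back <- melt(wide, id.vars = "id",
             measure.vars = patterns("^a", "^b"),
             variable.name = "y", value.name = c("a", "b"))

# Write to a temporary csv file and read back only two columns.
tmp <- tempfile(fileext = ".csv")
fwrite(long, tmp)
selected <- fread(tmp, select = c("id", "a"))
unlink(tmp)
```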