Data Table

This cheat sheet provides a comprehensive overview of the data.table package in R, highlighting its efficiency and functionality for data manipulation. It covers key operations such as subsetting, grouping, summarizing, reshaping, and joining data.tables, along with examples of syntax. Additionally, it includes methods for reading from and writing to files, as well as converting data frames to data.tables.

Uploaded by Cirill Mikhaliev

Data Transformation with data.table :: CHEAT SHEET

Basics
data.table is an extremely fast and memory efficient package for transforming data in R. It works by converting R’s native data frame objects into data.tables with new and enhanced functionality. The basics of working with data.tables are:

dt[i, j, by]

Take data.table dt, subset rows using i and manipulate columns with j, grouped according to by.

data.tables are also data frames – functions that work with data frames therefore also work with data.tables.

Manipulate columns with j

EXTRACT
dt[, c(2)] – extract columns by number. Prefix column numbers with “-” to drop.
dt[, .(b, c)] – extract columns by name.

SUMMARIZE
dt[, .(x = sum(a))] – create a data.table with new columns based on the summarized values of rows.

Group according to by
dt[, j, by = .(a)] – group rows by values in specified columns.
dt[, j, keyby = .(a)] – group and simultaneously sort rows by values in specified columns.

COMMON GROUPED OPERATIONS
dt[, .(c = sum(b)), by = a] – summarize rows within groups.
dt[, c := sum(b), by = a] – create a new column and compute rows within groups.
dt[, .SD[1], by = a] – extract first row of groups.
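Here is a minimal sketch of dt[i, j, by] in practice. It assumes the data.table package is installed, and the toy table dt and its values are hypothetical. The sketch contrasts by, which keeps groups in order of first appearance, with keyby, which sorts the groups and keys the result. It also shows the .SD first-row and last-row idioms:

```r
library(data.table)

# Hypothetical toy data using the cheat sheet's column names a and b
dt <- data.table(a = c(2, 1, 2, 1), b = c(10, 20, 30, 40))

# dt[i, j, by]: keep rows with b > 10, then sum b within each value of a
sub_sums <- dt[b > 10, .(c = sum(b)), by = a]

# by keeps groups in order of first appearance; keyby sorts them and sets a key
by_sums    <- dt[, .(c = sum(b)), by = a]
keyby_sums <- dt[, .(c = sum(b)), keyby = a]

# First and last row of each group via .SD (the group's subset of columns)
first_rows <- dt[, .SD[1], by = a]
last_rows  <- dt[, .SD[.N], by = a]

# := adds a grouped column to dt itself (no copy, no assignment with <-)
dt[, c := sum(b), by = a]

print(keyby_sums)
print(dt)
```

Because := works by reference, the last call changes dt itself. The earlier results are separate data.tables and are not affected.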

dt[, .SD[.N], by = a] – extract last row of groups.

Summary functions like mean(), median(), min(), max(), etc. can be used to summarize rows.

Create a data.table
data.table(a = c(1, 2), b = c("a", "b")) – create a data.table from scratch. Analogous to data.frame().
setDT(df)* or as.data.table(df) – convert a data frame or a list to a data.table.

Chaining
dt[…][…] – perform a sequence of data.table operations by chaining multiple “[]”.

COMPUTE COLUMNS*
dt[, c := 1 + 2] – compute a column based on an expression.
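The entries for creating a data.table, computing a column with := and chaining can be combined in one short script. This is a sketch on hypothetical toy data and assumes the data.table package is installed:

```r
library(data.table)

# Create a data.table from scratch (analogous to data.frame())
dt <- data.table(a = c(1, 2, 1), b = c("x", "y", "z"))

# setDT() converts a data.frame (or list) in place; as.data.table() would return a copy
df <- data.frame(a = 1:3, b = c(4, 5, 6))
setDT(df)

# := adds a computed column to dt itself, without "<-"
dt[, c := 1 + 2]

# Chaining: the second [...] runs on the result of the first
res <- dt[a == 1][, .(n = .N, total = sum(c))]
print(res)
```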

dt[a == 1, c := 1 + 2] – compute a column based on an expression but only for a subset of rows.
dt[, `:=`(c = 1, d = 2)] – compute multiple columns based on separate expressions.

DELETE COLUMN
dt[, c := NULL] – delete a column.

Functions for data.tables

REORDER
setorder(dt, a, -b) – reorder a data.table according to specified columns. Prefix column names with “-” for descending order.

Subset rows using i
dt[1:2, ] – subset rows based on row numbers.
dt[a > 5, ] – subset rows based on values in one or more columns.
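This sketch puts row subsetting in i together with conditional :=, the multi-column `:=` form, column deletion and setorder(). The table and its values are hypothetical:

```r
library(data.table)

# Hypothetical toy data
dt <- data.table(a = c(1, 7, 1, 9), b = c(4, 3, 2, 1))

first_two <- dt[1:2, ]     # rows by number
big_a     <- dt[a > 5, ]   # rows by a condition on a column

# := restricted to rows where a == 1; the other rows get NA in the new column
dt[a == 1, c := 1 + 2]

# Several columns at once with the functional form of :=
dt[, `:=`(d = 1, e = 2)]

# Delete a column by assigning NULL
dt[, e := NULL]

# Sort in place: ascending by a, then descending by b
setorder(dt, a, -b)
print(dt)
```

Subsets such as first_two are new data.tables. The later in-place changes to dt therefore do not reach them.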

LOGICAL OPERATORS TO USE IN i
<   <=   is.na()    %in%   |   %like%
>   >=   !is.na()   !      &   %between%

CONVERT COLUMN TYPE
dt[, b := as.integer(b)] – convert the type of a column using as.integer(), as.numeric(), as.character(), as.Date(), etc.

* SET FUNCTIONS AND :=
data.table’s functions prefixed with “set” and the operator “:=” work without “<-” to alter data without making copies in memory. E.g., the more efficient “setDT(df)” is analogous to “df <- as.data.table(df)”.
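Here is a sketch of several logical operators in i and an in-place type conversion. The columns id and b and their values are hypothetical:

```r
library(data.table)

# Hypothetical toy data; b starts out as character
dt <- data.table(id = c("ab1", "cd2", "ab3", NA),
                 b  = c("1", "5", "12", "7"))

# Convert b from character to integer in place
dt[, b := as.integer(b)]

# Logical operators combine freely in i
in_range <- dt[b %between% c(2, 10)]          # 2 <= b <= 10, bounds included
prefixed <- dt[id %like% "^ab"]               # regular-expression match on id
listed   <- dt[b %in% c(1, 12) & !is.na(id)]  # membership plus a missing-value check
print(in_range)
```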

Created by Erik Petrovsky and Mara Destefanis – [email protected] • Learn more with the data.table homepage or vignette • data.table version 1.15.0 • Updated: 2024-01
UNIQUE ROWS
unique(dt, by = c("a", "b")) – extract unique rows based on columns specified in “by”. Leave out “by” to use all columns.
uniqueN(dt, by = c("a", "b")) – count the number of unique rows based on columns specified in “by”.

BIND
rbind(dt_a, dt_b) – combine rows of two data.tables.
cbind(dt_a, dt_b) – combine columns of two data.tables.

Apply function to cols.

APPLY A FUNCTION TO MULTIPLE COLUMNS
dt[, lapply(.SD, mean), .SDcols = c("a", "b")] – apply a function (e.g. mean(), as.character(), which.max()) to columns specified in .SDcols with lapply() and the .SD symbol. Also works with groups.
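The unique-row, bind and lapply(.SD) entries can be tried on two small hypothetical tables. This sketch assumes the data.table package is installed:

```r
library(data.table)

# Hypothetical toy tables
dt_a <- data.table(a = c(1, 1, 2), b = c(2, 2, 5))
dt_b <- data.table(a = c(3), b = c(6))

# Unique rows over all columns, and a count of unique (a, b) combinations
uniq   <- unique(dt_a)
n_uniq <- uniqueN(dt_a, by = c("a", "b"))

# Stack rows; bind columns side by side (row counts must match)
stacked <- rbind(dt_a, dt_b)
wide    <- cbind(dt_a, data.table(x = 1:3))

# lapply() over .SD restricted to .SDcols; adding by = repeats it per group
means       <- stacked[, lapply(.SD, mean), .SDcols = c("a", "b")]
group_means <- stacked[, lapply(.SD, mean), by = a, .SDcols = "b"]
print(means)
```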
cols <- c("a")
dt[, paste0(cols, "_m") := lapply(.SD, mean), .SDcols = cols] – apply a function to specified columns and assign the result with suffixed variable names to the original data.

RENAME COLUMNS
setnames(dt, c("a", "b"), c("x", "y")) – rename columns.
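This sketch shows the suffixed-column idiom and setnames() on a hypothetical table. Here cols names two columns rather than one, which is purely an illustrative choice:

```r
library(data.table)

# Hypothetical toy data
dt <- data.table(a = c(1, 3), b = c(2, 4))

# Append mean columns with a "_m" suffix, computed over the columns named in cols
cols <- c("a", "b")
dt[, paste0(cols, "_m") := lapply(.SD, mean), .SDcols = cols]

# Rename in place: old names first, new names second
setnames(dt, c("a", "b"), c("x", "y"))
print(dt)
```

The LHS of := is evaluated here, so any expression that yields a character vector of new column names works in place of paste0().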
SET KEYS
setkey(dt, a, b) – set keys to enable fast repeated lookup in specified columns using “dt[.(value), ]” or for merging without specifying merging columns using “dt_a[dt_b]”.

RESHAPE TO WIDE FORMAT
dcast(dt, id ~ y, value.var = c("a", "b")) – reshape a data.table from long to wide format.
dt – A data.table.
id ~ y – Formula whose LHS gives the ID columns (IDs shared by multiple entries) and whose RHS gives the columns whose values are spread into column headers.
value.var – Columns containing values to fill into cells.
Example: long rows (id, y, a, b) = (A, x, 1, 3) and (A, z, 2, 4) become one wide row with columns id, a_x, a_z, b_x, b_z = A, 1, 2, 3, 4.

Combine data.tables

JOIN
dt_a[dt_b, on = .(b = y)] – join data.tables on rows with equal values.

Sequential rows

ROW IDS
dt[, c := 1:.N, by = b] – within groups, compute a column with sequential row IDs.

LAG & LEAD
dt[, c := shift(a, 1), by = b] – within groups, duplicate a column with rows lagged by specified amount.
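This sketch covers the keyed lookup, the equi join, dcast() and the within-group row ID and shift() entries. The tables are modelled loosely on the cheat sheet's diagrams, and all values are illustrative:

```r
library(data.table)

# Hypothetical toy tables
dt_a <- data.table(a = 1:3, b = c("c", "a", "b"))
dt_b <- data.table(x = c(3, 2, 1), y = c("b", "c", "a"))

# Equi join: match dt_a$b against dt_b$y; result rows follow the rows of dt_b
joined <- dt_a[dt_b, on = .(b = y)]

# With a key set, lookups and joins can omit on=
setkey(dt_a, b)
hit <- dt_a[.("a")]              # keyed lookup of rows where b == "a"

# Long to wide: one row per id, one column per (value.var, y) combination
long <- data.table(id = c("A", "A", "B", "B"), y = c("x", "z", "x", "z"),
                   a = c(1, 2, 1, 2), b = c(3, 4, 3, 4))
wide <- dcast(long, id ~ y, value.var = c("a", "b"))

# Within each id: sequential row IDs, plus lagged and leading copies of a
long[, row_id := 1:.N, by = id]
long[, a_lag  := shift(a, 1), by = id]
long[, a_lead := shift(a, 1, type = "lead"), by = id]
print(wide)
```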
dt[, c := shift(a, 1, type = "lead"), by = b] – within groups, duplicate a column with rows leading by specified amount.

NON-EQUI JOIN
dt_a[dt_b, on = .(b = y, c > z)] – join data.tables on rows with equal and unequal values.

ROLLING JOIN
dt_a[dt_b, on = .(id = id, date = date), roll = TRUE] – join data.tables on matching rows in id columns but only keep the most recent preceding match with the left data.table according to date columns. “roll = -Inf” reverses direction.

RESHAPE TO LONG FORMAT
melt(dt, measure.vars = measure(value.name, y, sep = "_")) – reshape a data.table from wide to long format.
dt – A data.table.
measure.vars – Columns containing values to fill into cells, often using measure() or patterns().
id.vars – Character vector of ID column names (optional).
variable.name, value.name – Names for output columns (optional).
measure(out_name1, out_name2, sep = "_") or measure(out_name1, out_name2, pattern = "([ab])_(.*)") – use either sep (a separator) or pattern (a regular expression), not both, to specify the columns to melt and to parse the input column names. out_name1, out_name2 are names for the output columns (a single value column is created). Using value.name as one of them instead creates a value column for each unique value of that part of the melted column names.

Read & write files

IMPORT
fread("file.csv") – read data from a flat file such as .csv or .tsv into R.
fread("file.csv", select = c("a", "b")) – read specified columns from a flat file into R.

EXPORT
fwrite(dt, "file.csv") – write data to a flat file from R.
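This sketch exercises the non-equi join, rolling join, melt() with measure(), and fread()/fwrite() entries. It assumes a data.table version recent enough to provide measure() (the cheat sheet targets 1.15.0) and writes to a temporary file from tempfile(). The tables, including left and right, are hypothetical:

```r
library(data.table)

# Hypothetical toy tables modelled on the cheat sheet's diagrams
dt_a <- data.table(a = 1:3, b = c("c", "a", "b"), c = c(7, 5, 6))
dt_b <- data.table(x = c(3, 2, 1), y = c("b", "c", "a"), z = c(4, 5, 8))

# Non-equi join: b must equal y AND dt_a's c must exceed dt_b's z; no match gives NA
nonequi <- dt_a[dt_b, on = .(b = y, c > z)]

# Rolling join: for each (id, date) in right, take the last left row of that id dated on or before it
left  <- data.table(id = c("A", "A", "A", "B", "B"),
                    date = as.Date(c("2010-01-01", "2012-01-01", "2014-01-01",
                                     "2010-01-01", "2012-01-01")),
                    a = c(1L, 2L, 3L, 1L, 2L))
right <- data.table(id = c("A", "B"), date = as.Date(c("2013-01-01", "2013-01-01")), b = 1L)
rolled <- left[right, on = .(id = id, date = date), roll = TRUE]

# Wide to long: names like "a_x" split into value column "a" and variable y = "x"
wide <- data.table(id = c("A", "B"), a_x = 1, a_z = 2, b_x = 3, b_z = 4)
long <- melt(wide, measure.vars = measure(value.name, y, sep = "_"))

# Round trip through a temporary CSV file; select= reads only some columns
path <- tempfile(fileext = ".csv")
fwrite(long, path)
back <- fread(path, select = c("id", "a"))
print(long)
```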