Data Manipulation in R with data.table

Last Updated : 24 Jun, 2025

data.table is an R package for handling and manipulating large datasets. It supports fast creation, modification, grouping and summarizing of data, and is often faster than tools such as dplyr on large data.

1. Creating and Sub-Setting Data

We can either convert an existing data frame (with as.data.table() or setDT()) or create a new data.table object directly with the data.table() function. Rows are subset by placing a logical condition in the first argument of the square brackets.

```r
library(data.table)

# Create a data.table with three columns
DT <- data.table(x = c(1, 2, 3, 4),
                 y = c("A", "B", "C", "D"),
                 z = c(TRUE, FALSE, TRUE, FALSE))
print(DT)

# Keep only the rows where x is greater than 2
subset_DT <- DT[x > 2]
print(subset_DT)
```

2. Grouping the Data

We can group data by one or more columns with the by argument and compute sums, averages and other summaries for each group.

```r
# Sum x within each value of y; the unnamed result column is called V1
grouped_DT <- DT[, sum(x), by = y]
print(grouped_DT)
```

3. Joining the Data

We can merge datasets on a common column. By default, DT[DT2, on = "y"] keeps every row of DT2 and fills unmatched columns with NA; adding nomatch = NULL drops the unmatched rows, which makes it an inner join. In this example every value of y matches, so both forms return the same rows.

```r
DT2 <- data.table(y = c("A", "B", "C", "D"),
                  v = c("alpha", "beta", "gamma", "delta"))

# Inner join on y: nomatch = NULL drops rows of DT2 with no match in DT
inner_join_DT <- DT[DT2, on = "y", nomatch = NULL]
print(inner_join_DT)
```

4. Modifying the Data

We can add, update or replace columns with the := operator. It modifies the data.table in place (by reference), so the result does not need to be assigned back to DT.

```r
# Add a new column holding the square of x
DT[, x_squared := x^2]
print(DT)
```

5. Comparison with dplyr Package

While the dplyr package is common, data.table is often faster for large datasets. We can use microbenchmark to compare execution times.

```r
# Install microbenchmark if it is not already available
if (!requireNamespace("microbenchmark", quietly = TRUE)) {
  install.packages("microbenchmark")
}
library(microbenchmark)
library(dplyr)

# dplyr: filter, group and summarise
dplyr_time <- microbenchmark(
  .dplyr <- DT %>%
    filter(x > 2) %>%
    group_by(y) %>%
    summarise(sum_x = sum(x)),
  times = 10
)
print(dplyr_time)

# data.table: the same operation in a single call
data.table_time <- microbenchmark(
  .data.table <- DT[x > 2, sum(x), by = y],
  times = 10
)
print(data.table_time)
```

Printing each result shows the execution time of the dplyr and data.table operations, including the minimum, median and maximum times across 10 runs. Note that DT has only four rows, so these timings mainly reflect per-call overhead rather than performance on large data.
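A larger table gives a more representative comparison. The sketch below is an illustration rather than a rigorous benchmark: the table name big_DT, the row count and the number of groups are assumptions chosen for this example, and system.time() stands in for microbenchmark to avoid the extra dependency. The dplyr timing only runs if dplyr is installed, and the timings will vary from machine to machine.

```r
library(data.table)

# Hypothetical test data: 1e6 rows spread over 100 groups
# (sizes are illustrative assumptions, not taken from the article).
set.seed(42)
n <- 1e6
big_DT <- data.table(
  x = sample(1:4, n, replace = TRUE),
  y = sample(sprintf("g%03d", 1:100), n, replace = TRUE)
)

# data.table: filter, group and sum in one call, then sort by group
dt_time <- system.time(
  dt_res <- big_DT[x > 2, .(sum_x = sum(x)), by = y][order(y)]
)
print(dt_time)

# dplyr: the same pipeline, run only when the package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_time <- system.time({
    kept      <- dplyr::filter(big_DT, x > 2)
    grouped   <- dplyr::group_by(kept, y)
    dplyr_res <- dplyr::summarise(grouped, sum_x = sum(x))
  })
  print(dplyr_time)
}
```

A single system.time() call is noisy, so repeating the measurement (for example with microbenchmark and times = 10, as in section 5) gives a steadier picture.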
Author: anitha_priyanka
Article Tags: R Language, R-basics