Plyr Package in R Programming
Last Updated : 25 Apr, 2025
plyr is an R package for data manipulation based on the split-apply-combine strategy. It lets you split a dataset into subsets, apply a function to each subset, and then combine the results into a single output. This approach is helpful for tasks like data aggregation, summarization, and transformation.
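To make the three steps concrete, here is a minimal sketch of split-apply-combine written in base R only, without plyr. It uses the built-in mtcars dataset; the variable names (pieces, summaries, result) are illustrative choices, not part of any package.

```r
# Split: one data frame per cylinder count
pieces <- split(mtcars, mtcars$cyl)

# Apply: compute the mean mpg for each piece
summaries <- lapply(pieces, function(piece) {
  data.frame(cyl = piece$cyl[1], mean_mpg = mean(piece$mpg))
})

# Combine: stack the per-group rows into one data frame
result <- do.call(rbind, summaries)
print(result)
```

plyr wraps these three steps into a single function call, which is what the sections that follow demonstrate.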
Installing and Loading Plyr Package
The plyr package is installed with the install.packages() function and then loaded into the current session with library().
R
install.packages("plyr")
library(plyr)
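In scripts that are run repeatedly, it can be convenient to install the package only when it is missing. The following is one possible sketch using requireNamespace() from base R; it assumes a CRAN mirror is reachable if installation is needed.

```r
# Install plyr only if it is not already available
if (!requireNamespace("plyr", quietly = TRUE)) {
  install.packages("plyr")
}

# Attach the package for the current session
library(plyr)
```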
1. Splitting Data using ddply( ) function
The ddply( ) function splits a data frame into smaller subsets, applies a function to each subset, and combines the results into a new data frame. plyr function names follow an input-output convention: the first letter gives the input type and the second letter gives the output type, so "ddply" takes a data frame (d) and returns a data frame (d).
Example:
In this example, ddply() is used to group the mtcars dataset by the cyl variable (number of cylinders), and then the summarise() function is used to calculate the mean mpg for each group. The resulting output is a data frame with two columns: cyl and mean_mpg.
R
library(plyr)
ddply(mtcars, .(cyl), summarise, mean_mpg = mean(mpg))
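The same pattern extends to several grouping variables and several summary columns. The sketch below groups mtcars by cyl and am (transmission type) and computes three statistics per group; the output column names n, mean_mpg, and max_hp are illustrative choices.

```r
library(plyr)

# Group by cylinders and transmission type (am: 0 = automatic, 1 = manual)
by_cyl_am <- ddply(mtcars, .(cyl, am), summarise,
                   n = length(mpg),        # number of cars in the group
                   mean_mpg = mean(mpg),   # average fuel efficiency
                   max_hp = max(hp))       # highest horsepower
print(by_cyl_am)
```

Each combination of cyl and am that occurs in the data produces one row in the result.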
2. Combining the results using ldply( ) function
The ldply() function applies a function to each element of a list and combines the results into a single data frame. Following the plyr naming convention, the "l" stands for list input and the "d" for data frame output. When the list elements are data frames, the result contains all of their rows stacked on top of each other.
Example:
In this example, we first create two data frames (countries_1 and countries_2) using the data.frame() function. Then, we put these data frames into a list called countries_list. Finally, we use the ldply() function to combine all the data frames in countries_list into a single data frame called combined_df. The resulting data frame contains the rows for all the countries in the original data frames.
R
library(plyr)
countries_1 <- data.frame(country = c("USA", "Canada", "Mexico"), population = c(328, 37, 130))
countries_2 <- data.frame(country = c("Brazil", "Argentina", "Chile"), population = c(211, 45, 19))
countries_list <- list(countries_1, countries_2)
combined_df <- ldply(countries_list, data.frame)
head(combined_df)
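When the input list has names, ldply() keeps track of which element each row came from by adding a leading .id column. The sketch below illustrates this with a named list; the element names north_america and south_america are hypothetical labels chosen for the example.

```r
library(plyr)

regions <- list(
  north_america = data.frame(country = c("USA", "Canada"), population = c(328, 37)),
  south_america = data.frame(country = c("Brazil", "Chile"), population = c(211, 19))
)

# identity returns each data frame unchanged; ldply() stacks them
# and records the list element name in the .id column
combined <- ldply(regions, identity)
print(combined)
```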
3. Combining Data using adply( ) function
The adply() function applies a function to each slice of an array and then combines the results into a new data frame. The "a" in adply() stands for array input and the "d" for data frame output; the array can have any number of dimensions.
Example:
In this example, the adply() function is used to apply the sum() function to each row of the matrix mat. The second argument (1) is the margin: it specifies that the function is applied to each slice consisting of one row and all columns. The third argument is an anonymous function that calculates the sum of its input. The resulting result data frame has three rows (one for each row in mat) and two columns: X1 identifies the row of mat, and V1 holds the sum of that row.
R
library(plyr)
mat <- matrix(1:9, nrow = 3)
head(mat)
result <- adply(mat, 1, function(x) sum(x))
head(result)
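Changing the margin changes how the array is sliced. As a small sketch, the call below uses margin 2 to work column by column, and returns a named value so that the output column gets a readable name; the name total is an illustrative choice.

```r
library(plyr)

mat <- matrix(1:9, nrow = 3)

# Margin 2 slices the matrix by column; the name of the returned
# value ("total") becomes the column name in the output data frame
col_totals <- adply(mat, 2, function(x) c(total = sum(x)))
print(col_totals)
```

The result has one row per column of mat, with a label column identifying the slice followed by the total column.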
4. Join Two Data Frames using join( ) function
join() is a function from the plyr package in R that is used to join two data frames by one or more common columns. By default it performs a left join, keeping every row of the first data frame.
Example:
In this example, the join() function is used to combine two data frames (df1 and df2) based on a common column (id). The by argument specifies the name of the common column. Because the default join type is "left", the resulting result data frame has three columns (id, name, age) and three rows (one for each row of df1). The age column is filled in where df2 has a matching id value (2 and 3) and is NA where it does not (id 1). The row of df2 with id 4 has no match in df1 and is dropped.
R
library(plyr)
df1 <- data.frame(
id = c(1, 2, 3),
name = c("Alice", "Bob", "Charlie")
)
df2 <- data.frame(
id = c(2, 3, 4),
age = c(25, 30, 35)
)
head(df1)
head(df2)
result <- join(df1, df2, by = "id")
head(result)
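The type argument of join() controls which rows are kept; it accepts "left", "right", "inner", and "full". The sketch below reuses the same two data frames to compare an inner join with a full join.

```r
library(plyr)

df1 <- data.frame(id = c(1, 2, 3), name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(id = c(2, 3, 4), age = c(25, 30, 35))

# Inner join: only ids present in both data frames (2 and 3)
inner <- join(df1, df2, by = "id", type = "inner")
print(inner)

# Full join: every id from either data frame, with NA where a value is missing
full <- join(df1, df2, by = "id", type = "full")
print(full)
```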
5. Summary Statistics using summarise( ) function
The summarise() function in the plyr package of R creates a new data frame of summary statistics. On its own it summarises the whole data frame; combined with ddply() it calculates the statistics by group.
Example:
In this example, we pass summarise() to ddply(), which splits df by the group column and applies the summary to each subset. Note that group_by() is a dplyr function and is not part of plyr, so grouping is done here with ddply(). We calculate the mean and standard deviation of the value column using the mean() and sd() functions, respectively, and give the resulting columns the names mean and sd. The resulting summary_df data frame has a row for each group in the original df data frame, with columns for group, mean, and sd.
R
library(plyr)
df <- data.frame(group = c("A", "A", "B", "B", "B"), value = c(2, 4, 6, 8, 10))
# Split df by group, then summarise each subset
summary_df <- ddply(df, .(group), summarise, mean = mean(value), sd = sd(value))
head(summary_df)
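For comparison, here is a short sketch of what plyr's summarise() returns when it is called directly on a data frame, with no grouping step. The plyr:: prefix is used so the call is unambiguous even if another package that also exports summarise() (such as dplyr) is attached.

```r
library(plyr)

df <- data.frame(group = c("A", "A", "B", "B", "B"), value = c(2, 4, 6, 8, 10))

# No grouping: the whole data frame collapses to a single row
overall <- plyr::summarise(df, mean = mean(value), sd = sd(value))
print(overall)
```

The result is a one-row data frame containing the overall mean and standard deviation of value.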
In this article, we explored the plyr package in R and used several of its core functions: ddply(), ldply(), adply(), join(), and summarise().