Plyr Package in R Programming

Last Updated : 25 Apr, 2025

plyr is an R package for data manipulation built around the split-apply-combine strategy. It lets you split a dataset into subsets, apply a function to each subset, and combine the results into a single output. This approach is useful for tasks such as data aggregation, summarization, and transformation.
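To make the three steps concrete, here is a minimal sketch of split-apply-combine written in base R only, without plyr. It uses the built-in mtcars dataset; the variable names pieces, summaries, and result are illustrative choices.

```r
# Base-R sketch of split-apply-combine (no plyr required).

# Split: one data frame per distinct value of cyl.
pieces <- split(mtcars, mtcars$cyl)

# Apply: compute a one-row summary for each piece.
summaries <- lapply(pieces, function(piece) {
  data.frame(cyl = piece$cyl[1], mean_mpg = mean(piece$mpg))
})

# Combine: stack the one-row summaries into a single data frame.
result <- do.call(rbind, summaries)
print(result)
```

plyr wraps these three steps into a single function call, which is what the functions in the following sections do.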

Installing and Loading Plyr Package

The plyr package is installed with the install.packages() function and then loaded into the R session with library().

R
install.packages("plyr")
library(plyr)
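If a script may run on machines where plyr is already present, one possible pattern is to install it only when it is missing. This is a sketch, not a required step; it assumes a CRAN mirror is configured for the session.

```r
# Install plyr only if it is not already available, then load it.
if (!requireNamespace("plyr", quietly = TRUE)) {
  install.packages("plyr")
}
library(plyr)
```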

1. Splitting Data using ddply() function

The ddply() function splits a data frame into smaller subsets, applies a function to each subset, and combines the results into a new data frame. The name follows plyr's naming convention: the first letter gives the input type, the second letter gives the output type, and "ply" is the common suffix. In "ddply", both letters are "d", so the function takes a data frame and returns a data frame.

Example:

In this example, ddply() is used to group the mtcars dataset by the cyl variable (number of cylinders), and then the summarise() function is used to calculate the mean mpg for each group. The resulting output is a data frame with two columns: cyl and mean_mpg.

R
library(plyr)

ddply(mtcars, .(cyl), summarise, mean_mpg = mean(mpg))

Output:

[Image: output of ddply()]
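ddply() also accepts several grouping variables and several summary expressions at once. The following is a small sketch on the same mtcars data; the column names n_cars, mean_mpg, and max_hp are illustrative choices, not part of the example above.

```r
library(plyr)

# Group by two variables: number of cylinders (cyl) and transmission type (am).
# Each named expression becomes one column of the result.
by_cyl_am <- ddply(mtcars, .(cyl, am), summarise,
                   n_cars   = length(mpg),  # rows in the group
                   mean_mpg = mean(mpg),    # average fuel economy
                   max_hp   = max(hp))      # highest horsepower

print(by_cyl_am)
```

The result has one row for each combination of cyl and am that occurs in the data.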

2. Combining the results using ldply() function

The ldply() function applies a function to each element of a list and combines the results into a single data frame. Following plyr's naming convention, the "l" means the input is a list and the "d" means the output is a data frame. When the list elements are data frames, the rows of every element are stacked on top of each other in the output.

Example:

In this example, we first create two data frames (countries_1 and countries_2) using the data.frame() function. Then we put these data frames into a list called countries_list. Finally, we use the ldply() function to combine all the data frames in countries_list into a single data frame called combined_df. The resulting data frame has six rows, one for each country in the original data frames.

R
library(plyr)

countries_1 <- data.frame(country = c("USA", "Canada", "Mexico"), population = c(328, 37, 130))
countries_2 <- data.frame(country = c("Brazil", "Argentina", "Chile"), population = c(211, 45, 19))
countries_list <- list(countries_1, countries_2)

combined_df <- ldply(countries_list, data.frame)
head(combined_df)

Output:

[Image: output of ldply()]
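When the input list is named, ldply() can record which list element each row came from. The sketch below assumes a plyr version that supports the .id argument of ldply(); the list names north and south and the column name region are hypothetical labels chosen for illustration.

```r
library(plyr)

# A named list: the names identify where each data frame came from.
regions <- list(
  north = data.frame(country = c("USA", "Canada"), population = c(328, 37)),
  south = data.frame(country = c("Brazil", "Chile"), population = c(211, 19))
)

# .id names the column that records the source list element.
combined <- ldply(regions, .id = "region")
print(combined)
```

Keeping the source label is useful when the stacked rows need to be traced back to their original data frame.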

3. Applying a Function to Array Slices using adply() function

The adply() function splits an array into slices, applies a function to each slice, and combines the results into a data frame. The "a" in adply() stands for "array" input, so it works with arrays of any dimension (including matrices), and the "d" stands for data frame output.

Example:

In this example, the adply() function is used to apply the sum() function to each row of the matrix mat. The second argument (1) is the margin: it specifies that the array is split along its first dimension, so each subset consists of one row and all columns. The third argument is an anonymous function that calculates the sum of a row. The resulting result data frame has three rows (one for each row in mat) and two columns: an index column (X1) identifying the row and a value column (V1) holding the sum of that row (12, 15, and 18).

R
library(plyr)

mat <- matrix(1:9, nrow = 3)
head(mat)

result <- adply(mat, 1, function(x) sum(x))
head(result)

Output:

[Image: output of adply()]
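Changing the margin to 2 applies the function to each column instead of each row. The sketch below also returns a named vector from the function so that the output columns get explicit names; the names total and average are illustrative.

```r
library(plyr)

mat <- matrix(1:9, nrow = 3)

# Margin 2 splits the matrix by column; each slice is one column vector.
# Returning a named vector gives the output columns explicit names.
col_stats <- adply(mat, 2, function(x) c(total = sum(x), average = mean(x)))
print(col_stats)
```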

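4. Join Two Data Frames using join() function

The join() function from the plyr package joins two data frames on one or more common columns. By default it performs a left join, which keeps every row of the first data frame; the type argument selects other join types ("inner", "right", or "full").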

Example:

In this example, the join() function is used to combine two data frames (df1 and df2) based on a common column (id). The by argument specifies the name of the common column. The resulting result data frame has three columns (id, name, age) and three rows, one for each row of df1, because join() performs a left join by default. The age for id 1 is NA because df2 has no row with that id, and the row with id 4 in df2 is dropped. To keep only the ids present in both data frames (2 and 3), pass type = "inner".

R
library(plyr)

df1 <- data.frame(
  id = c(1, 2, 3),
  name = c("Alice", "Bob", "Charlie")
)

df2 <- data.frame(
  id = c(2, 3, 4),
  age = c(25, 30, 35)
)

head(df1)
head(df2)

result <- join(df1, df2, by = "id")
head(result)

Output:

[Image: output of join()]
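The effect of the type argument is easiest to see by comparing row counts. This sketch reuses df1 and df2 from the example; the result names left_res, inner_res, and full_res are illustrative.

```r
library(plyr)

df1 <- data.frame(id = c(1, 2, 3), name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(id = c(2, 3, 4), age = c(25, 30, 35))

left_res  <- join(df1, df2, by = "id")                  # default: type = "left"
inner_res <- join(df1, df2, by = "id", type = "inner")  # ids in both
full_res  <- join(df1, df2, by = "id", type = "full")   # ids in either

# Row counts differ because unmatched ids are kept or dropped.
print(c(left = nrow(left_res), inner = nrow(inner_res), full = nrow(full_res)))
```

Unmatched rows that are kept by a left or full join are filled with NA in the columns that come from the other data frame.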

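5. Summary Statistics using summarise() function

The summarise() function in the plyr package creates a new data frame containing only the summary statistics you specify. On its own it summarises the whole data frame; to calculate statistics by group, it is passed as the function argument to ddply().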

Example:

In this example, we use ddply() to split the data by the group column and apply summarise() from plyr to each subset. Grouping is done by ddply() because plyr does not provide a group_by() function (that function belongs to the dplyr package). We calculate the mean and standard deviation of the value column using the mean() and sd() functions, respectively, and name the resulting columns mean and sd. The resulting summary_df data frame has one row for each group in the original df data frame, with columns for group, mean, and sd.

R
library(plyr)

df <- data.frame(group = c("A", "A", "B", "B", "B"), value = c(2, 4, 6, 8, 10))

# Split df by group, then compute the mean and standard deviation per group
summary_df <- ddply(df, .(group), summarise, mean = mean(value), sd = sd(value))
head(summary_df)

Output:

[Image: output of summarise()]
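The difference between calling summarise() directly and calling it through ddply() can be sketched as follows. The plyr:: prefix is an optional precaution added here in case dplyr, which has its own summarise(), is also loaded; the names overall and by_group are illustrative.

```r
library(plyr)

df <- data.frame(group = c("A", "A", "B", "B", "B"),
                 value = c(2, 4, 6, 8, 10))

# Without ddply(), summarise() collapses the whole data frame to one row.
overall <- plyr::summarise(df, mean = mean(value), sd = sd(value))

# Inside ddply(), the same expressions are evaluated once per group.
by_group <- ddply(df, .(group), plyr::summarise,
                  mean = mean(value), sd = sd(value))

print(overall)
print(by_group)
```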

In this article, we explored the plyr package in R and worked through several of its functions: ddply(), ldply(), adply(), join(), and summarise().

