Plyr Package in R Programming

The Plyr package in R is designed for data manipulation, allowing users to split, apply, and combine data using functions like ddply(), ldply(), adply(), join(), and summarise(). It facilitates tasks such as aggregating data, transforming datasets, and joining data frames based on common columns. Users can install and load the package to perform various data operations efficiently.

Uploaded by

sibi00424
Copyright
© © All Rights Reserved
What is Plyr Package?

Plyr is a package for data manipulation in R that provides a set of functions for
splitting, applying, and combining data. It is based on the concept of split-apply-
combine, where a dataset is first split into smaller subsets, a function is applied to
each subset, and the results are then combined into a single output. This process is
useful for tasks such as aggregating data, summarizing data, and transforming data.
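The three steps can be sketched by hand in base R, without plyr, to make the pattern concrete. This is only an illustration: it uses the built-in mtcars dataset, and the names pieces, summaries, and manual_result are chosen here for the example.

```r
# Split-apply-combine by hand in base R, using the built-in mtcars dataset

# Split: one data frame per distinct value of cyl
pieces <- split(mtcars, mtcars$cyl)

# Apply: compute a one-row summary for each piece
summaries <- lapply(pieces, function(piece) {
  data.frame(cyl = piece$cyl[1], mean_mpg = mean(piece$mpg))
})

# Combine: stack the one-row summaries into a single data frame
manual_result <- do.call(rbind, summaries)
manual_result
```

The plyr functions described in the following sections wrap these three steps into a single call.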

Installing and Loading Plyr Package:

Before using the plyr package, it needs to be installed and loaded into R. The
package can be installed using the following command:

install.packages("plyr")

After the package is installed, it can be loaded into R using the following command:
library(plyr)
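
In scripts that are run repeatedly, it can be convenient to install the package only when it is missing. The following is a minimal sketch of that idea using base R's requireNamespace(); it assumes a CRAN mirror is configured if installation turns out to be needed.

```r
# Install plyr only when it is not already available, then load it
if (!requireNamespace("plyr", quietly = TRUE)) {
  install.packages("plyr")
}
library(plyr)

# Show which version of plyr is installed
packageVersion("plyr")
```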

1. Splitting Data using the ddply( ) function:

The ddply( ) function splits a data frame into smaller subsets, applies a function
to each subset, and combines the results into a new data frame. In the name
“ddply”, the first “d” indicates that the input is a data frame and the second “d”
indicates that the output is a data frame. Here are the main arguments of ddply():

`.data`: The input data frame that you want to split and process.

`.variables`: One or more grouping variables that define how the data should be split.

`.fun`: A function that you want to apply to each subset of the data frame.

`...`: Additional arguments that are passed to the function specified in `.fun`.

Here’s an example of how to use ddply() to calculate the mean miles per gallon (mpg)
of cars in the mtcars dataset, grouped by the number of cylinders in the engine:

library(plyr)

# Using ddply to group by number of cylinders and calculate mean mpg


ddply(mtcars, .(cyl), summarise, mean_mpg = mean(mpg))

In this example, ddply() is used to group the mtcars dataset by the cyl variable
(number of cylinders), and then the summarise() function is used to calculate the
mean mpg for each group. The resulting output is a data frame with two columns:
cyl and mean_mpg.
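
The same pattern extends to several grouping variables and to custom functions. The sketch below is an illustration on mtcars; the column names n, min_mpg, and max_mpg and the object names by_cyl_am and mpg_range are chosen here for the example.

```r
library(plyr)

# Group by two variables: number of cylinders (cyl) and transmission type (am)
by_cyl_am <- ddply(mtcars, .(cyl, am), summarise,
                   n = length(mpg),
                   mean_mpg = mean(mpg))
by_cyl_am

# Use a custom function instead of summarise(): each piece is a data frame,
# and the function returns a one-row data frame for that piece
mpg_range <- ddply(mtcars, .(cyl), function(piece) {
  data.frame(min_mpg = min(piece$mpg), max_mpg = max(piece$mpg))
})
mpg_range
```

In both calls, ddply() adds the grouping columns to the output automatically, so the function only needs to return the computed values.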

2. Combining the results using the ldply( ) function:

The ldply() function applies a function to each element of a list and combines the
results into a single data frame by stacking them on top of each other. In the name
“ldply”, the “l” indicates that the input is a list and the “d” indicates that the
output is a data frame. Here are the main arguments of ldply():

`.data`: The input list that you want to convert to a data frame.

`.fun`: An optional function that you want to apply to each element of the list
before the results are combined into a data frame.

`...`: Additional arguments that are passed to the function specified in `.fun`.
Example:
library(plyr)

# Create a list of data frames


countries_1 <- data.frame(country = c("USA", "Canada", "Mexico"), population =
c(328, 37, 130))
countries_2 <- data.frame(country = c("Brazil", "Argentina", "Chile"), population =
c(211, 45, 19))
countries_list <- list(countries_1, countries_2)

# Use ldply() to combine the list of data frames into a single data frame
combined_df <- ldply(countries_list, data.frame)

# View the resulting data frame


combined_df

In this example, we first create two data frames (countries_1 and countries_2)
using the data.frame() function and store them in a list called countries_list.
We then use ldply() to combine all the data frames in countries_list into a single
data frame called combined_df. The resulting data frame has six rows, one for each
country in the original data frames.
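
ldply() is also useful when the list holds plain vectors and each one should be reduced to a summary row. The following sketch assumes a small made-up list called scores; when the input list is named, ldply() records the element names in a column called .id.

```r
library(plyr)

# A named list of numeric vectors (made-up data for illustration)
scores <- list(math = c(80, 90, 100), art = c(60, 70))

# Apply a function to each element; each call returns a one-row data frame
score_summary <- ldply(scores, function(x) {
  data.frame(n = length(x), mean_score = mean(x))
})

# One row per list element, labelled by the .id column
score_summary
```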

3. Combining Data using the adply( ) function:

The adply() function splits an array along one or more of its dimensions, applies a
function to each piece, and combines the results into a new data frame. The “a” in
adply() stands for “array”, meaning that it can be used with arrays of any
dimensions (including matrices), and the “d” means that the output is a data frame.
The arguments for adply() are:

`.data`: the input array, matrix, or data frame.

`.margins`: the dimensions of the array to split over (1 for rows, 2 for columns; in
the example below, we use 1 to split the matrix by row)

`.fun`: the function to apply to each piece of the array (in the example below, we
use an anonymous function that calculates the sum of each row)

`...`: additional arguments to pass to the function specified in `.fun` (if any)

Example:

library(plyr)

# Create a sample matrix


mat <- matrix(1:9, nrow = 3)

# Display created matrix


mat

# Use adply() to calculate the sum of each row


result <- adply(mat, 1, function(x) sum(x))

# View the result


result

In this example, the adply() function is used to apply the sum() function to each row
of the matrix mat. The second argument (1) specifies that each piece consists of one
row and all columns. The third argument is an anonymous function that calculates
the sum of that row. The resulting result data frame has three rows (one for each
row in mat) and two columns: X1, which identifies the row of mat, and V1, which
holds the sum of that row (12, 15, and 18).
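
Changing the margin to 2 applies the function to each column instead. The sketch below reuses the same 3 x 3 matrix and returns two statistics per column; the object name col_stats is chosen here for the example.

```r
library(plyr)

# Same 3 x 3 matrix as above, filled column by column
mat <- matrix(1:9, nrow = 3)

# Split over the second dimension (columns) and return two statistics per column
col_stats <- adply(mat, 2, function(x) {
  data.frame(mean = mean(x), sd = sd(x))
})

# One row per column of mat: an index column followed by mean and sd
col_stats
```

Returning a one-row data frame from the function gives the output columns explicit names.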

4. Join Two Data Frames using the join( ) function:

join() is a function from the plyr package in R that is used to join two data frames by
one or more common columns. The join() function takes several arguments, including:

`x`, `y`: The data frames to join.

`by`: The name(s) of the column(s) to join the data frames on. If omitted, all
columns common to both data frames are used.

`type`: The type of join to perform: “left” (the default), “right”, “inner”, or “full”.

`match`: How to handle duplicate matches: “all” (the default) keeps every matching
row, while “first” keeps only the first match.

Example:

library(plyr)

# Create two sample data frames


df1 <- data.frame(
id = c(1, 2, 3),
name = c("Alice", "Bob", "Charlie")
)

df2 <- data.frame(
id = c(2, 3, 4),
age = c(25, 30, 35)
)

# Print the created dataset


df1
df2

# Use join() to combine the data frames


result <- join(df1, df2, by = "id")

# View the result


result

In this example, the join() function is used to combine two data frames (df1 and df2)
based on a common column (id). The by argument specifies the name of the
common column. Because type defaults to “left”, every row of df1 is kept: the
resulting result data frame has three columns (id, name, age) and three rows. Bob
(id 2) and Charlie (id 3) receive their ages (25 and 30) from df2, while Alice (id 1)
has no match in df2, so her age is NA. The row with id 4 in df2 does not appear in
the result because it has no match in df1.
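
The type argument controls which rows survive the join. As a short sketch using the same two data frames, the calls below compare the row sets produced by the "left", "inner", and "full" join types; the object names left_result, inner_result, and full_result are chosen here for the example.

```r
library(plyr)

df1 <- data.frame(id = c(1, 2, 3), name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(id = c(2, 3, 4), age = c(25, 30, 35))

# Left join (the default): keep every row of df1
left_result <- join(df1, df2, by = "id", type = "left")

# Inner join: keep only ids present in both data frames
inner_result <- join(df1, df2, by = "id", type = "inner")

# Full join: keep every id from either data frame
full_result <- join(df1, df2, by = "id", type = "full")

# Compare the number of rows produced by each join type
c(left = nrow(left_result), inner = nrow(inner_result), full = nrow(full_result))
```

Unmatched cells are filled with NA, so the full join contains an NA age for id 1 and an NA name for id 4.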

5. Summary Statistics using the summarise( ) function:

The summarise() function in the plyr package of R creates a new data frame of
summary statistics computed from an existing one. On its own it summarises the
whole data frame; to calculate statistics by group, it is passed to ddply() as the
function to apply to each subset. The summarise() function takes the following
arguments:

`.data`: The data frame to summarize.

`...`: named expressions that calculate summary statistics (e.g. mean = mean(value),
sd = sd(value), etc.)

Example:

# Load the plyr package


library(plyr)

# Create a data frame with two columns: group and value


df <- data.frame(group = c("A", "A", "B", "B", "B"), value = c(2, 4, 6, 8, 10))

# Summarize the data by group, calculating the mean and standard
# deviation of the value column
summary_df <- ddply(df, .(group), summarise, mean = mean(value), sd = sd(value))

# Print the summary data frame to the console
summary_df

In this code, ddply() splits df by the group column and applies the summarise()
function from plyr to each subset. We calculate the mean and standard deviation of
the value column using the mean() and sd() functions, respectively, and give the
resulting columns the names mean and sd. Note that group_by() is a dplyr function
and is not part of plyr, so the grouping here is done by ddply(). The resulting
summary_df data frame has a row for each group in the original df data frame, with
columns for group, mean, and sd.
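
To see the difference between summarising a whole data frame and summarising by group, the following sketch calls summarise() once directly and once through ddply(). It writes plyr::summarise explicitly, on the assumption that another package such as dplyr might also be loaded and mask the name; the object names overall and per_group are chosen here for the example.

```r
library(plyr)

df <- data.frame(group = c("A", "A", "B", "B", "B"), value = c(2, 4, 6, 8, 10))

# Called directly, summarise() collapses the whole data frame to one row
overall <- plyr::summarise(df, mean = mean(value), sd = sd(value))
overall

# Passed to ddply(), the same expressions are evaluated once per group
per_group <- ddply(df, .(group), plyr::summarise,
                   mean = mean(value), sd = sd(value))
per_group
```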
