Assignment 2 Tidyr

R assignment

Uploaded by

prashatri5

Assignment 2

Q1: Discuss tidyr Package in R Programming


The `tidyr` package is part of the tidyverse, a collection of R packages for data
manipulation and cleaning. It provides functions for restructuring and cleaning datasets so
that data scientists and analysts can organize their data in a form that is easier to
analyze and visualize. The primary goal of `tidyr` is to bring datasets into a tidy format,
in which each variable is a column, each observation is a row, and each type of observational
unit forms a table.

Following the tidy data principles keeps datasets consistent and compatible with other
tidyverse tools such as `dplyr`, `ggplot2`, and `purrr`. To that end, `tidyr` offers tools
for reshaping, splitting, and combining data so that it is suitable for analysis.
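
To make the tidy format concrete, the sketch below builds the same information twice by hand, once untidy and once tidy. The exam scores and the names `scores_wide`, `scores_tidy`, and `exam_means` are invented for illustration and are not part of the assignment.

```R
# Hypothetical exam scores, invented for illustration only.
# Untidy: the variable "exam" is hidden in the column names.
scores_wide <- data.frame(
  student = c("A", "B"),
  exam1   = c(70, 85),
  exam2   = c(80, 95)
)

# Tidy: one column per variable (student, exam, score), one row per observation.
scores_tidy <- data.frame(
  student = c("A", "B", "A", "B"),
  exam    = c("exam1", "exam1", "exam2", "exam2"),
  score   = c(70, 85, 80, 95)
)

# With tidy data, a per-exam summary is a single grouped call.
exam_means <- tapply(scores_tidy$score, scores_tidy$exam, mean)
print(exam_means)
```

Because `exam` is an ordinary column in the tidy version, grouping by it needs no knowledge of how many exam columns exist; here the means are 77.5 for `exam1` and 87.5 for `exam2`.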

Q2: tidyr Package Functions for Data Cleaning


The `tidyr` package provides several functions that are essential for cleaning and reshaping
data. The key functions are described below, with examples that use the built-in `mtcars`
dataset and small custom data frames:

1. gather()
The `gather()` function reshapes data from a wide format to a long format by stacking
multiple columns into key-value pairs. It is particularly useful when the values of one
variable are spread across several columns and need to be collected into two columns: one
holding the original column name and another holding its value. Since tidyr 1.0.0,
`gather()` is superseded by `pivot_longer()`, although it still works.

Example using the `mtcars` dataset:

```R
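library(tidyr)
# Turn the row names (car models) into a regular column and keep mpg, cyl, disp, hp
data <- data.frame(model = rownames(mtcars), mtcars[, 1:4])
# Stack the columns mpg through hp into variable/value pairs
tidy_data <- gather(data, key = "variable", value = "value", mpg:hp)
print(tidy_data)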
```
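
For comparison, the same reshaping can be sketched with `pivot_longer()`, the function that supersedes `gather()` in tidyr 1.0.0 and later. The names `cars_wide` and `cars_long` are chosen here for illustration; the input is the same subset of `mtcars` as above.

```R
library(tidyr)

# Same input as the gather() example: model name plus the first four columns.
cars_wide <- data.frame(model = rownames(mtcars), mtcars[, 1:4])

# cols selects the columns to stack; names_to and values_to name the
# two output columns that hold the old column names and their values.
cars_long <- pivot_longer(
  cars_wide,
  cols      = mpg:hp,
  names_to  = "variable",
  values_to = "value"
)

head(cars_long)
```

The result has the same three columns as the `gather()` output, but it is returned as a tibble and the rows are ordered car by car rather than variable by variable.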

2. separate()
The `separate()` function splits a single column into multiple columns based on a specified
delimiter. This is helpful when data is stored in a combined format, such as a full name or a
date.

Example using a custom dataset:

```R
library(tidyr)
data <- data.frame(name = c("John_Doe", "Jane_Smith"))
# Split name at the underscore into two new columns
tidy_data <- separate(data, name, into = c("first_name", "last_name"), sep = "_")
print(tidy_data)
```
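
The date case mentioned above can be sketched the same way. The data frame `events` and its values are hypothetical; `convert = TRUE` is an existing argument of `separate()` that asks it to convert the new columns to a suitable type, so the pieces come back as integers instead of character strings.

```R
library(tidyr)

# Hypothetical dates stored as text, invented for illustration only.
events <- data.frame(date = c("2024-01-15", "2023-12-31"))

# Split at the hyphen and convert the pieces to integers.
date_parts <- separate(
  events, date,
  into    = c("year", "month", "day"),
  sep     = "-",
  convert = TRUE
)

print(date_parts)
str(date_parts)
```

Without `convert = TRUE` the three columns would stay character, and a month such as "01" would keep its leading zero.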

3. spread()
The `spread()` function transforms data from a long format to a wide format, turning
key-value pairs into columns. This is useful when each distinct key should become its own
column. Since tidyr 1.0.0, `spread()` is superseded by `pivot_wider()`, although it still
works.

Example using a custom dataset:

```R
library(tidyr)
data <- data.frame(key = c("A", "A", "B", "B"),
                   variable = c("X", "Y", "X", "Y"),
                   value = c(1, 2, 3, 4))
# Each distinct entry in variable (X, Y) becomes a column filled from value
tidy_data <- spread(data, key = variable, value = value)
print(tidy_data)
```
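
The equivalent call with `pivot_wider()`, which supersedes `spread()` in tidyr 1.0.0 and later, might look like the sketch below. The names `long_data` and `wide_data` are chosen here for illustration; the values are the same as in the example above.

```R
library(tidyr)

long_data <- data.frame(
  key      = c("A", "A", "B", "B"),
  variable = c("X", "Y", "X", "Y"),
  value    = c(1, 2, 3, 4)
)

# names_from supplies the new column names, values_from supplies their contents.
wide_data <- pivot_wider(long_data, names_from = variable, values_from = value)

print(wide_data)
```

The result has one row per `key` and the columns `key`, `X`, and `Y`, returned as a tibble.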

4. unite()
The `unite()` function combines multiple columns into a single column, with values
separated by a specified delimiter. It is the inverse of the `separate()` function.

Example using a custom dataset:

```R
library(tidyr)
data <- data.frame(first_name = c("John", "Jane"), last_name = c("Doe", "Smith"))
# Paste first_name and last_name into full_name, separated by a space
tidy_data <- unite(data, full_name, first_name, last_name, sep = " ")
print(tidy_data)
```
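
The inverse relationship between `unite()` and `separate()` can be checked with a small round trip. This is a sketch under one assumption: the separator (a space here) does not occur inside any of the values, otherwise `separate()` would split in the wrong place. The names `people`, `combined`, and `restored` are chosen for illustration.

```R
library(tidyr)

people <- data.frame(first_name = c("John", "Jane"),
                     last_name  = c("Doe", "Smith"))

# Combine the two columns, then split them apart again at the same separator.
combined <- unite(people, full_name, first_name, last_name, sep = " ")
restored <- separate(combined, full_name,
                     into = c("first_name", "last_name"), sep = " ")

print(combined)
print(restored)
```

After the round trip, `restored` holds the same first and last names as `people`.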
