Assignment 2 Tidyr
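Tidy data follows three principles: every variable is a column, every observation is a row, and
every value is a cell. Data in this shape works consistently with other tidyverse tools such as
`dplyr`, `ggplot2`, and `purrr`. The `tidyr` package offers a suite of tools for reshaping data
and for splitting or combining columns, so that data can be brought into this shape for analysis.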
1. gather()
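The `gather()` function reshapes data from wide to long format by collapsing multiple columns
into key-value pairs. It is useful when column names are really values of a variable (for
example, one column per measurement) and need to be stacked into two columns: one holding the
former column name (the key) and one holding the corresponding value. Since tidyr 1.0.0,
`gather()` is superseded by `pivot_longer()`, which is recommended for new code.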
```R
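library(tidyr)
# Wide data: one row per car, with the first four mtcars columns (mpg, cyl, disp, hp)
data <- data.frame(model = rownames(mtcars), mtcars[, 1:4])
# Stack mpg through hp into "variable"/"value" pairs; `model` is kept as an identifier
tidy_data <- gather(data, key = "variable", value = "value", mpg:hp)
# 32 cars x 4 variables = 128 rows, so print only the first few
print(head(tidy_data))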
```
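A tiny input makes the reshape easier to see. The sketch below assumes a made-up table of exam scores (`scores`, `student`, `math`, and `physics` are hypothetical names). Here `-student` selects every column except `student` for stacking, and the same reshape is repeated with `pivot_longer()` (tidyr 1.0.0 or later) for comparison.

```R
library(tidyr)

# Hypothetical wide table: one row per student, one column per subject
scores <- data.frame(
  student = c("Ann", "Bob", "Cy"),
  math    = c(90, 75, 82),
  physics = c(85, 70, 88)
)

# Stack every column except `student` into subject/score pairs
long_gather <- gather(scores, key = "subject", value = "score", -student)
print(long_gather)

# The same reshape with the newer pivot_longer()
long_pivot <- pivot_longer(scores, cols = -student,
                           names_to = "subject", values_to = "score")
print(long_pivot)
```

Both results hold the same six rows, but in a different order: `gather()` stacks one source column at a time, whereas `pivot_longer()` walks through the input row by row.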
2. separate()
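The `separate()` function splits a single character column into multiple columns at a
specified delimiter (`sep`, interpreted as a regular expression when given as a string). This
is helpful when several values are stored in one field, such as a full name or a date. Since
tidyr 1.3.0, `separate()` is superseded by `separate_wider_delim()` and related functions.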
```R
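# A single column holding two values joined by "_"
data <- data.frame(name = c("John_Doe", "Jane_Smith"))
# Split at "_" into two new columns; the original `name` column is dropped
tidy_data <- separate(data, name, into = c("first_name", "last_name"), sep = "_")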
print(tidy_data)
```
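The date case mentioned above can be made concrete. This sketch assumes hypothetical `dates` and `names_df` tables. `convert = TRUE` asks `separate()` to convert the resulting pieces to suitable types (here, numbers instead of strings), and `extra = "merge"` handles a value that contains more delimiters than there are target columns.

```R
library(tidyr)

# Hypothetical input: dates stored as single "YYYY-MM-DD" strings
dates <- data.frame(date = c("2023-01-15", "2024-07-04"))

# Split on "-"; convert = TRUE turns "2023", "01", "15" into numbers
date_parts <- separate(dates, date, into = c("year", "month", "day"),
                       sep = "-", convert = TRUE)
print(date_parts)

# One name has three pieces but there are only two target columns;
# extra = "merge" keeps everything after the first "_" in the last column
names_df <- data.frame(name = c("John_Doe", "Mary_Ann_Lee"))
name_parts <- separate(names_df, name, into = c("first_name", "last_name"),
                       sep = "_", extra = "merge")
print(name_parts)
```

Without `extra = "merge"`, `separate()` would warn and drop the surplus piece, leaving `"Mary"` and `"Ann"` for the second row.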
3. spread()
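The `spread()` function transforms data from long to wide format: the distinct values of the
key column become new column names, and the value column fills the resulting cells. It is the
inverse of `gather()` and is useful when measurements stored in separate rows are needed side
by side, one column per variable. Since tidyr 1.0.0, `spread()` is superseded by `pivot_wider()`.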
```R
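# Long data: one row per (id, variable) combination
data <- data.frame(id = c("A", "A", "B", "B"),
                   variable = c("X", "Y", "X", "Y"),
                   value = c(1, 2, 3, 4))
# Each distinct value of `variable` becomes a column filled from `value`
tidy_data <- spread(data, key = variable, value = value)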
print(tidy_data)
```
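Real long-format data often has gaps. This sketch assumes a hypothetical table (`long`) in which id `B` has no `Y` row. `spread()` leaves such cells as `NA` unless its `fill` argument supplies a replacement value.

```R
library(tidyr)

# Hypothetical long table: id "B" has no "Y" measurement
long <- data.frame(id       = c("A", "A", "B"),
                   variable = c("X", "Y", "X"),
                   value    = c(1, 2, 3))

# Without fill, the missing B/Y cell becomes NA
print(spread(long, key = variable, value = value))

# fill supplies a value for combinations absent from the input
wide <- spread(long, key = variable, value = value, fill = 0)
print(wide)
```

`spread()` also expects each id/key combination to appear at most once; duplicated combinations raise an error rather than being aggregated.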
4. unite()
The `unite()` function combines multiple columns into a single column, joining their values
with a specified delimiter (`sep`, which defaults to `"_"`). By default the original columns are
dropped (`remove = TRUE`). It is the inverse of the `separate()` function.
```R
data <- data.frame(first_name = c("John", "Jane"), last_name = c("Doe", "Smith"))
# Paste first_name and last_name into full_name, separated by a space
tidy_data <- unite(data, full_name, first_name, last_name, sep = " ")
print(tidy_data)
```
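Because `unite()` and `separate()` are inverses, a round trip makes a quick sanity check. This sketch uses a hypothetical `people` table with the same names as above: it unites the two columns with a space and then splits them back on the same delimiter.

```R
library(tidyr)

# Hypothetical input table
people <- data.frame(first_name = c("John", "Jane"),
                     last_name  = c("Doe", "Smith"))

# unite() glues the columns together and drops the originals
united <- unite(people, full_name, first_name, last_name, sep = " ")

# separate() with the same delimiter recovers the original columns
restored <- separate(united, full_name,
                     into = c("first_name", "last_name"), sep = " ")

print(united)
print(restored)
```

The round trip is lossless only when the delimiter never occurs inside the values: a first name such as "Mary Ann" would yield three pieces on the way back, and `separate()` would warn and drop the last one.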