Detailed explanations for Programming in R. Wrangling data
● Dynamically sets the working directory to the location of the current R script.
● Ensures that relative paths always resolve against the folder containing the script,
regardless of where that folder is located.
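The idea above can be sketched as follows. This is a minimal sketch, not necessarily the code the course used: it assumes the script is run from RStudio with the rstudioapi package installed, and falls back to the current working directory otherwise. The helper name get_script_dir is hypothetical; file_path is built with a trailing slash because the later steps call paste0(file_path, ...).

```r
# Hypothetical helper: returns the folder of the active script when run
# inside RStudio, otherwise the current working directory.
get_script_dir <- function() {
  if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
    path <- rstudioapi::getActiveDocumentContext()$path
    # path is an empty string for unsaved scripts and for the console
    if (nzchar(path)) return(dirname(path))
  }
  getwd()
}

script_dir <- get_script_dir()
setwd(script_dir)

# Trailing slash so that paste0(file_path, "file.csv") yields a valid path
file_path <- paste0(script_dir, "/")
```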
3. Reading Data
R
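# The CSV file is semicolon-delimited, so sep must be set explicitly
BOA <- read.csv(paste0(file_path, "BagsOfApples.csv"), sep=";")
# read_excel() comes from the readxl package
BOO <- read_excel(paste0(file_path, "BagsOfOrangesNA.xlsx"))
Geo <- read_excel(paste0(file_path, "Geo_dim.xlsx"))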
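To see why sep=";" matters, the following self-contained sketch writes a small semicolon-delimited file to a temporary location and reads it back twice. The column names are taken from later steps of this document, but the values are invented for illustration.

```r
# Write a tiny semicolon-delimited file; the values are made up
tmp <- tempfile(fileext = ".csv")
writeLines(c("bagNo;weight;prize", "1;2.5;3.10", "2;1.8;2.40"), tmp)

# With the default separator (a comma) every line lands in a single column
wrong <- read.csv(tmp)
ncol(wrong)   # 1

# Declaring the semicolon splits the fields into three columns
right <- read.csv(tmp, sep = ";")
ncol(right)   # 3
str(right)
```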
4. Adding Columns
R
BOA$fruits <- "Apples"
BOO$fruits <- "Oranges"
How It Works:
● rbind():
○ Combines two dataframes (BOO and BOA) by stacking rows.
○ Requires both dataframes to have the same set of columns, which are
matched by name; columns whose types differ are coerced to a common type.
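A small sketch of this behaviour, using toy stand-ins for BOA and BOO with invented values:

```r
# Toy stand-ins for the two datasets; the values are invented
BOA <- data.frame(bagNo = 1:2, weight = c(2.5, 1.8), fruits = "Apples")
BOO <- data.frame(bagNo = 3:4, weight = c(3.0, 2.2), fruits = "Oranges")

# Stack the rows of BOA underneath the rows of BOO
BOF <- rbind(BOO, BOA)
nrow(BOF)   # 4

# A dataframe with an extra column cannot be stacked with rbind()
BOA_extra <- cbind(BOA, origin = "Spain")
result <- tryCatch(rbind(BOO, BOA_extra), error = function(e) "error")
result      # "error"
```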
By using the .id parameter in bind_rows, you can add an extra column indicating the
source of each row:
R
BOF <- bind_rows(BOO = BOO, BOA = BOA, .id = "source")
○ Here, rows from BOO would have source = "BOO", and rows from BOA
would have source = "BOA".
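The sketch below illustrates the .id column on toy data (invented values). It also shows a difference from rbind() that may be useful here: bind_rows() accepts dataframes with different columns and fills the gaps with NA.

```r
library(dplyr)

# Toy data with deliberately different columns; the values are invented
BOA <- data.frame(bagNo = 1:2, weight = c(2.5, 1.8))
BOO <- data.frame(bagNo = 3:4, origin = c("Spain", "Italy"))

# The argument names (BOO, BOA) become the values of the source column
BOF <- bind_rows(BOO = BOO, BOA = BOA, .id = "source")
BOF
# Rows from BOO have NA in weight; rows from BOA have NA in origin
```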
6. Replacing Values
R
BOF$origin <- str_replace_all(BOF$origin, "California", "United States")
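As an illustration on a standalone vector (invented values), the sketch below shows what the replacement does to matching, non-matching, and missing entries. Note that str_replace_all() from stringr treats the pattern as a regular expression and replaces every matching substring, so a value such as "Northern California" would also be changed.

```r
library(stringr)

origin <- c("California", "Spain", "California", NA)

# Matching values are replaced; other values and NA are left as they are
cleaned <- str_replace_all(origin, "California", "United States")
cleaned

# Wrap the pattern in fixed() when characters such as "." should be literal
no_dots <- str_replace_all("U.S.", fixed("."), "")
no_dots   # "US"
```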
7. Merging Datasets
R
BOF <- left_join(BOF, Geo, by = c("origin" = "Country"))
● Joins BOF with Geo based on the origin column in BOF and Country column in
Geo.
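A minimal sketch of this join on toy data. The Continent column is a hypothetical stand-in for whatever Geo_dim.xlsx actually contains, and all values are invented.

```r
library(dplyr)

BOF <- data.frame(bagNo = 1:3,
                  origin = c("Spain", "United States", "Chile"))
# Continent is an assumed column, used only for illustration
Geo <- data.frame(Country = c("Spain", "United States"),
                  Continent = c("Europe", "North America"))

# Every row of BOF is kept; the key column keeps its BOF name (origin)
joined <- left_join(BOF, Geo, by = c("origin" = "Country"))
joined
# "Chile" has no match in Geo, so its Continent is NA
```

Because a left join never drops rows from the left table, unmatched origins show up as NA in the joined columns, which is one way missing values can enter BOF.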
8. Renaming Columns
R
# rename() takes new_name = old_name: the misspelled column prize becomes price
BOF <- rename(BOF, price = prize)
9. Removing Columns
R
# The minus sign drops bagNo and keeps every other column
BOF <- select(BOF, -bagNo)
● anyNA(BOF):
○ Checks for the presence of NA (missing values) in the dataframe.
● na.omit(BOF):
○ Removes all rows with any NA values.
● mutate():
○ Adds a new column ppk (price per kilo), calculated as price divided by
weight.
● arrange():
○ Sorts the dataframe based on the ppk column in descending order.
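The four functions above can be chained as in the following sketch. The data are invented, and the exact calls are an assumption reconstructed from the descriptions (in particular desc() for the descending sort).

```r
library(dplyr)

# Toy data with one missing price; the values are invented
BOF <- data.frame(price  = c(3.0, 4.0, NA, 2.0),
                  weight = c(1.5, 1.0, 2.0, 2.0))

anyNA(BOF)                                 # TRUE: at least one value is missing
BOF <- na.omit(BOF)                        # drops the third row
BOF <- mutate(BOF, ppk = price / weight)   # price per kilo
BOF <- arrange(BOF, desc(ppk))             # highest ppk first
BOF$ppk                                    # 4 2 1
```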
How It Works:
● write.table():
○ Writes the dataframe BOF to a file named bagsoffruits_price.txt.
○ The file uses tab (\t) as the delimiter.
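A round-trip sketch of the tab-delimited export, writing to a temporary folder rather than the working directory. The data are invented, and row.names = FALSE is an assumption chosen so that the file contains only the data columns.

```r
BOF <- data.frame(fruits = c("Apples", "Oranges"), ppk = c(2.5, 3.1))

# Write to a temporary folder so the example leaves no files behind
out <- file.path(tempdir(), "bagsoffruits_price.txt")
write.table(BOF, file = out, sep = "\t", row.names = FALSE)

# Inspect the raw lines, then read the file back; read.delim() expects tabs
readLines(out)
back <- read.delim(out)
back
```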
You can change sep to any other delimiter, such as commas for a CSV file:
R
write.table(BOF, file="bagsoffruits_price.csv", sep=",",
row.names=FALSE)
2. Adding Row Names:
○ Set row.names=TRUE to include row numbers as a separate column.
3. Saving with Quotes:
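○ The quote argument controls whether character and factor columns are
wrapped in double quotes; it defaults to TRUE.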
4. Using write.csv() for Simplicity:
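The options in this list can be compared with a short sketch on invented data. write.csv() is a wrapper around write.table() that fixes the separator to a comma, so only the remaining arguments need to be supplied.

```r
BOF <- data.frame(fruits = c("Apples", "Oranges"), ppk = c(2.5, 3.1))
out <- tempfile(fileext = ".csv")

# write.csv() sets sep = "," itself; text values are quoted by default
write.csv(BOF, file = out, row.names = FALSE)
lines_quoted <- readLines(out)
lines_quoted

# quote = FALSE writes the text values without surrounding double quotes
write.table(BOF, file = out, sep = ",", row.names = FALSE, quote = FALSE)
lines_unquoted <- readLines(out)
lines_unquoted   # "fruits,ppk" "Apples,2.5" "Oranges,3.1"
```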
● Counts the number of rows for each unique value in the foodLabel column.
R
arrange(count(BOF, foodLabel), desc(n))
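On toy data the call behaves as sketched below. The foodLabel values are invented; count() stores the frequencies in a column named n, which is why arrange() sorts on n.

```r
library(dplyr)

BOF <- data.frame(foodLabel = c("Organic", "Conventional", "Organic", "Organic"))

counts <- count(BOF, foodLabel)      # one row per label, frequency in n
sorted <- arrange(counts, desc(n))   # most frequent label first
sorted

# count() can also sort by itself
count(BOF, foodLabel, sort = TRUE)
```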
● group_by():
○ Groups data by the foodLabel column.
● summarise():
○ Calculates the mean ppk for each group.
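A sketch of the grouped summary on invented data. The output column name mean_ppk is an assumption; any name can be used on the left-hand side inside summarise().

```r
library(dplyr)

BOF <- data.frame(foodLabel = c("Organic", "Conventional", "Organic"),
                  ppk       = c(4, 2, 6))

by_label <- group_by(BOF, foodLabel)
res <- summarise(by_label, mean_ppk = mean(ppk))
res
# Conventional: 2, Organic: (4 + 6) / 2 = 5
```

If ppk could still contain missing values at this point, mean(ppk, na.rm = TRUE) would keep one NA from turning a whole group mean into NA.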