Detailed explanations for Programming in R. Wrangling data

The document provides detailed instructions for processing data in R, including setting file paths, reading datasets, adding and merging columns, and handling missing data. It covers various data manipulation techniques such as filtering, sorting, and summarizing data, as well as saving processed data to files. Additionally, it suggests using tools like ChatGPT for troubleshooting and clarifying concepts.

Uploaded by soloviovalada
Copyright: © All Rights Reserved
Ex 5 Detailed explanations

2. Setting File Path


file_path <- paste0(dirname(rstudioapi::getSourceEditorContext()$path), "/")

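● Builds the path to the folder that contains the current R script and stores it in file_path (it does not change the working directory).
● Later read calls prepend file_path to the file name, so the data files are found next to the script wherever that folder is located.
● rstudioapi::getSourceEditorContext() works only when the script is open in RStudio.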
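
A minimal sketch of the same idea with a fallback, assuming the script might also be run outside RStudio. The helper name script_dir is hypothetical, and falling back to getwd() is an assumption, not part of the original exercise.

```r
# Hypothetical helper: folder of the current script when run inside RStudio,
# otherwise the current working directory (an assumed fallback).
script_dir <- function() {
  in_rstudio <- requireNamespace("rstudioapi", quietly = TRUE) &&
    rstudioapi::isAvailable()
  if (in_rstudio) {
    dirname(rstudioapi::getSourceEditorContext()$path)
  } else {
    getwd()
  }
}

file_path <- paste0(script_dir(), "/")

# file.path() inserts the separator itself, so no trailing "/" is needed.
apples_csv <- file.path(script_dir(), "BagsOfApples.csv")
```

Using file.path() instead of paste0() avoids doubled or missing slashes when the folder name changes.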

3. Reading Data
BOA <- read.csv(paste0(file_path, "BagsOfApples.csv"), sep=";")
BOO <- read_excel(paste0(file_path, "BagsOfOrangesNA.xlsx"))
Geo <- read_excel(paste0(file_path, "Geo_dim.xlsx"))

● Reads three datasets into R:


○ BOA (Bags of Apples):
■ Reads a semicolon-delimited CSV file using read.csv().
■ Assumes the file is located in the directory set by file_path.
○ BOO (Bags of Oranges):
■ Reads an Excel file using read_excel() from the readxl package.
○ Geo (Geo-dimensions):
■ Likely a lookup table with geographic information per country; section 7 joins it to the fruit data on its Country column.
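
Why sep=";" matters can be sketched with an in-memory sample. The text below is made-up data standing in for BagsOfApples.csv, and its column names are assumptions.

```r
# Made-up sample standing in for BagsOfApples.csv (column names are assumptions).
csv_text <- "bagNo;weight;prize\n1;2.5;3.10\n2;1.0;1.45"

# With the default comma separator each whole line lands in a single column.
wrong <- read.csv(text = csv_text)
ncol(wrong)

# With sep = ";" the three fields are split into separate columns.
right <- read.csv(text = csv_text, sep = ";")
ncol(right)
str(right)
```

Checking ncol() or str() right after reading is a quick way to catch a wrong delimiter.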

4. Adding Columns
BOA$fruits <- "Apples"
BOO$fruits <- "Oranges"

● Adds a new column fruits to each dataframe:


○ For BOA, all rows are labeled as "Apples".
○ For BOO, all rows are labeled as "Oranges".
5. Combining Data (Elaborated)
BOF <- rbind(BOO, BOA)

How It Works:

● rbind():
○ Combines two dataframes (BOO and BOA) by stacking the rows of one under the other.
○ Requires both dataframes to have the same set of column names; columns are matched by name, not by position.
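
The name matching done by rbind() can be sketched on two toy dataframes. The data and the variable names apples and oranges are made up for illustration.

```r
apples  <- data.frame(weight = c(1.0, 2.5), fruits = "Apples")
oranges <- data.frame(fruits = "Oranges", weight = 3.0)  # same names, other order

# rbind() on dataframes matches columns by name, so the column order may differ.
both <- rbind(oranges, apples)
both

# A column that exists in only one input makes rbind() stop with an error.
apples$origin <- "Spain"
result <- tryCatch(rbind(oranges, apples), error = function(e) "error")
result
```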

Additional Use Cases:

1. If Columns Don’t Match Exactly:


○ bind_rows(BOO, BOA) from the dplyr package accepts differing columns. A column that is missing from one input is filled with NA.
2. Adding an Identifier for Data Source:

By using the .id parameter in bind_rows, you can add an extra column indicating the
source of each row:
BOF <- bind_rows(BOO = BOO, BOA = BOA, .id = "source")

○ Here, rows from BOO would have source = "BOO", and rows from BOA
would have source = "BOA".
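
Both behaviours of bind_rows() can be sketched together on toy data, assuming dplyr is installed. The dataframes oranges and apples are made up and deliberately have different columns.

```r
library(dplyr)

oranges <- data.frame(weight = c(1.2, 0.8), origin = c("Spain", "California"))
apples  <- data.frame(weight = 2.0, prize = 3.5)  # no origin, extra prize column

# Named inputs plus .id give a "source" column holding those names.
combined <- bind_rows(BOO = oranges, BOA = apples, .id = "source")
combined
```

The apples row has NA in origin, and the oranges rows have NA in prize.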

6. Replacing Values
BOF$origin <- str_replace_all(BOF$origin, "California", "United States")

● Uses str_replace_all() from the stringr package to replace every occurrence of "California" with "United States" in the origin column.
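
A small sketch of how the replacement behaves, assuming stringr is installed. The vector origin is made-up data; it includes a value that only contains "California" to show that the pattern matches anywhere in the string.

```r
library(stringr)

origin <- c("California", "Spain", "Northern California")

# fixed() treats the pattern as literal text; matches inside longer strings
# are replaced as well.
replaced <- str_replace_all(origin, fixed("California"), "United States")
replaced
# "United States" "Spain" "Northern United States"

# To change only values that equal "California" exactly, plain indexing works.
exact <- origin
exact[exact == "California"] <- "United States"
exact
```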

7. Merging Datasets
BOF <- left_join(BOF, Geo, by = c("origin" = "Country"))

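● Joins BOF with Geo by matching the origin column in BOF against the Country column in Geo.
● As a left join, it keeps every row of BOF; rows whose origin has no match in Geo get NA in the columns added from Geo.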
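
What happens to unmatched keys can be sketched on toy data, assuming dplyr is installed. The dataframes bags and geo and the value "Narnia" are made up; anti_join() is shown as one way to list origins that have no partner in the lookup table.

```r
library(dplyr)

bags <- data.frame(origin = c("Spain", "United States", "Narnia"),
                   weight = c(1.2, 0.8, 2.0))
geo  <- data.frame(Country = c("Spain", "United States"),
                   Region  = c("Europe", "North America"))

# Every row of bags is kept; "Narnia" has no match, so its Region is NA.
joined <- left_join(bags, geo, by = c("origin" = "Country"))
joined

# Rows of bags whose origin does not appear in geo$Country.
unmatched <- anti_join(bags, geo, by = c("origin" = "Country"))
unmatched
```

Running the anti_join() before a later na.omit() shows which rows would be lost because of spelling differences between the two tables.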

8. Renaming Columns
BOF <- rename(BOF, price = prize)

● Renames the prize column to price.

9. Removing Columns
BOF <- select(BOF, -bagNo)

● Removes the bagNo column using select().
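
The renaming and column removal from sections 8 and 9 can be tried on a toy dataframe, assuming dplyr is installed. The values in bags are made up.

```r
library(dplyr)

bags <- data.frame(bagNo = 1:2, prize = c(3.1, 1.45), weight = c(2.5, 1.0))

# rename() takes new_name = old_name.
bags <- rename(bags, price = prize)

# A minus sign in select() drops that column and keeps all others.
bags <- select(bags, -bagNo)
names(bags)
```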

10. Handling Missing Data


anyNA(BOF)
BOF <- na.omit(BOF)

● anyNA(BOF):
○ Checks for the presence of NA (missing values) in the dataframe.
● na.omit(BOF):
○ Removes all rows with any NA values.
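
Before dropping rows it can help to see where the missing values are. A base R sketch on made-up data:

```r
bags <- data.frame(weight = c(1.2, NA, 2.0),
                   origin = c("Spain", "Italy", NA))

anyNA(bags)                                  # is there any NA at all?
na_per_column <- colSums(is.na(bags))        # number of NAs in each column
n_incomplete  <- sum(!complete.cases(bags))  # rows that na.omit() would drop

cleaned <- na.omit(bags)
nrow(cleaned)
```

Here two of the three rows contain an NA, so only one row survives na.omit().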

11. Filtering Data


BOF_europe <- filter(BOF, Region == "Europe")

● Creates a subset of BOF where the Region column equals "Europe".
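
A few variations on the same filter, sketched on made-up data and assuming dplyr is installed. The price threshold and the region names other than "Europe" are illustrative assumptions.

```r
library(dplyr)

bags <- data.frame(Region = c("Europe", "Asia", "Europe", NA),
                   price  = c(3.0, 2.0, 5.5, 1.0))

# Rows where the condition evaluates to NA are dropped as well.
europe <- filter(bags, Region == "Europe")

# Several conditions separated by commas are combined with AND.
pricey_europe <- filter(bags, Region == "Europe", price > 4)

# %in% tests membership in a set of values.
eurasia <- filter(bags, Region %in% c("Europe", "Asia"))
```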

12. Adding Calculated Columns


BOF <- mutate(BOF, ppk = price / weight)

● mutate():
○ Adds a new column ppk (price per kilo), calculated as price divided by
weight.
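
A toy version of the calculation, assuming dplyr is installed. The data are made up, and the extra column ppk_rounded is a hypothetical addition that shows a new column being reused within the same mutate() call.

```r
library(dplyr)

bags <- data.frame(price = c(3.0, 1.5), weight = c(2.0, 0.5))

# Columns created earlier in the call can be used by later expressions.
bags <- mutate(bags,
               ppk = price / weight,
               ppk_rounded = round(ppk, 1))
bags
```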

13. Sorting Data


arrange(BOF, desc(ppk))

● arrange():
○ Returns the dataframe sorted by the ppk column in descending order. The result is only printed here; assign it (for example back to BOF) to keep the sorted order.
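
A sketch on made-up data, assuming dplyr is installed, showing that arrange() leaves its input untouched and can sort by more than one column.

```r
library(dplyr)

bags <- data.frame(fruits = c("Apples", "Oranges", "Apples"),
                   ppk    = c(1.5, 3.0, 2.2))

# The sorted copy has to be stored; bags itself keeps its original order.
sorted <- arrange(bags, desc(ppk))

# Sort by fruit first, then by descending ppk within each fruit.
by_fruit <- arrange(bags, fruits, desc(ppk))
```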

14. Saving Processed Data (Elaborated)


write.table(BOF, file="bagsoffruits_price.txt", sep="\t",
row.names=FALSE)

How It Works:

● write.table():
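○ Writes the dataframe BOF to a file named bagsoffruits_price.txt. The file name is relative, so the file lands in the current working directory rather than in file_path.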
○ The file uses tab (\t) as the delimiter.

Additional Use Cases:


1. Changing the Delimiter:

You can change sep to any other delimiter, such as commas for a CSV file:
write.table(BOF, file="bagsoffruits_price.csv", sep=",",
row.names=FALSE)


2. Adding Row Names:
○ Set row.names=TRUE to include row numbers as a separate column.
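3. Saving with Quotes:

Character and factor fields are wrapped in double quotes by default (quote=TRUE), so the call below only makes that explicit; use quote=FALSE to write them without quotes: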


write.table(BOF, file="bagsoffruits_price.txt", sep="\t",
row.names=FALSE, quote=TRUE)


4. Using write.csv() for Simplicity:

For CSV files, write.csv() can be used as a shortcut:


write.csv(BOF, file="bagsoffruits_price.csv", row.names=FALSE)
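
A round trip is a simple way to check the written file. The base R sketch below uses made-up data and writes to a temporary file (an assumption for the demo) instead of bagsoffruits_price.txt.

```r
bags <- data.frame(fruits = c("Apples", "Oranges"), price = c(3.1, 1.45))

# Temporary file so the demo leaves nothing behind in the working directory.
out_file <- tempfile(fileext = ".txt")
write.table(bags, file = out_file, sep = "\t", row.names = FALSE)

# Raw lines: a header row, then one tab-separated line per data row.
raw_lines <- readLines(out_file)
raw_lines

# Reading the file back with the same separator restores the table.
back <- read.table(out_file, header = TRUE, sep = "\t")
back
```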

15. Counting and Grouping Data


count(BOF, foodLabel)

● Counts the number of rows for each unique value in the foodLabel column.

arrange(count(BOF, foodLabel), desc(n))

● Arranges the counts in descending order by frequency.
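
The two steps can also be written as one call, since count() has a sort argument. A sketch on made-up data, assuming dplyr is installed; the foodLabel values are invented for illustration.

```r
library(dplyr)

bags <- data.frame(foodLabel = c("bio", "bio", "fairtrade", "bio", "none"))

# sort = TRUE orders the result by descending n, like arrange(..., desc(n)).
counts <- count(bags, foodLabel, sort = TRUE)
counts
```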


16. Summarizing Data
BOFg <- group_by(BOF, foodLabel)
BOFgn <- summarise(BOFg, meanppk = mean(ppk))

● group_by():
○ Groups data by the foodLabel column.
● summarise():
○ Calculates the mean ppk for each group.
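
The grouping and summarising can be chained with the pipe and extended with a group size. A sketch on made-up data, assuming dplyr is installed; the column n_bags is a hypothetical addition.

```r
library(dplyr)

bags <- data.frame(foodLabel = c("bio", "bio", "none"),
                   ppk       = c(2.0, 4.0, 1.5))

# One output row per foodLabel with its mean ppk and its number of bags.
summary_tbl <- bags %>%
  group_by(foodLabel) %>%
  summarise(meanppk = mean(ppk), n_bags = n())
summary_tbl
```

Reporting the group size next to the mean shows how many bags each average is based on.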

17. Troubleshooting and Assistance

● The document recommends using tools like ChatGPT or Copilot to troubleshoot errors or get clarification on concepts.
