Detailed explanations for Programming in R. Wrangling data

The document provides detailed instructions for processing data in R, including setting file paths, reading datasets, adding and merging columns, and handling missing data. It covers various data manipulation techniques such as filtering, sorting, and summarizing data, as well as saving processed data to files. Additionally, it suggests using tools like ChatGPT for troubleshooting and clarifying concepts.

Uploaded by soloviovalada


Ex 5 Detailed explanations

2. Setting File Path

```r
file_path <- paste0(dirname(rstudioapi::getSourceEditorContext()$path), "/")
```

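● Stores in file_path the folder that contains the R script currently open in the RStudio editor, followed by a trailing "/". It does not change the working directory (that would require setwd()).
● Prefixing file names with file_path, as in the next step, lets the script find its data files wherever the project folder is moved, as long as the files sit next to the script.
● rstudioapi::getSourceEditorContext() only works inside RStudio. The line fails when the script is run with Rscript or from another editor.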
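
getSourceEditorContext() exists only inside RStudio. A script that should also run elsewhere could fall back to the working directory instead. The sketch below assumes exactly that fallback. The helper name get_script_dir() is hypothetical and is not part of the original exercise.

```r
# Hypothetical helper (not in the original script): folder of the script open
# in RStudio, or the current working directory when RStudio is not running.
get_script_dir <- function() {
  if (requireNamespace("rstudioapi", quietly = TRUE) &&
      rstudioapi::isAvailable()) {
    return(dirname(rstudioapi::getSourceEditorContext()$path))
  }
  getwd()
}

file_path <- paste0(get_script_dir(), "/")

# file.path() inserts the "/" separator itself, so no trailing slash is needed
apples_file <- file.path(get_script_dir(), "BagsOfApples.csv")
```

file.path() is the base R way to join path parts. It avoids the trailing-slash bookkeeping that paste0() requires.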

3. Reading Data
```r
library(readxl)  # provides read_excel()
BOA <- read.csv(paste0(file_path, "BagsOfApples.csv"), sep = ";")
BOO <- read_excel(paste0(file_path, "BagsOfOrangesNA.xlsx"))
Geo <- read_excel(paste0(file_path, "Geo_dim.xlsx"))
```

● Reads three datasets into R:

○ BOA (Bags of Apples):
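■ Reads a semicolon-delimited CSV file using read.csv() with sep=";".
■ Like the other two files, it must be located in the folder stored in file_path.
○ BOO (Bags of Oranges):
■ Reads the first sheet of an Excel file using read_excel() from the readxl package.
○ Geo (Geo-dimensions):
■ Likely contains geographic lookup information, presumably including the Country column used for the join in step 7 and the Region column used for filtering in step 11.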
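
You can check the role of sep=";" without the real file, because read.csv() also accepts the data as a string through its text argument. The column names and values below are hypothetical. They were chosen to match the columns used in later steps.

```r
# Hypothetical semicolon-delimited data in the layout assumed for BagsOfApples.csv
csv_text <- "bagNo;origin;weight;prize
1;Spain;1.5;3.2
2;California;2.0;4.1"

apples <- read.csv(text = csv_text, sep = ";")
ncol(apples)   # 4 separate columns
str(apples)

# With the default sep = "," each whole line ends up in a single column
wrong <- read.csv(text = csv_text)
ncol(wrong)    # 1
```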

4. Adding Columns
```r
BOA$fruits <- "Apples"
BOO$fruits <- "Oranges"
```

● Adds a new column fruits to each dataframe:

○ For BOA, all rows are labeled as "Apples".
○ For BOO, all rows are labeled as "Oranges".
5. Combining Data (Elaborated)
```r
BOF <- rbind(BOO, BOA)
```

How It Works:

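● rbind():
○ Combines two data frames (BOO and BOA) by stacking their rows.
○ Requires both data frames to have the same column names. Columns are matched by name, so their order may differ, but a missing or extra column causes an error. The column types should also be compatible.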

Additional Use Cases:

1. If Columns Don’t Match Exactly:

○ bind_rows(BOO, BOA) from the dplyr package can combine data frames whose columns differ. It matches columns by name and fills columns missing from one data frame with NA.
2. Adding an Identifier for Data Source:

Pass the data frames as named arguments and set the .id parameter of bind_rows() to add an extra column that records the source of each row:
```r
BOF <- bind_rows(BOO = BOO, BOA = BOA, .id = "source")
```

○ Here, rows from BOO get source = "BOO", and rows from BOA get source = "BOA".
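
Here is a minimal sketch of the difference between rbind() and bind_rows(). It assumes two small hypothetical data frames in which only the apples have an extra variety column.

```r
library(dplyr)

# Hypothetical stand-ins for BOO and BOA; only the apples have 'variety'
oranges <- data.frame(origin = c("Spain", "Italy"), weight = c(2, 3),
                      fruits = "Oranges")
apples  <- data.frame(origin = "California", weight = 1.5,
                      fruits = "Apples", variety = "Gala")

# rbind() stops with an error because the column sets differ
res <- tryCatch(rbind(oranges, apples), error = function(e) "error")

# bind_rows() aligns columns by name, fills the gap with NA,
# and records where each row came from in the 'source' column
combined <- bind_rows(BOO = oranges, BOA = apples, .id = "source")
combined
```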

6. Replacing Values
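```r
library(stringr)  # provides str_replace_all()
BOF$origin <- str_replace_all(BOF$origin, "California", "United States")
```

● Uses str_replace_all() from the stringr package to replace "California" with "United States" in the origin column. This is presumably done so that these rows match the country names in Geo during the join in the next step.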
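
str_replace_all() treats its pattern as a regular expression and replaces every matching substring, not only whole cells. The toy vector below (hypothetical values) shows the effect. It also shows a base R alternative that changes only cells exactly equal to "California".

```r
library(stringr)

origin <- c("Spain", "California", "Baja California")

# Substring replacement: "Baja California" is changed as well
replaced <- str_replace_all(origin, "California", "United States")
replaced  # "Spain" "United States" "Baja United States"

# Exact-match alternative: only cells equal to "California" change
origin_exact <- origin
origin_exact[origin_exact == "California"] <- "United States"
origin_exact  # "Spain" "United States" "Baja California"
```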

7. Merging Datasets
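```r
library(dplyr)  # provides left_join(), rename(), select(), filter(), mutate(), arrange(), ...
BOF <- left_join(BOF, Geo, by = c("origin" = "Country"))
```

● Joins BOF with Geo by matching the origin column in BOF to the Country column in Geo, adding the Geo columns to each row of BOF.
● A left join keeps every row of BOF. Rows whose origin has no match in Geo get NA in the new columns, and a country listed more than once in Geo would duplicate the matching rows.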
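
The two small hypothetical tables below show how the left join behaves when one origin has no match in the lookup table. anti_join(), also from dplyr, lists such unmatched rows. It is a quick check to run before the NA handling in step 10.

```r
library(dplyr)

# Hypothetical data: "Atlantis" has no entry in the lookup table
bags <- data.frame(origin = c("Spain", "United States", "Atlantis"),
                   weight = c(2, 1.5, 1))
geo  <- data.frame(Country = c("Spain", "United States"),
                   Region  = c("Europe", "North America"))

joined <- left_join(bags, geo, by = c("origin" = "Country"))
joined  # all 3 rows kept; Region is NA for "Atlantis"

# Rows of bags that found no partner in geo
unmatched <- anti_join(bags, geo, by = c("origin" = "Country"))
```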

8. Renaming Columns
```r
BOF <- rename(BOF, price = prize)
```

● Renames the prize column to price. rename() takes its arguments in the form new_name = old_name.

9. Removing Columns
```r
BOF <- select(BOF, -bagNo)
```

● Removes the bagNo column. In select(), a minus sign in front of a column name keeps every column except that one.
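
Here are steps 8 and 9 applied to a hypothetical three-column data frame. Note that rename() keeps the column in its original position.

```r
library(dplyr)

# Hypothetical data with the original column names
bags <- data.frame(bagNo = 1:2, prize = c(3.2, 4.1), weight = c(1.5, 2))

bags <- rename(bags, price = prize)   # new_name = old_name
bags <- select(bags, -bagNo)          # drop bagNo, keep the rest
names(bags)  # "price" "weight"
```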

10. Handling Missing Data

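```r
anyNA(BOF)
BOF <- na.omit(BOF)
```

● anyNA(BOF):
○ Returns TRUE if the data frame contains at least one NA (missing value) and FALSE otherwise.
● na.omit(BOF):
○ Removes every row that has an NA in any column. This includes rows whose origin found no match in Geo during the join in step 7, so it is worth checking how many rows are dropped.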
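
A small hypothetical data frame shows what anyNA() and na.omit() report and remove. colSums(is.na(...)) is an extra base R check, not part of the original exercise, that counts missing values per column. The last line sketches how to drop only rows with a missing price instead of rows with any NA.

```r
# Hypothetical data with one missing price and one missing weight
df <- data.frame(price  = c(3.2, NA, 4.1),
                 weight = c(1.5, 2, NA),
                 origin = c("Spain", "Italy", "Spain"))

anyNA(df)              # TRUE: at least one value is missing
colSums(is.na(df))     # missing values per column: 1, 1, 0

clean <- na.omit(df)   # keeps only fully complete rows
nrow(clean)            # 1

# Drop only the rows where price is missing
price_only <- df[!is.na(df$price), ]
nrow(price_only)       # 2
```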

11. Filtering Data

```r
BOF_europe <- filter(BOF, Region == "Europe")
```

● Creates a subset of BOF containing only the rows whose Region column equals "Europe". Rows where Region is NA are dropped as well.
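
Here is filter() on hypothetical data. It shows that rows with NA in Region are dropped, and that when several conditions are separated by commas, all of them must hold.

```r
library(dplyr)

# Hypothetical data, including one row without a region
bags <- data.frame(Region = c("Europe", "Asia", NA, "Europe"),
                   price  = c(3, 5, 2, 4))

europe <- filter(bags, Region == "Europe")
nrow(europe)  # 2 (the NA row is not kept)

# Conditions separated by commas are combined with AND
cheap_europe <- filter(bags, Region == "Europe", price < 3.5)
```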

12. Adding Calculated Columns

```r
BOF <- mutate(BOF, ppk = price / weight)
```

● mutate():
○ Adds a new column ppk (price per kilo), calculated for each row as price divided by weight. This assumes weight is recorded in kilograms.

13. Sorting Data

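```r
arrange(BOF, desc(ppk))
```

● arrange():
○ Sorts the rows by the ppk column in descending order (desc()), so the highest price per kilo comes first.
○ As written, the sorted result is only printed and BOF itself is not changed. Assign it (BOF <- arrange(BOF, desc(ppk))) to keep the order.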
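
Here are steps 12 and 13 on hypothetical bags, assuming weight is in kilograms. The sketch also shows that arrange() does not modify its input unless you assign the result. The last line gives the equivalent pipeline using dplyr's %>% operator.

```r
library(dplyr)

# Hypothetical bags; weight assumed to be in kilograms
bags <- data.frame(price = c(3, 5, 4), weight = c(1.5, 2, 0.5))
bags <- mutate(bags, ppk = price / weight)   # ppk = 2, 2.5, 8

arrange(bags, desc(ppk))   # returns sorted rows; bags itself is unchanged
bags$ppk                   # still 2, 2.5, 8

# Assign the result to keep the sorted order
bags_sorted <- arrange(bags, desc(ppk))

# The same two steps written as a pipeline
bags_sorted2 <- bags %>% mutate(ppk = price / weight) %>% arrange(desc(ppk))
```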

14. Saving Processed Data (Elaborated)

```r
write.table(BOF, file="bagsoffruits_price.txt", sep="\t", row.names=FALSE)
```

How It Works:

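● write.table():
○ Writes the data frame BOF to a file named bagsoffruits_price.txt in the current working directory. This is not necessarily the file_path folder; use paste0(file_path, "bagsoffruits_price.txt") to save next to the script.
○ The file uses tab (\t) as the delimiter, and row.names=FALSE leaves out the row numbers.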

Additional Use Cases:

1. Changing the Delimiter:

You can change sep to any other delimiter, such as a comma for a CSV file:
```r
write.table(BOF, file="bagsoffruits_price.csv", sep=",", row.names=FALSE)
```

2. Adding Row Names:
○ Set row.names=TRUE to write the row names (by default, the row numbers) as an extra first column. Adding col.names=NA gives that column an empty header, so the header lines up with the data.
3. Saving with Quotes:

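write.table() already wraps character and factor fields, as well as the column names, in double quotes by default (quote=TRUE). Writing quote=TRUE explicitly therefore changes nothing; set quote=FALSE to write them without quotes:

```r
write.table(BOF, file="bagsoffruits_price.txt", sep="\t", row.names=FALSE, quote=TRUE)
```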
4. Using write.csv() for Simplicity:

For CSV files, write.csv() can be used as a shortcut:

```r
write.csv(BOF, file="bagsoffruits_price.csv", row.names=FALSE)
```
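
To see what write.table() actually writes, the sketch below saves a hypothetical data frame to a temporary file, so the project folder is untouched. It then inspects the raw lines and reads the file back with read.delim(), base R's reader for tab-separated files.

```r
# Hypothetical processed data
bags <- data.frame(origin = c("Spain", "Italy"), ppk = c(2, 2.5))

# Default quoting: text fields and column names get double quotes
out_file <- tempfile(fileext = ".txt")
write.table(bags, file = out_file, sep = "\t", row.names = FALSE)
readLines(out_file)

# quote = FALSE writes the same table without quotation marks
plain_file <- tempfile(fileext = ".txt")
write.table(bags, file = plain_file, sep = "\t", row.names = FALSE,
            quote = FALSE)
readLines(plain_file)

# read.delim() reads the tab-separated file back into a data frame
back <- read.delim(out_file)
```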

15. Counting and Grouping Data

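```r
count(BOF, foodLabel)
```

● Counts the number of rows for each unique value in the foodLabel column and returns the counts in a column named n.

```r
arrange(count(BOF, foodLabel), desc(n))
```

● Sorts these counts in descending order, so the most frequent foodLabel comes first. count(BOF, foodLabel, sort = TRUE) gives the same ordering in one call.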
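
Here are count() and the sorted count on a hypothetical foodLabel column. The sort argument of count() produces the same ordering in one call.

```r
library(dplyr)

# Hypothetical labels standing in for the foodLabel column
bags <- data.frame(foodLabel = c("Gala", "Navel", "Gala", "Fuji", "Gala", "Navel"))

counts <- count(bags, foodLabel)      # one row per label, column n
sorted <- arrange(counts, desc(n))    # most frequent label first
sorted

# count() can do the sorting itself
sorted2 <- count(bags, foodLabel, sort = TRUE)
```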

16. Summarizing Data
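```r
BOFg <- group_by(BOF, foodLabel)
BOFgn <- summarise(BOFg, meanppk = mean(ppk))
```

● group_by():
○ Groups the rows by the foodLabel column.
● summarise():
○ Calculates the mean ppk for each group and returns one row per foodLabel with a column meanppk. Rows with NA were already removed in step 10, so mean() does not need na.rm=TRUE here.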
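
Below is the grouped summary on hypothetical data. It is followed by the same computation written as a pipeline that also counts the bags per label with n().

```r
library(dplyr)

# Hypothetical price-per-kilo values for two labels
bags <- data.frame(foodLabel = c("Gala", "Gala", "Navel"), ppk = c(2, 3, 4))

by_label <- group_by(bags, foodLabel)
summary1 <- summarise(by_label, meanppk = mean(ppk))   # Gala 2.5, Navel 4

# Pipeline form; n() counts the rows (bags) in each group.
# If ppk could still contain NA, mean(ppk, na.rm = TRUE) would skip them.
summary2 <- bags %>%
  group_by(foodLabel) %>%
  summarise(meanppk = mean(ppk), n_bags = n())
```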

17. Troubleshooting and Assistance

● The document recommends using tools like ChatGPT or Copilot to troubleshoot errors or get clarification on concepts.
