Data Manipulation (R)

Liana Harutyunyan
Programming for Data Science
April 25, 2024
American University of Armenia

Data Manipulation

• dplyr is a package that helps you solve the most common data manipulation tasks.
• As before, to install and load the package, you need:

install.packages("dplyr")

library(dplyr)

Data Manipulation

These are the most common functions from the library:

• filter: filters rows;
• select: selects columns;
• arrange: sorts a data frame;
• mutate: creates new columns;
• group_by: groups data by a specific key;
• summarize: performs aggregate functions.

Data Manipulation

• It is not required, but dplyr works best with the pipe operator from the magrittr package.
• The %>% operator takes the object on its left-hand side and passes it as the first argument to the function on its right-hand side.
• To understand how to use it, read the pipe as "then" (in your mind, not in the code).
• Example: filter(...) %>% select(...) means FILTER, THEN SELECT from the filtered result.
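
As a minimal sketch of the "then" reading, the nested call and the piped call below should give the same result. The data frame scores and its columns are made up for illustration and are not part of the slides.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
scores <- data.frame(name = c("a", "b", "c"),
                     score = c(10, 25, 40))

# Nested call: has to be read from the inside out
nested <- select(filter(scores, score > 20), name)

# Piped call: reads top to bottom as "filter, THEN select"
piped <- scores %>%
  filter(score > 20) %>%
  select(name)

identical(nested, piped)
```

The piped form tends to stay readable as more steps are added, which is the main reason it is used throughout these slides.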

Mutate

mutate adds new columns or modifies existing variables in the dataset.
Adding new columns filled with the same value:

diamonds %>%
  mutate(JustOne = 1,
         Values = "something",
         Simple = TRUE)

Mutate

Can use existing columns:

diamonds %>%
  mutate(price_discounted = price * 0.9)

Can modify existing ones:

diamonds %>%
  mutate(price = price * 0.9,
         mean_price = mean(price))  # mean of the already-discounted price
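
One detail worth checking: mutate evaluates its arguments in order, so a later argument sees columns changed by an earlier one. The sketch below shows this on a made-up data frame (items and its values are assumptions for illustration, not from the slides).

```r
library(dplyr)

# Hypothetical toy data; not part of the diamonds dataset
items <- data.frame(item = c("a", "b", "c"),
                    price = c(100, 200, 300))

result <- items %>%
  mutate(price = price * 0.9,        # overwrite price with the discounted value
         mean_price = mean(price))   # computed from the discounted price above

result
```

If the mean of the original prices is wanted instead, one option is to compute mean_price before overwriting price.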

ifelse

ifelse returns a value with the same shape as test, filled with elements selected from either yes or no depending on whether the corresponding element of test is TRUE or FALSE.

ifelse(test, yes, no)

Example:

vector <- c(1:10)

vector <- ifelse(vector > 5, "high", "low")
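
Because ifelse works element by element, calls can be nested to get more than two categories. A small sketch, with cut-off values chosen arbitrarily for illustration:

```r
values <- 1:10

# Nested ifelse: the "no" branch is itself another ifelse call
size_label <- ifelse(values > 7, "high",
                     ifelse(values > 3, "medium", "low"))

# Count how many elements fall into each category
table(size_label)
```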

ifelse with mutate

Changes an existing column or creates a new one depending on another column's values.

practice %>%
  mutate(Health = ifelse(Subject == 1,
                         "sick",
                         "healthy"))

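The practice data frame is not shown on the slides, so the sketch below builds a hypothetical stand-in with a Subject column (the Score column and all values are assumptions) to make the example runnable.

```r
library(dplyr)

# Hypothetical stand-in for the `practice` data frame used on the slide
practice <- data.frame(Subject = c(1, 2, 1, 3),
                       Score = c(55, 70, 62, 81))

# Create the Health column from the values of Subject
practice_labeled <- practice %>%
  mutate(Health = ifelse(Subject == 1, "sick", "healthy"))

practice_labeled$Health
```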
Filter

Retains only the rows of data that meet the specified requirement(s).

diamonds %>%
  filter(cut == "Fair")

Returns only those rows that have "cut" equal to "Fair".
Equivalent (when cut has no missing values) to:

diamonds[diamonds$cut == "Fair", ]
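
The two forms differ when the tested column contains NA: filter drops rows where the condition is NA, while base indexing returns a row of NAs for them. A minimal sketch on a made-up data frame (stones and its values are assumptions for illustration):

```r
library(dplyr)

# Hypothetical toy data with one missing value in `cut`
stones <- data.frame(cut = c("Fair", "Good", NA, "Fair"),
                     price = c(300, 500, 450, 700))

by_filter <- stones %>% filter(cut == "Fair")   # the NA row is dropped
by_index  <- stones[stones$cut == "Fair", ]     # the NA condition yields an all-NA row

nrow(by_filter)  # 2
nrow(by_index)   # 3
```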

Filter

To combine multiple conditions:

• for OR, use "|";
• for AND, use "," (comma) or "&".

diamonds %>%
  filter(cut == "Fair" | cut == "Good",
         price <= 600)

Same as:

diamonds %>%
  filter(cut %in% c("Fair", "Good"),
         price <= 600)
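
A short sketch that checks the three spellings against each other on a made-up data frame (stones and its values are assumptions, not the real diamonds data):

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
stones <- data.frame(cut = c("Fair", "Good", "Ideal", "Good"),
                     price = c(550, 900, 400, 600))

with_or  <- stones %>% filter(cut == "Fair" | cut == "Good", price <= 600)
with_in  <- stones %>% filter(cut %in% c("Fair", "Good"), price <= 600)
with_amp <- stones %>% filter((cut == "Fair" | cut == "Good") & price <= 600)

# All three should keep the same rows
identical(with_or$price, with_in$price)
identical(with_or$price, with_amp$price)
```

When writing the "&" form, the parentheses around the OR part matter, because "&" binds more tightly than "|" in R.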

Select

• Selects only the columns that you want to see and drops all other columns.
• You can refer to columns by position or by name.
• The order in which you list the column names/positions is the order in which the columns will be displayed.

diamonds %>%
  select(cut, color)

Same as (cut and color are the 2nd and 3rd columns of diamonds):

diamonds %>%
  select(2, 3)
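
A minimal sketch of selecting by name, by position, and in a different order. The data frame stones is made up; its first three columns only mimic the column order of diamonds.

```r
library(dplyr)

# Hypothetical toy data whose first three columns mimic the order in diamonds
stones <- data.frame(carat = c(0.23, 0.31),
                     cut = c("Ideal", "Good"),
                     color = c("E", "J"),
                     price = c(326, 335))

by_name     <- stones %>% select(cut, color)
by_position <- stones %>% select(2, 3)         # same columns, referenced by position
reordered   <- stones %>% select(color, cut)   # output follows the listed order

names(by_name)
names(by_position)
names(reordered)
```

Selecting by name is usually the safer choice, since positions change silently when columns are added or removed.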

Select with negative sign

To exclude columns, use the minus sign, with either position numbers or column names.
Examples:

diamonds %>% select(-cut)

diamonds %>% select(-cut, -color)

diamonds %>% select(-c(cut, color))

diamonds %>% select(-c(2, 3))
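
The sketch below checks that the different exclusion spellings drop the same columns. It uses a made-up data frame (stones) in which cut and color are the 2nd and 3rd columns, as an assumption that mirrors diamonds.

```r
library(dplyr)

# Hypothetical toy data; cut and color sit in positions 2 and 3
stones <- data.frame(carat = c(0.23, 0.31),
                     cut = c("Ideal", "Good"),
                     color = c("E", "J"),
                     price = c(326, 335))

drop_each   <- stones %>% select(-cut, -color)
drop_vector <- stones %>% select(-c(cut, color))
drop_index  <- stones %>% select(-c(2, 3))

names(drop_each)
names(drop_vector)
names(drop_index)
```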

group by and summarize

Groups the rows by the categories of one or more variables, so that later operations run per group.

data %>%
  group_by(Country) %>%
  summarize(m = mean(Score),
            s = sd(Score),
            n = n())  # n() counts the rows in each group

Roughly equivalent to Python's (pandas):

data.groupby("Country")["Score"].agg(["mean", "std", "count"])

You can also group by multiple columns, using

group_by(Country, City)
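
A runnable sketch of grouping by one column and by two columns. The data frame and its values are made up; only the column names Country, City and Score follow the slide. The .groups = "drop" argument is an addition here that returns an ungrouped result.

```r
library(dplyr)

# Hypothetical toy data with the column names used on the slide
data <- data.frame(Country = c("AM", "AM", "FR", "FR", "FR"),
                   City = c("Yerevan", "Gyumri", "Paris", "Paris", "Lyon"),
                   Score = c(10, 20, 30, 50, 40))

# One row per country
by_country <- data %>%
  group_by(Country) %>%
  summarize(m = mean(Score),
            s = sd(Score),
            n = n())

# One row per (country, city) combination
by_country_city <- data %>%
  group_by(Country, City) %>%
  summarize(m = mean(Score),
            n = n(),
            .groups = "drop")

by_country
by_country_city
```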
arrange

Lets you sort rows by the values of a variable in ascending or descending order.
This applies to both numerical and non-numerical (alphabetical order) values.

diamonds %>%
  arrange(cut)

diamonds %>%
  arrange(desc(price))
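
arrange also accepts several columns, with later columns breaking ties in earlier ones. A minimal sketch on a made-up data frame (stones and its values are assumptions). Note that in the real diamonds data cut is an ordered factor, so arrange(cut) follows the factor level order rather than the alphabet.

```r
library(dplyr)

# Hypothetical toy data; here cut is a plain character column
stones <- data.frame(cut = c("Good", "Fair", "Good", "Fair"),
                     price = c(500, 300, 900, 700))

# Sort by cut (ascending), then by price (descending) within each cut
sorted <- stones %>%
  arrange(cut, desc(price))

sorted
```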

Data Manipulation - examples set 1

• Exercise: Filter the diamonds data to get a subset that contains only the Ideal cuts.
• Exercise: Add a new column *volume* that is the product of columns x, y and z.
• Exercise: Last class, we calculated the mean diamond price for each type of cut using a for loop. Do the same with the dplyr package.
• Exercise: Plot a barplot of the table obtained in the previous exercise.
• Exercise: Calculate how many diamonds of each cut there are in the dataset.

Data Manipulation - examples set 2

• Read the summer.csv dataset into R.
• Filter the dataset to keep only USA data. Then count the medals the USA received by type.
• Count how many medals each country received and sort from most to fewest. Take the top 10.
• Take the data for the countries "USA", "FRA" and "GBR". Group by the two columns Country and Medal and take the count for each group.
• Plot the data from the previous point. What is the best type of graph for this?

Data Manipulation - examples set 3

• Load the built-in airquality dataset in R.
• Filter to keep only the rows that do not contain NA values and plot a scatterplot of *wind* and *temperature*.
• Convert the temperature from Fahrenheit to Celsius.
• Create a new column that contains "high" if the temperature is above the mean temperature and "low" otherwise.
• Calculate the mean temperature for each month.
• Plot multiple line plots in one graph, where each line is a month and y is the temperature value.

Summary

Reading
https://bookdown.org/yih_huynh/Guide-to-R-Book/basic-data-management.html

Questions?
