Lecture 9: Data Wrangling with dplyr

Kevin Lee

Department of Statistics
Western Michigan University

September 30, 2019

Kevin Lee (WMU) Lecture 9 (9/30/2019) September 30, 2019 1 / 12

Tidy Data

Happy families are all alike; every unhappy family is unhappy in its
own way.
– Leo Tolstoy

Tidy datasets are all alike, but every messy dataset is messy in its
own way.
– Hadley Wickham

Tidy Data

Tidying your data means storing it in a consistent form that matches
the semantics of the dataset.

There are three interrelated rules which make a dataset tidy:

1 Each variable must have its own column.
2 Each observation must have its own row.
3 Each value must have its own cell.
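
As a rough illustration of the three rules, the sketch below reshapes a small untidy table into a tidy one. The data frame and its column names are hypothetical, and the use of pivot_longer() from the tidyr package is an assumption, since the slides do not name a tidying function.

```r
library(tidyr)

# Hypothetical untidy table: the variable "year" is spread across column names.
untidy <- data.frame(
  country = c("A", "B"),
  `2018` = c(10, 20),
  `2019` = c(12, 25),
  check.names = FALSE
)

# Reshape so that each variable (country, year, cases) has its own column
# and each observation (one country in one year) has its own row.
tidy <- pivot_longer(
  untidy,
  cols = c(`2018`, `2019`),
  names_to = "year",
  values_to = "cases"
)

print(tidy)
```

After reshaping, every cell holds exactly one value, and the table has one row per country-year pair.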

Data Transformation with dplyr

Five main dplyr functions allow you to solve the majority of your
data-manipulation challenges:
filter(), pick observations by their values
arrange(), reorder the rows
select(), pick variables by their names
mutate(), create new variables with functions of existing variables
summarize(), collapse many values down to a single summary

Data Transformation with dplyr

All functions work similarly:

1 The first argument is a data frame.
2 The subsequent arguments describe what to do with the data frame,
using the variable names.
3 The result is a new data frame.
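
The three points above can be seen in a minimal sketch. The scores data frame is hypothetical and exists only for illustration; filter() stands in for any of the five functions.

```r
library(dplyr)

# Hypothetical data frame used only for illustration.
scores <- data.frame(
  student = c("Ann", "Bob", "Cal"),
  score = c(90, 72, 85)
)

# First argument: the data frame. Remaining arguments: what to do,
# written with bare column names (score, not scores$score).
high <- filter(scores, score > 80)

# The result is a new data frame; the input is left unchanged.
nrow(scores)  # 3
nrow(high)    # 2
```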

filter()
filter() allows you to subset observations based on their values.

filter(data frame, condition)

To use filtering effectively, you have to know how to select the observations
that you want using the comparison operators and logical operators in R.
Comparison operators in R:
< # less than
> # greater than
== # equal to
<= # less than or equal to
>= # greater than or equal to
!= # not equal to
Logical operators in R:
& # logical “and”
| # logical “or”
! # logical “not”
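
A short sketch of how these operators combine inside filter(). The trips data frame and its columns are hypothetical. Note that filter() keeps only rows where the condition is TRUE, so rows where it evaluates to NA are dropped.

```r
library(dplyr)

# Hypothetical data frame used only for illustration.
trips <- data.frame(
  carrier = c("AA", "DL", "AA", "UA"),
  month = c(1, 1, 2, 12),
  delay = c(5, NA, 30, -2)
)

# Logical "and": January trips flown by carrier AA.
jan_aa <- filter(trips, month == 1 & carrier == "AA")

# Logical "or": trips in December or January.
winter <- filter(trips, month == 12 | month == 1)

# Logical "not": every carrier except AA (same as carrier != "AA").
not_aa <- filter(trips, !(carrier == "AA"))

# Comparison: delays of at least 10; the row with a missing delay is dropped.
late <- filter(trips, delay >= 10)
```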
arrange()

arrange() allows you to change the order of the observations.

arrange(data frame, column name)

If you provide more than one column name, each additional column will be
used to break ties in the values of preceding columns.
Use desc() to reorder by a column in descending order.
Missing values are always sorted at the end, even with desc().
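
A small sketch of tie-breaking, desc(), and the placement of missing values. The data frame is hypothetical.

```r
library(dplyr)

# Hypothetical data frame used only for illustration.
df <- data.frame(
  group = c("b", "a", "b", "a", "c"),
  value = c(2, NA, 1, 3, 5)
)

# Sort by group first; value breaks ties within each group.
by_group <- arrange(df, group, value)

# Sort by value in descending order; the NA still ends up last.
by_desc <- arrange(df, desc(value))

print(by_group)
print(by_desc)
```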

select()

select() allows you to zoom in on a useful subset using operations based
on the names of the variables.

select(data frame, column name)

Below are some helper functions you can use within select():
starts_with("abc") matches names that begin with "abc".
ends_with("xyz") matches names that end with "xyz".
num_range("x", 1:3) matches x1, x2, and x3.
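
The helpers can be tried on a small example. The survey data frame and its column names are hypothetical, chosen so that each helper matches something.

```r
library(dplyr)

# Hypothetical data frame used only for illustration.
survey <- data.frame(
  id = 1:2,
  abc_height = c(170, 182),
  abc_weight = c(65, 80),
  score_xyz = c(3, 4),
  x1 = c(0, 1),
  x2 = c(1, 0),
  x3 = c(1, 1)
)

# Pick columns by name.
by_name <- select(survey, id, score_xyz)

# Pick columns whose names begin with "abc".
abc_cols <- select(survey, starts_with("abc"))

# Pick columns whose names end with "xyz".
xyz_cols <- select(survey, ends_with("xyz"))

# Pick x1, x2, and x3.
x_cols <- select(survey, num_range("x", 1:3))
```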

mutate()

mutate() allows you to add new columns that are functions of existing
columns.

mutate(data frame, new column = f(column name))

If you only want to keep the new variables, use transmute().
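
A minimal sketch contrasting mutate() and transmute(). The people data frame and the derived columns are hypothetical. A column created earlier in the same call (height_m here) can be used by later arguments.

```r
library(dplyr)

# Hypothetical data frame used only for illustration.
people <- data.frame(
  name = c("Ann", "Bob"),
  height_cm = c(160, 180),
  weight_kg = c(50, 81)
)

# mutate() keeps the existing columns and appends the new ones.
with_bmi <- mutate(
  people,
  height_m = height_cm / 100,
  bmi = weight_kg / height_m^2
)

# transmute() keeps only the newly created columns.
only_new <- transmute(
  people,
  height_m = height_cm / 100,
  bmi = weight_kg / height_m^2
)

names(with_bmi)
names(only_new)
```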

summarize()

summarize() collapses a data frame to a single row.

summarize(data frame, R function(column name))

Below are some summary functions you can use within summarize():
Measures of location: mean(), median()
Measures of variation: var(), sd(), IQR()
Measures of rank: min(), max(), quantile()

summarize() becomes really useful when it is used with group_by().

group_by() is used to group data by one or more variables.
group_by(data frame, column name)
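
A sketch of summarize() with and without group_by(). The sales data frame is hypothetical, and n(), which counts the rows in each group, is a dplyr function not listed on the slide.

```r
library(dplyr)

# Hypothetical data frame used only for illustration.
sales <- data.frame(
  region = c("east", "east", "west", "west", "west"),
  amount = c(10, 20, 5, 15, 40)
)

# Without grouping: the whole data frame collapses to one row.
overall <- summarize(
  sales,
  mean_amount = mean(amount),
  sd_amount = sd(amount)
)

# With grouping: one summary row per region.
by_region <- summarize(
  group_by(sales, region),
  n = n(),
  mean_amount = mean(amount),
  max_amount = max(amount)
)

print(overall)
print(by_region)
```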

Relational Data with dplyr

It is rare that a data analysis involves only a single table of data.

Typically you have many tables of data, and you must combine them
to answer the questions that you are interested in.

Multiple tables of data are called relational data because it is the
relations, not just the individual datasets, that are important.

Relational Data with dplyr

inner_join(x, y), keeps only observations whose keys match in both x and y.

full_join(x, y), keeps all observations in x and y.
left_join(x, y), keeps all observations in x.
right_join(x, y), keeps all observations in y.
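
The four joins can be compared on two small tables. The tables x and y, their key column, and the explicit by = "key" argument are hypothetical choices for this sketch. Rows without a match in the other table get NA in the columns that come from it.

```r
library(dplyr)

# Hypothetical tables sharing a "key" column; key 3 is only in x, key 4 only in y.
x <- data.frame(key = c(1, 2, 3), x_val = c("x1", "x2", "x3"))
y <- data.frame(key = c(1, 2, 4), y_val = c("y1", "y2", "y4"))

# Only keys present in both tables (1 and 2).
inner <- inner_join(x, y, by = "key")

# Every key from either table (1, 2, 3, 4).
full <- full_join(x, y, by = "key")

# Every key from x (1, 2, 3); y_val is NA for key 3.
left <- left_join(x, y, by = "key")

# Every key from y (1, 2, 4); x_val is NA for key 4.
right <- right_join(x, y, by = "key")

print(full)
```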
