23-01-2025, 11:24 Learn R: Learn R: Data Cleaning Cheatsheet | Codecademy


Learn R: Data Cleaning

gsub() R Function

The base R gsub() function searches a string for a regular expression and replaces every match. The function receives the pattern to search for (a regular expression or a plain string), the replacement value, and the object to search in. We can use it to replace substrings within a single string or in each string of a vector.

When combined with dplyr’s mutate() function, gsub() can clean a column of a data frame to enable analysis.

# Replace each "1" with the empty string in the teams vector
# to get the teams_clean vector with the correct names.
teams <- c("Fal1cons", "Cardinals", "Seah1awks", "Vikings", "Bro1nco", "Patrio1ts")

teams_clean <- gsub("1", "", teams)

print(teams_clean)

# Output:
# "Falcons" "Cardinals" "Seahawks" "Vikings" "Bronco" "Patriots"

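The paragraph above mentions pairing gsub() with mutate() to clean a data frame column. Here is a minimal sketch, assuming dplyr is installed; the tickets data frame and its team and price columns are hypothetical. It also shows one common pitfall: "$" is a regular-expression anchor, so fixed = TRUE is used to match the literal character.

```r
library(dplyr)

# Hypothetical data frame; names and values are illustrative only
tickets <- data.frame(
  team  = c("Fal1cons", "Seah1awks", "Vikings"),
  price = c("$24", "$17", "$31")
)

tickets_clean <- tickets %>%
  mutate(
    # Remove every "1" from the team names
    team = gsub("1", "", team),
    # "$" means end-of-string in a regular expression, so fixed = TRUE
    # makes gsub() match the literal dollar sign instead
    price = gsub("$", "", price, fixed = TRUE)
  )

print(tickets_clean)
```

Without fixed = TRUE, gsub("$", "", price) would leave the values unchanged, because the pattern only matches the empty position at the end of each string.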
distinct() dplyr

The distinct() function from the dplyr package keeps only the unique rows of a data frame. If there are duplicate rows, the function preserves only the first one. The function can be used to remove duplicate rows from a data frame, or to keep rows based on the unique values of one column or the unique combinations of values across several columns. When columns are named, only those columns are returned unless .keep_all = TRUE is set.

library(dplyr)

# Keep unique rows in the match_statistics data frame
distinct(match_statistics)

# Keep only rows with different values in the prices column
# of the trips data frame
distinct(trips, prices)
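To make the behavior concrete, here is a small sketch with a hypothetical trips data frame (the city column and all values are invented for illustration). It contrasts removing exact duplicate rows, keeping one row per distinct price, and using .keep_all = TRUE to retain the other columns of the first matching row.

```r
library(dplyr)

# Hypothetical trips data frame with one fully duplicated row
trips <- data.frame(
  city   = c("Paris", "Paris", "Rome", "Oslo"),
  prices = c(100, 100, 100, 250)
)

# Drop the exact duplicate row ("Paris", 100): three rows remain
unique_trips <- distinct(trips)

# One row per distinct price; only the prices column is returned
unique_prices <- distinct(trips, prices)

# One row per distinct price, keeping every column of the first match
first_per_price <- distinct(trips, prices, .keep_all = TRUE)
```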

https://www.codecademy.com/learn/learn-r/modules/learn-r-data-cleaning/cheatsheet

str() Function

The str() function displays the internal structure of an R object passed to it as an argument. It prints the data structure of the object as well as its elements. When the object is a data frame, the function prints the number of observations and variables, followed by the data type and first few values of each column.
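A quick way to see what str() reports is to call it on a small data frame. In this hypothetical games data frame, the year column was stored as text, which is exactly the kind of problem str() helps diagnose before cleaning. The sketch assumes R 4.0 or later, where data.frame() keeps strings as characters by default.

```r
# Hypothetical data frame in which the year column was stored as text
games <- data.frame(
  team = c("Falcons", "Vikings"),
  year = c("2019", "2020"),
  wins = c(7L, 10L)
)

# Prints the object type with the number of observations and variables,
# then one line per column showing its type and first values
str(games)
```

In the printed output, the year column is listed as chr rather than a numeric type, signaling that it needs conversion.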

Combining Data with R

Data from multiple files can be combined into one data frame using the base R functions list.files() and lapply() together with readr’s read_csv() and dplyr’s bind_rows(). The code below assumes both packages are loaded:

library(readr)
library(dplyr)

Consider the following steps:

1. Get the list of files. The following code gets a list of all files in the current directory whose names match the regular expression "file_.*csv".

files <- list.files(pattern = "file_.*csv")

2. Read in the files. The following code applies read_csv(), a function from readr, to each file and stores the resulting data frames in the list df_list.

df_list <- lapply(files, read_csv)

3. Combine the file data. Below, bind_rows(), a dplyr function, combines the data frames in the list into one data frame.

df <- bind_rows(df_list)
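The three steps above can be tried end to end without any existing files. This sketch, assuming readr 2.0 or later (for show_col_types) and dplyr, writes two hypothetical CSV files to a temporary directory and then combines them. It uses the stricter pattern "^file_.*\\.csv$" and full.names = TRUE; both are choices made for this example, not part of the original steps.

```r
library(readr)
library(dplyr)

# Work in a temporary directory so the example does not touch real files
dir <- file.path(tempdir(), "combine_demo")
dir.create(dir, showWarnings = FALSE)

# Write two small hypothetical CSV files following the file_*.csv naming scheme
write_csv(data.frame(id = 1:2, score = c(90, 85)), file.path(dir, "file_1.csv"))
write_csv(data.frame(id = 3:4, score = c(70, 95)), file.path(dir, "file_2.csv"))

# Step 1: list matching files; full.names = TRUE returns paths that
# read_csv() can open regardless of the working directory
files <- list.files(path = dir, pattern = "^file_.*\\.csv$", full.names = TRUE)

# Step 2: read each file; show_col_types = FALSE silences readr's column message
df_list <- lapply(files, read_csv, show_col_types = FALSE)

# Step 3: stack the data frames into one
df <- bind_rows(df_list)
```

bind_rows() matches columns by name, so files with the same columns stack cleanly even if their column order differs.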


R as.numeric() Function

The base R as.numeric() function coerces character strings into numeric values. This is useful because numbers are often stored as characters, which do not support arithmetic or numeric analysis. The function receives the object to transform as its argument and returns a numeric version of it; strings that cannot be parsed as numbers become NA, with a warning.

When this function is combined with the mutate() function from dplyr, new columns with the numeric data type can be created in a data frame.
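Here is a minimal sketch of the mutate() pairing described above, assuming dplyr is installed. The sales data frame and its units column are hypothetical; the "n/a" entry shows how unparseable text turns into NA.

```r
library(dplyr)

# Hypothetical data frame where numbers were stored as text
sales <- data.frame(
  item  = c("pens", "paper", "ink"),
  units = c("12", "5", "n/a")
)

# Create a numeric column from the character column.
# "n/a" cannot be parsed and becomes NA; suppressWarnings() hides the
# coercion warning that as.numeric() would otherwise emit.
sales <- sales %>%
  mutate(units_num = suppressWarnings(as.numeric(units)))

# Arithmetic now works on the new column (NA values are skipped here)
total_units <- sum(sales$units_num, na.rm = TRUE)
```

Suppressing the warning is a judgment call; in practice it is often worth inspecting which values became NA before discarding them.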

str_sub() function

The str_sub() function from the stringr package extracts part of a string by character position, which lets you separate combined data values into their individual components. The function uses the start= and end= arguments to set the first and last positions to keep. This function can be used with mutate() from dplyr to generate multiple new columns in a data frame from the split string values of a particular column.

library(stringr)

# Extract the characters from position 1 through position 5
# (returns "Marya")
str_sub('Marya1984', start=1, end=5)
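Building on the 'Marya1984' example, this sketch uses str_sub() inside mutate() to split a hypothetical id_code column into a name and a four-digit year. It assumes dplyr and stringr are installed, and that the year is always the last four characters. Negative positions count back from the end of the string, so the split works even when names differ in length.

```r
library(dplyr)
library(stringr)

# Hypothetical column combining a name and a four-digit year
people <- data.frame(id_code = c("Marya1984", "Jonah1990"))

people <- people %>%
  mutate(
    # Everything except the last four characters
    name = str_sub(id_code, start = 1, end = -5),
    # The last four characters (end defaults to -1, the final character)
    year = str_sub(id_code, start = -4)
  )
```

The new year column is still text; as.numeric() can convert it if numeric analysis is needed.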

Tidy Dataset

In a tidy dataset, each variable is represented by a column, and each row is a separate observation. Tidy datasets make analysis easier: by adhering to this standard, an analyst can extract information from the data without first restructuring it. Datasets that are not tidy have structural problems such as one column storing multiple variables, the values of one variable being spread out across multiple columns, or variables being stored in both rows and columns.


The dplyr and tidyr packages

The dplyr and tidyr packages provide functions that solve common data cleaning challenges in R. Data cleaning and preparation should be performed on a “messy” dataset before any analysis can occur. This process can include:

- diagnosing the “tidiness” of the data
- reshaping the data
- combining multiple files of data
- changing the data types of values
- manipulating strings to better represent the data

separate() Function

The separate() function from the tidyr package is used to split a single character column of a data frame into multiple columns. The arguments of this function are, in order: a data frame, the column to split (given by name or by position in the data frame), the names of the new columns, and the separator. The default separator matches any sequence of non-alphanumeric characters, such as a space or a semicolon.

library(tidyr)

# Separate the complete_name column of the individuals data frame
# into new columns called names and surnames.
separate(individuals, complete_name, c("names","surnames"))
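To see the default separator at work, here is a sketch with a hypothetical individuals data frame, assuming tidyr is installed. The first call relies on the default separator (the space is non-alphanumeric). The second passes sep = ";" explicitly for values written as "surname;name". By default, separate() drops the original column.

```r
library(tidyr)

# Hypothetical data frame of full names
individuals <- data.frame(complete_name = c("Ada Lovelace", "Alan Turing"))

# The default sep splits on any run of non-alphanumeric characters
split_names <- separate(individuals, complete_name, c("names", "surnames"))

# An explicit separator for hypothetical "surname;name" values
semi <- data.frame(complete_name = "Lovelace;Ada")
split_semi <- separate(semi, complete_name, c("surnames", "names"), sep = ";")
```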

gather() tidyr

The gather() function from the tidyr package gathers multiple columns of a data frame into key-value pairs, changing the shape of the data frame from wide to long. The gathered columns are collapsed into two columns: one holding the former column names (the key) and one holding their values (the value).
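The wide-to-long reshape can be sketched with a hypothetical scores data frame that has one column per quarter, assuming tidyr is installed. Current tidyr releases document gather() as superseded by pivot_longer(), so the equivalent pivot_longer() call is shown as well. The two functions return the same rows in a different order.

```r
library(tidyr)

# Hypothetical wide data: one column of wins per quarter
scores <- data.frame(
  team = c("Falcons", "Vikings"),
  q1   = c(7, 10),
  q2   = c(4, 7)
)

# key = former column names (quarter), value = their values (wins)
long_scores <- gather(scores, key = "quarter", value = "wins", q1, q2)

# The same reshape with pivot_longer(), which orders rows team by team
long_pivot <- pivot_longer(scores, cols = c(q1, q2),
                           names_to = "quarter", values_to = "wins")
```

gather() stacks the q1 column first and then q2, so each team appears twice with one row per quarter.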
