Learn R - Learn R - Data Cleaning Cheatsheet - Codecademy
gsub() R Function
The base R gsub() function searches for a regular expression in a string and replaces every match. The function receives, in order, the string or regular expression to replace, a replacement value, and the object that contains the text to search. We can use it to replace substrings within a single string or in each string in a vector.
When combined with dplyr’s mutate() function, a column of a data frame can be cleaned to enable analysis.

# Replace the character "1" with the empty string in the teams vector
# in order to get the teams_clean vector with the correct names.
teams <- c("Fal1cons", "Cardinals", "Seah1awks", "Vikings", "Bro1nco", "Patrio1ts")
teams_clean <- gsub("1", "", teams)
print(teams_clean)
# Output:
# "Falcons" "Cardinals" "Seahawks" "Vikings" "Bronco" "Patriots"
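The combination with mutate() mentioned above can be sketched as follows. This is a minimal illustration, not the cheatsheet's own example: the games data frame and its team and wins columns are hypothetical names invented here.

```r
library(dplyr)

# Hypothetical data frame; the column names are assumptions for illustration.
games <- data.frame(
  team = c("Fal1cons", "Cardinals", "Seah1awks"),
  wins = c(7, 4, 9),
  stringsAsFactors = FALSE
)

# mutate() overwrites the team column with the strings cleaned by gsub().
games_clean <- games %>%
  mutate(team = gsub("1", "", team))

print(games_clean)
```

The wins column is left untouched; only the column named inside mutate() is rewritten.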
distinct() dplyr
The distinct() function from the dplyr package is used to keep only the unique rows of a data frame. If there are duplicate rows, the function preserves only the first one. The function can be used to remove identical rows from a data frame, or to remove rows based on the unique values of one column or the unique combinations of values in several columns.

# Keep unique rows in the match_statistics data frame
distinct(match_statistics)

# Keep one row per distinct value of the prices column of the trips data frame
# (only the prices column is returned unless .keep_all = TRUE is set)
distinct(trips, prices)
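To make the difference between these calls concrete, here is a small sketch on a made-up trips data frame. The city column and all values are assumptions introduced for illustration only.

```r
library(dplyr)

# Hypothetical data; only the prices column name comes from the text above.
trips <- data.frame(
  city = c("Lima", "Lima", "Quito", "Bogota"),
  prices = c(100, 100, 100, 250),
  stringsAsFactors = FALSE
)

# Drops the second, fully identical Lima row.
unique_rows <- distinct(trips)

# Keeps one row per distinct price and returns only the prices column.
unique_prices <- distinct(trips, prices)

# Keeps the first full row found for each distinct price.
first_per_price <- distinct(trips, prices, .keep_all = TRUE)
```

With .keep_all = TRUE the other columns survive, which is usually what is wanted when deduplicating on a key column.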
https://www.codecademy.com/learn/learn-r/modules/learn-r-data-cleaning/cheatsheet 1/4
23-01-2025, 11:24 Learn R: Learn R: Data Cleaning Cheatsheet | Codecademy
str() Function
df <- bind_rows(df_list)
R as.numeric() Function
str_sub() function
The str_sub() function from the stringr package extracts part of a string by index position, which makes it possible to separate combined data values into their individual components. The function uses the start= and end= arguments to select the substring. This function can be used with mutate() from dplyr in order to generate multiple new columns on a data frame based on the split string values of a particular column.

# This command takes the characters from the first index through the fifth index of the string.
str_sub('Marya1984', start=1, end=5)
# Output:
# "Marya"
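As a rough sketch of the mutate() usage described above, the following splits a combined column into two new columns. The people data frame and the name_year, name, and birth_year column names are hypothetical, and the example assumes every name is exactly five characters long.

```r
library(dplyr)
library(stringr)

# Hypothetical data: a five-letter name followed by a four-digit year.
people <- data.frame(
  name_year = c("Marya1984", "Jonas1990"),
  stringsAsFactors = FALSE
)

# Each str_sub() call extracts one fixed-position piece into its own column.
people_split <- people %>%
  mutate(
    name = str_sub(name_year, start = 1, end = 5),
    birth_year = str_sub(name_year, start = 6, end = 9)
  )

print(people_split)
```

The extracted birth_year values are still character strings; they would need a conversion such as as.numeric() before being used in arithmetic.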
Tidy Dataset
separate() Function
The separate() function from the tidyr package is used to separate a single character column of a data frame into multiple columns. The arguments of this function are, in order, the data frame, the column used to create the new columns (column name or column position in the data frame), the new column names that will be used, and the separator argument. The default separator matches any sequence of non-alphanumeric characters, such as a space or semicolon.

# This call separates the complete_name column into new columns
# called names and surnames on the individuals data frame.
separate(individuals, complete_name, c("names", "surnames"))
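A small worked example may help show what the call produces. The contents of the individuals data frame below are invented for illustration; only the column names come from the text above.

```r
library(tidyr)

# Hypothetical data: first name and surname joined by a single space.
individuals <- data.frame(
  complete_name = c("Ana Silva", "Tom Baker"),
  stringsAsFactors = FALSE
)

# Default separator: splits on the space because it is non-alphanumeric.
split_default <- separate(individuals, complete_name, c("names", "surnames"))

# Same split with the separator stated explicitly through the sep argument.
split_explicit <- separate(individuals, complete_name,
                           c("names", "surnames"), sep = " ")

print(split_default)
```

By default the original complete_name column is dropped from the result and replaced by the new columns.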
gather() tidyr