Programming in R. Ex 4 Detailed explanation
library(tidyverse)
library(stringr)
library(htmlwidgets)
library(htmltools)
2. Validating Emails
Code:
R
emails <- c("[email protected]","unicorns_rock!gmail.com")
at_regex <- "@"
dotcom_regex <- "\\.com"  # escaped dot, so only a literal "." matches
● Data:
○ emails: A character vector containing two email-like strings.
● Regex Patterns:
○ at_regex <- "@": Matches the literal "@" symbol.
○ dotcom_regex <- "\\.com": Matches the literal ".com". In a regular expression
an unescaped "." matches any single character, so ".com" would also match text
such as "xcom"; escaping it as "\\." restricts the match to an actual dot.
● Function: grepl():
○ Checks whether a pattern exists in each element of a vector.
○ Output: A logical vector indicating TRUE if the pattern is found, FALSE
otherwise.
○ For at_regex:
■ Checks if "@" exists in each string.
■ Example: "[email protected]" -> TRUE;
"unicorns_rock!gmail.com" -> FALSE.
○ For dotcom_regex:
■ Checks if ".com" exists.
■ Example: "[email protected]" -> TRUE;
"unicorns_rock!gmail.com" -> TRUE.
Visualizing Matches:
R
str_view(emails, at_regex)
str_view(emails, dotcom_regex)
● Function: str_view():
○ Highlights occurrences of a pattern within strings.
○ Usage:
■ First highlights "@" in the emails vector.
■ Second highlights ".com".
● Loop Logic:
○ Iterates over each email in the emails vector.
○ Condition:
■ Combines two grepl() checks (& is logical AND):
■ "@" must be present.
■ ".com" must be present.
○ Output:
■ If both conditions are TRUE, prints "email valid".
■ Otherwise, prints "email not valid".
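The loop itself is not reproduced above, so here is a minimal sketch of it, assuming it iterates with for() and prints one message per address. The addresses are hypothetical (the third one is chosen to show why the dot must be escaped), and the final line contrasts the result with the unescaped ".com" pattern.

```r
# Hypothetical addresses (not the exercise's originals)
emails <- c("someone@mail.com", "unicorns_rock!gmail.com", "someone@mailxcom")
at_regex <- "@"
dotcom_regex <- "\\.com"   # escaped: only a literal "." counts

results <- character(0)
for (email in emails) {
  # Both conditions must hold; & is logical AND (here on single values)
  if (grepl(at_regex, email) & grepl(dotcom_regex, email)) {
    msg <- "email valid"
  } else {
    msg <- "email not valid"
  }
  print(msg)
  results <- c(results, msg)
}

# Same decision without a loop, since grepl() is vectorised
results_vec <- ifelse(grepl(at_regex, emails) & grepl(dotcom_regex, emails),
                      "email valid", "email not valid")

# With the unescaped ".com", "someone@mailxcom" would wrongly pass
loose <- grepl(at_regex, emails) & grepl(".com", emails)
```

The vectorised ifelse() version gives the same verdicts as the loop and is the more idiomatic form in R.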
Matches Anywhere:
R
str_view(strings, "ab", match = TRUE)
grep("ab", strings, value = TRUE)
● Pattern: ab:
○ Matches "ab" anywhere in the string.
● str_view():
○ Highlights all occurrences of "ab".
● grep(value = TRUE):
○ Returns strings containing "ab" as a substring.
● Pattern: ^ab:
○ Matches strings that start with "ab".
● Example:
○ Matches "abcd" and "ab".
● Pattern: ab$:
○ Matches strings that end with "ab".
● Example:
○ Matches "cdab", "c ab", and "ab".
● Pattern: ^ab$:
○ Matches strings that are exactly "ab".
● Example:
○ Matches only "ab".
● Pattern: \\$\\^\\$:
○ Escapes $ and ^ with \\ so they are treated as literal characters rather than
anchors; the pattern matches the literal string $^$.
Functions:
R
pattern <- "\\$\\^\\$"   # regex \$\^\$ : a literal "$", then "^", then "$"
str_view(example, pattern)
str_extract(example, pattern)
ifelse(grepl(pattern, example), "pattern found", "pattern not found")
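The anchors and escapes above can be checked with base R's grep()/grepl(). The `strings` and `example` objects used in the exercise are not defined in this document, so the vectors below are hypothetical; the last two lines show that the unescaped pattern "$^$" behaves as three anchors (matching only an empty string, under PCRE with perl = TRUE), and that fixed = TRUE is an alternative to escaping.

```r
# Hypothetical test vector (stands in for the exercise's `strings`)
strings <- c("abcd", "cdab", "cabd", "c ab", "ab")

anywhere <- grep("ab", strings, value = TRUE)    # "ab" at any position
starts   <- grep("^ab", strings, value = TRUE)   # anchored at the start
ends     <- grep("ab$", strings, value = TRUE)   # anchored at the end
exact    <- grep("^ab$", strings, value = TRUE)  # the whole string is "ab"

# Hypothetical vector for the escaping example
symbols <- c("$^$", "cost: $^$ units", "$$", "^")
literal <- grepl("\\$\\^\\$", symbols)            # escaped: literal "$^$"
as_text <- grepl("$^$", symbols, fixed = TRUE)    # fixed = TRUE: no regex at all

# Unescaped, "$^$" is end/start/end anchors: only an empty string matches
anchors_only <- grepl("$^$", c("", "$^$"), perl = TRUE)
```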
6. Removing Punctuation
Code:
R
str_replace_all(tweets, "[[:punct:]]", "")
gsub("[[:punct:]]", "", tweets)
● Regex Pattern:
○ [[:punct:]]: Matches all punctuation.
● Functions:
○ str_replace_all(): From stringr, replaces punctuation with an empty
string.
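○ gsub(): Base R equivalent. The two regex engines define [[:punct:]] slightly
differently: stringr (ICU) treats symbols such as "$", "+" and "^" as symbols
rather than punctuation and leaves them in place, whereas gsub() removes them.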
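A small base R sketch of the removal step. The `tweets` object from the exercise is not shown here, so these tweets are hypothetical. Note two side effects: apostrophes are punctuation too, so "Don't" becomes "Dont", and removing characters can leave stray spaces, which trimws() tidies at the ends.

```r
# Hypothetical tweets (stand-ins for the exercise's `tweets`)
tweets <- c("Hello, world!", "R is #1 :-)", "Don't panic.")

clean <- gsub("[[:punct:]]", "", tweets)
clean   # "Hello world" "R is 1 " "Dont panic"

# Trim whitespace left behind at the start or end of each string
tidy <- trimws(clean)
tidy    # "Hello world" "R is 1" "Dont panic"
```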
7. Matching Repetitions
Code:
R
numbers <- c("123", "1234", "09876543", "2345678288",
"abc12345678dd")
str_view(numbers, "\\b\\d{8}\\b")
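● Pattern:
○ \\d{8}: Matches exactly 8 digits.
○ \\b: Word boundary, so the 8 digits must stand alone. Letters, digits and
underscores are all word characters, so "2345678288" (10 digits) and
"abc12345678dd" (digits attached to letters) do not match; only "09876543" does.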
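To see what the word boundaries contribute, the same vector can be tested with base R's grepl() (perl = TRUE, so \\b and \\d follow PCRE rules) once with and once without them.

```r
numbers <- c("123", "1234", "09876543", "2345678288", "abc12345678dd")

# Exactly 8 digits standing alone
bounded   <- grepl("\\b\\d{8}\\b", numbers, perl = TRUE)
# Any 8 consecutive digits, even inside a longer run or next to letters
unbounded <- grepl("\\d{8}", numbers, perl = TRUE)

numbers[bounded]    # "09876543"
numbers[unbounded]  # "09876543" "2345678288" "abc12345678dd"
```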
8. Cleaning Up Scripts
● The task asks the user to clean a messy script file (Ex4_messy_code.r), ensuring
proper formatting and comments.
The code in question generates a new column, class_code, in the dataframe courses
from transformations of two columns.
Original Code:
R
courses$class_code <- paste0(
str_extract(courses$Coursenumber, "[A-Z]+[A-Z0-9]*U"),
".LA_",
ifelse(grepl("Spring", courses$Semester), "S23", "F22")
)
Detailed Breakdown
● Input Columns:
○ Coursenumber: Contains course identifiers like "BA-BASPV1001U".
○ Semester: Indicates whether the course occurs in Fall or Spring.
● paste0():
○ Combines multiple strings into one without spaces between them.
Example:
R
paste0("A", "B", "C") # Outputs "ABC"
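This means the expression builds one string per row of the dataframe by combining
three parts: the course identifier extracted from Coursenumber, the fixed string
".LA_", and a semester suffix ("S23" or "F22"). The first part, str_extract(),
works as follows: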
● Purpose:
○ Extracts the part of the Coursenumber column that matches a specific
pattern.
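● Regex Pattern:
○ "[A-Z]+[A-Z0-9]*U":
■ [A-Z]+: Matches one or more uppercase letters (e.g., "BASPV").
■ [A-Z0-9]*: Matches zero or more uppercase letters or digits (e.g.,
"1001").
■ U: Matches the literal character "U".
○ The "BA-" prefix is skipped: a match attempt starting at "BA" reaches the
hyphen, which neither character class allows, before finding a "U", so the
search moves on. Together, this extracts course identifiers like "BASPV1001U".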
Example:
● Input: "BA-BASPV1001U"
● Output: "BASPV1001U"
Function Explanation:
● str_extract():
○ Searches for the first match of the pattern in each string.
○ If no match is found, returns NA.
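A base R sketch of the same extraction, useful for seeing the NA behaviour. The helper extract_first() is hypothetical (it mimics what str_extract() returns), and "BA-1234" is a made-up course number containing no "U".

```r
# Hypothetical helper: first match per element, NA when there is none
# (base R sketch of what stringr::str_extract() returns)
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)   # start of first match, or -1
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)        # regmatches() drops non-matches
  out
}

codes <- c("BA-BASPV1001U", "BA-1234")    # second value is hypothetical
extract_first(codes, "[A-Z]+[A-Z0-9]*U")  # "BASPV1001U" NA
```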
Example (appending the fixed string):
● Extracted identifier: "BASPV1001U"
● After paste0() adds ".LA_": "BASPV1001U.LA_".
● grepl(pattern, x):
○ Returns TRUE if pattern is found in x.
○ Here:
■ pattern = "Spring"
■ x = courses$Semester (column containing values like "Fall
(E)", "Spring (E)").
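The semester step can be tried on its own. The first two values are the ones quoted above; the NA is a hypothetical missing entry, included to show that grepl() returns FALSE for NA, so a missing semester silently receives "F22".

```r
semester <- c("Fall (E)", "Spring (E)", NA)   # NA is a hypothetical missing value

is_spring <- grepl("Spring", semester)         # FALSE TRUE FALSE
suffix    <- ifelse(is_spring, "S23", "F22")   # "F22" "S23" "F22"
```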
● ifelse(test, yes, no):
○ Returns "S23" where grepl() is TRUE (Spring) and "F22" everywhere else.
Example:
R
paste0(
  str_extract(courses$Coursenumber, "[A-Z]+[A-Z0-9]*U"), # Extracted course identifier
  ".LA_",                                                 # Fixed string
  ifelse(grepl("Spring", courses$Semester), "S23", "F22") # Semester suffix
)
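An end-to-end sketch on a small hypothetical courses dataframe (only the first course number comes from the text; the rest are made up). It uses base R regexpr()/regmatches() as a stand-in for str_extract() so that it runs without stringr. The third row shows a pitfall: when nothing matches, paste0() turns the NA into the literal text "NA".

```r
# Hypothetical course data in the shape described above
courses <- data.frame(
  Coursenumber = c("BA-BASPV1001U", "BA-BINBO1234U", "Guest lecture"),
  Semester     = c("Fall (E)", "Spring (E)", "Fall (E)"),
  stringsAsFactors = FALSE
)

# Base R stand-in for str_extract(): first match, NA if none
m  <- regexpr("[A-Z]+[A-Z0-9]*U", courses$Coursenumber, perl = TRUE)
id <- rep(NA_character_, nrow(courses))
id[m != -1] <- regmatches(courses$Coursenumber, m)

courses$class_code <- paste0(
  id,                                                      # course identifier
  ".LA_",                                                  # fixed string
  ifelse(grepl("Spring", courses$Semester), "S23", "F22")  # semester suffix
)
courses$class_code
# "BASPV1001U.LA_F22" "BINBO1234U.LA_S23" "NA.LA_F22"
```

If unmatched rows should stay missing instead, one option is to set class_code to NA wherever id is NA.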
Example: