
Programming in R. Ex 4 Detailed explanation

The document provides a detailed explanation of various R programming concepts, including installing libraries, validating emails using regex, and manipulating strings. It covers regex patterns for matching specific string formats, cleaning data, and creating new columns in dataframes based on conditions. Additionally, it includes examples of functions like grepl, str_view, and paste0 to demonstrate practical applications of these concepts.

Uploaded by soloviovalada
Copyright © All Rights Reserved

Ex 4 Detailed explanation

1. Installing and Loading Libraries


Code:
R
#install.packages("tidyverse")
#install.packages("htmlwidgets")
#install.packages("htmltools")

library(tidyverse)
library(stringr)
library(htmlwidgets)
library(htmltools)

● Install Commands (commented):


○ install.packages("tidyverse"): Installs the tidyverse package.
This is a collection of R packages (including ggplot2, dplyr, and stringr)
used for data analysis, manipulation, and visualization.
○ install.packages("htmlwidgets"): Installs the htmlwidgets
package, allowing creation of interactive visualizations in R.
○ install.packages("htmltools"): Installs htmltools, which works
with htmlwidgets to support custom HTML elements.
● Library Loading:
○ library(tidyverse): Loads the core tidyverse packages, including stringr,
so the separate library(stringr) call is redundant but harmless.
○ library(stringr): Provides specialized string manipulation functions,
including regex utilities (str_view, str_replace_all, etc.).
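○ library(htmlwidgets) and library(htmltools): Older versions of stringr
(before 1.5.0) rendered str_view() output as an HTML widget, which relies on
these packages. From stringr 1.5.0 onward, str_view() prints highlighted
matches in the console by default and produces HTML only with html = TRUE.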

2. Validating Emails
Code:
R
emails <- c("[email protected]","unicorns_rock!gmail.com")
at_regex <- "@"
dotcom_regex <- ".com"

● Data:
○ emails: A character vector containing two email-like strings.
● Regex Patterns:
○ at_regex <- "@": Matches the literal "@" symbol.
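○ dotcom_regex <- ".com": Intended to match the literal ".com", but the dot
is not escaped. In a regex, an unescaped "." matches any single character, so
this pattern matches any character followed by "com" (for example "xcom" as
well as ".com"). To require a literal dot, write "\\.com" or pass fixed = TRUE.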

Checking for Patterns:


R
grepl(at_regex, emails)
grepl(dotcom_regex, emails)

● Function: grepl():
○ Checks whether a pattern exists in each element of a vector.
○ Output: A logical vector indicating TRUE if the pattern is found, FALSE
otherwise.
○ For at_regex:
■ Checks if "@" exists in each string.
■ Example: "[email protected]" -> TRUE;
"unicorns_rock!gmail.com" -> FALSE.
○ For dotcom_regex:
■ Checks whether any character followed by "com" exists (the unescaped dot
matches any character).
■ Example: "[email protected]" -> TRUE;
"unicorns_rock!gmail.com" -> TRUE.

Visualizing Matches:
R
str_view(emails, at_regex)
str_view(emails, dotcom_regex)

● Function: str_view():
○ Highlights occurrences of a pattern within strings.
○ Usage:
■ First highlights "@" in the emails vector.
■ Second highlights ".com".
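Because dotcom_regex leaves the dot unescaped, it is looser than it looks. The base R sketch below compares three ways of writing the check on a few hypothetical strings (not part of the exercise): the unescaped pattern, an escaped pattern, and a literal match with fixed = TRUE. Base R is used so the snippet runs without stringr.

```r
# Hypothetical test strings, chosen to expose the unescaped dot
x <- c("user@gmail.com", "user@gmailxcom", "telecom")

unescaped <- grepl(".com", x)               # "." matches ANY character
escaped   <- grepl("\\.com", x)             # "\\." matches a literal dot
literal   <- grepl(".com", x, fixed = TRUE) # pattern treated as plain text

print(unescaped)  # TRUE TRUE TRUE
print(escaped)    # TRUE FALSE FALSE
print(literal)    # TRUE FALSE FALSE
```

The unescaped version accepts "user@gmailxcom" and even "telecom" ("ecom"), which is why escaping the dot (or using fixed = TRUE) is the safer way to test for ".com".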

For-Loop Email Validation:


R
for (email in emails) {
  if (grepl(at_regex, email) & grepl(dotcom_regex, email)) {
    print("email valid")
  } else {
    print("email not valid")
  }
}

● Loop Logic:
○ Iterates over each email in the emails vector.
○ Condition:
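■ Combines two grepl() checks with &, the element-wise logical AND. Each
check returns a single TRUE/FALSE here, so the scalar && would also work and
is the more common choice inside if(). Both checks must pass: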
■ "@" must be present.
■ ".com" must be present.
○ Output:
■ If both conditions are TRUE, prints "email valid".
■ Otherwise, prints "email not valid".
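Because grepl() is vectorized, the loop can also be written as a single expression over the whole vector. This is a minimal sketch, assuming hypothetical addresses (not from the exercise) and a tightened pattern "\\.com$" (escaped dot, anchored to the end), which is a deliberate change from the exercise's dotcom_regex.

```r
# Hypothetical addresses (not from the exercise)
emails <- c("someone@example.com", "unicorns_rock!gmail.com", "someone@examplexcom")

at_regex     <- "@"
dotcom_regex <- "\\.com$"   # escaped dot, anchored to the end of the string

# grepl() checks every element at once; & combines the two logical
# vectors element by element
valid  <- grepl(at_regex, emails) & grepl(dotcom_regex, emails)
result <- ifelse(valid, "email valid", "email not valid")
print(result)
# "email valid" "email not valid" "email not valid"
```

With the original unescaped ".com", the third address would have been reported as valid, since "xcom" satisfies that pattern.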

3. Anchors in Regular Expressions


Code:
R
strings <- c("abcd", "cdab", "cabd", "c ab", "ab")

● A vector of strings designed to test regex anchors (^, $).

Matches Anywhere:
R
str_view(strings, "ab", match = TRUE)
grep("ab", strings, value = TRUE)

● Pattern: ab:
○ Matches "ab" anywhere in the string.
● str_view():
○ Highlights all occurrences of "ab".
● grep(value = TRUE):
○ Returns strings containing "ab" as a substring.

Match at Start (^):


R
str_view(strings, "^ab", match = TRUE)
grep("^ab", strings, value = TRUE)

● Pattern: ^ab:
○ Matches strings that start with "ab".
● Example:
○ Matches "abcd" and "ab".
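The three base R search functions used in this section return different things for the same pattern. The sketch below runs "^ab" on the exercise's strings vector to show the difference between grep() (positions), grep(value = TRUE) (the elements themselves), and grepl() (one logical per element).

```r
strings <- c("abcd", "cdab", "cabd", "c ab", "ab")

idx  <- grep("^ab", strings)               # positions of matching elements
vals <- grep("^ab", strings, value = TRUE) # the matching elements themselves
hits <- grepl("^ab", strings)              # one TRUE/FALSE per element

print(idx)   # 1 5
print(vals)  # "abcd" "ab"
print(hits)  # TRUE FALSE FALSE FALSE TRUE
```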

Match at End ($):


R
str_view(strings, "ab$", match = TRUE)
grep("ab$", strings, value = TRUE)

● Pattern: ab$:
○ Matches strings that end with "ab".
● Example:
○ Matches "cdab", "c ab", and "ab".

Match Exactly (^ and $):


R
str_view(strings, "^ab$", match = TRUE)
grep("^ab$", strings, value = TRUE)

● Pattern: ^ab$:
○ Matches strings that are exactly "ab".
● Example:
○ Matches only "ab".
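For a plain literal like "ab", each anchored pattern has a regex-free counterpart in base R. This sketch checks on the exercise's strings vector that "^ab", "ab$" and "^ab$" give the same results as startsWith(), endsWith() and ==, which can be a useful sanity check on how the anchors behave.

```r
strings <- c("abcd", "cdab", "cabd", "c ab", "ab")

# Compare each anchored regex with its literal (non-regex) equivalent
same_start <- identical(grepl("^ab", strings),  startsWith(strings, "ab"))
same_end   <- identical(grepl("ab$", strings),  endsWith(strings, "ab"))
same_exact <- identical(grepl("^ab$", strings), strings == "ab")

print(c(same_start, same_end, same_exact))  # TRUE TRUE TRUE
```

The equivalence only holds for literal text; once the pattern contains regex syntax (classes, quantifiers), the anchored regex is the tool to use.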

4. Locating File Types


Code:
R
files <- c("file1.txt", "file2.csv", "file3.txt", "filetxt4.stxt",
"file1txt.R")
grep("\\.txt$", files, value = TRUE)
● Regex Pattern:
○ \\.txt$:
■ \\.: Matches the literal dot.
■ txt$: Matches "txt" at the end of a string.
● Functions:
○ grep(value = TRUE):
■ Returns filenames ending with .txt.
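On the exercise's files vector, "\\.txt$" returns "file1.txt" and "file3.txt". The sketch below shows why both the escaped dot and the anchor matter by comparing looser patterns, and adds tools::file_ext() from base R's tools package as a non-regex alternative.

```r
files <- c("file1.txt", "file2.csv", "file3.txt", "filetxt4.stxt",
           "file1txt.R")

escaped   <- grep("\\.txt$", files, value = TRUE) # literal ".txt" at the end
unescaped <- grep(".txt$", files, value = TRUE)   # any char + "txt" at the end
no_dot    <- grep("txt$", files, value = TRUE)    # "txt" at the end

# tools::file_ext() returns the text after the last dot ("" if none)
by_ext <- files[tools::file_ext(files) == "txt"]

print(escaped)    # "file1.txt" "file3.txt"
print(unescaped)  # "file1.txt" "file3.txt" "filetxt4.stxt"
print(by_ext)     # "file1.txt" "file3.txt"
```

"filetxt4.stxt" slips through the looser patterns because "stxt" ends in "txt"; only the escaped dot rejects it.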

5. Matching Literal Strings


Code:
R
example <- "This is some text with unwanted characters $^$$"
pattern <- "\\$\\^\\$"

● Pattern:
○ Escapes $ and ^ using \\ to match the literal string $^$.

Functions:
R
str_view(example, pattern)
str_extract(example, pattern)
ifelse(grepl(pattern, example), "pattern found", "pattern not found")

● str_view(): Highlights $^$.


● str_extract(): Extracts $^$.
● ifelse(): Returns "pattern found" when grepl() detects $^$ in example,
and a not-found message otherwise.
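A short base R sketch of the same literal match, showing what the doubled backslashes become, that fixed = TRUE avoids escaping altogether, and how regmatches() with regexpr() mirrors str_extract() (first match only).

```r
example <- "This is some text with unwanted characters $^$$"
pattern <- "\\$\\^\\$"

# The R string literal "\\$" is the two characters \$, which the regex
# engine reads as a literal dollar sign
cat(pattern, "\n")                   # prints \$\^\$

found_regex <- grepl(pattern, example)
found_fixed <- grepl("$^$", example, fixed = TRUE) # no escaping needed

# Base R counterpart of str_extract(): first match only
extracted <- regmatches(example, regexpr(pattern, example))
print(extracted)                     # "$^$"
```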

6. Removing Punctuation
Code:
R
str_replace_all(tweets, "[[:punct:]]", "")
gsub("[[:punct:]]", "", tweets)

● Regex Pattern:
○ [[:punct:]]: Matches any single punctuation character (a POSIX character
class, which must sit inside the outer brackets).
● Functions:
○ str_replace_all(): From stringr, replaces punctuation with an empty
string.
○ gsub(): Base R equivalent.
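The tweets object is not defined in this section, so the sketch below uses a hypothetical stand-in vector to show what the base R call produces.

```r
# Hypothetical tweets; the exercise's own `tweets` object is not shown here
tweets <- c("Hello, world!", "Wow... #rstats", "don't stop")

cleaned <- gsub("[[:punct:]]", "", tweets)
print(cleaned)  # "Hello world" "Wow rstats" "dont stop"
```

Apostrophes and hashtags count as punctuation, so contractions like "don't" lose their apostrophe; that may or may not be what the analysis needs.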

7. Matching Repetitions
Code:
R
numbers <- c("123", "1234", "09876543", "2345678288",
"abc12345678dd")
str_view(numbers, "\\b\\d{8}\\b")

● Pattern:
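○ \\d{8}: Matches exactly 8 consecutive digits.
○ \\b: A word boundary. Letters, digits and underscores are all word
characters, so the 8 digits must not be preceded or followed by any of them.
● Result: only "09876543" matches. "2345678288" has 10 digits in a row, and in
"abc12345678dd" the digits touch letters, so there is no boundary around them.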
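To see what the word boundaries contribute, the sketch below compares the pattern with and without \\b, and with full-string anchors, using base R's grepl() with perl = TRUE. The extra sentence string is hypothetical.

```r
numbers <- c("123", "1234", "09876543", "2345678288", "abc12345678dd")

# perl = TRUE selects the PCRE engine, which supports \d and \b
unbounded <- grepl("\\d{8}", numbers, perl = TRUE)       # 8 digits anywhere
bounded   <- grepl("\\b\\d{8}\\b", numbers, perl = TRUE) # 8 digits as a whole word
whole     <- grepl("^\\d{8}$", numbers, perl = TRUE)     # entire string is 8 digits

print(unbounded)  # FALSE FALSE TRUE TRUE TRUE
print(bounded)    # FALSE FALSE TRUE FALSE FALSE

# \b and ^...$ differ when the number sits inside other text (hypothetical)
sentence <- "order id 12345678 shipped"
in_sentence <- c(grepl("\\b\\d{8}\\b", sentence, perl = TRUE),
                 grepl("^\\d{8}$", sentence, perl = TRUE))
print(in_sentence)  # TRUE FALSE
```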

8. Cleaning Up Scripts

● The task asks the user to clean a messy script file (Ex4_messy_code.r), ensuring
proper formatting and comments.

9. Creating Class Codes

The task involves generating a new column, class_code, in the courses dataframe
from two existing columns, Coursenumber and Semester.

Original Code:
R
courses$class_code <- paste0(
  str_extract(courses$Coursenumber, "[A-Z]+[A-Z0-9]*U"),
  ".LA_",
  ifelse(grepl("Spring", courses$Semester), "S23", "F22")
)

Detailed Breakdown

Step 1: What is courses?


● courses: A dataframe containing information about university courses.

Example structure (Courses_bach.xlsx file):


R
head(courses, 3)
## # A tibble: 3 × 9
##   Coursenumber  Title English        Teaching Language Semester    ECTS ...
## 1 BA-BASPV1001U Internship 7.5 ECTS  English           Fall (E)     7.5
## 2 BA-BASPV1002U Internship 15 ECTS   English           Spring (E)  15
## 3 BA-BASPV1234U Emerging Markets ... English           Fall (E)     7.5


○ Coursenumber: Contains course identifiers like "BA-BASPV1001U".
○ Semester: Indicates whether the course occurs in Fall or Spring.

Step 2: Breaking Down paste0()


R
paste0(
  str_extract(courses$Coursenumber, "[A-Z]+[A-Z0-9]*U"),
  ".LA_",
  ifelse(grepl("Spring", courses$Semester), "S23", "F22")
)

● paste0():
○ Concatenates its arguments element-wise into strings with no separator
(paste() does the same but inserts a space by default).

Example:
R
paste0("A", "B", "C") # Outputs "ABC"

This means the function constructs a string for each row in the dataframe by combining:

1. The extracted course identifier (str_extract()).
2. A fixed string ".LA_".
3. A semester-specific suffix ("S23" or "F22", based on ifelse()).

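Because paste0() is vectorized, the three pieces are combined row by row, and the single ".LA_" string is recycled to every row. A minimal sketch, assuming the identifiers and suffixes have already been computed (the values below are the sample ones used in this walkthrough):

```r
# Identifiers and suffixes, one per course row
ids    <- c("BASPV1001U", "BASPV1002U", "BASPV1234U")
suffix <- c("F22", "S23", "F22")

# paste0() works element-wise; the length-1 ".LA_" is recycled to every row
codes <- paste0(ids, ".LA_", suffix)
print(codes)  # "BASPV1001U.LA_F22" "BASPV1002U.LA_S23" "BASPV1234U.LA_F22"
```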
Step 3: Extracting the Course Identifier
R
str_extract(courses$Coursenumber, "[A-Z]+[A-Z0-9]*U")

● Purpose:
○ Extracts the part of the Coursenumber column that matches a specific
pattern.
● Regex Pattern:
○ "[A-Z]+[A-Z0-9]*U":
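■ [A-Z]+: Matches one or more uppercase letters.
■ [A-Z0-9]*: Matches zero or more uppercase letters or digits.
■ U: Matches the literal character "U".
○ The hyphen in "BA-BASPV1001U" belongs to neither character class, so no
match can start in "BA-". The first match therefore begins after the hyphen:
[A-Z]+ takes "BASPV", [A-Z0-9]* takes "1001", and U takes the final "U",
giving "BASPV1001U".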

Example:

● Input: "BA-BASPV1001U"
● Output: "BASPV1001U"

Function Explanation:

● str_extract():
○ Searches for the first match of the pattern in each string.
○ If no match is found, returns NA.
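The NA case matters for the next step: paste0() does not propagate NA but turns it into the text "NA". The sketch below uses a hypothetical base R helper, extract_first(), to mimic str_extract() without stringr, plus a hypothetical third course number with no "U", to show the effect. (stringr's str_c() keeps NA as NA instead, which may be preferable here.)

```r
# Hypothetical helper mimicking stringr::str_extract(): first match, NA otherwise
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x)              # start of first match, -1 if none
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)      # regmatches() returns only the matches
  out
}

# The third course number is hypothetical and contains no "U"
course_numbers <- c("BA-BASPV1001U", "BA-BASPV1002U", "XYZ-123")
ids <- extract_first(course_numbers, "[A-Z]+[A-Z0-9]*U")
print(ids)    # "BASPV1001U" "BASPV1002U" NA

# paste0() turns NA into the literal text "NA"
codes <- paste0(ids, ".LA_", "F22")
print(codes)  # "BASPV1001U.LA_F22" "BASPV1002U.LA_F22" "NA.LA_F22"
```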

Step 4: Adding a Fixed String


R
".LA_"

● Adds a static, consistent component (".LA_") to the course code.


● This is hardcoded and not dynamic.

Example:

● Input: "BASPV1001U"
● Output (after adding ".LA_"): "BASPV1001U.LA_".

Step 5: Determining Semester Suffix


R
ifelse(grepl("Spring", courses$Semester), "S23", "F22")
● Purpose:
○ Determines whether the semester is Spring or Fall and assigns the correct
suffix ("S23" for Spring 2023 or "F22" for Fall 2022).

Step 5.1: Checking for "Spring"


R
grepl("Spring", courses$Semester)

● grepl(pattern, x):
○ Returns TRUE if pattern is found in x.
○ Here:
■ pattern = "Spring"
■ x = courses$Semester (column containing values like "Fall
(E)", "Spring (E)").

Example:

● Input: c("Fall (E)", "Spring (E)", "Fall (E)")


● Output: c(FALSE, TRUE, FALSE)
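grepl() is case-sensitive by default, so "Spring" would not match a lower-case "spring". A short sketch with a hypothetical third Semester value shows the difference ignore.case = TRUE makes.

```r
# Hypothetical Semester values; the third uses lower-case "spring"
semester <- c("Fall (E)", "Spring (E)", "spring (E)")

exact_case  <- grepl("Spring", semester)                     # case-sensitive
ignore_case <- grepl("Spring", semester, ignore.case = TRUE) # also matches "spring"
print(exact_case)   # FALSE TRUE FALSE
print(ignore_case)  # FALSE TRUE TRUE
```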

Step 5.2: Using ifelse()


R
ifelse(grepl("Spring", courses$Semester), "S23", "F22")

● ifelse(test, yes, no):
○ Returns "S23" where test is TRUE (i.e., "Spring" is found).
○ Otherwise, returns "F22".

Example:

● Input: c("Fall (E)", "Spring (E)", "Fall (E)")


● Output: c("F22", "S23", "F22").
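One edge case worth checking: grepl() returns FALSE (not NA) for a missing value, so a row with no Semester would silently receive "F22". A minimal sketch, assuming a hypothetical NA in the column, with an explicit is.na() check that keeps missing values missing:

```r
# Hypothetical Semester values, including a missing one
semester <- c("Fall (E)", "Spring (E)", NA)

# grepl() returns FALSE for NA, so the missing semester becomes "F22"
suffix <- ifelse(grepl("Spring", semester), "S23", "F22")
print(suffix)       # "F22" "S23" "F22"

# Checking for NA first keeps missing values missing
suffix_safe <- ifelse(is.na(semester), NA_character_,
                      ifelse(grepl("Spring", semester), "S23", "F22"))
print(suffix_safe)  # "F22" "S23" NA
```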

Step 6: Constructing the Final String

Combining All Components with paste0():

R
paste0(
  str_extract(courses$Coursenumber, "[A-Z]+[A-Z0-9]*U"),  # Extracted course identifier
  ".LA_",                                                 # Fixed string
  ifelse(grepl("Spring", courses$Semester), "S23", "F22") # Semester suffix
)

Example:

● For a row with:


○ Coursenumber = "BA-BASPV1001U"
○ Semester = "Fall (E)":
■ Extracted: "BASPV1001U"
■ Fixed: ".LA_"
■ Semester Suffix: "F22"
● Final Result: "BASPV1001U.LA_F22"
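The whole pipeline can be checked end to end on a toy data frame built from the three sample rows in Step 1. This is a sketch in base R only: regmatches() with regexpr() stands in for str_extract() and, unlike str_extract(), assumes every row matches the pattern.

```r
# Toy data frame built from the three sample rows shown in Step 1
courses <- data.frame(
  Coursenumber = c("BA-BASPV1001U", "BA-BASPV1002U", "BA-BASPV1234U"),
  Semester     = c("Fall (E)", "Spring (E)", "Fall (E)"),
  stringsAsFactors = FALSE
)

# Base R stand-in for str_extract(); assumes every row matches the pattern,
# otherwise regmatches() would return fewer elements than there are rows
ids <- regmatches(courses$Coursenumber,
                  regexpr("[A-Z]+[A-Z0-9]*U", courses$Coursenumber))

courses$class_code <- paste0(
  ids,
  ".LA_",
  ifelse(grepl("Spring", courses$Semester), "S23", "F22")
)
print(courses$class_code)
# "BASPV1001U.LA_F22" "BASPV1002U.LA_S23" "BASPV1234U.LA_F22"
```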
