Intermediate R
Hanadi Jusufovic
Datacamp Course 4
We can check whether two values are equal by using the == operator. We can check
the equality of logical values, for example:
TRUE == TRUE
which outputs TRUE.
Similarly, we have the operators < (less than), > (greater than), <= (less than or
equal to), and >= (greater than or equal to).
When comparing two strings, R uses alphabetical order to compare them. For
example:
"hello" > "goodbye"
outputs TRUE, because "h" comes after "g" in the alphabet.
TRUE > FALSE
outputs TRUE, because TRUE is coerced to 1 and FALSE to 0.
Another example:
x <- 12
x > 5 & x < 15
returns TRUE.
The logical operator | returns TRUE if at least one of the logical values is
TRUE.
Logical operators on vectors will work exactly as we expect them to work based
on the examples for relational operators.
For example:
c(TRUE, TRUE, FALSE) & c(TRUE, FALSE, FALSE)
would return TRUE FALSE FALSE
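To make the element-wise behaviour concrete, here is a minimal sketch that combines relational and logical operators on a vector. The views vector and its values are made up for illustration.

```r
# Hypothetical example data: daily view counts (values are made up).
views <- c(16, 9, 13, 5, 2, 17, 14)

# Relational operators compare each element and return a logical vector.
above_10 <- views > 10
below_15 <- views < 15

# & combines the two logical vectors element by element.
between <- above_10 & below_15
print(between)

# A logical vector can be used to subset the original vector.
print(views[between])
```

Only the elements for which both comparisons are TRUE survive the subsetting step.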
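We can use && and || instead of & and |, but they behave differently on vectors.
The double forms are meant for single logical values, such as the condition of an
if statement. In R versions before 4.3.0, a double sign examines just the first
element of each vector. Therefore:
c(TRUE, TRUE, FALSE) && c(TRUE, FALSE, FALSE)
would return TRUE in those versions. Since R 4.3.0, passing a vector of length
greater than one to && or || is an error.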
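When a condition is computed on a whole vector, one common way to get the single logical value that && and if expect is to collapse it with all() or any() first. A minimal sketch, with a hypothetical vector x:

```r
# Hypothetical vector of values.
x <- c(12, 3, 20)

# Element-wise test: one logical value per element.
in_range <- x > 5 & x < 15

# all() is TRUE only if every element is TRUE; any() if at least one is.
all_in_range <- all(in_range)
any_in_range <- any(in_range)

# Both operands of && are now single logical values.
if (any_in_range && !all_in_range) {
  print("some, but not all, values are between 5 and 15")
}
```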
If/else statements are conditional statements that are used to control the flow
of the program. The if statement has the form:
if(condition) {
  expression
}
If the condition evaluates to TRUE, the expression is executed. For example:
x <- -3
if(x < 0) {
  print("x is a negative number")
}
and this would print "x is a negative number" since the condition evaluates
to TRUE.
The else statement has to be used together with the if statement. If the condition
in the if statement evaluates to FALSE, the code in the else statement
will execute. For example:
x <- 5
if(x < 0) {
  print("x is a negative number")
} else {
  print("x is either a positive number or zero")
}
and this would print "x is either a positive number or zero".
Furthermore, we can use else if to check more than one condition.
if(x < 0) {
  print("x is a negative number")
} else if(x == 0) {
  print("x is zero")
} else {
  print("x is a positive number")
}
We can also nest if statements inside other if statements. For example:
if(x < 10) {
  print("x is smaller than 10")
  if(x < 5) {
    print("x is smaller than 5")
  } else {
    print("x is between 5 and 10")
  }
} else {
  print("x is not smaller than 10")
}
2 Loops
A while loop works similarly to an if statement; however, it keeps
executing (repeating) its body as long as the condition is TRUE.
while(condition) {
  expression
}
For example:
n <- 5
sum <- 0
while(n > 0) {
  sum <- sum + n
  n <- n - 1
}
This code computes 5 + 4 + ... + 1, so sum ends up equal to 15.
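As a quick check, the same loop can be written with a differently named accumulator and compared against the built-in sum() function. This is only a sketch; the name total is an arbitrary choice made to avoid confusion with sum().

```r
n <- 5
total <- 0  # named total to avoid confusion with the built-in sum()
while (n > 0) {
  total <- total + n
  n <- n - 1
}
print(total)

# Cross-check against the built-in function.
print(total == sum(1:5))
```

After the loop, n is 0, which is why the condition n > 0 finally evaluates to FALSE.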
We can exit a while loop early by using break. For example:
ctr <- 1
while(ctr <= 7) {
  if(ctr %% 5 == 0) {
    break
  }
  print(paste("ctr is set to", ctr))
  ctr <- ctr + 1
}
This prints the message for ctr equal to 1 through 4 and stops when ctr reaches 5.
A for loop has the form:
for(variable in sequence) {
  expression
}
For example:
cities <- c("New York", "London", "Amsterdam", "Tokyo")
for(city in cities) {
  print(city)
}
A for loop also works on lists, and we can use break to exit a for loop early.
There is also the next statement, which skips to the next iteration
of the loop. For example:
for(city in cities) {
  if(nchar(city) == 6) {
    next
  }
  print(city)
}
This code prints every city in the vector except those whose names are exactly
6 characters long (here, London).
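A loop can also run over positions rather than over the elements themselves, which is handy when the index is needed. The sketch below uses seq_along() for this and repeats the idea on a list; city_list and lengths_seen are names introduced here for illustration only.

```r
cities <- c("New York", "London", "Amsterdam", "Tokyo")

# seq_along(cities) gives the positions 1, 2, 3, 4.
for (i in seq_along(cities)) {
  print(paste(cities[i], "is at position", i))
}

# The same idea on a list; [[ ]] extracts a single element.
city_list <- list("New York", "London", "Amsterdam", "Tokyo")
lengths_seen <- integer(0)
for (i in seq_along(city_list)) {
  lengths_seen[i] <- nchar(city_list[[i]])
}
print(lengths_seen)
```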
3 Functions
We have already seen a lot of functions. The ones we have used so far are built-in
functions. For example, to create a list we used the list() function, and to print we
used the print() function.
To call a function, we have to specify its arguments. For example,
the function that calculates the standard deviation is sd(). To call this function
we pass a vector of values. For example:
sd(c(1, 5, 6, 7))
and this would return 2.629956
The sd() function accepts two arguments. The first argument, x, is the
object that contains the values, and the second argument is na.rm.
By default, the value of na.rm is FALSE.
If we pass a vector that contains an NA value and na.rm
is set to FALSE, then sd() will return NA. However, if we set na.rm
to TRUE, sd() will ignore the NA value. For example:
vec <- c(1, 5, 6, NA)
sd(vec)
would return NA.
However,
sd(vec, na.rm = TRUE)
would return 2.645751
We can write our own functions. We write functions to help us solve a particular,
well-defined problem.
For example, let's write a function that returns 3 times the number passed in.
triple <- function(x) {
  3 * x
}
In the first version, the function returns the value of its last evaluated
expression. We can also return a value explicitly by using return():
triple <- function(x) {
  y <- 3 * x
  return(y)
}
An important thing to understand is the scope of a function. Variables that
are defined inside a function are not accessible outside of it.
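A small sketch of function scope follows. The variable name inner_value is chosen here purely for illustration, so that it cannot clash with anything already defined in the session.

```r
triple <- function(x) {
  inner_value <- 3 * x  # exists only while the function runs
  inner_value
}

result <- triple(4)
print(result)

# inner_value was never created in the calling environment.
print(exists("inner_value"))
```

The result of the function is available through the assignment to result, but the intermediate variable is gone once the call finishes.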
We can make additional functions available in our session by using R packages. The
base package is installed and loaded automatically, and it contains common functions such
as list(), print(), etc.
To install a package:
install.packages("ggvis")
To load a package:
library("ggvis")
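Before loading a package, it can be useful to check whether it is installed at all. The following is one possible sketch, not the only way to do it; it uses requireNamespace(), which reports the result as TRUE or FALSE instead of stopping with an error.

```r
# "ggvis" is the package named in the text; it may or may not be installed here.
pkg <- "ggvis"

# requireNamespace() returns TRUE or FALSE instead of raising an error.
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
  library(pkg, character.only = TRUE)  # attach it to the search path
} else {
  message("Package ", pkg, " is not installed; install.packages() can add it.")
}

# search() lists the packages that are currently attached.
print(search())
```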
This is quite a bit of code to find out the class of elements in a list.
However, the lapply() function is here to help.
lapply(NYC, class)
Now consider a vector of cities similar to the one we used before. If we want to find the
number of characters in each city name, we can write something like:
cities <- c("New York", "Paris", "London", "Tokyo")
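The lapply() call for this vector is not shown in these notes. A minimal sketch of what it might look like, assuming nchar() is the function being applied (num_chars is a name introduced here):

```r
cities <- c("New York", "Paris", "London", "Tokyo")

# lapply() applies nchar() to each element and always returns a list.
num_chars <- lapply(cities, nchar)
str(num_chars)

# unlist() flattens the list into a plain vector.
print(unlist(num_chars))
```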
When the results that lapply() returns are all of the same type, as
in the example with cities, we can use the sapply() function.
sapply(cities, nchar)
and this returns a named vector that contains the number of characters in each name.
The return value looks like:
New York    Paris   London ...
       8        5        6 ...
We can choose not to name the elements by setting USE.NAMES to FALSE.
By default, USE.NAMES is set to TRUE.
sapply(cities, nchar, USE.NAMES = FALSE)
As we have seen, sapply() works similarly to lapply(), but it tries to simplify
the output from a list to a vector or matrix.
Finally, we consider the vapply() function. vapply() also applies a function over
a list or a vector, but we have to explicitly specify the output format through
its FUN.VALUE argument.
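A short sketch of vapply() on the cities vector follows. FUN.VALUE is a template for what each call must return; the tryCatch() wrapper and the names char_counts and outcome are additions made here to show what happens when the template does not match.

```r
cities <- c("New York", "Paris", "London", "Tokyo")

# FUN.VALUE = integer(1) declares that each call must return one integer.
char_counts <- vapply(cities, nchar, FUN.VALUE = integer(1))
print(char_counts)

# A mismatched template makes vapply() stop with an error instead of
# returning a result of a different type.
outcome <- tryCatch(
  vapply(cities, nchar, FUN.VALUE = character(1)),
  error = function(e) "error"
)
print(outcome)
```

This strictness is the main reason to prefer vapply() over sapply() when the shape of the result matters.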
5 Utilities
So far we have seen a number of useful functions, such as lapply(), sapply(),
print(), sd(), etc.
There are some useful functions for working with data structures. One of them is
seq(), which generates a sequence. For example:
seq(1, 10, by = 3)
returns a sequence: 1 4 7 10
Another function to consider is rep(), which stands for replicate. For example:
rep(c(8, 6, 4, 2), times = 2)
returns: 8 6 4 2 8 6 4 2
Instead of the times argument, we can use the each argument to repeat each element
in turn, instead of the whole vector.
rep(c(8, 6, 4, 2), each = 2)
returns: 8 8 6 6 4 4 2 2
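The two functions combine naturally. The sketch below builds the vector 8 6 4 2 with seq() and a negative step, then passes it to rep(); evens_down and repeated are names made up for this example.

```r
# A decreasing sequence: a negative by counts down.
evens_down <- seq(8, 2, by = -2)
print(evens_down)

# each and times can be combined: each is applied first, then times.
repeated <- rep(evens_down, each = 2, times = 2)
print(repeated)
```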
The grepl() function takes two main arguments:
grepl(pattern = <regex>, x = <string>)
where <regex> is the pattern that we are trying to match, and <string> is a
character vector.
For example, if we want to check which of the strings in the animals vector contain
the letter a:
grepl(pattern = "a", x = animals)
which returns: TRUE FALSE TRUE TRUE FALSE
We use ^ to match strings that start with a given pattern.
For example, if we want to match strings that start with the letter a:
grepl(pattern = "^a", x = animals)
returns: FALSE FALSE FALSE TRUE FALSE
On the other hand, we use $ to match strings that end with a given pattern.
For example, to match strings that end in the letter a:
grepl(pattern = "a$", x = animals)
returns: FALSE FALSE TRUE FALSE FALSE
The grep() function works similarly; however, it returns a vector of the indices
of the elements that match the pattern.
So, to find the indices of the elements that contain the letter a:
grep(pattern = "a", x = animals)
returns: 1 3 4
To get the same output from the grepl() function, we need to use the which() function.
For example:
which(grepl(pattern = "a", x = animals))
returns: 1 3 4
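The animals vector is not defined in this part of the notes. The sketch below assumes a definition inferred from the outputs quoted in the text, and uses the logical results to pull out the matching strings; has_a, starts_a and ends_a are names introduced here.

```r
# Assumed definition, inferred from the outputs quoted in the text.
animals <- c("cat", "moose", "impala", "ant", "kiwi")

has_a    <- grepl(pattern = "a",  x = animals)  # contains "a" anywhere
starts_a <- grepl(pattern = "^a", x = animals)  # begins with "a"
ends_a   <- grepl(pattern = "a$", x = animals)  # ends with "a"

print(has_a)
print(animals[starts_a])
print(animals[ends_a])

# grep() returns positions; which() converts grepl() output to the same form.
print(identical(grep("a", animals), which(has_a)))
```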
For example:
sub(pattern = "a", replacement = "o", x = animals)
returns: "cot" "moose" "impola" "ont" "kiwi"
As we can see, in "impala" only the first a got replaced. sub() replaces only the
first occurrence of the matched pattern. If we want to replace all the occurrences,
we use the gsub() function.
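A side-by-side sketch of the two functions, again assuming the animals vector inferred from the outputs quoted in the text (first_only and every_one are illustrative names):

```r
# Assumed definition, inferred from the outputs quoted in the text.
animals <- c("cat", "moose", "impala", "ant", "kiwi")

# sub() replaces the first match in each string; gsub() replaces every match.
first_only <- sub(pattern = "a", replacement = "o", x = animals)
every_one  <- gsub(pattern = "a", replacement = "o", x = animals)

print(first_only)
print(every_one)
```

The two results differ only for "impala", the one string with more than one a.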
It is important to note that today is not simply a string. The class
of the object today is "Date".
To get the current date and time we use Sys.time(). For example:
now <- Sys.time()
now
outputs "2022-10-30 17:58:33 CEST"
If we check the class() of now, we get "POSIXct" "POSIXt".
These classes make sure that the time is handled consistently across operating
systems, following the POSIX standard.
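Because dates and times are objects rather than strings, arithmetic works on them. The sketch below uses fixed values instead of Sys.Date() and Sys.time() so that the result does not depend on when it is run; the UTC time zone and the names some_day and some_time are assumptions made for the example.

```r
# Fixed values are used so the result does not depend on when this is run.
some_day <- as.Date("2022-10-30")
print(class(some_day))

# Dates are stored as the number of days since 1970-01-01.
print(unclass(some_day))

# Adding 1 to a Date moves it forward by one day.
print(some_day + 1)

# POSIXct values are stored as seconds since 1970-01-01 00:00:00 UTC,
# so adding 60 moves the time forward by one minute.
some_time <- as.POSIXct("2022-10-30 17:58:33", tz = "UTC")
print(class(some_time))
print(format(some_time + 60, "%H:%M:%S"))
```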
returns "2022-10-30"
Some dedicated packages for dealing with dates and times in R are:
• lubridate
• zoo
• xts