Intermediate R

Hanadi Jusufovic

Datacamp Course 4

1 Conditionals and Control Flow


Relational operators are comparators, that is, the operators that help us see
how one R object compares to another.

We can check whether two objects are equal by using the == operator. We can
check the equality of logical values, for example:
TRUE == TRUE
which outputs TRUE.

We can also compare values of other types:
3 == 2
which outputs FALSE, and
"hello" == "goodbye"
which also outputs FALSE.

The opposite of equality is inequality, for which we use the != operator. For
example:
"hello" != "goodbye"
which outputs TRUE.

Similarly, we have the operators < (less than), > (greater than), <= (less than
or equal to), and >= (greater than or equal to).
When comparing two strings, R uses alphabetical order to compare them. For
example:
"hello" > "goodbye"
outputs TRUE, because ’h’ comes after ’g’ in the alphabet.

Also, the value of TRUE is 1, and FALSE corresponds to 0. Therefore:

TRUE > FALSE
outputs TRUE.

We can use relational operators on vectors. For example, suppose we have a
vector
link <- c(16, 9, 13, 5)
To check which elements of link are greater than 10, we compare the vector
with 10:
link > 10
The output is TRUE FALSE TRUE FALSE, because the comparison is made element by
element.
Similarly, we can compare two vectors of the same length element by element.
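
As a small illustration of comparing two vectors, the sketch below uses two made-up vectors of daily view counts (the names views_a and views_b are hypothetical, not taken from the course). The comparison is made position by position:

```r
# Hypothetical view counts for two sites over four days
views_a <- c(16, 9, 13, 5)
views_b <- c(17, 7, 13, 16)

# Element-wise comparison: one logical value per position
a_beats_b <- views_a > views_b
print(a_beats_b)        # FALSE TRUE FALSE FALSE

# sum() counts the TRUE values, since TRUE is treated as 1
print(sum(a_beats_b))   # 1
```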

We can also use relational operators on matrices. For example, suppose we have
a matrix views defined as:
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
facebook <- c(17, 7, 5, 16, 8, 13, 14)
views <- matrix(c(linkedin, facebook), nrow = 2, byrow = TRUE)
We could compare:
views == 13
and the output would be TRUE at positions (1,3) and (2,6) and FALSE everywhere
else.

There are three logical operators:

• AND operator &
• OR operator |
• NOT operator !
The logical operator & returns TRUE only if both of its operands are TRUE.
TRUE & TRUE
TRUE & FALSE
FALSE & FALSE
The outputs are TRUE, FALSE, and FALSE, respectively.

Another example:
x <- 12
x > 5 & x < 15
returns TRUE.

Logical operator | returns TRUE if at least one of the logical values is equal
to TRUE.
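
The text gives no example for the OR operator, so here is a minimal sketch that mirrors the & examples above. The variable y and the range limits are made up for illustration:

```r
TRUE | TRUE     # TRUE
TRUE | FALSE    # TRUE
FALSE | FALSE   # FALSE

# A value outside the range from 5 to 15 satisfies at least one condition
y <- 4
outside <- y < 5 | y > 15
print(outside)  # TRUE
```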

The logical operator ! negates a logical value. For example:
is.numeric(5)
!is.numeric(5)
The first line returns TRUE, since 5 is a numeric value, and the second line
returns FALSE, its negation.

Logical operators on vectors work element by element, just like the relational
operators in the examples above.

For example:
c(TRUE, TRUE, FALSE) & c(TRUE, FALSE, FALSE)
would return TRUE FALSE FALSE

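We can also use && and || instead of & and |; however, they behave differently
on vectors.
The double operators do not work element by element: they expect operands of
length one. In older versions of R they examined only the first element of each
vector, so
c(TRUE, TRUE, FALSE) && c(TRUE, FALSE, FALSE)
returned TRUE. Since R 4.3.0, passing a vector of length greater than one to &&
or || is an error.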
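
The sketch below shows the typical use of && with operands of length one, including the fact that the right-hand side is skipped when the left-hand side already decides the result. The use of all() and any() to reduce a logical vector to a single value is an addition here, not something the course text covers:

```r
x <- 12

# Both operands have length one, so && is appropriate
in_range <- x > 5 && x < 15
print(in_range)   # TRUE

# && stops as soon as the result is known: the right-hand side
# would raise an error, but it is never evaluated here
short <- FALSE && stop("this is never evaluated")
print(short)      # FALSE

# To reduce a whole logical vector to one value, all() and any() can be used
v <- c(TRUE, TRUE, FALSE) & c(TRUE, FALSE, FALSE)
print(all(v))     # FALSE
print(any(v))     # TRUE
```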

If/else statements are conditional statements that control the flow of the
program. The if statement has the form:
if(condition) {
  expression
}
If the condition evaluates to TRUE, the expression is executed. For example:
x <- -3
if(x < 0) {
  print("x is a negative number")
}
and this would print "x is a negative number", since the condition evaluates
to TRUE.

The else statement has to be used together with an if statement. If the
condition in the if statement evaluates to FALSE, the code in the else
statement is executed. For example:
x <- 5
if(x < 0) {
  print("x is a negative number")
} else {
  print("x is either a positive number or zero")
}
and this would print "x is either a positive number or zero".

Furthermore, we can use else if to check more than one condition:
if(x < 0) {
  print("x is a negative number")
} else if(x == 0) {
  print("x is zero")
} else {
  print("x is a positive number")
}
We can also nest if statements inside other if statements. For example:
if(x < 10) {
  print("x is smaller than 10")
  if(x < 5) {
    print("x is smaller than 5")
  } else {
    print("x is between 5 and 10")
  }
} else {
  print("x is not smaller than 10")
}

2 Loops
A while loop works similarly to an if statement; however, it keeps executing
(repeating) its body as long as the condition is TRUE.
while(condition) {
  expression
}
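For example:
n <- 5
total <- 0
while(n > 0) {
  total <- total + n
  n <- n - 1
}
This code computes 5 + 4 + ... + 1 = 15 and stores the result in total. We use
the name total rather than sum so that we do not mask the built-in sum()
function.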

We can exit a while loop early by using break. For example:
ctr <- 1
while(ctr <= 7) {
  if(ctr %% 5 == 0) {
    break
  }
  print(paste("ctr is set to", ctr))
  ctr <- ctr + 1
}
This prints a message for ctr equal to 1, 2, 3, and 4; when ctr reaches 5, the
loop stops.
A for loop has the form:
for(variable in sequence) {
  expression
}
For example:
cities <- c("New York", "London", "Amsterdam", "Tokyo")

for(city in cities) {
  print(city)
}
A for loop also works on lists, and we can use break to exit early from a for
loop as well.

There is also the next statement. We use next to skip to the next iteration of
the loop. For example:
for(city in cities) {
  if(nchar(city) == 6) {
    next
  }
  print(city)
}
This code prints every city in the vector except those whose name has exactly
6 characters (here, "London").

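There is also another way to write a for loop, which iterates over the indices
of the vector rather than over its elements:
for(i in 1:length(cities)) {
  print(cities[i])
}
Writing a loop this way is more versatile, because the index i gives access to
both the position and the value of each element.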
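
One caveat with 1:length(cities), which the course text does not mention: for an empty vector, 1:0 produces the two values 1 and 0, so the loop body still runs. A common alternative is seq_along(). The sketch below is an addition under that assumption, and the names empty, n_safe and n_risky are made up for illustration:

```r
cities <- c("New York", "London", "Amsterdam", "Tokyo")

# The index gives access to both the position and the value
for (i in seq_along(cities)) {
  print(paste(cities[i], "is at position", i))
}

# With an empty vector, seq_along() yields no iterations at all,
# whereas 1:length(empty) would yield the two values 1 and 0
empty <- character(0)
n_safe <- length(seq_along(empty))
n_risky <- length(1:length(empty))
print(c(n_safe, n_risky))   # 0 2
```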

3 Functions
We have already seen a lot of functions. The ones we used so far are built-in
functions. For example, to create a list we used list() function, and to print we
used print() function.

To call a function, we have to specify its arguments. For example, the function
that calculates the standard deviation is sd(). To call this function we pass
it a vector of values. For example:
sd(c(1, 5, 6, 7))
and this would return 2.629956

The sd() function accepts two arguments. The first argument, x, is the vector
that contains the values, and the second argument is na.rm. By default, the
value of na.rm is FALSE.
If we pass a vector that contains an NA value and na.rm is set to FALSE, then
sd() will return NA. However, if we set na.rm to TRUE, sd() will ignore the NA
value. For example:
vec <- c(1, 5, 6, NA)
sd(vec)
would return NA.
However,
sd(vec, na.rm = TRUE)
would return 2.645751

A useful trick is to call the args() function to see which arguments a function
accepts. For example:
args(sd)
returns function (x, na.rm = FALSE)

We can write our own functions. We write functions to help us solve a
particular, well-defined problem.

To write a function we use the following format:
my_function <- function(arg1, arg2) {
  body
}
where arg1 and arg2 are the arguments that the function accepts, and body is
the code that does the work.

For example, let's write a function that returns 3 times the number passed in.
triple <- function(x) {
  3 * x
}
Here the function returns the value of the last expression it evaluates. We can
also return a value explicitly by using return():
triple <- function(x) {
  y <- 3 * x
  return(y)
}
An important thing to understand is the scope of a function. Variables that are
defined inside a function are not accessible outside of the function.
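
A minimal sketch of function scope, reusing the triple() function from above. The check with exists() is an illustration added here, and it assumes that no variable named y has been defined elsewhere in the session:

```r
triple <- function(x) {
  y <- 3 * x   # y exists only while the function runs
  y
}

result <- triple(4)
print(result)       # 12

# y was created inside triple(), so it is not defined out here
# (assuming a fresh session in which y was never assigned)
y_visible <- exists("y", inherits = FALSE)
print(y_visible)
```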

Another important note is that R passes arguments by value. This means that an
R function cannot change the value of the variable that we pass in as an
argument. The function uses that value to do some calculations and returns a
result, but the variable that was passed in is unchanged after the function
has finished executing.
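
The following sketch illustrates pass-by-value with a hypothetical helper function named increment() (not part of the course). The function changes its own copy of the argument, and the caller's variable stays the same unless the result is assigned back:

```r
increment <- function(v) {
  v <- v + 1   # modifies only the function's own copy
  v
}

counts <- c(1, 2, 3)
new_counts <- increment(counts)

print(new_counts)   # 2 3 4
print(counts)       # 1 2 3 (unchanged)

# To keep the change, assign the returned value back
counts <- increment(counts)
print(counts)       # 2 3 4
```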

We can make additional functions available to our program by using R packages.
The base package is installed and loaded automatically, and it contains common
functions such as list(), print(), etc.

To install a package:
install.packages("ggvis")
To load a package:
library("ggvis")

4 The apply family


Suppose we have a variable called NYC defined as:
NYC <- list(pop = 8405837,
            boroughs = c("Manhattan", "Bronx", "Brooklyn",
                         "Queens", "Staten Island"),
            capital = FALSE)
To find out which class each element belongs to, we can use a for loop to
iterate over the elements and call the class() function on each one.
for(info in NYC) {
  print(class(info))
}
and this prints out "numeric" "character" "logical"

This is quite a bit of code just to find out the class of each element in a
list. However, the lapply() function is here to help:
lapply(NYC, class)
Now consider a vector of cities similar to the one we used before. If we want
to find the number of characters in each city name, we can write something
like:
cities <- c("New York", "Paris", "London", "Tokyo")

num_chars <- c()
for(i in 1:length(cities)) {
  num_chars[i] <- nchar(cities[i])
}
We can do this much more concisely using lapply():
lapply(cities, nchar)
However, lapply() always returns a list. This is often inconvenient, but we can
convert the result to a vector by using the unlist() function.
unlist(lapply(cities, nchar))
Suppose we have a list named oil that represents oil prices, and that we have a
function that triples a value.
We can use lapply() to quickly triple the prices:
oil <- list(2.37, 2.49, 2.18, 2.22, 2.47, 2.32)
triple <- function(x) {
  3 * x
}

result <- lapply(oil, triple)
unlist(result)
On the other hand, if we have a function multiply that takes two arguments, the
value to multiply and the factor by which to multiply, we can pass the extra
argument through lapply() as follows:
multiply <- function(x, factor) {
  x * factor
}

result <- lapply(oil, multiply, factor = 3)
unlist(result)
lapply() is used to apply a function over a list or vector.
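
For a one-off operation it is also possible to write the function inline instead of naming it first. The sketch below shows this with the oil prices. The anonymous function is an addition that is not part of the text above, and the name tripled is made up:

```r
oil <- list(2.37, 2.49, 2.18, 2.22, 2.47, 2.32)

# The function is written inline instead of being named first
tripled <- unlist(lapply(oil, function(x) 3 * x))
print(tripled)
```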

When the results that lapply() returns are all of the same type, as in the
example with cities, we can use the sapply() function.

sapply(cities, nchar)
and this returns a named vector that contains the number of characters in each
name.
The return value looks like:
New York    Paris   London ...
       8        5        6 ...
We can choose not to name the elements by setting USE.NAMES to FALSE.
By default, USE.NAMES is set to TRUE.
sapply(cities, nchar, USE.NAMES = FALSE)
As we have seen, sapply() works similarly to lapply(), but it tries to simplify
the output from a list to a vector or matrix.

Finally, we consider the vapply() function. vapply() also applies a function
over a list or a vector, but we have to explicitly specify the output format.

In our cities example we can use vapply() as:
vapply(cities, nchar, numeric(1))
Since nchar() returns a single number for each city, we specify the output
format as numeric(1), that is, a numeric vector of length 1.
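
To see why the explicit output format is useful, the sketch below calls vapply() once with a matching template and once with a template that does not match. The tryCatch() wrapper and the message string are illustrative additions, used only so that the example runs to the end:

```r
cities <- c("New York", "Paris", "London", "Tokyo")

# Template matches: each call to nchar() gives one number
lengths_ok <- vapply(cities, nchar, numeric(1))
print(lengths_ok)

# Template does not match: we claim one character string per element,
# but nchar() returns a number, so vapply() raises an error
outcome <- tryCatch(
  vapply(cities, nchar, character(1)),
  error = function(e) "vapply() raised an error"
)
print(outcome)   # "vapply() raised an error"
```

In this way vapply() fails early when a function does not return what we expect, whereas sapply() would silently return a result of a different shape or type.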

5 Utilities
So far we have seen a number of useful functions, such as lapply(), sapply(),
print(), sd(), etc.

There are other useful utilities, such as mathematical utilities. For example,
consider the following code:
c1 <- c(1.1, -7.1, 5.4, -2.7)
c2 <- c(-3.6, 4.1, -4.3, 6.5)
mean(c(sum(round(abs(c1))), sum(round(abs(c2)))))
First, the abs() function calculates the absolute value of each element of a
numeric vector.
Second, the round() function rounds each value to the nearest whole number.
Then the sum() function computes the sum of the rounded values.
Finally, mean() calculates the mean of the two sums.
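
To make the nested call easier to follow, the sketch below evaluates it one step at a time. The intermediate variable names (abs_c1, round_c1, sum_c1, sum_c2) are made up for illustration:

```r
c1 <- c(1.1, -7.1, 5.4, -2.7)
c2 <- c(-3.6, 4.1, -4.3, 6.5)

abs_c1 <- abs(c1)          # 1.1 7.1 5.4 2.7
round_c1 <- round(abs_c1)  # 1 7 5 3
sum_c1 <- sum(round_c1)    # 16

# round() rounds a half to the nearest even number, so 6.5 becomes 6
sum_c2 <- sum(round(abs(c2)))   # 4 + 4 + 4 + 6 = 18

result <- mean(c(sum_c1, sum_c2))
print(result)              # 17
```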

There are also some useful functions for creating and manipulating data
structures. For example, seq() generates a sequence:
seq(1, 10, by = 3)
returns the sequence 1 4 7 10

Another function to consider is rep(), which stands for replicate. For example:
rep(c(8, 6, 4, 2), times = 2)
returns: 8 6 4 2 8 6 4 2

Instead of the times argument, we can use the each argument to repeat each
element in turn, instead of repeating the whole vector.
rep(c(8, 6, 4, 2), each = 2)
returns: 8 8 6 6 4 4 2 2

We can use sort() to sort a vector. For example:
sort(c(10, 4, 2, 1))
outputs: 1 2 4 10

If we want to sort in decreasing order, we specify the argument
decreasing = TRUE.
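
A short sketch of both sort orders, using a made-up vector named values:

```r
values <- c(4, 10, 1, 2)

ascending <- sort(values)
descending <- sort(values, decreasing = TRUE)

print(ascending)    # 1 2 4 10
print(descending)   # 10 4 2 1
```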

To inspect the structure of an object, we use str().
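
As an illustration, the sketch below calls str() on a list like the NYC list used in the apply section (named nyc here). The use of capture.output() is an addition, shown only to make the point that str() prints its summary instead of returning it:

```r
nyc <- list(pop = 8405837,
            boroughs = c("Manhattan", "Bronx", "Brooklyn",
                         "Queens", "Staten Island"),
            capital = FALSE)

# Prints one line per element with its type and a preview of its values
str(nyc)

# str() prints its summary rather than returning it; capture.output()
# collects the printed lines if they are needed as text
summary_lines <- capture.output(str(nyc))
print(length(summary_lines))   # 4: a header line plus one line per element
```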

We can use the is.*() functions to check whether something is an object of a
given class. For example:
li <- list()
is.list(li)
returns TRUE

We can convert between types by using the as.*() functions. For example:
li2 <- as.list(c(1, 2, 3))
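
A few more conversions, as a sketch with made-up values. The last case shows what happens when a conversion is not possible; suppressWarnings() is used here only to keep the output clean:

```r
num <- as.numeric("3.14")
print(num)               # 3.14
print(is.numeric(num))   # TRUE

txt <- as.character(42)
print(txt)               # "42"

# A string that is not a number cannot be converted: the result is NA
# (R also emits a warning, which is suppressed here)
bad <- suppressWarnings(as.numeric("hello"))
print(is.na(bad))        # TRUE
```
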
Regular expressions are an essential tool in R. A regular expression is a
sequence of (meta)characters that forms a search pattern, which we can use to
match strings.
We can use regular expressions to check whether certain patterns exist in a
string, to replace them with other elements, or to extract patterns from a
string.

Let's consider the grepl() function. Suppose we have a vector
animals <- c("cat", "moose", "impala", "ant", "kiwi")
The grepl() function accepts two main arguments:
grepl(pattern = <regex>, x = <string>)
where <regex> is the pattern that we are trying to match, and <string> is a
character vector.
For example, if we want to check which of the strings in animals vector contain
letter a:
grepl(pattern = "a", x = animals)
which returns: TRUE FALSE TRUE TRUE FALSE

We use ^ to specify that we want to match a pattern at the start of a string.
For example, if we want to match strings that start with the letter a:
grepl(pattern = "^a", x = animals)
returns: FALSE FALSE FALSE TRUE FALSE

On the other hand, we use $ to match strings that end with some pattern.
For example, to match strings that end in letter a:
grepl(pattern = "a$", x = animals)
returns: FALSE FALSE TRUE FALSE FALSE

The grep() function works similarly; however, it returns a vector of the
indices of the elements that match the pattern.
So, to find the indices of the strings that contain the letter a:
grep(pattern = "a", x = animals)
returns: 1 3 4

To get the same output from the grepl() function, we need to wrap it in the
which() function. For example:
which(grepl(pattern = "a", x = animals))
returns: 1 3 4

We use the sub() function to replace a pattern matched in a string. The sub()
function accepts three main arguments:
sub(pattern = <regex>, replacement = <str>, x = <str>)
This works similarly to grep(), but we also have to specify the replacement
argument.

For example:
sub(pattern = "a", replacement = "o", x = animals)
returns: "cot" "moose" "impola" "ont" "kiwi"

As we can see, in "impala" only the first a got replaced. sub() replaces only
the first occurrence of the matched pattern in each string. If we want to
replace all the occurrences, we use the gsub() function.
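
The difference between sub() and gsub() can be seen side by side in the sketch below, which reuses the animals vector. The result names first_only and every_one are made up:

```r
animals <- c("cat", "moose", "impala", "ant", "kiwi")

# sub() replaces only the first match in each string
first_only <- sub(pattern = "a", replacement = "o", x = animals)
print(first_only)   # "cot" "moose" "impola" "ont" "kiwi"

# gsub() replaces every match in each string
every_one <- gsub(pattern = "a", replacement = "o", x = animals)
print(every_one)    # "cot" "moose" "impolo" "ont" "kiwi"
```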

It is often useful to work with times and dates in R. For example, if we want
to run a script every hour or every day, it is useful to log the time and date.

To get the current date, we use:
today <- Sys.Date()
today
and if we print today, the output would be "2022-10-30" (today's date).

It is important to note that today is not simply a string. The class of the
object today is "Date".

To get the current date and time we use Sys.time(). For example:
now <- Sys.time()
now
outputs "2022-10-30 17:58:33 CEST"

If we check the class() of now, we get "POSIXct" "POSIXt".
These classes follow the POSIX standard, which keeps the representation of
times compatible across operating systems.

If we want to convert a string to a date, we do:
my_birthday <- as.Date("1999-11-22")
The default format of as.Date() is year-month-day. If we swap the month and the
day, we get an error.

However, we can specify the format as an argument to as.Date():
my_birthday <- as.Date("1999-22-11", format = "%Y-%d-%m")
To create a POSIXct object we use as.POSIXct():
my_day <- as.POSIXct("1999-11-22 11:25:10")
We can do calculations on Date objects. For example:
date <- as.Date("2022-10-29")
date + 1
returns "2022-10-30"

Similarly, we can do arithmetic on POSIXct objects, and we can find the
difference between two dates by subtracting them.
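
A minimal sketch of both kinds of arithmetic. The variable names are made up, and the time zone is set to UTC as an assumption so that the result does not depend on the machine's settings:

```r
day1 <- as.Date("2022-10-29")
day2 <- as.Date("2022-10-30")

# Subtracting two dates gives a difftime object
gap <- day2 - day1
print(gap)               # Time difference of 1 days
print(as.numeric(gap))   # 1

# POSIXct objects count seconds, so adding 3600 moves the time by one hour
# (the time zone is fixed to UTC here so the result is the same everywhere)
start <- as.POSIXct("1999-11-22 11:25:10", tz = "UTC")
later <- start + 3600
print(format(later))     # "1999-11-22 12:25:10"
```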

There are also dedicated packages for working with dates and times in R,
including:
• lubridate
• zoo
• xts
