Learning R

R can be used as a simple calculator, with basic arithmetic operators like addition, subtraction, multiplication, division, exponentiation, and modulo. Vectors are one-dimensional arrays that can store numeric, character, or logical data, and are created using the c() function. Logical comparison operators like <, >, <=, >=, ==, != allow evaluating whether conditions are TRUE or FALSE across vector elements. Selection of specific vector elements can be done by index number or name, or by using a logical vector from comparisons inside square brackets.

Uploaded by

Wilson Rangel

Arithmetic with R

In its most basic form, R can be used as a simple calculator. Consider the following
arithmetic operators:

• Addition: +
• Subtraction: -
• Multiplication: *
• Division: /
• Exponentiation: ^
• Modulo: %%

The last two might need some explaining:

• The ^ operator raises the number to its left to the power of the number to its right:
for example 3^2 is 9.
• The modulo returns the remainder of the division of the number to the left by the
number on its right, for example 5 modulo 3 or 5 %% 3 is 2.
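As a quick sketch of the operators above, here is one line per operator. The numbers and variable names are arbitrary illustrations, not part of the original exercise.

```r
# One example per arithmetic operator
addition       <- 5 + 5        # 10
subtraction    <- 5 - 5        # 0
multiplication <- 3 * 5        # 15
division       <- (5 + 5) / 2  # 5
power          <- 2^5          # 32
remainder      <- 28 %% 6      # 4, because 28 = 4 * 6 + 4

print(c(addition, subtraction, multiplication, division, power, remainder))
```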

With this knowledge, follow the instructions to complete the exercise.

Basic data types in R


R works with numerous data types. Some of the most basic types to get started are:

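• Decimal values like 4.5 are called numerics.
• Whole numbers like 4 are called integers. Integers are also numerics. (To store a
whole number with the integer type in R, add the suffix L, as in 4L; a plain 4 is
stored as a numeric.)
• Boolean values (TRUE or FALSE) are called logical.
• Text (or string) values are called characters.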

Note how the quotation marks in the editor indicate that "some text" is a string.

What's that data type?


Do you remember that when you added 5 + "six", you got an error due to a mismatch
in data types? You can avoid such embarrassing situations by checking the data type of
a variable beforehand. You can do this with the class() function, as the code in the
editor shows.
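A minimal sketch of checking types with class() might look like this. The variable names are illustrative assumptions; only class() itself comes from the text above.

```r
# One variable per basic data type (names are hypothetical)
my_numeric   <- 42.5
my_integer   <- 4L          # the L suffix stores the value as an integer
my_character <- "some text"
my_logical   <- TRUE

class(my_numeric)    # "numeric"
class(my_integer)    # "integer"
class(my_character)  # "character"
class(my_logical)    # "logical"

# Checking first tells you that adding my_numeric and my_character would fail
class(my_numeric) == class(my_character)  # FALSE
```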
Create a vector (2)
Let us focus first!

On your way from rags to riches, you will make extensive use of vectors. Vectors are
one-dimensional arrays that can hold numeric data, character data, or logical data. In
other words, a vector is a simple tool to store data. For example, you can store your
daily gains and losses in the casinos.

In R, you create a vector with the combine function c(). You place the vector elements
separated by a comma between the parentheses. For example:
numeric_vector <- c(1, 2, 3)
character_vector <- c("a", "b", "c")

Once you have created these vectors in R, you can use them to do calculations.
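To make that concrete, the sketch below builds one vector of each kind and does a small calculation. The vector boolean_vector and the mixed example are illustrative additions, not from the original notes.

```r
numeric_vector   <- c(1, 10, 49)
character_vector <- c("a", "b", "c")
boolean_vector   <- c(TRUE, FALSE, TRUE)

# Arithmetic on a vector is applied to every element
doubled <- numeric_vector * 2   # 2 20 98

# A vector holds a single type: mixing numbers and text turns everything into text
mixed_vector <- c(1, "a", TRUE)
class(mixed_vector)             # "character"
```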

Create a vector (3)


After one week in Las Vegas and still zero Ferraris in your garage, you decide that it is
time to start using your data analytical superpowers.

Before doing a first analysis, you decide to first collect all the winnings and losses for
the last week:

For poker_vector:

• On Monday you won $140
• Tuesday you lost $50
• Wednesday you won $20
• Thursday you lost $120
• Friday you won $240

For roulette_vector:

• On Monday you lost $24
• Tuesday you lost $50
• Wednesday you won $100
• Thursday you lost $350
• Friday you won $10
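Translated into R, the two lists could look like the sketch below, with winnings as positive and losses as negative numbers (the same sign convention used in the code later in these notes).

```r
# Poker winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)

# Roulette winnings from Monday to Friday
roulette_vector <- c(-24, -50, 100, -350, 10)

poker_vector
roulette_vector
```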

Naming a vector
As a data analyst, it is important to have a clear view on the data that you are using.
Understanding what each element refers to is therefore essential.

In the previous exercise, we created a vector with your winnings over the week. Each
vector element refers to a day of the week but it is hard to tell which element belongs to
which day. It would be nice if you could show that in the vector itself.

You can give a name to the elements of a vector with the names() function. Have a look
at this example:
some_vector <- c("John Doe", "poker player")
names(some_vector) <- c("Name", "Profession")
This code first creates a vector some_vector and then gives the two elements a name.
The first element is assigned the name Name, while the second element is
labeled Profession. Printing the contents to the console yields the following output:
          Name     Profession 
    "John Doe" "poker player" 

Naming a vector (2)
If you want to become a good statistician, you have to become lazy. (If you are already
lazy, chances are high you are one of those exceptional, natural-born statistical talents.)

In the previous exercises you probably experienced that it is boring and frustrating to
type and retype information such as the days of the week. However, when you look at it
from a higher perspective, there is a more efficient way to do this, namely, to assign the
days of the week vector to a variable!

Just like you did with your poker and roulette returns, you can also create a variable that
contains the days of the week. This way you can use and re-use it.
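A short sketch of that idea: store the days once in days_vector and reuse it to name both vectors.

```r
poker_vector    <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)

# The days of the week, typed only once
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Reuse the same variable to name both vectors
names(poker_vector)    <- days_vector
names(roulette_vector) <- days_vector

poker_vector
```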
Calculating total winnings (2)
Now that you understand how R does arithmetic with vectors, it is time to get those Ferraris
in your garage! First, you need to understand what the overall profit or loss per day of
the week was. The total daily profit is the sum of the profit/loss you realized on poker
per day, and the profit/loss you realized on roulette per day.

In R, this is just the sum of roulette_vector and poker_vector.
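As a sketch, adding the two vectors works element by element, so each day's poker result is added to the same day's roulette result. The name total_daily is an illustrative choice.

```r
poker_vector    <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector     <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector)    <- days_vector
names(roulette_vector) <- days_vector

# Element-wise sum: Monday + Monday, Tuesday + Tuesday, ...
total_daily <- roulette_vector + poker_vector
total_daily   # 116 -100 120 -470 250, labeled with the days
```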

Calculating total winnings (3)


Based on the previous analysis, it looks like you had a mix of good and bad days. This
is not what your ego expected, and you wonder if there may be a very tiny chance you
have lost money over the week in total?

A function that helps you to answer this question is sum(). It calculates the sum of all
elements of a vector. For example, to calculate the total amount of money you have
lost/won with poker you do:
total_poker <- sum(poker_vector)
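Extending that line to both games gives the weekly totals. This is a sketch; the names total_roulette and total_week are assumptions chosen for illustration.

```r
poker_vector    <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)

total_poker    <- sum(poker_vector)             # 230
total_roulette <- sum(roulette_vector)          # -314
total_week     <- total_poker + total_roulette  # -84

total_week
```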
Comparing total winnings
Oops, it seems like you are losing money. Time to rethink and adapt your strategy! This
will require some deeper analysis…

After a short brainstorm in your hotel's jacuzzi, you realize that a possible explanation
might be that your skills in roulette are not as well developed as your skills in poker. So
maybe your total gains in poker are higher (or > ) than in roulette.
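One way to check that hunch is to compare the two totals with >, as in this sketch:

```r
poker_vector    <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)

total_poker    <- sum(poker_vector)
total_roulette <- sum(roulette_vector)

# TRUE if the poker total is higher than the roulette total
poker_is_better <- total_poker > total_roulette
poker_is_better   # TRUE
```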

Vector selection: the good times


Your hunch seemed to be right. It appears that the poker game is more your cup of tea
than roulette.
Another possible route for investigation is your performance at the beginning of the
working week compared to the end of it. You did have a couple of Margarita cocktails at
the end of the week…

To answer that question, you only want to focus on a selection of the total_vector. In
other words, our goal is to select specific elements of the vector. To select elements of
a vector (and later matrices, data frames, …), you can use square brackets. Between
the square brackets, you indicate what elements to select. For example, to select the
first element of the vector, you type poker_vector[1]. To select the second element of
the vector, you type poker_vector[2], etc. Notice that the first element in a vector has
index 1, not 0 as in many other programming languages.
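For instance, a sketch that picks out single days by position (the name poker_wednesday is illustrative):

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Indexing starts at 1, so the first element is Monday
poker_monday    <- poker_vector[1]   # 140
poker_wednesday <- poker_vector[3]   # 20

poker_wednesday
```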

Vector selection: the good times (2)


How about analyzing your midweek results?

To select multiple elements from a vector, place a vector of the positions you want
between the square brackets. For example, to select the first and the fifth day of the
week, use c(1, 5). The code below selects the first and fifth element of poker_vector:
poker_vector[c(1, 5)]
Vector selection: the good times (3)
Selecting multiple elements of poker_vector with c(2, 3, 4) is not very convenient.
Many statisticians are lazy people by nature, so they created an easier way to do
this: c(2, 3, 4) can be abbreviated to 2:4, which generates a vector with all natural
numbers from 2 up to 4.
So, another way to find the mid-week results is poker_vector[2:4]. Notice how the
vector 2:4 is placed between the square brackets to select element 2 up to 4.
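The sketch below puts both forms side by side and averages the midweek results. The names poker_midweek and the use of mean() are illustrative additions.

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# First and fifth element
poker_vector[c(1, 5)]            # 140 240

# Tuesday to Thursday, written two equivalent ways
poker_midweek <- poker_vector[2:4]
same_selection <- poker_vector[c(2, 3, 4)]

mean(poker_midweek)              # -50
```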

Vector selection: the good times (4)


Another way to tackle the previous exercise is by using the names of the vector
elements (Monday, Tuesday, …) instead of their numeric positions. For example,
poker_vector["Monday"]
will select the first element of poker_vector since "Monday" is the name of that first
element.

Just like you did in the previous exercise with numerics, you can also use the element
names to select multiple elements, for example:

poker_vector[c("Monday","Tuesday")]
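As a sketch, selecting by name returns the same elements as selecting by position, provided the vector has been named first. The name poker_start is an assumption for illustration.

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Select the first three days by name
poker_start <- poker_vector[c("Monday", "Tuesday", "Wednesday")]

# Same elements as selecting by position
identical(poker_start, poker_vector[1:3])   # TRUE

sum(poker_start)                            # 110
```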

Selection by comparison - Step 1


By making use of comparison operators, we can approach the previous question in a
more proactive way.

The (logical) comparison operators known to R are:

• < for less than
• > for greater than
• <= for less than or equal to
• >= for greater than or equal to
• == for equal to each other
• != for not equal to each other

As seen in the previous chapter, stating 6 > 5 returns TRUE. The nice thing about R is
that you can use these comparison operators also on vectors. For example:

c(4, 5, 6) > 5
[1] FALSE FALSE TRUE

This command tests for every element of the vector if the condition stated by the
comparison operator is TRUE or FALSE.
# Poker and roulette winnings from Monday to Friday:
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector

# Which days did you make money on poker?
selection_vector <- poker_vector > 0

# Print out selection_vector
selection_vector
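Comparisons also work between two vectors of the same length, element by element. The sketch below is an illustrative extension of the exercise: it asks on which days poker beat roulette, and counts winning days by summing a logical vector (TRUE counts as 1).

```r
poker_vector    <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector     <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector)    <- days_vector
names(roulette_vector) <- days_vector

# Day-by-day comparison of the two games
poker_beats_roulette <- poker_vector > roulette_vector
poker_beats_roulette           # TRUE FALSE FALSE TRUE TRUE

# Number of days with a positive poker result
winning_days_count <- sum(poker_vector > 0)
winning_days_count             # 3
```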

Selection by comparison - Step 2


Working with comparisons will make your data analytical life easier. Instead of selecting
a subset of days to investigate yourself (like before), you can simply ask R to return only
those days where you realized a positive return for poker.

In the previous exercises you used selection_vector <- poker_vector > 0 to find the
days on which you had a positive poker return. Now, you would like to know not only the
days on which you won, but also how much you won on those days.
You can select the desired elements, by putting selection_vector between the square
brackets that follow poker_vector:
poker_vector[selection_vector]
R knows what to do when you pass a logical vector in square brackets: it will only select
the elements that correspond to TRUE in selection_vector.
# Poker and roulette winnings from Monday to Friday:
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector

# Which days did you make money on poker?
selection_vector <- poker_vector > 0

# Select from poker_vector these days
poker_winning_days <- poker_vector[selection_vector]

poker_winning_days
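The two steps can also be collapsed into one by writing the comparison directly inside the brackets. This is a sketch of an equivalent shorthand, not the form the exercise asks for.

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Two steps, as in the exercise
selection_vector   <- poker_vector > 0
poker_winning_days <- poker_vector[selection_vector]

# One step: the comparison goes straight between the brackets
poker_winning_days_short <- poker_vector[poker_vector > 0]

names(poker_winning_days_short)   # "Monday" "Wednesday" "Friday"
```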

Advanced selection
Just like you did for poker, you also want to know those days where you realized a
positive return for roulette.

# Poker and roulette winnings from Monday to Friday:
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector

# Which days did you make money on roulette?
selection_vector <- roulette_vector > 0

# Select from roulette_vector these days
roulette_winning_days <- roulette_vector[selection_vector]

roulette_winning_days

What's a matrix?
In R, a matrix is a collection of elements of the same data type (numeric, character, or
logical) arranged into a fixed number of rows and columns. Since you are only working
with rows and columns, a matrix is called two-dimensional.

You can construct a matrix in R with the matrix() function. Consider the following
example:

matrix(1:9, byrow = TRUE, nrow = 3)

In the matrix() function:

• The first argument is the collection of elements that R will arrange into the rows
and columns of the matrix. Here, we use 1:9, which is a shortcut for c(1, 2, 3,
4, 5, 6, 7, 8, 9).
• The argument byrow indicates that the matrix is filled by the rows. If we want the
matrix to be filled by the columns, we just place byrow = FALSE.
• The third argument nrow indicates that the matrix should have three rows.

# Construct a matrix with 3 rows that contain the numbers 1 up to 9
matrix(1:9, byrow = TRUE, nrow = 3)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Analyze matrices, you shall


It is now time to get your hands dirty. In the following exercises you will analyze the box
office numbers of the Star Wars franchise. May the force be with you!

In the editor, three vectors are defined. Each one represents the box office numbers
from the first three Star Wars movies. The first element of each vector indicates the US
box office revenue, the second element refers to the Non-US box office (source:
Wikipedia).

In this exercise, you'll combine all these figures into a single vector. Next, you'll build a
matrix from this vector.

# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)

# Create box_office
box_office <- c(new_hope, empire_strikes, return_jedi)

# Construct star_wars_matrix: one row per movie, so three rows of two values
star_wars_matrix <- matrix(box_office, byrow = TRUE, nrow = 3)

star_wars_matrix
        [,1]  [,2]
[1,] 460.998 314.4
[2,] 290.475 247.9
[3,] 309.306 165.8

Naming a matrix
To help you remember what is stored in star_wars_matrix, you would like to add the names
of the movies for the rows. Not only does this help you to read the data, but it is also
useful to select certain elements from the matrix.

Similar to vectors, you can add names for the rows and the columns of a matrix:

rownames(my_matrix) <- row_names_vector
colnames(my_matrix) <- col_names_vector

We went ahead and prepared two vectors for you: region, and titles. You will need these
vectors to name the columns and rows of star_wars_matrix, respectively.
# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)

# Construct matrix
star_wars_matrix <- matrix(c(new_hope, empire_strikes, return_jedi), nrow = 3, byrow = TRUE)

# Vectors region and titles, used for naming
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")

# Name the columns with region

colnames(star_wars_matrix) <- region

# Name the rows with titles

rownames(star_wars_matrix) <- titles

# Print out star_wars_matrix
star_wars_matrix
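Once the rows and columns are named, the names can be used inside square brackets to pick out elements, in the form my_matrix[row, column]. The sketch below is an illustrative extension; the variable names new_hope_us and non_us_column are assumptions.

```r
box_office <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
star_wars_matrix <- matrix(box_office, nrow = 3, byrow = TRUE)
colnames(star_wars_matrix) <- c("US", "non-US")
rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")

# One cell: row name first, column name second
new_hope_us <- star_wars_matrix["A New Hope", "US"]   # 460.998

# A whole column: leave the row position empty
non_us_column <- star_wars_matrix[, "non-US"]
non_us_column
```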

Calculating the worldwide box office


The single most important thing for a movie in order to become an instant legend in
Tinseltown is its worldwide box office figures.

To calculate the total box office revenue for the three Star Wars movies, you have to
take the sum of the US revenue column and the non-US revenue column.

In R, the function rowSums() conveniently calculates the totals for each row of a matrix.
This function creates a new vector:
rowSums(my_matrix)

# Construct star_wars_matrix
box_office <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
region <- c("US", "non-US")
titles <- c("A New Hope",
            "The Empire Strikes Back",
            "Return of the Jedi")

star_wars_matrix <- matrix(box_office,
                           nrow = 3, byrow = TRUE,
                           dimnames = list(titles, region))

# Calculate worldwide box office figures
worldwide_vector <- rowSums(star_wars_matrix)

worldwide_vector
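For comparison, base R also has colSums(), which totals each column instead of each row. The sketch below shows both on the same matrix; colSums() and the name region_totals are additions for illustration, not part of the exercise.

```r
box_office <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
star_wars_matrix <- matrix(box_office, nrow = 3, byrow = TRUE,
                           dimnames = list(titles, region))

# One total per movie (row): US + non-US
worldwide_vector <- rowSums(star_wars_matrix)

# One total per region (column): summed over the three movies
region_totals <- colSums(star_wars_matrix)

worldwide_vector
region_totals
```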

Adding a row
Just like every action has a reaction, every cbind() has an rbind(). (We admit, we are
pretty bad with metaphors.)

Your R workspace, where all variables you defined 'live', has already been initialized
and contains two matrices:

• star_wars_matrix that we have used all along, with data on the original trilogy,
• star_wars_matrix2, with similar data for the prequel trilogy.

Explore these matrices in the console if you want to have a closer look. If you want to
check out the contents of the workspace, you can type ls() in the console.

# star_wars_matrix and star_wars_matrix2 are available in your workspace
star_wars_matrix  
star_wars_matrix2 

# Combine both Star Wars trilogies in one matrix
all_wars_matrix <- rbind(star_wars_matrix,star_wars_matrix2)
all_wars_matrix
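Because star_wars_matrix2 is only available in the course workspace, the sketch below shows rbind() on two small stand-in matrices instead. All names and numbers here are hypothetical placeholders, not box office data.

```r
# Two toy matrices with the same column layout (placeholder values)
first_matrix <- matrix(1:4, nrow = 2, byrow = TRUE,
                       dimnames = list(c("row A", "row B"), c("US", "non-US")))
second_matrix <- matrix(5:8, nrow = 2, byrow = TRUE,
                        dimnames = list(c("row C", "row D"), c("US", "non-US")))

# rbind() stacks the rows of the second matrix under the first
combined_matrix <- rbind(first_matrix, second_matrix)

dim(combined_matrix)        # 4 2
rownames(combined_matrix)   # "row A" "row B" "row C" "row D"
```

The matrices need the same number of columns for the rows to line up.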

Adding a column for the Worldwide box office
In the previous exercise you calculated the vector that contained the worldwide box
office receipt for each of the three Star Wars movies. However, this vector is not yet part
of star_wars_matrix.
You can add a column or multiple columns to a matrix with the cbind() function, which
merges matrices and/or vectors together by column. For example:
big_matrix <- cbind(matrix1, matrix2, vector1 ...)
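Applied to the Star Wars data, a sketch of this could look as follows. The result name star_wars_with_total is a hypothetical choice; the new column takes its name from the variable passed to cbind().

```r
box_office <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
star_wars_matrix <- matrix(box_office, nrow = 3, byrow = TRUE,
                           dimnames = list(titles, region))

# Worldwide total per movie
worldwide_vector <- rowSums(star_wars_matrix)

# Bind the totals on as a third column
star_wars_with_total <- cbind(star_wars_matrix, worldwide_vector)

colnames(star_wars_with_total)   # "US" "non-US" "worldwide_vector"
dim(star_wars_with_total)        # 3 3
```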
