A Short Introduction to R for IE255

October 3, 2020

Preface
This document contains a short introduction to the programming language R. With the help of the examples
given in this document, you should gain a basic knowledge of R that will help you use R for the calculations
and plots required in the probability course IE255. To learn this programming language, we recommend
that you try the applications presented in this document yourself, step by step.

Note that this document explains only the basics. The names of the functions needed directly for
probability calculations, e.g. runif(), will be mentioned in the course documents. With the
command

?runif
you get the R documentation of that function. The appendix of this document also lists some of the
most important R commands used in probability and statistics; you will need them later in
IE255.

You can download the latest version of R from http://cran.r-project.org/. Windows users should click
the Windows link, then the base link, and will then see the download link for the *.exe file.
Once you have installed R, you can use the Windows interface directly, or you can use RStudio, which many
students prefer. In any case, we strongly recommend that you write and save all your code in script files.
In the Windows R interface, click File in the menu bar, then New script, and write your code inside this
script. If you have complete code in your script file, you can select the whole script with Ctrl+A and
then run it in the R console with Ctrl+R. You can always save your script files and open them again by
clicking File and Open script in the menu bar.

My thanks go to my former PhD student İsmail Başoğlu from Maltepe University, who shared the
LaTeX sources of the R tutorials he has written. This is a slightly changed version for the special needs of
IE255.

Wolfgang Hörmann

1 R Works with Vectors
1.1 Creating Vectors
To assign a value to a variable (e.g. 3 to x), we write:

x <- 3
or

x = 3
We will use the operator <- for assigning values in all the following examples.¹
When we assign a number to a variable, R considers it a vector with a single element. So, by using
square brackets [ ] after the variable name, we can assign another element to any index we want. Finally,
to see what is stored in a variable, we simply type its name and press Enter.

x[4] <- 7.5


x # press enter to display the content of x
# [1] 3.0 NA NA 7.5
Here, NA stands for "not available": we have not assigned any values to the second and the
third elements of the vector x.
You can use # to add comments to a command line. R ignores the rest of the line after this
symbol, but the next line is executed again. (So you do not have to close your comment with #
when you finish.)
We can create a vector of consecutive integers between two integers with a simple command.

x <- 1:8 # creates a consecutive integer vector


x
# [1] 1 2 3 4 5 6 7 8
y <- 15:11
y
# [1] 15 14 13 12 11
The following operation adds 3 to each element of x and stores the result as y.
y <- x+3
y
# [1] 4 5 6 7 8 9 10 11
The previous command actually adds a vector of length 1 to a vector of length 8. Here, R repeats
(recycles) the shorter vector again and again until it reaches the length of the longer vector. The following
sequence of commands illustrates this operation.

x <- 1:8
y <- 1:4
x
# [1] 1 2 3 4 5 6 7 8
y
# [1] 1 2 3 4
x+y # we can see the summation without storing them to any new variable
# [1] 2 4 6 8 6 8 10 12
In this summation, y is repeated up to index 8 (since x is a vector of length 8). So the fifth element of
x is added to the first element of y, the sixth element of x to the second element of y, and so forth.
But what happens if the length of x is not a multiple of the length of y? Let us try it.

¹ There are significant differences between the assignment operators that are not very important for
basic R programming. Still, you may be interested in a brief explanation of these differences at:
http://stat.ethz.ch/R-manual/R-patched/library/base/html/assignOps.html

x <- 1:8
y <- 1:3
x+y
# [1] 2 4 6 5 7 9 8 10
# Warning message:
# In x + y : longer object length is not a multiple of shorter object length
R again repeats the short vector until it reaches the length of the long vector. However, the last
repetition may not be complete. R returns a warning message about this, yet it executes the operation.
You should also know that you can make subtractions, multiplications, divisions, power and modular
arithmetic operations using the same principle. We will get into these operations in section 1.4.
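The same recycling rule applies to the other arithmetic operators. A minimal sketch (the numbers are arbitrary illustrations, not taken from the course material):

```r
x <- 1:6
x * c(1, -1)   # c(1, -1) is recycled to 1 -1 1 -1 1 -1
# [1]  1 -2  3 -4  5 -6
x - 10         # a single number is recycled to every element
# [1] -9 -8 -7 -6 -5 -4
```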
We can also create vectors with specified values. For instance, let us create a vector of length 6 with
values 4, 8, 15, 16, 23, 42 and another vector of length 4 with values 521, 522, 523, 547. We use the function
c() to combine these values into a vector. We can find the number of elements in a vector with the
length() function.

x <- c(4,8,15,16,23,42) # "c()" combines a series of values


y <- c(521,522,523,547)
x
# [1] 4 8 15 16 23 42
y
# [1] 521 522 523 547
z <- c(x,y) # we can also combine two vectors
z
# [1] 4 8 15 16 23 42 521 522 523 547

length(z)
# [1] 10
We can also reverse a vector, from the last element to the first.

z <- rev(z) # we can use the same object to reassign that object
z
# [1] 547 523 522 521 42 23 16 15 8 4
Suppose we would like to create a vector of length 10 whose elements are all equal to 5. We do
the following.

x <- rep(5,10) # "rep"eat 5 ten times


x
# [1] 5 5 5 5 5 5 5 5 5 5
y <- c(3,5,7)
z <- rep(y,4) # repeat vector y 4 times
z
# [1] 3 5 7 3 5 7 3 5 7 3 5 7
rep(y,c(2,3,5)) # repeat each element of y as many times as given by
# the corresponding element of the second vector
# [1] 3 3 5 5 5 7 7 7 7 7
From the previous example, we see that we can also repeat whole vectors. As a last example for this
section, we would like to create a vector of length 21 between the values 2 and 3, such that the differences
between consecutive elements are all equal.

x <- seq(2,3,length.out=21) # "seq" stands for sequence


x
# [1] 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50
# [12] 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00

If we are interested in the step size rather than the length of the sequence, we can use the by argument
instead of length.out.
x <- seq(2,3,by=0.05)
x
# [1] 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50
# [12] 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00

1.2 Logical Expressions


You can use the following operators to write logical expressions. They return a vector of TRUE and
FALSE values; in arithmetic, TRUE is treated as 1 and FALSE as 0, so these vectors can also be used in
vector operations.

• < : less than

• <=: less than or equal to

• > : greater than

• >=: greater than or equal to

• ==: equal to (do not forget that a single = symbol is used for assigning values)

• !=: not equal to

In the following sequence of examples, we create a vector and use it in different logical expressions. If a
vector element satisfies the expression, the result is TRUE at the corresponding index, otherwise FALSE.
You can use & for "and" and | for "or" between logical expressions.
x <- 10:20
x
# [1] 10 11 12 13 14 15 16 17 18 19 20
x<17
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
x<=17
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
x>14
# [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
x>=14
# [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
x==16
# [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
x!=16
# [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE

(x<=16) & (x>=12)


# [1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
(x<=11) | (x>=18)
# [1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
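Because TRUE counts as 1 and FALSE as 0 in arithmetic, sum() of a logical vector counts how many elements satisfy a condition, and mean() gives the proportion of such elements. A small sketch reusing x <- 10:20 from above; proportions of this kind can later be used to estimate probabilities from simulated data:

```r
x <- 10:20
sum(x < 17)    # number of elements smaller than 17
# [1] 7
mean(x < 17)   # proportion of elements smaller than 17, i.e. 7/11
# [1] 0.6363636
```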
So, what can we do with these logical expressions? As a first simple example, take a vector x of the
integers from 1 to 20. We want a vector in which every element of x that is less than 8 is replaced by
zero, while the other elements remain as they are in x.

x <- 1:20
y <- (x>=8)*(x)
y
# [1] 0 0 0 0 0 0 0 8 9 10 11 12 13 14 15 16 17
#[18] 18 19 20

As a second example, we evaluate the ordering costs of some goods. We can order at least
30 and at most 50 units from our supplier in a single order. We have a fixed cost of $50 if we
order at most 45 units and $15 otherwise. A single unit costs $7 if we order fewer than 40
units and $6.5 otherwise. To evaluate the total ordering cost for each alternative:

units <- 30:50


marginalcost <- 7*units*(units<40)+6.5*units*(units>=40)
marginalcost
# [1] 210.0 217.0 224.0 231.0 238.0 245.0 252.0 259.0
# [9] 266.0 273.0 260.0 266.5 273.0 279.5 286.0 292.5
#[17] 299.0 305.5 312.0 318.5 325.0

fixedcost <- 50*(units<=45)+15*(units>45)


fixedcost
# [1] 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 15
#[18] 15 15 15 15

totalcost <- fixedcost+marginalcost


totalcost
# [1] 260.0 267.0 274.0 281.0 288.0 295.0 302.0 309.0
# [9] 316.0 323.0 310.0 316.5 323.0 329.5 336.0 342.5
#[17] 314.0 320.5 327.0 333.5 340.0
Continuing the previous example, say we are not interested in orders that cost more than
$318. In this case, we want a list of the order sizes we can choose and a list of the
corresponding costs.

units[totalcost<=318] #returns the amount of units corresponding


#to a cost less than or equal to 318
# [1] 30 31 32 33 34 35 36 37 38 40 41 46
totalcost[totalcost<=318] #returns the total costs corresponding to a total
#cost less than or equal to 318 (got the idea?)
# [1] 260.0 267.0 274.0 281.0 288.0 295.0 302.0 309.0
# [9] 316.0 310.0 316.5 314.0
The first of the previous two commands returns the elements of the units vector for which the
corresponding element of the totalcost vector is less than or equal to 318. The second command returns
the elements of the totalcost vector that are less than or equal to 318.
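If you need the positions rather than the values, which() returns the indices at which a logical vector is TRUE, and which.min() the index of the smallest element. A sketch that rebuilds the total cost vector of the example above in a single expression:

```r
units <- 30:50
totalcost <- 7*units*(units<40) + 6.5*units*(units>=40) +
  50*(units<=45) + 15*(units>45)
idx <- which(totalcost <= 318)  # positions where the condition is TRUE
idx
# [1]  1  2  3  4  5  6  7  8  9 11 12 17
units[idx]                      # same as units[totalcost <= 318]
# [1] 30 31 32 33 34 35 36 37 38 40 41 46
units[which.min(totalcost)]     # order size with the smallest total cost
# [1] 30
```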
As in the previous example, we can extract a subvector (a subset of a vector that keeps the original
order) from a vector in different ways. Check out the following examples:

x <- seq(5,8,by=0.3) # we will have 11 elements in this vector
x
# [1] 5.0 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7 8.0
length(x)
# [1] 11

y1 <- x[3:7] # extract a subvector from the indices 3 to 7


y1
# [1] 5.6 5.9 6.2 6.5 6.8

y2 <- x[2*(1:5)] # extract a subvector from even indices


y2
# [1] 5.3 5.9 6.5 7.1 7.7

y3 <- x[-1] # extract a subvector by eliminating the first index


y3
# [1] 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7 8.0

y4 <- x[-length(x)] # extract a subvector by eliminating the last index


y4
# [1] 5.0 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7

y5 <- x[-seq(1,11,3)] # extract a subvector by eliminating all indices given


y5
# [1] 5.3 5.6 6.2 6.5 7.1 7.4 8.0

y6 <- x[seq(1,11,3)] # extract a subvector by choosing all indices given


y6
# [1] 5.0 5.9 6.8 7.7

y7 <- x[x<7] # extract a subvector of the elements that are less than 7
y7
# [1] 5.0 5.3 5.6 5.9 6.2 6.5 6.8
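Logical indexing also works on the left-hand side of an assignment, where it changes only the selected elements. A short sketch with the same x as above (replacing the values below 6 by 0 is just an arbitrary illustration):

```r
x <- seq(5, 8, by = 0.3)
x[x < 6] <- 0   # replace every element smaller than 6 by 0
x
# [1] 0.0 0.0 0.0 0.0 6.2 6.5 6.8 7.1 7.4 7.7 8.0
```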

1.3 Creating Matrices


The vectors we created and used in Sections 1.1 and 1.2 are neither row nor column vectors; they are
just vectors. Do not be confused by the way a vector is displayed. We can create an explicit row
vector, a 1×d matrix, with the transpose function t().

x <- 1:5
x # a vector object
#[1] 1 2 3 4 5
y <- t(x)
y # a 1 x 5 matrix
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 3 4 5
As you can see, R displays a row vector in a completely different way. If we take the transpose
of y, we see how a column vector is displayed.

z <- t(t(x))
z # a 5x1 matrix
# [,1]
# [1,] 1
# [2,] 2

# [3,] 3
# [4,] 4
# [5,] 5
To create an m×n matrix in R, we first create a vector (let us call it vec) that contains the
columns of the matrix one after another, from the first to the last. Then we simply call
matrix(vec,nrow=m,ncol=n).
vec <- 1:12
x <- matrix(vec,nrow=3,ncol=4)

x
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12

t(x)
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 4 5 6
# [3,] 7 8 9
# [4,] 10 11 12
It is also possible to assign the elements of a matrix row by row.

vec <- 1:12


x <- matrix(vec,nrow=3,ncol=4,byrow=TRUE)
x
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
# [2,] 5 6 7 8
# [3,] 9 10 11 12
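Single elements, rows, columns and submatrices are selected with two indices, [row, column]; leaving one index empty selects everything in that dimension. A sketch using the matrix filled row by row above:

```r
x <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
x[2, 3]      # element in row 2, column 3
# [1] 7
x[2, ]       # the whole second row (returned as a plain vector)
# [1] 5 6 7 8
x[, 4]       # the whole fourth column
# [1]  4  8 12
x[1:2, 3:4]  # a 2 x 2 submatrix
#      [,1] [,2]
# [1,]    3    4
# [2,]    7    8
```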
You can compute the inverse of an invertible n×n matrix with the solve() function.

x <- matrix(c(1,2,-1,1,2,1,2,-2,-1),nrow=3,ncol=3)
x
# [,1] [,2] [,3]
# [1,] 1 1 2
# [2,] 2 2 -2
# [3,] -1 1 -1

xinv <- solve(x)


xinv
# [,1] [,2] [,3]
# [1,] 0.0000000 0.25000000 -0.5
# [2,] 0.3333333 0.08333333 0.5
# [3,] 0.3333333 -0.16666667 0.0
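A quick check of the result is to multiply the matrix by its inverse, which should give the identity matrix up to rounding errors. With a second argument, solve() also solves a linear system directly; the right-hand side b below is an arbitrary choice, picked so that the solution is (1, 2, 3):

```r
x <- matrix(c(1,2,-1,1,2,1,2,-2,-1), nrow = 3, ncol = 3)
xinv <- solve(x)
round(x %*% xinv, 10)  # the identity matrix, after rounding tiny numerical errors
b <- c(9, 0, -2)
sol <- solve(x, b)     # solves x %*% sol = b without forming the inverse
sol
# [1] 1 2 3
```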
You can create a matrix whose elements are all equal by passing that value as the first argument of
matrix(). You can also assign values to the diagonal elements of a square matrix with the function
diag().

x <- matrix(0,nrow=4,ncol=4)
x
# [,1] [,2] [,3] [,4]
# [1,] 0 0 0 0
# [2,] 0 0 0 0
# [3,] 0 0 0 0
# [4,] 0 0 0 0

diag(x) <- 1 # assigns 1 to all diagonal elements of x


x
# [,1] [,2] [,3] [,4]
# [1,] 1 0 0 0
# [2,] 0 1 0 0
# [3,] 0 0 1 0
# [4,] 0 0 0 1
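diag() has two more uses that are handy here: called with a single number it creates an identity matrix of that size, and called with a matrix it extracts the diagonal. A short sketch (the values 2, 5, 7 are arbitrary):

```r
identity4 <- diag(4)         # 4 x 4 identity matrix in a single call
diag(c(2, 5, 7))             # 3 x 3 matrix with 2, 5, 7 on the diagonal
m <- matrix(1:9, nrow = 3)
diag(m)                      # extracts the diagonal elements of m
# [1] 1 5 9
```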

You can learn about the number of columns, number of rows and the total number of elements in a
matrix or directly the dimension of a matrix by using the following functions.

x <- matrix(0,ncol=5,nrow=4)
ncol(x)
# [1] 5
nrow(x)
# [1] 4
length(x)
# [1] 20
dim(x) # number of rows and number of columns
# [1] 4 5
Of course we can also make matrix multiplication in R using the %*% operator:

A <- matrix(1:4,ncol=2)
y <- 1:2
t(y)%*%A
# [,1] [,2]
#[1,] 5 11
A%*%t(t(y))
# [,1]
#[1,] 7
#[2,] 10
A%*%y # gives the same result as R knows that y must be a column vector here.

1.4 Arithmetic Operations in R


We gave a short introduction to arithmetic operations in Section 1.1. We stated that we can add and
multiply two vectors component-wise, and that subtraction, division and modular arithmetic work in the
same manner.

x <- 2*(1:5)
x
# [1] 2 4 6 8 10
y <- 1:5
y
# [1] 1 2 3 4 5
x+y
# [1] 3 6 9 12 15

x*y
# [1] 2 8 18 32 50
x/y
# [1] 2 2 2 2 2
x-y
# [1] 1 2 3 4 5
x^2 # makes a power operation
# [1] 4 16 36 64 100
x^y
# [1] 2 16 216 4096 100000
x%%3 # yields each element of x modulo 3
# [1] 2 1 0 2 1

y <- 3:7
y
# [1] 3 4 5 6 7

x%%y # element-wise modular operation


# [1] 2 0 1 2 3
x%/%y # makes an integer division
# [1] 0 1 1 1 1
In the previous example, x and y were just vectors. If one of them were defined as a row vector, R
would still perform these operations, but the results would also be row vectors.
You can find the maximum value of a vector with max() and its minimum value with min(). You can
sum up all the elements of a vector with sum() and take the product of all its elements with prod().

x <- c(3,1,6,5,8,10,9,12,3)
min(x)
# [1] 1
max(x)
# [1] 12
sum(x)
# [1] 57
prod(x)
# [1] 2332800

You can compare vectors component-wise with pmax() and pmin() to obtain their component-wise
maximum or component-wise minimum. You can also sort a vector with the function sort(), while the
function order() returns the indices that would sort the vector. Both functions sort from minimum to
maximum by default, but the additional argument decreasing=TRUE sorts from maximum to minimum.

x <- 1:10
y <- 10:1
z <- c(3,2,1,6,5,4,10,9,8,7)

a <- pmax(x,y,z) # you can write as many vectors as you want


a
# [1] 10 9 8 7 6 6 10 9 9 10
sort(a)
# [1] 6 6 7 8 9 9 9 10 10 10
order(a)
# [1] 5 6 4 3 2 8 9 1 7 10

b <- pmin(x,y,z)

b
# [1] 1 2 1 4 5 4 4 3 2 1
sort(b,decreasing=TRUE)
# [1] 5 4 4 4 3 2 2 1 1 1
order(b,decreasing=TRUE)
# [1] 5 4 6 7 8 2 9 1 3 10
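Since order() returns indices, it can also rearrange one vector according to the values of another. A sketch with three order sizes and their total costs taken from the ordering example in Section 1.2:

```r
a <- c(10, 9, 8, 7, 6, 6, 10, 9, 9, 10)
a[order(a)]         # indexing with order() gives the same result as sort()
units <- c(31, 46, 40)
cost <- c(267, 314, 310)
units[order(cost)]  # order sizes from the cheapest to the most expensive
# [1] 31 40 46
```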
As already shown, R multiplies matrices with the %*% operator. Handle this operator carefully to
obtain correct results and make sure the dimensions of your matrices conform: for two matrices, the
number of columns of the first must equal the number of rows of the second, otherwise R stops with an
error. R adjusts the orientation only when an operand is a plain vector.

x <- matrix(1:6,ncol=2,nrow=3)
x
# [,1] [,2]
# [1,] 1 4
# [2,] 2 5
# [3,] 3 6

y <- matrix(1:4,ncol=2,nrow=2)
y
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4

x%*%y
# [,1] [,2]
# [1,] 9 19
# [2,] 12 26
# [3,] 15 33

y%*%x
# Error in y %*% x : non-conformable arguments

y%*%t(x) # taking the transpose should help


# [,1] [,2] [,3]
# [1,] 13 17 21
# [2,] 18 24 30
Now consider the matrix multiplication of two plain vectors of the same length. R interprets the first
vector as a row vector and the second as a column vector, so the operation yields their inner product (a
1×1 matrix). If we multiply two explicit row vectors (or two explicit column vectors), the dimensions do
not conform and R gives an error. To obtain an outer product, the second vector must be an explicit row
vector.

x <- 1:3
y <- 3:1

x%*%y # R treats x as a row vector and y as a column vector


# [,1]
# [1,] 10

t(x)%*%t(y)
# Error in t(x) %*% t(y) : non-conformable arguments

t(x)%*%y # same as the first operation but we have a correct notation now
# [,1]
# [1,] 10

x%*%t(y) # only this one returns an outer product
# [,1] [,2] [,3]
# [1,] 3 2 1
# [2,] 6 4 2
# [3,] 9 6 3
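Outer products can also be computed with outer() or its operator form %o%, which spares you from thinking about row and column orientation. outer() also accepts other vectorized functions; the pairwise sums below are just an illustration:

```r
x <- 1:3
y <- 3:1
outer(x, y)           # same values as x %*% t(y)
x %o% y               # shorthand for outer(x, y)
outer(1:3, 1:4, "+")  # matrix of all pairwise sums i + j
```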
Given a vector of real values, you can obtain the vector of cumulative sums with the function cumsum()
and the vector of cumulative products with the function cumprod(). The function diff() gives the
differences between consecutive elements of a vector.

x <- c(1,4,5,6,2,12)
y <- cumsum(x)
y
# [1] 1 5 10 16 18 30
# every index has the sum of the elements in x up to that index

z <- cumprod(x)
z
# [1] 1 4 20 120 240 2880
# every index has the product of the elements in x up to that index

diff(z)
# [1] 3 16 100 120 2640
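diff() and cumsum() are almost inverse operations. Dividing the cumulative sums by 1, 2, 3, ... gives running means, which can be used, for instance, to watch a simulation estimate settle down as more observations are added. A sketch with the vector from above:

```r
x <- c(1, 4, 5, 6, 2, 12)
diff(cumsum(x))           # recovers x without its first element
# [1]  4  5  6  2 12
cumsum(x) / seq_along(x)  # running mean: average of the first i elements
# [1] 1.000000 2.500000 3.333333 4.000000 3.600000 5.000000
```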

You can evaluate the factorial of a non-negative number with factorial() and the absolute value of
any real number with abs(). You can take the square root of a non-negative real number with sqrt(), and
the logarithm of a positive real number with log(). You can compute the exponential function of a real
number with exp() and the gamma function of a positive real number with gamma(). For integer rounding,
floor() yields the largest integer less than or equal to the given value, and ceiling() yields
the smallest integer greater than or equal to it. as.integer() keeps only the integer part of the
value (it truncates towards zero).

factorial(3)
# [1] 6
factorial(1:6)
# [1] 1 2 6 24 120 720

abs(-4)
# [1] 4
abs(c(-3:3))
# [1] 3 2 1 0 1 2 3

sqrt(4)
# [1] 2
sqrt(1:9)
# [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
# [9] 3.000000

log(100) # natural logarithm, unless a base is given


# [1] 4.60517
log10(100) # this is logarithm with base 10
# [1] 2
log2(100) # this is logarithm with base 2
# [1] 6.643856
log(100,5) # this is logarithm with base 5, which is the second parameter in log()
# [1] 2.861353

log(c(10,20,30,40))
# [1] 2.302585 2.995732 3.401197 3.688879

exp(4.60517) # close to 100, since 4.60517 is only a rounded value of log(100)


# [1] 99.99998
exp(log(100)) # no visible rounding error
# [1] 100
exp(seq(-2,2,0.4))
# [1] 0.1353353 0.2018965 0.3011942 0.4493290 0.6703200 1.0000000 1.4918247
# [8] 2.2255409 3.3201169 4.9530324 7.3890561

gamma(5) # equivalent to factorial(4)


# [1] 24
gamma(5.5) # equivalent to factorial(4.5)
# [1] 52.34278

x <- c(-3,-3.5,4,4.2)
floor(x)
# [1] -3 -4 4 4
ceiling(x)
# [1] -3 -3 4 5
as.integer(x)
# [1] -3 -3 4 4
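Two related functions are trunc(), which drops the fractional part like as.integer() but returns a numeric value, and round(), which rounds to a given number of decimal places. For probability calculations, choose(n, k) gives the binomial coefficient directly. A short sketch:

```r
x <- c(-3, -3.5, 4, 4.2)
trunc(x)          # drops the fractional part, the result stays numeric
# [1] -3 -3  4  4
round(2.567, 2)   # rounds to 2 decimal places
# [1] 2.57
choose(5, 2)      # number of ways to choose 2 out of 5
# [1] 10
factorial(5) / (factorial(2) * factorial(3))  # the same value via factorials
# [1] 10
```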

2 Coding Functions and Using Loops in R
2.1 Coding Functions in R
The possibility to write functions easily, and the very flexible way functions can be used in R, make
functions a main building block for using R in an optimal way. Even for R beginners, coding a function is
not more difficult than writing the code directly at the R prompt. But a function allows you to repeat the
same commands with different input values much more easily. This also helps with debugging and reusing
your code. Note that a function you write should (in our course MUST) have a sensible name and a short
description of what it does.
We use the following structure to create a function that is not already defined in R.

f <- function(p1=1,p2=2) {
# this function returns ... (a short explanation of what the function does)
# p1 ... explain what input 1 is
# p2 ... explain what input 2 is
res <- p1^2+p2 # make the calculations (placeholder, replace with your own)
# ... further calculations ...
res # or return(res)
}
xxx <- f(p1=3, p2=5) # calls the function and stores the result in the variable xxx
f(p1=2, p2=4) # calls the function and prints the result to the R prompt
Check out the following examples of simple functions to comprehend how to create functions in R.

# EXAMPLE 01
circle <- function(r=1) {
# calculates circumference and area of a circle and returns a vector holding them
# r ... radius length

cf <- 2*pi*r # evaluates the circumference


a <- pi*r^2 # evaluates the enclosed area
c(circumference=cf, area=a)
# or return(c(circumference=cf, area=a))
}

circle() # uses the default value r=1; should return 2*pi and pi


circle(r=3) # or circle(3)
# circumference area
# 18.84956 28.27433
circle(r=1)
# circumference area
# 6.283185 3.141593

# EXAMPLE 02
triangle <- function(c1 = c(0,0), c2=c(1,0),c3=c(1,1)){
# function returns the perimeter and the area of a triangle
# Check "www.mathopenref.com/coordtriangleareabox.html" for the explanation
# c1 ... coordinate of 1st corner (must be a vector of length 2)
# c2 ... coordinate of 2nd corner (must be a vector of length 2)
# c3 ... coordinate of 3rd corner (must be a vector of length 2)
if(length(c1)!=2 || length(c2)!=2 || length(c3)!=2){
return("function triangle(): ERROR!!!!! inappropriate INPUT")
}
# evaluating the perimeter
ab <- sqrt((c1[1]-c2[1])^2+(c1[2]-c2[2])^2)
bc <- sqrt((c3[1]-c2[1])^2+(c3[2]-c2[2])^2)
ac <- sqrt((c1[1]-c3[1])^2+(c1[2]-c3[2])^2)
pm <- ab+bc+ac
# evaluating the area
trab <- abs((c1[1]-c2[1])*(c1[2]-c2[2]))/2
trbc <- abs((c3[1]-c2[1])*(c3[2]-c2[2]))/2
trac <- abs((c1[1]-c3[1])*(c1[2]-c3[2]))/2

maxxy <- pmax(c1,c2,c3)


minxy <- pmin(c1,c2,c3)

sqa <- min(max((c1[1]-minxy[1])*(c1[2]-minxy[2]),0),max((maxxy[1]-c1[1])*(maxxy[2]-c1[2]),0))


sqb <- min(max((c2[1]-minxy[1])*(c2[2]-minxy[2]),0),max((maxxy[1]-c2[1])*(maxxy[2]-c2[2]),0))
sqc <- min(max((c3[1]-minxy[1])*(c3[2]-minxy[2]),0),max((maxxy[1]-c3[1])*(maxxy[2]-c3[2]),0))
area <- (maxxy[1]-minxy[1])*(maxxy[2]-minxy[2])-trab-trbc-trac-sqa-sqb-sqc
c(perimeter=pm,area=area)
}
res<-triangle() # calls the function using the default values of the input variables
res
# perimeter area
# 3.414214 0.500000
# this simple default triangle c1 = c(0,0), c2=c(1,0), c3=c(1,1) is very good for debugging:
res[1]-(2+sqrt(2)) # we check the correctness of the perimeter
# perimeter
# 0
res[2]-0.5 # we check the correctness of the area
# area
# 0

coora <- c(23,18); coorb <- c(13,34); coorc <- c(50,5)


triangle(c1=coora,c2=coorb,c3=coorc)
# perimeter area
# 95.84525 151.00000

coora <- c(10,18); coorb <- c(13,34); coorc <- c(50,5)


triangle(c1=coora,c2=coorb,c3=coorc)
# perimeter area
# 105.3489 339.5000
coora <- c(3,5); coorb <- c(9,15); coorc <- c(6,10)
triangle(coora,coorb,coorc)
# perimeter area
# 23.32381 0.00000

Remember the ordering cost problem in Section 1.2. We will create a function that recomputes the list of
acceptable orders when the unit costs or fixed costs change. In this function we also assign default values
to the input parameters. So, whenever a parameter is not given in the function call, R uses its default
value.

# EXAMPLE 03
orderingcostlist <- function(uc=7, duc=6.5, ucc=40, fc=50, dfc=15,fcc=45, tcub=318){
# returns the list of the ordering costs that do not exceed tcub
# uc ... regular unit cost
# duc ... discounted unit cost
# ucc ... minimum order amount to get the discounted unit cost
# fc ... regular fixed cost
# dfc ... discounted fixed cost
# fcc ... maximum order amount with the regular fixed cost
# tcub ... total cost upper bound

units <- 30:50


marginalcost <- uc*units*(units<ucc)+duc*units*(units>=ucc)
fixedcost <- fc*(units<=fcc)+dfc*(units>fcc)
totalcost <- fixedcost+marginalcost
res <- totalcost[totalcost<=tcub]
names(res) <- units[totalcost<=tcub]
res
}

orderingcostlist() # yields the same results as before


# 30 31 32 33 34 35 36 37 38 40 41 46
# 260.0 267.0 274.0 281.0 288.0 295.0 302.0 309.0 316.0 310.0 316.5 314.0

orderingcostlist(fc=55,duc=6.3) # we just change two parameter values


# 30 31 32 33 34 35 36 37 40 41 46 47 48
# 265.0 272.0 279.0 286.0 293.0 300.0 307.0 314.0 307.0 313.3 304.8 311.1 317.4

As a last example, and to show the construction of an if-else statement in R, we implement the
following function:

$$
f(x) = \begin{cases}
x^2 & x < -2 \\
x + 6 & -2 \le x < 0 \\
-x + 6 & 0 \le x < 4 \\
\sqrt{x} & x \ge 4
\end{cases}
$$

# EXAMPLE 04
f <- function(x=0.5){
# implements the function f(x)
# x must be a single number (a vector of length 1)
if(x<(-2)){ x^2
}else if(x<0){ x+6
}else if(x<4){ -x+6
} else{ sqrt(x)
}
}

c(f(-4),f(-1),f(3),f(9))
# [1] 16 5 3 3
f(c(5,0)) # warning message and wrong results (f(0) should be 6)
# [1] 2.236068 0.000000
# Warning messages:1: In if (x < (-2)) { :
# the condition has length > 1 and only the first element will be used

The warning message above shows that the condition of an if statement must be a single logical value.
If a vector of logical values is given, only its first element is used, which gives wrong results. (Since
R 4.2.0, such a condition causes an error instead of a warning.)
To implement the above function for a vector x, we have to use the function ifelse() instead of the
if statement.
ifelse() is defined by:

Usage:
ifelse(test, yes, no)
test .... a vector of logical expressions
yes .... vector whose values are returned for true elements of `test'.
no .... vector whose values are returned for false elements of `test'.
A better implementation of the function f() uses ifelse():

# EXAMPLE 04 better (vectorized) code


myf <- function(x=0:50){
# implements the function f(x)
# x is a vector
ifelse(x<(-2), x^2,
ifelse(x<0, x+6,
ifelse(x<4, -x+6,
sqrt(abs(x)) ) ) ) # abs() avoids "NaNs produced" warnings, as ifelse() evaluates sqrt() for all elements
}

myf(c(-4,-1,3,9))
# [1] 16 5 3 3
c(myf(-4),myf(-1),myf(3),myf(9))
# [1] 16 5 3 3
myf(c(5,0))
#[1] 2.236068 6.000000

Note: It is possible to implement the function f for a vector x also with a for loop, but using ifelse() is
clearly faster and, once you are used to the vectorized coding possibilities of R, not more difficult.
Note that you can also pass predefined functions as parameters. You will see an example of this
in Section 2.2.
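For comparison, the loop version mentioned in the note can be sketched as follows. The names f_scalar and f_loop are hypothetical; f_scalar is the scalar function of EXAMPLE 04, and sapply() is shown as a loop-free way of applying it element by element:

```r
# scalar version of f(x) from EXAMPLE 04 (hypothetical name f_scalar)
f_scalar <- function(x) {
  if (x < (-2)) {
    x^2
  } else if (x < 0) {
    x + 6
  } else if (x < 4) {
    -x + 6
  } else {
    sqrt(x)
  }
}

# loop version for a vector input (hypothetical name f_loop)
f_loop <- function(x) {
  res <- numeric(length(x))     # allocate the result before the loop
  for (i in seq_along(x)) {
    res[i] <- f_scalar(x[i])    # apply the scalar function element by element
  }
  res
}

f_loop(c(-4, -1, 3, 9, 5, 0))
# [1] 16.000000  5.000000  3.000000  3.000000  2.236068  6.000000
sapply(c(-4, -1, 3, 9, 5, 0), f_scalar)  # same result without an explicit loop
```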

2.2 Defining Loops in R
A basic structure for a loop with a known number of repetitions is:

res<-0
for(i in 1:10){ # i takes the values 1, 2, ..., 10, one in each repetition
# do the required operations depending on i, e.g.:
res <- res + i^2
}
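The loop above sums the squares 1^2 + ... + 10^2. Running it and comparing the result with a vectorized version is a simple way to check a loop:

```r
res <- 0
for (i in 1:10) {
  res <- res + i^2   # add the square of the current index
}
res
# [1] 385
sum((1:10)^2)        # the same value computed without a loop
# [1] 385
```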
You can do every vector operation with a for loop. But in R, loops take much longer to execute than
in C. Thus, it is better to use vectorized operations whenever possible. The following example estimates
the expectation of the maximum of two standard uniform random variates, Y = max{U1, U2}, which is
actually equal to 2/3. We will first not use the pmax() function; instead, we will define a for loop.
This is our first Monte Carlo simulation in this document.²

simmax2unif <- function(n){


# simulate the average of the maximum of two U(0,1) uniform random variates
y <- numeric(n)
# in order to record the output of our simulation in "y"
# we should define it before the for-loop
for(i in 1:n){ # i will take integer values from 1 to n
u1 <- runif(1)
u2 <- runif(1)
y[i] <- max(u1,u2) # record the i-th simulated maximum
}
mean(y)
}

simmax2unif(100000) # a random estimate of 2/3; your value will differ slightly
# [1] 0.665354266
system.time(x <- simmax2unif(100000)) # execution time in seconds
# user system elapsed
# 35.30 0.08 35.43

# Do the same simulation with pmax()


simmax2unif_2 <- function(n){
u1 <- runif(n)
u2 <- runif(n)
y <- pmax(u1,u2)
res <- mean(y)
names(res) <- c("exp.estim.")
res
}

simmax2unif_2(1000000)
# exp.estim.
# 0.6665182787
system.time(x <- simmax2unif_2(100000)) # execution time in seconds
# user system elapsed
# 0.03 0.00 0.03
As you can see, vectorized operations are much faster than loops. Still, in some circumstances,
loops might be the only way to implement your algorithm.
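Simulation results change from run to run. To make them reproducible, for example while debugging, you can fix the seed of the random number generator with set.seed(). A sketch using a compact variant of simmax2unif_2; the name sim_max2unif and the seed 255 are arbitrary choices for illustration:

```r
# compact variant of simmax2unif_2 (hypothetical name sim_max2unif)
sim_max2unif <- function(n) {
  mean(pmax(runif(n), runif(n)))
}
set.seed(255)                 # 255 is an arbitrary seed
est1 <- sim_max2unif(100000)
set.seed(255)                 # the same seed again
est2 <- sim_max2unif(100000)
identical(est1, est2)         # same seed, same random numbers, same estimate
# [1] TRUE
est1 - 2/3                    # simulation error, small for large n
```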

² You may not be familiar with random variates or Monte Carlo simulation. These will be taught in IE255. When you
learn more about random variates and Monte Carlo simulation, you can come back here and study these examples.

While loops are especially useful for convergence algorithms. For loops with an undetermined
number of repetitions we use a while loop, which is defined by:

while(condition){ # as long as the condition is satisfied, run the loop


# do the required operations
}
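As a small while-loop example, the sketch below counts how often an interval of length 1 must be halved, as in the bisection method that follows, until its length is at most 1e-8. The count agrees with the 27 approximations that findroot() prints with trace=TRUE for the interval c(1,2):

```r
err <- 1              # length of the starting interval, e.g. c(1,2)
counter <- 0
while (err > 1e-8) {  # repeat as long as the interval is too long
  err <- err / 2      # each bisection step halves the interval
  counter <- counter + 1
}
counter
# [1] 27
```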
Here is a basic root-finding algorithm that uses a while loop:

findroot <- function(f=function(x) x^2-1, interval=c(0.2,3), errbound=1.e-8, trace=FALSE){


# root finding with the bisection method (see https://en.wikipedia.org/wiki/Bisection_method)
# finds the unique real root of a continuous function in an interval and returns its value
# f ... the function whose root (zero) should be found
# interval ... an interval containing a single root
# errbound ... maximal approximation error accepted
# trace ... if TRUE, the sequence of approximating values is printed
a <- interval[1]
b <- interval[2]
if(f(a)*f(b)>0){
return("findroot(): ERROR starting values have the same sign!!!")
}else{
counter <- 0
res <- 0
err <- abs(a-b)
while(err>errbound){
c <- (a+b)/2
fc <- f(c)
if(f(a)*fc>0){
a <- c
}else{
b <- c
}
err <- abs(a-b)
counter <- counter+1
res[counter] <- a
}
if(trace){
print(res)
}
}
res[length(res)]
}

func <- function(x){x^2-2}


int <- c(1,2)
roo<-findroot(func,int)
roo
#[1] 1.414214
findroot(func,int,trace=TRUE)
# [1] 1.000000 1.250000 1.375000 1.375000 1.406250 1.406250 1.414062 1.414062
# [9] 1.414062 1.414062 1.414062 1.414062 1.414185 1.414185 1.414185 1.414200
# [17] 1.414207 1.414211 1.414213 1.414213 1.414213 1.414213 1.414214 1.414214
# [25] 1.414214 1.414214 1.414214
# [1] 1.414214
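Base R already contains a root finder, uniroot(), which uses Brent's method rather than plain bisection. It can serve as a cross-check for findroot(); the tolerance tol=1e-10 below is an arbitrary choice:

```r
func <- function(x){x^2-2}
r <- uniroot(func, interval = c(1, 2), tol = 1e-10)
r$root            # approximately sqrt(2) = 1.414214
r$root - sqrt(2)  # approximation error
```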

3 Drawing Plot Diagrams and Histograms in R
We would like to draw the density function of the standard normal distribution on the
interval (-4,4). To do so, we create a vector of many x-values and a second vector holding the
corresponding function values.

x <- seq(-4,4,length.out=51) # these are not enough points


y <- dnorm(x)
plot(x,y) # plots with dots (figure 1)

windows() # opens a new plot window (Windows only; dev.new() works on all platforms)
plot(x,y,type="l") # connects the same dots (figure 2)

x <- seq(-4,4,length.out=10001) # these are enough points


y <- dnorm(x)
windows()
plot(x,y,type="l") # connects more dense dots (figure 3)
The resulting plots are given in Figure 1.
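For a function given by a formula, curve() evaluates it on a grid and draws it in one call. A sketch that should give a plot like the last one above (n = 1001 grid points is an arbitrary choice):

```r
res <- curve(dnorm, from = -4, to = 4, n = 1001)  # draws the standard normal density
# curve() also invisibly returns the grid (res$x) and the function values (res$y)
length(res$x)
# [1] 1001
```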

Now we show how to draw a histogram of a vector in R with hist(). Histograms are useful tools for
seeing the distribution of given data. You can often obtain a more informative histogram by changing the
breaks parameter (R treats the number given there as a suggestion).

x <- rnorm(100000,3,1.5)
# a vector of normal RVs with mean 3 and std. dev. 1.5

hist(x)

windows()
hist(x,breaks=50)

windows()
hist(x,breaks=100)
The histograms are presented in Figure 2. You can also add new lines and functions to a plot diagram
or a histogram that is already displayed.³ We use the lines() command, which is similar to the plot()
command; this time, there is no need for a type parameter. You can also add straight lines to existing
diagrams with the abline() command. Check out the following examples.

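h <- hist(x,breaks=100)
y <- seq(-5,10,length.out=100001)
binwidth <- diff(h$breaks)[1] # width of one histogram bin
lines(y,dnorm(y,3,1.5)*length(x)*binwidth) # density scaled to the expected count per bin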

y <- seq(-5,10,length.out=101)
windows()
plot(y,dnorm(y,3,1.5))
lines(y,dnorm(y,3,1.5))

windows()
plot(y,dnorm(y,3,1.5),type="l")
abline(v=4.5) # add a "v"ertical line on x=4.5
abline(v=1.5) # add a "v"ertical line on x=1.5
abline(h=dnorm(1.5,3,1.5)) # add a "h"orizontal line on y=dnorm(1.5,3,1.5)
abline(a=0.10,b=0.01) # add a line with slope=0.01 and intercept=0.10
The diagrams are given in Figure 3.
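An alternative to rescaling the density to counts is to draw the histogram on the density scale with freq = FALSE, so that the bar areas sum to 1 and dnorm() can be overlaid directly. A minimal sketch; the seed and the colour are arbitrary choices.

```r
set.seed(255)
x <- rnorm(100000, 3, 1.5)

# freq = FALSE draws densities instead of counts, so the bar areas sum to 1
h <- hist(x, breaks = 100, freq = FALSE)

# The theoretical density can then be overlaid without any rescaling
y <- seq(-5, 10, length.out = 1001)
lines(y, dnorm(y, 3, 1.5), col = "red", lwd = 2)
```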
3 One can use the points() command to add new points to a plot diagram. For more details, see the following link:
http://stat.ethz.ch/R-manual/R-patched/library/graphics/html/points.html

Figure 1: Plots of the density function of the standard normal distribution

Figure 2: Histograms of a vector of normal RVs with mean 3 and standard deviation 1.5

Figure 3: Adding lines to existing diagrams with lines() (1-2) and abline() (3) commands

4 Basic User Information
4.1 Scanning and Printing Data
Assume that you have data (see footnote 4) written in a text file in the following format.

3 25 94.9 12
547 32556 56
89 567
435 342.1
76.5 983.2
0 343
# There are 15 real values
You can use the command scan() to store this data in a vector by scanning it from left to right and from top to bottom. Spaces and line breaks separate the values, and each value is stored in its own index.

x <- scan()
# press enter after writing this line, it will display "1:" on the command line
# Press CTRL+V to paste the copied data, 15 real values will be stored in x
# it will display "16:" on the command line
# Press Enter on the empty line to finish scanning; nothing is stored for index 16

# 1: 3 25 94.9 12
# 5: 547 32556 56
# 8: 89 567
# 10: 435 342.1
# 12: 76.5 983.2
# 14: 0 343
# 16:
# Read 15 items

x
# [1] 3.0 25.0 94.9 12.0 547.0 32556.0 56.0 89.0 567.0
# [10] 435.0 342.1 76.5 983.2 0.0 343.0
You can also scan a column of cells copied from an Excel sheet, but not rows. Be careful: by default R expects the period (.) as the decimal separator, so values written with a comma, such as 1,5, will not be read correctly.
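For quick experiments without copying and pasting, scan() also accepts a character string through its text argument, and its dec argument handles data that use a comma as the decimal separator. A small sketch with made-up input strings:

```r
# scan() can read from a character string via its text argument
x <- scan(text = "3 25 94.9 12\n547 32556 56", quiet = TRUE)
x        # 7 numeric values, read left to right, top to bottom

# dec = "," tells scan() that the data uses a comma as decimal separator
y <- scan(text = "1,5 2,25 10", dec = ",", quiet = TRUE)
y        # the values 1.5, 2.25 and 10
```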
You can also read tables from a text (*.txt) file. Assume you have a text file containing data in the following format:

length weight age
1.72 72.3 25
1.69 85.3 23
1.80 75.0 26
1.61 66 23
1.73 69 24
# 3 values in each row, separated by spaces.
# The decimal separator should be "."
In the R interface, click the File tab and then Change dir... to see your working directory (see footnote 5). Copy your text file into that directory. Suppose it is named data.txt. Then run the following command:

4 Such data should contain only numbers.

5 You can also change your working directory from here.

x <- read.table(file="data.txt",header=TRUE)
# if you do not have any headers in your data, choose header as FALSE
x # press enter to display x table
# length weight age
# 1 1.72 72.3 25
# 2 1.69 85.3 23
# 3 1.80 75.0 26
# 4 1.61 66.0 23
# 5 1.73 69.0 24
x$length
# [1] 1.72 1.69 1.80 1.61 1.73
x$weight
# [1] 72.3 85.3 75.0 66.0 69.0
x$age
# [1] 25 23 26 23 24
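If you want to try read.table() without creating data.txt by hand, the sketch below writes the same table to a temporary file first (tempfile() stands in for data.txt; the name tmp is hypothetical). It also shows getwd(), which reports the working directory where a plain file name like "data.txt" is looked up.

```r
# Write the example table to a temporary file (a stand-in for data.txt)
tmp <- tempfile(fileext = ".txt")
writeLines(c("length weight age",
             "1.72 72.3 25",
             "1.69 85.3 23",
             "1.80 75.0 26",
             "1.61 66 23",
             "1.73 69 24"), tmp)

getwd()   # the folder where read.table(file = "data.txt") would look

x <- read.table(file = tmp, header = TRUE)
dim(x)          # 5 rows, 3 columns
mean(x$age)     # 24.2
colMeans(x)     # column-wise means of all three variables
```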

You can print a message or an object (see footnote 6) within a function by using the print() command. To print a message, do not forget to put it in quotation marks.

print("error")
# [1] "error"
x <- 1:5
print(x)
# [1] 1 2 3 4 5
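print() matters mostly inside loops and functions, where R does not print values automatically. A minimal sketch; the helper report() is hypothetical, and cat() is shown as an alternative for plain-text messages.

```r
# At the console a value is printed automatically, but inside a loop
# or a function it is not, so print() (or cat()) is needed
for (i in 1:3) i          # prints nothing
for (i in 1:3) print(i)   # prints [1] 1, [1] 2, [1] 3 on separate lines

# report() is a hypothetical helper, not part of base R;
# cat() writes text without the [1] index and without quotation marks
report <- function(v) {
  cat(sprintf("mean of v: %.2f\n", mean(v)))
  invisible(v)
}
report(1:5)   # mean of v: 3.00
```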

4.2 Session Management


You can find detailed information about the functions that come predefined with R: the parameters (arguments) available in each function and a few examples of its use. Just write ? followed by the name of the function you want to learn about. Check out the explanations given in R for the following functions.

?det
?sample
?sin
?cbind
You can use apropos() to find a list of all objects (mostly functions) whose names contain a specific word. These can come with the default library or can be defined by you in the current R session.

apropos("norm")
# [1] "dlnorm" "dnorm" "normalizePath" "plnorm"
# [5] "pnorm" "qlnorm" "qnorm" "qqnorm"
# [9] "qqnorm.default" "rlnorm" "rnorm"

6 Objects can be vectors, matrices, arrays, functions, lists (lists are similar to structures in C), tables etc.

apropos("exp")
# [1] ".__C__expression" ".expand_R_libs_env_var" ".Export"
# [4] ".mergeExportMethods" ".standard_regexps" "as.expression"
# [7] "as.expression.default" "char.expand" "dexp"
# [10] "exp" "expand.grid" "expand.model.frame"
# [13] "expm1" "expression" "getExportedValue"
# [16] "getNamespaceExports" "gregexpr" "is.expression"
# [19] "namespaceExport" "path.expand" "pexp"
# [22] "qexp" "regexpr" "rexp"
# [25] "SSbiexp" "USPersonalExpenditure"
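apropos() interprets its argument as a regular expression, so the search can be narrowed with anchors such as ^ (start of name) and $ (end of name). A few illustrative queries:

```r
# ^ and $ anchor the pattern to the start and end of the name
apropos("^qnorm$")                     # exactly "qnorm"
apropos("^r(norm|exp|unif)$")          # three random-variate generators
apropos("norm", mode = "function")     # only objects that are functions

# exists() checks whether one particular name is defined
exists("dnorm")   # TRUE
```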

If you need to see all the objects that you have created in your work session, simply write objects().
objects()
# [1] "a" "b" "circle" "coora"
# [5] "coorb" "coorc" "error" "f"
# [9] "findroot" "fixedcost" "func" "int"
# [13] "lbound" "marginalcost" "n" "orderingcostlist"
# [17] "res" "simmax2unif" "simmax2unif_2" "totalcost"
# [21] "triangle" "ubound" "units" "vec"
# [25] "x" "xest" "xinv" "y"
# [29] "y1" "y2" "y3" "y4"
# [33] "y5" "y6" "z"
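ls() returns the same list as objects(), and rm() removes objects from the session. A small sketch with throwaway objects a and b:

```r
a <- 1
b <- c(2, 3)
ls()                 # same listing as objects()
rm(b)                # remove a single object from the session
exists("b")          # FALSE
# rm(list = ls())    # would remove every object in the workspace; use with care
```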
You can save your R session, together with the objects that you have created, by clicking File and then Save Workspace in the quick access bar. You can reopen a saved workspace later by double-clicking the saved file.
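The same can be done from code with save.image() (whole workspace) or save() (selected objects), and load() restores a saved file. A sketch; the temporary file names stand in for a file such as "mysession.RData" that you would normally choose yourself.

```r
x <- 1:5
y <- "hello"

# save.image() writes every object in the workspace, like File > Save Workspace
ws <- tempfile(fileext = ".RData")
save.image(file = ws)

# save() stores only the named objects
sel <- tempfile(fileext = ".RData")
save(x, y, file = sel)

rm(x, y)
load(sel)        # restores x and y into the current session
x                # [1] 1 2 3 4 5
```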

APPENDIX: Some Probability and Statistics Functions in R
4.3 Probability Functions in R
For each of the well-known distributions commonly used in probability theory and statistics, R provides four related functions. Let us first define these functions for the normal distribution and then list the other probability distributions available in R.

• dnorm(x,y,z): returns the pdf (probability density function) value of x in a normal distribution
with mean y and standard deviation z.

• pnorm(x,y,z): returns the cdf (cumulative distribution function) value of x in a normal distribution
with mean y and standard deviation z.

• qnorm(x,y,z): returns the inverse cdf value (quantile) of x in a normal distribution with mean y and standard
deviation z. Clearly x must be in the unit interval (x ∈ [0, 1]).

• rnorm(x,y,z): returns a vector of length x of random variates (RVs) that follow a normal
distribution with mean y and standard deviation z.

Check out the following examples about normal distribution:

dnorm(0.5) # if no parameters are given, R assumes the standard normal distribution
# [1] 0.3520653
dnorm(0,2,1)
# [1] 0.05399097
dnorm(3,3,5)
# [1] 0.07978846

pnorm(0) # the area under the curve
# to the left of "0" in a std. normal distribution
# [1] 0.5
pnorm(2)
# [1] 0.9772499
pnorm(5,3,1)
# [1] 0.9772499

# the following are the inverses of the previous pnorm() calls
qnorm(0.5)
# [1] 0
qnorm(0.9772499)
# [1] 2.000001
qnorm(0.9772499,3,1)
# [1] 5.000001

rnorm(20,2,1) # generates 20 RVs that follow a normal dist.
# with mean 2 and std. dev. 1
# [1] 2.31502453 0.37445729 2.04994863 1.89381118 0.63099383 1.50837615
# [7] 0.57363369 2.84601422 2.54003868 3.43652548 0.88941281 3.36373629
# [13] 0.58945290 2.44678124 -0.05360271 2.73920472 2.73643684 1.79465998
# [19] 1.30906099 2.18648566
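The four functions are closely linked: qnorm() inverts pnorm(), and pnorm() is the area under dnorm() to the left of a point. A short sketch that checks this numerically (the value 1.3 is arbitrary):

```r
# qnorm() undoes pnorm(): value -> probability -> value
qnorm(pnorm(1.3, 3, 1.5), 3, 1.5)    # 1.3 (up to rounding)

# pnorm() is the area under dnorm() to the left of a point,
# which integrate() can confirm numerically
integrate(dnorm, lower = -Inf, upper = 2)$value   # about 0.9772499
pnorm(2)                                           # 0.9772499

# arguments can also be named, which makes calls easier to read
dnorm(3, mean = 3, sd = 5)    # same as dnorm(3,3,5)
```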

Here is a list of useful distributions that are available for computation in R. There are also other distributions available in R that are not in this list. (For each distribution below, you obtain the cdf function by changing the initial letter d to p, the inverse cdf by changing it to q and the random variate generator by changing it to r.) Apart from the normal distribution, please try to practice and learn the d, p, q, r functions of the first nine distributions in this list (see footnote 7).

• dpois(x,y) : returns the pmf (probability mass function) value of x in a Poisson distribution with
mean (rate) y.
• dbinom(x,y,z) : returns the pmf value of x in a binomial distribution with number of trials y and
success probability z.
• dgeom(x,y) : returns the pmf value of x in a geometric distribution with success probability y. Here x counts the failures before the first success.
• dunif(x,y,z) : returns the pdf value of x in a uniform distribution with lower bound y and upper
bound z.

• dexp(x,y) : returns the pdf value of x in an exponential distribution with rate parameter y.
• dgamma(x,y,scale=z) : returns the pdf value of x in a gamma distribution with shape parameter
y and scale parameter z. (If you do not write scale in the parameter definition, z is taken as the
rate parameter, which is equal to 1/scale.)

• dchisq(x,y,z) : returns the pdf value of x in a chi-square distribution with degrees of freedom y
and the non-centrality parameter z.
• dt(x,y,z) : returns the pdf value of x in a t-distribution with degrees of freedom y and the
non-centrality parameter z.
• df(x,y,z,a) : returns the pdf value of x in an F-distribution with numerator degrees of freedom y,
denominator degrees of freedom z and the non-centrality parameter a.
• dcauchy(x,y,z) : returns the pdf value of x in a Cauchy distribution with a location parameter y
and scale parameter z.
• dnbinom(x,y,z) : returns the pmf value of x in a negative binomial distribution with target number
of successes (dispersion parameter) y and success probability z; x counts the failures before the y-th success.

• dhyper(x,y,z,a) : returns the pmf value of x (number of white balls drawn) in a hypergeometric
distribution with white population size y, black population size z, and a balls drawn
from the whole population.
• dlnorm(x,y,z) : returns the pdf value of x in a log-normal distribution with log-mean y and
log-standard deviation z.
• dbeta(x,y,z) : returns the pdf value of x in a beta distribution with shape-1 parameter y and
shape-2 parameter z.
• dlogis(x,y,z) : returns the pdf value of x in a logistic distribution with a location parameter y
and scale parameter z.
• dweibull(x,y,z) : returns the pdf value of x in a Weibull distribution with a shape parameter y
and scale parameter z.
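The d, p, q, r pattern works the same way for the other distributions. A sketch with arbitrary parameter values that checks a few of the facts stated in the list above:

```r
# pmf values of a discrete distribution sum to 1 over its support
sum(dbinom(0:10, size = 10, prob = 0.3))          # 1

# the cdf is the running sum of the pmf
pbinom(3, size = 10, prob = 0.3)
sum(dbinom(0:3, size = 10, prob = 0.3))           # same value

# R's geometric distribution counts failures before the first success,
# so dgeom(0, p) is the probability of success on the very first trial
dgeom(0, prob = 0.3)                              # 0.3

# gamma: scale = 2 is the same as rate = 1/2
dgamma(2, shape = 3, scale = 2)
dgamma(2, shape = 3, rate = 0.5)                  # same value

# random variates: for a Poisson distribution the mean equals lambda
set.seed(1)
m <- mean(rpois(10000, lambda = 4))
m                                                 # close to 4
```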
7 For BDA 521 students, it is sufficient to practice and learn the first five distributions in the list. For advanced
applications, it is also important to learn the gamma, chi-squared, Student's t, and log-normal distributions.

4.4 Statistical Functions in R
You can find the mean of a vector with the function mean(), its standard deviation with sd(), its variance with var() and its median with median(). The function summary() additionally reports the 25% and 75% quantiles (together with the median, these are called the quartiles).

x <- rnorm(1000000,5,2) # x is a vector of 1000000 RVs
# which follow a normal dist. with mean 5 and std. dev. 2

mean(x)
# [1] 4.997776
sd(x)
# [1] 2.000817
var(x)
# [1] 4.003268
median(x)
# [1] 4.997408
summary(x)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -4.904 3.650 4.997 4.998 6.346 14.420
summary(x,digits=6)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -4.90360 3.65020 4.99741 4.99778 6.34564 14.42310
quantile(x) # this command yields the quartiles also
# 0% 25% 50% 75% 100%
# -4.903599 3.650201 4.997408 6.345639 14.423129

# the quartiles can also be approximated by sorting the vector
# (quantile() interpolates, so the last digits may differ slightly)
sort(x)[1000000*0.25]
# [1] 3.650189
sort(x)[1000000*0.5]
# [1] 4.997408
sort(x)[1000000*0.75]
# [1] 6.345639
Of course, when you try this sequence of commands, you will get different results, since rnorm() will produce RVs from a different seed.
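To make such results reproducible, fix the seed of the random number generator with set.seed() before generating RVs. A minimal sketch; the seed 2020 is arbitrary:

```r
# set.seed() fixes the starting state of the random number generator,
# so the same commands produce the same "random" numbers every time
set.seed(2020)
a <- rnorm(5, 5, 2)

set.seed(2020)
b <- rnorm(5, 5, 2)

identical(a, b)    # TRUE
```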
