R Tutorial IE255
October 3, 2020
Preface
This document contains a short introduction to the programming language R. With the help of the
examples given here, you should be able to acquire a basic knowledge of R that will help
you to use R and make the calculations and plots required in the probability course IE255. To
get comfortable with the language, we recommend that you try the applications presented in this
document yourself, step by step.
Note that this document explains only the basics. The names of the functions needed directly
for probability calculations, e.g. runif(), will be mentioned in the course documents. With the
command
?runif
you get the R documentation of that function. In the appendix of this document you can also find some of
the most important R commands used in probability and statistics. You will need them later in
IE255.
You can download the latest version of R from http://cran.r-project.org/. For Windows users,
click the Windows link, then the base link, and you will see the download link for the *.exe file.
Once you have installed R, you can use the Windows interface directly, or you can use RStudio, which is
preferred by many students. In any case we strongly recommend that you write and save all your code
in script files. In the Windows R interface, click File in the quick access bar, then New script, and
write your code inside this script. Once the code in your script file is complete, you can select
the whole script with Ctrl+A and then press Ctrl+R to run it in the R console.
You can always save your script files and open them again later by clicking File and Open script
in the quick access bar.
My thanks are due to my former PhD student İsmail Başoğlu from Maltepe University, who shared the
LaTeX sources of the R tutorials he has written. This is a version slightly changed for the special needs of
IE255.
Wolfgang Hörmann
1 R Works with Vectors
1.1 Creating Vectors
In order to assign a value to a specified variable (e.g. 3 to x), we do the following:
x <- 3
or
x = 3
We will use the operator <- in our future examples for assigning values (see footnote 1).
When we assign a number to a variable, R considers it a vector with a single element. So, by using
[.] next to the variable name, we can assign another element to any index we want. Finally, if we
want to see what is stored in a variable, we simply write its name and press enter.
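As a small illustration of index assignment (a sketch; the index 4 and the value 7 are arbitrary choices), assigning to a position beyond the current length extends the vector, and R fills the skipped positions with NA:

```r
x <- 3        # x is a vector of length 1
x[4] <- 7     # assign to index 4; indices 2 and 3 do not exist yet
x
# [1]  3 NA NA  7
length(x)
# [1] 4
```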
x <- 1:8
y <- 1:4
x
# [1] 1 2 3 4 5 6 7 8
y
# [1] 1 2 3 4
x+y # we can see the summation without storing them to any new variable
# [1] 2 4 6 8 6 8 10 12
In this summation y is repeated up to index 8 (since x is a vector of length 8). So the fifth element of
x is added to the first element of y, the sixth element of x is added to the second element
of y, and so forth. We might wonder what happens if the length of x is not a multiple of the
length of y. We can try it.
1 There are significant differences between the assignment operators, but they are not very important for
basic R programming. Still, you may be interested in a brief explanation of these differences at the link:
http://stat.ethz.ch/R-manual/R-patched/library/base/html/assignOps.html
x <- 1:8
y <- 1:3
x+y
# [1] 2 4 6 5 7 9 8 10
# Warning message:
# In x + y : longer object length is not a multiple of shorter object length
R again repeats the short vector until it reaches the length of the long vector. However, the last
repetition may not be complete. R returns a warning message about this, yet it executes the operation.
You should also know that you can make subtractions, multiplications, divisions, power and modular
arithmetic operations using the same principle. We will get into these operations in section 1.4.
We can also create vectors with specified values. For instance, let us create a vector of length 6 with
values 4, 8, 15, 16, 23, 42 and another vector of length 4 with values 521, 522, 523, 547. We use the function
c() to combine those values in a vector. We can also find the number of elements in a
vector with the length() command.
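A minimal sketch of this step, assuming the names x, y and z (z is chosen so that it agrees with the length(z) and rev(z) results printed next):

```r
x <- c(4, 8, 15, 16, 23, 42)   # vector of length 6
y <- c(521, 522, 523, 547)     # vector of length 4
z <- c(x, y)                   # c() also concatenates whole vectors
z
# [1]   4   8  15  16  23  42 521 522 523 547
```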
length(z)
# [1] 10
We can also reverse a vector, so that it runs from the last element to the first.
z <- rev(z) # we can use the same object to reassign that object
z
# [1] 547 523 522 521 42 23 16 15 8 4
Suppose we would like to create a vector of length 10, elements of which will all be equal to 5. We do
the following.
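One way to do this is the function rep(); the sketch below also shows seq() with the length.out parameter, which fixes the number of elements instead of the step size (the end points 2 and 3 and the length 5 are arbitrary choices):

```r
x <- rep(5, times = 10)          # repeat the value 5 ten times
x
# [1] 5 5 5 5 5 5 5 5 5 5
y <- seq(2, 3, length.out = 5)   # 5 equally spaced values from 2 to 3
y
# [1] 2.00 2.25 2.50 2.75 3.00
```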
If we are not interested in the length of the sequence but in the step size, we can use the by parameter
instead of length.out.
x <- seq(2,3,by=0.05)
x
# [1] 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50
# [12] 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00
• ==: equal to (do not forget that a single = symbol is used for assigning values)
In the following sequence of examples, we create a vector and use it in different logical expressions. If a
vector element satisfies the expression, the result is TRUE at the corresponding index, otherwise FALSE.
You can use & as "and" and | as "or" between logical expressions.
x <- 10:20
x
# [1] 10 11 12 13 14 15 16 17 18 19 20
x<17
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
x<=17
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
x>14
# [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
x>=14
# [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
x==16
# [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
x!=16
# [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
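The operators & and | combine such logical vectors element by element. A short sketch with the same vector (the names both and either are chosen only for this illustration):

```r
x <- 10:20
both <- x > 12 & x < 17     # TRUE only where both conditions hold
both
# [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
either <- x < 12 | x > 18   # TRUE where at least one condition holds
either
# [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
sum(both)                   # TRUE counts as 1, so this counts the matches
# [1] 4
```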
x <- 1:20
y <- (x>=8)*(x)
y
# [1] 0 0 0 0 0 0 0 8 9 10 11 12 13 14 15 16 17
#[18] 18 19 20
As a second example, we will evaluate the ordering costs of some goods. We can order at least
30 and at most 50 units of goods from our supplier in a single order. We have a fixed cost of $50 if we
order 45 units or fewer and $15 otherwise. A single unit costs $7 if we order fewer than 40
units and $6.5 otherwise. We want to evaluate the total ordering cost for each alternative.
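A sketch of this calculation, using logical expressions as 0/1 multipliers in the same way as in the previous example (the variable names units, fixedcost, marginalcost and totalcost are assumptions for this illustration):

```r
units <- 30:50                                          # all admissible order sizes
fixedcost <- 50 * (units <= 45) + 15 * (units > 45)     # fixed cost per order
marginalcost <- 7 * (units < 40) + 6.5 * (units >= 40)  # cost of a single unit
totalcost <- fixedcost + marginalcost * units
totalcost[units == 30]
# [1] 260
totalcost[units == 40]
# [1] 310
totalcost[units == 50]
# [1] 340
```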
x <- seq(5,8,by=0.3) # we will have 11 elements in this vector
x
# [1] 5.0 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7 8.0
length(x)
# [1] 11
y7 <- x[x<7] # extract the subvector of elements that are less than 7
y7
# [1] 5.0 5.3 5.6 5.9 6.2 6.5 6.8
x <- 1:5
x # a vector object
#[1] 1 2 3 4 5
y <- t(x)
y # a 1 x 5 matrix
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 3 4 5
As you can see, R displays a row vector in a completely different way. If we take the transpose
of y, we see how a column vector is displayed.
z <- t(t(x))
z # a 5x1 matrix
# [,1]
# [1,] 1
# [2,] 2
# [3,] 3
# [4,] 4
# [5,] 5
In order to create an m×n matrix in R, we first create a vector (let us name it vec) which
contains the columns of the matrix sequentially from the first to the last. Then we simply use the
function matrix(vec,nrow=m,ncol=n).
vec <- 1:12
x <- matrix(vec,nrow=3,ncol=4)
x
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
t(x)
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 4 5 6
# [3,] 7 8 9
# [4,] 10 11 12
It is also possible to assign the elements of a matrix row by row, using the argument byrow=TRUE.
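A minimal sketch of a row-wise fill with the byrow argument of matrix(), reusing the values 1:12 from the previous example:

```r
x <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)  # fill row by row
x
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    5    6    7    8
# [3,]    9   10   11   12
```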
x <- matrix(c(1,2,-1,1,2,1,2,-2,-1),nrow=3,ncol=3) # without byrow=TRUE the values fill the columns one after the other
x
# [,1] [,2] [,3]
# [1,] 1 1 2
# [2,] 2 2 -2
# [3,] -1 1 -1
x <- matrix(0,nrow=4,ncol=4)
x
# [,1] [,2] [,3] [,4]
# [1,] 0 0 0 0
# [2,] 0 0 0 0
# [3,] 0 0 0 0
# [4,] 0 0 0 0
You can learn about the number of columns, number of rows and the total number of elements in a
matrix or directly the dimension of a matrix by using the following functions.
x <- matrix(0,ncol=5,nrow=4)
ncol(x)
# [1] 5
nrow(x)
# [1] 4
length(x)
# [1] 20
dim(x) # number of rows, then number of columns
# [1] 4 5
Of course we can also do matrix multiplication in R using the %*% operator:
A <- matrix(1:4,ncol=2)
y <- 1:2
t(y)%*%A
# [,1] [,2]
#[1,] 5 11
A%*%t(t(y))
# [,1]
#[1,] 7
#[2,] 10
A%*%y # gives the same result as R knows that y must be a column vector here.
x <- 2*(1:5)
x
# [1] 2 4 6 8 10
y <- 1:5
y
# [1] 1 2 3 4 5
x+y
# [1] 3 6 9 12 15
x*y
# [1] 2 8 18 32 50
x/y
# [1] 2 2 2 2 2
x-y
# [1] 1 2 3 4 5
x^2 # makes a power operation
# [1] 4 16 36 64 100
x^y
# [1] 2 16 216 4096 100000
x%%3 # yields mod(3) of every element in x
# [1] 2 1 0 2 1
y <- 3:7
y
# [1] 3 4 5 6 7
x <- c(3,1,6,5,8,10,9,12,3)
min(x)
# [1] 1
max(x)
# [1] 12
sum(x)
# [1] 57
prod(x)
# [1] 2332800
You can compare two vectors componentwise using pmax() and pmin(), which return the
componentwise maximum and the componentwise minimum of the vectors. You can also sort a
vector with the function sort(), while the function order() yields the index sequence of the sorted vector.
Both functions sort values from minimum to maximum by default, but we can use the additional parameter
decreasing=TRUE to obtain an order from maximum to minimum.
x <- 1:10
y <- 10:1
z <- c(3,2,1,6,5,4,10,9,8,7)
b <- pmin(x,y,z)
b
# [1] 1 2 1 4 5 4 4 3 2 1
sort(b,decreasing=TRUE)
# [1] 5 4 4 4 3 2 2 1 1 1
order(b,decreasing=TRUE)
# [1] 5 4 6 7 8 2 9 1 3 10
R can also do matrix multiplications with the %*% operator. This operator should be handled carefully to
obtain correct results: be sure about the dimensions of your matrices. If one operand is a plain vector,
R treats it as a row or a column vector, whichever makes the dimensions conformable.
x <- matrix(1:6,ncol=2,nrow=3)
x
# [,1] [,2]
# [1,] 1 4
# [2,] 2 5
# [3,] 3 6
y <- matrix(1:4,ncol=2,nrow=2)
y
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
x%*%y
# [,1] [,2]
# [1,] 9 19
# [2,] 12 26
# [3,] 15 33
y%*%x
# Error in y %*% x : non-conformable arguments
x <- 1:3
y <- 3:1
t(x)%*%t(y)
# Error in t(x) %*% t(y) : non-conformable arguments
t(x)%*%y # same as the first operation but we have a correct notation now
# [,1]
# [1,] 10
x%*%t(y) # only this one returns an outer product
# [,1] [,2] [,3]
# [1,] 3 2 1
# [2,] 6 4 2
# [3,] 9 6 3
Given a vector of real values, you can obtain the vector of cumulative sums with the function cumsum()
and the vector of cumulative products with the function cumprod(). The function diff() gives you the
differences between consecutive elements of a vector.
x <- c(1,4,5,6,2,12)
y <- cumsum(x)
y
# [1] 1 5 10 16 18 30
# every index has the sum of the elements in x up to that index
z <- cumprod(x)
z
# [1] 1 4 20 120 240 2880
# every index has the product of the elements in x up to that index
diff(z)
# [1] 3 16 100 120 2640
You can evaluate the factorial of a non-negative number with factorial() and the absolute value of
any real number with abs(). You can take the square root of a non-negative real number with sqrt(), and
the natural logarithm of a positive real number with log(). You can compute the exponential function of a real
number with exp() and the gamma function of a positive real number with gamma(). For integer rounding,
floor() yields the largest integer which is less than or equal to the specified value and ceiling() yields
the smallest integer which is greater than or equal to the specified value. as.integer() yields only the
integer part of the specified value, i.e. it truncates towards zero.
factorial(3)
# [1] 6
factorial(1:6)
# [1] 1 2 6 24 120 720
abs(-4)
# [1] 4
abs(c(-3:3))
# [1] 3 2 1 0 1 2 3
sqrt(4)
# [1] 2
sqrt(1:9)
# [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
# [9] 3.000000
log(c(10,20,30,40))
# [1] 2.302585 2.995732 3.401197 3.688879
x <- c(-3,-3.5,4,4.2)
floor(x)
# [1] -3 -4 4 4
ceiling(x)
# [1] -3 -3 4 5
as.integer(x)
# [1] -3 -3 4 4
2 Coding Functions and Using Loops in R
2.1 Coding Functions in R
The possibility to write functions easily, and the very flexible way functions can be used in R, make
programming functions a main building block of using R effectively. Even for R beginners it is
not more difficult to code a function than to type the code directly at the R prompt. A function
also allows you to repeat the same commands with different input values much more easily, which
helps with debugging and reusing your code. Note that a function you code should (in our
course MUST) have a sensible name and a short description of what the function is doing.
We use the following structure in order to create a specific function which is not already defined in R.
f <- function(p1=1,p2=2) {
# this function returns the ... a short explanation of what the function does
# p1 ... explain what is input 1
# p2 ... explain what is input 2
res <- p1^2+p2-..... #make the calculations
....
res # or return(res)
}
xxx <- f(p1=3, p2=5) # calls the function and stores the result in the variable xxx
f(p1=2, p2=4) # calls the function and prints the result to the R prompt
Check out the following examples of simple functions to comprehend how to create functions in R.
# EXAMPLE 01
circle <- function(r=1) {
# calculates circumference and area of a circle and returns a vector holding them
# r ... radius length
c(2*pi*r, pi*r^2) # assumed body: circumference and area, as stated in the description above
}
# EXAMPLE 02
triangle <- function(c1 = c(0,0), c2=c(1,0),c3=c(1,1)){
# function returns the perimeter and the area of a triangle
# Check "www.mathopenref.com/coordtriangleareabox.html" for the explanation
# c1 ... coordinate of 1st corner (must be a vector of length 2)
# c2 ... coordinate of 2nd corner (must be a vector of length 2)
# c3 ... coordinate of 3rd corner (must be a vector of length 2)
if(length(c1)!=2 || length(c2)!=2 || length(c3)!=2){
return("function triangle(): ERROR!!!!! inappropriate INPUT")
}
# evaluating the perimeter
ab <- sqrt((c1[1]-c2[1])^2+(c1[2]-c2[2])^2)
bc <- sqrt((c3[1]-c2[1])^2+(c3[2]-c2[2])^2)
ac <- sqrt((c1[1]-c3[1])^2+(c1[2]-c3[2])^2)
pm <- ab+bc+ac
# evaluating the area
trab <- abs((c1[1]-c2[1])*(c1[2]-c2[2]))/2
trbc <- abs((c3[1]-c2[1])*(c3[2]-c2[2]))/2
trac <- abs((c1[1]-c3[1])*(c1[2]-c3[2]))/2
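For a variant that can be run as is, the sketch below computes the same two quantities. It does not use the box method referenced in the comments: the area is obtained from the shoelace (determinant) formula instead, invalid input raises an error with stop(), and the name triangle2 is hypothetical.

```r
triangle2 <- function(c1 = c(0,0), c2 = c(1,0), c3 = c(1,1)) {
  # returns the perimeter and the area of a triangle (sketch, shoelace formula)
  # c1, c2, c3 ... corner coordinates (each a vector of length 2)
  if (length(c1) != 2 || length(c2) != 2 || length(c3) != 2) {
    stop("triangle2(): each corner must be a vector of length 2")
  }
  side <- function(p, q) sqrt(sum((p - q)^2))   # Euclidean distance of two points
  pm <- side(c1, c2) + side(c2, c3) + side(c1, c3)
  area <- abs(c1[1]*(c2[2]-c3[2]) + c2[1]*(c3[2]-c1[2]) + c3[1]*(c1[2]-c2[2])) / 2
  c(perimeter = pm, area = area)
}
triangle2()
# perimeter      area
#  3.414214  0.500000
```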
Recall the ordering cost problem in section 1.2. We will create a function that yields the output
when unit costs and ordering costs change. In this function we will also assign default values to the
input parameters. So, whenever a parameter is not specified in the function call, R will use the default
value for this parameter.
# EXAMPLE 03
orderingcostlist <- function(uc=7, duc=6.5, ucc=40, fc=50, dfc=15,fcc=45, tcub=318){
# returns the list of the ordering costs
# uc=7, # regular unit cost
# duc=6.5, # discounted unit cost
# ucc=40, # minimum order amount to get discounted unit cost
# fc=50, # regular fixed cost
# dfc=15, # discounted fixed cost
# fcc=45, # maximum order amount with the regular fixed cost
# tcub=318 # total cost upper bound
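A self-contained sketch of the idea, under stated assumptions: the function name ordercost_sketch and the units argument are hypothetical, the cost rules are those of the ordering problem described earlier, and tcub is left out because its use is not described in the text.

```r
ordercost_sketch <- function(units = 30:50, uc = 7, duc = 6.5, ucc = 40,
                             fc = 50, dfc = 15, fcc = 45) {
  # returns the total ordering cost for every order size in units (sketch)
  fixedcost <- fc * (units <= fcc) + dfc * (units > fcc)
  unitcost  <- uc * (units < ucc) + duc * (units >= ucc)
  fixedcost + unitcost * units
}
ordercost_sketch(units = c(30, 40, 50))           # all cost parameters at their defaults
# [1] 260 310 340
ordercost_sketch(units = c(30, 40, 50), duc = 6)  # only duc changed, the rest stay default
# [1] 260 290 315
```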
In order to see the construction of an if-else statement in R, we will implement the following function as
a last example.
\[
f(x) =
\begin{cases}
x^2 & x < -2 \\
x + 6 & -2 \le x < 0 \\
-x + 6 & 0 \le x < 4 \\
\sqrt{x} & x \ge 4
\end{cases}
\]
# EXAMPLE 04
f <- function(x=0.5){
# implements the function f(x)
# x must be a single number (a vector of length 1)
if(x<(-2)){ x^2
}else if(x<0){ x+6
}else if(x<4){ -x+6
} else{ sqrt(x)
}
}
c(f(-4),f(-1),f(3),f(9))
# [1] 16 5 3 3
f(c(5,0)) # warning message and wrong results, as f(0) should be 6
# [1] 2.236068 0.000000
# Warning messages:1: In if (x < (-2)) { :
# the condition has length > 1 and only the first element will be used
The warning message above shows that the condition in the if-statement must be a single logical expression.
If a vector of logical expressions is given, we get that warning message and only the first element of
the logical expression is used, thus giving wrong results. (In R 4.2.0 and later, a condition of length
greater than one is an error instead of a warning.)
To implement the above function for a vector x we have to use the ifelse() function instead of the
if statement.
ifelse() is defined by:
Usage:
ifelse(test, yes, no)
test .... a vector of logical expressions
yes .... vector whose values are returned for true elements of `test'.
no .... vector whose values are returned for false elements of `test'.
A better implementation of the function f(), using ifelse(), is:
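A minimal sketch of such an implementation, assuming the name myf that is used in the calls that follow. Because ifelse() evaluates its yes and no arguments for the whole vector, sqrt() is applied to pmax(x, 0) here so that negative elements do not trigger NaN warnings; that safeguard is an addition of this sketch.

```r
myf <- function(x = 0.5) {
  # vectorized version of f(x); x may be a vector of any length
  ifelse(x < (-2), x^2,
         ifelse(x < 0, x + 6,
                ifelse(x < 4, -x + 6, sqrt(pmax(x, 0)))))
}
```

The nesting mirrors the if / else if chain of the scalar version: each inner ifelse() handles the elements for which all previous tests were FALSE.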
myf(c(-4,-1,3,9))
# [1] 16 5 3 3
c(myf(-4),myf(-1),myf(3),myf(9))
# [1] 16 5 3 3
myf(c(5,0))
#[1] 2.236068 6.000000
Note: It is possible to implement the function f for a vector x also using a for loop, but using ifelse() is
clearly faster and, once you are used to the vectorized coding possibilities of R, not more difficult.
Note that you can also pass predefined functions as parameters. You will see an example of this
in section 2.2.
2.2 Defining Loops in R
A basic structure for a loop with a known number of repetitions is:
res<-0
for(i in 1:10){ # i takes the values of the vector 1:10 one after the other
# do the required operations depending on the variable i, e.g.:
res <- res + i^2
}
You can do every vector operation with a for-loop. But in R, it takes longer to execute loops than
it does in C. Thus, it is better to use vectorized operations when possible. The following example estimates
the expectation of the maximum of two standard uniform random variates, Y = max{U1, U2}, which is
actually equal to 2/3. We will first not use the pmax() function. Instead, we will define a for-loop.
This is our first Monte Carlo simulation in this document (see footnote 2).
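A sketch of the two simulation functions called in the listing that follows, assuming the names simmax2unif and simmax2unif_2 and the output label exp.estim. from the printed results. The bodies are one plausible implementation, not necessarily the original one.

```r
simmax2unif <- function(n) {
  # estimates E[max(U1,U2)] with a for-loop over n repetitions
  y <- numeric(n)                # storage for the n simulated maxima
  for (i in 1:n) {
    u <- runif(2)                # two standard uniform variates
    y[i] <- max(u)
  }
  c(exp.estim. = mean(y))
}
simmax2unif_2 <- function(n) {
  # same estimate, vectorized with pmax() instead of a loop
  c(exp.estim. = mean(pmax(runif(n), runif(n))))
}
```

Running times depend on the machine and on how the loop is written, so they will differ from the timings printed in this section.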
simmax2unif(100000)
# exp.estim.
# 0.665354266
system.time(x <- simmax2unif(100000)) # execution time in seconds
# user system elapsed
# 35.30 0.08 35.43
simmax2unif_2(1000000)
# exp.estim.
# 0.6665182787
system.time(x <- simmax2unif_2(100000)) # execution time in seconds
# user system elapsed
# 0.03 0.00 0.03
As you can see, vectorized operations are much faster than loops. Still, under some circumstances,
loops might be the only option to implement your algorithms.
2 You may not be familiar with random variates or Monte Carlo simulation. These will be taught in IE255. When you
learn more about random variates and Monte Carlo simulation, you can come back here and study these examples.
While-loops are especially useful for convergence algorithms. For loops with an undetermined
number of repetitions we use a while-loop, which repeats its body as long as a given condition is TRUE.
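The general form is while(condition){ ... }. A minimal sketch (the example, Newton's iteration for the square root of 2, and the tolerance 1e-10 are illustrative choices):

```r
x <- 1                          # starting value
iter <- 0                       # counts the repetitions
while (abs(x^2 - 2) > 1e-10) {  # repeat until x^2 is close enough to 2
  x <- (x + 2/x) / 2            # Newton step for the equation x^2 - 2 = 0
  iter <- iter + 1
}
x
# [1] 1.414214
```

The number of repetitions is not fixed in advance; it depends on the starting value and on the tolerance.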
3 Drawing Plot Diagrams and Histograms in R
We would like to plot the density function of the standard normal distribution in the
interval (-4,4). We create a vector with many x-values and store the corresponding function values
in a second vector.
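A sketch of these two vectors and a first plot, assuming 101 equally spaced x-values (the number of points is an arbitrary choice):

```r
x <- seq(-4, 4, length.out = 101)  # 101 equally spaced x-values in [-4, 4]
y <- dnorm(x)                      # standard normal density at each x
plot(x, y)                         # draws the (x, y) pairs as separate points
```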
windows() # opens a new graphics window (Windows only; dev.new() works on every platform)
plot(x,y,type="l") # connects the same dots (figure 2)
Now we show how to draw a histogram of a vector in R with hist(). Histograms are handy
tools to see the distribution of given data. You can obtain a better histogram by changing the breaks
parameter.
x <- rnorm(100000,3,1.5)
# a vector of normal RVs with mean 3 and std. dev. 1.5
hist(x)
windows()
hist(x,breaks=50)
windows()
hist(x,breaks=100)
The histograms are presented in Figure 2. You can also add new lines and functions to a plot diagram
or a histogram which is already displayed (see footnote 3). We use the lines() command, which is similar
to the plot() command. This time, there is no need to add a type parameter. You can also add straight lines
to existing diagrams using the abline() command. Check out the following examples.
hist(x,breaks=100)
y <- seq(-5,10,length.out=100001)
lines(y,dnorm(y,3,1.5)*200000)
y <- seq(-5,10,length.out=101)
windows()
plot(y,dnorm(y,3,1.5))
lines(y,dnorm(y,3,1.5))
windows()
plot(y,dnorm(y,3,1.5),type="l")
abline(v=4.5) # add a "v"ertical line on x=4.5
abline(v=1.5) # add a "v"ertical line on x=1.5
abline(h=dnorm(1.5,3,1.5)) # add a "h"orizontal line on y=dnorm(1.5,3,1.5)
abline(a=0.10,b=0.01) # add a line with slope=0.01 and intercept=0.10
The diagrams are given in Figure 3.
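When a density curve is laid over a histogram, both must be on the same vertical scale. One way to avoid choosing a scaling factor by hand is to draw the histogram on the density scale with freq=FALSE. A sketch with a smaller sample (the seed and the sample size are arbitrary choices):

```r
set.seed(255)                            # arbitrary seed, for reproducibility
x <- rnorm(10000, 3, 1.5)
h <- hist(x, breaks = 50, freq = FALSE)  # bar areas now sum to 1
y <- seq(-5, 10, length.out = 301)
lines(y, dnorm(y, 3, 1.5))               # density curve on the same scale
sum(h$density * diff(h$breaks))          # total area of the bars
# [1] 1
```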
3 One can use the points() command to add new points to a plot diagram. For more details, see the following link:
http://stat.ethz.ch/R-manual/R-patched/library/graphics/html/points.html
Figure 1: Plots of the density function of the standard normal distribution
Figure 2: Histograms of a vector of normal RVs with mean 3 and standard deviation 1.5
Figure 3: Adding lines to existing diagrams with lines() (1-2) and abline() (3) commands
4 Basic User Information
4.1 Scanning and Printing Data
Assume that you have data written in a text file in the following format.
3 25 94.9 12
547 32556 56
89 567
435 342.1
76.5 983.2
0 343
# There are 15 real values
You can use the command scan() in order to store this data in a vector by scanning it from left to
right and top to bottom. Spaces and new lines separate the values, which are stored at consecutive indices.
x <- scan()
# press enter after writing this line, it will display "1:" on the command line
# Press CTRL+V to paste the copied data, 15 real values will be stored in x
# it will display "16:" on the command line
# Press enter in order to finish scanning process, 16th index will be ignored
# 1: 3 25 94.9 12
# 5: 547 32556 56
# 8: 89 567
# 10: 435 342.1
# 12: 76.5 983.2
# 14: 0 343
# 16:
# Read 15 items
x
# [1] 3.0 25.0 94.9 12.0 547.0 32556.0 56.0 89.0 567.0
# [10] 435.0 342.1 76.5 983.2 0.0 343.0
You can also scan a column of cells from an Excel sheet, but not rows. Be careful that the decimal
separator is (.) in R. So you can only scan values that use (.) as the decimal separator.
You can also read tables from a text (*.txt) file. Assume you have a text file named data.txt that
contains a table with one header line (the column names) followed by one line per row.
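For a self-contained trial, such a file can be created from within R. The sketch below writes the values of the table printed in this section to a temporary file and reads it back; the tempfile() path stands in for data.txt, and the names txt, path and d are assumptions for this illustration.

```r
txt <- c("length weight age",
         "1.72 72.3 25",
         "1.69 85.3 23",
         "1.80 75.0 26",
         "1.61 66.0 23",
         "1.73 69.0 24")
path <- tempfile(fileext = ".txt")   # temporary file, stands in for data.txt
writeLines(txt, path)                # one table row per line, first line = header
d <- read.table(file = path, header = TRUE)
dim(d)
# [1] 5 3
```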
x <- read.table(file="data.txt",header=TRUE)
# if you do not have any headers in your data, choose header as FALSE
x # press enter to display x table
# length weight age
# 1 1.72 72.3 25
# 2 1.69 85.3 23
# 3 1.80 75.0 26
# 4 1.61 66.0 23
# 5 1.73 69.0 24
x$length
# [1] 1.72 1.69 1.80 1.61 1.73
x$weight
# [1] 72.3 85.3 75.0 66.0 69.0
x$age
# [1] 25 23 26 23 24
print("error")
# [1] "error"
x <- 1:5
print(x)
# [1] 1 2 3 4 5
?det
?sample
?sin
?cbind
You can use apropos() to find a list of all functions whose names contain a specific word. These functions
may come with the default libraries or may have been defined by you in that R session.
apropos("norm")
# [1] "dlnorm" "dnorm" "normalizePath" "plnorm"
# [5] "pnorm" "qlnorm" "qnorm" "qqnorm"
# [9] "qqnorm.default" "rlnorm" "rnorm"
6 Objects can be vectors, matrices, arrays, functions, lists (lists are similar to structures in C), tables etc.
apropos("exp")
# [1] ".__C__expression" ".expand_R_libs_env_var" ".Export"
# [4] ".mergeExportMethods" ".standard_regexps" "as.expression"
# [7] "as.expression.default" "char.expand" "dexp"
# [10] "exp" "expand.grid" "expand.model.frame"
# [13] "expm1" "expression" "getExportedValue"
# [16] "getNamespaceExports" "gregexpr" "is.expression"
# [19] "namespaceExport" "path.expand" "pexp"
# [22] "qexp" "regexpr" "rexp"
# [25] "SSbiexp" "USPersonalExpenditure"
If you need to see all the objects that you have created in your work session, simply write objects().
objects()
# [1] "a" "b" "circle" "coora"
# [5] "coorb" "coorc" "error" "f"
# [9] "findroot" "fixedcost" "func" "int"
# [13] "lbound" "marginalcost" "n" "orderingcostlist"
# [17] "res" "simmax2unif" "simmax2unif_2" "totalcost"
# [21] "triangle" "ubound" "units" "vec"
# [25] "x" "xest" "xinv" "y"
# [29] "y1" "y2" "y3" "y4"
# [33] "y5" "y6" "z"
You can always save your R session together with the objects that you have created by clicking File,
then Save Workspace in the quick access bar. You can always return to your saved workspaces by
double-clicking the saved file.
APPENDIX: Some Probability and Statistical Functions in R
4.3 Probability Functions in R
There are four functions associated with each of the well-known distributions commonly used in
probability theory and statistics. Let us give the definitions of those functions for the normal distribution
and then talk about the probability distributions which are available in R.
• dnorm(x,y,z): returns the pdf (probability density function) value at x of a normal distribution
with mean y and standard deviation z.
• pnorm(x,y,z): returns the cdf (cumulative distribution function) value at x of a normal distribution
with mean y and standard deviation z.
• qnorm(x,y,z): returns the inverse cdf value of x in a normal distribution with mean y and standard
deviation z. Clearly x must be in the unit interval (x ∈ [0, 1]).
• rnorm(x,y,z): returns a vector of random variates (RVs) which has length x. The variates will
follow a normal distribution with mean y and standard deviation z.
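A short sketch of the four functions for the normal distribution (the arguments are arbitrary illustrative values):

```r
dnorm(0, 0, 1)                      # density of N(0,1) at 0, i.e. 1/sqrt(2*pi)
# [1] 0.3989423
pnorm(1.96, 0, 1)                   # P(X <= 1.96)
# [1] 0.9750021
qnorm(0.975, 0, 1)                  # inverse cdf: the 97.5% quantile
# [1] 1.959964
pnorm(qnorm(0.3, 3, 1.5), 3, 1.5)   # pnorm() and qnorm() undo each other
# [1] 0.3
u <- rnorm(5, 3, 1.5)               # 5 random variates; values change on every call
length(u)
# [1] 5
```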
Here is a list of useful distributions that are available for computation in R. There are also other
distributions which are available in R but not in this list. (For each distribution below, you can obtain
the cdf by changing the initial letter d to p, the inverse cdf by changing it to q and the random variate
generator by changing it to r.) Apart from the normal distribution, please practice and learn
the d, p, q, r functions for the first nine distributions in this list (see footnote 7).
• dpois(x,y) : returns the pmf (probability mass function) value of x in a Poisson distribution with
mean (rate) y.
• dbinom(x,y,z) : returns the pmf value of x in a binomial distribution with number of trials y and
success probability z.
• dgeom(x,y) : returns the pmf value of x in a geometric distribution with a success probability y.
• dunif(x,y,z) : returns the pdf value of x in a uniform distribution with lower bound y and upper
bound z.
• dexp(x,y) : returns the pdf value of x in an exponential distribution with a rate parameter y.
• dgamma(x,y,scale=z) : returns the pdf value of x in a gamma distribution with a shape parameter
y and a scale parameter z. (If you do not write scale in parameter denition, it assumes z as the
rate parameter, which is equal to 1/scale)
• dchisq(x,y,z) : returns the pdf value of x in a chi-square distribution with degrees of freedom y
and the non-centrality parameter z.
• dt(x,y,z) : returns the pdf value of x in a t-distribution with degrees of freedom y and the
non-centrality parameter z.
• df(x,y,z,a) : returns the pdf value of x in an F-distribution with degrees of freedom-1 y, degrees
of freedom-2 z and the non-centrality parameter a.
• dcauchy(x,y,z) : returns the pdf value of x in a Cauchy distribution with a location parameter y
and scale parameter z.
• dnbinom(x,y,z) : returns the pmf value of x in a negative binomial distribution with dispersion
parameter y and success probability z.
• dhyper(x,y,z,a) : returns the pmf value of x (number of white balls) in a hypergeometric
distribution with a white population size y, a black population size z, and number of drawings a made
from the whole population.
• dlnorm(x,y,z) : returns the pdf value of x in a log-normal distribution with log-mean y and
log-standard deviation z.
• dbeta(x,y,z) : returns the pdf value of x in a beta distribution with shape-1 parameter y and
shape-2 parameter z.
• dlogis(x,y,z) : returns the pdf value of x in a logistic distribution with a location parameter y
and scale parameter z.
• dweibull(x,y,z) : returns the pdf value of x in a Weibull distribution with a shape parameter y
and scale parameter z.
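The renaming rule (d to p, q or r) can be tried on a few of the distributions in the list above; a sketch with arbitrary parameter values:

```r
dbinom(3, 10, 0.5)   # P(X = 3) for 10 trials with success probability 0.5
# [1] 0.1171875
pbinom(3, 10, 0.5)   # P(X <= 3): d replaced by p
# [1] 0.171875
ppois(2, 3)          # P(X <= 2) for a Poisson distribution with mean 3
# [1] 0.4231901
qexp(0.5, 2)         # median of an exponential distribution with rate 2, i.e. log(2)/2
# [1] 0.3465736
runif(3, 0, 1)       # three uniform variates on (0,1); values change on every call
```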
7 For BDA 521 students, it is sufficient to practice and learn about the first five distributions in the list. For advanced
applications, it is also important to learn about the gamma, chi-squared, Student's t, and log-normal distributions.
4.4 Statistical Functions in R
You can find the mean of a vector with the function mean(), its standard deviation with sd(), its variance
with var(), and its median with median(). You can use the function summary() to learn about the 25 and 75
percent quantiles (which, together with the median, are called quartiles).
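The vector x used in the calls that follow is not defined in this section. To try the functions, a comparable sample can be generated as sketched here; the distribution (normal with mean 5 and standard deviation 2), the sample size and the seed are assumptions, so the values you obtain will not match the printed ones exactly.

```r
set.seed(255)                   # arbitrary seed
x <- rnorm(100000, 5, 2)        # assumed sample: mean 5, standard deviation 2
c(mean = mean(x), sd = sd(x))   # should be close to 5 and 2
quantile(x, c(0.25, 0.75))      # only the first and the third quartile
```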
mean(x)
# [1] 4.997776
sd(x)
# [1] 2.000817
var(x)
# [1] 4.003268
median(x)
# [1] 4.997408
summary(x)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -4.904 3.650 4.997 4.998 6.346 14.420
summary(x,digits=6)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -4.90360 3.65020 4.99741 4.99778 6.34564 14.42310
quantile(x) # this command yields the quartiles also
# 0% 25% 50% 75% 100%
# -4.903599 3.650201 4.997408 6.345639 14.423129