
Lab 8

statistics

Uploaded by

gaby1darius26

STA1007S Lab 8: Probability distributions

SUBMISSION INSTRUCTIONS:

Your answers need to be submitted on Amathuba.

Go into the Quizzes section and click on Lab Session 8 to access the submission form. Please note that the
answers get automatically marked and so have to be in the correct format:

ENTER YOUR ANSWERS TO 2 DECIMAL PLACES UNLESS THE ANSWER IS A ZERO OR AN INTEGER
(for example, if the answer is 0 you just enter 0 and not 0.00, or if the answer is 2 you enter 2 and not 2.00).

DO NOT INCLUDE ANY UNITS (e.g. meters, mg, etc.).

PROBABILITIES MUST BE BETWEEN 0 AND 1, SO A 50% CHANCE WOULD CORRESPOND TO A
PROBABILITY OF 0.5.

Introduction
In a previous lab we learnt about the concept of sampling and simulating experiments. We defined a random
variable, which could take on some set of values and we randomly selected one of those values. We can then
say that that value has been ‘observed’ or ‘realized’. A collection of such observations (or ‘realizations’)
constitutes a sample. We could also control how frequently each of these values was observed by associating
a probability to them; either one by one or using a function. R has some built-in functions that are really
helpful when working with some of the most common forms of probability distributions. In this lab, we will
explore how to calculate probabilities and generate observations of random variables that have binomial,
Poisson or exponential distributions.
You will find that most of the R code necessary to execute the R commands is provided. This lab is meant to
be practice for you, so even if the code and the output of the code is provided, you are expected to create
your own script, run the pieces of code yourself and check whether the output is what you would expect it to
be. Every now and then, you will be asked to fill in blank pieces of code marked as ---. In addition to “fill in
the code”, you will need to answer other questions for which you must produce plots, run your own code or
explore your data. The questions you need to submit through Amathuba will appear in the submission boxes.
At any time you can call the function help() to obtain information about any function you want. For example, if
you wanted a description of how the function sample() works, you could type the following in the
console (bottom left panel in RStudio):
help("sample")

or you can just type:


?sample

You should make this a habit and check the help files of any functions you are using for the first time.

Start a new R script and import your data
Start a new R script in your existing R project for the computer labs and write a few lines describing what
you are going to do.
Remember to add a line to clean your working environment and one to double check that your working
directory is correct.
Remember to save your script frequently!
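As a rough sketch, the first lines of such a script could look like the block below. The header text is only an illustration, and rm(list = ls()) is one common way of clearing the environment, assumed here rather than prescribed by the lab.

```r
# STA1007S Lab 8: Probability distributions
# Practising the exponential, binomial and Poisson functions in R

# Clean the working environment (removes all objects currently in memory)
rm(list = ls())

# Double check the working directory; it should be your lab project folder
current_dir <- getwd()
print(current_dir)
```

If the printed folder is not your lab project, open the project in RStudio before continuing.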

Generating random observations


R has functions for a large range of probability distributions. They all follow a similar syntax and logic. To
start, let’s have a look at the help file for the function rexp().
# Visit the help file of the exponential distribution
?rexp

The function rexp() is the last function in the list, and according to the ‘Description’ section in the help file,
it generates random numbers from the exponential distribution. This is similar to the sampling that we did
in the last lab. Back then, we defined a vector of values, associated probabilities to them and selected some
of them randomly using the function sample(). Using the rexp() function, we are telling R to look at all
the possible values that a random variable with an exponential distribution can take on
(any number greater than 0) and to select some of them according to the probabilities given by the exponential
distribution. The number of values that we sample (observe, generate and realize are equivalent terms) is
passed on using the argument n. Recall from lectures that when working with the exponential distribution,
we need to specify the rate parameter λ, which tells us the average rate at which events are observed.
Let’s generate a sample of 10 observations from an exponential distribution with a rate parameter of 5.
# Generate 10 numbers from an exponential distribution with rate = 5
rexp(n = 10, rate = 5)

## [1] 0.026210732 0.218521443 0.159268213 0.105986757 0.073297198 0.532243285
## [7] 0.097079451 0.134426008 0.005795506 0.042797918
Your results will look different to these, since the function rexp() is generating them at random. These 10
numbers were generated from an exponential distribution with rate parameter λ = 5.
Let's generate a larger sample of values and plot a histogram instead of printing them in the console.
# Generate 10000 numbers from an exponential distribution with rate = 5
exp_vector <- rexp(n = 10000, rate = 5)

# Plot the resulting sample using a histogram
hist(exp_vector)

[Figure: Histogram of exp_vector. x-axis: exp_vector (0.0 to 2.0); y-axis: Frequency.]

You might recall that the expected value of an exponential distribution is 1/λ, which in this case is 1/5 = 0.2.
This is consistent with the histogram. Remember that your histogram may look slightly different to this
one, since it comes from a different sample. The expected value, however, is the same since both
samples come from the same distribution.
Re-run this code a number of times, and you will get a slightly different answer each time because the values
are generated randomly. Modify the code so that it generates only 100 values and re-run it a number of times.
Now you see that the histogram is much more ragged and tends to change more from run to run. This is
the effect of sample size. Large samples resemble the distribution they come from more closely than small
samples do.
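The effect of sample size can also be seen numerically. The sketch below is an illustration, not part of the lab: it repeats the sampling 200 times for each sample size and compares how much the sample means vary. The seed value, the number of repetitions and the object names are arbitrary choices.

```r
# Fix the random seed so that this sketch gives the same numbers every run
# (assumption: any seed value would do)
set.seed(2024)

rate <- 5
expected_value <- 1 / rate   # 0.2

# Repeat the experiment 200 times for a small and a large sample size,
# keeping only the sample mean of each run
means_small <- replicate(200, mean(rexp(n = 100, rate = rate)))
means_large <- replicate(200, mean(rexp(n = 10000, rate = rate)))

# Both sets of means are centred near the expected value ...
mean(means_small)
mean(means_large)

# ... but the means of the small samples vary much more from run to run
sd(means_small)
sd(means_large)
```

The spread of the small-sample means should come out roughly ten times larger than that of the large-sample means, which is the numerical counterpart of the ragged histogram.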

Calculating probabilities
Now look at the function pexp(), the second function in the list in the help file. It is the cumulative
distribution function for the exponential distribution, i.e. Pr(X ≤ x). The cumulative distribution function
for a continuous probability distribution gives us the area under the curve (i.e. the probability) up to a
specified point.
For example, let’s calculate the probability of observing a value smaller than the expected value of the same
exponential distribution we’ve been working with. How do we do this with the function pexp()? From
the help file, we see that this function expects a quantile (q), i.e. the value for x up to which we want to
calculate the cumulative probability. Then, it needs to know the rate parameter rate. And finally, we can
tell it whether we want the area under the lower tail (lower.tail = TRUE, which is the default and gives
Pr(X ≤ x)) or the area under the upper tail (lower.tail = FALSE gives Pr(X > x)).
We know that the expected value of an exponential distribution is 1/λ or, equivalently, 1/rate. Then, to calculate
the probability of observing a value smaller than the expected value, we type the following code.
# Calculate the probability of observing a value smaller than expected value
# of exponential with rate = 5
pexp(1/5, rate = 5)

## [1] 0.6321206
With this line of code we’ve asked R to calculate the area under the exponential distribution curve from −∞
up to the mean, which gives us the probability of an observation falling inside that interval. Effectively, R
only needs to calculate the area under the curve from 0 up to the mean, since the exponential distribution
can’t take on negative numbers.
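To see how the two tails and the area between two points relate, here is a small sketch using the same rate of 5. The interval endpoints 0.1 and 0.3 and the object names are made up for illustration. The closed-form line uses the standard exponential CDF, 1 - exp(-rate * x).

```r
rate <- 5
mean_value <- 1 / rate

# Lower tail: Pr(X <= 0.2), the same value as pexp(1/5, rate = 5)
p_lower <- pexp(mean_value, rate = rate)

# Upper tail: Pr(X > 0.2)
p_upper <- pexp(mean_value, rate = rate, lower.tail = FALSE)

# The two tails together cover every possible value, so they add up to 1
p_lower + p_upper

# The same lower-tail probability from the closed form of the exponential CDF
p_closed_form <- 1 - exp(-rate * mean_value)

# Probability of falling in an interval: Pr(0.1 < X <= 0.3),
# the area up to 0.3 minus the area up to 0.1
p_interval <- pexp(0.3, rate = rate) - pexp(0.1, rate = rate)
p_interval
```

Note that q and rate must be expressed in matching units: a rate per hour goes with a time measured in hours.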
And we’ve got an interesting result! It seems that it is more likely to observe values smaller than the expected
value than values larger than the expected value. This is due to the exponential distribution being skewed to
the right. Remember that the value that ‘cuts’ the area under the curve (and therefore the probability) in
half is the median, not the mean!
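The median can be checked directly with qexp(), the quantile function listed in the same help file. It is not otherwise used in this lab, so treat the following as an optional sketch. The object names are arbitrary.

```r
rate <- 5

# The median is the value with cumulative probability 0.5
median_value <- qexp(0.5, rate = rate)   # equals log(2) / rate
mean_value <- 1 / rate

# For this right-skewed distribution the median lies below the mean
median_value < mean_value

# Half of the probability lies below the median
p_below_median <- pexp(median_value, rate = rate)
p_below_median
```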

SUBMISSION:

Use what you have just learnt to answer the following questions:

Suppose you are on a game drive and find a kill. Vultures arrive at the kill at a rate of 5.3 per hour.

Amathuba Question 1. You need to leave in 30 minutes. What is the probability that you will see at least
one vulture before you have to go?
Amathuba Question 2. What is the probability that it will be more than 20 minutes before the next
vulture arrives?
Amathuba Question 3. What is the probability that you need to wait between 20 and 40 minutes before
the first vulture arrives?

Working with discrete random variables


The R functions for the binomial distribution (pbinom()) and the Poisson distribution (ppois()) work in
a similar way. However, remember that these distributions have probability mass functions and they are
non-zero only for integers. For cumulative probabilities (‘smaller than a given number’ or ‘greater than a
given number’) we use these two functions in a similar way as we used pexp() above. However, if we want to
know the probability of observing one specific value (Pr(X = x)), we use the sister functions dbinom()
and dpois().
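The same naming pattern extends to generating random observations: rbinom() and rpois() play the role that rexp() played earlier. The sketch below uses made-up parameter values (20 trials with success probability 0.3, and a Poisson mean of 3) and an arbitrary seed, purely for illustration.

```r
# Fix the seed so the sketch is reproducible (any value would do)
set.seed(8)

# 10 observations from a binomial distribution: number of successes in 20 trials
binom_draws <- rbinom(n = 10, size = 20, prob = 0.3)
binom_draws

# 10 observations from a Poisson distribution with mean (lambda) 3
pois_draws <- rpois(n = 10, lambda = 3)
pois_draws
```

Unlike the output of rexp(), every value produced here is a whole number, because both distributions are discrete.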
Note: the function dexp() also exists and gives us the probability density at some x value. It provides the
value of the PDF, just as dpois() or dbinom() provide the value of the PMF.
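The difference between a PMF value and a PDF value can be seen in a short sketch. The parameter values below are illustrative and the object names are arbitrary. integrate() is base R's numerical integration function, used here only to show that probabilities for a continuous variable are areas under the PDF.

```r
# PMF values are probabilities: over all possible outcomes they sum to 1
pmf_total <- sum(dbinom(0:15, size = 15, prob = 0.8))
pmf_total

# A PDF value is a density, not a probability, and can be larger than 1
density_at_zero <- dexp(0, rate = 5)
density_at_zero

# For a continuous variable, probabilities are areas under the PDF.
# Integrating the density from 0 to 0.2 reproduces pexp(0.2, rate = 5)
area <- integrate(dexp, lower = 0, upper = 0.2, rate = 5)$value
area
pexp(0.2, rate = 5)
```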
Here is an example (use the R help files if you need more guidance): you observe a group of 15 penguins that
each survive the year independently with probability 0.8.
1. What is the probability that exactly 10 are still alive at the end of the year?
# P(X = 10) when X ~ B(15, 0.8)
dbinom(10, size = 15, prob = 0.8)

## [1] 0.1031823
2. What is the probability that no more than 10 penguins survive?
# P(X <= 10) when X ~ B(15, 0.8)
pbinom(10, size = 15, prob = 0.8)

## [1] 0.1642337
3. What is the probability that at least 5 penguins survive? Note that when lower.tail = FALSE we
are specifying Pr(X > x) and not Pr(X ≥ x), and hence we specify 4 penguins here, i.e. Pr(X ≥ 5) =
Pr(X > 4) = Pr(X = 5) + Pr(X = 6) + ... + Pr(X = 15).
# P(X > 4) when X ~ B(15, 0.8)
pbinom(4, size = 15, prob = 0.8, lower.tail = FALSE)

## [1] 0.9999875
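One way to convince yourself of these relationships is to rebuild the cumulative probabilities from the individual PMF values. The sketch below does this for the penguin example and then shows the same pattern for a Poisson distribution. The Poisson mean of 3 is a made-up value, and the object names are arbitrary.

```r
size <- 15
prob <- 0.8

# Pr(X <= 10) is the sum of the PMF over 0, 1, ..., 10
p_at_most_10_sum <- sum(dbinom(0:10, size = size, prob = prob))
p_at_most_10 <- pbinom(10, size = size, prob = prob)

# Pr(X >= 5) = Pr(X > 4) = 1 - Pr(X <= 4)
p_at_least_5_sum <- sum(dbinom(5:15, size = size, prob = prob))
p_at_least_5 <- pbinom(4, size = size, prob = prob, lower.tail = FALSE)
p_at_least_5_complement <- 1 - pbinom(4, size = size, prob = prob)

# The Poisson functions follow the same pattern (lambda = 3 is illustrative)
lambda <- 3
p_exactly_2 <- dpois(2, lambda = lambda)                        # Pr(X = 2)
p_at_most_2 <- ppois(2, lambda = lambda)                        # Pr(X <= 2)
p_more_than_2 <- ppois(2, lambda = lambda, lower.tail = FALSE)  # Pr(X > 2) = Pr(X >= 3)
```

Each pair of binomial results should agree, and the last two Poisson probabilities should add up to 1.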

SUBMISSION:

Childhood lead poisoning is a public health concern. In a town in the Highveld, one child in 30 has a high
blood lead level. In a randomly chosen group of 40 children from the population, what is the probability that:
Amathuba Question 4. Exactly three have high lead levels?
Amathuba Question 5. At most three have high lead levels?
Amathuba Question 6. At least three have high lead levels?

Grasshoppers are found in a large meadow at the rate of 2.4 per square meter.
Amathuba Question 7. What is the probability of fewer than 5 grasshoppers being found in a random
square meter?
Amathuba Question 8. What is the probability of more than 5 grasshoppers being found in an area of 2
square meters?

The commands you learned today


These are the functions and operators that you learned today. Fill in your own description of what they do.
rexp()
dexp()
pexp()
rbinom()
dbinom()
pbinom()
rpois()
dpois()
ppois()
