STA1007S Lab 8: Probability distributions

SUBMISSION INSTRUCTIONS:

Your answers need to be submitted on Amathuba.

Go into the Quizzes section and click on Lab Session 8 to access the submission form. Please note that the
answers get automatically marked and so have to be in the correct format:

ENTER YOUR ANSWERS TO 2 DECIMAL PLACES UNLESS THE ANSWER IS A ZERO OR AN INTEGER
(for example, if the answer is 0 you just enter 0 and not 0.00, or if the answer is 2 you enter 2 and not 2.00).

DO NOT INCLUDE ANY UNITS (e.g. meters, mg, etc.).

PROBABILITIES MUST BE BETWEEN 0 AND 1, SO A 50% CHANCE WOULD CORRESPOND TO A PROBABILITY OF 0.5.

Introduction
In a previous lab we learnt about the concept of sampling and simulating experiments. We defined a random
variable, which could take on some set of values and we randomly selected one of those values. We can then
say that that value has been ‘observed’ or ‘realized’. A collection of such observations (or ‘realizations’)
constitutes a sample. We could also control how frequently each of these values was observed by associating
a probability with each of them, either one by one or using a function. R has some built-in functions that are really
helpful when working with some of the most common forms of probability distributions. In this lab, we will
explore how to calculate probabilities and generate observations of random variables that have binomial,
Poisson or exponential distributions.
Most of the R code you need for this lab is provided. The lab is meant to
be practice for you, so even where the code and its output are provided, you are expected to create
your own script, run the pieces of code yourself and check whether the output is what you would expect it to
be. Every now and then, you will be asked to fill in blank pieces of code marked as ---. In addition to “fill in
the code”, you will need to answer other questions for which you must produce plots, run your own code or
explore your data. The questions you need to submit through Amathuba will appear in the submission boxes.
At any time you can call the function help() to obtain information about any function you want. For example, if
you wanted a description of how the function sample() works, you could type in the
console (bottom left panel in RStudio):
help("sample")

or you can just type:

?sample

Make this a habit: check the help file of every function you use for the first time.

Start a new R script and import your data
Start a new R script in your existing R project for the computer labs and write a few lines describing what
you are going to do.
Remember to add a line to clean your working environment and one to double check that your working
directory is correct.
Remember to save your script frequently!

Generating random observations

R has functions for a large range of probability distributions. They all follow a similar syntax and logic. To
start, let’s have a look at the help file for the function rexp().
# Visit the help file of the exponential distribution
?rexp

The function rexp() is the last function in the list, and according to the ‘Description’ section in the help file,
it generates random numbers from the exponential distribution. This is similar to the sampling that we did
in the last lab. Back then, we defined a vector of values, associated probabilities to them and selected some
of them randomly using the function sample(). Using the rexp() function, we are telling R to look at all
the possible values that a random variable can take on when it follows an exponential distribution
(any number greater than 0) and select some of them according to the probabilities given by the exponential
distribution. The number of values that we sample (observe, generate or realize are equivalent terms) is
passed on using the argument n. Recall from lectures that when working with the exponential distribution,
we need to specify the rate parameter λ, which tells us the average rate at which events are observed.
Let’s generate a sample of 10 observations from an exponential distribution with a rate parameter of 5.
# Generate 10 numbers from an exponential distribution with rate = 5
rexp(n = 10, rate = 5)

## [1] 0.026210732 0.218521443 0.159268213 0.105986757 0.073297198 0.532243285

## [7] 0.097079451 0.134426008 0.005795506 0.042797918
Your results will look different to these, since the function rexp() is generating them at random. These 10
numbers were generated from an exponential distribution with rate parameter λ = 5.
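If you want your random numbers to be repeatable from run to run, one option is to fix the random seed first. The sketch below assumes an arbitrary seed value (2024) and hypothetical object names (first_draw, second_draw); neither comes from the lab itself.

```r
# Fix the seed so that the same "random" numbers are generated again.
# The seed value 2024 is an arbitrary choice for illustration only.
set.seed(2024)
first_draw <- rexp(n = 10, rate = 5)

# Resetting the same seed reproduces exactly the same 10 values
set.seed(2024)
second_draw <- rexp(n = 10, rate = 5)

# Check that both draws are identical
identical(first_draw, second_draw)
```

Without the calls to set.seed(), the two vectors would almost certainly differ.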
Let’s generate a larger sample of values and plot a histogram instead of printing them in the console.
# Generate 10000 numbers from an exponential distribution with rate = 5
exp_vector <- rexp(n = 10000, rate = 5)

# Plot the resulting sample using a histogram

hist(exp_vector)

[Figure: histogram titled “Histogram of exp_vector”; x-axis: exp_vector (0.0 to 2.0); y-axis: Frequency (0 to 5000).]

You might recall that the expected value of an exponential distribution is 1/λ, which in this case is 1/5 = 0.2. This is
consistent with the histogram. Remember that your histogram may look slightly different to this
one, since it comes from a different sample. The expected value, however, is the same, since both
samples come from the same distribution.
Re-run this code a number of times, and you will get a slightly different histogram each time because the values
are generated randomly. Modify the code so that it generates only 100 values and re-run it a number of times.
You should now see that the histogram is much more jagged and tends to change more from run to run. This is
the effect of sample size: large samples resemble the distribution they come from more closely than small
samples do.
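One way to see the effect of sample size in numbers rather than in histograms is to repeat the sampling many times and compare how much the sample mean varies. This is only a sketch: the seed, the number of repetitions (200) and the object names are assumptions made for illustration.

```r
# Arbitrary seed so the illustration is repeatable
set.seed(1)
rate <- 5

# Draw 200 samples of each size and store the mean of each sample
small_means <- replicate(200, mean(rexp(n = 100, rate = rate)))
large_means <- replicate(200, mean(rexp(n = 10000, rate = rate)))

# Both sets of sample means are centred near the expected value 1/rate = 0.2
mean(small_means)
mean(large_means)

# The means of the large samples vary much less from run to run
sd(small_means)
sd(large_means)
```

The spread of the sample means shrinks as n grows, which is why the histogram of 10000 values looks so much more stable than the histogram of 100 values.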

Calculating probabilities
Now look at the function pexp(), the second function in the list in the help file. It is the cumulative
distribution function for the exponential distribution, i.e. Pr(X ≤ x). The cumulative distribution function
for a continuous probability distribution gives us the area under the curve (i.e. the probability) up to a
specified point.
For example, let’s calculate the probability of observing a value smaller than the expected value of the same
exponential distribution we’ve been working with. How do we do this with the function pexp()? From
the help file, we see that this function expects a quantile (q), i.e. the value for x up to which we want to
calculate the cumulative probability. Then, it needs to know the rate parameter rate. And finally, we can
tell it whether we want the area under the lower tail (lower.tail = TRUE, which is the default and gives
Pr(X ≤ x)) or the area under the upper tail (lower.tail = FALSE gives Pr(X > x)).
We know that the expected value of an exponential distribution is 1/λ or similarly 1/rate. Then, to calculate
the probability of observing a value smaller than the expected value we type the following code.
# Calculate the probability of observing a value smaller than expected value
# of exponential with rate = 5
pexp(1/5, rate = 5)

## [1] 0.6321206
With this line of code we’ve asked R to calculate the area under the exponential distribution curve from −∞
up to the mean, which gives us the probability of an observation falling inside that interval. Effectively, R
only needs to calculate the area under the curve from 0 up to the mean, since the exponential distribution
can’t take on negative numbers.
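The calculation above can be checked by hand, because the exponential cumulative distribution function has the closed form Pr(X ≤ x) = 1 − exp(−λx). The sketch below also shows the upper tail and the probability of falling inside an interval. The interval limits (0.1 and 0.3) and the object names are hypothetical values chosen for illustration.

```r
rate <- 5
x <- 1 / rate  # the expected value of the distribution

# Lower tail, Pr(X <= x), using pexp() and using the closed-form CDF
lower_tail <- pexp(x, rate = rate)
manual_cdf <- 1 - exp(-rate * x)

# Upper tail, Pr(X > x); lower and upper tail together add up to 1
upper_tail <- pexp(x, rate = rate, lower.tail = FALSE)
lower_tail + upper_tail

# Probability of an interval, Pr(a < X <= b), as a difference of two CDF values
a <- 0.1  # hypothetical lower limit
b <- 0.3  # hypothetical upper limit
interval_prob <- pexp(b, rate = rate) - pexp(a, rate = rate)
interval_prob
```

Subtracting two cumulative probabilities works for any continuous distribution, not only the exponential.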
And we’ve got an interesting result! It seems that it is more likely to observe values smaller than the expected
value than values larger than the expected value. This is due to the exponential distribution being skewed to
the right. Remember that the value that ‘cuts’ the area under the curve (and therefore the probability) in
half is the median, not the mean!
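To find the median itself, one option is the quantile function qexp(), which is documented in the same help file as rexp() and pexp() but is not otherwise used in this lab. A minimal sketch, with a hypothetical object name:

```r
rate <- 5

# The median is the value with cumulative probability 0.5
exp_median <- qexp(0.5, rate = rate)
exp_median

# For the exponential distribution the median equals log(2) / rate,
# which is smaller than the mean 1 / rate
log(2) / rate
1 / rate

# Check: exactly half of the probability lies below the median
pexp(exp_median, rate = rate)
```

The median (about 0.14) sits below the mean (0.2), which is what we expect for a distribution that is skewed to the right.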

SUBMISSION:

Use what you have just learnt to answer the following questions:

Suppose you are on a game drive and find a kill. Vultures arrive at the kill at a rate of 5.3 per hour.

Amathuba Question 1. You need to leave in 30 minutes. What is the probability that you will see at least
one vulture before you have to go?
Amathuba Question 2. What is the probability that it will be more than 20 minutes before the next
vulture arrives?
Amathuba Question 3. What is the probability that you need to wait between 20 and 40 minutes before
the first vulture arrives?

Working with discrete random variables

The R functions for the binomial distribution (pbinom()) and the Poisson distribution (ppois()) work in
a similar way. However, remember that these distributions have probability mass functions and they are
non-zero only for integers. For cumulative probabilities (‘smaller than a given number’ or ‘greater than a
given number’) we use these two functions in a similar way as we used pexp() above. However, if we want to
know the probability of observing exactly one outcome (Pr(X = x)), we use the sister functions dbinom()
and dpois().
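As an illustration of the Poisson versions of these functions, the sketch below uses a hypothetical rate of 3 events per unit (not one of the rates used in the questions in this lab) and compares the output of dpois() and ppois() with the Poisson probability mass function written out by hand.

```r
lambda <- 3  # hypothetical rate, chosen only for illustration

# Pr(X = 2): the value of the probability mass function at 2
exactly_2 <- dpois(2, lambda = lambda)
manual_pmf <- exp(-lambda) * lambda^2 / factorial(2)

# Pr(X <= 2): cumulative probability, i.e. Pr(0) + Pr(1) + Pr(2)
at_most_2 <- ppois(2, lambda = lambda)
manual_sum <- sum(dpois(0:2, lambda = lambda))

# Pr(X > 2): upper tail, the complement of Pr(X <= 2)
more_than_2 <- ppois(2, lambda = lambda, lower.tail = FALSE)

exactly_2
at_most_2
more_than_2
```

Note that the rate argument is called lambda in the Poisson functions, whereas it is called rate in the exponential functions.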
Note: the function dexp() also exists and gives us the probability density at some x value. It provides the
value of the PDF, just as dpois() or dbinom() provide the value of the PMF.
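The difference between a density and a probability can be made concrete with a short sketch. It uses the same rate of 5 as earlier in the lab; the use of integrate() and the object names are assumptions added for illustration.

```r
rate <- 5

# A density is not a probability: at x = 0 the exponential density equals
# the rate, which here is larger than 1
density_at_0 <- dexp(0, rate = rate)
density_at_0

# Probabilities come from areas under the density curve. Integrating the
# density from 0 to 0.2 numerically recovers the value given by pexp()
area <- integrate(dexp, lower = 0, upper = 0.2, rate = rate)$value
area
pexp(0.2, rate = rate)
```

For discrete distributions no integration is needed, because dbinom() and dpois() already return probabilities.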
Here is an example (use the R help files if you need more guidance): you observe a group of 15 penguins that
each survive the year independently with probability 0.8.
1. What is the probability that exactly 10 are still alive at the end of the year?
# Pr(X = 10) when X has a B(15, 0.8) distribution
dbinom(10, size = 15, prob = 0.8)

## [1] 0.1031823
2. What is the probability that no more than 10 penguins survive?
# Pr(X <= 10) when X has a B(15, 0.8) distribution
pbinom(10, size = 15, prob = 0.8)

## [1] 0.1642337
3. What is the probability that at least 5 penguins survive? Note that when lower.tail = FALSE we
are specifying Pr(X > x) and not Pr(X ≥ x), and hence we specify 4 penguins here, i.e. Pr(X ≥ 5) =
Pr(X > 4) = Pr(X = 5) + Pr(X = 6) + ... + Pr(X = 15).
# Pr(X > 4) when X has a B(15, 0.8) distribution
pbinom(4, size = 15, prob = 0.8, lower.tail = FALSE)

## [1] 0.9999875
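The three penguin answers above can be cross-checked by adding up individual probabilities from dbinom(). This is a sketch for checking your understanding; the object names are hypothetical.

```r
size <- 15
prob <- 0.8

# Pr(X <= 10): the cumulative probability is the sum of Pr(X = 0), ..., Pr(X = 10)
at_most_10_cdf <- pbinom(10, size = size, prob = prob)
at_most_10_sum <- sum(dbinom(0:10, size = size, prob = prob))

# Pr(X >= 5) calculated in three equivalent ways
at_least_5_tail <- pbinom(4, size = size, prob = prob, lower.tail = FALSE)
at_least_5_sum <- sum(dbinom(5:15, size = size, prob = prob))
at_least_5_comp <- 1 - pbinom(4, size = size, prob = prob)

# Each comparison should return TRUE
all.equal(at_most_10_cdf, at_most_10_sum)
all.equal(at_least_5_tail, at_least_5_sum)
all.equal(at_least_5_tail, at_least_5_comp)
```

Writing out the sum is a useful check whenever you are unsure whether a boundary value such as 4 or 5 should be included.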

SUBMISSION:

Childhood lead poisoning is a public health concern. In a town in the Highveld, one child in 30 has a high
blood lead level. In a randomly chosen group of 40 children from the population, what is the probability that:
Amathuba Question 4. Exactly three have high lead levels?
Amathuba Question 5. At most three have high lead levels?
Amathuba Question 6. At least three have high lead levels?

Grasshoppers are found in a large meadow at the rate of 2.4 per square meter.
Amathuba Question 7. What is the probability of less than 5 grasshoppers being found in a random
square meter?
Amathuba Question 8. What is the probability of more than 5 grasshoppers being found in an area of 2
square meters?

The commands you learned today

These are the functions and operators that you learned today. Fill in your own description of what they do.
rexp()
dexp()
pexp()
rbinom()
dbinom()
pbinom()
rpois()
dpois()
ppois()
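The functions rbinom() and rpois() were not demonstrated above, but they follow the same pattern as rexp(). A minimal sketch, assuming an arbitrary seed, a hypothetical Poisson rate of 3 and hypothetical object names:

```r
# Arbitrary seed so the illustration is repeatable
set.seed(42)

# 5 random observations from a binomial distribution with 15 trials and
# success probability 0.8 (the penguin example): integers between 0 and 15
binom_draws <- rbinom(n = 5, size = 15, prob = 0.8)
binom_draws

# 5 random observations from a Poisson distribution with a hypothetical
# rate of 3: non-negative integers
pois_draws <- rpois(n = 5, lambda = 3)
pois_draws
```

In each family the prefix tells you what the function does: r generates random observations, d gives the density or probability mass, and p gives the cumulative probability.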
