Lab 6 Activities

Lab #6 involves R programming exercises focused on simulating probabilities using the sample() function. Students are required to perform simulations related to a spinner and dice rolls, calculate proportions, and compare them to theoretical probabilities. The lab emphasizes the importance of sample size in achieving accurate probability estimates.

Uploaded by

Mai Anh Đào

Lab #6 Activities

Anh Dao Mai

Instructions: fill in the code chunks below and answer the questions with text responses. Your responses
must use code that was covered in class; other methods to solve the problems will not be accepted. Submit
your knit pdf file to Crowdmark.
A reminder that the R code we have covered in class is available on our STAT 2150 A01 UM Learn page,
under Content > Course Material. It is recommended that you knit to pdf after you fill in each code chunk.
Introduction:
Recall that the probability of an event is the theoretical proportion of times the event occurs over an
infinitely long series of trials. We can easily generate a long series of trials with R, store it in a vector,
and estimate the probability by the proportion of trials in which the event occurs.
The sample() function in R has the following syntax: sample(x, size, replace = FALSE, prob) where
x is an R object containing values (like a vector or a matrix) from which we select a sample, size is the
desired sample size, replace is set to FALSE by default indicating sampling is done without replacement
(that is, once an element of the population has been sampled, it cannot be sampled again), and prob is a
vector of probabilities (created on-the-fly or stored in the environment) of the same length as x, where each
element of prob is the probability for the corresponding element of the x vector. If the prob vector is not
specified, then each element of x will be sampled with equal probability.
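To make the roles of the arguments concrete, here is a minimal sketch. The vector x_demo, the seed, and the probabilities are illustrative assumptions and are not part of the lab.

```r
# Illustrative only: x_demo, the seed, and the prob values are assumptions.
set.seed(1)
x_demo = c(10, 20, 30)

# Without replacement (the default): each element can be drawn at most once,
# so a sample of size 3 is a random reordering of x_demo.
perm = sample(x_demo, 3)
perm

# With replacement and unequal probabilities: 30 should be drawn most often.
draws = sample(x_demo, 1000, replace = TRUE, prob = c(0.1, 0.2, 0.7))
table(draws) / 1000

# Asking for more values than x_demo holds, without replacement, is an error.
too_many = tryCatch(sample(x_demo, 4), error = function(e) "error")
too_many
```

The last call fails because only three elements are available once sampling without replacement is requested, which is why the simulations in this lab all set replace = TRUE.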
For example, we can simulate rolling a die 120,000 times with the following code that randomly selects a
number between 1 and 6, where the probability of obtaining each of the numbers between 1 and 6 is one-
sixth. We sample with replacement since if you get a 1, for example, on a die roll, you can still get a 1 again
on a future die roll.

data = sample(1:6,120000,replace=TRUE,prob=rep(1/6,6))
data[1:30] # See the first 30 die rolls

## [1] 4 6 1 5 4 1 3 6 6 3 6 3 3 6 2 1 6 4 3 6 1 6 3 2 4 3 1 4 5 6

table(data)

## data
##     1     2     3     4     5     6
## 19971 20133 19872 19929 20050 20045

(Each time you knit this .Rmd file, you will get different results because you will get a different simulation
of 120,000 die rolls.) Note that when we write the number 120,000 in R, we cannot use a comma because
commas are used to separate arguments of a function. As we can see in the output of the table() function
above, roughly one-sixth of the die rolls (approximately 20,000) land on each of the numbers between 1 and
6.
Also, we could have obtained the proportion of times each of the possible outcomes occurs by dividing the
output of the table() function by 120,000:

table(data)/120000

## data
##         1         2         3         4         5         6
## 0.1664250 0.1677750 0.1656000 0.1660750 0.1670833 0.1670417

We see that each of the six numbers occurs about one-sixth of the time.
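How close is "about one-sixth"? The following is a rough sketch that assumes the usual standard error formula for a sample proportion, sqrt(p(1 - p)/n), which the lab itself does not state. The observed values are copied from the output above.

```r
# Assumption: standard error of a sample proportion is sqrt(p * (1 - p) / n).
p = 1/6
n = 120000
se = sqrt(p * (1 - p) / n)
se  # about 0.0011

# Observed proportions copied from the table(data)/120000 output
observed = c(0.1664250, 0.1677750, 0.1656000, 0.1660750, 0.1670833, 0.1670417)

# Deviation of each observed proportion from 1/6, in standard errors
round((observed - p) / se, 2)
```

Under this assumption, every observed proportion lies within two standard errors of 1/6, which is the kind of agreement one would expect from 120,000 fair rolls.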
Question 1:
Suppose we have a spinner in the shape of a regular hexagon as seen in the image in Crowdmark. We can
easily see the probabilities that the spinner will land on 1, 2, or 3. Using the sample() function, simulate
spinning the spinner 180 times and obtain counts of how many times the numbers 1, 2, and 3 occurred.
Write your code after the set.seed(11111) below.

set.seed(11111)
spinner = sample(1:3,180,replace=TRUE,prob=c(1/6, 1/3, 1/2))
table(spinner)

## spinner
##  1  2  3
## 26 61 93

What proportion of the 180 simulated spins landed on 1? Enter your calculation into the code chunk below
so that the knit pdf will show the result of the calculation.

table(spinner)[1] / 180

##         1
## 0.1444444

Now simulate spinning the spinner 180,000 times and calculate the proportion of spins that land on 1. Write
your code after the set.seed(11111) below.

set.seed(11111)
spinner_large = sample(1:3, 180000, replace=TRUE, prob=c(1/6, 1/3, 1/2))
table(spinner_large)[1] / 180000

##       1
## 0.16705

Comment on the two sample proportions you have calculated compared to the theoretical probability of
landing on 1.
The 180-spin simulation gave a sample proportion of 0.1444, about 0.022 below the theoretical probability
of 1/6 (approximately 0.1667). The 180,000-spin simulation gave 0.16705, which is within 0.0004 of 1/6.
This illustrates that a larger sample size tends to give a more accurate estimate of the true probability.
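The effect of sample size can be explored over several sizes at once. The sketch below is illustrative only: the seed, the intermediate sample sizes, and the use of sapply() are assumptions and may go beyond the code covered in class.

```r
# Illustrative sketch: the seed and the chosen sample sizes are assumptions.
set.seed(2024)
sizes = c(180, 1800, 18000, 180000)
spin_probs = c(1/6, 1/3, 1/2)

# For each sample size, simulate the spinner and record the proportion of 1s.
prop_one = sapply(sizes, function(n) {
  spins = sample(1:3, n, replace = TRUE, prob = spin_probs)
  mean(spins == 1)
})

# Absolute error of each estimate relative to the theoretical 1/6
data.frame(n = sizes, proportion = prop_one, abs_error = abs(prop_one - 1/6))
```

The error tends to shrink as n grows, but it need not decrease at every step, because each simulation is still random.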
Question 2:
When we roll two dice, there are 36 possible outcomes. We can obtain the sum showing on the two dice for
those 36 outcomes in a 6 x 6 matrix with the outer() function:

x = 1:6
y = 1:6
z = outer(x,y,FUN="+")
z

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    2    3    4    5    6    7
## [2,]    3    4    5    6    7    8
## [3,]    4    5    6    7    8    9
## [4,]    5    6    7    8    9   10
## [5,]    6    7    8    9   10   11
## [6,]    7    8    9   10   11   12

The following code will create a vector of the possible sums:

sums = unique(as.vector(z))

And the following code creates a vector probs of the probabilities of each of the possible sums:

dice = sort(as.vector(z))
dice

##  [1]  2  3  3  4  4  4  5  5  5  5  6  6  6  6  6  7  7  7  7  7  7  8  8  8  8
## [26]  8  9  9  9  9 10 10 10 11 11 12

probs = table(dice)/36
probs

## dice
##          2          3          4          5          6          7          8
## 0.02777778 0.05555556 0.08333333 0.11111111 0.13888889 0.16666667 0.13888889
##          9         10         11         12
## 0.11111111 0.08333333 0.05555556 0.02777778
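As a cross-check on the probs table, a sum s between 2 and 12 can be made in 6 - |s - 7| ways out of 36. The sketch below compares that closed form with the table; the names s and closed_form are illustrative, and z and probs are rebuilt so the chunk runs on its own.

```r
# Closed-form check: the sum s can occur in 6 - |s - 7| of the 36 outcomes.
s = 2:12
closed_form = (6 - abs(s - 7)) / 36

# Rebuild the lab's probs table so this chunk is self-contained
z = outer(1:6, 1:6, FUN = "+")
probs = table(sort(as.vector(z))) / 36

all.equal(as.vector(probs), closed_form)  # TRUE
sum(probs)                                # 1
```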

(a) Let X be the sum of the dice rolls. Use the probs vector to show that P(X is greater than or equal to
9) equals 5/18 (approximately 0.2778).

sum(probs[as.numeric(names(probs)) >= 9])

## [1] 0.2777778
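The same value can be checked by counting directly in the 6 x 6 matrix of sums. This is an alternative sketch for verification only and may not match the method expected in class.

```r
# Count the outcomes in the 6 x 6 matrix whose sum is at least 9.
z = outer(1:6, 1:6, FUN = "+")
sum(z >= 9)   # 10 favourable outcomes out of 36
mean(z >= 9)  # 10/36 = 5/18
```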

(b) We can estimate the 5/18 probability in (a) by sampling from the sums vector. Simulate 30 rolls of
the two dice. Note that the output of the sample() function will not be the outcome of the dice rolls
(like (3,4)), but rather the sum of the two dice rolls (like 7). Then calculate what proportion of dice
rolls result in a sum greater than or equal to 9. This should be an estimate of the 0.2778 probability.
Write your code after the set.seed(11111) below.

set.seed(11111)
dice_rolls = sample(sums, 30, replace=TRUE, prob=probs)
mean(dice_rolls >= 9)

## [1] 0.3

(c) Repeat part (b) with 600 dice rolls (copy/paste your code from above and make any necessary changes).
Write your code after the set.seed(11111) below.

set.seed(11111)
dice_rolls = sample(sums, 600, replace=TRUE, prob=probs)
mean(dice_rolls >= 9)

## [1] 0.2683333

Compare the proportions of dice rolls where the sum is greater than or equal to 9 for 30 dice rolls and for
600 dice rolls with the theoretical probability of 0.2778.
With 30 rolls, the estimate was 0.3, about 0.022 above the theoretical 0.2778; with so few rolls, the estimate
can vary considerably from one simulation to the next. With 600 rolls, the estimate was 0.2683, about 0.009
below 0.2778, so it is closer to the theoretical probability. This illustrates that increasing the sample size
reduces the variability of the estimate.
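An alternative way to run the same experiment is to simulate each die separately and add the results, rather than sampling from the sums vector. The sketch below is illustrative: the seed, the variable names, and the standard error formula sqrt(p(1 - p)/n) are assumptions not taken from the lab.

```r
# Illustrative sketch: the seed and variable names are assumptions.
set.seed(42)
n = 600
die1 = sample(1:6, n, replace = TRUE)
die2 = sample(1:6, n, replace = TRUE)

# Proportion of simulated rolls whose sum is at least 9
direct = mean(die1 + die2 >= 9)
direct

# Assumed standard error of a sample proportion: sqrt(p * (1 - p) / n)
p = 5/18
se_30 = sqrt(p * (1 - p) / 30)    # roughly 0.08
se_600 = sqrt(p * (1 - p) / 600)  # roughly 0.018
c(se_30, se_600)
```

Under that assumption, an estimate from 30 rolls typically misses 5/18 by several times as much as an estimate from 600 rolls, which is consistent with the comparison above.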
