Notes ch3 Sampling Distributions


Sampling Distributions

Lecture Notes

Erkin Diyarbakirlioglu
IAE Gustave Eiffel
February 6, 2022

Table of contents

1 Introduction
2 Sampling distribution
3 Central Limit Theorem
4 CLT for sample average
5 CLT for sample proportion

1 Parameter(s) vs. Statistic(s)

1. In real life, when we are interested in a given phenomenon, the exact probability model
underlying the data is generally unknown. The researcher has to extract information from the data
and draw conclusions. The problem is that the data itself is subject to randomness, since one
typically does not observe the entire population but only a subset of it. The basic idea of statistical
inference is thus to use the characteristics of a sample to estimate the characteristics of the
population. Loosely speaking, statistical inference is a collection of methods for drawing
conclusions from data that are subject to random variation.1 This note presents a number of
simple yet fundamental concepts as an introduction to the theory of statistical inference.

In statistics, a parameter 𝜃 describes a given property of a population. A statistic 𝜃̂ is a quantity
estimated from a sample drawn from the population of interest. The general setup for the problem
of statistical inference can be stated as follows: consider the unknown quantity of interest 𝜃. Using
data, one computes an estimate 𝜃̂. How close, then, is 𝜃̂ to 𝜃?2

2. The goal of statistical inference is to use the characteristics of a sample to estimate the
characteristics of the population. Using a random sample from a population of interest, we can
calculate a sample statistic. The key point is that a sample statistic calculated from a random
sample is a random variable, while the parameter of interest is a fixed value. Since the sample
statistic is a random variable, we can describe its random behavior with a distribution, which we
call the sampling distribution.

Example 1. Consider the following situations. Each "Parameter" entry describes a property of a
population; the matching "Statistic" entry is an estimate of that parameter based on a sample
drawn from the population.

▪ Parameter: The mean of the outcomes of a six-sided die is 𝜇 = 3.5.
  Statistic: A six-sided die is rolled 10 times; the sample average is 𝑥̅ = 3.22.
▪ Parameter: The minimum value one can get when rolling a six-sided die is min(𝑋) = 1.
  Statistic: A six-sided die is rolled 10 times; the minimum value observed is 2.
▪ Parameter: In metropolitan France, the mean weight of 3-year-old girls is 𝜇 = 14 kg.
  Statistic: A random sample of 𝑛 = 30 girls aged 3 is selected; their average weight is 𝑥̅ = 13.8 kg.
▪ Parameter: The probability of a fair coin landing on "Heads" is 𝑝 = 50%.
  Statistic: A fair coin is tossed 100 times and lands on "Heads" 55 times, so the sample proportion
  of "Heads" is 𝑝̂ = 55%.
▪ Parameter: The probability of a six-sided die landing on, say, 3 is 𝑝 = 1⁄6.
  Statistic: A six-sided die is rolled 30 times and lands on "three" 2 times, so the sample proportion
  is 𝑝̂ = 2⁄30 = 1⁄15.
▪ Parameter: An internet study run by C. McManus suggests that 11.15% of French people are
  left-handed.
  Statistic: In a survey of 1126 participants in France, 95 people claim to be left-handed, i.e.
  𝑝̂ = 95⁄1126 ≈ 8.4%.
▪ Parameter: According to Eurostat, the mean age of men at first marriage in France was 34.9 in 2017.
  Statistic: A random sample from civil service databases across different city halls in France shows
  that the average age of men at first marriage was 36.2 in 2017.

1 Source: https://www.probabilitycourse.com/chapter8/8_1_0_intro.php
2 The question may sound simple, but it lies at the origin of the two main schools of statistics: the
frequentist (classical) approach and the Bayesian approach. In the frequentist approach, the parameter
𝜃 is assumed to be deterministic: it has a fixed value and there is nothing uncertain about it. In
contrast, a sample estimate 𝜃̂ is random, since it is a function of the sample. In the Bayesian approach,
the parameter itself is also treated as a random variable. Moreover, the Bayesian school asserts that
the researcher has an initial guess about the distribution of 𝜃; after observing some data, this
distribution is updated using Bayes' rule. Briefly, one may say that the classical school addresses the
estimation of a non-random quantity using data, while the Bayesian school deals with estimating
random variables.

2 Sampling distribution

3. Let 𝜃 be a population parameter of interest and 𝜃̂ a sample estimate of 𝜃 calculated from a
random sample of size 𝑛. Then, the sampling distribution of 𝜃̂ is the distribution of the estimated
values 𝜃̂ obtained from repeated sampling with a fixed sample size 𝑛.

Example 2. Using software (R or Excel), generate a 𝑘 × 𝑛 data matrix, i.e. 𝑛 samples drawn from a
random variable 𝑋 (e.g. 𝑋 may represent a fair six-sided die), each sample of size 𝑘. (In this example
𝑘 denotes the sample size, whereas elsewhere in these notes the sample size is 𝑛.) For practical
reasons, start with small values for both 𝑘 and 𝑛, such as 𝑘 = 10 and 𝑛 = 20, and draw the histogram
of the 𝑛 = 20 sample averages. Then increase the number of samples 𝑛: the histogram becomes a
better picture of the sampling distribution of the sample average, which is close to a normal
distribution. Increasing the sample size 𝑘 as well makes the normal shape more pronounced and the
histogram narrower.
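Example 2 can be sketched in Python instead of R or Excel. This is a minimal illustration, not the author's procedure: the seed, the 5,000 replications and the crude text histogram are assumptions. Here `k` is the size of each sample and `n_samples` is the number of samples (𝑛 in the example).

```python
import random
import statistics

def sample_averages(n_samples, k, seed=1):
    """Draw n_samples samples of size k from a fair six-sided die; return their averages."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.randint(1, 6) for _ in range(k))
            for _ in range(n_samples)]

def text_histogram(values, bins=12, width=50):
    """Print a crude text histogram: one row of '#' per equal-width bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    top = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:5.2f} | {'#' * (c * width // top)}")

# k = 10 rolls per sample as in Example 2; many more samples than n = 20 for a smoother picture
avgs = sample_averages(n_samples=5000, k=10)
print("mean of averages:", round(statistics.fmean(avgs), 3))   # near mu = 3.5
print("sd of averages:  ", round(statistics.stdev(avgs), 3))   # near 1.7078 / sqrt(10), about 0.540
text_histogram(avgs)
```

Raising `n_samples` smooths the histogram. Raising `k` narrows it, since its spread is about 𝜎⁄√𝑘.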

4. Let 𝜃̂ be an estimate of the parameter 𝜃 calculated from a sample of size 𝑛. The uncertainty of the
estimated value is measured by the standard error of the estimate, i.e. the standard deviation of its
sampling distribution. When 𝜃̂ is the sample average 𝑥̅, the standard error is

$$SE(\hat{\theta}) = \frac{\sigma}{\sqrt{n}} \tag{1}$$

where 𝜎 is the population standard deviation. If the population standard deviation is unknown,
the standard error can be estimated using the sample standard deviation 𝑠:

$$SE(\hat{\theta}) = \frac{s}{\sqrt{n}} \tag{2}$$

Remarks:

• The standard error measures the randomness associated with 𝜃̂.
• Given the formula, the larger the sample size, the better the precision of the estimate 𝜃̂.
• The standard error of an estimate is not the same as the standard deviation:
  o 𝜎 measures the variability of individual observations around the population mean;
  o 𝑆𝐸(𝜃̂) measures the precision of an estimate, i.e. the variability of 𝜃̂ across repeated samples.

Example 3. Let 𝑋 be a random variable that represents the result of rolling a fair die. Let us
calculate the mean and the variance of 𝑋. Using the pmf of 𝑋, with 𝑃(𝑋 = 𝑥𝑖) = 1⁄6 for
𝑥𝑖 = 1, …, 6, we obtain

$$\mu = \sum_{i=1}^{6} x_i \, P(X = x_i) = 3.5, \qquad \sigma^2 = \sum_{i=1}^{6} (x_i - \mu)^2 \, P(X = x_i) = \frac{35}{12} \approx 2.916667$$

Therefore, the standard deviation of 𝑋 is 𝜎 = √2.916667 = 1.707825.

Example 4. Assume now that you roll a six-sided die 8 times and get the following results: 4, 5, 5, 6,
1, 6, 4, 2. The sample average is 𝑥̅ = 4.125 and the sample standard deviation is 𝑠 = 1.8077. Since
𝜎 is known from Example 3, the standard error of the sample average is
𝑆𝐸(𝑥̅) = 𝜎⁄√𝑛 = 1.707825⁄√8 = 0.603807.
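The arithmetic of Examples 3 and 4 can be checked with a short Python sketch (standard library only). `Fraction` keeps 𝜇 and 𝜎² exact, and `statistics.stdev` divides by 𝑛 − 1, which matches the sample standard deviation used in Example 4. The last printed value applies equation (2), i.e. the standard error one would report if 𝜎 were unknown. It is added for comparison and is not part of the original example.

```python
import math
import statistics
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                            # fair die: P(X = x) = 1/6
mu = sum(x * p for x in faces)                # population mean
var = sum((x - mu) ** 2 * p for x in faces)   # population variance
sigma = math.sqrt(var)                        # population standard deviation

rolls = [4, 5, 5, 6, 1, 6, 4, 2]              # data of Example 4
n = len(rolls)
xbar = statistics.mean(rolls)
s = statistics.stdev(rolls)                   # sample sd, divides by n - 1
se_sigma = sigma / math.sqrt(n)               # equation (1), sigma known
se_s = s / math.sqrt(n)                       # equation (2), sigma estimated by s

print(mu, var, round(sigma, 6))                                # 7/2 35/12 1.707825
print(xbar, round(s, 4), round(se_sigma, 6), round(se_s, 4))   # 4.125 1.8077 0.603807 0.6391
```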

3 Central Limit Theorem

5. The Central Limit Theorem (CLT) states that the sum of a large number of independent and
identically distributed (iid) random variables with finite variance is approximately normal. In this
section, I discuss the classical form of the CLT. Let 𝑋1, 𝑋2, …, 𝑋𝑛 be a sequence of iid random
variables, each with finite mean 𝐸(𝑋𝑖) = 𝜇 and finite variance 𝑉𝑎𝑟(𝑋𝑖) = 𝜎². Define the sum

$$Y_n = X_1 + \cdots + X_n \tag{3}$$

Because 𝑋1, …, 𝑋𝑛 are iid, we have

$$E(Y_n) = E(X_1 + \cdots + X_n) = E(X_1) + \cdots + E(X_n) = n\mu$$

and, by independence,

$$Var(Y_n) = Var(X_1 + \cdots + X_n) = Var(X_1) + \cdots + Var(X_n) = n\sigma^2$$

Then, the classical CLT states that the following transformation of 𝑌𝑛,

$$Z_n = \frac{Y_n - E(Y_n)}{\sqrt{Var(Y_n)}} = \frac{(X_1 + X_2 + \cdots + X_n) - n\mu}{\sigma\sqrt{n}} \tag{4}$$

converges in distribution to the standard normal random variable as 𝑛 goes to infinity, i.e.

$$\lim_{n \to \infty} P(Z_n \le z) = \Phi(z) \tag{5}$$

where Φ(𝑧) is the standard normal cdf.

6. The key point with the CLT is that, in many real applications, a random variable of interest can be
written as the sum of many independent random variables. In such situations, the CLT gives a
theoretically legitimate reason to use the normal model. Examples of such random variables include,
among others, measurement errors in laboratories, noise in communication and signal processing,
and changes in the prices of financial assets. Another interesting feature of the CLT is that the
distribution of the 𝑋𝑖's does not matter, as long as their variance is finite: they can be discrete,
continuous or mixed random variables. This can simplify calculations substantially when we deal
with a problem involving thousands of iid random variables. Instead of trying to find the exact
distribution of the sum of these variables, the CLT gives a quick approximate answer, provided that
one knows the mean and variance of these variables.3
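A small simulation can illustrate the claim that the shape of the 𝑋𝑖's distribution does not matter. The Python sketch below draws 𝑍𝑛 as defined in equation (4) for strongly skewed exponential summands and compares the empirical 𝑃(𝑍𝑛 ≤ 1) with Φ(1). The exponential distribution with 𝜆 = 1, the seed and the 10,000 replications are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

def standardized_sum(n, rng, lam=1.0):
    """Return Z_n = (Y_n - n*mu) / (sigma*sqrt(n)), with Y_n a sum of n Exp(lam) draws."""
    mu = sigma = 1.0 / lam                     # Exp(lam) has mean 1/lam and sd 1/lam
    y = sum(rng.expovariate(lam) for _ in range(n))
    return (y - n * mu) / (sigma * math.sqrt(n))

rng = random.Random(42)
reps = 10_000
phi1 = NormalDist().cdf(1.0)                   # Phi(1), about 0.8413
results = {}
for n in (1, 5, 30, 200):
    hits = sum(standardized_sum(n, rng) <= 1.0 for _ in range(reps))
    results[n] = hits / reps                   # empirical P(Z_n <= 1)
    print(f"n = {n:3d}: P(Z_n <= 1) ~ {results[n]:.4f}  vs  Phi(1) = {phi1:.4f}")
```

For 𝑛 = 1 the empirical value is close to 1 − 𝑒⁻² ≈ 0.865 rather than Φ(1) ≈ 0.841. As 𝑛 grows, the gap closes.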

Example. A bank teller serves customers standing in the queue one by one. It takes on average 2
minutes to serve a customer, with a variance of 1 (minute²). If the service times of different
customers are independent, what is the probability that the bank teller serves 50 customers in
less than 110 minutes? Let 𝑌50 = 𝑋1 + ⋯ + 𝑋50 be the total time needed to serve 50 customers.
Since we have a sequence of independent random variables with known 𝜇 and 𝜎², we apply the CLT:

$$P(Y_{50} \le 110) = P\left(Z_{50} \le \frac{110 - 50 \times 2}{\sqrt{50 \times 1}}\right) = P(Z \le 1.4142) \approx 0.9213$$
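The bank-teller calculation can be reproduced with `statistics.NormalDist` from Python's standard library, whose `cdf` method plays the role of Φ. This is a sketch that uses the figures of the example (𝜇 = 2, 𝜎² = 1, 𝑛 = 50):

```python
import math
from statistics import NormalDist

n, mu, var = 50, 2.0, 1.0      # customers, mean and variance of one service time (minutes)
mean_y = n * mu                 # E(Y_50) = 100 minutes
sd_y = math.sqrt(n * var)       # sqrt(Var(Y_50)) = sqrt(50), about 7.07 minutes
z = (110 - mean_y) / sd_y       # standardized threshold, about 1.4142
prob = NormalDist().cdf(z)      # CLT approximation of P(Y_50 <= 110)
print(f"z = {z:.4f}, P(Y_50 <= 110) ~ {prob:.4f}")
```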

7. Continuity correction to the CLT. When the random variables 𝑋1, …, 𝑋𝑛 are integer-valued (e.g.
counts or Bernoulli trials), their sum 𝑌 = 𝑋1 + ⋯ + 𝑋𝑛 can take only integer values. Applying the
CLT, and thus using the normal model for 𝑌, can induce an approximation error, since the normal
distribution is continuous while the underlying variable of interest is discrete. The continuity
correction to the CLT yields a better approximation by overcoming this difficulty. Specifically, if one
is interested in evaluating the probability 𝑃(𝐴) = 𝑃(𝑦⁻ ≤ 𝑌 ≤ 𝑦⁺), where 𝑦⁻ and 𝑦⁺ are integers,
the following correction can be applied before using the normal model:

$$P(A) = P(y^- \le Y \le y^+) = P\left(y^- - 0.5 \le Y \le y^+ + 0.5\right)$$

3 This section is based on https://www.probabilitycourse.com/chapter7/7_1_2_central_limit_theorem.php


Example. On average, 8% of all children are nearsighted. Let us calculate the probability of finding
at most 20 nearsighted children in a random sample of 200 children. The exact probability can be
found by summing the binomial probabilities, 𝑃(𝑋 ≤ 20) = 𝑃(𝑋 = 0) + 𝑃(𝑋 = 1) + ⋯ + 𝑃(𝑋 = 20),
with 𝑋 ∼ 𝐵(𝑛 = 200, 𝑝 = 0.08). Using software, we obtain 𝑃(𝑋 ≤ 20) = 0.8775.4 The same
probability calculated with the CLT is

$$P(X \le 20) \approx P\left(Z \le \frac{20 - 200 \times 0.08}{\sqrt{200 \times 0.08 \times 0.92}}\right) = P(Z \le 1.0426) = 0.8514$$

The approximation error is apparent. Applying the continuity correction, however, we get
𝑃(𝑋 ≤ 20 + 0.5) = 𝑃(𝑍 ≤ 1.1729) = 0.8796, so the result is considerably improved.
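The three numbers in the nearsighted-children example (exact binomial, plain CLT, CLT with continuity correction) can be compared in a few lines. This Python sketch replaces R's `pbinom` with a hand-written binomial cdf, `binom_cdf` (a hypothetical helper that sums the binomial pmf using `math.comb`), and uses `statistics.NormalDist` for Φ.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p), obtained by summing the binomial pmf."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p, k = 200, 0.08, 20
mean, sd = n * p, math.sqrt(n * p * (1 - p))   # 16 and sqrt(14.72)
phi = NormalDist().cdf

exact = binom_cdf(k, n, p)                     # the notes report 0.8775
plain = phi((k - mean) / sd)                   # CLT without correction
corrected = phi((k + 0.5 - mean) / sd)         # continuity correction: k -> k + 0.5
print(f"exact = {exact:.4f}, CLT = {plain:.4f}, CLT + correction = {corrected:.4f}")
```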

4 CLT for sample average

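8. Using the previous result, we can obtain the sampling distribution of the sample average
𝑥̅ = (1⁄𝑛)(𝑋1 + ⋯ + 𝑋𝑛). First, rewrite the CLT as

$$\lim_{n \to \infty} P\left(\frac{(X_1 + X_2 + \cdots + X_n) - n\mu}{\sigma\sqrt{n}} \le z\right) = \Phi(z) \tag{6}$$

and divide the numerator and the denominator of the fraction by 𝑛:

$$\lim_{n \to \infty} P\left(\frac{\frac{1}{n}\left((X_1 + X_2 + \cdots + X_n) - n\mu\right)}{\frac{1}{n}\,\sigma\sqrt{n}} \le z\right) = \Phi(z)$$

which, on rearranging, implies

$$\lim_{n \to \infty} P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z\right) = \Phi(z) \tag{7}$$

Therefore, for large 𝑛, the sample average is approximately normal, 𝑥̅ ∼ 𝑁(𝜇, 𝜎⁄√𝑛), where, as
elsewhere in these notes, the second argument of 𝑁(·,·) is the standard deviation. If the 𝑋𝑖 are
themselves normal, this result holds exactly for every 𝑛 (see the Appendix).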

Example. The grades of a statistics class are assumed to be a normal random variable with mean
𝜇 = 11.4 and standard deviation 𝜎 = 3.5. If one chooses, say, 36 students at random, then the
sample average is also a normal random variable, 𝑥̅ ∼ 𝑁(11.4, 3.5⁄√36), i.e. with standard error
3.5⁄6 ≈ 0.583.
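The grades example can be checked by simulation: draw many samples of 36 grades from 𝑁(11.4, 3.5), average each one, and compare the spread of the averages with 3.5⁄√36. This is an illustrative Python sketch; the seed and the 10,000 replications are arbitrary choices.

```python
import math
import random
import statistics

mu, sigma, n = 11.4, 3.5, 36        # grade distribution and sample size from the example
se = sigma / math.sqrt(n)           # theoretical sd of the sample average: 3.5 / 6

rng = random.Random(7)
# 10,000 repeated samples of 36 grades each; keep only each sample's average
means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
         for _ in range(10_000)]
print(f"mean of averages {statistics.fmean(means):.3f}, "
      f"sd of averages {statistics.stdev(means):.3f}, sigma/sqrt(n) {se:.3f}")
```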

4 In R, use pbinom(q = 20, size = 200, prob = 0.08).


5 CLT for sample proportion

9. Let 𝑋1, …, 𝑋𝑛 be a sequence of iid Bernoulli random variables, each with success probability 𝑝.
Define 𝑌 as the number of successes in this sequence. Then, it is well known that 𝑌 follows a binomial
distribution, 𝑌 ∼ 𝐵(𝑛, 𝑝), with 𝐸(𝑌) = 𝑛𝑝 and 𝑉𝑎𝑟(𝑌) = 𝑛𝑝(1 − 𝑝).

Consider now the sample proportion of successes 𝑝̂ = 𝑌⁄𝑛, i.e. the number of successes observed in
the sequence divided by the number of trials 𝑛. The expected value of the sample proportion is
𝐸(𝑝̂) = 𝑛𝑝⁄𝑛 = 𝑝. Because 𝑉𝑎𝑟(𝑌) = 𝑛𝑝(1 − 𝑝) and 𝑉𝑎𝑟(𝑝̂) = 𝑉𝑎𝑟(𝑌)⁄𝑛², the standard error of the
sample proportion 𝑝̂ is given by

$$SE(\hat{p}) = \sqrt{Var(\hat{p})} = \frac{\sqrt{Var(Y)}}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}} \tag{8}$$

By virtue of the CLT, it can be shown that

$$\lim_{n \to \infty} P\left(\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \le z\right) = \Phi(z) \tag{9}$$

Therefore, the sample proportion is approximately normal, 𝑝̂ ∼ 𝑁(𝑝, √(𝑝(1 − 𝑝)⁄𝑛)), if 𝑛𝑝 and
𝑛(1 − 𝑝) are both greater than 15.

Example. Assume you flip a fair coin 100 times. Denote by 𝑝 the probability of observing heads and
by 𝑝̂ the proportion of heads observed in this experiment. Since 𝑛𝑝 = 𝑛(1 − 𝑝) = 50 are both greater
than 15, the sampling distribution of the sample proportion is nearly normal with expected value
𝑝 = 0.5 and standard error 𝑆𝐸(𝑝̂) = √(0.5 × (1 − 0.5)⁄100) = 0.05.

Example. Assume you want to calculate the probability of getting heads at most 45 times in this
experiment. We have 𝑝̂ ∼ 𝑁(0.5, 0.05). Therefore, 𝑃(𝑝̂ ≤ 0.45) = 𝑃(𝑍 ≤ (0.45 − 0.5)⁄0.05) =
𝑃(𝑍 ≤ −1) = 0.159.
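The two coin examples reduce to equation (8) and one evaluation of the normal cdf. In the Python sketch below, note that `statistics.NormalDist(mu, sigma)` is parameterized by the standard deviation, which matches the 𝑁(𝜇, 𝜎) convention used in these notes. `se_proportion` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def se_proportion(p, n):
    """Standard error of the sample proportion, equation (8): sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

p, n = 0.5, 100
assert n * p > 15 and n * (1 - p) > 15      # rule of thumb stated in the notes
se = se_proportion(p, n)                     # 0.05
p_hat_dist = NormalDist(p, se)               # approximate sampling distribution of p_hat
prob = p_hat_dist.cdf(0.45)                  # P(p_hat <= 0.45) = P(Z <= -1)
print(f"SE = {se:.4f}, P(p_hat <= 0.45) ~ {prob:.4f}")
```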

Appendix. Sampling distribution of sample mean

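Let 𝑋 ∼ 𝑁(𝜇, 𝜎) and consider a sequence of repeated samplings from 𝑋 of fixed size 𝑛. Define the
sample mean

$$\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$$

Since the 𝑋𝑖 are drawn independently from 𝑋, they are iid with the same probability distribution,
i.e. 𝑁(𝜇, 𝜎). The moment generating function of 𝑋̅ can then be developed as

$$M_{\bar{X}}(\theta) = M_{\frac{1}{n}(X_1 + \cdots + X_n)}(\theta) = M_{X_1 + \cdots + X_n}\left(\frac{\theta}{n}\right) = M_{X_1}\left(\frac{\theta}{n}\right) \times \cdots \times M_{X_n}\left(\frac{\theta}{n}\right)$$

where the last equality holds because the 𝑋𝑖's are independent. Since they are all drawn from the
same probability distribution, the factors are identical:

$$M_{X_1}\left(\frac{\theta}{n}\right) = \cdots = M_{X_n}\left(\frac{\theta}{n}\right) = M_X\left(\frac{\theta}{n}\right)$$

At the outset, we defined 𝑋 ∼ 𝑁(𝜇, 𝜎), whose moment generating function is
𝑀𝑋(𝑡) = exp(𝜇𝑡 + 𝜎²𝑡²⁄2). Therefore,

$$M_{\bar{X}}(\theta) = \left(M_X\left(\frac{\theta}{n}\right)\right)^n = \left(\exp\left(\mu\frac{\theta}{n} + \frac{1}{2}\sigma^2\left(\frac{\theta}{n}\right)^2\right)\right)^n = \exp\left(\mu\theta + \frac{1}{2}\frac{\sigma^2}{n}\theta^2\right) = \exp\left(\mu\theta + \frac{1}{2}\left(\frac{\sigma}{\sqrt{n}}\right)^2\theta^2\right)$$

The right-hand side is the moment generating function of a 𝑁(𝜇, 𝜎⁄√𝑛) random variable. Since a
moment generating function uniquely determines the distribution, 𝑋̅ ∼ 𝑁(𝜇, 𝜎⁄√𝑛).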
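The MGF result can be checked numerically by Monte Carlo: estimate 𝐸[exp(𝜃𝑋̅)] from simulated samples and compare it with exp(𝜇𝜃 + ½(𝜎⁄√𝑛)²𝜃²). The values 𝜇 = 1, 𝜎 = 2, 𝑛 = 5 and 𝜃 = 0.3 are arbitrary illustration choices, not taken from the notes. A close match supports the derivation but of course does not prove it.

```python
import math
import random
import statistics

# Hypothetical illustration values (not from the notes)
mu, sigma, n, theta = 1.0, 2.0, 5, 0.3
rng = random.Random(3)

def sample_mean():
    """Mean of one sample of size n drawn from N(mu, sigma)."""
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))

reps = 50_000
# Monte Carlo estimate of M_Xbar(theta) = E[exp(theta * Xbar)]
mgf_mc = statistics.fmean(math.exp(theta * sample_mean()) for _ in range(reps))
# Closed form derived above: exp(mu*theta + (1/2) * (sigma/sqrt(n))**2 * theta**2)
mgf_formula = math.exp(mu * theta + 0.5 * (sigma / math.sqrt(n)) ** 2 * theta ** 2)
print(f"Monte Carlo: {mgf_mc:.4f}   formula: {mgf_formula:.4f}")
```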

End-of-chapter exercises for Sampling distributions

Exercise 1. Identifying the parameter

For each of the following situations, state whether the parameter of interest is a mean or a
proportion. (1) In a survey, one hundred college students are asked how many hours per week
they spend on the Internet. (2) In a survey, one hundred college students are asked what
percentage of the time they spend on the Internet is part of their course work. (3) In a survey, one
hundred college students are asked whether or not they have ever cited information from Wikipedia
in their papers. (4) In a survey, one hundred college students are asked the weekly amount of
money they spend on alcoholic beverages.

Exercise 2. Identifying the parameter

For each of the following situations, state whether the parameter of interest is a mean or a
proportion. (1) A poll shows that 64% of Europeans worry a great deal about climate change. (2)
A survey reports that local TV news has shown a 17% increase in revenue between 2009 and 2011
while newspaper revenues decreased by 6.4% during this time period. (3) In a survey, high school
and college students are asked whether or not they use geolocation services on their smartphones.
(4) In a survey, smartphone users are asked whether or not they use a web-based catering service.
(5) In a survey, smartphone users are asked how many times they check their home screens without
receiving a call or a message signal.

Exercise 3. Parameters vs. statistics

Suppose you flip a coin 100 times. It lands on heads 45 times. What is the proportion of heads in
this experiment? Is this a statistic or a population parameter? Is the value of the population
proportion of heads known? What is this value, assuming the coin is fair?

Exercise 4. Parameters vs. statistics

A company sells cement in 5 kg packs. Suppose you buy 20 such packs and weigh each of them.
The mean weight of the sample is 5.12 kg. Identify the parameter of interest and the sample
statistic in this situation. Is it a mean or a proportion? What is the value of the population
parameter? What is the value of the sample statistic? What is the sample size?

Exercise 5. Parameters vs. statistics

A college counselor is interested in estimating how many credits a student typically enrolls in each
semester. The counselor decides to randomly sample 100 students by using the registrar’s
database of students. The histogram below shows the distribution of the number of credits taken
by these students. Sample statistics for this distribution are given next to the graph.

[Figure: histogram of the number of credits taken by the 100 sampled students]

Minimum         8.00
1st quartile   13.00
Median         14.00
Mean           13.65
Standard dev.   1.91
3rd quartile   15.00
Maximum        18.00

(1) What is the estimated average number of credits taken per semester by students at this
college? What about the median? (2) What is the sample standard deviation of the number of
credits taken per semester by students at this college? What about the interquartile range (IQR)?
(3) Is a load of 16 credits unusually high for this college? What about 18 credits? Explain your
reasoning. (4) The college counselor takes another random sample of 100 students and this time
finds a sample mean of 14.02 credits. Should she be surprised that this sample statistic is slightly
different from the one in the original sample? Explain your reasoning. (5) The sample mean
given next to the histogram is a point estimate for the mean number of credits taken by all students
at that college. What measure do we use to quantify the variability of this estimate? Compute this
quantity using the data provided.

Exercise 6. Parameters vs. statistics

Researchers studying anthropometry collected body girth measurements and skeletal diameter
measurements, as well as age, height and gender, for 507 physically active individuals.5 The
histogram below shows the sample distribution of heights in centimeters.

[Figure: histogram of heights (cm) of the 507 physically active individuals]

Minimum        147.2
1st quartile   163.8
Median         170.3
Mean           171.1
Standard dev.    9.4
3rd quartile   177.8
Maximum        198.1

5 Source: Diez et al. (2015, p. 204).


(1) What is the estimated average height of active individuals? (2) What is the estimated sample
standard deviation of the heights of active individuals? (3) Is a person who is 180 cm tall
considered unusually tall? And is a person who is 155 cm tall considered unusually short? Explain
your reasoning. (4) The researchers take another random sample of physically active individuals.
Would you expect the mean and the standard deviation of this new sample to be exactly the ones given
above? Explain your reasoning. (5) The sample means obtained are point estimates for the mean
height of all active individuals, if the sample of individuals is equivalent to a simple random
sample. What measure do we use to quantify the variability of such an estimate? Compute this
quantity using the data from the original sample given above.

Exercise 7. Sampling distributions

The number of eggs laid by a certain species of hen during their breeding period has a mean of 35
eggs and a standard deviation of 18.2.6 A group of researchers randomly samples 45 hens of this
species, counts the number of eggs laid during their breeding period, and records the sample
average. They repeat this process 1,000 times and build a distribution of the sample averages
obtained this way. (1) What is this distribution called? (2) Would you expect the shape of this
distribution to be symmetric or asymmetric? Explain your reasoning. (3) Calculate the variability
of this distribution and state the appropriate term used to refer to this value. (4) Suppose the
researchers’ budget is reduced and they are only able to collect random samples of 10 hens. The
sample mean of the number of eggs is recorded, and we repeat this 1,000 times, and build a new
distribution of sample means. How will the variability of this new distribution compare to the
variability of the original distribution?7

Exercise 8. Sampling distributions

Test scores for a standardized test are distributed as approximately normal with mean 700 and
standard deviation 80, 𝑋 ∼ 𝑁(700, 80). (1) Assume a random sample of 16 test results is taken.
What is the expected value of the sample average? What is the standard deviation of the sample
average (its standard error)? (2) Calculate the same quantities, i.e. the expected value and the
standard deviation of the sample average, for another random sample of size 100.8

6 Source: Diez et al. (2015, p. 205).


7 (1) This is called the sampling distribution of the sample mean. (2) Since each sample contains 45
hens, the CLT suggests that this distribution is approximately normal, hence roughly symmetric
around the population mean. (3) The variability of this distribution is the standard error of the
sample mean. It is given by the ratio of the population standard deviation, which is given here, to the
square root of the sample size 𝑛 = 45 (the 1,000 repetitions do not enter the formula). We have
𝑆𝐸(𝑥̅) = 18.2⁄√45 = 2.7131. (4) If the researchers take 10 observations in each sample, the new
standard error of the sample means will be 𝑆𝐸(𝑥̅) = 18.2⁄√10 = 5.7553, which is √(45⁄10) ≈ 2.12 times
the standard error obtained with samples of 45 hens.
8 The expected value of the sample average is equal to the population mean, so 𝐸(𝑥̅) = 700. The
standard error of the mean is 𝑆𝐸(𝑥̅) = 80⁄√16 = 20. If one takes a random sample of size 100, the
expected value of the sample mean remains the same, i.e. 700. The standard error of the sample
mean, however, is lower: 𝑆𝐸(𝑥̅) = 80⁄√100 = 8.
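The standard errors in Exercises 7 and 8 follow from 𝜎⁄√𝑛. In Exercise 7, the sample size is the number of hens in each sample (45, then 10). The 1,000 repetitions only determine how many sample averages are collected and do not enter the formula. This is a Python sketch, with `standard_error` as a hypothetical helper name:

```python
import math

def standard_error(sigma, n):
    """SE of the sample mean: population sd divided by sqrt(sample size)."""
    return sigma / math.sqrt(n)

# Exercise 7: samples of 45 hens, then samples of 10 hens
se_45 = standard_error(18.2, 45)
se_10 = standard_error(18.2, 10)
print(round(se_45, 4), round(se_10, 4), round(se_10 / se_45, 4))   # ratio = sqrt(45/10)

# Exercise 8: E(x_bar) = 700 in both cases; only the SE shrinks as n grows
print(standard_error(80, 16), standard_error(80, 100))             # 20.0 8.0
```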
Exercise 9. Sampling distributions

A manufacturer of compact fluorescent light bulbs advertises that the distribution of the lifespans
of these light bulbs is nearly normal with a mean of 9,000 hours and a standard deviation of 1,000
hours. (1) What is the probability that a randomly chosen light bulb lasts more than 10,500 hours?
(2) Describe the distribution of the mean lifespan of 15 light bulbs. (3) What is the probability that
the mean lifespan of 15 randomly chosen light bulbs is more than 10,500 hours? (4) Sketch the
two distributions (population and sampling) on the same scale.

Exercise 10. Sampling distributions

The chief researcher of a chemistry laboratory runs an experiment with the help of her graduate
students, who carry out the experiment and report their measurements to her. Each student's
measurement, however, is subject to some variability: the standard deviation of the measurements
is assumed to be 𝜎 = 10 milligrams. The chief researcher repeats the measurement 8 times and
records the average. What is the standard deviation of the sample average? How many times must
the chief researcher repeat the experiment to reduce the standard deviation of the sample average
to 2.5 milligrams?9

Exercise 11. Sampling distributions

Roulette wheels are typically marked with the numbers 1 through 36 plus 0 and 00. Each of these
outcomes is equally likely every time the wheel is spun. In a simple version of the game, when a
player bets on a single number and wins, the payoff is 35:1; that is, if the player bets $1, he receives
$36 if he wins ($35 plus the initial $1 bet) and nothing if he loses. Suppose a player places a $1 bet
on his favorite number. (1) What is the casino's expected profit from a single bet? What is the
standard deviation of this profit? (2) Suppose the casino remains open 350 days a year and, on an
ordinary day, 500 such independent bets are placed on the roulette. What will be the standard
deviation of the mean profit per game?10
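The arithmetic of Exercises 10 and 11 (footnotes 9 and 10) can be reproduced as follows. This is a Python sketch in which the profit distribution encodes the casino's $1 gain with probability 37/38 and its $35 loss with probability 1/38, as in footnote 10.

```python
import math

# Exercise 10: sd of the average of 8 measurements, then the n giving an sd of 2.5 mg
sigma = 10.0
sd_avg_8 = sigma / math.sqrt(8)
n_needed = math.ceil((sigma / 2.5) ** 2)        # solve 2.5 = sigma / sqrt(n) for n
print(round(sd_avg_8, 4), n_needed)             # 3.5355 16

# Exercise 11: casino profit on one $1 single-number bet (38 equally likely slots)
profit_dist = {1.0: 37 / 38, -35.0: 1 / 38}     # profit -> probability
mu = sum(x * prob for x, prob in profit_dist.items())
var = sum((x - mu) ** 2 * prob for x, prob in profit_dist.items())
n_bets = 350 * 500                              # independent bets per year
sd_mean = math.sqrt(var) / math.sqrt(n_bets)    # sd of the mean profit per game
print(f"mu = {mu:.4f}, var = {var:.4f}, sd = {math.sqrt(var):.4f}, sd of mean = {sd_mean:.4f}")
```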

Exercise 12. Central Limit Theorem

9 𝜎𝑥̅ = 𝜎⁄√𝑛 = 10⁄√8 ≈ 3.54 milligrams, and 2.5 = 10⁄√𝑛 → √𝑛 = 4 → 𝑛 = 16 times.


10 There are 38 slots on the wheel, so the probability of winning is 1/38. If the player wins, the casino
loses $35; otherwise it makes a profit of $1. The mean profit of the casino from a single bet is then
𝜇 = $1 × (37⁄38) + (−$35) × (1⁄38) = $0.0526. The variance of the profit from a single bet is
𝜎² = (1 − 0.0526)² × (37⁄38) + (−35 − 0.0526)² × (1⁄38) = 33.2077, so 𝜎 = $5.7626. If there are
350 × 500 = 175,000 independent bets, the standard deviation of the mean profit per game is
𝜎⁄√𝑛 = √33.2077⁄√175000 = $0.0138, which is much lower than the standard deviation of a single
bet. So, as the number of independent bets increases, the average income of the casino becomes
less risky.
Assume there are 100 travelers on a plane. Let 𝑋𝑖 be the weight (in pounds) of the 𝑖th traveler on
the plane. Suppose 𝑋𝑖 ’s are i.i.d. random variables with 𝐸(𝑋𝑖 ) = 170 and 𝑉𝑎𝑟(𝑋𝑖 ) = 900. Find the
probability that the total weight of the travelers on the plane exceeds 18,000 pounds.11

Exercise 13. Central Limit Theorem

You have invited 64 guests to a party. You need to make sandwiches for them. You believe that a
guest might need 0, 1 or 2 sandwiches with probabilities 0.25, 0.5 and 0.25 respectively. Assume
that the number of sandwiches needed for each guest is independent of the other guests. How
many sandwiches should you make so that you are 95% sure that there is no shortage?12

Exercise 14. Central Limit Theorem

The weights of Granny Smith apples from a large orchard follow a normal distribution with mean
380 g and standard deviation 28 g. (1) A single apple is randomly selected from this orchard. What
is the probability that it weighs more than 400 g? (2) The farmer sells the apples in crates which
contain 26 apples. What is the probability that a given crate weighs more than 10 kg?13

Exercise 15. Central Limit Theorem

An insurance company sells, among others, car insurance policies. Suppose a standardized
contract sells at $100 per year (cash into the company). In case of a claim, the company assumes
that the coverage cost per policy is $1000 (cash out of the company). Statistics suggest that 1%
of all contracts will make a claim during a given year, while 99% of the customers will make no
claim at all. Assume there are 𝑛 = 1000 such car policies sold. Using the Central Limit Theorem,
calculate the probability that the company's earnings from the car insurance policies sold are
higher than $80,000.14

11 Source: https://www.probabilitycourse.com/chapter7/7_1_3_solved_probs.php. Let 𝑌 = 𝑋1 + ⋯ + 𝑋100.
The CLT states that the variable 𝑍 = (𝑌 − 𝐸(𝑌))⁄√𝑉𝑎𝑟(𝑌) is approximately 𝑁(0, 1). Here
𝐸(𝑌) = 𝑛𝐸(𝑋𝑖) = 17000 and 𝑉𝑎𝑟(𝑌) = 𝑛𝑉𝑎𝑟(𝑋𝑖) = 100 × 900 = 90000, so √𝑉𝑎𝑟(𝑌) = 300. It turns out
that 𝑃(𝑌 > 18000) ≈ 𝑃(𝑍 > (18000 − 17000)⁄300) = 𝑃(𝑍 > 3.3333) = 1 − Φ(3.3333) = 4.3 × 10⁻⁴.
12 Source: https://www.probabilitycourse.com/chapter7/7_1_3_solved_probs.php. We need to make sure,
with 95% probability, that the number of sandwiches eaten, 𝑌, will be at most the number of
sandwiches made, 𝑦. If 𝑋𝑖 is the number of sandwiches eaten by a typical guest, then 𝐸(𝑋𝑖) = 1 and
𝑉𝑎𝑟(𝑋𝑖) = 0.5. Using the CLT, we have

$$P(Y \le y) = 0.95 \;\rightarrow\; P\left(Z \le \frac{y - 64 \times 1}{\sqrt{64 \times 0.5}}\right) = 0.95 \;\rightarrow\; \Phi\left(\frac{y - 64}{\sqrt{32}}\right) = 0.95 \;\rightarrow\; \frac{y - 64}{\sqrt{32}} = \Phi^{-1}(0.95) = 1.6448$$

Solving for 𝑦, we get 𝑦 = 73.3. Therefore, one needs to make at least 74 sandwiches.
13 Orchard → Verger (fr.). For a single apple, 𝑃(𝑋 ≥ 400) = 𝑃(𝑍 ≥ (400 − 380)⁄28) = 𝑃(𝑍 ≥ 0.7143) =
23.75%. The weights of the 26 apples in a crate can be treated as iid random variables. The expected
total weight of 26 such apples is thus 26 × 380 = 9880 g, and the standard deviation of the total
weight is √26 × 28 = 142.7725 g. Using the CLT (exact here, since the weights are normal),
𝑃(𝑌 ≥ 10000) = 𝑃(𝑍 ≥ (10000 − 9880)⁄142.7725) = 𝑃(𝑍 ≥ 0.8405) = 20.03%.
14 Let 𝑋𝑖 be the earnings from car insurance policy 𝑖. Then 𝐸(𝑋𝑖) = 1% × (−1000) + 99% × 100 = $89
and 𝑉𝑎𝑟(𝑋𝑖) = 11979. If the company sells 1000 contracts, the total earnings are the sum of an iid
sequence, 𝑌1000 = 𝑋1 + ⋯ + 𝑋1000. Using the CLT, we calculate
𝑃(𝑌1000 ≥ 80000) ≈ 𝑃(𝑍 ≥ (80000 − 1000 × 89)⁄√(1000 × 11979)) = 𝑃(𝑍 ≥ −2.6).
This probability is equal to 1 − 𝑃(𝑍 ≤ −2.6) = 99.53%.
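Exercises 12 to 15 all approximate a sum of 𝑛 iid variables by 𝑁(𝑛𝜇, √𝑛 𝜎). The Python sketch below wraps that step in a hypothetical helper, `sum_normal_approx`, built on `statistics.NormalDist` (whose `inv_cdf` gives Φ⁻¹), and recomputes the values in footnotes 11 to 14. The apple crate of Exercise 14 is handled the same way, although there the normal model is exact because the individual weights are normal.

```python
import math
from statistics import NormalDist

def sum_normal_approx(n, mean, var):
    """CLT approximation for Y = X_1 + ... + X_n with iid X_i: N(n*mean, sqrt(n*var))."""
    return NormalDist(n * mean, math.sqrt(n * var))

# Exercise 12: total weight of 100 travelers, P(Y > 18,000)
p12 = 1 - sum_normal_approx(100, 170, 900).cdf(18000)

# Exercise 13: smallest y with P(Y <= y) >= 0.95; each guest eats 0, 1 or 2 sandwiches
y13 = sum_normal_approx(64, 1.0, 0.5).inv_cdf(0.95)
sandwiches = math.ceil(y13)

# Exercise 14: one apple heavier than 400 g, and a crate of 26 apples heavier than 10,000 g
p14_single = 1 - NormalDist(380, 28).cdf(400)
p14_crate = 1 - sum_normal_approx(26, 380, 28 ** 2).cdf(10000)

# Exercise 15: earnings per policy are 100 (prob 0.99) or -1000 (prob 0.01)
m15 = 0.99 * 100 + 0.01 * (-1000)
v15 = 0.99 * 100 ** 2 + 0.01 * 1000 ** 2 - m15 ** 2
p15 = 1 - sum_normal_approx(1000, m15, v15).cdf(80000)

print(f"{p12:.2e} {y13:.2f} {sandwiches} {p14_single:.4f} {p14_crate:.4f} {p15:.4f}")
```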
Exercise 16. Central Limit Theorem

An insurance company sells, among other types of policies, homeowners insurance policies that
cover losses and damages to an individual's house and the assets in the home. The policy most
frequently purchased by customers sells for $200 a year. Over an ordinary year, the probability
that a homeowner files a claim is 1 in 1,000. In case of a claim, the company pays out, on average,
$5000 to the customer as compensation; this amount can clearly vary a lot from one case to
another. So, the average loss from a policy in case of a claim is $4,800 ($5,000 compensation minus
the $200 price paid by the customer). (1) Assume the company sells only 100 such policies.
Calculate the probability that the total profit to the company from these policies will be between
$19,000 and $20,000. (2) Assume now that this company sells 10,000 such policies. Calculate the
probability that the total profit to the company will be between $1,900,000 and $2,000,000.

Exercise 17. Central Limit Theorem

In a communication system, each data packet consists of 1000 bits. Due to noise, each bit may be
received in error with probability 0.1. It is assumed that errors occur independently. Find the
probability that there are more than 120 errors in a certain data packet.15

Exercise 18. Central Limit Theorem

An instructor asks students to send their assignments in pdf format only. However, from
experience, he knows that on average 10% of all students return their assignment without
converting the original document to pdf. During the current academic year, he is going to teach
300 students enrolled in different programs of the college he works for. Using the CLT, calculate
the probability that more than 20 students will not submit their assignment in pdf format.16

Exercise 19. Central Limit Theorem

An insurance company knows that the mean loss from fire for the entire population is 𝜇 = 250 €
and the standard deviation of the loss is 𝜎 = 1000 €. If the company sells 10,000 policies, what is the
probability that the average loss per policy will be greater than 275 €? (Note: The distribution of
losses is actually strongly skewed to the right, because many policies have a loss of 0 € (no fire) but
a few have very large losses.)17

15 Source: https://www.probabilitycourse.com/chapter7/7_1_2_central_limit_theorem.php. The error
indicators of the bits in a data packet form a sequence of independent Bernoulli random variables,
𝑋𝑖 ∼ 𝐵𝑒(𝑝 = 0.1). Note that 𝐸(𝑋𝑖) = 𝑝 and 𝑉𝑎𝑟(𝑋𝑖) = 𝑝(1 − 𝑝). If 𝑌 = 𝑋1 + ⋯ + 𝑋𝑛 is the total
number of errors in the packet, then

$$P(Y > 120) \approx P\left(Z > \frac{120 - np}{\sqrt{np(1-p)}}\right) = P\left(Z > \frac{120 - 100}{\sqrt{90}}\right) = 0.0175$$

16 Let 𝑋𝑖 ∼ 𝐵𝑒(𝑝 = 0.1) indicate whether student 𝑖 does not convert the assignment to pdf; 𝑝 = 0.1
is the probability of not converting. 𝐸(𝑋𝑖) = 0.1 and 𝑉𝑎𝑟(𝑋𝑖) = 0.09. Let 𝑌 = 𝑋1 + ⋯ + 𝑋300. Then,
the probability that more than 20 students do not convert their assignments to pdf is

$$P(Y > 20) \approx P\left(Z > \frac{20 - 300 \times 0.1}{\sqrt{300 \times 0.1 \times 0.9}}\right) = P(Z > -1.9245) = 0.9729$$
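The CLT calculations for Exercises 17 to 19 (footnotes 15 to 17) take a few lines each. This Python sketch applies the approximation without continuity correction, as the footnotes do. `prob_sum_exceeds` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

phi = NormalDist().cdf      # standard normal cdf

def prob_sum_exceeds(threshold, n, p):
    """CLT approximation of P(Y > threshold) for Y ~ B(n, p), without continuity correction."""
    return 1 - phi((threshold - n * p) / math.sqrt(n * p * (1 - p)))

p17 = prob_sum_exceeds(120, 1000, 0.1)     # Exercise 17: more than 120 bit errors
p18 = prob_sum_exceeds(20, 300, 0.1)       # Exercise 18: more than 20 non-pdf submissions

# Exercise 19: average loss over 10,000 policies with mu = 250 and sigma = 1000
se19 = 1000 / math.sqrt(10_000)            # SE of the sample average = 10
p19 = 1 - phi((275 - 250) / se19)          # P(x_bar > 275)

print(f"{p17:.4f} {p18:.4f} {se19} {p19:.4f}")
```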

Exercise 20. Central Limit Theorem*

Let 𝑋1, …, 𝑋𝑛 be a sequence of iid exponential random variables with 𝜆 = 1. Define
𝑥̅ = (1⁄𝑛)(𝑋1 + ⋯ + 𝑋𝑛). How large should 𝑛 be so that 𝑃(0.9 ≤ 𝑥̅ ≤ 1.1) ≥ 0.95? Hint: For an
exponential random variable, 𝐸(𝑋𝑖) = 1⁄𝜆 and 𝑉𝑎𝑟(𝑋𝑖) = 1⁄𝜆².18

Exercise 21. Central Limit Theorem

For a simulation study, a sequence of 1000 random numbers between 0 and 1 is generated from a
uniform distribution, 𝑋𝑖 ∼ 𝑈𝑛𝑖(𝑎 = 0, 𝑏 = 1). Using the CLT, calculate the probability that the sum of
the numbers in this sequence will lie between 490 and 510. Hint: The expected value and variance
of a uniform random variable between the bounds 𝑎 and 𝑏, 𝑋𝑖 ∼ 𝑈𝑛𝑖(𝑎, 𝑏), are
𝐸(𝑋𝑖) = (𝑎 + 𝑏)⁄2 and 𝑉𝑎𝑟(𝑋𝑖) = (𝑏 − 𝑎)²⁄12; here 𝐸(𝑋𝑖) = 1⁄2.19

Exercise 22. Continuity correction to CLT

Assume you flip a fair coin 20 times. Calculate the probability of getting heads between 8 and 12
times, i.e. 𝑃(8 ≤ 𝐻𝑒𝑎𝑑𝑠 ≤ 12), using (1) the exact binomial model, and (2) the CLT. Compare the
results and apply the continuity correction to the CLT to improve the approximation.20
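The three numbers in footnote 20 (exact binomial, plain CLT, CLT with continuity correction) can be reproduced with the standard library; `math.comb` gives the binomial coefficients. A minimal sketch:

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5
# (1) exact binomial: P(8 <= X <= 12) = sum of P(X = k) for k = 8..12
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(8, 13))
# (2) CLT: X approx N(np, sqrt(np(1-p)))
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
plain = approx.cdf(12) - approx.cdf(8)
# continuity correction: widen the interval by 0.5 on each side
corrected = approx.cdf(12.5) - approx.cdf(7.5)
print(f"exact={exact:.4f}  CLT={plain:.4f}  CLT+cc={corrected:.4f}")
```

The correction works because each integer value 𝑘 of the binomial corresponds to the interval [𝑘 − 0.5, 𝑘 + 0.5] under the continuous normal curve.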

17 The expected value of the sample average is 250 and its standard deviation is $SE(\bar{x}) = \sigma/\sqrt{10000} = 1000/100 = 10$. Using the CLT, we find $P(\bar{x} \ge 275) = P(Z \ge (275 - 250)/10) = P(Z \ge 2.5) = 0.0062$. The strong skewness of the losses can be ignored thanks to the large sample size.
18 Source: https://www.probabilitycourse.com/chapter7/7_1_3_solved_probs.php. Let $Y = X_1 + \cdots + X_n$. Then, $E(Y) = nE(X_i) = n$ and $Var(Y) = nVar(X_i) = n$ since 𝜆 = 1 (given). Note that $\bar{X} = Y/n$. Therefore, applying the CLT,
$$P\left(0.9 \le \frac{Y}{n} \le 1.1\right) \ge 0.95 \;\Rightarrow\; P(0.9n \le Y \le 1.1n) \ge 0.95 \;\Rightarrow\; P\left(\frac{0.9n - n}{\sqrt{n}} \le \frac{Y - n}{\sqrt{n}} \le \frac{1.1n - n}{\sqrt{n}}\right) \ge 0.95.$$
This simplifies to $P\left(-0.1\sqrt{n} \le \frac{Y - n}{\sqrt{n}} \le 0.1\sqrt{n}\right) \ge 0.95$. The CLT implies that $\frac{Y - n}{\sqrt{n}}$ is approximately $N(0,1)$. So, $P(0.9 \le \bar{X} \le 1.1) \approx \Phi(0.1\sqrt{n}) - \Phi(-0.1\sqrt{n}) = 2\Phi(0.1\sqrt{n}) - 1$ since $\Phi(-x) = 1 - \Phi(x)$. We need $2\Phi(0.1\sqrt{n}) - 1 \ge 0.95 \Rightarrow \Phi(0.1\sqrt{n}) \ge 0.975 \Rightarrow 0.1\sqrt{n} \ge \Phi^{-1}(0.975) = 1.96 \Rightarrow n \ge 384.16$. As 𝑛 is an integer, we conclude 𝑛 ≥ 385.
19 With 𝑋𝑖 ∼ 𝑈𝑛𝑖(𝑎 = 0, 𝑏 = 1), 𝐸(𝑋𝑖) = 1⁄2 and 𝑉𝑎𝑟(𝑋𝑖) = 1⁄12. Let $Y = X_1 + \cdots + X_{1000}$, so that $E(Y) = 500$ and $Var(Y) = 1000/12 = 83.3333$. Then, approximately,
$$Z_{1000} = \frac{(X_1 + \cdots + X_{1000}) - 1000 \times (1/2)}{\sqrt{1000 \times (1/12)}} \sim N(0,1),$$
$$P(490 \le Y \le 510) = P\left(\frac{490 - 500}{\sqrt{83.3333}} \le Z_{1000} \le \frac{510 - 500}{\sqrt{83.3333}}\right) = \Phi(1.0954) - \Phi(-1.0954) = 2\Phi(1.0954) - 1 = 0.7267.$$
20 Using the binomial model, 𝑃(8 ≤ 𝑋 ≤ 12) = 𝑃(𝑋 = 8) + ⋯ + 𝑃(𝑋 = 12) = 0.7368. Using the CLT with $E(X) = np = 10$ and $Var(X) = np(1-p) = 5$,
$$P(8 \le X \le 12) = P\left(\frac{8 - 10}{\sqrt{5}} \le Z \le \frac{12 - 10}{\sqrt{5}}\right) = 0.6289.$$
To reduce the difference between the binomial probability and the one obtained via the CLT, we apply the continuity correction:
$$P(8 - 0.5 \le X \le 12 + 0.5) = P\left(\frac{7.5 - 10}{\sqrt{5}} \le Z \le \frac{12.5 - 10}{\sqrt{5}}\right) = 0.7364.$$

Exercise 23. CLT for sample average

A specific engine made for speedboats has an average power of 220 HP and a standard deviation of 5 HP. If we randomly select 16 engines and calculate their average HP, what is the probability that the average power will be less than 222 HP?21
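Footnote 21 standardizes with the population mean of 220 HP. A minimal sketch of the same computation; with only 16 engines, the normal approximation is most trustworthy if engine power is itself roughly normal.

```python
import math
from statistics import NormalDist

mu, sigma, n = 220.0, 5.0, 16
xbar = NormalDist(mu, sigma / math.sqrt(n))   # CLT: xbar approx N(220, 5/4)
p_less = xbar.cdf(222.0)                       # P(xbar < 222)
print(round(p_less, 4))
```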

Exercise 24. CLT for sample average

The weights of baby giraffes are known to have a mean of 125 pounds and a standard deviation
of 15 pounds. If we randomly select 40 baby giraffes, what is the probability that the sample
average will be between 122 and 128 pounds?
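No solution footnote accompanies Exercise 24. Applying the same two-sided CLT recipe used elsewhere in this section, a sketch (our own, standard library only):

```python
import math
from statistics import NormalDist

mu, sigma, n = 125.0, 15.0, 40
se = sigma / math.sqrt(n)                     # standard error of the sample mean
xbar = NormalDist(mu, se)                     # CLT: xbar approx N(125, 15/sqrt(40))
prob = xbar.cdf(128.0) - xbar.cdf(122.0)      # P(122 <= xbar <= 128)
print(round(se, 4), round(prob, 4))
```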

Exercise 25. CLT for sample average

Suppose that the number of errors per computer program has a Poisson distribution with mean 5 and variance 5. We select 125 programs at random. Calculate the probability that the average number of errors per program is less than 5.5.22
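Footnote 22's CLT answer can be compared with an exact value. The cross-check is our addition and relies on the fact that a sum of independent Poisson variables is Poisson, so the total 𝑆 = 𝑋1 + ⋯ + 𝑋125 is Poisson(625), and x̄ < 5.5 is the same event as 𝑆 ≤ 687. A sketch:

```python
import math
from statistics import NormalDist

lam, n, cutoff = 5.0, 125, 5.5
se = math.sqrt(lam) / math.sqrt(n)        # sd of xbar: sqrt(5)/sqrt(125) = 0.2
p_clt = NormalDist(lam, se).cdf(cutoff)   # CLT: P(xbar < 5.5)

# Exact cross-check (our addition): S ~ Poisson(n * lam), and xbar < 5.5 <=> S <= 687
total_mean = n * lam
k_max = math.ceil(n * cutoff) - 1        # largest integer strictly below 687.5
pmf = math.exp(-total_mean)              # P(S = 0): tiny, but still a normal double
exact = pmf
for k in range(1, k_max + 1):
    pmf *= total_mean / k                # recursion P(S = k) = P(S = k-1) * mean / k
    exact += pmf
print(round(p_clt, 4), round(exact, 4))
```

The two values differ only slightly, which illustrates that 125 programs are enough for the CLT to work well here.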

Exercise 26. CLT for sample average

Let 𝑋𝑖 be the waiting time of a customer in front of the desk of an agency. An assistant manager claims that the average waiting time of all customers is 5 minutes. The manager prefers to check his assistant's claim. He observes 36 customers selected at random and finds that their average waiting time is 6.8 minutes. Should the manager reject his assistant's claim? Hint: Waiting times 𝑋𝑖 are typically assumed to follow an exponential distribution with density function $f(x; \lambda) = \lambda e^{-\lambda x}$ for 𝑥 ≥ 0, where 𝜆 is the distribution parameter. The mean and variance of 𝑋 are 𝐸(𝑋) = 1⁄𝜆 and 𝑉𝑎𝑟(𝑋) = 1⁄𝜆². Therefore, in this case, 𝜆 = 0.2 because the mean is given as 𝐸(𝑋) = 5.23
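Footnote 23's tail probability can be computed as follows. A minimal sketch, assuming (as the hint does) an exponential parent, so that 𝜎 equals the claimed mean of 5 minutes:

```python
import math
from statistics import NormalDist

claimed_mean, n, observed = 5.0, 36, 6.8
sigma = claimed_mean                  # exponential parent: sd = mean = 1/lambda
se = sigma / math.sqrt(n)             # standard error of the sample mean: 5/6
z = (observed - claimed_mean) / se    # how many standard errors above the claim
p_tail = 1.0 - NormalDist().cdf(z)    # P(xbar >= 6.8) if the assistant's claim were true
print(round(z, 2), round(p_tail, 4))
```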

Exercise 27. CLT for sample proportion

Suppose that 80% of all smartphones are equipped with an Android operating system. Consider a random sample of 242 smartphones. What is the probability that the sample proportion of smartphones equipped with Android OS will be between 75 and 85%?24
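The CLT for p̂ used in footnote 24 can be sketched as below. The first print checks the rule of thumb from the lecture outline (𝑛𝑝 and 𝑛(1 − 𝑝) above 15) before the normal model is used.

```python
import math
from statistics import NormalDist

p, n = 0.80, 242
print("np =", n * p, " n(1-p) =", n * (1 - p))   # both should exceed 15 for the normal model
se = math.sqrt(p * (1 - p) / n)                 # standard error of p_hat, about 0.0257
p_hat = NormalDist(p, se)                       # CLT: p_hat approx N(p, SE)
prob = p_hat.cdf(0.85) - p_hat.cdf(0.75)        # P(0.75 <= p_hat <= 0.85)
print(round(se, 4), round(prob, 4))
```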

21 $P(\bar{x} \le 222) = P\left(Z \le \frac{222 - 220}{5/\sqrt{16}}\right) = P(Z \le 1.6) = 0.9452$.
22 Wasserman (2004, pp. 77–78). $P(\bar{x}_{125} \le 5.5) = P\left(Z_{125} \le \frac{5.5 - 5}{\sqrt{5}/\sqrt{125}}\right) = \Phi(2.5) = 0.9938$.
23 Using the exponential distribution, we note that the mean and the variance of the parent distribution are 𝜇 = 5 and 𝜎² = 25. The sample mean is 𝑋̅ = 6.8 minutes. The manager would reject his assistant's claim if the chance of observing such a sample mean is low. That is, the manager can evaluate the claim by calculating 𝑃(𝑋̅ ≥ 6.8). Because the waiting times of the customers can reasonably be assumed to be iid random variables, we can apply the CLT: $P(\bar{X} \ge 6.8) = P\left(Z \ge \frac{6.8 - 5}{\sqrt{25}/\sqrt{36}}\right) = P(Z \ge 2.16)$. This is equal to 1.5386%. The remaining part of the answer should be based on an accurate interpretation of this probability.
24 The mean of the sample proportion is 𝑝 = 0.8. Its standard error is $SE(\hat{p}) = \sqrt{0.8 \times 0.2/242} = 0.0257$. Then,
$$P(0.75 \le \hat{p} \le 0.85) = P\left(\frac{0.75 - 0.8}{0.0257} \le Z \le \frac{0.85 - 0.8}{0.0257}\right) = P(-1.9445 \le Z \le 1.9445) = 0.9482.$$
Exercise 28. CLT for sample proportion

Suppose that 45% of Europeans own an iPhone. If one takes a random sample of 50 European citizens, what is the probability that the sample proportion of iPhone owners will be between 40 and 50%?25
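In Exercise 28 the interval is symmetric: 40% and 50% lie equally far from 𝑝 = 0.45, so the two standardized bounds must be opposite numbers. A short sketch to confirm this (standard library only):

```python
import math
from statistics import NormalDist

p, n = 0.45, 50
se = math.sqrt(p * (1 - p) / n)                 # standard error of p_hat, about 0.0704
z_lo, z_hi = (0.40 - p) / se, (0.50 - p) / se   # about -0.7107 and +0.7107
prob = NormalDist().cdf(z_hi) - NormalDist().cdf(z_lo)
print(round(se, 4), round(z_lo, 4), round(z_hi, 4), round(prob, 4))
```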

Exercise 29. CLT for sample proportion

Consider two samples of identical size 𝑛. The proportion of successes in the first sample is 30% and the proportion of successes in the second sample is 50%. If the probability that the population proportion falls between 30 and 50% is 95%, and assuming the sample proportions can be modeled by a normal model, calculate, approximately, the number of observations 𝑛 used in each sample.26
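Footnote 26 stops before its system of equations, so the author's exact derivation is not available. One plausible reading, which is our assumption only, takes 30% and 50% as the endpoints $p \pm 1.96\sqrt{p(1-p)/n}$; adding and subtracting the two equations then yields 𝑝 and 𝑛. A sketch under that assumption:

```python
import math

lower, upper, z = 0.30, 0.50, 1.96    # z = 1.96 for a central 95% range, as in footnote 26
# Hypothetical reading of the missing system (our assumption):
#   lower = p - z*sqrt(p*(1 - p)/n)
#   upper = p + z*sqrt(p*(1 - p)/n)
p = (lower + upper) / 2               # adding the equations gives p = 0.40
half_width = (upper - lower) / 2      # subtracting them gives z*SE = 0.10
n = p * (1 - p) * (z / half_width) ** 2
print(round(p, 2), round(n, 2), math.ceil(n))
```

Under this reading 𝑛 is about 92.2, i.e. 93 after rounding up; a different reading of the missing equations (for instance, a standard error built from each sample's own proportion) would give a different value.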

Exercise 30. CLT

A large firm's call center handles customer complaints, technical support issues and various
inquiries. There are 36 agents in charge of managing inbound calls during the 4-hour
morning shift between 8:00 AM and 12 noon. Each agent receives on average 1.75 calls every 10
minutes. On Monday, March 1st, 2021, the call center registered a total of 1668 inbound calls. The
manager thinks that such a number is not unusual for the morning shift. Using the statistics above,
set forth an argument against the manager.27
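The z-score in footnote 27 follows from the Poisson model assumed there: each agent's shift total is Poisson with mean 42, so its variance is also 42. A sketch:

```python
import math

agents, rate_per_10min, shift_hours = 36, 1.75, 4
lam_agent = rate_per_10min * 6 * shift_hours     # 42 expected calls per agent per shift
mean_total = agents * lam_agent                  # E(W) = 36 * 42 = 1512
sd_total = math.sqrt(agents * lam_agent)         # Poisson: Var(X_i) = E(X_i), so Var(W) = 1512
z = (1668 - mean_total) / sd_total               # standardized observed total
print(lam_agent, mean_total, round(z, 2))
```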

25 $SE(\hat{p}) = \sqrt{0.45(1 - 0.45)/50} = 0.0704$. Using the CLT for the sample proportion, we calculate
$$P(0.4 \le \hat{p} \le 0.5) = P\left(\frac{0.4 - 0.45}{0.0704} \le Z \le \frac{0.5 - 0.45}{0.0704}\right) = P(-0.7107 \le Z \le 0.7107) = \Phi(0.7107) - \Phi(-0.7107) \approx 52.3\%.$$
26 Because the normal model applies to 𝑝̂, we can write 𝑃(0.3 ≤ 𝑝 ≤ 0.5) = 𝑃(−1.96 ≤ 𝑍 ≤ 1.96) = 0.95. Then, we build the following system of two equations
27 1.75 calls per 10-minute interval is equivalent to 6 × 1.75 = 10.5 calls per hour and 10.5 × 4 = 42 expected calls during the morning shift. The total number of calls that an agent 𝑖 handles during an ordinary morning shift is then a Poisson random variable 𝑋𝑖 ∼ 𝑃𝑜𝑖(𝜆 = 42) with 𝐸(𝑋𝑖) = 𝑉𝑎𝑟(𝑋𝑖) = 42. The total number of calls handled by all agents can then be represented as the sum 𝑊 = 𝑋1 + ⋯ + 𝑋36 of iid random variables; therefore, the CLT is suitable for assessing how unlikely a total of 1668 inbound calls is during a morning shift. Applying the CLT, we get
$$Z_W = \frac{W - E(W)}{\sqrt{Var(W)}} = \frac{W - nE(X_i)}{\sqrt{n\,Var(X_i)}} = \frac{1668 - 36 \times 42}{\sqrt{36 \times 42}} = 4.01.$$
Thus, receiving a total of 1668 calls during a morning shift is a 4-sigma event based on the CLT. Such an event is quite unusual, contrary to what the manager claims.

Exercise 31. CLT

A large firm's call center handles customer complaints, technical support issues and various inquiries. There are 36 agents in charge of managing inbound calls during the 4-hour morning shift between 8:00 AM and 12 noon. Each agent receives on average 1.75 calls every 10 minutes. Using these statistics, calculate the lower bound 𝑊⁻ and the upper bound 𝑊⁺ such that, 99% of the time, the total number of calls handled during the morning shift of an ordinary day will lie between 𝑊⁻ and 𝑊⁺. Round your answers 𝑊⁻ and 𝑊⁺ up to the nearest integer.28
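The bounds of footnote 28 can be sketched as follows, keeping the Poisson model for each agent. This sketch uses the exact 99.5% normal quantile (about 2.576) instead of the rounded 2.58; after rounding up, both give the same integers.

```python
import math
from statistics import NormalDist

mean_total = 36 * 42                      # E(W) = 1512 under the Poisson model of footnote 28
sd_total = math.sqrt(36 * 42)             # sqrt(Var(W)), since Var(X_i) = E(X_i) = 42
z = NormalDist().inv_cdf(0.995)           # central 99% range: about 2.576
w_lo = mean_total - z * sd_total
w_hi = mean_total + z * sd_total
print(round(w_lo, 3), round(w_hi, 3), math.ceil(w_lo), math.ceil(w_hi))
```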

Exercise 32. CLT

You play a game of chance. You roll a six-sided die and win 6 times your bet if the die lands on 6. For example, if you bet $10 and the die lands on 6, then your payoff is $10 × 6 = $60. If the die lands on a number other than 6, you lose your $10 bet. (1) Let 𝑋 be the random variable defined as the Profit & Loss (i.e. P&L) of a player who bets $10 on this game once. Build a probability model for 𝑋 and calculate 𝐸(𝑋) and 𝑉𝑎𝑟(𝑋). (2) Suppose you play the game 𝑛 = 100 times. What is the probability that your P&L will be higher than $100 at the end of your play?29
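The probability model of Exercise 32 and the CLT step of footnote 29 can be checked with exact fractions. The sketch assumes, as footnote 29's values 𝐸(𝑋) = 0 and 𝑉𝑎𝑟(𝑋) = 500 imply, that the $60 payoff on a six includes the returned $10 stake, so a win nets +$50.

```python
import math
from fractions import Fraction
from statistics import NormalDist

# Assumption: the $60 payoff on a six includes the returned $10 stake, so a win nets +$50.
model = {50: Fraction(1, 6), -10: Fraction(5, 6)}          # P&L value -> probability
mean = sum(x * p for x, p in model.items())                # E(X)
var = sum((x - mean) ** 2 * p for x, p in model.items())   # Var(X), in dollars squared

n, target = 100, 100
# CLT for the total P&L after n independent plays: approx N(n*E(X), sqrt(n*Var(X)))
z = float(target - n * mean) / math.sqrt(float(n * var))
prob = 1.0 - NormalDist().cdf(z)
print(mean, var, round(z, 4), round(prob, 4))
```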

Exercise 33. CLT

Suppose a soccer player scores, on average, 0.35 goals per 90-minute game. Remember, this is an average: sometimes he scores 1 goal per game, other times 2 goals per game or no goal at all. (1) Calculate the probability that he scores 2 goals during the next ordinary 90-minute game, 𝑃(𝑋 = 2) = ? (2) Suppose that he is going to play 100 games of the same kind. Calculate the probability that he scores at least 30 and at most 40 goals at the end of this series, assuming his performance in a given game is independent of his performance in previous ones, 𝑃(30 ≤ 𝑊 ≤ 40) = ?, where 𝑊 is his total number of goals over the 100 games.30
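Both parts of footnote 30 can be sketched with the standard library: the Poisson probability mass function for part (1) and the CLT for the 100-game total in part (2).

```python
import math
from statistics import NormalDist

lam, games = 0.35, 100
# (1) Poisson pmf P(X = 2) = e^(-lam) * lam^2 / 2!, the quantity POISSON.DIST(2, 0.35, FALSE) returns in Excel
p_two = math.exp(-lam) * lam ** 2 / math.factorial(2)
# (2) total goals W over 100 independent games: mean and variance both equal 100 * lam = 35
total = NormalDist(games * lam, math.sqrt(games * lam))
p_range = total.cdf(40) - total.cdf(30)
print(round(p_two, 4), round(p_range, 4))
```

Using `total.cdf(40.5) - total.cdf(29.5)` instead would apply the continuity correction introduced in Exercise 22; footnote 30 does not use it.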

28 1.75 calls per 10-minute interval is equivalent to 6 × 1.75 = 10.5 calls per hour and 10.5 × 4 = 42 expected calls during the morning shift. The total number of calls that an agent 𝑖 handles during an ordinary morning shift is then a Poisson random variable 𝑋𝑖 ∼ 𝑃𝑜𝑖(𝜆 = 42) with 𝐸(𝑋𝑖) = 𝑉𝑎𝑟(𝑋𝑖) = 42. The total number of calls handled by all agents can then be represented as the sum 𝑊 = 𝑋1 + ⋯ + 𝑋36 of iid random variables. The bounds 𝑊⁻ and 𝑊⁺ satisfy 𝑃(𝑊⁻ ≤ 𝑊 ≤ 𝑊⁺) = 0.99. Because 𝑊 is approximately normal, we can write 𝑃(𝑧⁻ ≤ 𝑍 ≤ 𝑧⁺) = 0.99 where 𝑍 ∼ 𝑁(0, 1). From the standard normal quantiles, 𝑧⁻ = −2.58 and 𝑧⁺ = 2.58, since 99% of the standard normal density lies between −2.58 and 2.58. Finally, we solve for 𝑊⁻ and 𝑊⁺ in $-2.58 = (W^- - 36 \times 42)/\sqrt{36 \times 42} \Rightarrow W^- = 1411.678$ and $2.58 = (W^+ - 36 \times 42)/\sqrt{36 \times 42} \Rightarrow W^+ = 1612.322$. Thus, 99% of the time during the morning shift of an ordinary day, the call center is expected to handle between 1412 and 1613 calls.
29 Let 𝑋𝑖 represent the P&L when we play the game once and bet $10. It can be shown that 𝐸(𝑋𝑖) = $0 and 𝑉𝑎𝑟(𝑋𝑖) = 500 ($²) per $10 bet. If we play the game 𝑛 = 100 times, the plays form a sequence of iid random variables 𝑋𝑖 and we can apply the CLT to the total P&L: $Z_{100} = (100 - 100 \times E(X_i))/\sqrt{n \times Var(X_i)} = 100/\sqrt{50000} = 0.4472$. Then, the probability that the final P&L is above $100 is $P(Z \ge 0.4472) = 1 - P(Z \le 0.4472) \approx 32.74\%$.
30 𝑋 ∼ 𝑃𝑜𝑖(𝜆 = 0.35). (1) 𝑃(𝑋 = 2) = 0.0432 (in Excel, use POISSON.DIST(2, 0.35, FALSE)). (2) We consider the next 100 games as an iid sequence, 𝑊100 = 𝑋1 + ⋯ + 𝑋100. Each 𝑋𝑖 follows the same Poisson distribution with mean 𝜆 = 0.35 and variance 𝜆 = 0.35. Therefore, applying the CLT,
$$P(30 \le W_{100} \le 40) = P\left(\frac{30 - 100 \times 0.35}{\sqrt{100 \times 0.35}} \le Z \le \frac{40 - 100 \times 0.35}{\sqrt{100 \times 0.35}}\right).$$
Using the standard normal distribution, we get $P(-0.8452 \le Z \le 0.8452) = 0.8010 - 0.1990 = 60.20\%$.
Sampling distributions
Lecture Outline

Def.: Statistical inference aims at using the characteristics of a sample to draw conclusions about the population.

Def.: A parameter 𝜃 is a quantity that describes a given property of a population. A statistic 𝜃̂ is a quantity estimated from a sample drawn from the population.

Remarks: (1) 𝜃 is assumed fixed, and is known if the probability model is known too; (2) 𝜃̂ is a function of the sample drawn from the population; (3) thus, there are as many possible values for 𝜃̂ as there are possible samples from the population; (4) Bottom line: 𝜃̂ is a random variable.

Def.: The sampling dist. of 𝜃̂ is the dist. of the estimated values of 𝜃̂ from repeated samplings of fixed size 𝑛.

Def.: Let 𝜃̂ be a sample statistic obtained using a sample of size 𝑛. Then, the standard deviation of 𝜃̂ is called the standard error of the estimate. Mathematically, for the sample mean, 𝑆𝐸(𝜃̂) = 𝜎⁄√𝑛 if 𝜎 is known, otherwise 𝑆𝐸(𝜃̂) = 𝑠⁄√𝑛, where 𝑠 is the sample standard deviation.

Remarks: (1) 𝑆𝐸(𝜃̂) measures the randomness associated with 𝜃̂; (2) the larger 𝑛 gets, the better the precision of 𝜃̂.

Central Limit Theorem: Let 𝑋1, … , 𝑋𝑛 be iid r.v. with 𝐸(𝑋𝑖) = 𝜇 and 𝑉𝑎𝑟(𝑋𝑖) = 𝜎². Define 𝑌𝑛 = 𝑋1 + ⋯ + 𝑋𝑛. Then, 𝐸(𝑌𝑛) = 𝑛𝜇 and 𝑉𝑎𝑟(𝑌𝑛) = 𝑛𝜎² because of the iid property. The classical CLT holds that the following transformation of 𝑌𝑛,
$$Z_n = \frac{Y_n - E(Y_n)}{\sqrt{Var(Y_n)}} = \frac{(X_1 + \cdots + X_n) - n\mu}{\sigma\sqrt{n}},$$
converges in distribution to the standard normal, that is,
$$\lim_{n\to\infty} P(Z_n \le z) = \Phi(z).$$

CLT for 𝑋̅: Define 𝑋̅ = (1⁄𝑛)(𝑋1 + ⋯ + 𝑋𝑛). Starting from the result
$$\lim_{n\to\infty} P\left(\frac{(X_1 + \cdots + X_n) - n\mu}{\sigma\sqrt{n}} \le z\right) = \Phi(z)$$
and dividing the numerator and the denominator by 𝑛, we get
$$\lim_{n\to\infty} P\left(\frac{\frac{1}{n}\left((X_1 + \cdots + X_n) - n\mu\right)}{\frac{1}{n}\sigma\sqrt{n}} \le z\right) = \Phi(z),$$
which, on rearranging, gives
$$\lim_{n\to\infty} P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z\right) = \Phi(z).$$
Therefore, for large 𝑛, 𝑋̅ ∼ 𝑁(𝜇, 𝜎⁄√𝑛) approximately.

CLT for 𝑝̂: Let 𝑋1, … , 𝑋𝑛 be iid Bernoulli, 𝑋𝑖 ∼ 𝐵𝑒(𝑝). Define 𝑌 as the number of successes in 𝑋1, … , 𝑋𝑛; then 𝑌 ∼ 𝐵(𝑛, 𝑝) with 𝐸(𝑌) = 𝑛𝑝 and 𝑉𝑎𝑟(𝑌) = 𝑛𝑝(1 − 𝑝). Define 𝑝̂ = 𝑌⁄𝑛 as the sample proportion of successes. The exp. value of 𝑝̂ is 𝐸(𝑝̂) = 𝑛𝑝⁄𝑛 = 𝑝. The st. error of 𝑝̂ is
$$SE(\hat{p}) = \frac{\sqrt{Var(Y)}}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}.$$
The CLT holds that
$$\lim_{n\to\infty} P\left(\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \le z\right) = \Phi(z),$$
which implies $\hat{p} \sim N\left(p, \sqrt{p(1-p)/n}\right)$ approximately.

Remarks: Normality requires 𝑛𝑝 and 𝑛(1 − 𝑝) > 15.
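As an illustration of the CLT for 𝑋̅ summarized above, the following simulation (our own example; the exponential parent with 𝜆 = 1, the sample size 𝑛 = 50, the number of replications and the seed are arbitrary choices) draws many samples, computes each sample mean, and compares the results with 𝜇, 𝜎⁄√𝑛 and Φ.

```python
import random
import statistics

random.seed(42)
lam, n, reps = 1.0, 50, 5000          # exponential parent, sample size, number of samples
mu = sigma = 1 / lam                  # for an exponential, mean = sd = 1/lambda
means = [statistics.fmean(random.expovariate(lam) for _ in range(n)) for _ in range(reps)]

se = sigma / n ** 0.5                 # theoretical standard error sigma/sqrt(n)
z_scores = [(m - mu) / se for m in means]
share = sum(z <= 1.645 for z in z_scores) / reps   # CLT: close to Phi(1.645), about 0.95

print("mean of sample means:", round(statistics.fmean(means), 3), "vs mu =", mu)
print("sd of sample means:  ", round(statistics.stdev(means), 3), "vs sigma/sqrt(n) =", round(se, 3))
print("share of Z_n <= 1.645:", share)
```

Even though the exponential parent is strongly right-skewed, the standardized sample means already behave much like a standard normal at 𝑛 = 50; a small remaining skew shows up as a share slightly different from 0.95.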