Notes ch3 Sampling Distributions
Lecture Notes
Erkin Diyarbakirlioglu
IAE Gustave Eiffel
February 6, 2022
Table of contents
1 Introduction
1 Parameter(s) vs. Statistic(s)
1. In real life, when we are interested in a given phenomenon, the exact probability model
underlying the data is generally unknown. The researcher has to extract information from the data
and draw conclusions. The problem is that the data itself is subject to randomness, since one
typically does not observe the entire population but only a subset of it. The basic idea of statistical
inference is thus to use the characteristics of a sample to estimate the characteristics of the
population. Loosely speaking, statistical inference is a collection of methods for drawing
conclusions from data that are subject to random variation.1 This note presents a number
of simple yet fundamental concepts as an introduction to the theory of statistical inference.
Example 1. Consider the following situations. In each pair below, the first line describes a
population parameter and the second reports a statistic, i.e. an estimate of the parameter of interest
based on a sample drawn from the population.
▪ Parameter: The mean of the outcomes of a six-sided die is 𝜇 = 3.5.
  Statistic: A six-sided die is rolled 10 times; the sample average is 𝑥̅ = 3.22.
▪ Parameter: The minimum value one can get when rolling a six-sided die is min(𝑋) = 1.
  Statistic: A six-sided die is rolled 10 times; the minimum value observed is 2.
▪ Parameter: In metropolitan France, the mean weight of 3-year-old girls is 𝜇 = 14 kg.
  Statistic: A random sample of 𝑛 = 30 girls aged 3 is selected; their average weight is 𝑥̅ = 13.8 kg.
▪ Parameter: The probability that a fair coin lands on "Heads" is 𝑝 = 50%.
  Statistic: A fair coin is tossed 100 times and lands on "Heads" 55 times, so the sample proportion
  of "Heads" is 𝑝̂ = 55%.
▪ Parameter: The probability that a six-sided die lands on, say, 3 is 𝑝 = 1⁄6.
  Statistic: A six-sided die is rolled 30 times and lands on "three" 2 times, so the sample proportion
  is 𝑝̂ = 2⁄30 = 1⁄15.
▪ Parameter: An internet study run by C. McManus suggests that 11.15% of the French are left-handed.
  Statistic: In a survey with 1126 participants in France, 95 people claim that they are left-handed
  (𝑝̂ = 95⁄1126 ≈ 8.4%).
▪ Parameter: According to Eurostat, the mean age of men at first marriage in France was 34.9 in 2017.
  Statistic: A random sample from civil service databases across different city halls in France
  reveals that the average age of men at first marriage was 36.2 in 2017.

1 Source: https://www.probabilitycourse.com/chapter8/8_1_0_intro.php
2 The question may sound simple, but it lies at the origin of the two main schools in statistics: the frequentist (classical)
approach and the Bayesian approach. In the frequentist approach, the parameter 𝜃 is assumed to be deterministic: it has a
fixed value and there is nothing uncertain about it. In contrast, a sample estimate 𝜃̂ is random because it is
a function of the sample. In the Bayesian approach, the parameter itself is also treated as a random
variable. Moreover, the Bayesian school assumes that the researcher has an initial guess about the distribution of 𝜃. After
observing some data, this distribution is updated using Bayes' rule. Briefly, the classical school
addresses the estimation of a non-random quantity using data, while the Bayesian school deals with estimating random
variables.
2 Sampling distribution
Example 2. Using software (R or Excel), generate a 𝑘 × 𝑛 data matrix, i.e. 𝑛 samples drawn from a
random variable 𝑋 (e.g. 𝑋 may represent a fair six-sided die), each sample of size 𝑘. For practical
purposes, start with small values of both 𝑘 and 𝑛, such as 𝑘 = 10 and 𝑛 = 20. Then draw the
histogram of the 𝑛 = 20 sample averages. Finally, increase the number of samples 𝑛 to show
the graphical convergence of the distribution of sample averages to the normal distribution.
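The simulation in Example 2 can be sketched in Python as an alternative to R or Excel. This is a minimal sketch assuming a fair six-sided die; the function name `simulate_sample_averages`, the seed and the choice 𝑛 = 2000 are illustrative, not part of the notes. It draws 𝑛 samples of size 𝑘, stores each sample average, and prints a crude text histogram together with the mean and standard deviation of the averages, which should be close to 3.5 and 1.707825⁄√10 ≈ 0.54.

```python
import random
import statistics
from collections import Counter

def simulate_sample_averages(k, n, seed=0):
    """Return n sample averages, each computed from k rolls of a fair die."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n):
        rolls = [rng.randint(1, 6) for _ in range(k)]  # one sample of size k
        averages.append(sum(rolls) / k)
    return averages

k, n = 10, 2000  # start with n = 20, then increase n
averages = simulate_sample_averages(k, n)

# Crude text histogram: each average is binned to the nearest 0.5
bins = Counter(round(a * 2) / 2 for a in averages)
for value in sorted(bins):
    print(f"{value:4.1f} | {'#' * (bins[value] * 200 // n)}")

print("mean of averages:", round(statistics.mean(averages), 3))
print("sd of averages:  ", round(statistics.stdev(averages), 3))
```

Rerunning with `n = 20` reproduces the rough first histogram; larger values of `n` make the bell shape easier to see.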
4. Let 𝜃̂ be an estimate of the parameter 𝜃 calculated using a sample of size 𝑛. The uncertainty
of the estimated value is measured by the standard error of the estimate, i.e. the standard deviation
of 𝜃̂ across repeated samples. When 𝜃̂ is the sample average, the standard error is
$$SE(\hat{\theta}) = \frac{\sigma}{\sqrt{n}} \tag{1}$$
where 𝜎 is the population standard deviation. If the population standard deviation is unknown,
the standard error can be estimated using the sample standard deviation 𝑠 as,
$$SE(\hat{\theta}) = \frac{s}{\sqrt{n}} \tag{2}$$
Remarks:
• The standard error measures the randomness associated with 𝜃̂.
• Given the formula, the larger the sample size, the better the precision of the estimate 𝜃̂.
• The standard error of an estimate is not the same as the standard deviation:
o 𝜎 measures the variability of individual observations around their mean,
o 𝑆𝐸(𝜃̂) measures the precision of an estimate, i.e. how much 𝜃̂ varies from one sample to another.
Example 3. Let 𝑋 be a random variable that represents the result of rolling a fair die. Let's calculate the
mean and the variance of 𝑋. Using the pmf of 𝑋, 𝑃(𝑋 = 𝑥𝑖) = 1⁄6 for 𝑥𝑖 = 1, 2, …, 6, we obtain
$$\mu = \sum_{i=1}^{6} x_i \, P(X = x_i) = 3.5, \qquad \sigma^2 = \sum_{i=1}^{6} (x_i - \mu)^2 \, P(X = x_i) = 2.916667.$$
Therefore, the standard deviation of 𝑋 is 𝜎 = √2.916667 = 1.707825.
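A short Python check of Example 3, assuming the fair-die pmf 𝑃(𝑋 = 𝑥) = 1⁄6; exact fractions are used for the mean and the variance so that 𝜇 = 7⁄2 and 𝜎² = 35⁄12 come out without rounding.

```python
import math
from fractions import Fraction

# pmf of a fair six-sided die: P(X = x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in pmf.items())                # E(X)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())   # Var(X)
sigma = math.sqrt(float(var))                          # standard deviation

print(mu, var, round(float(var), 6), round(sigma, 6))
```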
Example 4. Assume now you roll a six-sided die 8 times and get the following results: 4, 5, 5, 6, 1,
6, 4, 2. The sample average is 𝑥̅ = 4.125 and the sample standard deviation is 𝑠 = 1.8077. Since 𝜎 is
known from Example 3, the standard error of the sample average is 𝑆𝐸(𝑥̅) = 𝜎⁄√𝑛 = 1.707825⁄√8 = 0.603807.
Had 𝜎 been unknown, equation (2) would give 𝑆𝐸(𝑥̅) = 1.8077⁄√8 ≈ 0.6391.
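The calculation in Example 4 can be reproduced as below. This is a sketch using Python's standard `statistics` module, whose `stdev` divides by 𝑛 − 1; it computes the standard error both with the known 𝜎 of Example 3, as in equation (1), and with the sample standard deviation 𝑠, as in equation (2).

```python
import math
import statistics

rolls = [4, 5, 5, 6, 1, 6, 4, 2]    # the 8 rolls of Example 4
n = len(rolls)
sigma = math.sqrt(35 / 12)          # population sd of a fair die (Example 3)

x_bar = statistics.mean(rolls)      # sample average
s = statistics.stdev(rolls)         # sample sd (divides by n - 1)

se_known_sigma = sigma / math.sqrt(n)   # equation (1)
se_estimated = s / math.sqrt(n)         # equation (2)

print(x_bar, round(s, 4), round(se_known_sigma, 6), round(se_estimated, 4))
```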
5. The Central Limit Theorem (CLT) states that the sum of a large number of independent and
identically distributed (iid) random variables is approximately normal. In this section, I discuss
the classical form of the CLT. Let 𝑋1, 𝑋2, …, 𝑋𝑛 be a sequence of iid random variables, each with finite
mean 𝐸(𝑋𝑖) = 𝜇 and finite variance 𝑉𝑎𝑟(𝑋𝑖) = 𝜎². Define the sum,
$$Y_n = X_1 + \cdots + X_n \tag{3}$$
Since the 𝑋𝑖 are iid, 𝐸(𝑌𝑛) = 𝑛𝜇 and 𝑉𝑎𝑟(𝑌𝑛) = 𝑛𝜎². Then the standardized sum
$$Z_n = \frac{Y_n - E(Y_n)}{\sqrt{Var(Y_n)}} = \frac{(X_1 + X_2 + \cdots + X_n) - n\mu}{\sigma\sqrt{n}} \tag{4}$$
converges in distribution to the standard normal random variable as 𝑛 goes to infinity, i.e.
$$\lim_{n \to \infty} P(Z_n \le z) = \Phi(z) \quad \text{for all } z \tag{5}$$
where Φ is the cdf of the standard normal distribution.
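The CLT statement can be illustrated by simulation. In this sketch the 𝑋𝑖 are assumed to be Exponential(1), a skewed distribution with 𝜇 = 𝜎 = 1, chosen only to show that the 𝑋𝑖 need not be normal; 𝑛 = 30, the 10,000 replications and the seed are arbitrary choices. The empirical frequency of {𝑍𝑛 ≤ 1} should be close to Φ(1) ≈ 0.8413.

```python
import math
import random
from statistics import NormalDist

def standardized_sum(n, rng):
    """Z_n from equation (4) for X_i ~ Exponential(1), where mu = sigma = 1."""
    mu, sigma = 1.0, 1.0
    y_n = sum(rng.expovariate(1.0) for _ in range(n))
    return (y_n - n * mu) / (sigma * math.sqrt(n))

rng = random.Random(42)
n, reps, z = 30, 10_000, 1.0
z_values = [standardized_sum(n, rng) for _ in range(reps)]

empirical = sum(v <= z for v in z_values) / reps   # estimate of P(Z_n <= z)
print(f"P(Z_n <= {z}) ~ {empirical:.4f}  vs  Phi({z}) = {NormalDist().cdf(z):.4f}")
```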
6. The key point with the CLT is that in many real applications, a random variable of
interest can be expressed as the sum of many independent random variables. In such situations, the CLT gives
a theoretically sound reason to apply the normal model. Examples of such random variables
include, among others, measurement errors in laboratories, noise in communication and signal
processing, and changes in the prices of financial assets. In addition, an interesting feature of
the CLT is that the distribution of the 𝑋𝑖's does not matter, as long as their mean and variance are finite.
They can be discrete, continuous or mixed random variables. This can simplify calculations substantially
when we deal with a problem involving thousands of iid random variables. Instead of trying to find the
exact distribution of the sum of these variables, the CLT gives a quick answer, provided
that one knows the mean and variance of these variables.3
Example. A bank teller serves customers standing in the queue one by one. It takes on average 2
minutes to serve a customer, with a variance of 1 (minute²). If the service times for different
customers are independent, what is the probability that the bank teller serves 50 customers in
less than 110 minutes? Let 𝑌50 = 𝑋1 + ⋯ + 𝑋50 be the total time to serve 50 customers. Since we
have a sequence of independent random variables with known 𝜇 and 𝜎², we apply the CLT as,
$$P(Y_{50} \le 110) = P\left(Z_{50} \le \frac{110 - 50 \times 2}{\sqrt{50 \times 1}}\right) = P(Z \le 1.4142) = 0.9213$$
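The teller computation can be checked with Python's `statistics.NormalDist` (standard library, Python 3.8+), which provides the standard normal cdf Φ used above.

```python
import math
from statistics import NormalDist

n, mu, var = 50, 2.0, 1.0   # 50 customers, mean 2 min, variance 1 min^2
y = 110.0                   # time budget in minutes

z = (y - n * mu) / math.sqrt(n * var)   # standardized total service time
prob = NormalDist().cdf(z)              # CLT approximation of P(Y_50 <= 110)
print(round(z, 4), prob)
```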
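Consider a binomial random variable 𝑋 ∼ 𝐵(200, 0.08), so that 𝐸(𝑋) = 200 × 0.08 = 16 and
𝑉𝑎𝑟(𝑋) = 200 × 0.08 × 0.92. The normal approximation based on the CLT gives
$$P(X \le 20) = P\left(Z \le \frac{20 - (200 \times 0.08)}{\sqrt{200 \times 0.08 \times 0.92}}\right) = P(Z \le 1.0426) = 0.8514$$
Compared with the exact binomial probability, the approximation error is apparent. Applying the continuity
correction, however, we get 𝑃(𝑋 ≤ 20 + 0.5) = 𝑃(𝑍 ≤ 1.1729) = 0.8796, so the result is considerably improved.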
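To see the approximation error mentioned above, the exact binomial probability can be computed by summing the pmf and compared with the normal approximation with and without the continuity correction. A minimal sketch, assuming 𝑋 ∼ 𝐵(200, 0.08) as implied by the figures in the computation:

```python
import math
from statistics import NormalDist

n, p, x = 200, 0.08, 20
mean, sd = n * p, math.sqrt(n * p * (1 - p))
normal = NormalDist(mean, sd)

# Exact binomial P(X <= 20), summing the pmf term by term
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

plain = normal.cdf(x)            # normal approximation, no correction
corrected = normal.cdf(x + 0.5)  # with continuity correction

print(f"exact={exact:.4f}  normal={plain:.4f}  corrected={corrected:.4f}")
```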
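8. Using the previous result, we can obtain the sampling distribution of the sample average
𝑥̅ = (𝑋1 + ⋯ + 𝑋𝑛)⁄𝑛. First rewrite the CLT,
$$\lim_{n\to\infty} P\left(\frac{(X_1 + X_2 + \cdots + X_n) - n\mu}{\sigma\sqrt{n}} \le z\right) = \Phi(z) \tag{6}$$
Multiplying the numerator and the denominator by 1⁄𝑛 leaves the ratio unchanged:
$$\lim_{n\to\infty} P\left(\frac{\frac{1}{n}\left((X_1 + X_2 + \cdots + X_n) - n\mu\right)}{\frac{1}{n}\,\sigma\sqrt{n}} \le z\right) = \Phi(z)$$
Since (1⁄𝑛)(𝑋1 + ⋯ + 𝑋𝑛) = 𝑥̅ and (1⁄𝑛)𝜎√𝑛 = 𝜎⁄√𝑛, this becomes
$$\lim_{n\to\infty} P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z\right) = \Phi(z) \tag{7}$$
In other words, for large 𝑛 the sample average is approximately normal, 𝑥̅ ∼ 𝑁(𝜇, 𝜎⁄√𝑛).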
Example. The grades of a statistics class are assumed to be a normal random variable with mean
𝜇 = 11.4 and standard deviation 𝜎 = 3.5. If one chooses, say, 36 students at random, then the
sample average will also be a normal random variable, 𝑥̅ ∼ 𝑁(11.4, 3.5⁄√36), i.e. with standard error 3.5⁄6 ≈ 0.583.
9. Let 𝑋1, …, 𝑋𝑛 be a sequence of iid Bernoulli random variables, each with success probability 𝑝.
Define 𝑌 as the number of successes in this sequence. Then, it is well known that 𝑌 follows a binomial
distribution, 𝑌 ∼ 𝐵(𝑛, 𝑝), with 𝐸(𝑌) = 𝑛𝑝 and 𝑉𝑎𝑟(𝑌) = 𝑛𝑝(1 − 𝑝).
Consider now the sample proportion of successes 𝑝̂ = 𝑌⁄𝑛, i.e. the number of successes observed in
this sequence divided by the number of trials 𝑛. The expected value of the sample proportion is
𝐸(𝑝̂) = 𝑛𝑝⁄𝑛 = 𝑝. Because the variance of 𝑌 is 𝑉𝑎𝑟(𝑌) = 𝑛𝑝(1 − 𝑝), the standard error of the
sample proportion 𝑝̂ is given by,
$$SE(\hat{p}) = \frac{\sqrt{Var(Y)}}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}} \tag{8}$$
and the CLT gives
$$\lim_{n\to\infty} P\left(\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \le z\right) = \Phi(z) \tag{9}$$
Therefore, the sample proportion is approximately normal, 𝑝̂ ∼ 𝑁(𝑝, √(𝑝(1 − 𝑝)⁄𝑛)), if 𝑛𝑝 and
𝑛(1 − 𝑝) are both greater than 15.
Example. Assume you flip a fair coin 100 times. Denote by 𝑝 the probability of observing heads and by 𝑝̂
the proportion of heads observed in this experiment. Since 𝑛𝑝 = 50 and 𝑛(1 − 𝑝) = 50 are both greater than
15, the sampling distribution of the sample proportion is nearly normal with expected value 𝑝 =
0.5 and standard deviation 𝑆𝐸(𝑝̂) = √(0.5 × (1 − 0.5)⁄100) = 0.05.
Example. Assume you want to calculate the probability of getting heads at most 45 times in this
experiment. We have 𝑝̂ ∼ 𝑁(0.5, 0.05). Therefore, 𝑃(𝑝̂ ≤ 0.45) = 𝑃(𝑍 ≤ (0.45 − 0.5)⁄0.05) = 𝑃(𝑍 ≤ −1) = 0.159.
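A sketch of the coin example in Python: it computes 𝑆𝐸(𝑝̂) and the normal-model probability 𝑃(𝑝̂ ≤ 0.45). For comparison only (this is an addition, not part of the original example), it also computes the exact binomial probability of at most 45 heads and a continuity-corrected normal value.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5
se = math.sqrt(p * (1 - p) / n)          # SE(p_hat)
model = NormalDist(p, se)                # p_hat ~ N(0.5, 0.05)

approx = model.cdf(0.45)                 # P(p_hat <= 0.45), normal model
# Exact P(at most 45 heads): each of the 2**n sequences has probability 1/2**n
exact = sum(math.comb(n, k) for k in range(46)) / 2**n
corrected = model.cdf(45.5 / n)          # continuity correction: 45 + 0.5 heads

print(f"SE={se:.2f}  normal={approx:.4f}  exact={exact:.4f}  corrected={corrected:.4f}")
```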
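Appendix. Sampling distribution of the sample mean
Let 𝑋 ∼ 𝑁(𝜇, 𝜎) and consider a sequence of repeated samplings from 𝑋 of fixed size 𝑛. Define the sample mean
$$\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$$
Since the 𝑋𝑖 are drawn independently from the same population, they are iid with the common
distribution 𝑁(𝜇, 𝜎). The moment generating function of 𝑋̅ can then be developed as,
$$M_{\bar{X}}(\theta) = M_{\frac{1}{n}(X_1 + \cdots + X_n)}(\theta) = M_{X_1 + \cdots + X_n}\left(\frac{\theta}{n}\right) = M_{X_1}\left(\frac{\theta}{n}\right) \times \cdots \times M_{X_n}\left(\frac{\theta}{n}\right)$$
because the 𝑋𝑖's are independent random variables all drawn from the same probability distribution.
Thus, each factor in the last term is the same,
$$M_{X_1}\left(\frac{\theta}{n}\right) = \cdots = M_{X_n}\left(\frac{\theta}{n}\right) = M_X\left(\frac{\theta}{n}\right)$$
and, since the mgf of 𝑋 ∼ 𝑁(𝜇, 𝜎) is 𝑀𝑋(𝑡) = exp(𝜇𝑡 + 𝜎²𝑡²⁄2), we obtain
$$M_{\bar{X}}(\theta) = \left[M_X\left(\frac{\theta}{n}\right)\right]^n = \exp\left(\mu\theta + \frac{1}{2}\,\frac{\sigma^2}{n}\,\theta^2\right)$$
The right-hand side is the moment generating function of a normal variable with mean 𝜇 and standard
deviation 𝜎⁄√𝑛. Since the mgf uniquely determines the distribution, 𝑋̅ ∼ 𝑁(𝜇, 𝜎⁄√𝑛).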
End-of-chapter exercises for Sampling distributions
For each of the following situations, state whether the parameter of interest is a mean or a
proportion. (1) In a survey, one hundred college students are asked how many hours per week
they spend on the Internet. (2) In a survey, one hundred college students are asked what
percentage of the time they spend on the Internet is part of their course work. (3) In a survey, one
hundred college students are asked whether or not they have ever cited information from Wikipedia
in their papers. (4) In a survey, one hundred college students are asked the weekly amount of
money they spend on alcoholic beverages.
For each of the following situations, state whether the parameter of interest is a mean or a
proportion. (1) A poll shows that 64% of Europeans worry a great deal about climate change. (2)
A survey reports that local TV news has shown a 17% increase in revenue between 2009 and 2011
while newspaper revenues decreased by 6.4% during this time period. (3) In a survey, high school
and college students are asked whether or not they use geolocation services on their smartphones.
(4) In a survey, smartphone users are asked whether or not they use a web-based catering service.
(5) In a survey, smartphone users are asked how many times they check their home screens without
receiving a call or a message signal.
Suppose you flip a coin 100 times. It lands on heads 45 times. What is the proportion of heads in
this experiment? State whether this is a statistic or a population parameter. Is the value of the population
proportion of heads known? What is this value, assuming the coin is fair?
A company sells cement in 5 kg packs. Suppose you buy 20 such packs and weigh each of them.
The mean weight of the sample is 5.12 kg. Identify the parameter of interest and the sample
statistic in this situation. Is it a mean or a proportion? What is the value of the population
parameter? What is the value of the sample statistic? What is the sample size?
A college counselor is interested in estimating how many credits a student typically enrolls in each
semester. The counselor decides to randomly sample 100 students by using the registrar’s
database of students. The histogram below shows the distribution of the number of credits taken
by these students. Sample statistics for this distribution are given next to the graph.
[Figure: histogram of the number of credits]
Sample statistics:
Minimum 8.00
1st quartile 13.00
Median 14.00
Mean 13.65
Standard dev. 1.91
3rd quartile 15.00
Maximum 18.00
(1) What is the estimated average number of credits taken per semester by students at this
college? What about the median? (2) What is the sample standard deviation of the number of
credits taken per semester by students at this college? What about the interquartile range (IQR)?
(3) Is a load of 16 credits unusually high for this college? What about 18 credits? Explain your
reasoning. (4) The college counselor takes another random sample of 100 students and this time
finds a sample mean of 14.02 units. Should she be surprised that this sample statistic is slightly
different from the one in the original sample? Explain your reasoning. (5) The sample mean
given next to the histogram is a point estimate for the mean number of credits taken by all students
at that college. What measure do we use to quantify the variability of this estimate? Compute this
quantity using the data provided.
Researchers studying anthropometry collected body girth measurements and skeletal diameter
measurements, as well as age, height and gender, for 507 physically active individuals.5 The
histogram below shows the sample distribution of heights in centimeters.
[Figure: histogram of height (cm)]
Sample statistics:
Minimum 147.2
1st quartile 163.8
Median 170.3
Mean 171.1
Standard dev. 9.4
3rd quartile 177.8
Maximum 198.1
The number of eggs laid by a certain species of hen during their breeding period has a mean of 35
eggs and a standard deviation of 18.2.6 A group of researchers randomly samples 45 hens of this
species, counts the number of eggs laid during their breeding period, and records the sample
average. They repeat this process 1,000 times and build a distribution of the sample averages
obtained this way. (1) What is this distribution called? (2) Would you expect the shape of this
distribution to be symmetric or asymmetric? Explain your reasoning. (3) Calculate the variability
of this distribution and state the appropriate term used to refer to this value. (4) Suppose the
researchers’ budget is reduced and they are only able to collect random samples of 10 hens. The
sample mean of the number of eggs is recorded, and we repeat this 1,000 times, and build a new
distribution of sample means. How will the variability of this new distribution compare to the
variability of the original distribution?7
Test scores for a standardized test are distributed as approximately normal with mean 700 and
standard deviation 80, 𝑋 ∼ 𝑁(700, 80). (1) Assume a random sample of 16 test results is taken.
What is the expected value of the sample average? What is the standard deviation (i.e. the standard
error) of the sample average? (2) Calculate the same quantities, i.e. the expected value of the sample
average and the standard deviation of the sample average, for another random sample of size 100.8
8 For a sample of size 16, the expected value of the sample mean is 700 and the standard error of the sample
mean is 𝑆𝐸(𝑥̅) = 80⁄√16 = 20. If one takes a random sample of size 100, then the expected value of the sample mean
will remain the same, i.e. 700. The standard error of the sample mean, however, will be lower: 𝑆𝐸(𝑥̅) = 80⁄√100 =
8.
Exercise 9. Sampling distributions
A manufacturer of compact fluorescent light bulbs advertises that the distribution of the lifespans
of these light bulbs is nearly normal with a mean of 9,000 hours and a standard deviation of 1,000
hours. (1) What is the probability that a randomly chosen light bulb lasts more than 10,500 hours?
(2) Describe the distribution of the mean lifespan of 15 light bulbs. (3) What is the probability that
the mean lifespan of 15 randomly chosen light bulbs is more than 10,500 hours? (4) Sketch the
two distributions (population and sampling) on the same scale.
The chief researcher of a chemistry laboratory runs an experiment with the help of her graduate
students, who carry out the experiment and then report their measurements to the researcher.
The measurement of each student, however, is subject to some variability. The standard deviation
of their measurements is assumed to be 𝜎 = 10 milligrams. The lab's chief researcher repeats the
measurement 8 times and records the average. What is the standard deviation of the sample
average? How many times must the chief researcher repeat the experiment to reduce the standard
deviation of the sample average to 2.5?9
Roulette wheels are typically marked with the numbers 1 through 36 plus 0 and 00. Each of these
outcomes is equally likely every time the wheel is spun. In a simple version of the game, when a
player places a bet on any one number and is correct, the payoff is 35:1; that is, if the player bets
$1, he receives $36 if he wins ($35 plus his $1 initial bet) and nothing if he loses. Suppose a player
places a $1 bet on his favorite number. (1) What is the casino's expected profit from a single bet?
What is the standard deviation of this profit? (2) Suppose the casino remains open 350 days a year
and on an ordinary day 500 independent such bets are placed at the roulette table. What will be the
standard deviation of the mean profit per game?10
You have invited 64 guests to a party. You need to make sandwiches for them. You believe that a
guest might need 0, 1 or 2 sandwiches with probabilities 0.25, 0.5 and 0.25 respectively. Assume
that the number of sandwiches needed by each guest is independent of the other guests. How
many sandwiches should you make so that you are 95% sure that there is no shortage?12
The weights of Granny Smith apples from a large orchard follow a normal distribution with mean
380 g and standard deviation 28 g. (1) A single apple is randomly selected from this orchard. What
is the probability that it weighs more than 400 g? (2) The farmer sells the apples in crates which
contain 26 apples. What is the probability that a given crate weighs more than 10 kg?13
An insurance company sells, among others, car insurance policies. Suppose a standardized
contract sells at $100 per year (cash into the company). In case of a claim, the company assumes
that the coverage cost per policy is $1000 (cash out of the company). Statistics suggest that 1%
of all contracts will make a claim during a given year while 99% of the customers will make no
claim at all. Assume there are 𝑛 = 1000 such car policies sold. Using the Central Limit Theorem,
calculate the probability that the company's earnings from the car insurance policies sold are higher
than $80,000.14
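12 We look for the number of sandwiches made, 𝑦, such that there is a 95% probability that the number of sandwiches
eaten will not exceed 𝑦. If 𝑋𝑖 is the variable that shows the number of sandwiches eaten by a typical guest, then
𝐸(𝑋𝑖) = 1 and 𝑉𝑎𝑟(𝑋𝑖) = 0.5. Letting 𝑌 = 𝑋1 + ⋯ + 𝑋64 be the total number of sandwiches eaten, the CLT gives
$$P(Y \le y) = 0.95 \;\Rightarrow\; P\left(Z \le \frac{y - 64 \times 1}{\sqrt{64 \times 0.5}}\right) = 0.95 \;\Rightarrow\; \Phi\left(\frac{y - 64}{\sqrt{32}}\right) = 0.95 \;\Rightarrow\; \frac{y - 64}{\sqrt{32}} = \Phi^{-1}(0.95) = 1.6449$$
Solving for 𝑦, we get 𝑦 = 73.3. Therefore, one needs to make at least 74 sandwiches.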
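The sandwich calculation in footnote 12 can be sketched as below; `NormalDist().inv_cdf(0.95)` plays the role of Φ⁻¹(0.95) and `math.ceil` rounds the solution up to a whole sandwich.

```python
import math
from statistics import NormalDist

guests = 64
values, probs = [0, 1, 2], [0.25, 0.5, 0.25]
mu = sum(v * q for v, q in zip(values, probs))                # E(X_i)
var = sum((v - mu) ** 2 * q for v, q in zip(values, probs))   # Var(X_i)

z95 = NormalDist().inv_cdf(0.95)                              # Phi^{-1}(0.95)
y = guests * mu + z95 * math.sqrt(guests * var)               # solve P(Y <= y) = 0.95
print(round(y, 2), math.ceil(y))
```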
13 Orchard → Verger (fr.). For a single apple, 𝑃(𝑋 ≥ 400) = 𝑃(𝑍 ≥ (400 − 380)⁄28) = 𝑃(𝑍 ≥ 0.7143) = 23.75%. The
weight 𝑌 of a crate that contains 26 Granny Smith apples is the sum of 26 iid random variables. The mean weight of 26
such apples is thus 26 × 380 = 9880 g, and the standard deviation of the weight of 26 apples is √26 × 28 = 142.7725 g.
Using the CLT, 𝑃(𝑌 ≥ 10000) = 𝑃(𝑍 ≥ (10000 − 9880)⁄142.7725) = 𝑃(𝑍 ≥ 0.8405) = 20.03%.
14 Let 𝑋𝑖 be the earnings from a car insurance policy. 𝐸(𝑋𝑖) = 1% × (−1000) + 99% × 100 = $89 and
𝑉𝑎𝑟(𝑋𝑖) = 11979 ($²). If the company sells 1000 contracts, then the total earnings are the sum of an iid sequence,
𝑌1000 = 𝑋1 + ⋯ + 𝑋1000. Using the CLT, we calculate
$$P(Y_{1000} \ge 80000) = P\left(Z \ge \frac{80000 - 1000 \times 89}{\sqrt{1000 \times 11979}}\right) = P(Z \ge -2.6)$$
This probability is equal to 1 − 𝑃(𝑍 ≤ −2.6) = 99.53%.
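A sketch of footnote 14 in Python, keeping the footnote's model of per-policy earnings (−$1000 with probability 1%, +$100 with probability 99%):

```python
import math
from statistics import NormalDist

# Earnings per policy as modelled in the footnote
outcomes = {-1000: 0.01, 100: 0.99}
mu = sum(x * p for x, p in outcomes.items())                # E(X_i), about 89
var = sum((x - mu) ** 2 * p for x, p in outcomes.items())   # Var(X_i), about 11979

n, threshold = 1000, 80_000
z = (threshold - n * mu) / math.sqrt(n * var)   # standardized threshold, about -2.6
prob = 1 - NormalDist().cdf(z)                  # P(Y_1000 >= 80000)
print(round(mu, 2), round(var, 2), round(z, 3), round(prob, 4))
```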
Exercise 16. Central Limit Theorem
An insurance company sells, among other types of policies, homeowners insurance policies that
cover losses and damages to an individual's house and the assets in the home. The policy that is most
frequently purchased by customers sells for $200 a year. Over an ordinary year, the
probability that a homeowner files a claim is 1 in 1,000. In case of a claim, the company pays
out, on average, $5000 to the customer in compensation (this amount can clearly vary a lot from
one case to another). So, the average loss from a policy in case of a claim is $4,800 ($5,000
compensation minus the $200 price paid by the customer). (1) Assume the company sells only 100
such policies. Calculate the probability that the total profit to the company from these policies will
be between $19,000 and $20,000. (2) Assume now that this company sells 10,000 such policies.
Calculate the probability that the total profit to the company will be between $1,900,000 and
$2,000,000.
In a communication system, each data packet consists of 1000 bits. Due to noise, each bit may
be received in error with probability 0.1. Errors are assumed to occur independently. Find
the probability that there are more than 120 errors in a given data packet.15
An instructor asks students to send their assignments in pdf format only. However, based on
experience, he knows that on average 10% of all students return their assignment without
converting the original document to pdf. During the current academic year, he is going to teach
300 students enrolled in different programs of the college he works for. Using the CLT,
calculate the probability that more than 20 students will not submit their
assignment in pdf format.16
An insurance company knows that the mean loss from fire for the entire population is 𝜇 = 250 €
and the standard deviation of loss is 𝜎 = 1000 €. If the company sells 10,000 policies, what is the
probability that the average loss per policy will be greater than 275€? (Note: The distribution of
For a simulation study, a sequence of 1000 random numbers between 0 and 1 is generated from
a uniform distribution, 𝑋𝑖 ∼ 𝑈𝑛𝑖(𝑎 = 0, 𝑏 = 1). Using the CLT, calculate the probability that the
sum of the numbers in this sequence lies between 490 and 510. Hint: The expected
value and variance of a uniform random variable between the bounds 𝑎 and 𝑏, 𝑋𝑖 ∼ 𝑈𝑛𝑖(𝑎, 𝑏),
are 𝐸(𝑋𝑖) = (𝑎 + 𝑏)⁄2 and 𝑉𝑎𝑟(𝑋𝑖) = (𝑏 − 𝑎)²⁄12.19
Assume you flip a fair coin 20 times. Calculate the probability of getting heads between 8 and 12
times, i.e. 𝑃(8 ≤ 𝐻𝑒𝑎𝑑𝑠 ≤ 12), using (1) the exact binomial model, and (2) the CLT. Compare the
results and apply the continuity correction to CLT to improve the approximation.20
17 The expected value of the sample average is 250 and its standard deviation is 𝑆𝐸(𝑥̅) = 𝜎⁄√10,000 = 1000⁄100 = 10.
Using the CLT, we can find 𝑃(𝑥̅ ≥ 275) = 𝑃(𝑍 ≥ (275 − 250)⁄10) = 𝑃(𝑍 ≥ 2.5) = 0.0062. The strong skewness of the loss
distribution can be ignored thanks to the large sample size.
18 Source: https://www.probabilitycourse.com/chapter7/7_1_3_solved_probs.php. Let 𝑌 = 𝑋1 + ⋯ + 𝑋𝑛. Then,
𝐸(𝑌) = 𝑛𝐸(𝑋𝑖) = 𝑛 and 𝑉𝑎𝑟(𝑌) = 𝑛𝑉𝑎𝑟(𝑋𝑖) = 𝑛 since 𝜆 = 1 (given). Note that 𝑋̅ = 𝑌⁄𝑛. Therefore, we apply the CLT,
$$P\left(0.9 \le \frac{Y}{n} \le 1.1\right) \ge 0.95 \;\Rightarrow\; P(0.9n \le Y \le 1.1n) \ge 0.95 \;\Rightarrow\; P\left(\frac{0.9n - n}{\sqrt{n}} \le \frac{Y - n}{\sqrt{n}} \le \frac{1.1n - n}{\sqrt{n}}\right) \ge 0.95$$
This simplifies to
$$P\left(-0.1\sqrt{n} \le \frac{Y - n}{\sqrt{n}} \le 0.1\sqrt{n}\right) \ge 0.95$$
The CLT implies that (𝑌 − 𝑛)⁄√𝑛 is approximately 𝑁(0, 1). So, 𝑃(0.9 ≤ 𝑋̅ ≤ 1.1) ≈ Φ(0.1√𝑛) − Φ(−0.1√𝑛) = 2Φ(0.1√𝑛) − 1,
since Φ(−𝑥) = 1 − Φ(𝑥). We need 2Φ(0.1√𝑛) − 1 ≥ 0.95 → Φ(0.1√𝑛) ≥ 0.975 → 0.1√𝑛 ≥ Φ⁻¹(0.975) = 1.96 → 𝑛 ≥ 384.16.
As 𝑛 is an integer, we conclude 𝑛 ≥ 385.
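The sample-size search in footnote 18 reduces to solving 0.1√𝑛 ≥ Φ⁻¹(0.975). A short Python sketch using the unrounded quantile (about 1.95996 rather than 1.96) still gives 385:

```python
import math
from statistics import NormalDist

# Smallest n with 2*Phi(0.1*sqrt(n)) - 1 >= 0.95, i.e. 0.1*sqrt(n) >= Phi^{-1}(0.975)
z = NormalDist().inv_cdf(0.975)        # about 1.96
n_min = math.ceil((z / 0.1) ** 2)      # smallest integer n satisfying the inequality
print(round(z, 4), n_min)
```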
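19 With 𝑋𝑖 ∼ 𝑈𝑛𝑖(𝑎 = 0, 𝑏 = 1), 𝐸(𝑋𝑖) = 1⁄2 and 𝑉𝑎𝑟(𝑋𝑖) = 1⁄12. Let 𝑌1000 = 𝑋1 + ⋯ + 𝑋1000, so that 𝐸(𝑌1000) = 500 and
𝑉𝑎𝑟(𝑌1000) = 1000⁄12 = 83.3333. By the CLT, the standardized sum
$$Z_{1000} = \frac{(X_1 + \cdots + X_{1000}) - 1000 \times \tfrac{1}{2}}{\sqrt{1000 \times \tfrac{1}{12}}}$$
is approximately 𝑁(0, 1). Then,
$$P(490 \le Y_{1000} \le 510) = P\left(\frac{490 - 500}{\sqrt{83.3333}} \le Z_{1000} \le \frac{510 - 500}{\sqrt{83.3333}}\right) = \Phi(1.0954) - \Phi(-1.0954) = 2\Phi(1.0954) - 1 = 0.7267$$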
20 Using the binomial model, 𝑃(8 ≤ 𝑋 ≤ 12) = 𝑃(𝑋 = 8) + ⋯ + 𝑃(𝑋 = 12) = 0.7368. Using the CLT, with
𝐸(𝑋) = 20 × 0.5 = 10 and 𝑉𝑎𝑟(𝑋) = 20 × 0.5 × 0.5 = 5,
$$P(8 \le X \le 12) = P\left(\frac{8 - 10}{\sqrt{5}} \le Z \le \frac{12 - 10}{\sqrt{5}}\right) = 0.6289$$
To reduce the difference between the binomial probability and the one obtained via the CLT, we apply the continuity
correction as,
$$P(8 - 0.5 \le X \le 12 + 0.5) = P\left(\frac{7.5 - 10}{\sqrt{5}} \le Z \le \frac{12.5 - 10}{\sqrt{5}}\right) = 0.7364$$
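The three probabilities in footnote 20 can be reproduced as follows. The exact value sums the binomial pmf for 𝑘 = 8, …, 12, using the fact that each of the 2²⁰ sequences of 20 fair flips has the same probability.

```python
import math
from statistics import NormalDist

n, p = 20, 0.5
normal = NormalDist(n * p, math.sqrt(n * p * (1 - p)))    # mean 10, sd sqrt(5)

exact = sum(math.comb(n, k) for k in range(8, 13)) / 2**n  # P(8 <= X <= 12)
clt = normal.cdf(12) - normal.cdf(8)                       # no correction
clt_cc = normal.cdf(12.5) - normal.cdf(7.5)                # continuity correction

print(f"exact={exact:.4f}  clt={clt:.4f}  clt_cc={clt_cc:.4f}")
```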
A specific engine made for speedboats has an average power of 220 HP and a standard deviation
of 5 HP. If we randomly select 16 engines and calculate their average HP, what is the probability
that the average power will be less than 222 HP?21
The weights of baby giraffes are known to have a mean of 125 pounds and a standard deviation
of 15 pounds. If we randomly select 40 baby giraffes, what is the probability that the sample
average lies between 122 and 128 pounds?
Suppose that the number of errors per computer program has a Poisson distribution with mean 5
and variance 5. We select 125 programs at random. Calculate the probability that the average number of errors
per program is less than 5.5.22
Let 𝑋𝑖 be the waiting time of a customer in front of the desk in an agency. An assistant manager
claims that the average waiting time of all customers is 5 minutes. The manager prefers to check
his assistant's claim. He observes 36 customers selected at random and finds that the average
waiting time is 6.8 minutes. Should the manager reject his assistant's claim? Hint: Waiting times
𝑋𝑖 are typically assumed to follow an exponential distribution with density function 𝑓(𝑥; 𝜆) =
𝜆𝑒^(−𝜆𝑥) for 𝑥 ≥ 0, where 𝜆 is the distribution parameter. The mean and variance of 𝑋 are 𝐸(𝑋) =
1⁄𝜆 and 𝑉𝑎𝑟(𝑋) = 1⁄𝜆². Therefore, in this case, 𝜆 = 0.2 because the mean is 𝐸(𝑋) = 5 minutes.23
Suppose that 80% of all smartphones are equipped with an Android operating system. Consider a
random sample of 242 smartphones. What is the probability that the sample proportion of
smartphones equipped with Android OS lies between 75% and 85%?24
21 𝑃(𝑥̅ ≤ 222) = 𝑃(𝑍 ≤ (222 − 220)⁄(5⁄√16)) = 𝑃(𝑍 ≤ 1.6) = 0.9452.
22 Wasserman (2004, pp. 77–78). 𝑃(𝑥̅125 ≤ 5.5) = 𝑃(𝑍125 ≤ (5.5 − 5)⁄(√5⁄√125)) = Φ(2.5) = 0.9938.
23 Using the exponential distribution, we note that the mean and the variance of the parent distribution are 𝜇 = 5 and
𝜎² = 25. The sample mean is 𝑋̅ = 6.8 minutes. The manager would reject his assistant's claim if the chances of
observing such a sample mean are low. That is, the manager can evaluate the claim by calculating 𝑃(𝑋̅ ≥ 6.8). Because
the waiting times of the customers should be iid random variables, we can apply the CLT as 𝑃(𝑋̅ ≥ 6.8) =
𝑃(𝑍 ≥ (6.8 − 5)⁄(√25⁄√36)) = 𝑃(𝑍 ≥ 2.16). This is equal to 1.5386%. The remaining part of the answer should be
based on a careful interpretation of this probability.
24 The mean of the sample proportion is 𝑝 = 0.8. Its standard error is 𝑆𝐸(𝑝̂) = √(0.8 × 0.2⁄242) = 0.0257.
$$P(0.75 \le \hat{p} \le 0.85) = P\left(\frac{0.75 - 0.8}{0.0257} \le Z \le \frac{0.85 - 0.8}{0.0257}\right) = P(-1.9445 \le Z \le 1.9445) = 0.9482$$
Exercise 28. CLT for sample proportion
Suppose that 45% of Europeans own an iPhone. If one takes a random sample of 50 European
citizens, what is the probability that the sample proportion of iPhone owners lies between
40% and 50%?25
Consider two samples of identical size 𝑛. The proportion of successes in the first sample is 30%
and the proportion of successes in the second sample is 50%. If the probability that the population
proportion falls between 30% and 50% is 95%, and assuming the sample proportions can be
modeled by a normal model, calculate approximately the number of observations 𝑛 used in
each sample.26
A large firm's call center handles customer complaints, technical support issues and various
inquiries. There are 36 agents in charge of managing inbound calls during the 4-hour
morning shift between 8:00 AM and 12 noon. Each agent receives on average 1.75 calls every 10
minutes. On Monday, March 1st, 2021, the call center registered a total of 1668 inbound calls. The
manager thinks that such a number is not unusual for the morning shift. Using the statistics above,
set forth an argument against the manager.27
A large firm's call center handles customer complaints, technical support issues and various
inquiries. There are 36 agents in charge of managing inbound calls during the 4-hour
morning shift between 8:00 AM and 12 noon. Each agent receives on average 1.75 calls every 10
minutes. Using these statistics, calculate the lower bound 𝑊⁻ and the upper bound 𝑊⁺ such that, 99% of the
time, the total number of calls handled during the morning shift of an ordinary day lies
between 𝑊⁻ and 𝑊⁺. Round your answers 𝑊⁻ and 𝑊⁺ up to the nearest integer.28

25 𝑆𝐸(𝑝̂) = √(0.45(1 − 0.45)⁄50) = 0.07036. Using the CLT for the sample proportion, we calculate
$$P(0.4 \le \hat{p} \le 0.5) = P\left(\frac{0.4 - 0.45}{0.07036} \le Z \le \frac{0.5 - 0.45}{0.07036}\right) = P(-0.7107 \le Z \le 0.7107) = 2\Phi(0.7107) - 1 = 52.3\%$$
26 Because the normal model applies to 𝑝̂, we can write 𝑃(0.3 ≤ 𝑝 ≤ 0.5) = 𝑃(−1.96 ≤ 𝑍 ≤ 1.96) = 0.95. Then, we
27 1.75 calls per 10-minute interval is equivalent to 6 × 1.75 = 10.5 calls each hour and 10.5 × 4 = 42 expected
calls during the morning shift. The number of total calls that an agent 𝑖 will handle during an ordinary morning shift is
then a Poisson random variable 𝑋𝑖 ∼ 𝑃𝑜𝑖(𝜆 = 42) with 𝐸(𝑋𝑖) = 𝑉𝑎𝑟(𝑋𝑖) = 42. The total number of calls that all agents
will handle can then be represented as the sum 𝑊 = 𝑋1 + ⋯ + 𝑋36. This is a sum of iid random variables and,
therefore, the CLT is suitable to assess how unlikely a total of 1668 inbound calls is during a morning shift. Applying
the CLT, we get
$$Z_W = \frac{W - E(W)}{\sqrt{Var(W)}} = \frac{W - nE(X_i)}{\sqrt{n\,Var(X_i)}} = \frac{1668 - 36 \times 42}{\sqrt{36 \times 42}} = 4.01$$
Thus, receiving a total of 1668 calls during a morning shift is a 4-sigma event based on the CLT. Unlike what the
manager claims, such an event is quite unusual.
You play a game of chance. You roll a six-sided die and win 6 times your bet if the die lands on 6. For
example, if you bet $10 and the die lands on 6, then your payoff is $10 × 6 = $60. If the die lands on
a number other than 6, you lose your $10 bet. (1) Let 𝑋 be the random variable defined as the
Profit & Loss (i.e. P&L) of a player who bets $10 on this game one time. Build a probability model
for 𝑋. Calculate 𝐸(𝑋) and 𝑉𝑎𝑟(𝑋). (2) Suppose you play the game 𝑛 = 100 times. What is the
probability that your P&L will be higher than $100 at the end of your play?29
Suppose a soccer player scores, on average, 0.35 goals per game of 90 minutes. Remember, this is
an average: sometimes he scores 1 goal per game, other times 2 goals per game or no goal at all.
(1) Calculate the probability that he scores 2 goals during the next ordinary game of 90
minutes, 𝑃(𝑋 = 2) = ? (2) Suppose that he is going to play 100 games of the same kind. Calculate
the probability that he scores at least 30 and at most 40 goals at the end of this series if his
performance during a given game is independent of his performance during a previous one,
𝑃(30 ≤ 𝑋 ≤ 40) = ?30
28 1.75 calls per 10-minute interval is equivalent to 6 × 1.75 = 10.5 calls each hour and 10.5 × 4 = 42 expected
calls during the morning shift. The number of total calls that an agent 𝑖 will handle during an ordinary morning shift is
then a Poisson random variable 𝑋𝑖 ∼ 𝑃𝑜𝑖(𝜆 = 42) with 𝐸(𝑋𝑖) = 𝑉𝑎𝑟(𝑋𝑖) = 42. The total number of calls that all agents
will handle can then be represented as the sum 𝑊 = 𝑋1 + ⋯ + 𝑋36 of iid random variables. The bounds 𝑊⁻ and 𝑊⁺
must satisfy 𝑃(𝑊⁻ ≤ 𝑊 ≤ 𝑊⁺) = 0.99. Because 𝑊 is nearly normally distributed, we can write 𝑃(𝑧⁻ ≤ 𝑍 ≤ 𝑧⁺) = 0.99
where 𝑍 ∼ 𝑁(0, 1). Using the standard normal quantiles, we first note 𝑧⁻ = −2.58 and 𝑧⁺ = 2.58, since 99% of the
standard normal density lies between −2.58 and 2.58. Finally, we solve for 𝑊⁻ and 𝑊⁺ in
−2.58 = (𝑊⁻ − 36 × 42)⁄√(36 × 42) → 𝑊⁻ = 1411.678 and 2.58 = (𝑊⁺ − 36 × 42)⁄√(36 × 42) → 𝑊⁺ = 1612.322.
Thus, 99% of the time during the morning shift of an ordinary day, the call center is expected to handle between
1412 and 1613 calls.
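A sketch of the call-center calculations in footnotes 27 and 28, assuming (as the footnotes do) independent Poisson(42) call counts per agent. It uses the unrounded 99.5% quantile (about 2.5758) instead of 2.58, so the bounds differ slightly in the decimals but round up to the same integers, 1412 and 1613.

```python
import math
from statistics import NormalDist

agents, lam = 36, 1.75 * 6 * 4    # 42 expected calls per agent per 4-hour shift
mean = agents * lam               # E(W) = 36 * 42
sd = math.sqrt(agents * lam)      # Var(W) = 36 * 42, since Var(X_i) = E(X_i) for Poisson

z = NormalDist().inv_cdf(0.995)   # two-sided 99% quantile (the notes round to 2.58)
w_low, w_high = mean - z * sd, mean + z * sd
z_observed = (1668 - mean) / sd   # standardized value of the 1668 calls (footnote 27)
print(round(w_low, 1), round(w_high, 1), round(z_observed, 2))
```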
29 Let 𝑋𝑖 represent the P&L when we play the game one time and bet $10: 𝑋𝑖 = +$50 (the $60 payoff minus the $10 bet)
with probability 1⁄6 and 𝑋𝑖 = −$10 with probability 5⁄6. Then 𝐸(𝑋𝑖) = (1⁄6)(50) + (5⁄6)(−10) = $0 and
𝑉𝑎𝑟(𝑋𝑖) = (1⁄6)(50²) + (5⁄6)(10²) = 500 ($²) per $10 bet. If we play the game 𝑛 = 100 times, the total P&L is the sum
of an iid sequence of random variables 𝑋𝑖 and we can apply the CLT: 𝑍100 = (100 − 100 × 𝐸(𝑋𝑖))⁄√(𝑛 × 𝑉𝑎𝑟(𝑋𝑖)) =
100⁄√50000 = 0.447. Then, the probability that the final P&L is above $100 is given by
𝑃(𝑍 ≥ 0.447) = 1 − 𝑃(𝑍 ≤ 0.447) ≈ 32.7%.
30 𝑋 ∼ 𝑃𝑜𝑖(𝜆 = 0.35). (1) 𝑃(𝑋 = 2) = 𝑒^(−0.35) × 0.35²⁄2! = 0.0432 (in Excel, use POISSON.DIST(2, 0.35, FALSE)). (2) We
consider the next 100 games as an iid sequence, 𝑊100 = 𝑋1 + ⋯ + 𝑋100. Each 𝑋𝑖 follows the same Poisson distribution
with mean 𝜆 = 0.35 and variance 𝜆 = 0.35. Therefore, we apply the CLT as 𝑃(30 ≤ 𝑊100 ≤ 40) =
𝑃((30 − 100 × 0.35)⁄√(100 × 0.35) ≤ 𝑍 ≤ (40 − 100 × 0.35)⁄√(100 × 0.35)). Using the standard normal distribution,
we get 𝑃(−0.8452 ≤ 𝑍 ≤ 0.8452) = 0.8010 − 0.1990 = 60.20%.
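Footnote 30 can be checked in Python. `math.exp` gives the Poisson pmf at 2 directly (the same quantity Excel's POISSON.DIST(2, 0.35, FALSE) returns), and the CLT step uses a normal distribution with mean and variance 35, without continuity correction, as in the footnote.

```python
import math
from statistics import NormalDist

lam, games = 0.35, 100
p_two_goals = math.exp(-lam) * lam**2 / math.factorial(2)   # Poisson pmf at 2

mean = games * lam                    # E(W_100) = 35
sd = math.sqrt(games * lam)           # Var(W_100) = 35 (Poisson: variance = mean)
normal = NormalDist(mean, sd)
p_30_40 = normal.cdf(40) - normal.cdf(30)   # CLT approximation of P(30 <= W_100 <= 40)
print(round(p_two_goals, 4), round(p_30_40, 4))
```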
Sampling distributions: Lecture Outline

Def.: Statistical inference aims at using the characteristics of a sample to estimate the characteristics of the population.
Def.: The sampling dist. of 𝜃̂ is the dist. of the estimated values of 𝜃̂ obtained from repeated samples of the same size 𝑛.
Def.: Let 𝜃̂ be a sample statistic obtained using a sample of size 𝑛. Then, the standard deviation of 𝜃̂ is called the
standard error of the estimate. For the sample average, 𝑆𝐸(𝜃̂) = 𝜎⁄√𝑛 if 𝜎 is known, otherwise 𝑆𝐸(𝜃̂) = 𝑠⁄√𝑛,
where 𝑠 is the sample standard deviation.
Remarks: (1) 𝑆𝐸(𝜃̂) measures the randomness associated with 𝜃̂; (2) the larger 𝑛 gets, the better the precision of 𝜃̂.

Central Limit Theorem: Let 𝑋1, …, 𝑋𝑛 be iid r.v. with 𝐸(𝑋𝑖) = 𝜇 and 𝑉𝑎𝑟(𝑋𝑖) = 𝜎². Define 𝑌𝑛 = 𝑋1 + ⋯ + 𝑋𝑛.
Then, 𝐸(𝑌𝑛) = 𝑛𝜇 and 𝑉𝑎𝑟(𝑌𝑛) = 𝑛𝜎² because of the iid property. The classical CLT holds that the following
transformation of 𝑌𝑛,
$$Z_n = \frac{Y_n - E(Y_n)}{\sqrt{Var(Y_n)}} = \frac{(X_1 + \cdots + X_n) - n\mu}{\sigma\sqrt{n}}$$
converges in distribution to the standard normal,
$$\lim_{n\to\infty} P(Z_n \le z) = \Phi(z), \qquad \lim_{n\to\infty} P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z\right) = \Phi(z)$$

CLT for 𝑝̂: Let 𝑋1, …, 𝑋𝑛 be iid Bernoulli, 𝑋𝑖 ∼ 𝐵𝑒(𝑝). Define 𝑌 as the number of successes in 𝑋1, …, 𝑋𝑛; then
𝑌 ∼ 𝐵(𝑛, 𝑝) with 𝐸(𝑌) = 𝑛𝑝 and 𝑉𝑎𝑟(𝑌) = 𝑛𝑝(1 − 𝑝). Define 𝑝̂ as the sample proportion of successes. The exp.
value of 𝑝̂ is 𝐸(𝑝̂) = 𝑛𝑝⁄𝑛 = 𝑝. The st. error of 𝑝̂ is,
$$SE(\hat{p}) = \frac{\sqrt{Var(Y)}}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$$
The CLT holds that,
$$\lim_{n\to\infty} P\left(\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \le z\right) = \Phi(z)$$
which implies 𝑝̂ ∼ 𝑁(𝑝, √(𝑝(1 − 𝑝)⁄𝑛)).
Remarks: Normality requires 𝑛𝑝 and 𝑛(1 − 𝑝) > 15.