Week 2
Sampling distributions
Let’s draw all possible samples of size n from a given population of size N and compute a statistic (the mean, a proportion, or the standard deviation) for each sample. The distribution of these values over all samples is the sampling distribution of that statistic.
The variability of a sampling distribution is measured by its variance (or by its standard deviation).
Note: If N is much larger than n, then n/N is small. In that case the sampling distribution has roughly the same sampling error whether sampling is done with or without replacement.
If sampling is done without replacement and the sample is a substantial fraction of the population (say, 1/10 of the population size), the sampling error will be noticeably smaller.
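Hence, when sampling is done with replacement (or when N is much larger than n), the standard deviation of the sampling distribution of the mean is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$

Therefore, when the population is normal (or approximately, by the Central Limit Theorem, when n is large), we can specify the sampling distribution of the mean as

$$\bar{x} \sim N\left(\mu_{\bar{x}}, \sigma_{\bar{x}}\right) = N\left(\mu, \frac{\sigma}{\sqrt{n}}\right).$$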
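As a minimal sketch using only Python's standard library, we can enumerate every possible sample from a tiny population and check these statements exactly. The population [2, 4, 6, 8] and n = 2 are hypothetical illustrative choices. With replacement, the variance of the sample means equals σ²/n. Without replacement, the variance is smaller and matches σ²/n multiplied by the finite population correction (N − n)/(N − 1). That correction is the standard textbook result and is not stated in these notes.

```python
from itertools import combinations, product
from statistics import mean, pvariance

population = [2, 4, 6, 8]      # hypothetical toy population, N = 4
N, n = len(population), 2
sigma2 = pvariance(population)  # population variance sigma^2

# With replacement: all N**n equally likely ordered samples
means_with = [mean(s) for s in product(population, repeat=n)]
# Without replacement: all C(N, n) equally likely unordered samples
means_without = [mean(s) for s in combinations(population, n)]

print("with replacement:   ", pvariance(means_with), "vs sigma^2/n =", sigma2 / n)
print("without replacement:", pvariance(means_without),
      "vs sigma^2/n * (N-n)/(N-1) =", sigma2 / n * (N - n) / (N - 1))
```

Both sets of sample means average to the population mean μ = 5. Only their spread differs.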
In a population, let the probability of a success be P and the probability of a failure be Q = 1 − P.
From this population of size N, suppose that we draw all possible samples of size n. Within each sample, we determine the proportion of successes p and of failures q. In this way, we create the sampling distribution of the proportion.
Suppose there are m such samples drawn from this large population.
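Thus, the standard deviation of the sampling distribution of the proportion is

$$\sigma_p = \sqrt{\frac{PQ}{n}},$$

and

$$p \sim N\left(P, \sqrt{\frac{PQ}{n}}\right)$$

whenever the sample size is sufficiently large (a common rule of thumb is nP ≥ 5 and nQ ≥ 5) and the population probability of success (P) is known.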
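To check σ_p = √(PQ/n) without any approximation, the sketch below computes the exact mean and standard deviation of p = X/n directly from the binomial probabilities. It assumes independent trials, which corresponds to an effectively infinite population. The values P = 0.3 and n = 20 are arbitrary illustrative choices and do not come from the notes.

```python
from math import comb, sqrt

def exact_prop_moments(P, n):
    """Exact mean and SD of p = X/n when X ~ Binomial(n, P)."""
    pmf = [comb(n, k) * P**k * (1 - P) ** (n - k) for k in range(n + 1)]
    mean_p = sum(prob * k / n for k, prob in enumerate(pmf))
    var_p = sum(prob * (k / n - mean_p) ** 2 for k, prob in enumerate(pmf))
    return mean_p, sqrt(var_p)

P, n = 0.3, 20                       # hypothetical illustrative values
mean_p, sd_p = exact_prop_moments(P, n)
formula_sd = sqrt(P * (1 - P) / n)   # sigma_p = sqrt(PQ/n)
print(f"mean = {mean_p:.4f}, exact SD = {sd_p:.6f}, formula SD = {formula_sd:.6f}")
```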
Example:
1. Suppose that a biased coin has probability P = 0.4 of heads. In 1000 tosses, what is the
probability that the number of heads exceeds 410?
2. Find the probability that, of the next 120 births, no more than 40% will be boys. Assume
equal probabilities for the births of boys and girls. Assume also that the number of births in
the population (N) is very large, essentially infinite.
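Both examples can be worked with the approximation p ~ N(P, √(PQ/n)) stated above. The following minimal sketch uses `statistics.NormalDist` from Python's standard library. It applies no continuity correction, which is an assumption on our part. Shifting the cutoff by 0.5/n would change the answers slightly. The helper name `prop_tail_probs` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prop_tail_probs(P, n, cutoff):
    """Return (P(p <= cutoff), P(p > cutoff)) using p ~ N(P, sqrt(P*Q/n))."""
    se = sqrt(P * (1 - P) / n)              # sigma_p
    below = NormalDist(mu=P, sigma=se).cdf(cutoff)
    return below, 1 - below

# Example 1: more than 410 heads in 1000 tosses  <=>  p > 0.41, with P = 0.4
_, ex1 = prop_tail_probs(0.4, 1000, 410 / 1000)
# Example 2: at most 40% boys among 120 births, with P = 0.5
ex2, _ = prop_tail_probs(0.5, 120, 0.40)
print(f"Example 1: {ex1:.4f}")
print(f"Example 2: {ex2:.4f}")
```

Without the continuity correction, this gives about 0.26 for Example 1 (z ≈ 0.65) and about 0.014 for Example 2 (z ≈ −2.19).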
Exercises:
1. A true-false examination has 48 questions. Jane has probability 3/4 of answering a question
correctly. Ama just guesses on each question. A passing score is 30 or more correct answers.
Compare the probability that Jane passes the exam with the probability that Ama passes it.
Jane’s score has distribution B(48, 0.75), so the probability that Jane’s score is 30 or more is
1 − P(X ≤ 29) ≈ 0.9816. If your calculator does not compute binomial probabilities, use a
normal approximation to the binomial distribution (based on the Central Limit Theorem).
2. A restaurant feeds 400 customers per day. On the average 20 percent of the customers order
apple pie.
(a) Give a range for the number of pieces of apple pie ordered on a given day such that you
can be 95 percent sure that the actual number will fall in this range.
(b) How many customers must the restaurant have, on the average, to be at least 95 percent
sure that the number of customers ordering pie on that day falls in the 19 to 21 percent
range?
3. A rookie is brought to a baseball club on the assumption that he will have a 0.3 batting
average. (Batting average is the ratio of the number of hits to the number of times at bat.) In
the first year, he comes to bat 300 times and his batting average is 0.267. Assume that his at
bats can be considered Bernoulli trials with probability 0.3 for success. Could such a low
average be considered just bad luck or should he be sent back to the minor leagues?
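For Exercise 1, a short sketch computes the exact binomial probabilities with `math.comb` and compares them with the normal approximation mentioned in the hint. The continuity correction (using 29.5 as the cutoff) is our added assumption; the notes do not specify one. The helper names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_tail(n, p, k_min):
    """Exact P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

def normal_tail(n, p, k_min):
    """Normal approximation to P(X >= k_min), with a continuity correction."""
    mu, sd = n * p, sqrt(n * p * (1 - p))
    return 1 - NormalDist(mu, sd).cdf(k_min - 0.5)

jane_exact, jane_approx = binom_tail(48, 0.75, 30), normal_tail(48, 0.75, 30)
ama_exact, ama_approx = binom_tail(48, 0.5, 30), normal_tail(48, 0.5, 30)
print(f"Jane: exact {jane_exact:.4f}, normal approx {jane_approx:.4f}")
print(f"Ama:  exact {ama_exact:.4f}, normal approx {ama_approx:.4f}")
```

Jane passes with probability of about 0.98. Ama's chance is only roughly 0.06.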
3. Student’s t Distribution
A particular form of the t distribution is determined by its degrees of freedom. The “degrees of
freedom” refers to the number of independent observations in a set of data.
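Suppose we have a simple random sample of size n drawn from a Normal population with mean
μ and standard deviation σ. Let x̄ denote the sample mean and s the sample standard deviation.
The t statistic is

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}},$$

which follows a t distribution with n − 1 degrees of freedom.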
The 𝑡 score produced by this transformation can be associated with a unique cumulative
probability. This cumulative probability represents the likelihood of finding a sample mean less
than or equal to 𝑥̅ , given a random sample of size n.
The notation $t_\alpha$ represents the t score that has a cumulative probability of (1 − α).
The t distribution can be used with any statistic having a bell-shaped (approximately normal)
distribution. In particular, it is applied when the population is large, the sample size is small,
and the population standard deviation σ is unknown, so that it must be estimated by s.
The t distribution should not be used with small samples from populations that are not
approximately normal.
Example:
1. A random sample of 12 observations from a normal population with mean 48 produced the
following estimates: $\bar{x} = 47.1$ and $s = 4.7$. Find the probability of getting a sample of
the same size with its mean less than or equal to the population mean.
2. The MD of Orrange, a light bulb manufacturer, claims that their light bulbs last an average of
300 days. An investigator randomly selects 15 bulbs for testing, and those bulbs last an
average of 290 days, with a standard deviation of 50 days. Assuming the MD’s claim is true,
determine the probability that 15 randomly selected bulbs would have an average life of no
more than 290 days.
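Python's standard library has no t distribution. This sketch therefore evaluates the t density with `math.lgamma` and integrates it numerically with Simpson's rule. That method is our choice for a self-contained illustration; in practice, a library routine such as `scipy.stats.t.cdf` would normally be used. The code applies t = (x̄ − μ)/(s/√n) with n − 1 degrees of freedom to the light-bulb example. The function names are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps must be even)."""
    h = abs(x) / steps
    area = t_pdf(0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Light-bulb example: claimed mean 300, sample mean 290, s = 50, n = 15
mu, xbar, s, n = 300, 290, 50, 15
t = (xbar - mu) / (s / sqrt(n))
prob = t_cdf(t, n - 1)
print(f"t = {t:.4f} with {n - 1} df, P(mean <= 290) = {prob:.4f}")
```

Here t ≈ −0.775 with 14 degrees of freedom, and the probability comes out at about 0.23.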
4. Chi-square Distribution
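The chi-square statistic can be calculated from a sample of size n drawn from a normal population
using the following equation:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2},$$

where s is the sample standard deviation and σ is the population standard deviation. The
statistic has n − 1 degrees of freedom.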
If we repeat the sampling indefinitely and calculate the chi-square statistic for each sample, we
obtain the sampling distribution of the chi-square statistic. It is called the chi-square
distribution.
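The following sketch approximates the repeated-sampling idea by simulation. It uses a fixed seed and 20,000 samples instead of infinitely many, and the population values (μ = 60, σ = 5, n = 10) are hypothetical. For each sample it computes (n − 1)s²/σ². The simulated values should have a mean near n − 1 and a variance near 2(n − 1), which are the known moments of a chi-square distribution with n − 1 degrees of freedom.

```python
import random
from statistics import fmean

random.seed(12345)             # fixed seed so the run is reproducible
mu, sigma, n = 60.0, 5.0, 10   # hypothetical normal population and sample size
reps = 20000                   # finite stand-in for "infinitely many" samples

chi2_values = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # sample variance
    chi2_values.append((n - 1) * s2 / sigma**2)

center = fmean(chi2_values)
spread = fmean((c - center) ** 2 for c in chi2_values)
print(f"mean ~ {center:.2f} (theory {n - 1}), variance ~ {spread:.2f} (theory {2 * (n - 1)})")
```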
The chi-square distribution is constructed so that the total area under the curve is equal to 1. The
probability that the value of a chi-square statistic will fall between 0 and A, $P(\chi^2 \le A)$, is
illustrated by the following diagram.
Using the following Chi-Square Distribution table, one can find the critical $\chi^2$ value when the
probability of exceeding the critical value is given.
Example: My Cell company has developed a new cell phone battery. On average, the battery
lasts 60 minutes on a single charge. The standard deviation is 5 minutes. Suppose the
manufacturing department runs a quality control test. They randomly select 10 batteries. The
standard deviation of the selected batteries is 6 minutes.
b) What is the probability that the standard deviation of a random sample of size 10 would be
greater than 6 minutes?
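A sample standard deviation greater than 6 is the same event as χ² > (n − 1)·6²/σ² = 9·36/25 = 12.96, with n − 1 = 9 degrees of freedom. The standard library has no chi-square CDF, so this sketch evaluates the upper tail from the standard power series for the regularized lower incomplete gamma function. A library call such as `scipy.stats.chi2.sf` would be the usual route. The helper name `chi2_sf` is hypothetical, and the fixed number of series terms is only adequate for moderate x.

```python
from math import exp, lgamma, log

def chi2_sf(x, df, terms=200):
    """P(chi-square_df > x) for x > 0, via the series for the lower incomplete gamma."""
    a, z = df / 2, x / 2
    # P(a, z) = z^a e^{-z} / Gamma(a + 1) * sum_k z^k / ((a + 1)(a + 2)...(a + k))
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= z / (a + k)
        total += term
    lower = exp(a * log(z) - z - lgamma(a + 1)) * total
    return 1.0 - lower

n, sigma, s = 10, 5.0, 6.0            # battery example
chi2 = (n - 1) * s**2 / sigma**2      # 9 * 36 / 25 = 12.96
p_greater = chi2_sf(chi2, n - 1)
print(f"chi-square = {chi2:.2f}, P(chi-square > {chi2:.2f}) = {p_greater:.4f}")
```

This gives P(χ² > 12.96) ≈ 0.16.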
5. F Distribution
The distribution of all possible values of the f statistic is called an F distribution, with
$v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom. The f statistic, also known as an f value, is a
random variable that has an F distribution.
Select a random sample of size n1 from a normal population, having a standard deviation
equal to σ1.
Select an independent random sample of size n2 from a normal population, having a
standard deviation equal to σ2.
The f statistic is the ratio of $s_1^2/\sigma_1^2$ to $s_2^2/\sigma_2^2$:

$$f = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}.$$
The curve of the F distribution depends on the degrees of freedom, v1 and v2.
The cumulative probability of an f value is the probability that the f statistic is less than or equal
to that value.
An F distribution table can be used to find the value of an f statistic having a cumulative
probability of (1 − α), denoted $f_\alpha$.
Thus, $f_{0.05}(v_1, v_2)$ refers to the value of the f statistic having a cumulative probability of
1 − 0.05 = 0.95, with $v_1$ and $v_2$ degrees of freedom.
Example:
Suppose a sample of 11 cows was selected at random from a population whose weights have a
standard deviation of 5 kg, and the sample standard deviation is 4.5 kg. Another sample of 7 bulls
was taken in the same way from a population with a standard deviation of 3.5 kg, and the sample
standard deviation is 4 kg.
a) Compute the f statistic.
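The arithmetic for part (a) follows directly from f = (s₁²/σ₁²)/(s₂²/σ₂²). Here f = (4.5²/5²)/(4²/3.5²) = 0.81/1.3061 ≈ 0.620, with v₁ = 10 and v₂ = 6 degrees of freedom. A minimal sketch with a hypothetical helper name:

```python
def f_statistic(s1, sigma1, s2, sigma2):
    """f = (s1^2 / sigma1^2) / (s2^2 / sigma2^2)."""
    return (s1**2 / sigma1**2) / (s2**2 / sigma2**2)

# Cows: n1 = 11, sigma1 = 5 kg, s1 = 4.5 kg; bulls: n2 = 7, sigma2 = 3.5 kg, s2 = 4 kg
n1, n2 = 11, 7
f = f_statistic(4.5, 5.0, 4.0, 3.5)
v1, v2 = n1 - 1, n2 - 1
print(f"f = {f:.4f} with v1 = {v1}, v2 = {v2}")
```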