Week 2

1. The document discusses sampling distributions and the central limit theorem. It explains that as sample size increases, the sampling distribution of statistics like the mean and proportion approaches a normal distribution, regardless of the population distribution. 2. It gives the sampling distributions of the mean and of the proportion: for sufficiently large samples the mean is approximately N(μ, σ²/n) and the proportion is approximately N(P, PQ/n). 3. The document also discusses the t distribution, which is used for questions about population means when the population standard deviation is unknown and the sample is small, and introduces the chi-square and F distributions.
MA 3014 – S3 (’20 Batch) - 2022

Sampling distributions

Suppose we draw all possible samples of size 𝑛 from a given population of size 𝑁 and
compute a statistic (the mean, a proportion, or the standard deviation) for each sample.

The probability distribution of this statistic is called a sampling distribution.


Variability of a Sampling Distribution

The variability of a sampling distribution is measured by its variance (or by its standard deviation).

This variability depends on:

• 𝑁: the number of observations in the population.
• 𝑛: the number of observations in the sample.
• The method used to select the samples at random.

Note: If 𝑁 is much larger than 𝑛, then 𝑛/𝑁 is fairly small and the sampling distribution has
roughly the same sampling error whether sampling is done with or without replacement.

If sampling is done without replacement and the sample represents a significant fraction
(say, 1/10) of the population size, the sampling error will be noticeably smaller than with replacement.

The Central Limit Theorem


The Central Limit Theorem (CLT) states that the sampling distribution of a statistic such as the
sample mean or the sample proportion will be normal or nearly normal if the sample size is
“large enough”. Thus the CLT permits approximate calculations for a variety of population distributions.
As a rule of thumb, many statisticians take a sample size of 30 to be “large enough”.
The sample size can also be considered large enough in the following cases:
• The population distribution is normal.
• The sampling distribution is symmetric, unimodal, without outliers, and the sample
size is 15 or less.
• The sampling distribution is moderately skewed, unimodal, without outliers, and the
sample size is between 16 and 40.
• The sample size is greater than 40, without outliers.
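The effect described by the CLT can be checked with a small simulation. The sketch below is only an illustration: the Exponential(1) population (mean 1, standard deviation 1), the sample size of 30, the 5000 repetitions and the function name are all assumptions chosen for the demonstration, not part of the notes.

```python
import random
import statistics


def simulate_sample_means(n, m, seed=0):
    """Draw m samples of size n from an Exponential(1) population
    (a strongly skewed distribution) and return their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(m)
    ]


# Assumed settings for the illustration: n = 30, m = 5000 samples.
means = simulate_sample_means(n=30, m=5000)
mean_of_means = statistics.fmean(means)   # should be close to the population mean, 1
sd_of_means = statistics.stdev(means)     # should be close to 1 / sqrt(30)

print("mean of sample means:", mean_of_means)
print("sd of sample means:  ", sd_of_means)
print("sigma / sqrt(n):     ", 1 / 30 ** 0.5)
```

Although the population is far from normal, a histogram of `means` would look roughly bell-shaped, and its centre and spread should agree with μ and σ/√n.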

1. Sampling Distribution of the Mean

Let $\bar{x}$ be the mean of a sample of size $n$. Suppose there are $m$ such samples drawn from this large population.

The average of the sample means is

$\mu_{\bar{x}} = \frac{1}{m} \sum_{i=1}^{m} \bar{x}_i = \mu$ (population mean)

and the variance of the sampling distribution is

$\sigma_{\bar{x}}^2 = \sigma^2 \left( \frac{1}{n} - \frac{1}{N} \right) \to \frac{\sigma^2}{n}$ as $N \to \infty$

Thus, the standard error is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.

Therefore, we can specify the sampling distribution of the mean, $\bar{x} \sim N(\mu_{\bar{x}}, \sigma_{\bar{x}}^2)$, as

$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, whenever two conditions are met:

• The population is normally distributed, or the sample size is sufficiently large.
• The population standard deviation σ is known.
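As a rough illustration of the formulas above, the sketch below computes the standard error of the mean with and without the finite-population term and then uses the normal sampling distribution to find a probability. The numbers (μ = 50, σ = 10, n = 25, N = 100, threshold 52) and the function name are hypothetical and chosen only for the example.

```python
import math
from statistics import NormalDist


def standard_error_of_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    With a finite population size N this uses sigma^2 * (1/n - 1/N),
    the form given in these notes; with N=None it uses the limiting
    value sigma / sqrt(n) for a very large population.
    """
    if N is None:
        return sigma / math.sqrt(n)
    return math.sqrt(sigma ** 2 * (1 / n - 1 / N))


# Hypothetical values for illustration only.
mu, sigma, n = 50.0, 10.0, 25

se_infinite = standard_error_of_mean(sigma, n)       # 10 / sqrt(25)
se_finite = standard_error_of_mean(sigma, n, N=100)  # sample is 1/4 of the population

# P(sample mean <= 52) under x-bar ~ N(mu, sigma^2 / n)
prob = NormalDist(mu, se_infinite).cdf(52.0)

print(se_infinite, se_finite, prob)
```

Here the sample is a large fraction of the population, so the finite-population standard error comes out smaller than σ/√n, in line with the earlier note on sampling without replacement.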

2. Sampling Distribution of the Proportion

Let the probability of a success be $P$ and the probability of a failure be $Q = 1 - P$ in a population.
From this population of size $N$, suppose that we draw all possible samples of size $n$ and,
within each sample, determine the proportion of successes $p$ and of failures $q$. In
this way, we create a sampling distribution of the proportion.

Let $p$ be the proportion of successes in a sample of size $n$.

Suppose there are $m$ such samples drawn from this large population.

The mean of the sample proportions is

$\mu_p = \frac{1}{m} \sum_{i=1}^{m} p_i = P$ (population proportion of successes)

and the variance of the sampling distribution is

$\sigma_p^2 = \sigma^2 \left( \frac{1}{n} - \frac{1}{N} \right) = PQ \left( \frac{1}{n} - \frac{1}{N} \right) \to \frac{PQ}{n}$ as $N \to \infty$

Thus, the standard error is $\sigma_p = \sqrt{\frac{PQ}{n}}$.

Therefore, we can specify the sampling distribution of the proportion, $p \sim N(\mu_p, \sigma_p^2)$, as

$p \sim N\left(P, \frac{PQ}{n}\right)$, whenever the sample size is sufficiently large and the population probability of
success ($P$) is known.

Example:

1. Suppose that a biased coin has probability p = 0.4 of heads. In 1000 tosses, what is the
probability that the number of heads exceeds 410?

2. Find the probability that, of the next 120 births, no more than 40% will be boys. Assume
equal probabilities for the births of boys and girls. Assume also that the number of births in
the population (N) is very large, essentially infinite.
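One possible way to work the two examples is to apply the normal sampling distribution of the proportion directly. The sketch below assumes no continuity correction is wanted (with one, the answers shift slightly); the variable and function names are illustrative only.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def proportion_se(P, n):
    """Standard error of a sample proportion, sqrt(PQ / n), for large N."""
    return math.sqrt(P * (1 - P) / n)


# Coin example: P = 0.4, n = 1000.
# "More than 410 heads" is read here as a sample proportion above 0.41.
se_coin = proportion_se(0.4, 1000)
p_heads_exceed_410 = 1 - Z.cdf((0.41 - 0.4) / se_coin)

# Births example: P = 0.5, n = 120, sample proportion of boys at most 0.40.
se_births = proportion_se(0.5, 120)
p_boys_at_most_40pct = Z.cdf((0.40 - 0.5) / se_births)

print("P(heads > 410) is about", round(p_heads_exceed_410, 3))
print("P(boys <= 40%) is about", round(p_boys_at_most_40pct, 3))
```

Both calculations standardise the sample proportion as z = (p − P)/√(PQ/n) and read the probability from the standard normal distribution.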

Exercises:

1. A true-false examination has 48 questions. Jane has probability 3/4 of answering a question
correctly. Ama just guesses on each question. A passing score is 30 or more correct answers.
Compare the probability that Jane passes the exam with the probability that Ama passes it.
Jane’s score has distribution B(48, 0.75), so the probability that Jane’s score is 30 or more is
1 − P(X ≤ 29) ≈ 0.982. If your calculator does not give an answer, you will have to use a
normal approximation to the binomial distribution (based on the Central Limit Theorem).

2. A restaurant feeds 400 customers per day. On the average 20 percent of the customers order
apple pie.
(a) Give a range for the number of pieces of apple pie ordered on a given day such that you
can be 95 percent sure that the actual number will fall in this range.
(b) How many customers must the restaurant have, on the average, to be at least 95 percent
sure that the number of customers ordering pie on that day falls in the 19 to 21 percent
range?

3. A rookie is brought to a baseball club on the assumption that he will have a 0.3 batting
average. (Batting average is the ratio of the number of hits to the number of times at bat.) In
the first year, he comes to bat 300 times and his batting average is 0.267. Assume that his at
bats can be considered Bernoulli trials with probability 0.3 for success. Could such a low
average be considered just bad luck or should he be sent back to the minor leagues?
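The exercises above can be checked numerically. The following is a sketch of one possible approach, not an official solution: Exercise 1 uses exact binomial sums (assuming Ama's guesses succeed with probability 0.5), while Exercises 2 and 3 use the normal approximation. All function and variable names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def binom_tail_at_least(n, p, k):
    """Exact P(X >= k) for X ~ B(n, p)."""
    return sum(
        math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )


# Exercise 1: 48 questions, pass mark 30.
jane_pass = binom_tail_at_least(48, 0.75, 30)
ama_pass = binom_tail_at_least(48, 0.5, 30)  # assumption: a pure guess is right half the time

# Exercise 2(a): central 95% range for the number of pies, n = 400, P = 0.2.
z95 = Z.inv_cdf(0.975)
n, P = 400, 0.2
mean_pies = n * P
sd_pies = math.sqrt(n * P * (1 - P))
pie_range = (mean_pies - z95 * sd_pies, mean_pies + z95 * sd_pies)

# Exercise 2(b): smallest n with z95 * sqrt(PQ / n) <= 0.01.
customers_needed = math.ceil((z95 / 0.01) ** 2 * P * (1 - P))

# Exercise 3: how unusual is an average of 0.267 in 300 at-bats if P = 0.3?
z_rookie = (0.267 - 0.3) / math.sqrt(0.3 * 0.7 / 300)
p_rookie = Z.cdf(z_rookie)  # P(batting average <= 0.267)

print(jane_pass, ama_pass)
print(pie_range, customers_needed)
print(z_rookie, p_rookie)
```

For Exercise 3, a lower-tail probability that is not very small would suggest the low average could plausibly be bad luck; how small counts as "too small" is a judgment left to the reader.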

3. Student’s t Distribution

A particular form of the t distribution is determined by its degrees of freedom. The “degrees of
freedom” refers to the number of independent observations in a set of data.

In general, the degrees of freedom of an estimate of a parameter is equal to the number of
independent scores that go into the estimate minus the number of parameters used as
intermediate steps in the estimation of the parameter itself (which, in sample variance, is one,
since the sample mean is the only intermediate step).

Lane, David M. "Degrees of Freedom". HyperStat Online. Statistics Solutions. http://davidmlane.com/hyperstat/A42408.html.
Retrieved 2008-08-21.

Suppose we have a simple random sample of size n drawn from a Normal population with mean
𝜇 and standard deviation 𝜎. Let 𝑥̅ denote the sample mean and s, the sample standard deviation.


Then the quantity $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ has a t distribution with n − 1 degrees of freedom.

The 𝑡 score produced by this transformation can be associated with a unique cumulative
probability. This cumulative probability represents the likelihood of finding a sample mean less
than or equal to 𝑥̅ , given a random sample of size n.

The notation $t_\alpha$ represents the t-score that has a cumulative probability of (1 − α).

Example: if $t_{0.05} = 2.92$, then $t_{0.95} = -2.92$, for df = 2.

Properties of the t Distribution

• The mean of the distribution is equal to 0.
• The variance is equal to v / (v − 2), where v is the degrees of freedom and v > 2.
• The variance is always greater than 1, although it is close to 1 when there are many
degrees of freedom. With infinite degrees of freedom, the t distribution is the same as the
standard normal distribution.

When to use the t Distribution

The t distribution can be used with any statistic having a bell-shaped (i.e., approximately
normal) distribution. In particular, it can be applied when the population is large, the sample
size is small, and the standard deviation of the population is unknown.

Example: $t = (p - P) / \sqrt{PQ / n}$ has a t distribution with n − 1 degrees of freedom.

When not to use the t-distribution

The t distribution should not be used with small samples from populations that are not
approximately normal.

Example:

1. A random sample of 12 observations from a normal population with mean 48 produced the
following estimates: $\bar{x} = 47.1$ and $s = 4.7$. Find the probability of getting a sample of the same size
with its mean less than or equal to the population mean.

2. The MD of the Orrange light bulb manufacturer claims that their light bulbs last
300 days on average. An investigator randomly selects 15 bulbs for testing, and those bulbs last an
average of 290 days, with a standard deviation of 50 days. Assuming the MD’s claim is true,
determine the probability that 15 randomly selected bulbs would have an average life of no
more than 290 days.
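The light bulb example can be worked by computing the t score and then its cumulative probability. The Python standard library has no t distribution, so the sketch below integrates the t density numerically with Simpson's rule; this is a stand-in for a t table or statistical software, and the function names are illustrative only.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), using symmetry about 0 and Simpson's rule on [0, |t|].

    steps must be even.
    """
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t > 0 else 0.5 - area


# Light bulb example: claimed mean 300, sample mean 290, s = 50, n = 15.
x_bar, mu, s, n = 290, 300, 50, 15
t_score = (x_bar - mu) / (s / math.sqrt(n))
prob = t_cdf(t_score, df=n - 1)  # P(sample mean <= 290) if the claim is true

print("t =", round(t_score, 4), " cumulative probability =", round(prob, 3))
```

The same `t_cdf` helper can be used to check table values, for example that a t score of 2.92 with 2 degrees of freedom has a cumulative probability of about 0.95.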

4. Chi-square Distribution

The chi-square statistic can be calculated from a sample of size $n$ drawn from a normal
population using the following equation:

$\chi^2 = \frac{(n - 1) s^2}{\sigma^2}$

If the sampling is repeated an infinite number of times and the chi-square statistic is calculated
for each sample, the sampling distribution of the chi-square statistic is obtained. It is
called the chi-square distribution.

The chi-square distribution also depends on the degrees of freedom, $(n - 1)$.

Properties of the chi-square distribution:

• The mean of the distribution is equal to the number of degrees of freedom: μ = v.
• The variance is equal to two times the number of degrees of freedom: σ² = 2v.
• When the degrees of freedom are greater than or equal to 2, the maximum value of
𝑓(𝑥), the pdf of the chi-square distribution, occurs at 𝑥 = v − 2.
• As the degrees of freedom increase, the chi-square curve approaches a normal
distribution.

Cumulative Probability of the Chi-Square Distribution

The chi-square distribution is constructed so that the total area under the curve is equal to 1. The
probability that the value of a chi-square statistic will fall between 0 and A, $P(\chi^2 \le A)$, is
the area under the curve between 0 and A.

Using a chi-square distribution table, one can find the critical $\chi^2$ value when the
probability of exceeding the critical value is given.

Example: My Cell company has developed a new cell phone battery. On average, the battery
lasts 60 minutes on a single charge. The standard deviation is 5 minutes. Suppose the
manufacturing department runs a quality control test. They randomly select 10 batteries. The
standard deviation of the selected batteries is 6 minutes.

a) What is the chi-square statistic which represents this test?

b) What is the probability that the standard deviation of any sample of size 10 would be greater
than 6 minutes?
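The battery example can be worked by plugging the numbers into the chi-square statistic and then finding the upper-tail probability. As the Python standard library has no chi-square distribution, the sketch below integrates the density numerically with Simpson's rule; this is only a stand-in for a chi-square table or statistical software, and the function names are illustrative.

```python
import math


def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom."""
    return x ** (df / 2 - 1) * math.exp(-x / 2) / (2 ** (df / 2) * math.gamma(df / 2))


def chi2_cdf(a, df, steps=2000):
    """P(chi-square <= a) by Simpson's rule on [0, a].

    Intended for df >= 2, where the density is finite at 0.
    steps must be even.
    """
    h = a / steps
    total = chi2_pdf(0.0, df) + chi2_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return total * h / 3


# Battery example: sigma = 5, sample of n = 10 with s = 6.
n, s, sigma = 10, 6.0, 5.0
chi2_stat = (n - 1) * s ** 2 / sigma ** 2          # part (a)
p_greater = 1 - chi2_cdf(chi2_stat, df=n - 1)      # part (b): P(sample sd > 6)

print("chi-square statistic =", chi2_stat)
print("P(s > 6) is about", round(p_greater, 2))
```

Part (b) works because a sample standard deviation above 6 minutes corresponds exactly to a chi-square statistic above the value found in part (a).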

5. F Distribution

The distribution of all possible values of the f statistic is called an F distribution, with v1 = n1 -
1 and v2 = n2 - 1 degrees of freedom. The f statistic, also known as an f value, is a random
variable that has an F distribution.

How to compute an f statistic:

• Select a random sample of size $n_1$ from a normal population having a standard deviation
equal to $\sigma_1$.
• Select an independent random sample of size $n_2$ from a normal population having a
standard deviation equal to $\sigma_2$.
• The f statistic is the ratio of $s_1^2 / \sigma_1^2$ and $s_2^2 / \sigma_2^2$.

The following equivalent equations are commonly used to compute an f statistic:

$f(v_1, v_2) = [ s_1^2 / \sigma_1^2 ] / [ s_2^2 / \sigma_2^2 ]$

$f(v_1, v_2) = [ s_1^2 \cdot \sigma_2^2 ] / [ s_2^2 \cdot \sigma_1^2 ]$

$f(v_1, v_2) = [ \chi_1^2 / v_1 ] / [ \chi_2^2 / v_2 ]$

$f(v_1, v_2) = [ \chi_1^2 \cdot v_2 ] / [ \chi_2^2 \cdot v_1 ]$

The curve of the F distribution depends on the degrees of freedom, v1 and v2.

Properties of the F distribution:

• The mean of the distribution is equal to $v_2 / (v_2 - 2)$ for $v_2 > 2$.
• The variance is equal to $\frac{2 v_2^2 (v_1 + v_2 - 2)}{v_1 (v_2 - 2)^2 (v_2 - 4)}$ for $v_2 > 4$.

Cumulative Probability of the F Distribution

This cumulative probability represents the likelihood that the f statistic is less than or equal to a
specified value.

An F-distribution table can be used to find the value of an f statistic having a cumulative probability
of (1 − α), represented by $f_\alpha$.

Thus, $f_{0.05}(v_1, v_2)$ refers to the value of the f statistic having a cumulative probability of (1 − 0.05) =
0.95, with $v_1$ and $v_2$ degrees of freedom.

Example:

Suppose a sample of 11 cows was selected at random from a population in which the
standard deviation of weight is 5 kg, and the sample standard deviation is 4.5 kg.
Another sample of 7 bulls was taken in a similar way; their population standard deviation is 3.5 kg
and the sample standard deviation is 4 kg.

a) Compute an f-statistic.

b) Determine the associated cumulative probability by finding an approximate f-value to the
above answer from the f-tables available for different significance levels (α).

c) Interpret the probability you found.
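Part (a) of the example, together with the mean and variance formulas from the properties list, can be sketched as follows. The function names are illustrative, and the cumulative probability in part (b) is still meant to be read from an f-table as the question asks.

```python
def f_statistic(s1, sigma1, s2, sigma2):
    """f = (s1^2 / sigma1^2) / (s2^2 / sigma2^2)."""
    return (s1 ** 2 / sigma1 ** 2) / (s2 ** 2 / sigma2 ** 2)


def f_mean(v2):
    """Mean of the F distribution, valid for v2 > 2."""
    return v2 / (v2 - 2)


def f_variance(v1, v2):
    """Variance of the F distribution, valid for v2 > 4."""
    return 2 * v2 ** 2 * (v1 + v2 - 2) / (v1 * (v2 - 2) ** 2 * (v2 - 4))


# Cows: n1 = 11, sigma1 = 5, s1 = 4.5.  Bulls: n2 = 7, sigma2 = 3.5, s2 = 4.
n1, n2 = 11, 7
v1, v2 = n1 - 1, n2 - 1  # degrees of freedom: 10 and 6

f = f_statistic(s1=4.5, sigma1=5.0, s2=4.0, sigma2=3.5)

print("f statistic =", round(f, 4), "with", v1, "and", v2, "degrees of freedom")
print("mean of F(10, 6) =", f_mean(v2))
print("variance of F(10, 6) =", f_variance(v1, v2))
```

Swapping which sample is treated as sample 1 gives the reciprocal f value with the degrees of freedom exchanged, so the choice should match the table being used.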

Reference for f-table: http://www.socr.ucla.edu/Applets.dir/F_Table.html

Upgrade your knowledge by:

• Finding the pdfs of the above sampling distributions.
• Studying the patterns of the cdfs of the above sampling distributions.
