
Chapter 6 Sampling Distribution PDF

This document discusses sampling distributions and introduces key concepts: 1) A sampling distribution shows the possible values and probabilities of a sample statistic (e.g. sample mean, proportion) when taking different samples from the same population. 2) The distribution of the sample mean is derived, with the mean equal to the population mean and variance equal to the population variance divided by the sample size. 3) Examples are provided to illustrate constructing sampling distributions and calculating probabilities based on the distribution.

Uploaded by

dikkan
Copyright
© © All Rights Reserved

CH 6 SAMPLING DISTRIBUTIONS

This chapter is still about the probability distributions of random variables. However, the random
variables are not characteristics of raw scores; rather, they are the statistics of samples of scores (e.g.
sample mean, $\bar{X}$, or sample proportion, $\hat{p}$). Of interest to us is the pattern of variation (distribution) of a
sample statistic, known as the sampling distribution of a statistic.

6.1 What is a Sampling Distribution?

When different samples (of the same size) are randomly drawn from a common population, the sample
statistics of interest (e.g. sample mean, $\bar{X}$, or sample proportion, $\hat{p}$) will vary from sample to sample
because of random variation. Thus, a sample statistic is a random variable. The probability that a sample
statistic assumes a particular value depends on the likelihood of the sample being selected. The pattern of
variation in the possible values of the sample statistic forms a distribution which is known as a sampling
distribution.

Definition: A sampling distribution is a probability distribution of a sample statistic. It shows all
possible values of the sample statistic and their corresponding probabilities.

Suppose that samples, each of size n, are drawn from a population, X. For each sample, the mean of the n
scores, $\bar{x}$, is computed. If infinitely many samples are drawn, there will be infinitely many $\bar{x}$'s and a
plot of these $\bar{x}$'s will reveal a pattern in their distribution. This distribution is known as the sampling
distribution of the sample mean, $\bar{X}$.

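Population X: $x_1, x_2, x_3, \ldots$
Mean of population X: $\mu$

Sample 1: $x_{11}, x_{12}, \ldots, x_{1n}$    Mean of Sample 1: $\bar{x}_1$
Sample 2: $x_{21}, x_{22}, \ldots, x_{2n}$    Mean of Sample 2: $\bar{x}_2$
....
Sample k: $x_{k1}, x_{k2}, \ldots, x_{kn}$    Mean of Sample k: $\bar{x}_k$

Population of sample means: $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$
Mean of $\bar{X}$: $\mu_{\bar{X}}$
Standard deviation of $\bar{X}$: $\sigma_{\bar{X}}$
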
If, instead of the mean, medians were computed for each sample, then an infinite number of medians
would form a sampling distribution of the median. If proportions were computed for each sample, then
the infinite number of proportions would form a sampling distribution of the proportion.

Example: Consider the set {1, 2, 3, 4}:


1) Make a list of all samples of size 2 that can be drawn from this set (Sampling with replacement)
2) Construct the sampling distribution for the sample mean for samples of size 2
3) Construct the sampling distribution for the minimum for samples of size 2

Sample    Mean, $\bar{x}$    Minimum    Probability
{1, 1}    1.0                1          1/16
{1, 2}    1.5                1          1/16
{1, 3}    2.0                1          1/16
{1, 4}    2.5                1          1/16
{2, 1}    1.5                1          1/16
{2, 2}    2.0                2          1/16
{2, 3}    2.5                2          1/16
{2, 4}    3.0                2          1/16
{3, 1}    2.0                1          1/16
{3, 2}    2.5                2          1/16
{3, 3}    3.0                3          1/16
{3, 4}    3.5                3          1/16
{4, 1}    2.5                1          1/16
{4, 2}    3.0                2          1/16
{4, 3}    3.5                3          1/16
{4, 4}    4.0                4          1/16

The above table lists all possible samples of size 2, the mean for each sample, the minimum for each
sample, and the probability of drawing each sample (all equally likely).
The sampling distributions of the sample mean and the sample minimum are constructed from the
above table.

[Figure: Histogram of the sampling distribution of the sample mean. Horizontal axis: $\bar{x}$ (1.0 to 4.0 in steps of 0.5); vertical axis: $P(\bar{x})$ (0.00 to 0.25).]

Sampling Distribution of the Sample Minimum:

m    P(m)
1    7/16
2    5/16
3    3/16
4    1/16

[Figure: Histogram of the sampling distribution of the sample minimum. Horizontal axis: m (1 to 4); vertical axis: P(m) (0.0 to 0.5).]
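
The construction in this example can also be checked by brute-force enumeration. The following is a minimal Python sketch, not part of the original notes; the helper name `sampling_distribution` is hypothetical. It lists all 16 ordered samples of size 2 drawn with replacement from {1, 2, 3, 4} and tallies the probability of each value of a chosen statistic.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
n = 2  # sample size

def sampling_distribution(statistic):
    """Return {value: probability} for a statistic over all ordered samples."""
    samples = list(product(population, repeat=n))  # sampling with replacement
    counts = Counter(statistic(s) for s in samples)
    # Every sample is equally likely, so probability = count / number of samples
    return {value: Fraction(c, len(samples)) for value, c in sorted(counts.items())}

mean_dist = sampling_distribution(lambda s: Fraction(sum(s), n))
min_dist = sampling_distribution(min)

for value, prob in mean_dist.items():
    print(f"mean = {float(value):.1f}  P = {prob}")
for value, prob in min_dist.items():
    print(f"min = {value}  P = {prob}")
```

The tallies for the minimum agree with the probabilities 7/16, 5/16, 3/16 and 1/16 listed for m = 1, 2, 3, 4, and the most likely sample mean is 2.5, with probability 4/16. Fractions are printed in lowest terms, so 4/16 appears as 1/4.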

The theory involved with sampling distributions requires random sampling whereby a random sample is
defined as:
A sample obtained in such a way that each possible sample of a fixed size n has an equal probability
(chance) of being selected.

In studying sampling distributions, we will investigate the variability in sample statistics from
sample to sample. For each sample statistic, we will investigate
- the pattern (shape) of variability
- its mean
- its standard deviation.

Knowledge about the sampling distribution of a sample statistic will enable us to calculate its
probabilities. In this course, we will focus on the sampling distributions of the sample mean, $\bar{X}$, and
the sample proportion, $\hat{p}$.

6.2 The Distribution of the Sample Mean, $\bar{X}$

Suppose that k samples, each of size n, are drawn from a common population, X. For each sample, the
mean of the n scores, $\bar{x}$, is computed. If k samples are drawn, there are k values of $\bar{x}$, as summarized in
the following table.

Sample    1st observation    …    jth observation    …    nth observation    Sample mean
1         $x_{11}$           …    $x_{1j}$           …    $x_{1n}$           $\bar{x}_1$
2         $x_{21}$           …    $x_{2j}$           …    $x_{2n}$           $\bar{x}_2$
.
.
i         $x_{i1}$           …    $x_{ij}$           …    $x_{in}$           $\bar{x}_i$
.
.
k         $x_{k1}$           …    $x_{kj}$           …    $x_{kn}$           $\bar{x}_k$
          $X_1$              …    $X_j$              …    $X_n$

The value $x_{ij}$ can be thought of as the jth observation of sample i. If $X_j$ denotes 'the jth observation of a
sample', then $X_j$ is a random variable that varies from sample to sample. Thus, for samples of size n,
there are n random variables, $X_1, X_2, \ldots, X_n$. Furthermore, since all the $x_{ij}$'s are independent
observations from the same population, X, the random variables $X_1, X_2, \ldots, X_n$ are independent and
share the distribution of X.
Let the sample mean of the ith sample be denoted by $\bar{x}_i$. Due to random variation, the values of
$\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ will also vary. Thus, there exists a random variable, namely the 'sample mean', denoted by
$\bar{X}$, having possible values $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$. By definition, the sample mean is the average of the n
observations in the sample, i.e.

$$\bar{X} = \frac{1}{n}\left(X_1 + X_2 + \dots + X_n\right) = \frac{1}{n}X_1 + \frac{1}{n}X_2 + \dots + \frac{1}{n}X_n$$

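Suppose that the common population, X, has mean $\mu$ and variance $\sigma^2$. The mean and variance of $\bar{X}$
are derived as follows:

$$\begin{aligned}
\mu_{\bar{X}} = E(\bar{X}) &= E\left(\frac{1}{n}X_1 + \frac{1}{n}X_2 + \dots + \frac{1}{n}X_n\right) \\
&= \frac{1}{n}E(X_1) + \frac{1}{n}E(X_2) + \dots + \frac{1}{n}E(X_n) \\
&= \frac{1}{n}\mu + \frac{1}{n}\mu + \dots + \frac{1}{n}\mu = n\left(\frac{1}{n}\mu\right) = \mu
\end{aligned}$$

$$\begin{aligned}
\sigma_{\bar{X}}^2 = \mathrm{Var}(\bar{X}) &= \mathrm{Var}\left(\frac{1}{n}X_1 + \frac{1}{n}X_2 + \dots + \frac{1}{n}X_n\right) \\
&= \frac{1}{n^2}\mathrm{Var}(X_1) + \frac{1}{n^2}\mathrm{Var}(X_2) + \dots + \frac{1}{n^2}\mathrm{Var}(X_n) \quad \text{(since the } X_i\text{'s are independent)} \\
&= \frac{1}{n^2}\sigma^2 + \frac{1}{n^2}\sigma^2 + \dots + \frac{1}{n^2}\sigma^2 = n\left(\frac{1}{n^2}\sigma^2\right) = \frac{\sigma^2}{n}
\end{aligned}$$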

Example: The mean of a population is 64, and its standard deviation is 12. A sample of 40
observations is randomly selected. Find the expectation and variance of the sample mean.
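
One way to work this example, offered as a sketch rather than as the solution from the original notes, is to substitute $\mu = 64$, $\sigma = 12$ and $n = 40$ into the results $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}}^2 = \sigma^2 / n$:

$$E(\bar{X}) = \mu = 64, \qquad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{12^2}{40} = \frac{144}{40} = 3.6$$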

Example: The probability distribution of a discrete random variable X is given by:


P(X = 0) =1/4 P(X = 1) = 1/2, P(X = 2) = P(X = 3) = 1/8.
Determine the mean and variance of the sample mean when
(i) the sample size is 2 (ii) the sample size is 16.
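
This example can be worked with exact fractions. The Python sketch below is an illustration added here, not the solution from the original notes, and the variable names are arbitrary. It computes the population mean and variance from the given probabilities and then applies $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}}^2 = \sigma^2 / n$.

```python
from fractions import Fraction

# Probability distribution of X from the example
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 8), 3: Fraction(1, 8)}

mu = sum(x * p for x, p in pmf.items())               # population mean E(X)
var = sum(x**2 * p for x, p in pmf.items()) - mu**2   # population variance E(X^2) - mu^2

for n in (2, 16):
    # The mean of the sample mean equals mu; its variance is var / n
    print(f"n = {n}: E(X-bar) = {mu}, Var(X-bar) = {var / n}")
```

Under these formulas the population has mean 9/8 and variance 55/64, so the sample mean has variance 55/128 when n = 2 and 55/1024 when n = 16, while its mean stays at 9/8 in both cases.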
The Central Limit Theorem (CLT)

The Central Limit Theorem is a very important and powerful theorem in statistics. Much of hypothesis
testing and sampling theory is based on this theorem. The Central Limit Theorem provides a justification
for using the normal curve as a model for many naturally occurring phenomena. In most situations, this
theorem works reasonably well with sample sizes greater than 25. The informal statement of the theorem
is as follows:

Suppose $X_1, X_2, \ldots, X_n$ are n independent random variables having a common distribution. Then, as
n increases, the distributions of $X_1 + X_2 + \dots + X_n$ and of $(X_1 + X_2 + \dots + X_n)/n$ increasingly
resemble normal distributions.

Note: The proof of the Central Limit Theorem is beyond the scope of this course
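
The informal statement can be explored with a small simulation. The sketch below is an illustration, not part of the original notes: the exponential population (mean 1, standard deviation 1), the sample sizes, the number of repetitions, and the helper names `sample_means` and `skewness` are all assumptions chosen for the demonstration. It draws many samples, computes each sample mean, and reports the mean, the standard deviation, and the skewness of the sample means.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(n, num_samples=20_000):
    """Means of num_samples samples of size n from an exponential population
    (a right-skewed distribution with mean 1 and standard deviation 1)."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(num_samples)]

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric, normal-like shape."""
    centre = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean(((v - centre) / spread) ** 3 for v in values)

results = {}
for n in (2, 10, 30):
    means = sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    centre, spread, skew = results[n]
    print(f"n = {n:2d}: mean = {centre:.3f}, sd = {spread:.3f} "
          f"(theory {1 / n ** 0.5:.3f}), skewness = {skew:.2f}")
```

In this setup the mean of the sample means should stay near 1, their standard deviation should track $1/\sqrt{n}$, and the skewness should shrink towards 0 as n grows, which is what "increasingly resembles a normal distribution" looks like numerically.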

Graphical Illustration of the Central Limit Theorem:

[Figure: Four panels showing the original population distribution and the distribution of $\bar{x}$ for n = 2, n = 10 and n = 30.]

Summary
- The mean of the sampling distribution of $\bar{X}$ is equal to the mean of the original population: $\mu_{\bar{X}} = \mu$
- The standard deviation of the sampling distribution of $\bar{X}$ (also called the standard error of the mean)
is equal to the standard deviation of the original population divided by the square root of the sample
size: $\sigma_{\bar{X}} = \sigma / \sqrt{n}$
- The distribution of $\bar{X}$ is (exactly) normal when the original population is normal.
The CLT says that the distribution of $\bar{X}$ is approximately normal, regardless of the shape of the
original distribution, when the sample size is large enough!

When the sampling distribution of the sample mean is (exactly) normally distributed, or approximately
normally distributed (by the CLT), we can then compute probabilities about a sample mean using the
standard normal distribution.
Standard Error of the Mean
The standard deviation of the sampling distribution of the mean is called the standard error of the mean
and is symbolized by $\sigma_{\bar{X}} = \sigma / \sqrt{n}$. The standard error of a statistic describes the degree to which the
computed statistics will differ from one another when calculated from samples of similar size selected
from similar population models. The larger the standard error, the greater the difference between the
computed statistics.

Example: Consider a normal population with $\mu = 50$ and $\sigma^2 = 15$. Suppose a sample of size 9 is
selected at random. Find: (1) $P(45 \le \bar{X} \le 60)$ (2) $P(\bar{X} \le 47.5)$
Solution:
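
One way to carry out this calculation is sketched below in Python. It is an illustration rather than the worked solution from the original notes, and the variable names are arbitrary. Because the population is normal, $\bar{X}$ is exactly normal with mean $\mu$ and standard error $\sigma / \sqrt{n}$, so the standard library's `statistics.NormalDist` can stand in for the standard normal table.

```python
from math import sqrt
from statistics import NormalDist

mu, var, n = 50, 15, 9               # population mean, population variance, sample size
se = sqrt(var / n)                   # standard error of the mean, sigma / sqrt(n)
xbar = NormalDist(mu=mu, sigma=se)   # distribution of the sample mean

p1 = xbar.cdf(60) - xbar.cdf(45)     # (1) P(45 <= X-bar <= 60)
p2 = xbar.cdf(47.5)                  # (2) P(X-bar <= 47.5)

print(f"standard error       = {se:.4f}")
print(f"P(45 <= X-bar <= 60) = {p1:.4f}")
print(f"P(X-bar <= 47.5)     = {p2:.4f}")
```

With a standard error of about 1.29, both 45 and 60 lie several standard errors from the mean, so the first probability is very close to 1. The value 47.5 is a little under two standard errors below the mean, so the second probability is small, roughly 0.026.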

Example: A recent report stated that the day-care cost per week in a city is RM200. Suppose that this
figure is taken as the mean cost per week and that the standard deviation is known to be RM25.
(i) Find the probability that a sample of 50 day-care centers would show a mean cost of RM200 or less
per week.
(ii) Suppose the actual sample mean cost for the sample of 50 day-care centers is RM215. Is there any
evidence to refute the claim of RM200 presented in the report?

Solution:
The shape of the original distribution is unknown, but the mean and standard deviation are known. As the
sample size, n, is large, the CLT applies. Thus, it can be said that the distribution of $\bar{X}$ is approximately
normal.
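
The two parts can then be evaluated numerically. The Python sketch below is an added illustration, not the original solution; it assumes the normal approximation justified by the CLT, and the variable names are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 200, 25, 50           # claimed mean (RM), standard deviation (RM), sample size
se = sigma / sqrt(n)                 # standard error of the mean
xbar = NormalDist(mu=mu, sigma=se)   # approximate distribution of the sample mean (CLT)

p_i = xbar.cdf(200)                  # (i) P(X-bar <= 200)
z = (215 - mu) / se                  # (ii) z-score of the observed sample mean
p_ii = 1 - xbar.cdf(215)             # P(X-bar >= 215) if the claimed mean is correct

print(f"(i)  P(X-bar <= 200) = {p_i:.4f}")
print(f"(ii) z = {z:.2f}, P(X-bar >= 215) = {p_ii:.6f}")
```

For part (i) the probability is 0.5, since RM200 is the centre of the distribution. For part (ii) a sample mean of RM215 lies more than four standard errors above RM200, which would be very unlikely if the claimed mean were correct, so under this sketch the sample gives evidence against the claim.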
6.4 Sampling Distribution of Sample Proportions, p̂

Consider a population of size N with members which can be categorized as successes or failures. If the
number of successes in the population is $X_c$, then the proportion (or percentage or probability) of success
in the population, denoted as p, is $X_c / N$, a constant.
Suppose that a random sample of n observations is taken from the population. Let the number of
successes in the sample be denoted by X. The observed proportion of successes in the sample, denoted by
$\hat{p}$, is therefore $X / n$. If samples of n observations are repeatedly taken from the same population an
infinite number of times, there will be an infinite number of sample proportions, $\hat{p} = X / n$. The value of $\hat{p}$
will vary from sample to sample because the value of X varies from sample to sample. Thus $\hat{p}$ is a
random variable and its probability distribution is known as the sampling distribution of sample
proportions.
From Ch 5.1, we know that X, the number of successes in a sample of size n, is a random variable
having a binomial distribution with parameters n and p, written as X ~ Bin(n, p). The mean and variance
of X are
E(X) = np and Var(X) = npq, where q = 1 − p.

It follows that the mean and variance of the random variable $\hat{p}$ are:

$$\mu_{\hat{p}} = E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{1}{n}\,np = p$$

and

$$\sigma_{\hat{p}}^2 = \mathrm{Var}(\hat{p}) = \mathrm{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\mathrm{Var}(X) = \frac{pq}{n}$$

For cases where n is sufficiently large (so that $np \ge 5$ and $nq \ge 5$), the normal approximation to the
binomial distribution may be used to find probabilities about $\hat{p}$.

Summary

- The mean of the sampling distribution of $\hat{p}$ is equal to the proportion of the original population, i.e.
$\mu_{\hat{p}} = p$
- The standard deviation of the sampling distribution of $\hat{p}$ (also called the standard error of the
proportion) is $\sigma_{\hat{p}} = \sqrt{pq / n}$
- The sampling distribution of $\hat{p}$ is approximately normal when the sample size, n, is large enough
($np \ge 5$ and $nq \ge 5$).
Example: Suppose a fair coin is tossed 64 times. Determine the probability that at most 30 heads are
obtained.
Solution: Let X be the number of heads obtained, and p the probability of obtaining a head in a toss.
$X \sim \text{Binomial}(n = 64,\ p = 0.5)$
Since np = nq = 32 ≥ 5, $\hat{p}$ is approximately Normal with $\mu = 0.5$ and $\sigma^2 = (0.5)(0.5)/64 = 0.0039$.

$$\begin{aligned}
P(X \le 30) &= P\left(\frac{X}{n} \le \frac{30}{64}\right) = P(\hat{p} \le 0.4688) \\
&\approx P\left(Z \le \frac{0.4688 - 0.5}{\sqrt{0.0039}}\right) \\
&= P(Z \le -0.4996) = 1 - P(Z \le 0.4996) = 1 - 0.6913 = 0.3087
\end{aligned}$$
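
The normal approximation in this solution can be compared with the exact binomial probability. The Python sketch below is an added illustration, not part of the original notes; the continuity-corrected version is a standard refinement that the solution above does not use, and the variable names are arbitrary.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 64, 0.5
q = 1 - p

# Exact binomial probability P(X <= 30)
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(31))

# Normal approximation for p-hat, as in the solution (no continuity correction)
se = sqrt(p * q / n)                          # standard error of the proportion
approx = NormalDist(mu=p, sigma=se).cdf(30 / n)

# Normal approximation for X with a continuity correction (not in the original solution)
approx_cc = NormalDist(mu=n * p, sigma=sqrt(n * p * q)).cdf(30.5)

print(f"exact binomial             = {exact:.4f}")
print(f"normal approximation       = {approx:.4f}")
print(f"with continuity correction = {approx_cc:.4f}")
```

The uncorrected approximation comes out at about 0.3085, matching the hand calculation up to rounding of 0.0039 and 0.4688. The exact binomial value is somewhat larger, and the continuity-corrected approximation lands closer to it, which is worth keeping in mind when X is a count.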
