STT251 Lecture-01
Introduction
The sampling distribution of a statistic may be defined as the probability law that the statistic
follows when repeated random samples of a fixed size are drawn from a specified population.
Let us consider a random sample x1, x2, ..., xn of size n drawn from a population containing N units.
Let us further suppose that we are interested in the sampling distribution of the statistic x̄ (i.e.,
the sample mean), where
$$\bar{x} = \frac{1}{n}\,(x_1 + x_2 + \cdots + x_n)$$
If the population size N is finite, there is a finite number (say k) of possible ways of drawing n units
in the sample out of a total of N units in the population. Although the k samples are distinct, the
sample means may not all be different. Under simple random sampling, however, each of the k samples
occurs with equal probability 1/k. Thus, we
can construct a table showing the set of possible values of the statistic x̄ and the probability
that x̄ takes each of these values. This probability distribution of the statistic x̄ is called
the 'sampling distribution' of the sample mean. The above method is quite general, and the sampling
distribution of any other statistic, say, the median or standard deviation of the sample, may be
obtained in the same way.
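The enumeration method described above can be sketched in a few lines of Python. This is only an illustration: the function name sampling_distribution and the five-unit population are hypothetical, and the sketch assumes simple random sampling without replacement, so that each of the k = C(N, n) samples is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import median


def sample_mean(sample):
    """Exact sample mean as a Fraction (avoids floating-point ties)."""
    return Fraction(sum(sample), len(sample))


def sampling_distribution(population, n, statistic):
    """Tabulate the sampling distribution of `statistic` by listing every sample.

    Assumption: simple random sampling without replacement, so each of the
    k = C(N, n) possible samples has probability 1/k.
    """
    samples = list(combinations(population, n))
    k = len(samples)
    counts = Counter(statistic(s) for s in samples)
    return {value: Fraction(c, k) for value, c in sorted(counts.items())}


# Hypothetical population of N = 5 units; samples of size n = 3.
population = [2, 4, 6, 8, 10]
dist_mean = sampling_distribution(population, 3, sample_mean)
dist_median = sampling_distribution(population, 3, median)

for value, prob in dist_mean.items():
    print(f"mean = {float(value):.3f}   P = {prob}")
```

The same function handles any statistic that maps a sample to a single value, which is the sense in which the method is general.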
The sampling distribution depends on several factors: the statistic, the sample size, the sampling
process, and the underlying population. It is used to assess how statistics such as the mean, range,
variance, and standard deviation computed from a given sample vary from one sample to another.
A fair die is thrown infinitely many times, with the random variable X = # of spots on any throw.
The probability distribution of X is:
X 1 2 3 4 5 6
P(X) 1/6 1/6 1/6 1/6 1/6 1/6
A sampling distribution is created by looking at all samples of size n = 2 (i.e. two dice) and their
means.
While there are 36 possible samples of size 2, there are only 11 distinct values for x̄, and some
(e.g. x̄ = 3.5) occur more frequently than others (e.g. x̄ = 1).
x̄ 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
P(x̄) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
[Figure: probability distribution of X (values 1 to 6) and sampling distribution of x̄ (values 1.0 to 6.0)]
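The sampling distribution of x̄ for two dice can be reproduced by listing all 36 ordered outcomes. The sketch below assumes two independent fair dice; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# All 36 equally likely ordered outcomes of two fair dice.
samples = list(product(faces, repeat=2))

# Count how many outcomes give each value of the sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in samples)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m in dist:
    print(f"x-bar = {float(m):.1f}   P = {counts[m]}/36")
```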
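We can generalize the mean and variance of the sampling distribution for two dice, where μ and σ²
denote the mean and variance of X:
$$\mu_{\bar{x}} = \mu, \qquad \sigma^2_{\bar{x}} = \frac{\sigma^2}{2}$$
And to n dice:
$$\mu_{\bar{x}} = \mu, \qquad \sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$$
The standard deviation of the sampling distribution is called the standard error:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$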
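As a check on the standard results that the mean of x̄ equals the population mean μ, that its variance equals σ²/n, and that the standard error is σ/√n, the following sketch enumerates every outcome of n fair dice for small n. It assumes independent throws; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

faces = range(1, 7)
mu = Fraction(sum(faces), 6)                      # mean of one die
sigma2 = sum((x - mu) ** 2 for x in faces) / 6    # variance of one die


def mean_and_variance_of_sample_mean(n):
    """Exact mean and variance of x-bar over all 6**n outcomes of n dice."""
    means = [Fraction(sum(o), n) for o in product(faces, repeat=n)]
    m = sum(means) / len(means)
    v = sum((x - m) ** 2 for x in means) / len(means)
    return m, v


for n in (1, 2, 3):
    m, v = mean_and_variance_of_sample_mean(n)
    # m should equal mu, v should equal sigma2 / n, and sqrt(v) is the standard error.
    print(n, m, v, sigma2 / n, sqrt(v))
```

Enumeration grows as 6**n, so this is only practical for a handful of dice; for larger n the formulas themselves are used.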
Example 1.1: Suppose we have a population of N = 4 incomes of four business firms and we want to
find the average income of these firms. The incomes (in lakhs) are 100, 200, 300 and 400. We first
note that in this case the (population) mean income is 250 lakhs. Now we use this situation to
illustrate how sample means differ from the population mean. Suppose we select a sample of n = 2
observations in order to estimate the population mean μ. There are C(4,2) = 6 possible
samples of size 2, and we will randomly select one of them. We shall now calculate
the means of these 6 different samples. The six samples and their means are given in the
following table.
Sample Incomes Sample mean
1 100, 200 150
2 100, 300 200
3 100, 400 250
4 200, 300 250
5 200, 400 300
6 300, 400 350
Now, from the table above, you can see that each sample has a different mean, with the exception
of the third and fourth samples, whose means both equal the population mean. Therefore four of the
six samples will result in some error in the estimation process. This sampling error is the
difference between the population mean μ and the sample mean we use to estimate it. Let us now
consider the possible sample means and calculate their probabilities. We assume that each sample is
equally likely to be chosen. Then the probability of selecting any one sample is 1/6.
We then list every possible sample mean and its probability in a table.
x̄ 150 200 250 300 350
P(x̄) 1/6 1/6 2/6 1/6 1/6
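Example 1.1 can be worked through by enumeration. The sketch below assumes sampling without replacement with each of the six samples equally likely, as stated in the example. The sign convention for the sampling error (sample mean minus μ) is an assumption made here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

incomes = [100, 200, 300, 400]              # in lakhs
mu = Fraction(sum(incomes), len(incomes))   # population mean

# The C(4, 2) = 6 possible samples of size 2 and their means.
samples = list(combinations(incomes, 2))
means = [Fraction(sum(s), 2) for s in samples]

# Sampling error of each sample (assumed convention: x-bar minus mu).
errors = [m - mu for m in means]

# Sampling distribution of the sample mean: each sample has probability 1/6.
counts = Counter(means)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for s, m, e in zip(samples, means, errors):
    print(s, m, e)
print(dist)
```

Averaging the sample means over this distribution gives back the population mean, which is why x̄ is used to estimate μ even though an individual sample may be in error.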
The population proportion π is given by
$$\pi = \frac{k}{n}$$
where k is the number of observations that fall in a particular category and n is the total number of
observations. When the population is very large, we may take samples to study the population, and
for each sample we calculate the sample proportion, p̂, as
$$\hat{p} = \frac{s}{n}$$
where s denotes the number of observations in the sample that have the particular characteristic
under study and n is the sample size.
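If the sample size is much smaller than the size of a population with proportion p of successes
(p here is the population proportion, written π above), then the mean and standard deviation of p̂ are:
$$\mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$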
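The mean and standard deviation of the sample proportion can be checked numerically. The sketch below assumes that the number of successes s in the sample follows a Binomial(n, p) distribution, which is the usual approximation when the sample is much smaller than the population. The function name and the values n = 50, p = 3/10 are hypothetical.

```python
from fractions import Fraction
from math import comb, sqrt


def phat_mean_and_sd(n, p):
    """Exact mean and standard deviation of p-hat = s/n, assuming s ~ Binomial(n, p)."""
    values = [Fraction(s, n) for s in range(n + 1)]
    probs = [comb(n, s) * p**s * (1 - p) ** (n - s) for s in range(n + 1)]
    mean = sum(v * w for v, w in zip(values, probs))
    var = sum((v - mean) ** 2 * w for v, w in zip(values, probs))
    return mean, sqrt(var)


n, p = 50, Fraction(3, 10)
mean, sd = phat_mean_and_sd(n, p)
print(mean, sd, sqrt(p * (1 - p) / n))   # the last two values should agree
```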