WEEK 7.1 - Discrete Probability Distribution (Poisson)
BUSINESS STATISTICS
LECTURE SIX
SAMPLING METHODS AND
SAMPLING DISTRIBUTION
LECTURER: DR GRACE MUSA
CLASS: MBA 2025
Sample and population
A population: a complete listing or collection of all
the elements of interest in a statistical study. The main
measures of population characteristics are the population
parameters (such as means and proportions).
A sample is a subset of the population.
Good or bad samples.
Representative or non-representative samples. A
researcher hopes to obtain a sample that represents
the population, at least in the variables of interest for
the issue being examined.
Probabilistic samples are samples selected using the
principles of probability. This may allow a researcher
to determine the sampling distribution of a sample
statistic. If so, the researcher can determine the
probability of any given sampling error and make
statistical inferences about population characteristics.
Why sample?
Disadvantages
If sampling frame large, this method impracticable.
Minority subgroups of interest in population may not be
present in sample in sufficient numbers for study.
REPLACEMENT OF SELECTED UNITS
Sampling schemes may be without replacement ('WOR' - no
element can be selected more than once in the same sample) or
with replacement ('WR' - an element may appear multiple times in
the one sample).
For example, if we catch fish, measure them, and immediately
return them to the water before continuing with the sample, this is
a WR design, because we might end up catching and measuring
the same fish more than once. However, if we do not return the
fish to the water (e.g. if we eat the fish), this becomes a WOR
design.
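The difference between the two schemes can be sketched in a few lines of Python. This is only an illustration: the pond of ten labelled fish, the seed, and the function names are assumptions made for the example, not part of the lecture.

```python
import random

def sample_without_replacement(population, n, rng):
    """WOR: each element can appear at most once in the sample."""
    return rng.sample(population, n)

def sample_with_replacement(population, n, rng):
    """WR: the same element may be drawn more than once."""
    return rng.choices(population, k=n)

rng = random.Random(7)                       # fixed seed so the run is repeatable
fish = [f"fish_{i}" for i in range(1, 11)]   # hypothetical pond of 10 fish

wor = sample_without_replacement(fish, 5, rng)
wr = sample_with_replacement(fish, 5, rng)
print("WOR:", wor)   # five distinct fish
print("WR: ", wr)    # five draws; repeats are possible
```

A WOR sample can never be larger than the population, while a WR sample can be of any size.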
SYSTEMATIC SAMPLING
Systematic sampling relies on arranging the target population
according to some ordering scheme and then selecting elements
at regular intervals through that ordered list.
Systematic sampling involves a random start and then proceeds
with the selection of every kth element from then onwards. In
this case, k=(population size/sample size).
It is important that the starting point is not automatically the first
in the list, but is instead randomly chosen from within the first
to the kth element in the list.
A simple example would be to select every 10th name from the
telephone directory (an 'every 10th' sample, also referred to as
'sampling with a skip of 10').
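The random start and fixed skip described above might be coded as follows. This is a minimal sketch that assumes the population size is at least the sample size and uses integer division for k; the list of 100 numbered names is hypothetical.

```python
import random

def systematic_sample(ordered_population, n, rng):
    """Pick a random start within the first k elements, then take every kth element."""
    N = len(ordered_population)
    k = N // n                   # sampling interval (skip); assumes N >= n
    start = rng.randrange(k)     # random 0-based position among the first k elements
    return [ordered_population[start + i * k] for i in range(n)]

names = list(range(1, 101))      # hypothetical ordered list of 100 names
sample = systematic_sample(names, 10, random.Random(1))
print(sample)                    # ten entries, each 10 positions apart
```

When N is not an exact multiple of n, this version simply never reaches the last few elements of the list; other conventions exist for that case.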
SYSTEMATIC SAMPLING……
ADVANTAGES:
Sample easy to select
Suitable sampling frame can be identified easily
Sample evenly spread over entire reference population
DISADVANTAGES:
Sample may be biased if hidden periodicity in population coincides
with that of selection.
Difficult to assess precision of estimate from one survey.
STRATIFIED SAMPLING
Where population embraces a number of distinct categories, the
frame can be organized into separate "strata." Each stratum is then
sampled as an independent sub-population, out of which individual
elements can be randomly selected.
Every unit in a stratum has same chance of being selected.
Using same sampling fraction for all strata ensures proportionate
representation in the sample.
Adequate representation of minority subgroups of interest can be
ensured by stratification & varying sampling fraction between
strata as required.
STRATIFIED SAMPLING……
Finally, since each stratum is treated as an independent
population, different sampling approaches can be applied to
different strata.
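A proportionate stratified sample, in which every stratum uses the same sampling fraction, could look like the sketch below. The three strata, their sizes, and the 10% fraction are invented for illustration.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample (WOR) of the same fraction from each stratum."""
    sample = {}
    for name, units in strata.items():
        n_h = max(1, round(fraction * len(units)))   # at least one unit per stratum
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical frame of 100 units split into three strata.
strata = {
    "small": [f"S{i}" for i in range(60)],
    "medium": [f"M{i}" for i in range(30)],
    "large": [f"L{i}" for i in range(10)],
}
picked = stratified_sample(strata, 0.10, random.Random(3))
for name, units in picked.items():
    print(name, len(units), units)
```

To over-represent a minority subgroup, one would pass a different fraction for that stratum instead of a single common value.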
First N = 18 MFIs in Kenya
1. MFI 1
2. MFI 2
3. MFI 3
4. MFI 4
5. MFI 5
6. MFI 6
7. MFI 7
8. MFI 8
9. MFI 9
10. MFI 10
11. MFI 11
12. MFI 12
13. MFI 13
14. MFI 14
15. MFI 15
16. MFI 16
17. MFI 17
18. MFI 18
Suppose you were asked to select a simple random sample of size n = 5
from the 18 cases. Use the RN table in the previous slide to choose the items.
Always keep track of where you last used the table and begin the next
selection at that point.
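The table-reading procedure can be imitated in code. The lecture's random number table is not reproduced here, so the sketch below generates a stand-in string of digits; that string, the seed, and the function name are assumptions. Two-digit numbers are read in order, values outside 01 to 18 and repeats are skipped, and the position where reading stopped is returned so that the next selection can resume there.

```python
import random

def srs_from_digit_table(digits, N, n):
    """Read two-digit numbers from a digit string; keep those in 1..N, skipping repeats."""
    chosen = []
    pos = 0
    while len(chosen) < n and pos + 2 <= len(digits):
        number = int(digits[pos:pos + 2])
        pos += 2
        if 1 <= number <= N and number not in chosen:
            chosen.append(number)
    return chosen, pos   # pos marks where to resume reading next time

rng = random.Random(2025)
# Stand-in for a printed random number table (hypothetical digits).
table = "".join(str(rng.randrange(10)) for _ in range(2000))

picked, resume_at = srs_from_digit_table(table, N=18, n=5)
print([f"MFI {i}" for i in picked], "resume at digit", resume_at)
```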
Sampling distributions
Basic Concept
The ultimate goal of generating a random sample is to make
inferences about the nature of the population from which it is
drawn.
The key objective is to estimate a numerical measure of a population,
called a population parameter, using a sample statistic.
Note:
The value of a population parameter is CONSTANT, albeit usually
unknown, and does not vary from sample to sample.
Sample statistics usually vary with the sample selected. (For
example, selecting the same sample size from different points
on a RN table will result in different units and hence different
statistics.)
Since statistics vary from sample to sample, any
inferences based on them will be subject to some
level of uncertainty.
Basic Concepts Cont..
Why then should one use such a measure to make inferences about the
population, given this apparent unreliability?
The answer lies in the fact that the UNCERTAINTY of a statistic is characterized
by known properties, reflected in what is called its SAMPLING distribution.
Each sample contains different elements, so the value of the sample
statistic differs for each sample selected. These statistics provide different
estimates of the parameter. The sampling distribution describes how these
different values are distributed.
Knowledge of the sampling distribution of a particular statistic provides
information about its performance over the long run.
A sampling distribution of a sample statistic (based on n observations) is the
relative frequency distribution of the statistic theoretically generated by
taking repeated samples of size n from a population of size N and computing
the statistic for each sample.
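For a very small population the definition can be carried out exactly by listing every possible sample. The sketch below assumes a hypothetical population of N = 5 values and samples of size n = 2 drawn without replacement; it tabulates the relative frequency of each possible sample mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5]   # hypothetical population, N = 5
n = 2

samples = list(combinations(population, n))      # all 10 WOR samples of size 2
means = [Fraction(sum(s), n) for s in samples]   # the statistic for each sample
counts = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, rel_freq in distribution.items():
    print(float(m), float(rel_freq))             # sample mean, relative frequency
```

With realistic population sizes the number of possible samples is far too large to list, which is why the sampling distribution is usually derived theoretically or approximated by simulation.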
Sampling distribution of the sample
mean
Using a Random Number Table, select 5 different samples of size 5 and
calculate the sample mean of each sample.
Repeat the above, but this time with 5 different samples of size 25, say.
Find the mean and standard deviation of the sample means for each sample size.
What do you notice? That is, what happens as n increases?
The same pattern would appear if a frequency distribution figure were used to
illustrate the results: a lower n results in higher variability in the sample
statistic.
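The exercise can also be tried by simulation. The sketch below is one possible setup, not the lecture's data: the population of 10,000 uniform values is invented, and 2,000 repeated samples are used instead of 5 so that the pattern is stable from run to run.

```python
import random
import statistics

def sample_means(population, n, repeats, rng):
    """Means of `repeats` simple random samples (WOR) of size n."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

rng = random.Random(42)
population = [rng.uniform(0, 100) for _ in range(10_000)]   # hypothetical population

means_5 = sample_means(population, 5, 2000, rng)
means_25 = sample_means(population, 25, 2000, rng)

sd_5 = statistics.stdev(means_5)
sd_25 = statistics.stdev(means_25)

print("population mean:", round(statistics.mean(population), 2))
print("n = 5 : mean of sample means", round(statistics.mean(means_5), 2), "sd", round(sd_5, 2))
print("n = 25: mean of sample means", round(statistics.mean(means_25), 2), "sd", round(sd_25, 2))
```

Both sets of sample means centre near the population mean, but the spread for n = 25 is clearly smaller than for n = 5.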
When a sample is selected, the sampling method may allow the researcher to
determine the sampling distribution of the sample mean $\bar{x}$. The researcher hopes
that the mean of the sampling distribution will be μ, the mean of the population. If
this occurs, then the expected value of the statistic $\bar{x}$ is μ. This characteristic of the
sample mean is that of being an unbiased estimator of μ. In this case, $E(\bar{x}) = \mu$.
If the variance of the sampling distribution can be determined, then the researcher
is able to determine how variable $\bar{x}$ is when there are repeated samples. The
researcher hopes to have a small variability for the sample means, so most
estimates of μ are close to μ.
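Unbiasedness can be checked exactly on a toy case. The sketch below assumes a hypothetical population of four values and samples of size 2 drawn with replacement, each ordered sample being equally likely. Under that assumption the mean of all possible sample means equals μ, and their variance equals the population variance divided by n.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]   # hypothetical population
n = 2
N = len(population)

mu = Fraction(sum(population), N)                     # population mean
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

# Every ordered sample of size n drawn with replacement is equally likely.
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
e_xbar = sum(means) / len(means)
var_xbar = sum((m - e_xbar) ** 2 for m in means) / len(means)

print("mu =", mu, " E(x-bar) =", e_xbar)
print("sigma^2 / n =", sigma2 / n, " Var(x-bar) =", var_xbar)
```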
Sampling distribution of the mean and
Central Limit Theorem
Standard deviation: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
Hence:
and
Normal Approximation to Binomial…
Step 1
Define the sample proportion of interest in words. This is
important because every qualitative set of data has more than
one category, and you must be sure to identify the category, or
attribute, of interest. Also specify the values of the population
proportion p and the sample size n.
Step 2
Find the values of the mean and standard
error of the sampling distribution of $\hat{p}$
using
$\mu_{\hat{p}} = p$
$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$
Steps Cont..
Step 3
Verify that the sampling distribution of $\hat{p}$ is
approximately normal by checking that the
following holds: $np > 5$ and $n(1-p) > 5$.
Step 4
Sketch a normal curve, and shade the area
corresponding to the probability of interest
Steps Cont
Step 5
Calculate the z-scores corresponding to the appropriate values
of $\hat{p}$. (Remember that p is the same as the mean $\mu_{\hat{p}}$.)
$z = \dfrac{\hat{p} - p}{\sigma_{\hat{p}}}$
Step 6
Use table to find the area under the normal curve
corresponding to each calculated z-score.
Step 7
With the help of the curve sketched in Step 4, find the
probability of interest by adding or subtracting appropriate
areas.
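Steps 2 to 7 can be strung together in a short function. This is a sketch under stated assumptions: the function name and the numbers in the usage line (p = 0.40, n = 100, threshold 0.35) are hypothetical, and `statistics.NormalDist` from the Python standard library stands in for the printed normal table.

```python
from math import sqrt
from statistics import NormalDist

def prob_phat_below(threshold, p, n):
    """Approximate P(p-hat < threshold) using the normal approximation."""
    mean = p                                 # Step 2: mean of p-hat
    se = sqrt(p * (1 - p) / n)               # Step 2: standard error of p-hat
    if not (n * p > 5 and n * (1 - p) > 5):  # Step 3: normality check
        raise ValueError("need np > 5 and n(1-p) > 5 for the normal approximation")
    z = (threshold - mean) / se              # Step 5: z-score
    return z, NormalDist().cdf(z)            # Step 6: area to the left of z

z, prob = prob_phat_below(0.35, p=0.40, n=100)   # hypothetical values
print(round(z, 2), round(prob, 4))
```

For an upper-tail probability, subtract the returned area from 1; for an interval, subtract the two left-hand areas, as in Step 7.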
9.44
Example
Solution - Describe the sampling distribution of $\hat{p}$
Population: p = .52
Sample: Random, n = 300
Sampling distribution:
$\mu_{\hat{p}} = p = .52$
$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{(.52)(.48)}{300}} \approx .0288$
$\cdots + (.6 - .5)^2(.3125) + (.8 - .5)^2(.15625) + (1 - .5)^2(.03125) = .05$
$E(\hat{p}) = p$
$SD(\hat{p}) = \sqrt{\dfrac{p(1-p)}{n}}$
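The two formulas can be verified exactly for a small case. The probabilities .3125, .15625 and .03125 in the variance calculation above match a binomial count with n = 5 and p = .5, so the sketch below assumes that setting (an inference from the numbers, not something the slide states). It builds the full distribution of the sample proportion and computes its mean and variance from first principles.

```python
from fractions import Fraction
from math import comb

n, p = 5, Fraction(1, 2)   # assumed setting, inferred from the probabilities above

# Distribution of p-hat = k/n, where k is a binomial(n, p) count.
dist = {Fraction(k, n): comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

mean = sum(v * pr for v, pr in dist.items())
var = sum((v - mean) ** 2 * pr for v, pr in dist.items())

print("E(p-hat)   =", float(mean))             # equals p
print("Var(p-hat) =", float(var))              # equals p(1-p)/n
print("p(1-p)/n   =", float(p * (1 - p) / n))
```

The standard deviation is the square root of this variance, which is the formula for SD given above.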