
CH-1 Sampling Distribution

The document discusses sampling and sampling distributions. It defines key terms like population, sample, parameter, statistic and different types of sampling techniques. Probability sampling techniques include simple random sampling and systematic random sampling. Non-probability sampling does not give all units an equal chance of being selected.

Uploaded by

workiemelkamu400
Copyright © All Rights Reserved

CHAPTER ONE

SAMPLING AND SAMPLING DISTRIBUTIONS

1.1. INTRODUCTION
Sampling in statistics is as common and important as salt is in food. At home, a cook tastes a
teaspoonful to judge the quality of what is being cooked. In the medical sciences, a few drops
of blood are taken and tested microscopically or chemically to determine whether the blood
contains any abnormalities.
In many fields of human activity, and in the ordinary actions of daily life, decisions are based
on observing a few units that form a portion of the total population. Nowadays, sampling methods
are used extensively in socio-economic surveys to study the living conditions, cost-of-living
index, etc., of a class of people. In biological studies, experiments are conducted on some units
(persons, animals, or plants), and inferences are drawn about the breed or variety to which the
units belong. In industry, sampling procedures are predominantly used for quality control.
1.2. SAMPLING THEORY
Sampling theory is the study of relationships existing between a population and samples drawn
from the population.
1.2.1. BASIC DEFINITIONS
➢ Universe or Population: an aggregate of objects under study. It is a collection of
individuals, of their attributes, or of results of operations that can be numerically
specified.
If the population consists of a finite number of individuals, it is called a Finite
Population. If it consists of an infinite number of units, it is impossible to observe all
of them, and it is called an Infinite Population.
Although many populations appear to be exceedingly large, no truly infinite population
of physical objects actually exists. However, with limited time and resources it is
practically impossible to count, for example, the grains of sand on a beach, so such
populations are treated as infinite for our study.
➢ Sample: a finite subset of a population, drawn from it to estimate the
characteristics of the population. Sampling is a tool that enables us to draw
conclusions about the characteristics of the population.
➢ Sampling: sampling consists of drawing a sample from a given population and
using the sample data to gain knowledge about the population parameters.
➢ Census: a census involves complete enumeration, or a full count, of all the units
comprising the population with respect to the parameter of interest.

➢ Statistic: a numerical value computed from the sample, i.e., a measurable characteristic
of the sample (for example, the sample mean). It is a sample result.

➢ Parameter: a numerical value of the population, i.e., a measurable characteristic of the
population (for example, the population mean). It is a population result.

➢ Sampling design: a definite plan for obtaining a sample from the sampling frame. It
refers to the technique or procedure one adopts in selecting the sampling units from
which inferences about the population are drawn. The sampling design is determined
before any data are collected.

➢ Sampling error: a sample survey studies only a small portion of the population, so
there is naturally a certain amount of inaccuracy in the information collected. This
inaccuracy is termed sampling error or error variance. It is the discrepancy between a
population parameter and the estimate (statistic) derived from a random sample, and it
arises from chance rather than from a systematic bias. In short, sampling error is the
difference between a sample statistic and its corresponding population parameter.

1.2.2. THE NEED FOR SAMPLES


It is often not feasible to study the entire population. The major reasons why sampling is
necessary are as follows:
A. The destructive nature of certain tests
Many experiments, especially in quality control, destroy the items being tested. Consider the
following tests:
➢ Testing wine or coffee
➢ Testing the strength of light bulbs
➢ A blood test for a patient
Without sampling, the wine taster would have to drink all the wine, every light bulb produced
would be destroyed and nothing would remain for sale, and all of the patient's blood would have
to be drawn, killing the patient. Here, sampling is a must.
B. The physical impossibility of checking all items in the population
Populations of fish, birds, and other wildlife are large and constantly changing as members
move, are born, and die. There is no mechanism for contacting every member of such a
population.
C. The cost of studying all the items in a population is often prohibitive
Contacting the entire population is expensive; therefore, sampling is preferable. Public
opinion polls and consumer testing organizations typically contact only a fraction of the millions
of families. Consider a multinational corporation with 50 million customers worldwide that plans
a market survey of 2,000 of them. If it costs 20 Br. to mail a questionnaire to each customer and
tabulate the response, the sample survey will cost 2,000 × 20 = 40,000 Br., whereas the same
survey of all 50 million customers would cost 50,000,000 × 20 = 1,000,000,000 Br., about one
billion Br.
D. The adequacy of sample results
If the sampling is conducted effectively, there is no need to study the entire population; the
sample itself leads to adequate generalizations about the population. Even if funds were
available, it is doubtful whether the additional accuracy of a census, i.e., a study of the entire
population, is essential in most problems. For example, when determining a monthly index of food
prices (bread, beans, milk, etc.), including every grocery store and shop is unlikely to change
the index significantly, since the prices of such commodities usually do not vary by more than a
few cents from one store to another. Moreover, studying the entire population does not guarantee
100% accuracy: collecting and analyzing bulk data introduces errors of its own.
E. Contacting the whole population is often time consuming
Studying the whole population takes a long time; sampling saves time.

Characteristics of a good sample


An ideal sample should fulfill the following three basic characteristics:
➢ Representativeness: An ideal sample must adequately represent the entire population. It
should not lack any qualities found in the entire population.
➢ Adequacy: The number of units included in the sample should be sufficient to enable the
derivation of conclusions applicable to the entire population.
➢ Homogeneity: There should be no basic difference in nature between the units included in
the sample and the units of the population.
1.2.3. ERRORS
The term error denotes the difference between a population value and its estimate provided by
a sampling technique. The errors involved in the collection, processing, and analysis of data, as
well as in sampling, may be broadly classified under the following two heads:
I. Sampling Errors and
II. Non-sampling Errors
Sampling error is the difference between the result obtained from a sample and the result that a
census would give. It represents the discrepancy between the sample estimate and the actual
population value, and it arises because only a portion of the population (i.e., the sample) is
used to estimate population parameters and draw inferences about the population. The size of the
sampling error tends to decrease as the sample size increases.
Non-sampling errors arise from mistakes in the survey procedure itself, whether in a sample
survey or a census, and they cannot be reduced or eliminated by increasing the sample size. They
arise mainly at the stages of observation, ascertainment, and processing of the data, and are
therefore present in both complete enumeration (census) surveys and sample surveys. Such errors
occur because of human mistakes, not chance variation.
1.2.4. TYPES OF SAMPLES
The sampling techniques may be broadly classified as follows:
1. Probability (random) Sampling
2. Non-Probability (non-random) Sampling

1.2.4.1. Random or probability sampling techniques


Probability sampling is the scientific method of selecting samples according to some laws of
chance in which each unit in the population has some definite pre-assigned probability of being
selected in the sample.
1. Simple Random Samples
In a simple random sample, every item in the frame has the same chance of selection as every
other item. In addition, every sample of a fixed size has the same chance of selection as every
other sample of that size.
➢ In simple random sampling, “n” denotes the sample size and “N” denotes the population
size. Every item in the frame is numbered from 1 to N.
➢ The chance of selecting any particular member of the frame on the first selection is 1/N.
➢ A simple random sample can be selected by the lottery method or by using a table of random
numbers. In the lottery method, each unit is identified by a distinct number written on an
identical card. The cards are put in a drum and thoroughly shuffled before each draw. For the
second method, several published tables of random numbers are available.
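The lottery method can be imitated in software. The following is a minimal Python sketch, assuming the units are numbered 1 to N; the frame size N = 50, the sample size 5, the function name `simple_random_sample`, and the seed are illustrative choices, not part of the notes. It relies on the standard library's `random.Random.sample`, which draws without replacement so that every subset of size n is equally likely.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame without replacement.

    Every unit has the same chance of selection and every subset of
    size n is equally likely, as with the lottery method.
    """
    rng = random.Random(seed)  # seeded generator for reproducible draws
    return rng.sample(frame, n)

# Hypothetical frame: units numbered 1 to N, as in the text.
N = 50
frame = list(range(1, N + 1))
sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```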
2. Systematic Random Sampling
The items or individuals of the population are arranged in some order (for example,
alphabetically, or with values listed in ascending or descending order). A random starting point
is selected, and then every Kth member of the population is selected for the sample. A systematic
random sample should not be used if the list follows a predetermined pattern (for example, a
cycle that repeats every K items), because the selected units may then all share the same feature.
- To get a systematic sample of size n from a population of size N, compute the sampling interval
$K = N/n$, draw a random number i from 1 to K, and then select the units numbered i, i + K,
i + 2K, i + 3K, …
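The selection rule above can be sketched in Python as follows. This is an illustrative sketch, assuming N is an exact multiple of n so that K = N/n is a whole number; the function name `systematic_sample`, the 100-unit frame, and the seed are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic random sample of size n from an ordered frame.

    Sampling interval K = N / n; random start i in 1..K; then the units
    at positions i, i + K, i + 2K, ... (1-based, as in the text).
    """
    N = len(frame)
    if N % n != 0:
        raise ValueError("this sketch assumes N is an exact multiple of n")
    K = N // n
    i = random.Random(seed).randint(1, K)  # random start, 1 <= i <= K
    # Convert the 1-based positions i, i+K, ... to 0-based list indices.
    return [frame[pos - 1] for pos in range(i, N + 1, K)]

# Hypothetical frame of 100 numbered units; a sample of 10 gives K = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```

Only the starting point is random; once i is drawn, the rest of the sample is fixed, which is why a periodic pattern in the list can distort the result.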

3. Stratified Random Sample


This sampling design is most appropriate if the population is heterogeneous with respect to the
characteristic under study or if the population distribution is highly skewed.
We subdivide the population into several groups, or strata, such that (i) units within each
stratum are relatively homogeneous, (ii) units in different strata are heterogeneous, and
(iii) strata do not overlap; in other words, every unit of the population belongs to one and only
one stratum.
The criteria used for stratification include geography, sociological factors, age, sex, income,
etc. The population of size N is divided into K relatively homogeneous strata of sizes
$N_1, N_2, \dots, N_K$ such that $N_1 + N_2 + \dots + N_K = N$. We then draw a simple random
sample from each stratum, either in proportion to the size of the stratum (proportional) or not
(non-proportional, e.g., an equal number of units from each stratum).
In some cases, stratified sampling reflects the characteristics of the population more accurately
than simple random or systematic random sampling.

Example: Suppose 200, 300, and 500 items are produced by factories located in three cities, X, Y,
and Z. We wish to draw a proportional stratified sample of 20 items.
The proportional sample sizes are:
For Factory X = 20 x (200/1000) = 4
For Factory Y = 20 x (300/1000) = 6
For Factory Z = 20 x (500/1000) = 10
Total = 20
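The proportional allocation in this example, followed by a simple random sample within each stratum, can be sketched in Python. The item labels (X1 to X200 and so on), the function names, and the seed are hypothetical; rounding n·N_h/N to whole numbers is an assumption of this sketch, since allocations need not come out exact in general.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Allocate n sample units as n_h = n * N_h / N, rounded to whole units."""
    N = sum(stratum_sizes.values())
    return {h: round(n * N_h / N) for h, N_h in stratum_sizes.items()}

def stratified_sample(strata, n, seed=None):
    """Proportional stratified sample: an independent simple random
    sample (without replacement) is drawn within each stratum."""
    rng = random.Random(seed)
    sizes = {h: len(units) for h, units in strata.items()}
    alloc = proportional_allocation(sizes, n)
    return {h: rng.sample(units, alloc[h]) for h, units in strata.items()}

# Hypothetical item labels for the three factories in the example.
strata = {
    "X": [f"X{i}" for i in range(1, 201)],   # 200 items
    "Y": [f"Y{i}" for i in range(1, 301)],   # 300 items
    "Z": [f"Z{i}" for i in range(1, 501)],   # 500 items
}

print(proportional_allocation({"X": 200, "Y": 300, "Z": 500}, 20))
# {'X': 4, 'Y': 6, 'Z': 10}
sample = stratified_sample(strata, 20, seed=7)
```

Here the allocations are exact (4, 6, and 10); with other stratum sizes the rounded allocations may not sum to n and would need adjusting by hand.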

In non-proportional stratified sampling, an equal number of units is taken from each stratum
regardless of its size.
4. Cluster Sampling
The total population is divided into recognizable subdivisions, known as clusters, such that the
units within each cluster are heterogeneous while the clusters are similar to one another. A
random sample of clusters is then selected, and the units in the chosen clusters are studied,
either all of them or a sample drawn by a suitable technique.
- It is often employed to reduce the cost of sampling a population scattered over a large
geographic area.
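One common variant, one-stage cluster sampling, selects a random sample of whole clusters and studies every unit in them. A minimal Python sketch of that variant follows; the village and household labels, the function name `cluster_sample`, and the choice of 3 clusters out of 10 are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """One-stage cluster sampling: choose m clusters at random (without
    replacement) and keep every unit of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # random sample of cluster labels
    return {c: clusters[c] for c in chosen}

# Hypothetical population: 10 villages, each with 5 households.
clusters = {
    f"village{v}": [f"v{v}-h{h}" for h in range(1, 6)]
    for v in range(1, 11)
}
picked = cluster_sample(clusters, 3, seed=2)
for village, households in picked.items():
    print(village, households)
```

The cost saving comes from visiting only the chosen villages rather than households scattered across all ten.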
1.2.4.2. Non-random (non-probability) sampling techniques
In non-random sampling, units are not selected according to any probability mechanism. Instead,
the sample is selected with a definite purpose in view, and the choice of sampling units depends
entirely on the discretion and judgment of the investigator. Depending on the object of inquiry
and other considerations, a predetermined number of sample units is selected purposefully so that
they represent the true characteristics of the population. A serious drawback of this design is
that it is highly subjective: the selection of sample units depends entirely on the personal
convenience, biases, prejudices, and beliefs of the investigator. The method is more successful if
the investigator is thoroughly skilled and experienced.
i. Judgment Sampling: The choice of sample items depends exclusively on the judgment of the
investigator, whose experience and knowledge of the population guide the selection of sample
units. It is most suitable when the population is small.
ii. Convenience Sampling: The sample units are selected according to the convenience of the
investigator. Such a sample is also called a “chunk”: a fraction of the population selected for
investigation neither by probability nor by judgment but because it is conveniently available,
for example from a list or frame that happens to be at hand. There is a high chance of bias being
introduced. It is often used for pilot studies.
iii. Quota Sampling: A type of judgment sampling. Quotas are set up according to specified
characteristics such as age group, income group, etc., and a specified number of units is
sampled from each group according to the quota allotted to it. Within each group, the selection
of sample units depends on personal judgment, so there is a risk of personal prejudice and bias
entering the process. This method is often used in public opinion studies.
iv. Snowball Sampling: Snowball sampling involves identifying one or a few initial participants
who meet the criteria for the study and then asking them to refer other potential participants. This
method is often used when the population of interest is difficult to identify or access.
1.3. SAMPLING DISTRIBUTIONS
A sampling distribution describes how a statistic, or a function of statistics, of the random
variables $X_1, X_2, \dots, X_n$ varies from one sample to another sample of the same size.
Sampling distributions provide the foundation for a number of test statistics used in hypothesis
testing, and they are a central concern in sampling analysis. If we take a number of samples and
compute for each sample various statistical measures such as the mean or standard deviation, each
sample may give its own value of the statistic under consideration. All such values of a
particular statistic, say the mean, together with their relative frequencies, constitute the
sampling distribution of that statistic. Accordingly, we can have the sampling distribution of
the mean, the sampling distribution of the standard deviation, or the sampling distribution of
any other statistical measure. Note that each item in a sampling distribution is the value of the
statistic for one particular sample. The sampling distribution of the mean approaches the normal
distribution when the sample size is large. Its importance follows from the fact that the mean
of the sampling distribution of the mean is the same as the mean of the population.
1.3.1. Sampling Distribution of the mean
The sampling distribution of the sample mean, x̄, is the probability distribution consisting of a
list of all possible sample means of a given sample size selected from a population, together
with the probability of occurrence associated with each sample mean. In other words, it is the
probability distribution of all possible values of x̄. The number of samples that can be drawn
from a population depends on whether we sample with or without replacement: without replacement
there are ${}_{N}C_{n}$ possible (unordered) samples, while with replacement there are $N^n$
possible ordered samples.

Example: Consider a finite population with five elements labeled A, B, C, D, and E. A sample
of size 2 is to be selected at random from the population without replacement.
A. How many possible samples of size 2 are there?
B. List each of the different possible samples.

Solution
N = 5 and n = 2, and the number of possible samples is
$${}_{N}C_{n} = \frac{N!}{n!\,(N-n)!}$$
A. $${}_{5}C_{2} = \frac{5!}{2!\,(5-2)!} = \frac{5!}{2!\,3!} = 10$$
B. AB, AC, AD, AE, BC, BD, BE, CD, CE, DE.
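The count and the list can be checked by brute-force enumeration. This Python sketch uses `itertools.combinations` for unordered samples without replacement; for comparison it also counts ordered draws with replacement ($N^n = 25$) using `itertools.product`, which is not part of the original example.

```python
from itertools import combinations, product
from math import comb

population = ["A", "B", "C", "D", "E"]
n = 2

# Unordered samples without replacement: N C n of them.
samples = ["".join(s) for s in combinations(population, n)]
print(len(samples), comb(len(population), n))  # 10 10
print(samples)

# For comparison: ordered draws with replacement, N ** n of them.
ordered_with_replacement = list(product(population, repeat=n))
print(len(ordered_with_replacement))  # 25
```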

❖ In establishing the relationship between the population and the sampling distribution of
the mean, the following symbols will be used:
N = population size, n = sample size
$\mu$ = population mean, $\bar{x}$ = sample mean
$\sigma$ = population standard deviation, $\mu_{\bar{x}}$ = mean of the sampling distribution of the means
$\sigma_{\bar{x}}$ = standard error of the mean

Example: The following table gives the hourly wages of five employees. Assume that a
sample of 2 employees will be taken.
Employee       X1   X2   X3   X4   X5
Hourly Wage     2    4    6    8   10
Find:
A. The population mean.
B. The standard deviation of the population.
C. The mean of the sampling distribution of the means.
D. The standard error of the mean.
Solution
A. The 5 employees constitute our entire population, so N = 5 and
$$\mu = \frac{\sum_{i=1}^{5} X_i}{N} = \frac{2 + 4 + 6 + 8 + 10}{5} = \frac{30}{5} = 6$$
B. The standard deviation is given by the formula
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
$$\sigma = \sqrt{\frac{(2-6)^2 + (4-6)^2 + (6-6)^2 + (8-6)^2 + (10-6)^2}{5}} = \sqrt{\frac{40}{5}} = \sqrt{8} \approx 2.83$$
C. Mean of the sampling distribution of the means
If we plan to take samples of two employees, there are
$${}_{5}C_{2} = \frac{5!}{(5-2)!\,2!} = 10$$
possible samples. The 10 possible samples and their means are:

Sample    Wages      Sample mean
X1, X2    (2, 4)     $\bar{x}_1 = 3$
X1, X3    (2, 6)     $\bar{x}_2 = 4$
X1, X4    (2, 8)     $\bar{x}_3 = 5$
X1, X5    (2, 10)    $\bar{x}_4 = 6$
X2, X3    (4, 6)     $\bar{x}_5 = 5$
X2, X4    (4, 8)     $\bar{x}_6 = 6$
X2, X5    (4, 10)    $\bar{x}_7 = 7$
X3, X4    (6, 8)     $\bar{x}_8 = 7$
X3, X5    (6, 10)    $\bar{x}_9 = 8$
X4, X5    (8, 10)    $\bar{x}_{10} = 9$
The mean of the distribution of sample means (the grand mean) is obtained by summing the sample
means and dividing the sum by the number of samples (here 10):
$$\mu_{\bar{x}} = \frac{\sum_{i=1}^{10} \bar{x}_i}{10} = \frac{3 + 4 + 5 + 6 + 5 + 6 + 7 + 7 + 8 + 9}{10} = \frac{60}{10} = 6$$
The grand mean has the same value as the population mean.
Let’s organize this distribution of sample means into a frequency distribution and a probability
distribution.

Sample Mean   Frequency   Probability
3             1           0.1
4             1           0.1
5             2           0.2
6             2           0.2
7             2           0.2
8             1           0.1
9             1           0.1
Total         10          1.00
The probability distribution of the sample mean is referred to as the “sampling distribution of the
mean”.
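Part C of the example can be reproduced by enumerating all possible samples. The Python sketch below is an illustration (the variable names are our own): it lists every sample of two wages drawn without replacement, builds the frequency and probability distribution of the sample mean, and compares the grand mean with the population mean.

```python
from collections import Counter
from itertools import combinations

wages = [2, 4, 6, 8, 10]   # hourly wages of employees X1..X5
n = 2                      # sample size

# Every possible sample of 2 employees drawn without replacement.
sample_means = [sum(s) / n for s in combinations(wages, n)]
k = len(sample_means)      # number of possible samples (5C2 = 10)

# Frequency and probability distribution of the sample mean.
freq = Counter(sample_means)
for xbar in sorted(freq):
    print(f"{xbar:4.1f}  {freq[xbar]}  {freq[xbar] / k:.1f}")

grand_mean = sum(sample_means) / k
population_mean = sum(wages) / len(wages)
print("grand mean:", grand_mean, " population mean:", population_mean)
```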
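D. Standard error of the mean ($\sigma_{\bar{x}}$)
The standard error of the mean is a measure of dispersion of the distribution of sample means. It
is similar to the standard deviation of a frequency distribution and measures the likely deviation
of a sample mean from the grand mean of the sampling distribution:
$$\sigma_{\bar{x}} = \sqrt{\frac{\sum_{i=1}^{k} (\bar{x}_i - \mu_{\bar{x}})^2}{k}}$$
where k is the number of sample means. For the hourly-wage example, the squared deviations of the
ten sample means from 6 sum to 30, so $\sigma_{\bar{x}} = \sqrt{30/10} = \sqrt{3} \approx 1.73$.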
In practice, it is not possible to take all possible samples from the population, so we need an
alternative way to compute $\sigma_{\bar{x}}$. For a finite population sampled without
replacement,
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$
where $\sigma$ = population standard deviation, N = population size, and n = sample size.
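Since we generally deal with very large populations, which can be considered infinite, the factor
$\sqrt{(N-n)/(N-1)}$ approaches 1, and hence
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
For the hourly-wage example, however, the population is small (N = 5) and sampling is without
replacement, so the correction should be kept:
$$\sigma_{\bar{x}} = \frac{2.83}{\sqrt{2}} \sqrt{\frac{5-2}{5-1}} \approx 2.00 \times 0.866 \approx 1.73$$
This equals the standard deviation of the ten sample means listed above; dropping the correction
would give 2.83/√2 ≈ 2.00 and overstate the standard error.
- The factor $\sqrt{(N-n)/(N-1)}$ is known as the finite population correction factor and should
be used when the population is finite and sampling is without replacement.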
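The two routes to the standard error, computing the standard deviation of all ten sample means directly and using the formula with and without the finite population correction, can be compared in a short Python sketch. The wages are those of the example; everything else is illustrative.

```python
from itertools import combinations
from math import sqrt

wages = [2, 4, 6, 8, 10]
N, n = len(wages), 2

mu = sum(wages) / N
sigma = sqrt(sum((x - mu) ** 2 for x in wages) / N)   # population SD, sqrt(8)

# Direct route: SD of all possible sample means (without replacement).
means = [sum(s) / n for s in combinations(wages, n)]
se_direct = sqrt(sum((m - mu) ** 2 for m in means) / len(means))

# Formula route, with and without the finite population correction.
se_no_fpc = sigma / sqrt(n)
se_fpc = se_no_fpc * sqrt((N - n) / (N - 1))

print(round(se_direct, 4), round(se_fpc, 4), round(se_no_fpc, 4))  # 1.7321 1.7321 2.0
```

Under these assumptions the direct value and the corrected formula agree at √3 ≈ 1.73, while the uncorrected σ/√n gives 2.00.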
➢ A single sample mean is likely to be closer to the population mean as the value of
$\sigma_{\bar{x}}$ decreases.
Properties of Sampling distribution of the mean
✓ The mean of the sampling distribution of the means equals the mean of the population.
✓ The spread of the sample means is smaller than the spread of the population values.
For instance, the sample means above range from 3 to 9, while the population values range
from 2 to 10.
The central limit theorem
The Central Limit Theorem States that: “Regardless of the shape of population distribution, the
distribution of the sample means approaches the normal probability distribution as the sample
size increases.”
The question is how large the sample size should be for the distribution of sample means to
approximate the normal distribution for any type of population. In practice, sample sizes of 30
or more are considered adequate for this purpose. Note, however, that if the original population
is normally distributed, the sampling distribution of the mean is normal no matter what the
sample size.
The discussion of the sampling distribution is concerned with the proximity of a sample mean to
the population mean. The possible values of the sample means tend towards the population mean,
and, according to the central limit theorem, the distribution of sample means tends to be normal
for sample sizes n of 30 or more.
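The theorem can be illustrated by simulation. The Python sketch below assumes an exponential population with mean 1 and standard deviation 1 (a strongly skewed shape chosen only for illustration), draws 20,000 samples each of sizes 2 and 30, and reports the mean, standard deviation, and skewness of the sample means. The sample sizes, number of repetitions, and seed are arbitrary.

```python
import random
import statistics

def sample_means(n, reps=20_000, seed=0):
    """Means of `reps` samples of size n drawn from an exponential
    population with mean 1 and SD 1 (a strongly right-skewed shape)."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

def skewness(xs):
    """Moment coefficient of skewness; 0 for a symmetric (e.g. normal) shape."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

results = {}
for n in (2, 30):
    means = sample_means(n)
    results[n] = (statistics.fmean(means), statistics.pstdev(means), skewness(means))
    print(f"n={n:2d}  mean={results[n][0]:.3f}  SD={results[n][1]:.3f}  "
          f"sigma/sqrt(n)={1 / n ** 0.5:.3f}  skewness={results[n][2]:.2f}")
```

One would expect the mean of the sample means to stay near 1, their standard deviation to be close to σ/√n, and the skewness (2 for the exponential population itself, about 2/√n for the sample mean) to shrink towards 0, the value for a normal curve, as n grows.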

Distribution of the standardized statistics for the sample mean
To use the central limit theorem, we need to know the population standard deviation $\sigma$.
When $\sigma$ is not known, the sample standard deviation, denoted by s, is used to approximate
it (a reasonable approximation for large samples). The standardized sample mean is
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad \text{if the population standard deviation is known, or}$$
$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}} \quad \text{if the population standard deviation is unknown.}$$
Example 1:
The annual wages of all employees of a company have a mean of 20,400 per year with a standard
deviation of 3,200. The personnel manager is going to take a random sample of 36 employees and
calculate the sample mean wage. What is the probability that the sample mean will exceed
21,000?
Here n = 36, $\mu$ = 20,400, and $\sigma$ = 3,200, so
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{21{,}000 - 20{,}400}{3{,}200/\sqrt{36}} = \frac{600}{533.33} = 1.125 \approx 1.13$$
From the standard normal table, the area between 0 and 1.13 is 0.3708, so the area to the right
of 1.13 is 0.5 − 0.3708 = 0.1292, i.e., $P(\bar{x} > 21{,}000) = P(Z > 1.13) = 0.1292$.
Therefore, the probability that the sample mean will exceed 21,000 is approximately 13%.
Example 2:
A company makes engines used in speedboats. The company's engineers believe that the engine
delivers an average power of 220 horsepower (HP) and that the standard deviation of power
delivered is 15 HP. A potential buyer intends to sample 100 engines. What is the probability that
the sample mean, x̄, will be less than 217 HP?
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{217 - 220}{15/\sqrt{100}} = \frac{-3}{1.5} = -2$$
From the standard normal table, the area between 0 and 2 is 0.4772, so the area to the left of
−2 is 0.5 − 0.4772 = 0.0228, i.e., $P(\bar{x} < 217) = P(Z < -2) = 0.0228$.
Thus, the probability that the potential buyer's tests will result in a sample mean lower than
217 HP is 2.28%.
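Instead of reading a printed table, the normal probabilities in Examples 1 and 2 can be computed with the error function, using Φ(z) = ½[1 + erf(z/√2)]. The Python sketch below is illustrative; the helper names `normal_cdf` and `z_for_mean` are our own.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def z_for_mean(xbar, mu, sigma, n):
    """Standardized sample mean: (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / sqrt(n))

# Example 1: P(xbar > 21,000), mu = 20,400, sigma = 3,200, n = 36
z1 = z_for_mean(21_000, 20_400, 3_200, 36)
p1 = 1 - normal_cdf(z1)          # upper tail

# Example 2: P(xbar < 217), mu = 220, sigma = 15, n = 100
z2 = z_for_mean(217, 220, 15, 100)
p2 = normal_cdf(z2)              # lower tail

print(f"Example 1: Z = {z1:.3f}, P = {p1:.4f}")
print(f"Example 2: Z = {z2:.3f}, P = {p2:.4f}")
```

It gives about 0.130 for Example 1 (the table answer 0.1292 comes from rounding Z = 1.125 up to 1.13) and about 0.0228 for Example 2.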

1.3.2. Sampling distribution of the proportion
The population proportion, denoted by P, is the proportion of items in the entire population
with the characteristic of interest. The sample proportion, denoted by p, is the proportion of
items in the sample with the characteristic of interest. The sample proportion, a statistic, is
used to estimate the population proportion, a parameter.
To calculate the sample proportion, assign each item the value 1 if it has the characteristic of
interest and 0 if it does not.
Sample Proportion
$$p = \frac{X}{n} = \frac{\text{Number of items having the characteristic of interest}}{\text{Sample size}}$$
The sample proportion p will be between 0 and 1.
You learned that the sample mean x̄ is an unbiased estimator of the population mean $\mu$.
Similarly, the sample proportion p is an unbiased estimator of the population proportion P. By
analogy to the sampling distribution of the mean, whose standard error is
$\sigma_{\bar{x}} = \sigma/\sqrt{n}$, the standard error of the proportion ($\sigma_p$) is:
$$\sigma_p = \sqrt{\frac{P(1-P)}{n}}$$
where P = population proportion and n = sample size.
The sampling distribution of the proportion will be approximately normal when the sample is large
enough for the central limit theorem to apply, i.e., when n > 30, or when nP > 5 and
n(1 − P) > 5.

In most cases in which inferences are made about the proportion, the sample size is substantial
enough to meet the conditions for using the normal approximation. Therefore, in many instances,
you can use the normal distribution to estimate the sampling distribution of the proportion.
Finding Z for the sampling distribution of the proportion
$$Z = \frac{p - P}{\sqrt{P(1-P)/n}}$$
where
p = sample proportion
P = population proportion
n = sample size
Example
To illustrate the sampling distribution of the proportion, suppose that the manager of a local
bank branch determines that 40% of all depositors have multiple accounts at the bank. If you
select a random sample of 200 depositors, what is the probability that the sample proportion of
depositors with multiple accounts is less than 0.30?
Solution
$$Z = \frac{p - P}{\sqrt{P(1-P)/n}} = \frac{0.30 - 0.40}{\sqrt{0.40(1 - 0.40)/200}} = \frac{-0.10}{0.0346} \approx -2.89$$
From the standard normal table, the area between 0 and 2.89 is 0.4981, so the area to the left of
−2.89 is 0.5 − 0.4981 = 0.0019. Therefore, if the population proportion of items of interest is
0.40, only 0.19% of the samples would be expected to have sample proportions less than 0.30.
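The bank example can be checked the same way. The Python sketch below (helper names are hypothetical) verifies the normal-approximation conditions nP > 5 and n(1 − P) > 5 quoted in the notes, computes Z with the standard error √(P(1 − P)/n), and evaluates the lower-tail probability with `math.erf`.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def z_for_proportion(p_hat, P, n):
    """Z = (p - P) / sqrt(P(1 - P) / n) for a sample proportion p."""
    return (p_hat - P) / sqrt(P * (1 - P) / n)

P, n = 0.40, 200
# Rule of thumb from the notes for using the normal approximation.
assert n * P > 5 and n * (1 - P) > 5

z = z_for_proportion(0.30, P, n)
prob = normal_cdf(z)             # P(p < 0.30)
print(f"Z = {z:.3f}, P(p < 0.30) = {prob:.4f}")
```

It gives Z ≈ −2.887 and a probability of about 0.0019, in line with the table-based answer.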
The discussion above assumes that the population is very large. If the sampled population is
finite, or small, we adjust the standard error of the sample proportion using the finite
population correction factor.

Exercise:
A manufacturer of screws has noticed that on average two percent of the screws produced are
defective. A random sample of 400 screws is examined for the proportion of defective screws.
Find the probability that the proportion of defective screws in the sample is between 0.01 and
0.03.
