9-1 ASAP Statistics - Sampling-1

The document discusses different sampling techniques used in statistics including simple random sampling, stratified sampling, systematic sampling, and cluster sampling. It explains the key aspects of each technique such as how samples are selected, their advantages and disadvantages. The document is intended to teach business analytics students about basic statistical sampling.

Uploaded by

George Mathew
Copyright
© All Rights Reserved

FOUNDATION TO DATA SCIENCE

Business Analytics

Unit1: BASIC STATISTICS


SAMPLING

Prof. Dr. George Mathew


B.Sc., B.Tech, PGDCA, PGDM, MBA, PhD
SAMPLING AND SAMPLING DISTRIBUTIONS
When sampling from a finite population, it is
recommended to select a probability sample, as it
allows valid statistical inferences to be made. The
simplest type is a simple random sample.
A simple random sample of size n from a finite
population of size N is a sample selected such that
each possible sample of size n has the same
probability of being selected.
Sampling Design
A sample is a subset, or some part, of a
larger population. The purpose of sampling is to
estimate an unknown characteristic of a
population.
Sampling is defined in terms of the population
being studied.
A population (universe) is any complete
group—for example, of people, sales territories,
stores, or college students—that shares some
common set of characteristics. The term
population element refers to an individual
member of the population.
Sampling Design
Population (universe): Any complete group of
entities that share some common set of
characteristics.
Population Element: An individual member of
a population.
Census: An investigation of all the individual
elements that make up a population.
Sampling Frame: A list of elements from which
the sample may be drawn. It is also called the
working population.
Sampling Frame
• A list of elements from which the sample
may be drawn
• Working population
• Mailing lists - data base marketers
• Sampling frame error
Stages in the Selection of a Sample
1. Define the target population
2. Select a sampling frame
3. Determine if a probability or nonprobability sampling method will be chosen
4. Plan procedure for selecting sampling units
5. Determine sample size
6. Select actual sampling units
7. Conduct fieldwork
Why Sample?
Sampling provides:
• Lower cost
• Greater speed
• Greater accuracy
• Availability of elements
Merits and Demerits of Sampling
When the field of inquiry is large, it becomes
difficult to adopt a census because of the
resources involved.
Sampling saves time.
Sampling is the only option when testing is
destructive.
Sampling sometimes gives more reliable results.
A census is appropriate when the universe or
population is small. In that case there is no use
resorting to a sample survey.
Sampling
The process of Sampling involves any
procedure using a small number of items or
parts of the whole population to make
conclusions regarding the whole population.
Sample
A sample is a subset, or some part, of a larger
population.
Purpose of Sampling
The purpose of sampling is to enable researchers to
estimate some unknown characteristics of the
population.
Population: A complete group of entities sharing
some common set of characteristics.
A Census is an investigation of all individual
elements that make up a population.
Sampling Frame
The list of elements from which a sample may be
drawn, also called the working population.
Target Population
The specific, complete group relevant to the
research project.
Sampling Frame Error
Error that occurs when certain sample elements are
not listed or available and are not represented in the
sampling frame.

Sampling Unit
A single element or group of elements subject to
selection in the sample.
1. Primary Sampling Unit
A unit selected in the first stage of sampling.
2. Secondary Sampling Unit
A unit selected in the second stage of sampling.
Census Vs Sampling
When all the units are studied,
such a complete coverage is
called a census survey
When only a sample of the
universe is studied, the study is
called a sample survey.
Characteristics of a Good Sample
1. Accuracy
Accuracy is defined as the degree to which bias is
absent. An accurate (unbiased) sample is one
which exactly represents the population.
2. Precision
The sample must yield a precise estimate. Precision is
measured by the standard error or standard deviation of
the sample estimate.
3. Representativeness
A sample must be representative of the population.
Probability sampling techniques yield representative
samples.
4. Size
A good sample must be adequate in size in order to
be reliable. The sample should be of such size that
the inferences drawn from the sample are accurate to a
given level of confidence.
Sampling Techniques (Types of Sampling)
Sampling techniques or methods may be
classified into two generic types:
A. Probability Sampling
1. Simple Random Sampling
2. Stratified Sampling
3. Systematic Sampling
4. Cluster Sampling
5. Area Sampling
6. Multi-Stage and sub sampling
7. Random Sampling with probability proportional to size
8. Double Sampling and Multiphase Sampling
9. Replicated or interpenetrating sampling
B. Non-Probability Sampling
1. Convenience or accidental sampling
2. Purposive or judgment sampling
3. Quota sampling
4. Snow-ball sampling
Probability sampling
Probability sampling is based on the
theory of probability. It is also known as
random sampling.
Its characteristics are:
1. Every population element has a chance of being
selected
2. Such chance is a known probability
3. It yields a representative sample
4. The closeness of the sample to the population
can be determined by estimating bias or
error
Non-Probability sampling
Non-Probability sampling is not based on the theory of
probability. This sample does not provide a chance of
selection to each population element.
Its characteristics are:
1. It does not ensure a selection chance to each
population unit
2. Selection probability is not known
3. It is not a representative one
4. Population parameters cannot be estimated from the
sample values
5. It suffers from sampling bias which will distort results

Simple Random Sampling
1. The lottery method
Drawing names from a hat and selecting
the winning raffle ticket from a large drum
are examples of this type of sampling.
2. The use of a table of random numbers
3. Use of computers
Probability of selection = sample size /
population size
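The "use of computers" option can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the sampling frame of 500 customer IDs, the sample size of 25, and the function name simple_random_sample are all assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded generator so the draw can be repeated
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: customer IDs 1..500
frame = list(range(1, 501))
sample = simple_random_sample(frame, 25, seed=42)

# Probability that any given element is selected = sample size / population size
selection_probability = len(sample) / len(frame)
print(selection_probability)  # 0.05
```

Fixing the seed only makes the draw reproducible for teaching purposes; in a real survey the seed would normally not be chosen by hand.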

Simple Random
Advantages
• Easy to implement with random dialing
Disadvantages
• Requires list of population elements
• Time consuming
• Uses larger sample sizes
• Produces larger errors
• High cost

Stratified Random Sampling


In this method the population is sub-divided into
homogeneous groups or strata such that the
members within each stratum have similar
attributes but the members of different strata have
dissimilar attributes. From each stratum a sample
is drawn using simple random sampling.
Example:
Stratum 1 - Urban
Stratum 2 - Semi-urban
Stratum 3 - Rural
Proportionate stratified sampling takes a sample from
each stratum in proportion to the stratum's share of
the population.
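A proportionate stratified sample for the urban / semi-urban / rural example might look like the sketch below. The stratum sizes (500, 300, 200), the total sample size of 100, and the function name are hypothetical values chosen so that the allocation comes out in whole numbers.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """strata maps stratum name -> list of elements.
    Returns stratum name -> simple random sample drawn from that stratum,
    with each stratum's sample size proportional to its population size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    result = {}
    for name, members in strata.items():
        # Proportional allocation; with awkward sizes, rounding can make
        # the overall total differ slightly from n
        k = round(n * len(members) / total)
        result[name] = rng.sample(members, k)
    return result

# Hypothetical population of 1,000 households in three strata
strata = {
    "urban": [f"U{i}" for i in range(500)],
    "semi_urban": [f"S{i}" for i in range(300)],
    "rural": [f"R{i}" for i in range(200)],
}
sample = proportionate_stratified_sample(strata, 100, seed=1)
sizes = {name: len(units) for name, units in sample.items()}
print(sizes)  # {'urban': 50, 'semi_urban': 30, 'rural': 20}
```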

Stratified
Advantages
• Control of sample size in strata
• Increased statistical efficiency
• Provides data to represent and analyze subgroups
• Enables use of different methods in strata
Disadvantages
• Increased error will result if subgroups are selected at different rates
• Especially expensive if strata on population must be created
• High cost

Systematic Sampling
In this method every kth element in the
population is selected, beginning from
a randomly chosen element in the
range of 1 to k. The skip interval k is
determined by:
k = population size / sample size
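The skip-interval rule can be sketched as follows. The frame of 1,000 numbered elements and the sample size of 50 are assumptions for illustration; when the population size is not an exact multiple of the sample size, this sketch uses the integer part of N / n as k and keeps only the first n elements.

```python
import random

def systematic_sample(population, n, seed=None):
    """Select every k-th element after a random start in positions 1..k."""
    rng = random.Random(seed)
    k = len(population) // n       # skip interval k = N / n (integer part)
    start = rng.randint(1, k)      # random start, 1-based, between 1 and k inclusive
    # Positions start, start + k, start + 2k, ... converted to 0-based indices
    picked = [population[i] for i in range(start - 1, len(population), k)]
    return picked[:n]

# Hypothetical frame: elements numbered 1..1000, sample of 50, so k = 20
frame = list(range(1, 1001))
sample = systematic_sample(frame, 50, seed=7)
print(sample[:3])
```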

Cluster Sampling
If the total area of interest happens to be a big
one, a convenient way in which a sample can be
taken is to divide the area into a number of
similar non-overlapping areas and then to
randomly select a number of these smaller areas
(called clusters), with the ultimate sample
consisting of all units in these areas or clusters.
Alternatively, from each selected cluster, a
sample of population elements is drawn by either
simple random selection or stratified random
selection.
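The first variant described above, in which every unit of each chosen cluster enters the sample, can be sketched like this. The 20 city blocks of 10 households each and the function name are hypothetical.

```python
import random

def one_stage_cluster_sample(clusters, m, seed=None):
    """clusters maps cluster name -> list of units.
    Randomly select m clusters and keep ALL units in the selected clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # random choice of whole clusters
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical area: 20 non-overlapping blocks, 10 households in each
clusters = {
    f"block_{b}": [f"block_{b}_household_{h}" for h in range(10)]
    for b in range(20)
}
chosen, units = one_stage_cluster_sample(clusters, 4, seed=3)
print(len(chosen), len(units))  # 4 40
```

Note that only a list of clusters is needed up front, not a list of every household, which is why cluster sampling is practical without a full sampling frame.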

Cluster Sampling
Advantages
• Provides an unbiased estimate of population parameters if properly done
• Economically more efficient than simple random
• Lowest cost per sample
• Easy to do without list
Disadvantages
• Often lower statistical efficiency due to subgroups being homogeneous rather than heterogeneous
• Moderate cost

Stratified and Cluster Sampling
Stratified
• Population divided into few subgroups
• Homogeneity within subgroups
• Heterogeneity between subgroups
• Choice of elements from within each subgroup
Cluster
• Population divided into many subgroups
• Heterogeneity within subgroups
• Homogeneity between subgroups
• Random choice of subgroups

Area Sampling
If the clusters happen to be
geographic subdivisions, cluster
sampling is better known as
area sampling.
In large field surveys, clusters
consisting of specific geographical
areas like districts, taluks, villages or
blocks in a city are randomly drawn.

Multi-Stage Sampling
Multi-stage sampling involves two or
more steps that combine some of the
probability sampling techniques
already described.
Multi-stage sampling is applied in big
inquiries extending to a considerably
large geographical area.
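A two-stage design is the simplest case of multi-stage sampling. The sketch below assumes a hypothetical region of 12 districts with 50 villages each: districts are drawn at random in the first stage (the primary sampling units), and villages are drawn by simple random sampling within each chosen district in the second stage.

```python
import random

def two_stage_sample(areas, n_areas, n_per_area, seed=None):
    """areas maps area name -> list of secondary units.
    Stage 1: random selection of areas. Stage 2: random selection within each."""
    rng = random.Random(seed)
    # Stage 1: simple random selection of primary sampling units
    primary_units = rng.sample(sorted(areas), n_areas)
    # Stage 2: simple random selection of secondary units inside each chosen area
    return {area: rng.sample(areas[area], n_per_area) for area in primary_units}

# Hypothetical region: 12 districts, 50 villages per district
areas = {
    f"district_{d}": [f"district_{d}_village_{v}" for v in range(50)]
    for d in range(12)
}
sample = two_stage_sample(areas, n_areas=3, n_per_area=5, seed=11)
print({area: len(villages) for area, villages in sample.items()})
```

Other combinations are equally possible, for example a stratified or systematic selection in the second stage.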
Non-Probability sampling
Some of the popular non-
probability sampling techniques
are:
1. Convenience Sampling
2. Judgment Sampling
3. Quota Sampling
4. Snowball Sampling
Convenience Sampling
This is a non-probability sampling method in
which the interviewers will decide the choice of
sampling units based on their convenience.
In most of the situations, the following may be
true:
➢ The sampling units may be distributed sparsely
(thinly)
➢ Many respondents will refuse to fill the
questionnaire
➢ Interviewers may not be serious in selecting
sampling units as per sampling plan, etc.
Judgment Sampling (Purposive
Sampling)
This is a non-probability sampling method
in which the sampling units are selected
on the advice of some expert or by the
intuition /opinion of the researcher
himself.
There is a chance of personal bias.
If done carefully, it can lead to better results.
This is called purposive sampling because the
samples are identified selectively which
prevents the inclusion of other sampling units.
Quota Sampling
Quota sampling is a non-probability
sampling method in which the population
is classified into a number of groups
based on some criterion, say age of the
members of the population, viz., old age,
middle age, and young age.
In this sampling, the proportion of sampling units
selected from each group is the same as that
group's proportion in the population.
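One way to picture quota sampling is an interviewer who accepts respondents in the order they are met until each group's quota is full. The sketch below assumes hypothetical population shares (50% young, 30% middle age, 20% old age), a sample of 10, and an invented stream of respondents; no random selection is involved, which is what makes the method non-probability.

```python
def quota_sample(respondents, quotas):
    """respondents: iterable of (respondent_id, group) in the order encountered.
    quotas: group -> number of respondents required from that group."""
    remaining = dict(quotas)
    selected = []
    for respondent_id, group in respondents:
        if remaining.get(group, 0) > 0:      # accept only while the quota is open
            selected.append((respondent_id, group))
            remaining[group] -= 1
        if not any(remaining.values()):      # stop once every quota is filled
            break
    return selected

# Hypothetical population shares and a total sample size of 10
population_shares = {"young": 0.5, "middle": 0.3, "old": 0.2}
quotas = {group: round(10 * share) for group, share in population_shares.items()}

# Hypothetical stream of respondents in the order the interviewer meets them
pattern = ["young", "young", "middle", "old", "young", "middle"]
respondents = [(i, pattern[i % len(pattern)]) for i in range(60)]

selected = quota_sample(respondents, quotas)
counts = {g: sum(1 for _, grp in selected if grp == g) for g in quotas}
print(counts)  # {'young': 5, 'middle': 3, 'old': 2}
```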
Snowball Sampling
Snowball sampling is a restrictive multi-stage
sampling method in which a certain number
of sampling units are initially selected at random. Later,
additional sampling units are selected through a
referral process.
This means that the initially selected
respondents provide the addresses of additional
respondents to the interviewers.
Initial respondents may be selected randomly,
for example, from the information in the
telephone directories.
Population Parameter
The population mean (µ), standard deviation (σ)
, and proportion (p) are called the parameters of
a distribution.
• Variables in a population
• Measured characteristics of a population
• Greek lower-case letters as notation

Tests on parameters such as the mean, standard
deviation, and proportion are called parametric
tests.
Sample Statistics
• Variables in a sample
• Measures computed from data
• English letters for notation
Estimating the Standard Error
of the Mean

$$S_{\bar{x}} = \frac{S}{\sqrt{n}}$$

$$\mu = \bar{X} \pm Z_{cl} \frac{S}{\sqrt{n}}$$
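The standard error S / sqrt(n) and the interval X-bar ± Z × S / sqrt(n) can be computed directly from sample data. The sketch below uses an invented data set of 49 observations and the 95 percent value Z = 1.96; both are assumptions for illustration only.

```python
import math
import statistics

def mean_confidence_interval(data, z):
    """Return (lower, upper, standard_error) for the population mean."""
    n = len(data)
    x_bar = statistics.mean(data)      # sample mean
    s = statistics.stdev(data)         # sample standard deviation S (n - 1 divisor)
    se = s / math.sqrt(n)              # standard error of the mean
    return x_bar - z * se, x_bar + z * se, se

# Hypothetical sample: 49 observations taking the values 20..26, seven times each
data = [20 + (i % 7) for i in range(49)]
lower, upper, se = mean_confidence_interval(data, z=1.96)
print(round(se, 4), round(lower, 2), round(upper, 2))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error, which is the sense in which random sampling error and sample size are related.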
Random Sampling Error
and Sample Size are
Related
Sample Size
Sample size calculation uses:
1.Variance (standard deviation)
2.Magnitude of error
3.Confidence level
Sample Size Formula

$$n = \left(\frac{zs}{E}\right)^2$$
Sample Size Formula - Example
Suppose a survey researcher,
studying expenditures on lipstick,
wishes to have a 95 percent confidence
level (Z) and a range of error (E) of
less than $2.00. The estimate of the
standard deviation is $29.00.
Sample Size Formula - Example

$$n = \left(\frac{zs}{E}\right)^2 = \left(\frac{(1.96)(29.00)}{2.00}\right)^2 = \left(\frac{56.84}{2.00}\right)^2 = (28.42)^2 \approx 808$$
Sample Size Formula - Example

Suppose, in the same example as the
one before, a range of error (E) of
$4.00 is acceptable. The required sample
size is then reduced.
Sample Size Formula - Example

$$n = \left(\frac{zs}{E}\right)^2 = \left(\frac{(1.96)(29.00)}{4.00}\right)^2 = \left(\frac{56.84}{4.00}\right)^2 = (14.21)^2 \approx 202$$
Calculating Sample Size
99% Confidence

With E = 2:

$$n = \left(\frac{(2.57)(29)}{2}\right)^2 = \left(\frac{74.53}{2}\right)^2 = (37.265)^2 \approx 1389$$

With E = 4:

$$n = \left(\frac{(2.57)(29)}{4}\right)^2 = \left(\frac{74.53}{4}\right)^2 = (18.6325)^2 \approx 347$$
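The four worked figures for the lipstick example can be reproduced with a short function. The function name is an assumption; the Z values (1.96 and 2.57), the standard deviation of 29.00, and the error ranges are those used in the slides. The results are rounded to the nearest whole number to match the slides.

```python
def sample_size_for_mean(z, s, e):
    """n = (z * s / e) ** 2, returned unrounded."""
    return (z * s / e) ** 2

# (Z value, allowed error E) for the four cases worked in the slides
cases = [(1.96, 2.00), (1.96, 4.00), (2.57, 2.00), (2.57, 4.00)]
results = [round(sample_size_for_mean(z, 29.00, e)) for z, e in cases]
print(results)  # [808, 202, 1389, 347]
```

Doubling the allowed error cuts the required sample to about a quarter, while moving from 95 to 99 percent confidence raises it by roughly 70 percent. A common convention is to round up instead of to the nearest integer, which would give 348 in the last case.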
Standard Error of the Proportion

$$S_p = \sqrt{\frac{pq}{n}} \quad \text{or} \quad S_p = \sqrt{\frac{p(1-p)}{n}}$$
Confidence Interval for a
Proportion

$$p \pm Z_{cl} S_p$$
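As a rough illustration, the interval p ± Z × S_p can be computed as below. The inputs p = 0.6, n = 753 and Z = 1.96 are chosen to line up with the proportion example worked later in these slides; the function name is an assumption.

```python
import math

def proportion_confidence_interval(p, n, z):
    """Return (lower, upper, standard_error) for a population proportion."""
    q = 1 - p
    s_p = math.sqrt(p * q / n)     # standard error of the proportion
    return p - z * s_p, p + z * s_p, s_p

lower, upper, s_p = proportion_confidence_interval(p=0.6, n=753, z=1.96)
margin = 1.96 * s_p
print(round(margin, 3), round(lower, 3), round(upper, 3))
```

With these inputs the margin of error comes out at about 0.035, so the interval runs from roughly 0.565 to 0.635.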
Sample Size for a Proportion

$$n = \frac{Z^2 pq}{E^2}$$
Where:
n = Number of items in the sample
Z² = The square of the confidence level in standard error units
p = Estimated proportion of successes
q = (1 - p), or the estimated proportion of failures
E² = The square of the maximum allowance for error between the true proportion and the sample proportion, or Z·S_p squared
Calculating Sample Size
at the 95% Confidence Level

p = .6
q = .4

$$n = \frac{(1.96)^2(.6)(.4)}{(.035)^2} = \frac{(3.8416)(.24)}{.001225} = \frac{.922}{.001225} \approx 753$$
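The same calculation can be checked in code. The function name is an assumption; the inputs Z = 1.96, p = .6 and E = .035 are those of the example.

```python
def sample_size_for_proportion(z, p, e):
    """n = z^2 * p * q / e^2 with q = 1 - p, returned unrounded."""
    q = 1 - p
    return z ** 2 * p * q / e ** 2

n = sample_size_for_proportion(z=1.96, p=0.6, e=0.035)
print(round(n))  # 753
```

Since the product pq is largest at p = 0.5, using p = 0.5 gives the most conservative (largest) sample size when no prior estimate of the proportion is available.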
