PME Lec1. Sampling 13dec

This document discusses sampling methods for collecting data. [1] It defines key terms like population, sample, and sampling. [2] It describes two main types of sampling methods - probability sampling and non-probability sampling. [3] Probability sampling methods like simple random sampling, systematic sampling, stratified sampling, and cluster sampling allow estimating how representative results are of the overall population. Non-probability methods are cheaper but do not support generalization.

Uploaded by

Anonymous 3tpoms

CHAPTER 8

FUNDAMENTAL SAMPLING DISTRIBUTION AND DATA DESCRIPTION

• What is sampling? Why do we sample?
• Sampling methods
• Types of sampling (non-probability and probability sampling)
• Population, sample and sampling

Population: the totality of the observations with which we are concerned. A population parameter is the true value of a population attribute.

Sample: a subset of a population. A sample statistic is an estimate, based on sample data, of a population parameter.

Sampling: (statistics) the selection of a suitable sample for study.
Sampling
Sampling methods permit us to draw conclusions about a population based on a sample.

Sampling (i.e. selecting a subset of a whole population) is often done for reasons of cost (it's less expensive to sample 1,000 television viewers than 100 million TV viewers) and practicality (e.g. performing a crash test on every automobile produced is impractical).

In any case, the sampled population and the target population should be similar to one another.
Why sampling?
• Get information about large populations
• Lower cost
• Less field time
• More accuracy, i.e. a better job of data collection can be done
• When it is impossible to study the whole population
Target population:
The population to be studied, i.e. the population to which the investigator wants to generalize the results.

Sampling unit:
The smallest unit from which a sample can be selected.

Sampling frame:
The list of all the sampling units from which the sample is drawn.

Sampling scheme:
The method of selecting sampling units from the sampling frame.
2. Sampling Methods of Collecting Data
There are many methods used to collect or obtain data for analysis. Apart from a census, three of the most popular methods are:

• Direct observation
• Experiments
• Sample surveys
Data collection methods
Data collection methods pros and cons
iii. Survey:
Population Parameter vs. Sample Statistic
The reason for conducting a sample survey is to estimate the value of some attribute of a population.

A survey solicits information from people, e.g. Gallup polls, pre-election polls, marketing surveys.

Surveys may be administered in a variety of ways, e.g.
• Personal interview
• Telephone interview
• Self-administered questionnaire
• Internet

The response rate (i.e. the proportion of all people selected who complete the survey) is a key survey parameter.
Questionnaire Design
Over the years, a lot of thought has been put into the science of designing survey questions. Key design principles:
1. Keep the questionnaire as short as possible.
2. Ask short, simple, and clearly worded questions.
3. Start with demographic questions to help respondents get started comfortably.
4. Use dichotomous (yes/no) and multiple-choice questions.
5. Use open-ended questions cautiously.
6. Avoid leading questions.
7. Pretest the questionnaire on a small number of people.
8. Think about the way you intend to use the collected data when preparing the questionnaire.
3. Types of sampling
Probability samples

Non-Probability samples
Types of sampling
Probability samples. With probability sampling methods, each population element has a known (non-zero) chance of being chosen for the sample.

Non-probability samples. With non-probability sampling methods, we do not know the probability that each population element will be chosen, and/or we cannot be sure that each population element has a non-zero chance of being chosen.

Non-probability sampling methods offer two potential advantages: convenience and cost. The main disadvantage is that non-probability sampling methods do not allow you to estimate the extent to which sample statistics are likely to differ from population parameters. Only probability sampling methods permit that kind of analysis.
Non-probability samples
• Convenience samples (ease of access): the sample is selected from elements of a population that are easily accessible
• Snowball sampling (friend of a friend, etc.)
• Purposive sampling (judgemental): you choose who you think should be in the study
Non-probability samples
• Probability of being chosen is unknown
• Cheaper, but unable to generalise
• Potential for bias
Probability sampling
Probability sampling: simple random sampling, stratified random sampling, multi-stage sampling, systematic sampling, cluster sampling

• What is each and how is it done?
• How do we decide which to use?
• How do we analyze the results differently depending on the type of sampling?
Probability samples
Random sampling
• Each subject has a known probability of being selected

Allows application of statistical sampling theory to results to:
• Generalise
• Test hypotheses

Ensure:
• Representativeness
• Precision
Methods used in probability samples
1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Multi-stage sampling
5. Cluster sampling
1. Simple random sampling
Simple random sampling refers to any sampling method that has the following properties:
• The population consists of N objects.
• The sample consists of n objects.
• All possible samples of n objects are equally likely to occur.

There are many ways to obtain a simple random sample. One way would be the lottery method. Each of the N population members is assigned a unique number. The numbers are placed in a bowl and thoroughly mixed. Then, a blindfolded researcher selects n numbers. Population members having the selected numbers are included in the sample.
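The lottery method described above can be sketched in a few lines of Python. This is only an illustration: the population of 20 numbered members, the helper name `lottery_sample` and the seed are assumptions made for the example, not part of the lecture.

```python
import random

def lottery_sample(population, n, seed=None):
    """Draw a simple random sample of n members without replacement."""
    rng = random.Random(seed)  # seeded only so the draw can be repeated
    # random.sample gives every subset of size n the same chance of selection,
    # which is the defining property of simple random sampling.
    return rng.sample(list(population), n)

# Hypothetical population: N = 20 members, each assigned a unique number.
population = list(range(1, 21))
sample = lottery_sample(population, n=5, seed=1)
print(sample)
```

Drawing without replacement mirrors taking numbers out of the bowl: no member can be selected twice.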
Table of random numbers
684257954125632140
582032154785962024
362333254789120325
985263017424503686
Simple Random Sampling
A simple random sample is a sample selected in such a way that every possible sample of the same size is equally likely to be chosen.

Drawing three names from a hat containing all the names of the students in the class is an example of a simple random sample: any group of three names is as likely to be picked as any other group of three names.

VERY EASY TO DEFINE!
VERY, VERY DIFFICULT TO DO!
• Random sample of 100 Coke bottles today at the Coke plant.
• Random sample of 50 pine trees in a 1,000-acre forest.
• Random sample of 5 deer in a national forest.
Simple Random Sampling…
A government income tax auditor must choose a sample of 5 of 11 returns to audit… [Can be done in many different ways]

One way is to generate a random number for each person, sort the list by random number, and take the first 5.

Generated                Sorted
Person    Random #            Person    Random #
baker     0.87487        1    mark      0.08350
george    0.89068        2    ralph     0.11597
ralph     0.11597        3    joe       0.24662
mary      0.58635        4    sally     0.34346
sally     0.34346        5    aaron     0.37239
joe       0.24662             andrea    0.47609
andrea    0.47609             greg      0.53542
mark      0.08350             mary      0.58635
greg      0.53542             kim       0.73809
aaron     0.37239             baker     0.87487
kim       0.73809             george    0.89068
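The auditor's procedure (attach a random number to each return, sort, keep the first five) can be sketched as follows. The first part reuses the random numbers shown on the slide; the second part draws fresh numbers. The helper name `smallest_keys` and the seed are assumptions made for this illustration.

```python
import random

def smallest_keys(keys, n):
    """Return the n names whose random numbers are smallest."""
    return sorted(keys, key=keys.get)[:n]

# Random numbers as shown on the slide.
slide_keys = {
    "baker": 0.87487, "george": 0.89068, "ralph": 0.11597, "mary": 0.58635,
    "sally": 0.34346, "joe": 0.24662, "andrea": 0.47609, "mark": 0.08350,
    "greg": 0.53542, "aaron": 0.37239, "kim": 0.73809,
}
chosen_from_slide = smallest_keys(slide_keys, 5)
print(chosen_from_slide)  # ['mark', 'ralph', 'joe', 'sally', 'aaron']

# The same procedure with freshly generated random numbers.
rng = random.Random(2024)  # seed chosen arbitrarily for repeatability
fresh_keys = {name: rng.random() for name in slide_keys}
chosen_fresh = smallest_keys(fresh_keys, 5)
print(chosen_fresh)
```

Because the random numbers are assigned independently of the returns, every group of five returns is equally likely to end up at the top of the sorted list.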
2. Systematic sampling
Systematic random sampling. With systematic random sampling, we create a list of every member of the population. From the list, we randomly select the first sample element from the first k elements on the population list. Thereafter, we select every kth element on the list.

This method is different from simple random sampling since not every possible sample of n elements is equally likely.
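A minimal sketch of systematic sampling, assuming a hypothetical list of 100 numbered members and an interval of k = 10 (both chosen only for illustration):

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k elements, then every k-th element."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # index 0 .. k-1, i.e. one of the first k elements
    return frame[start::k]     # the start element and every k-th one after it

# Hypothetical population list of 100 members.
frame = list(range(1, 101))
sample = systematic_sample(frame, k=10, seed=3)
print(sample)
```

Only k different samples can ever be produced this way (one per starting position), which is why the method is not simple random sampling.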
3. Cluster sampling
Cluster: a group of sampling units close to each other, i.e. crowding together in the same area or neighborhood.

Cluster sampling. With cluster sampling, every member of the population is assigned to one, and only one, group. Each group is called a cluster. A sample of clusters is chosen, using a probability method (often simple random sampling). Only individuals within sampled clusters are surveyed.

Note the difference between cluster sampling and stratified sampling. With stratified sampling, the sample includes elements from each stratum. With cluster sampling, in contrast, the sample includes elements only from sampled clusters.
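As a rough sketch of single-stage cluster sampling, the snippet below assumes five hypothetical clusters of six members each, selects two clusters by simple random sampling, and surveys everyone in them. The cluster names, member labels and seed are made up for the example.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Choose m whole clusters at random and return all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # simple random sample of clusters
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members

# Hypothetical population: every member belongs to exactly one cluster.
clusters = {
    f"Section {i}": [f"S{i}-{j}" for j in range(1, 7)]
    for i in range(1, 6)
}
chosen, members = cluster_sample(clusters, m=2, seed=4)
print(chosen)
print(members)
```

Members of the three unselected clusters never appear in the sample, which is the contrast with stratified sampling drawn in the text.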
Cluster sampling
[Figure: clusters labelled Section 1 to Section 5]
4. Stratified sampling
Stratified sampling. With stratified sampling, the population is divided into groups, based on some characteristic. Then, within each group, a probability sample (often a simple random sample) is selected. In stratified sampling, the groups are called strata.

As an example, suppose we conduct a national survey. We might divide the population into groups or strata, based on geography: north, east, south, and west. Then, within each stratum, we might randomly select survey respondents.
4. Stratified sampling
The population is divided into subpopulations (strata) and random samples are taken from each stratum.
Disadvantages
It is not useful when the population has no homogeneous subgroups. Sampling in proportion to subgroup size is also unsuitable when the subgroups differ in size but are of equal importance, because it gives more weight to the subgroups with more data.
Example
In general, the size of the sample in each stratum is taken in proportion to the size of the stratum. This is called proportional allocation.
Suppose that in a company there are the following staff:

• male, full time: 90
• male, part time: 18
• female, full time: 9
• female, part time: 63
• Total: 180

and we are asked to take a sample of 40 staff, stratified according to the above categories.
The first step is to find the total number of staff (180) and calculate the percentage in each group.

• % male, full time = 90 / 180 = 50%
• % male, part time = 18 / 180 = 10%
• % female, full time = 9 / 180 = 5%
• % female, part time = 63 / 180 = 35%

This tells us that of our sample of 40,

• 50% (20 staff) should be male, full time.
• 10% (4 staff) should be male, part time.
• 5% (2 staff) should be female, full time.
• 35% (14 staff) should be female, part time.
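The proportional allocation in this example can be checked with a short script. The stratum sizes come from the example; the staff labels, the helper name `proportional_allocation` and the seed are assumptions added for illustration.

```python
import random

strata_sizes = {
    "male, full time": 90,
    "male, part time": 18,
    "female, full time": 9,
    "female, part time": 63,
}

def proportional_allocation(sizes, n):
    """Allocate a total sample of n to strata in proportion to stratum size."""
    total = sum(sizes.values())
    # In general, rounding can leave the counts summing to slightly more or
    # less than n; for these numbers every share is a whole number.
    return {name: round(n * size / total) for name, size in sizes.items()}

allocation = proportional_allocation(strata_sizes, n=40)
print(allocation)

# Draw a simple random sample of the allocated size within each stratum.
rng = random.Random(7)
staff = {  # hypothetical staff labels
    name: [f"{name} #{i}" for i in range(1, size + 1)]
    for name, size in strata_sizes.items()
}
sample = {name: rng.sample(staff[name], allocation[name]) for name in staff}
```

For the figures above this gives 20, 4, 2 and 14 staff, which add up to the required 40.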
5. Multi-stage sampling
Multistage sampling is a complex form of cluster sampling. Cluster sampling is a type of sampling which involves dividing the population into groups (or clusters). Then, one or more clusters are chosen at random and everyone within the chosen cluster is sampled.

Sometimes the population is too large and scattered for it to be practical to make a list of the entire population from which to draw a simple random sample (SRS).
For instance, when a polling organization samples US voters, it does not take an SRS. Since voter lists are compiled by counties, it might first take a sample of the counties and then sample within the selected counties.
This illustrates two stages. In some instances, it might use even more stages.
At each stage, it might take a stratified random sample on sex, race, income level, or any other useful variable on which it could get information before sampling.
Advantages
• Cost and speed with which the survey can be done
• Convenience of finding the survey sample
• Normally more accurate than cluster sampling for the same sample size
Disadvantages
• Not as accurate as SRS for the same sample size
• Further testing is difficult to do
5. Multistage sampling
Multistage sampling. With multistage sampling, we select a sample by using combinations of different sampling methods.

For example, in Stage 1, we might use cluster sampling to choose clusters from a population. Then, in Stage 2, we might use simple random sampling to select a subset of elements from each chosen cluster for the final sample.
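The two-stage example above might look like this in code. The six counties of 50 voters each, the helper name `two_stage_sample` and the seed are all hypothetical values picked for the sketch.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: simple random sample within each."""
    rng = random.Random(seed)
    stage1 = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in stage1}

# Hypothetical frame: voter lists compiled by county.
counties = {
    f"county_{c}": [f"voter_{c}_{i}" for i in range(50)]
    for c in "ABCDEF"
}
result = two_stage_sample(counties, n_clusters=3, n_per_cluster=10, seed=8)
for county, voters in result.items():
    print(county, voters[:3], "...")
```

Only the selected counties need a complete voter list, which is the practical advantage over drawing a simple random sample from the whole population.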
6. Cluster Sampling
Cluster sampling is a sampling technique used when "natural" groupings are evident in a statistical population. It is often used in marketing research. In this technique, the total population is divided into these groups (or clusters) and a sample of the groups is selected. Then the required information is collected from the elements within each selected group. This may be done for every element in these groups, or a subsample of elements may be selected within each of these groups.
A common motivation for cluster sampling is to reduce the average cost per interview. Given a fixed budget, this can allow an increased sample size. Assuming a fixed sample size, the technique gives more accurate results when most of the variation in the population is within the groups, not between them.
Elements within a cluster should ideally be as heterogeneous as possible, but there should be homogeneity between cluster means. Each cluster should be a small-scale representation of the total population. The clusters should be mutually exclusive and collectively exhaustive. A random sampling technique is then used on any relevant clusters to choose which clusters to include in the study. In single-stage cluster sampling, all the elements from each of the selected clusters are used. In two-stage cluster sampling, a random sampling technique is applied to the elements from each of the selected clusters.
How does one decide which type of sampling to use?
• The formulas in almost all books assume simple random sampling. Unless you are willing to learn the more complex techniques to analyze the data after it is collected, it is appropriate to use simple random sampling. To learn the appropriate formulas for the more complex sampling schemes, look for a book or course on sampling.
• Stratified random sampling generally gives more precise information than simple random sampling for a given sample size. So, if information on all members of the population is available that divides them into strata that seem relevant, stratified sampling will usually be used.
• If the population is large and enough resources are available, usually one will use multi-stage sampling. In such situations, usually stratified sampling will be done at some stages.
PROBLEM
An auto analyst is conducting a satisfaction survey, sampling from a list of 10,000 new car buyers. The list includes 2,500 Ford buyers, 2,500 GM buyers, 2,500 Honda buyers, and 2,500 Toyota buyers. The analyst selects a sample of 400 car buyers, by randomly sampling 100 buyers of each brand.

Is this an example of a simple random sample?

(A) Yes, because each buyer in the sample was randomly sampled.
(B) Yes, because each buyer in the sample had an equal chance of being sampled.
(C) Yes, because car buyers of every brand were equally represented in the sample.
(D) No, because every possible 400-buyer sample did not have an equal chance of being chosen.
(E) No, because the population consisted of purchasers of four different brands of car.
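Solution
The correct answer is (D). A simple random sample requires that every possible sample of 400 buyers be equally likely. Under this design a sample made up of, say, 400 Ford buyers could never be selected, so this is stratified sampling, not simple random sampling.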
Sampling and Non-Sampling Error
Two major types of error can arise when a sample of observations is taken from a population: sampling error and nonsampling error.

Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample. It is random, and we have no control over it.

Nonsampling errors are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly. They are most likely caused by poor planning, sloppy work, an act of the Goddess of Statistics, etc.
Sampling Error
Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample.

Increasing the sample size will reduce this type of error.
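A small simulation can illustrate how sampling error shrinks as the sample grows. Everything here is assumed for the sketch: a synthetic population of 10,000 values, sample sizes of 25 and 400, and 500 repeated draws at each size.

```python
import random
import statistics

def mean_abs_error(population, n, trials, rng):
    """Average distance between the sample mean and the population mean."""
    mu = statistics.fmean(population)
    errors = [
        abs(statistics.fmean(rng.sample(population, n)) - mu)
        for _ in range(trials)
    ]
    return statistics.fmean(errors)

rng = random.Random(5)
# Synthetic population, made up for the illustration.
population = [rng.gauss(50, 10) for _ in range(10_000)]

small = mean_abs_error(population, n=25, trials=500, rng=rng)
large = mean_abs_error(population, n=400, trials=500, rng=rng)
print(f"n=25:  typical error {small:.2f}")
print(f"n=400: typical error {large:.2f}")
```

Each sample is a simple random sample, so the only source of error is which observations happened to be drawn, and that error is smaller on average for the larger sample.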
Nonsampling Error…
Nonsampling errors are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly.
Three types of nonsampling errors:
• Errors in data acquisition,
• Nonresponse errors, and
• Selection bias.

Note: increasing the sample size will not reduce this type of error.
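To see why a larger sample does not help here, the sketch below simulates one kind of nonsampling error, selection bias. It assumes a synthetic population and a deliberately flawed sampling frame from which members with values below 45 can never be selected; the cut-off, sizes and seed are invented for the example.

```python
import random
import statistics

rng = random.Random(6)
# Synthetic population, made up for the illustration.
population = [rng.gauss(50, 10) for _ in range(20_000)]
mu = statistics.fmean(population)

# Hypothetical flawed frame: part of the population cannot be selected.
frame = [x for x in population if x >= 45]

def estimate_error(n):
    """Sample mean from the flawed frame minus the true population mean."""
    return statistics.fmean(rng.sample(frame, n)) - mu

err_small = estimate_error(100)
err_large = estimate_error(10_000)
print(f"n=100:    error {err_small:.2f}")
print(f"n=10000:  error {err_large:.2f}")
```

The estimate stays well above the true mean at both sample sizes, because the excluded members remain excluded however many observations are drawn.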
Nonsampling error: errors in data acquisition
…arise from the recording of incorrect responses, due to:
• incorrect measurements being taken because of faulty equipment,
• mistakes made during transcription from primary sources,
• inaccurate recording of data due to misinterpretation of terms, or
• inaccurate responses to questions concerning sensitive issues.
Nonresponse Error…
…refers to error (or bias) introduced when responses are not obtained from some members of the sample, i.e. the sample observations that are collected may not be representative of the target population.

As mentioned earlier, the response rate (i.e. the proportion of all people selected who complete the survey) is a key survey parameter and helps in understanding the validity of the survey and the sources of nonresponse error.
Selection Bias…
…occurs when the sampling plan is such that some
members of the target population cannot possibly be
selected for inclusion in the sample.
Errors in sample
• Systematic error (or bias)
  • Inaccurate response (information bias)
  • Selection bias
• Sampling error (random error)
