Chapter 7 BRM


Chapter 7

Data management plan – Sampling & measurement

Sampling concepts: Sample vs Census; Sampling vs Non-Sampling error
Sampling Design: Probability and Non-Probability Sampling design

Determination of Sample size: Sample size for estimating the population mean; Sample size for estimating the population proportion

Data Editing: Field Editing, Centralized in-house editing; Coding
Sampling
● Sampling is the process of selecting units (e.g.,
people, organizations) from a population of
interest so that by studying the sample we may
fairly generalize our results back to the
population from which they were chosen.
Sampling
● Sampling units are non overlapping collections of elements from the
population that cover the entire population.
● Sampling is a general technique for estimating properties of a
particular distribution
Process
● 1. Identify the population of interest. A population is the group of people that you want to draw conclusions about.
● 2. Specify a sampling frame. A sampling frame is the list of people from which you will draw your sample.
● 3. Specify a sampling method. There are basically two ways to choose a sample from a sampling frame:
● randomly or
● non-randomly.
● 4. Determine the sample size. In general, larger samples are better, but they also require more time and effort to manage.
● 5. Implement the sampling plan.

Terms in sampling
● If you measure the entire population and calculate a value such as a mean, we do not call it a statistic; we call it a parameter of the population. A statistic is the corresponding value calculated from a sample.
● A response is a specific measurement value that a sampling unit supplies.
● The distribution of a statistic over an infinite number of samples of the same size as the sample in your study is known as the sampling distribution.
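The idea of a sampling distribution can be illustrated with a small simulation. The sketch below is only an approximation under stated assumptions: the population is hypothetical (10,000 simulated values), and 2,000 repeated samples stand in for the "infinite number of samples" in the definition.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 values from a skewed distribution.
population = [random.expovariate(1 / 50) for _ in range(10_000)]
population_mean = statistics.fmean(population)  # a parameter


def sample_mean(n):
    """Draw one simple random sample of size n and return its mean (a statistic)."""
    return statistics.fmean(random.sample(population, n))


# Approximate the sampling distribution of the mean
# with 2,000 samples of size 100.
sample_means = [sample_mean(100) for _ in range(2_000)]

print(f"population mean (parameter): {population_mean:.2f}")
print(f"mean of the sample means:    {statistics.fmean(sample_means):.2f}")
print(f"spread of the sample means:  {statistics.stdev(sample_means):.2f}")
```

The sample means cluster around the population mean, and their spread is much smaller than the spread of the individual values in the population.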
Sampling Frame
● Basically, a sampling frame is a complete list of all the members of the
population that we wish to study.
● To give an example, if we wish to study the underlying factors that cause
patients to be admitted into hospital following an acute asthmatic attack in a
given area (your population), then you would need to know the names of all
the people in that area who have been admitted into hospital for this reason.
● From a list of these names, you can then randomly select an appropriate
number as representatives of the population (your sample) whom you can
invite to take part in the research.
● If we do not have such a sampling frame, then we are restricted to less
satisfactory forms of samples which cannot be randomly selected because
not all individuals within that population will have the same probability of
being selected for the sample.
● Thus we would find that our sample is a non-probability sample.
Sample vs census
Census and sampling are two methods of collecting survey data about a population, and both are used by many countries.

A census is a quantitative research method in which all the members of the population are enumerated.

Sampling, on the other hand, is the more widely used method in statistical testing: a subset of data is selected from the larger population to represent the entire group.
Determining Sample Size:
How to Ensure You Get the Correct Sample Size

1) Population Size: the total number of people in the entire population.
2) Margin of Error (Confidence Interval): No sample will be perfect, so you need to decide how much error to allow. The margin of error determines how much higher or lower than the population mean you are willing to let your sample mean fall, e.g. a margin of error of +/- 5%.
3) Confidence Level: How confident do you want to be that the actual mean falls within your confidence interval? The most common confidence levels are 90%, 95%, and 99%.
4) Standard Deviation: How much variance do you expect in your responses?
● Since we haven’t actually administered our survey yet, the safe decision is to use 0.5; this is the most forgiving number and ensures that your sample will be large enough.
Statistical Error: Sampling & Non-sampling
● Error (statistical error) describes the difference between a
value obtained from a data collection process and the 'true'
value for the population.
● The greater the error, the less representative the data are of the
population.

Data can be affected by two types of error:

1. Sampling error
● The sampling error is the deviation between an estimate from an ideal sample and the true population value.
● Almost always, the sampling frame does not match up perfectly with the target population, leading to errors of coverage.
Non sampling error
● 2. Non-sampling error is a catch-all term for the deviations of estimates from their true values that are not a function of the sample chosen, including various systematic errors and random errors that are not due to sampling.

Non-sampling errors can be further divided into:
a) coverage errors: errors in the statistical estimates of a survey that result from gaps between the sampling frame and the total population.
b) measurement errors (respondent, interviewer, questionnaire, collection method…)
c) non-response errors and processing errors.
Non-response
● Non response is probably the most serious of
these errors.
Arises in three ways:
● Inability of the person responding to come up
with the answer
● Refusal to answer
● Inability to contact the sampled elements
Response Rates
● About 20 – 30% usually return a questionnaire
● Follow up techniques could bring it up to about
50%
● Still, response rates under 60 – 70% challenge
the integrity of the random sample
● How the survey is distributed can affect the
quality of sampling
Response and Non-response error
Response errors arise when people take the survey but the resulting answers are incorrect.

1) Measurement error results from the survey research instrument itself.
2) Recording and analysis errors arise when surveyors enter incorrect data into the survey database.
3) Respondent error occurs when respondents provide misleading information.

Non-response error: the required information is not obtained from the persons selected in the sample.

The consequences of non-response

One effect of non-response is that it reduces the sample size. This by itself does not lead to wrong conclusions.
Due to the smaller sample size, the precision of estimators will be lower and the margins of error will be larger.
Remedy: send out more questionnaires, or leave the lost sample units out of the analysis.
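The loss of precision from a smaller achieved sample can be shown with a short calculation. This is a sketch under assumptions that are not in the slides: 1,000 questionnaires sent, a 30% response rate, a proportion of 0.5, and the usual normal-approximation margin of error at 95% confidence (z = 1.96).

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


planned = 1000                    # hypothetical number of questionnaires sent
achieved = round(planned * 0.30)  # hypothetical 30% response rate

print(f"planned  n = {planned}: +/- {margin_of_error(planned):.3f}")
print(f"achieved n = {achieved}: +/- {margin_of_error(achieved):.3f}")
```

The margin of error grows from roughly 3 to roughly 6 percentage points. Note that this only captures the loss of precision; it says nothing about bias if non-respondents differ from respondents.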
Probability
● The population of interest is usually too large to
attempt to survey all of its members.
● A carefully chosen sample can be used to
represent the population.
● The sample reflects the characteristics of the
population from which it is drawn.
● Probability: a measure of the likelihood that an event will occur (here, which units are chosen).
Probability and Nonprobability samples
● a. Probability Samples: each member of the population has a known non-zero probability of being selected
● Methods include random sampling, systematic sampling, cluster and stratified sampling.
● b. Nonprobability Samples: members are selected from the population in some nonrandom manner
● Methods include convenience sampling, judgment sampling, quota sampling, and snowball sampling
1. Random Sampling
● Random sampling is the purest form of probability
sampling.
● Each member of the population has an equal and known
chance of being selected.
● When the population is very large, it is often difficult to identify every member of the population.
● It is a type of probability sampling.
● A simple random sample is meant to be an unbiased
representation of a group.
● An example of a simple random sample would be a group of 25
employees chosen out of a hat from a company of 250
employees.
● In this case, the population is all 250 employees, and the
sample is random because each employee has an equal
chance of being chosen
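The "names out of a hat" example can be sketched in a few lines. The employee IDs below are hypothetical stand-ins for a real sampling frame; `random.sample` draws without replacement, so each employee has the same chance of selection.

```python
import random

random.seed(7)  # fixed seed only so the example is repeatable

# Hypothetical sampling frame: employee IDs 1..250.
employees = list(range(1, 251))

# Draw 25 employees without replacement; every employee is equally likely.
sample = random.sample(employees, k=25)

print(sorted(sample))
print(f"selection probability per employee: {25 / len(employees):.2f}")  # 0.10
```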
2. Stratified Sampling
● Stratified sampling is a commonly used probability method that can be superior to simple random sampling because it reduces sampling error.
● A stratum is a subset of the population that shares at least one common characteristic, such as males and females.
● Identify the relevant strata and their actual representation in the population.
● Random sampling is then used to select a sufficient number of subjects from each stratum.
● Stratified sampling is often used when one or more of the strata in the population have a low incidence relative to the other strata.
● In short, the units are split into groups and then a random sample is picked from each group.
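A minimal sketch of stratified sampling with proportional allocation follows. The frame (150 units in stratum "F", 100 in stratum "M") and the helper name `stratified_sample` are hypothetical, chosen only to illustrate the steps described above.

```python
import random
from collections import defaultdict

random.seed(1)  # fixed seed so the sketch is repeatable


def stratified_sample(frame, fraction):
    """Draw the same sampling fraction at random from every stratum."""
    strata = defaultdict(list)
    for unit_id, stratum in frame:
        strata[stratum].append(unit_id)
    return {
        stratum: random.sample(units, k=round(len(units) * fraction))
        for stratum, units in strata.items()
    }


# Hypothetical frame of (unit_id, stratum) pairs.
frame = [(i, "F") for i in range(150)] + [(i, "M") for i in range(150, 250)]

sample = stratified_sample(frame, fraction=0.10)
for stratum, units in sample.items():
    print(stratum, len(units))
```

Because each stratum is sampled separately, both groups are guaranteed to appear in the sample in proportion to their share of the frame.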
3. Systematic sampling
● Systematic sampling is a type of probability sampling method in which sample members from a larger population are selected according to a random starting point and a fixed, periodic interval.
● Only the first unit is selected at random; the remaining units are picked in sequence at equal intervals.
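Systematic selection can be sketched as follows. The frame of 250 numbered units and the function name `systematic_sample` are hypothetical; the sketch assumes the frame size is a multiple of the sample size so that the interval is a whole number.

```python
import random


def systematic_sample(frame, n):
    """Pick a random start within the first interval, then every k-th unit."""
    k = len(frame) // n          # fixed sampling interval
    start = random.randrange(k)  # random starting point in [0, k)
    return frame[start::k][:n]


random.seed(3)  # fixed seed so the sketch is repeatable
frame = list(range(1, 251))  # hypothetical list of 250 units
sample = systematic_sample(frame, n=25)
print(sample)
```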
4. Cluster Sampling
● Cluster Sample: a probability sample in which each
sampling unit is a collection of elements.
● Clusters already exist, whereas strata are created by the researcher, e.g. existing groups in a class are clusters, while strata could be formed on the basis of marks.
● Effective under the following conditions:
● A good sampling frame is not available or costly, while a
frame listing clusters is easily obtained
● The cost of obtaining observations increases as the
distance separating the elements increases
● Examples of clusters:
● City blocks – political or geographical
● Housing units – college students
● Hospitals – illnesses
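A one-stage cluster sample might be sketched as below. The 20 city blocks with 10 households each are hypothetical; the point is that only a list of clusters is needed to make the random selection.

```python
import random

random.seed(5)  # fixed seed so the sketch is repeatable

# Hypothetical frame of clusters: 20 city blocks, each listing its households.
blocks = {
    f"block_{b}": [f"block_{b}_house_{h}" for h in range(1, 11)]
    for b in range(1, 21)
}

# Step 1: select whole clusters at random.
chosen_blocks = random.sample(sorted(blocks), k=4)

# Step 2: observe every element inside the chosen clusters.
sample = [house for block in chosen_blocks for house in blocks[block]]

print(chosen_blocks)
print(len(sample))  # 4 blocks x 10 households = 40
```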
5. Convenience Sampling
● Convenience sampling is used in exploratory
research where the researcher is interested in
getting an inexpensive approximation.
● The sample is selected because they are
convenient.
● It is a nonprobability method.
● Often used during preliminary research efforts
to get an estimate without incurring the cost or
time required to select a random sample
6. Judgment Sampling
● Judgment sampling is a common nonprobability
method.
● The sample is selected based upon judgment.
● It is an extension of convenience sampling.
● When using this method, the researcher must
be confident that the chosen sample is truly
representative of the entire population.
7. Quota Sampling
● Quota sampling is the nonprobability equivalent of stratified sampling.
● First identify the strata and their proportions as they are represented in the population.
● Then convenience or judgment sampling is used to select the required number of subjects from each stratum.
8. Snowball Sampling
● Snowball sampling is a special nonprobability
method used when the desired sample
characteristic is rare.
● It may be extremely difficult or cost prohibitive
to locate respondents in these situations.
● This technique relies on referrals from initial
subjects to generate additional subjects.
● It lowers search costs; however, it introduces
bias because the technique itself reduces the
likelihood that the sample will represent a good
cross section from the population.
Sample design
● Sample design is the framework, or road map, that serves
as the basis for the selection of a survey sample and
affects many other important aspects of a survey as well.
● The "best" sample design depends on survey objectives
and on survey resources.
● For example, a researcher might select the most
economical design that provides a desired level of
precision.
● Or, if the budget is limited, a researcher might choose the
design that provides the greatest precision without going
over budget.
Sample design
● Survey researchers are interested in obtaining some type of
information through a survey for some population, or universe,
of interest.
● One must define a sampling frame that represents the
population of interest, from which a sample is to be drawn.
● The sampling frame may be identical to the population, or it
may be only part of it and is therefore subject to some
undercoverage, or it may have an indirect relationship to the
population (e. g. the population is preschool children and the
frame is a listing of preschools).
● The sample design provides the basic plan and methodology for
selecting the sample.
Sample design
Steps in Sample design
● 1. Defining the universe or population of interest is the
first step in any sample design.
● The accuracy of the results in any study depends on how
clearly the universe or population of interest is defined.
● The universe can be finite or infinite, depending on the
number of items it contains.

2. Defining the sampling unit within the population of
interest is the second step in the sample design
process.
● The sampling unit can be anything that exists within the
population of interest.
● For example, sampling unit may be a geographical unit, or
a construction unit or it may be an individual unit.
Steps in Sample design
● 3. Preparing the list of all the items within the population of interest is the next step in the sample design process.
● It is from this list, also called the source list or sampling frame, that we draw our sample.
● It is important to note that our sampling frame should be highly representative of the population of interest.
● It addresses the question “Ideally, who do you want to survey?”, i.e. those who have the information sought. What are their characteristics? Who should be excluded?
● Characteristics: age, gender, product use, those in industry
● Geographic area
● It involves defining population units,
● setting population boundaries, and
● screening (e.g. security questions, product use).
Steps in Sample design
● 4. Determination of sample size: the sample size should be neither excessively large nor too small.
● The sample size should be optimum: representative of the population and large enough to give reliable results.
● Population variance, population size, parameters of interest,
and budgetary constraints are some of the factors that impact
the sample size.
Steps in Sample design
● 5. Deciding on the sampling technique is the next step in sample design.
● There are many sampling techniques, from which the researcher has to choose the one that gives the lowest sampling error, given the sample size and budgetary constraints.
● E.g. cluster sampling, stratified sampling, etc.
Sample design for managerial research
● In business research, companies must often generate samples of
customers, clients, employees, and so forth to gather their opinions.
● Sample design is also a critical component of marketing research
and employee research for many organizations.
● During sample design, firms must answer questions such as: - What
is the relevant population, sampling frame, and sampling unit?
● - What is the appropriate margin of error that should be achieved?
● - How should sampling error and non-sampling error be assessed
and balanced?
Survey sampling
● Survey sampling describes the process of selecting a
sample of elements from a target population to conduct a
survey.
● Survey sampling most often involves a questionnaire used to measure the characteristics and/or attitudes of people.
● The different ways of contacting members of a sample once they have been selected are the subject of survey data collection.
● The purpose of sampling is to reduce the cost and/or the
amount of work that it would take to survey the entire target
population.
● A survey that measures the entire target population is called a
census.
Determining Sample Size-Formula
● Knowing the target population, you have to decide the number
of the participants in a sample, which is termed as the “sample
size”.
● Aside from the estimated number of people in the target
population, the sample size can be influenced by other factors
such as budget, time available, and the target degree of
precision.

● The sample size can be calculated using the formula:

$$n = \frac{t^2 \times p(1-p)}{m^2}$$

Where:
n = required sample size
t = confidence level at 95% (standard value of 1.96)
p = estimated prevalence of the variable of interest (e.g. 20% or 0.2 of the population are smokers)
m = margin of error at 5% (standard value of 0.05)
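The formula can be checked with the standard values quoted above. This is a small sketch; the function name `sample_size` is hypothetical, and rounding up to the next whole respondent is a common convention rather than something stated in the slides.

```python
import math


def sample_size(p, t=1.96, m=0.05):
    """n = t^2 * p * (1 - p) / m^2, rounded up to a whole respondent."""
    return math.ceil(t**2 * p * (1 - p) / m**2)


print(sample_size(p=0.2))  # smokers example, prevalence 20%: 246
print(sample_size(p=0.5))  # most conservative choice of p: 385
```

Setting p = 0.5 maximises p(1 - p), which is why it gives the largest (safest) sample size when no prior estimate of the prevalence is available.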
Confidence Limits and Level
● Confidence level: the probability that the value of a parameter falls within a specified range of values. Confidence intervals can be shown at several confidence levels, for example 90%, 95% and 99%.
● Confidence limits: when sampling from a population to estimate a mean, a confidence interval is a range of values within which you are n% confident the true mean is included, where n is some stated percentage called the confidence level. The confidence limits are the two end points of that interval.
Types of Survey Questions
● Before constructing questions, you must be knowledgeable
about each type of question used in survey research. These
basically include:
● 1. Closed-Ended Questions
● Closed-ended questions limit the answers of the respondents to
response options provided on the questionnaire.
● Advantages: time-efficient; responses are easy to code and
interpret; ideal for quantitative type of research
● Disadvantages: respondents may be forced to choose a response that does not exactly reflect their answer; the researcher cannot further explore the meaning of the responses
Types of Survey Questions
● Some examples of closed-ended questions are:
● Dichotomous or two-point questions (e.g. Yes or No, Unsatisfied or Satisfied)
● Multiple choice questions (e.g. A, B, C or D)
● Scaled questions that use rating scales such as the Likert scale (typically a five-point scale), three-point scales, semantic differential scales, and seven-point scales
Types of Survey Questions
● 2. Open-Ended Questions
● In open-ended questions, there are no predefined options or
categories included.
● The participants should supply their own answers.
● Advantages: participants can respond to the questions exactly
as how they would like to answer them; the researcher can
investigate the meaning of the responses; ideal for qualitative
type of research
● Disadvantages: time-consuming; responses are difficult to code
and interpret
Types of Survey Questions
● Some examples of open-ended questions include:
● Completely unstructured questions- openly ask the opinion or
view of the respondent
● Word association questions - the participant states the first word that comes to mind as each of a series of words is presented
● Thematic Apperception Test – a picture is presented to the respondent, who explains it from their own point of view
● Sentence, story or picture completion – the respondent
continues an incomplete sentence or story, or writes on empty
conversation balloons in a picture
Types of Survey Questions

3. Matrix Questions
● Matrix questions are also closed-ended questions but are arranged
one under the other, such that the questions form a matrix or a table
with identical response options placed on top. For example:
● Please rate the following characteristics of the product based on your satisfaction (use a check mark):

| Characteristic     | Strongly Satisfied | Satisfied | Neutral | Unsatisfied | Strongly Unsatisfied |
| Size               |                    |           |         |             |                      |
| Color              |                    |           |         |             |                      |
| Shape              |                    |           |         |             |                      |
| Overall Appearance |                    |           |         |             |                      |
Types of Survey Questions

4. Contingency Questions
● Questions that need to be answered only when the respondent
provides a particular response to a question prior to them are called
contingency questions. Asking these questions effectively avoids
asking people questions that are not applicable to them. For example:
● Have you ever smoked a cigarette?
● ___Yes ___ No
● If YES, how many times have you smoked a cigarette?
● __ Once
● __2-5 times
● __ 6-10 times
● __more than 10 times
Data analysis and Interpretation
● Interpretation refers to the task of drawing inferences from
the collected facts after an analytical and/or experimental
study.
● In fact, it is a search for broader meaning of research
findings.
● The task of interpretation has two major aspects viz.-
● the effort to establish continuity in research through linking
the results of a given study with those of another, and
● the establishment of some explanatory concepts.
Data analysis and Interpretation
● The researcher must give reasonable explanations of the relations found, interpret the lines of relationship in terms of the underlying processes, and try to find the thread of uniformity that lies under the surface layer of the diversified research findings.
● Extraneous information, if collected during the study, must be considered while interpreting the final results of the research study, for it may prove to be a key factor in understanding the problem under consideration.
● The researcher should consult someone who can point out omissions and errors in logical argumentation. Such a consultation will result in correct interpretation and, thus, will enhance the utility of research results.
● The researcher must accomplish the task of interpretation only after considering all relevant factors affecting the problem, to avoid false generalization.
Data Editing- Field Editing
Data editing is the activity aimed at detecting and correcting errors (logical inconsistencies) in data.

Context: Editing techniques refer to a range of procedures and processes used for detecting and handling errors in data.

Centralized in house editing

Coding
Calculate Mean
● For individual observations, $\bar{x} = \frac{\sum x}{n}$. E.g.
● X = {3,5,7,7,8,8,8,9,9,10,10,12}
● $\sum x = 96$; n = 12
● Thus, $\bar{x} = 96/12 = 8$
The above observations can be organised into a frequency table and the mean calculated on the basis of frequencies:

x    3   5   7   8   9   10  12
f    1   1   2   3   2   2   1
fx   3   5   14  24  18  20  12

$\sum fx = 96$; $\sum f = 12$
Thus, $\bar{x} = \frac{\sum fx}{\sum f} = 96/12 = 8$
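The frequency-table calculation above can be reproduced in a few lines; the variable names are illustrative only.

```python
x = [3, 5, 7, 8, 9, 10, 12]  # distinct observed values
f = [1, 1, 2, 3, 2, 2, 1]    # how often each value occurs

sum_fx = sum(xi * fi for xi, fi in zip(x, f))  # sum of f * x
sum_f = sum(f)                                 # total number of observations
mean = sum_fx / sum_f

print(sum_fx, sum_f, mean)  # 96 12 8.0
```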
Central Tendency – “Mean of Grouped Data”
● House rentals or prices in the PMR are frequently tabulated as a range of values. E.g.
● What is the mean rental across the areas?

Rental (RM/month)     135-140  140-145  145-150  150-155  155-160
Mid-point value (x)   137.5    142.5    147.5    152.5    157.5
Frequency (f)         5        9        6        2        1
fx                    687.5    1282.5   885.0    305.0    157.5

● $\sum f = 23$; $\sum fx = 3317.5$
● Thus, $\bar{x} = 3317.5/23 = 144.24$
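For grouped data the same calculation uses class mid-points in place of individual values. A short sketch with the rental classes from the table (variable names are illustrative):

```python
classes = [(135, 140), (140, 145), (145, 150), (150, 155), (155, 160)]
freq = [5, 9, 6, 2, 1]

# Each class is represented by its mid-point.
midpoints = [(low + high) / 2 for low, high in classes]

sum_fx = sum(m * f for m, f in zip(midpoints, freq))
sum_f = sum(freq)
mean = sum_fx / sum_f

print(sum_f, sum_fx, round(mean, 2))  # 23 3317.5 144.24
```

The result is an approximation of the true mean, since every rental in a class is treated as if it equalled the class mid-point.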
Sampling- pg 173
● A market research survey in which 64 consumers were contacted states that 64% of all consumers of a certain product were motivated by the product's advertising. Find the confidence limits for the proportion of consumers motivated by advertising in the population, given a confidence level equal to 0.95.
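One way to work this problem is with the normal approximation for a proportion, $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$, using z = 1.96 for a 95% confidence level. The sketch below assumes that approach (the slides do not show a worked solution), and the function name is hypothetical.

```python
import math


def proportion_confidence_limits(p_hat, n, z=1.96):
    """Normal-approximation confidence limits for a population proportion."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error


lower, upper = proportion_confidence_limits(p_hat=0.64, n=64)
print(f"{lower:.4f} to {upper:.4f}")  # 0.5224 to 0.7576
```

Here the standard error is sqrt(0.64 x 0.36 / 64) = 0.06, so the limits are 0.64 ± 1.96 x 0.06, i.e. roughly 52% to 76%.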
Sampling 180

● The management of ITC hotels is interested in determining the percentage of the hotel's guests who stay for more than 3 days. The reservation manager wants to be 95% confident that the percentage has been estimated to be within ± 3% of the true value. What is the most conservative sample size needed for this problem?
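A sketch of one possible solution: "most conservative" is taken here to mean p = 0.5, which maximises p(1 - p), and z = 1.96 is the usual value for 95% confidence. Both are standard conventions rather than values given in the problem statement.

```python
import math

z = 1.96  # 95% confidence level
e = 0.03  # margin of error of 3 percentage points
p = 0.5   # ASSUMPTION: most conservative proportion, maximises p * (1 - p)

n = math.ceil(z**2 * p * (1 - p) / e**2)
print(n)  # 1068
```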
Sampling 179
● What should be the sample size if a simple
random sample from a population of 4000 items
is to be drawn to estimate the per cent defective
within 2 per cent of the true value with 95.5 per
cent probability ? What would be the size of the
sample if the population is assumed to be
infinite in the given case?
● Z value will be given as 2.005
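The finite and infinite cases can be compared with the sketch below. The problem as stated gives no prior estimate of the per cent defective, so p = 0.5 is an assumption made here for illustration; substitute the value supplied with the exercise if there is one. The finite-population formula used is $n = \frac{z^2 p q N}{e^2 (N-1) + z^2 p q}$ with q = 1 - p, and the function names are hypothetical.

```python
import math


def sample_size_infinite(z, e, p):
    """Sample size for a proportion when the population is treated as infinite."""
    return math.ceil(z**2 * p * (1 - p) / e**2)


def sample_size_finite(z, e, p, N):
    """Same estimate with the finite population correction for N items."""
    zpq = z**2 * p * (1 - p)
    return math.ceil(zpq * N / (e**2 * (N - 1) + zpq))


z, e, N = 2.005, 0.02, 4000
p = 0.5  # ASSUMPTION: not given in the problem; conservative placeholder

n_finite = sample_size_finite(z, e, p, N)
n_infinite = sample_size_infinite(z, e, p)
print(n_finite, n_infinite)
```

Whatever value of p is used, the finite-population sample size is smaller than the infinite-population one, because a sample that covers a sizeable share of 4000 items carries more information about them.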
Mean or average is the sum of all the elements of a set divided by the number of elements in the set.

Median is the middle value of a set once its values are placed in order. For example, take:

3, 13, 7, 5, 21, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29

When we put those numbers in order we have:

3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 39, 40, 56

There are fifteen numbers, so the middle one is the eighth number, and the median is 23.
Mode is the value that occurs most frequently in a dataset. Like mean and median, mode is also used to summarize a set with a single piece of information.
For example, the mode of the dataset S = {1,2,3,3,3,3,3,4,4,4,5,5,6,7} is 3.
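Both examples can be checked with Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

values = [3, 13, 7, 5, 21, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29]
ordered = sorted(values)

# With 15 values, the middle one is the eighth (index 7).
middle = ordered[len(ordered) // 2]
print(middle, statistics.median(values))  # 23 23

s = [1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7]
print(statistics.mode(s))  # 3
```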

Variance measures how far a set of data is spread out from its mean value; it is the average of the squared deviations from the mean. Find the variance and standard deviation for the following data series: 12, 6, 7, 3, 15, 10, 18, 5

Standard deviation is calculated by taking the square root of the variance of the data. Because it is expressed in the same units as the data, the standard deviation gives a more interpretable account of the dispersion of values in a dataset.
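A sketch of the exercise using the standard `statistics` module. The slides do not say whether the population formula (divide by n) or the sample formula (divide by n - 1) is intended, so both are shown.

```python
import statistics

data = [12, 6, 7, 3, 15, 10, 18, 5]

mean = statistics.fmean(data)  # 76 / 8 = 9.5

# Population formulas: sum of squared deviations (190) divided by n = 8.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample formulas: the same sum divided by n - 1 = 7.
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

print(f"mean = {mean}")
print(f"population variance = {pop_var:.2f}, sd = {pop_sd:.2f}")
print(f"sample variance     = {samp_var:.2f}, sd = {samp_sd:.2f}")
```

The population variance works out to 190 / 8 = 23.75 (standard deviation about 4.87), and the sample variance to 190 / 7, about 27.14 (standard deviation about 5.21).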
