RM Unit 2
In statistics, all calculation and interpretation rests on the collection of data. There are
two broad methods of data collection. Each is suitable in different cases, and knowledge of
both is important for understanding when to apply which method.
These two methods are the Census method and the Sampling method.
Census Method
Census method is the method of statistical enumeration where all members of the
population are studied. A population refers to the set of all observations under concern. For
example, if you want to carry out a survey to find out students’ feedback about the facilities
of your school, all the students of your school would form a part of the ‘population’ for your
study.
On a larger scale, suppose a country wants to maintain information and records about all
households. It can collect this information by surveying all households in the country using
the census method.
In our country, the Government conducts the Census of India every ten years. The Census
collects information from households regarding their incomes, the earning members,
the total number of children, members of the family, etc. This method must take into
account all the units; it cannot leave anyone out when collecting data. Once collected, the
Census of India reveals demographic information such as birth rates, death rates, total
population, population growth rate of our country, etc. The last census was conducted in
2011, and the next one was due in 2021.
Sampling Method
As we have studied, the population contains units with some similar characteristics on the
basis of which they are grouped together for the study. In the case of the Census of India,
for example, the common characteristic was that all units are Indian nationals. But it is not
always practical to collect information from all the units of the population, because a census
is time-consuming and costly. Thus, an easier approach is to collect information from a
representative group within the population and then draw conclusions about the whole
population from it.
This representative group, which contains some units from the whole population, is called
the sample.
SAMPLE DESIGN
A sample design is the framework, or road map, that serves as the basis for the selection of
a survey sample and affects many other important aspects of a survey as well.
In a broad context, survey researchers are interested in obtaining some type of information
through a survey for some population, or universe, of interest. One must define a sampling
frame that represents the population of interest, from which a sample is to be drawn. The
sampling frame may be identical to the population, or it may be only part of it and
therefore subject to some undercoverage, or it may have an indirect relationship to the
population.
While developing a sampling design, the researcher must pay attention to the following
points:
Type of universe: The first step in developing any sample design is to clearly define the set
of objects, technically called the Universe, to be studied. The universe can be finite or
infinite. In a finite universe the number of items is certain, but in an infinite universe
the number of items is infinite, i.e., we cannot have any idea about the total number of
items. The population of a city, the number of workers in a factory and the like are examples
of finite universes, whereas the number of stars in the sky, the listeners of a specific radio
programme, the throws of a die, etc. are examples of infinite universes.
Sampling unit: A decision has to be taken concerning the sampling unit before selecting the
sample. A sampling unit may be a geographical one such as a state, district, village, etc., or a
construction unit such as a house, flat, etc., or it may be a social unit such as a family, club,
school, etc., or it may be an individual. The researcher will have to decide on one or more
such units to select for the study.
Source list: It is also known as the ‘sampling frame’, from which the sample is to be drawn. It
contains the names of all items of a universe (in the case of a finite universe only). If a source
list is not available, the researcher has to prepare it. Such a list should be comprehensive,
correct, reliable and appropriate. It is extremely important for the source list to be as
representative of the population as possible.
Size of sample: This refers to the number of items to be selected from the universe to
constitute a sample. This is a major problem before a researcher. The size of the sample should
be neither excessively large nor too small. It should be optimum. An optimum sample is one
which fulfills the requirements of efficiency, representativeness, reliability and flexibility.
While deciding the size of the sample, the researcher must determine the desired precision and
an acceptable confidence level for the estimate. The size of the population variance needs to be
considered, as a larger variance usually calls for a bigger sample. The size of the
population must be kept in view, for this also limits the sample size. The parameters of
interest in a research study must be kept in view while deciding the size of the sample.
Costs too dictate the size of the sample that we can draw. As such, the budgetary constraint must
invariably be taken into consideration when we decide the sample size.
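The link between precision, confidence level, variance and population size can be made concrete with the standard textbook formula for estimating a mean, n0 = (z * sigma / e)^2, followed by the finite population correction n = n0 / (1 + (n0 - 1) / N). The sketch below is only an illustration: the function name, its parameters and the numbers used are assumptions, not part of these notes, and sigma would in practice come from a pilot study or an earlier survey.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95, population_size=None):
    """Sample size needed to estimate a mean to within +/- margin.

    sigma: assumed population standard deviation (a guess or pilot estimate)
    margin: desired precision (half-width of the confidence interval)
    confidence: acceptable confidence level, e.g. 0.95
    population_size: N for a finite universe; None treats it as very large
    """
    # Two-sided z value for the chosen confidence level
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = (z * sigma / margin) ** 2
    if population_size is not None:
        # Finite population correction: a small universe needs fewer units
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)


print(required_sample_size(sigma=10, margin=2))
print(required_sample_size(sigma=20, margin=2))  # larger variance, bigger sample
print(required_sample_size(sigma=10, margin=2, population_size=500))
```

Doubling the assumed standard deviation roughly quadruples the required sample, while a small finite universe reduces it. Cost is not modelled here; it would act as an upper limit on whatever the formula returns.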
Parameters of interest: In determining the sample design, one must consider the question
of the specific population parameters which are of interest. For instance, we may be
interested in estimating the proportion of persons with some characteristic in the
population, or we may be interested in knowing some average or the other measure
concerning the population. There may also be important sub-groups in the population about
whom we would like to make estimates. All this has a strong impact upon the sample design
we would accept.
Budgetary constraint: Cost considerations, from a practical point of view, have a major
impact upon decisions relating not only to the size of the sample but also to the type of
sample. This fact can even lead to the use of a non-probability sample.
Sampling procedure: Finally, the researcher must decide the type of sample to use, i.e.,
the technique to be used in selecting the items for the sample. In fact,
this technique or procedure stands for the sample design itself. There are several sample
designs (explained in the pages that follow) out of which the researcher must choose one
for the study. Obviously, the design selected should be the one which, for a given sample
size and for a given cost, has the smaller sampling error.
The first and most important step in selecting a sample is to determine the population. Once
the population is identified, a sample must be selected. A good sample is one which:
Is small in size.
Provides adequate information about the whole population.
Takes less time to collect and is less costly.
In the case of our previous example, you could choose students from your class to be
the representative sample out of the population (all students in the school). However, there
must be some rationale behind choosing the sample. If you think your class comprises a set
of students who will give unbiased opinions/feedback, or if you think your class contains
students from different backgrounds and their responses would be relevant to your study,
you may choose them as your sample. Otherwise, it is better to choose another sample
which might be more relevant.
Again, on a larger scale, suppose the government wants an estimate of the average income of
the Indian household. It is difficult and time-consuming to study all households. The
government can simply choose, say, 50 households from each state of the country and calculate
the average of those to arrive at an estimate. This estimate is not necessarily the actual
figure that would be arrived at if all units of the population were studied, but it gives an
approximate idea of what the figure might look like.
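The difference between the census figure and the sample estimate can be seen in a small simulation. This is only a sketch under stated assumptions: the population is made up (10 hypothetical states of 1,000 households each, with incomes drawn from a log-normal distribution), and all names are illustrative. Because every hypothetical state has the same number of households, a plain average of the sampled incomes is reasonable here; with states of unequal size the state samples would need to be weighted.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: 10 states with 1,000 household incomes each
population = {
    f"state_{i}": [random.lognormvariate(10, 0.5) for _ in range(1000)]
    for i in range(10)
}


def census_mean(pop):
    """Average over every household (census method)."""
    return mean(income for households in pop.values() for income in households)


def sample_mean(pop, per_state=50):
    """Average over a random sample of households drawn from each state."""
    sampled = []
    for households in pop.values():
        sampled.extend(random.sample(households, per_state))
    return mean(sampled)


true_avg = census_mean(population)
estimate = sample_mean(population)
print(f"census: {true_avg:.0f}, sample estimate: {estimate:.0f}")
```

The sample touches 500 households instead of 10,000, and its average is close to, but not exactly equal to, the census average.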
Different Types of Samples
Random sample
The term random has a very precise meaning: each individual in the population of
interest has an equal likelihood of selection. This is a very strict meaning; you cannot
just collect responses on the street and have a random sample.
Stratified sample
Stratified samples are as good as or better than random samples, but they require
fairly detailed advance knowledge of the population characteristics, and therefore are
more difficult to construct.
As they are not truly representative, non-probability samples are less desirable than
probability samples. However, a researcher may not be able to obtain a random or
stratified sample, or it may be too expensive. A researcher may not care about
generalizing to a larger population. The validity of non-probability samples can be
increased by trying to approximate random selection, and by eliminating as many
sources of bias as possible.
Quota sample
The defining characteristic of a quota sample is that the researcher deliberately sets
the proportions of levels or strata within the sample. This is generally done to ensure
the inclusion of a particular segment of the population. The proportions may or may
not differ dramatically from the actual proportions in the population. The researcher
sets a quota, independent of population characteristics.
Purposive sample
Convenience sample
Probability Sampling
Probability sampling is a sampling technique in which samples from a larger population are
chosen using a method based on the theory of probability. For a participant to be considered
part of a probability sample, he/she must be selected using random selection.
The most important requirement of probability sampling is that everyone in your population
has a known, non-zero chance of being selected; in simple random sampling that chance is
also equal for everyone. For example, if you have a population of 100 people, every person
would have a 1 in 100 chance of being picked in a single draw. Probability
sampling gives you the best chance to create a sample that is truly representative of the
population.
Probability sampling uses statistical theory to randomly select a small group of people
(sample) from an existing large population and then predict that their responses, taken
together, will match those of the overall population.
Let us take an example to understand this sampling technique. The population of the US
alone is about 330 million. It is practically impossible to send a survey to every individual
to gather information, but you can use probability sampling to get data which is nearly as
good even though it is collected from a much smaller group.
1. Simple Random Sampling
There are two ways in which the samples are chosen in this method of sampling:
a lottery system, and number-generating software or a random number table. This
sampling technique usually works with large populations and has its fair share of
advantages and disadvantages.
Ease of use represents the biggest advantage of simple random sampling. Unlike
more complicated sampling methods such as stratified random sampling,
there is no need to divide the population into sub-populations or
take any other additional steps before selecting members of the population at
random.
A sampling error can occur with a simple random sample if the sample does not end
up accurately reflecting the population it is supposed to represent. For example, in
a simple random sample of 25 employees, it would be possible to draw 25 men
even if the population consisted of 125 women and 125 men. For this reason, simple
random sampling is more commonly used when the researcher knows little about the
population. If the researcher knew more, it would be better to use a different
sampling technique, such as stratified random sampling, which helps to account for
the differences within the population, such as age, race or gender. Other
disadvantages include the fact that, for sampling from large populations, the process
can be time-consuming and costly compared to other methods.
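The "number-generating software" route can be sketched with Python's standard `random` module. The staff list below is hypothetical (125 women and 125 men, matching the example above), and the function name is an assumption made for illustration.

```python
import random


def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement; every unit is equally likely."""
    rng = random.Random(seed)
    return rng.sample(units, n)


# Hypothetical staff list: 125 women ("W") and 125 men ("M")
employees = [("W", i) for i in range(125)] + [("M", i) for i in range(125)]

sample = simple_random_sample(employees, 25, seed=1)
women = sum(1 for sex, _ in sample if sex == "W")
print(f"women: {women}, men: {25 - women}")
```

Nothing in the procedure forces the split between women and men to match the population. Running it with different seeds shows the split varying from draw to draw, which is the sampling error described above.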
2. Systematic Sampling
Systematic sampling is when you choose every “nth” individual to be a part of the
sample. For example, you can choose every 5th person to be in the sample.
Systematic sampling is an extension of the same probability
technique, in which members of the group are selected at regular intervals to form a
sample. With a random starting point, there is an equal opportunity for every member
of a population to be selected using this sampling technique.
One risk that statisticians must consider when conducting systematic sampling
involves how the list used with the sampling interval is organized. If the population
placed on the list is organized in a cyclical pattern that matches the sampling
interval, the selected sample may be biased. For example, a company’s human
resources department wants to pick a sample of employees and ask how they feel
about company policies. Employees are grouped in teams of 20, with each team
headed by a manager. If the list used to pick the sample is organized with teams
clustered together, the statistician risks picking only managers (or no managers at
all), depending on the sampling interval.
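The periodicity risk in the teams-of-20 example can be reproduced in a few lines. This is a minimal sketch: the staff list is made up (10 teams of 20, each listed manager first), and the function name and its parameters are assumptions for illustration.

```python
import random


def systematic_sample(units, k, start=None, seed=None):
    """Take every k-th unit after a random start in the first k positions."""
    if start is None:
        start = random.Random(seed).randrange(k)
    return units[start::k]


# Hypothetical list: 10 teams of 20, each team listed manager first
staff = []
for team in range(10):
    staff.append(f"manager_{team}")
    staff.extend(f"employee_{team}_{j}" for j in range(19))

# An interval of 20 matches the team size, so the start decides everything
only_managers = systematic_sample(staff, k=20, start=0)
no_managers = systematic_sample(staff, k=20, start=5)

# An interval that does not line up with the team size breaks the cycle
mixed = systematic_sample(staff, k=7, start=0)

print(only_managers[:3])
print(no_managers[:3])
print(sum(name.startswith("manager") for name in mixed), "managers of", len(mixed))
```

With an interval of 20 the sample is either all managers or contains none, depending only on the starting position. Shuffling the list first, or choosing an interval unrelated to the team size, avoids the problem.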
3. Stratified Random Sampling
Stratified random sampling involves splitting subjects into mutually exclusive groups
(strata) and then using simple random sampling to choose members from each group.
A common method is to arrange or classify by sex, age, ethnicity and similar
characteristics.
Members in each of these groups should be distinct so that every member of all
groups gets an equal opportunity to be selected using simple probability. This sampling
method is also called “random quota sampling”.
The main advantage of stratified random sampling is that it captures key population
characteristics in the sample. Similar to a weighted average, this method of sampling
produces characteristics in the sample that are proportional to the overall population.
Stratified random sampling works well for populations with a variety of attributes but
is otherwise ineffective if subgroups cannot be formed.
Stratification gives a smaller error in estimation and greater precision than the simple
random sampling method. The greater the differences between the strata, the
greater the gain in precision.
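A proportional stratified sample can be sketched as simple random sampling carried out separately inside each stratum. The function name, the `stratum_of` callback and the population are illustrative assumptions. Proportional allocation is only one possible allocation rule, and the rounding used here can shift the total sample size by a unit or two.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n, seed=None):
    """Proportional stratified sample: simple random sampling in each stratum.

    units: list of population members
    stratum_of: function mapping a unit to its stratum label
    n: intended total sample size
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        # Allocate in proportion to the stratum's share of the population
        share = round(n * len(members) / len(units))
        sample.extend(rng.sample(members, min(share, len(members))))
    return sample


# Hypothetical population: 125 women ("W") and 125 men ("M")
population = [("W", i) for i in range(125)] + [("M", i) for i in range(125)]
sample = stratified_sample(population, lambda unit: unit[0], n=20, seed=1)

women = sum(1 for sex, _ in sample if sex == "W")
print(f"women: {women}, men: {len(sample) - women}")
```

Unlike a simple random sample, the split between the two strata is fixed in advance at 10 and 10, so an all-male sample cannot occur.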
4. Area Sampling
The basic idea of area sampling is both simple and powerful. It enjoys wide usage in
situations where very high quality data are wanted but for which no list of universe
items exists. For instance, many governmental agencies (e.g. Bureau of Labor
Statistics) use area sampling.
However, the practical execution of a large-scale area sample is highly complex.
Typically, an area sample is drawn in multiple stages, with successively smaller
area clusters being sub-sampled at each stage:
(i) Create geographic strata, each consisting of a group of counties in more or less
close proximity. Fifty or more such strata, containing all of the roughly 3,000 US
counties, are commonly used.
(ii) Within each geographic stratum, choose a probability sample of one or more
counties (or groups of counties such as metropolitan areas).
(iii) Within each sample county (or group of counties), choose a probability sample of
places (cities, towns, etc).
(iv) Within each sample place, select a probability sample of area segments (blocks
in cities, areas with identifiable boundaries in other places, etc.).
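The four stages above can be sketched as nested random selection over a hierarchical frame. Everything in this example is an assumption made for illustration: the frame is tiny and synthetic, the names are invented, and each stage uses equal-probability selection for brevity, whereas a real area sample would often select with probability proportional to size.

```python
import random


def multistage_area_sample(strata, rng, counties_per_stratum=1,
                           places_per_county=2, segments_per_place=2):
    """Return sampled (stratum, county, place, segment) tuples.

    strata: {stratum: {county: {place: [segment, ...]}}}
    """
    chosen = []
    # Stage (i): the strata are given, and every stratum contributes
    for stratum, counties in strata.items():
        # Stage (ii): sample counties within the stratum
        for county in rng.sample(sorted(counties), counties_per_stratum):
            places = counties[county]
            # Stage (iii): sample places within the county
            for place in rng.sample(sorted(places), places_per_county):
                # Stage (iv): sample area segments within the place
                for segment in rng.sample(places[place], segments_per_place):
                    chosen.append((stratum, county, place, segment))
    return chosen


# Hypothetical frame: 3 strata x 4 counties x 5 places x 6 segments
frame = {
    f"stratum_{s}": {
        f"county_{s}_{c}": {
            f"place_{s}_{c}_{p}": [f"segment_{s}_{c}_{p}_{g}" for g in range(6)]
            for p in range(5)
        }
        for c in range(4)
    }
    for s in range(3)
}

segments = multistage_area_sample(frame, random.Random(0))
print(len(segments), "segments selected")
```

Only the selected segments need to be listed and visited in the field, which is why the method works when no list of universe items exists.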
5. Cluster Sampling
Some steps and tips for using cluster sampling in market research are:
Sample: Decide the target audience and also the size of the sample.
Create and evaluate sampling frames: Create a sampling frame by using either an
existing frame or creating a new one for the target audience. Evaluate frames on the
basis of coverage and clustering and make adjustments accordingly. These groups
will vary with the population and should be mutually exclusive and comprehensive.
Members of a sample are selected individually.
Determine groups: Determine the number of groups, keeping roughly the same average
number of members in each group. Make sure each of these groups is distinct from one
another.
Select clusters: Choose clusters randomly for sampling.
Geographic segmentation: Geographic segmentation is the most commonly used
basis for forming clusters.
Sub-types: Cluster sampling is bifurcated into one-stage and multi-stage subtypes
on the basis of the number of steps followed by researchers to form clusters.
There are two ways to classify cluster sampling. The first way is based on the
number of stages followed to obtain the cluster sample and the second way is the
representation of the groups in the entire cluster.
The first classification is the most used in cluster sampling. In most cases, sampling
by clusters happens over multiple stages. A stage is a step
taken to get to the desired sample, and cluster sampling is divided into single-stage,
two-stage, and multi-stage sampling.
(I) Single Stage Cluster Sampling: As the name suggests, sampling will be done
just once. An example of single-stage cluster sampling: an NGO wants to create a
sample of girls across 5 neighbouring towns to provide education. Using single-stage
cluster sampling, the NGO can randomly select towns (clusters) to form a sample
and extend help to the girls deprived of education in those towns.
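The NGO example can be sketched as follows. The town data is hypothetical and the function name is an assumption. The point of the sketch is that randomness enters only once, when whole towns are picked, and every member of a picked town is then included.

```python
import random


def single_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of each."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    members = [member for name in picked for member in clusters[name]]
    return picked, members


# Hypothetical data: 5 neighbouring towns, each with a list of 30 girls
towns = {f"town_{t}": [f"girl_{t}_{i}" for i in range(30)] for t in range(5)}

picked_towns, girls = single_stage_cluster_sample(towns, n_clusters=2, seed=3)
print(picked_towns, len(girls))
```

A two-stage version would add a second random draw of girls within each picked town instead of keeping all of them.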
Advantages of cluster sampling:
(I) Consumes less time and cost: Sampling of geographically divided groups
requires less work, time and cost. It is highly economical to observe selected clusters,
allocating a limited number of resources to them, instead of sampling randomly
throughout a particular region.
(II) Convenient access: Large samples can be chosen with this sampling technique,
and that increases accessibility to various clusters.
(III) Least loss in accuracy of data: Since there can be large samples in each
cluster, loss of accuracy in information per individual can be compensated.
Non-Probability Sampling
Non-probability sampling is most useful for exploratory studies like a pilot survey (a survey
that is deployed to a smaller sample than the pre-determined sample size). Non-probability
sampling is used in studies where it is not possible to draw a random probability
sample due to time or cost considerations.
Non-probability sampling is a less stringent method, and it depends heavily
on the expertise of the researchers. Non-probability sampling is carried out by methods of
observation and is widely used in qualitative research.
(i) Non-probability sampling is a more conducive and practical method for researchers
deploying surveys in the real world. Statisticians prefer probability sampling because
it yields data in the form of numbers; however, if done correctly, non-probability sampling
can yield results of similar, if not the same, quality.
(ii) Getting responses using non-probability sampling is faster and more cost-effective
than probability sampling: because the sample is known to the researcher, its members are
more motivated to respond quickly than people who are randomly selected.
(i) In non-probability sampling, the researcher needs to think through potential reasons for
biases. It is important to have a sample that closely represents the population.
This is not a scientific method of sampling and the downside to this sampling
technique is that the results can be influenced by the preconceived notions of a
researcher. Thus, there is a high amount of ambiguity involved in this research
technique.
For example, this type of sampling method can be used in pilot studies.
2. Convenience Sampling
Ideally, in research, it is good to test a sample that represents the population. But, in
some research, the population is too large for the entire population to be tested.
This is one of the reasons why researchers rely on convenience sampling, which is
the most common non-probability sampling technique, because of its speed,
cost-effectiveness, and the ease of availability of the sample.
3. Quota Sampling
As a hypothetical example, consider a researcher who wants to study the career goals of male
and female employees in an organization. There are 500 employees in the organization.
These 500 employees are known as the population. In order to understand the
population better, the researcher will need only a sample, not the entire population. Further,
the researcher is interested in particular strata within the population. This is where quota
sampling helps, by dividing the population into strata or groups.
For studying the career goals of 500 employees, technically the sample selected
should have proportionate numbers of males and females, which means there
should be 250 males and 250 females. Since this is unlikely, the groups or strata are
selected using quota sampling.
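Quota sampling can be sketched as accepting whoever is available until each group's quota is filled. The arrival order, the quotas and the function name below are hypothetical, chosen only for illustration.

```python
def quota_sample(respondents, group_of, quotas):
    """Accept respondents in arrival order until each group's quota is full.

    No random selection is involved, which is what makes this a
    non-probability method.
    """
    remaining = dict(quotas)
    sample = []
    for person in respondents:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # every quota has been filled
    return sample


# Hypothetical arrival order: whoever the researcher happens to meet
arrivals = [("M", i) if i % 3 else ("F", i) for i in range(200)]
sample = quota_sample(arrivals, lambda person: person[0], {"M": 25, "F": 25})

males = sum(1 for sex, _ in sample if sex == "M")
print(f"males: {males}, females: {len(sample) - males}")
```

The quotas guarantee the composition of the sample, but which individuals fill them depends on who turns up first, so the bias of convenience selection remains inside each group.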
4. Snowball Sampling
Snowball sampling helps researchers find subjects when they are difficult to locate.
Researchers use this technique when the pool of potential subjects is small and not easily
available. This sampling system works like a referral program. Once the
researchers find suitable subjects, they ask them for assistance in finding similar
subjects, so as to form a sample of reasonably good size.
For example, this type of sampling can be used to conduct research involving a
particular illness in patients or a rare disease. Researchers can seek help from
subjects to refer other subjects suffering from the same ailment to form a subjective
sample to carry out the study.
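The referral process can be sketched as a breadth-first walk over a "who can refer whom" map. The referral data, the names and the stopping rule are assumptions made for illustration. In a real study each referral depends on the subject's consent and willingness to help.

```python
from collections import deque


def snowball_sample(seeds, referrals, target_size):
    """Grow a sample by following referrals from the initial subjects.

    seeds: subjects the researcher located directly
    referrals: {subject: [subjects they can refer]} (hypothetical data)
    target_size: stop once this many subjects are recruited
    """
    recruited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < target_size:
        subject = queue.popleft()
        recruited.append(subject)
        for referred in referrals.get(subject, []):
            if referred not in seen:  # avoid recruiting anyone twice
                seen.add(referred)
                queue.append(referred)
    return recruited


referrals = {
    "p1": ["p2", "p3"],
    "p2": ["p4"],
    "p3": ["p4", "p5"],
    "p5": ["p6"],
}
sample = snowball_sample(["p1"], referrals, target_size=5)
print(sample)
```

Starting from one located subject, the sample grows through the subjects' own contacts. This is also why the result is subjective: people outside the referral network can never be reached.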