Probability Sampling
INTRODUCTION
To carry out a research study, you first have to acquire relevant information on
the subject.
In other words, you have to collect data.
These data are required to test your ‘hypotheses’, i.e., the tentative generalizations
that you have made.
Let us suppose that, as a researcher, you want to look into the relationship between study
habits and achievement motivation of undergraduate students of Nursing. For this, you
have to select a few representative cases or samples from the entire population of
undergraduate students of Nursing.
The process of selection demands thorough understanding of the concept of population,
sample and various sampling techniques.
We shall familiarize you with the concepts of sample and population.
We shall also discuss the characteristics of a good sample and the various methods of
sampling.
For our research, it is impossible to collect information about the study habits of all these
students.
So, for the survey a researcher will have to select a representative few, i.e., a sample from
the population. This process is known as sampling.
If the nature of the population has to be inferred from a sample, it is necessary for the
sample to be truly representative of the population.
Moreover, it calls for drawing a representative ‘proportion’ of the population.
The population may contain a finite number of members or units.
Sometimes, the population may be ‘infinite’ as in the case of air pressure at various points
in the atmosphere.
Therefore, a population has to be defined clearly so that there is no ambiguity as to
whether a given unit belongs to the population or not.
Otherwise, a researcher will not know what units to consider for selecting a sample.
For example, suppose we want to understand the study habits of distance education students.
Here, the population is not well defined: we are not told which university or universities
have to be included in this survey. After all, there are more than a hundred universities
in India that provide distance education, and there are thirteen state open universities.
Hence, to define it accurately, we have to specify the group as, say, undergraduate
students of Nursing.
The second issue related to the representativeness of a sample is to decide about the
‘sampling frame’, i.e., listing of all the units of the population in separate categories.
In the above study, there can be different sampling frames, such as male/female students,
employed/unemployed students, etc.
The sampling frame should be complete, accurate and up-to-date, and must be drawn
before selecting the sample.
Thirdly, a sample should be unbiased and objective.
Ideally, it should provide all information about the population from which it has been
drawn. Such a sample, based on the logic of induction, i.e., proceeding from the
particular to the general, falls within the range of random sampling errors. This leads us
to the results expressed in terms of “probability”.
A sample should not only be representative, but should also be large enough to
render stability to its characteristics.
What, then, is the ideal size of a sample?
An adequate sample is one that contains enough cases to ensure reliable results.
If the population under study is homogeneous, a small sample is sufficient.
However, a much larger sample is necessary, if there is greater variability in the units of
population.
Thus the procedure of determining the sample size varies with the nature of the
characteristics under study and their distribution in the population.
Moreover, the adequacy of a sample will depend on our knowledge of the population as
well as on the method used in drawing the sample.
For example, if we try to find out the study habits of undergraduate students of Lady
Irwin College, Delhi, the population will obviously be more homogeneous than the
population of undergraduate students of Nursing, with respect to socio-economic status,
employment of students or study hours available.
However, it should be understood that the adequate size of the sample does not
automatically ensure accuracy of results.
METHODS OF SAMPLING
i) Probability sampling
ii) Non-probability sampling
PROBABILITY SAMPLING
A table of random numbers, such as those given by Fisher and Yates (1967), can be used
as follows. After assigning consecutive numbers to the units of the population,
the researcher starts at any point on the table of random numbers and reads the
consecutive numbers in any direction: horizontally, vertically or diagonally.
If the read-out number corresponds with the one written on a unit card, then that unit is
chosen for the sample.
Let us suppose that a sample of 5 study centers is to be selected at random from a
serially numbered population of 60 study centers. Using the part of a table of random
numbers reproduced here, five two-digit numbers (as the total population of study
centers, 60, is a two-digit figure) are selected.
Row \ Column    1      2      3      4      5     ---    n
1             2315   7548   5901   8372   5993    ---   6744
2             0554   5550   4310   5374   3508    ---   1343
3             1487   1603   5032   4043   6223    ---   0834
4             3897   6749   5094   0517   5853    ---   1695
5             9731   2617   1899   7553   0870    ---   0510
6             1174   2693   8144   3393   0862    ---   6850
7             4336   1288   5911   0164   5623    ---   4036
8             9380   6204   7833   2680   4491    ---   2571
9             4954   0131   8108   4298   4187    ---   9527
10            3676   8726   3337   9482   1569    ---   3880
11             ---    ---    ---    ---    ---    ---    ---
12             ---    ---    ---    ---    ---    ---    ---
n             3914   5218   3587   4855   4881    ---   5042
If you start with the first row and the first column and read downwards, 23 is the first
two-digit number (from 2315), 05 is the next number (from 0554) and so on.
Any point in the table can be selected as the starting point for drawing a sample of the
desired size.
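Suppose the researcher starts at column 4, row 1, so that the first two-digit number is 83.
Reading down this column, he/she takes successive numbers, beginning with 83, until five
usable ones are obtained. The numbers read are as follows (read down the left-hand column
first, then the right-hand one; √ marks a number that does not exceed 60):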
83 75
53√ 33√
40√ 01√
05√ 26√
Now, in selecting the sample of 5 study centers, two numbers, 83 and 75, need to be
deleted as they are bigger than 60, the size of the population.
The processes of selection and deletion are stopped after the required number of five
units get selected.
The selected numbers are 53, 40, 05, 33 and 01. If any number is repeated in the table, it
may be substituted by the next number from the same column.
The researcher will go on to the next column until a sample of the desired size is
obtained.
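The selection procedure described above can be sketched in Python. This is only an illustration under stated assumptions: the list `column_4` copies the ten visible entries of column 4 of the table, and the function name `select_from_table` is hypothetical, not part of any library.

```python
# Column 4 of the random number table, rows 1 to 10, stored as strings so
# that leading zeros (e.g. "0517") are preserved.
column_4 = ["8372", "5374", "4043", "0517", "7553",
            "3393", "0164", "2680", "4298", "9482"]


def select_from_table(entries, population_size, sample_size):
    """Read two-digit numbers in order, skipping unusable ones.

    A number is unusable if it is 0, exceeds the population size,
    or has already been selected.
    """
    selected = []
    for entry in entries:
        number = int(entry[:2])  # first two digits of the four-digit entry
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
        if len(selected) == sample_size:
            break  # stop as soon as the required number of units is reached
    return selected


sample = select_from_table(column_4, population_size=60, sample_size=5)
print(sample)  # [53, 40, 5, 33, 1]
```

The numbers 83 and 75 are skipped because they exceed 60, and reading stops once five units have been selected, which reproduces the sample 53, 40, 05, 33 and 01.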
Simple random sampling ensures the best results. However, from a practical point of
view, a complete list of all the units of a population is often not possible to obtain.
Even if it is possible, it may involve a very high cost which a researcher or an
organization may not be able to afford.
Therefore, simple random sampling is difficult to realize. Also, in case of a
heterogeneous population, a simple random sample may not necessarily represent the
characteristics of the total population, even though all selected units participate in the
investigation.
In the case of undergraduate students of the Open University in your country (assuming
you have one), students may be employed in different sectors and categories of
services/industries.
In spite of your best efforts you may not be able to list all the categories of employment.
In such a case, simple random sampling cannot help in representing all the categories
under study.
Systematic sampling
Systematic sampling provides a more even spread of the sample over the population list and
leads to greater precision. The process involves the following steps:
a. Make a list of the population units based on some order - alphabetical, seniority, street
number, house number or any such factor.
b. Determine the desired sampling fraction, say 50 out of 1000, and hence the sampling
interval K. [K = N/n = 1000/50 = 20].
c. Starting with a randomly chosen number between 1 and K, both inclusive, select every
Kth unit from the list. If in the above example the randomly chosen number is 4, the
sample shall include the 4th, 24th, 44th, 64th, 84th units and so on, going up to
the 984th unit.
This method provides a sample as good as a simple random sample and is comparatively
easier to draw.
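The steps above can be sketched in Python as follows. The function name `systematic_sample` and its parameters are hypothetical, chosen only for this illustration; the sketch assumes that N is an exact multiple of n, as in the example (1000 and 50).

```python
import random


def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Return the 1-based list positions chosen by systematic sampling."""
    interval = population_size // sample_size  # K = N / n
    if start is None:
        # Random start between 1 and K, both inclusive.
        start = rng.randint(1, interval)
    return list(range(start, population_size + 1, interval))


positions = systematic_sample(1000, 50, start=4)
print(positions[:5], positions[-1], len(positions))
# [4, 24, 44, 64, 84] 984 50
```

Leaving `start` unset draws the starting point at random, which is the only random element in the whole procedure.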
If a researcher is interested in studying the average telephone bill of an area in his/her
city, he/she may select every fourth telephone holder from the telephone directory,
beginning at a randomly chosen entry, and find out their annual telephone bills.
However, this method suffers from the following drawbacks because of departure from
randomness in the arrangement of the population units.
i) Periodic effects
Populations with more or less definite periodic trend are quite common.
Students’ attendance at a residential university library open seven days in
a week, sales of a store over twelve months in a year and flow of road
traffic past a particular traffic point on a road over 24 hours are a few
examples to show periodic trend or cyclic fluctuation in a given
population. In such cases systematic sample may not represent the
population adequately or remain effective all the time.
ii) Trend
Another handicap of systematic sampling emerges from the fact that very
often the population size ‘N’ is not an integral multiple of ‘k’.
This leads to a varying number of units in the sample from the same finite
population.
Suppose a population of 100 counsellors is listed according to seniority
and a researcher wants to select a sample of 20.
First, he/she divides 100 by 20 to get 5 as the size of the interval.
Suppose he/she picks 4 at random from 1 to 5 as a starting number.
Then, he/she selects every 5th name, at positions 4, 9, 14, 19, ..., until he/she
has drawn the desired 20 names.
If he/she picks 2 as the starting point, another sample would consist of
positions 2, 7, 12, .... In the latter sample each counsellor’s seniority is lower than
that of his/her counterpart in the former sample.
The means of these two samples would therefore diverge
as regards seniority and other associated variables.
Many such samples can be drawn by taking different starting points but
there will be greater variation among them.
Thus, the ‘periodic effects’ and ‘trend’ of the listed population unduly
increase the variability of the samples, and calculations made from such
samples cannot show the sources of variability.
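Both effects can be illustrated with a short Python sketch. The counsellor example uses the figures from the text; the list of 103 units is a hypothetical case added here only to show what happens when the population size is not a multiple of the interval. The helper name `systematic_positions` is likewise an assumption of this sketch.

```python
def systematic_positions(population_size, interval, start):
    """1-based positions start, start + interval, ... up to the end of the list."""
    return list(range(start, population_size + 1, interval))


# Counsellors listed by seniority: positions 1..100, interval 5.
sample_a = systematic_positions(100, 5, start=4)  # 4, 9, 14, ...
sample_b = systematic_positions(100, 5, start=2)  # 2, 7, 12, ...

# Every unit in sample_b sits two places earlier in the list than its
# counterpart in sample_a, so the mean list positions differ by exactly 2.
mean_a = sum(sample_a) / len(sample_a)
mean_b = sum(sample_b) / len(sample_b)
print(mean_a, mean_b)  # 51.5 49.5

# Hypothetical list of 103 units (not from the text): 103 is not a multiple
# of 5, so the sample size depends on the starting point.
sizes = {start: len(systematic_positions(103, 5, start)) for start in range(1, 6)}
print(sizes)  # {1: 21, 2: 21, 3: 21, 4: 20, 5: 20}
```

Because the list is ordered by seniority, the shift in list position carries over to seniority itself and to any variable associated with it.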
The main advantages of systematic sampling are:
a) It involves simple calculations.
b) It is less expensive than random sampling.
Stratified Random Sampling
1) Equal Allocation
In this type, all strata contribute the same number of sampling elements to the
sample.
Thus, if there are three strata, one third of the sample would be selected from
each stratum. This type of allocation is done when strata have equal population.
2) Proportional Allocation
In this type, each stratum contributes to the sample a number that is proportional to its
size in the population.
The larger the stratum, the more members it contributes to the sample.
The sampling fraction remains constant.
Suppose there are five strata to be sampled, the respective population sizes of
the strata are as follows, and a 5% stratified random sample is to be selected.
The proportional allocation will be done as follows:
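As a rough illustration of proportional allocation, the Python sketch below applies a 5% sampling fraction to five strata. The stratum names and sizes are hypothetical values chosen for this sketch only; they are not the figures of the original example.

```python
# Hypothetical stratum sizes (assumed for illustration, not from the text).
stratum_sizes = {"A": 2000, "B": 1000, "C": 600, "D": 300, "E": 100}
sampling_percent = 5  # the same sampling fraction is applied to every stratum

allocation = {
    name: round(size * sampling_percent / 100)
    for name, size in stratum_sizes.items()
}
print(allocation)                # {'A': 100, 'B': 50, 'C': 30, 'D': 15, 'E': 5}
print(sum(allocation.values()))  # 200, i.e. 5% of the 4000 units overall
```

With sizes that do not divide evenly, rounding may make the total differ slightly from the intended overall sample size.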
3) Optimum Allocation
In optimum allocation, the strata contributions to the sample are proportional to
the product of the strata population sizes and the variability of the dependent
variable within the strata.
Large strata and strata with large variability will have larger contributions to the
sample.
Because it requires good estimates of the population variability of the
dependent variable, which are seldom available before the sample is selected,
optimum allocation is used infrequently.
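A minimal Python sketch of optimum allocation is given below. It assumes that "variability" is measured by the within-stratum standard deviation of the dependent variable; the stratum sizes, standard deviations and the function name `optimum_allocation` are hypothetical.

```python
def optimum_allocation(total_sample, sizes, std_devs):
    """Allocate the sample in proportion to N_h * S_h for each stratum h."""
    weights = {h: sizes[h] * std_devs[h] for h in sizes}
    total_weight = sum(weights.values())
    return {h: total_sample * w / total_weight for h, w in weights.items()}


# Hypothetical strata: population sizes and assumed standard deviations
# of the dependent variable within each stratum.
sizes = {"A": 500, "B": 300, "C": 200}
std_devs = {"A": 4, "B": 10, "C": 5}

result = optimum_allocation(120, sizes, std_devs)
print(result)  # {'A': 40.0, 'B': 60.0, 'C': 20.0}
```

In this made-up case stratum B is smaller than stratum A but receives the largest share of the sample because its variability is much higher.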
Stratified random sample is useful when lists of units or individuals in the
population are not available.
It is also useful in providing more accurate results than simple random sampling.
For example, while selecting a sample of undergraduate students of the Open
University in your country, the researcher may divide the whole population of
undergraduate students into males and females; north, east, south and west regions
of the country; and those employed in government, private and autonomous
institutions in the country.
All these will be different strata. From each stratum the researcher may select 50
students as a sample.
Sometimes stratification is not possible before collecting the data.
The stratum to which a unit belongs may not be known until the researcher has
actually conducted the survey.
Personal characteristics such as sex, social class, educational level, age etc., are
examples of such stratification criteria.
The procedure in such situations involves taking of a random sample of the
required size and then classifying the units into various strata.
The method is quite efficient provided the sample is reasonably large, i.e., more
than 20 in every stratum.
Cluster sampling
Cluster sampling is used when the population under study is infinite, where a list of units
of population does not exist, when the geographic distribution of units is scattered, or
when sampling of individual units is not convenient for several administrative reasons.
It involves division of the population into clusters that serve as primary sampling units.
A selection of the clusters is then made to form the sample.
Thus, in cluster sampling, the sampling units are clusters rather than individual
members or items in the population.
For example, for the purpose of selecting a sample of high school teachers in a state, you
may list all high schools instead of the teachers teaching in them and randomly
select a 10 per cent sample (say) of the schools as clusters.
You may then use all the teachers of the selected schools as the sample or randomly select
a few of them.
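The school example can be sketched in Python as follows. The sampling frame of 50 schools with 8 teachers each is invented for this sketch, as is the choice of 3 teachers per school in the second option.

```python
import random

rng = random.Random(0)  # fixed seed so that the sketch is repeatable

# Hypothetical sampling frame: 50 schools with 8 teachers each.
schools = {
    f"school_{i:02d}": [f"teacher_{i:02d}_{j}" for j in range(1, 9)]
    for i in range(1, 51)
}

# Stage 1: draw 10 per cent of the schools (the clusters) at random.
n_clusters = len(schools) // 10
chosen_schools = rng.sample(sorted(schools), n_clusters)

# Option A: use every teacher in the chosen schools.
all_teachers = [t for s in chosen_schools for t in schools[s]]

# Option B: randomly select a few teachers (here 3) within each chosen school.
some_teachers = [t for s in chosen_schools for t in rng.sample(schools[s], 3)]

print(len(chosen_schools), len(all_teachers), len(some_teachers))  # 5 40 15
```

Note that only the list of schools is needed before sampling begins; lists of teachers are required only for the schools actually chosen.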
Any location within which we find an intact group of similar characteristics (population
members) is termed as a cluster.
Examples of clusters include classrooms, schools, hospitals, and study centers.
Cluster sampling is economical, especially when the cost of measuring a unit is relatively
small and the cost of reaching it is relatively large.
Multistage sampling
However, this method is recommended only when it seems impractical to draw a simple
random sample.
When the units vary in size, it is better to select a sample in such a way that the
probability of selecting a unit is proportional to its size.
For example, suppose one study center has a population of 200 learners and another
has 100.
While drawing a sample, the first study center will have double the representation of
the second study center. Such a sample is known as a probability proportional to
size sample, or PPS sample.
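The idea can be sketched in Python with the two study centers from the example. The sketch draws with replacement using the standard library function `random.choices`, which is only one simple way to obtain size-proportional selection probabilities; the center names and the number of draws are assumptions.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed so that the sketch is repeatable

# Study centers from the example, with learner counts as the size measure.
centers = {"center_1": 200, "center_2": 100}

# Selection probability of each center under PPS: size / total size.
total = sum(centers.values())
probabilities = {name: size / total for name, size in centers.items()}
print(probabilities)

# Draw 3000 independent selections with replacement, weighted by size.
draws = rng.choices(list(centers), weights=list(centers.values()), k=3000)
counts = Counter(draws)
print(counts)  # center_1 should be drawn roughly twice as often as center_2
```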
There are a number of websites that will generate random numbers for you.
For example, the website www.randomizer.org is very easy to use.
On opening this website you will have to answer a series of questions, such as how many
sets of random numbers are to be generated, how many numbers per set are to be produced,
the number range, etc.
Many software packages include programmes for selecting a random sample.
One such package is the Statistical Package for Social Sciences (SPSS) for Windows 15.0
(SPSS, Inc., 2006). SPSS has two options for specifying the size of a random sample:
a) Exactly
b) Approximately
Exactly, as the name suggests, requires an exact number of cases, such as 600 from the
2000 Class IX students listed.
The second option, in contrast, specifies the sampling fraction, i.e. the ratio of sample
size to population size, e.g. 30 percent of all the Class IX students could be selected.
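The difference between the two options can be mimicked in Python. This is an analogy only and makes no claim about how SPSS draws its samples internally; the numbering of students from 1 to 2000 is an assumption of the sketch.

```python
import random

rng = random.Random(1)  # fixed seed so that the sketch is repeatable
students = list(range(1, 2001))  # 2000 Class IX students, numbered 1..2000

# "Exactly": a sample of exactly 600 distinct students.
exact_sample = rng.sample(students, 600)

# "Approximately": each student is kept independently with probability 0.30,
# so the sample size is close to, but usually not exactly, 600.
approx_sample = [s for s in students if rng.random() < 0.30]

print(len(exact_sample), len(approx_sample))
```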
A number of other software packages are also available that provide the scope for the
selection of a random sample other than a simple random sample.
SUMMARY
This briefing on probability sampling covers: (1) the concept of sampling; (2) the concept
of population; (3) methods of sampling, with a focus on probability sampling; and (4) a
research article for evidence.
CONCLUSION
RESEARCH ARTICLE
Debashis Rout (2019) conducted a study on data mining with a focus on sampling. Data mining
is a key subject for discovering the various dimensions of unpolished data, which is crucial
for the data extraction that feeds data analysis. One major area of data mining is data
preprocessing. When a data analysis is planned on raw data for a large population, it is
practically impossible to find exactly the data required. It is wrong to believe that raw data
from different sources can be used directly. The reasons include incomplete information,
noisy data, duplicated data and inconsistent data. These problems have a major
impact when important decisions are based on the data analysis. Data preprocessing
therefore plays a major role in business intelligence, validating raw data to prepare quality
data. Data preprocessing is a vast area, consisting of various interrelated strategies and
methods. The article discusses one of the important preprocessing techniques, sampling,
and compares various sampling procedures.
REFERENCES
Fisher, R. A., and Yates, F. (1967) Statistical Tables for Biological, Agricultural and Medical
Research, London, Oliver and Boyd.
Krejcie, Robert V., and Morgan, Daryle W. (1970) Determining Sample Size for Research
Activities, Educational and Psychological Measurement, 30, 607-610.
Enki-Village. (2019). What Is Purposive Sample? When and How to Use It. [online] Available at:
https://www.enkivillage.org/purposive-sampling.html [Accessed 19 Apr. 2019].
Abedor, Handbook on Improving Sampling Methods.
Knuth, Donald E., TEX, a System for Technical Text, American Mathematical society.