Unit-5 Research Methodology
Primary and secondary data, methods of collecting primary data, preparation of a questionnaire, types of questions,
characteristics of a good questionnaire, concept of sampling, population, sampling frame, sampling and non-sampling
error, probability and non-probability sampling, brief overview of simple random sampling, stratified sampling,
cluster sampling, systematic sampling, multistage sampling and their practical applications in research problems,
sample size estimation for estimating mean and proportion
Data:
Information collected from all the relevant sources to find answers to the research problem is called data.
Data can come in the form of text, observations, figures, images, numbers, graphs or symbols to be used for a specific
purpose. Data are simply units of information. Data are measured, collected, reported, analyzed and used to create
data visualizations such as graphs, tables or images.
For example, data might include individual prices, weights, addresses, ages, names, temperatures, dates or
distances. Data are a raw form of knowledge and, on their own, do not carry any significance or purpose.
Types of data
1. Primary data
2. Secondary data
Primary data:
Primary data refers to first-hand data gathered by the researcher from the original source. It is original and
unique to the research project or study being conducted. The methods of collecting primary data include surveys,
observations, experiments, questionnaires and personal interviews.
Primary data are collected afresh and for the first time by the investigator. They are original in nature, current,
and directly related to the issue or problem under study. The main methods of collecting primary data are described below.
1. Direct personal interview: In this method, the investigator collects information directly or personally from the
informants or respondents. The person who conducts the enquiry and collects data is known as the investigator, and the
persons from whom the statistical information is collected are known as respondents. The investigator contacts the
respondents personally and collects the desired information from them by asking questions pertaining to the inquiry.
As a result, the data are likely to be original, fairly accurate and reliable, as well as prompt and uniform. This method
can be used when the number of persons to be interviewed is not very large. However, it is expensive and
time-consuming, particularly when a large number of persons have to be contacted. The information obtained may
not be reliable if the investigators are not properly trained or have personal bias and prejudice.
2. Indirect oral interview: Sometimes the required information is collected not directly from the respondents but
indirectly from other persons, known as witnesses. This method of getting information from third parties rather than
from the persons who are directly involved is known as the indirect oral interview method. It is widely used by
committees and commissions appointed by the government. It can cover a large number of witnesses, and the opinion
of experts can also be sought. It is relatively less expensive in terms of time, money and effort. However, there is
always the possibility that the personal bias and motives of the witnesses affect the reliability and accuracy of the
information collected through indirect oral interview.
3. Information from local agents or correspondents: Under this method, local agents or correspondents are
appointed in different parts of the investigation area to collect information. These correspondents collect the required
information in their own way and style and transmit it to a higher agency. This method is suitable
when regular and continuous information covering a wide area is needed. It also has the advantage of personal
contact with the informants and a fair degree of accuracy. It is particularly used by newspapers, radio and television
agencies to obtain regular information about political and other happenings. Various government agencies
also use this method to get regular information about prices, estimates of agricultural output, etc. However, this
method may result in a lack of uniformity in the data and in personal bias of the correspondents, particularly when data
are to be collected from a large number of correspondents. Moreover, data collected through this method are not original
and may be less accurate.
4. Mailed questionnaire: Under this method, a questionnaire is mailed to the various respondents. A questionnaire
consists of question sheets or forms containing all the questions relating to the inquiry, so as to collect the
required information from the respondents. Respondents are requested to fill in the questionnaire and send it back
within a specified time. This method is suitable when the informants are educated and enlightened and spread over a
wide geographical area. It is quite economical in terms of money, time and effort, and it allows a fairly wide
coverage of the investigation. Moreover, the data are original because the required information is collected from
the persons concerned themselves. However, data collected by this method may be less accurate and may not be
uniform because there is no direct contact with the respondents. The method is of limited use as it cannot be used
when the respondents are uneducated, and in many cases the informants do not return the questionnaires. This
method is often used by business and research organizations.
5. Schedule to be filled in by the enumerators: Under this method, a standardized questionnaire is put in the
charge of enumerators. Enumerators are persons who help investigators in collecting data. The
enumerators contact the respondents, ask them the questions contained in the questionnaire and record the replies in
their own handwriting; the form filled in this way is called a schedule. This method is extensively used in
population censuses all over the world. It has the merits of wider coverage, personal contact with the
informants and a fair degree of uniformity and accuracy in the information obtained. However,
it is expensive and time-consuming, and it requires a large team of competent and trained enumerators.
Question 1: Define the concept of data. Discuss the methods of collecting primary data.
Questionnaire: A questionnaire is a list of questions or items used to gather data from respondents about their
attitudes, experiences, or opinions. Questionnaires can be used to collect quantitative and/or qualitative information.
Types of questionnaire:
i. Structured questionnaire.
ii. Unstructured questionnaire.
iii. Open-ended questionnaire.
iv. Closed-ended questionnaire.
v. Mixed questionnaire.
vi. Pictorial questionnaire.
Structured questionnaire:
It is a questionnaire used in formal inquiries, having definite questions. A structured questionnaire is one in
which there are definite, unambiguous, concrete and pre-ordered questions, with additional questions limited to those
necessary to clarify inadequate answers or to elicit (bring out) a more detailed response. It is carefully prepared and
used in studies of economics, social problems, administration, politics, etc.
Unstructured questionnaire:
In this type of questionnaire, only the direct questions, i.e. the questions requiring answers, are arranged in some
order. It is similar to the set of questions used in an oral or written interview. It is used at the time of the
interview and acts as a guide for the interviewer. It is flexible in working and is used in studies related to
families, personal experiences, beliefs, etc.
Open-ended questionnaire:
In this type of questionnaire, the views of the respondents are welcome. In an open questionnaire, some space is
given for respondents to express their views besides the set answer options. The respondent is free to express
his/her views and ideas. It is used in making intensive studies of a limited number of cases and does not provide
any structure for the respondent's reply, although the questions and their order are predetermined.
Closed-ended questionnaire:
It is a questionnaire in which responses are limited to the stated alternatives. There are limited options to answer
each question, and there is no chance for respondents to express their own views except through the given options.
Mixed questionnaire:
It is a questionnaire in which questions of various forms are set. In particular, this type of questionnaire is a
mix of closed-ended and open-ended questions. It is used in the field of social research.
Pictorial questionnaire:
It is a set of questions in which pictures are given, and the respondents answer the questions on the basis of those
pictures. It is rarely used, mainly in studies related to social attitudes in children.
Preparation of a questionnaire:
The layout or configuration of the material within a questionnaire, i.e. how it is arranged so as to interest the
respondents in replying to it, is called the format of a questionnaire. The success of the questionnaire method of data
collection depends upon the nature and configuration of the questions in the questionnaire. When designing a
questionnaire, the researcher should keep the following key points in mind, since they can determine the success or
failure of research conducted through a questionnaire.
Size: a. The questionnaire should be shorter than the schedule.
b. Its length and breadth should be appropriate.
c. Depending on the nature of the research, it should generally not be more than two or three pages.
Appearance: a. The questionnaire should be printed on good quality paper.
b. It should have an attractive layout, which creates a good impression on the
respondents.
Clarity: The questions should be short and clear in terms, tenor and expression.
Communicability: The questions of the questionnaire should be able to catch the interest of
the respondent.
There are certain requisites that a questionnaire should satisfy. The common and accepted requisites of a
questionnaire are given below.
Questions:
The interrogative sentences used in a questionnaire, schedule or interview for the collection of information are
called questions. The questions relate to the facts, figures, knowledge and opinions to be collected from the
respondents. The questions used for the purpose of data collection are of different types.
Types of Questions:
i. Structured questions
ii. Open end questions
iii. Leading questions
iv. Ranking item questions
v. Ambiguous questions
vi. Presuming questions
vii. Hypothetical questions
viii. Personalized questions
ix. Behavioral questions
x. Memory questions
xi. Embarrassing questions
Structured questions:
Structured questions are those which are designed in some arrangement with their possible answers. These are
questions whose answers are already conceived and classified into different possible groups. They are precise, solid
and predetermined in nature.
The pre-considered answers to the questions are given in short form, in a few words or in very short sentences. The
answers are clustered into different groups, which helps in the tabulation work. In simple terms, the questions of a
questionnaire, schedule or interview usually imply structured questions. Structured questions involve clear-cut,
actual and predictable issues, so as to receive exhaustive and desirable responses.
A structured question whose answer falls into one of two possible groups is called a dichotomous question, while a
structured question whose answer falls into one of more than two possible groups is called a multiple choice question
or cafeteria question. E.g. "Are you educated?" is a dichotomous question, while "What is your age?" with several age
groups offered as answers is a multiple choice question. Broadly, structured questions are thus classified as
dichotomous questions and multiple choice questions.
Open end question:
When the possible answers to a question cannot be anticipated in advance, the question is called an open end question.
It is generally related to opinion. E.g. What type of teacher would you like to have?
Leading question:
A leading question is one that is asked in such a way as to suggest the answer, or one that contains the information
that the questioner is looking for. This type of question influences the respondent to answer in a certain way. E.g.
What do you think of the horrible effects of abortion?
Ambiguous question:
When a question can be understood and answered in more than one way, it is called an ambiguous
question. E.g. What type of schooling did you have?
Presuming question:
A question in which a presumption is made about the respondent is called a presuming question. E.g. How many
kids do you have? Here it is presumed that the respondent is married and has children.
Hypothetical question:
A question based upon a hypothetical scenario is called a hypothetical question. It is used to predict the future
behavior of the respondent. E.g. Would you buy a car if the tax were reduced by 50%?
Personalized question:
A question asked to the respondent to differentiate group action from personal action is called a personalized
question. E.g. Smoking is bad for health; do you smoke?
Behavior question:
A question designed to get information about the regular behavior of the respondent is called a behavior question.
E.g. How often do you go to watch films?
Memory question:
A question related to past events in the respondent's life is called a memory question. E.g. When did you buy your laptop?
Embarrassing question:
When respondents do not want to discuss their private affairs in public, the question is asked about the views of
others instead; such a question is called an embarrassing question. E.g. Some people using this scooter find a lot of
faults in it; can you guess what they were objecting to?
Question 5: What do you mean by questions? Write kinds of questions and discuss it.
Questions to be Avoided:
Data are collected from respondents by asking them questions. In designing a questionnaire, a schedule or the
questions of an interview, the following types of questions should be avoided in order to get precise information.
Population:
In any statistical investigation, the interest usually lies in studying various characteristics relating to items or
individuals belonging to a particular group. This group of individuals under study is known as the population. Thus, in
statistics, we define the population as an aggregate of objects, animate or inanimate, under study in any statistical
investigation. The population is also called the universe, and a complete enumeration of it is called a census. A
population may be made up of individuals, groups, associations, areas or households.
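Example:
● All the students of a college form a population.
● All the books in a library form a population.
● All the patients in a hospital form a population.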
Sample:
A sample is a smaller group selected from the population (universe) which represents the characteristics
of the population under study. Thus, a sample is a small portion of the population from which it has been
drawn and which may represent that population. Results obtained from the sample are generalized to the entire population.
Example:
● A small group of students selected from a college is a sample.
● Some books selected from a library form a sample.
● A group of a few patients selected from a hospital is a sample.
Sampling:
Sampling is the act, process or technique of selecting a suitable sample from a population which is a representative
part of a population for the purpose of determining the characteristics or parameter of the whole population.
Sampling Frame:
A complete list of sampling units, which represents the population to be covered, is called the sampling frame,
popularly known as the frame. The frame may consist of either a list of units or, when a sample of areas is being
taken, a map of areas. The frame should be accurate, free from omissions and duplications, exhaustive, adequate and
up to date, and the units should be identifiable without ambiguity. If a list is not available, it should be prepared
before conducting the main survey.
Sampling Error:
An error which arises from imperfection in the method of sampling, whether due to instrumental or to human factors, is
called a sampling error. In the sampling method, instead of observing all items, only the individuals in the sample are
observed, and the sample characteristics are used to estimate those of the population. The error, or deviation of the
observed value from the true value, in such an approximation is called a sampling error.
A sampling error occurs when the selected sample is not representative of the population. It arises because only a
part of the population has been used to estimate the population parameter and draw inferences about the population.
Hence, sampling errors are absent in a complete enumeration.
Sampling error also depends on the particular sampling technique used for selecting the sample; for example, simple
random sampling requires a larger sample size, and cluster sampling has the problem of clustered populations.
The main causes of sampling error are:
i. Faulty selection of a sample: Some sampling error arises from the use of a defective sampling technique (for
example, purposive or judgment sampling) for the selection of a sample.
ii. Substitution: If difficulties arise in enumerating a particular sampling unit included in a random sample, the
investigators usually substitute a convenient member of the population. This obviously leads to sampling
error, since the characteristics possessed by the substituted unit usually differ from those possessed by the unit
originally included in the sample.
iii. Faulty demarcation of sampling units: Sampling error due to defective demarcation of sampling
units is particularly significant in surveys. In such surveys, when dealing with borderline cases, it depends
more or less on the discretion of the investigator whether to include them in the sample or not.
iv. Improper choice of statistic to estimate the population parameter: Using a biased estimator may introduce
error when estimating the population parameter.
Increasing the sample size (i.e. the number of units in the sample) usually decreases the sampling error. In fact, in
many situations the sampling error (standard error) is inversely proportional to the square root of the sample size.
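The square-root relationship can be checked by simulation. The Python sketch below draws repeated simple random samples (without replacement) from a hypothetical, randomly generated population and compares the spread of the sample means with $\sigma/\sqrt{n}$. The population, the seed and the number of trials are illustrative assumptions, not values from these notes.

```python
import math
import random
import statistics

def simulated_standard_error(population, n, trials, rng):
    """Standard deviation of the sample mean over repeated SRSWOR draws of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(trials)]
    return statistics.pstdev(means)

rng = random.Random(42)  # fixed seed so the run is reproducible
# Hypothetical population of 10,000 values (mean about 50, s.d. about 10).
population = [rng.gauss(50, 10) for _ in range(10_000)]
sigma = statistics.pstdev(population)

results = {}
for n in (25, 100, 400):
    results[n] = simulated_standard_error(population, n, trials=2000, rng=rng)
    # sigma/sqrt(n) ignores the finite population correction, which is small here.
    print(f"n={n:3d}  simulated SE={results[n]:.3f}  sigma/sqrt(n)={sigma / math.sqrt(n):.3f}")
```

Each fourfold increase in n should roughly halve the simulated standard error.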
Non-Sampling Error:
Non-sampling errors primarily arise at the stages of observation, ascertainment and processing of data. They may
occur both in a complete enumeration survey and in a sample survey. Hence, they are very difficult to identify and
control.
Types of sampling
The sampling techniques usually depend upon the nature of data and type of enquiry. On the basis of these
characteristics, the sampling is broadly classified in the following ways:
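Probability sampling:
In probability sampling, every unit of the population has a known, non-zero chance of being included in the sample.
Its main types are:
1. Simple random sampling
2. Stratified sampling
3. Systematic sampling
4. Cluster sampling
5. Multi-stage sampling
Simple random sampling:
Simple random sampling is the process of drawing units from the population in such a way that each and every unit of
the population has an equal chance of being selected.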
SRSWOR:
If the units are drawn one at a time and a unit once drawn is not replaced before the next draw, the method is called
simple random sampling without replacement (SRSWOR).
Suppose the population has N units and a sample of size n is drawn without replacement. This can be done in
${}^{N}C_{n}$ ways, and each possible sample has the same probability $\frac{1}{{}^{N}C_{n}}$ of being selected.
SRSWR:
If each unit drawn is replaced before the next draw is made, the method is called simple random sampling with
replacement (SRSWR).
Suppose the population has N units and a sample of size n is drawn with replacement. This can be done in $N^{n}$ ways,
and each possible sample has the same probability $\frac{1}{N^{n}}$ of being selected.
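For a small population, the counts ${}^{N}C_{n}$ and $N^{n}$ can be verified by listing every possible sample. The Python sketch below uses a hypothetical population of five labelled units and the standard-library `itertools.combinations` and `itertools.product`. Note that the $N^{n}$ count treats SRSWR samples as ordered sequences of draws.

```python
from itertools import combinations, product
from math import comb

# Hypothetical toy population of N = 5 labelled units.
population = ["A", "B", "C", "D", "E"]
N, n = len(population), 2

# SRSWOR: unordered samples of n distinct units, NCn of them.
wor_samples = list(combinations(population, n))
# SRSWR: ordered sequences of n draws in which a unit may repeat, N**n of them.
wr_samples = list(product(population, repeat=n))

print(len(wor_samples), comb(N, n), 1 / comb(N, n))  # 10 10 0.1
print(len(wr_samples), N ** n, 1 / N ** n)           # 25 25 0.04
```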
Simple random sampling can thus be performed in two ways, viz. SRSWOR and SRSWR. In either case, the selection can be
made by any of the following methods.
i. Lottery method.
ii. Mechanical Randomization or Random Number Table.
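The lottery method amounts to writing each unit's serial number on a slip, mixing the slips and drawing n of them. Below is a minimal Python sketch that uses the standard `random` module as a stand-in for the bowl of slips. The function name `lottery_draw` and the seed are illustrative assumptions.

```python
import random

def lottery_draw(N, n, replace, rng):
    """Draw n unit numbers from 1..N, mimicking slips drawn from a well-mixed bowl."""
    slips = list(range(1, N + 1))  # one slip per population unit
    if replace:
        # SRSWR: the slip is put back after each draw, so a unit can appear again.
        return [rng.choice(slips) for _ in range(n)]
    # SRSWOR: a drawn slip is set aside, so every selected unit is distinct.
    return rng.sample(slips, n)

rng = random.Random(2024)  # seeded only so the example is reproducible
print(lottery_draw(50, 5, replace=False, rng=rng))
print(lottery_draw(50, 5, replace=True, rng=rng))
```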
Merits:
i. Since the sample units are selected at random, giving each unit an equal chance of being selected, the
element of subjectivity or personal bias is completely eliminated. As such, a simple random sample is more
representative of the population than a judgment or purposive sample.
ii. The statistician can ascertain the efficiency of the estimates of the parameters by considering the sampling
distribution of the statistics (estimates); e.g. the sample mean $\bar{y}_n$ as an estimate of the population mean
$\bar{Y}_N$ becomes more efficient as the sample size n increases.
Demerits:
i. The selection of an SRS requires an up-to-date frame, i.e. a completely catalogued population from which
samples are to be drawn. Frequently it is virtually impossible to identify the units in the population before
the sample is drawn, and this restricts the use of the simple random sampling technique.
ii. Administrative inconvenience: A simple random sample may result in the selection of sampling units
which are widely spread geographically, and in such a case the cost of collecting the data may be high in
terms of time and money.
iii. At times, a simple random sample may give quite non-random-looking results. For example, if we draw a
random sample of size 13 from a pack of 52 cards, we may get all the cards of the same suit. However, the
probability of such an outcome is extremely small.
iv. For a given precision, simple random sampling usually requires a larger sample size than stratified
random sampling.
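The chance of the extreme outcome in demerit iii can be computed directly. Of the ${}^{52}C_{13}$ equally likely samples, only 4 consist of a single suit. A short Python check:

```python
from math import comb

hands = comb(52, 13)          # number of possible samples of 13 cards (SRSWOR)
same_suit = 4 * comb(13, 13)  # one all-spades, all-hearts, all-diamonds or all-clubs sample
p = same_suit / hands
print(hands)        # 635013559600
print(f"{p:.2e}")   # 6.30e-12
```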
Stratified Sampling:
If the population is heterogeneous, simple random sampling may not give a representative sample; in such a case,
stratified sampling is used. In this method, the heterogeneous population is divided into different groups, or strata,
such that the units within a stratum are homogeneous while the strata differ from one another. Then samples are
selected from each stratum using simple random sampling.
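Below is a minimal Python sketch of stratified random sampling. It assumes proportional allocation, where each stratum's share of the sample equals its share of the population ($n_h = n N_h / N$). The notes do not fix an allocation rule, and the strata and sizes used here are hypothetical.

```python
import random

def stratified_sample(strata, n, rng):
    """Proportional allocation n_h = n * N_h / N (rounded), then SRSWOR inside each stratum."""
    N = sum(len(units) for units in strata.values())
    return {name: rng.sample(units, round(n * len(units) / N))
            for name, units in strata.items()}

# Hypothetical population of 1,000 students divided into three faculties (strata).
strata = {
    "science": [f"SCI-{i}" for i in range(600)],
    "management": [f"MGT-{i}" for i in range(300)],
    "humanities": [f"HUM-{i}" for i in range(100)],
}
sample = stratified_sample(strata, n=50, rng=random.Random(7))
print({name: len(units) for name, units in sample.items()})
# {'science': 30, 'management': 15, 'humanities': 5}
```

With other stratum sizes, the rounded $n_h$ may not add up exactly to n and may need a small manual adjustment.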
Systematic Sampling:
A sampling technique in which only the first unit is selected with the help of a random number and the rest are
selected automatically according to some pre-designed pattern is called systematic sampling. This method of sampling
is usually employed if a complete and up-to-date list of sampling units is available and the units are arranged in
some systematic order, such as alphabetical, chronological or geographical order.
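Below is a minimal Python sketch of linear systematic sampling. It assumes the common rule of taking every k-th unit, with sampling interval $k = N/n$ (rounded down), after a random start between 1 and k. The frame and the seed are hypothetical.

```python
import random

def systematic_sample(frame, n, rng):
    """Linear systematic sampling: interval k = N // n, random start r in 1..k, then every k-th unit."""
    k = len(frame) // n    # sampling interval
    r = rng.randint(1, k)  # only this first unit is chosen at random
    # Positions r, r + k, ..., r + (n - 1)k (1-based), all within the frame.
    return [frame[pos - 1] for pos in range(r, r + n * k, k)]

# Hypothetical frame: 100 units already listed in some systematic order.
frame = list(range(1, 101))
print(systematic_sample(frame, n=10, rng=random.Random(3)))
```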
Cluster Sampling:
It is a sampling method used when the population is not homogeneous. The population is divided into non-overlapping
subpopulations called clusters. A simple random sample of clusters is selected, and all elements belonging to the
selected clusters are studied. Clusters are formed in such a way that the units within a cluster are heterogeneous and
the clusters are homogeneous with one another. For the same number of elements, it is usually less precise than simple
random sampling. Simple random sampling and cluster sampling are equivalent if the cluster size is unity. Examples of
clusters are colleges, towns, homes, etc.
In cluster sampling, clusters should be as small as possible, and the number of sampling units in each
cluster should be approximately the same.
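Below is a minimal Python sketch of cluster sampling as described above: a simple random sample of whole clusters is drawn, and every unit in the chosen clusters is enumerated. The colleges, cluster sizes and seed are hypothetical.

```python
import random

def cluster_sample(clusters, m, rng):
    """Select m whole clusters by SRSWOR and enumerate every unit in each selected cluster."""
    chosen = rng.sample(sorted(clusters), m)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 8 colleges (clusters) with 40 students each.
clusters = {f"college-{c}": [f"c{c}-s{s}" for s in range(40)] for c in range(1, 9)}
selected = cluster_sample(clusters, m=3, rng=random.Random(11))
print(sorted(selected), sum(len(units) for units in selected.values()))
```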
Multi-stage Sampling:
Multi-stage sampling, sometimes called multi-stage cluster sampling, is a development of cluster sampling. In
cluster sampling, after the selection of clusters, each and every unit of the selected clusters is enumerated. Instead
of enumerating all the units in the selected clusters, one can obtain better and more efficient results by subsampling
within the clusters. Hence, the method of sampling in which clusters are first selected and then a specified number
of elements is selected from each cluster is called subsampling or two-stage sampling.
In such a sampling design, the clusters, which form the units of sampling at the first stage, are called first stage
units (fsu) or primary sampling units (psu), and the elements within clusters are called second stage units (ssu). This
process can be generalized to three or more stages and is then called multi-stage sampling.
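The two-stage design differs from cluster sampling only in the second step, where units are subsampled instead of fully enumerated. Below is a minimal Python sketch that uses SRSWOR at both stages (an assumption, since other selection rules are possible) and a hypothetical population of colleges.

```python
import random

def two_stage_sample(clusters, m, k, rng):
    """Stage 1: SRSWOR of m clusters (psu). Stage 2: SRSWOR of k elements (ssu) within each."""
    first_stage = rng.sample(sorted(clusters), m)
    return {name: rng.sample(clusters[name], k) for name in first_stage}

# Hypothetical population: 8 colleges with 40 students each;
# select 3 colleges, then 10 students from each selected college.
clusters = {f"college-{c}": [f"c{c}-s{s}" for s in range(40)] for c in range(1, 9)}
sample = two_stage_sample(clusters, m=3, k=10, rng=random.Random(11))
for name, students in sorted(sample.items()):
    print(name, len(students))
```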
Question 9: Define probability sampling. Discuss in detail about the probability sampling.
Non-probability sampling:
If the units of the population are drawn according to the personal judgment of the researcher rather than by a random
mechanism, the sampling is called non-probability sampling.
● Judgment sampling
● Convenience sampling.
● Quota sampling.
● Snowball sampling.
● Purposive sampling.
Judgment sampling:
A sampling process in which the sample units are selected according to the personal judgment of the researcher or
investigator is called judgment sampling. The investigators include in the sample only those units of the population
which they think most appropriate for the study. Researchers often believe that they can obtain a representative
sample by using sound judgment, which will result in saving time and money.
Convenience sampling:
A sampling process in which the sample units are selected neither by probability nor by personal judgment but by
convenience is called convenience sampling. A sample obtained from easily available lists such as telephone
directories, automobile registrations, etc. is an example of a convenience sample.
Purposive sampling:
The method of sampling in which certain units are selected from the population according to specific purpose of
researcher is called purposive sampling.
Quota sampling:
It is a special case of stratified sampling without the use of probability, i.e. judgment sampling with
stratification. The sampling quotas may be fixed according to some specified characteristics such as income, sex,
occupation, religion, etc. Quota sampling is very popular in market surveys and public opinion polls because it is
cheaper than random sampling.
Snowball sampling:
The snowball sampling technique is used by researchers to identify potential subjects in studies where subjects are
hard to locate. In this method, survey subjects are selected on referral from other survey respondents: a respondent is
identified according to the objective of the study, and other respondents are identified through referrals from that
respondent. This technique is often used for hidden populations which are difficult for researchers to access, for
example drug users or sex workers. As sample members are not selected from a sampling frame, a snowball sample may
not be representative of the population.
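The referral process can be pictured as a chain-referral (breadth-first) search over a network of acquaintances. Below is a minimal Python sketch. It assumes a hypothetical referral list in which each respondent names the people they know, and it stops once a target number of respondents is reached.

```python
from collections import deque

def snowball_sample(referrals, seeds, target):
    """Follow referrals breadth-first from the seed respondents until target people are reached."""
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target:
        person = queue.popleft()
        sample.append(person)  # interview this respondent
        for contact in referrals.get(person, []):
            if contact not in seen:  # each person is recruited at most once
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network: each respondent names people they know.
referrals = {
    "R1": ["R2", "R3"],
    "R2": ["R4"],
    "R3": ["R4", "R5"],
    "R5": ["R6"],
}
print(snowball_sample(referrals, seeds=["R1"], target=5))  # ['R1', 'R2', 'R3', 'R4', 'R5']
```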
Question 10: Define non-probability sampling. Discuss in detail about the non-probability sampling.
Example 1
Determine the minimum sample size required so that the sample estimate lies within 10% of the true value with 95%
level of confidence when coefficient of variation is 60%.
Solution:
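Here, $d = 10\%$ of $\bar{Y} = 0.1\bar{Y}$, $1 - \alpha = 0.95$ so $Z_{\alpha/2} = 1.96$, and $C.V. = S/\bar{Y} = 0.6$.
$$P\left(|\bar{y} - \bar{Y}| \le 1.96\, S\sqrt{\frac{N-n}{Nn}}\right) \ge 0.95$$
Then,
$$0.1\,\bar{Y} = 1.96\, S\sqrt{\frac{N-n}{Nn}}$$
Squaring both sides, and since $\frac{N-n}{Nn} = \frac{1}{n} - \frac{1}{N}$,
$$(1.96)^2\left(\frac{1}{n} - \frac{1}{N}\right) = 0.01\left(\frac{\bar{Y}}{S}\right)^2$$
Assuming a large population ($\frac{1}{N} \approx 0$),
$$\frac{3.8416}{n} = 0.01\left(\frac{1}{C.V.}\right)^2 = 0.01\left(\frac{1}{0.6}\right)^2$$
or,
$$n = \frac{3.8416 \times (0.6)^2}{0.01} = 138.3 \approx 139$$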
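The derivation above reduces to $n_0 = (Z_{\alpha/2}\, C.V./r)^2$ for a relative error r. When N is known, the finite population correction $n = n_0/(1 + n_0/N)$ follows from $\frac{1}{n} - \frac{1}{N} = \frac{1}{n_0}$. Below is a minimal Python sketch of that formula (the function name is hypothetical). Sample sizes are rounded up, because n must be a whole number at least as large as the computed value.

```python
import math

def sample_size_from_cv(z, cv, rel_error, N=None):
    """Sample size so that |ybar - Ybar| <= rel_error * Ybar at the confidence level given by z; cv = S / Ybar."""
    n0 = (z * cv / rel_error) ** 2  # large-population value (1/N taken as 0)
    if N is None:
        return n0
    return n0 / (1 + n0 / N)        # solves 1/n - 1/N = 1/n0 for n

print(math.ceil(sample_size_from_cv(1.96, 0.6, 0.10)))   # Example 1 -> 139
print(math.ceil(sample_size_from_cv(2.576, 0.4, 0.05)))  # Numerical problem 1 -> 425
```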
Example 2:
In measuring reaction time, a psychologist estimates that the standard deviation is 0.05 seconds. How large a
sample of measurements must be taken in order to be 99% confident that the error of his estimate will not exceed
0.01 seconds?
Solution:
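Here, $\sigma = 0.05$, $d = 0.01$ and $1 - \alpha = 0.99$,
or, $\alpha = 0.01$, so $Z_{\alpha/2} = 2.58$.
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{d}\right)^2 = \left(\frac{2.58 \times 0.05}{0.01}\right)^2 = (12.9)^2 = 166.41 \approx 167$$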
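For a mean with known (or prior) standard deviation σ and permissible error d, the formula is $n_0 = (Z_{\alpha/2}\,\sigma/d)^2$. The same finite population correction $n = n_0/(1 + n_0/N)$ is used in the numerical problems later. Below is a minimal Python sketch (the function name is hypothetical), rounding up to the next whole unit.

```python
import math

def sample_size_mean(z, sigma, d, N=None):
    """Sample size to estimate a mean within +/- d: n0 = (z * sigma / d)^2, corrected for a finite N if given."""
    n0 = (z * sigma / d) ** 2
    return n0 if N is None else n0 / (1 + n0 / N)

print(math.ceil(sample_size_mean(2.58, 0.05, 0.01)))        # Example 2 -> 167
print(math.ceil(sample_size_mean(2.58, 2.0, 0.8, N=5000)))  # Numerical problem 3 (S^2 = 4, so S = 2) -> 42
```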
Example 3:
A researcher wants to conduct a survey of disabled persons in Kathmandu valley. What should be the sample size if the
prior estimate of the proportion of disabled persons in the population is 10%, the desired error in estimation is 2%
and the level of significance is 5%?
Solution:
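Here, $P = 0.1$, $Q = 1 - P = 0.9$, $d = 0.02$ and $\alpha = 0.05$, so $Z_{\alpha/2} = 1.96$.
$$n = \frac{Z_{\alpha/2}^2\, PQ}{d^2} = \frac{(1.96)^2 \times 0.1 \times 0.9}{(0.02)^2} = 864.36 \approx 865$$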
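For a proportion, the formula used here is $n_0 = Z_{\alpha/2}^2 PQ/d^2$, where d is the permissible error in absolute terms (0.02 means two percentage points). Below is a minimal Python sketch with a hypothetical function name. The optional N applies the same finite population correction as for the mean.

```python
import math

def sample_size_proportion(z, p, d, N=None):
    """n0 = z^2 * P * Q / d^2 with Q = 1 - P, corrected for a finite population N if given."""
    n0 = z ** 2 * p * (1 - p) / d ** 2
    return n0 if N is None else n0 / (1 + n0 / N)

print(math.ceil(sample_size_proportion(1.96, 0.10, 0.02)))  # Example 3 -> 865
```

Since PQ is largest at P = 0.5, that value gives the most conservative sample size when no prior estimate of P is available.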
Example 4:
Solution:
Here,
$$n = \frac{Z_{\alpha/2}^2\, PQ}{d^2} = \frac{(2)^2 \times 0.2 \times (1 - 0.2)}{(0.05)^2} = 256.$$
The optimal sample size can also be determined for testing the mean of a normal distribution with a one-tailed test,
based on the level of significance (α) and the power of the test (1 − β). For testing $H_0: \mu = \mu_0$ against the
one-sided alternative $H_A: \mu = \mu_1 > \mu_0$, the formula for the sample size is shown below without proof.
$$n = \frac{\sigma^2 \left(Z_{1-\beta} + Z_{1-\alpha}\right)^2}{|\mu_0 - \mu_1|^2}$$
Example 5:
Suppose the average cholesterol level in children found in some paediatric genetics research is 175 mg%/ml, and it
is suspected to be as high as 190 in the hereditary record group. The researchers wish to test the hypothesis
$H_0: \mu = \mu_0 = 175$ vs $H_A: \mu = \mu_1 = 190 > 175$ (suspected level) and wish to determine the ideal sample size
for their research at the 5% level of significance with a power of 90%. From their previous research, the standard
deviation of cholesterol was found to be 50.
Solution:
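Here, $\mu_0 = 175$, $\mu_1 = 190$, $\sigma = 50$, $\alpha = 0.05$ (one-tailed) so $Z_{1-\alpha} = 1.645$, and power
$1 - \beta = 0.90$ so $Z_{1-\beta} = 1.28$.
$$n = \frac{\sigma^2 (Z_{1-\beta} + Z_{1-\alpha})^2}{|\mu_0 - \mu_1|^2} = \frac{(50)^2 (1.28 + 1.645)^2}{(175 - 190)^2} = \frac{2500 \times 8.5556}{225} = 95.06 \approx 96$$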
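Below is a minimal Python sketch of the one-tailed sample size formula given above. It uses the standard-library `statistics.NormalDist().inv_cdf` for the normal quantiles instead of table values, so $Z_{1-\beta} = 1.2816$ rather than the rounded 1.28. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_one_sided(mu0, mu1, sigma, alpha, power):
    """n = sigma^2 (Z_{1-beta} + Z_{1-alpha})^2 / |mu0 - mu1|^2 for a one-tailed test of a normal mean."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_alpha = z(1 - alpha)    # about 1.645 for alpha = 0.05
    z_beta = z(power)         # about 1.2816 for power = 0.90
    return sigma ** 2 * (z_beta + z_alpha) ** 2 / (mu0 - mu1) ** 2

n = sample_size_one_sided(175, 190, 50, alpha=0.05, power=0.90)
print(math.ceil(n))  # Example 5 -> 96
```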
Numerical problem:
1. Determine the minimum sample size required so that the sample estimate lies within 5% of the true value with
99% level of confidence when coefficient of variation is 40%.
Solution:
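Here, $C.V. = \frac{S}{\bar{Y}} = 0.4$, $d = 5\%$ of $\bar{Y} = 0.05\bar{Y}$, $1 - \alpha = 99\% = 0.99$,
so $\alpha = 1 - 0.99 = 0.01$ and $Z_{\alpha/2} = 2.576$.
$$P\left(|\bar{y} - \bar{Y}| \le 2.576\, S\sqrt{\frac{N-n}{Nn}}\right) \ge 0.99$$
Then,
$$0.05\,\bar{Y} = 2.576\, S\sqrt{\frac{N-n}{Nn}}$$
Squaring both sides,
$$(2.576)^2\left(\frac{1}{n} - \frac{1}{N}\right) = 0.0025\left(\frac{\bar{Y}}{S}\right)^2$$
Assuming a large population ($\frac{1}{N} \approx 0$),
$$\frac{6.635776}{n} = 0.0025\left(\frac{1}{C.V.}\right)^2 = 0.0025\left(\frac{1}{0.4}\right)^2$$
or,
$$n = \frac{6.635776 \times (0.4)^2}{0.0025} = 424.6897 \approx 425$$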
2. What should be the size of sample if a simple random sample from a population of 4000 items is to be drawn to
estimate the percent defective within 2% with 95% probability? What would be the size of sample if the population
is assumed to be infinite?
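Solution: Given,
N = 4000
d = 2% = 0.02
$1 - \alpha = 95\% = 0.95$, so $\alpha = 0.05$ and $Z_{\alpha/2} = 1.96$
Since no prior estimate of the proportion defective is given, take P = 0.5 (this makes PQ as large as possible), so
Q = 1 – P = 0.5
Now, for an infinite population,
$$n_0 = \frac{Z_{\alpha/2}^2\, PQ}{d^2} = \frac{(1.96)^2 \times 0.5 \times 0.5}{(0.02)^2} = 2401$$
Then, for the finite population of N = 4000, the sample size is
$$n = \frac{n_0}{1 + \frac{n_0}{N}} = \frac{2401}{1 + \frac{2401}{4000}} = 1500.4 \approx 1501$$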
3. Determine the sample size from the following information; N = 5000, S2 = 4, d = 0.8, α = 1%.
Solution:
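Here,
N = 5000, $S^2 = 4$, $d = 0.8$ and $\alpha = 0.01$, so $Z_{\alpha/2} = 2.58$.
Now,
$$n_0 = \frac{Z_{\alpha/2}^2\, S^2}{d^2} = \frac{(2.58)^2 \times 4}{(0.8)^2} = 41.60$$
Then, the sample size is
$$n = \frac{n_0}{1 + \frac{n_0}{N}} = \frac{41.60}{1 + \frac{41.60}{5000}} = 41.26 \approx 42$$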
Exercise 4
21. What should be the size of sample if a simple random sample from a population of 4000 items is to be drawn to
estimate the percent defective within 2% of the true value with 95% probability? What would be the size of sample
if the population is assumed to be infinite?
Hint: sample size $n_0 = \left(\frac{Z_{\alpha/2}}{d}\right)^2 PQ$.
If N is finite, $n = \frac{n_0}{1 + \frac{n_0}{N}}$.
Questions:
1. Define the concept of data. Discuss the methods of collecting primary data.
2. Define questionnaire. What are the types of questionnaire and describe its types.
5. What do you mean by questions? Write kinds of questions and discuss it.
12. Determine the minimum sample size required so that the sample estimate lies within 10% of the true value with
95% level of confidence when coefficient of variation is 60%.
13. In measuring reaction time, a psychologist estimates that the standard deviation is 0.05 seconds. How large a
sample of measurements must be taken in order to be 99% confident that the error of his estimate will not exceed
0.01 seconds?
14. A researcher wants to conduct a survey of disabled persons in Kathmandu valley. What should be the sample size if
the prior estimate of the proportion of disabled persons in the population is 10%, the desired error in estimation is
2% and the level of significance is 5%?
15. Determine the minimum sample size required so that the sample estimate lies within 5% of the true value with
99% level of confidence when coefficient of variation is 40%.
16. What should be the size of sample if a simple random sample from a population of 4000 items is to be drawn to
estimate the percent defective within 2% with 95% probability? What would be the size of sample if the
population is assumed to be infinite?
17. Determine the sample size from the following information; N = 5000, S2 = 4, d = 0.8, α = 1%.
18. What should be the size of sample if a simple random sample from a population of 4000 items is to be drawn to
estimate the percent defective within 2% of the true value with 95% probability? What would be the size of
sample if the population is assumed to be infinite?