What Is Data Collection
In statistics, data collection is the process of gathering information from all the relevant
sources to find a solution to the research problem. It helps to evaluate the outcome of
the problem. Data collection methods allow a person to reach an answer to the
relevant question. Most organizations use data collection methods to make
assumptions about future probabilities and trends. Once the data is collected, it
must be organized.
The raw material of every data collection method is the data itself. Data can be classified
into two types, namely primary data and secondary data. The primary importance of
data collection in any research or business process is that it helps to determine many
important things about the company, particularly its performance. So, the data
collection process plays an important role in all streams. Depending on the type of
data, data collection methods are divided into two categories, namely primary data
collection methods and secondary data collection methods.
In this article, the different types of data collection methods and their advantages and
limitations are explained.
Observation Method
The observation method is used when the study relates to behavioural science. It is
planned systematically and is subject to many controls and checks. The
different types of observations are:
Interview Method
In this method, data is collected in the form of verbal responses. It is achieved in two ways,
such as
Questionnaire Method
In this method, a set of questions is mailed to the respondents. They read the questions,
reply and subsequently return the questionnaire. The questions are printed in a
definite order on the form. A good survey should have the following features:
Schedules
This method is similar to the questionnaire method with a slight difference. Here,
enumerators are specially appointed for the purpose of filling in the schedules. The
enumerator explains the aims and objects of the investigation and may remove any
misunderstandings that have come up. Enumerators should be trained to perform their
job with hard work and patience.
Government publications
Public records
Historical and statistical documents
Business documents
Technical and trade journals
Diaries
Letters
Unpublished biographies, etc.
Sources of Data
The sources of data can be classified into two types: statistical and non-statistical.
Statistical sources refer to data that is gathered for official purposes and include
censuses and officially administered surveys. Non-statistical sources refer to the
collection of data for other administrative purposes or for the private sector.
1. Internal sources
When data is collected from the reports and records of the organisation itself, these are
known as internal sources.
For example, a company publishes its annual report on profit and loss, total sales, loans,
wages, etc.
2. External sources
When data is collected from sources outside the organisation, these are known as
external sources. For example, if a tour and travel company obtains information on
Karnataka tourism from Karnataka Transport Corporation, it would be known as an external
source of data.
Types of Data
A) Primary data
B) Secondary data
(A) Investigator
(B) Enumerator
(C) Informant/Respondent
Answer:
Investigator ● One who conducts an investigation, i.e., a statistical enquiry, and seeks
information is known as an investigator.
● It can be an individual person or an organisation.
Enumerator ● An enumerator is a person who helps investigators in the collection of data.
Informant ● An informant is the respondent who supplies the information to the
investigators or enumerators.
Direct Personal Investigation
Q.1 Explain the direct personal investigation method of collecting primary data. Discuss its
merits and demerits.
Answer:
(A) Direct personal investigation ● Under this method, the investigator obtains first-hand
information from the respondents themselves.
Q.1 Explain the indirect oral investigation method of collecting primary data. Give its merits
and demerits.
Answer:
(A) Indirect oral investigation ● Under this method, instead of directly approaching the
informants, the investigators interview several other persons who are directly or indirectly
in touch with the informants.
(1) Wide coverage ● A wide area can be brought under investigation through this method.
(1) Indirect information ● Since the information is not collected directly from the party,
there is a possibility that it will not be fully true.
(2) Lack of accuracy ● As compared to direct personal investigation, the degree of accuracy
of the data is likely to be lower.
(3) Lack of uniformity ● Information collected from different persons for the same party may
not be homogeneous and comparable.
(4) Possibility of biased information ● The respondent/witness can modify the information
according to his personal interest.
Answer:
(A) Information through correspondents ● Under this method, local agents or correspondents
are appointed and trained to collect the information from the respondents.
(1) Wide coverage ● This method is useful where the field of investigation is very wide and
the information is to be collected from different parts of the country.
(3) Suitable for special purposes ● This method is suitable for some special purpose
investigations.
(1) Lack of uniformity ● The information supplied by different correspondents often lacks
homogeneity; hence it is not comparable.
(2) Lack of reliability ● Data obtained using this method may not be very reliable because of the
possibility of personal bias and prejudice of the enumerator.
(3) Less accuracy ● This method cannot be used where a high degree of accuracy is
required.
(4) Costly ● A lot of time and money is spent to collect the information through
correspondence.
Telephonic interviews
Q.1 Explain the telephonic interview method of collecting primary data. Give its merits and
demerits.
Answer:
(A) Telephonic interviews ● Under this method, data is collected through interviews over
the telephone.
(B) Following are the merits of telephonic interviews:
(1) Wide coverage ● This method is useful where the field of investigation is very wide and the
information is to be collected from different parts of the country.
(3) Reliability ● The collected data is reliable as it is obtained directly from the party.
(1) Limited use ● The disadvantage of this method is limited accessibility to people. This method
is not possible for people who do not own a telephone or mobile.
(2) Visual feedback is not possible ● Telephone interviews also prevent the interviewer from
seeing the visual reactions of the respondents, which are helpful in obtaining information
on sensitive issues.
Q.1 Discuss the mailed questionnaire method of collecting primary data. What are its merits and
demerits?
Answer:
(A) Mailed questionnaire method ● Under this method, a questionnaire containing a number
of questions related to the investigation is prepared.
Or
Answer:
(A) Questionnaires filled by enumerators ● Under this method, an enumerator personally
visits informants along with a questionnaire, asks questions, and notes down their responses
in the questionnaire in his own language.
(2) Better responses ● The presence of the enumerator may induce the respondents to give
information.
Answer:
(A) Meaning of secondary data ● Secondary data refers to the data that has already been
collected by some other person or agency and is used by us.
The sources of secondary data fall into two categories:
1. Published sources
2. Unpublished sources
(1) Published sources ● Published sources mean data available in printed form. They include
the following:
1. Magazines, journals, and periodicals published by various government, semi-government,
and private organisations; data related to birth, death, education, etc., published by the
government at various levels; data regarding prices, production, etc., published by
Economic Times, Financial Express, etc.
2. Reports of various committees or commissions, like the pay commission report, the
finance commission report, etc.
3. Reports of international agencies that are regularly published by agencies like
UNO, WHO, IMF, etc.
(2) Unpublished sources ● Not all statistical material is published.
● This category includes the records maintained by various government
and private offices.
Answer:
Following are the main precautions to be taken while using secondary data:
(1) Reliable agency ● We must ensure that the agency that has published the data is
reliable.
(2) Suitability for the purpose of enquiry ● The investigator must ensure that the data is
suitable for the purpose of the present enquiry.
Special Considerations
A representative sample is generally expected to yield the best collection of
results. Representative samples are known for collecting results, insights,
and observations that can be confidently relied on as a representation of the
larger population being studied. As such, representative sampling is typically
the best method for marketing or psychology studies.
While representative samples are often the sampling method of choice, they
do have some barriers. Oftentimes, it is impractical in terms of time, budget,
and effort to collect the data needed to build a representative sample. When using
stratified random sampling, researchers must identify characteristics, divide
the population into strata, and proportionally choose individuals for the
representative sample.
In general, the larger the population target to be studied the more difficult
representative sampling can be. This method can be especially difficult for an
extremely large population such as an entire country or race. When dealing
with large populations it can also be difficult to obtain the desired members
for participation. For example, individuals who are too busy to participate will
be under-represented in the representative sample. Understanding the pros
and cons of both representative sampling and random sampling can help
researchers select the best approach for their specific study.
Probability Sampling
Non-probability Sampling
Example (simple random sampling):
Suppose we want to select a simple random sample of 200 students from a school of 500
students. Here, we can assign a number from 1 to 500 to every student in the school
database and use a random number generator to select a sample of 200 numbers.
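
The selection described in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simple_random_sample` and the fixed seed are hypothetical choices made here for reproducibility, and student numbers are assumed to run from 1 to 500.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Pick `sample_size` distinct student numbers from 1..population_size.

    Every student number has the same chance of being selected, and no
    number can be selected twice (sampling without replacement).
    """
    rng = random.Random(seed)  # seed is optional; used here for repeatability
    return rng.sample(range(1, population_size + 1), sample_size)


# Hypothetical run matching the example: 200 students out of 500.
chosen = simple_random_sample(500, 200, seed=42)
print(len(chosen), min(chosen) >= 1, max(chosen) <= 500)
```

Sampling without replacement is used so that no student can appear in the sample twice.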
Systematic Sampling
In the systematic sampling method, the items are selected from the target population by
choosing a random starting point and then selecting the other members after a fixed
sampling interval. The interval is calculated by dividing the total population size by the
desired sample size.
Example:
Suppose the names of 300 students of a school are sorted in reverse alphabetical
order. To select a systematic sample of 20 students, the sampling interval is
300 / 20 = 15. We randomly select a starting number, say 5. From number 5 onwards,
we select every 15th person from the sorted list (5, 20, 35, ..., 290). Finally, we end up
with a sample of 20 students.
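
As a rough sketch of the procedure, the Python snippet below walks a sorted list with a fixed interval. The function name `systematic_sample` is a hypothetical helper, and the students are represented simply by their positions 1 to 300. With a starting number of 5 and an interval of 15, the selected positions are 5, 20, 35 and so on up to 290, which gives 20 students.

```python
import random


def systematic_sample(items, interval, start=None, seed=None):
    """Select every `interval`-th item, beginning at 1-based position `start`.

    If no start is given, a random start between 1 and `interval` is drawn,
    which is the usual way to randomise a systematic sample.
    """
    if start is None:
        start = random.Random(seed).randint(1, interval)
    return items[start - 1::interval]


# Students are represented by their position in the sorted list (1..300).
students = list(range(1, 301))
sample = systematic_sample(students, interval=15, start=5)
print(sample[:3], sample[-1], len(sample))  # [5, 20, 35] 290 20
```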
Stratified Sampling
In a stratified sampling method, the total population is divided into smaller groups to
complete the sampling process. The small groups are formed based on a few
characteristics in the population. After separating the population into smaller groups,
the statisticians randomly select the sample from each group.
For example, there are three bags (A, B and C), each with different balls. Bag A has 50
balls, bag B has 100 balls, and bag C has 200 balls. We have to choose a sample of
balls from each bag proportionally. Taking 10% of each bag, for instance, gives 5 balls
from bag A, 10 balls from bag B and 20 balls from bag C.
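
The proportional allocation in the bag example can be sketched as follows. The helper name `proportional_stratified_sample`, the ball labels and the 10% sampling fraction are assumptions made for illustration; the 10% fraction is simply the one that reproduces the 5, 10 and 20 balls mentioned above.

```python
import random


def proportional_stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction of members, at random, from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)  # stratum's share of the sample
        sample[name] = rng.sample(members, k)
    return sample


# Three bags (strata) with 50, 100 and 200 balls; labels are made up.
bags = {
    "A": [f"A{i}" for i in range(50)],
    "B": [f"B{i}" for i in range(100)],
    "C": [f"C{i}" for i in range(200)],
}
picked = proportional_stratified_sample(bags, fraction=0.10, seed=1)
sizes = {name: len(balls) for name, balls in picked.items()}
print(sizes)  # {'A': 5, 'B': 10, 'C': 20}
```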
Clustered Sampling
In the clustered sampling method, clusters or groups of people are formed from the
population set. The groups have similar significant characteristics. Also, they have an
equal chance of being a part of the sample. This method uses simple random sampling
to select the clusters from the population.
Example:
An educational institution has ten branches across the country with almost the same number
of students. If we want to collect some data regarding facilities and other things, we
can't travel to every unit to collect the required data. Hence, we can use random
sampling to select three or four branches as clusters.
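
A minimal sketch of this idea in Python is given below. The branch names, the 100 students per branch and the helper `cluster_sample` are all hypothetical; the point is only that whole branches are chosen at random, and every student in a chosen branch is then available for data collection.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick `n_clusters` whole clusters and keep all their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # pick cluster names
    return {name: clusters[name] for name in chosen}


# Ten hypothetical branches, each with 100 made-up student identifiers.
branches = {
    f"branch_{i}": [f"b{i}_student_{j}" for j in range(100)]
    for i in range(1, 11)
}
selected = cluster_sample(branches, n_clusters=3, seed=7)
print(sorted(selected))
```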
All these four methods can be understood in a better manner with the help of the figure
given below. The figure contains various examples of how samples will be taken from
the population using different techniques.
What is Non-Probability Sampling?
The non-probability sampling method is a technique in which the researcher selects the
sample based on subjective judgment rather than random selection. In this method,
not all the members of the population have a chance to participate in the study.
Convenience Sampling
In a convenience sampling method, the samples are selected from the population
directly because they are conveniently available to the researcher. The samples are
easy to select, and the researcher does not try to choose a sample that represents the
entire population.
Consecutive Sampling
Consecutive sampling is similar to convenience sampling with a slight variation. The
researcher picks a single person or a group of people for sampling. Then the researcher
studies them for a period of time, analyzes the result, and moves on to another group if
needed.
Quota Sampling
In the quota sampling method, the researcher forms a sample of individuals chosen to
represent the population based on specific traits or qualities. The
researcher chooses sample subsets that yield a useful collection of data that
can be generalized to the entire population.
Snowball Sampling
Snowball sampling is also known as a chain-referral sampling technique. In this method,
the members of the target population have traits that make them difficult to find. So, each
identified member of the population is asked to find other sampling units. Those sampling
units also belong to the same targeted population.
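
The chain-referral idea can be illustrated with a small Python sketch. Everything here is hypothetical: the participant labels, the `referrals` mapping (who each person is able to refer) and the helper `snowball_sample`. The sketch starts from one known member and keeps following referrals until the desired sample size is reached.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Grow a sample by asking each recruited member for further referrals."""
    sample = []
    queue = deque(seeds)   # people waiting to be recruited, in referral order
    seen = set(seeds)      # avoids recruiting the same person twice
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample


# Hypothetical referral network: each person names others they know.
referrals = {
    "P1": ["P2", "P3"],
    "P2": ["P4"],
    "P3": ["P4", "P5"],
    "P4": [],
    "P5": ["P1"],
}
sample = snowball_sample(referrals, seeds=["P1"], max_size=4)
print(sample)  # ['P1', 'P2', 'P3', 'P4']
```

Because the sample depends entirely on who refers whom, it is not random, which is why snowball sampling is classed as a non-probability method.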
Probability sampling methods:
● These are also known as random sampling methods.
● These are used for research which is conclusive.
● These involve a long time to get the data.
● There is an underlying hypothesis in probability sampling before the study starts. Also,
the objective of this method is to validate the defined hypothesis.
Non-probability sampling methods:
● These are also called non-random sampling methods.
● These are used for research which is exploratory.
● These are easy ways to collect the data quickly.
● The hypothesis is derived later by conducting the research study in the case of
non-probability sampling.