Data Collection and Processing
UNIT 3
DATA COLLECTION
Data collection is usually the first and most important step in research, irrespective of the field. The approach to data collection differs between fields of study, depending on the information required.
TYPES OF DATA
TIME CONSUMING
EXPENSIVE
VARIOUS METHODS
AVAILABILITY
RELIABILITY
ACCURACY
FACTORS INFLUENCING PRIMARY DATA COLLECTION
• OBJECTIVES OF RESEARCH
• TIME
• COST
• AVAILABILITY OF RESEARCH STAFF
• AVAILABILITY OF RESPONDENTS
METHODS OF COLLECTING PRIMARY DATA
• OBSERVATION METHOD
• EXPERIMENTATION METHOD
• SURVEY METHOD
• INTERVIEW METHOD
OBSERVATION METHOD
Advantages
• Universal method
Disadvantages
• It may not give complete information
Types of Observation
• Disguised and Undisguised
• Mechanical
Experimentation Method
• The experimental method involves the manipulation of variables to establish cause-and-effect relationships. The key features are controlled methods and the random allocation of participants into control and experimental groups.
Disadvantages
• Time consuming
Types of Experimentation
• Lab Experiment
• Natural Experiment
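The random allocation of participants described above can be sketched in a few lines of Python. This is only an illustration: the participant labels, the group size of 20, and the helper name `allocate_groups` are assumptions, not part of the original material.

```python
import random

def allocate_groups(participants, seed=None):
    """Randomly split participants into a control and an experimental group."""
    rng = random.Random(seed)          # seeded generator so the split is repeatable
    shuffled = list(participants)      # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "experimental": shuffled[half:]}

# Hypothetical participant labels P01..P20
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = allocate_groups(participants, seed=42)
print("Control:", groups["control"])
print("Experimental:", groups["experimental"])
```

Because the list is shuffled before it is split, every participant has the same chance of landing in either group, which is the point of random allocation.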
Interview Method
• Time consuming
• Expensive
• Documentation and paperwork
• Respondent and interviewer bias
• Sampling problem
Types of Interview
• Personal Interview
• Focused Group Interview
• Mail
• Internet
Schedules
The schedule is a formalized
set of questions, statements,
and spaces for answers,
provided to the enumerators
who ask questions to the
respondents and note down
the answers. While
a questionnaire is filled by the
informants themselves,
enumerators fill
the schedule on behalf of the
respondent.
To provide a standardized tool
for observation
Types of Schedules
• Documents Schedule
• Survey Schedule
• Observation Schedule
• Structured or Unstructured
Study all aspects of problem
Clarity
a Schedule
Pre-testing of Schedule
Division of Schedule
Features
• Nature of Respondents
• Use of Computers
• Time
• Response Rate
• Cost
• Area of Coverage
Questionnaire
• A questionnaire is a research instrument that consists of a set of questions or other types of prompts that aims to collect information from a respondent. A research questionnaire is typically a mix of closed-ended questions and open-ended questions. Open-ended, long-form questions offer the respondent the ability to elaborate on their thoughts. Research questionnaires were developed in 1838 by the Statistical Society of London.
• The data collected from a data collection questionnaire can be both qualitative and quantitative in nature. A questionnaire may or may not be delivered in the form of a survey, but a survey always consists of a questionnaire.
Importance of Questionnaire
• It collects the viewpoints of people
• More data can be collected
• It gives a summary of the demographic situation
• Less time consuming
• It helps study behaviour
• It collects sensitive information
• It creates a database
Essentials of Good Questionnaire
• Relevant Questions
• Clarity
• Restricted number of Questions
• Type of Questions: Open and Closed ended
• Sequence of Questions
• Pilot Study
Advantages of Primary Data
• Data collected is up-to-date
• Relevant and specific to your research objectives
• Competitors have no access to your data
Primary research can, however, be expensive.
Advantages of Secondary Data
• Quick Decision
• Less Time consuming
• Less Processing of Data
• No Sampling Errors
• Large volume of Data
Limitations of Secondary Data
• Problem of Accuracy and Reliability
• Problem of Adequacy
• Lack of In-depth information
• Lack of potential in handling specific problem
• Problem of Biased information
Sampling
• INFORMATION CAN BE COLLECTED
• RESEARCHER
• ECONOMICAL
• SUITABLE FOR ACADEMIC AND MARKET-BASED RESEARCH
• QUALITY RESEARCH WORK
Method of Sampling
• Probability Method
• Non-Probability Method
Probability sampling is a sampling technique in which a researcher sets a few selection criteria and chooses members of a population randomly. All the members have an equal opportunity to be a part of the sample with this selection parameter.
In non-probability sampling, the researcher chooses members for research by judgment or convenience rather than by random selection. This sampling method is not a fixed or predefined selection process. This makes it difficult for all elements of a population to have equal opportunities to be included in a sample.
Simple Random Sampling
• It is a reliable method of obtaining information where every single member of a population is chosen randomly, merely by chance. Each individual has the same probability of being chosen to be a part of a sample.
For example, in an organization of 500 employees, if the HR team decides on conducting team building activities, it is highly likely that they would prefer picking chits out of a bowl. In this case, each of the 500 employees has an equal opportunity of being selected.
• Lottery Method
• Random Tables
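The lottery method for the 500-employee example can be imitated with Python's standard `random` module. The sketch below is illustrative only: the employee IDs and the sample size of 20 are assumptions, since the original example does not say how many employees are drawn.

```python
import random

# Hypothetical employee IDs EMP001..EMP500 (the "chits" in the bowl)
employees = [f"EMP{i:03d}" for i in range(1, 501)]

rng = random.Random(7)                 # seeded so the draw can be repeated
# Draw 20 distinct employees; every employee is equally likely to be picked.
team = rng.sample(employees, k=20)
print(team)
```

`sample` draws without replacement, so no employee can be selected twice, just as a chit cannot be drawn twice from the bowl.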
Systematic Sampling
• Researchers use the systematic sampling method to choose the sample members of a population at regular intervals. It requires the selection of a starting point for the sample and a sample size, so that selection can be repeated at regular intervals. This type of sampling method has a predefined range, and hence this sampling technique is the least time-consuming.
For example, a researcher intends to collect a systematic sample of 500 people in a population of 5000. The researcher numbers each element of the population from 1 to 5000 and chooses every 10th individual to be a part of the sample (Total population / Sample size = 5000/500 = 10).
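The 5000/500 example can be worked through in code. This is a minimal sketch: the helper name `systematic_sample` is hypothetical, and choosing the starting point at random within the first interval is an assumption, since the text only says that a starting point must be selected.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Pick every k-th element, where k = population size // sample size."""
    interval = len(population) // sample_size         # 5000 // 500 = 10
    start = random.Random(seed).randrange(interval)   # assumed: random start in the first interval
    picks = [population[i] for i in range(start, len(population), interval)]
    return picks[:sample_size]

population = list(range(1, 5001))      # elements numbered 1 to 5000
sample = systematic_sample(population, 500, seed=1)
print(sample[:5], "...", len(sample), "members")
```

Once the starting point is fixed, the rest of the sample is fully determined by the interval, which is why the method is quick to carry out.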
Cluster Sampling
• Cluster sampling is a method where the researchers divide the entire population into sections or clusters that represent a population. Clusters are identified and included in a sample based on demographic parameters like age, sex, location, etc. This makes it very simple for a survey creator to derive effective inferences from the feedback.
• For example, if the United States government wishes to evaluate the number of immigrants living in the US, it can divide the country into clusters based on states such as California, Texas, Florida, Massachusetts, Colorado, Hawaii, etc. This way of conducting a survey will be more effective as the results will be organized into states and provide insightful immigration data.
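A small sketch of the state-based example follows. The respondent records are invented for illustration, the helper name `cluster_sample` is hypothetical, and the step of randomly choosing which clusters to survey is the usual textbook form of cluster sampling rather than something stated in the text above.

```python
import random
from collections import defaultdict

# Hypothetical respondents, each tagged with the state (cluster) they live in
states = ["California", "Texas", "Florida", "Massachusetts", "Colorado", "Hawaii"]
respondents = [(f"R{i:03d}", states[i % len(states)]) for i in range(60)]

def cluster_sample(records, n_clusters, seed=None):
    """Group records by cluster, then keep every member of randomly chosen clusters."""
    clusters = defaultdict(list)
    for person, state in records:
        clusters[state].append(person)
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k=n_clusters)   # assumed: clusters picked at random
    return {state: clusters[state] for state in chosen}

selected = cluster_sample(respondents, n_clusters=2, seed=3)
for state, members in selected.items():
    print(state, len(members), "respondents")
```

The results come back already organized by state, which mirrors the point made in the example about state-wise reporting.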
Stratified random sampling is a method in which the researcher divides the population into smaller groups that don’t overlap but represent the entire population. While sampling, these groups can be organized and a sample then drawn from each group separately.
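Drawing a separate sample from each non-overlapping group might look like the sketch below. The age groups, the population of 90 people, the fixed size of 5 per stratum, and the helper name `stratified_sample` are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_stratum, seed=None):
    """Split records into strata with `key`, then sample each stratum separately."""
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    rng = random.Random(seed)
    return {
        name: rng.sample(members, k=min(per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical population of 90 people in three non-overlapping age groups
people = [{"id": i, "age_group": ("18-29", "30-49", "50+")[i % 3]} for i in range(90)]
sample = stratified_sample(people, key=lambda p: p["age_group"], per_stratum=5, seed=11)
for group, members in sample.items():
    print(group, [m["id"] for m in members])
```

A fixed number per stratum is the simplest choice; sampling each stratum in proportion to its size is a common alternative.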
Create an Accurate Sample
Convenience Sampling
Suitability
Data Processing
Data processing is the method
of collecting raw data and
translating it into usable
information. It is usually
performed in a step-by-step
process by a team of data
scientists and data
engineers in an organization.
The raw data is collected,
filtered, sorted, processed,
analyzed, stored and then
presented in a readable format.
Stages in Data Processing
• Editing
• Coding
• Classification
• Tabulation
• Graphic Presentation
Editing of data
Editing is the first step of data processing. Editing is the process of examining the data collected through a questionnaire or any other method. It starts after all the data have been collected, in order to check the data or reform them into a usable form.
Coding of data
Coding is the process of categorizing data according to the research subject or topic and the design of the research. In the coding process the researcher sets a code for a particular item, such as M for male and F for female, to indicate gender in the questionnaire without writing the word out in full. In the same way, the researcher can use colours to highlight something, or symbols such as 1+ and 1-. This type of coding makes it easier to calculate or evaluate results in tabulation.
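The M/F coding scheme can be expressed as a simple lookup table. This is a sketch under assumptions: the raw answers are invented, and the helper name `code_responses` and the choice to map unrecognized answers to `None` are illustrative.

```python
# Codebook from the example above: full answer -> short code
GENDER_CODES = {"male": "M", "female": "F"}

def code_responses(responses, codebook):
    """Replace each raw answer with its code; unrecognized answers become None."""
    return [codebook.get(answer.strip().lower()) for answer in responses]

# Hypothetical raw answers as they might appear on returned questionnaires
raw = ["Male", "female", "FEMALE ", "male"]
coded = code_responses(raw, GENDER_CODES)
print(coded)  # ['M', 'F', 'F', 'M']
```

Normalizing case and stray spaces before the lookup means differently written answers still receive the same code, which keeps later counting consistent.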
Classification of Data
Classification or categorization is the process of grouping the statistical data under various understandable homogeneous groups for the purpose of convenient interpretation. A uniformity of attributes is the basic criterion for classification, and the grouping of data is made according to similarity. Classification becomes necessary when there is diversity in the data collected, so that the data can be presented and analysed meaningfully. However, it is meaningless in respect of homogeneous data. A good classification should have the characteristics of clarity, homogeneity, equality of scale, purposefulness and accuracy.
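As an illustration of grouping by similarity with an equal scale, the sketch below sorts ages into classes of equal width. The ages, the class width of 10, and the helper name `classify` are assumptions chosen for the example.

```python
from collections import defaultdict

def classify(values, width):
    """Group numeric values into classes of equal width, e.g. 20-29, 30-39."""
    groups = defaultdict(list)
    for value in values:
        lower = (value // width) * width          # lower bound of the class
        groups[f"{lower}-{lower + width - 1}"].append(value)
    return dict(groups)

# Hypothetical ages collected from respondents
ages = [23, 37, 41, 25, 58, 33, 29, 45]
classes = classify(ages, width=10)
for label in sorted(classes):
    print(label, classes[label])
```

Every class spans the same range, so the groups can be compared directly, and each age falls into exactly one class.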
Tabulation of data
Tabulation is the process of summarizing raw data and displaying it in compact form for further analysis. Therefore, preparing tables is a very important step. The researcher can tabulate by hand or digitally. The choice is made largely based on the size and type of study, alternative costs, time pressures, and the availability of computers and computer programmes. If the number of questionnaires is small, and their length short, hand tabulation is quite satisfactory.
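Digital tabulation of coded answers can be as simple as a frequency table. The sketch below is illustrative: the coded answers are invented, and the helper name `tabulate` and the percentage column are assumptions.

```python
from collections import Counter

def tabulate(coded_answers):
    """Summarize coded answers as rows of (code, count, percentage)."""
    counts = Counter(coded_answers)
    total = sum(counts.values())
    return [(code, n, round(100 * n / total, 1)) for code, n in sorted(counts.items())]

# Hypothetical coded answers to one question
answers = ["M", "F", "F", "M", "F"]
table = tabulate(answers)

print(f"{'Code':<6}{'Count':>5}{'Share':>9}")
for code, n, pct in table:
    print(f"{code:<6}{n:>5}{pct:>8.1f}%")
```

For a handful of short questionnaires the same table could be tallied by hand, as noted above; the code only pays off as the volume of data grows.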
Graphical Representation
Diagrams are charts and graphs used to present data. They help attract the reader's attention and present data more effectively. Creative presentation of data is possible. The data diagrams are classified into:
• Pie Chart
• Bar Graphs
• Line Graphs
• Gantt Charts
• Histograms
Gantt Chart