Data Collection & Data Management
Introduction
The International Organization for Standardization, in ISO/IEC 11179 (2013), defines data as a re-interpretable
representation of information in a formalized manner suitable for communication, interpretation &
processing. Data can also be defined as facts represented as text, numbers, graphics, images, etc. Data are
the raw material used to create information, and they form the basic building blocks of all scientific
inquiry. Research draws its conclusions from the analysis of recorded data. The FAIR Data Principles state that
research data should be Findable, Accessible, Interoperable & Reusable (FAIR).
Data collection comprises observing, measuring and recording data. Data can be collected from
written documents and existing databases, by asking people questions, or by observing phenomena.
The Data Management Association (DAMA) defines data management as “development, supervision and
execution of plans, policies, programs and practices that control, protect, deliver and enhance the value
of data and information assets”. In practice, data management covers data collection, processing, storage,
sharing and archiving. It is an essential part of almost all research endeavors, because the
accuracy & validity of data have a direct effect on the conclusions drawn from them.
In research, data management covers the handling of data from their origination to their final archiving. The
data life cycle has three phases: (1) the origination phase, during which the data are first collected; (2) the
active phase, during which data are accumulating and changing; and (3) the inactive phase, during which
data are no longer expected to accumulate or change.
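The three phases above can be modelled as a small state type. The following is only a sketch: the names `Phase`, `advance` and `accepts_changes` are hypothetical, and the rule that phases are passed through in order, with the inactive phase treated as final, is an assumption made for illustration rather than something the definitions above require.

```python
from enum import Enum


class Phase(Enum):
    ORIGINATION = 1  # data are first collected
    ACTIVE = 2       # data are accumulating and changing
    INACTIVE = 3     # data are no longer expected to accumulate or change


def advance(phase: Phase) -> Phase:
    """Return the next phase; INACTIVE is treated as final (an assumption)."""
    if phase is Phase.INACTIVE:
        return phase
    return Phase(phase.value + 1)


def accepts_changes(phase: Phase) -> bool:
    """Records are expected to be added or edited only before the inactive phase."""
    return phase is not Phase.INACTIVE
```

A data management plan could use a check like `accepts_changes` to decide when a dataset should be locked and handed over for archiving.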
Observation is of two types:
1. Participant observation - the researcher participates in the activities of the group being observed,
in the same manner as its members, with or without their knowing that they are being
observed;
2. Non-participant observation - the researcher does not get involved in the activities of the group
but remains a passive observer, watching and listening to its activities and drawing conclusions
from this.
Observations can be made under two conditions: natural or controlled. Observing a group without
interfering in its normal activities is referred to as observation under natural conditions. Introducing a
stimulus to the group to elicit a reaction is referred to as observation under controlled conditions.
Categorical recording & numerical scales - a scale is developed by the researcher or observer to rate various
aspects of an interaction or phenomenon, and the recording is done on that scale. The main
advantage of using scales is that recording takes little time, so the observer can concentrate on observing. However,
a scale does not provide an in-depth account of the interaction. It may additionally suffer from the following
errors:
- Errors of central tendency - the inexperienced observer tends to avoid the extremes of the
  scale, using mostly the central part, which creates an error.
- Elevation effect - some observers may prefer certain sections of the scale (in the same way that
  some teachers are strict markers & others are not). When observers tend to use a particular part
  of the scale in recording an interaction, the phenomenon is known as the elevation effect.
- The halo effect - here, the way an observer rates an individual on one aspect of the interaction
  influences the way s/he rates that individual on another aspect of the interaction (just as when a
  teacher’s assessment of the performance of a student in one subject influences their rating of
  that student’s performance in another subject).
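A rough way to screen recorded ratings for the first two errors is to summarize how each observer used the scale. The sketch below assumes a 5-point scale and uses made-up ratings for three hypothetical observers; the function name `summarize` is illustrative. A low spread with few extreme ratings hints at central tendency, while a mean that sits well away from the other observers' means hints at an elevation effect. It does not address the halo effect.

```python
from statistics import mean, pstdev

SCALE_MIN, SCALE_MAX = 1, 5  # hypothetical 5-point rating scale


def summarize(ratings):
    """Return (mean, spread, share of ratings at the scale extremes) for one observer."""
    extremes = sum(r in (SCALE_MIN, SCALE_MAX) for r in ratings)
    return mean(ratings), pstdev(ratings), extremes / len(ratings)


# Made-up ratings of the same eight interactions by three observers.
observers = {
    "A": [3, 3, 3, 2, 3, 4, 3, 3],  # clusters around the middle of the scale
    "B": [4, 5, 5, 4, 5, 4, 5, 5],  # stays in the upper part of the scale
    "C": [1, 3, 5, 2, 4, 1, 5, 3],  # uses the whole scale
}

for name, ratings in observers.items():
    m, sd, ext = summarize(ratings)
    print(f"{name}: mean={m:.2f} spread={sd:.2f} extremes={ext:.0%}")
```

These summaries only flag patterns worth a closer look; an observer may legitimately rate in a narrow band if the interactions observed really were similar.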
1.1.2. Problems with using observation for data collection
- When individuals or groups become aware that they are being observed, they may change their
  behavior. This change could be positive or negative. When a change in the behavior of persons
  or groups is attributed to their being observed, it is known as the Hawthorne effect. Using
  observation in such a situation may introduce distortion, and the observations may not
  represent their normal behavior.
- Observer bias.
- Interpretations drawn from observations vary from observer to observer.
- There is the possibility of incomplete observation and/or recording, varying with the method of
  recording.
Direct Observation
Direct observation involves observing external phenomena in the real world, e.g., births, deaths, medical procedures,
past exposure to chemicals, etc. Such verifiable facts can be collected and potentially corroborated with
external information. Humans are fallible and inconsistent, and they use judgement in their observations. The
use of human observation thus requires standardization and quality control in order to increase
consistency. Human observation is complicated further when subjectivity
or human judgement is involved. Subjectivity is often present where humans are asked to classify events
into discrete categories or to observe real-life events.
Demerits
- Human error - using a device for the observation can reduce human error. Additionally,
  recording the phenomenon of interest so that it can be observed a second time (double observation) can help mitigate it.
- Inconsistency - data observation and recording are subject to human individuality.
- Subjectivity
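Where a phenomenon has been recorded and classified twice (double observation), the two sets of classifications can be compared item by item. The sketch below uses simple percent agreement, chosen here only for illustration; the function names and the example categories are hypothetical.

```python
def percent_agreement(first, second):
    """Share of items on which two observers recorded the same category."""
    if len(first) != len(second):
        raise ValueError("both observers must classify the same items")
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


def disagreements(first, second):
    """Positions of the items the two observers classified differently."""
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]


# Made-up classifications of the same four recorded events.
observer_1 = ["normal", "abnormal", "normal", "normal"]
observer_2 = ["normal", "abnormal", "abnormal", "normal"]

print(percent_agreement(observer_1, observer_2))
print(disagreements(observer_1, observer_2))
```

The items returned by `disagreements` are the ones to review against the recording. Percent agreement does not correct for agreement that would occur by chance, so it is best read as a first check.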
Interviewing someone who directly experienced a phenomenon that is not directly observable is different. Such
data are highly subjective and unverifiable. These phenomena are tricky to measure because there is no
external standard for comparison, and no way to compare an internal phenomenon to such a standard even if
one existed. Questionnaires and rating scales are the tools used to collect such data.
Demerits
- Recall bias
- Lack of reliability
Interviews are classified into categories according to their degree of flexibility: they may be
unstructured or structured.
1.2.1. Unstructured interviews
In an unstructured interview there is almost complete freedom in terms of structure, content, question wording and order. The
researcher is free to ask whatever they want, in a format that is relevant to the situation. There is also freedom
in the way the researcher explains questions to the respondents. The researcher
can formulate questions and raise issues on the spur of the moment, depending on what occurs to them
in the context of the discussion. Unstructured interviews are predominantly used in qualitative research.
- Unstructured interviews are extremely useful for exploring a situation, phenomenon, issue or problem
  intensively and extensively, and for digging deeper into it.
- They are best suited to identifying diversity & variety.
- Disadvantage - a higher level of skill is required to conduct them.
In a questionnaire, it is important that questions be clear and easy to understand, as there is no one to
explain them to respondents. The layout of a questionnaire should be easy
to read and pleasant to the eye, and the sequence of questions should be easy to follow. The questionnaire should be
developed in an interactive style: respondents should feel as if someone is talking to them. A
sensitive question, or one that respondents may be hesitant to answer, should be
prefaced by an interactive statement explaining the relevance of the question. Such statements should
be in a different font to distinguish them from the actual questions.
- Validity & reliability - these may vary markedly from source to source.
- Personal bias - information from personal records like diaries, as well as newspapers etc., may have a
  personal bias, as these writers are likely to exhibit less rigor & objectivity than one would expect
  in research reports.
- Availability of data
- Format - it is equally important to ascertain whether the data are available in the required format
  (e.g., the researcher might want an analysis of age in categories 23-33, 34-48, but, in the source,
  age may be categorized as 21-24, 25-29 etc.).
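The format problem in the age example can be checked mechanically: a required category can only be rebuilt from the source if it is exactly a union of whole source categories. The sketch below is a minimal check under that reading. The function name `can_rebuild` is hypothetical, and the source categories beyond 21-24 and 25-29 are assumed here purely for illustration.

```python
def can_rebuild(target, source_bins):
    """True if `target` (lo, hi), in whole years, is exactly a union of whole source bins."""
    lo, hi = target
    covered = set()
    for s_lo, s_hi in source_bins:
        if lo <= s_lo and s_hi <= hi:  # this source bin lies fully inside the target
            covered.update(range(s_lo, s_hi + 1))
    return covered == set(range(lo, hi + 1))


# 21-24 and 25-29 come from the example above; the later bins are assumed.
source_bins = [(21, 24), (25, 29), (30, 34), (35, 39)]

print(can_rebuild((23, 33), source_bins))  # required category from the example
print(can_rebuild((25, 34), source_bins))  # a category that aligns with the source
```

The category 23-33 cannot be rebuilt because ages 23-24 are locked inside the source category 21-24, and 30-33 inside 30-34. When such a mismatch is found, the options are to obtain the underlying individual-level data or to redefine the analysis categories so that they align with the source.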