Data Collection & Data management

The document outlines the definitions and importance of data collection and management, emphasizing the FAIR Data Principles which advocate for data to be Findable, Accessible, Interoperable, and Reusable. It details various methods of data collection, including primary and secondary sources, and discusses the advantages and disadvantages of techniques such as observation, interviews, and questionnaires. Additionally, it highlights the significance of data management throughout the data life cycle, ensuring accuracy and validity in research conclusions.

Uploaded by

Samson Kihiu

Data Collection & Data Management
Introduction
The International Organization for Standardization, in ISO/IEC 11179 (2013), defines data as a
re-interpretable representation of information in a formalized manner suitable for communication,
interpretation & processing. Data can also be defined as facts represented as text, numbers, graphics,
images etc. Data are the raw material used to create information and form the basic building blocks of
all scientific inquiry. Research draws its conclusions from the analysis of recorded data. The FAIR Data
Principles state that research data should be Findable, Accessible, Interoperable & Reusable (FAIR).

Data collection comprises the observation, measurement and recording of data. Data can be collected from
written documents or existing databases, by asking people questions, or by observing phenomena.

The Data Management Association (DAMA) defines data management as “development, supervision and
execution of plans, policies, programs and practices that control, protect, deliver and enhance the value
of data and information assets”. Data management translates to data collection, processing, storage,
sharing and archiving. Data management is an essential part of almost all research endeavors. The
accuracy & validity of data have a direct effect on the conclusions drawn from them.

In research, data management covers the handling of data from their origination to final archiving. The
data life cycle has three phases: (1) the origination phase during which the data are first collected, (2) the
active phase during which data are accumulating and changing, and (3) the inactive phase during which
data are no longer expected to accumulate or change.

Methods of Data Collection


In a research study, one usually needs to collect the required information; however, sometimes the data
required is already available and need only be extracted. On this basis, data can be categorized as
primary data or secondary data. Primary data is collected from primary sources, whereas secondary
data is collected from secondary sources.

Examples of primary sources


1. Finding out first-hand the attitudes of a community towards health services;
2. Ascertaining the health needs of a community;
3. Evaluating a social program;
4. Ascertaining the quality of service provided by a worker.

Examples of secondary sources of data


1. Extracting data from a census to obtain information on the age-sex structure of a population;
2. Use of hospital records to find out morbidity and mortality patterns in a community;
3. Use of an organization’s records to ascertain its activities;
4. Collection of data from sources such as articles, journals, magazines, books & periodicals to
obtain historical & other types of information.

1. Collecting data using primary sources
1.1. Observation
Observation is a purposeful, systematic and selective way of watching and listening to a phenomenon as
it takes place. Situations where observation is the most appropriate method of data collection include
studying the behavior or personality traits of an individual, or the dietary patterns of a
population.

There are two types of observation:

1. Participant observation: the researcher participates in the activities of the group being observed,
in the same manner as its members, with or without their knowing that they are being
observed;
2. Non-participant observation: the researcher does not get involved in the activities of the group
but remains a passive observer, watching and listening to its activities and drawing conclusions
from this.

Observations can be made under two conditions: natural or controlled. Observing a group without
interfering in its normal activities is referred to as observation under natural conditions. Introducing a
stimulus to the group to elicit a reaction is referred to as observation under controlled conditions.

1.1.1. Recording of observations


Narrative & descriptive recording: mainly used in qualitative research. In narrative recording, the
researcher records a description of the interaction in his/her own words. S/he makes brief notes while
observing the interaction and then, soon after completing the observation, makes detailed notes in
narrative form. This provides a deep insight into the interaction. A disadvantage, however, is that an
observer may be biased in his/her observations. Additionally, interpretations and conclusions drawn are
bound to be subjective, reflecting the researcher’s perspective. Also, if a researcher’s attention is on
observation, they might forget to record an important piece of interaction. In the process of recording,
part of the interaction may also be missed, hence the possibility of incomplete recording and/or
observation. Comparability of narrative recordings made by different observers can also be a problem.

Categorical recording & numerical scales: a scale is developed to rate various aspects of an interaction
or phenomenon. The recording is done on a scale developed by the researcher or observer. The main
advantage of using scales is that it saves time, enabling concentration on observation. It does not,
however, provide an in-depth account of the interaction. It may additionally suffer from the following
errors:

• Error of central tendency: the inexperienced observer tends to avoid the extremes of the
scale, using mostly the central part, which creates an error.
• Elevation effect: some observers may prefer certain sections of the scale (in the same way that
some teachers are strict markers & others are not). When observers tend to use a particular part
of the scale in recording an interaction, the phenomenon is known as the elevation effect.
• The halo effect: here, the way an observer rates an individual on one aspect of the interaction
influences the way s/he rates that individual on another aspect of the interaction (much as when a
teacher’s assessment of the performance of a student in one subject influences their rating of
that student’s performance in another subject).
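
The first two scale errors above can be looked for in recorded ratings. The following is a minimal sketch, assuming a 1 to 5 scale and made-up ratings from three hypothetical observers; the function name and the indicators it reports are illustrative choices, not a standard procedure from these notes.

```python
from statistics import mean


def rating_profile(ratings, scale_min=1, scale_max=5):
    """Summarise how one observer uses a rating scale."""
    midpoint = (scale_min + scale_max) / 2
    extremes = sum(1 for r in ratings if r in (scale_min, scale_max))
    return {
        "mean": mean(ratings),
        # A share near 0 may hint at an error of central tendency.
        "share_extreme": extremes / len(ratings),
        # A large offset may hint at an elevation effect.
        "offset_from_midpoint": mean(ratings) - midpoint,
    }


# Hypothetical ratings given by three observers to the same ten interactions.
observers = {
    "A": [3, 3, 2, 3, 4, 3, 3, 2, 3, 3],  # clusters around the middle
    "B": [5, 4, 5, 4, 5, 5, 4, 5, 4, 5],  # stays in the upper part of the scale
    "C": [1, 5, 2, 4, 3, 1, 5, 2, 4, 3],  # uses the whole scale
}

profiles = {name: rating_profile(r) for name, r in observers.items()}
for name, profile in profiles.items():
    print(name, profile)
```

These figures only describe how each observer used the scale; whether a pattern is an error or a fair reflection of what was observed remains a judgement for the researcher. The halo effect is not covered here, since detecting it would need ratings on several aspects per individual.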

1.1.2. Problems with using observation for data collection
• When individuals or groups become aware that they are being observed, they may change their
behavior. This change could be positive or negative. When a change in the behavior of persons
or groups is attributed to their being observed, it is known as the Hawthorne effect. Using
observation in such a situation may introduce distortion, and the observations may not
represent their normal behavior.
• Observer bias.
• Interpretations drawn from observations vary from observer to observer.
• There is the possibility of incomplete observation and/or recording, varying with the method of
recording.
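
One simple way to see how much two observers differ is to have both code the same events and count how often they agree. This is a minimal sketch with hypothetical category labels and made-up codes; percent agreement is used here only for illustration and is not prescribed by these notes.

```python
def percent_agreement(codes_a, codes_b):
    """Share of events that two observers placed in the same category."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both observers must code the same events")
    if not codes_a:
        raise ValueError("At least one coded event is required")
    matches = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return matches / len(codes_a)


# Hypothetical codes assigned by two observers to the same eight events.
observer_1 = ["calm", "agitated", "calm", "calm",
              "agitated", "calm", "calm", "agitated"]
observer_2 = ["calm", "agitated", "agitated", "calm",
              "agitated", "calm", "calm", "calm"]

agreement = percent_agreement(observer_1, observer_2)
print(f"Agreement: {agreement:.0%}")
```

Percent agreement does not correct for agreement that would occur by chance; statistics such as Cohen's kappa are designed to do that.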

Direct Observation
Direct observation involves observing external phenomena in the real world, e.g., births, deaths,
medical procedures, past exposure to chemicals etc. Such verifiable facts can be collected and
potentially corroborated with external information. Humans are fallible and inconsistent, and use
judgement in their observations. The use of human observation thus requires standardization and
quality control in order to increase consistency. Human observation is complicated further when
subjectivity or human judgement is involved. Subjectivity is often present where humans are asked to
classify events into discrete categories or to observe real-life events.

Demerits
• Human error: using a device for the observation can reduce human error. Additionally,
recording the phenomenon of interest for purposes of double observation can help mitigate this.
• Inconsistency: data observation and recording are subject to human individuality.
• Subjectivity

Interviewing persons who directly observed/experienced a phenomenon


This is subject to the interviewee’s observation skills, recall, willingness, and their ability to report
accurately. It is subject to information loss and degradation over time.

Interviewing someone who directly experienced a phenomenon that is not directly observable is different.
Such data is highly subjective and unverifiable. These phenomena are tricky to measure because there is no
external standard for comparison and no way to compare an internal phenomenon to such a standard if
it were to exist. Questionnaires and rating scales are the tools used to collect such data.

Demerits
• Recall bias
• Lack of reliability

1.2. The Interview


It is a person-to-person interaction, either face to face or otherwise, between two or more parties with
a specific purpose in mind. According to Monette et al. (1986:156), ‘an interview involves an interviewer
reading questions to respondents and recording their answers.’

Interviews are classified into different categories, based on their degree of flexibility. We can have
unstructured interviews or structured interviews.

1.2.1. Unstructured interviews
There is almost complete freedom in terms of the interview’s structure, contents, question wording and
order. The researcher is free to ask whatever they want, in a format that is relevant to the situation.
There is also freedom in terms of wording and the way the researcher explains questions to the
respondents. The researcher can formulate questions and raise issues on the spur of the moment,
depending on what occurs to them in the context of the discussion. Unstructured interviews are
predominantly used in qualitative research.

• Unstructured interviews are extremely useful for exploring a situation, phenomenon, issue or
problem intensively and extensively, and for digging deeper into it.
• They are best suited to identifying diversity & variety.
• Disadvantage: a higher level of skill is required to conduct them.

1.2.2. Structured interviews


The researcher asks a predetermined set of questions, using the same wording and order of questions as
specified in the interview schedule.

• It provides uniform information, which assures the comparability of data.
• It requires fewer interviewing skills than the unstructured interview.
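
The idea of a fixed schedule can be sketched as a small data structure. This is an illustration only: the questions, the names `Question`, `SCHEDULE` and `conduct_interview`, and the simulated respondents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    key: str      # short label used when tabulating answers
    wording: str  # exact text read out to every respondent


# Hypothetical interview schedule; the questions are illustrative only.
SCHEDULE = (
    Question("visits", "How many times did you visit the health centre last year?"),
    Question("satisfaction", "How satisfied were you with the service you received?"),
)


def conduct_interview(answer_for):
    """Ask each question in schedule order and record the answers.

    `answer_for` is any callable that takes the question wording and returns
    the respondent's answer (a stand-in for the live conversation).
    """
    return {q.key: answer_for(q.wording) for q in SCHEDULE}


# Two simulated respondents, represented as wording -> answer lookups.
respondent_1 = {SCHEDULE[0].wording: "3", SCHEDULE[1].wording: "Satisfied"}
respondent_2 = {SCHEDULE[0].wording: "0", SCHEDULE[1].wording: "Not applicable"}

records = [conduct_interview(r.get) for r in (respondent_1, respondent_2)]
for record in records:
    print(record)
```

Because every respondent is asked the same questions in the same order, every record has the same fields, which is what makes the answers directly comparable.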

1.2.3. Advantages of the interview

• More appropriate for studying complex & sensitive situations: the interviewer has the
opportunity to prepare the respondent before asking sensitive questions and to explain complex
ones to respondents in person.
• Useful for collecting in-depth information: the researcher can probe.
• Information can be supplemented: the researcher is able to supplement information obtained from
the responses with that gained from observation of non-verbal cues.
• Questions can be explained, reducing the likelihood that a question will be misunderstood.
• Has a wider application: it can be used with almost any type of population, e.g., children,
people with disabilities, the illiterate, the elderly etc.

1.2.4. Disadvantages of the interview

• Time-consuming and expensive.
• The quality of data depends upon the quality of the interaction; the quality of the responses
from different interactions may vary significantly too.
• The quality of data depends upon the interviewer’s experience, skills & commitment.
• The quality of data varies when multiple interviewers are used.
• Possibility of researcher bias: a researcher’s bias in the framing of questions or in the
interpretation of responses is always possible.

1.3. The questionnaire


This is a written list of questions, the answers to which are recorded by respondents. Respondents read
the questions, interpret what is expected and then write down the answers.

In a questionnaire, it is important that questions be clear and easy to understand as there is no one to
interpret the questions to respondents. The layout of a questionnaire should also be such that it is easy
to read and pleasant to the eyes, and the sequence of questions should be easy to follow. It should be
developed in an interactive style: respondents should feel as if someone is talking to them. In a
questionnaire, a sensitive question or one that respondents may be hesitant to respond to should be
prefaced by an interactive statement explaining the relevance of the question. Such statements should
be in a different font to distinguish them from the actual questions.

1.3.1. Ways of administering a questionnaire

• The mailed questionnaire
• Collective administration: the researcher obtains a captive audience (e.g., students in a
classroom or people assembled in one place). This ensures a very high response rate. Additionally,
as there is personal contact with the study population, the researcher can explain the purpose,
relevance and importance of the study and can clarify any questions that the respondents may
have.
• Online questionnaire
• Administration in a public space (e.g., a shopping center, health center, hospital etc.)

1.3.2. Advantages of a questionnaire

• It is less expensive.
• It is less time-consuming.
• It offers greater anonymity.

1.3.3. Disadvantages of the questionnaire

• Limited application: it is limited to a study population that can read and write.
• Low response rate: people fail to return questionnaires, in effect reducing the sample size. The
response rate depends on a number of factors: the interest of the sample in the topic of study; the
layout and length of the questionnaire; the quality of the letter explaining the purpose and relevance
of the study; and the methodology used to deliver the questionnaire.
• Self-selecting bias: due to the low response rate, there is a self-selecting bias. Those who return
their questionnaires may have attitudes or motivations that are different from those who do not.
Hence, the findings may not be representative of the total study population.
• Lack of the opportunity to clarify issues: if, for any reason, respondents don’t understand some
questions, there is almost no opportunity for them to have the meaning clarified unless they
get in touch with the researcher, which rarely happens. This will affect the quality of data obtained
if different respondents interpret questions differently.
• No opportunity for spontaneous responses: most respondents glance through the whole
questionnaire before responding, giving them time to reflect, and they may thus change their
answers to some questions, especially where spontaneous responses are required.
• The response to a question may be influenced by the response to other questions.
• Other people can influence the answers: the respondents may consult other people before
responding. This is a problem especially when the researcher is only interested in the study
population’s own opinions. Requesting respondents to express only their own opinion may help.
• A response cannot be supplemented with other information: the questionnaire lacks the additional
advantage of observation that comes with interviewing.
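
The effect of a low response rate on sample size, and one way to check for self-selection, can be illustrated with simple arithmetic. This is a minimal sketch using made-up figures; it assumes the researcher knows one characteristic (here, age under 30) for everyone who was sent a questionnaire, which will not always be the case.

```python
def response_rate(returned, distributed):
    """Share of distributed questionnaires that were returned."""
    if distributed <= 0:
        raise ValueError("distributed must be positive")
    if not 0 <= returned <= distributed:
        raise ValueError("returned must be between 0 and distributed")
    return returned / distributed


# Hypothetical figures for a mailed questionnaire.
distributed = 400  # questionnaires sent out
returned = 92      # questionnaires that came back

# Hypothetical known characteristic: number of people aged under 30.
under_30_in_sample = 160
under_30_among_returned = 18

rate = response_rate(returned, distributed)
share_sample = under_30_in_sample / distributed
share_returned = under_30_among_returned / returned

print(f"Response rate: {rate:.0%}")
print(f"Under 30 in full sample: {share_sample:.0%}")
print(f"Under 30 among respondents: {share_returned:.0%}")
```

A marked gap between the two shares suggests that those who responded differ from those who did not, so the findings may not represent the total study population.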

2. Collecting data from secondary sources
Here, the data has already been collected by someone else or already exists as a part of the routine
record keeping by an organization. All the researcher needs to do is extract the required information for
purposes of their study. Possible secondary sources include:

a) Government or quasi-government publications: these include the census, vital statistics
registration, health reports, demographic information etc.
b) Earlier research: for some topics, a vast array of research studies has already been done by
others, and can provide the researcher with the necessary information.
c) Personal records, e.g., diaries.
d) Mass media: reports published in newspapers, magazines, on the internet etc. may be a good
source of data.

2.1. Problems with data from secondary sources


Before deciding to use data from secondary sources, the following issues should be kept in mind:

• Validity & reliability: these may vary markedly from source to source.
• Personal bias: information from personal records like diaries, as well as newspapers etc., may have
a personal bias, as these writers are likely to exhibit less rigor & objectivity than one would expect
in research reports.
• Availability of data
• Format: it is equally important to ascertain whether the data is available in the required format
(e.g., the researcher might want an analysis of age in categories 23-33, 34-48, but, in the source,
age may be categorized as 21-24, 25-29 etc.).
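
The format problem in the age example can be checked mechanically: already-grouped data can only be regrouped if every required category is exactly a union of source categories. The sketch below is one way to test this, assuming inclusive whole-year ranges. The function name is illustrative, only the two source categories quoted above are used, and the second set of source categories is hypothetical.

```python
def can_regroup(source_bins, required_bins):
    """True if every required bin is exactly a union of source bins.

    Bins are (low, high) pairs of whole years, both ends inclusive.
    """
    for low, high in required_bins:
        covered = set()
        for s_low, s_high in source_bins:
            overlaps = s_low <= high and s_high >= low
            if not overlaps:
                continue
            if s_low < low or s_high > high:
                return False  # a source bin straddles a required boundary
            covered.update(range(s_low, s_high + 1))
        if covered != set(range(low, high + 1)):
            return False  # part of the required bin has no source data
    return True


required = [(23, 33), (34, 48)]

# The two source categories quoted in the text.
source_from_text = [(21, 24), (25, 29)]

# Hypothetical source whose boundaries happen to line up with the required ones.
source_compatible = [(23, 28), (29, 33), (34, 48)]

print(can_regroup(source_from_text, required))
print(can_regroup(source_compatible, required))
```

With the categories from the text, the source group 21-24 straddles the required lower boundary of 23, so the counts cannot be split between "under 23" and "23-33" without access to the ungrouped data.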

3. Data Collection/Recording Tools


1. Audio-recording devices
2. Video-recording devices: the data can be reviewed a number of times before interpretation. Other
professionals can also be invited to look at the data in order to arrive at more objective
conclusions. Some people may, however, feel uncomfortable or behave differently around these
devices. Cost may also be a limitation.
3. Photographic-recording devices
4. Interview schedule: this is a written list of questions, open-ended or closed, thoroughly
pre-tested for standardized wording, meaning and interpretation, prepared for use by an
interviewer in a person-to-person interaction (whether face to face, by telephone or other
electronic media).
5. Questionnaires
6. Notebooks
7. Numerical scales

Glossary
Data management: the process by which data is defined, documented, collected, and subsequently
processed 1
Data: factual information (as measurement or statistics) used as a basis for reasoning, discussion or
calculation 1
