Lecture Notes Chapter 5: Data Collection, Sampling, and Data Analysis
Bien Maunahan
CHAPTER 5
DATA COLLECTION
After determining the research design, the next step in the research process is to select
the methods of collecting data. These methods include:
• Observation
• Interviews
• Questionnaires
• Standardized tests
• Use of physical instruments
• Simulation
• Review of documents
Many studies use multiple or mixed methods to collect data by exploiting the strengths
and offsetting the weaknesses of each data collection method. In doing so, they expand the
scope of the research. For example, we may use standardized tests to assess students’
performance, followed by qualitative data on why students with similar backgrounds differ
substantially in their performance. Economists regularly use this strategy to supplement
quantitative data with illustrative qualitative examples.
Instead of combining the qualitative and quantitative data, some researchers transform
the qualitative data into quantitative data. For example, a researcher may use numeric codes
corresponding to the responses to open-ended questions in a structured questionnaire. This
strategy partially overcomes the possibility of unlimited responses to open-ended questions.
Before discussing these data collection methods, we must understand the scales from
which empirical measures for theoretical concepts are developed.
SCALES OF MEASUREMENT
Scales are used for categorization, ranking, and assessing magnitudes. The main scales of
measurement are the nominal, interval, and ratio scales.
A nominal scale categorizes data, such as 0 for females and 1 for males. In transport
studies, the mode of transport to work is usually a categorical variable, such as 1 for a walk, 2
for a bicycle, 3 for a motorcycle, 4 for a bus, etc. We often use discrete variables to generate
count or frequency data, such as the number of boys and girls in a class.
An interval scale consists of equal intervals that measure the relative distances
(differences) between points on the scale, such as IQ scores, temperature, or time. Ratios are
meaningless; a person with an IQ score of 160 is not twice as intelligent as one with a score
of 80. Similarly, the period 2005–2010 makes sense in calendar time, but not the ratio
2005/2010.
In a ratio scale, the ratios are meaningful. For example, a length of 3 m is twice as
long as 1.5 m, or 10 kg is twice as heavy as 5 kg.
Despite the differences between the interval and ratio scales, it is often not necessary to
distinguish them. Both scales use real numbers, and most statistics, such as the mean and
standard deviation, apply to them.
How precisely we want to measure something depends on the purpose and cost. For
instance, we should measure a room’s temperature using a thermometer if this level of
accuracy is essential, rather than merely stating if it is “hot” or “cold.” The measurement
scale will also affect the type of statistical tests used. For instance, the mean and standard
deviation are meaningless on the nominal scale. If there are 20 boys and ten girls in the class,
the average (of 15) does not make sense.
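To make this concrete, here is a minimal Python sketch (assuming the pandas library is available; the class of 20 boys and ten girls echoes the example above, and the 0/1 codes are illustrative) showing which statistics are meaningful on each scale:

```python
import pandas as pd

# Nominal: 0 = female, 1 = male (the codes are labels, not magnitudes)
sex = pd.Series([1] * 20 + [0] * 10)
print(sex.value_counts())            # frequency counts are meaningful: 20 and 10
# sex.mean() would return 0.67, a number with no substantive meaning here

# Ratio: mass in kilograms, where ratios make sense (10 kg is twice 5 kg)
mass_kg = pd.Series([5.0, 7.5, 10.0])
print(mass_kg.mean(), mass_kg.std())  # mean and standard deviation apply
```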
Observations
The researcher must know what to observe. Most observers use a checklist to guide
their observations. They also try to take notes and make sketches as soon as practical and
supplement them with recording media. However, be sure to obtain permission to record or
film conversations and activities. If quick note-taking is impossible, it should be done at the
earliest opportunity.
Observer bias may occur for various reasons. For example, the observer may not
understand the context or may not have received adequate training. Any two observers may
also interpret the same event differently. To reduce observer bias and improve coverage, we
often use two or more observers to compare notes. In general, triangulation uses two or more
observers or methods to collect data.
Interviews
The interviewer does not wish to impose any prior framework in an unstructured
interview. For instance, in an interview with a project manager on how he was affected by a
project failure, the interviewer may start with a question such as “How do you define project
failure?” The interview then proceeds based on the responses. Generally, the questions will
cover awareness of the issue, the adaptive responses, damage control, the consequences, etc.
A focus group is a group interview comprising about five to ten respondents. The
researcher facilitates and moderates the collective discussion to explore ideas, share views, or
make recommendations on an issue. Its effectiveness will depend on the composition of the
participants, whether they are representative of the population, the skills of the moderator, the
ground rules, and the type of questions.
Beyond ten respondents, the focus group becomes a community meeting. Researchers
use public forums and hearings to gather ideas from a broader range of stakeholders. Such
meetings should be inclusive; they include all sections of the community. They should also
be participatory and not be dominated by specific stakeholders. Do not assume that all
stakeholders will be present for a public meeting.
Researchers may carry out the interviews face to face, particularly if probing
questions or visual aids are required, or over the telephone or Internet. Because the
researcher’s presence may affect the respondent’s readiness to provide information, the
interviewer should try to put the respondent at ease.
Questionnaires
Most questionnaires contain highly structured questions together with a limited set of
answers. They usually have factual questions and ratings, with occasional opinions and
reasons.
Before finalizing the questionnaire, you should conduct a pretest using a small sample
of respondents to obtain feedback on the length, structure, sequencing, and content.
Generally, a questionnaire should not exceed five pages. It is advisable to adopt a simple
structure with proper sequencing without too many disruptive jumps from one section to
another. Finally, check the content for validity.
Checking validity helps ensure that the data collected, observations made, or changes
implemented are accurate, reliable, and applicable across the settings or scenarios of
interest.
After the pre-test, the questionnaire is refined by checking validity, numbering questions
properly, rewording vague or offensive questions, providing appropriate response options
for each question, removing duplicate and unimportant questions, and putting the more
difficult or contentious questions last rather than first.
5. Translation
• There should be a translation in cases where the language is a problem
6. Ethical Considerations
• Obtaining informed consent before the study or the interview begins
• Ensuring the confidentiality of the data obtained
• Learning enough about the culture of informants to ensure it is respected
during the data collection process
7. Pre-test: it refers to a small-scale trial of particular research components
• Data collection tools
➢ clarity of language, acceptability of questions, the accuracy
of the translation, the time needed to administer the
questionnaire, the need to pre-categorize some answers,
need for additional instructions
• Availability and willingness of respondents
• Sampling procedure
• Procedures for data processing and analysis
Kinds of Questions
Closed questions are more common in questionnaires, and there is a range of ways of
providing the 'closeness' of the answer:
Yes/No Question
From these questions, it is clear that the answer should be a yes or no only.
Multiple-Choice Question
In this type, there are two or more answers, and respondents are told either to
tick one or as many as they like.
For example: Which of the following factors are most significant in the
poor performance of local engineers (tick as many as you like)?
o Budget/capital
o Skilled Manpower
o Gender
o Equipment
o Other (please specify) ___________
Ranking Question
Respondents must put items in order: best to worst, most relevant to least
relevant, and so on.
Rating Scale (Likert Scale)
The respondent will rate something (an experience, attitude, attribute, etc.)
along a continuum.
➢ It is named after the psychologist Rensis Likert and can be used in any
situation where a belief or attitude is being measured
➢ Respondents are asked for their agreement or disagreement with a statement
you provide.
Questionnaire Online
At the time of writing, some online survey services let you construct a
questionnaire free for up to 100 respondents and give help on the construction
of a questionnaire, with 24-hour online support.
If you need an alternative service that lets you survey a larger number of
people, you will probably find that your university subscribes to one. You can
ask your tutor or someone at the Computer Centre which one is used at yours.
• Using simple words; if a technical word is necessary, provide a short explanation within
the question; similarly, provide a map if necessary
• Using fixed-alternative questions that are theoretically sound and not artificially
imposed, and state clearly if multiple answers to a question are possible
• Avoiding vague questions, such as what does “seldom” mean in terms of how
many times I watch movies? It is better to provide frequency counts, such as “On
average, how many movies do you watch in a month?”
• Using open-ended questions where many answers are possible, for example,
“What is your vision for downtown?”
• Avoiding questions that lead to particular answers, for example, “Should
unproductive speculators be taxed?”
• Avoiding double-barreled questions, for example, “Is your work easy and
challenging?” poses a dilemma if it is easy but not challenging
• Stating the units of measurement, for example, gross or net monthly income
• Asking in units that people remember, for example, monthly take-home pay rather
than the annual income
• Using ranges for sensitive issues, for example, income ranges
• De-sensitizing phrases, for example, “Many people surf the Internet for
pornographic sites. Have you done this before?” are less sensitive than “Have you
surfed the Internet for pornographic sites?”
• Avoiding hypothetical questions that tend to be poor predictors because there are
many considerations, for example, “Do you intend to buy this product?”
• Avoiding questions on competency, for example, “How do you rate yourself as a
computer user?” is prone to the prestige bias of overrating one’s competency
• Avoiding questions that have a social desirability bias, for example, “Do you
support this project to help the unfortunate children?”
• Furthermore, being aware of possible researcher bias, because the way questions
are worded or asked may not reflect respondents' views of the issues.
Sometimes, we conduct an item analysis during the pre-test to determine if the responses to
an item (question) correlate well with the responses to other items.
For example, suppose we are interested in rating the services of a subway system
(based on a scale of 1 to 10), and the sections of the questionnaire include:
1. Respondent characteristics
2. Fares
3. Security and so on
We correlate the scores for each item from the pre-test responses with the aggregate scores
for all other items. In the table below, item 3.4 may be correlated with the aggregate scores
for all other items.
Table: Item Analysis
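Since the pre-test table itself is not reproduced here, the following minimal Python sketch (assuming pandas; the item labels and ratings are hypothetical) shows how each item can be correlated with the aggregate scores of all other items:

```python
import pandas as pd

# Hypothetical pre-test ratings on a 1-10 scale, one column per item
ratings = pd.DataFrame({
    "2.1_fares":    [7, 5, 8, 6, 7],
    "3.4_security": [6, 4, 8, 5, 7],
    "4.2_comfort":  [8, 5, 9, 6, 8],
})

for item in ratings.columns:
    others = ratings.drop(columns=item).sum(axis=1)  # aggregate of all other items
    r = ratings[item].corr(others)                   # Pearson correlation
    print(f"{item}: item-total correlation = {r:.2f}")
```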
Standardized Tests
Standardized tests are another way to collect data. They are commonly used in
psychological and educational research, such as in an experiment to test mental ability or the
effectiveness of a teaching method.
The challenges in designing standardized tests include ensuring that the content is
appropriate, that there is sufficient time to complete the test, and that the test is neither too
easy nor too difficult.
Standardization makes such tests less suitable for answering complex questions with no
simple solutions.
These tests are used to quantify or assess particular traits, abilities, behaviors,
knowledge, or characteristics of individuals in a standardized and systematic way.
Administering a standardized test means giving everyone the same questions or tasks with
clear instructions, and making sure the test is fair and consistent for everyone taking it.
Use of Physical Instruments
Physical instruments are widely used in the natural sciences to measure velocity,
acceleration, temperature, distance, mass, pressure, weight, volume, etc. The decision will
depend on cost, availability, accuracy, precision, ease of use, calibration requirements, and
reliability.
Simulation
Review of Documents
Documents should be read critically. For instance, “official” sources may suppress
statistics on worksite accidents or use
different methodologies or words to make the numbers or organization look good.
Data Collection
After developing the research question(s), hypothesis (or framework), research
design, and methods of data collection, the next step in the research
process is the actual collection and processing of data. The processes are linked, and it is
impossible to develop each step independently of prior decisions.
1. Permission to proceed
➢ Obtaining consent from relevant authorities, individuals, and the community
in which the project is to be carried out
2. Data handling
➢ Code and number the questionnaires/samples/measurements
➢ Identify the person responsible for storing data and the place where it will be
stored
3. Data Collection
➢ Logistics
➢ Who will collect data
➢ When and
➢ With what resources
➢ Quality Control
➢ Prepare a fieldwork manual
➢ Select and train your research assistants
➢ Supervise the collection
➢ Check for completeness and accuracy
The issues are similar for all research designs, with minor variations between interpretive and
causal studies. These variations will be highlighted below.
Access to Respondents
The gatekeepers control your access to the organization. They are likely to be a leader
or senior person in the organization. For example, in a school setting, the gatekeeper is the
principal. He will decide whether you can observe and interview school administrators,
teachers, and students. You need to address his concerns, such as:
• The purpose of the study
• Why the school has been selected
• What are you going to do with the result
• Will it disrupt classes
• How the school can benefit from participating
From the last bullet point, we can see that the best way to gain access is to show how the
other party can benefit from the study. For instance, you may want to share your research
findings with the school as an inducement. This means that the research problem is essential
to the school. For example, you may want to share whether a new technique for teaching
mathematics is effective.
Training should be provided to field staff members. Someone familiar with the
fundamental research, such as the principal investigator, should conduct the training. The
briefing includes the nature and purpose of the study and data collection procedures. They
should be trained on specific procedures to be followed when contingencies arise, such as:
• The respondent is not at home
• No one is answering the phone
• The call is directed to an answering machine
• Non-resident answers the call
• The line has been disconnected
• The selected respondent is unable to answer because of physical disability
• There is a language barrier
• The interview is incomplete
• The selected respondent refuses to be interviewed.
Check that equipment such as tape recorders, cameras, and other measuring devices
is calibrated and in good working order. Field equipment should be looked after to prevent
damage, and for safety reasons, only qualified people should handle it. Leaving equipment
unattended invites theft and gives participants the impression of professional
irresponsibility.
For interviews, notebooks, instruction manuals, survey forms, and maps should be
handled appropriately.
Most research designs require the review of documents for qualitative and
quantitative data. There should be a protocol or checklist for reviewing such documents to
extract the data meaningfully and effectively.
The process begins with assessing the types of information required and the types of
documents to review. Some of the information may have been published elsewhere, or there
are alternate sources of information. For example, different government agencies may publish
information on construction statistics. As far as possible, the researcher should triangulate the
data from these sources to minimize errors.
Use original sources when collecting documentary data to minimize transcription and
interpretation errors. Subsequent users may have reorganized the original data, and essential
footnotes on how these data were collected may have been omitted.
Note-taking
The notes will have to be organized to trace the chain of evidence. The notes and
evidence are usually arranged in temporal sequence or themes. Within each theme, there is
still the need to track material changes.
In legal terms, a “chain of evidence” is a series of sequential events that account for
the actions of a particular person in a specific legal case (for example, a criminal case) from
the beginning to the end. The reasoning should be tight for the conclusion to be defensible in
court. It is similar to the causal mechanism or process tracing and is widely used in forensic
science.
Enhancing Reliability
In interviews, the researcher typically lets the respondent express his views freely,
telling or constructing his side of the story. Reliability can be enhanced by cross-checking
his opinions with other sources of evidence. For example, if a worker claims that he works
long hours, this may be checked with colleagues. Triangulation among observers and other
sources of information also minimizes observer bias.
The sources should be reliable and credible for documentary research, such as data
published by reputable researchers and organizations. Triangulation among data sources will
also improve reliability.
Tracking of Progress
For interpretive studies, tracking research progress is less of a problem once access
has been secured, and respondents continue to co-operate. This is not so for survey research,
where the response rate tends to be more uncertain.
Tracking research progress also involves field supervision to ensure that research
assistants follow reasonable field procedures and workloads. It is not unusual for supervisors
to verify a small portion of the interviews or questionnaires by re-interviewing or asking
respondents whether they have been interviewed.
Supervisors should collect survey forms regularly and edit them in the field for
legibility and completeness. Where problems occur, these issues are communicated to field
assistants, and additional training may be necessary. A reminder may be sent, and follow-ups
are made soon after the cut-off date.
➢ Discrete data can only take specific values (like whole numbers)
➢ Continuous data can take any value (within a range)
With your eyes and ears, you get data or information, and with this data you can
answer your questions and support (or not) the claims you made at the beginning of your
research. When data are used to support a proposition in this way, they become evidence.
Types of Data
1. Primary Data: data collected afresh and for the first time, which are therefore
original
2. Secondary Data: data that have already been collected by someone else
Primary Data
1. There are several methods of collecting primary data, particularly in surveys and
descriptive research.
2. In descriptive research, we obtain primary data either through observation or through
direct communication with respondents in one form or another, such as personal
interviews.
Secondary Data
1. These are already available, i.e., they refer to data that have already been
collected and analyzed by someone else.
2. Secondary data may be either published or unpublished. The researcher must be
meticulous in using secondary data because the available data may sometimes be
unsuitable.
Population refers to the group about which the researcher wishes to draw conclusions.
❖ The population is the entire group of interest that the researcher aims to study.
It could be people, objects, events, or any defined unit relevant to the research
question.
❖ It helps researchers determine the appropriate methods, sampling strategies,
and scope of their study, ensuring that the findings are relevant, reliable, and
applicable to the intended context or group.
Samples:
It refers to the members of the population that have been chosen to take part in the
research
It is used to gather data, make inferences, and draw conclusions about the entire
population based on the characteristics of the selected subset
Example:
Scenario: A construction project manager wants to assess the satisfaction levels of clients
who have recently completed home renovation projects with their company. The manager
wishes to gather feedback to improve services and understand areas for enhancement
Population: The population in this scenario would consist of all clients who have recently
undergone home renovation projects with the construction company.
Sample: Instead of surveying or gathering feedback from the entire population of clients
(which might be impractical or resource-intensive), the project manager might opt to take a
sample from this population
The term sample refers to the members of the population that have been chosen to take part in
the research. Sampling procedures must ensure that the sample is representative of the
population.
Figure: Representative samples obtained through random sampling and stratified sampling.
Types of Sampling
1. Probability Sampling
➢ The sampling method gives the probability that our sample is representative of
the population.
2. Non-probability Sampling
➢ If there is no such idea of probability, the sampling method is known as non-
probability sampling.
➢ Non-probability sampling is also known as non-parametric sampling; it is used
for specific purposes.
Figure: Sampling divides into probability sampling and non-probability sampling.
Probability Sampling
Simple Random Sampling
➢ A simple random sample is one in which each element of the population has
an equal and independent chance of being included in the sample
➢ It can be drawn using several techniques (a code sketch follows this list), such as:
➢ Tossing a coin
➢ Throwing a dice
➢ Lottery method
➢ Blindfolded method
➢ By using a random table
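As a minimal Python sketch of the lottery method in code (the population of 5,600 student IDs and the sample size of 30 are hypothetical), random.sample draws each element with an equal chance and without replacement:

```python
import random

population = list(range(1, 5601))        # hypothetical student ID numbers
sample = random.sample(population, 30)   # simple random sample of 30
print(sorted(sample))
```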
Systematic Sampling
With a population list of N members and a desired sample size of n, we select
every (N/n)th individual from the list, and thus we have the desired size of the
sample, known as a systematic sample (a code sketch follows the list below).
Disadvantages
➢ It is not free from error, since subjectivity enters through the different ways
individuals may order the systematic list.
• Information about each individual is essential.
• There is a risk in drawing conclusions from the observations of the sample
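Here is a minimal Python sketch of systematic selection (the values of N and n are hypothetical); taking a random start within the first interval is a common refinement:

```python
import random

N, n = 5600, 30
k = N // n                          # sampling interval, (N/n)
start = random.randrange(k)         # random start within the first interval
population = list(range(1, N + 1))
sample = population[start::k][:n]   # every k-th individual from the start
print(len(sample), sample[:5])
```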
Stratified Sampling
• The researcher divides his population into strata based on some characteristics
and draws a predetermined number of units at random from each of these
smaller homogeneous groups (strata).
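A minimal Python sketch of this procedure (assuming pandas; the strata and the quota of two units per stratum are hypothetical):

```python
import pandas as pd

students = pd.DataFrame({
    "id": range(1, 13),
    "year": ["1st"] * 6 + ["2nd"] * 4 + ["3rd"] * 2,  # strata
})
# Draw 2 students at random from each stratum (year)
sample = students.groupby("year").sample(n=2, random_state=1)
print(sample)
```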
Cluster Sampling
• Selecting whole groups at once is known as cluster sampling (a code sketch
follows the lists below).
• In Cluster sampling, the sample units contain groups of elements (clusters)
instead of individual members or items in the population.
Advantages
• It may be a good representative of the population.
• It is an easy and economical method.
• It is practicable and highly applicable to education.
Disadvantages
• Cluster sampling is not free from error.
• It is not comprehensive.
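The sketch below illustrates cluster selection in Python (the classes and their members are hypothetical): whole clusters are drawn at random, and every member of a selected cluster enters the sample.

```python
import random

clusters = {
    "Class A": ["A1", "A2", "A3"],
    "Class B": ["B1", "B2"],
    "Class C": ["C1", "C2", "C3", "C4"],
    "Class D": ["D1", "D2", "D3"],
}
chosen = random.sample(list(clusters), 2)   # select 2 whole clusters at random
sample = [member for c in chosen for member in clusters[c]]
print(chosen, sample)
```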
Multi-Stage Sampling
• This sample is more comprehensive and representative of the population.
• Primary sampling units are inclusive groups, and secondary units are subgroups
within them; each ultimate unit to be selected belongs to one and only one
group.
• Stages of a population are usually available within a group or population
whenever the researcher stratifies.
• Individuals are selected from the different stages to constitute the multi-stage
sample.
Accidental Sampling
• The term incidental or accidental is applied to samples that are taken because
they are the most readily available.
• This refers to groups that are used as samples of a population because they are
readily available or because the researcher is unable to employ more
acceptable sampling methods.
Advantages
• It is a straightforward method of sampling.
• It reduces time, money, and energy costs.
Disadvantages
• It is not representative of the population.
• It is not free from error
Judgment Sampling
• This involves the selection of a group from the population that, based on
available information, is thought to be representative of the total population.
• Because the judgment rests entirely on the investigator, this form of sampling
is highly risky.
Advantages
• Knowledge of the investigator can be best used in this sampling technique.
• This technique of sampling is also economical.
Disadvantages
• This technique is subjective.
• It is not free from error
• It includes uncontrolled variation
• Generalization is not possible
Purposive Sampling
• The sample is selected by some arbitrary method because it represents the total
population or is known to produce well-matched groups.
• The idea is to pick out the samples concerning some essential criteria for the
particular study.
• This method is appropriate when the study places particular emphasis on the
control of certain specific variables
• Snowball Sampling - begin by identifying someone who meets the criteria for
inclusion in your study, then ask the respondent to recommend others whom
they may know who also meet the criteria
• Modal Sampling - sampling the most frequent case, such as in polls
• Expert Sampling - a sample of persons with known or demonstrable experience
and expertise in some area
• Heterogeneity Sampling - used when we want to include all opinions or views
and are not concerned about representing these views proportionately
Advantages
• Use of the best available knowledge concerning the sample subjects.
• Better control of significant variables.
• The sample group's data can be easily matched.
• Homogeneity of subjects used in the sample.
Disadvantages
• The reliability of the criterion is questionable.
• Knowledge of the population is essential.
• Errors in classifying sampling subjects.
• Inability to generalize the total population.
Quota Sampling
• This combines judgment sampling and probability sampling.
• The population is classified into several categories; based on judgment,
assumption, or previous knowledge, the proportion of the population falling
into each category is decided. After that, a quota of cases to be drawn is fixed,
and the observer is allowed to sample as he likes.
• Quota sampling is very arbitrary and likely to figure in municipal surveys.
Advantages
• It is an improvement over judgment sampling.
• It is an easy sampling technique.
• It is most frequently used in social surveys.
Disadvantages
• It is not a representative sample.
• It is not free from error.
• It is influenced by regional, geographical, and social factors.
Sample Size
In addition to the purpose of the study and population size, three criteria
usually need to be specified to determine the appropriate sample size: the level of
precision (the confidence interval, C in the formula below), the level of confidence
(reflected in the Z value), and the degree of variability in the attributes being measured.
Degree of Variability
The third criterion, the degree of variability in the attributes, refers to the
distribution of characteristics in the population.
The more heterogeneous a population, the larger the sample size required to
obtain a given level of precision. The less variable (more homogeneous) a
population, the smaller the sample size.
You can find the following formulae (or variations thereof) in most statistics
textbooks, especially descriptive statistics dealing with probability.
The sample size for an infinite population (where the population is greater than 50,000) is:
SS = Z² × p × (1 − p) / C²
SS = sample size
Z = Z value (1.96 for a 95% confidence level)
p = percentage of the population picking a choice, expressed as a decimal
C = confidence interval, expressed as a decimal (e.g., ±0.04)
A Z-value represents the probability that a sample will fall within a given part of the
normal distribution.
For a finite population (where the population is less than 50,000), the corrected sample
size is:
New SS = SS / (1 + (SS − 1) / N)
SS = sample size
N = population
Example
Find the sample size for a total student population of 5,600. Use a 95% confidence
level, a population percentage of 60%, and a confidence interval of ±4%. A worked
computation follows below.
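A minimal Python sketch working through this example with the two formulas above (the rounding convention at the end is a choice, not part of the formulas):

```python
Z = 1.96    # 95% confidence level
p = 0.60    # population percentage picking a choice
C = 0.04    # confidence interval (plus or minus 4%)
N = 5600    # total student population

ss = (Z ** 2) * p * (1 - p) / (C ** 2)   # infinite-population sample size
new_ss = ss / (1 + (ss - 1) / N)         # finite-population correction
print(round(ss), round(new_ss))          # about 576 and 523
```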
Many books on sampling techniques and research methodology are available for purchase
or download in electronic formats. Several platforms offer digital versions of these books
for easy access. Here are some popular online platforms where you might find those books:
1. Amazon Kindle Store: Amazon offers a wide range of eBooks on research methods,
sampling techniques, and statistical analysis that can be purchased and downloaded to
Kindle devices or read using the Kindle app on various devices.
2. Google Books: Google Books provides access to a vast collection of books. While
not all books may be available for full download due to copyright restrictions, many
titles offer previews or limited pages for free and allow users to purchase digital
copies.
3. Project Gutenberg: This platform offers free access to a wide range of public
domain books. While it might not have the most recent publications, it can be a
resource for classic texts and older editions that are freely available for download.
4. Online Libraries and Academic Databases: Institutions, universities, and libraries
often provide access to electronic resources, including eBooks, through their online
libraries or academic databases. If you're affiliated with an academic institution, check
their library resources for available digital books on sampling and research
methodology.
5. Publisher Websites: Some publishers provide direct access to digital copies of their
books through their websites or partner platforms. Publishers often offer eBooks for
purchase or download in PDF or ePub formats.
Reading Assignment
Review some of the formulas in determining the sample size out of the
population.
Try looking for the book “Determining Sample Size” by Patrick Dattalo
DATA PROCESSING
After data have been collected, the next step is to process them into information
suitable for analysis. The processing of qualitative data is part of data analysis rather than a
separate process. For quantitative data, processing is necessary to ensure the integrity of the
data.
The first stage of data processing is to edit the data for errors, contradictions,
inconsistencies, and omissions that have escaped preliminary field editing. Falsified data are
usually rejected. If errors or missing data are spotted, a decision has to be made to discard
the information, re-contact the respondent, use an average value among similar respondents,
interpolate from other data, or use subject-matter knowledge to estimate an appropriate
value. Care must be exercised in handling outliers: although they do not fit the theory, they
may provide a refutation of it. Data obtained from published documents may also require
editing. They may contain biases such as arbitrary accounting conventions and failure to
consider quality changes, price discounts, and reporting errors. They may also contain
misprints.
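As an illustration of first-stage editing, here is a minimal Python sketch (assuming pandas; the income figures, the plausible range, and the fill rule are all hypothetical) that flags omissions and suspect values rather than silently discarding them:

```python
import pandas as pd

df = pd.DataFrame({"income": [2500, 2700, None, 2600, 91000]})

print("missing:", df["income"].isna().sum())   # omissions to follow up

# Flag (do not delete) values outside a plausible range; a flagged value may
# be a reporting error, or a genuine outlier that refutes the theory
df["suspect"] = df["income"].notna() & ~df["income"].between(500, 20_000)

# One option from the text for missing data: an average value among similar
# respondents (here, simply the mean of the non-suspect values)
fill_value = df.loc[~df["suspect"], "income"].mean()
df["income_filled"] = df["income"].fillna(fill_value)
print(df)
```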
The second stage of data processing is transforming the data through conversion,
adjustment, or reconstruction. This may involve converting one currency to another,
converting monthly to annual income, deriving net from gross values, or rebasing a time
series to a new base year.
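A minimal Python sketch of these transformations (assuming pandas; the figures, the 20% deduction rate, and the base year are hypothetical):

```python
import pandas as pd

monthly_income = 2600
annual_income = monthly_income * 12        # monthly -> annual

gross = 3000
net = gross * (1 - 0.20)                   # gross -> net, at a 20% deduction

index = pd.Series([95.0, 100.0, 108.0], index=[2004, 2005, 2006])
rebased = index / index.loc[2005] * 100    # rebase the series so that 2005 = 100
print(annual_income, net, rebased.tolist())
```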
The third stage is to code the data by labeling, classifying, and organizing them for
subsequent analysis. Coding is generally straightforward for quantitative data, such as
developing the data table or matrix for regression or multivariate analyses using statistical
software. The researcher should avoid heaping, where too much data fall into a particular
category. If it occurs, reclassification is necessary.
For qualitative data, coding is the basis for analysis because respondents provide
open-ended answers to questions. The development of the storyline is fundamental in
qualitative studies, so it is not just a matter of labeling and classifying data.
The processing of data and further analysis may be broken up into three main stages:
1. Data management
2. Exploratory data analysis
3. Statistical analysis (testing and modeling)
These stages can be described as follows:
1. Data management: This initial stage involves organizing, cleaning, and preparing the
raw data for analysis. It includes activities such as data collection, data entry, data
validation, and data cleaning to ensure accuracy and consistency. Data management
aims to transform raw data into a usable format that can be effectively analyzed.
Techniques like data coding, formatting, and structuring are part of this stage. The
goal is to create a dataset that is ready for further exploration and analysis.
2. Exploratory data analysis: Once the data is cleaned and organized, the next stage
involves exploring and understanding the characteristics, patterns, and relationships
within the dataset. Exploratory data analysis employs various statistical and
visualization techniques to describe the main features of the data, detect anomalies or
outliers, identify trends, and uncover insights. This stage focuses on summarizing the
data, examining distributions, relationships between variables, and gaining an initial
understanding of the dataset's key aspects. Visualizations like histograms, scatter
plots, and summary statistics (mean, median, standard deviation, etc.) are commonly
used in this phase.
3. Statistical analysis (Testing and Modeling): In this final stage, statistical methods,
tests, and models are applied to the prepared data to test hypotheses, make
predictions, or derive conclusions. This phase involves more advanced statistical
techniques such as hypothesis testing, regression analysis, machine learning
algorithms, or other modeling approaches depending on the nature of the research or
analysis goals. The aim is to draw meaningful conclusions, validate hypotheses, or
develop models that can predict or explain phenomena within the dataset.
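To illustrate stages 2 and 3, here is a minimal Python sketch (assuming the pandas and scipy libraries; the hours-versus-score data are hypothetical) that first summarizes the data and then fits a simple regression with a test of the slope:

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "score": [52, 55, 61, 60, 68, 70, 75, 79],
})

# Stage 2, exploratory analysis: distributions and summary statistics
print(df.describe())   # mean, std, quartiles for each variable
print(df.corr())       # correlation between the variables

# Stage 3, testing and modeling: fit a simple linear regression and test
# the null hypothesis that the slope is zero
result = stats.linregress(df["hours"], df["score"])
print(f"slope={result.slope:.2f}, r={result.rvalue:.2f}, p={result.pvalue:.4f}")
```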
These stages are interconnected and iterative; findings from one stage may lead to revisiting
the previous stages for further refinement or exploration. Each stage plays a crucial role in the
overall data analysis process, leading to a comprehensive understanding and interpretation of
the data to derive valuable insights and conclusions.
DATA ANALYSIS
The methods you use to analyze your data will depend on whether you have chosen to
conduct qualitative (words) or quantitative (numbers) research, and this choice will be
influenced by:
• personal preference
• methodological preference
• educational background
For quantitative data analysis, issues of validity and reliability are essential.
The analysis of large-scale surveys is best done using statistical software, although
simple frequency counts can be undertaken manually.
1. Eyeballing - Eyeballing just means looking at your numbers to see what they
tell you:
1. What do they seem to say?
2. Are they going up or down?
3. Are they all around one point?
4. Are there any that seem not to fit with the others?
2. Descriptive statistics
The statistics described here are called descriptive statistics (because they
describe the data), but there are others with different purposes.
If you use these statistics, you will probably need to use SPSS rather than Excel.
SPSS stands for Statistical Package for the Social Sciences.
3. Explaining
1. The next thing to remember with numbers is that the numbers and the
statistics used to analyze them serve no function in themselves.
2. They help you analyze, but this analysis exists for you to explain,
discuss, and communicate your findings.
3. So, remember that when you present an analysis using numbers, you
will need to explain with words
Ask two researchers to analyze a transcript, and they will probably come up
with very different results.
Coding Analysis - The raw data require some processing before the actual data analysis.
These may include making back-ups of original copies, indexing sources for easy reference
and retrieval, and transcribing audio recordings into texts to facilitate analysis. The coding
of texts consists of assigning words, phrases, symbols, or numbers to each category.
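A minimal Python sketch of such coding (the categories, numeric codes, and responses are hypothetical):

```python
# Numeric codes assigned to categories of open-ended responses
codes = {"cost": 1, "safety": 2, "convenience": 3, "other": 9}

responses = [
    ("R01", "cost"),
    ("R02", "safety"),
    ("R03", "convenience"),
    ("R04", "other"),
]
coded = [(rid, codes[category]) for rid, category in responses]
print(coded)   # [('R01', 1), ('R02', 2), ('R03', 3), ('R04', 9)]
```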
Reflexivity reduces researcher bias through continuous self-questioning, not taking anything
for granted — not even language. There is awareness of self-bias and recognition of multiple
views, explanations, and options. It is a crucial principle of qualitative research.
The narrative is a non-fiction storyline that guides the entire data analysis. It is an intelligible
story of human actions and the resulting events in the temporal order. The framework or
hypothesis provides the guideposts for the story. The researcher then proceeds to code the
data, knowing that the narrative may change as the researcher discovers new ideas and
evidence. The goal is the construction of coherent narratives of the changes occurring for an
individual, a community, a site, or a program or policy.
The purpose of discourse analysis is to deconstruct texts to reveal how they create particular
views of “reality” and sustain forms of life through such cognition and power relations.
Content analysis: reducing large amounts of unstructured textual content into manageable
data relevant to the research questions. Content analysis involves quantifying the contents of
written or digital texts, such as looking for the occurrences of particular words or images.
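A minimal Python sketch of this kind of counting (the transcript excerpt and keywords are hypothetical):

```python
import re
from collections import Counter

transcript = """The fares are too high and the trains feel unsafe at night.
Fares went up again this year, although security has improved."""

words = re.findall(r"[a-z']+", transcript.lower())  # tokenize to lowercase words
counts = Counter(words)
for keyword in ["fares", "security", "unsafe"]:
    print(keyword, counts[keyword])   # fares 2, security 1, unsafe 1
```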
Counterfactual approaches
➢ Developing an estimate of what would have happened in the absence of the
program or policy implies using a control group or comparison group.
Consistency of evidence with the causal relationship
➢ It identifies patterns that would be consistent with a causal relationship and
then seeks confirming and disconfirming evidence.
Ruling out alternatives
➢ Identifying possible alternative causal explanations and then seeking
information to determine if these can be ruled out.
The same evaluation also provides excellent examples of safeguards to ensure the
confidentiality of data:
• During the data collection stage, respondents were assured that all data were provided
confidentially and would be used exclusively for evaluation purposes
• During the focus group discussions, the names of the participants were not recorded,
nor were the sessions taped
• During the interviews, the interactions among the participants themselves, as well as
between the participants and the evaluators, were based on mutual respect and trust