Data Collection Methods
• Quantitative
• It is used to quantify variables of any form by
generating numerical data or data that can be
transformed into usable statistics.
• It uses measurable data to formulate facts, to
uncover patterns in research and generalize results
from a large sample population.
Qualitative data
• Qualitative
• It is normally used to gain an understanding of
reasons, opinions, processes, and perceptions. It
uses a relatively small sample and is non-statistical.
• A mixed method using quantitative and qualitative
methods is very common, especially with case studies,
and can be helpful in the triangulation of data required
to achieve validity.
Data Collection
• Qualitative Observation
• Log (Field Diary)
• Anecdotes
• Video recording
• Quantitative Observation
• Checklist
• Rating scale
Log (Field Diary)
Disadvantages (video recording):
1. Technical problems with lighting, camera lens, etc.
2.Camera angle adopted could present a lop-sided view of
an event or situation.
3.Participants may be more self-conscious in front of
camera.
Checklist
• A checklist includes several items on a topic and requires the same
response format for all items.
• It consists of a list of items with a place to check or to mark
“Yes” or “No”.
• A checklist enables the observer to record the presence or absence
of a trait.
• It consists of a listing of steps, activities, and behaviours which
the observer records when an incident occurs.
• Useful for evaluating skills, behaviour, conditions, personality,
manifestations etc.
Checklist…..
Advantages:
1.Allow inter-individual comparisons.
2.Simple method to record observation.
3.Useful in evaluating learning activities.
4.Useful in containing the attention of the observer.
5.Decreases the chances of error in observation.
Checklist…..
Disadvantages:
1. Does not indicate quality of performance, so usefulness
is limited.
2. Only a limited content of overall clinical performance
can be evaluated.
3. Only the presence or absence of an attribute, behavior
or performance parameter may be assessed.
4. Degree of accuracy cannot be assessed.
Rating Scales
• Rating scales resemble checklists but are used when finer discrimination
is required; they indicate the degree to which a trait is
present.
• Rating scales provide systematic procedures for
obtaining, recording and reporting the observer’s
judgement.
• By a rating is meant the judgement of one person by
another.
Rating Scales: Types
[Figure: example rating scale on which subjects A, B and C are rated as Very Active, Moderately Active, Active or Passive]
Numerical Rating Scale
It divides the evaluation criteria into a fixed number of
points, but defines the numbers at the extremes only.
1 - Very Poor
2 - Poor
3 - Moderate
4 - Good
5 - Very Good
Rating Scale
Advantages
1. Easy to make and administer.
2. Easy to score.
3. Less time-consuming.
4. Can be used for large populations.
5. Can be used to evaluate performance, skills and product
outcomes.
Rating Scale
Disadvantages
1. Difficult or dangerous to fix a rating on many
aspects of an individual.
2. Chances of subjective evaluation, thus scales may
become unscientific and unreliable.
Interview
• Involves verbal communication between the researcher
and the subject during which information is provided to
the researcher.
• An interview is a conversation carried out with the definite
purpose of obtaining certain information by means of
the spoken word.
• Most common method in qualitative and descriptive
studies.
Interview: Benefits
• Provides in-depth and detailed information
• Data from illiterate subjects
• Higher response rate
• Clarifies misunderstandings
• Ask questions at several levels
• Helps to gather supplementary information
• Use of special devices
• Accuracy can be checked
• Flexible and adaptable
Types of Interview
Advantages
1. Information obtained in such a casual manner enhances
reliability and credibility of data.
2. Explorative and qualitative studies
3. Less prone to interviewer biases.
4. Probing questions can be used to obtain additional information
or clarification.
Unstructured Interview
Disadvantages
1. The interviewer requires a great deal of knowledge and skill
in order to analyse the data.
2. Information cannot be compared.
3. Analysis will be difficult.
4. Data interpretation based on researcher’s perception and
subjective feelings.
5. Time wasting.
Semi-structured Interview
• Used when the researcher has a list of topics or broad questions
that must be addressed in an interview.
• Interviewers prepare a topic guide (interview guide) containing
a broad list of topics to be covered in the interview.
• Topic guide: a set of questions or list of topics.
• Participants are encouraged to talk freely about the topics mentioned
in the topic guide.
• The researcher may ask the questions in different ways to different
participants.
• Includes both closed-ended and open-ended questions.
Semi- structured Interview
Advantages
1. Less prone to interviewer's bias.
2. More information can be explored from the respondents.
3. Needed data is collected.
4. Guides the interview.
Semi- structured Interview
Disadvantages
1. Some of the information may not be revealed.
2. Need to prepare a topic guide.
Structured Interview
• It involves asking the same questions, in the same order,
and in the same manner, of all respondents in a study.
• It commonly has fixed-type, closed-ended
questions.
• It is also known as a standardized interview.
• Interviewers are not permitted to change even the specific
wording.
• It increases the reliability and credibility of data.
Structured Interview
Advantages
1. Data of two interviews are easily comparable.
2. Recording, coding and analysis of data is easy.
3. Avoids irrelevant purposeless conversation.
Structured Interview
Disadvantages
1. In-depth information may not be possible.
2. Exploration of data is limited.
3. It may not cover all the possible responses or
respondent views.
Focus–Group Interviews
Focus–Group setting
Focus–Group Interviews
• In these interviews, the opinions and experiences of a homogeneous
group of 5-10 people are solicited simultaneously.
• The interviewer/researcher guides the discussion according
to a written set of questions or topics.
• It is a planned discussion.
• Duration of the interview ranges from 1.5-2 hours.
• All the verbal & non-verbal information is recorded.
• Ample opportunity is given to respondents to express their
views.
Focus – Group Interviews
Advantages
1. Efficient and can generate a lot of information.
2. Stimulates new ideas and creative concepts
3. Involves many participants at one time.
4. Participants may feel comfortable to answer in a group
with similar interests.
Focus – Group Interviews
Disadvantages
1. Chances of client or researcher’s bias.
2. May be difficult to moderate by one person.
3. Data difficult to code, analyze and interpret.
4. Focus group may not be representative of entire
population.
Joint Interviews
• They are conducted simultaneously to understand the
phenomenon involving two or more parties. People involved
in the interview are intimately related.
• Merits:
• Helpful in observing dynamics between two key actors.
• De-merits:
• Only supplements information.
• May be uncomfortable for participants, as some things
cannot be discussed in front of other people.
Life Histories
• They are narrative self-disclosures about individual life
experiences.
• Researchers ask respondents to provide, often in
chronological sequence, a narration of their ideas and
experiences, either orally or in writing.
• Life histories are usually used in ethnographic
studies.
• Example: A study involving experiences of women who had
simultaneously experienced abuse and physical disability.
Critical Incidents
• It is a method of gathering information about people’s
behaviors by examining specific incidents relating to the
behavior under investigation.
• The word ‘critical’ means that the incident must have had a
positive or negative impact on some outcome.
• Example: A study of the outcomes of a stress management
program. The 5-week program taught the ‘mind-body-spiritual’
technique of silently repeating a mantra with spiritual meaning.
Three months later, critical incident interviews with 55 participants
yielded 147 incidents involving application of the technique.
Photo- Elicitation Interviews
• It involves an interview stimulated and guided by photographic images.
• Photographs of the participant’s world are taken by the researchers
themselves or by the participants and become a stimulus for
discussion.
• Example: In an attempt to explore meaning and experience of hope
among young people living in Australia, participants can be given a
disposable camera to take photos showing hope for them and then
questioned during interviews.
• Participants need to be continually reassured that their taken-for-granted
explanations of the photos are providing new and detailed
information.
Questioning
• This method allows the researcher to gather information
by asking the questions orally (interview) or by means of
a formal, written document (questionnaire).
• Questionnaire: a structured instrument consisting of
a series of questions prepared by the researcher, which
a subject is asked to complete either on paper with a
pencil or on a computer; it is used to gather data
on the phenomenon under study.
Questioning
• The instrument is called an SAQ (self-administered questionnaire)
when respondents complete the instrument themselves, usually
in a paper-and-pencil format.
• An SAQ is also known as a ‘survey’.
• Methods of Questionnaire Administration:
1. Direct administration: the researcher distributes the
questionnaire and the respondent answers the items by writing or
checking the appropriate response.
2. Post or e-mail, including all electronic means (mailed
questionnaire).
Questionnaire: Types of Questions
Open-ended questions
• Provide opportunity to the respondents to express their
opinions and answers in their own way.
• No predetermined set of responses.
• Provide true, insightful and unexpected suggestions.
Questionnaire: Types of Questions
Closed-ended questions
• Closed-ended (fixed-alternative) questions:
response alternatives are pre-specified by the
researcher.
• Facilitate easy statistical analysis.
• Can be asked to different groups at different intervals.
Closed-ended questions
1. Dichotomous Questions
2. Multiple-choice Questions
3. Cafeteria Questions
4. Rank-order Questions
5. Contingency Questions
6. Rating Questions
7. Likert Questions
8. Bi-polar Questions
9. Matrix Questions
Dichotomous Questions
Example of a Likert-type item (five-point agreement scale):
Question: This community is a good place to raise children.
1 = Strongly Agree, 2 = Agree, 3 = Uncertain, 4 = Disagree, 5 = Strongly Disagree
Matrix Questions
• Include multiple questions and identical response
categories, placed one under the other, forming a matrix.
• E.g. Please let me know your weekly schedule of the
following:
Mon Tues Wed Thurs Fri Sat Sun
Gym √ √ √ √ √
Aerobics √
Eating Out √ √
Drink √
Scales: Types
Likert Scale
Competent 7 6 5 4 3 2 1 Incompetent
Worthless 7 6 5 4 3 2 1 Valuable
Pleasant 7 6 5 4 3 2 1 Unpleasant
Semantic Differential Scale: Uses
Advantages
1. Easy to construct
2. Highly flexible.
3. Useful in evaluating several concepts such as person,
place, situation, abstract idea, controversial issue etc
Semantic Differential Scale
Disadvantages
1. Difficult to select relevant concepts appropriate for a
given study.
2. Time consuming to find bipolar adjectives.
Visual Analogue Scale
• It is used to measure the intensity of certain sensations and
feelings such as pain, discomfort, anxiety, alertness, severity of
clinical symptoms, functional ability, and attitude towards
environmental conditions.
• It is a 100 mm horizontal or vertical line with a statement at
either end representing one extreme of the dimension being
measured.
• Subjects mark the point on the line that corresponds to their
experience of the phenomenon; the response is scored later by
measuring the distance from the left end with a ruler.
Visual Analogue Scale
Advantages
1. Reliable and Valid tool to measure the intensity of
certain sensations and feelings.
2. Rating of highly subjective phenomenon is possible by
this scale.
3. Most useful in studying changes in the phenomenon.
Visual Analogue Scale
Disadvantages
1. Cannot be used to compare results across groups of
individuals at the same time.
Schedules
• It is very much like the collection of data through a
questionnaire; the small difference lies in the fact
that schedules are filled in by enumerators
who are specially appointed for the purpose.
• These enumerators go to the respondents with the schedules,
put to them the questions from the
proforma in the order in which they are listed, and
record the replies in the space provided in
the proforma.
Differences
Questionnaire | Schedule
Sent through mail | Filled out by the researcher (enumerator), who can interpret questions when necessary
Money is spent only in preparing the questionnaire | Money has to be spent in appointing enumerators
The questionnaire method is likely to be very slow | The information is collected well in time
Usable only when respondents are literate and cooperative | Usable when the respondents happen to be illiterate
Success depends more on the quality of the questionnaire itself | Success depends upon the honesty and competence of enumerators
Warranty cards
Examples of quantitative data include:
• The ages (in years) of survey respondents
• The numbers of eggs that hens lay
• The amounts of milk from cows
• The weights of shipments received
• The numbers of cans of a drink
Quantitative Data
Quantitative data can be further described by distinguishing
between discrete and continuous types.
1. Discrete data result when the number of possible values
is either a finite number or a “countable” number. (That is,
the number of possible values is 0 or 1 or 2, and so on.)
2. Continuous data result from infinitely many possible
values that correspond to some continuous scale that
covers a range of values without gaps, interruptions, or
jumps.
Quantitative Data
Examples include:
• The ages (in years) of survey respondents - Discrete
• The numbers of eggs that hens lay - Discrete
• The amounts of milk from cows - Continuous
• The weights of shipments received - Continuous
• The numbers of cans of a drink - Discrete
• The volume or weight of Drink - Continuous
Categorical data
Examples include:
• Geographical locations of retail outlets
• Rankings of subordinates at yearly reviews
• The political party affiliations
• The numbers that are sewn on the shirts of Players
Data matrix
Something or someone
Characteristics of something or someone
Eg. Football player data
Eg. Football teams data
Case: CONDITION
Variables and constants
Football League
Football League data
Cases - Row wise
Variables - Column wise
Observations
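The row-wise and column-wise layout described above can be sketched in a few lines of Python. The player names and numbers below are hypothetical illustration values, not data from these slides, and the helper `column` is an assumed name introduced only for this sketch.

```python
# Hypothetical football-player data matrix (illustrative values only).
# Each row is a case (a player); each column is a variable.
variables = ["name", "age", "goals"]
data_matrix = [
    ["Player A", 24, 11],
    ["Player B", 31, 4],
    ["Player C", 27, 7],
]

def column(matrix, names, variable):
    """Return every observation of one variable, i.e. one column of the matrix."""
    idx = names.index(variable)
    return [row[idx] for row in matrix]

goals = column(data_matrix, variables, "goals")
print(goals)  # [11, 4, 7]
```

Each cell of the matrix holds one observation: the value of one variable for one case.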
Data matrix
• Usually we do not present the
complete data matrix to other
people because it is often huge.
• It doesn't give a clear overview of
the statistical information contained
within the data matrix.
• When presenting the information,
make use of summaries of the data in
the form of tables and graphs,
or measures of central tendency
and dispersion.
Measurement of data
Level of Measurement
• In applying statistics to real problems, the level of
measurement of the data helps us decide which
procedure to use.
• Another common way of classifying data is to use four
levels of measurement:
1.Nominal,
2.Ordinal,
3.Interval, and
4.Ratio.
Nominal Level
• The lowest level of data measurement is the nominal level.
Interval Level
• Temperatures
• Years
• The change in stock price
• The % change in employment
• The % return on a stock
Ratio Level
• Ratio-level data measurement is the highest level of
data measurement
• The ratio level of measurement is the interval level with
the additional property that there is also a natural zero
starting point (where zero indicates that none of the
quantity is present). For values at this level, differences
and ratios are both meaningful.
• Distances • Height • Time
• Prices • Weight • Volume
Ratio Level
Levels of Measurement
Ratio: There is a natural zero starting point and ratios are meaningful. Example: Distances
• The variable weight is
quantitative, but it can sometimes
be turned into an ordinal
variable.
• Let’s explore how to
summarize quantitative
variables graphically and
visualise their distribution.
• There are three types of
displays: the dot plot,
stem-and-leaf plot, and
histogram.
Dot Plots
• A relatively simple statistical chart that is generally
used to display continuous, quantitative data is the dot
plot.
• In a dot plot, each data value is plotted along the
horizontal axis and is represented on the chart by a dot.
• If multiple data points have the same values, the dots
will stack up vertically.
• It may not be possible to display a large number of
the data values along the horizontal axis.
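As a rough illustration of the stacking behaviour described above, the following sketch draws a text-only dot plot. It assumes integer data and uses `*` for each dot; the function name `dot_plot` and the sample values are hypothetical.

```python
from collections import Counter

def dot_plot(values):
    """Return a text dot plot for integer data.

    One column per integer between min and max; repeated values stack vertically.
    """
    counts = Counter(values)
    xs = list(range(min(values), max(values) + 1))
    width = max(len(str(x)) for x in xs)
    height = max(counts.values())
    rows = []
    # Build rows from the top of the tallest stack down to the axis.
    for level in range(height, 0, -1):
        cells = [("*" if counts[x] >= level else " ").center(width) for x in xs]
        rows.append(" ".join(cells).rstrip())
    # Final row is the horizontal axis with the data values.
    rows.append(" ".join(str(x).center(width) for x in xs).rstrip())
    return "\n".join(rows)

print(dot_plot([2, 3, 3, 4, 4, 4, 6]))
```

With many distinct values the axis row quickly becomes too wide to read, which is the limitation noted in the last bullet.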
Dot Plots
• Sample Mean
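For reference, the sample mean is conventionally written as follows. The symbols are assumed notation, since none is defined in the text: $x_i$ for the individual observations and $n$ for the sample size.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```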
Mean - Example
Properties of the Mean
• If your variable is
• Categorical - use the mode,
• Quantitative - median or
the mean.
Mean Vs. Median
• If your data has
• Influential outliers - Median
• Highly skewed - Median
• Neither (roughly symmetric, no influential outliers) - Mean.
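A small sketch of why the median is preferred when an influential outlier is present. The values are hypothetical and chosen only to make the effect visible.

```python
from statistics import mean, median

values = [30, 32, 35, 36, 38]      # hypothetical, roughly symmetric data
with_outlier = values + [400]      # same data plus one influential outlier

# Without the outlier, the mean and median are close together.
print(mean(values), median(values))

# With the outlier, the mean is pulled far upward while the median barely moves.
print(mean(with_outlier), median(with_outlier))
```

The single extreme value moves the mean from about 34 to about 95, while the median only shifts from 35 to 35.5.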
Measure of Dispersion
The probability of a
value being between z
= -2.33 and -0.67 is
0.2415.
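The probability quoted above can be checked with Python's standard library, assuming a standard normal distribution (mean 0, standard deviation 1). This is only a verification sketch, not part of the original slide.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# P(-2.33 < Z < -0.67) = Phi(-0.67) - Phi(-2.33)
p = std_normal.cdf(-0.67) - std_normal.cdf(-2.33)
print(round(p, 4))  # approximately 0.2415
```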
Statistical Inference
There are two types of statistical inference methods
—estimation of population parameters and testing
hypotheses about the parameter values.
Taxonomy of Inferential Techniques
Hypothesis testing
Parametric tests
Estimation
• We can estimate the value of a population parameter in
two ways:
1. Point estimate - a single number that is our best
guess for the population parameter (or) by means of
2. Interval estimate - a range of values within which
we expect the parameter to fall.
• The probability that the interval contains the population
value is what we call the confidence level.
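The two kinds of estimate can be illustrated with a minimal sketch. It assumes a known population standard deviation (so a z-based interval applies) and uses hypothetical sample values; `z_interval` is a name introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_interval(sample, sigma, confidence=0.95):
    """Point and interval estimate of a mean, assuming sigma is known."""
    n = len(sample)
    point = mean(sample)                                  # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return point, (point - margin, point + margin)        # interval estimate

sample = [48, 52, 50, 49, 51]   # hypothetical measurements
point, (low, high) = z_interval(sample, sigma=2.0)
print(point, round(low, 2), round(high, 2))
```

Raising the confidence level widens the interval; a larger sample narrows it.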
Point Vs. Interval Estimate
Point estimate
Example: Courtroom
Significance level-Decision
• The probability of a Type I error equals the significance level:
P(Type I error | null hypothesis is true) = 𝛂.
• It is tempting to simply decrease the
significance level.
• However, decreasing P(Type I error) increases
P(Type II error), and vice versa.
• The power of a test is the probability of rejecting
the null hypothesis, given that it is false, or 1 - P(Type
II error).
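The trade-off between the two error types can be made concrete with a sketch for a right-tailed z test. The effect size, standard deviation and sample size are hypothetical, and `power_one_sided_z` is an assumed helper name; the formula used is the standard power expression for a one-sided z test with known sigma.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(alpha, effect, sigma, n):
    """Power of a right-tailed z test of H0: mu = mu0 when the true mean is mu0 + effect."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha)                 # critical value for this alpha
    return 1 - std.cdf(z_crit - effect * sqrt(n) / sigma)

# Hypothetical setting: true shift of 1.0, sigma 4.0, sample size 50.
for alpha in (0.10, 0.05, 0.01):
    power = power_one_sided_z(alpha, effect=1.0, sigma=4.0, n=50)
    beta = 1 - power                                # P(Type II error)
    print(alpha, round(beta, 3), round(power, 3))
```

With the sample size held fixed, each decrease in alpha raises P(Type II error) and lowers the power.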
Significance level-Decision
P-Value: Scuba-diving
P-Value/Critical Value
P-Value: Scuba-diving
Conclusion
• The significance level (denoted by 𝛂) is the probability that the
test statistic will fall in the critical region (more extreme or
occurs intentionally) when the null hypothesis is actually true.
• If the test statistic falls in the critical region, we reject the
null hypothesis
• We can define the confidence level for a confidence interval to
be the probability 1 - 𝛂.
• Common choices for 𝛂 are 0.05, 0.01, and 0.10, with 0.05 being
most common.
Conclusion
• For a two tailed test
Conclusion: Scuba-diving
Conclusion
• Fail to reject Null Hypothesis (ACCEPT)
• If P value > 𝛂/2 Value(Two tailed)
Step 4: P-Value/ Critical Value
P-value = 1 - 0.9987 = 0.0013
z test: Known Population
Step 5: Conclusion
• Assume significance level as 0.05
0.0013 < 0.05
• Critical value with 0.05 significance level is 1.645
3.05 > 1.645
• Reject the null hypothesis: the mean has changed from its
claimed value
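The decision in Step 5 can be reproduced with the standard library, assuming a right-tailed test (which is what a critical value of 1.645 at the 0.05 level implies). The test statistic 3.05 is taken from the conclusion above; everything else is computed.

```python
from statistics import NormalDist

std = NormalDist()
z_stat = 3.05    # test statistic quoted in the conclusion
alpha = 0.05     # significance level assumed in the text

z_crit = std.inv_cdf(1 - alpha)   # right-tail critical value, about 1.645
p_value = 1 - std.cdf(z_stat)     # right-tail area beyond the test statistic

# Both decision rules must agree: statistic beyond critical value, p-value below alpha.
reject = z_stat > z_crit and p_value < alpha
print(round(z_crit, 3), reject)
```

The exact tail area for z = 3.05 comes out near 0.0011, slightly below the table-based 0.0013 quoted in Step 4; either value leads to the same decision.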
z test: Summary
• To investigate the significance of the difference
between
• an assumed population mean μ (or proportion) and a
sample mean x̄ (or proportion), with variances known;
• the means/proportions/counts of two
populations, with variances known and equal or not
equal.
t test: Testing a Claim About a
Mean: 𝛔 Not known
df = n-1 = 40-1 = 39
t test: Example
Step 4: Critical Value
• Using the test statistic of t = 1.501, we now
proceed to find the critical value from the
t-distribution table with df = 39.
• Assume a significance level of 0.05 in one tail to
find the critical value.
t_critical = 1.685.
t test: Example
Step 5: Conclusion
• Because the test statistic is less than the critical value, t < t𝛂 (1.501 < 1.685), we fail to reject the null hypothesis.
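For reference, the one-sample t statistic and the right-tailed decision rule used in this example are conventionally written as below. The symbols $\bar{x}$, $s$, $n$ and $\mu_0$ (sample mean, sample standard deviation, sample size, hypothesized mean) are assumed notation; the sample values behind t = 1.501 are not given in the text.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1
\\[4pt]
\text{Reject } H_0 \text{ if } t > t_{\alpha,\, n-1}; \qquad
\text{here } 1.501 < t_{0.05,\,39} = 1.685, \text{ so } H_0 \text{ is not rejected.}
```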
Step 1: Assumptions
• The variable is quantitative and variance of tubes
are restricted to less than 4.
• Randomization, such as random sampling or a
randomized experiment
• The number of tubes is normally distributed.
χ2 test
Step 2: hypotheses
• The null hypothesis is that the variance is
acceptable with no problems—the variance is equal
to (or less than) 4.
H0: 𝛔² = 4
df = n - 1 = 8 - 1 = 7
χ2 test
Step 4: P-Value/ Critical Value
• Assuming a significance level of 0.05, the critical
value is
χ²(0.05, 7) = 14.0671
χ2 test
Step 5: Conclusion
• Because the observed chi-square value, χ² = 36.72,
is greater than the critical chi-square table value,
χ²(0.05, 7) = 14.0671, the null hypothesis is rejected.
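The chi-square statistic for a single variance is conventionally computed as (n - 1)s²/σ₀². The sketch below applies that formula to this example. The sample variance is not given in the text, so the value used here is an assumption back-calculated from the reported statistic of 36.72.

```python
def chi_square_variance_stat(n, sample_variance, hypothesized_variance):
    """Chi-square statistic for testing a single population variance."""
    return (n - 1) * sample_variance / hypothesized_variance

n = 8                 # sample size, from df = n - 1 = 7
sigma0_sq = 4         # hypothesized variance from H0
s_sq = 20.9829        # ASSUMPTION: back-calculated from the reported chi-square of 36.72

stat = chi_square_variance_stat(n, s_sq, sigma0_sq)
critical = 14.0671    # table value quoted above for alpha = 0.05, df = 7

# Right-tailed test: reject H0 when the statistic exceeds the critical value.
print(round(stat, 2), stat > critical)
```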
Step 5: Conclusion
• The observed F value is 5.62, which is greater
than the upper-tail critical value of 3.59.
• Thus, the decision is to reject the null
hypothesis. The population variances are not
equal.
F-test: Summary
• To investigate the significance of the difference
between
• two population variances. (variance ratio test)
• the overall mean of K sub-populations and an
assumed value μ0 for the population mean.
• two counted results (Poisson distribution)
• To test the null hypothesis that K samples are from K
populations with the same mean. (analysis of variance)
ANOVA
More than two groups?
• We could compare the means against each other
using a “two sample t-test” which tests the
statistical differences between the means of two
groups.
More than two groups?
• This increases the chance of a Type 1 error.
• For example: setting the alpha value to 5% and
comparing 5 different groups gives us 10 pairs of
tests.
P(no Type I error) = (1 - 0.05)^10 ≈ 0.60 (60%)
• Uses the F-statistic to compare the variances of
the populations.
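The multiple-comparison arithmetic above can be reproduced directly. The sketch assumes the 10 pairwise tests are independent, which is the simplification behind the (1 - 0.05)^10 calculation.

```python
from math import comb

alpha = 0.05
groups = 5

pairs = comb(groups, 2)               # number of pairwise t-tests: 10
p_no_type1 = (1 - alpha) ** pairs     # chance of no Type I error in any test
p_at_least_one = 1 - p_no_type1       # family-wise Type I error rate

print(pairs, round(p_no_type1, 3), round(p_at_least_one, 3))
```

So the chance of at least one false rejection across the 10 tests is roughly 40%, far above the nominal 5%, which is the motivation for using a single ANOVA instead.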
ANOVA Assumptions
• All samples must be randomly selected
• The populations must be normally distributed
• The samples must be independent from one
another
• Each population must have the same variance
• N-way means there are N factors that describe
the cause of the variation in the data
ANOVA
• Null hypothesis in ANOVA
H0: All means are equal.
• Alternative hypothesis is
H1: Not all means are equal (at least one mean differs)
ANOVA
• The ANOVA will test whether the means of all the
populations are equal for different levels of one factor.
• The “F-ratio” is used to do this.
ANOVA
• “F-ratio” is the ratio: between groups variance / within
groups variance
• The F-ratio gets larger as the overlap between the distributions gets
smaller; a larger F indicates a difference in the group
means.
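The F-ratio defined above can be sketched for a one-way layout as follows. The three groups are hypothetical values chosen so that the arithmetic is easy to follow by hand, and `one_way_f_ratio` is a name introduced here for illustration.

```python
from statistics import mean

def one_way_f_ratio(groups):
    """F-ratio for one-way ANOVA: between-groups variance / within-groups variance."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = mean(all_values)

    # Between-groups sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of the values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)        # df between = k - 1
    ms_within = ss_within / (n_total - k)    # df within = N - k
    return ms_between / ms_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # hypothetical data
print(one_way_f_ratio(groups))
```

For these values the between-groups mean square is 13 and the within-groups mean square is 1, so F is 13 (up to floating-point rounding). Moving the group means closer together shrinks F toward zero.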
ANOVA Table
ANOVA: Example
A process can be run at three temperatures for
conditioning, say 200°C, 220°C and 240°C. Using the data
below for four batches run randomly, determine whether the
temperature significantly affects the moisture content (%)
at a significance level of 0.05.
ANOVA Table
Test Statistic
Conclusion
• The test statistic is F = 8.85/0.37 = 23.92.
• From the F-table, the critical value is
F = 4.26.
• Since this is a right-tailed test with one rejection region
and the test statistic exceeds the critical value, the
null hypothesis is rejected.
• At the 0.05 level of significance, the data indicate
that temperature affects moisture content.
One-way versus Two-way