12b Module 8-A. Data Analysis
Key Choice:
• Quantitative Analysis
• Qualitative Analysis
Variables
A variable is a characteristic or attribute that varies or changes
over time or among individuals or groups
Examples: age, gender, agricultural production, kilometers of
paved roads, number of children who are undernourished,
hectares of national parks, etc.
Examples
Independent variable: education
Dependent variable: income
Types of Variables
Nominal and Ordinal:
nominal variables assign a label to categories: male, female;
single, married, divorced; red, green, blue
ordinal variables also assign names to each possible response
category but the categories can be ranked: level of satisfaction
with training (unsatisfied to satisfied); small, medium, large
Interval/ratio variables:
an interval scale uses equidistant units of measurement, but its zero
point is arbitrary (e.g., temperature in degrees Celsius)
a ratio scale has a meaningful zero point (i.e., zero indicates absence
of what is being measured), e.g., income, years of schooling, birth
rates, kilometers of paved roads
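The practical difference between interval and ratio scales shows up when taking ratios. A minimal sketch, assuming two illustrative temperatures and two illustrative incomes (all values are hypothetical, chosen only for the arithmetic):

```python
# Hypothetical values, for illustration only.
temp_a_c, temp_b_c = 10.0, 20.0          # degrees Celsius (interval scale)
income_a, income_b = 10_000.0, 20_000.0  # currency units (ratio scale)


def celsius_to_kelvin(c: float) -> float:
    """Kelvin has a true zero, so ratios of Kelvin values are meaningful."""
    return c + 273.15


celsius_ratio = temp_b_c / temp_a_c  # 2.0, but 20 C is not "twice as hot"
kelvin_ratio = celsius_to_kelvin(temp_b_c) / celsius_to_kelvin(temp_a_c)
income_ratio = income_b / income_a   # 2.0, and genuinely twice the income

print(f"Celsius ratio: {celsius_ratio:.2f}")
print(f"Kelvin ratio:  {kelvin_ratio:.3f}")
print(f"Income ratio:  {income_ratio:.2f}")
```

The Celsius ratio depends on where the scale happens to put its zero, while the income ratio does not.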
Quantitative Descriptive
Methods
Applied to one variable
Frequency/Percentage Distribution
A chart or table showing how often each value or range of values of a
variable appears in a data set.
Central Tendency
A measure of location of the middle or the center of a distribution.
Central tendency can refer to a mean, median, or mode
Dispersion
Describes how much the observations vary around the central tendency.
Range and standard deviation
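A frequency and percentage distribution can be tabulated in a few lines. This is a minimal sketch, assuming a small set of hypothetical survey responses (the data are invented for illustration and are not from the module):

```python
from collections import Counter

# Hypothetical survey responses (illustration only).
responses = ["satisfied", "neutral", "satisfied", "unsatisfied",
             "satisfied", "neutral", "satisfied", "unsatisfied"]

counts = Counter(responses)
total = len(responses)

# Frequency and percentage distribution, most common value first.
for value, n in counts.most_common():
    print(f"{value:<12} {n:>3} {100 * n / total:>6.1f}%")
```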
Frequency distribution
Percentage distribution
Percentage distribution of bachelor's degrees conferred by U.S.
degree-granting institutions, by sex and race/ethnicity: 2003-04
Describing Distributions
Central Tendency:
• What are the typical characteristics?
- Example: What is the average age of graduates?
- Example: What is the average income in rural areas?
Dispersion:
• How dissimilar or concentrated are cases on a
characteristic?
- Example: How much variation in ages?
Measures of Central Tendency
Measures of Central Tendency:
Number of vehicles per hour
9, 17, 19, 23, 23, 28, 31, 34, 38, 41, 151
Sum = 414
Mode =
Median =
Mean =
(How to compute the median when there is an even number of cases?)
Measure of Dispersion: Range and
Standard Deviation
Measure of Dispersion: Hours of television watched per month
           A     B
          11     3
          16     4
          18     6
          19    12
          21    60
Sum    =  85    85
Mean   =  17    17
Median =   ?     ?
Which distribution has the larger standard deviation? Why?
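The two distributions share the same sum and mean but differ sharply in spread. A short sketch using the standard `statistics` module; the names `group_a` and `group_b` are labels chosen here for the first and second column, and `statistics.stdev` computes the sample standard deviation (dividing by n - 1), which is an assumption about which formula is intended:

```python
import statistics

group_a = [11, 16, 18, 19, 21]
group_b = [3, 4, 6, 12, 60]

sd_a = statistics.stdev(group_a)  # sample standard deviation
sd_b = statistics.stdev(group_b)

for name, hours, sd in (("A", group_a, sd_a), ("B", group_b, sd_b)):
    print(name,
          "mean =", statistics.mean(hours),
          "median =", statistics.median(hours),
          "range =", max(hours) - min(hours),
          "sd =", round(sd, 2))
```

The second distribution has the larger standard deviation (about 24.29 versus 3.81) because one value, 60, lies far from the mean of 17.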
Measure of Dispersion: Standard Deviation
Normal Distribution: Bell-shaped curve
• About 68.26% of observations fall within 1 standard deviation of the mean
• About 95.44% of observations fall within 2 standard deviations of the mean
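These shares can be reproduced from the normal distribution itself. A minimal sketch using `math.erf` from the standard library; the helper name `share_within` is introduced here for illustration. The results (roughly 68.27% and 95.45%) agree with the figures above to within rounding.

```python
import math


def share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.2f}%")
```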
The Normal Distribution and
Intelligence Quotients
Applying the Normal Distribution
Positive skew
Positive skew: The right tail is longer, and a few very large values
pull the mean upward -- the mean is higher than the median.
Negative skew
Negative skew: The left tail is longer, and a few very small values
pull the mean downward -- the mean is lower than the median.
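The effect of skew on the mean can be seen by comparing it with the median. A minimal sketch, assuming two small hypothetical data sets (invented for illustration), each with one extreme value:

```python
import statistics

# Hypothetical data for illustration only.
right_skewed = [20, 22, 25, 27, 30, 200]  # one very large value
left_skewed = [1, 70, 72, 75, 78, 80]     # one very small value

for label, data in (("positive skew", right_skewed),
                    ("negative skew", left_skewed)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean = {mean:.1f}, median = {median:.1f}")
```

In the first list the mean sits well above the median; in the second it sits below. The median barely moves in either case, which is why it is often preferred for skewed data.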
Comparison of Means
Do males earn more than females?
Or, is gender related to income differences?
Female $798
Cross-tabulation
Statistical Significance -- 1
We collect data from a sample of farms: approximately half of the
farms were randomly assigned to a treatment group and the
other half were randomly assigned to a control group.
Examples
If I throw a coin in the air, what is the probability that it will land on
one side or the other?
If we throw a pair of dice, what is the probability that the result will
be a 7?
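The dice question can be answered by listing every possible outcome. A minimal sketch, assuming two fair six-sided dice (the coin question is simpler: a coin must land on one side or the other, so that probability is 1, with 1/2 for each side if the coin is fair):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for a pair of fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
sevens = [roll for roll in outcomes if sum(roll) == 7]

p_seven = Fraction(len(sevens), len(outcomes))
print(len(sevens), "of", len(outcomes), "outcomes sum to 7")
print("P(7) =", p_seven)  # 1/6
```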
Statistical Significance -- 3
The standard for evaluation is typically p ≤ .05
Here we are saying that, if there were truly no effect, results at
least as extreme as ours would occur by chance no more than
5 percent of the time.
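One way to see what a p-value measures is an exact permutation test: reshuffle the group labels in every possible way and count how often chance alone produces a difference at least as large as the one observed. The sketch below assumes hypothetical outcome values for five treatment and five control farms; the numbers are invented for illustration and are not results from any evaluation. It is one possible test among several, not necessarily the one used in the module.

```python
from itertools import combinations
from statistics import mean

# Hypothetical outcomes (illustration only), not data from an evaluation.
treatment = [31, 34, 36, 38, 40]
control = [27, 29, 30, 33, 35]

observed_diff = mean(treatment) - mean(control)
observed_sum = sum(treatment)
pooled = treatment + control

# Enumerate every way the 10 farms could have been split 5/5.
# Group sizes are fixed, so a treatment-group sum at least as large as
# the observed sum is equivalent to a difference in means at least as
# large as the observed difference.
at_least_as_large = 0
total_splits = 0
for group in combinations(pooled, len(treatment)):
    total_splits += 1
    if sum(group) >= observed_sum:
        at_least_as_large += 1

p_value = at_least_as_large / total_splits  # one-sided
print(f"observed difference in means = {observed_diff:.1f}")
print(f"one-sided p = {p_value:.3f} ({at_least_as_large} of {total_splits} splits)")
```

A small p-value says the observed difference would be unusual if assignment to treatment made no difference; it does not say how large or important the difference is.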
Statistical Significance -- 5
Survey Item                                                         Pre Mean  Post Mean  Pre vs. Post Significance
1. I find plants to be interesting.                                 4.8       5.5        0.026*
2. Plants are boring.                                               2.8       2.2        0.029*
3. I have enjoyed learning about plants in the past.                4.6       4.7        0.378
4. I would like to learn more about plants.                         5.1       5.4        0.189
5. Animals are more interesting than plants.                        5.6       5.2        0.119
6. Bacteria are more interesting than plants.                       3.7       3.3        0.15
7. I use my knowledge of plants in my everyday life.                3.4       4.7        0.001*
8. Plants are important to human society.                           5.9       6.5        0.003*
9. I don’t really use or encounter plants in my everyday life.      2.6       2.0        0.074
10. In everyday life (walking to class, going to the store, etc.)   4.4       5.0        0.094
    I pay attention to the plants around me.
* Significant at p ≤ 0.05
Statistical Significance -- 6
Statistical Significance -- 7
Be aware!
Statistical significance ≠ practical significance
A statistically significant difference is not necessarily large,
important, or of practical significance.
With a large sample, extremely small differences can be
statistically significant but still trivial.
Some researchers argue that statistical significance is of little
value.
Measures of Association
[Scatterplot: 8th grade science score (vertical axis, 520 to 570) against a variable ranging from 10 to 60 (horizontal axis); Pearson’s r = -.50, p = .25]
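Pearson's r can be computed directly from its definition: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. A minimal sketch, assuming a small set of hypothetical paired observations (the data and the function name `pearson_r` are introduced here for illustration):

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 2))  # 0.77
```

Values of r range from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship).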
Association Does not Prove
Causality!
Establishing Causality
Causality: In impact evaluations, our ultimate goal is often to
identify the causal relationships among the phenomena we
study.
Three conditions are necessary for causal inference:
1. The cause must precede the effect. Changes in the
independent variable must occur before changes in the
dependent variable.
2. The cause and effect must be related (i.e., correlated).
3. Other explanations of the cause-effect relationship must be
eliminated (i.e., rule out spurious or confounding factors)
Data Analysis in Monitoring Plans
Identify Data Subsets in Monitoring Plans
Possible Data Subsets for Monitoring:
Demographic Characteristics
Possible Data Subsets for Monitoring:
Service Characteristics
• By organizational unit, if the service is provided in more
than one facility (such as different health clinics, schools,
parks, water bodies, or districts)
• Type of procedure used by service provider
• Amount or level of service
• By customer needs
Discussion: Which Hospital Would You Choose?
                   SURGERY PATIENTS   DEATHS   DEATH RATE
MERCY HOSPITAL     2,100              63       3%
APOLLO HOSPITAL    800                16       2%
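The death rates in the comparison are simply deaths divided by surgery patients. A minimal sketch of that arithmetic; the same calculation would be repeated within each data subset (for example, by patient condition on arrival), but no subset counts are given here, so only the overall rates are computed:

```python
hospitals = {
    "Mercy": {"patients": 2100, "deaths": 63},
    "Apollo": {"patients": 800, "deaths": 16},
}

# Overall death rate = deaths / surgery patients.
death_rates = {
    name: h["deaths"] / h["patients"] for name, h in hospitals.items()
}

for name, rate in death_rates.items():
    print(f"{name}: {rate:.0%}")
```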
BUT…
Qualitative Data Analysis
Qualitative Data Analysis:
Inductive Analysis
Inductive analysis
Research findings emerge from the frequent, dominant or
significant themes found in the raw data. The findings are
not constrained by structured methodologies, models,
frameworks, etc.
[Diagram: themes emerge from the raw data]
Example
Qualitative Data Analysis:
Inductive Analysis – Goals
To condense extensive and varied raw textual data into a brief,
summary format.
To establish clear links between the research objectives and the
summary findings derived from the raw data and to ensure
these links are both transparent (able to be demonstrated to
others) and defensible (justifiable given the objectives of the
research).
To develop a model or theory about the underlying structure of
experiences or processes that are evident in the text (raw
data).
Source: Dr. David Thomas
http://www.health.auckland.ac.nz/hrmas/resources/qualdatanalysis.html
Qualitative Data Analysis:
Inductive Analysis – General Process
1. Review qualitative data carefully and fully.
2. Identify themes or categories from statements (or phrases)
found in the qualitative data.
3. For each theme or category, identify all of the statements (or
phrases) that go with that theme.
4. Determine linkages and relationships across themes (or
phrases).
5. Reduce the number of themes or categories.
6. Create a model based on primary themes or categories
Qualitative Data Analysis:
Deductive Analysis
Deductive analysis
Analyze data according to an existing framework (e.g., the
logic model, prior research, etc.)
[Diagram: numbered data items (1 to 16) sorted into predefined themes]
Qualitative Data Analysis:
Deductive Analysis – General Process
1. Review the project model or framework.
2. Identify categories or groupings for data prior to data
analysis.
3. Review the qualitative data carefully and fully.
4. Label statements (or phrases) in the qualitative data with the
appropriate category or grouping based on the project model
or framework.
Content Analysis Example
Efficiency
Coding of Content
Sample of codes and categories:
Yellow = income increased
Green = access to markets and customers improved
Pink = employment opportunities improved
Blue = access to education improved
Grey = bypassing village and less income
Purple = air pollution
Red = traffic and safety issues
I can sell my produce in more markets. This allows me to earn more money each day.
My daughter can now attend vocational college in the city because bus service is now available.
My income has increased because I was able to find a better job in the city.
I sometimes wish that the road had not been constructed. We have more traffic traveling at higher speeds.
There are so many more cars. The air pollution has affected my grandmother’s breathing.
Once the road was completed, fewer travelers stopped at my store. They now bypass the village and my
monthly income has dropped 30 percent.
We have more money because my husband can get to a second job.
More people in the community are able to attend the city’s vocational college due to the regular bus service.
My wife was able to get a part-time job in the city and our family income has increased.
More air pollution, but overall more market access has helped my company grow, increasing our revenues
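The coding step can be sketched in code to show the mechanics of assigning predefined codes to statements and tallying them. This is only an illustration: the keyword lists in `CODEBOOK` are hypothetical, the code labels are shortened versions of the categories listed above, and real coding relies on an analyst reading each statement rather than on keyword matching.

```python
from collections import Counter

# Hypothetical keyword lists, one per code. Real coding is done by a
# person reading each statement; keywords are a rough stand-in.
CODEBOOK = {
    "income increased": ["earn more", "income has increased", "more money"],
    "access to markets improved": ["more markets", "market access"],
    "employment improved": ["job"],
    "access to education improved": ["college"],
    "bypassing village, less income": ["bypass"],
    "air pollution": ["air pollution"],
    "traffic and safety": ["traffic"],
}

# Four of the statements from the example.
statements = [
    "I can sell my produce in more markets. "
    "This allows me to earn more money each day.",
    "My daughter can now attend vocational college in the city "
    "because bus service is now available.",
    "There are so many more cars. "
    "The air pollution has affected my grandmother's breathing.",
    "We have more money because my husband can get to a second job.",
]


def code_statement(text):
    """Return every code whose keywords appear in the statement."""
    lowered = text.lower()
    return [code for code, words in CODEBOOK.items()
            if any(word in lowered for word in words)]


totals = Counter()
for statement in statements:
    codes = code_statement(statement)
    totals.update(codes)
    print(codes, "<-", statement[:40])

print(dict(totals))
```

A statement can receive more than one code, which is why the totals can exceed the number of statements.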
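Matrix for Coding
[Table: one row per respondent (C128, K245, M358, …) and one column per code, with a 1 marking each code the respondent mentioned. Column headings as extracted: to the market improved; income increase; employment opportunities improved; secondary school improved; more traffic; traffic accident; air pollution; bypassing the village; health service improved. Column totals as extracted: 20, 26, 15, 18, 11, 5, 10, 8, 16.]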
Activity: Qualitative Analysis
After completing your analysis with your partner, compare results
with others.
Discuss the following with your group members when everyone
is done:
What were the strengths and limitations of the deductive analysis
process?
What were the strengths and limitations of the inductive analysis
process?
How can you use both approaches in your work?
Qualitative Data Analysis
Different approaches to measure different
expected results
Triangulation
Triangulation requires that different estimates be systematically
compared.
If there are differences, it is essential to understand and
explain them:
– Are different data collection methods measuring different
things?
– Are some estimating methods more reliable/accurate than
others?
If the differences cannot be explained, this must be stated in the
evaluation report.
Note that “triangulation” does not mean that exactly “three”
methods or sources must be used. Rather, the number is driven
by the evaluation design requirements.
Evaluation Plan
[Table columns: General Questions; Specific Sub-Questions; Type of Question; Type of Design; Indicators & Measures; Data Sources; Data Collection & Sampling; Data Analysis]
Measures of Association
Relation between males' education and beginning salaries in Ministry of Sports
[Scatterplot: Education (vertical axis) against Beginning salary (horizontal axis, 0 to 100,000); Pearson’s r = .63]
[Scatterplot: Education (vertical axis) against Beginning salary (horizontal axis, 0 to 40,000); Pearson’s r = .47]