12b Module 8-A. Data Analysis

Module 8: Data Analysis

© 2008. The World Bank Group. All rights reserved.

Data Analysis Strategies

Key Choice:

• Quantitative Analysis
• Qualitative Analysis

2
Variables
A variable is a characteristic or attribute that varies or changes
over time or among individuals or groups
Examples: age, gender, agricultural production, kilometers of
paved roads, number of children who are undernourished,
hectares of national parks, etc.

Independent variable: the intervention or explanatory variable

Dependent variable: what we expect to change as a result of
the intervention or changes in the independent variable(s)

3
Examples
Independent variable: education
Dependent variable: income

Independent variable: access to skilled birth attendants
Dependent variable: maternal deaths

4
Types of Variables
Nominal and Ordinal:
• nominal variables assign a label to categories: male, female;
single, married, divorced; red, green, blue
• ordinal variables also assign names to each possible response
category, but the categories can be ranked: level of satisfaction
with training (unsatisfied to satisfied); small, medium, large

Interval/ratio variables:
• interval scale uses equidistant measurement, but the zero point is not
meaningful (e.g., temperature in degrees Celsius)
• ratio scale has a meaningful zero point (i.e., zero indicates absence of
what is being measured), e.g., income, years of schooling, birth
rates, kilometers of paved roads

5
Quantitative Descriptive
Methods
Applied to one variable
 Frequency/Percentage Distribution
 A chart or table showing how often each value or range of values of a
variable appear in a data set.

 Central Tendency
 A measure of location of the middle or the center of a distribution.
 Central tendency can refer to a mean, median, or mode

 Dispersion
 Describes how much the observations vary around the central tendency.
 Range and standard deviation

6
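A minimal Python sketch of a frequency and percentage distribution for a single variable. The satisfaction responses are hypothetical and are not taken from the module; the function name is also just illustrative.

```python
from collections import Counter

# Hypothetical survey responses (illustrative only, not from the module).
responses = ["satisfied", "neutral", "satisfied", "unsatisfied",
             "satisfied", "neutral", "satisfied", "unsatisfied"]

def frequency_distribution(values):
    """Return {value: (count, percent)} for a single variable."""
    counts = Counter(values)
    total = len(values)
    return {value: (n, 100.0 * n / total) for value, n in counts.items()}

distribution = frequency_distribution(responses)
for value, (count, percent) in sorted(distribution.items()):
    print(f"{value:12s} {count:3d} {percent:6.1f}%")
```

The counts form the frequency distribution; dividing each count by the number of cases gives the percentage distribution.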
Frequency distribution

7
Percentage distribution
Percentage distribution of bachelor's degrees conferred by U.S.
degree-granting institutions, by sex and race/ethnicity: 2003-04

8
Describing Distributions

Central Tendency:
• What are the typical characteristics?
- Example: What is the average age of graduates?
- Example: What is the average income in rural areas?

Dispersion:
• How dissimilar or concentrated are cases on a
characteristic?
- Example: How much variation in ages?

9
Measures of Central Tendency

The 3-Ms: Mode, Median, Mean

Mode: most frequent response

Median: midpoint of the distribution
Mean: arithmetic average

10
Measures of Central Tendency:
Number of vehicles per hour

9
17
19
23
23
28
31
34
38
41
151
Sum = 414

Mode =
Median =
Mean =

(How to compute the median when there is an even number of cases?)

11
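The three measures for the vehicle counts above can be checked with Python's standard statistics module. The second part is an added illustration of what happens with an even number of cases; dropping the largest value to get ten cases is an assumption made only for this sketch.

```python
import statistics

# Vehicles per hour, as listed on the slide (11 observations).
vehicles = [9, 17, 19, 23, 23, 28, 31, 34, 38, 41, 151]

mode = statistics.mode(vehicles)      # most frequent value
median = statistics.median(vehicles)  # middle value of the sorted data
mean = statistics.mean(vehicles)      # sum divided by number of cases

print(mode, median, round(mean, 1))

# With an even number of cases there is no single middle value, so the
# median is the average of the two middle values.
even_case = vehicles[:-1]             # drop 151, leaving 10 cases
print(statistics.median(even_case))
```

The single very large value (151) pulls the mean well above the median, while the mode and median are unaffected by it.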
Measure of Dispersion: Range and
Standard Deviation

Range: the difference between the largest and the smallest values

Standard deviation: measures the dispersion of scores, that is, their
distance from the mean
• Small standard deviation: not much dispersion; most of
the data or “scores” are close to the mean
• Large standard deviation: lots of dispersion and many
scores are far from the mean

12
Measure of Dispersion: Hours of
television watched per month

Which distribution has the larger standard deviation? Why?

            11     3
            16     4
            18     6
            19    12
            21    60
Sum    =    85    85
Mean   =    17    17
Median =     ?     ?
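A short sketch comparing the two columns of television hours. It uses the sample standard deviation (statistics.stdev, dividing by n - 1); the module does not say which formula it intends, so that choice is an assumption, and statistics.pstdev would give the population version.

```python
import statistics

# Hours of television watched per month, the two columns from the slide.
first = [11, 16, 18, 19, 21]
second = [3, 4, 6, 12, 60]

for name, hours in (("first", first), ("second", second)):
    print(name,
          "mean =", statistics.mean(hours),
          "median =", statistics.median(hours),
          "range =", max(hours) - min(hours),
          "sd =", round(statistics.stdev(hours), 2))
```

Both columns have the same sum and mean, so the mean alone cannot tell them apart; the range and standard deviation show how differently the values are spread.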

13
Measure of Dispersion: Standard Deviation
Normal Distribution: Bell-shaped curve
• About 68.27% of the observations fall within 1 standard deviation of the mean
• About 95.45% of the observations fall within 2 standard deviations of the mean

Note: this is not to scale.
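The shares quoted for the bell-shaped curve follow from the normal cumulative distribution. A minimal check using the error function from Python's math module (the function name share_within is illustrative):

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.2f}%")
```

These shares hold only for normally distributed data; skewed distributions can depart from them substantially.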

14
Quantitative Data Analysis:
Normal Curve

15
The Normal Distribution and
Intelligence Quotients

16
Applying the Normal Distribution

17
Positive skew
Positive skew: The right tail is longer and a few large numbers
distort the mean score -- the mean score is artificially high.

18
Negative skew
Negative skew: The left tail is longer, and a few extremely small
numbers distort the mean -- the mean score is artificially low.
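A sketch of how skew shows up in numbers. It uses the vehicles-per-hour counts from the central tendency slide as the right-tailed example and their mirror image as a constructed left-tailed example; the moment-based skewness formula is one common choice among several and is an assumption here.

```python
import statistics

def skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# One very large value creates a long right tail.
right_tailed = [9, 17, 19, 23, 23, 28, 31, 34, 38, 41, 151]
# Mirror image of the same data: a long left tail (constructed for illustration).
left_tailed = [-x for x in right_tailed]

for data in (right_tailed, left_tailed):
    print("skewness =", round(skewness(data), 2),
          "mean =", round(statistics.fmean(data), 1),
          "median =", statistics.median(data))
```

With positive skew the mean sits above the median; with negative skew it sits below, which is why the median is often reported for skewed variables such as income.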

19
Comparison of Means
Do males earn more than females?
Or, is gender related to income differences?

Independent variable = gender; dependent variable = income

Gender    Median Income
Male      $924
Female    $798

21
Cross-tabulation

23
Statistical Significance -- 1
We collect data from a sample of farms: approximately half of the
farms were randomly assigned to a treatment group and the
other half were randomly assigned to a control group.

• We want to know how much corn the farms in each group
produced per hectare before and after the intervention.

• We find that the mean amounts of corn produced are
different: farms in the treatment group produced more corn on
average than the farms in the control group.

• Can we conclude that the difference between the two groups
is statistically significant?
24
Statistical Significance -- 2
To answer this question we need to know how likely it is that a
difference this large between the two groups would occur by chance alone.
p = the probability an event will occur (range: 0.0 to 1.0)
0.0 = the event will never occur
1.0 = the event will always occur

Examples
 If I throw a coin in the air, what is the probability that it will land on
one side or the other?
 If we throw a pair of dice, what is the probability that the result will
be a 7?
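Both examples can be worked out by counting equally likely outcomes. The sketch below assumes fair six-sided dice and a coin that cannot land on its edge.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for a pair of six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def probability(event):
    """Probability that an outcome satisfies the predicate `event`."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

p_seven = probability(lambda dice: sum(dice) == 7)
# A tossed coin lands on one side or the other: both of its two outcomes count.
p_coin_any_side = Fraction(2, 2)

print(p_seven, float(p_seven))
print(p_coin_any_side)
```

Six of the 36 dice outcomes sum to 7, so the probability is 1/6; the coin landing on one side or the other is certain, so its probability is 1.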
25
Statistical Significance -- 3
The standard for evaluation is typically p <= .05
• Here we are saying that, if there were no real difference between the
groups, a result at least this large would occur by chance no more than
5 percent of the time.

The smaller the value of p, the more statistically significant the
result and the stronger the evidence:
Example: p <= .01
• Here we are saying that, if there were no real difference, a result at
least this large would occur by chance no more than __ percent of the time.
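One way to obtain a p-value for the farm example is an exact permutation test: regroup the farms in every possible way and count how often the difference in means is at least as large as the one observed. This is a sketch under stated assumptions, not the module's own procedure; the yields are hypothetical and the function name is illustrative.

```python
from itertools import combinations
from statistics import fmean

# Hypothetical corn yields in tons per hectare (not from the module).
treatment = [5.1, 5.6, 6.0, 5.8, 6.3]
control = [4.8, 5.0, 5.2, 4.9, 5.5]

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in means."""
    pooled = group_a + group_b
    observed = abs(fmean(group_a) - fmean(group_b))
    indices = range(len(pooled))
    extreme = total = 0
    # Re-assign the pooled farms to two groups in every possible way.
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance so floating-point ties count as "at least as large".
        if abs(fmean(a) - fmean(b)) >= observed - 1e-9:
            extreme += 1
    return extreme / total

p = permutation_p_value(treatment, control)
print(f"p = {p:.4f}", "significant at .05" if p <= 0.05 else "not significant at .05")
```

With these illustrative numbers, 8 of the 252 possible regroupings are at least as extreme as the observed one, so p is about 0.03. In practice a t-test or similar procedure from a statistics package would usually be used instead.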

26
Statistical Significance -- 5
Survey Item                                                         Pre Mean  Post Mean  Pre vs. Post Significance
1. I find plants to be interesting.                                 4.8       5.5        0.026*
2. Plants are boring.                                               2.8       2.2        0.029*
3. I have enjoyed learning about plants in the past.                4.6       4.7        0.378
4. I would like to learn more about plants.                         5.1       5.4        0.189
5. Animals are more interesting than plants.                        5.6       5.2        0.119
6. Bacteria are more interesting than plants.                       3.7       3.3        0.15
7. I use my knowledge of plants in my everyday life.                3.4       4.7        0.001*
8. Plants are important to human society.                           5.9       6.5        0.003*
9. I don’t really use or encounter plants in my everyday life.      2.6       2.0        0.074
10. In everyday life (walking to class, going to the store, etc.)   4.4       5.0        0.094
    I pay attention to the plants around me.

* Significant at p ≤ 0.05
27
Statistical Significance -- 6

28
Statistical Significance -- 7
Be aware!
Statistical significance ≠ practical significance
A statistically significant difference is not necessarily large,
important, or of practical significance.
With a large sample, extremely small differences can be
statistically significant but still trivial.
Some researchers argue that statistical significance is of little
value.
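A sketch of why sample size matters here: the same small difference is far from significant with 100 cases per group and overwhelmingly significant with 100,000. The numbers are hypothetical, and the two-sample z-test with a known common standard deviation is a simplifying assumption chosen to keep the arithmetic visible.

```python
import math

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming a known common SD."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical: a 0.5-point difference on a test whose scores have SD = 10.
for n in (100, 100_000):
    p = two_sample_z_p_value(mean_diff=0.5, sd=10.0, n_per_group=n)
    print(f"n per group = {n:>7}: p = {p:.4g}")
```

The difference is 0.5 points in both runs; only the sample size changes. Whether half a point matters in practice is a judgment the p-value cannot make.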

29
Measures of Association

Strength and Direction

How strong is the association?

• Several different measures of association
• Some measures of association range from zero to 1
• Others range from -1 to +1
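Pearson's r is one measure that ranges from -1 to +1: the sign gives the direction and the absolute value gives the strength. A minimal sketch computed from first principles; the education and income figures are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, ranging from -1 to +1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)

# Hypothetical data: years of education and monthly income.
education = [6, 8, 10, 12, 14, 16]
income = [310, 350, 420, 400, 520, 610]

r_positive = pearson_r(education, income)
r_negative = pearson_r(education, [-value for value in income])
print(round(r_positive, 2), round(r_negative, 2))
```

Flipping the sign of one variable reverses the direction of the association but leaves its strength unchanged.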

30
Measures of Association

31
Measures of Association

32
Measures of Association

Relation between 8th grade science score and self-confidence in learning science

[Scatterplot: 8th grade science score (y-axis, 520 to 580) against percent with high self-confidence (x-axis, 10 to 60)]

Pearson’s r = -.50
p = .25

Descriptive Statistics

                                     Mean     Std. Deviation   N
8th grade science score              552.71   19.729           7
Percent with high self-confidence    35.71    14.384           7

33
Association Does not Prove
Causality!

34
Establishing Causality
Causality: In impact evaluations, our ultimate goal often is to
identify the causal relationships among phenomena we
study
There are three factors necessary for causal inference:
1. The cause must precede the effect. Changes in the
independent variable must occur before changes in the
dependent variable.
2. The cause and effect must be related (i.e., correlated).
3. Other explanations of the cause-effect relationship must be
eliminated (i.e., rule out spurious or confounding factors)

35
Data Analysis in Monitoring Plans

• Data analysis in monitoring uses more basic methods
• Need to consider:
  - Use of data subsets
  - Need for comparisons

36
Identify Data Subsets in Monitoring Plans

Be Cautious of Overly Aggregated Data!

Data for each outcome measure should be broken out
(disaggregated) to show outcomes for different subgroups or subunits.

37
Possible Data Subsets for Monitoring:
Demographic Characteristics

Common Demographic Characteristics

 By household income (or proxy for this)
 By gender
 By age group
 By race/ethnicity
 Employment status and type of employment
 By geographical area, such as rural versus urban, by district,
by municipality

38
Possible Data Subsets for Monitoring:
Service Characteristics

Service Characteristics
 By organizational unit, if the service is provided in more
than one facility (such as different health clinics, schools,
parks, water bodies, or districts)
 Type of procedure used by service provider
 Amount or level of service
 By customer needs

39
Discussion: Which Hospital Would You
Choose?

                   Surgery patients   Deaths   Death rate
Mercy Hospital     2,100              63       3%
Apollo Hospital    800                16       2%

40
Discussion: Which Hospital Would You
Choose?

                   Surgery patients   Deaths   Death rate
Mercy Hospital     2,100              63       3%
Apollo Hospital    800                16       2%

BUT…

Patients in good condition
                   Patients           Deaths   Death rate
Mercy Hospital     600                6        1%
Apollo Hospital    600                8        1.3%

Patients in poor condition
                   Patients           Deaths   Death rate
Mercy Hospital     1,500              57       3.8%
Apollo Hospital    200                8        4%
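The hospital figures can be recomputed to see how aggregation hides the pattern. The patient and death counts are those on the slide; the data layout and function name are illustrative.

```python
# (patients, deaths) by hospital and patient condition, as given on the slide.
surgeries = {
    "Mercy": {"good": (600, 6), "poor": (1500, 57)},
    "Apollo": {"good": (600, 8), "poor": (200, 8)},
}

def death_rate(patients, deaths):
    """Deaths as a percentage of surgery patients."""
    return 100.0 * deaths / patients

for hospital, groups in surgeries.items():
    total_patients = sum(p for p, _ in groups.values())
    total_deaths = sum(d for _, d in groups.values())
    print(hospital, "overall:", round(death_rate(total_patients, total_deaths), 1))
    for condition, (patients, deaths) in groups.items():
        print("  ", condition, round(death_rate(patients, deaths), 1))
```

Mercy has the lower death rate in each subgroup yet the higher rate overall, because most of its patients arrive in poor condition. This reversal is commonly known as Simpson's paradox.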
41
Qualitative Data Analysis
• Data from narrative documents, open-ended interviews, focus groups,
unstructured observations
• Methods for analysis
  - Inductive analysis
  - Deductive analysis
  - Synthesis

42
Qualitative Data Analysis

• Identify common words, ideas, themes

• Many approaches - develop a computer spreadsheet, write
themes on index cards, color code words, etc.
• Identify “quotable quotes”

43
Qualitative Data Analysis:
Inductive Analysis
Inductive analysis
Research findings emerge from the frequent, dominant or
significant themes found in the raw data. The findings are
not constrained by structured methodologies, models,
frameworks, etc….
Raw data → Themes emerge

44
Example

1. What is your overall opinion of the new road?

The road has too many curves and is dangerous.

The road already has too many holes.
I am able to go to the city every day instead of twice a week
because of the new road.
My husband can spend more time at home with our family
due to faster commuting to his job.
It is good. I can travel to the market quickly because it is
smooth and there is not too much traffic.

45
Qualitative Data Analysis:
Inductive Analysis – Goals
To condense extensive and varied raw textual data into a brief,
summary format.
To establish clear links between the research objectives and the
summary findings derived from the raw data and to ensure
these links are both transparent (able to be demonstrated to
others) and defensible (justifiable given the objectives of the
research).
To develop a model or theory about the underlying structure of
experiences or processes that are evident in the text (raw
data).
Dr. David Thomas
http://www.health.auckland.ac.nz/hrmas/resources/qualdatanalysis.html

46
Qualitative Data Analysis:
Inductive Analysis – General Process
1. Review qualitative data carefully and fully.
2. Identify themes or categories from statements (or phrases)
found in the qualitative data.
3. For each theme or category, identify all of the statements (or
phrases) that go with that theme.
4. Determine linkages and relationships across themes (or
phrases).
5. Reduce the number of themes or categories.
6. Create a model based on primary themes or categories

47
Qualitative Data Analysis:
Deductive Analysis
Deductive analysis
Analyze data according to an existing framework (e.g., the
logic model, prior research, etc.)

[Figure: numbered data items (1 to 16) sorted into themes]

48
Qualitative Data Analysis:
Deductive Analysis – General Process
1. Review the project model or framework.
2. Identify categories or groupings for data prior to data
analysis.
3. Review the qualitative data carefully and fully.
4. Label statements (or phrases) in the qualitative data with the
appropriate category or grouping based on the project model
or framework.
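A rough sketch of step 4, labeling statements with categories fixed in advance. The category names come from the module's content analysis example and the statements from its new-road example (the last one shortened); the keyword lists are hypothetical, and keyword matching is only a crude stand-in for a human coder's judgment.

```python
# Categories fixed in advance (deductive); the keyword lists are hypothetical.
CATEGORY_KEYWORDS = {
    "condition issues": ["curves", "holes", "dangerous"],
    "efficiency": ["quickly", "faster", "every day"],
}

statements = [
    "The road has too many curves and is dangerous.",
    "The road already has too many holes.",
    "It is good. I can travel to the market quickly because it is smooth.",
]

def label(statement):
    """Return every predefined category whose keywords appear in the statement."""
    text = statement.lower()
    return [category for category, keywords in CATEGORY_KEYWORDS.items()
            if any(keyword in text for keyword in keywords)]

for statement in statements:
    print(label(statement) or ["uncoded"], "-", statement)
```

Statements that match no category come back uncoded, which is a useful signal that the framework may be missing a theme.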

49
Content Analysis Example

1. What is your overall opinion of the new road?

Condition Issues

The road has too many curves and is dangerous.

Efficiency

50
Coding of Content
Sample of codes and categories:
Blue = access to education improved
Yellow = income increased
Green = access to markets and customers improved
Pink = employment opportunities improved
Grey = bypassing village and less income
Purple = air pollution
Red = traffic and safety issues

I can sell my produce in more markets. This allows me to earn more money each day.
My daughter can now attend vocational college in the city because bus service is now available.
My income has increased because I was able to find a better job in the city.
I sometimes wish that the road had not been constructed. We have more traffic traveling at higher speeds.
There are so many more cars. The air pollution has affected my grandmother’s breathing.
Once the road was completed, fewer travelers stopped at my store. They now bypass the village and my
monthly income has dropped 30 percent.
We have more money because my husband can get to a second job.

More people in the community are able to attend the city’s vocational college due to the regular bus service.
My wife was able to get a part-time job in the city and our family income has increased.
More air pollution, but overall more market access has helped my company grow, increasing our revenues

51
Matrix for Coding

Sample of data coded and documented in an Excel file.

[Table: one row per respondent ID (C128, K245, M358, …) and one column per
code, with a 1 entered for each code found in that respondent's answers.
Codes: access to the market improved; income increase; access to secondary
school improved; employment opportunities improved; more traffic; traffic
accident; air pollution; bypassing the village; access to health service
improved.]

Total 20 26 15 18 11 5 10 8 16
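A minimal sketch of building such a coding matrix and its totals row in Python rather than a spreadsheet. The respondent IDs appear on the slide, but the codes attached to them here are hypothetical.

```python
# Codes assigned to each respondent. The IDs appear on the slide; the codes
# attached to them here are hypothetical, for illustration only.
coded = {
    "C128": ["income increase", "more traffic"],
    "K245": ["air pollution"],
    "M358": ["income increase", "bypassing the village"],
}

codes = sorted({code for assigned in coded.values() for code in assigned})

# One row per respondent, one 0/1 column per code.
matrix = {rid: [1 if code in assigned else 0 for code in codes]
          for rid, assigned in coded.items()}
totals = [sum(row[i] for row in matrix.values()) for i in range(len(codes))]

print("ID   ", codes)
for rid, row in matrix.items():
    print(rid, row)
print("Total", totals)
```

The totals row shows how many respondents raised each theme, which helps identify the major themes to feature in the write-up.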

52
Activity: Qualitative Analysis

1. Identify a partner at your table.

2. One person in each pair will complete an inductive analysis of
the qualitative data.
3. One person in each pair will complete a deductive analysis
(using provided categories).
4. Code the data from the handouts.
5. Compare your notes when done (inter-coder/inter-rater
reliability).
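Inter-coder reliability can be summarized with simple statistics once both partners have coded the same statements. The module does not name a statistic; percent agreement and Cohen's kappa are common choices and are used here as an assumption. The labels are hypothetical.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of statements that both coders labeled identically."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels given by two coders to the same six statements.
coder_a = ["income", "traffic", "income", "pollution", "traffic", "income"]
coder_b = ["income", "traffic", "access", "pollution", "traffic", "income"]

print(round(percent_agreement(coder_a, coder_b), 2),
      round(cohens_kappa(coder_a, coder_b), 2))
```

Kappa is lower than raw agreement because some matches would occur even if both coders labeled at random. Note that this sketch assumes each statement receives exactly one label and that the coders use more than one category.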

53
Activity: Qualitative Analysis
After completing your analysis with your partner, compare results
with others.
Discuss the following with your group members when everyone
is done:
 What were the strengths and limitations of the deductive analysis
process?
 What were the strengths and limitations of the inductive analysis
process?
 How can you use both approaches in your work?

54
Qualitative Data Analysis

Greatest Risk: Bias

- Hard to recognize things you don’t expect

Have a second person do the analysis

- Compare results
- Work out differences

55
Qualitative Data Analysis

Writing about results

• Feature major themes
• “A number of participants said”
• Highlight interesting perspectives even if only said by one or
two people
• Typically avoid reporting numbers or percentages

56
Different approaches to measure different
expected results

Data Collection Tools: Survey | Interview | Focus Groups | Observations | Document Review

Impact A     X X X
Impact B     X X
Outcome A    X X
Output A     X X
Output B     X X X
57
Triangulation
The validity of estimates of process, outcome or impact
indicators can be improved when estimates from two or more
independent sources are compared:

 Household income reported in surveys

 Observation of the quality of the house, consumer durables,
and quality of clothes

58
Triangulation
Triangulation requires that different estimates be systematically
compared
 If there are differences it is essential to understand and
explain the differences
– Are different data collection methods measuring different
things?
– Are some estimating methods more reliable/accurate than
others?
If the differences cannot be explained this must be stated in the
evaluation report.
Note that “triangulation” does not mean that exactly “three”
methods or sources must be used. Rather, the number is driven
by the evaluation design requirements.
59
Evaluation Plan

General Questions | Specific Sub-Questions | Type of Question | Type of Design | Indicators & Measures | Data Sources | Data Collection & Sampling | Data Analysis

60
Measures of Association
Relation between males' education and beginning salaries in Ministry of Sports

[Scatterplot: education (y-axis, 6 to 22) against beginning salary (x-axis, 0 to 100,000)]

Pearson’s r = .63

Descriptive Statistics

                    Mean       Std. Deviation   N
Education           14.43      2.979            258
Beginning salary    20301.40   9111.781         258
61
Measures of Association
Relation between females' education and beginning salaries in Ministry of Sports

[Scatterplot: education (y-axis, 6 to 18) against beginning salary (x-axis, 0 to 40,000)]

Pearson’s r = .47

Descriptive Statistics

                    Mean       Std. Deviation   N
Education           12.3704    2.31915          216
Beginning salary    13091.97   2935.59921       216
62
Kurtosis

Kurtosis is a measure of whether the data are peaked or flat relative to a
normal distribution.
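A minimal sketch of one common kurtosis measure, excess kurtosis, which subtracts 3 so that a normal distribution scores about zero. The formula choice is an assumption (statistical packages differ in their exact definitions), and both data sets are hypothetical.

```python
from statistics import fmean

def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (the normal value)."""
    mean = fmean(values)
    n = len(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3.0

flat = [1, 2, 3, 4, 5]                       # evenly spread values
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]   # most values at the center, two far out

print(round(excess_kurtosis(flat), 2), round(excess_kurtosis(peaked), 2))
```

Negative values indicate a flatter distribution than the normal curve; positive values indicate a sharper peak with heavier tails.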
