Topic 5_ Data Analysis (1)

The document discusses the importance of data analysis and management, including definitions, types of data (qualitative and quantitative), and the significance of data in decision-making and policy formulation. It covers data cleaning, coding, and the creation of codebooks for effective data organization. Additionally, it provides insights on summarizing data through frequency distributions, graphs, and charts, as well as practical examples using Excel.

Uploaded by

Ansgar Alberto
Copyright
© © All Rights Reserved
Topic 4: Data analysis

Data Management
Data
• What do we mean by data?
• How would you define data? Are there different types of data?
• What forms can data take?
Why are data important?
• At the national level data are needed to:
• Plan and develop interventions
• Identify population in need of the government services
• Monitor and evaluate quality of services offered by the government through its agents
• Demonstrate trends in key indicators
• Make decisions on how to improve interventions
• Formulate appropriate policy
• Inform acquisition and distribution of resources
• Inform policy and guideline development
Types of data
QUALITATIVE
• Typically in the form of: Text, words and images
• Sources include: Focus groups, interviews and observations
• Findings described: In terms of common themes and patterns of response
• Ideal for: Identifying previously unknown processes, explanations of why and how a phenomenon occurred
QUANTITATIVE
• Typically in the form of: Numeric data (numbers, percentages, rates etc.)
• Sources include: Surveys and questionnaires, monitoring/programme data, observation checklist
• Findings described: Numerically or statistically
• Ideal for: Measuring pervasiveness of a “known” phenomenon and central patterns of associations
Data cleaning/Cleansing
• Removing data that may distort analysis
• Removes BAD DATA:
• Incorrect
• Incomplete
• Duplicates
• Improperly formatted data
• Data cleaning is the first step in the overall data preparation process
• It involves analyzing, identifying and correcting messy raw data
• Good analysis rests on clean data
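The four kinds of bad data listed above can be illustrated with a minimal Python sketch. The records, field names and cleaning rules (for example, treating a negative income as incorrect) are assumptions made up for illustration; they are not taken from the slides.

```python
# Hypothetical raw survey records; every name and value is invented for illustration.
raw_records = [
    {"id": "001", "district": "Kiwangwa", "income": "1200"},
    {"id": "002", "district": " kipatamo ", "income": "950"},   # improperly formatted text
    {"id": "002", "district": " kipatamo ", "income": "950"},   # duplicate of the row above
    {"id": "003", "district": "Namkumba", "income": ""},         # incomplete: income is missing
    {"id": "004", "district": "Kiambui", "income": "-50"},       # incorrect: negative income
    {"id": "005", "district": "Kiambui", "income": "2,300"},     # improperly formatted number
]


def clean(records):
    """Return records with formatting repaired and bad rows removed."""
    seen_ids = set()
    cleaned = []
    for record in records:
        # Repair formatting: trim spaces, standardize capitalization, drop thousands separators.
        district = record["district"].strip().title()
        income_text = record["income"].replace(",", "").strip()
        if not income_text:
            continue  # incomplete record
        income = int(income_text)
        if income < 0:
            continue  # incorrect value (assumed rule: income cannot be negative)
        if record["id"] in seen_ids:
            continue  # duplicate questionnaire number
        seen_ids.add(record["id"])
        cleaned.append({"id": record["id"], "district": district, "income": income})
    return cleaned


cleaned_records = clean(raw_records)
for row in cleaned_records:
    print(row)
```

Three of the six records survive: the duplicate, the incomplete record and the record with the impossible value are dropped, and the two formatting problems are repaired rather than discarded.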
Coding and Code book
• A code: a short word or phrase (qualitative research) or a number (quantitative research) describing the meaning and context of the whole sentence or paragraph
• Why coding?
- Data are collected in different formats
- Working with original data may be cumbersome
- Coding reduces large quantities of information to a form that is easily handled
- It supports the use of computer programs (Excel, SPSS, STATA, inter alia)
• For example: The birth rates for the 31 TZ regions would not be coded, but the regions would be coded (from 1-31) instead of writing their names
• Note: NOT ALL DATA NEED TO BE CODED
Example of codes
• In each data set, a variable has a name and each category has a code
• For example:

Variable name: Religion
1 = Christian
2 = Muslim
3 = Pagan
4 = Non-religious
5 = Other
The 1, 2, 3, 4, 5 are codes for each response category
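As a rough sketch, the Religion coding scheme can be stored as a lookup table so that responses are coded and decoded consistently. The list of responses is hypothetical.

```python
# Codes for the variable "Religion", as listed in the example.
RELIGION_CODES = {1: "Christian", 2: "Muslim", 3: "Pagan", 4: "Non-religious", 5: "Other"}

# Reverse lookup: response category -> code.
LABEL_TO_CODE = {label: code for code, label in RELIGION_CODES.items()}

# Hypothetical responses written out in words on four questionnaires.
responses = ["Muslim", "Christian", "Other", "Christian"]

coded = [LABEL_TO_CODE[answer] for answer in responses]   # what gets entered in the data set
decoded = [RELIGION_CODES[code] for code in coded]        # back to words for reporting

print(coded)
print(decoded)
```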
Examples of codes in qualitative research

• “I noticed that the grand majority of homes have chain link fences in front of them. There are many dogs with signs on fences that say ‘Beware of the dogs’”
Code: SECURITY
• “He’s always been there for me, even when my parents were not. He’s one of the few things that I hold as a constant in my life. So it’s nice”
Code: STABILITY
• “I feel really comfortable around her”
Code: COMFORTABLE


Codebook

• It tells the coder/researcher how each questionnaire will be coded for data entry
• It specifies the question from the questionnaire from which the
data was taken
• It specifies the variable name, operational definition of the variable,
the coding options, type of a variable (numeric or alpha numeric)
and the number of columns the variable requires
• Consider an example of a questionnaire on Quality of Work Life
Questionnaire example
1. Name of district where you work……………………………
2. How long have you been an employee in this company?.......yrs
3. How many in- the- country sponsored training sessions have you
attended?.................
4. What is your job classification?
……………… Management
……………… Technical
……………… Administrative
………………. Clerical
5. Is your position
…………. Supervisory
…………. Non supervisory
6. Sex ……………… Male …………………….. Female
7. In what area do you need additional training?...................
Question No.  Variable name  Operational definition                                                   Coding
              ID             Questionnaire number                                                     001-085
1             District       Name of district where you work                                          Kiwangwa = 1; Kipatamo = 2; Namkumba = 3; Kiambui = 4; Missing = 99
2             Length         How long have you been an employee in this company?
3             Training       How many in-the-country sponsored training sessions have you attended?
4             Jobclass       What is your job classification?                                         Management = 1; Technical = 2; Administrative = 3; Clerical = 4; Missing = 99
5             Super          Is your position supervisory or non-supervisory?                         Supervisory = 1; Non-supervisory = 2
7             Needs          In what area do you need additional training?                            Supervising = 1; Budgeting = 2; Computers = 3; Others = 4; Missing = 99
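A codebook like this one can also be written down as a lookup structure and applied to each questionnaire at data entry. The sketch below is only an illustration: the function name `encode`, the sample answers, and the use of 99 for a missing answer on `Super` (the codebook lists no missing code for it) are assumptions.

```python
MISSING = 99  # missing-value code used in the codebook

# Coding options per variable, copied from the codebook table.
CODEBOOK = {
    "District": {"Kiwangwa": 1, "Kipatamo": 2, "Namkumba": 3, "Kiambui": 4},
    "Jobclass": {"Management": 1, "Technical": 2, "Administrative": 3, "Clerical": 4},
    "Super": {"Supervisory": 1, "Non-supervisory": 2},
    "Needs": {"Supervising": 1, "Budgeting": 2, "Computers": 3, "Others": 4},
}


def encode(answers):
    """Turn one questionnaire's answers into a coded row for data entry."""
    row = {}
    for variable, value in answers.items():
        if variable in CODEBOOK:
            # Categorical variable: look up its code, fall back to the missing code.
            row[variable] = CODEBOOK[variable].get(value, MISSING)
        else:
            # ID, Length and Training are entered as given (no coding needed).
            row[variable] = value
    return row


# Hypothetical answers from one respondent; question 7 was left blank.
answers = {
    "ID": "001",
    "District": "Namkumba",
    "Length": 6,
    "Training": 2,
    "Jobclass": "Technical",
    "Super": "Non-supervisory",
    "Needs": None,
}

encoded = encode(answers)
print(encoded)
```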
Tips on coding

• Use numbers to represent response categories
On a scale about service satisfaction:
1 = Very satisfied
2 = Satisfied
3 = Neutral
4 = Dissatisfied
5 = Very dissatisfied

• Use 0 and 1 to code variables with binary response categories
Sex: 1 = Male; 0 = Female
Access to service: 1 = Access; 0 = No access

• One question can lead to more than one variable
Religion (detailed): 1 = Roman; 2 = Anglican; 3 = Muslim; 4 = Other
Religion (collapsed): 1 and 2 = Christian; 3 = Muslim; 4 = Other (ONLY 3 categories with 3 codes)
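Deriving a second variable from one question amounts to recoding. A minimal sketch, assuming a hypothetical list of detailed religion codes, could look like this:

```python
# Detailed coding of the religion question.
DETAILED = {1: "Roman", 2: "Anglican", 3: "Muslim", 4: "Other"}

# Collapsed coding: detailed codes 1 and 2 both become "Christian".
COLLAPSE = {1: "Christian", 2: "Christian", 3: "Muslim", 4: "Other"}

# Hypothetical detailed codes entered for five respondents.
detailed_codes = [1, 3, 2, 4, 1]

detailed_labels = [DETAILED[code] for code in detailed_codes]
collapsed_labels = [COLLAPSE[code] for code in detailed_codes]

print(detailed_labels)
print(collapsed_labels)
```

Both variables come from the same question; the collapsed one has only three categories.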
Example in Excel: Features in Ms Excel
• Each individual cell can be identified by its row and column
• The active cell is read “an entry found in column C and row 8”, called C8
• Labelled on the screenshot: the box that shows which cell you are currently working in, the formula bar, and a tool bar
Organizing data: deleting row/column

• To delete a column: highlight column C and right-click. This gives a dialog with options: choose “Delete” to delete the highlighted column
• To delete a row: highlight the row to be deleted and right-click, then choose “Delete” and click once. In the example, row 8 will be deleted
Moving a column/row

Suppose we want to move column G between E and F.
1. Right-click at the top of column G. A list of options appears; select “Cut”
2. Column G will now have a flashing green box around it. Excel is waiting for you to move it somewhere new
3. Right-click at the top of column F and select “Insert Cut Cells”. This will insert column G to the left of column F, between E and F

Be careful not to use a PASTE option, otherwise you will REPLACE the data
Sort data
• Sorting is one way to organize data
• Sort data according to a specific variable
• You can sort data based on:
• Values, for example numerically: smallest to largest or largest to smallest
• Alphabetically: A to Z or Z to A
• Other cell attributes such as cell color or font color etc.
Sorting

Let us say we want to organize our data according to “income”, from largest to smallest.
1. Highlight the cells you want to sort by
2. Click on “Sort & Filter”
3. A drop-down menu will appear
4. Click on “Sort Largest to Smallest”

A Sort Warning will appear. Ensure that “Expand the selection” is checked/ticked. This ensures that all your data in the table are included in the sort.
Now your data are sorted from largest to smallest according to “Income” (see column D)
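For comparison, the same sort can be sketched in Python. The table below is hypothetical. Sorting whole rows plays the role of “Expand the selection”: each person's values stay together while the rows are reordered by income.

```python
# Hypothetical table: one dictionary per row.
rows = [
    {"name": "Asha", "locality": "Urban", "income": 450},
    {"name": "Juma", "locality": "Rural", "income": 1200},
    {"name": "Neema", "locality": "Urban", "income": 800},
    {"name": "Baraka", "locality": "Rural", "income": 300},
]

# Sort largest to smallest by income; the original list is left unchanged.
sorted_rows = sorted(rows, key=lambda row: row["income"], reverse=True)

for row in sorted_rows:
    print(row["name"], row["locality"], row["income"])
```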
Filtering

• Filtering is another way to organize data: it makes it easy to work with large datasets
• When you filter data you set specific criteria so that only certain
data are displayed
• E.g. You can filter a dataset to look only at a particular variable within the
data set, for example locality
• Data that do not meet your criteria are hidden from view – but not
deleted
• You can filter by more than one column, but each filter will be
applied only to the data that are visible
Filtering

1. Highlight ANY cell within the row that you want to add filters to (usually the column headings)
2. Click on “Sort & Filter”, then click on “Filter”
3. A filter button will appear in each of the cells in that row

Let us say we want to filter the data to look only at the African ethnic background and hide the other ethnic backgrounds.
4. Click on the filter button at the bottom right corner of cell E1, where the variable “ethnic” is found. A drop-down menu will appear
5. Tick only the box(es) for the data that you want to remain visible (Africa in our case), then click OK

After Filtering

Your data set is now filtered according to ethnic background and consists of Africans only.
Clear the filter by clicking “Sort & Filter”, then choose “Clear”
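A filter can be sketched in Python as a selection that leaves the original data untouched, which mirrors the point that filtered-out rows are hidden, not deleted. The rows and the category labels other than “Africa” are hypothetical.

```python
# Hypothetical data set with an "ethnic" variable.
rows = [
    {"id": 1, "ethnic": "Africa", "income": 450},
    {"id": 2, "ethnic": "Asia", "income": 1200},
    {"id": 3, "ethnic": "Africa", "income": 800},
    {"id": 4, "ethnic": "Europe", "income": 300},
]

# Keep only the rows that meet the criterion; `rows` itself is not modified.
visible = [row for row in rows if row["ethnic"] == "Africa"]

print(len(rows), "rows in the data set;", len(visible), "visible after filtering")
```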
Summarizing data: its significance!
• A picture is worth a thousand words!
• A well-constructed table that summarizes data is also probably worth a thousand words!
• If I want to know something about 10,000 individuals, illustrations that summarize the data are worth using
• This can be done using charts, tables and graphs
• There are different kinds of graphs and charts available
Summarizing qualitative data

• Frequency distribution: a tabular summary of data showing the number (frequency) of items in each of the non-overlapping classes
• Consider data from a sample of 50 soft drinks purchases
Coke Coke Pepsi Pepsi Pepsi Coke Mirinda Coke Sprite Sprite
Fanta Fanta Mirinda Mirinda Fanta Fanta Pepsi Coke Coke Sprite
7up Sprite 7up 7up Fanta Fanta Pepsi Sprite Coke Coke
Fanta Fanta Coke Coke Sprite 7up 7up Coke Pepsi Pepsi
Mirinda Mirinda Pepsi Coke Fanta Fanta Fanta Sprite 7up Coke

To develop a frequency distribution for these data, we count the number of times each soft drink
appears! So that we have a drink and number of times it appears (frequency): See Table 1
Table 1
Drink Type frequency
Coke 13
Pepsi 8
Mirinda 5
Fanta 11
7up 6
Sprite 7
Total 50
50 is the total frequency: the total number of drinks purchased
Intuition from Table 1

• Coke appears 13 times; Pepsi 8 times; Mirinda 5 times; Fanta 11 times; 7up 6 times and Sprite 7 times.
• It is a summary of how the 50 soft drink purchases are distributed across the 6 types of soft drinks.
• Coke is the leader, followed by Fanta. Pepsi is the third, Sprite the fourth etc.
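The counting behind Table 1 can be reproduced with a few lines of Python. This is only a sketch using the standard library's `Counter`; the 50 purchases are the ones listed in the sample.

```python
from collections import Counter

# The sample of 50 soft drink purchases.
purchases = """
Coke Coke Pepsi Pepsi Pepsi Coke Mirinda Coke Sprite Sprite
Fanta Fanta Mirinda Mirinda Fanta Fanta Pepsi Coke Coke Sprite
7up Sprite 7up 7up Fanta Fanta Pepsi Sprite Coke Coke
Fanta Fanta Coke Coke Sprite 7up 7up Coke Pepsi Pepsi
Mirinda Mirinda Pepsi Coke Fanta Fanta Fanta Sprite 7up Coke
""".split()

# Count how many times each drink appears.
frequency = Counter(purchases)

# Print the frequency distribution, most frequent drink first.
for drink, count in frequency.most_common():
    print(f"{drink:8s} {count}")
print(f"{'Total':8s} {sum(frequency.values())}")
```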
Using Excel
Relative frequency and Percentage frequency distribution

• A frequency distribution shows the number (frequency) of items in each of several non-overlapping classes
• If interest is in the proportion, or percentage, of items in each class, the relative frequency (RF) and percent frequency (PF) distributions are developed
• Relative frequency of a class = Frequency of the class / Total frequency
• Percent frequency of a class = Relative frequency of the class × 100
Relative and Percentage frequency distributions of soft drink purchases

Soft drink Relative frequency Percent frequency
Coke 0.26 26
Pepsi 0.16 16
Mirinda 0.10 10
Fanta 0.22 22
7up 0.12 12
Sprite 0.14 14
Total 1.00 100

Intuition: The top three soft drinks purchased were Coke (26%), Fanta (22%) and Pepsi (16%)
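As a quick check of the table, the relative and percent frequencies can be recomputed from the frequencies in Table 1. The variable names in this sketch are illustrative.

```python
# Frequencies from Table 1.
frequency = {"Coke": 13, "Pepsi": 8, "Mirinda": 5, "Fanta": 11, "7up": 6, "Sprite": 7}
total = sum(frequency.values())

# Relative frequency = class frequency / total frequency.
relative = {drink: count / total for drink, count in frequency.items()}

# Percent frequency = relative frequency x 100, rounded to one decimal place.
percent = {drink: round(100 * count / total, 1) for drink, count in frequency.items()}

# Rank the drinks by percent frequency, largest first.
top_three = sorted(percent, key=percent.get, reverse=True)[:3]

for drink in frequency:
    print(f"{drink:8s} {relative[drink]:.2f} {percent[drink]:5.1f}")
print("Top three:", top_three)
```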
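Using Excel for Relative frequency

The formula: = E2/$E$8
Here E2 holds the frequency of the first class and E8 holds the total frequency. The $ signs keep the reference to E8 fixed when the formula is copied down to the other classes.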
Bar Graphs and Pie Charts

• A bar graph, or a bar chart, is a graphical device for depicting qualitative data summarized in a frequency, relative frequency or percentage frequency distribution
• On the horizontal axis, specify the labels (categories)
• The vertical axis presents the frequency, relative frequency or percentage
• A bar of fixed width is drawn above each class label and its length is extended until the frequency, relative frequency or percentage is reached
Bar graph of soft drink purchases (Using Excel)

Step 1: Select the data set you want to draw a bar chart for
Step 2: On the tool bar, click “Insert”, and select the bar chart option
Step 3: Obtain the graph
The Graph
Pie Chart

• Suitable for presenting in a graphical form the relative frequency and percentage distributions for qualitative data
• Draw a circle to represent all of the data
• Use the relative frequencies to subdivide the circle into sectors, for example:
• Coke had a relative frequency of 0.26, so its sector takes 0.26 × 360 = 93.6 degrees of the circle.
• Similar calculations will provide the required degrees in a circle of 360 degrees for each sector (category)
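The sector calculation can be repeated for every drink. The sketch below uses the frequencies from Table 1 and rounds each angle to one decimal place.

```python
# Frequencies from Table 1.
frequency = {"Coke": 13, "Pepsi": 8, "Mirinda": 5, "Fanta": 11, "7up": 6, "Sprite": 7}
total = sum(frequency.values())

# Sector angle = relative frequency x 360 degrees.
degrees = {drink: round(360 * count / total, 1) for drink, count in frequency.items()}

for drink, angle in degrees.items():
    print(f"{drink:8s} {angle:6.1f} degrees")
print(f"{'Total':8s} {sum(degrees.values()):6.1f} degrees")
```

The angles add up to the full circle of 360 degrees, which is a useful check on the arithmetic.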
Pie Chart – Illustration using Excel
Class activity 1

• A questionnaire provides 58 Yes, 42 No, and 20 No Opinion answers.
a. In the construction of a pie chart, how many degrees would be in
the section of the pie showing the Yes answers?
b. How many degrees would be in the section of the pie showing
the No answers?
c. Construct a pie chart
d. Construct a bar chart
Activity 2
• The following Table shows Opinion about government spending on Law
enforcement and education
The government On Law On Education (%)
spends: enforcement (%)
Too little 52.4 75.6
About right 33.5 18.2
Too much 14.1 6.2
Total 100 (1250) 100 (1400)
Questions:
1. What intuition on government spending can be drawn from the table? (See: Research design\Linking statistics and research.pdf)
2. In the same graph, use a bar chart to compare government spending on law enforcement and on
education
3. Comment on the graph
Summarizing Quantitative data

• Frequency distribution: the same definition holds for quantitative and qualitative data: a tabular summary of data showing the frequency of items in each of several non-overlapping classes
• In quantitative data, we must be careful in defining the non-overlapping classes to be used in the frequency distribution
• 3 steps necessary to define the classes for a frequency distribution
with quantitative data are:
- Determine the number of non overlapping classes
- Determine the width of each class
- Determine the class limits
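The three steps can be sketched in Python as follows. The 20 data values, the choice of 5 classes and the starting lower limit of 10 are assumptions for illustration; the upper limits are computed as "next lower limit minus 1", which only suits whole-number data.

```python
import math

# Illustrative whole-number data (20 observations); not taken from the slides.
values = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
          22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

# Step 1: choose the number of non-overlapping classes (assumed here).
number_of_classes = 5

# Step 2: approximate width = (largest value - smallest value) / number of classes,
# rounded up to a convenient whole number.
approximate_width = (max(values) - min(values)) / number_of_classes
class_width = math.ceil(approximate_width)

# Step 3: class limits, starting from a convenient value at or below the minimum (assumed).
first_lower_limit = 10
class_limits = [
    (first_lower_limit + i * class_width, first_lower_limit + (i + 1) * class_width - 1)
    for i in range(number_of_classes)
]

# Count the observations falling in each class (both limits included).
frequency = {
    f"{low}-{high}": sum(1 for v in values if low <= v <= high)
    for low, high in class_limits
}

for label, count in frequency.items():
    print(label, count)
```

Every observation must fall in exactly one class, so the class frequencies should add up to the number of observations.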
Frequency distribution
• Example:
• Frequency distribution of poverty rates: Percent of persons in poverty, 2001-2003
Class Frequency
10-14 21
15-19 14
20-24 9
25-29 5
30-34 2
Total 51

Intuition:
21 states have a poverty rate between 10 and 14 percent; only 2 states have a poverty rate between 30 and 34 percent. If a standard poverty rate is known for comparison, these figures become easier to interpret.
Percentage distribution

• Consider the following percent frequency distribution for the data on poverty rates among individuals in selected states in Country X
Classes Frequency Percent
10-14 21 41.2
15-19 14 27.5
20-24 9 17.6
25-29 5 9.8
30-34 2 3.9
Total 51 100

Sample message:
The fourth class contains 5 out of a total of 51 observations (n = 51). The percent in this class is (5 divided by 51) times 100, which gives 9.8%. This implies that about 1 out of every 10 states had a poverty rate between 25 and 29 percent
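The percent column can be recomputed from the frequencies, and the same loop extends naturally to a cumulative percentage. The cumulative column is an addition for illustration; it does not appear in the table above.

```python
from itertools import accumulate

# Classes and frequencies from the poverty-rate table.
classes = ["10-14", "15-19", "20-24", "25-29", "30-34"]
frequency = [21, 14, 9, 5, 2]
n = sum(frequency)

# Percent = (class frequency / n) x 100, rounded to one decimal place.
percent = [round(100 * f / n, 1) for f in frequency]

# Cumulative percent = (running total of frequencies / n) x 100.
cumulative_percent = [round(100 * running / n, 1) for running in accumulate(frequency)]

for label, f, p, c in zip(classes, frequency, percent, cumulative_percent):
    print(f"{label:6s} {f:3d} {p:6.1f} {c:6.1f}")
```

Accumulating the counts before dividing avoids adding up rounding errors from the individual percentages.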
Analyzing frequency distributions

• Four ways can be used:
• Percentage distribution
• Cumulative percentage distribution
• A cross tabulation and
• A graph
Excel: Histogram Presentation
Working out frequencies

Step 1: Highlight cells E2:E6
Step 2: Type a formula for FREQUENCY in cell E2: =FREQUENCY(A2:A21,D2:D6), without pressing Enter
Step 3: Press CTRL+SHIFT+ENTER to have the frequencies
After CTRL+SHIFT+ENTER, the required frequencies appear in E2:E6
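For readers who want to see what the FREQUENCY formula computes, here is a rough Python equivalent. It assumes the usual behaviour of Excel's FREQUENCY: each bin value is an upper limit, a value is counted in the first bin whose limit it does not exceed, and one extra count is returned for values above the last bin. The data and bin values are illustrative stand-ins for A2:A21 and D2:D6.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count values per bin, where each bin is an upper limit (bins sorted ascending)."""
    counts = [0] * (len(bins) + 1)          # the last slot counts values above the top bin
    for value in data:
        # Index of the first bin limit that is greater than or equal to the value.
        counts[bisect_left(bins, value)] += 1
    return counts


# Illustrative stand-ins for A2:A21 (20 values) and D2:D6 (5 upper class limits).
data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
        22, 23, 22, 21, 33, 28, 14, 18, 16, 13]
upper_limits = [14, 19, 24, 29, 34]

counts = frequency(data, upper_limits)
print(counts)
```

The first five counts correspond to the five highlighted cells; the sixth is the overflow count, which is 0 here because no value exceeds the last upper limit.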
Histogram

1. Select cells E2:E6
2. Go to Chart Wizard and choose Clustered Column
3. Format the chart obtained
Formatting Chart

• Select the Series tab and then enter C2:C6 in the Category (X) axis labels box, then click Next
• Or right-click any value on the x-axis and choose Select Data to open the Select Data dialog box
• The result is a column chart from Excel
Note: In a histogram, the adjacent rectangles MUST touch each other
A column chart to a histogram

1. Right-click on any rectangle in the column chart to produce a list of options
2. Select the Format Data Series option
3. Enter 0 in the Gap width
The result is a HISTOGRAM
Exercise
• A consulting company provided survey data on the annual amount of household purchases by families with an annual income of $75,000 or more. Assume that the following data from a sample of 25 households show the dollars spent in the past year on books and magazines.
496 382 202 287 266 119 10 385 135 475
255 379 267 24 42 25 283 110 423 160
123 16 243 363 280
a. Construct a frequency distribution and a relative frequency distribution
for the data
b. Provide a histogram. Comment on the shape of the distribution
c. Comment on the annual spending on books and magazines for families in
the sample
Cross tabulations

• A tabular summary of data for two variables
• Consider the following example:
• The quality rating and the meal price data were collected for a sample of 300 restaurants located in Town X.
Meal Price
Quality rating $10-19 $20-29 $30-39 $40-49 Total
Good 42 40 2 0 84
Very good 34 64 46 6 150
Excellent 2 14 28 22 66
Total 78 118 76 28 300
Intuition from the Table above:

We see that the greatest number of restaurants in the sample (64) have a “Very good” rating and a meal price in the $20-29 range. Only two restaurants have an “Excellent” rating and a meal price in the $10-19 range. Similar interpretations of the other frequencies can be made
Meal Price (row percentages)
Quality rating $10-19 $20-29 $30-39 $40-49 Total
Good 50.0 47.6 2.4 0.0 100
Very good 22.7 42.7 30.7 4.0 100
Excellent 3.0 21.2 42.4 33.3 100

Intuition (Percentage Use - Row Percentages):
Row percentages give more insight into the relationship between the two variables. Each entry is the cell frequency divided by its row total, times 100; because of rounding, a row may add up to 99.9 or 100.1 rather than exactly 100.
• Of the restaurants with the lowest quality rating (good), the greatest percentages are for the less expensive restaurants (50% have $10-19 meal prices and 47.6% have $20-29 meal prices).
• Of the restaurants with the highest quality rating (excellent), the greatest percentages are for the more expensive restaurants (42.4% have $30-39 meal prices and 33.3% have $40-49 meal prices).
• Thus, the more expensive meals are associated with the higher quality restaurants
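Row percentages can be recomputed directly from the cross tabulation of counts. The sketch below uses the frequencies for the 300 restaurants; values are rounded to one decimal place, so a row may add up to 99.9 or 100.1.

```python
PRICE_RANGES = ["$10-19", "$20-29", "$30-39", "$40-49"]

# Cell frequencies from the cross tabulation (rows: quality rating, columns: meal price).
counts = {
    "Good": [42, 40, 2, 0],
    "Very good": [34, 64, 46, 6],
    "Excellent": [2, 14, 28, 22],
}

# Row and column totals as a consistency check on the table.
row_totals = {rating: sum(cells) for rating, cells in counts.items()}
column_totals = [sum(column) for column in zip(*counts.values())]

# Row percentage = cell frequency / row total x 100.
row_percent = {
    rating: [round(100 * cell / row_totals[rating], 1) for cell in cells]
    for rating, cells in counts.items()
}

print("Quality rating", *PRICE_RANGES)
for rating, percentages in row_percent.items():
    print(rating, *percentages)
```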
