Chapter 2 Methods of Data Collection and Organizing
Learning Objectives
• At the end of this chapter, the students will be able to:
2. Understand the criteria for selecting a method to organize and present data
3. Identify the different methods of data collection and the criteria used to select
a method of data collection
• There are different sources of data on health and health related conditions
• The major sources are census, vital statistics, health service records, morbidity
and mortality records etc…
• The information obtained from these sources is used for health planning,
programming and evaluation of health services
10/29/2020 Getabalew 2
1. Census
• Census is a periodic count or enumeration of a population
• Necessary for an accurate description of a population’s health status
• The principal source of denominators for rates of disease and death
• Census data provide information on:
• Size and composition of a population
• The forces that determine this variability
• The trends anticipated in the future
• General information about sociodemographic variables
Types of census
Census….
• Advantages of Census:
• Health planning and programming
• Accurate description of the health status of a population
• Census data are used in many ways for planning the welfare of the
people, even outside the health sector
• Disadvantages of census
• Conducted nationwide and therefore very expensive
• Generates a large amount of data that takes a long time to compile and analyze
• Carried out every 10 years, so it cannot assess yearly changes
Common Errors in Census Data
• Omission and over-enumeration
• Under-reporting of deaths due to memory lapse and a tendency not to report
deaths, particularly neonatal and infant deaths
Essential Characteristics of Census
• Individuality: complete, independent information on every inhabitant
• Universality: all individuals of the population in all areas must be enumerated
without omission or repetition
• Simultaneity: all individuals must be enumerated within a short period of
time around a given moment, called the censual period
• Periodicity: must be taken at regular time intervals, such as every ten years
• Well defined: a well-defined geographic area and a clear period of time
2. Registration of Vital Statistics
• Continuous process of recording vital events at the time of their
occurrence with the permanent need of human sources
• Vital events include birth, death, migration, marriage, divorce, widowhood, etc.
Characteristics of vital registration (4 C’s)
• Comprehensive → all births, deaths and other vital events should be registered
Vital statistics….
Advantages of Vital Statistics
Vital statistics….
Disadvantages
• Lack of denominator
3. Health Service Records
• These include monthly, quarterly and annual reports of deaths, services, etc., as well as
reports on notifiable diseases, epidemic reports, etc.
Health service records…
• Advantages of Health Service Records
• Easily obtainable
• Available at low cost
• Continuous system of reporting
• Causes of illness and death available
Health service records…..
• Disadvantages
• Lack of completeness
• Lack of representativeness
• Lack of denominator: the catchment area is not known in the majority of cases
• Lack of uniformity in quality
• Diagnosis varies across the level of health institutions
• Lack of compliance with reporting
• Irregularity and incompleteness of published compilations
Data Collection Methods
• Data collection methods or techniques allow us to systematically collect data
about our objects of study (people, objects, and phenomena)
2. interview
Method….
2. Interviews and self administered questionnaire
Method….
• Disadvantage:
• expensive
• Requires a skilled person (interviewer)
• Common problems may include
• Language barrier
• Lack of adequate time
• Expenses
• Inadequately trained and experienced staff
• Invasion of privacy
• Suspicion
• Cultural norms
Type of Questions Used for Data Collection
1. Open-ended questions: permit free responses that should be recorded in the
respondent’s own words.
• The respondent is not given any possible answers to choose from
• Sensitive issues
• one should try to offer a list of options that are exhaustive and mutually exclusive
• Single □
• Married □
• Separated / Divorced / Widowed □
Steps in designing a questionnaire
• Step -1- Content
Steps in designing a questionnaire…
Step- 2- Formulating Questions
• Formulate one or more questions
• Make sure the questions are specific and precise enough
• Check whether each question measures one thing at a time
• Avoid combined questions. E.g. how large an interval would you and your
husband prefer between two successive births?
Steps in designing a questionnaire……
• Must be logical for respondents and allow, as far as possible, for a natural
discussion
Steps in designing a questionnaire…
Step -4- Formatting the questions
Steps in designing a questionnaire….
Step-5- Translating
Methods of Data Organization and Presentation
• The data collected in a survey is called raw data.
Definitions of terms;
• Lower class limits are the smallest numbers that can belong to the different classes.
• Upper class limits are the largest numbers that can belong to the different classes.
• Class boundaries (true limits) are the numbers used to separate classes, but without the gaps
created by class limits.
• Class midpoints are the midpoints of the classes, each class midpoint can be found by adding the
lower class limit to the upper class limit and dividing the sum by 2.
• Class width is the difference between two consecutive lower class limits or two consecutive lower
class boundaries.
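The midpoint and width definitions above can be checked with a few lines of Python. This is only a sketch: the class limits are taken from the leisure-time table later in this chapter (10-14, 15-19, ...), and the function names are illustrative.

```python
def class_midpoint(lower_limit, upper_limit):
    """Midpoint = (lower class limit + upper class limit) / 2."""
    return (lower_limit + upper_limit) / 2


def class_width(lower_limits):
    """Width = difference between two consecutive lower class limits."""
    return lower_limits[1] - lower_limits[0]


# Class limits of the leisure-time example used later in this chapter.
lower_limits = [10, 15, 20, 25, 30, 35]
upper_limits = [14, 19, 24, 29, 34, 39]

midpoints = [class_midpoint(lo, hi) for lo, hi in zip(lower_limits, upper_limits)]
width = class_width(lower_limits)

print(midpoints)
print(width)
```

Note that the width is 5, not 14 - 10 = 4: it is measured between consecutive lower limits, not between the limits of a single class.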
Constructing a frequency distribution
a) Qualitative variable/A categorical distribution : Count the
number of cases in each category.
- Example1: The ICU type of 25 patients entering intensive care
unit at a given hospital:
1. Medical
2. Surgical
3. Cardiac
4. Other
ICU Type    Frequency      Relative Frequency
            (How often)    (Proportionately often)
Medical     12             0.48
Surgical    6              0.24
Cardiac     5              0.20
Other       2              0.08
Total       25             1.00
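As a quick check of the table, each relative frequency is the category count divided by the total. A minimal sketch using the counts from the ICU example (the variable names are illustrative):

```python
# Counts of the 25 ICU patients by ICU type, as given in the table.
counts = {"Medical": 12, "Surgical": 6, "Cardiac": 5, "Other": 2}

total = sum(counts.values())

# Relative frequency = frequency / total number of observations.
relative = {icu_type: n / total for icu_type, n in counts.items()}

for icu_type, n in counts.items():
    print(f"{icu_type:<10}{n:>4}{relative[icu_type]:>8.2f}")
print(f"{'Total':<10}{total:>4}{sum(relative.values()):>8.2f}")
```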
Example 2:
A study was conducted to assess the
characteristics of a group of 234 smokers by
collecting data on gender and other variables.
Gender, 1 = male, 2 = female
Constructing a frequency distribution……
b) Quantitative continuous variable:
• The number of classes (k) can be guided by Sturges’ formula, given by:
k = 1 + 3.322 × log10(n), where n is the number of observations
• Note that the Sturges rule should not be regarded as final, but should be considered
as a guide only.
• The number of classes specified by the rule should be increased or decreased for
convenient or clear presentation.
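A small sketch of the Sturges rule, k = 1 + 3.322 × log10(n), as it is applied in the worked example later in this chapter. The function name is illustrative, and the result is only a guide that may be rounded up or down.

```python
import math


def sturges_classes(n):
    """Suggested number of classes for n observations (before rounding)."""
    return 1 + 3.322 * math.log10(n)


for n in (40, 80):
    k = sturges_classes(n)
    print(f"n = {n}: k = {k:.2f}, about {round(k)} classes")
```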
Procedures for constructing a frequency distribution
2. Calculate class width
• Class width = (highest value - lowest value) / number of classes (k)
• Round this result to get a convenient number (usually round up).
3. Starting point: begin by choosing a number for the lower limit of the first class. Choose either the
4. Using the lower limit of the first class and class width, proceed to list the other lower class limits.
(Add the class width to the starting point to get the second class limits, add the class width to the second
5. Go through the data set putting a tally in the appropriate class for each data value.
Example:
• Leisure time (hours) per week for 40 college students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20 22 14 13 10 19 27 29 22
38 28 34 32 23 19 21 31 16 28 19 18 12 27 15 21 25 16
Total 40 1.00
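The procedure can be sketched in Python for the 40 leisure-time values listed above. Assumptions in this sketch: the number of classes comes from the Sturges rule rounded to the nearest integer, the width is rounded up, the first lower limit is the smallest data value, and the data are whole numbers. The function name is illustrative.

```python
import math

hours = [23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20, 22, 14, 13, 10,
         19, 27, 29, 22, 38, 28, 34, 32, 23, 19, 21, 31, 16, 28, 19, 18, 12, 27,
         15, 21, 25, 16]


def grouped_frequency(values, n_classes, width):
    """Tally integer values into equal-width classes starting at the minimum."""
    start = min(values)
    # Add a class if rounding left the largest value uncovered.
    while start + n_classes * width - 1 < max(values):
        n_classes += 1
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1
    return [((start + i * width, start + (i + 1) * width - 1), c)
            for i, c in enumerate(counts)]


n = len(hours)
k = round(1 + 3.322 * math.log10(n))              # number of classes
width = math.ceil((max(hours) - min(hours)) / k)  # class width, rounded up
table = grouped_frequency(hours, k, width)

for (lower, upper), count in table:
    print(f"{lower}-{upper}: frequency {count}, relative frequency {count / n:.3f}")
```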
• Cumulative frequencies: obtained when the frequencies of two or
more successive classes are added together.
• True limits: Are those limits that make an interval of a
continuous variable continuous in both directions
• Subtract 0.5 from the lower limit and add 0.5 to the upper limit
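A minimal sketch of the true-limit rule. The 0.5 adjustment assumes values recorded to the nearest whole unit; the `unit` parameter is an illustrative generalization that is not part of the slide.

```python
def true_limits(lower_limit, upper_limit, unit=1):
    """Return (lower true limit, upper true limit) for one class.

    Half of the measurement unit is subtracted from the lower class limit
    and added to the upper class limit, so adjacent classes meet with no gap.
    """
    half = unit / 2
    return lower_limit - half, upper_limit + half


print(true_limits(10, 14))  # class 10-14
print(true_limits(15, 19))  # class 15-19 starts where 10-14 ends
```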
Time (hours)    True limit    Mid-point    Frequency
Grouped frequency distribution
• In connection with large sets of data, a good overall picture and sufficient
information can often be conveyed by grouping the data into a number of
class intervals.
• For instance, the above grouped frequency distribution cannot tell how many
of the students are 19 years old, or how many are over 28.
Grouped frequency…..
• Example: Construct a grouped frequency distribution of the following data on
the amount of time (in hours) that 80 college students devoted to leisure
activities during a typical school week:
Grouped frequency….
• Determine the number of classes: K = 1 + 3.322 × log (80) = 7.32 ≈ 7 classes
• Maximum value = 38 and minimum value = 10, so range = 38 – 10 = 28
• W = 28/7 = 4
• Using a convenient width of 5 (which gives six classes), we can construct a grouped
frequency distribution for the above data as:
Grouped frequency……
Time spent (hours)    Frequency    Cumulative frequency
10-14                 8            8
15-19                 28           36
20-24                 27           63
25-29                 12           75
30-34                 4            79
35-39                 1            80
Total                 80
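The cumulative frequency column is a running total of the frequency column. A short sketch using the frequencies from the table above (variable names are illustrative):

```python
from itertools import accumulate

classes = ["10-14", "15-19", "20-24", "25-29", "30-34", "35-39"]
frequencies = [8, 28, 27, 12, 4, 1]

# Each cumulative frequency adds the current class to all preceding classes.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label:<8}{f:>4}{cf:>6}")
```

The last cumulative frequency always equals the total number of observations, which gives a quick check on the tally.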
Statistical table
• An orderly and systematic presentation of numerical data in rows and
columns
Guidelines for constructing tables
• Keep them simple,
• Show totals,
Table 2 Primary and secondary cases of syphilis morbidity by age and sex in 1989.
Diagrammatic Representation of Data
Importance of Diagrammatic Representation
1. They have greater attraction than mere figures. They give delight to
the eye and add a spark of interest.
2. They help in deriving the required information in less time and
without any mental strain.
3. They facilitate comparison.
4. They may reveal unsuspected patterns in a complex set of data and
may suggest directions in which changes are occurring. This warns
us to take immediate action.
5. They have greater memorizing value than mere figures.
Limitations of Diagrammatic Representation
1. Diagrammatic representation is used only for
purposes of comparison.
2. It is not to be used when comparison is either not possible or is not
necessary.
3. Diagrammatic representation is not an alternative to tabulation.
4. It only strengthens the textual exposition of a subject, and cannot
serve as a complete substitute for statistical data.
5. It can give only an approximate idea and as such where greater
accuracy is needed diagrams will not be suitable.
6. They fail to bring to light small differences
General rules that are commonly accepted about construction of graphs.
1. Every graph should be self-explanatory and as simple as possible.
2. Titles are usually placed below the graph and should answer the questions: What?
Where? When? How classified?
3. Legends or keys should be used to differentiate variables if more than one is
shown.
4. The axes label should be placed to read from the left side and from the
bottom.
5. The units into which the scale is divided should be clearly indicated.
6. The numerical scale representing frequency must start at zero or a break in
the line should be shown.
Specific types of graphs include:
• For nominal, ordinal and discrete data:
  • Bar graph
  • Pie chart
• For continuous data:
  • Histogram
  • Box plot
  • Scatter plot
  • Line graph
• Others
1. Bar charts (or graphs)
Bar chart for the type of ICU for 25 patients
Method of constructing bar chart
• All the bars must have equal width
• All the bars should rest on the same line called the base
Example: Construct a bar chart for the following data.
[Figure: bar chart. Distribution of patients in hospital X by source of referral, 1999. Y-axis: No. of patients (0 to 800); x-axis: Source of referral (Other hospital, GP, OPD, Casualty, Other); bar labels: 769, 623, 256, 161, 97.]
2. Sub-divided bar chart
• The order in which the components are shown in a “bar” is followed in all
bars used in the diagram.
Example: Plasmodium species distribution for confirmed malaria cases, Zeway, 2003
[Figure: sub-divided bar chart. Y-axis: Percent (0 to 100); x-axis: August, October, December 2003; components: P. falciparum, P. vivax, Mixed.]
3. Multiple bar graph
[Figure: multiple bar graph. Prevalence of self-reported breathlessness among school children, 1998. Y-axis: Breathlessness, percent (0 to 35); x-axis: Parents smoking (Neither, One, Both).]
We can see from the graph quickly that the prevalence of the
symptoms increases both with the child’s smoking and with that
of their parents.
There’s no reason why the bar chart can’t be
plotted horizontally instead of vertically.
[Figure: horizontal bar chart. Y-axis: Type of source (CHA, HC, Reading, Training, Campaign, Anti FGMC, CAT); x-axis: Percent (0 to 50); legend: female, male.]
Steps to construct a pie-chart
• Construct a frequency table
Example: Distribution of deaths for females, in England and
Wales, 1989.
Distribution of cause of death for females, in England and Wales, 1989
[Figure: pie chart with the following sectors]
Cause of death          Percent
Circulatory system      42%
Neoplasms               30%
Respiratory system      13%
Others                  8%
Digestive system        4%
Injury and poisoning    3%
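To draw a pie chart by hand, each percentage is converted to a sector angle out of 360 degrees. A minimal sketch, assuming the percentages read from the chart above (with 42% for the circulatory system and 13% for the respiratory system); the variable names are illustrative.

```python
# Percentage of female deaths by cause, England and Wales, 1989.
percent = {
    "Circulatory system": 42,
    "Neoplasms": 30,
    "Respiratory system": 13,
    "Others": 8,
    "Digestive system": 4,
    "Injury and poisoning": 3,
}

# Sector angle = (percentage / 100) x 360 degrees.
angles = {cause: round(p / 100 * 360, 1) for cause, p in percent.items()}

for cause, angle in angles.items():
    print(f"{cause:<22}{percent[cause]:>4}%{angle:>8.1f} degrees")
```

The percentages should add up to 100 and the angles to 360; a mismatch usually signals a rounding or transcription error in the frequency table.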
5. Stem and Leaf Plot
• A quick way to organize data to give visual impression similar to a
histogram while retaining much more detail on the data.
• Draw a vertical line and place the first digit of each value, called the
“stem”, on the left side of the line.
• The numbers on the right side of the vertical line represent the second digit
of each observation; they are the “leaves”.
Example
• 43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29, 36, 66, 72, 41
2 | 2 8 9
3 | 4 6
4 | 1 3 7 9
5 | 1
6 | 1 6
7 | 2 7
8 | 2
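The same plot can be produced with a short script. This sketch assumes two-digit whole numbers, so the tens digit is the stem and the units digit is the leaf; the function name is illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group two-digit integers by tens digit (stem) with sorted unit digits (leaves)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


data = [43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29, 36, 66, 72, 41]
plot = stem_and_leaf(data)

for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```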
6. Histogram
• Non-overlapping intervals that cover all of the data values must be used.
• Bars are then drawn over the intervals in such a way that the areas of the
bars are all proportional in the same way to their interval frequencies.
Example: Distribution of the age of women at the time of marriage
[Figure: histogram. Y-axis: No. of women (0 to 40); x-axis: Age group, with class boundaries 14.5-19.5, 19.5-24.5, 24.5-29.5, 29.5-34.5, 34.5-39.5, 39.5-44.5, 44.5-49.5.]
Histogram for the ages of 2087 mothers with <5 children, Adami Tulu, 2003
[Figure: histogram. Y-axis: frequency (0 to 700); x-axis: age of mother (N1AGEMOTH), 15.0 to 55.0; N = 2087.]
7. Frequency polygon
Frequency polygon for the ages of 2087 mothers with <5 children, Adami Tulu, 2003
[Figure: frequency polygon drawn over the histogram. Y-axis: frequency (0 to 700); x-axis: age of mother (N1AGEMOTH), 15.0 to 55.0; N = 2087.]
It can be also drawn without erecting rectangles by joining the top midpoints
of the intervals representing the frequency of the classes as follows:
[Figure: frequency polygon. Age of women at the time of marriage. Y-axis: No. of women (0 to 40); x-axis: Age, class midpoints 12, 17, 22, 27, 32, 37, 42, 47.]
8. Ogive Curve
• Sometimes it may be necessary to know the number of items whose values are more
or less than a certain amount.
• We may, for example, be interested to know the number of patients whose weight is <50
kg or >60 kg.
• To get this information it is necessary to change the form of the frequency distribution
from a ‘simple’ to a ‘cumulative’ distribution.
Class    Frequency    Relative         Cumulative    Cumulative relative
                      frequency (%)    frequency     frequency (%)
10-19    3            12               3             12
20-29    1            4                4             16
30-39    3            12               7             28
40-49    0            0                7             28
50-59    6            24               13            52
60-69    1            4                14            56
70-79    9            36               23            92
80-89    2            8                25            100
Total    25           100
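The points of a "less than" ogive can be derived from this table by pairing each upper true limit with its cumulative frequency. A sketch under the assumption that values are whole numbers, so the upper true limit is the upper class limit plus 0.5; the names `ogive_points` and `count_below` are illustrative.

```python
from itertools import accumulate

classes = [(10, 19), (20, 29), (30, 39), (40, 49),
           (50, 59), (60, 69), (70, 79), (80, 89)]
frequencies = [3, 1, 3, 0, 6, 1, 9, 2]

cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# One ogive point per class: (upper true limit, cumulative frequency, cumulative %).
ogive_points = [(upper + 0.5, cf, 100 * cf / total)
                for (_, upper), cf in zip(classes, cumulative)]


def count_below(boundary):
    """Number of observations below a class boundary (0 if below all classes)."""
    below = 0
    for limit, cf, _ in ogive_points:
        if limit <= boundary:
            below = cf
    return below


for limit, cf, pct in ogive_points:
    print(f"less than {limit}: {cf} observations ({pct:.0f}%)")
```

For example, `count_below(49.5)` reads off how many of the 25 observations fall below 49.5, which is the kind of question the ogive is designed to answer.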
Cumulative frequency of 25 ICU patients
[Figure: ogive curves. Heart rate of patients admitted in hospital Y, 1998. Y-axis: Cum. frequency (10 to 60); x-axis: Heart rate, class boundaries from 54.5 to 104.5 in steps of 5; legend: LM, MM.]
9. Line graph
• Useful for assessing the trend of a particular situation over time.
• The time, in weeks, months or years, is marked along the horizontal axis, and
[Figure: line graph. No. of microscopically confirmed malaria cases by species and month at Zeway malaria control unit, 2003. Y-axis: No. of confirmed malaria cases (0 to 2100); x-axis: Months (Jan to Dec); lines: Positive, P. falciparum, P. vivax.]
Exercise
Data were recorded on the age in years and height in cm of 20 high school students in a classroom.
    Females                Males
Age     Height         Age     Height
_______________________________________________________________
15      170            15      185
15      154            16      183
16      160            16      174
15      159            15      183
15      156            15      173
15      153            15      173
16      166            15      178
16      163            14      167
15      167            15      177
15      151
16      177