Lecture 2 Data Information Knowledge-1

The document discusses data, information, and knowledge. It defines data as observations without meaning, information as contextualized data that answers questions, and knowledge as justified beliefs including understanding relationships. It provides examples to distinguish between the three concepts. The document also outlines the health information cycle from collecting data to taking action, and emphasizes ensuring data quality during processing.

Uploaded by

Clemence Lumani
Copyright
© © All Rights Reserved
Mzuzu University

Introduction to Health Information Systems (HIS)

Lecture 2:
Data, Information, Knowledge
Outline
• Data, information & knowledge
• Information cycle – from data to action
• Ensuring data quality
• Data processing and compilation
• Data presentation
Data, Information, Knowledge
• Data: observations (numbers, terms)
• No meaning is attached, so the same value may have multiple meanings
• Example: what does “9” mean?

• Information: aggregation of data that makes decision making easier
• Meaning is attached and contextualized
• Answers questions: who, what, when, where
(Zins, 2007)
Data, Information, Knowledge
• Knowledge: includes facts about real world entities and the relationship between them; justifiable beliefs based on data and information.
• It is an understanding gained through experience
• Answers the ‘how’ question
(Zins, 2007)
Example: Data is everywhere!
• 38.5
• 45
• 150
• Female
Example: Information
• 38.5
  • 38.5 degrees Celsius: high body temperature reading
• 45
  • 45-year-old female patient
  • 45-kilogram female patient
• 150
  • 150 cm tall female patient
  • Female patient with CD4 count: 150 cells/mm³
Example: Knowledge
• A middle-aged woman with fever and progressed Stage 3 infection (AIDS)
• A middle-aged woman with fever.
• A woman with normal Body Mass Index (BMI) with fever.
Quiz: Matching
CATEGORY
• Data
• Information
• Knowledge
RESPONSE
• 17
• Positive Drug Screen
• Positive Pregnancy Test
• 20 weeks gestation
• A high risk, obese pregnant teen in second trimester with substance use issues
• 90
Different types of Data/Variables

• Quantitative / Numerical / Continuous
  • Numeric measurements
  • Counts, how many, how much
  • Examples: age, height, no. of children
• Qualitative / Nominal (named) / Categorical
  • Descriptions, qualities
  • Non-numeric data
  • Ordered / ranked (non-quantitative) data
  • Examples: Ill? (yes/no), district, cancer stage
Variable Types: Practice

For each variable, state whether it is qualitative or quantitative:

#  Variable                   Possible responses                                                 Qualitative or Quantitative?
1  Age (years)                0–99+                                                              Quantitative
2  Marital status             Single, Married, Divorced, Widowed, …                              Qualitative
3  Number of living siblings  0–20+                                                              Quantitative
4  HIV status                 Pos, Neg, Unk                                                      Qualitative
5  CD4+ T-cell count          0–1600+                                                            Quantitative
6  Sex                        M, F                                                               Qualitative
7  Educational level          0 = Illiterate, 1 = Primary only, 2 = Secondary, 3 = University    Qualitative
Why is type of variable important?
Because we summarize different variable types with different summary methods.
Transforming Data to Knowledge
• Raw data needs to be analyzed, interpreted, and evaluated to form knowledge that can be used to inform and aid decision making.
Health Information Cycle: Data to Action
Health Information Cycle: Stages
Collect
Stage 1: Find the data.
• Where is it located? Paper charts (passport books)? Reporting forms? Electronic health records (EHRs)?
• This step is crucial: locate the sources of data required for quality and other reporting.

Stage 2: Capture the data.
• Some data will be available electronically, some can be acquired electronically, but some will require manual abstraction.
• Use of indicators
Process
Stage 3: Normalize the data.
• Normalization ensures the data is more than a number or a note: it becomes meaningful data that can form the basis for action.
• Ensures that information is used in the same way.
Line listing
• The accuracy and reliability that results from
normalization is of paramount importance.
• Normalization makes the information unambiguous.
How to ensure data quality?
During Data Collection -
• Make sure the data produced at the “source” is
accurate.
• Make sure the same standards and definitions are
used.
Before Data Processing -
• Make sure data is correctly transcribed and
transmitted into any data collection tools.
• Make sure data is correctly aggregated.
Tips in Data Processing & Compilation
• Removing mistakes and correcting for missing data
• Providing documented measures of degree of
confidence in data
• Capturing the flow of transactional data for safe
keeping
• Adjusting data from multiple sources to allow them
to be used together
• Structuring data to be usable by end-user tool
• Tracking all the above actions to tangibly support
data quality assessments
Data in a Line List
Confirmed Yellow fever cases, Country X, Dec. 2016 – Feb. 2017

Yellow
Acute IGM+
fever
Age Sex Date of Fever Jaundice Lab Test
ID # Village Vaccine
(years) (M/F) Onset
Y=Yes, N=No, U=Unknown
1 A 5 M 30 Dec 2016 Y N Y
2 B 11 F 09 Jan 2017 Y N Y
3 A 34 M 12 Jan 2017 Y N Y
4 C 73 M 12 Jan 2017 Y N Y
5 A 84 F 13 Jan 2017 Y N Y
6 B 16 M 16 Jan 2017 U N Y
7 B 19 F 30 Jan 2017 Y N Y
8 A 23 F 02 Feb 2017 Y N Y
9 C 38 F 08 Feb 2017 Y N Y
10 B 47 M 11 Feb 2017 Y N Y
11 A 27 F 17 Feb 2017 Y N Y
Process
Stage 4: Aggregate the data.
• This step helps to consolidate the data from individual patients to groups or pools of patients.
• Healthcare providers can analyze the overall impact and performance of programs.
• This step is crucial because data/variables are analyzed differently depending on the type of data
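As a minimal sketch of Stage 4, individual records can be pooled into group counts. The tuples below reuse only the ID, village, age and sex columns of the yellow fever line list; the variable names are illustrative assumptions and are not part of the lecture.

```python
from collections import Counter

# (ID, village, age in years, sex) copied from the yellow fever line list
line_list = [
    (1, "A", 5, "M"), (2, "B", 11, "F"), (3, "A", 34, "M"),
    (4, "C", 73, "M"), (5, "A", 84, "F"), (6, "B", 16, "M"),
    (7, "B", 19, "F"), (8, "A", 23, "F"), (9, "C", 38, "F"),
    (10, "B", 47, "M"), (11, "A", 27, "F"),
]

# Aggregate from individual patients to groups of patients
cases_by_village = Counter(village for _, village, _, _ in line_list)
cases_by_sex = Counter(sex for _, _, _, sex in line_list)

for village, count in sorted(cases_by_village.items()):
    print(f"Village {village}: {count} cases")
print(dict(cases_by_sex))
```

Village and sex are qualitative variables, so counts are the natural summary; age would instead be summarized with a measure of central location and spread.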
Quantitative Variables
Type of Data
• Measurements
• Numeric data
Summarize with
• Measures of central location and spread
Examples
• Age
• Height
• No. of children
• CD4+ T-cell counts
Measures
• Mode
• Median
• Mean
• Range
Central Location of the Age Distribution?
[Figure: histogram of number of cases by age (years), with two candidate central locations marked A and B]
Individual Records to Summarize
Dataset: incubation period (in days) of 19 patients with Ebola virus disease (EVD)
Pt. Days
KP 9
JB 8
SW 11
EB 9
NG 10
PK 7
BJ 9
JH 9
RF 6
AH 2
TN 11
RT 8
LW 14
EN 9
CL 8
RD 13
KJ 8
LC 10
TB 7
Mode

Definition
Value that occurs most frequently in a dataset
• Simple measure, but relatively unimportant
To identify the mode
1 Create frequency distribution table
2 Identify value that occurs most often (check if 1 value, more than 1, or none)
Identify Mode from Frequency Distribution
Sorted days (ID 1 to 19): 2, 6, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 11, 11, 13, 14

Days   Frequency
2      1
3      0
4      0
5      0
6      1
7      2
8      4
9      5   (Mode)
10     2
11     2
12     0
13     1
14     1
Total  19
Mode: Properties and Uses
• “What is the most common group?”
• Easiest measure of central location to understand,
to explain, and to identify
• May be more than one mode
• May be no mode
• Mode may not be “central”
• Not used much in epidemiology
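The two steps for identifying the mode can be sketched in a few lines. This is an illustrative example using the 19 incubation periods from the slides; the variable names are assumptions.

```python
from collections import Counter

# Incubation period (days) of the 19 EVD patients
days = [9, 8, 11, 9, 10, 7, 9, 9, 6, 2, 11, 8, 14, 9, 8, 13, 8, 10, 7]

# Step 1: create the frequency distribution table
frequency = Counter(days)
for value in range(min(days), max(days) + 1):
    print(value, frequency[value])

# Step 2: identify the value(s) that occur most often.
# More than one value can tie; if every value ties, this sketch lists them all,
# which is usually read as "no mode".
highest = max(frequency.values())
modes = sorted(v for v, count in frequency.items() if count == highest)
print("Mode:", modes)
```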
Median
Definition
Middle value, value that splits the distribution into
two equal parts
• 50% of observations are below the median
• 50% of observations are above the median

To identify the median
1 Arrange observations in order
2 Find middle position as (n + 1) / 2
3 Identify the value at the middle
Median: Example
Ebola Incubation Period (n=19)
Odd number of values (n = 19)

1 Sort
  Sorted days: 2, 6, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 11, 11, 13, 14
2 Find middle position
  (19 + 1) / 2 = 10
3 Median is value at 10th position = 9

9 observations below median, 9 observations above median
Median = 9
Median: Example
Ebola Incubation Period (n=20)
Added 20th patient (YY, 21 days), so now: even number of values (n = 20)

1 Sort
  Sorted days: 2, 6, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 11, 11, 13, 14, 21
2 Find middle position
  (20 + 1) / 2 = 10.5
3 Median is value midway between 10th and 11th position = (9 + 9) / 2 = 9

Median = 9
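The three steps for the median, including the even-n case, can be sketched as follows. The function name `median_by_position` is a hypothetical helper written for this illustration.

```python
def median_by_position(values):
    """Median using the three steps from the slides."""
    ordered = sorted(values)               # step 1: arrange in order
    n = len(ordered)
    position = (n + 1) / 2                 # step 2: middle position (1-based)
    if position.is_integer():              # odd n: a single middle value
        return ordered[int(position) - 1]
    lower = ordered[int(position) - 1]     # even n: value just below the middle
    upper = ordered[int(position)]         # and the value just above it
    return (lower + upper) / 2             # step 3: midway between the two


days_19 = [9, 8, 11, 9, 10, 7, 9, 9, 6, 2, 11, 8, 14, 9, 8, 13, 8, 10, 7]
days_20 = days_19 + [21]                   # 20th patient added

print(median_by_position(days_19))
print(median_by_position(days_20))
```

Both calls give a median of 9 days: the added value of 21 changes the middle position but not the value found there.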
Median: Properties and Uses
• Good descriptive measure for center of data
• Not affected by an extreme value (“outlier”)
• Measure of choice for asymmetrical (“skewed”)
distribution

[Figure: two histograms of number of cases by age group (years), one symmetrical distribution and one skewed distribution, each with an outlier marked]


Mean
Definition
• The average of a set of numerical values

• To calculate the mean
1 Sum up the values
2 Divide the sum by the number of observations (n)
Mean: Example
Ebola Incubation Period (n=19)
Mean = Sum of all values / n

Sorted days: 2, 6, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 11, 11, 13, 14

1 Sum up the values
  Sum = 168
2 Divide sum by number of observations (n)
  n = 19

Mean is 168 / 19 = 8.8 days
Mean: Example
Ebola Incubation Period (n=20)
Mean = Sum of all values / n

Sorted days: 2, 6, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 11, 11, 13, 14, 21

1 Sum up the values
  Sum = 189
2 Divide sum by number of observations (n)
  n = 20

Mean is 189 / 20 = 9.5 days
Mean: Properties and Uses
• Best known measure of central location
• Uses all the data
• Affected by extreme values (outliers)
• Best for symmetrically distributed data

[Figure: two histograms of number of cases by age group (years), one symmetrical distribution and one skewed distribution, each with an outlier marked]


Measures of Central Location
Which measure is “sensitive” to outliers?

Sorted days (n=19): 2, 6, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 11, 11, 13, 14
Sorted days (n=20): the same 19 values plus 21

Measure  Original data (n=19)  Updated data (n=20)
Mode     9                     9
Median   9                     9
Mean     8.8                   9.5
Sum      168                   189

Answer: Mean is sensitive to outliers
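The comparison can be checked with Python's standard `statistics` module. This is a sketch for illustration; only the data values come from the slides.

```python
import statistics

original = [9, 8, 11, 9, 10, 7, 9, 9, 6, 2, 11, 8, 14, 9, 8, 13, 8, 10, 7]
updated = original + [21]  # same data plus one extreme value

for label, data in (("n=19", original), ("n=20", updated)):
    print(
        label,
        "sum:", sum(data),
        "mode:", statistics.mode(data),
        "median:", statistics.median(data),
        "mean:", round(statistics.mean(data), 2),
    )
```

The unrounded means are about 8.84 and 9.45 days, which the slides report as 8.8 and 9.5. Only the mean moves when the value 21 is added; the mode and median stay at 9.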
Measure of Spread

[Figure: histogram of number of cases by age (years), with the spread of the distribution marked]
Range
Definition (Epidemiologic)
Description of smallest to largest value
• Measure of spread

To identify the range
1 Sort data or create frequency distribution
2 Find minimum and maximum values
Range: Example
Ebola Incubation Period (n=19)
Sorted days: 2, 6, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 11, 11, 13, 14

Minimum value = 2
Maximum value = 14
Range = 2 – 14
Summarizing Quantitative Data: Example
Ebola Incubation Period (n=19)
Ebola incubation period (days), by patient: KP 9, JB 8, SW 11, EB 9, NG 10, PK 7, BJ 9, JH 9, RF 6, AH 2, TN 11, RT 8, LW 14, EN 9, CL 8, RD 13, KJ 8, LC 10, TB 7

Mode = 9
Median = 9
Mean = 8.8
Range = 2 – 14

For quantitative epidemiologic data, recommend summary with median and range.

Summary of incubation period: Median (range) = 9 (2 – 14) days
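A short sketch of the recommended "median (range)" summary follows. The helper name `median_and_range` and its output format are assumptions made for this example.

```python
import statistics

def median_and_range(values, unit):
    """Format a quantitative variable as 'median (min - max) unit'."""
    median = statistics.median(values)
    return f"{median:g} ({min(values)} - {max(values)}) {unit}"


incubation_days = [9, 8, 11, 9, 10, 7, 9, 9, 6, 2, 11, 8, 14, 9, 8, 13, 8, 10, 7]
summary = median_and_range(incubation_days, "days")
print("Median (range) =", summary)
```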
Measures of Central Location:
Summary
• Measure of central location — single measure that
represents an entire distribution
• Mean — average value
• Mean uses all data; sensitive to outliers
• Mean preferred for symmetrical data; not common in
epidemiology
• Median — central value
• Safer choice for most epidemiologic data
• Mode — most common value
• Use median or mean with range
Exercise
• Review the data set with confirmed cases of acute Middle East respiratory syndrome coronavirus (MERS-CoV) infection
• Calculate the mode, median, mean and range for:
  • Age (years) of MERS-CoV cases
  • Number of days from disease onset to WHO notification
Reported MERS-CoV cases, 31 October – 8 December 2017, Kingdom of Saudi Arabia
Note: NC: not calculable

ID No. | Age (years) | Sex | City of residence | Date of symptoms onset (dd/mm/yy) | Exposure to camels | Exposure to MERS-CoV cases | Status | Date of outcome (dd/mm/yy) | Days to death | Date of notification to WHO (dd/mm/yy) | Days from onset to notification
1 | 49 | M | Unizah | 24-Oct-17 | Yes | Unknown | Deceased | 6-Nov-17 | 13 | 31-Oct-17 | 7
2 | 60 | M | Riyadh | 25-Oct-17 | Yes | Unknown | Alive | | | 1-Nov-17 | 7
3 | 42 | F | Riyadh | 25-Oct-17 | Unknown | Unknown | Alive | | | 2-Nov-17 | 8
4 | 65 | M | Riyadh | 25-Oct-17 | Unknown | Unknown | Alive | | | 5-Nov-17 | 11
5 | 64 | M | Riyadh | 29-Oct-17 | Unknown | Unknown | Alive | | | 5-Nov-17 | 7
6 | 49 | M | Riyadh | 1-Nov-17 | Unknown | Unknown | Alive | | | 6-Nov-17 | 5
7 | 51 | M | Afif | 9-Nov-17 | Yes | Unknown | Alive | | | 13-Nov-17 | 4
8 | 75 | F | Unizah | 9-Nov-17 | Unknown | Unknown | Deceased | 18-Nov-17 | 9 | 13-Nov-17 | 4
9 | 69 | M | Zulfi | 12-Nov-17 | Unknown | Unknown | Alive | | | 15-Nov-17 | 3
10 | 77 | F | Buridah | 9-Nov-17 | Unknown | Unknown | Deceased | 18-Nov-17 | 9 | 18-Nov-17 | 9
11 | 63 | M | Bisha | 15-Nov-17 | Yes | Unknown | Alive | | | 20-Nov-17 | 5
12 | 64 | F | Alasyah | 21-Nov-17 | Yes | Unknown | Deceased | 24-Nov-17 | 3 | 24-Nov-17 | 3
13 | 15 | M | Riyadh | 23-Nov-17 | Unknown | Unknown | Deceased | 3-Dec-17 | 10 | 28-Nov-17 | 5
14 | 13 | M | Riyadh | Unknown | Unknown | Yes | Alive | | | 28-Nov-17 | NC
15 | 67 | F | Bisha | 18-Nov-17 | Unknown | Unknown | Alive | | | 29-Nov-17 | 11
16 | 71 | M | Buridah | 25-Nov-17 | Unknown | Unknown | Alive | | | 29-Nov-17 | 4
17 | 64 | M | Riyadh | 30-Nov-17 | Unknown | Unknown | Alive | | | 4-Dec-17 | 4
18 | 90 | M | Riyadh | 27-Nov-17 | Unknown | Unknown | Alive | | | 8-Dec-17 | 11
Qualitative Variables
Type of Data
• Descriptions
• Non-numeric data
Summarize with
• Measures of frequency
Examples
• Ill? (yes/no)
• Sex
• District
Measures
• Counts
• Ratios
• Proportions
• Rates
Counts: Global Number of Deaths* by
Selected Causes, 2000 and 2015
2000 2015
All causes 52,135 56,441
Ischemic heart disease 6,883 8,756
Stroke 5,407 6,241
Lower respiratory infections 3,408 3,190
Chronic obstructive pulm. disease 2,953 3,170
Trachea, bronchus, lung cancers 1,255 1,695
Diabetes mellitus 958 1,586
Diarrheal disease 2,177 1,389
Tuberculosis 1,667 1,373
Road injury 1,118 1,342
Cirrhosis of the liver 905 1,162
Kidney disease 709 1,129
HIV/AIDS 1,463 1,060
* x 1,000
Source: WHO. Global Health Observatory. Top 10 causes of death. 2017
Counts: Properties and Uses

• Common descriptive measure


• Provides picture of burden of disease
• Essential for service delivery and planning
• First step in calculating rates
Common Form of Measures of
Frequency
• Ratios
• Proportions
• Rates = (x / y) x k

Where: x = numerator
y = denominator
k = constant (1, 100, 1000, etc.)
Ratio

Definition
Comparison of any two values
Ratio = (x / y) x k
Where: x = numerator
y = denominator
k = constant (1, 100, 1000, etc.)

Numerator and denominator can be related or


unrelated.
Ratio: Example 1

Calculate ratio of males to females in this dataset

Sex (M/F): M, F, M, M, F, M, F, F, F, M, F

Ratio = (x / y) x k
x (numerator) = 5
y (denominator) = 6
k (constant) = 1
Ratio = 5:6
Ratio: Example 2
A city of four million people has 400 clinics. Calculate the ratio of clinics per person.
x (numerator) = 400
y (denominator) = 4,000,000
k (constant) = 1
If constant = 1, ratio = (400 / 4,000,000) x 1 = 0.0001 clinics / person
What constant would you recommend?
If constant = 10,000, ratio = (400 / 4,000,000) x 10,000 = 1 clinic / 10,000 persons
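The effect of the constant k can be seen in a small sketch of Ratio = (x / y) x k. The function name `ratio` is an assumption; exact fractions are used so the results are not blurred by floating-point rounding.

```python
from fractions import Fraction

def ratio(x, y, k=1):
    """Ratio = (x / y) x k, kept as an exact fraction."""
    return Fraction(x, y) * k


clinics_per_person = ratio(400, 4_000_000)              # k = 1
clinics_per_10000 = ratio(400, 4_000_000, k=10_000)     # easier to read
males_to_females = ratio(5, 6)                          # Example 1

print(float(clinics_per_person), "clinics per person")
print(clinics_per_10000, "clinic per 10,000 persons")
print(f"{males_to_females.numerator}:{males_to_females.denominator} males to females")
```

Choosing k so that the result is a small whole number (here 1 clinic per 10,000 persons) makes the ratio easier to communicate than 0.0001 clinics per person.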
Ratio: Practice 1
Calculate the ratio of hospitalized to non-hospitalized patients

ID and Hosp? status:
1 Yes, 2 No, 3 Yes, 4 Yes, 5 No, 6 Yes, 7 Yes, 8 Yes,
9 Yes, 10 No, 11 No, 12 No, 13 Yes, 14 Yes, 15 Yes, 16 No,
17 No, 18 No, 19 No, 20 Yes, 21 Yes, 22 Yes, 23 No, 24 Yes

Number hospitalized = 14
Number not hospitalized = 10
Hosp:non-Hosp Ratio = 14:10 or 1.4:1
Proportion

Definition
Comparison of a part to the whole

• Useful for describing distribution of characteristics


within a population
• Proportion = x / y, where
• x is the number with a characteristic
• y is the total number
• Percent = proportion x 100%, e.g., (x / y) x 100%
Proportions as Percentages of Total:
Example
2000 2015
n* % n* %
All causes 52,135 100.0 56,441 100.0
Ischemic heart disease 6,883 13.2 8,756 15.5
Stroke 5,407 10.4 6,241 11.1
Lower respiratory infections 3,408 6.5 3,190 5.7
Chronic obstructive pulm. dis. 2,953 5.7 3,170 5.6
Trachea/bronchus/lung cancers 1,255 2.4 1,695 3.0
Diabetes mellitus 958 1.8 1,586 2.8
Diarrheal disease 2,177 4.2 1,389 2.5
Tuberculosis 1,667 3.2 1,373 2.4
Road injury 1,118 2.1 1,342 2.4
Cirrhosis of the liver 905 1.7 1,162 2.1
Kidney disease 709 1.4 1,129 2.0
HIV/AIDS 1,463 2.8 1,060 1.9
* x 1,000
Source: WHO. Global Health Observatory. Top 10 causes of death. 2017
Proportions: Practice 1
Calculate the proportion and percentage of cases who were hospitalized

ID and Hosp? status:
1 Yes, 2 No, 3 Yes, 4 Yes, 5 No, 6 Yes, 7 Yes, 8 Yes,
9 Yes, 10 No, 11 No, 12 No, 13 Yes, 14 Yes, 15 Yes, 16 No,
17 No, 18 No, 19 No, 20 Yes, 21 Yes, 22 Yes, 23 No, 24 Yes

Number hospitalized = 14
Total number of cases = 24
Proportion hospitalized = 14 / 24 or 0.583
Percentage hospitalized = 0.583 x 100% = 58.3%
Proportions: Practice 2
Among 10,000 adults enrolled in a blood pressure (BP) survey, 570 were diagnosed with hypertension (defined as diastolic BP measurement >95 mm Hg).
Q. What proportion of the survey enrollees had hypertension?
A. 570 persons with hypertension / 10,000 persons enrolled = 0.057 = 5.7%
Q. What proportion did not have hypertension?
A. 9,430 non-hypertensive persons / 10,000 persons enrolled = 0.943 = 94.3%
Shortcut when only two categories: 100% − 5.7% = 94.3%
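Both proportion practices follow the same part-over-whole calculation, sketched below. The helper name `percent` is an assumption for this example.

```python
def percent(part, whole):
    """Proportion (part / whole) expressed as a percentage, to one decimal."""
    return round(part / whole * 100, 1)


hospitalized_pct = percent(14, 24)          # Practice 1
hypertensive_pct = percent(570, 10_000)     # Practice 2
# With only two categories, the complement is 100% minus the first percentage
non_hypertensive_pct = round(100 - hypertensive_pct, 1)

print(hospitalized_pct, hypertensive_pct, non_hypertensive_pct)
```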
Health-related Rates

• Incidence
• Prevalence
• Attack rate
• Case-fatality rate
• Mortality rate
• Other rates
Incidence versus Prevalence
Numerator
Incidence — New cases

Prevalence — Current cases


Incidence Rate
Definition
Frequency of new cases of illness in a population over a specified period of time

Incidence rate = [Number of new cases during specified period / ((size of population) x (time))] x Constant

Incidence Rate: Example
Last year, 24 new Zika virus disease (ZVD) cases were reported in District A (population 300,000).
• Calculate ZVD incidence rate per 100,000.

Incidence rate = [Number of new cases during specified period / ((size of population) x (time))] x Constant
= [24 cases / (300,000 pop x 1 year)] x 100,000 = 8.0

Last year’s incidence rate of ZVD was 8.0 cases per 100,000 population per year
Incidence Rate: Practice
During the past 3 years, a total of 60 cases of Zika virus disease were reported to the surveillance system in District A (population 300,000).
• Calculate the AVERAGE ANNUAL incidence rate over the 3-year period

[60 cases / (300,000 pop x 3 years)] x 100,000 = 6.7

The average annual incidence rate of ZVD in District A was 6.7 cases per 100,000 population per year.
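The incidence rate formula, including the time term in the denominator, can be sketched as below. The function name and default arguments are assumptions chosen for this example.

```python
def incidence_rate(new_cases, population, years=1, constant=100_000):
    """New cases per `constant` population per year."""
    return new_cases * constant / (population * years)


last_year = incidence_rate(24, 300_000)                # one-year example
average_annual = incidence_rate(60, 300_000, years=3)  # three-year practice

print(round(last_year, 1), "cases per 100,000 population per year")
print(round(average_annual, 1), "cases per 100,000 population per year")
```

Forgetting the `years` term in the practice question would give 20.0 per 100,000, which is the three-year total and not the average annual rate.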
Attack Rate (“Risk”)
Definition
Frequency of new cases in a population over a specified period of time, usually during an outbreak

Attack rate = (Number of new cases during specified period / Size of population at start of that period) x Constant (such as 100% or 1,000)

Example: An outbreak of 16 cases of anthrax occurred in Village Q (population = 800) during May 2016.
(16 / 800) x 100 = 2.0%
Attack rate (risk) during anthrax outbreak was 2%.
Counts versus Attack Rates
Acute watery diarrhea cases by age and sex, Village X,
January, 20xx
Age (years) Male Female Total
<1 9 17 26
1 – 14 152 107 259
15 – 29 44 51 95
30 – 49 17 24 41
≥ 50 8 10 18
Total 230 209 439
Q1. Which age group had the most cases? 1-14 year olds
Q2. Which age group had the greatest risk of illness?
Need denominators (population size) to calculate risk
Attack Rates: Practice 1
Acute watery diarrhea cases by age and sex, Village X,
January, 20xx
Age (years) Males Females Total
Cases Pop. Cases Pop. Cases Pop.
<1 9 800 17 850 26 1,650
1 – 14 152 9,200 107 9,150 259 18,350
15 – 29 44 5,500 51 6,000 95 11,500
30 – 49 17 6,250 24 6,750 41 13,000
≥ 50 8 3,000 10 4,500 18 7,500
Total 230 24,750 209 27,250 439 52,000

Q3. Calculate attack rate (risk) for 1-14 year olds, per 1,000
population. 259 / 18,350 x 1,000 = 0.0141 x 1,000
= 14.1 cases /1,000 population
Attack Rates: Practice 2
Acute watery diarrhea cases by age and sex, Village X,
January, 20xx
Age (years) Males Females Total
Cases AR (per 1,000) Cases AR (per 1,000) Cases AR (per 1,000)
<1 9 11.3 17 20.0 26 15.8
1 – 14 152 16.5 107 11.7 259 14.1
15 – 29 44 8.0 51 8.5 95 8.3
30 – 49 17 2.7 24 3.6 41 3.2
≥ 50 8 2.7 10 2.2 18 2.4
Total 230 9.3 209 7.7 439 8.4

Q1. Which age group had the most cases? 1-14 year olds
Q2. Which age group had the greatest risk of illness?
< 1 year olds
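The difference between "most cases" and "greatest risk" can be reproduced from the case counts and population sizes in Practice 1. This is a sketch; the dictionary layout and names are assumptions.

```python
# Age group -> (total cases, total population), from the Practice 1 table
age_groups = {
    "<1": (26, 1_650),
    "1-14": (259, 18_350),
    "15-29": (95, 11_500),
    "30-49": (41, 13_000),
    ">=50": (18, 7_500),
}

# Attack rate per 1,000 population = cases / population x 1,000
attack_rates = {
    group: round(cases * 1_000 / pop, 1)
    for group, (cases, pop) in age_groups.items()
}

most_cases = max(age_groups, key=lambda g: age_groups[g][0])
highest_risk = max(attack_rates, key=attack_rates.get)

print(attack_rates)
print("Most cases:", most_cases)
print("Greatest risk:", highest_risk)
```

The 1-14 year olds have the most cases, but children under 1 year have the highest attack rate, which is why denominators are needed to compare risk.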
Prevalence
Definition — Prevalence of disease
Frequency of existing cases (new cases plus old cases
that are still active) of a disease in a population at a
point or over a period of time
Definition — Prevalence of an attribute
Frequency of persons with a particular attribute in a
population at a point or over a period of time
Formula
Numerator: number current cases or persons with
attribute
Denominator: size of population
Constant: usually 100 (%) or 1,000
Prevalence: Examples
• (Number of persons living with HIV in Province X in 2018) / (Province X population on 1 July 2018)
• (Number of children with anemia in District Y in 2018) / (District Y child population on 1 July 2018)
• (Number of adults who smoked cigarettes in Country Z in 2018) / (Country Z adult population on 1 July 2018)

[Figure: timeline from 1 July to 1 August; community population = 100 people]


Comparing Incidence and
Prevalence
Incidence Prevalence
• NEW cases or events • ALL cases at
over period of time point/period of time
• Useful for studying • Useful for measuring
factors that cause size of problem and
disease (“risk factors”) planning
Death (Mortality) Rate
Definition
Frequency of deaths in a defined population during a specified period of time

Death rate = (Number of deaths during specified period / Size of population) x Constant (usually 1,000)

Types
• Death rate – refers to entire population
• Disease-specific (Cause-specific) death rate
• Age-specific death rate
• Maternal mortality rate
• Many others
Death Rate: Practice

540,000 deaths occurred during 2017 in Country A (estimated 2017 population of 60,000,000)

Death rate = (Number of deaths during specified period / Size of population) x Constant (usually 1,000)
= (540,000 deaths / 60,000,000 population) x 1,000 = 9.0 deaths per 1,000 population

Epidemiologists use death rates to compare mortality between areas because the rates account for differences in population size
Case-Fatality Rate
Definition
Proportion of persons with a particular disease who die from that disease
• Describes the virulence or lethality of the disease
• Actually a proportion, not a rate
• Often reported as a percentage

Case-fatality rate = (Number of deaths due to a disease / Number of cases of that disease) x Constant (such as 100)
Case-Fatality Rate: Example
Confirmed Human Influenza A/H5N1 Cases,
Worldwide, 2003–2017
Years Cases Deaths CFR
2003–2009 468 282 60%
2010–2014 233 125 54%
2015–2017 159 47 30%
2003–2017 860 454 ____

Calculate the worldwide case-fatality rate for 2003-2017.
(454 / 860) x 100% = 53%

Source: WHO. Influenza Program. Cumulative number of confirmed human cases of avian influenza A(H5N1). March 2018.
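The case-fatality calculation for each period and for the whole of 2003–2017 can be sketched as follows. The data come from the table above; the names are illustrative assumptions.

```python
# Period -> (confirmed cases, deaths), from the H5N1 table
periods = {
    "2003-2009": (468, 282),
    "2010-2014": (233, 125),
    "2015-2017": (159, 47),
}

def case_fatality_pct(cases, deaths):
    """Deaths among cases, as a whole-number percentage."""
    return round(deaths / cases * 100)


cfr_by_period = {p: case_fatality_pct(c, d) for p, (c, d) in periods.items()}

total_cases = sum(cases for cases, _ in periods.values())
total_deaths = sum(deaths for _, deaths in periods.values())
overall_cfr = case_fatality_pct(total_cases, total_deaths)

print(cfr_by_period)
print(f"2003-2017: {total_deaths} / {total_cases} = {overall_cfr}%")
```

Note that the overall figure is computed from the pooled totals; it is not the simple average of the three period percentages.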
Measures of Frequency: Summary

• Counts: number of cases
• Ratio: comparison of any two numbers
• Proportion: part of the whole
• Rate: number of cases divided by population
  • Incidence rate: new cases, any time interval, need to take time into account
  • Attack rate: new cases, short time interval
  • Prevalence rate: current cases in population regardless of time of onset
  • Case-fatality rate: proportion of cases that died
Summary
• For qualitative variables, summarize with ratios,
proportions, and rates
• For quantitative variables, summarize with
mode, median, mean, and range
• For epidemiologic data, use median and range
• Key rates:
• Incidence: rate of new cases in population
• Prevalence: rate new + old cases in population
• Attack rate of disease: during outbreak
• Death rate: mortality accounting for population
size
• Case-fatality: deaths among cases
Exercise: Scenario

• Four years ago, 787 women aged 40–65 years who received
primary health care at a particular clinic were enrolled into a
blood pressure (BP) study. None had been previously
diagnosed with high blood pressure. Qualified clinicians
measured the BP of each woman, and hypertension was
defined as any person with one diastolic BP measurement of
>95 mm Hg. Each woman diagnosed with hypertension was
treated with antihypertensive drugs.
• Among the 787 women, 37 were diagnosed with hypertension
on Day 1 of the study. After exactly one year, an additional 43
women were diagnosed with new onset of hypertension. In the
subsequent 3 years, 54 additional women were diagnosed with
hypertension.
• Among the 787 enrollees, six died during the study period,
including five of those with hypertension.
Scenario……
• Question 1: What proportion of women in the cohort were
newly diagnosed with hypertension on Day 1?
• Question 2: What was the prevalence of hypertension among
this cohort of women at the end of the first year of this study?
• Question 3: What was the incidence of hypertension per year during the study period?
• Question 4: What was the annual death rate among all 787
women during the study period?
Present
Stage 5: Report the data.
• Reporting is integral to healthcare quality improvement.
• What do you want to communicate?
• Different information products for different data &
meanings
• Tools and methods for organizing data into
information:
• Graphs :Histogram, Line diagrams, Scatter plot,
Bar chart, Pie chart, population pyramids
• Tables : Frequency distribution
• Maps: Geographical presentation
Presenting data in graphs
[Figure: Monthly Clinical Diagnostics at MZUNI Health Center (Jan-April 2015); series: Malaria, Diarrhea, Pneumonia; x-axis: Jan, Feb, March, April]
GRAPHS
(a visual representation of data)

Advantages:
• Information is instantly conveyed
• Data presented clearly and simply
• Can expose relationships and patterns
• Detect trends over time
• Can be used to emphasise information

* Slide from UiO Course:


INF5761/INF9761
Graph Elements

[Figure: Graph 1: Clinic Alpha - PHC Headcount, 2001; y-axis: numbers (0 to 1200); x-axis: Jan to Jun; series: PHC Headcount]

• Title: descriptive; clinic name, what is graphed and the time period
• Y axis: must ALWAYS be labeled
• X axis: label if appropriate
• Key or legend: used if more than one element graphed
• Source and notes
• Scale: must be appropriate
Five rules for graphs
1. Never put too much information in the graph.
KEEP IT SIMPLE
2. Be careful about mixing different activities: stick
to one group of people, diseases or service
3. Label your graph: always have a clear heading,
easily read labels on the axes, and a legend
which explains each of the lines or bars
4. Select scales that fit the entire graph on both
axes
5. Where possible, draw a target line or reference
point to show where you are aiming at
Type of graphs
Continuous data
• histograms
• line graphs
• scatter graphs
Discrete Data
• bar graphs
• pie charts

Line graph
[Figure: line graph, PHC headcount under 5 years old, Manyara Clinic, 2001; y-axis: 0 to 400; x-axis: Jan to Jun]

• accurate, can show changes in the relationships between two variables
• displays trends over time
• useful if more than one data item is used
Bar graph versus Line graph
which one is best?

Line graph,
with two dependent variables

Line graph, for cumulative coverage
[Figure: Clinic Alpha : EPI : Cumulative Coverage of Children Fully Immunised 2000; y-axis: % (0 to 100); x-axis: Jan to Dec; series: Monthly Immunisation, Cumulative Immunisation; with a target line]

Month                    Jan  Feb  Mar   Apr   May   Jun   Jul  Aug  Sep   Oct   Nov   Dec
Monthly Immunisation     4    5.3  6.2   3.8   5.6   7.3   6.8  7    5.9   6.7   7.5   5.8
Cumulative Immunisation  4    9.3  15.5  19.3  24.9  32.2  39   46   51.9  58.6  66.1  71.9
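The cumulative series is a running total of the monthly series, which can be sketched as below. The values are the monthly coverage percentages shown with the Clinic Alpha graph; the variable names are assumptions.

```python
from itertools import accumulate

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
monthly = [4, 5.3, 6.2, 3.8, 5.6, 7.3, 6.8, 7, 5.9, 6.7, 7.5, 5.8]

# Each month's value is added to the total of the previous months;
# rounding to one decimal hides floating-point noise in the running sum.
cumulative = [round(total, 1) for total in accumulate(monthly)]

for month, value, total in zip(months, monthly, cumulative):
    print(f"{month}: monthly {value}%, cumulative {total}%")
```

Plotting `cumulative` against a target line then shows at a glance whether coverage is on track for the year.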


Line graph, for cumulative coverage
• Simple and effective monitoring tool
• Used when targets are set for a year, e.g. immunization, antenatal coverage, etc.
• Each month, data is graphed individually and also added to the previous month
• A target is set, a target line is drawn and progress is monitored with respect to the target line
Pie chart: good to show relative proportions
• Only for data that adds up to a total (100%)
Bar graph, simple
[Figure: simple bar graph, Clinic Alpha : Attendance 2001; y-axis: numbers (0 to 800); x-axis: Jan to Jun; series: PHC Headcount under 5 years, PHC Headcount 5 years and over]

• displays data over time or can compare 2 or more different facilities / districts / regions / years
Bar graph, stacked
[Figure: stacked bar graph, Clinic Alpha : Attendance 2001; y-axis: numbers (0 to 1200); x-axis: Jan to Jun; series: PHC Headcount under 5 years, PHC Headcount 5 years and over]

• displays the quantities, but also shows the relative proportions of the categories to each other and to the whole
• BUT hard to estimate the value of the variables at the top
Common faults with graphs
• No title
• No labels for the variables
• No units of measurement (or incorrect units!)
• No scale markings (or just too many!)
• Inappropriate scale choice: data points should be evenly represented
• Incorrect choice of independent (x-axis) and dependent (y-axis) variables
• No legends when needed
• Too high ink-to-data ratio (e.g. 3D graphs)
Don’t trust the computer!
BAD
GRAPHS!

…gone fishing…
TABLES
Tables

• Beware information overload:
  • easy to produce, difficult to use
• Ideally should contain:
  • Few rows
  • Few categories/columns
• Uses:
  • assess quality
  • trends over time
  • make comparisons
  • pick up outliers, gaps
A nice table
Number of children per family in Maputo, 2005

Number of Children Frequency %


0 7 6,7
1 10 9,6
2 15 14,4
3 25 24,0
4 21 20,2
5 10 9,6
6 6 5,8
7 5 4,8
8 2 1,9
9 3 2,9
Total 104 100,0
Source: Statistics & Planning Directorate, 2005
Action
Stage 6: Understand the data
• What was effective? What is the clinical point ? How
are these two points of view reconciled to get the
“right” results?
• Interpreting the information
• Take into account data quality bias
• When healthcare organizations and providers have data, they can understand issues better.
• A root cause analysis is an ideal way of solving deficiencies or other problems flagged by the data.
Action………..
Stage 7 : Use the data.
• The final stage in the data life cycle is certainly the
most important
• Data that is accurate and reliable is not all that
useful until it is actionable.
• How is the data being used to manage quality of
care and cost of care?
• Plan action and interventions
• Prioritize resources
• Set well-defined targets
• How is the action going to be evaluated?
Interpret information to find causes
[Figure: diagram of causes contributing to the problem “Low PHU Deliveries”]
Causes shown:
• Community norms
• Family trust in the TBAs
• No proper orientation
• Men not involved
• Bye laws not instituted
• Low community sensitization
• TBAs holding on clients
• Cultural beliefs
• Problem with sharing of fees
• Difficult terrain
• Patients refusal to go to PHU
• Staff not motivated
• Can’t afford travel
• Long distance
• Staff attitude
• High fees for deliveries
• Low educational level
• Irregular supervision
• Staff shortage
• Can’t afford fees
Data quality bias?
1st Dose VS Population <1yr
Take action: Underweight children
Public campaign:
”You must weigh your child
every month to make sure
s/he grows properly”

Targets

• state exactly what has to be achieved, by whom and by when
• a realistic point at which to aim to reach a goal
• turning organizational goals into operational numbers
Example Targets
Targets should be SMART

• Specific: capturing changes in situation concerned
• Measurable: able to be easily quantified
• Appropriate: fit to local needs, capacities and culture
• Realistic: can be reached with available resources
• Time bound: to be achieved by a certain time
