1 Introduction

Uploaded by

abdiwahab
BIOSTATISTICS

Course instructor: Abdiwahab Hasshi Ahmed

Biostat
10/20/24 1
Introduction

What is statistics?
• The scientific study of numerical data
based on variation in nature.
• A set of procedures and rules for
reducing large masses of data into
manageable proportions allowing us
to draw conclusions from those data.

Statistics…
• Statistics is the science of collecting,
summarizing, presenting, and
interpreting data, and of using them
to test hypotheses.
• Biostatistics is statistics applied
to biological and health
problems

Uses of Biostatistics
• Assessment of health status
• Resource allocation
• Vaccination uptake
• Magnitudes of a disease/condition
• Assessing risk factors
• Making diagnosis and choosing an appropriate
treatment

Types of Statistics
1. Descriptive statistics:
• Ways of organizing and summarizing data
• Methods for identifying the important features
of a set of data and extracting useful
information

Types of Statistics
2. Inferential statistics:
• Methods used for drawing conclusions
about a population based on the
information contained in a sample of
observations drawn from that population

Types of statistics
• Descriptive Statistics
– Collection,
– organization,
– summarization, and
– presentation of data.
• Inferential Statistics
– Generalizing from samples to populations using
probabilities.
– Performing hypothesis testing,
– Determining relationships between variables,
– Making predictions.
Why study statistics in health?
• Medicine and epidemiology are becoming
increasingly quantitative
• Knowledge of statistics is required to design,
conduct and analyse medical research
• It helps in better understanding of the medical
literature
Roles of statistics
• In clinical medicine
– Making clinical diagnosis
– Determining Rx and prognosis
– Handling variations (defining normal values and
normal ranges)
• In public health
– Community diagnosis
• In Research
– Designing and undertaking clinical & public health
research
Limitations of statistics
1. Statistics doesn’t deal with a single
(individual) value.
– It deals only with aggregate values
2. Statistics can’t deal with qualitative
characteristics
– Deals with data which can be quantified
3. Statistical conclusions are not universally
true
– Context specific
4. Statistical interpretations require high degree
of skill & understanding of the subject.
Data
• Data are numbers which can be
measurements or can be obtained by counting
• The raw material for statistics
• Can be obtained from:
– Routinely kept records
– Surveys
– Counting
– Experiments
– Reports
Types of Data
1. Primary data: data collected directly from the items or
individual respondents by the researcher for the
purpose of a particular study.

2. Secondary data: data that were collected and
statistically treated by other people or agencies, and
whose information is now used for another purpose.
Population and Sample
Target population:
• A collection of items that have
something in common for which we
wish to draw conclusions at a
particular time

• The whole group of interest

Population and Sample
Study (Sampled) Population:
• The subset of the target population that
has at least some chance of being
sampled
• The specific population from which data
are collected

Population and Sample
Sample:
• A subset of a study population, about
which information is actually
obtained.
• The individuals who are actually
measured and comprise the actual
data.
[Figure: nested diagram. The sample lies within the study population, which lies within the target population.]
Generalizability
• Generalization is a two-stage procedure: we need
to be able to generalize from the
sample to the study population and
then from the study population to
the target population
[Figure: collect information from a comparatively SMALL sample, then draw conclusions about a rather LARGE population.]
Parameter and Statistic
• Parameter: A descriptive measure computed
from the data of a population.

• Statistic: A descriptive measure computed


from the data of a sample.

Descriptive Statistics
• Techniques used to organize and
summarize a set of data.

• The best way to work with data is to
summarize and organize them.
• Numbers that have not been summarized
and organized are called raw data.
Descriptive statistics include:
• Tables
• Graphs
• Numerical summary measures
- Measures of central tendency
- Measures of variability
• Before summarization and organization, we
need to know the types of variables and
measurement scales of our data.
Variable
• Variable: A characteristic which takes different
values in different persons, places, or things.
• Any aspect of an individual or object that is
measured/observed and takes any value.
• Variables can be broadly classified into:
– Qualitative (or categorical) or
– Quantitative (or numerical variables).
• Qualitative variable: A variable or characteristic
that cannot be measured in quantitative form but can
only be sorted by name or category.
• Quantitative variable: A variable that can be
measured and expressed numerically.
Quantitative variables are divided into two types:
• Discrete variable: It can take only a finite number
of values in any given interval.
– Characterized by gaps or interruptions in the values.
• Continuous variable: It can take an infinite
number of possible values in any given interval.
– Does not possess gaps or interruptions.
Scales of measurement
• Not all measurements are the same.
• Measuring weight, e.g. 40 kg
• Measuring the status of a patient on a
scale, e.g. “improved”, “stable”, “not
improved”.
• There are four types of scales of
measurement.
1. Nominal scale:
• The simplest type of data, in which the
values fall into unordered categories or
classes
• Uses names, labels, or symbols to
assign each measurement.
Examples: Blood type, sex, race, marital
status
Example of nominal scale:
Race/Ethnicity:
1. Black
2. White
3. Latino
4. Other
• The numbers have NO meaning
• They are labels only
2. Ordinal scale:
• Assigns each measurement to one of a
limited number of categories that are
ranked in terms of order.
• Examples: Patient status, Cancer stages
Example of ordinal scale:
• Pain level:
1. None
2. Mild
3. Moderate
4. Severe
• The numbers have LIMITED meaning: 4>3>2>1 is all we know,
apart from their utility as labels
3. Interval scale:
- It has no true zero point. “0” is
arbitrarily chosen and doesn’t reflect
the absence of temperature.
- Example: Temperature in °F
4. Ratio scale:
- Measurement begins at a true zero
point and the scale has equal intervals.
Examples: Height, weight, blood
pressure
Methods of Data Organization and
Presentation
Frequency Distributions
• Ordered array: A simple arrangement of
individual observations in order of magnitude.
• The actual summarization and organization of
data starts from frequency distribution.
• Frequency distribution: A table which
involves a listing of all values of the studied
variable and how many times each value is
observed.
• Tables make it easier to see how the data are
distributed
a) Qualitative variable: Count the number of
cases in each category.
- Example 1: The ICU type of 25 patients
entering the intensive care unit at a given hospital:
1. Medical
2. Surgical
3. Cardiac
4. Other
ICU type    Frequency      Relative frequency
            (how often)    (proportionately often)
Medical     12             0.48
Surgical     6             0.24
Cardiac      5             0.20
Other        2             0.08
Total       25             1.00
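The tally in Example 1 can be reproduced with a few lines of code. This is a minimal Python sketch: the list `icu_types` is rebuilt from the counts in the table (the individual patient records are not given in the slides), and the function name `frequency_table` is only illustrative.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (frequency, relative_frequency)} for a list of labels."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, count / n) for category, count in counts.items()}

# Assumed raw data: reconstructed from the table's counts, not the original records.
icu_types = ["Medical"] * 12 + ["Surgical"] * 6 + ["Cardiac"] * 5 + ["Other"] * 2

table = frequency_table(icu_types)
for category, (freq, rel) in table.items():
    print(f"{category:<10}{freq:>4}{rel:>8.2f}")
```

The relative frequencies always add up to 1, which is a quick check on any hand-made table.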
Example 2:
A study was conducted to assess the
characteristics of a group of 234 smokers by
collecting data on gender and other variables.
Gender coding: 1 = male, 2 = female

Gender    Code    Frequency (n)    Relative frequency
Male      1       110              47.0%
Female    2       124              53.0%
Total             234              100%
b) Quantitative variable:
- Select a set of continuous, non-overlapping
intervals such that each value can be placed in
one, and only one, of the intervals.
To determine the number of class intervals and the
corresponding width, we may use:

Sturges' rule:
$$K = 1 + 3.322 \log_{10} n$$
$$W = \frac{L - S}{K}$$
where
K = number of class intervals      n = no. of observations
W = width of the class interval    L = the largest value
S = the smallest value
Example:
• Leisure time (hours) per week for 40 college
students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20 22
14 13 10 19 27 29 22 38 28 34 32 23 19 21 31
16 28 19 18 12 27 15 21 25 16
K = 1 + 3.322 (log 40) = 6.32 ≈ 6
Maximum value = 38, Minimum value = 10
Width = (38 - 10)/6 = 4.67 ≈ 5
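The same arithmetic can be checked in code. This is a small Python sketch under stated assumptions: the function names are illustrative, and rounding the width up (so that the classes cover the whole range) is one reasonable reading of "≈ 5", not a rule stated in the slides.

```python
import math

def sturges_classes(n):
    """Number of class intervals K = 1 + 3.322 * log10(n), unrounded."""
    return 1 + 3.322 * math.log10(n)

def class_width(largest, smallest, k):
    """Class width W = (L - S) / K, unrounded."""
    return (largest - smallest) / k

k_raw = sturges_classes(40)       # number of classes before rounding
k = round(k_raw)                  # rounded to a whole number of classes
w_raw = class_width(38, 10, k)    # width before rounding
w = math.ceil(w_raw)              # assumption: round up so no value is left out
print(f"K = {k_raw:.2f} -> {k}, W = {w_raw:.2f} -> {w}")
```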
Time       Frequency    Relative     Cumulative relative
(hours)                 frequency    frequency
10-14       5           0.125        0.125
15-19      11           0.275        0.400
20-24      12           0.300        0.700
25-29       7           0.175        0.875
30-34       3           0.075        0.950
35-39       2           0.050        1.000
Total      40           1.000
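The grouped table can be rebuilt directly from the 40 raw observations. The sketch below is illustrative Python: the function name `grouped_frequencies` and its parameters are assumptions, and it expects whole-number data with classes of equal width.

```python
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20, 22,
    14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19, 21, 31,
    16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

def grouped_frequencies(values, start, width, n_classes):
    """Count whole-number values in consecutive classes [lower, lower + width - 1].

    Each row is (lower, upper, frequency, relative freq., cumulative relative freq.).
    """
    n = len(values)
    rows = []
    cumulative = 0
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - 1
        freq = sum(lower <= v <= upper for v in values)
        cumulative += freq
        rows.append((lower, upper, freq, freq / n, cumulative / n))
    return rows

rows = grouped_frequencies(leisure_hours, start=10, width=5, n_classes=6)
for lower, upper, freq, rel, cum_rel in rows:
    print(f"{lower}-{upper}  {freq:>2}  {rel:.3f}  {cum_rel:.3f}")
```

Because the intervals do not overlap, every observation is counted exactly once and the last cumulative relative frequency is 1.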
• Cumulative frequency: The running total obtained
when the frequency of a class is added to the
frequencies of all the classes before it.
• Cumulative relative frequency: The proportion (or
percentage) of the total number of observations that
have a value either in that interval or below it.
• Mid-point: The value of the interval which
lies midway between the lower and the
upper limits of a class.
• True limits: The limits that make an
interval of a continuous variable continuous in
both directions
• Used for smoothing the class intervals
• Subtract 0.5 from the lower limit and add 0.5 to the
upper limit
Time (hours)    True limits    Mid-point    Frequency
10-14            9.5 - 14.5    12            5
15-19           14.5 - 19.5    17           11
20-24           19.5 - 24.5    22           12
25-29           24.5 - 29.5    27            7
30-34           29.5 - 34.5    32            3
35-39           34.5 - 39.5    37            2
Total                                       40
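True limits and mid-points follow mechanically from the class limits. The Python sketch below assumes data recorded to the nearest whole unit, so half a unit (0.5) is moved each way; the function names and the `unit` parameter are illustrative.

```python
def true_limits(lower, upper, unit=1):
    """Extend each class limit by half the measurement unit (0.5 for whole numbers)."""
    half = unit / 2
    return lower - half, upper + half

def midpoint(lower, upper):
    """Value midway between the lower and upper class limits."""
    return (lower + upper) / 2

classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
for lower, upper in classes:
    low_true, high_true = true_limits(lower, upper)
    print(f"{lower}-{upper}: true limits {low_true} - {high_true}, "
          f"mid-point {midpoint(lower, upper)}")
```

The upper true limit of one class equals the lower true limit of the next, which is what makes the intervals continuous.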
Tables can also be used to present more than
one variable.
Variable Frequency (n) Percent
Sex
Male
Female
Age (yrs)
15-19
20-24
25-29
Religion
Christian
Muslim
Occupation
Student
Farmer
Merchant
Guidelines for constructing tables
• Keep them simple,
• Limit the number of variables to three or less,
• All tables should be self-explanatory,
• Include clear title telling what, when and where,
• Clearly label the rows and columns,
• State clearly the unit of measurement used,
• Explain codes and abbreviations in a footnote,
• Show totals,
• If the data are not original, indicate the source in a
footnote.
Diagrammatic Representation

• Pictorial representations of numerical data


Importance of diagrammatic
representation:
1. Diagrams have greater attraction than
mere figures.
2. They give quick overall impression of the
data.
3. They are more memorable than
mere figures.
4. They facilitate comparison
5. Used to understand patterns and trends
• Well-designed graphs can be a powerful
means of communicating a great deal of
information.
• When graphs are poorly designed, they not
only fail to convey the message effectively,
but are often misleading.
Specific types of graphs include:
• Bar graph and pie chart: for nominal, ordinal and discrete data
• Histogram, box plot, scatter plot and line graph: for continuous data
• Others
1. Bar charts (or graphs)
• Categories are listed on the horizontal axis (X-axis)
• Frequencies or relative frequencies are
represented on the Y-axis (ordinate)
• The height of each bar is proportional to the
frequency or relative frequency of
observations in that category
Bar chart for the type of ICU for 25 patients
Method of constructing bar chart
• All the bars must have equal width
• The bars are not joined together
• The different bars should be separated by
equal distances
• All the bars should rest on the same line
called the base
Example: Construct a bar chart for the
following data.
Distribution of patients in hospital by source of referral
Source of referral        No. of patients    Relative freq. (%)
Other hospital                97                 5.1
General practitioner         769                40.3
Out-patient department       623                32.7
Casualty                     256                13.4
Other                        161                 8.5
Total                      1 906               100.0
Distribution of patients in hospital X by source of referral, 1999
[Figure: bar chart. X-axis: source of referral (Other hospital, GP, OPD, Casualty, Other); Y-axis: no. of patients (0 to 800).]
2. Sub-divided bar chart
• If there are different quantities forming the
sub-divisions of the totals, simple bars may
be sub-divided in the ratio of the various
sub-divisions to exhibit the relationship of
the parts to the whole.
• The order in which the components are
shown in a “bar” is followed in all bars used
in the diagram.
Example: Plasmodium species distribution for
confirmed malaria cases, Zeway, 2003
[Figure: sub-divided bar chart. X-axis: month (August, October, December 2003); Y-axis: percent (0 to 100); segments: P. falciparum, P. vivax, Mixed.]
3. Multiple bar graph
• Bar charts can be used to represent the
relationships among more than two
variables.
• The following figure shows the relationship
between children’s reports of breathlessness
and cigarette smoking by themselves and
their parents.
Prevalence of self-reported breathlessness among school
children, 1998
[Figure: multiple bar graph. X-axis: parents smoking (Neither, One, Both); Y-axis: breathlessness, per cent (0 to 35); bars: child never smoked, child smoked occasionally, child smoked one/week or more.]


We can see from the graph quickly that the prevalence of the
symptoms increases both with the child’s smoking and with
that of their parents.
There’s no reason why the bar chart can’t be
plotted horizontally instead of vertically.

[Figure: horizontal bar chart. Y-axis: type of source (CHA, HC, Reading, Training, Campaign, Anti FGMC, CAT); X-axis: percent (0 to 50); bars: male, female.]
Figure 1. Source of information on the complications of FGM and participation in
RH programs, Jijiga, 2004*.
* FGMC = female genital mutilation committee; CAT = community action team;
HC = health centre; CHA = community health agent
4. Pie chart
• Shows the relative frequency for each
category by dividing a circle into sectors,
the angles of which are proportional to the
relative frequency.

• Use percentage distributions

• Used for a single categorical variable


Steps to construct a pie-chart
• Construct a frequency table
• Change the frequencies into percentages (P)
• Change the percentages into degrees, where:
degrees = (P / 100) × 360°
• Draw a circle and divide it accordingly


Example: Distribution of deaths for females, in England
and Wales, 1989.

Cause of death               No. of deaths
Circulatory system (C)       100 000
Neoplasms (cancer) (N)        70 000
Respiratory system (R)        30 000
Injury and Poisoning (I)       6 000
Digestive system (D)          10 000
Others                        20 000
Total                        236 000
Distribution of cause of death for females, in England and Wales, 1989
[Figure: pie chart. Circulatory system 42%, Neoplasms 30%, Respiratory system 13%, Others 8%, Digestive system 4%, Injury and Poisoning 3%.]
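The pie-chart steps (frequency, then percentage, then degrees) can be sketched as follows. This Python snippet uses the death counts from the example; the function name `pie_sectors` is illustrative, and the angle is computed as the share of the total multiplied by 360.

```python
deaths = {
    "Circulatory system": 100_000,
    "Neoplasms": 70_000,
    "Respiratory system": 30_000,
    "Injury and poisoning": 6_000,
    "Digestive system": 10_000,
    "Others": 20_000,
}

def pie_sectors(frequencies):
    """Return {category: (percentage, angle_in_degrees)} for a frequency table."""
    total = sum(frequencies.values())
    return {
        name: (100 * count / total, 360 * count / total)
        for name, count in frequencies.items()
    }

sectors = pie_sectors(deaths)
for name, (percent, angle) in sectors.items():
    print(f"{name:<22}{percent:6.1f}%{angle:8.1f} degrees")
```

The angles add up to 360 degrees, just as the percentages add up to 100.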
5. Stem and Leaf Plot
• A quick way to organize data to give visual
impression similar to a histogram while
retaining much more detail on the data.

• Draw a vertical line and place the first digit of
each value, called the “stem”, on the left side of
the line.
• The numbers on the right side of the vertical
line represent the second digit of each
observation; they are the “leaves”.
Example

• 43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29, 36,
66, 72, 41

2 | 2 8 9
3 | 4 6
4 | 1 3 7 9
5 | 1
6 | 1 6
7 | 2 7
8 | 2
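A stem-and-leaf display like the one above can be generated from the raw values. The Python sketch below assumes two-digit, non-negative whole numbers, so the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each tens digit (stem) to the sorted list of units digits (leaves)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29, 36, 66, 72, 41]
plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Unlike a histogram, the display keeps every original value: stem 4 with leaves 1 3 7 9 reads back as 41, 43, 47 and 49.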
6. Histogram
• Histograms are frequency distributions with
continuous class intervals that have been turned into
graphs.
• To construct a histogram, we draw the interval
boundaries on a horizontal line and the frequencies on
a vertical line.
• Non-overlapping intervals that cover all of the data
values must be used.
• Bars are then drawn over the intervals in such a way
that the areas of the bars are all proportional in the
same way to their interval frequencies.
Example: Distribution of the age of women at the time of
marriage
Age group       15-19   20-24   25-29   30-34   35-39   40-44   45-49
No. of women    11      36      28      13       7       3       2
Age of women at the time of marriage
[Figure: histogram. X-axis: age group, true class limits 14.5-19.5 to 44.5-49.5; Y-axis: no. of women (0 to 40).]
Histogram for the ages of 2087 mothers with
<5 children, Adami Tulu, 2003
[Figure: histogram. X-axis: age of mother (variable N1AGEMOTH), 15.0 to 55.0; Y-axis: frequency (0 to 700). Mean = 27.6, Std. Dev = 6.13, N = 2087.]
7. Frequency polygon
• A frequency distribution can be portrayed
graphically in yet another way by means of
a frequency polygon.
• To draw a frequency polygon we connect
the mid-points of the tops of the bars of the
histogram with straight lines.
Frequency polygon for the ages of 2087 mothers with <5
children, Adami Tulu, 2003
[Figure: frequency polygon drawn over the histogram. X-axis: age of mother (variable N1AGEMOTH), 15.0 to 55.0; Y-axis: frequency (0 to 700). Mean = 27.6, Std. Dev = 6.13, N = 2087.]
It can be also drawn without erecting rectangles by joining the top
midpoints of the intervals representing the frequency of the classes as
follows:
Age of women at the time of marriage
[Figure: frequency polygon. X-axis: age, class mid-points 12 to 47; Y-axis: no. of women (0 to 40).]
8. Ogive Curve
• Sometimes it may be necessary to know the number of
items whose values are more or less than a certain
amount.
• We may, for example, be interested to know the no. of
patients whose weight is <50 kg or >60 kg.
• To get this information it is necessary to change the
form of the frequency distribution from a ‘simple’ to a
‘cumulative’ distribution.
• An ogive curve turns a cumulative frequency distribution
into a graph.
• Ogives are much more common than frequency polygons
Cumulative Frequency and Cum. Rel. Freq. of Age
of 25 ICU Patients

Age interval    Frequency    Relative         Cumulative    Cumulative
                             frequency (%)    frequency     rel. freq. (%)
10-19            3           12                3             12
20-29            1            4                4             16
30-39            3           12                7             28
40-49            0            0                7             28
50-59            6           24               13             52
60-69            1            4               14             56
70-79            9           36               23             92
80-89            2            8               25            100
Total           25          100
Cumulative frequency of 25 ICU patients
Example: Heart rate of patients admitted to
hospital Y, 1998
Heart rate     No. of patients    Cumulative frequency,      Cumulative frequency,
                                  less-than method (LM)      more-than method (MM)
54.5-59.5       1                  1                         54
59.5-64.5       5                  6                         53
64.5-69.5       3                  9                         48
69.5-74.5       5                 14                         45
74.5-79.5      11                 25                         40
79.5-84.5      16                 41                         29
84.5-89.5       5                 46                         13
89.5-94.5       5                 51                          8
94.5-99.5       2                 53                          3
99.5-104.5      1                 54                          1
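Both cumulative columns in the heart-rate table are running totals, taken from opposite ends of the distribution. The Python sketch below reproduces them with the standard-library `itertools.accumulate`; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies for heart rate, from 54.5-59.5 up to 99.5-104.5.
frequencies = [1, 5, 3, 5, 11, 16, 5, 5, 2, 1]

# Less-than method: running total starting from the lowest class.
less_than = list(accumulate(frequencies))

# More-than method: running total starting from the highest class.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for freq, lm, mm in zip(frequencies, less_than, more_than):
    print(f"{freq:>3}{lm:>5}{mm:>5}")
```

The less-than column ends at the total number of patients and the more-than column starts at it, which is why the two ogive curves cross.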
Heart rate of patients admitted in hospital Y, 1998
[Figure: ogive curves. X-axis: heart rate (54.5 to 104.5); Y-axis: cumulative frequency (0 to 60); curves: LM (less-than method) and MM (more-than method).]
9. Box and Whisker Plot
• It is another way to display information when
the objective is to illustrate certain locations in
the distribution.
• Can be used to display a set of discrete or
continuous observations using a single vertical
axis – only certain summaries of the data are
shown
• First the percentiles (or quartiles) of the data
set must be defined
• A box is drawn with the top of the box at the
third quartile and the bottom at the first
quartile.
• The location of the mid-point of the
distribution is indicated with a horizontal line
in the box.
• Finally, straight lines, or whiskers, are drawn
from the centre of the top of the box to the
largest observation and from the centre of the
bottom of the box to the smallest observation.
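• Position of a percentile = p(n+1), where p = the required
percentile expressed as a proportion and n = the number of observations
• Arrange the numbers in ascending order, then take the
observation at that position:
A. 1st quartile = 0.25(n+1)th observation
B. 2nd quartile = 0.5(n+1)th observation
C. 3rd quartile = 0.75(n+1)th observation
D. 20th percentile = 0.2(n+1)th observation
E. 15th percentile = 0.15(n+1)th observation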
Example: Percentage super saturation of bile for 31 men and 29
women
Men Women
Subject Age % Super saturation Subject Age % Super saturation
1 23 40 1 40 65
2 31 86 2 33 86
3 58 11 3 49 76
4 25 86 4 44 89
5 63 106 5 63 142
6 43 66 6 27 58
7 67 123 7 23 98
8 48 90 8 56 146
9 29 112 9 41 80
10 26 52 10 30 66
11 64 88 11 38 52
12 55 137 12 23 35
13 31 88 13 35 55
14 20 80 14 50 127
15 23 65 15 47 77
16 43 79 16 36 91
17 27 87 17 74 128
18 63 56 18 53 75
19 59 110 19 41 82
20 53 106 20 25 89
21 66 110 21 57 84
22 48 78 22 42 116
23 27 80 23 49 73
24 32 47 24 60 87
25 62 74 25 23 76
26 36 58 26 48 107
27 29 88 27 44 84
28 27 73 28 37 120
29 65 118 29 57 123
30 42 67
31 60 57
[Figure: box and whisker plots, one for men and one for women. Y-axis: 20 to 160.]
Box and whisker plots for percentage saturation of bile
• The percentage saturation of bile is a bit
more spread out among women with
range 35 to 146 but we see also that the
mid-points of the distributions are
almost the same and that most of the
spread in values in women occurs in the
upper half of the distribution.
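The quartiles that define the two boxes can be computed with the p(n+1) position rule. The Python sketch below uses the supersaturation values from the table; the function name `percentile` is illustrative, and resolving a fractional position by linear interpolation between the two neighbouring observations is an assumption, since the slides do not say how to handle that case.

```python
def percentile(values, p):
    """Value at position p * (n + 1) in the ordered data (positions start at 1).

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring observations.
    """
    ordered = sorted(values)
    n = len(ordered)
    position = min(max(p * (n + 1), 1), n)   # keep the position within 1..n
    lower_rank = int(position)               # whole part of the position
    fraction = position - lower_rank
    if fraction == 0 or lower_rank == n:
        return ordered[lower_rank - 1]
    below = ordered[lower_rank - 1]
    above = ordered[lower_rank]
    return below + fraction * (above - below)

men = [40, 86, 11, 86, 106, 66, 123, 90, 112, 52, 88, 137, 88, 80, 65, 79,
       87, 56, 110, 106, 110, 78, 80, 47, 74, 58, 88, 73, 118, 67, 57]
women = [65, 86, 76, 89, 142, 58, 98, 146, 80, 66, 52, 35, 55, 127, 77, 91,
         128, 75, 82, 89, 84, 116, 73, 87, 76, 107, 84, 120, 123]

for label, group in (("Men", men), ("Women", women)):
    q1, q2, q3 = (percentile(group, p) for p in (0.25, 0.5, 0.75))
    print(f"{label}: min={min(group)}, Q1={q1}, median={q2}, Q3={q3}, max={max(group)}")
```

These five values per group (smallest, first quartile, median, third quartile, largest) are exactly what the box and whiskers display.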
10. Scatter plot
• Most studies in medicine involve measuring more
than one characteristic, and graphs displaying the
relationship between two characteristics are
common in literature.
• When both the variables are qualitative then we
can use a multiple bar graph.
• When one of the characteristics is qualitative and
the other is quantitative, the data can be displayed
in box and whisker plots.
• For two quantitative variables we use
bivariate plots (also called scatter plots or
scatter diagrams).

• In the same study on percentage saturation
of bile, information was collected on the
age of each patient to see whether a
relationship existed between the two
measures.
• A scatter diagram is constructed by drawing X- and Y-axes.
• Each observation is represented by a point or dot (•).

Age and percentage saturation of bile for women patients in
hospital Z, 1998
[Figure: scatter plot. X-axis: age (0 to 80); Y-axis: saturation of bile (0 to 160).]
• The graph suggests the possibility of
a positive relationship between age
and percentage saturation of bile in
women.
11. Line graph
• Useful for assessing the trend of a particular situation
over time.
• Helps in monitoring the trend of epidemics.
• The time, in weeks, months or years, is marked along the
horizontal axis, and
• Values of the quantity being studied are marked on the
vertical axis.
• Values for each category are connected by a continuous
line.
• Sometimes two or more lines are drawn on the same
graph using the same scale so that they are
comparable.
No. of microscopically confirmed malaria cases by species
and month at Zeway malaria control unit, 2003
[Figure: line graph. X-axis: months (Jan to Dec); Y-axis: no. of confirmed malaria cases (0 to 2100); lines: Positive, P. falciparum, P. vivax.]
A line graph can also be used to depict the relationship between
two continuous variables, like a scatter diagram.

• The following graph shows the level of
zidovudine (AZT) in the blood of AIDS
patients at several times after administration of
the drug, for patients with normal fat absorption and
for patients with fat malabsorption.
Response to administration of zidovudine in two groups of AIDS
patients in hospital X, 1999
[Figure: line graph. X-axis: time since administration (min.), 10 to 360; Y-axis: blood zidovudine concentration (0 to 8); lines: fat malabsorption, normal fat absorption.]
[Figure: untitled graph. X-axis: Antepartum, Intrapartum, Postpartum; series: Pre-eclampsia, Eclampsia.]

Remember:
A graph is a tool.
It is not artwork to
hang above your sofa!
It is more important that it is
easy to correctly interpret
than it is that it is pretty!
THANK YOU …..
