Bio Introduction
Objectives
• At the end of the sessions, students will be
able to:
– Define Statistics and Biostatistics
– Define variables and identify their categories
– Identify the four scales of measurement
– Identify data collection methods and
techniques
Introduction
• What is statistics?
• Statistics: A field of study concerned with:
– collection, organization, analysis,
summarization and interpretation of numerical
data, &
– the drawing of inferences about a body of data
when only a small part of the data is observed.
• The numbers must be presented in such a
way that valid interpretations are possible
Uses of biostatistics
• Provide methods of organizing information
• Assessment of health status
• Health program evaluation
• Resource allocation
• Magnitude of association
– Strong vs weak association between
exposure and outcome
• Assessing risk factors
– Cause & effect relationship
• Evaluation of a new vaccine or drug
– What can be concluded if the proportion of
people free from the disease is greater among
the vaccinated than the unvaccinated?
– How effective is the vaccine (drug)?
– Is the effect due to chance or some bias?
• Drawing of inferences
– Information from sample to population
What does biostatistics cover?
• Research planning
• Presentation
• Interpretation
• Publication
Types of Statistics
1. Descriptive statistics:
• Ways of organizing and summarizing data
• Helps to identify the general features and
trends in a set of data and extracting
useful information
• Also very important in conveying the final
results of a study
• Example: tables, graphs, numerical
summary measures
2. Inferential statistics:
• Methods used for drawing conclusions
about a population based on the
information obtained from a sample of
observations drawn from that population
• Example: Principles of probability,
estimation, confidence interval,
comparison of two or more means or
proportions, hypothesis testing, etc.
Population and Sample
• Population:
– Refers to any collection of objects/ persons
• Target population:
– A collection of items that have something in
common for which we wish to draw conclusions at
a particular time.
• E.g., All hospitals in Ethiopia
– The whole group of interest
Study (Sampled) Population:
• The subset of the target population that has at
least some chance of being sampled
• The specific population group from which
samples are drawn and data are collected
Sample:
• A subset of a study population, about
which information is actually obtained.
• The individuals who are actually measured
and comprise the actual data.
[Figure: Population and Sample, linked by Information]
• Role of statistics: using information
from a sample to make inferences about the
population
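E.g.: In a study of the prevalence
of HIV among adolescents in
Ethiopia, a random sample of
adolescents in Lideta Kifle
Ketema of Addis Ababa (AA) was included.
– Target population: all adolescents in Ethiopia
– Study population: adolescents in Lideta Kifle Ketema
– Sample: the adolescents actually included in the study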
Generalizability
• Is a two-stage procedure:
• We need to be able to generalize from:
– the sample to the study population, &
– then from the study population to the target
population
• If the sample is not representative of the
population, the conclusions are restricted to
the sample & don’t have general
applicability
• Collect information from a relatively
SMALL sample
• Draw conclusions about a rather
LARGE population
Parameter and Statistic
• Parameter: A descriptive measure
computed from the data of a population.
– E.g., the mean (µ) age of the target population
• Statistic: A descriptive measure computed
from the data of a sample.
– E.g., the sample mean age (X̅)
Cont’d…
• To each sample statistic there corresponds a
population parameter.
• We use X̅, S², S, p, etc. to estimate μ, σ², σ,
P (or π), etc., respectively.
Sample statistics are estimators of population parameters:
Sample statistic                      Population parameter
Sample mean, X̅                        μ
Sample variance, S²                   σ²
Sample proportion, p                  P or π
Sample SD, S                          σ
Sample odds ratio, OR̂                 OR
Sample relative risk, RR̂              RR
Sample correlation coefficient, r     ρ
02/08/24 19
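The distinction between a parameter and a statistic can be illustrated with a minimal Python sketch. The population of ages below is simulated and purely hypothetical; the names `population`, `sample`, `mu`, `x_bar`, `sigma` and `s` are chosen here for illustration and do not come from the slides.

```python
import random
import statistics

# Hypothetical population: ages of 10,000 people (simulated, illustrative only).
random.seed(1)
population = [random.randint(15, 49) for _ in range(10_000)]

# Parameters: computed from every member of the population.
mu = statistics.mean(population)
sigma = statistics.pstdev(population)   # population SD (divides by N)

# Statistics: computed from a simple random sample of 200 people.
sample = random.sample(population, 200)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)            # sample SD (divides by n - 1)

print(f"mu    = {mu:.2f}   x_bar = {x_bar:.2f}")
print(f"sigma = {sigma:.2f}   s     = {s:.2f}")
```

The sample values should be close to, but generally not equal to, the population values; a different random sample would give slightly different estimates.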
Variable
• Variable: A characteristic which takes
different values in different persons, places,
or things.
• Any aspect of an individual or object that is
measured (e.g., BP) or recorded (e.g., age,
sex) and can take different values.
• There may be one variable in a study or
many.
• E.g., A study of treatment outcome of TB
• Variables can be broadly classified
into:
– Categorical (or Qualitative) or
– Quantitative (or numerical variables).
• Categorical variable: A variable or
characteristic which cannot be measured in
quantitative form but can only be sorted by
name or category
[Figure: Types of variables: qualitative (or categorical) and quantitative (or measurement), each with its measurement scales]
Scales of measurement
• Not all measurements are the same.
• Measuring weight: e.g., 40 kg
• Measuring the status of a patient on a scale:
“improved”, “stable”, “not improved”.
• There are four types of scales of
measurement.
1. Nominal scale:
• The simplest type of data, in which the values
fall into unordered categories or classes
• Consists of “naming” observations or
classifying them into various mutually
exclusive and collectively exhaustive
categories
• Uses names, labels, or symbols to assign each
measurement to a category.
– Examples: Blood type, sex, race, marital status, etc.
Example of nominal scale:
Race/Ethnicity:
1. Black
2. White
3. Latino
4. Other
• The numbers have NO meaning
• They are labels only
• If nominal data can take on only two
possible values, they are called
dichotomous or binary.
• So sex is not just nominal, it is
dichotomous (male or female).
• Yes/no questions
– E.g., cured from TB at 6 months of Rx
2. Ordinal scale:
• Assigns each measurement to one of a
limited number of categories that are
ranked in terms of order.
• Although non-numerical, can be
considered to have a natural ordering
– Examples: Patient status, cancer stages,
social class, etc.
Example of ordinal scale:
Data collection methods
• Data collection techniques allow us to
systematically collect data about our
objects of study
– people, objects, and phenomena and
– about the setting in which they occur.
• Focus group discussions
• Other data collection techniques, such as
measuring height, length, weight, BMI,
MUAC, chest circumference, head
circumference, blood pressure, Hgb, Hct, etc.
1. Using available information (record
review)
• Interviews can be conducted with varying
degrees of flexibility. The two extremes,
high and low degree of flexibility, are
described below:
1. High degree of flexibility
2. Low degree of flexibility
1. High degree of flexibility
• An unstructured or loosely structured method
of asking questions can be used for
interviewing individuals as well as groups
of key informants.
2. Low degree of flexibility
• Example: Interviews using a
questionnaire with a fixed list of questions
in a standard sequence, which have
mainly fixed or pre-categorized answers.
4. Self-administered written
questionnaires
• A self-administered questionnaire is a
data collection tool in which written
questions are presented that are to be
answered by the respondents in written form.
TYPES OF QUESTIONS
• Before examining the steps in designing a
questionnaire, we need to review the types
of questions used in questionnaires.
• Depending on how questions are asked
and recorded, we can distinguish two major
possibilities: open-ended questions and
closed questions.
1. Open-ended questions
• Open-ended questions permit free
responses that should be recorded in the
respondent’s own words.
• The respondent is not given any possible
answers to choose from.
Open-ended questions
• Such questions are useful to obtain
information on:
– Facts with which the researcher is not very
familiar,
– Opinions, attitudes, and suggestions of
informants, or
– Sensitive issues.
For example:
• “Can you describe exactly what the
traditional birth attendant did when your
labour started?”
• “What do you think are the reasons for a
high drop-out rate of village health
committee members?”
Closed Questions
• Closed questions offer a list of possible
options or answers from which the
respondents must choose.
• When designing closed questions one
should try to:
– Offer a list of options that are exhaustive
and mutually exclusive
– Keep the number of options as few as
possible.
• Closed questions are useful if the range of
possible responses is known.
For example:
“What is your marital status?”
1. Single
2. Married/living together
3. Separated/divorced/widowed
“Have you ever gone to the local village
health worker for treatment?”
1. Yes
2. No
5. Focus Group Discussion
• Used to collect information from a group through
guided discussions of the study topic
• Eight to ten individuals with similar backgrounds are
brought together to discuss their problems
• One moderator and one timekeeper are needed
to facilitate and record the discussion
• The discussion stops when idea saturation is
reached.
Problems in gathering data
• Language barriers
• Lack of adequate time
• Expense
• Inadequately trained and experienced staff
• Invasion of privacy
Selecting data collection method
depends on…
12 19 27 36 42 59
15 22 31 39 43 61
17 23 31 41 44 65
18 26 34 41 54 67
• The actual summarization and organization
of data starts from frequency distribution.
Sturges’ rule:
$K = 1 + 3.322 \log_{10} n$
$W = \dfrac{L - S}{K}$
where
K = number of class intervals
n = number of observations
W = width of the class interval
L = the largest value
S = the smallest value
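The two formulas can be sketched in Python as follows. The function names `sturges_classes` and `class_width` are hypothetical helpers introduced here, and the inputs (100 observations ranging from 20 to 90) are made-up values for illustration only.

```python
import math

def sturges_classes(n):
    """Number of class intervals K = 1 + 3.322 * log10(n), before rounding."""
    return 1 + 3.322 * math.log10(n)

def class_width(largest, smallest, k):
    """Class width W = (L - S) / K."""
    return (largest - smallest) / k

# Illustrative (hypothetical) inputs: 100 observations ranging from 20 to 90.
k_raw = sturges_classes(100)     # 1 + 3.322 * 2
k = round(k_raw)                 # K is rounded to a whole number of classes
w = class_width(90, 20, k)       # W is usually rounded up to a convenient value
print(k_raw, k, w)
```

In practice both K and W are treated as guides: the results are rounded to convenient whole numbers rather than used exactly.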
Example:
– Leisure time (hours) per week for 40 college
students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20
22 14 13 10 19 27 29 22 38 28 34 32 23 19
21 31 16 28 19 18 12 27 15 21 25 16
K = 1 + 3.322 (log 40) = 6.32 ≈ 6
Maximum value = 38, Minimum value = 10
Width = (38-10)/6 = 4.67 ≈ 5
Time (hours)   Frequency   Relative frequency   Cumulative relative frequency
10-14          5           0.125                0.125
15-19          11          0.275                0.400
20-24          12          0.300                0.700
25-29          7           0.175                0.875
30-34          3           0.075                0.950
35-39          2           0.050                1.000
Total          40          1.000
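The frequency table for the leisure-time data can be reproduced with a short Python sketch. The variable names (`hours`, `classes`, `freq`, `rel_freq`, `cum_rel_freq`) are illustrative choices; the class limits assume a width of 5 starting at the minimum value 10, as in the example.

```python
from itertools import accumulate

# Leisure-time data (hours per week) for the 40 college students.
hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

# Class limits 10-14, 15-19, ..., 35-39 (width 5, starting at the minimum).
classes = [(low, low + 4) for low in range(10, 40, 5)]

# Count the observations falling in each class.
freq = [sum(low <= h <= high for h in hours) for low, high in classes]
# Relative frequency = class frequency / total number of observations.
rel_freq = [f / len(hours) for f in freq]
# Cumulative relative frequency = running total of the relative frequencies.
cum_rel_freq = list(accumulate(rel_freq))

for (low, high), f, rf, crf in zip(classes, freq, rel_freq, cum_rel_freq):
    print(f"{low}-{high}  {f:>3}  {rf:.3f}  {crf:.3f}")
```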
• Cumulative frequencies: obtained by adding the
frequencies of two or more successive classes.
3. They have greater memorizing value than
mere figures.
4. They facilitate comparison
5. Used to understand patterns and trends
• Well designed graphs can be powerful
means of communicating a great deal of
information
Graphs for quantitative data:
• Histogram
• Stem-and-leaf plot
• Box plot
• Scatter plot
• Line graph
• Others
1. Bar charts (or graphs)
• Categories are listed on the horizontal axis
(X-axis)
• Frequencies or relative frequencies are
represented on the Y-axis (ordinate)
• The height of each bar is proportional to
the frequency or relative frequency of
observations in that category
[Figure: Bar chart for the type of ICU for 25 patients]
Method of constructing bar chart
• All the bars must have equal width
• The bars are not joined together (leave
space between bars)
• The different bars should be separated
by equal distances
• All the bars should rest on the same line
called the base
• Label both axes clearly
2. Sub-divided bar chart
• If there are different quantities forming
the sub-divisions of the totals, simple
bars may be sub-divided in the ratio of
the various sub-divisions to exhibit the
relationship of the parts to the whole.
• The order in which the components are
shown in a “bar” is followed in all bars
used in the diagram.
– Example: Stacked and 100% Component
bar charts
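For a 100% component bar chart, each component is expressed as a percentage of its bar's total. The sketch below shows that conversion in Python. The counts are made-up numbers for illustration, not data from any study, and `to_percent` is a hypothetical helper name.

```python
# Hypothetical counts of cases by component and month (made-up numbers).
counts = {
    "August":  {"P. falciparum": 60, "P. vivax": 30, "Mixed": 10},
    "October": {"P. falciparum": 150, "P. vivax": 40, "Mixed": 10},
}

def to_percent(components):
    """Convert component counts to percentages of the bar total."""
    total = sum(components.values())
    return {name: 100 * n / total for name, n in components.items()}

# The components keep the same order in every bar.
percent = {month: to_percent(c) for month, c in counts.items()}
for month, parts in percent.items():
    print(month, parts)
```

Every bar then has the same height (100%), so the chart compares the composition of the totals rather than their sizes.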
Example: Plasmodium species distribution for
confirmed malaria cases, Zeway, 2003
[Figure: 100% component bar chart; y-axis: Percent (0 to 100); x-axis: August, October, December 2003; components: P. falciparum, P. vivax, Mixed]
3. Multiple bar graph
• Bar charts can be used to represent the
relationships among more than two
variables.
• The following figure shows the
relationship between children’s reports
of breathlessness and cigarette
smoking by themselves and their
parents.
[Figure: Prevalence of self-reported breathlessness among school children, 1998; y-axis: Breathlessness, per cent (0 to 35); x-axis: Parents smoking (Neither, One, Both)]
We can see from the graph quickly that the prevalence of the symptoms
increases both with the child’s smoking and with that of their parents.
There’s no reason why the bar chart can’t be
plotted horizontally instead of vertically.
[Figure: Horizontal bar chart; vertical axis: Type of source (CHA, HC, Reading, Training, Campaign, Anti FGMC, CAT); horizontal axis: Percent (0 to 50); bars grouped by sex (female, male)]
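[Figure: Pie chart. Segments: Others 8%; Digestive system 4%; Injury and poisoning 3%; Circulatory system; Respiratory system; Neoplasms 30%. The remaining values, 42% and 13%, belong to the circulatory and respiratory segments.]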
5. Histogram
• Histograms are frequency distributions with
continuous class intervals that have been
turned into graphs.
• To construct a histogram, we draw the interval
boundaries on a horizontal line and the
frequencies on a vertical line.
• Non-overlapping intervals that cover all of the
data values must be used.
• Bars are drawn over the intervals in such a
way that the areas of the bars are all
proportional in the same way to their
interval frequencies.
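When class widths are unequal, making the bar areas proportional to the frequencies means plotting frequency divided by width as the bar height. A minimal Python sketch follows, using the leisure-time frequencies from the earlier table with continuous class boundaries. Merging the last two classes is an assumption made here only to create one wider interval.

```python
# Continuous class boundaries; the classes 30-34 and 35-39 are merged
# (an assumption for illustration) to give one interval of width 10.
boundaries = [9.5, 14.5, 19.5, 24.5, 29.5, 39.5]
freq = [5, 11, 12, 7, 5]

# Width of each interval = upper boundary - lower boundary.
widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]
# Height = frequency / width, so that area (height * width) equals frequency.
heights = [f / w for f, w in zip(freq, widths)]
areas = [h * w for h, w in zip(heights, widths)]

for lo, hi, h in zip(boundaries, boundaries[1:], heights):
    print(f"{lo}-{hi}: height {h:.2f}")
```

With equal widths the heights are simply proportional to the frequencies, which is why the frequency itself is usually plotted in that case.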
[Figure: Histogram; y-axis: No. of women (0 to 40); x-axis: Age group (14.5-19.5, 19.5-24.5, 24.5-29.5, 29.5-34.5, 34.5-39.5, 39.5-44.5, 44.5-49.5)]
[Figure: Histogram for the ages of 2087 mothers with <5 children, Adami Tulu, 2003; y-axis ticks 200 to 700; x-axis variable: N1AGEMOTH]
Two problems with histograms
1. They are somewhat difficult to construct
2. The actual values within the respective
groups are lost and difficult to reconstruct
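The stem-and-leaf plot listed among the graphs for quantitative data avoids the second problem, because every observed value stays visible. The sketch below builds one in Python for the leisure-time data from the frequency-distribution example. `stem_and_leaf` is a hypothetical helper, and it assumes two-digit values with the tens digit as the stem.

```python
from collections import defaultdict

# Leisure-time data (hours per week) from the frequency-distribution example.
hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

def stem_and_leaf(values):
    """Group values by tens digit (stem); units digits are the sorted leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

plot = stem_and_leaf(hours)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each row shows the shape of the distribution, like a histogram bar turned on its side, while the leaves allow the original values to be read back.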