BASIC BIOSTATISTICS
COURSE CODE (PUBH 612)
By
Eshetu Alemayehu (BSc, MPH)
Epidemiology Dep't, Faculty of Health Sciences,
Jimma University Hargeyssa Campus
October, 2018
Jimma, Ethiopia
Objectives of the chapter
After completing this chapter, we will be able to:
Define statistics and biostatistics
Define and identify the different types of
variables
List why we need to classify variables
Enumerate the importance and limitations of
statistics
• What is statistics?
Statistics
• The science of assembling and interpreting numerical
data (Bland, 2000)
• The discipline concerned with the treatment of numerical
data derived from groups of individuals (Armitage et al.,
2001).
Generally, the term statistics is used to mean either
statistical data or statistical methods.
Introduction
• Statistical data: refers to numerical descriptions
of things
– take the form of counts or measurements
– E.g. statistics of malaria cases include fever cases,
number of positives obtained, sex and age distribution
of positive cases, etc.
– All statistical data are numerical descriptions, but
not every numerical description is statistical data.
Why?
Characteristics of statistical data
I. They must be in aggregates. This means that
statistics are a number of facts; a single, isolated
figure is not statistics.
II. They must be affected to a marked extent by a
multiplicity of causes (meaning: statistics are
aggregates only of such facts as grow out of a
variety of circumstances).
Characteristics of statistical data….
III. They must be enumerated or estimated
according to a reasonable standard of
accuracy.
• If the basis happens to be incorrect, the
results (statistical investigations, predictions, etc.)
are bound to be misleading.
IV. They must have been collected in a
systematic manner for a predetermined
purpose.
Characteristics of statistical data…
V. They must be placed in relation to each other.
That is, they must be comparable.
• Numerical facts may be placed in relation to
each other either in point of time, space or
condition.
Introduction…
• Statistical methods: methods that are used for
collecting, organizing, analyzing and
interpreting numerical data for
– understanding a phenomenon or
– making wise decisions
• It is a branch of the scientific method and helps us
understand the object under study better.
Biostatistics
Biostatistics: The application of statistical methods to
the fields of biological and medical sciences.
When the data being analyzed are derived from the
biological sciences and medicine, we use the term
biostatistics to distinguish this particular application of
statistical tools and concepts.
Concerned with the interpretation of biological data and the
communication of information derived from these data
Has a central role in medical investigations
Types of Biostatistics
Types of Statistics
1. Descriptive statistics:
• Is the aspect concerned with the collection, organization,
presentation and summarization of data.
• Helps to identify the general features and trends
in a set of data and to extract useful information
• Also very important in conveying the final results
of a study
• Example: tables, graphs, numerical summary
measures
Descriptive statistics…
Some statistical summaries which are especially
common in descriptive analyses are:
– Measures of central tendency
– Measures of dispersion
– Measures of association
– Cross-tabulation /contingency table
– Histogram
– Quartiles, Q-Q plot
– Scatter plot
– Box plot
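As an illustration of the numerical summary measures listed above, the sketch below computes measures of central tendency and dispersion with Python's standard `statistics` module. The blood pressure readings are hypothetical, made up for this example only, and the quartiles use the module's default ("exclusive") method; other software may use a slightly different quartile rule.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg) for 10 patients.
# These numbers are invented for illustration only.
sbp = [118, 122, 130, 125, 140, 135, 128, 119, 132, 126]

# Measures of central tendency
mean_sbp = statistics.mean(sbp)
median_sbp = statistics.median(sbp)  # less affected by extreme values

# Measures of dispersion
sd_sbp = statistics.stdev(sbp)               # sample standard deviation
q1, q2, q3 = statistics.quantiles(sbp, n=4)  # quartiles (default 'exclusive' method)
iqr = q3 - q1                                # interquartile range
data_range = max(sbp) - min(sbp)             # range

print(f"Mean   = {mean_sbp:.1f} mmHg")
print(f"Median = {median_sbp:.1f} mmHg")
print(f"SD     = {sd_sbp:.1f} mmHg")
print(f"Q1, Q3 = {q1:.1f}, {q3:.1f} (IQR = {iqr:.1f}) mmHg")
print(f"Range  = {data_range} mmHg")
```

The same summaries are what a box plot draws: the median, the quartiles and the spread of the data.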
Types of Statistics
2. Inferential statistics:
• Consists of generalizing from samples to
population, performing hypothesis testing,
determining relation among variables, and
making prediction.
• Example: Principles of probability, estimation,
confidence interval, comparison of two or more
means or proportions, hypothesis testing,
regression, etc.
Inferential statistics
• Inferences are drawn from the properties of a
sample to the corresponding properties of the
population
• Inferential statistics builds upon descriptive
statistics
• Estimating population values from sample
values
[Figure: sample and population]
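Estimating a population value from sample values can be sketched as follows: a sample mean serves as the point estimate, and a 95% confidence interval expresses the uncertainty around it. This is a minimal sketch under stated assumptions: the "population" of ages is simulated (not real data), the helper `mean_confidence_interval` is a hypothetical name introduced here, and the interval uses the normal approximation (z of about 1.96). For small samples a t-based interval is usually preferred.

```python
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, level=0.95):
    """Hypothetical helper: point estimate and normal-approximation CI for a mean."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5        # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)       # about 1.96 for a 95% interval
    return x_bar, (x_bar - z * se, x_bar + z * se)


# Simulated population of 10,000 adult ages (illustrative only).
rng = random.Random(2018)
population = [rng.gauss(35, 10) for _ in range(10_000)]

# In practice only the sample is observed; the population mean is unknown.
sample = rng.sample(population, 100)
x_bar, (lower, upper) = mean_confidence_interval(sample)

print(f"Estimated mean age: {x_bar:.1f} years")
print(f"95% CI: {lower:.1f} to {upper:.1f} years")
```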
Inferential statistics…
• Enables us to make confident decisions in the
face of uncertainty
E.g. Antibiotics reduce the duration of viral throat
infections by 1-2 days.
Five percent of women aged 30-49 consult their
GP each year with heavy menstrual bleeding.
Inferential statistics…
NB: They encompass a variety of
procedures to ensure that the
inferences are sound and rational,
even though they may not always be
correct.
Uses of biostatistics
Statistics provides a way of organizing information
on a wider and more formal basis than relying on
personal experience
Handling variation:
I. Biological variation
– Among individuals as well as within the same individual
over time
»Example: height, weight, blood pressure, eye color
...
II. Sampling variation:
Biomedical research projects are usually carried out
on small numbers of study subjects, so results vary
from one sample to another
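Sampling variation can be made visible with a small simulation: draw many samples of the same size from one population, compute each sample's mean, and see how much those means spread out. This sketch assumes a simulated population of haemoglobin values (not real data), and `spread_of_sample_means` is a hypothetical helper name. It shows that means from small samples vary more than means from larger samples.

```python
import random
import statistics

rng = random.Random(1)

# Simulated population of 10,000 haemoglobin values (g/dL); illustrative only.
population = [rng.gauss(13.5, 1.5) for _ in range(10_000)]


def spread_of_sample_means(pop, n, repeats=1000):
    """Hypothetical helper: SD of the means of `repeats` random samples of size n."""
    means = [statistics.mean(rng.sample(pop, n)) for _ in range(repeats)]
    return statistics.stdev(means)


small = spread_of_sample_means(population, 10)
large = spread_of_sample_means(population, 100)

print(f"Spread of sample means, n = 10:  {small:.3f} g/dL")
print(f"Spread of sample means, n = 100: {large:.3f} g/dL")
```

With larger samples the estimates cluster more tightly around the population value, which is one reason study size matters.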
Uses of biostatistics…
Essential for scientific method of investigation
– Formulate hypothesis
– Design study to objectively test hypothesis
– Collect reliable and unbiased data
– Process and evaluate data rigorously
– Interpret and draw appropriate conclusions
Uses of biostatistics…
Essential for understanding, appraisal and
critique of scientific literature
Public health and medicine are becoming
increasingly quantitative.
Limitations of statistics
• It deals with only those subjects of inquiry that are
capable of being quantitatively measured and
numerically expressed
• It deals with aggregates of facts, and no importance
is attached to individual items; it is suited only when
group characteristics are to be studied
• Statistical data and results are only approximations
and are not mathematically exact
Variable
Variable: a characteristic or attribute under study
that can assume different values for different
elements (individuals or objects)
Some examples of variables include:
diastolic blood pressure,
heart rate, height,
weight and
stage of bladder cancer
Random variable: a variable whose value is
determined by chance
Types of variables
• Depending on the characteristic of the
measurement, a variable can be:
1) Qualitative (categorical) variable
A variable or characteristic which cannot be
measured in quantitative form
The categories should be clear cut (not
overlapping) and cover all the possibilities.
Types of variables…
• E.g. sex, place of birth, ethnic group, type of
drug, stage of breast cancer (I, II, III or
IV), degree of pain (low, moderate, severe or
unbearable), vital status (alive or dead), nutritional
status (normal, moderate or severe malnutrition)
2) Quantitative (numerical) variable:
measured and expressed numerically
They can be of two types:
a) Discrete
b) Continuous
Types of variables…
A) Discrete Variable
– The values of a discrete variable are usually
whole numbers, such as the number of episodes
of diarrhea in the first five years of life.
– Observations can only take certain numerical
values
– Numerical discrete data occur when the
observations are integers that correspond with a
count of some sort
Types of variables…
Some common examples are:
The number of bacterial colonies on a plate,
The number of cells within a prescribed area
upon microscopic examination,
The number of heart beats within a specified
time interval,
A mother's history of numbers of births
(parity) and pregnancies (gravidity),
The number of episodes of illness a patient
experiences during some time period, etc.
Types of variables…
B) Continuous Variable
A continuous variable is a measurement on a
continuous scale
Each observation theoretically falls somewhere along
a continuum
One is not restricted, in principle, to particular values
such as the integers of the discrete scale
Types of variables
Continuous variables are used to report a
measurement of the individual that can take on
any value within an acceptable range.
e.g. most clinical measurements, such as:
Blood pressure,
Serum cholesterol level,
Height, weight, age etc.
Types of variables
Variables
– Qualitative (categorical)
– Quantitative (numerical)
   – Discrete
   – Continuous
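The classification of variables above maps naturally onto a small data structure. This sketch is a teaching aid, not a standard API: `VariableType`, `EXAMPLES` and `is_quantitative` are hypothetical names, and the examples are taken from this chapter.

```python
from enum import Enum


class VariableType(Enum):
    """Hypothetical encoding of the variable classification in this chapter."""
    QUALITATIVE = "qualitative (categorical)"
    DISCRETE = "quantitative, discrete"
    CONTINUOUS = "quantitative, continuous"


# Example variables from this chapter and their types.
EXAMPLES = {
    "sex": VariableType.QUALITATIVE,
    "stage of breast cancer": VariableType.QUALITATIVE,
    "vital status": VariableType.QUALITATIVE,
    "number of diarrhoea episodes": VariableType.DISCRETE,
    "parity": VariableType.DISCRETE,
    "number of bacterial colonies": VariableType.DISCRETE,
    "blood pressure": VariableType.CONTINUOUS,
    "serum cholesterol": VariableType.CONTINUOUS,
    "height": VariableType.CONTINUOUS,
}


def is_quantitative(var_type):
    """Discrete and continuous variables are both quantitative (numerical)."""
    return var_type in (VariableType.DISCRETE, VariableType.CONTINUOUS)


for name, vtype in EXAMPLES.items():
    print(f"{name:30} {vtype.value}")
```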
Dependent and independent variables
DEPENDENT variable: The variable that is used
to describe or measure the problem under study.
– E.g. Nutritional status
INDEPENDENT variables:
The variables that are used to describe or
measure the factors that are assumed to cause
or at least to influence the problem.
– E.g. age, sex, income, educational level, duration of
ART
Definitions of terms
• Population: the larger group of units about which
inferences are to be made.
• Sample: The smaller group of units actually
measured.
• Unit: a single individual or object being measured.
• Parameter: A descriptive measure computed
from the data of a population.
– E.g., the mean (µ) age of the target population
• Statistic: A descriptive measure computed
from the data of a sample.
– E.g., the sample mean age (x̄)
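The difference between a parameter and a statistic can be shown on a tiny, fully known population. This is a sketch with invented ages for a hypothetical group of 10 people: the parameter μ uses every unit in the population, while the statistic x̄ uses only the units drawn into a simple random sample.

```python
import random
import statistics

# Hypothetical population: ages (years) of all 10 members of a small target group.
population_ages = [23, 35, 41, 29, 52, 38, 27, 45, 31, 39]

# Parameter: computed from every unit in the population.
mu = statistics.mean(population_ages)

# Statistic: computed from a simple random sample of 4 units.
rng = random.Random(31)
sample_ages = rng.sample(population_ages, 4)
x_bar = statistics.mean(sample_ages)

print("Population mean (parameter), mu =", mu)
print("Sample:", sample_ages)
print("Sample mean (statistic), x_bar =", x_bar)
```

A different random sample would usually give a different x̄, while μ stays fixed.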
Definitions of terms …
Data: the measurements or counts (observations,
i.e. values) recorded for a variable
The raw material for statistics
– Can be obtained from:
• Routinely kept records, literature
• Surveys
• Counting
• Experiments
• Reports
• Observation, etc.
Data set: a collection of observations on a
variable.
Scales of measurement
• Data come in various sizes and shapes,
and it is important to know about these so
that the proper analysis can be used on
the data.
• There are four scales (levels) of measurement:
– Nominal
– Ordinal
– Interval
– Ratio
Nominal scales of measurement
• It may be thought of as the "naming" level. This level
of measurement does not put subjects in any
particular order.
• There is no logical basis for saying one category
is higher or lower than another category.
• In research activities, a YES/NO scale is
nominal.
Nominal….
• The simplest data consist of unordered,
dichotomous, or "either-or" types of
observations, i.e., either the patient lives or the
patient dies; either he has some particular
attribute or he does not.
• Examples are: blood group, gender, religious
affiliation
Nominal….
• The nominal level of measurement classifies
data into mutually exclusive (non-overlapping),
exhaustive categories in which no order or
ranking can be imposed on the data
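Because nominal categories have no order, the natural summaries are counts, percentages and the mode. A minimal sketch with hypothetical ABO blood group data (invented for illustration), using `collections.Counter` and `statistics.mode` from the Python standard library:

```python
from collections import Counter
import statistics

# Hypothetical ABO blood groups for 12 patients (illustrative data only).
blood_groups = ["O", "A", "B", "O", "AB", "A", "O", "O", "B", "A", "O", "A"]

counts = Counter(blood_groups)        # frequency of each category
mode = statistics.mode(blood_groups)  # most frequent category
n = len(blood_groups)

# Categories are listed by frequency; their order carries no meaning.
for group, count in counts.most_common():
    print(f"{group:>2}: {count} ({100 * count / n:.1f}%)")
print("Mode:", mode)
```

Averaging or ranking the labels would be meaningless here, since "A", "B", "AB" and "O" are only names.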
Ordinal Scales of Measurement
• An ordinal scale is next up the list in terms of
power of measurement.
• The simplest ordinal scale is a ranking.
• At this level we put subjects in order from lowest to
highest.
• It is important to know that ranks do not tell us by
how much subjects differ.
• There is no objective distance between any two
points on the scale.
Ordinal Scales…
• Hence, an ordinal scale only lets you
interpret gross order and not the relative
positional distances.
• E.g. if we are told that professors have better
knowledge than lecturers, we do not know
by how much the professors are better
• To measure the amount of the difference
between subjects we need the next level
of measurement.
Ordinal Scales…
Some more examples:-
Academic status, job satisfaction index, employment
status, response to treatment (none, slow,
moderate, fast), and a five-point agreement scale:
1. strongly agree
2. agree
3. no opinion
4. disagree
5. strongly disagree
Ordinal Scales...
The ordinal level of measurement classifies
data into categories that can be ranked;
however, precise differences between the
ranks do not exist
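For ordinal data the numeric codes only encode order, so order-based summaries such as the median are appropriate, while differences between codes are not. This sketch uses the five-point agreement scale listed earlier with hypothetical coded responses; `statistics.median_low` is used so that the result is always an actual category rather than a value halfway between two categories.

```python
import statistics

# Codes follow the five-point scale above; the numbers only encode order.
LIKERT = {
    1: "strongly agree",
    2: "agree",
    3: "no opinion",
    4: "disagree",
    5: "strongly disagree",
}

# Hypothetical coded answers from 10 respondents (illustrative only).
responses = [2, 1, 3, 2, 4, 2, 5, 1, 2, 3]

# median_low returns an observed code, so the answer is a real category.
median_code = statistics.median_low(responses)
print("Median response:", LIKERT[median_code])

# Note: 4 - 2 == 2 does NOT mean "disagree" is two equal units away from "agree";
# ordinal codes carry order only, not distance.
```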
Interval Scales of Measurement
• It is more powerful than nominal and ordinal scales, as it
not only orders, ranks or rates but also shows
exact distances between ranks
• One unit on the scale represents the same
magnitude on the trait or characteristic being
measured across the whole range of the scale
Interval Scales…
They do not have a "true" zero point, and therefore
it is not possible to make statements about how
many times higher one score is than another
• A good example of an interval scale is the
Fahrenheit scale for temperature; IQ and SAT
scores are also commonly treated as interval.
Equal differences on this scale represent equal
differences in temperature, but the scale is not a
RATIO scale: a temperature of 30 degrees is not
twice as warm as one of 15 degrees.
Interval Scales…
The interval level of measurement ranks
data and precise differences between units
of measure do exist; however, there is no
meaningful zero
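The missing true zero can be checked numerically: on an interval scale, differences survive a change of units (up to a constant factor), but ratios do not. This sketch converts the 30 and 15 degree Fahrenheit example to Celsius using the standard conversion C = (F − 32) × 5/9; `fahrenheit_to_celsius` is a hypothetical helper name.

```python
def fahrenheit_to_celsius(f):
    """Standard conversion from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


a_f, b_f = 30.0, 15.0
a_c, b_c = fahrenheit_to_celsius(a_f), fahrenheit_to_celsius(b_f)

ratio_f = a_f / b_f  # looks like "twice as warm" in Fahrenheit...
ratio_c = a_c / b_c  # ...but the same two temperatures give a different ratio in Celsius

print(f"Fahrenheit: {a_f} / {b_f} = {ratio_f:.2f}")
print(f"Celsius:    {a_c:.2f} / {b_c:.2f} = {ratio_c:.2f}")

# Differences are preserved up to the unit factor 5/9:
# a 15 degree F difference is always 15 * 5/9 degrees C.
print(f"Difference: {a_f - b_f} F = {a_c - b_c:.2f} C")
```

Because the ratio changes when only the unit changes, statements like "twice as warm" are not meaningful on this scale.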
Ratio Scales of Measurement
• The highest level of measurement
• This has the properties of an interval scale
together with a fixed origin or zero point
• Permits comparison of both differences in scores
and the relative magnitude of scores
– Examples of variables which are ratio scaled
include weights, lengths and times
– The difference between 5 and 10 minutes is
the same as that between 10 and 15
minutes, and 10 minutes is twice as long as 5
minutes.
Ratio Scales…
The ratio level of measurement possesses
all the characteristics of interval
measurement, and there exists a true zero
In addition, true ratios exist between different
measurements on the scale
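By contrast with the temperature example, a ratio scale keeps its ratios under a change of units because zero means "none of the quantity" in every unit. This sketch uses the 10 and 5 minute example from this section; `minutes_to_seconds` is a hypothetical helper name.

```python
def minutes_to_seconds(minutes):
    """Convert minutes to seconds; zero minutes is zero seconds (a true zero)."""
    return minutes * 60


t_long, t_short = 10, 5

ratio_min = t_long / t_short
ratio_sec = minutes_to_seconds(t_long) / minutes_to_seconds(t_short)

print(f"In minutes: {t_long} / {t_short} = {ratio_min}")
print(f"In seconds: {minutes_to_seconds(t_long)} / {minutes_to_seconds(t_short)} = {ratio_sec}")

# Equal differences also hold: 10 - 5 equals 15 - 10.
print("Equal differences:", (10 - 5) == (15 - 10))
```

"Twice as long" stays true whichever unit is used, which is what distinguishes ratio from interval measurement.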
Summary table for the four scales of
measurement
Level     Scale      Characteristics
Lowest    Nominal    Naming
          Ordinal    Ordering
          Interval   Equal intervals without an absolute zero
Highest   Ratio      Equal intervals with an absolute zero
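The summary table is cumulative: each higher scale keeps every property of the scales below it. The sketch encodes that idea, together with the summaries conventionally considered meaningful at each level. The names `Scale`, `NEW_AT_LEVEL` and `permissible_summaries` are hypothetical, and the lists reflect common teaching guidance rather than a fixed rule.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Scales of measurement, ordered from lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Summaries that (by common convention) first become meaningful at each level.
NEW_AT_LEVEL = {
    Scale.NOMINAL: ["frequency counts", "mode"],
    Scale.ORDINAL: ["median", "percentiles"],
    Scale.INTERVAL: ["mean", "standard deviation", "differences"],
    Scale.RATIO: ["ratios (e.g. 'twice as long')", "coefficient of variation"],
}


def permissible_summaries(scale):
    """Each scale keeps every summary allowed at the lower scales."""
    allowed = []
    for level in Scale:  # iterates NOMINAL, ORDINAL, INTERVAL, RATIO in order
        if level <= scale:
            allowed.extend(NEW_AT_LEVEL[level])
    return allowed


for s in Scale:
    print(f"{s.name:8} -> {', '.join(permissible_summaries(s))}")
```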
Assignment 1
Categorize the following variables as nominal, ordinal, interval or ratio:
• Gender
• Ranking of tennis players
• Grade (A, B, C, D and F)
• Major field
• Rating scale (poor, good, excellent)
• Nationality
• Height
• Eye color
• Weight
• Political affiliation
• Time
• Religious affiliation
• Age
• IQ
• Temperature
• Salary