Introduction to Biostatistics
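Classification of study designs
• Did the researcher assign the exposures?
  Yes: experimental study
  No: observational study
• Experimental study: is allocation random?
  Yes: randomised controlled trial
  No: non-randomised controlled trial
• Observational study: is there a comparison group?
  Yes: analytical study
  No: descriptive study
• Analytical study: what is the direction of the study?
  Exposure followed forward to outcome: cohort study
  Outcome traced back to exposure: case-control study
  Exposure and outcome assessed at the same time: cross-sectional study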
Some definitions
• Clinical trial: experimental study in which the exposure status (e.g. assignment to active drug versus placebo) is determined by the investigator.
• Randomized controlled trial: a special type of clinical trial in which assignment to an exposure is determined purely by chance.
• Cohort study: observational study in which subjects with an exposure of interest (e.g. hypertension) and subjects without the exposure are identified and then followed forward in time to determine outcomes (e.g. stroke).
• Case-control study: observational study that first identifies a group of subjects with a certain disease and a control group without the disease, and then looks back in time (e.g. by chart review) for exposure to risk factors for the disease. This type of study is well suited to rare diseases.
• Cross-sectional study: observational study that examines the presence or absence of a disease, or the presence or absence of an exposure, at a particular point in time. Because exposure and outcome are ascertained at the same time, it is often unclear whether the exposure preceded the outcome.
• Case report or case series: descriptive study that reports on a single patient or a series of patients with a certain disease. This type of study usually generates a hypothesis but cannot test one, because it does not include an appropriate comparison group.
Bias
• Any systematic error in the design or conduct of a study that results in a mistaken estimate of an exposure's effect on the risk of disease.
Selection bias
• Bias introduced by the way in which participants are chosen for a study. For example, in a case-control study, selecting cases and controls by criteria other than the presence of disease (e.g. cases from a sick, hospitalized population and controls from young, healthy outpatients) can lead the investigator to a false conclusion about an exposure.
Confounding
• Confounding occurs when the apparent association between an exposure and a disease is distorted by another factor that is a known risk factor for the disease and is also associated with the exposure. If the investigator does not adjust for such factors, they may falsely conclude that the exposure is causally related to the disease.
Classification of statistics
Statistics is divided into two branches: descriptive statistics and inferential statistics.
Descriptive statistics
• Frequency distribution
• Measures of dispersion
• Measures of probability
Inferential statistics
• Analysis of variance
Measures of central tendency
• Arithmetic mean
• Median
• Mode
• Position of averages
• Geometric mean
• Harmonic mean
• Percentile
Arithmetic mean
• The arithmetic mean of a group is the simple arithmetic average of the observations.
• It is calculated by dividing the total of all observations by the number of observations.
• For grouped data, the arithmetic mean is calculated by assuming that each observation in a class interval is equal to the midpoint of that class interval.
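Both rules above can be sketched in a few lines of Python. This is a minimal illustration that assumes numeric observations. The function names and the class intervals in the usage lines are hypothetical, and grouped data is represented as (lower limit, upper limit, frequency) tuples.

```python
def arithmetic_mean(values):
    """Total of all observations divided by the number of observations."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


def grouped_mean(classes):
    """Mean of grouped data given as (lower, upper, frequency) tuples.

    Every observation in a class is assumed to equal the class midpoint.
    """
    total_freq = sum(f for _, _, f in classes)
    if total_freq == 0:
        raise ValueError("frequencies must not all be zero")
    weighted_sum = sum((lo + hi) / 2 * f for lo, hi, f in classes)
    return weighted_sum / total_freq


print(arithmetic_mean([4, 8, 6, 2]))  # 20 / 4 = 5.0
# Hypothetical classes 0-10, 10-20, 20-30 with frequencies 2, 5, 3
print(grouped_mean([(0, 10, 2), (10, 20, 5), (20, 30, 3)]))  # 160 / 10 = 16.0
```

The grouped result is only an approximation of the true mean, because the midpoint assumption discards where the observations actually fall within each class.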
Median
• The median is the value of the observation that occupies the middle position when all the observations are arranged in order of magnitude.
• When there is an even number of observations, the median is the arithmetic mean of the two middle observations.
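The odd/even rule can be sketched as follows. This is a minimal version assuming numeric data, and `median` is an illustrative name. Python's built-in `statistics.median` follows the same rule.

```python
def median(values):
    """Middle observation after sorting; mean of the two middle ones if n is even."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2     # even n: mean of the two middle values


print(median([7, 3, 9]))     # sorted 3, 7, 9 -> 7
print(median([7, 3, 9, 5]))  # sorted 3, 5, 7, 9 -> (5 + 7) / 2 = 6.0
```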
Mode
• The mode is the most frequently occurring value.
• The mode of a group is the value around which most of the observations are concentrated.
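A group can have more than one most-frequent value. The sketch below therefore returns every value that ties for the highest count. `modes` is a hypothetical helper; the standard library's `statistics.multimode` behaves similarly but keeps the order of first appearance instead of sorting.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, sorted."""
    if not values:
        raise ValueError("need at least one observation")
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 3, 3, 5, 3, 2]))  # 3 occurs three times -> [3]
print(modes([1, 1, 2, 2, 4]))     # tie between 1 and 2 -> [1, 2]
```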
Position of averages
• In a frequency distribution, the measures of central tendency (mean, median and mode) occupy definite positions relative to one another.
• These relative positions on a graph are called the position of averages.
Symmetric distribution
• In a frequency distribution, if the frequencies are equal on both sides of the position of averages, the distribution is said to be symmetrical. In a symmetrical, single-peaked distribution the mean, median and mode coincide.
Skewed distribution
• In a frequency distribution, if the two sides of the position of averages are unequal, the distribution is called asymmetrical or skewed.
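To see the position of averages numerically, the sketch below computes the three averages for two small made-up datasets using Python's `statistics` module. The data and the `averages` helper are illustrative only. In the right-skewed set, the long upper tail pulls the mean above the median, which in turn lies above the mode. This ordering is common for single-peaked skewed data but is not a strict rule.

```python
import statistics


def averages(values):
    """Return (mean, median, mode) for a list of observations."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.mode(values))


symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]      # illustrative data
right_skewed = [1, 2, 2, 2, 3, 3, 4, 6, 9]   # illustrative data, long upper tail

print(averages(symmetric))     # mean, median and mode are all 3
print(averages(right_skewed))  # mean about 3.56 > median 3 > mode 2
```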
Geometric mean
• The geometric mean is usually a more suitable measure of central tendency when the values change exponentially.
• If there are only 2 observations, the GM is the square root of their product; if there are 3 observations, it is the cube root of the product of the 3 observations.
• In general, for n observations x1, x2, ..., xn, the GM is the nth root of their product: GM = (x1 × x2 × ... × xn)^(1/n).
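A minimal sketch of the nth-root definition is shown below. It assumes all observations are positive. Instead of multiplying the values directly, it averages their logarithms and exponentiates the result. This gives the same value but avoids overflow for long products. The function name is illustrative; Python 3.8+ also provides `statistics.geometric_mean`.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive observations, via exp(mean(log x))."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive observations")
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(geometric_mean([2, 8]))        # square root of 16, approximately 4
print(geometric_mean([1, 10, 100]))  # cube root of 1000, approximately 10
```

Results can differ from the exact root in the last floating-point digit, so compare them with a tolerance.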
Harmonic mean
• The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the observations. It is used in situations where the reciprocals of the actual values are more useful for determining the central tendency.
• For example, it has been suggested that the sensitivity for detecting clusters of observations is increased by using the reciprocals of the distances rather than the distances themselves.
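The reciprocal-based definition translates directly into code. The sketch below assumes positive observations and uses an illustrative function name; Python's `statistics.harmonic_mean` computes the same quantity. For positive data, the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean.

```python
def harmonic_mean(values):
    """Number of observations divided by the sum of their reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs positive observations")
    return len(values) / sum(1 / v for v in values)


print(harmonic_mean([2, 6]))  # 2 / (1/2 + 1/6) = 2 / (2/3), approximately 3
```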
Percentile
• The value below which a given percentage of the observations fall is called a centile or percentile.
• The median is the 50th percentile (centile).
• Percentiles divide the distribution into hundredths, but sometimes it is more convenient to divide it into quartiles (quarters) or deciles (tenths).
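Several conventions exist for computing percentiles, and for small samples they give slightly different answers. The sketch below uses one common choice: linear interpolation between neighbouring ordered observations, at the fractional rank (n − 1) × p / 100. This matches NumPy's default `linear` method. The function name and data are illustrative.

```python
def percentile(values, p):
    """p-th percentile (0 <= p <= 100) by linear interpolation between order statistics."""
    if not values:
        raise ValueError("need at least one observation")
    if not 0 <= p <= 100:
        raise ValueError("p must lie between 0 and 100")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100        # fractional 0-based rank
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


data = [12, 15, 11, 20, 18]   # sorted: 11, 12, 15, 18, 20
print(percentile(data, 50))   # 15.0, the median
print(percentile(data, 25))   # 12.0, the lower quartile
```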
MEASURES OF DISPERSION
• The fact that we need an average, or measure of central tendency, shows that there is variation among the observations.
• Variation is another characteristic of a group of observations, and it must be considered to describe the group more satisfactorily.
• A single figure describing the central tendency of a group gives no idea of the variability of the observations.
Measures of dispersion
• Range
• Interquartile range
• Mean deviation
• Standard deviation
• Coefficient of variation
Range
• The range of a group of observations is the interval between the smallest and the largest observation.
• The value of the range depends only on the two extreme observations in the group and ignores all the other observations.
• The occurrence of rare observations in the group greatly influences the value of the range, so it is not considered an ideal measure of dispersion.
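The sketch below shows how sensitive the range is to a single rare observation. Adding one hypothetical extreme value to otherwise tightly clustered data multiplies the range tenfold. The function returns the width of the interval (largest minus smallest); the name `value_range` is illustrative and avoids shadowing Python's built-in `range`.

```python
def value_range(values):
    """Width of the interval between the smallest and the largest observation."""
    if not values:
        raise ValueError("need at least one observation")
    return max(values) - min(values)


typical = [10, 12, 11, 13, 12]     # illustrative data
with_rare_value = typical + [40]   # one hypothetical extreme observation

print(value_range(typical))          # 13 - 10 = 3
print(value_range(with_rare_value))  # 40 - 10 = 30
```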
Interquartile range
• The interquartile range is the interval between the upper quartile and the lower quartile.
• The upper quartile (75th percentile) is the value above which 25% of the observations fall, and the lower quartile (25th percentile) is the value below which 25% of the observations fall.
• This measure therefore gives the range covering the middle 50% of the observations in the group.
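The quartiles can be obtained from Python's `statistics.quantiles`. In the sketch below, `method="inclusive"` is chosen so that the quartiles are interpolated the same way as the linear percentile convention; other methods can give slightly different quartiles for small samples. The helper name and data are illustrative. Appending an extreme value barely changes the IQR, whereas the range would jump from 9 to 89.

```python
import statistics


def interquartile_range(values):
    """Upper quartile minus lower quartile: the spread of the middle 50%."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


data = [11, 12, 15, 18, 20]               # illustrative data
print(interquartile_range(data))          # Q3 18 - Q1 12 = 6.0
print(interquartile_range(data + [100]))  # 19.5 - 12.75 = 6.75
```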
Mean deviation
• The mean deviation is the arithmetic mean of the deviations of the observations from the arithmetic mean, ignoring the sign of these deviations.
• The mean deviation is based on all the observations in the group.
• It is easy to calculate, but it is not widely used because better methods are available.
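Ignoring the sign amounts to taking absolute values. The minimal sketch below follows the definition above and measures deviations about the arithmetic mean (some texts measure them about the median instead). The function name is illustrative.

```python
def mean_deviation(values):
    """Average absolute deviation of the observations from their arithmetic mean."""
    if not values:
        raise ValueError("need at least one observation")
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


print(mean_deviation([2, 4, 6, 8]))  # mean 5, deviations 3, 1, 1, 3 -> 8 / 4 = 2.0
```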
Standard deviation
• The standard deviation is the square root of the average of the squared deviations of the observations from the arithmetic mean.
• In calculating the mean deviation, the deviation from the mean is taken without its sign; in calculating the standard deviation, it is squared.
• The standard deviation is the most important measure of dispersion.
• For some frequency distributions there is an approximate relationship between the range and the standard deviation.
• For a normal distribution, the arithmetic mean and the standard deviation together describe the distribution completely.
• The standard deviation of a population is usually denoted by σ and that of a sample by s.
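The sketch below implements the square-root-of-mean-squared-deviation definition. It also reflects the σ versus s distinction. When the divisor is n, the result matches the population formula. When the divisor is n − 1, it gives the usual sample estimate s. The n − 1 divisor is standard practice for samples, although the slide text does not spell it out. Squaring either result gives the corresponding variance. The function name and data are illustrative; `statistics.pstdev` and `statistics.stdev` compute the same two quantities.

```python
import math


def standard_deviation(values, sample=True):
    """Square root of the mean squared deviation from the arithmetic mean.

    sample=True divides the sum of squares by n - 1 (sample estimate s);
    sample=False divides by n (population formula for sigma).
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    if sample and n < 2:
        raise ValueError("a sample SD needs at least two observations")
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_sq / (n - 1 if sample else n))


data = [2, 4, 4, 4, 5, 5, 7, 9]                # mean 5, sum of squared deviations 32
print(standard_deviation(data, sample=False))  # sqrt(32 / 8) = 2.0
print(standard_deviation(data))                # sqrt(32 / 7), about 2.14
```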
Variance
• The square of the standard deviation is called the variance, and it can also be used as a measure of dispersion.
MEASURES OF PROBABILITY
Probability
• One function of statistical methods is to provide techniques for making inductive inferences and for measuring the degree of uncertainty of such inferences.
• In games of chance, for example, the outcome of a particular trial is uncertain, but the long-term outcome is predictable.
• This long-term regularity provides a measure of chance, which is called probability.
The probability scale
• Chance is measured on a probability scale that runs from 0 to 1. A probability of 1 marks absolute certainty, and a probability of 0 marks absolute impossibility.
• Values between 0 and 1 indicate the relative chance that the outcome will occur.
Types of probability
A. A priori or classical probability: when the possible outcomes of a trial are limited in number, known and equally likely, the probability calculated from these known outcomes is the classical probability.
B. A posteriori or frequency probability: when the outcomes cannot be enumerated in this way, the probability can be estimated only from the relative frequency observed in previous observations or experiments.
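A fair die illustrates the two types side by side. The classical probability of a six follows from six equally likely faces. The frequency probability is estimated from repeated trials; here the trials are simulated rolls, with a fixed seed so the run is reproducible. The function name, trial count and seed are arbitrary choices for illustration.

```python
import random
from fractions import Fraction

# Classical (a priori): 6 equally likely faces, one of which is a six.
classical_p_six = Fraction(1, 6)


def relative_frequency(trials, seed=1):
    """A posteriori estimate: proportion of simulated die rolls that show a six."""
    rng = random.Random(seed)
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials


print(classical_p_six)             # 1/6
print(relative_frequency(60_000))  # close to 1/6, about 0.167
```

As the number of trials grows, the relative frequency settles closer to the classical value. This is the long-term regularity described above.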
Laws of probability
1. Addition law: if an event can occur in any one of several mutually exclusive ways, the probability of that event is the sum of the probabilities of the individual ways in which it can occur.
2. Multiplication law: the probability of the simultaneous occurrence of 2 or more independent events is the product of their individual probabilities.
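Both laws can be applied with exact fractions, as in the sketch below. The helper names are illustrative, and the functions do not check their preconditions. The caller must ensure that the ways are mutually exclusive (addition law) or that the events are independent (multiplication law). For events that are not mutually exclusive, the general addition rule subtracts the probability that both occur.

```python
from fractions import Fraction


def p_any_of(*probabilities):
    """Addition law: probability of one of several mutually exclusive ways."""
    return sum(probabilities, Fraction(0))


def p_all_of(*probabilities):
    """Multiplication law: probability that independent events all occur."""
    result = Fraction(1)
    for p in probabilities:
        result *= p
    return result


one_sixth = Fraction(1, 6)
print(p_any_of(one_sixth, one_sixth))            # die shows 1 or 2 -> 1/3
print(p_all_of(Fraction(1, 2), Fraction(1, 2)))  # two fair coins both heads -> 1/4
```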