Bio Statistics
Data
• Categorical or Continuous
Continuous          Categorical
Height in cm        Tall/short
B.P in mm of Hg     Hypo/normo/hypertensives
Measurement scale
1. Nominal or Classificatory Scale
• Gender, ethnic background
2. Ordinal or Ranking Scale
• Hardness of rocks, beauty, military ranks
3. Interval Scale
• Celsius or Fahrenheit
4. Ratio Scale
• speed, height, mass or weight
1. Which is a qualitative variable
• a) BMI
• b) S. bilirubin
• c) Name of residing place
• d) Blood urea
2. Which is a quantitative variable
• a) Causes of deaths
• b) Religious distribution
• c) Age group distribution
• d) Age distribution
4. Which is an ordinal variable
• a) Blood pressure
• b) Name of residing place
• c) Grading of carcinoma
• d) Temperature
5. Which is not a nominal scale variable
• a) Causes of death
• b) Religion
• c) Diagnosis
• d) Visual analogue scale
Measures of central Tendency
Value (X)      2   2   3   3   3   4   5   5   6   7
Mean           4   4   4   4   4   4   4   4   4   4
X − Mean      −2  −2  −1  −1  −1   0   1   1   2   3
(X − Mean)²    4   4   1   1   1   0   1   1   4   9
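The deviation table above can be checked with a short Python sketch. It recomputes the mean and the squared deviations from the ten values and then derives a standard deviation. The slide does not say which divisor it uses, so dividing by n − 1 (the sample standard deviation) is an assumption here.

```python
import math

# The ten values from the table above.
values = [2, 2, 3, 3, 3, 4, 5, 5, 6, 7]

n = len(values)
mean = sum(values) / n

# (X - Mean)^2 for every value, as in the last row of the table.
squared_deviations = [(x - mean) ** 2 for x in values]
sum_of_squares = sum(squared_deviations)

# Assumption: sample variance, i.e. divide by n - 1 rather than n.
variance = sum_of_squares / (n - 1)
sd = math.sqrt(variance)

print(f"mean = {mean}, sum of squares = {sum_of_squares}, SD = {sd:.2f}")
```

Dividing by n instead of n − 1 would give the population standard deviation, which is slightly smaller.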
[Figure: values 17, 19, 16, 18 and 15 shown around 17 kg]
Standard Error of Mean
• The mean of any group of patients is often
taken to be representative of the whole
population of which it is a sample. The
population mean is expected to lie within a range
around the group mean, and the SE of the mean measures that range
• RR 12.26 to 57.74
• Obs value included
• CI 32.73 to 37.27
Example
In a study to find the prevalence of syndrome X in cardiology patients,
200 patients were selected and syndrome X was present in 62. Find the
range in which the prevalence of syndrome X in cardiology patients will
lie.
Population proportion = Sample proportion ± 1.96 × SE (proportion)
SE (proportion) = √(pq/n), where p = sample proportion in % and q = 100 − p
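A minimal sketch of this calculation for the syndrome X example (62 of 200 patients) is given below. The function name `proportion_ci` and its parameters are illustrative and not from the slides; proportions are kept in percent so that q = 100 − p as stated above.

```python
import math

def proportion_ci(events, n, z=1.96):
    """Return (p, SE, lower, upper) in percent for a sample proportion."""
    p = 100 * events / n          # sample proportion in %
    q = 100 - p
    se = math.sqrt(p * q / n)     # SE (proportion) = sqrt(pq/n)
    return p, se, p - z * se, p + z * se

p, se, lower, upper = proportion_ci(62, 200)
print(f"p = {p:.1f}%, SE = {se:.2f}, 95% CI = {lower:.2f}% to {upper:.2f}%")
```

On these figures the prevalence in cardiology patients would be expected to lie roughly between 24.6% and 37.4%.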
Neonatal asphyxia 100 69 5
Example
• A population has a mean pulse rate of 72 with an S.D of 10. A
sample of 400 persons drawn from this population has
a mean pulse rate of 80.
• Which of the statements is true?
A) Sample is a true representative of the population
B) Sample is not a true representative of the population
C) Nothing can be concluded from this information
• S.E = S.D/√n
– 10/√400 = 10/20 = 0.5
– 95% CI = 72 ± 2 × 0.5 = 71 to 73
– 80 is outside this range, so the sample is not a true representative of
the population
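The pulse rate example can be reproduced with a small helper. This is a sketch only: `mean_ci` is a hypothetical name, and it uses the slide's rounded multiplier of 2 rather than 1.96.

```python
import math

def mean_ci(mean, sd, n, multiplier=2):
    """Return (SE, lower, upper) for mean +/- multiplier * SE, with SE = SD / sqrt(n)."""
    se = sd / math.sqrt(n)
    return se, mean - multiplier * se, mean + multiplier * se

# Population mean 72, SD 10, sample of 400 with an observed mean of 80.
se, lower, upper = mean_ci(72, 10, 400)
sample_mean = 80
inside = lower <= sample_mean <= upper

print(f"SE = {se}, 95% CI = {lower} to {upper}, sample mean inside: {inside}")
```

Because the observed sample mean falls outside the interval, option B is the expected answer.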
In a group of 100 under-five children
attending the IMCH O.P the mean weight is 15 kg.
The standard deviation is 2.
• 95% CI = (Sample) Mean ± 2 SE
• SE = SD/√n = 2/√100 = 0.2
• 95% CI = 15 ± 2 × 0.2 = 14.6 to 15.4 kg
• If the CIs of two groups do not overlap, the groups are
likely to be significantly different
Example – mean
• SE = 5
95% CI = sample proportion ± 2 SE
= 40 ± 2 × 5
= 30 to 50
• 4 × 90 × 90 / (9 × 9) = 400
Sample size – 2 groups
• Determine the sample size to prove that drug
A is better than drug B in reducing the
S.Cholesterol. The findings from a previous
study are given
Drug   Mean   SD
A      215    20
B      240    30
• Quantitative data: N = (Zα + Zβ)² × S² × 2 / d²
Zα = Z value for α level = 1.96 at α = 0.05
Zβ = Z value for β level = 1.28 for β at 10% & 0.84
at 20%
S = average SD
d = difference between the two means
• N = (Zα + Zβ)² × S² × 2 / d²
• N = 10.5 × 25 × 25 × 2 / (25 × 25) = 21
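The drug A versus drug B calculation can be sketched as follows. The function name and parameter names are illustrative; the defaults assume α = 0.05 (Z = 1.96) and β = 10% (Z = 1.28), as used in the worked figure above.

```python
import math

def sample_size_two_means(sd_a, sd_b, mean_a, mean_b, z_alpha=1.96, z_beta=1.28):
    """N = (Z_alpha + Z_beta)^2 * S^2 * 2 / d^2, with S the average SD."""
    s = (sd_a + sd_b) / 2           # average SD
    d = abs(mean_a - mean_b)        # difference between the two means
    return (z_alpha + z_beta) ** 2 * s ** 2 * 2 / d ** 2

# Drug A: mean 215, SD 20. Drug B: mean 240, SD 30.
n_exact = sample_size_two_means(20, 30, 215, 240)
n_required = math.ceil(n_exact)     # round up to whole subjects

print(n_required)
```

In the usual form of this formula the result is the number needed in each group, so the study would need that many subjects per arm.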
Sample size - proportion – 2 groups
• Qualitative data: N = (Zα + Zβ)² × p × q / d²
Zα = Z value for α level = 1.96 at α = 0.05
Zβ = Z value for β level = 1.28 for β at 10%
p = average prevalence/proportion/positive
factor
q = 100 − p
d = difference between the
prevalence/proportion/positive factor
Deciding statistical tests?
• In a clinical trial of a micronutrient on growth,
the weight was measured before and after
giving the micronutrient. Which test will you
use for comparison?
• paired t test
• F test
• T test
• Chi square test
Difference in proportion: Chi-square test, Z test
• C
– Two groups
– >30
– Continuous variable
– Comparing mean
The most appropriate test to
compare birth weight in 3 different
regions is
• A) t test
• B) ANOVA
• C) Z test
• D) Chi square test
• B
– Continuous variable
– Compare means
– > 2 groups
The most appropriate test to
compare BMI in two different adult
populations of sizes 24 and 30 is
• A) Two sampled t test
• B) Paired t test
• C) Z test
• D) Chi square test
• A
– Two different groups
– Continuous variable
– Size <30
The association between smoking
status and MI is tested by
• A) t test
• B) Anova
• C) F test
• D) Chi square test
• A) Fishers t Test
• B) Independent sample t test
• C) Paired t test
• D) Chi square test.
• T-TEST
• Paired T Test
• Analysis of variance
Is there any relation between blood pressure
and body weight of these subjects?
• Correlation
Correlation coefficient
• Villages:  1   2   3   4   5   6   7   8
• Before:    13  6   12  13  4   13  9   10
• After:     15  4   10  9   1   11  8   13
Did the installation of the water supply
system significantly reduce deaths?
Which non-parametric test will be used
to test the null hypothesis?
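The slide does not name the test. For paired before/after counts like these, the usual non-parametric choice is the Wilcoxon signed-rank test, so the sketch below assumes that test. It computes only the two signed rank sums in plain Python; the helper names are illustrative, and the smaller sum would still have to be compared against a table of critical values.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_sums(before, after):
    """Return (W+, W-): rank sums of positive and negative differences (after - before)."""
    diffs = [a - b for b, a in zip(before, after) if a != b]   # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus

before = [13, 6, 12, 13, 4, 13, 9, 10]
after = [15, 4, 10, 9, 1, 11, 8, 13]
w_plus, w_minus = signed_rank_sums(before, after)
print(w_plus, w_minus)
```

The test statistic is the smaller of the two sums. Deaths fell in six of the eight villages, which is why the negative rank sum is the larger one.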
• Herbal :9 6 10 3 6 3 2
• Allopathy: 6 3 5 6 2 4 8
Is herbal treatment better than
allopathic treatment?
• Small sample size
• Distribution is not normal
• Non-parametric test corresponding to the
independent t test
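The non-parametric counterpart of the independent t test is the Mann-Whitney U test. The sketch below computes U for the herbal and allopathy scores by direct pair counting, which is one common definition of the statistic (ties count as one half). The function name is illustrative, and significance would still be read from a table of critical values.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b), counting for each pair whether a exceeds b (ties count 0.5)."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a   # the two U values sum to n1 * n2
    return u_a, u_b

herbal = [9, 6, 10, 3, 6, 3, 2]
allopathy = [6, 3, 5, 6, 2, 4, 8]
u_herbal, u_allopathy = mann_whitney_u(herbal, allopathy)
print(u_herbal, u_allopathy)
```

Pair counting gives the same U as the rank-sum route (rank sum minus n(n + 1)/2) while avoiding explicit handling of tied ranks.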
Non parametric tests
• < 2 - Insignificant
Reject null hypothesis / Accept null hypothesis
Alpha Error – 5%
SE = 4.5
Z = (930 − 900)/4.5 = 6.67
– For alpha error 5%, critical Z value = 1.96
– P value
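The Z calculation can be sketched in a few lines. The slide gives only the figures 930, 900 and SE = 4.5, so the parameter names `observed` and `expected` are assumptions about what those figures represent. The two-sided P value uses the standard normal distribution via `math.erfc`.

```python
import math

def z_statistic(observed, expected, se):
    """Z = (observed - expected) / SE."""
    return (observed - expected) / se

def two_sided_p(z):
    """Two-sided P value for a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

z = z_statistic(930, 900, 4.5)
significant = abs(z) > 1.96          # critical Z for alpha error 5%
p_value = two_sided_p(z)

print(f"Z = {z:.2f}, significant at 5%: {significant}")
```

A Z this far beyond 1.96 corresponds to a very small P value, so the null hypothesis would be rejected.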
• After applying a statistical test an
investigator gets a p value of 0.01.
What does it mean?
• The null hypothesis (NH) states that there is no difference; if
there is any difference, it is due to chance
• P value = the probability of a sample variation at least this
large occurring by chance, if the null hypothesis is true
• P value 0.05 = the probability of the sample variation occurring
by chance is only 5% if the null hypothesis is true
• Such a variation is unlikely to be due to chance alone, so we
conclude there is a difference and reject the NH
• P = 0.01: the probability of the sample
variation occurring by chance is only 1% if the null
hypothesis is true
• Such a variation is even less likely to be due to
chance alone, so we conclude there is a difference and
reject the NH
• As the p value decreases the difference becomes
more significant
• For practical purposes, p value < 0.05: the
difference is significant
Significance of P value
• In assessing the association between maternal
nutritional status and birth weight of
newborns, two investigators, A and B, studied it
separately and found significant results with p
values of 0.02 & 0.04 respectively. From this,
what can you infer about the magnitude of
association found by the two investigators?
• A) The magnitude of association found by investigator A is
greater than that found by B
            Die   Live   Total
Treatment   10    50     60
Control     15    55     70
• Assume no difference
• Calculate the expected value in each cell, if there was no
difference
• E = (Row total × Column total) ÷ Grand total
            Die   Live   Total
Treatment   10    50     60
Control     15    55     70
Total       25    105    130
• Chi square = Σ (O − E)²/E
• (−1.54)²/11.54 + (1.54)²/48.46 + (1.54)²/13.46 + (−1.54)²/56.54
• ≈ 0.47
• Degrees of freedom = (Columns – 1) × (Rows – 1) = 1
• Look at Chi sq table with 1 degree of freedom
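The steps above (expected values from the margins, then Σ (O − E)²/E and the degrees of freedom) can be sketched in plain Python. The function name `chi_square` is illustrative, and no continuity correction is applied, which is an assumption since the slides do not mention one.

```python
def chi_square(observed):
    """Return (chi-square, degrees of freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # E = (row total x column total) / grand total
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e

    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return chi2, df

# Rows: Treatment, Control. Columns: Die, Live.
chi2, df = chi_square([[10, 50], [15, 55]])
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The result is then compared with the tabulated critical value for 1 degree of freedom, which is 3.84 at the 5% level.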
Chi square table
Type of study                          Alternative name    Unit of study
Descriptive
  Case series
  Cross sectional study                Prevalence          Individual
  Longitudinal                         Incidence study
Analytical studies (observational)
  Ecological                           Correlational       Populations
  Case control                         Case reference      Individuals
  Cohort                               Follow up           Individuals
no 3 2997
Control 15 35 50
Total 25 75 100
P2/1-P2 0.3/0.7
        +         −
+     a = 2     b = 21
−     c = 3     d = 24
• Odds ratio = ad/bc
• (2 × 24)/(21 × 3) = 0.76
Interpretation
• OR = 1: risk factor not related to disease
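A minimal sketch of the odds ratio calculation, using the cell labels a, b, c and d from the 2x2 table above (the function name is illustrative):

```python
def odds_ratio(a, b, c, d):
    """Odds ratio = ad / bc for a 2x2 table with cells a, b (top row) and c, d (bottom row)."""
    return (a * d) / (b * c)

value = odds_ratio(2, 21, 3, 24)
print(f"OR = {value:.2f}")
```

Here the value is below 1, so it sits on the opposite side of 1 from a harmful risk factor.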
No.malnourished
At age one 102 51
Study design –Cohort study
• Measure of risk – Relative risk, Attributable
risk.
• Relative risk = Incidence among exposed / Incidence among non-exposed
= (102/300) / (51/300) = 0.34/0.17 = 2
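The relative risk above, together with the attributable risk named as the other measure, can be sketched as below. The slides do not spell out the attributable risk formula, so the version used here (excess incidence in the exposed, as a percentage of the incidence in the exposed) is an assumption based on the usual textbook definition. Function names are illustrative.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Incidence among exposed divided by incidence among non-exposed."""
    incidence_exposed = exposed_cases / exposed_total
    incidence_unexposed = unexposed_cases / unexposed_total
    return incidence_exposed / incidence_unexposed

def attributable_risk_percent(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Assumed definition: (Ie - Iu) / Ie * 100."""
    incidence_exposed = exposed_cases / exposed_total
    incidence_unexposed = unexposed_cases / unexposed_total
    return (incidence_exposed - incidence_unexposed) / incidence_exposed * 100

rr = relative_risk(102, 300, 51, 300)
ar = attributable_risk_percent(102, 300, 51, 300)
print(f"RR = {rr:.1f}, AR = {ar:.0f}%")
```

A relative risk of 2 means the outcome is twice as frequent in the exposed group as in the non-exposed group.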
Inference ?
• An outbreak of Pediculosis capitis is being
investigated in a girls' school with 291
pupils. Of 130 children who live in a nearby
housing estate, 18 were infested, and of 161
who live elsewhere, 37 were infested. The
Chi square value was found to be 3.93.
• P value = 0.04
• Is there a significant difference in the
infestation rates between the two groups?
Results of a screening test
                      Disease
                  Positive    Negative
Test   Positive   TP (a)      FP (b)
       Negative   FN (c)      TN (d)
Specificity = d/(b+d)
                        +           −
Fever > 37.5 °C   +     21 (a)      28 (b)
                  −     9 (c)       42 (d)
Total                   30 (a+c)    70 (b+d)
• Sensitivity = a/(a+c) = 21/30 = 70%
• Specificity = d/(b+d) = 42/70 = 60%
• Positive predictive value = a/(a+b) =
21/49 = 43%
• Negative predictive value = d/(c+d) = 42/51 = 82%
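The four measures can be computed together from the cells a, b, c and d. This is a small sketch with an illustrative function name, applied to the fever example (a = 21, b = 28, c = 9, d = 42).

```python
def screening_metrics(a, b, c, d):
    """a = true positives, b = false positives, c = false negatives, d = true negatives."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }

metrics = screening_metrics(21, 28, 9, 42)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity are read down the columns of the table, while the predictive values are read across the rows.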
Exercise 11
Disease prevalence in a population of 10,000
was 5%. A urine sugar test with sensitivity of
70% and specificity of 80% was done on the
population. The positive predictive value will
be :
a)15.55% b) 70.08% c) 84.4% d)98.06%
• Total population = 10,000
• Disease prevalence = 5%
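• Number diseased = 500
• Applying this to a 2x2 table:
2x2 table
              Disease +    Disease −    Total
Test +        350          1,900        2,250
Test −        150          7,600        7,750
Total         500          9,500        10,000
• Positive predictive value = 350/2,250 ≈ 15.6% (option a)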
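Exercise 11 can be worked through by building the 2x2 table from the prevalence, sensitivity and specificity. The sketch below does this; the function name and return layout are illustrative.

```python
def table_from_prevalence(population, prevalence, sensitivity, specificity):
    """Return (TP, FP, FN, TN, PPV) for a test applied to a whole population."""
    diseased = population * prevalence
    healthy = population - diseased
    tp = diseased * sensitivity      # diseased and test positive
    fn = diseased - tp               # diseased but test negative
    tn = healthy * specificity       # healthy and test negative
    fp = healthy - tn                # healthy but test positive
    ppv = tp / (tp + fp)
    return tp, fp, fn, tn, ppv

tp, fp, fn, tn, ppv = table_from_prevalence(10_000, 0.05, 0.70, 0.80)
print(f"TP = {tp:.0f}, FP = {fp:.0f}, FN = {fn:.0f}, TN = {tn:.0f}, PPV = {ppv:.2%}")
```

With a prevalence of only 5%, false positives outnumber true positives, which is why the positive predictive value is low despite reasonable sensitivity and specificity.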
LE test     Laboratory criteria
            Bacterial Meningitis    Not Bacterial Meningitis
Positive    20                      3
Negative    0                       64
1. What is this plot known as? And where is it primarily used?
the outcome
Answer
2. Diamond
B B1
n1 = 7, n2 = 5, R1 = 30, R2 = 48
U’ = n1n2 – U
U’ = (7)(5) – 33
U’ = 2
U 0.05(2),7,5 = U 0.05(2),5,7 = 30
Treatment A   38   26   40
Treatment B   36   42   35
• The first step is to rank the data.
Rank   Value   Treatment
1      26      A
2      35      B
3      36      B
4      38      A
5      40      A
6      42      B
• Treatment A ranks 1, 4 & 5 – Total 10
• Treatment B ranks 2, 3 & 6 – Total 11
• Use tables to see the significance of the difference
Kaplan Meier curve
Serial time
• Serial time : (Time-to-event) is a clinical course
duration, variable for each subject
• Begin : enrolled into a study, treatment begins,
• Ends : end-point (event of interest) is reached or
the subject is censored
• Serial time duration of known survival is
terminated by the event of interest; this is known
as an interval and is graphed as a horizontal line
Censoring
• Censoring means the total survival time for that subject cannot be
accurately determined.
• Negative : subject drops out, is lost to follow-up, or required data is not
available
• Positive (good) : study ends before the subject had the event of interest
occur, i.e., they survived at least until the end of the study, but there is no
knowledge of what happened thereafter.
• Thus censoring can occur within the study or terminally at the end.
• Censored subjects are indicated on the Kaplan-Meier curve as tick marks;
these do not terminate the interval.
• Large number of censored subjects - question how the study was carried
out or if treatment was ineffective – many dropped out
• More patients censored, especially early, less reliable the survival curve
• No censored patients - interpret with caution
Kaplan Meier Curve
• The lengths of the horizontal lines along the X-axis of
serial times represent the survival duration for that
interval.
• The interval is terminated by the occurrence of the
event of interest.
• The vertical lines are just for cosmesis; they make the
curve more pleasing to observe
• The cumulative probability of surviving a given time is
seen on the Y-axis
• Group 1 - probability of surviving 11 months is 100%
• Group 2 - probability of surviving 11 months is 66.7%
Kaplan Meier curve
• The cumulative probability defines the probability at
the beginning and throughout the interval. This is
graphed on the Y-axis of the curve.
• The interval survival rate (or probability) defines the
probability of surviving past the interval, i.e. still
surviving after the interval and beginning the next.
• Cumulative survival rate (probability) in interval three
of 0.667
• Median survival: read off where a horizontal line
at the 50% mark crosses the survival curve
• Compare 2 curves by log rank test
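The cumulative survival probability and the median survival described above can be illustrated with a minimal Kaplan-Meier sketch. The follow-up data are hypothetical and are not taken from the slides; each subject is a pair of serial time and a flag that is True when the event occurred and False when the subject was censored. Function names are illustrative.

```python
def kaplan_meier(subjects):
    """subjects: list of (time, event_occurred). Returns [(time, cumulative survival)] at event times."""
    at_risk = len(subjects)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in subjects}):
        events = sum(1 for time, occurred in subjects if time == t and occurred)
        leaving = sum(1 for time, _ in subjects if time == t)
        if events:
            # Interval survival = (at risk - events) / at risk; cumulative is the running product.
            survival *= (at_risk - events) / at_risk
            curve.append((t, survival))
        # Censored subjects leave the risk set without producing a step in the curve.
        at_risk -= leaving
    return curve

def median_survival(curve):
    """First event time at which cumulative survival is 50% or lower (None if never reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical data: (serial time in months, event occurred?)
group = [(3, True), (5, False), (8, True), (11, False), (12, True), (14, False)]
curve = kaplan_meier(group)
for t, s in curve:
    print(f"month {t}: cumulative survival {s:.3f}")
print("median survival:", median_survival(curve))
```

Each censored subject shrinks the number at risk, so later steps are based on fewer patients, which is why the right-hand end of a curve is less reliable.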
Kaplan Meier curve
• The name "survival curve" is somewhat misleading, as it can show time to
any event, e.g. time to tumor recurrence, time to
extubation after a surgical procedure, time to
rejection of a kidney transplant
• Many small steps - higher number of subjects,
whereas curves with large steps - limited number
of subjects (less accurate).
• Survivor function at the far right of a Kaplan-Meier
survival curve - interpret cautiously:
fewer patients remain, so survival estimates
are not as accurate
ROC curve