
Bio Statistics

The document provides an overview of biostatistics, including types of data (categorical, continuous, qualitative, and quantitative), measurement scales, and measures of central tendency. It discusses study designs, incidence vs. prevalence, confidence intervals, and standard error, along with examples illustrating statistical concepts. Additionally, it covers sample size calculations for qualitative and quantitative data.

Uploaded by

Fatimah Haider
Copyright
© © All Rights Reserved

Biostatistics

Data

• Categorical or Continuous

• Parametric or Non parametric


Types of variables
• Qualitative
– Dichotomous
– Nominal
– Ordinal
• Quantitative
– Discrete
– Continuous
Quantitative data        Qualitative data
Hb in gm%                Anemic/non-anemic
Height in cm             Tall/short
B.P. in mm of Hg         Hypo/normo/hypertensives
Measurement scale
1. Nominal or Classificatory Scale
• Gender, ethnic background
2. Ordinal or Ranking Scale
• Hardness of rocks, beauty, military ranks
3. Interval Scale
• Celsius or Fahrenheit
4. Ratio Scale
• speed, height, mass or weight
1. Which is a qualitative variable
• a) BMI
• b) S. bilirubin
• c) Name of residing place
• d) Blood urea
2. Which is a quantitative variable
• Causes of deaths
• Religious distribution
• Age group distribution
• Age distribution
4. Which is an ordinal variable
• A)Blood pressure
• B)Name of residing place
• C)Grading of carcinoma
• D) temperature
5. Which is not a nominal scale
variable
• A)Causes of death
• B) religion
• C)diagnosis
• D)visual analogue scale
Measures of central Tendency

• Qualitative data – Proportion


• Quantitative data – Mean,Median,Mode
• Define Median, 1st Quartile and 3rd Quartile.
• If the observations are arranged in ascending order:
• Median: 50% observations are below and 50%
above this value
• 1st Quartile: 25% observations are below and
75% above this value
• 3rd Quartile: 75% observations are below and
25% above this value
• What is the basic difference between a ‘Case
control’ and ‘Cohort’ Study design?

• Case control study is retrospective and cohort study is prospective
• What is the difference between Incidence and
Prevalence?

• Incidence: The number of NEW cases occurring in a defined population during a specified period of time
• Prevalence: Number of all cases old or new at
a given point of time or over a period of time
in a given population
S.No.          1   2   3   4   5   6   7   8   9  10
Value (X)      2   2   3   3   3   4   5   5   6   7
Mean           4   4   4   4   4   4   4   4   4   4
X - Mean      -2  -2  -1  -1  -1   0   1   1   2   3
(X - Mean)²    4   4   1   1   1   0   1   1   4   9

• Mode (most commonly occurring: most fashionable) = 3
• Median is the middle value between 5th and 6th figures = (3 + 4)/2 = 3.5
• Mean = ∑ value/n = 40/10 = 4
• Variance = ∑ (X - Mean)²/(n - 1) = 26/9 = 2.89
• SD = √Variance = √2.89 = 1.70
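The figures above can be checked with a short script. This is a minimal sketch using only Python's standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
import statistics

values = [2, 2, 3, 3, 3, 4, 5, 5, 6, 7]   # the ten observations from the table

mean = statistics.mean(values)            # sum of values / n
median = statistics.median(values)        # average of the 5th and 6th sorted values
mode = statistics.mode(values)            # most frequently occurring value
variance = statistics.variance(values)    # sample variance, divides by n - 1
sd = statistics.stdev(values)             # square root of the sample variance

print(mean, median, mode, round(variance, 2), round(sd, 2))
```

Note that `variance` and `stdev` use the n - 1 denominator, matching the slide; `pvariance` and `pstdev` would divide by n instead.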
Example
• A Pediatrician in a general hospital is
investigating the amount of lead in urine from
children in a nearby estate. In a particular area
there are 15 children whose ages range from 1
year to 16. The following amounts of lead
were noted in urine:
0.6 , 2.6 , 0.1 , 1.1 , 0.4 , 2.0 , 0.8 , 1.3 , 1.2,
1.5, 3.2 , 1.7 , 1.9 , 1.9 , 2.2.
What is the mean urinary lead conc.?
Mean = Sum of obsns/No. of obsns = 22.5/15 = 1.5
Median = 1.5
SD = √[Σ(x - mean)²/(n - 1)] = √(9.96/14) = 0.84
95% reference range for the observations
= Mean +/- 2SD = 1.5 +/- 1.68 = -0.18 to 3.18
Which points are excluded from the reference range?
What proportion of data is excluded?
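As a cross-check, the mean, SD and mean +/- 2SD range can be recomputed directly from the 15 readings. The sketch below assumes the sample SD (n - 1 denominator) and lists the points falling outside the range; the names `lead`, `low`, `high` and `outside` are illustrative only.

```python
import statistics

# Urinary lead readings for the 15 children, as listed in the example
lead = [0.6, 2.6, 0.1, 1.1, 0.4, 2.0, 0.8, 1.3, 1.2,
        1.5, 3.2, 1.7, 1.9, 1.9, 2.2]

mean = statistics.mean(lead)
sd = statistics.stdev(lead)                 # sample SD (n - 1 in the denominator)
low, high = mean - 2 * sd, mean + 2 * sd    # approximate 95% reference range

outside = [x for x in lead if x < low or x > high]
proportion_outside = len(outside) / len(lead)

print(round(mean, 2), round(sd, 2), round(low, 2), round(high, 2))
print(outside, round(proportion_outside, 3))
```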
Normal distribution curve
• Mean +/- 1 SD, 2 SD and 3 SD include 68.3%, 95.4% and 99.7% of observations respectively

95 % Confidence intervals
• Population mean: Sample mean ± 1.96 SE (mean)
SE (mean) = SD/√n, n = sample size

• Population proportion: Sample proportion ± 1.96 SE (proportion)
SE (proportion) = √(pq/n), q = 100 - p (p and q in per cent)

• Wide confidence intervals imply a less reliable result; narrow confidence intervals imply a more reliable study

• When the 95% confidence interval for a difference in means or a difference in proportions includes zero, the result is not statistically significant

• When the 95% CI for an odds ratio or relative risk includes 1, the result is not statistically significant
Central limit Theorem
Central limit theorem states that
• The sampling distribution of the means of random samples will be approximately normal
• The mean of the sample means will be equal to the population mean
• The standard deviation of the sample means about the population mean is the standard error
Standard Error

[Figure: illustration of standard error; labels 15, 16, 17, 18, 19 and 17 kg]
Standard Error of Mean
• The mean of any group of patients is often taken to be representative of the whole population of which it is a sample. The population mean will lie close to the group mean, within a margin given by the SE of the mean

• SE of the mean = Group SD/√n
Applications of SE

• To find out the range in which the population mean will lie (95% confidence interval: sample mean +/- 2SE)

• To know whether the sample is representative of the population if the population mean is known

• To find whether the observed difference between two samples is statistically significant
Example
The mean body wt. (in Kg) of 50, five-year-old
boys was 18.06 with SD 1.65. Find the range
in which the mean body wt. of five-year-old
boys will lie.

95% CI for population mean (mean body wt)
SE (mean) = 1.65/√50 = 0.233
= 18.06 - 1.96*0.233, 18.06 + 1.96*0.233
= 17.60 - 18.52
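The same calculation as a small function. A minimal sketch: `ci_mean` is a hypothetical helper name, and z = 1.96 is the multiplier used on the slide.

```python
import math

def ci_mean(mean, sd, n, z=1.96):
    """Return (standard error, lower limit, upper limit) of the CI for a mean."""
    se = sd / math.sqrt(n)          # SE (mean) = SD / sqrt(n)
    return se, mean - z * se, mean + z * se

# 50 five-year-old boys, mean weight 18.06 kg, SD 1.65 kg
se, low, high = ci_mean(18.06, 1.65, 50)
print(round(se, 3), round(low, 2), round(high, 2))
```

Passing `z=2` gives the rounded "mean +/- 2SE" version used in several later examples.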
Example
• The mean urinary lead concentration in 144 children was 2.18
micromol/24 h, with SD 0.87.
• How does SD differ from standard error?
• SE = 0.87/12 = 0.07

• What is the difference between reference range and CI?


• SD – 0.87 SE – 0.07
• Mean (sample) – 2.18
• Reference range is 0.44 – 3.92
• 95% CI is 2.04 - 2.32
Example
• A count of MP in 100 fields with a 2mm oil immersion
lens gave a mean of 35 parasites per field, SD 11.6.
• On counting 1 more field – 52 parasites
• Is 52 outside 95% ref range, what is the ref range ?
• What is the 95% CI for the mean of the population
from which this sample count of parasites was drawn?

• RR 12.26 to 57.74
• Obs value included
• CI 32.73 to 37.27
Example
In a study to find the prevalence of syndrome X in cardiology patients,
200 patients were selected and syndrome X was present in 62. Find the
range in which the prevalence of syndrome X in cardiology patients will
lie.

95% CI for proportion (prevalence)

Population proportion:
Sample proportion ± 1.96*SE (proportion)
SE (proportion) = √(pq/n), q = 100 - p

p = 62/200 = 31%, q = 69%

SE (proportion) = √(31 × 69/200) = 3.27%
95% CI = 31 ± 1.96 × 3.27 = 31 ± 6.41
Final answer: 24.6% - 37.4%
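The proportion formula can be applied step by step as follows. A minimal sketch working in per cent, as the slide does; `ci_proportion` is a hypothetical helper name.

```python
import math

def ci_proportion(p_percent, n, z=1.96):
    """Return (SE, lower limit, upper limit) for a proportion given in per cent."""
    q_percent = 100 - p_percent
    se = math.sqrt(p_percent * q_percent / n)   # SE (proportion) = sqrt(pq/n)
    return se, p_percent - z * se, p_percent + z * se

p = 100 * 62 / 200                 # syndrome X present in 62 of 200 patients
se, low, high = ci_proportion(p, 200)
print(round(se, 2), round(low, 1), round(high, 1))
```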
Example
• In one group of 64 patients with iron deficiency anemia, the mean Hb level was 12.2 g/dl (SD 1.8 g/dl); in another group of 49 it was 10.9 g/dl (SD 2.1 g/dl).
• Do you think there is a gross variability between two
samples?
• Calculate the 95% CI for both groups
• First group SE =0.225
95% CI = 11.75 – 12.65
• Second group SE = 0.3
95% CI = 10.3 –11.5
• Since the CIs of the two groups do not overlap, the two groups differ at the 5% level
• If the mean Hb Level in the general population
is taken as 14.4 g/dl, is the mean of the first
sample representative of the population mean
and what is the significance of this difference?

• Since 14.4 is not included in the 95% CI, the first sample is not representative of the population.
Example
In an investigation on neonatal blood pressure
in relation to neonatal asphyxia following
results were obtained. Test whether the
difference in SBP is statistically significant
Babies (9 days old)    Number    Mean SBP    SD
Normal                 100       75          6
Neonatal asphyxia      100       69          5
Example
• A population has a mean pulse rate of 72 with an S.D. of 10. A sample of 400 persons drawn from this population had a mean pulse rate of 80.
• Which of the statements is true?
A) Sample is a true representative of population
B) Sample is not a true representative of population
C) Nothing can be concluded from this information

• S.E = S.D/√n
– 10/√400 = 10/20 = 0.5
– 95% CI = 72 +/- 2 × 0.5 = 71-73
– 80 is outside this range. So the sample is not a true representative of the population
In a group of 100 under-five children attending IMCH O.P. the mean weight is 15 kg. The standard deviation is 2.

1. In what range will 95% of the children’s weights lie in the sample?

2. In what range will the mean weight of all children attending IMCH O.P. lie?
Range in which 95% of the children’s weights in the sample will lie:
95% reference range = mean +/- 2SD = 15 +/- 4 = 11-19 kg
Range in which the mean weight of all the children attending IMCH O.P. will lie:
SE = SD/√n = 2/√100 = 0.2
95% Confidence interval = mean +/- 2SE = 15 +/- 0.4 = 14.6-15.4 kg
• In a group of 100 children the mean weight is 15 kg. The standard error is 0.02. In what range will the population mean lie?
95% Confidence interval
• Range in which the population mean will lie

• (Sample) Mean +/- 2 SE

• 95% CI = 15 +/- 2 × 0.02 = 14.96-15.04 kg
• If the CI of two groups do not overlap they are
likely to be different clinically
Example – mean

• The PEFR of 100 eleven-year-old girls follows a normal distribution with a mean of 300 l/min, standard deviation of 20 l/min and standard error of 2 l/min

• What will be the range in which 95% of the girls’ PEFR will lie in the sample?

• What will be the range in which the mean PEFR of the population from which the sample was taken will lie?
Range in which 95% of girls’ PEFR in the sample will lie:
mean +/- 2SD = 300 +/- 40 = 260-340 l/min

Range in which the mean PEFR value will lie (95% confidence interval):
mean +/- 2SE = 300 +/- 4 = 296-304 l/min
Example – proportion
In a village the percentage of male population is 52%. In a sample of 100 people the male percentage was 40 with a standard error of 5.
Is this sample representative of the population?

• SE = 5
95% CI = sample proportion +/- 2 SE
= 40 +/- 2 × 5
= 30-50

52% is above this range, so the sample is not representative of the population


Comparing groups

Heights of 100 boys & 100 girls gave the following values. Do these two groups differ significantly?
Answer
• Girls
– 95% CI = 150 +/- 2 × 2
= 146-154
• Boys
– 95% CI = 160 +/- 2 × 3
= 154-166
• Overlapping is present among the 95% CIs (they meet at 154)
• Both groups can have the same population mean
Sample size
• Qualitative data N = 4pq/L²
• P = positive factor/prevalence/proportion
• Q = 100 – p
• L = allowable error or precision or variability
• 4 ≈ (1.96)², the square of the Z value for an alpha error of 5%
• Quantitative data N = 4SD²/L²
Sample size
• Calculate the sample size to find out the
prevalence of a disease after implementing a
control programme with 10% allowable error.
Prevalence of the disease before
implementing the programme was 80 %
• L = 10% of 80% = 8
• N = 4 × 80 × 20/(8 × 8) = 100
Sample size

• Determine the sample size to find out the Vitamin A requirement in the under-five children of Calicut district. From the existing literature the mean daily requirement was documented as 930 I.U with a SD of 90 I.U. Consider the precision as 9.
• N = 4SD²/L²

• N = 4 × 90 × 90/(9 × 9) = 400
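Both single-group formulas are simple enough to wrap in two helper functions. A minimal sketch; the function names are hypothetical, and p, q and L are taken in per cent as on the slides.

```python
def sample_size_proportion(p, allowable_error):
    """N = 4pq / L^2, with p, q and L all expressed in per cent."""
    q = 100 - p
    return 4 * p * q / allowable_error ** 2

def sample_size_mean(sd, allowable_error):
    """N = 4 SD^2 / L^2, with SD and L in the same units."""
    return 4 * sd ** 2 / allowable_error ** 2

# Prevalence 80%, allowable error 10% of 80% = 8
print(sample_size_proportion(80, 8))   # 100.0
# Vitamin A requirement: SD 90 I.U., precision 9 I.U.
print(sample_size_mean(90, 9))         # 400.0
```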
Sample size – 2 groups
• Determine the sample size to prove that drug A is better than drug B in reducing the S.Cholesterol. The findings from a previous study are given

Drug    Mean    SD
A       215     20
B       240     30

• Quantitative data N = (Zα + Zβ)² × S² × 2/d²
Zα = Z value for α level = 1.96 at α 0.05
Zβ = Z value for β level = 1.28 for β at 10% & 0.84 at 20%
S = average SD
d = difference between the two means

• S = (20 + 30)/2 = 25, d = 240 - 215 = 25, (Zα + Zβ)² = (1.96 + 1.28)² ≈ 10.5
• N = 10.5 × 25 × 25 × 2/(25 × 25) = 21 in each group
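The two-group formula for means can be sketched the same way. This assumes, as the slide does, that S is the simple average of the two SDs and that β is 10% (Zβ = 1.28); `sample_size_two_means` is a hypothetical name.

```python
def sample_size_two_means(sd_a, sd_b, mean_a, mean_b, z_alpha=1.96, z_beta=1.28):
    """N per group = 2 (Z_alpha + Z_beta)^2 S^2 / d^2."""
    s = (sd_a + sd_b) / 2            # average SD, as defined on the slide
    d = abs(mean_a - mean_b)         # difference between the two means
    return 2 * (z_alpha + z_beta) ** 2 * s ** 2 / d ** 2

# Drug A: mean 215, SD 20; drug B: mean 240, SD 30
n = sample_size_two_means(20, 30, 215, 240)
print(round(n, 1))   # about 21 per group
```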
Sample size - proportion – 2 groups
• Qualitative data N = (Zα + Zβ)² × p × q/d²
Zα = Z value for α level = 1.96 at α 0.05
Zβ = Z value for β level = 1.28 for β at 10%
P = average prevalence/proportion/positive factor
d = difference between the prevalence/proportion/positive factor
Deciding statistical tests?
• In a clinical trial of a micronutrient on growth, the weight was measured before and after giving the micronutrient. Which test will you use for comparison?
• paired t test
• F test
• T test
• Chi square test
Difference in proportion                                      Chi-square test, Z test
Difference in mean (before and after comparison, same group)  Paired t test
Difference in mean (two independent groups)                   Unpaired t test; if sample > 30, Z test
More than 2 means (> 2 groups)                                Anova
Association b/w 2 quantitative variables                      Spearman correlation
Prediction                                                    Regression
Parametric and Nonparametric tests
Parametric: When the data are normally distributed.

Nonparametric: When the data are not normally distributed, usually with small sample size.
Non parametric tests

Qualitative data        Chi-square test, Fisher's test, McNemar test
Paired t test           Wilcoxon signed rank test
Independent t test      Wilcoxon rank sum test, Mann-Whitney U, Kolmogorov-Smirnov
Anova                   Kruskal-Wallis test

The most appropriate test for comparing
Hb values in the adult women in two
different population of size 150 and 200
is
• A) t test
• B) Anova
• C) Z test
• D) Chi square test
Answer

• C
– Two groups
– >30
– Continuous variable
– Comparing mean
The most appropriate test to
compare birth weight in 3 different
regions is
• A) t test
• B) Anova
• C) Z test
• D) Chi square test
Answer

• B
– Continuous variable
– Compare means
– > 2 groups
The most appropriate test to
compare BMI in two different adult
population of size 24 and 30 is
• A) Two sampled t test
• B) Paired t test
• C) Z test
• D) Chi square test
Answer

• A
– Two different groups
– Continuous variable
– Size <30
The association between smoking
status and MI is tested by
• A) t test
• B) Anova
• C) F test
• D) Chi square test

• Qualitative variables, Test of association,


Chi square test
Standard drug used 40% of patients responded and
a new drug when used 60% of patients responded.
Which of the following tests of parametric
significance is most useful in this study?

• A) Fishers t Test
• B) Independent sample t test
• C) Paired t test
• D) Chi square test.
• A consumer group would like to evaluate the success of three different commercial weight loss programmes. Subjects are assigned to one of three programmes (Group A, Group B, Group C). Each group follows a different diet regimen. At the start and at the end of 6 weeks subjects are weighed and their BP measurements recorded.
Test to detect mean difference in body
weight between Group A & Group B

• T-TEST

• Difference between means of two samples


Is there a significant difference in body weight in
Group A at Time 1 and Time 2?

• Paired T Test

• Same people sampled on two Occasions.


Is the body weight of subjects in Group A, Group B and Group C significantly different at Time 2?

• Analysis of variance
Is there any relation between blood pressure
and body weight of these subjects?

• Association b/w 2 quantitative variables

• Correlation
Correlation coefficient

• Shows the relation between two quantitative variables
• Shows the rate of change of one variable as the other variable changes
• The value lies between –1 to + 1
• Correlation coefficient of zero means that
there is no relationship
• Regression - estimate the value of one variable
for a given value of the other variable
• No. of deaths in 8 villages due to water
borne diseases before & after installation of
water supply system.

• Villages: 1 2 3 4 5 6 7 8
• Before :13 6 12 13 4 13 9 10
• After :15 4 10 9 1 11 8 13
Did the Installation of water supply
system significantly reduce deaths
Which non parametric test will be used
to test the null hypothesis

• Small sample size


• Distribution is not normal
• Before and after, same small group
• Non parametric test that corresponds to
Paired T test
• Wilcoxon signed rank test
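To illustrate how the signed rank test works on the village data, the sketch below computes the two rank sums by hand. It assumes the usual conventions (zero differences dropped, tied absolute differences given their average rank); `signed_rank_sums` is a hypothetical helper, and the smaller sum would still have to be compared with a table of critical values.

```python
def signed_rank_sums(before, after):
    """Return (sum of ranks of positive differences, sum for negative differences)."""
    diffs = [b - a for b, a in zip(before, after) if b != a]   # drop zero differences
    ordered = sorted(diffs, key=abs)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        ranks[abs(ordered[i])] = (i + 1 + j) / 2   # average rank shared by ties
        i = j
    w_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    w_minus = sum(ranks[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus

before = [13, 6, 12, 13, 4, 13, 9, 10]   # deaths before installation
after = [15, 4, 10, 9, 1, 11, 8, 13]     # deaths after installation
w_plus, w_minus = signed_rank_sums(before, after)
print(w_plus, w_minus)
```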


For treatment of Hepatitis A 7 patients
treated with herbal medicines & 7 patients
treated with Allopathic symptomatic
management. S.Br values after 10 days of
treatment is given below

• Herbal :9 6 10 3 6 3 2

• Allopathy: 6 3 5 6 2 4 8
Is herbal treatment better than allopathic treatment?
• Small sample size
• Distribution is not normal
• Non parametric test corresponding to independent T test
• Mann Whitney U test
Steps for testing a hypothesis
• State Null Hypothesis (Ho) – assumes no difference b/w the two populations being compared.

• State alternate hypothesis (H1 or HA) – assumes that there is a difference b/w the two populations.

• Fix the alpha error

• Identify the test statistic

• Find out the critical value

• Calculate the value for the identified statistical test: Difference in means/SE

• If the calculated value is > the table value (critical value), reject the Null Hypothesis
Z- Score

• Standard normal variate

• Z score is difference between means / SE

• Z < 2 - Insignificant
                         Reject Null hypothesis        Accept Null hypothesis
Null hypothesis true     Type 1 error (alpha error)    Correct decision
Null hypothesis false    Correct decision              Type 2 error (beta error)
• Alpha = 5% (0.05)
• Beta = 0.1 to 0.2 or 10 to 20%.
• Power of the study = 1 - beta error
• Probability of detecting a difference between the two groups when a difference truly exists.
• In a study conducted on a sample of 400 adults, it was found that the mean daily requirement of Vit. A was 900 I.U. From the existing literature the same was documented as 930 I.U with a SE of 4.5 I.U. Does the study finding differ significantly from the existing literature finding?
Null hypothesis – there is no difference between the study finding and the literature value

Alpha Error – 5%

Test statistic – Z test

SE = 4.5

Z = (930 - 900)/4.5 = 6.67
– For alpha error 5%, critical Z value = 1.96

– 6.67 > 1.96, so we will reject the null hypothesis

– There is a significant difference

– P value < 0.001
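The Z test above, with its two-sided P value, can be sketched in a few lines. This assumes a normal approximation and uses `math.erfc` from the standard library to get the tail probability; `z_test` is a hypothetical helper name.

```python
import math

def z_test(sample_mean, reference_mean, se):
    """Return (z, two-sided P value) under the normal approximation."""
    z = abs(sample_mean - reference_mean) / se
    p_two_sided = math.erfc(z / math.sqrt(2))   # P(|Z| > z) for a standard normal
    return z, p_two_sided

z, p = z_test(900, 930, 4.5)
print(round(z, 2), p < 0.001)

# Sanity check: z = 1.96 corresponds to a two-sided P value of about 0.05
print(round(z_test(1.96, 0, 1)[1], 3))
```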
• After applying a statistical test an investigator gets the p value as 0.01. What does it mean?
• Null hypothesis states there is no difference; if there is any difference it is due to chance
• P value = the probability of the observed sample variation occurring by chance if the null hypothesis is true
• P value 0.05 = probability of the sample variation by chance is only 5% if the null hypothesis is true
• Such a variation is unlikely to be due to chance alone, so we conclude there is a difference and reject NH
• P = 0.01 – probability of the sample variation by chance is only 1% if the null hypothesis is true
• Chance is an even less likely explanation, so we reject NH
• As the p value decreases the difference becomes more significant
• For practical purposes, p value < 0.05: the difference is significant
Significance of P value
• In assessing the association between maternal
nutritional status and Birth weight of the
newborns two investigators A and B studied
separately and found significant results with p
values 0.02 & 0.04 respectively. From this
what can you infer about the magnitude of
association found by the two investigators
• A) The magnitude of association found by investigator A is > than that found by B

• B) The magnitude of association found by investigator B is > than that found by A

• C) The magnitude of association found by investigator B is equal to that found by A since both are significant

• D) Nothing can be concluded


• A smaller p value shows stronger evidence against the null hypothesis, not a larger association; the p value also depends on sample size. Hence nothing can be concluded about the magnitude of association (D)
Difference between means
• SE (diff) = √(SD1²/n1 + SD2²/n2)
• Square the SDs and divide each by its n
• Add these
• Take the square root of this value
• Is this difference due to chance?
Difference between means
• Start with the null hypothesis: no significant difference between the 2 samples
• Test for significance: how many multiples of SE is this difference?
• Difference in means divided by SE (diff)
• i.e. z = difference in means/SE (diff)
• When z = 3.291 the probability is 0.001
• When z = 2.576 the probability is 0.01
• When z = 1.96 the probability is 0.05
Same problem, different method
• In one group of 62 patients with iron deficiency anemia the Hb level was 12.2 g% (SD 1.8 g%). In another group of 35 patients it was 10.9 g% (SD 2.1 g%).
• What is SE (diff) between the two means?
• SE of diff = 0.42 g%
• z = 3.08
• significance of diff?
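The SE (diff) recipe above translates directly into code. A minimal sketch using the group sizes, means and SDs from this example; `z_difference` is a hypothetical helper name.

```python
import math

def z_difference(mean1, sd1, n1, mean2, sd2, n2):
    """Return (SE of the difference, z) for two independent sample means."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # square, divide by n, add, root
    z = (mean1 - mean2) / se_diff                        # difference in multiples of SE
    return se_diff, z

se_diff, z = z_difference(12.2, 1.8, 62, 10.9, 2.1, 35)
print(round(se_diff, 2), round(z, 2))   # 0.42 3.08
```

Since 3.08 lies between 2.576 and 3.291, the two-sided probability falls between 0.01 and 0.001 on the scale given earlier.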
Example
The mean cumulative wt. loss in gms following sweating during insulin induced hypoglycemia was 120 with SD 10 for 12 patients receiving propranolol, and 70 with SD 8 for 11 control patients. Do the data present sufficient evidence to conclude that the mean cumulative wt. loss is different in the two groups?
Chi-square Test
             Die    Live    Total
Treatment    10     50      60
Control      15     55      70
Total        25     105     130

• Assume no difference
• Calculate expected value in each square, if there was no difference
• E = (Row total × column total) ÷ Grand total

             Die                    Live                    Total
Treatment    Obs. = 10              Obs. = 50               60
             Exp. = 25×60/130       Exp. = 105×60/130
                  = 11.54                = 48.46
             O-E = -1.54            O-E = 1.54
Control      Obs. = 15              Obs. = 55               70
             Exp. = 25×70/130       Exp. = 105×70/130
                  = 13.46                = 56.54
             O-E = 1.54             O-E = -1.54

• Chi square = Σ (O-E)²/E
• (-1.54)²/11.54 + (1.54)²/48.46 + (1.54)²/13.46 + (-1.54)²/56.54
• ≈ 0.47
• Degrees of freedom = (Column – 1) (Row – 1) = 1
• Look at Chi sq table with 1 degree of freedom
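The expected-value and Σ (O-E)²/E steps can be checked with a short function. This is a minimal sketch without continuity correction, applied to the first table above (10/50 and 15/55); `chi_square` is a hypothetical helper name.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total   # E = row x column / grand
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Rows: Treatment, Control; columns: Die, Live
chi2, dof = chi_square([[10, 50], [15, 55]])
print(round(chi2, 2), dof)
```

For 1 degree of freedom the tabulated critical value at P = 0.05 is 3.84, so the statistic is compared against that figure.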
Chi square table
Type of study                          Alternative name          Unit of study
Descriptive
  Case series
  Cross sectional study                Prevalence                Individual
  Longitudinal                         Incidence study
Analytical studies (observational)
  Ecological                           Correlational             Populations
  Case control                         Case reference            Individuals
  Cohort                               Follow up                 Individuals
Analytical studies (interventional)
  Randomised controlled trial          Clinical trial            Patients
  Field trial                          Community intervention    Healthy people
  Community trials                     Community                 Healthy people
Study questions and appropriate designs

Type of question                         Appropriate study design
Burden of illness                        Cross sectional survey, longitudinal survey
Causation, risk and prognosis            Case control study, cohort study
Occupational risk, environmental risk    Ecological studies
Treatment efficacy                       RCT
Diagnostic test evaluation               Paired comparative study
Cost effectiveness                       RCT
Study design –Case control
• Measure of risk –Odd’s ratio
Cigarette smoking    Developed cancer    Did not develop cancer
Yes                  70                  6930
No                   3                   2997

1. Calculate the relative risk
2. RR = Incidence of disease in exposed/Incidence of disease in non-exposed
3. Incidence of disease in exposed = 70/7000 = 10 per 1000
4. Incidence of disease in non-exposed = 3/3000 = 1 per 1000
5. So RR = 10/1 = 10
Relative Risk (RR)
             Die    Live    Total
Treatment    10     40      50
Control      15     35      50
Total        25     75      100

• 10/50 = 0.20 or 20% = P1
• 15/50 = 0.30 or 30% = P2
• Absolute risk reduction (ARR) = P2 - P1 = 0.30 - 0.20 = 0.10

• Relative risk (RR) = P1/P2 = 0.2/0.3 = 0.67

• Relative risk reduction (RRR) = (P2 - P1)/P2 = (0.3 - 0.2)/0.3 = 0.1/0.3 = 0.33

(May be calculated as 1 - RR = 1 - 0.67 = 0.33)

• Numbers needed to treat (NNT) = 1/ARR = 1/(P2 - P1) = 1/(0.3 - 0.2) = 1/0.1 = 10

• For a 10% difference in mortality between groups we need to treat ten persons to save one life
Odds Ratio
• Odds ratio = [P1/(1 - P1)] / [P2/(1 - P2)] = (0.2/0.8) / (0.3/0.7)

= (0.2 × 0.7)/(0.3 × 0.8)

= 14/24 = 0.58
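All of the risk measures for the treatment/control table (10 of 50 vs 15 of 50 deaths) can be computed together. A minimal sketch; `risk_measures` and its dictionary keys are hypothetical names chosen for illustration.

```python
def risk_measures(events_treated, n_treated, events_control, n_control):
    """Return ARR, RR, RRR, NNT and odds ratio for a two-group trial."""
    p1 = events_treated / n_treated      # risk in the treatment group
    p2 = events_control / n_control      # risk in the control group
    arr = p2 - p1                        # absolute risk reduction
    rr = p1 / p2                         # relative risk
    return {
        "arr": arr,
        "rr": rr,
        "rrr": 1 - rr,                   # relative risk reduction
        "nnt": 1 / arr,                  # numbers needed to treat
        "odds_ratio": (p1 / (1 - p1)) / (p2 / (1 - p2)),
    }

m = risk_measures(10, 50, 15, 50)
print({name: round(value, 2) for name, value in m.items()})
```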
Odds ratio
Interpret the following statement:
• In a RCT the ‘odds’ of developing HMD were 0.55 (95% CI 0.3 – 2.1) in infants whose mothers were given ‘Antenatal Steroids’.
• In infants of mothers who had received antenatal steroids the odds of developing HMD are 45% less as compared to those whose mothers had not received antenatal steroids. However, the 95% confidence interval includes 1, so the result is not statistically significant
Odd’s ratio
• In a study conducted by Gireesh G N et al. on the ‘Prevalence of worm infestation in children’, 50 children in an anganwadi were examined. Of these, 5 had worm infestation. 2 of these 5 had a history of pet animals at home, while 21 of the 45 non-infested children had a history of pet animals at home. Is there any association between pet animals and worm infestation?
• Set up a 2x2 table
                    Worm infestation
                    +           -
Pet animals   +     a = 2       b = 21
              -     c = 3       d = 24
• Odds ratio = ad/bc

• (2 × 24)/(21 × 3) = 48/63 = 0.76
Interpretation
• OR =1,RISK FACTOR NOT RELATED TO DISEASE

• OR <1 ,RISK FACTOR PROTECTIVE

• OR >1, RISK FACTOR POSITIVELY ASSOCIATED WITH DISEASE
Relative risk
• In a study to find the effect of birth weight on subsequent growth of children, 300 children with birth weight 2 kg to 2.5 kg were followed till age 1. A similar number of children with birth weight greater than 2.5 kg were followed up too. Anthropometric measurements were done in both groups. Results are shown below
                                Low birth weight    Normal
No. of children studied         300                 300
No. malnourished at age one     102                 51
Study design – Cohort study
• Measure of risk – Relative risk, Attributable risk.
• Relative risk = Incidence among exposed/Incidence among non-exposed
= (102/300)/(51/300) = 0.34/0.17 = 2
Inference ?
• An outbreak of Pediculosis capitis is being investigated in a girls' school with 291 pupils. Of 130 children who live in a nearby housing estate 18 were infested, and of 161 who live elsewhere 37 were infested. The Chi square value was found to be 3.93.
• P value = 0.04
• Is there a significant difference in the infestation rates between the two groups?
Results of a screening test
                      Disease
                  Positive    Negative
Test   Positive   TP (a)      FP (b)
       Negative   FN (c)      TN (d)

Features of a screening test
Sensitivity = a/(a+c)

Specificity = d/(b+d)

Positive predictive value = a/(a+b)

Negative predictive value = d/(c+d)
False positive rate = b/(b+d)
False negative rate = c/(a+c)
In a group of patients presenting to a hospital emergency with abdominal pain, 30% of patients have acute appendicitis, 70% of patients with appendicitis have a temperature greater than 37.5°C and 40% of patients without appendicitis have a temperature greater than 37.5°C. Considering these findings which of the following statements is correct?
a) Sensitivity of temperature greater than 37.5°C as a marker for appendicitis is 21/49
b) Specificity of temperature greater than 37.5°C as a marker for appendicitis is 42/70
c) The positive predictive value of temperature greater than 37.5°C as a marker for appendicitis is 21/30
d) Specificity of the test will depend upon the prevalence of appendicitis in the population to which it is applied.
Sensitivity and Specificity

                         Appendicitis
                      +             -
Fever > 37.5°C   +    21 (a)        28 (b)
                 -    9 (c)         42 (d)
Total                 30 (a+c)      70 (b+d)
• Sensitivity = a/(a+c) = 21/30 = 70%
• Specificity = d/(b+d) = 42/70 = 60%
• Positive predictive value = a/(a+b) = 21/49 = 43%
• Negative predictive value = d/(c+d) = 42/51 = 82%
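The four screening-test measures follow mechanically from the 2x2 cells. A minimal sketch using the appendicitis counts above (per 100 patients); `screening_metrics` is a hypothetical helper name.

```python
def screening_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from the cells a, b, c, d."""
    return {
        "sensitivity": tp / (tp + fn),   # a / (a + c)
        "specificity": tn / (fp + tn),   # d / (b + d)
        "ppv": tp / (tp + fp),           # a / (a + b)
        "npv": tn / (fn + tn),           # d / (c + d)
    }

m = screening_metrics(tp=21, fp=28, fn=9, tn=42)
print({name: round(100 * value, 1) for name, value in m.items()})
```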
Exercise 11
Disease prevalence in a population of 10,000 was 5%. A urine sugar test with sensitivity of 70% and specificity of 80% was done on the population. The positive predictive value will be:
a) 15.55% b) 70.08% c) 84.4% d) 98.06%
• Total population = 10,000
• Disease prevalence = 5%
• No. diseased = 500
• Applying this to a 2x2 table:
2x2 table
            Disease +    Disease -    Total
Test +      350 (a)      1900 (b)     2250
Test -      150 (c)      7600 (d)     7750
Total       500          9500         10000
• Positive predictive value = a/(a+b) = 350/2250 ≈ 15.6%, i.e. option a
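The 2x2 table in this exercise is built from prevalence, sensitivity and specificity, which can be sketched as below. `ppv_from_prevalence` is a hypothetical helper name; rates are passed as fractions rather than per cent.

```python
def ppv_from_prevalence(population, prevalence, sensitivity, specificity):
    """Positive predictive value for a test applied to a whole population."""
    diseased = population * prevalence
    healthy = population - diseased
    true_positives = diseased * sensitivity          # cell a
    false_positives = healthy * (1 - specificity)    # cell b
    return true_positives / (true_positives + false_positives)

ppv = ppv_from_prevalence(10_000, 0.05, 0.70, 0.80)
print(round(100 * ppv, 2))
```

Re-running with a higher prevalence shows the predictive value rising even though sensitivity and specificity stay the same.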
Comparison between LE test and Laboratory Diagnosis of Bacterial Meningitis

            Laboratory criteria
LE test     Bacterial meningitis    Not bacterial meningitis
Positive    20                      3
Negative    0                       64

• Sensitivity – 100%; Specificity – 95.5%;
• Predictive value of positive test – 86.96%;
• Predictive value of negative test – 100%;
Forest plot
• The left-hand column lists the names of the studies commonly in chronological
order
• The right-hand column is a plot of the measure of effect (e.g. an odds ratio) for
each of these studies represented by a square incorporating confidence intervals
represented by horizontal lines.
• The confidence intervals are symmetrical about the means from each study
• the area of each square is proportional to the study's weight in the meta-analysis.
• The overall meta-analysed measure of effect is often represented on the plot as a
vertical line. This meta-analysed measure of effect is commonly plotted as a
diamond, the lateral points of which indicate confidence intervals for this estimate.
• A vertical line representing no effect is also plotted. If the confidence intervals for
individual studies overlap with this line, it demonstrates that at the given level of
confidence their effect sizes do not differ from no effect for the individual study.
The same applies for the meta-analysed measure of effect: if the points of the
diamond overlap the line of no effect the overall meta-analysed result cannot be
said to differ from no effect at the given level of confidence.
OSCE

1. What is this plot known as, and where is it primarily used?

2. What is the arrowed figure known as?

3. In this plot which study has the best association regarding the outcome?
Answer

1. Forest plot; used in meta-analysis

2. Diamond

3. Study done by kay


Meta analysis
• Box size depends on size of sample
• CI is less wide when sample size is larger
• The meta-analyzed data represented as
diamond with the ends marking the CI
OSCE

[Figure: plot with points labelled A, B and B1]

1. What is this plot known as?


2. What is this used for?
3. Describe it (Points – A and B,B1)
Answer
• Box-and-whisker plot
• It displays a statistical summary of a variable: median, quartiles, range and, possibly, extreme values.
• The central box represents the values from the lower to the upper quartile (25th to 75th percentile).
• The middle line represents the median.
• The horizontal line (whisker) extends from the minimum to the maximum value, excluding outside and far-out values, which are displayed as separate points.
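The quantities a box-and-whisker plot displays can be computed directly. The sketch below uses hypothetical weights and assumes the common 1.5 × IQR fence rule for deciding which values are plotted as separate points; other plotting software may use a different rule.

```python
import statistics

# Hypothetical weights in lbs (not the data behind the slide's figure)
weights_lbs = [105, 112, 118, 120, 121, 125, 128, 130, 134, 190]

def box_summary(values):
    """Return the quantities a box-and-whisker plot displays."""
    data = sorted(values)
    # Quartiles: lower quartile, median, upper quartile
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed rule: points beyond 1.5 * IQR from the box are drawn separately
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

summary = box_summary(weights_lbs)
print(summary)
```

Note that the box plot shows the median, not the mean, so the mean cannot be read from it directly.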
The weights of the male and female students in a class
are summarized in the following boxplots:

Which of the following is NOT correct?

a) 50% of male students weigh between 150 and 185 lbs.
b) 25% of female students have weights > 130 lbs.
c) The median weight of male students is about 162 lbs.
d) The mean weight of female students is about 120 lbs.
e) Male students have less variability than female students.
http://www.stat.sfu.ca/~cschwarz/MultipleChoice/
Mann-Whitney U Test
• Nonparametric alternative to two-sample t-
test
• Actual measurements not used – ranks of the
measurements used
• Data can be ranked from highest to lowest or
lowest to highest values
• Calculate the Mann-Whitney U statistic:
U = n1n2 + n1(n1+1)/2 – R1
where n1 and n2 are the two sample sizes and R1 is the sum of the ranks in sample 1
Example of Mann-Whitney U test
• Two tailed null hypothesis that there is no
difference between the heights of male and
female students
• Ho: Male and female students are the same
height
• HA: Male and female students are not the
same height
Heights of males (cm)   Heights of females (cm)   Ranks of male heights   Ranks of female heights
193                     175                       1                       7
188                     173                       2                       8
185                     168                       3                       10
183                     165                       4                       11
180                     163                       5                       12
178                     -                         6                       -
170                     -                         9                       -

n1 = 7, n2 = 5, R1 = 30, R2 = 48

U = n1n2 + n1(n1+1)/2 – R1
U = (7)(5) + (7)(8)/2 – 30
U = 35 + 28 – 30
U = 33

U' = n1n2 – U
U' = (7)(5) – 33
U' = 2

U 0.05(2),7,5 = U 0.05(2),5,7 = 30

As 33 > 30, Ho is rejected (Zar, 1996)
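The arithmetic of this example can be checked with a short script. It is a minimal sketch that ranks from highest to lowest, as in the example, and assumes there are no tied heights; the function name is illustrative only.

```python
males = [193, 188, 185, 183, 180, 178, 170]
females = [175, 173, 168, 165, 163]

def mann_whitney_u(sample1, sample2):
    """Return (R1, U, U') using ranks assigned from highest to lowest.

    Assumes no tied values, as in the heights example.
    """
    pooled = sorted(sample1 + sample2, reverse=True)
    rank_of = {value: i + 1 for i, value in enumerate(pooled)}
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(rank_of[v] for v in sample1)
    # n1 * (n1 + 1) is always even, so integer division is exact
    u = n1 * n2 + n1 * (n1 + 1) // 2 - r1
    u_prime = n1 * n2 - u
    return r1, u, u_prime

r1, u, u_prime = mann_whitney_u(males, females)
critical_value = 30  # U 0.05(2),7,5 as quoted from the table
reject_h0 = max(u, u_prime) > critical_value
print(r1, u, u_prime, reject_h0)
```

The larger of U and U' is the one compared with the tabulated critical value.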


Non Parametric data

Treatment A: 38, 26, 40
Treatment B: 36, 42, 35

• The first step is to rank the data from lowest to highest:

Rank   Value   Treatment
1      26      A
2      35      B
3      36      B
4      38      A
5      40      A
6      42      B

• Treatment A ranks 1, 4 and 5 – total 10
• Treatment B ranks 2, 3 and 6 – total 11
• Use tables to assess the significance of the difference
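With samples this small, the table lookup can be replaced by listing every possible way of assigning ranks. The sketch below reproduces the rank totals and then computes an exact two-sided p-value by enumeration; the enumeration approach is an added illustration, not a method described in the slides.

```python
from itertools import combinations

treatment_a = [38, 26, 40]
treatment_b = [36, 42, 35]

# Rank the pooled data from lowest to highest (no ties in this example)
pooled = sorted(treatment_a + treatment_b)
rank_of = {value: i + 1 for i, value in enumerate(pooled)}
rank_sum_a = sum(rank_of[v] for v in treatment_a)
rank_sum_b = sum(rank_of[v] for v in treatment_b)

# Under the null hypothesis every choice of 3 ranks out of 6 is equally likely
n_a, n_total = len(treatment_a), len(pooled)
expected_sum = n_a * (n_total + 1) / 2
observed_deviation = abs(rank_sum_a - expected_sum)
possible_sums = [sum(c) for c in combinations(range(1, n_total + 1), n_a)]
as_extreme = sum(abs(s - expected_sum) >= observed_deviation for s in possible_sums)
p_two_sided = as_extreme / len(possible_sums)

print(rank_sum_a, rank_sum_b, p_two_sided)
```

With only three observations per group there are just 20 possible rank assignments, so even the most extreme result cannot give a two-sided p-value below 0.1.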
Kaplan Meier curve
Serial time
• Serial time (time-to-event) is the duration of the clinical course, which varies for each subject
• Begins: when the subject is enrolled into the study or treatment begins
• Ends: when the end-point (event of interest) is reached or the subject is censored
• A serial time of known survival that is terminated by the event of interest is known as an interval and is graphed as a horizontal line
Censoring
• Censoring means the total survival time for that subject cannot be
accurately determined.
• Negative : subject drops out, is lost to follow-up, or required data is not
available
• Positive (good) : study ends before the subject had the event of interest
occur, i.e., they survived at least until the end of the study, but there is no
knowledge of what happened thereafter.
• Thus censoring can occur within the study or terminally at the end.
• Censored subjects are indicated on the Kaplan-Meier curve as tick marks;
these do not terminate the interval.
• A large number of censored subjects should prompt questions about how the study was carried out, or whether the treatment was ineffective and many subjects dropped out
• The more patients censored, especially early, the less reliable the survival curve
• No censored patients - interpret with caution
Kaplan Meier Curve
• The lengths of the horizontal lines along the X-axis of
serial times represent the survival duration for that
interval.
• The interval is terminated by the occurrence of the
event of interest.
• The vertical lines are just for cosmesis; they make the
curve more pleasing to observe
• The cumulative probability of surviving a given time is
seen on the Y-axis
• Group 1 - probability of surviving 11 months is 100%
• Group 2 - probability of surviving 11 months is 66.7%
Kaplan Meier curve
• The cumulative probability defines the probability at
the beginning and throughout the interval. This is
graphed on the Y-axis of the curve.
• The interval survival rate (or probability) defines the
probability of surviving past the interval, i.e. still
surviving after the interval and beginning the next.
• Cumulative survival rate (probability) in interval three
of 0.667
• Median survival: found by drawing a horizontal line at the 50% mark and reading the time at which it crosses the survival curve
• Compare 2 curves by log rank test
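The relation between interval and cumulative survival can be illustrated with a small calculation: the cumulative probability is the running product of the interval survival probabilities. The sketch below uses hypothetical follow-up data (not the groups shown on the slides), and the function names are illustrative.

```python
# Hypothetical subjects: (serial time in months, event observed?)
# False means the subject was censored at that time
subjects = [(3, True), (5, False), (8, True), (11, True), (14, False), (20, True)]

def kaplan_meier(records):
    """Return [(time, cumulative survival)] at each time an event occurs."""
    at_risk = len(records)
    cumulative = 1.0
    curve = []
    for t in sorted({time for time, _ in records}):
        events = sum(1 for time, ev in records if time == t and ev)
        censored = sum(1 for time, ev in records if time == t and not ev)
        if events:
            # Interval survival: proportion of those at risk who survive past t
            interval_survival = (at_risk - events) / at_risk
            cumulative *= interval_survival
            curve.append((t, cumulative))
        # Censored subjects leave the risk set without producing a step
        at_risk -= events + censored
    return curve

def median_survival(curve):
    """First time at which cumulative survival falls to 50% or below."""
    for t, survival in curve:
        if survival <= 0.5:
            return t
    return None  # the curve never reaches the 50% mark

curve = kaplan_meier(subjects)
print(curve, median_survival(curve))
```

Censored subjects reduce the number at risk for later intervals but do not lower the curve themselves, which is why heavy early censoring makes later estimates less reliable.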
Kaplan Meier curve
• The term "survival curve" is somewhat misleading, as the event can be any time-to-event outcome, e.g. time to tumor recurrence, time to extubation after a surgical procedure, or time to rejection of a kidney transplant
• Many small steps - higher number of subjects,
whereas curves with large steps - limited number
of subjects (less accurate).
• Survivor function at the far right of a Kaplan-
Meier survival curve - interpret cautiously….
fewer patients remaining, so survival estimates
are not as accurate
ROC curve

• It shows the tradeoff between sensitivity and specificity (any increase in sensitivity will be accompanied by a decrease in specificity).
• The closer the curve follows the left-hand border and then the top border
of the ROC space, the more accurate the test.
• The closer the curve comes to the 45-degree diagonal of the ROC space,
the less accurate the test.
ROC curve
• The slope of the tangent line at a cutpoint gives the
likelihood ratio (LR) for that value of the test.
• The area under the curve is a measure of test accuracy. The
area measures discrimination, that is, the ability of the test
to correctly classify those with and without the disease.
• .90-1 = excellent
• .80-.90 = good
• .70-.80 = fair
• .60-.70 = poor
• .50-.60 = fail
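The area under the curve can be computed by sweeping the cutpoint and applying the trapezoid rule. The sketch below uses hypothetical test scores, assuming higher scores indicate disease; the function names are illustrative.

```python
# Hypothetical test scores; higher values are assumed to suggest disease
diseased = [0.9, 0.8, 0.7, 0.55, 0.4]
healthy = [0.6, 0.5, 0.3, 0.2, 0.1]

def roc_points(pos, neg):
    """Return (1 - specificity, sensitivity) pairs for every cutpoint."""
    points = [(0.0, 0.0)]
    for cut in sorted(set(pos + neg), reverse=True):
        sensitivity = sum(s >= cut for s in pos) / len(pos)
        false_positive_rate = sum(s >= cut for s in neg) / len(neg)
        points.append((false_positive_rate, sensitivity))
    return points

def area_under_curve(points):
    """Trapezoid rule over consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

points = roc_points(diseased, healthy)
auc = area_under_curve(points)
print(round(auc, 2))
```

An area of 0.5 corresponds to the 45-degree diagonal (no discrimination), and an area of 1 to a curve that follows the left-hand and top borders.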
