Bio Statistics Notes

Biostatistics
Biostatistics is a combination of two words, “Bio” and “Statistics”.

The “Bio” part refers to biology, “the study of living things”, while the “Statistics” part refers to “the accumulation, tracking, analysis and application of data”. Biostatistics is the branch of statistics concerned with medical and health applications, and it underpins the methodology of epidemiological investigation and research. It is the use of statistical procedures and analysis in the study and practice of biology. In simple words, the branch of statistics that deals with data relating to living organisms is called biostatistics: statistical processes and methods applied to the collection, analysis and interpretation of biological data, especially data relating to human biology, health and medicine.

Applications of Biostatistics
Biostatistics has applications in all life sciences. A few applications of biostatistics are summarized below.

a) Community Medicine and Public Health

In modern medicine, biostatistics is used to determine how diseases develop, progress and spread. For example, biostatisticians use statistics to predict the behavior of an illness like flu: its mortality rate, its symptoms and even the time of year people might get it. Medical research uses biostatistics from beginning to end.

b) In Demography

It is used in estimating the attributes of a population, such as sex ratio, birth rates, population density, etc.

c) In Pharmacology
To find the action of a drug, the drug is given to animals or humans to see whether the observed changes are produced by the drug or by chance.
d) In Research
Research is incomplete without statistics. Every result needs to be statistically validated. Designing the experiment, selecting the method of data collection and deriving logical conclusions from the data all require a sound knowledge of statistics.

Variable

Prepared By: Sir Zahawat Sahib


A variable is any characteristic, number or quantity that can be measured or counted. It is a characteristic that takes on different values in different persons, places or things. Examples of variables include diastolic blood pressure, heart rate, the height of adult males, the weight of preschool children and the age of patients seen in a dental clinic.

Types of Variable
a. Quantitative Variables

Quantitative variables are measured on a numerical scale, that is, they can be measured in the usual sense. For example, the height of adult males, the weight of preschool children and the age of patients seen in a dental clinic are quantitative variables. Measurements made on quantitative variables convey information regarding amount.

b. Qualitative Variables

Qualitative variables are also called categorical variables. Many characteristics cannot be measured numerically. Some of them can be ordered (ordinal) and some cannot be ordered (nominal). Qualitative variables can be coded to appear numerical, but their numbers are meaningless. Examples are the classification of people into socio-economic groups, examination grades, etc.

Types of Quantitative Variable


Discrete Variables

A discrete variable is characterized by gaps or interruptions in the values that it can assume. These gaps indicate the absence of values between the particular values that the variable can assume. It takes only whole numbers. Examples are the number of admissions to general hospitals, the number of decayed or missing teeth per child in an elementary school, and the number of prescriptions an individual takes daily.

Continuous Variables

A continuous variable can assume any value within a specified relevant interval. Examples of continuous variables include the various measurements that can be made on individuals, such as height (inches), weight (pounds), skull circumference, heartbeat, blood pressure and time to recovery (days).

Scale of Measurements
Not all characteristics in life can be measured on the same scale, nor is the same statistical procedure appropriate for handling every type of measurement. The psychologist Stanley Smith Stevens proposed four scales of measurement which cover nearly all areas of learning.

They are:

Nominal, Ordinal, Interval, Ratio

Nominal Scale
As the name implies, it consists of “naming or labeling” observations, or classifying them into various mutually exclusive and collectively exhaustive categories, and the observations in each category are counted. Examples are gender (male, female) and marital status (married, unmarried). Data obtained on a nominal scale are called nominal data or qualitative data and are analyzed by the statistics of attributes. The summary statistic computed from such data is the mode. Nominal data can be represented by a pie chart or bar chart.

Example: gender (Male = 1, Female = 2), or baseball uniform numbers, where the number provides no insight into the play.

Ordinal Scale
Qualitative observations can be ranked or ordered according to some criterion, e.g. with respect to some quality or performance, but the interval between categories is unknown or unequal. An ordinal scale possesses a natural ordering, for example qualification (Matric, Inter, BA/BSc, Master, M.Phil., Ph.D.) or feelings (very unhappy, unhappy, ok, happy, very happy). The defect of this scale is that it has unequal intervals, i.e. we do not know how much better one category is than another, nor can we say that the difference between ok and unhappy is the same as the difference between very happy and happy. Data obtained on an ordinal scale are called ordinal or ranked data. Summary statistics such as the median, percentiles and Spearman's rank correlation coefficient are computed from ordinal data. Ordinal data are not well represented by a pie chart; the best choice is a column or bar chart. Note: an ordinal scale implies a statement of “greater than” or “less than” without being able to state how much greater or less.

Interval Scale
The interval scale has numeric, ordered values with fixed (equal) intervals, and it can go below zero (it can take negative values) because “0” is not a true zero. On an interval scale the distance between adjacent values is the same, i.e. the distance between 5 and 6 degrees is the same as that between 7 and 8 degrees. It can also go below zero, for example the temperature of ice at -5 degrees. Unlike the ordinal scale, the interval scale tells us not only which values are smaller or bigger but also how much bigger or smaller they are. For example, if it is 45° on Sunday and 55° on Monday, we know not only that it was hotter on Monday but also that it was 10° hotter. Zero is a meaningful value on this scale and does not mean the absence of the quantity, i.e. a temperature of zero degrees is hotter than -1 degree. Statistics such as the mean, median and mode can be easily calculated from interval data.

Ratio Scale

This scale has numeric, ordered values with equal intervals but cannot go below zero, i.e. it takes only non-negative values. Here zero is a true zero. For example, a heart beat of “0” means no life, and similarly “0” money means no money. We can use the words “twice” and “thrice” on this scale. The ratio scale is used for measuring height, length, width, etc. A wide range of both descriptive and inferential statistics, for example the mean, median, mode and C.V, can be calculated from ratio data. A ratio scale has all the properties of an interval scale plus a true zero.

Independent Variable
An independent variable is presumed to influence other variables. Independent variables are sometimes called manipulated variables or experimental variables. The independent variable is the presumed cause, whereas the dependent variable is the presumed effect.

For example: how stress affects the mental state of a human being.

Dependent Variables
A dependent variable is presumed to be affected by one or more independent variables. The dependent variable is often called an outcome variable.

For example: if we are interested in how stress affects heart rate in humans, stress will be the independent variable and heart rate will be the dependent variable.

Intervening/Mediating Variable
An intervening (mediating) variable is a variable whose existence is inferred but which cannot be measured. For example, in determining the effect of video clips on the learning ability of B.S students, the association between video clips and learning ability needs to be explained.

Numerical Data/Quantitative Data


Numerical or quantitative data are measurements expressed in terms of numbers. These data have meaning as a measurement, such as a person's height, weight or blood pressure, or they are a count, such as the number of stock shares a person owns. Numerical data can be further broken into two types: discrete and continuous.

Discrete Data

Discrete data represent items that can be counted; they take on possible values that can be listed out. The list of possible values may be fixed (finite), or it may go from 0, 1, 2 on to infinity (countably infinite). For example, the number of heads in 100 coin flips takes on values 0 to 100 (the finite case), but the number of flips needed to get 100 heads takes on values from 100 up to infinity (if you never get that 100th head). Its possible values are listed as 100, 101, 102, ..., representing the countably infinite case.

Continuous Data

Continuous data represent measurements; their possible values cannot be counted and can only be described using intervals on the real number line. For example, the exact amount of gas purchased at a pump for cars with 20-gallon tanks would be continuous data, from zero gallons to 20 gallons, represented by the interval [0, 20] inclusive. You might pump 8.40 gallons, or 8.41, or 8.414 gallons, or any possible number from 0 to 20. In this way continuous data can be thought of as being uncountably infinite.

Qualitative/Categorical Data
Qualitative data are categorical measurements expressed not in terms of numbers but in kinds or names. In statistics, “qualitative data” is often used interchangeably with “categorical data”. Categorical data represent characteristics such as a person's gender, marital status or the types of movies they like. Categorical data can take on numerical values, such as “1” indicating male and “2” indicating female, but those numbers do not have mathematical meaning. A classic example classifying the following items as categorical or numerical data is given below.

Amount of money earned last week, birth date, exercise, favorite sport, hours of sleep per night, language mostly spoken at home, foot length, opinion on environment conservation, etc.

Categorical Data                        Numerical Data
Favorite sport                          Birth date
Language mostly spoken at home          Exercise
Opinion on environment conservation     Hours of sleep per night
State/Territory live in                 Foot length

Important Question About the types of Variables

Question: What kind of Variable is Marital Status?

Answer: Marital status is qualitative/categorical variable. It can take on values such as “Married”,
“Widowed”, and “divorced”.

Question: What kind of Variable is song length?

Answer: Song length is a quantitative variable. It can take on values such as “180 seconds”, “189.2 seconds” and “210.0039 seconds”. It is a continuous quantitative variable because it can take on an infinite number of values.

Survival Analysis
Survival analysis is a collection of statistical procedures for data analysis in which the outcome variable of interest is the time until an event occurs. By time, we mean years, months, weeks or days from the beginning of follow-up of an individual until an event occurs. Alternatively, time can refer to the age of an individual when an event occurs. By event, we mean death, disease incidence, relapse from remission or recovery.

Censored Data
Censoring occurs when we have some information about an individual's survival time, but we do not know the survival time exactly.

For example

Leukemia Patients

As a simple example of censoring, consider leukemia patients followed until they go out of remission (shown as “X”). If, for a given patient, the study ends while the patient is still in remission (i.e. the patient does not get the event), then the patient's survival time is considered censored. We know that for this person the survival time is at least as long as the period that the person has been followed, but if the person goes out of remission after the study ends, we do not know the complete survival time.

Causes of Censoring
1) A person does not experience the event before the study ends.
2) A person is lost to follow-up during the study period.
3) A person withdraws from the study because of death (if death is not the event of
interest) or for some other reason (e.g. an adverse drug reaction).

These situations are illustrated as follows.

• Person A is followed from the start of the study until getting the event at week 5. Therefore person A's survival time is 5 weeks and is not censored.
• Person B is also observed from the start of the study but is followed to the end of the 12-week study period without getting the event. The survival time here is censored because we can say only that it is at least 12 weeks.
• Person C enters the study between the 2nd and 3rd week and is followed until he or she withdraws from the study at week 6. This person's survival time is censored after 3.5 weeks.
• Person D enters the study at week 4 and is followed to the end of the study without getting the event. This person's censored time is 8 weeks.
• Person E enters the study at week 3 and is followed until week 9, when he is lost to follow-up. His censored time is therefore 6 weeks.
• Person F enters at week 8 and is followed until getting the event at week 11.5. Therefore the survival time is 3.5 weeks.

In short, of the six persons, two were observed to get the event (persons A and F) and four were censored (B, C, D and E).

A table of the survival time data for the six persons is presented below.

Person    Survival time (weeks)    Failure (1) / Censored (0)
A         5                        1
B         12                       0
C         3.5                      0
D         8                        0
E         6                        0
F         3.5                      1
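The table can be rebuilt from each person's entry and exit week. The following is a minimal Python sketch, not part of the original notes. The `Subject` class and `survival_table` function are hypothetical names, and Person C's entry "between the 2nd and 3rd week" is assumed to be week 2.5, which matches the stated 3.5 weeks.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    name: str
    entry_week: float  # week the person entered the study
    exit_week: float   # week of event, withdrawal, loss to follow-up or study end
    event: bool        # True if the event was observed, False if censored


def survival_table(subjects):
    """Return (name, survival_time, status) rows; status 1 = failure, 0 = censored."""
    return [(s.name, s.exit_week - s.entry_week, 1 if s.event else 0)
            for s in subjects]


subjects = [
    Subject("A", 0, 5, True),
    Subject("B", 0, 12, False),   # followed to the end of the 12-week study
    Subject("C", 2.5, 6, False),  # entry week 2.5 is an assumption
    Subject("D", 4, 12, False),
    Subject("E", 3, 9, False),
    Subject("F", 8, 11.5, True),
]

table = survival_table(subjects)
for name, time, status in table:
    print(name, time, status)
```

Each survival time is simply exit week minus entry week; the status flag records whether that time is a complete or a censored observation.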

Types of Censoring
Right Censoring

When a person's true survival time is incomplete at the right side of the follow-up period, which occurs when the study ends or when the person is lost to follow-up or withdraws, this is called right censoring.

Left Censoring

When a person's true survival time is incomplete at the left side of the follow-up period for that person, this is called left censoring. For example, if we are following persons with HIV infection, we may start follow-up when a subject first tests positive for the HIV virus, but we may not know exactly when the subject was first exposed to the virus. Thus the survival time is censored on the left side.

Sampled Population

A population from which a sample is drawn or chosen is called sampled population

Target Population

A population about which information is required or wanted is called target population.

Explanation

Suppose we want to know the opinion of GPGC Nowshera students about the examination system. Then the sampled population may consist of the students of the Statistics department, the Political Science department, etc., and the target population will consist of all students in GPGC Nowshera.

Odds
The odds in favor of an event are the ratio of the probability that the event will happen to the probability that it will not happen.

For example: the odds that a randomly chosen day of the week is a Sunday are one to six, which is sometimes written as 1/6 or 1:6.

Example: There are 5 pink marbles, 2 blue and 8 purple. What are the odds in favor of picking 1 blue marble?

Solution: The odds of picking one blue marble are odds = p/q, where p is the probability of picking one blue marble:

$$p = \frac{\binom{2}{1}}{\binom{15}{1}} = \frac{2}{15}$$

and q = 1 - p is the probability of not picking a blue marble:

$$1 - p = \frac{\binom{13}{1}}{\binom{15}{1}} = \frac{13}{15} \quad \text{or} \quad 1 - p = 1 - \frac{2}{15} = \frac{13}{15}$$

$$\text{Odds} = \frac{p}{q} = \frac{2/15}{13/15} = \frac{2}{13}$$

It means that the odds of picking a blue marble are 2 to 13, i.e. picking a blue marble is less than half as likely as picking a marble of another color.

Example: The probability of diabetes in a patient is 5%. Find the odds of diabetes.

Solution: The odds of diabetes in a patient are

$$\text{Odds} = \frac{p}{1 - p} = \frac{p}{q}$$

where p is the probability of diabetes in a patient, 5% = 5/100 = 0.05, and

1 - p is the probability of no diabetes in a patient = 1 - 0.05 = 0.95.

$$\text{Odds} = \frac{0.05}{0.95} = \frac{1}{19}$$

Odds = 1 : 19

A patient is 19 times more likely not to have diabetes than to have it.

Odds Ratio
It is defined as the ratio of the odds of an event occurring in one group to the odds of it occurring in another group, i.e. the odds ratio compares the relative odds in the two groups.

Consider the typical two-by-two contingency table:

         X-      X+      Total
Y-       a       b       a+b
Y+       c       d       c+d
Total    a+c     b+d     n

Since the odds ratio is the ratio of two odds,

$$\text{O.R} = \frac{a/b}{c/d} = \frac{ad}{bc}$$

Odds can be computed from probability, and probability can be computed from odds.

$$\text{Odds in favor of } A = \frac{p(A)}{1 - p(A)}$$

$$p(A) = \frac{\text{Odds in favor of } A}{1 + \text{Odds in favor of } A}$$

Note that if the odds are the same in each row, then the odds ratio is 1.
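The two conversion formulas can be checked against the marble and diabetes examples. This is a minimal Python sketch, not part of the original notes; the function names `prob_to_odds` and `odds_to_prob` are hypothetical. `Fraction` is used so the results stay exact.

```python
from fractions import Fraction


def prob_to_odds(p):
    """Odds in favor of A = p(A) / (1 - p(A)); requires 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def odds_to_prob(odds):
    """p(A) = odds / (1 + odds); requires odds >= 0."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


blue_odds = prob_to_odds(Fraction(2, 15))       # marble example
diabetes_odds = prob_to_odds(Fraction(5, 100))  # diabetes example

print(blue_odds, diabetes_odds)       # 2/13 1/19
print(odds_to_prob(blue_odds))        # 2/15, back to the original probability
```

Converting a probability to odds and back returns the original probability, which is a quick way to confirm that the two formulas are inverses of each other.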

Interpretation

If O.R = 1

An odds ratio of 1 indicates that the condition or event under study is equally likely to occur in both groups.

If O.R > 1

An odds ratio greater than 1 indicates that the condition or event under study is more likely to occur in the first group.

If O.R < 1

An odds ratio less than 1 indicates that the condition or event under study is less likely to occur in the first group.

The odds ratio must be non-negative, i.e. O.R ≥ 0. If the odds in the first group approach zero, the odds ratio approaches zero. When the odds in the second group approach zero, the odds ratio approaches ∞.

Example: Consider the following data on the survival of passengers on the Titanic. There were 851 male passengers, of whom 142 survived and 709 died, and 462 female passengers, of whom 308 survived and 154 died. Compute the odds ratio and interpret your result.

          Dead    Alive    Total
Male      709     142      851
Female    154     308      462
Total     863     450      1313

Solution:

First we calculate the odds.

$$\text{Odds of death among males} = \frac{a}{b} = \frac{709}{142}$$

$$\text{Odds of death among females} = \frac{c}{d} = \frac{154}{308}$$

Now we calculate the O.R.

$$\text{O.R} = \frac{\text{Odds of death among males}}{\text{Odds of death among females}} = \frac{709/142}{154/308} = 9.99$$

Interpretation:

The odds of dying on the Titanic were about 10 times higher for males than for females.

Example: Suppose that in a sample of 100 men, 60 have drunk wine in the previous week, while in a sample of 100 women, only 20 have drunk wine in the same period. Calculate the odds ratio and comment on your result.

         Drank wine    Did not drink wine    Total
Men      60            40                    100
Women    20            80                    100
Total    80            120                   200

Solution:

First we calculate the odds.

$$\text{Odds of men who drank wine} = \frac{a}{b} = \frac{60}{40}$$

$$\text{Odds of women who drank wine} = \frac{c}{d} = \frac{20}{80}$$

Now we calculate the O.R.

$$\text{O.R} = \frac{\text{Odds of men who drank wine}}{\text{Odds of women who drank wine}} = \frac{60/40}{20/80} = 6$$

Interpretation:

The odds of having drunk wine in the previous week are 6 times higher for men than for women.

Question: The prevalence of smoking among lung cancer patients is 95 per 100, and the prevalence of smoking among people without lung cancer is 25 per 100. Calculate the odds ratio and comment on your result.

                                 Smoking    Non-smoking    Total
Lung cancer patients             95         5              100
Patients without lung cancer     25         75             100
Total                            120        80             200

Solution:

First we calculate the odds.

$$\text{Odds of smoking among lung cancer patients} = \frac{a}{b} = \frac{95}{5}$$

$$\text{Odds of smoking among patients without lung cancer} = \frac{c}{d} = \frac{25}{75}$$

Now we calculate the O.R.

$$\text{O.R} = \frac{\text{Odds of smoking among lung cancer patients}}{\text{Odds of smoking among patients without lung cancer}} = \frac{95/5}{25/75} = 57$$

Interpretation:

The odds of smoking are 57 times higher among patients with lung cancer than among patients without lung cancer.
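The three worked odds ratios above (Titanic, wine and smoking) can be recomputed with a few lines of Python. This is an illustrative sketch only; the function name `odds_ratio` and the example labels are not from the original notes.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a/b) / (c/d) for a 2x2 table with rows (a, b) and (c, d)."""
    return (a / b) / (c / d)


# (label, a, b, c, d) taken from the three tables above
examples = [
    ("Titanic, death: male vs female", 709, 142, 154, 308),
    ("Wine: men vs women", 60, 40, 20, 80),
    ("Smoking: lung cancer vs no lung cancer", 95, 5, 25, 75),
]

results = {label: odds_ratio(a, b, c, d) for label, a, b, c, d in examples}
for label, value in results.items():
    print(f"{label}: O.R = {value:.2f}")
```

Writing the ratio as (a/b)/(c/d) mirrors the two-step solutions above; it is algebraically the same as ad/bc.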

Important Questions

Question: What does an odds ratio of 0.5 mean?

Answer: An odds ratio of 0.5 means that the odds of the exposure being found in the case group are 50% lower than the odds of finding the exposure in the control group.

Question: What does an odds ratio of 0.75 mean?

Answer: An odds ratio of 0.75 means that the odds of the outcome in the first group are 25% lower than in the second group, i.e. an odds ratio less than 1 means that the first group is less likely to experience the event. Equivalently, the reciprocal 1/0.75 = 1.33 means that the odds of the outcome in the second group are 33% higher than in the first group.

Standard Error of log Odds Ratio
The standard error of the log odds ratio is estimated simply by the square root of the sum of the reciprocals of the four frequencies.

$$\text{S.E}(\ln \hat{\theta}) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

Knowing this S.E, one can test the hypothesis $H_0: \ln(\theta) = 0$ and construct the confidence interval

$$\ln(\hat{\theta}) \pm Z_{\alpha/2} \, \text{S.E}(\ln \hat{\theta})$$

where $Z_{\alpha/2}$ is the value of Z defining the confidence limits.

Example:

         Drank wine    Did not drink wine    Total
Men      60            40                    100
Women    30            70                    100
Total    90            110                   200

Calculate (1) the odds ratio, (2) test the hypothesis ln(θ) = 0, (3) the C.I for ln(θ).

(1) Odds Ratio

Solution:

First we calculate the odds.

$$\text{Odds of men who drank wine} = \frac{a}{b} = \frac{60}{40}$$

$$\text{Odds of women who drank wine} = \frac{c}{d} = \frac{30}{70}$$

Now we calculate the O.R.

$$\text{O.R} = \frac{\text{Odds of men who drank wine}}{\text{Odds of women who drank wine}} = \frac{60/40}{30/70} = 3.5$$

Interpretation:

The odds of drinking wine are 3.5 times higher for men than for women.

(2) Test the hypothesis ln(θ) = 0

Solution:

1) We formulate our null and alternative hypotheses as

$$H_0: \ln(\theta) = 0 \quad \text{vs.} \quad H_1: \ln(\theta) \neq 0$$

2) Level of significance

We set α = 0.05.

3) Test statistic to be used

$$Z = \frac{\ln(\hat{\theta})}{\text{S.E}(\ln \hat{\theta})}$$

4) Computation

$$\hat{\theta} = 3.5, \quad \ln(\hat{\theta}) = 1.2527$$

$$Z = \frac{1.2527}{\sqrt{\frac{1}{60} + \frac{1}{40} + \frac{1}{30} + \frac{1}{70}}} = \frac{1.2527}{0.2988} = 4.192$$

5) Critical region

$$|Z| \geq Z_{0.025} = 1.96$$

6) Conclusion

Since Z = 4.19 falls in the critical region, we reject H0 and conclude that the association between sex and drinking wine is significant at the α = 0.05 level.

(3) C.I for ln(θ)

The 100(1-α)% C.I for ln(θ) is

$$\ln(\hat{\theta}) \pm 1.96 \, \text{S.E}(\ln \hat{\theta})$$

$$\ln(\hat{\theta}) = 1.2527, \quad \text{S.E}(\ln \hat{\theta}) = 0.2988$$

$$1.2527 \pm 1.96(0.2988)$$

$$(1.2527 - 0.5856, \; 1.2527 + 0.5856)$$

$$(0.6671, \; 1.8383)$$

The 95% C.I for θ is

$$(e^{0.6671}, \; e^{1.8383}) = (1.949, \; 6.286)$$

Since the C.I for θ does not include 1, there is a significant association between gender and drinking wine.
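The three parts of this example (odds ratio, Z test and confidence interval) can be reproduced in one short Python sketch. It is illustrative only and not part of the original notes; the function name `log_odds_ratio_inference` and the dictionary keys are hypothetical, and the critical value 1.96 is hard-coded for a 95% interval.

```python
import math


def log_odds_ratio_inference(a, b, c, d, z_crit=1.96):
    """Odds ratio, S.E of ln(O.R), Z statistic and C.I for a 2x2 table."""
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se                      # test statistic for H0: ln(theta) = 0
    lower, upper = log_or - z_crit * se, log_or + z_crit * se
    return {
        "odds_ratio": odds_ratio,
        "log_or": log_or,
        "se": se,
        "z": z,
        "ci_log": (lower, upper),                    # C.I for ln(theta)
        "ci_or": (math.exp(lower), math.exp(upper)), # C.I for theta
    }


# Men: 60 drank wine, 40 did not; women: 30 drank wine, 70 did not
res = log_odds_ratio_inference(60, 40, 30, 70)
for key, value in res.items():
    print(key, value)
```

The hypothesis is rejected when |z| exceeds `z_crit`, which is equivalent to the confidence interval for θ excluding 1.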

Question:

Who is more likely to drink beer on Queen's Day, a student or a teacher?

           Drink beer    Do not drink beer    Total
Student    90            10                   100
Teacher    80            20                   100
Total      170           30                   200

Incidence

Incidence is a measure of disease that allows us to determine a person's probability of being diagnosed with a disease during a given period of time. Incidence is therefore the number of newly diagnosed cases of a disease. An incidence rate is the number of new cases of a disease divided by the number of persons at risk for that disease.

$$\text{Incidence} = \frac{\text{New cases}}{\text{Total population}}$$

OR

$$\text{Incidence Rate} = \frac{\text{No. of new cases}}{\text{No. of people at risk in the given time frame}}$$

Example: Over the course of 1 year, 5 women are diagnosed with breast cancer out of a total female study population of 200. Find the incidence.

Solution: Five women are diagnosed with breast cancer out of a total female study population of 200 (who did not have breast cancer at the beginning of the study period). Then we would say that the incidence of breast cancer in this population is:

$$\text{Incidence} = \frac{5}{200} = 0.025$$

$$\text{Incidence} = 0.025 \times 1000 = 25 \text{ cases per } 1000$$

Question: In a population of 1000 non-diseased persons, 28 were infected with HIV over two years of observation. Find the incidence.

Solution:

$$\text{Incidence} = \frac{\text{New cases}}{\text{Total population}} \times K$$

$$\text{Incidence} = \frac{28}{1000} \times 1000 = 28 \text{ cases per } 1000$$

Incidence = 2.8% over the two-year period

Question: 100 new cases occurred in a population of 50000 in a year. Calculate the incidence rate.

Solution:

$$\text{Incidence Proportion} = \frac{\text{No. of new cases}}{\text{No. of people at risk in the given time frame}} \times K$$

$$\text{Incidence Proportion} = \frac{100}{50000} \times 1000$$

Incidence Proportion = 2 per 1000 persons per year
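The three incidence calculations above share one formula with a multiplier K. A minimal Python sketch follows; it is not part of the original notes, and the function name `incidence` and its parameter names are hypothetical.

```python
def incidence(new_cases, population_at_risk, k=1000):
    """New cases per k persons at risk over the observation period."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return new_cases / population_at_risk * k


breast_cancer = incidence(5, 200)   # per 1000 women over 1 year
hiv = incidence(28, 1000)           # per 1000 persons over 2 years
new_cases = incidence(100, 50000)   # per 1000 persons per year

print(round(breast_cancer, 2), round(hiv, 2), round(new_cases, 2))
```

Setting `k=100` gives a percentage instead, for example 2.8% for the HIV question.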

Prevalence
It refers to all cases, old and new, existing at a given point or period of time in a given population: the total number of individuals who have an attribute or disease at a particular time (or during a particular period) divided by the population at risk at that time (or the mid-year population). A prevalence rate is the total number of cases of a disease existing in a population divided by the total population.

$$\text{Prevalence Rate} = \frac{\text{old} + \text{new cases}}{\text{Total population}} \times K$$

Question: A measurement of cancer cases is taken in a population of 40000 people; 1200 were recently diagnosed and 3500 are living with cancer. Find the prevalence rate.

Solution:

$$\text{Prevalence Rate} = \frac{\text{old} + \text{new cases}}{\text{Total population}} \times K$$

$$\text{Prevalence Rate} = \frac{1200 + 3500}{40000} \times 1000 = 117.5 \approx 118$$

The prevalence of cancer is about 118 per 1000 persons in the population.

Types of Prevalence
Point Prevalence
The number of all current cases (old + new) of a disease at one point of time in relation to a defined population at that point of time. The point of time may be a day, several days, weeks, etc., depending upon the time required to examine the entire population.

$$\text{Point Prevalence} = \frac{\text{No. of all cases (old + new) of a specified disease existing at a given point of time}}{\text{Estimated population at the same point of time}}$$

Question: A health extension worker conducted a survey in a nearby elementary school on 10th March 1997 to know the prevalence of trachoma in that school. The total number of students in that school was 200. The health extension worker examined all 200 students for trachoma, and 100 students were found to have trachoma. Calculate the point prevalence rate of trachoma in that school.

Solution:

$$\text{Point Prevalence} = \frac{\text{All students with trachoma on 10 March 1997}}{\text{Total population in that school}} \times 100$$

$$\text{Point Prevalence} = \frac{100}{200} \times 100 = 50$$

So there were 50 trachoma patients per 100 students on 10th March 1997, which means that 50% of the students in that school were affected by trachoma.

Period Prevalence
The proportion of individuals in a specified population at risk who have the disease of interest over a specified period of time, e.g. annual prevalence or lifetime prevalence. (When the type of prevalence rate is not specified, it is usually point prevalence.)

$$\text{Period Prevalence} = \frac{\text{No. of existing cases and new cases}}{\text{Total population}}$$

Question: Between June 30 and August 30, 1999, in an average population of 1600, there were 29 existing cases of hepatitis B on June 30 and 6 incident (new) cases of hepatitis B between July 1 and August 30. Find the period prevalence.

Solution:

$$\text{Period Prevalence} = \frac{\text{No. of existing cases and new cases}}{\text{Total population}}$$

$$\text{Period Prevalence} = \frac{29 + 6}{1600} = 0.022$$

So about 2.2% of the total population had the disease during the period.

Question: In a population of 1000 people, 100 people had diabetes on December 31, 2014. On June 30, 2015, the number of people who currently have diabetes is 125, so the number of new cases is 25. Find the prevalence on June 30, 2015 and the incidence rate.

Solution:

$$\text{Point Prevalence} = \frac{\text{No. of all cases (old + new) of a specified disease existing at a given point of time}}{\text{Estimated population at the same point of time}}$$

$$\text{Point Prevalence} = \frac{125}{1000} = 0.125$$

$$\text{Incidence Proportion} = \frac{\text{No. of new cases}}{\text{No. of people at risk in the given time frame}}$$

The people at risk are the 1000 - 100 = 900 people who did not have diabetes at the start of the period.

$$\text{Incidence Proportion} = \frac{25}{900} = 0.028$$
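The diabetes question shows why prevalence and incidence use different denominators: prevalence divides by the whole population, while incidence divides only by those at risk (people who were disease-free at the start). A minimal Python sketch, with hypothetical function names not taken from the original notes:

```python
def point_prevalence(existing_cases, population):
    """All current cases (old + new) divided by the population at that time."""
    return existing_cases / population


def incidence_proportion(new_cases, population, cases_at_start):
    """New cases divided by the people at risk, i.e. disease-free at the start."""
    at_risk = population - cases_at_start
    if at_risk <= 0:
        raise ValueError("no one is at risk")
    return new_cases / at_risk


prevalence = point_prevalence(125, 1000)           # on June 30, 2015
incidence = incidence_proportion(25, 1000, 100)    # Dec 31, 2014 to June 30, 2015

print(f"prevalence = {prevalence:.3f}, incidence = {incidence:.3f}")
```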

Relative Risk
A relative risk can only be calculated from prospective studies (cohort studies). It is defined as the ratio of the incidence rate among the exposed to the incidence rate among the non-exposed.

Mathematically

$$\text{Relative Risk} = \frac{\text{Incidence rate among exposed}}{\text{Incidence rate among non-exposed}}$$

Consider the following 2×2 contingency table for the calculation of measures of association.

                   Outcome
Exposure     Present    Absent    Total
Present      a          b         a+b
Absent       c          d         c+d
Total        a+c        b+d       N

Interpretation
If R.R = 1

If R.R = 1, the incidence in the exposed is the same as the incidence in the non-exposed. There is no increased risk, i.e. no association between exposure and outcome.

If R.R > 1

If R.R > 1, the incidence in the exposed is greater than the incidence in the non-exposed. There is an increased risk of the outcome among the exposed. It is a positive association, i.e. the exposure is harmful, so those who are exposed are at higher risk of suffering from the disease than those who are not exposed.

If R.R < 1

If R.R < 1, the incidence in the exposed is lower than the incidence in the non-exposed, i.e. there is a decreased risk. It is a negative association. The exposure is protective.

For example: providing a vaccine to a group is the exposure, and not providing the vaccine is non-exposure. If R.R < 1, providing the vaccine is protective.

Note: The further the R.R is from 1, the stronger the association.

Example: Suppose we are researching the effect of benzene exposure on cancer. We go to a work center where there is potential for exposure to benzene. There are 483 people in the work center. However, only 212 are exposed to benzene in their work duties. Cancer is found in 58 (12%) of the work center employees, and 40 of the people with cancer were in the exposed group. Calculate the relative risk.

                        Cancer    No cancer    Total
Benzene exposure        40        172          212
No benzene exposure     18        253          271
Total                   58        425          483

Solution:

$$\text{Disease risk among exposed} = \frac{a}{a+b} = \frac{40}{212} = 0.1887$$

$$\text{Disease risk among not exposed} = \frac{c}{c+d} = \frac{18}{271} = 0.0664$$

Now we calculate the R.R.

$$\text{R.R} = \frac{\text{Disease risk among exposed}}{\text{Disease risk among not exposed}} = \frac{0.1887}{0.0664} = 2.84$$

Interpretation:

We can say that people exposed to benzene are 2.84 times more likely to get cancer than people not exposed to benzene.
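The benzene calculation can be checked with a short Python sketch. It is illustrative only; the function name `relative_risk` is not from the original notes.

```python
def relative_risk(a, b, c, d):
    """Return (risk in exposed, risk in non-exposed, relative risk).

    a, b: exposed with / without the outcome
    c, d: non-exposed with / without the outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed, risk_unexposed, risk_exposed / risk_unexposed


# Benzene example: 40 of 212 exposed and 18 of 271 non-exposed have cancer
risk_exp, risk_unexp, rr = relative_risk(40, 172, 18, 253)
print(f"{risk_exp:.4f} {risk_unexp:.4f} {rr:.2f}")
```

Unlike the odds ratio, the denominators here are the row totals a+b and c+d, not the counts without the outcome.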

Question:

                Outcome present    Outcome absent    Total
Exposed         366                32                398
Not exposed     64                 319               383
Total           430                351               781

Solution:

$$\text{Disease risk among exposed} = \frac{a}{a+b} = \frac{366}{398} = 0.9196$$

$$\text{Disease risk among not exposed} = \frac{c}{c+d} = \frac{64}{383} = 0.1671$$

Now we calculate the R.R.

$$\text{R.R} = \frac{\text{Disease risk among exposed}}{\text{Disease risk among not exposed}} = \frac{0.9196}{0.1671} = 5.50$$

Interpretation:

We can say that the exposed group is 5.50 times more likely to have the outcome than the non-exposed group.

Question:

                   Low birth weight (LBW)
Smoking status     Yes       No        Total
Smokers            120       240       360
Non-smokers        60        580       640
Total              180       820       1000

Solution:

$$\text{Incidence of LBW among smokers} = \frac{a}{a+b} = \frac{120}{360} = 0.3333$$

$$\text{Incidence of LBW among non-smokers} = \frac{c}{c+d} = \frac{60}{640} = 0.09375$$

Now we calculate the R.R.

$$\text{R.R} = \frac{\text{Incidence of LBW among smokers}}{\text{Incidence of LBW among non-smokers}} = \frac{0.3333}{0.09375} = 3.56$$

Interpretation:

Based on the study, smokers are about 3.6 times more likely to have a low birth weight baby than non-smokers.

Question:

In a prospective study of pregnant women, information was collected on the exercise level of low-risk
pregnant women. A group of 217 women did no voluntary exercise during the pregnancy, while a group of
238 women exercised extensively. The outcome variable of interest is experiencing preterm labor.
The results are summarized as:

Risk Factor                  Preterm Labor            Total
                           Cases    Non-cases
Extreme exercising           22        216             238
Not extreme exercising       18        199             217
Total                        40        415             455

Solution:

Incidence of preterm labor with extreme exercise = a/(a+b) = 22/238 = 0.0924

Incidence of preterm labor without extreme exercise = c/(c+d) = 18/217 = 0.0829

Now we calculate R.R

R.R = (Incidence with extreme exercise) / (Incidence without extreme exercise) = 0.0924/0.0829 = 1.11

Interpretation:

The results indicate that the risk of experiencing preterm labor when a woman exercises heavily
is 1.11 times the risk among women who do no voluntary exercise.
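
The relative risk calculations in the questions above can be checked with a short Python sketch. The function names and the a, b, c, d argument order (exposed with/without disease, then not exposed with/without disease) are illustrative assumptions that follow the 2×2 layout used in these notes. The odds ratio is included only to show that it is a different quantity from the R.R.

```python
def relative_risk(a, b, c, d):
    """R.R from a 2x2 table.

    a, b: exposed subjects with / without the disease
    c, d: not-exposed subjects with / without the disease
    """
    risk_exposed = a / (a + b)
    risk_not_exposed = c / (c + d)
    return risk_exposed / risk_not_exposed


def odds_ratio(a, b, c, d):
    """O.R = ad / bc for the same table (not the same thing as R.R)."""
    return (a * d) / (b * c)


# Benzene example from the notes
print(round(relative_risk(40, 172, 18, 253), 2))
print(round(odds_ratio(40, 172, 18, 253), 2))
```

For the benzene table the relative risk comes out near 2.84, while the odds ratio is noticeably larger, so the two labels should not be interchanged.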

C.I for R.R

To construct the C.I for R.R, follow these steps.

Estimate the R.R from the given data.

Find the natural log “ln” of R.R, i.e. ln(R.R).

Find the confidence coefficient from the standard normal distribution, i.e. 1.96 for a 95% C.I.

Calculate the standard error of ln(R.R) by using the formula

S.E ln(R.R) = √[ b/(a(a+b)) + d/(c(c+d)) ]

Calculate the lower and upper limits on the ln scale.

ln(R.R) ± 1.96 × S.E ln(R.R)

Use the exponential function to convert the limits back to the original scale,

i.e. exp(lower limit), exp(upper limit)

If the 95% C.I doesn’t contain the value “1”, the association is said to be statistically significant
at the α = 0.05 level.

Question:

Physicians enrolled in the Physicians’ Health Study were randomly assigned to take daily
aspirin or placebo. The table provides the number with myocardial infarction (M.I) in each group.

               Myocardial infarction
Group            Yes        No        Total
Aspirin          139      10898       11037
Placebo          239      10795       11034
Total            378      21693       22071

Calculate (1) the R.R (2) the 95% C.I for R.R

Solution:

Incidence of M.I among aspirin = a/(a+b) = 139/11037 = 0.0126

Incidence of M.I among placebo = c/(c+d) = 239/11034 = 0.0217

Now we calculate R.R

R.R = (Incidence of M.I among aspirin) / (Incidence of M.I among placebo) = 0.0126/0.0217 = 0.58

Interpretation:

The relative risk estimate is 0.58, which indicates that physicians in the aspirin group had a lower
risk of M.I than physicians in the placebo group.

Construct the 95% C.I for R.R

R.R = 0.58        ln(R.R) = −0.5447

S.E ln(R.R) = √[ b/(a(a+b)) + d/(c(c+d)) ]

S.E ln(R.R) = √[ 10898/(139 × 11037) + 10795/(239 × 11034) ]

S.E ln(R.R) = 0.1058

Now the 100(1 − α)% C.I for ln(R.R)

ln(R.R) ± 1.96 × S.E ln(R.R)

(−0.5447 − 0.207368, −0.5447 + 0.207368)

(−0.752068, −0.337332)

The 95% C.I for R.R

(e^(−0.752068), e^(−0.337332))

(0.4714, 0.7137)

The 95% C.I indicates that the decreased risk related to daily aspirin use is significant at the α = 0.05
level, since the interval does not contain “1”.
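
The steps for the C.I of the R.R can be sketched in Python as follows. The function name and the default z = 1.96 are assumptions made for illustration. The sketch keeps the R.R unrounded, so its limits differ slightly in the later decimals from the hand calculation above, which starts from the rounded value 0.58.

```python
import math


def rr_confidence_interval(a, b, c, d, z=1.96):
    """Return (R.R, S.E of ln R.R, lower limit, upper limit).

    a, b: exposed (treated) subjects with / without the event
    c, d: not-exposed (control) subjects with / without the event
    """
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(b / (a * (a + b)) + d / (c * (c + d)))
    # Limits are built on the ln scale, then transformed back with exp.
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, se, lower, upper


# Aspirin vs placebo table from the notes
rr, se, lower, upper = rr_confidence_interval(139, 10898, 239, 10795)
print(round(rr, 2), round(se, 4), round(lower, 2), round(upper, 2))
```

The interval stays below 1, which agrees with the conclusion that the reduced risk in the aspirin group is significant at the 0.05 level.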

Sensitivity P(T+ | D+)

It is the probability of a positive test result given that the individual has the disease, i.e. the likelihood of
a diseased individual getting a positive test result. It is also called the true positive rate. The complement of this is
the false negative rate.

P(T+ | D+) = a/(a+c) = TP/(TP+FN)

False Negative P(T- | D+)

The probability that the test result is negative when the person is actually suffering from the disease,
i.e. the probability of a negative test result given that the person has the disease.

P(T- | D+) = c/(a+c) = FN/(TP+FN)

Note: False Negative + True Positive = 1 or 100%

Specificity P(T- | D-)

The probability of a negative test result given that the individual doesn’t have the disease, i.e. the
likelihood of a non-diseased individual getting a negative test result. It is also called the true negative rate. The
complement of this is called the false positive rate.

P(T- | D-) = d/(b+d) = TN/(TN+FP)

False Positive P(T+ | D-)

The probability that the test result is positive when the person is actually not suffering from the
disease, i.e. the probability of a positive test result given that the person does not have the disease.

P(T+ | D-) = b/(b+d) = FP/(FP+TN)

Note: False Positive + True Negative = 1 or 100%

Screening Test        Disease              Total
                    D+         D-
T+                (a) TP     (b) FP         a+b
T-                (c) FN     (d) TN         c+d
Total              a+c        b+d            n

Positive Predictive Value (P.P.V) P(D+ | T+)

The probability that a person who tests positive has the disease, i.e. the probability that a subject has the
disease given that the subject has a positive test result.

P(D+ | T+) = a/(a+b) = TP/(TP+FP)

Negative Predictive Value (N.P.V) P(D- | T-)

The probability that a person who tests negative does not have the disease, i.e. the probability that a
subject doesn’t have the disease given that the subject has a negative test result.

P(D- | T-) = d/(c+d) = TN/(TN+FN)

Question:

The total number of positive tests is 350, out of which 200 actually have the disease. Up to 2018
the lab has tested 1000 cases, and the total number of patients is 400. Construct the table and calculate Sensitivity,
Specificity, False negative, False positive, P.P.V and N.P.V

Screening Test        Disease           Total
                    D+       D-
T+                  200      150         350
T-                  200      450         650
Total               400      600        1000

Sensitivity P(T+ | D+) = TP/(TP+FN) = a/(a+c) = 200/400 = 0.5

This means that there is a 50% chance that a person gets a positive test result when he actually
has the disease.

Specificity P(T- | D-) = TN/(TN+FP) = d/(b+d) = 450/600 = 0.75

This means that there is a 75% chance that a person gets a negative test result when he actually
has no disease.

False Negative P(T- | D+) = FN/(TP+FN) = c/(a+c) = 200/400 = 0.5

This means that there is a 50% chance that a person gets a negative test result when he actually
has the disease.

False Positive P(T+ | D-) = FP/(FP+TN) = b/(b+d) = 150/600 = 0.25

This means that there is a 25% chance that a person gets a positive test result when he actually
has no disease.

Positive Predictive Value (P.P.V) P(D+ | T+) = TP/(TP+FP) = a/(a+b) = 200/350 = 0.57

This means that there is a 57% chance that a person actually has the disease when his test result
is positive.

Negative Predictive Value (N.P.V) P(D- | T-) = TN/(TN+FN) = d/(c+d) = 450/650 = 0.69

This means that there is a 69% chance that a person actually has no disease when his test result
is negative.

Question:

Screening Test        Disease           Total
                    D+       D-
T+                   36       25          61
T-                    9      230         239
Total                45      255         300

Sensitivity P(T+ | D+) = TP/(TP+FN) = a/(a+c) = 36/45 = 0.8

This means that there is an 80% chance that a person gets a positive test result when he actually
has the disease.

Specificity P(T- | D-) = TN/(TN+FP) = d/(b+d) = 230/255 = 0.90

This means that there is a 90% chance that a person gets a negative test result when he actually
has no disease.

False Negative P(T- | D+) = FN/(TP+FN) = c/(a+c) = 9/45 = 0.2

This means that there is a 20% chance that a person gets a negative test result when he actually
has the disease.

False Positive P(T+ | D-) = FP/(FP+TN) = b/(b+d) = 25/255 = 0.098

This means that there is a 9.8% (about 10%) chance that a person gets a positive test result when he
actually has no disease.

Positive Predictive Value (P.P.V) P(D+ | T+) = TP/(TP+FP) = a/(a+b) = 36/61 = 0.59

This means that there is a 59% chance that a person actually has the disease when his test result
is positive.

Negative Predictive Value (N.P.V) P(D- | T-) = TN/(TN+FN) = d/(c+d) = 230/239 = 0.96

This means that there is a 96% chance that a person actually has no disease when his test result
is negative.

Note: P.P.V and N.P.V are affected by prevalence. When prevalence increases, P.P.V increases
and N.P.V decreases.

P.P.V = TP / (All Test Positive) = TP/(TP+FP)        N.P.V = TN / (All Test Negative) = TN/(FN+TN)

P.P.V increases with increased specificity, so the higher the specificity, the higher the P.P.V.

P.P.V also increases with prevalence. N.P.V increases with increased sensitivity and decreases
with increased prevalence, so the higher the prevalence, the lower the N.P.V.
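
The six screening-test measures defined above can be computed together from the four cells of the table. The following Python sketch is only an illustration: the function name and dictionary keys are assumptions, and the numbers are the second screening table from the notes.

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute the screening-test measures from the 2x2 cell counts."""
    return {
        "sensitivity": tp / (tp + fn),          # P(T+ | D+)
        "specificity": tn / (tn + fp),          # P(T- | D-)
        "false_negative_rate": fn / (tp + fn),  # P(T- | D+)
        "false_positive_rate": fp / (fp + tn),  # P(T+ | D-)
        "ppv": tp / (tp + fp),                  # P(D+ | T+)
        "npv": tn / (tn + fn),                  # P(D- | T-)
    }


# Second screening table: a = 36, b = 25, c = 9, d = 230
metrics = screening_metrics(tp=36, fp=25, fn=9, tn=230)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and the false negative rate add up to 1, as do specificity and the false positive rate, which matches the two notes in the definitions.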

Observational Studies
There are two basic types of Observational Studies

Prospective study

A prospective study is an observational study in which two random samples of subjects are
selected. One sample consists of subjects who possess the risk factor, and the other sample consists of
subjects who do not possess the risk factor. The subjects are followed into the future, i.e. they are
followed prospectively, and a record is kept of the number of subjects in each sample who at some point in
time are classifiable into each of the categories of the outcome variable. The data resulting from a
prospective study involving two dichotomous variables can be displayed in a 2×2 contingency table that
usually provides information regarding the number of subjects with and without the risk factor and the number
who did and did not succumb to the disease of interest, as well as the frequency for each combination
of categories of the two variables.

Classification of subjects with respect to disease status and risk factor

                    Disease Status
Risk Factor      Present     Absent      Total
Present             a           b         a+b
Absent              c           d         c+d
Total              a+c         b+d         n

Retrospective Study

A retrospective study is the reverse of a prospective study. The samples are selected from those falling
into the categories of the outcome variable. Then the investigator looks back (takes a retrospective look) at
the subjects and determines which ones have (or had) and which ones do not have (or did not have) the
risk factor. In general a prospective study is more expensive than a retrospective study.

Case Control Study

It is a type of retrospective study in which two groups with different known outcomes are
compared, that is, one group has the disease and the other doesn’t have the disease. We compare
the subjects who have a disease (the cases) with subjects who do not have that disease (the controls).
We calculate the Odds Ratio (O.R) from a case control study.

Risk Factor

A risk factor is something that increases your chance of getting a disease. Often this risk comes from
something you do. For example, smoking increases your chance of developing colon cancer, therefore
smoking is a risk factor for colon cancer.

Application of Bayes’ theorem to find P.P.V

Say you are given the prevalence as A, sensitivity as B, and specificity as C.

Probability of obtaining TP = A×B        Probability of obtaining FP = (1−A)×(1−C)

Now you have everything you need for P.P.V

P.P.V = TP/(TP+FP)

Application of Bayes’ theorem to find N.P.V

Say you are given the prevalence as A, sensitivity as B, and specificity as C.

Probability of obtaining TN = (1−A)×C        Probability of obtaining FN = A×(1−B)

With TN and FN, we have all we need to find N.P.V

N.P.V = TN/(TN+FN)

Bayes’ Rule by using previous information

P(A|B) = P(A).P(B|A) / P(B)

Sensitivity and specificity are easy to evaluate from a case control study, but predictivity requires that the
subjects be followed until such time that their disease status is confirmed as present or absent. This
could be very time consuming and expensive, so predictivity is difficult to evaluate directly. However, there is
another approach called Bayes’ rule, which utilizes some a priori (additional) information. If the
prevalence of the disease in the target population is known, predictivity can be found by using sensitivity
and specificity as shown below. We calculate these probabilities by using the knowledge of sensitivity
P(T+ | D+), specificity P(T- | D-) and the probability of the relevant disease in the general population P(D+),
which is usually obtained from another independent study.

Application of Bayes’ theorem to calculate P.P.V

P(D+ | T+) = P(T+ ∩ D+) / P(T+)

T+ = (T+ ∩ D+) ∪ (T+ ∩ D-)

P(T+) = P(T+ ∩ D+) + P(T+ ∩ D-)        Equation A

as T+ ∩ D+ and T+ ∩ D- are mutually exclusive.

P(T+ ∩ D+) = P(D+) . P(T+ | D+)

P(T+ ∩ D-) = P(D-) . P(T+ | D-)

Put in Equation A

P(T+) = P(D+) . P(T+ | D+) + P(D-) . P(T+ | D-)

Therefore we reach the following version of Bayes’ theorem for P.P.V

P(D+ | T+) = P(D+) . P(T+ | D+) / [ P(D+) . P(T+ | D+) + P(D-) . P(T+ | D-) ]

P(T+ | D+) = Sensitivity        P(T+ | D-) = 1 − Specificity

P(D+) = Probability of disease in the general population.

P(D-) = 1 − P(D+)

Application of Bayes’ theorem to calculate N.P.V

P(D- | T-) = P(D- ∩ T-) / P(T-)

P(D- | T-) = P(D-) . P(T- | D-) / P(T-)

T- = (T- ∩ D-) ∪ (T- ∩ D+)

P(T-) = P(T- ∩ D-) + P(T- ∩ D+)        Equation A

as T- ∩ D- and T- ∩ D+ are mutually exclusive.

P(T- ∩ D-) = P(D-) . P(T- | D-)

P(T- ∩ D+) = P(D+) . P(T- | D+)

Put in Equation A

P(T-) = P(D-) . P(T- | D-) + P(D+) . P(T- | D+)

Therefore we reach the following version of Bayes’ theorem for N.P.V

P(D- | T-) = P(D-) . P(T- | D-) / [ P(D-) . P(T- | D-) + P(D+) . P(T- | D+) ]

P(T- | D-) = Specificity        P(T- | D+) = 1 − Sensitivity

Question:

A medical research team wishes to evaluate a proposed screening test for Alzheimer’s disease.
The test was given to a random sample of 450 patients with Alzheimer’s disease and an independent random
sample of 500 patients without symptoms of the disease. The two samples were drawn from a population
of subjects who were 65 years of age or older. The results are as follows.

                      Disease Status
              Alzheimer’s    Alzheimer’s     Total
                Present        Absent
T+                436             5           441
T-                 14           495           509
Total             450           500           950

Based on another independent study, it is known that the percentage of patients with Alzheimer’s disease is 11.3%
out of all subjects who were 65 years of age or older. First we calculate sensitivity and specificity as
follows.

Solution:

Sensitivity P(T+ | D+) = TP/(TP+FN) = a/(a+c) = 436/450 = 0.9689, taken as 0.96 in the calculations below

This means that there is about a 96% chance that a person gets a positive test result when he actually
has the disease.

Specificity P(T- | D-) = TN/(TN+FP) = d/(b+d) = 495/500 = 0.99

This means that there is a 99% chance that a person gets a negative test result when he actually
has no disease.

Now from the general population P(D+) = 11.3% = 0.113        P(D-) = 1 − 0.113 = 0.887

For the positive predictive value of the test, we wish to estimate the probability that a subject who is
positive on the test has Alzheimer’s disease.

First we calculate P.P.V

P(T+ | D-) = 1 − Specificity = 1 − 0.99 = 0.01

P(D+ | T+) = (0.113)(0.96) / [ (0.113)(0.96) + (0.887)(0.01) ]

P(D+ | T+) = 0.10848 / (0.10848 + 0.00887)

P(D+ | T+) = 0.10848 / 0.11735

P(D+ | T+) = 0.9244

This means that about 92% of the subjects who test positive have the disease.

Now we calculate N.P.V

P(T- | D+) = 1 − Sensitivity = 1 − 0.96 = 0.04

P(D- | T-) = P(D-) . P(T- | D-) / [ P(D-) . P(T- | D-) + P(D+) . P(T- | D+) ]

P(D- | T-) = (0.887)(0.99) / [ (0.887)(0.99) + (0.113)(0.04) ]

P(D- | T-) = 0.8781 / (0.8781 + 0.0045)

P(D- | T-) = 0.8781 / 0.8826

P(D- | T-) = 0.99

This means that 99% of the subjects who test negative do not have the disease.
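
The Bayes’ theorem calculation of P.P.V and N.P.V from prevalence, sensitivity and specificity can be sketched in Python. The function name is an assumption; the inputs 0.113, 0.96 and 0.99 are the values used in the Alzheimer’s example above.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (P.P.V, N.P.V) using Bayes' theorem."""
    tp = prevalence * sensitivity              # P(D+) . P(T+ | D+)
    fp = (1 - prevalence) * (1 - specificity)  # P(D-) . P(T+ | D-)
    tn = (1 - prevalence) * specificity        # P(D-) . P(T- | D-)
    fn = prevalence * (1 - sensitivity)        # P(D+) . P(T- | D+)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


ppv, npv = predictive_values(prevalence=0.113, sensitivity=0.96, specificity=0.99)
print(round(ppv, 4), round(npv, 4))
```

Rerunning the sketch with a lower prevalence and the same sensitivity and specificity gives a lower P.P.V and a higher N.P.V, which is the effect of prevalence described in the earlier note.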

Likelihood Ratio
The likelihood ratio describes how many times more likely a person with the disease is to receive a
particular test result than a person without the disease. In other words, a positive likelihood ratio tells how likely
a positive result is in a patient with the disease as compared to a patient without the disease. A negative likelihood
ratio tells how likely a negative result is in a patient with the disease as compared to a patient without the disease.

The LR+ of a positive test result is usually a number greater than “1”, and the LR- of a negative test result
usually lies between 0 and 1. When LR = 1 the test is useless, which means that the test result has very little influence
on whether a patient does or does not have the disease.

LR = (Probability of an individual with the condition having the test result) / (Probability of an individual without the condition having the test result)

LR+ = P(Test result is positive with disease) / P(Test result is positive without disease)

LR+ = P(T+ | D+) / P(T+ | D-) = Sensitivity / (1 − Specificity)

LR- = P(Test result is negative with disease) / P(Test result is negative without disease)

LR- = P(T- | D+) / P(T- | D-) = (1 − Sensitivity) / Specificity
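
As an illustration of the two likelihood ratio formulas, the sketch below computes LR+ and LR- in Python. The function name is an assumption, and the inputs are the sensitivity (36/45) and specificity (230/255) from the second screening table earlier in these notes. Note that LR+ is undefined when specificity equals 1, because the denominator becomes zero.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a diagnostic test."""
    lr_pos = sensitivity / (1 - specificity)  # P(T+ | D+) / P(T+ | D-)
    lr_neg = (1 - sensitivity) / specificity  # P(T- | D+) / P(T- | D-)
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(sensitivity=36 / 45, specificity=230 / 255)
print(round(lr_pos, 2), round(lr_neg, 2))
```

Here LR+ is well above 1 and LR- lies between 0 and 1, as expected for an informative test.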

Test Accuracy
The accuracy of a test is its ability to differentiate the patient and healthy cases correctly. To
estimate the accuracy of a test we calculate the proportion of true positives and true negatives
among all evaluated cases. Mathematically it can be stated as:

Test Accuracy = (TP + TN) / (TP + TN + FP + FN)

Test Accuracy = (a + d) / (a + b + c + d)

Scenario-I

Imagine we have a sample of 100 cases, 50 healthy and the other 50 patients. If a test is positive for all patients
and negative for all healthy ones, it is 100% accurate. The figure shows that the test has been
able to differentiate the healthy cases and the patients exactly. The accuracy, sensitivity and specificity of the
test in this example are calculated below.

           Patients    Healthy    Total
T+            50          0         50
T-             0         50         50
Total         50         50        100

Test Accuracy = (TP + TN) / (TP + TN + FP + FN)

Test Accuracy = (a + d) / (a + b + c + d)

Test Accuracy = (50 + 50) / (50 + 0 + 0 + 50)

Test Accuracy = 1 or 100%

Sensitivity P(T+ | D+) = TP/(TP+FN) = a/(a+c) = 50/(50+0) = 1 or 100%

This means that there is a 100% chance that a person gets a positive test result when he actually
has the disease.

Specificity P(T- | D-) = TN/(TN+FP) = d/(b+d) = 50/(0+50) = 1 or 100%

This means that there is a 100% chance that a person gets a negative test result when he actually
has no disease.

Taking into account the mentioned statistical characteristics, this test is appropriate for both screening
and final verification of a disease.

Scenario-II

Test with 75% accuracy, 50% sensitivity and 100% specificity. If the test can only diagnose 25 out of the 50
patients and has reported the others as healthy (as we see from figure II), then the accuracy, sensitivity
and specificity are as given below. Of the 100 cases that have been tested, the test could determine
25 patients and 50 healthy cases correctly, therefore the accuracy of the test is 75%.

           Patients    Healthy    Total
T+            25          0         25
T-            25         50         75
Total         50         50        100

Test Accuracy = (TP + TN) / (TP + TN + FP + FN)

Test Accuracy = (a + d) / (a + b + c + d)

Test Accuracy = (25 + 50) / (25 + 0 + 25 + 50)

Test Accuracy = 0.75 or 75%

Sensitivity P(T+ | D+) = TP/(TP+FN) = a/(a+c) = 25/50 = 0.5 or 50%

This means that there is a 50% chance that a person gets a positive test result when he actually
has the disease.

Specificity P(T- | D-) = TN/(TN+FP) = d/(b+d) = 50/(0+50) = 1 or 100%

This means that there is a 100% chance that a person gets a negative test result when he actually
has no disease.

This test is not suitable for screening purposes but is suitable for final confirmation of a disease.

Scenario III

Test with 75% accuracy, 100% sensitivity and 50% specificity. If we assume that the test has been able
to identify only 25 of the 50 healthy cases and has reported the others as patients (as we see from figure III),
then the accuracy, sensitivity and specificity will be as follows.

           Patients    Healthy    Total
T+            50         25         75
T-             0         25         25
Total         50         50        100

Test Accuracy = (TP + TN) / (TP + TN + FP + FN)

Test Accuracy = (a + d) / (a + b + c + d)

Test Accuracy = (50 + 25) / (50 + 25 + 0 + 25)

Test Accuracy = 0.75 or 75%

Sensitivity P(T+ | D+) = TP/(TP+FN) = a/(a+c) = 50/50 = 1 or 100%

This means that there is a 100% chance that a person gets a positive test result when he actually
has the disease.

Specificity P(T- | D-) = TN/(TN+FP) = d/(b+d) = 25/50 = 0.5 or 50%

This means that there is a 50% chance that a person gets a negative test result when he actually
has no disease.

This test is suitable for screening purposes but it is not suitable for final confirmation of a disease.
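
The three scenarios can be reproduced with a few lines of Python. This is a minimal sketch; the function name and the tuple order (TP, FP, FN, TN) are assumptions chosen for illustration.

```python
def accuracy(tp, fp, fn, tn):
    """Proportion of all evaluated cases that the test classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# (TP, FP, FN, TN) for each scenario in the notes
scenarios = {
    "Scenario I": (50, 0, 0, 50),
    "Scenario II": (25, 0, 25, 50),
    "Scenario III": (50, 25, 0, 25),
}

for name, (tp, fp, fn, tn) in scenarios.items():
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print(name, accuracy(tp, fp, fn, tn), sensitivity, specificity)
```

Scenarios II and III have the same accuracy but opposite sensitivity and specificity, which is why accuracy alone does not tell whether a test suits screening or confirmation.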

Diagnostic Test
A diagnostic test is a procedure performed to confirm or determine the presence or absence of
disease in an individual suspected of having the disease, usually following the report of symptoms or
based on the results of other medical tests. This procedure gives a rapid indication of whether a
patient has a certain disease. A diagnostic test is any approach used to gather clinical information for
the purpose of making a clinical decision (i.e. a diagnosis). Some examples of diagnostic tests are X-rays, biopsies,
pregnancy tests, medical histories and results from physical examinations.

Diagnostic Odds Ratio

A diagnostic O.R is a measure of the effectiveness of a diagnostic test. It is defined as the ratio of
the odds of the test being positive if the subject has the disease relative to the odds of the test being positive
if the subject doesn’t have the disease. The diagnostic O.R ranges from 0 to ∞. For a useful test the diagnostic O.R
is greater than 1, and the higher the diagnostic O.R, the better the test discriminates.

D.O.R = LR+ / LR-

The Diagnostic Odds Ratio may be expressed in terms of the sensitivity and specificity of the test.

D.O.R = [ Sensitivity / (1 − Sensitivity) ] × [ Specificity / (1 − Specificity) ]

D.O.R = [ (a/(a+c)) / (1 − a/(a+c)) ] × [ (d/(b+d)) / (1 − d/(b+d)) ]

D.O.R = [ (a/(a+c)) / ((a+c−a)/(a+c)) ] × [ (d/(b+d)) / ((b+d−d)/(b+d)) ]

D.O.R = (a/c) × (d/b)

D.O.R = ad/bc = O.R

The Diagnostic Odds Ratio may also be expressed in terms of positive predictive value (P.P.V) and negative
predictive value (N.P.V)

D.O.R = [ P.P.V / (1 − P.P.V) ] × [ N.P.V / (1 − N.P.V) ]

D.O.R = [ (a/(a+b)) / (1 − a/(a+b)) ] × [ (d/(c+d)) / (1 − d/(c+d)) ]

D.O.R = [ (a/(a+b)) / ((a+b−a)/(a+b)) ] × [ (d/(c+d)) / ((c+d−d)/(c+d)) ]

D.O.R = (a/b) × (d/c)

D.O.R = ad/bc = O.R

Question:

Consider a test with the following 2×2 contingency table and calculate the Diagnostic O.R

              Disease Status
            D+       D-      Total
T+          26       12        38
T-           3       48        51
Total       29       60        89

D.O.R = (TP/FP) / (FN/TN)

D.O.R = (26/12) / (3/48)

D.O.R = (26/12) × (48/3)

D.O.R = 34.67

Since D.O.R > 1, we conclude that the test is discriminating correctly.
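
The equivalence of the three expressions for the diagnostic odds ratio can be checked numerically. The Python sketch below uses the table from the question above; the function name is an assumption for illustration.

```python
def diagnostic_odds_ratio(tp, fp, fn, tn):
    """D.O.R = ad / bc, with a = TP, b = FP, c = FN, d = TN."""
    return (tp * tn) / (fp * fn)


tp, fp, fn, tn = 26, 12, 3, 48

# Direct formula
dor = diagnostic_odds_ratio(tp, fp, fn, tn)

# Through sensitivity and specificity (equivalently LR+ / LR-)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
dor_from_rates = (sensitivity / (1 - sensitivity)) * (specificity / (1 - specificity))

# Through the predictive values
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
dor_from_pv = (ppv / (1 - ppv)) * (npv / (1 - npv))

print(round(dor, 2), round(dor_from_rates, 2), round(dor_from_pv, 2))
```

All three routes give the same value up to floating-point rounding, as the derivations above show algebraically.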

Probability Distribution
The probability distribution of a random variable describes how the probabilities are distributed
over the values of the random variable. A probability distribution is a listing of all the outcomes of an
experiment and their associated probabilities. For a discrete random variable X, the probability
distribution is defined by the probability mass function f(x) = P[X = x], where this function gives the
probability for each value of the random variable. Consider the example of tossing three coins, in
which the variable of interest is the random variable X, the number of heads when three coins are tossed.

S = {HHH, HHT, HTH, THH, TTH, HTT, THT, TTT}

where X = 0, 1, 2, 3

P(X = 0) = P(no heads) = P(TTT) = 1/8

P(X = 1) = P(one head) = P(THT, HTT, TTH) = 3/8

P(X = 2) = P(two heads) = P(HHT, THH, HTH) = 3/8

P(X = 3) = P(three heads) = P(HHH) = 1/8

Probability Distribution for the number of heads

X         P(X = x)
0           1/8
1           3/8
2           3/8
3           1/8
Total        1
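
The distribution of the number of heads can also be obtained by listing the sample space programmatically. The following Python sketch enumerates all outcomes of three coin tosses and counts heads; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of three coin tosses: ('H','H','H'), ('H','H','T'), ...
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give each number of heads
counts = Counter(outcome.count("H") for outcome in outcomes)

# Each outcome is equally likely, so P(X = x) = count / 8
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
print("Total:", sum(pmf.values()))
```

The probabilities are non-negative and add up to 1, which are the two conditions a probability mass function must satisfy.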
Discrete Probability Distribution
The probability distribution of a discrete random variable is a table, graph or formula that gives the
probability associated with each possible value that the variable can assume. For the discrete random
variable X, the probability mass function is denoted by f(x) = P[X = x], which satisfies the following two
conditions.

i. f(x) ≥ 0 for all x ∈ X
ii. ∑ f(x) = 1

Binomial Experiment
A binomial experiment is a statistical experiment that has the following properties.

1) The experiment consists of “n” repeated trials.
2) Each trial can result in two possible outcomes; we call one of these outcomes a success, and
the other, a failure.
3) The probability of success, denoted by “p”, is the same on every trial.
4) The trials are independent, i.e. the outcome on one trial does not affect the outcome on other
trials.

Consider the statistical experiment in which we flip a coin two times and count the number of times
that a head occurs. This is a binomial experiment because:

I. The experiment consists of repeated trials; we flip the coin two times.
II. Each trial can result in just two possible outcomes, i.e. head or tail.
III. The probability of success is constant, i.e. 1/2.
IV. The trials are independent, i.e. getting a head on one trial doesn’t affect whether we
get a head on the other trial.

Daily use examples of Binomial Experiment

If a new drug is introduced to cure a disease, it either cures the disease (it is a success) or it
does not cure the disease (it is a failure). If you purchase a lottery ticket, you are either going to win a
prize or not. Basically anything you can think of that can only be a success or a failure can be represented by
a binomial distribution.

Notations:

X: The number of successes that result from a binomial experiment.

n: The number of trials in the binomial experiment.

p: The probability of success on an individual trial.

q: The probability of failure on an individual trial, q = 1 − p.

n!: The factorial of n.

b(X; n, p) → Binomial Probability

The probability that an “n” trial binomial experiment results in exactly X successes, when the probability
of success on an individual trial is p.

The probability distribution of a binomial random variable is called the Binomial Distribution.

Suppose a binomial experiment consists of “n” trials and results in “x” successes, and the probability of success
on an individual trial is p. Then the probability mass function (P.M.F) of the Binomial Distribution is

P[X = x] = f(x) = C(n, x) p^x q^(n−x),    x = 0, 1, 2, …, n

P[X = x] = f(x) = 0    otherwise

where C(n, x) = n! / (x!(n−x)!)

Question:

Suppose a die is rolled 5 times. What is the probability of getting exactly 2 fours?

Solution:

This is a binomial experiment in which the number of successes = 2, the number of trials = 5 and the
probability of success = p = 1/6 or 0.167, therefore the binomial probability is

b(X; n, p) = b(2; 5, 0.167)

P[X = x] = f(x) = C(n, x) p^x q^(n−x),    X = 2

P[X = 2] = C(5, 2) (1/6)^2 (5/6)^3

P[X = 2] = 0.1608

Question:

The probability that a student is accepted to a prestigious college is 0.3. If 5 students from the
same school apply, what is the probability that at most 2 are accepted?

Solution:

Here n = 5, p = 0.3, q = 0.7

b(X; n, p) = b(X ≤ 2; 5, 0.3)

P[X ≤ 2] = ∑ b(x; n, p), summed over x = 0, 1, 2

P[X ≤ 2] = ∑ C(5, x) (0.3)^x (0.7)^(5−x), summed over x = 0, 1, 2

P[X ≤ 2] = C(5, 0)(0.3)^0(0.7)^5 + C(5, 1)(0.3)^1(0.7)^4 + C(5, 2)(0.3)^2(0.7)^3

P[X ≤ 2] = 0.1681 + 0.3602 + 0.3087

P[X ≤ 2] = 0.8369

Question:

60% of the people who purchase sports cars are men. If 10 sports car owners are randomly selected,
find the probability that exactly 7 are men.

Solution:

Here n = 10, p = 0.60, q = 0.40, X = 7

b(X; n, p) = b(7; 10, 0.6)

P[X = x] = f(x) = C(n, x) p^x q^(n−x),    X = 7

P[X = 7] = C(10, 7) (0.60)^7 (0.40)^3

P[X = 7] = 0.2150

Question:

Suppose that 80% of adults with allergies report symptom relief with a specific
medication. If the medication is given to 10 new patients with allergies, what is the probability that it is
effective in exactly 7?

Solution:

Here n = 10, p = 0.80, q = 0.20, X = 7

b(X; n, p) = b(7; 10, 0.8)

P[X = x] = f(x) = C(n, x) p^x q^(n−x),    X = 7

P[X = 7] = C(10, 7) (0.80)^7 (0.20)^3

P[X = 7] = 0.2013

Question:

The likelihood that a patient with a heart attack dies of the attack is 0.04. Suppose we have 5 patients who suffer a
heart attack. What is the probability that all survive?

Solution:

Let X be the number of patients who die.

Here n = 5, p = 0.04, q = 0.96, X = 0

b(X; n, p) = b(0; 5, 0.04)

P[X = x] = f(x) = C(n, x) p^x q^(n−x),    X = 0

P(all the patients survive)

P[X = 0] = C(5, 0) (0.04)^0 (0.96)^5

P[X = 0] = 0.8154

There is an 81.54% chance that all patients will survive.

Question:

In a class, 3% of the students are suffering from anxiety. A sample of 5 students
is selected. Find the probability that out of these:

I. No students are suffering.
II. At least one student is suffering.
III. At least a majority of the students are suffering.
IV. All the students are suffering.

Solution:

Let X be a random variable denoting the number of students suffering from anxiety.

Here n = 5, p = 0.03, q = 0.97

No students are suffering.

b(X; n, p) = b(0; 5, 0.03)

P[X = x] = f(x) = C(n, x) p^x q^(n−x),    X = 0

P[X = 0] = C(5, 0) (0.03)^0 (0.97)^5

P[X = 0] = 0.8587

At least one student is suffering.

b(X; n, p) = b(X ≥ 1; 5, 0.03)

P[X ≥ 1] = 1 − P(X < 1)

P[X ≥ 1] = 1 − 0.8587

P[X ≥ 1] = 0.1413

At least a majority of the students are suffering.

b(X; n, p) = b(X ≥ 3; 5, 0.03)

P[X ≥ 3] = 1 − P(X < 3)

P[X ≥ 3] = 1 − {P(X = 0) + P(X = 1) + P(X = 2)}

P[X ≥ 3] = 1 − {C(5, 0)(0.03)^0(0.97)^5 + C(5, 1)(0.03)^1(0.97)^4 + C(5, 2)(0.03)^2(0.97)^3}

P[X ≥ 3] = 1 − (0.8587 + 0.1328 + 0.0082)

P[X ≥ 3] = 1 − 0.9997

P[X ≥ 3] = 0.0003

All the students are suffering.

P[X = x] = C(n, x) p^x q^(n−x),    X = 5

P[X = 5] = C(5, 5) (0.03)^5 (0.97)^0

P[X = 5] = 0.0000000243 ≈ 0
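
The binomial questions above can be verified with a small Python sketch of the probability mass function. The helper names binomial_pmf and binomial_cdf are assumptions for illustration; math.comb from the standard library supplies C(n, x).

```python
from math import comb


def binomial_pmf(x, n, p):
    """P[X = x] = C(n, x) p^x q^(n - x) with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P[X <= x], obtained by summing the p.m.f from 0 to x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


print(round(binomial_pmf(2, 5, 1 / 6), 4))      # exactly 2 fours in 5 rolls
print(round(binomial_cdf(2, 5, 0.3), 4))        # at most 2 of 5 accepted
print(round(binomial_pmf(7, 10, 0.6), 4))       # exactly 7 men out of 10
print(round(binomial_pmf(0, 5, 0.04), 4))       # all 5 patients survive
print(round(1 - binomial_cdf(2, 5, 0.03), 4))   # majority of 5 suffer anxiety
```

Using 1 − P[X ≤ 2] for the "at least a majority" case avoids the sign slip that is easy to make when subtracting several terms by hand.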

Poisson Distribution
The Poisson distribution is a discrete distribution. It is named after Siméon-Denis Poisson
(1781-1840), a French mathematician who published its essentials in a paper in 1837. The Poisson
distribution and the binomial distribution have some similarities, but also several differences. The
binomial distribution describes a distribution of two possible outcomes, designated as success and
failure, from a given number of trials. The Poisson distribution focuses only on the number of discrete
occurrences over an interval. A Poisson experiment doesn’t have a given number of trials (n) as a binomial
experiment does. For example:

A binomial experiment might be used to determine how many black cars are in a random sample of 50
cars. A Poisson experiment might focus on the number of cars randomly arriving at a car wash during a 20
minute interval. The Poisson distribution has the following characteristics.

i) It is a discrete distribution.
ii) Each occurrence is independent of the other occurrences.
iii) It describes discrete occurrences over an interval.
iv) The number of occurrences in each interval can range from 0 to ∞.
v) The mean number of occurrences must be constant throughout the experiment.

Probability Mass function of Poisson distribution

Let X be a random variable having a p.m.f of the form

P[X = x] = f(x) = e^(−μ) μ^x / x!,    x = 0, 1, 2, …, ∞

P[X = x] = f(x) = 0    otherwise

Then the random variable X is said to have a Poisson distribution with parameter μ, where the symbol
“!” denotes the factorial.

μ (the expected or mean number of occurrences) is sometimes written as λ, and is sometimes called the
event rate or rate parameter.

Here e = 2.71828 (Euler’s number)

Mean, Variance, β1 and β2 of a Poisson distribution

Mean = μ = λ        Var(X) = μ = λ        β1 = 1/μ        β2 = 3 + 1/μ

Question:
The average number of major storms in a city is 2 per year. What is the probability that exactly 3
storms will hit the city next year?

Solution:

Here μ = 2, X = 3, e = 2.71828

P[X = x] = f(x) = e^(−μ) μ^x / x!

P[X = 3] = e^(−2) 2^3 / 3!

P[X = 3] = 0.180 or 18%

Thus the probability of 3 storms hitting next year is 18%.

Question:
The average number of homes sold by a company is 2 homes per day. What is the
probability that exactly 3 homes will be sold tomorrow?

Solution:

Here μ = 2, X = 3, e = 2.71828

P[X = x] = f(x) = e^(−μ) μ^x / x!

P[X = 3] = e^(−2) 2^3 / 3!

P[X = 3] = 0.180 or 18%

Question:
Suppose the average number of lions seen in the jungle on a 1-day visit is 5. What is the probability
that tourists will see fewer than 4 lions on the next 1-day visit?

Solution:

Here μ = 5, X = 0, 1, 2, 3 or X < 4, e = 2.71828

P[X = x] = f(x) = e^(−μ) μ^x / x!

Since we want to find the likelihood that tourists will see fewer than four lions, we want the probability that they
will see X = 0, 1, 2 or 3 lions, i.e. X < 4.

P[X < 4] = ∑ P[X = x], summed over x = 0, 1, 2, 3

P[X < 4] = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)

P[X < 4] = e^(−5) 5^0 / 0! + e^(−5) 5^1 / 1! + e^(−5) 5^2 / 2! + e^(−5) 5^3 / 3!

P[X < 4] = 0.265

Thus the probability that tourists will see no more than 3 lions is 0.265.

Question:

Consider a computer system with Poisson job arrivals, with mean μ = 2 as used in the solution. Determine the probability of:

I. Zero jobs.
II. Exactly 2 jobs.
III. At most three jobs.

Solution:

Zero jobs

Here μ = 2, X = 0, e = 2.71828

P[X = x] = f(x) = e^(−μ) μ^x / x!

P[X = 0] = e^(−2) 2^0 / 0!

P[X = 0] = 0.135

Exactly 2 jobs

Here μ = 2, X = 2, e = 2.71828

P[X = x] = f(x) = e^(−μ) μ^x / x!

P[X = 2] = e^(−2) 2^2 / 2!

P[X = 2] = 0.27

At most three jobs

Here μ = 2, X = 0, 1, 2, 3, e = 2.71828

P[X ≤ 3] = ∑ P[X = x], summed over x = 0, 1, 2, 3

P[X ≤ 3] = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)

P[X ≤ 3] = e^(−2) 2^0 / 0! + e^(−2) 2^1 / 1! + e^(−2) 2^2 / 2! + e^(−2) 2^3 / 3!

P[X ≤ 3] = 0.8571
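
The Poisson questions can be checked with a short Python sketch of the probability mass function. The helper names poisson_pmf and poisson_cdf are assumptions made for illustration.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P[X = x] = e^(-mu) mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)


def poisson_cdf(x, mu):
    """P[X <= x], obtained by summing the p.m.f from 0 to x."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))


print(round(poisson_pmf(3, 2), 4))   # exactly 3 storms (or homes) when mu = 2
print(round(poisson_cdf(3, 5), 4))   # fewer than 4 lions when mu = 5
print(round(poisson_pmf(0, 2), 4))   # zero jobs when mu = 2
print(round(poisson_cdf(3, 2), 4))   # at most three jobs when mu = 2
```

Note that the denominator for P[X = 3] is 3!, not 2!, and that "fewer than 4" and "at most 3" describe the same event for a count variable.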

Properties of Normal Distribution

• The function f(x) representing the normal distribution satisfies the properties of a proper p.d.f.,
i.e. f(x) ≥ 0 for all x, and the total area under the normal curve is unity.

• The mean, median and mode of the normal distribution are equal.

Mean = Median = Mode = μ

• The normal distribution is symmetrical and unimodal (it has one mode).

• The M.D. of the normal distribution is approximately 4/5 of its S.D. (M.D. ≈ 4/5 S.D.)

• The odd-order moments about the mean are all zero, whereas the even-order moments about the
mean are given by

$$\mu_{2n} = (2n-1)(2n-3)\cdots 5\cdot 3\cdot 1\,\sigma^{2n}$$

$$\mu_{2n+1} = 0 \quad \text{for all } n$$

• The normal curve extends indefinitely to the left and to the right, approaching the x-axis ever
more closely as x increases in magnitude.
• The curve is symmetric about its mean, and thus the area to the left of the mean and the area to
the right of the mean are each equal to 0.5.
• For the normal distribution, about 68% of the area under the curve lies between μ−σ and μ+σ,
about 95% of the area lies between μ−2σ and μ+2σ, and about 99.7% of the area lies between
μ−3σ and μ+3σ.
• The points of inflection on the curve are one standard deviation away from the mean, at x = μ−σ
and x = μ+σ.
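
Several of these properties can be checked numerically. The sketch below is only an illustration using the Python standard library; the helper names (`normal_cdf`, `area_within`, `even_central_moment`) are assumptions introduced here, and the exact ratio sqrt(2/π) for M.D./S.D. is the standard result behind the "4/5" approximation.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z < z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_within(k):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    return normal_cdf(k) - normal_cdf(-k)

def even_central_moment(n, sigma):
    """mu_{2n} = (2n-1)(2n-3)...5.3.1 * sigma^(2n)."""
    product = 1
    for odd in range(1, 2 * n, 2):  # 1, 3, 5, ..., 2n-1
        product *= odd
    return product * sigma ** (2 * n)

for k in (1, 2, 3):
    print(f"area within {k} S.D. of the mean: {area_within(k):.4f}")

# Ratio of mean deviation to standard deviation, close to 4/5
print(f"M.D. / S.D. = {math.sqrt(2.0 / math.pi):.4f}")

# Fourth central moment for sigma = 1 is 3 * sigma^4
print(even_central_moment(2, 1.0))
```

The three areas come out close to 0.68, 0.95 and 0.997, and the ratio close to 0.80, in line with the properties listed above.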

Question:

The average on a statistics test was 78, with an S.D. of 8. If the test scores are normally distributed,
find the probability that a student receives a test score less than 90.

Solution:

Given that μ = 78 and σ = 8

Let X denote the score on the statistics test.

$$P(X < 90) = P\left(\frac{X-\mu}{\sigma} < \frac{90-\mu}{\sigma}\right)$$

In standardized form

$$P(X < 90) = P\left(Z < \frac{90-78}{8}\right)$$

P(X < 90) = P(Z < 1.50)

P(X < 90) = P(−∞ < Z < 0) + P(0 < Z < 1.50)

P(X < 90) = 0.5 + 0.4332

P(X < 90) = 0.9332
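
The standardize-then-look-up procedure can be mirrored in a few lines of code. This is a minimal sketch, assuming the Python standard library; `normal_cdf` is an illustrative helper that replaces the Z table with the error function.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X < x) for a normal variable with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma  # standardize
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Test-score example: mu = 78, sigma = 8, score below 90
z = (90 - 78) / 8
p = normal_cdf(90, mu=78, sigma=8)
print(f"z = {z:.2f}, P(X < 90) = {p:.4f}")  # prints z = 1.50, P(X < 90) = 0.9332
```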

Question:

The pollen count for a species of flower varies randomly in a manner well represented by a normal
distribution with μ = 1000 and σ = 80. Find the probability that an individual pollen count will be:

I. Greater than 1200.
II. Less than 775.
III. Between 800 and 1100.

Solution:

Given that μ = 1000 and σ = 80

Let X represent the pollen count in a flower.

Greater than 1200.

$$P(X > 1200) = P\left(\frac{X-\mu}{\sigma} > \frac{1200-\mu}{\sigma}\right)$$

In standardized form

$$P(X > 1200) = P\left(Z > \frac{1200-1000}{80}\right)$$

P(X > 1200) = P(Z > 2.50)

P(X > 1200) = P(0 < Z < ∞) − P(0 < Z < 2.50)

P(X > 1200) = 0.5 − 0.4938

P(X > 1200) = 0.0062

Less than 775.

$$P(X < 775) = P\left(\frac{X-\mu}{\sigma} < \frac{775-\mu}{\sigma}\right)$$

In standardized form

$$P(X < 775) = P\left(Z < \frac{775-1000}{80}\right)$$

P(X < 775) = P(Z < −2.81)

P(X < 775) = P(−∞ < Z < 0) − P(−2.81 < Z < 0)

P(X < 775) = 0.5 − 0.4975

P(X < 775) = 0.0025

Between 800 and 1100.

$$P(800 \le X \le 1100) = P\left(\frac{800-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{1100-\mu}{\sigma}\right)$$

In standardized form

$$P(800 \le X \le 1100) = P\left(\frac{800-1000}{80} < Z < \frac{1100-1000}{80}\right)$$

P(800 ≤ X ≤ 1100) = P(−2.50 < Z < 1.25)

P(800 ≤ X ≤ 1100) = P(−2.50 < Z < 0) + P(0 < Z < 1.25)

P(800 ≤ X ≤ 1100) = 0.4938 + 0.3944

P(800 ≤ X ≤ 1100) = 0.8882
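
The three parts of the pollen-count question can be cross-checked with `statistics.NormalDist` from the Python standard library (available from Python 3.8). This is a sketch, not part of the original notes; the variable names are illustrative, and small differences from table values come from rounding z to two decimals.

```python
from statistics import NormalDist

# Pollen count modelled as normal with mu = 1000 and sigma = 80
pollen = NormalDist(mu=1000, sigma=80)

# I. Greater than 1200: upper-tail area
p_greater_1200 = 1 - pollen.cdf(1200)

# II. Less than 775: lower-tail area
p_less_775 = pollen.cdf(775)

# III. Between 800 and 1100: difference of two cumulative areas
p_between = pollen.cdf(1100) - pollen.cdf(800)

print(f"P(X > 1200)        = {p_greater_1200:.4f}")
print(f"P(X < 775)         = {p_less_775:.4f}")
print(f"P(800 <= X <= 1100) = {p_between:.4f}")
```

Both tail probabilities are well below 0.01, as expected for values more than 2.5 standard deviations from the mean.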

P-Value (Probability Value)


When we perform a hypothesis test in statistics, a P-value helps us determine the significance
of the results. The P-value is a measure of the strength of the evidence against the null hypothesis. It
is used as an alternative to rejection points and gives the smallest level of significance at which
the null hypothesis would be rejected.

Definition:
The P-value (or probability value) is the probability of getting a sample statistic (such as the mean)
or a more extreme sample statistic in the direction of the alternative hypothesis when the null
hypothesis is true.

OR
“The P-value is the probability of getting the observed value of the test statistic, or a value with even
greater evidence against H0, if the null hypothesis is actually true.”

The smaller the P-value, the greater the evidence against H0.

Decision Rule when using a P-value

1) Reject H0 if P-value ≤ α
2) Do not reject H0 if P-value > α

Hypothesis Testing: Solving Problems (P-Value Method)


Step No. 1
State the hypotheses and identify the claim.

Step No. 2
Compute the test value.

Step No. 3
Find the P-value.

Step No. 4
Make the decision.

Step No. 5
Summarize the result.

Here Z0 denotes the computed value of the test statistic.

Two-tailed test

P-value = P(Z < −|Z0| or Z > |Z0|)

P-value = 2P(Z > |Z0|)

Right-tailed test

P-value = P(Z > Z0)

Left-tailed test

P-value = P(Z < Z0)

Case: I
If HA (or H1) contains a less-than alternative (<), find the probability that Z is less than your test
statistic (i.e. look up your test statistic on the Z table and find its corresponding probability). The
result is your P-value.

Note: In this case your test statistic is usually negative.

Case: II
If HA (or H1) contains a greater-than alternative (>), find the probability that Z is greater than your
test statistic (i.e. look up your test statistic on the Z table, find its corresponding probability and
subtract it from 1). The result is your P-value.

Note: In this case your test statistic is usually positive.

Case: III
If HA (or H1) contains a not-equal-to alternative (≠), find the probability that Z is beyond your test
statistic and double it.

There are two cases:

• If your test statistic is negative, first find the probability that Z is less than your test statistic
(i.e. look up your test statistic on the Z table and find its corresponding probability), then double
this probability to get the P-value.
• If your test statistic is positive, first find the probability that Z is greater than your test statistic
(i.e. look up your test statistic on the Z table, find its corresponding probability and subtract it
from 1), then double this result to get the P-value.
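
The three cases can be collected into one small function. The following is a sketch under stated assumptions: it uses `statistics.NormalDist` from the Python standard library in place of the Z table, and the function name `p_value` and the tail labels "left", "right" and "two" are illustrative choices, not notation from the notes.

```python
from statistics import NormalDist

def p_value(z, tail):
    """P-value for a Z test statistic.

    tail is "left" (H1 contains <), "right" (H1 contains >),
    or "two" (H1 contains a not-equal sign).
    """
    phi = NormalDist().cdf  # cumulative area to the left of z
    if tail == "left":
        return phi(z)
    if tail == "right":
        return 1 - phi(z)
    if tail == "two":
        # area beyond |z| in one tail, doubled; works for either sign of z
        return 2 * (1 - phi(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")

print(f"{p_value(-1.50, 'left'):.4f}")
print(f"{p_value(2.50, 'right'):.4f}")
print(f"{p_value(2.25, 'two'):.4f}")
```

Using |z| in the two-tailed branch removes the need to treat negative and positive test statistics separately.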

Question: A researcher wishes to test the claim that the average cost of tuition and fees at a 2-year college
is greater than $5550. She selects a random sample of 36 2-year colleges and finds the mean to be
$5800. The population S.D. is $600. Is there evidence to support the claim at α = 0.05? Use the P-value
method.

Solution:

1) We formulate our null and alternative hypotheses as

H0: μ = 5550 vs H1: μ > 5550 (claim)

2) Test statistic

$$Z = \frac{5800 - 5550}{600/\sqrt{36}}$$

Z = 2.50

3) Compute the P-value

P-value = P(Z > 2.50)

= 1 − P(Z < 2.50)

= 1 − 0.9938

= 0.0062

4) Make the decision

0.0062 < 0.05

i.e. P < α, so we reject H0.

5) Summarize the result

Since P is less than α, there is enough evidence to support the claim that the average cost of
tuition and fees at 2-year colleges is greater than $5550.
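
The five steps of the P-value method can be followed in code for a one-sample Z test with known population S.D. This is a minimal sketch, assuming the Python standard library; the function `one_sample_z_test` and its parameter names are hypothetical and introduced only for illustration.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha, tail):
    """Return (z, p_value, reject_h0) for H0: mu = mu0.

    tail is "left", "right" or "two", matching the sign in H1.
    """
    # Step 2: test value
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Step 3: P-value from the standard normal distribution
    phi = NormalDist().cdf
    if tail == "left":
        p = phi(z)
    elif tail == "right":
        p = 1 - phi(z)
    elif tail == "two":
        p = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    # Step 4: decision rule, reject H0 when P-value <= alpha
    return z, p, p <= alpha

# Tuition example: H0: mu = 5550 vs H1: mu > 5550, alpha = 0.05
z, p, reject = one_sample_z_test(5800, 5550, 600, 36, 0.05, "right")
print(f"Z = {z:.2f}, P-value = {p:.4f}, reject H0: {reject}")
# prints Z = 2.50, P-value = 0.0062, reject H0: True
```

The same function applies to left-tailed and two-tailed questions by changing the `tail` argument.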

Question: A researcher wishes to test the claim that the average wind speed in a certain city is 9 miles per
hour. A sample of 36 days has an average wind speed of 9.3 miles per hour, and the S.D. of the population
is 0.8 miles per hour. At α = 0.01, is there enough evidence to reject the claim? Use the P-value method.

Solution:

1) We formulate our null and alternative hypotheses as

H0: μ = 9 (claim) vs H1: μ ≠ 9

2) Test statistic

$$Z = \frac{9.3 - 9}{0.8/\sqrt{36}}$$

Z = 2.25

3) Compute the P-value

P(Z > 2.25) = 1 − P(Z < 2.25)

= 1 − 0.9878

= 0.0122

Since the test is two-tailed, this area is doubled:

P-value = 2(0.0122)

P-value = 0.0244

4) Make the decision

0.0244 > 0.01

i.e. P > α, so we do not reject H0.

5) Summarize the result

Since P > α, there is not enough evidence to reject the claim that the average wind
speed is 9 miles per hour.

Question: Suppose the average number of Facebook friends is 150, with S.D. = 40.3. A random sample of 64
high school students in a particular country revealed that their average number of Facebook friends was 160.
At α = 0.01, is there sufficient evidence to conclude that the mean is greater than 150?

Solution:

1) We formulate our null and alternative hypotheses as

H0: μ = 150 vs H1: μ > 150 (claim)

2) Test statistic

$$Z = \frac{160 - 150}{40.3/\sqrt{64}}$$

Z = 1.9851

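3) Compute the P-value

Rounding Z to 1.99 for the table lookup:

P-value = P(Z > 1.99)

= 1 − P(Z < 1.99)

= 1 − 0.9767

= 0.0233

4) Make the decision

0.0233 > 0.01

i.e. P > α, so we do not reject H0.

5) Summarize the result

Since P > α, there is not enough evidence to support the claim that the mean number of
Facebook friends is greater than 150.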
