2statsnotes 1

Uploaded by

Sana

Clinical Oncology Revision Notes Series
Biostatistics

Contents

1. Descriptive statistics
1.1 Types of data
1.2 Displaying Data methods
1.3 Measure of central tendency
1.4 Measures of Dispersion
1.5 Distribution
2. Sampling
3. Principle of statistical inference
4. Survival analysis
5. Study methods
6. Clinical trials
7. Epidemiology
1. Descriptive statistics
Types of data
Variable: a characteristic of interest (e.g. diet regimen) that takes different values in
different items/objects (e.g. body weight under different regimens)

A. Independent or dependent
Independent: predictive factor that has an impact on a dependent variable
e.g. different dietary fat regimens
Dependent: outcome that reflects the effects of changing the independent variable
e.g. body weight under different dietary fat regimens

B. Qualitative or quantitative
Categorical (qualitative: described by categories, not measured on a numerical scale)
- Binary (2 groups, ordered or unordered): dead/alive (unordered); better/worse (ordered)
- Nominal (> 2 groups, unordered categories): type of cancer; blood group O, A, B, AB
- Ordinal (ordered categories): stage of breast cancer I, II, III, IV; birth order 1st, 2nd, 3rd; letter grades (A, B, C, D, F)
Numerical (quantitative: can be measured or counted)
- Continuous: blood pressure, height, weight, age
- Discrete: number of children; number of attacks of asthma per week
- Quantitative variables are often converted to categorical ones for descriptive purposes
- Categorising data is useful for summarising results, not for statistical analysis

Displaying Data methods

Graphical display: looking for patterns. Numerical display: objective, precise.
Categorical variables
- Bar chart (length of bar = frequency)
- Pie chart
Continuous variables
- Stem-and-leaf plot
- Box plot
  (both suit 1 categorical vs 1 continuous variable, e.g. gender (F, M) vs height)
- Histogram (area = frequency)
- Dot plot/scatter plot (both variables continuous)

Measure of central tendency

Mean
- arithmetic: the average (affected by extreme values/outliers)
- geometric: obtained by taking the antilogarithm of the arithmetic mean of the log data
  - useful if the data are skewed to the right
  - close to the median if the data are skewed to the right

Median
- 50th percentile value
- the middle value in a sequentially ordered group of numbers
- if the number of observations is even, the two middle values added up and divided by 2
- not affected by extreme values (outliers)

Mode: most frequently occurring value; not affected by extreme values (outliers)
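
A minimal Python sketch of these four measures may help fix the definitions. The data values are hypothetical, and `geometric_mean` is an illustrative helper written for this note (antilog of the mean of the logs), not part of the original text.

```python
import math
import statistics


def geometric_mean(values):
    """Antilogarithm of the arithmetic mean of the log data (values must be > 0)."""
    logs = [math.log(v) for v in values]
    return math.exp(statistics.fmean(logs))


data = [1, 2, 2, 4, 8]            # hypothetical right-skewed sample
with_outlier = [1, 2, 2, 4, 800]  # same sample with one extreme value

summary = {
    "mean": statistics.fmean(data),
    "geometric_mean": geometric_mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
}

# The outlier drags the arithmetic mean upwards but leaves the median alone.
mean_shift = statistics.fmean(with_outlier) - summary["mean"]
median_shift = statistics.median(with_outlier) - summary["median"]
```

In this sample the geometric mean sits closer to the median than the arithmetic mean does, which is the behaviour described for right-skewed data.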

Measures of Dispersion (spread or variability)
Range: difference between the largest and the smallest observations
- distorted by outliers
Percentile: 1st to 99th percentiles
Quartile: data divided into 4 equal portions

Interquartile range: 3rd quartile - 1st quartile = Q3 (75%) - Q1 (25%)
- not affected by extreme values (outliers)

Variance: average squared deviation of values from the mean
= ∑(x - mean)² / (n - 1)

Standard deviation (SD): square root of variance
- variation about the mean
- affected by outliers: SD larger than expected
- affected by skewed data: over-estimates the spread

Coefficient of variation = SD/mean
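
The dispersion measures can be checked with Python's standard `statistics` module. This is a sketch on hypothetical data; note that `statistics.quantiles` uses one of several accepted quartile conventions, so quartiles from other software may differ slightly.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

value_range = max(data) - min(data)
variance = statistics.variance(data)           # sum of squared deviations / (n - 1)
sd = statistics.stdev(data)                    # square root of the variance
q1, q2, q3 = statistics.quantiles(data, n=4)   # cut points for 4 equal portions
iqr = q3 - q1
cv = sd / statistics.fmean(data)               # coefficient of variation
```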

Distribution
Normal
- continuous probability distribution; bell-shaped; not skewed
- mean ± 1 SD covers 68%, ± 2 SD 95%, ± 3 SD 99.7% of values
- depends on mean and variance
- Central Limit Theorem:
  - if the sample size is large enough, the sample mean x̄ has an approximately Normal distribution
  - this is true no matter what the shape of the distribution of the original data

Standard Normal: Normal distribution with a mean of 0 and SD of 1

t-distribution
- symmetrical distribution
- more like the Normal distribution as the sample size ↑
- used to calculate CIs and test hypotheses about the mean of 1 or 2 groups

F-distribution
- characterized by two parameters: the DF of the numerator and the DF of the denominator
- distribution of the ratio of 2 variances (used for comparing 2 variances)
- skewed to the right

Lognormal: skewed to the right

Skewed
- +ve skew (skewed to the right): tail toward the right
- -ve skew (skewed to the left): tail toward the left
- compare mean/median/mode to check normality/skew
- check skew = (mean - mode) / SD; within ±1 is acceptable
- variance ↑ with ↑ value of variable: right skew
- variance ↓ with ↑ value of variable: left skew
Kurtosis
- about 0 (within ±1): mesokurtic
- > +1: leptokurtic (narrow, sharp and pointed)
- < -1: platykurtic (broad and plateau-like)

Binomial
- distribution of a proportion (discrete variable with 2 outcomes)
- approximately symmetrical if the sample size is large

Chi-squared
- used for proportions (discrete variable with > 2 outcomes)
- right-skewed distribution taking positive values
- more like the Normal distribution as DF ↑

Transformation: convert skewed data towards Normal
- Left skew: square
- Right skew: reciprocal, square root, log transformation
- Proportion/sigmoid: logit
- Survival curve: reciprocal
2. Sampling
Population: set of things/objects in which we have an interest at that particular
time (universe of all units being studied)
e.g. if we want to study lung cancer in village A, then the study
population will be all the villagers in village A.

Sample: subset of the population

Sampling or random error: discrepancy between a sample estimate and its population
parameter, arising because only a portion of the population is studied; random error is reduced by ↑ sample size

Random sampling: each member of the population has an EQUAL CHANCE of
being chosen for the sample
Sampling methods:
Random: using a random number table

Systematic sample: rank the population, divide it into equal intervals, then take the member at the same position in each interval
e.g. rank 100 people and divide them into 10 groups of 10,
choose the 5th person in each group,
i.e. choose the 5th, 15th, 25th ... 85th, 95th persons

Stratified sample: composition of the sample reflects composition of the population
(proportional stratified sample)
e.g. Malaysian population

Malays 60%
Chinese 25%
Indians 10%
Others 5%

Proportional Stratified Sample of 1000 Malaysians

Malays 600 (60%)
Chinese 250 (25%)
Indians 100 (10%)
Others 50 (5%)

Cluster sample: divide the population into groups, then choose a random sample of groups, e.g.
- divide the entire city into "city blocks"
- choose a random sample of blocks
- count every person in each city block selected

Multistage sampling: combination of any of the above methods of sampling
3. Inferential statistics
Hypothesis testing
Null hypothesis (H0)
- continuous data: no difference in means exists between two groups
- discrete data: no difference between 2 proportions (for paired preferences, the proportion equals 0.5)
- a statement about the population: it is either true or false, not a probability
e.g. pre-menstrual and post-menstrual mean daily dietary intake is the
same (H0: µd = 0)

Alternative hypothesis (Ha)
- a difference exists between two groups/not equal
- shouldn't set a direction, i.e. better than/worse than
- should be worded as "not equal to the treatment group"

Type 1 error (α): rejecting a true null hypothesis
- the overall chance depends on the number of hypothesis tests carried out (α per test is ↓ by Bonferroni correction)

Type 2 error (β): failing to reject (accepting) a false null hypothesis
- depends on sample size and variability of the observations

Power (1 minus β): probability of rejecting the null hypothesis when it is false
- conventionally 0.8
- power is ↑ with
1. ↑ sample size
2. larger significance level (α) (0.01 → 0.05)
3. ↑ effect of interest (difference between groups)
4. ↓ variability

Significance level
- conventionally preset at 0.05
- if the probability of the observed result is < the preset level of 0.05, it is unlikely to be
explained by chance alone, and as a result the null hypothesis should be
rejected

p-value
- obtained from a significance test
- the probability of getting the observed or a more extreme
result if the null hypothesis is true
- after obtaining the p-value, compare it with the significance level to
decide whether or not to reject H0

Standard error of the mean (SE, SEM): the standard deviation of the sampling distribution of the sample mean

Data type: continuous
SE = S / √n
S: standard deviation; n: sample size

Data type: discrete/proportion
SE(p) = √(p(1 - p)/n)
p = r/n; r: number in the sample with the characteristic; n: sample size

- SE ↑ with variability, ↓ with sample size, i.e.
larger sample, smaller SE, more precise estimation of the population
mean
- SEM is always < SD when n > 1; the two are equal only when n = 1
Confidence interval
- interval that we are 95% certain contains the true population value
- the value can be a mean, a difference between 2 means, a proportion etc.
e.g. the CI contains the population mean with 95% certainty
(the population mean, not the sample mean and not an individual value)

Width of CI depends on
1. sample size: inversely with N
2. SD: directly with SD

Calculation
1. Mean
a. known variance, N > 30 (estimate distributed like the standard Normal)
- Normal distribution
- 95% CI = sample mean ± 1.96 x SE (SE = SD/√n)
b. unknown variance, N < 30
- t-distribution
- 95% CI = sample mean ± t x SE, with DF = n - 1; look up the t value for 0.025 in each tail (0.05 both tails) in the table

2. Proportion
- 95% CI = p ± 1.96 x SE(p)
- small N: exact Binomial distribution
- large N: Normal approximation to the Binomial distribution

Statistically significant if
- 0 is not in the CI of the mean difference
- 1 is not in the CI of the odds ratio/relative risk
- the 2 CIs do not overlap

Larger sample, more precise
- larger sample: smaller SEM, narrower CI
- narrower CI: more precise estimate of the population mean

Reference range: the range in which 95% of the population values fall
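
The large-sample formulas for SE and the 95% CI can be sketched in a few lines of Python. The function names are illustrative only, and the fixed multiplier 1.96 assumes the Normal case; a small sample with unknown variance would need a t value instead.

```python
import math

Z_95 = 1.96  # standard Normal multiplier for a 95% CI


def se_mean(sd, n):
    """Standard error of a mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def se_proportion(r, n):
    """Standard error of a proportion p = r / n."""
    p = r / n
    return math.sqrt(p * (1 - p) / n)


def ci95_mean(mean, sd, n):
    half_width = Z_95 * se_mean(sd, n)
    return mean - half_width, mean + half_width


def ci95_proportion(r, n):
    p = r / n
    half_width = Z_95 * se_proportion(r, n)
    return p - half_width, p + half_width
```

For example, a hypothetical sample with mean 100, SD 15 and n = 36 has SE 2.5, so `ci95_mean(100, 15, 36)` gives roughly 95.1 to 104.9; quadrupling n halves the SE and the width of the interval.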

Sample size calculation
- sample size is crucial: if it is too small, we may not be able to
detect an important existing effect

Sample size depends on the parameter limits set

1. ↑ power → ↑ sample size
2. stricter significance level (0.05 → 0.01) → ↑ sample size
3. ↑ variability (standard deviation) → ↑ sample size
4. ↑ effect of interest → ↓ sample size
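
To see how these four settings pull on the sample size, here is a hedged sketch using a common textbook approximation for comparing two means with a two-sided test: n per group = 2 x ((z for α/2 + z for power) x SD / difference)². The formula and the function name `n_per_group` are assumptions brought in for illustration; the notes themselves give no formula.

```python
import math
from statistics import NormalDist


def n_per_group(alpha, power, sd, delta):
    """Approximate subjects per group to detect a mean difference `delta`.

    Assumes two equal-sized groups, a common SD, and a two-sided test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


baseline = n_per_group(alpha=0.05, power=0.80, sd=10, delta=5)
more_power = n_per_group(alpha=0.05, power=0.90, sd=10, delta=5)
stricter_alpha = n_per_group(alpha=0.01, power=0.80, sd=10, delta=5)
more_variable = n_per_group(alpha=0.05, power=0.80, sd=20, delta=5)
bigger_effect = n_per_group(alpha=0.05, power=0.80, sd=10, delta=10)
```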

Conclusion: 2 types of conclusion

1. Statistical conclusion
e.g. p-value < α of 0.05, therefore reject H0. The observed result is
statistically significant and unlikely to be due to chance alone

2. Problem conclusion
e.g. the mean BW of babies born in Hospital X is significantly
lower than the population mean

One-tail test and two-tail test
For example, with Y being a standard/reference value:
Two-tail test: to see if there is a difference between X and Y
One-tail test: to see if there is a difference between X and Y in
one particular direction
Example: we test to see if X > Y
Example: we test to see if X < Y

Choosing a significance test
1. type of variable, i.e. continuous, discrete
2. normally distributed or not
3. how many samples? One, two or more?
4. if two samples, are they independent or paired/matched?

Make sure the assumptions of the test are not violated

Parametric
- the distribution of scores in the population is Normal, the variances are
equal, and the sample size is large

Nonparametric
- the distribution of scores in the population is not Normal, or the
sample size is small

Choosing a significance test

Comparing mean/median of the data
Outcome variable: continuous, e.g. BP, pain score

Parametric (Normally distributed; compare means)
- Unpaired/independent observations
  - t-test: compares means between 2 independent groups
  - ANOVA: compares means of > 2 independent groups; shows that one group differs significantly from the other groups, without showing which specific group is better or that the lowest score is the worst
- Paired/dependent observations
  - paired t-test: compares means between 2 related groups (e.g. the same subjects before & after Rx)
  - repeated-measures ANOVA: > 2 related groups

Non-parametric (compare medians)
- Unpaired observations
  - Wilcoxon rank-sum test (aka Mann-Whitney U test): alternative to the t-test
  - Kruskal-Wallis test: alternative to ANOVA; suits ordinal data; extension of the Wilcoxon rank-sum test
- Paired observations
  - Wilcoxon signed-rank test: alternative to the paired t-test and to the one-sample t-test
  - Friedman ANOVA: alternative to repeated-measures ANOVA
Comparing relationship/association between 2 continuous variables
Correlation: measures the strength and the direction of the linear
relationship between two continuous variables; displayed by scatter plots

Parametric: Pearson's correlation
- linear correlation between 2 normally distributed continuous variables
- doesn't imply causality
- H0: no linear relationship between the 2 variables (r = 0)
- correlation coefficient r shows magnitude/strength and direction
  - ranges from -1 to 1
  - +1 or -1: perfect correlation with all the points lying on the line
  - 0: no linear correlation
  - +ve or -ve sign: direction of relationship
- if the CI of r does not contain 0, the p-value is significant

Non-parametric: Spearman rank correlation
- alternative to Pearson's
- 1 or both variables not normally distributed
- strong relationship in the form of a curve (monotonic rather than linear)
- if statistically significant, conclude that a relationship exists, but it is not necessarily linear
- no R² in Spearman

Coefficient of determination R²
- proportion or % of variation in the dependent variable (y) that is explained by all the
independent variables (x)
- ranges from 0 to 1
  0: no relationship
  1: perfect relation
- √R² = magnitude of the correlation coefficient r (simple linear regression)

Regression: the dependent variable is a function of the independent variable(s)

(predicts how independent variables (X1 etc.) affect a dependent variable (Y))

Different types of outcome data use different regression methods

Continuous: linear/multiple linear regression
Categorical: logistic/multiple logistic regression
Rates: proportional hazards regression
Count: Poisson regression

Simple regression
- 1 independent variable and 1 dependent variable
Types of simple regression

a. Linear regression
y = mx + b

y = outcome, x = predictor, m = slope, b = intercept

m = regression coefficient: amount of change in the dependent variable for a 1 unit
change in the independent variable
Residual = observed y - fitted y
- residuals are assumed to be normally distributed, with mean 0 and constant variance (any +ve value)

b. Exponential regression
y = a·e^(bx)
x: variable
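
A small least-squares sketch ties together the slope m, the intercept b, the residuals and Pearson's r. `fit_line` is an illustrative helper and the data are hypothetical; in practice a statistics package would also report CIs and p-values.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = m*x + b; also returns Pearson's r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx                     # change in y per 1 unit change in x
    b = mean_y - m * mean_x           # intercept
    r = sxy / math.sqrt(sxx * syy)    # correlation coefficient
    return m, b, r


x = [1, 2, 3, 4, 5]                   # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]        # hypothetical outcomes
m, b, r = fit_line(x, y)
residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]  # observed - fitted
r_squared = r ** 2                    # proportion of variation in y explained
```

With an intercept in the model, the least-squares residuals always sum to zero, which matches the "mean residual = 0" property noted above.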

Multiple Linear Regression

- 1 dependent variable and ≥ 2 independent variables
- linear relationship between a dependent variable (y) and some covariates (x)
- shows whether or not the dependent variable (y) is linearly related to any
one of the explanatory variables after adjusting for the other covariates (controls for
confounders)

Univariate analysis: 1 variable Y depends on at least one of the Xi's

Multivariate analysis
- the set of variables Y1..Yk depends on at least one of the Xi's
- fitted by least squares: a linear combination of the predictor variables is chosen so that the sum of squared
deviations between the observed & predicted values of the outcome variable is
minimized

Comparing proportions: binary or categorical outcome (e.g. fracture: yes/no)

Rules
1. Random sampling
2. Categorical variable
3. Expected value (E)
- all E > 1
- 2x2 table: all E > 5
- RxC table: no more than 20% of cells with E < 5

Cells > 5, unpaired: Chi-square test
- compares proportions between groups
- non-parametric test
- rows (r) and columns (c) mutually exclusive
- DF = (r - 1)(c - 1)
- expected (E) number in each cell = (relevant row total x relevant column total)
divided by the overall total
- up to 20% of the cells are allowed E frequencies < 5 (the E values are counted to see whether Fisher's exact
test is needed)
- H0: O and E frequencies approximately equal; for paired data,
  - proportion of preferences of a certain type in the population = 1/2
  - difference between responses for the pairs in the population = 0
- Chi-square value: χ² = ∑ (observed - expected)² / expected
  - 2x2 table: ½ is the continuity correction
  - larger (> 2 x k) tables: no continuity correction
- shows that one group differs significantly from another group
(without knowing which specific group is better)
- when looking for a trend in proportions, the categories need to be ordered
- if rows and columns are associated, reject H0

Cells > 5, paired: McNemar's chi-square test
- compares a binary outcome between correlated groups (e.g. before and after)
- otherwise similar features to the chi-square test

Cells < 5, unpaired: Fisher's exact test
- compares proportions between independent groups when there are sparse data (some cells < 5)
- no need to calculate expected frequencies

Cells < 5, paired: McNemar's exact test
- compares proportions between correlated groups when there are sparse data (some cells < 5)
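
The expected-count and chi-square arithmetic can be sketched as follows. `chi_square` is an illustrative helper on a hypothetical 2x2 table; it returns only the statistic, DF and expected counts, so the p-value would still come from a chi-squared table or a statistics package.

```python
def chi_square(table, continuity=False):
    """Chi-square statistic, DF and expected counts for an r x c table.

    With continuity=True, 1/2 is subtracted from |O - E| (for 2x2 tables).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            e = row_totals[i] * col_totals[j] / total
            expected_row.append(e)
            diff = abs(observed - e)
            if continuity:
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / e
        expected.append(expected_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


table = [[10, 20],   # hypothetical: group 1, outcome yes/no
         [20, 10]]   # hypothetical: group 2, outcome yes/no
stat, df, expected = chi_square(table)
stat_corrected, _, _ = chi_square(table, continuity=True)
```

Here every expected count is 15, so the rule that all E > 5 in a 2x2 table is satisfied and Fisher's exact test is not needed.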

4. Survival analysis
- a collection of analytical procedures where the outcome of interest is the time until an
event of interest occurs
- event: death, disease relapse, fracture

- time-to-event: time from entry into a study until a subject has a particular outcome
- censoring: occurs when we have some information about the individual, but we do
not know the survival time exactly
a. Right-censoring occurs with
1. no event by the time the study ends 2. loss to follow-up 3. withdrawal from the study 4. death from
another cause (car accident)

b. Left-censoring: the subject was not under follow-up from the beginning, so the event may have occurred before observation started

Survival function S(t): gives the probability that a person survives longer than some specified
time t

Hazard function h(t): gives the instantaneous potential per unit time for the event to occur,
given that the individual has survived up to time t

Relationship: the higher h(t), the faster S(t) decreases

Survival analysis methods

Life-table: reasons for use
- large sample (individual data not feasible, so data are grouped into intervals)
- exact time of the event unknown (e.g. 6-10 wk)
Kaplan-Meier
- method of estimating time-to-event models in the presence of censored cases
- intervals are defined by failures, e.g.

Survival of 10 breast ca patients
1st interval: after the 1st death, survival = 9/10
Then 1 patient drops out (censored), leaving 8 at risk (10 - 2)
2nd interval: after the 2nd death, 7 (9 - 2) of the 8 at risk survive
P(surviving intervals 1 and 2) = 9/10 x 7/8 = 0.79
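
The product-limit arithmetic in this example can be reproduced with a short sketch. `kaplan_meier` is an illustrative helper, and the follow-up times (death at 1, drop-out at 2, death at 3, remaining 7 patients censored at 5) are assumed purely to recreate the 10-patient example.

```python
def kaplan_meier(subjects):
    """Kaplan-Meier estimate.

    subjects: list of (time, died) pairs; died=False means censored.
    Returns a list of (death time, cumulative survival probability).
    """
    survival = 1.0
    curve = []
    death_times = sorted({t for t, died in subjects if died})
    for t in death_times:
        at_risk = sum(1 for time, _ in subjects if time >= t)
        deaths = sum(1 for time, died in subjects if time == t and died)
        survival *= (at_risk - deaths) / at_risk  # interval survival
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times matching the 10-patient example.
patients = [(1, True), (2, False), (3, True)] + [(5, False)] * 7
curve = kaplan_meier(patients)
```

The censored patient leaves the risk set without counting as a death, which is why the second interval uses 8 rather than 9 in the denominator.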

Biases of survival analysis: selection, information, measurement, confounding bias

Significance tests for survival analysis:
log-rank, Cox proportional hazards, Mantel-Haenszel test

Use the log-rank test to test the null hypothesis of no difference between the survival functions
of the two groups (non-parametric)

Cox proportional hazards regression
- assumptions: the hazard ratio should be constant across time;
observations should be independent

Hazard function
- measure of the potential for the event to occur at a particular time t,
given that the event has not yet occurred
- larger values of the hazard function indicate greater potential for the event to occur

Hazard ratio: hazard (chance) of events occurring in the treatment arm as a ratio of the hazard
of the events occurring in the control arm
- HR > 1: ↑ risk
- HR < 1: ↓ risk
- HR = 1: equal, no change in risk

5. Study methods: observational and experimental

Observational
- Descriptive: case report, case series, cross-sectional
- Analytical: cross-sectional, case-control, cohort study, ecological
Experimental: any intervention, e.g. RCT
Longitudinal: samples are followed up for a period of time
Observational
Case study/series
- written description of a patient or particular problem
- generates hypotheses
- rare diagnosis, event

Retrospective
1. Cross-sectional (prevalence study)
Methods:
- random cross-section of the population
- assess exposure & outcome simultaneously in a defined population at a single point in time
(often unclear whether the exposure preceded the outcome, so no causality)
- calculate prevalence
Pro:
1. cheap, fast
2. describes characteristics in a large population
Con:
1. recall bias
2. outcome/disease may be affected by the point in time (seasonal variation, law, health)
3. reversed temporality

Repeated cross-sectional: changes over time
- the target population is not followed up; a new group is sampled
at each repeat
2. Case-control study
Methods:
- start with people with the disease
- match them with controls (w/o disease)
- look back and assess exposures
- the control group shouldn't be selected on the basis of exposure status, as this affects the analysis
- the odds ratio is calculated
- OR is an estimate of RR when the disease is rare
Interpretation of odds:
- odds ratio of 5: the odds of the outcome (disease) among the exposed are 5 times those of the unexposed
Pro:
- rare disease
- cheap, fast
- explore multiple hypotheses
Cons:
- recall bias
- temporal relationships cannot be established

3. Ecologic studies
- look for an association between an exposure and an outcome at the group
level rather than in individuals
- any relationship that is determined is between the factor and the group of individuals
Prospective
Cohort study: etiology, prognosis, natural history

Methods
- begin with disease-free people
- classify people as exposed/unexposed
- record outcomes of both groups
- compare outcomes using relative risk
RR = 1: equal risk in both groups
RR > 1: ↑ risk; RR < 1: ↓ risk
e.g. RR = 3, interpretation:
- 3 times the risk
- a 2-fold or 200% ((3 - 1) x 100%) increase in risk

Pro:
1. temporality between exposure & outcome helps to establish cause
2. can study rare exposures
3. can study multiple outcomes
4. better quality
5. direct risk assessments
- incidence rate
- relative risk
- attributable risk

Cons:
1. expensive, loss to follow-up (long duration), size (large numbers)
2. inappropriate for studying rare outcomes

Example
Incidence of cancer for smokers (28) / non-smokers (20) = 1.4
1. Incidence in smokers was 1.4 times that in non-smokers
2. Incidence in smokers was 40% ((1.4 - 1)/1 x 100%) higher than in non-smokers
3. Attributable risk = (28 - 20)/28 = 29%
Among smokers, 29% of cancer is attributable to smoking
Odds and relative risk
(2x2 table: A = exposed with outcome, B = exposed without outcome, C = unexposed with outcome, D = unexposed without outcome)

Absolute risk/incidence rate
Absolute risk (of dying if exposed) = A / (A + B)
Absolute risk (of dying if not exposed) = C / (C + D)

Absolute risk reduction: measure of how much disease incidence is attributable
to exposure (burden)
= absolute risk (if exposed) - absolute risk (if not exposed)
= A / (A + B) - C / (C + D)

Number needed to treat (NNT): number of patients who need to be treated to prevent one adverse event
= 1 / absolute risk reduction (cost effectiveness)

Attributable risk as a proportion of the risk in the exposed (PAR, population attributable risk)
= [A / (A + B) - C / (C + D)] / [A / (A + B)]

Relative risk: measure of strength of association between exposure and
disease; useful in analytical studies
- used in cohort studies and controlled trials
= (A / (A + B)) / (C / (C + D))

Odds ratio
- used in case-control studies
= (A / C) / (B / D)
- odds of exposed/unexposed in the diseased (case) group divided by the odds of
exposed/unexposed in the non-diseased (control) group
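
As a worked check of these formulas, the sketch below computes each measure from one hypothetical 2x2 table. It assumes the layout implied by the formulas: A and B are the exposed with and without the outcome, C and D the unexposed with and without it. The function name and the counts are illustrative.

```python
def two_by_two(a, b, c, d):
    """Risk measures from a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome
    c: unexposed with outcome, d: unexposed without outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_difference = risk_exposed - risk_unexposed
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "risk_difference": risk_difference,
        "number_needed": 1 / risk_difference,
        "relative_risk": risk_exposed / risk_unexposed,
        "odds_ratio": (a / c) / (b / d),
    }


measures = two_by_two(a=30, b=70, c=10, d=90)  # hypothetical counts
```

With these counts the relative risk is 3 but the odds ratio is about 3.9; the two only come close when the outcome is rare.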

Causality: 9 Hill's criteria
- Temporal relationship: exposure precedes outcome
- Strength: measured by appropriate statistical tests
- Dose-response relationship
- Consistency: association results replicate in other studies
- Plausibility: association compatible with current pathological process
- Alternate explanations
- Experiment
- Specificity: a single putative cause produces a specific effect
- Coherence: association compatible with existing theory & knowledge

6. Clinical trials
Clinical trial phases
- Phase I: safety profile of treatment (uses 1/10 of the LD10 of mice)
- Phase II: studies safety, efficacy, dose response
(follows phase I; conducted on a larger group to determine safety,
identify side effects and a safe dosage range)
- Phase III: comparison with standard treatment
- Phase IV: post-marketing surveillance
Clinical trial
- subjects are randomized into at least two groups
- control arm: receives either a placebo or the current standard of care
- intervention arm: receives the agent being studied

Rx arm
- observed effect = Rx effect + NC (natural course) + EE
(external effects, e.g. placebo) + error (bias)
Control arm
- observed effect = NC + EE + bias
Protocol: a written description of all aspects of the trial (e.g. inclusion,
exclusion, sample size, endpoint) should be prepared
Ethics and consent
- trials must be approved by an ethical committee and must not
contravene the Declaration of Helsinki
- informed consent must be obtained from the patient
- the patient must be informed about placebo use (the chance of randomization to placebo)
RCT: 3 elements: Randomization, Placebo, Blinding

Randomization
- subjects are randomly assigned, with an equal chance of entering any one of the treatment groups
- index and reference groups are comparable for all known and unknown factors
- avoids selection bias and confounding bias
- does not guarantee equal distribution in small samples

Methods of randomization
Simple randomization
- using a random number table
- no guarantee of equal or approximately equal sample size in
each treatment group (imbalance in the number of participants in each group)

Block randomization
- aka restricted randomization
- groups of experimental units within a predefined block of units
assumed to be internally homogeneous
- guarantees balance in numbers

Stratified randomisation
- individuals are identified based on important covariates (sex,
age, etc.) and then randomization occurs within the strata
- guarantees balance in numbers

Ways to carry out randomization
1. Flipping coins
2. Computer-generated numbers
3. Concealment of allocation
- procedure to protect the randomization process before the
subject enters the trial
- always feasible; if not done, leads to selection bias
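
Block randomization is easy to illustrate with computer-generated allocation. This is a sketch only: the arm labels, block size and fixed seed are assumptions made so the example is reproducible, and a real trial would keep the allocation list concealed from recruiters.

```python
import random


def block_randomize(n_subjects, block_size=4, arms=("A", "B"), seed=0):
    """Allocate subjects to arms in shuffled blocks with equal arm counts."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_subjects:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)            # random order within the block
        allocation.extend(block)
    return allocation[:n_subjects]


allocation = block_randomize(12)
```

After every complete block the arms are exactly balanced, which simple randomization cannot promise in a small trial.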

Placebo: controls for the placebo effect in subjects and for the natural history of the disease

Blinding/masking
- masking of the treatments after randomization (once the trial
begins); not always feasible
- avoids measurement and information bias (observer bias)
Blind
- Single: patient unaware of treatment (avoids placebo effect)
- Double: patient and investigator unaware of treatment (avoids
observer bias)
- Triple: patient, investigator and statistician unaware

Open-label: unblinded; unavoidable e.g. in surgical procedures, comparison of
different devices and changes in lifestyle (modifiable risk factors)
Advantages of RCT
1. randomization: control of known & unknown factors that
influence outcome
2. control of exposure
3. high internal validity
4. a true measure of efficacy

Disadvantages of RCT
1. limited external validity: results must also be clinically relevant to patients
2. cost and time
3. limits to generalisability: selection of the study population
(bias)
4. ethical dilemmas
Type of study
1. Parallel

2. Cross-over
- each group receives each arm of Rx, switching over to the other Rx after a washout period
- each patient is used as his/her own control
- fewer patients need to be recruited
- individual differences in responses to the two treatments are
calculated and analysed

3. Factorial
- experiment in which there are ≥ 2 factors of interest
- each factor in a factorial experiment can have ≥ 2 levels
- allows the study of interactions
- an interaction between 2 factors implies that the difference between the
levels of one factor is not constant across the levels of the
other factor
- smaller sample size

Interim analyses: preplanned analyses, e.g. to ensure that no major toxicities have
occurred, or to detect a significant benefit requiring the trial to be stopped
prematurely

Intention-to-treat: participants are counted in the intervention group to which
they were originally assigned for analysis, even if they:
- refused the intervention after randomization
- discontinued the intervention during the study
- followed the intervention incorrectly
- violated the study protocol
- missed follow-up measurements
Reasons
- Preserves the benefits of randomization.
Randomization balances potential confounding factors in the
study arms. This balance will be lost if the data are analyzed
according to how participants self-selected rather than how
they were randomized
- Simulates real life, where patients often don't adhere perfectly
to treatment or may discontinue treatment altogether

NNT: clinical cost effectiveness of the results of a trial

Endpoint
- a predefined outcome for an individual
- does not have to be a clinical outcome measure
- a composite endpoint requires the participant to experience one
of a number of possible endpoints
- a surrogate endpoint is a biomarker intended to substitute for a
clinical endpoint

1° (primary) endpoint: related to efficacy, e.g. tumour regression, morbidity,
local recurrence, distant mets, death

2° (secondary) endpoint: related to toxicity

Systematic review
Meta-analysis - Particular type of systematic review that focuses on the
numerical results.
- combine results fr all individual studies(clinical/observation)

Advantages
1.Increase the numbers of observations and the statistical power
2. Improve the estimates of the effect size of an intervention or
an association
3. Ability to control for between-study variation including
moderators to explain variation.

Disadvantages
1. Publication bias: studies that show no significant results are very hard
to get published, so they are under-represented
2. Sources of bias are not controlled by the method
3. May include badly designed studies that will cause flawed
statistics

Forest plot
- displays the RR and CI of each trial and of all trials combined
- the CI for the combined RR in the meta-analysis is
narrower (more precise) than that of any of the individual studies, because of the
larger numbers after combining the studies
- a random effects model is used if there is heterogeneity between studies

Funnel plot: a useful graph designed to check for the existence
of publication bias in systematic reviews and meta-analyses

7. Epidemiology
Effect modification
- the magnitude of the effect of a particular exposure on the outcome
varies according to the presence of a third factor
- e.g. smoking and asbestos exposure alone each → lung Ca, but
together asbestos and smoking multiply the risk of lung cancer
significantly (a "synergistic" effect)
- cannot be corrected or eliminated

Stratification: eliminates confounding and describes effect modification

Prevalence
- point prevalence: number of people who have a disease or condition
at a given point in time in a defined population
- period prevalence: the total number of people known to have
had the condition at any time during a specified period
- P = I x D (P: prevalence, I: incidence, D: duration)
- long duration: higher prevalence
- a change in incidence may or may not change prevalence, depending on mortality rate and duration

Incidence
= number of new cases of a disease over a period of time / population at risk of the disease in the time period
Adjusted rate/standardization
- removes common confounding factors: age/sex
- direct and indirect methods

Direct standardization
- needs a larger study population with stable age-specific rates
- suitable for cancer incidence rates

Workout
- study age-specific rate x standard population in that age group = expected number of deaths
- total expected deaths across all age groups / total standard
population = standardized death rate (SDR)

Indirect standardization
- smaller study population
Workout
- age-specific rate of the standard population x study population in that age group
= expected number of deaths
- SMR = total observed deaths / total expected deaths x 100%; if
> 100% higher rate, < 100% lower rate
- SDR = SMR x CDR (crude death rate) of the standard population
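
Both workouts reduce to weighted sums, as the sketch below shows. The function names, age-specific rates and population sizes are hypothetical values chosen for easy arithmetic.

```python
def direct_sdr(study_rates, standard_population):
    """Directly standardized death rate.

    Applies the study's age-specific rates to the standard population.
    """
    expected_deaths = sum(r * p for r, p in zip(study_rates, standard_population))
    return expected_deaths / sum(standard_population)


def smr_percent(observed_deaths, standard_rates, study_population):
    """Standardized mortality ratio (indirect method), as a percentage.

    Applies the standard population's age-specific rates to the study population.
    """
    expected_deaths = sum(r * p for r, p in zip(standard_rates, study_population))
    return observed_deaths / expected_deaths * 100


# Hypothetical two age groups (young, old).
sdr = direct_sdr(study_rates=[0.001, 0.010], standard_population=[8000, 2000])
smr = smr_percent(observed_deaths=18,
                  standard_rates=[0.002, 0.020],
                  study_population=[1000, 500])
```

Here the study population has 18 observed deaths against 12 expected, so the SMR is 150%, a higher rate than the standard population.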

Bias
- systematic error
- 3 categories of bias: selection, measurement, confounding
Selection bias
-selected samples are not representative of study population

i.e
1. Berkson fallacy
- control subjects for a case-control study are selected from hospitalized patients
- the exposure frequency in hospitalized patients does not necessarily reflect that of the general
population

2. Referral bias
- samples from specialized medical centers (may have more severe illness than in a community hospital)

3.Selective loss to follow-up(cohort studies)

- high rate of follow-up loss creates a high potential for selection bias in prospective
studies

4. Non-response bias
- study design allows subjects to decide whether or not to participate in the study.
Imagine a health survey conducted by a random selection of phone numbers. The
phone numbers selected are called and people are interviewed using a standardized
questionnaire. There are always people who would refuse to participate in the survey.
If the refusal is somehow related to their health status (e.g., they are sicker than the
general population), then non-response selection bias results.

5. Prevalence bias (Neyman bias) may occur when incidence of a disease is estimated
based on prevalence, and data become skewed by selective survival. Diabetics are
more likely to die from myocardial infarction than are non-diabetics. If living patients
who have sustained myocardial infarction are asked about their diabetes status, it is
likely that diabetics will be under-represented because non-diabetics selectively
survived their cardiovascular events.

6. Susceptibility bias occurs when the treatment regimen selected for a patient
depends on the severity of the patient's condition. Imagine patients with acute
coronary syndrome. Healthier patients may be preferentially selected for coronary
intervention, while sicker patients may instead be selected for medical therapy. This
may create bias whereby outcomes from coronary intervention appear superior to
medical therapy simply because the subjects who underwent coronary intervention
were healthier.
Measurement/information- inaccurate estimate of exposure/outcome
- Measurement bias implies that exposure and/or outcome data are systematically
misclassfied (e.g exposed cases are labeled as unexposed).

- Misclassification
- differential - misclassification of exposure (or disease) is related to disease (or exposure) status
- non-differential - misclassification of exposure (or disease) is unrelated to disease (or exposure) status

1. Recall bias - a potential problem in case-control studies; it can result in overestimation of
the effect of exposure.

2. Observer bias (ascertainment bias, detection bias or assessment bias) occurs when
the investigator's decision is adversely affected by knowledge of the exposure status.
Blinding of the health care provider is an effective tool to avoid observer bias.
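
The difference between differential and non-differential misclassification can be illustrated with a minimal sketch. All counts and error rates below are hypothetical, and the helper names (odds_ratio, misclassify) are made up for illustration. In this example, equal error rates in cases and controls pull the odds ratio towards 1, while better recall of exposure among cases (recall bias) inflates it.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds ratio from a case-control 2x2 table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


def misclassify(exposed, unexposed, sens, spec):
    """Expected recorded counts when exposure is measured imperfectly.

    sens: probability that a truly exposed subject is recorded as exposed
    spec: probability that a truly unexposed subject is recorded as unexposed
    """
    obs_exposed = sens * exposed + (1 - spec) * unexposed
    obs_unexposed = (1 - sens) * exposed + spec * unexposed
    return obs_exposed, obs_unexposed


# Hypothetical true exposure counts: (exposed, unexposed)
cases = (60, 40)
controls = (30, 70)
true_or = odds_ratio(*cases, *controls)

# Non-differential: the same error rates apply to cases and controls
nd_cases = misclassify(*cases, sens=0.8, spec=0.9)
nd_controls = misclassify(*controls, sens=0.8, spec=0.9)
nondiff_or = odds_ratio(*nd_cases, *nd_controls)

# Differential (recall bias): cases recall their exposure better than controls
d_cases = misclassify(*cases, sens=0.95, spec=1.0)
d_controls = misclassify(*controls, sens=0.70, spec=1.0)
diff_or = odds_ratio(*d_cases, *d_controls)

print(f"true OR = {true_or:.2f}")
print(f"non-differential OR = {nondiff_or:.2f}")
print(f"differential OR = {diff_or:.2f}")
```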

Confounding
- an extraneous factor must have properties linking it with both the exposure and the
outcome (a confounder is related to both the explanatory variable and the response)
1. Factor is associated with exposure
2. Factor is associated with disease in the absence of exposure
3. Factor is not in the causal path between exposure and outcome

Control confounding methods

1. Study design
- Randomization for RCT
- Restriction
- Matching for case-control

2. Analysis
- Stratification
- Multivariate analysis
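
Stratification can be sketched with a small worked example. The counts are hypothetical, and the Mantel-Haenszel pooled odds ratio is used here as one common stratified estimator; it is an assumption of this sketch, not something named in these notes. Each stratum is a tuple (a, b, c, d) = (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table of exposure vs disease."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across strata of a confounder."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical data: within each stratum the exposure has no effect (OR = 1),
# but the confounder is linked to both the exposure and the disease.
strata = [
    (40, 40, 10, 10),  # confounder present
    (2, 18, 8, 72),    # confounder absent
]

# Collapsing the strata gives the crude (unadjusted) table
crude_table = tuple(sum(col) for col in zip(*strata))
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata)

print(f"crude OR = {crude_or:.2f}")
print(f"stratum ORs = {[round(odds_ratio(*s), 2) for s in strata]}")
print(f"Mantel-Haenszel OR = {adjusted_or:.2f}")
```

In this constructed example the crude odds ratio suggests an association that disappears once the analysis is stratified by the confounder.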

Screening test - aims at earlier detection & improved OS/prognosis

What Diseases:

- must be an important health problem
- recognizable latent or early symptomatic stage
- natural history of the disease (latent & declared disease) should be understood
- the disease can be detected prior to the onset of signs and symptoms
- a test is available to confirm the diagnosis
- treatment is available
- early detection and treatment reduce morbidity and mortality

When: routine (e.g. antenatal tests) or ad hoc (episodic, e.g. during an outbreak)

Criteria
Acceptability
- Simple, safe, easily administered
- Test should be acceptable to the population screened and to those administering it
Repeatability
- consistent results when repeated (high precision)
Reliability (precision, reproducibility) is affected by:
• Observer variation: tested by repeat measurements at the same time
1. Intraobserver (variation within the same observer)
2. Interobserver (variation between different observers)

• Biological variation: tested by repeat measurements over time
- variations in the way patients perceive their symptoms

• Variation due to the test method, e.g. defective instruments, test error

Validity - the extent to which the test accurately measures what it purports to measure
- Testing validity: sensitivity and specificity

Sensitivity - given the subject is ill, probability of a +ve test
Sensitivity = TP / (TP + FN) = a / (a + c)
- PID: +ve in disease (diseased case with +ve test)
- not affected by prevalence
- screening test; used to rule out (SnOut)

Specificity - given the subject is not ill, probability of a -ve test
Specificity = TN / (TN + FP) = d / (d + b)
- not affected by prevalence
- confirmatory test; used to rule in (SpIn)
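
As a quick check of the two formulas, here is a minimal sketch using the cell labels from the notes (a = TP, b = FP, c = FN, d = TN). The counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sensitivity(tp, fn):
    """Probability of a +ve test given that the subject is ill."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Probability of a -ve test given that the subject is not ill."""
    return tn / (tn + fp)


# Hypothetical 2x2 table: a = TP, b = FP, c = FN, d = TN
a, b, c, d = 90, 50, 10, 850

sn = sensitivity(tp=a, fn=c)  # a / (a + c)
sp = specificity(tn=d, fp=b)  # d / (d + b)

print(f"sensitivity = {sn:.3f}")
print(f"specificity = {sp:.3f}")
```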

Validity of a test (trade-off between +ve/-ve results of a test)
- Ideal test: Sn & Sp = 1 (impossible in practice)
- A: cut-off giving 100% sensitivity; B: cut-off giving 100% specificity
- X: trade-off between the two; as X shifts towards higher Sn
→ ↑ true and false +ve (higher Sn); ↓ TN (lower Sp)
- ↓ area of overlap (FN/FP) → higher Sn & Sp

Sensitivity should be ↑:
•Penalty associated with missing a case is high e.g.
•When the disease is serious & definitive treatment exists.
•When the disease can be spread
•When subsequent diagnostic evaluations are associated with
minimum cost & minimum risk
Specificity should be ↑:
•When false positive results can harm patient physically,
financially or emotionally.
•When costs or risks associated with further diagnostic
techniques are substantial.

ROC (receiver operating characteristic) curve
- shows the trade-off between Sn & Sp available for the test

Use:
1. decide cut-off values for a test
2. determine the ability of a test to predict disease

- Graph: sensitivity (y) vs 1 - specificity, i.e. false positive rate (x)
- Better test: curve lies more to the upper left
- AUC: measures the diagnostic accuracy of the test
- A: the larger the AUC (~1), the better the test
- B: AUC ~0.5 (straight line) lacks predictive value
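
How an ROC curve and its AUC arise from moving the cut-off can be sketched in plain Python. The scores and disease labels are hypothetical, and the function names are made up for illustration. Each distinct score is used as a cut-off (test +ve if score >= cut-off), and the AUC is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) points, strictest cut-off first."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cut-off above every score: nothing is called +ve
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical test scores and true disease status (1 = diseased)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

points = roc_points(scores, labels)
area = auc(points)

for fpr, tpr in points:
    print(f"1 - Sp = {fpr:.2f}, Sn = {tpr:.2f}")
print(f"AUC = {area:.3f}")
```

Lowering the cut-off moves along the curve towards the upper right: sensitivity rises, but so does the false positive rate.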

Positive predictive value (PPV) - given a +ve test result, probability that a subject is actually ill
PPV = TP / (TP + FP); depends on prevalence & specificity
- higher prevalence (+ve test is more likely a true +ve) → higher PPV
- higher specificity → fewer false +ve → higher PPV

Negative predictive value (NPV) - given a -ve test result, probability that a subject doesn't have the disease
NPV = TN / (TN + FN)
- lower prevalence (-ve test is more likely a true -ve) → higher NPV
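
The dependence of PPV and NPV on prevalence can be checked with a short sketch. The sensitivity, specificity and prevalence values are hypothetical. The function works with expected proportions of TP, FP, TN and FN in the screened population rather than raw counts.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    tn = spec * (1 - prevalence)
    fp = (1 - spec) * (1 - prevalence)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Same hypothetical test (Sn = Sp = 0.9) at three different prevalences
results = {prev: predictive_values(0.9, 0.9, prev) for prev in (0.01, 0.10, 0.50)}

for prev, (ppv, npv) in results.items():
    print(f"prevalence = {prev:.2f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With sensitivity and specificity held fixed, PPV rises and NPV falls as prevalence increases.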

+ve likelihood ratio (LR+) = Sensitivity / (1 - Specificity); not affected by prevalence
- ratio of the chance of a positive result if the patient has the disease to the chance
of a positive result if he/she does not have the disease
- e.g. an LR+ of 9: a +ve test is 9x more likely in a diseased patient than in a patient
without disease
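
A minimal sketch of the LR+ calculation follows, with hypothetical sensitivity and specificity. The conversion from pre-test to post-test probability via odds is a standard use of the likelihood ratio that is added here for illustration; it is not taken from these notes.

```python
def positive_likelihood_ratio(sens, spec):
    """LR+ = sensitivity / (1 - specificity)."""
    return sens / (1 - spec)


def post_test_probability(pre_test_prob, likelihood_ratio):
    """Update a pre-test probability using: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical test with Sn = Sp = 0.9
lr_pos = positive_likelihood_ratio(sens=0.9, spec=0.9)

# Hypothetical pre-test probability (e.g. prevalence) of 10%
post = post_test_probability(pre_test_prob=0.10, likelihood_ratio=lr_pos)

print(f"LR+ = {lr_pos:.1f}")
print(f"post-test probability after a +ve result = {post:.2f}")
```

The LR+ itself does not change with prevalence, but the post-test probability does, because it starts from the pre-test probability.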
Bias in screening
Selection bias - volunteer bias
Lead-time bias - OS appears improved because of earlier detection by screening, not because of
successful intervention/Rx
Length-time bias - OS appears improved because slow-growing/non-aggressive tumors are
preferentially detected by screening
Overdiagnosis bias - false +ve results arising from enthusiasm for a new screening test
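
Lead-time bias can be made concrete with simple arithmetic. The ages below are hypothetical: the same patient dies at the same age in both scenarios, and only the date of diagnosis moves.

```python
def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Survival time as usually measured: from diagnosis to death."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient whose age at death is NOT changed by screening
age_at_death = 70

clinical_survival = survival_from_diagnosis(67, age_at_death)  # diagnosed on symptoms
screened_survival = survival_from_diagnosis(64, age_at_death)  # diagnosed by screening

lead_time = screened_survival - clinical_survival

print(f"survival after clinical diagnosis = {clinical_survival} years")
print(f"survival after screen detection = {screened_survival} years")
print(f"apparent gain (lead time) = {lead_time} years, with no change in age at death")
```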
