2statsnotes 1

Uploaded by Sana
Copyright © All Rights Reserved
Clinical Oncology Revision Notes Series
Biostatistics

Contents

1. Descriptive statistics
1.1 Types of data
1.2 Displaying Data methods
1.3 Measures of central tendency
1.4 Measures of Dispersion
1.5 Distribution
2. Sampling
3. Principle of statistical inference
4. Survival analysis
5. Study methods
6. Clinical trials
7. Epidemiology
1. Descriptive statistics
Types of data
Variable - a characteristic of interest (e.g. diet regime) which takes different values in different items/objects (e.g. body weight under different regimes)

A. Independent or dependent
Independent - predictive factor that has an impact on a dependent variable
 e.g. different dietary fat regimens
Dependent - outcome that reflects the effect of changing the independent variable
 e.g. body weight under different dietary fat regimens

B. Qualitative or quantitative
Categorical (qualitative - cannot be measured, only counted)
 - Binary (binomial, 2 groups), unordered or ordered: dead/alive (unordered); better/worse (ordered)
 - Nominal (> 2 groups, unordered categories): type of cancer; blood group O, A, B, AB
 - Ordinal (ordered categories): breast cancer stage I, II, III, IV; birth order 1st, 2nd, 3rd; letter grades (A, B, C, D, F)
Numerical (quantitative - can be measured)
 - Continuous: blood pressure, height, weight, age
 - Discrete: number of children; number of asthma attacks per week
 - Quantitative variables are often converted to categorical ones for descriptive purposes
 - Categorising data is useful for summarising results, not for statistical analysis

Displaying Data methods
Graphical display - looking for patterns; numerical summaries - objective, precise
Categorical variables
 - Bar chart (length of bar = frequency)
 - Pie chart
Continuous variables
 - Stem-and-leaf plot and box plot (both can show 1 categorical vs 1 continuous variable, e.g. gender (F, M) vs height)
 - Histogram (area = frequency)
 - Dot plot/scatter plot (scatter plot: 2 continuous variables)

Measures of central tendency
Mean
 - Arithmetic mean: the average (affected by extreme values/outliers)
 - Geometric mean: the antilogarithm of the arithmetic mean of the log-transformed data
  - useful if the data are skewed to the right
  - close to the median if the data are skewed to the right
Median - 50th percentile value
 - the middle value in a sequentially ordered group of numbers
 - with an even number of observations, the mean of the two middle values
 - not affected by extreme values (outliers)
Mode - most frequently occurring value; not affected by extreme values (outliers)
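A minimal Python sketch (standard library only) of the measures above, on a small hypothetical data set with one outlier: the arithmetic mean is pulled upwards by the outlier while the median and mode are not. The geometric mean is computed literally as the antilog of the mean of the logs, which should agree with statistics.geometric_mean.

```python
import math
import statistics

# Hypothetical right-skewed sample with one outlier (100)
data = [2, 3, 3, 4, 5, 100]

arithmetic_mean = statistics.mean(data)   # pulled upwards by the outlier
# Geometric mean = antilog of the arithmetic mean of the log-transformed data
geometric_mean = math.exp(statistics.mean([math.log(x) for x in data]))
median = statistics.median(data)          # even n: mean of the two middle values
mode = statistics.mode(data)              # most frequently occurring value

print(f"mean = {arithmetic_mean}, geometric mean = {geometric_mean:.2f}, "
      f"median = {median}, mode = {mode}")
```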


Measures of Dispersion (spread or variability)
Range - difference between the largest and the smallest observations
 - distorted by outliers
Percentiles - 1st to 99th percentile
Quartiles - data divided into 4 equal portions
Interquartile range (IQR) = 3rd quartile - 1st quartile = Q3 (75th percentile) - Q1 (25th percentile)
 - not affected by extreme values (outliers)
Variance - average squared deviation of values from the mean
 $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Standard deviation (SD) - square root of the variance
 - variation about the mean
 - affected by outliers (SD larger than expected)
 - affected by skewed data (overestimates the spread)
Coefficient of variation = SD/mean
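The dispersion measures can be sketched in Python as below, using hypothetical data; the variance uses the n - 1 divisor as in the formula above. Quartile rules differ between software packages, so the IQR from statistics.quantiles (default 'exclusive' method) may not match other tools exactly.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]   # hypothetical observations

data_range = max(data) - min(data)
mean = statistics.mean(data)
# Sample variance: sum of squared deviations from the mean, divided by (n - 1)
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
sd = math.sqrt(variance)
cv = sd / mean               # coefficient of variation
# Quartiles; statistics.quantiles uses the 'exclusive' method by default
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"range = {data_range}, variance = {variance}, SD = {sd:.3f}, "
      f"CV = {cv:.1%}, IQR = {iqr}")
```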

Distribution
Normal - continuous probability distribution; bell-shaped, not skewed
 - mean ± 1 SD contains ≈ 68%, ± 1.96 SD 95% (± 2 SD ≈ 95%), ± 3 SD ≈ 99.7% of values
 - fully described by its mean and variance
 - Central Limit Theorem
  - if the sample size is large enough, the sample mean x̄ has an approximately Normal distribution
  - this is true no matter what the shape of the distribution of the original data!
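A small simulation illustrating the Central Limit Theorem, under assumptions chosen for illustration: samples of size 50 from an exponential distribution (strongly right-skewed, mean 1, SD 1). The sample means should cluster around 1 with an SD close to 1/√50, and roughly 95% of them should fall within ±1.96 of those SEs, as the Normal approximation predicts.

```python
import random
import statistics

random.seed(1)   # fixed seed so the sketch is reproducible

n = 50           # size of each sample
repeats = 2000   # number of samples drawn
# Exponential distribution with rate 1: right-skewed, mean 1, SD 1
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(repeats)
]

se = 1 / n ** 0.5   # theoretical SD of the sample mean
inside = sum(abs(m - 1) <= 1.96 * se for m in sample_means) / repeats

print("mean of sample means:", round(statistics.mean(sample_means), 3))
print("SD of sample means:", round(statistics.stdev(sample_means), 3),
      "(theory:", round(se, 3), ")")
print("proportion within 1 ± 1.96 SE:", inside)
```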

Standard Normal - distribution with a mean of 0 and SD of 1
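Using Python's statistics.NormalDist, the coverage figures quoted for the Normal distribution can be checked directly, and any Normal value can be converted to the standard Normal by z = (x - mean)/SD. The blood-pressure distribution below is hypothetical.

```python
from statistics import NormalDist

z_dist = NormalDist()   # standard Normal: mean 0, SD 1

for k in (1, 1.96, 2, 3):
    coverage = z_dist.cdf(k) - z_dist.cdf(-k)   # proportion within mean ± k SD
    print(f"within ±{k} SD: {coverage:.4f}")

# Standardising a Normal value: z = (x - mean) / SD
bp = NormalDist(mu=120, sigma=15)   # hypothetical systolic BP distribution (mmHg)
x = 150
z_score = (x - bp.mean) / bp.stdev
print("z-score of 150 mmHg:", z_score)
```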

t-distribution - symmetrical distribution
 - becomes more like the Normal distribution as the sample size (degrees of freedom) ↑
 - used to calculate CIs for, and test hypotheses about, the mean of 1 or 2 groups
F-distribution - used when comparing 2 variances (the F statistic is a ratio of two variances)
 - characterized by two parameters: the DF of the numerator and the DF of the denominator
 - skewed to the right
Lognormal - skewed to the right (the log of the values is Normally distributed)

Skewed - +ve skewed (skewed to the right): tail toward the right
 - -ve skewed (skewed to the left): tail toward the left
Checking skew/normality
 - compare the mean, median and mode
 - skewness = (mean - mode)/SD; within ±1 is acceptable
 - variance ↑ with ↑ value of the variable: right skew; ↓ with ↑ value of the variable: left skew
Kurtosis (excess) ~ 0 ± 1: mesokurtic
 > +1: leptokurtic (narrow, sharp and pointed)
 < -1: platykurtic (broad and plateau-like)
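One common way to quantify skew and kurtosis is the moment coefficient of skewness and the excess kurtosis (kurtosis minus 3, so a Normal distribution scores about 0). This sketch uses the population (uncorrected) moment formulas; statistical packages may apply small-sample corrections, and the ±1 cut-offs above are a rule of thumb applied to these values. The data are hypothetical.

```python
import statistics

def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    mean = statistics.fmean(data)
    m2 = statistics.fmean((x - mean) ** 2 for x in data)
    m3 = statistics.fmean((x - mean) ** 3 for x in data)
    return m3 / m2 ** 1.5

def excess_kurtosis(data):
    """m4 / m2**2 - 3: about 0 mesokurtic, > 0 leptokurtic, < 0 platykurtic."""
    mean = statistics.fmean(data)
    m2 = statistics.fmean((x - mean) ** 2 for x in data)
    m4 = statistics.fmean((x - mean) ** 4 for x in data)
    return m4 / m2 ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]   # long tail to the right

print("symmetric:", skewness(symmetric), excess_kurtosis(symmetric))
print("right-skewed:", skewness(right_skewed))
```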

Binomial - distribution of a proportion (discrete variable with 2 outcomes)
 - approximately symmetrical (Normal) if the sample size is large
Chi-squared - used for proportions/categorical data (e.g. discrete variables with > 2 outcomes)
 - right-skewed distribution taking only positive values
 - becomes more like the Normal distribution as DF ↑

Transformations of skewed data towards Normal
 - Left skew: square
 - Right skew: reciprocal, square root, log transformation
 - Proportion/sigmoid: logit
 - Survival curve: reciprocal
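A sketch of why the log transformation helps with right-skewed data: the hypothetical values below (each double the previous one) are strongly right-skewed, while their logs are evenly spaced and therefore symmetric. The skewness helper is the moment coefficient (population form).

```python
import math
import statistics

def skewness(data):
    """Moment coefficient of skewness, m3 / m2**1.5."""
    mean = statistics.fmean(data)
    m2 = statistics.fmean((x - mean) ** 2 for x in data)
    m3 = statistics.fmean((x - mean) ** 3 for x in data)
    return m3 / m2 ** 1.5

raw = [1, 2, 4, 8, 16, 32, 64]          # hypothetical right-skewed values
logged = [math.log(x) for x in raw]     # log transformation

print("skewness before log:", round(skewness(raw), 2))
print("skewness after log:", round(skewness(logged), 2))
```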
2. Sampling
Population - set of things/objects in which we have an interest at a particular time (universe of all units being studied)
 e.g. if we want to study lung cancer in village A, the study population is all the villagers in village A.
Sample - subset of the population
Sampling (random) error - discrepancy between a sample estimate and its population parameter, because only a portion of the population is studied; reduced by ↑ sample size

Random sampling - each member of the population has an EQUAL CHANCE of being chosen for the sample
Sampling methods:
Simple random - using a random number table
Systematic - list (rank) the population, divide it into equal intervals, then take the same position in each interval
 e.g. sample 10 from 100 people: divide into 10 groups of 10 and choose the 5th person in each group, i.e. the 5th, 15th, 25th ... 85th, 95th persons
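The two probability-based methods can be sketched in Python as follows. The helper names are hypothetical, and the systematic version reproduces the 5th, 15th ... 95th example with a fixed starting position (in practice the start within the first interval is usually chosen at random).

```python
import random

def simple_random_sample(population, k, seed=None):
    """Every member has an equal chance of selection (sampling without replacement)."""
    return random.Random(seed).sample(population, k)

def systematic_sample(population_size, sample_size, start):
    """1-based positions start, start + interval, ... in a listed population."""
    interval = population_size // sample_size   # e.g. 100 // 10 = 10
    if not 1 <= start <= interval:
        raise ValueError("start must lie within the first interval")
    return [start + i * interval for i in range(sample_size)]

print(systematic_sample(100, 10, start=5))
print(simple_random_sample(list(range(1, 101)), 10, seed=42))
```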

Stratified sample (proportional stratified sample) - composition of the sample reflects composition of the population
 e.g. Malaysian population and a proportional stratified sample of 1000 Malaysians:
 Group      Population   Sample of 1000
 Malays     60%          600
 Chinese    25%          250
 Indians    10%          100
 Others     5%           50
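Proportional allocation is simple arithmetic: each stratum's sample size is the total sample size times that stratum's share of the population. A minimal sketch using the Malaysian figures above (the helper name is hypothetical); when shares do not divide evenly, the rounded-down stratum sizes may not add up exactly to the total. A random sample is then drawn within each stratum.

```python
def proportional_allocation(sample_size, strata_percent):
    """Number to sample from each stratum, proportional to its population share (%)."""
    return {name: sample_size * pct // 100 for name, pct in strata_percent.items()}

malaysia = {"Malays": 60, "Chinese": 25, "Indians": 10, "Others": 5}
print(proportional_allocation(1000, malaysia))
```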

Cluster sample - divide the population into groups (clusters), then choose a random sample of the groups, e.g.
 - divide the entire city into "city blocks"
 - choose a random sample of blocks
 - count every person in each selected city block
Multistage sampling - combination of any of the above sampling methods
3. Inferential statistics
Hypothesis testing
Null hypothesis (H0)
 - continuous data: no difference in means exists between two groups
 - discrete data: no difference between 2 proportions
 - a statement about the population: it is either true or false, it has no probability
 e.g. premenstrual and postmenstrual mean daily dietary intake are the same (H0: µd = 0)
Alternative hypothesis (Ha)
 - a difference exists between the two groups (not equal)
 - for a two-sided test, shouldn't set a direction (e.g. better than/worse than)
 - i.e. state "not equal to the treatment group"

Type 1 error (α) - rejecting a true null hypothesis
 - probability = the significance level α; with multiple hypothesis tests the chance of at least one type 1 error ↑, so α is reduced (e.g. Bonferroni correction: α/number of tests)
Type 2 error (β) - accepting (failing to reject) a false null hypothesis
 - depends on sample size and variability of the observations
Power (1 - β), conventionally 0.8 - probability of rejecting the null hypothesis when it is false
 Power ↑ with
 1. ↑ sample size
 2. larger significance level α (0.01 → 0.05)
 3. ↑ effect of interest (difference between groups)
 4. ↓ variability
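The four factors that raise power can be seen in the usual Normal-approximation formula for comparing two means, power ≈ Φ(δ/(σ√(2/n)) - z(1 - α/2)), where δ is the difference of interest, σ the SD and n the size of each group. This Python sketch assumes a two-sided test, equal group sizes and a known SD; the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for a difference in means.

    Ignores the (negligible) chance of rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_diff = sd * math.sqrt(2 / n_per_group)   # SE of the difference in means
    return NormalDist().cdf(delta / se_diff - z_crit)

print(round(power_two_sample(delta=5, sd=10, n_per_group=64), 3))
```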

Significance level - conventionally preset at 0.05
 - if the probability of the observed result (when H0 is true) is < the preset level of 0.05, it is unlikely to be explained by chance alone, so the null hypothesis is rejected
p-value - obtained from a significance test
 - the probability of getting the observed or a more extreme result if the null hypothesis is true
 - after obtaining the p value, compare it with the significance level to decide whether or not to reject H0

Standard error of the mean (SEM) - the standard deviation of the sample mean
 Continuous data: SE = s/√n (s = standard deviation, n = sample size)
 Discrete data/proportion: SE(p) = √[p(1 - p)/n] (p = r/n, r = number of subjects with the characteristic, n = sample size)
 - SE ↑ with variability and ↓ with sample size, i.e. larger sample → smaller SE → more precise estimate of the population mean
 - SEM = SD/√n, so it is always < SD (for n > 1) and gets smaller as n increases
Confidence interval (CI) - an interval that we are 95% confident contains the true population value
 - the value can be a mean, a difference between 2 means, a proportion, etc.
 - e.g. the CI contains the population mean (not the sample mean) with 95% confidence
Width of CI depends on 1. sample size 2. SD (directly with SD, inversely with √n)
1. Mean (Normal distribution)
 a. known variance or N > 30: estimate distributed like the standard Normal
  95% CI = sample mean ± 1.96 × SE (SE = SD/√n)
 b. unknown variance, N < 30: t-distribution
  95% CI = sample mean ± t × SE, with t from the table for DF = n - 1 at 0.025 in each tail (0.05 two-tailed)
2. Proportion
 95% CI = $p \pm 1.96\sqrt{p(1 - p)/n}$
 - small N: exact Binomial distribution
 - large N: Normal approximation to the Binomial distribution
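The CI formulas above can be sketched in Python with hypothetical numbers. The standard library has no t-distribution, so the small-sample version takes the critical value as an argument; 2.262 is the tabulated two-tailed 5% value for DF = 9. The proportion CI is the large-sample (Normal approximation) form only.

```python
import math

def ci_mean_large_sample(mean, sd, n, z=1.96):
    """95% CI for a mean using the Normal approximation: mean ± 1.96 × SD/√n."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

def ci_mean_small_sample(mean, sd, n, t_crit):
    """Same form, but with a t critical value (DF = n - 1) read from a t table."""
    se = sd / math.sqrt(n)
    return mean - t_crit * se, mean + t_crit * se

def ci_proportion(r, n, z=1.96):
    """95% CI for p = r/n using SE(p) = √[p(1 - p)/n] (large n only)."""
    p = r / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

print(ci_mean_large_sample(mean=120, sd=15, n=100))                # SE = 1.5
print(ci_mean_small_sample(mean=120, sd=15, n=10, t_crit=2.262))   # DF = 9
print(ci_proportion(r=40, n=100))
```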

Statistically significant if the 95% CI does not contain
 - 0 for a mean difference
 - 1 for an odds ratio/relative risk
 - or if the CIs of the 2 groups do not overlap (overlapping CIs do not by themselves rule out a significant difference)
Larger sample - smaller SEM, narrower CI
More precise - narrower CI, closer estimate of the population mean
cf. Reference range - the range within which 95% of individual values in the population fall (mean ± 1.96 SD for Normal data)

Sample size calculation - sample size is crucial: if it is too small, we may not be able to detect an important existing effect
 Sample size depends on the parameters set:
 1. ↑ power → ↑ sample size
 2. more stringent significance level (α 0.05 → 0.01) → ↑ sample size
 3. ↑ variability (standard deviation) → ↑ sample size
 4. ↑ effect of interest → ↓ sample size
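Sample size is the power formula turned round: n per group = 2(z(1 - α/2) + z(1 - β))²σ²/δ². The sketch below assumes the Normal approximation, a two-sided test, equal groups and a known SD (numbers hypothetical), and reproduces each of the four rules above.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Approximate n per group for comparing two means, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

print(n_per_group(delta=5, sd=10))               # baseline
print(n_per_group(delta=5, sd=10, power=0.9))    # ↑ power → ↑ n
print(n_per_group(delta=5, sd=10, alpha=0.01))   # smaller α → ↑ n
print(n_per_group(delta=5, sd=20))               # ↑ SD → ↑ n
print(n_per_group(delta=10, sd=10))              # ↑ effect → ↓ n
```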

Conclusion - 2 types of conclusion
 1. Statistical conclusion, e.g. P value < α (0.05), therefore reject H0. The observed result is statistically significant and unlikely to be due to chance alone.
 2. Problem conclusion, e.g. the mean BW of babies born in Hospital X is significantly lower than the population mean.
One-tail and two-tail tests (e.g. Y being a standard/reference value)
 - Two-tail test: to see if there is a difference between X and Y (in either direction)
 - One-tail test: to see if there is a difference between X and Y in one particular direction
  e.g. we test to see if X > Y, or we test to see if X < Y
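A sketch of how the one- and two-tailed p values differ for the same test statistic, assuming a one-sample z-test with known SD; the birth-weight figures for Hospital X are hypothetical. For a symmetric test statistic, the one-tailed p value in the observed direction is half the two-tailed value.

```python
import math
from statistics import NormalDist

def z_test_one_sample(sample_mean, reference, sd, n):
    """z statistic for a sample mean against a reference value (SD assumed known)."""
    return (sample_mean - reference) / (sd / math.sqrt(n))

def p_values(z):
    phi = NormalDist().cdf
    two_tailed = 2 * (1 - phi(abs(z)))   # Ha: X != Y
    one_tailed_less = phi(z)             # Ha: X < Y
    one_tailed_greater = 1 - phi(z)      # Ha: X > Y
    return two_tailed, one_tailed_less, one_tailed_greater

# Hypothetical: mean BW 3.1 kg in 64 babies vs reference 3.3 kg, SD 0.8 kg
z = z_test_one_sample(3.1, 3.3, 0.8, 64)
print(round(z, 2), [round(p, 4) for p in p_values(z)])
```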

Choosing a significance test - consider:
 1. type of variable, e.g. continuous, discrete
 2. normally distributed or not
 3. how many samples? One, two or more?
 4. if two samples, are they independent or paired/matched?
 Make sure the assumptions of the test are not violated.
Parametric - assumes the distribution of scores in the population is Normal with equal variances; suitable when the sample size is large
Nonparametric - used when the distribution of scores in the population is not Normal or the sample size is small

Choosing a significance test: comparing means/medians (outcome variable continuous, e.g. BP, pain score)
Parametric (Normally distributed) - compares means
 - Unpaired/independent: t-test - means of 2 independent groups
 - Paired/dependent: paired t-test - compares means of 2 related groups (e.g. the same subjects before & after Rx); equivalent to a one-sample t-test on the paired differences
 - > 2 independent groups: ANOVA - shows that at least one group differs significantly from the others (without showing which specific group is better or worse)
 - > 2 related groups: repeated-measures ANOVA
Non-parametric - compares medians
 - Unpaired: Wilcoxon rank-sum test (aka Mann-Whitney U test) - alternative to the unpaired t-test
 - Paired: Wilcoxon signed-rank test - alternative to the paired t-test
 - > 2 independent groups: Kruskal-Wallis test - alternative to ANOVA; also for ordinal data (extension of the Wilcoxon rank-sum test)
 - > 2 related groups: Friedman test - alternative to repeated-measures ANOVA
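The paired t-test is a one-sample t-test on the within-subject differences, which this Python sketch makes explicit (hypothetical before/after blood pressures). The standard library does not provide the t-distribution, so the result is compared with the tabulated critical value 2.776 (DF = 4, two-tailed 0.05) instead of computing an exact p value.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic = one-sample t-test on the within-subject differences."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1   # (t, degrees of freedom)

# Hypothetical systolic BP in the same 5 patients before and after Rx
before = [140, 150, 135, 160, 145]
after = [132, 148, 130, 150, 140]
t, df = paired_t(before, after)
T_CRIT_DF4 = 2.776   # two-tailed 0.05 critical value for DF = 4 (t table)
print(f"t = {t:.2f}, DF = {df}, significant at 0.05: {abs(t) > T_CRIT_DF4}")
```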
Comparing relationship/association between 2 continuous variables
Correlation - measures the strength and the direction of the linear relationship between two continuous variables; displayed by scatter plots
Parametric: Pearson's correlation
 - linear correlation between 2 Normally distributed continuous variables
 - does not imply causality
 - H0: no linear relationship between the 2 variables (r = 0)
 - r = 0 does not exclude a strong relationship in the form of a curve
 - correlation coefficient r shows magnitude/strength and direction
  - ranges from -1 to +1
  - +1 or -1: perfect correlation, with all the points lying on the line
  - 0: no linear correlation
  - + or - sign: direction of the relationship
 - if the CI does not contain 0, the p value is significant
Non-parametric: Spearman rank correlation
 - alternative to Pearson's when 1 or both variables are not Normally distributed
 - if statistically significant, conclude that a relationship exists, but it is not necessarily linear
 - no R² for Spearman
Coefficient of determination R²
 - proportion (%) of the variation in the dependent variable (y) that is explained by the independent variable(s) (x)
 - ranges from 0 to 1: 0 no relationship, 1 perfect relationship
 - in simple linear regression, √R² = |r| (the correlation coefficient without its sign)
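A sketch contrasting Pearson's r with Spearman's rank correlation for a curved but monotonic relationship (hypothetical data, y = x²): Spearman's rho, computed here as Pearson's r on the ranks, is exactly 1, while Pearson's r is slightly below 1. Tied values receive the average of their ranks; the function names are illustrative.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # curved but monotonic relationship
r = pearson_r(x, y)
print(f"Pearson r = {r:.3f}, R² = {r ** 2:.3f}, Spearman rho = {spearman_rho(x, y)}")
```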

Regression - the dependent variable is a function of the independent variable(s) (predicts how independent variables (X1 etc.) affect a dependent variable (Y))
Different types of data use different regression methods:
 - Continuous: linear/multiple linear regression
 - Categorical: logistic/multiple logistic regression
 - Rates: proportional hazards regression
 - Counts: Poisson regression

Simple regression - 1 independent variable and 1 dependent variable
Types of simple regression
a. Linear regression: y = mx + b
 y = outcome, x = predictor, m = slope, b = intercept
 m = regression coefficient: amount of change in the dependent variable for a 1-unit change in the independent variable
 Residual = observed y - fitted y
 - residuals are assumed to be Normally distributed with mean 0 and a constant variance (any positive value)
b. Exponential regression: $y = a e^{bx}$ (x = variable)
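The least-squares estimates for simple linear regression are m = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)² and b = ȳ - m x̄. This sketch (hypothetical data; fit_line is an illustrative name) computes them and confirms that the residuals sum to zero, consistent with their mean of 0.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = m*x + b for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    m = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    b = my - m * mx   # the fitted line passes through (mean x, mean y)
    return m, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]   # hypothetical data
m, b = fit_line(x, y)
residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]   # observed - fitted
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")
print("residuals:", [round(e, 2) for e in residuals], "sum:", round(sum(residuals), 10))
```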

Multiple linear regression
 - 1 dependent variable and ≥ 2 independent variables
 - linear relationship between a dependent variable (y) and several covariates (x)
 - shows whether an explanatory variable (x) is linearly related to the dependent variable (y) after adjusting for the other covariates (controls for confounders)
Univariate analysis - a single outcome variable Y depends on at least one of the Xi's
Multivariate analysis
 - the set of outcome variables Y1...Yk depends on at least one of the Xi's
 - uses a linear combination of the predictor variables chosen so that the sum of squared deviations between the observed and predicted values of the outcome variable is minimized

Comparing proportions (binary or categorical outcome, e.g. fracture yes/no)
 Expected cell counts ≥ 5
 - Unpaired: chi-square test - compares proportions between groups
 - Paired: McNemar's test - compares proportions between correlated groups (e.g. before and after)
 Expected cell counts < 5 (sparse data)
 - Unpaired: Fisher's exact test - compares proportions between independent groups when data are sparse (some cells < 5)
 - Paired: McNemar's exact test - compares proportions between correlated groups when data are sparse (some cells < 5)
Chi-square test
 - non-parametric test
 - rows (r) and columns (c) are mutually exclusive
 - DF = (r - 1)(c - 1)
 - expected (E) number in each cell = (relevant row total × relevant column total)/overall total
 - H0: observed (O) and expected (E) frequencies are approximately equal (no association between rows and columns); if rows and columns are associated, reject H0
 - chi-square value: $\chi^2 = \sum \frac{(O - E)^2}{E}$
 - 2 × 2 table: ½ continuity correction, $\chi^2 = \sum \frac{(|O - E| - \frac{1}{2})^2}{E}$; tables larger than 2 × 2: no continuity correction
 - shows that at least one group differs significantly from another (without showing which specific group is better)
 - chi-square test for trend: looks for a trend in proportions; the categories must be ordered
 Rules
 1. random sampling
 2. categorical variable
 3. expected values
  - 2 × 2 table: all E > 5 (otherwise Fisher's exact test is needed)
  - r × c table: all E > 1 and no more than 20% of cells with E < 5
McNemar's test
 - no need to calculate expected frequencies
 - H0: proportion of preferences of a certain type in the population = 1/2, i.e. the difference between responses for the pairs in the population = 0
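The chi-square calculation for a 2 × 2 table can be sketched as below (hypothetical counts; the helper name is illustrative). It computes the expected counts, the statistic with or without the ½ continuity correction, and a p value, using the fact that with DF = 1 the chi-square statistic is the square of a standard Normal variable. Larger tables would need a chi-square distribution function, which Python's standard library does not provide.

```python
import math
from statistics import NormalDist

def chi_square_2x2(table, yates=True):
    """Chi-square test for a 2 × 2 table [[a, b], [c, d]]; DF = (2-1)(2-1) = 1."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count = row total × column total / overall total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            # ½ continuity correction (Yates), never pushing |O - E| below 0
            diff = max(0.0, abs(o - e) - 0.5) if yates else o - e
            stat += diff ** 2 / e
    # With DF = 1, chi-square = z², so the p value comes from the Normal
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(stat)))
    return stat, p_value, expected

table = [[10, 20], [30, 40]]   # hypothetical counts
stat, p, expected = chi_square_2x2(table, yates=False)
print(expected, round(stat, 4), round(p, 3))
print("with continuity correction:", round(chi_square_2x2(table)[0], 4))
```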

4. Survival analysis
- a collection of analytical procedures where the outcome of interest is the time until an event of interest occurs
- event: death, disease relapse, fracture
- Time-to-event: time from entry into a study until a subject has a particular outcome
- Censoring: occurs when we have some information about an individual's survival time, but do not know it exactly
 a. Right-censoring: 1. no event before the study ends 2. lost to follow-up 3. withdrawal from the study 4. death from another cause (e.g. car accident)
 b. Left-censoring: the event occurred before the subject was first observed (not followed up from the beginning)
Survival function S(t) - the probability that a person survives longer than some specified time t
Hazard function h(t) - the instantaneous potential per unit time for the event to occur, given that the individual has survived up to time t
Relationship: the higher h(t), the faster S(t) decreases

Survival analysis methods
Life table - used when
 - the sample is large (individual data not feasible, so data are grouped into intervals)
 - the exact time of the event is unknown (e.g. between 6 and 10 weeks)
Kaplan-Meier - method of estimating time-to-event (survival) curves in the presence of censored cases
 - intervals are defined by failures (events), e.g. survival of 10 breast cancer patients:
  1st interval: after the 1st death, survival = 9/10
  1 patient then drops out (censored), leaving 8 at risk (10 - 1 death - 1 drop-out)
  2nd interval: after the 2nd death, survival = 7/8
  P(surviving intervals 1 and 2) = 9/10 × 7/8 = 0.79
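The product-limit (Kaplan-Meier) calculation can be sketched in Python as below. The survival times are hypothetical, chosen only to reproduce the 10-patient example (death, drop-out, death); subjects censored at the same time as a death are treated as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; events[i] is 1 for an event (e.g. death), 0 if censored.

    Returns [(time, S(t))] at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Death at t=1, drop-out at t=2, death at t=3, 7 patients still alive at t=10
times = [1, 2, 3] + [10] * 7
events = [1, 0, 1] + [0] * 7
print(kaplan_meier(times, events))
```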

Biases in survival analysis - selection, information, measurement, confounding bias
Significance tests for survival analysis: log-rank test, Cox proportional hazards regression, Mantel-Haenszel test
Log-rank test - tests the null hypothesis of no difference between the survival functions of the two groups; non-parametric
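A minimal sketch of the two-group log-rank test: at each event time it compares the observed deaths in group A with the number expected if both groups shared the same survival, then combines these into a chi-square statistic with DF = 1 (the p value again uses the Normal distribution because DF = 1). The data are hypothetical, and the sketch assumes at least one event so that the variance is non-zero.

```python
from statistics import NormalDist

def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events: 1 = event, 0 = censored.

    Returns (chi-square statistic with DF = 1, two-sided p value).
    """
    subjects = ([(t, e, "A") for t, e in zip(times_a, events_a)]
                + [(t, e, "B") for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in subjects if e})
    observed_a = expected_a = variance = 0.0
    for time in event_times:
        at_risk = [(t, e, g) for t, e, g in subjects if t >= time]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == "A")
        d = sum(1 for t, e, _ in at_risk if t == time and e)
        d_a = sum(1 for t, e, g in at_risk if t == time and e and g == "A")
        observed_a += d_a
        expected_a += d * n_a / n                 # deaths expected in A under H0
        if n > 1:                                 # hypergeometric variance
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
    return chi2, p_value

# Hypothetical data: group A dies early, group B later; no censoring
chi2, p = log_rank([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```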

Cox proportional hazards regression
 - assumptions: the hazard ratio is constant over time; observations are independent
Hazard function - measure of the potential for the event to occur at a particular time t, given that the event has not yet occurred; larger values indicate greater potential for the event to occur
Hazard ratio (HR) - hazard of the event in the treatment arm divided by the hazard of the event in the control arm
 - HR > 1: ↑ risk
 - HR < 1: ↓ risk
 - HR = 1: equal risk, no change

5. Study methods - observational and experimental
Observational
 - Descriptive: case report, case series, cross-sectional
 - Analytical: cross-sectional, case-control, cohort study, ecological
Experimental - any intervention, e.g. RCT
Longitudinal - subjects are followed up for a period of time

Observational
Case report/series - written description of a patient or a particular problem (rare diagnosis, event); generates hypotheses
Retrospective
1. Cross-sectional (prevalence) study
 Methods
 - random cross-section of the population
 - assesses exposure & outcome simultaneously in a defined population at a single point in time (often unclear whether the exposure preceded the outcome, so no causality)
 - calculates prevalence
 Pros: 1. cheap, fast 2. describes characteristics in a large population
 Cons: 1. recall bias 2. outcome/disease may be affected by the point in time chosen (seasonal variation, law, health) 3. reverse temporality
 Repeated cross-sectional - shows changes over time; the target population is not followed up, a new sample is drawn at each repeated time point
2. Case-control study
 Methods
 - start with people with the disease (cases)
 - match them with controls (without the disease)
 - look back and assess exposures
 - the control group shouldn't be selected based on exposure status (this biases the analysis)
 - the odds ratio is calculated; the OR estimates the RR when the disease is rare
 Pros: rare disease; cheap, fast; can explore multiple hypotheses
 Cons: recall bias; temporal relationships cannot be established
 Interpretation of the odds ratio, e.g. OR = 5: the odds of the outcome (disease) among the exposed are 5 times those of the unexposed
3. Ecological studies
 - look for an association between an exposure and an outcome at the group level rather than in individuals
 - any relationship found is between the factor and the group of individuals, not necessarily the individuals themselves
Prospective
Cohort study - aetiology, prognosis, natural history
 Methods
 - begin with disease-free people
 - classify people as exposed/unexposed
 - record outcomes of both groups
 - compare outcomes using the relative risk: RR = 1 equal risk in both groups; RR > 1 ↑ risk; RR < 1 ↓ risk
 Measures: incidence rate, relative risk, attributable risk
 Pros: 1. temporality between exposure & outcome, to establish cause 2. can study rare exposures 3. can study multiple outcomes 4. better quality 5. direct risk assessment
 Cons: 1. expensive; loss to follow-up (long duration); large size 2. inappropriate for studying rare outcomes
 RR interpretation, e.g. RR = 3: 3 times the risk, i.e. a 200% ((3 - 1) × 100%) increase in risk
 Example: incidence of cancer in smokers (28)/non-smokers (20) = 1.4
  1. Incidence in smokers was 1.4 times that in non-smokers
  2. Incidence in smokers was 40% ((1.4 - 1) × 100%) higher than in non-smokers
  3. Attributable risk (%) among smokers = (28 - 20)/28 = 29%, i.e. among smokers, 29% of cancers are attributable to smoking

Odds and relative risk (2 × 2 table: A = exposed with outcome, B = exposed without outcome, C = unexposed with outcome, D = unexposed without outcome)
Absolute risk/incidence rate
 - absolute risk (of dying if exposed) = A/(A + B)
 - absolute risk (of dying if not exposed) = C/(C + D)
Absolute risk reduction (risk difference) - measure of how much disease incidence is attributable to the exposure (burden)
 = absolute risk (if exposed) - absolute risk (if not exposed) = A/(A + B) - C/(C + D)
Number needed to treat (NNT) - number of patients who need to be treated to prevent one adverse event
 = 1/absolute risk reduction (cost-effectiveness)
Attributable risk (%) among the exposed = [A/(A + B) - C/(C + D)]/[A/(A + B)]
 - the population attributable risk (PAR) uses the risk in the whole population, (A + C)/(A + B + C + D), in place of A/(A + B)
Relative risk - measure of the strength of association between exposure and disease; useful in analytical studies (cohort, controlled trial)
 = [A/(A + B)]/[C/(C + D)]
Odds ratio (case-control) = (A/C)/(B/D) = AD/BC
 - odds of exposure (exposed/unexposed) in the diseased (case) group divided by the odds of exposure in the non-diseased (control) group
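The 2 × 2 measures above can be computed together in a few lines of Python; the counts are hypothetical and the function name is illustrative. The layout follows the formulas in these notes (A, B exposed; C, D unexposed).

```python
def risk_measures(a, b, c, d):
    """a, b = exposed with/without outcome; c, d = unexposed with/without outcome."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_total = (a + c) / (a + b + c + d)
    risk_difference = risk_exposed - risk_unexposed
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "relative_risk": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),           # = (a/c) / (b/d)
        "risk_difference": risk_difference,
        # NNT (or number needed to harm if exposure is harmful) = 1 / |risk difference|
        "nnt": 1 / abs(risk_difference),
        "attributable_fraction_exposed": risk_difference / risk_exposed,
        "population_attributable_fraction": (risk_total - risk_unexposed) / risk_total,
    }

for name, value in risk_measures(30, 70, 10, 90).items():   # hypothetical counts
    print(f"{name}: {value:.3f}")
```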

Causality - Hill's 9 criteria
1. Temporal relationship - exposure precedes outcome
2. Strength - measured by appropriate statistical tests
3. Dose-response relationship
4. Consistency - association results are replicated in other studies
5. Plausibility - association is compatible with the current understanding of pathological processes
6. Alternate explanations
7. Experiment
8. Specificity - a single putative cause produces a specific effect
9. Coherence - association is compatible with existing theory & knowledge

6. Clinical trials
Clinical trial phases
 I - safety profile of the treatment (starting dose e.g. 1/10 of the LD10 in mice)
 II - safety, efficacy, dose response (follows phase I; conducted in a larger group to determine safety, identify side effects and the safe dosage range)
 III - comparison with the standard treatment
 IV - post-marketing surveillance
Clinical trial - subjects are randomized into at least two groups
 - control arm: receives either a placebo or the current standard of care
 - intervention arm: receives the agent being studied
 Rx arm: observed effect = Rx effect + NC (natural course) + EE (external effects, e.g. placebo) + error (bias)
 Control arm: observed effect = NC + EE + bias
Protocol - written description of all aspects of the trial, e.g. inclusion, exclusion, sample size, endpoints; should be prepared in advance
Ethics and consent
 - trials must be approved by an ethics committee and must not contravene the Declaration of Helsinki
 - informed consent must be obtained from the patient
 - patients must be informed about the use of a placebo (their chance of being randomized to it)

RCT - 3 elements: randomization, placebo, blinding
Randomization - subjects are randomly assigned, with an equal chance of being allocated to any one of the treatment groups
 - makes the index and reference groups comparable for all known and unknown factors
 - avoids selection bias and confounding bias
 - does not guarantee equal distribution in small samples
Methods of randomization
 Simple randomization - using a random number table; no guarantee of equal or approximately equal sample sizes in each treatment group (imbalance in the number of participants per group)
 Block randomization (aka restricted randomization) - participants are allocated in predefined blocks, each containing equal numbers of each treatment (e.g. blocks of 4: 2 A and 2 B in random order); guarantees balance in numbers
 Stratified randomization - individuals are grouped by important covariates (sex, age, etc.) and randomization occurs within each stratum; guarantees balance of these covariates (usually with block randomization within strata)
Ways to carry out randomization
 1. flipping coins
 2. computer-generated numbers
 3. concealment of allocation - procedure to protect the randomization process before the subject enters the trial; always feasible; if not done, leads to selection bias
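A computer-generated permuted-block schedule can be sketched as follows; the block size of 4, the arm labels and the seed are assumptions for illustration. In a real trial the schedule would be prepared in advance and concealed from recruiters.

```python
import random

def permuted_block_randomization(n_participants, block_size=4, arms=("A", "B"), seed=None):
    """Allocate in blocks; each block holds equal numbers of each arm in random order."""
    if block_size % len(arms):
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)                  # random order within the block
        allocation.extend(block)
    return allocation[:n_participants]

schedule = permuted_block_randomization(12, block_size=4, seed=2024)
print(schedule)
```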

Placebo - controls for the placebo effect in subjects and for the natural history of the disease
Blinding/masking - masking of the treatments after randomization (once the trial begins); not always feasible
 - avoids measurement and information bias (observer bias)
 Single-blind: patient unaware of treatment (avoids placebo effect)
 Double-blind: patient and investigator unaware of treatment (avoids observer bias)
 Triple-blind: patient, investigator and statistician unaware
Open-label - unblinded; unavoidable e.g. in surgical procedures, comparisons of different devices and lifestyle changes (modifiable risk factors)
Advantages of RCT
 1. randomization controls known & unknown factors that influence outcome
 2. control of exposure
 3. high internal validity
 4. a true measure of efficacy
Disadvantages of RCT
 1. limited external validity (results must also be clinically relevant to patients)
 2. cost and time
 3. limits to generalisability (selection of the study population (bias))
 4. ethical dilemmas
Types of study design
1. Parallel
2. Cross-over
 - each group receives each Rx arm and, after a washout period, switches over to the other Rx
 - each patient is used as his/her own control
 - fewer patients need to be recruited
 - individual differences in response to the two treatments are calculated and analysed
3. Factorial
 - experiment in which there are ≥ 2 factors of interest
 - each factor can have ≥ 2 levels
 - allows the study of interactions: an interaction between 2 factors implies that the difference between the levels of one factor is not constant across the levels of the other factor
 - smaller sample size needed
Interim analyses - preplanned analyses, e.g. to ensure that no major toxicities have occurred, or to detect a significant benefit requiring the trial to be stopped prematurely

Intention-to-treat - participants are analysed in the intervention group to which they were originally assigned, even if they:
 - refused the intervention after randomization
 - discontinued the intervention during the study
 - followed the intervention incorrectly
 - violated the study protocol
 - missed follow-up measurements
Reasons
 - preserves the benefits of randomization: randomization balances potential confounding factors in the study arms, and this balance is lost if the data are analysed according to how participants self-selected rather than how they were randomized
 - simulates real life, where patients often don't adhere perfectly to treatment or may discontinue treatment altogether

NNT - measures the clinical (cost-)effectiveness of the results of a trial
Endpoint - a predefined outcome for an individual
 - does not have to be a clinical outcome measure
 - a composite endpoint requires the participant to experience one of a number of possible endpoints
 - a surrogate endpoint is a biomarker intended to substitute for a clinical endpoint
 Primary (1°) endpoint - related to efficacy, e.g. tumour regression, morbidity, local recurrence, distant metastases, death
 Secondary (2°) endpoint - related to toxicity

Systematic review
Meta-analysis - a particular type of systematic review that focuses on the numerical results
 - combines results from all individual studies (clinical/observational)
 Advantages
 1. increases the number of observations and the statistical power
 2. improves the estimates of the effect size of an intervention or an association
 3. ability to control for between-study variation, including moderators to explain variation
 Disadvantages
 1. publication bias: studies that show no significant results are much harder to get published
 2. sources of bias are not controlled by the method
 3. may include badly designed studies that produce flawed statistics
 Forest plot
 - displays the RR and CI of each trial and of all trials combined
 - the CI for the combined RR in the meta-analysis is narrower (more precise) than that of any individual study, because of the larger numbers after combining the studies
 - a random-effects model is used if there is heterogeneity between studies
 Funnel plot - a graph designed to check for the existence of publication bias in systematic reviews and meta-analyses
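The narrowing of the combined CI can be illustrated with fixed-effect inverse-variance pooling of log relative risks, one standard meta-analysis method (a random-effects model would add a between-study variance term, not shown). The three trials and the helper name are hypothetical.

```python
import math

def fixed_effect_pool(relative_risks, se_log_rr):
    """Inverse-variance (fixed-effect) pooling of relative risks on the log scale."""
    weights = [1 / se ** 2 for se in se_log_rr]
    pooled_log = (sum(w * math.log(rr) for w, rr in zip(weights, relative_risks))
                  / sum(weights))
    pooled_se = 1 / math.sqrt(sum(weights))   # smaller than any individual SE
    ci = (math.exp(pooled_log - 1.96 * pooled_se),
          math.exp(pooled_log + 1.96 * pooled_se))
    return math.exp(pooled_log), pooled_se, ci

rrs = [0.80, 0.70, 0.90]   # hypothetical trial RRs
ses = [0.20, 0.25, 0.15]   # hypothetical SEs of log(RR)
pooled_rr, pooled_se, ci = fixed_effect_pool(rrs, ses)
print(f"pooled RR = {pooled_rr:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
print("pooled SE smaller than every individual SE:", all(pooled_se < s for s in ses))
```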

7. Epidemiology
Effect modification - the magnitude of the effect of a particular exposure on the outcome varies according to the presence of a third factor
 - e.g. smoking and asbestos exposure each alone → lung Ca, but together they multiply the risk of lung cancer significantly (a "synergistic" effect)
 - cannot be corrected or eliminated (it is a real effect to be described)
Stratification - eliminates confounding and describes effect modification

Prevalence
 Point prevalence - number of people who have a disease or condition at a given point in time in a defined population
 Period prevalence - total number of people known to have had the condition at any time during a specified period
 P = I × D (P = prevalence, I = incidence, D = duration)
 - long duration → ↑ prevalence
 - a change in incidence may or may not change prevalence, depending on the mortality rate and duration
Incidence = number of new cases of a disease over a period of time/population at risk of the disease in that time period
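A minimal sketch of the incidence and prevalence relationships, with hypothetical numbers; P ≈ I × D holds roughly when the population is stable and the condition is fairly rare.

```python
def incidence(new_cases, population_at_risk):
    """Incidence = new cases during the period / population at risk in that period."""
    return new_cases / population_at_risk

def prevalence_from_incidence(incidence_rate, mean_duration):
    """P ≈ I × D (stable population, fairly rare condition)."""
    return incidence_rate * mean_duration

i = incidence(50, 10_000)             # hypothetical: 50 new cases in one year
p = prevalence_from_incidence(i, 4)   # hypothetical mean duration of 4 years
print(f"incidence = {i:.3%} per year, prevalence ≈ {p:.1%}")
```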
Adjusted rate/standardization - removes common confounding factors (age/sex); direct and indirect methods
Direct standardization
 - needs a large study population with stable age-specific rates
 - suitable for cancer incidence rates
 Workout
 - age-specific rate × standard population = number of deaths in each age group
 - total number of deaths over all age groups/total standard population = standardized death rate (SDR)
Indirect standardization
 - smaller study population
 Workout
 - age-specific rate of the standard population × study population in each age group = expected number of deaths
 - SMR = total observed deaths/total expected deaths × 100%; > 100% higher rate, < 100% lower rate
 - SDR = SMR × crude death rate (CDR) of the standard population
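The direct and indirect workouts can be sketched in Python as below, using two hypothetical age groups; the helper names are illustrative. The same weighted-rate function also gives the crude death rate of the standard population needed for SDR = SMR × CDR.

```python
def direct_standardised_rate(rates, standard_population):
    """Apply age-specific rates to a standard population; return deaths per person."""
    deaths = sum(r * n for r, n in zip(rates, standard_population))
    return deaths / sum(standard_population)

def smr(observed_deaths, standard_rates, study_population):
    """Indirect method: SMR (%) = observed / expected deaths × 100."""
    expected_deaths = sum(r * n for r, n in zip(standard_rates, study_population))
    return observed_deaths / expected_deaths * 100

# Hypothetical two age groups (young, old)
standard_population = [6000, 4000]
standard_rates = [0.002, 0.02]
study_rates = [0.001, 0.01]
study_population = [3000, 1000]

sdr_direct = direct_standardised_rate(study_rates, standard_population)
smr_percent = smr(20, standard_rates, study_population)   # 20 observed deaths
standard_crude_rate = direct_standardised_rate(standard_rates, standard_population)
sdr_indirect = smr_percent / 100 * standard_crude_rate     # SDR = SMR × CDR
print(f"direct SDR = {sdr_direct:.4f}, SMR = {smr_percent:.1f}%, "
      f"indirect SDR = {sdr_indirect:.5f}")
```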

Bias
 - systematic error
 - 3 categories of bias: selection, measurement, confounding
Selection bias - selected samples are not representative of the study population, e.g.
 1. Berkson's fallacy - control subjects for a case-control study are selected from hospitalized patients; the exposure frequency in hospitalized patients does not necessarily reflect that of the general population
 2. Referral bias - samples from specialized medical centres (may have more severe illness than in community hospitals)
 3. Selective loss to follow-up (cohort studies) - a high rate of loss to follow-up creates a high potential for selection bias in prospective studies

4. Non-response bias
- study design allows subjects to decide whether or not to participate in the study.
Imagine a health survey conducted by a random selection of phone numbers. The
phone numbers selected are called and people are interviewed using a standardized
questionnaire. There are always people who would refuse to participate in the survey.
If the refusal is somehow related to their health status (e.g., they are sicker than the
general population), then non-response selection bias results.

5. Prevalence bias (Neyman bias) may occur when incidence of a disease is estimated
based on prevalence, and data become skewed by selective survival. Diabetics are
more likely to die from myocardial infarction than are non-diabetics. If living patients
who have sustained myocardial infarction are asked about their diabetes status, it is
likely that diabetics will be under-represented because non-diabetics selectively
survived their cardiovascular events.

6. Susceptibility bias occurs when the treatment regimen selected for a patient
depends on the severity of the patient's condition. Imagine patients with acute
coronary syndrome. Healthier patients may be preferentially selected for coronary
intervention, while sicker patients may instead be selected for medical therapy. This
may create bias whereby outcomes from coronary intervention appear superior to
medical therapy simply because the subjects who underwent coronary intervention
were healthier.
Measurement/information bias - inaccurate estimate of exposure/outcome
- Measurement bias implies that exposure and/or outcome data are systematically
misclassified (e.g. exposed cases are labeled as unexposed).

- Misclassification
- differential - misclassification of exposure (or disease) is related to disease (or exposure) status
- non-differential - misclassification of exposure (or disease) is unrelated to disease (or exposure) status

1. Recall bias - a potential problem in case-control studies: cases may recall past exposures
more thoroughly than controls, which can result in overestimation of the effect of exposure.

2. Observer bias (ascertainment bias, detection bias or assessment bias) occurs when
the investigator's decision is adversely affected by knowledge of the exposure status.
Blinding of the health care provider (assessor) is an effective way to avoid observer bias.

Confounding
- An extraneous factor that is related to both the exposure (explanatory variable) and the
outcome (response). To act as a confounder, the factor must:
1. be associated with the exposure
2. be associated with the disease in the absence of exposure (i.e. be an independent risk factor)
3. not lie on the causal path between exposure and outcome

Methods to control confounding

1. Study design
- Randomization (RCTs)
- Restriction
- Matching (case-control studies)

2. Analysis
- Stratification
- Multivariate analysis
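Stratification can be illustrated with a small calculation. The notes do not name a method for combining strata; the sketch below assumes the Mantel-Haenszel summary odds ratio (a common choice, swapped in here for illustration). The two strata and their counts are hypothetical, constructed so that the confounder is linked to both exposure and disease while the exposure itself has no effect within either stratum.

```python
from typing import List, Tuple

# One 2x2 table per stratum of the confounder, as (a, b, c, d):
#   a = exposed cases,   b = exposed controls
#   c = unexposed cases, d = unexposed controls
Stratum = Tuple[int, int, int, int]


def crude_or(strata: List[Stratum]) -> float:
    """Odds ratio after collapsing all strata into one table (confounder ignored)."""
    a, b, c, d = (sum(s[i] for s in strata) for i in range(4))
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: List[Stratum]) -> float:
    """Mantel-Haenszel summary odds ratio, adjusted for the stratifying factor."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical data: stratum 1 = confounder present, stratum 2 = confounder absent.
# Within each stratum the odds ratio is exactly 1 (no real exposure effect).
strata = [(40, 20, 10, 5), (5, 10, 20, 40)]
print(crude_or(strata))            # 2.25 -> spurious association
print(mantel_haenszel_or(strata))  # 1.0  -> association disappears after stratification
```

The crude table suggests exposure more than doubles the odds of disease, but once the analysis is stratified by the confounder the apparent effect vanishes.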

Screening test - aims at earlier detection and improved overall survival (OS)/prognosis

Which diseases:
- must be an important health problem
- must have a recognizable latent or early symptomatic stage
- natural history of the disease (latent and declared disease) should be understood
- test must detect the disease before the onset of signs and symptoms
- a test is available to confirm the diagnosis
- treatment is available
- early detection and treatment reduce morbidity and mortality

When: routine (e.g. antenatal tests) or ad hoc (episodic, e.g. during an outbreak)


Criteria
Acceptability - simple, safe, easily administered
- test should be acceptable to the population screened and to those administering it
Repeatability - consistent results when the test is repeated (high precision)
Reliability (precision, reproducibility) is affected by:
•Observer variation: tested by repeat measurements at the same time
1. Intraobserver (variation in repeated readings by the same observer)
2. Interobserver (variation between different observers)
•Biological variation: tested by repeat measurements over time
- e.g. variations in the way patients perceive their symptoms
•Variation due to the test method, e.g. defective instruments, errors in the test procedure

Validity - the extent to which the test accurately measures what it purports to measure
- Testing validity: sensitivity and specificity

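Sensitivity = TP/(TP + FN) = a/(a + c)
(2x2 table: a = TP, b = FP, c = FN, d = TN)
- given the subject is ill, probability of a +ve test
- PID: positive in disease (proportion of diseased cases with a +ve test)
- not affected by prevalence
- high Sn suits a screening test; a -ve result helps rule out disease (SnOut)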

Specificity = TN/(TN + FP) = d/(b + d)
- given the subject is not ill, probability of a -ve test
- not affected by prevalence
- high Sp suits a confirmatory test; a +ve result helps rule in disease (SpIn)
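A minimal sketch of the two formulas above, using the same 2x2 labels (a = TP, b = FP, c = FN, d = TN). The counts are hypothetical. It also shows why sensitivity is not affected by prevalence: it is computed only from the diseased column, so scaling the number of diseased subjects leaves it unchanged.

```python
def sensitivity(a: int, c: int) -> float:
    """Sn = TP / (TP + FN) = a / (a + c); uses only the diseased column."""
    return a / (a + c)


def specificity(b: int, d: int) -> float:
    """Sp = TN / (TN + FP) = d / (b + d); uses only the non-diseased column."""
    return d / (b + d)


# Hypothetical 2x2 table: a = TP, b = FP, c = FN, d = TN
a, b, c, d = 90, 30, 10, 270
print(sensitivity(a, c), specificity(b, d))  # 0.9 0.9

# Doubling the number of diseased subjects (i.e. raising prevalence) leaves Sn unchanged
print(sensitivity(2 * a, 2 * c))  # 0.9
```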

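Validity of a test
- Ideal test: Sn = Sp = 1 (impossible in practice)
[Figure: distributions of test results in diseased and non-diseased subjects; the cut-off X separates +ve from -ve results]
- A: cut-off giving 100% sensitivity; B: cut-off giving 100% specificity
- X: trade-off between the two; as X shifts towards higher Sn → more true and false +ves (↑Sn), fewer true -ves (↓Sp)
- less overlap between the two distributions (smaller FN/FP areas) → higher Sn and Sp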
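The cut-off trade-off can be sketched numerically. This assumes that higher test values suggest disease (a result at or above the cut-off is called +ve); the two lists of test values are hypothetical and deliberately overlap, as in the figure.

```python
from typing import List, Tuple


def sn_sp_at_cutoff(diseased: List[float], healthy: List[float],
                    cutoff: float) -> Tuple[float, float]:
    """Return (Sn, Sp) when a value >= cutoff is called +ve (assumed direction)."""
    tp = sum(1 for x in diseased if x >= cutoff)  # diseased correctly called +ve
    tn = sum(1 for x in healthy if x < cutoff)    # non-diseased correctly called -ve
    return tp / len(diseased), tn / len(healthy)


# Hypothetical, overlapping test values
diseased = [4, 5, 6, 6, 7, 8, 8, 9, 10, 11]
healthy = [1, 2, 2, 3, 3, 4, 4, 5, 6, 7]

for cutoff in (3, 5, 7, 9):
    sn, sp = sn_sp_at_cutoff(diseased, healthy, cutoff)
    print(f"cutoff={cutoff}: Sn={sn:.1f} Sp={sp:.1f}")
# cutoff=3: Sn=1.0 Sp=0.3
# cutoff=5: Sn=0.9 Sp=0.7
# cutoff=7: Sn=0.6 Sp=0.9
# cutoff=9: Sn=0.3 Sp=1.0
```

Lowering the cut-off (moving towards A) raises Sn at the cost of Sp; raising it (towards B) does the opposite.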

Sensitivity should be ↑:
•When the penalty associated with missing a case is high
•When the disease is serious & definitive treatment exists.
•When the disease can be spread
•When subsequent diagnostic evaluations are associated with
minimum cost & minimum risk
Specificity should be ↑:
•When false positive results can harm the patient physically,
financially or emotionally.
•When costs or risks associated with further diagnostic
techniques are substantial.

ROC curve
- shows the trade-off between Sn and Sp available for the test
- Graph: sensitivity (y-axis) vs 1 - specificity (false-positive rate, x-axis)
- Better test: curve lies closer to the upper left corner
- AUC (area under the curve): measures the diagnostic accuracy of the test
- A: the larger the AUC (~1), the better the test
- B: AUC ~0.5 (diagonal straight line) - the test lacks predictive value

Uses:
1. decide the cut-off value for a test
2. determine the ability of a test to predict disease
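A minimal sketch of building an empirical ROC curve and its AUC. It assumes, as before, that a value at or above the cut-off is +ve, and it computes the area by the trapezoidal rule (one common approach; the notes do not specify a method). The test values are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(diseased: Sequence[float],
               healthy: Sequence[float]) -> List[Tuple[float, float]]:
    """(1 - Sp, Sn) for every possible cut-off; a result >= cut-off is called +ve."""
    points = [(0.0, 0.0)]  # cut-off above every value: nobody tests +ve
    for cutoff in sorted(set(diseased) | set(healthy), reverse=True):
        sn = sum(x >= cutoff for x in diseased) / len(diseased)
        fpr = sum(x >= cutoff for x in healthy) / len(healthy)  # 1 - specificity
        points.append((fpr, sn))
    return points  # the lowest cut-off gives (1.0, 1.0): everybody tests +ve


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical test values
diseased = [4, 5, 6, 6, 7, 8, 8, 9, 10, 11]
healthy = [1, 2, 2, 3, 3, 4, 4, 5, 6, 7]
print(round(auc_trapezoid(roc_points(diseased, healthy)), 3))  # 0.9
```

A test whose values completely separate the two groups gives AUC = 1; identical distributions give the diagonal line and AUC = 0.5.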

Positive predictive value (PPV) - given a +ve test result, probability that the subject is actually ill
PPV = TP/(TP + FP); depends on prevalence and specificity
- higher prevalence → higher PPV (a +ve test is more likely to be a true +ve)
- higher specificity → fewer false +ves → higher PPV

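Negative predictive value (NPV) - given a -ve test result, probability that the subject does not have the disease
NPV = TN/(TN + FN); depends on prevalence and sensitivity
- lower prevalence → higher NPV (a -ve test is more likely to be a true -ve)
- higher sensitivity → fewer false -ves → higher NPV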
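The dependence of predictive values on prevalence follows from expressing TP, FP, TN and FN as proportions of the screened population. This sketch assumes a hypothetical test with Sn = Sp = 0.9 applied at different prevalences.

```python
def ppv(sn: float, sp: float, prevalence: float) -> float:
    """PPV = TP / (TP + FP), with TP and FP as fractions of the population."""
    tp = sn * prevalence
    fp = (1 - sp) * (1 - prevalence)
    return tp / (tp + fp)


def npv(sn: float, sp: float, prevalence: float) -> float:
    """NPV = TN / (TN + FN), with TN and FN as fractions of the population."""
    tn = sp * (1 - prevalence)
    fn = (1 - sn) * prevalence
    return tn / (tn + fn)


# Same hypothetical test (Sn = Sp = 0.9) in populations with different prevalence
for p in (0.01, 0.1, 0.5):
    print(f"prevalence={p:.2f}: PPV={ppv(0.9, 0.9, p):.3f} NPV={npv(0.9, 0.9, p):.3f}")
```

With the same test, the PPV is only about 0.08 at 1% prevalence but 0.90 at 50% prevalence, while the NPV moves in the opposite direction. Raising specificity to 0.99 at 1% prevalence lifts the PPV to roughly 0.48.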

+ve likelihood ratio (LR+) = sensitivity/(1 - specificity); not affected by prevalence
- ratio of the chance of a +ve result if the patient has the disease to the chance of a
+ve result if he/she does not have the disease
- e.g. LR+ of 9: a +ve test is 9 times more likely in a patient with the disease than in
one without it
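A short sketch of LR+ computed both from Sn/Sp and directly from hypothetical 2x2 counts (a = TP, b = FP, c = FN, d = TN). The two tables describe the same test (Sn = Sp = 0.9) at 10% and 50% prevalence, showing that LR+ does not change with prevalence.

```python
def positive_lr(sn: float, sp: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    return sn / (1 - sp)


def positive_lr_from_counts(a: int, b: int, c: int, d: int) -> float:
    """LR+ from a 2x2 table (a = TP, b = FP, c = FN, d = TN)."""
    p_pos_if_diseased = a / (a + c)      # sensitivity
    p_pos_if_not_diseased = b / (b + d)  # 1 - specificity
    return p_pos_if_diseased / p_pos_if_not_diseased


print(round(positive_lr(0.9, 0.9), 6))  # 9.0
# Same Sn and Sp at 10% and 50% prevalence -> same LR+
print(round(positive_lr_from_counts(90, 90, 10, 810), 6))   # 9.0
print(round(positive_lr_from_counts(450, 50, 50, 450), 6))  # 9.0
```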
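Bias in screening
Selection bias - volunteer bias: people who volunteer for screening may differ systematically from those who do not
Lead-time bias - OS appears improved because screening detects the disease earlier, not because
intervention/treatment is more successful
Length-time bias - OS appears improved because slow-growing/non-aggressive tumors are more likely
to be detected by screening
Overdiagnosis bias - false +ves (or cases that would never have become clinically significant) arising
from enthusiasm for a new screening test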
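Lead-time bias can be shown with simple arithmetic on one hypothetical patient. The ages are invented, and the key assumption is that earlier detection does not change the age at death. Survival is measured from diagnosis, so moving the diagnosis earlier lengthens apparent survival on its own.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age_at_diagnosis: int
    age_at_death: int

    def survival_years(self) -> int:
        """Survival is measured from diagnosis, which is what screening shifts."""
        return self.age_at_death - self.age_at_diagnosis

    def survives_5_years(self) -> bool:
        return self.survival_years() >= 5


# Hypothetical patient: symptoms would lead to diagnosis at 62; death at 65.
# Assumption: screening finds the same tumor at 59 but does not change the age at death.
clinical = Patient(age_at_diagnosis=62, age_at_death=65)
screened = Patient(age_at_diagnosis=59, age_at_death=65)

lead_time = clinical.age_at_diagnosis - screened.age_at_diagnosis
print(lead_time)                                                 # 3
print(clinical.survival_years(), screened.survival_years())      # 3 6
print(clinical.survives_5_years(), screened.survives_5_years())  # False True
```

The patient dies at the same age in both scenarios, yet the screened version counts as a 5-year survivor.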
