2statsnotes 1
Clinical
Oncology
Revision Notes
Biostatistics
Contents
1. Descriptive statistics
1.1 Types of data
1.2 Displaying Data methods
1.3 Measure of central tendency
1.4 Measures of Dispersion
1.5 Distribution
2. Sampling
3. Principle of statistical inference
4. Survival analysis
5. Study methods
6. Clinical trials
7. Epidemiology
1. Descriptive statistics
Types of data
Variable - characteristic of interest (e.g. diet regime) which takes different values in
different items/objects (e.g. body weight under different regimes)
A. Independent or dependent
Independent - predictive factor that has an impact on a dependent variable
e.g. different dietary fat regimens
Dependent - outcome that reflects the effects of changing the independent variable
e.g. body weight under different dietary fat regimens
B. Qualitative or quantitative
Categorical (qualitative - cannot be measured or counted)
  Binomial (2 groups; ordered or unordered)
    - Dead/alive (unordered)
    - Better/worse (ordered)
  Nominal (> 2 groups; unordered categories)
    - Type of cancer
    - Blood group O, A, B, AB
  Ordinal (ordered categories)
    - Stage of breast cancer: I, II, III, IV
    - Birth order: 1st, 2nd, 3rd, ...
    - Letter grades (A, B, C, D, F)
Numerical (quantitative - can be measured)
  Continuous
    - Blood pressure, height, weight, age
  Discrete
    - Number of children
    - Number of attacks of asthma per week
- Quantitative variables are converted to categorical ones for descriptive purposes
- Categorising data is useful for summarising results not for statistical analysis.
Coefficient of variation = SD/mean
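As a quick illustration of the coefficient of variation (SD divided by mean), here is a minimal Python sketch. The body weights are made-up values, and the sample SD (n - 1 denominator) is an assumption, since the notes do not say which SD is meant.

```python
import statistics

def coefficient_of_variation(values):
    """Return SD / mean, using the sample SD (n - 1 denominator)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean

# Hypothetical body weights in kg, for illustration only.
weights_kg = [60, 62, 58, 61, 59]
print(f"CV = {coefficient_of_variation(weights_kg):.3f}")
```

Because the CV is a ratio, it has no units, which makes it convenient for comparing spread between variables measured on different scales.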
Distribution
Normal
- continuous probability distribution, bell-shaped, not skewed
- within ±1 SD 68%, ±2 SD 95%, ±3 SD 99.7% of values
- depends on mean and variance
Skewed
- +ve skewed (skewed to the right), tail toward the right
- -ve skewed (skewed to the left), tail toward the left
- compare mean/median/mode to check normality/skew
- variance ↑ with ↑ value of variable: right skew
- variance ↓ with ↑ value of variable: left skew
Check skew = (mean - mode)/SD; within ±1 is ok
Kurtosis
- Kappa ~0 (within ±1): mesokurtic
- > +1: leptokurtic (narrow, sharp and pointed)
- < -1: platykurtic (broad and plateau-like)
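The 68% / 95% / 99.7% figures quoted for the normal distribution can be checked numerically. This is a small sketch using Python's standard-library `statistics.NormalDist`; the helper name `coverage_within` is made up for illustration.

```python
from statistics import NormalDist

def coverage_within(k, dist=NormalDist()):
    """Probability that a normal variable lies within k SDs of its mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.4f}")
```

The exact value for 2 SD is about 95.45%; the round figure of 95% corresponds more precisely to 1.96 SD.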
Sampling or random error
- discrepancy between a sample and its population parameter, as a study covers only
  a portion of the population
- reduce random error by ↑ sample size
Systematic sample
- rank the people, divide them into intervals, then take the same position in each interval
- e.g. 100 people divided into 10 groups; choose the 5th person in each group,
  i.e. choose the 5th, 15th, 25th ... 85th, 95th persons
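The systematic sampling example (every 10th person starting from the 5th) can be sketched in Python as follows. The function name and the option of a random starting position are assumptions for illustration, not part of the notes.

```python
import random

def systematic_sample(population, interval, start=None, seed=None):
    """Take one item per interval, at the same position in every interval.

    start is a 0-based offset; if omitted, it is drawn at random.
    """
    if start is None:
        start = random.Random(seed).randrange(interval)
    return population[start::interval]

people = list(range(1, 101))  # persons ranked 1..100
# start=4 is the 5th person, so this picks the 5th, 15th, ..., 95th.
print(systematic_sample(people, interval=10, start=4))
```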
Malays 60%
Chinese 25%
Indians 10%
Others 5%
Cluster sample
- Divide population into groups; a random sample of groups is chosen
- e.g. divide entire city into "city blocks"
- Random sample of blocks is chosen
- Count every person in each city block selected
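A minimal Python sketch of the cluster sampling steps above: whole blocks are drawn at random, and everyone in a chosen block is included. The block names and residents are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters whole groups and keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical city blocks and their residents.
city_blocks = {
    "block_A": ["p1", "p2", "p3"],
    "block_B": ["p4"],
    "block_C": ["p5", "p6"],
    "block_D": ["p7", "p8", "p9", "p10"],
}
sample = cluster_sample(city_blocks, n_clusters=2, seed=42)
print({name: len(members) for name, members in sample.items()})
```

Note that the unit of randomization is the block, not the person, so the final number of people sampled depends on which blocks are drawn.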
2.Proportion
2.Problem Conclusion
i.e The mean BW of babies born in Hospital X is significantly
lower than that of the population mean
One-tail test and two-tail test
For example, with Y being a standard/reference value:
Two-tail test: to see if there is a difference between X and Y
One-tail test: to see if there is a difference between X and Y in
one particular direction
Example: We test to see if X > Y
Example: We test to see if X < Y
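To make the one-tail versus two-tail distinction concrete, the sketch below converts a test statistic into p-values. It assumes a z statistic that follows a standard normal distribution under the null hypothesis; the notes do not specify a particular test, so this is only an illustration.

```python
from statistics import NormalDist

def z_test_p_values(z):
    """Return (two_tailed, upper_tailed, lower_tailed) p-values for a z statistic."""
    cdf = NormalDist().cdf
    upper = 1 - cdf(z)            # one-tail test of X > Y
    lower = cdf(z)                # one-tail test of X < Y
    two = 2 * min(upper, lower)   # two-tail test of X different from Y
    return two, upper, lower

two, upper, lower = z_test_p_values(1.96)
print(f"two-tail p = {two:.3f}, one-tail (X > Y) p = {upper:.3f}")
```

For the same data, the one-tail p-value in the observed direction is half the two-tail p-value, which is why the direction has to be chosen before looking at the results.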
Parametric
- used when the distribution of scores in the population is normal, variances are
equal, and the sample size is large.
Nonparametric
- used when the distribution of scores in the population is not normal or the
sample size is small
Coefficient of determination R²
- proportion or % of variation in the dependent
variable (y) that is explained by all the
independent variables (x)
- ranges from 0 to 1
0 - no relationship
1 - perfect relationship
- √R² = r, the correlation coefficient
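The relation between r and R² can be illustrated with a short Python sketch for the one-predictor case. The function name and the data are made up; with a single x variable, R² is simply the square of Pearson's r, and the sign of r follows the direction of the relationship.

```python
def pearson_r_and_r_squared(x, y):
    """Return (r, R^2) for paired data with one independent variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / (sxx * syy) ** 0.5
    return r, r * r

# Hypothetical data: a perfectly linear decreasing relationship.
r, r2 = pearson_r_and_r_squared([1, 2, 3, 4], [8, 6, 4, 2])
print(f"r = {r:.2f}, R^2 = {r2:.2f}")
```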
Simple regression
- 1 independent variable and 1 dependent variable
Types of simple regression
a. Linear regression
y = mx + b
b. Exponential regression
y = a·e^(bx)
x - variable
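The two simple regression forms can be fitted with ordinary least squares, as in the sketch below. Fitting the exponential model by regressing ln(y) on x is one common approach and is an assumption here, not something stated in the notes; it requires all y values to be positive. Function names and data are illustrative.

```python
import math

def fit_linear(x, y):
    """Least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    m = (sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
    return m, mean_y - m * mean_x

def fit_exponential(x, y):
    """Fit y = a*exp(b*x) by a linear fit of ln(y) on x; returns (a, b)."""
    b, ln_a = fit_linear(x, [math.log(v) for v in y])
    return math.exp(ln_a), b

print(fit_linear([0, 1, 2], [1, 3, 5]))
print(fit_exponential([0, 1, 2], [2 * math.exp(0.5 * t) for t in (0, 1, 2)]))
```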
Multivariate analysis
-if the set of variables Y1..Yk depend on at least one of the Xi’s
-linear transformation of the predictor variables so that the sum of squared
deviations of the observed & predicted values on the outcome variable is
minimized
4. Survival analysis
- a collection of analytical procedures where the outcome of interest is the time until an
event of interest occurs.
- Event: death, disease relapse, fracture
- Time-to-event: time from entry into a study until a subject has a particular outcome
- Censoring: occurs when we have some information about the individual, but we do
not know the survival time exactly.
a. Right-censoring occurs when
1. no event by the time the study ends 2. lost to follow-up 3. withdrawal from study 4. death from
another cause (e.g. car accident)
Survival function S(t) gives the probability that a person survives longer than some specified
time, t.
Hazard function h(t) gives the instantaneous potential per unit time for the event to occur,
given that the individual has survived up to time t.
Use log-rank test to test the null hypothesis of no difference between survival functions
of the two groups- non-parametric
Hazard function
- measure of the potential for the event to occur at a particular time t,
given that the event has not yet occurred.
Larger values of the hazard function indicate greater potential for the event to occur
- Hazard ratio (HR)
> 1 ↑ risk
< 1 ↓ risk
= 1 equal risk, no change
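To show how the survival function S(t) can be estimated when some subjects are right-censored, here is a minimal sketch of the product-limit (Kaplan-Meier) estimator. The estimator is not named in this part of the notes and is brought in as an illustration; the follow-up times and event flags are invented.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if the subject was right-censored
    Returns a list of (time, S(time)) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_censored = 0
        # Group all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= n_events + n_censored
    return curve

# Hypothetical follow-up in months; 0 marks a censored subject.
for t, s in kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(f"S({t}) = {s:.2f}")
```

Censored subjects contribute to the number at risk up to their censoring time but never cause a drop in the curve, which is how partial information about survival time is still used.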
Observational
Case study/series
- written description of a patient or particular problem
- generates hypotheses
- rare diagnosis, event
Retrospective
Cross-sectional (prevalence study)
Methods
- Random cross-section of population
- assess exposure & outcome simultaneously in a defined population at a
  single point in time
  (often unclear if the exposure preceded the outcome - no causality)
- Calculate prevalence
Pros:
1. Cheap, fast
2. Describes characteristics in a large population
Cons:
1. Recall bias
2. Outcome/disease may be affected by the point in time (seasonal variation, law,
   health)
3. Reverse temporality
Cons:
- recall bias
-Temporal relationships cannot be
established
Methods:
- Start with people with disease
- Match them with controls (w/o disease)
- Look back and assess exposures
- Control group shouldn't be selected based on whether exposed or not, as this affects the analysis
- Odds ratio is calculated
- OR is an estimate of RR when the disease is rare
Interpretation of odds ratio
- e.g. odds ratio 5: odds of the outcome (disease) among the exposed are 5
  times those of the unexposed.
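The odds ratio from a case-control 2x2 table can be computed as the cross-product ratio, as sketched below. The counts are invented so that the result matches the odds ratio of 5 used in the interpretation above.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product odds ratio from a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Hypothetical counts: 50/50 exposed/unexposed cases, 20/100 exposed/unexposed controls.
print(odds_ratio(50, 50, 20, 100))  # odds of exposure are 5 times higher in cases
```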
3.Ecologic Studies
- looks for an association between an
exposure and an outcome at the group
level rather than individuals
- Any relationship that is determined is bet
the factor and the group of individuals
Prospective
Cohort study - etiology, prognosis, natural history
Methods
- Begin with disease-free people
- Classify people as exposed/unexposed
- Record outcomes of both groups
- Compare outcomes using relative risk
Pros:
1. temporality between exposure & outcome to establish cause
2. can study rare exposures
3. can study multiple outcomes
4. better quality
5. direct risk assessment
   - Incidence rate
   - Relative risk
   - Attributable risk
Cons:
1. expensive, loss to follow-up (long duration), size (large numbers)
2. inappropriate for studying rare outcomes
Relative risk (RR)
RR = 1: equal risk in both groups
RR > 1: ↑ risk; RR < 1: ↓ risk
e.g. RR = 3, interpretation:
- 3 times the risk
- 2-fold or 200% ((3 - 1) x 100%) increase in risk
Example
Incidence of cancer for smokers (28) / non-smokers (20) = 1.4
1. Incidence in smokers was 1.4 times that in non-smokers
2. Incidence in smokers was 40% ((1.4 - 1)/1 x 100%) higher than in non-smokers
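The smoker example can be reproduced in a few lines of Python. The incidences 28 and 20 come from the notes, which give no units; the attributable risk line uses the simple risk-difference definition, which is an assumption about what the notes intend by the term.

```python
def relative_risk(incidence_exposed, incidence_unexposed):
    """RR = incidence in exposed / incidence in unexposed."""
    return incidence_exposed / incidence_unexposed

def percent_increase(rr):
    """Percentage increase in risk implied by a relative risk."""
    return (rr - 1) * 100

def attributable_risk(incidence_exposed, incidence_unexposed):
    """Risk difference between exposed and unexposed (assumed definition)."""
    return incidence_exposed - incidence_unexposed

rr = relative_risk(28, 20)
print(f"RR = {rr:.1f}, increase = {percent_increase(rr):.0f}%, "
      f"attributable risk = {attributable_risk(28, 20)}")
```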
6. Clinical trials
Clinical trial phases
Phase I - safety profile of treatment (use 1/10 of LD10 of mice)
Phase II - study safety, efficacy, dose response
(follows phase I; conducted on a larger group to determine safety,
identify side effects, safe dosage range)
Phase III - compare with standard treatment
Phase IV - post-marketing surveillance
Clinical trial - subjects randomized into at least two groups
- control arm - receives either a placebo or the current standard of care
- intervention arm - receives the agent being studied
Rx arm
- Observed effect = Rx effect + NC (natural course) + EE
(external effects, e.g. placebo) + error (bias)
Control arm
- Observed effect = NC + EE + error (bias)
Protocol
- written description of all aspects of the trial (i.e. inclusion,
exclusion, sample size, endpoint) should be prepared
Ethics and consent
- trials must be approved by an ethics committee and must not
contravene the Declaration of Helsinki
- informed consent must be obtained from the patient
- the patient must be informed of the chance of being randomized to placebo
-index and reference groups comparable for all known and unknown factors
-to avoid selection bias and confounding bias
-does not guarantee equal distribution in small sample
Methods of randomization
Simple randomization
- Using a random number table
- No guarantee of equal or approximately equal sample size in
each treatment group (imbalance in the number of participants in each group)
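The imbalance that simple randomization can produce is easy to see in a small simulation. This sketch uses Python's pseudo-random generator as a stand-in for a random number table; the arm names, trial size and seed are arbitrary.

```python
import random
from collections import Counter

def simple_randomize(n_participants, arms=("treatment", "control"), seed=None):
    """Assign each participant to an arm independently and with equal probability."""
    rng = random.Random(seed)
    return [rng.choice(arms) for _ in range(n_participants)]

# Hypothetical small trial of 20 participants.
allocation = simple_randomize(20, seed=7)
print(Counter(allocation))
```

Because each assignment is independent, nothing forces the arms to end up with 10 participants each, and the imbalance tends to matter most in small trials.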
2.Cross-over
Systematic review
Meta-analysis - Particular type of systematic review that focuses on the
numerical results.
- combine results fr all individual studies(clinical/observation)
Advantages
1.Increase the numbers of observations and the statistical power
2. Improve the estimates of the effect size of an intervention or
an association
3. Ability to control for between-study variation including
moderators to explain variation.
Disadvantages
1. Publication bias - studies that show no significant results are
less likely to be published, so they are hard to find and include.
2.Sources of bias are not controlled by the method
3. May include badly designed studies that will cause flawed
statistics
Forest plot
- displays the RR and CI of each trial and of all trials combined
- the CI for the combined RR in the meta-analysis is usually
narrower (more precise) than that of any individual study, because
combining the studies gives larger numbers
- random effects model is used if there is heterogeneity between studies
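Why the combined CI is narrower can be shown with a simple pooling calculation. The sketch below uses fixed-effect inverse-variance weighting on the log RR scale, which is one standard approach and is chosen here for illustration (the notes point to a random effects model when studies are heterogeneous). The three study results are invented.

```python
import math

def pool_fixed_effect(log_rrs, std_errors):
    """Inverse-variance weighted pooled log RR and its standard error."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

def ci95(log_rr, se):
    """95% CI for the RR, back-transformed from the log scale."""
    return math.exp(log_rr - 1.96 * se), math.exp(log_rr + 1.96 * se)

# Hypothetical studies: RR 0.8, 0.9, 0.7 with standard errors of log RR.
log_rrs = [math.log(0.8), math.log(0.9), math.log(0.7)]
std_errors = [0.20, 0.25, 0.30]

pooled, pooled_se = pool_fixed_effect(log_rrs, std_errors)
low, high = ci95(pooled, pooled_se)
print(f"pooled RR = {math.exp(pooled):.2f}, 95% CI {low:.2f} to {high:.2f}")
```

The pooled standard error is always smaller than the smallest individual standard error under this model, which is what the narrower diamond at the bottom of a forest plot reflects.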
7. Epidemiology
Effect modification
- magnitude of the effect of a particular exposure on the outcome
will vary according to the presence of a third factor
- e.g. smoking and asbestos exposure each cause lung Ca alone, but
together, asbestos and smoking multiply the risk of lung cancer
significantly (a "synergistic" effect).
- cannot be corrected or eliminated.
Workout
- age-specific rate x standard population in that age group = expected no. of deaths
- total expected deaths across all age groups / total standard
population = standardized death rate (SDR)
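The workout above can be followed step by step in Python. The age-specific rates and standard population figures below are invented for illustration.

```python
def direct_standardized_rate(age_specific_rates, standard_population):
    """Apply the study's age-specific rates to a standard population."""
    expected_deaths = [rate * pop
                       for rate, pop in zip(age_specific_rates, standard_population)]
    return sum(expected_deaths) / sum(standard_population)

# Hypothetical study death rates per person-year for three age groups,
# and the matching standard population sizes.
rates = [0.001, 0.005, 0.020]
standard = [5000, 3000, 2000]
sdr = direct_standardized_rate(rates, standard)
print(f"SDR = {sdr * 1000:.1f} per 1000")
```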
Indirect standardization
- Smaller study population
Workout
- age-specific rate of standard population x study population in that age group
= expected no. of deaths
- SMR = total observed deaths / total expected deaths x 100%; if
> 100% higher rate, < 100% lower rate
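A matching sketch for indirect standardization: the standard population's age-specific rates are applied to the study population to get expected deaths, and the SMR compares observed with expected. All numbers are hypothetical.

```python
def standardized_mortality_ratio(observed_deaths, standard_rates, study_population):
    """SMR (%) = observed deaths / expected deaths x 100."""
    expected = sum(rate * n for rate, n in zip(standard_rates, study_population))
    return observed_deaths / expected * 100

# Hypothetical standard rates and study population sizes by age group.
standard_rates = [0.001, 0.005, 0.020]
study_population = [1000, 800, 200]
smr = standardized_mortality_ratio(12, standard_rates, study_population)
print(f"SMR = {smr:.0f}%")  # above 100% means more deaths than expected
```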
Bias
- systematic error
- 3 categories of bias: selection, measurement, confounding
Selection bias
-selected samples are not representative of study population
i.e
1. Berkson fallacy
- control subjects for a case-control study are selected from hospitalized patients
- exposure frequency in hospitalized patients does not necessarily reflect that of the general
population
2. Referral bias
- samples from specialized medical centers (patients may have more severe illness than those at community hospitals)
4. Non-response bias
- study design allows subjects to decide whether or not to participate in the study.
Imagine a health survey conducted by a random selection of phone numbers. The
phone numbers selected are called and people are interviewed using a standardized
questionnaire. There are always people who would refuse to participate in the survey.
If the refusal is somehow related to their health status (e.g., they are sicker than the
general population), then non-response selection bias results.
5. Prevalence bias (Neyman bias) may occur when incidence of a disease is estimated
based on prevalence, and data become skewed by selective survival. Diabetics are
more likely to die from myocardial infarction than are non-diabetics. If living patients
who have sustained myocardial infarction are asked about their diabetes status, it is
likely that diabetics will be under-represented because non-diabetics selectively
survived their cardiovascular events.
6. Susceptibility bias occurs when the treatment regimen selected for a patient
depends on the severity of the patient's condition. Imagine patients with acute
coronary syndrome. Healthier patients may be preferentially selected for coronary
intervention, while sicker patients may instead be selected for medical therapy. This
may create bias whereby outcomes from coronary intervention appear superior to
medical therapy simply because the subjects who underwent coronary intervention
were healthier.
Measurement/information bias - inaccurate estimate of exposure/outcome
- Measurement bias implies that exposure and/or outcome data are systematically
misclassified (e.g. exposed cases are labeled as unexposed).
- Misclassification
- differential - misclassification of exposure (or disease) is related to disease (or exposure)
- non-differential - misclassification of exposure (or disease) is unrelated to disease (or exposure)
2. Observer bias (ascertainment bias, detection bias or assessment bias) occurs when
the investigator's decision is adversely affected by knowledge of the exposure status.
Blinding of the health care provider is an effective tool to avoid observer bias.
Confounding
- extraneous factor must have some properties linking it with the exposure and
outcome(confounding is related to both the explanatory variable and the response)
1. Factor is associated with exposure
2.Factor is associated with disease in the absence of exposure
3.Factor is not in the causal path between exposure and outcome
2.Analysis
- Stratification
- Multivariate analysis
•Variation due to the test method e.g. defective instruments, error test
Validity
- the extent to which the test accurately measures what it purports to measure
- Testing validity: sensitivity and specificity
- Ideal test: Sn & Sp = 1 (impossible in practice)
Sensitivity should be ↑:
•Penalty associated with missing a case is high e.g.
•When the disease is serious & definitive treatment exists.
•When the disease can be spread
•When subsequent diagnostic evaluations are associated with
minimum cost & minimum risk
Specificity should be ↑:
•When false positive results can harm patient physically,
financially or emotionally.
•When costs or risks associated with further diagnostic
techniques are substantial.
ROC
- shows the trade-off between Sn & Sp available for the test
Use:
1. decide cut-off value for a test
2. determine the ability of a test to predict disease
- Graph: sensitivity (y) vs 1-specificity (x) (false positive rate)
- Better test: curve lies more to the left and upper part of the graph
- AUC: measures the diagnostic accuracy of the test
- A - the larger the AUC (~1), the better the test
- B - AUC ~0.5 (straight line) lacks predictive value
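The ROC construction can be sketched directly: each candidate cut-off gives one (1 - specificity, sensitivity) point, and the AUC is the area under the resulting curve. The test scores and disease labels are invented, and the trapezoidal rule is used for the area as a simple assumption.

```python
def roc_points(scores, labels):
    """ROC points (false positive rate, sensitivity), strictest cut-off first.

    labels: 1 = diseased, 0 = not diseased. A result is called positive
    when score >= cut-off.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, d in zip(scores, labels) if s >= cut and d == 1)
        fp = sum(1 for s, d in zip(scores, labels) if s >= cut and d == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical test scores for 3 diseased and 3 non-diseased subjects.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(f"AUC = {auc(roc_points(scores, labels)):.3f}")
```

Lowering the cut-off moves along the curve toward the upper right: sensitivity rises, but so does the false positive rate.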
Positive predictive value (PPV)
- given a +ve test result, probability that a subject is actually ill
- PPV = TP/(TP + FP); depends on prevalence & specificity
- higher prevalence: a +ve test is more likely a true +ve, so PPV is higher
- higher specificity: fewer false +ves, so PPV is higher
Negative predictive value (NPV)
- given a -ve test result, probability that a subject doesn't have the disease
- NPV = TN/(TN + FN)
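The dependence of PPV on prevalence can be checked with a short calculation. This sketch derives the TP, FP, TN and FN proportions from sensitivity, specificity and prevalence; the 90% sensitivity and specificity and the two prevalence values are arbitrary illustrative choices.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from test characteristics and disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Same hypothetical test (Sn = Sp = 0.90) at two different prevalences.
for prevalence in (0.01, 0.20):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With the same test, the PPV rises sharply as prevalence increases, while the NPV falls slightly.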