Nominal Variables Tests and Outcome Measures - Lecture 4

The document discusses different types of variables that can be used in statistical tests and different methods for comparing categorical variables between groups, including the chi-square test. It also covers calculating odds ratios to quantify the strength of association between variables and factors like genotype and disease outcomes. Cumulative probabilities are introduced as a way to incorporate survival data over time into statistical analyses when observations may be incomplete.


Nominal variables tests and outcome measures

Department of Biostatistics and Translational Medicine


What are the principal types of variables?
• Continuous
– Everything that can be measured
• Ordinal
– Everything that can be ranked/ordered
• Categorical
– Everything that can be grouped
How can one compare categorical variables between groups?
• Do men get diabetes more often than women?
A basic test for proportions – the Chi² test
• The test is used to determine whether two variables
are associated in a way that certain combinations of
values occur more often than others
Converting Chi-square values to p
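A minimal sketch of this conversion, assuming the 2x2 case with 1 degree of freedom. The function name chi2_to_p is hypothetical. For 1 degree of freedom the upper-tail probability can be computed with the standard library alone, because a Chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def chi2_to_p(chi2: float) -> float:
    """Upper-tail p-value of a Chi-square statistic with 1 degree of freedom."""
    # For df = 1: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# A few statistic values, including the usual 5% critical value (3.841).
for value in (1.0, 3.841, 6.635, 10.828):
    print(f"Chi2 = {value:6.3f}  ->  p = {chi2_to_p(value):.4f}")
```

Tables with more than two rows or columns have more degrees of freedom and need the general Chi-square distribution, which this sketch does not cover.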
How does Chi² work?
• It calculates the expected number of state/class combinations
• Afterwards it calculates the deviation of the observed values from the expected ones
• If the deviations are large the test rejects the null hypothesis

H0 – the observed values do not deviate from expected ones


HA – the observed values deviate from the expected ones
How to compare categorical variables?

               Diabetes         Without Diabetes   Total
FTO [CC]       1500 (15.00%)    8500 (85.00%)      10000
FTO [nonCC]    1300 (13.00%)    8700 (87.00%)      10000
Total          2800             17200              20000

How should this table look if there was no association between FTO genotype and diabetes?
Expected distribution

                         Diabetes   Without Diabetes   Total
FTO [CC]      observed   1500       8500               10000
              expected   1400       8600
FTO [nonCC]   observed   1300       8700               10000
              expected   1400       8600
Total                    2800       17200              20000
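The expected counts above can be reproduced from the marginal totals. The following is a minimal sketch, not the lecture's own code; the function names expected_counts and chi2_statistic are hypothetical. Each expected count is (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells.

```python
def expected_counts(table):
    """Expected cell counts under no association between rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi2_statistic(table):
    """Pearson Chi-square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# FTO genotype vs diabetes, counts taken from the slide.
fto = [[1500, 8500], [1300, 8700]]
print(expected_counts(fto))           # 1400 and 8600 in both rows
print(round(chi2_statistic(fto), 2))  # about 16.61
```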
Outcome

               Diabetes         Without Diabetes   Total
FTO [CC]       1500 (15.00%)    8500 (85.00%)      10000
FTO [nonCC]    1300 (13.00%)    8700 (87.00%)      10000
Total          2800             17200              20000
p<0.001

The distribution of cell counts deviates significantly from the expected distribution. Considering that there were more patients with diabetes among CC homozygotes than among patients with other genotypes, we conclude that being a CC homozygote predisposes to diabetes
When not to use the Chi² test?
• Very small groups
– Use a Fisher’s exact test instead
• Paired observations (the same individual evaluated twice – before and after an intervention)
– Use McNemar’s test instead
Alternatives to the Chi-square test
Fisher’s exact test
A permutational test that calculates all possible tables with the same marginal sums and checks whether the observed table falls within the 5% most extreme distributions.
Typically used if the 2x2 table contains values <5
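A minimal sketch of the idea, assuming a two-sided test that sums the probabilities of all tables that are no more likely than the observed one (a common convention, but not the only one). The function name fisher_exact_two_sided and the small 3/1/1/3 table are hypothetical illustrations.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is at most as probable as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical small table: 3/1 in the first row, 1/3 in the second.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 34/70, about 0.4857

# Hypoglycemia counts from the second "Which one?" slide (MDI vs CSII).
print(fisher_exact_two_sided(218, 16, 225, 2))
```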

Yates’ corrected Chi-square test (continuity correction)
Used to prevent overestimation of statistical significance for small numbers of observations
Typically used if the 2x2 table contains values <15
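A minimal sketch of the continuity correction, assuming the usual form in which 0.5 is subtracted from each absolute difference between observed and expected counts before squaring. The function name chi2_2x2 is hypothetical.

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Chi-square statistic for [[a, b], [c, d]], optionally Yates-corrected."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Continuity correction: shrink each deviation by 0.5.
                diff = max(0.0, diff - 0.5)
            stat += diff ** 2 / expected
    return stat


# Hypoglycemia counts from the second "Which one?" slide (MDI vs CSII).
plain = chi2_2x2(218, 16, 225, 2)
corrected = chi2_2x2(218, 16, 225, 2, yates=True)
print(f"uncorrected: {plain:.2f}, Yates-corrected: {corrected:.2f}")
```

The corrected statistic is always the smaller of the two, which is why the correction makes the test more conservative.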
Which one?

           Hypoglycemia - 0   Hypoglycemia - 1   Row totals
MDI        218                6                  224
Column %   49.21%             50.00%
Row %      97.32%             2.68%
CSII       225                6                  231
Column %   50.79%             50.00%
Row %      97.40%             2.60%
Totals     443                12                 455
Which one?

           Hypoglycemia - 0   Hypoglycemia - 1   Row totals
MDI        218                16                 234
Column %   49.21%             88.89%
Row %      93.16%             6.84%
CSII       225                2                  227
Column %   50.79%             11.11%
Row %      99.12%             0.88%
Totals     443                18                 461
Which one?

           Hypoglycemia - 0   Hypoglycemia - 1   Row totals
MDI        218                16                 234
Column %   49.21%             50.00%
Row %      93.16%             6.84%
CSII       225                16                 241
Column %   50.79%             50.00%
Row %      93.36%             6.64%
Totals     443                32                 475
Matched pairs test for nominal variables
McNemar’s Chi-square test
Works by comparing the numbers of divergent (discordant) pairs, b and c

                                 Depression according to DSM IV   Without depression in DSM IV
Depression according to ICD-10   100 (a)                          20 (b)
Without depression in ICD-10     10 (c)                           1500 (d)
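A minimal sketch of the calculation for the table above, assuming the uncorrected form of the statistic, (b - c)² / (b + c), with 1 degree of freedom. Some textbooks apply a continuity correction instead, so other software may report a slightly different value. The function name mcnemar is hypothetical.

```python
import math


def mcnemar(b, c):
    """Uncorrected McNemar statistic and p-value from the discordant pairs."""
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of the Chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# b = depressed by ICD-10 only, c = depressed by DSM IV only (from the slide).
stat, p_value = mcnemar(20, 10)
print(f"Chi2 = {stat:.2f}, p = {p_value:.3f}")
```

Only the discordant cells b and c enter the calculation; the 100 and 1500 concordant pairs carry no information about which classification diagnoses depression more often.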
But what about comparing the effect?
Are p values enough or can we do better?

              Diabetes   Without Diabetes   Total
FTO [CC]      1500       8500               10000
FTO [nonCC]   1300       8700               10000
Total         2800       17200              20000
p<0.0001

                     Diabetes   Without Diabetes   Total
INS 5’VNTR [CC]      800        1200               2000
INS 5’VNTR [nonCC]   650        1350               2000
Total                1450       2550               4000
p<0.0001

Which of these variants exerts a stronger biological effect?


Odds ratio
• Odds – the probability of the event of interest divided by the probability of it not occurring: p/(1-p)
– An odds of 1 corresponds to an equal probability of the event occurring and not occurring

• An odds ratio (p1/(1-p1))/(p2/(1-p2)) thus shows the relative odds of an event of interest occurring depending on the examined variable
– For example having a risk allele may lead to an OR of 1.2 for getting diabetes, which means that the odds of becoming diabetic are 1.2 times higher in carriers than in non-carriers
Odds ratio calculations

              Diabetes   Without Diabetes   Total
FTO [CC]      1500       8500               10000
FTO [nonCC]   1300       8700               10000
Total         2800       17200              20000

OR = (p1/(1-p1)) / (p2/(1-p2)) = (1500/8500) / (1300/8700) = 0.1765 / 0.1494 = 1.18
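A minimal sketch of the same calculation in code, applied to both the FTO and the INS 5’VNTR tables from the slides. The 95% confidence interval uses the log-odds (Woolf) approximation, which is an assumption of this sketch and not something the slides specify. The function name odds_ratio is hypothetical.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio for [[a, b], [c, d]] with an approximate 95% CI.

    a, b = events / non-events in the first group,
    c, d = events / non-events in the second group.
    """
    estimate = (a / b) / (c / d)
    # Standard error of ln(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


fto = odds_ratio(1500, 8500, 1300, 8700)
ins = odds_ratio(800, 1200, 650, 1350)
print("FTO OR = %.2f (95%% CI %.2f to %.2f)" % fto)
print("INS OR = %.2f (95%% CI %.2f to %.2f)" % ins)
```

Both comparisons have p<0.0001, but the odds ratios differ, which is why an effect measure says more about the strength of an association than the p value alone.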
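Interpreting OR and RR
[Figure: OR axis starting at 0, with 1 marking no effect. Values below 1 indicate a protective effect, values above 1 a detrimental effect.]
• OR=1.2, 95%CI 0.7 – 1.7, p>0.05
• OR=0.6, 95%CI 0.2 – 1.0, p=0.05
• OR=1.2, 95%CI 1.1 – 1.3, p<0.05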
Other tools used in expressing the strength of effects
• Relative risk – a ratio of the probabilities of an event occurring in the exposed and control groups. Typically used in RCTs as it requires a good baseline probability estimate provided by a placebo-treated group

• Hazard ratio – a ratio of the rates at which an event occurs in the exposed and control groups, taking into account the observation time

• Number needed to treat – the average number of patients who need to be treated to prevent one additional bad outcome.

http://www.cebm.net/glossary/
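A minimal sketch of two of these measures. The relative risk uses the FTO counts from the earlier slides; the trial used for the number needed to treat (130 events among 1000 treated vs 150 among 1000 controls) is hypothetical and only there to make the arithmetic visible. Function names are hypothetical as well. Exact fractions are used so that rounding up the NNT is not disturbed by floating-point error.

```python
from fractions import Fraction
from math import ceil


def relative_risk(events_exposed, n_exposed, events_control, n_control):
    """Ratio of event probabilities in the exposed and control groups."""
    return Fraction(events_exposed, n_exposed) / Fraction(events_control, n_control)


def number_needed_to_treat(events_treated, n_treated, events_control, n_control):
    """1 / absolute risk reduction, rounded up to a whole patient."""
    arr = Fraction(events_control, n_control) - Fraction(events_treated, n_treated)
    if arr <= 0:
        raise ValueError("treatment does not reduce the risk in these data")
    return ceil(1 / arr)


# FTO [CC] vs FTO [nonCC]: 15% vs 13% with diabetes.
rr = relative_risk(1500, 10000, 1300, 10000)
print(f"RR = {float(rr):.2f}")  # 1.15, slightly below the OR of 1.18

# Hypothetical trial: 13% events when treated vs 15% in controls.
print("NNT =", number_needed_to_treat(130, 1000, 150, 1000))  # 50
```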
What to do when talking about survival?

                     Dead due to myocardial infarction   Alive   Total
INS 5’VNTR [CC]      100                                 1900    2000
INS 5’VNTR [nonCC]   80                                  1920    2000
Total                180                                 3820    4000

What is missing?
Humans don’t live forever
Cumulative probability – a way to incorporate incomplete observations into the analysis
• The product of the probabilities of remaining event-free at each successive timepoint in an observation lasting t epochs

• The cumulative probability reflects the events that occurred while taking into account the observations that drop out of the analysis for various reasons:
– Surviving past observation end
– Leaving the study due to non-event reasons
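A minimal sketch of the multiplication, using hypothetical yearly numbers. In each year the survival probability is computed only among patients still under observation, so patients who dropped out earlier reduce the number at risk without being counted as events. The function name cumulative_survival is hypothetical.

```python
def cumulative_survival(intervals):
    """Running product of per-interval survival probabilities.

    intervals: list of (at_risk_at_start, events_during_interval).
    """
    cumulative = 1.0
    result = []
    for at_risk, events in intervals:
        cumulative *= 1 - events / at_risk
        result.append(cumulative)
    return result


# Hypothetical data: 100 patients start; 10 die in year 1; 5 of the 90
# survivors drop out, leaving 85 at risk in year 2; 17 die; and so on.
yearly = [(100, 10), (85, 17), (60, 6)]
for year, prob in enumerate(cumulative_survival(yearly), start=1):
    print(f"year {year}: cumulative survival = {prob:.3f}")
```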


A potential cancer trial – what is the probability
of surviving 5 years since diagnosis?

R – Relapse
D – Death
Converting the probabilities into a database with individual
starting points

R – Relapse
D – Death
Objectives of survival analysis
To estimate time-to-event for a group of individuals, such as time until second heart-attack (MI) for a group of MI patients.

To compare time-to-event between two or more groups, such as treated vs. placebo MI patients in a randomized controlled trial.

To assess the relationship of co-variables to time-to-event, such as: does weight, insulin resistance, or cholesterol influence survival time of MI patients?
Why not...?
1. Why not compare mean time-to-event between your groups using a t-test or linear regression?
→ ignores censoring

2. Why not compare proportion of events in your groups using risk/odds ratios or logistic regression?
→ ignores time
Terms used in survival analysis
Time-to-event:
The time from entry into a study until a subject has a particular outcome (ti = time at
last disease-free observation or time at event)

Censoring:
Subjects are said to be censored if they are lost to follow-up or drop out of the study, or if the study ends before they die or have an outcome of interest. They are counted as alive or disease-free for the time they were enrolled in the study. (ci = 1 if the subject had the event; ci = 0 if there was no event by time ti)
Kaplan-Meier curves
• We take the time to event into account rather than just the
event’s presence and group assignment
• The database needs three variables: time of observation, event status and group assignment
– Complete observations are ones in whom the event occurred
• They impact survival curves by reducing the estimated probability of survival

– Censored observations are ones that dropped out of the analysis
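A minimal sketch of the Kaplan-Meier (product-limit) estimate for a single group, using hypothetical follow-up data. Events lower the curve; censored observations only reduce the number at risk for later timepoints. The function names kaplan_meier and median_survival are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an event.

    times:  follow-up time of each subject
    events: 1 if the event occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored subjects both leave the risk set
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical follow-up in months; 0 marks a censored observation.
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
print(curve)
print("median survival:", median_survival(curve))
```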


Interpreting Kaplan-Meier curves
[Figure: Kaplan-Meier curves annotated with the log-rank test result (P<0.05), the probability of the outcome, the median survival time, individual observation times and censored observations]

Furman R, et al. Idelalisib and Rituximab in Relapsed Chronic Lymphocytic Leukemia. N Engl J Med. Jan 2014
Log-rank test
H0 - no difference between survival functions of the two groups

A log-rank test creates 2x2 tables at each event time and combines across the tables

It provides a χ² statistic with 1 degree of freedom (for a two-group comparison) and a p-value.

When the p value is <0.05 we can conclude that there is a significant difference in survival time, e.g. in the treated group compared to the untreated one.
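A minimal sketch of the two-group log-rank test described above, using hypothetical data. At each event time it builds the 2x2 table (group × event/no event among those still at risk), accumulates observed and expected events for the first group together with the hypergeometric variance, and combines them into one statistic with 1 degree of freedom. The function name logrank_test is hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Events occurring exactly at time t.
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 degree of freedom
    return chi2, p_value


# Hypothetical data: untreated patients all have the event early,
# treated patients all have it later; no censoring in this toy example.
untreated_times, untreated_events = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
treated_times, treated_events = [6, 7, 8, 9, 10], [1, 1, 1, 1, 1]
chi2, p_value = logrank_test(untreated_times, untreated_events,
                             treated_times, treated_events)
print(f"Chi2 = {chi2:.2f}, p = {p_value:.4f}")
```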
Examples of using K-M curves

Hunger SP, Mullighan CG. Acute Lymphoblastic Leukemia in Children. N Engl J Med. 2015 Oct 15;373(16):1541-52. doi: 10.1056/NEJMra1400972.
Limitations of Kaplan-Meier method
• Mainly descriptive

• Requires categorical predictors

• Survival estimates can be unreliable toward the end of a study when there are
small numbers of subjects at risk of having an event

• Doesn’t control for covariates

• Can’t accommodate time-dependent variables


Comparing the impact of variables on the probability of survival
• For univariate comparisons (a single variable divides the whole group) we can use the log-rank test
– H0 – cumulative probabilities of survival are equal
– HA – cumulative probabilities of survival are not equal
• What if there are several overlapping variables?
Multivariate analyses
• Can be used for continuous, categorical and time-dependent variables using different methods
• Typically used when multiple variables coexist and
overlap and one wants to extract the impact of a single
variable free from confounding effects of others
– Does smoking cause lung cancer or is male sex a more
significant risk factor?
Multivariate analysis of survival probabilities
• Cox’ proportional hazard regression model
• Uses a regression equation (a baseline hazard multiplied by an exponential function of the weighted variables) to express the relative impact of variables on the probability of survival
• Results are expressed as Hazard Ratios (interpreted similarly to ORs)
• HRs that are adjusted represent the impact of a single variable after "cleaning" it from the effects of other variables in the model
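For reference, the standard textbook form of the Cox model can be written as below. The symbols are assumptions of this note, not taken from the slides: h_0(t) is the baseline hazard, x_1, ..., x_k are the variables in the model and \beta_1, ..., \beta_k are their estimated coefficients.

```latex
h(t \mid x_1, \dots, x_k) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_k x_k),
\qquad
\mathrm{HR}_j = \exp(\beta_j)
```

The adjusted hazard ratio for variable j is the factor by which the hazard changes when x_j increases by one unit while the other variables are held constant. The baseline hazard cancels out of that ratio, which is where the "proportional hazards" name comes from.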
Typical results of Cox’ regression
Interpreting HR

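[Figure: HR axis starting at 0, with 1 marking no effect. Values below 1 indicate a protective effect, values above 1 a detrimental effect.]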
Thank you for your attention
