Chapter 2 Slides
CHAPTER 2 Analysis
Overview
1A. Rates I
1B. Rates II
2A. Association I
4. Confounders
Unit 1A: Rates I
By the end of this unit you should be able to
do the following:
1. Identify a categorical variable.
2. Understand and interpret tables and
plots created from 1 categorical variable.
RECAP
Types of variables
What to measure:
Variable “Outcome” tells us if the
treatment was a success or not
[Figure: two bar plots of counts for “Outcome” (y-axis: Counts; x-axis: Outcome). Success: 831, Failure: 219.]
Analysing 1 categorical variable - Plot
100% Stacked Bar plot for “Outcome”
[Figure: y-axis: Percentage; x-axis: Outcome. Success: 79.1%, Failure: 20.9%.]
Conclusion
Table and bar plots gave us the same
conclusion
79% success
21% failure
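The percentages above can be reproduced from the raw counts. The following is a minimal sketch, assuming the counts for “Outcome” are 831 successes and 219 failures as shown on the slides; the variable names are illustrative only.

```python
# Counts for the "Outcome" variable, taken from the slides.
counts = {"Success": 831, "Failure": 219}

total = sum(counts.values())

# Rate of each category = count of that category / total number of treatments.
rates = {category: count / total for category, count in counts.items()}

for category, rate in rates.items():
    print(f"{category}: {count_str}" if False else f"{category}: {rate:.1%}")
```

Rounded to whole percentages, this gives the 79% success and 21% failure quoted in the conclusion.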
Treatment      Success        Failure        Row Total
X              542 (77.4%)    158 (22.6%)    700 (100%)
Y              289 (82.6%)    61 (17.4%)     350 (100%)
Column Total   831 (79.1%)    219 (20.9%)    1050 (100%)
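The row percentages in this table are each cell divided by its row total. As a rough sketch (the dictionary layout and the helper name `row_rates` are assumptions made for illustration, not part of the slides):

```python
# Counts of "Outcome" by "Treatment", taken from the table on the slides.
table = {
    "X": {"Success": 542, "Failure": 158},
    "Y": {"Success": 289, "Failure": 61},
}


def row_rates(row):
    """Return each cell of a row as a fraction of that row's total."""
    row_total = sum(row.values())
    return {outcome: count / row_total for outcome, count in row.items()}


# Conditional rates, e.g. rate(Success | X).
rates_by_treatment = {treatment: row_rates(row) for treatment, row in table.items()}

# The "Column Total" row: add the two treatments together, then take rates.
column_totals = {
    outcome: sum(row[outcome] for row in table.values())
    for outcome in ("Success", "Failure")
}
overall_rates = row_rates(column_totals)

for treatment, rates in rates_by_treatment.items():
    print(treatment, {outcome: f"{rate:.1%}" for outcome, rate in rates.items()})
print("Overall", {outcome: f"{rate:.1%}" for outcome, rate in overall_rates.items()})
```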
Analysing 2 categorical variables - plot
[Figure: dodged bar plot (left) and stacked bar plot (right) for “Outcome” by “Treatment”; y-axis: Counts; x-axis: Treatment. X: 542 Success, 158 Failure. Y: 289 Success, 61 Failure.]
Analysing 2 categorical variables - plot
100% Stacked Bar plot for “Outcome” by “Treatment”
[Figure: y-axis: Percentage; x-axis: Treatment. X: 77.4% Success, 22.6% Failure. Y: 82.6% Success, 17.4% Failure.]
Summary
We have learnt how to analyse 2 categorical variables from the perspective of:
• Tables – 2x2 table
• Plots – Bar plots / 100% stacked bar plots
Unit 2A: Association I
Conclusion
• rate(A | B) < rate(A | NB)
• A occurs less often when B is present than when B is absent.
• Treatment X has the lower success rate: Treatment X is negatively
associated with a successful treatment.
• Treatment Y has the higher success rate: Treatment Y is positively
associated with a successful treatment.
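The comparison behind these bullets can be sketched in a few lines. This is an illustrative helper only: the function name `association` is hypothetical, and the rates are computed from the counts on the slides (542 of 700 for Treatment X, 289 of 350 for Treatment Y). Since there are only two treatments, "not X" is the same as Y.

```python
def association(rate_a_given_b, rate_a_given_nb):
    """Classify the association between A and B by comparing two conditional rates."""
    if rate_a_given_b > rate_a_given_nb:
        return "positive"
    if rate_a_given_b < rate_a_given_nb:
        return "negative"
    return "none"


rate_success_given_x = 542 / 700
rate_success_given_y = 289 / 350

# With only two treatments, NB for Treatment X is Treatment Y, and vice versa.
x_vs_success = association(rate_success_given_x, rate_success_given_y)
y_vs_success = association(rate_success_given_y, rate_success_given_x)

print("Treatment X and success:", x_vs_success)
print("Treatment Y and success:", y_vs_success)
```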
On Establishing Association
\[
\frac{w}{w+y} > \frac{x}{x+z}
\iff w(x+z) > x(w+y)
\iff wx + wz > xw + xy
\iff wz > xy
\]
\[
\frac{w}{w+x} > \frac{y}{y+z}
\iff w(y+z) > y(w+x)
\iff wy + wz > yw + yx
\iff wz > xy
\]
rate(A | B) > rate(A | NB) ⇔ rate(B | A) > rate(B | NA)
rate(B | A) > rate(B | NA) → rate(A | B) > rate(A | NB)
Check:
rate(X | Success) < rate(X | Failure)
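The check can be run numerically on the treatment data. The sketch below assumes a 2x2 layout with rows A / NA and columns B / NB, with cells labelled w, x in the first row and y, z in the second; that labelling is an assumption chosen to match the fractions in the derivation, and the function name `conditional_rates` is illustrative. Here A is Treatment X and B is Success.

```python
def conditional_rates(w, x, y, z):
    """Conditional rates for a 2x2 table.

    Assumed layout (not stated on the slides):
                B    NB
        A       w    x
        NA      y    z
    """
    return {
        "A|B": w / (w + y),
        "A|NB": x / (x + z),
        "B|A": w / (w + x),
        "B|NA": y / (y + z),
    }


# A = Treatment X, B = Success, using the counts from the slides.
w, x, y, z = 542, 158, 289, 61
r = conditional_rates(w, x, y, z)

print(f"rate(X | Success) = {r['A|B']:.3f}, rate(X | Failure) = {r['A|NB']:.3f}")
print(f"rate(Success | X) = {r['B|A']:.3f}, rate(Success | Y) = {r['B|NA']:.3f}")

# Symmetry rule: both comparisons point the same way, and agree with wz vs xy.
same_direction = (r["A|B"] < r["A|NB"]) == (r["B|A"] < r["B|NA"])
print("Both comparisons agree:", same_direction)
```

Both comparisons reduce to the same cross-product comparison (here wz < xy), which is why checking one pair of rates is enough.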
Summary
We have learned:
• How to identify association
• Symmetry rule and its consequence on identifying association
Unit 2B: Association II
By the end of this unit, you should
be able to do the following:
Cup 1
Size: Large cup
Sweetness: 90%
Cup 2
Size: Small cup
Sweetness: 20%
Cup 1 + Cup 2
Sweetness: In between 20% and 90%, but closer to Cup 1
1. The closer rate(B) is to 100%,
the closer rate(A) is to rate(A | B).
Sweetness in the final cup is between Sweetness | Cup 1 and
Sweetness | Cup 2
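The cup analogy is a weighted average: the sweetness of the final cup is the total sweetener divided by the total volume. A minimal sketch follows; the volumes (300 ml and 100 ml) are hypothetical values chosen only to make one cup "large" and the other "small", and the function name `mix` is illustrative.

```python
def mix(cups):
    """Sweetness of the combined cup.

    cups: list of (volume_ml, sweetness) pairs, with sweetness as a fraction.
    """
    total_volume = sum(volume for volume, _ in cups)
    total_sweetener = sum(volume * sweetness for volume, sweetness in cups)
    return total_sweetener / total_volume


cup_1 = (300, 0.90)  # hypothetical large cup, 90% sweetness
cup_2 = (100, 0.20)  # hypothetical small cup, 20% sweetness

final = mix([cup_1, cup_2])
print(f"Final sweetness: {final:.1%}")
```

With these assumed volumes the result lies between 20% and 90% and sits closer to the larger cup's 90%.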
Cup 1
Size: Small cup
Sweetness: 20%
Cup 2
Size: Small cup
Sweetness: 90%
Cup 1 + Cup 2
Sweetness: Exactly in between 20% and 90% = (20% + 90%) / 2 = 55%
3. If rate(A | B) = rate(A | NB), then
rate(A) = rate(A | B) = rate(A | NB).
Cup 1
Size: Small / Large cup
Sweetness: 20%
Cup 2
Size: Small / Large cup
Sweetness: 20%
2 Cups added together
Sweetness: Exactly 20%
Linking back to Consequences 2 and 3
If the cups are of the same size, the sweetness of the final cup will be
exactly halfway between the sweetness of the two original cups.
• If rate(B) = 50%, the overall rate of A will be exactly in between the
rate of A given B and the rate of A given NB.
If the sweetness is the same for both cups, the sweetness of the final cup
will also be the same, regardless of the sizes of the original cups.
• If rate(A | B) = rate(A | NB), then rate(A) is the same as the 2 rates.
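Both consequences follow from writing the overall rate as a weighted average of the two conditional rates, rate(A) = rate(B) × rate(A | B) + rate(NB) × rate(A | NB). The sketch below uses the 20% and 90% figures from the cup example; the function name `overall_rate` is illustrative.

```python
def overall_rate(rate_b, rate_a_given_b, rate_a_given_nb):
    """rate(A) as a weighted average of rate(A | B) and rate(A | NB)."""
    return rate_b * rate_a_given_b + (1 - rate_b) * rate_a_given_nb


# Consequence 2: rate(B) = 50% puts rate(A) exactly halfway between the two rates.
halfway = overall_rate(0.5, 0.20, 0.90)
print(f"rate(B) = 50%: rate(A) = {halfway:.0%}")

# Consequence 3: equal conditional rates give the same overall rate for any rate(B).
equal_rates = [overall_rate(rate_b, 0.20, 0.20) for rate_b in (0.1, 0.5, 0.9)]
print([f"{rate:.0%}" for rate in equal_rates])
```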
Linking back to dataset at hand
Overall rate of successful treatments
• rate(Success) = 0.79
Groups: Treatment X and Treatment Y
• rate(Success | X) = 0.774
• rate(Success | Y) = 0.826
• rate(Success) lies in between the two conditional rates
Treatment Y is positively associated with success
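The same weighted-average relationship can be checked on the treatment data. This is a small sketch using the counts from the slides (700 treatments with X, 350 with Y); the variable names are illustrative.

```python
n_x, n_y = 700, 350               # number of treatments of each type
success_x, success_y = 542, 289   # successful treatments of each type

rate_x = n_x / (n_x + n_y)        # share of all treatments that used X
rate_success_given_x = success_x / n_x
rate_success_given_y = success_y / n_y

# Overall success rate as a weighted average of the two conditional rates.
rate_success = rate_x * rate_success_given_x + (1 - rate_x) * rate_success_given_y

print(f"rate(Success | X) = {rate_success_given_x:.3f}")
print(f"rate(Success | Y) = {rate_success_given_y:.3f}")
print(f"rate(Success)     = {rate_success:.3f}")
```

Because two thirds of the treatments used X, the overall rate sits closer to rate(Success | X) than to rate(Success | Y).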
Variables: Size, Gender, Treatment, Outcome

Large stones
Treatment     Success   Failure   Total
X             381       145       526
Y             55        25        80
Grand Total   436       170       606

Small stones
Treatment     Success   Failure   Total
X             161       13        174
Y             234       36        270
Grand Total   395       49        444

[Figure: success and failure percentages by Treatment for small stones. X: 92.5% Success. Y: 86.7% Success.]
Analysing 3 categorical variables - plot
100% Stacked Bar plot for "Outcome" by "Treatment"
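[Figure: y-axis: Percentage; x-axis: Treatment (X, Y) within stone size (Large, Small). Large stones: X 72.4% Success, 27.6% Failure; Y 68.8% Success, 31.3% Failure. Small stones: X 92.5% Success, 7.5% Failure; Y 86.7% Success, 13.3% Failure.]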
A paradox on our hands
Overall, Treatment Y is better.
Is X or Y better?
[Tables comparing the treatments; columns: Successful treatments, Total number of treatments, rate(Success) in %]
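The paradox can be reproduced from the counts. The sketch below is illustrative only: the overall and small-stone counts are taken from the slides, while the large-stone counts are derived here by subtraction (overall minus small) rather than read off a table. The helper names are hypothetical.

```python
# (successful treatments, total number of treatments) per treatment.
overall = {"X": (542, 700), "Y": (289, 350)}
small = {"X": (161, 174), "Y": (234, 270)}

# Assumption: every treatment is on either a large or a small stone,
# so large-stone counts are the overall counts minus the small-stone counts.
large = {
    t: (overall[t][0] - small[t][0], overall[t][1] - small[t][1])
    for t in overall
}


def success_rate(successes, total):
    return successes / total


def better_treatment(group):
    """Return the treatment with the higher success rate in this group."""
    return max(group, key=lambda t: success_rate(*group[t]))


groups = {"Large": large, "Small": small, "Overall": overall}
for label, group in groups.items():
    for treatment, (successes, total) in group.items():
        rate = success_rate(successes, total)
        print(f"{label:8s}{treatment}  {successes:4d} / {total:4d}  {rate:.1%}")

better = {label: better_treatment(group) for label, group in groups.items()}
print(better)
```

Treatment X has the higher success rate within each stone size, yet Treatment Y has the higher rate when the two sizes are combined.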
[Diagram: Stone size (confounding variable) is associated with each of the two main variables.]
Definition:
A confounder is a third variable that is associated with both the independent
and the dependent variable whose relationship we are investigating.
Stone size associated with treatment type
Treatment   Large   Small   Total
X           526     174     700
Y           80      270     350
rate(Large | X) = 526 / 700 = 0.751
rate(Large | Y) = 80 / 350 = 0.229
Since 0.751 > 0.229, large stones are positively associated with Treatment X.
Stone size associated with success
Stone size   Success   Failure   Total
Large        436       170       606
Small        395       49        444
rate(Success | Large) = 436 / 606 = 0.719
rate(Success | Small) = 395 / 444 = 0.890
Since 0.719 < 0.890, large stones are negatively associated with success.
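Putting the two checks together: stone size counts as a confounder here because it is associated with both the treatment type and the outcome. The following is a minimal sketch using the counts quoted above; the variable names are illustrative.

```python
# Is stone size associated with treatment type?
rate_large_given_x = 526 / 700
rate_large_given_y = 80 / 350
associated_with_treatment = rate_large_given_x != rate_large_given_y

# Is stone size associated with the outcome?
rate_success_given_large = 436 / 606
rate_success_given_small = 395 / 444
associated_with_outcome = rate_success_given_large != rate_success_given_small

# A confounder must be associated with both main variables.
is_confounder = associated_with_treatment and associated_with_outcome

print(f"rate(Large | X) = {rate_large_given_x:.3f}, rate(Large | Y) = {rate_large_given_y:.3f}")
print(f"rate(Success | Large) = {rate_success_given_large:.3f}, "
      f"rate(Success | Small) = {rate_success_given_small:.3f}")
print("Stone size is a confounder:", is_confounder)
```

The exact inequality test mirrors the definition on the slides; with real data, rates that differ only slightly may still be treated as unassociated, which is a judgment this sketch does not attempt.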
rate(success | X) < rate(success | Y)
Negative association between Treatment X and Success

Size    Treatment Type   Outcome
Large   X                Success
Large   Y                Failure
Small   X                Success
Large   Y                Success
Large   Y                Success

“I want Treatment X!”
Summary: Proving Association
rate(A | B) ≠ rate(A | NB)
OR
rate(B | A) ≠ rate(B | NA)
Main variables: Treatment and Outcome
Confounding variable: Stone size