Biostatistics 521 Lecture 14 Inference For Numerical Data II
Xiang Zhou, PhD
BIOS 521
10/19/2023
Data Types
• Numerical data: numbers
• Categorical data: levels, groups, categories
Roadmap for Inference
Today (numerical data):
• Compare means for > 2 groups → ANOVA
• Mean difference of paired measurements on the same samples → Paired t-Test
What Questions Can We Ask for Numerical Data?
Inference on a numerical variable (age) in one group:
• Confidence interval for the population mean
• Hypothesis test for the population mean age being above/below some value (H0: μ_age = 50)
Maybe you are interested in the difference in ages between two groups.
Inference on the difference in a numerical variable (age) between two groups (men, women):
• Confidence interval for the difference in population means (μ_M − μ_F)
• Hypothesis test for the population mean age in men being different from women (H0: μ_M − μ_F = 0)
Maybe you are interested in the difference in people’s actual versus desired weights
Paired t-Test of Correlated Outcomes
Key idea: each person is independent, but the weight and desired weight measurements are correlated within a person (r = 0.8 in the scatterplot of desired weight versus weight).
Each sample has “paired” weight measurements.
Subtracting desired weight from actual weight gives a new variable for each sample:
wt_diff = weight − wtdesire
wt_diff > 0 implies people weigh more than they want
Inference on the difference between actual and desired weight in the dataset:
• Confidence interval for the population mean difference (μ_wt_diff)
• Hypothesis test for the population mean weight difference being non-zero (H0: μ_wt_diff = 0)
The Paired t-Test
• Hypothesis testing at level α on the population mean difference μ_diff:
H0: μ_diff = μ0
HA: μ_diff ≠ μ0
• Compute the difference in paired measurements x_diff = x2 − x1 for each of n independent samples
• Note that the x1 and x2 measurements for each sample are not independent, but the x_diff measurements are independent across samples
• Compute the sample mean X̄_diff and standard deviation S_diff of the differences in paired measures across all n samples
• Use the standardized mean t = (X̄_diff − μ0) / (S_diff / √n) as the test statistic
• Compute the p-value as p = 2 × P(T > |t|) for a two-sided test and reject the null hypothesis if p < α, or compare the t-statistic to the appropriate critical values
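The course demo uses R (see Lecture14_t_test.R); purely as an illustrative sketch, the same procedure can be written from scratch in Python. The function name is hypothetical, and the p-value uses the Normal approximation to the t-distribution, which is very close for large n.

```python
import math

def paired_t(x1, x2, mu0=0.0):
    """Paired t-test: reduce to a one-sample test on the differences.

    Returns (t, p), where p is a two-sided p-value from the Normal
    approximation (close to the t-distribution for large n).
    """
    n = len(x1)
    diffs = [a - b for a, b in zip(x1, x2)]
    mean_d = sum(diffs) / n
    # Sample variance of the differences (divide by n - 1)
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    se = math.sqrt(var_d / n)
    t = (mean_d - mu0) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # 2 * P(Z > |t|)
    return t, p

# Toy illustration with made-up weights and desired weights
t, p = paired_t([150, 160, 170, 180], [140, 155, 165, 170])
```

Only the differences enter the computation, which is why the within-person correlation causes no trouble.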
Example: Paired t-Test
Subtract paired measurements:
X̄_diff = 14.59
S_diff = 24.05
n = 20,000
Example: Paired t-Test
• Is the mean of the differences between actual and desired weight different from zero?
• Perform the hypothesis test at α = 1%:
H0: μ_diff = 0
HA: μ_diff ≠ 0
Confidence Interval for Mean Difference of Paired Data
Since X̄_diff is approximately Normal with standard deviation S_diff/√n for sufficiently large sample sizes, the (1 − α)×100% confidence interval for the mean difference of paired data μ_diff is

X̄_diff ± t*_{n−1, α/2} × S_diff/√n

where t*_{n−1, α/2} is the two-sided critical value for the t-distribution with n − 1 degrees of freedom. That is, P(T > t*_{n−1, α/2}) = α/2.

z*_{α/2} will give a very close interval for large n (use this for class).
Paired t-test Demo in R
• See Lecture14_t_test.R script
Example: Confidence Interval for Paired Data
Subtract paired measurements: X̄_diff = 14.59, S_diff = 24.05, n = 20,000

99% CI = 14.59 ± 2.576 × 24.05/√20,000 = (14.152, 15.028)
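As a quick check of the worked example, a few lines of Python reproduce the interval from the summary statistics alone; 2.576 is the two-sided Normal critical value for 99% confidence, which (as noted above) is very close to t* for large n.

```python
import math

# Summary statistics from the slide's weight-difference example
mean_diff, sd_diff, n = 14.59, 24.05, 20_000
z_star = 2.576  # two-sided 99% Normal critical value

margin = z_star * sd_diff / math.sqrt(n)
ci = (mean_diff - margin, mean_diff + margin)
```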
Paired t-Test and One-Sample t-Test
Subtracting the paired measurements reduces a paired t-test to a one-sample t-test on the differences.
Example: Reading Test Scores Across Time
• This example comes from data collected on students in the Minneapolis Public School District (MPLS) beginning in the 2004–2005 school year
• The outcome is the score on a reading achievement test, collected on the same set of students in the fifth and eighth grades
H0: μ_diff = 0
HA: μ_diff ≠ 0
Paired t-test for Change in Test Scores
Introduction to the Practice of Statistics, 9th Edition. Moore, McCabe, Craig
Another Example
We would use a Two-Sample t-Test to test for a difference in mean weight loss
between the two groups: early eaters and late eaters.
First compute the test statistic:

t = (9.9 − 7.7) / √(5.8²/202 + 6.1²/200) = 3.71

• Since n is large in both groups, we will just use the Normal approximation for inference
• The critical value for a two-sided test is z* = 1.96
Since 3.71 > 1.96 = z*, we reject the null hypothesis.
We conclude that μ_early > μ_late because the sample mean weight loss is higher in the early-eater group.
The p-value is p = 2 × P(Z > 3.71) ≈ 2.04 × 10⁻⁴ (computer needed)
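The arithmetic above can be checked with a short sketch; the function name is illustrative, and the p-value uses the Normal approximation throughout, so it lands near (not exactly on) the slide's 2.04 × 10⁻⁴.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample test for a difference in means using the
    Normal approximation (appropriate when both n are large)."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = (mean1 - mean2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p

# Early eaters: mean 9.9, sd 5.8, n 202; late eaters: mean 7.7, sd 6.1, n 200
z, p = two_sample_z(9.9, 5.8, 202, 7.7, 6.1, 200)
```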
ANOVA: Comparing Means Across Many Groups
• The two-sample t-test allowed us to test for a difference between the population means
for two groups
Comparisons Across > 2 Groups
With 5 groups, there are C(5, 2) = 10 possible pairwise comparisons.
The Multiple Testing Problem
We could perform all possible two-sample t-tests:
• There are C(5, 2) = 10 such tests
• Every time you perform a hypothesis test you risk committing a Type 1 error (falsely rejecting the null hypothesis)
• The probability of committing at least one Type 1 error across N tests is 1 − (1 − α)^N
https://xkcd.com/882/
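The formula above is easy to evaluate directly; with α = 0.05 and 10 tests, the chance of at least one false rejection is already about 40%.

```python
def family_wise_error_rate(alpha, n_tests):
    """P(at least one Type 1 error) = 1 - P(no error in any of the
    n independent tests), each with Type 1 error probability alpha."""
    return 1 - (1 - alpha) ** n_tests
```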
Analysis of Variance (ANOVA)
• ANOVA is a global test of the equivalence of many population means:
H0: μ1 = μ2 = ⋯ = μk
HA: At least one of the k means is not the same as the others
• One single test for many comparisons = reduces the chance of a Type 1 Error.
• ANOVA Table: partitions the observed variation in the numerical outcome into variation explained by differences between groups (Group Sums of Squares) and random variation (Residual Sums of Squares)
[Figure: distribution of the outcome for all samples versus samples grouped by general health]
Analysis of Variance Sums of Squares Table
Null hypothesis H0: The mean is the same across all groups
Alternative hypothesis HA: The mean for at least one of the groups is different
Details for ANOVA*
Imagine you have a total of K populations whose means you wish to compare:
HA: μi ≠ μj for at least one i and one j (“two or more of the population means are unequal”)
Is consumption of sugary beverages in adults associated with level of calorie intake?
• Population 1: U.S. adults who report consuming less than 1 sugary beverage/day
• Population 2: U.S. adults who report consuming 1–2 sugary beverages/day
• Population 3: U.S. adults who report consuming more than 2 sugary beverages/day
• If we are comparing the means of K groups, the signal (evidence) is the mean sum-of-squares between groups (MSB):

MSB = (1/(K − 1)) × Σ_{k=1}^{K} n_k (X̄_k − X̄•)²

• Each group receives a “weight” equal to that group’s sample size n_k; X̄• is the grand mean, and K − 1 is the degrees of freedom.
• Plugging in the three group sizes (24, 12, 6) and the three group means gives

MSB = 412,833
Is this signal “BIG ENOUGH” relative to the noise in the data?
We measure how much each observation deviates from the mean of its group.
Our noise is the average of the squared deviations across the 42 people in our sample.
• We add up the noise (skepticism) in each group and then take an average, the mean squared error (MSE):

MSE = (1/(n − K)) × [ Σ_{i=1}^{24} (X_i1 − X̄_1)² + Σ_{i=1}^{12} (X_i2 − X̄_2)² + Σ_{i=1}^{6} (X_i3 − X̄_3)² ]
    = (1/(n − K)) × Σ_{k=1}^{K} Σ_{i=1}^{n_k} (X_ik − X̄_k)²
    = 393,870

• The degrees of freedom are n − K = 42 − 3 = 39.
• Like a t-test, our test statistic is the signal-to-noise ratio:

SNR = F = MSB / MSE = evidence / skepticism

• F = MSB / MSE = 412,833 / 393,870 = 1.05
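The MSB/MSE/F computation can be written out end-to-end; this sketch uses made-up toy data (not the beverage study) and an illustrative function name, just to show the mechanics.

```python
def anova_f(groups):
    """One-way ANOVA F statistic from raw data: F = MSB / MSE."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n           # grand mean
    means = [sum(g) / len(g) for g in groups]         # group means
    # Between-group mean square: each group weighted by its size
    msb = sum(len(g) * (m - grand) ** 2
              for g, m in zip(groups, means)) / (k - 1)
    # Within-group mean square: squared deviations from group means
    mse = sum((x - m) ** 2
              for g, m in zip(groups, means) for x in g) / (n - k)
    return msb / mse, msb, mse

# Toy data: three small groups with means 2, 3, and 5
f, msb, mse = anova_f([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
```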
Pase & colleagues (2017). “Sugary beverage intake & preclinical Alzheimer’s disease in the community.” Alzheimer’s & Dementia.
The results of ANOVA are often displayed in an ANOVA table:

Source          Sum of Squares   Degrees of Freedom   Mean Square   F-statistic   p-value
Between groups  825,665          2                    412,833       1.05          0.36
Within groups   15,360,918       39                   393,870
Total           16,186,583       41
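The mean squares and F statistic in the table follow directly from the sums of squares and degrees of freedom, which a few lines of Python can verify.

```python
# Sums of squares and degrees of freedom as given in the ANOVA table
ss_between, df_between = 825_665, 2
ss_within, df_within = 15_360_918, 39

msb = ss_between / df_between   # mean square between groups
mse = ss_within / df_within     # mean square within groups (error)
f_stat = msb / mse              # signal-to-noise ratio
```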
Modified Example: ANOVA and Bonferroni Correction
Pase & colleagues also compared the 3 beverage consumption groups with respect to mean grams of saturated fat consumed per day. A subset of the data produces the following summary:

Source          Sum of Squares   Degrees of Freedom   Mean Square   F-statistic   p-value
Between groups  1,342            2                    671           6.0           0.003
Within groups   47,390           425                  112
Total           48,732           427

There is a statistically significant difference in mean daily grams of saturated fat consumed among the 3 populations.
But which populations are different from each other?
• We now do a “post-hoc” (after the fact) comparison of each pair of means (i.e., a two-sample t-test for each possible pair of groups).
Because we have three groups, there are three pairwise comparisons we can do:
• < 1/day versus 1–2/day
• < 1/day versus > 2/day
• 1–2/day versus > 2/day
For K groups, there are K(K − 1)/2 comparisons:
• If K = 5, we multiply each p-value by 5(4)/2 = 10
• If K = 10, we multiply each p-value by 10(9)/2 = 45
Only the < 1/day population and the > 2/day population differ significantly in their mean daily saturated fat intake.
Bonferroni Corrections
• An alternative form of the Bonferroni correction divides the significance threshold instead of multiplying the p-values: since we have three comparisons, our new p-value threshold is 0.05/3 = 0.017.
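Both forms of the correction depend only on the number of pairwise comparisons; a small sketch (function name is illustrative):

```python
from math import comb

def bonferroni_threshold(alpha, k_groups):
    """Per-test significance threshold after a Bonferroni correction
    for all pairwise comparisons among k groups."""
    n_tests = comb(k_groups, 2)   # k(k - 1)/2 pairwise tests
    return n_tests, alpha / n_tests

n_tests, threshold = bonferroni_threshold(0.05, 3)
```

Equivalently, one can leave the threshold at α and multiply each raw p-value by n_tests; the accept/reject decisions are identical.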