Biostatistics 521 Lecture 14: Inference for Numerical Data II

INFERENCE FOR

NUMERICAL DATA II
Xiang Zhou, PhD
BIOS 521
10/19/2023

1
Data Types

Numbers (numerical) versus Levels, Groups, Categories (categorical)
2
Roadmap for Inference (Numerical)

- Mean of one group → One-Sample t-Test (last class)
- Compare means for two groups → Two-Sample t-Test (last class)
- Compare means for > 2 groups → ANOVA (today!)
- Mean difference of paired measurements on same samples → Paired t-Test (today!)
3
What Questions Can We Ask for Numerical Data?

4
Inference on a numerical variable (age) in one group:
• Confidence interval for the population mean
• Hypothesis test for the population mean of age being above/below some value (H0: μ_age = 50)

5
Maybe you are interested in the difference in ages between two groups

How might you test for a difference?


6
How might you test for a difference?
- Treat the dataset as two samples (male, female)
- Compute the mean in each group and subtract

Inference on the difference in a numerical variable (age) between two groups (men, women):
• Confidence interval for the difference in population means (μ_M − μ_F)
• Hypothesis test for the population mean age in men being different from women (H0: μ_M − μ_F = 0)
7
Maybe you are interested in the difference in people’s actual versus desired weights

8
Paired t-Test of Correlated Outcomes
- Key idea: each person is independent, but the weight and desired weight measurements are correlated within a person!
- We should not treat these as two sets of independent measurements
- Paired t-test: two correlated measurements on the same set of individuals

[Scatter plot of desired weight versus weight, r = 0.8]
9
Each sample has "paired" weight measurements.
Subtracting desired weight from actual weight gives a new variable for each sample:
wt_diff = weight – wtdesire
- wt_diff > 0 implies people weigh more than they want
10
Inference on the difference in actual versus desired weight in the dataset:
• Confidence interval for the population mean difference (μ_wt_diff)
• Hypothesis test for the population mean weight difference being non-zero (H0: μ_wt_diff = 0)

11
The Paired t-Test
• Hypothesis testing at level α on the population mean difference μ_diff:
  H0: μ_diff = μ0
  HA: μ_diff ≠ μ0

• Compute the difference in paired measurements x_diff = x2 − x1 for each of n independent samples

• Note that the x1 and x2 measurements for each sample are not independent, but the x_diff measurements are independent across samples

• Compute the sample mean X̄_diff and standard deviation S_diff of the differences in paired measures across all n samples

• Use the standardized mean t = (X̄_diff − μ0) / (S_diff/√n) as the test statistic

• Compute the p-value as p = 2 × P(T > |t|) for a two-sided test and reject the null hypothesis if p < α, or compare the t-statistic to the appropriate critical values
12
Example: Paired t-Test
Subtract paired measurements:

X̄_diff = 14.59
S_diff = 24.05
n = 20,000
13
Example: Paired t-Test
Subtract paired measurements.
• Is the mean difference between actual and desired weight different from zero?
• Perform the hypothesis test at α = 1%:
  H0: μ_diff = 0
  HA: μ_diff ≠ 0

With X̄_diff = 14.59, S_diff = 24.05, n = 20,000:

t = (X̄_diff − μ0) / (S_diff/√n) = 14.59 / (24.05/√20,000)

- t = 85.8
- t is highly significant, so reject the null that actual and desired weight are the same.
- Since X̄_diff > 0, we conclude that actual weight is higher.
14
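The lecture's demos use R, but the arithmetic of this t-statistic is easy to check from the summary statistics alone; a minimal sketch in Python (standard library only, variable names illustrative):

```python
import math

# Summary statistics for wt_diff = weight - wtdesire, from the slide
xbar_diff = 14.59   # sample mean of the paired differences
s_diff = 24.05      # sample standard deviation of the differences
n = 20_000          # number of independent individuals (pairs)
mu0 = 0             # null value: no mean difference

# Paired t statistic: t = (X̄_diff − μ0) / (S_diff / √n)
t = (xbar_diff - mu0) / (s_diff / math.sqrt(n))
print(round(t, 1))  # 85.8
```

With 20,000 pairs the standard error of the mean difference is tiny (about 0.17 lb), which is why a 14.59 mean difference produces such an enormous t-statistic.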
Confidence Interval for Mean Difference of Paired Data

Since X̄_diff is approximately Normal with standard deviation S_diff/√n for sufficiently large sample sizes, the (1 − α)·100% confidence interval for the mean difference of paired data μ_diff is

X̄_diff ± t*_{n−1, α/2} × S_diff/√n

where t*_{n−1, α/2} is the two-sided critical value for the t-distribution with n − 1 degrees of freedom. That is, P(T > t*_{n−1, α/2}) = α/2.

- z*_{α/2} will give a very close interval for large n
16
Paired t-test Demo in R
• See Lecture14_t_test.R script

17
Example: Confidence Interval for Paired Data
Subtract paired measurements:
X̄_diff = 14.59, S_diff = 24.05, n = 20,000

99% CI = 14.59 ± 2.576 × 24.05/√20,000 ≈ (14.15, 15.03)
18
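The interval can be reproduced with a few lines; a minimal sketch in Python (standard library only), using the large-sample z critical value as the slide suggests:

```python
import math

xbar_diff, s_diff, n = 14.59, 24.05, 20_000
z_star = 2.576  # two-sided critical value for 99% confidence; z ≈ t for large n

half_width = z_star * s_diff / math.sqrt(n)   # ≈ 0.438
ci = (xbar_diff - half_width, xbar_diff + half_width)
print(round(ci[0], 2), round(ci[1], 2))  # 14.15 15.03
```

Because the interval excludes 0, it agrees with the hypothesis test: the mean difference between actual and desired weight is convincingly positive.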
Paired t-Test and One-Sample t-Test
Subtract paired measurements: the paired t-test is equivalent to a one-sample t-test performed on the differences.
19
Example: Reading Test Scores Across Time
• This example comes from data collected on students in the Minnesota Public School District (MPLS) beginning in the 2004–2005 school year

• Data was collected in part to comply with federal accountability requirements, namely Title X of the No Child Left Behind Act, and used to study factors affecting academic achievement

• The outcome is the score on a reading achievement test, collected on the same set of students in the fifth and eighth grades

- Are student scores changing over time?

Paired Data
- The data are paired because we have two scores for each student, one in the 5th grade and one in the 8th grade
- We can use the Paired t-test to test for a change in scores between the 5th and 8th grades
- Let μ_diff = the mean change in test score between the 5th and 8th grades

H0: μ_diff = 0
HA: μ_diff ≠ 0
Paired t-test for Change in Test Scores

- The p-value is very small, so we reject the null hypothesis that the test scores are the same in 5th and 8th grade
The Importance of Accounting for the Correlation between Scores

- The paired test scores are correlated. Ignoring the within-student correlation of test scores treats each of the measurements as independent observations
- Equivalent to "dropping the lines connecting the dots": visualizing the scores as coming from different sets of 5th and 8th grade students
- The Two-Sample t-test tests for a difference in means between two groups with independent sampling
Two-Sample t-Test for Change in Test Scores

- Fail to reject the null hypothesis of no difference in test scores between 5th and 8th grade at α = 5%
- The two-sample t-test incorrectly ignores the paired, correlated test scores on the same student
- It "assumes" the 5th and 8th grade scores come from different (independent) students
Same Data, Different Result?
Two-sample t-test: p = 0.06    Paired t-test: p = 9.4e-5

- Both tests see a similar mean difference in test scores
- The tests view the variability in the measurements very differently, leading to different inference
- Because scores come from the same students, the change in scores between grades looks convincing using the Paired t-test
- Highlights the importance of using the appropriate statistical test
Another Example
Does the timing of food intake impact weight loss?
• Researchers followed 402 overweight individuals through a 20-week weight loss treatment. Participants were grouped into early eaters and late eaters, based on the timing of their main meal. The following data was collected on average weight loss (kg) over the 20 weeks:

• What is the appropriate test for analyzing this data?
27
Introduction to the Practice of Statistics 9th Edition. Moore, McCabe, Craig
Another Example
We would use a Two-Sample t-Test to test for a difference in mean weight loss between the two groups: early eaters and late eaters.

H0: μ_early = μ_late
HA: μ_early ≠ μ_late

Perform the two-sided hypothesis test at α = 5%.
28
Another Example

29
Another Example
First compute the test statistic:

t = (9.9 − 7.7) / √(5.8²/202 + 6.1²/200) = 3.71

• Since n is large in both groups, we will just use the Normal approximation for inference
• The critical value for a two-sided test is z* = 1.96
- Since 3.71 > 1.96 = z*, we reject the null hypothesis
- We conclude that μ_early > μ_late because the sample mean for weight loss is higher in the early eater group
- The p-value is p = 2 × P(T > 3.71) ≈ 2.04e-4 (computer needed)
30
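The two-sample statistic above can be verified from the group summaries; a minimal sketch in Python (standard library only; the Normal approximation for the p-value mirrors the slide's large-n argument):

```python
import math

# Group summaries for average weight loss (kg), from the slide
xbar1, s1, n1 = 9.9, 5.8, 202   # early eaters
xbar2, s2, n2 = 7.7, 6.1, 200   # late eaters

# Two-sample t statistic with unpooled standard error
se = math.sqrt(s1**2 / n1 + s2**2 / n2)
t = (xbar1 - xbar2) / se
print(round(t, 2))  # 3.71

# Large-sample Normal approximation: p = 2 × P(Z > |t|) = erfc(|t|/√2)
p = math.erfc(abs(t) / math.sqrt(2))
```

The computed p-value is on the order of 2 × 10⁻⁴, far below α = 5%, matching the slide's conclusion.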
ANOVA: Comparing Means Across Many Groups

• The two-sample t-test allowed us to test for a difference between the population means
for two groups

• What if there are more than 2 groups?

31
Comparisons Across > 2 Groups

The categorical variable now has > 2 levels; the outcome is numerical.
32
Comparisons Across > 2 Groups
We could perform all possible Two-Sample t-Tests
• There are (5 choose 2) = 10 such tests
• Every time you perform a hypothesis test you risk committing a Type 1 Error (falsely rejecting the null hypothesis)

[Boxplots of Age by Self-Reported General Health; (5 choose 2) = 10 possible pairwise comparisons]
33
The Multiple Testing Problem
We could perform all possible Two-Sample t-Tests
• There are (5 choose 2) = 10 such tests
• Every time you perform a hypothesis test you risk committing a Type 1 Error (falsely rejecting the null hypothesis)
• The probability of committing at least one Type 1 Error across N tests, each at level α, is 1 − (1 − α)^N
• If we perform each test at level α = 5%, the probability that we falsely reject the null for at least one of the 10 tests is
  1 − (1 − 0.05)^10 = 0.40
- There is a 40% chance we would claim a difference when one does not really exist!
- Called the Multiple Testing Problem

[Plot: probability of at least one Type 1 error versus N, the number of tests performed at level α = 5%] 34
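The family-wise error rate 1 − (1 − α)^N can be tabulated directly; a quick sketch in Python (function name illustrative):

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one Type 1 error across n independent tests,
    each performed at level alpha: 1 - (1 - alpha)**n."""
    return 1 - (1 - alpha) ** n_tests

print(round(familywise_error(10), 2))  # 0.4 — the 40% chance quoted above
for n in (1, 5, 10, 20, 50):
    print(n, round(familywise_error(n), 2))
```

The error rate climbs quickly: with 50 tests at α = 5%, the chance of at least one false rejection exceeds 90%.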


• Each time a hypothesis test is performed at α = 5%, there is a 5% chance we reject the null hypothesis even if it is true (Type I error)

• The chance of a Type I error increases as we perform more tests
- The "overall experiment" does not have a 5% error rate

• Failing to account for the multiple tests is an example of p-hacking
https://xkcd.com/882/ 35
Analysis of Variance (ANOVA)
• ANOVA is a global test of the equivalence of many population means:
  H0: μ1 = μ2 = ⋯ = μk
  HA: At least one of the k means is not the same as the others

• One single test for many comparisons = reduces the chance of a Type 1 Error.

• ANOVA Table – partitions the observed variation in the numerical outcome into that explained by differences between groups (Group Sums of Squares) and random variation (Residual Sums of Squares)

             df      Sum Sq   Mean Sq   F Value       P-value
  Groups     k − 1   ###      ###       F-statistic   P(F > f)
  Residual   n − k   ###      ###
36
The ANOVA p-value in the table is the p-value for this global hypothesis test.
37
Analysis Of Variance (ANOVA)
ANOVA null hypothesis: the means for all groups are the same
ANOVA alternative hypothesis: the mean for at least one of the groups is different

[Boxplots: all samples pooled versus samples grouped by general health] 38
Analysis Of Variance Sums of Squares Table

Null Hypothesis — H0: The mean is the same across all groups
Alternative Hypothesis — HA: The mean for at least one of the groups is different

Variable         Degrees of Freedom   Sums of Squares   Mean Sums of Squares   F-Statistic   P-value
General Health   4                    318613            79653                  284.8         <2e-16
Residual         19995                5592863           280

There is very strong evidence to reject the null hypothesis that age is independent of general health (i.e., that mean age is the same across general health groups)
39
ANOVA Demo in R
• Reproduce these results for difference in Age by General Health groups

• Test for a difference in height by General Health groups

See the Lecture14_t_tests.R script

40
Details for ANOVA*

41
Imagine you have a total of K populations whose means you wish to compare

Pop 1 (μ1), Pop 2 (μ2), Pop 3 (μ3), …, Pop K (μK)

H0: μ1 = μ2 = ⋯ = μK — "the population means are all equal"

Ha: μi ≠ μj for at least one i and one j — "two or more of the population means are unequal"
Is consumption of sugary beverages in adults associated with level of calorie intake?

Population 1: U.S. adults who report consuming less than 1 sugary beverage/day — mean calories/day μ1
Population 2: U.S. adults who report consuming 1–2 sugary beverages/day — mean calories/day μ2
Population 3: U.S. adults who report consuming more than 2 sugary beverages/day — mean calories/day μ3

H0: μ1 = μ2 = μ3
Ha: at least two of μ1, μ2, and μ3 are not equal
    (μ1 ≠ μ2 and/or μ1 ≠ μ3 and/or μ2 ≠ μ3 — multiple pairwise comparisons)
Pase & colleagues (2017). “Sugary beverage intake & preclinical Alzheimer’s disease in the community.” Alzheimer’s & Dementia.
Is consumption of sugary beverages in adults associated with level of calorie intake?

Population     Beverage Consumption   Sample Mean (SD)      Sample Size
Population 1   < 1 / day              1,782 (594) cal/day   2,395
Population 2   1 – 2 / day            2,007 (626) cal/day   1,239
Population 3   > 2 / day              2,143 (767) cal/day   641

The worked example below uses samples of size n = 24, 12, and 6 from these groups (n = 42 in total).

Grand mean: X̄• = (24(1782) + 12(2007) + 6(2143)) / (24 + 12 + 6) = 1898

If H0 is true (all populations have the same mean), then the sample means should all be close to the grand mean.
Pase & colleagues (2017). “Sugary beverage intake & preclinical Alzheimer’s disease in the community.” Alzheimer’s & Dementia.
We can visually present the discrepancy of the sample means from the grand mean
• The signal (evidence) is the average of the squared distances of each mean from the overall mean
• Each group also receives a "weight" equal to that group's sample size

Thus, if we let X̄• denote the overall mean and
  X̄1 = mean of the < 1 beverage/day group
  X̄2 = mean of the 1–2 beverages/day group
  X̄3 = mean of the > 2 beverages/day group

our signal is:

  (1/(3 − 1)) × [24(X̄1 − X̄•)² + 12(X̄2 − X̄•)² + 6(X̄3 − X̄•)²]
• If we are comparing the means of K groups, the signal (evidence) is the mean sum-of-squares between groups (MSB):

  MSB = (1/(K − 1)) Σ_{k=1}^{K} n_k (X̄_k − X̄•)²

  where K − 1 is the degrees of freedom.

For the sugary beverage example:

  MSB = (1/(3 − 1)) × [24(1782 − 1898)² + 12(2007 − 1898)² + 6(2143 − 1898)²] = 412,833
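The grand mean and MSB can be computed in a few lines; a minimal sketch in Python (standard library only, names illustrative), using the group summaries from the table:

```python
# Group sample sizes and sample means (calories/day) used in the example
groups = [(24, 1782), (12, 2007), (6, 2143)]
K = len(groups)

# Grand mean: sample-size-weighted average of the group means
n_total = sum(n for n, _ in groups)                   # 42
grand_mean = sum(n * m for n, m in groups) / n_total  # ≈ 1898

# Signal: mean sum-of-squares between groups
# MSB = 1/(K−1) × Σ n_k (X̄_k − X̄•)²
msb = sum(n * (m - grand_mean) ** 2 for n, m in groups) / (K - 1)
print(round(grand_mean), round(msb))  # 1898 412833
```

Note that each squared distance is weighted by its group's sample size, so the large < 1/day group contributes most to the signal even though its mean sits closest to the grand mean.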
Is this signal "BIG ENOUGH" relative to the noise in the data?

We measure how much each observation deviates from the mean of its group.
Our noise is the average of the squared deviations across the 42 people in our sample:

  Σ_{i=1}^{24} (X_i1 − X̄1)² + Σ_{i=1}^{12} (X_i2 − X̄2)² + Σ_{i=1}^{6} (X_i3 − X̄3)²
• We add up the noise (skepticism) in each group and then take an average
• We do not divide by the sample size (n = 42), but rather by the sample size reduced by the number of groups (K = 3); n − K is the degrees of freedom

The result is the mean squared error (MSE):

  MSE = (1/(n − K)) × [Σ_{i=1}^{24} (X_i1 − X̄1)² + Σ_{i=1}^{12} (X_i2 − X̄2)² + Σ_{i=1}^{6} (X_i3 − X̄3)²]

      = (1/(n − K)) Σ_{k=1}^{K} Σ_{i=1}^{n_k} (X_ik − X̄_k)²

      = 393,870
• Like a t-test, our test statistic is the signal-to-noise ratio:

  SNR = F = MSB/MSE = evidence/skepticism

- We convert this F-statistic to a p-value, determined from an F-distribution instead of a normal distribution
- We want large values of F in order to reject H0
- If F is large, then the observations tend to "cluster" in their groups, making the groups distinct from each other
Is consumption of sugary beverages in adults associated with level of calorie intake?

• F = MSB/MSE = 412,833/393,870 = 1.05
• p = 0.36, based upon the numerator and denominator degrees of freedom

- We fail to reject H0
- We lack evidence that there is an association between sugary drink consumption and daily caloric consumption
- We do not conclude that there is no association (we do not "accept" H0)
Pase & colleagues (2017). “Sugary beverage intake & preclinical Alzheimer’s disease in the community.” Alzheimer’s & Dementia.
The results of ANOVA are often displayed in an ANOVA table:

Source           Sum of Squares   Degrees of Freedom   Mean Square   F-statistic   p-value
Between groups   825,665          2                    412,833       1.05          0.36
Within groups    15,360,918       39                   393,870
Total            16,186,583       41                   806,703
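The F-statistic is just the ratio of the two mean squares in the table; a quick check in Python (standard library only):

```python
# Mean squares from the ANOVA table above
msb = 412_833   # between-groups mean square (signal)
mse = 393_870   # within-groups mean square (noise)

f_stat = msb / mse
print(round(f_stat, 2))  # 1.05
```

This F of about 1.05 is compared against an F-distribution with 2 and 39 degrees of freedom (the table's between- and within-group df), which yields the p = 0.36 shown above.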
Modified Example: ANOVA and Bonferroni Correction
Pase & colleagues also compared the 3 beverage consumption groups with respect to mean grams of saturated fat consumed per day. A subset of the data produces the following summary:

Population     Beverage Consumption   Sample Mean (SD)   Sample Size
Population 1   < 1 / day              22 (10) g/day      240
Population 2   1 – 2 / day            24 (11) g/day      124
Population 3   > 2 / day              27 (12) g/day      64
The resulting ANOVA table is:

Source           Sum of Squares   Degrees of Freedom   Mean Square   F-statistic   p-value
Between groups   1,342            2                    671           6.0           0.003
Within groups    47,390           425                  112
Total            48,732           427

There is a statistically significant difference in mean daily grams of saturated fat consumed among the 3 populations.
But, which populations are different from each other?
• We now do a "post-hoc" (after the fact) comparison of each pair of means (i.e., we do a two-sample t-test with each possible pair of groups).

Because we have three groups, there are three pairwise comparisons we can do:

  < 1/day versus 1-2/day    < 1/day versus > 2/day    1-2/day versus > 2/day

If we do each of the 3 t-tests, we get the following:

Comparison               t-statistic   p-value
< 1/day versus 1-2/day   -1.69         0.093
< 1/day versus > 2/day   -3.09         0.003
1-2/day versus > 2/day   -1.68         0.096
Bonferroni Corrections
• Multiply each p-value by the number of comparisons you have and compare those inflated p-values to your 0.05 threshold

Since we have three comparisons, we need to multiply each p-value by 3:

Comparison               t-statistic   Unadjusted p-value   Bonferroni p-value
< 1/day versus 1-2/day   -1.69         0.093                0.279
< 1/day versus > 2/day   -3.09         0.003                0.009
1-2/day versus > 2/day   -1.68         0.096                0.288

- Only the < 1/day population and the > 2/day population differ significantly in their mean daily saturated fat intake.

- For K groups, there are K(K − 1)/2 comparisons:
  If K = 5, we multiply each p-value by 5(4)/2 = 10
  If K = 10, we multiply each p-value by 10(9)/2 = 45
Bonferroni Corrections
• An alternative approach to a Bonferroni correction is:
  1. Divide your original p-value threshold by the number of comparisons
  2. Compare your unadjusted p-values to this new (lower) p-value threshold

Since we have three comparisons, our new p-value threshold is 0.05/3 = 0.017

Comparison               t-statistic   Unadjusted p-value
< 1/day versus 1-2/day   -1.69         0.093
< 1/day versus > 2/day   -3.09         0.003
1-2/day versus > 2/day   -1.68         0.096

Only the < 1/day population and the > 2/day population differ significantly in their mean daily saturated fat intake.
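Both forms of the correction can be applied mechanically; a minimal sketch in Python (standard library only, labels illustrative) using the three unadjusted p-values above:

```python
# Unadjusted p-values for the three pairwise post-hoc comparisons
p_values = {
    "<1/day vs 1-2/day": 0.093,
    "<1/day vs >2/day": 0.003,
    "1-2/day vs >2/day": 0.096,
}
alpha = 0.05
m = len(p_values)  # number of comparisons (3)

# Approach 1: inflate each p-value by m, capped at 1, and compare to alpha
bonferroni = {name: min(1.0, p * m) for name, p in p_values.items()}

# Approach 2 (equivalent decision rule): compare unadjusted p to alpha / m
threshold = alpha / m  # ≈ 0.017
significant = [name for name, p in p_values.items() if p < threshold]
print(significant)  # only the <1/day vs >2/day comparison survives
```

The two approaches flag exactly the same comparisons, since p × m < α is the same inequality as p < α/m.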
Roadmap for Inference

Numerical:
- Mean of one group → One-Sample t-Test
- Compare means for two groups → Two-Sample t-Test
- Compare means for > 2 groups → ANOVA
- Mean difference of paired measurements on same samples → Paired t-Test

Categorical (coming up next!):
- Proportion of one group
- Compare proportions for two groups
- Compare proportions for > 2 groups
77
Test                Hypothesis Test                     Test Statistic (t)                    Confidence Interval
One-Sample t-Test   H0: μ = μ0                          t = (X̄ − μ0)/(s/√n)                  X̄ ± t*_{n−1, α/2} × s/√n
                    H1: μ ≠ μ0
Two-Sample t-Test   H0: μ1 − μ2 = 0                     t = (X̄1 − X̄2)/√(s1²/n1 + s2²/n2)    (X̄1 − X̄2) ± t*_{min(n1,n2)−1, α/2} × √(s1²/n1 + s2²/n2)
                    H1: μ1 − μ2 ≠ 0
Paired t-Test       H0: μ_diff = μ0                     t = (X̄_diff − μ0)/(S_diff/√n)        X̄_diff ± t*_{n−1, α/2} × S_diff/√n
                    H1: μ_diff ≠ μ0

z*_{α/2} will give a very close interval for large n (use this for class)
78
Test                           Hypothesis Test                               Test Statistic   Confidence Interval
ANOVA (Analysis of Variance)   H0: μ1 = μ2 = ⋯ = μk                          F statistic      N/A
Tests equality of > 2 means    H1: At least one μi differs from the others

79
