Biostatistics 521 Lecture 14 Inference For Numerical Data II
Xiang Zhou, PhD
BIOS 521
10/19/2023
Data Types
• Numerical data: numbers
• Categorical data: levels, groups, categories
Roadmap for Inference
Today (numerical data):
• Compare means for > 2 groups → ANOVA
• Mean difference of paired measurements on the same samples → Paired t-Test
What Questions Can We Ask for Numerical Data?
Inference on a numerical variable (age) in one group:
• Confidence interval for the population mean
• Hypothesis test for the population mean age being above/below some value (H0: μ_age = 50)
Maybe you are interested in the difference in ages between two groups.
Inference on the difference in a numerical variable (age) between two groups (men, women):
• Confidence interval for the difference in population means (μ_M − μ_F)
• Hypothesis test for the population mean age in men being different from women (H0: μ_M − μ_F = 0)
Maybe you are interested in the difference in people’s actual versus desired weights
Paired t-Test of Correlated Outcomes
Key idea: each person is independent, but the weight and desired weight measurements are correlated within a person (r = 0.8 in the scatterplot of desired weight versus weight).
Each sample has “paired” weight measurements.
Subtracting desired weight from actual weight gives a new variable for each sample:
wt_diff = weight − wtdesire
wt_diff > 0 implies people weigh more than they want
Inference on the difference between actual and desired weight in the dataset:
• Confidence interval for the population mean difference (μ_wt_diff)
• Hypothesis test for the population mean weight difference being non-zero (H0: μ_wt_diff = 0)
The Paired t-Test
• Hypothesis testing at level α on the population mean difference μ_diff:
H0: μ_diff = μ0
HA: μ_diff ≠ μ0
• Compute the difference in paired measurements x_diff = x2 − x1 for each of n independent samples
• Note that the x1 and x2 measurements for each sample are not independent, but the x_diff measurements are independent across samples
• Compute the sample mean X̄_diff and standard deviation S_diff of the differences in paired measures across all n samples
• Use the standardized mean t = (X̄_diff − μ0) / (S_diff / √n) as the test statistic
• Compute the p-value as p = 2 × P(T > |t|) for a two-sided test and reject the null hypothesis if p < α, or compare the t-statistic to the appropriate critical values
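The course demo uses R (see Lecture14_t_test.R); purely as an illustrative sketch, the same procedure can be written from scratch in Python. The function name is hypothetical, and the p-value uses the Normal approximation to the t-distribution, which is very close for large n.

```python
import math

def paired_t(x1, x2, mu0=0.0):
    """Paired t-test: reduce to a one-sample test on the differences.

    Returns (t, p), where p is a two-sided p-value from the Normal
    approximation (close to the t-distribution for large n).
    """
    n = len(x1)
    diffs = [a - b for a, b in zip(x1, x2)]
    mean_d = sum(diffs) / n
    # Sample variance of the differences (divide by n - 1)
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    se = math.sqrt(var_d / n)
    t = (mean_d - mu0) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # 2 * P(Z > |t|)
    return t, p

# Toy illustration with made-up weights and desired weights
t, p = paired_t([150, 160, 170, 180], [140, 155, 165, 170])
```

Only the differences enter the computation, which is why the within-person correlation causes no trouble.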
Example: Paired t-Test
Subtract paired measurements:
X̄_diff = 14.59
S_diff = 24.05
n = 20,000
Example: Paired t-Test
• Is the mean of the differences between actual and desired weight different from zero?
• Perform the hypothesis test at α = 1%:
H0: μ_diff = 0
HA: μ_diff ≠ 0
Confidence Interval for Mean Difference of Paired Data
Since X̄_diff is approximately Normal with standard deviation S_diff/√n for sufficiently large sample sizes, the (1 − α)×100% confidence interval for the mean difference of paired data μ_diff is

X̄_diff ± t*_{n−1, α/2} × S_diff/√n

where t*_{n−1, α/2} is the two-sided critical value for the t-distribution with n − 1 degrees of freedom. That is, P(T > t*_{n−1, α/2}) = α/2.

z*_{α/2} will give a very close interval for large n (use this for class).
Paired t-test Demo in R
• See Lecture14_t_test.R script
Example: Confidence Interval for Paired Data
Subtract paired measurements: X̄_diff = 14.59, S_diff = 24.05, n = 20,000

99% CI = 14.59 ± 2.576 × 24.05/√20,000 = (14.152, 15.028)
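As a quick check of the worked example, a few lines of Python reproduce the interval from the summary statistics alone; 2.576 is the two-sided Normal critical value for 99% confidence, which (as noted above) is very close to t* for large n.

```python
import math

# Summary statistics from the slide's weight-difference example
mean_diff, sd_diff, n = 14.59, 24.05, 20_000
z_star = 2.576  # two-sided 99% Normal critical value

margin = z_star * sd_diff / math.sqrt(n)
ci = (mean_diff - margin, mean_diff + margin)
```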
Paired t-Test and One-Sample t-Test
Subtracting the paired measurements reduces a paired t-test to a one-sample t-test on the differences.
Example: Reading Test Scores Across Time
• This example comes from data collected on students in the Minneapolis Public School District (MPLS) beginning in the 2004–2005 school year
• The outcome is the score on a reading achievement test, collected on the same set of students in the fifth and eighth grades
H0: μ_diff = 0
HA: μ_diff ≠ 0
Paired t-test for Change in Test Scores
Introduction to the Practice of Statistics, 9th Edition. Moore, McCabe, Craig
Another Example
We would use a Two-Sample t-Test to test for a difference in mean weight loss
between the two groups: early eaters and late eaters.
First compute the test statistic:

t = (9.9 − 7.7) / √(5.8²/202 + 6.1²/200) = 3.71

• Since n is large in both groups, we will just use the Normal approximation for inference
• The critical value for a two-sided test is z* = 1.96
Since 3.71 > 1.96 = z*, we reject the null hypothesis.
We conclude that μ_early > μ_late because the sample mean weight loss is higher in the early-eater group.
The p-value is p = 2 × P(Z > 3.71) ≈ 2.04 × 10⁻⁴ (computer needed)
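The arithmetic above can be checked with a short sketch; the function name is illustrative, and the p-value uses the Normal approximation throughout, so it lands near (not exactly on) the slide's 2.04 × 10⁻⁴.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample test for a difference in means using the
    Normal approximation (appropriate when both n are large)."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = (mean1 - mean2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p

# Early eaters: mean 9.9, sd 5.8, n 202; late eaters: mean 7.7, sd 6.1, n 200
z, p = two_sample_z(9.9, 5.8, 202, 7.7, 6.1, 200)
```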
ANOVA: Comparing Means Across Many Groups
• The two-sample t-test allowed us to test for a difference between the population means
for two groups
Comparisons Across > 2 Groups
With 5 groups, there are C(5, 2) = 10 possible pairwise comparisons.
The Multiple Testing Problem
We could perform all possible two-sample t-tests:
• There are C(5, 2) = 10 such tests
• Every time you perform a hypothesis test you risk committing a Type 1 error (falsely rejecting the null hypothesis)
• The probability of committing at least one Type 1 error across N tests is 1 − (1 − α)^N
https://xkcd.com/882/
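The formula above is easy to evaluate directly; with α = 0.05 and 10 tests, the chance of at least one false rejection is already about 40%.

```python
def family_wise_error_rate(alpha, n_tests):
    """P(at least one Type 1 error) = 1 - P(no error in any of the
    n independent tests), each with Type 1 error probability alpha."""
    return 1 - (1 - alpha) ** n_tests
```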
Analysis of Variance (ANOVA)
• ANOVA is a global test of the equivalence of many population means:
H0: μ1 = μ2 = ⋯ = μk
HA: At least one of the k means is not the same as the others
• One single test for many comparisons = reduces the chance of a Type 1 Error.
• ANOVA Table: partitions the observed variation in the numerical outcome into variation explained by differences between groups (Group Sums of Squares) and random variation (Residual Sums of Squares)
[Figure: distribution of the outcome for all samples versus samples grouped by general health]
Analysis of Variance Sums of Squares Table
Null hypothesis H0: The mean is the same across all groups
Alternative hypothesis HA: The mean for at least one of the groups is different
Details for ANOVA*
Imagine you have a total of K populations whose means you wish to compare:
HA: μi ≠ μj for at least one i and one j (“two or more of the population means are unequal”)
Is consumption of sugary beverages in adults associated with level of calorie intake?
• Population 1: U.S. adults who report consuming less than 1 sugary beverage/day
• Population 2: U.S. adults who report consuming 1–2 sugary beverages/day
• Population 3: U.S. adults who report consuming more than 2 sugary beverages/day
• If we are comparing the means of K groups, the signal (evidence) is the mean sum-of-squares between groups (MSB):

MSB = (1/(K − 1)) × Σ_{k=1}^{K} n_k (X̄_k − X̄•)²

• Each group receives a “weight” equal to that group’s sample size n_k; X̄• is the grand mean, and K − 1 is the degrees of freedom.
• Plugging in the three group sizes (24, 12, 6) and the three group means gives

MSB = 412,833
Is this signal “BIG ENOUGH” relative to the noise in the data?
We measure how much each observation deviates from the mean of its group.
Our noise is the average of the squared deviations across the 42 people in our sample.
• We add up the noise (skepticism) in each group and then take an average, the mean squared error (MSE):

MSE = (1/(n − K)) × [ Σ_{i=1}^{24} (X_i1 − X̄_1)² + Σ_{i=1}^{12} (X_i2 − X̄_2)² + Σ_{i=1}^{6} (X_i3 − X̄_3)² ]
    = (1/(n − K)) × Σ_{k=1}^{K} Σ_{i=1}^{n_k} (X_ik − X̄_k)²
    = 393,870

• The degrees of freedom are n − K = 42 − 3 = 39.
• Like a t-test, our test statistic is the signal-to-noise ratio:

SNR = F = MSB / MSE = evidence / skepticism

• F = MSB / MSE = 412,833 / 393,870 = 1.05
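The MSB/MSE/F computation can be written out end-to-end; this sketch uses made-up toy data (not the beverage study) and an illustrative function name, just to show the mechanics.

```python
def anova_f(groups):
    """One-way ANOVA F statistic from raw data: F = MSB / MSE."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n           # grand mean
    means = [sum(g) / len(g) for g in groups]         # group means
    # Between-group mean square: each group weighted by its size
    msb = sum(len(g) * (m - grand) ** 2
              for g, m in zip(groups, means)) / (k - 1)
    # Within-group mean square: squared deviations from group means
    mse = sum((x - m) ** 2
              for g, m in zip(groups, means) for x in g) / (n - k)
    return msb / mse, msb, mse

# Toy data: three small groups with means 2, 3, and 5
f, msb, mse = anova_f([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
```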
Pase & colleagues (2017). “Sugary beverage intake & preclinical Alzheimer’s disease in the community.” Alzheimer’s & Dementia.
The results of ANOVA are often displayed in an ANOVA table:

Source          Sum of Squares   Degrees of Freedom   Mean Square   F-statistic   p-value
Between groups  825,665          2                    412,833       1.05          0.36
Within groups   15,360,918       39                   393,870
Total           16,186,583       41
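The mean squares and F statistic in the table follow directly from the sums of squares and degrees of freedom, which a few lines of Python can verify.

```python
# Sums of squares and degrees of freedom as given in the ANOVA table
ss_between, df_between = 825_665, 2
ss_within, df_within = 15_360_918, 39

msb = ss_between / df_between   # mean square between groups
mse = ss_within / df_within     # mean square within groups (error)
f_stat = msb / mse              # signal-to-noise ratio
```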
Modified Example: ANOVA and Bonferroni Correction
Pase & colleagues also compared the 3 beverage consumption groups with respect to mean grams of saturated fat consumed per day. A subset of the data produces the following summary:

Source          Sum of Squares   Degrees of Freedom   Mean Square   F-statistic   p-value
Between groups  1,342            2                    671           6.0           0.003
Within groups   47,390           425                  112
Total           48,732           427

There is a statistically significant difference in mean daily grams of saturated fat consumed among the 3 populations.
But which populations are different from each other?
• We now do a “post-hoc” (after the fact) comparison of each pair of means (i.e., a two-sample t-test for each possible pair of groups).
Because we have three groups, there are three pairwise comparisons we can do:
• < 1/day versus 1–2/day
• < 1/day versus > 2/day
• 1–2/day versus > 2/day
For K groups, there are K(K − 1)/2 comparisons:
• If K = 5, we multiply each p-value by 5(4)/2 = 10
• If K = 10, we multiply each p-value by 10(9)/2 = 45
Only the < 1/day population and the > 2/day population differ significantly in their mean daily saturated fat intake.
Bonferroni Corrections
• An alternative form of the Bonferroni correction divides the significance threshold instead of multiplying the p-values: since we have three comparisons, our new p-value threshold is 0.05/3 = 0.017.
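Both forms of the correction depend only on the number of pairwise comparisons; a small sketch (function name is illustrative):

```python
from math import comb

def bonferroni_threshold(alpha, k_groups):
    """Per-test significance threshold after a Bonferroni correction
    for all pairwise comparisons among k groups."""
    n_tests = comb(k_groups, 2)   # k(k - 1)/2 pairwise tests
    return n_tests, alpha / n_tests

n_tests, threshold = bonferroni_threshold(0.05, 3)
```

Equivalently, one can leave the threshold at α and multiply each raw p-value by n_tests; the accept/reject decisions are identical.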