UNL STAT318 Notes Chapter 1-4 (2020)

Uploaded by Yuin Xuen Wong

9/11/2020

CHAPTER 1
INTRODUCTION TO STATISTICAL INFERENCE

SECTION 1
INTRODUCTION TO STATISTICAL INFERENCE

WHAT IS DATA?

• Variables are measured on observational units.
• Observational units are the persons/things you want to know about! They are usually presented in rows.
• A variable is any characteristic of a person or thing that can be assigned a number or a category.
   - Quantitative: takes on a numerical value
   - Categorical: takes on a category designation

DESCRIPTIVE STATISTICS

• Informal and exploratory analysis, often done numerically and visually
• Two of the things we are most interested in when looking at data are:
   - Where the data are centered.
   - How spread out the data are.

MEASURES OF LOCATION

• The median is the middle observation when the observations are put in order.
   - If you have an even number of observations, the median is the average of the two middle observations.
• The mean (average) is the numerical average of all observations.
   - The sample mean is denoted by $\bar{x}$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
   - The population mean is denoted by $\mu$.

MEASURES OF VARIABILITY

• Measures of location are usually considered incomplete without a corresponding measure of variability. The numerical measures of variability we are interested in are:
   - Variance
   - Standard deviation
• Both of these numbers measure how spread out the data are. The larger the number, the more spread out the observations are.

MEASURES OF VARIABILITY: VARIANCE

• Variance is the average squared deviation of the data from the mean (the sample version divides by n − 1 instead of n).
• The population variance, denoted by $\sigma^2$, is given by $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$.
• The sample variance, denoted by $s^2$, is given by $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.

MEASURES OF VARIABILITY: STANDARD DEVIATION

• The standard deviation (SD) is simply the square root of the variance.
• The population SD is denoted by $\sigma$ and the sample SD by $s$.
• Variance and standard deviation describe the variability of the data. Thus, a smaller variance or standard deviation indicates that the data are more consistent.

EXAMPLES

• Suppose you take a sample and observe the following: 70, 90, 80. Calculate the mean, median, and standard deviation.
   - Mean =
   - Median =
   - Sample standard deviation, using the following steps:
      Step 1: Find the mean.
      Step 2: Find the deviations (subtract the mean from each observation). Square each deviation.
      Step 3: Sum the squared deviations. Divide this sum by n − 1. This is $s^2$.
      Step 4: Take the square root. This is $s$.

OVERVIEW OF STATISTICS

• Generally, our goal in statistics is to gather information from a sample and then generalize the results from the sample to some larger population of interest.
• This is done by calculating statistics (numerical properties of the sample) and using them to estimate parameters (numerical properties of the population).
• This process is called statistical inference.
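The four steps above can be checked with a short Python sketch. The course itself uses R; this is only the same arithmetic written out with the standard library's `statistics` module as a cross-check.

```python
import math
import statistics

data = [70, 90, 80]

# Step 1: the mean
mean = sum(data) / len(data)

# Step 2: squared deviations from the mean
squared_devs = [(x - mean) ** 2 for x in data]

# Step 3: sum of squared deviations divided by n - 1 gives s^2
s2 = sum(squared_devs) / (len(data) - 1)

# Step 4: square root gives the sample standard deviation s
s = math.sqrt(s2)

median = statistics.median(data)

# Cross-check against the standard library (statistics.stdev also divides by n - 1)
assert math.isclose(s, statistics.stdev(data))

print(mean, median, s2, s)  # 80.0 80 100.0 10.0
```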

OVERALL IDEA

[Diagram: Random sampling takes us from the Population ("We want to know about these"; can't observe directly!) to the Sample ("We have to work with these"). Inference takes us from the Statistic (sample characteristic) back to the Parameter (population characteristic).]

OVERALL IDEA

• The whole purpose of inference is to use the information we obtain from the sample to better understand the population.
• We use statistics (numerical summaries of the sample) to try to understand and estimate the parameters (numerical summaries of the population).

TERMINOLOGY

The whole purpose of statistics is to understand a set of information. This set of information comes from a . . .
• Population: the complete collection of ALL elements that are of interest for a given problem.
• Parameter: a numerical measurement used to describe a population.

The population is often so big that obtaining all information about its elements is either difficult or impossible. So, we work with a more manageable set of data that we obtain from a . . .
• Sample: a sub-collection of elements drawn from a population.
• Statistic: a numerical measurement used to describe a sample.

COLLECTION OF DATA

• Biased sample: a sample in which the elements selected share some property that influences their results.
• Convenience sample: units from the population of interest are selected because they are convenient to include in the sample.
   - These are rarely representative of the population.
• Simple random sampling: each member of the population has an equal probability of being chosen.
   - Tends to be representative of the population.
   - Allows us to generalize our results to the population of interest.

COLLECTION OF DATA

• Stratified sampling: divide the population into strata (similar groups) and then take a simple random sample from each stratum.
• Cluster sampling: break the population into heterogeneous (diverse) groups, called clusters, and then randomly select one or more clusters.
   - Every unit within a selected cluster is in the sample.
• Because both use simple random sampling (within strata, or of clusters), they tend to be representative of the population of interest, and these sampling procedures also allow results to be generalized to the population.

ROLLING DOWN THE RIVER SAMPLING ACTIVITY

• Find the "Rolling Down the River Sampling Activity" on Canvas.
• In this activity, you will explore the different types of sampling procedures we discussed.
• Useful R code:
   nums <- seq(low, high, by = increment)     # numbers from low to high in steps of increment
   sample(nums, size = n, replace = FALSE)    # draw n of them; replace = TRUE samples with replacement
• I will show you the first couple!
   - Make sure you do these separately from the numbers shown in class (i.e., get your own list of numbers). I am only showing how the code works.
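For readers who want to see the three procedures side by side outside of R, here is a minimal Python sketch. The population of 100 numbered units and its four groups of 25 are made up for illustration, and `random.sample` plays the role of R's `sample(..., replace = FALSE)`. The same four groups serve as strata and as clusters only to show the mechanics; in practice strata should be similar inside and clusters diverse inside.

```python
import random

random.seed(318)  # fixed seed so the draws are reproducible

# Hypothetical population: units 1..100, split into 4 groups of 25
population = list(range(1, 101))
groups = {g: population[g * 25:(g + 1) * 25] for g in range(4)}

# Simple random sample: every unit has the same chance of selection
srs = random.sample(population, 10)

# Stratified sample: treat each group as a stratum and draw a small SRS from each one
stratified = [u for g in groups for u in random.sample(groups[g], 3)]

# Cluster sample: treat each group as a cluster, randomly pick one cluster,
# and keep every unit in it
chosen_cluster = random.choice(list(groups))
cluster = groups[chosen_cluster]

print(sorted(srs))
print(sorted(stratified))
print(chosen_cluster, len(cluster))
```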

HELPER OR HINDERER

• Suppose 65 infants from the U.S. are part of the study, and 55 of the 65 select the helper toy.
• Research question: Is there statistical evidence that infants truly have a preference for more helpful toys?
• What is the population of interest?
• What is the sample?

HELPER OR HINDERER

• What is the variable of interest? Categorical or quantitative?
• Define the parameter of interest.
• Calculate the statistic.
• Suppose infants have no preference. How many in the sample would you expect to choose the helper toy?

HELPER OR HINDERER

• Do you think 55 out of 65 choosing the helper is evidence that infants in the population have a preference?
• Suppose that instead 35 of the 65 infants had chosen the helper toy. Do you think 35 out of 65 choosing the helper is evidence that infants in the population have a preference?
• How many infants in a sample of 65 must choose the helper before you become convinced that the population of infants actually does have a preference? Explain.

SYMBOLS

• In this course, the following symbols will be used:

   Parameter                     Statistic
   π: long-run proportion        p̂ ("p-hat"): sample proportion
   μ: long-run average           x̄ ("x-bar"): sample average

HYPOTHESES

• Summarize the research question and the parameter of interest.
• Null hypothesis: what occurs due to random chance alone (written as $H_0$).
• Alternative hypothesis: what the researcher is trying to support, or what occurs if there is an effect (written as $H_a$ or $H_1$).
• We start by assuming the null hypothesis is true and look for evidence to reject this idea.

ALTERNATIVE HYPOTHESIS

• One-sided test: tests for either less than (<) OR greater than (>) in the alternative hypothesis.
• Two-sided test: tests for "not equal to" (≠) in the alternative hypothesis.
• What are the hypotheses for the Helper activity?

DECISION

• Inference involves using the sample to decide between the following two options:
   - The sample supports the research question: reject $H_0$ in favor of $H_a$.
   - The sample does not support the research question: fail to reject $H_0$.
• We must answer the following: How unusual would it be to observe a result as extreme as, or more extreme than, ours by random chance alone (i.e., if the null hypothesis is true)?

P-VALUE

• Definition (comes in three major parts):
   1. The probability of obtaining the observed statistic
   2. plus the probability of obtaining a value more extreme than the statistic,
   3. if the null hypothesis is true.
• Example: If you win 40% of the time at rock/paper/scissors, the p-value is the probability of winning 40% of the time or more if the real chance of winning is …
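To make the three-part definition concrete for the Helper study, the sketch below estimates the p-value by simulation (the same idea as the applet used in class): if infants had no preference (π = 0.5), how often would 55 or more of 65 choose the helper? It also computes the exact binomial tail with `math.comb` for comparison. This is an illustrative Python sketch, not the course's R code; the number of repetitions is an arbitrary choice.

```python
import math
import random

random.seed(1)

n, observed = 65, 55
pi_null = 0.5          # no preference under H0
reps = 10_000          # number of simulated samples (arbitrary)

# Simulate: each of 65 infants picks the helper with probability 0.5 under H0
count_extreme = 0
for _ in range(reps):
    helpers = sum(random.random() < pi_null for _ in range(n))
    if helpers >= observed:          # "as extreme or more extreme" (one-sided, >)
        count_extreme += 1
p_sim = count_extreme / reps

# Exact version: P(X >= 55) for X ~ Binomial(65, 0.5)
p_exact = sum(math.comb(n, k) for k in range(observed, n + 1)) / 2 ** n

print(p_sim, p_exact)
```

With a p-value this small, essentially none of the simulated samples reach 55 helpers, which is why the exact tail is shown alongside the simulation.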

P-VALUES

• $0 \le p\text{-value} \le 1$ (a p-value is a probability, so it is always between 0 and 1).
• Compare the p-value to the significance level (denoted $\alpha$).
   - $\alpha$ is the probability of a Type I error (incorrectly rejecting the null hypothesis).
• Smaller p-value (p-value < $\alpha$):
   - Stronger evidence against the null hypothesis (reject $H_0$)
   - Supports the alternative hypothesis
• Larger p-value (p-value > $\alpha$):
   - Insufficient evidence against the null hypothesis (fail to reject $H_0$)
   - Cannot support the alternative hypothesis

[Image: "sad p-value bear" statistics meme, https://simplystatistics.org/2013/08/26/statistics-meme-sad-p-value-bear/]

COMMON TEST STATISTIC (STANDARDIZED TEST)

• Common test statistic: $t \text{ or } z = \dfrac{\text{statistic} - \text{hypothesized value}}{\text{standard error}}$
   - Measures how many standard deviations the statistic is from the mean under the null hypothesis
   - The standard error measures the variability of the statistic
• Standardizing allows for a unitless measurement.
   - Standardizing also allows for easy comparison.
   - It slides the mean to 0 and transforms the standard deviation to 1.
• Critical values: "cut-off" points that divide the distribution into reject/fail-to-reject regions.

                                   z-distribution               t-distribution
   Population standard deviation   Known                        Unknown
   Critical value (α = 0.05)       ±1.645 (one-sided test)      Depends on degrees of freedom (df) = n − 1
                                   ±1.96 (two-sided test)

HELPER OR HINDERER SIMULATION

• Let's start by using a simulation to get the p-value: http://www.rossmanchance.com/ISIapplets.html
• How about we also get the p-value in R!
   - To do this, we will need to calculate the standard error using $SE = \sqrt{\dfrac{\pi(1-\pi)}{n}}$, where $\pi$ is the value of the parameter under the null hypothesis.
   - pnorm(statistic, mean = null_value, sd = standard_error, lower.tail = FALSE)
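A Python analogue of the `pnorm(..., lower.tail = FALSE)` call above, using `statistics.NormalDist` from the standard library. It standardizes the Helper statistic with the null standard error and also recovers the z critical values from the table. This is a sketch of the same normal-approximation calculation, not the course's R code.

```python
import math
from statistics import NormalDist

n, helpers = 65, 55
pi_null = 0.5                      # parameter value under H0
p_hat = helpers / n                # sample proportion (the statistic)

se_null = math.sqrt(pi_null * (1 - pi_null) / n)   # standard error under H0
z = (p_hat - pi_null) / se_null                    # standardized statistic

# Upper-tail p-value, like pnorm(p_hat, mean = 0.5, sd = se_null, lower.tail = FALSE)
p_value = 1 - NormalDist(mu=pi_null, sigma=se_null).cdf(p_hat)

# z critical values for alpha = 0.05
z_one_sided = NormalDist().inv_cdf(0.95)    # about 1.645
z_two_sided = NormalDist().inv_cdf(0.975)   # about 1.960

print(round(p_hat, 4), round(z, 2), p_value, round(z_one_sided, 3), round(z_two_sided, 3))
```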

HELPER OR HINDERER

• What is the p-value?
• What is the test statistic? (z = )
• What conclusion can be drawn based on this p-value and test statistic?

CONFIDENCE INTERVALS

• Contain plausible values of the parameter
• Confidence level + significance level = 1
   - For α = 0.05, confidence level = 1 − α = 0.95
• Equation: Statistic ± Critical Value × Standard Error
   - Generally use the two-sided critical value for confidence intervals
• What does "confidence" mean?
   - We expect 95% of all similarly constructed intervals to contain the true parameter value
• To interpret a confidence interval:
   - With 95% confidence, the true parameter value falls within the confidence interval

CAN YOU REPEAT?

• Suppose 100 different samples were taken from a population with π = 0.5.
   - Each line on the plot represents ONE sample's CI.
   - A red line means the CI does not contain π = 0.5.
• Notice that only 3 out of 100 intervals *do not* include the true proportion.
• Idea: with repeated sampling, about 95% of the confidence intervals should contain the TRUE value of the population parameter.

[Plot: confidence intervals from 100 samples, one line per sample; red lines mark intervals that miss π = 0.5.]

EFFECT OF CONFIDENCE LEVEL AND SAMPLE SIZE

• Confidence level
   - For α = 0.01, use a 99% confidence interval
   - Bigger confidence level → smaller α → reject less often → fail to reject more often → wider interval
   - Example of a higher confidence level:
      95% confident the average height of an American adult is between 5'3" and 6'2"
      99% confident the average height of an American adult is between 4' and 8'
      More confident, but a wider interval
• Sample size
   - Bigger sample size → narrower interval
   - We know more about the population, so the range of plausible values decreases
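A sketch of the repeated-sampling idea in Python. It assumes samples of size n = 100 (the slide does not give the sample size) and the usual interval p̂ ± z*·sqrt(p̂(1 − p̂)/n); it reports the fraction of intervals that capture π = 0.5 and shows how the width changes with confidence level and sample size.

```python
import math
import random
from statistics import NormalDist

random.seed(2020)

def prop_ci(successes, n, conf=0.95):
    """Statistic ± critical value × standard error for one proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)      # two-sided critical value
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

pi_true, n, reps = 0.5, 100, 1000   # n and reps are assumptions for illustration

covered = 0
for _ in range(reps):
    x = sum(random.random() < pi_true for _ in range(n))
    lo, hi = prop_ci(x, n)
    covered += lo <= pi_true <= hi
coverage = covered / reps

# Interval width for the same p_hat = 0.5 under different settings
def width(conf, n):
    lo, hi = prop_ci(n // 2, n, conf)
    return hi - lo

print(coverage)
print(width(0.95, 100), width(0.99, 100))   # higher confidence -> wider
print(width(0.95, 100), width(0.95, 400))   # larger n -> narrower
```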

HELPER OR HINDERER

• What is the 95% two-sided confidence interval?
• Interpret the confidence interval.

INCORRECT INTERPRETATION

• Incorrect: "There's a 95% chance that the true proportion of babies who choose the helper is between ____ and ____."
• Why is this incorrect?
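One way to compute the Helper interval, as a hedged sketch: it plugs the sample proportion p̂ = 55/65 into the standard error (rather than the null value π = 0.5 used for the test), which is the usual convention for a confidence interval. Check it against the interval worked out in class.

```python
import math

n, helpers = 65, 55
p_hat = helpers / n
z_star = 1.96                                  # two-sided critical value for 95%
se = math.sqrt(p_hat * (1 - p_hat) / n)        # standard error based on p_hat
lower, upper = p_hat - z_star * se, p_hat + z_star * se
print(round(lower, 3), round(upper, 3))
```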

ST. GEORGE EXAMPLE

• Nationally, 15% of patients who received heart transplant operations in Britain died, while 79 out of 371 (21.3%) patients who received a heart transplant operation at St. George's died.
• Research question: Is the mortality rate at St. George's different from the national rate?
• α = 0.10
   - What is the population of interest in this study?
   - What is the sample?
   - What is the variable of interest? Is it categorical or quantitative?
   - Define the parameter of interest and the statistic.
   - Write the hypotheses based on the parameter of interest.
   - What is the standard error for finding the p-value and test statistic?
   - What is the p-value using R?
   - What is the test statistic?
   - What conclusion can be drawn?
   - What is the 90% confidence interval? Interpret.

ST. GEORGE EXAMPLE

• What is the population of interest in this study?
• What is the sample?
• What is the variable of interest? Is it categorical or quantitative?

ST. GEORGE EXAMPLE

• Define the parameter of interest and the statistic.
• What is the p-value?
• What is the test statistic?

ST. GEORGE EXAMPLE

• What is your decision? What conclusion can be drawn about the research question?
• What is the 90% confidence interval (matching α = 0.10)? Interpret.
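A sketch of the St. George calculations in Python, assuming the one-proportion z procedures from earlier in this section (null standard error for the test, p̂-based standard error for the interval) and a two-sided alternative, since the research question asks whether the rate is "different":

```python
import math
from statistics import NormalDist

n, deaths = 371, 79
pi_null = 0.15                    # national rate (value under H0)
alpha = 0.10
p_hat = deaths / n                # statistic: sample proportion

# Test: standard error uses the null value
se_null = math.sqrt(pi_null * (1 - pi_null) / n)
z = (p_hat - pi_null) / se_null
p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided

# 90% CI: standard error uses p_hat, critical value from alpha = 0.10
z_star = NormalDist().inv_cdf(1 - alpha / 2)         # about 1.645
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z_star * se_hat, p_hat + z_star * se_hat)

print(round(p_hat, 3), round(z, 2), round(p_value, 4), [round(b, 3) for b in ci])
```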

SECTION 2
INFERENCE FOR TWO PROPORTIONS

DEFINITIONS

• Response variable: the variable of interest that's being measured
• Explanatory variable: a variable that may affect the response variable
• Ideally, implement random assignment:
   - Randomly assign observational units to the explanatory variable groups
   - Benefit: we can draw cause-and-effect conclusions
   - No random assignment → potential for confounding variables

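DIFFERENCE OF TWO PROPORTIONS

• Parameters: $\pi_1$ and $\pi_2$
   - Interested in the difference: $\pi_1 - \pi_2$
• Common hypotheses: $H_0: \pi_1 - \pi_2 = 0$ versus $H_a: \pi_1 - \pi_2 \ne 0$ (or $< 0$, or $> 0$)
• Statistic: $\hat{p}_1 - \hat{p}_2$
• Validity conditions for the normal distribution approximation:
   - At least 10 "successes" and 10 "failures" in EACH group, as a rule of thumb
   - Independent samples

TEST STATISTIC & CI

• Test statistic:
   $z = \dfrac{\text{Statistic} - \text{Hypothesized value}}{\text{Standard error}} = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$ is the pooled proportion of successes ($x_1$, $x_2$ = number of successes in each group)
• CI:
   Statistic ± Critical Value × Standard Error $= (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$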

GINKGO EXAMPLE

• Ginkgo biloba and acetazolamide are two drugs used to help prevent headaches at high altitudes. A research study found that 72 out of 124 people who were randomly assigned to take ginkgo biloba got a headache (58%), while 23 out of 118 people who were randomly assigned to take acetazolamide got a headache (19.5%).
• Research question: Are the two drugs different with regard to how likely a person is to get a headache?
• Significance level: α = 0.05

                 Ginkgo Biloba   Acetazolamide   Total
   Headache           72              23           95
   No headache        52              95          147
   Total             124             118          242

GINKGO EXAMPLE

• What are the hypotheses (in words and symbols)?

GINKGO EXAMPLE

• What is the test statistic?
• What is the critical value based on our significance level?
• What is your decision? State the decision.

GINKGO EXAMPLE

• What is the 95% CI?
• Interpret the interval.

GINKGO EXAMPLE

• Find the p-value.
• Does your conclusion based on the test statistic match the conclusion based on the p-value?

GINKGO EXAMPLE

• Now that we have done these calculations by hand, let's write code in R that will calculate these things for us, including the p-value!
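For comparison with the R output, here is a Python sketch of the same two-proportion calculations, assuming the pooled standard error for the test and the unpooled standard error for the interval, as in the formulas earlier in this section. It is plain arithmetic with `statistics.NormalDist`, not a call to any R function.

```python
import math
from statistics import NormalDist

# Headache counts and group sizes from the Ginkgo table
x1, n1 = 72, 124    # ginkgo biloba
x2, n2 = 23, 118    # acetazolamide

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2                                   # statistic

# Test statistic: pooled proportion under H0: pi1 - pi2 = 0
p_pool = (x1 + x2) / (n1 + n2)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (diff - 0) / se_pool
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided

# 95% CI: unpooled standard error
z_star = NormalDist().inv_cdf(0.975)
se_ci = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (diff - z_star * se_ci, diff + z_star * se_ci)

print(round(diff, 3), round(z, 2), p_value, [round(b, 3) for b in ci])
```

R's `prop.test` applies a continuity correction by default, so its p-value can differ slightly from this uncorrected z calculation.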

FISHER'S EXACT TEST

• Gives the exact probability (i.e., the exact p-value) when testing for significance (uses the hypergeometric distribution)
• Does not require a large sample size (it is a nonparametric test: it doesn't require strict distributional assumptions)
• Only assumes that the row and column totals are fixed
• Can be used when success/failure proportions are near 0% or 100%
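The hypergeometric idea can be sketched in a few lines of Python. With the margins fixed, the count in the top-left cell determines the whole 2×2 table; the two-sided p-value below adds up the probabilities of every table that is no more likely than the observed one, which is the convention R's `fisher.test` uses for 2×2 tables. The function name `fisher_two_sided` is just for this sketch.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum over all tables at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

# Ginkgo table: rows = headache / no headache, columns = ginkgo / acetazolamide
p_ginkgo = fisher_two_sided(72, 23, 52, 95)
print(p_ginkgo)
```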

GINKGO R CODE

• We will just use R for this test!

YOUR TURN

• A study of birth personality in Great Britain sampled 667 people who were the oldest child and 509 people who were the youngest child. 260 of the oldest children said they were more relaxed than their other siblings, while 214 of the youngest children said they were more relaxed. Is the proportion who say they are more relaxed different between the two groups?
• Significance level: α = 0.05
   1. Write the hypotheses (words and symbols).
   2. Calculate the test statistic.
   3. Solve for the p-value from Fisher's Exact Test.
   4. Set up and calculate the equation for the 95% confidence interval.
   5. Interpret the interval.
   6. State your decision and give a conclusion.

                  Oldest   Youngest   Total
   More relaxed     260       214       474
   Less relaxed     407       295       702
   Total            667       509      1176

EXAMPLE

• Write the hypotheses.

EXAMPLE

• Calculate the test statistic.

EXAMPLE

• Solve for the p-value from Fisher's Exact Test.
• Set up and calculate the equation for the 95% confidence interval.
   Statistic ± Critical Value × Standard Error

EXAMPLE

• Interpret the interval.
• State your decision and give a conclusion.

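SECTION 3
INFERENCE FOR TWO MEANS

DIFFERENCE OF TWO MEANS

• Parameters: $\mu_1$ and $\mu_2$
   - Interested in the difference: $\mu_1 - \mu_2$
• Common hypotheses: $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \ne 0$ (or $< 0$, or $> 0$)
• Statistic: $\bar{x}_1 - \bar{x}_2$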

VALIDITY CONDITIONS FOR TWO-SAMPLE T-TESTS

• Large sample size for BOTH groups ($n_1 \ge 20$ and $n_2 \ge 20$ as a "rule of thumb")
• Independent groups
• Not strongly skewed / approximately symmetric distributions (i.e., sampling from approximately normal distributions)
• Question: Are the variances of the two populations equal?
   - There are two different tests, based on the answer to this question.

WELCH T-TEST ASSUMING UNEQUAL VARIANCES

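(WELCH) T-TEST ASSUMING UNEQUAL VARIANCES

• Test statistic (assuming unequal variances):
   $t = \dfrac{\text{statistic} - \text{hypothesized value}}{\text{standard error}} = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
   where $df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$
• We are now using the t-distribution, so the critical value we compare our test statistic to depends on the degrees of freedom.

CI ASSUMING UNEQUAL VARIANCES

• Confidence interval (assuming unequal variances):
   Statistic ± Critical Value × Standard Error $= (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$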

STUDY EXAMPLE

• A random sample of 24 freshmen spent an average of 17.38 hours per week studying, with a standard deviation of 5.93, while 23 seniors spent an average of 16.30 hours with a standard deviation of 3.72.
• Research question: Do seniors spend a different amount of time studying for classes each week than freshmen?
• Significance level: α = 0.05
• Write the hypotheses.

STUDY EXAMPLE

• Calculate the test statistic assuming unequal variances.
   Freshmen: $\bar{x}_1 = 17.38$, $s_1 = 5.93$, $n_1 = 24$
   Seniors: $\bar{x}_2 = 16.30$, $s_2 = 3.72$, $n_2 = 23$
   $t = \dfrac{17.38 - 16.30}{\sqrt{\dfrac{5.93^2}{24} + \dfrac{3.72^2}{23}}} \approx 0.75$
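The Welch formulas can be checked with a small Python helper (the function name `welch_t` is made up for this sketch). The t critical value and p-value still need a t table or R, because the Python standard library has no t distribution.

```python
import math

def welch_t(x1, s1, n1, x2, s2, n2, null_diff=0.0):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # squared standard errors of each mean
    se = math.sqrt(v1 + v2)
    t = ((x1 - x2) - null_diff) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se

# Freshmen (group 1) versus seniors (group 2)
t, df, se = welch_t(17.38, 5.93, 24, 16.30, 3.72, 23)
print(round(t, 3), round(df, 1), round(se, 3))
```

The degrees of freedom come out as a non-integer (a little under 39 here), which is why software, rather than a table, is the easiest way to get the matching critical value.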

STUDY EXAMPLE

• Set up and calculate the equation for the 95% confidence interval.

R CODE FOR T-TEST ASSUMING UNEQUAL VARIANCES

• Let's go to R to code this information in!
• Our R code will allow us to easily get the multiplier (critical value) used on the previous slide.
• We will also get a p-value.

STUDY EXAMPLE

• Interpret the interval.
• State your decision and give a conclusion.

TWO-SAMPLE T-TEST ASSUMING EQUAL VARIANCES

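T-TEST ASSUMING EQUAL VARIANCES

• Test statistic (assuming equal variances):
   $t = \dfrac{\text{statistic} - \text{hypothesized value}}{\text{standard error}} = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
   where $s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ and $df = n_1 + n_2 - 2$

CI ASSUMING EQUAL VARIANCES

• Confidence interval:
   Statistic ± Critical Value × Standard Error $= (\bar{x}_1 - \bar{x}_2) \pm t^* s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$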

STUDENT LEARNING

• A study was conducted where 69 students were randomly assigned to listen to a lecture from an attractive instructor; they got an average score of 18.27 on a 25-question quiz, with a standard deviation of 3.30. The 62 students randomly assigned to an unattractive instructor got an average score of 16.68, with a standard deviation of 3.22. (Suppose Group 1: attractive instructor; Group 2: unattractive instructor.)
• Research question: Do students learn more from attractive teachers?
• Significance level: α = 0.05
   1. Write the hypotheses.
   2. Calculate the test statistic assuming equal variances by hand. The critical value is 1.657.
   3. State your decision and give a conclusion.

STUDENT LEARNING EXAMPLE

• Write the hypotheses.

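STUDENT LEARNING EXAMPLE

• Calculate the test statistic assuming equal variances.
   $s_p = \sqrt{\dfrac{(69 - 1)(3.30)^2 + (62 - 1)(3.22)^2}{69 + 62 - 2}}$
   $t = \dfrac{18.27 - 16.68}{s_p\sqrt{\dfrac{1}{69} + \dfrac{1}{62}}}$

STUDENT LEARNING EXAMPLE

• State your decision and give a conclusion.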
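A Python sketch of the pooled-variance calculation for this example (the helper name `pooled_t` is hypothetical). It reports t and df = n1 + n2 − 2 so the statistic can be compared with the one-sided critical value 1.657 given above.

```python
import math

def pooled_t(x1, s1, n1, x2, s2, n2, null_diff=0.0):
    """Two-sample t statistic assuming equal variances (pooled SD)."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)   # pooled SD
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    t = ((x1 - x2) - null_diff) / se
    return t, df, sp

# Group 1: attractive instructor; Group 2: unattractive instructor
t, df, sp = pooled_t(18.27, 3.30, 69, 16.68, 3.22, 62)
critical = 1.657            # one-sided critical value quoted on the slide
print(round(sp, 3), round(t, 2), df, t > critical)
```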

SECTION 4
INFERENCE FOR PAIRED DESIGNS

SO FAR…

• Independent groups design (Section 1.3): no connections relating individuals in one group to individuals in another group
   - e.g., American League (AL) teams are independent of National League (NL) teams in baseball
   - e.g., freshmen are not related to seniors for time spent studying

TWO TYPES OF PAIRED DESIGN

• Repeated measures: responses come in pairs (information is taken on the same observational unit twice)
   - e.g., looking at the average difference between pretest and posttest scores for each person
   - e.g., looking at the average difference in weight before and after a diet for each person
• Matched pairs design: match individuals with similar characteristics in groups of 2 and compare the difference within each matched pair
   - e.g., the difference between two surgery procedures for two people who are similar
   - Matched pairs are used when repeated measures are not feasible

MEMORIZATION

• 27 students are sampled for this study.
• Research question: Does listening to music with lyrics affect students' memorization ability?
• Explain how this study can be done using:
   - Independent groups
   - Paired design using repeated measures
   - Paired design using matching

INDEPENDENT GROUPS

PAIRED DESIGN USING REPEATED MEASURES

PAIRED DESIGN USING MATCHING

PAIRED DESIGNS

• Does not focus on the number of words memorized by each individual (or matched pair)
• Focuses on the difference in words memorized for each individual (or matched pair)
   - e.g., In independent groups: one person might memorize 13 words in the lyrics group and another person might memorize 15 words without lyrics
   - e.g., In a paired design: a person might memorize 13 words with lyrics and the same person (or the other person in the matched pair) might memorize 15 words without lyrics
   - Focus on the difference: 13 − 15 = −2
   - This person memorized 2 fewer words with lyrics

ADVANTAGES

• Less variability with paired designs
   - Focuses only on variability within each person
   - Does not focus on variability between people
• Increases power (the probability of correctly rejecting a false null hypothesis)
• Works better when an association is expected between the paired observations
• Note: randomization should still be used to determine which "treatment" each person receives first for repeated measures (or which person in the pair receives which "treatment" for matched pairs)
   - e.g., some people should listen to lyrics first and others should listen to the non-lyric music first; then switch

AVERAGE DIFFERENCE

• Interested in analyzing the difference for each pair (resulting in a one-mean analysis)
• The difference is calculated by subtracting the data from the two trials, producing one mean and one standard deviation
• Example:

               Pre-Test   Post-Test   Difference (post − pre)
   Person 1        3          10               7
   Person 2        5          15              10
   Person 3        7           7               0

   $\bar{x}_d = 5.667$, $s_d = 5.131$
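The summary values in the table can be reproduced with Python's `statistics` module. The last line also shows the one-sample t statistic that the paired analysis uses; this is for illustration only, since three pairs are far too few to meet the validity conditions.

```python
import math
import statistics

pre = [3, 5, 7]
post = [10, 15, 7]

# One difference per person (post - pre), then treat the differences as one sample
diffs = [b - a for a, b in zip(pre, post)]
x_d = statistics.mean(diffs)
s_d = statistics.stdev(diffs)          # divides by n - 1

# Paired t statistic for H0: mu_d = 0
t = (x_d - 0) / (s_d / math.sqrt(len(diffs)))

print(diffs, round(x_d, 3), round(s_d, 3), round(t, 2))
```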

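AVERAGE DIFFERENCE

• Parameter: $\mu_d$, the population mean of the differences
• Common hypotheses: $H_0: \mu_d = 0$ versus $H_a: \mu_d \ne 0$ (or $\mu_d < 0$, or $\mu_d > 0$)
• Statistic: $\bar{x}_d$, the sample mean of the differences

PAIRED T-TEST

• Validity conditions for the paired t-test:
   - Large sample size ($n \ge 20$ pairs as a rule of thumb)
   - Not strongly skewed / approximately symmetric distribution of the differences (i.e., sampling from an approximately normal distribution)
• Test statistic:
   $t = \dfrac{\text{statistic} - \text{hypothesized value}}{\text{standard error}} = \dfrac{\bar{x}_d - 0}{s_d / \sqrt{n}}$, with df = n − 1
• Confidence interval:
   Statistic ± Critical Value × Standard Error $= \bar{x}_d \pm t^* \dfrac{s_d}{\sqrt{n}}$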

WEIGHT CHANGE EXAMPLE

• Example: A random sample of first-year students was taken at a large Midwestern university. Their weight was recorded at the beginning and end of their first year.
• Research question: Do students typically have a change in weight during their first year of college?
• Significance level: α = 0.05
   1. Write the hypotheses.
   2. Find the mean and standard deviation of the differences in R.
   3. What is the critical value? Use R.
   4. Determine the test statistic by hand and in R.
   5. State your decision and give a conclusion.
   6. What is the 95% CI? Complete by hand and in R.
   7. Interpret the interval.
• You will need the data set Weight.csv on Canvas.

WEIGHT CHANGE EXAMPLE

• Write the hypotheses.
• Find the mean and standard deviation of the differences.
• What is the critical value? Use R.

WEIGHT CHANGE EXAMPLE

• Determine the test statistic by hand and in R.
• State your decision and give a conclusion.

WEIGHT CHANGE EXAMPLE

• Calculate the 95% CI by hand and in R.
• Interpret the interval.

CONCLUSION QUESTION

• Conclusion question: Why does a paired design work better than independent samples in this scenario?
   - A paired design focuses on measuring the difference between beginning and end weight for each person
   - Less variability → larger test statistic → smaller p-value → reject a false null hypothesis more often

SECTION 5
NONPARAMETRIC ALTERNATIVE TO COMPARING POPULATIONS

T-TEST CHALLENGES

• Need a "large" enough sample size to be able to check the normality assumption
   - Not always feasible
• A nonparametric test doesn't require assumptions about the shape of the underlying distribution (such as normality)
   - Ex: Simulation from Section 1.1
   - Always possible, regardless of sample size
• Two nonparametric tests discussed in this section:
   - Wilcoxon Rank Sum Test (aka Mann-Whitney U Test): independent groups
   - Wilcoxon Signed Rank Test: paired design

MANN-WHITNEY (WILCOXON) TEST
INDEPENDENT GROUPS

HYPOTHESES FOR INDEPENDENT SAMPLES

• $H_0$: The distribution of population 1 is the same as the distribution of population 2.
• Potential alternatives:
   - $H_a$: The distribution of population 1 is not the same as the distribution of population 2.
   - $H_a$: The distribution of population 1 is shifted to the right of the distribution of population 2.
   - $H_a$: The distribution of population 1 is shifted to the left of the distribution of population 2.

HYPOTHESES

• If the populations are symmetric, the hypotheses can be expressed in terms of the population medians ($M_1$, $M_2$), e.g., $H_0: M_1 = M_2$ versus $H_a: M_1 \ne M_2$ (or $M_1 > M_2$, or $M_1 < M_2$).

STEPS

• Step 1: Put all observations in ascending order.
   - Include all observations regardless of group, but make sure to keep track of which group each observation came from!
• Step 2: Rank each observation.
• Step 3: Calculate $R_1$ = sum of all ranks for Group 1 and $R_2$ = sum of all ranks for Group 2.
   - Group 1 is the group with the larger rank sum.
• Step 4: Calculate the test statistic $U_1 = n_1 n_2 + \dfrac{n_1(n_1+1)}{2} - R_1$ for Group 1.
   - You can find the corresponding value for Group 2 by switching the subscripts in the second and last terms: $U_2 = n_1 n_2 + \dfrac{n_2(n_2+1)}{2} - R_2$.
   - Pick the smaller of the two as your test statistic; this will be $U_1$ if Group 1 is the group with the larger rank sum.
• Step 5: Use a table to find the critical value to compare against and make a decision.
   - This seems outdated… let's use R.
• Notes:
   - If $n_1 + n_2 > 50$, a normal approximation is used to find p-values.
   - If there are duplicate values, give each one the median of the ranks they would otherwise occupy.

BRIEF EXAMPLE FOR DUPLICATE OBSERVATIONS

2 3 5 6 6 7 7 7 8 8 8 8 9 9 10

ICE CREAM EXAMPLE

• Researchers wanted to compare the number of ice cream cones people eat per year in Florida versus in Iowa.
• Research question: Do Floridians tend to eat more ice cream than Iowans?
• Significance level: α = 0.05
• The following data were observed:
   Florida   20   25   30   35   40
   Iowa       5   10   15   16   24
• What are the hypotheses in terms of medians?
• Calculate $R_F$ and $R_I$.
• Calculate $U$.
• What is the p-value?
• What conclusion can be drawn?

ICE CREAM EXAMPLE

• Hypotheses:
• Calculate $R_F$, $R_I$, $U$:
   Value
   Rank
   F or I
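The rank-sum steps can be traced in a short Python sketch (the helper `average_ranks` is hypothetical, written for this example). It ranks the combined data with tied values sharing the average of their ranks, computes both U values with the Step 4 formula, and gets an exact one-sided p-value by listing every equally likely way to split the ranks into groups of 5 and 5 under $H_0$.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values in ascending order; tied values share the average (median) of their ranks."""
    order = sorted(values)
    ranks = {}
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and order[j] == order[i]:
            j += 1
        # positions i..j-1 (0-based) hold ranks i+1..j, so ties get their average
        ranks[order[i]] = (i + 1 + j) / 2
        i = j
    return [ranks[v] for v in values]

florida = [20, 25, 30, 35, 40]
iowa = [5, 10, 15, 16, 24]
n1, n2 = len(florida), len(iowa)

all_ranks = average_ranks(florida + iowa)
r_f, r_i = sum(all_ranks[:n1]), sum(all_ranks[n1:])

# U for each group (Step 4); the test statistic is the smaller one
u_f = n1 * n2 + n1 * (n1 + 1) / 2 - r_f
u_i = n1 * n2 + n2 * (n2 + 1) / 2 - r_i
u = min(u_f, u_i)

# Exact one-sided p-value (Ha: Florida shifted right): count the splits whose
# Florida rank sum is at least as large as the observed one
splits = list(combinations(all_ranks, n1))
p_value = sum(sum(s) >= r_f for s in splits) / len(splits)

print(r_f, r_i, u, round(p_value, 4))
```

R's `wilcox.test` labels its statistic W, defined as the first group's rank sum minus n1(n1 + 1)/2 (24 here), which is the other member of the U pair; the one-sided p-value is the same.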

ICE CREAM EXAMPLE

• What is the p-value?
• What conclusion can be drawn?

WILCOXON SIGNED RANK TEST
PAIRED DESIGNS

HYPOTHESES FOR PAIRED DATA

• Typical hypotheses in terms of the median difference $M_d$: $H_0: M_d = 0$ versus $H_a: M_d \ne 0$ (or $M_d > 0$, or $M_d < 0$)

PROCEDURE

• Step 1: Calculate the actual differences between the responses for each pair (e.g., Post-test − Pre-test).
• Step 2: Rank the absolute values of the differences and affix the sign of each difference to its rank.
   Abs Diff
   Rank
   Sign
• Step 3: Add the signed ranks to your table.
   Signed Rank
• Step 4: Calculate $V^+$ by summing all the positive ranks (or $V^-$ by summing the absolute values of the negative ranks).
• Step 5: Again, we could use a table for the critical value, but we will use R to get a p-value instead.

EXAMPLE

• What are the hypotheses in terms of a median?
• Calculate $V^+$.
   Abs Diff
   Rank
   Sign
   Signed Rank

DOCTOR'S EXAMPLE

• Researchers are interested in the number of doctor's appointments residents of Lincoln had in 2018 compared to 2019. A random sample of 5 adults was taken.
• Research question: Is there a difference in the number of doctor's appointments made in 2018 compared to 2019?
• The following data were observed:
            Person 1   Person 2   Person 3   Person 4   Person 5
   2018         3          4          2          6          9
   2019         1          3          5          2          3
• What are the hypotheses in terms of a median (2019 − 2018)?
• Calculate $V^+$.
• What is the p-value?
• What conclusion can be drawn?

DOCTOR'S EXAMPLE

• What is the p-value?
• What conclusion can be drawn?

HEART RATE EXAMPLE

• We are interested in whether typical heart rates differ when using the treadmill compared to the elliptical. 10 participants are sampled. Each participant used the elliptical and the treadmill for five minutes, each on a separate day. At the five-minute mark, their heart rate was recorded. The data are presented below:
   Subject       1    2    3    4    5    6    7    8    9   10
   Elliptical  140  150  162  142  170  154  160  120  130  150
   Treadmill   124  155  170  120  157  145  141  121  142  161
• What are the hypotheses in terms of a median (elliptical − treadmill)?
• Calculate the appropriate statistic by hand.
• What is the p-value?
• What conclusion can be drawn?
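Both signed-rank examples can be checked with a short Python sketch (the function `signed_rank_test` is hypothetical, written for this example). It follows the procedure above, then finds an exact two-sided p-value by listing all 2^n equally likely sign patterns under $H_0$. It assumes no zero differences and no tied absolute differences, which holds for both data sets here.

```python
from itertools import product

def signed_rank_test(x, y):
    """Wilcoxon signed rank statistic V+ and exact two-sided p-value for differences x - y.

    Assumes no zero differences and no tied absolute differences.
    """
    diffs = [a - b for a, b in zip(x, y)]
    abs_sorted = sorted(abs(d) for d in diffs)
    ranks = [abs_sorted.index(abs(d)) + 1 for d in diffs]    # rank of |d|
    v_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)    # sum of positive ranks

    # Under H0 each rank is equally likely to carry a + or - sign: enumerate all 2^n patterns
    n = len(diffs)
    null_v = [sum(r for r, s in zip(range(1, n + 1), signs) if s)
              for signs in product([False, True], repeat=n)]
    lower = sum(v <= v_plus for v in null_v) / len(null_v)
    upper = sum(v >= v_plus for v in null_v) / len(null_v)
    return v_plus, min(1.0, 2 * min(lower, upper))

# Doctor's example: differences are 2019 - 2018
v_doc, p_doc = signed_rank_test([1, 3, 5, 2, 3], [3, 4, 2, 6, 9])

# Heart rate example: differences are elliptical - treadmill
ellip = [140, 150, 162, 142, 170, 154, 160, 120, 130, 150]
tread = [124, 155, 170, 120, 157, 145, 141, 121, 142, 161]
v_hr, p_hr = signed_rank_test(ellip, tread)

print(v_doc, p_doc)
print(v_hr, round(p_hr, 4))
```

In R, `wilcox.test(x, y, paired = TRUE)` reports the same sum of positive ranks as its statistic V.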

DISADVANTAGES TO NONPARAMETRIC RANK TESTS

• Typically less powerful (less likely to reject a false null hypothesis) than parametric tests
• If you are reasonably certain the assumptions for a parametric method are met, it is better to use that method instead of a nonparametric one

SECTION 6
ERRORS AND POWER

TWO ERRORS

• Type I error
   - Reject the null hypothesis, but in reality the null is actually true
   - False alarm: believe we have discovered a difference when no difference exists
   - P(Type I error) = α
• Type II error
   - Fail to reject the null hypothesis, but in reality the null is actually false
   - Missed opportunity: fail to detect a difference that is there
   - Generally, we refer to P(Type II error) = β
• Important note: once you make a decision, only one type of error could have been made!
• You must decide, before conducting an experiment or study, which error would be worse.

CHART

                        H0 is TRUE          H0 is FALSE
   REJECT H0            Type I error        Correct decision
   FAIL TO REJECT H0    Correct decision    Type II error
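A quick simulation makes the statement P(Type I error) = α concrete. The sketch assumes a two-sided one-sample z-test with a known population SD, draws every sample from a population where $H_0$ is true, and counts how often $H_0$ is (wrongly) rejected at α = 0.05. The population values and sample size are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist

random.seed(112)

mu0, sigma, n = 100, 15, 25      # hypothetical population where H0: mu = 100 is TRUE
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value, about 1.96
reps = 20_000

rejections = 0
for _ in range(reps):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:              # rejecting here is a Type I error
        rejections += 1

type1_rate = rejections / reps
print(type1_rate)                    # should land close to alpha = 0.05
```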

EXAMPLE EXAMPLE 2

As with a jury trial, another analogy to hypothesis testing involves medical diagnostic tests. These tests aim to
indicate whether or not the patient has a particular disease. But the tests are not infallible, so errors can be made.
Historically, it is known that the average body temperature was known to be 98.6 degree F; however, researchers
The null hypothesis can be regarded as the patient being healthy. wonder if this value has decreased.
The alternative hypothesis can be regarded as the patient having the disease.
 Describe what Type I error represents in this situation.
 Describe what Type I error represents in this situation.  Describe what Type II error represents in this situation.
 Describe what Type II error represents in this situation.  Which type of error would you consider to be more serious in this situation? Explain your thinking
 Which type of error would you consider to be more serious in this situation? Explain your thinking

POWER

 Probability of rejecting $H_0$ when it is not true
 $\text{Power} = 1 - P(\text{Type II Error}) = 1 - \beta$
 Example: Suppose we are comparing Drug A with a placebo, and we know that the drug truly is effective. Suppose the power for this experiment was 0.9.
 This means that if we ran this experiment many times, we would get a statistically significant difference about 90% of the time
 But 10% of the time, we would get nonsignificant results, even though Drug A truly is effective

POWER

 Power is a function of several quantities:
 How big a difference are you trying to detect?
 What is the variance?
 What is $\alpha$?
 One other thing: sample size
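The long-run interpretation of power can be checked with a small simulation. The Python sketch below is only an illustration, not part of the course's R workflow: it assumes a two-sided z-test with a known common standard deviation (a simplification of the two-sample t-test), the helper name `simulate_power` is hypothetical, and the settings are hypothetical values chosen for illustration. It repeats the same experiment thousands of times under a real difference and reports the proportion of rejections, which estimates the power; one minus that proportion estimates $\beta$.

```python
import math
import random
from statistics import NormalDist, fmean

def simulate_power(true_diff, sigma, n, alpha=0.05, reps=10_000, seed=1):
    """Estimate power as the fraction of simulated experiments in which H0 is rejected.

    Assumes a two-sided z-test with known common sigma and n observations per group.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = sigma * math.sqrt(2 / n)                  # standard error of the difference in means
    rejections = 0
    for _ in range(reps):
        treated = [rng.gauss(true_diff, sigma) for _ in range(n)]  # true mean shifted by true_diff
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

# Hypothetical settings: true difference 15, common variance 300, 25 per group
power = simulate_power(true_diff=15, sigma=math.sqrt(300), n=25)
print(f"estimated power = {power:.3f}, estimated beta = {1 - power:.3f}")
```

With these settings the rejection rate should land in the mid-0.8s; setting `true_diff=0` instead makes the rejection rate settle near $\alpha$, which is the Type I error rate rather than power.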

POWER ANALYSES

 Power analyses focus on one of the following:
 Calculating the sample size required to achieve a certain power
 Or the opposite: calculating the power given a sample size
 Researchers commonly do these before running an experiment
 If you are spending money on an experiment, you want to know you'll be able to detect the difference if it is there
 Typically, you want at least 80% power

EXAMPLE 2: TWO INDEPENDENT SAMPLE MEAN EXAMPLE

 Example: Suppose that we are comparing the yield of two different types of corn seed. We know that the two seeds have a common variance in yield of 300. We are interested in detecting a 15-unit difference in yield, and we will have 25 of each type of seed. What's the power of this test?
 To use R's function for two independent samples with equal variances, we need to calculate the effect size, defined as:
$d = \frac{|\mu_1 - \mu_2|}{\sigma}$
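To make the effect-size formula concrete for the corn example, the sketch below computes $d = 15/\sqrt{300}$ and then an approximate power. It uses the normal (z) approximation $\text{power} \approx \Phi(d\sqrt{n/2} - z_{1-\alpha/2}) + \Phi(-d\sqrt{n/2} - z_{1-\alpha/2})$, which is my substitution for the exact t-based calculation the R function performs; the exact t-based power will come out slightly lower, so treat this as a sanity check rather than the answer. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

def effect_size(diff, variance):
    """Effect size d = |mu1 - mu2| / sigma for two groups with a common variance."""
    return abs(diff) / math.sqrt(variance)

def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test with equal group sizes."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)   # standardized true difference in sample means
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

d = effect_size(diff=15, variance=300)
print(f"d = {d:.3f}")                                # d = 0.866
print(f"approx. power = {approx_power(d, 25):.3f}")
```

The second tail term is tiny here but keeps the formula correct for small effects; with `d = 0` the function returns exactly $\alpha$.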

TRY DIFFERENT VALUES

 Try the following values for the sample size in each group (5, 10, 30, 50) while holding everything else constant. What happens to the power as you change the sample size in each group?
 Try the following values for $|\mu_1 - \mu_2|$ (1, 5, 20, and 30) while holding everything else constant (from the initial scenario). What happens to the power as you change the difference in means?

TRY DIFFERENT VALUES

 Try the following values for the variance ($\sigma^2$ = 100, 200, 400, 500) while holding everything else constant (from the initial scenario). What happens to the power as you increase and decrease the variance?
 Try the following values for $\alpha$ (0.001, 0.01, 0.1, 0.25) while holding everything else constant (from the initial scenario). What happens to the power as you change the significance level?
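The four questions above can be explored with the same normal-approximation power formula (again an approximation I am substituting for the exact t-based values R reports; `approx_power` is a hypothetical helper). The sketch holds the initial corn scenario (difference 15, variance 300, 25 per group, $\alpha = 0.05$) fixed and changes one input at a time.

```python
import math
from statistics import NormalDist

def approx_power(diff, variance, n_per_group, alpha):
    """Normal-approximation power of a two-sided, two-sample test with equal group sizes."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(diff) / math.sqrt(variance) * math.sqrt(n_per_group / 2)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

base = dict(diff=15, variance=300, n_per_group=25, alpha=0.05)   # initial scenario

for name, values in [("n_per_group", [5, 10, 30, 50]),
                     ("diff", [1, 5, 20, 30]),
                     ("variance", [100, 200, 400, 500]),
                     ("alpha", [0.001, 0.01, 0.1, 0.25])]:
    for v in values:
        settings = {**base, name: v}   # change one input, hold the rest constant
        print(f"{name} = {v}: power ≈ {approx_power(**settings):.3f}")
```

In this approximation, power rises with the sample size, the difference in means, and $\alpha$, and falls as the variance grows.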

TWO INDEPENDENT SAMPLE MEAN EXAMPLE

 Example: Suppose that we are comparing freshmen's and sophomores' Exam 1 grades in STAT 218. We are interested in detecting a 10-point grade difference between freshmen and sophomores. Assume both groups have a common variance of 100. What sample size do you need in each group in order to have 80% power?

TRY DIFFERENT VALUES

 Try the following values for $|\mu_1 - \mu_2|$ (1, 5, 15, and 20) while holding everything else constant (from the initial scenario). What happens to the required sample sizes as you increase and decrease the size of the difference?
 Try the following values for the variance ($\sigma^2$ = 16, 50, 150, 200) while holding everything else constant (from the initial scenario). What happens to the required sample sizes as you increase and decrease the variance?
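For the freshman/sophomore question, a common back-of-the-envelope formula is the normal approximation $n \approx 2(z_{1-\alpha/2} + z_{1-\beta})^2/d^2$ per group, rounded up. The sketch below uses it (an approximation I am substituting for the t-based calculation in R, which typically gives a slightly larger $n$; `approx_n_per_group` is a hypothetical helper) for the initial scenario (difference 10, variance 100, power 0.80, $\alpha = 0.05$) and for the differences and variances listed above.

```python
import math
from statistics import NormalDist

def approx_n_per_group(diff, variance, power=0.80, alpha=0.05):
    """Normal-approximation sample size per group for a two-sided, two-sample test."""
    z = NormalDist()
    d = abs(diff) / math.sqrt(variance)          # effect size
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print("initial scenario:", approx_n_per_group(diff=10, variance=100))   # initial scenario: 16
for diff in [1, 5, 15, 20]:
    print(f"diff = {diff}: n per group ≈ {approx_n_per_group(diff, 100)}")
for var in [16, 50, 150, 200]:
    print(f"variance = {var}: n per group ≈ {approx_n_per_group(10, var)}")
```

For very small sample sizes (such as variance 16) the normal approximation is rough, which is one reason the course uses the exact t-based function.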

TRY DIFFERENT VALUES

 Try the following values for the power (0.5, 0.6, 0.9, 0.99) while holding everything else constant (from the initial scenario). What happens to the required sample sizes as you change the power?
 Try the following values for $\alpha$ (0.001, 0.01, 0.1, 0.25) while holding everything else constant (from the initial scenario). What happens to the required sample sizes as you change the significance level?

SUMMARY

 Researchers commonly do power analyses before running a study
 If you are spending money on a study, you want to know you'll be able to detect the difference if it is there
 You can do this in almost any study, but we've only demonstrated it for two independent means with equal variances
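The power and $\alpha$ questions on the TRY DIFFERENT VALUES slide can be explored with the same normal-approximation sample-size formula (an approximation, not R's exact t-based result; the helper is hypothetical). The sketch starts from the freshman/sophomore scenario (difference 10, variance 100, power 0.80, $\alpha = 0.05$) and changes one input at a time.

```python
import math
from statistics import NormalDist

def approx_n_per_group(diff, variance, power, alpha):
    """Normal-approximation sample size per group for a two-sided, two-sample test."""
    z = NormalDist()
    d = abs(diff) / math.sqrt(variance)
    return math.ceil(2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / d ** 2)

base = dict(diff=10, variance=100, power=0.80, alpha=0.05)   # initial scenario

for p in [0.5, 0.6, 0.9, 0.99]:
    settings = {**base, "power": p}
    print(f"power = {p}: n per group ≈ {approx_n_per_group(**settings)}")
for a in [0.001, 0.01, 0.1, 0.25]:
    settings = {**base, "alpha": a}
    print(f"alpha = {a}: n per group ≈ {approx_n_per_group(**settings)}")
```

Demanding more power or a smaller $\alpha$ both push the required sample size up, because each increases the $z$ terms in the numerator.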

CHAPTER 2
ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

SO FAR...

 We have compared two groups:
 Florida vs. Iowa with relation to ice cream consumption
 Freshmen vs. seniors with regard to study habits
 Attractive vs. unattractive instructor with regard to learning
 Are we always interested in comparing only two groups, though?

STATISTICAL MODELING

 Much of statistics involves modeling
 Generally, a model is a mathematical formula showing the relationship between the response variable and the explanatory variable(s)
 Often written as follows:
$Y = \text{Model} + \text{Error}$
where the model is where you put your explanatory variables, experimental design components (discussed later), etc.

EXAMPLES OF ANOVA

 Suppose we are interested in whether waiting times at Starbucks differ across days of the week.
 Explanatory:
 Number of levels:
 Response:
 Suppose we are interested in whether salaries tend to increase as education level (associate's, bachelor's, master's, doctoral) increases.
 Explanatory:
 Number of levels:
 Response:

Typical overarching research question: Are there at least two population means that differ?

HYPOTHESES

 Hypotheses:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_A$: at least two population means differ (i.e., $\mu_i \neq \mu_j$ for some $i \neq j$)

ONE-WAY ANOVA MODEL

 $y_{ij} = \mu + \tau_i + e_{ij}$
 $y_{ij}$ denotes the response for the $j$th observation in the $i$th group
 $\mu$ is the intercept (the mean of the baseline group that all other groups are compared to)
 $\tau_i$ is the effect of the $i$th group (how much that group's mean deviates from the intercept)
 $e_{ij} \sim N(0, \sigma^2)$ denotes the error term
 Note that the mean of group $i$ is $\mu_i = \mu + \tau_i$

ASSUMPTIONS OF ANOVA

 Sampling from approximately normal distributions
 All populations have the same variance
 Samples are independent
 Notes: Ideally, observational (experimental) units are randomly assigned to the explanatory variable groups
 Referred to as a completely randomized design
 If it is not possible to randomly assign, it is important to sample independently from each group

A FEW SUMMARY STATISTICS

 Overall mean (where $n$ represents the total number of observations): $\bar{y}_{..} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}$
 Mean of group $i$ (where $n_i$ represents the number of observations in group $i$): $\bar{y}_{i.} = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$
 ANOVA uses the $F$-statistic, the ratio of the variability between groups to the variability within groups
 If $F$ is large, the variability between groups is much larger than the variability within groups

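ONE-WAY ANOVA TABLE

 For $i = 1, 2, \ldots, k$ groups and $j = 1, 2, \ldots, n_i$ observations in group $i$, an ANOVA table can be constructed as follows:

Source   df      Sums of Squares   Mean Square                       F                         p-value
Model    k - 1   SS(Model)         MS(Model) = SS(Model)/(k - 1)     F = MS(Model)/MS(Error)
Error    n - k   SS(Error)         MS(Error) = SS(Error)/(n - k)
Total    n - 1   SS(Total)

SUM OF SQUARES

 The model sum of squares measures between-group variability (how much the group means vary from group to group)
$SS(\text{Model}) = \sum_{i=1}^{k} n_i(\bar{y}_{i.} - \bar{y}_{..})^2$
 Used to calculate the between-group variance
 The error sum of squares measures within-group variability (how much the observations vary within each group)
$SS(\text{Error}) = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i.})^2$
 Used to calculate the within-group variance
 The total sum of squares measures the variability of each observation around the overall mean
$SS(\text{Total}) = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{..})^2 = SS(\text{Model}) + SS(\text{Error})$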

STARBUCKS EXAMPLE

 Assume the data below represent the time (in minutes) spent waiting in line at Starbucks on different days of the week

Monday   Wednesday   Friday
6        3           3.5
5.5      4           2
3        2           3
6.5      2           1.5

HYPOTHESES

STEP 1: CALCULATE THE SAMPLE MEAN FOR EACH GROUP

STEP 2: CALCULATE THE OVERALL MEAN

STEP 3: CALCULATE THE SUM OF SQUARES FOR THE MODEL
 Calculate by multiplying the sample size of each group by the squared deviation between that group's sample mean and the overall mean, then adding these up across groups.

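STEP 4: CALCULATE THE SUM OF SQUARES FOR THE ERROR
 Calculate by summing the squared deviations between each observation and its group mean

STEP 5: CALCULATE THE TOTAL SUM OF SQUARES
 Calculate by summing the squared deviations between each observation and the overall mean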
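Steps 1 through 5 can be checked with a few lines of standard-library Python (a sketch for verifying hand calculations; the course itself uses R). The data are the Starbucks wait times from the table above; the variable names are my own.

```python
from statistics import fmean

# Starbucks wait times (minutes) from the table above
data = {
    "Monday":    [6, 5.5, 3, 6.5],
    "Wednesday": [3, 4, 2, 2],
    "Friday":    [3.5, 2, 3, 1.5],
}

# Step 1: sample mean for each group
group_means = {day: fmean(y) for day, y in data.items()}

# Step 2: overall mean of all observations
all_obs = [v for y in data.values() for v in y]
overall_mean = fmean(all_obs)

# Step 3: SS(Model) = sum over groups of n_i * (group mean - overall mean)^2
ss_model = sum(len(y) * (group_means[day] - overall_mean) ** 2 for day, y in data.items())

# Step 4: SS(Error) = sum of squared deviations of each observation from its group mean
ss_error = sum((v - group_means[day]) ** 2 for day, y in data.items() for v in y)

# Step 5: SS(Total) = sum of squared deviations of each observation from the overall mean
ss_total = sum((v - overall_mean) ** 2 for v in all_obs)

print(group_means)                   # {'Monday': 5.25, 'Wednesday': 2.75, 'Friday': 2.5}
print(overall_mean)                  # 3.5
print(ss_model, ss_error, ss_total)  # 18.5 12.5 31.0
```

The last line confirms the identity SS(Total) = SS(Model) + SS(Error).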

ANOVA TABLE

Source   df   SS   MS   F   p-value
Day
Error
Total

STARBUCKS EXAMPLE CONCLUSION

 What is the critical value for F? (Hint: to obtain the critical value for the $F$-statistic in R, plug in $1 - \alpha$, the degrees of freedom for the model, and the degrees of freedom for the error)
qf(1 - alpha, df_model, df_error)
 Conclusion:
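R's `qf` works for any degrees of freedom. As a cross-check without R, the sketch below uses a closed form that holds only when the model has 2 degrees of freedom (three groups): if $F \sim F(2, \nu)$, then $P(F > f) = (1 + 2f/\nu)^{-\nu/2}$, so the upper-$\alpha$ critical value is $(\nu/2)(\alpha^{-2/\nu} - 1)$. The helper names are hypothetical, and the Starbucks sums of squares plugged in (SS(Model) = 18.5, SS(Error) = 12.5) are the values obtained from the data by following the five steps.

```python
def f_critical_df1_2(alpha, df_error):
    """Upper-alpha critical value of F(2, df_error); should agree with qf(1 - alpha, 2, df_error)."""
    return (df_error / 2) * (alpha ** (-2 / df_error) - 1)

def f_pvalue_df1_2(f_stat, df_error):
    """P(F > f_stat) for F(2, df_error), using the closed-form survival function."""
    return (1 + 2 * f_stat / df_error) ** (-df_error / 2)

# Starbucks example: k = 3 groups, n = 12 observations
df_model, df_error = 3 - 1, 12 - 3
ss_model, ss_error = 18.5, 12.5
ms_model, ms_error = ss_model / df_model, ss_error / df_error
f_stat = ms_model / ms_error

print(f"F = {f_stat:.2f}")                                          # F = 6.66
print(f"critical value = {f_critical_df1_2(0.05, df_error):.3f}")   # critical value = 4.256
print(f"p-value = {f_pvalue_df1_2(f_stat, df_error):.4f}")
```

For more than three groups, this shortcut no longer applies; use `qf` and `pf` in R instead.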

RESTAURANT EXAMPLE

 Assume the data below represent the time (in minutes) spent waiting at different fast-food restaurants. Researchers randomly assigned 15 different people to wait at McDonald's, Popeye's, or Fazoli's on random days.

McDonald's   Popeye's   Fazoli's
3            10         6.5
4            13         10
2            11         8
4.5          15         9.5
6.5          13         7

 Researchers are interested in whether different restaurants have different wait times. State the hypotheses and find the ANOVA table.

RESTAURANT EXAMPLE HYPOTHESES

STEP 1: CALCULATE THE SAMPLE MEAN FOR EACH GROUP

STEP 2: CALCULATE THE OVERALL MEAN

STEP 3: CALCULATE THE SUM OF SQUARES FOR THE MODEL
 Calculate by multiplying the sample size of each group by the squared deviation between that group's sample mean and the overall mean, then adding these up across groups.

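STEP 4: CALCULATE THE SUM OF SQUARES FOR THE ERROR
 Calculate by summing the squared deviations between each observation and its group mean

STEP 5: CALCULATE THE TOTAL SUM OF SQUARES
 Calculate by summing the squared deviations between each observation and the overall mean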

ANOVA TABLE

Source       df   SS   MS   F   p-value
Restaurant
Error
Total

RESTAURANT EXAMPLE CONCLUSION

 What is the critical value?
 Conclusion?
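The same calculations for the restaurant data can be packaged as a small one-way ANOVA function (`one_way_anova` is a hypothetical helper written for these notes, not an R function). Because there are again three groups, the closed form for an $F$ distribution with 2 numerator degrees of freedom, $P(F > f) = (1 + 2f/\nu)^{-\nu/2}$, gives the critical value and p-value; the assertion guards against reusing that shortcut with more groups.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (df_model, df_error, SS_model, SS_error, F) for a dict mapping group -> observations."""
    all_obs = [v for obs in groups.values() for v in obs]
    grand_mean = fmean(all_obs)
    means = {g: fmean(obs) for g, obs in groups.items()}
    ss_model = sum(len(obs) * (means[g] - grand_mean) ** 2 for g, obs in groups.items())
    ss_error = sum((v - means[g]) ** 2 for g, obs in groups.items() for v in obs)
    df_model, df_error = len(groups) - 1, len(all_obs) - len(groups)
    f_stat = (ss_model / df_model) / (ss_error / df_error)
    return df_model, df_error, ss_model, ss_error, f_stat

# Restaurant wait times (minutes) from the RESTAURANT EXAMPLE table
restaurants = {
    "McDonald's": [3, 4, 2, 4.5, 6.5],
    "Popeye's":   [10, 13, 11, 15, 13],
    "Fazoli's":   [6.5, 10, 8, 9.5, 7],
}

df_m, df_e, ss_m, ss_e, f_stat = one_way_anova(restaurants)
assert df_m == 2, "the closed-form F formulas below only hold for 2 numerator df"
f_crit = (df_e / 2) * (0.05 ** (-2 / df_e) - 1)   # same value as qf(0.95, 2, df_e) in R
p_value = (1 + 2 * f_stat / df_e) ** (-df_e / 2)  # P(F > f_stat) for F(2, df_e)
print(f"df = ({df_m}, {df_e}), SS(Model) = {ss_m:.1f}, SS(Error) = {ss_e:.1f}, F = {f_stat:.1f}")
print(f"critical value = {f_crit:.3f}, p-value = {p_value:.2e}")
```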

GPA EXAMPLE

 Suppose we are interested in whether there are any differences in GPA across freshmen, sophomores, juniors, and seniors.
 What are the hypotheses?
 Just eyeballing it, do you think there is a difference between the groups?

GPA EXAMPLE

 Using R, find the ANOVA table.

Class        n   x̄   Standard Deviation
Freshmen
Sophomores
Juniors
Seniors

GPA EXAMPLE TABLE AND CONCLUSION

Source   df   SS   MS   F   p-value
Class
Error
Total

Conclusion:

95% CONFIDENCE INTERVALS

 Note: All the intervals above contain 0!
 Why? The null hypothesis was not rejected, so it's plausible the group means are all the same
 If the null hypothesis is rejected, then typically at least one of the intervals would NOT contain 0.

MAJORS EXAMPLE

 Is there an association between major and GPA? Use R to answer this question.
 Hypotheses:
 Conclusion:

MULTIPLE COMPARISONS

MULTIPLE COMPARISONS

 When the null hypothesis is rejected, a post hoc test needs to be performed to see which levels differ
 For $k$ groups, there are $C = \binom{k}{2} = \frac{k(k-1)}{2}$ possible pairwise comparisons
 In the majors example, number of comparisons:

COMPARISON-WISE ERROR RATE

 Recall $\alpha$: the probability of making a Type I error (rejecting a true null)
 If we do _____ independent t-tests (like we did in Chapter 1), each test will have $P(\text{Type I Error}) = \alpha$
 Called the comparison-wise error rate

EXPERIMENT-WISE

 What's the probability of making at least one error across all of the comparisons, assuming $\alpha = 0.05$?
 Formula (assuming the $C$ comparisons are independent): $P(\text{at least one Type I error}) = 1 - (1 - \alpha)^{C}$
 This increases the Type I error rate to about ______% for the whole experiment
 Called the "family-wise" or "experiment-wise" error rate

TYPES OF MULTIPLICITY ADJUSTMENTS

 A multiplicity adjustment needs to be done to ensure that the experiment-wise Type I error rate stays at the specified $\alpha$ level
 Types of multiplicity adjustments:
 Tukey: accounts for both the number of comparisons and the degrees of freedom (we'll use this one)
 Bonferroni: $\alpha$ is divided by the number of comparisons being made
 Dunnett: similar to Tukey, but used when you only want to compare each group to one category (e.g., comparing experimental treatments to a control treatment)
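The experiment-wise calculation can be sketched as follows, assuming the comparisons are independent (pairwise comparisons that share groups are not, so treat the result as an approximation). Bonferroni is included because it is simple arithmetic; Tukey's adjustment, which these notes use, requires the studentized range distribution and is not reproduced here. The function names are hypothetical.

```python
import math

def n_comparisons(k):
    """Number of pairwise comparisons among k groups: k choose 2."""
    return math.comb(k, 2)

def familywise_error(alpha, m):
    """P(at least one Type I error) across m independent tests, each run at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-comparison level under the Bonferroni adjustment."""
    return alpha / m

for k in [3, 4, 5]:
    m = n_comparisons(k)
    print(f"k = {k}: {m} comparisons, family-wise error ≈ {familywise_error(0.05, m):.3f}, "
          f"Bonferroni per-test alpha = {bonferroni_alpha(0.05, m):.4f}")
```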

POST-HOC TEST FOR GPA BY MAJOR EXAMPLE

 Interpretation for those with significant differences:
 With 95% confidence, Education majors have a true average GPA between 0.06 and 1.48 points higher than Business majors.
 With 95% confidence, Science majors have a true average GPA between 0.05 and 1.07 points higher than Business majors.
 Similar interpretations can be made for the other 8 intervals

CHAPTER 3
EXPERIMENT DESIGN

SECTION 1
RANDOMIZED COMPLETE BLOCK DESIGN (RCBD)

MOTIVATING EXAMPLE

 Suppose we want to compare three learning activities by measuring the performance of 12 students in 4 classes.
 How could we design this based on the information we have learned in this course so far?

COMPLETELY RANDOMIZED DESIGN

 We can randomly assign 4 students to each learning activity and conduct a one-way ANOVA, i.e., a completely randomized design (CRD), to see if there's a significant difference
[Figure: learning activities 1, 2, and 3 randomly assigned to the 12 students across the 4 classrooms]
 Note: The numbers in the circles represent the learning activities, while the rectangles represent the 4 classrooms, each taught by a different instructor

THOUGHT QUESTION

 If learning activity #3 turns out to be the best and learning activity #1 the worst, why might that be an issue for the first design?

RANDOMIZED COMPLETE BLOCK DESIGN

 We can take 4 classes (each taught by a different instructor) with 3 students in each class. We can then randomly assign one student in each class to each of the learning activities
[Figure: each of the 4 classrooms contains one student on each of learning activities 1, 2, and 3]

RESEARCH DESIGN

 Create blocks of homogeneous units (Blocks)
 Randomly assign treatments to the units within each block (Randomized)
 Note: each treatment must appear in each block (Complete)
 Goal: "create" blocks such that units within blocks are as similar as possible

ADVANTAGES

 Blocks can help control block-to-block variability
 Variation due to differences between blocks
 Experimental units within the same block are more homogeneous than units in different blocks (if blocking is effective)
 Helps us get a clearer picture of the actual differences between treatments

COMMON BLOCKING CRITERIA

 Greenhouses, gradients that occur in fields
 Groups of animals: cages with multiple animals, litters
 Location (different fields, schools, states)

ANOVA WITH RCBD

STATISTICAL MODEL FOR ANOVA WITH RCBD

 $y_{ij} = \mu + \tau_i + b_j + \epsilon_{ij}$
 $y_{ij}$ is the observation on the $i$th treatment in the $j$th block
 $\mu$ denotes the intercept
 $\tau_i$ denotes the effect of the $i$th treatment
 $b_j \sim N(0, \sigma_b^2)$ denotes the effect of the $j$th block
 $\epsilon_{ij} \sim N(0, \sigma^2)$ denotes the error term

SIDE NOTES

 The block term is considered a "random" effect because the blocks don't make up the entire population of possible blocks (e.g., other classes could have been chosen)
 The treatment is a fixed effect (e.g., only those three learning activities are being considered)
 The difference between treatments does not depend on the block

ANOVA TABLE WITH RANDOM BLOCKS

Source      df               SS          MS                               F
Block       b - 1
Treatment   t - 1            SS(T)       MS(T) = SS(T)/(t - 1)            MS(T)/MS(E)
Error       (b - 1)(t - 1)   SS(E)       MS(E) = SS(E)/[(b - 1)(t - 1)]
Total       bt - 1           SS(Total)
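The sums of squares in this table can be computed directly. The sketch below is a hypothetical helper, `rcbd_anova`, for a complete layout with one observation per treatment-block combination; SS(E) is obtained by subtraction, SS(Total) - SS(T) - SS(B). It also reports the ANOVA (method-of-moments) estimate of the block variance, $\hat\sigma_b^2 = [MS(B) - MS(E)]/t$, which follows from $E[MS(B)] = \sigma^2 + t\sigma_b^2$, truncated at zero. R's mixed-model output may use REML instead, which for balanced data generally agrees when the estimate is positive. The numbers in `toy` are made up solely to exercise the function.

```python
from statistics import fmean

def rcbd_anova(y):
    """Sums of squares for an RCBD; y[i][j] is the response for treatment i in block j."""
    t, b = len(y), len(y[0])
    grand = fmean(v for row in y for v in row)
    trt_means = [fmean(row) for row in y]
    blk_means = [fmean(y[i][j] for i in range(t)) for j in range(b)]
    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    ss_blk = t * sum((m - grand) ** 2 for m in blk_means)
    ss_tot = sum((v - grand) ** 2 for row in y for v in row)
    ss_err = ss_tot - ss_trt - ss_blk             # error SS by subtraction
    ms_blk = ss_blk / (b - 1)
    ms_trt = ss_trt / (t - 1)
    ms_err = ss_err / ((b - 1) * (t - 1))
    return {
        "SS(B)": ss_blk, "SS(T)": ss_trt, "SS(E)": ss_err, "SS(Total)": ss_tot,
        "F": ms_trt / ms_err,
        # Method-of-moments block variance, since E[MS(B)] = sigma^2 + t * sigma_b^2
        "sigma_b^2": max(0.0, (ms_blk - ms_err) / t),
    }

# Hypothetical numbers (3 treatments x 4 blocks), made up only to exercise the function
toy = [[10, 12, 14, 16],
       [11, 13, 15, 17],
       [15, 17, 16, 20]]
print(rcbd_anova(toy))
```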

CAR COMPANY EXAMPLE

 Mazda wanted to compare how six different types of tires affected MPG in the 2019 Mazda 3 in order to determine which tires to use. The company collected information across five representative cities. Six cars within each city were randomly assigned tire types.
 Explanatory (treatment):
 Response:
 Block:

HYPOTHESES

MODEL

"SKELETON" ANOVA TABLE (SV AND DF COLUMNS)

CODING/OUTPUT

Variance components

CODING/OUTPUT

ADDITIONAL OUTPUT

 Where is Type I?!
 Are the p-values useful?

CONCLUSION

 What's the estimated average MPG for a Type 5 tire?

CONCLUSION

 Summary of conclusions (Hint: think post hoc tests)

COOKIE EXAMPLE

 We are interested in examining how the amount of fat in cookie dough affects a cookie's texture. There are four recipes of interest. (Note that the texture of the cookie is measured by the amount of force, in grams, required to penetrate the cookie surface.) There are four different bakers, and each baker prepares each of the recipes in a random order.
 The data for this example are already embedded within the Block Design.R code
 Identify the explanatory variable, response, and block.
 State the hypotheses and model.
 State the "skeleton" ANOVA table.
 Summarize your conclusions.
 Analyze this data set ignoring the blocking (CRD). What impact does blocking have on the results?

COOKIE EXAMPLE

 Explanatory:
 Response:
 Block:

HYPOTHESES

MODEL

"SKELETON" ANOVA TABLE

ANALYSIS

 Variance component estimates:
 Initial hypotheses:

CONCLUSIONS

COMPARE RESULTS TO CRD

 Ignoring the blocks, there is insufficient evidence that any of the recipes require a different average force (p-value = 0.3535). However, since we did block by baker, this isn't the appropriate analysis. Do not use this analysis.

SECTION 2
INCOMPLETE BLOCK DESIGN

MOTIVATION

 A randomized *complete* block design may not be practical in application
 There may be limitations on:
 Space
 Resources (e.g., money)
 Incomplete block designs are a great alternative

INCOMPLETE BLOCK DESIGNS

 Only a subset of all the treatments is applied to each block
 Example: Recall the classroom teaching example from Section 3.1. Imagine we now want to compare 8 treatments (rather than three) and have 6 students per class.
[Figure: 4 classrooms of 6 students each; each classroom receives only a subset of treatments 1-8]

BALANCED INCOMPLETE BLOCK DESIGNS

 Incomplete block designs can be either balanced or unbalanced
 Balanced design (balanced incomplete block design; BIB):
 Each pair of treatments appears together $\lambda$ times

Example: Suppose we have 3 treatments and 6 blocks. Each treatment appears with each other treatment 2 times ($\lambda = 2$)

Block 1   Block 2   Block 3   Block 4   Block 5   Block 6
A         A         A         B         B         A
B         C         C         C         C         B

UNBALANCED INCOMPLETE BLOCK DESIGN

 Unbalanced
 Pairs of treatments do not all appear together an equal number of times
 Example: Suppose we have 4 treatments and 8 blocks.

Block 1   Block 2   Block 3   Block 4   Block 5   Block 6   Block 7   Block 8
A         A         A         B         B         C         A         C
B         C         D         C         D         D         B         D

DETERMINING WHETHER BALANCE IS POSSIBLE

 Suppose we have:
 $a$ treatments
 $b$ blocks
 $r$ replicates (number of times that a treatment appears in total)
 $m$ treatments per block
Then
$\lambda = \frac{r(m-1)}{a-1}$
If $\lambda$ is an integer and $r$ is the same for all treatments, then a balanced design may be possible (these conditions are necessary, though not always sufficient); if either fails, a balanced design is not possible
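The balance check can be automated. The hypothetical `design_summary` helper below counts how often each treatment appears ($r$), computes $\lambda = r(m-1)/(a-1)$ when every block has the same size and every treatment the same $r$, and separately counts how often each pair of treatments shares a block, which is the actual definition of balance. It is applied to the two layouts on these slides.

```python
from collections import Counter
from itertools import combinations

def design_summary(blocks):
    """Compute r per treatment, lambda = r(m-1)/(a-1), and whether every pair co-occurs equally."""
    treatments = sorted({t for blk in blocks for t in blk})
    a = len(treatments)
    m_values = {len(blk) for blk in blocks}
    reps = Counter(t for blk in blocks for t in blk)            # r for each treatment
    pairs = Counter(frozenset(p) for blk in blocks for p in combinations(sorted(blk), 2))
    summary = {"a": a, "b": len(blocks), "r": dict(reps), "m": m_values, "lambda": None}
    if len(m_values) == 1 and len(set(reps.values())) == 1:
        r, m = reps[treatments[0]], next(iter(m_values))
        summary["lambda"] = r * (m - 1) / (a - 1)
    # Balanced: every pair of treatments appears together the same number of times
    all_pairs = [frozenset(p) for p in combinations(treatments, 2)]
    summary["balanced"] = len({pairs[p] for p in all_pairs}) == 1
    return summary

bib = [("A", "B"), ("A", "C"), ("A", "C"), ("B", "C"), ("B", "C"), ("A", "B")]
unbalanced = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
              ("B", "D"), ("C", "D"), ("A", "B"), ("C", "D")]
for name, layout in [("3 treatments, 6 blocks", bib), ("4 treatments, 8 blocks", unbalanced)]:
    s = design_summary(layout)
    print(name, "-> lambda =", s["lambda"], "| balanced:", s["balanced"])
```

For the 8-block layout, $r = 4$ and $m = 2$ give $\lambda = 4/3$, which is not an integer, consistent with the pair counts showing the design is unbalanced. The same function can be applied to the layouts in Examples 1 and 2.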

EXAMPLE 1

 Find $\lambda$. Determine whether balance is possible.

B1   B2   B3   B4
A    A    A    B
B    B    C    C
C    D    D    D

EXAMPLE 2

 Find $\lambda$. Determine whether balance is possible.

B1   B2   B3   B4   B5   B6
A    A    A    B    B    C
B    C    D    C    D    D

STATISTICAL MODEL FOR ANOVA WITH BIB

$y_{ij} = \mu + \tau_i + b_j + \epsilon_{ij}$
 $y_{ij}$ is the observation on the $i$th treatment in the $j$th block
 $\mu$ denotes the intercept
 $\tau_i$ denotes the effect of the $i$th treatment
 $b_j \sim N(0, \sigma_b^2)$ denotes the effect of the $j$th block
 $\epsilon_{ij} \sim N(0, \sigma^2)$ denotes the error term
 The main difference is that not all $i$-$j$ combinations are observed in a balanced incomplete block design

IN THIS CLASS...

 We won't dive into the analyses of these designs, but I want you to be aware of what they are.

CHAPTER 4
TWO FACTOR ANOVA

SECTION 1
TWO FACTOR ANOVA

MOTIVATING EXAMPLE

 A study is designed to see the effect of class standing (lower classmen or upper classmen) and type of review strategy (supplemental computer training, pre-course review, or neither) on how well students perform in a class.
 Factors (explanatory variables):
 Levels (levels of each factor):
 Response:

MOTIVATING EXAMPLE: INTERACTION

 Interaction: the effect of one factor changes depending on the level of another factor
 In our example:
 Degrees of freedom for the interaction term = $(i - 1)(j - 1)$, where
 $i$ is the number of levels of the first factor
 $j$ is the number of levels of the second factor

SV AND DF

Source of Variation   df
Main effect of A      a - 1
Main effect of B      b - 1
Interaction           (a - 1)(b - 1)
Error                 n - ab
Total                 n - 1

MODEL FOR TWO-FACTOR ANOVA

$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}$
 $\mu$ is the intercept
 $\alpha_i$ is the effect of the $i$th level of factor A
 $\beta_j$ is the effect of the $j$th level of factor B
 $(\alpha\beta)_{ij}$ is the interaction between the $i$th level of factor A and the $j$th level of factor B
 $e_{ijk} \sim N(0, \sigma^2)$ is the residual term

ASSUMPTIONS OF TWO-WAY ANOVA

 The observations between and within the treatment combinations are independent.
 Sampling is from a normal distribution for each treatment combination.
 The variance is the same for each treatment combination.

"FIRST" HYPOTHESES

Null: There is no interaction between factors A and B
Alt: There is an interaction between factors A and B

STATISTICAL ANALYSIS: STEPS 1 AND 2

 Look at the interaction plot and the p-value for the interaction term to see whether an interaction exists.
 A small p-value means there is evidence of an interaction.
 Move to Step 3 (Option 1)
 If the p-value is above $\alpha$ but there are more than two degrees of freedom for the interaction term, there may still be evidence of an interaction.
 There are methods to detect this interaction with formal statistical tests. We won't dive into this in this course.
 Move to Step 3 (Option 2)

INTERACTION PLOTS (2 X 2 EXAMPLE)

[Figure: two interaction plots of the response against factor A (Level 1, Level 2), each with one line per level of factor B (B1, B2)]

STATISTICAL ANALYSIS: STEP 3 (OPTION 1)

 If the interaction is significant, look at the simple effects.
 This means you're testing the differences among the levels of one factor while holding the other factor constant
 e.g., if an interaction exists between class standing and review strategy, look at:
 how upper and lower classmen compare for computer supplementation
 how upper and lower classmen compare for pre-course review
 how upper and lower classmen compare for the control group

STATISTICAL ANALYSIS: STEP 3 (OPTION 2)

 If the interaction is not significant, look at the main effects.
 Null for main effect of A: There is no difference among the mean responses for the levels of factor A
 Null for main effect of B: There is no difference among the mean responses for the levels of factor B
 This means you're testing the differences among the levels of each factor separately
 e.g., if no interaction exists between class standing and review strategy, look at:
 how upper and lower classmen compare overall (i.e., two-sample t-test)
 the effect of review strategy (i.e., one-way ANOVA)

BRIGHTNESS EXAMPLE

 An article in Industrial Quality Control describes an experiment to investigate the effect of the type of glass (two types) and the type of phosphor (three types) on the brightness of a television tube. Brightness is measured by the current necessary (in microamps) to obtain a specified brightness level.
 Factors and number of levels for each:
 Response:

MODEL

INITIAL HYPOTHESES

INTERACTION PLOT

ANOVA & DECISION ON INTERACTION

HYPOTHESES & DECISION FOR MAIN EFFECTS OF GLASS

SUMMARY FOR GLASS

HYPOTHESES & DECISION FOR MAIN EFFECTS OF PHOSPHOR

SUMMARY FOR PHOSPHOR

RECOMMENDATION

RIBBON EXAMPLE

 An experiment was conducted to aid in developing a product that can be used as a substrate for making ribbons. The experiment was designed to investigate the effects of base polymer (Mylar, nylon, and polyethylene) and additive (c1, c2, c3, and c4) on the tensile strength of the resulting ribbon.
 Factors and number of levels for each:
 Response:

INITIAL HYPOTHESES & SKELETON ANOVA

INTERACTION PLOT

ANOVA & CONCLUSION ON INTERACTION

SIMPLE EFFECTS

 Output edited:

SUMMARY

SUMMARY

QUESTION

 If a company is currently using additive C2 with Mylar as its base polymer, should it switch to using polyethylene instead of Mylar?

POKER EXAMPLE

 A study was conducted to see whether a poker player's skill (rated as average or expert) and the poker hand received (bad, neutral, or good) affected the player's cash balance at the end (in euros).
 Factors and number of levels for each:
 Response:
 Initial hypotheses:
 Interaction plot:
 Run the appropriate analysis:
Option 1: Choose to analyze either the effect of skill given poker hand OR the effect of poker hand given skill level
Option 2: Run ANOVA/two-sample t-tests on the main effects
 Summarize results

POKER EXAMPLE

 Factors and number of levels for each:
 Response:

INTERACTION PLOT/INITIAL HYPOTHESES

 Initial hypotheses/ decision:

 Interaction Plot:

APPROPRIATE ANALYSIS

 For a bad hand, an expert player earned an estimated average of 2.66 more euros than an average player (p-value = 0.022). With 95% confidence, the average cash balance is between 0.23 and 5.10 euros higher for expert players.
 For neutral or good hands, there's insufficient evidence of a difference in average cash balance between expert and average players (p-values = 0.699 and 0.518, respectively).

APPROPRIATE ANALYSIS

 For an average skill level, a neutral hand produces an estimated average of 5.18 more euros than a bad hand (p-value < 0.0001). With 95% confidence, the average cash balance is between 2.75 and 7.61 euros higher with a neutral hand.
 For an average skill level, a good hand produces an estimated average of 9.27 more euros than a bad hand (p-value < 0.0001). With 95% confidence, the average cash balance is between 6.84 and 11.70 euros higher with a good hand.
 You continue on to discuss the rest...

APPROPRIATE ANALYSIS

SECTION 2
FACTORIAL BLOCK DESIGN

MAIN IDEA

 Combining a random effect (a block) with more than one categorical factor (i.e., more than one categorical explanatory variable)
 If all treatment combinations are put into each block, the experiment design is a Randomized Complete Block Design (RCBD)
 In this class, we will only focus on two-way factorials (i.e., two-way ANOVA).
 Experimental and treatment design:
 Experiment design: RCBD
 Treatment design: A x B factorial

SV AND DF

Source of Variation   df
Block                 c - 1
Main effect of A      a - 1
Main effect of B      b - 1
Interaction           (a - 1)(b - 1)
Error                 (ab - 1)(c - 1)
Total                 n - 1
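The SV and df table can be generated for any $a$, $b$, and $c$. The sketch below (a hypothetical helper, `factorial_rcbd_df`) assumes one observation per treatment combination in each block, so $n = abc$, and checks that the rows add up to the total. For the chocolate-chip COOKIE EXAMPLE ($a = 2$ brands, $b = 2$ types, $c = 6$ people), it gives 15 error degrees of freedom, which matches the DF column in the simple-effects output later in this section.

```python
def factorial_rcbd_df(a, b, c):
    """Degrees of freedom for an A x B factorial in c complete blocks (one run per cell per block)."""
    n = a * b * c
    df = {
        "Block": c - 1,
        "A": a - 1,
        "B": b - 1,
        "A x B": (a - 1) * (b - 1),
        "Error": (a * b - 1) * (c - 1),
        "Total": n - 1,
    }
    # The source rows must add up to the total degrees of freedom
    assert sum(v for k, v in df.items() if k != "Total") == df["Total"]
    return df

# Chocolate-chip example: 2 brands x 2 types, each of 6 people (blocks) tries every combination once
print(factorial_rcbd_df(a=2, b=2, c=6))
# {'Block': 5, 'A': 1, 'B': 1, 'A x B': 1, 'Error': 15, 'Total': 23}
```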

MODEL FOR TWO-FACTOR ANOVA

$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + c_k + \epsilon_{ijk}$
 $\mu$ is the intercept
 $\alpha_i$ is the effect of the $i$th level of factor A
 $\beta_j$ is the effect of the $j$th level of factor B
 $(\alpha\beta)_{ij}$ is the interaction between the $i$th level of factor A and the $j$th level of factor B
 $c_k \sim N(0, \sigma_c^2)$ is the effect of the $k$th block
 $\epsilon_{ijk} \sim N(0, \sigma^2)$ is the residual term

COOKIE EXAMPLE

 An experiment was conducted to determine how long it takes various types of chocolate chips to dissolve in a person's mouth without chewing. Six different people tried two brands of chips (Hershey's and Nestlé) and two kinds of chips (dark vs. milk chocolate). Each person tried each combination of chips once.
 Factors and number of levels for each:
 Response:
 Block:

EXPERIMENTAL AND TREATMENT DESIGN

 Experiment design:
 Treatment design:

"SKELETON" ANOVA TABLE

MODEL

STEP 1: WAS BLOCKING EFFECTIVE?

STEP 2: IS THERE VISUAL EVIDENCE OF AN INTERACTION?

STEP 3: IS THERE STATISTICAL EVIDENCE OF AN INTERACTION?

STEP 4: ANALYSIS

 Note: R doesn't have an easy way to do multiple comparisons for a factorial block design. For problems like this, you will be given output.

Simple effect of brand given type of chocolate

Simple Effect Level   Brand                Estimate   Standard Error   DF   t-value   Adjusted p-value
dark                  Hershey's - Nestle   -18.3333   2.1949           15   -8.35     <.0001
milk                  Hershey's - Nestle   -5.1667    2.1949           15   -2.35     0.0326

SUMMARY OF CONCLUSIONS

ALTERNATIVE ANALYSIS

Simple effect of type given brand of chocolate

Simple Effect Level   Type          Estimate   Standard Error   DF   t-value   Adjusted p-value
Hershey's             dark - milk   5.5000     2.1949           15   2.51      0.0242
Nestle                dark - milk   18.6667    2.1949           15   8.50      <.0001
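Two quick consistency checks on the simple-effects output, using only the printed numbers: each t-value is the estimate divided by its standard error, and both tables describe the same 2 x 2 interaction contrast, since $(H_{dark} - N_{dark}) - (H_{milk} - N_{milk}) = (H_{dark} - H_{milk}) - (N_{dark} - N_{milk})$. The estimates are copied as rounded in the output, so the two contrasts agree only to about three decimals; the variable names are my own.

```python
# Simple-effect estimates and standard error copied from the two output tables
se = 2.1949
brand_given_type = {"dark": -18.3333, "milk": -5.1667}      # Hershey's - Nestle
type_given_brand = {"Hershey's": 5.5000, "Nestle": 18.6667}  # dark - milk

# Each t-value in the output is estimate / standard error
for label, est in {**brand_given_type, **type_given_brand}.items():
    print(f"{label}: t = {est / se:.2f}")

# Both tables describe the same 2 x 2 interaction contrast:
# (H_dark - N_dark) - (H_milk - N_milk) = (H_dark - H_milk) - (N_dark - N_milk)
contrast_1 = brand_given_type["dark"] - brand_given_type["milk"]
contrast_2 = type_given_brand["Hershey's"] - type_given_brand["Nestle"]
print(f"interaction contrast: {contrast_1:.2f} vs {contrast_2:.2f}")
```

A nonzero interaction contrast is exactly what makes the brand difference depend on the type of chocolate (and vice versa), which is why simple effects are reported here instead of main effects.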
