UNL STAT318 Notes Chapter 1-4 (2020)
CHAPTER 1
INTRODUCTION TO STATISTICAL INFERENCE
SECTION 1
INTRODUCTION TO STATISTICAL INFERENCE
9/11/2020
Mean/Average is defined as the numerical average of all observations.
Sample mean is denoted by $\bar{x}$ where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Population mean is denoted by 𝝁
Median is defined as the middle observation when the observations are ordered from smallest to largest.
If you have an even number of observations in your ordered list, the median is the average of the two middle observations.
Measures of location are usually considered incomplete without a corresponding measure of variability. The numerical measures of variability we are interested in are:
Variance
Standard deviation
Both of these numbers provide a measure of how spread out the data are. The larger the number, the more spread out the observations are.
Suppose you take a sample and observe the following: 70, 90, 80. Calculate the mean, median, standard deviation.
Mean=
Median=
Sample standard deviation using the following steps:
Step 1: Find the mean
Step 2: Find the deviations (subtract the mean from each observation). Square each deviation.
Step 3: Sum the squared deviations. Divide this by n-1. This is $s^2$.
Step 4: Take square root. This is 𝑠.
Generally, our goal in statistics is to gather information from a sample and then generalize the results from the sample to some larger population of interest.
This is done by calculating statistics – numerical properties of the sample – and using them to estimate parameters – numerical properties of the population.
This process is called statistical inference.
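For the sample 70, 90, 80 from the example above, the four steps for the standard deviation can be checked in R. This is a minimal sketch; the variable names are illustrative and not from the notes.

```r
# Sample from the example: 70, 90, 80
x <- c(70, 90, 80)
n <- length(x)

x_bar  <- mean(x)               # Step 1: find the mean
sq_dev <- (x - x_bar)^2         # Step 2: deviations from the mean, squared
s2     <- sum(sq_dev) / (n - 1) # Step 3: sum and divide by n - 1 (variance)
s      <- sqrt(s2)              # Step 4: square root (standard deviation)

x_bar
median(x)
s2
s

# Built-in functions give the same variance and standard deviation
var(x)
sd(x)
```

The built-in `var()` and `sd()` use the same n - 1 divisor as Step 3.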
Random Sampling
We use statistics (numerical summaries of the sample) to try to understand and estimate the parameters (numerical summaries of the population).
The whole purpose of statistics is to understand a set of information. This set of information is from a . . .
Population is the complete collection of ALL elements that are of interest for a given problem.
Parameter is the numerical measurement used to describe a population.
The population is often so big that obtaining all information about its elements is either difficult or impossible. So, we work with a more manageable set of data that we obtain from a . . .
Sample is a sub-collection of elements drawn from a population.
Statistic is the numerical measurement used to describe a sample.
Biased sample is a sample in which the elements selected share some property which influences their results
Convenience Sample is where units from the population of interest are selected because they are convenient to include in the sample
These are rarely representative of the population.
Simple random sampling implies that each member of the population has an equal probability of being chosen.
Tends to be representative of the population.
Allows for us to generalize our results to the population of interest
Cluster sampling: break the population into heterogeneous (diverse) groups, called clusters, and then randomly select one or more groups
Every unit within a selected cluster is in the sample.
Useful R code
nums <- seq(low, high, by = increment)
sample(object, number within the sample, replace = T(or F))
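The two template lines under "Useful R code" can be filled in to draw a simple random sample. The population size of 100, the sample size of 10, and the seed are hypothetical values chosen only for illustration.

```r
# Hypothetical population: units numbered 1 to 100
population_ids <- seq(1, 100, by = 1)

set.seed(318)  # hypothetical seed, used only so the draw is reproducible

# Simple random sample of 10 units without replacement:
# each unit has the same chance of being chosen
srs <- sample(population_ids, 10, replace = FALSE)
srs
```

Setting `replace = TRUE` instead would allow the same unit to be drawn more than once.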
Research Question – Is there statistical evidence that infants truly have a preference for more helpful toys?
Suppose 65 infants from the U.S. are part of the study, of which 55 of the 65 select the helper toy.
What is the variable of interest? Categorical or quantitative?
What is the population of interest?
Define parameter of interest.
Suppose infants have no preference. How many in the sample would you expect to choose the helper toy?
Do you think 55 out of 65 choosing the helper is evidence that infants in the population have a preference?
DECISION
Must answer the following: How unusual would it be to observe a result as extreme or more extreme by random chance alone (i.e. if the null is true)?
P-VALUE
Example: If you win 40% of the time at rock/paper/scissors, the 𝑝-value is the probability of winning 40% of the time or higher if the real chance of winning is
P-VALUES
https://simplystatistics.org/2013/08/26/statistics-meme-sad-p-value-bear/
Measures how many standard deviations away from the mean the statistic is
Standard error measures the variability of the statistic
Standardizing allows for a unitless measurement
Standardizing also allows for easy comparison
Sliding the mean to 0 and transforming the standard deviation to 1
Critical values- “cut-off” points that divide the distribution into reject/fail to reject regions
                                 𝑧-distribution            𝑡-distribution
Population standard deviation    Known                     Unknown
Critical value (when 𝛼 = 0.05)   ±1.645 (one-sided test)   Depends on degrees of freedom (df) = 𝑛 − 1
                                 ±1.96 (two-sided test)
Let’s start using a simulation to get the 𝑝-value.
http://www.rossmanchance.com/ISIapplets.html
How about we also get the 𝑝-value in R!
To do this, we will need to calculate standard error using $\sqrt{\pi(1-\pi)/n}$
𝜋 is the value of the parameter under the null hypothesis
pnorm(statistic, mean = mean of null, sd = standard error, lower.tail = F)
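For the infant study in these notes (55 of 65 infants choosing the helper toy), the `pnorm()` template can be filled in as sketched below. This assumes a one-sided alternative (a preference for the helper toy, so the null value is 0.5) and a normal approximation; the variable names are illustrative.

```r
n   <- 65    # infants in the sample
x   <- 55    # infants choosing the helper toy
pi0 <- 0.5   # null value: no preference (assumption stated above)

p_hat <- x / n                        # sample proportion (the statistic)
se    <- sqrt(pi0 * (1 - pi0) / n)    # standard error under the null
z     <- (p_hat - pi0) / se           # standardized statistic

# One-sided p-value: chance of a proportion this large or larger under the null
p_value <- pnorm(p_hat, mean = pi0, sd = se, lower.tail = FALSE)

# One-sided critical value at alpha = 0.05 (about 1.645)
z_crit <- qnorm(0.95)

z
p_value
z_crit
```

Passing the standardized statistic to `pnorm(z, lower.tail = FALSE)` gives the same p-value, because standardizing only slides the mean to 0 and rescales the standard deviation to 1.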
Notice that only 3 out of 100 samples *do not* include the true proportion.
Idea: with repeated sampling, about 95% of the confidence intervals should contain the TRUE value of the population parameter
Sample size
Bigger sample size → narrower interval
Know more about the population so range of plausible values decreases
Define parameter of interest and the statistic. What is your decision? What conclusion can be drawn about the research question?
DEFINITIONS
Parameters:
Interested in the difference:
Common Hypotheses:
Statistic:
Test statistic:
$z = \frac{\text{Statistic} - \text{Hypothesized value}}{\text{Standard error}} =$
where 𝑝 =
Ginkgo biloba and acetazolamide are two drugs used to help prevent headaches at high altitudes. A research study found that 72 out of 124 people who were randomly assigned to take ginkgo biloba got a headache (58%) while 23 out of 118 people who were randomly assigned to take acetazolamide got a headache (19.5%).
Research question: Are the two drugs different with regards to how likely a person is to get a headache?
Significance level: 𝛼 = 0.05
              Ginkgo biloba   Acetazolamide   Total
Headache      72              23              95
No headache   52              95              147
What are the hypotheses (in words and symbols)?
Now that we have done these calculations by hand, let’s write code in R that will calculate these things for us,
including the p-value!
Does your conclusion based on the test statistic match the conclusion based on the p-value?
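One way to carry out the by-hand calculation for the ginkgo biloba and acetazolamide example in R is sketched below. It assumes the standard pooled-proportion standard error for a two-proportion z-test, which may differ in notation from the formula used in class; `prop.test()` without continuity correction is included only as a cross-check.

```r
# Headache counts and group sizes from the example
x <- c(ginkgo = 72, acetazolamide = 23)
n <- c(ginkgo = 124, acetazolamide = 118)

p_hat  <- x / n                 # sample proportions (0.58 and 0.195)
p_pool <- sum(x) / sum(n)       # pooled proportion under H0 (assumed formula)

se <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z  <- unname((p_hat[1] - p_hat[2]) / se)

# Two-sided p-value and critical value at alpha = 0.05
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)
z_crit  <- qnorm(0.975)

z
p_value
z_crit

# Cross-check: the chi-square statistic from prop.test() equals z^2
check <- prop.test(x, n, correct = FALSE)
check$statistic
```

Comparing `abs(z)` with `z_crit` and `p_value` with 0.05 should lead to the same decision.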
FISHER’S EXACT TEST
Gives exact probability (i.e. exact p-value) when testing for significance (uses hypergeometric distribution)
Does not require large sample size (this is called a nonparametric test – doesn’t require any strict assumptions)
Only assume row and column totals are fixed
Can be used when success/failure proportions are near 0% or 100%
A study of birth personality in Great Britain sampled 667 people who were the oldest child and 509 people who were the youngest child. 260 of the oldest children said they were more relaxed than their other siblings while 214 of the youngest children said they were more relaxed. Is the proportion regarding who’s more relaxed different?
Significance level: 𝛼 = 0.05
We will just use R for this test!
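A minimal sketch of Fisher's exact test for the birth-order example, using base R's `fisher.test()`. The counts come from the example; the row and column labels are illustrative.

```r
# 2 x 2 table: rows are the response, columns are birth order
relaxed <- matrix(
  c(260, 667 - 260,    # oldest:   more relaxed, not more relaxed
    214, 509 - 214),   # youngest: more relaxed, not more relaxed
  nrow = 2,
  dimnames = list(
    response    = c("more relaxed", "not more relaxed"),
    birth_order = c("oldest", "youngest")
  )
)
relaxed

ft <- fisher.test(relaxed)   # two-sided by default
ft$p.value
```

The p-value can then be compared with the significance level of 0.05.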
EXAMPLE
Set up and calculate the equation for the 95% confidence interval.
Statistic ± Critical Value × Standard Error
State your decision and give a conclusion.
SECTION 3
INFERENCE FOR TWO MEANS
Parameters:
Interested in difference:
Common Hypotheses:
Statistic:
where 𝑑𝑓 =
We are now using the 𝑡-distribution, where the critical value we use to compare to our test statistic now
depends on degrees of freedom.
A random sample of 24 freshmen spent an average of 17.38 hours per week studying with a standard deviation of 5.93 while 23 seniors spent an average of 16.30 with a standard deviation of 3.72.
Research question: Do seniors spend a different amount of time studying for classes each week than freshmen?
Significance level: 𝛼 = 0.05
$\bar{x}_1 = 17.38,\ s_1 = 5.93,\ n_1 = 24$ (freshmen)
$\bar{x}_2 = 16.30,\ s_2 = 3.72,\ n_2 = 23$ (seniors)
Write the hypotheses.
Calculate the test statistic assuming unequal variances. (𝑡 ≈ 0.75)
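The unequal-variance test statistic for the freshmen and seniors example can be computed from the summary statistics as sketched below. The degrees of freedom use the Welch-Satterthwaite approximation, which is an assumption here since the notes leave the df formula blank.

```r
# Summary statistics from the example
x1 <- 17.38; s1 <- 5.93; n1 <- 24   # freshmen
x2 <- 16.30; s2 <- 3.72; n2 <- 23   # seniors

v1 <- s1^2 / n1
v2 <- s2^2 / n2

se     <- sqrt(v1 + v2)             # standard error, unequal variances
t_stat <- (x1 - x2) / se            # about 0.75

# Welch-Satterthwaite degrees of freedom (assumed df formula)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

t_crit  <- qt(0.975, df_welch)      # two-sided critical value, alpha = 0.05
p_value <- 2 * pt(abs(t_stat), df_welch, lower.tail = FALSE)

t_stat
df_welch
t_crit
p_value
```

With raw data instead of summaries, `t.test(x, y, var.equal = FALSE)` performs the same Welch test.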
Set up and calculate the equation for the 95% confidence interval.
STUDY EXAMPLE
where 𝑠 =
where 𝑑𝑓 =
A study was conducted where 69 students were randomly assigned to listen to a lecture from an attractive instructor and got an average score of 18.27 on a 25-question quiz with a standard deviation of 3.30 while the 62 students randomly assigned to an unattractive instructor got an average score of 16.68 with a standard deviation of 3.22. (Suppose Group 1: attractive instructor, Group 2: unattractive instructor)
Research question: Do students learn more from attractive teachers?
Significance level: 𝛼 = 0.05
1. Write the hypotheses.
2. Calculate the test statistic assuming equal variances by hand. The critical value is 1.657.
3. State your decision and give a conclusion.
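The equal-variance calculation for the instructor example can be checked in R from the summary statistics. This sketch assumes the usual pooled standard deviation and a one-sided alternative (Group 1 mean greater than Group 2 mean), which matches the stated critical value of 1.657.

```r
# Summary statistics from the example
x1 <- 18.27; s1 <- 3.30; n1 <- 69   # Group 1: attractive instructor
x2 <- 16.68; s2 <- 3.22; n2 <- 62   # Group 2: unattractive instructor

df <- n1 + n2 - 2                                   # pooled degrees of freedom
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df) # pooled standard deviation
se <- sp * sqrt(1 / n1 + 1 / n2)

t_stat <- (x1 - x2) / se
t_crit <- qt(0.95, df)                               # one-sided, alpha = 0.05
p_value <- pt(t_stat, df, lower.tail = FALSE)

t_stat
t_crit
p_value
```

The decision follows from comparing `t_stat` with `t_crit`, or equivalently `p_value` with 0.05.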
$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
SO FAR…
Independent groups design (Section 1.3): no connections relating individuals in one group to individuals in another group
e.g. American League (AL) teams independent of National League (NL) teams in baseball
e.g. freshmen not related to seniors for time spent studying
SECTION 4
Repeated measures: Responses come in pairs (taking information on same observational unit twice)
e.g. looking at average difference in scores on a pretest and a posttest for each person
e.g. looking at average difference in weight before a diet and after diet for each person
Matched pairs design: match individuals with similar characteristics in groups of 2 and compare the difference between each matched pair
e.g. difference in two surgery procedures for two people who are similar
matched pairs used when repeated measures not feasible
27 students are sampled for this study.
Research question: Does listening to music with lyrics affect students’ memorization ability?
Explain how each of these studies can be done using:
Independent groups
Paired design using repeated measures
Paired design using matching
Does not focus on words memorized for each individual (or matched pair)
Focuses on difference in words memorized for each individual (or matched pair)
e.g. In independent groups: one person might memorize 13 words in lyrics group and another person might memorize 15
words without lyrics
e.g. In paired design: a person might memorize 13 words with lyrics and the same person (or the other person in the
matched pair) might memorize 15 words without lyrics
Focus on difference: 13 – 15 = –2
This person memorized 2 fewer words with lyrics
Note: randomization should still be used to determine which “treatment” each person will receive first for repeated measures (or which person in the pair receives which “treatment” for matched pairs)
e.g. some people should listen to lyrics first and others should listen to the non-lyric music first; then switch off
           Value 1   Value 2   Difference (Value 2 − Value 1)
Person 1   3         10        7
Person 2   5         15        10
Person 3   7         7         0
$\bar{x}_d = 5.667$
$s_d = 5.131$
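The mean and standard deviation of the differences for the three people in the small table can be reproduced in R. The vector names `first` and `second` are illustrative labels for the two measurement columns, whose original headings are not shown.

```r
# Two measurements per person (Person 1, 2, 3)
first  <- c(3, 5, 7)
second <- c(10, 15, 7)

d <- second - first   # paired differences: 7, 10, 0

d_bar <- mean(d)      # mean of the differences
s_d   <- sd(d)        # standard deviation of the differences

d
d_bar
s_d
```

All later paired calculations (test statistic, confidence interval) use only `d_bar`, `s_d`, and the number of pairs.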
Example: A random sample of first-year students was taken at a large Midwestern university. Their weight was recorded at the beginning and end of their first year.
Research question: Do students typically have a change in weight during their first year of college?
Significance level: 𝛼 = 0.05
1. Write the hypotheses.
2. Find the mean and standard deviation of the differences in R.
3. What is the critical value? Use R.
4. Determine the test statistic by hand and in R.
5. State your decision and give a conclusion.
6. What is the 95% CI? Complete by hand and in R.
7. Interpret the interval.
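Steps 2, 3, 4 and 6 of the weight example can be sketched in R as below. The weights are hypothetical values made up for illustration, since the course data set is not reproduced in these notes; only the workflow is the point.

```r
# HYPOTHETICAL weights (lbs) for 8 students; not the data from the notes
begin <- c(150, 132, 171, 145, 160, 128, 183, 139)
end   <- c(153, 135, 170, 149, 164, 131, 185, 142)

d <- end - begin                 # difference for each student
n <- length(d)

d_bar <- mean(d)                 # Step 2: mean of the differences
s_d   <- sd(d)                   #         sd of the differences

t_crit <- qt(0.975, df = n - 1)  # Step 3: two-sided critical value

t_stat <- d_bar / (s_d / sqrt(n))                  # Step 4: by hand
ci     <- d_bar + c(-1, 1) * t_crit * s_d / sqrt(n) # Step 6: 95% CI by hand

# Steps 4 and 6 in R: paired t-test on the same data
tt <- t.test(end, begin, paired = TRUE)
tt$statistic
tt$conf.int
```

The by-hand values `t_stat` and `ci` should agree with the `t.test()` output.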
CONCLUSION QUESTION
Conclusion Question: Why does a paired design work better than independent samples in this scenario?
Paired design focuses on measuring the difference in beginning and end weight for each person
less variability
larger test statistic
smaller p-value
reject more
SECTION 5
NONPARAMETRIC ALTERNATIVE TO COMPARING POPULATIONS
T-TEST CHALLENGES
Potential alternatives:
If the populations are symmetric, can be expressed in terms of the population medians ($M_1$, $M_2$)
Typical hypotheses in terms of median difference:
Table columns: Abs Diff, Rank, Sign, Sign Rank
Step 3: Add signed rank to your table
Step 3: Calculate $V^+$ by summing all the positive ranks (or $V^-$ by summing absolute value of negative ranks)
Step 4: Again, we would use a table for the critical value – but we will use R to get a p-value instead.
We are interested in whether typical heart rates differ when using the treadmill as compared to the elliptical. 10 participants are sampled. Each participant used the elliptical and treadmill for five minutes, each on a separate day. At the five-minute mark, their heart rate was recorded. The data are presented below:
Sub     1     2     3     4     5     6     7     8     9     10
Ellip   140   150   162   142   170   154   160   120   130   150
Tread   124   155   170   120   157   145   141   121   142   161
What are the hypotheses in terms of a median (elliptical-tread)?
Calculate the appropriate statistic by hand.
What is the p-value?
What conclusion can be drawn?
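For the elliptical and treadmill heart-rate data, the signed-rank table and the sum of positive ranks can be built in R and compared with `wilcox.test()`. The column names in the table are illustrative.

```r
ellip <- c(140, 150, 162, 142, 170, 154, 160, 120, 130, 150)
tread <- c(124, 155, 170, 120, 157, 145, 141, 121, 142, 161)

d <- ellip - tread                   # differences (elliptical - tread)

tab <- data.frame(
  diff     = d,
  abs_diff = abs(d),                 # absolute differences
  rank     = rank(abs(d)),           # ranks of the absolute differences
  sign     = sign(d)                 # +1 or -1
)
tab$signed_rank <- tab$sign * tab$rank
tab

v_plus  <- sum(tab$rank[tab$sign > 0])   # sum of positive ranks
v_minus <- sum(tab$rank[tab$sign < 0])   # sum of |negative ranks|
v_plus
v_minus

# Wilcoxon signed-rank test in R; the reported V is the sum of positive ranks
wt <- wilcox.test(ellip, tread, paired = TRUE)
wt$statistic
wt$p.value
```

Because there are no tied or zero differences in these data, R computes an exact p-value.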
Typically less powerful tests (less likely to reject a false null hypothesis) than parametric tests
If reasonably certain assumptions are met for parametric method, better to use that instead of nonparametric
SECTION 6
ERRORS AND POWER
Important note: once you make a decision, only one type of error could have been made!
You must decide before conducting an experiment or study which error will be worse.
EXAMPLE
As with a jury trial, another analogy to hypothesis testing involves medical diagnostic tests. These tests aim to indicate whether or not the patient has a particular disease. But the tests are not infallible, so errors can be made.
The null hypothesis can be regarded as the patient being healthy.
The alternative hypothesis can be regarded as the patient having the disease.
Describe what Type I error represents in this situation.
Describe what Type II error represents in this situation.
Which type of error would you consider to be more serious in this situation? Explain your thinking
EXAMPLE 2
Historically, the average body temperature was known to be 98.6 degrees F; however, researchers wonder if this value has decreased.
Describe what Type I error represents in this situation.
Describe what Type II error represents in this situation.
Which type of error would you consider to be more serious in this situation? Explain your thinking
POWER
Power analyses focus on one of the following:
Calculating the sample size required to achieve a certain power
Or the opposite: calculating the power given a sample size
Researchers commonly do these before running an experiment
If you are spending the money on an experiment, you want to know you’ll be able to detect the difference if it is there…
Typically, you want at least 80% power
Example: Suppose that we are comparing two different types of seeds of corn and their yield. We know that the two different seeds have a common variance in yield of 300. We are interested in detecting a 15-unit difference in yield. We also know that we will have 25 of each type of seed. What’s the power of this test?
To use R’s function for two independent samples with equal variances, we need to calculate the effect size defined as:
$d = \frac{|\mu_1 - \mu_2|}{\sigma} =$
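For the corn seed example (common variance 300, a 15-unit difference, 25 seeds of each type), the power can be sketched with base R's `power.t.test()`. This is an assumption about tooling: the notes refer to a function that takes the effect size 𝑑 directly, whereas `power.t.test()` takes the difference and the standard deviation separately. A two-sided test at 𝛼 = 0.05 is also assumed.

```r
sigma <- sqrt(300)   # common standard deviation (variance = 300)
delta <- 15          # difference in mean yield we want to detect
n_per_group <- 25    # seeds of each type

d <- abs(delta) / sigma   # effect size, about 0.866
d

pw <- power.t.test(
  n = n_per_group,
  delta = delta,
  sd = sigma,
  sig.level = 0.05,
  type = "two.sample",
  alternative = "two.sided"
)
pw$power
```

Changing one argument at a time (`n`, `sd`, `delta`, or `sig.level`) while holding the others fixed shows how each quantity moves the power.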
Try the following values for the sample size (5, 10, 30, 50) while holding everything else constant. What happens to the power as you change the sample size in each group?
Try the following values for the variances ($\sigma^2$ = 100, 200, 400, 500), while holding everything else constant (from the initial scenario). What happens to the power as you increase and decrease the variances?
Try the following values for $\mu_1 - \mu_2$ = 1, 5, 20 and 30, while holding everything else constant (from the initial scenario). What happens to the power as you change the size of the difference?
Try the following values for the 𝛼 (0.001, 0.01, 0.1, 0.25), while holding everything else constant (from the initial scenario). What happens to the power as you change the significance level?
Example: Suppose that we are comparing freshmen’s and sophomores’ Exam 1 grades in STAT 218. We are interested in finding a 10-point grade difference between freshmen and sophomores. Assume both sophomores and freshmen have a common variance of 100. What sample size for each group do you need to have in order to have 80% power?
Try the following values for $\mu_1 - \mu_2$ = 1, 5, 15, and 20, while holding everything else constant (from the initial scenario). What happens to the required sample sizes as you increase and decrease the size of the differences?
Try the following values for the variances ($\sigma^2$ = 16, 50, 150, 200), while holding everything else constant (from the initial scenario). What happens to the required sample sizes as you increase and decrease the variances?
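For the STAT 218 exam example (a 10-point difference, common variance 100, 80% power), the required sample size per group can be sketched with base R's `power.t.test()` by leaving `n` unspecified. A two-sided test at 𝛼 = 0.05 is assumed, and the function choice is an assumption rather than the one used in class.

```r
ss <- power.t.test(
  delta = 10,          # grade difference to detect
  sd = sqrt(100),      # common standard deviation (variance = 100)
  power = 0.80,
  sig.level = 0.05,
  type = "two.sample",
  alternative = "two.sided"
)

ss$n            # fractional sample size per group
ceiling(ss$n)   # round up to whole students per group
```

Rounding up is the usual convention, since rounding down would leave the power slightly below the target.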
Try the following values for the power (0.5, 0.6, 0.9, 0.99), while holding everything else constant (from the initial
scenario). What happens to the required sample sizes as you change the power ?
Try the following values for the 𝛼 (0.001, 0.01, 0.1, 0.25), while holding everything else constant (from the initial
scenario). What happens to the required sample sizes as you change the significance level?
You can do this in almost any study, but we’ve just demonstrated this for two independent means with equal variances
CHAPTER 2
ONE-WAY ANALYSIS OF VARIANCE (ANOVA)
SO FAR..
Compared two groups
Florida vs Iowa with relation to ice cream consumption
Freshman vs senior with regards to study habits
Attractive vs unattractive instructor with regards to learning
Generally, it is a mathematical formula showing the relationship between the response variable and explanatory variable(s)
Often, written as follows:
𝑌 = 𝑀𝑜𝑑𝑒𝑙 + 𝐸𝑟𝑟𝑜𝑟
where the model is where you put your explanatory variables, experimental design components (discussed later), etc
Suppose we are interested in comparing if salaries tend to increase as education level (associate’s, bachelor’s, master’s, doctoral) increases.
Explanatory:
Number of levels:
Response:
Typical overarching research questions: Are there at least two population means that differ?
$y_{ij} = \mu + \tau_i + e_{ij}$
$y_{ij}$ denotes the response for the $i^{th}$ group and $j^{th}$ observation
$\mu$ is the intercept (the overall mean for the baseline group that all other groups are compared to)
$\tau_i$ is the effect of the $i^{th}$ group (how much that group mean deviates from the intercept)
$e_{ij} \sim N(0, \sigma^2)$ denotes the error term
Hypotheses:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_a$: at least two population means differ (i.e. $\mu_i \neq \mu_j$ for some i, j)
Notes: Ideally, observational (experimental) units randomly assigned to explanatory variable groups
Referred to as completely randomized design
If not possible to randomly assign, important to independently sample from each group
ANOVA is calculated using the 𝐹-statistic by measuring the ratio of variability between groups and variability within groups
If 𝐹 is large, the variability between groups is a lot larger than the variability within groups
For $i = 1, 2, \ldots, k$ groups and $j = 1, 2, \ldots, n_i$ observations in each $i^{th}$ group, an ANOVA table can be constructed as follows:
Error
Total
Model sums of squares signifies between group variability (how much the means vary from group to group)
Used to calculate the between-group variance
Total sums of squares signifies variability of each observation about the overall mean
Assume the data below represents the time (in minutes) of how long it takes to wait in line at Starbucks on
different days of the week
STEP 2: CALCULATE THE OVERALL MEAN
STEP 3: CALCULATE THE SUMS OF SQUARES FOR THE MODEL.
Calculate by multiplying the sample size for each group by the squared deviation of that group’s sample mean from the overall mean, then summing over the groups.
STEP 4: CALCULATE THE SUMS OF SQUARES FOR THE ERROR
Calculate by summing the squared deviations of each observation from its group mean
STEP 5: CALCULATE SUM OF SQUARES TOTAL
Source   df   SS   MS   F   𝑝-value
Day
Error
Total
What is the critical value for F? (Hint: To obtain the critical value for the 𝐹-statistic in R, plug in 1 − 𝛼, the degrees of freedom for the model, and the degrees of freedom for the error)
qf(1 - 𝛼, df_model, df_error)
Conclusion:
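Steps 1 through 5 and the `qf()` hint can be traced in R as sketched below. The wait times and day labels are hypothetical, because the data table for the Starbucks example is not reproduced in these notes; `anova(lm(...))` is used only to confirm the by-hand table.

```r
# HYPOTHETICAL wait times (minutes), 3 observations on each of 3 days
wait <- c(4, 5, 6,   7, 8, 9,   5, 6, 10)
day  <- factor(rep(c("Mon", "Wed", "Fri"), each = 3))

group_means  <- tapply(wait, day, mean)    # Step 1: mean for each group
overall_mean <- mean(wait)                 # Step 2: overall mean
n_i          <- tapply(wait, day, length)  # sample size for each group

# Step 3: model SS = sum of n_i * (group mean - overall mean)^2
ss_model <- sum(n_i * (group_means - overall_mean)^2)
# Step 4: error SS = sum of (observation - its group mean)^2
ss_error <- sum((wait - group_means[as.character(day)])^2)
# Step 5: total SS = sum of (observation - overall mean)^2
ss_total <- sum((wait - overall_mean)^2)

df_model <- nlevels(day) - 1
df_error <- length(wait) - nlevels(day)

f_stat  <- (ss_model / df_model) / (ss_error / df_error)
f_crit  <- qf(1 - 0.05, df_model, df_error)   # critical value at alpha = 0.05
p_value <- pf(f_stat, df_model, df_error, lower.tail = FALSE)

c(ss_model = ss_model, ss_error = ss_error, ss_total = ss_total)
c(F = f_stat, critical = f_crit, p = p_value)

# Same table from R
fit <- anova(lm(wait ~ day))
fit
```

A useful check on the hand calculation is that the model and error sums of squares add up to the total.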
Assume the data below represents the time (in minutes) of how long it takes to wait at different fast-food restaurants.
Researchers randomly assigned 15 different people to either wait at McDonald’s, Popeye’s, or Fazolli’s on random days.
Researchers are interested if different restaurants have different wait times. State the hypotheses and find the ANOVA
table.
RESTAURANT EXAMPLE HYPOTHESES
STEP 1: CALCULATE THE SAMPLE MEAN FOR EACH GROUP
STEP 2: CALCULATE THE OVERALL MEAN
STEP 3: CALCULATE THE SUMS OF SQUARES FOR THE MODEL.
Calculate by multiplying the sample size for each group by the squared deviation of that group’s sample mean from the overall mean, then summing over the groups.
STEP 4: CALCULATE THE SUMS OF SQUARES FOR THE ERROR
Calculate by summing the squared deviations of each observation from its group mean
STEP 5: CALCULATE SUM OF SQUARES TOTAL
Source   df   SS   MS   F   𝑝-value
Rest.
Error
Total
What is the critical value?
Conclusion?
Freshmen
Sophomores
Juniors
Seniors
Just eyeballing it, do you think there is a difference in groups?
Source df SS MS F 𝑝-value
Class
Error
Total
MAJORS EXAMPLE
Is there an association between major and GPA? Use R to answer this question.
Hypotheses:
Conclusion:
MULTIPLE COMPARISONS
When the null hypothesis is rejected, a post hoc test needs to be performed to see which level is different
If we do _____ independent t-tests (like we did in Chapter 1), each test will have $P(\text{Type I Error}) = \alpha$
Called comparison-wise error rate
In major example, number of comparisons:
What’s the probability of making at least one error in all of the comparisons assuming 𝛼 = 0.05?
Formula:
This increases Type I error to about ______ % for the whole experiment
Called the “family-wise” or “experiment-wise” error rate
A multiplicity adjustment needs to be done to ensure that the experiment-wise Type I error stays at the specified 𝛼 level
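The growth of the experiment-wise error rate can be illustrated numerically. The sketch below assumes the comparisons are independent, so the chance of at least one Type I error is 1 − (1 − 𝛼)^m, and it uses a hypothetical 4 groups because the number of majors is not stated here. The Bonferroni adjustment shown at the end is just one simple example of a multiplicity adjustment, not necessarily the one used in class.

```r
alpha <- 0.05
k <- 4                   # HYPOTHETICAL number of groups
m <- choose(k, 2)        # number of pairwise comparisons

# Probability of at least one Type I error across m independent tests
fwer <- 1 - (1 - alpha)^m

# One simple multiplicity adjustment (Bonferroni): test each pair at alpha / m
alpha_bonf <- alpha / m
fwer_bonf  <- 1 - (1 - alpha_bonf)^m

m
fwer
alpha_bonf
fwer_bonf
```

With the adjusted per-comparison level, the experiment-wise rate stays at or below the specified 𝛼.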
POST-HOC TEST FOR GPA BY MAJOR EXAMPLE
CHAPTER 3
EXPERIMENT DESIGN
SECTION 1
RANDOMIZED COMPLETE BLOCK DESIGN (RCBD)
How could we design this based on the information we have learned in this course so far?
Note: The numbers in the circles represent each of the learning activities while the rectangles represent the 4 classrooms, each taught by a different instructor
If learning activity #3 turns out to be the best and learning activity #1 is the worst, why might that be an issue for the first design?
We can take 4 classes (each taught by a different instructor) with 3 students in each class. We can then randomly assign one student in each class to receive each of the learning activities
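ANOVA WITH RCBD
$y_{ij} = \mu + \tau_i + b_j + \epsilon_{ij}$
$y_{ij}$ is the observation on the $i^{th}$ treatment and $j^{th}$ block
$\mu$ denotes the intercept
$\tau_i$ denotes effect of $i^{th}$ treatment
$b_j \sim N(0, \sigma_b^2)$ denotes effect of $j^{th}$ block
$\epsilon_{ij} \sim N(0, \sigma^2)$ denotes error term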
The block term is considered a “random” effect because the blocks don’t consist of the entire population (e.g. other classes could be chosen)
The treatment is a fixed effect (e.g. only those three learning activities are being considered)
Difference between treatments does not depend on blocks
Source      𝑑𝑓               SS          MS                      F
Block       𝑏 − 1
Treatment   𝑡 − 1            SS(T)       MS(T) = SS(T)/(𝑡 − 1)   MS(T)/MS(E)
Error       (𝑏 − 1)(𝑡 − 1)
Total       𝑏𝑡 − 1           SS(Total)
Explanatory (treatment):
Response:
Block:
MODEL
Variance components
CONCLUSION
What’s the estimated average MPG for a Type 5?
Summary of conclusions (Hint: think Post Hoc tests)
We are interested in examining how the amount of fat in cookie dough affects a cookie’s texture. There are four recipes of interest. (Note that the texture of the cookie is measured by determining the amount of force (in grams) required to penetrate the cookie surface). There are four different bakers, and each baker prepares each of the recipes in a random order.
The data for this example is already embedded within the Block Design.R code
Explanatory:
Response:
HYPOTHESES
MODEL
Initial hypotheses:
Insufficient evidence that any of the recipes require a different average force (p-value = 0.3535), but since we did block by baker, this isn’t the appropriate analysis. Do not use this analysis.
MOTIVATION
Only a subset of all the treatments are applied to each block
Example: Recall the classroom teaching example for Section 3.1. Imagine we now want to compare 8 treatments (rather than three) and have 6 students per class.
(Figure: treatment numbers 1 to 8 assigned to the 6 students in each of the 4 classrooms.)
Incomplete block designs can either be balanced or unbalanced
Balanced design (Balanced incomplete block design; BIB):
Each pair of treatments appears together 𝜆 times
Example: Suppose we have 3 treatments and 6 blocks. Each treatment appears with each other treatment 2 times (𝜆 = 2)
Block 1   Block 2   Block 3   Block 4   Block 5   Block 6
A         A         A         B         B         A
B         C         C         C         C         B
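Whether a design is balanced can be checked by counting how often each pair of treatments shares a block. The sketch below does this for the 3-treatment, 6-block example (blocks A-B, A-C, A-C, B-C, B-C, A-B); the helper logic is illustrative and not from the notes.

```r
# Treatments in each of the 6 blocks, read down the columns of the layout
blocks <- list(
  c("A", "B"), c("A", "C"), c("A", "C"),
  c("B", "C"), c("B", "C"), c("A", "B")
)

# List every pair of treatments that appears together within a block
pairs <- unlist(lapply(blocks, function(b) {
  combn(sort(b), 2, paste, collapse = "-")
}))

# Number of blocks in which each pair appears together
lambda <- table(pairs)
lambda

# Balanced if every pair appears together the same number of times
is_balanced <- length(unique(as.vector(lambda))) == 1
is_balanced
```

The same code works for blocks with more than two treatments, because `combn()` enumerates all pairs within each block.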
EXAMPLE 1
B1   B2   B3   B4
A    A    A    B
B    B    C    C
C    D    D    D
EXAMPLE 2
B1   B2   B3   B4   B5   B6
A    A    A    B    B    C
B    C    D    C    D    D
$y_{ij} = \mu + \tau_i + b_j + \epsilon_{ij}$
$y_{ij}$ is the observation on the $i^{th}$ treatment and $j^{th}$ block
$\mu$ denotes the intercept
$\tau_i$ denotes effect of $i^{th}$ treatment
$b_j \sim N(0, \sigma_b^2)$ denotes effect of $j^{th}$ block
$\epsilon_{ij} \sim N(0, \sigma^2)$ denotes error term
Main difference is that not all i-j combinations are possible with balanced incomplete blocks
We won’t dive into the analyses of these designs, but I want you to be aware of what they are.
CHAPTER 4
TWO FACTOR ANOVA
SECTION 1
TWO FACTOR ANOVA
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}$
$\mu$ is the intercept
$\alpha_i$ is effect of $i^{th}$ level of factor A
$\beta_j$ is effect of the $j^{th}$ level of factor B
$(\alpha\beta)_{ij}$ is interaction between the $i^{th}$ level of factor A and $j^{th}$ level of factor B
$e_{ijk} \sim N(0, \sigma^2)$ is residual term
Source of Variation   df
Main effect of A      a-1
Main effect of B      b-1
Interaction           (a-1)(b-1)
Error                 n-ab
Total                 n-1
The observations between and within the treatment combinations are independent.
Sampling from a normal distribution for each treatment combination.
The variance is the same for each treatment combination.
Null: There is no interaction between factors A and B
Alt: There is an interaction between factors A and B
Look at interaction plot and p-value for the interaction term to see if an interaction exists.
Small p-value means there is evidence of an interaction.
Move to Step 3 (Option 1)
If the p-value is above 𝛼 but there are more than two degrees of freedom for the interaction term, there may still be evidence of an interaction.
There are methods to detect this interaction with formal statistical tests. We won’t dive into this in this course.
Move to Step 3 (Option 2)
(Figure: two interaction plots of Response with lines for B1 and B2.)
If the interaction is significant, look at the simple effects.
This means you’re testing the differences in the levels at one factor while holding the other factor constant
e.g. if an interaction exists between class standing and review strategy, look at:
how upper and lower classmen compare for computer supplementation
how upper and lower classmen compare for pre-course review
how upper and lower classmen compare for control group
If the interaction is not significant, look at the main effects.
Null for main effect of A: There is no difference in the means of factor A
Null for main effect of B: There is no difference in the means of factor B
This means you’re testing the differences in the levels for each factor separately
e.g. if no interaction exists between class standing and review strategy, look at:
how upper and lower classmen compare overall (i.e. two sample t-test)
effect of review strategy (i.e. one way ANOVA)
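The workflow of plotting the interaction, testing it, and then choosing between simple effects and main effects can be sketched in R. The data below are simulated and purely hypothetical (2 levels of A, 3 levels of B, 3 replicates per combination); only the structure of the analysis is meant to carry over.

```r
# HYPOTHETICAL balanced two-factor data: 2 x 3 design, 3 replicates each
set.seed(318)
dat <- expand.grid(rep = 1:3,
                   A = c("A1", "A2"),
                   B = c("B1", "B2", "B3"))
dat$y <- rnorm(nrow(dat), mean = 50, sd = 5)

# Visual evidence of an interaction: non-parallel lines suggest one
with(dat, interaction.plot(A, B, y))

# Two-factor ANOVA with interaction: y ~ A + B + A:B
fit <- aov(y ~ A * B, data = dat)
tab <- summary(fit)[[1]]
tab

# p-value for the interaction term (third row of the table)
p_interaction <- tab[["Pr(>F)"]][3]
p_interaction
```

The degrees of freedom in `tab` follow the table in the notes: a − 1, b − 1, (a − 1)(b − 1), and n − ab for error.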
An article in Industrial Quality Control describes an experiment to investigate the effect of the type of glass (two
types) and the type of phosphor (3 types) on the brightness of a television tube. Brightness is measured by the
current necessary (in micro amps) to obtain a specified brightness level.
Factors and number of levels for each:
Response
ANOVA & DECISION ON INTERACTION
HYPOTHESES & DECISION FOR MAIN EFFECTS OF GLASS
SUMMARY FOR GLASS
HYPOTHESES & DECISION FOR MAIN EFFECTS OF PHOSPHOR
An experiment was conducted to aid in developing a product that can be used as a substrate for making ribbons.
The experiment was designed to investigate the effects of base polymer (Mylar, nylon and polyethylene) and
additive (c1, c2, c3 and c4 ) on the tensile strength of the resulting ribbon.
Factors and number of levels for each:
Response
Output edited:
SUMMARY QUESTION
If a company is currently using additive C2 and currently has access to Mylar as their base polymer, should they
switch to using polyethylene instead of Mylar?
A study was conducted to see if a poker player’s skill (rated as average or expert) and poker hand received (bad, neutral, or good) affected the player’s cash balance at the end (in euros).
Factors and number of levels for each:
Response:
Initial hypotheses:
Interaction Plot:
Run appropriate analysis:
Option 1: Choose to either analyze effect of skill given poker hand OR analyze effect of poker hand given skill level
Option 2: Run ANOVA/two sample t-test on main effects
Summarize results
Interaction Plot:
For a bad hand, an expert player earned an estimated average of 2.66 more euros than an average player (p-value=0.022). With 95% confidence, the average cash balance increases between 0.23 and 5.10 euros.
For neutral or good hands, there’s insufficient evidence of a difference in average cash balance between expert and average players (p-value=0.699 and 0.518 respectively).
For an average skill level, a neutral hand produces an estimated average of 5.18 more euros than a bad hand (p-value < 0.0001). With 95% confidence, the average cash balance increases between 2.75 and 7.61 euros.
For an average skill level, a good hand produces an estimated average of 9.27 more euros than a bad hand (p-value < 0.0001). With 95% confidence, the average cash balance increases between 6.84 and 11.70 euros.
APPROPRIATE ANALYSIS
SECTION 2
FACTORIAL BLOCK DESIGN
Combining random effect (a block) with more than one categorical factor (i.e. more than one categorical explanatory variable)
If all treatment combinations are put into a block, the experiment design is a Randomized Complete Block Design (RCBD)
In this class, we will only focus on two-way factorials (i.e. two-way ANOVA).
Source of Variation   df
Block                 c-1
Main effect of A      a-1
Total                 n-1
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + c_k + \epsilon_{ijk}$
$\mu$ is the intercept
$\alpha_i$ is effect of the $i^{th}$ level of factor A
An experiment was conducted to determine how long it takes various types of chocolate chips to dissolve in a mouth without chewing. Six different people tried two brands of chips (Hershey’s and Nestlé) and two kinds of chips (dark vs milk chocolate). Each person tries each combination of chip once.
Factors and number of levels for each:
Block:
Experiment Design:
Treatment design:
2ND STEP: IS THERE VISUAL EVIDENCE OF AN INTERACTION?
3RD STEP: IS THERE STATISTICAL EVIDENCE OF AN INTERACTION?
Note: R doesn’t have an easy ability to do multiple comparisons for a factorial block design. For problems like this, you will be given output.
ALTERNATIVE ANALYSIS
Simple effect level given brand of chocolate