Class Notes, Statistical Methods in Research 1

Contents

Part I: Part 1 Material

1 Descriptive Statistics
  1.1 Concept
  1.2 Summary Statistics
    1.2.1 Location
    1.2.2 Spread
    1.2.3 Effect of shifting and scaling measurements
  1.3 Graphical Summaries
    1.3.1 Dot Plot
    1.3.2 Histogram
    1.3.3 Box-Plot
    1.3.4 Pie chart
    1.3.5 Scatterplot

2 Probability
  2.1 Sample Space and Events
    2.1.1 Basic concepts
    2.1.2 Relating events
  2.2 Probability
  2.3 Conditional Probability and Independence
    2.3.1 Independent Events
    2.3.2 Law of Total Probability
    2.3.3 Bayes Rule
  2.4 Random Variables
    2.4.1 Expected Value And Variance
    2.4.2 Population Percentiles
    2.4.3 Common Discrete Distributions
    2.4.4 Common Continuous Distributions
    2.4.5 Covariance
    2.4.6 Mean and variance of linear combinations
    2.4.7 Central Limit Theorem

3 Inference For Population Mean
  3.1 Confidence intervals
  3.2 Hypothesis Testing

Part II: Part 2 Material

4 Inference For Population Proportion
5 Inference For Two Population Means
6 Nonparametric Procedures For Population Location
7 Inference About Population Variances
8 Contingency Tables

Part III: Part 3 Material

9 Regression
  9.1 Simple Linear Regression
  9.2 Multiple Regression
Part I
Part 1 Material
Chapter 1
Descriptive Statistics
Chapter 3 in textbook
1.1 Concept

1.2 Summary Statistics

1.2.1 Location

The sample mean is the average of the observations,
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
The α% trimmed mean is the mean of the data with the smallest α%·n observations and the largest α%·n observations truncated from the data.

The pth percentile value divides the ordered data such that p% of the data are less than that value and (100 − p)% are greater than it. It is located at position (p/100)(n + 1) of the ordered data. If the position value is not an integer, average the values at positions ⌊(p/100)(n + 1)⌋ and ⌈(p/100)(n + 1)⌉. The median is the 50th percentile.

According to the textbook p.76, the jth ordered observation corresponds to the 100(j − 0.5)/n percentile.
Example 1.1. The following values of fracture stress (in megapascals) were
measured for a sample of 24 mixtures of hot mixed asphalt (HMA).
30 75 79 80 80 105 126 138 149 179 179 191
223 232 232 236 240 242 245 247 254 274 384 470
Hence, $\sum_{i=1}^{24} x_i = 30 + 75 + \cdots + 384 + 470 = 4690$ and thus $\bar{x} = 4690/24 = 195.4167$.

The median is the average of the observations at the 12th and 13th positions of the ordered data, i.e. $\tilde{x} = (191 + 223)/2 = 207$.

There are three modes: 80, 179 and 232.

To compute the 5% trimmed mean we need to remove 0.05(24) = 1.2 ≈ 1 observation from the lower and upper side of the data. Hence remove 30 and 470 and recalculate the average of the remaining 22 observations. That is 190.45.

The 25th percentile (a.k.a. 1st Quartile) is located at position (25/100)(24 + 1) = 6.25. So average the values at the 6th and 7th positions, i.e. (105 + 126)/2 = 115.5.
http://www.stat.ufl.edu/~ athienit/STA6166/loc_stats.pdf
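A minimal R sketch reproducing these location statistics (the vector name hma is ours):

# Fracture stress data (MPa) from Example 1.1
hma <- c(30, 75, 79, 80, 80, 105, 126, 138, 149, 179, 179, 191,
         223, 232, 232, 236, 240, 242, 245, 247, 254, 274, 384, 470)
mean(hma)                        # 195.4167
median(hma)                      # 207
mean(hma, trim = 0.05)           # 5% trimmed mean, about 190.45
quantile(hma, 0.25, type = 6)    # R interpolates, so this differs slightly from the hand rule (115.5)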
Remark 1.1. Note that the mean is more sensitive to outliers (observations that do not fall in the general pattern of the rest of the data) than the median. Assume we have values 2, 3, 5. The mean is 3.33 and the median is 3. Assume we now have 2, 3, 5, 112. The mean is 30.5 but the median is now 4.
1.2.2 Spread

1.2.3 Effect of shifting and scaling measurements

If each observation is shifted and scaled, $y_i = a x_i + b$, then the sample variance becomes
$$s_y^2 = \frac{1}{n-1}\sum (y_i - \bar{y})^2 = \frac{1}{n-1}\sum (a x_i + b - a\bar{x} - b)^2 = \frac{a^2}{n-1}\sum (x_i - \bar{x})^2 = a^2 s_x^2.$$
1.3 Graphical Summaries

1.3.1 Dot Plot

Stack each observation on a horizontal line to create a dot plot that gives an idea of the shape of the data. Some rounding of data values is allowed in order to stack.

[Dot plot of the fracture stress data, horizontal axis roughly 100 to 400 MPa]
1.3.2 Histogram

Class       Freq.   Relative Freq.   Density
[0,100)       5     5/24 = 0.208     0.208/100 = 0.00208
[100,200)     7     7/24 = 0.292     0.292/100 = 0.00292
[200,300)    10     0.417            0.00417
[300,400)     1     0.0417           0.000417
[400,500)     1     0.0417           0.000417
[Histogram of the fracture stress data on the density scale (0.000 to 0.004), horizontal axis from 100 to 500 MPa]
Remark 1.2. We may use Frequency, Relative Frequency or Density as the vertical axis when the class widths are equal. However, class widths are not necessarily equal; unequal widths are usually used to create smoother graphics when not mandated by the situation at hand. In that case we must use Density, which accounts for the width, because large classes may otherwise have unrepresentatively large frequencies.
http://www.stat.ufl.edu/~ athienit/STA6166/hist1_boxplot1.R
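A minimal R sketch of such a density-scale histogram (class limits chosen here for illustration; hma as defined above):

hma <- c(30, 75, 79, 80, 80, 105, 126, 138, 149, 179, 179, 191,
         223, 232, 232, 236, 240, 242, 245, 247, 254, 274, 384, 470)
# freq = FALSE puts Density (not counts) on the vertical axis
hist(hma, breaks = seq(0, 500, by = 100), freq = FALSE,
     main = "Histogram", xlab = "Fracture stress (MPa)")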
1.3.3 Box-Plot

[Box-plot of the fracture stress data, axis from 100 to 400 MPa]

[Density curves illustrating right-skewed ("Skewed right") and left-skewed ("Skewed left") distributions]
1.3.4 Pie chart

A pie or circle has 360 degrees. For each category of a variable, the size of the slice is determined by the fraction of 360 that corresponds to that category.

Example 1.3. There is a total of 337,297,000 native English speakers in the world, categorized as

Country     Share
USA          67%
UK           17%
Canada        6%
Australia     5%
Other         6%

[Pie chart of native English speakers by country]
1.3.5 Scatterplot

It is used to plot the raw 2-D points of two variables in an attempt to discern a relationship.

Example 1.4. A small study with 7 subjects on the pharmacodynamics of LSD, on how LSD tissue concentration affects the subjects' math scores, yielded the following data.

[Scatterplot of math score (roughly 30 to 80) against LSD tissue concentration]
Chapter 2
Probability
Chapter 4.1 - 4.5 in textbook.
The study of probability began in the 17th century when gamblers started hiring mathematicians to calculate the odds of winning for different types of games.

2.1 Sample Space and Events

2.1.1 Basic concepts
2.1.2 Relating events

When we are concerned with multiple events within the sample space, Venn diagrams are useful to help explain some of the relationships. Let's illustrate this via an example.
Example 2.3. Let,
S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
A = {1, 3, 5, 7, 9}
B = {6, 7, 8, 9, 10}
2.2 Probability

Notation: Let P(A) denote the probability that the event A occurs. It is the proportion of times that the event A would occur in the long run.

Axioms of Probability:
- P(S) = 1
- 0 ≤ P(A) ≤ 1, since A ⊆ S
- If A₁, A₂, ... are mutually exclusive, then P(A₁ ∪ A₂ ∪ ⋯) = P(A₁) + P(A₂) + ⋯

As a result of the axioms we have that P(A) = 1 − P(Aᶜ) and that P(∅) = 0.
Example 2.4. In a computer lab there are 4 computers and once a day a technician inspects them and counts the number of computer crashes. Hence, S = {0, 1, 2, 3, 4} and

Crashes      0     1     2     3     4
Probability  0.60  0.30  0.05  0.04  0.01
2.3 Conditional Probability and Independence

Definition 2.4. A probability that is based upon the entire sample space is called an unconditional probability, but when it is based upon a subset of the sample space it is a conditional (on the subset) probability.

Definition 2.5. Let A and B be two events with P(B) ≠ 0. Then the conditional probability of A given B (has occurred) is
$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$

The reason that we divide by the probability of the given occurrence, i.e. P(B), is to re-standardize the sample space. We update the sample space to be just B, i.e. S = B, and hence P(B|B) = 1. The only part of event A that occurs within this new S = B is P(A ∩ B).

Proposition 2.1. Rule of Multiplication:
- If P(A) ≠ 0, then P(A ∩ B) = P(B|A)P(A).
- If P(B) ≠ 0, then P(A ∩ B) = P(A|B)P(B).
Example 2.7. A player serving at tennis is only allowed one fault. At a double fault the server loses a point (the other player gains a point). Given the following information:

[Tree diagram: the first serve is a fault with probability 0.44 and a success with probability 0.56; given a fault on serve 1, the second serve is a fault (loss of point) with probability 0.02 and a success with probability 0.98.]

What is the probability that the server loses a point, i.e. P(Fault 1 and Fault 2)?

P(Fault 1 and Fault 2) = P(Fault 2|Fault 1)P(Fault 1) = (0.02)(0.44) ≈ 0.009
2.3.1 Independent Events

When the given occurrence of one event does not influence the probability of a potential outcome of another event, then the two events are said to be independent.

Definition 2.6. Two events A and B are independent if the probability of each remains the same, whether or not the other has occurred. If P(A) ≠ 0 and P(B) ≠ 0, then
$$P(B|A) = P(B) \iff P(A|B) = P(A).$$
If either P(A) = 0 or P(B) = 0, then the two events are independent.

Definition 2.7. (Generalization) The events A₁, ..., Aₙ are independent if for each Aᵢ and each collection A_{j1}, ..., A_{jm} of events with P(A_{j1} ∩ ⋯ ∩ A_{jm}) ≠ 0,
$$P(A_i \mid A_{j_1} \cap \cdots \cap A_{j_m}) = P(A_i).$$

As a consequence of independence, the rule of multiplication then becomes
$$P\left(\bigcap_{i=1}^{k} A_i\right) \overset{\text{ind.}}{=} \prod_{i=1}^{k} P(A_i), \quad 0 < k \le n.$$
2.3.2 Law of Total Probability

If A₁, ..., Aₙ are mutually exclusive events whose union is the sample space, then for any event B,
$$P(B) = \sum_{i=1}^{n} P(B|A_i)P(A_i).$$

What is the probability that a randomly chosen car will fail the emissions test within 10 years?

IN CLASS
2.3.3 Bayes Rule

$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|A^c)P(A^c)}.$$

For a transmitted signal, suppose
$$P(\text{dash sent}) = \frac{4}{7}.$$
Suppose that there is some interference and with probability 1/8 a dot is mistakenly received on the other end as a dash, and vice versa. Find P(dot sent|dash received).
IN CLASS
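A hedged R sketch of the Bayes-rule calculation above; it assumes P(dot sent) = 3/7, i.e. the complement of the stated P(dash sent):

p_dash_sent <- 4/7
p_dot_sent  <- 1 - p_dash_sent           # 3/7, assuming dots and dashes are the only signals
p_dash_rec_given_dot  <- 1/8             # a dot is mistakenly received as a dash
p_dash_rec_given_dash <- 1 - 1/8         # a dash is received correctly
# Bayes rule: P(dot sent | dash received)
p_dot_sent * p_dash_rec_given_dot /
  (p_dot_sent * p_dash_rec_given_dot + p_dash_sent * p_dash_rec_given_dash)
# = 3/31, about 0.097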
2.4 Random Variables

Notation: For a continuous random variable (r.v.) X, the probability density function (p.d.f.), denoted by f_X(x), models the relative frequency of X. Since there are infinitely many outcomes within an interval, the probability evaluated at a single point is always zero, i.e. P(X = x) = 0 for all x when X is a continuous r.v.

Conditions for a function to be a:
- p.m.f.: 0 ≤ p(x) ≤ 1 and Σₓ p(x) = 1
- p.d.f.: f(x) ≥ 0 and ∫ f(x) dx = 1

For instance, a discrete r.v. can have p.m.f. values p(x): 0.1334, 0.5333, 0.3333 (Total = 1.0).
Example 2.12. (Continuous) The lifetime of a certain battery has a distribution that can be approximated by f(x) = 0.5e^{−0.5x}, x > 0. The probability that the lifetime is at most 1 is
$$P(X \le 1) = \int_{-\infty}^{0} 0\,dx + \int_{0}^{1} 0.5e^{-0.5x}\,dx = 0 + \left(-e^{-0.5x}\right)\Big|_{0}^{1} = 1 - e^{-0.5} = 0.3935.$$
2.4.1 Expected Value And Variance

The expected value of a r.v. is thought of as the long-run average for that variable. Similarly, the variance is thought of as the long-run average squared distance of the values of the r.v. from the expected value.

Definition 2.10. The expected value (or mean) of a r.v. X is
$$\mu_X := E(X) = \int x f(x)\,dx \quad \left(\overset{\text{discrete}}{=} \sum_x x\,p(x)\right).$$
2.4.2 Population Percentiles

Example 2.18. Let the r.v. X have p.d.f. f(x) = 0.5e^{−0.5x}, x > 0. The median of X is found by solving for x_m in
$$F(x_m) = \int_{0}^{x_m} 0.5e^{-0.5t}\,dt = 0.5.$$
We note that
$$\int_{0}^{x_m} 0.5e^{-0.5t}\,dt = -e^{-0.5t}\Big|_{0}^{x_m} = -e^{-0.5 x_m} + 1.$$
Setting this equal to 0.5 gives e^{−0.5 x_m} = 0.5, i.e. x_m = 2 log 2 ≈ 1.386.
2.4.3 Common Discrete Distributions

Bernoulli

Imagine an experiment where the r.v. X can take only two possible outcomes, success (X = 1) with some probability p and failure (X = 0) with probability 1 − p. The p.m.f. of X is
$$p(x) = p^x (1-p)^{1-x}, \quad x = 0, 1, \quad 0 \le p \le 1.$$
Example 2.21. A fair coin is tossed 10 times and X = the number of heads is recorded. What is the probability that X = 3?

One possible outcome is

(H) (H) (H) (T) (T) (T) (T) (T) (T) (T)

which has probability (1/2)³(1/2)⁷ = (1/2)¹⁰. Since there are C(10, 3) = 120 orderings of 3 heads and 7 tails, P(X = 3) = 120(0.5)¹⁰ ≈ 0.117.

Similarly, if a fair die is rolled 4 times and X counts the number of 6s, then X ∼ Bin(4, 1/6), so E(X) = 4(1/6) = 2/3 and V(X) = 4(1/6)(5/6) = 5/9. The proportion of 6s, p̂ = X/4, has expected value E(p̂) = 1/6 and variance V(p̂) = (5/36)/4 = 5/144.
2.4.4 Common Continuous Distributions

Uniform

A continuous r.v. that places equal weight on all values within its support [a, b], a ≤ b, is said to be a uniform r.v. It has p.d.f.
$$f(x) = \frac{1}{b-a}, \quad a \le x \le b,$$
with
$$E(X) = \frac{a+b}{2} \quad \text{and} \quad V(X) = \frac{(b-a)^2}{12}.$$

[Plot of a uniform density]
Example 2.23. Waiting time for the delivery of a part from the warehouse to a certain destination is said to have a uniform distribution from 1 to 5 days. What is the probability that the delivery time is two or more days?

Let X ∼ Uniform[1, 5]. Then f(x) = 0.25 for 1 ≤ x ≤ 5 and hence
$$P(X \ge 2) = \int_{2}^{5} 0.25\,dt = 0.75.$$
Normal

The normal distribution (Gaussian distribution) is by far the most important distribution in statistics. The normal distribution is identified by a location parameter μ and a scale parameter σ² (> 0). A normal r.v. X is denoted as X ∼ N(μ, σ²) with p.d.f.
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, \quad -\infty < x < \infty.$$

[Plot of a normal density]

For example, if X ∼ N(15, 7), then standardizing with Z = (X − μ)/σ,
$$P(13.4 < X < 19.0) = P\left(\frac{13.4 - 15}{\sqrt{7}} < \frac{X - \mu}{\sigma} < \frac{19.0 - 15}{\sqrt{7}}\right) = P(-0.61 < Z < 1.51) = P(Z < 1.51) - P(Z < -0.61) = 0.93448 - 0.27093 = 0.66355.$$

If one is using a computer there is no need to convert back and forth to a standard normal, but it is always useful to standardize concepts.
Example 2.26. The height of males in inches is assumed to be normally distributed with mean 69.1 and standard deviation 2.6. Let X ∼ N(69.1, 2.6²). Find the 90th percentile for the height of males.
[Normal density centered at 69.1 with the lower 90% of the area shaded]
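The percentile can be found in R with qnorm (sketch):

qnorm(0.90, mean = 69.1, sd = 2.6)   # 69.1 + 1.2816 * 2.6 = 72.43 inches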
2.4.5 Covariance

If X and Y are independent, the joint p.d.f. can be expressed as the product of the two marginal p.d.f.s, and consequently Cov(X, Y) = 0. However, the converse is not true. Think of a circle such as sin²X + cos²Y = 1: obviously X and Y are dependent, but they have no linear relationship. Hence Cov(X, Y) = 0.
The covariance is not unitless, so a measure called the population correlation is used to describe the strength of the linear relationship. It

- is unitless
- ranges from −1 to 1

$$\rho_{XY} = \frac{Cov(X, Y)}{\sqrt{V(X)}\sqrt{V(Y)}}.$$

The sample covariance is
$$\hat\sigma_{XY} := \widehat{Cov}(X, Y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n-1}\left[\left(\sum_{i=1}^{n} x_i y_i\right) - n\bar{x}\bar{y}\right].$$

Therefore, the sample correlation is
$$r_{XY} := \hat\rho_{XY} = \frac{\left(\sum_{i=1}^{n} x_i y_i\right) - n\bar{x}\bar{y}}{(n-1)\,s_X s_Y}.$$
Example 2.27. Let's assume that we want to look at the relationship between two variables, height (in inches) and self esteem, for 20 individuals.

Height  68   71   62   75   58   60   67   68   71   69
Esteem  4.1  4.6  3.8  4.4  3.2  3.1  3.8  4.1  4.3  3.7

Height  68   67   63   62   60   63   65   67   63   61
Esteem  3.5  3.2  3.7  3.3  3.4  4.0  4.1  3.8  3.4  3.6

$$r = \frac{4937.6 - 20(65.4)(3.755)}{19(4.406)(0.426)} = 0.731$$
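A sketch reproducing r in R (vector names are ours):

height <- c(68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
            68, 67, 63, 62, 60, 63, 65, 67, 63, 61)
esteem <- c(4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
            3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6)
cor(height, esteem)   # about 0.73, matching the hand calculation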
2.4.6 Mean and variance of linear combinations

For constants a₁, ..., aₙ and random variables X₁, ..., Xₙ,
$$E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i)$$
and
$$V\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\, Cov(X_i, X_j) \tag{2.1}$$
$$= \sum_{i=1}^{n} a_i^2\, V(X_i) + 2\sum\sum_{i<j} a_i a_j\, Cov(X_i, X_j). \tag{2.2}$$

For the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ of independent r.v.s each with variance σ²,
$$V(\bar{X}) \overset{\text{ind.}}{=} \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.$$

Remark 2.3. As the sample size increases, the variance of the sample mean decreases, with $\lim_{n\to\infty} V(\bar{X}) = 0$.

Proposition 2.5. A linear combination of independent normal random variables is a normal random variable.
2.4.7 Central Limit Theorem

If X₁, ..., Xₙ are i.i.d. with mean μ and variance σ², then for large n
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{\text{approx.}}{\sim} N(0, 1), \quad \text{equivalently} \quad \bar{X} \overset{\text{approx.}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right).$$

Remark 2.4. The following additional conditions/rule of thumb are needed to ensure that the CLT is applicable to the Binomial distribution:
$$np > 5 \quad \text{and} \quad n(1-p) > 5.$$
Example 2.30. At a university the mean age of students is 22.3 and the standard deviation is 4. A random sample of 64 students is to be drawn. What is the probability that the average age of the sample will be greater than 23?

By the CLT,
$$\bar{X} \overset{\text{approx.}}{\sim} N\left(22.3, \frac{4^2}{64}\right).$$
So we need to find
$$P(\bar{X} > 23) = P\left(\frac{\bar{X} - 22.3}{4/\sqrt{64}} > \frac{23 - 22.3}{4/\sqrt{64}}\right) = P(Z > 1.4) = 0.0808.$$
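In R (sketch):

# P(Xbar > 23) when Xbar ~ approx. N(22.3, (4/sqrt(64))^2)
1 - pnorm(23, mean = 22.3, sd = 4/sqrt(64))   # = P(Z > 1.4) = 0.0808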
Chapter 3
Inference For Population Mean
Chapter 5 in textbook.
3.1 Confidence intervals

3.1.1 Known variance

Since $\bar{X} \sim N(\mu, \sigma^2/n)$,
$$1 - \alpha = P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right),$$
and the probability that (in the long run) the random C.I.
$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
contains the true value of μ is 1 − α. When a C.I. is constructed from a single sample we can no longer talk about a probability, as there is no long-run temporal concept, but we can say that we are 100(1 − α)% confident that the methodology by which the interval was contrived will contain the true population parameter.

In practice the population variance is unknown. However, a large sample size implies that the sample variance s² is a good estimate for σ² and we can replace it in the C.I. calculation. The technically correct method for creating a C.I. when σ² is unknown is shown in Section 3.1.2.
Example 3.1. In a packaging plant, the sample mean and standard deviation for the fill weight of 100 boxes are x̄ = 12.05 and s = 0.1. The 95% C.I. for the mean fill weight of the boxes is
$$12.05 \pm \underbrace{z_{0.025}}_{1.96}\frac{0.1}{\sqrt{100}} = (12.0304, 12.0696). \tag{3.1}$$
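A sketch of the same calculation in R:

xbar <- 12.05; s <- 0.1; n <- 100
xbar + c(-1, 1) * qnorm(0.975) * s / sqrt(n)   # (12.0304, 12.0696)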
3.1.2 Unknown variance

The construction of the C.I. for the population mean uses the fact that X̄ has a normal distribution. This can happen in two ways: when X₁, ..., Xₙ are i.i.d. normal random variables, or approximately via the C.L.T. when the sample size is large. With σ known, the C.I. is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}. \tag{3.2}$$
When σ² is unknown and estimated by s²,
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}, \tag{3.3}$$
where $t_{n-1}$ stands for Student's t distribution with parameter degrees of freedom = n − 1. A Student's t distribution is similar to the standard normal except that it places more weight on extreme values, as seen in the figure below.

[Density functions of the N(0,1) and t₄ distributions]

The resulting 100(1 − α)% C.I. is
$$\bar{x} \pm t_{(n-1,\alpha/2)}\frac{s}{\sqrt{n}}. \tag{3.4}$$

Remark 3.2. To be technically correct, when σ² is known one should use equation (3.2) and when it is unknown, equation (3.4). It is common practice, mainly for convenience, to use equation (3.2) even when σ² is unknown but the sample size is large. As discussed earlier, under this scenario s² is a good estimate of σ² and the values in the t-table and z-table are very close to each other.
Example 3.2. Suppose that a sample of 36 resistors is taken with x̄ = 10 and s² = 0.7. A 95% C.I. for μ is
$$10 \pm \underbrace{t_{(35,0.025)}}_{2.03}\sqrt{\frac{0.7}{36}} = (9.71693, 10.28307).$$
Note: If the exact degrees of freedom are not in the table, use the closest one.

Since n > 30, in practice you may see equation (3.2) used for the reasons discussed. The 95% C.I. using that method would be
$$10 \pm \underbrace{z_{0.025}}_{1.96}\sqrt{\frac{0.7}{36}} = (9.726691, 10.273309).$$
3.1.3 Sample size determination

The price paid for a higher confidence level, for the same sample statistics, is a wider interval (try this at home using different values).

We know that as the sample size n increases, the standard deviation of X̄, σ/√n, decreases and consequently so does the margin of error. Thus, knowing some preliminary information, such as a rough estimate for σ, can help us determine the sample size needed to obtain a fixed margin of error.

Using equation (3.2), the width of the interval is twice the margin of error,
$$\text{width} = 2 z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$
Thus,
$$\sqrt{n} = \frac{2 z_{\alpha/2}\,\sigma}{\text{width}} \quad \Rightarrow \quad n \ge \left(\frac{2 z_{\alpha/2}\,\sigma}{\text{width}}\right)^2.$$
Example 3.3. In Example 3.1 we had x̄ = 12.05 and s = 0.1 for the 100 boxes, leading to a 95% C.I. for the true mean with width 0.0392, i.e. margin of error 0.0196, in equation (3.1). Boss man requires a 95% C.I. of width 0.0120.
IN CLASS
3.2 Hypothesis Testing

3.2.1 Known variance

Suppose we wish to test H₀: μ = 75 against Hₐ: μ < 75 based on a sample of n = 35 observations with σ = 9. Under the null hypothesis,
$$\bar{X} \sim N\left(75, \frac{9^2}{35}\right), \quad \text{or equivalently} \quad \frac{\bar{X} - 75}{9/\sqrt{35}} \overset{H_0}{\sim} N(0, 1).$$
Since we wish to control for the type I error, we set P(type I error) = α. The default value of α is usually taken to be 5%.

[Standard normal density with the lower α = 0.05 tail, to the left of −1.645, shaded]

If the test statistic
$$T.S. = \frac{\bar{x} - 75}{9/\sqrt{35}}$$
is in the rejection region, i.e. T.S. < −z_{0.05}, then H₀ is rejected. We assume that the sample mean x̄ is a good estimate for μ and hence x̄ − μ should be close to 0, which implies T should be close to zero. However, if it is not, then it implies that μ = 75 was not a good hypothesized value for the true mean and consequently that T was not centered correctly.
Assume that x̄ = 70.8 from the 35 samples. Then T.S. = −2.76, which is in the rejection region, and we reject H₀ at the α = 0.05 level. Equivalently, a conclusion can be reached in hypothesis/significance testing by using the p-value.

Definition 3.3. The p-value of a hypothesis test is the probability of observing the specific value of the test statistic, T.S., or a more extreme value, under the null hypothesis. The direction of the extreme values is indicated by the alternative hypothesis.

Therefore, in this example values more extreme than −2.76 are {x | x < −2.76}, as the alternative, Hₐ: μ < 75, indicates values less than. Thus,
$$\text{p-value} = P(Z < -2.76) = 0.0029.$$
Therefore, since the p-value < α, the null hypothesis is rejected in favor of the alternative hypothesis, as the probability of observing the test statistic value of −2.76 or more extreme (as indicated by Hₐ) is smaller than the probability of the type I error we are willing to undertake.

[Standard normal density showing the p-value area to the left of −2.76, within the α = 0.05 rejection region to the left of −1.645]
In general, to test one of
(i) H₀: μ ≤ μ₀ vs Hₐ: μ > μ₀
(ii) H₀: μ ≥ μ₀ vs Hₐ: μ < μ₀
(iii) H₀: μ = μ₀ vs Hₐ: μ ≠ μ₀
at the α significance level, compute the test statistic
$$T.S. = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}. \tag{3.5}$$
Reject the null if
(i) T.S. > z_α, or p-value = P(Z > T.S.) < α
(ii) T.S. < −z_α, or p-value = P(Z < T.S.) < α
(iii) |T.S.| > z_{α/2}, or p-value = 2P(Z > |T.S.|) < α

For example, to test H₀: μ ≤ 15 vs Hₐ: μ > 15 with x̄ = 17, σ = 3 and n = 36, the test statistic is
$$T.S. = \frac{17 - 15}{3/\sqrt{36}} = 4.$$
3.2.2 Unknown variance

If the sample size is small, i.e. n ≤ 30, then the C.L.T. is not applicable for X̄ and therefore we must assume that the individual r.v.s X₁, ..., Xₙ corresponding to the sample are normal r.v.s with mean μ and variance σ². Then, by Proposition 2.5, we have that X̄ ∼ N(μ, σ²/n) and we can proceed exactly as in equation (3.5).

However, if σ is unknown, which is usually the case, we replace it by its sample estimate s. Consequently,
$$\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \overset{H_0}{\sim} t_{n-1},$$
and for an observed value X̄ = x̄ the test statistic becomes
$$T.S. = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}.$$
At the α significance level, for the same hypothesis tests as before, we reject H₀ if
(i) T.S. > t_{(n-1,α)}, or p-value = P(t_{n-1} > T.S.) < α
(ii) T.S. < −t_{(n-1,α)}, or p-value = P(t_{n-1} < T.S.) < α
(iii) |T.S.| > t_{(n-1,α/2)}, or p-value = 2P(t_{n-1} > |T.S.|) < α

Remark 3.4. The values contained within a two-sided 100(1 − α)% C.I. are precisely those values for which the p-value of a two-sided hypothesis test will be greater than α.
Example 3.7. The lifetime of a single-cell organism is believed to be on average 257 hours. A small preliminary study was conducted to test whether the average lifetime was different when the organism was placed in a certain medium. The measurements are assumed to be normally distributed and turned out to be 253, 261, 258, 255, and 256. The hypothesis test is
$$H_0: \mu = 257 \quad \text{vs.} \quad H_a: \mu \ne 257.$$
With x̄ = 256.6 and s = 3.05, the test statistic value is
$$T.S. = \frac{256.6 - 257}{3.05/\sqrt{5}} = -0.293.$$
The p-value is P(|t₄| > |−0.293|) = P(t₄ < −0.293) + P(t₄ > 0.293) = 0.7839. Hence, since the p-value is large (> 0.05), we fail to reject H₀ and conclude that the population mean is not statistically different from 257.

Instead of a hypothesis test, if a two-sided 95% C.I. is constructed,
$$256.6 \pm \underbrace{t_{(4,0.025)}}_{2.776}\frac{3.05}{\sqrt{5}} = (252.81, 260.39),$$
it is clear that the null hypothesis value of μ = 257 is a plausible value and consequently H₀ is plausible, so it is not rejected.
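The whole analysis can be reproduced with t.test in R (sketch):

x <- c(253, 261, 258, 255, 256)
t.test(x, mu = 257)   # t = -0.29, df = 4, p-value = 0.78, 95% C.I. about (252.8, 260.4)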
Part II
Part 2 Material
Chapter 4
Inference For Population
Proportion
Chapter 10.1 - 10.2 in textbook.
4.1 Confidence interval

In the binomial setting, experiments had binary outcomes and of interest was the number of successes out of the total number of trials. Let X be the total number of successes; then X ∼ Bin(n, p). Once an experiment is conducted and data obtained, an estimate for p can be obtained,
$$\hat{p} = \frac{x}{n},$$
which is an average: the total number of successes over the total number of trials. As such, if the number of successes and the number of failures are both greater than 5, the C.L.T. tells us that
$$\hat{p} \overset{\text{approx.}}{\sim} N\left(p, \frac{p(1-p)}{n}\right).$$
Then a 100(1 − α)% C.I. can be created as in equation (3.2),
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
This is the classical approach for when the sample size is large. It cannot be used in the small sample framework, as the C.L.T. is not applicable. An exact version exists in the field of nonparametric statistics. However, there does exist an interval, similar to the classical version, that works relatively well for small sample sizes (not too small) and is equivalent for large sample sizes. It is called the Agresti-Coull 100(1 − α)% C.I.,
$$\tilde{p} \pm z_{\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}},$$
where ñ := n + 4 and p̃ := (x + 2)/ñ.
Note: The instructor will be using the Agresti-Coull interval on the exam and quizzes; however, the current textbook uses the classical approach.
Example 4.1. A map and GPS application for a smartphone was tested for accuracy. The experiment yielded 26 errors out of the 74 trials. Find the 90% C.I. for the proportion of errors.

Since n = 74 and x = 26, we have ñ = 74 + 4 = 78 and p̃ = (26 + 2)/78 = 0.359. Hence the 90% C.I. for p is
$$0.359 \pm \underbrace{z_{0.05}}_{1.645}\sqrt{\frac{0.359(1 - 0.359)}{78}} = (0.269, 0.448).$$
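A minimal R sketch of the Agresti-Coull interval for this example:

x <- 26; n <- 74; alpha <- 0.10
ntilde <- n + 4; ptilde <- (x + 2) / ntilde     # 28/78 = 0.359
ptilde + c(-1, 1) * qnorm(1 - alpha/2) *
  sqrt(ptilde * (1 - ptilde) / ntilde)          # (0.269, 0.448)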
4.2 Hypothesis testing

To test H₀: p = p₀ against a one- or two-sided alternative, compute the test statistic
$$T.S. = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}},$$
whose corresponding r.v. has a standard normal distribution under the null hypothesis assumption. Reject the null if
(i) T.S. > z_α, or p-value = P(Z > T.S.) < α
(ii) T.S. < −z_α, or p-value = P(Z < T.S.) < α
(iii) |T.S.| > z_{α/2}, or p-value = 2P(Z > |T.S.|) < α
Chapter 5
Inference For Two Population
Means
Chapter 6 in textbook.
5.1 Confidence intervals

There are instances when a C.I. for the difference between two means is of interest, when one wishes to compare the sample mean from one population to the sample mean of another.

5.1.1 Large samples

For independent samples, with K := X̄ − Ȳ,
$$V(K) = V(\bar{X} - \bar{Y}) = \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}.$$
Therefore,
$$K := \bar{X} - \bar{Y} \sim N\left(\mu_X - \mu_Y,\ \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}\right).$$
Once again, if the variances are unknown we can replace them with the sample variances due to the large sample size. In addition, we could use Student's t critical values instead of the z-score, z_{α/2}, (as the variances are unknown), but large sample sizes imply that the t-score will be approximately equal to the z-score.
Example 5.1. In an experiment, 50 observations of soil NO₃ concentration (mg/L) were taken at each of two (independent) locations X and Y. The descriptive statistics are: x̄ = 88.5, s_X = 49.4, ȳ = 110.6 and s_Y = 51.5. Construct a 95% C.I. for the difference in means and interpret.

IN CLASS
5.1.2 Small samples

As in Section 3.1.2, with small sample sizes we must assume that X₁, ..., X_{n_X} are i.i.d. N(μ_X, σ_X²) and Y₁, ..., Y_{n_Y} are i.i.d. N(μ_Y, σ_Y²), with the two samples being independent of one another. As in equation (3.3),
$$\frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}} \sim t_\nu,$$
where
$$\nu = \frac{\left(\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}\right)^2}{\frac{(s_X^2/n_X)^2}{n_X - 1} + \frac{(s_Y^2/n_Y)^2}{n_Y - 1}}. \tag{5.1}$$
The 100(1 − α)% C.I. for μ_X − μ_Y is then
$$\bar{x} - \bar{y} \pm t_{(\nu,\alpha/2)}\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}.$$
Example 5.2. Two methods are considered standard practice for surface hardening. For Method A there were 15 specimens with a mean of 400.9 (N/mm²) and standard deviation 10.6. For Method B there were also 15 specimens with a mean of 367.2 and standard deviation 6.1. Assuming the samples are independent and from normal distributions, the 98% C.I. for μ_A − μ_B is
$$400.9 - 367.2 \pm t_{(\nu,0.01)}\sqrt{\frac{10.6^2}{15} + \frac{6.1^2}{15}},$$
where
$$\nu = \frac{\left(\frac{10.6^2}{15} + \frac{6.1^2}{15}\right)^2}{\frac{(10.6^2/15)^2}{14} + \frac{(6.1^2/15)^2}{14}} \approx 22.$$
Remark 5.1. When the population variances are believed to be equal, i.e. σ_X² ≈ σ_Y², we can improve on the estimate of variance by using a pooled or weighted average estimate. If in addition to the regular assumptions we can assume equality of variances, then the 100(1 − α)% C.I. for μ_X − μ_Y is
$$\bar{x} - \bar{y} \pm t_{(n_X + n_Y - 2,\ \alpha/2)}\, s_p\sqrt{\frac{1}{n_X} + \frac{1}{n_Y}},$$
with
$$s_p = \sqrt{\frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2}}.$$
The assumption that the variances are equal must be made a priori and not used simply because the two variances may be close in magnitude.
Example 5.3. Consider Example 5.2 but now assume that σ_X² ≈ σ_Y². A 98% C.I. for the difference μ_X − μ_Y is constructed with
$$s_p = \sqrt{\frac{14(10.6^2) + 14(6.1^2)}{28}} = 8.648,$$
giving
$$400.9 - 367.2 \pm \underbrace{t_{(28,0.01)}}_{2.467}(8.648)\sqrt{\frac{2}{15}} = (25.9097, 41.4903).$$
5.1.3 Two proportions

A simple extension of Section 4.1 to the two-sample framework yields the 100(1 − α)% C.I. for the difference of two population proportions. Let X ∼ Bin(n_X, p_X) and Y ∼ Bin(n_Y, p_Y) be two independent binomial r.v.s. Define ñ_X = n_X + 2 and p̃_X = (x + 1)/ñ_X, and similarly for Y. Then the 100(1 − α)% C.I. for p_X − p_Y is
$$\tilde{p}_X - \tilde{p}_Y \pm z_{\alpha/2}\sqrt{\frac{\tilde{p}_X(1-\tilde{p}_X)}{\tilde{n}_X} + \frac{\tilde{p}_Y(1-\tilde{p}_Y)}{\tilde{n}_Y}}.$$
Intuitively, since proportions are between 0 and 1, the difference of two proportions must lie between -1 and 1. Hence if the bounds of a C.I. are outside
the intuitive ones, they should be replaced by the intuitive bounds.
Example 5.4. In a clinical trial for a pain medication, 394 subjects were
blindly administered the drug, while an independent group of 380 were given
a placebo. From the drug group, 360 showed an improvement. From the
placebo group 304 showed improvement. Construct a 95% C.I. for the difference and interpret.
IN CLASS
5.1.4 Paired data

There are instances when two samples are not independent, when a relationship exists between the two. For example, before-treatment and after-treatment measurements made on the same experimental subject are dependent on each other through the experimental subject. This is a common event in clinical studies where the effectiveness of a treatment, which may be quantified by the difference in the before and after measurements, is dependent upon the individual undergoing the treatment. Then the data are said to be paired.

Consider the data in the form of the pairs (X₁, Y₁), (X₂, Y₂), ..., (Xₙ, Yₙ). We note that the pairs, i.e. two-dimensional vectors, are independent, as the experimental subjects are assumed to be independent, with marginal expectations E(Xᵢ) = μ_X and E(Yᵢ) = μ_Y for all i = 1, ..., n. By defining
$$D_1 = X_1 - Y_1,\quad D_2 = X_2 - Y_2,\quad \ldots,\quad D_n = X_n - Y_n,$$
a two-sample problem has been reduced to a one-sample problem. Inference for μ_X − μ_Y is equivalent to one-sample inference on the Dᵢ, as was done in Chapter 3. This holds since
$$\mu_D := E(\bar{D}) = E\left(\frac{1}{n}\sum_{i=1}^{n} D_i\right) = E\left(\frac{1}{n}\sum_{i=1}^{n} (X_i - Y_i)\right) = E(\bar{X} - \bar{Y}) = \mu_X - \mu_Y.$$
For each of 10 cars, a measurement was taken under a new and an old condition (New and Old below), with difference D = New − Old:

Car    1     2     3     4     5     6     7     8     9     10
New    4.35  5.00  4.21  5.03  5.71  4.61  4.70  6.03  3.80  4.70
Old    4.19  4.62  4.04  4.72  5.52  4.26  4.27  6.24  3.46  4.50
D      0.16  0.38  0.17  0.31  0.19  0.35  0.43  -0.21  0.34  0.20

With d̄ = 0.232 and s_D = 0.183, and assuming that the data are normally distributed, a 95% C.I. for μ_new − μ_old = μ_D is
$$0.232 \pm \underbrace{t_{(9,0.025)}}_{2.262}\frac{0.183}{\sqrt{10}} = (0.101, 0.363),$$
and we note that the interval is strictly greater than 0, implying that the difference is positive, i.e. that μ_new > μ_old.
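In R the same interval comes from a paired t procedure (sketch; variable names are ours):

New <- c(4.35, 5.00, 4.21, 5.03, 5.71, 4.61, 4.70, 6.03, 3.80, 4.70)
Old <- c(4.19, 4.62, 4.04, 4.72, 5.52, 4.26, 4.27, 6.24, 3.46, 4.50)
t.test(New, Old, paired = TRUE)$conf.int   # 95% C.I. approximately (0.101, 0.363)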
5.2 Hypothesis Testing

5.2.1 Large samples

For two independent samples,
$$\bar{X} - \bar{Y} \sim N\left(\mu_X - \mu_Y,\ \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}\right).$$
To test
(i) H₀: μ_X − μ_Y ≤ Δ₀ vs Hₐ: μ_X − μ_Y > Δ₀
(ii) H₀: μ_X − μ_Y ≥ Δ₀ vs Hₐ: μ_X − μ_Y < Δ₀
(iii) H₀: μ_X − μ_Y = Δ₀ vs Hₐ: μ_X − μ_Y ≠ Δ₀
we assume that the variances are known and the test statistic is
$$T.S. = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}}.$$
The r.v. corresponding to the test statistic has a standard normal distribution under the null hypothesis H₀ that μ_X − μ_Y = Δ₀. Reject the null if
(i) T.S. > z_α, or p-value = P(Z > T.S.) < α
(ii) T.S. < −z_α, or p-value = P(Z < T.S.) < α
(iii) |T.S.| > z_{α/2}, or p-value = 2P(Z > |T.S.|) < α
5.2.2 Small samples

For small samples, replace the unknown variances by their sample estimates and use the test statistic
$$T.S. = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{s_X^2/n_X + s_Y^2/n_Y}},$$
which under H₀ is compared to a t distribution with degrees of freedom ν as in equation (5.1).
5.2.3 Two proportions

Let X ∼ Bin(n_X, p_X) and Y ∼ Bin(n_Y, p_Y) represent two independent binomial r.v.s from two Bernoulli trial experiments. To test
(i) H₀: p_X − p_Y ≤ 0 vs Hₐ: p_X − p_Y > 0
(ii) H₀: p_X − p_Y ≥ 0 vs Hₐ: p_X − p_Y < 0
(iii) H₀: p_X − p_Y = 0 vs Hₐ: p_X − p_Y ≠ 0
we must assume that the number of successes and failures is greater than 10 for both samples. As the null hypothesis values for p_X and p_Y are not available, we simply check that the sample successes and failures are greater than 10. By virtue of the C.L.T.,
$$\hat{p}_X - \hat{p}_Y \overset{H_0}{\sim} N\left(0,\ \frac{p_X(1-p_X)}{n_X} + \frac{p_Y(1-p_Y)}{n_Y}\right),$$
and the test statistic would be constructed in the usual way. However, under H₀ it is assumed that p_X = p_Y, which implies that the two variances are equal, and therefore, in view of Remark 5.1, we can replace p_X and p_Y in the variance by the pooled estimate
$$\hat{p} = \frac{x + y}{n_X + n_Y}.$$
The test statistic is then
$$T.S. = \frac{\hat{p}_X - \hat{p}_Y - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}},$$
and the r.v. corresponding to the test statistic has a standard normal distribution under the null hypothesis.
5.2.4 Paired data

In the event that two samples are dependent, i.e. paired, such as when two different measurements are made on the same experimental unit, the inference methodology must be adapted to account for the dependence/covariance between the two samples.

Refer to Section 5.1.4, where we consider the data in the form of the pairs (X₁, Y₁), (X₂, Y₂), ..., (Xₙ, Yₙ) and construct the one-dimensional, i.e. one-sample, D₁, D₂, ..., Dₙ where Dᵢ = Xᵢ − Yᵢ for all i = 1, ..., n. As shown earlier, μ_D = μ_X − μ_Y, and the variance term σ_D² incorporates the covariance between X and Y.

To test
(i) H₀: μ_X − μ_Y = μ_D ≤ Δ₀ vs Hₐ: μ_X − μ_Y = μ_D > Δ₀
(ii) H₀: μ_X − μ_Y = μ_D ≥ Δ₀ vs Hₐ: μ_X − μ_Y = μ_D < Δ₀
(iii) H₀: μ_X − μ_Y = μ_D = Δ₀ vs Hₐ: μ_X − μ_Y = μ_D ≠ Δ₀
perform a one-sample hypothesis test, by either large or small sample inference, using the test statistic
$$T.S. = \frac{\bar{d} - \Delta_0}{\sigma_D/\sqrt{n}} \quad \text{or} \quad T.S. = \frac{\bar{d} - \Delta_0}{s_D/\sqrt{n}}.$$
5.3 Probability plots

A probability plot is a graphical technique for comparing two data sets: either two sets of empirical observations, or one empirical set against a theoretical set.

Definition 5.1. The empirical distribution function, or empirical c.d.f., is the cumulative distribution function associated with the empirical measure of the sample. This c.d.f. is a step function that jumps up by 1/n at each of the n data points,
$$\hat{F}_n(x) = \frac{\text{number of elements} \le x}{n} = \frac{1}{n}\sum_{i=1}^{n} I\{x_i \le x\}.$$

For example, consider the data 1, 5, 7, 8. The empirical c.d.f. is
$$\hat{F}_4(x) = \begin{cases} 0 & \text{if } x < 1 \\ 0.25 & \text{if } 1 \le x < 5 \\ 0.50 & \text{if } 5 \le x < 7 \\ 0.75 & \text{if } 7 \le x < 8 \\ 1 & \text{if } x \ge 8 \end{cases}$$

[Step plot of the empirical c.d.f. of the data 1, 5, 7, 8]
[Normal Q-Q plot: sample quantiles (roughly 50 to 200) against theoretical quantiles]
Chapter 6
Nonparametric Procedures For
Population Location
When the sample size is small and we cannot assume that the data are normally distributed, we must use exact nonparametric procedures to perform inference on population central values. Instead of means we will be referring to medians (μ̃) and other location concepts, as they are less influenced by outliers, which can have a drastic impact (especially) on small samples.
6.1 Sign test

Let B denote the number of observations greater than the hypothesized median μ̃₀; under H₀, B ∼ Bin(n, 0.5). To test
(i) H₀: μ̃ ≤ μ̃₀ vs Hₐ: μ̃ > μ̃₀
(ii) H₀: μ̃ ≥ μ̃₀ vs Hₐ: μ̃ < μ̃₀
(iii) H₀: μ̃ = μ̃₀ vs Hₐ: μ̃ ≠ μ̃₀
we reject H₀ if the p-value < α. We illustrate the calculation of the p-value with the following example.
Example 6.1. Pulse rates for a sample of 15 students were:
60, 62, 72, 60, 63, 75, 64, 68, 63, 60, 52, 64, 82, 68, 64
To test H₀: μ̃ ≥ 65 vs Hₐ: μ̃ < 65 we have B = 5. The p-value (i.e. the probability of observing the test statistic or a value more extreme) is
$$\text{p-value} = P(B \le 5 \mid B \sim \text{Bin}(15, 0.5)) = P(B = 0) + \ldots + P(B = 5) = \sum_{i=0}^{5}\binom{15}{i} 0.5^i\, 0.5^{15-i} = 0.1509.$$
Remark 6.1. If we wanted to test the location of the 70th percentile, then B ∼ Bin(n, 0.3).

Remark 6.2. There is also a normal approximation (shown in the textbook), but we will stick to the exact method.
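In R the exact p-value is one call (sketch):

# Example 6.1: B = 5 values above 65 out of n = 15
pbinom(5, size = 15, prob = 0.5)                       # 0.1509
# equivalently: binom.test(5, 15, p = 0.5, alternative = "less")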
6.2 Wilcoxon Rank-Sum Test

Let T_X denote the sum of the ranks (in the combined sample) of the observations from sample X, and T_Y the corresponding sum for sample Y. At the α significance level, H₀ is rejected in favor of the alternative if
(i) T_X ≥ T_U when n_X ≤ n_Y, or T_Y ≤ T_L when n_X > n_Y
(ii) T_X ≤ T_L when n_X ≤ n_Y, or T_Y ≥ T_U when n_X > n_Y
(iii) T_X ≥ T_U or T_X ≤ T_L when n_X ≤ n_Y, or T_Y ≥ T_U or T_Y ≤ T_L when n_X > n_Y
where the critical values T_U and T_L can be found in Table 5 (Table 6 in the textbook), where the first sample is taken to be the smaller one (done for convenience). In practice, though, R can provide exact p-values.
Example 6.2. Two groups of 10 did not know whether they were receiving alcohol or a placebo, and their reaction times (in seconds) were recorded (placebo group starting 0.90, 0.37, ...; alcohol group starting 1.46, 1.45, ...).

Test whether the distribution of reaction times for the placebo is shifted to the left of that for alcohol (case (ii)). The ranks are:

Placebo   7   1  16   5   8   4   6   3   2  18
Alcohol  15  14  17  13  10  20   9  11  19  12

so T_placebo = 70 and T_alcohol = 140.
6.3 Wilcoxon Signed-Rank Test

For paired measurements (A, B) on 20 fields, the differences D = A − B and the ranks of |D| are:

Field   A      B      D     Rank(|D|)
1       211.4  186.3  25.1  15
2       204.4  205.7  -1.3  1
3       202.0  184.4  17.6  7
4       201.9  203.6  -1.7  2
5       202.4  180.4  22.0  14
6       202.0  202.0  0     0
7       202.4  181.5  20.9  13
8       207.1  186.7  20.4  11
9       203.6  205.7  -2.1  3
10      216.0  189.1  26.9  19
11      208.9  183.6  25.3  17.5
12      208.7  188.7  20.0  8
13      213.8  188.6  25.2  16
14      201.6  204.2  -2.6  4
15      201.8  181.6  20.1  9
16      200.3  208.7  -8.4  6
17      201.8  181.5  20.3  10
18      201.5  208.7  -7.2  5
19      212.1  186.8  25.3  17.5
20      203.4  182.9  20.5  12

Reducing each difference by 5, i.e. taking D = (A − B) − 5, and re-ranking |D| gives:

Field   A−B    D      Rank(|D|)
1       25.1   20.1   16
2       -1.3   -6.3   2
3       17.6   12.6   7
4       -1.7   -6.7   3
5       22.0   17.0   15
6       0      -5.0   1
7       20.9   15.9   14
8       20.4   15.4   12
9       -2.1   -7.1   4
10      26.9   21.9   20
11      25.3   20.3   18.5
12      20.0   15.0   9
13      25.2   20.2   17
14      -2.6   -7.6   5
15      20.1   15.2   10
16      -8.4   -13.4  8
17      20.3   15.3   11
18      -7.2   -12.2  6
19      25.3   20.3   18.5
20      20.5   15.5   13
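A hedged R sketch of a Wilcoxon signed-rank test on these paired fields; the shift of 5 subtracted in the second table is passed as mu (that the notes tested exactly this one-sided hypothesis is an assumption):

A <- c(211.4, 204.4, 202.0, 201.9, 202.4, 202.0, 202.4, 207.1, 203.6, 216.0,
       208.9, 208.7, 213.8, 201.6, 201.8, 200.3, 201.8, 201.5, 212.1, 203.4)
B <- c(186.3, 205.7, 184.4, 203.6, 180.4, 202.0, 181.5, 186.7, 205.7, 189.1,
       183.6, 188.7, 188.6, 204.2, 181.6, 208.7, 181.5, 208.7, 186.8, 182.9)
# R will warn about ties and a zero difference and use a normal approximation
wilcox.test(A, B, paired = TRUE, mu = 5, alternative = "greater")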
Chapter 7
Inference About Population
Variances
Chapter 7 in textbook.
7.1 Sampling distribution of the sample variance

The sample statistic s² is widely used as the point estimate for the population variance σ², and similar to the sample mean it varies from sample to sample and has a sampling distribution.

Let X₁, ..., Xₙ be i.i.d. r.v.s and let
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$
If the data are normally distributed, then
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$$
[Plot of a χ² density with the central 1 − α area marked]

Since
$$1 - \alpha = P\left(\chi^2_{(n-1),1-\alpha/2} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{(n-1),\alpha/2}\right) = P\left(\frac{(n-1)S^2}{\chi^2_{(n-1),\alpha/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{(n-1),1-\alpha/2}}\right),$$
this interval will, in the long run, contain the true population variance parameter 100(1 − α)% of the time. Thus, the 100(1 − α)% C.I. for σ² is
$$\left(\frac{(n-1)s^2}{\chi^2_{(n-1),\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{(n-1),1-\alpha/2}}\right).$$
Example 7.1. At a coffee plant a machine fills 500 g coffee containers. Ideally, the amount of coffee in a container should vary only slightly about the 500 g nominal value. The machine is designed to dispense coffee amounts that have a normal distribution with mean 506.6 g and standard deviation 4 g. This implies that only 5% of containers weigh less than 500 g. A quality control engineer samples 30 containers every hour. A particular sample yields x̄ = 500.453 and s = 3.433, giving
$$\left(\frac{29(3.433^2)}{\chi^2_{29,0.025}},\ \frac{29(3.433^2)}{\chi^2_{29,0.975}}\right) = \left(\frac{341.78}{45.722},\ \frac{341.78}{16.047}\right) \approx (7.475, 21.298)$$
as a 95% C.I. for σ², or equivalently, by taking the square root, (2.7341, 4.6150) as a 95% C.I. for σ.
Remark 7.1. Hypothesis testing will be skipped, but the methodology is exactly the same as for the mean, with H₀: σ² = σ₀² and a test statistic value of
$$T.S. = \frac{(n-1)s^2}{\sigma_0^2}.$$
For details see p.299 of the textbook.
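A sketch of the χ² interval computation in R:

# 95% C.I. for sigma^2 and sigma from Example 7.1
n <- 30; s <- 3.433
(n - 1) * s^2 / qchisq(c(0.975, 0.025), df = n - 1)         # about (7.48, 21.30)
sqrt((n - 1) * s^2 / qchisq(c(0.975, 0.025), df = n - 1))   # about (2.73, 4.61)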
7.2 Comparing two variances

For two independent normal samples,
$$\frac{(n_X - 1)S_X^2}{\sigma_X^2} \sim \chi^2_{n_X-1} \quad \text{and} \quad \frac{(n_Y - 1)S_Y^2}{\sigma_Y^2} \sim \chi^2_{n_Y-1},$$
but it is also known that a standardized (by dividing by the degrees of freedom) ratio of two χ²'s is an F-distribution. Therefore,
$$\frac{\dfrac{(n_X-1)S_X^2/\sigma_X^2}{n_X-1}}{\dfrac{(n_Y-1)S_Y^2/\sigma_Y^2}{n_Y-1}} = \frac{S_X^2/S_Y^2}{\sigma_X^2/\sigma_Y^2} \sim F_{n_X-1,\,n_Y-1}.$$
Inverting this pivotal quantity gives the 100(1 − α)% C.I. for σ_X²/σ_Y²,
$$\left(\frac{s_X^2/s_Y^2}{F_{(n_X-1,n_Y-1),\alpha/2}},\ \frac{s_X^2/s_Y^2}{F_{(n_X-1,n_Y-1),1-\alpha/2}}\right),$$
where F_{(n_X-1,n_Y-1),α} is the critical value for the F-distribution with the area to the right being α.

Remark 7.2. Hypothesis testing will be skipped, but the methodology follows along the same lines as before, with H₀: σ_X²/σ_Y² = 1 and a test statistic value of
$$T.S. = \frac{s_X^2}{s_Y^2}.$$
For details see p.307 of the textbook.
Example 7.2. The life length of an electrical component was studied under two operating voltages, 110 and 220. Ten different components were assigned to be tested under 110V and 16 under 220V. The times to failure (in 100s of hrs) were then recorded. Assuming that the two samples are independent and normal, we construct a 90% C.I. for σ²₁₁₀/σ²₂₂₀.

V    n   Mean   St.Dev.
110  10  20.04  0.474
220  16   9.99  0.233

IN CLASS
7.3 Comparing t ≥ 2 Variances

There are instances where we may wish to compare more than two population variances, such as the variability in SAT examination scores for students using one of three types of preparatory material. The null hypothesis (for t populations) is
$$H_0: \sigma_1^2 = \cdots = \sigma_t^2,$$
which is rejected if the test statistic exceeds the critical value F_{(t-1, N-t),α}, or if the p-value (the area to the right of the T.S. under an F_{t-1, N-t} distribution) is < α.
Example 7.3. Three different additives that are marketed for increasing fuel efficiency in miles per gallon (mpg) were evaluated by a testing agency. Studies have shown an average increase of 8% in mpg after using the products for 250 miles. The testing agency wants to evaluate the variability in the increase (% increase in mpg, 10 vehicles per additive):

Additive 1:  4.2   2.9   0.2  25.7   6.3   7.2   2.3   9.9   5.3   6.5
Additive 2:  0.2  11.3   0.3  17.1  51.0  10.1   0.3   0.6   7.9   7.2
Additive 3:  7.2   6.4   9.9   3.5  10.6  10.8  10.6   8.4   6.0  11.9

[F(2,27) density with the area 0.1803 to the right of the test statistic value 1.8268 shaded]
Chapter 8
Contingency Tables
Chapter 10.5 in textbook.
Contingency tables are cross-tabulations of frequency counts where the rows (typically) represent the levels of the explanatory variable and the columns represent the levels of the response variable.

We motivate the methodology through an example. A personnel manager wants to assess the popularity of 3 alternative flexible time-scheduling plans among workers in 4 offices. A random sample of 216 workers yields the following frequencies.

                    Office
Favored Plan    1    2    3    4   Total
1              15   32   18    5    70
2               8   29   23   18    78
3               1   20   25   22    68
Total          24   81   66   45   216
Dividing by 216 gives the table of observed joint proportions:

                      Office
Favored Plan      1       2       3       4      Total
1              0.0694  0.1481  0.0833  0.0231   0.3241
2              0.0370  0.1343  0.1065  0.0833   0.3611
3              0.0046  0.0926  0.1157  0.1019   0.3148
Total          0.1111  0.3750  0.3056  0.2083   1.0000

Under independence, the expected joint probability in cell (i, j) is the product of the marginal proportions, p̂_{i+} p̂_{+j}, for i = 1, 2, 3 and j = 1, 2, 3, 4. Performing this operation for all rows and columns gives us a table of expected joint probabilities.
                      Office
Favored Plan      1       2       3       4      Total
1              0.0360  0.1215  0.0990  0.0675   0.3241
2              0.0401  0.1354  0.1103  0.0752   0.3611
3              0.0350  0.1181  0.0962  0.0656   0.3148
Total          0.1111  0.3750  0.3056  0.2083   1.0000
Multiplying by 216 gives the expected counts E_{ij}:

                      Office
Favored Plan      1        2        3        4      Total
1              7.7778  26.2500  21.3889  14.5833     70
2              8.6667  29.2500  23.8333  16.2500     78
3              7.5556  25.5000  20.7778  14.1667     68
Total              24       81       66       45    216
To test
H₀: the levels of one variable are independent of the other,
we use Pearson's chi-square (χ²) test statistic, which is applicable if E_{ij} > 5 for all i, j:
$$T.S. = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(n_{ij} - E_{ij})^2}{E_{ij}},$$
where r is the number of rows and c is the number of columns. The sampling distribution of the test statistic is χ²_{(r-1)(c-1)} and hence, for a specified α, H₀ is rejected if
$$T.S. \ge \chi^2_{(r-1)(c-1),\alpha},$$
or if the p-value (the area to the right of the test statistic) < α. For the example at hand the T.S. is
$$T.S. = \frac{(15 - 7.7778)^2}{7.7778} + \cdots + \frac{(22 - 14.1667)^2}{14.1667} = 27.135,$$
the degrees of freedom are 2(3) = 6 and the p-value is 0.0001366. Therefore, we reject H₀ and conclude that Favored Plan and Office are not independent.
Once dependence is established, of interest is to determine which cells in the contingency table have higher or lower frequencies than expected (under independence). This is usually determined by observing the standardized residuals (deviations) of the observed counts n_{ij} from the expected counts E_{ij}, i.e.
$$r_{ij} = \frac{n_{ij} - E_{ij}}{\sqrt{E_{ij}(1 - \hat{p}_{i+})(1 - \hat{p}_{+j})}}.$$

                      Office
Favored Plan      1        2        3        4
1              3.3409   1.7267  -1.0695  -3.4306
2             -0.3005  -0.0732  -0.2563   0.6104
3             -3.0560  -1.6644   1.3428   2.8258
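A sketch of the same analysis in R:

counts <- matrix(c(15, 32, 18, 5,
                   8, 29, 23, 18,
                   1, 20, 25, 22),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Plan = 1:3, Office = 1:4))
test <- chisq.test(counts)   # X-squared = 27.135, df = 6, p-value = 0.000137
test$expected                # the expected counts table above
test$stdres                  # the standardized residuals above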
A further set of cross-tabulated counts by Rating:

        Rating
        Outstanding  Average  Poor
Row 1        21         25      2
Row 2        20         36     10
Row 3         4         14      7
Row 4         3          8      6
Part III
Part 3 Material
Chapter 9
Regression
Chapter 11 in textbook.
We have seen and interpreted the population correlation coefficient between two r.v.s, which measures the strength of the linear relationship between the two variables. In this chapter we hypothesize a linear relationship between the two variables, and estimate and draw inference about the model parameters.
9.1 Simple Linear Regression

The simplest deterministic mathematical relationship between two mathematical variables x and y is a linear relationship
$$y = \beta_0 + \beta_1 x,$$
where the coefficient β₀ represents the y-axis intercept, the value of y when x = 0, and β₁ represents the slope, interpreted as the amount of change in the value of y for a 1-unit increase in x.

To this model we add variability by introducing the random variable εᵢ ~ i.i.d. N(0, σ²) for each observation i = 1, ..., n. Hence, the statistical model by which we wish to model one random variable using known values of some predictor variable becomes
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n, \tag{9.1}$$
where Yᵢ represents the r.v. corresponding to the response, i.e. the variable we wish to model, and xᵢ stands for the observed value of the predictor. Therefore we have that
$$Y_i \overset{\text{ind.}}{\sim} N(\beta_0 + \beta_1 x_i,\ \sigma^2). \tag{9.2}$$
Notice that the Yᵢ's are no longer identical since their mean depends on the value of xᵢ.
[Scatterplot of data points with the fitted regression line overlaid]
The least squares estimates minimize
$$\sum_{i=1}^{n}\left(Y_i - E(Y_i)\right)^2 = \sum_{i=1}^{n}\left(Y_i - (\beta_0 + \beta_1 x_i)\right)^2.$$
Hence, the goal is to find the values of β₀ and β₁ that minimize the sum of the squared distances between the points and their expected values under the model. This is done by the following steps:

1. Take the partial derivatives with respect to β₀ and β₁.
2. Equate the two resulting equations to 0.
3. Solve the simultaneous equations for β₀ and β₁.
4. (Optional) Take second partial derivatives to show that they in fact minimize, not maximize.
Therefore,
$$b_1 := \hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\left(\sum_{i=1}^{n} x_i y_i\right) - n\bar{x}\bar{y}}{\left(\sum_{i=1}^{n} x_i^2\right) - n\bar{x}^2} = r\,\frac{s_Y}{s_X} \tag{9.3}$$
and
$$b_0 := \hat\beta_0 = \bar{y} - b_1\bar{x}.$$

Remark 9.1. Do not extrapolate the model to values of the predictor x that were not in the data, as it is not clear how the model behaves for other values. Also, do not fit a linear regression to data that do not appear to be linear.
Next we introduce some notation that will be useful in conducting inference on the model. In order to determine whether a regression model is adequate we must compare it to the most naive model, which uses the sample mean as the prediction, i.e. Ŷ = Ȳ. This model does not take into account any predictors, as the prediction is the same for all values of x. Then the total distance of a point yᵢ from the sample mean ȳ can be broken down into two components, one measuring the error of the model for that point and one measuring the improvement (distance) accounted for by the regression model:
$$\underbrace{(y_i - \bar{y})}_{\text{Total}} = \underbrace{(y_i - \hat{y}_i)}_{\text{Error}} + \underbrace{(\hat{y}_i - \bar{y})}_{\text{Regression}}.$$
Squaring and summing over all observations (the cross term Σ(yᵢ − ŷᵢ)(ŷᵢ − ȳ) sums to zero) gives
$$\underbrace{\sum_{i=1}^{n}(y_i - \bar{y})^2}_{SST} = \underbrace{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}_{SSE} + \underbrace{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}_{SSR}. \tag{9.4}$$

9.1.1 Goodness of fit

The coefficient of determination, R² = SSR/SST = 1 − SSE/SST, is the proportion of the total variability in the response that is explained by the regression model.

Remark 9.2. For simple linear regression (only), with one predictor, the coefficient of determination is the square of the correlation coefficient, i.e. R² = r². This is not true when more than one predictor is used in the model.
Example 9.1. For 15 cement blocks of certain dimensions, their weight (lbs) and porosity (%) were measured.

Weight    99.0  101.1  102.7  103.0  105.4  107.0  108.7  110.8  112.1  112.4  113.6  113.8  115.1  115.4  120.0
Porosity  28.8   27.9   27.0   25.2   22.8   21.5   20.9   19.6   17.1   18.9   16.0   16.7   13.0   13.6   10.8
Regressing weight on porosity gives the following output (Table 9.1):

Predictor    Coef      SE Coef   T        P
Constant     130.854   1.012     129.28   0.000
Porosity     -1.07644  0.04889   -22.02   0.000

S = 1.02319   R-Sq = 97.4%   R-Sq(adj) = 97.2%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression        1   507.59   507.59   484.84   0.000
Residual Error   13    13.61     1.05
Total            14   521.20
http://www.stat.ufl.edu/~ dathien/STA6166/reg.R
We note that the slope b₁ = −1.08 implies that for each percentage point increase in porosity, the weight decreases by 1.08 lbs (for the values of porosity observed, about 10-30%). The coefficient of determination is 0.974, implying that 97.4% of the variability in weight (about its mean), conveyed by SS Total, is explained by the model, conveyed by the SS Regression value. The sample correlation coefficient of r = −0.987 also illustrates this strong negative relationship.
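The fit can be reproduced in R with lm (sketch; variable names are ours):

weight   <- c(99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8,
              112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0)
porosity <- c(28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6,
              17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8)
fit <- lm(weight ~ porosity)
summary(fit)   # slope about -1.076, R-squared about 0.974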
9.1.2 Sampling distribution of the estimators

The least squares estimator B₁ can be written as a linear combination of the independent normal r.v.s Yᵢ,
$$B_1 = \sum_{i=1}^{n}\frac{(x_i - \bar{x})}{\sum_{j=1}^{n}(x_j - \bar{x})^2}\,Y_i,$$
so by Proposition 2.5 it is normally distributed with E(B₁) = β₁ and
$$V(B_1) = \frac{1}{\left[\sum_{j=1}^{n}(x_j - \bar{x})^2\right]^2}\sum_{i=1}^{n}(x_i - \bar{x})^2\underbrace{V(Y_i)}_{\sigma^2} = \frac{\sigma^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}.$$
Thus,
$$B_1 \sim N\left(\beta_1,\ \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right). \tag{9.5}$$

Remark 9.4. The larger the spread in the values of the predictor, the larger the Σᵢ(xᵢ − x̄)² value will be, and hence the smaller the variances of B₀ and B₁. Also, since the (xᵢ − x̄)² are nonnegative terms, when we have more data points, i.e. larger n, we are summing more nonnegative terms and Σᵢ(xᵢ − x̄)² is larger.
9.1.3 Inference for the slope

Standardizing B₁ with its estimated standard deviation gives
$$\frac{B_1 - \beta_1}{s/\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}} \sim t_{n-2},$$
where s stands for the conditional (upon the model) standard deviation of the response. The true variance is never known, as there are infinite model variations, and hence the Student's t distribution is used instead of the standard normal, irrespective of the sample size. It is important to note that the degrees of freedom are n − 2, as 2 were lost due to the estimation of β₀ and β₁.

Therefore, a 100(1 − α)% C.I. for β₁ is
$$\hat\beta_1 \pm t_{(n-2,\alpha/2)}\, s_{\hat\beta_1},$$
where $s_{\hat\beta_1} = s/\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}$. Similarly, for a null hypothesis value β₁₀, the test statistic is
$$T.S. = \frac{\hat\beta_1 - \beta_{10}}{s_{\hat\beta_1}},$$
which under the null hypothesis has a corresponding t_{n-2} r.v.

Example 9.2. A 95% C.I. for β₁ is
$$-1.07644 \pm \underbrace{t_{(13,0.025)}}_{2.160}(0.04889) = (-1.1821, -0.9708).$$

9.1.4 Confidence interval for the mean response
For an observed value of the predictor, x_obs, i.e. x_obs = x_k for some k, we also have the observed value of the response, which creates the data point (x_obs, y_obs) in the two-dimensional space. However, we also have the fitted value of the response (once a regression model is fitted), ŷ = b₀ + b₁x_obs. We wish to better understand the behavior of the fitted response, so let us look at the r.v. Ŷ = B₀ + B₁x_obs, that is, before any data on the responses are obtained. After substituting the formulas for B₀ and B₁, it can be re-expressed as
$$\hat{Y} = B_0 + B_1 x_{obs} = \sum_{i=1}^{n}\left[\frac{1}{n} + (x_{obs} - \bar{x})\frac{(x_i - \bar{x})}{\sum_{j=1}^{n}(x_j - \bar{x})^2}\right]Y_i. \tag{9.6}$$
Hence
$$\hat{Y} \sim N\left(\beta_0 + \beta_1 x_{obs},\ \left[\frac{1}{n} + \frac{(x_{obs} - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}\right]\sigma^2\right).$$
Thus, a 100(1 − α)% C.I. for the mean response, E(Ŷ) = β₀ + β₁x_obs, for a value of the predictor that is observed, i.e. in the data, is
$$\hat{y} \pm t_{(n-2,\alpha/2)}\underbrace{s\sqrt{\frac{1}{n} + \frac{(x_{obs} - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}}}_{s_{\hat{Y}}}.$$

Example 9.3. Refer back to Example 9.1, concerning cement blocks, and specifically Table 9.1. Assume we are interested in a C.I. for the mean value of the weight (response) when porosity is 27%. Notice that the value of 27% for porosity is observed. So with
$$\bar{x} = 19.987,\quad \sum_{i=1}^{15}(x_i - \bar{x})^2 = 438.057,\quad (x_{obs} - \bar{x})^2 = 49.182,\quad s = 1.023,$$
the fitted value is ŷ = 130.854 − 1.07644(27) ≈ 101.79 and the 95% C.I. for the mean weight at 27% porosity is
$$101.79 \pm 2.160(1.023)\sqrt{\frac{1}{15} + \frac{49.182}{438.057}} \approx (100.86, 102.72).$$
9.1.5 Prediction interval

For a new, unobserved value of the predictor, x_new, the corresponding response Y_pred = β₀ + β₁x_new + ε satisfies
$$Y_{pred} \sim N\left(\beta_0 + \beta_1 x_{new},\ \left[1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}\right]\sigma^2\right),$$
and a 100(1 − α)% prediction interval (P.I.) for Y_pred, for a value of the predictor that is unobserved, i.e. not in the data, is
$$\hat{y}_{pred} \pm t_{(n-2,\alpha/2)}\underbrace{s\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}}}_{s_{pred}}.$$
Example 9.4. Refer back to Example 9.1. Let us estimate the value of the cement block weight when porosity is 14% and create an interval around it. First we check that a value of 14% for porosity is within the range of the observed data but does not belong to one of the data points, so we need to predict the value and create a P.I. The predicted value is 130.85 − 1.08(14) = 115.73, and with
$$\bar{x} = 19.987,\quad \sum_{i=1}^{15}(x_i - \bar{x})^2 = 438.057,\quad (x_{new} - \bar{x})^2 = 34.422,\quad s = 1.023,$$
the 95% P.I. is
$$115.73 \pm \underbrace{t_{(13,0.025)}}_{2.160}(1.023)\sqrt{1 + \frac{1}{15} + \frac{34.422}{438.057}} = (113.3653, 118.0947).$$
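In R the same intervals come from predict (sketch, assuming the lm fit from the earlier sketch):

# 95% C.I. for the mean weight at porosity 27% and 95% P.I. at porosity 14%
predict(fit, newdata = data.frame(porosity = 27), interval = "confidence")
predict(fit, newdata = data.frame(porosity = 14), interval = "prediction")
# the prediction interval is roughly (113.4, 118.1)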
9.1.6 Checking assumptions

The model assumptions, εᵢ ~ i.i.d. N(0, σ²), i = 1, ..., n, are checked in four parts:

1. Normality
2. Independence
3. Homogeneity of variance
4. Model fit
with components of model fit being checked simultaneously within the first
three. The assumptions are checked using the residuals ei := yi yi for
i = 1 . . . , n, or the standardized residuals, which are the residuals divided
by their standard deviation. Standardized residuals are usually the default
residuals being used as their standard deviation should be around 1.
Although exact statistical tests exist to test the assumptions, linear regression is robust to slight deviations, so only graphical procedures will be introduced here.
Normality
The simplest way to check for normality is with two graphical procedures:
Histogram
P-P or Q-Q plot
A histogram of the residuals is plotted and we try to determine if the
histogram is symmetric and bell shaped like a normal distribution is. In
addition, to check the model fit, we assume the observed response values
yᵢ are centered around the regression line ŷ. Hence, the histogram of the
residuals should be centered at 0. Referring to Example 9.1, we obtain the
following histogram.
9.1.7 Transformations

In the event that the model assumptions appear to be violated to a significant degree, a linear regression model on the available data is not valid. However, have no fear, your friendly statistician is here. The data can be transformed, in an attempt to fit a valid regression model to the new transformed data set. Both the response and the predictor can be transformed, but there is usually more emphasis on the response.

A common transformation mechanism is the Box-Cox transformation (also known as the power transformation). This transformation mechanism, when applied to the response variable, will attempt to remedy the worst of the violated assumptions, i.e. to reach a compromise. A word of caution: in an attempt to remedy the worst, it may worsen the validity of one of the other assumptions. The mechanism works by trying to identify the (minimum or maximum, depending on software) value of a parameter λ that will be used as the power to which the responses are transformed. The transformation is
$$y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda\, G_y^{\lambda - 1}} & \text{if } \lambda \ne 0 \\[1ex] G_y \log(y_i) & \text{if } \lambda = 0 \end{cases}$$
where $G_y = \left(\prod_{i=1}^{n} y_i\right)^{1/n}$ denotes the geometric mean of the responses. Note that a value of λ = 1 effectively implies no transformation is necessary. There are many software packages that can calculate an estimate for λ, and if the sample size is large enough, even create a C.I. around the value. Referring to Example 9.1, we see that λ̂ = 0.55.
[Scatterplot of prop versus time and diagnostic plots of the standardized residuals (re): histogram, normal Q-Q plot, residuals versus order (independence), and residuals versus fitted values (homogeneity / fit)]
The data (Time, Prop) for this example are:

Time  1     5     15    30    60    120   240   480   720   1440  2880  5760  10080
Prop  0.84  0.71  0.61  0.56  0.54  0.47  0.45  0.38  0.36  0.26  0.20  0.16  0.08
[Diagnostic plots after transforming the predictor to l.time = log(time): histogram of the standardized residuals (re2), normal Q-Q plot, residuals versus order, residuals versus fitted values, and the fit of prop against l.time]
It seems that a decent choice for is 0, i.e. a log transformation for time.
http://www.stat.ufl.edu/~ athienit/STA6166/reg_transpred.R
Remark 9.5. When creating graphs and checking for patterns, try to keep the axis for the standardized residuals ranging from -3 to 3, that is, 3 standard deviations below 0 to 3 standard deviations above 0. Software has a tendency to zoom in, as done in the notes where some axes for standardized residuals run from -1.5 to 0.5. Obviously, if you zoom in enough you will see a pattern. For example, is glass smooth? If you are viewing by eye then yes. If you are viewing via an electron microscope then no.
In R just add plot(....., ylim=c(-3,3))
9.2 Multiple Regression

9.2.1 Model

$$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i, \quad i = 1, \ldots, n, \quad \text{with } \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2).$$
9.2.2 Goodness of fit

As before, R² = SSR/SST measures the proportion of response variability explained by the model. An adjusted version that accounts for the number of predictors p is
$$R^2_{adj} = 1 - \frac{SSE/(n - p - 1)}{SST/(n - 1)}.$$

9.2.3 Inference

The conditional standard deviation of the response is estimated by
$$s_\varepsilon = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n - p - 1}} = \sqrt{\frac{SSE}{n - p - 1}}.$$
Individual tests

Estimating the vector of coefficients β = (β₀, β₁, ..., β_p) now falls in the field of matrix algebra and will not be covered in this class. We will simply use the estimates provided by computer output. The interpretation of the slope coefficients requires an additional statement. For example, a 1-unit increase in predictor k will change the response by the amount βₖ, assuming all other predictors are held constant.

Inference on the slope parameters βⱼ for j = 1, ..., p is done as in Section 9.1.3 but under the assumption that
$$\frac{B_j - \beta_j}{s_{B_j}} \sim t_{n-p-1}.$$
An individual test on βₖ tests the significance of predictor k, assuming all other predictors βⱼ for j ≠ k are included in the model. This can lead to different conclusions depending on what other predictors are included in the model.
Consider the following theoretical toy example. Someone wishes to measure the area of a square (the response) using as predictors two potential
variables, the length and the height of the square. Due to measurement
error, replicate measurements are taken.
A simple linear regression is fitted with length as the only predictor, x = length. For the test H₀: β₁ = 0, do you think that we would reject H₀, i.e. is length a significant predictor of area?

Now assume that a multiple regression model is fitted with both predictors, x₁ = length and x₂ = height. Now, for the test H₀: β₁ = 0, do you think that we would reject H₀, i.e. is length a significant predictor of area given that height is already included in the model?

This scenario is defined as confounding. In the toy example, height is a confounding variable, i.e. an extraneous variable in a statistical model that correlates with both the response variable and another predictor variable.
Example 9.7. In an experiment of 22 observations, a response y and two
predictors x1 and x2 were observed. Two simple linear regression models
were fitted:
(1)
y = 6.33 + 1.29 x1
Predictor    Coef     SE Coef   T      P
Constant     6.335    2.174     2.91   0.009
x1           1.2915   0.1392    9.28   0.000

S = 2.95954   R-Sq = 81.1%   R-Sq(adj) = 80.2%
(2)  y = 54.0 - 0.919 x2

Predictor    Coef      SE Coef   T       P
Constant     53.964    8.774     6.15    0.000
x2           -0.9192   0.2821    -3.26   0.004

S = 5.50892   R-Sq = 34.7%   R-Sq(adj) = 31.4%
Each predictor in its respective model is significant due to the small p-values for their corresponding coefficients. The simple linear regression model (1) is able to explain more of the variability in the response than model (2), with R² = 81.1%. Logically, one would then assume that a multiple regression model with both predictors would be the best model. The output of this model is given below:
(3)  y = 12.8 + 1.20 x1 - 0.168 x2

Predictor    Coef      SE Coef   T       P
Constant     12.844    7.514     1.71    0.104
x1           1.2029    0.1707    7.05    0.000
x2           -0.1682   0.1858    -0.91   0.377

S = 2.97297   R-Sq = 81.9%   R-Sq(adj) = 80.0%
Consider now a multiple regression model with p = 5 predictors,
$$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + \varepsilon_i. \tag{9.7}$$
Now, assume that after fitting this model and looking at some preliminary results, including the individual tests, we wish to test whether we can simultaneously remove the first, third and fourth predictors, i.e. x₁, x₃ and x₄. Consequently, we wish to test the hypotheses
$$H_0: \beta_1 = \beta_3 = \beta_4 = 0 \quad \text{vs} \quad H_a: \text{at least one of them} \ne 0.$$
In effect we wish to compare the full model in equation (9.7) to the reduced model
$$Y_i = \beta_0 + \beta_1 x_{2i} + \beta_2 x_{5i} + \varepsilon_i. \tag{9.8}$$
The SSE of the reduced model will be larger than the SSE of the full model, as it only has two of the predictors of the full model. The test statistic is based on comparing the difference in SSE of the reduced model to the full model:
$$T.S. = \frac{\dfrac{SSE_{red} - SSE_{full}}{df_{E_{red}} - df_{E_{full}}}}{\dfrac{SSE_{full}}{df_{E_{full}}}}.$$
Under the null hypothesis, H₀, the r.v. corresponding to the test statistic follows an F-distribution, F_{ν₁,ν₂}, with ν₁ = df_{E_red} − df_{E_full} and ν₂ = df_{E_full}. The p-value for this test is always the area to the right of the F-distribution, i.e. P(F_{ν₁,ν₂} > T.S.).
Remark 9.7. Note that ν₁ = df_{E_red} − df_{E_full} always equals the number of predictors being tested in a simultaneous test. If n denotes the sample size, then for our example with p = 5 and testing 3 predictors,
$$\nu_1 = (n - 2 - 1) - (n - 5 - 1) = 3.$$

Remark 9.8. Simultaneous testing has to be done by fitting both the full model and the reduced model in order to obtain the two sets of SSE. Computer output will, however, perform a simultaneous test for the significance of all the predictors. This is called the overall test of the model. In this case, the reduced model has no predictors, hence
$$Y_i = \beta_0 + \varepsilon_i \quad \Leftrightarrow \quad Y_i = \mu + \varepsilon_i.$$
For a data set with n = 45 observations, a response (biomass) and five predictors (salinity, pH, K, Na, Zn), the overall analysis of variance for the full model is

Analysis of Variance
Source           DF   SS         MS         F       P
Regression        5   8439559    1687912    7.395   0.000
Residual Error   39   8901715    228249.1
Total            44   17341274
Assuming all the model assumptions are met, we first take a look at the overall fit of the model:
$$H_0: \beta_1 = \cdots = \beta_5 = 0 \quad \text{vs} \quad H_a: \text{at least one of them} \ne 0.$$
The test statistic value is T.S. = 7.395 with an associated p-value of approximately 0 (found using an F₅,₃₉ distribution). Hence, at least one predictor appears to be significant. In addition, the coefficient of determination, R², is 48.67%, indicating that a large proportion of the variability in the response can be accounted for by the regression model.

Looking at the individual tests, acidity (pH) is significant given all the other predictors, with a p-value of 0.001, but salinity, potassium (K), sodium (Na) and zinc (Zn) have large p-values for the individual tests. Since the p-values are close to 0.5, it is acceptable to consider them for a simultaneous test. However, this may just be a case of confounding, as certain variables are highly correlated. Table 9.8 provides the pairwise correlations of the continuous variables.
Table 9.8: Pairwise correlations of the continuous variables

            biomass  salinity      pH       K      Na      Zn
biomass       1.000    -0.084   0.669  -0.150  -0.219  -0.503
salinity     -0.084     1.000  -0.051  -0.021   0.162  -0.421
pH            0.669    -0.051   1.000   0.019  -0.038  -0.722
K            -0.150    -0.021   0.019   1.000   0.792   0.074
Na           -0.219     0.162  -0.038   0.792   1.000   0.117
Zn           -0.503    -0.421  -0.722   0.074   0.117   1.000
can remove both. We will be a bit conservative and try to remove salinity,
K and Zn simultaneously. To test

H0 : βsalinity = βK = βZn = 0 vs Ha : at least one of them ≠ 0,

the reduced model needs to be fitted.
Coefficients:
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -282.86356   319.38767   -0.886    0.3809
pH            333.10556    55.78001    5.972  4.36e-07
Na             -0.01770     0.01011   -1.752    0.0871
---
Residual standard error: 461.1 on 42 degrees of freedom
Multiple R-squared: 0.4851,  Adjusted R-squared: 0.4606
F-statistic: 19.79 on 2 and 42 DF,  p-value: 8.82e-07
Analysis of Variance

Source            DF        SS        MS      F       P
Regression         2   8412953   4206477  19.79   0.000
Residual Error    42   8928321    212579
Total             44  17341274
The test statistic is

T.S. = [ (8928321 − 8901715)/3 ] / (8901715/39) = 0.0389.

From the F-table with 3 and 39 degrees of freedom we can determine that
the p-value is greater than 0.05; in fact it is 0.9896, so we fail to reject the
null hypothesis. This implies that those predictors are not statistically significant, and
we now use the reduced model as our current model.
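Rather than an F-table, the p-value above can be obtained directly in R; a one-line sketch:

# Upper-tail probability for F = 0.0389 with 3 and 39 degrees of freedom
pf(0.0389, df1 = 3, df2 = 39, lower.tail = FALSE)   # approximately 0.99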
At this point we see that Na is marginally significant with a p-value of
0.0871. Some may argue to remove it and some may not (due to its p-value
being on the cusp). I would suggest keeping it since our model is simple
enough at this point. Fitting a model without Na (only pH) gives

- a model with a higher conditional standard deviation,
  s = √MSE = √( SSE / (n − p − 1) ) = 472 (compared to 461.1),

- a smaller R2adj = 0.4347 (compared to 0.4606).
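Both quantities used in this comparison can be read straight off the fitted objects in R; a sketch assuming the biomass data are in a data frame dat (object names are hypothetical):

# Hypothetical fits: with and without Na
fit_pH_Na <- lm(biomass ~ pH + Na, data = dat)
fit_pH    <- lm(biomass ~ pH, data = dat)

summary(fit_pH_Na)$sigma          # conditional standard deviation s (about 461.1)
summary(fit_pH)$sigma             # larger, about 472

summary(fit_pH_Na)$adj.r.squared  # about 0.4606
summary(fit_pH)$adj.r.squared     # smaller, about 0.4347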
Yi = { (β0 + β2) + β1 x1i + εi    if x2 = 1
     { β0 + β1 x1i + εi           if x2 = 0
Predictor        Coef    SE Coef       T       P
Constant       31.399      9.902    3.17   0.003
x1           0.014208   0.001400   10.15   0.000
x2            -54.210      7.243   -7.48   0.000

S = 22.8665    R-Sq = 81.5%    R-Sq(adj) = 80.5%
Analysis of Variance

Source            DF      SS     MS      F       P
Regression         2   85464  42732  81.73   0.000
Residual Error    37   19346    523
Total             39  104811
[Figure: scatterplot of y versus x1, with points and fitted lines distinguished by x2 = 0 and x2 = 1]
Adding the interaction term x1·x2 gives the model

Yi = { (β0 + β2) + (β1 + β3) x1i + εi    if x2 = 1
     { β0 + β1 x1i + εi                  if x2 = 0

Predictor        Coef    SE Coef       T       P
Constant        -1.84      10.13   -0.18   0.857
x1           0.019749   0.001546   12.78   0.000
x2              10.73      14.05    0.76   0.450
x1*x2       -0.010957   0.002174   -5.04   0.000

R-Sq = 89.2%    R-Sq(adj) = 88.3%
Analysis of Variance

Source            DF      SS     MS      F       P
Regression         3   93470  31157  98.90   0.000
Residual Error    36   11341    315
Total             39  104811
The overall fit of the new model is adequate with T.S. = 98.90, but more
importantly R2adj has increased and s has decreased. Figure 9.10 also shows
the better fit.
[Figure 9.10: scatterplot of y versus x1, with points and fitted lines distinguished by x2 = 0 and x2 = 1, for the interaction model]
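A sketch of how the additive and interaction models could be fitted and compared in R, assuming a data frame dat with columns y, x1 and the 0/1 dummy x2 (names hypothetical):

# y ~ x1 + x2 + x1:x2, with x2 a 0/1 dummy; the shorthand y ~ x1 * x2 expands to this
fit_add <- lm(y ~ x1 + x2, data = dat)    # parallel-lines (additive) model
fit_int <- lm(y ~ x1 * x2, data = dat)    # separate intercepts and slopes

summary(fit_int)         # the x1:x2 coefficient estimates the difference in slopes
anova(fit_add, fit_int)  # F test for whether the interaction term is needed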
The sample statistics for all the covariances among the coefficients
can easily be obtained in R using the vcov function (although we readily
have available the variances, i.e. the squared standard errors, of β̂1 and
β̂3). Then create a 100(1 − α)% CI for β1 + β3 :

β̂1 + β̂3 ± t(n−p−1, α/2) √( s²β̂1 + s²β̂3 + 2 sβ̂1,β̂3 ),

where sβ̂1,β̂3 denotes the estimated covariance between β̂1 and β̂3.
Remark 9.14. This concept can easily be extended to linear combinations of
more than two coefficients.
http://www.stat.ufl.edu/~athienit/STA6166/safe_reg.R
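A minimal sketch of this interval computation in R using vcov (the data frame dat, the model formula and the default coefficient names are assumptions; the linked safe_reg.R may do this differently):

# Hypothetical interaction fit: y ~ x1 * x2 with x2 a 0/1 dummy
fit <- lm(y ~ x1 * x2, data = dat)

b <- coef(fit)   # named vector: (Intercept), x1, x2, x1:x2
V <- vcov(fit)   # estimated variance-covariance matrix of the coefficients

est <- b["x1"] + b["x1:x2"]                                     # estimate of beta1 + beta3
se  <- sqrt(V["x1", "x1"] + V["x1:x2", "x1:x2"] + 2 * V["x1", "x1:x2"])

alpha <- 0.05
est + c(-1, 1) * qt(1 - alpha/2, df = df.residual(fit)) * se    # 95% CI for beta1 + beta3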
In the previous example the qualitative predictor only had two levels,
the use or lack of use of a safety program. To fully state all levels
only one binary/dummy predictor was necessary. In general, if a qualitative
predictor has k levels, then k − 1 binary predictor variables are necessary.
For example, a qualitative predictor for a traffic light has three levels: red,
yellow and green. Therefore, only two binary predictors are necessary:

xred    = { 1 if red,    0 otherwise
xyellow = { 1 if yellow, 0 otherwise

The case when xred = xyellow = 0 means that the light is green.
However, we can potentially treat this variable as a quantitative variable
and not a qualitative one. Although the colour variable has three categories,
one may argue that colour (in some contexts) is an ordinal qualitative predictor.
For example, you can order a drink in 3 sizes: small, medium and
large; there is an inherent order of 1, 2 and 3. In terms of frequency (or
wavelength) there is also an order:

red (400-484 THz),
yellow (508-526 THz),
green (526-606 THz).

Instead of creating 2 dummy variables we can create one (continuous) variable
for frequency, of which we happen to observe only 3 values:
442 THz for red, 517 THz for yellow and 566 THz for green
(simply taking the midpoints of the frequency intervals).
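A sketch in R contrasting the two coding choices (the colour values below are made up for illustration):

colour <- c("red", "yellow", "green", "red", "green")   # example observations

# Qualitative coding: a factor with k = 3 levels creates k - 1 = 2 dummy columns
colour_f <- factor(colour, levels = c("green", "red", "yellow"))
model.matrix(~ colour_f)   # dummy columns for red and yellow; green is the baseline

# Quantitative coding: a single numeric frequency variable (midpoints in THz)
freq <- c(red = 442, yellow = 517, green = 566)[colour]
freq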
Chapter 10
Analysis Of Variance
Chapter 8 in the textbook.
For t = 2 samples/populations we have already seen in Chapters 5 and 6
various inference methods for comparing the central location of two populations.
Next we introduce statistical models and procedures that allow us
to compare more than two (≥ 2) populations.
Design / Data:
- Independent Samples (CRD)
- Paired Data (RBD)
10.1 Completely Randomized Design
The Completely Randomized Design (CRD) is a linear model (as is the regression
model) where, for controlled experiments, subjects are assigned at
random to one of t treatments, and, for observational studies, subjects are
sampled from t existing groups, with the purpose of comparing the different
groups.
The CRD statistical model is

Yij = μ + αi + εij ,    j = 1, . . . , ni , i = 1, . . . , t,    (10.1)

where εij ~ i.i.d. N(0, σ²) and the restriction (to make the model identifiable) that
some αi = 0 or Σ_{i=1}^{t} αi = 0 is imposed. The goal is to test the statistical significance of
the treatment effects, the α's. If all α's are 0 then it implies that the response
can be modeled by a single mean μ rather than individual means μi = μ + αi for each
treatment/sample. To better illustrate this concept consider the following
example (Example 6.1 in textbook).
Example 10.1. Company officials were concerned about the length of time
a particular drug retained its potency. A random sample of n1 = 10 fresh
bottles was retained and a second sample of n2 = 10 bottles was stored for
a period of 1 year, and the following potency readings were obtained.

Fresh:   10.2  10.5  10.3  10.8   9.8  10.6  10.7  10.2  10.0  10.6
Stored:   9.8   9.6  10.1  10.2  10.1   9.7   9.5   9.6   9.8   9.9
[Figure: dot plot of the potency readings with the grand mean marked]

[Figure: dot plot of the potency readings by method (Fresh, Stored) with the treatment means and the grand mean marked]
The total variability decomposes as

Σ_{i=1}^{t} Σ_{j=1}^{ni} (yij − ȳ++)²  =  Σ_{i=1}^{t} Σ_{j=1}^{ni} (yij − ȳi+)²  +  Σ_{i=1}^{t} Σ_{j=1}^{ni} (ȳi+ − ȳ++)²
            SST                                     SSE                                       SSTrt

with

SST   = (N − 1) s²y
SSTrt = Σ_{i=1}^{t} ni (ȳi+ − ȳ++)²
SSE   = Σ_{i=1}^{t} (ni − 1) s²i

and degrees of freedom dfTrt = t − 1 and dfError = N − t.
It can be shown (in more advanced courses) that the SS have χ² distributions, and from that

E(MSE)   = σ²
E(MSTrt) = σ² + Σ_{i=1}^{t} ni αi² / (t − 1).

Hence, the treatment effects are tested with the ratio MSTrt/MSE.
Alloy Strength                    Mean     St. Dev.
250  264  256  260  239          253.8     9.757
263  254  267  265  267          263.2     5.4037
257  279  269  273  277          271.0     8.7178
253  258  262  264  273          262.0     7.4498
http://www.stat.ufl.edu/~athienit/STA6166/anova1.R
IN CLASS
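A minimal sketch of a one-way ANOVA for the alloy data in R (the linked anova1.R may differ in details):

# Alloy strength data: 4 alloys, 5 observations each
strength <- c(250, 264, 256, 260, 239,
              263, 254, 267, 265, 267,
              257, 279, 269, 273, 277,
              253, 258, 262, 264, 273)
alloy <- factor(rep(1:4, each = 5))

fit <- aov(strength ~ alloy)
summary(fit)                 # ANOVA table with the F test of H0: all treatment effects are 0
model.tables(fit, "means")   # treatment means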
10.1.1 Post-hoc comparisons
   Lower      Upper
-24.6181     5.8181
-32.4181    -1.9819
-23.4181     7.0181
-23.0181     7.4181
-14.0181    16.4181
 -6.2181    24.2181
Tukey's procedure

This procedure is derived so that the probability that at least one false difference
is detected is α (the experimentwise error rate). The margin of error is

qα,(t, N−t) √( MSE / n ),

where n is the common sample size for each treatment (which was 5 in the
example). If the sample sizes are unequal, use the harmonic mean

n = t / ( 1/n1 + ⋯ + 1/nt ).
Fisher's LSD

Only apply this method after significance has been confirmed through the F-test.
For each pairwise comparison the margin of error is

tN−t, α/2 √( MSE (1/ni + 1/nj) ).
http://www.stat.ufl.edu/~athienit/STA6166/anova1.R
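Both procedures are available in base R; a sketch continuing with the strength, alloy and fit objects from the alloy sketch above (the linked anova1.R may differ):

TukeyHSD(fit, conf.level = 0.95)   # all pairwise CIs controlling the experimentwise error rate

# Fisher's LSD amounts to ordinary pairwise t tests using the pooled MSE with no
# p-value adjustment, applied only after the overall F test is significant
pairwise.t.test(strength, alloy, p.adjust.method = "none")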
10.1.2 Nonparametric procedure
The Kruskal-Wallis test statistic is

T.S. = [ 12 / (N(N + 1)) ] Σ_{i=1}^{t} Ti²/ni − 3(N + 1),

where Ti is the sum of the ranks in sample i and N is the total sample size.
Insulin release (data):
1.59  1.73  3.64  1.97
3.36  4.01  3.49  2.89
3.92  4.82  3.87  5.39

Ranks of the insulin release values, with rank sums T and rank means T̄:
1   2   7   3      T = 13    T̄ = 3.25
5  10   6   4      T = 25    T̄ = 6.25
9  11   8  12      T = 40    T̄ = 10
providing

T.S. = [ 12 / (12·13) ] ( 13²/4 + 25²/4 + 40²/4 ) − 3(13) = 7.0385.
http://www.stat.ufl.edu/~athienit/STA6166/kruskal_wallis.R
IN CLASS
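A sketch of the same test in R (the linked kruskal_wallis.R may differ in details):

# Insulin release data: three groups of four observations
release <- c(1.59, 1.73, 3.64, 1.97,
             3.36, 4.01, 3.49, 2.89,
             3.92, 4.82, 3.87, 5.39)
group <- factor(rep(1:3, each = 4))

kruskal.test(release, group)   # chi-squared approximation; statistic is about 7.04 here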
10.2 Randomized Block Design

The Randomized Block Design (RBD) statistical model is

Yij = μ + αi + βj + εij ,    i = 1, . . . , t, j = 1, . . . , b,

where εij ~ i.i.d. N(0, σ²), the αi are the treatment effects and the βj are the block effects.
                   Block
Factor (Trt)    1      2     ...     b
1              y11    y12    ...    y1b
2              y21    y22    ...    y2b
...            ...    ...           ...
t              yt1    yt2    ...    ytb
with

SST     = Σ_{i=1}^{t} Σ_{j=1}^{b} (yij − ȳ)²
SSTrt   = Σ_{i=1}^{t} b (ȳi+ − ȳ)²
SSBlock = Σ_{j=1}^{b} t (ȳ+j − ȳ)²
SSE     = Σ_{i=1}^{t} Σ_{j=1}^{b} (yij − ȳi+ − ȳ+j + ȳ)²,

where ȳi+ is the mean of treatment i, ȳ+j is the mean of block j, and ȳ is
the grand mean.
The ANOVA table is then

Source   SS        df               MS                           E(MS)
Trt      SSTrt     t − 1            MSTrt = SSTrt/(t − 1)        σ² + b Σ_{i=1}^{t} αi²/(t − 1)
Block    SSBlock   b − 1            MSBlock = SSBlock/(b − 1)    σ² + t Σ_{j=1}^{b} βj²/(b − 1)
Error    SSE       (b − 1)(t − 1)   MSE = SSE/[(b − 1)(t − 1)]   σ²
Total    SST       bt − 1

Under the respective null hypotheses,

MSTrt/MSE ~H0~ Ft−1, (b−1)(t−1)    and    MSBlock/MSE ~H0~ Fb−1, (b−1)(t−1).
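In R the randomized block analysis is simply a two-factor additive model; a minimal sketch with hypothetical names y, trt and block in a data frame dat:

# y: response, trt: treatment factor (t levels), block: block factor (b levels)
fit_rbd <- aov(y ~ trt + block, data = dat)
summary(fit_rbd)   # F tests for treatments and for blocks, each against MSE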
10.2.1 Nonparametric procedure
For post-hoc comparisons, an approximate simultaneous interval for the difference in mean ranks of treatments i and i′ is

R̄i+ − R̄i′+ ± zα/(t(t−1)) √( t(t + 1) / (6b) ).
Example 10.6. http://www.stat.ufl.edu/~athienit/STA6166/friedman.pdf
http://www.stat.ufl.edu/~athienit/STA6166/friedman.R
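A sketch of the corresponding call in R (the linked friedman.R may differ; the names y, trt, block and dat are hypothetical):

# y: response, trt: treatment factor, block: block factor, one observation per cell
friedman.test(y ~ trt | block, data = dat)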