Coursenotes Aug2012
Coursenotes Aug2012
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
5
5
8
11
13
17
19
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
27
27
29
32
34
35
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3 Probability
41
3.1 Sets and Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2 Conditional Probability and Independence . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4 Random Variables and Distributions
4.1 Definition and Notation . . . . . . . . . . . . . . . .
4.2 Discrete Random Variables . . . . . . . . . . . . . .
4.3 Continuous Random Variables . . . . . . . . . . . . .
4.4 Summarizing the Main Features of f (x) . . . . . . .
4.5 Sum and Average of Independent Random Variables
4.6 Max and Min of Independent Random Variables . .
4.6.1 The Maximum . . . . . . . . . . . . . . . . .
4.6.2 The Minimum . . . . . . . . . . . . . . . . .
4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . .
4.7.1 Exercise Set A . . . . . . . . . . . . . . . . .
4.7.2 Exercise Set B . . . . . . . . . . . . . . . . .
1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
61
61
62
63
67
74
77
78
80
82
82
84
2
5 Normal Distribution
5.1 Definition and Properties
5.2 Checking Normality . . .
5.3 Exercises . . . . . . . . .
5.3.1 Exercise Set A . .
5.3.2 Exercise Set B . .
CONTENTS
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
89
89
95
98
98
99
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
103
103
104
106
108
113
114
116
116
116
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
119
119
123
125
126
126
127
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
129
129
130
130
132
134
136
141
141
142
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9 Simulation Studies
147
9.1 Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
9.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
10 Comparison of several means
10.1 An example . . . . . . . . .
10.2 Exercises . . . . . . . . . .
10.2.1 Exercise Set A . . .
10.2.2 Exercise Set B . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
153
153
163
163
163
CONTENTS
CONTENTS
Chapter 1
Engineers and applied scientists are often involved with the generation and collection of data and the
retrieval of information contained in data sets. They must also communicate to dierent audiences
the results of complex numerical studies including one or more data sets.
Experience shows that data sets are often messy, dicult to grasp and hard to analyze. In this
chapter we introduce some statistical techniques and ideas which can be used to summarize and
display data.
Table 1.1: Live Load Data
Bay
A
B
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
1st
44.4
138.4
164.7
98.3
178.0
123.7
157.5
119.4
150.4
92.2
169.8
181.5
105.4
157.6
168.4
161.0
156.3
152.3
138.9
112.3
2d
130.4
236.4
110.4
154.5
108.1
185.4
62.3
74.1
137.8
54.0
168.4
147.5
133.1
164.6
173.5
132.8
128.6
169.5
101.1
135.1
3d
127.6
202.5
185.7
171.9
197.9
130.3
65.2
118.2
105.5
139.2
169.9
104.1
62.0
195.0
150.4
161.0
111.8
162.1
127.9
123.9
4th
127.7
128.7
185.0
104.8
112.0
169.2
94.4
144.4
55.2
116.7
159.6
167.4
144.9
136.3
116.4
147.1
157.6
106.6
178.3
258.9
5th
108.4
154.3
150.0
230.1
66.6
91.8
156.1
212.0
122.9
32.1
179.6
172.4
129.1
136.6
143.7
199.8
129.3
112.0
127.5
192.1
Frequency Table
5
6th
184.0
117.0
198.7
102.8
160.9
134.5
133.6
132.3
127.8
184.8
33.5
128.8
94.9
223.7
179.5
141.4
115.2
141.0
145.1
155.0
7th
139.1
125.9
144.5
156.6
106.8
153.5
101.9
136.1
180.6
127.1
193.3
138.6
147.6
134.0
84.5
178.1
73.3
110.7
53.5
122.3
8th
120.6
127.2
121.5
136.1
123.2
131.4
117.6
184.3
53.0
171.8
99.5
110.1
167.9
179.1
161.5
145.7
94.3
145.8
182.4
86.1
9th
174.1
175.6
93.2
93.8
162.5
254.0
87.6
177.2
150.1
159.6
124.3
141.1
136.7
85.7
140.5
124.8
161.9
206.1
147.9
147.0
10th
187.9
114.1
202.2
197.8
118.3
194.6
142.4
151.8
138.4
123.8
208.6
189.3
173.2
122.3
94.1
179.8
154.7
88.8
138.0
118.0
Consider the 200 measurements of the live load distribution (pounds per square foot) on ten
floors and twenty bays of a large warehouse (Table 1.1). The live load is the load supported by
the structure excluding the weight of the structure itself. Notice how hard it is to understand data
presented in this raw form. They must clearly be organized and summarized in some fashion before
their analysis can be attempted.
One way to summarize a large data set is to condense it into a frequency table (see Table 1.2).
The first step to construct a frequency table is to determine an appropriate data range, that is,
an interval that contains all the observations and that has end points close (but not necessarily
equal) to the smallest and largest data values. The second step is to determine the number k of
bins. The data range is divided into k smaller subintervals, the bins, usually taken of the same size.
Normally, the number of bins k is chosen between 7 and 15, depending on the size of the data set
with fewer bins producing simpler but less detailed tables. For example, in the case of the live load
data, the smallest and largest observations are 32.1 and 258.9, the data range is [20, 260] and there
are 12 bins of size 20. The third step is to calculate the bin mark, ci , which represents that bin.
The bin mark is the center of the bin interval (that is, one half of the sum of the bins end points).
For example, 30 = (20 + 40)/2 for the first bin in Table 1.2. The fourth step is to calculate the
bin frequencies, ni . The bin frequency is equal to the number of data points lying in that bin. Each
data point must be counted once; if a data point is equal to the end points of two successive bins,
then it is included (only) in the second. For example, a live load of 60 is included in the third bin
(see Table 1.2). The fourth step is to calculate the relative frequencies
ni
fi =
n1 + n2 + . . . + nk
and the cumulative relative frequencies
n1 + . . . + ni
Fi =
.
n1 + n2 + . . . + nk
Notice that fi 100% gives the percentage of observations in the ith bin and Fi 100% gives the percentage of observations below the end point of the ith bin. For example, from Table 1.2, 18% of the
live loads are between 140 and 160 psf , and 95% of the live loads are below 200 psf .
Table 1.2: Frequency Table
Class
2040
4060
6080
80100
100120
120140
140160
160180
180200
200220
220240
240260
ci
30
50
70
90
110
130
150
170
190
210
230
250
ni
2
5
6
15
28
47
36
32
19
5
3
2
fi
0.010
0.025
0.030
0.075
0.14
0.235
0.180
0.160
0.095
0.025
0.015
0.010
Fi
0.010
0.035
0.065
0.140
0.280
0.515
0.695
0.855
0.950
0.975
0.990
1.000
At this point it is worth comparing Table 1.1 and Table 1.2. We can quickly learn, for instance,
from Table 1.2 that only 2 live loads lie between 20 and 40, but we cannot say which they are. On
the other hand, with considerably eort, we can find out from Table 1.1 that these live loads are
32.1 and 33.5. Table 1.2 looses some information in exchange for clarity. The loss of information
and gain in clarity are proportional to the number of bins.
Histogram:
probability
0.0
The information contained in a frequency table can be graphically displayed in a picture called
histogram (see Figure 1.1). Bars with areas proportional to the bin frequencies are drawn over each
bin. Notice that in the case of bins of equal size the bar areas are proportional to the bar heights.
The histogram shows the shape or distribution of the data and permits a direct visualization of
its general characteristics including typical values, spread, shape, etc. The histogram also helps
to detect unusual observations called outliers. From Figure 1.1 we notice that the distribution of
the live load is approximately symmetric: the central bin 120 140 is the most frequent and the
frequency of the other bins decrease as we move away from this central bin.
50
100
150
200
250
class
0.0
0.0
0.10 0.20
0.30
46
50
52
54
20
40
60
0 10 20 30 40 50 60
80
48
24.81524.82024.82524.83024.83524.840
1.2
Sample Mean
Quantitative variables such as the live load are usually denoted by upper case letters X, Y , etc. The
particular measurements for these variables are denoted by the corresponding lower case letters, xi ,
yi , etc. The subscripts give the order in which the measurements have been taken. For example,
the variable live load can be represented by X and, if the measurements were made floor by floor
from the first to the tenth, from bay A to bay U, then
x1 = 44.4,
x2 = 138.4,
...
x10 = 92.2,
x200 = 118.0.
...
The sample mean x (also called sample average) of a data set or sample is defined as
x=
x1 + x2 + + xn
=
n
Pn
i=1 xi
where n represents the number of data points (observations). For the live load data (see Table 1.1)
x = 140.156 pounds per ft2 .
The sample average can also be approximately calculated from a frequency table using the formula
Pk
k
ci ni X
=
ci fi .
i=1 ni
x Pi=1
k
The approximation is better when the measurements are symmetrically distributed over each bin.
For the live load data (see Table 1.2) we have
x
= (30 0.01) + (50 0.035) + . . . + (250 0.01) = 139.8 pounds per ft2 ,
which is close to the exact value, 140.156.
Properties of the Sample Mean
Linear Transformations: If the original measurements, xi are linearly transformed to obtain new
measurements
yi = a + bxi ,
for some constants a and b, then
y = a + bx.
In fact,
y=
Pn
i=1 yi
Pn
i=1 (a
+ bxi )
na + b
Pn
i=1 xi
=a+b
Pn
i=1 xi
= a + bx.
Example 1.1 Suppose that each live load from Table 1.1 is increased by 5 kilograms and converted
to kilograms per square foot. Since one pound equals 0.4535 kilograms, the revised measurements
are yi = 5 + 0.4535xi and y = 5 + 0.4535x = 5 + 0.4535 140.2 = 68.58kg.
Sum of Variables: If new measurements zi are obtained by adding old measurements xi and yi
then
z = x + y.
In fact,
z=
Pn
i=1 zi
Pn
i=1 (xi
+ yi )
Pn
i=1 xi
+
n
Pn
i=1 yi
= x + y.
Example 1.2 Let ui and vi (i = 1, . . . , 10) represent the live loads on bays A and B. The mean
load across floors for these two bays are (see Table 1.1)
u = (44.4 + 130.4 + . . . + 187.9)/10 = 134.42
(Bay A)
(Bay B).
If wi represent the combined live loads on bays A and B (i.e. wi = ui + vi ) then the combined mean
load across floors for these two bays is
w = u + v = 134.42 + 152.01 = 286.43.
10
Least Squares: The sample mean has a nice geometric interpretation. If we represent each observation xi as a point on the real line, then the sample mean is the point which is closest to entire
collection of measurements. More precisely, let S(t) be the sum of the squared distances from each
observation xi to the point t:
S(t) =
Then S(t) S(x) for all t. To prove this write
S(t) =
=
=
n
X
n
X
n
X
n
X
(xi t)2 .
[(xi x) + (x t)]2
[(xi x)2 + (x t)2 + 2(xi x)(x t)]
(xi x)2 + n(x t)2 + 2(x t)
since
n
X
n
X
(xi x)
(xi x) = nx nx = 0
for all t.
(Static Equilibrium).
Since the Fi0 s are all equal (Fi = w, say) we have F nw = 0 and so F = nw. To achieve torque
equilibrium, the placement d of F must satisfy
dF + (x1 F1 ) + (x2 F2 ) + . . . + (xn Fn ) = 0
(Torque Equilibrium).
x1 + x2 + . . . + xn
= x.
n
F3
F5
F2
x3
x5
x
6x2
F4
F1
x4
11
x1
F
Figure 1.3: The Sample Mean As Center of Gravity
1.3
Given the measurements (or sample) x1 , x2 , . . . , xn , their sample standard deviation SD(x) is defined
as
sP
n
2
i=1 (xi x)
SD(x) = +
.
n1
The expression inside the square root is called the sample variance, and denoted Var(x). In the
case of the live load data (Table 1.1)
Var(x) = 1583.892 square pounds per ft4
and
The standard deviation can be approximately calculated from a frequency table using the formula
sP
k
i=1 (ci
x)2 ni
.
n1
The approximation is better when the observations are symmetrically distributed on each bin. For
the live load (Table 1.2) we have
SD(x) +
SD(x)
(yi y)2
=
(n 1)
(a + bxi a bx)2
(n 1)
[b(xi x)]2
= b2
(n 1)
(xi x)2
= b2 Var(x).
(n 1)
12
Example 1.3 As in Example 1.1, each live load in Table 1.1 is increased by 5 kilograms per square
foot and converted to kilograms per square foot. Since one pound equals 0.4535 kilograms, the revised
measurements are yi = 5 + 0.4535xi kilograms per square foot and so Var(y) = 0.45352 Var(x) =
0.2056623 p1583.892 = 325.747kg 2 square kilograms per ft4 . The corresponding standard deviation
is SD(y) = 325.747 = 18.048kg kilograms per square foot.
Sum of Variables: If new measurements zi are obtained by adding old measurements xi and yi
then
Var(z) = Var(x) + Var(y) + 2Cov(x, y),
where
(1.1)
Pn
i=1 (xi
x)(yi y)
,
n1
is the covariance between xi and yi . The covariance will be further discussed in the next Chapter.
The important point here is to notice that the variances of xi and yi cannot simply be added to
obtain the variance of zi .
To prove (1.1) write
Cov(x, y) =
Var(z) =
=
=
Pn
i=1 (zi
Pn
z)2
=
n1
i=1 [(xi
Pn
i=1 (xi
Pn
i=1 (xi
+ yi x y)2
=
n1
Pn
i=1 [(xi
x) + (yi y)]2
n1
x)2 +
Pn
i=1 (yi
y)2 + 2
n1
Pn
i=1 (xi
x)(yi y)
Example 1.4 As in Example 1.2 let ui and vi be the live loads on bays A and B. The variances
and covariance for these loads are (see Table 1.1 and Example 1.2)
Var(u) =
Var(v) =
Cov(u, v) =
(Bay A)
(Bay B)
(xi x)2 =
n
X
i=1
x2i nx2 =
n
X
i=1
n
X
x2i (
i=1
xi )2 /n
(1.2)
13
(xi x)(yi y) =
n
X
i=1
xi yi nx y =
n
X
i=1
n
X
xi yi (
n
X
xi )(
i=1
yi )/n.
(1.3)
i=1
n
X
(xi x)2 =
i=1
(x2i + x2 2xi x) =
n
X
i=1
Pn
i=1 xi
n
X
i=1
x2i + nx2 2x
n
X
xi .
i=1
= nx and so
n
X
xi )2 /n.
i=1
Bay A (ui )
44.4
130.4
127.6
127.7
108.4
184.0
139.1
120.6
174.1
187.9
1344.2
Bay B (vi )
138.4
236.4
202.5
128.7
154.3
117.0
125.9
127.2
175.6
114.1
1520.1
u2i
1971.36
17004.16
16281.76
16307.29
11750.56
33856.00
19348.81
14544.36
30310.81
35306.41
196681.5
vi2
19154.56
55884.96
41006.25
16563.69
23808.49
13689.00
15850.81
16179.84
30835.36
13018.81
245991.8
ui vi
6144.96
30826.56
25839.00
16434.99
16726.12
21528.00
17512.69
15340.32
30571.96
21439.39
202364.0
Example 1.5 To illustrate the use of (1.2) and (1.3), lets calculate again Var(u), Var(v) and
Cov(u, v) where ui and vi are as in Example 1.4. Using (1.2) and the totals from Table 1.3 we have
196681.5
Var(u) =
9
(1344.2)2
10
= 1777.128
245991.8
and Var(v) =
9
(1520.1)2
10
= 1657.93.
1.4
202364.0
(1344.2)(1520.1)
10
= 218.650.
The location of non-symmetric data sets may be poorly represented by the sample mean because
the sample mean is very sensitive to the presence of outliers in the data. Notice that observations
far from the center have high torque or leverage and attract the sample mean (center of gravity)
toward them. The dispersion of non-symmetric data sets may also be poorly represented by the
sample standard deviation.
14
Example 1.6 A student with an average of 94.7% (SD=2.8%) on the first 10 assignments had a
personal problem and did very poorly on the eleventh where he got zero. Calculate his current
average and standard deviation.
Solution The mean drops from 95 to
x=
(10 95) + 0
= 86.09.
11
x2i =
10
X
i=1
P10
2
i=1 (xi 95)
Therefore,
90320.56 + 02 (11 86.092 )
= 879.4191,
10
p
and the standard deviation, then, increases from 2.8 to 879.4191 = 29.66.
Var(x) =
We will see that data sets which are asymmetric or include outliers may be better summarized
using the sample quantiles defined below.
Sample Quantiles
Let 0 < p < 1 be fixed. The sample quantile of order p, Q(p), is a number with the property
that approximately p100% of the data points are smaller than it. For example, if the 0.95 quantile
for the class final grades is Q(0.95) = 85 then 95% of the students got 85 or less. If your grade is
87 then you are in the the top 5% of the class. On the other hand, if your mark were smaller than
Q(0.10) than you would be in the lowest 10% of the class.
To compute Q(p) we must follow the following steps
1 Sort the data from smallest data point, x(1) , to largest data point, x(n) , to obtain
x(1) x(2) . . . x(n) .
The ith largest data point is denoted x(i) .
2 Compute the number np + 0.5. If this number is an integer, m, then
Q(p) = x(m) .
If np + 0.5 is not an integer and m < np + 0.5 < m + 1 for some integer m then
Q(p) =
x(m) + x(m+1)
.
2
15
Example 1.7 Let ui and vi be the live loads on the first two floors (see Table 1.4). Calculate the
quantiles of order 0.25, 0.50 and 0.75 for the live load on floors 1 and 2 and for the dierences
wi = ui vi between the live loads on these two floors.
Solution
To calculate the quantile of order 0.25 for the live load on floor 1, Qu (0.25), observe that n = 20,
p = .25 and so np + .5 = 20 .25 + .5 = 5.5 is between 5 and 6. Using the column u(i) from Table
1.4 we obtain
u(5) + u(6)
112.3 + 119.4
Qu (0.25) =
=
= 115.85.
2
2
Similar calculations give Qv (0.25) = 109.25 and Qw (0.25) = 25.25. To calculate Qu (0.50) notice
that np + .5 = 20 .50 + .5 = 10.5 is between 10 and 11. Again, using the column u(i) from Table
1.4 we obtain
u(10) + u(11)
150.4 + 152.3
Qu (0.50) =
=
= 151.35.
2
2
The reader can check using similar calculations that Qv (0.50) = 134.1, Qw (0.50) = 7, Qu (0.75) =
162.85, Qv (0.75) = 166.5 and Qw (0.75) = 38.
Unfortunately, the sample quantiles do not have the same nice properties as the the sample
mean in relation with sums and dierences of variables. For example
Qu (0.50) Qv (0.50) = 151.35 134.1 = 17.25
is quite dierent from Quv (0.50) = Qw (0.50) = 7. Also
Qu (0.25) Qv (0.25) = 115.85 109.25 = 6.6 6= 25.25 = Quv (0.50)
and
Qu (0.75) Qv (0.75) = 151.35 134.1 = 17.25 6= 38 = Quv (0.75).
Median and Interquartile Range
The quantiles Q(0.25), Q(0.5) and Q(0.75) are particularly useful and given special names: lower
quartile, median and upper quartile. Notice that the lowest 25% of the data is below Q(0.25) and
the lowest 75% of the data is below Q(0.75). Because of that, Q(0.25) and Q(0.75) are also called
first and third qartiles.
The lowest 50% of the data is below Q(0.5) and the other half is above it. Therefore the median
divides the data into two equal pieces, regardless the shape of the histogram. Because of this
property and the fact that the median is not much aected by outliers, it is often used as a measure
of location (instead of the mean).
The mean and the median are equal in the case of perfectly symmetric data sets. They are also
close in the presence of mild asymmetry. But very asymmetric data sets can produce very dierent
means and medians. When the mean and the median roughly agree we will normally prefer the
mean because of its nicer numerical properties (see the comments at the end of Problem 1.7). When
they do not, however, we will normally prefer the median because of its resistance to outliers. The
dierence between the mean and the median is a strong indication of the presence outliers in the
data which are severe enough to upset the sample mean.
16
ui
44.4
138.4
164.7
98.3
178.0
123.7
157.5
119.4
150.4
92.2
169.8
181.5
105.4
157.6
168.4
161.0
156.3
152.3
138.9
112.3
138.53
34.66
u(i)
44.4
92.2
98.3
105.4
112.3
119.4
123.7
138.4
138.9
150.4
152.3
156.3
157.5
157.6
161.0
164.7
168.4
169.8
178.0
181.5
vi
130.4
236.4
110.4
154.5
108.1
185.4
62.3
74.1
137.8
54.0
168.4
147.5
133.1
164.6
173.5
132.8
128.6
169.5
101.1
135.1
135.38
43.61
v(i)
54.0
62.3
74.1
101.1
108.1
110.4
128.6
130.4
132.8
133.1
135.1
137.8
147.5
154.5
164.6
168.4
169.5
173.5
185.4
236.4
wi
-86.0
-98.0
54.3
-56.2
69.9
-61.7
95.2
45.3
12.6
38.2
1.4
34.0
-27.7
-7.0
-5.1
28.2
27.7
-17.2
37.8
-22.8
3.145
51.37
w(i)
-98.0
-86.0
-61.7
-56.2
-27.7
-22.8
-17.2
-7.0
-5.1
1.4
12.6
27.7
28.2
34.0
37.8
38.2
45.3
54.3
69.9
95.2
As a rule of thumb we will calculate both the mean and the median and use the mean if they
are similar. Otherwise we will use the median. To guide our choice we can calculate the discrepancy
index
p |Mean Median|
d= n
2 IQR
and choose the mean when d is smaller than 1. The interquartile range (IQR), used in the denominator of d above, is defined as
IQR = Q(0.75) Q(0.25),
The IQR is recommended as a measure of dispersion in the presence of outliers and lack of symmetry.
Notice that IQR is proportional to the length of the central half of the data, regardless the shape
of the histogram, and it is not much aected by outliers.
Example 1.8 Refer to Example 1.6. Calculate the median, the interquatile range and the discrepancy index d for the students marks before and after the eleventh assignment (The marks are 94,
93, 95, 91, 96, 91, 98, 93, 99, 97 and 0). just one
Solution Since the sorted marks (before the eleventh assignment) are 91, 91, 93, 93, 94, 95, 96, 97,
98, 99, Q(0.25) = x(3) = 93, Q(0.5) = (x(5) + x(6) )/2 = (94 + 95)/2 = 94.5 and Q(0.75) = x(8) = 97.
p
Therefore, Median(x) = 94.5, IQR(x) = 9793 = 4 and d = 10(94.794.5)/(24) = 0.07905694.
Including the eleventh assignment we have Q(0.25) = (x(3) + x(4) )/2 = (91 + 93)/2 = 92,
Q(0.5) = x(6) = 94 and Q(0.75) = (x(8) + x(9) )/2 = (96 + 97)/2 = 96.5. Therefore, the new median
and IQR are: Median(x) = 94 and IQR(x) = 96.5 92 = 4.5. Unlike the mean, the median is very
littlepaected by the single poor performance. This is also reflected by the large discrepancy index
d = 11(86.09 94)/(2 9) = 2.915.
17
Example 1.9 Table 1.5 gives the mean, median, standard deviation and IQR for the data sets on
Figure 1.2. The mean and median of Tobins Q ratios show appreciable dierences (d = 2.98). In
addition, their standard deviation is more than twice their IQR. Clearly, the mean and standard
deviation are upset by a few heavily overrated firms. Tobins Q ratios are then better represented
by their median and IQR. The eect of outliers and lack of symmetry is moderate in the case of the
Age of Ocers data. Although d = 1.07 the mean and standard deviation still summarize these
data well. Finally, for the Speed of Light data the two clear (lower) outliers do not seem to have
much aect on the sample mean (d = 0.64).
Table 1.5: Summary figures for the data sets displayed on Figure 1.2
Data Set
Tobins Q ratio
Age of ocers
Speed of light
1.5
Mean
158.6
51.494
24.826
Median
118.5
52
24.827
Discrepancy
2.98
1.07
0.64
S. Deviation
97.749
1.739
0.011
IQR
47.593
2.222
0.005
Box Plot
The box plot is a powerful tool to display and compare data sets. It is just a box with whiskers
which helps to visualize the main quantiles (Q(0.25), Q(0.50) and Q(0.75)) and the extreme data
points (maximum and minimum).
For the following discussion refer to Figure 1.4 (b) and (d). The lower and upper ends of
the box are determined by the lower and upper quartiles (Q(0.25) and Q(0.75)); a line sectioning
the box displays the sample median and its relative position within the interquartile range. The
median then divides the main box into two smaller subboxes which represent the lower and upper
central quarters of the data. Symmetric data sets have upper and lower subboxes of equal size.
Asymmetric data sets have subboxes of dierent sizes, the larger one indicating the direction of
the asymmetry. The data on Figure 1.4 (b) is mildly asymmetric with a longer lower tail: the lower
subbox is larger than the upper one and the lower whisker is longer than the upper one. The data
on Figure 1.4 (d) is symmetric. The location and dispersion of a data set are also clearly conveyed
by the box plot: the position of the box (and the median line) give the location; the size (length)
of the box (proportional to the IQR) gives the dispersion. Larger boxes indicate larger dispersion.
Finally, the whiskers at either end extend to the extreme values (maximum and minimum).
Points which are above Q(0.75) + 1.5IQR or below Q(0.25) 1.5IQR are considered outliers.
The following rule is used to help visualizing outliers in the data: the length of the whiskers should
not exceed 1.5IQR and points outside this range are displayed as unconnected horizontal lines. This
is illustrated by Figure 1.4 (a) and (c) where the presence of outliers is flagged by the existence of
unconnected horizontal lines above the upper whisker (Figure 1.4 (a)) or below the lower whisker
(Figure 1.4 (c)).
18
24.7624.7824.8024.8224.84
48
50
52
54
Figure 1.4: Box plots for the data sets displayed on Figure 1.2
Example 1.10 Table 2.3 gives the monthly average flow (cubic meters per second) for the Fraser
River at Hope, BC, for the period 19711990. Figure 1.5 gives the boxplots for each month, from
January to December (from left to right). The year to year distributions of the monthly flows are
mildly asymmetric, with longer upper tails, and there are some outliers. However, the location and
dispersion summaries (see Table 1.10) are roughly consistent for most months and point to the same
conclusion: the river flow, and its variability as well, are much larger in the summer.
Table 1.6: Fraser River Monthly Flow (cms)
Year
Mean
Median
SD
IQR
Jan
957.4
868.0
274.4
174.6
Feb
894.8
849.5
202.8
163.0
Mar
993.1
926.5
233.5
257.0
Apr
1941.0
2010.0
477.8
427.8
May
4994.5
5000.0
976.4
613.0
Jun
6973.0
6365.0
1434.2
1325.9
Jul
5505.0
5120.0
1212.2
1277.8
Aug
3548.0
3380.0
886.4
505.6
Sep
2340.0
2245.0
685.6
446.3
Oct
1816.0
1910.0
401.7
424.1
Nov
1588.9
1525.0
366.1
377.8
Dec
1092.4
1005.0
282.2
181.1
19
2000
4000
6000
8000
10000
1.6. EXERCISES
Figure 1.5: Fraser River monthly flow (cms) from January (left) to December (right)
1.6
Exercises
Problem 1.1 The records of a department store show the following total monthly finance charges
(in dollars) for 240 customers which accounts included finance charges (see Table 1.7). From a
department stores records for a particular month, the total monthly finance charges in dollars were
obtained from 240 customers accounts that included finance charges. See the table shown below:
(a) Complete the frequency table. What percentage of customers were charged less than $20?
9
8
2
15
18
18
5
1
10
1
10
8
5
15
2
5
10
3
9
11
20
611
599
1051
781
578
796
774
820
772
696
573
748
748
797
851
809
723
5.47
4.88
5.62
5.63
4.07
5.29
5.34
5.26
5.44
5.46
5.55
5.34
5.30
5.36
5.79
5.75
5.29
5.10
5.86
5.58
5.27
5.85
5.65
5.39
21
1.6. EXERCISES
Problem 1.6 The mean size of twenty five recent projects at a construction company (in square
meters) is 25,689 m2 . The standard deviation is 2,542 m2 .
(a) Calculate the mean, variance and standard deviation in square feet [Hint: 1 foot = 0.3048 m].
(b) A new project of 226050 f t2 has been just completed. Update the mean, variance and standard
deviation.
Problem 1.7 The daily sales in April, 1994 for two departments of a large department store (in
thousands of USA dollars) are summarized below.
Problem 1.9 Find the average, variance and standard deviation for the following sets of numbers.
a) 1, 2, 3, 4, 5, . . . , 300
b) 4, 8, 12, 16, 20, . . . , 1200
c) 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, . . . , 9, 9, 9, 9, 9, 9, 9, 9, 9
Pn
Pn 2
Pn 3
Hint:
i = n(n + 1)/2,
i = n(n + 1)(2n + 1)/6,
i = n2 (n + 1)2 /4
and
Pn 4
3
2
i = n(n + 1)(6n + 9n + n 1)/30
22
30 Bolts
0.90
0.65
0.62
0.86
0.63
0.75
0.80
1.00
4.31
3.58
3.72
3.64
3.35
3.64
3.55
4.47
frequency
9
1177
5390
4263
5034
1449
141
15
1
Problem 1.10 The number of worldwide earthquakes in 1993 is shown in the following table
(a) Complete the frequency table. What percentage of earthquakes were below 5.0? Above 6.0?
(b) Draw a histogram and comment on it.
(c) Calculate the mean and standard deviation for the earthquake magnitude in 1993.
Problem 1.11 The daily number of customers served by a fast food restaurant were recorded for
30 days including 9 weekends and 21 weekdays. The average and standard deviations are as follows:
Weekends: x1 = 389.56, SD1 = 27.4
Weekdays: x2 = 402.19, SD2 = 26.2
Calculate the average and standard deviation for the 30 days.
Problem 1.12 The average and the standard deviation for the weights of 200 small concretemix
bags (nominal weight = 50 pounds) are 51.2 pounds and 1.5 pounds, respectively. A new sample
of 200 large concretemix bags (nominal weight = 100 pounds) have just been weighed. Do you
expect that the standard deviation for the last sample will be closer to 1.5 pounds or to 3.0 pounds?
Justify your answer.
23
1.6. EXERCISES
5
X
|xi t|,
for several values of t between 1 and 20, and plot D(t) versus t. Where is the minimum achieved?
Do the same experiment for the data set x1 = 1, x2 = 3, x3 = 8, x4 = 12. Do you notice
any pattern? If so, repeat this experiment for several additional sets of numbers, to investigate the
persistence of this pattern. What is your conclusion? Can you prove it mathematically?
Problem 1.14 Each pair (xi , wi ), i = 1, , n, represents the placement and magnitude of a vertical force acting on a uniform beam. Find the center of gravity of this system. [Hint: see the
discussion under The Sample Mean as Center of Gravity and notice that in the present case the
vertical forces are not equal].
Problem 1.15 Calculate the center of gravity of the system when the placements (xi ) and weights
(wi ) are given by Table 1.12.
Table 1.12: Placements of Vertical Forces on a Uniform Beam
xi
1.8
1.4
1.3
3.8
1.2
1.9
1.2
1.1
1.1
wi
2.1
1.6
1.4
6.4
1.3
1.2
1.2
3.1
1.1
xi
1.2
1.3
1.2
1.2
1.4
1.3
1.6
1.1
1.2
wi
1.5
4.7
2.3
2.3
3.1
1.9
2.4
3.7
1.2
Problem 1.16 Each pair (xi , wi ), i = 1, , n, represents the placement and magnitude of a vertical force acting on a uniform beam. What values of wi would make the sample median the center
of gravity? Consider the cases when n is even and n odd separately.
Problem 1.17 The maximum annual flood flows for a certain river, for the period 1941-1990, are
given in Table 1.6.
(i) Summarize and display these data.
(ii) Compute the mean, median, standard deviation and interquartile range.
(iii) If a oneyear construction project is being planned and a flow of 150000 cfs or greater will halt
construction, what is the probability (based on past relative frequencies) that the construction
will be halted before the end of the project? What if it is a two-year construction project?
Problem 1.18 The planned and the actual times (in days) needed for the completion of 20 job
orders are given in Table 1.14.
(a) Calculate the average and the median planned time per order. Same for the actual time.
(b) Calculate the corresponding standard deviations and interquartile ranges.
24
Flood, cfs
153000
184000
66000
103000
123000
143000
131000
99000
137000
81000
144000
116000
11000
262000
44000
8000
199000
6000
166000
115000
88000
29000
66000
72000
37000
Year
1966
1967
1968
1969
1970
1971
1972
1973
1974
1975
1976
1977
1978
1979
1980
1981
1982
1983
1984
1985
1986
1987
1988
1989
1990
Flood, cfs
159000
75000
102000
55000
86000
39000
131000
111000
108000
49000
198000
101000
253000
239000
217000
103000
86000
187000
57000
102000
82000
58000
34000
183000
22000
(c) If there is a delay penalty of $5000 per day and a beforeschedule bonus of $2500 per day, what
is the average net loss ( negative loss = gain) due to dierences between planned and actual times?
What is the standard deviation?
(d) Study the relationship between the planned and actual times.
(e) What would be your advice to the company based on the analysis of these data?
P
25
1.6. EXERCISES
Table 1.14: The planned and the actual times
Order
1
2
3
4
5
6
7
8
9
10
Planned Time
22
11
11
16
21
12
25
20
13
34
Actual Time
22
8
8
14
20
16
29
20
10
39
Order
11
12
13
14
15
16
17
18
19
20
Planned Time
17
27
16
30
22
17
13
18
21
18
Actual Time
18
34
14
35
18
16
12
14
19
17
Problem 1.20 The total paved area, X (in km2 ), and the time, Y (in days), needed to complete
the project was recorded for 25 dierent jobs. The data is summarized as follows:
x = 12.5 km2
y = 30.8 days
Cov(x, y) = 3.4
Give the corresponding summaries when the area is measured in ft2 and the time is measure in
hours.
Hint: 1 foot = 0.3048 m, and 1 km = 1000 m.
26
Chapter 2
2.1
Scatter Plot
June
5000 7000 9000 11000
28
10
20
30
Age
40
Flow
2000400060008000
Price
800 1200 1600 2000
6
8
Month
10
12
29
1000
2000
3000
Mar
6008001000
1400
Apr
1800
5000 7000
10001500200025003000
May
3000
600800
1200 1600
Jan
600 8001000120014001600
Feb
300040005000600070008000
2.2
The Covariance and the correlation coecient are used to quantify the degree of linear association
between pairs of variables. If two variables, xi and yi , are positively associated then when one of
them is above (below) its mean the other will also tend to be above (below) its mean. Therefore,
the products (xi x)(yi y) will be mostly positive and the sample covariance,
Cov(x, y) =
n
1 X
(xi x)(yi y)
n 1 i=1
(2.1)
will be large and positive. On the other hand, if the variables are negatively associated, when one
of them is above (below) its mean the other will tend to be below (above) its mean and so the
products (xi x)(yi y) will be mostly negative. In this case the sample covariance (2.1) will be
large and negative. Finally, if the variables are not positively nor negatively associated the products
(xi x)(yi y) will be positive and negative with approximately the same frequency (there will be
a fair degree of cancellation) and the sample covariance will be small.
The following formula provides a simple procedure for the hand calculation of the covariance:
Cov(x, y) =
=
n
1 X
(xi x)(yi y)
n 1 i=1
n
[xy x y] ,
n1
where
xy =
n
1X
xi yi
n i=1
(2.2)
Some problems with the interpretation of the covariance and its direct use as a measure of linear
association are illustrated in Example 2.1.
Example 2.1 Consider the measurements (xi , yi ) of the firstcrack and failure load (in pounds
per square foot) on Table 2.1. Figure 2.3 suggests that there little association between these measurements. Since x = 8396.6 pounds per square foot, y = 16, 064.4 pounds per square foot, and
30
Cov(x, y)
= 0.011259 million square pounds per ft4 .
1000 1000
Table 2.1: Strength of concrete beams
Unit
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Correlation Coecient
Problem 2.1 illustrates the strong dependency of Cov(x, y) on the scale of the variables. A
measure of linear association which is independent from the variables scale (see 2.5) is provided the
sample correlation coecient,
Cov(x, y)
Cov(x, y)
=
.
SD(x) SD(y)
Var(x)Var(y)
r(x, y) = p
11258.99
= 0.0026.
(2193.17)(1949.36)
31
16000
14000
Failure Load
18000
6000
8000
10000
12000
First-Crack Load
96.2759
= 0.9664 = 0.97,
77.50 128.06
indicates a strong positive linear association between temperature and yield. This is also clearly
suggested by the scatter plot in Figure 2.4. Notice that the relation between yield and tempreature
is likely to be causal, that is, the increase in yield may be actually caused by the increase in
temperature.
Several Pairs of Variables
When we have several variables their covariances and correlation coecients can be arranged in
matrix layouts called covariance matrix and correlation matrix. Although the covariance matrix is
dicult to interpret due to its dependence on the scale of the variables, it is nevertheless routinely
computed for future usage.
The correlation matrix is the numerical counterpart of the scatter plot matrix discussed before.
For the River Fraser Data (see Figure 2.2) we have
32
Temp. (X)
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
Jan
Feb
Mar
Apr
May
Yield (Y)
28
26
22
25
27
32
31
33
38
41
41
38
41
46
44
Jan
1.00
0.78
0.65
0.40
0.18
Feb
0.78
1.00
0.75
0.34
0.15
Unit
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
Mar
0.65
0.75
1.00
0.50
0.19
Temp. (X)
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
Apr
0.40
0.34
0.50
1.00
0.29
Yield (Y)
41
45
53
46
44
49
53
49
51
55
56
58
58
58
63
May
0.18
0.15
0.19
0.29
1.00
As already observed from Figure 2.2, February flaws are somewhat correlated with January and
March flaws (with correlation coecients 0.78 and 0.75, respectively). January and March flaws
are also marginally correlated (correlation coecient equal to 0.65). The correlation coecients
between all the other pairs of months are below 0.50.
2.3
The scatter plot of linearly associated variables approximately follows a linear function
f(x) = 0 + 1 x
called regression line. The hats indicate that 0 , 0 and f(x) are calculated from the data. In this
context X and Y play dierent roles and are given special names. The independent variable X is
called explanatory variable and the dependent variable Y is called response variable.
Least Squares
The solid line on Figure 2.4 (see Example 2.2) was obtained by the method of least squares (LS).
According to this method, the regression coecients (the intercept 0 and the slope 1 ) minimize
(in b0 and b1 ) the sum of squares
S(b0 , b1 ) =
n
X
i=1
(yi b0 b1 xi )2 .
33
40
30
Yield
50
20
25
30
35
40
45
50
Temperature
n
X
i=1
(yi 0 1 xi ) = 0
(Gauss Equations)
(yi 0 1 xi )xi = 0.
which are obtained by r dierencing S(b0 , b1 ) with respect to b0 and b1 . Carrying out the summations
and dividing by n we obtain,
y 0 1 x = 0
xy 0 x 1 xx = 0.
(2.3)
(2.4)
where
xy = (1/n)
n
X
xi yi
and
xx = (1/n)
i=1
n
X
x2i
i=1
From (2.3), 0 = y 1 x. Substituting this into (2.4) and solving for 1 gives
xy x y
1 =
.
xx x x
Fitted Values and Residuals
(2.5)
34
The regression line f(x) and the regression coecients 0 and 1 are good summaries for linearly
associated data. In this case the fitted value
yi = f(xi ) = 0 + 1 xi
(Fitted Value)
will be close to the observed value of yi . How close depends on the strength of the linear association. The dierences between the observed values yi and the fitted values yi ,
are called regression residuals.
ei = yi yi
(Residual),
Residual Plot
The regression residuals ei are usually plotted against the fitted values yi to determine the
appropriateness of the linear regression fit. If the data are well summarized by the regression line
(see Figure 2.5 (a)) the corresponding scatter plot of (
yi , ei ) has no systematic pattern (see Figure
2.5 (c)). Examples of bad residual plots that is, plots that indicate that the regression line is a
poor summary for the data are given on Figure 2.5 (d) and (e). The corresponding scatter plots
and linear fits are given on Figure 2.5 (b) and (c). In the case of Figure 2.5 (d), the residuals go
from positive to negative and back to positive, suggesting that the relation between X and Y may
not be linear. In the case of Figure 2.5 (e) larger fitted values have larger residuals (in absolute
value).
2.4
In practice we often use several explanatory variables to predict or interpolate the values of a
single response variable. The explanatory variables may all be distinct or may include functions
(powers) of the observed explanatory variables.
If for example, we have p explanatory variables (X1 , X2 , , Xp ) and n observations or cases,
it is convenient to use double subscript notation. The first subscript (i) indicates the case and the
second subscript (j) indicates the variable.
Case (i)
1
2
3
yn
xn1 xn2
xnp
n
X
i=1
35
2.5. EXERCISES
The least square coecients are the solution to the linear equations
n
X
i=1
n
X
i=1
n
X
i=1
(Gauss Equations)
n
X
i=1
xp y 0 xp 1 x1 xp 2 x2 xp p xp xp = 0
where
yxj = (1/n)
n
X
i=1
2.5
xij yi
and
xj xk = (1/n)
n
X
xij xik .
(2.6)
i=1
Exercises
Problem 2.1
Problem 2.2 The following data give the logarithm (base 10) of the volume occupied by algal
cells on successive days, taken over a period over which the relative growth rate was approximately
constant.
36
8000
4000
10
15
x
20
25
30
2000
40
50
40
0
20
40
60
80 100
Fitted Value
120
140
20
40
60
80
-2000
100
2000
4000 6000
Fitted Value
-1000
-100
1000
Residual
20
400
y
200
100
30
Residual
20
-20
10
100
Residual
0
8000 10000
-200
2000
50
y
6000
y
100
300
10000
150
12000
200
50
100
150
200
Fitted Value
250
300
Figure 2.5: Examples of linear regression fits (above) and their residual plots (below).
Day (x)
1
2
3
4
5
6
7
8
9
(1) Plot log y against x. Do you think using the logarithmic scale is appropriate? Why?
(2) Calculate and interpret the sample correlation coecient.
Problem 2.3 The maximum annual flood flows of a river, for the period 19491990, are given in
Table 1.6.
(i) Summarize and display these data.
(ii) Compute the mean, median, standard deviation and interquartile range.
(iii) If a oneyear construction project is being planned and a flow of 150000 cfs or greater will halt
construction, what is the relative frequency (based on past relative frequencies) that the construction
will be halted before the end of the project? What if it is a two-year construction project?
37
2.5. EXERCISES
Fitted vs Residuals
20
25
60
70
-2
-4
20
Time
70
80
10
10
15
Diameter
Residual
0
5
6
20
25
20
Residual
0
2
90
50
10
15
Diameter
20
70
80
Fitted Value
25
Diameter vs Residuals
60
25
-2
60
50
70
80
Fitted Value
15
Diameter
Fitted vs Residuals
Residual
0
2
100
90
60
Cubic Fit
25
10
15
Diameter
Diameter vs Residuals
-2
Residual
0
2
Time
70
80
60
50
10
100
90
Residual
0
2
90
80
90
Fitted Value
Fitted vs Residuals
100
Quadratic Fit
-5
-4
15
Diameter
-10
-5
-2
10
-10
50
60
Time
70
80
Diameter vs Residuals
Residual
0
5
90
10
100
Linear Fit
10
15
Diameter
20
25
Problem 2.4 The planned and the actual times (in days) needed for the completion of 20 job
orders are given in Table 1.14
(a) Calculate the average and the median planned time per order. Same for the actual time.
(b) Calculate the corresponding standard deviations and interquartile ranges.
(c) If there is a delay penalty of $5000 per day and a beforeschedule bonus of $2500 per day, what
is the average net loss ( negative loss = gain) due to dierences between planned and actual times?
What is the standard deviation?
(d) Study the relationship between the planned and actual times.
(e) What would be your advice to the company based on the analysis of these data?
Problem 2.5 (a) Show that
Cov(x, y) =
(b) Show that if ui = a + b xi
(i) u = a + bx
(ii) Var(u) = b2 Var(x)
(iii) r(u, v) = r(x, y)
(iv) (u, v) = db (x, y)
and
xi yi ) nx y
n1
and
vi = c + d yi , then
Cov(x, y)
,
Var(x)
38
Jan
855
774
984
987
797
1140
1240
881
801
684
1860
821
972
1160
740
813
1000
629
800
1210
Feb
1030
857
842
929
780
1030
1230
791
721
649
1480
927
977
1010
706
809
944
657
685
841
Mar
841
1500
850
927
736
924
1130
952
957
703
1300
844
1240
1160
801
1280
1300
809
682
926
Apr
1550
2100
1550
2320
1100
2300
2350
1960
1290
1760
1880
1010
1990
2030
2070
2090
2280
2410
1780
3000
May
6120
6450
4910
5890
3940
7070
4710
3950
4910
5120
4950
5360
4090
2870
5300
3770
5120
5450
4860
5050
Jun
7590
10800
6180
8430
6830
7250
5670
5730
6360
4900
6260
8690
6060
6370
7390
8390
5840
5940
6020
8760
Jul
5590
7330
5000
7470
6070
7670
4830
4540
4860
4010
4890
7230
5240
6580
4650
5380
4070
4430
3990
6270
Aug
3570
4120
2930
4360
3420
6440
3620
2970
2610
2720
3620
4850
3460
3780
2770
3220
2980
3010
3170
3340
Sep
2360
2280
1680
2440
2300
4460
2340
2600
1830
2600
2130
3620
2210
2920
1940
1890
1680
1890
1840
1790
Oct
1890
1940
2080
1930
1950
2510
1650
2090
1420
2080
1530
2310
1470
2560
1980
1470
1020
1540
1380
1520
Nov
1550
1500
1620
1290
2360
1800
1260
1590
918
1630
1950
1470
2050
1370
1230
1340
1210
1470
2060
2110
Dec
908
1000
1130
978
1480
1480
1030
1010
952
1900
1140
1110
878
861
746
908
811
926
1410
1190
Problem 2.6 The total paved area, X (in km2 ), and the time, Y (in days), needed to complete
the project was recorded for 25 dierent jobs. The data is summarized as follows:
x = 12.5 km2
y = 30.8 days
Cov(x, y) = 3.4 ,
r(x, y) = 0.766 ,
= 2.36
Give the corresponding summaries when the area is measured in feet2 and the time is measure in
hours.
Hint: 1 foot = 0.305 m, and 1 km = 1000 m.
Problem 2.7 Show that 1 r(x, y) 1.
Hint: One can assume without loss of generality that
x=y=0
n
X
i=1
(yi bxi )2
(why?)
39
2.5. EXERCISES
Flood, cfs
153000
184000
66000
103000
123000
143000
131000
99000
137000
81000
144000
116000
11000
262000
44000
8000
199000
6000
166000
115000
88000
29000
66000
72000
37000
Year
1966
1967
1968
1969
1970
1971
1972
1973
1974
1975
1976
1977
1978
1979
1980
1981
1982
1983
1984
1985
1986
1987
1988
1989
1990
Flood, cfs
159000
75000
102000
55000
86000
39000
131000
111000
108000
49000
198000
101000
253000
239000
217000
103000
86000
187000
57000
102000
82000
58000
34000
183000
22000
Planned Time
22
11
11
16
21
12
25
20
13
34
Actual Time
22
8
8
14
20
16
29
20
10
39
Order
11
12
13
14
15
16
17
18
19
20
Planned Time
17
27
16
30
22
17
13
18
21
18
Actual Time
18
34
14
35
18
16
12
14
19
17
40
Chapter 3
Probability
3.1
The theory of probability, which is briefly discussed below, is needed for the better understanding of some important statistical techniques. This theory is, roughly speaking, concerned with the assessment of the chances (or likelihood) that certain events will or will not
occur. In order to give a more precise (and useful) definition of probability, we need first to
introduce some technical concepts and definitions.
Random Experiment: The defining feature of a random experiment is that its outcome
cannot be determined beforehand. That is, the outcome of the random experiment will
only be known after the experiment has been completed. The next time the experiment is
performed (seemingly under the exact same conditions) the outcome may be dierent. Some
examples of random experiments are:
Sample Space (S): Although we may not be able to say beforehand what the outcome of
the random experiment will be, we should at least in principle to be able to make a complete
list of all the possible outcomes. This list (set) of all the possible outcomes is called the
sample space and denoted by S. A generic outcome (that is, element of S) is denoted by
w. The sample spaces for the random experiment listed above are:
S = {Yes, No},
S = {0, 1, 2, . . . , n} where n is the lot size,
S = [0, 1), the time (in hours) between breakdowns can be any non-negative real number.
41
42
CHAPTER 3. PROBABILITY
S = {0, 1, 2, . . .}, the number of accidents can be any non-negative integer number.
S = [0, 100], the percentage yield can be any real number between zero and one hundred.
Event: The events, usually denoted by the first upper case letters of the alphabet (A, B, C,
etc), are simply subsets of S. Most events encountered in practice are meaningful and can
be expressed either in words or using mathematical notation. Some examples (related to the
list of random experiments given above) are:
A = { less than four defectives} = {0, 1, 2, 3}.
B = { more than 200 hours} = (200, 1).
C = {2, 3, 5, 9}
()
w belongs to A
()
A occurs
w doesnt belongs to A
()
A doesnt occur
and
w 62 A
()
Probability Function (P ): Evidently, not all the events are equally likely. For instance,
the event
A = {more than three million accidents}
would appear to be quite unlikely, while the event
B = {more than three hours before the next crash}
would appear to be quite likely.
A probability function P is a function which assigns to each event a number representing
the likelihood that this event will actually occur.
For self-consistency reasons, any probability function P must satisfy the following properties:
(1) P () = 0 and P (S) = 1.
43
Example 3.1 It is known from previous experience that the probability of finding zero, one,
two, etc. defectives in lots of 100 items shipped by a certain supplier are as given in Table
2.1 below.
Let A, B and C be the events less than two defectives, more than one defective and
one or two defectives, respectively. (a) Calculate P (A), P (B) and P (C). (b) What is the
meaning (in words) of the event Ac ? Calculate P (Ac ) directly and using Property 4. (c)
What is the meaning (in words) of the event A [ C? Calculate P (A [ C) directly and using
Property 3.
Table 3.1:
Defectives
0
1
2
3
4
5
6 or more
Probability
0.50
0.20
0.15
0.10
0.03
0.02
0.00
44
CHAPTER 3. PROBABILITY
Solution
(a) From Table 2.1, P (A) = 0.70, P (B) = 0.30, and P (C) = 0.35
(b) Ac = {two or more defectives} = {more than one defective} = B, from Table 1, P (Ac ) =
P (B) = 0.30. This is consistent with the result we obtain using Property 4:
P (Ac ) = 1 P (A) = 1 0.70 = 0.30.
(c) A [ C = {less than three defectives}. Therefore, directly from Table 1, P (A [ C) = 0.85.
To make the calculation using Property 3, we must first find P (A \ C). Since A \ C =
{exactly one defective}, it follows from Table 1 that P (A \ C) = 0.20. Now,
P (A [ C) = 0.70 + 0.35 0.20 = 0.85.
2
3.2
There are instances when, after obtaining some partial information regarding the outcome
of a random experiment, one would like to update the probabilities of certain events, taking
into account the newly acquired information.
The updated probability of the event A, when it is known that the event B has occurred,
is in general denoted by P (A|B) and called the conditional probability of A given B. This
conditional probability can be calculated by the formula
P (A|B) =
P (A \ B)
P (B)
(3.1)
provided that P (B) > 0. A simple, but nevertheless important, consequence of (2) is that
P (A \ B) = P (A|B)P (B),
(3.2)
Since P (B) = 0.30 and P (D \ B) = P ({3, 4, 5}) = 0.15, the desired conditional probability
is
P (D|B) = P (D \ B)/P (B) = 0.15/0.30 = 0.50.
45
Poor design
- underestimated live load
- underestimated maximum wind speed
- etc.
A2
Poor construction
-
A3
A combination of A1 and A2 .
A4
Suppose that, from previous experience or some other source (for example some experts
opinion), the conditional probabilities of B given Ai are known. That is, the probabilities
that the event B will occur when the the cause Ai is present are known and represented by
p 1 , p2 , . . . , p m .
We will call these conditional probabilities risk factors. Suppose also that the probabilities
of each possible cause Ai are known. These probabilities are called prior probabilities and
denoted
1 , 2 , . . . , m .
In the case of our example, the prior probabilities may represent the actual fractions of
industrial buildings in the country which have some design or construction problems. Or they
may represent the subjective beliefs (educated guesses) of some expert consultant (perhaps
the engineer hired by the insurance company to investigate the causes of the accident). In
summary, we suppose that
pi = P (B|Ai ),
and
i = P (Ai ),
46
CHAPTER 3. PROBABILITY
Table 3.2:
Cause (i)
1
2
3
4
Prior Probability
0.00050
0.00010
0.00001
0.99939
(i )
Posterior Probability
0.29
0.12
0.02
0.57
The engineer hired by the insurance company to investigate the accident would certainly
wish to know where he can first start looking to find an assignable causes. More precisely, she
would wish to know what is the most likely assignable cause for the collapse of the building.
The conditional probability of each possible cause, given the fact that the event has
occurred, is called the posterior probability for this cause and can be calculated by the
famous Bayes formula
P (B|Ak )P (Ak )
P (B|A1 )P (A1 ) + P (B|A2 )P (A2 ) + . . . + P (B|Am )P (Am )
pk k
=
.
p1 1 + p2 2 + . . . + pm m
P (Ak |B) =
In the case of our example the posterior probability of the cause poor design (A1 ), for
instance, is equal to
P (A1 |B) =
(0.00050)(0.10)
(0.00050)(0.10) + (0.00010)(0.20) + (0.00001)(0.40) + (0.99939)(0.0001)
= 0.29.
The other posterior probabilities are calculated analogously and the results are displayed in
the fourth column of Table 3.2.
What did the engineer learn from the results of these (posterior probability) calculations?
In the first place she learned that the chance of finding an assignable cause is approximately
43%. Furthermore, she learned that it is best to begin looking for flaws in the design of the
building, as this cause is almost three times more likely to have caused the accident than the
other assignable causes. Finally she learned that it is highly unlikely that the collapse of the
building has been caused by more than one assignable cause.
47
P (B \ Ak )
.
P (B)
(3.3)
therefore,
P (B|Ak )P (Ak )
P (B|A1 )P (A1 ) + P (B|A2 )P (A2 ) + + P (B|Am )P (Am )
k pk
=
.
1 p1 + 2 p2 + + m pm
P (Ak |B) =
Example 3.2 A certain disease is known to aect 1% of the population. A test for the
disease has the following features: if the person is contaminated the test is positive with
probability 0.98. On the other hand, if the person is healthy, the test is negative with
probability 0.95. (a) What is the probability of a positive test when applied to a randomly
chosen subject? (b) What is the probability that an individual is aected by the disease after
testing positive? (c) Explain the connections between this problem and Bayes formula.
Solution
(a) Since B is clearly equal to the disjoint union of the events B \ C and B \ C c ,
P (B) =
=
=
=
P (B \ C) + P (B \ C c )
P (C)P (B|C) + P (C c )P (B|C c )
(0.01 0.98) + (0.99 0.05)
0.0593
48
CHAPTER 3. PROBABILITY
(b)
P (C|B) =
P (B \ C)
P (B|C)P (C)
0.98 0.01
=
=
= 0.1653
P (B)
P (B)
0.0593
Notice that the probability of having the disease, even after testing positive, is surprisingly
low (less than 0.17). Why do you think this is so?
(c) The calculation in part (a) produced the unconditional probability that the event
testing positive. This unconditional probability constitutes the denominator of Bayes
formula. If a person has been tested positive, given the characteristics of the test, this can
be caused by two possible causes: being healthy and being contaminated. The posterior
probability of the second cause is the result of part (b).
2
Independence
Roughly speaking, two events A and B are independent when the probability of any one
of them is not modified after knowing the results for the other (occurrence or not occurrence).
In other words, knowing about the occurrence or no occurrence of any one of these events
does not alter the amount of information (or uncertainty) that we initially had regarding the
other event. Quite simply then, we can say that two events are independent if they do not
carry any information regarding each other.
The formal definition of independence is somewhat surprising at first because it doesnt
make any direct reference to the events conditional probabilities. But see also the remarks
following the definition. Probabilists prefer this formal definition, because it is easy to check
and to generalize for the case of m events (m 2).
Definition: The events A and B are independent if
P (A \ B) = P (A)P (B).
Suppose that the events A and B are such that
P (A|B) = P (A).
In this case,
P (A \ B) = P (A|B)P (B) = P (A)P (B),
and the events A and B are independent according to the given definition.
On the other hand, if P (B) > 0 and A and B satisfy the given definition of independence,
then
P (A|B) =
P (A \ B)
P (A)P (B)
=
= P (A).
P (B)
P (B)
49
Example 3.3 The results of the STAT 251 midterm exam can be classified as follows:
Table 3.3:
High
Medium
Low
Male
0.05
0.30
0.30
0.65
Female
0.15
0.15
0.05
0.35
0.20
0.45
0.35
1.00
What is the meaning of the statement gender and performance are independent? Are they?
Why?
Solution
Gender and performance are (intuitively) independent if for example, knowing the score
of a randomly chosen test doesnt aect the probability that this test corresponds to a male
(0.65, from the table) or to a female (0.35). Or vice versa, knowing the gender of the student
who wrote the test doesnt modify our ability to predict its score.
Let A and B be the events a randomly chosen student is male and a randomly chosen
student has a high score, respectively. Is it true that P (A|B) = P (A)? The answer, of
course, is no because
P (A|B) = 0.05/0.20 = 0.25
and
P (A) = 0.65.
Before knowing that the score is high, the chances are almost two out of three that the
student is a male. However, after we know that the score is high, the chances are one out of
four that the student is a male. The lack of independence in this case is derived from the fact
that male students are underrepresented in the high score category and over-represented
in the low score category.
2
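The same check can be automated: independence holds exactly when every cell of the joint table equals the product of its row and column totals. A small sketch in Python, with Table 3.3 entered by hand, might look as follows:

# Joint probabilities from Table 3.3: (gender, score level) -> probability.
table = {
    ("Male",   "High"): 0.05, ("Male",   "Medium"): 0.30, ("Male",   "Low"): 0.30,
    ("Female", "High"): 0.15, ("Female", "Medium"): 0.15, ("Female", "Low"): 0.05,
}

# Marginal probabilities for gender and for score.
p_gender, p_score = {}, {}
for (g, s), p in table.items():
    p_gender[g] = p_gender.get(g, 0) + p
    p_score[s] = p_score.get(s, 0) + p

# Independence requires P(g, s) = P(g) * P(s) for every cell of the table.
independent = all(abs(p - p_gender[g] * p_score[s]) < 1e-9
                  for (g, s), p in table.items())
print(independent)  # False: e.g. P(Male, High) = 0.05 but 0.65 * 0.20 = 0.13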
If Table 3.3 above is replaced by Table 3.4
Table 3.4:
           High    Medium    Low     Total
 Male      0.13    0.29      0.23    0.65
 Female    0.07    0.16      0.12    0.35
 Total     0.20    0.45      0.35    1.00
then gender and performance are independent: up to rounding, each cell probability equals the product of the corresponding row and column totals (for instance, 0.65 × 0.20 = 0.13). In practice, independence is rarely checked directly from the definition; it is usually assumed, derived from external information regarding the physical make-up of the random experiment, as illustrated in Example 3.4 below.
Fortunately, then, we will have few occasions of checking this definition throughout this course.
Definition: The events Ai (i = 1, . . . , m) are independent if
P(Ai ∩ Aj) = P(Ai)P(Aj),   P(Ai ∩ Aj ∩ Ak) = P(Ai)P(Aj)P(Ak),   . . . ,   P(A1 ∩ A2 ∩ ⋯ ∩ Am) = P(A1)P(A2)⋯P(Am),
for every choice of distinct indices i, j, k, . . . .
Example 3.4 A certain system has four independent components {a1 , a2 , a3 , a4 }. The pairs
of components a1 , a2 and a3 , a4 are in line. This means that, for instance, the subsystem
{a1 , a2 } fails if any of its two component does; similarly for the subsystem {a3 , a4 }. The
subsystems {a1 , a2 } and {a3 , a4 } are in parallel. This means that the system works if at least
one of the two subsystems does. Calculate the probability that the system fails assuming
that the four components are independent and that each one of them can break down with
probability 0.10. How many parallel subsystems would be needed if the probability of failure
for the entire system cannot exceed 0.001?
[Figure: components a1 and a2 in line form one subsystem, and components a3 and a4 form another; the two subsystems are connected in parallel.]
Solution Let Ai be the event component ai works (i = 1, . . . , 4), and let C be the event
the system works.
P(C) = P[(A1 ∩ A2) ∪ (A3 ∩ A4)] = P(A1 ∩ A2) + P(A3 ∩ A4) − P[(A1 ∩ A2) ∩ (A3 ∩ A4)]
     = P(A1)P(A2) + P(A3)P(A4) − P(A1)P(A2)P(A3)P(A4)
     = 0.9² + 0.9² − 0.9⁴ = 0.9639
To answer the second question, just notice that the probability of working for each independent subsystem is 0.9² = 0.81. Now, if Bi (i = 1, . . . , m) is the event "the ith subsystem works", the probability that the entire system fails is
1 − P(B1 ∪ B2 ∪ ⋯ ∪ Bm) = P(B1ᶜ ∩ B2ᶜ ∩ ⋯ ∩ Bmᶜ) = P(B1ᶜ)P(B2ᶜ)⋯P(Bmᶜ) = [1 − P(B1)]ᵐ = (1 − 0.81)ᵐ,
and this must be at most 0.001. Therefore,
m log(0.19) ≤ log(0.001)   ⟹   m ≥ log(0.001)/log(0.19) = 4.16   ⟹   m = 5.
2
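Both the reliability of the two-subsystem layout and the search for the smallest adequate number of parallel subsystems can be scripted. A minimal sketch in Python, assuming (as in the example) that every component works with probability 0.9:

import math

p_component = 0.9               # each component works with probability 0.9
p_subsystem = p_component ** 2  # two components in line: both must work

def p_system_fails(m):
    # A system of m parallel subsystems fails only if all m subsystems fail.
    return (1 - p_subsystem) ** m

print(1 - p_system_fails(2))    # 0.9639, as in the example

# Smallest m with failure probability at most 0.001.
m = 1
while p_system_fails(m) > 0.001:
    m += 1
print(m)                                             # 5
print(math.log(0.001) / math.log(1 - p_subsystem))   # about 4.16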
3.3 Exercises
Problem 3.1 If A and B are independent events with P (A) = 0.2 and P (B) = 0.5, find the
following probabilities. (a) P(A ∪ B); (b) P(A ∩ B); and (c) P(Aᶜ ∩ Bᶜ).
Problem 3.2 In a certain class, 5 students obtained an A, 10 students obtained a B, 17
students obtained a C, and 6 students obtained a D. What is the probability that a randomly
chosen student receive a B? If a student receives $10 for an A, $5 for a B, $2 for a C, and $0
for a D, what is the average gain that a student will make from this course?
Problem 3.3 Consider the problem of screening for cervical cancer. The probability that a
women has the cancer is 0.0001. The screening test correctly identifies 90% of all the women
who do have the disease, but the test is false positive with probability 0.001.
(a) Find the probability that a woman actually does have cervical cancer given the test says
she does.
(b) List the four possible outcomes in the sample space.
Problem 3.4 An automobile insurance company classifies each driver as a good risk, a
medium risk, or a poor risk. Of those currently insured, 30% are good risks, 50% are medium
risks, and 20% are poor risks. In any given year the probability that a driver will have at
least one accident is 0.1 for a good risk, 0.3 for a medium risk, and 0.5 for a poor risk.
(a) What is the probability that the next customer randomly selected will have at least one
accident next year?
(b) If a randomly selected driver insured by this company had an accident this year, what is
the probability that this driver was actually a good risk?
Problem 3.5 A truth serum given to a suspect is known to be 90% reliable when the person
is guilty and 99% reliable when the person is innocent. In other words, 10% of the guilty are
judged innocent by the serum and 1% of the innocent are judged guilty. If the suspect was
selected from a group of suspects of which only 5% have ever committed a crime, and the
serum indicates that he is guilty, what is the probability that he is innocent?
Problem 3.6 70% of the light aircrafts that disappear while in flight in a certain country
are subsequently discovered. Of the aircrafts that are discovered, 60% have an emergency
locator, whereas 80% of the aircrafts not discovered do not have an emergency locator.
(a) What percentage of the aircrafts have an emergency locator?
(b) What percentage of the aircrafts with emergency locator are discovered after they disappear?
Problem 3.7 Two methods, A and B, are available for teaching a certain industrial skill.
The failure rate is 20% for A and 10% for B. However, B is more expensive and hence is
only used 30% of the time (A is used the other 70%). A worker is taught the skill by one
of the methods, but fails to learn it correctly. What is the probability that the worker was
taught by Method A?
Problem 3.8 Suppose that the numbers 1 through 10 form the sample space of a random
experiment, and assume that each number is equally likely. Define the following events: A1 ,
the number is even; A2 , the number is between 4 and 7, inclusive.
(a) Are A1 and A2 mutually exclusive events? Why?
(b) Calculate P(A1), P(A2), P(A1 ∩ A2), and P(A1 ∪ A2).
(c) Are A1 and A2 independent events? Why?
Problem 3.9 A coin is biased so that a head is twice as likely to occur as a tail. If the coin
is tossed three times,
(a) what is the sample space of the random experiment?
(b) what is the probability of getting exactly two tails?
Problem 3.10 Items in your inventory are produced at three different plants: 50 percent from plant A1, 30 percent from plant A2, and 20 percent from plant A3. You are aware that your plants produce at different levels of quality: A1 produces 5 percent defectives, A2 produces 7 percent defectives, and A3 yields 8 percent defectives. You select an item from
your inventory and it turns out to be defective. Which plant is the item most likely to have
come from? Why does knowing the item is defective decrease the probability that it has
come from plant A1 , and increase the probability that it has come from either of the other
two plants?
Problem 3.11 Calculate the reliability of the system described in the following figure. The
numbers beside each component represent the probabilities of failure for this component.
Note that the components work independently of one another.
[Figure: a network of five components; the failure probabilities are 0.05 for components 1, 2 and 4, and 0.1 for components 3 and 5.]
Problem 3.12 A system consists of two subsystems connected in series. Subsystem 1 has
two components connected in parallel. Subsystem 2 has only one component. Suppose the
three components work independently and each has probability of failure equal to 0.2. What
is the probability that the system works?
Problem 3.13 A proficiency examination for a certain skill was given to 100 employees of
a firm. Forty of the employees were male. Sixty of the employees passed the exam, in that
they scored above a preset level for satisfactory performance. The breakdown among males
and females was as follows:
Pass (P)
Fail
Suppose an employee is randomly selected from the 100 who took the examination.
(a) Find the probability that the employee passed, given that he was male.
(b) Find the probability that the employee was male, given that he passed.
(c) Are the events P and M independent?
(d) Are the events P and F independent?
Problem 3.14 Propose appropriate sample spaces for the following random experiments.
Give also two examples of events for each case.
Counting/measuring:
1 - the number of employees attending work in a certain plant
2 - the number of days with wind speed above 50 km/hour, per year, in Vancouver
3 - the number of earthquakes in BC during any given period of two years
4 - the time between two consecutive breakdowns of a computer network
5 - the number of people leaving BC per year
6 - the percentage of STAT 241/51 students obtaining final marks above 80% in any given
term
7 - the number of engineers working in BC per year
8 - the percentage of computer scientists in BC who will make more than $65, 000 in 1996
9 - the number of employees still working in a certain production plant after 4:30 PM on
Fridays.
Problem 3.15 Let A and B be the events "construction flaw due to some human error" and "construction flaw due to some mechanical problem", respectively.
1) What is the meaning (in words) of each of the following events: (a) A ∪ B, (b) A ∩ B, (c) A ∩ Bᶜ, (d) Aᶜ ∩ Bᶜ, (e) (A ∪ B)ᶜ, (f) Aᶜ ∪ Bᶜ, (g) (A ∩ B)ᶜ? Draw also the corresponding diagrams.
2) Show that in general (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ and that (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ (so the results of (f) and (g) and of (d) and (e) above were not mere coincidences).
3) Suppose that P(A) = 0.02, P(B) = 0.01 and P(A ∪ B) = 0.023. Calculate (a) P(A ∩ B), (b) P(Aᶜ ∩ Bᶜ), (c) P(A ∩ Bᶜ), (d) P(A|Bᶜ), (e) P(A|B).
Problem 3.16 A large company hires most of its employees on the basis of two tests. The
two tests have scores ranging from one to five. The following table summarizes the performance of 16,839 applicants during the last six years. From this table we learn, for example,
that 3% of the applicants got a score of 2 on Test 1 and 2 on Test 2; and that 15% of the
applicants got a score of 3 on Test 1 and 2 on Test 2. We also learn that, for example, 20%
of the applicants got a score of 2 on Test 1 and that 25% of the applicants got a score of 2
on Test 2.
A group of 1500 new applicants have been selected to take the tests.
(a) What should the cutting scores be if between 140 and 180 applicants will be shortlisted
for a job interview? Assume that the company wishes to shortlist people with the highest
possible performances on the two tests.
Table 3.5:
 Test 1 \ Test 2     1      2      3      4      5      Total
 1                   0.07   0.03   0.00   0.00   0.00   0.10
 2                   0.15   0.03   0.02   0.00   0.00   0.20
 3                   0.08   0.15   0.09   0.02   0.01   0.35
 4                   0.10   0.04   0.08   0.01   0.02   0.25
 5                   0.00   0.00   0.06   0.02   0.02   0.10
 Total               0.40   0.25   0.25   0.05   0.05   1.00
Table 3.6:
 Score     1      2      3      4      5
 Test 1    0.10   0.20   0.35   0.25   0.10
 Test 2    0.40   0.25   0.25   0.05   0.05
(b) Same as (a) but assuming now that the company wishes to hire people with the highest possible performances on at least one of the two tests.
(c) (Continued from (a)) A manager suggests that only applicants who obtain marks above a certain bottom line in one of the tests be given the other test. Noticing that giving and marking each test costs the company $55, recommend which test should be given first. Approximately how much will be saved on the basis of your advice?
(d) Repeat (a)–(c) if the two tests' performances are independent and the probabilities are given by Table 3.6.
Problem 3.17 A computer company manufactures PC compatible computers in two plants,
called Plant A and B in this exercise. These plants account for 35 % and 65 % of the
production, respectively. The company records show that 3 % of the computers manufactured
by Plant A must be repaired under the warranty. The corresponding percentage for plant B
is 2.5 %.
(a) What is the percentage of computers that are repaired under the warranty and come from
Plant A?
(b) What percentage of computers repaired under the warranty come from Plant A? From
Plant B?
Problem 3.18 Twenty per cent of the days in a certain area are rainy (there is some measurable precipitation during the day), one third of the days are sunny (no measurable precipitation, more than 4 hours of sunshine) and fifteen per cent of the days are cold (daily
average temperature for the day below 5o C).
1 - Would you use the above information as an aid in
(i) Planning your next weekend activities (assuming that you live in this area)?
(ii) Deciding whether you want to move to this area?
(iii) Choosing the type of roofing for a large building in this area?
Problem 3.24 Suppose that we wish to determine whether an uncommon but fairly costly
construction flaw is present. Suppose that in fact this flaw has only probability 0.005 of
being present. A fairly simple test procedure is proposed to detect this flaw. Suppose that
the probabilities of being correctly positive and negative for this test are 0.98 and 0.94,
respectively.
1) Calculate the probability that the test will indicate the presence of a flaw.
2) Calculate the posterior probability that there is no flaw given that the test has indicated
that there is one. Comment on the implications of this result.
Problem 3.25 One method that can be used to distinguish between granite (G) and basalt (B) rocks is to examine a portion of the infrared spectrum of the sun's energy reflected from the rock surface. Let R1, R2 and R3 denote measured spectrum intensities at three different wavelengths. Normally, R1 < R2 < R3 would be consistent with granite and R3 < R1 < R2 would be consistent with basalt. However, when the measurements are made remotely (e.g. using aircraft) several orderings of the Ri's can arise. Flights over regions of known composition have shown that granite rocks produce
(R1 < R2 < R3) 60% of the time,
(R1 < R3 < R2) 25% of the time, and
(R3 < R1 < R2) 15% of the time.
On the other hand, basalt rocks produce these orderings of the spectrum intensities with
probabilities 0.10, 0.20 and 0.70, respectively. Suppose that for a randomly selected rock
from a certain region we have P (G) = 0.25 and P (B) = 0.75.
1) Calculate P (G|R1 < R2 < R3 ) and P (B|R1 < R2 < R3 ). If the measurements for a given
rock produce the ordering R1 < R2 < R3 , how would you classify this rock?
2) Same as 1) for the case R1 < R3 < R2
3) Same as 1) for the case R3 < R1 < R2
4) If one uses the classification rule determined in 1) 2) and 3), what is the probability of
a classification error (that a G rock is classified as a B rock or a B rock is classified as a G
rock)?
Problem 3.26 Messages are transmitted as a sequence of zeros and ones. Transmission errors occur independently, with probability 0.001. A message of 3500 bits will be transmitted.
(a) What is the probability that there will be no errors? What is the probability that there
will be more than one error?
(b) If the same message will be transmitted twice and those bits that do not agree will be
revised (and therefore these detected transmission errors will be corrected), what is the
probability that there will be no reception errors?
Problem 3.27 Suppose that the events A, B and C are independent. Show that,
(a) Aᶜ and Bᶜ are independent.
Table 3.7:
                Low Salary    Medium Salary    High Salary    Total
 Low GPA        0.10          0.08             0.02           0.20
 Medium GPA     0.07          0.46             0.07           0.60
 High GPA       0.03          0.06             0.11           0.20
 Total          0.20          0.60             0.20           1.00
1) Calculate P(Bi ∪ Cj), i = 1, 2, 3 and j = 1, 2, 3.
2) What is the meaning (in words), and the probability, of the event
A = (B1 ∩ C1) ∪ (B2 ∩ C2) ∪ (B3 ∩ C3)?
3) Are salary and GPA independent? Why?
4) Construct a table with the same marginals (same probabilities for the six categories) but
with salary and GPA being independent.
Problem 3.30 Consider the system of components connected as follows. There are two
subsystems connected in parallel. Components 1 and 2 constitute the first subsystem and are
connected in parallel (so that this subsystem works if either component works). Components
3 and 4 constitute the second subsystem and are connected in series (so that this subsystem
works if and only if both components do). If the components work independently of one
another and each component works with probability 0.85, (a) calculate the probability that
the system works. (b) calculate this probability if the two subsystems are connected in series.
Problem 3.31 Calculate the reliability of the system described in the following figure. The
numbers beside each component represent the probabilities of failure for this component.
[Figure: a reliability network of seven components; three components have failure probability 0.05 and four have failure probability 0.01.]
Chapter 4
Random Variables and Distributions
4.1 Definition and Notation
{X ≤ x} = {w : X(w) ≤ x},
where the set A = (−∞, x]. Additional examples (related to Example 5 above) are
{X = 0} = {(N, N, N, N )}
and
{X ≤ 1} = {(N, N, N, N), (D, N, N, N), (N, D, N, N), (N, N, D, N), (N, N, N, D)}.
4.2 Discrete Random Variables
Discrete random variables are mainly used in relation to counting situations; for example, the number of defective items found when several items are tested (see Example 4.1, continued below) is a discrete random variable.
The defining feature of a discrete random variable is that its range (the set of all its
possible values) is finite or countable. The values in the range are often integer numbers, but
they dont need to be so. For instance, a random variable taking the values zero, one half
and one with probabilities 0.5, 0.25 and 0.25 respectively is considered discrete.
The probability density function (or in short, the density), f (x), of a discrete random
variable X is defined as
f (x) = P (X = x),
That is, f (x) gives the probability of each possible value x of X. It obviously has the following
properties:
(1) f(x) ≥ 0 for all x in the range R of X;
(2) Σ_{x∈R} f(x) = 1;
(3) P(X ∈ A) = Σ_{x∈A} f(x) for any set A of possible values.
The (cumulative) distribution function of X is defined as
F(x) = P(X ≤ x) = Σ_{k ≤ x} f(k).
In many engineering applications one works with 1 − F(x) instead of F(x). Notice that 1 − F(x) = P(X > x) and therefore gives the probability that X will exceed the value x.
Example 4.1 (continued): Suppose that the items are independent and each one can be
defective with probability p. The density and distribution of the random variable (r.v.) X =
number of defectives can then be derived as follows:
f(0) = P(X = 0) = P({N, N, N, N}) = (1 − p)(1 − p)(1 − p)(1 − p) = (1 − p)⁴
f(1) = P(X = 1) = P({D, N, N, N}, {N, D, N, N}, {N, N, D, N}, {N, N, N, D})
     = p(1 − p)(1 − p)(1 − p) + (1 − p)p(1 − p)(1 − p) + (1 − p)(1 − p)p(1 − p) + (1 − p)(1 − p)(1 − p)p = 4(1 − p)³p.
In a similar way we can find that
f(2) = 6(1 − p)²p²,   f(3) = 4(1 − p)p³   and   f(4) = p⁴.
The values of the density and distribution functions of X, for the cases p = 0.40 and p = 0.80
are given in Table 4.1. A comparison of the density functions shows that smaller values of
X (0, 1 and 2) are more likely when p = 0.4 (why?) and that higher values (3 and 4) are
more likely when p = 0.8. Also notice that the distribution function for the case p = 0.8 is
uniformly smaller. This is so because getting smaller values of X is always more likely when
p = 0.4.
Table 4.1:
       p = 0.40            p = 0.80
 x     f(x)     F(x)       f(x)     F(x)
 0     0.1296   0.1296     0.0016   0.0016
 1     0.3456   0.4752     0.0256   0.0272
 2     0.3456   0.8208     0.1536   0.1808
 3     0.1536   0.9744     0.4096   0.5904
 4     0.0256   1.0000     0.4096   1.0000

4.3 Continuous Random Variables
Continuous random variables are used in connection with continuous types of outcomes, as for example the measurement error when measuring the distance between the North and South shores of a river.
The typical events in these cases are bounded or unbounded intervals with probabilities
specified in terms of the integral of a continuous density function, f (x), over the desired
interval. See property (3) below.
Since the probability of all intervals must be non-negative and the probability of the entire
line should be one, it is clear that f (x) must have the two following properties:
(1) Non-negative: f(x) ≥ 0 for all x.
(2) Total mass equal to one: ∫_{−∞}^{+∞} f(x) dx = 1.
(3) For any interval (a, b), P(a < X < b) = ∫_a^b f(x) dx.
Notice that, unlike in the discrete case, the inclusion or exclusion of the end points a and b doesn't affect the probability that the continuous variable X is in the interval. In fact, the event that X will take any single value, x, can be represented by the degenerate interval x ≤ X ≤ x and so,
P(X = x) = P(x ≤ X ≤ x) = ∫_x^x f(t) dt = 0.
Therefore, unlike in the discrete case, f(x) doesn't represent the probability of the event X = x. What, then, is the meaning of f(x)? It represents the relative probability that X will be near x: if d > 0 is small,
(1/d) P(x − (d/2) < X < x + (d/2)) = (1/d) ∫_{x−(d/2)}^{x+(d/2)} f(t) dt ≈ f(x).
Another important function related to a continuous random variable is its cumulative distribution function, defined as
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt,   for all x.     (4.1)
Notice that, in particular, P(a < X ≤ b) = F(b) − F(a).
[Figure: F(b) − F(a) shown as the area under the density between a and b.]
Conversely, differentiating (4.1) gives
f(x) = dF(x)/dx,   for all x.     (4.2)
Therefore, we can go back and forth from the density to the distribution function and vice
versa using formulas (4.1) and (4.2).
Example 4.2 Suppose that the maximum annual flood level of a river, X (in meters), has density
f(x) = 0.125(x − 5)   if 5 < x < 9,
     = 0              otherwise.
Find the distribution function of X and calculate P(X < 6), P(6 ≤ X ≤ 7) and P(X ≥ 9).
[Figure: the density and distribution functions of X on the interval (5, 10).]
Solution
F(x) = 0,                                        if x ≤ 5
     = ∫_5^x 0.125(t − 5) dt = 0.0625(x − 5)²,   if 5 < x < 9
     = 1,                                        if x ≥ 9.
Furthermore, P(X < 6) = F(6) = 0.0625 and P(X ≥ 9) = 1 − F(9) = 0.
Notice that, since P(X = x) = 0, the inclusion or exclusion of the interval's boundary points doesn't affect the probability of the corresponding interval. In other words,
P(6 ≤ X ≤ 7) = P(6 < X ≤ 7) = P(6 ≤ X < 7) = P(6 < X < 7) = F(7) − F(6) = 0.1875.
Also notice that, since f(x) is increasing on (5, 9), P(5 < X < 6), for instance, is much smaller than P(8 < X < 9), despite the length of the two intervals being equal.
2
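Calculations with a density given by a formula, such as the one above, are easy to check numerically. The following Python sketch evaluates the closed-form distribution function of Example 4.2 and confirms P(6 ≤ X ≤ 7) with a crude Riemann sum (the step size is an arbitrary choice):

# Sketch for Example 4.2: f(x) = 0.125*(x - 5) on (5, 9), 0 otherwise.
def f(x):
    return 0.125 * (x - 5) if 5 < x < 9 else 0.0

def F(x):
    # Closed-form distribution function: the integral of f from 5 to x.
    if x <= 5:
        return 0.0
    if x >= 9:
        return 1.0
    return 0.0625 * (x - 5) ** 2

print(F(7) - F(6))   # P(6 <= X <= 7) = 0.1875

# Crude numerical check of the same probability by midpoint Riemann sums.
n = 100_000
h = (7 - 6) / n
approx = sum(f(6 + (k + 0.5) * h) for k in range(n)) * h
print(round(approx, 4))   # about 0.1875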
Example 4.3 (Rounding-off Error and Uniform Random Variables): Due to the resolution limitations of a measuring device, the measurements are rounded off to the second decimal place. If the third decimal place is 5 or more, the second place is increased by one unit; if the third decimal place is 4 or less, the second place is left unchanged. For example, 3.2462 would be reported as 3.25 and 3.2428 would be reported as 3.24. Let X represent the difference between the (unknown) true measurement, y, and the corresponding rounded-off reading, r. That is,
X = y − r.
Clearly, X can take any value in the interval −0.005 < X < 0.005. It would appear reasonable in this case to assume that all the possible values are equally likely. Therefore, the relative probability f(x) that X will fall near any number x₀ between −0.005 and 0.005 should then be the same. That is,
f(x) = c,   −0.005 ≤ x ≤ 0.005,
     = 0,   otherwise.
The random variable X is said to be uniformly distributed between −0.005 and +0.005. By property (2),
∫_{−∞}^{+∞} f(x) dx = ∫_{−0.005}^{0.005} c dx = 0.01c = 1,
and therefore c = 100.
Accordingly, the density and distribution functions of X are
f(x) = 100 for −0.005 ≤ x ≤ 0.005 (and 0 otherwise), and
F(x) = 0,                 x ≤ −0.005,
     = 100(x + 0.005),    −0.005 ≤ x ≤ 0.005,
     = 1,                 x ≥ 0.005.
[Figure: the uniform density (constant height 100) and the corresponding distribution function on (−0.005, 0.005).]
4.4 Summarizing the Main Features of f(x)
All the information concerning the random variable X is contained in its density function,
f (x), and this information can be used and displayed in the form of a picture (a graph of
f (x) versus x), a formula, or a table.
There are situations, however, when one would prefer to concentrate on a summary of the
more complete and complex information contained in f (x). This is the case, for example,
if we are working with several random variables that need to be compared in order to draw
some conclusions.
The summary of f (x), as any other summary, should be simple and informative. The
reader of such a summary should get a good idea of what are the most likely values of X and
what is the degree of uncertainty regarding the prediction of future values of X.
Typical densities found in practice are approximately symmetric and unimodal. These
densities can be summarized in terms of their central location and their dispersion. Therefore,
an approximately symmetric and unimodal density can be fairly well described by giving just
two numbers: a measure of its central location and a measure of its dispersion.
The median and the mean are two popular measures of (central) location and the
interquartile range and the standard deviation are two popular measures of dispersion.
These summary measures are defined and briefly discussed below.
The Median and the InterQuartile Range
Given a number α between zero and one, the quantile of order α of the distribution F (or of the r.v. X), denoted Q(α), is implicitly defined by the equation
P(X ≤ Q(α)) = α.
Therefore Q(α) has the property
Q(α) = F⁻¹(α)
and can be found by solving (for x) the equation
F(x) = α.
To find the quantile of order 0.25, for example, we must solve the equation
F(x) = 0.25.
The special quantiles Q(0.25) and Q(0.75) are often called the first quartile and the third
quartile, respectively.
The median of X, Med(X), is defined as the corresponding quantile of order 0.5, that is,
Med(X) = Q(0.5).
Evidently, Med(X) divides the range of X into two sets of equal probability. Therefore, it
can be used as a measure for the central location of f (x).
A simple sketch showing the locations of Q(0.25), Med(X) and Q(0.75) constitutes a good summary of f(x), even if it is not symmetric. Notice that if Q(0.75) − Med(X) is significantly larger (or smaller) than Med(X) − Q(0.25), then f(x) is fairly asymmetric.
There are situations when there is no solution, or too many solutions, to the defining equations above. This is typically the case for discrete random variables. In these cases the quantiles (including the median) are calculated using some common-sense criterion. For instance, if the distribution function F(x) is constant and equal to 0.5 on the interval (x1, x2), then the median is taken equal to (x1 + x2)/2 (see Figure 3.5 (a)). To give another example, if the distribution function F(x) has a jump and doesn't take the value 0.5, the median is defined as the location of the jump (see Figure 3.5 (b)).
The dispersion about the median is usually measured in terms of the inter-quartile range, denoted IQR(X) and defined as
IQR(X) = Q(0.75) − Q(0.25).
[Figure 3.5: (a) a distribution function that is flat at the level 0.5 over an interval (x1, x2); (b) a distribution function that jumps past the level 0.5 at a single point.]
Example 4.4 Suppose that the waiting time X between two consecutive customers (at a certain service facility) has the exponential density f(x) = 2 exp{−2x}, x ≥ 0. (a) Find the distribution function and the median of X. (b) Find Q(0.25) and Q(0.75). (c) Is the density of X symmetric? (d) Find IQR(X).
Solution The distribution function is
F(x) = ∫_0^x f(t) dt = 2 ∫_0^x exp{−2t} dt = 1 − exp{−2x},   x ≥ 0.
(a) Setting F(x) = 0.5 gives 2x = log(2), so Med(X) = log(2)/2 = 0.347.
(b) Setting F(x) = 0.25 gives 2x = log(4) − log(3), so
Q(0.25) = (log(4) − log(3))/2 = 0.144.
Setting F(x) = 0.75 gives 2x = log(4), so
Q(0.75) = log(4)/2 = 0.693.
(c) Since
Q(0.75) − Med(X) = 0.693 − 0.347 = 0.346
and
Med(X) − Q(0.25) = 0.347 − 0.144 = 0.203,
the distribution is fairly asymmetric.
(d)
IQR = Q(0.75) − Q(0.25) = 0.693 − 0.144 = 0.549.
2
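The quantiles of the exponential distribution can be obtained in closed form by inverting F(x) = 1 − exp{−2x}; the short Python sketch below reproduces the numbers used above:

import math

rate = 2.0   # exponential rate of the waiting-time example

def quantile(alpha):
    # Solve 1 - exp(-rate * x) = alpha for x.
    return -math.log(1 - alpha) / rate

print(round(quantile(0.25), 3))   # 0.144
print(round(quantile(0.50), 3))   # 0.347 (the median)
print(round(quantile(0.75), 3))   # 0.693
print(round(quantile(0.75) - quantile(0.25), 3))   # IQR = 0.549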
The Mean, the Variance and the Standard Deviation
Let X be a random variable with density f(x), and let g(X) be a function of X. For example, g(X) = √X or g(X) = (X − t)², where t is some fixed number. The notation E[g(X)], read "expected value of g(X)", will be used very often in this course. The expected value of g(X) is defined as the weighted average of the function g(x), with weights proportional to the density function f(x). More precisely:
E[g(X)] = ∫_{−∞}^{+∞} g(x) f(x) dx     (4.3)
in the continuous case, and
E[g(X)] = Σ_{x∈R} g(x) f(x)     (4.4)
in the discrete case.
Example 4.5 Refer to the random variables of Example 3.1 (number of defectives) and Example 4.3 (rounding-off error). Calculate E(X) and E(X²).
Solution Since the random variable X of Example 3.1 is discrete, we must use formula (4.4)
to obtain:
E(X) = (0)(0.5) + (1)(0.2) + (2)(0.15) + (3)(0.10) + (4)(0.03) + (5)(0.02) = 1.02,
and
E(X 2 ) = (0)(0.5) + (1)(0.2) + (4)(0.15) + (9)(0.10) + (16)(0.03) + (25)(0.02) = 2.68.
In the case of the continuous random variable X of Example 4.3 we must use formula (4.3):
E(X) = ∫_{−∞}^{+∞} x f(x) dx = 100 ∫_{−0.005}^{0.005} x dx = 0,
E(X²) = ∫_{−∞}^{+∞} x² f(x) dx = 100 ∫_{−0.005}^{0.005} x² dx = 100[(0.005)³ − (−0.005)³]/3 = (200)(0.005)³/3 = 0.00000833.
2
Consider now the mean squared deviation of X about an arbitrary fixed point t,
D(t) = E[(X − t)²] = ∫_{−∞}^{+∞} (x − t)² f(x) dx     (continuous case)
     = Σ_{x∈R} (x − t)² f(x)     (discrete case).
But we could begin this reasoning from the end and say that a good measure of central location must minimize D(t). This optimal value of t, called the mean of X, is denoted by the Greek letter μ.
To find μ we differentiate D(t) and set the derivative equal to zero. In the continuous case,
D′(t) = −2 ∫_{−∞}^{+∞} (x − t) f(x) dx = −2[E(X) − t],
and the discrete case can be treated similarly. Since D″(t) = 2 > 0 for all t, the critical point t = E(X) minimizes D(t). Therefore,
μ = E(X).
This procedure of defining the desired summary measure by the property of minimizing the
average of the squared residuals is a very important technique in applied statistics called the
method of minimum mean squared residuals. We will come across several applications
of this technique throughout this course.
Example 4.4 (continued): (a) Calculate the mean and the standard deviation for the waiting
time between two consecutive customers, X. (b) How do they compare with the corresponding median waiting time and interquartile range calculated before?
Solution
(a) The mean waiting time is
E(X) = ∫_0^{+∞} x · 2 exp{−2x} dx = [−x exp{−2x}]_0^{+∞} + ∫_0^{+∞} exp{−2x} dx = [−exp{−2x}/2]_0^{+∞} = exp{0}/2 = 0.5.
More generally, if X is an exponential random variable with parameter (rate) λ, then
E(X) = 1/λ.     (4.5)
A similar calculation (integrating by parts twice) gives E(X²) = ∫_0^{+∞} x² · 2 exp{−2x} dx = 0.5. Therefore,
Var(X) = E(X²) − [E(X)]² = 0.5 − 0.25 = 0.25   and   SD(X) = 0.5.
More generally, if X is an exponential random variable with rate λ, then Var(X) = 1/λ², that is,
SD(X) = 1/λ.     (4.6)
(b) Since the density of X is asymmetric, the median and the mean are expected to be different (as they are). Since the density is skewed to the right (longer right-hand tail), the mean waiting time (0.5) is larger than the median waiting time (0.347).
The two measures of dispersion (IQR = 0.549 and SD = 0.5) are quite consistent.
2
Properties of the Mean and the Variance
Property 1: E(aX + b) = aE(X) + b for any constants a and b.
Proof (discrete case)
E(aX + b) = Σ_{xi} (a·xi + b) f(xi) = a[Σ_{xi} xi f(xi)] + b = aE(X) + b.
Property 2: E(X + Y) = E(X) + E(Y) for all pairs of random variables X and Y.
Property 3: E(XY) = E(X)E(Y) for all pairs of independent random variables X and Y.
Property 4: Var(aX + b) = a² Var(X).
Proof
Var(aX + b) = E[(aX + b) − (aμ + b)]² = E[a(X − μ)]² = a² E(X − μ)² = a² Var(X)
2
Property 5: Var(X ± Y) = Var(X) + Var(Y) for all pairs of independent random variables X and Y.
All these properties will be used very often in this course. The proofs of properties 2, 3
and 5 are beyond the scope of this course, and therefore these properties must be accepted
as facts and used throughout the course.
The formula
Var(X) = E(X²) − [E(X)]² = E(X²) − μ²,
is often used for calculations. The derivation of this formula is very simple, using the properties of the mean listed above. In fact,
Var(X) = E{(X − μ)²} = E(X² + μ² − 2μX) = E(X²) + μ² − 2μE(X) = E(X²) + μ² − 2μ² = E(X²) − μ².
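As a quick illustration of the shortcut formula, the discrete density of Example 4.5 can be summarized in a few lines of Python:

# Density of the discrete random variable in Example 4.5.
density = {0: 0.50, 1: 0.20, 2: 0.15, 3: 0.10, 4: 0.03, 5: 0.02}

mean = sum(x * p for x, p in density.items())               # E(X)
second_moment = sum(x**2 * p for x, p in density.items())   # E(X^2)
variance = second_moment - mean**2                          # Var(X) = E(X^2) - mu^2

print(mean)                  # 1.02
print(second_moment)         # 2.68
print(round(variance, 4))    # 1.6396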
4.5 Sum and Average of Independent Random Variables
Random experiments are often independently repeated many times generating a sequence
X1 , X2 , . . . , Xn of n independent random variables. We will consider linear combinations of
these variables,
Y = a1X1 + a2X2 + ⋯ + anXn,
where the coefficients a1, a2, . . . , an are some given constants. For example, ai = 1 for all i produces the total
T = X1 + X2 + ⋯ + Xn,
and ai = 1/n for all i produces the average
X̄ = (X1 + X2 + ⋯ + Xn)/n.
Using the properties of the expected value and variance we have
E(Y) = a1E(X1) + a2E(X2) + ⋯ + anE(Xn)
and
Var(Y) = a1²Var(X1) + a2²Var(X2) + ⋯ + an²Var(Xn).
Typically, the n random variables Xi will have a common mean μ and a common variance σ². In this case the sequence {X1, X2, . . . , Xn} is said to be a random sample. In this case,
E(Y) = (a1 + a2 + ⋯ + an)μ
and
Var(Y) = (a1² + a2² + ⋯ + an²)σ².
Example 4.6 Twenty randomly selected students will be asked the question do you regularly smoke?. (a) Calculate the expected number of smokers in the sample if 10% of the
students smoke; (b) what is your estimate of the proportion, p, of smokers if six students
answered Yes?; (c) What are the expected value and the variance of your estimate?
Solution
(a) Let Xi be equal to one if the ith student answers Yes and equal to zero otherwise.
Let p be equal to the proportion of smokers in the student population. Then the Xi are
independent discrete random variables with density f(0) = 1 − p and f(1) = p. Therefore,
E(Xi) = E(Xi²) = 0·f(0) + 1·f(1) = f(1) = p = 0.1
and
Var(Xi) = E(Xi²) − [E(Xi)]² = p − p² = p(1 − p) = 0.09.
In particular, the expected number of smokers in the sample is E(X1 + ⋯ + X20) = 20 × 0.1 = 2.
(b) A reasonable estimate for the fraction, p, of smokers in the population is given by the corresponding fraction of smokers in the sample, X̄. In the case of our sample, the observed value, x̄, of X̄ is x̄ = 6/20 = 0.3.
(c) The expected value of the estimate in (b) is p and its variance is p(1 p)/20. Why? 2
Example 4.7 The independent random variables X, Y and Z represent the monthly sales
of a large company in the provinces of BC, Ontario and Quebec, respectively. The mean and
standard deviations of these variables are as follows (in hundreds of dollars):
E(X) = 1,435,   SD(X) = 120;   E(Y) = 2,300,   SD(Y) = 150;   E(Z) = 1,500,   SD(Z) = 150.
(a) What are the expected value and the standard deviation of the total monthly sales?
(b) Sales manager J. Smith is responsible for the sales in BC and 2/3 of the sales in Ontario.
Sales manager R. Campbell is responsible for the sales in Quebec and the remaining 1/3 of
the sales in Ontario. What are the expected values and standard deviations of Mr. Smiths
and Mrs. Campbells monthly sales?
(c) What are the expected values and standard deviations of the annual sales for each
province? Assume for simplicity that the monthly sales are independent.
Solution
(a) The total monthly sales are
S = X + Y + Z.
By Property 2
E(S) = E(X) + E(Y ) + E(Z) = 1, 435 + 2, 300 + 1, 500 = 5, 235.
By Property 5
Var(S) = Var(X) + Var(Y) + Var(Z) = 120² + 150² + 150² = 59,400.
Therefore,
SD(S) = √59,400 = 243.7.
(b) First, notice that
S1 = X + (2/3)Y   and   S2 = Z + (1/3)Y.
Therefore, E(S1) = 1,435 + (2/3)(2,300) = 2,968.3, Var(S1) = 120² + (2/3)²(150)² = 14,400 + 10,000 = 24,400 and SD(S1) = 156.2. Similarly, E(S2) = 1,500 + (1/3)(2,300) = 2,266.7, Var(S2) = 150² + (1/3)²(150)² = 22,500 + 2,500 = 25,000 and SD(S2) = 158.11.
(c) If Xi (i = 1, . . . , 12) represent BC's monthly sales, the annual sales for BC are
T = Σ_{i=1}^{12} Xi.
Therefore,
E(T) = E[Σ_{i=1}^{12} Xi] = Σ_{i=1}^{12} E(Xi) = 12 × 1,435 = 17,220.
The variance and the standard deviation of the annual sales in BC (assuming independence) are
Var(T) = Var[Σ_{i=1}^{12} Xi] = Σ_{i=1}^{12} Var(Xi) = 12 × 120² = 172,800   and   SD(T) = √172,800 = 415.7.
The student can now calculate the expected values and the standard deviations for the annual sales in Ontario and Quebec.
2
Question: The total monthly sales can be obtained as the sum of Mr. Smith's (S1 = X + (2/3)Y) and Mrs. Campbell's (S2 = Z + (1/3)Y) monthly sales, with variances (calculated in part (b)) equal to 24,400 and 25,000, respectively. Why, then, is the total sales variance Var(X + Y + Z), calculated in part (a), not equal to the sum 24,400 + 25,000 = 49,400?
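A short simulation makes the answer concrete: S1 and S2 share the Ontario sales Y, so they are not independent, and the variance of their sum picks up an extra covariance term, 2 Cov(S1, S2) = 2(2/3)(1/3)Var(Y) = 10,000. The sketch below (Python) assumes, purely for the purpose of the simulation, that the three sales figures are normally distributed with the stated means and standard deviations:

import random, statistics

random.seed(1)
n = 100_000
s1_vals, s2_vals, total_vals = [], [], []
for _ in range(n):
    x = random.gauss(1435, 120)   # BC sales
    y = random.gauss(2300, 150)   # Ontario sales
    z = random.gauss(1500, 150)   # Quebec sales
    s1 = x + (2 / 3) * y          # Mr. Smith
    s2 = z + (1 / 3) * y          # Mrs. Campbell
    s1_vals.append(s1)
    s2_vals.append(s2)
    total_vals.append(s1 + s2)

print(round(statistics.variance(s1_vals)))      # about 24,400
print(round(statistics.variance(s2_vals)))      # about 25,000
print(round(statistics.variance(total_vals)))   # about 59,400, not 49,400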
4.6 Max and Min of Independent Random Variables
Let X1, X2, . . . , Xn be independent random variables and consider their maximum, V = max(X1, . . . , Xn), and their minimum, U = min(X1, . . . , Xn). Random variables of this kind arise naturally; for example:
- The completion time of a project made up of n subprojects which can be pursued simultaneously. In this case Xi = completion time for the ith subproject (and the completion time of the whole project is the maximum V).
- The maximum flood level of a river in the next n years. In this case Xi = maximum flood level in the ith year.
- The minimum flood level of a river in the next n years. In this case Xi = minimum flood level in the ith year.
4.6.1
The Maximum
Suppose that Fi (x) and fi (x) are the distribution and density functions of the random variable
Xi , and let FV (v) and fV (v) be the distribution and density functions of the maximum V .
Since the maximum, V , is less than a given value, v, if and only if each random variable Xi
is less than v we have
FV(v) = P{V ≤ v} = P{X1 ≤ v, X2 ≤ v, . . . , Xn ≤ v}
      = P{X1 ≤ v} P{X2 ≤ v} ⋯ P{Xn ≤ v}     [since the variables Xi are independent]
      = F1(v) F2(v) ⋯ Fn(v).
This formula is greatly simplified when the Xi's are identically distributed, that is, when
F1(x) = F2(x) = ⋯ = Fn(x) = F(x)
for all values of x. In this case,
FV(v) = [F(v)]ⁿ     (4.7)
and, differentiating,
fV(v) = n [F(v)]ⁿ⁻¹ f(v).     (4.8)
Example 4.8 A system consists of five components connected in parallel. The lifetime (in thousands of hours) of each component is an exponential random variable with mean μ = 3. See Example 4.4 and Example 4.4 (continued) for the definition of exponential random variables and formulas for their mean and variance.
(a) Calculate the median life (often called half-life) and standard deviation for each component.
(b) Calculate the probability that a component fails before 3,500 hours.
(c) Calculate the probability that the system will fail before 3,500 hours. Compare this with the probability that a component fails before 3,500 hours.
(d) Calculate the half-life (median life), mean life and standard deviation for the system.
Solution
Using equation (4.5) and the fact that the lifetime X of each component is exponentially distributed with mean μ = 3, we obtain λ = 1/3, so that the density and distribution functions of X are
f(x) = (1/3) exp{−x/3}   and   F(x) = 1 − exp{−x/3},   x ≥ 0.
(a) The median life of a component solves 1 − exp{−x/3} = 0.5, so Med(X) = 3 log(2) = 2.079 (about 2,079 hours); by (4.6), SD(X) = 3 (that is, 3,000 hours).
(b) P(X ≤ 3.5) = F(3.5) = 1 − exp{−3.5/3} = 0.6886.
(c) Since the parallel system fails before 3,500 hours only if all five components do, the probability that the system will fail before 3,500 hours is
P{V ≤ 3.5} = FV(3.5) = [1 − exp{−3.5/3}]⁵ = (0.6886)⁵ = 0.1548.
The probability that a single component fails (calculated in part (b)) is more than four times larger.
(d) To calculate the median life of the system we must use formula (4.7) once again:
FV(v) = 0.5 ⟹ [1 − exp{−v/3}]⁵ = 0.5 ⟹ exp{−v/3} = 1 − (0.5)^{1/5} = 0.12945 ⟹ v₀ = −3 log(0.12945) = 6.133.
Therefore, the median life of the system is equal to 6,133 hours.
To calculate the mean life we must first obtain the density function of V. Using formula (4.8) above we obtain
fV(v) = 5 [1 − exp{−v/3}]⁴ (1/3) exp{−v/3}
     = (5/3)[exp{−v/3} − 4 exp{−2v/3} + 6 exp{−v} − 4 exp{−4v/3} + exp{−5v/3}].
Since, for any λ > 0,
∫_0^∞ v exp{−λv} dv = 1/λ²,
we have
E(V) = ∫_0^∞ v fV(v) dv = (5/3)[∫_0^∞ v exp{−v/3} dv − 4 ∫_0^∞ v exp{−2v/3} dv + 6 ∫_0^∞ v exp{−v} dv − 4 ∫_0^∞ v exp{−4v/3} dv + ∫_0^∞ v exp{−5v/3} dv]
     = (5/3)[9 − 4(9/4) + 6(1) − 4(9/16) + 9/25] = 6.85.
Similarly, since
∫_0^∞ v² exp{−λv} dv = 2/λ³,     [why?]
we have that
E(V²) = ∫_0^∞ v² fV(v) dv = (5/3)[2(27) − 4·2(27/8) + 6·2(1) − 4·2(27/64) + 2(27/125)] = 60.095.
Therefore the mean life of the system is 6,850 hours and its standard deviation is
SD(V) = √(60.095 − (6.85)²) = √13.1725 = 3.63,
that is, about 3,630 hours.
2
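The median, mean and standard deviation of the system life found above can be checked by simulating the maximum of five exponential lifetimes. A minimal Python sketch:

import random, statistics

random.seed(2)
n = 100_000
# Lifetime of the parallel system = maximum of five exponential(mean 3) lifetimes.
lifetimes = [max(random.expovariate(1 / 3) for _ in range(5)) for _ in range(n)]

print(round(statistics.mean(lifetimes), 2))     # about 6.85
print(round(statistics.median(lifetimes), 2))   # about 6.13
print(round(statistics.stdev(lifetimes), 2))    # about 3.63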
4.6.2 The Minimum
Now we turn our attention to the distribution of the minimum, U = min(X1, . . . , Xn). Let FU(u) and fU(u) denote the distribution and density functions of U. Since the minimum, U, is greater than a given value, u, if and only if each random variable Xi is greater than u, we have
FU(u) = P{U ≤ u} = 1 − P{U > u} = 1 − P{X1 > u, X2 > u, . . . , Xn > u}
      = 1 − P{X1 > u} P{X2 > u} ⋯ P{Xn > u}     [since the variables Xi are independent]
      = 1 − [1 − F1(u)][1 − F2(u)] ⋯ [1 − Fn(u)]     [since P{Xi > u} = 1 − Fi(u), i = 1, . . . , n]
As before, this formula can be greatly simplified when the Xi's are identically distributed, that is, when
F1(x) = F2(x) = ⋯ = Fn(x) = F(x)
for all values of x. In this case,
FU(u) = 1 − [1 − F(u)]ⁿ     (4.9)
and, differentiating,
fU(u) = n [1 − F(u)]ⁿ⁻¹ f(u).     (4.10)
Example 4.9 A system consists of five components connected in series. The lifetime (in
thousands of hours) of each component is an exponential random variable with mean μ = 3.
(a) Calculate the probability that the system will fail before 3500 hours. Compare this with
the probability that a component fails before 3500 hours.
(b) Calculate the median life, the mean life and the standard deviation for the system.
Solution
(a) Using formula (4.9) above we obtain
FU(u) = 1 − [exp{−u/3}]⁵ = 1 − exp{−(5/3)u},
and so U is also exponentially distributed with parameter 5 × (1/3) = 5/3. In general, the minimum of n exponential random variables with parameter λ is also exponential with parameter nλ. Finally,
P{U ≤ 3.5} = FU(3.5) = 1 − exp{−(5/3)(3.5)} = 0.9971.
The probability that a component will fail before 3500 has been found (in Example 4.8) to
be 0.6886. Therefore, the probability that the system will fail before 3, 500 hours is almost
45% larger.
(b) Since U is exponentially distributed, its mean and standard deviation can be obtained directly from the distribution function found in (a), using equations (4.5) and (4.6). That is,
E(U) = SD(U) = 3/5 = 0.6.
Therefore, the mean life of the system, 600 hours, is 5 times smaller than that of the individual components. Finally, the median life of the system can be found as follows:
1 − exp{−(5u)/3} = 0.5 ⟹ exp{−(5u)/3} = 0.5 ⟹ u₀ = −3 log(0.5)/5 = 0.416.
Therefore, the median life of the system is equal to 416 hours.
4.7 Exercises
4.7.1 Exercise Set A
Problem 4.1 A system consists of five identical components all connected in series. Suppose
each component has a lifetime (in hours) that is exponentially distributed with rate λ = 0.01, and all five components work independently of one another.
Define T to be the time at which the system fails. Consider the following questions:
(a) Obtain the distribution of T . Can you tell what type of distribution it is?
(b) Compute the IQR (interquartile range) for the distribution obtained in part (a).
(c) What is the probability that the system will last at least 15 hours?
Problem 4.2 Are the following functions density functions? Why?
(a) f1(x) = 1 for 1 ≤ x ≤ 3; 0, otherwise.
(b) f2(x) = x for −1 ≤ x ≤ 1; 0, otherwise.
(c) f3(x) = exp(−x) for x ≥ 0; 0, otherwise.
Problem 4.3 Suppose that the response time X at a certain on-line computer terminal (the elapsed time between the end of a user's inquiry and the beginning of the system's response to that inquiry) has an exponential distribution with expected response time equal to 5 seconds (i.e. the exponential rate is λ = 0.2).
(a) Calculate the median response time.
(b) What is the probability that the next three response times exceed 5 seconds? (Assume
that all the response times are independent).
Problem 4.4 The hourly volume of traffic, X, for a proposed highway has density proportional to g(x), where
g(x) = x(100 − x) if 0 < x < 100, and g(x) = 0 otherwise.
(a) Derive the density and the distribution functions of X.
(b) The traffic engineer may design the highway capacity equal to the mean of X. Determine the design capacity of the highway and the corresponding probability of exceedance (i.e. the probability that the traffic volume is greater than the capacity).
Problem 4.5 A discrete random variable X has the density function given below.
 x      −1    0     1     2
 f(x)   0.2   c     0.2   0.1
(a) Determine c;
(b) Find the distribution function F (x);
(c) Show that the random variable Y = X 2 has the density function g(y) given by
 y      0     1     4
 g(y)   0.5   0.4   0.1
(d) Calculate expectation E(X), variance Var(X) and the mode of X (the value x with the
highest density).
Problem 4.6 A continuous random variable X has the density function f(x) = cx on the interval 0 ≤ x ≤ 1, and 0 otherwise.
(a) Determine the constant c;
(b) Find the distribution function F(x) of X;
(c) Calculate E(X), Var(X) and the median, Q(0.5);
(d) Find P(|X| ≤ 0.5).
Problem 4.7 Show that
(a) Any distribution function F(x) is non-decreasing, i.e. for any real values x1 < x2, F(x1) ≤ F(x2).
(b) Suppose X is a random variable with finite variance. Then Var(X) ≤ E(X²).
(c) If a density function f(x) is symmetric around 0, i.e. f(−x) = f(x) for all x ∈ R, then F(0) = P(X ≤ 0) = 0.5.
Problem 4.8 If the probability density of a random variable is given by
f(x) = k/x² for 1 ≤ x ≤ 3, and f(x) = 0 otherwise.
(a) Find the value of k such that f (x) is a probability density function.
(b) Find the corresponding distribution function.
(c) Find the mean and median.
Problem 4.9 Suppose a random variable X has a probability density function given by
f(x) = kx(1 − x) for 0 ≤ x ≤ 1, and f(x) = 0 elsewhere.
(a) Find the value of k such that f (x) is a probability density function.
(b) Find P(0.4 ≤ X ≤ 1).
(c) Find P(X ≤ 0.4 | X ≤ 0.8).
(d) Find F(b) = P(X ≤ b), and sketch the graph of this function.
Problem 4.10 Suppose that random variables X and Y are independent and have the same
mean 3 and standard deviation 2. Calculate the mean and variance of X − Y.
Problem 4.11 Suppose X has an exponential distribution with an unknown parameter λ, i.e. its density is
f(x) = λ exp(−λx) if x ≥ 0, and f(x) = 0 otherwise.
If P(X ≥ 1) = 0.25, determine λ.
Problem 4.12 Suppose an enemy aircraft flies directly over the Alaska pipeline and fires a single air-to-surface missile. If the missile hits anywhere within 10 feet of the pipeline, major structural damage will occur and the oil flow will be disrupted. Let X be the distance from the pipeline to the point of impact. Note that X is a continuous random variable. The probability function describing the missile's point of impact is given by
f(x) = (60 + x)/3600   for −60 ≤ x < 0,
     = (60 − x)/3600   for 0 ≤ x ≤ 60,
     = 0               otherwise.
Find the probability that the missile causes major structural damage to the pipeline.
4.7.2 Exercise Set B
Problem 4.15 The continuous random variable X takes values between −2 and 2 and its density function is proportional to
(a) 4 − x²
(b) x²
(c) 2 + x
(d) exp{−|x|}
Find, in each case, the density function, the distribution function, the mean, the standard
deviation, the median and the interquartile range of X.
85
4.7. EXERCISES
Problem 4.16 Find the density functions corresponding to the pictures in Figure 3.7. For
each case also calculate the distribution function, the mean, the median, the interquartile
range and the standard deviation.
[Figure 3.7: six sketched density functions, panels (a)–(f).]
The following table gives, for each of twenty welders, the probabilities that a welded item produced by that welder has 0, 1, 2, 3 or 4 cracks.
 Welder    0      1      2      3      4
 1        0.10   0.20   0.40   0.20   0.10
 2        0.20   0.20   0.20   0.20   0.20
 3        0.50   0.30   0.10   0.05   0.05
 4        0.05   0.05   0.10   0.30   0.50
 5        0.50   0.00   0.00   0.00   0.50
 6        0.85   0.00   0.00   0.00   0.15
 7        0.30   0.25   0.20   0.10   0.15
 8        0.20   0.30   0.20   0.10   0.20
 9        0.10   0.10   0.50   0.20   0.10
 10       0.20   0.50   0.10   0.20   0.00
 11       0.30   0.30   0.40   0.00   0.00
 12       0.10   0.10   0.50   0.15   0.15
 13       0.35   0.25   0.20   0.15   0.05
 14       0.40   0.30   0.10   0.10   0.10
 15       0.20   0.30   0.50   0.00   0.00
 16       0.60   0.30   0.10   0.00   0.00
 17       0.70   0.10   0.10   0.10   0.00
 18       0.10   0.80   0.10   0.00   0.00
 19       0.40   0.40   0.10   0.10   0.00
 20       0.15   0.60   0.15   0.10   0.00
1) How would you rank these twenty welders (e.g. for promotion) on the basis of this
information alone?
2) Would you change the ranking if you know that items with one, two, three and four
cracks must be sold for $6, $15, $40, and $60 less, respectively? What if the associated losses
are $6, $15, $40, and $80. Suggestion: Use the computer.
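For part 2), one possible computer-based approach is to rank the welders by expected loss per item, E(loss) = Σ loss(k)·P(k cracks). A Python sketch, with only the first few welders of the table entered (the remaining rows follow the same pattern):

# Probabilities of 0-4 cracks per item for a few of the welders (from the table).
welders = {
    1: [0.10, 0.20, 0.40, 0.20, 0.10],
    2: [0.20, 0.20, 0.20, 0.20, 0.20],
    3: [0.50, 0.30, 0.10, 0.05, 0.05],
    4: [0.05, 0.05, 0.10, 0.30, 0.50],
}
losses = [0, 6, 15, 40, 60]   # loss ($) for items with 0, 1, 2, 3, 4 cracks

expected_loss = {w: sum(p * c for p, c in zip(probs, losses))
                 for w, probs in welders.items()}

# Rank welders from smallest to largest expected loss per item.
for w, loss in sorted(expected_loss.items(), key=lambda item: item[1]):
    print(w, round(loss, 2))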
Problem 4.21 Suppose that the maximum annual wind velocity near a construction site, X, has exponential density
f(x) = λ exp{−λx},   x > 0.
(a) If the records of maximum wind speed show that the probability of maximum annual wind velocities less than 72 mph is approximately 0.90, suggest an appropriate estimate for λ.
(b) If the annual maximum wind speeds for different years are statistically independent, calculate the probability that the maximum wind speed in the next three years will exceed 75 mph. What about the next 15 years?
(c) Plot the distribution function of the maximum wind speed for the next year, for the next 3 years and for the next 15 years. Briefly report your conclusions.
(d) Let Qm(p) (m = 1, 2, . . .) be the quantile of order p for the maximum wind speed over the next m years. Show that
Qm(p) = Q1(p^{1/m}),   for all m = 1, 2, . . .
Use this formula to plot Qm(0.90) versus m. Same for Qm(0.95). Briefly report your conclusions. Suggestion: Use the computer.
Problem 4.22 A system has two independent components A and B connected in parallel. If the operational life (in thousands of hours) of each component is a random variable with density
f(x) = (1/36)(x − 4)(10 − x) for 4 < x < 10, and f(x) = 0 otherwise,
(a) Find the median and the mean life of each component. Find also the standard deviation
and IQR.
(b) Calculate the distribution and density functions for the lifetime of the system. What is
the expected lifetime of the system?
(c) Same as (b) but assuming that the components are connected in series instead of in
parallel.
Problem 4.23 A large construction project consists of building a bridge and two roads
linking it to two cities (see the picture below). The contractual time for the entire project is
18 months.
The construction of each road will require between 15 and 20 months and that of the
bridge will require between 12 and 19 months. The three parts of the projects can be done
simultaneously and independently. Let X1 , X2 and Y represent the construction times for the
two roads and the bridge, respectively and suppose that these random variables are uniformly
distributed on their respective ranges.
(a) What is the expected time for completion of each part of the project? What are the
corresponding standard deviations?
(b) What is the expected time for the completion of the entire project? What is the corresponding standard deviation?
(c) What is the probability that the project will be completed within the contractual time?
Problem 4.24 Same as Problem 4.23, but assuming that the variables X1 , X2 and Y have
triangular distributions over their ranges.
[Figure: Road 1 and Road 2 link two cities to the Bridge across the river.]
Chapter 5
Normal Distribution
5.1 Normal Distribution N(μ, σ²)
The Normal distribution is, for reasons that will be evident as we progress in this course, the most popular distribution among engineers and other scientists. It is a continuous distribution with density
f(x) = (1/(σ√(2π))) exp{−(x − μ)²/(2σ²)},
where μ and σ are parameters which control the central location and the dispersion of the density, respectively. The normal density is perfectly symmetric about the center, μ, and the bell-shaped curve becomes shorter and fatter as σ increases.
[Figure: normal densities with σ = 1, 1.5, 2 and 3; the curves become shorter and fatter as σ increases.]
The density steadily decreases as we move away from its highest value
f(μ) = 1/(σ√(2π)).
Therefore, the relative (and also the absolute) probability that X will take a value near μ is the highest. Since f(x) → 0 as x → ±∞, exponentially fast,
g(k) = P{|X − μ| ≤ kσ} → 1,   as k → ∞,
very fast. In fact, it can be shown that g(1) = 0.6827, g(2) = 0.9544, g(3) = 0.9973 and g(4) = 0.9999. For practical purposes g(k) = 1 for k ≥ 4.
Some Important Facts about the Normal Distribution
Fact 1: If X ~ N(μ, σ²) and Y = aX + b, where a and b are two constants with a ≠ 0, then Y ~ N(aμ + b, a²σ²).
For example, if X ~ N(2, 9) and Y = 5X + 1, then E(Y) = (5)(2) + 1 = 11, Var(Y) = (5²)(9) = 225 and Y ~ N(11, 225).
Proof We will consider the case a > 0. The proof for the a < 0 case is left as an exercise.
The distribution function of Y, denoted here by G, is given by
G(y) = P(Y ≤ y) = P(aX + b ≤ y) = P(X ≤ (y − b)/a) = F((y − b)/a),
where F is the distribution function of X. The density function g(y) of Y can now be found by differentiating G(y). That is,
g(y) = G′(y) = (d/dy) F((y − b)/a) = (1/a) f((y − b)/a) = (1/(aσ√(2π))) exp{−[y − (aμ + b)]²/(2a²σ²)},
which is the density of a N(aμ + b, a²σ²) random variable.
2
Standardized Normal
An important particular case emerges when a = 1/σ and b = −μ/σ. In this case the transformed variable is denoted by Z and called standard normal. Since
Z = (1/σ)X − (μ/σ) = (X − μ)/σ,
by Fact 1, the parameters of the new normal variable, Z, can be obtained from those of the given normal variable, X, (μ and σ²) as follows:
μ → aμ + b = (1/σ)μ − (μ/σ) = 0
and
σ² → a²σ² = (1/σ)²σ² = 1.
That is, any given normal random variable X ~ N(μ, σ²) can be transformed into a standard normal Z ~ N(0, 1) by the equation
Z = (X − μ)/σ.     (5.1)
[Figure: the standard normal density, illustrating the identity P(Z < −1) = 1 − P(Z < 1).]
The distribution function of the standard normal is denoted Φ and is given by
Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) exp{−t²/2} dt.
Since the standard normal density φ(z) is symmetric about zero [φ(−z) = φ(z) for all z] we have the important identity
Φ(−z) = 1 − Φ(z)   for all z.
See Figure 4.2. For example,
Φ(−1) = 1 − Φ(1) = 1 − 0.8413447 = 0.1586553.
Fact 3: The normal density cannot be integrated in closed form. That is, there are no simple
formulas for calculating expressions like
F(x) = ∫_{−∞}^{x} f(t) dt   or   P(a < X < b) = ∫_a^b f(t) dt.
c = (3)(1.64) + 2 = 6.92.
(f) The value of c such that P(|X − 2| > c) = 0.10 is calculated as follows:
P(|X − 2| > c) = P[|Z| > c/3] = 1 − P[|Z| ≤ c/3] = 1 − {2Φ(c/3) − 1} = 2[1 − Φ(c/3)] = 0.10.
Therefore,
Φ(c/3) = 0.95   ⟹   c/3 = 1.64   ⟹   c = (3)(1.64) = 4.92.
2
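Probabilities and quantiles of this kind are usually computed with software rather than with tables. A sketch using Python and scipy, assuming X ~ N(2, 9) as in the example above:

from scipy.stats import norm

mu, sigma = 2, 3          # X ~ N(2, 9), as in the example

# Part (f): the value c with P(|X - mu| > c) = 0.10.
# Equivalently P(|Z| <= c/sigma) = 0.90, so c/sigma is the 0.95 quantile of Z.
c = sigma * norm.ppf(0.95)
print(round(c, 2))        # about 4.93 (the table value 1.64 gives 4.92)

# Direct check: P(|X - mu| > c) should be 0.10.
prob = norm.cdf(mu - c, mu, sigma) + (1 - norm.cdf(mu + c, mu, sigma))
print(round(prob, 3))     # 0.10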
Fact 4: If X ~ N(μ, σ²), then
E(X) = μ   and   Var(X) = σ².
Proof It suffices to prove that E(Z) = 0 and Var(Z) = 1, because from (5.1),
X = σZ + μ,
and then we would have E(X) = E(σZ + μ) = σE(Z) + μ = μ and Var(X) = Var(σZ + μ) = σ²Var(Z) = σ². By symmetry, we must have E(Z) = 0. In fact, since φ′(z) = −(z/√(2π)) exp{−z²/2} = −zφ(z), it follows that
E(Z) = ∫_{−∞}^{∞} z φ(z) dz = −∫_{−∞}^{∞} φ′(z) dz = −φ(z)|_{−∞}^{∞} = 0.
Moreover, integrating by parts,
Var(Z) = E(Z²) = ∫_{−∞}^{∞} z² φ(z) dz = −∫_{−∞}^{∞} z φ′(z) dz = −zφ(z)|_{−∞}^{∞} + ∫_{−∞}^{∞} φ(z) dz = 1.
2
Fact 5: Suppose that X1, X2, . . . , Xn are independent normal random variables with means E(Xi) = μi and variances Var(Xi) = σi². Let Y be a linear combination of the Xi, that is,
Y = a1X1 + a2X2 + . . . + anXn,
where the ai (i = 1, . . . , n) are some given constant coefficients. Then,
Y ~ N(a1μ1 + a2μ2 + . . . + anμn, a1²σ1² + a2²σ2² + . . . + an²σn²).
Proof The proof that Y is normal is beyond the scope of this course. On the other hand, to show that
E(Y) = a1μ1 + a2μ2 + . . . + anμn
and
Var(Y) = a1²σ1² + a2²σ2² + . . . + an²σn²
is very easy, using Properties 2 and 5 for the mean and the variance of sums of random variables.
2
Example 5.2 Suppose that X1 and X2 are independent, X1 N (2, 4), X2 N (5, 3) and
Y = 0.5X1 + 2.5X2. By Fact 5, Y is normally distributed with mean
E(Y) = (0.5)(2) + (2.5)(5) = 13.5
and variance
Var(Y) = (0.5)²(4) + (2.5)²(3) = 19.75.
Therefore,
P(Y > 15) = 1 − Φ((15 − 13.5)/√19.75) = 1 − Φ(0.34) ≈ 0.37.
An important particular case arises when X1, . . . , Xn is a normal sample, that is, when the variables X1, . . . , Xn are independent, identically distributed, normal random variables, with mean μ and variance σ². One can think of the Xi's as a sequence of n independent measurements of the normal random variable X ~ N(μ, σ²). μ is usually called the population mean and σ² is usually called the population variance.
If the coefficients ai are all equal to 1/n, then Y is equal to the sample average:
Y = Σ_{i=1}^{n} (1/n) Xi = (1/n) Σ_{i=1}^{n} Xi = X̄.
By Fact 5, then, the normal sample average is also a normal random variable, with mean
μ Σ_{i=1}^{n} ai = μ Σ_{i=1}^{n} (1/n) = μ
and variance
σ² Σ_{i=1}^{n} ai² = σ² Σ_{i=1}^{n} (1/n²) = nσ²/n² = σ²/n.
That is, X̄ ~ N(μ, σ²/n).
Example 5.3 Suppose that X1, X2, . . . , X16 are independent N(μ, 4) and X̄ is their average.
(a) Calculate P(|X1 − μ| < 1) and P(|X̄ − μ| < 1). (b) Calculate P(|X̄ − μ| < 1) when the sample size is 25 instead of 16. (c) Comment on the result of your calculations.
Solution
(a) Since X1 ~ N(μ, 4), X1 − μ ~ N(0, 4) and so
P(|X1 − μ| < 1) = 2Φ(1/2) − 1 = 2Φ(0.5) − 1 = 0.383.
Moreover, since X̄ ~ N(μ, 4/16), X̄ − μ ~ N(0, 1/4) and
P(|X̄ − μ| < 1) = 2Φ(2) − 1 = 0.954.
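The effect of averaging in Example 5.3 can also be seen by simulation. A minimal Python sketch (μ is set to 0, which involves no loss of generality since only X̄ − μ matters):

import random

random.seed(3)
mu, sigma, trials = 0.0, 2.0, 20_000   # Xi ~ N(mu, 4)

def prob_within_one(n):
    # Estimate P(|X_bar - mu| < 1) for a sample of size n.
    hits = 0
    for _ in range(trials):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) < 1:
            hits += 1
    return hits / trials

print(round(prob_within_one(1), 3))    # about 0.383 (a single observation)
print(round(prob_within_one(16), 3))   # about 0.954
print(round(prob_within_one(25), 3))   # about 0.988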
5.2 Checking Normality
For a sample of size n, the i-th theoretical quantile of a distribution F is the quantile of order (i − 0.5)/n. That is,
qi = F⁻¹[(i − 0.5)/n],
where F⁻¹ denotes the inverse of F. In the special case of the standard normal the theoretical quantiles will be denoted by di. They are given by the formula
di = Φ⁻¹[(i − 0.5)/n],
where, as usual, Φ denotes the standard normal distribution function. In the case of a normal random variable, X, with mean μ and variance σ², we have
P(X ≤ qi) = Φ[(qi − μ)/σ] = (i − 0.5)/n,
and therefore
(qi − μ)/σ = Φ⁻¹[(i − 0.5)/n] = di   ⟹   qi = μ + σ di.
[Figure: normal quantile–quantile plots (ordered sample values versus the theoretical quantiles di) for several simulated samples.]
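In practice the theoretical quantiles di and the normal quantile–quantile plot are produced with a few lines of code rather than by hand. A Python sketch using matplotlib (the simulated data below merely stand in for a real sample):

import random
import matplotlib.pyplot as plt
from statistics import NormalDist

random.seed(4)
sample = sorted(random.gauss(10, 2) for _ in range(50))   # ordered sample
n = len(sample)

# Theoretical standard-normal quantiles d_i = Phi^{-1}((i - 0.5) / n).
d = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# For a normal sample the points (d_i, x_(i)) should fall near a straight
# line with intercept mu and slope sigma.
plt.scatter(d, sample)
plt.xlabel("standard normal quantiles")
plt.ylabel("ordered sample values")
plt.show()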
5.3 Exercises
5.3.1 Exercise Set A
Problem 5.1 A machine operation produces steel shafts having diameters that are normally
distributed with a mean of 1.005 inches and a standard deviation of 0.01 inch. Specifications
call for diameters to fall within the interval 1.00 ± 0.02 inches. What percentage of the output
of this operation will fail to meet specifications? What should be the mean diameter of the
shafts produced in order to minimize the fraction not meeting specifications?
Problem 5.2 Extruded plastic rods are automatically cut into nominal lengths of 6 inches.
Actual lengths are normally distributed about a mean of 6 inches and their standard deviation
is 0.06 inch.
(a) What proportion of the rods exceeds the tolerance limits of 5.9 inches to 6.1 inches?
(b) To what value does the standard deviation need to be reduced if 99% of the rods must
be within tolerance?
Problem 5.3 Suppose X1 and X2 are independent and identically distributed N (0, 4), and
define Y = max(X1 , X2 ). Find the density and the distribution functions of Y .
Problem 5.4 Assume that the height of UBC students is a normal random variable with
mean 5.65 feet and standard deviation 0.3 feet.
(a) Calculate the probability that a randomly selected student has height between 5.45 and
5.85 feet.
(b) What is the proportion of students above 6 feet?
Problem 5.5 The raw scores in a national aptitude test are normally distributed with mean
506 and standard deviation 81.
(a) What proportion of the candidates scored below 574?
(b) Find the 30th percentile of the scores.
Problem 5.6 Scores on a certain nationwide college entrance examination follow a normal
distribution with a mean of 500 and a standard deviation of 100.
(a) If a school admits only students who score over 670, what proportion of the student pool will be eligible for admission?
(b) What admission requirement would you set if only the top 15% are to be eligible?
Problem 5.7 A machine is designed to cut boards at a desired length of 8 feet. However,
the actual length of the boards is a normal random variable with standard deviation 0.2 feet.
The mean can be set by the machine operator. At what mean length should the machine be
set so that only 5 per cent of the boards are under cut (that is, under 8 feet)?
Problem 5.8 The temperature reading X from a thermocouple placed in a constant-temperature medium is normally distributed with mean μ, the actual temperature of the medium, and standard deviation σ.
(a) What would the value of σ have to be to ensure that 95% of all readings are within 0.1° of μ?
(b) Consider the difference between two observations X1 and X2 (here we could assume that X1 and X2 are i.i.d.). What is the probability that the absolute value of this difference is at most 0.075?
Problem 5.9 Suppose the random variable X follows a normal distribution with mean μ = 50 and standard deviation σ = 5.
(a) Calculate the probability P(|X| > 60).
(b) Calculate E(X²) and the interquartile range of X.
5.3.2 Exercise Set B
Problem 5.12 A scholarship is offered to students who graduate in the top 5% of their
class. Rank in the class is based on GPA (4.00 being perfect). A professor tells you the
marks are distributed normally with mean 2.64 and variance 0.5831. What GPA must you
get to qualify for the scholarship?
Problem 5.13 If the test scores of 40 students are normally distributed with a mean of 65
and a standard deviation of 10.
(a) Calculate the probability that a randomly selected student scored between 50 and 80;
(b) If two students are randomly selected, calculate the probability that the difference between
their scores is less than 10.
Problem 5.14 The length of trout in a lake is normally distributed with mean μ = 0.93 feet and standard deviation σ = 0.5 feet.
(a) What is the probability that a randomly chosen trout in the lake has a length of at least 0.5 feet?
(b) Suppose now that σ is unknown. What is the value of σ if we know that 85% of the trout in the lake are less than 1.5 feet long? Use the same mean 0.93.
Problem 5.15 The life of a certain type of electron tube is normally distributed with mean
95 hours and standard deviation 6 hours. Four tubes are used in an electronic system. Assume
that these tubes alone determine the operating life of the system and that, if any one fails,
the system is inoperative.
(a) What is the probability that a tube lasts at least 100 hours?
(b) What is the probability that the system will operate for more than 90 hours?
Problem 5.16 A product consists of an assembly of three components. The overall weight
of the product, Z, is equal to the sum of the weights X1 , X2 and X3 of its components.
Because of variability in production, they are independent random variables, each normally
distributed as N (2, 0.02), N (1, 0.010) and N (3, 0.03), respectively. What is the probability
that Z will meet the overall specification 6.00 ± 0.30 inches?
Problem 5.17 Due to variability in raw materials and production conditions, the weight
(in hundreds of pounds) of a concrete beam is a normal random variable with mean 31 and
standard deviation 0.50.
(a) Calculate the probability that a randomly selected beam weighs between 3000 and 3200
pounds.
(b) Calculate the probability that 25 randomly selected beams will weigh more than 79,500
pounds in total.
Problem 5.18 A machine fills 250-pound bags of dry concrete mix. The actual weight of
the mix that is put in the bag is a normal random variable with standard deviation σ = 0.40
pound. The mean can be set by the machine operator. At what mean weight should the
machine be set so that only 10 per cent of the bags are underweight? What about the larger
500-pound bags?
Problem 5.19 Check if the following samples are normal. Describe the type of departure
from normality when appropriate.
(a) 2.52 3.06 2.41 3.98 2.63 4.11 4.66 5.83 4.80 6.17 4.44 5.38 5.02 1.09 3.31 2.72 1.75 3.81
4.45 2.93
(b) 2.15 -3.46 1.12 0.25 -1.42 0.06 -1.16 -2.24 -1.50 0.37 0.66 -0.76 6.24 0.36 -0.40 0.52 -0.97
0.36 1.74 -0.65
(c) 1.79 -0.65 1.16 1.23 2.80 0.92 -2.62 -5.48 0.75 -2.64 -6.41 0.92 1.14 0.18 0.06 -1.49 -3.99
-10.36 7.12 -1.86
(d) -0.53 0.71 1.40 0.28 -0.65 1.02 -0.71 0.70 1.55 -0.52 -0.73 -1.04 -2.39 0.39 5.71 6.39 4.28
6.70 6.05 5.62
(e) -1.61 -1.29 0.59 -0.33 0.14 1.16 2.02 -0.52 0.69 -0.30 -0.56 0.43 -1.01 0.83 -0.95 0.24 0.01
0.10 0.12 0.07
Chapter 6
Bernoulli Experiments
Some random experiments can be viewed as a sequence of identical and independent trials,
on each of which one of two possible outcomes occurs. Some examples of random experiments
of this kind are
- Recording the number of times the maximum annual wind speed exceeds a certain level v0
(during a fixed number of years).
- Counting the number of years until v0 is exceeded for the first time.
- Testing (pass/no-pass) a number of randomly chosen items.
- Polling some randomly (and independently) chosen individuals regarding some yes/no question, for instance, did you vote in the last provincial election?
Each trial is called a Bernoulli trial and a set of independent Bernoulli trials is called a
Bernoulli process or Bernoulli experiment. The defining features of a Bernoulli experiment
are that each trial results in one of two possible outcomes, that the probability of each outcome
is the same on every trial, and that the trials are independent.
These outcomes refer to the occurrence or not of a certain event, A. They are arbitrarily
called success (when A occurs) and failure (when A^c occurs) and denoted by
S (for success)
and
F (for failure)
The probability of success, p = P(S), is the same on every trial, and so
P(F) = 1 − P(S) = 1 − p = q.
The number of trials in a Bernoulli experiment can either be fixed or random. For example,
if we are considering the number of maximum annual wind speed exceedances of v0 in the
next fifteen years, the number of trials is fixed and equal to 15. On the other hand, if we are
considering the number of years until v0 is first exceeded, the number of trials is random.
6.2
Given a Bernoulli experiment of size n (n independent Bernoulli trials), there are n Bernoulli
random variables Y1 , Y2 , . . . , Yn associated with it. The random variable Yi (i = 1, . . . , n)
depends only on the outcome of the ith trial and is defined as follows
Yi = 1, if the ith trial results in S,
Yi = 0, if the ith trial results in F.
That is, Yi indicates whether the ith trial results in an S.
The variables Yi are very simple. By definition, they are independent and their common
density function is
f(y) = p^y (1 − p)^(1−y), y = 0, 1.
The mean and the variance of Yi (they are, of course, the same for all i = 1, . . . , n) are given
by
E(Yi) = (0)f(0) + (1)f(1) = p,
and
Var(Yi) = (0 − p)^2 f(0) + (1 − p)^2 f(1) = p^2 q + q^2 p = pq,
where q = 1 − p.
The student can check that the variance is maximized when p = q = 0.5. This result is hardly
surprising as the uncertainty is clearly maximized when S and F are equally likely. On the
other hand, the uncertainty is clearly smaller for smaller or larger values of p. For example,
if p = 0.01 we can feel very confident that most of the trials will result in failures. Similarly,
if p = 0.99 we can confidently predict that most of the trials will result in successes.
Binomial Random Variable (B(n, p)).
Given a Bernoulli experiment of fixed size n, the corresponding Binomial random variable
X is defined as the total number of Ss in the sequence of Fs and Ss that constitutes the
outcome of the experiment. That is,
X = Y1 + Y2 + . . . + Yn.
Using properties (2) and (4) of the mean and variance of random variables,
E(X) = E(Y1 + Y2 + . . . + Yn) = E(Y1) + E(Y2) + . . . + E(Yn) = np,
and
Var(X) = Var(Y1 + Y2 + . . . + Yn) = Var(Y1) + Var(Y2) + . . . + Var(Yn) = npq,
where q = 1 − p.
The density function of X is
f(x) = [n!/(x!(n − x)!)] p^x q^(n−x), for all x = 0, 1, . . . , n,     (6.1)
where
n!/(x!(n − x)!) = [n(n − 1) . . . (2)(1)] / {[x(x − 1) . . . (2)(1)] [(n − x)(n − x − 1) . . . (2)(1)]}
is the binomial coefficient, usually denoted (n x). For example,
5!/(3!2!) = [(5)(4)(3)(2)(1)] / {[(3)(2)(1)][(2)(1)]} = 10.
To derive the density (6.1) first notice that X takes the value x only if x of the Yi are equal
to one and the remainder are equal to zero. The probability of this event is p^x q^(n−x). In
addition, the n variables Yi can be divided into two groups of x and n − x variables in (n x)
many different ways.
The distribution function of X doesn't have a simple closed form and can be obtained
from Table A5 for a limited set of values of n and p.
Example 6.1 Suppose that the logarithm of the operational life of a machine, T (in hours),
has a normal distribution with mean 15 and standard deviation 7. If a plant has 20 of these
machines working independently, (a) what is the probability that more than one machine
will break down before 1500 hours of operation? (b) how many more machines are needed if
the expected number of machines that will not break down before 1500 hours of operation
must be larger than 18?
Solution The number of machines breaking down before 1500 hours of operation, X, is a
binomial random variable with n = 20 and
p = P(T < 1500) = P(log(T) < log(1500)) = Φ[(log(1500) − 15)/7] = Φ(−1.1) = 1 − Φ(1.1) = 1 − 0.8643 = 0.14.
(a) First we notice that
P(X > 1) = 1 − P[X ≤ 1].
Since
P[X ≤ 1] = P(X = 0) + P(X = 1) = (20 0)(0.86)^20 + (20 1)(0.14)(0.86)^19 ≈ 0.049 + 0.160 = 0.209,
we get P(X > 1) ≈ 1 − 0.209 = 0.79.
6.3
The expected value, τ, of the number of trials before the first occurrence of a certain event,
A, is called the return period of that event. For example, the return period of the event
"maximum annual wind speed exceeds v0" is equal to the expected number of years before
v0 is exceeded for the first time.
The number of trials itself is a discrete random variable, X, with Geometric density
f(x) = p(1 − p)^(x−1), x = 1, 2, . . .     (6.2)
The derivation of (6.2) is fairly straightforward: first of all, it is clear that the range of X is
equal to {1, 2, . . .}. Furthermore, we can have X = x only if the event A^c occurs during the
first x − 1 trials and A occurs in the xth trial. In other words, we must have a sequence of
x − 1 failures followed by a success. Because of the independence of the trials in a Bernoulli
experiment, it is clear that the probability of such a sequence is equal to p(1 − p)^(x−1).
To check that f(x) = pq^(x−1) is actually a probability density function we must verify that
Σ_{x=1}^∞ f(x) = 1.
In fact, using the well known formula for the sum of a geometric series with rate 0 < q < 1,
[1 + q + q^2 + . . . ] = 1/(1 − q),
we obtain
Σ_{x=1}^∞ f(x) = Σ_{x=1}^∞ pq^(x−1) = p[1 + q + q^2 + . . . ] = p/(1 − q) = 1.
The expected value of X is
E(X) = Σ_{x=1}^∞ x p(1 − p)^(x−1) = p Σ_{x=1}^∞ x(1 − p)^(x−1)
     = p { −d/dp [ Σ_{x=1}^∞ (1 − p)^x ] }
     = p { −d/dp [ (1 − p)(1 + (1 − p) + (1 − p)^2 + . . . ) ] }
     = p { −d/dp [ (1 − p)/p ] } = p (1/p^2) = 1/p.
The return period of A is then inversely proportional to p = P (A). If p = P (A) is small then
we must wait, on average, a large number of periods until the first occurrence of A. On
the other hand, if p is large then we must wait, on average, a small number of periods for
the first occurrence of A.
The student will be asked to show (see Problem 6.6) that the variance of X is given by
Var(X) = (1 − p)/p^2 = τ(τ − 1).
One may well ask the question: why is τ called the return period? The reason for this becomes
clear after we notice that, because of the assumed independence, the expected number of trials
before the first occurrence of A is the same as the expected number of trials between any two
consecutive occurrences of A.
Example 6.2 Suppose that a structure has been designed for a 25-year rain (that is, a
rain that occurs on average every 25 years).
(a) What is the probability that the design annual rainfall will be exceeded for the first time
on the sixth year after completion of the structure?
(b) If the annual rainfall Y (in inches) is normal with mean 55 and variance 16, what is the
corresponding design rainfall?
Solution
(a) To say that a certain structure has been designed for a 25-year rain means that it has
been designed for an annual rainfall with a return period of 25 years.
The return period, τ, is equal to 25, and therefore the probability of exceeding the design
annual rainfall in any given year is
p = 1/τ = 1/25 = 0.04.
If X represents the number of years until the first time the design annual rainfall is exceeded,
then
P(X = 6) = (0.04)(0.96)^(6−1) = (0.04)(0.96)^5 = 0.033
is the required probability.
(b) The design rainfall, v0, must satisfy the equation
P(Y > v0) = 0.04,
or equivalently,
Φ[(v0 − 55)/4] = 0.96.
From the Standard Normal Table we find that Φ(1.75) = 0.96. Therefore,
(v0 − 55)/4 = 1.75 and v0 = (4)(1.75) + 55 = 62. □
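Both parts can be checked numerically. The following is a minimal sketch using scipy.stats (geom and norm are standard scipy distributions; norm.ppf is the normal quantile function, and the return period 25 and the N(55, 16) rainfall model come from the example):

from scipy.stats import geom, norm

p = 1 / 25                                  # probability of exceeding the design rainfall in a year

# (a) first exceedance in year 6: geometric probability p(1 - p)^5
print((1 - p) ** 5 * p)                     # about 0.033
print(geom.pmf(6, p))                       # the same value from scipy's geometric density

# (b) design rainfall v0 with P(Y > v0) = 0.04 for Y ~ N(55, 16)
print(norm.ppf(0.96, loc=55, scale=4))      # about 62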
6.4
Many physical problems of interest to engineers and other applied scientists involve the
possible occurrences of an event A at some points in time and/or space; examples include the
occurrence of large earthquakes over time and the occurrence of accidents at a road intersection.
The process is called a Poisson Process if A is a rare event, that is, if it has the following
properties:
1) The numbers of occurrences of A on non-overlapping intervals are independent.
2) The probability of exactly one occurrence of A on any interval of length Δ is approximately equal to λΔ when Δ is small.
3) The probability of more than one occurrence of A on any interval of length Δ is
approximately equal to (λΔ)^2 when Δ is small (that is, A is a rare event).
The constant λ is called the rate of the process. The discrete random variable X described
above (the number of occurrences of A on an interval of fixed length Δ) has the so called
Poisson density function
f(x) = exp{−λΔ}(λΔ)^x / x!, x = 0, 1, 2, . . .
and the continuous random variable T (the time between consecutive occurrences of A, or
inter-arrival time) has the so called exponential density
f(t) = λ exp{−λt}, t > 0.
The derivation of these densities from assumptions 1), 2) and 3) is not very difficult. The
interested student can read the heuristic derivation given at the end of this chapter.
Example 6.3 In Southern California there is on average one earthquake per year with
Richter magnitude 6.1 or greater (big earthquakes).
(a) What is the probability of having three or more big earthquakes in the next five years?
(b) What is the most likely number of big earthquakes in the next 15 months?
(c) What is the probability of having a period of 15 months without a big earthquake?
(d) What is the probability of having to wait more than three and a half years until the
occurrence of the next four big earthquakes?
Solution We assume that the sequence of big earthquakes follows a Poisson process with
(average) rate λ = 1 per year.
(a) The number X of big earthquakes in the next five years is a Poisson random variable
with mean λΔ = 5 and so, using the Poisson Table, we get
P(X ≥ 3) = 1 − P(X < 3) = 1 − F(2) = 1 − 0.125 = 0.875.
(b) In general, a Poisson density f(x) with mean μ is increasing at x (x ≥ 1) if and only
if the ratio f(x)/f(x − 1) > 1. Since
f(x)/f(x − 1) = [exp{−μ}μ^x / x!] / [exp{−μ}μ^(x−1) / (x − 1)!] = μ/x,
it follows that
f(x) > f(x − 1) when x < μ,
f(x) = f(x − 1) when x = μ,
f(x) < f(x − 1) when x > μ.
Therefore, the largest value of f(x) is achieved when x = [μ], where
[μ] = integer part of μ.
So, the most likely number of big earthquakes is [1.25] = 1 (notice that 15 months = 1.25
years).
(c) The waiting time T to the next big earthquake is an exponential random variable with
rate λ = 1 per year, with distribution function
F(t) = 1 − exp{−λt}.
Therefore,
P{T > 1.25} = 1 − F(1.25) = 1 − [1 − exp{−1.25}] = 0.287.
(d) Let Y represent the number of big earthquakes in the next three and a half years and let
W represent the waiting time (in years) until the occurrence of the next four big earthquakes.
We notice that Y is a Poisson random variable with rate 3.5 and that W is larger than
3.5 years if and only if Y is less than 4. So,
P [W > 3.5] = P [Y < 4] = F (3) = 0.5366.
□
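All four answers can be reproduced with scipy.stats. This is a minimal sketch under the same assumption as the example (a Poisson process with rate λ = 1 big earthquake per year); poisson and expon are standard scipy distributions:

from scipy.stats import poisson, expon

lam = 1.0                                    # big-earthquake rate, per year

# (a) three or more big earthquakes in 5 years: X ~ Poisson(5)
print(1 - poisson.cdf(2, 5 * lam))           # about 0.875

# (b) most likely count in 15 months = mode of Poisson(1.25)
counts = range(0, 10)
probs = [poisson.pmf(k, 1.25 * lam) for k in counts]
print(max(counts, key=lambda k: probs[k]))   # 1

# (c) 15 months without a big earthquake: exponential waiting time
print(expon.sf(1.25, scale=1 / lam))         # about 0.287

# (d) waiting time for the next 4 quakes exceeds 3.5 years
#     <=> fewer than 4 quakes in 3.5 years
print(poisson.cdf(3, 3.5 * lam))             # about 0.537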
Means and Variances The means of X and T are of practical interest, as they represent
the expected number of occurrences on a period of length Δ and the expected waiting time
between consecutive occurrences, respectively. We will see that, not surprisingly,
E(X) = λΔ and E(T) = 1/λ,
and also that
Var(X) = λΔ and Var(T) = 1/λ^2.
In fact,
E(X) = Σ_{x=0}^∞ x f(x) = Σ_{x=0}^∞ x exp{−λΔ}(λΔ)^x / x! = Σ_{x=1}^∞ exp{−λΔ}(λΔ)^x / (x − 1)!
     = exp{−λΔ}(λΔ) Σ_{x=1}^∞ (λΔ)^(x−1) / (x − 1)! = exp{−λΔ}(λΔ) exp{λΔ} = λΔ.
Similarly,
E[X(X − 1)] = Σ_{x=0}^∞ x(x − 1) exp{−λΔ}(λΔ)^x / x! = Σ_{x=2}^∞ exp{−λΔ}(λΔ)^x / (x − 2)!
            = exp{−λΔ}(λΔ)^2 Σ_{x=2}^∞ (λΔ)^(x−2) / (x − 2)! = (λΔ)^2.
Therefore,
E(X^2) = E[X(X − 1)] + E(X) = (λΔ)^2 + λΔ,
and
Var(X) = E(X^2) − [E(X)]^2 = (λΔ)^2 + λΔ − (λΔ)^2 = λΔ.
To calculate E(T), we use integration by parts with
u = t and dv = λ exp{−λt} dt,
to get
E(T) = ∫_0^∞ t f(t) dt = ∫_0^∞ t λ exp{−λt} dt = ∫_0^∞ exp{−λt} dt = 1/λ.
Similarly, using integration by parts with u = t^2 and dv = λ exp{−λt} dt, we get
E(T^2) = ∫_0^∞ t^2 f(t) dt = ∫_0^∞ t^2 λ exp{−λt} dt = 2 ∫_0^∞ t exp{−λt} dt = (2/λ) ∫_0^∞ t λ exp{−λt} dt = 2/λ^2.
Finally,
Var(T) = E(T^2) − [E(T)]^2 = (2/λ^2) − (1/λ^2) = 1/λ^2.
Consider now the waiting time W (in years) until the 25th big earthquake. Notice that
W = T1 + T2 + . . . + T25 = Σ_{i=1}^{25} Ti,
where the Ti are independent exponential inter-arrival times with rate λ = 1. Therefore,
E(W) = E(Σ_{i=1}^{25} Ti) = Σ_{i=1}^{25} E(Ti) = Σ_{i=1}^{25} 1 = 25,
and
Var(W) = Var(Σ_{i=1}^{25} Ti) = Σ_{i=1}^{25} Var(Ti) = Σ_{i=1}^{25} 1 = 25, so that SD(W) = 5. □
6.5
When n is large and p is small (as a rule of thumb, when min{np, n(1 − p)} < 5), the binomial
B(n, p) probabilities can be approximated by Poisson probabilities with the same mean, np:
P(X = x) ≈ exp{−np}(np)^x / x!, for all x = 0, . . . , n.
Example 17: On average, one per cent of the 50-kg dry concrete bags are underfilled below
49.5 kg. What is the probability of finding 4 or more of these underfilled bags in a lot of 200?
Solution: Since n = 200 and p = 0.01,
min{np, n(1 − p)} = min{2, 198} = 2 < 5.
Since n is large and np = 2 is small, we can use the Poisson approximation
P[B(200, 0.01) ≥ 4] ≈ P[P(2) ≥ 4] = 1 − P[P(2) < 4] = 1 − F(3) = 1 − 0.857 = 0.143.
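A minimal sketch comparing the exact binomial probability with the Poisson approximation used above (binom and poisson are standard scipy distributions; n = 200 and p = 0.01 come from the example):

from scipy.stats import binom, poisson

n, p = 200, 0.01
exact = 1 - binom.cdf(3, n, p)               # P[B(200, 0.01) >= 4]
approx = 1 - poisson.cdf(3, n * p)           # P[P(2) >= 4]
print(exact, approx)                         # about 0.142 and 0.143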
6.6
Let m be some fixed integer number. If Yi is the number of occurrences of the event A in
the interval ((i − 1)/m, i/m], then the total number of occurrences, X, in the interval (0, 1]
(we are taking Δ = 1 for simplicity) can be written as
X = Y1 + Y2 + . . . + Ym.
By property 3), when m is large the probability of more than one occurrence of A in any one
of these small intervals is negligible, and so the variables Yi are approximately Bernoulli
random variables; by property 2), their common success probability is approximately λ/m.
By the above remarks, the random variable X is approximately Binomial, B(m, λ/m),
when m is large. Of course, the larger m, the better the approximation, and in the limit
(when m → ∞) the approximation becomes exact. Therefore, the probability that X will
take any fixed value x can be obtained from the limit, as m → ∞, of the binomial expression
P(X = x) = (m x)[λ/m]^x [1 − λ/m]^(m−x).
Since, as m → ∞, we have
m(m − 1)(m − 2) . . . (m − x + 1)/m^x = (m/m)((m − 1)/m)((m − 2)/m) . . . ((m − x + 1)/m) → 1,
[1 − λ/m]^(−x) → 1,
and
[1 − λ/m]^m → exp{−λ},
we obtain that, as m → ∞,
(m x)[λ/m]^x [1 − λ/m]^(m−x) = (m/m)((m − 1)/m) . . . ((m − x + 1)/m) [1 − λ/m]^(−x) [1 − λ/m]^m λ^x / x! → exp{−λ} λ^x / x!,
which is the Poisson density with mean λ. In particular, this justifies the P(np) approximation
to the B(n, p) when n is large and p is small. The requirement that n is large corresponds to
m being large and the requirement that p is small corresponds to λ/m being small.
To derive the Exponential density of T, we reason as follows: The waiting time T until
the first occurrence of A will be larger than t if and only if the number of occurrences X in
the period (0, t) is equal to zero. Since X ∼ P(λt),
P(T ≤ t) = 1 − P(T > t) = 1 − P(X = 0) = 1 − [exp{−λt}(λt)^0 / 0!] = 1 − exp{−λt},
and differentiating this distribution function with respect to t gives the exponential density
f(t) = λ exp{−λt}.
6.7 Exercises
6.7.1 Exercise Set A
Problem 6.1 A weighted coin is flipped 200 times. Assume that the probability of a head
is 0.3 and the probability of a tail is 0.7. Each flip is independent from the other flips. Let
X be the total number of heads in the 200 flips.
(a) What is the distribution of X?
(b) What is the expected value of X and variance of X?
(c) What is the probability that X equals 35?
(d) What is the approximate probability that X is less than 45?
Note: Come back to this question after you have learned about normal approximations in the
next chapter.
Problem 6.2 Suppose it is known that a treatment is successful in curing a muscular pain
in 50% of the cases. If it is tried on 15 patients, find the probabilities that:
(a) At most 6 will be cured.
(b) The number cured will be no fewer than 6 and no more than 10.
(c) Twelve or more will be cured.
(d) Calculate the mean and the standard deviation.
Problem 6.3 The office of a particular U.S. Senator has on average five incoming calls per
minute. Use the Poisson distribution to find the probabilities that there will be:
(a) exactly two incoming calls during any given minute;
(b) three or more incoming calls during any given minute;
(c) no incoming calls during any given minute.
(d) What is the expected number of calls during any given period of five minutes?
Problem 6.4 A die is colored blue on 5 of its sides and green on the other side. This die
is rolled 8 times. Assume each roll of the die is independent of the other rolls. Let X be
the number of times blue comes up in the 8 rolls of the die.
(a) What is the expected value of X and the variance of X?
(b) What is the probability that X equals 6?
(c) What is the probability that X is greater than 6?
Problem 6.5 A factory produced 10,000 light bulbs in February, of which 500 are defective.
Suppose 20 bulbs are randomly inspected. Let X denote the number of defectives in the sample.
(a) Calculate P(X = 2).
(b) If the sample size, i.e., the number of inspected bulbs, is large, how would you
calculate P(X ≥ 2) approximately? For n = 200, calculate this probability approximately.
6.7.2 Exercise Set B
Problem 6.6 Let X be a random variable with geometric density (6.2). Show that
Var(X) = (1 − p)/p^2.
Problem 6.12 The number of killer whales arriving at the Pacific Rim Observatory Station
follows a Poisson Process with rate λ = 4 per hour.
(a) What are the expected number of arrivals and the variance during the next hour?
(b) What is the probability that the waiting time T between two consecutive arrivals will be
30 minutes or more?
(c) What are the expected value and the variance of the time until the next 20 killer whales
arrive at the Observatory Station?
Problem 6.13 Car accidents are random and can be said to follow a Poisson distribution.
At a certain intersection in East Vancouver there are, on average, 4 accidents a week. Answer
the following questions:
(a) What is the probability of there being no accidents at this intersection next week?
(b) The record for accidents in one month at a single intersection is 20. Find the probability
that this record will be broken, at this intersection, next month. (Assume 30 days in one
month)
(c) What is the expected waiting time for 20 accidents to occur?
Problem 6.14 A test consists of ten multiple-choice questions with five possible answers.
For each question, there is only one correct answer out of the five possible answers. If a student
randomly chooses one answer for each question, calculate the probabilities that
(a) at most three questions are answered correctly;
(b) five questions are answered correctly;
(c) all questions are answered correctly.
(d) Also calculate the mean and the standard deviation of the number of correct answers.
Problem 6.15 The number of meteorites hitting Mars follows a Poisson process with rate λ = 6 per month.
(a) What is the probability that at least 2 meteorites hit Mars in any given month?
(b) Find the probability that exactly 10 meteorites hit Mars in the next 6 months.
(c) What is the expected number of meteorites hitting Mars in the next year?
Problem 6.16 A biased coin is flipped 10 times independently. The probability of tails is
0.4. Let X be the total number of heads in the 10 flips.
(a) Use a computer to find P (X = 4);
(b) Use the Binomial table to find P (1 < X < 5);
(c) What is the probability that one has to flip at least 5 times to get the first head?
Problem 6.17 Three identical fair coins are tossed simultaneously until all three show the
same face.
(a) What is the probability that they are tossed more than three times?
(b) Find the mean for the number of tosses.
Chapter 7
Example 7.1 A system consists of 25 independent parts connected in such a way that the ith
part automatically turns on when the (i − 1)th part burns out. The expected lifetime of each
part is 10 weeks and the standard deviation is equal to 4 weeks. (a) Calculate the expected
lifetime and standard deviation for the system. (b) Calculate the probability that the
system will last more than its expected life. (c) Calculate the probability that the system
will last more than 1.1 times its expected life. (d) What are the (approximate) median life
and interquartile range for the system?
Solution
(a) Let Xi denote the lifetime of the ith component and let
T = Σ_{i=1}^{25} Xi.
Then
E(T) = Σ_{i=1}^{25} E(Xi) = 25 × 10 = 250 weeks and Var(T) = Σ_{i=1}^{25} Var(Xi) = 25 × 16 = 400.
Therefore,
SD(T) = √400 = 20 weeks.
(b) By the Central Limit Theorem,
T = 25 X̄ ≈ 25 N(10, 16/25) = N(250, 400) = N(E(T), Var(T)).
Therefore,
P(T > E(T)) = P(T > 250) ≈ 1 − Φ[(250 − 250)/20] = 1 − Φ(0) = 0.5.
(c) First of all notice that 1.1 E(T) = 1.1 × 250 = 275. Now, by the discussion in (b),
P(T > 275) ≈ 1 − Φ[(275 − 250)/20] = 1 − Φ(1.25) = 0.1056.
(d) Let Z denote the standard normal random variable. Using that T ≈ N(250, 400), it
follows that the median life is approximately 250 weeks and, since the quartiles of Z are
±0.6745, the interquartile range is approximately 2 × 0.6745 × 20 ≈ 27 weeks.
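The normal-approximation answers of Example 7.1 can be reproduced with a few lines of Python; this is only a sketch of the calculation above (norm is the scipy.stats normal distribution, and the mean 250 and standard deviation 20 are the values derived in part (a)):

from scipy.stats import norm

mean, sd = 250.0, 20.0                       # E(T) and SD(T) for the 25-part system

print(norm.sf(250, mean, sd))                # (b) P(T > 250) = 0.5
print(norm.sf(275, mean, sd))                # (c) P(T > 275), about 0.106

# (d) approximate median and interquartile range of T
median = norm.ppf(0.50, mean, sd)            # 250
iqr = norm.ppf(0.75, mean, sd) - norm.ppf(0.25, mean, sd)
print(median, iqr)                           # 250 and about 27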
Table 7.1
Midpoint   Frequency
40             15
44             34
48             26
52             23
56             17
60             16
64              4
68             10
Total         145
Example 7.2 Consider Table 7.1 with data on the annual (cumulative) rainfall intensity (X)
on a certain watershed area. The average annual rainfall intensity can be calculated from
Table 7.1 as:
X̄ = [(40)(15) + (44)(34) + . . . + (68)(10)] / 145 = 7388/145 = 50.95.
Since the average has been calculated from a frequency table, using the midpoint of each class
to represent all the points in each class, there is an approximation error to be considered.
How likely is it that this approximation error is (a) larger than 0.05? (b) larger than 0.10? (c)
larger than 0.5?
Solution
To make the required probability calculations we will assume that the rainfall intensities
are uniformly distributed on each interval. This is a reasonable assumption given that we do
not have any additional information on the distribution of values on each class.
Let ri represent the actual annual rainfall intensity (i = 1, 2, . . . , 145) and let mi be the
midpoint of the corresponding class. For instance, if r5 = 50.35 (a value in the class 50-54),
then m5 = 52.0. Let
Ui = ri − mi, i = 1, 2, . . . , 145.
Given our uniformity assumption, the Ui's are uniform random variables on the interval
(−2, 2).
To proceed with our calculation, we will assume that the variables Ui (which represent
the approximation errors) are independent.
Let
r̄ = (r1 + r2 + . . . + r145)/145.
The approximation error, D, in the calculation of X̄ can now be written as
D = r̄ − X̄ = (r1 + r2 + . . . + r145)/145 − (m1 + m2 + . . . + m145)/145 = (U1 + U2 + . . . + U145)/145.
Since D is the average of 145 independent, identically distributed random variables with zero
mean and variance equal to
σ^2 = (1/4) ∫_{−2}^{2} t^2 dt = 4/3,
we can use the (CLT) normal approximation. That is, we can use a normal distribution
with zero mean and variance equal to (4/3)/145 to approximate the distribution of D. The
corresponding standard deviation is √(4/435) = 0.095893.
(a) P(|D| > 0.05) = P(|D|/0.095893 > 0.05/0.095893) ≈ 2[1 − Φ(0.52)] = 0.6084.
(b) P(|D| > 0.1) = P(|D|/0.095893 > 0.1/0.095893) ≈ 2[1 − Φ(1.04)] = 0.2984.
(c) P(|D| > 0.5) = P(|D|/0.095893 > 0.5/0.095893) ≈ 2[1 − Φ(5.21)] = 0. □
Example 6.3 (continued from Chapter 6):
Recall part (h) of Example 6.3 from the previous chapter, which was left unanswered:
(h) What is the approximate probability that the waiting time until the 25th big earthquake
will exceed 27 years?
Solution
(h) Since W is a sum of iid random variables, we can use the Central Limit Theorem to
approximate P(W > 27). Since E(W) = 25 and SD(W) = 5 we have
P(W > 27) = 1 − P(W ≤ 27) ≈ 1 − Φ[(27 − 25)/5] = 1 − Φ(0.40) = 1 − 0.6554 = 0.3446. □
7.2
Let X be a binomial random variable with parameters n and p. When n is large, so that
min{np, n(1 − p)} ≥ 5,
we can use the following approximation:
P(X = k) = P[k − 0.5 < X < k + 0.5] ≈ Φ[(k − np + 0.5)/√(npq)] − Φ[(k − np − 0.5)/√(npq)].     (7.1)
The justification for the approximation above is given by the Central Limit Theorem. In
fact, we have seen before that
X = Y1 + Y2 + . . . + Yn,
where Y1, Y2, . . . , Yn are independent Bernoulli random variables with parameter p. Therefore,
X/n = (Y1 + Y2 + . . . + Yn)/n = Ȳ,
which is approximately N(p, pq/n) when n is large. Therefore,
X = nȲ
is approximately distributed as N(p, pq/n) multiplied by n, that is, N(np, npq). The continuity
correction 0.5, which is added to and subtracted from k, is needed because we are approximating
a discrete random variable with a continuous random variable.
For example, if n = 15 and p = 0.4, then
min{np, n(1 − p)} = min{6, 9} = 6 ≥ 5, np = 6, √(npq) = 1.9,
and
P(X = 8) ≈ Φ[(8 − 6 + 0.5)/1.9] − Φ[(8 − 6 − 0.5)/1.9] = 0.1213.
Using the Binomial Table in the Appendix we have that the exact probability is equal to
P(X = 8) = F(8) − F(7) = 0.9050 − 0.7869 = 0.1181.
Therefore, the approximation error is equal to 0.0032.
The student can verify, as an exercise, the entries in Table 7.2, where P(X = k) is being
approximated using formula (7.1).
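One way to carry out that verification is sketched below; it recomputes every row of Table 7.2 that follows (binom and norm are standard scipy distributions; small discrepancies with the table are expected because the text rounds √(npq) to 1.9):

from math import sqrt
from scipy.stats import binom, norm

n, p = 15, 0.4
mu, sd = n * p, sqrt(n * p * (1 - p))

for k in range(n + 1):
    # normal approximation with continuity correction, formula (7.1)
    approx = norm.cdf((k - mu + 0.5) / sd) - norm.cdf((k - mu - 0.5) / sd)
    exact = binom.pmf(k, n, p)
    print(f"{k:2d}  {approx:.4f}  {exact:.4f}  {approx - exact:+.4f}")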
Table 7.2
 k   Approximated   Exact     Error
 0      0.0016      0.0005    0.0011
 1      0.0070      0.0047    0.0023
 2      0.0240      0.0219    0.0021
 3      0.0605      0.0634   -0.0029
 4      0.1213      0.1268   -0.0055
 5      0.1827      0.1859   -0.0032
 6      0.2051      0.2066   -0.0015
 7      0.1827      0.1771    0.0056
 8      0.1213      0.1181    0.0032
 9      0.0605      0.0612   -0.0007
10      0.0240      0.0245   -0.0005
11      0.0070      0.0074   -0.0004
12      0.0016      0.0016    0.0000
13      0.0003      0.0003    0.0000
14      0.0000      0.0000    0.0000
15      0.0000      0.0000    0.0000
7.3
The Central Limit Theorem can also be used to approximate Poisson probabilities when the
expected number of counts, λ, is large. As a rule of thumb, we will use this approximation
when λ ≥ 20.
The Poisson random variable, X ∼ P(λ), is approximated by the normal random variable
N(λ, λ) with the same mean and variance. In other words,
P(X = x) ≈ Φ[(x − λ + .5)/√λ] − Φ[(x − λ − .5)/√λ],
provided that λ ≥ 20. The continuity correction 0.5, added to and subtracted from x, is needed
because we are approximating a discrete random variable with a continuous random variable.
This approximation is justified by the following argument: consider a Poisson process
with rate 1, and suppose that X represents the number of occurrences in a period of
length λ. We can divide the period into n subintervals of length λ/n and denote by Yi the number of
occurrences in the ith subinterval. It is clear that Y1, . . . , Yn are independent Poisson random
variables with mean λ/n and that
X = Y1 + Y2 + . . . + Yn = nȲ.
By the Central Limit Theorem, X is then approximately N(λ, λ) when λ is large. For example,
if λ = 25,
P(X = 27) ≈ Φ[(27 + .5 − 25)/√25] − Φ[(27 − .5 − 25)/√25] = Φ(0.50) − Φ(0.30) = 0.6915 − 0.6179 = 0.0736.
7.4 Exercises
7.4.1 Exercise Set A
Problem 7.1 Two types of wood (Elm and Pine) are tested for breaking strength. Elm
wood has an expected breaking strength of 56 and a standard deviation of 4. Pine wood has
an expected breaking strength of 72 and a standard deviation of 8. Let X̄ be the sample
average breaking strength of an Elm sample of size 30, and Ȳ be the sample average breaking
strength of a Pine sample of size 40.
be:
(a) between 109 and 112?
(b) greater than 111?
(c) What assumption(s) have you made?
Problem 7.4 The expected amount of sulfur in the daily emission from a power plant is 134
pounds with a standard deviation of 22 pounds. For a random sample of 40 days, find the
approximate probability that the total amount of sulfur emissions will exceed 5, 600 pounds.
Problem 7.5 Suppose we draw two samples of equal size n from a population with unknown
mean but a known standard deviation 3.5. Let X̄ and Ȳ be the corresponding sample averages.
How large would the sample size n be required to be to ensure that P(−1 ≤ X̄ − Ȳ ≤ 1) = 0.90?
Problem 7.6 Suppose X1, . . . , X30 are independent and identically distributed random variables
with mean E(X1) = 10 and variance Var(X1) = 5, and let X̄ = (1/30) Σ_{i=1}^{30} Xi.
7.4.2 Exercise Set B
Problem 7.7 Show that if U has uniform distribution on the interval (0, 1) and F is any
given continuous distribution function, then X = F^(−1)(U) has distribution F. This result can
be used to generate random variables with any given distribution.
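A minimal sketch of this inverse-CDF idea in Python; the exponential distribution F(x) = 1 − exp{−x} is only an illustrative choice (it is not part of the problem statement):

import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=10_000)                  # U ~ Uniform(0, 1)

# For F(x) = 1 - exp(-x) (exponential with rate 1), F^{-1}(u) = -log(1 - u)
x = -np.log(1 - u)

print(x.mean(), x.var())                      # both should be close to 1, the exponential mean and variance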
Problem 7.8 (a) Generate m = 100 samples of size n = 10, of independent random variables
with uniform distribution on the interval (0, 1). Let Xij denote the j th element of the ith
sample (i = 1, 2, . . . , m and j = 1, 2, . . . , n).
Construct the histogram and Q-Q plot for the sample means
X̄i = (1/n) Σ_{j=1}^{n} Xij.
(b) Same as (a) but with n = 20 and n = 40. What are your conclusions?
(c) Repeat (a) and (b) but with the Xij having density
f(x) = (1/18)(x − 4), 4 < x < 10,
     = 0, otherwise.
What are your conclusions?
Hint: See Problem 7.7.
Problem 7.9 Solve part (a) of Problem 7.8 but with p = 0.7, instead of 0.3.
Problem 7.10 Referring to Problem 6.10, find the probability that more than 800 customers
will come during the next 20 business days.
Problem 7.11 The expected tensile strengths of two types of steel (types A and B, say) are
106 ksi and 104 ksi. The respective standard deviations are 8 ksi and 6 ksi. Let X̄ and Ȳ
be the sample average tensile strengths of two samples of 40 specimens of type A and 35
specimens of type B, respectively.
(a) What is the approximate distribution of X̄? Of Ȳ?
(b) What is the approximate distribution of X̄ − Ȳ? Why?
(c) Calculate (approximately) P[|X̄ − Ȳ| < 1].
(d) Suppose that after completing all the sample measurements you find x̄ − ȳ = 6. What
do you think now of the population assumptions made at the beginning of this problem?
Why?
Problem 7.12 (a) There are 75 defectives in a lot of 1500. Twenty-five items are randomly
inspected (the inspection is non-destructive and the items are returned to the lot immediately
after inspection). If two or more items are defective the lot is returned to the supplier (at
the supplier's expense). Otherwise, the lot is accepted. What is the probability that the lot
will be rejected?
(b) Suppose that the actual number of defectives is unknown and that five out of twenty-five
independently inspected items turned out to be defective. Estimate the total number
of defectives in the lot (of 1500 items). What is the expected value and standard deviation
of your estimate? What is the (approximate) probability that your estimate is within a
distance of 10 from the actual total number of defectives?
Problem 7.13 A sequence of n independent pH determinations of a chemical compound
will be made. Each determination can be viewed as a random variable, Xi, with mean μ
(the unknown true pH of the compound) and standard deviation σ = 0.15. How many
independent determinations are required if we wish that the sample average X̄ is within 0.01
of the true pH with probability 0.95? What is the necessary n if σ = 0.30?
Problem 7.14 Bits are independently received in a digital communication channel. The
probability that a received bit is in error is 0.00001.
(a) If 16 million bits are transmitted, calculate the (approximate) probability that more than
150 errors occur.
(b) If 160,000 bits are transmitted, calculate the (approximate) probability that more than
1 error occurs.
Chapter 8
Introduction
One is often interested in random quantities (variables Y, T, N, etc.) such as the strength
Y of a concrete block, the time T of a chemical reaction, the number N of visits to a
website, etc. Engineers and applied scientists use statistical models to represent these
random quantities. Statistical models are sets of mathematical equations involving random
variables and other unknown quantities called parameters.
For example, the compressive strength of a concrete block can be modeled as
Y = μ + σε,     (8.1)
where μ is a parameter that represents the true average compressive strength of the concrete
block, ε is a random variable with zero mean and unit variance that accounts for the block-to-block
variability, and σ is a parameter that determines the average size of the block-to-block
variability. Notice that according to this model the compressive strength of a concrete
block is a random variable that results from the sum of two components: a systematic
component or signal (μ) and a random component or noise (σε).
Independent measurements are often taken to adjust the model, that is, to estimate
the unknown parameters that appear in the model equations. For example, the compressive
strength of several concrete blocks can be measured to get information about μ and σ.
Before the measurements are actually performed they can be thought of as independent
replicates of the random quantity of interest. For example, the future measurements of the
compressive strengths can be represented as
Yi = μ + σεi, i = 1, . . . , n,     (8.2)
population under study. In practice some units are randomly chosen and the measurements
are performed only on them. The set of selected units is called a sample. The corresponding
set of measurements is also called a sample.
Given a statistical model and a set of measurements (sample) one can carry out statistical
procedures, called statistical inference, which are aimed at extrapolating from
the sample to the population. The most typical statistical procedures are:
Point estimation of the model parameters.
Confidence intervals for the model parameters.
Testing of hypotheses about the model parameters.
These procedures will be described and further discussed in the context of the simple situations considered below.
8.2
Sometimes it can be assumed that the quantity of interest is homogeneous for all the units in
the population and that the measurements are the sum of a systematic and a random part
(signal plus noise). In these cases we normally assume that the sample is a set of homogeneous
measurements
Yi = μ + σεi, i = 1, . . . , n,     (8.3)
where μ and σ are as described in the Introduction above and n is the number of measurements
or sample size. It is often assumed that the measurements are independent and therefore
that the random variables εi, i = 1, . . . , n, are independent. Finally, we assume that the
random variables εi are normal with mean zero and variance one.
Note: Multiplicative models, where the measurements are the product of a systematic factor θ
and a random factor Ui,
Xi = θ Ui,
can be transformed into additive models like (8.3) by taking the log of the measurements:
Yi = ln(Xi) = ln(θ) + ln(Ui).
8.2.1
for example the expected squared estimation error or the expected absolute estimation error.
Estimation of μ: A good point estimate for μ, the main parameter of model (8.3), can be
obtained by the method of least squares, which consists of minimizing (in m) the sum of
squares
S(m) = Σ_{j=1}^{n} (Yj − m)^2.
Differentiating with respect to m and setting the derivative equal to zero gives the equation
S′(m) = −2 Σ_{j=1}^{n} (Yj − m) = 0, or m = (1/n) Σ_{j=1}^{n} Yj = Ȳ = μ̂.
Estimation Error: Being functions of the random variables, the point estimate Ȳ and
the estimation error Ȳ − μ are also random variables. Obviously, we would like the
estimation error to be small. To have some idea of the behavior of the estimation error we can
calculate its expected value (mean) and its variance:
E[Ȳ − μ] = E(Ȳ) − μ = (1/n) Σ_{j=1}^{n} E(Yj) − μ = (1/n) Σ_{j=1}^{n} μ − μ = 0   (Ȳ is unbiased),
and
Var(Ȳ − μ) = Var(Ȳ) = (1/n^2) Σ_{j=1}^{n} Var(Yj) = (1/n^2) n σ^2 = σ^2/n.
In this case, the estimation error then has a distribution centered at zero and a variance
inversely proportional to n. In other words, if n is sufficiently large, likely values of Ȳ will
all be close to μ.
Estimation of σ^2: The point estimate for σ^2 is based on the minimized sum of squares,
S(Ȳ), divided by a quantity d so that E[S(Ȳ)/d] = σ^2. The simple derivation outlined
in Problem 8.9 shows that d = n − 1, and so
σ̂^2 = S^2 = Σ_{j=1}^{n} (Yj − Ȳ)^2 / (n − 1).
The sample mean and variance are ȳ = 1.9833 and s^2 = 0.09787879, respectively. The
standard error of ȳ is then SE(ȳ) = √(0.09787879/12) = 0.09031371. It would appear that the
scientist's measurement procedure is biased, giving values below the true concentration. The
bias can be estimated as 1.9833 − 2.5 = −0.5166667, give or take 0.181 (0.181 = 2 SE(ȳ)).
8.2.2
Consider the absolute estimation error |Ȳ − μ|. We wish to find a value d such that there is
a large probability (0.95 or 0.99) that the absolute estimation error is below d. That is, we
wish to find d such that, for some small value of α (typically α = 0.05 or 0.01), we have
P[|Ȳ − μ| < d] = 1 − α.
The resulting d can then be added to and subtracted from the observed average ȳ to obtain
the upper and lower limits of an interval called the (1 − α)100% confidence interval:
(ȳ − d, ȳ + d).
Typical values of α are α = 0.05 and α = 0.01, yielding 95% and 99% confidence intervals,
respectively. To fix ideas we will take α = 0.05 in what follows.
Assuming that the model (8.3) is correct, the probability that μ and Ȳ differ by more than
d is only 0.05. In other words, if we repeatedly obtain samples of size n and construct the
corresponding 95% confidence intervals for μ, on average 95% of these intervals will include
the (unknown) value of μ.
Using that Ȳ ∼ N(μ, σ^2/n) we have
0.95 = P[|Ȳ − μ| < d] = P[ |Ȳ − μ|/(σ/√n) < d√n/σ ] = 2Φ[d√n/σ] − 1.
That is,
Φ[d√n/σ] = 0.975, and so d = 1.96 σ/√n.
In practice σ is unknown and is replaced by its estimate s; the value 1.96 is then replaced by
the corresponding quantile t(df)(α) of the Student's t distribution, where
df = n − k,
n = number of squared terms appearing in the variance estimate,
and
k = number of additional estimated parameters appearing in the variance estimate.
Table A.2 in the Appendix gives the values of t(df)(α) for several values of α and df.
In summary, the estimated value of d is
d = t(df)(α) s/√n = t(df)(α) SE(Ȳ).
Notice that for most values of n that appear in practice, t(n−1)(0.05) ≈ 2, justifying the
common practice of adding and subtracting 2 SE(ȳ) from the observed average ȳ.
Example 8.2 Refer to the data in Example 8.1. A 95% confidence interval for the actual
mean of the scientist's measurements is
1.9833 ± t(11)(0.05) SE(ȳ)
or
1.9833 ± 2.20 × 0.09031371.
That is, the systematic part of the scientist's measurement is likely to lie between 1.8 and
2.2.
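A minimal sketch of the same interval in Python (t is the scipy.stats Student's t distribution; note that t(11)(0.05) in the notation of these notes corresponds to the 0.975 quantile of the t distribution with 11 degrees of freedom; the sample size 12, mean and variance are the values quoted from Example 8.1):

from math import sqrt
from scipy.stats import t

n = 12
ybar, s2 = 1.9833, 0.09787879                 # sample mean and variance from Example 8.1
se = sqrt(s2 / n)                             # standard error of the mean, about 0.0903
tcrit = t.ppf(0.975, n - 1)                   # two-sided 5% critical value, about 2.20

d = tcrit * se
print((ybar - d, ybar + d))                   # roughly (1.78, 2.18)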
8.2.3
There are situations when one wishes to determine whether a certain statement or hypothesis
about a model parameter is consistent with the given data. That is, one wishes to confront the
statement with the empirical evidence (data). For example, the scientist of Examples 8.1
and 8.2 may wish to test the hypothesis that the given measurement method is unbiased,
using her collected data.
The procedure for rejecting a hypothesis about a certain unknown population parameter,
on the basis of statistical evidence, is called testing of hypotheses. The hypothesis to be
tested is denoted by H0.
Typical hypotheses, H0, about μ are
(i) H0: μ = μ0, or (ii) H0: μ ≤ μ0, or (iii) H0: μ ≥ μ0,
where μ0 is some specified value. In the case of the scientist of Examples 8.1 and 8.2, the
statement "the measurement method is unbiased" corresponds to (i) with μ0 = 2.5. On the
other hand, the statement "the measurement method does not consistently under-estimate
the true concentration" corresponds to (iii) with μ0 = 2.5. What statement would correspond
to (ii) with μ0 = 2.5?
Significance Level of a Test: When testing a hypothesis one can incur two possible
errors: rejecting a hypothesis that is true (error of type I) or failing to reject a hypothesis that
is false (error of type II). Errors of type I are considered more important and are kept under
tight control. Therefore, usual testing procedures ensure that the probability of rejecting a
true hypothesis is rather small (0.01 or 0.05). The probability of an error of type I is usually
denoted by α and called the significance level of the test.
Taking that into consideration, the hypothesis H0 is constructed in such a way that its
incorrect rejection has a small probability. H0 states, then, the most conservative statement:
a statement that one would like to reject only in the presence of strong empirical evidence.
Because of that, H0 is called the null hypothesis.
The Testing Procedure: The testing procedures learned in this course are simply derived
from confidence intervals. Suppose we wish to test H0 at level α. Then we distinguish two
cases:
Two-sided tests: Hypotheses of the form H0: μ = μ0 give rise to two-sided tests because
in this case we reject H0 if we have evidence indicating that μ is smaller or larger than μ0.
The two-sided level α testing procedure consists of the following two steps:
Step 1. Construct a (1 − α)100% confidence interval for μ.
Step 2. Reject H0: μ = μ0 if μ0 lies outside that interval.
One-sided tests: Hypotheses of the form H0: μ ≥ μ0 (H0: μ ≤ μ0) are called directional
hypotheses and give rise to one-sided tests. Notice that in this case we reject H0 only if we
suspect that μ < μ0 (μ > μ0). The one-sided level α testing procedure consists of the following two steps:
Step 1. Construct a [1 − (2α)]100% confidence interval for μ.
Step 2. Reject H0: μ ≥ μ0 if the entire interval lies below μ0 (reject H0: μ ≤ μ0 if the entire
interval lies above μ0).
Example 8.4 A shipyard must order a large shipment of lacquer from a supplier. Besides
other design requirements, the lacquer must be durable and dry quickly. The average drying
time must not exceed 25 minutes. Supplier A claims that, on average, its product dries in
20.5 minutes. A sample of 30 20-liter cans from supplier A yields an average drying time of
22.3 minutes and standard deviation of 2.9 minutes.
(a) Is there statistical evidence to distrust supplier A's claim that its product has an average
drying time of 20.5 minutes?
(b) Can we say that, on average, supplier A's lacquer dries before 24 minutes?
Solution (a) Supplier A's claim corresponds to the hypothesis μ = 20.5, which we test at
level α = 0.05 by constructing a 95% confidence interval for μ. In the present case
df = 30 − 1 = 29 and hence, from Table A.2, t29(0.05) = 2.05. Moreover, SE(ȳ) = 2.9/√30 = 0.529465.
Therefore,
d = 2.05 × 0.529465 = 1.085,
and the 95% confidence interval for μ is
(ȳ ± d) = (22.3 ± 1.085) = (21.21, 23.39).
Since this interval doesn't include the value μ = 20.5, we reject supplier A's claim that
μ = 20.5. That is, we reject the hypothesis μ = 20.5 on the basis of the given data and
statistical model.
(b) One way to answer this question is to test the hypothesis
H0: μ ≥ 24.0
at some (small) level α. To take advantage of the calculations already made we may choose
α = 0.025. Since the upper limit of the 95% confidence interval for μ is smaller than 24.0, we reject
H0 and answer question (b) in a positive way. □
8.3
There are practical situations where we are interested in comparing several populations. In
this section we will consider the simplest case of two populations. In Chapter 10 we will
consider the general case of two or more populations.
Example 8.5 Refer to the situation described in Example 8.4. Another supplier, called
Supplier B, could also supply the lacquer. A sample of 10 20-liter cans from supplier B yields
an average drying time of 20.7 minutes and standard deviation of 2.5 minutes. Does the data
support supplier B's claim that, on average, its product dries faster than A's? What if the
sample size from supplier B were 100 instead of 10?
This example illustrates a fairly common situation: one must make or recommend an important decision involving a large number of items (or individuals) on the basis of a relatively
small number of measurements performed on some of these items. Recall that the set of all
the items under study is called the population and the subset of items used to obtain the
measurements (and often the measurements themselves) is called the sample.
Example 8.5 includes two populations, namely the 3,000 20-liter cans of lacquer that can
be acquired from either supplier A or B. In the following these two populations will be called
population A and population B, respectively.
Although we are concerned with the entire populations, we will only be able to test the
items in the samples. Therefore, we must try to investigate and exploit the mathematical
connections between the samples and the populations from which they came. This can be
done with the help of a statistical model, that is, a set of probability assumptions regarding
the sample measurements. The two sample measurements can be modeled as
Yij = μi + σi εij, i = 1, 2 and j = 1, . . . , ni,     (8.4)
where the first subscript (i) indicates the population and the second subscript (j) indicates
the observation. Thus, μi and σi^2 are the population means and variances, respectively, and
n1 and n2 are the sample sizes. In the case of Example 8.5, n1 = 30 and n2 = 10. It is often
assumed that the measurements are independent and therefore that the random variables
εij, i = 1, 2 and j = 1, . . . , ni, are independent. Finally, as in the case of one sample, we
assume that the random variables εij are normal with mean zero and variance one.
Similarly to the one-sample case, the population means μ1 and μ2 can be estimated by
the corresponding sample means:
Ȳ1 = (1/n1) Σ_{j=1}^{n1} Y1j and Ȳ2 = (1/n2) Σ_{j=1}^{n2} Y2j.
Notice that Ȳ1 and Ȳ2 are normal random variables with means μ1 and μ2 and variances
σ1^2/n1 and σ2^2/n2, respectively. Furthermore, the population variances σ1^2 and σ2^2 can be
estimated by the sample variances
S1^2 = Σ_{j=1}^{n1} [Y1j − Ȳ1]^2 / (n1 − 1) and S2^2 = Σ_{j=1}^{n2} [Y2j − Ȳ2]^2 / (n2 − 1).
When the two population variances can be assumed to be equal (σ1 = σ2 = σ), the common
variance σ^2 is estimated by the pooled sample variance
S^2 = Σ_{i=1}^{2} Σ_{j=1}^{ni} [Yij − Ȳi]^2 / (n1 + n2 − 2).
Linear Combinations of the Population Means: In practice one often wishes to estimate
linear combinations of the population means and to test hypotheses about them. In
such cases we say that the parameter of interest, Δ, is a linear combination of μ1 and μ2.
The most common linear combination of μ1 and μ2 is the simple difference
Δ = μ1 − μ2.
Other examples are
Δ = μ1 − 2μ2, Δ = 3μ1 − μ2, Δ = 1.2μ1 + 0.5μ2.
In general, a linear combination Δ = aμ1 + bμ2 is estimated by Δ̂ = aȲ1 + bȲ2, which has
mean aμ1 + bμ2 and variance
a^2 σ1^2/n1 + b^2 σ2^2/n2.
When the two population variances are assumed equal, this variance is estimated by
s^2 (a^2/n1 + b^2/n2),
where s^2 is the pooled sample variance. For the data of Examples 8.4 and 8.5,
s^2 = [(29)(2.9^2) + (9)(2.5^2)] / (30 + 10 − 2) = 7.90,
and so, for Δ = μ1 − μ2 (a = 1, b = −1),
SE(Δ̂) = s √(1/n1 + 1/n2) = 2.8106 √(1/30 + 1/10) = √1.053 = 1.026.
The 95% confidence interval for Δ = μ1 − μ2 is then
(ȳ1 − ȳ2) ± t(38)(0.05) s √((n1 + n2)/(n1 n2)) = 1.6 ± 2.02 × 1.026 = (−0.47, 3.67).
We have used the approximation t(38)(0.05) ≈ t(40)(0.05) = 2.02, because t(38)(0.05) is not
included in the table.
Solution to Example 8.5: The statement of Supplier B is consistent with the hypothesis
H0: μ1 ≥ μ2,
or equivalently,
H0: μ1 − μ2 ≥ 0.
We may answer the question by testing this (directional) hypothesis at some (small) level α.
For example, we may take α = 0.05. The 90% confidence interval for Δ = μ1 − μ2 is
(ȳ1 − ȳ2) ± t(40)(0.10) s √((n1 + n2)/(n1 n2)) = 1.6 ± 1.68 × 1.026 = 1.6 ± 1.724 = (−0.124, 3.324).
Since the value μ1 − μ2 = 0 falls in the interval, we conclude that there is no statistically
significant difference between the two means. There is, then, no statistical evidence supporting
Supplier B's claim of having a superior product. □
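A minimal sketch of the pooled two-sample interval used above, with the sample summaries of Examples 8.4 and 8.5 (t is the scipy.stats Student's t distribution; the document uses the tabulated value t(40)(0.10) = 1.68, so the exact-df interval below differs very slightly):

from math import sqrt
from scipy.stats import t

n1, ybar1, s1 = 30, 22.3, 2.9                 # supplier A
n2, ybar2, s2 = 10, 20.7, 2.5                 # supplier B

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance, about 7.90
se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)                        # about 1.026
df = n1 + n2 - 2

tcrit = t.ppf(0.95, df)                       # 90% two-sided interval (one-sided test at 0.05)
d = tcrit * se
diff = ybar1 - ybar2
print((diff - d, diff + d))                   # roughly (-0.13, 3.33); zero is inside the interval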
Example 8.6 Either 20 large machines or 30 small ones can be acquired for approximately
the same cost. One large and one small machine have been experimentally run for 20 days
with the following results:
ȳlarge = ȳ1 = 31.0, slarge = s1 = 2.1,
ȳsmall = ȳ2 = 22.7, ssmall = s2 = 1.9.
Which of the two options would you recommend?
Solution: Since the total cost of 20 large machines equals the cost of 30 small machines, it is
reasonable to compare the total outputs:
Total output of 20 large machines = 20μ1,
Total output of 30 small machines = 30μ2,
where μ1 and μ2 are the average daily outputs for each type of machine.
Therefore, the parameter of interest is the linear combination
Δ = 20μ1 − 30μ2.
From the information given we have n1 = n2 = 20 and Δ can be estimated by
Δ̂ = 20 ȳ1 − 30 ȳ2 = 20 × 31 − 30 × 22.7 = −61.0.
The pooled estimate of σ^2 is
s^2 = [19 × 2.1^2 + 19 × 1.9^2] / (20 + 20 − 2) = 4.01,
and so s = 2.0. Since df = 20 + 20 − 2 = 38, from the Student's t table we have
t(38)(0.05) ≈ t(40)(0.05) = 2.02.
Therefore, the 95% confidence interval for Δ is
−61.0 ± 2.02 × 2.0 × √(20^2/20 + 30^2/20) = −61.0 ± 32.57 = (−93.57, −28.43).
Therefore we reject (at level α = 0.05) the hypothesis that both alternatives are equally
convenient. It appears that it would be more convenient to acquire 30 small machines.
8.4 Exercises
8.4.1 Exercise Set A
Problem 8.1 Given that n1 = 15, x̄ = 20, Σ(xi − x̄)^2 = 28, and n2 = 12, ȳ = 17, Σ(yi − ȳ)^2 = 22:
(a) Calculate the pooled variance s^2.
(b) Determine a 95% confidence interval for μ1 − μ2.
(c) Test H0: μ1 = μ2 with α = .05.
Problem 8.2 The time for a worker to repair an electrical instrument is a normally distributed
N(μ, σ^2) random variable measured in hours, where both μ and σ^2 are unknown.
The repair times for 10 such instruments chosen at random are as follows:
212, 234, 222, 140, 280, 260, 180, 168, 330, 250
(1) Calculate the sample mean and the sample variance of the 10 observations.
(2) Construct a 95% confidence interval for μ.
(3) Suppose the worker claims that his average repair time for the instrument is no more
than 200 hours. Test if his claim conforms with the data, that is, test whether there is
sufficient evidence to dispute the worker's claim.
Problem 8.3 (Hypothetical) The effectiveness of two STAT251/241 labs which were
conducted by two TAs is compared. A group of 24 students with rather similar backgrounds
was randomly divided into two labs and each group was taught by a different TA. Their test
scores at the end of the semester show the following characteristics:
n1 = 13, x̄ = 74.5, sx^2 = 82.6
and
n2 = 11, ȳ = 71.8, sy^2 = 112.6.
Assuming underlying normal distributions with σ1^2 = σ2^2, find a 95 percent confidence interval
for μ1 − μ2. Are the two labs different? Summarize the assumptions you used for your analysis.
Problem 8.4 Two machines (called A and B in this problem) are compared. Machine A
costs $3000 and machine B costs $4500. One machine of each type was operated during 30
days and the daily outputs were recorded. The results are summarized below:
Machine A: x̄A = 200 kg, sA = 5.1 kg.
Machine B: x̄B = 270 kg, sB = 4.9 kg.
Is there statistical evidence indicating that one of these machines has better output/cost
performance than the other? Use α = 0.05.
Problem 8.5 The average biological oxygen demand (BOD) at a certain experimental station has to be estimated. From measurements at other similar stations we know that the
variance of BOD samples is about 8.0 (mg/liter)2 . How many observations should we sample
if we want to be 90 percent confident that the true mean is within 1 mg/liter of our sample
average? (Hint: Using CLT, we may assume the sample average has approximately normal
distribution).
Problem 8.6 An automobile manufacturer recommends that any purchaser of one of its new
cars bring it in to a dealer for a 3000-mile checkup. The company wishes to know whether
the true average mileage for initial servicing differs from 3000. A random sample of 50 recent
purchasers resulted in a sample average mileage of 3208 and a sample standard deviation
of 273 miles. Does the data strongly suggest that true average mileage for this checkup is
something other than the recommended value?
Problem 8.7 The following data were obtained on mercury residues in birds' breast muscles:
Mallard ducks: m = 16, x̄ = 6.13, s1 = 2.40
Blue-winged teals: n = 17, ȳ = 6.46, s2 = 1.73
Construct a 95% confidence interval for the difference between the true average mercury residues
μ1, μ2 in these two types of birds in the region of interest. Does your confidence interval
indicate that μ1 = μ2 at a 95% confidence level?
Problem 8.8 A manufacturer of a certain type of glue claims that his glue can withstand
230 units of pressure. To test this claim, a sample of size 24 is taken. The sample mean is
191.2 units and the sample standard deviation is 21.3 units.
(a) Propose a statistical model to test this claim and test the manufacturer's claim.
(b) What is the highest claim that the manufacturer can make without rejection of this
claim?
8.4.2 Exercise Set B
Problem 8.9 Suppose that Y1, . . . , Yn are a sample, that is, they are independent, identically
distributed, with common mean μ and common variance σ^2. Recall that the sample variance
is equal to
S^2 = Σ_{i=1}^{n} (Yi − Ȳ)^2 / (n − 1).
(a) Show that
Σ_{i=1}^{n} (Yi − Ȳ)^2 = Σ_{i=1}^{n} (Yi − μ)^2 − n(Ȳ − μ)^2.
Problem 8.13 In order to process a certain chemical product, a company is considering the
convenience of acquiring (for approximately the same price) either 100 large machines or 200
small ones. One important consideration is the average daily processing capacity (in hundreds
of pounds).
One machine of each type was tested for a period of 10 days, yielding the following results:
Large Machine: x̄1 = 120, s1 = 1.5
Small Machine: x̄2 = 65, s2 = 1.6
Model the data and identify the parameter of main interest. Construct a 95% confidence
interval for this parameter. What is your recommendation to management?
Problem 8.14 A study is made to see if increasing the substrate concentration has an
appreciable effect on the velocity of a chemical reaction. With a substrate concentration of 1.5
moles per liter, the reaction was run 15 times with an average velocity of 7.5 micromoles per
30 minutes and a standard deviation of 1.5. With a substrate concentration of 2.0 moles per
liter, 12 runs were made yielding an average velocity of 8.8 micromoles per 30 minutes and a
sample standard deviation of 1.2. Would you say that the increase in substrate concentration
increases the mean velocity by as much as 0.5 micromoles per 30 minutes? Use a 0.01 level
of significance and assume the populations to be approximately normally distributed with
equal variances.

Employee   Before Training   After Training   Difference
1              14.6              10.6             4.0
2              17.5              15.4             2.1
3              13.5              13.2             0.3
4              13.9              12.2             1.7
5              15.0              11.7             3.3
6              20.5              18.6             1.9
7              14.4              10.3             4.1
8              14.6              10.3             4.3
9              17.9              10.4             7.5
10             16.7              16.8            -0.1
11             14.7              14.6             0.1
12             17.3              14.6             2.7
13             11.7              10.5             1.2
14             13.7              10.9             2.8
15             16.8              11.8             5.0
16             15.7              13.4             2.3
17             15.7              13.6             2.1
18             16.7              16.7             0.0
19             15.5              16.7            -1.2
20             17.2              13.8             3.4
Problem 8.15 (Hypothetical) A study was made to estimate the difference in annual salaries
of professors at the University of British Columbia (UBC) and the University of Toronto (UT). A
random sample of 100 professors at UBC showed an average salary of $46,000 with a standard
deviation of $12,000. A random sample of 200 professors at UT showed an average salary of
$51,000 with a standard deviation of $14,000. Test the hypothesis that the average salary
for professors teaching at UBC differs from the average salary for professors teaching at UT
by $5,000.
Problem 8.16 A UBC student spends, on average, $8.00 for a Saturday evening gathering
in a pub. A random sample of 12 students attending a homecoming party showed an
average expenditure of $8.90 with a standard deviation of $1.75. Could you say that attending
a homecoming party costs students more than gathering in a pub?
Problem 8.17 The following data represent the running times of films produced by two
different motion-picture companies.
Times (minutes)
Company I:  103 94 110 87 98
Company II: 97 82 123 92 175 88 118
Compute a 90% confidence interval for the difference between the average running times of
films produced by the two companies. Do the films produced by Company II run longer than
those by Company I?
Problem 8.18 It is required to compare the effect of two dyes on cotton fibers. A random
sample of 10 pieces of yarn was chosen; 5 pieces were treated with dye A, and 5 with dye B.
The results were
Dye A: 4 5 8 8 10
Dye B: 6 2 9 4 5
(a) Test the significance of the difference between the two dyes. (Assume normality, common
variance, and significance level α = 0.05.)
(b) How big a sample do you estimate would be needed to detect a difference equal to 0.5
with probability 99%?
Chapter 9
Simulation Studies
9.1 Monte Carlo Simulation
Suppose that we wish to calculate the integral
I = ∫_0^1 g(t) dt.
Suppose that g is such that this integral cannot be easily integrated and we need to
approximate it by numerical means. For simplicity suppose that 0 ≤ g(t) ≤ 1 for all 0 ≤ t ≤ 1.
If we are dealing with a function h(t) which is not between 0 and 1 but we know that
a ≤ h(t) ≤ b, for all 0 ≤ t ≤ 1,
then we can work with the function
g(t) = (h(t) − a)/(b − a),
since
∫_0^1 h(t) dt = (b − a) ∫_0^1 g(t) dt + a.
Suppose that we want to estimate I with an error smaller than d = 0.01, with probability
equal to 0.99. In other words, if Î is the estimate of I, we require that
P{|Î − I| < 0.01} = 0.99.
Notice that
I = ∫_0^1 g(t) dt = E[g(U)],
where U is a random variable with uniform distribution on the interval (0, 1). If we generate
n independent random variables
U1, U2, . . . , Un
with uniform distribution on (0, 1), then by the Central Limit Theorem
Î = (1/n) Σ_{i=1}^n g(Ui) = ḡ(U)
is approximately normal with mean
E[g(U)] = ∫_0^1 g(t) dt = I
and variance σ^2/n, where
σ^2 = Var[g(U)] = ∫_0^1 g^2(t) dt − I^2.
Now,
P{|Î − I| < 0.01} = P{√n |Î − I|/σ < √n (0.01)/σ} ≈ P{|Z| < √n (0.01)/σ} = 2Φ[√n (0.01)/σ] − 1.
But,
2Φ[√n (0.01)/σ] − 1 = 0.99 ⟹ Φ[√n (0.01)/σ] = 0.995 ⟹ √n (0.01)/σ = 2.58 ⟹ n = σ^2 (2.58)^2/(0.01)^2.
Since 0 ≤ g(t) ≤ 1, we have g^2(t) ≤ g(t) and therefore σ^2 = ∫_0^1 g^2(t) dt − I^2 ≤ I − I^2 = I(1 − I).
Finally, since I(1 − I) reaches its maximum at I = 0.5, it follows that I(1 − I) ≤ 0.25 for all
I, and so a conservative estimate for n is
n = σ^2 (2.58)^2/(0.01)^2 ≤ (0.25)(2.58)^2/(0.01)^2 = 16,641.
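A minimal sketch of the Monte Carlo estimator just described; the integrand exp{−t^2} is an illustrative choice (it also appears in Problem 9.1) and is not part of the derivation above:

import numpy as np

rng = np.random.default_rng(1)

def g(t):
    # an illustrative integrand with 0 <= g(t) <= 1 on [0, 1]
    return np.exp(-t**2)

n = 16_641                                    # the conservative sample size derived above
u = rng.uniform(size=n)                       # U1, ..., Un ~ Uniform(0, 1)
I_hat = g(u).mean()                           # Monte Carlo estimate of the integral

print(I_hat)                                  # close to 0.7468, the true value of this integral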
The same method can be used to estimate a more general integral
J = ∫_a^b f(t) dt,     (9.1)
where f(t) takes values between c and d. That is, the domain of integration can be any given
bounded interval, [a, b], and the function can take values on any given bounded interval [c, d].
For example, we may wish to estimate the integral
J = ∫_1^3 exp{t} dt.
In this case the domain of integration is [1, 3] and the function ranges over the interval
[2.7183, 20.086].
In order to estimate J, first we must make the change of variables
u = (t − a)/(b − a),
to obtain
J = (b − a) ∫_0^1 f[(b − a)u + a] du = ∫_0^1 g(u) du,
where
g(u) = (b − a) f[(b − a)u + a].
150
In our example,

J = ∫_1^3 e^t dt = ∫_0^1 g(u) du,   with   g(u) = 2 exp{2u + 1}.

The second step is to linearly modify the function g(u) so that the resulting function, h(u),
takes values between 0 and 1. That is,

h(u) = [g(u) − (b − a)c] / [(b − a)(d − c)].
Finally,

J = ∫_0^1 g(u) du = (b − a)(d − c) [∫_0^1 h(u) du + c/(d − c)] = (b − a)(d − c) I + (b − a) c,

where I is of the desired form (that is, the integral between 0 and 1 of a function that takes
values between 0 and 1).
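The whole recipe can be sketched in a few lines of code (a minimal example, assuming Python
with NumPy; the integrand f(t) = e^t and the bounds c = e, d = e³ follow the example above):

import numpy as np

# Monte Carlo estimate of J = integral of f over [a, b], with c <= f <= d.
a, b = 1.0, 3.0
c, d = np.exp(1.0), np.exp(3.0)
f = np.exp

def h(u):
    # g(u) = (b - a) f((b - a) u + a), then rescaled to take values in [0, 1]
    g = (b - a) * f((b - a) * u + a)
    return (g - (b - a) * c) / ((b - a) * (d - c))

rng = np.random.default_rng(seed=0)
n = 100_000
I_hat = h(rng.uniform(size=n)).mean()               # integral of h over (0, 1)
J_hat = (b - a) * (d - c) * I_hat + (b - a) * c     # undo the rescaling
print(f"J_hat = {J_hat:.3f}, exact = {np.exp(3) - np.exp(1):.3f}")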
9.2 Exercises

Problem 9.1 Use the Monte Carlo integration method with n = 1500 to approximate the
following integrals.
(a)
I = ∫ exp{x²} dx.
What is the (approximate) probability that the approximation error is less than d = 0.05?
Less than d = 0.01?
(b)
I = ∫_1^2 exp{x²} dx.
(c)
I = ∫_0^{π/2} . . . dx.
Chapter 10

10.1 An Example
Table 10.1: Strength measurements and summary statistics for the five drying methods (A–E)

47.90  44.56  37.69  52.58  61.93  58.91  82.31  44.81
39.72  26.72  47.95  50.41  37.79  40.47  63.39  42.85
51.82  46.36  40.98  44.68  34.66  29.13  52.01  62.18
57.79  51.37  52.72  46.83  50.21  44.30  47.48  41.21
61.38  54.60  48.51  60.26  64.97  52.76  42.12  30.64

Method     A      B      C      D      E
Mean     45.05  52.29  54.29  56.83  41.15
SD        7.50   8.70   6.32  10.40   8.16
The data can be represented by the model

y_ij = μ_i + ε_ij,    i = 1, . . . , k,   j = 1, . . . , n_i.

The first subscript, i, ranges from 1 to k, where k is the number of populations being
compared, usually called treatments. In our example we are comparing five drying
methods, therefore k = 5. The second subscript, j, ranges from 1 to n_i, where n_i is the number
of measurements for each treatment. In our example, n_1 = n_2 = . . . = n_5 = 20.
The unknown parameters μ_i represent the treatment averages. Differences among the μ_i's
account for the part of the variability observed in the data that is due to differences among
the treatments being compared in the experiment.
The random variables ε_ij account for the additional variability that is caused by other
factors not explicitly considered in the experiment (different batches of raw material, different
mixing times, measurement errors, etc.). The best we can hope regarding the global effect
of these uncontrolled factors is that it will average out, so that they do not unduly enhance
or worsen the performance of any treatment.
An important technique that can be used to achieve this averaging out is called
randomization. The experimental units available for the experiment (the 100 concrete
cylinders in the case of our example) must be randomly assigned to the different treatments,
so that each experimental unit has, in principle, the same chance of being assigned to any
treatment. One practical way of doing this in the case of our example is to number the
cylinders from 1 to 100 and then to draw (without replacement) groups of 20 numbers. The
units with numbers in the first group are assigned to treatment A, the units with numbers
in the second group are assigned to treatment B, and so on. The actual labeling of the
treatments as A, B, etc. can also be randomly decided.
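Such a random assignment is easy to carry out with a computer. The following is a minimal
sketch (assuming Python with NumPy; the unit count and group sizes follow the example):

import numpy as np

# Randomly assign the 100 numbered units to treatments A-E, 20 units each,
# by drawing groups of 20 labels without replacement.
rng = np.random.default_rng(seed=0)
units = rng.permutation(np.arange(1, 101))           # units 1..100 in random order
assignment = {label: sorted(units[20 * i: 20 * (i + 1)])
              for i, label in enumerate("ABCDE")}    # consecutive blocks of 20
for label, group in assignment.items():
    print(label, group)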
The model assumptions are:
(1) E(ε_ij) = 0;
(2) the ε_ij are independent random variables;
(3) Var(ε_ij) = σ², the same for all treatments;
(4) the ε_ij are normally distributed.
These assumptions can be summarized by saying that the variables ε_ij are iid N(0, σ²).
(a) The QQ plots of Figure 10.1 (a)–(e) suggest that assumption (4) is consistent with the
data. Figure 10.1 (f) displays the boxplots for the combined data (first from the left) and
for each drying method. The variability within the samples seems roughly constant (the boxes
are of approximately equal size). This suggests that assumption (3) is also consistent with
the data.
Figure 10.1: Normal QQ plots (empirical quantiles versus normal quantiles) for drying methods A–E, panels (a)–(e), and boxplots of the combined data and of each method, panel (f).
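Diagnostic plots of this kind can be produced with standard software. A minimal sketch
follows (assuming Python with NumPy, SciPy and matplotlib; random placeholder data stand
in for the measurements of Table 10.1 so that the sketch runs on its own):

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Placeholder samples for the five drying methods (20 observations each).
rng = np.random.default_rng(seed=0)
data = {m: rng.normal(loc=50, scale=8, size=20) for m in "ABCDE"}

fig, axes = plt.subplots(2, 3, figsize=(10, 6))
for ax, (method, y) in zip(axes.flat, data.items()):
    stats.probplot(y, dist="norm", plot=ax)          # normal QQ plot (assumption 4)
    ax.set_title(f"Drying Method {method}")

# Boxplots of the combined data and of each method (assumption 3).
axes[1, 2].boxplot([np.concatenate(list(data.values()))] + list(data.values()),
                   labels=["All", "A", "B", "C", "D", "E"])
axes[1, 2].set_title("Boxplots")
plt.tight_layout()
plt.show()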
(b) For each treatment we compute the treatment total, the treatment mean and the
treatment standard deviation,

y_i. = Σ_{j=1}^{n_i} y_ij,

ȳ_i. = y_i./n_i = (y_i1 + · · · + y_in_i)/n_i,   the i-th treatment mean,

s_i = √[ Σ_{j=1}^{n_i} (y_ij − ȳ_i.)² / (n_i − 1) ] = √[ ((y_i1 − ȳ_i.)² + · · · + (y_in_i − ȳ_i.)²) / (n_i − 1) ],

as well as the grand total and the grand mean,

y_.. = Σ_{i=1}^{k} y_i. = Σ_{i=1}^{k} Σ_{j=1}^{n_i} y_ij   and   ȳ_.. = y_../n = Σ_{i=1}^{k} y_i. / Σ_{i=1}^{k} n_i,

where n = n_1 + · · · + n_k is the total number of observations. In our example,

ȳ_1. = 45.05,   ȳ_2. = 52.29,   ȳ_3. = 54.29,   ȳ_4. = 56.83,   ȳ_5. = 41.15,
s_1 = 7.50,   s_2 = 8.70,   s_3 = 6.32,   s_4 = 10.40,   s_5 = 8.16,

Σ_{i=1}^{k} y_i. = 4992.2,   and   ȳ_.. = 4992.2/100 = 49.92.
It is not difficult to show that the ȳ_i. are unbiased estimates for the unknown parameters
μ_i. In fact, the reader can easily verify that

E(Ȳ_i.) = μ_i   and   Var(Ȳ_i.) = σ²/n_i,    for i = 1, . . . , k.

Analogously, it is not difficult to verify (see Problem 8.9) that S_1², S_2², . . . , S_k² are k different
unbiased estimates for the common variance σ²:

E(S_i²) = σ²,    for i = 1, . . . , k.

These k estimates can be combined to obtain an unbiased estimate for σ². The reader is
encouraged to verify that the combined estimate

S² = Σ_{i=1}^{k} (n_i − 1) S_i² / (n − k)

is also unbiased and has a variance smaller than that of the individual S_i²'s.
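As a quick numerical check, the combined estimate can be computed directly from the five
sample standard deviations reported above (a minimal sketch, assuming Python with NumPy):

import numpy as np

# Pooled estimate of the common variance from the k = 5 sample SDs,
# each based on n_i = 20 observations.
s = np.array([7.50, 8.70, 6.32, 10.40, 8.16])
n_i = np.full(5, 20)
n, k = n_i.sum(), len(n_i)
s2_pooled = np.sum((n_i - 1) * s ** 2) / (n - k)     # sum (n_i - 1) S_i^2 / (n - k)
print(f"pooled variance = {s2_pooled:.2f}")           # about 69.3

The result, roughly 69.3, agrees with the mean square for error (MSe) that appears in the
ANOVA table below.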
(c) Roughly speaking, one can answer this question positively if there is evidence that a
substantial part of the variability in the data is due to differences among the treatments.
The total variability observed in the data is represented by the total sum of squares,

SST = Σ_{i=1}^{k} Σ_{j=1}^{n_i} [y_ij − ȳ_..]² = Σ_{i=1}^{k} Σ_{j=1}^{n_i} y_ij² − y_..²/n = Σ_{i=1}^{k} Σ_{j=1}^{n_i} y_ij² − [Σ_{i=1}^{k} Σ_{j=1}^{n_i} y_ij]²/n.
We will now show that the total sum of squares, SST, can be expressed as the sum of two
terms, the error sum of squares, SSe, and the treatment sum of squares, SSt. That is,

SST = SSe + SSt,        (10.1)

where

SSe = Σ_{i=1}^{k} Σ_{j=1}^{n_i} [y_ij − ȳ_i.]² = Σ_{i=1}^{k} Σ_{j=1}^{n_i} y_ij² − Σ_{i=1}^{k} y_i.²/n_i

and

SSt = Σ_{i=1}^{k} n_i [ȳ_i. − ȳ_..]².
The first term on the right-hand side of equation (10.1), SSe, represents the differences
between items in the same treatment, or within-treatment variability (this source of
variability is also called intra-group variability). The second term, SSt, represents the
differences between items from different treatments, or between-groups variability (this
source of variability is also called inter-group variability).
To prove equation (10.1) we add and subtract ȳ_i. and expand the square to obtain

Σ_{i=1}^{k} Σ_{j=1}^{n_i} [y_ij − ȳ_..]² = Σ_{i=1}^{k} Σ_{j=1}^{n_i} [(y_ij − ȳ_i.) + (ȳ_i. − ȳ_..)]²
  = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (y_ij − ȳ_i.)² + Σ_{i=1}^{k} n_i (ȳ_i. − ȳ_..)² + 2 Σ_{i=1}^{k} Σ_{j=1}^{n_i} (y_ij − ȳ_i.)(ȳ_i. − ȳ_..)
  = SSe + SSt + 2 Σ_{i=1}^{k} (ȳ_i. − ȳ_..) Σ_{j=1}^{n_i} (y_ij − ȳ_i.)
  = SSe + SSt + 2 Σ_{i=1}^{k} (ȳ_i. − ȳ_..)[n_i ȳ_i. − n_i ȳ_i.]
  = SSe + SSt.
In our example,

SST = 259273.7 − (4992.2)²/100 = 10049.11,   SSe = 6587.75,

and

SSt = SST − SSe = 3461.36.
Degrees of Freedom
The sums of squares cannot be compared directly. They must first be divided by their
respective degrees of freedom.
Since we use n squares and only one estimated parameter in the calculation of SST, we
conclude that

df(SST) = n − 1.

Since there are n squares and k estimated parameters (the k treatment means) in the
calculation of SSe, we conclude that

df(SSe) = n − k.

The degrees of freedom for SSt are obtained by difference:

df(SSt) = df(SST) − df(SSe) = (n − 1) − (n − k) = k − 1.
ANALYSIS OF VARIANCE
All the calculations made so far can be summarized in a table called the analysis of
variance (ANOVA) table.

Table 10.2: ANOVA TABLE

Source            Sum of Squares    df    Mean Squares       F
Drying Methods         3461.36       4          865.25   12.45
Error                  6587.75      95           69.34
Total                 10049.11      99
(c) To answer question (c) we must compare the variability due to the treatments with the
variability due to other sources. In other words, we must find out if the treatment effect
is strong enough to stand out above the noise caused by other sources of variability.
To do so, the ratio

F = MSt / MSe

is compared with the value F_α[df(MSt), df(MSe)] from the F-table attached at the end of
these notes. In our case

F = 865.25/69.34 = 12.45

and

F_{0.05}(4, 95) ≈ F_{0.05}(4, 60) = 2.53.

Since F > F_{0.05}(4, 95) we conclude that there are statistically significant differences among the
drying methods.
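The same F statistic can be recovered from the treatment means and standard deviations
alone. The sketch below (assuming Python with NumPy and SciPy; the summary statistics are
those of the example) also reports the p-value of the test:

import numpy as np
from scipy import stats

# One-way ANOVA computed from summary statistics (k = 5 groups, n_i = 20 each).
means = np.array([45.05, 52.29, 54.29, 56.83, 41.15])
sds   = np.array([7.50, 8.70, 6.32, 10.40, 8.16])
n_i   = np.full(5, 20)
n, k  = n_i.sum(), len(n_i)

grand_mean = np.sum(n_i * means) / n
SSt = np.sum(n_i * (means - grand_mean) ** 2)        # between-treatments SS
SSe = np.sum((n_i - 1) * sds ** 2)                   # within-treatments SS
F = (SSt / (k - 1)) / (SSe / (n - k))
p_value = stats.f.sf(F, k - 1, n - k)                # upper tail of F(k-1, n-k)
print(f"F = {F:.2f}, p-value = {p_value:.2e}")        # F is about 12.5

With the raw data available, scipy.stats.f_oneway gives the same test directly.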
(d) To answer question (d) we must perform multiple comparisons of the treatment means.
It is intuitively clear that if the number of treatments is large, and therefore the total number
of comparisons of pairs of means,

K = C(k, 2) = k(k − 1)/2,

is very large, there will be a greater chance that some of the 95% confidence intervals will fail
to include the value zero, even if all the μ_i were the same. For example, K = 3 when k = 3,
K = 6 when k = 4 and K = 10 when k = 5.
To compensate for the fact that the probability of declaring two means different when
they are not is larger than the significance level α = 0.05 used for each comparison, we must
use the smaller significance level, α*, given by

α* = 0.05/K.

Each individual confidence interval is constructed so that it has probability 1 − α* of
including the true treatment mean difference. It can be shown that this procedure (called
Bonferroni multiple comparisons) is conservative: if all the treatment means are equal,

μ_1 = μ_2 = . . . = μ_k,

then the probability that one or more of these intervals do not include the true difference, 0,
is at most α.
The procedure to compute the simultaneous confidence intervals is as follows. In the first
place, we must find the appropriate value, t_(n−k)(α*) = t_(n−k)(α/K), from the Student's t table
(see Table 7.1). As before, the number of degrees of freedom corresponds to that of the MSe,
that is, df = n − k.
The second step is to determine the standard deviation of the difference of treatment
means, Ȳ_i. − Ȳ_m.. It is easy to see that

Var(Ȳ_i. − Ȳ_m.) = σ² (1/n_i + 1/n_m).

Therefore,

estimated SD(Ȳ_i. − Ȳ_m.) = √[ MSe (1/n_i + 1/n_m) ].
In the case of our example k = 5 and therefore K = 10. The observed differences between
the 10 pairs of treatment (sample) means are given in Table 10.3.

Table 10.3: MULTIPLE COMPARISONS

Treatments   d_{i,m}   Observed Difference   Significance
A-B           7.56           -7.24
A-C           7.56           -9.24                *
A-D           7.56          -11.78                *
A-E           7.56            3.90
B-C           7.56           -2.00
B-D           7.56           -4.54
B-E           7.56           11.14                *
C-D           7.56           -2.54
C-E           7.56           13.14                *
D-E           7.56           15.68                *

Here

d_{i,m} = t_(n−k)(α*) √[ MSe (1/n_i + 1/n_m) ] = t_(95)(0.005) √[ 69.34 (2/20) ] = 7.56.
The differences marked with a star, *, in Table 10.3 are statistically significant. For example,
the * on the line A-C, together with the fact that the sign of the difference is negative, is
interpreted as evidence that method A is worse (less strong) than method C. The conclusions
from Table 10.3 are: methods A and E are not significantly different from each other and appear
to be significantly worse than the others. Observe that, although method A is not significantly
worse than method B (at the current level α = 0.05), their difference, 7.24, is almost significant
(fairly close to 7.56).
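The whole Bonferroni procedure is easy to automate. A minimal sketch follows (assuming
Python with NumPy and SciPy; note that the document's t-table convention t_(n−k)(α*)
corresponds to the two-sided critical value, hence the 1 − α*/2 quantile below):

import numpy as np
from itertools import combinations
from scipy import stats

# Bonferroni multiple comparisons for the drying-methods example.
means = dict(zip("ABCDE", [45.05, 52.29, 54.29, 56.83, 41.15]))
MSe, n_i, n, k = 69.34, 20, 100, 5
alpha = 0.05
K = k * (k - 1) // 2                                  # number of pairwise comparisons

t_crit = stats.t.ppf(1 - alpha / (2 * K), df=n - k)   # about 2.87 for df = 95
d = t_crit * np.sqrt(MSe * (2 / n_i))                 # common half-width, about 7.56

for a, b in combinations("ABCDE", 2):
    diff = means[a] - means[b]
    flag = "*" if abs(diff) > d else ""
    print(f"{a}-{b}: difference = {diff:6.2f}, d = {d:.2f} {flag}")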
10.2 Exercises

10.2.1 Exercise Set A
Problem 10.1 Three different methods are used to transport milk from a farm to a dairy
plant. Their daily costs (in $100) are given in the following:

Method 1: 8.10  4.40  6.00  7.00
Method 2: 6.60  8.60  7.35
Method 3: 12.00  11.20  13.30  10.55  11.50

(1) Calculate the sample mean and sample variance for the cost of each method.
(2) Calculate the grand mean and the pooled variance for the costs of the three methods.
(3) Test for differences among the costs of the three methods.
Problem 10.2 Six samples of each of four types of cereal grain grown in a certain region were
analyzed to determine thiamin content, resulting in the following data (micrograms/gram):

Wheat:  5.2  4.5  6.0  6.1  6.7  5.8
Barley: 6.5  8.0  6.1  7.5  5.9  5.6
Maize:  5.8  4.7  6.4  4.9  6.0  5.2
Oats:   8.3  6.1  7.8  7.0  5.5  7.2

Carry out the analysis of variance for the given data. Do the data suggest that at least two
of the four different grains differ with respect to true average thiamin content? Use α = 0.05.
Problem 10.3 A psychologist is studying the effectiveness of three methods of reducing
smoking. He wants to determine whether the mean reduction in the number of cigarettes
smoked daily differs from one method to another among male patients. Twelve men are
included in the experiment. Each smoked 60 cigarettes per day before the treatment. Four
randomly chosen members of the group pursue method I; four pursue method II; and so on.
The results are as follows (Table 10.4):

Table 10.4:
Method I    52  51  51  52
Method II   41  40  39  40
Method III  49  47  45  47

(a) Use a one-way analysis of variance to test whether the mean reduction in the number
of cigarettes smoked daily is equal for the three methods. (Let the significance level equal 0.05.)
(b) Use confidence intervals to determine which method results in the largest reduction in
smoking.
10.2.2 Exercise Set B
Problem 10.4 For best production of certain molds, the furnaces need to heat quickly up to
a temperature of 1500°F. Four furnaces were tested several times to determine the times (in
minutes) they took to reach 1500°F, starting from room temperature, yielding the results in
Table 10.5. Are the furnaces' average heating times different? If so, which is the fastest? The
slowest?
Table 10.5:

Furnace   n_i    x̄_i     s_i
   1       15   14.21   0.52
   2       15   13.11   0.47
   3       10   15.17   0.60
   4       10   12.42   0.43
Problem 10.5 Three specific brands of alkaline batteries are tested under heavy loading
conditions. Given in Table 10.6 are the times, in hours, that 10 batteries of each brand
functioned before running out of power. Use analysis of variance to determine whether the
battery brands take significantly different times to completely discharge. If the discharge
times are significantly different (at the 0.05 level of significance), determine which battery
brands differ from one another. Specify and check the model assumptions.
Table 10.6:

Battery Type   Discharge times (hours)
     1         5.60  5.43  4.83  4.22  5.78  5.22  4.35  3.63  5.02  5.17
     2         5.38  6.63  4.60  2.31  4.55  2.93  3.90  3.47  4.25  7.35
     3         6.40  5.91  6.56  6.64  5.59  4.93  6.30  6.77  5.29  5.18
Problem 10.6 Five different copper-silver alloys are being considered for the conducting
material in large coaxial cables, for which conductivity is a very important material
characteristic. Because of differing availabilities of the five kinds, it was impossible to make as
many samples from alloys 2 and 3 as from the other alloys. Given in Table 10.7 are the coded
conductivity measurements from samples of wire made from each of the alloys. Determine
whether the alloys have significantly different conductivities. If the conductivities are
significantly different (at α = 0.05), determine which alloys differ from one another. Specify
and check the model assumptions.
Table 10.7:

                       Alloy
   1        2        3        4        5
 60.60    58.88    62.90    60.72    57.93
 58.93    59.43    63.63    60.41    59.85
 58.40    59.30    62.33    59.60    61.06
 58.63    56.97    63.27    59.27    57.31
 60.64    59.02    61.25    59.79    61.28
 59.05    58.59    62.67    62.35    59.68
 59.93    60.19    61.29    60.26    57.82
 60.82    57.99    60.77    60.53    59.29
 58.77    59.24             58.91    58.65
 59.11    57.38             58.55    61.96
 61.40                      61.20    57.96
 59.00                      59.73    59.42
                            60.12    59.40
                            60.49    60.30
                                     60.15

Problem 10.7 Show that

E(Ȳ_i.) = μ_i,   i = 1, . . . , k,

E(S_i²) = σ²,   i = 1, . . . , k,
and

E(MSe) = σ².

Is the variance of MSe smaller than the variance of the individual S_i²? Why?
Problem 10.8 To study the correlation between solar insolation and wind speed in
the United States, 26 National Weather Service stations used three different types of solar
collectors (2D Tracking, NS Tracking and EW Tracking) to collect the solar insolation and wind
speed data. An engineer wishes to compare whether these three collectors give significantly
different measurements of wind speed. The values of wind speed corresponding to attainment
of 95% integrated insolation are reported in Table 10.8.
Are there statistically significant differences in measurement among the three different
apertures? Specify and check the model assumptions.
Table 10.8:

Station No.  Site                   Latitude   2D Tracking   NS Tracking   EW Tracking
     1       Brownsville, Texas      25.900        11.0          11.0          11.0
     2       Apalachicola, Fla.      29.733         7.9           7.9           8.0
     3       Miami, Fla.             25.800         8.7           8.6           8.7
     4       Santa Maria, Calif.     34.900         9.6           9.7           9.5
     5       Ft. Worth, Texas        32.833        10.8          10.7          10.9
     6       Lake Charles, La.       30.217         8.5           8.4           8.6
     7       Phoenix, Ariz.          33.433         6.6           6.6           6.5
     8       El Paso, Texas          31.800        10.3          10.3          10.3
     9       Charleston, S.C.        32.900         9.2           9.1           9.2
    10       Fresno, Calif.          36.767         6.2           6.3           6.1
    11       Albuquerque, N.M.       35.050         9.0           9.0           8.9
    12       Nashville, Tenn.        36.117         7.7           7.6           7.7
    13       Cape Hatteras, N.C.     35.267         9.2           9.2           9.3
    14       Ely, Nev.               39.283        10.0          10.1          10.1
    15       Dodge City, Kan.        37.767        12.0          11.9          12.0
    16       Columbia, Mo.           38.967         9.0           8.9           9.1
    17       Washington, D.C.        38.833         9.3           9.1           9.5
    18       Medford, Ore.           42.367         6.8           6.9           6.5
    19       Omaha, Neb.             41.367        10.4          10.3          10.5
    20       Madison, Wis.           43.133         9.5           9.5           9.6
    21       New York, N.Y.          40.783        10.4          10.3          10.4
    22       Boston, Mass.           42.350        11.4          11.2          11.4
    23       Seattle, Wash.          47.450         9.0           9.0           9.1
    24       Great Falls, Mont.      47.483        12.9          12.6          13.0
    25       Bismarck, N.D.          46.767        10.8          10.7          10.8
    26       Caribou, Me.            46.867        11.4          11.3          11.5
Chapter 11

11.1 An Example

Table 11.1 displays, for the i-th metal bar, the bar diameter, x_i (in 1/8 of an inch), the
elastic limit, y_i (in 100 psi), and the ultimate strength, z_i (in 100 psi).
We will investigate the relationship between the variables x_i and y_i. The relationship
between the variables x_i and z_i can be investigated in an analogous way (see Problem 11.2).
First of all we notice that the roles of y_i and x_i are different. Reasonably, one must assume
that the elastic limit, y_i, of the i-th metal bar is somehow determined (or influenced) by the
diameter, x_i, of the bar. Consequently, the variable y_i can be considered as a dependent or
response variable and the variable x_i can be considered as an independent or explanatory
variable.
A quick look at Figure 11.1 (a) will show that there is not an exact (deterministic)
relationship between x_i and y_i. For example, bars with the same diameter (3, say) have
different elastic limits (436.82, 449.40 and 412.63). However, the plot of y_i versus x_i shows
that, in general, larger values of x_i are associated with smaller values of y_i.
In cases like this we say that the variables are statistically related, in the sense that the
average elastic limit is a decreasing function, f(x_i), of the diameter.
Table 11.1:

Unit   Bar Diameter        Elastic Limit   Ultimate Strength
       (1/8 of an inch)    (100 psi)       (100 psi)
  1          3               436.82            683.65
  2          3               449.40            678.48
  3          3               412.63            681.41
  4          4               425.00            672.29
  5          4               419.71            673.26
  6          4               415.74            671.31
  7          4               422.94            674.42
  8          5               407.76            646.44
  9          5               416.84            654.32
 10          5               388.39            649.31
 11          5               416.25            654.24
 12          5               384.35            644.20
 13          5               412.91            640.15
 14          6               379.64            627.52
 15          6               371.11            621.45
 16          6               369.34            626.11
 17          6               384.91            632.73
 18          7               362.89            601.73
 19          7               361.14            605.12
 20          7               356.06            604.17
 21          8               328.59            568.11
 22          8               321.64            576.69
 23          8               321.14            570.47
 24          9               297.28            538.99
 25          9               286.04            537.11
 26          9               291.99            537.44
 27         10               231.15            502.76
 28         10               249.13            498.88
 29         10               249.81            495.17
 30         10               251.22            499.21
 31         11               200.76            455.28
 32         11               216.99            460.75
 33         11               210.26            460.96
 34         12               162.30            411.13
 35         12               167.63            410.74
Each elastic limit measurement, y_i, can be viewed as a particular value of the random
variable Y_i, which in turn can be expressed as the sum of two terms, f(x_i) and ε_i. That is,

Y_i = f(x_i) + ε_i,    i = 1, . . . , 35.        (11.1)

It is usually assumed that the random variables ε_i satisfy the following assumptions:
(1) E(ε_i) = 0;
(2) the ε_i are independent;
(3) Var(ε_i) = σ², the same for all i;
(4) the ε_i are normally distributed.
These assumptions can be summarized by simply saying that the variables Y_i are independent
normal random variables with

E(Y_i) = f(x_i)   and   Var(Y_i) = σ².
The model (11.1) above is called linear if the function f(x_i) can be expressed in the form

f(x_i) = β_0 + β_1 g(x_i),

where the function g(x) is completely specified, and β_0 and β_1 are (usually unknown)
parameters.
Figure 11.1: (a) Elastic limit versus diameter, with the fitted line from the first (linear in x) fit; (b) residuals from the first fit versus diameter; (c) elastic limit versus diameter, with the fitted curve from the second (linear in x²) fit; (d) residuals from the second fit versus diameter.
The linear model,

Y_i = β_0 + β_1 g(x_i) + ε_i,    i = 1, . . . , 35,        (11.2)

is very flexible, as many possible mean response functions, f(x_i), satisfy the linear form
given above. For example, the functions

f(x_i) = 5.0 + 4.2 x_i    (β_0 = 5 and β_1 = 4.2, with g(x) = x)

and

f(x_i) = β_0 + 3 sin(2x_i)    (β_0 unspecified and β_1 = 3, with g(x) = sin(2x))

are both of the linear form. On the other hand, a mean response function such as

f(x_i) = exp{β_1 x_i} / (1 + exp{β_1 x_i})

is not linear in the parameters.
The shape assumed for f(x_i) is sometimes suggested by scientific or physical considerations.
In other cases, as in the present example, the shape of f(x_i) is suggested by the data
itself. The plot of y_i versus x_i (see Figure 11.1) indicates that, at least in principle, the simple
linear mean response function

f(x_i) = β_0 + β_1 x_i,    that is, g(x_i) = x_i,

may be appropriate. In other words, to begin our investigation we will use the tentative
working assumption that, on average, the elastic limit of the metal bars is a linear function
(β_0 + β_1 x_i) of their diameters.
Of course, the values of β_0 and β_1 are unknown and must be empirically determined, that
is, estimated from the data. One popular method for estimating these parameters is the
method of least squares. Given tentative values b_0 and b_1 for β_0 and β_1, respectively,
the regression residuals

r_i(b_0, b_1) = y_i − b_0 − b_1 x_i,    i = 1, . . . , n,

measure the vertical distances between the observed values, y_i, and the tentatively estimated
mean response function, b_0 + b_1 x_i.
The method of least squares consists of finding the values β̂_0 and β̂_1 of b_0 and b_1, respectively,
which minimize the sum of the squares of the residuals. It is expected that, because of this
minimization property, the corresponding mean response function,

f̂(x_i) = β̂_0 + β̂_1 x_i,

will be close to, or will fit, the data points. In other words, the least squares estimates β̂_0
and β̂_1 are the solution to the minimization problem:
min_{b_0, b_1} Σ_{i=1}^{n} r_i²(b_0, b_1) = min_{b_0, b_1} Σ_{i=1}^{n} [y_i − b_0 − b_1 x_i]².

To solve this problem we differentiate the sum of squares

S(b_0, b_1) = Σ_{i=1}^{n} r_i²(b_0, b_1)

with respect to b_0 and b_1, and set these derivatives equal to zero to obtain the so-called LS
equations:

∂S(b_0, b_1)/∂b_0 = −2 Σ_{i=1}^{n} [y_i − b_0 − b_1 x_i] = 0,

∂S(b_0, b_1)/∂b_1 = −2 Σ_{i=1}^{n} [y_i − b_0 − b_1 x_i] x_i = 0,

that is,

Σ_{i=1}^{n} y_i − n b_0 − b_1 Σ_{i=1}^{n} x_i = 0,

Σ_{i=1}^{n} y_i x_i − b_0 Σ_{i=1}^{n} x_i − b_1 Σ_{i=1}^{n} x_i² = 0,

or equivalently,

ȳ − b_0 − b_1 x̄ = 0,        (11.3)

x̄y − b_0 x̄ − b_1 x̄x = 0,        (11.4)

where

x̄y = (1/n) Σ_{i=1}^{n} y_i x_i   and   x̄x = (1/n) Σ_{i=1}^{n} x_i².
From equation (11.3),

b_0 = ȳ − b_1 x̄.        (11.5)

Substituting (11.5) into (11.4) and solving for b_1 gives the least squares estimates

β̂_1 = (x̄y − x̄ ȳ) / (x̄x − x̄ x̄)   and   β̂_0 = ȳ − β̂_1 x̄.

In the case of our numerical example we have

x̄ = 7.086,   ȳ = 336.565,   x̄y = 2162.353   and   x̄x = 57.657.

Therefore,

β̂_1 = [2162.353 − (7.086)(336.565)] / [57.657 − (7.086)²] = −29.86

and

β̂_0 = 336.565 − (−29.86)(7.086) = 548.15.
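These estimates are easy to reproduce. A minimal sketch follows (assuming Python with
NumPy; the summary quantities are those quoted above, and np.polyfit applied to the raw
data of Table 11.1 would give essentially the same values):

import numpy as np

# Least squares estimates for the straight-line fit, from the sample averages.
x_bar, y_bar = 7.086, 336.565
xy_bar, xx_bar = 2162.353, 57.657

beta1 = (xy_bar - x_bar * y_bar) / (xx_bar - x_bar ** 2)
beta0 = y_bar - beta1 * x_bar
print(f"beta1 = {beta1:.2f}, beta0 = {beta0:.2f}")    # about -29.9 and 548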
The plot of f̂(x) versus x (solid line in Figure 11.1 (a)) and the plot of the regression
residuals,

e_i = y_i − β̂_0 − β̂_1 x_i,

versus x_i (Figure 11.1 (b)) can be used to assess the adequacy of the fit. If the specified mean
response function were correct, one would expect the plot of the e_i versus the x_i not to show
any particular pattern. In other words, if the specified mean response function is correct, the
estimated mean response function f̂(x_i) should extract most of the signal (systematic
behavior) contained in the data, and the residuals, e_i, should behave as patternless random
noise. In our case, however, the residual plot shows a systematic pattern, indicating that the
straight-line mean response function is not adequate.
Now that the tentatively specified simple transformation

g(x) = x

for the explanatory variable, x, is considered to be incorrect, the next step in the analysis is
to specify a new transformation. We will try the mean response function

f(x) = β_0 + β_1 x²,    that is, g(x) = x².

Notice that if the mean elastic limit of the bars is a function of the bar surface,

π (diameter)²/4 · (1/8 inch)² = x² (π/4) (1/8 inch)²,

then the newly proposed mean response function will be appropriate. To simplify the notation
we will write

w_i = x_i²

to represent the squared diameter of the i-th metal bar.
The new estimates for β_0 and β_1 are

β̂_1 = (w̄y − ȳ w̄) / (w̄w − w̄ w̄) = −2.022

and

β̂_0 = ȳ − β̂_1 w̄ = 453.125.

The plot for this new fit (solid line in Figure 11.1 (c)) and the residuals plot (Figure 11.1
(d)) indicate that this second fit is appropriate.
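The second fit can be reproduced directly from the raw data of Table 11.1. A minimal sketch
(assuming Python with NumPy; np.polyfit returns the slope and intercept of the regression
of y on w = x²):

import numpy as np

# Quadratic-in-diameter fit for the elastic limit data of Table 11.1.
x = np.array([3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7,
              7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12], float)
y = np.array([436.82, 449.40, 412.63, 425.00, 419.71, 415.74, 422.94, 407.76,
              416.84, 388.39, 416.25, 384.35, 412.91, 379.64, 371.11, 369.34,
              384.91, 362.89, 361.14, 356.06, 328.59, 321.64, 321.14, 297.28,
              286.04, 291.99, 231.15, 249.13, 249.81, 251.22, 200.76, 216.99,
              210.26, 162.30, 167.63])

w = x ** 2
beta1, beta0 = np.polyfit(w, y, deg=1)                # fit y = beta0 + beta1 * w
residuals = y - (beta0 + beta1 * w)
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}")     # about 453.1 and -2.02
print(f"residual mean square = {np.sum(residuals ** 2) / (len(y) - 2):.2f}")

Plotting the residuals against the diameter (as in Figure 11.1 (d)) is then a one-line call to
matplotlib.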
It can be shown that, under the model assumptions, the least squares estimates are unbiased,

E(β̂_0) = β_0   and   E(β̂_1) = β_1,

and that

Var(β̂_0) = σ² [ 1/n + w̄² / Σ_{i=1}^{n} (w_i − w̄)² ]   and   Var(β̂_1) = σ² / Σ_{i=1}^{n} (w_i − w̄)².

Finally, it can be shown that under the model,

E( Σ_{i=1}^{n} [Y_i − β̂_0 − β̂_1 w_i]² ) = (n − 2) σ²,

so that

s² = Σ_{i=1}^{n} [Y_i − β̂_0 − β̂_1 w_i]² / (n − 2)

is an unbiased estimate of σ².
Therefore,

SD(β̂_0) = s √[ 1/n + w̄² / Σ_{i=1}^{n} (w_i − w̄)² ] = s √[ 1/n + w̄² / (n[w̄w − w̄ w̄]) ]

and

SD(β̂_1) = s / √[ Σ_{i=1}^{n} (w_i − w̄)² ] = s / √( n[w̄w − w̄ w̄] ).

In the case of our example (with s² = 86.53, n = 35, w̄ = 57.657 and w̄w = 4993.429),

SD(β̂_0) = 1.60

and

SD(β̂_1) = √( 86.53 / (35[4993.429 − 57.657²]) ) = 0.0385.
Confidence Intervals

95% confidence intervals for the model parameters, β_0 and β_1, and also for the mean
response, f(x), can now be easily obtained. First we derive the 95% confidence intervals for
β_0 and β_1. As before, the intervals are of the form

β̂_0 ± d_0   and   β̂_1 ± d_1,

where

d_0 = t_(n−2)(α) SD(β̂_0)   and   d_1 = t_(n−2)(α) SD(β̂_1).

In the case of our example, n − 2 = 35 − 2 = 33, t_(33)(0.05) ≈ t_(30)(0.05) = 2.04, and so

d_0 = (2.04)(1.60) = 3.26   and   d_1 = (2.04)(0.0385) = 0.0785.
The 95% confidence intervals for β_0 and β_1 are therefore

453.125 ± 3.26   and   −2.021 ± 0.0785,

respectively.
Notice that, since the confidence interval for β_1 doesn't include the value zero, we conclude
that there is a linear decreasing relationship between the square of the bar diameter and its
elastic limit. When the bar surface increases by one unit (1/64 inch²), the average elastic limit
decreases by approximately two hundred psi.
Finally, we can also construct a 95% confidence interval for the average response, f(x), at
any given value of x. It can be shown that the variance of f̂(w) is

Var(f̂(w)) = σ² [ 1/n + (w − w̄)² / (n[w̄w − w̄ w̄]) ],

so that

SD(f̂(w)) = s √[ 1/n + (w − w̄)² / (n[w̄w − w̄ w̄]) ] = √(86.53) √[ 1/35 + (w − 57.657)² / (35[4993.429 − 57.657²]) ].

For example, for bars of diameter x = 8 (so that w = x² = 64),

SD(f̂(64)) = √(86.53) √[ 1/35 + (64.0 − 57.657)² / (35[4993.429 − 57.657²]) ] = 1.59,

and the corresponding 95% confidence interval for the mean elastic limit, f̂(64) ± (2.04)(1.59), is

323.72 ± 3.24.
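These standard errors and intervals can be computed from the summary quantities above. A
minimal sketch (assuming Python with NumPy and SciPy; the exact t quantile is used instead
of the rounded table value 2.04):

import numpy as np
from scipy import stats

# Standard errors and 95% confidence intervals for the quadratic fit.
n = 35
w_bar, ww_bar = 57.657, 4993.429
beta0, beta1 = 453.125, -2.021
s2 = 86.53                                            # residual mean square

Sww = n * (ww_bar - w_bar ** 2)                       # sum of (w_i - w_bar)^2
se_beta1 = np.sqrt(s2 / Sww)
t_crit = stats.t.ppf(0.975, df=n - 2)                 # two-sided 95% critical value

def mean_response_ci(w):
    """95% CI for the mean response f(w) = beta0 + beta1 * w."""
    se = np.sqrt(s2 * (1 / n + (w - w_bar) ** 2 / Sww))
    fit = beta0 + beta1 * w
    return fit - t_crit * se, fit + t_crit * se

print(f"beta1 = {beta1} +/- {t_crit * se_beta1:.4f}")  # about -2.021 +/- 0.078
print("95% CI for f(64):", mean_response_ci(64.0))     # about 323.7 +/- 3.2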
11.2 Exercises
Problem 11.1 The number of hours needed by twenty employees to complete a certain task
has been measured before and after they participated in a special training program. The
data are displayed in Table 7.2. Notice that these data have already been partially studied in
Problem 7.12. Investigate the relationship between the before-training and after-training
times using linear regression. State your conclusions.
Problem 11.2 Investigate the relationship between the bar diameter and the ultimate
strength shown in Table 11.1. State your conclusions.
Problem 11.3 Table 11.2 reports the yearly worldwide frequency of earthquakes with
magnitude 6 or greater, from January 1953 to December 1965.
(a) Make scatter-plots of the frequencies against the magnitudes and of the log-frequencies
against the magnitudes.
(b) Propose your regression model and estimate the coefficients of your model.
(c) Test the null hypothesis that the slope is equal to zero.
Table 11.2:

Magnitude  Frequency     Magnitude  Frequency
   6.0        2750          7.4         57
   6.1        1929          7.5         45
   6.2        1755          7.6         31
   6.3        1405          7.7         23
   6.4        1154          7.8         18
   6.5         920          7.9         13
   6.6         634          8.0          9
   6.7         487          8.1          7
   6.8         376          8.2          7
   6.9         276          8.3          4
   7.0         213          8.4          2
   7.1         141          8.5          2
   7.2         110          8.6          1
   7.3          85          8.7          1
Problem 11.4 In a certain type of test specimen, the normal stress on a specimen is known
to be functionally related to the shear resistance. The following is a set of experimental data
on the two variables.

x, normal stress   y, shear resistance
      26.8                26.5
      25.4                27.3
      28.9                24.2
      23.6                27.1
      27.7                23.6
      23.9                25.9
      24.7                26.3
      28.1                22.5
      26.9                21.7
      27.4                21.4
      22.6                25.8
      25.6                24.9
Chapter 12
Appendix
12.1 Appendix A: Tables

This appendix includes five tables: a normal table, a t-distribution table, an F-distribution
table, a cumulative Poisson distribution table and a cumulative binomial distribution table.