Coursenotes Aug2012
Coursenotes Aug2012
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
5
5
8
11
13
17
19
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
27
27
29
32
34
35
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3 Probability
41
3.1 Sets and Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2 Conditional Probability and Independence . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4 Random Variables and Distributions
4.1 Definition and Notation . . . . . . . . . . . . . . . .
4.2 Discrete Random Variables . . . . . . . . . . . . . .
4.3 Continuous Random Variables . . . . . . . . . . . . .
4.4 Summarizing the Main Features of f (x) . . . . . . .
4.5 Sum and Average of Independent Random Variables
4.6 Max and Min of Independent Random Variables . .
4.6.1 The Maximum . . . . . . . . . . . . . . . . .
4.6.2 The Minimum . . . . . . . . . . . . . . . . .
4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . .
4.7.1 Exercise Set A . . . . . . . . . . . . . . . . .
4.7.2 Exercise Set B . . . . . . . . . . . . . . . . .
1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
61
61
62
63
67
74
77
78
80
82
82
84
2
5 Normal Distribution
5.1 Definition and Properties
5.2 Checking Normality . . .
5.3 Exercises . . . . . . . . .
5.3.1 Exercise Set A . .
5.3.2 Exercise Set B . .
CONTENTS
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
89
89
95
98
98
99
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
103
103
104
106
108
113
114
116
116
116
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
119
119
123
125
126
126
127
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
129
129
130
130
132
134
136
141
141
142
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9 Simulation Studies
147
9.1 Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
9.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
10 Comparison of several means
10.1 An example . . . . . . . . .
10.2 Exercises . . . . . . . . . .
10.2.1 Exercise Set A . . .
10.2.2 Exercise Set B . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
153
153
163
163
163
CONTENTS
CONTENTS
Chapter 1
Engineers and applied scientists are often involved with the generation and collection of data and the
retrieval of information contained in data sets. They must also communicate to dierent audiences
the results of complex numerical studies including one or more data sets.
Experience shows that data sets are often messy, dicult to grasp and hard to analyze. In this
chapter we introduce some statistical techniques and ideas which can be used to summarize and
display data.
Table 1.1: Live Load Data
Bay
A
B
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
1st
44.4
138.4
164.7
98.3
178.0
123.7
157.5
119.4
150.4
92.2
169.8
181.5
105.4
157.6
168.4
161.0
156.3
152.3
138.9
112.3
2d
130.4
236.4
110.4
154.5
108.1
185.4
62.3
74.1
137.8
54.0
168.4
147.5
133.1
164.6
173.5
132.8
128.6
169.5
101.1
135.1
3d
127.6
202.5
185.7
171.9
197.9
130.3
65.2
118.2
105.5
139.2
169.9
104.1
62.0
195.0
150.4
161.0
111.8
162.1
127.9
123.9
4th
127.7
128.7
185.0
104.8
112.0
169.2
94.4
144.4
55.2
116.7
159.6
167.4
144.9
136.3
116.4
147.1
157.6
106.6
178.3
258.9
5th
108.4
154.3
150.0
230.1
66.6
91.8
156.1
212.0
122.9
32.1
179.6
172.4
129.1
136.6
143.7
199.8
129.3
112.0
127.5
192.1
Frequency Table
5
6th
184.0
117.0
198.7
102.8
160.9
134.5
133.6
132.3
127.8
184.8
33.5
128.8
94.9
223.7
179.5
141.4
115.2
141.0
145.1
155.0
7th
139.1
125.9
144.5
156.6
106.8
153.5
101.9
136.1
180.6
127.1
193.3
138.6
147.6
134.0
84.5
178.1
73.3
110.7
53.5
122.3
8th
120.6
127.2
121.5
136.1
123.2
131.4
117.6
184.3
53.0
171.8
99.5
110.1
167.9
179.1
161.5
145.7
94.3
145.8
182.4
86.1
9th
174.1
175.6
93.2
93.8
162.5
254.0
87.6
177.2
150.1
159.6
124.3
141.1
136.7
85.7
140.5
124.8
161.9
206.1
147.9
147.0
10th
187.9
114.1
202.2
197.8
118.3
194.6
142.4
151.8
138.4
123.8
208.6
189.3
173.2
122.3
94.1
179.8
154.7
88.8
138.0
118.0
Consider the 200 measurements of the live load distribution (pounds per square foot) on ten
floors and twenty bays of a large warehouse (Table 1.1). The live load is the load supported by
the structure excluding the weight of the structure itself. Notice how hard it is to understand data
presented in this raw form. They must clearly be organized and summarized in some fashion before
their analysis can be attempted.
One way to summarize a large data set is to condense it into a frequency table (see Table 1.2).
The first step to construct a frequency table is to determine an appropriate data range, that is,
an interval that contains all the observations and that has end points close (but not necessarily
equal) to the smallest and largest data values. The second step is to determine the number k of
bins. The data range is divided into k smaller subintervals, the bins, usually taken of the same size.
Normally, the number of bins k is chosen between 7 and 15, depending on the size of the data set
with fewer bins producing simpler but less detailed tables. For example, in the case of the live load
data, the smallest and largest observations are 32.1 and 258.9, the data range is [20, 260] and there
are 12 bins of size 20. The third step is to calculate the bin mark, ci , which represents that bin.
The bin mark is the center of the bin interval (that is, one half of the sum of the bins end points).
For example, 30 = (20 + 40)/2 for the first bin in Table 1.2. The fourth step is to calculate the
bin frequencies, ni . The bin frequency is equal to the number of data points lying in that bin. Each
data point must be counted once; if a data point is equal to the end points of two successive bins,
then it is included (only) in the second. For example, a live load of 60 is included in the third bin
(see Table 1.2). The fourth step is to calculate the relative frequencies
ni
fi =
n1 + n2 + . . . + nk
and the cumulative relative frequencies
n1 + . . . + ni
Fi =
.
n1 + n2 + . . . + nk
Notice that fi 100% gives the percentage of observations in the ith bin and Fi 100% gives the percentage of observations below the end point of the ith bin. For example, from Table 1.2, 18% of the
live loads are between 140 and 160 psf , and 95% of the live loads are below 200 psf .
Table 1.2: Frequency Table
Class
2040
4060
6080
80100
100120
120140
140160
160180
180200
200220
220240
240260
ci
30
50
70
90
110
130
150
170
190
210
230
250
ni
2
5
6
15
28
47
36
32
19
5
3
2
fi
0.010
0.025
0.030
0.075
0.14
0.235
0.180
0.160
0.095
0.025
0.015
0.010
Fi
0.010
0.035
0.065
0.140
0.280
0.515
0.695
0.855
0.950
0.975
0.990
1.000
At this point it is worth comparing Table 1.1 and Table 1.2. We can quickly learn, for instance,
from Table 1.2 that only 2 live loads lie between 20 and 40, but we cannot say which they are. On
the other hand, with considerably eort, we can find out from Table 1.1 that these live loads are
32.1 and 33.5. Table 1.2 looses some information in exchange for clarity. The loss of information
and gain in clarity are proportional to the number of bins.
Histogram:
probability
0.0
The information contained in a frequency table can be graphically displayed in a picture called
histogram (see Figure 1.1). Bars with areas proportional to the bin frequencies are drawn over each
bin. Notice that in the case of bins of equal size the bar areas are proportional to the bar heights.
The histogram shows the shape or distribution of the data and permits a direct visualization of
its general characteristics including typical values, spread, shape, etc. The histogram also helps
to detect unusual observations called outliers. From Figure 1.1 we notice that the distribution of
the live load is approximately symmetric: the central bin 120 140 is the most frequent and the
frequency of the other bins decrease as we move away from this central bin.
50
100
150
200
250
class
0.0
0.0
0.10 0.20
0.30
46
50
52
54
20
40
60
0 10 20 30 40 50 60
80
48
24.81524.82024.82524.83024.83524.840
1.2
Sample Mean
Quantitative variables such as the live load are usually denoted by upper case letters X, Y , etc. The
particular measurements for these variables are denoted by the corresponding lower case letters, xi ,
yi , etc. The subscripts give the order in which the measurements have been taken. For example,
the variable live load can be represented by X and, if the measurements were made floor by floor
from the first to the tenth, from bay A to bay U, then
x1 = 44.4,
x2 = 138.4,
...
x10 = 92.2,
x200 = 118.0.
...
The sample mean x (also called sample average) of a data set or sample is defined as
x=
x1 + x2 + + xn
=
n
Pn
i=1 xi
where n represents the number of data points (observations). For the live load data (see Table 1.1)
x = 140.156 pounds per ft2 .
The sample average can also be approximately calculated from a frequency table using the formula
Pk
k
ci ni X
=
ci fi .
i=1 ni
x Pi=1
k
The approximation is better when the measurements are symmetrically distributed over each bin.
For the live load data (see Table 1.2) we have
x
= (30 0.01) + (50 0.035) + . . . + (250 0.01) = 139.8 pounds per ft2 ,
which is close to the exact value, 140.156.
Properties of the Sample Mean
Linear Transformations: If the original measurements, xi are linearly transformed to obtain new
measurements
yi = a + bxi ,
for some constants a and b, then
y = a + bx.
In fact,
y=
Pn
i=1 yi
Pn
i=1 (a
+ bxi )
na + b
Pn
i=1 xi
=a+b
Pn
i=1 xi
= a + bx.
Example 1.1 Suppose that each live load from Table 1.1 is increased by 5 kilograms and converted
to kilograms per square foot. Since one pound equals 0.4535 kilograms, the revised measurements
are yi = 5 + 0.4535xi and y = 5 + 0.4535x = 5 + 0.4535 140.2 = 68.58kg.
Sum of Variables: If new measurements zi are obtained by adding old measurements xi and yi
then
z = x + y.
In fact,
z=
Pn
i=1 zi
Pn
i=1 (xi
+ yi )
Pn
i=1 xi
+
n
Pn
i=1 yi
= x + y.
Example 1.2 Let ui and vi (i = 1, . . . , 10) represent the live loads on bays A and B. The mean
load across floors for these two bays are (see Table 1.1)
u = (44.4 + 130.4 + . . . + 187.9)/10 = 134.42
(Bay A)
(Bay B).
If wi represent the combined live loads on bays A and B (i.e. wi = ui + vi ) then the combined mean
load across floors for these two bays is
w = u + v = 134.42 + 152.01 = 286.43.
10
Least Squares: The sample mean has a nice geometric interpretation. If we represent each observation xi as a point on the real line, then the sample mean is the point which is closest to entire
collection of measurements. More precisely, let S(t) be the sum of the squared distances from each
observation xi to the point t:
S(t) =
Then S(t) S(x) for all t. To prove this write
S(t) =
=
=
n
X
n
X
n
X
n
X
(xi t)2 .
[(xi x) + (x t)]2
[(xi x)2 + (x t)2 + 2(xi x)(x t)]
(xi x)2 + n(x t)2 + 2(x t)
since
n
X
n
X
(xi x)
(xi x) = nx nx = 0
for all t.
(Static Equilibrium).
Since the Fi0 s are all equal (Fi = w, say) we have F nw = 0 and so F = nw. To achieve torque
equilibrium, the placement d of F must satisfy
dF + (x1 F1 ) + (x2 F2 ) + . . . + (xn Fn ) = 0
(Torque Equilibrium).
x1 + x2 + . . . + xn
= x.
n
F3
F5
F2
x3
x5
x
6x2
F4
F1
x4
11
x1
F
Figure 1.3: The Sample Mean As Center of Gravity
1.3
Given the measurements (or sample) x1 , x2 , . . . , xn , their sample standard deviation SD(x) is defined
as
sP
n
2
i=1 (xi x)
SD(x) = +
.
n1
The expression inside the square root is called the sample variance, and denoted Var(x). In the
case of the live load data (Table 1.1)
Var(x) = 1583.892 square pounds per ft4
and
The standard deviation can be approximately calculated from a frequency table using the formula
sP
k
i=1 (ci
x)2 ni
.
n1
The approximation is better when the observations are symmetrically distributed on each bin. For
the live load (Table 1.2) we have
SD(x) +
SD(x)
(yi y)2
=
(n 1)
(a + bxi a bx)2
(n 1)
[b(xi x)]2
= b2
(n 1)
(xi x)2
= b2 Var(x).
(n 1)
12
Example 1.3 As in Example 1.1, each live load in Table 1.1 is increased by 5 kilograms per square
foot and converted to kilograms per square foot. Since one pound equals 0.4535 kilograms, the revised
measurements are yi = 5 + 0.4535xi kilograms per square foot and so Var(y) = 0.45352 Var(x) =
0.2056623 p1583.892 = 325.747kg 2 square kilograms per ft4 . The corresponding standard deviation
is SD(y) = 325.747 = 18.048kg kilograms per square foot.
Sum of Variables: If new measurements zi are obtained by adding old measurements xi and yi
then
Var(z) = Var(x) + Var(y) + 2Cov(x, y),
where
(1.1)
Pn
i=1 (xi
x)(yi y)
,
n1
is the covariance between xi and yi . The covariance will be further discussed in the next Chapter.
The important point here is to notice that the variances of xi and yi cannot simply be added to
obtain the variance of zi .
To prove (1.1) write
Cov(x, y) =
Var(z) =
=
=
Pn
i=1 (zi
Pn
z)2
=
n1
i=1 [(xi
Pn
i=1 (xi
Pn
i=1 (xi
+ yi x y)2
=
n1
Pn
i=1 [(xi
x) + (yi y)]2
n1
x)2 +
Pn
i=1 (yi
y)2 + 2
n1
Pn
i=1 (xi
x)(yi y)
Example 1.4 As in Example 1.2 let ui and vi be the live loads on bays A and B. The variances
and covariance for these loads are (see Table 1.1 and Example 1.2)
Var(u) =
Var(v) =
Cov(u, v) =
(Bay A)
(Bay B)
(xi x)2 =
n
X
i=1
x2i nx2 =
n
X
i=1
n
X
x2i (
i=1
xi )2 /n
(1.2)
13
(xi x)(yi y) =
n
X
i=1
xi yi nx y =
n
X
i=1
n
X
xi yi (
n
X
xi )(
i=1
yi )/n.
(1.3)
i=1
n
X
(xi x)2 =
i=1
(x2i + x2 2xi x) =
n
X
i=1
Pn
i=1 xi
n
X
i=1
x2i + nx2 2x
n
X
xi .
i=1
= nx and so
n
X
xi )2 /n.
i=1
Bay A (ui )
44.4
130.4
127.6
127.7
108.4
184.0
139.1
120.6
174.1
187.9
1344.2
Bay B (vi )
138.4
236.4
202.5
128.7
154.3
117.0
125.9
127.2
175.6
114.1
1520.1
u2i
1971.36
17004.16
16281.76
16307.29
11750.56
33856.00
19348.81
14544.36
30310.81
35306.41
196681.5
vi2
19154.56
55884.96
41006.25
16563.69
23808.49
13689.00
15850.81
16179.84
30835.36
13018.81
245991.8
ui vi
6144.96
30826.56
25839.00
16434.99
16726.12
21528.00
17512.69
15340.32
30571.96
21439.39
202364.0
Example 1.5 To illustrate the use of (1.2) and (1.3), lets calculate again Var(u), Var(v) and
Cov(u, v) where ui and vi are as in Example 1.4. Using (1.2) and the totals from Table 1.3 we have
196681.5
Var(u) =
9
(1344.2)2
10
= 1777.128
245991.8
and Var(v) =
9
(1520.1)2
10
= 1657.93.
1.4
202364.0
(1344.2)(1520.1)
10
= 218.650.
The location of non-symmetric data sets may be poorly represented by the sample mean because
the sample mean is very sensitive to the presence of outliers in the data. Notice that observations
far from the center have high torque or leverage and attract the sample mean (center of gravity)
toward them. The dispersion of non-symmetric data sets may also be poorly represented by the
sample standard deviation.
14
Example 1.6 A student with an average of 94.7% (SD=2.8%) on the first 10 assignments had a
personal problem and did very poorly on the eleventh where he got zero. Calculate his current
average and standard deviation.
Solution The mean drops from 95 to
x=
(10 95) + 0
= 86.09.
11
x2i =
10
X
i=1
P10
2
i=1 (xi 95)
Therefore,
90320.56 + 02 (11 86.092 )
= 879.4191,
10
p
and the standard deviation, then, increases from 2.8 to 879.4191 = 29.66.
Var(x) =
We will see that data sets which are asymmetric or include outliers may be better summarized
using the sample quantiles defined below.
Sample Quantiles
Let 0 < p < 1 be fixed. The sample quantile of order p, Q(p), is a number with the property
that approximately p100% of the data points are smaller than it. For example, if the 0.95 quantile
for the class final grades is Q(0.95) = 85 then 95% of the students got 85 or less. If your grade is
87 then you are in the the top 5% of the class. On the other hand, if your mark were smaller than
Q(0.10) than you would be in the lowest 10% of the class.
To compute Q(p) we must follow the following steps
1 Sort the data from smallest data point, x(1) , to largest data point, x(n) , to obtain
x(1) x(2) . . . x(n) .
The ith largest data point is denoted x(i) .
2 Compute the number np + 0.5. If this number is an integer, m, then
Q(p) = x(m) .
If np + 0.5 is not an integer and m < np + 0.5 < m + 1 for some integer m then
Q(p) =
x(m) + x(m+1)
.
2
15
Example 1.7 Let ui and vi be the live loads on the first two floors (see Table 1.4). Calculate the
quantiles of order 0.25, 0.50 and 0.75 for the live load on floors 1 and 2 and for the dierences
wi = ui vi between the live loads on these two floors.
Solution
To calculate the quantile of order 0.25 for the live load on floor 1, Qu (0.25), observe that n = 20,
p = .25 and so np + .5 = 20 .25 + .5 = 5.5 is between 5 and 6. Using the column u(i) from Table
1.4 we obtain
u(5) + u(6)
112.3 + 119.4
Qu (0.25) =
=
= 115.85.
2
2
Similar calculations give Qv (0.25) = 109.25 and Qw (0.25) = 25.25. To calculate Qu (0.50) notice
that np + .5 = 20 .50 + .5 = 10.5 is between 10 and 11. Again, using the column u(i) from Table
1.4 we obtain
u(10) + u(11)
150.4 + 152.3
Qu (0.50) =
=
= 151.35.
2
2
The reader can check using similar calculations that Qv (0.50) = 134.1, Qw (0.50) = 7, Qu (0.75) =
162.85, Qv (0.75) = 166.5 and Qw (0.75) = 38.
Unfortunately, the sample quantiles do not have the same nice properties as the the sample
mean in relation with sums and dierences of variables. For example
Qu (0.50) Qv (0.50) = 151.35 134.1 = 17.25
is quite dierent from Quv (0.50) = Qw (0.50) = 7. Also
Qu (0.25) Qv (0.25) = 115.85 109.25 = 6.6 6= 25.25 = Quv (0.50)
and
Qu (0.75) Qv (0.75) = 151.35 134.1 = 17.25 6= 38 = Quv (0.75).
Median and Interquartile Range
The quantiles Q(0.25), Q(0.5) and Q(0.75) are particularly useful and given special names: lower
quartile, median and upper quartile. Notice that the lowest 25% of the data is below Q(0.25) and
the lowest 75% of the data is below Q(0.75). Because of that, Q(0.25) and Q(0.75) are also called
first and third qartiles.
The lowest 50% of the data is below Q(0.5) and the other half is above it. Therefore the median
divides the data into two equal pieces, regardless the shape of the histogram. Because of this
property and the fact that the median is not much aected by outliers, it is often used as a measure
of location (instead of the mean).
The mean and the median are equal in the case of perfectly symmetric data sets. They are also
close in the presence of mild asymmetry. But very asymmetric data sets can produce very dierent
means and medians. When the mean and the median roughly agree we will normally prefer the
mean because of its nicer numerical properties (see the comments at the end of Problem 1.7). When
they do not, however, we will normally prefer the median because of its resistance to outliers. The
dierence between the mean and the median is a strong indication of the presence outliers in the
data which are severe enough to upset the sample mean.
16
ui
44.4
138.4
164.7
98.3
178.0
123.7
157.5
119.4
150.4
92.2
169.8
181.5
105.4
157.6
168.4
161.0
156.3
152.3
138.9
112.3
138.53
34.66
u(i)
44.4
92.2
98.3
105.4
112.3
119.4
123.7
138.4
138.9
150.4
152.3
156.3
157.5
157.6
161.0
164.7
168.4
169.8
178.0
181.5
vi
130.4
236.4
110.4
154.5
108.1
185.4
62.3
74.1
137.8
54.0
168.4
147.5
133.1
164.6
173.5
132.8
128.6
169.5
101.1
135.1
135.38
43.61
v(i)
54.0
62.3
74.1
101.1
108.1
110.4
128.6
130.4
132.8
133.1
135.1
137.8
147.5
154.5
164.6
168.4
169.5
173.5
185.4
236.4
wi
-86.0
-98.0
54.3
-56.2
69.9
-61.7
95.2
45.3
12.6
38.2
1.4
34.0
-27.7
-7.0
-5.1
28.2
27.7
-17.2
37.8
-22.8
3.145
51.37
w(i)
-98.0
-86.0
-61.7
-56.2
-27.7
-22.8
-17.2
-7.0
-5.1
1.4
12.6
27.7
28.2
34.0
37.8
38.2
45.3
54.3
69.9
95.2
As a rule of thumb we will calculate both the mean and the median and use the mean if they
are similar. Otherwise we will use the median. To guide our choice we can calculate the discrepancy
index
p |Mean Median|
d= n
2 IQR
and choose the mean when d is smaller than 1. The interquartile range (IQR), used in the denominator of d above, is defined as
IQR = Q(0.75) Q(0.25),
The IQR is recommended as a measure of dispersion in the presence of outliers and lack of symmetry.
Notice that IQR is proportional to the length of the central half of the data, regardless the shape
of the histogram, and it is not much aected by outliers.
Example 1.8 Refer to Example 1.6. Calculate the median, the interquatile range and the discrepancy index d for the students marks before and after the eleventh assignment (The marks are 94,
93, 95, 91, 96, 91, 98, 93, 99, 97 and 0). just one
Solution Since the sorted marks (before the eleventh assignment) are 91, 91, 93, 93, 94, 95, 96, 97,
98, 99, Q(0.25) = x(3) = 93, Q(0.5) = (x(5) + x(6) )/2 = (94 + 95)/2 = 94.5 and Q(0.75) = x(8) = 97.
p
Therefore, Median(x) = 94.5, IQR(x) = 9793 = 4 and d = 10(94.794.5)/(24) = 0.07905694.
Including the eleventh assignment we have Q(0.25) = (x(3) + x(4) )/2 = (91 + 93)/2 = 92,
Q(0.5) = x(6) = 94 and Q(0.75) = (x(8) + x(9) )/2 = (96 + 97)/2 = 96.5. Therefore, the new median
and IQR are: Median(x) = 94 and IQR(x) = 96.5 92 = 4.5. Unlike the mean, the median is very
littlepaected by the single poor performance. This is also reflected by the large discrepancy index
d = 11(86.09 94)/(2 9) = 2.915.
17
Example 1.9 Table 1.5 gives the mean, median, standard deviation and IQR for the data sets on
Figure 1.2. The mean and median of Tobins Q ratios show appreciable dierences (d = 2.98). In
addition, their standard deviation is more than twice their IQR. Clearly, the mean and standard
deviation are upset by a few heavily overrated firms. Tobins Q ratios are then better represented
by their median and IQR. The eect of outliers and lack of symmetry is moderate in the case of the
Age of Ocers data. Although d = 1.07 the mean and standard deviation still summarize these
data well. Finally, for the Speed of Light data the two clear (lower) outliers do not seem to have
much aect on the sample mean (d = 0.64).
Table 1.5: Summary figures for the data sets displayed on Figure 1.2
Data Set
Tobins Q ratio
Age of ocers
Speed of light
1.5
Mean
158.6
51.494
24.826
Median
118.5
52
24.827
Discrepancy
2.98
1.07
0.64
S. Deviation
97.749
1.739
0.011
IQR
47.593
2.222
0.005
Box Plot
The box plot is a powerful tool to display and compare data sets. It is just a box with whiskers
which helps to visualize the main quantiles (Q(0.25), Q(0.50) and Q(0.75)) and the extreme data
points (maximum and minimum).
For the following discussion refer to Figure 1.4 (b) and (d). The lower and upper ends of
the box are determined by the lower and upper quartiles (Q(0.25) and Q(0.75)); a line sectioning
the box displays the sample median and its relative position within the interquartile range. The
median then divides the main box into two smaller subboxes which represent the lower and upper
central quarters of the data. Symmetric data sets have upper and lower subboxes of equal size.
Asymmetric data sets have subboxes of dierent sizes, the larger one indicating the direction of
the asymmetry. The data on Figure 1.4 (b) is mildly asymmetric with a longer lower tail: the lower
subbox is larger than the upper one and the lower whisker is longer than the upper one. The data
on Figure 1.4 (d) is symmetric. The location and dispersion of a data set are also clearly conveyed
by the box plot: the position of the box (and the median line) give the location; the size (length)
of the box (proportional to the IQR) gives the dispersion. Larger boxes indicate larger dispersion.
Finally, the whiskers at either end extend to the extreme values (maximum and minimum).
Points which are above Q(0.75) + 1.5IQR or below Q(0.25) 1.5IQR are considered outliers.
The following rule is used to help visualizing outliers in the data: the length of the whiskers should
not exceed 1.5IQR and points outside this range are displayed as unconnected horizontal lines. This
is illustrated by Figure 1.4 (a) and (c) where the presence of outliers is flagged by the existence of
unconnected horizontal lines above the upper whisker (Figure 1.4 (a)) or below the lower whisker
(Figure 1.4 (c)).
18
24.7624.7824.8024.8224.84
48
50
52
54
Figure 1.4: Box plots for the data sets displayed on Figure 1.2
Example 1.10 Table 2.3 gives the monthly average flow (cubic meters per second) for the Fraser
River at Hope, BC, for the period 19711990. Figure 1.5 gives the boxplots for each month, from
January to December (from left to right). The year to year distributions of the monthly flows are
mildly asymmetric, with longer upper tails, and there are some outliers. However, the location and
dispersion summaries (see Table 1.10) are roughly consistent for most months and point to the same
conclusion: the river flow, and its variability as well, are much larger in the summer.
Table 1.6: Fraser River Monthly Flow (cms)
Year
Mean
Median
SD
IQR
Jan
957.4
868.0
274.4
174.6
Feb
894.8
849.5
202.8
163.0
Mar
993.1
926.5
233.5
257.0
Apr
1941.0
2010.0
477.8
427.8
May
4994.5
5000.0
976.4
613.0
Jun
6973.0
6365.0
1434.2
1325.9
Jul
5505.0
5120.0
1212.2
1277.8
Aug
3548.0
3380.0
886.4
505.6
Sep
2340.0
2245.0
685.6
446.3
Oct
1816.0
1910.0
401.7
424.1
Nov
1588.9
1525.0
366.1
377.8
Dec
1092.4
1005.0
282.2
181.1
19
2000
4000
6000
8000
10000
1.6. EXERCISES
Figure 1.5: Fraser River monthly flow (cms) from January (left) to December (right)
1.6
Exercises
Problem 1.1 The records of a department store show the following total monthly finance charges
(in dollars) for 240 customers which accounts included finance charges (see Table 1.7). From a
department stores records for a particular month, the total monthly finance charges in dollars were
obtained from 240 customers accounts that included finance charges. See the table shown below:
(a) Complete the frequency table. What percentage of customers were charged less than $20?
9
8
2
15
18
18
5
1
10
1
10
8
5
15
2
5
10
3
9
11
20
611
599
1051
781
578
796
774
820
772
696
573
748
748
797
851
809
723
5.47
4.88
5.62
5.63
4.07
5.29
5.34
5.26
5.44
5.46
5.55
5.34
5.30
5.36
5.79
5.75
5.29
5.10
5.86
5.58
5.27
5.85
5.65
5.39
21
1.6. EXERCISES
Problem 1.6 The mean size of twenty five recent projects at a construction company (in square
meters) is 25,689 m2 . The standard deviation is 2,542 m2 .
(a) Calculate the mean, variance and standard deviation in square feet [Hint: 1 foot = 0.3048 m].
(b) A new project of 226050 f t2 has been just completed. Update the mean, variance and standard
deviation.
Problem 1.7 The daily sales in April, 1994 for two departments of a large department store (in
thousands of USA dollars) are summarized below.
Problem 1.9 Find the average, variance and standard deviation for the following sets of numbers.
a) 1, 2, 3, 4, 5, . . . , 300
b) 4, 8, 12, 16, 20, . . . , 1200
c) 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, . . . , 9, 9, 9, 9, 9, 9, 9, 9, 9
Pn
Pn 2
Pn 3
Hint:
i = n(n + 1)/2,
i = n(n + 1)(2n + 1)/6,
i = n2 (n + 1)2 /4
and
Pn 4
3
2
i = n(n + 1)(6n + 9n + n 1)/30
22
30 Bolts
0.90
0.65
0.62
0.86
0.63
0.75
0.80
1.00
4.31
3.58
3.72
3.64
3.35
3.64
3.55
4.47
frequency
9
1177
5390
4263
5034
1449
141
15
1
Problem 1.10 The number of worldwide earthquakes in 1993 is shown in the following table
(a) Complete the frequency table. What percentage of earthquakes were below 5.0? Above 6.0?
(b) Draw a histogram and comment on it.
(c) Calculate the mean and standard deviation for the earthquake magnitude in 1993.
Problem 1.11 The daily number of customers served by a fast food restaurant were recorded for
30 days including 9 weekends and 21 weekdays. The average and standard deviations are as follows:
Weekends: x1 = 389.56, SD1 = 27.4
Weekdays: x2 = 402.19, SD2 = 26.2
Calculate the average and standard deviation for the 30 days.
Problem 1.12 The average and the standard deviation for the weights of 200 small concretemix
bags (nominal weight = 50 pounds) are 51.2 pounds and 1.5 pounds, respectively. A new sample
of 200 large concretemix bags (nominal weight = 100 pounds) have just been weighed. Do you
expect that the standard deviation for the last sample will be closer to 1.5 pounds or to 3.0 pounds?
Justify your answer.
23
1.6. EXERCISES
5
X
|xi t|,
for several values of t between 1 and 20, and plot D(t) versus t. Where is the minimum achieved?
Do the same experiment for the data set x1 = 1, x2 = 3, x3 = 8, x4 = 12. Do you notice
any pattern? If so, repeat this experiment for several additional sets of numbers, to investigate the
persistence of this pattern. What is your conclusion? Can you prove it mathematically?
Problem 1.14 Each pair (xi , wi ), i = 1, , n, represents the placement and magnitude of a vertical force acting on a uniform beam. Find the center of gravity of this system. [Hint: see the
discussion under The Sample Mean as Center of Gravity and notice that in the present case the
vertical forces are not equal].
Problem 1.15 Calculate the center of gravity of the system when the placements (xi ) and weights
(wi ) are given by Table 1.12.
Table 1.12: Placements of Vertical Forces on a Uniform Beam
xi
1.8
1.4
1.3
3.8
1.2
1.9
1.2
1.1
1.1
wi
2.1
1.6
1.4
6.4
1.3
1.2
1.2
3.1
1.1
xi
1.2
1.3
1.2
1.2
1.4
1.3
1.6
1.1
1.2
wi
1.5
4.7
2.3
2.3
3.1
1.9
2.4
3.7
1.2
Problem 1.16 Each pair (xi , wi ), i = 1, , n, represents the placement and magnitude of a vertical force acting on a uniform beam. What values of wi would make the sample median the center
of gravity? Consider the cases when n is even and n odd separately.
Problem 1.17 The maximum annual flood flows for a certain river, for the period 1941-1990, are
given in Table 1.6.
(i) Summarize and display these data.
(ii) Compute the mean, median, standard deviation and interquartile range.
(iii) If a oneyear construction project is being planned and a flow of 150000 cfs or greater will halt
construction, what is the probability (based on past relative frequencies) that the construction
will be halted before the end of the project? What if it is a two-year construction project?
Problem 1.18 The planned and the actual times (in days) needed for the completion of 20 job
orders are given in Table 1.14.
(a) Calculate the average and the median planned time per order. Same for the actual time.
(b) Calculate the corresponding standard deviations and interquartile ranges.
24
Flood, cfs
153000
184000
66000
103000
123000
143000
131000
99000
137000
81000
144000
116000
11000
262000
44000
8000
199000
6000
166000
115000
88000
29000
66000
72000
37000
Year
1966
1967
1968
1969
1970
1971
1972
1973
1974
1975
1976
1977
1978
1979
1980
1981
1982
1983
1984
1985
1986
1987
1988
1989
1990
Flood, cfs
159000
75000
102000
55000
86000
39000
131000
111000
108000
49000
198000
101000
253000
239000
217000
103000
86000
187000
57000
102000
82000
58000
34000
183000
22000
(c) If there is a delay penalty of $5000 per day and a beforeschedule bonus of $2500 per day, what
is the average net loss ( negative loss = gain) due to dierences between planned and actual times?
What is the standard deviation?
(d) Study the relationship between the planned and actual times.
(e) What would be your advice to the company based on the analysis of these data?
P
25
1.6. EXERCISES
Table 1.14: The planned and the actual times
Order
1
2
3
4
5
6
7
8
9
10
Planned Time
22
11
11
16
21
12
25
20
13
34
Actual Time
22
8
8
14
20
16
29
20
10
39
Order
11
12
13
14
15
16
17
18
19
20
Planned Time
17
27
16
30
22
17
13
18
21
18
Actual Time
18
34
14
35
18
16
12
14
19
17
Problem 1.20 The total paved area, X (in km2 ), and the time, Y (in days), needed to complete
the project was recorded for 25 dierent jobs. The data is summarized as follows:
x = 12.5 km2
y = 30.8 days
Cov(x, y) = 3.4
Give the corresponding summaries when the area is measured in ft2 and the time is measure in
hours.
Hint: 1 foot = 0.3048 m, and 1 km = 1000 m.
26
Chapter 2
2.1
Scatter Plot
June
5000 7000 9000 11000
28
10
20
30
Age
40
Flow
2000400060008000
Price
800 1200 1600 2000
6
8
Month
10
12
29
1000
2000
3000
Mar
6008001000
1400
Apr
1800
5000 7000
10001500200025003000
May
3000
600800
1200 1600
Jan
600 8001000120014001600
Feb
300040005000600070008000
2.2
The Covariance and the correlation coecient are used to quantify the degree of linear association
between pairs of variables. If two variables, xi and yi , are positively associated then when one of
them is above (below) its mean the other will also tend to be above (below) its mean. Therefore,
the products (xi x)(yi y) will be mostly positive and the sample covariance,
Cov(x, y) =
n
1 X
(xi x)(yi y)
n 1 i=1
(2.1)
will be large and positive. On the other hand, if the variables are negatively associated, when one
of them is above (below) its mean the other will tend to be below (above) its mean and so the
products (xi x)(yi y) will be mostly negative. In this case the sample covariance (2.1) will be
large and negative. Finally, if the variables are not positively nor negatively associated the products
(xi x)(yi y) will be positive and negative with approximately the same frequency (there will be
a fair degree of cancellation) and the sample covariance will be small.
The following formula provides a simple procedure for the hand calculation of the covariance:
Cov(x, y) =
=
n
1 X
(xi x)(yi y)
n 1 i=1
n
[xy x y] ,
n1
where
xy =
n
1X
xi yi
n i=1
(2.2)
Some problems with the interpretation of the covariance and its direct use as a measure of linear
association are illustrated in Example 2.1.
Example 2.1 Consider the measurements (xi , yi ) of the firstcrack and failure load (in pounds
per square foot) on Table 2.1. Figure 2.3 suggests that there little association between these measurements. Since x = 8396.6 pounds per square foot, y = 16, 064.4 pounds per square foot, and
30
Cov(x, y)
= 0.011259 million square pounds per ft4 .
1000 1000
Table 2.1: Strength of concrete beams
Unit
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Correlation Coecient
Problem 2.1 illustrates the strong dependency of Cov(x, y) on the scale of the variables. A
measure of linear association which is independent from the variables scale (see 2.5) is provided the
sample correlation coecient,
Cov(x, y)
Cov(x, y)
=
.
SD(x) SD(y)
Var(x)Var(y)
r(x, y) = p
11258.99
= 0.0026.
(2193.17)(1949.36)
31
16000
14000
Failure Load
18000
6000
8000
10000
12000
First-Crack Load
96.2759
= 0.9664 = 0.97,
77.50 128.06
indicates a strong positive linear association between temperature and yield. This is also clearly
suggested by the scatter plot in Figure 2.4. Notice that the relation between yield and tempreature
is likely to be causal, that is, the increase in yield may be actually caused by the increase in
temperature.
Several Pairs of Variables
When we have several variables their covariances and correlation coecients can be arranged in
matrix layouts called covariance matrix and correlation matrix. Although the covariance matrix is
dicult to interpret due to its dependence on the scale of the variables, it is nevertheless routinely
computed for future usage.
The correlation matrix is the numerical counterpart of the scatter plot matrix discussed before.
For the River Fraser Data (see Figure 2.2) we have
32
Temp. (X)
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
Jan
Feb
Mar
Apr
May
Yield (Y)
28
26
22
25
27
32
31
33
38
41
41
38
41
46
44
Jan
1.00
0.78
0.65
0.40
0.18
Feb
0.78
1.00
0.75
0.34
0.15
Unit
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
Mar
0.65
0.75
1.00
0.50
0.19
Temp. (X)
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
Apr
0.40
0.34
0.50
1.00
0.29
Yield (Y)
41
45
53
46
44
49
53
49
51
55
56
58
58
58
63
May
0.18
0.15
0.19
0.29
1.00
As already observed from Figure 2.2, February flaws are somewhat correlated with January and
March flaws (with correlation coecients 0.78 and 0.75, respectively). January and March flaws
are also marginally correlated (correlation coecient equal to 0.65). The correlation coecients
between all the other pairs of months are below 0.50.
2.3
The scatter plot of linearly associated variables approximately follows a linear function
f(x) = 0 + 1 x
called regression line. The hats indicate that 0 , 0 and f(x) are calculated from the data. In this
context X and Y play dierent roles and are given special names. The independent variable X is
called explanatory variable and the dependent variable Y is called response variable.
Least Squares
The solid line on Figure 2.4 (see Example 2.2) was obtained by the method of least squares (LS).
According to this method, the regression coecients (the intercept 0 and the slope 1 ) minimize
(in b0 and b1 ) the sum of squares
S(b0 , b1 ) =
n
X
i=1
(yi b0 b1 xi )2 .
33
40
30
Yield
50
20
25
30
35
40
45
50
Temperature
n
X
i=1
(yi 0 1 xi ) = 0
(Gauss Equations)
(yi 0 1 xi )xi = 0.
which are obtained by r dierencing S(b0 , b1 ) with respect to b0 and b1 . Carrying out the summations
and dividing by n we obtain,
y 0 1 x = 0
xy 0 x 1 xx = 0.
(2.3)
(2.4)
where
xy = (1/n)
n
X
xi yi
and
xx = (1/n)
i=1
n
X
x2i
i=1
From (2.3), 0 = y 1 x. Substituting this into (2.4) and solving for 1 gives
xy x y
1 =
.
xx x x
Fitted Values and Residuals
(2.5)
34
The regression line f(x) and the regression coecients 0 and 1 are good summaries for linearly
associated data. In this case the fitted value
yi = f(xi ) = 0 + 1 xi
(Fitted Value)
will be close to the observed value of yi . How close depends on the strength of the linear association. The dierences between the observed values yi and the fitted values yi ,
are called regression residuals.
ei = yi yi
(Residual),
Residual Plot
The regression residuals ei are usually plotted against the fitted values yi to determine the
appropriateness of the linear regression fit. If the data are well summarized by the regression line
(see Figure 2.5 (a)) the corresponding scatter plot of (
yi , ei ) has no systematic pattern (see Figure
2.5 (c)). Examples of bad residual plots that is, plots that indicate that the regression line is a
poor summary for the data are given on Figure 2.5 (d) and (e). The corresponding scatter plots
and linear fits are given on Figure 2.5 (b) and (c). In the case of Figure 2.5 (d), the residuals go
from positive to negative and back to positive, suggesting that the relation between X and Y may
not be linear. In the case of Figure 2.5 (e) larger fitted values have larger residuals (in absolute
value).
2.4
In practice we often use several explanatory variables to predict or interpolate the values of a
single response variable. The explanatory variables may all be distinct or may include functions
(powers) of the observed explanatory variables.
If for example, we have p explanatory variables (X1 , X2 , , Xp ) and n observations or cases,
it is convenient to use double subscript notation. The first subscript (i) indicates the case and the
second subscript (j) indicates the variable.
Case (i)
1
2
3
yn
xn1 xn2
xnp
n
X
i=1
35
2.5. EXERCISES
The least square coecients are the solution to the linear equations
n
X
i=1
n
X
i=1
n
X
i=1
(Gauss Equations)
n
X
i=1
xp y 0 xp 1 x1 xp 2 x2 xp p xp xp = 0
where
yxj = (1/n)
n
X
i=1
2.5
xij yi
and
xj xk = (1/n)
n
X
xij xik .
(2.6)
i=1
Exercises
Problem 2.1
Problem 2.2 The following data give the logarithm (base 10) of the volume occupied by algal
cells on successive days, taken over a period over which the relative growth rate was approximately
constant.
36
8000
4000
10
15
x
20
25
30
2000
40
50
40
0
20
40
60
80 100
Fitted Value
120
140
20
40
60
80
-2000
100
2000
4000 6000
Fitted Value
-1000
-100
1000
Residual
20
400
y
200
100
30
Residual
20
-20
10
100
Residual
0
8000 10000
-200
2000
50
y
6000
y
100
300
10000
150
12000
200
50
100
150
200
Fitted Value
250
300
Figure 2.5: Examples of linear regression fits (above) and their residual plots (below).
Day (x)
1
2
3
4
5
6
7
8
9
(1) Plot log y against x. Do you think using the logarithmic scale is appropriate? Why?
(2) Calculate and interpret the sample correlation coecient.
Problem 2.3 The maximum annual flood flows of a river, for the period 19491990, are given in
Table 1.6.
(i) Summarize and display these data.
(ii) Compute the mean, median, standard deviation and interquartile range.
(iii) If a oneyear construction project is being planned and a flow of 150000 cfs or greater will halt
construction, what is the relative frequency (based on past relative frequencies) that the construction
will be halted before the end of the project? What if it is a two-year construction project?
37
2.5. EXERCISES
Fitted vs Residuals
20
25
60
70
-2
-4
20
Time
70
80
10
10
15
Diameter
Residual
0
5
6
20
25
20
Residual
0
2
90
50
10
15
Diameter
20
70
80
Fitted Value
25
Diameter vs Residuals
60
25
-2
60
50
70
80
Fitted Value
15
Diameter
Fitted vs Residuals
Residual
0
2
100
90
60
Cubic Fit
25
10
15
Diameter
Diameter vs Residuals
-2
Residual
0
2
Time
70
80
60
50
10
100
90
Residual
0
2
90
80
90
Fitted Value
Fitted vs Residuals
100
Quadratic Fit
-5
-4
15
Diameter
-10
-5
-2
10
-10
50
60
Time
70
80
Diameter vs Residuals
Residual
0
5
90
10
100
Linear Fit
10
15
Diameter
20
25
Problem 2.4 The planned and the actual times (in days) needed for the completion of 20 job
orders are given in Table 1.14
(a) Calculate the average and the median planned time per order. Same for the actual time.
(b) Calculate the corresponding standard deviations and interquartile ranges.
(c) If there is a delay penalty of $5000 per day and a beforeschedule bonus of $2500 per day, what
is the average net loss ( negative loss = gain) due to dierences between planned and actual times?
What is the standard deviation?
(d) Study the relationship between the planned and actual times.
(e) What would be your advice to the company based on the analysis of these data?
Problem 2.5 (a) Show that
Cov(x, y) =
(b) Show that if ui = a + b xi
(i) u = a + bx
(ii) Var(u) = b2 Var(x)
(iii) r(u, v) = r(x, y)
(iv) (u, v) = db (x, y)
and
xi yi ) nx y
n1
and
vi = c + d yi , then
Cov(x, y)
,
Var(x)
38
Jan
855
774
984
987
797
1140
1240
881
801
684
1860
821
972
1160
740
813
1000
629
800
1210
Feb
1030
857
842
929
780
1030
1230
791
721
649
1480
927
977
1010
706
809
944
657
685
841
Mar
841
1500
850
927
736
924
1130
952
957
703
1300
844
1240
1160
801
1280
1300
809
682
926
Apr
1550
2100
1550
2320
1100
2300
2350
1960
1290
1760
1880
1010
1990
2030
2070
2090
2280
2410
1780
3000
May
6120
6450
4910
5890
3940
7070
4710
3950
4910
5120
4950
5360
4090
2870
5300
3770
5120
5450
4860
5050
Jun
7590
10800
6180
8430
6830
7250
5670
5730
6360
4900
6260
8690
6060
6370
7390
8390
5840
5940
6020
8760
Jul
5590
7330
5000
7470
6070
7670
4830
4540
4860
4010
4890
7230
5240
6580
4650
5380
4070
4430
3990
6270
Aug
3570
4120
2930
4360
3420
6440
3620
2970
2610
2720
3620
4850
3460
3780
2770
3220
2980
3010
3170
3340
Sep
2360
2280
1680
2440
2300
4460
2340
2600
1830
2600
2130
3620
2210
2920
1940
1890
1680
1890
1840
1790
Oct
1890
1940
2080
1930
1950
2510
1650
2090
1420
2080
1530
2310
1470
2560
1980
1470
1020
1540
1380
1520
Nov
1550
1500
1620
1290
2360
1800
1260
1590
918
1630
1950
1470
2050
1370
1230
1340
1210
1470
2060
2110
Dec
908
1000
1130
978
1480
1480
1030
1010
952
1900
1140
1110
878
861
746
908
811
926
1410
1190
Problem 2.6 The total paved area, X (in km2 ), and the time, Y (in days), needed to complete
the project was recorded for 25 dierent jobs. The data is summarized as follows:
x = 12.5 km2
y = 30.8 days
Cov(x, y) = 3.4 ,
r(x, y) = 0.766 ,
= 2.36
Give the corresponding summaries when the area is measured in feet2 and the time is measure in
hours.
Hint: 1 foot = 0.305 m, and 1 km = 1000 m.
Problem 2.7 Show that 1 r(x, y) 1.
Hint: One can assume without loss of generality that
x=y=0
n
X
i=1
(yi bxi )2
(why?)
39
2.5. EXERCISES
Flood, cfs
153000
184000
66000
103000
123000
143000
131000
99000
137000
81000
144000
116000
11000
262000
44000
8000
199000
6000
166000
115000
88000
29000
66000
72000
37000
Year
1966
1967
1968
1969
1970
1971
1972
1973
1974
1975
1976
1977
1978
1979
1980
1981
1982
1983
1984
1985
1986
1987
1988
1989
1990
Flood, cfs
159000
75000
102000
55000
86000
39000
131000
111000
108000
49000
198000
101000
253000
239000
217000
103000
86000
187000
57000
102000
82000
58000
34000
183000
22000
Planned Time
22
11
11
16
21
12
25
20
13
34
Actual Time
22
8
8
14
20
16
29
20
10
39
Order
11
12
13
14
15
16
17
18
19
20
Planned Time
17
27
16
30
22
17
13
18
21
18
Actual Time
18
34
14
35
18
16
12
14
19
17
40
Chapter 3
Probability
3.1
The theory of probability, which is briefly discussed below, is needed for the better understanding of some important statistical techniques. This theory is, roughly speaking, concerned with the assessment of the chances (or likelihood) that certain events will or will not
occur. In order to give a more precise (and useful) definition of probability, we need first to
introduce some technical concepts and definitions.
Random Experiment: The defining feature of a random experiment is that its outcome
cannot be determined beforehand. That is, the outcome of the random experiment will
only be known after the experiment has been completed. The next time the experiment is
performed (seemingly under the exact same conditions) the outcome may be dierent. Some
examples of random experiments are:
Sample Space (S): Although we may not be able to say beforehand what the outcome of
the random experiment will be, we should at least in principle to be able to make a complete
list of all the possible outcomes. This list (set) of all the possible outcomes is called the
sample space and denoted by S. A generic outcome (that is, element of S) is denoted by
w. The sample spaces for the random experiment listed above are:
S = {Yes, No},
S = {0, 1, 2, . . . , n} where n is the lot size,
S = [0, 1), the time (in hours) between breakdowns can be any non-negative real number.
41
42
CHAPTER 3. PROBABILITY
S = {0, 1, 2, . . .}, the number of accidents can be any non-negative integer number.
S = [0, 100], the percentage yield can be any real number between zero and one hundred.
Event: The events, usually denoted by the first upper case letters of the alphabet (A, B, C,
etc), are simply subsets of S. Most events encountered in practice are meaningful and can
be expressed either in words or using mathematical notation. Some examples (related to the
list of random experiments given above) are:
A = { less than four defectives} = {0, 1, 2, 3}.
B = { more than 200 hours} = (200, 1).
C = {2, 3, 5, 9}
()
w belongs to A
()
A occurs
w doesnt belongs to A
()
A doesnt occur
and
w 62 A
()
Probability Function (P ): Evidently, not all the events are equally likely. For instance,
the event
A = {more than three million accidents}
would appear to be quite unlikely, while the event
B = {more than three hours before the next crash}
would appear to be quite likely.
A probability function P is a function which assigns to each event a number representing
the likelihood that this event will actually occur.
For self-consistency reasons, any probability function P must satisfy the following properties:
(1) P () = 0 and P (S) = 1.
43
Example 3.1 It is known from previous experience that the probability of finding zero, one,
two, etc. defectives in lots of 100 items shipped by a certain supplier are as given in Table
2.1 below.
Let A, B and C be the events less than two defectives, more than one defective and
one or two defectives, respectively. (a) Calculate P (A), P (B) and P (C). (b) What is the
meaning (in words) of the event Ac ? Calculate P (Ac ) directly and using Property 4. (c)
What is the meaning (in words) of the event A [ C? Calculate P (A [ C) directly and using
Property 3.
Table 3.1:
Defectives
0
1
2
3
4
5
6 or more
Probability
0.50
0.20
0.15
0.10
0.03
0.02
0.00
44
CHAPTER 3. PROBABILITY
Solution
(a) From Table 2.1, P (A) = 0.70, P (B) = 0.30, and P (C) = 0.35
(b) Ac = {two or more defectives} = {more than one defective} = B, from Table 1, P (Ac ) =
P (B) = 0.30. This is consistent with the result we obtain using Property 4:
P (Ac ) = 1 P (A) = 1 0.70 = 0.30.
(c) A [ C = {less than three defectives}. Therefore, directly from Table 1, P (A [ C) = 0.85.
To make the calculation using Property 3, we must first find P (A \ C). Since A \ C =
{exactly one defective}, it follows from Table 1 that P (A \ C) = 0.20. Now,
P (A [ C) = 0.70 + 0.35 0.20 = 0.85.
2
3.2
There are instances when, after obtaining some partial information regarding the outcome
of a random experiment, one would like to update the probabilities of certain events, taking
into account the newly acquired information.
The updated probability of the event A, when it is known that the event B has occurred,
is in general denoted by P (A|B) and called the conditional probability of A given B. This
conditional probability can be calculated by the formula
P (A|B) =
P (A \ B)
P (B)
(3.1)
provided that P (B) > 0. A simple, but nevertheless important, consequence of (2) is that
P (A \ B) = P (A|B)P (B),
(3.2)
Since P (B) = 0.30 and P (D \ B) = P ({3, 4, 5}) = 0.15, the desired conditional probability
is
P (D|B) = P (D \ B)/P (B) = 0.15/0.30 = 0.50.
45
Poor design
- underestimated live load
- underestimated maximum wind speed
- etc.
A2
Poor construction
-
A3
A combination of A1 and A2 .
A4
Suppose that, from previous experience or some other source (for example some experts
opinion), the conditional probabilities of B given Ai are known. That is, the probabilities
that the event B will occur when the the cause Ai is present are known and represented by
p 1 , p2 , . . . , p m .
We will call these conditional probabilities risk factors. Suppose also that the probabilities
of each possible cause Ai are known. These probabilities are called prior probabilities and
denoted
1 , 2 , . . . , m .
In the case of our example, the prior probabilities may represent the actual fractions of
industrial buildings in the country which have some design or construction problems. Or they
may represent the subjective beliefs (educated guesses) of some expert consultant (perhaps
the engineer hired by the insurance company to investigate the causes of the accident). In
summary, we suppose that
pi = P (B|Ai ),
and
i = P (Ai ),
46
CHAPTER 3. PROBABILITY
Table 3.2:
Cause (i)
1
2
3
4
Prior Probability
0.00050
0.00010
0.00001
0.99939
(i )
Posterior Probability
0.29
0.12
0.02
0.57
The engineer hired by the insurance company to investigate the accident would certainly
wish to know where he can first start looking to find an assignable causes. More precisely, she
would wish to know what is the most likely assignable cause for the collapse of the building.
The conditional probability of each possible cause, given the fact that the event has
occurred, is called the posterior probability for this cause and can be calculated by the
famous Bayes formula
P (B|Ak )P (Ak )
P (B|A1 )P (A1 ) + P (B|A2 )P (A2 ) + . . . + P (B|Am )P (Am )
pk k
=
.
p1 1 + p2 2 + . . . + pm m
P (Ak |B) =
In the case of our example the posterior probability of the cause poor design (A1 ), for
instance, is equal to
P (A1 |B) =
(0.00050)(0.10)
(0.00050)(0.10) + (0.00010)(0.20) + (0.00001)(0.40) + (0.99939)(0.0001)
= 0.29.
The other posterior probabilities are calculated analogously and the results are displayed in
the fourth column of Table 3.2.
What did the engineer learn from the results of these (posterior probability) calculations?
In the first place she learned that the chance of finding an assignable cause is approximately
43%. Furthermore, she learned that it is best to begin looking for flaws in the design of the
building, as this cause is almost three times more likely to have caused the accident than the
other assignable causes. Finally she learned that it is highly unlikely that the collapse of the
building has been caused by more than one assignable cause.
47
P (B \ Ak )
.
P (B)
(3.3)
therefore,
P (B|Ak )P (Ak )
P (B|A1 )P (A1 ) + P (B|A2 )P (A2 ) + + P (B|Am )P (Am )
k pk
=
.
1 p1 + 2 p2 + + m pm
P (Ak |B) =
Example 3.2 A certain disease is known to aect 1% of the population. A test for the
disease has the following features: if the person is contaminated the test is positive with
probability 0.98. On the other hand, if the person is healthy, the test is negative with
probability 0.95. (a) What is the probability of a positive test when applied to a randomly
chosen subject? (b) What is the probability that an individual is aected by the disease after
testing positive? (c) Explain the connections between this problem and Bayes formula.
Solution
(a) Since B is clearly equal to the disjoint union of the events B \ C and B \ C c ,
P (B) =
=
=
=
P (B \ C) + P (B \ C c )
P (C)P (B|C) + P (C c )P (B|C c )
(0.01 0.98) + (0.99 0.05)
0.0593
48
CHAPTER 3. PROBABILITY
(b)
P (C|B) =
P (B \ C)
P (B|C)P (C)
0.98 0.01
=
=
= 0.1653
P (B)
P (B)
0.0593
Notice that the probability of having the disease, even after testing positive, is surprisingly
low (less than 0.17). Why do you think this is so?
(c) The calculation in part (a) produced the unconditional probability that the event
testing positive. This unconditional probability constitutes the denominator of Bayes
formula. If a person has been tested positive, given the characteristics of the test, this can
be caused by two possible causes: being healthy and being contaminated. The posterior
probability of the second cause is the result of part (b).
2
Independence
Roughly speaking, two events A and B are independent when the probability of any one
of them is not modified after knowing the results for the other (occurrence or not occurrence).
In other words, knowing about the occurrence or no occurrence of any one of these events
does not alter the amount of information (or uncertainty) that we initially had regarding the
other event. Quite simply then, we can say that two events are independent if they do not
carry any information regarding each other.
The formal definition of independence is somewhat surprising at first because it doesnt
make any direct reference to the events conditional probabilities. But see also the remarks
following the definition. Probabilists prefer this formal definition, because it is easy to check
and to generalize for the case of m events (m 2).
Definition: The events A and B are independent if
P (A \ B) = P (A)P (B).
Suppose that the events A and B are such that
P (A|B) = P (A).
In this case,
P (A \ B) = P (A|B)P (B) = P (A)P (B),
and the events A and B are independent according to the given definition.
On the other hand, if P (B) > 0 and A and B satisfy the given definition of independence,
then
P (A|B) =
P (A \ B)
P (A)P (B)
=
= P (A).
P (B)
P (B)
49
Example 3.3 The results of the STAT 251 midterm exam can be classified as follows:
Table 3.3:
High
Medium
Low
Male
0.05
0.30
0.30
0.65
Female
0.15
0.15
0.05
0.35
0.20
0.45
0.35
1.00
What is the meaning of the statement gender and performance are independent? Are they?
Why?
Solution
Gender and performance are (intuitively) independent if for example, knowing the score
of a randomly chosen test doesnt aect the probability that this test corresponds to a male
(0.65, from the table) or to a female (0.35). Or vice versa, knowing the gender of the student
who wrote the test doesnt modify our ability to predict its score.
Let A and B be the events a randomly chosen student is male and a randomly chosen
student has a high score, respectively. Is it true that P (A|B) = P (A)? The answer, of
course, is no because
P (A|B) = 0.05/0.20 = 0.25
and
P (A) = 0.65.
Before knowing that the score is high, the chances are almost two out of three that the
student is a male. However, after we know that the score is high, the chances are one out of
four that the student is a male. The lack of independence in this case is derived from the fact
that male students are underrepresented in the high score category and over-represented
in the low score category.
2
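The same check can be automated: independence holds exactly when every cell of the joint table equals the product of its row and column totals. A small sketch in Python, with Table 3.3 entered by hand, might look as follows:

# Joint probabilities from Table 3.3: (gender, score level) -> probability.
table = {
    ("Male",   "High"): 0.05, ("Male",   "Medium"): 0.30, ("Male",   "Low"): 0.30,
    ("Female", "High"): 0.15, ("Female", "Medium"): 0.15, ("Female", "Low"): 0.05,
}

# Marginal probabilities for gender and for score.
p_gender, p_score = {}, {}
for (g, s), p in table.items():
    p_gender[g] = p_gender.get(g, 0) + p
    p_score[s] = p_score.get(s, 0) + p

# Independence requires P(g, s) = P(g) * P(s) for every cell of the table.
independent = all(abs(p - p_gender[g] * p_score[s]) < 1e-9
                  for (g, s), p in table.items())
print(independent)  # False: e.g. P(Male, High) = 0.05 but 0.65 * 0.20 = 0.13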
If Table 3.3 above is replaced by Table 3.4
Table 3.4:
           High    Medium    Low     Total
 Male      0.13    0.29      0.23    0.65
 Female    0.07    0.16      0.12    0.35
 Total     0.20    0.45      0.35    1.00
then gender and performance are independent: up to rounding, each cell probability equals the product of the corresponding row and column totals (for instance, 0.65 × 0.20 = 0.13). In practice, independence is rarely checked directly from the definition; it is usually assumed, derived from external information regarding the physical make-up of the random experiment, as illustrated in Example 3.4 below.
Fortunately, then, we will have few occasions of checking this definition throughout this course.
Definition: The events Ai (i = 1, . . . , m) are independent if
P(Ai ∩ Aj) = P(Ai)P(Aj),   P(Ai ∩ Aj ∩ Ak) = P(Ai)P(Aj)P(Ak),   . . . ,   P(A1 ∩ A2 ∩ ⋯ ∩ Am) = P(A1)P(A2)⋯P(Am),
for every choice of distinct indices i, j, k, . . . .
Example 3.4 A certain system has four independent components {a1 , a2 , a3 , a4 }. The pairs
of components a1 , a2 and a3 , a4 are in line. This means that, for instance, the subsystem
{a1 , a2 } fails if any of its two component does; similarly for the subsystem {a3 , a4 }. The
subsystems {a1 , a2 } and {a3 , a4 } are in parallel. This means that the system works if at least
one of the two subsystems does. Calculate the probability that the system fails assuming
that the four components are independent and that each one of them can break down with
probability 0.10. How many parallel subsystems would be needed if the probability of failure
for the entire system cannot exceed 0.001?
[Figure: components a1 and a2 in line form one subsystem, and components a3 and a4 form another; the two subsystems are connected in parallel.]
Solution Let Ai be the event component ai works (i = 1, . . . , 4), and let C be the event
the system works.
P(C) = P[(A1 ∩ A2) ∪ (A3 ∩ A4)] = P(A1 ∩ A2) + P(A3 ∩ A4) − P[(A1 ∩ A2) ∩ (A3 ∩ A4)]
     = P(A1)P(A2) + P(A3)P(A4) − P(A1)P(A2)P(A3)P(A4)
     = 0.9² + 0.9² − 0.9⁴ = 0.9639
To answer the second question, just notice that the probability of working for each independent subsystem is 0.9² = 0.81. Now, if Bi (i = 1, . . . , m) is the event "the ith subsystem works", the probability that the entire system fails is
1 − P(B1 ∪ B2 ∪ ⋯ ∪ Bm) = P(B1ᶜ ∩ B2ᶜ ∩ ⋯ ∩ Bmᶜ) = P(B1ᶜ)P(B2ᶜ)⋯P(Bmᶜ) = [1 − P(B1)]ᵐ = (1 − 0.81)ᵐ,
and this must be at most 0.001. Therefore,
m log(0.19) ≤ log(0.001)   ⟹   m ≥ log(0.001)/log(0.19) = 4.16   ⟹   m = 5.
2
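Both the reliability of the two-subsystem layout and the search for the smallest adequate number of parallel subsystems can be scripted. A minimal sketch in Python, assuming (as in the example) that every component works with probability 0.9:

import math

p_component = 0.9               # each component works with probability 0.9
p_subsystem = p_component ** 2  # two components in line: both must work

def p_system_fails(m):
    # A system of m parallel subsystems fails only if all m subsystems fail.
    return (1 - p_subsystem) ** m

print(1 - p_system_fails(2))    # 0.9639, as in the example

# Smallest m with failure probability at most 0.001.
m = 1
while p_system_fails(m) > 0.001:
    m += 1
print(m)                                             # 5
print(math.log(0.001) / math.log(1 - p_subsystem))   # about 4.16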
3.3 Exercises
Problem 3.1 If A and B are independent events with P (A) = 0.2 and P (B) = 0.5, find the
following probabilities. (a) P(A ∪ B); (b) P(A ∩ B); and (c) P(Aᶜ ∩ Bᶜ).
Problem 3.2 In a certain class, 5 students obtained an A, 10 students obtained a B, 17
students obtained a C, and 6 students obtained a D. What is the probability that a randomly
chosen student receive a B? If a student receives $10 for an A, $5 for a B, $2 for a C, and $0
for a D, what is the average gain that a student will make from this course?
Problem 3.3 Consider the problem of screening for cervical cancer. The probability that a
women has the cancer is 0.0001. The screening test correctly identifies 90% of all the women
who do have the disease, but the test is false positive with probability 0.001.
(a) Find the probability that a woman actually does have cervical cancer given the test says
she does.
(b) List the four possible outcomes in the sample space.
Problem 3.4 An automobile insurance company classifies each driver as a good risk, a
medium risk, or a poor risk. Of those currently insured, 30% are good risks, 50% are medium
risks, and 20% are poor risks. In any given year the probability that a driver will have at
least one accident is 0.1 for a good risk, 0.3 for a medium risk, and 0.5 for a poor risk.
(a) What is the probability that the next customer randomly selected will have at least one
accident next year?
(b) If a randomly selected driver insured by this company had an accident this year, what is
the probability that this driver was actually a good risk?
Problem 3.5 A truth serum given to a suspect is known to be 90% reliable when the person
is guilty and 99% reliable when the person is innocent. In other words, 10% of the guilty are
judged innocent by the serum and 1% of the innocent are judged guilty. If the suspect was
selected from a group of suspects of which only 5% have ever committed a crime, and the
serum indicates that he is guilty, what is the probability that he is innocent?
Problem 3.6 70% of the light aircrafts that disappear while in flight in a certain country
are subsequently discovered. Of the aircrafts that are discovered, 60% have an emergency
locator, whereas 80% of the aircrafts not discovered do not have an emergency locator.
(a) What percentage of the aircrafts have an emergency locator?
(b) What percentage of the aircrafts with emergency locator are discovered after they disappear?
Problem 3.7 Two methods, A and B, are available for teaching a certain industrial skill.
The failure rate is 20% for A and 10% for B. However, B is more expensive and hence is
only used 30% of the time (A is used the other 70%). A worker is taught the skill by one
of the methods, but fails to learn it correctly. What is the probability that the worker was
taught by Method A?
Problem 3.8 Suppose that the numbers 1 through 10 form the sample space of a random
experiment, and assume that each number is equally likely. Define the following events: A1 ,
the number is even; A2 , the number is between 4 and 7, inclusive.
(a) Are A1 and A2 mutually exclusive events? Why?
(b) Calculate P(A1), P(A2), P(A1 ∩ A2), and P(A1 ∪ A2).
(c) Are A1 and A2 independent events? Why?
Problem 3.9 A coin is biased so that a head is twice as likely to occur as a tail. If the coin
is tossed three times,
(a) what is the sample space of the random experiment?
(b) what is the probability of getting exactly two tails?
Problem 3.10 Items in your inventory are produced at three different plants: 50 percent from plant A1, 30 percent from plant A2, and 20 percent from plant A3. You are aware that your plants produce at different levels of quality: A1 produces 5 percent defectives, A2 produces 7 percent defectives, and A3 yields 8 percent defectives. You select an item from
your inventory and it turns out to be defective. Which plant is the item most likely to have
come from? Why does knowing the item is defective decrease the probability that it has
come from plant A1 , and increase the probability that it has come from either of the other
two plants?
Problem 3.11 Calculate the reliability of the system described in the following figure. The
numbers beside each component represent the probabilities of failure for this component.
Note that the components work independently of one another.
[Figure: a network of five components; the failure probabilities are 0.05 for components 1, 2 and 4, and 0.1 for components 3 and 5.]
Problem 3.12 A system consists of two subsystems connected in series. Subsystem 1 has
two components connected in parallel. Subsystem 2 has only one component. Suppose the
three components work independently and each has probability of failure equal to 0.2. What
is the probability that the system works?
Problem 3.13 A proficiency examination for a certain skill was given to 100 employees of
a firm. Forty of the employees were male. Sixty of the employees passed the exam, in that
they scored above a preset level for satisfactory performance. The breakdown among males
and females was as follows:
Pass (P)
Fail
Suppose an employee is randomly selected from the 100 who took the examination.
(a) Find the probability that the employee passed, given that he was male.
(b) Find the probability that the employee was male, given that he passed.
(c) Are the events P and M independent?
(d) Are the events P and F independent?
Problem 3.14 Propose appropriate sample spaces for the following random experiments.
Give also two examples of events for each case.
Counting/measuring:
1 - the number of employees attending work in a certain plant
2 - the number of days with wind speed above 50 km/hour, per year, in Vancouver
3 - the number of earthquakes in BC during any given period of two years
4 - the time between two consecutive breakdowns of a computer network
5 - the number of people leaving BC per year
6 - the percentage of STAT 241/51 students obtaining final marks above 80% in any given
term
7 - the number of engineers working in BC per year
8 - the percentage of computer scientists in BC who will make more than $65, 000 in 1996
9 - the number of employees still working in a certain production plant after 4:30 PM on
Fridays.
Problem 3.15 Let A and B be the events "construction flaw due to some human error" and "construction flaw due to some mechanical problem", respectively.
1) What is the meaning (in words) of each of the following events: (a) A ∪ B, (b) A ∩ B, (c) A ∩ Bᶜ, (d) Aᶜ ∩ Bᶜ, (e) (A ∪ B)ᶜ, (f) Aᶜ ∪ Bᶜ, (g) (A ∩ B)ᶜ? Draw also the corresponding diagrams.
2) Show that in general (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ and that (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ (so the results of (f) and (g) and of (d) and (e) above were not mere coincidences).
3) Suppose that P(A) = 0.02, P(B) = 0.01 and P(A ∪ B) = 0.023. Calculate (a) P(A ∩ B), (b) P(Aᶜ ∩ Bᶜ), (c) P(A ∩ Bᶜ), (d) P(A|Bᶜ), (e) P(A|B).
Problem 3.16 A large company hires most of its employees on the basis of two tests. The
two tests have scores ranging from one to five. The following table summarizes the performance of 16,839 applicants during the last six years. From this table we learn, for example,
that 3% of the applicants got a score of 2 on Test 1 and 2 on Test 2; and that 15% of the
applicants got a score of 3 on Test 1 and 2 on Test 2. We also learn that, for example, 20%
of the applicants got a score of 2 on Test 1 and that 25% of the applicants got a score of 2
on Test 2.
A group of 1500 new applicants have been selected to take the tests.
(a) What should the cutting scores be if between 140 and 180 applicants will be shortlisted
for a job interview? Assume that the company wishes to shortlist people with the highest
possible performances on the two tests.
Table 3.5:
 Test 1 \ Test 2     1      2      3      4      5      Total
 1                   0.07   0.03   0.00   0.00   0.00   0.10
 2                   0.15   0.03   0.02   0.00   0.00   0.20
 3                   0.08   0.15   0.09   0.02   0.01   0.35
 4                   0.10   0.04   0.08   0.01   0.02   0.25
 5                   0.00   0.00   0.06   0.02   0.02   0.10
 Total               0.40   0.25   0.25   0.05   0.05   1.00
Table 3.6:
 Score     1      2      3      4      5
 Test 1    0.10   0.20   0.35   0.25   0.10
 Test 2    0.40   0.25   0.25   0.05   0.05
(b) Same as (a) but assuming now that the company wishes to hire people with the highest possible performances on at least one of the two tests.
(c) (Continued from (a)) A manager suggests that only applicants who obtain marks above a certain bottom line in one of the tests be given the other test. Noticing that giving and marking each test costs the company $55, recommend which test should be given first. Approximately how much will be saved on the basis of your advice?
(d) Repeat (a)–(c) if the two tests' performances are independent and the probabilities are given by Table 3.6.
Problem 3.17 A computer company manufactures PC compatible computers in two plants,
called Plant A and B in this exercise. These plants account for 35 % and 65 % of the
production, respectively. The company records show that 3 % of the computers manufactured
by Plant A must be repaired under the warranty. The corresponding percentage for plant B
is 2.5 %.
(a) What is the percentage of computers that are repaired under the warranty and come from
Plant A?
(b) What percentage of computers repaired under the warranty come from Plant A? From
Plant B?
Problem 3.18 Twenty per cent of the days in a certain area are rainy (there is some measurable precipitation during the day), one third of the days are sunny (no measurable precipitation, more than 4 hours of sunshine) and fifteen per cent of the days are cold (daily
average temperature for the day below 5o C).
1 - Would you use the above information as an aid in
(i) Planning your next weekend activities (assuming that you live in this area)?
(ii) Deciding whether you want to move to this area?
(iii) Choosing the type of roofing for a large building in this area?
Problem 3.24 Suppose that we wish to determine whether an uncommon but fairly costly
construction flaw is present. Suppose that in fact this flaw has only probability 0.005 of
being present. A fairly simple test procedure is proposed to detect this flaw. Suppose that
the probabilities of being correctly positive and negative for this test are 0.98 and 0.94,
respectively.
1) Calculate the probability that the test will indicate the presence of a flaw.
2) Calculate the posterior probability that there is no flaw given that the test has indicated
that there is one. Comment on the implications of this result.
Problem 3.25 One method that can be used to distinguish between granite (G) and basalt (B) rocks is to examine a portion of the infrared spectrum of the sun's energy reflected from the rock surface. Let R1, R2 and R3 denote measured spectrum intensities at three different wavelengths. Normally, R1 < R2 < R3 would be consistent with granite and R3 < R1 < R2 would be consistent with basalt. However, when the measurements are made remotely (e.g. using aircraft) several orderings of the Ri's can arise. Flights over regions of known composition have shown that granite rocks produce
(R1 < R2 < R3) 60% of the time,
(R1 < R3 < R2) 25% of the time, and
(R3 < R1 < R2) 15% of the time.
On the other hand, basalt rocks produce these orderings of the spectrum intensities with
probabilities 0.10, 0.20 and 0.70, respectively. Suppose that for a randomly selected rock
from a certain region we have P (G) = 0.25 and P (B) = 0.75.
1) Calculate P (G|R1 < R2 < R3 ) and P (B|R1 < R2 < R3 ). If the measurements for a given
rock produce the ordering R1 < R2 < R3 , how would you classify this rock?
2) Same as 1) for the case R1 < R3 < R2
3) Same as 1) for the case R3 < R1 < R2
4) If one uses the classification rule determined in 1) 2) and 3), what is the probability of
a classification error (that a G rock is classified as a B rock or a B rock is classified as a G
rock)?
Problem 3.26 Messages are transmitted as a sequence of zeros and ones. Transmission errors occur independently, with probability 0.001. A message of 3500 bits will be transmitted.
(a) What is the probability that there will be no errors? What is the probability that there
will be more than one error?
(b) If the same message will be transmitted twice and those bits that do not agree will be
revised (and therefore these detected transmission errors will be corrected), what is the
probability that there will be no reception errors?
Problem 3.27 Suppose that the events A, B and C are independent. Show that,
(a) Aᶜ and Bᶜ are independent.
Table 3.7:
                Low Salary    Medium Salary    High Salary    Total
 Low GPA        0.10          0.08             0.02           0.20
 Medium GPA     0.07          0.46             0.07           0.60
 High GPA       0.03          0.06             0.11           0.20
 Total          0.20          0.60             0.20           1.00
1) Calculate P(Bi ∪ Cj), i = 1, 2, 3 and j = 1, 2, 3.
2) What is the meaning (in words), and the probability, of the event
A = (B1 ∩ C1) ∪ (B2 ∩ C2) ∪ (B3 ∩ C3)?
3) Are salary and GPA independent? Why?
4) Construct a table with the same marginals (same probabilities for the six categories) but
with salary and GPA being independent.
Problem 3.30 Consider the system of components connected as follows. There are two
subsystems connected in parallel. Components 1 and 2 constitute the first subsystem and are
connected in parallel (so that this subsystem works if either component works). Components
3 and 4 constitute the second subsystem and are connected in series (so that this subsystem
works if and only if both components do). If the components work independently of one
another and each component works with probability 0.85, (a) calculate the probability that
the system works. (b) calculate this probability if the two subsystems are connected in series.
Problem 3.31 Calculate the reliability of the system described in the following figure. The
numbers beside each component represent the probabilities of failure for this component.
[Figure: a reliability network of seven components; three components have failure probability 0.05 and four have failure probability 0.01.]
Chapter 4
Random Variables and Distributions
4.1 Definition and Notation
{X ≤ x} = {w : X(w) ≤ x},
where the set A = (−∞, x]. Additional examples (related to Example 5 above) are
{X = 0} = {(N, N, N, N )}
and
{X ≤ 1} = {(N, N, N, N), (D, N, N, N), (N, D, N, N), (N, N, D, N), (N, N, N, D)}.
4.2 Discrete Random Variables
Discrete random variables are mainly used in relation to counting situations; for example, the number of defective items found when several items are tested (see Example 4.1, continued below) is a discrete random variable.
The defining feature of a discrete random variable is that its range (the set of all its
possible values) is finite or countable. The values in the range are often integer numbers, but
they dont need to be so. For instance, a random variable taking the values zero, one half
and one with probabilities 0.5, 0.25 and 0.25 respectively is considered discrete.
The probability density function (or in short, the density), f (x), of a discrete random
variable X is defined as
f (x) = P (X = x),
That is, f (x) gives the probability of each possible value x of X. It obviously has the following
properties:
(1) f(x) ≥ 0 for all x in the range R of X;
(2) Σ_{x∈R} f(x) = 1;
(3) P(X ∈ A) = Σ_{x∈A} f(x) for any set A of possible values.
The (cumulative) distribution function of X is defined as
F(x) = P(X ≤ x) = Σ_{k ≤ x} f(k).
In many engineering applications one works with 1 − F(x) instead of F(x). Notice that 1 − F(x) = P(X > x) and therefore gives the probability that X will exceed the value x.
Example 4.1 (continued): Suppose that the items are independent and each one can be
defective with probability p. The density and distribution of the random variable (r.v.) X =
number of defectives can then be derived as follows:
f(0) = P(X = 0) = P({N, N, N, N}) = (1 − p)(1 − p)(1 − p)(1 − p) = (1 − p)⁴
f(1) = P(X = 1) = P({D, N, N, N}, {N, D, N, N}, {N, N, D, N}, {N, N, N, D})
     = p(1 − p)(1 − p)(1 − p) + (1 − p)p(1 − p)(1 − p) + (1 − p)(1 − p)p(1 − p) + (1 − p)(1 − p)(1 − p)p = 4(1 − p)³p.
In a similar way we can find that
f(2) = 6(1 − p)²p²,   f(3) = 4(1 − p)p³   and   f(4) = p⁴.
The values of the density and distribution functions of X, for the cases p = 0.40 and p = 0.80
are given in Table 4.1. A comparison of the density functions shows that smaller values of
X (0, 1 and 2) are more likely when p = 0.4 (why?) and that higher values (3 and 4) are
more likely when p = 0.8. Also notice that the distribution function for the case p = 0.8 is
uniformly smaller. This is so because getting smaller values of X is always more likely when
p = 0.4.
Table 4.1:
       p = 0.40            p = 0.80
 x     f(x)     F(x)       f(x)     F(x)
 0     0.1296   0.1296     0.0016   0.0016
 1     0.3456   0.4752     0.0256   0.0272
 2     0.3456   0.8208     0.1536   0.1808
 3     0.1536   0.9744     0.4096   0.5904
 4     0.0256   1.0000     0.4096   1.0000

4.3 Continuous Random Variables
Continuous random variables are used in connection with continuous types of outcomes, as for example the measurement error when measuring the distance between the North and South shores of a river.
The typical events in these cases are bounded or unbounded intervals with probabilities
specified in terms of the integral of a continuous density function, f (x), over the desired
interval. See property (3) below.
Since the probability of all intervals must be non-negative and the probability of the entire
line should be one, it is clear that f (x) must have the two following properties:
(1) Non-negative: f(x) ≥ 0 for all x.
(2) Total mass equal to one: ∫_{−∞}^{+∞} f(x) dx = 1.
(3) For any interval (a, b), P(a < X < b) = ∫_a^b f(x) dx.
Notice that, unlike in the discrete case, the inclusion or exclusion of the end points a and b doesn't affect the probability that the continuous variable X is in the interval. In fact, the event that X will take any single value, x, can be represented by the degenerate interval x ≤ X ≤ x and so,
P(X = x) = P(x ≤ X ≤ x) = ∫_x^x f(t) dt = 0.
Therefore, unlike in the discrete case, f(x) doesn't represent the probability of the event X = x. What, then, is the meaning of f(x)? It represents the relative probability that X will be near x: if d > 0 is small,
(1/d) P(x − (d/2) < X < x + (d/2)) = (1/d) ∫_{x−(d/2)}^{x+(d/2)} f(t) dt ≈ f(x).
Another important function related to a continuous random variable is its cumulative distribution function, defined as
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt,   for all x.     (4.1)
Notice that, in particular, P(a < X ≤ b) = F(b) − F(a).
[Figure: F(b) − F(a) shown as the area under the density between a and b.]
Conversely, differentiating (4.1) gives
f(x) = dF(x)/dx,   for all x.     (4.2)
Therefore, we can go back and forth from the density to the distribution function and vice
versa using formulas (4.1) and (4.2).
Example 4.2 Suppose that the maximum annual flood level of a river, X (in meters), has density
f(x) = 0.125(x − 5)   if 5 < x < 9,
     = 0              otherwise.
Find the distribution function of X and calculate P(X < 6), P(6 ≤ X ≤ 7) and P(X ≥ 9).
[Figure: the density and distribution functions of X on the interval (5, 10).]
Solution
F(x) = 0,                                        if x ≤ 5
     = ∫_5^x 0.125(t − 5) dt = 0.0625(x − 5)²,   if 5 < x < 9
     = 1,                                        if x ≥ 9.
Furthermore, P(X < 6) = F(6) = 0.0625 and P(X ≥ 9) = 1 − F(9) = 0.
Notice that, since P(X = x) = 0, the inclusion or exclusion of the interval's boundary points doesn't affect the probability of the corresponding interval. In other words,
P(6 ≤ X ≤ 7) = P(6 < X ≤ 7) = P(6 ≤ X < 7) = P(6 < X < 7) = F(7) − F(6) = 0.1875.
Also notice that, since f(x) is increasing on (5, 9), P(5 < X < 6), for instance, is much smaller than P(8 < X < 9), despite the length of the two intervals being equal.
2
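Calculations with a density given by a formula, such as the one above, are easy to check numerically. The following Python sketch evaluates the closed-form distribution function of Example 4.2 and confirms P(6 ≤ X ≤ 7) with a crude Riemann sum (the step size is an arbitrary choice):

# Sketch for Example 4.2: f(x) = 0.125*(x - 5) on (5, 9), 0 otherwise.
def f(x):
    return 0.125 * (x - 5) if 5 < x < 9 else 0.0

def F(x):
    # Closed-form distribution function: the integral of f from 5 to x.
    if x <= 5:
        return 0.0
    if x >= 9:
        return 1.0
    return 0.0625 * (x - 5) ** 2

print(F(7) - F(6))   # P(6 <= X <= 7) = 0.1875

# Crude numerical check of the same probability by midpoint Riemann sums.
n = 100_000
h = (7 - 6) / n
approx = sum(f(6 + (k + 0.5) * h) for k in range(n)) * h
print(round(approx, 4))   # about 0.1875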
Example 4.3 (Rounding-off Error and Uniform Random Variables): Due to the resolution limitations of a measuring device, the measurements are rounded off to the second decimal place. If the third decimal place is 5 or more, the second place is increased by one unit; if the third decimal place is 4 or less, the second place is left unchanged. For example, 3.2462 would be reported as 3.25 and 3.2428 would be reported as 3.24. Let X represent the difference between the (unknown) true measurement, y, and the corresponding rounded-off reading, r. That is,
X = y − r.
Clearly, X can take any value in the interval −0.005 < X < 0.005. It would appear reasonable in this case to assume that all the possible values are equally likely. Therefore, the relative probability f(x) that X will fall near any number x₀ between −0.005 and 0.005 should then be the same. That is,
f(x) = c,   −0.005 ≤ x ≤ 0.005,
     = 0,   otherwise.
The random variable X is said to be uniformly distributed between −0.005 and +0.005. By property (2),
∫_{−∞}^{+∞} f(x) dx = ∫_{−0.005}^{0.005} c dx = 0.01c = 1,
and therefore c = 100.
Accordingly, the density and distribution functions of X are
f(x) = 100 for −0.005 ≤ x ≤ 0.005 (and 0 otherwise), and
F(x) = 0,                 x ≤ −0.005,
     = 100(x + 0.005),    −0.005 ≤ x ≤ 0.005,
     = 1,                 x ≥ 0.005.
[Figure: the uniform density (constant height 100) and the corresponding distribution function on (−0.005, 0.005).]
4.4 Summarizing the Main Features of f(x)
All the information concerning the random variable X is contained in its density function,
f (x), and this information can be used and displayed in the form of a picture (a graph of
f (x) versus x), a formula, or a table.
There are situations, however, when one would prefer to concentrate on a summary of the
more complete and complex information contained in f (x). This is the case, for example,
if we are working with several random variables that need to be compared in order to draw
some conclusions.
The summary of f (x), as any other summary, should be simple and informative. The
reader of such a summary should get a good idea of what are the most likely values of X and
what is the degree of uncertainty regarding the prediction of future values of X.
Typical densities found in practice are approximately symmetric and unimodal. These
densities can be summarized in terms of their central location and their dispersion. Therefore,
an approximately symmetric and unimodal density can be fairly well described by giving just
two numbers: a measure of its central location and a measure of its dispersion.
The median and the mean are two popular measures of (central) location and the
interquartile range and the standard deviation are two popular measures of dispersion.
These summary measures are defined and briefly discussed below.
The Median and the InterQuartile Range
Given a number α between zero and one, the quantile of order α of the distribution F (or of the r.v. X), denoted Q(α), is implicitly defined by the equation
P(X ≤ Q(α)) = α.
Therefore Q(α) has the property
Q(α) = F⁻¹(α)
and can be found by solving (for x) the equation
F(x) = α.
To find the quantile of order 0.25, for example, we must solve the equation
F(x) = 0.25.
The special quantiles Q(0.25) and Q(0.75) are often called the first quartile and the third
quartile, respectively.
The median of X, Med(X), is defined as the corresponding quantile of order 0.5, that is,
Med(X) = Q(0.5).
Evidently, Med(X) divides the range of X into two sets of equal probability. Therefore, it
can be used as a measure for the central location of f (x).
A simple sketch showing the locations of Q(0.25), Med(X) and Q(0.75) constitutes a good summary of f(x), even if it is not symmetric. Notice that if Q(0.75) − Med(X) is significantly larger (or smaller) than Med(X) − Q(0.25), then f(x) is fairly asymmetric.
There are situations when there is no solution, or too many solutions, to the defining equations above. This is typically the case for discrete random variables. In these cases the quantiles (including the median) are calculated using some common-sense criterion. For instance, if the distribution function F(x) is constant and equal to 0.5 on the interval (x1, x2), then the median is taken equal to (x1 + x2)/2 (see Figure 3.5 (a)). To give another example, if the distribution function F(x) has a jump and doesn't take the value 0.5, the median is defined as the location of the jump (see Figure 3.5 (b)).
The dispersion about the median is usually measured in terms of the inter-quartile range, denoted IQR(X) and defined as
IQR(X) = Q(0.75) − Q(0.25).
[Figure 3.5: (a) a distribution function that is flat at the level 0.5 over an interval (x1, x2); (b) a distribution function that jumps past the level 0.5 at a single point.]
Example 4.4 Suppose that the waiting time X between two consecutive customers (at a certain service facility) has the exponential density f(x) = 2 exp{−2x}, x ≥ 0. (a) Find the distribution function and the median of X. (b) Find Q(0.25) and Q(0.75). (c) Is the density of X symmetric? (d) Find IQR(X).
Solution The distribution function is
F(x) = ∫_0^x f(t) dt = 2 ∫_0^x exp{−2t} dt = 1 − exp{−2x},   x ≥ 0.
(a) Setting F(x) = 0.5 gives 2x = log(2), so Med(X) = log(2)/2 = 0.347.
(b) Setting F(x) = 0.25 gives 2x = log(4) − log(3), so
Q(0.25) = (log(4) − log(3))/2 = 0.144.
Setting F(x) = 0.75 gives 2x = log(4), so
Q(0.75) = log(4)/2 = 0.693.
(c) Since
Q(0.75) − Med(X) = 0.693 − 0.347 = 0.346
and
Med(X) − Q(0.25) = 0.347 − 0.144 = 0.203,
the distribution is fairly asymmetric.
(d)
IQR = Q(0.75) − Q(0.25) = 0.693 − 0.144 = 0.549.
2
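The quantiles of the exponential distribution can be obtained in closed form by inverting F(x) = 1 − exp{−2x}; the short Python sketch below reproduces the numbers used above:

import math

rate = 2.0   # exponential rate of the waiting-time example

def quantile(alpha):
    # Solve 1 - exp(-rate * x) = alpha for x.
    return -math.log(1 - alpha) / rate

print(round(quantile(0.25), 3))   # 0.144
print(round(quantile(0.50), 3))   # 0.347 (the median)
print(round(quantile(0.75), 3))   # 0.693
print(round(quantile(0.75) - quantile(0.25), 3))   # IQR = 0.549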
The Mean, the Variance and the Standard Deviation
Let X be a random variable with density f(x), and let g(X) be a function of X. For example, g(X) = √X or g(X) = (X − t)², where t is some fixed number. The notation E[g(X)], read "expected value of g(X)", will be used very often in this course. The expected value of g(X) is defined as the weighted average of the function g(x), with weights proportional to the density function f(x). More precisely:
E[g(X)] = ∫_{−∞}^{+∞} g(x) f(x) dx     (4.3)
in the continuous case, and
E[g(X)] = Σ_{x∈R} g(x) f(x)     (4.4)
in the discrete case.
Example 4.5 Refer to the random variables of Example 3.1 (number of defectives) and Example 4.3 (rounding-off error). Calculate E(X) and E(X²).
Solution Since the random variable X of Example 3.1 is discrete, we must use formula (4.4)
to obtain:
E(X) = (0)(0.5) + (1)(0.2) + (2)(0.15) + (3)(0.10) + (4)(0.03) + (5)(0.02) = 1.02,
and
E(X 2 ) = (0)(0.5) + (1)(0.2) + (4)(0.15) + (9)(0.10) + (16)(0.03) + (25)(0.02) = 2.68.
In the case of the continuous random variable X of Example 4.3 we must use formula (4.3):
E(X) = ∫_{−∞}^{+∞} x f(x) dx = 100 ∫_{−0.005}^{0.005} x dx = 0,
E(X²) = ∫_{−∞}^{+∞} x² f(x) dx = 100 ∫_{−0.005}^{0.005} x² dx = 100[(0.005)³ − (−0.005)³]/3 = (200)(0.005)³/3 = 0.00000833.
2
Consider now the mean squared deviation of X about an arbitrary fixed point t,
D(t) = E[(X − t)²] = ∫_{−∞}^{+∞} (x − t)² f(x) dx     (continuous case)
     = Σ_{x∈R} (x − t)² f(x)     (discrete case).
But we could begin this reasoning from the end and say that a good measure of central location must minimize D(t). This optimal value of t, called the mean of X, is denoted by the Greek letter μ.
To find μ we differentiate D(t) and set the derivative equal to zero. In the continuous case,
D′(t) = −2 ∫_{−∞}^{+∞} (x − t) f(x) dx = −2[E(X) − t],
and the discrete case can be treated similarly. Since D″(t) = 2 > 0 for all t, the critical point t = E(X) minimizes D(t). Therefore,
μ = E(X).
This procedure of defining the desired summary measure by the property of minimizing the
average of the squared residuals is a very important technique in applied statistics called the
method of minimum mean squared residuals. We will come across several applications
of this technique throughout this course.
Example 4.4 (continued): (a) Calculate the mean and the standard deviation for the waiting
time between two consecutive customers, X. (b) How do they compare with the corresponding median waiting time and interquartile range calculated before?
Solution
(a) The mean waiting time is
E(X) = ∫_0^{+∞} x · 2 exp{−2x} dx = [−x exp{−2x}]_0^{+∞} + ∫_0^{+∞} exp{−2x} dx = [−exp{−2x}/2]_0^{+∞} = exp{0}/2 = 0.5.
More generally, if X is an exponential random variable with parameter (rate) λ, then
E(X) = 1/λ.     (4.5)
A similar calculation (integrating by parts twice) gives E(X²) = ∫_0^{+∞} x² · 2 exp{−2x} dx = 0.5. Therefore,
Var(X) = E(X²) − [E(X)]² = 0.5 − 0.25 = 0.25   and   SD(X) = 0.5.
More generally, if X is an exponential random variable with rate λ, then Var(X) = 1/λ², that is,
SD(X) = 1/λ.     (4.6)
(b) Since the density of X is asymmetric, the median and the mean are expected to be different (as they are). Since the density is skewed to the right (longer right-hand tail), the mean waiting time (0.5) is larger than the median waiting time (0.347).
The two measures of dispersion (IQR = 0.549 and SD = 0.5) are quite consistent.
2
Properties of the Mean and the Variance
Property 1: E(aX + b) = aE(X) + b for any constants a and b.
Proof (discrete case)
E(aX + b) = Σ_{xi} (a·xi + b) f(xi) = a[Σ_{xi} xi f(xi)] + b = aE(X) + b.
Property 2: E(X + Y) = E(X) + E(Y) for all pairs of random variables X and Y.
Property 3: E(XY) = E(X)E(Y) for all pairs of independent random variables X and Y.
Property 4: Var(aX + b) = a² Var(X).
Proof
Var(aX + b) = E[(aX + b) − (aμ + b)]² = E[a(X − μ)]² = a² E(X − μ)² = a² Var(X)
2
Property 5: Var(X ± Y) = Var(X) + Var(Y) for all pairs of independent random variables X and Y.
All these properties will be used very often in this course. The proofs of properties 2, 3
and 5 are beyond the scope of this course, and therefore these properties must be accepted
as facts and used throughout the course.
The formula
Var(X) = E(X²) − [E(X)]² = E(X²) − μ²,
is often used for calculations. The derivation of this formula is very simple, using the properties of the mean listed above. In fact,
Var(X) = E{(X − μ)²} = E(X² + μ² − 2μX) = E(X²) + μ² − 2μE(X) = E(X²) + μ² − 2μ² = E(X²) − μ².
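As a quick illustration of the shortcut formula, the discrete density of Example 4.5 can be summarized in a few lines of Python:

# Density of the discrete random variable in Example 4.5.
density = {0: 0.50, 1: 0.20, 2: 0.15, 3: 0.10, 4: 0.03, 5: 0.02}

mean = sum(x * p for x, p in density.items())               # E(X)
second_moment = sum(x**2 * p for x, p in density.items())   # E(X^2)
variance = second_moment - mean**2                          # Var(X) = E(X^2) - mu^2

print(mean)                  # 1.02
print(second_moment)         # 2.68
print(round(variance, 4))    # 1.6396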
4.5 Sum and Average of Independent Random Variables
Random experiments are often independently repeated many times generating a sequence
X1 , X2 , . . . , Xn of n independent random variables. We will consider linear combinations of
these variables,
Y = a1X1 + a2X2 + ⋯ + anXn,
where the coefficients a1, a2, . . . , an are some given constants. For example, ai = 1 for all i produces the total
T = X1 + X2 + ⋯ + Xn,
and ai = 1/n for all i produces the average
X̄ = (X1 + X2 + ⋯ + Xn)/n.
Using the properties of the expected value and variance we have
E(Y) = a1E(X1) + a2E(X2) + ⋯ + anE(Xn)
and
Var(Y) = a1²Var(X1) + a2²Var(X2) + ⋯ + an²Var(Xn).
Typically, the n random variables Xi will have a common mean μ and a common variance σ². In this case the sequence {X1, X2, . . . , Xn} is said to be a random sample. In this case,
E(Y) = (a1 + a2 + ⋯ + an)μ
and
Var(Y) = (a1² + a2² + ⋯ + an²)σ².
Example 4.6 Twenty randomly selected students will be asked the question do you regularly smoke?. (a) Calculate the expected number of smokers in the sample if 10% of the
students smoke; (b) what is your estimate of the proportion, p, of smokers if six students
answered Yes?; (c) What are the expected value and the variance of your estimate?
Solution
(a) Let Xi be equal to one if the ith student answers Yes and equal to zero otherwise.
Let p be equal to the proportion of smokers in the student population. Then the Xi are
independent discrete random variables with density f(0) = 1 − p and f(1) = p. Therefore,
E(Xi) = E(Xi²) = 0·f(0) + 1·f(1) = f(1) = p = 0.1
and
Var(Xi) = E(Xi²) − [E(Xi)]² = p − p² = p(1 − p) = 0.09.
In particular, the expected number of smokers in the sample is E(X1 + ⋯ + X20) = 20 × 0.1 = 2.
(b) A reasonable estimate for the fraction, p, of smokers in the population is given by the corresponding fraction of smokers in the sample, X̄. In the case of our sample, the observed value, x̄, of X̄ is x̄ = 6/20 = 0.3.
(c) The expected value of the estimate in (b) is p and its variance is p(1 p)/20. Why? 2
Example 4.7 The independent random variables X, Y and Z represent the monthly sales
of a large company in the provinces of BC, Ontario and Quebec, respectively. The mean and
standard deviations of these variables are as follows (in hundreds of dollars):
E(X) = 1,435,   SD(X) = 120;   E(Y) = 2,300,   SD(Y) = 150;   E(Z) = 1,500,   SD(Z) = 150.
(a) What are the expected value and the standard deviation of the total monthly sales?
(b) Sales manager J. Smith is responsible for the sales in BC and 2/3 of the sales in Ontario.
Sales manager R. Campbell is responsible for the sales in Quebec and the remaining 1/3 of
the sales in Ontario. What are the expected values and standard deviations of Mr. Smiths
and Mrs. Campbells monthly sales?
(c) What are the expected values and standard deviations of the annual sales for each
province? Assume for simplicity that the monthly sales are independent.
Solution
(a) The total monthly sales are
S = X + Y + Z.
By Property 2
E(S) = E(X) + E(Y ) + E(Z) = 1, 435 + 2, 300 + 1, 500 = 5, 235.
By Property 5
Var(S) = Var(X) + Var(Y) + Var(Z) = 120² + 150² + 150² = 59,400.
Therefore,
SD(S) = √59,400 = 243.7.
(b) First, notice that
S1 = X + (2/3)Y   and   S2 = Z + (1/3)Y.
Therefore, E(S1) = 1,435 + (2/3)(2,300) = 2,968.3, Var(S1) = 120² + (2/3)²(150)² = 14,400 + 10,000 = 24,400 and SD(S1) = 156.2. Similarly, E(S2) = 1,500 + (1/3)(2,300) = 2,266.7, Var(S2) = 150² + (1/3)²(150)² = 22,500 + 2,500 = 25,000 and SD(S2) = 158.11.
(c) If Xi (i = 1, . . . , 12) represent BC's monthly sales, the annual sales for BC are
T = Σ_{i=1}^{12} Xi.
Therefore,
E(T) = E[Σ_{i=1}^{12} Xi] = Σ_{i=1}^{12} E(Xi) = 12 × 1,435 = 17,220.
The variance and the standard deviation of the annual sales in BC (assuming independence) are
Var(T) = Var[Σ_{i=1}^{12} Xi] = Σ_{i=1}^{12} Var(Xi) = 12 × 120² = 172,800   and   SD(T) = √172,800 = 415.7.
The student can now calculate the expected values and the standard deviations for the annual sales in Ontario and Quebec.
2
Question: The total monthly sales can be obtained as the sum of Mr. Smith's (S1 = X + (2/3)Y) and Mrs. Campbell's (S2 = Z + (1/3)Y) monthly sales, with variances (calculated in part (b)) equal to 24,400 and 25,000, respectively. Why, then, is the total sales variance Var(X + Y + Z), calculated in part (a), not equal to the sum 24,400 + 25,000 = 49,400?
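A short simulation makes the answer concrete: S1 and S2 share the Ontario sales Y, so they are not independent, and the variance of their sum picks up an extra covariance term, 2 Cov(S1, S2) = 2(2/3)(1/3)Var(Y) = 10,000. The sketch below (Python) assumes, purely for the purpose of the simulation, that the three sales figures are normally distributed with the stated means and standard deviations:

import random, statistics

random.seed(1)
n = 100_000
s1_vals, s2_vals, total_vals = [], [], []
for _ in range(n):
    x = random.gauss(1435, 120)   # BC sales
    y = random.gauss(2300, 150)   # Ontario sales
    z = random.gauss(1500, 150)   # Quebec sales
    s1 = x + (2 / 3) * y          # Mr. Smith
    s2 = z + (1 / 3) * y          # Mrs. Campbell
    s1_vals.append(s1)
    s2_vals.append(s2)
    total_vals.append(s1 + s2)

print(round(statistics.variance(s1_vals)))      # about 24,400
print(round(statistics.variance(s2_vals)))      # about 25,000
print(round(statistics.variance(total_vals)))   # about 59,400, not 49,400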
4.6 Max and Min of Independent Random Variables
Let X1, X2, . . . , Xn be independent random variables and consider their maximum, V = max(X1, . . . , Xn), and their minimum, U = min(X1, . . . , Xn). Random variables of this kind arise naturally; for example:
- The completion time of a project made up of n subprojects which can be pursued simultaneously. In this case Xi = completion time for the ith subproject (and the completion time of the whole project is the maximum V).
- The maximum flood level of a river in the next n years. In this case Xi = maximum flood level in the ith year.
- The minimum flood level of a river in the next n years. In this case Xi = minimum flood level in the ith year.
4.6.1
The Maximum
Suppose that Fi (x) and fi (x) are the distribution and density functions of the random variable
Xi , and let FV (v) and fV (v) be the distribution and density functions of the maximum V .
Since the maximum, V , is less than a given value, v, if and only if each random variable Xi
is less than v we have
FV(v) = P{V ≤ v} = P{X1 ≤ v, X2 ≤ v, . . . , Xn ≤ v}
      = P{X1 ≤ v} P{X2 ≤ v} ⋯ P{Xn ≤ v}     [since the variables Xi are independent]
      = F1(v) F2(v) ⋯ Fn(v).
This formula is greatly simplified when the Xi's are identically distributed, that is, when
F1(x) = F2(x) = ⋯ = Fn(x) = F(x)
for all values of x. In this case,
FV(v) = [F(v)]ⁿ     (4.7)
and, differentiating,
fV(v) = n [F(v)]ⁿ⁻¹ f(v).     (4.8)
Example 4.8 A system consists of five components connected in parallel. The lifetime (in thousands of hours) of each component is an exponential random variable with mean μ = 3. See Example 4.4 and Example 4.4 (continued) for the definition of exponential random variables and formulas for their mean and variance.
(a) Calculate the median life (often called half-life) and standard deviation for each component.
(b) Calculate the probability that a component fails before 3,500 hours.
(c) Calculate the probability that the system will fail before 3,500 hours. Compare this with the probability that a component fails before 3,500 hours.
(d) Calculate the half-life (median life), mean life and standard deviation for the system.
Solution
Using equation (4.5) and the fact that the lifetime X of each component is exponentially distributed with mean μ = 3, we obtain λ = 1/3, so that the density and distribution functions of X are
f(x) = (1/3) exp{−x/3}   and   F(x) = 1 − exp{−x/3},   x ≥ 0.
(a) The median life of a component solves 1 − exp{−x/3} = 0.5, so Med(X) = 3 log(2) = 2.079 (about 2,079 hours); by (4.6), SD(X) = 3 (that is, 3,000 hours).
(b) P(X ≤ 3.5) = F(3.5) = 1 − exp{−3.5/3} = 0.6886.
(c) Since the parallel system fails before 3,500 hours only if all five components do, the probability that the system will fail before 3,500 hours is
P{V ≤ 3.5} = FV(3.5) = [1 − exp{−3.5/3}]⁵ = (0.6886)⁵ = 0.1548.
The probability that a single component fails (calculated in part (b)) is more than four times larger.
(d) To calculate the median life of the system we must use formula (4.7) once again:
FV(v) = 0.5 ⟹ [1 − exp{−v/3}]⁵ = 0.5 ⟹ exp{−v/3} = 1 − (0.5)^{1/5} = 0.12945 ⟹ v₀ = −3 log(0.12945) = 6.133.
Therefore, the median life of the system is equal to 6,133 hours.
To calculate the mean life we must first obtain the density function of V. Using formula (4.8) above we obtain
fV(v) = 5 [1 − exp{−v/3}]⁴ (1/3) exp{−v/3}
     = (5/3)[exp{−v/3} − 4 exp{−2v/3} + 6 exp{−v} − 4 exp{−4v/3} + exp{−5v/3}].
Since, for any λ > 0,
∫_0^∞ v exp{−λv} dv = 1/λ²,
we have
E(V) = ∫_0^∞ v fV(v) dv = (5/3)[∫_0^∞ v exp{−v/3} dv − 4 ∫_0^∞ v exp{−2v/3} dv + 6 ∫_0^∞ v exp{−v} dv − 4 ∫_0^∞ v exp{−4v/3} dv + ∫_0^∞ v exp{−5v/3} dv]
     = (5/3)[9 − 4(9/4) + 6(1) − 4(9/16) + 9/25] = 6.85.
Similarly, since
∫_0^∞ v² exp{−λv} dv = 2/λ³,     [why?]
we have that
E(V²) = ∫_0^∞ v² fV(v) dv = (5/3)[2(27) − 4·2(27/8) + 6·2(1) − 4·2(27/64) + 2(27/125)] = 60.095.
Therefore the mean life of the system is 6,850 hours and its standard deviation is
SD(V) = √(60.095 − (6.85)²) = √13.1725 = 3.63,
that is, about 3,630 hours.
2
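The median, mean and standard deviation of the system life found above can be checked by simulating the maximum of five exponential lifetimes. A minimal Python sketch:

import random, statistics

random.seed(2)
n = 100_000
# Lifetime of the parallel system = maximum of five exponential(mean 3) lifetimes.
lifetimes = [max(random.expovariate(1 / 3) for _ in range(5)) for _ in range(n)]

print(round(statistics.mean(lifetimes), 2))     # about 6.85
print(round(statistics.median(lifetimes), 2))   # about 6.13
print(round(statistics.stdev(lifetimes), 2))    # about 3.63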
4.6.2 The Minimum
Now we turn our attention to the distribution of the minimum, U = min(X1, . . . , Xn). Let FU(u) and fU(u) denote the distribution and density functions of U. Since the minimum, U, is greater than a given value, u, if and only if each random variable Xi is greater than u, we have
FU(u) = P{U ≤ u} = 1 − P{U > u} = 1 − P{X1 > u, X2 > u, . . . , Xn > u}
      = 1 − P{X1 > u} P{X2 > u} ⋯ P{Xn > u}     [since the variables Xi are independent]
      = 1 − [1 − F1(u)][1 − F2(u)] ⋯ [1 − Fn(u)]     [since P{Xi > u} = 1 − Fi(u), i = 1, . . . , n]
As before, this formula can be greatly simplified when the Xi's are identically distributed, that is, when
F1(x) = F2(x) = ⋯ = Fn(x) = F(x)
for all values of x. In this case,
FU(u) = 1 − [1 − F(u)]ⁿ     (4.9)
and, differentiating,
fU(u) = n [1 − F(u)]ⁿ⁻¹ f(u).     (4.10)
Example 4.9 A system consists of five components connected in series. The lifetime (in
thousands of hours) of each component is an exponential random variable with mean μ = 3.
(a) Calculate the probability that the system will fail before 3500 hours. Compare this with
the probability that a component fails before 3500 hours.
(b) Calculate the median life, the mean life and the standard deviation for the system.
Solution
(a) Using formula (4.9) above we obtain
FU(u) = 1 − [exp{−u/3}]⁵ = 1 − exp{−(5/3)u},
and so U is also exponentially distributed with parameter 5 × (1/3) = 5/3. In general, the minimum of n exponential random variables with parameter λ is also exponential with parameter nλ. Finally,
P{U ≤ 3.5} = FU(3.5) = 1 − exp{−(5/3)(3.5)} = 0.9971.
The probability that a component will fail before 3500 has been found (in Example 4.8) to
be 0.6886. Therefore, the probability that the system will fail before 3, 500 hours is almost
45% larger.
(b) Since U is exponentially distributed, its mean and standard deviation can be obtained directly from the distribution function found in (a), using equations (4.5) and (4.6). That is,
E(U) = SD(U) = 3/5 = 0.6.
Therefore, the mean life of the system, 600 hours, is 5 times smaller than that of the individual components. Finally, the median life of the system can be found as follows:
1 − exp{−(5u)/3} = 0.5 ⟹ exp{−(5u)/3} = 0.5 ⟹ u₀ = −3 log(0.5)/5 = 0.416.
Therefore, the median life of the system is equal to 416 hours.
4.7 Exercises
4.7.1 Exercise Set A
Problem 4.1 A system consists of five identical components all connected in series. Suppose
each component has a lifetime (in hours) that is exponentially distributed with rate λ = 0.01, and all five components work independently of one another.
Define T to be the time at which the system fails. Consider the following questions:
(a) Obtain the distribution of T . Can you tell what type of distribution it is?
(b) Compute the IQR (interquartile range) for the distribution obtained in part (a).
(c) What is the probability that the system will last at least 15 hours?
Problem 4.2 Are the following functions density functions? Why?
(a) f1(x) = 1 for 1 ≤ x ≤ 3; 0, otherwise.
(b) f2(x) = x for −1 ≤ x ≤ 1; 0, otherwise.
(c) f3(x) = exp(−x) for x ≥ 0; 0, otherwise.
Problem 4.3 Suppose that the response time X at a certain on-line computer terminal (the elapsed time between the end of a user's inquiry and the beginning of the system's response to that inquiry) has an exponential distribution with expected response time equal to 5 seconds (i.e. the exponential rate is λ = 0.2).
(a) Calculate the median response time.
(b) What is the probability that the next three response times exceed 5 seconds? (Assume
that all the response times are independent).
Problem 4.4 The hourly volume of traffic, X, for a proposed highway has density proportional to g(x), where
g(x) = x(100 − x) if 0 < x < 100, and g(x) = 0 otherwise.
(a) Derive the density and the distribution functions of X.
(b) The traffic engineer may design the highway capacity equal to the mean of X. Determine the design capacity of the highway and the corresponding probability of exceedance (i.e. the probability that the traffic volume is greater than the capacity).
Problem 4.5 A discrete random variable X has the density function given below.
 x      −1    0     1     2
 f(x)   0.2   c     0.2   0.1
(a) Determine c;
(b) Find the distribution function F (x);
(c) Show that the random variable Y = X 2 has the density function g(y) given by
 y      0     1     4
 g(y)   0.5   0.4   0.1
(d) Calculate expectation E(X), variance Var(X) and the mode of X (the value x with the
highest density).
Problem 4.6 A continuous random variable X has the density function f(x) = cx on the interval 0 ≤ x ≤ 1, and 0 otherwise.
(a) Determine the constant c;
(b) Find the distribution function F(x) of X;
(c) Calculate E(X), Var(X) and the median, Q(0.5);
(d) Find P(|X| ≤ 0.5).
Problem 4.7 Show that
(a) Any distribution function F(x) is non-decreasing, i.e. for any real values x1 < x2, F(x1) ≤ F(x2).
(b) Suppose X is a random variable with finite variance. Then Var(X) ≤ E(X²).
(c) If a density function f(x) is symmetric around 0, i.e. f(−x) = f(x) for all x ∈ R, then F(0) = P(X ≤ 0) = 0.5.
Problem 4.8 If the probability density of a random variable is given by
f(x) = k/x² for 1 ≤ x ≤ 3, and f(x) = 0 otherwise.
(a) Find the value of k such that f (x) is a probability density function.
(b) Find the corresponding distribution function.
(c) Find the mean and median.
Problem 4.9 Suppose a random variable X has a probability density function given by
f(x) = kx(1 − x) for 0 ≤ x ≤ 1, and f(x) = 0 elsewhere.
(a) Find the value of k such that f (x) is a probability density function.
(b) Find P(0.4 ≤ X ≤ 1).
(c) Find P(X ≤ 0.4 | X ≤ 0.8).
(d) Find F(b) = P(X ≤ b), and sketch the graph of this function.
Problem 4.10 Suppose that random variables X and Y are independent and have the same
mean 3 and standard deviation 2. Calculate the mean and variance of X − Y.
Problem 4.11 Suppose X has an exponential distribution with an unknown parameter λ, i.e. its density is
f(x) = λ exp(−λx) if x ≥ 0, and f(x) = 0 otherwise.
If P(X ≥ 1) = 0.25, determine λ.
Problem 4.12 Suppose an enemy aircraft flies directly over the Alaska pipeline and fires a single air-to-surface missile. If the missile hits anywhere within 10 feet of the pipeline, major structural damage will occur and the oil flow will be disrupted. Let X be the distance from the pipeline to the point of impact. Note that X is a continuous random variable. The probability function describing the missile's point of impact is given by
f(x) = (60 + x)/3600   for −60 ≤ x < 0,
     = (60 − x)/3600   for 0 ≤ x ≤ 60,
     = 0               otherwise.
Find the probability that the missile causes major structural damage to the pipeline.
4.7.2 Exercise Set B
Problem 4.15 The continuous random variable X takes values between −2 and 2 and its density function is proportional to
(a) 4 − x²
(b) x²
(c) 2 + x
(d) exp{−|x|}
Find, in each case, the density function, the distribution function, the mean, the standard
deviation, the median and the interquartile range of X.
85
4.7. EXERCISES
Problem 4.16 Find the density functions corresponding to the pictures in Figure 3.7. For
each case also calculate the distribution function, the mean, the median, the interquartile
range and the standard deviation.
[Figure 3.7: six sketched density functions, panels (a)–(f).]
The following table gives, for each of twenty welders, the probabilities that a welded item produced by that welder has 0, 1, 2, 3 or 4 cracks.
 Welder    0      1      2      3      4
 1        0.10   0.20   0.40   0.20   0.10
 2        0.20   0.20   0.20   0.20   0.20
 3        0.50   0.30   0.10   0.05   0.05
 4        0.05   0.05   0.10   0.30   0.50
 5        0.50   0.00   0.00   0.00   0.50
 6        0.85   0.00   0.00   0.00   0.15
 7        0.30   0.25   0.20   0.10   0.15
 8        0.20   0.30   0.20   0.10   0.20
 9        0.10   0.10   0.50   0.20   0.10
 10       0.20   0.50   0.10   0.20   0.00
 11       0.30   0.30   0.40   0.00   0.00
 12       0.10   0.10   0.50   0.15   0.15
 13       0.35   0.25   0.20   0.15   0.05
 14       0.40   0.30   0.10   0.10   0.10
 15       0.20   0.30   0.50   0.00   0.00
 16       0.60   0.30   0.10   0.00   0.00
 17       0.70   0.10   0.10   0.10   0.00
 18       0.10   0.80   0.10   0.00   0.00
 19       0.40   0.40   0.10   0.10   0.00
 20       0.15   0.60   0.15   0.10   0.00
1) How would you rank these twenty welders (e.g. for promotion) on the basis of this
information alone?
2) Would you change the ranking if you know that items with one, two, three and four
cracks must be sold for $6, $15, $40, and $60 less, respectively? What if the associated losses
are $6, $15, $40, and $80. Suggestion: Use the computer.
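For part 2), one possible computer-based approach is to rank the welders by expected loss per item, E(loss) = Σ loss(k)·P(k cracks). A Python sketch, with only the first few welders of the table entered (the remaining rows follow the same pattern):

# Probabilities of 0-4 cracks per item for a few of the welders (from the table).
welders = {
    1: [0.10, 0.20, 0.40, 0.20, 0.10],
    2: [0.20, 0.20, 0.20, 0.20, 0.20],
    3: [0.50, 0.30, 0.10, 0.05, 0.05],
    4: [0.05, 0.05, 0.10, 0.30, 0.50],
}
losses = [0, 6, 15, 40, 60]   # loss ($) for items with 0, 1, 2, 3, 4 cracks

expected_loss = {w: sum(p * c for p, c in zip(probs, losses))
                 for w, probs in welders.items()}

# Rank welders from smallest to largest expected loss per item.
for w, loss in sorted(expected_loss.items(), key=lambda item: item[1]):
    print(w, round(loss, 2))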
Problem 4.21 Suppose that the maximum annual wind velocity near a construction site, X, has exponential density
f(x) = λ exp{−λx},   x > 0.
(a) If the records of maximum wind speed show that the probability of maximum annual wind velocities less than 72 mph is approximately 0.90, suggest an appropriate estimate for λ.
(b) If the annual maximum wind speeds for different years are statistically independent, calculate the probability that the maximum wind speed in the next three years will exceed 75 mph. What about the next 15 years?
(c) Plot the distribution function of the maximum wind speed for the next year, for the next 3 years and for the next 15 years. Briefly report your conclusions.
(d) Let Qm(p) (m = 1, 2, . . .) be the quantile of order p for the maximum wind speed over the next m years. Show that
Qm(p) = Q1(p^{1/m}),   for all m = 1, 2, . . .
Use this formula to plot Qm(0.90) versus m. Same for Qm(0.95). Briefly report your conclusions. Suggestion: Use the computer.
Problem 4.22 A system has two independent components A and B connected in parallel. If the operational life (in thousands of hours) of each component is a random variable with density
f(x) = (1/36)(x − 4)(10 − x) for 4 < x < 10, and f(x) = 0 otherwise,
(a) Find the median and the mean life of each component. Find also the standard deviation
and IQR.
(b) Calculate the distribution and density functions for the lifetime of the system. What is
the expected lifetime of the system?
(c) Same as (b) but assuming that the components are connected in series instead of in
parallel.
Problem 4.23 A large construction project consists of building a bridge and two roads
linking it to two cities (see the picture below). The contractual time for the entire project is
18 months.
The construction of each road will require between 15 and 20 months and that of the
bridge will require between 12 and 19 months. The three parts of the projects can be done
simultaneously and independently. Let X1 , X2 and Y represent the construction times for the
two roads and the bridge, respectively and suppose that these random variables are uniformly
distributed on their respective ranges.
(a) What is the expected time for completion of each part of the project? What are the
corresponding standard deviations?
(b) What is the expected time for the completion of the entire project? What is the corresponding standard deviation?
(c) What is the probability that the project will be completed within the contractual time?
Problem 4.24 Same as Problem 4.23, but assuming that the variables X1 , X2 and Y have
triangular distributions over their ranges.
[Figure: Road 1 and Road 2 link two cities to the Bridge across the river.]
Chapter 5
Normal Distribution
5.1 Normal Distribution N(μ, σ²)
The Normal distribution is, for reasons that will be evident as we progress in this course, the most popular distribution among engineers and other scientists. It is a continuous distribution with density
f(x) = (1/(σ√(2π))) exp{−(x − μ)²/(2σ²)},
where μ and σ are parameters which control the central location and the dispersion of the density, respectively. The normal density is perfectly symmetric about the center, μ, and the bell-shaped curve becomes shorter and fatter as σ increases.
[Figure: normal densities with σ = 1, 1.5, 2 and 3; the curves become shorter and fatter as σ increases.]
The density steadily decreases as we move away from its highest value
f(μ) = 1/(σ√(2π)).
Therefore, the relative (and also the absolute) probability that X will take a value near μ is the highest. Since f(x) → 0 as x → ±∞, exponentially fast,
g(k) = P{|X − μ| ≤ kσ} → 1,   as k → ∞,
very fast. In fact, it can be shown that g(1) = 0.6827, g(2) = 0.9544, g(3) = 0.9973 and g(4) = 0.9999. For practical purposes g(k) = 1 for k ≥ 4.
Some Important Facts about the Normal Distribution
Fact 1: If X ~ N(μ, σ²) and Y = aX + b, where a and b are two constants with a ≠ 0, then Y ~ N(aμ + b, a²σ²).
For example, if X ~ N(2, 9) and Y = 5X + 1, then E(Y) = (5)(2) + 1 = 11, Var(Y) = (5²)(9) = 225 and Y ~ N(11, 225).
Proof We will consider the case a > 0. The proof for the a < 0 case is left as an exercise.
The distribution function of Y, denoted here by G, is given by
G(y) = P(Y ≤ y) = P(aX + b ≤ y) = P(X ≤ (y − b)/a) = F((y − b)/a),
where F is the distribution function of X. The density function g(y) of Y can now be found by differentiating G(y). That is,
g(y) = G′(y) = (d/dy) F((y − b)/a) = (1/a) f((y − b)/a) = (1/(aσ√(2π))) exp{−[y − (aμ + b)]²/(2a²σ²)},
which is the density of a N(aμ + b, a²σ²) random variable.
2
Standardized Normal
An important particular case emerges when a = 1/σ and b = −μ/σ. In this case the transformed variable is denoted by Z and called standard normal. Since
Z = (1/σ)X − (μ/σ) = (X − μ)/σ,
by Fact 1, the parameters of the new normal variable, Z, can be obtained from those of the given normal variable, X, (μ and σ²) as follows:
μ → aμ + b = (1/σ)μ − (μ/σ) = 0
and
σ² → a²σ² = (1/σ)²σ² = 1.
That is, any given normal random variable X ~ N(μ, σ²) can be transformed into a standard normal Z ~ N(0, 1) by the equation
Z = (X − μ)/σ.     (5.1)
[Figure: the standard normal density, illustrating the identity P(Z < −1) = 1 − P(Z < 1).]
The distribution function of the standard normal is denoted Φ and is given by
Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) exp{−t²/2} dt.
Since the standard normal density φ(z) is symmetric about zero [φ(−z) = φ(z) for all z] we have the important identity
Φ(−z) = 1 − Φ(z)   for all z.
See Figure 4.2. For example,
Φ(−1) = 1 − Φ(1) = 1 − 0.8413447 = 0.1586553.
Fact 3: The normal density cannot be integrated in closed form. That is, there are no simple
formulas for calculating expressions like
F(x) = ∫_{−∞}^{x} f(t) dt   or   P(a < X < b) = ∫_a^b f(t) dt.
c = (3)(1.64) + 2 = 6.92.
(f) The value of c such that P(|X − 2| > c) = 0.10 is calculated as follows:
P(|X − 2| > c) = P[|Z| > c/3] = 1 − P[|Z| ≤ c/3] = 1 − {2Φ(c/3) − 1} = 2[1 − Φ(c/3)] = 0.10.
Therefore,
Φ(c/3) = 0.95   ⟹   c/3 = 1.64   ⟹   c = (3)(1.64) = 4.92.
2
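Probabilities and quantiles of this kind are usually computed with software rather than with tables. A sketch using Python and scipy, assuming X ~ N(2, 9) as in the example above:

from scipy.stats import norm

mu, sigma = 2, 3          # X ~ N(2, 9), as in the example

# Part (f): the value c with P(|X - mu| > c) = 0.10.
# Equivalently P(|Z| <= c/sigma) = 0.90, so c/sigma is the 0.95 quantile of Z.
c = sigma * norm.ppf(0.95)
print(round(c, 2))        # about 4.93 (the table value 1.64 gives 4.92)

# Direct check: P(|X - mu| > c) should be 0.10.
prob = norm.cdf(mu - c, mu, sigma) + (1 - norm.cdf(mu + c, mu, sigma))
print(round(prob, 3))     # 0.10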
Fact 4: If X ~ N(μ, σ²), then
E(X) = μ   and   Var(X) = σ².
Proof It suffices to prove that E(Z) = 0 and Var(Z) = 1, because from (5.1),
X = σZ + μ,
and then we would have E(X) = E(σZ + μ) = σE(Z) + μ = μ and Var(X) = Var(σZ + μ) = σ²Var(Z) = σ². By symmetry, we must have E(Z) = 0. In fact, since φ′(z) = −(z/√(2π)) exp{−z²/2} = −zφ(z), it follows that
E(Z) = ∫_{−∞}^{∞} z φ(z) dz = −∫_{−∞}^{∞} φ′(z) dz = −φ(z)|_{−∞}^{∞} = 0.
Moreover, integrating by parts,
Var(Z) = E(Z²) = ∫_{−∞}^{∞} z² φ(z) dz = −∫_{−∞}^{∞} z φ′(z) dz = −zφ(z)|_{−∞}^{∞} + ∫_{−∞}^{∞} φ(z) dz = 1.
2
Fact 5: Suppose that X1, X2, . . . , Xn are independent normal random variables with means E(Xi) = μi and variances Var(Xi) = σi². Let Y be a linear combination of the Xi, that is,
Y = a1X1 + a2X2 + . . . + anXn,
where the ai (i = 1, . . . , n) are some given constant coefficients. Then,
Y ~ N(a1μ1 + a2μ2 + . . . + anμn, a1²σ1² + a2²σ2² + . . . + an²σn²).
Proof The proof that Y is normal is beyond the scope of this course. On the other hand, to show that
E(Y) = a1μ1 + a2μ2 + . . . + anμn
and
Var(Y) = a1²σ1² + a2²σ2² + . . . + an²σn²
is very easy, using Properties 2 and 5 for the mean and the variance of sums of random variables.
2
Example 5.2 Suppose that X1 and X2 are independent, X1 N (2, 4), X2 N (5, 3) and
Y = 0.5X1 + 2.5X2. By Fact 5, Y is normally distributed with mean
E(Y) = (0.5)(2) + (2.5)(5) = 13.5
and variance
Var(Y) = (0.5)²(4) + (2.5)²(3) = 19.75.
Therefore,
P(Y > 15) = 1 − Φ((15 − 13.5)/√19.75) = 1 − Φ(0.34) ≈ 0.37.
An important particular case arises when X1, . . . , Xn is a normal sample, that is, when the variables X1, . . . , Xn are independent, identically distributed, normal random variables, with mean μ and variance σ². One can think of the Xi's as a sequence of n independent measurements of the normal random variable X ~ N(μ, σ²). μ is usually called the population mean and σ² is usually called the population variance.
If the coefficients ai are all equal to 1/n, then Y is equal to the sample average:
Y = Σ_{i=1}^{n} (1/n) Xi = (1/n) Σ_{i=1}^{n} Xi = X̄.
By Fact 5, then, the normal sample average is also a normal random variable, with mean
μ Σ_{i=1}^{n} ai = μ Σ_{i=1}^{n} (1/n) = μ
and variance
σ² Σ_{i=1}^{n} ai² = σ² Σ_{i=1}^{n} (1/n²) = nσ²/n² = σ²/n.
That is, X̄ ~ N(μ, σ²/n).
Example 5.3 Suppose that X1, X2, . . . , X16 are independent N(μ, 4) and X̄ is their average.
(a) Calculate P(|X1 − μ| < 1) and P(|X̄ − μ| < 1). (b) Calculate P(|X̄ − μ| < 1) when the sample size is 25 instead of 16. (c) Comment on the result of your calculations.
Solution
(a) Since X1 ~ N(μ, 4), X1 − μ ~ N(0, 4) and so
P(|X1 − μ| < 1) = 2Φ(1/2) − 1 = 2Φ(0.5) − 1 = 0.383.
Moreover, since X̄ ~ N(μ, 4/16), X̄ − μ ~ N(0, 1/4) and
P(|X̄ − μ| < 1) = 2Φ(2) − 1 = 0.954.
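The effect of averaging in Example 5.3 can also be seen by simulation. A minimal Python sketch (μ is set to 0, which involves no loss of generality since only X̄ − μ matters):

import random

random.seed(3)
mu, sigma, trials = 0.0, 2.0, 20_000   # Xi ~ N(mu, 4)

def prob_within_one(n):
    # Estimate P(|X_bar - mu| < 1) for a sample of size n.
    hits = 0
    for _ in range(trials):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) < 1:
            hits += 1
    return hits / trials

print(round(prob_within_one(1), 3))    # about 0.383 (a single observation)
print(round(prob_within_one(16), 3))   # about 0.954
print(round(prob_within_one(25), 3))   # about 0.988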
5.2 Checking Normality
For a sample of size n, the i-th theoretical quantile of a distribution F is the quantile of order (i − 0.5)/n. That is,
qi = F⁻¹[(i − 0.5)/n],
where F⁻¹ denotes the inverse of F. In the special case of the standard normal the theoretical quantiles will be denoted by di. They are given by the formula
di = Φ⁻¹[(i − 0.5)/n],
where, as usual, Φ denotes the standard normal distribution function. In the case of a normal random variable, X, with mean μ and variance σ², we have
P(X ≤ qi) = Φ[(qi − μ)/σ] = (i − 0.5)/n,
and therefore
(qi − μ)/σ = Φ⁻¹[(i − 0.5)/n] = di   ⟹   qi = μ + σ di.
[Figure: normal quantile–quantile plots (ordered sample values versus the theoretical quantiles di) for several simulated samples.]
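In practice the theoretical quantiles di and the normal quantile–quantile plot are produced with a few lines of code rather than by hand. A Python sketch using matplotlib (the simulated data below merely stand in for a real sample):

import random
import matplotlib.pyplot as plt
from statistics import NormalDist

random.seed(4)
sample = sorted(random.gauss(10, 2) for _ in range(50))   # ordered sample
n = len(sample)

# Theoretical standard-normal quantiles d_i = Phi^{-1}((i - 0.5) / n).
d = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# For a normal sample the points (d_i, x_(i)) should fall near a straight
# line with intercept mu and slope sigma.
plt.scatter(d, sample)
plt.xlabel("standard normal quantiles")
plt.ylabel("ordered sample values")
plt.show()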
5.3 Exercises
5.3.1 Exercise Set A
Problem 5.1 A machine operation produces steel shafts having diameters that are normally
distributed with a mean of 1.005 inches and a standard deviation of 0.01 inch. Specifications
call for diameters to fall within the interval 1.00 ± 0.02 inches. What percentage of the output
of this operation will fail to meet specifications? What should be the mean diameter of the
shafts produced in order to minimize the fraction not meeting specifications?
Problem 5.2 Extruded plastic rods are automatically cut into nominal lengths of 6 inches.
Actual lengths are normally distributed about a mean of 6 inches and their standard deviation
is 0.06 inch.
(a) What proportion of the rods exceeds the tolerance limits of 5.9 inches to 6.1 inches?
(b) To what value does the standard deviation need to be reduced if 99% of the rods must
be within tolerance?
Problem 5.3 Suppose X1 and X2 are independent and identically distributed N (0, 4), and
define Y = max(X1 , X2 ). Find the density and the distribution functions of Y .
Problem 5.4 Assume that the height of UBC students is a normal random variable with
mean 5.65 feet and standard deviation 0.3 feet.
(a) Calculate the probability that a randomly selected student has height between 5.45 and
5.85 feet.
(b) What is the proportion of students above 6 feet?
Problem 5.5 The raw scores in a national aptitude test are normally distributed with mean
506 and standard deviation 81.
(a) What proportion of the candidates scored below 574?
(b) Find the 30th percentile of the scores.
Problem 5.6 Scores on a certain nationwide college entrance examination follow a normal
distribution with a mean of 500 and a standard deviation of 100.
(a) If a school admits only students who score over 670, what proportion of the student pool will be eligible for admission?
(b) What admission requirement would you set if only the top 15% are to be eligible?
Problem 5.7 A machine is designed to cut boards at a desired length of 8 feet. However,
the actual length of the boards is a normal random variable with standard deviation 0.2 feet.
The mean can be set by the machine operator. At what mean length should the machine be
set so that only 5 per cent of the boards are under cut (that is, under 8 feet)?
Problem 5.8 The temperature reading X from a thermocouple placed in a constant-temperature medium is normally distributed with mean μ, the actual temperature of the medium, and standard deviation σ.
(a) What would the value of σ have to be to ensure that 95% of all readings are within 0.1° of μ?
(b) Consider the difference between two observations X1 and X2 (here we could assume that X1 and X2 are i.i.d.). What is the probability that the absolute value of this difference is at most 0.075?
Problem 5.9 Suppose the random variable X follows a normal distribution with mean μ = 50 and standard deviation σ = 5.
(a) Calculate the probability P(|X| > 60).
(b) Calculate E(X²) and the interquartile range of X.
5.3.2 Exercise Set B
Problem 5.12 A scholarship is offered to students who graduate in the top 5% of their
class. Rank in the class is based on GPA (4.00 being perfect). A professor tells you the
marks are distributed normally with mean 2.64 and variance 0.5831. What GPA must you
get to qualify for the scholarship?
Problem 5.13 If the test scores of 40 students are normally distributed with a mean of 65
and a standard deviation of 10.
(a) Calculate the probability that a randomly selected student scored between 50 and 80;
(b) If two students are randomly selected, calculate the probability that the difference between
their scores is less than 10.
Problem 5.14 The length of trout in a lake is normally distributed with mean μ = 0.93 feet and standard deviation σ = 0.5 feet.
(a) What is the probability that a randomly chosen trout in the lake has a length of at least 0.5 feet?
(b) Suppose now that σ is unknown. What is the value of σ if we know that 85% of the trout in the lake are less than 1.5 feet long? Use the same mean 0.93.
Problem 5.15 The life of a certain type of electron tube is normally distributed with mean
95 hours and standard deviation 6 hours. Four tubes are used in an electronic system. Assume
that these tubes alone determine the operating life of the system and that, if any one fails,
the system is inoperative.
(a) What is the probability that a tube lasts at least 100 hours?
(b) What is the probability that the system will operate for more than 90 hours?
Problem 5.16 A product consists of an assembly of three components. The overall weight
of the product, Z, is equal to the sum of the weights X1 , X2 and X3 of its components.
Because of variability in production, they are independent random variables, each normally
distributed as N (2, 0.02), N (1, 0.010) and N (3, 0.03), respectively. What is the probability
that Z will meet the overall specification 6.00 ± 0.30 inches?
Problem 5.17 Due to variability in raw materials and production conditions, the weight
(in hundreds of pounds) of a concrete beam is a normal random variable with mean 31 and
standard deviation 0.50.
(a) Calculate the probability that a randomly selected beam weighs between 3000 and 3200
pounds.
(b) Calculate the probability that 25 randomly selected beams will weigh more than 79,500
pounds in total.
Problem 5.18 A machine fills 250-pound bags of dry concrete mix. The actual weight of
the mix that is put in the bag is a normal random variable with standard deviation σ = 0.40
pound. The mean can be set by the machine operator. At what mean weight should the
machine be set so that only 10 per cent of the bags are underweight? What about the larger
500-pound bags?
Problem 5.19 Check if the following samples are normal. Describe the type of departure
from normality when appropriate.
(a) 2.52 3.06 2.41 3.98 2.63 4.11 4.66 5.83 4.80 6.17 4.44 5.38 5.02 1.09 3.31 2.72 1.75 3.81
4.45 2.93
(b) 2.15 -3.46 1.12 0.25 -1.42 0.06 -1.16 -2.24 -1.50 0.37 0.66 -0.76 6.24 0.36 -0.40 0.52 -0.97
0.36 1.74 -0.65
(c) 1.79 -0.65 1.16 1.23 2.80 0.92 -2.62 -5.48 0.75 -2.64 -6.41 0.92 1.14 0.18 0.06 -1.49 -3.99
-10.36 7.12 -1.86
(d) -0.53 0.71 1.40 0.28 -0.65 1.02 -0.71 0.70 1.55 -0.52 -0.73 -1.04 -2.39 0.39 5.71 6.39 4.28
6.70 6.05 5.62
(e) -1.61 -1.29 0.59 -0.33 0.14 1.16 2.02 -0.52 0.69 -0.30 -0.56 0.43 -1.01 0.83 -0.95 0.24 0.01
0.10 0.12 0.07
Chapter 6
Bernoulli Experiments
Some random experiments can be viewed as a sequence of identical and independent trials,
on each of which one of two possible outcomes occurs. Some examples of random experiments
of this kind are
- Recording the number of times the maximum annual wind speed exceeds a certain level v0
(during a fixed number of years).
- Counting the number of years until v0 is exceeded for the first time.
- Testing (pass/no-pass) a number of randomly chosen items.
- Polling some randomly (and independently) chosen individuals regarding some yes/no question, for instance, did you vote in the last provincial election?
Each trial is called a Bernoulli trial and a set of independent Bernoulli trials is called a
Bernoulli process or Bernoulli experiment. The defining features of a Bernoulli experiment
are that each trial results in one of two possible outcomes, that the probability of each outcome
is the same on every trial, and that the trials are independent.
These outcomes refer to the occurrence or not of a certain event, A. They are arbitrarily
called success (when A occurs) and failure (when A^c occurs) and denoted by
S (for success)
and
F (for failure)
The probability of success, p = P(S), is the same on every trial, and so
P(F) = 1 − P(S) = 1 − p = q.
The number of trials in a Bernoulli experiment can either be fixed or random. For example,
if we are considering the number of maximum annual wind speed exceedances of v0 in the
next fifteen years, the number of trials is fixed and equal to 15. On the other hand, if we are
considering the number of years until v0 is first exceeded, the number of trials is random.
6.2
Given a Bernoulli experiment of size n (n independent Bernoulli trials), there are n Bernoulli
random variables Y1 , Y2 , . . . , Yn associated with it. The random variable Yi (i = 1, . . . , n)
depends only on the outcome of the ith trial and is defined as follows
Yi = 1, if the ith trial results in S,
Yi = 0, if the ith trial results in F.
That is, Yi indicates whether the ith trial results in an S.
The variables Yi are very simple. By definition, they are independent and their common
density function is
f(y) = p^y (1 − p)^(1−y), y = 0, 1.
The mean and the variance of Yi (they are, of course, the same for all i = 1, . . . , n) are given
by
E(Yi) = (0)f(0) + (1)f(1) = p,
and
Var(Yi) = (0 − p)^2 f(0) + (1 − p)^2 f(1) = p^2 q + q^2 p = pq,
where q = 1 − p.
The student can check that the variance is maximized when p = q = 0.5. This result is hardly
surprising as the uncertainty is clearly maximized when S and F are equally likely. On the
other hand, the uncertainty is clearly smaller for smaller or larger values of p. For example,
if p = 0.01 we can feel very confident that most of the trials will result in failures. Similarly,
if p = 0.99 we can confidently predict that most of the trials will result in successes.
Binomial Random Variable (B(n, p)).
Given a Bernoulli experiment of fixed size n, the corresponding Binomial random variable
X is defined as the total number of Ss in the sequence of Fs and Ss that constitutes the
outcome of the experiment. That is,
X = Y1 + Y2 + . . . + Yn.
Using properties (2) and (4) of the mean and variance of random variables,
E(X) = E(Y1 + Y2 + . . . + Yn) = E(Y1) + E(Y2) + . . . + E(Yn) = np,
and
Var(X) = Var(Y1 + Y2 + . . . + Yn) = Var(Y1) + Var(Y2) + . . . + Var(Yn) = npq,
where q = 1 − p.
The density function of X is
f(x) = [n!/(x!(n − x)!)] p^x q^(n−x), for all x = 0, 1, . . . , n,     (6.1)
where
n!/(x!(n − x)!) = [n(n − 1) . . . (2)(1)] / {[x(x − 1) . . . (2)(1)] [(n − x)(n − x − 1) . . . (2)(1)]}
is the binomial coefficient, usually denoted (n x). For example,
5!/(3!2!) = [(5)(4)(3)(2)(1)] / {[(3)(2)(1)][(2)(1)]} = 10.
To derive the density (6.1) first notice that X takes the value x only if x of the Yi are equal
to one and the remainder are equal to zero. The probability of this event is p^x q^(n−x). In
addition, the n variables Yi can be divided into two groups of x and n − x variables in (n x)
many different ways.
The distribution function of X doesn't have a simple closed form and can be obtained
from Table A5 for a limited set of values of n and p.
Example 6.1 Suppose that the logarithm of the operational life of a machine, T (in hours),
has a normal distribution with mean 15 and standard deviation 7. If a plant has 20 of these
machines working independently, (a) what is the probability that more than one machine
will break down before 1500 hours of operation? (b) how many more machines are needed if
the expected number of machines that will not break down before 1500 hours of operation
must be larger than 18?
Solution The number of machines breaking down before 1500 hours of operation, X, is a
binomial random variable with n = 20 and
p = P(T < 1500) = P(log(T) < log(1500)) = Φ[(log(1500) − 15)/7] = Φ(−1.1) = 1 − Φ(1.1) = 1 − 0.8643 = 0.14.
(a) First we notice that
P(X > 1) = 1 − P[X ≤ 1].
Since
P[X ≤ 1] = P(X = 0) + P(X = 1) = (20 0)(0.86)^20 + (20 1)(0.14)(0.86)^19 ≈ 0.049 + 0.160 = 0.209,
we get P(X > 1) ≈ 1 − 0.209 = 0.79.
6.3
The expected value, τ, of the number of trials before the first occurrence of a certain event,
A, is called the return period of that event. For example, the return period of the event
"maximum annual wind speed exceeds v0" is equal to the expected number of years before
v0 is exceeded for the first time.
The number of trials itself is a discrete random variable, X, with Geometric density
f(x) = p(1 − p)^(x−1), x = 1, 2, . . .     (6.2)
The derivation of (6.2) is fairly straightforward: first of all, it is clear that the range of X is
equal to {1, 2, . . .}. Furthermore, we can have X = x only if the event A^c occurs during the
first x − 1 trials and A occurs in the xth trial. In other words, we must have a sequence of
x − 1 failures followed by a success. Because of the independence of the trials in a Bernoulli
experiment, it is clear that the probability of such a sequence is equal to p(1 − p)^(x−1).
To check that f(x) = pq^(x−1) is actually a probability density function we must verify that
Σ_{x=1}^∞ f(x) = 1.
In fact, using the well known formula for the sum of a geometric series with rate 0 < q < 1,
[1 + q + q^2 + . . . ] = 1/(1 − q),
we obtain
Σ_{x=1}^∞ f(x) = Σ_{x=1}^∞ pq^(x−1) = p[1 + q + q^2 + . . . ] = p/(1 − q) = 1.
The expected value of X is
E(X) = Σ_{x=1}^∞ x p(1 − p)^(x−1) = p Σ_{x=1}^∞ x(1 − p)^(x−1)
     = p { −d/dp [ Σ_{x=1}^∞ (1 − p)^x ] }
     = p { −d/dp [ (1 − p)(1 + (1 − p) + (1 − p)^2 + . . . ) ] }
     = p { −d/dp [ (1 − p)/p ] } = p (1/p^2) = 1/p.
The return period of A is then inversely proportional to p = P (A). If p = P (A) is small then
we must wait, on average, a large number of periods until the first occurrence of A. On
the other hand, if p is large then we must wait, on average, a small number of periods for
the first occurrence of A.
The student will be asked to show (see Problem 6.6) that the variance of X is given by
Var(X) = (1 − p)/p^2 = τ(τ − 1).
One may well ask the question: why is τ called the return period? The reason for this becomes
clear after we notice that, because of the assumed independence, the expected number of trials
before the first occurrence of A is the same as the expected number of trials between any two
consecutive occurrences of A.
Example 6.2 Suppose that a structure has been designed for a 25-year rain (that is, a
rain that occurs on average every 25 years).
(a) What is the probability that the design annual rainfall will be exceeded for the first time
on the sixth year after completion of the structure?
(b) If the annual rainfall Y (in inches) is normal with mean 55 and variance 16, what is the
corresponding design rainfall?
Solution
(a) To say that a certain structure has been designed for a 25-year rain means that it has
been designed for an annual rainfall with a return period of 25 years.
The return period, τ, is equal to 25, and therefore the probability of exceeding the design
annual rainfall in any given year is
p = 1/τ = 1/25 = 0.04.
If X represents the number of years until the first time the design annual rainfall is exceeded,
then
P(X = 6) = (0.04)(0.96)^(6−1) = (0.04)(0.96)^5 = 0.033
is the required probability.
(b) The design rainfall, v0, must satisfy the equation
P(Y > v0) = 0.04,
or equivalently,
Φ[(v0 − 55)/4] = 0.96.
From the Standard Normal Table we find that Φ(1.75) = 0.96. Therefore,
(v0 − 55)/4 = 1.75 and v0 = (4)(1.75) + 55 = 62. □
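Both parts can be checked numerically. The following is a minimal sketch using scipy.stats (geom and norm are standard scipy distributions; norm.ppf is the normal quantile function, and the return period 25 and the N(55, 16) rainfall model come from the example):

from scipy.stats import geom, norm

p = 1 / 25                                  # probability of exceeding the design rainfall in a year

# (a) first exceedance in year 6: geometric probability p(1 - p)^5
print((1 - p) ** 5 * p)                     # about 0.033
print(geom.pmf(6, p))                       # the same value from scipy's geometric density

# (b) design rainfall v0 with P(Y > v0) = 0.04 for Y ~ N(55, 16)
print(norm.ppf(0.96, loc=55, scale=4))      # about 62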
6.4
Many physical problems of interest to engineers and other applied scientists involve the
possible occurrences of an event A at some points in time and/or space; examples include the
occurrence of large earthquakes over time and the occurrence of accidents at a road intersection.
The process is called a Poisson Process if A is a rare event, that is, if it has the following
properties:
1) The numbers of occurrences of A on non-overlapping intervals are independent.
2) The probability of exactly one occurrence of A on any interval of length Δ is approximately equal to λΔ when Δ is small.
3) The probability of more than one occurrence of A on any interval of length Δ is
approximately equal to (λΔ)^2 when Δ is small (that is, A is a rare event).
The constant λ is called the rate of the process. The discrete random variable X described
above (the number of occurrences of A on an interval of fixed length Δ) has the so called
Poisson density function
f(x) = exp{−λΔ}(λΔ)^x / x!, x = 0, 1, 2, . . .
and the continuous random variable T (the time between consecutive occurrences of A, or
inter-arrival time) has the so called exponential density
f(t) = λ exp{−λt}, t > 0.
The derivation of these densities from assumptions 1), 2) and 3) is not very difficult. The
interested student can read the heuristic derivation given at the end of this chapter.
Example 6.3 In Southern California there is on average one earthquake per year with
Richter magnitude 6.1 or greater (big earthquakes).
(a) What is the probability of having three or more big earthquakes in the next five years?
(b) What is the most likely number of big earthquakes in the next 15 months?
(c) What is the probability of having a period of 15 months without a big earthquake?
(d) What is the probability of having to wait more than three and a half years until the
occurrence of the next four big earthquakes?
Solution We assume that the sequence of big earthquakes follows a Poisson process with
(average) rate λ = 1 per year.
(a) The number X of big earthquakes in the next five years is a Poisson random variable
with mean λΔ = 5 and so, using the Poisson Table, we get
P(X ≥ 3) = 1 − P(X < 3) = 1 − F(2) = 1 − 0.125 = 0.875.
(b) In general, a Poisson density f(x) with mean μ is increasing at x (x ≥ 1) if and only
if the ratio f(x)/f(x − 1) > 1. Since
f(x)/f(x − 1) = [exp{−μ}μ^x / x!] / [exp{−μ}μ^(x−1) / (x − 1)!] = μ/x,
it follows that
f(x) > f(x − 1) when x < μ,
f(x) = f(x − 1) when x = μ,
f(x) < f(x − 1) when x > μ.
Therefore, the largest value of f(x) is achieved when x = [μ], where
[μ] = integer part of μ.
So, the most likely number of big earthquakes is [1.25] = 1 (notice that 15 months = 1.25
years).
(c) The waiting time T to the next big earthquake is an exponential random variable with
rate λ = 1 per year, with distribution function
F(t) = 1 − exp{−λt}.
Therefore,
P{T > 1.25} = 1 − F(1.25) = 1 − [1 − exp{−1.25}] = 0.287.
(d) Let Y represent the number of big earthquakes in the next three and a half years and let
W represent the waiting time (in years) until the occurrence of the next four big earthquakes.
We notice that Y is a Poisson random variable with rate 3.5 and that W is larger than
3.5 years if and only if Y is less than 4. So,
P [W > 3.5] = P [Y < 4] = F (3) = 0.5366.
□
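All four answers can be reproduced with scipy.stats. This is a minimal sketch under the same assumption as the example (a Poisson process with rate λ = 1 big earthquake per year); poisson and expon are standard scipy distributions:

from scipy.stats import poisson, expon

lam = 1.0                                    # big-earthquake rate, per year

# (a) three or more big earthquakes in 5 years: X ~ Poisson(5)
print(1 - poisson.cdf(2, 5 * lam))           # about 0.875

# (b) most likely count in 15 months = mode of Poisson(1.25)
counts = range(0, 10)
probs = [poisson.pmf(k, 1.25 * lam) for k in counts]
print(max(counts, key=lambda k: probs[k]))   # 1

# (c) 15 months without a big earthquake: exponential waiting time
print(expon.sf(1.25, scale=1 / lam))         # about 0.287

# (d) waiting time for the next 4 quakes exceeds 3.5 years
#     <=> fewer than 4 quakes in 3.5 years
print(poisson.cdf(3, 3.5 * lam))             # about 0.537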
Means and Variances The means of X and T are of practical interest, as they represent
the expected number of occurrences on a period of length Δ and the expected waiting time
between consecutive occurrences, respectively. We will see that, not surprisingly,
E(X) = λΔ and E(T) = 1/λ,
and also that
Var(X) = λΔ and Var(T) = 1/λ^2.
In fact,
E(X) = Σ_{x=0}^∞ x f(x) = Σ_{x=0}^∞ x exp{−λΔ}(λΔ)^x / x! = Σ_{x=1}^∞ exp{−λΔ}(λΔ)^x / (x − 1)!
     = exp{−λΔ}(λΔ) Σ_{x=1}^∞ (λΔ)^(x−1) / (x − 1)! = exp{−λΔ}(λΔ) exp{λΔ} = λΔ.
Similarly,
E[X(X − 1)] = Σ_{x=0}^∞ x(x − 1) exp{−λΔ}(λΔ)^x / x! = Σ_{x=2}^∞ exp{−λΔ}(λΔ)^x / (x − 2)!
            = exp{−λΔ}(λΔ)^2 Σ_{x=2}^∞ (λΔ)^(x−2) / (x − 2)! = (λΔ)^2.
Therefore,
E(X^2) = E[X(X − 1)] + E(X) = (λΔ)^2 + λΔ,
and
Var(X) = E(X^2) − [E(X)]^2 = (λΔ)^2 + λΔ − (λΔ)^2 = λΔ.
To calculate E(T), we use integration by parts with
u = t and dv = λ exp{−λt} dt,
to get
E(T) = ∫_0^∞ t f(t) dt = ∫_0^∞ t λ exp{−λt} dt = ∫_0^∞ exp{−λt} dt = 1/λ.
Similarly, using integration by parts with u = t^2 and dv = λ exp{−λt} dt, we get
E(T^2) = ∫_0^∞ t^2 f(t) dt = ∫_0^∞ t^2 λ exp{−λt} dt = 2 ∫_0^∞ t exp{−λt} dt = (2/λ) ∫_0^∞ t λ exp{−λt} dt = 2/λ^2.
Finally,
Var(T) = E(T^2) − [E(T)]^2 = (2/λ^2) − (1/λ^2) = 1/λ^2.
Consider now the waiting time W (in years) until the 25th big earthquake. Notice that
W = T1 + T2 + . . . + T25 = Σ_{i=1}^{25} Ti,
where the Ti are independent exponential inter-arrival times with rate λ = 1. Therefore,
E(W) = E(Σ_{i=1}^{25} Ti) = Σ_{i=1}^{25} E(Ti) = Σ_{i=1}^{25} 1 = 25,
and
Var(W) = Var(Σ_{i=1}^{25} Ti) = Σ_{i=1}^{25} Var(Ti) = Σ_{i=1}^{25} 1 = 25, so that SD(W) = 5. □
6.5
When n is large and p is small (as a rule of thumb, when min{np, n(1 − p)} < 5), the binomial
B(n, p) probabilities can be approximated by Poisson probabilities with the same mean, np:
P(X = x) ≈ exp{−np}(np)^x / x!, for all x = 0, . . . , n.
Example 17: On average, one per cent of the 50-kg dry concrete bags are underfilled below
49.5 kg. What is the probability of finding 4 or more of these underfilled bags in a lot of 200?
Solution: Since n = 200 and p = 0.01,
min{np, n(1 − p)} = min{2, 198} = 2 < 5.
Since n is large and np = 2 is small, we can use the Poisson approximation
P[B(200, 0.01) ≥ 4] ≈ P[P(2) ≥ 4] = 1 − P[P(2) < 4] = 1 − F(3) = 1 − 0.857 = 0.143.
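A minimal sketch comparing the exact binomial probability with the Poisson approximation used above (binom and poisson are standard scipy distributions; n = 200 and p = 0.01 come from the example):

from scipy.stats import binom, poisson

n, p = 200, 0.01
exact = 1 - binom.cdf(3, n, p)               # P[B(200, 0.01) >= 4]
approx = 1 - poisson.cdf(3, n * p)           # P[P(2) >= 4]
print(exact, approx)                         # about 0.142 and 0.143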
6.6
Let m be some fixed integer number. If Yi is the number of occurrences of the event A in
the interval ((i − 1)/m, i/m], then the total number of occurrences, X, in the interval (0, 1]
(we are taking Δ = 1 for simplicity) can be written as
X = Y1 + Y2 + . . . + Ym.
By property 3), when m is large the probability of more than one occurrence of A in any one
of these small intervals is negligible, and so the variables Yi are approximately Bernoulli
random variables; by property 2), their common success probability is approximately λ/m.
By the above remarks, the random variable X is approximately Binomial, B(m, λ/m),
when m is large. Of course, the larger m, the better the approximation, and in the limit
(when m → ∞) the approximation becomes exact. Therefore, the probability that X will
take any fixed value x can be obtained from the limit, as m → ∞, of the binomial expression
P(X = x) = (m x)[λ/m]^x [1 − λ/m]^(m−x).
Since, as m → ∞, we have
m(m − 1)(m − 2) . . . (m − x + 1)/m^x = (m/m)((m − 1)/m)((m − 2)/m) . . . ((m − x + 1)/m) → 1,
[1 − λ/m]^(−x) → 1,
and
[1 − λ/m]^m → exp{−λ},
we obtain that, as m → ∞,
(m x)[λ/m]^x [1 − λ/m]^(m−x) = (m/m)((m − 1)/m) . . . ((m − x + 1)/m) [1 − λ/m]^(−x) [1 − λ/m]^m λ^x / x! → exp{−λ} λ^x / x!,
which is the Poisson density with mean λ. In particular, this justifies the P(np) approximation
to the B(n, p) when n is large and p is small. The requirement that n is large corresponds to
m being large and the requirement that p is small corresponds to λ/m being small.
To derive the Exponential density of T, we reason as follows: The waiting time T until
the first occurrence of A will be larger than t if and only if the number of occurrences X in
the period (0, t) is equal to zero. Since X ∼ P(λt),
P(T ≤ t) = 1 − P(T > t) = 1 − P(X = 0) = 1 − [exp{−λt}(λt)^0 / 0!] = 1 − exp{−λt},
and differentiating this distribution function with respect to t gives the exponential density
f(t) = λ exp{−λt}.
6.7 Exercises
6.7.1 Exercise Set A
Problem 6.1 A weighted coin is flipped 200 times. Assume that the probability of a head
is 0.3 and the probability of a tail is 0.7. Each flip is independent from the other flips. Let
X be the total number of heads in the 200 flips.
(a) What is the distribution of X?
(b) What is the expected value of X and variance of X?
(c) What is the probability that X equals 35?
(d) What is the approximate probability that X is less than 45?
Note: Come back to this question after you have learned about normal approximations in the
next chapter.
Problem 6.2 Suppose it is known that a treatment is successful in curing a muscular pain
in 50% of the cases. If it is tried on 15 patients, find the probabilities that:
(a) At most 6 will be cured.
(b) The number cured will be no fewer than 6 and no more than 10.
(c) Twelve or more will be cured.
(d) Calculate the mean and the standard deviation.
Problem 6.3 The office of a particular U.S. Senator has on average five incoming calls per
minute. Use the Poisson distribution to find the probabilities that there will be:
(a) exactly two incoming calls during any given minute;
(b) three or more incoming calls during any given minute;
(c) no incoming calls during any given minute.
(d) What is the expected number of calls during any given period of five minutes?
Problem 6.4 A die is colored blue on 5 of its sides and green on the other side. This die
is rolled 8 times. Assume each roll of the die is independent of the other rolls. Let X be
the number of times blue comes up in the 8 rolls of the die.
(a) What is the expected value of X and the variance of X?
(b) What is the probability that X equals 6?
(c) What is the probability that X is greater than 6?
Problem 6.5 A factory produced 10,000 light bulbs in February, of which 500 are defective.
Suppose 20 bulbs are randomly inspected. Let X denote the number of defectives in the sample.
(a) Calculate P(X = 2).
(b) If the sample size, i.e., the number of inspected bulbs, is large, how would you
calculate P(X ≥ 2) approximately? For n = 200, calculate this probability approximately.
6.7.2 Exercise Set B
Problem 6.6 Let X be a random variable with geometric density (6.2). Show that
Var(X) = (1 − p)/p^2.
Problem 6.12 The number of killer whales arriving at the Pacific Rim Observatory Station
follows a Poisson Process with rate λ = 4 per hour.
(a) What are the expected number of arrivals and the variance during the next hour?
(b) What is the probability that the waiting time T between two consecutive arrivals will be
30 minutes or more?
(c) What are the expected value and the variance of the time until the next 20 killer whales
arrive at the Observatory Station?
Problem 6.13 Car accidents are random and can be said to follow a Poisson distribution.
At a certain intersection in East Vancouver there are, on average, 4 accidents a week. Answer
the following questions:
(a) What is the probability of there being no accidents at this intersection next week?
(b) The record for accidents in one month at a single intersection is 20. Find the probability
that this record will be broken, at this intersection, next month. (Assume 30 days in one
month)
(c) What is the expected waiting time for 20 accidents to occur?
Problem 6.14 A test consists of ten multiple-choice questions with five possible answers.
For each question, there is only one correct answer out of the five possible answers. If a student
randomly chooses one answer for each question, calculate the probabilities that
(a) at most three questions are answered correctly;
(b) five questions are answered correctly;
(c) all questions are answered correctly.
(d) Also calculate the mean and the standard deviation of the number of correct answers.
Problem 6.15 The number of meteorites hitting Mars follows a Poisson process with rate λ = 6 per month.
(a) What is the probability that at least 2 meteorites hit Mars in any given month?
(b) Find the probability that exactly 10 meteorites hit Mars in the next 6 months.
(c) What is the expected number of meteorites hitting Mars in the next year?
Problem 6.16 A biased coin is flipped 10 times independently. The probability of tails is
0.4. Let X be the total number of heads in the 10 flips.
(a) Use a computer to find P (X = 4);
(b) Use the Binomial table to find P (1 < X < 5);
(c) What is the probability that one has to flip at least 5 times to get the first head?
Problem 6.17 Three identical fair coins are tossed simultaneously until all three show the
same face.
(a) What is the probability that they are tossed more than three times?
(b) Find the mean for the number of tosses.
Chapter 7
Example 7.1 A system consists of 25 independent parts connected in such a way that the ith
part automatically turns on when the (i − 1)th part burns out. The expected lifetime of each
part is 10 weeks and the standard deviation is equal to 4 weeks. (a) Calculate the expected
lifetime and standard deviation for the system. (b) Calculate the probability that the
system will last more than its expected life. (c) Calculate the probability that the system
will last more than 1.1 times its expected life. (d) What are the (approximate) median life
and interquartile range for the system?
Solution
(a) Let Xi denote the lifetime of the ith component and let
T = Σ_{i=1}^{25} Xi.
Then
E(T) = Σ_{i=1}^{25} E(Xi) = 25 × 10 = 250 weeks and Var(T) = Σ_{i=1}^{25} Var(Xi) = 25 × 16 = 400.
Therefore,
SD(T) = √400 = 20 weeks.
(b) By the Central Limit Theorem,
T = 25 X̄ ≈ 25 N(10, 16/25) = N(250, 400) = N(E(T), Var(T)).
Therefore,
P(T > E(T)) = P(T > 250) ≈ 1 − Φ[(250 − 250)/20] = 1 − Φ(0) = 0.5.
(c) First of all notice that 1.1 E(T) = 1.1 × 250 = 275. Now, by the discussion in (b),
P(T > 275) ≈ 1 − Φ[(275 − 250)/20] = 1 − Φ(1.25) = 0.1056.
(d) Let Z denote the standard normal random variable. Using that T ≈ N(250, 400), it
follows that the median life is approximately 250 weeks and, since the quartiles of Z are
±0.6745, the interquartile range is approximately 2 × 0.6745 × 20 ≈ 27 weeks.
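The normal-approximation answers of Example 7.1 can be reproduced with a few lines of Python; this is only a sketch of the calculation above (norm is the scipy.stats normal distribution, and the mean 250 and standard deviation 20 are the values derived in part (a)):

from scipy.stats import norm

mean, sd = 250.0, 20.0                       # E(T) and SD(T) for the 25-part system

print(norm.sf(250, mean, sd))                # (b) P(T > 250) = 0.5
print(norm.sf(275, mean, sd))                # (c) P(T > 275), about 0.106

# (d) approximate median and interquartile range of T
median = norm.ppf(0.50, mean, sd)            # 250
iqr = norm.ppf(0.75, mean, sd) - norm.ppf(0.25, mean, sd)
print(median, iqr)                           # 250 and about 27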
Table 7.1
Midpoint   Frequency
40             15
44             34
48             26
52             23
56             17
60             16
64              4
68             10
Total         145
Example 7.2 Consider Table 7.1 with data on the annual (cumulative) rainfall intensity (X)
on a certain watershed area. The average annual rainfall intensity can be calculated from
Table 7.1 as:
X̄ = [(40)(15) + (44)(34) + . . . + (68)(10)] / 145 = 7388/145 = 50.95.
Since the average has been calculated from a frequency table, using the midpoint of each class
to represent all the points in each class, there is an approximation error to be considered.
How likely is it that this approximation error is (a) larger than 0.05? (b) larger than 0.10? (c)
larger than 0.5?
Solution
To make the required probability calculations we will assume that the rainfall intensities
are uniformly distributed on each interval. This is a reasonable assumption given that we do
not have any additional information on the distribution of values on each class.
Let ri represent the actual annual rainfall intensity (i = 1, 2, . . . , 145) and let mi be the
midpoint of the corresponding class. For instance, if r5 = 50.35 (a value in the class 50-54),
then m5 = 52.0. Let
Ui = ri − mi, i = 1, 2, . . . , 145.
Given our uniformity assumption, the Ui's are uniform random variables on the interval
(−2, 2).
To proceed with our calculation, we will assume that the variables Ui (which represent
the approximation errors) are independent.
Let
r̄ = (r1 + r2 + . . . + r145)/145.
The approximation error, D, in the calculation of X̄ can now be written as
D = r̄ − X̄ = (r1 + r2 + . . . + r145)/145 − (m1 + m2 + . . . + m145)/145 = (U1 + U2 + . . . + U145)/145.
Since D is the average of 145 independent, identically distributed random variables with zero
mean and variance equal to
σ^2 = (1/4) ∫_{−2}^{2} t^2 dt = 4/3,
we can use the (CLT) normal approximation. That is, we can use a normal distribution
with zero mean and variance equal to (4/3)/145 to approximate the distribution of D. The
corresponding standard deviation is √(4/435) = 0.095893.
(a) P(|D| > 0.05) = P(|D|/0.095893 > 0.05/0.095893) ≈ 2[1 − Φ(0.52)] = 0.6084.
(b) P(|D| > 0.1) = P(|D|/0.095893 > 0.1/0.095893) ≈ 2[1 − Φ(1.04)] = 0.2984.
(c) P(|D| > 0.5) = P(|D|/0.095893 > 0.5/0.095893) ≈ 2[1 − Φ(5.21)] = 0. □
Example 6.3 (continued from Chapter 6):
Recall part (h) of Example 6.3 from the previous chapter, which was left unanswered:
(h) What is the approximate probability that the waiting time until the 25th big earthquake
will exceed 27 years?
Solution
(h) Since W is a sum of iid random variables, we can use the Central Limit Theorem to
approximate P(W > 27). Since E(W) = 25 and SD(W) = 5 we have
P(W > 27) = 1 − P(W ≤ 27) ≈ 1 − Φ[(27 − 25)/5] = 1 − Φ(0.40) = 1 − 0.6554 = 0.3446. □
7.2
Let X be a binomial random variable with parameters n and p. When n is large, so that
min{np, n(1 − p)} ≥ 5,
we can use the following approximation:
P(X = k) = P[k − 0.5 < X < k + 0.5] ≈ Φ[(k − np + 0.5)/√(npq)] − Φ[(k − np − 0.5)/√(npq)].     (7.1)
The justification for the approximation above is given by the Central Limit Theorem. In
fact, we have seen before that
X = Y1 + Y2 + . . . + Yn,
where Y1, Y2, . . . , Yn are independent Bernoulli random variables with parameter p. Therefore,
X/n = (Y1 + Y2 + . . . + Yn)/n = Ȳ,
which is approximately N(p, pq/n) when n is large. Therefore,
X = nȲ
is approximately distributed as N(p, pq/n) multiplied by n, that is, N(np, npq). The continuity
correction 0.5, which is added to and subtracted from k, is needed because we are approximating
a discrete random variable with a continuous random variable.
For example, if n = 15 and p = 0.4, then
min{np, n(1 − p)} = min{6, 9} = 6 ≥ 5, np = 6, √(npq) = 1.9,
and
P(X = 8) ≈ Φ[(8 − 6 + 0.5)/1.9] − Φ[(8 − 6 − 0.5)/1.9] = 0.1213.
Using the Binomial Table in the Appendix we have that the exact probability is equal to
P(X = 8) = F(8) − F(7) = 0.9050 − 0.7869 = 0.1181.
Therefore, the approximation error is equal to 0.0032.
The student can verify, as an exercise, the entries in Table 7.2, where P(X = k) is being
approximated using formula (7.1).
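One way to carry out that verification is sketched below; it recomputes every row of Table 7.2 that follows (binom and norm are standard scipy distributions; small discrepancies with the table are expected because the text rounds √(npq) to 1.9):

from math import sqrt
from scipy.stats import binom, norm

n, p = 15, 0.4
mu, sd = n * p, sqrt(n * p * (1 - p))

for k in range(n + 1):
    # normal approximation with continuity correction, formula (7.1)
    approx = norm.cdf((k - mu + 0.5) / sd) - norm.cdf((k - mu - 0.5) / sd)
    exact = binom.pmf(k, n, p)
    print(f"{k:2d}  {approx:.4f}  {exact:.4f}  {approx - exact:+.4f}")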
Table 7.2
 k   Approximated   Exact     Error
 0      0.0016      0.0005    0.0011
 1      0.0070      0.0047    0.0023
 2      0.0240      0.0219    0.0021
 3      0.0605      0.0634   -0.0029
 4      0.1213      0.1268   -0.0055
 5      0.1827      0.1859   -0.0032
 6      0.2051      0.2066   -0.0015
 7      0.1827      0.1771    0.0056
 8      0.1213      0.1181    0.0032
 9      0.0605      0.0612   -0.0007
10      0.0240      0.0245   -0.0005
11      0.0070      0.0074   -0.0004
12      0.0016      0.0016    0.0000
13      0.0003      0.0003    0.0000
14      0.0000      0.0000    0.0000
15      0.0000      0.0000    0.0000
7.3
The Central Limit Theorem can also be used to approximate Poisson probabilities when the
expected number of counts, λ, is large. As a rule of thumb, we will use this approximation
when λ ≥ 20.
The Poisson random variable, X ∼ P(λ), is approximated by the normal random variable
N(λ, λ) with the same mean and variance. In other words,
P(X = x) ≈ Φ[(x − λ + .5)/√λ] − Φ[(x − λ − .5)/√λ],
provided that λ ≥ 20. The continuity correction 0.5, added to and subtracted from x, is needed
because we are approximating a discrete random variable with a continuous random variable.
This approximation is justified by the following argument: consider a Poisson process
with rate 1, and suppose that X represents the number of occurrences in a period of
length λ. We can divide the period into n subintervals of length λ/n and denote by Yi the number of
occurrences in the ith subinterval. It is clear that Y1, . . . , Yn are independent Poisson random
variables with mean λ/n and that
X = Y1 + Y2 + . . . + Yn = nȲ.
By the Central Limit Theorem, X is then approximately N(λ, λ) when λ is large. For example,
if λ = 25,
P(X = 27) ≈ Φ[(27 + .5 − 25)/√25] − Φ[(27 − .5 − 25)/√25] = Φ(0.50) − Φ(0.30) = 0.6915 − 0.6179 = 0.0736.
7.4 Exercises
7.4.1 Exercise Set A
Problem 7.1 Two types of wood (Elm and Pine) are tested for breaking strength. Elm
wood has an expected breaking strength of 56 and a standard deviation of 4. Pine wood has
an expected breaking strength of 72 and a standard deviation of 8. Let X̄ be the sample
average breaking strength of an Elm sample of size 30, and Ȳ be the sample average breaking
strength of a Pine sample of size 40.
be:
(a) between 109 and 112?
(b) greater than 111?
(c) What assumption(s) have you made?
Problem 7.4 The expected amount of sulfur in the daily emission from a power plant is 134
pounds with a standard deviation of 22 pounds. For a random sample of 40 days, find the
approximate probability that the total amount of sulfur emissions will exceed 5, 600 pounds.
Problem 7.5 Suppose we draw two samples of equal size n from a population with unknown
mean but a known standard deviation 3.5. Let X̄ and Ȳ be the corresponding sample averages.
How large would the sample size n be required to be to ensure that P(−1 ≤ X̄ − Ȳ ≤ 1) = 0.90?
Problem 7.6 Suppose X1, . . . , X30 are independent and identically distributed random variables
with mean E(X1) = 10 and variance Var(X1) = 5, and let X̄ = (1/30) Σ_{i=1}^{30} Xi.
7.4.2 Exercise Set B
Problem 7.7 Show that if U has uniform distribution on the interval (0, 1) and F is any
given continuous distribution function, then X = F^(−1)(U) has distribution F. This result can
be used to generate random variables with any given distribution.
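A minimal sketch of this inverse-CDF idea in Python; the exponential distribution F(x) = 1 − exp{−x} is only an illustrative choice (it is not part of the problem statement):

import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=10_000)                  # U ~ Uniform(0, 1)

# For F(x) = 1 - exp(-x) (exponential with rate 1), F^{-1}(u) = -log(1 - u)
x = -np.log(1 - u)

print(x.mean(), x.var())                      # both should be close to 1, the exponential mean and variance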
Problem 7.8 (a) Generate m = 100 samples of size n = 10, of independent random variables
with uniform distribution on the interval (0, 1). Let Xij denote the j th element of the ith
sample (i = 1, 2, . . . , m and j = 1, 2, . . . , n).
Construct the histogram and Q-Q plot for the sample means
X̄i = (1/n) Σ_{j=1}^{n} Xij.
(b) Same as (a) but with n = 20 and n = 40. What are your conclusions?
(c) Repeat (a) and (b) but with the Xij having density
f(x) = (1/18)(x − 4), 4 < x < 10,
     = 0, otherwise.
What are your conclusions?
Hint: See Problem 7.7.
Problem 7.9 Solve part (a) of Problem 7.8 but with p = 0.7, instead of 0.3.
Problem 7.10 Referring to Problem 6.10, find the probability that more than 800 customers
will come during the next 20 business days.
Problem 7.11 The expected tensile strengths of two types of steel (types A and B, say) are
106 ksi and 104 ksi. The respective standard deviations are 8 ksi and 6 ksi. Let X̄ and Ȳ
be the sample average tensile strengths of two samples of 40 specimens of type A and 35
specimens of type B, respectively.
(a) What is the approximate distribution of X̄? Of Ȳ?
(b) What is the approximate distribution of X̄ − Ȳ? Why?
(c) Calculate (approximately) P[|X̄ − Ȳ| < 1].
(d) Suppose that after completing all the sample measurements you find x̄ − ȳ = 6. What
do you think now of the population assumptions made at the beginning of this problem?
Why?
Problem 7.12 (a) There are 75 defectives in a lot of 1500. Twenty-five items are randomly
inspected (the inspection is non-destructive and the items are returned to the lot immediately
after inspection). If two or more items are defective the lot is returned to the supplier (at
the supplier's expense). Otherwise, the lot is accepted. What is the probability that the lot
will be rejected?
(b) Suppose that the actual number of defectives is unknown and that five out of twenty-five
independently inspected items turned out to be defective. Estimate the total number
of defectives in the lot (of 1500 items). What is the expected value and standard deviation
of your estimate? What is the (approximate) probability that your estimate is within a
distance of 10 from the actual total number of defectives?
Problem 7.13 A sequence of n independent pH determinations of a chemical compound
will be made. Each determination can be viewed as a random variable, Xi, with mean μ
(the unknown true pH of the compound) and standard deviation σ = 0.15. How many
independent determinations are required if we wish that the sample average X̄ is within 0.01
of the true pH with probability 0.95? What is the necessary n if σ = 0.30?
Problem 7.14 Bits are independently received in a digital communication channel. The
probability that a received bit is in error is 0.00001.
(a) If 16 million bits are transmitted, calculate the (approximate) probability that more than
150 errors occur.
(b) If 160,000 bits are transmitted, calculate the (approximate) probability that more than
1 error occurs.
Chapter 8
Introduction
One is often interested in random quantities (variables Y, T, N, etc.) such as the strength
Y of a concrete block, the time T of a chemical reaction, the number N of visits to a
website, etc. Engineers and applied scientists use statistical models to represent these
random quantities. Statistical models are sets of mathematical equations involving random
variables and other unknown quantities called parameters.
For example, the compressive strength of a concrete block can be modeled as
Y = μ + σε,     (8.1)
where μ is a parameter that represents the true average compressive strength of the concrete
block, ε is a random variable with zero mean and unit variance that accounts for the block-to-block
variability, and σ is a parameter that determines the average size of the block-to-block
variability. Notice that according to this model the compressive strength of a concrete
block is a random variable that results from the sum of two components: a systematic
component or signal (μ) and a random component or noise (σε).
Independent measurements are often taken to adjust the model, that is, to estimate
the unknown parameters that appear in the model equations. For example, the compressive
strength of several concrete blocks can be measured to get information about μ and σ.
Before the measurements are actually performed they can be thought of as independent
replicates of the random quantity of interest. For example, the future measurements of the
compressive strengths can be represented as
Yi = μ + σεi, i = 1, . . . , n,     (8.2)
population under study. In practice some units are randomly chosen and the measurements
are performed only on them. The set of selected units is called a sample. The corresponding
set of measurements is also called a sample.
Given a statistical model and a set of measurements (sample) one can carry out statistical
procedures, called statistical inference, which are aimed at extrapolating from
the sample to the population. The most typical statistical procedures are:
Point estimation of the model parameters.
Confidence intervals for the model parameters.
Testing of hypotheses about the model parameters.
These procedures will be described and further discussed in the context of the simple situations considered below.
8.2
Sometimes it can be assumed that the quantity of interest is homogeneous for all the units in
the population and that the measurements are the sum of a systematic and a random part
(signal plus noise). In these cases we normally assume that the sample is a set of homogeneous
measurements
Yi = μ + σεi, i = 1, . . . , n,     (8.3)
where μ and σ are as described in the Introduction above and n is the number of measurements
or sample size. It is often assumed that the measurements are independent and therefore
that the random variables εi, i = 1, . . . , n, are independent. Finally, we assume that the
random variables εi are normal with mean zero and variance one.
Note: Multiplicative models, where the measurements are the product of a systematic factor θ
and a random factor Ui,
Xi = θ Ui,
can be transformed into additive models like (8.3) by taking the log of the measurements:
Yi = ln(Xi) = ln(θ) + ln(Ui).
8.2.1
for example the expected squared estimation error or the expected absolute estimation error.
Estimation of μ: A good point estimate for μ, the main parameter of model (8.3), can be
obtained by the method of least squares, which consists of minimizing (in m) the sum of
squares
S(m) = Σ_{j=1}^{n} (Yj − m)^2.
Differentiating with respect to m and setting the derivative equal to zero gives the equation
S′(m) = −2 Σ_{j=1}^{n} (Yj − m) = 0, or m = (1/n) Σ_{j=1}^{n} Yj = Ȳ = μ̂.
Estimation Error: Being functions of the random variables, the point estimate Ȳ and
the estimation error Ȳ − μ are also random variables. Obviously, we would like the
estimation error to be small. To have some idea of the behavior of the estimation error we can
calculate its expected value (mean) and its variance:
E[Ȳ − μ] = E(Ȳ) − μ = (1/n) Σ_{j=1}^{n} E(Yj) − μ = (1/n) Σ_{j=1}^{n} μ − μ = 0   (Ȳ is unbiased),
and
Var(Ȳ − μ) = Var(Ȳ) = (1/n^2) Σ_{j=1}^{n} Var(Yj) = (1/n^2) n σ^2 = σ^2/n.
In this case, the estimation error then has a distribution centered at zero and a variance
inversely proportional to n. In other words, if n is sufficiently large, likely values of Ȳ will
all be close to μ.
Estimation of σ^2: The point estimate for σ^2 is based on the minimized sum of squares,
S(Ȳ), divided by a quantity d so that E[S(Ȳ)/d] = σ^2. The simple derivation outlined
in Problem 8.9 shows that d = n − 1, and so
σ̂^2 = S^2 = Σ_{j=1}^{n} (Yj − Ȳ)^2 / (n − 1).
The sample mean and variance are ȳ = 1.9833 and s^2 = 0.09787879, respectively. The
standard error of ȳ is then SE(ȳ) = √(0.09787879/12) = 0.09031371. It would appear that the
scientist's measurement procedure is biased, giving values below the true concentration. The
bias can be estimated as 1.9833 − 2.5 = −0.5166667, give or take 0.181 (0.181 = 2 SE(ȳ)).
8.2.2
Consider the absolute estimation error |Ȳ − μ|. We wish to find a value d such that there is
a large probability (0.95 or 0.99) that the absolute estimation error is below d. That is, we
wish to find d such that, for some small value of α (typically α = 0.05 or 0.01), we have
P[|Ȳ − μ| < d] = 1 − α.
The resulting d can then be added to and subtracted from the observed average ȳ to obtain
the upper and lower limits of an interval called the (1 − α)100% confidence interval:
(ȳ − d, ȳ + d).
Typical values of α are α = 0.05 and α = 0.01, yielding 95% and 99% confidence intervals,
respectively. To fix ideas we will take α = 0.05 in what follows.
Assuming that the model (8.3) is correct, the probability that μ and Ȳ differ by more than
d is only 0.05. In other words, if we repeatedly obtain samples of size n and construct the
corresponding 95% confidence intervals for μ, on average 95% of these intervals will include
the (unknown) value of μ.
Using that Ȳ ∼ N(μ, σ^2/n) we have
0.95 = P[|Ȳ − μ| < d] = P[ |Ȳ − μ|/(σ/√n) < d√n/σ ] = 2Φ[d√n/σ] − 1.
That is,
Φ[d√n/σ] = 0.975, and so d = 1.96 σ/√n.
In practice σ is unknown and is replaced by its estimate s; the value 1.96 is then replaced by
the corresponding quantile t(df)(α) of the Student's t distribution, where
df = n − k,
n = number of squared terms appearing in the variance estimate,
and
k = number of additional estimated parameters appearing in the variance estimate.
Table A.2 in the Appendix gives the values of t(df)(α) for several values of α and df.
In summary, the estimated value of d is
d = t(df)(α) s/√n = t(df)(α) SE(Ȳ).
Notice that for most values of n that appear in practice, t(n−1)(0.05) ≈ 2, justifying the
common practice of adding and subtracting 2 SE(ȳ) from the observed average ȳ.
Example 8.2 Refer to the data in Example 8.1. A 95% confidence interval for the actual
mean of the scientist's measurements is
1.9833 ± t(11)(0.05) SE(ȳ)
or
1.9833 ± 2.20 × 0.09031371.
That is, the systematic part of the scientist's measurement is likely to lie between 1.8 and
2.2.
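A minimal sketch of the same interval in Python (t is the scipy.stats Student's t distribution; note that t(11)(0.05) in the notation of these notes corresponds to the 0.975 quantile of the t distribution with 11 degrees of freedom; the sample size 12, mean and variance are the values quoted from Example 8.1):

from math import sqrt
from scipy.stats import t

n = 12
ybar, s2 = 1.9833, 0.09787879                 # sample mean and variance from Example 8.1
se = sqrt(s2 / n)                             # standard error of the mean, about 0.0903
tcrit = t.ppf(0.975, n - 1)                   # two-sided 5% critical value, about 2.20

d = tcrit * se
print((ybar - d, ybar + d))                   # roughly (1.78, 2.18)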
8.2.3
There are situations when one wishes to determine whether a certain statement or hypothesis
about a model parameter is consistent with the given data. That is, one wishes to confront the
statement with the empirical evidence (data). For example, the scientist of Examples 8.1
and 8.2 may wish to test the hypothesis that the given measurement method is unbiased,
using her collected data.
The procedure for rejecting a hypothesis about a certain unknown population parameter,
on the basis of statistical evidence, is called testing of hypotheses. The hypothesis to be
tested is denoted by H0.
Typical hypotheses, H0, about μ are
(i) H0: μ = μ0, or (ii) H0: μ ≤ μ0, or (iii) H0: μ ≥ μ0,
where μ0 is some specified value. In the case of the scientist of Examples 8.1 and 8.2, the
statement "the measurement method is unbiased" corresponds to (i) with μ0 = 2.5. On the
other hand, the statement "the measurement method does not consistently under-estimate
the true concentration" corresponds to (iii) with μ0 = 2.5. What statement would correspond
to (ii) with μ0 = 2.5?
Significance Level of a Test: When testing a hypothesis one can incur two possible
errors: rejecting a hypothesis that is true (error of type I) or failing to reject a hypothesis that
is false (error of type II). Errors of type I are considered more important and are kept under
tight control. Therefore, usual testing procedures ensure that the probability of rejecting a
true hypothesis is rather small (0.01 or 0.05). The probability of an error of type I is usually
denoted by α and called the significance level of the test.
Taking that into consideration, the hypothesis H0 is constructed in such a way that its
incorrect rejection has a small probability. H0 states, then, the most conservative statement:
a statement that one would like to reject only in the presence of strong empirical evidence.
Because of that, H0 is called the null hypothesis.
The Testing Procedure: The testing procedures learned in this course are simply derived
from confidence intervals. Suppose we wish to test H0 at level α. Then we distinguish two
cases:
Two-sided tests: Hypotheses of the form H0: μ = μ0 give rise to two-sided tests because
in this case we reject H0 if we have evidence indicating that μ is smaller or larger than μ0.
The two-sided level α testing procedure consists of the following two steps:
Step 1. Construct a (1 − α)100% confidence interval for μ.
Step 2. Reject H0: μ = μ0 if μ0 lies outside that interval.
One-sided tests: Hypotheses of the form H0: μ ≥ μ0 (H0: μ ≤ μ0) are called directional
hypotheses and give rise to one-sided tests. Notice that in this case we reject H0 only if we
suspect that μ < μ0 (μ > μ0). The one-sided level α testing procedure consists of the following two steps:
Step 1. Construct a [1 − (2α)]100% confidence interval for μ.
Step 2. Reject H0: μ ≥ μ0 if the entire interval lies below μ0 (reject H0: μ ≤ μ0 if the entire
interval lies above μ0).
Example 8.4 A shipyard must order a large shipment of lacquer from a supplier. Besides
other design requirements, the lacquer must be durable and dry quickly. The average drying
time must not exceed 25 minutes. Supplier A claims that, on average, its product dries in
20.5 minutes. A sample of 30 20-liter cans from supplier A yields an average drying time of
22.3 minutes and standard deviation of 2.9 minutes.
(a) Is there statistical evidence to distrust supplier A's claim that its product has an average
drying time of 20.5 minutes?
(b) Can we say that, on average, supplier A's lacquer dries before 24 minutes?
Solution (a) Supplier A's claim corresponds to the hypothesis μ = 20.5, which we test at
level α = 0.05 by constructing a 95% confidence interval for μ. In the present case
df = 30 − 1 = 29 and hence, from Table A.2, t29(0.05) = 2.05. Moreover, SE(ȳ) = 2.9/√30 = 0.529465.
Therefore,
d = 2.05 × 0.529465 = 1.085,
and the 95% confidence interval for μ is
(ȳ ± d) = (22.3 ± 1.085) = (21.21, 23.39).
Since this interval doesn't include the value μ = 20.5, we reject supplier A's claim that
μ = 20.5. That is, we reject the hypothesis μ = 20.5 on the basis of the given data and
statistical model.
(b) One way to answer this question is to test the hypothesis
H0: μ ≥ 24.0
at some (small) level α. To take advantage of the calculations already made we may choose
α = 0.025. Since the upper limit of the 95% confidence interval for μ is smaller than 24.0, we reject
H0 and answer question (b) in a positive way. □
8.3
There are practical situations where we are interested in comparing several populations. In
this section we will consider the simplest case of two populations. In Chapter 10 we will
consider the general case of two or more populations.
Example 8.5 Refer to the situation described in Example 8.4. Another supplier, called
Supplier B, could also supply the lacquer. A sample of 10 20-liter cans from supplier B yields
an average drying time of 20.7 minutes and standard deviation of 2.5 minutes. Does the data
support supplier B's claim that, on average, its product dries faster than A's? What if the
sample size from supplier B were 100 instead of 10?
This example illustrates a fairly common situation: one must make or recommend an important decision involving a large number of items (or individuals) on the basis of a relatively
small number of measurements performed on some of these items. Recall that the set of all
the items under study is called the population and the subset of items used to obtain the
measurements (and often the measurements themselves) is called the sample.
Example 8.5 includes two populations, namely the 3,000 20-liter cans of lacquer that can
be acquired from either supplier A or B. In the following these two populations will be called
population A and population B, respectively.
Although we are concerned with the entire populations, we will only be able to test the
items in the samples. Therefore, we must try to investigate and exploit the mathematical
connections between the samples and the populations from which they came. This can be
done with the help of a statistical model, that is, a set of probability assumptions regarding
the sample measurements. The two sample measurements can be modeled as
Yij = μi + σi εij, i = 1, 2 and j = 1, . . . , ni,     (8.4)
where the first subscript (i) indicates the population and the second subscript (j) indicates
the observation. Thus, μi and σi^2 are the population means and variances, respectively, and
n1 and n2 are the sample sizes. In the case of Example 8.5, n1 = 30 and n2 = 10. It is often
assumed that the measurements are independent and therefore that the random variables
εij, i = 1, 2 and j = 1, . . . , ni, are independent. Finally, as in the case of one sample, we
assume that the random variables εij are normal with mean zero and variance one.
Similarly to the one-sample case, the population means μ1 and μ2 can be estimated by
the corresponding sample means:
Ȳ1 = (1/n1) Σ_{j=1}^{n1} Y1j and Ȳ2 = (1/n2) Σ_{j=1}^{n2} Y2j.
Notice that Ȳ1 and Ȳ2 are normal random variables with means μ1 and μ2 and variances
σ1^2/n1 and σ2^2/n2, respectively. Furthermore, the population variances σ1^2 and σ2^2 can be
estimated by the sample variances
S1^2 = Σ_{j=1}^{n1} [Y1j − Ȳ1]^2 / (n1 − 1) and S2^2 = Σ_{j=1}^{n2} [Y2j − Ȳ2]^2 / (n2 − 1).
When the two population variances can be assumed to be equal (σ1 = σ2 = σ), the common
variance σ^2 is estimated by the pooled sample variance
S^2 = Σ_{i=1}^{2} Σ_{j=1}^{ni} [Yij − Ȳi]^2 / (n1 + n2 − 2).
Linear Combinations of the Population Means: In practice one often wishes to estimate
linear combinations of the population means and to test hypotheses about them. In
such cases we say that the parameter of interest, Δ, is a linear combination of μ1 and μ2.
The most common linear combination of μ1 and μ2 is the simple difference
Δ = μ1 − μ2.
Other examples are
Δ = μ1 − 2μ2, Δ = 3μ1 − μ2, Δ = 1.2μ1 + 0.5μ2.
In general, a linear combination Δ = aμ1 + bμ2 is estimated by Δ̂ = aȲ1 + bȲ2, which has
mean aμ1 + bμ2 and variance
a^2 σ1^2/n1 + b^2 σ2^2/n2.
When the two population variances are assumed equal, this variance is estimated by
s^2 (a^2/n1 + b^2/n2),
where s^2 is the pooled sample variance. For the data of Examples 8.4 and 8.5,
s^2 = [(29)(2.9^2) + (9)(2.5^2)] / (30 + 10 − 2) = 7.90,
and so, for Δ = μ1 − μ2 (a = 1, b = −1),
SE(Δ̂) = s √(1/n1 + 1/n2) = 2.8106 √(1/30 + 1/10) = √1.053 = 1.026.
The 95% confidence interval for Δ = μ1 − μ2 is then
(ȳ1 − ȳ2) ± t(38)(0.05) s √((n1 + n2)/(n1 n2)) = 1.6 ± 2.02 × 1.026 = (−0.47, 3.67).
We have used the approximation t(38)(0.05) ≈ t(40)(0.05) = 2.02, because t(38)(0.05) is not
included in the table.
Solution to Example 8.5: The statement of Supplier B is consistent with the hypothesis
H0: μ1 ≥ μ2,
or equivalently,
H0: μ1 − μ2 ≥ 0.
We may answer the question by testing this (directional) hypothesis at some (small) level α.
For example, we may take α = 0.05. The 90% confidence interval for Δ = μ1 − μ2 is
(ȳ1 − ȳ2) ± t(40)(0.10) s √((n1 + n2)/(n1 n2)) = 1.6 ± 1.68 × 1.026 = 1.6 ± 1.724 = (−0.124, 3.324).
Since the value μ1 − μ2 = 0 falls in the interval, we conclude that there is no statistically
significant difference between the two means. There is, then, no statistical evidence supporting
Supplier B's claim of having a superior product. □
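A minimal sketch of the pooled two-sample interval used above, with the sample summaries of Examples 8.4 and 8.5 (t is the scipy.stats Student's t distribution; the document uses the tabulated value t(40)(0.10) = 1.68, so the exact-df interval below differs very slightly):

from math import sqrt
from scipy.stats import t

n1, ybar1, s1 = 30, 22.3, 2.9                 # supplier A
n2, ybar2, s2 = 10, 20.7, 2.5                 # supplier B

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance, about 7.90
se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)                        # about 1.026
df = n1 + n2 - 2

tcrit = t.ppf(0.95, df)                       # 90% two-sided interval (one-sided test at 0.05)
d = tcrit * se
diff = ybar1 - ybar2
print((diff - d, diff + d))                   # roughly (-0.13, 3.33); zero is inside the interval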
Example 8.6 Either 20 large machines or 30 small ones can be acquired for approximately
the same cost. One large and one small machine have been experimentally run for 20 days
with the following results:
ȳlarge = ȳ1 = 31.0, slarge = s1 = 2.1,
ȳsmall = ȳ2 = 22.7, ssmall = s2 = 1.9.
Which of the two options would you recommend?
Solution: Since the total cost of 20 large machines equals the cost of 30 small machines, it is
reasonable to compare the total outputs:
Total output of 20 large machines = 20μ1,
Total output of 30 small machines = 30μ2,
where μ1 and μ2 are the average daily outputs for each type of machine.
Therefore, the parameter of interest is the linear combination
Δ = 20μ1 − 30μ2.
From the information given we have n1 = n2 = 20 and Δ can be estimated by
Δ̂ = 20 ȳ1 − 30 ȳ2 = 20 × 31 − 30 × 22.7 = −61.0.
The pooled estimate of σ^2 is
s^2 = [19 × 2.1^2 + 19 × 1.9^2] / (20 + 20 − 2) = 4.01,
and so s = 2.0. Since df = 20 + 20 − 2 = 38, from the Student's t table we have
t(38)(0.05) ≈ t(40)(0.05) = 2.02.
Therefore, the 95% confidence interval for Δ is
−61.0 ± 2.02 × 2.0 × √(20^2/20 + 30^2/20) = −61.0 ± 32.57 = (−93.57, −28.43).
Therefore we reject (at level α = 0.05) the hypothesis that both alternatives are equally
convenient. It appears that it would be more convenient to acquire 30 small machines.
8.4 Exercises
8.4.1 Exercise Set A
Problem 8.1 Given that n1 = 15, x̄ = 20, Σ(xi − x̄)^2 = 28, and n2 = 12, ȳ = 17, Σ(yi − ȳ)^2 = 22:
(a) Calculate the pooled variance s^2.
(b) Determine a 95% confidence interval for μ1 − μ2.
(c) Test H0: μ1 = μ2 with α = .05.
Problem 8.2 The time for a worker to repair an electrical instrument is a normally distributed
N(μ, σ^2) random variable measured in hours, where both μ and σ^2 are unknown.
The repair times for 10 such instruments chosen at random are as follows:
212, 234, 222, 140, 280, 260, 180, 168, 330, 250
(1) Calculate the sample mean and the sample variance of the 10 observations.
(2) Construct a 95% confidence interval for μ.
(3) Suppose the worker claims that his average repair time for the instrument is no more
than 200 hours. Test if his claim conforms with the data, that is, test whether there is
sufficient evidence to dispute the worker's claim.
Problem 8.3 (Hypothetical) The effectiveness of two STAT251/241 labs which were
conducted by two TAs is compared. A group of 24 students with rather similar backgrounds
was randomly divided into two labs and each group was taught by a different TA. Their test
scores at the end of the semester show the following characteristics:
n1 = 13, x̄ = 74.5, sx^2 = 82.6
and
n2 = 11, ȳ = 71.8, sy^2 = 112.6.
Assuming underlying normal distributions with σ1^2 = σ2^2, find a 95 percent confidence interval
for μ1 − μ2. Are the two labs different? Summarize the assumptions you used for your analysis.
Problem 8.4 Two machines (called A and B in this problem) are compared. Machine A
costs $3000 and machine B costs $4500. One machine of each type was operated during 30
days and the daily outputs were recorded. The results are summarized below:
Machine A: x̄A = 200 kg, sA = 5.1 kg.
Machine B: x̄B = 270 kg, sB = 4.9 kg.
Is there statistical evidence indicating that one of these machines has better output/cost
performance than the other? Use α = 0.05.
Problem 8.5 The average biological oxygen demand (BOD) at a certain experimental station has to be estimated. From measurements at other similar stations we know that the
variance of BOD samples is about 8.0 (mg/liter)2 . How many observations should we sample
if we want to be 90 percent confident that the true mean is within 1 mg/liter of our sample
average? (Hint: Using CLT, we may assume the sample average has approximately normal
distribution).
Problem 8.6 An automobile manufacturer recommends that any purchaser of one of its new
cars bring it in to a dealer for a 3000-mile checkup. The company wishes to know whether
the true average mileage for initial servicing differs from 3000. A random sample of 50 recent
purchasers resulted in a sample average mileage of 3208 and a sample standard deviation
of 273 miles. Does the data strongly suggest that true average mileage for this checkup is
something other than the recommended value?
Problem 8.7 The following data were obtained on mercury residues in birds' breast muscles:
Mallard ducks: m = 16, x̄ = 6.13, s1 = 2.40
Blue-winged teals: n = 17, ȳ = 6.46, s2 = 1.73
Construct a 95% confidence interval for the difference between the true average mercury residues
μ1, μ2 in these two types of birds in the region of interest. Does your confidence interval
indicate that μ1 = μ2 at a 95% confidence level?
Problem 8.8 A manufacturer of a certain type of glue claims that his glue can withstand
230 units of pressure. To test this claim, a sample of size 24 is taken. The sample mean is
191.2 units and the sample standard deviation is 21.3 units.
(a) Propose a statistical model to test this claim and test the manufacturer's claim.
(b) What is the highest claim that the manufacturer can make without rejection of this
claim?
8.4.2 Exercise Set B
Problem 8.9 Suppose that Y1, . . . , Yn are a sample, that is, they are independent, identically
distributed, with common mean μ and common variance σ^2. Recall that the sample variance
is equal to
S^2 = Σ_{i=1}^{n} (Yi − Ȳ)^2 / (n − 1).
(a) Show that
Σ_{i=1}^{n} (Yi − Ȳ)^2 = Σ_{i=1}^{n} (Yi − μ)^2 − n(Ȳ − μ)^2.
Problem 8.13 In order to process a certain chemical product, a company is considering the
convenience of acquiring (for approximately the same price) either 100 large machines or 200
small ones. One important consideration is the average daily processing capacity (in hundreds
of pounds).
One machine of each type was tested for a period of 10 days, yielding the following results:
Large Machine: x̄1 = 120, s1 = 1.5
Small Machine: x̄2 = 65, s2 = 1.6
Model the data and identify the parameter of main interest. Construct a 95% confidence
interval for this parameter. What is your recommendation to management?
Problem 8.14 A study is made to see if increasing the substrate concentration has an
appreciable effect on the velocity of a chemical reaction. With a substrate concentration of 1.5
moles per liter, the reaction was run 15 times with an average velocity of 7.5 micromoles per
30 minutes and a standard deviation of 1.5. With a substrate concentration of 2.0 moles per
liter, 12 runs were made yielding an average velocity of 8.8 micromoles per 30 minutes and a
sample standard deviation of 1.2. Would you say that the increase in substrate concentration
increases the mean velocity by as much as 0.5 micromoles per 30 minutes? Use a 0.01 level
of significance and assume the populations to be approximately normally distributed with
equal variances.

Employee   Before Training   After Training   Difference
1              14.6              10.6             4.0
2              17.5              15.4             2.1
3              13.5              13.2             0.3
4              13.9              12.2             1.7
5              15.0              11.7             3.3
6              20.5              18.6             1.9
7              14.4              10.3             4.1
8              14.6              10.3             4.3
9              17.9              10.4             7.5
10             16.7              16.8            -0.1
11             14.7              14.6             0.1
12             17.3              14.6             2.7
13             11.7              10.5             1.2
14             13.7              10.9             2.8
15             16.8              11.8             5.0
16             15.7              13.4             2.3
17             15.7              13.6             2.1
18             16.7              16.7             0.0
19             15.5              16.7            -1.2
20             17.2              13.8             3.4
Problem 8.15 (Hypothetical) A study was made to estimate the difference in annual salaries
of professors at the University of British Columbia (UBC) and the University of Toronto (UT). A
random sample of 100 professors at UBC showed an average salary of $46,000 with a standard
deviation of $12,000. A random sample of 200 professors at UT showed an average salary of
$51,000 with a standard deviation of $14,000. Test the hypothesis that the average salary
for professors teaching at UBC differs from the average salary for professors teaching at UT
by $5,000.
Problem 8.16 A UBC student spends, on average, $8.00 for a Saturday evening gathering
in a pub. A random sample of 12 students attending a homecoming party showed an
average expenditure of $8.90 with a standard deviation of $1.75. Could you say that attending
a homecoming party costs students more than gathering in a pub?
Problem 8.17 The following data represent the running times of films produced by two
different motion-picture companies.
Times (minutes)
Company I:  103 94 110 87 98
Company II: 97 82 123 92 175 88 118
Compute a 90% confidence interval for the difference between the average running times of
films produced by the two companies. Do the films produced by Company II run longer than
those by Company I?
Problem 8.18 It is required to compare the effect of two dyes on cotton fibers. A random
sample of 10 pieces of yarn was chosen; 5 pieces were treated with dye A, and 5 with dye B.
The results were
Dye A: 4 5 8 8 10
Dye B: 6 2 9 4 5
(a) Test the significance of the difference between the two dyes. (Assume normality, common
variance, and significance level α = 0.05.)
(b) How big a sample do you estimate would be needed to detect a difference equal to 0.5
with probability 99%?
Chapter 9
Simulation Studies
9.1 Monte Carlo Simulation
Suppose that we wish to calculate the integral
I = ∫_0^1 g(t) dt.
Suppose that g is such that this integral cannot be easily integrated and we need to
approximate it by numerical means. For simplicity suppose that 0 ≤ g(t) ≤ 1 for all 0 ≤ t ≤ 1.
If we are dealing with a function h(t) which is not between 0 and 1 but we know that
a ≤ h(t) ≤ b, for all 0 ≤ t ≤ 1,
then we can work with the function
g(t) = (h(t) − a)/(b − a),
since
∫_0^1 h(t) dt = (b − a) ∫_0^1 g(t) dt + a.
Suppose that we want to estimate I with an error smaller than d = 0.01, with probability
equal to 0.99. In other words, if Î is the estimate of I, we require that
P{|Î − I| < 0.01} = 0.99.
Notice that
I = ∫_0^1 g(t) dt = E[g(U)],
where U is a random variable with uniform distribution on the interval (0, 1). If we generate
n independent random variables
U1, U2, . . . , Un
with uniform distribution on (0, 1), then by the Central Limit Theorem
Î = (1/n) Σ_{i=1}^n g(Ui) = ḡ(U)
is approximately normal with mean
E[g(U)] = ∫_0^1 g(t) dt = I
and variance σ^2/n, where
σ^2 = Var[g(U)] = ∫_0^1 g^2(t) dt − I^2.
Now,
P{|Î − I| < 0.01} = P{√n |Î − I|/σ < √n (0.01)/σ} ≈ P{|Z| < √n (0.01)/σ} = 2Φ[√n (0.01)/σ] − 1.
But,
2Φ[√n (0.01)/σ] − 1 = 0.99 ⟹ Φ[√n (0.01)/σ] = 0.995 ⟹ √n (0.01)/σ = 2.58 ⟹ n = σ^2 (2.58)^2/(0.01)^2.
Since 0 ≤ g(t) ≤ 1, we have g^2(t) ≤ g(t) and therefore σ^2 = ∫_0^1 g^2(t) dt − I^2 ≤ I − I^2 = I(1 − I).
Finally, since I(1 − I) reaches its maximum at I = 0.5, it follows that I(1 − I) ≤ 0.25 for all
I, and so a conservative estimate for n is
n = σ^2 (2.58)^2/(0.01)^2 ≤ (0.25)(2.58)^2/(0.01)^2 = 16,641.
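A minimal sketch of the Monte Carlo estimator just described; the integrand exp{−t^2} is an illustrative choice (it also appears in Problem 9.1) and is not part of the derivation above:

import numpy as np

rng = np.random.default_rng(1)

def g(t):
    # an illustrative integrand with 0 <= g(t) <= 1 on [0, 1]
    return np.exp(-t**2)

n = 16_641                                    # the conservative sample size derived above
u = rng.uniform(size=n)                       # U1, ..., Un ~ Uniform(0, 1)
I_hat = g(u).mean()                           # Monte Carlo estimate of the integral

print(I_hat)                                  # close to 0.7468, the true value of this integral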
The same method can be used to estimate a more general integral
J = ∫_a^b f(t) dt,     (9.1)
where f(t) takes values between c and d. That is, the domain of integration can be any given
bounded interval, [a, b], and the function can take values on any given bounded interval [c, d].
For example, we may wish to estimate the integral
J = ∫_1^3 exp{t} dt.
In this case the domain of integration is [1, 3] and the function ranges over the interval
[2.7183, 20.086].
In order to estimate J, first we must make the change of variables
u = (t − a)/(b − a),
to obtain
J = (b − a) ∫_0^1 f[(b − a)u + a] du = ∫_0^1 g(u) du,
where
g(u) = (b − a) f[(b − a)u + a].
150
In our example,

J = ∫_1^3 e^t dt = ∫_0^1 g(u) du,   with   g(u) = 2 exp{2u + 1}.

The second step is to linearly modify the function g(u) so that the resulting function, h(u),
takes values between 0 and 1. That is,

h(u) = [g(u) − (b − a)c] / [(b − a)(d − c)].
Finally,

J = ∫_0^1 g(u) du = (b − a)(d − c) [∫_0^1 h(u) du + c/(d − c)] = (b − a)(d − c) I + (b − a) c,

where I is of the desired form (that is, the integral between 0 and 1 of a function that takes
values between 0 and 1).
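The whole recipe can be sketched in a few lines of code (a minimal example, assuming Python
with NumPy; the integrand f(t) = e^t and the bounds c = e, d = e³ follow the example above):

import numpy as np

# Monte Carlo estimate of J = integral of f over [a, b], with c <= f <= d.
a, b = 1.0, 3.0
c, d = np.exp(1.0), np.exp(3.0)
f = np.exp

def h(u):
    # g(u) = (b - a) f((b - a) u + a), then rescaled to take values in [0, 1]
    g = (b - a) * f((b - a) * u + a)
    return (g - (b - a) * c) / ((b - a) * (d - c))

rng = np.random.default_rng(seed=0)
n = 100_000
I_hat = h(rng.uniform(size=n)).mean()               # integral of h over (0, 1)
J_hat = (b - a) * (d - c) * I_hat + (b - a) * c     # undo the rescaling
print(f"J_hat = {J_hat:.3f}, exact = {np.exp(3) - np.exp(1):.3f}")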
9.2 Exercises

Problem 9.1 Use the Monte Carlo integration method with n = 1500 to approximate the
following integrals.
(a)
I = ∫ exp{x²} dx.
What is the (approximate) probability that the approximation error is less than d = 0.05?
Less than d = 0.01?
(b)
I = ∫_1^2 exp{x²} dx.
(c)
I = ∫_0^{π/2} . . . dx.
Chapter 10

10.1 An Example
Table 10.1: Strength measurements and summary statistics for the five drying methods (A–E)

47.90  44.56  37.69  52.58  61.93  58.91  82.31  44.81
39.72  26.72  47.95  50.41  37.79  40.47  63.39  42.85
51.82  46.36  40.98  44.68  34.66  29.13  52.01  62.18
57.79  51.37  52.72  46.83  50.21  44.30  47.48  41.21
61.38  54.60  48.51  60.26  64.97  52.76  42.12  30.64

Method     A      B      C      D      E
Mean     45.05  52.29  54.29  56.83  41.15
SD        7.50   8.70   6.32  10.40   8.16
The data can be represented by the model

y_ij = μ_i + ε_ij,    i = 1, . . . , k,   j = 1, . . . , n_i.

The first subscript, i, ranges from 1 to k, where k is the number of populations being
compared, usually called treatments. In our example we are comparing five drying
methods, therefore k = 5. The second subscript, j, ranges from 1 to n_i, where n_i is the number
of measurements for each treatment. In our example, n_1 = n_2 = . . . = n_5 = 20.
The unknown parameters μ_i represent the treatment averages. Differences among the μ_i's
account for the part of the variability observed in the data that is due to differences among
the treatments being compared in the experiment.
The random variables ε_ij account for the additional variability that is caused by other
factors not explicitly considered in the experiment (different batches of raw material, different
mixing times, measurement errors, etc.). The best we can hope regarding the global effect
of these uncontrolled factors is that it will average out, so that they do not unduly enhance
or worsen the performance of any treatment.
An important technique that can be used to achieve this averaging out is called
randomization. The experimental units available for the experiment (the 100 concrete
cylinders in the case of our example) must be randomly assigned to the different treatments,
so that each experimental unit has, in principle, the same chance of being assigned to any
treatment. One practical way of doing this in the case of our example is to number the
cylinders from 1 to 100 and then to draw (without replacement) groups of 20 numbers. The
units with numbers in the first group are assigned to treatment A, the units with numbers
in the second group are assigned to treatment B, and so on. The actual labeling of the
treatments as A, B, etc. can also be randomly decided.
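Such a random assignment is easy to carry out with a computer. The following is a minimal
sketch (assuming Python with NumPy; the unit count and group sizes follow the example):

import numpy as np

# Randomly assign the 100 numbered units to treatments A-E, 20 units each,
# by drawing groups of 20 labels without replacement.
rng = np.random.default_rng(seed=0)
units = rng.permutation(np.arange(1, 101))           # units 1..100 in random order
assignment = {label: sorted(units[20 * i: 20 * (i + 1)])
              for i, label in enumerate("ABCDE")}    # consecutive blocks of 20
for label, group in assignment.items():
    print(label, group)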
The model assumptions are:
(1) E(ε_ij) = 0;
(2) the ε_ij are independent random variables;
(3) Var(ε_ij) = σ², the same for all treatments;
(4) the ε_ij are normally distributed.
These assumptions can be summarized by saying that the variables ε_ij are iid N(0, σ²).
(a) The QQ plots of Figure 10.1 (a)–(e) suggest that assumption (4) is consistent with the
data. Figure 10.1 (f) displays the boxplots for the combined data (first from the left) and
for each drying method. The variability within the samples seems roughly constant (the boxes
are of approximately equal size). This suggests that assumption (3) is also consistent with
the data.
Figure 10.1: Normal QQ plots (empirical quantiles versus normal quantiles) for drying methods A–E, panels (a)–(e), and boxplots of the combined data and of each method, panel (f).
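Diagnostic plots of this kind can be produced with standard software. A minimal sketch
follows (assuming Python with NumPy, SciPy and matplotlib; random placeholder data stand
in for the measurements of Table 10.1 so that the sketch runs on its own):

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Placeholder samples for the five drying methods (20 observations each).
rng = np.random.default_rng(seed=0)
data = {m: rng.normal(loc=50, scale=8, size=20) for m in "ABCDE"}

fig, axes = plt.subplots(2, 3, figsize=(10, 6))
for ax, (method, y) in zip(axes.flat, data.items()):
    stats.probplot(y, dist="norm", plot=ax)          # normal QQ plot (assumption 4)
    ax.set_title(f"Drying Method {method}")

# Boxplots of the combined data and of each method (assumption 3).
axes[1, 2].boxplot([np.concatenate(list(data.values()))] + list(data.values()),
                   labels=["All", "A", "B", "C", "D", "E"])
axes[1, 2].set_title("Boxplots")
plt.tight_layout()
plt.show()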
(b) For each treatment we compute the treatment total, the treatment mean and the
treatment standard deviation,

y_i. = Σ_{j=1}^{n_i} y_ij,

ȳ_i. = y_i./n_i = (y_i1 + · · · + y_in_i)/n_i,   the i-th treatment mean,

s_i = √[ Σ_{j=1}^{n_i} (y_ij − ȳ_i.)² / (n_i − 1) ] = √[ ((y_i1 − ȳ_i.)² + · · · + (y_in_i − ȳ_i.)²) / (n_i − 1) ],

as well as the grand total and the grand mean,

y_.. = Σ_{i=1}^{k} y_i. = Σ_{i=1}^{k} Σ_{j=1}^{n_i} y_ij   and   ȳ_.. = y_../n = Σ_{i=1}^{k} y_i. / Σ_{i=1}^{k} n_i,

where n = n_1 + · · · + n_k is the total number of observations. In our example,

ȳ_1. = 45.05,   ȳ_2. = 52.29,   ȳ_3. = 54.29,   ȳ_4. = 56.83,   ȳ_5. = 41.15,
s_1 = 7.50,   s_2 = 8.70,   s_3 = 6.32,   s_4 = 10.40,   s_5 = 8.16,

Σ_{i=1}^{k} y_i. = 4992.2,   and   ȳ_.. = 4992.2/100 = 49.92.
It is not difficult to show that the ȳ_i. are unbiased estimates for the unknown parameters
μ_i. In fact, the reader can easily verify that

E(Ȳ_i.) = μ_i   and   Var(Ȳ_i.) = σ²/n_i,    for i = 1, . . . , k.

Analogously, it is not difficult to verify (see Problem 8.9) that S_1², S_2², . . . , S_k² are k different
unbiased estimates for the common variance σ²:

E(S_i²) = σ²,    for i = 1, . . . , k.

These k estimates can be combined to obtain an unbiased estimate for σ². The reader is
encouraged to verify that the combined estimate

S² = Σ_{i=1}^{k} (n_i − 1) S_i² / (n − k)

is also unbiased and has a variance smaller than that of the individual S_i²'s.
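As a quick numerical check, the combined estimate can be computed directly from the five
sample standard deviations reported above (a minimal sketch, assuming Python with NumPy):

import numpy as np

# Pooled estimate of the common variance from the k = 5 sample SDs,
# each based on n_i = 20 observations.
s = np.array([7.50, 8.70, 6.32, 10.40, 8.16])
n_i = np.full(5, 20)
n, k = n_i.sum(), len(n_i)
s2_pooled = np.sum((n_i - 1) * s ** 2) / (n - k)     # sum (n_i - 1) S_i^2 / (n - k)
print(f"pooled variance = {s2_pooled:.2f}")           # about 69.3

The result, roughly 69.3, agrees with the mean square for error (MSe) that appears in the
ANOVA table below.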
(c) Roughly speaking, one can answer this question positively if there is evidence that a
substantial part of the variability in the data is due to differences among the treatments.
The total variability observed in the data is represented by the total sum of squares,

SST = Σ_{i=1}^{k} Σ_{j=1}^{n_i} [y_ij − ȳ_..]² = Σ_{i=1}^{k} Σ_{j=1}^{n_i} y_ij² − y_..²/n = Σ_{i=1}^{k} Σ_{j=1}^{n_i} y_ij² − [Σ_{i=1}^{k} Σ_{j=1}^{n_i} y_ij]²/n.
We will now show that the total sum of squares, SST, can be expressed as the sum of two
terms, the error sum of squares, SSe, and the treatment sum of squares, SSt. That is,

SST = SSe + SSt,        (10.1)

where

SSe = Σ_{i=1}^{k} Σ_{j=1}^{n_i} [y_ij − ȳ_i.]² = Σ_{i=1}^{k} Σ_{j=1}^{n_i} y_ij² − Σ_{i=1}^{k} y_i.²/n_i

and

SSt = Σ_{i=1}^{k} n_i [ȳ_i. − ȳ_..]².
The first term on the right-hand side of equation (10.1), SSe, represents the differences
between items in the same treatment, or within-treatment variability (this source of
variability is also called intra-group variability). The second term, SSt, represents the
differences between items from different treatments, or between-groups variability (this
source of variability is also called inter-group variability).
To prove equation (10.1) we add and subtract ȳ_i. and expand the square to obtain

Σ_{i=1}^{k} Σ_{j=1}^{n_i} [y_ij − ȳ_..]² = Σ_{i=1}^{k} Σ_{j=1}^{n_i} [(y_ij − ȳ_i.) + (ȳ_i. − ȳ_..)]²
  = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (y_ij − ȳ_i.)² + Σ_{i=1}^{k} n_i (ȳ_i. − ȳ_..)² + 2 Σ_{i=1}^{k} Σ_{j=1}^{n_i} (y_ij − ȳ_i.)(ȳ_i. − ȳ_..)
  = SSe + SSt + 2 Σ_{i=1}^{k} (ȳ_i. − ȳ_..) Σ_{j=1}^{n_i} (y_ij − ȳ_i.)
  = SSe + SSt + 2 Σ_{i=1}^{k} (ȳ_i. − ȳ_..)[n_i ȳ_i. − n_i ȳ_i.]
  = SSe + SSt.
In our example,

SST = 259273.7 − (4992.2)²/100 = 10049.11,   SSe = 6587.75,

and

SSt = SST − SSe = 3461.36.
Degrees of Freedom
The sums of squares cannot be compared directly. They must first be divided by their
respective degrees of freedom.
Since we use n squares and only one estimated parameter in the calculation of SST, we
conclude that

df(SST) = n − 1.

Since there are n squares and k estimated parameters (the k treatment means) in the
calculation of SSe, we conclude that

df(SSe) = n − k.

The degrees of freedom for SSt are obtained by difference:

df(SSt) = df(SST) − df(SSe) = (n − 1) − (n − k) = k − 1.
ANALYSIS OF VARIANCE
All the calculations made so far can be summarized in a table called the analysis of
variance (ANOVA) table.

Table 10.2: ANOVA TABLE

Source            Sum of Squares    df    Mean Squares       F
Drying Methods         3461.36       4          865.25   12.45
Error                  6587.75      95           69.34
Total                 10049.11      99
(c) To answer question (c) we must compare the variability due to the treatments with the
variability due to other sources. In other words, we must find out if the treatment effect
is strong enough to stand out above the noise caused by other sources of variability.
To do so, the ratio

F = MSt / MSe

is compared with the value F_α[df(MSt), df(MSe)] from the F-table attached at the end of
these notes. In our case

F = 865.25/69.34 = 12.45

and

F_{0.05}(4, 95) ≈ F_{0.05}(4, 60) = 2.53.

Since F > F_{0.05}(4, 95) we conclude that there are statistically significant differences among the
drying methods.
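The same F statistic can be recovered from the treatment means and standard deviations
alone. The sketch below (assuming Python with NumPy and SciPy; the summary statistics are
those of the example) also reports the p-value of the test:

import numpy as np
from scipy import stats

# One-way ANOVA computed from summary statistics (k = 5 groups, n_i = 20 each).
means = np.array([45.05, 52.29, 54.29, 56.83, 41.15])
sds   = np.array([7.50, 8.70, 6.32, 10.40, 8.16])
n_i   = np.full(5, 20)
n, k  = n_i.sum(), len(n_i)

grand_mean = np.sum(n_i * means) / n
SSt = np.sum(n_i * (means - grand_mean) ** 2)        # between-treatments SS
SSe = np.sum((n_i - 1) * sds ** 2)                   # within-treatments SS
F = (SSt / (k - 1)) / (SSe / (n - k))
p_value = stats.f.sf(F, k - 1, n - k)                # upper tail of F(k-1, n-k)
print(f"F = {F:.2f}, p-value = {p_value:.2e}")        # F is about 12.5

With the raw data available, scipy.stats.f_oneway gives the same test directly.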
(d) To answer question (d) we must perform multiple comparisons of the treatment means.
It is intuitively clear that if the number of treatments is large, and therefore the total number
of comparisons of pairs of means,

K = C(k, 2) = k(k − 1)/2,

is very large, there will be a greater chance that some of the 95% confidence intervals will fail
to include the value zero, even if all the μ_i were the same. For example, K = 3 when k = 3,
K = 6 when k = 4 and K = 10 when k = 5.
To compensate for the fact that the probability of declaring two means different when
they are not is larger than the significance level α = 0.05 used for each comparison, we must
use the smaller significance level, α*, given by

α* = 0.05/K.

Each individual confidence interval is constructed so that it has probability 1 − α* of
including the true treatment mean difference. It can be shown that this procedure (called
Bonferroni multiple comparisons) is conservative: if all the treatment means are equal,

μ_1 = μ_2 = . . . = μ_k,

then the probability that one or more of these intervals do not include the true difference, 0,
is at most α.
The procedure to compute the simultaneous confidence intervals is as follows. In the first
place, we must find the appropriate value, t_(n−k)(α*) = t_(n−k)(α/K), from the Student's t table
(see Table 7.1). As before, the number of degrees of freedom corresponds to that of the MSe,
that is, df = n − k.
The second step is to determine the standard deviation of the difference of treatment
means, Ȳ_i. − Ȳ_m.. It is easy to see that

Var(Ȳ_i. − Ȳ_m.) = σ² (1/n_i + 1/n_m).

Therefore,

estimated SD(Ȳ_i. − Ȳ_m.) = √[ MSe (1/n_i + 1/n_m) ].
In the case of our example k = 5 and therefore K = 10. The observed differences between
the 10 pairs of treatment (sample) means are given in Table 10.3.

Table 10.3: MULTIPLE COMPARISONS

Treatments   d_{i,m}   Observed Difference   Significance
A-B           7.56           -7.24
A-C           7.56           -9.24                *
A-D           7.56          -11.78                *
A-E           7.56            3.90
B-C           7.56           -2.00
B-D           7.56           -4.54
B-E           7.56           11.14                *
C-D           7.56           -2.54
C-E           7.56           13.14                *
D-E           7.56           15.68                *

Here

d_{i,m} = t_(n−k)(α*) √[ MSe (1/n_i + 1/n_m) ] = t_(95)(0.005) √[ 69.34 (2/20) ] = 7.56.
The differences marked with a star, *, in Table 10.3 are statistically significant. For example,
the * on the line A-C, together with the fact that the sign of the difference is negative, is
interpreted as evidence that method A is worse (less strong) than method C. The conclusions
from Table 10.3 are: methods A and E are not significantly different from each other and appear
to be significantly worse than the others. Observe that, although method A is not significantly
worse than method B (at the current level α = 0.05), their difference, 7.24, is almost significant
(fairly close to 7.56).
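The whole Bonferroni procedure is easy to automate. A minimal sketch follows (assuming
Python with NumPy and SciPy; note that the document's t-table convention t_(n−k)(α*)
corresponds to the two-sided critical value, hence the 1 − α*/2 quantile below):

import numpy as np
from itertools import combinations
from scipy import stats

# Bonferroni multiple comparisons for the drying-methods example.
means = dict(zip("ABCDE", [45.05, 52.29, 54.29, 56.83, 41.15]))
MSe, n_i, n, k = 69.34, 20, 100, 5
alpha = 0.05
K = k * (k - 1) // 2                                  # number of pairwise comparisons

t_crit = stats.t.ppf(1 - alpha / (2 * K), df=n - k)   # about 2.87 for df = 95
d = t_crit * np.sqrt(MSe * (2 / n_i))                 # common half-width, about 7.56

for a, b in combinations("ABCDE", 2):
    diff = means[a] - means[b]
    flag = "*" if abs(diff) > d else ""
    print(f"{a}-{b}: difference = {diff:6.2f}, d = {d:.2f} {flag}")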
10.2 Exercises

10.2.1 Exercise Set A
Problem 10.1 Three different methods are used to transport milk from a farm to a dairy
plant. Their daily costs (in $100) are given in the following:

Method 1: 8.10  4.40  6.00  7.00
Method 2: 6.60  8.60  7.35
Method 3: 12.00  11.20  13.30  10.55  11.50

(1) Calculate the sample mean and sample variance for the cost of each method.
(2) Calculate the grand mean and the pooled variance for the costs of the three methods.
(3) Test for differences among the costs of the three methods.
Problem 10.2 Six samples of each of four types of cereal grain grown in a certain region were
analyzed to determine thiamin content, resulting in the following data (micrograms/gram):

Wheat:  5.2  4.5  6.0  6.1  6.7  5.8
Barley: 6.5  8.0  6.1  7.5  5.9  5.6
Maize:  5.8  4.7  6.4  4.9  6.0  5.2
Oats:   8.3  6.1  7.8  7.0  5.5  7.2

Carry out the analysis of variance for the given data. Do the data suggest that at least two
of the four different grains differ with respect to true average thiamin content? Use α = 0.05.
Problem 10.3 A psychologist is studying the effectiveness of three methods of reducing
smoking. He wants to determine whether the mean reduction in the number of cigarettes
smoked daily differs from one method to another among male patients. Twelve men are
included in the experiment. Each smoked 60 cigarettes per day before the treatment. Four
randomly chosen members of the group pursue method I; four pursue method II; and so on.
The results are as follows (Table 10.4):

Table 10.4:
Method I    52  51  51  52
Method II   41  40  39  40
Method III  49  47  45  47

(a) Use a one-way analysis of variance to test whether the mean reduction in the number
of cigarettes smoked daily is equal for the three methods. (Let the significance level equal 0.05.)
(b) Use confidence intervals to determine which method results in the largest reduction in
smoking.
10.2.2 Exercise Set B
Problem 10.4 For best production of certain molds, the furnaces need to heat quickly up to
a temperature of 1500°F. Four furnaces were tested several times to determine the times (in
minutes) they took to reach 1500°F, starting from room temperature, yielding the results in
Table 10.5. Are the furnaces' average heating times different? If so, which is the fastest? The
slowest?
Table 10.5:

Furnace   n_i    x̄_i     s_i
   1       15   14.21   0.52
   2       15   13.11   0.47
   3       10   15.17   0.60
   4       10   12.42   0.43
Problem 10.5 Three specific brands of alkaline batteries are tested under heavy loading
conditions. Given in Table 10.6 are the times, in hours, that 10 batteries of each brand
functioned before running out of power. Use analysis of variance to determine whether the
battery brands take significantly different times to completely discharge. If the discharge
times are significantly different (at the 0.05 level of significance), determine which battery
brands differ from one another. Specify and check the model assumptions.
Table 10.6:

Battery Type   Discharge times (hours)
     1         5.60  5.43  4.83  4.22  5.78  5.22  4.35  3.63  5.02  5.17
     2         5.38  6.63  4.60  2.31  4.55  2.93  3.90  3.47  4.25  7.35
     3         6.40  5.91  6.56  6.64  5.59  4.93  6.30  6.77  5.29  5.18
Problem 10.6 Five different copper-silver alloys are being considered for the conducting
material in large coaxial cables, for which conductivity is a very important material
characteristic. Because of differing availabilities of the five kinds, it was impossible to make as
many samples from alloys 2 and 3 as from the other alloys. Given in Table 10.7 are the coded
conductivity measurements from samples of wire made from each of the alloys. Determine
whether the alloys have significantly different conductivities. If the conductivities are
significantly different (at α = 0.05), determine which alloys differ from one another. Specify
and check the model assumptions.
Table 10.7:

                       Alloy
   1        2        3        4        5
 60.60    58.88    62.90    60.72    57.93
 58.93    59.43    63.63    60.41    59.85
 58.40    59.30    62.33    59.60    61.06
 58.63    56.97    63.27    59.27    57.31
 60.64    59.02    61.25    59.79    61.28
 59.05    58.59    62.67    62.35    59.68
 59.93    60.19    61.29    60.26    57.82
 60.82    57.99    60.77    60.53    59.29
 58.77    59.24             58.91    58.65
 59.11    57.38             58.55    61.96
 61.40                      61.20    57.96
 59.00                      59.73    59.42
                            60.12    59.40
                            60.49    60.30
                                     60.15

Problem 10.7 Show that

E(Ȳ_i.) = μ_i,   i = 1, . . . , k,

E(S_i²) = σ²,   i = 1, . . . , k,
and

E(MSe) = σ².

Is the variance of MSe smaller than the variance of the individual S_i²? Why?
Problem 10.8 To study the correlation between solar insolation and wind speed in
the United States, 26 National Weather Service stations used three different types of solar
collectors (2D Tracking, NS Tracking and EW Tracking) to collect the solar insolation and wind
speed data. An engineer wishes to compare whether these three collectors give significantly
different measurements of wind speed. The values of wind speed corresponding to attainment
of 95% integrated insolation are reported in Table 10.8.
Are there statistically significant differences in measurement among the three different
apertures? Specify and check the model assumptions.
Table 10.8:

Station No.  Site                   Latitude   2D Tracking   NS Tracking   EW Tracking
     1       Brownsville, Texas      25.900        11.0          11.0          11.0
     2       Apalachicola, Fla.      29.733         7.9           7.9           8.0
     3       Miami, Fla.             25.800         8.7           8.6           8.7
     4       Santa Maria, Calif.     34.900         9.6           9.7           9.5
     5       Ft. Worth, Texas        32.833        10.8          10.7          10.9
     6       Lake Charles, La.       30.217         8.5           8.4           8.6
     7       Phoenix, Ariz.          33.433         6.6           6.6           6.5
     8       El Paso, Texas          31.800        10.3          10.3          10.3
     9       Charleston, S.C.        32.900         9.2           9.1           9.2
    10       Fresno, Calif.          36.767         6.2           6.3           6.1
    11       Albuquerque, N.M.       35.050         9.0           9.0           8.9
    12       Nashville, Tenn.        36.117         7.7           7.6           7.7
    13       Cape Hatteras, N.C.     35.267         9.2           9.2           9.3
    14       Ely, Nev.               39.283        10.0          10.1          10.1
    15       Dodge City, Kan.        37.767        12.0          11.9          12.0
    16       Columbia, Mo.           38.967         9.0           8.9           9.1
    17       Washington, D.C.        38.833         9.3           9.1           9.5
    18       Medford, Ore.           42.367         6.8           6.9           6.5
    19       Omaha, Neb.             41.367        10.4          10.3          10.5
    20       Madison, Wis.           43.133         9.5           9.5           9.6
    21       New York, N.Y.          40.783        10.4          10.3          10.4
    22       Boston, Mass.           42.350        11.4          11.2          11.4
    23       Seattle, Wash.          47.450         9.0           9.0           9.1
    24       Great Falls, Mont.      47.483        12.9          12.6          13.0
    25       Bismarck, N.D.          46.767        10.8          10.7          10.8
    26       Caribou, Me.            46.867        11.4          11.3          11.5
Chapter 11

11.1 An Example

Table 11.1 displays, for the i-th metal bar, the bar diameter, x_i (in 1/8 of an inch), the
elastic limit, y_i (in 100 psi), and the ultimate strength, z_i (in 100 psi).
We will investigate the relationship between the variables x_i and y_i. The relationship
between the variables x_i and z_i can be investigated in an analogous way (see Problem 11.2).
First of all we notice that the roles of y_i and x_i are different. Reasonably, one must assume
that the elastic limit, y_i, of the i-th metal bar is somehow determined (or influenced) by the
diameter, x_i, of the bar. Consequently, the variable y_i can be considered as a dependent or
response variable and the variable x_i can be considered as an independent or explanatory
variable.
A quick look at Figure 11.1 (a) will show that there is not an exact (deterministic)
relationship between x_i and y_i. For example, bars with the same diameter (3, say) have
different elastic limits (436.82, 449.40 and 412.63). However, the plot of y_i versus x_i shows
that, in general, larger values of x_i are associated with smaller values of y_i.
In cases like this we say that the variables are statistically related, in the sense that the
average elastic limit is a decreasing function, f(x_i), of the diameter.
Table 11.1:

Unit   Bar Diameter        Elastic Limit   Ultimate Strength
       (1/8 of an inch)    (100 psi)       (100 psi)
  1          3               436.82            683.65
  2          3               449.40            678.48
  3          3               412.63            681.41
  4          4               425.00            672.29
  5          4               419.71            673.26
  6          4               415.74            671.31
  7          4               422.94            674.42
  8          5               407.76            646.44
  9          5               416.84            654.32
 10          5               388.39            649.31
 11          5               416.25            654.24
 12          5               384.35            644.20
 13          5               412.91            640.15
 14          6               379.64            627.52
 15          6               371.11            621.45
 16          6               369.34            626.11
 17          6               384.91            632.73
 18          7               362.89            601.73
 19          7               361.14            605.12
 20          7               356.06            604.17
 21          8               328.59            568.11
 22          8               321.64            576.69
 23          8               321.14            570.47
 24          9               297.28            538.99
 25          9               286.04            537.11
 26          9               291.99            537.44
 27         10               231.15            502.76
 28         10               249.13            498.88
 29         10               249.81            495.17
 30         10               251.22            499.21
 31         11               200.76            455.28
 32         11               216.99            460.75
 33         11               210.26            460.96
 34         12               162.30            411.13
 35         12               167.63            410.74
Each elastic limit measurement, y_i, can be viewed as a particular value of the random
variable Y_i, which in turn can be expressed as the sum of two terms, f(x_i) and ε_i. That is,

Y_i = f(x_i) + ε_i,    i = 1, . . . , 35.        (11.1)

It is usually assumed that the random variables ε_i satisfy the following assumptions:
(1) E(ε_i) = 0;
(2) the ε_i are independent;
(3) Var(ε_i) = σ², the same for all i;
(4) the ε_i are normally distributed.
These assumptions can be summarized by simply saying that the variables Y_i are independent
normal random variables with

E(Y_i) = f(x_i)   and   Var(Y_i) = σ².
The model (11.1) above is called linear if the function f(x_i) can be expressed in the form

f(x_i) = β_0 + β_1 g(x_i),

where the function g(x) is completely specified, and β_0 and β_1 are (usually unknown)
parameters.
Figure 11.1: (a) Elastic limit versus diameter, with the fitted line from the first (linear in x) fit; (b) residuals from the first fit versus diameter; (c) elastic limit versus diameter, with the fitted curve from the second (linear in x²) fit; (d) residuals from the second fit versus diameter.
The linear model,

Y_i = β_0 + β_1 g(x_i) + ε_i,    i = 1, . . . , 35,        (11.2)

is very flexible, as many possible mean response functions, f(x_i), satisfy the linear form
given above. For example, the functions

f(x_i) = 5.0 + 4.2 x_i    (β_0 = 5 and β_1 = 4.2, with g(x) = x)

and

f(x_i) = β_0 + 3 sin(2x_i)    (β_0 unspecified and β_1 = 3, with g(x) = sin(2x))

are both of the linear form. On the other hand, a mean response function such as

f(x_i) = exp{β_1 x_i} / (1 + exp{β_1 x_i})

is not linear in the parameters.
The shape assumed for f(x_i) is sometimes suggested by scientific or physical considerations.
In other cases, as in the present example, the shape of f(x_i) is suggested by the data
itself. The plot of y_i versus x_i (see Figure 11.1) indicates that, at least in principle, the simple
linear mean response function

f(x_i) = β_0 + β_1 x_i,    that is, g(x_i) = x_i,

may be appropriate. In other words, to begin our investigation we will use the tentative
working assumption that, on average, the elastic limit of the metal bars is a linear function
(β_0 + β_1 x_i) of their diameters.
Of course, the values of β_0 and β_1 are unknown and must be empirically determined, that
is, estimated from the data. One popular method for estimating these parameters is the
method of least squares. Given tentative values b_0 and b_1 for β_0 and β_1, respectively,
the regression residuals

r_i(b_0, b_1) = y_i − b_0 − b_1 x_i,    i = 1, . . . , n,

measure the vertical distances between the observed values, y_i, and the tentatively estimated
mean response function, b_0 + b_1 x_i.
The method of least squares consists of finding the values β̂_0 and β̂_1 of b_0 and b_1, respectively,
which minimize the sum of the squares of the residuals. It is expected that, because of this
minimization property, the corresponding mean response function,

f̂(x_i) = β̂_0 + β̂_1 x_i,

will be close to, or will fit, the data points. In other words, the least squares estimates β̂_0
and β̂_1 are the solution to the minimization problem:
min_{b_0, b_1} Σ_{i=1}^{n} r_i²(b_0, b_1) = min_{b_0, b_1} Σ_{i=1}^{n} [y_i − b_0 − b_1 x_i]².

To solve this problem we differentiate the sum of squares

S(b_0, b_1) = Σ_{i=1}^{n} r_i²(b_0, b_1)

with respect to b_0 and b_1, and set these derivatives equal to zero to obtain the so-called LS
equations:

∂S(b_0, b_1)/∂b_0 = −2 Σ_{i=1}^{n} [y_i − b_0 − b_1 x_i] = 0,

∂S(b_0, b_1)/∂b_1 = −2 Σ_{i=1}^{n} [y_i − b_0 − b_1 x_i] x_i = 0,

that is,

Σ_{i=1}^{n} y_i − n b_0 − b_1 Σ_{i=1}^{n} x_i = 0,

Σ_{i=1}^{n} y_i x_i − b_0 Σ_{i=1}^{n} x_i − b_1 Σ_{i=1}^{n} x_i² = 0,

or equivalently,

ȳ − b_0 − b_1 x̄ = 0,        (11.3)

x̄y − b_0 x̄ − b_1 x̄x = 0,        (11.4)

where

x̄y = (1/n) Σ_{i=1}^{n} y_i x_i   and   x̄x = (1/n) Σ_{i=1}^{n} x_i².
From equation (11.3),

b_0 = ȳ − b_1 x̄.        (11.5)

Substituting (11.5) into (11.4) and solving for b_1 gives the least squares estimates

β̂_1 = (x̄y − x̄ ȳ) / (x̄x − x̄ x̄)   and   β̂_0 = ȳ − β̂_1 x̄.

In the case of our numerical example we have

x̄ = 7.086,   ȳ = 336.565,   x̄y = 2162.353   and   x̄x = 57.657.

Therefore,

β̂_1 = [2162.353 − (7.086)(336.565)] / [57.657 − (7.086)²] = −29.86

and

β̂_0 = 336.565 − (−29.86)(7.086) = 548.15.
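These estimates are easy to reproduce. A minimal sketch follows (assuming Python with
NumPy; the summary quantities are those quoted above, and np.polyfit applied to the raw
data of Table 11.1 would give essentially the same values):

import numpy as np

# Least squares estimates for the straight-line fit, from the sample averages.
x_bar, y_bar = 7.086, 336.565
xy_bar, xx_bar = 2162.353, 57.657

beta1 = (xy_bar - x_bar * y_bar) / (xx_bar - x_bar ** 2)
beta0 = y_bar - beta1 * x_bar
print(f"beta1 = {beta1:.2f}, beta0 = {beta0:.2f}")    # about -29.9 and 548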
The plot of f̂(x) versus x (solid line in Figure 11.1 (a)) and the plot of the regression
residuals,

e_i = y_i − β̂_0 − β̂_1 x_i,

versus x_i (Figure 11.1 (b)) can be used to assess the adequacy of the fit. If the specified mean
response function were correct, one would expect the plot of the e_i versus the x_i not to show
any particular pattern. In other words, if the specified mean response function is correct, the
estimated mean response function f̂(x_i) should extract most of the signal (systematic
behavior) contained in the data, and the residuals, e_i, should behave as patternless random
noise. In our case, however, the residual plot shows a systematic pattern, indicating that the
straight-line mean response function is not adequate.
Now that the tentatively specified simple transformation

g(x) = x

for the explanatory variable, x, is considered to be incorrect, the next step in the analysis is
to specify a new transformation. We will try the mean response function

f(x) = β_0 + β_1 x²,    that is, g(x) = x².

Notice that if the mean elastic limit of the bars is a function of the bar surface,

π (diameter)²/4 · (1/8 inch)² = x² (π/4) (1/8 inch)²,

then the newly proposed mean response function will be appropriate. To simplify the notation
we will write

w_i = x_i²

to represent the squared diameter of the i-th metal bar.
The new estimates for β_0 and β_1 are

β̂_1 = (w̄y − ȳ w̄) / (w̄w − w̄ w̄) = −2.022

and

β̂_0 = ȳ − β̂_1 w̄ = 453.125.

The plot for this new fit (solid line in Figure 11.1 (c)) and the residuals plot (Figure 11.1
(d)) indicate that this second fit is appropriate.
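The second fit can be reproduced directly from the raw data of Table 11.1. A minimal sketch
(assuming Python with NumPy; np.polyfit returns the slope and intercept of the regression
of y on w = x²):

import numpy as np

# Quadratic-in-diameter fit for the elastic limit data of Table 11.1.
x = np.array([3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7,
              7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12], float)
y = np.array([436.82, 449.40, 412.63, 425.00, 419.71, 415.74, 422.94, 407.76,
              416.84, 388.39, 416.25, 384.35, 412.91, 379.64, 371.11, 369.34,
              384.91, 362.89, 361.14, 356.06, 328.59, 321.64, 321.14, 297.28,
              286.04, 291.99, 231.15, 249.13, 249.81, 251.22, 200.76, 216.99,
              210.26, 162.30, 167.63])

w = x ** 2
beta1, beta0 = np.polyfit(w, y, deg=1)                # fit y = beta0 + beta1 * w
residuals = y - (beta0 + beta1 * w)
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}")     # about 453.1 and -2.02
print(f"residual mean square = {np.sum(residuals ** 2) / (len(y) - 2):.2f}")

Plotting the residuals against the diameter (as in Figure 11.1 (d)) is then a one-line call to
matplotlib.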
It can be shown that, under the model assumptions, the least squares estimates are unbiased,

E(β̂_0) = β_0   and   E(β̂_1) = β_1,

and that

Var(β̂_0) = σ² [ 1/n + w̄² / Σ_{i=1}^{n} (w_i − w̄)² ]   and   Var(β̂_1) = σ² / Σ_{i=1}^{n} (w_i − w̄)².

Finally, it can be shown that under the model,

E( Σ_{i=1}^{n} [Y_i − β̂_0 − β̂_1 w_i]² ) = (n − 2) σ²,

so that

s² = Σ_{i=1}^{n} [Y_i − β̂_0 − β̂_1 w_i]² / (n − 2)

is an unbiased estimate of σ².
Therefore,

SD(β̂_0) = s √[ 1/n + w̄² / Σ_{i=1}^{n} (w_i − w̄)² ] = s √[ 1/n + w̄² / (n[w̄w − w̄ w̄]) ]

and

SD(β̂_1) = s / √[ Σ_{i=1}^{n} (w_i − w̄)² ] = s / √( n[w̄w − w̄ w̄] ).

In the case of our example (with s² = 86.53, n = 35, w̄ = 57.657 and w̄w = 4993.429),

SD(β̂_0) = 1.60

and

SD(β̂_1) = √( 86.53 / (35[4993.429 − 57.657²]) ) = 0.0385.
Confidence Intervals

95% confidence intervals for the model parameters, β_0 and β_1, and also for the mean
response, f(x), can now be easily obtained. First we derive the 95% confidence intervals for
β_0 and β_1. As before, the intervals are of the form

β̂_0 ± d_0   and   β̂_1 ± d_1,

where

d_0 = t_(n−2)(α) SD(β̂_0)   and   d_1 = t_(n−2)(α) SD(β̂_1).

In the case of our example, n − 2 = 35 − 2 = 33, t_(33)(0.05) ≈ t_(30)(0.05) = 2.04, and so

d_0 = (2.04)(1.60) = 3.26   and   d_1 = (2.04)(0.0385) = 0.0785.
The 95% confidence intervals for β_0 and β_1 are therefore

453.125 ± 3.26   and   −2.021 ± 0.0785,

respectively.
Notice that, since the confidence interval for β_1 doesn't include the value zero, we conclude
that there is a linear decreasing relationship between the square of the bar diameter and its
elastic limit. When the bar surface increases by one unit (1/64 inch²), the average elastic limit
decreases by approximately two hundred psi.
Finally, we can also construct a 95% confidence interval for the average response, f(x), at
any given value of x. It can be shown that the variance of f̂(w) is

Var(f̂(w)) = σ² [ 1/n + (w − w̄)² / (n[w̄w − w̄ w̄]) ],

so that

SD(f̂(w)) = s √[ 1/n + (w − w̄)² / (n[w̄w − w̄ w̄]) ] = √(86.53) √[ 1/35 + (w − 57.657)² / (35[4993.429 − 57.657²]) ].

For example, for bars of diameter x = 8 (so that w = x² = 64),

SD(f̂(64)) = √(86.53) √[ 1/35 + (64.0 − 57.657)² / (35[4993.429 − 57.657²]) ] = 1.59,

and the corresponding 95% confidence interval for the mean elastic limit, f̂(64) ± (2.04)(1.59), is

323.72 ± 3.24.
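These standard errors and intervals can be computed from the summary quantities above. A
minimal sketch (assuming Python with NumPy and SciPy; the exact t quantile is used instead
of the rounded table value 2.04):

import numpy as np
from scipy import stats

# Standard errors and 95% confidence intervals for the quadratic fit.
n = 35
w_bar, ww_bar = 57.657, 4993.429
beta0, beta1 = 453.125, -2.021
s2 = 86.53                                            # residual mean square

Sww = n * (ww_bar - w_bar ** 2)                       # sum of (w_i - w_bar)^2
se_beta1 = np.sqrt(s2 / Sww)
t_crit = stats.t.ppf(0.975, df=n - 2)                 # two-sided 95% critical value

def mean_response_ci(w):
    """95% CI for the mean response f(w) = beta0 + beta1 * w."""
    se = np.sqrt(s2 * (1 / n + (w - w_bar) ** 2 / Sww))
    fit = beta0 + beta1 * w
    return fit - t_crit * se, fit + t_crit * se

print(f"beta1 = {beta1} +/- {t_crit * se_beta1:.4f}")  # about -2.021 +/- 0.078
print("95% CI for f(64):", mean_response_ci(64.0))     # about 323.7 +/- 3.2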
11.2 Exercises
Problem 11.1 The number of hours needed by twenty employees to complete a certain task
has been measured before and after they participated in a special training program. The
data are displayed in Table 7.2. Notice that these data have already been partially studied in
Problem 7.12. Investigate the relationship between the before-training and after-training
times using linear regression. State your conclusions.
Problem 11.2 Investigate the relationship between the bar diameter and the ultimate
strength shown in Table 11.1. State your conclusions.
Problem 11.3 Table 11.2 reports the yearly worldwide frequency of earthquakes with
magnitude 6 or greater, from January 1953 to December 1965.
(a) Make scatter-plots of the frequencies against the magnitudes and of the log-frequencies
against the magnitudes.
(b) Propose your regression model and estimate the coefficients of your model.
(c) Test the null hypothesis that the slope is equal to zero.
Table 11.2:

Magnitude  Frequency     Magnitude  Frequency
   6.0        2750          7.4         57
   6.1        1929          7.5         45
   6.2        1755          7.6         31
   6.3        1405          7.7         23
   6.4        1154          7.8         18
   6.5         920          7.9         13
   6.6         634          8.0          9
   6.7         487          8.1          7
   6.8         376          8.2          7
   6.9         276          8.3          4
   7.0         213          8.4          2
   7.1         141          8.5          2
   7.2         110          8.6          1
   7.3          85          8.7          1
Problem 11.4 In a certain type of test specimen, the normal stress on a specimen is known
to be functionally related to the shear resistance. The following is a set of experimental data
on the two variables.

x, normal stress   y, shear resistance
      26.8                26.5
      25.4                27.3
      28.9                24.2
      23.6                27.1
      27.7                23.6
      23.9                25.9
      24.7                26.3
      28.1                22.5
      26.9                21.7
      27.4                21.4
      22.6                25.8
      25.6                24.9
Chapter 12
Appendix
12.1 Appendix A: Tables

This appendix includes five tables: a normal table, a t-distribution table, an F-distribution
table, a cumulative Poisson distribution table and a cumulative binomial distribution table.