STAT 251 Course Text
3 Probability  41
   3.1 Sets and Probability  41
   3.2 Conditional Probability and Independence  44
   3.3 Exercises  52
5 Normal Distribution  89
   5.1 Definition and Properties  89
   5.2 Checking Normality  95
   5.3 Exercises  98
      5.3.1 Exercise Set A  98
      5.3.2 Exercise Set B  99
12 Appendix  179
   12.1 Appendix A: tables  179
Chapter 1
Summary and Display of Univariate Data

1.1 Frequency Table and Histogram
Consider the 200 measurements of the live load distribution (pounds per square foot) on ten
floors and twenty bays of a large warehouse (Table 1.1). The live load is the load supported by
the structure excluding the weight of the structure itself. Notice how hard it is to understand data
presented in this raw form. They must clearly be organized and summarized in some fashion before
their analysis can be attempted.
One way to summarize a large data set is to condense it into a frequency table (see Table 1.2).
The first step to construct a frequency table is to determine an appropriate data range, that is,
an interval that contains all the observations and that has end points close (but not necessarily
equal) to the smallest and largest data values. The second step is to determine the number k of
bins. The data range is divided into k smaller subintervals, the bins, usually taken of the same size.
Normally, the number of bins k is chosen between 7 and 15, depending on the size of the data set
with fewer bins producing simpler but less detailed tables. For example, in the case of the live load
data, the smallest and largest observations are 32.1 and 258.9, the data range is [20, 260] and there
are 12 bins of size 20. The third step is to calculate the bin mark, ci , which represents that bin.
The bin mark is the center of the bin interval (that is, one half of the sum of the bin’s end points).
For example, 30 = (20 + 40)/2 for the first bin in Table 1.2. The fourth step is to calculate the
bin frequencies, ni . The bin frequency is equal to the number of data points lying in that bin. Each
data point must be counted once; if a data point falls on the common end point of two successive bins,
then it is included (only) in the second. For example, a live load of 60 is included in the third bin
(see Table 1.2). The fifth step is to calculate the relative frequencies

   fi = ni / (n1 + n2 + · · · + nk)

and the cumulative relative frequencies

   Fi = (n1 + · · · + ni) / (n1 + n2 + · · · + nk).
Notice that fi × 100% gives the percentage of observations in the ith bin and Fi × 100% gives the
percentage of observations below the upper end point of the ith bin. For example, from Table 1.2, 18% of the
live loads are between 140 and 160 psf, and 95% of the live loads are below 200 psf.
Table 1.2: Frequency table for the live load data

   Class      ci    ni    fi      Fi
   20–40      30     2    0.010   0.010
   40–60      50     5    0.025   0.035
   60–80      70     6    0.030   0.065
   80–100     90    15    0.075   0.140
   100–120   110    28    0.140   0.280
   120–140   130    47    0.235   0.515
   140–160   150    36    0.180   0.695
   160–180   170    32    0.160   0.855
   180–200   190    19    0.095   0.950
   200–220   210     5    0.025   0.975
   220–240   230     3    0.015   0.990
   240–260   250     2    0.010   1.000
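The columns of Table 1.2 are easy to reproduce by computer. The following short Python sketch is only an illustration (the variable names and the use of NumPy are our own choices): it takes the bin end points and the bin frequencies ni of Table 1.2 and recomputes the bin marks ci, the relative frequencies fi and the cumulative relative frequencies Fi.

    import numpy as np

    edges = np.arange(20, 261, 20)                                   # bin end points: 20, 40, ..., 260
    counts = np.array([2, 5, 6, 15, 28, 47, 36, 32, 19, 5, 3, 2])    # bin frequencies n_i (Table 1.2)

    marks = (edges[:-1] + edges[1:]) / 2                             # bin marks c_i
    rel_freq = counts / counts.sum()                                 # relative frequencies f_i
    cum_rel_freq = np.cumsum(rel_freq)                               # cumulative relative frequencies F_i

    for c, n, f, F in zip(marks, counts, rel_freq, cum_rel_freq):
        print(f"{c:5.0f} {n:4d} {f:6.3f} {F:6.3f}")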
At this point it is worth comparing Table 1.1 and Table 1.2. We can quickly learn, for instance,
from Table 1.2 that only 2 live loads lie between 20 and 40, but we cannot say which they are. On
the other hand, with considerable effort, we can find out from Table 1.1 that these live loads are
32.1 and 33.5. Table 1.2 loses some information in exchange for clarity. Both the loss of information
and the gain in clarity increase as the number of bins decreases.
Histogram:
The information contained in a frequency table can be graphically displayed in a picture called a
histogram (see Figure 1.1). Bars with areas proportional to the bin frequencies are drawn over each
bin. Notice that in the case of bins of equal size the bar areas are proportional to the bar heights.
The histogram shows the shape or distribution of the data and permits a direct visualization of
its general characteristics including typical values, spread, shape, etc. The histogram also helps
to detect unusual observations called outliers. From Figure 1.1 we notice that the distribution of
the live load is approximately symmetric: the central bin 120–140 is the most frequent and the
frequencies of the other bins decrease as we move away from this central bin.
Figure 1.1: Histogram of the live load data.
Many data sets encountered in practice are not symmetric. For example the histogram of
Tobin’s Q-ratios (market value to replacement cost, out of 250) for 50 firms in Figure 1.2 (a) shows
high positive skewness: there are a few firms which are highly overrated. The age of officers
attaining the rank of colonel in the Royal Netherlands Air Force (Figure 1.2 (b)) exhibits a pattern
of negative skewness. There appear to be more "whizzes" than "laggards" in the Netherlands Air
Force. Figure 1.2 (c) displays Simon Newcomb’s measurements of the speed of light. Newcomb
measured the time required for light to travel from his laboratory on the Potomac River to a mirror
at the base of the Washington Monument and back, a total distance of about 7400 meters. These
measurements were used to estimate the speed of light. The histogram of Newcomb’s data (Figure
1.2 (c)) shows a symmetric distribution except for two outliers. Deleting these outliers gives the
symmetric histogram on Figure 1.2 (d).
Data sets can be further summarized in terms of just two numbers, one giving their location and
the other their dispersion. These summaries are very convenient and perhaps unavoidable when we
must compare several data sets (e.g. the production figures from several plants and shifts). The
loss of information is not severe in the case of data sets with approximately symmetric histograms,
but may be very severe in other cases.
Two commonly used measures of location and dispersion are the sample mean and the sample
standard deviation. They are studied in the next two sections.
1.2 Sample Mean

The sample mean x̄ (also called the sample average) of a data set or "sample" is defined as

   x̄ = (x1 + x2 + · · · + xn)/n = (1/n) Σ_{i=1}^{n} xi ,

where n represents the number of data points (observations). For the live load data (see Table 1.1), x̄ = 140.2 pounds per square foot.
The sample average can also be approximately calculated from a frequency table using the formula

   x̄ ≈ (Σ_{i=1}^{k} ci ni) / (Σ_{i=1}^{k} ni) = Σ_{i=1}^{k} ci fi .

The approximation is better when the measurements are symmetrically distributed over each bin.
For the live load data (see Table 1.2) we have

   x̄ ≈ [(30 × 2) + (50 × 5) + · · · + (250 × 2)] / (2 + 5 + · · · + 2)
     = (30 × 0.010) + (50 × 0.025) + · · · + (250 × 0.010) = 139.8 pounds per ft².
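As a quick check of the approximation, the following lines (again, just an illustration with our own variable names) compute Σ ci ni / Σ ni for the counts of Table 1.2.

    import numpy as np

    marks = np.arange(30, 251, 20)                                   # bin marks c_i from Table 1.2
    counts = np.array([2, 5, 6, 15, 28, 47, 36, 32, 19, 5, 3, 2])    # bin frequencies n_i

    approx_mean = np.sum(marks * counts) / counts.sum()              # equivalently, sum of c_i * f_i
    print(round(float(approx_mean), 1))                              # 139.8, as in the text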
Linear Transformations: If the original measurements, xi are linearly transformed to obtain new
measurements
yi = a + bxi ,
for some constants a and b, then
y = a + bx.
In fact,

   ȳ = (Σ_{i=1}^{n} yi)/n = (Σ_{i=1}^{n} (a + b xi))/n = (na + b Σ_{i=1}^{n} xi)/n = a + b (Σ_{i=1}^{n} xi)/n = a + b x̄.
Example 1.1 Suppose that each live load from Table 1.1 is converted to kilograms per square foot and then increased by 5 kilograms per square foot. Since one pound equals 0.4535 kilograms, the revised measurements are yi = 5 + 0.4535 xi and ȳ = 5 + 0.4535 x̄ = 5 + 0.4535 × 140.2 = 68.58 kilograms per square foot.
Sum of Variables: If new measurements zi are obtained by adding old measurements xi and yi
then
z = x + y.
In fact,

   z̄ = (Σ_{i=1}^{n} zi)/n = (Σ_{i=1}^{n} (xi + yi))/n = (Σ_{i=1}^{n} xi + Σ_{i=1}^{n} yi)/n = x̄ + ȳ.
Example 1.2 Let ui and vi (i = 1, . . . , 10) represent the live loads on bays A and B. The mean loads across floors for these two bays are (see Table 1.1) ū = 1344.2/10 = 134.42 and v̄ = 1520.1/10 = 152.01 pounds per square foot. If wi represents the combined live load on bays A and B (i.e. wi = ui + vi), then the combined mean load across floors for these two bays is w̄ = ū + v̄ = 134.42 + 152.01 = 286.43 pounds per square foot.
Least Squares: The sample mean has a nice geometric interpretation. If we represent each observation xi as a point on the real line, then the sample mean is the point which is "closest" to the entire collection of measurements. More precisely, let S(t) be the sum of the squared distances from each observation xi to the point t:

   S(t) = Σ_{i=1}^{n} (xi − t)².
Then S(t) ≥ S(x) for all t. To prove this write
   S(t) = Σ_{i=1}^{n} [(xi − x̄) + (x̄ − t)]²
        = Σ_{i=1}^{n} [(xi − x̄)² + (x̄ − t)² + 2(xi − x̄)(x̄ − t)]
        = Σ_{i=1}^{n} (xi − x̄)² + n(x̄ − t)² + 2(x̄ − t) Σ_{i=1}^{n} (xi − x̄)
        = S(x̄) + n(x̄ − t)², since Σ_{i=1}^{n} (xi − x̄) = n x̄ − n x̄ = 0.
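The least squares property of the sample mean is easy to verify numerically. The sketch below uses a small made-up data set (the numbers are ours, chosen only for illustration), evaluates S(t) over a fine grid and confirms that the minimizing value of t is the sample mean.

    import numpy as np

    x = np.array([3.0, 7.0, 8.0, 12.0, 20.0])              # a small illustrative data set
    t_grid = np.linspace(0, 25, 2501)                      # candidate values of t (step 0.01)

    S = np.array([np.sum((x - t) ** 2) for t in t_grid])   # S(t) for each candidate t

    print(t_grid[np.argmin(S)], x.mean())                  # both print 10.0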
Center of Gravity: The sample mean has also a nice physical interpretation. If we think of
the observations xi as points on a uniform beam where vertical equal forces, Fi , are applied (see
Figure 1.3), then the sample mean is the center of gravity of this system. To see this consider the
magnitude and the placement of the opposite force F needed to achieve static equilibrium. Since all
the forces are vertical, the horizontal component of F must be equal to zero. To achieve translation
equilibrium the sum of the vertical components of all the forces must also be equal to zero. If we
denote the vertical components of the Fi by Fi, and the vertical component of F by F, then

   F + F1 + F2 + · · · + Fn = 0.

Since the Fi's are all equal (Fi = −w, say) we have F − nw = 0 and so F = nw. To achieve torque
equilibrium, the placement d of F must satisfy

   d n w − w (x1 + x2 + · · · + xn) = 0.

Therefore,

   d = (x1 + x2 + · · · + xn)/n = x̄.
Figure 1.3: The Sample Mean as Center of Gravity
1.3 Sample Standard Deviation, Variance and Covariance

The sample variance of a data set x1, x2, . . . , xn is defined as

   Var(x) = Σ_{i=1}^{n} (xi − x̄)² / (n − 1),

and the sample standard deviation is SD(x) = √Var(x).

Linear Transformations: If the original measurements, xi, are linearly transformed to obtain new measurements

   yi = a + b xi ,

for some constants a and b, then

   Var(y) = b² Var(x).
In fact, since ȳ = a + b x̄,

   Var(y) = Σ (yi − ȳ)² / (n − 1) = Σ (a + b xi − a − b x̄)² / (n − 1)
          = Σ [b (xi − x̄)]² / (n − 1) = b² Σ (xi − x̄)² / (n − 1) = b² Var(x).
Example 1.3 As in Example 1.1, each live load in Table 1.1 is converted to kilograms per square foot and then increased by 5 kilograms per square foot. Since one pound equals 0.4535 kilograms, the revised measurements are yi = 5 + 0.4535 xi kilograms per square foot and so Var(y) = 0.4535² × Var(x) = 0.2056623 × 1583.892 = 325.747 square kilograms per ft⁴. The corresponding standard deviation is SD(y) = √325.747 = 18.048 kilograms per square foot.
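The numbers in Example 1.3 follow from the rule Var(a + bx) = b² Var(x) alone. A minimal check (the variable names are ours; the value Var(x) = 1583.892 is the one quoted in the example):

    import math

    var_x = 1583.892            # sample variance of the live loads, as quoted in Example 1.3
    a, b = 5.0, 0.4535          # y_i = a + b * x_i

    var_y = b ** 2 * var_x      # the additive constant a does not affect the variance
    sd_y = math.sqrt(var_y)

    print(round(var_y, 3), round(sd_y, 3))    # about 325.747 and 18.048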
Sum of Variables: If new measurements zi are obtained by adding old measurements xi and yi then

   Var(z) = Var(x) + Var(y) + 2 Cov(x, y),                    (1.1)

where

   Cov(x, y) = Σ_{i=1}^{n} (xi − x̄)(yi − ȳ) / (n − 1)

is the covariance between xi and yi. The covariance will be further discussed in the next chapter.
The important point here is to notice that the variances of xi and yi cannot simply be added to
obtain the variance of zi.
To prove (1.1) write
   Var(z) = Σ_{i=1}^{n} (zi − z̄)² / (n − 1) = Σ_{i=1}^{n} (xi + yi − x̄ − ȳ)² / (n − 1) = Σ_{i=1}^{n} [(xi − x̄) + (yi − ȳ)]² / (n − 1)
          = Σ_{i=1}^{n} [(xi − x̄)² + (yi − ȳ)² + 2(xi − x̄)(yi − ȳ)] / (n − 1)
          = [Σ_{i=1}^{n} (xi − x̄)² + Σ_{i=1}^{n} (yi − ȳ)² + 2 Σ_{i=1}^{n} (xi − x̄)(yi − ȳ)] / (n − 1)
          = Var(x) + Var(y) + 2 Cov(x, y).
Example 1.4 As in Example 1.2, let ui and vi be the live loads on bays A and B. The variance for bay A is (see Table 1.1 and Example 1.2)

   Var(u) = [(44.4 − 134.42)² + (130.4 − 134.42)² + · · · + (187.9 − 134.42)²] / 9 = 1777.128   (Bay A).

Similar calculations give Var(v) = 1657.93 (Bay B) and Cov(u, v) = −218.650; see also Example 1.5.
Two Simple Identities: the following identities are very useful for handling calculations of vari-
ances and covariances:
   Σ_{i=1}^{n} (xi − x̄)² = Σ_{i=1}^{n} xi² − n x̄² = Σ_{i=1}^{n} xi² − (Σ_{i=1}^{n} xi)² / n        (1.2)
and
   Σ_{i=1}^{n} (xi − x̄)(yi − ȳ) = Σ_{i=1}^{n} xi yi − n x̄ ȳ = Σ_{i=1}^{n} xi yi − (Σ_{i=1}^{n} xi)(Σ_{i=1}^{n} yi) / n.        (1.3)
Example 1.5 To illustrate the use of (1.2) and (1.3), let’s calculate again Var(u), Var(v) and
Cov(u, v) where ui and vi are as in Example 1.4. Using (1.2) and the totals from Table 1.3 we have
   Var(u) = [196681.5 − (1344.2)²/10] / 9 = 1777.128   and   Var(v) = [245991.8 − (1520.1)²/10] / 9 = 1657.93.

Using (1.3) and the totals from Table 1.3 we have

   Cov(u, v) = [202364.0 − (1344.2)(1520.1)/10] / 9 = −218.650.
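The arithmetic of Example 1.5 can be reproduced with a few lines of code. The totals below are copied from Table 1.3 as quoted in the example; everything else is our own illustration.

    n = 10
    sum_u, sum_u2 = 1344.2, 196681.5        # bay A: sum of u_i and sum of u_i^2
    sum_v, sum_v2 = 1520.1, 245991.8        # bay B: sum of v_i and sum of v_i^2
    sum_uv = 202364.0                       # sum of the products u_i * v_i

    var_u = (sum_u2 - sum_u ** 2 / n) / (n - 1)        # identity (1.2)
    var_v = (sum_v2 - sum_v ** 2 / n) / (n - 1)
    cov_uv = (sum_uv - sum_u * sum_v / n) / (n - 1)    # identity (1.3)

    print(round(var_u, 2), round(var_v, 2), round(cov_uv, 2))
    # 1777.13, 1657.93 and -218.65, matching the values in the example up to rounding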
Example 1.6 A student with an average of 94.7% (SD=2.8%) on the first 10 assignments had a
personal problem and did very poorly on the eleventh where he got zero. Calculate his current
average and standard deviation.
   x̄ = [(10 × 94.7) + 0] / 11 = 86.09.

To calculate the new standard deviation notice that Σ_{i=1}^{10} (xi − 94.7)² = 9 × 2.8² = 70.56 and by (1.2)

   Σ_{i=1}^{10} xi² = Σ_{i=1}^{10} (xi − 94.7)² + 10 × 94.7² = 70.56 + 89680.9 = 89751.46.

Therefore,

   Var(x) = [89751.46 + 0² − 11 × 86.09²] / 10 = 822.51,

and the standard deviation, then, increases from 2.8 to √822.51 = 28.68.
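Because the ten original marks are listed in Example 1.8 (94, 93, 95, 91, 96, 91, 98, 93, 99, 97), the updated mean and standard deviation can also be checked directly from the raw data; the short sketch below does this with Python's statistics module (our choice of tool, not part of the text).

    from statistics import mean, stdev

    marks = [94, 93, 95, 91, 96, 91, 98, 93, 99, 97]       # first ten assignments

    print(round(mean(marks), 1), round(stdev(marks), 1))   # 94.7 and 2.8, as stated

    marks_new = marks + [0]                                # include the eleventh assignment
    print(round(mean(marks_new), 2), round(stdev(marks_new), 2))
    # roughly 86.09 and 28.68: the standard deviation jumps from about 2.8 to almost 29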
We will see that data sets which are asymmetric or include outliers may be better summarized
using the sample quantiles defined below.
1.4 Sample Quantiles, Median and Interquartile Range
Let 0 < p < 1 be fixed. The sample quantile of order p, Q(p), is a number with the property
that approximately 100p% of the data points are smaller than it. For example, if the 0.95 quantile
for the class final grades is Q(0.95) = 85 then 95% of the students got 85 or less. If your grade is
87 then you are in the top 5% of the class. On the other hand, if your mark were smaller than
Q(0.10) then you would be in the lowest 10% of the class.
To compute Q(p) we follow these steps:

1. Sort the data from the smallest data point, x(1), to the largest data point, x(n).

2. If np + 0.5 is an integer, say m, then

   Q(p) = x(m).

3. If np + 0.5 is not an integer and m < np + 0.5 < m + 1 for some integer m, then

   Q(p) = (x(m) + x(m+1)) / 2.
Example 1.7 Let ui and vi be the live loads on the first two floors (see Table 1.4). Calculate the
quantiles of order 0.25, 0.50 and 0.75 for the live load on floors 1 and 2 and for the differences
wi = ui − vi between the live loads on these two floors.
Solution
To calculate the quantile of order 0.25 for the live load on floor 1, Qu (0.25), observe that n = 20,
p = .25 and so np + .5 = 20 × .25 + .5 = 5.5 is between 5 and 6. Using the column u(i) from Table
1.4 we obtain
   Qu(0.25) = (u(5) + u(6))/2 = (112.3 + 119.4)/2 = 115.85.
Similar calculations give Qv (0.25) = 109.25 and Qw (0.25) = −25.25. To calculate Qu (0.50) notice
that np + .5 = 20 × .50 + .5 = 10.5 is between 10 and 11. Again, using the column u(i) from Table
1.4 we obtain
   Qu(0.50) = (u(10) + u(11))/2 = (150.4 + 152.3)/2 = 151.35.
The reader can check using similar calculations that Qv (0.50) = 134.1, Qw (0.50) = 7, Qu (0.75) =
162.85, Qv (0.75) = 166.5 and Qw (0.75) = 38.
Unfortunately, the sample quantiles do not have the same nice properties as the sample mean in relation to sums and differences of variables. For example,

   Qu(0.50) − Qv(0.50) = 151.35 − 134.1 = 17.25 ≠ 7 = Qu−v(0.50)

and

   Qu(0.75) − Qv(0.75) = 162.85 − 166.5 = −3.65 ≠ 38 = Qu−v(0.75).
As a rule of thumb we will calculate both the mean and the median and use the mean if they
are similar. Otherwise we will use the median. To guide our choice we can calculate the discrepancy
index

   d = √n |Mean − Median| / (2 IQR)

and choose the mean when d is smaller than 1. The interquartile range (IQR), used in the denominator of d above, is defined as

   IQR = Q(0.75) − Q(0.25).
The IQR is recommended as a measure of dispersion in the presence of outliers and lack of symmetry.
Notice that the IQR is the length of the interval containing the central half of the data, regardless of the shape
of the histogram, and it is not much affected by outliers.
Example 1.8 Refer to Example 1.6. Calculate the median, the interquartile range and the discrepancy index d for the student's marks before and after the eleventh assignment. (The marks are 94, 93, 95, 91, 96, 91, 98, 93, 99, 97 and 0.)
Solution Since the sorted marks (before the eleventh assignment) are 91, 91, 93, 93, 94, 95, 96, 97,
98, 99, Q(0.25) = x(3) = 93, Q(0.5) = (x(5) + x(6) )/2 = (94 + 95)/2 = 94.5 and Q(0.75) = x(8) = 97.
Therefore, Median(x) = 94.5, IQR(x) = 97 − 93 = 4 and d = √10 (94.7 − 94.5)/(2 × 4) = 0.07905694.
Including the eleventh assignment we have Q(0.25) = (x(3) + x(4) )/2 = (91 + 93)/2 = 92,
Q(0.5) = x(6) = 94 and Q(0.75) = (x(8) + x(9) )/2 = (96 + 97)/2 = 96.5. Therefore, the new median
and IQR are: Median(x) = 94 and IQR(x) = 96.5 − 92 = 4.5. Unlike the mean, the median is very
little affected by the single poor performance. This is also reflected by the large discrepancy index
d = √11 |86.09 − 94|/(2 × 4.5) = 2.915.
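The np + 0.5 rule for Q(p) is easy to program. The sketch below (an illustration only; the function name is ours) implements it and reproduces the quartiles computed in Example 1.8, both before and after the eleventh assignment.

    def quantile(data, p):
        """Sample quantile of order p using the np + 0.5 rule described in Section 1.4."""
        x = sorted(data)
        n = len(x)
        k = n * p + 0.5
        m = int(k)
        if k == m:                        # np + 0.5 is an integer: Q(p) = x_(m)
            return x[m - 1]               # x_(m) in the 1-based notation of the text
        return (x[m - 1] + x[m]) / 2      # otherwise average x_(m) and x_(m+1)

    marks = [94, 93, 95, 91, 96, 91, 98, 93, 99, 97]
    print([quantile(marks, p) for p in (0.25, 0.50, 0.75)])          # [93, 94.5, 97]
    print([quantile(marks + [0], p) for p in (0.25, 0.50, 0.75)])    # [92.0, 94, 96.5]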
Example 1.9 Table 1.5 gives the mean, median, standard deviation and IQR for the data sets on
Figure 1.2. The mean and median of Tobin’s Q ratios show appreciable differences (d = 2.98). In
addition, their standard deviation is more than twice their IQR. Clearly, the mean and standard
deviation are upset by a few heavily over–rated firms. Tobin’s Q ratios are then better represented
by their median and IQR. The effect of outliers and lack of symmetry is moderate in the case of the
“Age of Officers” data. Although d = 1.07 the mean and standard deviation still summarize these
data well. Finally, for the “Speed of Light” data the two clear (lower) outliers do not seem to have
much affect on the sample mean (d = 0.64).
Table 1.5: Summary figures for the data sets displayed on Figure 1.2
Figure 1.4: Box plots for the data sets displayed in Figure 1.2 ((c) speed of light, (d) outliers deleted).
Example 1.10 Table 2.3 gives the monthly average flow (cubic meters per second) for the Fraser
River at Hope, BC, for the period 1971–1990. Figure 1.5 gives the boxplots for each month, from
January to December (from left to right). The year to year distributions of the monthly flows are
mildly asymmetric, with longer upper tails, and there are some outliers. However, the location and
dispersion summaries (see Table 1.10) are roughly consistent for most months and point to the same
conclusion: the river flow, and its variability as well, are much larger in the summer.
       Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
Mean 957.4 894.8 993.1 1941.0 4994.5 6973.0 5505.0 3548.0 2340.0 1816.0 1588.9 1092.4
Median 868.0 849.5 926.5 2010.0 5000.0 6365.0 5120.0 3380.0 2245.0 1910.0 1525.0 1005.0
SD 274.4 202.8 233.5 477.8 976.4 1434.2 1212.2 886.4 685.6 401.7 366.1 282.2
IQR 174.6 163.0 257.0 427.8 613.0 1325.9 1277.8 505.6 446.3 424.1 377.8 181.1
Figure 1.5: Fraser River monthly flow (cms) from January (left) to December (right)
1.6 Exercises
Problem 1.1 From a department store's records for a particular month, the total monthly finance charges (in dollars) were obtained for 240 customer accounts that included finance charges (see Table 1.7).
(a) Complete the frequency table. What percentage of customers were charged less than $20?
Problem 1.3 The following data are the waiting times (in minutes) between eruptions of Old
Faithful geyser between August 6 and 10, 1985.
Problem 1.4 The following numbers are the final marks of 16 students in a previous STAT 251
class.
64 86 77 68 95 91 58 91 83 97 96 14 32 68 89 75
Problem 1.5 In 1798, Henry Cavendish estimated the density of the earth (as a multiple of the
density of water) by using a torsion balance. The dataset below contains his 29 measurements.
Problem 1.6 The mean size of twenty five recent projects at a construction company (in square
meters) is 25,689 m2 . The standard deviation is 2,542 m2 .
(a) Calculate the mean, variance and standard deviation in square feet [Hint: 1 foot = 0.3048 m].
(b) A new project of 226,050 ft² has just been completed. Update the mean, variance and standard
deviation.
Problem 1.7 The daily sales in April, 1994 for two departments of a large department store (in
thousands of USA dollars) are summarized below.
(a) Convert the figures above to hundreds of Canadian dollars (CN $1 = US $0.7)
(b) Calculate the mean and standard deviation for the total daily sales for the two departments.
Why do you think the combined daily sales are more variable than the individual ones?
(c) Calculate the mean and standard deviation for the difference in daily sales between the two
departments. Comment on your results.
(d) Under what conditions would the variance of the sums be smaller than the variance of the
differences?
Problem 1.8 A manufacturer of automotive accessories provides bolts to fasten the accessory to
the car. Bolts are counted and packaged automatically by a machine. There are several adjustments
that affect the machine operation. An experiment to find out how several variables affect the speed
of the packaging process was carried out. In particular, the total number of bolts to be counted (10
and 30) and the sensitivity of the electronic eye (6 and 10) have been considered. The observed
times (in seconds per bolt) are given in Table 1.10.
(a) Summarize and describe the data.
(b) What adjustments have the greatest effect?
(c) How would you adjust the machine to shorten the packaging time?
Problem 1.9 Find the average, variance and standard deviation for the following sets of numbers.
a) 1, 2, 3, 4, 5, . . . , 300
b) 4, 8, 12, 16, 20, . . . , 1200
c) 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, . . . , 9, 9, 9, 9, 9, 9, 9, 9, 9
Hint: Σ_{i=1}^{n} i = n(n + 1)/2, Σ_{i=1}^{n} i² = n(n + 1)(2n + 1)/6, Σ_{i=1}^{n} i³ = n²(n + 1)²/4 and
Σ_{i=1}^{n} i⁴ = n(n + 1)(6n³ + 9n² + n − 1)/30.
Problem 1.10 The number of worldwide earthquakes in 1993 is shown in the following table
(a) Complete the frequency table. What percentage of earthquakes were below 5.0? Above 6.0?
(b) Draw a histogram and comment on it.
(c) Calculate the mean and standard deviation for the earthquake magnitude in 1993.
Problem 1.11 The daily number of customers served by a fast food restaurant were recorded for
30 days including 9 weekends and 21 weekdays. The average and standard deviations are as follows:
Weekends: x1 = 389.56, SD1 = 27.4
Weekdays: x2 = 402.19, SD2 = 26.2
Calculate the average and standard deviation for the 30 days.
Problem 1.12 The average and the standard deviation for the weights of 200 small concrete–mix
bags (nominal weight = 50 pounds) are 51.2 pounds and 1.5 pounds, respectively. A new sample
of 200 large concrete–mix bags (nominal weight = 100 pounds) have just been weighed. Do you
expect that the standard deviation for the last sample will be closer to 1.5 pounds or to 3.0 pounds?
Justify your answer.
Problem 1.14 Each pair (xi , wi ), i = 1, · · · , n, represents the placement and magnitude of a ver-
tical force acting on a uniform beam. Find the center of gravity of this system. [Hint: see the
discussion under “The Sample Mean as Center of Gravity” and notice that in the present case the
vertical forces are not equal].
Problem 1.15 Calculate the center of gravity of the system when the placements (xi ) and weights
(wi ) are given by Table 1.12.
Problem 1.16 Each pair (xi , wi ), i = 1, · · · , n, represents the placement and magnitude of a ver-
tical force acting on a uniform beam. What values of wi would make the sample median the center
of gravity? Consider the cases when n is even and n odd separately.
Problem 1.17 The maximum annual flood flows for a certain river, for the period 1941-1990, are
given in Table 1.6.
(i) Summarize and display these data.
(ii) Compute the mean, median, standard deviation and interquartile range.
(iii) If a one–year construction project is being planned and a flow of 150000 cfs or greater will halt
construction, what is the “probability” (based on past relative frequencies) that the construction
will be halted before the end of the project? What if it is a two-year construction project?
Problem 1.18 The planned and the actual times (in days) needed for the completion of 20 job
orders are given in Table 1.14.
(a) Calculate the average and the median planned time per order. Same for the actual time.
(b) Calculate the corresponding standard deviations and interquartile ranges.
(c) If there is a delay penalty of $5000 per day and a before–schedule bonus of $2500 per day, what
is the average net loss ( negative loss = gain) due to differences between planned and actual times?
What is the standard deviation?
(d) Study the relationship between the planned and actual times.
(e) What would be your advice to the company based on the analysis of these data?
Problem 1.19 Show that (a) Cov(x, y) = [(Σ_{i=1}^{n} xi yi) − n x̄ ȳ]/(n − 1).
(b) If ui = a + b xi and vi = c + d yi, then Cov(u, v) = b d Cov(x, y).
Order Planned Time Actual Time Order Planned Time Actual Time
1 22 22 11 17 18
2 11 8 12 27 34
3 11 8 13 16 14
4 16 14 14 30 35
5 21 20 15 22 18
6 12 16 16 17 16
7 25 29 17 13 12
8 20 20 18 18 14
9 13 10 19 21 19
10 34 39 20 18 17
Problem 1.20 The total paved area, X (in km2 ), and the time, Y (in days), needed to complete
the project was recorded for 25 different jobs. The data is summarized as follows:
Chapter 2

Summary and Display of Multivariate Data
Figure 2.1: (a) January versus February Fraser River flows; (b) January versus June Fraser River flows; (c) price versus age for twenty houses; (d) monthly Fraser River flow versus month of the year.
the same frequency. Similarly for lower than average January flows. Figure 2.1 (c) shows a negative
linear association between the age and price of twenty randomly selected houses: older than average
houses tend to have lower than average prices and vice versa for newer houses; Figure 2.1 (d) shows
a non–linear association between time of the year and river flow: the monthly mean flows first
increase (until June) and then decrease.
A common mistake is to confuse the concepts of linear association and causality. If we find a
positive linear association between two variables we can say that they tend to take values above and
below their means simultaneously. The observed linear association may be the result of a causal
relation between the variables – an increase in one of them causes an increase in the other. Often,
however, observed linear associations are the result of the action of a third variable (called
lurking variable) which drives the other two. For instance, the linear association between January
and February Fraser flows might be due to the effect of a lurking variable, namely the weather. If in
a given year we artificially increase the Fraser January flow we cannot expect a naturally occurring
higher flow in February.
We often wish to investigate the pairwise relations between several pairs of variables. This can
be accomplished by several ways. One way is to use different symbols (dots, stars, letters, numbers,
etc.) to represent the points and overlay the scatter plots on a single picture, facilitating their
comparison. For instance, the weights and heights of men and women could be plotted on a single
scatter plot using the letter "w" for women and "m" for men.
Another technique for dealing with several variables is to display the scatter plots in a “matrix”
layout. Scatter plot matrices are useful for uncovering possible patterns in the pairwise association
structure. An example is given by Figure 2.2. Notice that the strength of association decreases as
months get further apart. Moreover, while January, February and March show some association,
April and May seem to have less (if any) association with other months.
Figure 2.2: Scatter plot matrix of the Fraser River monthly flows for January through May.

2.2 Covariance and Correlation Coefficient

A numerical summary of the linear association between two variables x and y is the sample covariance,

   Cov(x, y) = Σ_{i=1}^{n} (xi − x̄)(yi − ȳ) / (n − 1).                    (2.1)

If the variables are positively associated, when one of them is above (below) its mean the other will also tend to be above (below) its mean, so the products (xi − x̄)(yi − ȳ) will be mostly positive and the sample covariance (2.1)
will be large and positive. On the other hand, if the variables are negatively associated, when one
of them is above (below) its mean the other will tend to be below (above) its mean and so the
products (xi − x)(yi − y) will be mostly negative. In this case the sample covariance (2.1) will be
large and negative. Finally, if the variables are neither positively nor negatively associated, the products
(xi − x̄)(yi − ȳ) will be positive and negative with approximately the same frequency (there will be
a fair degree of cancellation) and the sample covariance will be small.
The following formula provides a simple procedure for the hand calculation of the covariance:
   Cov(x, y) = [1/(n − 1)] Σ_{i=1}^{n} (xi − x̄)(yi − ȳ)
             = [n/(n − 1)] ( \overline{xy} − x̄ ȳ ),   where   \overline{xy} = (1/n) Σ_{i=1}^{n} xi yi .        (2.2)
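Formula (2.2) gives exactly the same number as the definition of the covariance. The sketch below checks this on a small made-up data set (the numbers are ours and serve only as an illustration).

    import numpy as np

    x = np.array([1.0, 2.0, 4.0, 7.0])        # illustrative data, not taken from the text
    y = np.array([2.0, 3.0, 5.0, 10.0])
    n = len(x)

    cov_def = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)        # the definition (2.1)
    cov_short = n / (n - 1) * (np.mean(x * y) - x.mean() * y.mean())   # the shortcut (2.2)

    print(cov_def, cov_short)      # both equal 28/3, about 9.33
    print(np.cov(x, y)[0, 1])      # NumPy's sample covariance agrees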
Some problems with the interpretation of the covariance and its direct use as a measure of linear
association are illustrated in Example 2.1.
Example 2.1 Consider the measurements (xi , yi ) of the first–crack and failure load (in pounds
per square foot) in Table 2.1. Figure 2.3 suggests that there is little association between these measurements. Here x̄ = 8396.6 pounds per square foot, ȳ = 16,064.4 pounds per square foot, and, using (2.2),

   Cov(x, y) = (20/19) [134875645 − (8396.6)(16064.4)] = −11,258.99 square pounds per ft⁴.
If the loads are given in thousands of pounds per square foot instead of pounds per square foot, then
ui = xi/1000, vi = yi/1000 and, from Problem 1.19,

   Cov(u, v) = Cov(x, y) / (1000 × 1000) = −0.011259 million square pounds per ft⁴.
Correlation Coefficient

Problem 2.1 illustrates the strong dependency of Cov(x, y) on the scale of the variables. A
measure of linear association which is independent of the variables' scale (see 2.5) is provided by the
sample correlation coefficient,

   r(x, y) = Cov(x, y) / √(Var(x) Var(y)) = Cov(x, y) / (SD(x) SD(y)).
Figure 2.3: Scatter plot of failure load versus first-crack load.
The small value of r(x, y) confirms the qualitative impression from Figure 2.3 that the first crack
and the failure loads (in the case of these concrete beams) are not related. The main implication
from a practical point of view is that the first crack of a given beam cannot be used to predict its
ultimate failure load.
Example 2.2 Table 2.2 gives the results of an experiment to study the relation between temperature (in units of 10° Fahrenheit) and yield of a certain chemical process (percentage). The reader can verify that in this case x̄ = 34.5, ȳ = 43.07, Var(x) = 77.50, Var(y) = 128.06 and Cov(x, y) = 96.2759. Therefore, the correlation coefficient,

   r(x, y) = 96.2759 / √(77.50 × 128.06) = 0.9664 ≈ 0.97,
indicates a strong positive linear association between temperature and yield. This is also clearly
suggested by the scatter plot in Figure 2.4. Notice that the relation between yield and temperature
is likely to be causal, that is, the increase in yield may actually be caused by the increase in
temperature.
When we have several variables their covariances and correlation coefficients can be arranged in
matrix layouts called covariance matrix and correlation matrix. Although the covariance matrix is
difficult to interpret due to its dependence on the scale of the variables, it is nevertheless routinely
computed for future usage.
The correlation matrix is the numerical counterpart of the scatter plot matrix discussed before.
For the Fraser River data (see Figure 2.2), the correlation matrix confirms the patterns seen in the scatter plot matrix.
Table 2.2: Temperature and yield data

Unit  Temp. (X)  Yield (Y)    Unit  Temp. (X)  Yield (Y)
1 20 28 16 35 41
2 21 26 17 36 45
3 22 22 18 37 53
4 23 25 19 38 46
5 24 27 20 39 44
6 25 32 21 40 49
7 26 31 22 41 53
8 27 33 23 42 49
9 28 38 24 43 51
10 29 41 25 44 55
11 30 41 26 45 56
12 31 38 27 46 58
13 32 41 28 47 58
14 33 46 29 48 58
15 34 44 30 49 63
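The summary values quoted in Example 2.2 can be verified directly from Table 2.2. The following sketch (our own check, using NumPy) computes the means, variances, covariance and correlation coefficient for the thirty (temperature, yield) pairs.

    import numpy as np

    temp = np.arange(20, 50, dtype=float)     # temperatures 20, 21, ..., 49 (units 1-30 of Table 2.2)
    yld = np.array([28, 26, 22, 25, 27, 32, 31, 33, 38, 41, 41, 38, 41, 46, 44,
                    41, 45, 53, 46, 44, 49, 53, 49, 51, 55, 56, 58, 58, 58, 63], dtype=float)

    var_x = temp.var(ddof=1)                  # sample variance, divisor n - 1
    var_y = yld.var(ddof=1)
    cov_xy = np.cov(temp, yld)[0, 1]          # sample covariance
    r = cov_xy / np.sqrt(var_x * var_y)

    print(temp.mean(), round(yld.mean(), 2))      # 34.5 and 43.07
    print(round(var_x, 2), round(var_y, 2))       # 77.5 and 128.06
    print(round(cov_xy, 4), round(r, 4))          # 96.2759 and 0.9664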
As already observed from Figure 2.2, February flows are somewhat correlated with January and
March flows (with correlation coefficients 0.78 and 0.75, respectively). January and March flows
are also marginally correlated (correlation coefficient equal to 0.65). The correlation coefficients
between all the other pairs of months are below 0.50.
2.3 The Least Squares Regression Line

When two variables X and Y are linearly associated, their relation can be summarized by a straight line

   fˆ(x) = β̂0 + β̂1 x,

called the regression line. The hats indicate that β̂0, β̂1 and fˆ(x) are calculated from the data. In this
context X and Y play different roles and are given special names. The independent variable X is
called the explanatory variable and the dependent variable Y is called the response variable.
Least Squares
The solid line on Figure 2.4 (see Example 2.2) was obtained by the method of least squares (LS).
According to this method, the regression coefficients (the intercept β̂0 and the slope β̂1 ) minimize
(in b0 and b1 ) the sum of squares
   S(b0, b1) = Σ_{i=1}^{n} (yi − b0 − b1 xi)².
Figure 2.4: Scatter plot of yield versus temperature, with the least squares regression line.
The coefficients β̂0 and β̂1 satisfy the equations

   Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi) = 0   and   Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi) xi = 0,

which are obtained by differentiating S(b0, b1) with respect to b0 and b1 and setting the derivatives equal to zero. Carrying out the summations and dividing by n we obtain

   ȳ − β̂0 − β̂1 x̄ = 0                                          (2.3)

   \overline{xy} − β̂0 x̄ − β̂1 \overline{xx} = 0,                (2.4)

where

   \overline{xy} = (1/n) Σ_{i=1}^{n} xi yi   and   \overline{xx} = (1/n) Σ_{i=1}^{n} xi².        (2.5)
From (2.3), β̂0 = ȳ − β̂1 x̄. Substituting this into (2.4) and solving for β̂1 gives

   β̂1 = ( \overline{xy} − x̄ ȳ ) / ( \overline{xx} − x̄ x̄ ).
The regression line fˆ(x) and the regression coefficients β̂0 and β̂1 are good summaries for linearly
associated data. In this case the fitted value
ŷi = fˆ(xi ) = β̂0 + β̂1 xi (Fitted Value)
will be “close” to the observed value of yi . How close depends on the strength of the linear associ-
ation. The differences between the observed values yi and the fitted values ŷi ,
ei = yi − ŷi (Residual),
are called regression residuals.
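To make these formulas concrete, the sketch below fits the regression line to the temperature and yield data of Example 2.2 (Table 2.2) using the expressions for β̂1 and β̂0 derived above, and then forms the fitted values and residuals. The data are copied from the table; the slope and intercept are not quoted in the text, so the printed values (roughly 1.24 and 0.21) should simply be read as what these formulas give for this data set.

    import numpy as np

    x = np.arange(20, 50, dtype=float)        # temperatures from Table 2.2
    y = np.array([28, 26, 22, 25, 27, 32, 31, 33, 38, 41, 41, 38, 41, 46, 44,
                  41, 45, 53, 46, 44, 49, 53, 49, 51, 55, 56, 58, 58, 58, 63], dtype=float)

    xy_bar = np.mean(x * y)                   # average of the products x_i * y_i
    xx_bar = np.mean(x * x)                   # average of the squares x_i^2

    beta1 = (xy_bar - x.mean() * y.mean()) / (xx_bar - x.mean() ** 2)   # slope
    beta0 = y.mean() - beta1 * x.mean()                                 # intercept, from (2.3)

    fitted = beta0 + beta1 * x                # fitted values
    residuals = y - fitted                    # regression residuals e_i

    print(round(beta1, 3), round(beta0, 3))          # about 1.242 and 0.208
    print(round(float(residuals.sum()), 8))          # the residuals sum to zero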
Residual Plot
The regression residuals ei are usually plotted against the fitted values ŷi to determine the
appropriateness of the linear regression fit. If the data are well summarized by the regression line
(see Figure 2.5 (a)) the corresponding scatter plot of (ŷi, ei) has no systematic pattern (see Figure
2.5 (d)). Examples of "bad" residual plots – that is, plots that indicate that the regression line is a
poor summary for the data – are given in Figure 2.5 (e) and (f). The corresponding scatter plots
and linear fits are given in Figure 2.5 (b) and (c). In the case of Figure 2.5 (e), the residuals go
from positive to negative and back to positive, suggesting that the relation between X and Y may
not be linear. In the case of Figure 2.5 (f) larger fitted values have larger residuals (in absolute
value).
When there are several explanatory variables x1, . . . , xp, the fitted regression function has the form
fˆ(x1, . . . , xp) = β̂0 + β̂1 x1 + · · · + β̂p xp, and the least squares coefficients are the solution to the linear equations

   Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi1 − β̂2 xi2 − · · · − β̂p xip) = 0
   Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi1 − β̂2 xi2 − · · · − β̂p xip) xi1 = 0
   Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi1 − β̂2 xi2 − · · · − β̂p xip) xi2 = 0          (Gauss Equations)
      ...
   Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi1 − β̂2 xi2 − · · · − β̂p xip) xip = 0,
where

   \overline{y xj} = (1/n) Σ_{i=1}^{n} xij yi   and   \overline{xj xk} = (1/n) Σ_{i=1}^{n} xij xik .        (2.6)
2.5 Exercises
Problem 2.1
Problem 2.2 The following data give the logarithm (base 10) of the volume occupied by algal
cells on successive days, taken over a period over which the relative growth rate was approximately
constant.
Figure 2.5: Examples of linear regression fits (above) and their residual plots (below).
Problem 2.3 The maximum annual flood flows of a river, for the period 1949–1990, are given in
Table 1.6.
(i) Summarize and display these data.
(ii) Compute the mean, median, standard deviation and interquartile range.
(iii) If a one–year construction project is being planned and a flow of 150000 cfs or greater will halt
construction, what is the relative frequency (based on past relative frequencies) that the construction
will be halted before the end of the project? What if it is a two-year construction project?
Problem 2.4 The planned and the actual times (in days) needed for the completion of 20 job
orders are given in Table 1.14
(a) Calculate the average and the median planned time per order. Same for the actual time.
(b) Calculate the corresponding standard deviations and interquartile ranges.
(c) If there is a delay penalty of $5000 per day and a before–schedule bonus of $2500 per day, what
is the average net loss ( negative loss = gain) due to differences between planned and actual times?
What is the standard deviation?
(d) Study the relationship between the planned and actual times.
(e) What would be your advice to the company based on the analysis of these data?
Table 2.3: Monthly average flow (cubic meters per second) for the Fraser River at Hope, BC, 1971–1990

Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1971 855 1030 841 1550 6120 7590 5590 3570 2360 1890 1550 908
1972 774 857 1500 2100 6450 10800 7330 4120 2280 1940 1500 1000
1973 984 842 850 1550 4910 6180 5000 2930 1680 2080 1620 1130
1974 987 929 927 2320 5890 8430 7470 4360 2440 1930 1290 978
1975 797 780 736 1100 3940 6830 6070 3420 2300 1950 2360 1480
1976 1140 1030 924 2300 7070 7250 7670 6440 4460 2510 1800 1480
1977 1240 1230 1130 2350 4710 5670 4830 3620 2340 1650 1260 1030
1978 881 791 952 1960 3950 5730 4540 2970 2600 2090 1590 1010
1979 801 721 957 1290 4910 6360 4860 2610 1830 1420 918 952
1980 684 649 703 1760 5120 4900 4010 2720 2600 2080 1630 1900
1981 1860 1480 1300 1880 4950 6260 4890 3620 2130 1530 1950 1140
1982 821 927 844 1010 5360 8690 7230 4850 3620 2310 1470 1110
1983 972 977 1240 1990 4090 6060 5240 3460 2210 1470 2050 878
1984 1160 1010 1160 2030 2870 6370 6580 3780 2920 2560 1370 861
1985 740 706 801 2070 5300 7390 4650 2770 1940 1980 1230 746
1986 813 809 1280 2090 3770 8390 5380 3220 1890 1470 1340 908
1987 1000 944 1300 2280 5120 5840 4070 2980 1680 1020 1210 811
1988 629 657 809 2410 5450 5940 4430 3010 1890 1540 1470 926
1989 800 685 682 1780 4860 6020 3990 3170 1840 1380 2060 1410
1990 1210 841 926 3000 5050 8760 6270 3340 1790 1520 2110 1190
Problem 2.6 The total paved area, X (in km2 ), and the time, Y (in days), needed to complete
the project was recorded for 25 different jobs. The data is summarized as follows:
Order Planned Time Actual Time Order Planned Time Actual Time
1 22 22 11 17 18
2 11 8 12 27 34
3 11 8 13 16 14
4 16 14 14 30 35
5 21 20 15 22 18
6 12 16 16 17 16
7 25 29 17 13 12
8 20 20 18 18 14
9 13 10 19 21 19
10 34 39 20 18 17
Chapter 3
Probability
Random Experiment: The defining feature of a random experiment is that its outcome
cannot be determined beforehand. That is, the outcome of the random experiment will
only be known after the experiment has been completed. The next time the experiment is
performed (seemingly under the exact same conditions) the outcome may be different. Some
examples of random experiments are: recording the answer (Yes or No) of a randomly chosen person to a survey question; counting the number of defective items in an inspected lot; measuring the time between breakdowns of a machine; counting the number of accidents over a given period; and measuring the percentage yield of a chemical process.
Sample Space (S): Although we may not be able to say beforehand what the outcome of
the random experiment will be, we should, at least in principle, be able to make a complete
"list" of all the possible outcomes. This list (set) of all the possible outcomes is called "the
sample space" and is denoted by S. A generic outcome (that is, an element of S) is denoted by
w. The sample spaces for the random experiments listed above are:
– S = {Yes, No},
– S = {0, 1, 2, . . . , n} where n is the lot size,
– S = [0, ∞), the time (in hours) between breakdowns can be any non-negative real number.
– S = {0, 1, 2, . . .}, the number of accidents can be any non-negative integer number.
– S = [0, 100], the percentage yield can be any real number between zero and one hundred.
Event: The events, usually denoted by the first upper case letters of the alphabet (A, B, C,
etc), are simply subsets of S. Most events encountered in practice are meaningful and can
be expressed either in words or using mathematical notation. For example, in the lot inspection
experiment above we may consider the event A = {no more than two defectives in the lot}.
An important feature of the events is that they can or cannot occur, depending on the
actual outcome of the random experiment. For instance, if after completing the inspection
of the lot we find two defectives, the event A has occurred. On the other hand, if the actual
number of defectives turned out to be five, the event A did not occur.
Two rather special events are the “impossible” event – which can never occur – denoted
by the empty set ∅ and the “sure” event – which always occurs – consisting of the entire
sample space, S.
Some related mathematical notations are:

   w ∈ A ⟺ w belongs to A ⟺ A occurs

and

   w ∉ A ⟺ w does not belong to A ⟺ A does not occur.
Probability Function (P ): Evidently, not all the events are equally likely. For instance,
the event
A = {more than three million accidents}
would appear to be quite unlikely, while an event such as B = {more than one million accidents}
is clearly more likely. In general, if A ⊆ B then P (A) ≤ P (B). In fact, since A ⊆ B,

   B = (B ∩ A) ∪ (B ∩ Ac) = A ∪ (B ∩ Ac),

and the two events in this union are disjoint, so P (B) = P (A) + P (B ∩ Ac) ≥ P (A).
Example 3.1 It is known from previous experience that the probabilities of finding zero, one,
two, etc. defectives in lots of 100 items shipped by a certain supplier are as given in Table
3.1 below.
Let A, B and C be the events “less than two defectives”, “more than one defective” and
“one or two defectives”, respectively. (a) Calculate P (A), P (B) and P (C). (b) What is the
meaning (in words) of the event Ac ? Calculate P (Ac ) directly and using Property 4. (c)
What is the meaning (in words) of the event A ∪ C? Calculate P (A ∪ C) directly and using
Property 3.
Table 3.1:
Defectives Probability
0 0.50
1 0.20
2 0.15
3 0.10
4 0.03
5 0.02
6 or more 0.00
Solution
(a) From Table 3.1, P (A) = 0.70, P (B) = 0.30, and P (C) = 0.35.
(b) Ac = {two or more defectives} = {more than one defective} = B, so from Table 3.1, P (Ac ) =
P (B) = 0.30. This is consistent with the result we obtain using Property 4:
P (Ac ) = 1 − P (A) = 1 − 0.70 = 0.30.
(c) A ∪ C = {less than three defectives}. Therefore, directly from Table 3.1, P (A ∪ C) = 0.85.
To make the calculation using Property 3, we must first find P (A ∩ C). Since A ∩ C =
{exactly one defective}, it follows from Table 3.1 that P (A ∩ C) = 0.20. Now,
P (A ∪ C) = 0.70 + 0.35 − 0.20 = 0.85.
3.2 Conditional Probability and Independence

The conditional probability of an event A, given that another event B with P (B) > 0 has occurred, is defined as

   P (A|B) = P (A ∩ B) / P (B).

Example 3.1 (continued): Suppose that we know that the lot contains two defectives or
more. What is the probability that it contains three or more defectives?
Solution Let
B = { two or more defectives } = { more than one defective }
and
D = { three or more defectives } = { more than two defectives }.
Since P (B) = 0.30 and P (D ∩ B) = P ({3, 4, 5}) = 0.15, the desired conditional probability
is
P (D|B) = P (D ∩ B)/P (B) = 0.15/0.30 = 0.50.
Suppose that we wish to investigate the occurrence of a certain event B. For example,
consider the collapse of a large industrial building or the crash of a computer network.
The event B may have been caused by one of several possible causes or states of nature
denoted A1, A2, . . . , Am. For example, the collapse of the industrial building may have been
caused by one (and only one) of the following:
A1 Poor design
- underestimated live load
- underestimated maximum wind speed
- etc.
A2 Poor construction
- Low grade material
- Insufficient supervision and control
- Gross human error
- etc.
A3 A combination of A1 and A2 .
A4 Other (non-assignable) causes.
Suppose that, from previous experience or some other source (for example some expert’s
opinion), the conditional probabilities of B given Ai are known. That is, the probabilities
that the event B will occur when the cause Ai is present are known and represented by

   p1, p2, . . . , pm.
We will call these conditional probabilities risk factors. Suppose also that the probabilities
of each possible cause Ai are known. These probabilities are called prior probabilities and
denoted
   π1, π2, . . . , πm.
In the case of our example, the prior probabilities may represent the actual fractions of
industrial buildings in the country which have some design or construction problems. Or they
may represent the subjective beliefs (educated guesses) of some expert consultant (perhaps
the engineer hired by the insurance company to investigate the causes of the accident). In
summary, we suppose that
pi = P (B|Ai ), and πi = P (Ai ),
are known for all i = 1, . . . , m. Notice that
π1 + π2 + . . . + πm = 1.
The prior probabilities and the risk factors for the collapsed building example are given in
columns 2 and 3 of Table 3.2.
Table 3.2:
Cause (i) Prior Probability (πi ) Risk Factor (pi ) Posterior Probability
1 0.00050 0.10 0.29
2 0.00010 0.20 0.12
3 0.00001 0.40 0.02
4 0.99939 0.0001 0.57
The engineer hired by the insurance company to investigate the accident would certainly
wish to know where she can first start looking to find an assignable cause. More precisely, she
would wish to know what is the most likely assignable cause for the collapse of the building.
The conditional probability of each possible cause, given the fact that the event has
occurred, is called the posterior probability for this cause and can be calculated by the
famous Bayes’ formula
   P (Ak|B) = P (B|Ak) P (Ak) / [ P (B|A1) P (A1) + P (B|A2) P (A2) + · · · + P (B|Am) P (Am) ]
            = pk πk / ( p1 π1 + p2 π2 + · · · + pm πm ).
In the case of our example the posterior probability of the cause “poor design” (A1 ), for
instance, is equal to
   P (A1|B) = (0.00050)(0.10) / [ (0.00050)(0.10) + (0.00010)(0.20) + (0.00001)(0.40) + (0.99939)(0.0001) ] = 0.29.
The other posterior probabilities are calculated analogously and the results are displayed in
the fourth column of Table 3.2.
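The posterior column of Table 3.2 is just Bayes' formula applied row by row. The short sketch below (an illustration; the two lists copy columns 2 and 3 of the table) reproduces it.

    priors = [0.00050, 0.00010, 0.00001, 0.99939]      # prior probabilities (column 2 of Table 3.2)
    risks = [0.10, 0.20, 0.40, 0.0001]                 # risk factors p_i = P(B | A_i) (column 3)

    p_b = sum(pi * p for pi, p in zip(priors, risks))  # P(B): the denominator of Bayes' formula

    posteriors = [pi * p / p_b for pi, p in zip(priors, risks)]
    print([round(q, 3) for q in posteriors])
    # [0.287, 0.115, 0.023, 0.575] -- essentially the 0.29, 0.12, 0.02, 0.57 of Table 3.2
    print(round(sum(posteriors[:3]), 2))               # about 0.43: chance of an assignable cause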
What did the engineer learn from the results of these (posterior probability) calculations?
In the first place she learned that the chance of finding an assignable cause is approximately
43%. Furthermore, she learned that it is best to begin looking for flaws in the design of the
building, as this cause is almost three times more likely to have caused the accident than the
other assignable causes. Finally she learned that it is highly unlikely that the collapse of the
building has been caused by more than one assignable cause.
More formally, since the causes A1, . . . , Am are mutually exclusive and one of them must occur, the event B can be written as the disjoint union of B ∩ A1, . . . , B ∩ Am, and so

   P (B) = P (A1) P (B|A1) + P (A2) P (B|A2) + · · · + P (Am) P (B|Am)
         = π1 p1 + π2 p2 + · · · + πm pm .                                   (3.3)

Therefore,

   P (Ak|B) = P (B|Ak) P (Ak) / [ P (B|A1) P (A1) + P (B|A2) P (A2) + · · · + P (B|Am) P (Am) ]
            = πk pk / ( π1 p1 + π2 p2 + · · · + πm pm ).
Example 3.2 A certain disease is known to affect 1% of the population. A test for the
disease has the following features: if the person is contaminated the test is positive with
probability 0.98. On the other hand, if the person is healthy, the test is negative with
probability 0.95. (a) What is the probability of a positive test when applied to a randomly
chosen subject? (b) What is the probability that an individual is affected by the disease after
testing positive? (c) Explain the connections between this problem and Bayes’ formula.
Solution
(a) Let C be the event that the person has the disease and B the event that the test is positive. Since B is clearly equal to the disjoint union of the events B ∩ C and B ∩ C c ,
P (B) = P (B ∩ C) + P (B ∩ C c )
= P (C)P (B|C) + P (C c )P (B|C c )
= (0.01 × 0.98) + (0.99 × 0.05)
= 0.0593
(b)

   P (C|B) = P (B ∩ C) / P (B) = P (B|C) P (C) / P (B) = (0.98 × 0.01) / 0.0593 = 0.1653.
Notice that the probability of having the disease, even after testing positive, is surprisingly
low (less than 0.17). Why do you think this is so?
(c) The calculation in part (a) produced the "unconditional" probability of the event
"testing positive". This unconditional probability constitutes the denominator of Bayes'
formula. If a person has tested positive, given the characteristics of the test, this can
be attributed to two possible causes: being healthy and being contaminated. The posterior
probability of the second cause is the result of part (b).
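A two-line check of Example 3.2 (the numbers P(C) = 0.01, P(B|C) = 0.98 and P(B|C^c) = 0.05 are taken from the example; the variable names are ours):

    p_disease = 0.01              # P(C): prevalence of the disease
    p_pos_if_disease = 0.98       # P(B | C): positive test given the disease
    p_pos_if_healthy = 0.05       # P(B | C^c): positive test given healthy (1 - 0.95)

    p_pos = p_disease * p_pos_if_disease + (1 - p_disease) * p_pos_if_healthy
    p_disease_if_pos = p_disease * p_pos_if_disease / p_pos

    print(round(p_pos, 4), round(p_disease_if_pos, 4))    # 0.0593 and 0.1653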
Independence
Roughly speaking, two events A and B are independent when the probability of either one
of them is not modified after knowing the result for the other (its occurrence or non-occurrence).
In other words, knowing whether or not one of these events occurred
does not alter the amount of information (or uncertainty) that we initially had regarding the
other event. Quite simply, then, we can say that two events are independent if they do not
carry any information regarding each other.
The formal definition of independence is somewhat surprising at first because it doesn’t
make any direct reference to the events’ conditional probabilities. But see also the remarks
following the definition. Probabilists prefer this formal definition, because it is easy to check
and to generalize for the case of m events (m ≥ 2).
Definition: The events A and B are independent if
P (A ∩ B) = P (A)P (B).
Notice that if A and B are independent and P (B) > 0, then

   P (A|B) = P (A ∩ B) / P (B) = P (A) P (B) / P (B) = P (A).
Example 3.3 The results of the STAT 251 midterm exam can be classified as follows:
Table 3.3:
Male Female
High 0.05 0.15 0.20
Medium 0.30 0.15 0.45
Low 0.30 0.05 0.35
0.65 0.35 1.00
What is the meaning of the statement “gender and performance are independent”? Are they?
Why?
Solution
Gender and performance are (intuitively) independent if for example, knowing the score
of a randomly chosen test doesn’t affect the probability that this test corresponds to a male
(0.65, from the table) or to a female (0.35). Or vice versa, knowing the gender of the student
who wrote the test doesn’t modify our ability to predict its score.
Let A and B be the events “a randomly chosen student is male” and “a randomly chosen
student has a high score”, respectively. Is it true that P (A|B) = P (A)? The answer, of
course, is no because

   P (A|B) = P (A ∩ B) / P (B) = 0.05 / 0.20 = 0.25 ≠ 0.65 = P (A).
Before knowing that the score is high, the chances are almost two out of three that the
student is a male. However, after we know that the score is high, the chances are one out of
four that the student is a male. The lack of independence in this case is derived from the fact
that male students are “under–represented” in the high score category and “over-represented
” in the low score category. 2
Suppose now that Table 3.3 above is replaced by Table 3.4:
Table 3.4:
Male Female
High 0.13 0.07 0.20
Medium 0.29 0.16 0.45
Low 0.23 0.12 0.35
0.65 0.35 1.00
In this case gender and performance are (essentially) independent: up to rounding, each entry of the table equals the product of its row and column totals (for example, 0.13 = 0.20 × 0.65 and 0.07 = 0.35 × 0.20).
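The comparison behind this claim is easy to carry out by computer: for each cell of Table 3.4, multiply the corresponding row and column totals and compare with the table entry. A minimal sketch (the table entries are copied from Table 3.4):

    cells = {("High", "Male"): 0.13,   ("High", "Female"): 0.07,
             ("Medium", "Male"): 0.29, ("Medium", "Female"): 0.16,
             ("Low", "Male"): 0.23,    ("Low", "Female"): 0.12}

    row_totals = {"High": 0.20, "Medium": 0.45, "Low": 0.35}
    col_totals = {"Male": 0.65, "Female": 0.35}

    for (score, gender), p in cells.items():
        product = row_totals[score] * col_totals[gender]
        print(score, gender, p, round(product, 4))    # each entry equals the product, up to rounding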
The concept of independence also applies to three or more events and we shall now give the
formal definition of independence of m events. At the same time we want to point out that,
in most practical applications, the independence of certain events is often simply assumed or
derived from external information regarding the physical make up of the random experiment,
as illustrated in Example 3.4 below.
Definition: The events A1, A2, . . . , Am are independent if for every sub-collection Ai1 , . . . , Aik (2 ≤ k ≤ m),

   P (Ai1 ∩ Ai2 ∩ · · · ∩ Aik ) = P (Ai1 ) P (Ai2 ) · · · P (Aik ).

Fortunately, we will have few occasions to check this definition directly in this course.
Example 3.4 A certain system has four independent components {a1, a2, a3, a4}. The pairs
of components a1, a2 and a3, a4 are in line. This means that, for instance, the subsystem
{a1, a2} fails if either of its two components does; similarly for the subsystem {a3, a4}. The
subsystems {a1 , a2 } and {a3 , a4 } are in parallel. This means that the system works if at least
one of the two subsystems does. Calculate the probability that the system fails assuming
that the four components are independent and that each one of them can break down with
probability 0.10. How many parallel subsystems would be needed if the probability of failure
for the entire system cannot exceed 0.001?
(Diagram: components a1 and a2 in line form one subsystem, components a3 and a4 in line form the other, and the two subsystems are connected in parallel.)
Solution Let Ai be the event “component ai works” (i = 1, . . . , 4), and let C be the event
“the system works”.
P (C) = P [(A1 ∩ A2 ) ∪ (A3 ∩ A4 )] = P (A1 ∩ A2 ) + P (A3 ∩ A4 ) − P [(A1 ∩ A2 ) ∩ (A3 ∩ A4 )]
= P (A1 )P (A2 ) + P (A3 )P (A4 ) − P (A1 )P (A2 )P (A3 )P (A4 )
= 0.9² + 0.9² − 0.9⁴ = 0.9639.
To answer the second question, just notice that the probability of working for each independent subsystem is 0.9² = 0.81. Now, if Bi (i = 1, . . . , m) is the event "the ith subsystem works", it follows that

   P (system fails) = P (B1c ∩ B2c ∩ · · · ∩ Bmc) = (1 − 0.81)^m = 0.19^m ≤ 0.001.

Therefore,

   log(0.001) ≥ m log(0.19)  ⟹  m ≥ log(0.001)/log(0.19) = 4.16  ⟹  m = 5.
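The arithmetic of Example 3.4 can be double-checked in a few lines (our own sketch, with the component reliability 0.9 taken from the example).

    import math

    p_component = 0.9                        # each component works with probability 0.9
    p_subsystem = p_component ** 2           # two components in line both have to work

    p_system = 1 - (1 - p_subsystem) ** 2    # two subsystems in parallel: fails only if both fail
    print(round(p_system, 4))                # 0.9639, as in the example

    m = math.ceil(math.log(0.001) / math.log(0.19))   # smallest m with 0.19^m <= 0.001
    print(m, round(0.19 ** m, 6))                     # 5 and 0.000248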
3.3 Exercises
Problem 3.1 If A and B are independent events with P (A) = 0.2 and P (B) = 0.5, find the
following probabilities. (a) P (A ∪ B); (b) P (A ∩ B); and (c) P (Ac ∩ B c )
Problem 3.3 Consider the problem of screening for cervical cancer. The probability that a
woman has the cancer is 0.0001. The screening test correctly identifies 90% of all the women
who do have the disease, but the test is false positive with probability 0.001.
(a) Find the probability that a woman actually does have cervical cancer given the test says
she does.
(b) List the four possible outcomes in the sample space.
Problem 3.4 An automobile insurance company classifies each driver as a good risk, a
medium risk, or a poor risk. Of those currently insured, 30% are good risks, 50% are medium
risks, and 20% are poor risks. In any given year the probability that a driver will have at
least one accident is 0.1 for a good risk, 0.3 for a medium risk, and 0.5 for a poor risk.
(a) What is the probability that the next customer randomly selected will have at least one
accident next year?
(b) If a randomly selected driver insured by this company had an accident this year, what is
the probability that this driver was actually a good risk?
Problem 3.5 A truth serum given to a suspect is known to be 90% reliable when the person
is guilty and 99% reliable when the person is innocent. In other words, 10% of the guilty are
judged innocent by the serum and 1% of the innocent are judged guilty. If the suspect was
selected from a group of suspects of which only 5% have ever committed a crime, and the
serum indicates that he is guilty, what is the probability that he is innocent?
Problem 3.6 70% of the light aircraft that disappear while in flight in a certain country are subsequently discovered. Of the aircraft that are discovered, 60% have an emergency locator, whereas 80% of the aircraft not discovered do not have an emergency locator.
(a) What percentage of the aircraft have an emergency locator?
(b) What percentage of the aircraft with an emergency locator are discovered after they disappear?
Problem 3.7 Two methods, A and B, are available for teaching a certain industrial skill.
The failure rate is 20% for A and 10% for B. However, B is more expensive and hence is
only used 30% of the time (A is used the other 70%). A worker is taught the skill by one
of the methods, but fails to learn it correctly. What is the probability that the worker was
taught by Method A?
Problem 3.8 Suppose that the numbers 1 through 10 form the sample space of a random
experiment, and assume that each number is equally likely. Define the following events: A1 ,
the number is even; A2 , the number is between 4 and 7, inclusive.
(a) Are A1 and A2 mutually exclusive events? Why?
(b) Calculate P (A1 ), P (A2 ), P (A1 ∩ A2 ), and P (A1 ∪ A2 ).
(c) Are A1 and A2 independent events? Why?
Problem 3.9 A coin is biased so that a head is twice as likely to occur as a tail. If the coin
is tossed three times,
(a) what is the sample space of the random experiment?
(b) what is the probability of getting exactly two tails?
Problem 3.10 Items in your inventory are produced at three different plants: 50 percent
from plant A1 , 30 percent from plant A2 , and 20 percent from plant A3 . You are aware
that your plants produce at different levels of quality: A1 produces 5 percent defectives, A2
produces 7 percent defectives, and A3 yields 8 percent defectives. You select an item from
your inventory and it turns out to be defective. Which plant is the item most likely to have
come from? Why does knowing the item is defective decrease the probability that it has
come from plant A1 , and increase the probability that it has come from either of the other
two plants?
Problem 3.11 Calculate the reliability of the system described in the following figure. The
numbers beside each component represent the probabilities of failure for this component.
Note that the components work independently of one another.
[Figure: system diagram for Problem 3.11 — components 1 and 2 (failure probability .05 each) connected in series, followed by components 3 and 4 (failure probability .1 each) connected in parallel, followed by component 5 (failure probability .05).]
Problem 3.12 A system consists of two subsystems connected in series. Subsystem 1 has
two components connected in parallel. Subsystem 2 has only one component. Suppose the
three components work independently and each has probability of failure equal to 0.2. What
is the probability that the system works?
Problem 3.13 A proficiency examination for a certain skill was given to 100 employees of
a firm. Forty of the employees were male. Sixty of the employees passed the exam, in that
they scored above a preset level for satisfactory performance. The breakdown among males
and females was as follows:
          Male (M)   Female (F)   Total
Pass (P)     24          36         60
Fail         16          24         40
Total        40          60        100
Suppose an employee is randomly selected from the 100 who took the examination.
(a) Find the probability that the employee passed, given that he was male.
(b) Find the probability that the employee was male, given that he passed.
(c) Are the events P and M independent?
(d) Are the events P and F independent?
Problem 3.14 Propose appropriate sample spaces for the following random experiments.
Give also two examples of events for each case.
Counting/measuring:
1 - the number of employees attending work in a certain plant
2 - the number of days with wind speed above 50 km/hour, per year, in Vancouver
3 - the number of earthquakes in BC during any given period of two years
4 - the time between two consecutive breakdowns of a computer network
5 - the number of people leaving BC per year
6 - the percentage of STAT 241/51 students obtaining final marks above 80% in any given
term
7 - the number of engineers working in BC per year
8 - the percentage of computer scientists in BC who will make more than $65, 000 in 1996
9 - the number of employees still working in a certain production plant after 4:30 PM on
Fridays.
Problem 3.15 Let A and B be the events “construction flaw due to some human
error” and “construction flaw due to some mechanical problem”.
1) What are the meanings (in words) of the following events: (a) A ∪ B, (b) A ∩ B, (c)
A ∩ B c , (d) Ac ∩ B c , (e) (A ∪ B)c , (f) Ac ∪ B c , (g) (A ∩ B)c . Draw also the corresponding
diagrams.
2) Show that in general (A ∩ B)c = Ac ∪ B c and that (A ∪ B)c = Ac ∩ B c (so the results of
(f) and (g) and of (d) and (e) above were not mere coincidences).
3) Suppose that P (A) = 0.02, P (B) = 0.01 and P (A ∪ B) = 0.023. Calculate (a) P (A ∩ B),
(b) P (Ac ∩ B c ), (c) P (A ∩ B c ), (d) P (A|B c ), (e) P (A|B).
Problem 3.16 A large company hires most of its employees on the basis of two tests. The
two tests have scores ranging from one to five. The following table summarizes the perfor-
mance of 16,839 applicants during the last six years. From this table we learn, for example,
that 3% of the applicants got a score of 2 on Test 1 and 2 on Test 2; and that 15% of the
applicants got a score of 3 on Test 1 and 2 on Test 2. We also learn that, for example, 20%
of the applicants got a score of 2 on Test 1 and that 25% of the applicants got a score of 2
on Test 2.
A group of 1500 new applicants have been selected to take the tests.
(a) What should the cutting scores be if between 140 and 180 applicants will be short–listed
for a job interview? Assume that the company wishes to short–list people with the highest
possible performances on the two tests.
Table 3.5: (rows: score on Test 1; columns: score on Test 2)
Test 1 \ Test 2     1      2      3      4      5    Total
1                 0.07   0.03   0.00   0.00   0.00   0.10
2                 0.15   0.03   0.02   0.00   0.00   0.20
3                 0.08   0.15   0.09   0.02   0.01   0.35
4                 0.10   0.04   0.08   0.01   0.02   0.25
5                 0.00   0.00   0.06   0.02   0.02   0.10
Total             0.40   0.25   0.25   0.05   0.05   1.00
Table 3.6:
Score Test 1 Test 2
1 0.10 0.40
2 0.20 0.25
3 0.35 0.25
4 0.25 0.05
5 0.10 0.05
(b) Same as (a) but assuming now that the company wishes to hire people with the highest
possible performances on at least one of the two tests.
(c) (Continued from (a)) A manager suggests that only applicants who obtain marks above a certain bottom line in one of the tests be given the other test. Noticing that giving and marking each test costs the company $55, recommend which test should be given first. Approximately how much will be saved on the basis of your advice?
(d) Repeat (a)–(c) if the performances on the two tests are independent and the marginal probabilities are given by Table 3.6.
Problem 3.18 Twenty per cent of the days in a certain area are rainy (there is some mea-
surable precipitation during the day), one third of the days are sunny (no measurable pre-
cipitation, more than 4 hours of sunshine) and fifteen per cent of the days are cold (daily
average temperature for the day below 5°C).
1 - Would you use the above information as an aid in
(i) Planning your next weekend activities (assuming that you live in this area)?
(ii) Deciding whether you want to move to this area?
(iii) Choosing the type of roofing for a large building in this area?
Problem 3.19 A company sells a (cheap) recording tape under a limited “lifetime war-
ranty”. From the company records one learns that
5% of the tapes sold by the company are defective and could be replaced under the warranty.
50% of the customers who get one of these defective tapes will claim it under the warranty
and have it replaced.
90% of the tapes which are claimed to be defective are actually so. These tapes are replaced
under the warranty.
(a) Which of the above are conditional probabilities?
(b) Using the above information, calculate the probability that a customer will claim the
warranty.
(c) What is the maximum allowable fraction of defective tapes if the company wants to have
at most 1% of the tapes returned?
Problem 3.21 On average, 20% of the students fail the first midterm. Of those, 60% fail
the second midterm. Moreover, 80% of the students that failed both midterms also fail the final exam.
(a) What is the probability that a randomly chosen student fails the two midterms?
(b) What is the probability that a randomly chosen student fails the two midterms and the
final exam?
Problem 3.22 The probability that a system survives 300 hours is 0.8. The probability that a 300-hour-old system survives another 300 hours is 0.6. The probability that a 600-hour-old system survives another 300 hours is 0.5.
(a) What is the probability that the system survives 600 hours?
(b) What is the probability that the system survives 900 hours?
Problem 3.23 Recall the situation in Example 3.2 presented in class: the probability of
infection for an individual in the general population is π = .01 and a test for the disease
is such that it will be correctly positive 98% of the time and correctly negative 95% of the
time. Some individuals, however, may belong to some “high risk” groups and therefore have
a larger prior probability π of being infected.
1) calculate the posterior probability of infection as a function of the corresponding prior
probability, π, given that the test is positive (denote this probability by g(π)) and make a
plot of g(π) versus π.
2) what is the value of π for which the posterior probability given a positive test is twice as
large as the prior probability?
Problem 3.24 Suppose that we wish to determine whether an uncommon but fairly costly
construction flaw is present. Suppose that in fact this flaw has only probability 0.005 of
being present. A fairly simple test procedure is proposed to detect this flaw. Suppose that
the probabilities of being correctly positive and negative for this test are 0.98 and 0.94,
respectively.
1) Calculate the probability that the test will indicate the presence of a flaw.
2) Calculate the posterior probability that there is no flaw given that the test has indicated
that there is one. Comment on the implications of this result.
Problem 3.25 One method that can be used to distinguish between granite (G) and basalt (B) rocks is to examine a portion of the infrared spectrum of the sun's energy reflected from the rock surface. Let R1, R2 and R3 denote measured spectrum intensities at three different wavelengths. Normally, R1 < R2 < R3 would be consistent with granite and R3 < R1 < R2 would be consistent with basalt. However, when the measurements are made remotely (e.g. using aircraft) several orderings of the Ri's can arise. Flights over regions of
known composition have shown that granite rocks produce
On the other hand, basalt rocks produce these orderings of the spectrum intensities with
probabilities 0.10, 0.20 and 0.70, respectively. Suppose that for a randomly selected rock
from a certain region we have P (G) = 0.25 and P (B) = 0.75.
1) Calculate P (G|R1 < R2 < R3 ) and P (B|R1 < R2 < R3 ). If the measurements for a given
rock produce the ordering R1 < R2 < R3 , how would you classify this rock?
2) Same as 1) for the case R1 < R3 < R2
3) Same as 1) for the case R3 < R1 < R2
4) If one uses the classification rule determined in 1) 2) and 3), what is the probability of
a classification error (that a G rock is classified as a B rock or a B rock is classified as a G
rock)?
Problem 3.26 Messages are transmitted as a sequence of zeros and ones. Transmission er-
rors occur independently, with probability 0.001. A message of 3500 bits will be transmitted.
(a) What is the probability that there will be no errors? What is the probability that there
will be more than one error?
(b) If the same message will be transmitted twice and those bits that do not agree will be
revised (and therefore these “detected” transmission errors will be corrected), what is the
probability that there will be no reception errors?
Problem 3.27 Suppose that the events A, B and C are independent. Show that,
(a) Ac and B c are independent.
Table 3.7:
              Low Salary   Medium Salary   High Salary   Total
Low GPA          0.10          0.08           0.02        0.20
Medium GPA       0.07          0.46           0.07        0.60
High GPA         0.03          0.06           0.11        0.20
Total            0.20          0.60           0.20        1.00
Problem 3.30 Consider the system of components connected as follows. There are two
subsystems connected in parallel. Components 1 and 2 constitute the first subsystem and are
connected in parallel (so that this subsystem works if either component works). Components
3 and 4 constitute the second subsystem and are connected in series (so that this subsystem
works if and only if both components do). If the components work independently of one
another and each component works with probability 0.85, (a) calculate the probability that
the system works. (b) calculate this probability if the two subsystems are connected in series.
Problem 3.31 Calculate the reliability of the system described in the following figure. The
numbers beside each component represent the probabilities of failure for this component.
[Figure: system diagram for Problem 3.31 with seven components; the failure probabilities shown include .05, .01, .01 and .05.]
Chapter 4
Random Variables and Distributions
Example 4.1 Let S be the sample space associated with the inspection of four items. That is,
S = {w = (w1, w2, w3, w4)},
where each wi is either D (the ith item is defective) or N (it is not).
Random variables are often used to summarize the most relevant information contained in the sample space. For example, one may be interested in the total number of defectives (the number of D's in w) and may not care about the order in which they have been found. In this case the random variable X(w) defined above would capture the most relevant information contained in w. If we reject lots with two or more defectives (among the four inspected items), the random variable Y would be of most interest.
Notation: The notations {X = x} {X ≤ x} etc. will be used very often in this course.
Their exact meaning is explained below. In general,
{X = x} = {w : X(w) = x},
For example,
{X = 0} = {(N, N, N, N)}
and
{X = 1} = {(D, N, N, N), (N, D, N, N), (N, N, D, N), (N, N, N, D)}.
The defining feature of a discrete random variable is that its range (the set of all its
possible values) is finite or countable. The values in the range are often integer numbers, but
they don’t need to be so. For instance, a random variable taking the values zero, one half
and one with probabilities 0.5, 0.25 and 0.25 respectively is considered discrete.
The probability density function (or in short, the density), f(x), of a discrete random variable X is defined as
f(x) = P(X = x).
That is, f(x) gives the probability of each possible value x of X. It obviously has the following properties: f(x) ≥ 0 for every possible value x, and the sum of f(x) over all possible values equals one.
In many engineering applications one works with 1 − F (x) instead of F (x). Notice that
1 − F (x) = P (X > x) and therefore gives the probability that X will exceed the value x.
Example 4.1 (continued): Suppose that the items are independent and each one can be
defective with probability p. The density and distribution of the random variable (r.v.) X =
“number of defectives” can then be derived as follows:
f(0) = P(X = 0) = P({N, N, N, N}) = (1 − p)(1 − p)(1 − p)(1 − p) = (1 − p)⁴
f(1) = P(X = 1) = P({D, N, N, N}, {N, D, N, N}, {N, N, D, N}, {N, N, N, D})
= p(1 − p)(1 − p)(1 − p) + (1 − p)p(1 − p)(1 − p) + (1 − p)(1 − p)p(1 − p) + (1 − p)(1 − p)(1 − p)p = 4(1 − p)³p
Table 4.1:
p = 0.40 p = 0.80
x f (x) F (x) f (x) F (x)
0 0.1296 0.1296 0.0016 0.0016
1 0.3456 0.4752 0.0256 0.0272
2 0.3456 0.8208 0.1536 0.1808
3 0.1536 0.9744 0.4096 0.5904
4 0.0256 1.0000 0.4096 1.0000
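Table 4.1 can be reproduced by carrying the derivation above through x = 2, 3 and 4; the counting factor is a binomial coefficient, treated more formally in a later chapter. A small sketch, with our own function name:

from math import comb

def density(x, n=4, p=0.4):
    # f(x) = C(n, x) p^x (1 - p)^(n - x): number of defectives among n independent items
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

for p in (0.40, 0.80):
    F = 0.0
    for x in range(5):
        f = density(x, p=p)
        F += f
        print(p, x, round(f, 4), round(F, 4))   # matches Table 4.1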
• the measurement error when measuring the distance between the North and South
shores of a river.
The typical events in these cases are bounded or unbounded intervals with probabilities
specified in terms of the integral of a continuous density function, f (x), over the desired
interval. See property (3) below.
Since the probability of all intervals must be non-negative and the probability of the entire line should be one, it is clear that f(x) must have the two following properties: (1) f(x) ≥ 0 for all x, and (2) ∫_{−∞}^{+∞} f(x) dx = 1. Probabilities are then computed as (3) P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
Notice that, unlike in the discrete case, the inclusion or exclusion of the end points a and
b doesn’t affect the probability that the continuous variable X is in the interval. In fact,
the event that X will take any single value, x, can be represented by the degenerate interval
x ≤ X ≤ x and so,
P(X = x) = P(x ≤ X ≤ x) = ∫_x^x f(t) dt = 0.
Therefore, unlike in the discrete case, f(x) doesn't represent the probability of the event X = x. What is then the meaning of f(x)? It represents the relative probability that X will be near x: if d > 0 is small, P(x ≤ X ≤ x + d) ≈ f(x) d.
Another important function related with a continuous random variable is its cumulative
distribution function defined as
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt, for all x.    (4.1)
Notice that, in particular,
P (a < X < b) = F (b) − F (a).
[Figure 3.1: Probability on (a, b) under the density function f(x).]
By the Fundamental Theorem of Calculus,
f(x) = F′(x), for all x.    (4.2)
Therefore, we can go back and forth from the density to the distribution function and vice
versa using formulas (4.1) and (4.2).
Example 4.2 Suppose that the maximum annual flood level of a river, X (in meters), has
density
f (x) = 0.125(x − 5), if 5 < x < 9
= 0 otherwise
Calculate F (x), P (5 < X < 6), P (6 ≤ X < 7), and P (8 ≤ X ≤ 9).
[Figure: the density f(x) and the distribution function F(x) of the flood level X in Example 4.2, plotted for x between 4 and 10.]
Solution
F(x) = 0, if x ≤ 5
= ∫_5^x 0.125(t − 5) dt = 0.0625(x − 5)², if 5 < x < 9
= 1, if x ≥ 9.
Furthermore,
P (5 < X < 6) = F (6) − F (5)
= 0.0625[(6 − 5)2 − (5 − 5)2 ]
= 0.0625.
Analogously,
P (6 ≤ X < 7) = F (7) − F (6) = 0.25 − 0.0625 = 0.1875,
and
P (8 ≤ X < 9) = F (9) − F (8) = 1.0 − 0.5625 = 0.4375.
Notice that, since P (X = x) = 0, the inclusion or exclusion of the interval’s boundary points
doesn’t affect the probability of the corresponding interval. In other words,
P (6 ≤ X ≤ 7) = P (6 < X ≤ 7) = P (6 ≤ X < 7) = P (6 < X < 7) = F (7) − F (6) = 0.1875.
Also notice that, since f (x) is increasing on (5, 9), P (5 < X < 6), for instance, is much
smaller than P (8 < X < 9), despite the length of the two intervals being equal. 2
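The numbers in Example 4.2 follow directly from F(x) = 0.0625(x − 5)² on 5 < x < 9. A quick numerical check (F is our own helper name):

def F(x):
    # distribution function of the maximum annual flood level
    if x <= 5:
        return 0.0
    if x < 9:
        return 0.0625 * (x - 5) ** 2
    return 1.0

print(F(6) - F(5))   # P(5 < X < 6)   = 0.0625
print(F(7) - F(6))   # P(6 <= X < 7)  = 0.1875
print(F(9) - F(8))   # P(8 <= X <= 9) = 0.4375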
Example 4.3 (Rounding-off Error and Uniform Random Variables): Due to the resolution
limitations of a measuring device, the measurements are rounded-off to the second decimal
place. If the third decimal place is 5 or more, the second place is increased by one unit; if the
third decimal place is 4 or less, the second place is left unchanged. For example, 3.2462 would
be reported as 3.25 and 3.2428 would be reported as 3.24. Let X represent the difference
between the (unknown) true measurement, y, and the corresponding rounded–off reading, r.
That is
X = y − r.
Clearly, X can take any value between −0.005 < X < 0.005. It would appear reasonable
in this case to assume that all the possible values are equally likely. Therefore, the relative
probability f (x) that X will fall near any number x0 between −0.005 and 0.005 should then
be the same. That is,
f (x) = c, − 0.005 ≤ x ≤ 0.005,
= 0, otherwise.
The random variable X is said to be uniformly distributed between −0.005 and +0.005. By property 2,
∫_{−∞}^{+∞} f(x) dx = ∫_{−0.005}^{0.005} c dx = 0.01c = 1,
so that c = 100, and the distribution function is
F(x) = 0, x ≤ −0.005,
= 100(x + 0.005), −0.005 ≤ x ≤ 0.005,
= 1, x ≥ 0.005.
[Figure: the density (height 100 on the interval (−0.005, 0.005)) and the distribution function of the rounding-off error X.]
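Since the rounding-off error has a flat density of height c = 100 on (−0.005, 0.005), probabilities are proportional to interval length. A minimal sketch of the density and distribution function (function names are ours):

def f(x):
    # uniform density of the rounding-off error on (-0.005, 0.005)
    return 100.0 if -0.005 <= x <= 0.005 else 0.0

def F(x):
    # corresponding distribution function
    if x <= -0.005:
        return 0.0
    if x <= 0.005:
        return 100.0 * (x + 0.005)
    return 1.0

print(F(0.005) - F(-0.005))   # total probability: 1.0
print(F(0.001) - F(-0.001))   # P(-0.001 < X < 0.001) = 0.2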
An approximately symmetric and unimodal density can be fairly well described by giving just two numbers: a measure of its central location and a measure of its dispersion.
The median and the mean are two popular measures of (central) location and the
interquartile range and the standard deviation are two popular measures of dispersion.
These summary measures are defined and briefly discussed below.
Given a number α between zero and one, the quantile of order α of the distribution F (or
the r.v. X), denoted Q(α), is implicitly defined by the equation
P(X ≤ Q(α)) = α;
that is, Q(α) is the solution, x, of the equation
F(x) = α.
To find the quantile of order 0.25, for example, we must solve the equation
F(x) = 0.25.
The “special” quantiles Q(0.25) and Q(0.75) are often called the first quartile and the third
quartile, respectively.
The median of X, Med(X), is defined as the corresponding quantile of order 0.5, that is,
Med(X) = Q(0.5).
Evidently, Med(X) divides the range of X into two sets of equal probability. Therefore, it
can be used as a measure for the central location of f (x).
A simple sketch showing the locations of Q(0.25), M ed(X) and Q(0.75) constitutes a
good summary of f (x), even if it is not symmetric. Notice that if Q(0.75) − M ed(X) is
significantly larger (or smaller) than M ed(X) − Q(0.25), then f (x) is fairly asymmetric.
There are situations when there is no solution or too many solutions to the defining
equations above. This is typically the case for discrete random variables. In these cases the
quantiles (including the median) are calculated using some “common–sense” criterion. For
instance if the distribution function F (x) is constant and equal to 0.5 on the interval (x1 , x2 ),
then the median is taken equal to (x1 + x2 )/2 (see Figure 3.5 (a)). To give another example,
if the distribution function F (x) has a jump and doesn’t take the value 0.5, the median is
defined as the location of the jump (see Figure 3.5 (b))
The dispersion about the median is usually measured in terms of the interquartile range, denoted IQR(X) and defined as IQR(X) = Q(0.75) − Q(0.25).
[Figure 3.5: (a) a distribution function that is constant and equal to 0.5 on an interval (x1, x2); (b) a distribution function that jumps over the value 0.5.]
Example 4.4 (Waiting Time and Exponential Random Variables) The waiting time X (in
hours) between the arrival of two consecutive customers at a service outlet is a random
variable with exponential density
f (x) = λe−λx , if x ≥ 0,
= 0, otherwise.
where λ is a positive parameter representing the rate at which customers arrive. For this
example, take λ = 2 customers per hour. (a) Find the distribution function F(x). (b) Calculate Med(X), Q(0.25) and Q(0.75). (c) Is f(x) symmetric? (d) Calculate IQR(X).
Solution
(a) For x ≥ 0,
F(x) = ∫_{−∞}^{x} f(t) dt = ∫_0^x 2 exp{−2t} dt = 1 − exp{−2x},
and F(x) = 0 for x < 0.
(b) Solving F(x) = 0.5, that is 1 − exp{−2x} = 0.5, gives Med(X) = log(2)/2 = 0.347. Similarly, solving 1 − exp{−2x} = 0.25 gives
Q(0.25) = [log(4) − log(3)]/2 = 0.144.
Analogously, to calculate Q(0.75),
Q(0.75) = log(4)/2 = 0.693.
(c) Since
Q(0.75) − Med(X) = 0.693 − 0.347 = 0.346
and
Med(X) − Q(0.25) = 0.347 − 0.144 = 0.203,
the distribution is fairly asymmetric.
(d)
IQR = Q(0.75) − Q(0.25) = 0.693 − 0.144 = 0.549.
2
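For an exponential random variable the quantiles have the closed form Q(α) = −log(1 − α)/λ, which is exactly what the calculations above use with λ = 2. A quick check:

import math

lam = 2.0   # arrival rate (customers per hour)

def Q(alpha):
    # quantile of order alpha for an exponential(lam) random variable
    return -math.log(1 - alpha) / lam

print(round(Q(0.25), 3))              # 0.144
print(round(Q(0.50), 3))              # median: 0.347
print(round(Q(0.75), 3))              # 0.693
print(round(Q(0.75) - Q(0.25), 3))    # IQR: 0.549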
Let X be a random variable with density f(x), and let g(X) be a function of X. For example, g(X) = √X or g(X) = (X − t)², where t is some fixed number. The notation E[g(X)],
read “expected value of g(X)”, will be used very often in this course. The expected value of
g(X) is defined as the weighted average of the function g(x), with weights proportional to
the density function f (x). More precisely:
E[g(X)] = ∫_{−∞}^{+∞} g(x) f(x) dx    in the continuous case, and    (4.3)
E[g(X)] = Σ_{x∈R} g(x) f(x)    in the discrete case.    (4.4)
Example 4.5 Refer to the random variables of Example 4.1 (number of defectives) and Example 4.3 (rounding-off error). Calculate E(X) and E(X²).
Solution Since the random variable X of Example 4.1 is discrete, we must use formula (4.4), summing x f(x) and x² f(x) over the possible values x = 0, 1, 2, 3, 4.
In the case of the continuous random variable X of Example 4.3 we must use formula (4.3):
E(X) = ∫_{−∞}^{+∞} x f(x) dx = 100 ∫_{−0.005}^{0.005} x dx = 0,
E(X²) = ∫_{−∞}^{+∞} x² f(x) dx = 100 ∫_{−0.005}^{0.005} x² dx = (0.005)²/3 ≈ 8.33 × 10⁻⁶.
Suppose that it is proposed that a certain number t is used as the measure of central
location of X. How could we decide if this proposed value is appropriate? One way to think
about this question is as follows. If t is a good measure of central location then, in principle,
one would expect that the squared residuals (x − t)2 will be fairly small for those values x
of X which are highly likely (those for which f (x) is large). If this is so, then one would also
expect that the average of these squared residuals, D(t) = E[(X − t)²], will also be fairly small.
But we could begin this reasoning from the end and say that a good measure of central
location must minimize D(t). This “optimal” value of t, called the mean of X, is denoted
by the Greek letter µ.
To find µ we differentiate D(t) and set the derivative equal to zero. In the continuous
case,
D′(t) = −2 ∫_{−∞}^{+∞} (x − t) f(x) dx = −2[E(X) − t] = 0 ⇒ t = E(X),
and the discrete case can be treated similarly. Since D00 (t) = 2 > 0 for all t, the critical point
t = E(X) minimizes D(t). Therefore,
µ = E(X)
This procedure of defining the desired summary measure by the property of minimizing the
average of the squared residuals is a very important technique in applied statistics called the
method of minimum mean squared residuals. We will come across several applications
of this technique throughout this course.
In particular, D(µ) ≤ D(t) for all values of t. The quantity D(µ) is usually denoted by the Greek symbol σ² (read "sigma squared") and called the variance of X. An alternative notation for the variance of X, also often used in this course, is Var(X).
It is evident that Var(X) will tend to be smaller when the density of X is more concen-
trated around µ, since the smaller squared residuals will receive larger weights. Therefore,
Var(X) could be taken as a measure of the dispersion of f (x). A problem with Var(X) is
that it is expressed in a unit which is the square of original unit of X. This problem is
easily solved by taking the (positive) square root of Var(X). This is called the standard
deviation of X and denoted by either σ or SD(X).
σ = SD(X) = +√Var(X) = +√(σ²).
Example 4.4 (continued): (a) Calculate the mean and the standard deviation for the waiting
time between two consecutive customers, X. (b) How do they compare with the correspond-
ing median waiting time and interquartile range calculated before?
Solution
(a) Using integration by parts,
E(X) = ∫_0^{+∞} x · 2 exp{−2x} dx = [−x exp{−2x}]_0^{+∞} + ∫_0^{+∞} exp{−2x} dx
= [−exp{−2x}/2]_0^{+∞} = exp{0}/2 = 0.5.
More generally, if X is an exponential random variable with parameter (rate) λ, then
E(X) = 1/λ. (4.5)
Using integration by parts again, we get
E(X²) = 2 ∫_0^{+∞} x² exp{−2x} dx = 2[−x² exp{−2x}/2]_0^{+∞} + 2 ∫_0^{+∞} x exp{−2x} dx
= 2 ∫_0^{+∞} x exp{−2x} dx = 0.5.
Therefore,
SD(X) = +√(E(X²) − [E(X)]²) = +√(0.5 − (0.5)²) = 0.5.
More generally, if X is an exponential random variable with parameter λ, then
Var(X) = 1/λ² and SD(X) = 1/λ.    (4.6)
(b) Since the density of X is asymmetric, the median and the mean are expected to be different (as they are). Since the density is skewed to the right (longer right-hand-side tail), the mean waiting time (0.5) is larger than the median waiting time (0.347).
The two measures of dispersion (IQR = 0.549 and SD = 0.5) are quite consistent. 2
Property 2: E(X + Y ) = E(X) + E(Y ) for all pairs of random variables X and Y .
Property 3: E(XY ) = E(X)E(Y ) for all pairs of independent random variables X and
Y.
Property 4: Var(aX + b) = a²Var(X) for all constants a and b.
Proof
Var(aX + b) = E{[(aX + b) − (aµ + b)]²} = E{[a(X − µ)]²} = a²E[(X − µ)²] = a²Var(X).
2
Property 5: Var(X ± Y ) = Var(X) + Var(Y ) for all pairs of independent random
variables X and Y .
All these properties will be used very often in this course. The proofs of properties 2, 3
and 5 are beyond the scope of this course, and therefore these properties must be accepted
as facts and used throughout the course.
The formula
Var(X) = E(X²) − [E(X)]² = E(X²) − µ²,
is often used for calculations. The derivation of this formula is very simple, using the properties of the mean listed above. In fact,
Var(X) = E{(X − µ)²} = E(X² + µ² − 2µX) = E(X²) + µ² − 2µE(X)
= E(X²) + µ² − 2µ² = E(X²) − µ².
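The shortcut formula is easy to verify numerically for any discrete density; the sketch below uses the two-point density f(0) = 1 − p, f(1) = p that reappears in Example 4.6 below (names are ours):

p = 0.10
density = {0: 1 - p, 1: p}

mean = sum(x * f for x, f in density.items())          # E(X)
EX2 = sum(x ** 2 * f for x, f in density.items())      # E(X^2)
var_shortcut = EX2 - mean ** 2                         # E(X^2) - mu^2
var_direct = sum((x - mean) ** 2 * f for x, f in density.items())

print(mean)                                            # 0.1
print(var_shortcut, var_direct)                        # both 0.09 (up to rounding)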
Example 4.6 Twenty randomly selected students will be asked the question “do you reg-
ularly smoke?”. (a) Calculate the expected number of smokers in the sample if 10% of the
students smoke; (b) what is your “estimate” of the proportion, p, of smokers if six students
answered “Yes”?; (c) What are the expected value and the variance of your estimate?
Solution
(a) Let Xi be equal to one if the ith student answers “Yes” and equal to zero otherwise.
Let p be equal to the proportion of smokers in the student population. Then the Xi are
independent discrete random variables with density f(0) = 1 − p and f(1) = p. Therefore,
E(Xi) = (0)(1 − p) + (1)p = p = 0.10,
and
Var(Xi) = E(Xi²) − [E(Xi)]² = p − p² = p(1 − p) = 0.09.
Hence, the expected number of smokers in a sample of 20 students is E(X1 + · · · + X20) = 20p = 2.
(b) A reasonable estimate for the fraction, p, of smokers in the population is given by the
corresponding fraction of smokers in the sample, X. In the case of our sample, the observed
value, x, of X is x = 6/20 = 0.3.
(c) The expected value of the estimate in (b) is p and its variance is p(1 − p)/20. Why? 2
Example 4.7 The independent random variables X, Y and Z represent the monthly sales
of a large company in the provinces of BC, Ontario and Quebec, respectively. The means and standard deviations of these variables are as follows (in hundreds of dollars):
(a) What are the expected value and the standard deviation of the total monthly sales?
(b) Sales manager J. Smith is responsible for the sales in BC and 2/3 of the sales in Ontario.
Sales manager R. Campbell is responsible for the sales in Quebec and the remaining 1/3 of
the sales in Ontario. What are the expected values and standard deviations of Mr. Smith’s
and Mrs. Campbell’s monthly sales?
(c) What are the expected values and standard deviations of the annual sales for each
province? Assume for simplicity that the monthly sales are independent.
Solution
(a) By Property 5, the variance of the total monthly sales S = X + Y + Z is
Var(S) = Var(X) + Var(Y) + Var(Z) = 59,400.
Therefore,
SD(S) = √59,400 = 243.72.
(b) First, notice that Mr. Smith's and Mrs. Campbell's monthly sales are S1 = X + (2/3)Y and S2 = Z + (1/3)Y, respectively, and their expected values follow from the properties of the mean; in particular,
E(S2) = 2,266.67.
By Property 5,
Var(S1) = Var(X) + (2/3)²Var(Y) = 24,400,
and so
SD(S1) = √24,400 = 156.20.
Analogously
SD(S2 ) = 158.11.
(c) If Xi (i = 1, . . . , 12) represent BC's monthly sales, the annual sales for BC are
T = X1 + X2 + · · · + X12.
Therefore,
E(T) = E(X1 + · · · + X12) = E(X1) + · · · + E(X12) = (12)(1,435) = 17,220.
The variance and the standard deviation of the annual sales in BC (assuming independence) are:
Var(T) = Var(X1 + · · · + X12) = Var(X1) + · · · + Var(X12) = (12)(120²) = 172,800,
SD(T) = √172,800 = 415.69.
The student can now calculate the expected values and the standard deviations for the annual sales in Ontario and Quebec. 2
Question: The total monthly sales can be obtained as the sum of Mr. Smith's (S1 = X + (2/3)Y) and Mrs. Campbell's (S2 = Z + (1/3)Y) monthly sales, with variances (calculated in part (b)) equal to 24,400 and 25,000, respectively. Why is it then true that the total sales variance, Var(X + Y + Z), calculated in part (a), is not equal to the sum 24,400 + 25,000 = 49,400?
V = max{X1, X2, . . . , Xn}.
3. The maximum flood level of a river in the next n years. In this case Xi is the maximum flood level in the ith year.
Analogous examples can be given for the minimum,
Λ = min{X1, X2, . . . , Xn}.
3. The minimum flood level of a river in the next n years. In this case Xi is the minimum flood level in the ith year.
Since the Xi are independent, the distribution function of V is
FV(v) = P(V ≤ v) = P(X1 ≤ v, X2 ≤ v, . . . , Xn ≤ v) = F1(v)F2(v) · · · Fn(v).
This formula is greatly simplified when the Xi's are identically distributed, that is, when
F1(x) = F2(x) = · · · = Fn(x) = F(x)
for all values of x. In this case,
FV(v) = [F(v)]^n    (4.7)
and
fV(v) = F′V(v) = n[F(v)]^(n−1) f(v).    (4.8)
Example 4.8 A system consists of five components connected in parallel. The lifetime
(in thousands of hours) of each component is an exponential random variable with mean
µ = 3. See Example 4.4 and Example 4.4 (continued) for the definition of exponential
random variables and formulas for their mean and variance.
(a) Calculate the median life (often called “half–life”) and standard deviation for each com-
ponent.
(b) Calculate the probability that a component fails before 3500 hours.
(c) Calculate the probability that the system will fail before 3500 hours. Compare this with
the probability that a component fails before 3500 hours.
(d) Calculate the half–life (median life), mean life and standard deviation for the system.
Solution
(a) Using equation (4.5) and the fact that the lifetime X of each component is exponentially
distributed with mean µ = 3 we obtain that λ = 1/3 and that the density and distribution
functions of X are
f (x) = (1/3) exp{−x/3} and F (x) = 1 − exp{−x/3}, x ≥ 0,
respectively. The half-life of each component can be obtained as follows
1 − exp{−x/3} = 0.5 ⇒ exp{−x/3} = 0.5 ⇒ x0 = −3 log(0.5) = 2.08.
Therefore, the half-life of each component is equal to 2, 080 hours. To obtain the standard
deviation, recall that from equation (4.6) the standard deviation of an exponential random
variable is equal to its mean, that is,
SD(X) = E(X) = µ.
Therefore, the standard deviation of the lifetime of each component is equal to 3.
(b) The probability that a component will fail before 3500 is
P {X ≤ 3.5} = F (3.5) = [1 − exp{−3.5/3}] = 0.6886.
(c) Using formula (4.7)
FV (v) = [1 − exp{−v/3}]5
and so the probability that the system will fail before 3, 500 hours is
P {V ≤ 3.5} = FV (3.5) = [1 − exp{−3.5/3}]5 = (0.6886)5 = 0.1548.
The probability that a single component fails (calculated in part (b)) is more than four times
larger.
(d) To calculate the median life of the system we must use formula (4.7) once again:
FV(v) = 0.5 ⇒ [1 − exp{−v/3}]⁵ = 0.5 ⇒ exp{−v/3} = 1 − (0.5)^(1/5) = 0.12945
⇒ v = −3 log(0.12945) = 6.133.
Therefore, the median life of the system is equal to 6,133 hours.
To calculate the mean life we must first obtain the density function of V. Using formula (4.8) above we obtain
fV(v) = (5)[1 − exp{−v/3}]⁴ (1/3) exp{−v/3}.
Expanding [1 − exp{−v/3}]⁴ and integrating term by term, we have that
E(V²) = ∫_0^∞ v² fV(v) dv = (5/3)[∫_0^∞ v² exp{−v/3} dv − 4∫_0^∞ v² exp{−2v/3} dv + 6∫_0^∞ v² exp{−v} dv − 4∫_0^∞ v² exp{−4v/3} dv + ∫_0^∞ v² exp{−5v/3} dv] = 60.095.
A similar term-by-term integration gives E(V) = 6.85, that is, a mean life of about 6,850 hours. Therefore,
SD(V) = √(60.095 − 6.85²) = √13.1725 = 3.63.
2
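The answers in Example 4.8 can be checked from FV(v) = [1 − exp{−v/3}]⁵. The sketch below recomputes the failure probabilities and the system half-life, and approximates E(V) and SD(V) by a crude numerical integration of fV (all names and the integration grid are our own choices):

import math

lam = 1 / 3   # failure rate of each component (lifetimes in thousands of hours)
n = 5         # components in parallel

def F_comp(x):
    return 1 - math.exp(-lam * x)

def F_sys(v):
    return F_comp(v) ** n          # the system fails only when all five components have failed

print(round(F_comp(3.5), 4))       # 0.6886
print(round(F_sys(3.5), 4))        # 0.1548

# system half-life: solve F_sys(v) = 0.5 in closed form
v_med = -math.log(1 - 0.5 ** (1 / n)) / lam
print(round(v_med, 3))             # about 6.133 (i.e. 6,133 hours)

# mean and SD by numerical integration of the density of V
def f_sys(v):
    return n * F_comp(v) ** (n - 1) * lam * math.exp(-lam * v)

dv = 0.001
grid = [i * dv for i in range(1, 100000)]
EV = sum(v * f_sys(v) * dv for v in grid)
EV2 = sum(v ** 2 * f_sys(v) * dv for v in grid)
print(round(EV, 2), round(math.sqrt(EV2 - EV ** 2), 2))   # about 6.85 and 3.63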
Similarly, the distribution function of the minimum, Λ = min{X1, X2, . . . , Xn}, is
FΛ(u) = P(Λ ≤ u) = 1 − P(Λ > u) = 1 − P(X1 > u, X2 > u, . . . , Xn > u)
= 1 − P(X1 > u)P(X2 > u) · · · P(Xn > u) [since the variables Xi are independent]
= 1 − [1 − F1(u)][1 − F2(u)] · · · [1 − Fn(u)].
As before, this formula can be greatly simplified when the Xi's are equally distributed, that is, when
F1(x) = F2(x) = · · · = Fn(x) = F(x)
for all values of x. In this case,
FΛ(u) = 1 − [1 − F(u)]^n    (4.9)
and
fΛ(u) = F′Λ(u) = n[1 − F(u)]^(n−1) f(u).    (4.10)
Example 4.9 A system consists of five components connected in series. The lifetime (in
thousands of hours) of each component is an exponential random variable with mean µ = 3.
(a) Calculate the probability that the system will fail before 3500 hours. Compare this with
the probability that a component fails before 3500 hours.
(b) Calculate the median life, the mean life and the standard deviation for the system.
Solution
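One way to obtain the numbers asked for: for a series system, formulas (4.9) and (4.10) apply with F(x) = 1 − exp{−x/3} and n = 5, and the minimum of independent exponential lifetimes is again exponential, now with rate 5(1/3) = 5/3. A brief sketch under that assumption (variable names are ours):

import math

lam = 1 / 3
n = 5
rate_sys = n * lam    # min of n independent exponentials is exponential with rate n*lam

print(round(1 - math.exp(-rate_sys * 3.5), 4))   # P(system fails before 3,500 hours), about 0.997
print(round(1 - math.exp(-lam * 3.5), 4))        # a single component, for comparison: 0.6886
print(round(math.log(2) / rate_sys, 3))          # median life (thousands of hours)
print(round(1 / rate_sys, 3))                    # mean life
print(round(1 / rate_sys, 3))                    # SD equals the mean for an exponential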
4.7 Exercises
4.7.1 Exercise Set A
Problem 4.1 A system consists of five identical components all connected in series. Suppose
each component has a lifetime (in hours) that is exponentially distributed with the rate
λ = 0.01, and all the five components work independently of one another.
Define T to be the time at which the system fails. Consider the following questions:
(a) Obtain the distribution of T . Can you tell what type of distribution it is?
(b) Compute the IQR (interquartile range) for the distribution obtained in part (a).
(c) What is the probability that the system will last at least 15 hours?
Problem 4.3 Suppose that the response time X at a certain on-line computer terminal (the
elapsed time between the end of a user’s inquiry and the beginning of the system’s response
to that inquiry) has an exponential distribution with expected response time equal to 5
seconds (i.e. the exponential rate is λ = 0.2).
(a) Calculate the median response time.
(b) What is the probability that the next three response times exceed 5 seconds? (Assume
that all the response times are independent).
Problem 4.4 The hourly volume of traffic, X, for a proposed highway has density propor-
tional to g(x), where
g(x) = x(100 − x)   if 0 < x < 100
     = 0            otherwise.
(a) Derive the density and the distribution functions of X.
(b) The traffic engineer may design the highway capacity equal to the mean of X. Determine
the design capacity of the highway and the corresponding probability of exceedence
(i.e. traffic volume is greater than the capacity).
Problem 4.5 A discrete random variable X has the density function given below.
x −1 0 1 2
f (x) 0.2 c 0.2 0.1
(a) Determine c;
(b) Find the distribution function F (x);
(c) Show that the random variable Y = X 2 has the density function g(y) given by
y 0 1 4
g(y) 0.5 0.4 0.1
(d) Calculate expectation E(X), variance Var(X) and the mode of X (the value x with the
highest density).
Problem 4.6 A continuous random variable X has the density function f(x) which is equal to cx on the interval 0 ≤ x ≤ 1, and 0 otherwise.
(a) Determine the constant c;
(b) Find the distribution function F (x) of X;
(c) Calculate E(X), Var(X) and the median, Q(0.5);
(d) Find P (|X| ≥ 0.5).
Problem 4.9 Suppose a random variable X has a probability density function given by
f(x) = kx(1 − x)   for 0 ≤ x ≤ 1
     = 0           elsewhere.
(a) Find the value of k such that f (x) is a probability density function.
(b) Find P (0.4 ≤ X ≤ 1).
(c) Find P (X ≤ 0.4|X ≤ 0.8).
(d) Find F (b) = P (X ≤ b), and sketch the graph of this function.
Problem 4.10 Suppose that random variables X and Y are independent and have the same
mean 3 and standard deviation 2. Calculate the mean and variance of X − Y .
Problem 4.11 Suppose X has an exponential distribution with an unknown parameter λ, i.e., its density is
f(x) = λ exp(−λx)   if x ≥ 0
     = 0            otherwise.
If P (X ≥ 1) = 0.25, determine λ.
Problem 4.12 Suppose an enemy aircraft flies directly over the Alaska pipeline and fires a single air-to-surface missile. If the missile hits anywhere within 10 feet of the pipeline, major structural damage will occur and the oil flow will be disrupted. Let X be the distance from the pipeline to the point of impact. Note that X is a continuous random variable. The density function describing the missile's point of impact is given by
f(x) = (60 + x)/3600   for −60 < x < 0
     = (60 − x)/3600   for 0 ≤ x < 60
     = 0               otherwise.
(a) Find the distribution function, F (x).
(b) Let A be the event “flow is disrupted.” Find P (A).
(c) Find the mean and the standard deviation of X.
(d) Find the median and the interquartile range of X.
Problem 4.13 Consider a random variable X which follows the uniform distribution on the interval (0, 1).
(a) Give the density function f(x) and obtain the cumulative distribution function F(x) of X;
(b) Calculate the mean (expectation) E(X) and the variance Var(X);
(c) Let Y = √X. Find E(Y) and Var(Y);
(d) Obtain the distribution function G(y) and furthermore the density function g(y) of the random variable Y.
Problem 4.14 The reaction time (in seconds) to a certain stimulus is a continuous random
variable with density given below
f(x) = 3/(2x²)   for 1 ≤ x ≤ 3
     = 0         otherwise
(a) Obtain the distribution function.
(b) Consider the next two observations X1 and X2 (we can assume they are i.i.d.), and let V = max{X1, X2}. What are the density and distribution functions of V?
(c) Compute the expectation E(V ) and the standard deviation SD(V ).
(d) Compute the difference between the expectation and the median for the distribution of
V.
Problem 4.16 Find the density functions corresponding to the pictures in Figure 3.7. For
each case also calculate the distribution function, the mean, the median, the interquartile
range and the standard deviation.
[Figure 3.7: six density sketches, labelled (a)–(f).]
Problem 4.17 The density function for the lifetime of a part, X, decays exponentially fast.
If the half–life of X is equal to fifty weeks, find the mean and standard deviation of X.
Problem 4.18 The density function for the measurement error, X, is uniform on the interval
(−0.5, 0.8). What is the distribution function of X 2 ? What is the density of X 2 ?
Problem 4.19 The hourly volume of traffic, X, for a proposed highway has density propor-
tional to d(x), where
Problem 4.20 The company has 20 welders with the following "performances" (for each welder, the row gives the probabilities that a welded item has 0, 1, 2, 3 or 4 cracks):
Welder    0      1      2      3      4
1 0.10 0.20 0.40 0.20 0.10
2 0.20 0.20 0.20 0.20 0.20
3 0.50 0.30 0.10 0.05 0.05
4 0.05 0.05 0.10 0.30 0.50
5 0.50 0.00 0.00 0.00 0.50
6 0.85 0.00 0.00 0.00 0.15
7 0.30 0.25 0.20 0.10 0.15
8 0.20 0.30 0.20 0.10 0.20
9 0.10 0.10 0.50 0.20 0.10
10 0.20 0.50 0.10 0.20 0.00
11 0.30 0.30 0.40 0.00 0.00
12 0.10 0.10 0.50 0.15 0.15
13 0.35 0.25 0.20 0.15 0.05
14 0.40 0.30 0.10 0.10 0.10
15 0.20 0.30 0.50 0.00 0.00
16 0.60 0.30 0.10 0.00 0.00
17 0.70 0.10 0.10 0.10 0.00
18 0.10 0.80 0.10 0.00 0.00
19 0.40 0.40 0.10 0.10 0.00
20 0.15 0.60 0.15 0.10 0.00
1) How would you rank these twenty welders (e.g. for promotion) on the basis of this
information alone?
2) Would you change the “ranking” if you know that items with one, two, three and four
cracks must be sold for $6, $15, $40, and $60 less, respectively? What if the associated losses
are $6, $15, $40, and $80. Suggestion: Use the computer.
Problem 4.21 Suppose that the maximum annual wind velocity near a construction site,
X, has exponential density
f (x) = λ exp {−λx}, x > 0.
(a) If the records of maximum wind speed show that the probability of maximum annual
wind velocities less than 72 mph is approximately 0.90, suggest an appropriate estimate for
λ.
(b) If the annual maximum wind speeds for different years are statistically independent,
calculate the probability that the maximum wind speed in the next three years will exceed
75 mph. What about the next 15 years?
(c) Plot the distribution function of the maximum wind speed for the next year, for the next
3 years and for the next 15 years. Briefly report your conclusions.
(d) Let Qm(p) (m = 1, 2, . . .) be the quantile of order p for the maximum wind speed over the next m years. Show that
Qm(p) = Q1[p^(1/m)], for all m = 1, 2, . . .
Use this formula to plot Qm (0.90) versus m. Same for Qm (0.95). Briefly report your conclu-
sions. Suggestion: Use the computer.
Problem 4.22 A system has two independent components A and B connected in parallel.
If the operational life (in thousand of hours) of each component is a random variable with
density
f(x) = (1/36)(x − 4)(10 − x),   4 < x < 10
     = 0,   otherwise
(a) Find the median and the mean life of each component. Find also the standard deviation
and IQR.
(b) Calculate the distribution and density functions for the lifetime of the system. What is
the expected lifetime of the system?
(c) Same as (b) but assuming that the components are connected “in series” instead of “in
parallel”.
Problem 4.23 A large construction project consists of building a bridge and two roads
linking it to two cities (see the picture below). The contractual time for the entire project is
18 months.
The construction of each road will require between 15 and 20 months and that of the
bridge will require between 12 and 19 months. The three parts of the projects can be done
simultaneously and independently. Let X1 , X2 and Y represent the construction times for the
two roads and the bridge, respectively and suppose that these random variables are uniformly
distributed on their respective ranges.
(a) What is the expected time for completion of each part of the project? What are the
corresponding standard deviations?
(b) What is the expected time for the completion of the entire project? What is the corre-
sponding standard deviation?
(c) What is the probability that the project will be completed within the contractual time?
Problem 4.24 Same as Problem 4.23, but assuming that the variables X1, X2 and Y have triangular distributions over their ranges.
[Figure: sketch of the project for Problem 4.23 — Road 1 and Road 2 connecting the two cities to the bridge over the river.]
Chapter 5
Normal Distribution
The Normal distribution is, for reasons that will be evident as we progress in this course,
the most popular distribution among engineers and other scientists. It is a continuous dis-
tribution with density,
f(x) = [1/(σ√(2π))] exp{−(x − µ)²/(2σ²)},
where µ and σ are “parameters” which control the central location and the dispersion of the
density, respectively. The normal density is perfectly symmetric about the center, µ, and
this bell-shaped function is “shorter” and “fatter” as σ increases.
[Figure 4.1: Normal density functions with a common mean and σ = 1, 1.5, 2 and 3.]
The density steadily decreases as we move away from its highest value
f(µ) = 1/(σ√(2π)).
Therefore, the relative (and also the absolute) probability that X will take a value near µ is
the highest. Since f (x) → 0 as x → ∞, exponentially fast,
g(k) = P {|X − µ| ≤ kσ} → 1, as k → ∞,
very fast. In fact, it can be shown that g(1) = 0.6827, g(2) = 0.9544, g(3) = 0.9973 and
g(4) = 0.9999. For practical purposes g(k) = 1 for k ≥ 4.
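The values g(1), . . . , g(4) quoted above follow from g(k) = P{|X − µ| ≤ kσ} = 2Φ(k) − 1. They can be checked with the error function from the Python standard library (Phi is our own helper for Φ):

import math

def Phi(z):
    # standard normal distribution function via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for k in range(1, 5):
    g = 2 * Phi(k) - 1
    print(k, round(g, 4))   # 0.6827, 0.9545, 0.9973, 0.9999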
Fact 1: If X ∼ N (µ, σ 2 ) and Y = aX + b, where a and b are two constants with a 6= 0, then
Y ∼ N (aµ + b, a2 σ 2 ).
For example, if X ∼ N(2, 9) and Y = 5X + 1, then E(Y) = (5)(2) + 1 = 11, Var(Y) = (5²)(9) = 225 and Y ∼ N(11, 225).
Proof We will consider the case a > 0. The proof for the a < 0 case is left as an exercise.
The distribution function of Y , denoted here by G is given by
à ! à !
y−b y−b
G(y) = P (Y ≤ y) = P (aX + b ≤ y) = P X ≤ =F ,
a a
where F is the distribution function of X. The density function g(y) of Y can now be found
by differentiating G(y). That is,
à ! à !
0 d y−b 1 y−b
g(y) = G (y) = F = f
dy a a a
( )
2
1 [y − (aµ + b)]
= √ exp − ,
aσ 2π 2a2 σ 2
2
Standardized Normal
An important particular case emerges when a = (1/σ) and b = −(µ/σ). In this case the
transformed variable is denoted by Z and called “standard normal”. Since
Z = (1/σ)X − (µ/σ) = (X − µ)/σ,
by Property 1, the parameters of the new normal variable, Z, can be obtained from those of
the given normal variable, X, (µ and σ 2 ) as follows:
µ −→ aµ + b = (1/σ)µ − (µ/σ) = 0
and
σ² −→ a²σ² = (1/σ)²σ² = 1.
That is, any given normal random variable X ∼ N (µ, σ 2 ) can be transformed into a standard
normal Z ∼ N (0, 1) by the equation
Z = (X − µ)/σ.    (5.1)
[Figure: the standard normal density, illustrating that P(Z < −1) = 1 − P(Z < 1).]
Fact 3: The normal density cannot be integrated in closed form. That is, there are no simple
formulas for calculating expressions like
F(x) = ∫_{−∞}^{x} f(t) dt
or
P(a < X < b) = ∫_a^b f(t) dt = F(b) − F(a).
Example 5.1 Let X ∼ N (2, 9). Calculate (a) P (X < 5), (b) P (−3 < X < 5) (c) P (X > 5)
(d) P (|X − 2| < 3) (e) The value of c such that P (X < c) = 0.95 (f) The value of c such
that P (|X − 2| > c) = 0.10
Solution
(a) P(X < 5) = F(5) = Φ[(5 − 2)/3] = Φ(1) = 0.8413447, from Table 1 in the Appendix.
(b) P(−3 < X < 5) = F(5) − F(−3) = Φ[(5 − 2)/3] − Φ[(−3 − 2)/3] = Φ(1) − Φ(−5/3) = 0.8413447 − 0.04779035 = 0.7935544, from Table 1 in the Appendix.
(c) P(X > 5) = 1 − P(X < 5) = 1 − Φ(1) = 1 − 0.8413447 = 0.1586553.
(d) To solve this question we must first remember that a number has absolute value smaller
than 3 if and only if this number is between −3 and 3. In other words, to say that |X −2| < 3
is equivalent to saying that −3 < X − 2 < 3. Therefore,
P [|X − 2| < 3] = P [−3 < X − 2 < 3] = P [−1 < (X − 2)/3 < 1] = P [−1 < Z < 1]
= Φ(1) − Φ(−1) = Φ(1) − [1 − Φ(1)] = 2Φ(1) − 1
= 0.6826895.
(e) The value of c such that P(X < c) = 0.95 satisfies Φ[(c − 2)/3] = 0.95, so (c − 2)/3 = 1.64 and c = 2 + (3)(1.64) = 6.92.
(f) The value of c such that P(|X − 2| > c) = 0.10 is calculated as follows,
P (|X − 2| > c) = P [|Z| > c/3] = 1 − P [|Z| ≤ c/3] = 1 − {2Φ(c/3) − 1} = 2[1 − Φ(c/3)] = 0.10
Therefore,
Φ(c/3) = 0.95 ⇒ c/3 = 1.64 ⇒ c = (3)(1.64) = 4.92
2
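The calculations of Example 5.1 can be verified with the same standardization, Z = (X − 2)/3, and a numerical Φ. A quick check (Phi is our own helper, built from the standard library's error function):

import math

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 2.0, 3.0
print(round(Phi((5 - mu) / sigma), 4))                          # (a) 0.8413
print(round(Phi((5 - mu) / sigma) - Phi((-3 - mu) / sigma), 4)) # (b) 0.7936
print(round(1 - Phi((5 - mu) / sigma), 4))                      # (c) 0.1587
print(round(2 * Phi(1) - 1, 4))                                 # (d) 0.6827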
Fact 4: If X ∼ N(µ, σ²), then E(X) = µ and Var(X) = σ².
Proof It suffices to prove that E(Z) = 0 and Var(Z) = 1 because, from (5.1),
X = σZ + µ,
and then we would have E(X) = E(σZ + µ) = σE(Z) + µ = µ and Var(X) = Var(σZ + µ) = σ²Var(Z) = σ². By symmetry, we must have E(Z) = 0. In fact, since the standard normal density φ satisfies φ′(z) = (−z/√(2π)) exp{−z²/2} = −z φ(z), we have E(Z) = ∫_{−∞}^{+∞} z φ(z) dz = [−φ(z)]_{−∞}^{+∞} = 0.
Fact 5: Suppose that X1 , X2 , . . . , Xn are independent normal random variables with mean
E(Xi ) = µi and variance Var(Xi ) = σi2 . Let Y be a linear combination of the Xi , that is,
Y = a1 X1 + a2 X2 + . . . + an Xn ,
where ai ( i = 1, · · · , n ) are some given constant coefficients. Then,
Y ∼ N(a1µ1 + a2µ2 + . . . + anµn, a1²σ1² + a2²σ2² + . . . + an²σn²).
Proof The proof that Y is normal is beyond the scope of this course. On the other hand,
to show that
E(Y ) = a1 µ1 + a2 µ2 + . . . + an µn ,
and
Var(Y) = a1²σ1² + a2²σ2² + . . . + an²σn²,
is very easy, using Properties 2 and 5 for the mean and the variance of sums of random
variables. 2
Example 5.2 Suppose that X1 and X2 are independent, X1 ∼ N (2, 4), X2 ∼ N (5, 3) and
Y = 0.5X1 + 2.5X2 .
Find the probability that Y is larger than 15.
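By Fact 5, Y here is normal with mean 0.5(2) + 2.5(5) = 13.5 and variance (0.5²)(4) + (2.5²)(3) = 19.75, so the probability is found by standardizing. A quick numerical check of that calculation (names are ours):

import math

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean_Y = 0.5 * 2 + 2.5 * 5            # 13.5
var_Y = 0.5 ** 2 * 4 + 2.5 ** 2 * 3   # 19.75
z = (15 - mean_Y) / math.sqrt(var_Y)
print(round(1 - Phi(z), 4))           # P(Y > 15), roughly 0.37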
An important particular case arises when X1 , . . . , Xn is a normal sample, that is, when
the variables X1 , . . . , Xn are independent, identically distributed, normal random variables,
with mean µ and variance σ 2 . One can think of the Xi0 s as a sequence of n independent mea-
surements of the normal random variable, X ∼ N (µ, σ 2 ). µ is usually called the population
mean and σ 2 is usually called the population variance.
If the coefficients, ai, are all equal to 1/n, then Y is equal to the sample average:
Y = (1/n)X1 + (1/n)X2 + · · · + (1/n)Xn = X.
By Fact 5, then, the normal sample average is also a normal random variable, with mean
a1µ + a2µ + · · · + anµ = n(1/n)µ = µ,
and variance
a1²σ² + a2²σ² + · · · + an²σ² = n(1/n²)σ² = σ²/n.
Example 5.3 Suppose that X1 , X2 , . . . , X16 are independent N (µ, 4) and X is their average.
(a) Calculate P (|X1 − µ| < 1) and P (|X − µ| < 1). (b) Calculate P (|X − µ| < 1) when the
sample size is 25 instead of 16. (c) Comment on the result of your calculations.
Solution
(c) The probability that the sample mean, X, is close to the population mean, µ, (0.954
when n = 16, and 0.9876 when n = 25) is much larger than the probability that any single
observation, Xi , is close to µ (0.383). The probability that the sample mean is close to the
population mean depends on the sample size, n, and gets larger as n gets larger. 2
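The three probabilities quoted in part (c) come from standardizing: a single observation has standard deviation 2, while the sample mean has standard deviation 2/√n. A short check (Phi as before):

import math

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

sigma = 2.0
for n in (1, 16, 25):
    sd = sigma / math.sqrt(n)      # SD of the sample mean (n = 1: a single Xi)
    prob = 2 * Phi(1 / sd) - 1     # P(|mean - mu| < 1)
    print(n, round(prob, 4))       # 0.3829, 0.9545, 0.9876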
P (X ≤ qi ) = F (qi ) = (i − 0.5)/n.
That is,
qi = F⁻¹[(i − 0.5)/n],
where F⁻¹ denotes the inverse of F. In the special case of the standard normal the theoretical quantiles will be denoted by di. They are given by the formula
di = Φ⁻¹[(i − 0.5)/n],
where, as usual, Φ denotes the standard normal distribution function. In the case of a normal random variable, X, with mean µ and variance σ², we have
F(qi) = Φ[(qi − µ)/σ] = (i − 0.5)/n,
and therefore,
qi = µ + σ Φ⁻¹[(i − 0.5)/n] = µ + σ di.
If this sample comes from a N (µ, σ 2 ) distribution then one would expect that
q̂i ≈ qi = µ + σdi ,
and therefore the plot of x(i) versus di will be close to a straight line, with slope σ and
intercept µ.
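In practice this comparison is made with a normal quantile (Q–Q) plot: compute di = Φ⁻¹[(i − 0.5)/n] and plot the ordered observations against them. A minimal sketch using the standard library (the sample is made up purely for illustration; plotting the resulting pairs is left to whatever graphics tool is available):

from statistics import NormalDist

def normal_scores(sample):
    # pair each ordered observation with the corresponding standard normal quantile
    x = sorted(sample)
    n = len(x)
    d = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(d, x))

sample = [2.1, 3.4, 1.8, 2.9, 3.1, 2.5, 2.2, 2.8]   # made-up data for illustration
for d_i, x_i in normal_scores(sample):
    print(round(d_i, 3), x_i)
# If the sample is roughly normal, these pairs fall near a straight line
# with slope sigma and intercept mu.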
[Figure: normal quantile–quantile plots for several samples, each plotting the ordered observations against the quantiles of the standard normal; the labelled panels include (e) a distribution with heavy tails and (f) a distribution with thin tails.]
5.3 Exercises
5.3.1 Exercise Set A
Problem 5.1 A machine operation produces steel shafts having diameters that are normally
distributed with a mean of 1.005 inches and a standard deviation of 0.01 inch. Specifications
call for diameters to fall within the interval 1.00±0.02 inches. What percentage of the output
of this operation will fail to meet specifications? What should be the mean diameter of the
shafts produced in order to minimize the fraction not meeting specifications?
Problem 5.2 Extruded plastic rods are automatically cut into nominal lengths of 6 inches.
Actual lengths are normally distributed about a mean of 6 inches and their standard deviation
is 0.06 inch.
(a) What proportion of the rods exceeds the tolerance limits of 5.9 inches to 6.1 inches?
(b) To what value does the standard deviation need to be reduced if 99% of the rods must
be within tolerance?
Problem 5.3 Suppose X1 and X2 are independent and identically distributed N (0, 4), and
define Y = max(X1 , X2 ). Find the density and the distribution functions of Y .
Problem 5.4 Assume that the height of UBC students is a normal random variable with
mean 5.65 feet and standard deviation 0.3 feet.
(a) Calculate the probability that a randomly selected student has height between 5.45 and
5.85 feet.
(b) What is the proportion of students above 6 feet?
Problem 5.5 The raw scores in a national aptitude test are normally distributed with mean
506 and standard deviation 81.
(a) What proportion of the candidates scored below 574?
(b) Find the 30th percentile of the scores.
Problem 5.6 Scores on a certain nationwide college entrance examination follow a normal
distribution with a mean of 500 and a standard deviation of 100.
(a) If a school admits only students who score over 670, what proportion of the student pool will be eligible for admission?
(b) What admission requirements would you set if only the top 15% are to be eligible?
Problem 5.7 A machine is designed to cut boards at a desired length of 8 feet. However,
the actual length of the boards is a normal random variable with standard deviation 0.2 feet.
The mean can be set by the machine operator. At what mean length should the machine be
set so that only 5 per cent of the boards are under cut (that is, under 8 feet)?
of µ?
(b) Consider the difference between two observations X1 and X2 (here we could assume that
X1 and X2 are i.i.d.), what is the probability that the absolute value of this difference is at
most 0.075◦ ?
Problem 5.9 Suppose the random variable X follows a normal distribution with mean µ =
50 and standard deviation σ = 5.
(a) Calculate the probability P (|X| > 60).
(b) Calculate EX 2 and the interquartile range of X.
Problem 5.11 Let X be a normal random variable with mean 10 and variance 25. Find:
Problem 5.12 A scholarship is offered to students who graduate in the top 5% of their
class. Rank in the class is based on GPA (4.00 being perfect). A professor tells you the
marks are distributed normally with mean 2.64 and variance 0.5831. What GPA must you
get to qualify for the scholarship?
Problem 5.13 The test scores of 40 students are normally distributed with a mean of 65 and a standard deviation of 10.
(a) Calculate the probability that a randomly selected student scored between 50 and 80;
(b) If two students are randomly selected, calculate the probability that the difference between
their scores is less than 10.
Problem 5.14 The length of trout in a lake is normally distributed with mean µ = 0.93
feet and standard deviation σ = 0.5 feet.
(a) What is the probability that a randomly chosen trout in the lake has a length of at least
0.5 feet;
(b) Suppose now that the σ is unknown. What is the value of σ if we know that 85% of the
trout in the lake are less than 1.5 feet long. Use the same mean 0.93.
Problem 5.15 The life of a certain type of electron tube is normally distributed with mean
95 hours and standard deviation 6 hours. Four tubes are used in an electronic system. Assume
that these tubes alone determine the operating life of the system and that, if any one fails,
the system is inoperative.
(a) What is the probability of a tube living at least 100 hours?
(b) What is the probability that the system will operate for more than 90 hours?
Problem 5.16 A product consists of an assembly of three components. The overall weight
of the product, Z, is equal to the sum of the weights X1 , X2 and X3 of its components.
Because of variability in production, they are independent random variables, each normally
distributed as N (2, 0.02), N (1, 0.010) and N (3, 0.03), respectively. What is the probability
that Z will meet the overall specification 6.00 ± 0.30 inches?
Problem 5.17 Due to variability in raw materials and production conditions, the weight
(in hundred of pounds) of a concrete beam is a normal random variable with mean 31 and
standard deviation 0.50.
(a) Calculate the probability that a randomly selected beam weighs between 3000 and 3200
pounds.
Problem 5.18 A machine fills 250-pound bags of dry concrete mix. The actual weight of
the mix that is put in the bag is a normal random variable with standard deviation σ = 0.40
pound. The mean can be set by the machine operator. At what mean weight should the
machine be set so that only 10 per cent of the bags are underweight? What about the larger
500-pound bags?
Problem 5.19 Check if the following samples are normal. Describe the type of departure
from normality when appropriate.
(a) 2.52 3.06 2.41 3.98 2.63 4.11 4.66 5.83 4.80 6.17 4.44 5.38 5.02 1.09 3.31 2.72 1.75 3.81
4.45 2.93
(b) 2.15 -3.46 1.12 0.25 -1.42 0.06 -1.16 -2.24 -1.50 0.37 0.66 -0.76 6.24 0.36 -0.40 0.52 -0.97
0.36 1.74 -0.65
(c) 1.79 -0.65 1.16 1.23 2.80 0.92 -2.62 -5.48 0.75 -2.64 -6.41 0.92 1.14 0.18 0.06 -1.49 -3.99
-10.36 7.12 -1.86
(d) -0.53 0.71 1.40 0.28 -0.65 1.02 -0.71 0.70 1.55 -0.52 -0.73 -1.04 -2.39 0.39 5.71 6.39 4.28
6.70 6.05 5.62
(e) -1.61 -1.29 0.59 -0.33 0.14 1.16 2.02 -0.52 0.69 -0.30 -0.56 0.43 -1.01 0.83 -0.95 0.24 0.01
0.10 0.12 0.07
Chapter 6
Some Probability Models
Many random experiments consist of a sequence of trials, each of which has only two possible outcomes. Some examples:
- Recording the number of times the maximum annual wind speed exceeds a certain level v0
(during a fixed number of years).
- Counting the number of years until v0 is exceeded for the first time.
- Testing (pass–no–pass) a number of randomly chosen items.
- Polling some randomly (and independently) chosen individuals regarding some yes–no ques-
tion, for instance, “did you vote in the last provincial election?”
Each trial is called a Bernoulli trial and a set of independent Bernoulli trials is called a Bernoulli process or Bernoulli experiment. The defining features of a Bernoulli experiment are that the trials are independent and that each trial has only two possible outcomes. These outcomes refer to the occurrence or not of a certain event, A. They are arbitrarily called success (when A occurs) and failure (when Ac occurs) and denoted by S (for success) and F (for failure). The probability of success, P(S) = p, is assumed to be the same for every trial,
and so
P (F ) = 1 − P (S) = 1 − p = q.
The number of trials in a Bernoulli experiment can either be fixed or random. For example,
if we are considering the number of maximum annual wind speed exceedances of v0 in the
next fifteen years, the number of trials is fixed and equal to 15. On the other hand, if we are
considering the number of years until v0 is first exceeded, the number of trials is random.
For i = 1, . . . , n, let Yi = 1 if the ith trial results in S and Yi = 0 if it results in F. That is, Yi is a “counter” for the number of S’s in the ith trial of the random experiment. The variables Yi are very simple. By definition, they are independent and their common density function is
f(y) = p^y (1 − p)^(1−y) ,   y = 0, 1.
The mean and the variance of Yi (they are, of course, the same for all i = 1, . . . , n) are given by
E(Yi) = (0)f(0) + (1)f(1) = p
and
Var(Yi) = E(Yi^2) − [E(Yi)]^2 = p − p^2 = p(1 − p) = pq.
The student can check that the variance is maximized when p = q = 0.5. This result is hardly
surprising as the uncertainty is clearly maximized when S and F are equally likely. On the
other hand, the uncertainty is clearly smaller for smaller or larger values of p. For example,
if p = 0.01 we can feel very confident that most of the trials will result in failures. Similarly,
if p = 0.99 we can confidently predict that most of the trials will result in successes.
Given a Bernoulli experiment of fixed size n, the corresponding Binomial random variable
X is defined as the total number of S’s in the sequence of F’s and S’s that constitutes the
outcome of the experiment. That is,
X = Y1 + Y2 + · · · + Yn = Σ_{i=1}^{n} Yi .
Using properties (2) and (4) of the mean and variance of random variables,
E(X) = E(Σ_{i=1}^{n} Yi) = Σ_{i=1}^{n} E(Yi) = np
and
Var(X) = Var(Σ_{i=1}^{n} Yi) = Σ_{i=1}^{n} Var(Yi) = npq,   where q = 1 − p.
The density of X is
f(x) = C(n, x) p^x q^(n−x) ,   x = 0, 1, . . . , n,   (6.1)
where the binomial coefficient C(n, x) is given by
C(n, x) = n!/[x!(n − x)!] = [n(n − 1) · · · (2)(1)] / ([x(x − 1) · · · (2)(1)][(n − x)(n − x − 1) · · · (2)(1)]).
For example, if n = 5 and x = 3 we have
C(5, 3) = 5!/(3! 2!) = [(5)(4)(3)(2)(1)] / ([(3)(2)(1)][(2)(1)]) = 10.
To derive the density (6.1) first notice that X takes the value x only if x of the Yi are equal
to one and the remainder are equal to zero. The probability of this event is px q n−x . In
addition, the n variables Yi can be divided into two groups of x and n − x variables in C(n, x) many different ways.
The distribution function of X doesn’t have a simple closed form and can be obtained
from Table A5 for a limited set of values of n and p.
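The binomial density and distribution function can also be evaluated directly on a computer instead of Table A5. A minimal sketch in Python (standard library only; the helper names binom_pmf and binom_cdf are ours, not part of the text):

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial density (6.1): C(n, x) p^x q^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """Distribution function P(X <= x), obtained by summing the density."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 5, 0.5
print(comb(5, 3))                                          # 10, as in the example above
print(sum(binom_pmf(k, n, p) for k in range(n + 1)))       # 1.0: the densities add up to one
print(sum(k * binom_pmf(k, n, p) for k in range(n + 1)))   # np = 2.5
```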
Example 6.1 Suppose that the logarithm of the operational life of a machine, T (in hours),
has a normal distribution with mean 15 and standard deviation 7. If a plant has 20 of these
machines working independently, (a) what is the probability that more than one machine
will break down before 1500 hours of operation? (b) how many more machines are needed if
the expected number of machines that will not break down before 1500 hours of operation
must be larger than 18?
Solution The number of machines breaking down before 1500 hours of operation, X, is a
binomial random variable with n = 20 and
" #
log(1500) − 15
p = P (T < 1500) = P (log(T ) < log(1500)) = Φ
7
Since
P [X ≤ 1] = P (X = 0) + P (X = 1) = (20 20 20
0 )(0.86) + (1 )(0.14)(0.86)
19
In fact, using the well known formula for the sum of a geometric series with rate 0 < q < 1,
[1 + q + q 2 + · · ·] = 1/(1 − q),
we obtain
Σ_{x=1}^{∞} f(x) = Σ_{x=1}^{∞} p q^(x−1) = p[1 + q + q^2 + · · ·] = p/(1 − q) = 1.
The return period of A is then inversely proportional to p = P (A). If p = P (A) is small then
we must wait, on average, a large number of periods τ until the first occurrence of A. On
the other hand, if p is large then we must wait, on average, a small number τ of periods for
the first occurrence of A.
The student will be asked to show (see Problem 6.6) that the variance of X is given by Var(X) = q/p^2 = (1 − p)/p^2.
One may well ask the question: why is τ called “return period”? The reason for this becomes
clear after we notice that, because of the assumed independence, the expected number of trials
before the first occurrence of A is the same as the expected number of trials between any two
consecutive occurrences of A.
Example 6.2 Suppose that a structure has been designed for a “25–year rain” (that is, a
rain that occurs on average every 25 years).
(a) What is the probability that the design annual rainfall will be exceeded for the first time
on the sixth year after completion of the structure?
(b) If the annual rainfall Y (in inches) is normal with mean 55 and variance 16, what is the
corresponding design rainfall?
Solution
(a) To say that a certain structure has been designed for a “25–year rain” means that it has
been designed for an annual rainfall with return period of 25 years.
The return period, τ , is equal to 25, and therefore the probability of exceeding the design
annual rainfall is
p = 1/τ = 1/25 = 0.04.
If X represents the number of years until the first time the design annual rainfall is exceeded,
then
P(X = 6) = (0.04)(0.96)^(6−1) = (0.04)(0.96)^5 = 0.033
is the required probability.
(b) The design rainfall, v0 , must satisfy the equation
P (Y > v0 ) = 0.04
or equivalently,
Φ[(v0 − 55)/4] = 0.96.
From the Standard Normal Table we find that Φ(1.75) = 0.96. Therefore,
(v0 − 55)/4 = 1.75  ⇒  v0 = (4)(1.75) + 55 = 62. □
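A short numerical check of Example 6.2, assuming Python with its standard library is available (NormalDist plays the role of the Standard Normal Table):

```python
from statistics import NormalDist

tau = 25                 # return period in years
p = 1 / tau              # yearly exceedance probability, 0.04
q = 1 - p

# (a) design rainfall first exceeded on the 6th year: geometric density p q^(x-1)
print(p * q**(6 - 1))    # about 0.033

# (b) design rainfall v0 with P(Y > v0) = 0.04 for Y ~ N(55, 16)
Y = NormalDist(mu=55, sigma=4)
print(Y.inv_cdf(1 - p))  # about 62 inches
```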
and
The process is called a Poisson Process if A is a “rare” event, that is, if it has the following properties:
The derivation of these densities from assumptions 1), 2) and 3) is not very difficult. The interested student can read the heuristic derivation given at the end of this chapter.
Example 6.3 In Southern California there is on average one earthquake per year with
Richter magnitude 6.1 or greater (big earthquakes).
(a) What is the probability of having three or more big earthquakes in the next five years?
(b) What is the most likely number of big earthquakes in the next 15 months?
(c) What is the probability of having a period of 15 months without a big earthquake?
(d) What is the probability of having to wait more than three and a half years until the
occurrence of the next four big earthquakes?
Solution We assume that the sequence of big earthquakes follows a Poisson process with
(average) rate λ = 1 per year.
(a) The number X of big earthquakes in the next five years is a Poisson random variable with average rate 5 and so, using the Poisson Table, we get P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − 0.125 = 0.875.
(b) In general, a Poisson density f(x) with parameter δ is increasing at x (x ≥ 1) if and only if the ratio f(x)/f(x − 1) = δ/x > 1, that is, if and only if x < δ (and f(x) = f(x − 1) when x = δ). So the most likely number of big earthquakes is [1.25] = 1 (notice that 15 months = 1.25 years).
(c) The waiting time T to the next big earthquake is an exponential random variable with rate λ = 1 per year, with distribution function F(t) = 1 − exp{−λt}, t ≥ 0. Therefore,
P{T > 1.25} = 1 − F(1.25) = 1 − [1 − exp{−1.25}] = 0.287.
(d) Let Y represent the number of big earthquakes in the next three and a half years and let W represent the waiting time (in years) until the occurrence of the next four big earthquakes. We notice that Y is a Poisson random variable with rate 3.5 and that W is larger than 3.5 years if and only if Y is less than 4. So, using the Poisson Table,
P(W > 3.5) = P(Y ≤ 3) = 0.537.
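The probabilities in Example 6.3 can be reproduced without the Poisson Table. A sketch in Python (standard library; pois_pmf and pois_cdf are our own helper names):

```python
from math import exp, factorial

def pois_pmf(x, rate):
    """Poisson density exp{-rate} rate^x / x!."""
    return exp(-rate) * rate**x / factorial(x)

def pois_cdf(x, rate):
    return sum(pois_pmf(k, rate) for k in range(x + 1))

lam = 1.0                                    # one big earthquake per year on average

# (a) three or more big earthquakes in five years: X ~ Poisson(5)
print(1 - pois_cdf(2, 5 * lam))              # about 0.875

# (b) most likely count in 15 months (rate 1.25): the density peaks at x = 1
print([round(pois_pmf(x, 1.25 * lam), 3) for x in range(4)])

# (c) no big earthquake in 15 months: P(T > 1.25) = exp{-1.25}
print(exp(-1.25 * lam))                      # about 0.287

# (d) waiting time to the 4th quake exceeds 3.5 years iff at most 3 quakes in 3.5 years
print(pois_cdf(3, 3.5 * lam))                # about 0.537
```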
Means and Variances The means of X and T are of practical interest, as they represent
the expected number of occurrences on a period of length ∆ and the expected waiting time
between consecutive occurrences, respectively. We will see that, not surprisingly,
E(X) = λ∆   and   E(T) = 1/λ,
Var(X) = λ∆   and   Var(T) = 1/λ^2.
Indeed,
E(X) = Σ_{x=0}^{∞} x f(x) = Σ_{x=1}^{∞} x exp{−λ∆}(λ∆)^x / x!
     = exp{−λ∆}(λ∆) Σ_{x=1}^{∞} (λ∆)^(x−1)/(x − 1)! = exp{−λ∆}(λ∆) exp{λ∆} = λ∆.
Similarly,
E[X(X − 1)] = Σ_{x=0}^{∞} x(x − 1) f(x) = Σ_{x=2}^{∞} x(x − 1) exp{−λ∆}(λ∆)^x / x!
            = exp{−λ∆}(λ∆)^2 Σ_{x=2}^{∞} (λ∆)^(x−2)/(x − 2)! = (λ∆)^2.
Therefore,
Var(X) = E[X(X − 1)] + E(X) − [E(X)]^2 = (λ∆)^2 + λ∆ − (λ∆)^2 = λ∆.
For the waiting time T, integration by parts gives
E(T) = ∫_0^∞ t f(t) dt = ∫_0^∞ t λ exp{−λt} dt = ∫_0^∞ exp{−λt} dt = 1/λ.
Analogously, integrating by parts with u = t^2 and dv = exp{−λt} dt, we get
E(T^2) = ∫_0^∞ t^2 f(t) dt = ∫_0^∞ t^2 λ exp{−λt} dt = 2 ∫_0^∞ t exp{−λt} dt
       = (2/λ) ∫_0^∞ t λ exp{−λt} dt = (2/λ)(1/λ) = 2/λ^2.
Finally,
Var(T) = E(T^2) − [E(T)]^2 = 2/λ^2 − 1/λ^2 = 1/λ^2.
Solution
(e) Since X = “number of big earthquakes in the next five years” is Poisson(5), we have that E(X) = 5 and SD(X) = √5 = 2.24. In the case of fifteen months (1.25 years) the mean is 1.25 and the standard deviation is √1.25 = 1.12.
(f) Since T = “waiting time (in years) between two consecutive big earthquakes” is an exponential random variable with rate λ = 1, its expected value (E(T) = 1/λ) and standard deviation (SD(T) = 1/λ) are both equal to one.
(g) Let
W = “waiting time (in years) until the 25th big earthquake”
and let
Ti = “waiting time (in years) between the (i − 1)th and the ith big earthquakes”, i = 1, 2, . . . , 25.
Notice that
W = T1 + T2 + · · · + T25 = Σ_{i=1}^{25} Ti ,
where, because of the Poisson process assumptions,
T1 , T2 , . . . , T25 are iid Exp(1),
where Exp(λ) means “exponential distribution with parameter (rate) λ”. Therefore,
E(W) = E(Σ_{i=1}^{25} Ti) = Σ_{i=1}^{25} E(Ti) = 25,
Var(W) = Var(Σ_{i=1}^{25} Ti) = Σ_{i=1}^{25} Var(Ti) = 25,
and
SD(W) = √25 = 5.
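The values E(W) = 25 and SD(W) = 5 can also be checked by simulation. A small sketch (Python standard library; the 20,000 replications are an arbitrary choice):

```python
import random
from statistics import mean, stdev

random.seed(1)

def waiting_time_to_25th(rate=1.0):
    """Sum of 25 independent Exp(rate) inter-arrival times."""
    return sum(random.expovariate(rate) for _ in range(25))

sims = [waiting_time_to_25th() for _ in range(20000)]
print(mean(sims), stdev(sims))   # both close to E(W) = 25 and SD(W) = 5
```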
Example 17: On average, one per cent of the 50-kg dry concrete bags are underfilled below
49.5 kg. What is the probability of finding 4 or more of these underfilled bags in a lot of 200?
Let m be some fixed integer. If Yi is the number of occurrences of the event A in the interval ((i − 1)/m, i/m], then the total number of occurrences, X, in the interval (0, 1] (we are taking ∆ = 1 for simplicity) can be written as
X = Y1 + Y2 + . . . + Ym .
For large m, each Yi is approximately a Bernoulli variable with success probability λ/m, so X is approximately B(m, λ/m). Since
[1 − λ/m]^(−x) → 1   and   [1 − λ/m]^m → exp{−λ}
as m → ∞, we obtain that
C(m, x)[λ/m]^x [1 − λ/m]^(m−x) = (m/m)((m − 1)/m)((m − 2)/m) · · · ((m − x + 1)/m) [1 − λ/m]^(−x) [1 − λ/m]^m λ^x/x!  →  exp{−λ} λ^x/x!.
In particular, this justifies the P(np) approximation to the B(n, p), when n is large and p
is small. The requirement that n is large corresponds to m being large and the requirement
that p is small corresponds to λ/m being small.
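The quality of the P(np) approximation is easy to check numerically. The sketch below (Python standard library) uses the underfilled-bag example stated earlier in this section, with n = 200 and p = 0.01, so that λ = np = 2:

```python
from math import comb, exp, factorial

n, p = 200, 0.01
lam = n * p                     # 2 underfilled bags expected in a lot of 200

def binom_pmf(x):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def pois_pmf(x):
    return exp(-lam) * lam**x / factorial(x)

# P(4 or more underfilled bags): exact binomial versus Poisson approximation
exact = 1 - sum(binom_pmf(x) for x in range(4))
approx = 1 - sum(pois_pmf(x) for x in range(4))
print(exact, approx)            # both close to 0.14
```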
To derive the Exponential density of T , we reason as follows: The waiting time T until the first occurrence of A will be larger than t if and only if the number of occurrences X in the period (0, t) is equal to zero. Since X ∼ P(λt),
P(T > t) = P(X = 0) = exp{−λt},
so that F(t) = 1 − exp{−λt} and, differentiating, f(t) = λ exp{−λt} for t ≥ 0.
6.7 Exercises
6.7.1 Exercise Set A
Problem 6.1 A weighted coin is flipped 200 times. Assume that the probability of a head
is 0.3 and the probability of a tail is 0.7. Each flip is independent from the other flips. Let
X be the total number of heads in the 200 flips.
(a) What is the distribution of X?
(b) What is the expected value of X and variance of X?
(c) What is the probability that X equals 35?
(d) What is the approximate probability that X is less than 45?
Note: Come back to this question after you have learned about normal approximations in the next chapter.
Problem 6.2 Suppose it is known that a treatment is successful in curing a muscular pain
in 50% of the cases. If it is tried on 15 patients, find the probabilities that:
(a) At most 6 will be cured.
(b) The number cured will be no fewer than 6 and no more than 10.
(c) Twelve or more will be cured.
(d) Calculate the mean and the standard deviation.
Problem 6.3 The office of a particular U.S. Senator has on average five incoming calls per
minute. Use the Poisson distribution to find the probabilities that there will be:
(a) exactly two incoming calls during any given minute;
(b) three or more incoming calls during any given minute;
(c) no incoming calls during any given minute.
(d) What is the expected number of calls during any given period of five minutes?
Problem 6.4 A die is colored blue on 5 of its sides and green on the remaining side. This die is rolled 8 times. Assume each roll of the die is independent of the other rolls. Let X be the number of times blue comes up in the 8 rolls of the die.
(a) What is the expected value of X and the variance of X?
(b) What is the probability that X equals 6?
(c) What is the probability that X is greater than 6?
Problem 6.5 A factory produced 10,000 light bulbs in February, of which 500 are defective. Suppose 20 bulbs are randomly inspected. Let X denote the number of defectives
in the sample.
(a) Calculate P (X = 2).
(b) If the sample size, i.e., the number of the inspected bulbs, is large, how would you
calculate P (X ≥ 2) approximately? For n = 200, calculate this probability approximately.
Problem 6.7 The Statistical Tutorial Center has been designed to handle a maximum of
25 students per day. Suppose that the number X of students visiting this center each day is
a normal random variable with mean 15 and variance 16.
(a) What is the return period τ for this center?
(b) What is the probability that the “design” number of visits will not be exceeded before
the 10th day?
Problem 6.8 A transmission tower has been designed for a 30–year wind.
(a) What is the probability that the design maximum annual wind velocity will be exceeded
for the first time on the 7th year after completion of the project?
(b) What is the probability that the design maximum annual wind velocity will be exceeded
during the first 7 years after completion of the project?
(c) If the maximum annual wind velocity (in miles per hour) is an exponential random variable
with mean 35, what is the design maximum annual wind velocity?
(d) What is the return period if the design maximum annual wind velocity is decreased by
15%?
Problem 6.9 (a) Let X1 and X2 be two Binomial random variables with n = 14 and p =
0.30. Calculate
(i) P (X1 = 4), P (X1 < 6) and P (2 < X1 < 6) (use the Binomial table)
(ii) E(X1 ), SD(X1 ), E(X1 + X2 ), SD(X1 + X2 ), E(X1 X2 ) and SD(X1 X2 )
(iii) P (X1 + X2 = 8), P (X1 + X2 < 12) and P (4 < X1 + X2 < 12).
Problem 6.10 The arrival of customers to a service station is well approximated by a Pois-
son Process with rate λ = 5 per hour.
(a) What is the expected number of customers per day? (the service station is open eight
hours per day)
(b) What is the most likely number of customers in any given hour?
(c) What is the probability that more than seven customers will arrive in the next hour?
(d) What is the probability that the waiting time between two consecutive arrivals will be
25 minutes or more?
(e) What is the expected time until the arrival of the next 25 customers? The standard
deviation?
Problem 6.11 A bag contains 4 red balls and 6 white balls. One ball was drawn with equal
probability and replaced in the bag before the next draw was made. Let X be the number of red
balls out of 100 draws from the bag.
(a) Give a general expression for P (X = k), k = 0, 1, ..., 100;
(b) Calculate the mean and variance of X;
(c) Calculate the probability P (X ≤ 38).
Problem 6.12 The number of killer whales arriving at the Pacific Rim Observatory Station
follows a Poisson Process with rate λ = 4 per hour.
(a) What are the expected number of arrivals and the variance of the number of arrivals during the next hour?
(b) What is the probability that the waiting time T between two consecutive arrivals will be
30 minutes or more?
(c) What are the expected value and the variance of the waiting time until the next 20 killer whales arrive at the Observatory Station?
Problem 6.13 Car accidents are random and can be said to follow a Poisson distribution.
At a certain intersection in East Vancouver there are, on average, 4 accidents a week. Answer
the following questions:
(a) What is the probability of there being no accidents at this intersection next week?
(b) The record for accidents in one month at a single intersection is 20. Find the probability
that this record will be broken, at this intersection, next month. (Assume 30 days in one
month)
(c) What is the expected waiting time for 20 accidents to occur?
Problem 6.14 A test consists of ten multiple-choice questions with five possible answers.
For each question, there is only one correct answer out of five possible answers. If a student
randomly chooses one answer for each question, calculate the probability that
(a) at most three questions are answered correctly;
(b) five questions are answered correctly;
(c) all questions are answered correctly.
(d) Calculate the mean and the standard deviation of the number of correct answers.
Problem 6.15 The number of meteorites hitting Mars follows a Poisson process with pa-
rameter λ = 6 per month.
(a) What is the probability that at least 2 meteorites hit Mars in any given month?
(b) Find the probability that exactly 10 meteorites hit Mars in the next 6 months.
(c) What is the expected number of meteorites hitting Mars in the next year?
Problem 6.16 A biased coin is flipped 10 times independently. The probability of tails is
0.4. Let X be the total number of heads in the 10 flips.
(a) Use a computer to find P (X = 4);
(b) Use the Binomial table to find P (1 < X < 5);
(c) What is the probability that one has to flip at least 5 times to get the first head?
Problem 6.17 Three identical fair coins are tossed simultaneously until all three show the
same face.
(a) What is the probability that they are tossed more than three times?
(b) Find the mean for the number of tosses.
Chapter 7
Normal Probability Approximations
Example 7.1 A system consists of 25 independent parts connected in such a way that the ith
part automatically turns–on when the (i − 1)th part burns out. The expected lifetime of each
part is 10 weeks and the standard deviation is equal to 4 weeks. (a) Calculate the expected
lifetime and standard deviation for the system. (b) Calculate the probability that the
system will last more than its expected life. (c) Calculate the probability that the system
will last more than 1.1 times its expected life. (d) What are the (approximate) median life
and interquartile range for the system?
Solution
(a) Let Xi denote the lifetime of the ith component and let
T = X1 + X2 + · · · + X25
be the lifetime of the system. Then E(T) = 25 × 10 = 250 weeks and Var(T) = 25 × 16 = 400. Therefore,
SD(T) = √400 = 20 weeks.
Notice that the mean of T is 25 times larger than that of each Xi while the standard deviation of T is only √25 = 5 times larger.
(d) Let Z denote the standard normal random variable. Using that T ≈ N(250, 400), it follows that
Q1(T) = 250 + (20 × Q1(Z)) = 250 − (20 × 0.675) = 236.5.
Analogously,
Q2(T) = Median(T) = 250 + (20 × Q2(Z)) = 250,
and
Q3(T) = 250 + (20 × Q3(Z)) = 250 + (20 × 0.675) = 263.5.
Therefore,
IQR(T) ≈ Q3(T) − Q1(T) = 263.5 − 236.5 = 27.0
□
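A numerical companion to Example 7.1, assuming Python's standard library; it uses the N(250, 400) approximation for T to evaluate parts (b)–(d):

```python
from statistics import NormalDist

# T is approximately N(250, 400): mean 25 x 10 = 250, variance 25 x 16 = 400
T = NormalDist(mu=250, sigma=20)

print(1 - T.cdf(250))            # (b) P(T > 250) = 0.5
print(1 - T.cdf(1.1 * 250))      # (c) P(T > 275) = 1 - Phi(1.25), about 0.11
q1, q3 = T.inv_cdf(0.25), T.inv_cdf(0.75)
print(T.inv_cdf(0.5), q3 - q1)   # (d) median 250 and IQR about 27
```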
Table 7.1
Example 7.2 Consider Table 7.1 with data on the annual (cumulative) rainfall intensity (X)
on a certain watershed area. The average annual rainfall intensity can be calculated from the grouped data by using the class midpoints.
Solution
To make the required probability calculations we will assume that the rainfall intensities
are uniformly distributed on each interval. This is a reasonable assumption given that we do
not have any additional information on the distribution of values on each class.
Let ri represent the actual annual rainfall intensity (i = 1, 2, . . . , 145) and let mi be the
midpoint of the corresponding class. For instance, if r5 = 50.35 (a value in the class 50–54),
then m5 = 52.0. Let
Ui = ri − mi , i = 1, 2, . . . , 145.
Given our “uniformity” assumption, the Ui's are uniform random variables on the interval
(−2, 2).
To proceed with our calculation, we will assume that the variables Ui's (which represent
the approximation errors) are independent.
Let
r̄ = (r1 + r2 + . . . + r145)/145.
The approximation error, D, in the calculation of X̄ can now be written as
D = r̄ − X̄ = (r1 + r2 + · · · + r145)/145 − (m1 + m2 + . . . + m145)/145 = (U1 + U2 + . . . + U145)/145.
Since D is the average of 145 independent, identically distributed random variables with zero mean and variance equal to
σ^2 = (1/4) ∫_{−2}^{2} t^2 dt = 4/3,
we can use the (CLT) normal approximation. That is, we can use a normal distribution with zero mean and variance equal to (4/3)/145 to approximate the distribution of D. The corresponding standard deviation is √(4/435) = 0.095893.
(a)
(b)
P (|D| > 0.1) = P (|D|/0.095893 > 0.1/0.095893) ≈ 2[1 − Φ(1.04)] = 0.2984.
(c)
P (|D| > 0.5) = P (|D|/0.095893 > 0.5/0.095893) ≈ 2[1 − Φ(5.21)] = 0.
□
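The CLT approximation used in Example 7.2 can be checked by simulating the 145 rounding errors directly. A sketch (Python standard library; the number of replications is an arbitrary choice):

```python
import random
from statistics import NormalDist

random.seed(2)
n = 145
sd_D = ((4 / 3) / n) ** 0.5                  # CLT standard deviation, about 0.0959

def one_D():
    """Average of 145 independent Uniform(-2, 2) approximation errors."""
    return sum(random.uniform(-2, 2) for _ in range(n)) / n

sims = [one_D() for _ in range(20000)]
print(sum(abs(d) > 0.1 for d in sims) / len(sims))   # simulated P(|D| > 0.1)
print(2 * (1 - NormalDist().cdf(0.1 / sd_D)))        # CLT value, about 0.30
```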
Solution
(h) Since W is a sum of iid random variables, we can use the Central Limit Theorem to
approximate P (W > 27). Since E(W ) = 25 and SD(W ) = 5 we have
P(W > 27) = 1 − P(W ≤ 27) = 1 − Φ[(27 − 25)/5] = 1 − Φ[0.40] = 1 − 0.6554 = 0.3446.
□
Using the Binomial Table in the Appendix we have that the exact probability is equal to
Table 7.2
provided that α ≥ 20. The continuity correction 0.5, added to and subtracted from x, is needed because we are approximating a discrete random variable with a continuous random variable.
This approximation is justified by the following argument: consider a Poisson process
with rate λ = 1, and suppose that X represents the number of occurrences in a period of
length α. We can divide α into n subintervals of length α/n and denote by Yi the number of
occurrences in the ith subinterval. It is clear that Y1 , . . . , Yn are independent Poisson random
variables with mean α/n and that
X = Y1 + Y2 + · · · + Yn = n Ȳ .
Intuitively, the requirement that α is large is necessary because one needs to represent X as the sum of a large number, n, of independent random variables, Yi, and the common distribution of these random variables becomes very asymmetric when α/n is very small.
As an example, let X ' P(25) and calculate (a) P (X = 27), (b) P (X > 27) and (c)
P (24 ≤ X < 27). In the case of (a),
" # " #
27 + .5 − 25 27 − .5 − 25
P (X = 27) ≈ Φ √ −Φ √ = Φ(0.5) − Φ(0.3) = 0.6915 − 0.6179 = 0.0736.
25 25
The exact probability is exp −25 × 2527 /(27!) = 0.07080. In the case of (b),
P(X > 27) ≈ 1 − Φ[(27 + 0.5 − 25)/5] = 1 − Φ(0.5) = 0.3085. The exact probability in this case is 0.2998. Finally, in the case of (c),
P (24 ≤ X < 27) ≈ Φ((26.5 − 25)/5) − Φ((23.5 − 25)/5) = Φ(0.3) − Φ(−0.3) = 2Φ(0.3) − 1 = 0.2358.
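The continuity-corrected calculations above are easy to verify against the exact Poisson probabilities. A sketch (Python standard library):

```python
from math import exp, factorial
from statistics import NormalDist

alpha = 25
Z = NormalDist()
sd = alpha ** 0.5

def pois_pmf(x):
    return exp(-alpha) * alpha**x / factorial(x)

# (a) P(X = 27): continuity-corrected normal approximation versus exact value
print(Z.cdf((27.5 - alpha) / sd) - Z.cdf((26.5 - alpha) / sd), pois_pmf(27))

# (b) P(X > 27)
print(1 - Z.cdf((27.5 - alpha) / sd), 1 - sum(pois_pmf(x) for x in range(28)))

# (c) P(24 <= X < 27)
print(Z.cdf((26.5 - alpha) / sd) - Z.cdf((23.5 - alpha) / sd),
      sum(pois_pmf(x) for x in range(24, 27)))
```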
7.4 Exercises
7.4.1 Exercise Set A
Problem 7.1 Two types of wood (Elm and Pine) are tested for breaking strength. Elm
wood has an expected breaking strength of 56 and a standard deviation of 4. Pine wood has
an expected breaking strength of 72 and a standard deviation of 8. Let X̄ be the sample
average breaking strength of an Elm sample of size 30, and Ȳ be the sample average breaking
strength of a Pine sample of size 40.
(a) What is the approximate distribution of X̄?
(b) What is the approximate distribution of Ȳ ?
(c) Calculate (approximately) P (X̄ + Ȳ < 110).
Problem 7.2 Consider a population with mean 82 and standard deviation 12.
(a) If a random sample of size 64 is selected, what is the probability that the sample mean
will lie between 80.8 and 83.2?
(b) With a random sample of size 100, what is the probability that the sample mean will lie
between 80.8 and 83.2?
(c) What assumption(s) have you used in (a) and (b)?
Problem 7.3 Suppose that the population distribution of the gripping strengths of indus-
trial workers is known to have a mean of 110 and a standard deviation of 10. For a random
sample of 75 workers, what is the probability that the sample mean gripping strength will
be:
(a) between 109 and 112?
(b) greater than 111?
(c) What assumption(s) have you made?
Problem 7.4 The expected amount of sulfur in the daily emission from a power plant is 134
pounds with a standard deviation of 22 pounds. For a random sample of 40 days, find the
approximate probability that the total amount of sulfur emissions will exceed 5, 600 pounds.
Problem 7.5 Suppose we draw two independent samples of equal size n from a population with unknown mean µ and known standard deviation 3.5. Let X̄ and Ȳ be the corresponding
sample averages. How large would the sample size n be required to be to ensure that P (−1 ≤
X̄ − Ȳ ≤ 1) = 0.90?
Problem 7.6 Suppose X1 , . . . , X30 are independent and identically distributed random vari-
ables with mean EX1 = 10 and variance Var(X1 ) = 5.
(a) Calculate the mean of X̄ = (1/30) Σ_{i=1}^{30} Xi and the standard deviation of X1 − X2.
(b) Calculate the interquartile range of X̄ approximately.
Problem 7.8 (a) Generate m = 100 samples of size n = 10, of independent random variables
with uniform distribution on the interval (0, 1). Let Xij denote the j th element of the ith
sample (i = 1, 2, . . . , m and j = 1, 2, . . . , n).
Construct the histogram and Q–Q plot for the sample means
X̄i = (1/n) Σ_{j=1}^{n} Xij .
(b) Same as (a) but with n = 20 and n = 40. What are your conclusions?
(c) Repeat (a) and (b) but with the Xij having density
f(x) = (1/18)(x − 4) for 4 < x < 10, and f(x) = 0 otherwise.
Problem 7.9 Solve part (a) of Problem 7.8 but with p = 0.7, instead of 0.3.
Problem 7.10 Referring to Problem 6.10, find the probability that more than 800 customers will arrive during the next 20 business days.
Problem 7.11 The expected tensile strength of two types of steel (types A and B, say) are
106 ksi and 104 ksi. The respective standard deviations are 8 ksi and 6 ksi. Let X and Y
be the sample average tensile strengths of two samples of 40 specimens of type A and 35
specimens of type B, respectively.
(d) Suppose that after completing all the sample measurements you find x − y = 6. What
do you think now of the “population” assumptions made at the beginning of this problem?
Why?
Problem 7.12 (a) There are 75 defectives in a lot of 1500. Twenty five items are randomly
inspected (the inspection is non-destructive and the items are returned to the lot immediately
after inspection). If two or more items are defective the lot is returned to the supplier (at
the supplier’s expense). Otherwise, the lot is accepted. What is the probability that the lot
will be rejected?
(b) Suppose that the actual number of defectives is unknown and that five out of twenty
five independently inspected items turned out to be defectives. Estimate the total number
of defectives in the lot (of 1500 items). What is the expected value and standard deviation
of your estimate? What is the (approximated) probability that your estimate is within a
distance of 10 from the actual total number of defectives?
Problem 7.14 Bits are independently received in a digital communication channel. The
probability that a received bit is in error is 0.00001.
(a) If 16 million bits are transmitted, calculate the (approximate) probability that more than
150 errors occur.
(b) If 160,000 bits are transmitted, calculate the (approximate) probability that more than
1 error occurs.
Chapter 8
Statistical Modeling and Inference
8.1 Introduction
One is often interested in random quantities (variables Y , T , N , etc.) such as the strength
Y of a concrete block, the time T of a chemical reaction, the number N of visits to a
website, etc. Engineers and applied scientists use statistical models to represent these
random quantities. Statistical models are a set of mathematical equations involving random
variables and other unknown quantities called parameters.
For example, the compressive strength of a concrete block can be modeled as
Y = µ + σε (8.1)
where µ is a parameter that represents the “true” average compressive strength of the concrete
block, ε is a random variable with zero mean and unit variance that accounts for the “block-
to-block” variability and σ is a parameter that determines the average size of the “block-to-
block” variability. Notice that according to this model the compressive strength of a concrete
block is a random variable that results from the sum of two components: a systematic
component or signal (µ ) and a random component or noise (σε).
Independent measurements are often taken to “adjust the model”, that is, to estimate
the unknown parameters that appear in the model equations. For example, the compressive
strength of several concrete blocks can be measured to get information about µ and σ.
Before the measurements are actually performed they can be thought of as independent
replicates of the random quantity of interest. For example, the future measurements of the
compressive strengths can be represented as
Yi = µ + σεi , i = 1, . . . , n, (8.2)
population under study. In practice some units are randomly chosen and the measurements are performed only on them. The set of selected units is called the sample. The corresponding set of measurements is also called a sample.
Given a statistical model and a set of measurements (a sample) one can carry out statistical procedures, called statistical inference, which are aimed at extrapolating from the sample to the population. The most typical statistical procedures are:
• Point estimation of the model parameters.
• Confidence intervals for the model parameters.
• Testing of hypotheses about the model parameters.
These procedures will be described and further discussed in the context of the simple situa-
tions considered below.
for example the expected squared estimation error or the expected absolute estimation error.
Estimation of µ: A good point estimate for µ, the main parameter of model (8.3), can be
obtained by the method of least squares which consists of minimizing (in m) the sum of
squares
S(m) = Σ_{j=1}^{n} (Yj − m)^2 .
Differentiating with respect to m and setting the derivative equal to zero gives the equation
S′(m) = −2 Σ_{j=1}^{n} (Yj − m) = 0,   or   m = (1/n) Σ_{j=1}^{n} Yj = Ȳ = µ̂.
Estimation Error: Being functions of the random variables, the point estimate Y and
the estimation error Y − µ are also random variables. Obviously, we would like that the
estimation error is small. To have some idea of the behavior of the estimation error we can
calculate its expected value (mean) and its variance:
E[Ȳ − µ] = E(Ȳ) − µ = (1/n) Σ_{j=1}^{n} E(Yj) − µ = (1/n) Σ_{j=1}^{n} µ − µ = µ − µ = 0   (Ȳ is unbiased),
and
Var(Ȳ − µ) = Var(Ȳ) = (1/n^2) Σ_{j=1}^{n} Var(Yj) = (1/n^2) n σ^2 = σ^2/n.
In this case, the estimation error has then a distribution centered at zero and a variance
inversely proportional to n. In other words, if n is sufficiently large, likely values of Y will
all be close to µ.
Estimation of σ 2 : The point estimate for σ 2 is based on the minimized sum of squares,
S(Y ), divided by a quantity d so that the E[S(Y )/d] = σ 2 . The simple derivation outlined
in Problem 8.9 shows that d = n − 1, and so
σ̂^2 = S^2 = Σ_{j=1}^{n} (Yj − Ȳ)^2 / (n − 1).
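The choice d = n − 1 can also be motivated by simulation: dividing the minimized sum of squares by n − 1 gives an (approximately) unbiased estimate of σ², while dividing by n does not. A sketch under arbitrary illustrative values of µ, σ and n (Python standard library):

```python
import random
from statistics import mean

random.seed(3)
mu, sigma, n = 5.0, 2.0, 8        # arbitrary illustrative values

def s_of_ybar_over(sample, d):
    """Minimized sum of squares S(Ybar) divided by d."""
    ybar = mean(sample)
    return sum((y - ybar) ** 2 for y in sample) / d

samples = [[random.gauss(mu, sigma) for _ in range(n)] for _ in range(20000)]
print(mean(s_of_ybar_over(s, n - 1) for s in samples))   # close to sigma^2 = 4
print(mean(s_of_ybar_over(s, n) for s in samples))       # systematically below 4
```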
The sample mean and variance are ȳ = 1.9833 and s^2 = 0.09787879, respectively. The standard error of ȳ is then SE(ȳ) = √(0.09787879/12) = 0.09031371. It would appear that the scientist's measurement procedure is biased, giving values below the true concentration. The bias can be estimated as 1.9833 − 2.5 = −0.5166667, give or take 0.181 (0.181 = 2 × SE(ȳ)).
To assess the precision of Ȳ we look for the value d such that
P[|Ȳ − µ| < d] = 1 − α.
The resulting d can be, then, added to and subtracted from the observed average y to obtain
the upper and lower limits of an interval called (1 − α)100% confidence interval:
(y − d, y + d)
Typical values of α are α = 0.05 and α = 0.01 yielding 95% and 99% confidence intervals,
respectively. To fix ideas we will take α = 0.05 in what follows.
Assuming that the model (8.3) is correct, the probability that µ and Y differ by more than
d is only 0.05. In other words, if we repeatedly obtain samples of size n and construct the
corresponding 95% confidence intervals for µ, on average, 95% of these intervals will include
the (unknown) value of µ.
Using that Ȳ ∼ N(µ, σ^2/n) we have
0.95 = P[ |Ȳ − µ| < d ] = P[ |(Ȳ − µ)/(σ/√n)| < d√n/σ ] = 2Φ[d√n/σ] − 1.
That is,
Φ[d√n/σ] = 0.975.
Using the standard normal table we get
d√n/σ = 1.96,
from which we have
d = 1.96 σ/√n.
In practice σ is unknown and is replaced by the sample standard deviation s, giving
d̂ = 1.96 s/√n = 1.96 × SE(ȳ).
The precision of s as an estimate of σ increases with the sample size. Therefore, replacing σ
by s has little effect when the sample size is large (n ≥ 20, say). However, when n is small the level of uncertainty is somewhat increased and an adjustment is needed. To adjust
for the increased level of uncertainty the value from the normal table (1.96 when α = 0.05)
must be replaced by a slightly larger value, tdf (α), obtained from the Student’s t table. The
precise Student’s t value, tdf (α), depends on two parameters: the significance level, α, and
the degrees of freedom, df .
The significance level, α, is equal to one minus the desired confidence level . In our case,
the confidence level (desired precision) is 0.95 and so α = 0.05. In this simple case the degrees
of freedom parameter, df is simply equal to the sample size minus one, that is df = n − 1.
More generally, (for future applications) the degrees of freedom are given by the formula
df = n − k,
where
n = number of squared terms appearing in the variance estimate
and
k = number of additional estimated parameters appearing in the variance estimate
Table A.2 in the Appendix gives the values of t(df)(α) for several values of α and df .
In summary, the estimated value of d is
d̂ = t(df)(α) s/√n = t(df)(α) × SE(Ȳ).
Notice that for most values of n that appear in practice, t(n−1)(0.05) ≈ 2, justifying the common practice of adding and subtracting 2 × SE(ȳ) from the observed average ȳ.
Example 8.2 Refer to the data in example 8.1. A 95% confidence interval for the actual
mean of the scientist’s measurements is
1.9833 ± t(11) (0.05) × SE(y)
or
1.9833 ± 2.20 × 0.09031371.
That is, the systematic part of the scientist's measurement is likely to lie between 1.8 and 2.2.
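A small sketch of the interval computed in Example 8.2, using only the summary statistics quoted above and the table value t(11)(0.05) = 2.20 (Python standard library):

```python
from math import sqrt

n, ybar, s2 = 12, 1.9833, 0.09787879   # summary statistics from Example 8.1
t_crit = 2.20                          # t_(11)(0.05) from Table A.2

se = sqrt(s2 / n)                      # standard error, about 0.0903
d_hat = t_crit * se
print((ybar - d_hat, ybar + d_hat))    # about (1.785, 2.182)
```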
Significance Level of a Test: When testing a hypothesis one can make two possible errors: rejecting a hypothesis that is true (error of type I) or failing to reject a hypothesis that is false (error of type II). Errors of type I are considered more important and are kept under tight control. Therefore, usual testing procedures ensure that the probability of rejecting a true hypothesis is rather small (0.01 or 0.05). The probability of an error of type I is usually denoted by α and called the significance level of the test.
Taking that into consideration, the hypothesis H0 is constructed in such a way that its incorrect rejection has a small probability. H0 states, then, the most conservative statement: a statement that one would like to reject only in the presence of strong empirical evidence. Because of this, H0 is called the “null hypothesis”.
The Testing Procedure: The testing procedures learned in this course are simply derived
from confidence intervals. Suppose we wish to test H0 at level α. Then we distinguish two
cases:
Two sided tests: Hypotheses of the form H0 : µ = µ0 give rise to two sided tests because
in this case we reject H0 if we have evidence indicating that µ is smaller or larger than µ0 .
The two–sided level α testing procedure consists of the following two steps:
Step 1. Construct a (1 − α)100% confidence interval for µ.
Step 2. Reject H0 : µ = µ0 if µ0 lies outside that interval.
One sided tests: Hypotheses of the form H0 : µ ≥ µ0 (H0 : µ ≤ µ0 ) are called directional
hypotheses and give rise to one–sided tests. Notice that in this case we reject H0 only if we
suspect that µ < µ0 (µ > µ0 ).
The one–sided level α testing procedure consists of the following two steps:
Step 1. Construct a [1 − (2 × α)]100% confidence interval for µ.
Step 2. Reject H0 : µ ≥ µ0 if the entire interval lies below µ0 (correspondingly, reject H0 : µ ≤ µ0 if the entire interval lies above µ0 ).
Example 8.3 Refer to the data in example 8.1. Test at level α = 0.05 the following hy-
potheses: (a) H0 : µ = 2.5; and (b) H0 : µ ≥ 2.3.
(a) Since the 95% confidence interval (1.785, 2.182) (see example 8.2) does not include 2.5,
we reject H0 . There is statistical evidence indicating that the measurement procedure is not
unbiased.
(b) We must first construct a 90% confidence interval for µ. From example 8.1 we have that ȳ = 1.9833 and SE(ȳ) = 0.09031371. Moreover, from the Student-t Table we have t(11)(0.10) = 1.80. Therefore, the 90% confidence interval for µ is
1.9833 ± 1.80 × 0.09031371 = (1.82, 2.15).
Since 2.15 < 2.3 we reject H0. There is statistical evidence indicating that the measurement procedure systematically underestimates the true lead concentration by at least 0.2 µg/l.
Example 8.4 A shipyard must order a large shipment of lacquer from a supplier. Besides
other design requirements, the lacquer must be durable and dry quickly. The average drying
time must not exceed 25 minutes. Supplier A claims that, on average, its product dries in
20.5 minutes. A sample of 30 20-liter cans from supplier A yields an average drying time of
22.3 minutes and standard deviation of 2.9 minutes.
(a) Is there statistical evidence to distrust supplier A’s claim that its product has an average
drying time of 20.5 minutes?
(b) Can we say that, on average, supplier A’s lacquer dries before 24 minutes?
(a) To answer this question we must assess the precision of y as an estimate of µ. Evidently,
y = 22.3 is different from the claimed value 20.5 for µ. However, we need still to determine
if the observed difference of 1.8 is within the normal range of variability of Y .
To answer the question we can test the hypothesis
H0 : µ = 20.5.
To test this hypothesis we construct a confidence interval for µ. In the present case α = 0.05 and df = 30 − 1 = 29. Hence, from Table A.2, t(29)(0.05) = 2.05. Moreover, SE(ȳ) = 2.9/√30 = 0.529465. Therefore, the 95% confidence interval for µ is
22.3 ± 2.05 × 0.529465 = (21.21, 23.39).
Since this interval doesn't include the value µ = 20.5, we reject supplier A's claim that µ = 20.5. That is, we reject the hypothesis µ = 20.5 on the basis of the given data and
statistical model.
(b) To answer this question we may test the hypothesis
H0 : µ ≥ 24.0
at some (small) level α. To take advantage of the calculations already made we may choose α = 0.025. Since the upper limit of the 95% confidence interval for µ is smaller than 24.0, we reject H0 and answer question (b) in a positive way. □
Example 8.5 Refer to the situation described in Problem 8.4. Another supplier, called
Supplier B, could also supply the lacquer. A sample of 10 20-liter cans from supplier B yields
an average drying time of 20.7 minutes and standard deviation of 2.5 minutes. Does the data
support supplier B’s claim that, on average, its product dries faster than A’s? What if the
sample size from supplier B were 100 instead of 10?
This example illustrates a fairly common situation: one must take or recommend an im-
portant decision involving a large number of items (or individuals) on the basis of a relatively
small number of measurements performed on some of these items. Recall that the set of all
the items under study is called the population and the subset of items used to obtain the
measurements (and often the measurements themselves) is called the sample.
Example 8.5 includes two populations, namely the 3,000 20-liter cans of lacquer that can
be acquired from either supplier A or B. In the following these two populations will be called
population A and population B, respectively.
Although we are concerned with the entire populations, we will only be able to test the
items in the samples. Therefore, we must try to investigate and exploit the mathematical
connections between the samples and the population from which they came. This can be
done with the help of a statistical model, that is, a set of probability assumptions regarding the sample measurements. The two sample measurements can be modeled as
Yij = µi + σi εij ,   j = 1, . . . , ni ,   i = 1, 2,
where the εij are independent random variables with zero mean and unit variance.
Notice that Ȳ1 and Ȳ2 are normal random variables with means µ1 and µ2 and variances σ1^2/n1 and σ2^2/n2 , respectively. Furthermore, the population variances σ1^2 and σ2^2 can be estimated by the sample variances
S1^2 = Σ_{j=1}^{n1} [Y1j − Ȳ1]^2 / (n1 − 1)   and   S2^2 = Σ_{j=1}^{n2} [Y2j − Ȳ2]^2 / (n2 − 1).
The Pooled Variance Estimate If the variances of the two populations are approximately
equal it then makes sense to compare their means. On the other hand, if the variances
are very different, comparing the population means may be a gross oversimplification. A
practical solution in these cases is to apply a transformation (e.g. use log(Yij ) instead of Yij )
that stabilizes (equalizes) the variances.
In this course we will only consider the simple situation where
σ12 = σ22 = σ 2 .
An unbiased estimate for the common variance σ 2 , based on the individual unbiased estimates
S12 and S22 , is given by the pooled variance estimate
S^2 = [(n1 − 1)S1^2 + (n2 − 1)S2^2] / (n1 + n2 − 2) = Σ_{i=1}^{2} Σ_{j=1}^{ni} [Yij − Ȳi]^2 / (n1 + n2 − 2).
Linear Combinations of the Population Means: In practice one often wishes to esti-
mate linear combinations of the population means and to test hypotheses about them. In
such cases we say that the parameter of interest is a linear combination of µ1 and µ2 .
The most common linear combination of µ1 and µ2 is the simple difference:
θ = µ1 − µ2 .
More generally, we consider linear combinations of the form
θ = aµ1 + bµ2 ,
which can be estimated, without bias, by
θ̂ = aȲ1 + bȲ2 .
In fact,
E(θ̂) = E(aY 1 + bY 2 ) = aE(Y 1 ) + bE(Y 2 ) = aµ1 + bµ2 = θ.
The variance of θ̂ is equal to
" #
σ2 σ2 a2 b2
Var(θ̂) = Var(aY 1 + bY 2 ) = a2 Var(Y 1 ) + b2 Var(Y 2 ) = a2 + b2 = σ2 + .
n1 n2 n1 n2
In the case of the difference (a = 1, b = −1), replacing σ by the pooled estimate s gives
SE(θ̂) = s √(1/n1 + 1/n2) = 2.8106 √(1/30 + 1/10) = √1.053 = 1.026.
df = n1 + n2 − 2.
The 95% confidence interval for θ = µ1 − µ2 is then
(ȳ1 − ȳ2) ± t(38)(0.05) × s √((n1 + n2)/(n1 n2)) = 1.6 ± 2.02 × 1.026 = (−0.47 , 3.67).
We have used the approximation t(38)(0.05) ≈ t(40)(0.05) = 2.02, because t(38)(0.05) is not included in the table.
Solution to Example 8.5: The statement of Supplier B is consistent with the hypothesis
H 0 : µ1 ≥ µ2
or equivalently
H0 : µ1 − µ2 ≥ 0.
We may answer the question by testing this (directional) hypothesis at some (small) level α.
For example, we may take α = 0.05. The 90% confidence interval for θ = µ1 − µ2 is
(ȳ1 − ȳ2) ± t(40)(0.10) × s √((n1 + n2)/(n1 n2)) = 1.6 ± 1.68 × 1.026 = 1.6 ± 1.724 = (−0.124 , 3.324).
Since the value µ1 − µ2 = 0 falls in the interval, we conclude that there is no statistically significant difference between the two means. There is, then, no statistical evidence supporting Supplier B's claim of having a superior product. □
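A sketch of the pooled two-sample calculation behind Examples 8.4 and 8.5, using the summary statistics for suppliers A and B and the table value t(40)(0.10) = 1.68 (Python standard library):

```python
from math import sqrt

n1, y1, s1 = 30, 22.3, 2.9        # supplier A sample
n2, y2, s2 = 10, 20.7, 2.5        # supplier B sample

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance
se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)                        # about 1.026

# 90% confidence interval for mu1 - mu2 (one-sided test at level 0.05)
d = 1.68 * se
print((y1 - y2 - d, y1 - y2 + d))  # about (-0.12, 3.32); zero lies inside
```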
Example 8.6 Either 20 large machines or 30 small ones can be acquired for approximately the same cost. One large and one small machine have been experimentally run for 20 days with the following results:
Solution: Since the total cost of 20 large machines equals the cost of 30 small machines, it is reasonable to compare the total outputs 20µ1 and 30µ2 ,
where µ1 and µ2 are the average daily outputs for each type of machine.
Therefore, the parameter of interest is the linear combination
θ = 20µ1 − 30µ2 .
θ̂ = 20ȳ1 − 30ȳ2 = 20 × 31 − 30 × 22.7 = −61.0.
8.4 Exercises
8.4.1 Exercise Set A
Problem 8.1 Given that n1 = 15, x̄ = 20, Σ(xi − x̄)^2 = 28, and n2 = 12, ȳ = 17, Σ(yi − ȳ)^2 = 22.
(a) Calculate the pooled variance s2 .
(b) Determine a 95% confidence interval for µ1 − µ2 .
(c) Test H0 : µ1 = µ2 with α = .05.
Problem 8.2 The time for a worker to repair an electrical instrument is a normally dis-
tributed N (µ, σ 2 ) random variable measured in hours, where both µ and σ 2 are unknown.
The repair times for 10 such instruments chosen at random are as follows:
212,234,222,140,280,260,180,168,330,250
(1) Calculate the sample mean and the sample variance of the 10 observations.
(2) Construct a 95% confidence interval for µ.
(3) Suppose the worker claims that his average repair time for the instrument is no more
than 200 hours. Test if his claim conforms with the data.
Problem 8.3 (Hypothetical) The effectiveness of two STAT251/241 labs which were con-
ducted by two TAs is compared. A group of 24 students with rather similar backgrounds
was randomly divided into two labs and each group was taught by a different TA. Their test
scores at the end of the semester show the following characteristics:
and
n2 = 11, ȳ = 71.8, s2y = 112.6.
Assuming underlying normal distributions with σ12 = σ22 , find a 95 percent confidence interval
for µ1 −µ2 . Are the two labs different? Summarize the assumptions you used for your analysis.
Problem 8.4 Two machines (called A and B in this problem) are compared. Machine A
cost $ 3000 and machine B cost $ 4500. One machine of each type was operated during 30
days and the daily outputs were recorded. The results are summarized below:
Machine A: x̄A = 200 kg, sA = 5.1 kg.
Machine B: x̄B = 270 kg, sB = 4.9 kg.
Is there statistical evidence indicating that any one of these machines has better output/cost performance than the other? Use α = 0.05.
Problem 8.5 The average biological oxygen demand (BOD) at a certain experimental sta-
tion has to be estimated. From measurements at other similar stations we know that the
variance of BOD samples is about 8.0 (mg/liter)2 . How many observations should we sample
if we want to be 90 percent confident that the true mean is within 1 mg/liter of our sample
average? (Hint: Using CLT, we may assume the sample average has approximately normal
distribution).
Problem 8.6 An automobile manufacturer recommends that any purchaser of one of its new
cars bring it in to a dealer for a 3000-mile checkup. The company wishes to know whether
the true average mileage for initial servicing differs from 3000. A random sample of 50 recent
purchasers resulted in a sample average mileage of 3208 and a sample standard deviation
of 273 miles. Does the data strongly suggest that true average mileage for this checkup is
something other than the recommended value?
Problem 8.7 The following data were obtained on mercury residues on birds’ breast mus-
cles:
Mallard ducks: m = 16, x̄ = 6.13, s1 = 2.40
Blue-winged teals: n = 17, ȳ = 6.46, s2 = 1.73
Construct a 95% confidence interval for the difference between true average mercury residues
µ1 , µ2 in these two types of birds in the region of interest. Does your confidence interval
indicate that µ1 = µ2 at a 95% confidence level?
Problem 8.8 A manufacturer of a certain type of glue claims that his glue can withstand
230 units of pressure. To test this claim, a sample of size 24 is taken. The sample mean is
191.2 units and the sample standard deviation is 21.3 units.
(a) Propose a statistical model to test this claim and test the manufacturer’s claim.
(b) What is the highest claim that the manufacturer can make without rejection of this
claim?
Problem 8.9 Show that the sample variance S^2 is an unbiased estimate of σ^2 , that is, E(S^2) = σ^2 .
Problem 8.10 (a) The president of a cable company claims that its 0.3–inch cable will
support an average load of 4200 pounds. Twenty four of these cables are tested to failure,
yielding the following data:
4201.3 4262.4 3983.0 3943.0 4141.3 4168.5 4050.0 4142.7
Problem 8.11 A politician must decide whether or not to run in the next local election.
He would be inclined to do so if at least 30% of the voters would favor his candidacy. The
results of a poll of 20 local citizens yielded the following results:
30% favor the politician 35% favor other candidates 35% are still undecided
Should the candidate decide to run based on the results of this survey? Do you think that
the sample size is appropriate? If not, suggest an appropriate sample size.
Problem 8.12 The number of hours needed by twenty employees to complete a certain task
have been measured before and after they participated in a special training program. The data are displayed in Table 7.1.
How would you model these data in order to answer the question: Was the training
program successful? Was it? Also check that your model’s assumptions are consistent with
the data.
Table 7.1:
Problem 8.13 In order to process a certain chemical product, a company is considering the
convenience of acquiring (for approximately the same price) either 100 large machines or 200
small ones. One important consideration is the average daily processing capacity (in hundreds of pounds).
One machine of each type was tested for a period of 10 days, yielding the following results:
Large Machine: x̄1 = 120, s1 = 1.5
Small Machine: x̄2 = 65, s2 = 1.6
Model the data and identify the parameter of main interest. Construct a 95% confidence
interval for this parameter. What is your recommendation to management?
Problem 8.14 A study is made to see if increasing the substrate concentration has appre-
ciable effect on the velocity of a chemical reaction. With the substrate concentration of 1.5
moles per liter, the reaction was run 15 times with an average velocity of 7.5 micromoles per
30 minutes and a standard deviation of 1.5. With a substrate concentration of 2.0 moles per
liter, 12 runs were made yielding an average velocity of 8.8 micromoles per 30 minutes and a
sample standard deviation of 1.2. Would you say that the increase in substrate concentration
increases the mean velocity by as much as 0.5 micromoles per 30 minutes? Use a 0.01 level
of significance and assume the populations to be approximately normally distributed with
equal variances.
Problem 8.15 (Hypothetical) A study was made to estimate the difference in annual salaries
of professors in University of British Columbia (UBC) and University of Toronto (UT). A
random sample of 100 professors in UBC showed an average salary of $46,000 with a standard
deviation $12,000. A random sample of 200 professors in UT showed an average salary of
$51,000 with a standard deviation of $14,000. Test the hypothesis that the average salary
for professors teaching in UBC differs from the average salary for professors teaching in UT
by $5,000.
Problem 8.16 A UBC student will spend, on average, $8.00 for a Saturday evening gathering in a pub. A random sample of 12 students attending a homecoming party showed an average expenditure of $8.90 with a standard deviation of $1.75. Could you say that attending a homecoming party costs students more than gathering in a pub?
Problem 8.17 The following data represent the running times of films produced by two
different motion-picture companies.
Times (minutes)
Company I 103 94 110 87 98
Company II 97 82 123 92 175 88 118
Compute a 90% confidence interval for the difference between the average running times of
films produced by the two companies. Do the films produced by Company II run longer than
those by Company I?
Problem 8.18 It is required to compare the effect of two dyes on cotton fibers. A random
sample of 10 pieces of yarn were chosen; 5 pieces were treated with dye A, and 5 with dye B.
The results were
Dye A 4 5 8 8 10
Dye B 6 2 9 4 5
(a) Test the significance of the difference between the two dyes. (Assume normality, common
variance, and significance level α = 0.05.)
(b) How big a sample do you estimate would be needed to detect a difference equal to 0.5 with probability 99%?
Chapter 9
Simulation Studies
9.1 Monte Carlo Simulation
Consider an integral of the form
I = ∫_0^1 g(t) dt.
Suppose that g is such that this integral cannot be easily evaluated analytically and we need to approximate it by numerical means. For simplicity suppose that 0 ≤ g(t) ≤ 1 for all 0 ≤ t ≤ 1.
If we are dealing with a function h(t) which is not between 0 and 1 but we know that a ≤ h(t) ≤ b, then
g(t) = (h(t) − a)/(b − a)
takes values between 0 and 1 and can be used instead.
Suppose that we want to estimate I with an error smaller than δ = 0.01, with probability equal to 0.99. In other words, if Î is the estimate of I, we require that
P{|Î − I| < 0.01} = 0.99.
Notice that
I = ∫_0^1 g(t) dt = E{g(U)},
where U is a random variable with uniform distribution on the interval (0, 1). If we generate
n independent random variables
U1 , U 2 , . . . , U n
with uniform distribution on (0, 1), then by the Central Limit Theorem
Î = (1/n) Σ_{i=1}^{n} g(Ui) = ḡ(U)
is approximately normal with mean I and variance σ^2/n, where
σ^2 = ∫_0^1 g^2(t) dt − I^2 ≤ ∫_0^1 g(t) dt − I^2 = I(1 − I).
Now,
P{|Î − I| < 0.01} = P{√n |Î − I|/σ < √n (0.01)/σ} ≈ P{|Z| < √n (0.01)/σ} = 2Φ[√n (0.01)/σ] − 1.
But,
2Φ[√n (0.01)/σ] − 1 = 0.99  ⇒  Φ[√n (0.01)/σ] = 0.995  ⇒  √n (0.01)/σ = 2.58  ⇒  n = σ^2 (2.58)^2/(0.01)^2.
Finally, since I(1 − I) reaches its maximum at I = 0.5 it follows that I(1 − I) ≤ 0.25 for all
I, and so, a conservative estimate for n is
n = σ^2 (2.58)^2/(0.01)^2 ≤ (0.25)(2.58)^2/(0.01)^2 = 16,641.
The Monte Carlo method can also be used to estimate an integral of the form
J = ∫_a^b f(t) dt,   (9.1)
where f (t) takes values between c and d. That is, the domain of integration can be any given
bounded interval, [a, b], and the function can take values on any given bounded interval [c, d].
For example, we may wish to estimate the integral
J = ∫_1^3 exp{t^2} dt.
In this case the domain of integration is [1, 3] and the function ranges over the interval [2.7183, 8103.1].
In order to estimate J, first we must make the change of variables
u = (t − a)/(b − a),
to obtain
J = (b − a) ∫_0^1 f[(b − a)u + a] du = ∫_0^1 g(u) du,
where
g(u) = (b − a) f[(b − a)u + a]
takes values between (b − a)c and (b − a)d.
The second step is to linearly modify the function g(u) so that the resulting function, h(u),
takes values between 0 and 1. That is,
h(u) = [g(u) − (b − a)c] / [(b − a)(d − c)]  ⇒  g(u) = (b − a)(d − c) h(u) + (b − a)c,
then
0 ≤ h(u) ≤ 1.
Finally,
J = ∫_0^1 g(u) du = (b − a)(d − c) ∫_0^1 h(u) du + (b − a)c = (b − a)(d − c) I + (b − a)c,
where I is of the desired form (that is, the integral between 0 and 1 of a function that takes
values between 0 and 1).
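A sketch of the whole recipe applied to the example J = ∫₁³ exp{t²} dt (Python standard library; the number of replications n is an arbitrary choice and the printed value carries Monte Carlo error):

```python
import random
from math import exp

random.seed(4)
a, b = 1.0, 3.0
c, d = exp(a * a), exp(b * b)       # bounds of f(t) = exp(t^2) on [a, b]

def f(t):
    return exp(t * t)

def h(u):
    """Rescaled integrand taking values in [0, 1]."""
    g = (b - a) * f((b - a) * u + a)
    return (g - (b - a) * c) / ((b - a) * (d - c))

n = 1000000
I_hat = sum(h(random.random()) for _ in range(n)) / n
J_hat = (b - a) * (d - c) * I_hat + (b - a) * c
print(J_hat)                        # close to the true value, about 1444
```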
9.2 Exercises
Problem 9.1 Use the Monte Carlo integration method with n = 1500 to approximate the
following integrals.
(a) I = ∫_0^1 exp{−x^2} dx.
What is the (approximated) probability that the approximation error is less than d = 0.05?
Less than d = 0.01?
(b) I = ∫_{−1}^{2} exp{x^2} dx.
Problem 9.3 (a) Generate 100 samples of size n = 10 from the following distributions:
(1) Uniform on the interval (0, 1); (2) exponential with mean 1; (3) discrete with f (1) =
1/3, f (2) = 1/3 and f (9) = 1/3; (4) discrete with f (1) = 1/8, f (3) = 1/8 and f (9) = 3/4
and (5) f (1) = 1/3, f (5) = 1/3 and f (9) = 1/3.
(b) For each distribution calculate the corresponding sample means and discuss the merits
of the CLT approximation to the distribution of the sample mean in each case. You can use
histograms, Q-Q plots, box plots, etc. for your analysis.
(c) Repeat (a) and (b) with n = 20 and n = 50.
(d) Concisely state your conclusions.
Chapter 10
Comparison of Several Means
10.1 An example
The main ideas will be illustrated by the following example.
Example 10.1 A construction company wants to compare several different methods of dry-
ing concrete block cylinders. To that effect, the engineer in charge of acquisition and testing
of materials sets up an experiment to compare five different drying methods referred to as
drying methods A, B, C, D and E. One important feature of the concrete block cylinders
is their compressive strength (in hundreds of kilograms per square centimeter), which can
be determined by means of a destructive strength test. After selecting a carefully designed
experiment (we will discuss this important step later on) the engineer collected the data
displayed in Table 9.1.
We propose the following model. Each measurement will be represented as the sum of two terms, an unknown constant, µi, and a random variable, εij:
Yij = µi + εij .
The first subscript, i, ranges from 1 to k, where k is the number of populations being
compared, usually called treatments. In our example, we are comparing five types of drying
methods, therefore k = 5. The second subscript, j, ranges from 1 to ni , where ni is the number
of measurements for each treatment. In our example, we have n1 = n2 = . . . = n5 = 20.
The unknown parameters µi represent the treatment averages. Differences among the µi ’s
account for the part of the variability observed in the data that is due to differences among
the treatments being compared in the experiment.
The random variables εij account for the additional variability that is caused by other
factors not explicitly considered in the experiment (different batches of raw material, different
mixing times, measurement errors, etc.). The best we can hope regarding the global effect
of these uncontrolled factors is that it will average out. In this way these factors will not
unduly enhance or worsen the performance of any treatment.
An important technique that can be used to achieve this (averaging out) is called ran-
domization. The experimental units available for the experiment (the 100 concrete block cylinders in the case of our example) must be randomly assigned to the different treatments,
so that each experimental unit has, in principle, the same chance of being assigned to any
treatment. One practical way for doing this in the case of our example is to number the
blocks from 1 to 100 and then to draw (without replacement) groups of 20 numbers. The
units with numbers in the first group are assigned to treatment A, the units with numbers
in the second group are assigned to treatment B, and so on. The actual labeling of the
treatments as A, B, etc. can also be randomly decided.
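A minimal Python sketch of this randomization step (numpy assumed; the variable names are illustrative):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)

blocks = np.arange(1, 101)          # the 100 concrete block cylinders, numbered 1-100
shuffled = rng.permutation(blocks)  # equivalent to drawing the groups without replacement
assignment = {
    treatment: sorted(shuffled[i * 20:(i + 1) * 20])
    for i, treatment in enumerate("ABCDE")
}
# assignment["A"] lists the 20 cylinders that receive drying method A, and so on.
\end{verbatim}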
The model assumptions are:
(1) E(εij) = 0 for all i and j;
(2) the εij are independent;
(3) Var(εij) = σ², the same for all treatments;
(4) each εij is normally distributed.
These assumptions can be summarized by saying that the variables εij's are iid N(0, σ²).
(a) The Q–Q plots of Figure 10.1 (a)-(e) suggest that assumption (4) is consistent with the data. Figure 10.1 (f) displays the box–plots for the combined data (first from the left) and for each drying method. The variability within the samples seems roughly constant (the boxes are of approximately equal size). This suggests that assumption (3) is also consistent with the data.
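Plots of the kind shown in Figure 10.1 can be produced along the following lines. This is only a sketch: it assumes matplotlib and scipy are available, and since Table 10.1 is not reproduced here it uses stand-in data in place of the actual strength measurements.

\begin{verbatim}
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(3)
# Stand-in data; replace with the measurements of Table 10.1, one array per drying method.
strength = {m: rng.normal(loc=50.0, scale=8.0, size=20) for m in "ABCDE"}

fig, axes = plt.subplots(2, 3, figsize=(10, 6))
for ax, method in zip(axes.flat, "ABCDE"):
    stats.probplot(strength[method], dist="norm", plot=ax)   # normal Q-Q plot: assumption (4)
    ax.set_title(f"Drying Method {method}")

all_data = np.concatenate(list(strength.values()))
ax_box = axes.flat[5]
ax_box.boxplot([all_data] + [strength[m] for m in "ABCDE"])  # comparable spreads: assumption (3)
ax_box.set_xticklabels(["All", "A", "B", "C", "D", "E"])
ax_box.set_title("Boxplot of Drying Methods")
plt.tight_layout()
plt.show()
\end{verbatim}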
[Figure 10.1: normal Q–Q plots (empirical quantiles against normal quantiles) of the compressive strength measurements for drying methods A–E, and a box plot panel comparing the combined data with each of the methods A–E.]
\[
y_{i\cdot} = y_{i1} + \dots + y_{in_i} = \sum_{j=1}^{n_i} y_{ij}, \quad \text{the $i$th treatment's total,}
\]
\[
\bar{y}_{i\cdot} = \frac{y_{i1} + \dots + y_{in_i}}{n_i} = \frac{\sum_{j=1}^{n_i} y_{ij}}{n_i} = \frac{y_{i\cdot}}{n_i}, \quad \text{the $i$th treatment's mean,}
\]
\[
s_i = \sqrt{\frac{(y_{i1} - \bar{y}_{i\cdot})^2 + \dots + (y_{in_i} - \bar{y}_{i\cdot})^2}{n_i - 1}}
    = \sqrt{\frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2}{n_i - 1}}, \quad \text{the $i$th treatment's standard deviation,}
\]
\[
y_{\cdot\cdot} = y_{1\cdot} + \dots + y_{k\cdot} = \sum_{i=1}^{k} y_{i\cdot} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}, \quad \text{the overall total,}
\]
and
\[
\bar{y}_{\cdot\cdot} = \frac{\sum_{i=1}^{k} y_{i\cdot}}{n} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}}{n}, \quad \text{the overall mean.}
\]
In the case of our example
\[
y_{\cdot\cdot} = \sum_{i=1}^{k} n_i\,\bar{y}_{i\cdot} = 20\,[45.05 + 52.29 + 54.29 + 56.83 + 41.15] = 4992.2
\]
and
\[
\bar{y}_{\cdot\cdot} = 4992.2/100 = 49.92.
\]
It is not difficult to show that the treatment means $\bar{y}_{i\cdot}$ are unbiased estimates for the unknown parameters µi. In fact, the reader can easily verify that
\[
E(\bar{Y}_{i\cdot}) = \mu_i \quad\text{and}\quad \mathrm{Var}(\bar{Y}_{i\cdot}) = \frac{\sigma^2}{n_i}, \qquad i = 1, \dots, k.
\]
Analogously, it is not difficult to verify (see Problem 8.9) that S12 , S22 , . . . , Sk2 are k different
unbiased estimates for the common variance σ 2 :
E(Si2 ) = σ 2 , for i = 1, . . . , k.
These k estimates can be combined to obtain an unbiased estimate for σ 2 . The reader is
encouraged to verify that the combined estimate
\[
S^2 = \frac{\sum_{i=1}^{k} (n_i - 1)\,S_i^2}{n - k}
\]
is also unbiased and has a variance smaller than that of the individual Si2 ’s.
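In Python the treatment summaries and the combined estimate S² can be computed as follows (a sketch only; `samples` is a hypothetical container holding one array of measurements per treatment, with stand-in data used here).

\begin{verbatim}
import numpy as np

# samples[i] holds the n_i measurements for treatment i; stand-in data is used here.
rng = np.random.default_rng(4)
samples = [rng.normal(50.0, 8.0, 20) for _ in range(5)]

n_i    = np.array([len(y) for y in samples])
ybar_i = np.array([np.mean(y) for y in samples])          # treatment means
s2_i   = np.array([np.var(y, ddof=1) for y in samples])   # treatment variances S_i^2
n, k   = n_i.sum(), len(samples)

S2 = np.sum((n_i - 1) * s2_i) / (n - k)   # pooled, unbiased estimate of sigma^2
\end{verbatim}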
(c) Roughly speaking one can answer this question positively if there is evidence that a
substantial part of the variability in the data is due to differences among the treatments.
The total variability observed in the data is represented by the total sum of squares,
\[
SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} [y_{ij} - \bar{y}_{\cdot\cdot}]^2
    = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}^2 - \frac{\left[\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}\right]^2}{\sum_{i=1}^{k} n_i}
    = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}^2 - \frac{y_{\cdot\cdot}^2}{n}.
\]
We will now show that the total sum of squares, SST , can be expressed as the sum of two
terms, the error sum of squares, SSe, and the treatment sum of squares, SSt. That
is,
\[
SST = SSe + SSt, \qquad\qquad (1)
\]
where
\[
SSe = \sum_{i=1}^{k}\sum_{j=1}^{n_i} [y_{ij} - \bar{y}_{i\cdot}]^2
    = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}^2 - \sum_{i=1}^{k} \frac{y_{i\cdot}^2}{n_i}
\]
and
\[
SSt = \sum_{i=1}^{k} n_i\,[\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}]^2.
\]
The first term on the right–hand side of equation (1), SSe, represents the differences
between items in the same treatment or within–treatment variability (this source of
variability is also called intra–group–variability). The second term, SSt, represents the
differences between items from different treatments or between–groups variability (this
source of variability is also called inter–group–variability).
To prove equation (1) we add and subtract $\bar{y}_{i\cdot}$ and expand the square to obtain
\begin{align*}
\sum_{i=1}^{k}\sum_{j=1}^{n_i} [y_{ij} - \bar{y}_{\cdot\cdot}]^2
  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} [(y_{ij} - \bar{y}_{i\cdot}) + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})]^2 \\
  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2
   + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2
   + 2\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) \\
  &= SSe + SSt + 2\sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot}) \\
  &= SSe + SSt + 2\sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})\,[\,n_i \bar{y}_{i\cdot} - n_i \bar{y}_{i\cdot}\,] \\
  &= SSe + SSt.
\end{align*}
\[
SST = 259273.7 - \frac{(4992.2)^2}{100} = 10049.11,
\]
and the treatment and error sums of squares, SSt and SSe, are computed analogously from their formulas above.
Degrees of Freedom
The sums of squares cannot be compared directly. They must first be divided by their respective degrees of freedom.
Since we use n squares and only one estimated parameter in the calculation of SST , we
conclude that
df (SST ) = n − 1.
Since there are n squares and k estimated parameters (the k treatment means) in the calculation of SSe, we conclude that
\[
df(SSe) = n - k.
\]
Finally, since SST = SSe + SSt, the degrees of freedom for the treatment sum of squares are
\[
df(SSt) = df(SST) - df(SSe) = k - 1.
\]
ANALYSIS OF VARIANCE
All the calculations made so far can be summarized in a table called the analysis of variance (ANOVA) table.
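As an illustration of how the ANOVA table entries fit together, here is a short Python sketch (the function name anova_table is illustrative; it uses df(SSt) = k − 1 and df(SSe) = n − k as above). The call at the end uses the milk-transport costs of Problem 10.1 in the exercises as example data.

\begin{verbatim}
import numpy as np

def anova_table(samples):
    # One-way ANOVA: sums of squares, degrees of freedom, mean squares and F ratio.
    n_i  = np.array([len(y) for y in samples])
    ybar = np.array([np.mean(y) for y in samples])
    ally = np.concatenate([np.asarray(y, dtype=float) for y in samples])
    n, k = len(ally), len(samples)

    SST = np.sum((ally - ally.mean()) ** 2)           # total sum of squares
    SSt = np.sum(n_i * (ybar - ally.mean()) ** 2)     # between-treatments sum of squares
    SSe = SST - SSt                                   # within-treatments (error) sum of squares
    MSt, MSe = SSt / (k - 1), SSe / (n - k)
    return {"SST": SST, "SSt": SSt, "SSe": SSe,
            "df_t": k - 1, "df_e": n - k,
            "MSt": MSt, "MSe": MSe, "F": MSt / MSe}

print(anova_table([[8.10, 4.40, 6.00, 7.00],
                   [6.60, 8.60, 7.35],
                   [12.00, 11.20, 13.30, 10.55, 11.50]]))
\end{verbatim}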
(c) To answer question (c) we must compare the variability due to the treatments with the
variability due to other sources. In other words, we must find out if the “treatment effect”
is strong enough to stand out above the “noise” caused by other sources of variability.
To do so, the ratio
\[
F = \frac{MSt}{MSe}
\]
is compared with the value F[df(MSt), df(MSe)] from the F–Table, attached at the end of these notes. In our case
\[
F = \frac{865.25}{69.34} = 12.48.
\]
Since F > F (4, 95) we conclude that there are statistically significant differences among the
drying methods.
(d) To answer question (d) we must perform multiple comparisons of the treatment means.
It is intuitively clear that if the number of treatments is large, then the total number of comparisons of pairs of means, K = k(k − 1)/2, is also large, and the chance of declaring at least one spurious difference significant increases. To keep the overall error rate at α = 0.05, each individual comparison is therefore carried out at the smaller level
\[
\delta = 0.05/K.
\]
Each individual confidence interval is constructed so that it has probability 1 − δ of
including the true treatment mean difference. It can be shown that this procedure (called
Bonferroni multiple comparisons) is conservative: If all the treatment means are equal,
µ1 = µ2 = . . . = µk ,
then the probability that one or more of these intervals do not include the true difference, 0,
is at most α.
The procedure to compute the simultaneous confidence intervals is as follows. In the first
place, we must find the appropriate value, t(n−k) (δ) = t(n−k) (α/K), from the Student’s t table
(see Table 7.1). As before, the number of degrees of freedom corresponds to those of the MSe,
that is, df = n − k.
The second step is to determine the standard deviation of the difference of treatment means, $\bar{Y}_{i\cdot} - \bar{Y}_{m\cdot}$. It is easy to see that
\[
\mathrm{Var}(\bar{Y}_{i\cdot} - \bar{Y}_{m\cdot}) = \sigma^2\left[\frac{1}{n_i} + \frac{1}{n_m}\right].
\]
Therefore,
\[
\text{estimated } SD(\bar{Y}_{i\cdot} - \bar{Y}_{m\cdot}) \approx \sqrt{MSe}\,\sqrt{\frac{1}{n_i} + \frac{1}{n_m}}.
\]
In the case of our example k = 5 and therefore K = 10. The observed differences between the 10 pairs of treatment (sample) means are given in Table 10.3. Each observed difference is compared with the critical value
\[
\hat{d}_{i,m} = t_{(n-k)}(\delta)\,\sqrt{MSe}\,\sqrt{\frac{1}{n_i} + \frac{1}{n_m}}.
\]
In the case of our example, since
\[
n_1 = n_2 = n_3 = n_4 = n_5 = 20,
\]
all the $\hat{d}_{i,m}$ are equal to
\[
\hat{d} = t_{(95)}(0.05/10)\,\sqrt{69.34}\,\sqrt{\frac{2}{20}} = 7.56.
\]
The differences marked with a star, *, in Table 10.3 are statistically significant. For example, the * on the line A–C, together with the fact that the sign of the difference is negative, is
interpreted as evidence that method A is worse (less strong) than method C. The conclusions from Table 10.3 are: methods A and E are not significantly different from each other and appear to be significantly worse than the others. Observe that, although method A is not significantly worse than method B (at the current level α = 0.05), their difference, 7.24, is almost significant (fairly close to 7.56).
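The Bonferroni computation of the example can be reproduced with a few lines of Python (scipy assumed; t(95)(δ) is interpreted here as the two-sided critical value, which gives the value 7.56 quoted above, and the sample means are those listed earlier in the example).

\begin{verbatim}
import numpy as np
from itertools import combinations
from scipy import stats

alpha, k, n_i, MSe, df_e = 0.05, 5, 20, 69.34, 95
K = k * (k - 1) // 2                          # number of pairwise comparisons (here 10)
delta = alpha / K                             # Bonferroni level for each comparison

t_crit = stats.t.ppf(1 - delta / 2, df_e)     # two-sided critical value t_(95)(0.005)
d_hat = t_crit * np.sqrt(MSe) * np.sqrt(1 / n_i + 1 / n_i)   # critical difference, about 7.56

means = {"A": 45.05, "B": 52.29, "C": 54.29, "D": 56.83, "E": 41.15}
for p, q in combinations("ABCDE", 2):
    diff = means[p] - means[q]
    flag = "*" if abs(diff) > d_hat else " "
    print(f"{p}-{q}: {diff:7.2f} {flag}")
\end{verbatim}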
10.2 Exercises
10.2.1 Exercise Set A
Problem 10.1 Three different methods are used to transport milk from a farm to a dairy
plant. Their daily costs (in $100) are given in the following:
Method 1: 8.10 4.40 6.00 7.00
Method 2: 6.60 8.60 7.35
Method 3: 12.00 11.20 13.30 10.55 11.50
(1) Calculate the sample mean and sample variance for the cost of each method.
(2) Calculate the grand mean and the pooled variance for the costs of the three methods.
(3) Test whether the mean costs of the three methods are equal.
Problem 10.2 Six samples of each of four types of cereal grain grown in a certain region were
analyzed to determine thiamin content, resulting in the following data (micrograms/grams):
Wheat: 5.2 4.5 6.0 6.1 6.7 5.8
Barley: 6.5 8.0 6.1 7.5 5.9 5.6
Maize: 5.8 4.7 6.4 4.9 6.0 5.2
Oats: 8.3 6.1 7.8 7.0 5.5 7.2
Carry out the analysis of variance for the given data. Do the data suggest that at least two of the four different grains differ with respect to true average thiamin content? Use α = 0.05.
Table 10.4:
Method I Method II Method III
52 41 49
51 40 47
51 39 45
52 40 47
of cigarettes smoked daily is equal for the three methods. (Let the significance level equal 0.05.)
(b) Use confidence intervals to determine which method results in the largest reduction in smoking.
minutes) they took to reach 1500°F, starting from room temperature, yielding the following results. Are the furnaces' average heating times different? If so, which furnace is the fastest? The slowest?
Table 10.5:
Furnace   n_i   x̄_i   s_i
1 15 14.21 0.52
2 15 13.11 0.47
3 10 15.17 0.60
4 10 12.42 0.43
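When, as in this problem, only the group sizes, means and standard deviations are available, the ANOVA quantities can be obtained directly from the summaries. A Python sketch using the values of Table 10.5 (variable names are illustrative):

\begin{verbatim}
import numpy as np

n_i  = np.array([15, 15, 10, 10])
xbar = np.array([14.21, 13.11, 15.17, 12.42])
s_i  = np.array([0.52, 0.47, 0.60, 0.43])

n, k = n_i.sum(), len(n_i)
grand_mean = np.sum(n_i * xbar) / n

SSt = np.sum(n_i * (xbar - grand_mean) ** 2)   # between-furnace sum of squares
SSe = np.sum((n_i - 1) * s_i ** 2)             # within-furnace sum of squares
F = (SSt / (k - 1)) / (SSe / (n - k))
\end{verbatim}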
Problem 10.5 Three specific brands of alkaline batteries are tested under heavy loading
conditions. Given here are the times, in hours, that 10 batteries of each brand functioned
before running out of power. Use analysis of variance to determine whether the battery
brands take significantly different times to completely discharge. If the discharge times are
significantly different (at the 0.05 significance level), determine which battery brands differ
from one another. Specify and check the model assumptions.
Table 10.6:
Battery Type
1 2 3
5.60 5.38 6.40
5.43 6.63 5.91
4.83 4.60 6.56
4.22 2.31 6.64
5.78 4.55 5.59
5.22 2.93 4.93
4.35 3.90 6.30
3.63 3.47 6.77
5.02 4.25 5.29
5.17 7.35 5.18
Problem 10.6 Five different copper-silver alloys are being considered for the conducting
material in large coaxial cables, for which conductivity is a very important material char-
acteristic. Because of differing availabilities of the five kinds, it was impossible to make as
many samples from alloys 2 and 3 as from other alloys. Given next are the coded conduc-
tivity measurements from samples of wire made from each of the alloys. Determine whether
the alloys have significantly different conductivities. If the conductivities are significantly
different (at α = 0.05), determine which alloys differ from one another. Specify and check
the model assumptions.
Table 10.7:
Alloy
1 2 3 4 5
60.60 58.88 62.90 60.72 57.93
58.93 59.43 63.63 60.41 59.85
58.40 59.30 62.33 59.60 61.06
58.63 56.97 63.27 59.27 57.31
60.64 59.02 61.25 59.79 61.28
59.05 58.59 62.67 62.35 59.68
59.93 60.19 61.29 60.26 57.82
60.82 57.99 60.77 60.53 59.29
58.77 59.24 58.91 58.65
59.11 57.38 58.55 61.96
61.40 61.20 57.96
59.00 59.73 59.42
60.12 59.40
60.49 60.30
60.15
and
E(MSe) = σ 2 .
Is the variance of MSe smaller than the variance of Si2 ? Why?
Problem 10.8 To study the correlation between solar insolation and wind speed in the United States, 26 National Weather Service stations used three different types of solar collectors (2D Tracking, NS Tracking and EW Tracking) to collect the solar insolation and wind speed data. An engineer wishes to compare whether these three collectors give significantly different measurements of wind speed. The wind speed values corresponding to attainment of 95% integrated insolation are reported in Table 10.8.
Are there statistically significant differences in measurement among the three different apertures? Specify and check the model assumptions.
Table 10.8:
Station No. Site Latitude 2D Tracking NS Tracking EW Tracking
1 Brownsville, Texas 25.900 11.0 11.0 11.0
2 Apalachicola, Fla. 29.733 7.9 7.9 8.0
3 Miami, Fla. 25.800 8.7 8.6 8.7
4 Santa Maria, Calif. 34.900 9.6 9.7 9.5
5 Ft. Worth, Texas 32.833 10.8 10.7 10.9
6 Lake Charles, La. 30.217 8.5 8.4 8.6
7 Phoenix, Ariz. 33.433 6.6 6.6 6.5
8 El Paso, Texas 31.800 10.3 10.3 10.3
9 Charleston, S.C. 32.900 9.2 9.1 9.2
10 Fresno, Calif. 36.767 6.2 6.3 6.1
11 Albuquerque, N.M. 35.050 9.0 9.0 8.9
12 Nashville, Tenn. 36.117 7.7 7.6 7.7
13 Cape Hatteras, N.C 35.267 9.2 9.2 9.3
14 Ely, Nev. 39.283 10.0 10.1 10.1
15 Dodge City, Kan. 37.767 12.0 11.9 12.0
16 Columbia, Mo. 38.967 9.0 8.9 9.1
17 Washington, D.C. 38.833 9.3 9.1 9.5
18 Medford, Ore. 42.367 6.8 6.9 6.5
19 Omaha, Neb. 41.367 10.4 10.3 10.5
20 Madison, Wis. 43.133 9.5 9.5 9.6
21 New York, N.Y. 40.783 10.4 10.3 10.4
22 Boston, Mass. 42.350 11.4 11.2 11.4
23 Seattle, Wash. 47.450 9.0 9.0 9.1
24 Great Falls, Mont. 47.483 12.9 12.6 13.0
25 Bismarck, N.D. 46.767 10.8 10.7 10.8
26 Caribou, Me. 46.867 11.4 11.3 11.5
Chapter 11
The Simple Linear Regression Model
11.1 An example
Consider the following example:
Example 11.1 Due to differences in the cooling rates when rolled, the average elastic limit and the ultimate strength of reinforcing metal bars are determined by the bar size. The measurements in Table 11.1 (in hundreds of pounds per square inch) were obtained from a sample of bars.
The experimental units (metal bars) are numbered from 1 to 35. Notice that each experimental unit, i, gave rise to three different measurements: the bar diameter, xi; the elastic limit, yi; and the ultimate strength, zi.
We will investigate the relationship between the variables xi and yi. Likewise, the relationship between the variables xi and zi can be investigated in an analogous way (see Problem 11.2).
First of all we notice that the roles of yi and xi are different. Reasonably, one must assume
that the elastic limit, yi , of the ith metal bar is somehow determined (or influenced) by the
diameter, xi , of the bar. Consequently, the variable yi can be considered as a dependent or
response variable and the variable xi can be considered as an independent or explanatory
variable.
A quick look at Figure 11.1 (a) will show that there is not an exact (deterministic)
relationship between xi and yi . For example, bars with the same diameter (3, say) have
different elastic limits (436.82, 449.40 and 412.63). However, the plot of yi versus xi shows
that in general, larger values of xi are associated with smaller values of yi .
In cases like this we say that the variables are statistically related, in the sense that the
average elastic limit is a decreasing function, f (xi ), of the diameter.
Each elastic limit measurement, yi , can be viewed as a particular value of the random
variable, Yi , which in turn can be expressed as the sum of two terms, f (xi ) and εi . That is,
\[
Y_i = f(x_i) + \varepsilon_i, \qquad i = 1, \dots, n. \qquad\qquad (11.1)
\]
It is usually assumed that the random variables εi satisfy the following assumptions: (1) E(εi) = 0; (2) Var(εi) = σ², the same for all i; (3) the εi are independent; and (4) each εi is normally distributed. These assumptions can be summarized by simply saying that the variables Yi's are independent normal random variables with E(Yi) = f(xi) and Var(Yi) = σ².
The model (11.1) above is called linear if the function f(xi) can be expressed in the form
\[
f(x_i) = \beta_0 + \beta_1\,g(x_i),
\]
where the function g(x) is completely specified, and β0 and β1 are (usually unknown) param-
eters.
[Panels (a) and (c): elasticity limit against bar diameter. Panels (b) and (d): residuals against bar diameter.]
Figure 11.1
The linear model,
\[
Y_i = \beta_0 + \beta_1\,g(x_i) + \varepsilon_i,
\]
is very flexible, as many possible mean response functions, f(xi), satisfy the linear form given above. For example, the functions f(xi) = β0 + β1 xi (with g(x) = x) and f(xi) = β0 + β1 xi² (with g(x) = x²) are linear in this sense, whereas a mean response function such as
\[
f(x_i) = \frac{\exp\{\beta_1 x_i\}}{1 + \exp\{\beta_1 x_i\}}
\]
cannot be written in this form.
The shape assumed for f (xi ) is sometimes suggested by scientific or physical considera-
tions. In other cases, as in the present example, the shape of f (xi ) is suggested by the data
itself. The plot of yi versus xi (see Figure 11.1) indicates that, at least in principle, the simple
linear mean response function f(xi) = β0 + β1 xi (that is, g(x) = x) may be adequate. Given tentative values b0 and b1 for β0 and β1, the residuals
\[
r_i(b_0, b_1) = y_i - b_0 - b_1 x_i, \qquad i = 1, \dots, n,
\]
measure the vertical distances between the observed value, yi, and the tentatively estimated mean response function, b0 + b1 xi.
The method of least squares consists of finding the values β̂0 and β̂1 of b0 and b1, respectively, which minimize the sum of the squares of the residuals. It is expected that, because of this minimization property, the corresponding estimated mean response function, β̂0 + β̂1 xi, will be close to the observed data. That is, β̂0 and β̂1 achieve
\[
\min_{b_0, b_1} \sum_{i=1}^{n} r_i^2(b_0, b_1) = \min_{b_0, b_1} \sum_{i=1}^{n} [y_i - b_0 - b_1 x_i]^2.
\]
To find the actual values of βˆ0 and βˆ1 , we differentiate the function
\[
S(b_0, b_1) = \sum_{i=1}^{n} r_i^2(b_0, b_1)
\]
with respect to b0 and b1 , and set these derivatives equal to zero to obtain the so called LS
equations:
\begin{align*}
\frac{\partial}{\partial b_0} S(b_0, b_1) &= -2 \sum_{i=1}^{n} [y_i - b_0 - b_1 x_i] = 0 \\
\frac{\partial}{\partial b_1} S(b_0, b_1) &= -2 \sum_{i=1}^{n} [y_i - b_0 - b_1 x_i]\,x_i = 0,
\end{align*}
so that
\begin{align*}
\sum_{i=1}^{n} y_i - n b_0 - b_1 \sum_{i=1}^{n} x_i &= 0 \\
\sum_{i=1}^{n} y_i x_i - b_0 \sum_{i=1}^{n} x_i - b_1 \sum_{i=1}^{n} x_i^2 &= 0,
\end{align*}
or equivalently,
\[
\bar{y} - b_0 - b_1 \bar{x} = 0, \qquad\qquad (11.3)
\]
\[
\overline{xy} - b_0 \bar{x} - b_1 \overline{xx} = 0, \qquad\qquad (11.4)
\]
where
\[
\overline{xy} = \frac{1}{n}\sum_{i=1}^{n} y_i x_i \quad\text{and}\quad \overline{xx} = \frac{1}{n}\sum_{i=1}^{n} x_i^2.
\]
From equation (11.3) we get
\[
b_0 = \bar{y} - b_1 \bar{x}. \qquad\qquad (11.5)
\]
Substituting (11.5) into (11.4),
\begin{align*}
b_1 \overline{xx} &= \overline{xy} - b_0 \bar{x} \\
&= \overline{xy} - [\bar{y} - b_1 \bar{x}]\,\bar{x} \\
&= \overline{xy} - \bar{y}\,\bar{x} + b_1 \bar{x}\,\bar{x} \\
\Longrightarrow\quad b_1 &= \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{xx} - \bar{x}\,\bar{x}}.
\end{align*}
Therefore the least squares estimates are
\[
\hat{\beta}_1 = \frac{\overline{xy} - \bar{y}\,\bar{x}}{\overline{xx} - \bar{x}\,\bar{x}}, \quad\text{and}\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.
\]
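These closed-form expressions translate directly into code. The following Python sketch (function and variable names are illustrative) computes β̂0 and β̂1 from the sample means used above.

\begin{verbatim}
import numpy as np

def ls_fit(x, y):
    # Least squares estimates based on the sample means x-bar, y-bar, xy-bar and xx-bar.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    xybar, xxbar = (x * y).mean(), (x * x).mean()
    b1 = (xybar - xbar * ybar) / (xxbar - xbar * xbar)
    b0 = ybar - b1 * xbar
    return b0, b1

# e.g. b0, b1 = ls_fit(diameter, elastic_limit), with the data of Table 11.1.
\end{verbatim}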
In the case of our numerical example we have $\bar{x} = 7.086$, $\bar{y} = 336.565$, $\overline{xy} = 2162.353$ and $\overline{xx} = 57.657$. Therefore,
\[
\hat{\beta}_1 = \frac{2162.353 - (7.086)(336.565)}{57.657 - (7.086)^2} = -29.86.
\]
If the specified mean response function is correct, so that the residuals ei approximate the errors
\[
\varepsilon_i = y_i - \beta_0 - \beta_1 x_i,
\]
one would expect that the plot of the ei versus the xi will not show any particular pattern. In
other words, if the specified mean response function is correct, the estimated mean response
function fˆ(xi ) should “extract” most of the signal (systematic behavior) contained in the
data and the residuals, ei , should behave as patternless random noise.
Now that the tentatively specified simple transformation
g(x) = x
for the explanatory variable, x, is considered to be incorrect, the next step in the analysis is
to specify a new transformation. We will try the mean response function f(xi) = β0 + β1 xi², that is, g(x) = x². If the elastic limit is essentially determined by the cross-sectional area of the bar,
\[
(\text{diameter})^2\,\frac{\pi}{4}\,\Big(\tfrac{1}{8}\ \text{inch}\Big)^2 = x_i^2\,(\pi/4)\,\Big(\tfrac{1}{8}\ \text{inch}\Big)^2,
\]
then the newly proposed mean response function will be appropriate.
we will write
wi = xi²,
to represent the squared diameter of the ith metal bar.
The new estimates for β0 and β1 and f (x) are
\[
\hat{\beta}_1 = \frac{\overline{wy} - \bar{y}\,\bar{w}}{\overline{ww} - \bar{w}\,\bar{w}} = -2.022, \quad\text{and}\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{w}.
\]
It is not difficult to show that, if the model is correct, the estimates β̂0 and β̂1 are unbiased,
\[
E(\hat{\beta}_0) = \beta_0 \quad\text{and}\quad E(\hat{\beta}_1) = \beta_1,
\]
and
\[
\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} [w_i - \bar{w}]^2}.
\]
The error variance σ² is estimated by
\[
s^2 = \frac{\sum_{i=1}^{n} [Y_i - \hat{\beta}_0 - \hat{\beta}_1 w_i]^2}{n - 2}.
\]
In summary, the empirically estimated standard deviations of β̂0 and β̂1 are
\[
SD(\hat{\beta}_0) = s\,\sqrt{\frac{1}{n} + \frac{\bar{w}^2}{\sum_{i=1}^{n}[w_i - \bar{w}]^2}}
                 = s\,\sqrt{\frac{1}{n} + \frac{\bar{w}^2}{n\,[\overline{ww} - \bar{w}\,\bar{w}]}}
\]
and
\[
SD(\hat{\beta}_1) = \frac{s}{\sqrt{\sum_{i=1}^{n}[w_i - \bar{w}]^2}} = \frac{s}{\sqrt{n\,[\overline{ww} - \bar{w}\,\bar{w}]}}.
\]
In the case of our example,
\[
SD(\hat{\beta}_1) = \sqrt{\frac{86.53}{35\,(4993.429 - 57.657^2)}} = 0.0385.
\]
Confidence Intervals
95% confidence intervals for the model parameters, β0 and β1 , and also for the mean
response, f (x), can now be easily obtained. First we derive the 95% confidence intervals for
β0 and β1. As before, the intervals are of the form
\[
\hat{\beta}_j \pm t_{(n-2)}(0.05)\,SD(\hat{\beta}_j), \qquad j = 0, 1,
\]
where t(n−2)(0.05) is the appropriate critical value from the Student's t table.
In the case of our example, n − 2 = 35 − 2 = 33, t(33) (0.05) ≈ t(30) (0.05) = 2.04 and so
respectively.
Notice that, since the confidence interval for β1 does not include the value zero, we conclude that there is a decreasing linear relationship between the square of the bar diameter and its elastic limit. When the squared diameter increases by one unit (1/64 inch²), the average elastic limit decreases by about two hundred psi.
Finally, we can also construct a 95% confidence interval for the average response, f(x), at any given value of x (equivalently, of w = x²). It can be shown that the variance of f̂(w) is
\[
\mathrm{Var}(\hat{f}(w)) = \sigma^2\left[\frac{1}{n} + \frac{(w - \bar{w})^2}{n\,[\overline{ww} - \bar{w}\,\bar{w}]}\right].
\]
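A corresponding computational sketch in Python (scipy assumed; w holds the values wi = xi², and b0, b1 and s stand for the estimates obtained above; all names are illustrative):

\begin{verbatim}
import numpy as np
from scipy import stats

def mean_response_ci(w, b0, b1, s, w0, level=0.95):
    # Confidence interval for the mean response f(w0) = beta0 + beta1 * w0.
    w = np.asarray(w, dtype=float)
    n = len(w)
    se = s * np.sqrt(1.0 / n + (w0 - w.mean()) ** 2 / np.sum((w - w.mean()) ** 2))
    t_crit = stats.t.ppf(1 - (1 - level) / 2, n - 2)   # two-sided critical value
    fit = b0 + b1 * w0
    return fit - t_crit * se, fit + t_crit * se
\end{verbatim}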
11.2 Exercises
Problem 11.1 The numbers of hours needed by twenty employees to complete a certain task were measured before and after they participated in a special training program. The data are displayed in Table 7.2. Notice that these data have already been partially studied in Problem 7.12. Investigate the relationship between the before-training and the after-training times using linear regression. State your conclusions.
Problem 11.2 Investigate the relationship between the bar diameter and the ultimate strength shown in Table 11.1. State your conclusions.
Problem 11.3 Table 11.2 reports the yearly worldwide frequency of earthquakes with magnitude 6 or greater from January, 1953 to December, 1965.
(a) Make scatter-plots of the frequencies against magnitudes and the log-frequencies against
the magnitudes.
(b) Propose your regression model and estimate the coefficients of your model.
(c) Test the null hypothesis that the slope is equal to zero.
Table 11.2:
Magnitude Frequency Magnitude Frequency
6.0 2750 7.4 57
6.1 1929 7.5 45
6.2 1755 7.6 31
6.3 1405 7.7 23
6.4 1154 7.8 18
6.5 920 7.9 13
6.6 634 8.0 9
6.7 487 8.1 7
6.8 376 8.2 7
6.9 276 8.3 4
7.0 213 8.4 2
7.1 141 8.5 2
7.2 110 8.6 1
7.3 85 8.7 1
Problem 11.4 In a certain type of test specimen, the normal stress on a specimen is known
to be functionally related to the shear resistance. The following is a set of experimental data
on the variables.
Problem 11.5 The amounts of a chemical compound y, which dissolved in 100 grams of
water at various temperatures, x, were recorded as follows:
x (°C)      y (grams)
0 8 6 8
15 12 10 14
30 25 21 24
45 31 33 28
60 44 39 42
75 48 51 44
(a) Find the equation of the regression line.
(b) Estimate the amount of chemical that will dissolve in 100 grams of water at 50°C.
(c) Test the hypothesis that β0 = 6, using a 0.01 level of significance, against the alternative that β0 ≠ 6.
(d) Is the linear model adequate?
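A possible starting point for this problem in Python (scipy assumed; the arrays repeat each temperature once for every one of its three measurements, following the table above):

\begin{verbatim}
import numpy as np
from scipy import stats

temps = [0, 15, 30, 45, 60, 75]
grams = [[8, 6, 8], [12, 10, 14], [25, 21, 24], [31, 33, 28], [44, 39, 42], [48, 51, 44]]

x = np.repeat(temps, 3).astype(float)
y = np.array(grams, dtype=float).ravel()

fit = stats.linregress(x, y)               # slope, intercept, r, p-value, standard errors
y_at_50 = fit.intercept + fit.slope * 50   # part (b): estimated amount dissolved at 50 C
# fit.intercept_stderr can be used for the test about beta0 in part (c), and a plot of the
# residuals y - (fit.intercept + fit.slope * x) against x helps with part (d).
\end{verbatim}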
Chapter 12
Appendix