
third edition

Applied Statistics
for Engineers
and Scientists

Jay Devore
California Polytechnic State University, San Luis Obispo

Nicholas Farnum
California State University, Fullerton

Jimmy Doi
California Polytechnic State University, San Luis Obispo

Australia Brazil Mexico Singapore United Kingdom United States

Applied Statistics for Engineers and Scientists, Third Edition
Jay Devore, Nicholas Farnum, Jimmy Doi

Publisher: Richard Stratton
Senior Sponsoring Editor: Molly Taylor
Development Editor: Laura Wheel
Editorial Assistant: Danielle Hallock
Associate Media Editor: Andrew Coppola
Brand Manager: Gordon Lee
Content Project Manager: Jill Quinn
Senior Art Director: Linda May
Manufacturing Planner: Sandee Milewski
Rights Acquisition Specialist: Shalice Shah-Caldwell
Production Service: Prashant Kumar Das, MPS Limited
Text and Cover Designer: Jenny Willingham
Cover Image: Female Scientist: wavebreakmedia/Shutterstock.com; Solar Panels: portumen/Shutterstock.com; Nanotubes: PASIEKA/SPL/Getty Images
Compositor: MPS Limited

© 2014, 2005, 2000 Cengage Learning
WCN: 02-200-203

ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher.

For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706.
For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions. Further permissions questions can be emailed to [email protected].

Library of Congress Control Number: 2013944181
ISBN-13: 978-1-133-11136-8
ISBN-10: 1-133-11136-X

Cengage Learning
200 First Stamford Place, 4th Floor
Stamford, CT 06902
USA

Cengage Learning is a leading provider of customized learning solutions with office locations around the globe, including Singapore, the United Kingdom, Australia, Mexico, Brazil and Japan. Locate your local office at international.cengage.com/region

Cengage Learning products are represented in Canada by Nelson Education, Ltd.

For your course and learning solutions, visit www.cengage.com

Purchase any of our products at your local college store or at our preferred online store www.cengagebrain.com

Instructors: Please visit login.cengage.com and log in to access instructor-specific resources.

Printed in the United States of America

1 2 3 4 5 6 7 17 16 15 14 13

This book is dedicated to

My grandsons, Philip and Elliot


J.L.D.

My grandchildren, Ava and Leo


N.R.F.

My wife and daughter, Midori and Alicia


J.A.D.

Contents

1 Data and Distributions, 1


1 Populations, Samples, and Processes, 3
2 Visual Displays for Univariate Data, 10
3 Describing Distributions, 28
4 The Normal Distribution, 36
5 Other Continuous Distributions, 46
6 Several Useful Discrete Distributions, 50
Supplementary Exercises, 58
Bibliography, 60

2 Numerical Summary Measures, 61


1 Measures of Center, 62
2 Measures of Variability, 72
3 More Detailed Summary Quantities, 80
4 Quantile Plots, 90
Supplementary Exercises, 97
Bibliography, 100


3 Bivariate and Multivariate Data and Distributions, 101

1 Scatterplots, 102
2 Correlation, 108
3 Fitting a Line to Bivariate Data, 117
4 Nonlinear Relationships, 132
5 Using More Than One Predictor, 140
6 Joint Distributions, 151
Supplementary Exercises, 157
Bibliography, 160

4 Obtaining Data, 161


1 Operational Definitions, 162
2 Data from Sampling, 166
3 Data from Experiments, 179
4 Measurement Systems, 186
Supplementary Exercises, 192
Bibliography, 193

5 Probability and Sampling Distributions, 194


1 Chance Experiments, 195
2 Probability Concepts, 201
3 Conditional Probability and Independence, 208
4 Random Variables, 215
5 Sampling Distributions, 228
6 Describing Sampling Distributions, 233
Supplementary Exercises, 242
Bibliography, 245

6 Quality and Reliability, 246


1 Terminology, 247
2 How Control Charts Work, 252
3 Control Charts for Mean and Variation, 256
4 Process Capability Analysis, 265
5 Control Charts for Attributes Data, 273
6 Reliability, 283


Supplementary Exercises, 291
Bibliography, 292

7 Estimation and Statistical Intervals, 293


1 Point Estimation, 294
2 Large-Sample Confidence Intervals
for a Population Mean, 298
3 More Large-Sample Confidence Intervals, 307
4 Small-Sample Intervals Based
on a Normal Population Distribution, 318
5 Intervals for μ1 − μ2 Based on Normal Population Distributions, 327
6 Other Topics in Estimation (Optional), 335
Supplementary Exercises, 347
Bibliography, 351

8 Testing Statistical Hypotheses, 352


1 Hypotheses and Test Procedures, 353
2 Tests Concerning Hypotheses About Means, 363
3 Tests Concerning Hypotheses About a
Categorical Population, 380
4 Testing the Form of a Distribution, 394
5 Further Aspects of Hypothesis Testing, 399
Supplementary Exercises, 407
Bibliography, 412

9 The Analysis of Variance, 413


1 Terminology and Concepts, 414
2 Single-Factor ANOVA, 419
3 Interpreting ANOVA Results, 427
4 Randomized Block Experiments, 435
Supplementary Exercises, 441
Bibliography, 444


10 Experimental Design, 445


1 Terminology and Concepts, 446
2 Two-Factor Designs, 453
3 Multifactor Designs, 463
4 2^k Designs, 472
5 Fractional Factorial Designs, 489
Supplementary Exercises, 499
Bibliography, 502

11 Inferential Methods in Regression and Correlation, 503

1 Regression Models Involving
a Single Independent Variable, 504
2 Inferences About the Slope Coefficient β, 517
3 Inferences Based on the Estimated Regression Line, 525
4 Multiple Regression Models, 533
5 Inferences in Multiple Regression, 542
6 Further Aspects of Regression Analysis, 555
Supplementary Exercises, 573
Bibliography, 580

Appendix Tables, 581


Answers to Odd-Numbered Exercises, 604
Index, 629

Preface

Purpose 
The use of statistical models and methods for describing and analyzing data has become
common practice in virtually all scientific disciplines. This book provides a comprehen-
sive introduction to those models and methods most likely to be encountered and used
by students in their careers in engineering and the natural sciences. It is appropriate for
courses of one term (semester or quarter) in duration.

Approach 
Students in a statistics course designed to serve other majors are too often initially skepti-
cal of the value and relevance of the subject matter. Our experience, however, is that
students can be turned on to the subject by the use of good examples and exercises that
blend their everyday experiences with their scientific interests. We have worked hard to
find examples involving real, rather than artificial, data—data that someone thought
was worth collecting and analyzing. Many of the methods presented throughout the
book are illustrated by analyzing data taken from a published source.
The exercises form a very important component of the book. A really good lecturer
can deceive students into thinking they have an excellent mastery of the subject, only
to discover otherwise when they start working problems. We have therefore provided a
rich assortment of exercises designed to reinforce understanding of the material. A sub-
stantial majority of these are based on real data, and we have tried as much as possible
to avoid mathematical manipulation for its own sake. Someone who attempts a good
portion of the exercises will gain a greater appreciation of the scope and applicability of
the subject than would be gleaned simply by reading the text.
Sometimes the reader may be unfamiliar with the context of a particular problem
situation (as indeed we often were), but we believe that students will find scenarios,
such as the one below, more appealing than patently artificial situations dealing with
widgets or brand A versus brand B.

64. The use of microorganisms to dissolve metals from ores has offered an ecologically friendly and less expensive alternative to traditional methods. The dissolution of metals by this method can be done in a two-stage bioleaching process: (1) microorganisms are grown in culture to produce metabolites (e.g., organic acids) and (2) ore is added to the culture medium to initiate leaching. The article "Two-Stage Fungal Leaching of Vanadium from Uranium Ore Residue of the Leaching Stage using Statistical Experimental Design" (Annals of Nuclear Energy, 2013: 48–52) reported on a two-stage bioleaching process of vanadium by using the fungus Aspergillus niger. In one study, the authors examined the impact of the variables x1 = pH, x2 = sucrose concentration (g/L), and x3 = spore population (10^6 cells/ml) on y = oxalic acid production (mg/L). The accompanying SAS output resulted from a request to fit the model with predictors x1, x2, and x3 only.

Source           DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             3          5861301       1953767      7.53   0.0052
Error            11          2855951        259632
Corrected Total  14          8717252

Fitting the complete second-order model resulted in SSResid = 541,632. Carry out a test at significance level .01 to decide whether at least one of the second-order predictors provides useful information about oxalic acid production.
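For readers curious about the mechanics an exercise like this calls for, here is a minimal R sketch of the extra-sum-of-squares (partial) F test, under our assumption that the complete second-order model in three variables has 9 predictors (3 linear, 3 quadratic, 3 interaction), so that its error df is 15 − 10 = 5:

# Extra-sum-of-squares F test comparing the first-order and second-order models
ssr_reduced <- 2855951          # SSResid, first-order model (from the SAS output)
ssr_full    <- 541632           # SSResid, complete second-order model
num_df <- 9 - 3                 # number of second-order terms being tested
den_df <- 15 - (9 + 1)          # error df for the full model
f_stat <- ((ssr_reduced - ssr_full) / num_df) / (ssr_full / den_df)
f_crit <- qf(0.99, num_df, den_df)   # critical value at significance level .01
c(F = f_stat, critical = f_crit)     # reject if F exceeds the critical value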

Mathematical and Computing Level 


The exposition is relatively modest in terms of mathematical development. Limited
use of univariate calculus is made in the first two chapters, and a bit of univariate and
multivariate calculus is employed later on. Matrix algebra appears nowhere in the book.
Thus virtually all of the exposition should be accessible to those whose mathematical
background includes one semester or two quarters of differential and integral calculus.
The computer is an indispensable tool these days for organizing, displaying, and ana-
lyzing data. We have included many examples, as illustrated on the next page, of output
from the most widely used statistical computer packages, including Minitab, SAS, R, and
JMP, both to convince students that the statistical methods discussed herein are available
in these packages and to expose them to the format and contents of typical output. Because
availability of packages and nature of platforms vary widely from institution to institution,
we decided not to include instructions for obtaining output from any particular package.
Based on our experience, it should be straightforward to supplement the text by indepen-
dently introducing students to any one of the aforementioned packages. They can then be
asked to use the computer in working the many problems that contain raw data.

Example 10.2 Over the past decade researchers and consumers have shown increased interest
in renewable fuels such as biodiesel, a form of diesel fuel derived from vegetable
oils and animal fats. According to www.fueleconomy.gov, compared to petroleum
diesel, the advantages of using biodiesel include its nontoxicity, biodegradability,

and lower greenhouse gas emissions. One popular biodiesel fuel is fatty acid ethyl
ester (FAEE). The authors of “Application of the Full Factorial Design to Opti-
mization of Base-Catalyzed Sunflower Oil Ethanolysis” (Fuel, 2013: 433−442)
performed an experiment to determine optimal process conditions for producing
FAEE from the ethanolysis of sunflower oils. In one study, the effects of three pro-
cess factors on FAEE purity (%) were investigated.

Factor Factor name Factor levels


A Reaction Temperature 25°C, 50°C, 75°C
B Ethanol-to-oil molar ratio 6:1, 9:1, 12:1
C Catalyst loading .75 wt.%, 1.00 wt.%, 1.25 wt.%

(See page 467 for the complete data.) Plots of all two-factor interactions are
shown in Figure 10.18, along with the main effects plots for the three factors.
Suppose we are interested in maximizing the value of the response variable,
FAEE purity. Looking at the interaction plots, the combination of factor levels
that best accomplishes this objective is A = 75°C, B = 12:1, and C = 1.25%. In
this example, the conclusions from the interaction plots agree with the conclusions
that we would have drawn from inspecting the main effects plots.

[Figure 10.18 Two-factor interaction plots ("Interaction Plots for FAEE") and main effects plots ("Main Effects Plots for FAEE") for Example 10.2: mean FAEE purity plotted against the levels of TEMP, RATIO, and LOAD]
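Plots like those in Figure 10.18 can be drawn in base R with interaction.plot(). Here is a minimal sketch using a hypothetical data frame; the purity values below are invented for illustration, since the actual measurements appear on page 467:

# Hypothetical stand-in for the page-467 data; purity values are made up
faee <- expand.grid(TEMP = c(25, 50, 75), RATIO = c(6, 9, 12),
                    LOAD = c(0.75, 1.00, 1.25))
faee$purity <- 85 + 0.08 * faee$TEMP + 0.4 * faee$RATIO + 2 * faee$LOAD
# One panel of a two-factor interaction plot: mean purity vs. TEMP, one line per RATIO
with(faee, interaction.plot(TEMP, RATIO, purity, ylab = "Mean FAEE purity (%)"))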

Focus and Content 


We have written this book for an audience whose primary interest is in statistical meth-
odology and the analysis of data. The ordering of topics herein is rather different from
what is found in virtually all competing texts. The usual approach is to inject a heavy
dose of probability at the outset, then develop probability distributions and use these as
a basis for inferential methods (drawing conclusions from data). Unfortunately, an intro-
ductory one-term course rarely allows sufficient time for comprehensive treatments of
both probability and statistical inference. If probability is emphasized, statistics gets short
shrift. An additional problem is that many students find probability to be a difficult and

intimidating subject, so starting out in this way creates an aura of mathematical formal-
ism that makes it all too easy to lose sight of the applied and practical aspects of statistics.
Certainly descriptive statistical methods can be developed in detail with virtu-
ally no probability background, and even an understanding of the most commonly
used inferential techniques requires familiarity with only the most basic of probability
properties. So we decided to proceed along a path first blazed by David Moore and
George McCabe in their book Introduction to the Practice of Statistics, written for a non-
science audience. In their Chapter 1, the normal distribution is introduced and em-
ployed to address many interesting questions, whereas probability does not surface
until much later in the book. Our Chapter 1 first presents some basic concepts and
terminology, continues with an introduction to some descriptive techniques, and then
extends the notion of a histogram for sample data to a distribution of values for an entire
population or process. This allows us to develop and use not only the family of normal
distributions but also other continuous and discrete distributions such as the lognormal,
Weibull, Poisson, and binomial. Chapter 2 covers numerical summary measures for
sample data (e.g., the sample mean x̄ and sample standard deviation s) in tandem with
analogous measures for populations and processes (e.g., the population or process mean
μ and standard deviation σ).
The focus of the first two chapters is on univariate data (observations on or values of
a single variable, such as tensile strength). In the third chapter we consider descriptive
methods for bivariate data (e.g., measuring both thickness and strength for wire speci-
mens) and then multivariate data, emphasizing in particular correlation and regression.
This chapter should be especially useful for courses in which there is insufficient time
to cover regression models from a probabilistic viewpoint (such models and inferences
based on them are the subject of Chapter 11).
Most other books intended for our target audience say rather little about how data
is obtained. Yet statistics has much to say not only about how to analyze data once it is
available but also about sensible and efficient techniques for collecting data. Several
lower-level texts, notably the one by Moore and McCabe cited earlier, successfully and
entertainingly covered this territory prior to probability and inference, and we follow
their lead with our Chapter 4. Sampling and experimental design are discussed, and the
last section contains an introduction to various aspects of measurement.
At last probability makes its appearance in Chapter 5. Our minimalist treatment
of this subject is intended to move readers expeditiously into the inferential part of the
book. Since only the notion of probability as limiting or long-run relative frequency is
needed to understand the basis for most of the usual inferential procedures, little time
is spent on topics such as addition and multiplication rules and conditional probability,
and no material on counting techniques is included here (combinations enter briefly in
Chapter 1 in connection with the binomial distribution). The concept of a random vari-
able and its probability distribution is then introduced and related to the distributional
material in Chapter 1. Finally, the notion of a statistic and its sampling distribution is
discussed and illustrated.
The remaining six chapters focus on the most widely used methods from statistical
inference. Descriptive techniques from earlier chapters, such as boxplots and quantile
plots, are employed in many of our examples. Chapter 6 covers topics from quality con-
trol and reliability. Estimation and various statistical intervals—confidence, prediction,

and tolerance—are introduced in Chapter 7. Hypothesis testing is discussed in Chap-


ter 8. Chapter 9 covers the analysis of variance for comparing more than two popula-
tions or treatments, and these ideas are extended in Chapter 10 to the analysis of data
from designed multifactor experiments. Finally, regression models and associated infer-
ential procedures are covered in Chapter 11.

Some Suggestions Concerning Coverage 


It should be possible to cover virtually all the material in the book in a semester-long
course that meets four hours per week. For a course of this duration that meets only
three times per week or for a one-quarter course, some pruning will have to be done
(perhaps combined with reading assignments on topics not discussed in lecture). The
first four sections of Chapter 1 are essential, but Section 5 on other (than the normal)
continuous distributions and Section 6 on the binomial and Poisson distributions can
be covered very lightly or even omitted altogether. The first two sections of Chapter 2,
on measures of center and spread, are also required. The material on more detailed
summary measures (e.g., boxplots) in Section 3 can be just touched on or skipped, and
quantile plots from Section 4 can be presented very quickly.
When time does not allow for coverage of inferences in regression, we strongly
recommend that at least a bit of bivariate descriptive methods from Chapter 3 be cov-
ered. At minimum, this could consume just two or three one-hour lectures in which
scatterplots, correlation, and fitting a line by least squares are discussed. More time
would provide the opportunity to introduce r² as an assessment of fit, nonlinear rela-
tionships, and even multiple regression. If inference in regression is to be covered, this
chapter can be skipped over for the moment and then combined with Chapter 11 at the
end of the course.
Chapter 4, on obtaining data, can be covered next or postponed until later. There
is no mathematics here, only some definitions and examples, so this is one place where
a minimal amount of lecture time can be expended along with a request that students
read on their own. Most of Chapter 5 is crucial; inferential methods cannot be under-
stood without a modest exposure to probability and sampling distributions of various
statistics. The quality control and reliability techniques of Chapter 6 are attractive ap-
plications of sampling distribution and probability properties. When time is limited,
as few as two lectures might be devoted to some general concepts and a single type of
control chart. Another possibility is to postpone this material until after hypothesis test-
ing has been introduced.
From this point on, it is local option as to what is covered and in how much detail.
We certainly believe that students deserve at least minimal exposure to point estima-
tion, confidence intervals, and hypothesis testing. Time may permit presentation of
just some selected one-sample procedures (Sections 7.1, 7.2, 8.1, and perhaps a bit of
Sections 7.4 and 8.2). A longer course would accommodate topics from among predic-
tion and tolerance intervals, two-sample situations, chi-squared tests, testing the plausi-
bility of some particular type of distribution (e.g., testing the assumption that the data
came from a normal distribution), analysis of variance and experimental design, and
more on regression.


Changes for the Third Edition


• There are nearly 200 new exercises and 40 new examples, most of which include real data or other information from published sources.
• Chapter 1 contains a new subsection on "The Scope of Modern Statistics" to illustrate how statisticians continue to develop new methodology while working on problems in a wide spectrum of disciplines.
• Section 8.3, on hypothesis testing based on categorical data, now contains a subsection on Fisher's Exact Test, a useful alternative when assumptions for the standard chi-squared test fail.
• Section 11.6, on regression, now contains a subsection on the multiple logistic regression model, which accommodates multiple predictor variables for a dichotomous response.
• In general, the exposition has been polished, tightened, and improved.

Acknowledgments 
We greatly appreciate the feedback and useful advice from the many individuals who
reviewed various parts of our manuscript: Christine Anderson-Cook, Virginia Tech;
Olcay Arslan, St. Cloud State; Peyton Cook, The University of Tulsa; Jean-Yves “Pip”
Courbois, University of Washington; Charles Donaghey, University of Houston; Dale O.
Everson, University of Idaho; William P. Fox, United States Military Academy; William
Fulkerson, Deere & Company; Roger Hoerl, General Electric Company; Marianne
Huebner, Michigan State University; Alan M. Johnson, University of Arkansas, Little
Rock; Steven L. Johnson, University of Arkansas; Janusz Kawczak, University of North
Carolina, Charlotte; Mohammed Kazemi, University of North Carolina, Charlotte;
David P. Kessler, Purdue University; Barbara McKinney, Western Michigan University;
Jang W. Ra, University of Alaska, Anchorage; John Ramberg, University of Arizona;
Stephen E. Rigdon, Southern Illinois University at Edwardsville; Amy L. Rocha,
San Jose State University; Joe Romano, Stanford University; Lewis H. Shoemaker,
Millersville University; and Paul Wilson, Rochester Institute of Technology.
The editorial and production services provided by numerous people from
Cengage Learning are greatly appreciated, especially the support of Shaylin Walsh,
Laura Wheel, and Jill Quinn. It was indeed a great pleasure to have Prashant Kumar
Das overseeing production of the book; his attention to detail, timely feedback, and
willingness to tolerate the authors’ idiosyncrasies made our work during production
much more tolerable than would otherwise have been the case. A special thanks goes
to Soma Roy for her accuracy checking and work on the solutions manuals. Finally, the
continuing support of family, colleagues, and friends has helped smooth out the bumps
in the road. We are truly grateful to all of you.

1
Data and Distributions
1.1 Populations, Samples, and Processes
1.2 Visual Displays for Univariate Data
1.3 Describing Distributions
1.4 The Normal Distribution
1.5 Other Continuous Distributions
1.6 Several Useful Discrete Distributions

Introduction
Statistical concepts and methods are not only useful but indeed often indispensable in
understanding the world around us. They provide ways of gaining new insights into the
behavior of many phenomena that you will encounter in your chosen field of specializa-
tion in engineering or science.
The discipline of statistics teaches us how to make intelligent judgments and in-
formed decisions in the presence of uncertainty and variation. Without uncertainty
or variation, there would be little need for statistical methods or statisticians. If every
component of a particular type had exactly the same lifetime, if all resistors produced
by a certain manufacturer had the same resistance value, if pH determinations for soil
specimens from a particular locale gave identical results, and so on, then a single obser-
vation would reveal all desired information.
An interesting manifestation of variation appeared in connection with an effort
to determine the “greenest” way to travel. The article titled “Carbon Conundrum”
( , 2008: 9) described websites that help consumers calculate carbon
output. The results for carbon output for a flight from New York to Los Angeles appear
in the accompanying table.


Carbon Calculator CO2 (lb)


Terra Pass 1924
Conservation International 3000
Cool It 3049
World Resources Institute/Safe Climate 3163
National Wildlife Federation 3465
Sustainable Travel International 3577
Native Energy 3960
Environmental Defense 4000
Carbonfund.org 4820
The Climate Trust/CarbonCounter.org 5860
Bonneville Environmental Foundation 6732

Substantial disagreement clearly exists among these online calculators as to exactly
how much carbon is emitted, characterized in the article as “from a ballerina’s to
Bigfoot’s.” A website also was provided where readers could learn more about how the
various calculators work.
How can statistical techniques be used to gather information and draw conclusions?
Suppose, for example, that a materials engineer has developed a coating for retarding cor-
rosion in metal pipe under specified circumstances. If this coating is applied to different
segments of pipe, variation in environmental conditions and in the segments themselves will
result in more substantial corrosion on some segments than on others. Methods of statisti-
cal analysis could be used on data from such an experiment to decide whether the
amount of corrosion exceeds an upper specification limit of some sort or to predict how
much corrosion will occur on a single piece of pipe.
Alternatively, suppose the engineer has developed the coating in the belief that it will
be superior to the currently used coating.   A comparative experiment could be carried out
to investigate this issue by applying the current coating to some segments of pipe and the
new coating to other segments. This must be done with care lest the wrong conclusion
emerge. For example, perhaps the average amount of corrosion is identical for the two
coatings. However, the new coating may be applied to segments that have superior ability
to resist corrosion and under less stressful environmental conditions compared to the seg-
ments and conditions for the current coating.   The investigator would then likely observe
a difference between the two coatings attributable not to the coatings themselves but just
to extraneous variation. Statistics offers not only methods for analyzing the results of ex-
periments once they have been carried out but also suggestions for how experiments can
be performed in an efficient manner to mitigate the effects of variation and have a better
chance of producing correct conclusions.
In Chapters 1–3, we concentrate on describing and summarizing statistical informa-
tion obtained from populations or processes under investigation. Chapter 4 discusses
how information can be collected either by the mechanism of sampling or by designing
and carrying out an experiment. Chapter 5 formalizes the notion of randomness and un-
certainty by introducing the language of probability. The remainder of the book focuses
on the development of inferential methods for drawing interesting conclusions from data
in a wide variety of situations.   We hope you will find the subject matter and our presenta-
tion to be as interesting, relevant, and exciting as we do.

1.1 Populations, Samples, and Processes 


Engineers and scientists are constantly exposed to collections of facts, or data, both in their
professional capacities and in everyday activities. The discipline of statistics provides meth-
ods for organizing and summarizing data and for drawing conclusions based on informa-
tion contained in the data.
An investigation will typically focus on a well-defined collection of objects constitut-
ing a population of interest. In one study, the population might consist of all gelatin cap-
sules of a particular type produced during a specified period. Another investigation might
involve the population consisting of all individuals who received a B.S. in engineering
during the most recent academic year. When desired information is available for all ob-
jects in the population, we have what is called a census. Constraints on time, money, and
other scarce resources usually make a census impractical or infeasible. Instead, a subset of
the population—a sample—is selected in some prescribed manner. Thus we might obtain
a sample of bearings from a particular production run as a basis for investigating whether
bearings are conforming to manufacturing specifications, or we might select a sample of
last year’s engineering graduates to obtain feedback about the quality of the curricula.
We are usually interested only in certain characteristics of the objects in a population:
the number of flaws on the surface of each casing, the thickness of each capsule wall, the
gender of an engineering graduate, the age at which the individual graduated, and so on.
A characteristic may be categorical, such as gender or type of malfunction, or it may be
numerical in nature. In the former case, the value of the characteristic is a category (e.g.,
female or insufficient solder), whereas in the latter case, the value is a number (e.g., age =
23 years or diameter = .502 cm). A variable is any characteristic whose value may change
from one object to another in the population. We shall generally denote variables by lower-
case letters from the end of our alphabet. Examples include
x = gender of a graduating engineer
y = number of major defects on a newly manufactured automobile
z = braking distance of an automobile under specified conditions
Data results from making observations either on a single variable or simultaneously on
two or more variables. A univariate data set consists of observations on a single variable.
For example, we might determine the type of transmission, automatic (A) or manual (M),
on each of ten automobiles recently purchased at a certain dealership, resulting in the
categorical data set
M A A A M A A M A A
The following sample of lifetimes (hours) of brand X batteries put to a certain use is a nu-
merical univariate data set:
5.6 5.1 6.2 6.0 5.8 6.5 5.8 5.5


We have bivariate data when observations are made on each of two variables. Our data set
might consist of a (height, weight) pair for each basketball player on a team, with the first
observation as (72, 168), the second as (75, 212), and so on. If an engineer determines the
value of both x = component lifetime and y = reason for component failure, the resulting
data set is bivariate with one variable numerical and the other categorical. Multivariate
data arises when observations are made on more than two variables. For example, a re-
search physician might determine the systolic blood pressure, diastolic blood pressure, and
serum cholesterol level for each patient participating in a study. Each observation would be
a triple of numbers, such as (120, 80, 146). In many multivariate data sets, some variables
are numerical and others are categorical. Thus the annual automobile issue of Consumer
Reports gives values of such variables as type of vehicle (small, sporty, compact, mid-size,
large), city fuel efficiency (mpg), highway fuel efficiency (mpg), drivetrain type (rear wheel,
front wheel, four wheel), and so on.

Branches of Statistics
An investigator who has collected data may wish simply to summarize and describe
important features of the data. This entails using methods from descriptive statis-
tics. Some of these methods are graphical in nature—the construction of histograms,
boxplots, and scatterplots are primary examples. Other descriptive methods involve cal-
culation of numerical summary measures, such as means, standard deviations, and cor-
relation coefficients. The wide availability of statistical computer software packages has
made these tasks much easier to carry out than they used to be. Computers are much
more efficient than human beings at calculation and the creation of pictures (once they
have received appropriate instructions from the user!). This means that the investigator
doesn’t have to expend much effort on “grunt work” and will have more time to study
the data and extract important messages. Throughout this book, we will present output
from various packages such as Minitab, SAS, and R. The R software can be downloaded
without charge from www.r-project.org.

Example 1.1 Charity is a big business in the United States. The website charitynavigator.com gives
information on approximately 5500 charitable organizations, and many smaller chari-
ties fly below the navigator’s radar screen. Some charities operate very efficiently, with
fund-raising and administrative expenses only a small percentage of total expenses,
whereas others spend a high percentage of what they take in to perform the same
activities. Here is data on fund-raising expenses as a percentage of total expenditures
for a random sample of 60 charities:
6.1 12.6 34.7 1.6 18.8 2.2 3.0 2.2 5.6 3.8
2.2 3.1 1.3 1.1 14.1 4.0 21.0 6.1 1.3 20.4
7.5 3.9 10.1 8.1 19.5 5.2 12.0 15.8 10.4 5.2
6.4 10.8 83.1 3.6 6.2 6.3 16.3 12.7 1.3 0.8
8.8 5.1 3.7 26.3 6.0 48.0 8.2 11.7 7.2 3.9
15.3 16.6 8.8 12.0 4.7 14.7 6.4 17.0 2.5 16.2
Without any organization, making sense of the data’s most prominent features is dif-
ficult: What is a typical (i.e., representative) value? Are values highly concentrated
about a typical value or are they quite dispersed? Are there any gaps in the data?
What fraction of the values are less than 20%? Figure 1.1 shows what is called a stem-
and-leaf display as well as a histogram. In Section 1.2, we will discuss construction
and interpretation of these data summaries. For the moment, we hope you see how
they begin to describe how the percentages are distributed over the range of possible
values from 0 to 100. A substantial majority of the charities in the sample obviously
spend less than 20% on fund-raising, and only a few percentages might be viewed as
beyond the bounds of sensible practice.

[Figure 1.1 A Minitab stem-and-leaf display (10ths digit truncated) and histogram for the charity fund-raising percentage data]
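Figure 1.1 was produced with Minitab, but the same displays are easy to obtain in R; here is a minimal sketch (the vector name fundrsng is ours):

# Fund-raising expenses as a percentage of total expenditures for 60 charities
fundrsng <- c(6.1, 12.6, 34.7, 1.6, 18.8, 2.2, 3.0, 2.2, 5.6, 3.8,
              2.2, 3.1, 1.3, 1.1, 14.1, 4.0, 21.0, 6.1, 1.3, 20.4,
              7.5, 3.9, 10.1, 8.1, 19.5, 5.2, 12.0, 15.8, 10.4, 5.2,
              6.4, 10.8, 83.1, 3.6, 6.2, 6.3, 16.3, 12.7, 1.3, 0.8,
              8.8, 5.1, 3.7, 26.3, 6.0, 48.0, 8.2, 11.7, 7.2, 3.9,
              15.3, 16.6, 8.8, 12.0, 4.7, 14.7, 6.4, 17.0, 2.5, 16.2)
stem(fundrsng)                       # stem-and-leaf display
hist(fundrsng, xlab = "FundRsng")    # histogram like the one in Figure 1.1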

Having obtained a sample from a population, an investigator would frequently like to
use sample information to draw some type of conclusion (make an inference of some sort)
about the population. That is, the sample is a means to an end rather than an end in itself.


Techniques for generalizing from a sample to a population are gathered within the branch
of our discipline called inferential statistics.

Example 1.2 Material strength investigations provide a rich area of application for statistical
methods. The article “Effects of Aggregates and Microfillers on the Flexural Prop-
erties of Concrete” (Magazine of Concrete Research, 1997: 81–98) reported on
a study of strength properties of high-performance concrete obtained by using
superplasticizers and certain binders. The compressive strength of such concrete
had previously been investigated, but not much was known about flexural strength
(a measure of ability to resist failure in bending). The accompanying data on
flexural strength (in megapascals, MPa, where 1 Pa (pascal) = 1.45 × 10^-4 psi) ap-
peared in the article cited:

5.9 7.2 7.3 6.3 8.1 6.8 7.0 7.6 6.8 6.5 7.0 6.3 7.9 9.0
8.2 8.7 7.8 9.7 7.4 7.7 9.7 7.8 7.7 11.6 11.3 11.8 10.7

Suppose we want an estimate of the average value of flexural strength for all beams
that could be made in this way (if we conceptualize a population of all such beams,
we are trying to estimate the population mean). It can be shown that, with a high
degree of confidence, the population mean strength is between 7.48 MPa and 8.80
MPa; we call this a confidence interval or interval estimate. Alternatively, this data
could be used to predict the flexural strength of a single beam of this type. With a
high degree of confidence, the strength of a single such beam will exceed 7.35 MPa;
the number 7.35 is called a lower prediction bound.
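The interval quoted above is consistent with a standard one-sample t interval; here is a minimal R sketch that reproduces it, assuming (our reading of "high degree of confidence") the usual 95% confidence level:

# Flexural strength data (MPa) from Example 1.2
strength <- c(5.9, 7.2, 7.3, 6.3, 8.1, 6.8, 7.0, 7.6, 6.8, 6.5, 7.0, 6.3, 7.9, 9.0,
              8.2, 8.7, 7.8, 9.7, 7.4, 7.7, 9.7, 7.8, 7.7, 11.6, 11.3, 11.8, 10.7)
t.test(strength)$conf.int    # 95% confidence interval for the mean: about (7.48, 8.80)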

The Scope of Modern Statistics


Statistical methodology is commonly employed by investigators in virtually every disci-
pline, including such areas as
• molecular biology (analysis of microarray data)
• ecology (describing quantitatively how individuals in various animal and plant populations are spatially distributed)
• materials engineering (studying properties of various treatments to retard corrosion)
• marketing (developing market surveys and strategies for marketing new products)
• public health (identifying sources of diseases and ways to treat them)
• civil engineering (assessing the effects of stress on structural elements and the impacts of traffic flows on communities)
As you progress through the book, you’ll encounter a wide spectrum of different
scenarios in the examples and exercises that illustrate the application of techniques
from probability and statistics. Many of these scenarios involve data or other mate-
rial extracted from articles in engineering and science journals. The methods pre-
sented here have become established and trusted tools in the arsenal of those who
work with data. Meanwhile, statisticians continue to develop new models to describe

randomness and uncertainty and new methodology to analyze data. As evidence of
the continuing creative efforts in the statistical community, here are titles and cap-
sule descriptions of some articles that have recently appeared in statistics journals
(Journal of the American Statistical Association is abbreviated JASA, and APS is short
for the Annals of Applied Statistics, just two of the many prominent journals in the
discipline):
• "Application of Branching Models in the Study of Invasive Species" (JASA,
2012: 467–476): Seismologists often predict earthquake occurrences using what
is known as epidemic-type aftershock sequence (ETAS) models. The name stems
from the model feature that allows earthquakes to cause aftershocks, which in turn
may induce subsequent aftershocks, and so on, thereby generating a cascading
effect. The authors propose the use of ETAS models in studying invasive plant
and animal species. In particular, the article considers the spread of an invasive
species in Costa Rica (Musa velutina, or red banana). The authors determine the
estimated spatial–temporal rate of spread of red banana plants using a space–time
ETAS model.
• "Spatio-Spectral Mixed-Effects Model for Functional Magnetic Resonance Im-
aging Data” (JASA, 2012: 568–577): For many years, scientists have attempted
to model cognitive control-related activation among specific regions of the hu-
man brain. Researchers measure this brain activity through functional magnetic
resonance imaging (fMRI). fMRI data often exhibit spatial and temporal correla-
tions (i.e., observations made at nearby locations or time points are often strongly
related). Standard approaches to fMRI analysis, however, fail to incorporate
these relationships. The article proposes a statistical model to study activation
in specific regions in the prefrontal cortex while also incorporating the underly-
ing spatio–temporal correlations. The authors provide a simulation study that
shows that significant errors can occur by ignoring the correlation structure in
the network.
• "Active Learning Through Sequential Design, with Applications to the Detection
of Money Laundering” (JASA, 2009: 969–981): Money laundering involves con-
cealing the origin of funds obtained through illegal activities. The huge number
of transactions occurring daily at financial institutions makes detection of money
laundering difficult. The standard approach has been to extract various summary
quantities from the transaction history and conduct a time-consuming investigation
of suspicious activities. The article proposes a more efficient statistical method and
illustrates its use in a case study.
• "Robust Internal Benchmarking and False Discovery Rates for Detecting Racial
Bias in Police Stops” (JASA, 2009: 661–668): Allegations of police actions that are
at least partly attributable to racial bias have become a contentious issue in many
communities. This article proposes a new method that is designed to reduce the
risk of flagging a substantial number of “false positives” (individuals falsely identi-
fied as manifesting bias). The method was applied to data on 500,000 pedestrian
stops from New York City in 2006; 15 officers from the pool of 3000 regularly in-
volved in pedestrian stops were identified as having stopped a substantially greater
fraction of black and Hispanic people than what would be predicted if bias were
absent.


• "Measuring the Vulnerability of the Uruguayan Population to Vector-Borne Diseases
via Spatially Hierarchical Factor Models" (APS, 2012: 284–303): Vector-borne
diseases are illnesses caused by infections transmitted to people by organisms such
as insects and spiders. According to the World Health Organization, the most deadly
vector-borne disease is malaria, which kills more than 1 million people annually,
mostly African children under age five. The authors develop a statistical index
to model the vulnerability of Uruguayans to vector-borne diseases by accounting
for variation attributable to factors such as different census tracts within cities and
different cities in the country.
• "Self-Exciting Hurdle Models for Terrorist Activity" (APS, 2012: 106–124): The
authors develop a predictive model of terrorist activity by considering the daily
number of terrorist attacks in Indonesia from 1994 through 2007. The model
estimates the chance of future attacks as a function of the times since past attacks.
One feature of the model considers the excess of nonattack days coupled with
the presence of multiple coordinated attacks on the same day. The article pro-
vides an interpretation of various model characteristics and assesses its predictive
performance.
• "The BARISTA: A Model for Bid Arrivals in Online Auctions" (APS, 2007:
412–441): Online auctions such as those on eBay and uBid often have char-
acteristics that differentiate them from traditional auctions. One particularly
important such property is that the number of bidders at the outset of many
traditional auctions is fixed, whereas in online auctions this number and the
number of resulting bids are not predetermined. The article proposes a new
BARISTA (for Bid ARrivals In STAges) model for describing the way in which
bids arrive that allows for higher bidding intensity not only at the outset of
the auction but also as the auction comes to a close. Various properties of the
model are investigated and then validated using data from eBay.com on auc-
tions for Palm M515 personal assistants, Microsoft Xbox games, and Cartier
watches.

Statistical information now appears with increasing frequency in the popular media, and
occasionally the spotlight is even turned on statisticians. For example, “Behind Cancer
Guidelines, Quest for Data,” a New York Times article from November 23, 2009, reported
that the new science for cancer investigations and more sophisticated methods for data
analysis spurred the U.S. Preventive Services task force to reexamine guidelines for how
frequently middle-aged and older women should have mammograms. The panel com-
missioned six independent groups to do statistical modeling. The result was a new set of
conclusions, in particular one that mammograms every two years give nearly the same
benefit as annual ones and confer only half the risk of harm. Donald Berry, a promi-
nent biostatistician, was quoted as saying he was pleasantly surprised that the task force
took the new research to heart in making its recommendations. The task force’s report
has generated much controversy among cancer organizations, politicians, and women
themselves.
We hope you will become increasingly convinced of the importance and relevance
of the discipline of statistics as you dig more deeply into the book and subject. We also
anticipate you’ll be intrigued enough to want to continue your statistical education beyond
your current course.


Enumerative Versus Analytic Studies


W. E. Deming was a very influential American statistician whose ideas concerning
the use of statistical methods in industrial production found great favor with Japanese
companies in the years after World War II. He used the phrase enumerative study to
describe investigations involving a finite collection of identifiable, unchanging objects
that make up a population. In such studies, a sampling frame—that is, a listing of the
objects to be sampled—is available or can be created. One example of such a frame is
the collection of all signatures on petitions to qualify an initiative for inclusion on the
ballot for an upcoming election. A sample is usually selected to ascertain whether the
number of valid signatures exceeds a specified value. The variable on which observa-
tions are made is dichotomous, the two possible values being valid (S, for success) and
not valid (F, for failure). As another example, the frame may contain serial numbers of
all ovens manufactured by a particular company during a particular period. A sample
may be selected to infer something about the average actual temperature of these units
when the temperature control is set to 400°F (an inference about the population mean
temperature).
Many problem situations faced by engineers involve some sort of ongoing process—a
group of interrelated activities undertaken to accomplish some objective—rather than a
specified, unchanging population. An investigator wants to learn something about how the
process is operating so that the process can then be modified to better achieve the desired
goal. Deming described such scenarios as analytic studies.

Example 1.3 The process of making ignition keys for automobiles consists of trimming and press-
ing raw key blanks, cutting grooves and notches, and then plating the keys. Dimen-
sions associated with groove and notch cutting are crucial to proper key functioning.
There will always be “normal” variation in dimensions because of fluctuations in
materials, worker behavior, and environmental conditions. It is important, though,
to monitor production to ensure that there are no unusual sources of variation, such
as incorrect machine settings or contaminated material, which might result in non-
conforming units or substantial changes in product characteristics. For this purpose,
a sample (subgroup) of five keys is selected every 20 minutes, and critical dimensions
are measured. Here are a few of the resulting observations for one particular dimen-
sion (in thousandths of an inch):
Subgroup 1: 6.1 8.4 7.6 7.5 4.4
Subgroup 2: 8.8 8.3 5.9 7.4 7.6
Subgroup 3: 8.0 7.5 7.0 6.8 9.3
This is indeed sample data, which can be used as a basis for drawing conclusions.
However, the conclusions are about production process behavior rather than about
a particular population of keys.

Analytic studies sometimes involve figuring out what actions to take to improve the
performance of a future product.


Example 1.4 Failure in fluorescent lamps occurs when their luminosity falls below a predeter-
mined level. The article “Using Degradation Data to Improve Fluorescent Lamp
Reliability” (J. of Quality Technology, 1995: 363–369) described a case study involv-
ing fluorescent lamps of a certain type. The project engineer suggested focusing on
three factors thought to be crucial to reliability:

1. The amount of electric current in the exhaustive process


2. The concentration of the mercury dispenser in the coating process
3. The concentration of argon in the filling process

Two levels, low and high, of each factor were established, leading to eight com-
binations of factor levels (e.g., low current, high mercury concentration, and
low argon concentration). Luminance levels were then monitored over time for
certain factor-level combinations. (Because of limited resources, only four of
the eight combinations were included in the experiment, with five lamps used
at each one.) Here is data for one particular lamp for which all factor levels
were low:

Time (hr): 100 500 1000 2000 3000 4000 5000 6000
Luminance (lumens): 2810 2490 2460 2370 2320 2160 2140 2080

Statistical methods were used on the resulting data to draw conclusions about how
lamp reliability could be improved. In particular, it was recommended that high
concentration levels should be used with a low current level.

1.2 Visual Displays for Univariate Data 


Some preliminary organization of a data set often reveals useful information and opens
paths of inquiry. Pictures are particularly effective in this respect. In this section, we intro-
duce several of the most frequently used pictorial techniques.

Stem-and-Leaf Displays
A stem-and-leaf display can be an effective way to organize numerical data without
expending much effort. It is based on separating each observation into two parts:
(1) a stem, consisting of one or more leading digits, and (2) a leaf, consisting of
the remaining or trailing digit(s). Suppose, for example, that data on calibration
times (sec) for certain test devices has been gathered and that the smallest and
largest times are 11.3 and 18.8, respectively. Then we could use the tens and ones
digits as the stem of an observation, leaving the tenths digit for the leaf. Thus 11.3
would have a stem of 11 and a leaf of 3, 16.0 would have a stem of 16 and a leaf of
0, and so on. Once stem values have been chosen, they should be listed in a single
column. Then the leaf of each observation should be placed on the row of the cor-
responding stem.
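This bookkeeping is easy to automate. The following Python sketch (our own illustration, not from the original; the calibration times are invented to match the range just described) separates each observation into a tens-and-ones stem and a tenths-digit leaf:

from collections import defaultdict

# Hypothetical calibration times (sec) spanning the 11.3-18.8 range above
times = [11.3, 16.0, 12.7, 14.4, 16.9, 18.8, 13.5, 12.1, 14.8, 11.9]

rows = defaultdict(list)
for t in times:
    stem = int(t)                    # tens and ones digits: 11.3 -> 11
    leaf = int(round(10 * t)) % 10   # tenths digit: 11.3 -> 3
    rows[stem].append(leaf)

for stem in sorted(rows):            # stems listed in a single column
    print(f"{stem} | {''.join(str(l) for l in sorted(rows[stem]))}")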


Example 1.5 The use of alcohol by college students is of great concern not only to those in the
academic community but also, because of potential health and safety consequences,
to society at large. The article “Health and Behavioral Consequences of Binge
Drinking in College” (J. of the Amer. Med. Assoc., 1994: 1672–1677) reported on
a comprehensive study of heavy drinking on campuses across the United States. A
binge episode was defined as five or more drinks in a row for males and four or more
for females. Figure 1.2 shows a stem-and-leaf display of 140 values of x = the percent-
age of undergraduate students who are binge drinkers. (These values were not given
in the cited article, but our display agrees with a picture of the data that did appear.)

0 4
1 1345678889
2 1223456666777889999 Stem: tens digit
3 0112233344555666677777888899999 Leaf: ones digit
4 111222223344445566666677788888999
5 00111222233455666667777888899
6 01111244455666778

Figure 1.2 Stem-and-leaf display for percentage binge drinkers at each of 140 colleges

The first leaf on the stem 2 row is 1, which tells us that 21% of the students at one
of the colleges in the sample were binge drinkers. Without the identification of stem
digits and leaf digits on the display, we wouldn’t know whether the stem 2, leaf 1 obser-
vation should be read as 21%, 2.1%, or .21%.
When creating a display by hand, ordering the leaves from smallest to largest on
each line can be time-consuming, and this ordering usually contributes little if any
extra information. Suppose the observations had been listed in alphabetical order by
school name, as
16% 33% 64% 37% 31% ...
Then placing these values on the display in this order would result in the stem 1 row
having 6 as its first leaf, and the beginning of the stem 3 row would be
3 | 371 . . .

The display suggests that a typical or representative value is in the stem 4 row,
perhaps in the mid-40% range. The observations are not highly concentrated about
this typical value, as would be the case if all values were between 20% and 49%. The
display rises to a single peak as we move downward, and then declines; there are no
gaps in the display. The shape of the display is not perfectly symmetric, but instead
appears to stretch out a bit more in the direction of low leaves than in the direction of
high leaves. Lastly, there are no observations that are unusually far from the bulk of
the data (no outliers), as would be the case if one of the 26% values had instead been
86%. The most surprising feature of this data is that at most colleges in the sample, at
least one-quarter of the students are binge drinkers. The problem of heavy drinking on
campuses is much more pervasive than many had suspected.


A stem-and-leaf display conveys information about the following aspects of the data:
   Identification of a typical or representative value
   Extent of spread about the typical value
   Presence of any gaps in the data
   Extent of symmetry in the distribution of values
   Number and location of peaks
   Presence of any outlying values
Suppose in Example 1.5 that each observation had included a tenths digit as well as
the tens and ones digits: 16.4%, 36.5%, and so on. We could use two-digit leaves, so that
16.4 would have a stem of 1 and a leaf of 64; in this case, the decimal point can be omitted,
but commas are necessary between successive leaves. Because such a display can become
very unwieldy, it is customary to use single-digit leaves obtained by truncation (not round-
ing). Thus 36.7 would have stem 3 and leaf 6, and information about the tenths digit would
be suppressed.
Consider a data set consisting of exam scores all of which are in the 70s, 80s, and 90s
(an instructor’s dream!). A stem-and-leaf display with the tens digit as the stem would have
only three rows. However, a more informative display can be created by repeating each
stem value twice, once for the low leaves 0, 1, 2, 3, 4 and again for the high leaves 5, 6, 7,
8, 9. A display of the binge-drinking data with repeated stems is shown in Figure 1.3. (The
11 on the far left in the fourth row indicates that there are 11 observations on or above that
row; the (14) row contains the middle data value.)


Figure 1.3 Minitab stem-and-leaf display using repeated stems
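Repeated stems need only one extra bookkeeping step: route each leaf to the low (0–4) or high (5–9) copy of its stem. A minimal Python sketch (ours, with hypothetical exam scores):

from collections import defaultdict

scores = [74, 89, 80, 93, 71, 95, 88, 72, 77, 84, 81, 86]  # hypothetical

rows = defaultdict(list)
for s in scores:
    stem, leaf = divmod(s, 10)
    rows[(stem, leaf >= 5)].append(leaf)  # False: leaves 0-4; True: leaves 5-9

for stem, high in sorted(rows):           # each stem appears twice: 7L, 7H, ...
    label = "H" if high else "L"
    print(f"{stem}{label} | {''.join(str(l) for l in sorted(rows[(stem, high)]))}")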

Suppose that a final exam in physics contained questions worth a total of 200 points
and that the only student who scored in the 100s earned 186 points. Rather than include
rows 10, 11, . . . , and 18 just to show the extreme outlier 186, it is better to stop the display
with a stem 9 row and place the information HI: 186 in a prominent place to the right of
the display. The same thing can be done with outliers on the low end.
Consider two different data sets, each consisting of observations on the same variable,
for example, exam scores for two different classes or stopping distances for cars equipped


with two different braking systems. An investigator would naturally want to know in what
ways the two sets were similar and how they differed. This can be accomplished by using
a comparative stem-and-leaf display, in which the leaves for one data set are listed to the
right of the stems and the leaves for the other to the left. Figure 1.4 shows a small example;
the two sides of the display are quite similar, except that the right side appears to be shifted
up one row (about 10 points) from the other side.

9 658618
9447 8 13754380
2208965655 7 5312267
2432875 6 45104
5882 5 9

Figure 1.4 A comparative stem-and-leaf display of exam scores

Dotplots
A dotplot is an attractive summary of numerical data when the data set is reasonably small or
there are relatively few distinct data values. Each observation is represented by a dot above the
corresponding location on a horizontal measurement scale. When a value occurs more than
once, there is a dot for each occurrence, and these dots are stacked vertically. As with a stem-and-
leaf display, a dotplot gives information about location, spread, extremes, and gaps.
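A rough text dotplot takes only a counting pass. Here is a Python sketch (our own, with made-up observations), drawn sideways so that the stack of dots for a repeated value sits beside it:

from collections import Counter

data = [4.0, 4.5, 4.0, 5.2, 6.1, 5.2, 5.2, 7.4, 6.1, 4.5]  # hypothetical

counts = Counter(data)
for value in sorted(counts):
    # one dot per occurrence; repeated values produce a longer stack
    print(f"{value:5.1f} | {'.' * counts[value]}")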

Example 1.6 Here is data on state-by-state appropriations for higher education as a percentage of
state and local tax revenue for fiscal year 2009–2010 (from the Statistical Abstract of the
United States). Values are listed in order of state abbreviations (AL first, WY last):
14.0 3.1 8.6 9.6 7.4 4.0 4.5 6.5 6.1 8.8
8.2 8.6 6.4 6.7 8.0 8.5 9.4 9.5 4.6 6.8
3.9 6.9 6.3 11.9 5.8 5.8 9.9 5.9 2.7 4.2
14.9 4.0 12.1 8.0 5.2 9.2 6.8 4.3 3.9 9.6
8.0 8.6 8.6 8.7 3.1 5.8 6.2 8.7 6.8 8.9

Figure 1.5 shows a dotplot of the data. The most striking feature is the substan-
tial state-to-state variability. The largest values (for New Mexico, Alabama, North
Carolina, and Mississippi) are somewhat separated from the bulk of the data and
may possibly qualify as outliers.

Figure 1.5 A dotplot of the data from Example 1.6


If the data set discussed in Example 1.6 had consisted of many more observations
(e.g., average per-pupil spending for each school district in the U.S.), it would be quite
cumbersome to construct a corresponding dotplot. Our next technique is well suited to
such situations.

Histograms
Some numerical data is obtained by counting to determine the value of a variable (the
number of traffic citations a person received during the last year, the number of persons ar-
riving for service during a particular period), whereas other data is obtained by taking mea-
surements (weight of an individual, reaction time to a particular stimulus). The prescription
for drawing a histogram is different for these two cases.

DEFINITIONS A variable is discrete if its set of possible values either is finite or else can be listed
in an infinite sequence (one in which there is a first number, a second number,
and so on). A variable is continuous if its possible values consist of an entire
interval on the number line.

A discrete variable x almost always results from counting, in which case possible
values are 0, 1, 2, 3, . . . or some subset of these integers. Continuous variables arise from
making measurements. For example, if x is the pH of a chemical substance, then in
theory x could be any number between 0 and 14: 7.0, 7.03, 7.032, and so on. Of course,
in practice there are limitations on the degree of accuracy of any measuring instrument,
so we may not be able to determine pH, reaction time, height, and concentration to an
arbitrarily large number of decimal places. However, from the point of view of creating
mathematical models for distributions of data, it is helpful to imagine an entire con-
tinuum of possible values.
Consider data consisting of observations on a discrete variable x. The frequency of
any particular x value is the number of times that value occurs in the data set. The relative
frequency of a value is the fraction or proportion of time the value occurs:

relative frequency of a value = (number of times the value occurs) / (number of observations in the data set)

Suppose, for example, that our data set consists of 200 observations on x = the number of
major defects on a new car of a certain type. If 70 of these x values are 1, then

frequency of the x value 1: 70
relative frequency of the x value 1: 70/200 = .35

Multiplying a relative frequency by 100 gives a percentage; in the defect example, 35% of the
cars in the sample had just one major defect. The relative frequencies, or percentages, are usually


of more interest than the frequencies themselves. In theory, the relative frequencies should sum
to 1, but in practice the sum may differ slightly from 1 because of rounding.
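In code, these definitions amount to a single counting pass. A Python sketch (ours; the ten defect counts are invented for illustration):

from collections import Counter

# Hypothetical observations on x = number of major defects per car
x_values = [0, 1, 1, 2, 0, 1, 3, 0, 1, 2]
n = len(x_values)

freq = Counter(x_values)
for value in sorted(freq):
    print(value, freq[value], freq[value] / n)  # value, frequency, relative frequency

# The exact relative frequencies sum to 1; only reported, rounded values can fail to
assert abs(sum(f / n for f in freq.values()) - 1.0) < 1e-9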

Constructing a Histogram for Discrete Data


First, determine the frequency and relative frequency of each value. Then mark possible
values on a horizontal scale. Above each value, draw a rectangle whose height is the rela-
tive frequency (or, alternatively, the frequency) of that value (all rectangles should have the
same base width).

This construction ensures that the area of each rectangle is proportional to the relative
frequency of the value. Thus if the relative frequencies of x = 1 and x = 5 are .35 and .07,
respectively, then the area of the rectangle above 1 is five times the area of the rectangle
above 5.

Example 1.7 Every corporation has a governing board of directors. The number of individuals on a
board varies from one corporation to another. One of the authors of the article “Does
Optimal Corporate Board Size Exist? An Empirical Analysis” (Journal of Applied
Finance, 2010: 57–69) provided the accompanying data on the number of directors
on the boards of a random sample of 204 corporations.
Board Size   Frequency   Relative Frequency      Board Size   Frequency   Relative Frequency
 4              3           0.0147                19             0           0.0000
 5             12           0.0588                20             0           0.0000
 6             13           0.0637                21             1           0.0049
 7             25           0.1225                22             0           0.0000
 8             24           0.1176                23             0           0.0000
 9             42           0.2059                24             1           0.0049
10             23           0.1127                25             0           0.0000
11             19           0.0931                26             0           0.0000
12             16           0.0784                27             0           0.0000
13             11           0.0539                28             0           0.0000
14              5           0.0245                29             0           0.0000
15              4           0.0196                30             0           0.0000
16              1           0.0049                31             0           0.0000
17              3           0.0147                32             1           0.0049
18              0           0.0000                Total        204           0.9997


The corresponding histogram in Figure 1.6 rises to a peak and then declines. The
histogram extends a bit more on the right (toward large values) than it does on the
left—a slight positive skew.

Figure 1.6 Histogram of number of corporate board members (frequency versus board size)

From either the tabulated information or the histogram itself, we can determine
the following:

proportion of boards with at most 10 directors
   = (relative frequency for x = 4) + (relative frequency for x = 5) + ∙ ∙ ∙ + (relative frequency for x = 10)
   = 0.0147 + 0.0588 + 0.0637 + 0.1225 + 0.1176 + 0.2059 + 0.1127 = 0.6959

Similarly,

proportion of boards with more than 15 directors
   = (relative frequency for x = 16) + (relative frequency for x = 17) + ∙ ∙ ∙ + (relative frequency for x = 32)
   = 0.0049 + 0.0147 + ∙ ∙ ∙ + 0.0049 = 0.0343
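Both computations are partial sums over the frequency table. A Python check using the nonzero rows of the table (note that summing exact fractions gives 142/204 ≈ 0.6961; the 0.6959 above comes from adding the already-rounded relative frequencies):

# Nonzero rows of the Example 1.7 table: board size -> frequency, n = 204
freq = {4: 3, 5: 12, 6: 13, 7: 25, 8: 24, 9: 42, 10: 23, 11: 19, 12: 16,
        13: 11, 14: 5, 15: 4, 16: 1, 17: 3, 21: 1, 24: 1, 32: 1}
n = sum(freq.values())  # 204

at_most_10 = sum(f for size, f in freq.items() if size <= 10) / n
more_than_15 = sum(f for size, f in freq.items() if size > 15) / n
print(round(at_most_10, 4))    # 0.6961
print(round(more_than_15, 4))  # 0.0343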

Constructing a histogram for continuous data (measurements) entails subdividing the
measurement axis into a suitable number of class intervals or classes, such that each obser-
vation is contained in exactly one class. Suppose, for example, that we have 50 observations
on x = fuel efficiency of an automobile (mpg), the smallest of which is 27.8 and the largest


of which is 31.4. Then we could use the class boundaries 27.5, 28.0, 28.5, . . . , and 31.5 as
shown here:

27.5 28.0 28.5 29.0 29.5 30.0 30.5 31.0 31.5

A potential difficulty is that an observation such as 29.0 lies on a class boundary so it doesn’t
lie in exactly one interval. One way to deal with this problem is to use boundaries like
27.55, 28.05, . . . , 31.55. Adding a hundredths digit to the class boundaries prevents obser-
vations from falling on the resulting boundaries. Another way to deal with this problem is
to use the classes 27.5–<28.0, 28.0–<28.5, . . . , 31.0–<31.5. Then 29.0 falls in the
class 29.0–<29.5 rather than in the class 28.5–<29.0. In other words, with this convention,
an observation on a boundary is placed in the interval to the right of the boundary.
This is how Minitab constructs a histogram.
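The "boundary belongs to the interval on its right" rule is exactly what half-open classes [a, b) implement. A short Python sketch (ours) using the standard bisect module:

from bisect import bisect_right

boundaries = [27.5, 28.0, 28.5, 29.0, 29.5, 30.0, 30.5, 31.0, 31.5]

def class_of(x):
    """Return (left, right) for the class [left, right) containing x."""
    i = bisect_right(boundaries, x) - 1
    return boundaries[i], boundaries[i + 1]

print(class_of(29.0))  # (29.0, 29.5): a boundary value goes to the right class
print(class_of(28.7))  # (28.5, 29.0)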

Constructing a Histogram for Continuous Data: Equal Class Widths
Determine the frequency and relative frequency for each class. Mark the class boundar-
ies on a horizontal measurement axis. Above each class interval, draw a rectangle whose
height is the corresponding relative frequency (or frequency).

Example 1.8 Power companies need information about customer usage to obtain accurate fore-
casts of demand. Investigators from Wisconsin Power and Light determined energy
consumption (BTUs) during a particular period for a sample of 90 gas-heated homes.
An adjusted consumption value was calculated as follows:
adjusted consumption = consumption / [(weather, in degree days)(house area)]

This resulted in the accompanying data (part of the stored data set FURNACE.
MTW available in Minitab, which we have ordered from smallest to largest):
2.97 4.00 5.20 5.56 5.94 5.98 6.35 6.62 6.72 6.78
6.80 6.85 6.94 7.15 7.16 7.23 7.29 7.62 7.62 7.69
7.73 7.87 7.93 8.00 8.26 8.29 8.37 8.47 8.54 8.58
8.61 8.67 8.69 8.81 9.07 9.27 9.37 9.43 9.52 9.58
9.60 9.76 9.82 9.83 9.83 9.84 9.96 10.04 10.21 10.28
10.28 10.30 10.35 10.36 10.40 10.49 10.50 10.64 10.95 11.09
11.12 11.21 11.29 11.43 11.62 11.70 11.70 12.16 12.19 12.28
12.31 12.62 12.69 12.71 12.91 12.92 13.11 13.38 13.42 13.43
13.47 13.60 13.96 14.24 14.35 15.12 15.24 16.06 16.90 18.26


We let Minitab select the class intervals. The most striking feature of the histogram in
Figure 1.7 is its resemblance to a bell-shaped (and therefore symmetric) curve, with
the point of symmetry at roughly 10.
Figure 1.7 Histogram of the energy consumption data from Example 1.8 (percent versus BTU)

Class:                1–<3   3–<5   5–<7   7–<9   9–<11  11–<13  13–<15  15–<17  17–<19
Frequency:              1      1     11     21     25      17      9       4       1
Relative frequency:   .011   .011   .122   .233   .278    .189    .100    .044    .011
From the histogram,

proportion of observations less than 9 ≈ .01 + .01 + .12 + .23 = .37   (exact value = 34/90 = .378)

The relative frequency for the 9–<11 class is about .27, so roughly half of this, or
.135, should be between 9 and 10. Thus

proportion of observations less than 10 ≈ .37 + .135 = .505   (slightly more than 50%)

The exact value of this proportion is 47/90 = .522.
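The same arithmetic in Python (class frequencies transcribed from the table above):

# (lower, upper): frequency for the nine classes of Example 1.8, n = 90
freq = {(1, 3): 1, (3, 5): 1, (5, 7): 11, (7, 9): 21, (9, 11): 25,
        (11, 13): 17, (13, 15): 9, (15, 17): 4, (17, 19): 1}
n = sum(freq.values())

less_than_9 = sum(f for (lo, hi), f in freq.items() if hi <= 9) / n
print(round(less_than_9, 3))  # 0.378, i.e., 34/90

# Add half of the 9-<11 class to approximate "less than 10"
approx = less_than_9 + 0.5 * freq[(9, 11)] / n
print(round(approx, 3))  # 0.517; the .505 above uses rounded terms, and the
                         # exact proportion from the data is 47/90 = .522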

There are no hard-and-fast rules concerning either the number of classes or the choice
of classes themselves. Between 5 and 20 classes will be satisfactory for most data sets. Gener-
ally, the larger the number of observations in a data set, the more classes should be used. A
reasonable rule of thumb is
number of classes ≈ √(number of observations)
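For instance, with the 90 observations of Example 1.8 the rule suggests about nine or ten classes, and the Minitab histogram there in fact used nine:

import math
print(math.sqrt(90))  # about 9.49, so 9 or 10 classes is reasonable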
Equal-width classes may not be a sensible choice if a data set has at least one “stretched-
out tail.” Figure 1.8 (page 19) shows a dotplot of such a data set. Using a small number of


Figure 1.8 Selecting class intervals when there are outliers: (a) many short equal-width intervals; (b) a few wide equal-width intervals; (c) unequal-width intervals

equal-width classes results in almost all observations falling in just one or two of the classes.
If a large number of equal-width classes are used, many classes will have zero frequency. A
sound choice is to use a few wider intervals near extreme observations and narrower inter-
vals in the region of high concentration.

Constructing a Histogram for Continuous Data: Unequal Class Widths
After determining frequencies and relative frequencies, calculate the height of each rect-
angle using the formula

rectangle height = (relative frequency of the class) / (class width)

The resulting rectangle heights are usually called densities, and the vertical scale is the
density scale. This prescription will also work when class widths are equal.

Example 1.9 Corrosion of reinforcing steel is a serious problem in concrete structures located
in environments affected by severe weather conditions. For this reason, researchers
have been investigating the use of reinforcing bars made of composite material.
One study was carried out to develop guidelines for bonding glass-fiber-reinforced
plastic rebars to concrete (“Design Recommendations for Bond of GFRP Rebars to
Concrete,” J. of Structural Engr., 1996: 247–254). Consider the following 48 obser-
vations on measured bond strength:

11.5 12.1 9.9 9.3 7.8 6.2 6.6 7.0 13.4 17.1 9.3 5.6
5.7 5.4 5.2 5.1 4.9 10.7 15.2 8.5 4.2 4.0 3.9 3.8
3.6 3.4 20.6 25.5 13.8 12.6 13.1 8.9 8.2 10.7 14.2 7.6
5.2 5.5 5.1 5.0 5.2 4.8 4.1 3.8 3.7 3.6 3.6 3.6


Class:                2–<4    4–<6    6–<8    8–<12   12–<20  20–<30
Frequency:              9      15       5       9       8       2
Relative frequency:  .1875   .3125   .1042   .1875   .1667   .0417
Density:              .094    .156    .052    .047    .021    .004
The resulting histogram appears in Figure 1.9. The right or upper tail stretches
out much farther than does the left or lower tail—a substantial departure from
symmetry.

Figure 1.9 A Minitab density histogram for the bond strength data of Example 1.9 (density versus bond strength)
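The density row of the table can be reproduced in a few lines of Python (our own sketch; boundaries and frequencies as given above):

def density_heights(boundaries, frequencies):
    """Rectangle heights (densities) for half-open classes [b[i], b[i+1])."""
    n = sum(frequencies)
    widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]
    return [(f / n) / w for f, w in zip(frequencies, widths)]

# Classes 2-<4, 4-<6, 6-<8, 8-<12, 12-<20, 20-<30 of Example 1.9
heights = density_heights([2, 4, 6, 8, 12, 20, 30], [9, 15, 5, 9, 8, 2])
print([round(h, 3) for h in heights])  # [0.094, 0.156, 0.052, 0.047, 0.021, 0.004]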

When class widths are unequal, not using a density scale will give a picture with dis-
torted areas. For equal class widths, the divisor is the same in each density calculation, and
the extra arithmetic simply results in a rescaling of the vertical axis (i.e., the histogram us-
ing relative frequency and the one using density will have exactly the same appearance). A
density histogram does have one interesting property. Multiplying both sides of the formula
for density by the class width gives

relative frequency = (class width)(density)
                   = (rectangle width)(rectangle height)
                   = rectangle area

That is, the area of each rectangle is the relative frequency of the corresponding class.
Furthermore, since the sum of relative frequencies must be 1.0 (except for roundoff),
the total area of all rectangles in a density histogram is 1. It is always possible to draw a
histogram so that the area equals the relative frequency (this is true also for a histogram
of discrete data—just use the density scale). This property will play an important role in
creating models for distributions in Section 1.3.
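A quick numerical confirmation with the Example 1.9 classes (our sketch; the tabled densities are rounded, so the recovered areas match the relative frequencies only approximately):

boundaries = [2, 4, 6, 8, 12, 20, 30]
densities = [0.094, 0.156, 0.052, 0.047, 0.021, 0.004]  # from Example 1.9

areas = [(hi - lo) * d for lo, hi, d in zip(boundaries, boundaries[1:], densities)]
print([round(a, 3) for a in areas])  # each area is a class relative frequency
print(sum(areas))                    # total area: 1.0 (up to float roundoff)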


Histogram Shapes
Histograms come in a variety of shapes. A unimodal histogram is one that rises to a single
peak and then declines. A bimodal histogram has two different peaks. Bimodality occurs
when the data set consists of observations on two quite different kinds of individuals or
objects. For example, consider a large data set consisting of driving times for automobiles
traveling between San Luis Obispo, California, and Monterey, California (exclusive of
stopping time for sightseeing, eating, etc.). This histogram would show two peaks, one for
those cars that took the inland route (roughly 2.5 hours) and another for those cars travel-
ing up the coast (3.5–4 hours). However, bimodality does not automatically follow in such
situations. Only if the two separate histograms are “far apart” relative to their spreads will
bimodality occur in the histogram of combined data. Thus a large data set consisting of
heights of college students should not result in a bimodal histogram because the typical
male height of about 69 inches is not far enough above the typical female height of about
64–65 inches. A histogram with more than two peaks is said to be multimodal. Of course,
the number of peaks may well depend on the choice of class intervals, particularly with a
small number of observations. The larger the number of classes, the more likely it is that
bimodality or multimodality will manifest itself.

Example 1.10 Figure 1.10(a) shows a Minitab histogram of the weights (lbs) of the 121 players
listed on the rosters of the San Francisco 49ers and the New England Patriots as of
November 28, 2012. Figure 1.10(b) is a smoothed histogram (actually what is called
a density estimate) of the data from the R software package. Both the histogram and
the smoothed histogram show three distinct peaks: The one on the right is for line-
men, the middle peak corresponds to linebacker weights, and the peak on the left is
for all other players (wide receivers, quarterbacks, etc.).


Figure 1.10 NFL player weights: (a) histogram, (b) smoothed histogram


Figure 1.10 (continued) Density estimate of NFL player weights

A histogram is symmetric if the left half is a mirror image of the right half. A bell-
shaped histogram is symmetric, but there are other unimodal symmetric histograms that
are not bell-shaped; histograms with more than one peak can also be symmetric. A uni-
modal histogram is positively skewed if the right or upper tail is stretched out compared
with the left or lower tail, and negatively skewed if the longer tail extends to the left.
Figure 1.11 shows “smoothed” histograms, obtained by superimposing a smooth curve on
the rectangles, that illustrate the various possibilities.


Figure 1.11 Smoothed histograms: (a) symmetric unimodal; (b) bimodal; (c) positively skewed;
(d) negatively skewed

Categorical Data
A histogram for categorical data is often called a bar chart. In some cases, there will
be a natural ordering of classes (for example, freshman, sophomore, junior, senior,
graduate student), whereas in other cases, the order will be arbitrary (Honda, Yamaha,


Harley-Davidson, etc.). A Pareto diagram is a bar chart resulting from a quality control
study in which each category represents a different type of product nonconformity or
production problem. The categories appear in order of decreasing frequency (if a mis-
cellaneous category is needed, it is the last one).

Example 1.11 In the manufacture of printed circuit boards, finished boards are subjected to a final
inspection before they are shipped to customers. Here is data on the type of defect for
each board rejected at final inspection during a particular time period:

Type of defect Frequency Relative frequency


Low copper plating 112 .615
Poor electroless coverage 35 .192
Lamination problems 10 .055
Plating separation 8 .044
Etching problems 5 .027
Miscellaneous 12 .066
Figure 1.12 is a Pareto diagram. Roughly 80% (.615 + .192) of the defects were of
one of the first two types.

Figure 1.12 A Pareto diagram for Example 1.11
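The only programmatic wrinkle in a Pareto diagram is the ordering rule. A Python sketch (ours), using the Example 1.11 counts:

def pareto_order(counts, misc="Miscellaneous"):
    """Categories in decreasing order of frequency, miscellaneous last."""
    counts = dict(counts)               # leave the caller's dict untouched
    misc_count = counts.pop(misc, None)
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if misc_count is not None:
        ordered.append((misc, misc_count))
    return ordered

defects = {"Low copper plating": 112, "Poor electroless coverage": 35,
           "Lamination problems": 10, "Plating separation": 8,
           "Etching problems": 5, "Miscellaneous": 12}
for category, count in pareto_order(defects):
    print(f"{category:26s}{count:4d}")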

Section 1.2 Exercises

1. Consider the strength data for beams given in Example 1.2.
a. Construct a stem-and-leaf display of the data. What appears to be a representative strength value? Do the observations appear to be highly concentrated about the representative value or rather spread out?
b. Does the display appear to be reasonably symmetric about a representative value, or would you describe its shape in some other way?


c. Do there appear to be any outlying strength values?
d. What proportion of strength observations in this sample exceed 10 MPa?

2. The article cited in Example 1.2 also gave the accompanying strength observations for cylinders:
6.1  5.8  7.8  7.1  7.2  9.2  6.6  8.3  7.0  8.3
7.8  8.1  7.4  8.5  8.9  9.8  9.7  14.1  12.6  11.2
a. Construct a comparative stem-and-leaf display of the beam and cylinder data, then answer the questions in parts (b)–(d) of Exercise 1 for the observations on cylinders.
b. In what ways are the two sides of the display similar? Are there any obvious differences between the beam observations and the cylinder observations?

3. The accompanying specific gravity values for various wood types used in construction appeared in the article “Bolted Connection Design Values Based on European Yield Model” (J. of Structural Engr., 1993: 2169–2186):
.31  .35  .36  .36  .37  .38  .40  .40  .40
.41  .41  .42  .42  .42  .42  .42  .43  .44
.45  .46  .46  .47  .48  .48  .48  .51  .54
.54  .55  .58  .62  .66  .66  .67  .68  .75
Construct a stem-and-leaf display using repeated stems, and comment on any interesting features of the display.

4. Allowable mechanical properties for structural design of metallic aerospace vehicles requires an approved method for statistically analyzing empirical test data. The article “Establishing Mechanical Property Allowables for Metals” (J. of Testing and Evaluation, 1998: 293–299) used the accompanying data on tensile ultimate strength (ksi) as a basis for addressing the difficulties in developing such a method:
122.2  124.2  124.3  125.6  126.3  126.5
126.5  127.2  127.3  127.5  127.9  128.6
128.8  129.0  129.2  129.4  129.6  130.2
130.4  130.8  131.3  131.4  131.4  131.5
131.6  131.6  131.8  131.8  132.3  132.4
132.4  132.5  132.5  132.5  132.5  132.6
132.7  132.9  133.0  133.1  133.1  133.1
133.1  133.2  133.2  133.2  133.3  133.3
133.5  133.5  133.5  133.8  133.9  134.0
134.0  134.0  134.0  134.1  134.2  134.3
134.4  134.4  134.6  134.7  134.7  134.7
134.8  134.8  134.8  134.9  134.9  135.2
135.2  135.2  135.3  135.3  135.4  135.5
135.5  135.6  135.6  135.7  135.8  135.8
135.8  135.8  135.8  135.9  135.9  135.9
135.9  136.0  136.0  136.1  136.2  136.2
136.3  136.4  136.4  136.6  136.8  136.9
136.9  137.0  137.1  137.2  137.6  137.6
137.8  137.8  137.8  137.9  137.9  138.2
138.2  138.3  138.3  138.4  138.4  138.4
138.5  138.5  138.6  138.7  138.7  139.0
139.1  139.5  139.6  139.8  139.8  140.0
140.0  140.7  140.7  140.9  140.9  141.2
141.4  141.5  141.6  142.9  143.4  143.5
143.6  143.8  143.8  143.9  144.1  144.5
144.5  147.7  147.7
a. Construct a stem-and-leaf display of the data by first deleting (truncating) the tenths digit and then repeating each stem value five times (once for leaves 0 and 1, a second time for leaves 2 and 3, etc.). Why is it relatively easy to identify a representative strength value?
b. Construct a histogram using equal-width classes with the first class having a lower limit of 122 and an upper limit of 124. Then comment on any interesting features of the histogram.

5. Consider the accompanying values of golf course lengths (yards) for a sample of courses designated by Golf Magazine as being among the most challenging in the United States:
6433  6435  6464  6470  6506  6526  6527
6583  6605  6614  6694  6700  6713  6745
6770  6770  6790  6798  6850  6870  6873
6890  6900  6904  6927  6936  7005  7011
7022  7040  7050  7051  7105  7113  7131
7165  7168  7169  7209  7280
a. Would it be best to use one-digit, two-digit, or three-digit stems as a basis for a stem-and-leaf display? Explain your reasoning.


b. Construct a stem-and-leaf display based on two-digit stems and two-digit leaves, with successive leaves separated by either a comma or a space.
c. Construct a stem-and-leaf display in which the leaf of each observation is its tens digit (so the ones digit is truncated). Does this display appear to be significantly less informative about course lengths than the display of part (b)? What advantage would this display have over the one in part (b) if there had been 200 courses in the sample?

6. Construct two stem-and-leaf displays for the accompanying set of exam scores, one in which each stem value appears just once and the other in which stem values are repeated:
74  89  80  93  64  67  72  70  66  85  89  81
81  71  74  82  85  63  72  81  81  95  84  81
80  70  69  66  60  83  85  98  84  68  90  82
69  72  87  88
What feature of the data is revealed by the display with repeated stems that is not so readily apparent in the first display?

7. Temperature transducers of a certain type are shipped in batches of 50. A sample of 60 batches was selected, and the number of transducers in each batch not conforming to design specifications was determined, resulting in the following data:
2 1 2 4 0 1 3 2 0 5 3 3 1 3 2 4 7 0 2 3
0 4 2 1 3 1 1 3 4 1 2 3 2 2 8 4 5 1 3 1
5 0 2 3 2 1 0 6 4 2 1 6 0 3 3 3 6 1 2 3
a. Determine frequencies and relative frequencies for the observed values of x = number of nonconforming transducers in a batch.
b. What proportion of batches in the sample have at most five nonconforming transducers? What proportion have fewer than five? What proportion have at least five nonconforming units?
c. Draw a histogram of the data using relative frequency on the vertical scale, and comment on its features.

8. In a study of author productivity (“Lotka’s Test,” Collection Mgmt., 1982: 111–118), a large number of authors were classified according to the number of articles they had published during a certain period. The results were presented in the accompanying frequency distribution:
Number of papers: 1 2 3 4 5 6 7 8
Frequency: 784 204 127 50 33 28 19 19
Number of papers: 9 10 11 12 13 14 15 16 17
Frequency: 6 7 6 7 4 4 5 3 3
a. Construct a histogram corresponding to this frequency distribution. What is the most interesting feature of the shape of the distribution?
b. What proportion of these authors published at least five papers? At least ten papers? More than ten papers?
c. Suppose the five 15s, three 16s, and three 17s had been lumped into a single category displayed as “≥15.” Would you be able to draw a histogram? Explain.
d. Suppose that instead of the values 15, 16, and 17 being listed separately, they had been combined into a 15–17 category with frequency 11. Would you be able to draw a histogram? Explain.

9. The number of contaminating particles on a silicon wafer prior to a certain rinsing process was determined for each wafer in a sample of size 100, resulting in the following frequencies:
Number of particles: 0 1 2 3 4 5 6 7
Frequency: 1 2 3 12 11 15 18 10
Number of particles: 8 9 10 11 12 13 14
Frequency: 12 4 5 3 1 2 1
a. What proportion of the sampled wafers had at least one particle? At least five particles?
b. What proportion of the sampled wafers had between five and ten particles, inclusive? Strictly between five and ten particles?
c. Draw a histogram using relative frequency on the vertical axis. How would you describe the shape of the histogram?


10. The article “Knee Injuries in Women Collegiate Rugby Players” (Amer. J. of Sports Medicine, 1997: 360–362) gave the following data on type of injury (A = meniscal tear, B = MCL tear, C = ACL tear, D = patella dislocation, E = PCL tear):
A B B A C A A D B A C E B
B A A C D C A C B C C C A
B B C A A B C C A C B B D
A B A C B A A C A B B E B
B B C C A C A A B D A A C
B C C A B B A D C A B
Construct a Pareto diagram for this data. The three most frequently occurring types of injuries account for what proportion of all injuries?

11. The article “Determination of Most Representative Subdivision” (J. of Energy Engr., 1993: 43–55) gave data on various characteristics of subdivisions that could be used in deciding whether to provide electrical power using overhead lines or underground lines. Here are the values of the variable x = total length of streets within a subdivision:
1280  5320  4390  2100  1240  3060  4770
1050   360  3330  3380   340  1000   960
1320   530  3350   540  3870  1250  2400
 960  1120  2120   450  2250  2320  2400
3150  5700  5220   500  1850  2460  5850
2700  2730  1670   100  5770  3150  1890
 510   240   396  1419  2109
a. Construct a stem-and-leaf display using the thousands digit as the stem and the hundreds digit as the leaf, and comment on the various features of the display.
b. Construct a histogram using class boundaries 0, 1000, 2000, 3000, 4000, 5000, and 6000. What proportion of subdivisions have total length less than 2000? Between 2000 and 4000? How would you describe the shape of the histogram?

12. The article cited in Exercise 11 also gave the following values of the variables y = number of culs-de-sac and z = number of intersections:
y: 1 0 1 0 0 2 0 1 1 1 2 1 0 0 1 1
   0 1 1 1 1 0 0 0 1 1 2 0 1 2 2 1
   1 0 2 1 1 0 1 5 0 3 0 1 1 0 0
z: 1 8 6 1 1 5 3 0 0 4 4 0 0 1 2 1
   4 0 4 0 3 0 1 1 0 1 3 2 4 6 6 0
   1 1 8 3 3 5 0 5 2 3 1 0 0 0 3
a. Construct a histogram for the y data. What proportion of these subdivisions had no culs-de-sac? At least one cul-de-sac?
b. Construct a histogram for the z data. What proportion of these subdivisions had at most five intersections? Fewer than five intersections?

13. The article “Ecological Determinants of Herd Size in the Thornicroft’s Giraffe of Zambia” (Afric. J. Ecol., 2010: 962–971) gave the following data (read from a graph) on herd size for a sample of 1570 herds over a 34-year period.
Herd size: 1 2 3 4 5 6 7 8
Frequency: 589 190 176 157 115 89 57 55
Herd size: 9 10 11 12 13 14 15 17
Frequency: 33 31 22 10 4 10 11 5
Herd size: 18 19 20 22 23 24 26 32
Frequency: 2 4 2 2 2 2 1 1
a. What proportion of the sampled herds had just one giraffe?
b. What proportion of the sampled herds had six or more giraffes (characterized in the article as “large herds”)?
c. What proportion of the sampled herds had between 5 and 10 giraffes inclusive?
d. Draw a histogram using relative frequency on the vertical axis. How would you describe the shape of this histogram?

14. The article “Statistical Modeling of the Time Course of Tantrum Anger” (J. of Applied Stats, 2009: 1013–1034) discussed how anger intensity in children’s tantrums could be related to tantrum duration as well as behavioral indicators such as shouting, stamping, pushing, and pulling. The following frequency distribution was given (as well as the corresponding histogram):
0–<2: 136    2–<4: 92    4–<11: 71
11–<20: 26   20–<30: 7   30–<40: 3
Draw the histogram and then comment on any interesting features.


15. Automated electron backscattered diffraction is now being used in the study of fracture phenomena. The following information on misorientation angle (degrees) was extracted from the article “Observations on the Faceted Initiation Site in the Dwell-Fatigue Tested Ti-6242 Alloy: Crystallographic Orientation and Size Effects” (Metallurgical and Materials Trans., 2006: 1507–1518):
Class:      0–<5    5–<10   10–<15   15–<20
Rel Freq:   .177    .166    .175     .136
Class:      20–<30  30–<40  40–<60   60–<90
Rel Freq:   .194    .078    .044     .030
a. Is it true that more than 50% of the sampled angles are smaller than 15°, as asserted in the paper?
b. What proportion of the sampled angles are at least 30°?
c. Roughly what proportion of angles are between 10° and 25°?
d. Construct a histogram and comment on any interesting features.

16. A transformation of data values by means of some mathematical function, such as √x or 1/x, can often yield a set of numbers that has “nicer” statistical properties than the original data. In particular, it may be possible to find a function for which the histogram of transformed values is more symmetric (or even better, more like a bell-shaped curve) than the original data. For example, the article “Time Lapse Cinematographic Analysis of Beryllium–Lung Fibroblast Interactions” (Envir. Research, 1983: 34–43) reported the results of experiments designed to study the behavior of certain individual cells that had been exposed to beryllium. An important characteristic of such an individual cell is its interdivision time (IDT). IDTs were determined for a number of cells both in exposed (treatment) and in unexposed (control) conditions. The authors of the article used a logarithmic transformation. Consider the following representative IDT data:
28.1  31.2  13.7  46.0  25.8  16.8  34.8  62.3
28.0  17.9  19.5  21.1  31.9  28.9  60.1  23.7
18.6  21.4  26.6  26.2  32.0  43.5  17.4  38.8
30.6  55.6  25.5  52.1  21.0  22.3  15.5  36.3
19.1  38.4  72.8  48.9  21.4  20.7  57.3  40.9
Construct a histogram of this data based on classes with boundaries 10, 20, 30, . . . . Then calculate log10(x) for each observation, and construct a histogram of the transformed data using class boundaries 1.1, 1.2, 1.3, . . . . What is the effect of the transformation?

17. The accompanying data set consists of observations on shear strength (lb) of ultrasonic spot welds made on a certain type of alclad sheet. Construct a relative frequency histogram based on ten equal-width classes with boundaries 4000, 4200, . . . . (The histogram will agree with the one in “Comparison of Properties of Joints Prepared by Ultrasonic Welding and Other Means,” J. of Aircraft, 1983: 552–556.) Comment on its features.
5434  4948  4521  4570  4990  5702  5241
5112  5015  4659  4806  4637  5670  4381
4820  5043  4886  4599  5288  5299  4848
5378  5260  5055  5828  5218  4859  4780
5027  5008  4609  4772  5133  5095  4618
4848  5089  5518  5333  5164  5342  5069
4755  4925  5001  4803  4951  5679  5256
5207  5621  4918  5138  4786  4500  5461
5049  4974  4592  4173  5296  4965  5170
4740  5173  4568  5653  5078  4900  4968
5248  5245  4723  5275  5419  5205  4452
5227  5555  5388  5498  4681  5076  4774
4931  4493  5309  5582  4308  4823  4417
5364  5640  5069  5188  5764  5273  5042
5189  4986

18. The paper “Study on the Life Distribution of Microdrills” (J. of Engr. Manufacture, 2002: 301–305) reported the following observations, listed in increasing order, on drill lifetimes (number of holes that a drill machines before it breaks) when holes were drilled in a certain brass alloy.
11   14   20   23   31   36   39   44   47   50
59   61   65   67   68   71   74   76   78   79
81   84   85   89   91   93   96   99   101  104
105  105  112  118  123  136  139  141  148  158
161  168  184  206  248  263  289  322  388  513


a. Why can a frequency distribution not be based on the class intervals 0–50, 50–100, 100–150, and so on?
b. Construct a frequency distribution and histogram of the data using class boundaries 0, 50, 100, . . . and then comment on interesting characteristics.
c. Construct a frequency distribution and histogram of the natural logarithms of the lifetime observations and comment on interesting characteristics.
d. What proportion of the lifetime observations in this sample are less than 100? What proportion of the observations are at least 200?

1.3 Describing Distributions 


In Section 1.2, we saw that a histogram could be used to describe how values of a
variable x are distributed in a data set. In practice, a histogram is virtually always
constructed from sample data. Consider the population or process from which
a sample might be selected. It is often possible to give a concise mathematical
description of how the possible values of x are distributed or dispersed along the
number line or measurement scale. Suppose, for example, that x is the fuel effi-
ciency (mpg) of a vehicle of a particular type (a continuous variable), so that the
value of x varies from vehicle to vehicle. Knowing the distribution of x enables us
to determine the proportion of vehicles for which x is less than 32, the proportion
for which x exceeds 30.5, the proportion of vehicles having 31.5 < x < 32.5, and
so on. If x is the number of defects on an item produced by some process (a discrete
variable), then the x distribution will describe what proportion of items produced
will have x = 0, what proportion will have x = 1, and so on. We now describe the
essential features of distributions for continuous variables and those for discrete
variables.

Continuous Distributions
Let x be a continuous variable, one whose value is determined by making a measurement of
some sort. Suppose we have a sample of x values from a population or ongoing process. For
example, the sample might consist of fuel efficiencies of cars selected from a large rental
fleet (a population) or waiting times for a succession of patients entering a large medical
clinic (a patient arrival process). If the sample size is small, a histogram based on only a
small number of relatively wide class intervals is appropriate. For a large sample size, many
narrow classes should be used. Let’s agree to draw our histograms using the density scale
discussed in Section 1.2 so that
     For each rectangle, area = relative frequency of the class
     Total area of all rectangles = 1
With a large amount of data, a histogram based on any reasonable choice of classes should
have roughly the same shape and can very frequently be well approximated by a smooth
curve. This type of approximation is illustrated in Figure 1.13.
Many approximating curves that arise in practice can be obtained as graphs of reason-
ably simple mathematical functions. Such a mathematical function provides a very concise
description of the x distribution.


Figure 1.13 Histograms of continuous data: (a) small number of wide classes; (b) large number of narrow classes; (c) approximation by a smooth curve

DEFINITIONS A density function f(x) is used to describe (at least approximately) the population or process distribution of a continuous variable x. The graph of f(x) is called the density curve. The following properties must be satisfied:

1. $f(x) \ge 0$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (the total area under the density curve is 1.0)
3. For any two numbers a and b with a < b,
$$\text{proportion of } x \text{ values between } a \text{ and } b = \int_a^b f(x)\,dx$$
(This proportion is the area under the density curve and above the interval with endpoints a and b, as illustrated in Figure 1.14.)

Figure 1.14 The area under the density curve above an interval is equal to the proportion of x values in that interval (shaded area = proportion of values between a and b)

There is no area under the density curve and above a single value (e.g., above 2.50), which implies that

proportion of x values satisfying a ≤ x ≤ b = proportion of x values satisfying a < x < b


That is, the area under the curve between a and b does not depend on whether the two
interval endpoints are included or excluded.

Example 1.12 A certain daily program on a public radio station lasts 1 hour. Let x denote the
amount of time (hr) during which music is played. (There are no advertisements, but
the host provides occasional commentary and makes announcements.) A potential
program sponsor is interested in knowing how the value of x varies from program to
program. Consider the density function

$$f(x) = \begin{cases} 90x^8(1-x) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

This looks complicated, but the corresponding density curve in Figure 1.15 has a
simple and appealing shape.

Figure 1.15 Density curve for Example 1.12

We see immediately that most x values are quite close to 1 and very few are smaller than .5 (almost all programs consist of at least a half hour of music). The constant 90 in f(x) ensures that the total area under the density curve is 1.0 [$f(x) = kx^8(1-x)$ is a legitimate density function only for k = 90]. Various proportions of interest can now be obtained by integration. For example,
$$\text{proportion of programs with } x \text{ between .7 and .9} = \int_{.7}^{.9} 90x^8(1-x)\,dx = 90\int_{.7}^{.9} x^8\,dx - 90\int_{.7}^{.9} x^9\,dx = 90\left(\frac{x^9}{9} - \frac{x^{10}}{10}\right)\bigg|_{.7}^{.9} = .587$$
$$\text{proportion of programs for which } x \text{ is at least .8} = \int_{.8}^{1} 90x^8(1-x)\,dx = .624$$


What duration value c separates the smallest 50% of all x values from the largest 50%? Figure 1.16 shows the location of c; the corresponding equation is
$$\int_0^c 90x^8(1-x)\,dx = .5$$
which becomes
$$90\left(\frac{c^9}{9} - \frac{c^{10}}{10}\right) = .5$$
Newton’s method or some other numerical technique is used to obtain the solution: c ≈ .838. That is, about 50% of all programs have music for more than .838 hr, and about 50% have music for less than .838 hr. The value .838 is called the median of the x distribution.
Figure 1.16 Determining the median of the distribution in Example 1.12 (shaded area = .5 to the left of the median, .838)
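Because equations like the one above rarely have closed-form solutions, it is worth seeing how such a value can be found numerically. The following sketch (in Python with SciPy; this is our illustration, not a method prescribed by the text) recovers the median of the distribution in Example 1.12:

```python
from scipy.optimize import brentq

# Closed-form cumulative area under f(x) = 90x^8(1 - x) from 0 to c:
# 90(c^9/9 - c^10/10) = 10c^9 - 9c^10
def cumulative_area(c):
    return 10 * c**9 - 9 * c**10

# The median is the root of cumulative_area(c) - .5 on the interval (0, 1)
median = brentq(lambda c: cumulative_area(c) - 0.5, 0.0, 1.0)
print(round(median, 3))  # 0.838, matching the value found by Newton's method
```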

Example 1.13 Let x denote the response time (sec) at a certain on-line computer; that is, x is the time between the end of a user’s inquiry and the beginning of the system’s response to that inquiry. The value of x varies from inquiry to inquiry. Suppose the density function for the distribution of x is
$$f(x) = \begin{cases} .2e^{-.2x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
where e represents the base of the natural logarithm system and approximately equals 2.71828. A graph of f(x) is shown in Figure 1.17. By inspection, $f(x) \ge 0$, and
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_0^{\infty} .2e^{-.2x}\,dx = -e^{-.2x}\Big|_0^{\infty} = 1$$


Figure 1.17 The density curve and 90th percentile (11.5, with shaded upper-tail area = .10) for Example 1.13

The proportion of inquiries with a response time less than 5 sec is
$$\int_0^5 .2e^{-.2x}\,dx = 1 - e^{-.2(5)} = .632$$
So 63.2% of all response times are at most 5 sec, and 36.8% of all times exceed 5 sec. The value c that separates the largest 10% of all times from the smallest 90% (called the 90th percentile) satisfies
$$.9 = \int_0^c .2e^{-.2x}\,dx = 1 - e^{-.2c}$$
from which c = −[ln(.1)]/.2 = 11.5. Only about 10% of all inquiries will have response times exceeding 11.5 sec.

The density function in Example 1.13 is a particular case of a more general function.

DEFINITION A variable x is said to have an exponential distribution with parameter λ > 0 if the density function for x is
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Each different value of λ prescribes a different exponential distribution, so we have an entire family of distributions. The shape of each density curve is like the curve in Figure 1.17; the curve starts at height λ above x = 0 and decreases exponentially as x increases. The exponential distribution has been used to model many different phenomena, including time


between successive arrivals at a service facility, the amount of time to complete a specified
task, and the 1-hr concentration of carbon monoxide in an air sample. In Sections 1.4 and
1.5, we introduce several other important continuous distributions.
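Since exponential curve areas have the closed form 1 − e^(−λt), proportions and percentiles can be checked with a few lines of code. Here is a minimal sketch (Python; our illustration, using the λ = .2 response-time model of Example 1.13):

```python
from math import exp, log

lam = 0.2  # exponential parameter (lambda) from Example 1.13

def exponential_cdf(t):
    # proportion of x values that are at most t
    return 1 - exp(-lam * t) if t >= 0 else 0.0

print(exponential_cdf(5))   # .632: response times at most 5 sec
print(-log(0.1) / lam)      # 11.5: the 90th percentile
```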

Discrete Distributions
Let’s focus on a variable x whose possible values are nonnegative integers; usually the value
of x results from counting something. A histogram of sample data will have rectangles
centered at values 0, 1, 2, . . . (or some subset of these) regardless of the sample size. How-
ever, as the sample size increases, the relative frequencies (sample proportions of various
x values) tend to get closer and closer to their true population or process counterparts. We
will use the following notation:
p(0) = proportion of x values in the population that equal 0, or the long-run proportion of x values in a process that equal 0

p(1) = proportion of x values in the population that equal 1, or the long-run proportion of x values in a process that equal 1

and so on. None of these proportions can be negative, and their sum must be 1 (so that
100% of the x values are included).

DEFINITION A population or process distribution for a discrete variable x is specified by a mass function p(x) satisfying
$$p(x) \ge 0 \qquad \sum p(x) = 1$$
where the summation is over all possible x values. Other interesting proportions can be obtained by adding various p(x) values. In particular, if a and b are integers with a < b, then
$$\text{proportion of } x \text{ values between } a \text{ and } b \text{ (inclusive)} = p(a) + p(a+1) + \cdots + p(b)$$

Example 1.14 Consider a package of four batteries of a particular type, and let x denote the number
of satisfactory (i.e., nondefective) batteries in the package. Possible values of x are 0,
1, 2, 3, and 4. One reasonable distribution for x is specified by the following mass
function:
$$p(x) = \frac{24}{x!(4-x)!}(.9)^x(.1)^{4-x} \qquad x = 0, 1, 2, 3, 4$$
where “!” is the factorial symbol (e.g., 4! = (4)(3)(2)(1) = 24, 1! = 1, and 0! = 1). This looks a bit intimidating, but there is an intuitive argument leading to p(x) that we will mention shortly. Substituting x = 3, we get
$$p(3) = \frac{24}{(6)(1)}(.9)^3(.1)^1 = .2916$$


That is, roughly 29% of all packages will have three good batteries. Substituting the other x values gives us the following tabulation:

x:    0      1      2      3      4
p(x): .0001  .0036  .0486  .2916  .6561

The proportion of packages with at least two good batteries is
$$\text{proportion of packages with } x \text{ between 2 and 4 (inclusive)} = p(2) + p(3) + p(4) = .9963$$
More than 99% of all packages have at least two good batteries.
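Both the tabulation and this proportion can be reproduced directly from the mass function; the following short sketch (Python; our illustration rather than anything from the text) evaluates p(x) for each possible value:

```python
from math import factorial

def p(x, n=4, s=0.9):
    # mass function of Example 1.14: x = number of satisfactory batteries
    return factorial(n) / (factorial(x) * factorial(n - x)) * s**x * (1 - s)**(n - x)

for x in range(5):
    print(x, round(p(x), 4))           # .0001, .0036, .0486, .2916, .6561
print(round(p(2) + p(3) + p(4), 4))    # .9963: at least two good batteries
```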

In Section 1.6, we will generalize the distribution of Example 1.14 and introduce one
additional important discrete distribution.

Section 1.3 Exercises


19. A continuous variable x is said to have a uniform distribution if the density function is given by
$$f(x) = \begin{cases} \dfrac{1}{b-a} & a < x < b \\ 0 & \text{otherwise} \end{cases}$$
The corresponding density “curve” has constant height over the interval from a to b. Suppose the time (min) taken by a clerk to process a certain application form has a uniform distribution with a = 4 and b = 6.
a. Draw the density curve, and verify that the total area under the curve is indeed 1.
b. In the long run, what proportion of forms will take between 4.5 min and 5.5 min to process? At least 4.5 min to process?
c. What value separates the slowest 50% of all processing times from the fastest 50% (the median of the distribution)?
d. What value separates the best 10% of all processing times from the remaining 90%?

20. Suppose that the reaction temperature x (°C) in a certain chemical process has a uniform distribution with a = −5 and b = 5 (refer to Exercise 19 for a description of a uniform distribution).
a. In the long run, what proportion of these reactions will have a negative value of temperature?
b. In the long run, what proportion of temperatures will be between −2 and 2? Between −2 and 3?
c. For any number k satisfying −5 < k < k + 4 < 5, what long-run proportion of temperatures will be between k and k + 4?

21. Suppose that your morning waiting time for a bus has a uniform distribution on the interval from 0 to 5 min, and your afternoon waiting time also has this distribution. Then if x denotes the total waiting time on any particular day, the density function of x can be shown to be
$$f(x) = \begin{cases} .04x & 0 < x < 5 \\ .4 - .04x & 5 \le x < 10 \\ 0 & \text{for other values of } x \end{cases}$$
a. Draw the density curve, and verify that f(x) specifies a legitimate distribution.
b. In the long run, what proportion of your total daily waiting times will be at most 3 min? At least 7 min? At least 4 min? Between 4 min and 7 min?
c. What value separates the longest 10% of your daily waiting times from the remaining 90%?

22. Data collected at Toronto Pearson International Airport suggests that an exponential distribution with λ = .37 is a good model for rainfall duration in hours (Urban Stormwater Management Planning with Analytical Probabilistic Models, 2000, p. 69).


a. What proportion of rainfall durations at this location are at least 2 hours? At most 3 hours? Between 2 and 3 hours?
b. What must the duration of a rainfall be to place it among the longest 5% of all times?

23. Extensive experience with fans of a certain type used in diesel engines has suggested that the exponential distribution with λ = .00004 provides a good model for time until failure (hr).
a. Sketch a graph of the density function.
b. What proportion of fans will last at least 20,000 hr? At most 30,000 hr? Between 20,000 and 30,000 hr?
c. What must the lifetime of a fan be to place it among the best 1% of all fans? Among the worst 1%?

24. The article “Probabilistic Fatigue Evaluation of Riveted Railway Bridges” (J. of Bridge Engr., 2008: 237–244) suggested the exponential distribution with λ = 1/6 as a model for the distribution of stress range (MPa) in certain bridge connections.
a. What proportion of stress ranges are at least 2 MPa? At most 7 MPa? Between 5 and 10 MPa?
b. What value separates the highest 2% of the stress ranges from the remaining 98%?

25. The actual tracking weight of a stereo cartridge set to track at 3 g can be regarded as a continuous variable with density function f(x) = c[1 − (x − 3)²] for 2 < x < 4 and f(x) = 0 otherwise.
a. Determine the value of c [you might find it helpful to graph f(x)].
b. What proportion of actual tracking weights exceed the target weight?
c. What proportion of actual tracking weights are within .25 g of the target weight?

26. Let x represent the number of underinflated tires on an automobile.
a. Which of the following p(x) functions specifies a legitimate distribution for x, and why are the other two not legitimate?
(i) p(0) = .3, p(1) = .2, p(2) = .1, p(3) = .05, p(4) = .05
(ii) p(0) = .4, p(1) = p(2) = p(3) = .1, p(4) = .3
(iii) p(x) = .2(3 − x) for x = 0, 1, 2, 3, 4
b. For the legitimate distribution of part (a), determine the long-run proportion of cars having at most two underinflated tires, the proportion having fewer than two underinflated tires, and the proportion having at least one underinflated tire.

27. A mail-order computer business has six telephone lines. Let x denote the number of lines in use at a specified time. Suppose the mass function of x is given by

x:    0    1    2    3    4    5    6
p(x): .10  .15  .20  .25  .20  ?    ?

a. In the long run, what proportion of the time will at most three lines be in use? Fewer than three lines?
b. In the long run, what proportion of the time will at least five lines be in use?
c. In the long run, what proportion of the time will between two and four lines, inclusive, be in use?
d. In the long run, what proportion of the time will at least four lines not be in use?

28. A contractor is required by a county planning department to submit 1, 2, 3, 4, or 5 forms (depending on the nature of the project) when applying for a building permit. Let y denote the number of forms required for an application, and suppose the mass function is given by p(y) = cy for y = 1, 2, 3, 4, or 5. Determine the value of c, as well as the long-run proportion of applications that require at most three forms and the long-run proportion that require between two and four forms, inclusive.

29. Many manufacturers have quality control programs that include inspection of incoming materials for defects. Suppose a computer manufacturer receives computer boards in batches of five. Two boards are randomly selected from each batch for inspection. Consider batches for which exactly two of the boards are defective; for convenience, number the defective boards as 1 and 2, and the nondefective boards as 3, 4, and 5. Let x denote the number of defective boards among the two actually inspected, and determine the mass function of x. Hint: One possible sample of size 2 consists of boards 1 and 2, another of boards 1 and 3, and so on. How many such samples are there, and what is the value of x for each sample?


1.4 The Normal Distribution 


The normal distribution is the most important distribution in statistics. A typical normal
density curve is shown in Figure 1.18. Many population and process variables have dis-
tributions that can be very closely fit by an appropriate normal curve. Examples include
heights, weights, and other physical characteristics of humans and animals, anthropometric
measurements on fossils, measurement errors in scientific experiments, reaction times in
psychological experiments, pollutant concentrations of various sorts, amounts dispensed
into containers by machines, thicknesses of material specimens, and numerous economic
measures and indicators. In addition, even when individual variables themselves are not
normally distributed, sums and averages of the variables will, under suitable conditions,
have approximately a normal distribution; this is the content of the Central Limit Theorem,
discussed in Chapter 5.

Figure 1.18 A typical normal density curve

DEFINITION A continuous variable x is said to have a normal distribution with parameters μ and σ, where −∞ < μ < ∞ and σ > 0, if the density function of x is
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)} \qquad -\infty < x < \infty$$

Again, e denotes the base of the natural logarithm system and has an approximate value of 2.71828, whereas π represents the familiar mathematical constant approximately equal to 3.14159.
Clearly, f(x) ≥ 0 for any number x, but techniques from multivariable calculus must be used to show that $\int_{-\infty}^{\infty} f(x)\,dx = 1$. The graph of f(x)—the density curve—is always a

Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has
deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
1.4 The Normal Distribution 37

bell-shaped curve (and hence symmetric) centered at μ, so μ is the median of the distribution. If the value of σ is close to zero, the normal curve is highly concentrated about μ (little variability in the distribution), whereas a large value of σ corresponds to a curve that spreads out a great deal (a substantial amount of variability). Figure 1.19 displays several different normal density curves. Any normal curve has two inflection points—points at which the curve changes from being concave downward to concave upward—that are equidistant from μ. It can be shown that the value of σ is the distance from μ to each inflection point, as illustrated in Figure 1.20.

Figure 1.19 Several normal density curves (μ = 40, σ = 2.5; μ = 10, σ = 5; μ = 70, σ = 10)

Figure 1.20 Visual identification of μ and σ (here μ = 100 and σ = 10; the curve turns from concave downward to concave upward a distance σ on either side of μ)



Suppose that capacitors of a certain type have resistances that vary according to a normal distribution, with μ = 800 megohms and σ = 200 megohms. If a particular application requires a resistance between 775 megohms and 850 megohms, the proportion of capacitors with satisfactory values of resistance (x) is
$$\text{proportion of } x \text{ values between 775 and 850} = \int_{775}^{850} \frac{1}{\sqrt{2\pi}(200)}\, e^{-(x-800)^2/[2(40{,}000)]}\,dx$$
Unfortunately, none of the standard integration techniques can be used to evaluate this integral. To calculate proportions of this sort, a special normal reference distribution is needed.


The Standard Normal Distribution

DEFINITIONS The normal distribution with parameter values μ = 0 and σ = 1 is called the standard normal distribution. We shall use the letter z to denote a variable that has this distribution. The corresponding density function is
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \qquad -\infty < z < \infty$$
The standard normal density curve, or z curve, is shown in Figure 1.21. It is centered at 0 and has inflection points at ±1.

Appendix Table I, which also appears on the inside front cover of the book, is a tabulation of cumulative z curve areas; that is, the table gives areas under the z curve to the left of various values (accumulated from −∞), as illustrated in Figure 1.21. Entries in this table were obtained by using numerical integration techniques, since the standard normal density function cannot be integrated in a straightforward way. Let’s first use this table to obtain various z curve areas and other z curve information, and then see how the table applies to any normal curve.

Figure 1.21 The standard normal (z) curve and a cumulative curve area (shaded area = proportion of values less than a given value)

Example 1.15 The proportion of values in a standard normal distribution that are less than 1.25 is

proportion of z values satisfying z < 1.25 = entry in Appendix Table I at the intersection of the 1.2 row and .05 column = .8944

It is also true that

proportion of z values satisfying z ≤ 1.25 = .8944

Similarly,

proportion of z values satisfying z < −.38 = entry in −0.3 row and .08 column of Appendix Table I = .3520


Figure 1.22 illustrates the simple relationship between an upper-tail area and a
cumulative area.

Figure 1.22 Obtaining an “area to the right” from a cumulative curve area: the area to the right of a value equals 1 (the total area) minus the cumulative area to its left

In particular,

proportion of z values satisfying z > 1.25 = area under z curve to the right of 1.25
  = 1 − area to the left of 1.25
  = 1 − .8944
  = .1056

What about the area under the z curve and above the interval between −.38 and 1.25? Figure 1.23 shows that this is a difference between two cumulative areas:

proportion of z values satisfying −.38 < z < 1.25 = (area to the left of 1.25) − (area to the left of −.38)
  = .8944 − .3520 = .5424

The proportion of z values satisfying −.38 ≤ z ≤ 1.25 is also .5424.

Figure 1.23 The area above an interval is the difference between two cumulative areas

In Example 1.15, a value on the horizontal z scale was specified and a curve area was
determined. We now reverse this process by showing how to select a value or values to cap-
ture a specified curve area.


Example 1.16 What value c on the horizontal z axis is such that the area under the z curve to the
left of c is .67? Figure 1.24 illustrates the situation.

Figure 1.24 Determining c to capture a specified cumulative area (.67)

In Appendix Table I, we must look in the main body for .6700 (or the closest entry to it). The value .6700 does indeed appear; it is at the intersection of the 0.4 row and the .04 column. Thus c = .44. That is, 67% of the area under the z curve lies to the left of .44. Another way of expressing this is to say that .44 is the 67th percentile of the standard normal distribution. If .6710 replaces .6700 in the question posed, the closest tabulated entry is .6700. Rather than use linear interpolation, we generally recommend simply using the closest entry to answer the question; our answer to the revised question would also be (approximately) .44.
What value c captures the upper-tail z curve area .05, as illustrated in Fig-
ure 1.25? The cumulative area to the left of c must be .9500. A search for this area
in Appendix Table I reveals the following information about the two closest entries:
.9495 is in the 1.6 row and .04 column
.9505 is in the 1.6 row and .05 column

Because the desired area .9500 is halfway between the two closest entries, we use
interpolation to find c = 1.645 (1.64 or 1.65 would also be acceptable answers).
Finally, what interval, symmetrically placed about zero, captures 95% of the
area under the z curve? This situation is illustrated in Figure 1.26.
Figure 1.25 Finding the value c to capture a specified upper-tail area (.05)

Figure 1.26 Determining c to capture a specified central z curve area (.95)


Since the lower-tail area to the left of −c must be .025, the cumulative area to the left of c is .9500 + .0250 = .9750. This cumulative area is in the 1.9 row and .06 column of the z table, so c = 1.96. Alternatively, the desired lower-tail area .0250 lies in the −1.9 row and .06 column of the z table, so −c = −1.96 and again c = 1.96.
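Statistical software can replace Appendix Table I for both directions of lookup. A brief sketch (Python with SciPy; our illustration, not part of the text) reproduces the values found in Examples 1.15 and 1.16:

```python
from scipy.stats import norm  # standard normal distribution by default

print(norm.cdf(1.25))    # .8944: area to the left of 1.25
print(norm.cdf(-0.38))   # .3520: area to the left of -.38
print(norm.ppf(0.67))    # .44: the 67th percentile
print(norm.ppf(0.95))    # 1.645: captures upper-tail area .05
print(norm.ppf(0.975))   # 1.96: endpoint of the central 95% interval
```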

Nonstandard Normal Distributions


Any normal curve area can be obtained by first calculating a “standardized” limit or limits,
and then determining the corresponding area under the z curve. The particulars are pre-
sented in the following proposition.

Proposition Let x have a normal distribution with parameters μ and σ. Then the standardized variable
$$z = \frac{x - \mu}{\sigma}$$
has a standard normal distribution. This implies that if we form the standardized limits
$$a^* = \frac{a - \mu}{\sigma} \qquad b^* = \frac{b - \mu}{\sigma}$$
then

proportion of x values satisfying a < x < b = proportion of z values satisfying a* < z < b*

proportion of x values satisfying x < a = proportion of z values satisfying z < a*

proportion of x values satisfying x > b = proportion of z values satisfying z > b*

Example 1.17 The time that it takes a driver to react to the brake light on a decelerating vehicle
is critical in avoiding rear-end collisions. The article “Fast-Rise Brake Lamp as a
Collision-Prevention Device” (Ergonomics, 1993: 391–395) suggests that reaction
time for an in-traffic response to a brake signal from standard brake lights can be
modeled with a normal distribution having parameters μ = 1.25 sec and σ = .46 sec. In the long run, what proportion of reaction times will be between 1.00 sec and 1.75 sec? Let x denote reaction time. The standardized limits are
$$\frac{1.00 - 1.25}{.46} = -.54 \qquad \frac{1.75 - 1.25}{.46} = 1.09$$


Thus

proportion of x values satisfying 1.00 < x < 1.75 = proportion of z values satisfying −.54 < z < 1.09
  = (entry in 1.0 row, .09 column of z table) − (entry in −0.5 row, .04 column of z table)
  = .8621 − .2946
  = .5675

This calculation is illustrated in Figure 1.27.

Figure 1.27 Standardizing to calculate the desired proportion in Example 1.17 (the normal μ = 1.25, σ = .46 curve area between 1.00 and 1.75 equals the z curve area between −.54 and 1.09)

Similarly, if 2 sec is viewed as a critically long reaction time, the proportion of reaction times that exceed this value is, since (2 − 1.25)/.46 = 1.63,

proportion of x values that exceed 2.0 = proportion of z values that exceed 1.63
  = 1 − area under z curve to the left of 1.63
  = 1 − .9484
  = .0516

Only a bit more than 5% of all reaction times will exceed 2 sec.

Example 1.18 The amount of distilled water dispensed by a certain machine has a normal distribution with μ = 64 oz and σ = .78 oz. What container size c will ensure that overflow occurs only .5% of the time? Let x denote the amount of water dispensed. The density curve for x is pictured in Figure 1.28, which shows that c captures a cumulative

Figure 1.28 Distribution of amount dispensed and desired percentile for Example 1.18 (shaded area = .995 to the left of c, the 99.5th percentile)

area of .995 under this normal curve. That is, c is the 99.5th percentile of this normal distribution. Standardizing then tells us that

proportion of x values satisfying x < c = proportion of z values satisfying z < (c − 64)/.78 = .995

How can we capture cumulative area .9950 under the z curve? The 2.5 row of Appendix Table I has entries .9949 and .9951 in the .07 and .08 columns, respectively. Let’s use the value 2.58 (a more detailed tabulation gives 2.576). This implies that
$$\frac{c - 64}{.78} = 2.58$$
giving
$$c = 64 + 2.58(.78) = 64 + 2.0 = 66 \text{ oz}$$

Notice that the general form of the expression for c in Example 1.18 is
$$c = \mu + (z \text{ critical value}) \cdot \sigma$$
where the z critical value captures the desired cumulative area under the z curve. Once we know how to capture a particular cumulative area under the z curve, it is easy to determine how to capture the same area under any other normal curve.
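As a quick numerical check of this formula, the container size of Example 1.18 can be recomputed in software; a sketch (Python/SciPy, our illustration rather than the text’s method):

```python
from scipy.stats import norm

mu, sigma = 64, 0.78               # water-dispensing machine of Example 1.18
z_crit = norm.ppf(0.995)           # 2.576: z critical value for cumulative area .995
print(mu + z_crit * sigma)         # approximately 66.0 oz, as found above
print(norm.ppf(0.995, mu, sigma))  # the same percentile computed directly
```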
A histogram of sample data may suggest that a normal curve specifies a reasonable population or process distribution, but appropriate values of μ and σ still remain to be chosen. In Chapter 2, we begin to see how this can be done.

The Normal Distribution and Discrete Populations


The normal distribution is often used as an approximation to the distribution of values in a discrete population. For example, the distribution of x = IQ in many populations is taken to be approximately normal with μ = 100 and σ = 15, though IQ is an integer-valued variable. A picture of the population distribution consists of a histogram with rectangles

centered at possible values of x. Consider the distribution of x = the number of correct responses among 20 true–false questions included on a final exam. A picture of the distribution is shown in Figure 1.29 along with the approximating normal curve. Notice that the rectangle above 10 has its right edge at 10.5, so an approximation to the proportion of x values that are at most 10 is the area under the normal curve to the left of 10.5 (i.e., 10.5 should be standardized to obtain the approximation).

Figure 1.29 A normal approximation (normal curve with μ = 12, σ = 2.2) to the distribution of x = number of correct responses on a 20-question true–false test; the shaded area to the left of 10.5 approximates the proportion of x values that are at most 10
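To make the continuity idea concrete, here is a small sketch (Python/SciPy; our illustration using the μ = 12, σ = 2.2 curve of Figure 1.29) of the approximation to the proportion of scores that are at most 10:

```python
from scipy.stats import norm

mu, sigma = 12, 2.2
# Standardize the right edge of the rectangle above 10 (that is, 10.5), not 10 itself
z = (10.5 - mu) / sigma
print(norm.cdf(z))  # roughly .25: approximate proportion of x values at most 10
```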

Section 1.4 Exercises

30. Suppose that values are repeatedly chosen from a standard normal distribution.
a. In the long run, what proportion of values will be at most 2.15? Less than 2.15?
b. What is the long-run proportion of selected values that will exceed 1.50? That will exceed −2.00?
c. What is the long-run proportion of values that will be between −1.23 and 2.85?
d. What is the long-run proportion of values that will exceed 5? That will exceed −5?
e. In the long run, what proportion of selected values z will satisfy |z| < 2.50?

31. In the long run, what proportion of values selected from the standard normal distribution will satisfy each of the following conditions?
a. Be at most 1.78
b. Exceed .55
c. Exceed 2.80
d. Be between .21 and 1.21
e. Be either at most −2.00 or at least 2.00
f. Be at most −4.2
g. Be at least 4.33

32. a. What value z* is such that the area under the standard normal curve to the left of z* is .9082?
b. What value z* is such that the area under the standard normal curve to the left of that value is .9080?
c. What value z* is such that the area under the standard normal curve to the right of z* is .121?
d. What value z* is such that the area under the standard normal curve between −z* and z* is .754?
e. How far to the right of 0 would you have to go to capture an upper-tail z curve area of .002? How far to the left of 0 would you have to go to capture this same lower-tail area?

33. Suppose that values are successively chosen from the standard normal distribution.
a. How large must a value be to be among the largest 15% of all values selected?
b. How small must a value be to be among the smallest 25% of all values selected?
c. What values are among the 4% that are farthest from 0?

34. Determine the following percentiles for the standard normal distribution:
a. 91st  b. 9th  c. 22nd  d. 99.9th

35. Suppose that the thicknesses of bolts (mm) manufactured by a certain process can be modeled with a normal distribution having μ = 10 and σ = 1. Note: The density curve here is just the standard normal curve shifted to be centered at 10 rather than 0.
a. What is the long-run proportion of bolts whose thicknesses are at most 11 mm? Hint: The corresponding normal curve area is identical to what z curve area?
b. In the long run, what proportion of these bolts will have thickness values between 7.5 mm and 12.5 mm?
c. In the long run, what proportion of these bolts will have thicknesses that exceed 11.5 mm?

36. Suppose the flow of current (milliamps) in wire strips of a certain type under specified conditions can be modeled with a normal distribution having μ = 20 and σ = 1 (think about how the corresponding density curve relates to the standard normal curve).
a. What proportion of strips will have a current flow of between 18.5 and 22 milliamps?
b. What proportion of strips will have a current flow exceeding 15 milliamps?
c. How large must a current flow be to be among the largest 5% of all flows?

37. Mopeds (small motorcycles with an engine capacity below 50 cm³) are popular in Europe because of their mobility, ease of operation, and low cost. The article “Procedure to Verify the Maximum Speed of Automatic Transmission Mopeds in Periodic Motor Vehicle Inspections” (J. of Automobile Engr., 2008: 1615–1623) described a rolling bench test for determining maximum vehicle speed. A normal distribution with μ = 46.8 km/h and σ = 1.75 km/h is postulated.
a. What proportion of mopeds have a maximum speed that is at most 50 km/h?
b. What proportion of mopeds have a maximum speed that is at least 48 km/h?
c. What speed separates the fastest 75% of all mopeds from the others?

38. Spray drift is a constant concern for pesticide applicators and agricultural producers. The inverse relationship between droplet size and drift potential is well known. The paper “Effects of 2,4-D Formulation and Quinclorac on Spray Droplet Size and Deposition” (Weed Technology, 2005: 1030–1036) investigated the effects of herbicide formulation on spray atomization. A figure in the paper suggested the normal distribution with μ = 1050 μm and σ = 150 μm was a reasonable model for droplet size for water (the “control treatment”) sprayed through a 760 ml/min nozzle.
a. What proportion of all droplets have a size that is less than 1500 μm? At least 1000 μm?
b. What proportion of all droplets have a size that is between 1000 and 1500 μm?
c. How would you characterize the smallest 2% of all droplets?

39. The article “Reliability of Domestic-Waste Biofilm Reactors” (J. of Envir. Engr., 1995: 785–790) suggests that substrate concentration (mg/cm³) of influent to a reactor is normally distributed with μ = .30 and σ = .06.
a. What proportion of concentration values exceed .25?
b. What proportion of concentration values are at most .10?
c. How would you characterize the largest 5% of all concentration values?

40. Consider babies born in the “normal range” of 37–43 weeks gestational age. Extensive data supports the assumption that for such babies born in the United States, birth weight is normally distributed with μ = 3432 g and σ = 482 g. [The article “Are Babies Normal?” (The American Statistician, 1999: 298–302) analyzed


data from a particular year; for a sensible choice of class intervals, a histogram did not look normal but further investigation revealed that this was because some hospitals measured weight in grams and others measured to the nearest ounce and then converted the data to grams. A modified choice of class intervals that allowed for this gave a histogram that was well described by a normal distribution.]
a. For babies of this type, what proportion of all birth weights exceeds 4000 g?
b. For babies of this type, what proportion of all birth weights is between 3000 and 4000 g?
c. How would you characterize the highest .1% of all birth weights?
d. What value c is such that the interval (3432 − c, 3432 + c) includes 98% of all birth weights?

41. Let x denote the number of flaws along a 100-m reel of magnetic tape (values of x are whole numbers). Suppose x has approximately a normal distribution with μ = 25 and σ = 5.
a. What proportion of reels will have between 20 and 40 flaws, inclusive?
b. What proportion of reels will have at most 30 flaws? Fewer than 30 flaws?

42. Based on extensive data from an urban freeway near Toronto, Canada, “it is assumed that free speeds can best be represented by a normal distribution” (“Impact of Driver Compliance on the Safety and Operational Impacts of Freeway Variable Speed Limit Systems” (J. of Transp. Engr., 2011: 260–268)). The values of μ and σ reported in the article were 119 km/h and 13.1 km/h, respectively.
a. What percentage of vehicles have speeds that are between 100 and 120 km/hr?
b. What speed characterizes the fastest 10% of all speeds?
c. The posted speed limit was 100 km/hr. What percentage of vehicles were traveling at speeds exceeding this posted limit?
d. What two values, symmetrically placed about 119, capture 90% of all vehicle speeds?
e. What values symmetrically placed about 119 separate .1% of the most extreme vehicle speeds from the rest?

1.5 Other Continuous Distributions 


Normal density curves are always bell-shaped and therefore symmetric. Exponential densi-
ty curves are positively skewed but have their maximum at x = 0 and decrease as x increases.
Many histograms of data encountered in applied work are skewed and unimodal, rising to
a maximum and then declining. We now present several useful distributions that have this
property. Our survey is not exhaustive. Consult the bibliography at the end of the chapter
for information on the gamma, beta, and other distributions not discussed here.

The Lognormal Distribution


Lognormal distributions are related to normal distributions in exactly the way the name
suggests.

definition A nonnegative variable x is said to have a lognormal distribution if ln(x) has a normal distribution with parameters μ and σ. It can be shown that the density function of x is
$$f(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-[\ln(x)-\mu]^2/(2\sigma^2)} & x > 0 \\ 0 & \text{for } x \le 0 \end{cases}$$


Figure 1.30 illustrates density curves for several different combinations of μ and σ. Every lognormal distribution is positively skewed. The following example shows that by taking logarithms, calculation of any lognormal curve area reduces to a normal distribution computation.

Figure 1.30 Lognormal density curves (μ = 1, σ = 1; μ = 3, σ = √3; μ = 3, σ = 1)

Example 1.19 According to the article “Predictive Model for Pitting Corrosion in Buried Oil and Gas Pipelines” (Corrosion, 2009: 332–342), the lognormal distribution has been reported as the best option for describing the distribution of maximum pit depth data from cast iron pipes in soil. The authors suggest that a lognormal distribution with μ = .353 and σ = .754 is appropriate for maximum pit depth (mm) of buried pipelines. Since x < 2 is equivalent to ln(x) < ln(2) = .693,

proportion of pipelines with x < 2 = proportion of pipelines with ln(x) < .693
  = area under normal (.353, .754) curve to the left of .693
  = area under z curve to the left of (.693 − .353)/.754
  = area under z curve to the left of .45
  = .6736

Similarly, since ln(1) = 0 and (0 − .353)/.754 = −0.47,

proportion of pipelines with 1 < x < 2 = area under z curve between −0.47 and 0.45
  = .6736 − .3192
  = .3544
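The reduction of a lognormal area to a z curve area is easy to mirror in code; a minimal sketch (Python/SciPy; our illustration with the pit-depth parameters of Example 1.19):

```python
from math import log
from scipy.stats import norm

mu, sigma = 0.353, 0.754  # parameters of ln(x) in Example 1.19

def lognormal_cdf(t):
    # x < t is equivalent to ln(x) < ln(t), so standardize ln(t)
    return norm.cdf((log(t) - mu) / sigma)

print(lognormal_cdf(2))                     # about .674: pit depth less than 2 mm
print(lognormal_cdf(2) - lognormal_cdf(1))  # about .354: between 1 and 2 mm
```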

The Weibull Distribution


This distribution was introduced in 1939 by a Swedish physicist who developed many
applications over the course of the following two decades.


definition A variable x has a Weibull distribution with parameters α and β if the density function of x is
$$f(x) = \begin{cases} \dfrac{\alpha}{\beta^{\alpha}}\, x^{\alpha-1} e^{-(x/\beta)^{\alpha}} & x > 0 \\ 0 & x \le 0 \end{cases}$$

When α = 1, the Weibull density function reduces to the exponential density function (with λ = 1/β). Figure 1.31 shows several Weibull density curves. Some combinations of α and β result in a positive skew and others, a negative skew.

Figure 1.31 Weibull density curves (left panel: α = 1, β = 1 (exponential); α = 2, β = 1; α = 2, β = .5; right panel: α = 10 with β = .5, 1, and 2)

Let t represent some positive number. The proportion of x values satisfying x < t is
$$\text{area under density curve to the left of } t = \int_0^t f(x)\,dx = 1 - e^{-(t/\beta)^{\alpha}}$$
Thus, rather than needing a table of cumulative areas, such as the z table for normal distribution calculations, we use a simple mathematical function to get this information.

Example 1.20 In recent years the Weibull distribution has been used to model engine emissions of various pollutants. Let x denote the amount of NOx emission (g/gal) from a certain type of four-stroke engine, and suppose that x has a Weibull distribution with α = 2 and β = 10 (suggested by information in the article “Quantification of Variability and Uncertainty in Lawn and Garden Equipment NOx and Total Hydrocarbon Emission Factors,” J. of the Air and Waste Management Assoc., 2002: 435–448). The corresponding density curve looks exactly like the one in Figure 1.31 for α = 2, β = 1 except that now the values 50 and 100 replace 5 and 10 on the horizontal axis (because β is a “scale parameter”). Then
$$\text{proportion of engines emitting less than 10 g/gal} = 1 - e^{-(10/10)^2} = 1 - e^{-1} = .632$$
The proportion of engines emitting at most 25 g/gal is .998, so the distribution is almost entirely concentrated on values between 0 and 25. The value c which separates the 5% of all engines having the largest amounts of NOx emissions from the remaining 95% satisfies
$$.95 = 1 - e^{-(c/10)^2}$$
Isolating the exponential term on one side, taking logarithms, and solving the resulting equation gives c ≈ 17.3 as the 95th percentile of the emissions distribution.
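Because the Weibull cumulative area has the closed form 1 − e^(−(t/β)^α), both quantities in Example 1.20 can be verified directly; a short sketch (Python; our illustration, not part of the text):

```python
from math import exp, log

alpha, beta = 2, 10  # Weibull parameters from Example 1.20

def weibull_cdf(t):
    # proportion of x values satisfying x < t
    return 1 - exp(-(t / beta) ** alpha)

def weibull_percentile(p):
    # solve 1 - exp(-(c/beta)**alpha) = p for c
    return beta * (-log(1 - p)) ** (1 / alpha)

print(weibull_cdf(10))           # .632: engines emitting less than 10 g/gal
print(weibull_percentile(0.95))  # about 17.3: the 95th percentile
```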

Selecting an Appropriate Distribution


The choice of an appropriate distribution for a continuous variable x is usually based on
sample data. An investigator must first decide whether a particular family, such as the
Weibull family or the normal family, is reasonable. Then any parameters of the chosen
family must be estimated to find a particular member of the family that in some sense best
fits the data. These issues are considered in subsequent chapters.

Section 1.5 Exercises

43. A theoretical justification based on a certain material failure mechanism underlies the assumption that ductile strength of a material has a lognormal distribution. Suppose the values of the parameters are μ = 5 and σ = .1.
a. What proportion of material specimens have a ductile strength exceeding 120? What proportion have a ductile strength of at least 120?
b. What proportion of material specimens have a ductile strength between 110 and 130?
c. If the smallest 5% of strength values were unacceptable, what would be the minimum acceptable strength?

44. Nonpoint source loads are chemical masses that travel to the main stem of a river and its tributaries in flows that are distributed over relatively long stream reaches in contrast to those that enter at well-defined and regulated points. The article “Assessing Uncertainty in Mass Balance Calculation of River Nonpoint Source Loads” (J. of Envir. Engr., 2008: 247–258) suggested that for a certain time period and location, x = nonpoint source load of total dissolved solids (in kg/day/km) could be modeled with a lognormal distribution having μ = 9.164 and σ = .385.
a. What proportion of source loads are at most 15,000 kg/day/km?
b. What interval (a, b) is such that 95% of all source loads have values in this interval, 2.5% have values less than a, and 2.5% have values exceeding b?

45. The article “Response of SiGf/Si3N4 Composites Under Static and Cyclic Loading—An Experimental and Statistical Analysis” (J. of Engr. Materials and Technology, 1997: 186–193) suggests that tensile strength (MPa) of composites under specified conditions can be modeled by a Weibull distribution with α = 9 and β = 180.
a. Sketch a graph of the density function.
b. What proportion of specimens of this type have strength values exceeding 175?
c. What proportion of specimens of this type have strength values between 150 and 175?
d. What strength value separates the weakest 10% of all specimens from the remaining 90%?


46. Suppose that fracture strength (MPa) of silicon nitride braze joints under certain conditions has a Weibull distribution with α = 5 and β = 125 (suggested by data in the article “Heat-Resistant Active Brazing of Silicon Nitride: Mechanical Evaluation of Braze Joints,” Welding J., August 1997: 300s–304s).
a. What proportion of such joints have a fracture strength of at most 100? Between 100 and 150?
b. What strength value separates the weakest 50% of all joints from the strongest 50%?
c. What strength value characterizes the weakest 5% of all joints?

47. The Weibull distribution discussed in this section has a positive density function for all x > 0. In some situations, the smallest possible value of x will be some number γ that exceeds zero. A shifted Weibull distribution, appropriate in such situations, has a density function for x > γ obtained by replacing x with x − γ in the earlier density function formula. The article “Predictive Posterior Distributions from a Bayesian Version of a Slash Pine Yield Model” (Forest Science, 1996: 456–463) suggests that the values γ = 1.3 cm, α = 4, and β = 5.8 specify an appropriate distribution for diameters of trees in a particular location.
a. What proportion of trees have diameters between 2 and 4 cm?
b. What proportion of trees have diameters that are at least 5 cm?
c. What is the median diameter of trees, that is, the value separating the smallest 50% from the largest 50% of all diameters?

48. The paper “Study on the Life Distribution of Microdrills” (J. of Engr. Manufacture, 2002: 301–305) reported the following observations, listed in increasing order, on drill lifetime (number of holes that a drill machines before it breaks) when holes were drilled in a certain brass alloy.
a. Construct a histogram of the data using class boundaries 0, 50, 100, . . . , and then comment on interesting characteristics.
b. Construct a histogram of the natural logarithms of the lifetime observations, and comment on interesting characteristics.

11 14 20 23 31 36 39 44
47 50 59 61 65 67 68 71
74 76 78 79 81 84 85 89
91 93 96 99 101 104 105 105
112 118 123 136 139 141 148 158
161 168 184 206 248 263 289 322
388 513

49. The authors of the paper from which the data in the previous exercise was extracted suggested that a reasonable probability model for drill lifetime was a lognormal distribution with μ = 4.5 and σ = .8.
a. What proportion of lifetime values are at most 100?
b. What proportion of lifetime values are at least 200? Greater than 200?

50. The article cited in Example 1.20 proposed the lognormal distribution with μ = 4.5 and σ = .625 as a model for total hydrocarbon emissions (g/gal).
a. What proportion of engines emit at least 50 g/gal? Between 50 and 150 g/gal?
b. What value c separates the best 1% of engines with respect to THC emissions from the remaining 99%?

51. The article “On Assessing the Accuracy of Offshore Wind Turbine Reliability-Based Design Loads from the Environmental Contour Method” (Intl. J. of Offshore and Polar Engr., 2005: 132–140) proposes the Weibull distribution with α = 1.817 and β = .863 as a model for 1-hour significant wave height (m) at a certain site.
a. What proportion of wave heights are at most 0.5 m?
b. What proportion of wave heights are between 0.2 and 0.6 m?
c. What is the 90th percentile of the wave height distribution? The 10th percentile?
1.6 Several Useful Discrete Distributions


A distribution for a discrete variable x is specified by a mass function p(x) satisfying p(x) > 0 for every possible value and \sum p(x) = 1. Here p(0) is the population or long-run process proportion of x values that equal 0, p(1) is the proportion of x
values that equal 1, and so on. We now introduce the two discrete distributions
that appear most frequently in statistical applications: the binomial and the Poisson
distributions.

The Binomial Distribution


Cartridges for a certain type of rollerball pen are sold two to a package. Suppose that 20%
of all such cartridges leak, making them unsatisfactory, and the other 80% do not leak. Let’s
also assume that the condition of the second cartridge in a package—satisfactory or unsat-
isfactory—is independent of the first cartridge’s condition. By this we mean that in packages
with a satisfactory first cartridge, 80% of the second cartridges are satisfactory, and in packag-
es with an unsatisfactory first cartridge, 80% of the second cartridges are satisfactory. In other
words, the percentage of satisfactory second cartridges is not affected by the condition of the
first cartridge. We will use SS to denote a package with two satisfactory cartridges, and SF
to denote a package with a satisfactory first cartridge and an unsatisfactory second cartridge
(S for success and F for failure). Then 80% of all packages will have a first S, and of these, a
further 80% will have a second S, giving 80% of 80% or 64% SS’s. Similarly, of the 80% of
all packages that have a first S, 20% will have a second cartridge that is an F, so 20% of 80%
or 16% of all packages will be SF’s. This is also the percentage of all packages that are FS’s:
80% of 20% or 16%. Finally, 20% of 20% or 4% of all packages are FF’s. Notice that these
percentages result from multiplying pairs of proportions:
SS: (.8)(.8) = .64 or 64%
SF: (.8)(.2) = .16 or 16%
FS: (.2)(.8) = .16 or 16%
FF: (.2)(.2) = .04 or 4%

Now let x be the number of S’s in a package. Possible values of x are 0, 1, and 2. Our calculations imply that the proportion of all packages with x = 0 is .04 and the proportion of all packages with x = 2 is .64. Because 16% of all packages are SF’s and 16% are FS’s,

(proportion of all packages with x = 1) = .16 + .16 = .32

That is, in the long run, 32% of all packages will have x = 1 (this also comes from 1 − .04 − .64).
Suppose instead that cartridges come in packages of four. Again let x be the number of S’s in a package. One way to get a package with x = 2 is SSFF, and by independence, the percentage of all such packages is 20% of 20% of 80% of 80% or 100[(.8)(.8)(.2)(.2)] = 2.56%, or a proportion of .0256. But there are in fact five other ways, for a total of six possibilities:

Outcome for which x = 2      Proportion
SSFF (S in 1 and 2)          (.8)(.8)(.2)(.2) = .0256
SFSF (S in 1 and 3)          (.8)(.2)(.8)(.2) = .0256
SFFS (S in 1 and 4)          .0256
FSSF (S in 2 and 3)          .0256
FSFS (S in 2 and 4)          .0256
FFSS (S in 3 and 4)          .0256


The population or long-run process proportion of packages having x = 2 is then the sum of these six values of .0256, or 6(.0256) = .1536. Similarly, there are four possibilities for x = 1—the single satisfactory cartridge could be the first, second, third, or fourth one in the package. The proportion of SFFF’s is (.8)(.2)(.2)(.2) = .0064, which is also the proportion of FSFF’s, FFSF’s, and FFFS’s. Adding .0064 four times gives

(proportion of packages with x = 1) = 4(.8)(.2)³ = .0256

By the same reasoning,

(proportion of packages with x = 3) = 4(.8)³(.2) = .4096

so roughly 41% of all packages will have three satisfactory cartridges.
What if packages have ten cartridges, and you want to know what proportion have six
S’s? It is extremely tedious to list all possibilities, but fortunately this is unnecessary. There
is a straightforward counting technique to determine the number of possible outcomes hav-
ing any particular x value.

The Binomial Distribution


Suppose that items or entities of some sort come in batches or groups of size n. Let π denote the proportion of all items in the population or process that are satisfactory (S, for success), so the proportion of all items that are unsatisfactory (F, for failure) is 1 − π. Assume that the condition of any particular item (S or F) is independent of that of any other item. The binomial variable x is the number of S’s in a batch or group. The mass function of x is given by the formula

    p(x) = proportion of batches with x S’s

         = \frac{n!}{x!(n - x)!} \cdot \pi^x (1 - \pi)^{n - x} \qquad x = 0, 1, \ldots, n

In the case of a population, the formula gives good approximations as long as the total
number of items examined in all batches is at most 5% of the population size (answers
are exact if the population size is infinite). For a process, it is required that the value of
π remain constant over time (a stable process).
In the mass function formula, π^x(1 − π)^{n−x} generalizes the multiplications (.8)³(.2) and (.8)²(.2)² in the pen cartridge example. The factorial expression is the number of possible outcomes for a batch of size n that have x S’s. For example, when n = 4 and x = 2,

    \frac{n!}{x!(n - x)!} = \frac{4!}{(2!)(2!)} = \frac{(4)(3)(2)(1)}{(2)(1)(2)(1)} = \frac{(4)(3)}{(2)(1)} = 6

as we saw previously. You can find a derivation of this formula in several of the refer-
ences listed in the bibliography.
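These batch proportions are also easy to verify by computer. The following Python sketch (our illustration, not part of the original exposition; the helper name binomial_pmf is ours) evaluates the mass function for the package-of-four cartridge calculations:

    from math import comb

    def binomial_pmf(x, n, pi):
        # Proportion of batches of size n containing exactly x S's
        # when the long-run success proportion is pi
        return comb(n, x) * pi**x * (1 - pi)**(n - x)

    print(comb(4, 2))                        # 6 outcomes having two S's
    print(round(binomial_pmf(2, 4, .8), 4))  # .1536
    print(round(binomial_pmf(1, 4, .8), 4))  # .0256
    print(round(binomial_pmf(3, 4, .8), 4))  # .4096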


Example 1.21   The binomial distribution is used extensively in genetic applications. An early genetics article (“The Progeny in Generation F12 to F17 of a Cross Between a Yellow-Wrinkled and a Green-Round Seeded Pea,” J. of Genetics, 1923: 255–331) reported on an experiment in which four-seeded pea pods from a dihybrid cross were examined. The variable of interest was x = the number of YR (yellow and round) peas in a pod. Mendelian laws of inheritance imply that π = 9/16 = .5625 [from (3/4)(3/4)]. Now consider peas with eight-seeded pods. The proportion of all pods with five YR peas is

    (proportion with x = 5) = \frac{8!}{(5!)(3!)} (.5625)^5 (.4375)^3 = 56(.5625)^5(.4375)^3 = .2641

The proportion of all pods with at least five such peas is

    (proportion with x ≥ 5) = p(5) + p(6) + p(7) + p(8) = .2641 + .1698 + .0624 + .0100 = .5063

In the long run, slightly more than 50% of all pods will have five or more YR peas and slightly less than 50% will have four or fewer YR peas. The complete distribution of x is as follows:

    x:      0      1      2      3      4      5      6      7      8
    p(x): .0013  .0138  .0621  .1598  .2567  .2641  .1698  .0624  .0100

Figure 1.32 shows a picture of this distribution. The binomial histogram has a slight negative skew (it is symmetric only when π = .5).

Figure 1.32   A binomial histogram when n = 8 and π = .5625


Use of the binomial distribution formula can be tedious when n is large. Appendix Table II gives a tabulation of p(x) for a few selected values of n and π. This will allow you to practice binomial calculations without referring to the formula. Alternatively, values of p(x) for any n and π can be obtained from Minitab and other statistical computer packages.

The Poisson Distribution


The Poisson distribution is usually used as a model for the number of times an “event”
of some sort occurs during a specified time period or in a particular region of space.
Examples include the number of accidents that occur on a segment of highway during
a particular 24-hour period, the number of blemishes on the exterior of a new automo-
bile, the number of customers in a grocery store’s express line on Wednesday at 6 p.m.,
and the number of plants of a particular species that are found in a chosen geographic
sampling region.

The Poisson Distribution

The Poisson mass function is

    p(x) = \frac{e^{-\lambda} \lambda^x}{x!} \qquad x = 0, 1, 2, 3, \ldots

where the parameter λ must satisfy λ > 0.

The condition p(x) ≥ 0 is clearly satisfied. The fact that \sum_{x=0}^{\infty} p(x) = 1 is a consequence of multiplying both sides of the following infinite series expansion by e^{-\lambda}:

    e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots

We shall see in Chapter 2 that λ can be interpreted as the average rate at which events occur.
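A two-line check (our sketch, not the text's) confirms numerically that these mass-function values sum to essentially 1; the choice λ = 4.5 anticipates the next example:

    from math import exp, factorial

    def poisson_pmf(x, lam):
        # Poisson mass function e^(-lam) * lam^x / x!
        return exp(-lam) * lam**x / factorial(x)

    lam = 4.5
    total = sum(poisson_pmf(x, lam) for x in range(100))
    print(round(total, 6))    # 1.0, up to rounding in the tail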

Example 1.22   Let x denote the number of creatures of a particular type captured in a trap during a given time period. Suppose that x has a Poisson distribution with λ = 4.5, so, on average, traps will contain 4.5 creatures. [The article “Dispersal Dynamics of the Bivalve Gemma Gemma in a Patchy Environment” (Ecological Monographs, 1995: 1–20) suggests this model; the bivalve Gemma gemma is a small clam.] The proportion of traps with five creatures is

    (proportion with x = 5) = \frac{e^{-4.5}(4.5)^5}{5!} = .1708


The proportion of traps having at most five creatures is

    (proportion with x ≤ 5) = p(0) + p(1) + ⋯ + p(5) = .7029 (roughly 70%)

so the proportion of traps with at least six creatures is 1 − .7029 = .2971. As x increases, p(x) decreases but never quite reaches zero. The proportions for the first 13 x values follow; their sum is .9992. Figure 1.33 shows the corresponding Poisson histogram.

    x:      0      1      2      3      4      5      6
    p(x): .0111  .0500  .1125  .1687  .1898  .1708  .1281
    x:      7      8      9     10     11     12
    p(x): .0824  .0463  .0232  .0104  .0043  .0016

Figure 1.33   Poisson histogram when λ = 4.5

A small tabulation of the Poisson mass function for selected values of λ appears in Appendix Table III.
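Alternatively, the quantities in Example 1.22 can be computed directly rather than read from the table; a brief sketch (ours) is:

    from math import exp, factorial

    def poisson_pmf(x, lam):
        return exp(-lam) * lam**x / factorial(x)

    lam = 4.5
    print(round(poisson_pmf(5, lam), 4))    # .1708, proportion of traps with x = 5
    at_most_5 = sum(poisson_pmf(x, lam) for x in range(6))
    print(round(at_most_5, 4))              # .7029, proportion with x <= 5
    print(round(1 - at_most_5, 4))          # .2971, proportion with x >= 6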

The Poisson Approximation to the Binomial Distribution

Often a binomial scenario involves a group size n that is quite large in combination with a success proportion π close to zero. Under such circumstances, the binomial mass function can be well approximated by the Poisson mass function with λ = nπ. In particular, if n ≥ 100, π ≤ .01, and λ = nπ ≤ 20, then

    \frac{n!}{x!(n - x)!} \pi^x (1 - \pi)^{n - x} \approx \frac{e^{-n\pi} (n\pi)^x}{x!}

A more formal statement of this result is that the Poisson mass function on the right-hand side is the limit of the binomial mass function on the left as n → ∞ and π → 0 in such a way that nπ → λ.


Example 1.23   Components of a certain type are shipped from a supplier to customers in lots of 5000. Because the purchaser cannot check the condition of each component, a sample of 100 is selected and tested. The entire lot will then be accepted only if the number of components x that do not conform to specification is at most three (so here S’s are nonconforming units, not what we usually think of as a success). Suppose that .5% of all components are nonconforming, giving λ = 100(.005) = .5. Then the proportion of acceptable lots is

    proportion of lots with x ≤ 3
      = p(0) + p(1) + p(2) + p(3)
      = \frac{100!}{0!100!}(.005)^0(.995)^{100} + \cdots + \frac{100!}{3!97!}(.005)^3(.995)^{97}
      ≈ \frac{e^{-.5}(.5)^0}{0!} + \cdots + \frac{e^{-.5}(.5)^3}{3!}
      = .6065 + .3033 + .0758 + .0126
      = .9982

The exact proportion using the binomial mass function is .6058 + .3044 + .0757 + .0124 = .9983.
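The quality of the approximation in this example is easy to examine directly; the following sketch (ours) computes both sides for x ≤ 3:

    from math import comb, exp, factorial

    n, pi = 100, .005
    lam = n * pi     # .5

    exact = sum(comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(4))
    approx = sum(exp(-lam) * lam**x / factorial(x) for x in range(4))

    print(round(exact, 4))     # .9983, the binomial value
    print(round(approx, 4))    # .9982, the Poisson approximation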

Many applications of the Poisson distribution are in fact based on an underlying binomial situation without the values of n and π being stated explicitly. For example, a very large number of vehicles may pass over a given stretch of highway during a particular time period, but the long-run proportion of vehicles receiving speeding tickets will be quite small, so the number of ticketed vehicles will have at least approximately a Poisson distribution.

Section 1.6 Exercises


52. When circuit boards used in the manufacture of compact disc players are tested, the long-run percentage of defectives is 5%. Let x denote the number of defective boards in a batch of 25 boards, so that x has a binomial distribution with n = 25 and π = .05.
    a. What proportion of batches have at most 2 defective boards?
    b. What proportion of batches have at least 5 defective boards?
    c. What proportion of batches will have all 25 boards free of defects?

53. A company packages its crystal goblets in boxes containing six goblets. Suppose that 12% of all its goblets have cosmetic flaws and that the condition of any particular goblet with respect to flaws is independent of the condition of any other goblet.
    a. What proportion of boxes will contain only one goblet with a cosmetic flaw?
    b. What proportion of boxes will contain at least two goblets with cosmetic flaws?
    c. What proportion of boxes will have between one and three goblets, inclusive, with cosmetic flaws?

54. On her way to work, a friend of ours must pass through ten traffic signals. Suppose that in the long run, she encounters a red light at 40% of these signals and that
whether any particular signal is red is independent of whether any other one is red.
    a. On what proportion of days will our friend encounter at most two red lights? At least five red lights?
    b. On what proportion of days will our friend encounter between two and five (inclusive) red lights?

55. Suppose that 10% of all bits transmitted through a digital communication channel are erroneously received and that whether any particular bit is erroneously received is independent of whether any other bit is erroneously received. Consider sending a very large number of messages, each consisting of 20 bits.
    a. What proportion of these messages will have at most 2 erroneously received bits?
    b. What proportion of these messages will have at least 5 erroneously received bits?
    c. For what proportion of these messages will more than half the bits be erroneously received?

56. Components arrive at a distributor in very large batches. A batch can be characterized as acceptable only if the fraction of defective components in the batch is at most .10. The distributor decides to randomly select ten components from the batch, test each one, and accept the batch only if the sample contains at most two defective components. Assume that the condition of any particular component is independent of any other.
    a. If the actual fraction of defectives in each batch is only π = .01, what proportion of batches will be accepted? Repeat this calculation for the following values of π: .05, .10, .20, and .25.
    b. A graph of the proportion of batches accepted versus the actual fraction of defectives π is called the operating characteristic curve. Use the results of part (a) to sketch this curve for 0 ≤ π ≤ 1 (proportion of batches accepted is on the vertical axis and π is on the horizontal axis).
    c. Suppose the distributor decides to be more demanding by accepting a batch only if the sample contains at most one defective component. Repeat parts (a) and (b) with this new acceptance sampling plan. Does this plan appear more satisfactory than the original plan?

57. Suppose that the number of drivers who travel between a particular origin and destination during a designated time period has a Poisson distribution with parameter λ = 20 (suggested in the article “Dynamic Ride Sharing: Theory and Practice,” J. of Transp. Engr., 1997: 308–312). In the long run, in what proportion of time periods will the number of drivers
    a. Be at most 10?
    b. Exceed 20?
    c. Be between 10 and 20, inclusive? Be strictly between 10 and 20?

58. Let x be the number of material anomalies occurring in a particular region of an aircraft gas-turbine disk. The article “Methodology for Probabilistic Life Prediction of Multiple-Anomaly Materials” (Amer. Inst. of Aeronautics and Astronautics J., 2006: 787–793) proposes a Poisson distribution for x. Suppose that λ = 4.
    a. What proportion of gas-turbine disks have exactly one anomaly?
    b. What proportion of gas-turbine disks have at least three anomalies?
    c. What proportion of gas-turbine disks have between one and six anomalies inclusive?

59. Let x denote the number of trees in a quarter-acre plot within a certain forest. Suppose that x has a Poisson distribution with λ = 20 (corresponding to an average density of 80 trees per acre). In what proportion of such plots will there be at least 15 trees? At most 25 trees?

60. An article in the Los Angeles Times (Dec. 3, 1993) reports that 1 in 200 people carry the defective gene that causes colon cancer. Let x denote the number of people in a group of size 1000 who carry this defective gene. What is the approximate distribution of x? Use this approximate distribution to determine the proportion of all such groups having at least 8 people who carry the defective gene, as well as the proportion of all such groups for which between 5 and 10 people (inclusive) carry the defective gene.


Supplementary Exercises

61. The accompanying frequency distribution of fracture strength (MPa) observations for ceramic bars fired in a particular kiln appeared in the article “Evaluating Tunnel Kiln Performance” (Amer. Ceramic Soc. Bull., August 1997: 59–63).

    Class:     81–<83  83–<85  85–<87  87–<89  89–<91
    Frequency:    6       7      17      30      43
    Class:     91–<93  93–<95  95–<97  97–<99
    Frequency:   28      22      13       3

    a. Construct a histogram based on relative frequencies, and comment on any interesting features.
    b. What proportion of the strength observations are at least 85? Less than 95?
    c. Roughly what proportion of the observations are less than 90?

62. The article cited in Exercise 61 presented compelling evidence for assuming that fracture strength (MPa) of ceramic bars fired in a particular kiln is normally distributed (while commenting that the Weibull distribution is traditionally used as a model). Suppose that μ = 90 and σ = 3.75, which is consistent with data given in the article.
    a. In the long run, what proportion of bars would have strength values less than 90? Less than 95? At least 95?
    b. In the long run, what proportion of bars would have strength values between 85 and 95? Between 80 and 100?
    c. What value is exceeded by 90% of the fracture strengths for all such bars?
    d. What interval centered at 90 includes 99% of all fracture strength values?

63. Once an individual has been infected with a certain disease, let x represent the time (days) that elapses before the individual becomes infectious. The article “The Probability of Containment for Multitype Branching Process Models for Emerging Epidemics” (J. of Applied Probability, 2011: 173–188) proposes a Weibull distribution with α = 2.2 and β = 1.1 for x − .5 (i.e., the Weibull density curve is shifted to the right of 0 by .5; Minitab refers to .5 as the value of the threshold parameter).
    a. What proportion of elapsed times exceed 1.5 days?
    b. What is the 90th percentile of the elapsed time distribution?

64. Let x denote the distance (m) that an animal moves from its birth site to the first territorial vacancy it encounters. Suppose that for banner-tailed kangaroo rats, x has an exponential distribution with parameter λ = .01386 (as suggested in the article “Competition and Dispersal from Multiple Nests,” Ecology, 1997: 873–883).
    a. What proportion of distances are at most 100 m? At most 200 m? Between 100 m and 200 m?
    b. What proportion of distances are at least 50 m?
    c. What is the median distance, that is, the value that separates the smallest 50% of all distances from the largest 50%?

65. Suppose the unloading time x (centiminutes) of a forwarder in a harvesting operation could be assumed to be lognormal with μ = 6.5 and σ = .75, as suggested in the article “Simulating a Harvester-Forwarder Softwood Thinning” (Forest Products J., May 1997: 36–41).
    a. What proportion of unloading times exceed 1000? 2000? 3000?
    b. What proportion of times are between 2500 and 5000?
    c. What value characterizes the fastest 10% of all times?
    d. Sketch a graph of the density function of x. Is the positive skewness quite pronounced?

66. In an experiment, 25 laminated glass units configured in a particular way are subjected to an impact test (cf. “Performance of Laminated Glass Units Under Simulated Windborne Debris Impacts,” J. of Architectural Engr., 1996: 95–99). We are interested in the number of units that sustain an inner glass ply fracture. Suppose that the long-run proportion of all such units that fracture is .20. In the long run, for what proportion of such experiments will the number of fractures be
    a. At least 10?
    b. At most 5?
    c. Between 5 and 10 inclusive?
    d. Strictly between 5 and 10?


67. Airlines frequently overbook flights. Suppose that for a plane with 100 seats, an airline takes 110 reservations. Let x represent the number of people with reservations who actually show up for a sold-out flight. From past experience, we know that the distribution of x is as follows:

    x:     95   96   97   98   99   100  101  102  103
    p(x): .05  .10  .12  .14  .24  .17  .06  .04  .03
    x:    104  105  106   107   108   109    110
    p(x): .02  .01  .005  .005  .005  .0037  .0013

    a. For what proportion of such flights is the airline able to accommodate everyone who shows up for the flight?
    b. For what proportion of all such flights is it not possible to accommodate all passengers?
    c. For someone who is trying to get a seat on such a flight and is number 1 on the standby list, what proportion of the time is such an individual able to take the flight? Answer the question for individuals who are number 3 on the standby list.

68. The accompanying data are observations on shower flow rate for a sample of 129 houses in Perth, Australia (“An Application of Bayes Methodology to the Analysis of Diary Records in a Water Use Study,” J. Amer. Stat. Assoc., 1987: 705–711):

     4.6  12.3   7.1   7.0   4.0   9.2   6.7   6.9
    11.5   5.1   3.8  11.2  10.5  14.3   8.0   8.8
     6.4   5.1   5.6   9.6   7.5   7.5   6.2   5.8
     2.3   3.4  10.4   9.8   6.6   3.7   6.4   6.0
     8.3   6.5   7.6   9.3   9.2   7.3   5.0   6.3
    13.8   6.2   5.4   4.8   7.5   6.0   6.9  10.8
     7.5   6.6   5.0   3.3   7.6   3.9  11.9   2.2
    15.0   7.2   6.1  15.3  18.9   7.2   5.4   5.5
     4.3   9.0  12.7  11.3   7.4   5.0   3.5   8.2
     8.4   7.3  10.3  11.9   6.0   5.6   9.5   9.3
    10.4   9.7   5.1   6.7  10.2   6.2   8.4   7.0
     4.8   5.6  10.5  14.6  10.8  15.5   7.5   6.4
     3.4   5.5   6.6   5.9  15.0   9.6   7.8   7.0
     6.9   4.1   3.6  11.9   3.7   5.7   6.8  11.3
     9.3   9.6  10.4   9.3   6.9   9.8   9.1  10.6
     4.5   6.2   8.3   3.2   4.9   5.0   6.0   8.2
     6.3

    a. Construct a stem-and-leaf display of the data.
    b. What is a typical or representative flow value? Does the data appear to be highly concentrated or quite spread out about this typical value?
    c. Does the distribution of values appear to be reasonably symmetric? If not, how would you describe the departure from symmetry?
    d. Does the data set appear to contain any outliers?
    e. Construct a histogram using class boundaries 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, and 20. From your histogram, approximately what proportion of the observations are at most 11? Compare this with the exact proportion that are at most 11.

69. Let x denote the vibratory stress (psi) on a wind turbine blade at a particular wind speed in a wind tunnel. The article “Blade Fatigue Life Assessment with Applications to VAWTS” (J. of Solar Energy Engr., 1982: 107–111) proposes the Rayleigh distribution as a model; the density function is

    f(x) = \begin{cases} \dfrac{x}{\theta^2}\, e^{-x^2/(2\theta^2)} & x > 0 \\ 0 & \text{otherwise} \end{cases}

    a. Verify that f(x) is a legitimate density function.
    b. Suppose that θ = 100 (a value suggested by a graph in the cited article). What proportion of vibratory stress values will be at most 200? At least 200? Between 100 and 200?

70. The article “Error Distribution in Navigation” (J. Institute of Navigation, 1971: 429–442) suggests that the frequency distribution of positive errors (magnitudes of errors) is well approximated by an exponential distribution. Let x denote the lateral position error (nautical miles), which can be either positive or negative, and suppose the density function of x is f(x) = (.1)e^{−.2|x|} for −∞ < x < ∞.
    a. Sketch the corresponding density curve, and verify that f(x) is a legitimate density function.
    b. What proportion of errors are negative? At most 2? Between −1 and 2?

71. “Time headway” in traffic flow is the elapsed time between the time that one car finishes passing a fixed point and the instant that the next car begins to pass that point. Let x be the time headway (sec) for two consecutive cars on a freeway during a period of heavy flow. The following density function is essentially the
one suggested in “The Statistical Properties of Freeway Traffic” (Transportation Research, 1977: 221–228):

    f(x) = \begin{cases} .15e^{-.15(x - .5)} & x > .5 \\ 0 & \text{otherwise} \end{cases}

    a. Sketch the corresponding density curve, and verify that f(x) is a legitimate density function.
    b. What proportion of time headways are at most 5 sec? Between 5 and 10 sec?
    c. What value separates the smallest 50% of all time headways from the largest 50%?
    d. What value characterizes the largest 10% of all time headways?

72. A k-out-of-n system is one that will function if and only if at least k out of the n individual components in the system function. If individual components function independently of one another and the long-run proportion of components that function is .9, what is the long-run proportion of 3-out-of-5 systems that will function?

73. An insurance company offers its policyholders a number of different premium payment options. Let x denote the number of months between successive payments chosen by a policyholder. For any particular number k, the proportion of x values that are at most k (i.e., ≤ k) is called a cumulative proportion. Consider the following cumulative proportions: 0 for x < 1, .30 for 1 ≤ x < 3, .40 for 3 ≤ x < 4, .45 for 4 ≤ x < 6, .60 for 6 ≤ x < 12, and 1 for x ≥ 12.
    a. Graph this cumulative proportion function, that is, graph (proportion of x values ≤ k) versus k.
    b. Determine the mass function of x. Hint: The cumulative proportion function jumps only at possible values of x.
    c. Use the cumulative proportion function to determine the proportion of all policyholders for which 3 ≤ x ≤ 6, and check to see that the mass function gives this same proportion.

74. Based on data from a dart-throwing experiment, the article “Shooting Darts” (Chance, Summer 1997, 16–19) proposed that the horizontal and vertical errors from aiming at a point target should be independent of one another, each with a normal distribution having parameters μ = 0 and σ. It can then be shown that the density function of the distance v from the target to the landing point is

    f(v) = \frac{v}{\sigma^2} \cdot e^{-v^2/(2\sigma^2)} \qquad v > 0

    a. This pdf is a member of what family introduced in this chapter?
    b. If σ = 20 mm (close to the value suggested in the paper), what proportion of darts will land within 25 mm (roughly 1 in.) of the target?

75. The bursting strength of wine bottles of a certain type is normally distributed with parameters μ = 250 psi and σ = 30 psi. If these bottles are shipped 12 to a carton, in what proportion of cartons will at least one of the bottles have a bursting strength exceeding 300 psi? Hint: Think of a bottle as a success S if its bursting strength exceeds 300 psi.

Bibliography
Chambers, John, William Cleveland, Beat Kleiner, and Paul Tukey, Graphical Methods for Data Analysis, Wadsworth, Belmont, CA, 1983. A very readable source for information on constructing histograms, checking the plausibility of various distributions, and other visual techniques.

Cleveland, William, The Elements of Graphing Data (2nd ed.), Hobart Press, Summit, NJ, 1994. An informal and informative introduction to various aspects of graphical analysis.

Johnson, Norman, Samuel Kotz, and Adrienne Kemp, Univariate Discrete Distributions (2nd ed.), Wiley, New York, 1992. A veritable encyclopedia of information on discrete distributions.

Johnson, Norman, Samuel Kotz, and N. Balakrishnan, Continuous Univariate Distributions (vol. 1, 2nd ed.), Wiley, New York, 1994. An encyclopedic reference for continuous distributions.

Olkin, Ingram, Cyrus Derman, and Leon Gleser, Probability Models and Applications (2nd ed.), Macmillan, New York, 1994. Contains in-depth discussions of both general properties of discrete and continuous distributions and results for specific distributions.


2
Numerical Summary Measures
2.1 Measures of Center
2.2 Measures of Variability
2.3 More Detailed Summary Quantities
2.4 Quantile Plots

Introduction
In Chapter 1, we learned how to describe sample data using either a stem-
and-leaf display or a histogram. We then saw how a density function or mass
function could be used to represent the distribution of a variable in an entire
population or process. Often an investigator will want to obtain or convey in-
formation about particular characteristics of data. In this chapter, we first in-
troduce several numerical summary measures that describe where a sample or
distribution is centered. Another important aspect of a sample or distribution
is the extent of spread about the center. In Section 2.2, we develop the most
useful measures of variability. In Section 2.3, we consider more detailed data
summaries and how they can be combined to yield concise yet informative data
descriptions. Once sample data has been obtained, it is often important to know
whether it is plausible that the data came from a particular type of distribution,
such as a normal distribution or a Weibull distribution. In Section 2.4, we show
how to construct a picture from which the plausibility of any particular type of
underlying distribution can be judged.



2.1 Measures of Center


A preliminary sense of where a data set is centered can be gleaned from a stem-and-leaf
display or a histogram. A precise quantitative assessment entails calculating a measure of
center such as the mean or median; the resulting number can then be regarded as being rep-
resentative or typical of the data. First, we consider measures of center for sample data, and
then we turn our attention to analogous measures for distributions of a numerical variable x.

Measures of Center for Data


Suppose that the sample consists of observations on a numerical variable x. We shall use the letter n to represent the sample size (number of observations in the sample, e.g., n = 10). The individual observations will be denoted by x1, x2, . . . , xn. The subscripts typically refer to the time order in which the observations were obtained—the first observation is x1, the second observation is x2, and so on. In general, the subscripts are unrelated to the magnitudes of the observations: x1 is not usually the smallest observation, nor is xn the largest sample value.

The Sample Mean


The most frequently used measure of center is simply the arithmetic average of the n
observations.

definition   The sample mean of observations x1, . . . , xn, denoted by x̄, is given by

    \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}

The numerator of x̄ can be written more informally as \sum x_i, where the summation is over all sample observations.

For reporting x̄, we recommend using decimal accuracy of one digit more than the accuracy of the xi’s. Thus if observations are stopping distances with x1 = 125, x2 = 131, and so on, we might have x̄ = 127.3 ft.

Example 2.1 In recent years there has been growing commercial interest in the use of what is
known as internally cured concrete. This concrete contains porous inclusions most
commonly in the form of lightweight aggregate (LWA). In the article “Characterizing
Lightweight Aggregate Desorption at High Relative Humidities Using a Pressure Plate
Apparatus” (J. of Materials in Civil Engr., 2012: 961–969), researchers examined
various physical properties of 14 LWA specimens. The following are the 24-hour
water absorption percentages for the 14 specimens:
x1 = 16.0   x2 = 30.5   x3 = 17.7   x4 = 17.5    x5 = 14.1
x6 = 10.0   x7 = 15.6   x8 = 15.0   x9 = 19.1    x10 = 17.9
x11 = 18.9  x12 = 18.5  x13 = 12.2  x14 = 6.0


Figure 2.1 shows a stem-and-leaf display of the data (the tenths digit is truncated); a water absorption percentage in the midteens appears to be “typical.” With \sum x_i = 229.0, the sample mean is x̄ = 229.0/14 = 16.36, a value consistent with information conveyed by the stem-and-leaf display.

Figure 2.1   A stem-and-leaf display of the water absorption data

The mean suffers from one deficiency that makes it an inappropriate measure of center under some circumstances: Its value can be greatly affected by the presence of even a single outlier (unusually large or small observation). In Example 2.1, the value x2 = 30.5 is obviously an outlier. Without this observation, x̄ = 15.27; the outlier increases the mean by more than 1%. If the 30.5 observation were replaced by the relatively large value 90.0, a really extreme outlier, then x̄ = 288.5/14 = 20.61, which is larger than any of the other observations!
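This sensitivity is easy to see numerically; the following sketch (ours, not part of the text) recomputes the mean of the Example 2.1 data after removing the outlier:

    absorption = [16.0, 30.5, 17.7, 17.5, 14.1, 10.0, 15.6,
                  15.0, 19.1, 17.9, 18.9, 18.5, 12.2, 6.0]

    print(round(sum(absorption) / len(absorption), 2))   # 16.36

    reduced = [x for x in absorption if x != 30.5]       # drop the outlier
    print(round(sum(reduced) / len(reduced), 2))         # 15.27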
The Sample Median
An alternative measure of center that resists the effects of outliers is the median. The median strip of a roadway divides the roadway into two equal parts, and the sample median does the same for the sample. If, for example, n = 5 and the observations are ordered from smallest to largest, the third observation from either end is the median. When n = 6, though, there are two middle values in the ordered list; the median is the average of these two values.

definition   The sample median, denoted by x̃, is obtained by first ordering the sample observations from smallest to largest. Then

    \tilde{x} = \begin{cases} \text{single middle value} = \left(\dfrac{n + 1}{2}\right)\text{th value on ordered list} & n \text{ odd} \\[1ex] \text{average of two middle values} = \text{average of } \dfrac{n}{2}\text{th and } \left(\dfrac{n}{2} + 1\right)\text{th values} & n \text{ even} \end{cases}

Example 2.2 People not familiar with classical music might tend to believe that a composer’s in-
structions for playing a particular piece are so specific that the duration would not
depend at all on the performer(s). However, there is typically plenty of room for
interpretation, and orchestral conductors and musicians take full advantage of this.
We went to the website ArkivMusic.com and selected a sample of 12 recordings of


Beethoven’s stunningly beautiful Symphony No. 9 (the “Chorale”), and found the
following durations (min) listed in increasing order:
62.3 62.8 63.6 65.2 65.7 66.4 67.4 68.4 68.8 70.8 75.7 79.0
Figure 2.2 is a dotplot of the data:

Figure 2.2   Dotplot of the data (duration, min) from Example 2.2

Since n = 12 is even, the sample median is the average of the (n/2)th = sixth and (n/2 + 1)th = seventh values from the ordered list:

    x̃ = (66.4 + 67.4)/2 = 66.90

Note that if the smallest observation 62.3 had not been included in the sample, then the resulting sample median for the n = 11 remaining observations would have been the single middle value 67.4 [the ((n + 1)/2)th = sixth ordered value—i.e., the sixth value in from either end of the ordered list]. The sample mean is x̄ = \sum x_i/n = 816.1/12 = 68.01, a bit more than a full minute larger than the median. The mean is pulled out a bit relative to the median because the sample “stretches out” somewhat more on the upper end than on the lower end.
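The even/odd rule in the definition translates directly into a few lines of code; this sketch (ours, with the hypothetical helper sample_median) applies it to the ordered durations:

    def sample_median(data):
        # Middle value if n is odd; average of the two middle values if n is even
        xs = sorted(data)
        n = len(xs)
        mid = n // 2
        return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2

    durations = [62.3, 62.8, 63.6, 65.2, 65.7, 66.4,
                 67.4, 68.4, 68.8, 70.8, 75.7, 79.0]

    print(round(sample_median(durations), 2))        # 66.9
    print(round(sample_median(durations[1:]), 2))    # 67.4 once 62.3 is removed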

The largest observation or even the largest two or three observations in Example 2.2 can be increased by an arbitrary amount without impacting x̃. Similarly, decreasing several of the smallest observations by any amount does not affect the median. In contrast to x̄, the median is impervious to many outliers.

Trimmed Means
A trimmed mean is a compromise between x̄ and x̃; it is less sensitive to outliers than the mean but more sensitive than the median. The observations are again first ordered from smallest to largest. Then a trimming percentage 100r% is chosen, where r is a number between 0 and .5. Suppose that r = .1, so the trimming percentage is 10%. Then if n = 20, 10% of 20 is 2; the 10% trimmed mean results from deleting (trimming) the largest two and the smallest two observations, and then averaging the remaining 16 values. Notice that the trimming percentage specifies the number of observations to be deleted from each end of the ordered list. The sample mean is a 0% trimmed mean, whereas the median is a trimmed mean corresponding to the largest possible trimming percentage (e.g., a 45% trimmed mean when n = 20).

Example 2.3 Consider the following 20 observations, ordered from smallest to largest, each repre-
senting the lifetime (hr) of a certain type of incandescent lamp:


612 623 666 744 883 898 964 970 983 1003
1016 1022 1029 1058 1085 1088 1122 1135 1197 1201

The sample mean is x̄ = 19,299/20 = 965.0, and x̃ = (1003 + 1016)/2 = 1009.5. The 10% trimmed mean is

    \bar{x}_{tr(10)} = \frac{19{,}299 - 612 - 623 - 1197 - 1201}{16} = 979.1

The effect of trimming here is to produce a central value that is somewhat larger than the mean yet considerably below the median. Similarly, the 20% trimmed mean averages the middle 12 values to obtain x̄tr(20) = 999.9, which is even closer to the median. The various measures of center are illustrated in the dotplot of Figure 2.3.


tr(10)

600 800 1000 1200


– ˜

Figure 2.3 Dotplot of lifetimes and measures of center for Example 2.3

Statisticians generally recommend a trimming percentage between 5% and 25%. Notice that (r)(n) may not be a whole number; if r = .10 and n = 25, then (r)(n) = 2.5. Eliminating two observations from each end gives a trimming percentage of 8%, whereas eliminating three observations gives 12%. The resulting two x̄tr’s can then be averaged to obtain the 10% trimmed mean. More generally, a trimmed mean for any trimming percentage can be obtained by interpolation.
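Trimming a whole number of observations from each end is a one-line computation; the sketch below (ours, with the hypothetical helper trimmed_mean) reproduces the Example 2.3 values, with fractional trimming percentages handled by averaging as just described:

    def trimmed_mean(data, k):
        # Average after deleting the k smallest and k largest observations
        xs = sorted(data)
        kept = xs[k:len(xs) - k]
        return sum(kept) / len(kept)

    lifetimes = [612, 623, 666, 744, 883, 898, 964, 970, 983, 1003,
                 1016, 1022, 1029, 1058, 1085, 1088, 1122, 1135, 1197, 1201]

    print(trimmed_mean(lifetimes, 0))             # 964.95, the sample mean (965.0)
    print(trimmed_mean(lifetimes, 2))             # 979.125, the 10% trimmed mean
    print(round(trimmed_mean(lifetimes, 4), 2))   # 999.92, the 20% trimmed mean (999.9)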

Measures of Center for Distributions


The primary measure of center for a discrete distribution is the mean value, and both the mean value and the median are frequently used measures for continuous distributions.

Discrete Distributions
Plastic parts manufactured using an injection molding process may exhibit one or more
defects, including sinks, scratches, black spots, and so on. Let x represent the number of
defects on a single part, and suppose the distribution of x is as follows:
x: 0 1 2 3 4
p(x): .80 .14 .03 .02 .01

A picture of the distribution appears in Figure 2.4. Where is this distribution centered? That is, what is the mean or long-run average value of x? A first thought might be to simply average the five possible values of x to obtain a mean value of 2.0. But this entails giving the same weight to each possible value, whereas the distribution indicates that x = 0 occurs much more frequently than any of the other values. So what is needed is a weighted average of x values.
Figure 2.4   Distribution of x, the number of defects on a manufactured plastic part

definition   The mean value (alternatively, expected value) of a discrete variable x, denoted by μx or just μ [alternatively, E(x)], is given by

    \mu_x = \sum x \cdot p(x)

where the summation is over all possible x values.

Example 2.4   We return now to the plastic part scenario introduced at the outset of this subsection. The mean value of x, the number of defects on a part, is

    \mu_x = \sum_{x=0}^{4} x \cdot p(x)
          = 0(p(0)) + 1(p(1)) + 2(p(2)) + 3(p(3)) + 4(p(4))
          = (0)(.80) + (1)(.14) + (2)(.03) + (3)(.02) + (4)(.01)
          = .30

When we consider the population of all such parts, the population mean value of x is .30. Alternatively, .30 is the long-run average value of x when part after part is monitored. It can also be shown that the histogram of the distribution of Figure 2.4 will balance on the tip of a fulcrum placed on the horizontal axis only if the tip is at .30; μ is the balance point of the distribution.
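Expressed in code, the weighted average is a single line; this sketch (ours) reproduces the calculation:

    xs = [0, 1, 2, 3, 4]
    ps = [.80, .14, .03, .02, .01]

    mu = sum(x * p for x, p in zip(xs, ps))
    print(round(mu, 2))    # 0.3, the balance point of the distribution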


In Example 2.4, μ is not a possible value of x. In the same way, if x is the number of children in a household, the population mean value of x might be 1.7 even though there are no households with 1.7 children.
In Chapter 1, we introduced two important types of discrete distributions, the binomial distribution and the Poisson distribution. The binomial distribution models the number of “successes” in a group of n items when conditions of individual items are independent of one another and the long-run proportion of successes is π (a number between 0 and 1). The mean value of x is

    \mu_x = \sum_{x=0}^{n} x \, \frac{n!}{x!(n - x)!} \pi^x (1 - \pi)^{n - x}

The summation looks very intimidating, but fortunately some algebraic manipulation yields an extremely simple result.

If x is a binomial variable with parameters n = group size and π = success proportion, then μx = nπ.

Thus if n = 10 and π = .8, μ = (10)(.8) = 8; we “expect” eight of the ten items to be successes, a very intuitive result.
When x is a Poisson variable with parameter λ,

    \mu_x = \sum_{x=0}^{\infty} x \, \frac{e^{-\lambda} \lambda^x}{x!} = \sum_{x=1}^{\infty} x \, \frac{e^{-\lambda} \lambda^x}{x!} = \lambda \sum_{x=1}^{\infty} \frac{e^{-\lambda} \lambda^{x-1}}{(x - 1)!}

If we now let y = x − 1, the range of summation is from y = 0 to ∞:

    \mu_x = \lambda \sum_{y=0}^{\infty} \frac{e^{-\lambda} \lambda^y}{y!} = \lambda \cdot (\text{sum of a Poisson mass function}) = \lambda(1) = \lambda

Let x be a Poisson variable with parameter λ. The mean value of x is λ itself.

Suppose, for example, that x is the number of burnt potato chips in a 13-oz bag. If x has a Poisson distribution with parameter λ = 2.5, then μx = 2.5; the population mean number of burnt chips per bag is 2.5.

Continuous Distributions
A distribution for a continuous variable x is specified by a density function f(x) whose graph is a smooth curve. To obtain μ, we replace summation in the discrete case by integration and replace the mass function p(x) by the density function.


definition   The mean value (or expected value) of a continuous variable x with density function f(x) is given by

    \mu_x = \int_{-\infty}^{\infty} x \cdot f(x)\, dx

Just as μ in the discrete case is the balance point for the histogram corresponding to p(x), in the continuous case μ is the balance point for the density curve corresponding to f(x).

Example 2.5   The distribution of the amount of gravel (tons) sold by a particular construction supply company in a given week is a continuous variable x with density function

    f(x) = 1.5(1 − x²)    0 ≤ x ≤ 1

(f(x) = 0 outside the interval from 0 to 1). The density curve is shown in Figure 2.5. Knowledge of the mean value of x will help the company decide on a price for the gravel:

    \mu_x = \int_{-\infty}^{\infty} x f(x)\, dx = \int_0^1 x[1.5(1 - x^2)]\, dx = 1.5 \int_0^1 (x - x^3)\, dx = 1.5\left(\frac{x^2}{2} - \frac{x^4}{4}\right)\Big|_0^1 = .375
Figure 2.5   The density curve and mean value for Example 2.5
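The integral can also be confirmed numerically; one possibility (our sketch, using scipy's quad routine rather than anything in the text) is:

    from scipy.integrate import quad

    f = lambda x: 1.5 * (1 - x**2)            # density on [0, 1]
    mu, _ = quad(lambda x: x * f(x), 0, 1)    # mean = integral of x * f(x)
    print(round(mu, 3))                       # 0.375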

In Chapter 1, we introduced the normal distribution with parameters μ and σ. The symmetry of the associated density curve about μ certainly suggests that μ is the mean value, and this is indeed the case:

    \int_{-\infty}^{\infty} x f(x)\, dx = \int_{-\infty}^{\infty} (\mu + x - \mu) f(x)\, dx
    = \mu \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2/(2\sigma^2)}\, dx + \int_{-\infty}^{\infty} (x - \mu) \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2/(2\sigma^2)}\, dx
    = \mu + \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y e^{-y^2/2}\, dy \qquad \text{using } y = \frac{x - \mu}{\sigma}


The latter integral is zero because the integrand g(y) is an odd function (g(−y) = −g(y)), which gives the desired result.
A lognormal variable x is one for which ln(x) has a normal distribution with mean value μ. That is, μln(x) = μ. Therefore, it might seem that μx = e^μ, but this is not the case. It can be shown that

    \mu_x = e^{\mu + \sigma^2/2}

In Example 1.19 of Chapter 1, μ = .353 and σ = .754, from which we calculate e^μ = 1.42 whereas

    \mu_x = e^{.353 + .5(.754)^2} = 1.89

The mean value of a Weibull variable is a somewhat complicated expression involving the parameters α and β. Consult the chapter references for details.
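As a quick numeric check (ours) of the lognormal mean formula with the Example 1.19 parameters:

    from math import exp

    mu, sigma = .353, .754
    print(round(exp(mu), 2))                  # 1.42, the naive guess e^mu
    print(round(exp(mu + sigma**2 / 2), 2))   # 1.89, the actual mean value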

μ and x̄
If x1, . . . , xn have been randomly selected from some population or process distribution with mean value μ, then the sample mean x̄ gives a point estimate for μ. In Example 2.1, we calculated x̄ = 16.36, so a reasonable educated guess for the population mean water-absorption percentage is 16.36%. Estimation—both point (a single number) and interval—will be discussed in Chapter 7.

The Median of a Distribution
Just as the sample median x̃ separates the sample into two equal halves, the median μ̃ of a continuous distribution divides the area under the density curve into two equal halves. The defining condition is

    \int_{-\infty}^{\tilde{\mu}} f(x)\, dx = .5

Example 2.6   (Example 2.5 continued) The median for the distribution of weekly gravel sales satisfies

    \int_0^{\tilde{\mu}} 1.5(1 - x^2)\, dx = 1.5\left(x - \frac{x^3}{3}\right)\Big|_0^{\tilde{\mu}} = .5

Using c in place of μ̃, we have the cubic equation 1.5(c − c³/3) = .5, whose solution is c = μ̃ = .347. We previously calculated the mean as μx = .375, which is somewhat larger than the median because the distribution is positively skewed (see Figure 2.5).
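The cubic equation can be solved numerically in a few lines; here is a bisection sketch (ours; any root-finding routine would do) applied to the cumulative proportion:

    def cum(c):
        # Area under 1.5(1 - x^2) between 0 and c
        return 1.5 * (c - c**3 / 3)

    lo, hi = 0.0, 1.0
    for _ in range(60):              # halve the bracketing interval repeatedly
        mid = (lo + hi) / 2
        if cum(mid) < .5:
            lo = mid
        else:
            hi = mid
    print(round((lo + hi) / 2, 3))   # 0.347, the median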

Figure 2.6 shows the relationship between the mean and the median for various
types of unimodal distributions or (smoothed) histograms. The median of a discrete dis-
tribution can also be defined; see one of the chapter references for details.


Figure 2.6   The relationship between the mean and the median for a continuous distribution or smoothed histogram (panels: Mean = Median; Median, Mean; Mean, Median)

Just as the sample mean gives a point estimate of the population mean μ, the sample median x̃ gives a point estimate of the population median. If the population distribution is symmetric (as is any normal distribution), both x̄ and x̃ are estimates of the same population characteristic, namely, the point of symmetry. The issue of which estimate to use will be addressed in Section 7.1.

Section 2.1 Exercises


1. The May 1, 2009, issue of The Montclarian reported the following sales figures ($1000s) for a sample of homes in Alameda, California, that were sold the previous month:

    590  815  575  608  350  1285  408  540  555  679

    a. Calculate and interpret the sample mean and median.
    b. Suppose the sixth observation had been 985 rather than 1285. How would the mean and median change?
    c. Calculate a 20% trimmed mean by first trimming the two smallest and two largest observations.
    d. Calculate a 15% trimmed mean.

2. Exposure to microbial products, especially endotoxin, may affect human vulnerability to allergic diseases. The article “Dust Sampling Methods for Endotoxin—An Essential but Underestimated Issue” (Indoor Air, 2006: 20–27) considered various issues associated with determining endotoxin concentration. The following data on concentration (EU/mg) in settled dust for one sample of urban homes and another of farm homes was kindly supplied by the authors of the article.

    U: 6.0  5.0  11.0  33.0  4.0  5.0  80.0  18.0  35.0  17.0  23.0
    F: 4.0  14.0  11.0  9.0  9.0  8.0  4.0  20.0  5.0  8.9  21.0  9.2  3.0  2.0  0.3

    a. Determine the sample mean for each sample. How do they compare?
    b. Determine the sample median for each sample. How do they compare? Why is the median for the urban sample so different from the mean for that sample?
    c. Calculate the trimmed mean for each sample by deleting the smallest and largest observation. What are the corresponding trimming percentages? How do the values of these trimmed means compare to the corresponding means and medians?

3. The production of Bidri is a traditional craft of India. Bidriware (bowls, vessels, and so on) is cast from an alloy containing primarily zinc along with some copper. Consider the following observations on copper content (%) for a sample of Bidri artifacts in London’s Victoria and Albert Museum (“Enigmas of Bidri,” Surface Engr., 2005: 333–339), which are listed in increasing order:

    2.0  2.4  2.5  2.6  2.6  2.7  2.7
    2.8  3.0  3.1  3.2  3.3  3.3  3.4
    3.4  3.6  3.6  3.6  3.6  3.7  4.4
    4.6  4.7  4.8  5.3  10.1

    a. Construct a stem-and-leaf display of the data. How does it suggest that the sample mean and median will compare?
    b. Calculate the values of the sample mean and median. Hint: \sum x_i = 95.0.


c. By how much could the largest observation, 10.1, be increased without affecting the value of the sample median? By how much could this value be decreased without affecting the value of the sample median?

4. Suppose that after computing $\bar{x}_n$ based on n sample observations $x_1, \ldots, x_n$, another observation $x_{n+1}$ becomes available. What is the relationship between the mean of the first n observations, the new observation, and the mean of all n + 1 observations? The mean of the 10 observations in Exercise 1 is 640.5. If an 11th property had sold at a price of 780, what would be the mean sale price for all 11 properties?

5. In the article “Evaluation of Optimal Power Options for Base Transceiver Stations of Mobile Telephone Networks Cameroon” (Solar Energy, 2012: 2935–2949), researchers recorded site specific information for remote telecommunications stations throughout Cameroon. The following observations are daily energy demand readings (kWh) for 12 stations:
17.76  23.44  24.58  26.99  27.23  30.77
31.79  35.57  36.59  36.59  40.51  59.31
Without doing any computation, how do you think the sample mean compares to the sample median? What would you report as representative, or typical, of the daily energy demand for these stations? What prompted your choice?

6. Blood pressure values are often reported to the nearest 5 mmHg (100, 105, 110, and so on). Suppose the actual blood pressure values for nine randomly selected individuals are
118.6  127.4  138.4  130.0  113.7  122.0  108.3  131.5  133.2
a. What is the median of the reported blood pressure values?
b. Suppose the blood pressure of the second individual is 127.6 rather than 127.4 (a small change in a single value). How does this affect the median of the reported values? What does this say about the sensitivity of the median to rounding or grouping in the data?

7. An experiment to study the lifetime (hr) for a certain type of component involved putting ten components into operation and observing them for 100 hours. Eight of the components failed during that period, and those lifetimes were recorded. Denote the lifetimes of the two components still functioning after 100 hours by 100+. The resulting sample observations were 48, 79, 100+, 35, 92, 86, 57, 17, 100+, and 29. Which of the measures of center discussed in this section can be calculated, and what are the values of those measures? Note: The data from this experiment is said to be “censored on the right”; patient lifetimes in medical experimentation are sometimes obtained in this way.

8. A target is located at the point 0 on a horizontal axis. Let x be the landing point of a shot aimed at the target, a continuous variable with density function $f(x) = .75(1 - x^2)$ for $-1 \le x \le 1$. What is the mean value of x?

9. Let x denote the amount of time for which a book on 2-hour reserve at a college library is checked out by a student, and suppose that x has density function $f(x) = .5x$ for $0 < x < 2$.
a. What is the mean value of x? Why is the mean value not 1, the midpoint of the interval of positive density?
b. What is the median of this distribution, and how does it compare to the mean value?
c. What proportion of checkout times are within one-half hour of the mean time? What proportion are within one-half hour of the median time?

10. Let x have a uniform distribution on the interval from a to b, so the density function of x is $f(x) = 1/(b - a)$ for $a \le x \le b$. What is the mean value of x?

11. The weekly demand for propane gas (1000s of gallons) at a certain facility is a continuous variable with density function
$$f(x) = \begin{cases} 2\left(1 - \dfrac{1}{x^2}\right) & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$
Determine both the mean value and the median. In the long run, in what proportion of weeks will the value of x be between the mean value and the median?


12. Refer to Exercise 27 of Section 1.3, in which x was the number of telephone lines in use at a specified time. If $\mu = 2.64$, what are the values of p(5) and p(6)?

13. The distribution of the number of underinflated tires x on an automobile is given in Exercise 26a(ii) of Section 1.3. Determine the mean value of x.

14. Sometimes, rather than wishing to determine the mean value of x, an investigator wishes to determine the mean value of some function of x. Suppose, for example, that a repairman assesses a fixed charge of $25 plus $40 an hour that he spends on a job. Then the revenue resulting from a job that takes x hours is $h(x) = 25 + 40x$. If x is a continuous variable, the mean value of any function h(x) is computed similarly to the way in which $\mu$ itself is computed: $\mu_{h(x)} = \int h(x)\,f(x)\,dx$.
a. Refer to Exercise 9. Suppose the library, in a desperate search for revenue to fund its operations, charges a student $h(x) = x^2$ dollars to check a book out on 2-hour reserve for x hours. What is the mean value of the checkout charge?
b. Suppose that $h(x) = a + bx$, a linear function of x. Show that $\mu_{h(x)} = a + b\mu$ (this is true for x continuous or discrete). If the mean value of repair time is .5 hr for the repair situation mentioned at the outset of this problem, what is the mean value of repair revenue?

2.2 Measures of Variability


Reporting a measure of center gives only partial information about a data set or distribution. Different samples or distributions may have identical measures of center yet differ
from one another in other important ways. For example, for a normal distribution with
parameters $\mu$ and $\sigma$, the normal curve becomes more spread out as the value of $\sigma$ increases. Figure 2.7 shows dotplots of three samples with the same mean and median, yet
the extent of spread about the center is different for all three samples. The first sample
has the largest amount of variability, the third has the smallest amount, and the second
is intermediate to the other two in this respect.

[Figure 2.7: Dotplots (measurement scale from 30 to 70) of three samples with
identical measures of center but different amounts of variability]

Measures of Variability for Sample Data


The simplest measure of variability in a sample is the range, which is the difference
between the largest and smallest sample values. Notice that the value of the range
for sample 1 in Figure 2.7 is much larger than it is for sample 3, reflecting more vari-
ability in the first sample than in the third one. A defect of the range, though, is that
it depends on only the two most extreme observations and disregards the positions
of the remaining $n - 2$ values. Samples 1 and 2 in Figure 2.7 have identical ranges,
yet when we take into account the observations between the two extremes, there is
much less variability or dispersion in the second sample than in the first one.


Our primary measures of variability involve quantities called deviations from the
mean: $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}$. That is, the deviations from the mean are obtained by
subtracting $\bar{x}$ from each of the n sample observations. A deviation will be positive if the
observation is larger than the mean (to the right of the mean on the measurement axis)
and negative if the observation is smaller than the mean. If all the deviations are small in
magnitude, then all $x_i$'s are close to the mean and there is little variability. On the other
hand, if some of the deviations are large in magnitude, then some $x_i$'s lie far from $\bar{x}$, suggesting a greater amount of variability. A simple way to combine the deviations into a
single quantity is to average them (sum them and divide by n). Unfortunately, there is a
major problem with this suggestion:
$$\text{sum of deviations} = \sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
so that the average deviation is always zero (because $\sum_{i=1}^{n} \bar{x} = \bar{x} + \cdots + \bar{x} = n\bar{x} = \sum_{i=1}^{n} x_i$). In practice, the sum of the deviations may not be identically zero because of
rounding in $\bar{x}$. The greater the decimal accuracy used in $\bar{x}$, the closer the sum will be
to zero.
How can we change the deviations to nonnegative quantities so the positive and
negative deviations do not counteract one another when they are combined? One possibility is to work with the absolute values of the deviations and calculate the average
absolute deviation $\sum |x_i - \bar{x}|/n$. Because the absolute value operation leads to a number
of theoretical difficulties, consider instead the squared deviations $(x_1 - \bar{x})^2, (x_2 - \bar{x})^2, \ldots,$
$(x_n - \bar{x})^2$. We might now use the average squared deviation $\sum (x_i - \bar{x})^2/n$, but for several
reasons we will divide the sum of squared deviations by $n - 1$ rather than n.

definitions  The sample variance, denoted by $s^2$, is given by
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{S_{xx}}{n - 1}$$
The sample standard deviation, denoted by s, is the (positive) square root of the
variance:
$$s = \sqrt{s^2}$$
An alternative computational formula for $s^2$ is given in Exercise 18.


The unit for s is the same as the unit for each of the $x_i$'s. If, for example, the observations are fuel efficiencies in miles per gallon (mpg), then we might have s = 2.0 mpg. A
rough interpretation of the sample standard deviation is that it is the size of a typical or representative deviation from the sample mean within the given sample. Thus if s = 2.0 mpg,
then some $x_i$'s in the sample are closer than 2.0 to $\bar{x}$ whereas others are farther away; 2.0 is a
representative (or standard) deviation from the mean fuel efficiency. If s = 3.0 for a second
sample of cars of another type, a typical deviation in this sample is roughly one and one-half
times what it is in the first sample, an indication of greater variability in the second sample.


Example 2.7  The website www.fueleconomy.gov contains a wealth of information about the fuel
characteristics of various vehicles. In addition to EPA mileage ratings, there are many
vehicles for which users have reported their own values of fuel efficiency (mpg). Consider the following sample of n = 11 efficiencies for the 2009 Ford Focus equipped
with an automatic transmission (for this model, EPA reports an overall rating of
27 mpg—24 mpg in city driving and 33 mpg in highway driving):

Car    $x_i$     $x_i - \bar{x}$    $(x_i - \bar{x})^2$
 1     27.3      −5.96       35.522
 2     27.9      −5.36       28.730
 3     32.9      −0.36        0.130
 4     35.2       1.94        3.764
 5     44.9      11.64      135.490
 6     39.9       6.64       44.090
 7     30.0      −3.26       10.628
 8     29.7      −3.56       12.674
 9     28.5      −4.76       22.658
10     32.0      −1.26        1.588
11     37.6       4.34       18.836
$\sum x_i = 365.9$   $\sum (x_i - \bar{x}) = .04$   $\sum (x_i - \bar{x})^2 = 314.110$   $\bar{x} = 33.26$

Effects of rounding account for the sum of deviations differing slightly from zero. The
numerator of $s^2$ is $S_{xx} = 314.110$, from which
$$s^2 = \frac{S_{xx}}{n - 1} = \frac{314.110}{11 - 1} = 31.41, \qquad s = 5.60$$
The size of a representative deviation from the sample mean 33.26 is roughly 5.6 mpg.
Note: Of the nine people who also reported driving behavior, only three did more
than 80% of their driving in highway mode; we bet you can guess which cars they
drove. We haven’t a clue why all 11 reported values exceed the EPA figure: Maybe
only drivers with really good fuel efficiencies communicate their results.
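These computations are easy to reproduce in R; the following sketch (ours, not part of the original example) uses the built-in var() and sd() functions alongside the definitional formula:

mpg <- c(27.3, 27.9, 32.9, 35.2, 44.9, 39.9, 30.0, 29.7, 28.5, 32.0, 37.6)
devs <- mpg - mean(mpg)           # deviations from the mean; sum(devs) is essentially 0
sum(devs^2) / (length(mpg) - 1)   # definitional s^2, about 31.41
var(mpg)                          # built-in sample variance, same value
sd(mpg)                           # sample standard deviation, about 5.60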

One explanation for the use of $n - 1$ in $s^2$ goes back to the fact that $\sum (x_i - \bar{x}) = 0$.
Suppose that n = 5 and that $x_1 - \bar{x} = -4$, $x_2 - \bar{x} = 6$, $x_3 - \bar{x} = 1$, and $x_5 - \bar{x} = -8$.
Since the sum of these four deviations is −5, the remaining deviation must be
$x_4 - \bar{x} = 5$ (so that the sum of all five deviations is zero). More generally, once any
$n - 1$ of the deviations are available, the value of the remaining deviation is determined. The n deviations actually contain only $n - 1$ independent pieces of information about variability. Statisticians express this by saying that $s^2$ and s are based on
$n - 1$ degrees of freedom (df). Many inferential procedures encountered in later
chapters are based on some appropriate number of df.

The Variance and Standard Deviation of a Discrete Distribution


Let x be a discrete variable with mass function p(x) and mean value $\mu$. Just as $\mu$ itself is
a weighted average of possible x values, where the weights come from the mass function,
the variance is a weighted average of the squared deviations $(x - \mu)^2$ for possible x values.


definitions  The variance of a discrete distribution for a variable x specified by mass function
p(x), denoted by $\sigma_x^2$ or just $\sigma^2$ (alternatively, V(x)), is given by
$$\sigma^2 = \sum (x - \mu)^2 \cdot p(x)$$
where the sum is over all possible x values. The standard deviation is $\sigma$, the positive square root of the variance.

If a particular x value is far from $\mu$, resulting in a large squared deviation, it will
still not contribute much to variability in the distribution if p(x) is quite small. This
is desirable because any x value for which p(x) is quite small will be observed very
infrequently in a long sequence of selections from the population or process. Just
as s can be interpreted as the size of a representative deviation from the sample
mean, $\sigma$ can be interpreted as the size of a typical deviation from the population
or process mean.

Example 2.8  Consider a computer system consisting of the computer itself, a monitor, and a printer.
Let x denote the number of system components that need service while under warranty; possible x values are 0, 1, 2, and 3. Suppose that p(0) = .532, p(1) = .389,
p(2) = .076, and p(3) = .003 (these come from individual component failure proportions of .2, .3, and .05 along with an assumption of component independence, so that
these proportions can be multiplied as we originally did in a binomial calculation).
Then $\mu = .55$ and
$$\begin{aligned}
\sigma^2 &= \sum (x - \mu)^2 \cdot p(x) \\
&= (0 - .55)^2(.532) + (1 - .55)^2(.389) + (2 - .55)^2(.076) + (3 - .55)^2(.003) \\
&= .16093 + .07877 + .15979 + .01801 = .41750
\end{aligned}$$
from which $\sigma = .646$.
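The same weighted-average computation takes one line per quantity in R; this sketch (ours, with illustrative object names) mirrors the example:

x <- 0:3
p <- c(.532, .389, .076, .003)
mu <- sum(x * p)                 # .55
sigma2 <- sum((x - mu)^2 * p)    # .4175
sqrt(sigma2)                     # about .646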

An alternative computational formula for calculating $\sigma^2$ is given in Exercise 26, which
is similar to the computational formula for $s^2$ in Exercise 18.
Recall that the mean value of the binomial distribution based on group size n and
item success proportion $\pi$ is just $n\pi$. The variance is also a simple expression, though
verification of this result involves some tedious manipulation of summations:
$$\sigma^2 = \sum_{x=0}^{n} (x - n\pi)^2\,\frac{n!}{x!(n - x)!}\,\pi^x(1 - \pi)^{n - x} = n\pi(1 - \pi)$$
The standard deviation of a binomial distribution is then $\sigma = \sqrt{n\pi(1 - \pi)}$. Note that
$\sigma = 0$ if $\pi = 0$ (in which case, every item is a failure, so x = 0 always) or $\pi = 1$ (every item a success, so x = n always). The variance and standard deviation are largest

when $\pi = .5$ [$\pi(1 - \pi)$ is maximized for this value], that is, when there is a 50–50 split
between successes and failures. As $\pi$ moves toward either 0 or 1, the variance and standard deviation decrease. If identical components are shipped in groups of size 25 and
the long-run success (doesn't need warranty service) proportion is $\pi = .9$, then
$$\mu = 25(.9) = 22.5 \qquad \sigma = \sqrt{25(.9)(.1)} = \sqrt{2.25} = 1.50$$
The mean value of a Poisson distribution with parameter $\lambda$ is $\lambda$ itself, and this is also
the variance of the distribution:
$$\sigma^2 = \sum_{x=0}^{\infty} (x - \lambda)^2\,\frac{e^{-\lambda}\lambda^x}{x!} = \lambda$$
(Again, much summation manipulation is required.) The standard deviation is, of course,
$\sqrt{\lambda}$. If the number of blemishes x on surfaces of a certain part has a Poisson distribution
with parameter $\lambda = 3.5$, then the mean value is 3.5 and the standard deviation is 1.87.
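Both standard deviations are one-line evaluations in R (a quick check of ours, not from the original text):

sqrt(25 * .9 * (1 - .9))   # binomial with n = 25 and pi = .9: 1.5
sqrt(3.5)                  # Poisson with lambda = 3.5: about 1.87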

The Variance and Standard Deviation of a Continuous Distribution


The variance of a continuous distribution with density function f (x) is obtained by re-
placing summation in the discrete case by integration and substituting f (x) for p(x).

definitions  The variance of a continuous distribution specified by density function f(x) is
$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx$$
The standard deviation $\sigma$ is again the positive square root of the variance.

Example 2.9  The distribution of x = gravel sales during a given week (tons), introduced in
Example 2.5, was specified by the density function $f(x) = 1.5(1 - x^2)$ for x between
0 and 1. We found the mean value to be $\mu = .375$. The variance of the distribution is
$$\sigma^2 = \int_0^1 (x - .375)^2 \cdot 1.5(1 - x^2)\,dx$$
Multiplying the factors in the integrand gives $1.5(-x^4 + .75x^3 + .859375x^2 - .75x + .140625)$. Integrating this fourth-degree polynomial term by term gives $\sigma^2 = .059375$
and $\sigma = .244$.
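Numerical integration confirms the term-by-term result; here is a minimal R sketch (ours) using integrate():

f <- function(x) 1.5 * (1 - x^2)                      # density on [0, 1]
mu <- integrate(function(x) x * f(x), 0, 1)$value     # .375
integrate(function(x) (x - mu)^2 * f(x), 0, 1)$value  # .059375; its square root is about .244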

The Case of a Normal Distribution


The two parameters of a normal distribution were denoted by $\mu$ and $\sigma$. We have already
seen that $\mu$ is in fact the mean value, and it should come as no surprise that the second

parameter is the standard deviation of the distribution. That is, a bit of integration manipulation shows that
$$V(x) = \int_{-\infty}^{\infty} (x - \mu)^2\,\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x - \mu)^2/(2\sigma^2)}\,dx = \sigma^2$$
Let k be some fixed positive number. Consider the area under a normal curve with
parameters $\mu$ and $\sigma$ that lies within k standard deviations of the mean value. That is,
we wish to determine the proportion of x values that lie in the interval from $\mu - k\sigma$ to
$\mu + k\sigma$. Standardizing the interval limits gives
$$\frac{(\mu - k\sigma) - \mu}{\sigma} = -k \qquad \frac{(\mu + k\sigma) - \mu}{\sigma} = k$$
Thus the desired proportion is the area under the standard normal (z) curve between
−k and k. This shows that the area within k standard deviations of the mean under any
normal curve depends only on k and not on the particular normal curve under consideration. For k = 1, the desired proportion is the area under the z curve between −1 and 1.
From Appendix Table I, this area is .8413 − .1587 = .6826 ≈ .68. Similar calculations
for k = 2 and k = 3 give .9544 and .9974, respectively. Thus for any variable x whose
distribution is well approximated by a normal curve:
Approximately 68% of the values are within 1 standard deviation of the mean.
Approximately 95% of the values are within 2 standard deviations of the mean.
Approximately 99.7% of the values are within 3 standard deviations of the mean.
These three statements together are often referred to as the empirical rule; the name
reflects the fact that histograms of a great many data sets have at least roughly the shape
of a normal curve.
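The three empirical-rule proportions come straight from the standard normal cdf; in R (a quick check of ours, not from the original text):

k <- 1:3
pnorm(k) - pnorm(-k)   # 0.6827 0.9545 0.9973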

Other Continuous Distributions


A variable x is said to have a lognormal distribution with parameters $\mu$ and $\sigma$ if ln(x) is
normally distributed with mean value $\mu$ and standard deviation $\sigma$. In Section 2.1, we
pointed out that the mean value of x itself is not $\mu$. Similarly, the variance of x is not $\sigma^2$.
It can be shown that
$$V(x) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$$
The variance of a variable having a Weibull distribution is even more complicated than
the mean value; consult one of the chapter references.
$\sigma^2$ and $s^2$

The sample mean $\bar{x}$ is a sensible estimate (educated guess) for the value of the population
or process mean $\mu$. Similarly, the sample variance should be defined so that it gives a reasonable estimate of the population or process variance $\sigma^2$. Recall that $\sigma^2$ involves squared
deviations from $\mu$, that is, quantities of the form $(x - \mu)^2$. If the value of $\mu$ were known
to an investigator, a good estimate of $\sigma^2$ based on sample observations $x_1, \ldots, x_n$ would be
$\sum (x_i - \mu)^2/n$. It is natural to replace $\mu$ by $\bar{x}$ when the value of the former quantity is
unknown. However, it can be shown that $\sum (x_i - \bar{x})^2 < \sum (x_i - \mu)^2$ unless $\bar{x} = \mu$, so $\bar{x}$ is
"closer" to the sample observations than is $\mu$. To compensate for this reduction in sum of


squares, the value of the denominator n should also be reduced. According to a technical
criterion called unbiasedness, the sample size n should be replaced by the number of df,
$n - 1$. The resulting sample variance $s^2$ will tend to provide good estimates of $\sigma^2$.

Section 2.2 Exercises


15. In the article “Mechanical Reliability of Devices Subdermally Implanted into the Young of Long-Lived and Endangered Wildlife” (J. of Materials Engr. and Performance, 2012: 1924–1931), researchers examined the mechanical reliability of a thin enclosure for a biotelemetry device to be subdermally implanted in young wild animals. Six enclosure specimens were subjected to puncture tests. Each specimen was placed in a test apparatus, and researchers recorded the necessary force (N) for the puncture head to cause initial cracks in the enclosure. Here is the corresponding data:
2006.1  2065.2  2118.9  1686.6  1966.9  1792.5
a. Calculate $\bar{x}$ and the deviations from the mean.
b. Use the deviations calculated in part (a) to obtain the sample variance and the sample standard deviation.
c. Compute the sample standard deviation using a calculator or software function to confirm the accuracy of your answer in (b).

16. Return to the puncture test data given in Exercise 15.
a. Subtract 100 from each observation to obtain a sample of transformed values. Now calculate the sample variance of these transformed values and compare it to $s^2$ for the original data.
b. Consider a sample $x_1, \ldots, x_n$ and let $y_i = x_i - c$ for $i = 1, 2, \ldots, n$, where c is some specified number. Give a general argument to show that the sample variance of the $y_i$'s is identical to that of the $x_i$'s. Hint: How are $\bar{y}$ and $\bar{x}$ related?

17. Suppose the following represent quiz scores (out of 15 points) for students in two different study groups:
Group 1: 10, 14, 8, 7, 12, 7, 11
Group 2: 5, 8, 9.5, 8.5, 9, 9.5, 13
a. Compute the mean and standard deviation for each group.
b. Determine the range for each data set.
c. Create a dotplot for each data set and ensure you use the same axis scale for each.
d. Notice that one group exhibits the smaller standard deviation but the other exhibits the smaller range. Explain how it is possible for a data set to have the smallest standard deviation yet not have the smallest range. Hint: Keep in mind how standard deviation measures variability and compare the dotplots you created.

18. Traumatic knee dislocation often requires surgery to repair ruptured ligaments. One measure of recovery is range of motion (measured as the angle formed when, starting with the leg straight, the knee is bent as far as possible). The given data on postsurgical range of motion appeared in the article “Reconstruction of the Anterior and Posterior Cruciate Ligaments After Knee Dislocation” (Amer. J. Sports Med., 1999: 189–197):
154  142  137  133  122  126  135  135  108  120  127  134  122
a. What are the values of the sample mean and sample median?
b. An alternative computing formula for the numerator of $s^2$ is
$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2$$
Using this formula, determine the sample variance of the data. Hint: $\sum x_i = 1695$, $\sum x_i^2 = 222{,}581$.

19. In the article “X-Ray Computed Tomography and Nondestructive Evaluation of Clogging in Porous Concrete Field Samples” (J. of Materials in Civil Engr., 2012: 1103–1109), investigators determined the clogging percentage in porous concrete samples cored from parking lots. Porosity profiles using computed tomography scanned images were used in this study. The following represent the average porosity (%) using a gravimetric method for nine concrete cores:
8.10  20.50  26.54  19.68  14.87  14.36  9.19  23.55  22.27

Calculate and interpret the values of the sample mean and sample standard deviation for this data.

20. Use the alternative computing formula for $S_{xx}$ shown in Exercise 18 to determine the sample standard deviation for the average porosity measurements presented in Exercise 19.

21. Consider the following information on ultimate tensile strength (lb/in.) for a sample of n = 4 hard zirconium copper wire specimens (from “Characterization Methods for Fine Copper Wire,” Wire J. Intl., August 1997: 74–80):
$\bar{x} = 76{,}831$   s = 180   smallest $x_i$ = 76,683   largest $x_i$ = 77,048
Determine the values of the two middle sample observations (and don't do it by successive guessing!). Hint: See Exercise 18 part b.

22. The federal test procedure (FTP) for determining the levels of various types of vehicle emissions is time-consuming and expensive to perform. According to the article “Motor Vehicle Emissions Variability” (J. of the Air and Waste Mgmnt. Assoc., 1996: 667–675), there is a widespread belief that repeated FTP measurements on the same vehicle would yield identical (or nearly identical) results. The accompanying data is from one particular vehicle characterized as a high emitter:
HC (gm/mi): 13.8  18.3  32.2  32.5
CO (gm/mi): 118  149  232  236
a. Compute the sample standard deviations for the HC and CO observations. Does the widespread belief appear to be justified?
b. The sample coefficient of variation $s/\bar{x}$ (or $100s/\bar{x}$) assesses the extent of variability relative to the mean. Values of this coefficient for several different data sets can be compared to determine which data sets exhibit more or less variation. Carry out such a comparison for the given HC and CO data.

23. Suppose, as in Exercise 57 of Chapter 1, that the number of drivers traveling between a particular origin and destination during a designated time period has a Poisson distribution with $\lambda = 20$. In the long run, during what proportion of such periods will the number of drivers be
a. Within 5 of the mean value?
b. Within 1 standard deviation of the mean value?

24. Suppose that x, the number of flaws on the surface of a boiler of a certain type, has a Poisson distribution with $\lambda = 5$. For what proportion of such boilers will the number of flaws
a. Be within 1 standard deviation of the mean number of flaws?
b. Exceed the mean number of flaws by more than 2 standard deviations?

25. Let x represent the number of underinflated tires on an automobile of a certain type, and suppose that p(0) = .4, p(1) = p(2) = p(3) = .1, and p(4) = .3, from which $\mu = 1.8$.
a. Calculate the standard deviation of x.
b. For what proportion of such cars will the number of underinflated tires be within 1 standard deviation of the mean value? More than 3 standard deviations from the mean value?

26. Use the fact that $(x - \mu)^2 = x^2 - 2\mu x + \mu^2$ to show that $\sigma^2 = \sum x^2 p(x) - \mu^2$ for a discrete variable x. Then use this result to compute the variance for the variable whose distribution is given in the previous problem. Hint: Substitute the alternative expression for $(x - \mu)^2$ in the definition of $\sigma^2$, and break the summation into three separate terms; the argument in the continuous case involves replacing summation with integration.

27. If x has a uniform distribution on the interval from a to b [$f(x) = 1/(b - a)$], from which $\mu = (a + b)/2$, show that $\sigma^2 = (b - a)^2/12$. If task completion time is uniformly distributed with a = 4 and b = 6, what proportion of times will be farther than 1 standard deviation from the mean value of completion time?

28. Suppose that bearing diameter x has a normal distribution. What proportion of bearings have diameters that are within 1.5 standard deviations of the mean diameter? That exceed the mean diameter by more than 2.5 standard deviations?

29. Historical data implies that 20% of all components of a certain type need service while under warranty. Suppose that whether any particular component needs warranty service is independent of whether


any other component does. If these components are shipped in batches of 25 and x denotes the number of components in a batch that need warranty service, determine the standard deviation of x and then the proportion of batches for which the number of components that need warranty service exceeds the mean number by more than 2 standard deviations.

30. If the unloading time of a forwarder in a harvesting operation is lognormally distributed with a mean value of 900 and a standard deviation of 725, what are the values of the parameters $\mu$ and $\sigma$? Note: An expression for the mean value of a lognormal variable is given in Section 2.1, and an expression for the variance appears in this section.

31. If component lifetime is exponentially distributed with parameter $\lambda$, obtain an expression for the proportion of components whose lifetime exceeds the mean value by more than 1 standard deviation. Hint: According to Exercise 26, $\sigma^2 = \int_0^{\infty} x^2 f(x)\,dx - \mu^2$; now use integration by parts.

32. The sample mean and sample standard deviation for the sample of n = 100 shear strength observations given in Exercise 17 of Section 1.2 are 5049.16 and 351.45, respectively. What percentage of the observations in the sample are within 1 standard deviation of the mean, and how does this compare to the corresponding percentage given by the empirical rule? Answer this question also for 2 standard deviations and for 3 standard deviations.

2.3 More Detailed Summary Quantities 


The median separates a data set or distribution into two equal parts, so that 50% of the
values exceed the median and 50% are smaller than the median. Quartiles and percentiles
give more detailed information about location of a data set or distribution by considering
percentages other than 50%. In this section, we also develop another measure of spread
based on the quartiles, the interquartile range (IQR). The median and IQR can be used
together to give a concise yet informative visual summary of sample data called a boxplot.

Quartiles and the Interquartile Range


The lower and upper quartiles along with the median separate a data set or distribution
into four equal parts: 25% of all values are smaller than the lower quartile, 25% exceed
the upper quartile, and 25% lie between each quartile and the median. This is illustrated
for a continuous distribution or smoothed histogram in Figure 2.8.

[Figure 2.8: Illustrating the quartiles; the lower quartile, the median, and the upper
quartile divide the area under the density curve into four 25% pieces]

Let’s first consider quartiles for sample data. There are several different sensible
ways to define the sample quartiles. We will use a definition that requires a minimal


amount of computation; statistical computer packages actually calculate quartiles by
interpolation (our quartiles are called fourths in some sources).

definitions  Separate the n ordered sample observations into a lower half and an upper half; if
n is an odd number, include the median $\tilde{x}$ in each half. Then
lower quartile = median of the lower half of the data
upper quartile = median of the upper half of the data
The interquartile range (IQR), a measure of variability that is resistant to the
effect of outliers, is the difference between the two quartiles:
IQR = upper quartile − lower quartile

Example 2.10 Reconsider the flexural strength data for beams given in Example 1.2. A stem-and-
leaf display of the 27 observations follows:
5 9
6 3 3 5 8 8
7 0 0 2 3 4 6 7 7 8 8 9 Stem: ones digit
8 1 2 7 Leaf: tenths digit
9 0 7 7
10 7
11 3 6 8
Because n = 27 is odd, the median $\tilde{x} = 7.7$ is included in each half of the data:

Lower half: 5.9 6.3 6.3 6.5 6.8 6.8 7.0 7.0 7.2 7.3 7.4 7.6 7.7 7.7
Upper half: 7.7 7.8 7.8 7.9 8.1 8.2 8.7 9.0 9.7 9.7 10.7 11.3 11.6 11.8

$$\text{lower quartile} = \frac{7.0 + 7.0}{2} = 7.0 \qquad \text{upper quartile} = \frac{8.7 + 9.0}{2} = 8.85$$
$$\text{IQR} = 8.85 - 7.0 = 1.85$$

Notice that if the largest observation, 11.8, were increased by any amount, the up-
per quartile and therefore the IQR would not be affected, whereas such an increase
would change the sample variance and standard deviation. Similarly, a decrease in
several of the smallest observations has no impact on the quartiles or the IQR.
The following output is from the summary and IQR commands from the R software. The former command requests that the values of various summary quantities be calculated:
> summary(flexural)
Min. 1st Qu. Median Mean 3rd Qu. Max.
5.900 7.000 7.700 8.141 8.850 11.800
> IQR(flexural)
[1] 1.85
Minitab’s reported value for the quartile Q3 is 9.000, a bit different from what R returns.
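Software differences like this arise because packages interpolate; the fourths defined above can also be computed directly. A small R sketch of this book's definition (ours; the function name is illustrative):

book_quartiles <- function(x) {
  x <- sort(x)
  n <- length(x)
  lower <- x[1:ceiling(n / 2)]       # lower half; includes the median when n is odd
  upper <- x[(floor(n / 2) + 1):n]   # upper half; includes the median when n is odd
  c(lower = median(lower), upper = median(upper))
}
# Applied to the 27 flexural strengths, this returns 7.00 and 8.85, so IQR = 1.85.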


Now consider a continuous variable x whose distribution is described by a density
function f(x). Recall that the median $\tilde{\mu}$ results from solving the equation
$$\int_{-\infty}^{\tilde{\mu}} f(x)\,dx = .5$$
(so that half the area under the density curve lies to the left of $\tilde{\mu}$). The lower quartile $q_l$
and upper quartile $q_u$ are solutions to
$$\int_{-\infty}^{q_l} f(x)\,dx = .25 \qquad \int_{q_u}^{\infty} f(x)\,dx = .25$$

Example 2.11  The exponential distribution with parameter $\lambda$ has density function $\lambda e^{-\lambda x}$ for x > 0.
For any positive number c,
$$\int_{-\infty}^{c} f(x)\,dx = \int_0^c \lambda e^{-\lambda x}\,dx = 1 - e^{-\lambda c} \qquad \int_c^{\infty} \lambda e^{-\lambda x}\,dx = e^{-\lambda c}$$
Equating either of these quantities to .5 and solving for c gives $c = \tilde{\mu} = -\ln(.5)/\lambda = .693/\lambda$. Equating each of these two quantities to .25 gives
$$q_l = -\ln(.75)/\lambda = .288/\lambda \qquad q_u = -\ln(.25)/\lambda = 1.386/\lambda$$
Suppose, for example, that times (min) between successive arrivals at a shipping terminal are exponentially distributed with $\lambda = .1$. Then $q_l = 2.88$ min, $\tilde{\mu} = 6.93$ min,
and $q_u = 13.86$ min. The upper quartile is much farther from the median than is the
lower quartile because the distribution has a substantial positive skew (the mean value
of x is $1/\lambda = 10$, much larger than the median).
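R's exponential quantile function gives the same values directly (rate is R's name for $\lambda$; this check is ours, not from the original example):

qexp(c(.25, .5, .75), rate = .1)   # 2.877  6.931  13.863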

Example 2.12  The quartiles of a normal distribution are easily expressed in terms of $\mu$ and $\sigma$. First,
consider a variable z having the standard normal distribution. Symmetry of the standard normal curve about 0 implies that $\tilde{\mu} = 0$. Looking for .2500 inside Appendix
Table I, we obtain the following information:
area to the left of −.67: .2514
area to the left of −.68: .2483
Since .25 is roughly halfway between these two tabled areas, we take −.675 as the
lower quartile. By symmetry, .675 is the upper quartile.
It is then easily verified that if x has a normal distribution with mean value $\mu$ and
standard deviation $\sigma$,
$$\text{upper quartile} = \mu + .675\sigma \qquad \text{lower quartile} = \mu - .675\sigma$$


That is, for any normal distribution, the quartiles are .675 standard deviation to either side of the mean. The interquartile range is $\mu + .675\sigma - (\mu - .675\sigma) = 1.35\sigma$.
A familiar example is IQ scores in the general population, where $\mu = 100$, $\sigma = 15$,
$q_l = 89.875 \approx 90$, and $q_u \approx 110$. Roughly 25% of all people have scores below
90 and roughly 25% have scores exceeding 110.
The relation IQR = 1.35$\sigma$ suggests that if the sample IQR is very different from
1.35s, it is not plausible that the underlying distribution is normal. In Example 2.10,
1.35s ≈ 2.2, which is not much greater than the IQR of 1.85. A graphical technique
for assessing the plausibility of a normal population or process distribution is presented in the next section.
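In R, qnorm() reproduces both the .675 multiplier and the IQ quartiles (a quick check of the calculations above, not from the original text):

qnorm(c(.25, .75))                        # -0.6745  0.6745
qnorm(c(.25, .75), mean = 100, sd = 15)   # 89.88  110.12 for the IQ example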

For our purposes, it is not necessary to discuss quartiles for a discrete distribution.

Boxplots
A boxplot is a visual display of data based on the following five-number summary:

smallest xi  lower quartile  median  upper quartile  largest xi

To create a boxplot, first draw a horizontal measurement scale. Then place a rectangle
above this axis; the left edge of the rectangle is at the lower quartile, and the right edge is
at the upper quartile (so box width 5 IQR). Place a vertical line segment or some other
symbol inside the rectangle at the location of the median; the position of the median
symbol relative to the two edges conveys information about skewness in the middle 50%
of the data. Finally, draw “whiskers” out from either end of the rectangle to the smallest
and largest observations. A boxplot with a vertical orientation can also be drawn by mak-
ing obvious modifications in the construction process.

Example 2.13 Returning to the article on lightweight aggregates referenced in Example 2.1, the
researchers also reported specific gravity measurements for all 14 LWA specimens:

1.10 1.29 1.38 1.39 1.40 1.45 1.46


1.48 1.49 1.50 1.51 1.51 1.56 1.62

The five-number summary is as follows:


smallest $x_i$ = 1.10   lower quartile = 1.39   $\tilde{x}$ = 1.47   upper quartile = 1.51   largest $x_i$ = 1.62

Figure 2.9 shows the resulting boxplot.
The right edge of the box is closer to the median than is the left edge, indicating
a substantial skew in the middle half of the data. The box width (IQR) is also reason-
ably large relative to the range of the data (distance between the tips of the whiskers).
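The figure is easy to regenerate; a minimal R sketch (ours, not the book's code) using the base boxplot() function, with range = 0 so the whiskers run to the smallest and largest observations as in the construction described above:

lwa <- c(1.10, 1.29, 1.38, 1.39, 1.40, 1.45, 1.46,
         1.48, 1.49, 1.50, 1.51, 1.51, 1.56, 1.62)
fivenum(lwa)   # 1.10 1.39 1.47 1.51 1.62, the five-number summary
boxplot(lwa, horizontal = TRUE, range = 0, xlab = "Specific gravity")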


[Figure 2.9: A boxplot of the LWA data generated by the R software; specific gravity
axis from 1.1 to 1.6]

A boxplot is certainly more compact than a stem-and-leaf display or histogram, but


it is sometimes inferior to these latter two descriptive techniques because a boxplot can
mask important characteristics of the data, such as the presence of clusters. The main
attraction of boxplots is that they give a quick visual comparison. A comparative or side-
by-side boxplot is a very effective way of revealing similarities and differences between
two or more data sets consisting of observations on the same variable.

Example 2.14 The article “Compression of Single-Wall Corrugated Shipping Containers Using
Fixed and Floating Test Platens” (J. of Testing and Evaluation, 1992: 318–320) de-
scribes an experiment in which several different types of boxes were compared with
respect to compression strength. Consider the following observations on four dif-
ferent types of boxes (summary quantities for this data are in good agreement with
values given in the cited article):
Type of box    Compression strength (lb)
1              655.5  788.3  734.3  721.4  679.1  699.4
2              789.2  772.5  786.9  686.1  732.1  774.8
3              737.1  639.0  696.3  671.7  717.2  727.1
4              535.1  628.7  542.4  559.0  586.9  520.0

Figure 2.10 is a comparative boxplot of this data produced by the Minitab statistical
package. (Recall that Minitab uses definitions of the quartiles that differ somewhat
from ours.) The most striking feature of the comparative boxplot is that strength
values for the fourth type of box appear to be considerably smaller than those for the
three other types; this suggests that the population mean strength for type 4 boxes is
less than the mean strengths for the other three types. The differences between box
types seem pretty clear-cut because within-sample variation is small relative to the


separation between sample means and medians. When this is not the case, an infer-
ential method called single-factor analysis of variance, discussed in Chapter 9, is used
to investigate differences among three or more populations or treatments.

[Figure 2.10: A Minitab comparative boxplot of the compressive strength data;
CompStr axis from 500 to 800, box types 1 through 4 on the horizontal axis]

Boxplots That Show Outliers


A boxplot can be embellished to indicate explicitly the presence of outliers.

definitions  Any observation farther than 1.5 IQR from the closest quartile is an outlier. An
outlier is extreme if it is more than 3 IQR from the nearest quartile, and it is mild
otherwise.

Many inferential procedures are based on the assumption that the sample came from
a normal distribution. Even a single extreme outlier in the sample warns the investigator that such procedures should not be used, and the presence of several mild outliers
conveys the same message.
Let’s now modify our previous construction of a boxplot by drawing a whisker out
from each end of the box to the smallest and largest observations that are not outliers. Each
mild outlier is represented by a closed circle and each extreme outlier by an open circle.
Some statistical computer packages do not distinguish between mild and extreme outliers.

Example 2.15 The National Health and Nutrition Examination Survey (NHANES), a massive
annual program conducted by the National Center for Health Statistics, is a series
of cross-sectional nationally representative surveys that include demographic,
socioeconomic, dietary, and health-related questions. The information from the


surveys is used to assess the health and nutritional status of adults and children in
the United States.
One variable measured is the high-density lipoprotein (HDL) cholesterol level
(mg/dl) of each survey participant. The following 30 HDL observations were ob-
tained from the 2009–2010 NHANES data set:
11 32 33 41 45 46 47 48 48 49
49 50 52 55 57 57 59 61 63 63
66 67 71 71 71 72 73 76 111 144
Relevant summary quantities are
$\tilde{x}$ = 57   lower quartile = 48   upper quartile = 71
IQR = 23   1.5 IQR = 34.5   3 IQR = 69
Thus, any observation smaller than 48 − 34.5 = 13.5 or larger than 71 + 34.5 =
105.5 is an outlier. There is one outlier at the lower end of the sample and two at the
upper end. Because 71 + 69 = 140, the largest observation of 144 is an extreme outlier; the other outlier is mild. The whiskers extend out to 32 and 76, the most extreme
observations that are not outliers. The resulting boxplot is in Figure 2.11.
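The fences can be checked mechanically; a short R sketch (ours, using the quartiles computed above):

lq <- 48; uq <- 71; iqr <- uq - lq        # IQR = 23
c(lq - 1.5 * iqr, uq + 1.5 * iqr)         # 13.5 and 105.5: mild-outlier fences
c(lq - 3 * iqr, uq + 3 * iqr)             # -21 and 140: extreme-outlier fences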

[Figure 2.11: A boxplot of the HDL cholesterol data showing mild and extreme
outliers; HDL axis from 0 to 160]



Percentiles

Let p denote a number between 0 and 1. Then the (100p)th percentile $\eta_p$, also called
the pth quantile, separates the smallest 100p% of the data or distribution from the
remaining values. For example, 90% of all values lie below the 90th percentile, $\eta_{.9}$ (the
.9th quantile), and only 10% of all values exceed the 90th percentile. The median is
the 50th percentile, and the lower and upper quartiles are the 25th and 75th percentiles,
respectively. For a continuous distribution, $\eta_p$ is the solution to the equation
$$\int_{-\infty}^{\eta_p} f(x)\,dx = p$$
That is, p is the area under the density curve to the left of $\eta_p$. Figure 2.12 illustrates the
definition.


[Figure 2.12: The (100p)th percentile of a continuous distribution; the shaded area
under the density curve to the left of $\eta_p$ equals p]

Example 2.16  Appendix Table I gives cumulative z curve areas for the standard normal distribution.
To find the 90th percentile, we look for cumulative area .9000 inside the table. The
entry closest to .9000 is .8997 in the 1.2 row and .08 column, so $\eta_{.9} \approx 1.28$. By symmetry, the 10th z percentile (.1th quantile) is $\eta_{.1} \approx -1.28$. It then follows that for the
normal distribution with mean value $\mu$ and standard deviation $\sigma$,
$$\eta_{.9} = \mu + 1.28\sigma \qquad \eta_{.1} = \mu - 1.28\sigma$$
Once a particular z percentile is determined, the corresponding percentile for any
normal distribution is easily calculated.
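qnorm() in R returns percentiles without a table lookup (a quick check of ours; the N(100, 15) case is purely illustrative):

qnorm(.9)                        # 1.2816, the 90th z percentile
qnorm(.9, mean = 100, sd = 15)   # 119.2, the 90th percentile of a normal
                                 # distribution with mu = 100 and sigma = 15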

Percentiles for discrete distributions will not be needed in this book. In general,
percentiles for sample data require interpolation between successive sample values. In
Section 2.4, we use percentiles that correspond to the ordered sample observations. For
example, if n = 10, we will regard the smallest sample observation as the fifth sample
percentile, the second smallest observation as the 15th sample percentile, and so on.

Section 2.3 Exercises

33. Reconsider the accompanying data on postsurgical range of motion introduced in Exercise 18 of this chapter:
154  142  137  133  122  126  135  135  108  120  127  134  122
a. What are the values of the quartiles? What is the value of the IQR?
b. Construct a boxplot based on the five-number summary and comment on its features.
c. How large or small does an observation have to be to qualify as an outlier? As an extreme outlier?
d. By how much could the largest observation be decreased without affecting the IQR?


34. Here is a description from the R software of the strength data given in Exercise 4 from Chapter 1.
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  122.2   133.0   135.4   135.4   138.2   147.7
a. Comment on any interesting features.
b. Construct a boxplot of the data and comment on what you see.

35. The diameter length of contact windows used in integrated circuits is normally distributed. About 5% of all lengths exceed 3.75 $\mu$m, and about 1% of all lengths exceed 3.85 $\mu$m. What are the mean value and standard deviation of the length distribution?

36. The following data on distilled alcohol content (%) for a sample of 35 port wines was extracted from the article “A Method for the Estimation of Alcohol in Fortified Wines Using Hydrometer Baumé and Refractometer Brix” (Amer. J. Enol. Vitic., 2006: 486–490). Each value is an average of two duplicate measurements.
16.35  18.85  16.20  17.75  19.58  17.73  22.75
23.78  23.25  19.08  19.62  19.20  20.05  17.85
19.17  19.48  20.00  19.97  17.48  17.15  19.07
19.90  18.68  18.82  19.03  19.45  19.37  19.20
18.00  19.60  19.33  21.22  19.50  15.30  22.25
a. Determine the value of the IQR.
b. Are there any outliers in the sample? Any extreme outliers?
c. Construct a boxplot and comment on its features.
d. By how much could the largest observation be decreased without affecting the value of the IQR?

37. Grip is applied to produce normal surface forces that compress the object being gripped. Examples include two people shaking hands and a nurse squeezing a patient’s forearm to stop bleeding. The article “Investigation of Grip Force, Normal Force, Contact Area, Hand Size, and Handle Size for Cylindrical Handles” (Human Factors, 2008: 734–744) included the following data on grip strength (N) for a sample of 42 individuals:
 16   18   18   26   33   41   54
 56   66   68   87   91   95   98
106  109  111  118  127  127  135
145  147  149  151  168  172  183
189  190  200  210  220  229  230
233  238  244  259  294  329  403
Construct a boxplot that shows outliers and comment on its features.

38. A sample of 20 glass bottles of a particular type was selected, and the internal pressure strength of each bottle was determined. Consider the following partial sample information:
median = 202.2
lower quartile = 196.0
upper quartile = 216.8
three smallest observations: 125.8  188.1  193.7
three largest observations: 221.3  230.5  250.2
a. Are there any outliers in the sample? Any extreme outliers?
b. Construct a boxplot that shows outliers, and comment on any interesting features.

39. A company utilizes two different machines to manufacture parts of a certain type. During a single shift, a sample of n = 20 parts produced by each machine is obtained, and the value of a particular critical dimension for each part is determined. The accompanying comparative boxplot is constructed from the resulting data. Compare and contrast the two samples.
[Figure for Exercise 39: comparative boxplots of the critical dimension (scale from 85 to 115) for machines 1 and 2]

40. Recall from Exercise 2 the data on the concentration (EU/mg) in settled dust for one sample of urban homes and another of farm homes:


U: 6.0  5.0  11.0  33.0  4.0  5.0  80.0  18.0  35.0  17.0  23.0
F: 4.0  14.0  11.0  9.0  9.0  8.0  4.0  20.0  5.0  8.9  21.0  9.2  3.0  2.0  0.3
a. Determine the medians, quartiles, and IQRs for the two samples.
b. Are there any outliers in either sample? Any extreme outliers?
c. Construct a comparative boxplot and use it as a basis for comparing and contrasting the two samples.

41. The authors of the article cited in Exercise 2 also provided endotoxin concentrations in dust from vacuum-cleaner dust bags:
U: 34.0  49.0  13.0  33.0  24.0  24.0  35.0  104.0  34.0  40.0  38.0  1.0
F: 2.0  64.0  6.0  17.0  35.0  11.0  17.0  13.0  5.0  27.0  23.0  28.0  10.0  13.0  0.2
Construct a comparative boxplot (which appeared in the cited paper), and compare and contrast the two samples.

42. The comparative boxplot (see below) of gasoline vapor coefficients for vehicles in Detroit appeared in the article “Receptor Modeling Approach to VOC Emission Inventory Validation” (J. of Envir. Engr., 1995: 483–490). Discuss any interesting features.
[Figure for Exercise 42: comparative boxplots of gas vapor coefficient (scale from 0 to 70) at times 6 A.M., 8 A.M., 12 noon, 2 P.M., and 10 P.M.]

43. Exercise 46 from Section 1.5 suggested a Weibull distribution with $\alpha = 5$ and $\beta = 125$ as a model for fracture strength of silicon nitride braze joints.
a. What are the quartiles of this distribution, and what is the value of the IQR?
b. Suppose that the value of $\beta$ is changed to 12.5. Determine the values of the quartiles and the value of the IQR. Note: In essence, this amounts to dividing each observation in the population distribution by 10, because $\beta$ is a “scale” parameter and changing its value stretches or compresses the x scale without changing the shape of the distribution.

44. Reconsider the lognormal distribution with $\mu = 9.164$ and $\sigma = .385$ proposed in Exercise 44 from Section 1.5 as a model for the distribution of nonpoint source load of total dissolved solids (in kg/day/km).
a. What are the values of the quartiles?
b. What is the value of the 95th percentile of the concentration distribution?
c. If $\mu$ were 10.164 rather than 9.164, would the values of the two quartiles simply increase by an identical amount?


2.4 Quantile Plots


An investigator frequently wishes to know whether it is plausible that a numerical sample
x1, x2, . . . , xn was selected from a particular type of population distribution (e.g., a normal
distribution). For one thing, many inferential procedures are based on the assumption
that the underlying distribution is of a specified type. The use of such procedures is in-
appropriate if the actual distribution differs greatly from the assumed type. Additionally,
understanding the underlying distribution can sometimes give insight into the physical
mechanisms involved in generating the data. An effective way to check a distributional
assumption is to construct a quantile plot (sometimes called a probability plot). The es-
sence of such a plot is that if the plot is based on the correct distribution, the points in the
plot will fall close to a straight line. If the actual distribution is quite different from the
one used to construct the plot, the points should depart substantially from a linear pattern.

Sample Quantiles
The details involved in constructing quantile plots differ a bit from source to source. The
basis for our construction is a comparison between quantiles of the sample data and the
corresponding quantiles of the distribution under consideration. Recall that for any num-
ber p between 0 and 1, the pth quantile p is such that area p lies to the left of p under the
density curve. For example, Appendix Table I shows that the .9th quantile (90th percen-
tile) for the standard normal distribution is approximately 1.28, the .1th quantile is roughly
21.28, the .8th quantile is about .84, and of course the .5th quantile (the median) is 0.
Roughly speaking, sample quantiles are defined in the same way that quantiles of a
population or process distribution are defined. The .5th sample quantile should separate
the smallest 50% of the sample from the largest 50%, the .9th sample quantile should be
such that 90% of the sample lies below that value and only 10% above, and so on. Our
interest here is only in the value of p corresponding to each of the sample observations
when ordered from smallest to largest. Recall that when n is odd, the sample median
or .5th quantile is the middle value in the ordered list; for example, the sixth smallest
value when n = 11. This amounts to regarding the middle observation as being half
in the lower half of the data and half in the upper half. Similarly, suppose that n = 10.
Then if we call the third smallest value the .25th quantile, we are regarding that value
as being half in the lower group (consisting of the two smallest observations) and half in
the upper group (comprising the seven largest observations). This leads to the following
general definition of sample quantiles:

definition Let x(1) denote the smallest sample observation, x(2) the second smallest sample
observation, . . . , and x(n) the largest sample observation. We take x(1) to be the
(.5/n)th sample quantile, x(2) to be the (1.5/n)th sample quantile, . . . , and finally
x(n) to be the [(n − .5)/n]th sample quantile. That is, for i = 1, . . . , n, x(i) is the
[(i − .5)/n]th sample quantile.

Thus when n = 20, x(1) is the .025th quantile, x(2) is the .075th quantile, x(3) is the .125th
quantile, . . . , and x(20) is the .975th quantile (97.5th percentile).
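These levels are trivial to compute directly from the definition. Here is a minimal sketch in R (the software used for Figure 2.13 later in this section); the sample vector x is hypothetical:

    x <- c(28.1, 25.7, 31.4, 27.0, 29.9)   # hypothetical sample, n = 5
    n <- length(x)
    p <- (1:n - 0.5) / n                   # levels .1, .3, .5, .7, .9
    cbind(level = p, quantile = sort(x))   # x(i) is the [(i - .5)/n]th sample quantile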


A Normal Quantile Plot


Suppose now that for i = 1, . . . , n, the quantities (i − .5)/n are calculated and the cor-
responding quantiles are determined for a specified population or process distribution
whose plausibility is being investigated. If the sample were actually selected from the
specified distribution, the sample quantiles should be reasonably close to the corre-
sponding distributional quantiles. That is, for i = 1, . . . , n, there should be reasonable
agreement between x(i) and the [(i − .5)/n]th quantile for the specified distribution.
After determining the appropriate quantiles for the distribution being investigated, form
the n pairs as follows:

((.5/n)th quantile, x(1)), ((1.5/n)th quantile, x(2)), . . . , (((n − .5)/n)th quantile, x(n))

In other words, pair the smallest quantile with the smallest observation, the second
smallest quantile with the second smallest observation, and so on. Each such pair can
be plotted as a point on a two-dimensional coordinate system. If the first number in each
pair is close to the second number, the points in the plot will fall close to a 45° line [one
with slope 1 passing through the point (0, 0)].
For example, this program can be carried out to decide whether a normal distribution
with μ = 100 and σ = 15 is plausible. First the appropriate z quantiles are determined;
then the desired normal quantiles are expressed in the form μ + σ(corresponding z quan-
tile). However, an investigator is typically not interested in knowing whether a particular
normal distribution is plausible but instead whether some normal distribution is plausible.
It is clearly inefficient to construct a separate normal quantile plot for each of a large
number of different choices of μ and σ. Fortunately, this is not necessary because there
is a linear relationship between z quantiles and those for any other normal distribution:

quantile for normal (μ, σ) distribution = μ + σ(corresponding z quantile)

definition A normal quantile plot is a plot of the (z quantile, observation) pairs. The lin-
ear relation between normal (μ, σ) quantiles and z quantiles implies that if the
sample has come from a normal distribution with particular values of μ and σ, the
points in the plot should fall close to a straight line with slope σ and vertical in-
tercept μ. Thus a plot for which the points fall close to some straight line suggests
that the assumption of a normal population or process distribution is plausible.

Note that if a straight line is fit to the points in the plot, the intercept and slope give esti-
mates of μ and σ, respectively, though these will typically differ from the usual estimates
x̄ and s.
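As a rough illustration of that last remark, a least squares line can be fit to the (z quantile, observation) pairs; the sketch below is in R with a simulated sample (the data are not from the example that follows):

    x <- rnorm(20, mean = 100, sd = 15)   # simulated sample for illustration
    n <- length(x)
    z <- qnorm((1:n - 0.5) / n)           # z quantiles for levels (i - .5)/n
    coef(lm(sort(x) ~ z))                 # intercept estimates mu, slope estimates sigma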

Example 2.17 There has been recent increased use of augered cast-in-place (ACIP) and drilled dis-
placement (DD) piles in the foundations of buildings and transportation structures. In
the article “Design Methodology for Axially Loaded Auger Cast-in-Place and Drilled


Displacement Piles” (J. Geotech. Geoenviron. Engr., 2012: 1431–1441) researchers


propose a design methodology to enhance the efficiency of these piles. The authors re-
ported the following length-diameter ratio measurements based on 17 static-pile load
tests on ACIP and DD piles from various construction sites. The values of p for which
z percentiles are needed are (1 − .5)/17 = .029, (2 − .5)/17 = .088, . . . , and .971.

x(i): 30.86 37.68 39.04 42.78 42.89 42.89 45.05 47.08 47.08
z percentile: −1.89 −1.35 −1.05 −0.82 −0.63 −0.46 −0.30 −0.15 0.00
x(i): 48.79 48.79 52.56 52.56 54.8 55.17 56.31 59.94
z percentile: 0.15 0.30 0.46 0.63 0.82 1.05 1.35 1.89
Figure 2.13 shows the corresponding normal quantile plot as generated by the
qqnorm function in the R software. The pattern in the plot is quite straight, indicat-
ing it is plausible that the population distribution of length-diameter ratio is normal.

[Figure 2.13 Normal quantile plot from R for the length-diameter ratio data, titled “Normal quantile plot for length-diameter ratio”; vertical axis L/D (30 to 60), horizontal axis Normal quantile (−2 to 2)]
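A plot like Figure 2.13 takes only a couple of lines of R. The qqnorm function pairs the ordered observations with z quantiles (for samples larger than 10 it uses exactly the (i − .5)/n levels described earlier), and qqline adds a reference line through the quartiles:

    # Length-diameter ratios from Example 2.17
    ld <- c(30.86, 37.68, 39.04, 42.78, 42.89, 42.89, 45.05, 47.08, 47.08,
            48.79, 48.79, 52.56, 52.56, 54.80, 55.17, 56.31, 59.94)
    qqnorm(ld, ylab = "L/D")   # z quantiles on the horizontal axis
    qqline(ld)                 # reference line for judging straightness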

The judgment as to whether a plot does or does not show a substantial linear pattern is
somewhat subjective. Particularly when n is small, normality should not be ruled out unless
the departure from linearity is very clear-cut. Figure 2.14 displays several plots that suggest a
nonnormal population or process distribution. In Section 8.4, we show how a quantitative as-
sessment of the extent to which points in a two-dimensional plot fall close to a straight line can
be used as the basis of an inferential procedure for deciding whether normality is plausible.


[Figure 2.14 Quantile plots (a), (b), and (c) that are inconsistent with an underlying normal distribution]

Minitab will automatically obtain the z percentiles in response to an “NSCORE”
command, but it uses something a bit different from (i − .5)/n as a basis for this calcula-
tion. Minitab also has a normal plot command in its graphics menu; the resulting plot
has x on the horizontal axis and a nonlinear vertical axis constructed so that normal data
should plot close to a straight line.

Plots for Other Distributions


It is easy to assess the plausibility of a lognormal population or process distribution, be-
cause to say that x is lognormally distributed is to say that ln(x) has a normal distribution.
Thus one simply calculates ln(x(1)), . . . , ln(x(n)) and uses these quantities in place of
x(1), . . . , x(n) in a normal quantile plot.
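In R, for instance, the lognormal check is just a normal quantile plot of the logged data (the sample x below is hypothetical):

    x <- c(12.1, 5.3, 8.7, 20.4, 3.9, 15.2)   # hypothetical positive sample
    qqnorm(log(x), ylab = "ln(x)")            # straight pattern suggests lognormality
    qqline(log(x))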
For a Weibull distribution,

p = area to the left of η_p = 1 − e^[−(η_p/β)^α]

This implies that

ln(1 − p) = −(η_p/β)^α

Multiplying by −1 and taking logs again gives

ln[−ln(1 − p)] = α[ln(η_p) − ln(β)] = α ln(η_p) + γ    where γ = −α ln(β)

Thus there is a linear relation between the logarithm of Weibull quantiles and
ln[−ln(1 − p)]. This suggests that we calculate ln(x(1)), . . . , ln(x(n)) and then plot the
(ln[−ln(1 − p)], ln(x)) pairs. If the plot is reasonably straight, it is plausible that the
sample has come from some Weibull distribution.
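The required arithmetic can be scripted in a few lines; the helper below is a hypothetical R sketch (the function name is ours, not from the text) that plots the (ln[−ln(1 − p)], ln(x)) pairs for any positive sample:

    weibull_plot <- function(x) {
      x <- sort(x)
      n <- length(x)
      p <- (1:n - 0.5) / n                 # quantile levels (i - .5)/n
      plot(log(-log(1 - p)), log(x),
           xlab = "ln(-ln(1 - p))", ylab = "ln(x)")
    }

Applied to the 26 nanotube tensile strengths of Example 2.18 below, this function produces the pattern shown in Figure 2.15.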

Example 2.18 For many years it has been well established that the Weibull distribution is useful
in modeling the strength of fibers used in composite materials such as carbon
graphite, Kevlar, and glass. With the advent of nanotechnology, where materials can
be developed at minuscule levels, scientists have questioned whether the Weibull


distribution is applicable to model material strength even at the nanoscale. In the


article “Stochastic Strength of Nanotubes: An Appraisal of Available Data” (Com-
posites Sci. and Tech., 2005: 2380–2384) researchers reported the tensile strengths
of three different types of nanotubes and assessed whether the Weibull distribution
would serve as a reasonable model for each type.
The following represent the tensile strengths (in GPa) for 26 multiwall carbon
nanotubes produced by chemical vapor deposition; their average diameter is
roughly 97 nm. Note that the values of p_i = (i − .5)/26 are also given:

x(i): 17.4 22.3 23.7 30.0 44.2 49.3 52.7 54.8 62.1 66.2
p_i: 0.019 0.058 0.096 0.135 0.173 0.212 0.250 0.288 0.327 0.365
x(i): 84.9 90.1 90.3 91.1 99.5 101.6 108.5 109.5 119.1 127.0
p_i: 0.404 0.442 0.481 0.519 0.558 0.596 0.635 0.673 0.712 0.750
x(i): 132.9 140.8 141.0 175.0 231.8 259.7
p_i: 0.788 0.827 0.865 0.904 0.942 0.981
Figure 2.15 is a plot of the (ln[−ln(1 − p)], ln(x)) pairs. Although there is some wiggling,
especially in the lower part of the plot, the overall pattern is reasonably straight
and so the assumption of an underlying Weibull distribution for tensile strength for
this type of nanotube appears to be acceptable. The article also showed that the
Weibull distribution was a good fit in modeling tensile strength for the two other
nanotube types discussed.

[Figure 2.15 A Weibull plot of the nanotube tensile strength data; vertical axis ln(x) (3.0 to 5.5), horizontal axis ln(−ln(1 − p)) (−4 to 1)]


Most statistical computer packages make it easy to do the arithmetic necessary to


obtain the quantities to be plotted. In addition, the Minitab graphics menu has a Weibull
plot option, making it unnecessary for the user to do any arithmetic before obtaining the
plot. The x values are plotted directly on the horizontal axis, and the vertical axis is
constructed using a nonlinear scale so that data from a Weibull distribution should plot
close to a straight line.
Plots based on other distributions can also be constructed. Consult chapter refer-
ences and software packages for more information.
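As one further illustration, an exponential quantile plot (needed in Exercise 52 below) requires no special software: the pth quantile of the exponential distribution with λ = 1 is −ln(1 − p), so the ordered sample can simply be plotted against those values. A minimal R sketch with a hypothetical sample:

    x <- c(4.2, 9.6, 14.9, 28.3, 51.7)   # hypothetical failure times
    n <- length(x)
    p <- (1:n - 0.5) / n
    plot(-log(1 - p), sort(x),           # exponential(1) quantiles vs. ordered data
         xlab = "Exponential quantile", ylab = "x")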

Section 2.4 Exercises

45. The accompanying normal quantile plot was constructed from a sample of 30 readings on tension for mesh screens behind the surface of video display tubes used in computer monitors. Does it appear plausible that the tension distribution is normal?

[Figure for Exercise 45: normal quantile plot with vertical axis Tension (200 to 350) and horizontal axis Normal quantile (−2 to 2)]

46. The following are modulus of elasticity observations for cylinders given in the article cited in Example 1.2:

37.0 37.5 38.1 40.0 40.2 40.8 41.0
42.0 43.1 43.9 44.1 44.6 45.0 46.1
47.0 62.0 64.3 68.8 70.1 74.5

Use the quantiles for a sample of size 20 given in this section to construct a normal quantile plot, and comment on the plausibility of a normal population distribution.

47. A sample of 15 female collegiate golfers was selected, and the clubhead velocity (km/hr) of each golfer while swinging a driver was determined, resulting in the following data (“Hip Rotational Velocities During the Full Golf Swing,” J. of Sports Science and Medicine, 2009: 296–299):

69.0 69.7 72.7 80.3 81.0
85.0 86.0 86.3 86.7 87.7
89.3 90.7 91.0 92.5 93.0

The corresponding z percentiles are

−1.83 −1.28 −0.97 −0.73 −0.52
−0.34 −0.17 0.0 0.17 0.34
0.52 0.73 0.97 1.28 1.83

Construct a normal quantile plot and a dotplot. Is it plausible that the population distribution is normal?

48. The accompanying observations are precipitation values during March over a 30-year period in Minneapolis–St. Paul.

.77 1.20 3.00 1.62 2.81 2.48
1.74 .47 3.09 1.31 1.87 .96
.81 1.43 1.51 .32 1.18 1.89
1.20 3.37 2.10 .59 1.35 .90
1.95 2.20 .52 .81 4.75 2.05

a. Construct and interpret a normal quantile plot for this data set.
b. Calculate the square root of each value and then construct a quantile plot based on this transformed data. Does it seem plausible that the square root of precipitation is normally distributed?
c. Repeat part (b) after transforming by cube roots.

49. The article “A Probabilistic Model of Fracture in Concrete and Size Effects on Fracture Toughness” (Magazine of Concrete Res., 1996: 311–320) gives arguments for why fracture toughness in concrete specimens should have a Weibull distribution and presents several histograms of data that appear well fit by superimposed Weibull curves. Consider the following sample of size n = 18 observations on toughness for high-strength concrete (consistent with one of the histograms); values of p_i = (i − .5)/18 are also given:

Obs: .47 .58 .65 .69 .72 .74
p_i: .0278 .0833 .1389 .1944 .2500 .3056
Obs: .77 .79 .80 .81 .82 .84
p_i: .3611 .4167 .4722 .5278 .5833 .6389
Obs: .86 .89 .91 .95 1.01 1.04
p_i: .6944 .7500 .8056 .8611 .9167 .9722

Construct a Weibull quantile plot and comment.

50. In the article “Weibull Parameter of Oil-Immersed Transformer to Evaluate Insulation Reliability on Temporary Overvoltage” (IEEE Trans. on Dielectrics and Elec. Insul., 2010: 1863–1868), researchers investigated the reliability of oil-immersed transformers under various conditions. In one experiment, the researchers measured the breakdown time of the transformer oil gap under various oil flow velocities and exposure to temporary overvoltage. Consider the following breakdown time data (in s) from their experiment, where an oil flow of 16 cm/s and an overvoltage of 81 kV were applied.

7.2 10.0 18.0 25.0 36.0 38.0
46.0 63.0 71.0 76.0 92.0 95.0
104.0 152.0 198.0 226.0 235.0 247.0
361.0 392.0

Construct a Weibull plot and comment on the plausibility of breakdown time having a Weibull distribution.

51. The accompanying figures show (a) a normal quantile plot of the observations on cell interdivision time (IDT) given in Exercise 16 of Section 1.2 and (b) a normal quantile plot of the logarithms of the IDTs. What do these plots suggest about the distribution of cell interdivision time?

[Figure for Exercise 51: (a) normal quantile plot of IDT, vertical axis 10 to 70; (b) normal quantile plot of ln(IDT), vertical axis 2.5 to 4.5; horizontal axes Normal quantile (−2 to 2)]


52. A plot to assess the plausibility of an exponential population distribution can be based on quantiles of the exponential distribution having λ = 1 (i.e., the exponential distribution with density function f(x) = e^−x for x > 0). This is because λ, like σ for a normal distribution, is a scale parameter. Consider the following failure time observations (1000s of hours) resulting from accelerated life testing of 16 integrated circuit chips of a certain type:

82.8 11.6 359.5 502.5 307.8 179.7
242.0 26.5 244.8 304.3 379.1
212.6 229.9 558.9 366.7 204.6

Construct a quantile plot and comment on the plausibility of failure time having an exponential distribution.

53. The article “Families of Distributions for Hourly Median Power and Instantaneous Power of Received Radio Signals” (J. of Research for the National Bureau of Standards, 1963: 753–762) suggests the lognormal distribution for x = hourly median power (decibels) of received radio signals transmitted between two cities. Consider the following sample of hourly median power readings:

2.7 5.4 9.7 22.8 30.5 55.7 66.2 97.3 186.5 240.0

a. Is it plausible that these observations were sampled from a normal distribution?
b. Is it plausible that these observations were sampled from a lognormal distribution?

Supplementary Exercises
54. Anxiety disorders and symptoms can often be effectively treated with benzodiazepine medications. It is known that animals exposed to stress exhibit a decrease in benzodiazepine receptor binding in the frontal cortex. The paper “Decreased Benzodiazepine Receptor Binding in Prefrontal Cortex in Combat-Related Posttraumatic Stress Disorder” (American J. of Psychiatry, 2000: 1120–1126) described the first study of benzodiazepine receptor binding in individuals suffering from PTSD. The accompanying data on a receptor binding measure (adjusted distribution volume) was read from a graph in the paper:

PTSD: 10 20 25 28 31 35 37 38 38 39 39 42 46
Healthy: 23 39 40 41 43 47 51 58 63 66 67 69 72

a. Calculate and interpret the values of the mean, median, and standard deviation for each of the two samples.
b. Calculate a trimmed mean for each sample by deleting the smallest and largest observations. What is the trimming percentage? What effect does trimming have?
c. Determine the value of the interquartile range for each sample. Does either sample contain any outliers? Any extreme outliers?
d. Construct a comparative boxplot, and comment on interesting features.
e. Would you recommend estimating the difference between the true average binding measure of PTSD individuals and the true average measure for healthy individuals using a method based on assuming that each sample was selected from a normal population distribution? Explain your reasoning.

55. A sample of 77 individuals working at a particular office was selected, and the noise level (dBA) experienced by each one was determined, yielding the following data (“Acceptable Noise Levels for Construction Site Offices,” Building Serv. Engr. Res. and Tech., 2009: 87–94).

55.3 55.3 55.3 55.9 55.9 55.9
55.9 56.1 56.1 56.1 56.1 56.1
56.1 56.8 56.8 57.0 57.0 57.0
57.8 57.8 57.8 57.9 57.9 57.9
58.8 58.8 58.8 59.8 59.8 59.8
62.2 62.2 63.8 63.8 63.8 63.9
63.9 63.9 64.7 64.7 64.7 65.1
65.1 65.1 65.3 65.3 65.3 65.3
67.4 67.4 67.4 67.4 68.7 68.7
68.7 68.7 69.0 70.4 70.4 71.2
71.2 71.2 73.0 73.0 73.1 73.1
74.6 74.6 74.6 74.6 79.3 79.3
79.3 79.3 83.0 83.0 83.0


Use various techniques discussed in this chapter to organize, summarize, and describe the data.

56. Three different C2F6 flow rates (SCCM) were considered in an experiment to investigate the effect of flow rate on the uniformity (%) of the etch on a silicon wafer used in the manufacture of integrated circuits, resulting in the following data:

125: 2.6 2.7 3.0 3.2 3.8 4.6
160: 3.6 4.2 4.2 4.6 4.9 5.0
200: 2.9 3.4 3.5 4.1 4.6 5.1

Compare and contrast the uniformity observations resulting from these three different flow rates.

57. Consider a sample x1, . . . , xn, and let x̄ₖ and s²ₖ denote the sample mean and variance, respectively, of the first k observations.
a. Show that

ks²ₖ₊₁ = (k − 1)s²ₖ + [k/(k + 1)](xₖ₊₁ − x̄ₖ)²

b. Suppose that a sample of 15 strands of drapery yarn has resulted in a sample mean thread elongation of 12.58 mm and a sample standard deviation of .512 mm. A 16th strand results in an elongation value of 11.8. What are the values of the sample mean and sample standard deviation for all 16 elongation observations?

58. In 1997 a woman sued a computer keyboard manufacturer, charging that her repetitive stress injuries were caused by the keyboard (Genessy v. Digital Equipment Corp.). The jury awarded about $3.5 million for pain and suffering, but the court then set aside that award as being unreasonable compensation. In making this determination, the court identified a “normalative” group of 27 similar cases and specified a reasonable award as one within 2 standard deviations of the mean of the awards in the 27 cases. The 27 awards were (in $1000s) 37, 60, 75, 115, 135, 140, 149, 150, 238, 290, 340, 410, 600, 750, 750, 750, 1050, 1100, 1139, 1150, 1200, 1200, 1250, 1576, 1700, 1825, and 2000, from which Σxᵢ = 20,179 and Σxᵢ² = 24,657,511. What is the maximum possible amount that could be awarded under the 2 standard deviation rule?

59. A deficiency of the trace element selenium in the diet can negatively affect growth, immunity, muscle and neuromuscular function, and fertility. The introduction of selenium supplements to dairy cows is justified when pastures have low selenium levels. Authors of the paper “Effects of Short-Term Supplementation with Selenised Yeast on Milk Production and Composition of Lactating Cows” (Australian J. of Dairy Tech., 2004: 199–203) supplied the following data on milk selenium concentration (mg/L) for a sample of cows given a selenium supplement and a control sample given no supplement, both initially and after a nine-day period.

Obs  Init Se  Init Cont  Final Se  Final Cont
 1     11.4      9.1      138.3       9.3
 2      9.6      8.7      104.0       8.8
 3     10.1      9.7       96.4       8.8
 4      8.5     10.8       89.0      10.1
 5     10.3     10.9       88.0       9.6
 6     10.6     10.6      103.8       8.6
 7     11.8     10.1      147.3      10.4
 8      9.8     12.3       97.1      12.4
 9     10.9      8.8      172.6       9.3
10     10.3     10.4      146.3       9.5
11     10.2     10.9       99.0       8.4
12     11.4     10.4      122.3       8.7
13      9.2     11.6      103.0      12.5
14     10.6     10.9      117.8       9.1
15     10.8                121.5
16      8.2                 93.0

a. Do the initial Se concentrations for the supplement and control samples appear to be similar? Use various techniques from this chapter to summarize the data and answer the question posed.
b. Again use methods from this chapter to summarize the data and then describe how the final Se concentration values in the treatment group differ from those in the control group.

60. An inequality developed by the Russian mathematician Chebyshev gives information about the percentage of values in any sample or distribution that fall within a specified number of standard deviations of the mean. Let k denote any number satisfying


k ≥ 1. Then at least 100(1 − 1/k²)% of the values are within k standard deviations of the mean.
a. What does Chebyshev’s inequality say about the percentage of values that are within 2 standard deviations of the mean? Within 3 standard deviations of the mean? Within 5 standard deviations? Within 10 standard deviations?
b. What does Chebyshev’s inequality say about the percentage of values that are more than 2 standard deviations from the mean? More than 3 standard deviations from the mean?
c. Suppose the distribution of slot width on a forging has a mean value of 1.000 in. and a standard deviation of .0025 in. What percentage of such forgings have a slot width that is between .995 in. and 1.005 in.? If specifications are 1.000 ± .005 in., what percentage of slot widths will conform to specifications?
d. Refer to part (c). What percentage of such forgings will have a slot width that is outside the interval from .995 in. to 1.005 in. (i.e., either less than .995 or greater than 1.005)? What can be said about the percentage of widths that exceed 1.005 in.?

61. Reconsider Chebyshev’s inequality as stated in the previous exercise.
a. Compare what the inequality says about the percentage within 1, 2, or 3 standard deviations of the mean value to the corresponding percentages given by the empirical rule.
b. An exponential distribution with parameter λ has both mean value and standard deviation equal to 1/λ. If component lifetime is exponentially distributed with a mean value of 100 hr, what percentage of these components have lifetimes within 1 standard deviation of the mean lifetime? Within 2 standard deviations? Within 3 standard deviations? Compare these to the percentages given by Chebyshev’s inequality.
c. Why do you think the percentages from Chebyshev’s inequality so badly understate the actual percentages in the situations of parts (a) and (b)?

62. Consider a sample x1, . . . , xn with mean x̄ and standard deviation s, and let zᵢ = (xᵢ − x̄)/s. What are the mean and standard deviation of the zᵢ’s?

63. The accompanying observations are carbon monoxide levels (ppm) in air samples obtained from a certain region:

9.3 10.7 8.5 9.6 12.2 16.6 9.2 10.5
7.9 13.2 11.0 8.8 13.7 12.1 9.8

a. Calculate a trimmed mean by trimming the smallest and largest observations, and give the corresponding trimming percentage. Do the same with the two smallest and two largest values trimmed.
b. Using the results of part (a), how would you calculate a trimmed mean with a 10% trimming percentage?
c. Suppose there had been 16 sample observations. How would you go about calculating a 10% trimmed mean?

64. Specimens of three different types of rope wire were selected, and the fatigue limit (MPa) was determined for each specimen, resulting in the accompanying data:

Type 1: 350 350 350 358 370 370 370 371
        371 372 372 384 391 391 392
Type 2: 350 354 359 363 365 368 369 371
        373 374 376 380 383 388 392
Type 3: 350 361 362 364 364 365 366 371
        377 377 377 379 380 380 392

a. Construct a comparative boxplot, and comment on similarities and differences.
b. Construct a comparative dotplot (a dotplot for each sample with a common scale). Comment on similarities and differences.
c. Does the comparative boxplot of part (a) give an informative assessment of similarities and differences? Explain your reasoning.

65. The three measures of center introduced in this chapter are the mean, median, and trimmed mean. Two additional measures of center that are occasionally used are the midrange, which is the average of the smallest and largest observations, and the midhinge, which is the average of the two quartiles. Which of these five measures of center are resistant to the effects of outliers and which are not? Explain your reasoning.


66. The capacitance (nf) of multilayer ceramic capacitors supplied by a certain vendor is normally distributed with mean value 98 and standard deviation 2. Specifications for these capacitors are 100 ± 5 nf.
a. What proportion of these capacitors will conform to specification?
b. Suppose that these capacitors are shipped in batches of size 20. Let x denote the number of capacitors in a batch that conform to specification. Provided that capacitances of successive capacitors are independent of one another, what kind of distribution does x have? In the long run, in what proportion of batches will at least 19 of the 20 capacitors conform to specifications? Hint: Think of a capacitor that conforms to specification as a “success,” so x is the number of successes in the batch.

67. Aortic stenosis refers to a narrowing of the aortic valve in the heart. The paper “Correlation Analysis of Stenotic Aortic Valve Flow Patterns Using Phase Contrast MRI” (Annals of Biomed. Engr., 2005: 878–887) gave the following data on aortic root diameter (cm) and gender for a sample of patients having various degrees of aortic stenosis:

M: 3.7 3.4 3.7 4.0 3.9 3.8 3.4 3.6 3.1 4.0 3.4 3.8 3.5
F: 3.8 2.6 3.2 3.0 4.3 3.5 3.1 3.1 3.2 3.0

a. Compare and contrast the diameter observations for the two genders.
b. Calculate a 10% trimmed mean for each of the two samples and compare to other measures of center (for the male sample, the interpolation method mentioned in Section 2.1 must be used).

68. A study carried out to investigate the distribution of total braking time (reaction time plus accelerator-to-brake movement time, in ms) during real driving conditions at 60 km/hr gave the following summary information on the distribution of times (“A Field Study on Braking Responses during Driving,” Ergonomics, 1995: 1903–1910):

mean = 535    median = 500    mode = 500
sd = 96    minimum = 220    maximum = 925
5th percentile = 400    10th percentile = 430
90th percentile = 640    95th percentile = 720

What can you conclude about the shape of a histogram of this data? Explain your reasoning.

69. Let x denote the maximum physical stress that a unit of a certain product encounters during its lifetime. Suppose that x is normally distributed with 99th percentile = 5.33 and 10th percentile = 1.72 (suggested in the article “A Formulation of Product Reliability through Environmental Stress Testing and Screening,” J. of the Institute of Envir. Sciences, 1994: 50–56; the unit for x was unspecified). What proportion of these units have maximum stress values exceeding 5? What proportion have maximum stress values less than 2?

70. The indoor thermal climate is an important characteristic affecting the health and productivity of workers in buildings. The paper “Adaptive Comfort Temperature Model of Air-Conditioned Buildings in Hong Kong” (Building and Environment, 2003: 837–852) reported data on a number of building characteristics measured during the summer and also during the winter. Consider the accompanying values of relative humidity.

Summer: 57.18 58.11 56.53 58.61 57.40 62.64
        61.72 57.26 53.43 53.71 58.64 45.12
        47.52 54.47 55.88 51.08 53.69 54.37
        54.36 61.01 52.66 56.20 48.40 46.99
        50.63 52.40 52.20 55.95 53.77
Winter: 52.20 41.83 55.63 54.18 54.56 56.20
        58.09 56.70 57.57 58.70 56.15 59.77
        61.58 61.81 62.48 63.31 55.57 62.25
        57.40 55.07 62.52 52.80 57.20 59.27
        54.98 58.13

Use methods from this and the previous chapter to describe, summarize, compare, and contrast the summer and winter relative humidity data.

Bibliography

Please see the bibliography for Chapter 1.


3
Bivariate and Multivariate Data and Distributions
3.1 Scatterplots
3.2 Correlation
3.3 Fitting a Line to Bivariate Data
3.4 Nonlinear Relationships
3.5 Using More Than One Predictor
3.6 Joint Distributions

Introduction
Now that we have acquired some facility for working with univariate data and
distributions, it’s time to expand our horizons. A multivariate data set consists of
observations made simultaneously on two or more variables. One important special
case is that of bivariate data, in which observations on only two variables, x and y,
are available. In Section 3.1, we introduce the scatterplot, a picture for gaining insight
into the nature of any relationship between x and y.

Next, we discuss the correlation coefficient, which is a measure of how
strongly two variables are related. In many investigations, one primary objective
is to predict y from the value of x—for example, to predict the yield from a chemical
reaction at a particular reaction temperature. If the scatterplot shows a linear
pattern, the natural strategy is to fit a straight line to the data and use it as the
basis for predictions, as we do in Section 3.3. If a scatterplot shows curvature,
fitting a nonlinear function, such as a quadratic or an exponential function, is
appropriate; we show how this can be done in Section 3.4. Multiple regression
functions, in which y is related to two or more predictor variables, are the
subject of Section 3.5. Finally, Section 3.6 introduces bivariate and multivariate


distributions for population or process variables. In Chapter 11, we return to


this type of data and describe how formal conclusions about relationships can
be drawn by using methods from statistical inference.

3.1 Scatterplots
A multivariate data set consists of measurements or observations on each of two or
more variables. One important special case, bivariate data, involves only two vari­
ables, x and y. For example, x might be the distance from a particular highway and y,
the lead content of the soil at that distance. When both x and y are numerical variables,
each observation consists of a pair of numbers, such as (14, 5.2) or (27.63, 18.9). The
first number in a pair is the value of x and the second number is the value of y.
An unorganized list of such pairs yields little information about the distribution of
either the x values or the y values separately, and even less information about whether
the two variables are related to one another. In Chapter 1, we saw how pictures could
help make sense of univariate data. The most important picture based on bivariate
numerical data is a scatterplot. Each observation (pair of numbers) is represented by
a point on a rectangular coordinate system, as shown in Figure 3.1(a). The horizontal
axis is identified with values of x and is scaled so that any x value can be easily located.
Similarly, the vertical or y axis is marked for easy location of y values. The point cor­
responding to any particular (x, y) pair is placed where a vertical line from the value on
the x axis intersects a horizontal line from the value on the y axis. Figure 3.1(b) shows
the point representing the observation (4.5, 15); it is above 4.5 on the horizontal axis
and to the right of 15 on the vertical axis.

[Figure 3.1 Constructing a scatterplot: (a) rectangular coordinate system for a scatterplot of bivariate data; (b) the point corresponding to the observation (4.5, 15)]
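In R, the whole construction is a single call to plot, with one point per (x, y) pair (the values below are made up purely for illustration):

    x <- c(1.2, 2.0, 3.1, 4.5, 4.8)   # made-up x values
    y <- c(8, 14, 19, 15, 27)         # made-up y values
    plot(x, y)                        # one point per (x, y) pair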

Example 3.1 Visual and musculoskeletal problems associated with the use of visual display
terminals (VDTs) have become rather common in recent years. Some research­
ers have focused on vertical gaze direction as a source of eye strain and irritation.
This direction is known to be closely related to ocular surface area (OSA), so a


method of measuring OSA is needed. The accompanying representative data on


y = OSA (cm²) and x = width of the palpebral fissure (i.e., the horizontal width
of the eye opening, in cm) is from the article “Analysis of Ocular Surface Area for
Comfortable VDT Workstation Layout” (Ergonomics, 1996: 877–884). The order in
which observations were obtained was not given, so for convenience they are listed
in increasing order of x values.
Obs: 1 2 3 4 5 6 7 8 9 10
x: .40 .42 .48 .51 .57 .60 .70 .75 .75 .78
y: 1.02 1.21 .88 .98 1.52 1.83 1.50 1.80 1.74 1.63
Obs: 11 12 13 14 15 16 17 18 19 20
x: .84 .95 .99 1.03 1.12 1.15 1.20 1.25 1.25 1.28
y: 2.00 2.80 2.48 2.47 3.05 3.18 3.76 3.68 3.82 3.21
Obs: 21 22 23 24 25 26 27 28 29 30
x: 1.30 1.34 1.37 1.40 1.43 1.46 1.49 1.55 1.58 1.60
y: 4.27 3.12 3.99 3.75 4.10 4.18 3.77 4.34 4.21 4.92
Thus (x1, y1) = (.40, 1.02), (x5, y5) = (.57, 1.52), and so on. A Minitab scatterplot is
shown in Figure 3.2; we used an option that produced a dotplot of both the x values
and y values individually along the right and top margins of the plot, which makes it
easier to visualize the distributions of the individual variables (histograms or boxplots
are alternative options).

[Figure 3.2 Scatterplot from Minitab for the data from Example 3.1, along with dotplots of x and y values; horizontal axis Palwidth (0.4 to 1.6), vertical axis OSA]


Here are some things to notice about the data and plot:

• Several observations have identical x values yet different y values (for example, x8 = x9 = .75, but y8 = 1.80 and y9 = 1.74). Thus the value of y is not determined solely by x but also by various other factors.
• There is a strong tendency for y to increase as x increases. That is, larger values of OSA tend to be associated with larger values of fissure width—a positive relationship between the variables.
• It appears that the value of y could be predicted from x by finding a line that is reasonably close to the points in the plot (the authors of the cited article superimposed such a line on their plot). In other words, there is evidence of a substantial (though not perfect) linear relationship between the two variables.

The horizontal and vertical axes in the scatterplot of Figure 3.2 intersect at the
point (0, 0). In many data sets, the values of x or y or the values of both variables differ
considerably from zero relative to the range(s) of the values. For example, a study of
how air conditioner efficiency is related to maximum daily outdoor temperature might
involve observations for temperatures ranging from 80°F to 100°F. When this is the
case, a more informative plot would show the appropriately labeled axes intersecting at
some point other than (0, 0).
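In R this choice is controlled through the axis limits; the sketch below, with made-up data in the spirit of the air conditioner illustration, contrasts axes scaled to the data with axes forced to meet near (0, 0):

    x <- c(80, 83, 87, 91, 95, 99)         # made-up maximum daily temperatures
    y <- c(7.1, 6.8, 6.4, 6.0, 5.5, 5.1)   # made-up efficiency ratings
    plot(x, y)                             # axes scaled to the data
    plot(x, y, xlim = c(0, 100), ylim = c(0, 8))   # axes meeting near (0, 0)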

Example 3.2 Arsenic is found in many ground waters and some surface waters. Recent re­
search on health effects has prompted the Environmental Protection Agency
to reduce allowable arsenic levels in drinking water; as a result, many water
systems are no longer compliant with standards. This has spurred interest in
the development of methods to remove arsenic. The accompanying data on
x = pH and y = arsenic removed (%) by a particular process was read from a
scatterplot in the article “Optimizing Arsenic Removal During Iron Removal:
Theoretical and Practical Considerations” (J. of Water Supply Res. and Tech.,
2005: 545–560):

x: 7.01 7.11 7.12 7.24 7.94 7.94 8.04 8.05 8.07
y: 60 67 66 52 50 45 52 48 40

x: 8.90 8.94 8.95 8.97 8.98 9.85 9.86 9.86 9.87
y: 23 20 40 31 26 9 22 13 7

Figure 3.3 shows two Minitab scatterplots of this data. In Figure 3.3(a), the software
selected the scale for both axes. We obtained Figure 3.3(b) by specifying scaling for
the axes so that they would intersect at roughly the point (0, 0). The second plot
is much more crowded than the first one; such crowding can make it difficult to
ascertain the general nature of any relationship. For example, curvature can be over­
looked in a crowded plot.


[Figure 3.3 Minitab scatterplots of the data in Example 3.2: (a) axes scaled by the software (pH from 7.0 to 10.0); (b) axes specified to intersect near (0, 0) (pH from 0 to 10); vertical axes % removal (0 to 70)]

Large values of arsenic removal tend to be associated with low pH, a negative or in­
verse relationship. Furthermore, the two variables appear to be at least approximately
linearly related, although the points in the plot would spread out somewhat about
any superimposed straight line (such a line appeared in the plot in the cited article).

Section 3.1 Exercises


1. In the article “Analysis of the Thermal Properties of 2. The article “Case Adaptation Method of Case-Based
Air-Conditioning-Type Building Materials” (Solar Reasoning for Construction Cost Estimation in Ko­
Energy, 2012: 2967–2974), researchers investigated rea” (J. Constr. Engr. Mgmt., 2012: 43–52) provided
thermal properties of building materials that are data on military barrack projects undertaken by the
used across a variety of climate regions. One prop­ Korean Ministry of National Defense from 2004 to
erty of interest was solar absorptance, a measure of 2008. Two variables of interest were the floor area of
an object’s ability to absorb solar radiation. To reduce a barrack and the corresponding cost (in $US). The
building energy consumption, it would be desirable corresponding data is given here:
for the building material to have higher solar absorp­
Floor Area: Cost:
Unless otherwise noted, all content on this page is © Cengage Learning.

tance in colder climates and lower solar absorptance


382 418,930
in warmer climates. The following data (read from a
571 609,386
graph) shows solar absorptance levels under different 618 755,489
temperature conditions for a building material called 726 660,527
G17S, which changes color depending on tempera­ 802 864,438
ture, thereby allowing for variable absorptance. 959 1,003,495
Temperature (in C): 2 9 20 28 39 1066 895,947
1306 1,461,549
Solar Absorptance: .81 .78 .69 .65 .48
1873 1,899,494
Create a scatterplot for this data. How would you 2460 2,331,632
characterize the relationship between these two vari­ 3134 2,833,203
ables? Is the desired inverse relationship between tem­ 4989 4,750,468
perature and absorptance evident for this material? 6918 5,331,390


a. Construct stem-and-leaf displays of both floor area and cost. Comment on any interesting features.
b. Do the values of cost appear to be perfectly linearly related to the floor area values?
c. Construct a scatterplot of the data. Does it appear that cost could be accurately predicted by the value of floor area? Explain your reasoning.

3. In the article referenced in Exercise 2, the relationship between the number of beds in a barrack and the cost of the building was also investigated.

Number of Beds   Cost
 22              418,930
 40              609,386
 40              755,489
 38              660,527
 24              864,438
 54              1,003,495
 59              895,947
 98              1,461,549
106              1,899,494
142              2,331,632
190              2,833,203
 68              4,750,468
392              5,331,390

Construct a scatterplot based on this data. What appears to be the nature of the relationship between these two variables? Do you notice anything peculiar in the graph?

4. Open water oil spills, such as the Deepwater Horizon spill of 2010, can wreak terrible consequences on the environment and be expensive to clean up. Many physical and biological methods have been developed to recover oil from water surfaces. In the article “Capacity of Straw for Repeated Binding of Crude Oil from Salt Water and Its Effect on Biodegradation” (J. Hazard. Toxic Radioact. Waste, 2012: 75–78), researchers examined how wheat straw could be used to extract crude oil from a water surface. An experiment was conducted in which crude oil (0 to 16.9 g) was added to 100 mL of saltwater in separate Petri dishes. Wheat straw (2 g) was then added to each dish and all dishes were shaken at 70 rpm overnight. The following data read from a graph is based on the amount of oil added (in g) and the corresponding amount of oil recovered (in g) from wheat straw.

Oil Added   Oil Recovered
 1.0         0.610
 1.5         0.840
 2.1         1.512
 2.8         1.792
 3.6         2.952
 4.5         2.880
 5.5         4.400
 6.6         5.346
 7.8         6.396
 9.1         7.189
10.5         8.085
12.0         9.840
13.6        11.696
15.2        13.224
16.9        14.365

a. For each observation, determine the percentage of oil recovery by wheat straw. Is this percentage relatively constant across all observations? Was the percentage higher at certain added oil levels over others?
b. Do the values of the recovered oil appear to be perfectly linearly related to the added oil values? Why or why not?
c. Construct a scatterplot of the data. Does it appear that recovered oil could be accurately predicted by the value of added oil? Explain your reasoning.

5. The article “Objective Measurement of the Stretch-ability of Mozzarella Cheese” (J. of Texture Studies, 1992: 185–194) reported on an experiment to investigate how the behavior of mozzarella cheese varied with temperature. Consider the accompanying data on x = temperature and y = elongation (%) at failure of the cheese. Note: The researchers were Italian and used real mozzarella cheese, not the poor cousin widely available in the United States.

x: 59 63 68 72 74 78 83
y: 118 182 247 208 197 135 132

a. Construct a scatterplot in which the axes intersect at (0, 0). Mark 0, 20, 40, 60, 80, and


100 on the horizontal axis and 0, 50, 100, 150, 200, and 250 on the vertical axis.
b. Construct a scatterplot in which the axes intersect at (55, 100), as was done in the cited article. Does this plot seem preferable to the one in part (a)? Explain your reasoning.
c. What do the plots of parts (a) and (b) suggest about the nature of the relationship between the two variables?

6. Calcium phosphate cement is gaining increasing attention for use in bone repair applications. The article “Short-Fibre Reinforcement of Calcium Phosphate Bone Cement” (J. of Engr. in Med., 2007: 203–211) reported on a study in which polypropylene fibers were used in an attempt to improve fracture behavior. The following data on x = fiber weight (%) and y = compressive strength (MPa) was provided by the article’s authors.

x: 0.00 0.00 0.00 0.00 0.00 1.25
y: 9.94 11.67 11.00 13.44 9.20 9.92
x: 1.25 1.25 1.25 2.50 2.50 2.50
y: 9.79 10.99 11.32 12.29 8.69 9.91
x: 2.50 2.50 5.00 5.00 5.00 5.00
y: 10.45 10.25 7.89 7.61 8.07 9.04
x: 7.50 7.50 7.50 7.50 10.00 10.00
y: 6.63 6.43 7.03 7.63 7.35 6.94
x: 10.00 10.00
y: 7.02 7.67

Construct a scatterplot of the data. How would you describe the nature of the relationship between the two variables?

7. In surface water hydrology, a common problem is the estimation of long-term annual yield from ungauged watersheds. In the article “Generalized Mediterranean Annual Water Yield Model: Grunsky’s Equation and Long-Term Average Temperature” (J. Hydrol. Engr., 2011: 874–879), researchers propose a generalized water yield model for watersheds. One important watershed-specific component of the model is α, a coefficient characterizing the watershed’s annual water yield response to annual precipitation. The article provided the following data from 16 California coastal watersheds for α (in m⁻¹) and average long-term annual temperature (T in °C):

T: 8.51 8.69 9.01 9.50 10.00 10.60 11.00 11.60
α: .40 .42 .40 .43 .40 .38 .40 .30
T: 11.60 12.60 12.60 13.60 14.20 15.30 17.90 17.90
α: .41 .27 .28 .19 .22 .19 .13 .09

Construct a scatterplot of the data. How would you describe the nature of the relationship between the two variables?

8. Researchers considered how the construction cost of highway resurfacing projects in Kentucky were affected by that state’s asphalt price index (API) and diesel price index (DPI) among other factors. From about the mid-1990s to 2010, Kentucky’s annual average API and DPI were found to be closely related to the annual average crude oil price. Based on this, the authors suggested that crude oil price could be used to predict API and DPI (“Prices of Highway Resurfacing Projects in Economic Downturn: Lessons Learned and Strategies Forward,” J. Mgmnt. Engr., 2012: 391–397).

Consider the following monthly API and statewide crude oil index (COI) values for California during 2010−11, obtained from the California Department of Transportation.

COI     API     COI     API
385.1   415.1   474.3   477.1
408.0   377.0   483.4   488.9
400.8   402.8   504.9   586.3
426.0   427.3   616.1   634.7
437.0   436.9   656.6   667.5
384.0   360.8   606.0   592.2
393.3   372.3   579.0   565.9
402.9   417.2   588.4   570.5
404.2   376.5   536.8   589.7
399.5   424.1   585.9   559.8
438.9   432.2   592.5   637.0
447.8   450.6   650.0   625.0

Construct a scatterplot of the data. How would you describe the nature of the relationship between the two variables? Does it seem to be the case that COI and API are closely related?


3.2 Correlation 
A scatterplot of bivariate numerical data gives a visual impression of how strongly x values
and y values are related. However, to make precise statements and draw reliable conclu­
sions from data, we must go beyond pictures. A correlation coefficient (from co-relation)
is a quantitative assessment of the strength of relationship between x and y values in a set of
(x, y) pairs. In this section, we introduce the most frequently used correlation coefficient.
Figure 3.4 displays scatterplots that indicate different types of relationships between
the x and y values. The plot in Figure 3.4(a) suggests a very strong positive relationship
between x and y, that is, a strong tendency for y to increase as x increases. Figure 3.4(b)
gives evidence of a substantial negative relationship: As x increases, there is a tendency for y to decrease (as would probably be the case for x = amount of time per week that a high school student spends watching television and y = amount of time the student spends studying). The plot of Figure 3.4(c) indicates no strong relationship between the two variables; there is no tendency for y to either increase or decrease as x increases.
Finally, as illustrated in Figure 3.4(d), a scatterplot can show a strong positive (or
negative) relationship through a pattern that is curved rather than linear in appearance.

Figure 3.4 Scatterplots illustrating various types of relationships: (a) positive relationship, linear pattern; (b) negative relationship, linear pattern; (c) no relationship or pattern; (d) positive relationship, curved pattern

Pearson’s Sample Correlation Coefficient


Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ denote a sample of (x, y) pairs. Consider subtracting $\bar{x}$ from each x value to obtain the x deviations, $x_1 - \bar{x}, \ldots, x_n - \bar{x}$, and also subtracting $\bar{y}$ from each y value to give $y_1 - \bar{y}, \ldots, y_n - \bar{y}$. Then multiply each x deviation by the corresponding y deviation to obtain products of deviations of the form $(x - \bar{x})(y - \bar{y})$.

The scatterplot in Figure 3.5(a) indicates a substantial positive relationship. A vertical line through $\bar{x}$ and a horizontal line through $\bar{y}$ divide the plot into four regions. In region I, both x and y exceed their mean values, so $x - \bar{x}$ and $y - \bar{y}$ are both positive numbers. It then follows that $(x - \bar{x})(y - \bar{y})$ is positive. The product of deviations is also positive for any point in region III, because both deviations are negative and multiplying two negative numbers gives a positive number. In each of the other two regions, one deviation is positive and the other is negative, so $(x - \bar{x})(y - \bar{y})$ is negative. Because almost all points lie in regions I and III, almost all products of deviations are positive. Thus the sum of products, $\sum (x_i - \bar{x})(y_i - \bar{y})$, will be a large positive number.


Figure 3.5 Subdividing a scatterplot according to the signs of $x - \bar{x}$ and $y - \bar{y}$: (a) a positive relation; (b) a negative relation; (c) no strong relation

Similar reasoning for the data displayed in Figure 3.5(b), which exhibits a strong negative relationship, implies that $\sum (x_i - \bar{x})(y_i - \bar{y})$ will be a large negative number. When there is no evidence of a strong relationship, as in Figure 3.5(c), positive and negative products of deviations tend to counteract one another, giving a value of the sum that is close to zero. In summary, $\sum (x_i - \bar{x})(y_i - \bar{y})$ seems to be a reasonable measure of the degree of association between the x and y values; it will be a large positive number, a large negative number, or a number close to zero according to whether there is a strong positive, a strong negative, or no strong relationship.


Unfortunately, our proposal has a serious deficiency: Its value depends on the choice of unit of measurement for both x and y. Suppose, for example, that x is height. Each x value expressed in inches will be 12 times the corresponding value expressed in feet, and the same will then be true of $\bar{x}$. It follows that the value of $\sum (x_i - \bar{x})(y_i - \bar{y})$ when the x unit is inches will be 12 times what it is when the unit is feet. A measure of the inherent strength of the relationship should give the same value whatever the units for the variables; otherwise our impressions may be distorted by the choice of units.

A straightforward modification of our initial proposal leads to the most popular measure of association, one that is free of the defect just alluded to and has other attractive properties.

DEFINITION Pearson’s sample correlation coefficient r is given by

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}} = \frac{S_{xy}}{\sqrt{S_{xx}}\,\sqrt{S_{yy}}}$$

Computing formulas for the three summation quantities are

$$S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} \qquad S_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n} \qquad S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$

Use of the computing formulas makes all the subtraction needed to obtain the deviations unnecessary. Instead, the following five summary quantities are needed: $\sum x_i$, $\sum y_i$, $\sum x_i^2$, $\sum y_i^2$, $\sum x_i y_i$. The following example shows how a tabular format facilitates the calculations (we’ll get to the issue of interpretation in a moment).

Example 3.3 The catch basin in a storm-sewer system is the interface between surface runoff and
the sewer. A catch-basin insert is a device for retrofitting catch basins to improve
their pollutant removal properties. The article “An Evaluation of the Urban Storm­
water Pollutant Removal Efficiency of Catch Basin Inserts” (Water Envir. Res., 2005:
500–510) reported on tests of various inserts under controlled conditions for which
inflow is close to what can be expected in the field. Consider the following data, read
from a graph in the article, for one particular type of insert on x = amount filtered (1000s of liters) and y = % total suspended solids removed.
x: 23 45 68 91 114 136 159 182 205 228
y: 53.3 26.9 54.8 33.8 29.9 8.2 17.2 12.2 3.2 11.1


The accompanying table contains five columns for the x, y, x², y², and xy values, respectively. The sum of each column is given at the bottom of the table.

   x      y        x²         y²         xy
   23    53.3       529     2840.89     1225.9
   45    26.9      2025      723.61     1210.5
   68    54.8      4624     3003.04     3726.4
   91    33.8      8281     1142.44     3075.8
  114    29.9     12996      894.01     3408.6
  136     8.2     18496       67.24     1115.2
  159    17.2     25281      295.84     2734.8
  182    12.2     33124      148.84     2220.4
  205     3.2     42025       10.24      656.0
  228    11.1     51984      123.21     2530.8
 1251   250.6   199,365     9249.36   21,904.4
 Σxᵢ     Σyᵢ      Σxᵢ²        Σyᵢ²       Σxᵢyᵢ

Then

$$S_{xx} = 199{,}365 - \frac{(1251)^2}{10} = 42{,}865$$

$$S_{yy} = 9249.36 - \frac{(250.6)^2}{10} = 2969.3$$

$$S_{xy} = 21{,}904.4 - \frac{(1251)(250.6)}{10} = -9446$$

from which

$$r = \frac{-9446}{\sqrt{42{,}865}\,\sqrt{2969.3}} = -.837$$
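These computations are easy to script. The following is a minimal sketch (our addition, not part of the original text; it assumes only base Python) that reproduces the Example 3.3 calculation from the five summary quantities:

    # Check the Example 3.3 computations: r from the five summary quantities
    x = [23, 45, 68, 91, 114, 136, 159, 182, 205, 228]
    y = [53.3, 26.9, 54.8, 33.8, 29.9, 8.2, 17.2, 12.2, 3.2, 11.1]

    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v**2 for v in x)
    sum_y2 = sum(v**2 for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))

    Sxx = sum_x2 - sum_x**2 / n          # about 42,865
    Syy = sum_y2 - sum_y**2 / n          # about 2969.3
    Sxy = sum_xy - sum_x * sum_y / n     # about -9446

    r = Sxy / (Sxx**0.5 * Syy**0.5)
    print(round(r, 3))                   # -0.837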

Properties and Interpretation of r


1. The value of r does not depend on the unit of measurement for either variable. If,
for example, x is height, the factor of 12 that appears in the numerator when changing
from feet to inches will also appear in the denominator, so the two will cancel and leave
r unchanged. The same value of r results from height expressed in inches, meters, or
miles. If y is temperature, expressing values in °F, °C, or °K will give the same value of r.
The correlation coefficient measures the inherent strength of relationship between two
numerical variables.
2. The value of r does not depend on which of the two variables is labeled x. Thus if we had let x = % removed and y = amount filtered in Example 3.3, the same value, r = −.837, would have resulted.
3. The value of r is between −1 and +1. A value near the upper limit, +1, is indicative of a substantial positive relationship, whereas an r close to the lower limit, −1, suggests a prominent negative relationship. Figure 3.6 shows a useful informal way to describe the


strength of relationship based on r. It may seem surprising that a value of r as extreme as −.5 or .5 should be in the “weak” category; an explanation for this is given later in the chapter.

  Strong      Moderate       Weak       Moderate      Strong
 −1       −.8          −.5         0          .5          .8        1

Figure 3.6 Describing the strength of relationship

4. r = +1 only when all the points in a scatterplot of the data lie exactly on a straight line that slopes upward. Similarly, r = −1 only when all the points lie exactly on a downward-sloping line. Only when there is a perfect linear relationship between x and y in the sample will r take on one of its two possible extreme values.
5. The value of r is a measure of the extent to which x and y are linearly related—that
is, the extent to which the points in the scatterplot fall close to a straight line. A value
of r close to zero does not rule out any strong relationship between x and y; there could
still be a strong relationship but one that is not linear.

Example 3.4 As far back as Leonardo da Vinci, height and wingspan (measured from fingertip to fingertip between outstretched hands) were known to be closely related. For the following actual measurements (in inches) from 16 students in a statistics class, notice how close the two values are.
Height: 59.0 72.0 67.0 63.5 68.0 66.0 71.0 69.0
Wingspan: 57.5 70.5 69.0 63.5 71.0 67.0 71.5 68.5
Height: 73.0 69.0 69.5 72.0 73.5 73.0 74.0 70.0
Wingspan: 74.0 69.5 71.0 71.5 75.0 75.5 74.5 73.0

The scatterplot in Figure 3.7 shows an approximately linear shape, and the point
cloud is roughly elliptical. The correlation is computed to be 0.955. If the measure­
ments were converted to centimeters, the correlation would remain unchanged.

Figure 3.7 Wingspan plotted against height
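Property 1 (unit invariance) is easy to confirm numerically. A minimal sketch (ours, not the authors’; base Python only) computes r for the height–wingspan data in inches and again after rescaling to centimeters:

    # r is unchanged by a change of units (inches -> centimeters)
    height = [59.0, 72.0, 67.0, 63.5, 68.0, 66.0, 71.0, 69.0,
              73.0, 69.0, 69.5, 72.0, 73.5, 73.0, 74.0, 70.0]
    wingspan = [57.5, 70.5, 69.0, 63.5, 71.0, 67.0, 71.5, 68.5,
                74.0, 69.5, 71.0, 71.5, 75.0, 75.5, 74.5, 73.0]

    def corr(x, y):
        n = len(x)
        xbar, ybar = sum(x) / n, sum(y) / n
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        sxx = sum((xi - xbar) ** 2 for xi in x)
        syy = sum((yi - ybar) ** 2 for yi in y)
        return sxy / (sxx ** 0.5 * syy ** 0.5)

    print(round(corr(height, wingspan), 3))            # 0.955
    cm = 2.54
    print(round(corr([h * cm for h in height],
                     [w * cm for w in wingspan]), 3))  # still 0.955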


Example 3.5 The article “Quantitative Estimation of Clay Mineralogy in Fine-Grained Soils”
(J. Geotech. Geoenviron. Engr., 2011: 997–1008) reported on various chemical prop­
erties of natural and artificial soils. Consider the accompanying data on the cation exchange capacity (CEC, in meq/100 g) and specific surface area (SSA, in m²/g) of 20 natural soils. A scatterplot appears in Figure 3.8.
CEC: 66 121 134 101 77 89 63 57 117 118
SSA: 175 324 460 288 205 210 295 161 314 265
CEC: 76 125 75 71 133 104 76 96 58 109
SSA: 236 355 240 133 431 306 132 269 158 303

Minitab gave the following output in response to a request for r:


correlation of SSA and CEC = 0.853

There is evidence of a moderate to strong positive relationship.

Figure 3.8 Scatterplot of the data from Example 3.5



Example 3.6 The accompanying data on y = glucose concentration (g/L) and x = fermentation time (days) for a particular brand of malt liquor was read from a scatterplot appearing
in the article “Improving Fermentation Productivity with Reverse Osmosis” (Food
Tech., 1984: 92–96):
x: 1 2 3 4 5 6 7 8
y: 74 54 52 51 52 53 58 71

The scatterplot of Figure 3.9 (page 114) suggests a strong relationship, but not a lin­
ear one, between x and y. With

$$\sum x_i = 36 \qquad \sum x_i^2 = 204 \qquad \sum y_i = 465 \qquad \sum y_i^2 = 27{,}615 \qquad \sum x_i y_i = 2094$$


Figure 3.9 Scatterplot of the data from Example 3.6


we have

$$S_{xy} = 2094 - \frac{(36)(465)}{8} = 1.5000$$

$$S_{xx} = 204 - \frac{(36)^2}{8} = 42 \qquad S_{yy} = 586.875$$

$$r = \frac{1.500}{\sqrt{42}\,\sqrt{586.875}} = .0096 \approx .01$$

This shows the importance of interpreting r as measuring the extent of any linear relationship. We should not conclude that there is no relation whatsoever just because r ≈ 0.

The Population Correlation Coefficient


The sample correlation coefficient r measures how strongly the x and y values in a sample of pairs are related to one another. There is an analogous measure of how strongly x and y are related in the entire population of pairs from which the sample $(x_1, y_1), \ldots, (x_n, y_n)$ was obtained. It is called the population correlation coefficient and is denoted by ρ (notice again the use of a Greek letter for a population characteristic and a Roman letter for a sample characteristic). We will never have to calculate ρ from the entire population of pairs, but it is important to know that ρ satisfies properties paralleling those of r:
1. ρ is a number between −1 and +1 that does not depend on the unit of measurement for either x or y, or on which variable is labeled x and which is labeled y.
2. ρ = +1 or −1 if and only if all (x, y) pairs in the population lie exactly on a straight line, so ρ measures the extent to which there is a linear relationship in the population.


In Chapter 11, we show how the sample characteristic r can be used to make an inference concerning the population characteristic ρ. In particular, r can be used to decide whether ρ = 0 (no linear relationship in the population).

Correlation and Causation


A value of r close to 1 indicates that relatively large values of one variable tend to
be associated with relatively large values of the other variable. This is far from say­
ing that a large value of one variable causes the value of the other variable to be
large. Correlation (Pearson’s or any other) measures the extent of association, but
association does not imply causation. It frequently happens that two variables are
highly correlated not because one is causally related to the other but because they are
both strongly related to a third variable. Among all elementary-school children, there
is a strong positive relationship between the number of cavities in a child’s teeth and
the size of his or her vocabulary. Yet no one advocates eating foods that result in more
cavities to increase vocabulary size (or working to decrease vocabulary size to protect
against cavities). Number of cavities and vocabulary size are both strongly related to
age, so older children tend to have higher values of both variables than do younger
ones. Among children of any fixed age, there would undoubtedly be little relationship
between number of cavities and vocabulary size.
Scientific experiments can frequently make a strong case for causality by care­
fully controlling the values of all variables that might be related to the ones un­
der study. Then, if y is observed to change in a “smooth” way as the experimenter
changes the value of x, the most plausible explanation would be a causal relationship
between x and y. In the absence of such control and ability to manipulate values of
one variable, we must admit the possibility that an unidentified underlying third
variable is influencing both the variables under investigation. A high correlation in
many uncontrolled studies carried out in different settings can marshal support for
causality—as in the case of cigarette smoking and cancer—but proving causality is
often a very elusive task.

Section 3.2 Exercises


9. For each of the following pairs of variables, indicate whether you would expect a positive correlation, a negative correlation, or little or no correlation. Explain your choice.
a. Maximum daily temperature and cooling cost
b. Interest rate and number of loan applications
c. Incomes of husbands and wives when both have full-time jobs
d. Vehicle speed (mph, from 20 to 100) and fuel efficiency (mpg)
e. Fuel efficiency and 3-year operating cost
f. Distance from a Stanford University student’s home town to campus and grade point average

10. Head movement evaluations are important because individuals, especially those who are disabled, may be able to operate communications aids in this manner. The article “Constancy of Head Turning Recorded in Healthy Young Humans” (J. of Biomed. Engr., 2008: 428–436) reported data on ranges in maximum inclination angles of the head in the clockwise anterior, posterior, right, and left directions for 14 randomly selected subjects. Consider the accompanying data on average anterior maximum inclination angle (AMIA) in both the clockwise (Cl) and counterclockwise (Co) directions.


Subj: 1 2 3 4 5 6 7
Cl: 57.9 35.7 54.5 56.8 51.1 70.8 77.3
Co: 44.2 52.1 60.2 52.7 47.2 65.6 71.4
Subj: 8 9 10 11 12 13 14
Cl: 51.6 54.7 63.6 59.2 59.2 55.8 38.5
Co: 48.8 53.1 66.3 59.8 47.5 64.5 34.5
a. Construct boxplots of both the clockwise and counterclockwise direction observations, and comment on any interesting features.
b. Construct a scatterplot of the data. What does it suggest about the general nature of the relationship between Cl and Co?
c. Calculate the value of the sample correlation coefficient. Does it confirm your impression from the scatterplot?

11. Torsion during external rotation and extension of the hip may explain why acetabular labral tears occur in professional athletes. The article “Hip Rotational Velocities During the Full Golf Swing” (J. of Sports Sci. and Med., 2009: 296–299) reported on an investigation in which lead hip internal peak rotational velocity (x) and trailing hip peak external rotational velocity (y) were determined for a sample of 15 golfers. Data provided by the article’s authors was used to calculate the following summary quantities:
$$\sum (x_i - \bar{x})^2 = 64{,}732.83, \qquad \sum (y_i - \bar{y})^2 = 130{,}566.96, \qquad \sum (x_i - \bar{x})(y_i - \bar{y}) = 44{,}185.87$$
Based on this, compute the sample correlation coefficient and interpret its value. How would you characterize this correlation—as strong, moderate, or weak?

12. Historically, reinforced concrete structures used externally bonded steel plates to add strength and support. Recently, fiber reinforced polymer (FRP) plates have been used instead of steel plates because of their superior properties. In the article “Interfacial Bond Strength Characteristics of FRP and RC Substrate” (J. of Compos. Constr., 2012: 35–43), investigators developed a method to mathematically model bond strength between a carbon FRP and a concrete substrate. For each of 15 carbon FRP–concrete samples, the article reported the maximum transferable load (kN) calculated by the model and compared this with the corresponding maximum transferable load (kN) as measured in the laboratory. The data is given here:
Calc  Meas    Calc  Meas
14.2  13.7    14.3  13.4
16.0  13.7    21.4  21.4
16.5  15.4    17.6  14.8
15.9  15.4     8.6   7.4
18.8  16.2    10.3   7.4
17.9  16.3    11.9  14.7
13.1  13.7    18.7  18.2
15.4  16.2
a. Construct a scatterplot of the data. Does it seem to be the case that, in general, when the measured load is low (high), the calculated load is also low (high)? For each sample, are the two variables relatively close in value?
b. Calculate the value of the sample correlation coefficient. Does it confirm your impression from the scatterplot?

13. The article “Behavioural Effects of Mobile Telephone Use During Simulated Driving” (Ergonomics, 1995: 2536–2562) reported that for a sample of 20 experimental subjects, the sample correlation coefficient for x = age and y = time since the subject had acquired a driving license (yr) was .97. Why do you think the value of r is so close to 1? (The article’s authors gave an explanation.)

14. An employee of an auction house has a list of 25 recently sold paintings. Eight artists were represented in these sales. The sale price of each painting is on the list. Would the correlation coefficient be an appropriate way to summarize the relationship between artist (x) and sale price (y)? Why or why not?


15. A sample of automobiles traversing a certain stretch of highway is selected. Each automobile travels at a roughly constant rate of speed, though speed does vary from auto to auto. Let x = speed and y = time needed to traverse this segment of highway. Would the sample correlation coefficient be closest to .9, .3, −.3, or −.9? Explain.

16. Suppose that x and y are positive variables and that a sample of n pairs results in r ≈ 1. If the sample correlation coefficient is computed for the (x, y²) pairs, will the resulting value also be approximately 1? Explain.

17. Nine students currently taking introductory statistics are randomly selected, and both the first midterm exam score (x) and the second midterm score (y) are determined. Three of the students have the class at 8 a.m., another three have it at noon, and the remaining three have a night class. The resulting (x, y) pairs are as follows:
8 a.m.:  (70, 60)  (72, 83)  (94, 85)
Noon:  (80, 72)  (60, 74)  (55, 58)
Night:  (45, 63)  (50, 40)  (35, 54)
a. Calculate the sample correlation coefficient for the nine (x, y) pairs.
b. Let $\bar{x}_1$ be the average score on the first midterm exam for the 8 a.m. students and $\bar{y}_1$ be the average score on the second midterm for these students. Denote the two averages for the noon students by $\bar{x}_2$ and $\bar{y}_2$, and for the night students by $\bar{x}_3$ and $\bar{y}_3$. Calculate r for these three $(\bar{x}, \bar{y})$ pairs.
c. Construct a scatterplot of the nine (x, y) pairs and another one of the three pairs of averages. Can you see why r in part (a) is smaller than r in part (b)? Does this suggest that a correlation coefficient based on averages (called an “ecological” correlation) might be misleading? Explain.

18. Suppose data is collected on two quantitative variables, x and y. Let r be the corresponding sample correlation coefficient for (x, y). The x and y values are then transformed as follows: x′ = a + bx, y′ = c + dy, where a, b, c, and d are constants. Let r′ be the corresponding sample correlation coefficient for (x′, y′).
a. Show that $\bar{x}' = a + b\bar{x}$ and $\bar{y}' = c + d\bar{y}$.
b. Show that $s_{x'} = bs_x$ and $s_{y'} = ds_y$.
c. Show that r′ = r.

3.3 Fitting a Line to Bivariate Data 


Given two numerical variables x and y, the general objective of regression analysis is
to use information about x to draw some type of conclusion concerning y. Often an
investigator wants to predict the y value that would result from making a single obser­
vation at a specified x value—for example, to predict product sales y for a sales region
in which advertising expenditure x is one million dollars. The different roles played by
the two variables are reflected in standard terminology: y is called the dependent or
response variable, and x is referred to as the independent, predictor, or explanatory
variable.
A scatterplot of y versus x frequently exhibits a linear pattern. In such cases, it is
natural to summarize the relationship between the variables by finding a line that is
as close as possible to the points in the plot. Before doing so, let’s quickly review some
elementary facts about lines and linear relationships.
Suppose a car dealership advertises that a particular type of vehicle can be rented
on a one-day basis for a flat fee of $25 plus an additional $.30 per mile driven. If such a
vehicle is rented and driven for 100 miles, the dealer’s revenue y is

y = 25 + (.30)(100) = 25 + 30 = 55


More generally, if x denotes the distance driven in miles, then

y = 25 + .30x

That is, x and y are linearly related.


The general form of a linear relationship between x and y is y = a + bx. A particular relation is specified by choosing values of a and b, for example, y = 10 + 2x or y = 100 − 5x. If we choose some x values and calculate y = a + bx for each value, the points in the scatterplot of the resulting (x, y) pairs fall exactly on a straight line. The value of b, the slope of the line, is the amount by which y increases when x increases by 1 unit. The vertical or y intercept a is the height of the line above the value x = 0. The equation y = 10 + 2x has slope b = 2, so each 1-unit increase in x results in an increase of 2 in y. When x = 0, y = 10, and the height at which the line crosses the vertical axis is 10. To draw the line corresponding to this equation, select any two x values (e.g., x = 5 and x = 10). Substitute these values into the equation to obtain the corresponding y values (y = 20 and y = 30) and thus two (x, y) points on the line. Finally, connect these two points with a straightedge.

Fitting a Straight Line


The line that gives the most effective summary of an approximate linear relation is the
one that in some sense is the best-fitting line, the one closest to the sample data. Consider the scatterplot and line shown in Figure 3.10. Let’s focus on the vertical deviations from the points to the line. For example,

deviation from (15, 47) = height of point − height of line = 47 − [10 + 2(15)] = 7

Figure 3.10 Vertical deviations from points to a line (the line shown is y = 10 + 2x)


Similarly,

deviation from (13, 28) = 28 − [10 + 2(13)] = −8
A positive deviation results from a point that lies above the chosen line, and a negative
deviation from a point that lies below this line. A particular line gives a good fit if the
deviations from the line are small in magnitude, that is, reasonably close to zero.
We now need a way to combine the n deviations into a single measure of fit. The
standard approach is to square the deviations (to obtain nonnegative numbers) and sum
these squared deviations.

DEFINITIONS The most widely used criterion for assessing the goodness of fit of a line y = a + bx to bivariate data $(x_1, y_1), \ldots, (x_n, y_n)$ is the sum of the squared deviations about the line:

$$\sum [y_i - (a + bx_i)]^2 = [y_1 - (a + bx_1)]^2 + \cdots + [y_n - (a + bx_n)]^2$$

According to the principle of least squares, the line that gives the best fit to the data is the one that minimizes this sum; it is called the least squares line or sample regression line.

To find the equation of the least squares line, let $g(a, b) = \sum [y_i - (a + bx_i)]^2$. Then the intercept a and slope b of the least squares line are the values of a and b that minimize $g(a, b)$. These minimizing values are obtained by taking the partial derivative of the g function first with respect to a and then with respect to b, and equating these two partial derivatives to zero (this is analogous to solving the single equation $f'(z) = 0$ to find the value of z that minimizes a function of a single variable). This results in the following two equations in two unknowns, called the normal equations:

$$na + \left(\sum x_i\right)b = \sum y_i \qquad \left(\sum x_i\right)a + \left(\sum x_i^2\right)b = \sum x_i y_i$$

These equations are easily solved because they are linear in the unknowns (a consequence of using squared deviations in the fitting criterion).

The slope b of the least squares line is given by

$$b = \frac{\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)/n}{\sum x_i^2 - \left(\sum x_i\right)^2/n} = \frac{S_{xy}}{S_{xx}}$$

The vertical intercept a of the least squares line is

$$a = \bar{y} - b\bar{x}$$

The equation of the least squares line is often written as $\hat{y} = a + bx$, where the “ˆ” above y emphasizes that $\hat{y}$ is a prediction of y that results from the substitution of


any particular x value into the equation. Notice that the numerator and denominator
of b appeared previously in the formula for the sample correlation coefficient r.

Example 3.7 The cetane number is a critical property in specifying the ignition quality of a fuel
used in a diesel engine. Determining this number for a biodiesel fuel is expensive and
time consuming. The article “Relating the Cetane Number of Biodiesel Fuels to Their
Fatty Acid Composition: A Critical Study” (J. of Automobile Engr., 2009: 565–583) in­
cluded the following data on x = iodine value (g) and y = cetane number for a sample
of 14 biofuels. The iodine value is the amount of iodine necessary to saturate a sample
of 100 g of oil.
x: 132.0 129.0 120.0 113.2 105.0 92.0 84.0
y: 46.0 48.0 51.0 52.1 54.0 52.0 59.0
x: 83.2 88.4 59.0 80.0 81.5 71.0 69.2
y: 58.7 61.6 64.0 61.4 54.6 58.8 58.0

The necessary summary quantities for hand calculation can be obtained by plac­
ing the x values in a column and the y values in another column and then creating
columns for x², xy, and y² (the latter value is not needed at the moment but will be used shortly). Calculating the column sums gives

$$\sum x_i = 1307.5, \quad \sum y_i = 779.2, \quad \sum x_i^2 = 128{,}913.93, \quad \sum x_i y_i = 71{,}347.30, \quad \sum y_i^2 = 43{,}745.22$$
from which

$$\bar{x} = \frac{1307.5}{14} = 93.392857 \qquad \bar{y} = \frac{779.2}{14} = 55.657143$$

$$S_{xx} = 128{,}913.93 - (1307.5)^2/14 = 6802.7693$$

$$S_{xy} = 71{,}347.30 - (1307.5)(779.2)/14 = -1424.41429$$

Thus

$$b = \frac{-1424.41429}{6802.7693} = -.20938742$$

$$a = 55.657143 - (-.20938742)(93.392857) = 75.212432$$

and the equation of the least squares line is $\hat{y} = 75.212 - .2094x$, exactly that reported in the cited article.
Figure 3.11, generated by the statistical computer package Minitab, shows that the least squares line is a very good summary of the relationship between the two variables. A prediction of the cetane number when the iodine value is 100 is $\hat{y} = 75.212 - .2094(100) = 54.27$. The slope of the least squares line tells us that a decrease of roughly .209 in cetane number is associated with a 1-gram increase in iodine value.
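As a check on the hand calculation, here is a short sketch (ours, not from the article or the text) that reproduces the slope and intercept from the summary quantities reported above:

    # Least squares fit for Example 3.7 from the summary quantities
    n = 14
    sum_x, sum_y = 1307.5, 779.2
    sum_x2, sum_xy = 128913.93, 71347.30

    Sxx = sum_x2 - sum_x**2 / n            # 6802.7693
    Sxy = sum_xy - sum_x * sum_y / n       # -1424.41429

    b = Sxy / Sxx                          # slope, about -0.2094
    a = sum_y / n - b * (sum_x / n)        # intercept, about 75.212
    print(round(a, 3), round(b, 4))        # 75.212 -0.2094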


Figure 3.11 Scatterplot from Minitab for Example 3.7 with least squares line superimposed (Cet Num = 75.21 − 0.2094 Iod Val)

The least squares line should not be used to make a prediction for an x value much beyond the range of the data, such as x = 50 or x = 250 in Example 3.7. The danger of extrapolation is that the fitted relationship (here, a line) may not be valid for such x values.

Regression
The term regression comes from the relationship between the least squares line and the sample correlation coefficient. Let $s_x$ and $s_y$ denote the sample standard deviations of the x and y values, respectively. Algebraic manipulation gives

$$b = r\left(\frac{s_y}{s_x}\right) \qquad \hat{y} = \bar{y} + r\left(\frac{s_y}{s_x}\right)(x - \bar{x})$$

If r = 1 and we substitute $x = \bar{x} + s_x$ (an x value 1 standard deviation above the mean x value), then $\hat{y} = \bar{y} + s_y$, which is 1 standard deviation above the mean y value. If, however, r = .5 and this x value is substituted, then $\hat{y} = \bar{y} + .5s_y$, which is only half a y standard deviation above the mean. More generally, when $-1 < r < 1$, for any x value, the corresponding predicted value $\hat{y}$ will be closer in terms of standard deviations to $\bar{y}$ than is x to $\bar{x}$; that is, $\hat{y}$ is pulled toward (regressed toward) the mean y value. This regression effect was first noticed by Sir Francis Galton in the late 1800s when he studied the relation between father’s height and son’s


height; the predicted height of a son was always closer to the mean height than was
his father’s height.
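The identity $b = r(s_y/s_x)$ can also be verified numerically. A small sketch (our addition; note that statistics.correlation requires Python 3.10 or later) does so using the glucose data of Example 3.6, computing the slope both directly and via r:

    # Verify b = r * (sy/sx): direct slope vs. slope computed from r
    import statistics as st

    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [74, 54, 52, 51, 52, 53, 58, 71]   # Example 3.6 data

    xbar, ybar = st.mean(x), st.mean(y)
    Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    Sxx = sum((xi - xbar) ** 2 for xi in x)

    b_direct = Sxy / Sxx
    b_via_r = st.correlation(x, y) * st.stdev(y) / st.stdev(x)
    print(round(b_direct, 4), round(b_via_r, 4))   # the two values agree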

Assessing the Fit of the Least Squares Line


How effectively does the least squares line summarize the relationship between the
two variables? In other words, how much of the observed variation in y can be attrib­
uted to the approximate linear relationship and the fact that x is varying? A quantita­
tive assessment is based on the vertical deviations from the least squares line. The
height of the least squares line above $x_1$ is $\hat{y}_1 = a + bx_1$, and $y_1$ is the height of the corresponding point in the scatterplot, so the vertical deviation (residual) from this point to the line is $y_1 - (a + bx_1)$. Substituting the remaining x values into the equation gives other predicted (or fitted) values $\hat{y}_2 = a + bx_2, \ldots, \hat{y}_n = a + bx_n$, and the other residuals $y_2 - \hat{y}_2, \ldots, y_n - \hat{y}_n$ are again obtained by subtraction. A residual is positive if the corresponding point in the scatterplot lies above the least squares line and negative if the point lies below the line. It can be shown that when predicted values and residuals are based on the least squares line, $\sum (y_i - \hat{y}_i) = 0$, so of course the average residual is zero.
Variation in y can effectively be explained by an approximate straight-line relation­
ship when the points in the scatterplot fall close to the least squares line—that is, when
the residuals are small in magnitude. A natural measure of variation about the least
squares line is the sum of the squared residuals (squaring before combining prevents
negative and positive residuals from counteracting one another). A second sum of
squares assesses the total amount of variation in observed y values.

DEFINITIONS Residual sum of squares, denoted by SSResid, is given by

$$\text{SSResid} = \sum (y_i - \hat{y}_i)^2 = (y_1 - \hat{y}_1)^2 + \cdots + (y_n - \hat{y}_n)^2$$

(alternatively called error sum of squares and denoted by SSE).
Total sum of squares, denoted by SSTo, is defined as

$$\text{SSTo} = \sum (y_i - \bar{y})^2 = (y_1 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2$$

Alternative notation for SSTo is $S_{yy}$, and a computing formula is

$$\sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$

A computing formula for residual sum of squares makes it unnecessary to calculate the residuals:

$$\text{SSResid} = \text{SSTo} - bS_{xy}$$

Because b and $S_{xy}$ have the same sign, $bS_{xy}$ is a positive quantity unless b = 0, so the computing formula shows that SSResid = SSTo if b = 0 and SSResid < SSTo otherwise.


To avoid any rounding effects, use as much decimal accuracy in b as possible when
computing SSResid.
SSResid is often referred to as a measure of “unexplained” variation; it is the amount of variation in y that cannot be attributed to the linear relationship between x and y.
The more points in the scatterplot deviate from the least squares line, the larger the
value of SSResid and the greater the amount of y variation that cannot be explained
by a linear relation. Similarly, SSTo is interpreted as a measure of total variation; the
larger the value of SSTo, the greater the amount of variability in the observed yi’s. The
ratio SSResid/SSTo is the fraction or proportion of total variation that is unexplained
by a straight-line relation. Subtracting this ratio from 1.0 gives the proportion of total
variation that is explained.

DEFINITION The coefficient of determination, denoted by $r^2$, is given by

$$r^2 = 1 - \frac{\text{SSResid}}{\text{SSTo}}$$

It is the proportion of variation in the observed y values that can be attributed to (or explained by) a linear relationship between x and y in the sample. Multiplying $r^2$ by 100 gives the percentage of y variation attributable to the approximate linear relationship. The closer this percentage is to 100%, the more successful is the relationship in explaining variation in y.

Example 3.8 The scatterplot of the iodine value and cetane number data in Figure 3.11 portends a reasonably high $r^2$ value. With

$$S_{xy} = -1424.41429 \text{ (the numerator of } b\text{)} \qquad b = -.20938742$$

$$\sum y_i = 779.2 \qquad \sum y_i^2 = 43{,}745.22$$

we have

$$\text{SSTo} = 43{,}745.22 - (779.2)^2/14 = 377.174$$

$$\text{SSResid} = 377.174 - (-.20938742)(-1424.41429) = 78.920$$

The coefficient of determination is then

$$r^2 = 1 - \text{SSResid}/\text{SSTo} = 1 - (78.920)/(377.174) = .791$$

That is, 79.1% of the observed variation in cetane number is attributable to (can be explained by) the simple linear regression relationship between cetane number and iodine value ($r^2$ values are even higher than this in many scientific contexts, but social scientists would typically be ecstatic at a value anywhere near this large).
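In code, the same quantities follow directly from the computing formulas of this section (a sketch using the summary values quoted above):

    # r^2 for the cetane data via SSTo and SSResid
    n = 14
    sum_y, sum_y2 = 779.2, 43745.22
    b, Sxy = -0.20938742, -1424.41429

    SSTo = sum_y2 - sum_y**2 / n       # 377.174
    SSResid = SSTo - b * Sxy           # 78.920, using SSResid = SSTo - b*Sxy
    r2 = 1 - SSResid / SSTo
    print(round(r2, 3))                # 0.791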


The wide availability of good statistical computer packages makes it unnecessary to hand calculate the various quantities involved in a regression analysis.
essary to hand calculate the various quantities involved in a regression analysis.
Figure 3.12 shows partial Minitab output for the cetane number–iodine value data
of Examples 3.7 and 3.8; the package will also provide the predicted values and
residuals as well as other information on request. The formats used by other pack­
ages differ slightly from that of Minitab, but the information content is very similar.
Quantities such as the standard deviations, t-ratios, F, and P-values are discussed in
Chapter 11.

Figure 3.12 Minitab output for the regression of Examples 3.7 and 3.8

The symbol r was used in Section 3.2 to denote Pearson’s sample correlation coefficient. It is not coincidental that $r^2$ is used to represent the coefficient of determination. The notation suggests how these two quantities are related:

$$(\text{correlation coefficient})^2 = \text{coefficient of determination}$$

Thus, if r = .8 or r = −.8, then $r^2$ = .64, so that 64% of the observed variation in the dependent variable can be attributed to the linear relationship. Notice that because the value of r does not depend on which variable is labeled x, the same is true of $r^2$. The coefficient of determination is one of the very few quantities calculated in the course of a regression analysis whose value remains the same when the roles of dependent and independent variables are interchanged. When r = .5, we get $r^2$ = .25, so only 25% of the observed variation is explained by a linear relation. This is why values of r between −.5 and .5 can fairly be described as evidence of a weak relationship.


Standard Deviation About the Least Squares Line


The coefficient of determination measures the extent of variation about the best-fit line relative to overall variation in y. A high value of $r^2$ does not by itself promise that the deviations from the line are small in an absolute sense. A typical observation could deviate from the line by quite a bit, yet these deviations might still be small relative to overall y variation. Recall that in Chapter 2 the sample standard deviation $s = \sqrt{\sum (x - \bar{x})^2/(n - 1)}$ was used as a measure of variability in a single sample; roughly speaking, s is the typical amount by which a sample observation deviates from the mean. There is an analogous measure of variability when a line is fit by least squares.

DEFINITION The standard deviation about the least squares line is given by

$$s_e = \sqrt{\frac{\text{SSResid}}{n - 2}}$$

Roughly speaking, $s_e$ is the typical amount by which an observation deviates from the least squares line. Justification for division by n − 2 and the use of the subscript e are given in Chapter 11.
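Continuing the cetane example in this notation (a two-line sketch using the values from Example 3.8):

    # Typical deviation about the least squares line for the cetane data
    SSResid, n = 78.920, 14
    se = (SSResid / (n - 2)) ** 0.5
    print(round(se, 2))                # about 2.56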

Example 3.9 The values of x = commuting distance and y = commuting time were determined for workers in samples from three different regions. Data is presented in Table 3.1; the three scatterplots are displayed in Figure 3.13.
For sample 1, a rather small proportion of variation in y can be attributed to an approximate linear relationship, and a typical deviation from the least squares line is roughly 4. The amount of variability about the line for sample 2 is the same as for sample 1, but the value of $r^2$ is much higher because y variation is much greater overall in sample 2 than in sample 1. Sample 3 yields roughly the same high value of $r^2$ as does sample 2, but the typical deviation from the line for sample 3 is only half that for sample 2. A complete picture of variation requires that both $r^2$ and $s_e$ be computed.

Table 3.1 Data for three regions (Example 3.9)

  Sample 1      Sample 2      Sample 3
  x     y       x     y       x     y
  15    42      5     16      5     8
  16    35      10    32      10    16
  17    45      15    44      15    22
  18    42      20    45      20    23
  19    49      25    63      25    31
  20    46      50    115     50    60


Figure 3.13 Scatterplots and summary quantities for Example 3.9: (a) Region 1, $\hat{y} = 13.67 + 1.69x$, $r^2 = .433$, $s_e = 4.03$; (b) Region 2, $\hat{y} = 7.87 + 2.14x$, $r^2 = .989$, $s_e = 4.03$; (c) Region 3, $\hat{y} = 3.20 + 1.13x$, $r^2 = .991$, $s_e = 1.90$

Plotting the Residuals (Optional)


It is important to have methods for identifying unusual or highly influential observa­
tions and revealing patterns in the data that may suggest how an improved fit can be
achieved. A plot based on the residuals is very useful in this regard.

DEFINITION A residual plot is a plot of the (x, residual) pairs—that is, of the pairs $(x_1, y_1 - \hat{y}_1), (x_2, y_2 - \hat{y}_2), \ldots, (x_n, y_n - \hat{y}_n)$—or of the residuals versus predicted values—the pairs $(\hat{y}_1, y_1 - \hat{y}_1), \ldots, (\hat{y}_n, y_n - \hat{y}_n)$.

A desirable plot exhibits no particular pattern, such as curvature or much greater spread in one part of the plot than in another part. Looking at a residual plot after fitting a line amounts to examining y after removing any linear dependence on x. This can sometimes more clearly show the existence of a nonlinear relationship.

Example 3.10 Consider the accompanying data on x = height (in.) and y = average weight (lb) for American females aged 30–39 (taken from The World Almanac and Book of Facts). The scatterplot displayed in Figure 3.14(a) appears rather straight. However, when the residuals from the least squares line ($\hat{y} = -98.23 + 3.596x$) are plotted, substantial curvature is apparent (even though $r^2 \approx .99$). It is not accurate to say that weight increases in direct proportion to height (linearly with height). Instead, average weight increases somewhat more rapidly in the range of relatively large heights than it does for relatively small heights.


x: 58 59 60 61 62 63 64 65
y: 113 115 118 121 124 128 131 134
x: 66 67 68 69 70 71 72
y: 137 141 145 150 153 159 164

Figure 3.14 Plots of data from Example 3.10: (a) scatterplot; (b) residual plot
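A residual plot like Figure 3.14(b) takes only a few lines with standard tools. The following sketch (ours; it assumes numpy and matplotlib are installed) fits the least squares line to the height–weight data above and plots the residuals against x:

    # Residual plot for the Example 3.10 data
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.array([58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72])
    y = np.array([113, 115, 118, 121, 124, 128, 131, 134,
                  137, 141, 145, 150, 153, 159, 164])

    b, a = np.polyfit(x, y, 1)          # least squares slope and intercept
    residuals = y - (a + b * x)

    plt.axhline(0, color="gray")
    plt.scatter(x, residuals)
    plt.xlabel("Height")
    plt.ylabel("Residual")
    plt.show()                          # the curved pattern shows up clearly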

We also hope that there are no unusual points in the plot. A point falling far
above or below the horizontal line at height zero corresponds to a large residual,
which may indicate some type of unusual behavior, such as a recording error, non­
standard experimental condition, or atypical experimental subject. A point whose
x value differs greatly from others in the data set may have exerted excessive influ­
ence in determining the fitted line. One method for assessing the impact of such

an isolated point on the fit is to delete it from the data set and then recompute the
best-fit line and various other quantities. Substantial changes in the equation, pre­
dicted values, r2, and se warn of instability in the data. More information may then
be needed before reliable conclusions can be drawn.

Example 3.11 Bioaerosols are airborne particles such as bacteria or pollen that, when found in
indoor environments, may cause infectious or allergic health effects. The Andersen
method for determining bioaerosol concentration requires a 2–7-day incubation
period. The article “Measurement of Indoor Bioaerosol Levels by a Direct Count­
ing Method” (J. of Envir. Engr., 1996: 374–378) discussed an alternative technique,
the FFDC method. Consider the accompanying data, read from a plot in the cited


article, on x = concentration using the Andersen method (CFU/m³) and y = concentration using the FFDC method (no./m³):

Observation     x      y       ŷ       Residual
 1             119    239    225.1      13.9
 2             140    262    240.3      21.7
 3             150    202    247.6     −45.6
 4             157    224    252.7     −28.7
 5             171    255    262.8      −7.8
 6             200    292    283.9       8.1
 7             218    350    296.9      53.1
 8             250    298    320.2     −22.2
 9             272    313    336.2     −23.2
10             321    415    371.7      43.3
11             573    542    554.7     −12.7

The equation of the least squares line is $\hat{y} = 138.68 + .726x$, with $r^2 = .901$. (The slope, intercept, and $r^2$ differ very slightly from values given in the article.)

Figure 3.15 Plots from R for the bioaerosol data of Example 3.11: (a) scatterplot, showing the least squares line for the full sample (solid) and the least squares line when the potentially influential observation is deleted (dashed); (b) residuals versus predicted values

Figure 3.15 shows a scatterplot and a residual plot (here, residuals versus predicted
values) from R (this package has excellent graphics capabilities). There is no single
residual that is much larger in magnitude than the other residuals. The most strik­
ing feature here is that $x_{11}$ is much larger than any other x value in the sample, so that $(x_{11}, y_{11})$ is an observation with potentially high influence (sometimes called a


high-leverage observation). This point would not in fact be highly influential if it fell
close to the least squares line based on just the first ten observations. However, the
equation of this line is $\hat{y} = 115.09 + .850x$ with $r^2 = .757$; this $r^2$ value is much
lower than the original value, and the slope and intercept have also changed substan­
tially. Without the influential observation, evidence for a very strong linear relation­
ship between concentrations assessed by the two methods is not nearly so compelling.
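The delete-and-refit comparison just described is straightforward to reproduce. A sketch (our addition, assuming numpy is available) fits the line with and without the high-leverage point:

    # Influence check for Example 3.11: refit without observation 11
    import numpy as np

    x = np.array([119, 140, 150, 157, 171, 200, 218, 250, 272, 321, 573])
    y = np.array([239, 262, 202, 224, 255, 292, 350, 298, 313, 415, 542])

    def fit(xv, yv):
        b, a = np.polyfit(xv, yv, 1)           # slope, intercept
        r2 = np.corrcoef(xv, yv)[0, 1] ** 2    # r^2 for a straight-line fit
        return round(a, 2), round(b, 3), round(r2, 3)

    print(fit(x, y))            # full sample: about (138.68, 0.726, 0.901)
    print(fit(x[:10], y[:10]))  # without (573, 542): about (115.09, 0.85, 0.757)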

Resistant Lines
As Example 3.11 shows, the least squares line can be greatly affected by the presence
of even a single observation that shows a large discrepancy in the x or y direction
from the rest of the data. When the data set contains such unusual observations, it
is desirable to have a method for obtaining a summarizing line that is resistant to
the influence of these stray values. In recent years, many methods for obtaining a
resistant (or robust) line have been proposed, and various statistical packages will
fit such lines. Consult a statistician or a book on exploratory data analysis to obtain
more information.
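As one concrete illustration (not a prescription), base R's line() function fits Tukey's resistant line, and rlm() from the MASS package computes a robust fit; both are less sensitive to stray values than ordinary least squares. Continuing with the vectors defined in the sketch above:

# Tukey's resistant line, based on medians within groups of x values
res.fit <- line(andersen, ffdc)
coef(res.fit)
# A robust M-estimation alternative that downweights large residuals
library(MASS)
rob.fit <- rlm(ffdc ~ andersen)
coef(rob.fit)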

Section 3.3 Exercises


19. The invasive diatom species Didymosphenia geminata has the potential to inflict substantial ecological and economic damage in rivers. The article "Substrate Characteristics Affect Colonization by the Bloom-Forming Didymosphenia geminata" (Aquatic Ecology, 2010: 33–40) described an investigation of colonization behavior. One aspect of particular interest was whether y = colony density was related to x = rock surface area. The article contained a scatterplot and summary of a regression analysis. Here is representative data:

x: 50 71 55 50 33 58 79
y: 152 1929 48 22 2 5 35

x: 26 69 44 37 70 20 45 49
y: 7 269 38 171 13 43 185 25

a. Determine the equation of the least squares line for this data and then calculate and interpret the coefficient of determination.
b. The second observation has a very extreme y value (in the full data set consisting of 72 observations, there were 2 of these). This observation may have had a substantial impact on the form of the regression function and subsequent conclusions. Eliminate it and redo part (a). What do you conclude?

20. Electromagnetic technologies such as ground penetrating radar offer effective nondestructive sensing techniques to determine a continuous profile of a pavement structure. The propagation of electromagnetic waves through the structure depends critically on the dielectric properties of the media. However, little research has been done on the characterization of dielectric properties of asphalt mixtures. The article "Dielectric Modeling of Asphalt Mixtures and Relationship with Density" (J. Transp. Engr., 2011: 104–111) reported on the dielectric response with percent air voids for various asphalt mixtures at 7-GHz frequency. The following data, kindly provided by the authors of the cited article, compares y = dielectric constant and x = air void (%) for 18 samples having 5% asphalt content:

y: 4.55 4.49 4.50 4.47 4.47 4.45
x: 4.35 4.79 5.57 5.20 5.07 5.79

y: 4.40 4.34 4.43 4.43 4.42 4.40
x: 5.36 6.40 5.66 5.90 6.49 5.70


y: 4.33 4.44 4.40 4.26 4.32 4.34
x: 6.49 6.37 6.51 7.88 6.74 7.08

a. Does a scatterplot of the data suggest it is reasonable to assume an approximate linear relationship between x and y?
b. Find the equation of the least squares line for this data and interpret its slope.
c. Determine the proportion of observed variation in the response variable that can be attributed to the approximate linear relationship between the two variables.
d. Does a residual plot indicate any deficiency in a straight line fit? Explain your reasoning.

21. For the past decade rubber powder has been used in asphalt cement to improve performance. The article "Experimental Study of Recycled Rubber-Filled High-Strength Concrete" (Magazine of Concrete Res., 2009: 549–556) included a regression of y = axial strength (MPa) on x = cube strength (MPa) based on the following sample data:

x: 112.3 97.0 92.7 86.0 102.0
y: 75.0 71.0 57.7 48.7 74.3

x: 99.2 95.8 103.5 89.0 86.7
y: 73.3 68.0 59.3 57.8 48.5

a. Does a scatterplot of the data suggest an appropriate linear relationship between x and y?
b. Obtain the equation of the least squares line and interpret its slope.
c. Calculate and interpret the coefficient of determination.
d. Roughly what is the size of a typical deviation of points in the scatterplot from the least squares line?

22. Recall the data from Exercise 4 based on amount of oil added (in g) and the corresponding amount of oil recovered (in g) from wheat straw. Suppose that we want to use the least squares line to predict the amount of oil recovered from the wheat straw based on the initial amount of oil added. Consider the accompanying output from the SAS statistical computer package.

Dependent Variable: oil_recov

Analysis of Variance
                        Sum of        Mean
Source        DF       Squares      Square    F Value   Pr > F
Model          1     289.45805   289.45805    2977.07   <.0001
Error         13       1.26398     0.09723
C Total       14     290.72203

Root MSE    0.31182    R-Square    0.9957
Dep Mean    6.07513    Adj R-Sq    0.9953
C.V.        5.13266

Parameter Estimates
                      Parameter    Standard       t
Variable      DF       Estimate       Error    Value   Pr > |t|
Intercept      1       -0.52343     0.14528    -3.60     0.0032
oil_added      1        0.87825     0.01610    54.56     <.0001

                      Predict
Obs    Dep Var         Value     Residual
 1      0.6100        0.3548       0.2552
 2      0.8400        0.7939       0.0461
 3      1.5120        1.3209       0.1911
 4      1.7920        1.9357      -0.1437
 5      2.9520        2.6383       0.3137
 6      2.8800        3.4287      -0.5487
 7      4.4000        4.3069       0.0931
 8      5.3460        5.2730       0.0730
 9      6.3960        6.3269       0.0691
10      7.1890        7.4686      -0.2796
11      8.0850        8.6982      -0.6132
12      9.8400       10.0155      -0.1755
13     11.6960       11.4207       0.2753
14     13.2240       12.8259       0.3981
15     14.3650       14.3189       0.0461

Sum of Residuals             0
Sum of Squared Residuals     1.2640

a. Write the equation of the least squares line and use it to predict the value of recovered oil when added oil is 10 g.
b. What are the values of SSResid, SSTo, r², and sₑ? Do these values suggest that the least squares line provides an effective summary of the relationship between the two variables?
c. Construct a plot of the residuals. What does it suggest?

23. Recall the data from Exercise 6 involving x = fiber weight (%) and y = compressive strength (MPa).
a. Determine the equation of the least squares line and interpret its slope.
b. Determine the proportion of observed variation in strength that can be attributed to the approximate linear relationship between strength and fiber weight.


c. Predict the value of the compressive strength when the fiber weight percentage is 6.5.
d. Would you feel comfortable using the least squares line to predict the compressive strength when the fiber weight percentage is 25? Explain. Now predict the value of y when x = 25 and interpret the result.

24. By their nature, deserts are typically exposed to large amounts of solar radiation. Thus, such regions seem to be prime locations for harvesting solar energy through the installation of photovoltaic modules. These modules rely on an optical system to collect sunlight, often through some lens, so an important factor to consider would be the effect of desert sandstorms on lens performance. The authors of "Sandblasting Durability of Acrylic and Glass Fresnel Lenses for Concentrator Photovoltaic Modules" (Solar Energy, 2012: 3021–3025) compared the performance of sandblasted acrylic and glass Fresnel lenses used in concentrator photovoltaic modules. In the experiment, the transmittance after sandblasting of acrylic polymethylmethacrylate (PMMA) and glass Fresnel lenses was measured. The experimental data, kindly provided by the authors, compares y = reduction rate of transmittance (%) and x = sandblast momentum (g·m/s) for 14 PMMA and 8 glass substrate samples:

PMMA
x: 10.56 20.80 15.84 31.20 48.00 21.12 41.60 64.00 16.80 33.20 51.20 13.92 27.84 42.72
y: 8.56 18.93 19.35 23.65 33.05 18.53 29.21 40.39 17.21 27.21 34.74 17.40 25.89 32.82

Glass
x: 35.20 52.80 105.60 52.80 70.40 56.00 48.00 139.20
y: 5.62 8.10 31.21 13.76 15.37 14.76 16.55 37.08

a. In one graph, overlay the scatterplots for the PMMA and the glass data sets and comment on any interesting features. Be sure to use different symbols for each data set.
b. Determine the equations for the least squares line for the PMMA and glass data sets. Interpret the slope for each equation.
c. For the PMMA lens, predict the reduction rate of transmittance when sandblast momentum is at 50 g·m/s. Do the same for the glass lens type.
d. Based on your results, which lens type performed better in this experiment?

25. Two important properties of a soil are its initial void ratio (e₀, a measure of soil porosity) and its compression index (Cc, an indicator of soil compressibility). The article "Consolidation and Hydraulic Conductivity of Zeolite-Amended Soil-Bentonite Backfills" (J. Geotech. Geoenviron. Engr., 2012: 15–25) reported the following data (read from a graph) for the Cc and e₀ variables for sand–bentonite backfills with varying amounts and types of zeolites.

e₀: 0.988 1.018 1.058 1.070 1.085 1.145
Cc: 0.19 0.20 0.20 0.22 0.23 0.24

a. Using Cc as the response and e₀ as the explanatory variable, create the corresponding scatterplot. Do the values of Cc appear to be perfectly linearly related to the e₀ values? Explain.
b. Determine the equation of the least squares line.
c. What proportion of the observed variation in the compression index can be attributed to the approximate linear relationship between the two variables?
d. Predict the value of the compression index when the initial void ratio is 1.10. Would you feel comfortable using the least squares line to predict the compression index when the initial void ratio is .80? Explain.

26. In biofiltration of wastewater, air discharged from a treatment facility is passed through a damp porous membrane that causes contaminants to dissolve in water and be transformed into harmless products. The accompanying data on x = inlet temperature (°C) and y = removal efficiency (%) was the basis for a scatterplot that appeared in the article "Treatment of Mixed Hydrogen Sulfide

Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has
www.ebook3000.com
deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
132 chapter 3 Bivariate and Multivariate Data and Distributions

and Organic Vapors in a Rock Medium Biofilter" (Water Environment Research, 2001: 426–435):

Obs    Temp   Removal %      Obs    Temp   Removal %
 1      7.68    98.09        17      8.55    98.27
 2      6.51    98.25        18      7.57    98.00
 3      6.43    97.82        19      6.94    98.09
 4      5.48    97.82        20      8.32    98.25
 5      6.57    97.82        21     10.50    98.41
 6     10.22    97.93        22     17.83    98.51
 7     15.69    98.38        23     17.83    98.71
 8     16.77    98.89        24     17.03    98.79
 9     17.13    98.96        25     16.18    98.87
10     17.63    98.90        26     16.26    98.76
11     16.72    98.68        27     14.44    98.58
12     15.45    98.69        28     12.78    98.73
13     12.06    98.51        29     12.25    98.45
14     11.44    98.09        30     11.69    98.37
15     10.17    98.25        31     11.34    98.36
16      9.64    98.36        32     10.97    98.45

Calculated summary quantities are Σxᵢ = 384.26, Σyᵢ = 3149.04, Σxᵢ² = 5099.2412, Σxᵢyᵢ = 37,850.7762, and Σyᵢ² = 309,892.6548.

a. Does a scatterplot of the data suggest appropriateness of the simple linear regression model?
b. Determine the equation of the least squares line, obtain a point prediction of removal efficiency when temperature = 10.50, and calculate the value of the corresponding residual.
c. Roughly what is the size of a typical deviation of points in the scatterplot from the least squares line?
d. What proportion of observed variation in removal efficiency can be attributed to the approximate linear relationship?
e. Personal communication with the authors of the article revealed that there was one additional observation that was not included in their scatterplot: (6.53, 96.55). What impact does this additional observation have on the equation of the least squares line and the values of sₑ and r²?

27. Consider the following four (x, y) data sets; the first three have the same x values, so these values are listed only once (from "Graphs in Statistical Analysis," Amer. Statistician, 1973: 17–21). For each of these four data sets, the values of the summary quantities, Σxᵢ, Σyᵢ, and so on, are almost identical, so the equation of the least squares line (ŷ = 3 + .5x), SSResid, SSTo, r², and sₑ will be virtually the same for all four. Based on a scatterplot and a residual plot for each data set, comment on the appropriateness of fitting a straight line; include any specific suggestions for how a "straight-line analysis" might be modified or qualified.

Data set:   1–3      1       2       3       4      4
Variable:    x       y       y       y       x      y
           10.0    8.04    9.14    7.46     8.0   6.58
            8.0    6.95    8.14    6.77     8.0   5.76
           13.0    7.58    8.74   12.74     8.0   7.71
            9.0    8.81    8.77    7.11     8.0   8.84
           11.0    8.33    9.26    7.81     8.0   8.47
           14.0    9.96    8.10    8.84     8.0   7.04
            6.0    7.24    6.13    6.08     8.0   5.25
            4.0    4.26    3.10    5.39    19.0  12.50
           12.0   10.84    9.13    8.15     8.0   5.56
            7.0    4.82    7.26    6.42     8.0   7.91
            5.0    5.68    4.74    5.73     8.0   6.89

3.4 Nonlinear Relationships


A scatterplot of bivariate data frequently shows curvature rather than a linear pattern. In
this section, we discuss several different ways to fit a curve to such data.

Power Transformations
Suppose that the general pattern in a scatterplot is curved and monotonic—either strictly
increasing or strictly decreasing. In this case, it is often possible to find a power trans-
formation for x or y so that there is a linear pattern in a scatterplot of the transformed

Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has
deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
3.4 Nonlinear Relationships 133

data. By a power transformation, we mean the use of exponents p and q such that the transformed values are x′ = x^p and/or y′ = y^q; the relevant scatterplot is of the (x′, y′) pairs. Figure 3.16 displays a "ladder" of the most frequently used transformations and a guide for choosing an appropriate transformation, depending on the pattern in the original scatterplot.

Power transformation ladder: transformed value = (original value)^power

Power    Transformed value        Name
  3      (original value)³        Cube
  2      (original value)²        Square
  1      original value           No transformation
 1/2     √(original value)        Square root
 1/3     ∛(original value)        Cube root
  0      log(original value)      Logarithm
 −1      1/(original value)       Reciprocal

[The accompanying guide sketches four curved monotonic scatterplot patterns, numbered 1–4, indicating which direction to move on the ladder for x or y.]

Figure 3.16 Transformation ladder and guide

For example, suppose the pattern has the shape of segment 2 in Figure 3.16. Then to straighten the plot, we should use a transformation on x that is up the ladder from the no-transformation row, for example, x′ = x² or x³, or a transformation on y that is down the ladder, such as y′ = 1/y or ln(y) (log₁₀ would produce equivalent results). A residual plot should be used to check that curvature has in fact been removed. Once a straightening transformation has been identified, a straight line can be fit to the (x′, y′) points using least squares. If it was not necessary to transform y, then the line provides a direct way of predicting y values: calculate x′ and substitute into the equation. When y has been transformed, the line gives predictions of y′ values. The transformation can then be reversed to obtain predictions of y. For example, if x′ = 1/x and y′ = √y, the least squares line gives

√y ≈ a + b/x

from which

y ≈ (a + b/x)²
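In R, such a straightening fit and its reversal might look like the following sketch; the data are synthetic, purely to illustrate the mechanics:

# Straighten with x' = 1/x and y' = sqrt(y), fit a line, then reverse
x <- c(1, 2, 4, 6, 10, 15)
y <- c(64.5, 30.6, 18.0, 14.8, 12.4, 11.2)   # made-up curved data
fit <- lm(sqrt(y) ~ I(1/x))                  # fits sqrt(y) = a + b(1/x)
a <- coef(fit)[1]
b <- coef(fit)[2]
(a + b/5)^2          # prediction of y at x = 5; squaring undoes the square root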


Example 3.12  No tortilla chip aficionado likes soggy chips, so it is important to find characteristics of the production process that produce chips with an appealing texture. The following data on x = frying time (sec) and y = moisture content (%) appeared in the article "Thermal and Physical Properties of Tortilla Chips as a Function of Frying Time" (J. of Food Processing and Preservation, 1995: 175–189):

x: 5 10 15 20 25 30 45 60
y: 16.3 9.7 8.1 4.2 3.4 2.9 1.9 1.3

The scatterplot in Figure 3.17(a) has the pattern of segment 3 in Figure 3.16, so we must go down the ladder for x or y. A scatterplot of the (ln(x), ln(y)) pairs in Figure 3.17(b) is quite straight. A regression of ln(y) on ln(x) gives a = 4.6384, b = −1.04920, and r² = .976. The residual plot of Figure 3.17(c) shows no evidence of curvature, though there is one rather large residual.


Figure 3.17 Plots of the data from Example 3.12: (a) scatterplot of the original data; (b) scatterplot of the (ln(x), ln(y)) pairs



Figure 3.17 Plots of the data from Example 3.12: (c) plot of the
residuals from the transformed regression

Thus ln(y) ≈ 4.6384 − 1.04920[ln(x)]. Since ln(20) = 2.996, a prediction of ln(y) is

predicted ln(y) = 4.6384 − (1.04920)(2.996) = 1.495

Taking the antilog of 1.495 gives a prediction of y itself: e^1.495 = 4.46%. In fact, taking the antilog of both sides of the linear equation gives an explicit nonlinear relationship between x and y:

y = e^ln(y) ≈ e^(4.6384 − 1.04920 ln(x)) = (e^4.6384)(e^(−1.04920 ln(x))) = 103.379x^(−1.04920)

This is often called a power function relationship between x and y.
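A brief R sketch reproducing the calculations of this example:

# ln-ln regression for the tortilla chip data
fry <- c(5, 10, 15, 20, 25, 30, 45, 60)
moist <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 1.9, 1.3)
fit <- lm(log(moist) ~ log(fry))          # a = 4.6384, b = -1.04920
exp(predict(fit, data.frame(fry = 20)))   # about 4.46 (% moisture content)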

Fitting a Polynomial Function


Sometimes the general pattern of curvature in a scatterplot is not monotonic. Instead, it may be the case that as x increases, there is a tendency for y first to increase and then to decrease (like a bowl turned upside down) or for y first to decrease and then to increase. In such instances, it is reasonable to fit a quadratic function a + b₁x + b₂x², whose graph is a parabola, to the data. If the quadratic coefficient b₂ is positive, the parabola turns upward, whereas it turns downward if b₂ is negative. Just as in fitting a straight line, the principle of least squares can be employed to find the best-fit quadratic. The least squares coefficients a, b₁, and b₂ are the values of a, b₁, and b₂ that minimize

g(a, b₁, b₂) = Σᵢ [yᵢ − (a + b₁xᵢ + b₂xᵢ²)]²

which is the sum of squared vertical deviations from the points in the scatterplot to the parabola determined by the quadratic with coefficients a, b₁, and b₂. Taking the partial derivative of the g function first with respect to a, then with respect to b₁, and finally with respect to b₂,


and equating these three expressions to zero gives three equations in three unknowns. These
normal equations are again linear in the unknowns, but because there are three rather than
just two, there is no explicit elementary expression for their solution. Instead, matrix algebra
must be used to solve the system numerically for each different data set. Fortunately, solution
procedures have been programmed into the most popular statistical computer packages, so
it is necessary only to make the appropriate request and then sit back and wait for output.
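In R, for instance, the request is a single lm() call; the following sketch uses synthetic data, since the glucose data of Figure 3.9 is not reproduced here:

# Least squares quadratic: supply x and x^2 as predictors
x <- 1:8                                            # synthetic fermentation times
y <- 84.5 - 15.9*x + 1.77*x^2 + rnorm(8, sd = 3.5)  # synthetic responses
fit2 <- lm(y ~ x + I(x^2))        # I() protects the square inside a formula
coef(fit2)                        # the least squares coefficients a, b1, b2
predict(fit2, data.frame(x = 4))  # fitted value at x = 4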

Example 3.13  The scatterplot of y = glucose concentration versus x = fermentation time shown in Figure 3.9 (at the end of Section 3.2) has the appearance of an upward-turning quadratic. We supplied the data to Minitab and made the appropriate regression request to obtain the accompanying output. The fitted quadratic equation appears at the top of the output, and the values of the least squares coefficients a, b₁, b₂ appear in the Coef column just below the equation. A prediction for glucose concentration when fermentation time is 4 hours is

ŷ = 84.482 − 15.875(4) + 1.7679(4)² = 49.27
The regression equation is
glucconc = 84.5 - 15.9 time + 1.77 timesqd
Predictor Coef Stdev t-ratio p
Constant 84.482 4.904 17.23 0.000
time -15.875 2.500 -6.35 0.001
timesqd 1.7679 0.2712 6.52 0.001
s = 3.515  R–sq = 89.5%    R–sq (adj) = 85.3%
Analysis of Variance
SOURCE DF SS MS F p
Regression 2 525.11 262.55 21.25 0.004
Error 5 61.77 12.35
Total 7 586.88

Predicted or fitted values ŷ₁, . . . , ŷₙ are obtained by substituting the successive x values x₁, . . . , xₙ into the fitted quadratic equation (e.g., in Example 3.13, ŷ₄ = 49.27), and the residuals are the vertical deviations y₁ − ŷ₁, . . . , yₙ − ŷₙ from the observed points to the graph of the fitted quadratic (e.g., y₄ − ŷ₄ = 51 − 49.27 = 1.73). Residual or error sum of squares and total sum of squares are defined exactly as they were previously:

SSResid = Σᵢ (yᵢ − ŷᵢ)²        SSTo = Σᵢ (yᵢ − ȳ)²

The Minitab output of Example 3.13 shows that SSResid = 61.77 and SSTo = 586.88. The coefficient of multiple determination, denoted by R², is now the proportion of observed y variation that can be attributed to the approximate quadratic relationship:

R² = 1 − SSResid/SSTo


The R² value in Example 3.13 is .895, so about 89.5% of the observed variation in glucose concentration can be attributed to the approximate quadratic relation between concentration and fermentation time.
The methodology employed to fit a quadratic is easily extended to fit a higher-order
polynomial. For example, using the principle of least squares to fit a cubic equation
gives a system of normal equations consisting of four equations in four unknowns. The
arithmetic is best left to a statistical computer package. In practice, a cubic equation
is rather rarely fit to data, and it is virtually never appropriate to fit anything of higher
order than this.

Smoothing a Scatterplot
Sometimes the pattern in a scatterplot is too complex for a line or curve of a particular type (e.g., exponential or parabolic) to give a good fit. Statisticians have recently developed some more flexible methods that permit a wide variety of patterns to be modeled using the same fitting procedure. One such method is LOWESS (or LOESS), short for locally weighted scatterplot smoother. Let (x*, y*) denote a particular one of the n (x, y) pairs in the sample. The ŷ value corresponding to (x*, y*) is obtained by fitting a straight line using only a specified percentage of the data (e.g., 25%) whose x values are closest to x*. Furthermore, rather than use "ordinary" least squares, which gives equal weight to all points, those with x values closer to x* are more heavily weighted than those whose x values are farther away.¹ The height of the resulting line above x* is the fitted value ŷ*. This process is repeated for each of the n points, so n different lines are fit (you surely wouldn't want to do all this by hand). Finally, the fitted points are connected to produce a LOWESS curve.
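Base R supplies this smoother directly through lowess(); a small sketch, applied to the frying-time data of Example 3.12:

# LOWESS with a 50% span (f = fraction of the data used at each x*)
x <- c(5, 10, 15, 20, 25, 30, 45, 60)
y <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 1.9, 1.3)
sm <- lowess(x, y, f = .50)
plot(x, y)
lines(sm)     # connects the fitted points to form the LOWESS curve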

Example 3.14  Weighing large deceased animals found in wilderness areas is usually not feasible, so it is desirable to have a method for estimating weight from various characteristics of an animal that can be easily determined. Minitab has a stored data set consisting of various characteristics for a sample of n = 143 wild bears. Figure 3.18(a) displays a scatterplot of y = weight versus x = distance around the chest (chest girth). At first glance, it looks as though a single line obtained from ordinary least squares would effectively summarize the pattern. Figure 3.18(b) shows the LOWESS curve produced by Minitab using a span of 50% (the fit at (x*, y*) is determined by the closest 50% of the sample). The curve appears to consist of two straight-line segments joined together above approximately x = 38. The steeper line is to the right of 38, indicating that weight tends to increase more rapidly as girth does for girths exceeding 38 in.

¹The weighted least squares criterion involves finding a and b to minimize Σ wᵢ[yᵢ − (a + bxᵢ)]², where w₁, . . . , wₙ are nonnegative weights. For example, if we take w₅ = 0, then (x₅, y₅) is disregarded in obtaining the fitted line. R will also fit a local quadratic in this way.
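The footnote's weighted criterion can be tried directly: lm() accepts a weights argument, and a zero weight disregards that observation (a sketch reusing x and y from above):

# Weighted least squares; setting w5 = 0 drops (x5, y5) from the fit
w <- rep(1, length(x))
w[5] <- 0
coef(lm(y ~ x, weights = w))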



Figure 3.18 A Minitab scatterplot and LOWESS curve for the bear weight data of Example 3.14

Section 3.4 Exercises


28. Polyester fiber ropes are increasingly being used as components of mooring lines for offshore structures in deep water. The authors of the paper "Quantifying the Residual Creep Life of Polyester Mooring Ropes" (Intl. J. of Offshore and Polar Explor., 2005: 223–228) used the accompanying data as a basis for studying how time to failure (hr) depended on load (% of breaking load):

x: 77.7 77.8 77.9 77.8 85.5 85.5
y: 5.067 552.056 127.809 7.611 .124 .077

x: 89.2 89.3 73.1 85.5 89.2 85.5
y: .008 .013 49.439 .503 .362 9.930

x: 89.2 85.5 89.2 82.3 82.0 82.3
y: .677 5.322 .289 53.079 7.625 155.299

a. Construct a scatterplot of x = load versus y = time. Would it be reasonable to characterize the relationship between the two variables as linear?
b. Transform the response variable by computing y′ = log(y). Construct a scatterplot of x and y′. Would it be reasonable to characterize the relationship between these two variables as linear?
c. Fit a straight line to the (x, y′) data. Assess the quality of the fit. Finally, based on the linear fit, predict the value of failure time from a load of 85%.

29. The authors of "Experimental and Numerical Investigation of Bed-Load Transport Under Unsteady Flows" (J. Hydraul. Engr., 2011: 1276–1282) simulated sediment yield of a gravel bed load under varying rates of water flow. The researchers wanted to mathematically model the behavior of sediment transport under such conditions and proposed a new model parameter, Pgt, that characterizes the unsteadiness of the water flow. Eleven simulation runs were conducted in the laboratory. For each simulation, the article reported the computed value of the unsteadiness parameter Pgt and the nondimensionalized total bed load, Wt. One aim of the study was to investigate the behavior of y = Wt as a function of x = Pgt. Data from the experiment is given here:

x: 0.0021 0.0041 0.0045 0.0046
y: 15.4 59.0 80.9 107.5

x: 0.0049 0.0043 0.0049 0.0043
y: 313.6 163.8 857.2 40.9

x: 0.0047 0.0038 0.0046
y: 88.9 87.8 196.5


a. Would you fit a straight line to the data and use it as a basis for predicting nondimensionalized total bed load from the unsteadiness parameter? Why or why not?
b. Find a transformation that produces an approximate linear relationship between the transformed values. Then fit a line to the transformed data and use it to obtain an equation that describes approximately the relationship between the untransformed variables.

30. In the article "Sensitivity of Oklahoma Binders on Dynamic Modulus of Asphalt Mixes and Distress Functions" (J. Mater. Civ. Engr., 2012: 1076–1088), researchers measured various physical characteristics of performance grade asphalt binders commonly used in Oklahoma. One important physical characteristic is dynamic shear modulus, G (kPa), which is the ratio of maximum shear stress to the maximum shear strain and is a measure of the stiffness or resistance of the asphalt binder to deformation under load. In one experiment, the researchers measured the dynamic shear modulus of the asphalt binder samples over a range of testing temperatures (°C). The following is the corresponding data for binder type PG64-22:

Temp: 54.4  46.1   43.3   29.4    21.1      12.7      4.4
G:    9.28  32.47  46.98  344.36  1,030.38  4,870.00  18,300.00

a. Construct a scatterplot of y = dynamic shear modulus versus x = temperature. Would it be reasonable to characterize the relationship between the two variables as approximately linear?
b. Transform only the dependent variable y so that a scatterplot of the transformed data shows a substantial linear pattern. Then fit a straight line to this data, use the line to establish an approximate relationship between x and y, and predict the dynamic shear modulus when the temperature is 35°C.
c. Plot the residuals from your linear fit in part (b) and look for any patterns that might suggest an inappropriate choice of transformation. If necessary, return to part (b) and try a different transformation.

31. Failures in aircraft gas turbine engines due to high cycle fatigue are a pervasive problem. The article "Effect of Crystal Orientation on Fatigue Failure of Single Crystal Nickel Base Turbine Blade Superalloys" (J. of Engr. for Gas Turbines and Power, 2002: 161–176) gave the accompanying data and fit a nonlinear regression function in order to predict strain amplitude from cycles to failure.

Obs   Cycfail   Strampl      Obs   Cycfail   Strampl
 1      1326    .01495       11      7356    .00576
 2      1593    .01470       12      7904    .00580
 3      4414    .01100       13        79    .01212
 4      5673    .01190       14      4175    .00782
 5    29,516    .00873       15    34,676    .00596
 6        26    .01819       16   114,789    .00600
 7       843    .00810       17      2672    .00880
 8      1016    .00801       18      7532    .00883
 9      3410    .00600       19    30,220    .00676
10      7101    .00575

a. Construct scatterplots of y versus x, y versus ln(x), ln(y) versus ln(x), and 1/y versus 1/x.
b. Which transformation from part (a) does the best job of producing an approximate linear relationship?
c. Use the selected transformation to predict amplitude when cycles to failure = 5000.

32. There has been an increasing demand for open-ended steel pipe piles to be used as deep foundations for offshore and onshore structures. When an open-ended pile is driven into the ground, a soil plug often forms within the pile. The driving resistance and the base capacity of the pile are heavily influenced by this plugging effect. As an indicator of the degree of plugging, researchers often use the plug length ratio (PLR), which is the ratio of the plug length at the end of pile installation to the length of the pile. The article "Base Capacity of Open-Ended Steel Pipe Piles in Sand" (J. Geotech. Geoenviron. Engr., 2012: 1116–1128) reported the PLR and corresponding pile inner


diameter, d (mm), of nine test piles used in case studies. The data is given here:

d:   691.0  292.0  83.7  37.2  78.9  107.9  82.5  1444.0  1444.0
PLR: 1.00   0.82   0.76  0.44  0.76  0.88   0.75  1.00    1.00

a. The authors were interested in predicting PLR based on the pile inner diameter. Transform only the independent variable x so that a scatterplot of the transformed data shows a substantial linear pattern. Then fit a straight line to this data, use the line to establish an approximate relationship between x and y, and predict the plug length ratio when the pile inner diameter is 500 mm.
b. Plot the residuals from your linear fit in part (a) and look for any patterns that might suggest an inappropriate choice of transformation. If necessary, return to part (a) and try a different transformation.

33. The article "Residual Stresses and Adhesion of Thermal Spray Coatings" (Surface Engr., 2005: 35–40) considered the relationship between the thickness (μm) of NiCrAl coatings deposited on stainless steel substrate and corresponding bond strength (MPa). The following data was read from a plot in the paper:

Thickness: 220   220   220   220   370   370   370   370   440   440
Strength:  24.0  22.0  19.1  15.5  26.3  24.6  23.1  21.2  25.2  24.0

Thickness: 440   440   680   680   680   680   860   860   860   860
Strength:  21.7  19.2  17.0  14.9  13.0  11.8  12.2  11.2  6.6   2.8

a. Is it possible to transform this data as described in this section so that there is an approximate linear relationship between the transformed variables? Why or why not?
b. Use a statistical computer package to fit a quadratic function to this data and then predict bond strength when thickness is 500. Assess the fit of the quadratic to the data.

34. The accompanying data was extracted from the article "Effects of Cold and Warm Temperatures on Springback of Aluminum-Magnesium Alloy 5083-H111" (J. Engr. Manuf., 2009: 427–431). The response variable is yield strength (MPa), and the predictor is temperature (°C).

x: −50   25    100   200   300
y: 91.0  120.5 136.0 133.1 120.8

Here is Minitab output from fitting the quadratic regression function (a graph in the cited paper suggests that the authors did this):

Predictor    Coef         SE Coef     T       P
Constant     111.277      2.100       52.98   0.000
temp         0.32845      0.03303     9.94    0.010
tempsqd     -0.0010050    0.0001213  -8.29    0.014

S = 3.44398   R-Sq = 98.1%   R-Sq(adj) = 96.3%

Analysis of Variance
Source           DF   SS        MS       F       P
Regression        2   1245.39   622.69   52.50   0.019
Residual Error    2   23.72     11.86
Total             4   1269.11

a. What is the equation of the best-fit quadratic? Use this quadratic to predict yield strength when temperature is 110.
b. What are the values of SSResid and SSTo? Verify that these values are consistent with the value of R-sq given on the output. Do you think the fit of the quadratic is good? Explain.

3.5 Using More Than One Predictor


In Sections 3.3 and 3.4, we considered relationships between a dependent or response variable y and a single predictor, independent, or explanatory variable x. In many situations, predictions of y values can be improved and more observed y variation can be explained by utilizing information in two or more explanatory variables. Notation is a bit more complex than in the case of a single predictor. Let

k = number of explanatory variables or predictors
n = sample size

and x₁, x₂, . . . , xₖ denote the k predictors, so that each observation will consist of k + 1 numbers: the value of x₁, the value of x₂, . . . , the value of xₖ, and the value of y. Also let

xᵢⱼ = value of the predictor xᵢ in the jth observation

so

first observation = (x₁₁, x₂₁, . . . , xₖ₁, y₁)
        ⋮
nth observation = (x₁ₙ, x₂ₙ, . . . , xₖₙ, yₙ)

Example 3.15  Soil and sediment adsorption, the extent to which chemicals collect in a condensed form on the surface, is an important characteristic because it influences the effectiveness of pesticides and various agricultural chemicals. The article "Adsorption of Phosphate, Arsenate, Methanearsonate, and Cacodylate by Lake and Stream Sediments: Comparison with Soils" (J. of Environ. Qual., 1984: 499–504) gave the following data on y = phosphate adsorption index, x₁ = amount of extractable iron, and x₂ = amount of extractable aluminum:

Observation    x₁    x₂    y
1 61 13 4
2 175 21 18
3 111 24 14
4 124 23 18
5 130 64 26
6 173 38 26
7 169 33 21
8 169 61 30
9 160 39 28
10 244 71 36
11 257 112 65
12 333 88 62
13 199 54 40

Thus the first observation is the triple (x₁₁, x₂₁, y₁) = (61, 13, 4), . . . , and the last observation (j = 13) is (199, 54, 40).

Each observation in Example 3.15 is a triple of numbers. A scatterplot of such


data would represent each observation as a point in a three-dimensional coordinate


system, which is obviously difficult to construct or visualize. For k > 2, a scatterplot requires more than three dimensions! Partial information about the relationship between the variables can be obtained by forming a scatterplot matrix. This is just a collection of two-dimensional scatterplots, arranged in a square array, in which each variable is plotted against every other variable. The matrix gives a preliminary indication of whether any single predictor might be related to y, whether the relationship might be linear, and whether there appears to be a strong relation between any particular pair of predictors (in which case, one of them may be redundant). Figure 3.19 shows a scatterplot matrix for the adsorption data from Example 3.15. In the case k = 2, there are really just three plots: y versus x₁, y versus x₂, and x₂ versus x₁. Each of these plots appears twice in Figure 3.19, allowing the investigator to look across any row and see a particular variable plotted against every other variable. For example, the third row shows adsorption index versus extractable iron, followed by adsorption index versus extractable aluminum. We can see that y appears linearly related to both x₁ and x₂ and that there is not a very strong relationship between x₁ and x₂.
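A scatterplot matrix like Figure 3.19 can be requested in R with pairs(); a sketch using the data of Example 3.15 (the data frame and column names are ours):

# Scatterplot matrix for the adsorption data
adsorp <- data.frame(
  exiron = c(61, 175, 111, 124, 130, 173, 169, 169, 160, 244, 257, 333, 199),
  exalum = c(13, 21, 24, 23, 64, 38, 33, 61, 39, 71, 112, 88, 54),
  adsorpind = c(4, 18, 14, 18, 26, 26, 21, 30, 28, 36, 65, 62, 40))
pairs(adsorp)   # every variable plotted against every other variable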


Figure 3.19 A scatterplot matrix from R of the data from Example 3.15


Fitting a Linear Function

We now consider fitting a relation of the form

y ≈ a + b₁x₁ + b₂x₂ + ⋯ + bₖxₖ

The reasonableness of this approximation depends on patterns in the scatterplot matrix and other characteristics of the data to be considered shortly. As with bivariate data, the values of a, b₁, . . . , bₖ should be selected to give the best fit. Again, the principle of least squares can be invoked: The least squares coefficients a, b₁, b₂, . . . , bₖ are the values of a, b₁, . . . , bₖ that minimize

g(a, b₁, . . . , bₖ) = Σⱼ [yⱼ − (a + b₁x₁ⱼ + ⋯ + bₖxₖⱼ)]²

where the sum ranges over j = 1, . . . , n. The g(·) function is the sum of squared deviations between observed y values and what would be predicted by a + b₁x₁ + ⋯ + bₖxₖ. Determination of the least squares coefficients involves multivariable calculus: Take the partial derivative of g(·) with respect to each unknown, equate these to zero to obtain a system of k + 1 linear equations in the k + 1 unknowns (the normal equations), and solve the system. The arithmetic is quite tedious, but any good statistical computer package can handle the task upon request; a regression command of some sort is usually required.
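In R, for example, such a fit is a single lm() call with all predictors on the right side of the formula; using the adsorp data frame constructed above:

# Least squares fit of adsorption index on extractable iron and aluminum
fit <- lm(adsorpind ~ exiron + exalum, data = adsorp)
coef(fit)                 # a = -7.35, b1 = .113, b2 = .349
summary(fit)$r.squared    # about .948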

Example 3.16  (Example 3.15 continued) Figure 3.20 shows partial Minitab output from a request to fit a + b₁x₁ + b₂x₂ to the phosphate adsorption data using the principle of least squares. The result is

ŷ = −7.351 + .11273x₁ + .34900x₂ ≈ −7.35 + .113x₁ + .349x₂

[Minitab output appears here, with the SSResid and SSTo values labeled on the output.]

Figure 3.20 Minitab regression output for the phosphate adsorption data


A prediction of the phosphate adsorption index for an observation to be made when extractable iron is 150 and extractable aluminum is 60 is

ŷ = −7.35 + .113(150) + .349(60) = 30.54

We interpret b₁ = .113 to mean that when the amount of extractable iron increases by 1 unit and the amount of extractable aluminum is held fixed, we can expect the phosphate adsorption index to increase by roughly .113. A similar interpretation applies to b₂ = .349.

Predicted values and residuals are calculated in a manner similar to that used in the case of a single predictor. For example, ŷ₁ results from substituting x₁ = x₁₁, x₂ = x₂₁, . . . , xₖ = xₖ₁ (the values of the predictors for the first observation) into ŷ = a + b₁x₁ + ⋯ + bₖxₖ, and the corresponding residual is y₁ − ŷ₁. From Examples 3.15 and 3.16,

ŷ₁ = −7.35 + .113(61) + .349(13) = 4.08        y₁ − ŷ₁ = 4 − 4.08 = −.08

The same two sums of squares calculated after fitting a line are relevant here:

SSResid = Σⱼ (yⱼ − ŷⱼ)²    a measure of unexplained variation
SSTo = Σⱼ (yⱼ − ȳ)²        a measure of total variation

The coefficient of multiple determination

R² = 1 − SSResid/SSTo
is interpreted as the proportion of observed y variation that can be explained by or attributed to the approximate linear relation between the response variable and the predictors. The value of R² is the first concrete indicator of whether the postulated linear relationship is indeed a good approximation. Looking at the Minitab output of Figure 3.20, we see that about 94.8% of observed variation in the phosphate adsorption index can be explained by its approximate linear relationship to extractable iron and extractable aluminum, a very impressive result. In addition, residual plots (residuals versus x₁, residuals versus x₂, . . . , and residuals versus xₖ) should be examined for evidence that the fitted relationship must be modified. The two residual plots of Figure 3.21 show no unusual pattern indicating that a modification is needed.
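Continuing the R sketch above, the residual plots and the prediction of Example 3.16 come directly from the fitted model:

# Residuals versus each predictor, plus a prediction at new x values
plot(adsorp$exiron, resid(fit)); abline(h = 0)
plot(adsorp$exalum, resid(fit)); abline(h = 0)
predict(fit, data.frame(exiron = 150, exalum = 60))   # 30.54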
There is one potential difficulty with R²: Its value can be greatly inflated by using many predictors of questionable importance when fitting the linear relationship. Suppose, for example, that y is the sale price of a home and that we have a sample of n = 20 homes from the region of interest. Important predictors include x₁ = interior size (ft²), x₂ = lot size, x₃ = age of the home, x₄ = number of bedrooms, and x₅ = size of the garage. Consider adding other predictors that are intuitively relatively uninformative or even frivolous: thickness of the driveway slab, diameter of a showerhead, height of the


Figure 3.21 Residual plots for the adsorption data of Examples 3.15 and 3.16

doorknob on the front door, and so on. It turns out that if 19 predictors are included (one less than the number of observations), then it will virtually always be the case that R² = 1. So the goal here is not simply to obtain a set of predictors for which R² is large, but to obtain a large value using relatively few predictors while excluding those of marginal significance. We will further discuss this issue in Chapter 11.

Creating New Predictors from Existing Ones

The ability of predictors x₁, . . . , xₖ to explain variation in y can often be considerably enhanced by having one or more predictors that are mathematical functions of the remaining predictors. As an example, let y denote the yield of a particular product resulting from a certain chemical reaction. Usually y will depend on both x₁ = reaction temperature and x₂ = reaction pressure. It may be the case, though, that using only these two predictors results in an R² value much less than 1, whereas including a third predictor x₃ = x₁x₂ considerably increases R² (adding a predictor cannot possibly decrease R²). Alternatively, using predictors x₁ and x₂ along with the two additional predictors x₃ = x₁² and x₄ = x₂² may result in most of the observed y variation being explained. Or perhaps all three of the additional predictors x₃ = x₁x₂, x₄ = x₁², and x₅ = x₂² will give very impressive results. The first new variable x₃ is called an interaction predictor, and the other two are quadratic predictors (interpretations will be given in Chapter 11); the fit with all five predictors is called the full quadratic or complete second-order relationship. In fact, we used a quadratic predictor in the previous section when fitting a quadratic function to bivariate data. The two predictors there were x₁ = x and x₂ = x², implying that quadratic (more generally, polynomial) regression is a special case of multiple regression.

Example 3.17  Researchers carried out a study to see how y = ultimate deflection, d (mm), of reinforced ultrahigh toughness cementitious composite beams was influenced by x₁ = shear span ratio and x₂ = splitting tensile strength (MPa), resulting in the

accompanying data ("Shear Behavior of Reinforced Ultrahigh Toughness Cementitious Composite Beams without Transverse Reinforcement," J. Mater. Civ. Engr., 2012: 1283–1294):

x₁      x₂      x₁x₂       y
2.04 3.55 7.2420 3.11
2.04 6.07 12.3828 3.26
3.06 3.55 10.8630 3.89
3.06 6.07 18.5742 10.25
4.08 3.55 14.4840 3.11
4.08 6.16 25.1328 13.48
2.06 3.62 7.4572 3.94
2.06 6.16 12.6896 3.53
3.08 3.62 11.1496 3.36
3.08 5.89 18.1412 6.49
4.11 3.62 14.8782 2.72
4.11 5.89 24.2079 12.48
2.01 6.18 12.4218 2.82
3.02 6.18 18.6636 5.19
4.03 6.18 24.9054 8.04

Fitting a + b₁x₁ + b₂x₂ results in

ŷ = −9.251 + 2.322x₁ + 1.544x₂,    R² = .576

Including an interaction predictor yields

ŷ = 17.279 − 6.368x₁ − 3.658x₂ + 1.707x₁x₂,    R² = .825

Adding in the two quadratic predictors gives

ŷ = −34.323 − 6.568x₁ + 19.347x₂ + 1.655x₁x₂ + .058x₁² − 2.359x₂²,    R² = .845
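The three fits take one line each in R; the data frame below simply reproduces the table (the names are ours):

# Ultimate deflection data: x1 = shear span ratio, x2 = splitting tensile strength
beams <- data.frame(
  x1 = c(2.04, 2.04, 3.06, 3.06, 4.08, 4.08, 2.06, 2.06, 3.08, 3.08,
         4.11, 4.11, 2.01, 3.02, 4.03),
  x2 = c(3.55, 6.07, 3.55, 6.07, 3.55, 6.16, 3.62, 6.16, 3.62, 5.89,
         3.62, 5.89, 6.18, 6.18, 6.18),
  y = c(3.11, 3.26, 3.89, 10.25, 3.11, 13.48, 3.94, 3.53, 3.36, 6.49,
        2.72, 12.48, 2.82, 5.19, 8.04))
fit1 <- lm(y ~ x1 + x2, data = beams)                              # R2 = .576
fit2 <- lm(y ~ x1 + x2 + x1:x2, data = beams)                      # with interaction; R2 = .825
fit3 <- lm(y ~ x1 + x2 + x1:x2 + I(x1^2) + I(x2^2), data = beams)  # full quadratic; R2 = .845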

General Additive Fitting

The relationships described heretofore in this section impose quite a bit of structure on how y depends on the explanatory variables. A more flexible type of relation is

ŷ = a + f₁(x₁) + f₂(x₂) + ⋯ + fₖ(xₖ)

where the forms of f₁(·), . . . , and fₖ(·) are left unspecified. The statistical package R, among others, will execute this general additive fit by calculating a and the individual fᵢ(·)'s; one method for carrying out this latter task is based on the LOWESS technique described in Section 3.4.


Example 3.18  The ethanol data set stored in an R package contains 88 observations on variables x₁, x₂, and y obtained in an experiment in which ethanol was burned in a single-cylinder automobile test engine. The variables are

x₁ = C = compression ratio of the engine
x₂ = E = equivalence ratio at which the engine was run (a measure of richness of the air/ethanol mix)
y = NOx = concentration of nitric oxide and nitrogen dioxide in engine exhaust, normalized in a certain manner

Figure 3.22 shows a scatterplot matrix of the data; it appears that there is a substantial nonlinear relation between y and x₂. We asked R to obtain a general additive fit using LOWESS with a span of .75 (closest 75% of the data values) for each of the two component functions f₁(x₁) and f₂(x₂). Graphs of these two functions appear in Figure 3.23. Sure enough, the second graph is highly nonlinear, and there is also some nonlinearity in the first graph.


Figure 3.22 Scatterplot matrix of the ethanol data from R



Figure 3.23 R graphs of the component functions resulting from a general additive fit to the
ethanol data

The R² value for this fit was .873, whereas the value for the linear fit a + b₁x₁ + b₂x₂ was only .01. The reported value of the constant term a was 1.957, and the predicted value of NOx when C = 9.0 and E = 1.0 was given by R as

ŷ = 1.957 + f₁(9.0) + f₂(1.0) = 2.743
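One way to request such a fit in R is through the gam package, whose lo() smoother matches the lo(C, 0.75) and lo(E, 0.75) labels of Figure 3.23; we assume here that the ethanol data frame is the one distributed with the lattice package:

# General additive fit with LOWESS-type components of span .75
library(lattice)   # assumed source of the ethanol data set
library(gam)
data(ethanol)
fit.gam <- gam(NOx ~ lo(C, span = .75) + lo(E, span = .75), data = ethanol)
predict(fit.gam, data.frame(C = 9.0, E = 1.0))   # about 2.74
plot(fit.gam)   # graphs of the fitted component functions, as in Figure 3.23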

Section 3.5 Exercises

35. Recently there has been increased use of stainless steel claddings in industrial settings. Claddings are used to finish the exterior walls of a building and help weatherproof the structure. To ensure the quality of claddings, it is essential to know how welding parameters impact the cladding process. The authors of "Mathematical Modeling of Weld Bead Geometry, Quality, and Productivity for Stainless Steel Claddings Deposited by FCAW" (J. Mater. Engr. Perform., 2012: 1862–1872) investigated how y = deposition rate was influenced by x₁ = wire feed rate (Wf, in m/min) and x₂ = welding speed (S, in cm/min). The following 22 observations correspond to the experiment condition where applied voltage was less than 30 V:

y:  2.718  3.881  2.773  3.924  2.740  3.870
x₁: 17.0   10.0    7.0   10.0    7.0   10.0
x₂: 30     30     50     50     30     30

y:  2.847  3.901  2.204  4.454  3.324  3.319
x₁:  7.0   10.0    5.5   11.5    8.5    8.5
x₂: 50     50     40     40     40     20

y:  3.423  3.242  3.385  3.420  3.380  3.402
x₁:  8.5    8.5    8.5    8.5    8.5    8.5
x₂: 60     40     40     40     40     40

y:  3.382  3.388  3.398  3.404
x₁:  8.5    8.5    8.5    8.5
x₂: 40     40     40     40

a. A least squares fit of y = a + b₁x₁ + b₂x₂ to this data gave a = .0558, b₁ = .3749, and b₂ = .0028. What value of deposition rate would you predict when wire feed rate = 11.5 and welding speed = 40? What is the value of the corresponding residual?
b. Residual and total sums of squares are .03836 and 5.1109, respectively. What proportion of observed variation in deposition rate can be attributed to the stated approximate relationship between deposition rate and the two predictor variables?

36. The accompanying Minitab regression output is based on data that appeared in the article "Application of Design of Experiments for Modeling


Surface Roughness in Ultrasonic Vibration Turning" (J. of Engr. Manuf., 2009: 641–652). The response variable is surface roughness (μm), and the independent variables are vibration amplitude (μm), depth of cut (mm), feed rate (mm/rev), and cutting speed (m/min), respectively.

The regression equation is
Ra = -0.972 - 0.0312a + 0.557d + 18.3f + 0.00282v

Predictor    Coef       SE Coef    T       P
Constant    -0.9723     0.3923    -2.48    0.015
a           -0.03117    0.01864   -1.67    0.099
d            0.5568     0.3185     1.75    0.084
f           18.2602     0.7536    24.23    0.000
v            0.002822   0.003977   0.71    0.480

S = 0.822059   R-Sq = 88.6%   R-Sq(adj) = 88.0%

Source           DF   SS       MS       F        P
Regression        4   401.02   100.25   148.35   0.000
Residual Error   76    51.36     0.68
Total            80   452.38

a. Predict the value of surface roughness when amplitude is 10, depth of cut is .5, feed rate is .25, and cutting speed is 50.
b. What proportion of observed variation in surface roughness can be explained by the approximate relationship between surface roughness and the four predictors?

37. Snowpacks contain a wide spectrum of pollutants that may represent environmental hazards. The article "Atmospheric PAH Deposition: Deposition Velocities and Washout Ratios" (J. of Envir. Engr., 2002: 186–195) focused on the deposition of polyaromatic hydrocarbons. The authors proposed a multiple regression function for relating deposition over a specified time period (y, in μg/m²) to two rather complicated predictors x₁ (μg·sec/m³) and x₂ (μg/m²), defined in terms of PAH air concentrations for various species, total time, and total amount of precipitation. Here is data on the species fluoranthene and corresponding output fitting y = a + b₁x₁ + b₂x₂ from the R software:

obs     x1         x2          flth
 1      92017     .0026900     278.78
 2      51830     .0030000     124.53
 3      17236     .0000196      22.65
 4      15776     .0000360      28.68
 5      33462     .0004960      32.66
 6     243500     .0038900     604.70
 7      67793     .0011200      27.69
 8      23471     .0006400      14.18
 9      13948     .0004850      20.64
10       8824     .0003660      20.60
11       7699     .0002290      16.61
12      15791     .0014100      15.08
13      10239     .0004100      18.05
14      43835     .0000960      99.71
15      49793     .0000896      58.97
16      40656     .0026000     172.58
17      50774     .0009530      44.25

Coefficients:
             Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)  -33.46       1.490e+01    -2.246    0.0413
x1           2.055e-03    2.945e-04     6.977    6.48e-06
x2           29836        1.365e+04     2.185    0.0464

Residual standard error: 44.28 on 14 degrees of freedom
Multiple R-squared: 0.9234, Adjusted R-squared: 0.9125
F-statistic: 84.39 on 2 and 14 DF, p-value: 1.546e-08

Analysis of Variance Table
Response: flth
           Df   Sum Sq   Mean Sq   F value   Pr(>F)
x1          1   321625   321625    164.011   4.04e-09
x2          1     9364     9364      4.775   0.04637
Residuals  14    27454     1961

a. Interpret the value of the coefficient of multiple determination.
b. Predict the value of deposition when x₁ = 20,000 and x₂ = .001.
c. Since b₂ = 29,836, is it legitimate to conclude that if x₂ increases by 1 unit while the values of the other predictors remain fixed, deposition would increase by 29,836 units? Explain your reasoning.

38. An investigation of a die-casting process resulted in the accompanying data on x₁ = furnace temperature, x₂ = die close time, and y = temperature difference on the die surface ("A Multiple-Objective Decision-Making Approach for Assessing Simultaneous Improvement in Die Life and Casting Quality in a Die Casting Process," Quality Engr., 1994: 371–383).

x₁: 1250  1300  1350  1250  1300
x₂:    6     7     6     7     6
y:    80    95   101    85    92

x1: 1250 1300 1350 1350
x2:    8    8    7    8
y:    87   96  106  108

Use a statistical computer package to fit y = a + b1x1 + b2x2 using the least squares method. Be sure to specify all function coefficients. Also include the coefficient of multiple determination and interpret its value.

39. Use of sucrose as a carbon source for the production of chemicals is uneconomical. Beet molasses is a readily available and lower-priced substitute. The article "Optimization of the Production of β-Carotene from Molasses by Blakeslea trispora" (J. of Chem. Tech. and Biotech. 2002: 933–943) carried out a multiple regression analysis to relate the dependent variable y = amount of β-carotene (g/dm³) to the three predictors amount of lineolic acid, amount of kerosene, and amount of antioxidant (all g/dm³).

Obs  Linoleic  Kerosene  Antiox  Betacaro
 1     30.00     30.00   10.00    0.7000
 2     30.00     30.00   10.00    0.6300
 3     30.00     30.00   18.41    0.0130
 4     40.00     40.00    5.00    0.0490
 5     30.00     30.00   10.00    0.7000
 6     13.18     30.00   10.00    0.1000
 7     20.00     40.00    5.00    0.0400
 8     20.00     40.00   15.00    0.0065
 9     40.00     20.00    5.00    0.2020
10     30.00     30.00   10.00    0.6300
11     30.00     30.00    1.59    0.0400
12     40.00     20.00   15.00    0.1320
13     40.00     40.00   15.00    0.1500
14     30.00     30.00   10.00    0.7000
15     30.00     46.82   10.00    0.3460
16     30.00     30.00   10.00    0.6300
17     30.00     13.18   10.00    0.3970
18     20.00     20.00    5.00    0.2690
19     20.00     20.00   15.00    0.0054
20     46.82     30.00   10.00    0.0640

A request to the SAS package to fit a + b1x1 + b2x2 + b3x3 yielded the following output:

Dependent Variable: beta
                     Sum of        Mean       F    Pr >
Source       DF     Squares      Square   Value       F
Model         3  0.02352595  0.00784198    0.09  0.9648
Error        16  1.40326270  0.08770392
C. Total     19  1.42678865

R-Square   Coeff Var   Root MSE   beta Mean
0.016489    102.0515   0.296148    0.290195

                              Standard      t    Pr >
Parameter        Estimate        Error  Value     |t|
Intercept    0.4010752535   0.38164661   1.05  0.3089
lino         0.0011095713   0.00801331   0.14  0.8916
kero        -.0032850626    0.00801331  -0.41  0.6873
anti        -.0045615514    0.01602662  -0.28  0.7796

A request to the SAS package to fit a function with predictors x1, x2, and x3 as well as quadratic and interaction predictors yielded the following output:

Dependent Variable: beta
                     Sum of        Mean       F    Pr >
Source       DF     Squares      Square   Value       F
Model         9  1.40762342  0.15640260   81.61  <.0001
Error        10  0.01916523  0.00191652
C. Total     19  1.42678865

R-Square   Coeff Var   Root MSE   beta Mean
0.986568    15.08576   0.043778    0.290195

                              Standard       t    Pr >
Parameter        Estimate        Error   Value     |t|
Intercept    -2.368673650   0.25095313   -9.44  <.0001
lino          0.115946557   0.00896686   12.93  <.0001
kero          0.048329827   0.00896686    5.39  0.0003
anti          0.125140001   0.01622284    7.71  <.0001
lino*kero     0.000116125   0.00015478    0.75  0.4704
lino*anti     0.000820250   0.00030956    2.65  0.0243
kero*anti     0.001002750   0.00030956    3.24  0.0089
lino*lino    -0.002108721   0.00011530  -18.29  <.0001
anti*anti    -0.009219578   0.00046120  -19.99  <.0001
kero*kero    -0.001085436   0.00011530   -9.41  <.0001

a. What is the coefficient of multiple determination for each fitted function?
b. For the fit using a + b1x1 + b2x2 + b3x3, what is the predicted value of β-carotene when lineolic acid = 40, kerosene = 20, and antioxidant = 5? What is the corresponding residual?
c. For the fit with predictors x1, x2, and x3 as well as quadratic and interaction predictors, what is the predicted value of β-carotene when lineolic acid = 40, kerosene = 20, and antioxidant = 5? What is the corresponding residual?
d. Note the difference in magnitude of the residuals you just computed for the two regressions. Explain how it is reasonable for one of these to have a smaller residual magnitude given the difference in coefficients of multiple determination.
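Although the output above is from SAS, the same two fits can be reproduced in R. Here is a minimal sketch, assuming the data table has been read into a data frame d39 with columns lino, kero, anti, and beta (the names are illustrative):

# First-order fit: beta = a + b1*lino + b2*kero + b3*anti
fit1 <- lm(beta ~ lino + kero + anti, data = d39)
# Second-order fit: all two-way interactions plus squared terms
fit2 <- lm(beta ~ (lino + kero + anti)^2 + I(lino^2) + I(kero^2) + I(anti^2),
           data = d39)
summary(fit1)$r.squared   # compare with R-Square = 0.016489
summary(fit2)$r.squared   # compare with R-Square = 0.986568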


40. The collapse of reinforced concrete buildings during earthquakes can result in significant loss of property and life. Often such collapses are caused by concrete column axial failure. The authors of "Rotation-Based Shear Failure Model for Lightly Confined RC Columns" (J. Struct. Engr., 2012: 1267–1278) introduced a model for the deformation at onset of shear failure for a class of reinforced concrete columns. As part of the study, the authors investigated how y = maximum sustained shear (Vmax, in kN) is influenced by x1 = transverse-reinforcement yield stress (MPa) and x2 = concrete cylinder compressive strength (MPa).

y:  314.9  359.0  300.7  271.3  266.9
x1:   469    469    469    400    400
x2: 21.10  21.10  20.90  25.60  25.60

y:  240.2  231.3  315.8  338.1  355.9
x1:   400    400    400    400    400
x2: 33.10  33.10  25.70  27.60  27.60

y:  378.1  101.9  110.8  103.2  101.9
x1:   400     46     46    365    365
x2: 25.70   4.65   4.34  23.00  20.20

y:  120.5  111.6  219.3  213.1
x1:   365    365    392    392
x2: 23.00  20.20  30.70  30.70

Use a statistical computer package to fit (a) a + b1x1 + b2x2, (b) a + b1x1 + b2x2 + b3x1x2, and (c) a + b1x1 + b2x2 + b3x1x2 + b4x1² + b5x2². Be sure to specify all function coefficients. For each function, also include the coefficient of multiple determination and interpret its value.

41. A new surface finishing method has been developed for nanofinishing flat and three-dimensional workpiece surfaces. The authors of "Parametric Analysis of an Improved Ball End Magnetorheological Finishing Process" (J. Engr. Manuf., 2012: 1550–1563) investigated how y = percent change in surface roughness was influenced by x1 = rotational speed of tool core (N, in r/min), x2 = magnetizing current (I, in A), and x3 = working gap (D, in mm).

y:  47.68  39.80  80.69  34.12  45.10
x1:   400    500    500    600    500
x2:   5.0    2.3    4.0    5.0    4.0
x3:  2.00   1.50   0.66   2.00   1.50

y:  46.51  69.63  63.62  37.18  36.75
x1:   500    500    600    668    400
x2:   4.0    5.7    5.0    4.0    3.0
x3:  1.50   1.50   1.00   1.50   2.00

y:  49.94  45.86  70.64  54.75  24.97
x1:   500    500    400    600    600
x2:   4.0    4.0    5.0    3.0    3.0
x3:  1.50   1.50   1.00   1.00   2.00

y:  49.38  59.85  55.18  32.05  44.94
x1:   500    400    332    500    500
x2:   4.0    3.0    4.0    4.0    4.0
x3:  1.50   1.00   1.50   2.34   1.50

Use a statistical computer package to fit (a) a + b1x1 + b2x2 + b3x3, (b) a + b1x1 + b2x2 + b3x3 + b4x1x2 + b5x1x3 + b6x2x3, and (c) a + b1x1 + b2x2 + b3x3 + b4x1x2 + b5x1x3 + b6x2x3 + b7x1² + b8x2² + b9x3². Be sure to specify all function coefficients. For each fit, also include the coefficient of multiple determination and interpret its value.
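As a computing hint for Exercises 40 and 41, any statistics package will fit these functions. One possible R sketch for the three fits in Exercise 41, assuming a data frame d41 with columns y, x1, x2, and x3 (illustrative names):

fitA <- lm(y ~ x1 + x2 + x3, data = d41)
fitB <- lm(y ~ (x1 + x2 + x3)^2, data = d41)   # adds the three two-way interactions
fitC <- lm(y ~ (x1 + x2 + x3)^2 + I(x1^2) + I(x2^2) + I(x3^2), data = d41)
# coefficients and coefficients of multiple determination for each fit
lapply(list(fitA, fitB, fitC), coef)
sapply(list(fitA, fitB, fitC), function(m) summary(m)$r.squared)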

3.6 Joint Distributions 


In Chapter 1, we presented several different ways to display and summarize sample data
consisting of observations on a single quantitative variable x. These ideas were then
extended to a population or process distribution consisting of a density function (when
x is continuous) or mass function (for discrete x) and the corresponding graph. In this
chapter, we have discussed bivariate and multivariate sample data. Now we consider
distributions for two or more variables in a population or ongoing process.

Distributions for Two Variables


Let’s initially focus on the case of two numerical variables x and y. For example, x might
be the time a customer spends in a grocery checkout line (a continuous variable) and


y the number of items purchased by the customer (a discrete variable). In practice, x


and y are usually of the same type, either both discrete or both continuous. Suppose first
that x and y are both discrete. Then their “joint” distribution is specified by a joint mass
function f(x, y) satisfying

f(x, y) ≥ 0        Σ f(x, y) = 1    (the sum taken over all (x, y) pairs)

Often there is no nice formula for f  (x, y). When there are only a few possible values of x
and y, the mass function is most conveniently displayed in a rectangular table.

Example 3.19 A certain market has both an express checkout register and a superexpress register.
Let x denote the number of customers queueing at the express register at a particular
weekday time, and let y denote the number of customers in line at the superexpress
register at that same time. Suppose that the joint mass function is as given in the ac­
companying table:

y
0 1 2 3
0 .08 .07 .04 .00
1 .06 .15 .05 .04
x 2 .05 .04 .10 .06
3 .00 .03 .04 .07
4 .00 .01 .05 .06

According to the table, f(x, y) > 0 for only 17 (x, y) pairs. Just as in the case of a single variable, individual proportions from the mass function can be added to yield other proportions of interest. For example, (x, y) pairs for which the number of customers at the express register is equal to the number of customers at the other register are (0, 0), (1, 1), (2, 2), and (3, 3), so

(long-run proportion of times for which x = y) = f(0, 0) + f(1, 1) + f(2, 2) + f(3, 3)
                                               = .08 + .15 + .10 + .07
                                               = .40

The total number of customers at these two registers will be 2 if (x, y) = (2, 0), (1, 1), or (0, 2), so

(long-run proportion of times for which x + y = 2) = f(2, 0) + f(1, 1) + f(0, 2)
                                                   = .05 + .15 + .04
                                                   = .24
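Calculations like these are easy to automate. A minimal R sketch, storing the joint mass function of Example 3.19 as a matrix (rows indexed by x = 0, . . . , 4 and columns by y = 0, . . . , 3):

f <- matrix(c(.08, .07, .04, .00,
              .06, .15, .05, .04,
              .05, .04, .10, .06,
              .00, .03, .04, .07,
              .00, .01, .05, .06),
            nrow = 5, byrow = TRUE, dimnames = list(x = 0:4, y = 0:3))
xv <- as.numeric(rownames(f)); yv <- as.numeric(colnames(f))
sum(f[outer(xv, yv, "==")])       # long-run proportion with x = y: .40
sum(f[outer(xv, yv, "+") == 2])   # long-run proportion with x + y = 2: .24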


Suppose we are presented with the joint distribution but are interested only in the
distribution of x alone: the marginal distribution of x. In Example 3.19, we might wish
to know f1(0), f1(1), f1(2), f1(3), and f1(4), the long-run proportions for various values of
the first variable, x. Consider x = 1, which occurs when (x, y) = (1, 0), (1, 1), (1, 2), or (1, 3). Thus

f1(1) = long-run proportion of the time that x = 1
      = f(1, 0) + f(1, 1) + f(1, 2) + f(1, 3)
      = .06 + .15 + .05 + .04 = .30

This is nothing more than the sum of proportions in the x 5 1 row of the joint mass
table. Adding proportions in the other rows gives the entire marginal distribution of x,
whereas adding proportions in the various columns gives the marginal distribution of y,
denoted by f2(y):
x: 0 1 2 3 4 y: 0 1 2 3
f1(x): .19 .30 .25 .14 .12 f2(y): .19 .30 .28 .23
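In R these marginal distributions are just the row and column sums of the joint table; continuing the sketch from Example 3.19:

rowSums(f)   # marginal mass function f1(x), x = 0, ..., 4
colSums(f)   # marginal mass function f2(y), y = 0, ..., 3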
Now let’s consider the case of two continuous random variables. The distribution for a single continuous variable x is specified by a density function f(x) that satisfies f(x) ≥ 0 and ∫_{-∞}^{∞} f(x) dx = 1. The graph of f(x) is the density curve, and various proportions correspond to areas under this curve that are obtained by integrating the density function. Extending these ideas to two variables requires that we use multivariate calculus, in particular multiple integration. The joint distribution of x and y is specified by a joint density function f(x, y) that satisfies

f(x, y) ≥ 0        ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x, y) dx dy = 1

The graph of f  (x, y) is a surface in three-dimensional space. The second condition


indicates that the total volume under this density surface is 1.0. Suppose that x and
y are reaction times by an individual to two different stimuli (e.g., two different con­
figurations of brake lights) and that we wish to calculate the proportion of individu­
als for which both .5 ≤ x ≤ 1 and .5 ≤ y ≤ 1. Letting A = {(x, y): .5 ≤ x ≤ 1, .5 ≤ y ≤ 1}, a rectangular region in the x–y plane, the desired proportion is the double integral ∫∫_A f(x, y) dx dy; this is just the volume underneath the density surface
that lies above the region A, as illustrated in Figure 3.24 (p. 154). Even though the
region of integration is a rectangle, the integral may be quite difficult to compute
if the integrand (density function) is complicated, perhaps requiring a numerical
integration of some sort. When A is not a rectangle, the integration will typically be
even more difficult to carry out. We are not going to do multiple integration in this
text; we simply want you to be acquainted with the basic ideas of continuous distributions.
To see examples of calculations, please consult one of the chapter references.
In the same way that in the discrete case the marginal distribution for either one of
the variables is obtained by summing the joint mass function over values of the other
variable [the row or column sums from a rectangular table of f  (x, y)], the marginal den­
sity function f1(x) is obtained by integrating the joint density with respect to y, and f2(y)
results from integrating f (x, y) with respect to x.


Figure 3.24 Volume representing the proportion of (x, y) in the region A [the figure shows the density surface z = f(x, y) with A a shaded rectangle in the x–y plane beneath it]

Correlation and the Bivariate Normal Distribution


Let h(x, y) represent some particular function of x and y, such as h(x, y) = x + y or h(x, y) = xy. Paralleling the definition of a mean value in the case of a single variable, the mean (or expected) value of h(x, y) is a weighted average of h(x, y), with the weights given by the joint mass or density function:

μ_h(x, y) = Σ Σ h(x, y) f(x, y)           x, y discrete
μ_h(x, y) = ∫∫ h(x, y) f(x, y) dx dy      x, y continuous

Let μx and μy denote the mean values of x and y, respectively. Then the function

h(x, y) = (x − μx)(y − μy)

is a product of x and y deviations from their mean values [like (x − x̄)(y − ȳ) in our discussion of sample correlation]. The mean value of this product of deviations is called the covariance between x and y, and the population correlation coefficient is

ρ = covariance(x, y) / (σx σy)

where σx and σy are the x and y standard deviations, respectively. This definition of ρ is very similar to the definition of the sample correlation coefficient r given in Section 3.2. You need not worry about calculating ρ, but we do want you to know that it exists and shares many properties with r. In particular,
1. ρ does not depend on the x or y units of measurement.
2. −1 ≤ ρ ≤ 1
3. The closer ρ is to +1 or −1, the stronger the linear relationship between the two variables.
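For a discrete joint distribution these definitions reduce to finite sums, so ρ can be computed directly. A sketch for the joint mass function of Example 3.19, reusing the matrix f and value vectors xv and yv from the earlier sketch:

px <- rowSums(f); py <- colSums(f)            # marginal distributions
mux <- sum(xv * px); muy <- sum(yv * py)      # mean values of x and y
sdx <- sqrt(sum((xv - mux)^2 * px))           # standard deviation of x
sdy <- sqrt(sum((yv - muy)^2 * py))           # standard deviation of y
covxy <- sum(outer(xv - mux, yv - muy) * f)   # mean value of (x - mux)(y - muy)
covxy / (sdx * sdy)                           # population correlation coefficient rho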


One of the most frequently occurring bivariate distributions in statistics generalizes


the univariate normal distribution introduced in Section 1.4. The bivariate normal
joint density function is given by
f(x, y) = (1 / (2π σx σy √(1 − ρ²))) exp{ −(1 / (2(1 − ρ²))) [((x − μx)/σx)² − 2ρ((x − μx)/σx)((y − μy)/σy) + ((y − μy)/σy)²] }

for −∞ < x < ∞ and −∞ < y < ∞.

One interesting example of the use of this joint distribution appears in the article “Analysis
of Size-Grouped Potato Yield Data Using a Bivariate Normal Distribution of Tuber Size
and Weight” (J. of Agric. Science, 1993: 193–198). Figure 3.25 is a three-dimensional graph
of this function for specified parameter values. The function cannot be easily integrated,
so tables or numerical methods must be employed to calculate various proportions of in­
terest. In Chapter 11, we consider an inferential procedure for drawing conclusions about
 based on assuming that the sample was selected from a bivariate normal distribution.

Figure 3.25 Graph of the bivariate normal density function when μx = 10, σx = 1, μy = 25, σy = 2, and ρ = .5 [a three-dimensional surface plot; the vertical density axis runs from 0 to .10]
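One convenient numerical method for such proportions is the pmvnorm() function in R's mvtnorm package, which integrates a multivariate normal density over a rectangle. A sketch for the parameter values of Figure 3.25 (the rectangle chosen here is purely illustrative):

library(mvtnorm)   # assumed installed
mu <- c(10, 25)
# covariance matrix built from sigma_x = 1, sigma_y = 2, rho = .5
Sigma <- matrix(c(1, .5 * 1 * 2,
                  .5 * 1 * 2, 4), nrow = 2)
# proportion of (x, y) with 9 <= x <= 11 and 23 <= y <= 27
pmvnorm(lower = c(9, 23), upper = c(11, 27), mean = mu, sigma = Sigma)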

The Case of Independence


In general, it may be difficult to find a reasonable joint distribution for two variables x
and y. The one situation in which this task is relatively straightforward is when x and
y are independent. Intuitively, independence means that knowing the value of x does not
change the distribution of y (equivalently, the distribution of y is the same for each dif­
ferent x value) and knowing the value of y has no bearing on the distribution of x. Look


back at the joint distribution table for x and y in Example 3.19. Notice that if x = 0, then y = 0 is a possibility but not y = 3. However, if x = 4, then y = 0 is excluded, whereas y = 3 is possible. So the distribution of one variable does depend on the value of the other, and the variables are therefore not independent.
Let f1(x) and f2(y) denote the marginal distributions of x and y, respectively.
Frequently, an investigator has enough knowledge of the situation under study to
assume independence. When this is the case, the joint mass or density function
must satisfy
(∗)        f(x, y) = f1(x) · f2(y)

For the variables of Example 3.19 to be independent, every entry in the joint table would
have to be the product of the row and column totals. Very importantly, once indepen­
dence is assumed, one has only to select appropriate distributions for x and y separately
and then use (∗) to create the joint distribution.

Example 3.20 A business is planning to purchase two different new vehicles, a van and a sedan. Let
x denote the number of major defects on the first vehicle, and y be the number of
major defects on the second one. Because the vehicles come from different manu­
facturers and assembly lines, an assumption of independence is reasonable. Suppose
x has a Poisson distribution with λ = 2 and y has a Poisson distribution with λ = 1.5 (the marginal distributions). Then

f(x, y) = [e^{−2} 2^x / x!][e^{−1.5} (1.5)^y / y!]        x = 0, 1, 2, . . . ; y = 0, 1, 2, . . .

The long-run proportion of such purchases that would result in at most one major defect for the two vehicles combined (x + y ≤ 1) is then f(0, 0) + f(0, 1) + f(1, 0) = .136.
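The figure .136 is easy to verify with R's built-in Poisson mass function; a minimal sketch under the stated independence assumption:

fxy <- function(x, y) dpois(x, 2) * dpois(y, 1.5)   # joint mass via independence
fxy(0, 0) + fxy(0, 1) + fxy(1, 0)                   # about .136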

Independence was introduced in Chapter 1 in connection with the binomial distri­


bution. The concept will be considered further when we discuss probability in Chapter 5.
Variables x and y that have a bivariate normal distribution will be independent if ρ = 0, since then the joint density can be written as a product of two univariate normal densities. If the joint distribution is not bivariate normal, however, then ρ = 0 does not
imply independence. Zero correlation means only that there is no linear relationship,
whereas independence means that there is no relationship of any sort.

More Than Two Variables


Suppose that k variables x1, x2, . . . , xk are under consideration. We might, for example, have
a system with k = 4 components and let xi be the useful lifetime of component i (i = 1, 2, 3, 4). Properties satisfied by a joint mass or density function f(x1, . . . , xk) are analogous to those
in the bivariate case. It can be quite difficult to specify a reasonable joint distribution. The
multivariate normal distribution is frequently used when the variables are continuous. How­
ever, its density function is rather complicated. If it can be assumed that the variables are
independent, then the joint distribution is again the product of the marginal distributions.


Section 3.6 Exercises

42. A large insurance agency provides services to a number of customers who have purchased both a homeowner's policy and an automobile policy. For each type of policy, a deductible amount must be specified. Let x denote the homeowner's deductible amount and y denote the automobile deductible amount for a customer who has both types of policies. The joint mass function of x and y is as follows:

                     y
f(x, y)        0     250     500
x    200     .20     .10     .20
     500     .05     .15     .30

a. What proportion of customers have $500 deductible amounts for both types of policies?
b. What proportion of customers have both deductible amounts less than $500?
c. What is the marginal mass function of x? What is the marginal mass function of y?

43. The joint distribution of the number of cars (x) and the number of buses (y) per signal cycle at a particular left turn lane is displayed in the accompanying table:

                     y
f(x, y)        0       1       2
     0      .025    .015    .010
     1      .050    .030    .020
x    2      .125    .075    .050
     3      .150    .090    .060
     4      .100    .060    .040
     5      .050    .030    .020

a. In what proportion of cycles will there be exactly one car and one bus?
b. In what proportion of cycles will there be at most one vehicle of each type?
c. In what proportion of cycles will the number of cars be the same as the number of buses?
d. What is the mean value of the number of cars per signal cycle?
e. If a bus occupies three vehicle spaces and a car occupies just one, what is the mean value of the number of vehicle spaces occupied during a signal cycle? Hint: Let h(x, y) = x + 3y.

44. Let x denote the number of major defects for a particular piece of machinery and y be the number of cosmetic flaws on this same piece. Suppose that x and y are independent variables with f1(x) = .80, .15, and .05 for x = 0, 1, and 2, respectively, and f2(y) = .50, .25, .15, .08, and .02 for y = 0, 1, . . . , 4, respectively.
a. What is the joint mass function of these two variables?
b. What proportion of these machines will have no major defects or cosmetic flaws? What proportion will have at least one defect or flaw?
c. For what proportion of these machines will the number of cosmetic flaws exceed the number of major defects?

45. Refer to Exercise 42. Compute the covariance between x and y and then the value of the population correlation coefficient. Do these two variables appear to be strongly related? Explain.

Supplementary Exercises
46. Orthotropic steel bridge decks with closed ribs have been widely used in suspension bridges, cable-stayed bridges, and urban elevated expressways due to their overall light weights, ease of construction, and high load-carrying capacities. In the article "Fatigue Evaluation of Rib-to-Deck Welded Joints of Orthotropic Steel Bridge Deck" (J. Bridge Engr., 2011: 492–499), researchers examine the physical properties of 22 bridge specimens. Each specimen was attached to a fatigue testing apparatus. Fatigue life was determined as the number of cycles (in millions) at the end of the fatigue test. For each specimen, the corresponding stress range (MPa) was also recorded.


Stress:   121     71    108     99     77
Cycles: 1.257 11.250  2.240  4.030  6.650

Stress:    70     79     56     89     75
Cycles: 6.970  6.430 19.140  3.950  9.000

Stress:    95     90    110     77     64
Cycles: 2.290  4.470  2.150 10.490 19.260

Stress:    90     99     91     91     82
Cycles: 4.120  1.800  2.190  3.150  5.800

Stress:    75     79
Cycles: 5.130  5.970

a. Would you fit a straight line to the data and use it as a basis for predicting y = stress range from x = number of cycles? Why or why not?
b. Find a transformation that produces an approximate linear relationship between the transformed values. Then fit a line to the transformed data and use it to obtain an equation that describes approximately the relationship between the untransformed variables.

47. An investigation of the relationship between the temperature (°F) at which a material is treated and the strength of the material involved an experiment in which four different strength observations were obtained at each of the temperatures 100, 110, 120, 130, and 140. A scatterplot of the data showed a substantial linear pattern. The least squares line fit to the data had a slope of .500 and a vertical intercept of −25.000.
a. Interpret the value of the slope.
b. The largest strength value when temperature was 120 was 40 and the smallest was 29. What value of strength would you have predicted for this temperature, and what are the values of the residuals for the two aforementioned observations? Why do these residuals have different signs?
c. The values of SSTo and SSResid were 1060.0 and 390.0, respectively. Calculate and interpret the coefficient of determination.

48. As the air temperature drops, river water becomes supercooled and ice crystals form. Such ice can significantly affect the hydraulics of a river. The article "Laboratory Study of Anchor Ice Growth" (J. of Cold Regions Engr., 2001: 60–66) described an experiment in which ice thickness (mm) was studied as a function of elapsed time (hr) under specified conditions. The following data was read from a graph in the article: n = 33; x = .17, .33, .50, .67, . . . , 5.50; y = .50, 1.25, 1.50, 2.75, 3.50, 4.75, 5.75, 5.60, 7.00, 8.00, 8.25, 9.50, 10.50, 11.00, 10.75, 12.50, 12.25, 13.25, 15.50, 15.00, 15.25, 16.25, 17.25, 18.00, 18.25, 18.15, 20.25, 19.50, 20.00, 20.50, 20.60, 20.50, 19.80.
a. The r² value resulting from a least squares fit is .977. Interpret this value and comment on the appropriateness of assuming an approximate linear relationship.
b. The residuals, listed in the same order as the x values, are
−1.03 −0.92 −1.35 −0.78 −0.68 −0.11 0.21 −0.59 0.13 0.45 0.06 0.62 0.94 0.80 −0.14 0.93 0.04 0.36 1.92 0.78 0.35 0.67 1.02 1.09 0.66 −0.09 1.33 −0.10 −0.24 −0.43 −1.01 −1.75 −3.14
Plot the residuals against elapsed time. What does the plot suggest?

49. An investigation was carried out to study the relationship between speed (ft/sec) and stride rate (number of steps taken/sec) among female marathon runners. Resulting summary quantities included n = 11, Σ(speed) = 205.4, Σ(speed)² = 3880.08, Σ(rate) = 35.16, Σ(rate)² = 112.681, and Σ(speed)(rate) = 660.130.
a. Calculate the equation of the least squares line that you would use to predict stride rate from speed.
b. Calculate the equation of the least squares line that you would use to predict speed from stride rate.
c. Calculate and interpret the coefficient of determination for the regression of stride rate on speed of part (a) and for the regression of speed on stride rate of part (b). How are these two related?

50. Refer to Exercise 49. Consider predicting speed from stride rate, so that the response variable y is speed. Suppose that the values of speed in the sample are expressed in meters/second. How does this change in the unit of measurement for y affect the equation of the least squares line? More generally, if each y value in the sample is multiplied by the same number c, what happens to the slope and vertical intercept of the least squares line?

51. The relationship between x = strain (in./in.) and y = stress (ksi) for an experimental alloy tension member was investigated by making an observation on stress
for each of n = 10 values of strain. A scatterplot of the resulting data suggested a quadratic relationship between the two variables. Employing the principle of least squares gave ŷ = 88.791 + 5697.0x − 328,161x² as the equation of the best-fit quadratic.
a. One observation in the sample was made when strain was .005, and the resulting value of stress was 111. What value of stress would you have predicted in this situation, and what is the value of the corresponding residual?
b. The observed values of stress were 91, 97, 108, 111, 114, 110, 112, 102, 98, and 91. Using the best-fit quadratic gave corresponding predicted values of 94.16, 98.87, 102.93, 109.07, 111.16, 113.36, 113.48, 104.22, 95.93, and 90.80, respectively. Calculate a quantitative assessment of the extent to which variation in observed stress values can be attributed to the approximate quadratic relationship between stress and strain.
c. What happens if the best-fit equation is used to predict stress when strain is .03? Note: The largest strain value in the sample was .017.

52. An experiment carried out to investigate the relationship between y = wire bond pull strength in a semiconductor product and the two predictors x1 = wire length and x2 = die height resulted in data for which the best-fit equation according to the principle of least squares was ŷ = 2.300 + 2.750x1 + .0125x2.
a. Interpret the coefficients of x1 and x2 in the given equation.
b. The observed value of pull strength was 24.35 when wire length was 9 and die height was 100. What value of pull strength would you have predicted under these circumstances, and what is the value of the corresponding residual?
c. The values of SSTo and SSResid were 6110.2 and 123.4, respectively. Can a substantial percentage of the observed variation in strength be attributed to the postulated approximate relationship between strength and the two predictors?

53. The accompanying data resulted from an investigation of the relationship between temperature (x, in °F) and viscosity (y, in poise) for specimens of bitumen removed from tar sand deposits:

x: 750 800 700 850 590 620 650 680 710 550
y:  50  16 102  10 945 818 403 151 114 1358

a. Would a straight line fit to this data give accurate predictions of viscosity?
b. Let x′ = 1/x and y′ = ln(y). Fit a straight line to the (x′, y′) data, use it as a basis for predicting viscosity when temperature is 720, and calculate a quantitative assessment of the extent to which the approximate linear relationship between x′ and y′ explains observed variation.

54. Ground motions resulting from an earthquake can be heavily influenced by the dynamic properties of the soils overlying bedrock. The authors of "Influence of Pore Fluid Viscosity on the Dynamic Properties of an Artificial Clay" (J. Geotech. Geoenviron. Engr., 2011: 1190–1201) investigated properties of an artificial soil called modified glyben to study seismic soil-structure interaction. Researchers investigated the relationship between x = fluid content by mass (%) and vane shear strength (kPa) for three types of modified glyben at different pore fluid viscosities (w/gw): y′ = vane shear strength (0% w/gw), y″ = vane shear strength (25% w/gw), y‴ = vane shear strength (50% w/gw). The data below corresponds to a graph from the article:

x:   35.0  37.5  40.0  42.5  45.0  47.5
y′:  75.0  63.0  57.0  45.0  28.5  38.0
y″:  52.0  41.5  38.0  35.0  20.0  16.0
y‴:  33.5  24.5  22.0  19.0  13.0  10.0

a. Create the scatterplots for the pairs (x, y′), (x, y″), and (x, y‴). Does each scatterplot suggest that a linear relationship holds for the respective variables?
b. Determine the least squares regression line for each pair. For each, determine the corresponding coefficient of determination.
c. Given the slope coefficients from the regression, summarize the relationship between vane shear strength and fluid content by mass as pore fluid viscosity changes from 0%, to 25%, and to 50%.

55. Failures in aircraft gas turbine engines due to high cycle fatigue is a pervasive problem. The article "Effect of Crystal Orientation on Fatigue Failure of Single Crystal Nickel Base Turbine Blade Superalloys" (J. of Engr. for Gas Turbines and Power, 2002: 161–176) gave the accompanying data and fit a nonlinear regression model in order to predict strain amplitude from cycles to failure. Fit an appropriate curve, investigate the quality of the fit, and predict amplitude when cycles to failure = 5000.
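Exercises 53 and 55 both call for fitting a transformed or nonlinear relationship. As one illustration, a minimal R sketch of the reciprocal-log transformation in Exercise 53(b), assuming vectors x and y hold the temperature and viscosity values listed there:

xp <- 1 / x                      # x' = 1/x
yp <- log(y)                     # y' = ln(y)
fit <- lm(yp ~ xp)
exp(predict(fit, newdata = data.frame(xp = 1 / 720)))   # predicted viscosity at 720 degrees F
summary(fit)$r.squared           # variation in y' explained by the linear fit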


Obs  Cycfail  Strampl    Obs  Cycfail  Strampl
 1      1326   .01495     11     7356   .00576
 2      1593   .01470     12     7904   .00580
 3      4414   .01100     13       79   .01212
 4      5673   .01190     14     4175   .00782
 5    29,516   .00873     15   34,676   .00596
 6        26   .01819     16  114,789   .00600
 7       843   .00810     17     2672   .00880
 8      1016   .00801     18     7532   .00883
 9      3410   .00600     19   30,220   .00676
10      7101   .00575

56. Efficient design of certain types of municipal waste incinerators requires that information about energy content of the waste be available. The authors of the article "Modeling the Energy Content of Municipal Solid Waste Using Multiple Regression Analysis" (J. of the Air and Waste Mgmnt. Assoc., 1996: 650–656) kindly provided us with the accompanying data on y = energy content (kcal/kg); the three physical composition variables x1 = % plastics by weight, x2 = % paper by weight, and x3 = % garbage by weight; and the proximate analysis variable x4 = % moisture by weight for waste specimens obtained from a certain region.

                                          Energy
Obs  Plastics  Paper  Garbage  Water     Content
  1     18.69  15.65    45.01  58.21         947
  2     19.43  23.51    39.69  46.31        1407
  3     19.24  24.23    43.16  46.63        1452
  4     22.64  22.20    35.76  45.85        1553
  5     16.54  23.56    41.20  55.14         989
  6     21.44  23.65    35.56  54.24        1162
  7     19.53  24.45    40.18  47.20        1466
  8     23.97  19.39    44.11  43.82        1656
  9     21.45  23.84    35.41  51.01        1254
 10     20.34  26.50    34.21  49.06        1336
 11     17.03  23.46    32.45  53.23        1097
 12     21.03  26.99    38.19  51.78        1266
 13     20.49  19.87    41.35  46.69        1401
 14     20.45  23.03    43.59  53.57        1223
 15     18.81  22.62    42.20  52.98        1216
 16     18.28  21.87    41.50  47.44        1334
 17     21.41  20.47    41.20  54.68        1155
 18     25.11  22.59    37.02  48.74        1453
 19     21.04  26.27    38.66  53.22        1278
 20     17.99  28.22    44.18  53.37        1153
 21     18.73  29.39    34.77  51.06        1225
 22     18.49  26.58    37.55  50.66        1237
 23     22.08  24.88    37.07  50.72        1327
 24     14.28  26.27    35.80  48.24        1229
 25     17.74  23.61    37.36  49.92        1205
 26     20.54  26.58    35.40  53.58        1221
 27     18.25  13.77    51.32  51.38        1138
 28     19.09  25.62    39.54  50.13        1295
 29     21.25  20.63    40.72  48.67        1391
 30     21.62  22.71    36.22  48.19        1372

Using Minitab to fit a regression function with the four aforementioned variables as predictors of energy content resulted in the following output:

The regression equation is
enercont = 2245 + 28.9 plastics + 7.64 paper + 4.30 garbage - 37.4 water

Predictor      Coef   StDev       T      P
Constant     2244.9   177.9   12.62  0.000
plastics     28.925   2.824   10.24  0.000
paper         7.644   2.314    3.30  0.003
garbage       4.297   1.916    2.24  0.034
water       -37.354   1.834  -20.36  0.000

s = 31.48   R-Sq = 96.4%   R-Sq(adj) = 95.8%

Analysis of Variance
Source      DF      SS      MS       F      P
Regression   4  664931  166233  167.71  0.000
Error       25   24779     991
Total       29  689710

a. Predict the value of energy content when plastics is 17.03, paper is 23.46, garbage is 32.45, and water is 53.23. Also determine the corresponding residual.
b. What proportion of observed variation in energy content can be explained by the approximate relationship between energy content and the four predictors?
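Part (a) of Exercise 56 is just arithmetic with the reported coefficients; a short R sketch (observation 11 supplies the stated predictor values and observed energy content):

b <- c(2244.9, 28.925, 7.644, 4.297, -37.354)   # Constant, plastics, paper, garbage, water
xnew <- c(1, 17.03, 23.46, 32.45, 53.23)
pred <- sum(b * xnew)   # predicted energy content
pred
1097 - pred             # corresponding residual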

Bibliography
Kutner, M., C. Nachtsheim, and J. Neter, Applied Linear Regression Models (4th ed.), McGraw-Hill/Irwin, Burr Ridge, IL, 2004. A comprehensive up-to-date exposition of regression and correlation analysis without overindulging in theory, though matrix algebra is rather frequently used. (This material is also included in Applied Linear Statistical Models, a longer book by the same authors.)

Montgomery, D. C., E. A. Peck, and G. G. Vining, Introduction to Linear Regression Analysis (5th ed.), Wiley, New York, 2012. A very nice treatment of regression written for engineers and physical scientists.

4
Obtaining Data
4.1 Operational Definitions
4.2 Data from Sampling
4.3 Data from Experiments
4.4 Measurement Systems

Introduction
Engineering has been defined as the art of applying science and technology for the
optimal conversion of the resources of nature into the uses of humankind.1 The sci-
ences, in turn, are grounded in mathematics, so it is natural that measurements of all
kinds should play a large role in engineering and scientific practice. In this chapter,
we examine some of the ways in which data is collected as well as some approaches
to ensuring data quality.
Scientists and statisticians have long realized that some sets of data are defi-
nitely more useful than others, and that at the heart of data quality lies the
realization that external conditions can often exert a large influence on mea-
sured values. Temperature, for example, is well known to affect the physical di-
mensions (length, area, etc.) of most materials, so the measured length of a thin
strip of aluminum will necessarily vary depending on the ambient temperature.
In an effort to control or eliminate the effects of such external or “noise” fac-
tors, engineers have developed a large number of professional standards whose
purpose is to ensure the consistency and quality of scientific data.   We will look
at some specific examples of such standards in Section 4.1.
Since the early 1920s, statisticians have also addressed the problems of data
quality by introducing tightly controlled data collection schemes. These schemes,

1
Encyclopedia Britannica, 1998.


called experimental designs and sampling plans, provide methods not only
for controlling or eliminating the effects of external factors but also for assess-
ing the magnitude of their combined effect on measured data. Sampling plans
also address the problem of how far we can generalize the conclusions that we
draw from data. One important feature of experimental designs is the ability to
study the effects of several factors simultaneously on the values of another factor,
called a response variable. This feature is especially well suited to research and
development activities. The main components of such designs are introduced in
Sections 4.2 and 4.3.
The process of obtaining measurements is also vital to the eventual conclu-
sions drawn from data. Numerous questions can be asked about measurement
procedures: Can we trust a particular measuring instrument’s readings? Are the
readings accurate and precise? Do repeated measurements of the same object
give similar results, or do the results exhibit large variation? If different people or
special laboratories are involved at various stages of the measuring process, does
this have an adverse effect on the quality of the data? These questions are the sub-
ject of metrology, the study of measurement, and are examined in Section 4.4.

4.1 Operational Definitions


When working with data, two facts quickly emerge: (1) There are usually several ways to
measure the same thing and (2) external factors can exert a large influence on our final
measurements. We learn early that failing to be specific about what we want to measure
can lead to endless problems and questions about how, or even whether, to use a set
of data. To illustrate, suppose that you ask two people to measure the density of water.
Person A might use the following method: An empty graduated cylinder is weighed and
then filled with water and reweighed; the two weights are subtracted, giving the weight
of the water in the cylinder; then, by reading off the water volume from the cylinder’s
measuring scale, the ratio of the volume to the weight is used as a measure of the
water’s density. Person B, however, decides to simply use a hydrometer, an instrument
that directly measures the density of water. Do the two measurement methods agree?
Probably not. The measurements from person A, for instance, depend on the precision
and accuracy of the weighing scale used and on the person’s ability to read the volume
correctly from the cylinder.2 The readings from person B depend on the precision of
the hydrometer and whether it is correctly calibrated. There are additional reasons why
the two measurements may not be equal. For instance, what kind of water was used?
After all, pure water, freshwater, and seawater are known to have different densities.
Furthermore, temperature is an important factor affecting water density (maximum
water density occurs at 39.09°F). Did person A and person B measure the same sort of
water at the same temperatures?

2
Surface tension causes the top of the water to form a bowl-like surface, called a meniscus. Using the top of
the meniscus leads to a different volume estimate than using the bottom of the meniscus.


As this example shows, unless you are very specific about what to measure (e.g.,
seawater at 50°F and 1 atmosphere of pressure) and how to measure it, data can be quite
unreliable. Realizing this, the quality pioneer W. Edwards Deming recommended that,
prior to collecting any set of data, one should first create an operational definition that
spells out exactly what is to be measured and exactly how the measurements should
be made. The reward for doing this is consistent, reliable data. Any two people should
be able to follow the operational definition and obtain essentially the same measure-
ments. Cognizant of the importance of operational definitions, most scientists include
a Materials and Methods or Experimental Procedure section that outlines the exact
procedures employed to collect the data used in a study.

Example 4.1 Automobile gasoline is a carefully balanced blend of from 8 to 15 different hydrocar-
bons. The resulting blends must meet up to 15 quality and environmental require-
ments, including standards regarding vapor pressure, boiling point, stability, color, and
octane rating. The octane scale measures the degree to which a gasoline blend per-
forms like pure isooctane (which gives the least amount of premature firing or “knock”)
or pure normal heptane (which produces extreme knocking). If the blend performs
like a mixture of 90% isooctane and 10% heptane, it is assigned an octane rating of 90.
Because octane measurements are heavily influenced by engine speed and
temperature, an operational definition must be used when assigning octane ratings.
First, using a standard knock engine, the “research octane” level is measured under
mild conditions (600 rpm and 120°F). Second, “motor octane” is measured under
harsher conditions (900 rpm and 300°F). Finally, the “road octane” rating is calcu-
lated as the average of the research and motor octane levels. Road octane, calculated
by the (R + M)/2 method, is the one commonly reported on gasoline station pumps.

Example 4.2 Operational definitions are often created on the job. For example, when inspecting
injection-molded automobile dashboards, several types of defects can be observed,
such as pinholes, creases, burn marks, and voids (hollow areas underneath the outer
skin of the dashboard). To generate meaningful data about such defects, an opera-
tional definition must be created so that any two inspectors will report the same types
and severity of defects. For example, we might decide to classify creases longer than
1 inch as severe, whereas creases less than one-quarter inch might be called minor.
Pinholes that occur under the dashboard (not visible to passengers) could be classi-
fied differently from those that are in the passengers’ field of vision. Similarly, voids
with large diameters might be treated as major defects, whereas smaller voids are
minor defects. Once these definitions have been established, the resulting data can
be reliably used in quality control charts (Chapter 6) or other statistical methods.

Professional Standards
It often takes highly specialized knowledge to create operational definitions. Conse-
quently, entire professional societies have arisen to create such definitions, which are
then called professional standards or simply standards. One of the largest such groups


is the American Society for Testing and Materials (ASTM). ASTM publishes
standard test methods, specifications, practices, and guides for engineers working with
materials, products, systems, and services. Over 12,000 ASTM standards have now been
published, and these standards are commonly adopted by government agencies for use
in codes, regulations, and laws. Building codes, for example, commonly cite ASTM
standards for conducting tests on structures. In the following example, notice how each
step of a measurement process is carefully defined.

Example 4.3 Concrete used in construction must meet tight consistency standards. Consistency
refers to the fluidity of the concrete when poured. ASTM C 143 (Standard Method
for Slump of Portland Cement Concrete) is often cited in state construction codes as
the required method of testing consistency.
ASTM C 143 requires that a sample of concrete be poured into a cone shaped
like a megaphone (8-in. diameter at one end and 4-in. diameter at the other end).
The large base of the cone is on the ground during the pour. The cone is filled
one-third full and then tamped down 25 times. This procedure is repeated twice,
leaving the mold full. The cement sample must come from the middle portion of
the batch being poured. Next, the cone is lifted off the cement and quickly inverted
and placed beside the conical pile of cement. Without the support of the cone, the
height of the cement then diminishes or slumps. The distance between the top of the
cone and the top of the cement is called the slump, and, depending on the building
code used, the slump must fall within specified limits.

Other organizations, including the federal government, make extensive use of pub-
lished standards. The Code of Federal Regulations (CFR), for instance, is an important
source of engineering standards and requirements in all federally regulated industries.

Example 4.4 The Department of Transportation (DOT) oversees the testing and rating of au-
tomobile tires. Tires are rated for treadwear, traction, and temperature resistance.
These ratings are marked on the side of each tire. A treadwear rating of DOT 150,
for example, means that a tire wears about one and a half times as long as a tire rated
100 on a standard government test course. Estimating the treadwear of a given brand
of tire is done via regression analysis.
Because of the numerous factors that can affect treadwear (size of car, driving
style, road conditions, and speed), the operational definition specified by DOT is
extensive. In brief, Regulation 49CFR 575.104 (Uniform Tire Quality Grading
Standards) requires that a convoy of two or four rear-wheel-drive passenger cars be
driven over a 400-mile government test course in the vicinity of San Angelo, Texas.
One vehicle is outfitted with special government-manufactured course-monitoring
tires; the other vehicles have only test tires. Inflation pressures are specified, and
each vehicle is weight-loaded to put a required test load on the tires. Wheel align-
ments are checked, tires are broken in for two laps (800 miles), air pressure is


rechecked, and wheels are realigned. Initial tread depth, to the nearest .001 in., is
measured. The convoy is then driven for 6400 miles, rotating tires every 400 miles
in a specified pattern. A car’s position in the convoy is also rotated. In addition, tires
are also shifted from one vehicle to another every 1600 miles. Tread depth is mea-
sured every 800 miles. Finally, a regression line is fit to the nine treadwear points
(one initial reading and eight readings at 800-mile intervals). The regression line
is used to calculate a projected mileage for the test tires and the monitoring tires.
Comparisons between the projected test tire wear and monitoring tire wear are used
to assign the DOT wear rating.

Another organization that has played a major role in setting standards for various indus-
tries is the International Organization for Standardization (ISO). Founded in 1947,
the ISO has published more than 19,500 international standards covering diverse areas
such as food safety, computers, agriculture, and health care.

Example 4.5 We often assume that children’s toys, once made available on the shelves of a store,
are perfectly safe to use by children. Unfortunately, this is not always the case as evi-
denced by toy product recalls because of some hazard concern. For example, the U.S.
Consumer Product Safety Commission maintains a regularly updated website that
lists various hazardous toy recalls. In 2012, the ISO updated its series of toy safety stan-
dards that detail requirements and test methods for toys intended for use by children
under 14 years of age; it also sets age limits for various requirements. The series con-
tains four parts: Part 1—Safety aspects related to mechanical and physical properties;
Part 2—Flammability; Part 3—Migration of certain elements; and Part 4—Swings,
slides, and similar activity toys for indoor and outdoor family domestic use. Two new
parts are currently under development: Part 5—Determination of total concentration
of certain elements in toys; and Part 6—Toys and children’s products—Determination
of phthalate plasticizers in polyvinyl chloride plastics. By adopting the requirements
and recommendations of the ISO safety standards, toy manufacturers can help mini-
mize product recalls and reduce the risk of a child being injured by an unsafe toy.

Benchmarks
Operational definitions are especially appropriate for establishing industry and profes-
sional standards. However, when we want to compare several different products or pro-
cesses, another sort of standard is needed. For these applications, benchmarks are the
appropriate tools. Benchmarks are well-defined objects or processes whose character-
istics are already explicitly known. Knowing the exact value of some characteristic in
advance allows one to evaluate several products or processes by comparing how they
perform against the benchmark. For example, the National Institute of Standards and
Technology (NIST) keeps copies of standard physical units, such as the volt and the
kilogram. These standards are the benchmarks against which the precision and accu-
racy of all measuring instruments are eventually compared.


Example 4.6 Benchmarks are routinely used for comparing software products. For instance, statis-
tical software packages are evaluated for computational accuracy by using specially
designed data sets whose statistical properties are precisely known. One repository
of such benchmark data sets can be found at http://www.itl.nist.gov/div898/strd
/index.html, a website maintained by the Information Technology Laboratory of
the National Institute of Standards and Technology. This website was produced as
part of the Statistical Reference Datasets Project. One of these data sets is the set of
three integers 10,000,001 to 10,000,003 that is used to evaluate a software program's
computation of the sample standard deviation, s. The sample standard deviation for
these three values is s = 1, the same as for the sample 1, 2, 3.
Using this data set as a benchmark, it is possible to compare the different ap-
proaches to calculating s that are used in software packages. For instance, summing
the squares of the three integers (a step used in some formulas for s) leads to inac-
curate results. However, programs that use updating formulas (in which the value of
s is updated as each data point is entered) are generally very accurate.
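To make the comparison concrete, here is a small Python sketch (our own illustration, not part of the Statistical Reference Datasets Project) that applies both approaches to the benchmark data. The sum-of-squares formula is evaluated in 32-bit arithmetic, where its loss of precision is easy to see; the updating (Welford) method returns s = 1 exactly.

    import numpy as np

    data = [10_000_001.0, 10_000_002.0, 10_000_003.0]   # benchmark data: true s = 1

    def s_sum_of_squares(x):
        # one-pass formula s^2 = (sum(x^2) - (sum x)^2 / n) / (n - 1),
        # evaluated in 32-bit floats to expose the cancellation error
        x = np.asarray(x, dtype=np.float32)
        n = x.size
        ss = np.sum(x * x) - np.sum(x) ** 2 / np.float32(n)
        return float(np.sqrt(ss / (n - 1))) if ss > 0 else float("nan")

    def s_updating(x):
        # Welford's updating algorithm: s is revised as each data point is entered
        n, mean, m2 = 0, 0.0, 0.0
        for xi in x:
            n += 1
            delta = xi - mean
            mean += delta / n
            m2 += delta * (xi - mean)
        return (m2 / (n - 1)) ** 0.5

    print(s_sum_of_squares(data))   # wildly wrong (or NaN): the squares swamp 32-bit precision
    print(s_updating(data))         # 1.0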

Section 4.1 Exercises


1. What is the primary difference between an operational definition and a benchmark?

2. Give an operational definition for measuring the fuel efficiency of a car. In your definition, take into account factors such as the driving speed, octane rating, distance driven, tire pressure, and driving terrain.

3. Give an operational definition for measuring the daytime temperature in a city. In your definition, take into account factors such as time of day and location.

4. To test the accuracy of a new numerical algorithm, a programmer uses the algorithm to produce the first 200 digits of the number π. The programmer checks the accuracy of the 200 digits by comparing them to those in a published reference, whose accuracy has been previously verified. In this application, would the published reference more properly be considered an operational definition or a benchmark?

5. Print speed (often measured in pages per minute, ppm) is an important property to consider when buying a printer. However, printer manufacturers measure this property in different ways, making comparison of print speeds difficult. In 2009, the ISO developed an international standard for measuring print speed. The standard, known as "ISO ppm," allows a consumer to make "apples-to-apples" comparisons of real-world print speeds under standard conditions. It is now common for the ISO ppm rating of a printer to be included in its product specifications listing. Here, would ISO ppm more properly be considered an operational definition or a benchmark?

4.2 Data from Sampling


The data used in most applications arises from some form of sampling. By its very
definition, a sample is simply a fraction or a part of some larger entity. Sometimes,
the larger entity can be considered to be a population, such as the population of all
electronic components made during a single workshift. At other times, the sampled
entity may be a single object, such as a batch of cement, a chemical process, or a city’s
water supply.


The goal in all forms of sampling is to be able to draw conclusions about the larger
entity based solely on our analyses of the information in a sample. For this reason,
every effort is made to ensure that samples are truly representative of the thing we are
sampling. Professional standards usually provide great detail on how samples are to be
obtained. For example, ASTM C 172 (Standard Method of Sampling Freshly Mixed
Concrete) requires that samples of fresh concrete be taken “. . . at two or more regularly
spaced intervals during discharge of the middle of the batch . . .” and that the inspector
should “. . . perform sampling by passing a receptacle completely through the discharge
stream . . .” while taking care “. . . not to restrict the flow of the concrete . . . so as to
cause segregation.” Another method of assuring representative samples is based on the
concept of random sampling, described later in this section.

The Advantages of Sampling


When done properly, sampling has several desirable features. Foremost among these
are the savings in resources, especially time and money, that can be obtained by us-
ing samples. The economics of sampling are readily apparent because, for example,
sampling and testing 20 items from a batch of 1000 items obviously involves less
labor than testing the entire batch. In many cases, it is equally important to control
the amount of time spent analyzing the sample data itself because production deci-
sions often depend on tests performed on samples. In quality control applications,
for instance, the decision of whether to adjust a process or to leave it alone is based
on the analysis of periodic samples taken from an ongoing process. Timely test re-
sults are equally important in construction, where decisions on whether to accept a
contractor’s work and to proceed to the next phase of construction are based on the
results of test samples.
Sometimes sampled material must be destroyed during testing. This is the case,
for example, when evaluating the breaking strength of materials (e.g., metals, wood,
fabrics, plastics), assessing the potency of drugs, or estimating the average lifetime of a
group of electronic components. Such evaluation is called destructive testing. In such
cases, sampling is not just an advantage, it is a necessity.
Even when testing is nondestructive, it still makes sense to sample. In addition to
the economic benefits described previously, testing done on samples is often more reli-
able than testing done on entire populations. Several case studies have verified this phe-
nomenon. The simple explanation is that testing and inspection errors begin to creep in
whenever large numbers of items are tested because of inspector fatigue or differences
between inspectors. With samples, more attention can be devoted to each item tested,
and this almost always results in more reliable test data.

Example 4.7 The inspection and approval of metal welding in building construction can be based
on nondestructive test (abbreviated NDT) methods, destructive test methods, or vi-
sual inspection. There are several NDT methods available, including magnetic par-
ticle testing, radiographic inspection, penetrant inspection, ultrasonic testing (UT),
leak testing, and hardness testing. Each of these methods is based on a nondestruc-
tive examination of a sample of welded material.


Penetrant inspection, for example, involves the application of a dye (often red in
color) to the welded surface. The dye penetrates any existing cracks and holes in the
metal surface. After the excess dye is wiped away, only the dye in the cracks remains.
To reveal these cracks, another liquid, called a developer, is applied to the surface. This
causes the dye to come to the surface of the crack and creates a highly visible marking
of each crack or hole in the weld. An experienced inspector can then make an evalua-
tion of the quality of the weld from the number and location of these markings.

Random Sampling
Random sampling is a form of sampling used extensively in statistical methods. This
technique presupposes that samples are to be obtained from some well-defined popu-
lation of distinct items, and it provides a simple mechanism for randomly selecting
items from the population to be included in a sample. The advantages of using random
sampling are (1) it helps to reduce or eliminate bias in the manner in which the sam-
pled items are chosen and (2) it enables us to make precise statements about the extent
to which conclusions drawn from a sample can be applied to the entire population.
Random samples are obtained by making sure that every sample of the desired size has
the same chance of being selected. This in turn implies that each item in the population has
an equally likely chance of being chosen. One popular method for achieving this is to first
create a list (called a sampling frame) of the items in a population. Next, successive positive
integers are assigned to the items on the list, and then a random number generator is used
to select a random sample of these positive integers. Random number generators can be
in the form of tables, functions on handheld calculators, or commands in programming
languages and statistical software. Whatever method is used, the selected integers will
correspond to specific items in the sampling frame.
When sampling, we are immediately faced with a decision to sample with or with-
out replacement. Sampling with replacement means that after each successive item
(or integer) is selected for the random sample, the item is “replaced” back into the
population and may even be selected again at a later stage. Thus, sampling with re-
placement allows for the possibility of having “repeats” occur in our random sample. In
practice, sampling with replacement is rarely used. Instead, the more common notion
of sampling is to allow only distinct items from the population in the sample. That is, no
repeats are allowed. Sampling in this manner is called sampling without replacement.
Although these two forms of sampling are indeed different, in most applications (i.e.,
when the sample size is small compared to the population size) there is little practical
difference between them. Unless otherwise stated, however, we will always assume that
random sampling is done without replacement.
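The difference between the two schemes is easy to demonstrate with any random number generator. In Python, for example (one possible implementation, not prescribed by the text), random.choices draws with replacement and random.sample draws without:

    import random

    random.seed(1)  # fixed seed so the illustration is reproducible
    with_repl = random.choices(range(1, 11), k=5)   # with replacement: repeats can occur
    without_repl = random.sample(range(1, 11), 5)   # without replacement: all values distinct
    print(with_repl, without_repl)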

Example 4.8 Suppose that we want to perform some electrical tests on a random sample of
5 integrated circuit chips from a package of 20 chips. Arranging the 20 chips in a
horizontal line on a table is a rapid way of associating a unique integer from 1 to
20 with each chip (the leftmost chip would be labeled “1,” the rightmost would be
“20,” and so forth). It is important to note that the particular ordering of the chips


is completely immaterial to the sampling process. All that is needed is a method for
assigning integers to the chips, and horizontal positioning achieves that purpose.
Using a random number generator from a calculator or a statistical software
package, we next generate a random sample of five integers from the numbers 1
through 20. When doing this, we have to decide whether to sample with replace-
ment or without replacement. Suppose we choose to sample without replacement
and that the randomly chosen integers turn out to be 4, 14, 3, 18, and 15. Then,
our random sample of 5 chips would consist of the 4th, 14th, 3rd, 18th, and 15th
chips, counting from left to right.
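In software, the whole procedure collapses to a couple of lines. A possible Python version (hypothetical; any random number generator would serve):

    import random

    frame = range(1, 21)              # sampling frame: chip positions 1-20, left to right
    sample = random.sample(frame, 5)  # without replacement; might yield [4, 14, 3, 18, 15]
    print(sample)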

The sample size used in random sampling can sometimes change due to changes
in available budgets or changes in the precision of the information required from the
sample. In such cases, after already having drawn a random sample of size n from a
population of N items, we may find ourselves in the position of wanting to either reduce
or increase the sample size somewhat. A question then arises as to how to accomplish
this. Fortunately, as the following rules illustrate, adjusting the sample size does not
require that we discard the items already sampled.

Rules for Increasing or Decreasing the Size of a Random Sample³

1. The complement of a random sample of size n from a population of size N is itself a random sample from the population.
2. Any random subsample of a random sample is also a random sample from the population.
3. Any random subsample from the complement of a random sample is itself a random sample from the population.
4. After a random sample of size n has been selected, any random sample from its complement can be added to it to form a larger random sample from the population.

[The complement of any sample is the name given to those items in the population that are not included in the sample.]

Example 4.9 Commercial and military aircraft are built using hundreds of thousands of specially
designed nuts and bolts, known as “fasteners.” Because these fasteners are subjected
to stress, fatigue, and a host of environmental conditions, random samples of each
type of fastener are routinely tested for strength requirements.
Suppose an inspector has drawn a random sample of size 10 from a box
of completed fasteners and conducts torque tests on them. After testing, the inspector
is informed that, in fact, a sample of size 25 is required by the customer for these fas-
teners. Since the fasteners remaining in the box are the complement of the original

³Wright, T., and H. Tsao, "Some Useful Notes on Simple Random Sampling," Journal of Quality Technology, 1985: 67–73.


sample of 10, then the inspector need only select a random sample of 15 fasteners from
the box to add to the original sample. Rule 4 ensures that the group of 25 fasteners
selected in this fashion qualifies as a random sample from the box.
On the other hand, suppose the inspector had originally selected a sample of
size 25 but subsequently found that a sample of only 10 was needed. By simply
selecting a random sample of 10 from the original 25 items, the inspector will have
legitimately obtained a random sample of size 10 from the box.
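Rules 2 and 4 are also simple to state in code. In the sketch below (Python, with hypothetical function names), shrinking a sample is just taking a random subsample of it, and growing it means drawing the extra items from the complement:

    import random

    def shrink(sample, m):
        # Rule 2: a random subsample of a random sample is itself a random sample
        return random.sample(sample, m)

    def grow(sample, population, m):
        # Rule 4: augment with a random sample drawn from the complement
        already = set(sample)
        complement = [item for item in population if item not in already]
        return sample + random.sample(complement, m - len(sample))

    box = list(range(1, 101))        # a box of 100 fasteners, labeled 1-100
    first = random.sample(box, 10)   # original random sample of size 10
    bigger = grow(first, box, 25)    # a valid random sample of size 25
    smaller = shrink(bigger, 10)     # a valid random sample of size 10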
Obtaining random samples often requires some ingenuity. This is especially the
case when it is difficult to develop a sampling frame for the population of interest.
For example, continuous processes, which are not conveniently divided into finite
numbers of discrete parts, usually pose special problems when developing sampling
frames. In such circumstances, it is helpful to remember that a sampling frame can
also be a procedure, not just a list.⁴

Example 4.10 Agricultural inspectors are required to select random samples of crops for testing and
evaluation. Harvested crops stored in cartons or bins, such as citrus fruit, pose special
sampling problems. Although it is easy to imagine tagging the fruit in a bin with succes-
sive integers and applying the random number scheme to generate samples, doing so
would be time-consuming and economically prohibitive. Instead, other schemes have
been developed to obtain random samples in a more economical fashion. One popular
technique is to select a bin of fruit at random (bins are generally easy to select by the
random number method) and then follow a “random corner” method for obtaining the
sample: First, one of the bin’s four corners is chosen at random (a small printed table of
random numbers is helpful here); then the fruit stacked in the selected corner are used
to form the sample. This method relies on the reasonable assumption that the fruit
were randomly mixed when packed in the bin. Choosing a corner at random has the
additional benefit of not allowing human inspectors to introduce bias into the result-
ing data by always choosing a corner in which the fruit looks especially good (or bad).

Random Versus Nonrandom Samples


The first three chapters of this text focus primarily on the mechanics of how data is used to
describe samples and populations. In Chapter 4, because of the importance of obtaining
good (and avoiding bad) data, we look at how data is generated in the first place. This is one
of the most important aspects of conducting a statistical study because you certainly do not
want all your hard work on a problem to be negated by statements such as “your results are
only as good as your data” or the well-known acronym GIGO (garbage in, garbage out).
Drawing conclusions from data always comes down to a question of trust: How
reliable or trustworthy is the person, organization, or method providing the data? Stat-
isticians address this issue by using data-gathering methods based on random sampling
and randomization (see Section 4.3), techniques that then allow the use of probability
calculations (Chapter 5) to numerically assess the reliability of the conclusions drawn

⁴Kish, L., Survey Sampling, John Wiley & Sons, New York, 1965: 53.


from the data. Such methods are objective and the only “trust” involved is in assuring
that random sampling or randomization is correctly employed while gathering the data.
On the other hand, with nonrandom samples (i.e., data not gathered using some sort of
randomizing technique), no such probability assessments are possible and the informa-
tion in such data cannot, as a rule, be generalized to larger populations.
The problems with nonrandom data go even deeper. Even with the best intentions,
when trying to subjectively obtain data that we think is “representative” of a larger popula-
tion, the resulting data can be badly skewed. For example, when assessing the reliability of a
product, an engineer might try to ensure that the data includes examples of each kind of fail-
ure mode that the product experiences in the field. This practice automatically ignores the
fact that some failure modes are usually much more prevalent than others, and inferences
based on such “representative” samples may not only be unreliable, but even misleading.
So, what should you do when nonrandomly collected data arises in practice?
Although it is acceptable to apply simple descriptive statistical measures to the data (e.g.,
means, histograms, and so forth), be aware that (1) such measures can’t legitimately be
generalized, and (2) the statistical techniques presented in the following chapters may
not be valid when applied to such data.

Stratified Sampling
The method of random sampling can be extended to incorporate additional sources
of information and to handle problems that arise when sampling from populations for
which suitable sampling frames are hard to obtain. To distinguish basic random sam-
pling (as previously described in this section) from the extended sampling schemes that
rely on it, random sampling is often referred to as simple random sampling (SRS).
One method for incorporating additional information is stratified sampling. In
stratified sampling, the population of interest is first divided into several nonoverlap-
ping subsets called strata, and then the SRS method is used to select a separate ran-
dom sample from each of the strata. All of the strata samples are then combined into
one large “stratified” sample from the population. When the strata are properly spec-
ified, stratified sampling will generally produce estimates that are more precise than
SRS sampling.

General Rules for Choosing Strata

1. Decide on a response variable that is of interest.
2. Divide the entire population into nonoverlapping groups (i.e., strata) S1, S2, . . . , Sk, each of which is as homogeneous as possible.
3. Decide on the sample sizes n1, n2, . . . , nk to select from the strata.
4. Use SRS to obtain a sample from each stratum.

Estimating a Population Mean


Figure 4.1 illustrates the decomposition of a population into strata for estimating the
mean μ of a population. Let the number of population elements that fall within these
strata be denoted by Ni (i = 1, 2, 3, . . . , k); each stratum Si has its own mean μi and
standard deviation σi. The selection of sample sizes can be done in two steps: (1) Decide
on the total sample size n that will be used, and then (2) decide how to divide n up into
the strata sample sizes n1, n2, n3, . . . , nk.

[Figure 4.1  A population divided into strata S1, S2, S3, . . . , Sk of sizes N1, N2, N3, . . . , Nk. Each stratum Si has its own mean μi and standard deviation σi; the stratum sizes Ni sum to N, and the stratum sample sizes ni sum to n.]

Example 4.11 Companies that produce or handle hazardous chemicals are required to apply for a
National Pollutant and Discharge Elimination System (NPDES) permit from the
federal government (“Measuring, Sampling, and Analyzing Storm Water,” Pollution
Engr., Mar. 1, 1992: 50–55). The environmental concerns addressed by the NP-
DES permit involve the amounts of pollutants carried by storm water runoff from a
company’s facility to nearby public waters. Pollutant levels are estimated by taking
random samples of storm water and subjecting them to chemical analysis.
Sampling runoff water is accomplished by stratifying runoff water according to
the different point sources, usually water channels, that carry the runoff. Using various
techniques and meters, the average velocity of water flow and the cross-sectional area
of each channel are estimated. These are used to estimate total flow volumes for
each point source. The flow volume can be thought of as a measure of the size Ni of
the ith stratum. The total of all flow volumes represents the population size. Water
samples from each point source are obtained and chemically analyzed. The total pol-
lutant level is then calculated as a weighted average of the pollutants in each sample,
weighted by the flow volume from the point source where the sample was obtained.

To choose n, we first decide on a confidence level and a bound B on the error of
estimation. The confidence level (which will be discussed in greater detail in
Section 7.2) is a measure of the degree of reliability, measured on a scale from 0% to
100%, that we would like to have in our final estimate of the overall population mean
μ. Of course, since estimates are based on samples, 100% confidence is not possible,
so confidence levels are usually restricted to large numbers (e.g., 90%, 95%, 99%, etc.)
less than 100%. Estimates should also be "close enough" to the population character-
istic they estimate to be useful for subsequent calculations and decision making. This
requirement is achieved by specifying B, the "plus or minus" margin of error that you are
willing to accept in your estimate. Finally, we let wi denote the proportion (or weight)
that the ith stratum sample represents in the total sample of n, that is, wi = ni/n for
i = 1, 2, 3, . . . , k.

Given the wi's, the Ni's, the σi's, a confidence level of 95%, and B, it
can be shown that the minimum necessary sample size n for estimating the population
mean μ to within a margin of error of ±B is

$$n = \frac{\sum_{i=1}^{k} N_i^2 \sigma_i^2 / w_i}{N^2 (B/1.96)^2 + \sum_{i=1}^{k} N_i \sigma_i^2}$$

where N = N1 + N2 + N3 + ⋯ + Nk. For confidence levels other than 95%, replace
1.96 by the appropriate value from a table of standard normal curve areas.
Assuming the same per-unit cost for sampling from each stratum, the optimum
allocation of sample sizes (called the Neyman allocation) can be shown to be

$$n_i = n \left( \frac{N_i \sigma_i}{\sum_{j=1}^{k} N_j \sigma_j} \right) \quad \text{where} \quad n = \frac{\left[ \sum_{i=1}^{k} N_i \sigma_i \right]^2}{N^2 (B/1.96)^2 + \sum_{i=1}^{k} N_i \sigma_i^2}$$

If, in addition, the strata standard deviations are identical (σ1 = σ2 = ⋯ = σk = σ), then

$$n_i = n \left( \frac{N_i}{N} \right) \quad \text{where} \quad n = \frac{1}{\left( \dfrac{B}{1.96\,\sigma} \right)^2 + \dfrac{1}{N}}$$

This is called the proportional allocation. Please consult one of the chapter references
for the case of unequal sampling costs.
Regardless of the allocation used, the stratified estimate of the population mean μ
is given by

$$\bar{x}_{\text{str}} = \bar{x}_1 \left( \frac{N_1}{N} \right) + \bar{x}_2 \left( \frac{N_2}{N} \right) + \bar{x}_3 \left( \frac{N_3}{N} \right) + \cdots + \bar{x}_k \left( \frac{N_k}{N} \right)$$

where x̄i denotes the mean of the ni observations from stratum Si. One of the nice
features of the proportional allocation is that the resulting data is "self-weighting"; in
other words, instead of calculating the stratified estimate we can simply combine the
data from all the strata and calculate the ordinary sample mean of the combined data,
which, only in this case, will exactly equal x̄str.
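This self-weighting property is easy to check numerically. The following Python sketch (with invented stratum sizes and means; nothing here comes from the text) draws proportionally allocated normal samples from three strata and compares the two computations:

    import numpy as np

    rng = np.random.default_rng(0)
    N_i = np.array([500, 300, 200])              # stratum sizes; N = 1000
    n_i = 100 * N_i // N_i.sum()                 # proportional allocation of n = 100: 50, 30, 20
    strata = [rng.normal(mu, 1.0, size=k) for mu, k in zip([10.0, 20.0, 30.0], n_i)]

    xbar_str = sum((Ni / N_i.sum()) * s.mean() for Ni, s in zip(N_i, strata))
    pooled = np.concatenate(strata).mean()       # ordinary mean of the combined data
    print(xbar_str, pooled)                      # equal, apart from float round-off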
Stratified estimates of μ are usually accompanied by a measure called their standard
error (which will be discussed more fully in Chapter 7) that can be interpreted in much
the same way the sample standard deviation is interpreted. That is, if we think of all
the possible stratified samples of size n that we could have selected, about 95% of the
estimated means from such samples will be within about 2 standard errors of μ.

For stratified sampling, the standard error is approximated by

$$s_{\text{str}} \approx \sqrt{ \frac{1}{N^2} \sum_{i=1}^{k} N_i^2 \left( \frac{s_i^2}{n_i} \right) \left( \frac{N_i - n_i}{N_i - 1} \right) }$$

where si² is the sample variance of the ni observations from stratum Si.
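These formulas translate almost line for line into code. The following Python sketch is ours, not the text's (the function and variable names are hypothetical); it computes the required total sample size, the Neyman allocation, and the stratified mean with its standard error from lists of the Ni, the σi (or si), and the sample results.

    import math

    def required_n(N_i, sigma_i, B, z=1.96):
        # minimum total sample size for estimating mu to within +/- B
        # (Neyman allocation: n = [sum Ni*sigma_i]^2 / (N^2 (B/z)^2 + sum Ni*sigma_i^2))
        N = sum(N_i)
        num = sum(Ni * si for Ni, si in zip(N_i, sigma_i)) ** 2
        den = N ** 2 * (B / z) ** 2 + sum(Ni * si ** 2 for Ni, si in zip(N_i, sigma_i))
        return num / den

    def neyman_alloc(n, N_i, sigma_i):
        # allocate n to the strata in proportion to Ni*sigma_i
        total = sum(Ni * si for Ni, si in zip(N_i, sigma_i))
        return [round(n * Ni * si / total) for Ni, si in zip(N_i, sigma_i)]

    def stratified_mean(N_i, xbar_i):
        # xbar_str = sum of (Ni/N) * xbar_i
        N = sum(N_i)
        return sum(Ni * xb for Ni, xb in zip(N_i, xbar_i)) / N

    def stratified_se(N_i, n_i, s_i):
        # s_str ~ sqrt( (1/N^2) * sum Ni^2 (si^2/ni) ((Ni - ni)/(Ni - 1)) )
        N = sum(N_i)
        tot = sum(Ni ** 2 * (si ** 2 / ni) * ((Ni - ni) / (Ni - 1))
                  for Ni, ni, si in zip(N_i, n_i, s_i))
        return math.sqrt(tot / N ** 2)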

Example 4.12 Since 1991 the USGS (U.S. Geological Survey) has conducted the National Water
Quality Assessment Program (NAWQA), whose purpose is to study natural and human
factors that affect water quality. One important measurement that NAWQA produces
is an estimate of the percentages of a region covered by various crop types. In one
study (“Validation of National Land-Cover Characteristics Data for Regional Water
Quality Assessment," Geocarto International, vol. 10, no. 4, Dec. 1995: 69–80) of the
percentages of a region covered by corn crops, a region was divided into the following
strata: A (irrigated crops), B (small grains and mixed crops), C (grasslands and small
crops), D (wooded areas and crops), E (grasslands), and F (woods and pastures).
The region under study is first divided into smaller regions called quadrats, each
with an area of 1 km². These subregions are then assigned to the various strata cat-
egories. Suppose that data from previous studies is used to obtain estimates of the
standard deviations σi of the percentages of corn crops within each stratum and that
this information is collected in the following table:
Stratum (Si)   Stratum size (Ni)   Standard deviation (σi)
A              500                 .2
B              300                 .2
C              100                 .4
D               50                 .4
E               50                 .6
F              200                 .8
Since aerial photographs are used to estimate the percentage of corn coverage at a
given site, the unit sampling costs will be about the same for each 1-km² subregion,
so the Neyman allocation can be used. If we specify a 90% confidence level (the
area under the z curve between −1.645 and +1.645 is .90) and a margin of error
of ±5% (i.e., B = .05), then

$$n = \frac{\left[ \sum_{i=1}^{k} N_i \sigma_i \right]^2}{N^2 (B/1.645)^2 + \sum_{i=1}^{k} N_i \sigma_i^2} = \frac{[500(.2) + 300(.2) + 100(.4) + 50(.4) + 50(.6) + 200(.8)]^2}{1200^2 (0.05/1.645)^2 + [500(.2^2) + 300(.2^2) + \cdots + 200(.8^2)]}$$

= 109.68 ≈ 110 (rounding to the nearest integer).

Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has
deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
4.2 Data from Sampling 175

Using the fact that Σᵢ Niσi = 410.0, the Neyman allocation of n = 110 to the strata is

Stratum   Ni    σi   Niσi   ni = n(Niσi / Σj Njσj)
A         500   .2   100    n1 = 110(100/410) = 26.8 ≈ 27
B         300   .2    60    n2 = 110(60/410)  = 16.1 ≈ 16
C         100   .4    40    n3 = 110(40/410)  = 10.7 ≈ 11
D          50   .4    20    n4 = 110(20/410)  =  5.4 ≈  5
E          50   .6    30    n5 = 110(30/410)  =  8.0 ≈  8
F         200   .8   160    n6 = 110(160/410) = 42.9 ≈ 43

The next step in the study is to obtain random samples of size n1 = 27, n2 = 16, . . . ,
n6 = 43 from the respective strata and to use aerial photographs of the selected
1-km² regions to obtain estimates of the corn percentages in these regions. To illustrate,
the following table summarizes the data from such a study:

Stratum   ni   Ni    x̄i    si
A         27   500   .52   .18
B         16   300   .22   .23
C         11   100   .02   .35
D          5    50   .06   .45
E          8    50   .01   .64
F         43   200   .67   .78
From this data we estimate that the overall percentage of the entire region that is covered by
corn crops is

x̄str = .52(500/1200) + .22(300/1200) + ⋯ + .67(200/1200) = .39 (or 39%)

and the estimated standard error that accompanies this estimate is sstr ≈ .03 (or 3%).
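For the record, the numbers in this example can be reproduced with the hypothetical helper functions sketched just before Example 4.12:

    N_i     = [500, 300, 100, 50, 50, 200]
    sigma_i = [.2, .2, .4, .4, .6, .8]

    required_n(N_i, sigma_i, B=.05, z=1.645)   # 109.7, which rounds to 110
    alloc = neyman_alloc(110, N_i, sigma_i)    # [27, 16, 11, 5, 8, 43]

    xbar_i = [.52, .22, .02, .06, .01, .67]
    s_i    = [.18, .23, .35, .45, .64, .78]
    stratified_mean(N_i, xbar_i)               # 0.388, about .39
    stratified_se(N_i, alloc, s_i)             # 0.030, about .03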

Estimating a Population Proportion


Stratification can also be used to obtain an estimate of a population proportion π. Recall
that a population proportion is simply the proportion of all the items in a population that
have a particular attribute. In statistics, it is important to remember that the term popula-
tion proportion refers to a proportion of the number of items in a population. Proportions
or percentages that use different bases of comparison (such as in Example 4.12, where per-
centages of land areas were used) are treated simply as numerical data, not as proportions.
The procedure presented earlier for finding stratified estimates of a population
mean can easily be converted into a procedure for estimating a population proportion.
Using earlier notation, where N denotes the population size and Ni denotes the number
of items in the ith stratum, Si, the only changes in the formulas are:
1. Replace each x̄i by pi, where pi is the sample proportion of items found in the
sample of ni items selected from stratum Si.
2. Replace each σi by √(πi(1 − πi)), where πi is the proportion of items in stratum Si
that have the given attribute. Since the values of πi are not normally known exactly,
there are various possibilities for estimating them:
   a. You can approximate the πi values based on pilot studies or on results from
   previous studies.
   b. Or, if there is no prior information about πi values, then be pessimistic
   and use πi = .5 for each i = 1, 2, 3, . . . , k (this choice maximizes
   √(πi(1 − πi))).
The stratified estimate of the population proportion π is then given by

$$p_{\text{str}} = p_1 \left( \frac{N_1}{N} \right) + p_2 \left( \frac{N_2}{N} \right) + p_3 \left( \frac{N_3}{N} \right) + \cdots + p_k \left( \frac{N_k}{N} \right)$$

and its associated standard error is approximated by

$$s_p \approx \sqrt{ \frac{1}{N^2} \sum_{i=1}^{k} N_i^2 \left( \frac{N_i - n_i}{N_i} \right) \left( \frac{p_i(1 - p_i)}{n_i - 1} \right) }$$

Example 4.13 Improper handling of newly planted citrus trees can cause a defect called benchroot,
which is the tendency for the root system to grow sideways. Benchroot eventually
causes trees to be less healthy and smaller than normal, which results in smaller
crops. Because citrus trees require several years of growth before reaching maximum
production levels, the presence of benchroot is not apparent until years after planting.
By sampling young trees shortly after planting, the extent of the benchroot problem
can be estimated in time to take other measures, such as replanting selected areas.
Suppose that a citrus cooperative consists of five different farms. Using the farms as
strata should increase the precision of the final sampling results since the trees within a
given farm ought to be more similar to each other than to trees on other farms. The numbers
of trees on the farms are known to be N1 = 2000, N2 = 4000, N3 = 8000, N4 = 5000,
and N5 = 1000. Based on records from previous plantings, the benchroot problem has
affected no more than about 10% of all trees, so a value of πi = .10 (i = 1, 2, 3, 4, 5) is
selected for each farm. This means that σi = √(.10(1 − .10)) = .3 for each farm. Since the
unit costs ci (i = 1, 2, 3, 4, 5) of selecting and testing a tree are assumed to be equal for each
farm, the Neyman allocation can be used to find the required sample size and its allocation
to the strata (farms). Finally, suppose that a confidence level of 95% and an error bound
of B = .03 (i.e., ±3%) are chosen. Based on this information, the required sample size is

$$n = \frac{\left[ \sum_{i=1}^{k} N_i \sigma_i \right]^2}{N^2 (B/1.96)^2 + \sum_{i=1}^{k} N_i \sigma_i^2} = \frac{[2000(.3) + 4000(.3) + 8000(.3) + 5000(.3) + 1000(.3)]^2}{20000^2 (0.03/1.96)^2 + [2000(.3^2) + 4000(.3^2) + \cdots + 1000(.3^2)]} = 376.92$$


which we round to n = 377. The following table shows the steps in allocating the
total sample of 377 to the five strata (farms). After sampling, the number of trees with
benchroot, xi, is recorded for each farm. Note that we have rounded all final sample
sizes to integer values.

Farm   Ni     σi   Niσi   ni = n(Niσi / Σj Njσj)      xi
1      2000   .3    600   n1 = 377(600/6000)  =  38    2
2      4000   .3   1200   n2 = 377(1200/6000) =  75    5
3      8000   .3   2400   n3 = 377(2400/6000) = 151    8
4      5000   .3   1500   n4 = 377(1500/6000) =  94    3
5      1000   .3    300   n5 = 377(300/6000)  =  19    2

Of the sampled trees, x1 = 2, x2 = 5, x3 = 8, x4 = 3, and x5 = 2 trees were found to
have the benchroot problem. Using this data, the stratified estimate of the proportion
of all the 20,000 trees in the cooperative having benchroot is

$$p_{\text{str}} = p_1 \left( \frac{N_1}{N} \right) + p_2 \left( \frac{N_2}{N} \right) + p_3 \left( \frac{N_3}{N} \right) + p_4 \left( \frac{N_4}{N} \right) + p_5 \left( \frac{N_5}{N} \right) = .053$$

The reader can verify that the standard error associated with this estimate is
sp = .011.
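The same computation in code form: a brief Python sketch (ours, with made-up variable names) that reproduces pstr and sp from the table above.

    import math

    N_i = [2000, 4000, 8000, 5000, 1000]   # farm sizes
    n_i = [38, 75, 151, 94, 19]            # Neyman allocation of n = 377
    x_i = [2, 5, 8, 3, 2]                  # sampled trees found to have benchroot
    p_i = [x / n for x, n in zip(x_i, n_i)]

    N = sum(N_i)
    p_str = sum(Ni * pi for Ni, pi in zip(N_i, p_i)) / N        # 0.053
    s_p = math.sqrt(sum(Ni ** 2 * ((Ni - ni) / Ni) * pi * (1 - pi) / (ni - 1)
                        for Ni, ni, pi in zip(N_i, n_i, p_i)) / N ** 2)  # 0.011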

Cluster Sampling
Stratified and SRS sampling are best when relatively complete lists of population elements
and strata sizes are known before sampling. In some applications, however, such informa-
tion is difficult or impossible to obtain. In wildlife sampling, for instance, scientists usually
do not have advance knowledge of either the size of the particular population or the size of
the various strata in the population. In such cases, some form of cluster sampling is used
instead of SRS or stratified sampling. Like stratified sampling, cluster sampling requires
that we first divide a population into nonoverlapping groups, called clusters. However, we
do not need to know the number of population elements in each cluster. Instead, we simply
take an SRS sample of the clusters and then measure all elements within the selected clus-
ters. For example, the U.S. Census relies on cluster sampling when complete lists of city in-
habitants are not known. A city is divided into blocks (clusters) using maps, then a random
sample of these blocks is selected and all residences in the sampled blocks are contacted.

Example 4.14 Biologists and ecologists frequently sample geographic areas by dividing a map of a
region into a collection of small square regions called quadrats (Ripley, B. D., Spa-
tial Statistics, New York, Wiley, 2004: 102). By making sure the quadrats do not over-
lap, we can apply the method of cluster sampling by choosing a random sample of
quadrats to investigate. In wildlife studies, for instance, the number of a given species
in each of the selected quadrats is counted. Because the area of a quadrat is known,
these counts are usually converted into a count per unit area, which is a measure of
the abundance of the particular species per unit area.
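As a concrete (and purely hypothetical) illustration, the quadrat scheme might be coded as follows: the clusters are the grid cells, an SRS of cells is drawn, and every individual inside each selected cell is counted.

    import random

    g = 20  # the mapped region is divided into a 20 x 20 grid of 1-km^2 quadrats
    quadrats = [(row, col) for row in range(g) for col in range(g)]
    chosen = random.sample(quadrats, 25)   # SRS of 25 quadrats; no frame of animals needed

    def field_count(quadrat):
        # stand-in for the count of the species actually made in the field
        return random.randint(0, 12)

    counts = [field_count(q) for q in chosen]
    abundance = sum(counts) / len(counts)  # estimated count per square kilometer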


Section 4.2 Exercises

6. Devise a procedure for selecting a random sample of words from a dictionary. Explain why your procedure guarantees that, for any n, each collection of n words has an equally likely chance of being selected.

7. Sometimes it is difficult or impossible to determine the population size before selecting a random sample. Describe how you would go about selecting a random sample of trees from a 1-square-mile area of forest.

8. Small manufactured goods are often gathered into large batches, called lots, for purposes of handling and shipping. Random sampling is commonly used to evaluate the quality of items in a given lot. Suppose an inspector selects a random sample of 20 items from a lot of 1000 items.
   a. Before evaluating the 20 items, the inspector decides that a sample of size 30 should be used instead. If the inspector obtains a second random sample of size 10 from the remaining 980 items, can the two samples combined be validly considered a random sample of 30 from the lot? Explain your reasoning.
   b. Suppose the inspector decides that only 15 items must be tested. Describe a method by which a valid random sample of 15 from the lot can be formed from the 20 items already selected.

9. Citrus trees are usually grown in orderly arrangements of rows to facilitate automated farming and harvesting practices. Suppose a group of 1000 trees is laid out in 40 rows of 25 trees each. To test the sugar content of fruit from a sample of 30 trees, researcher A suggests randomly selecting five rows and then randomly selecting six trees from each sampled row. Researcher B suggests numbering a map of the trees from 1 to 1000 and selecting a random sample (without replacement) of 30 integers from the integers 1 to 1000.
   a. Without performing any calculations, do you think that both methods are capable of generating random samples from the block of trees? Justify your answer using the rules for random samples listed in this section.
   b. Suppose that the group of trees is grown on the top and sides of a small hill. A researcher suggests that, because growing conditions (e.g., daily amounts of sunlight) are different on the four sides of the hill, the hill should be divided into four quadrants and trees should be randomly sampled from each quadrant. What is the name for this type of sampling procedure?

10. In stratified sampling, explain why it is best to choose strata such that the objects within any stratum are relatively homogeneous.

11. Explain how to use the =RANDBETWEEN function in Excel™ to generate a random sample from the integers 1 through 1000. Does the =RANDBETWEEN function generate samples with or without replacement?

12. A population of items is partitioned into k strata of sizes N1, N2, . . . , Nk. Using proportional allocation, random samples of size n1, n2, n3, . . . , nk are selected from the strata and the numbers x1, x2, x3, . . . , xk of items having a specified characteristic are determined. Sample proportions p1, p2, p3, . . . , pk are then computed (i.e., pi = xi/ni for each i).
   a. Write an expression for the weighted average of the sample proportions, using the stratum sizes as weights.
   b. Show that the weighted average in part (a) simplifies to (x1 + x2 + x3 + ⋯ + xk)/(n1 + n2 + n3 + ⋯ + nk).

13. Integrated circuits (ICs) consist of thousands of small circuits, electronic subcomponents (e.g., resistors), and connections. An important factor in the manufacture of ICs is the yield, the percentage of manufactured ICs that function correctly. Stratified sampling has recently been used to estimate the number of defects of various kinds that occur throughout an IC. The area of the IC is first divided into smaller areas (i.e., strata) and then small sample areas are selected from the strata and examined for defects. A stratified estimate of the overall proportion of defects can be used to help estimate the eventual yield of the IC manufacturing process.
   In one such study, to estimate the proportion of pinholes on an IC, its entire surface was first divided into 10 equal areas (strata), each of which was further subdivided into 1000 smaller rectangles that served as the elements to be sampled. It was also assumed that the unit costs and variances of the numbers of pinholes were equal from strata to strata.
   a. Calculate the population size N.
   b. Using a confidence level of 90% and a bound on the error of estimation of B = .03 (i.e., ±3%), calculate the required sample size n and its allocation n1, n2, n3, . . . , n10 to the ten strata. Round all sample sizes to the nearest integer.
   c. Using the sample sizes in part (b), the results of the study showed the following numbers of pinholes per sample:

      Sample #:   1   2   3   4   5   6   7   8   9   10
      Pinholes:   5   4   7   6   3   9   5   6   2    8

      Calculate the stratified estimate of the proportion of pinholes on the entire IC.
   d. Calculate the standard error associated with the estimate in part (c).

14. Of the elements of a certain population, 20% are grouped into stratum S1 and the rest of the population elements comprise stratum S2. Suppose that the variances of the characteristic being measured are the same for each stratum, but it costs twice as much to obtain a sampled item from stratum S1 as it does from stratum S2. What is the best allocation of a total sample of n = 1000 to these two strata?

15. When the per unit cost of sampling from stratum i is ci, it can be shown that the optimal weights for allocating the total sample size are given by

$$w_i = \frac{N_i \sigma_i / \sqrt{c_i}}{N_1\sigma_1/\sqrt{c_1} + N_2\sigma_2/\sqrt{c_2} + N_3\sigma_3/\sqrt{c_3} + \cdots + N_k\sigma_k/\sqrt{c_k}}$$

   a. In the case where all unit sampling costs are equal, show that the resulting weights give the formulas for n and ni specified by the Neyman allocation.
   b. In the case where all unit sampling costs are equal and all strata variances are equal, show algebraically that the resulting weights give the formulas for n and ni specified by the "proportional" allocation.

16. In stratified sampling, explain why the number of strata, k, should not exceed n/2, where n = n1 + n2 + n3 + ⋯ + nk is the total sample size and ni denotes the number of sampled items selected from stratum Si (i = 1, 2, 3, . . . , k).

17. In stratified sampling, what value would you use in place of 1.96 if you wanted the confidence level to be 99% rather than 95%? What is the consequence of using the higher confidence level on the necessary sample size?

4.3 Data from Experiments


The choice of a data collection method is dictated, to a large extent, by how we intend
to use the data. If our work involves applying standards and codes (e.g., strength testing
of concrete in commercial buildings, measuring the amount of a pollutant in a water
sample, or assigning the DOT treadwear rating printed on automobile tires), then it is
desirable to use operational definitions (Section 4.1) to keep tight control over every as-
pect of the measurement process. By doing so, we ensure that the results will be directly
comparable to similar tests and measurements made by ourselves and others. On the
other hand, if our work involves research and experimentation, then it is necessary to
purposely allow some of the underlying conditions to vary so that their combined effects
can be studied and understood. In this way, we can generalize the conclusions obtained
from the data to a larger setting. This text is concerned primarily with the latter type of
application: the statistical tools needed in research and experimentation.
The statistical techniques used in experimental research are collectively known
as experimental designs. In the sciences, these tools are also referred to as the design
of experiments, commonly abbreviated DOE. Experimental designs are carefully de-
tailed plans for obtaining sample data for the purpose of understanding relationships


between variables and generalizing conclusions obtained from the data. Inherent in
these designs are methods for balancing the two opposing goals of comparability and
generalizability mentioned in the previous paragraph.

Example 4.15 Plastic resins used in injection molding machines are designed to meet various pro-
duction requirements (e.g., melting temperatures, hardness, color). Raw resins are
manufactured in the form of solid plastic pellets that are subsequently melted inside
an injection molding machine and then “shot” into molds.
Suppose that a company wants to test similar resins from two suppliers, A and B,
to determine which one better achieves the hardness requirements for certain molded
parts. One experimental approach is to test each resin two or more times using the same
molding machine. By combining more than one reading for each brand, we hope to
“average out” any unexpected biases that might creep into any single measurement. In
such an experiment, the average hardness measurements would be directly comparable.
That is, as long as all other experimental conditions are held constant, there would be
little doubt that differences between the average hardness measurements could be at-
tributed to differences between the two brands of resin. Figure 4.2(a) depicts this design.

[Figure 4.2  Experimental designs for hardness requirements: (a) one machine, two measurements per brand (readings x1, x2 for brand A and x3, x4 for brand B); (b) two machines, two measurements per brand (machine 1 yields x1 for brand A and x3 for brand B; machine 2 yields x2 for brand A and x4 for brand B).]
machines, two measurements per brand

It is very difficult, however, to extrapolate such results to a more general setting. For
instance, would the hardness measurements be significantly affected if we used several
different molding machines? Figure 4.2(b) shows a simple experimental design that
allows us to answer this question while simultaneously allowing us to answer the origi-
nal question about differences between the two brands. The noteworthy feature of this
design is that comparability between brands is maintained [by comparing the average
hardness reading (x1 + x2)/2 for brand A to the average (x3 + x4)/2 for brand B], yet
we can also answer questions about whether different machines influence the results
[by comparing the two machine averages (x1 + x3)/2 and (x2 + x4)/2]. As this design
illustrates, the key to maintaining comparability while answering questions about gen-
eralizability is to make each measurement work more than once. Note, for instance,
that reading x1 appears in the average for brand A and again in the average for machine
1. Designs such as the one in Figure 4.2(b) can easily be extended to handle more and
more complex questions involving the effects of changing several test conditions.

Experimental designs are considered to be controlled studies because they place


strict guidelines on which factors are allowed to vary and on the range of values these
factors may assume. In this way, they differ from observational studies, in which


experimenters simply observe and measure but otherwise allow all factors to vary freely.
The following list shows some of the most common applications of experimental design.

Where Experimental Designs Are Used


• Studying cause-and-effect relationships
• Increasing the external validity of data
• Studying how independent variables (factors) affect a dependent variable (response)
• Studying the interrelationships among factors that affect a response
• Optimizing product and process characteristics
• Measuring experimental error

Experimental Design Terminology


Most of the concepts and terminology of experimental design were developed in the
mid-1920s by the English statistician Sir Ronald Fisher while he was working at the
British Agricultural Experimentation Station at Rothamsted, just outside London. Al-
though Fisher’s applications were primarily agricultural, statisticians quickly realized
that the methods of experimental design were universal and soon began using them in
industrial and scientific applications as well.
The object of using an experimental design is to study and quantify the effects that dif-
ferent test conditions have on some measurable characteristic of a product or process. For
instance, experimental designs have been used for decades to analyze drilling processes.
In one such study, the thrust force (lb) required to push a drill into a bar of aluminum
was studied along with two explanatory variables, drill diameter (in.) and the feed rate
(in./revolution) with which the drill penetrates the metal (“Design of a Metal-Cutting
Drilling Experiment: A Discrete Two-Variable Problem,” Quality Engr., 1993: 71–98). In
the language of experimental design, the thrust force is a response variable (also called
a dependent variable); drill diameter and feed rate are two factors (also called indepen-
dent variables) whose values are thought to explain or affect the values of the response
variable. Part of the experimental process involves selecting specific factor values, called
the factor levels (or treatment levels), to use in the study. In this study, five different feed
rates were used (.005, .006, .009, .013, and .017 in./rev.) along with five drill sizes (.225,
.250, .318, .406, and .450 in.). The final choice to be made involves the experimen-
tal unit(s) to which the treatments will be applied. Experimental units are the objects
or material upon which the final measurements are made. In the drilling study, it was
decided to use samples of a single type of aluminum alloy as the experimental units.
The particular choice of experimental units is important because it influences the
range of validity of the experimental results. Generally speaking, the more variation there
is between experimental units, the wider the range of validity of the experiment. By choos-
ing a single type of aluminum alloy, for example, the results of the drilling experiment
previously described are limited primarily to conclusions about drilling in aluminum.
If, instead, the experimental units had consisted of different types of metals, then the ex-
perimental results would correspondingly apply to a wider range of drilling applications.

The Basic Tools of Experimental Design


Experimental designs are built from a small group of tools, each addressing specific con-
cerns about experimental results: reducing bias, reducing experimental error, reducing


the effect of external factors, and increasing the generalizability of the conclusions.
What follows is an overview of these tools. Specific designs are presented in Chapter 10.
Perhaps the most familiar tool is that of replication, that is, making several repeated
measurements at each fixed combination of factor or treatment levels. For instance, in
Figure 4.2(b) of Example 4.15, suppose that we decide to make three measurements
of plastic hardness at each of the four combinations of factor levels: {brand A with ma-
chine 1, brand B with machine 1, brand A with machine 2, brand B with machine 2}.
The purpose of doing this is twofold: (1) Biases tend to be eliminated when several
measurements are averaged and (2) the variation between repeated measurements gives
a measure of experimental error. Experimental error is the name given to the slight
differences that we expect to find between repeated experimental tests, even when we
attempt to hold all test conditions constant.
The next tool, randomization, is somewhat less familiar than replication. Random-
ization requires that treatments be given to the experimental units in random order, or
equivalently, that we assign experimental units to the various treatments in a random
fashion. In Example 4.15, the experimental units are the individual containers of plastic
pellets (of each brand) that are used for testing. Since we decided to use three replica-
tions for each combination of factor levels, there are a total of 12 tests to conduct (three
measurements at each of the four factor combinations). Randomization requires that
these 12 tests be run in random order. This is easy to accomplish using the methods of
Section 4.2, as the next example shows.

Example 4.16 In Figure 4.2(b) of Example 4.15 (page 180), denote the four distinct treatment com-
binations by M1A, M1B, M2A, and M2B, where M1A stands for the combination
“machine 1 and brand A,” M1B stands for “machine 1 and brand B,” and so forth. To
run three replicate tests at each treatment combination, we first number these tests
from 1 to 12 as in the following table. Next, a random sample of size 12 is chosen
(without replacement) from the integers 1 through 12. Suppose, for instance, the
random sample is {11, 3, 7, 1, 4, 5, 12, 2, 8, 10, 6, 9}. With this ordering, test 4 (M2A)
would be the first one conducted, test 8 (M1B) would be next, and so forth. In this
way, the tests will be conducted in random order.

Test #   Test conditions   Random order in which tests are conducted
1 M1A 11
2 M1A 3
3 M1A 7
4 M2A 1
5 M2A 4
6 M2A 5
7 M1B 12
8 M1B 2
9 M1B 8
10 M2B 10
11 M2B 6
12 M2B 9
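
A run order like this is easy to produce in software. The following Python sketch is a minimal illustration of the procedure in this example (the labels and printout are our own, not part of the original study):

    import random

    # The 12 planned tests of Example 4.16: three replicates at each
    # of the four machine/brand treatment combinations
    tests = ["M1A"] * 3 + ["M2A"] * 3 + ["M1B"] * 3 + ["M2B"] * 3

    # A random sample of size 12, drawn without replacement from the
    # integers 1-12, gives the random order in which the tests are run
    run_order = random.sample(range(1, 13), k=12)

    for test_num, (cond, order) in enumerate(zip(tests, run_order), 1):
        print(f"Test {test_num:2d} ({cond}) is run in position {order}")

Sorting the tests by their assigned positions then gives the actual running schedule.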


Randomization is used for much the same reasons that we use random sampling
(Section 4.2): to eliminate unforeseen biases from the experimental data and to lay the
groundwork for the statistical inferences that we eventually draw from the experiment.
The first reason is easy to understand when we again consider Example 4.15. To save
time, for instance, someone might decide to run all three tests involving machine 1 and
brand A sequentially, since the brand A plastic could simply be inserted in the machine
three times in a row, avoiding any downtime for cleaning the machine when switching
to the other brand. However, this might mean that all the tests with machine 1 and
brand B would have to be conducted on a different day than the brand A tests. Since it is
possible that environmental factors could change from one day to another or that differ-
ent machine operators might be used on different days, these different conditions them-
selves could be responsible for substantial differences in the hardness measurements.
In other words, we could no longer be confident about attributing differences between
hardness measurements solely to differences between the two brands. By running the 12
tests in random order, we can avoid systematic biases such as these.
The third tool used extensively in experimental design is blocking. Blocking is
used to screen out the effects of external factors that the experimenter suspects in
advance will have a large effect on the measurements. Pharmaceutical companies,
for example, use blocking when testing the effectiveness of a new drug. Because
different people often differ widely in their responses to drugs, experimenters first
divide the experimental subjects into homogeneous groups or blocks. The people
in a given block are “matched” on various characteristics (e.g., blood pressure, age,
gender) so that the people in any given block are very similar to one another but fairly
different from the people in other blocks. The goal is to maximize the similarity of
the subjects within each block and to maximize the differences between the blocks.
For instance, block 1 might consist of young females with low blood pressure, block
2 could consist of middle-aged men with high blood pressure, and so forth. After
the blocks are formed, the experimental treatments are applied within the blocks.
For example, half of the people in block 1 would be given the new drug, whereas
the other half would receive a placebo. Similarly, half the people in block 2 would
receive the new drug and half would receive the placebo. In this way, when we look
at a particular block, any differences in response between the two halves of the block
could be attributed to the different treatments (receiving the drug or receiving the
placebo), not to the differences between people. Without blocking, differences in the
response to different treatments can often be masked by large differences between
the individuals randomly selected for each treatment.
Blocking increases the sensitivity of an experiment for detecting differences be-
tween treatments. When blocking is applied in conjunction with randomization, it is
possible to design experiments that are simultaneously sensitive to differences between
the treatments studied but less sensitive to the unknown external factors that might
affect the data. One popular phrase that summarizes how these tools are to be used is
“block what you know, randomize what you don’t.”5 In other words, try to identify known
sources of variation and eliminate their effect by forming blocks. However, within each
block, remember to assign experimental units to the treatments in a random fashion.

5 Box, G. E. P., W. G. Hunter, and J. S. Hunter, Statistics for Experimenters (2nd ed.), John Wiley & Sons, New York, 2005: 93.


Example 4.17 The strength of concrete used in commercial construction tends to vary from
one batch to another. Consequently, small test cylinders of concrete sampled
from a batch are “cured” for periods up to about 28 days in temperature- and
moisture-controlled environments before strength measurements are made.
Concrete is then “bought and sold on the basis of strength test cylinders” (ASTM
C 31 Standard Test Method for Making and Curing Concrete Test Specimens
in the Field).
Suppose that we want to compare three different methods of curing concrete
specimens. We know that batch-to-batch variation can be a significant factor in
strength measurements. One way to compare the three methods is to use different
batches of concrete as blocks in an experimental design. This is accomplished by
separating each batch into three portions and then randomly assigning the portions
to the three curing methods. Table 4.1 shows the data from one such test using ten
batches of concrete of comparable strengths.

Table 4.1 Data from the blocked experiment of Example 4.17
Strength (in MPa)
Batch Method A Method B Method C
1 30.7 33.7 30.5
2 29.1 30.6 32.6
3 30.0 32.2 30.5
4 31.9 34.6 33.5
5 30.5 33.0 32.4
6 26.9 29.3 27.8
7 28.2 28.4 30.7
8 32.4 32.4 33.6
9 26.6 29.5 29.2
10 28.6 29.4 33.2

The purpose of blocking is to allow for fair comparisons among the three test
methods. Notice, for example, that all three methods gave relatively lower values
for batch 6 and higher values for batch 5. This is evidence of a difference be-
tween batches 5 and 6. By blocking, however, any differences among the batches
are experienced by all three test methods. Consider how different things might
be if we had simply assigned entire batches at random to the three test methods.
By doing so, it is possible that batch 5 could be assigned to method C alone and
batch 6 to method A alone, which would increase the average strength measure-
ment for column C and decrease the average for column A. In other words, if
we do not use blocking, then differences among the three test methods could be
significantly influenced by the manner in which the batches of cement are as-
signed to the tests.
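
The within-batch randomization described here is also easy to carry out in software. The sketch below shows one way it might be done in Python; the batch count and labels simply mirror Table 4.1 and are illustrative only:

    import random

    methods = ["A", "B", "C"]

    # Each batch is a block: its three portions are randomly assigned to
    # the three curing methods, so every method appears once per batch
    for batch in range(1, 11):
        assignment = random.sample(methods, k=3)  # a random permutation
        for portion, method in zip((1, 2, 3), assignment):
            print(f"Batch {batch}, portion {portion} -> method {method}")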


Section 4.3 Exercises

18. Four new word processing software programs are to be compared by measuring the speed with which various standard tasks can be completed. Before conducting the tests, researchers note that the level of a person’s computer experience is likely to have a large influence on the test results. Discuss how you would design an experiment that fairly compares the word processing programs while simultaneously accounting for possible differences in users’ computer proficiency.

19. What primary purpose do replicated measurements serve in an experimental design?

20. In a study of factors that affect the ability of the laser in a DVD player to read the information on a DVD, a researcher decides to examine several different photoresist thicknesses used in making the plates from which plastic DVDs are stamped. As a response variable, the researcher decides to measure the average pit depth of the holes etched on the surface of the DVD. The experiment must be conducted under a fixed budget and time constraint that allows the researcher to analyze a sample of at most 20 DVDs.
a. Suppose that it is known that, for any fixed photoresist thickness, there tends to be little, if any, variation in the pit depths on a DVD. Which would be better: (1) an experiment with little or no replication and several photoresist thickness levels or (2) an experiment with more replication, but fewer photoresist thickness levels?
b. Suppose it is known that, even for a fixed photoresist thickness, pit depths can vary substantially. Answer the question posed in part (a) for this situation.

21. A researcher wants to test the effectiveness of a new fuel additive for increasing the fuel efficiency (miles per gallon, mpg) of automobiles. The researcher proposes that a car be driven for a total of 500 miles and that at the end of each 100-mile segment the fuel efficiency be measured and recorded.
a. What is the purpose of measuring efficiency every 100 miles? Why not just measure efficiency at the end of the 500-mile course?
b. What operational definitions would you suggest that the researcher incorporate into this experiment?
c. What changes would you make to the experiment to increase the generalizability of the experimental results?

22. In a study of the ratio of nitrogen, phosphoric acid, and potash in fertilizers, four different mixtures (M1, M2, M3, M4) of the three chemicals are to be tested for their effects on the rate of growth of grass seedlings. A square plot of land is subdivided into four equal-size square plots, each planted with the same amount, by weight, of seedlings. Before the fertilizers are applied, each square subplot is itself divided into four more squares. Two experimental methods are proposed for applying the fertilizers to the subplots. In experiment A, the four fertilizers are randomly assigned to the large subplots, whereas in experiment B, all four fertilizers are randomly assigned to the subplots of the four large plots. An illustration of both experimental designs follows.

Experiment A      Experiment B
M1 M1 M4 M4       M1 M4 M3 M2
M1 M1 M4 M4       M3 M2 M4 M1
M3 M3 M2 M2       M2 M1 M2 M3
M3 M3 M2 M2       M4 M3 M1 M4

a. If care were taken to ensure that there are no significant differences in the growing conditions (soil type, irrigation, drainage, sunlight, etc.) among the four large subplots, is one of these designs preferable over the other? Why?
b. If it is suspected that there could be significant differences in the growing conditions among the four main subplots, is one of the two designs preferable over the other? Why?

23. A complex chemical experiment is conducted and, because the amount of precipitate produced is expected to vary, the experiment is repeated several times. A lengthy lab equipment setup, followed by a tedious experimental procedure, allows the experiment to be repeated up to six times in any given day. Consequently, one lab assistant is assigned to set up the lab equipment and then conduct six runs one day. The next day a second lab assistant conducts another six runs using the same lab setup from the previous day. What two basic experimental design principles are violated by this experimental procedure?

24. Refer to Example 4.15 and Figure 4.2(b). Suppose the hardness measurements (in Mohs) of plastics in four test runs are as follows:

             Brand A   Brand B
Machine 1      2.6       3.2
Machine 2      2.8       3.6

a. Calculate an estimate of how much plastic hardness is increased or decreased by switching from the brand A resin to the brand B resin.
b. Calculate an estimate of how much plastic hardness is increased or decreased by switching from machine 1 to machine 2.
c. Because this experiment does not provide any estimate of the experimental error expected in successive experimental runs, it is impossible to know whether the estimated change in part (a) is caused by switching brands or is simply due to experimental variation. Describe how you would improve this experiment to obtain an estimate of the experimental error.

4.4 Measurement Systems


The quality of data is affected by the type of data-gathering plan followed and the reli-
ability of the instruments used to make required measurements. Previous sections of
this chapter have dealt with concerns about data-gathering methods, especially the role
of operational definitions and statistics in addressing these concerns. However, most
statistical methods are not explicitly designed to address questions about the quality of
the raw measurements themselves. Instead, concerns about measurement quality are
usually considered separately.
The study of measurement is called metrology. Broadly speaking, metrology is
concerned with two basic issues. The first deals with our ability to produce measure-
ments of sufficient accuracy and precision to support any analyses based on these
measurements. The second concern is calibration. Calibration addresses the various
systematic errors that can cause an instrument’s readings to be in error. A familiar
example is found in common household scales, which must be “zeroed” before giv-
ing a true reading of a person’s weight. If such a scale consistently gives readings that
are 5 lb too high, then we say that the scale is “out of calibration” and that it has an
offset of 5 lb. Instruments are said to be “in calibration” if they give true readings,
that is, if their offset is zero. Calibrating an instrument usually requires comparing
its readings to those of a similar instrument that is already known to be in calibration.
In turn, these secondary instruments must themselves be calibrated by comparison
with yet a higher standard until we can eventually trace all such comparisons back to
the highest measurement authority—those housed within the National Institute of
Standards and Technology (NIST).

Accuracy and Precision


The concepts of accuracy and precision of a measuring instrument are statistical in na-
ture. Accuracy refers to the degree to which repeated measurements of a known quan-
tity x tend to agree with x. Given several repeated measurements x1, x2, x3, . . . , xn of


some known value x, we measure the accuracy of the readings by the difference between
x and the average of the n readings:

accuracy = x̄ − x

Refer to the n measurement readings displayed in the histogram in Figure 4.3. We can
think of accuracy as the distance between the center of the histogram (i.e., the mean)
and the true value of x.

Figure 4.3 Measurement accuracy

The precision of an instrument describes the extent to which repeated measurements tend to agree with one another. They do not necessarily have to agree with the true value x that is being measured. Precision, then, is a measure of variation and is estimated by the sample standard deviation of n repeated measurements x1, x2, x3, . . . , xn:

precision = s = √[(1/(n − 1)) Σ(xi − x̄)²]

Figure 4.4 shows the various combinations of precision and accuracy that are possible in practice. The worst case occurs in Figure 4.4(a), where the measurements have a large variation (i.e., low precision) and are biased to the left of the true value of x. The best-case scenario is in Figure 4.4(d), where all the measurements are tightly packed around x (i.e., high precision and good accuracy).

Figure 4.4 Precision and accuracy: (a) inaccurate and imprecise; (b) accurate, but imprecise; (c) precise, but inaccurate; (d) accurate and precise
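
Both formulas are straightforward to evaluate from repeated readings. The following Python sketch applies them to a small set of hypothetical measurements of a standard whose true value is known (the numbers are invented for illustration):

    from statistics import mean, stdev

    x_true = 10.00                                # known value of the standard
    readings = [10.02, 9.97, 10.05, 9.99, 10.03]  # hypothetical repeat readings

    accuracy = mean(readings) - x_true   # xbar - x
    precision = stdev(readings)          # sample standard deviation s

    print(f"accuracy  = {accuracy:+.4f}")
    print(f"precision = {precision:.4f}")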


Repeatability and Reproducibility


The concepts of repeatability and reproducibility refer to the amount of variation that
exists between several repeated readings made by a measurement system. Repeatability
is the amount of variation expected when almost all external sources of measurement
error have been controlled and held fixed. For this reason, repeatability studies are often
conducted by the same person using a single instrument to repeatedly measure a single
item. Repeatability is a measure of the best that one can hope to achieve from a mea-
suring instrument. Because the controlled environment of a repeatability study is hard
to duplicate in a production environment, repeatability usually paints a very optimistic
picture of measurement variation.
Repeatability is defined as the sample standard deviation, s, of several repeated
measurements made under the controlled conditions described previously. When
we make the additional assumption that the measurement errors follow a normal
distribution, it is also common to report repeatability as ±3s, because we expect the
vast majority of the readings to fall within a range of about 3 standard deviations
on either side of the average reading. Because precise estimation of a population
standard deviation generally requires larger sample sizes than those necessary for
estimating population means, we recommend against using small sample sizes in
repeatability studies. If desired, exact sample size formulas for estimating the popu-
lation standard deviation can be used.

Example 4.18 In a repeatability study, a worker selects a single manufactured part and measures its
length 25 times. The measurements (in.) and their sample mean and standard de-
viation are given in Table 4.2. The repeatability of the measuring instrument can be
reported either as the standard deviation s = .096 in. or in terms of ±3s = ±3(.096) =
±.288 in. The latter method has the intuitive interpretation that the instrument’s read-
ings generally lie within about .288 in. of the true length. For instance, if the worker
measures another part and obtains a reading of 9.98 in., then the true length of that
part should be somewhere between 9.692 and 10.268 in.

Table 4.2 Data for the repeatability study of Example 4.18

Repetition Measurement (in.) Repetition Measurement (in.)
1 9.92 14 9.90

2 10.05 15 9.88
3 9.99 16 9.82
4 9.85 17 9.91
5 9.90 18 10.05
6 10.00 19 9.87
7 9.99 20 10.05
8 9.98 21 9.94
9 10.17 22 9.75
10 9.97 23 9.89
11 9.97 24 9.85
12 10.02 25 10.12
13 10.00
x̄ = 9.95   s = .096
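
As a check, the summary values reported in Table 4.2 can be reproduced with a few lines of code. This Python sketch computes s and the ±3s repeatability from the 25 readings:

    from statistics import mean, stdev

    # The 25 repeated length measurements (in.) from Table 4.2
    lengths = [9.92, 10.05, 9.99, 9.85, 9.90, 10.00, 9.99, 9.98, 10.17,
               9.97, 9.97, 10.02, 10.00, 9.90, 9.88, 9.82, 9.91, 10.05,
               9.87, 10.05, 9.94, 9.75, 9.89, 9.85, 10.12]

    s = stdev(lengths)
    print(f"xbar = {mean(lengths):.2f}, s = {s:.3f}")   # 9.95 and .096
    print(f"repeatability: +/-3s = +/-{3 * s:.3f} in.")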


Unfortunately, the terms repeatability and reproducibility are not uniquely defined
in the literature, so you may encounter alternative definitions from time to time. One
popular definition of repeatability is given by the formula k√2s, which estimates the
maximum difference that, with high reliability, can be expected between any two in-
strument readings. In this formula, s is the sample standard deviation and the factor k
depends on the reliability level we specify. Tabled values of k, along with a detailed dis-
cussion of this form of repeatability, can be found in the article by Mandel and Lashof
listed in the chapter bibliography.
As we allow more and more parts of a measurement system to vary, we move from
repeatability to the concept of reproducibility. Reproducibility studies allow several fac-
tors to vary at the same time. In such studies, it is common to use several operators and
several instruments to measure several production items. The idea is to see how the mea-
surement system behaves in an environment more closely resembling a real production
environment. Reproducibility studies are usually based on simple experimental designs
that allow us to break measurement variation into distinct components that estimate the
contribution of the various noise factors (different operators, different parts, etc.) to the
overall measurement error. Examples of such designs are given in Chapter 10.

Interlaboratory Comparisons
Many measurements are done by laboratories specializing in complex measurement
procedures. This is the case, for example, for most of the nondestructive tests mentioned
in Example 4.7. For such data, our concern centers on the consistency of the results
reported by different laboratories. Practically speaking, we want some assurance that if
we submit the same sample material to laboratory A and laboratory B, then the results
reported by the two laboratories will be in close agreement.
The reliability of data from different laboratories is evaluated by means of interlabo-
ratory comparison programs. Professional organizations such as the American Society for
Testing and Materials (see Section 4.1) run several such programs each year. For example,
in the ASTM interlaboratory cross-check program for reformulated gasoline, participating
laboratories are given test samples each month for measurement. The test samples are
specially prepared under the direction of ASTM to ensure that each lab receives the same
test material. The data from all participating laboratories is then summarized and given
to the participating laboratories. In this way, each laboratory can evaluate its performance
against the others and, if necessary, make changes to its measurement system.
Youden plots, introduced in 1959, are the standard technique for compar-
ing the data from a group of laboratories (Youden, W. J., “Graphical Diagnosis of
Interlaboratory Test Results,” Industrial Quality Control, 1959: 24–28). To create
these simple scatterplots, each laboratory is given two nearly identical test samples
(labeled A and B) to measure. The two measurements from a given laboratory are
then plotted as a single point on the Youden plot. The horizontal axis is used for
the measurements of sample A and the vertical axis is used for sample B. As an aid
in interpreting the plots, horizontal and vertical lines positioned at the medians of
the sample A data and sample B data are included. Some typical Youden plots are
shown in Figure 4.5. The points generally fall close to a 45° line because
the two samples (A and B) are similar and because each lab follows a fixed measure-
ment procedure.
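
Constructing such a plot requires only the paired (A, B) measurements and their medians. The Python sketch below draws a basic Youden plot with the matplotlib library; the six data pairs are the first six laboratories of Table 4.3 in Example 4.19, used here purely for illustration:

    import matplotlib.pyplot as plt
    from statistics import median

    # Paired readings (sample A, sample B), one point per laboratory
    a = [29.36, 30.11, 42.74, 46.09, 46.46, 49.81]
    b = [33.33, 41.09, 42.46, 45.20, 46.11, 48.85]

    plt.scatter(a, b)
    plt.axvline(median(a), linestyle="--")  # median of the sample A readings
    plt.axhline(median(b), linestyle="--")  # median of the sample B readings
    plt.xlabel("Sample A")
    plt.ylabel("Sample B")
    plt.title("Youden plot")
    plt.show()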


Figure 4.5 Typical Youden plots and their interpretation (Sample A on the horizontal axis, Sample B on the vertical axis): (a) ideal situation with the points evenly scattered in all four quadrants; (b) laboratory 1 and laboratory 2 are using procedures that are systematically different from those used at the other labs; (c) most of the labs are following slightly different versions of the test procedure

Example 4.19 Nonsteroidal anti-inflammatory drugs (NSAIDs) are often used to reduce in-
flammation and relieve fever and pain. Examples of NSAIDs include ibuprofen,
ketoprofen, and naproxen. In “Second Interlaboratory Exercise on Non-Steroidal
Anti-Inflammatory Drug Analysis in Environmental Aqueous Samples” (Talanta,
2010: 1189–1196), researchers wanted to investigate interlaboratory comparisons
of NSAIDs in different aqueous samples. This research was conducted to ascertain
the level of interlaboratory agreement of NSAID analyses among various European
laboratories and also to determine possible sources of variation. In one investigation,
each of 12 laboratories measured the concentrations of ibuprofen (ng/L) in two test
samples of tap water. Table 4.3 shows the data from these tests as read from a graph.
The Youden plot for this data (Figure 4.6) shows many points scattered near the
45°-line, indicating that several of the laboratories are following different versions of
the chemical test procedure.

Table 4.3 Ibuprofen Concentrations (ng/L)

Laboratory Sample A Sample B
1 29.36 33.33
2 30.11 41.09

3 42.74 42.46
4 46.09 45.20
5 46.46 46.11
6 49.81 48.85
7 60.96 55.24
8 66.53 40.63
9 67.65 47.02
10 113.36 105.46
11 172.09 172.11
12 199.97 193.11


Figure 4.6 Youden plot of the data in Table 4.3 (Sample A on the horizontal axis, Sample B on the vertical axis)

Section 4.4 Exercises


25. To estimate the accuracy and precision of an instrument that measures lengths, a .300-in. gauge block was used as a reference standard and was measured five times. The resulting measurements were: .301, .303, .299, .305, and .304. Calculate estimates of both the accuracy and the precision of the measuring instrument.

26. Calibration is the process of comparing an instrument’s measurements to those of a reliable reference standard. If necessary, the instrument is adjusted to bring its measurements into agreement with the reference standard. Explain what effect calibration has on the estimated precision (not the accuracy) of a measuring instrument.

27. Many instrument makers report the accuracy of their instruments in terms of relative error as well as absolute error. The relative error in a measurement is defined as (m − x)/x × 100%, where m is the measured value and x is the true value. Absolute error is given by |m − x|.
a. Calculate the relative errors for each of the five measurements in Exercise 25.
b. Relative errors are often stated in terms of the maximum relative error to be expected for any measurements within the range of an instrument. Suppose, for example, that a thermometer has a maximum relative error of ±4% over its operating range of −50°F to 150°F. What is the maximum absolute error you would expect in a measured reading of 70°F from this thermometer?

28. After carefully controlling all the chemical reagents and conditions during an experiment, a chemist weighs the amount of reactant produced by an experiment. The chemist weighs the reactant on an electronic balance, then reweighs the reactant five times, being careful to remove and replace the reactant on the balance between weighings.
a. In the language of experimental design, can these six measurements be considered replications?
b. What type of variation is measured by calculating the sample standard deviation, s, of the six measurements?

29. The melt flow index (MFI) of a polymer is defined to be the amount of the polymer (in grams) that can flow in 10 minutes through a standard die when subjected to a specified force and temperature. MFI is widely regarded as an important characteristic for
commercial polymer processing. However, there is a lack of reference standards for measuring MFI. To address this issue, the authors of “An Interlaboratory Comparison of the Melt Flow Index: Relevant Aspects for the Participant Laboratories” (Polymer Testing, 2007: 576–586) report MFI readings by 24 laboratories for various polypropylene and polystyrene polymer samples. For one of the polypropylene polymers, each of the 24 laboratories provided the following replicate measurements of MFI:

Laboratory Replicate 1 Replicate 2
1 11.700 11.502
2 9.790 10.300
3 12.760 12.073
4 10.400 9.800
5 10.648 10.904
6 11.074 11.072
7 10.820 10.746
8 11.473 11.682
9 10.723 11.545
10 11.096 11.286
11 10.522 10.215
12 10.603 10.211
13 12.031 12.117
14 10.900 11.477
15 10.876 10.772
16 11.043 11.177
17 10.384 10.669
18 10.118 10.260
19 5.382 5.369
20 10.353 10.132
21 11.413 11.389
22 11.540 11.864
23 12.202 11.548
24 11.227 11.259

a. Create a Youden plot of this data.
b. What conclusions can you draw regarding the test procedures being used in these laboratories? Are there any unusual MFI measurements?

Supplementary Exercises
30. Consult a published reference, weather bureau, or Internet site to determine the operational definition used by weather forecasters when making statements like “There will be a 30% chance of rain tomorrow.”

31. A common method for selecting a random sample without replacement from the integers 1, 2, 3, . . . , N is to generate a random sample with replacement (using random number tables or a software program) and then discard any duplicate numbers that appear in the sample. Use the sampling rules in Section 4.2 to justify why this procedure will produce a valid random sample without replacement.

32. The method of capture–recapture sampling is often used to estimate the size of wildlife populations (Thompson, S. K., Sampling, John Wiley & Sons, New York, 1992: 212–233). To illustrate the method, suppose an initial sample of 100 fish from a lake are caught and tagged. After releasing the fish and allowing sufficient time for them to mix with the rest of the fish in the lake, a second sample of, say, 50 fish are caught. The number of tagged fish in the second sample is counted.
a. Suppose there are five tagged fish found in the second sample. Because the samples are assumed to be random samples from the entire population of T fish, the proportion of tagged fish in the second sample should be approximately equal to the proportion of tagged fish in the population. Use this fact to estimate T, the total number of fish in the lake.
b. Generalize your result in part (a). That is, if xtag1 is the number of fish caught and tagged in the first sample and xtag2 is the number of tagged fish found in a second sample of size y, write an equation for the estimated value of T.

33. Cr(VI) is a pollutant associated with chromite ore processing. In a study of Cr(VI) concentrations, a
sampling plan was devised to estimate ambient levels of Cr(VI) in the air [“Background Air Concentrations of Cr(VI) in Hudson County, New Jersey: Implications for Setting Health-Based Standards for Cr(VI) in Soil,” J. of Air and Waste Management, 1997: 592–597]. The authors propose using such background measurements as a basis for developing health-based standards for chromite ore processing plants.
a. In the study, background samples of air were selected to be representative of land use in the vicinity of chromite ore processing sites, but not so close that these samples would be affected by emissions from the processing plants. What role would such samples play in an experiment to subsequently evaluate emissions at chromite ore plants?
b. The authors used ASTM Standard Test Method D5281–92 when measuring the concentrations of Cr(VI). What experimental purpose does using such a standard serve?
c. Air samples were taken at two different locations, an industrial area and an undeveloped commercial site. Samples were collected at each site during six 24-hour sampling periods; wet and dry days were included. What general experimental design principles are illustrated here?

34. Youden plots are frequently used to compare two different instruments or evaluation methods. In a study of lawn mower exhaust emissions (“Exhaust Emissions from Four-Stroke Lawn Mower Engines,” J. of Air and Waste Management, 1997: 945–950), two methods of measuring NOx (nitrogen oxide) emission rates were compared by using both methods on several models of gas-powered lawn mowers. The following table shows NOx emission rates (grams/kWh) for two measuring methods: STC (similar to certification), which measures emissions for a 10-sec period, and an experimental method C6M, which is a weighted average of emission rates obtained under six different combinations of running speeds, times, and engine loads.

NOx emission rate estimates
Lawn mower STC C6M
1 3.03 4.40
2 4.04 4.38
3 5.34 7.64
4 6.42 8.28
5 4.17 7.21
6 1.23 1.43
7 4.10 3.91
8 2.21 1.89
9 6.57 7.14
10 3.80 4.71
11 4.76 6.80
12 .49 .01
13 1.97 2.91
14 1.64 1.23
15 3.26 2.72
16 4.20 6.95
17 .32 .11
18 7.76 8.73
19 4.79 6.75
20 .98 1.12

a. Construct a Youden plot of this data.
b. Use the methods of Chapter 3 to fit a regression line to this data, with STC as y and C6M as x.
c. What conclusions can you draw from the results in parts (a) and (b) about the two NOx measuring methods?

Bibliography
Box, G. E. P., W. G. Hunter, and J. S. Hunter, Statistics for Experimenters (2nd ed.), Wiley, New York, 2005. Written for researchers. Emphasis is on explanation and application of experimental design techniques to real data and examples.
Lohr, Sharon, Sampling Design and Analysis (2nd ed.), Duxbury, Belmont, CA, 2009. A comprehensive survey of sampling.
Mandel, J., and T. W. Lashof, “The Nature of Repeatability and Reproducibility,” J. of Quality Technology, vol. 19, no. 1, 1987: 29–36. A nice explanation and comparison of two concepts that are sometimes confused in practice.
Thomas, G. G., Engineering Metrology, Wiley, New York, 1974. Explains the various methods used to obtain measurements of different physical quantities.


5 Probability and Sampling Distributions
5.1 Chance Experiments
5.2 Probability Concepts
5.3 Conditional Probability and Independence
5.4 Random Variables
5.5 Sampling Distributions
5.6 Describing Sampling Distributions

Introduction
Chapter 5 marks a transition from purely descriptive methods to the inferential
methods discussed in the remainder of this book. Beginning in this chapter, we will
refer to any numerical measure calculated from sample data as a statistic.  As you
have seen in Chapters 1–3, statistics such as the sample mean, standard deviation,
and correlation coefficient are useful tools for describing sets of data. Similarly, den-
sity and mass functions provide concise descriptions of populations and ongoing
processes. One important question left unanswered in those chapters, however, is:
How do we know what parameter values to use in a mass function or density func-
tion? For example, the Weibull density is commonly used for modeling the lifetimes
of products, but how do you go about selecting numerical values for the
Weibull parameters, α and β, that best describe the lifetimes of a particular product?
One way to answer such questions is to use statistical inference, a tech-
nique that converts the information from random samples (see Section 4.2)
into reliable estimates of, and conclusions about, population or process parame-
ters. Sections 5.5 and 5.6 illustrate how statistical inference works. When reading
these sections, it is important to keep in mind the crucial role played by random
sampling. Without random sampling, statistics can only provide descriptive


summaries of the data itself. With random sampling, though, our conclusions can
be reliably extended beyond the data, to the population or process from which
the data arose. Figure 5.1 illustrates the difference between statistics based on
ordinary data sets and statistics based on random samples.

Figure 5.1 Statistical inference: (a) descriptive statistics; (b) inferential statistics

Drawing conclusions from samples necessarily involves some risk. Samples, after
all, only give approximate pictures of populations or processes. Intuition tells us that
the clarity of these pictures ought to increase as the sample size grows, but intuition
fails to be more precise than that. For example, when testing a large shipment of
parts for defective items, most people would agree that finding two defective items
in a random sample of 10 is very different from finding 200 defectives in a random
sample of 1000. Although the sample percentage (i.e., the statistic calculated from
the data) is the same in both cases, the 20% defect rate in the larger sample seems
much more credible than the 20% defect rate in the smaller sample.  To quantify just
how much more credible the information in the larger sample is, we use the tools
of probability. Probability methods, discussed in Sections 5.1–5.4, provide the basis for measuring the amount of confidence or reliability in a statistic.

5.1 Chance Experiments 


The term chance experiment may sound self-contradictory to an engineer or a scientist.
What could possibly be random or uncertain about a carefully planned scientific inves-
tigation? The answer, of course, lies in our definition of the term. A chance experiment,
also called a random experiment, is simply an activity or situation whose outcomes, to
some degree, depend on chance. To decide whether a given activity qualifies as a chance
experiment, ask yourself the question, Will I get exactly the same result if I repeat the
experiment more than once? If the answer is “no,” then the experiment qualifies as a

Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has
www.ebook3000.com
deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
196 chapter 5 Probability and Sampling Distributions

chance experiment. Under this rather wide definition, determining whether a metal
part withstands a stress test, recording whether it rains tomorrow, measuring the yield of
a chemical reaction, assessing the potency of a pharmaceutical product, or measuring
the volume of water flowing in a drainage system all qualify as chance experiments.
Most chance experiments in the sciences arise because either (1) some natural
phenomenon is at work, causing unpredictable changes in experimental outcomes, or
(2) we purposely introduce randomness as a tool for extrapolating information from data
to conclusions about populations or processes (see Section 4.3). As an example of the
former, yields of chemical reactions often vary with each repetition of an experiment,
no matter how hard one tries to control the conditions of the experiment. Slight differ-
ences in handling (e.g., the amount of mixing, the ambient temperature, the elapsed
time of the reaction) or even in the behavior at the molecular level (e.g., Brownian mo-
tion, material flow) can induce small changes in experimental results. However chance
experiments may arise, from natural forces or by statistical methodology, probability
provides a structure for measuring and consistently handling uncertainty.

Events
Underlying the computations of probability is an organized system for describing and
working with the outcomes of chance experiments. These outcomes can be divided into
two types: (1) simple events, which are the individual outcomes of an experiment and,
more generally, (2) events, which consist of collections of simple events. For instance,
the chance experiment of conducting a series of stress tests on three metal parts has the
eight possible outcomes PPP, PPF, PFP, FPP, PFF, FPF, FFP, and FFF, where P and
F denote the test results “pass” and “fail,” and the order in which the letters appear cor-
responds to the part number tested (e.g., PPF indicates that the first two parts passed the
test, but the third part failed). Each of these eight outcomes is a simple event, which,
taken together, form the sample space of the experiment.
Events are often denoted by single uppercase letters, usually from the beginning of
the alphabet, much like we denote constants in formulas by lowercase letters. Single-letter
names for events are very useful when applying the probability formulas in Section 5.2.
Thus we might denote the event that at least two parts pass the stress test by A, the event
that exactly 1 part passes the stress test by B, and so forth. Events can also be described
by just listing, in brackets, the simple events that comprise them. For example, the
event that at least two parts pass the stress test corresponds to the set of outcomes {PPP,
PPF, PFP, FPP}. If we had also chosen to denote this event by the letter A, then we
could also write A = {PPP, PPF, PFP, FPP}.

Example 5.1 Let’s continue with our example of stress-testing metal parts. Suppose that we now se-
lect and test four parts. Using sequences of Ps (for parts that pass the test) and Fs (for
parts that fail the test), the sample space of the experiment of selecting and testing
four metal parts is somewhat larger than that of the experiment of selecting and test-
ing three metal parts, discussed previously. In particular, the sample space consists of
these 16 simple events: {PPPP, PPPF, PPFP, PFPP, FPPP, PPFF, PFPF, PFFP, FPPF,
FPFP, FFPP, PFFF, FPFF, FFPF, FFFP, FFFF}. For convenience, these events are
listed in order of decreasing numbers of Ps in each four-letter sequence.

Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has
deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
5.1 Chance Experiments 197

Suppose we are interested in the events A = at least two parts pass the stress test and
B = at most two parts pass the stress test. In terms of simple events, we can write A and B as
A = {PPPP, PPPF, PPFP, PFPP, FPPP, PPFF, PFPF, PFFP, FPPF, FPFP, FFPP}
B = {PPFF, PFPF, PFFP, FPPF, FPFP, FFPP, PFFF, FPFF, FFPF, FFFP, FFFF}
Note that A and B have several simple events in common (the six outcomes, appearing in both lists, in which exactly two parts pass).
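
For experiments of this kind, the sample space and events can also be enumerated mechanically. The short Python sketch below (an illustration only) generates all 16 outcomes and picks out the events A and B of this example:

    from itertools import product

    # All 2^4 = 16 outcomes of stress-testing four parts (P = pass, F = fail)
    sample_space = ["".join(seq) for seq in product("PF", repeat=4)]

    A = [s for s in sample_space if s.count("P") >= 2]  # at least two pass
    B = [s for s in sample_space if s.count("P") <= 2]  # at most two pass

    print(len(sample_space), len(A), len(B))  # prints: 16 11 11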

Example 5.2 A reasonably large percentage of C++ programs written at a particular company
compile on the first run, but some do not (a compiler is a program that translates
source code—in this case, C++ programs—into machine language so programs can
be executed). Suppose an experiment consists of selecting and compiling C++ pro-
grams at this location one by one until encountering a program that compiles on the
first run. Denote a program that compiles on the first run by S (for success) and one
that does not by F (for failure). Although it may not be very likely, a possible outcome
of this experiment is that the first 5 (or 10 or 20 or . . .) are F’s, and the next one is
an S. In other words, for any positive integer n, we may have to examine n programs
before seeing the first S. The sample space is {S, FS, FFS, FFFS, . . .}, which con-
tains an infinite number of possible outcomes. The same abbreviated form of the
sample space is appropriate for an experiment in which, starting at a specified time,
the gender of each newborn infant is recorded until the birth of a male is observed.
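
Open-ended experiments like these are easy to simulate. In the Python sketch below, each program compiles on the first run with probability .8, a hypothetical value chosen only for illustration; the loop records F’s until the first S appears:

    import random

    p = 0.80        # hypothetical chance a program compiles on the first run
    outcome = ""
    while True:
        if random.random() < p:   # this program compiles: success
            outcome += "S"
            break
        outcome += "F"            # failure; examine the next program

    print(outcome)  # e.g., "S", "FS", "FFS", ...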

Depicting Events
Various devices have been created to help visually describe the events in a sample space.
Tree diagrams are especially useful for depicting experiments that are conducted in a
sequence of steps, such as our example of testing three metal parts. Beginning at the left,
each step in the sequence is given its own set of branches, which themselves form the
starting points for all branches to their right. Figure 5.2 shows a tree diagram for the ex-
periment of selecting and testing three metal parts. Simple events are formed by follow-
ing any branch of the tree diagram from the leftmost point to one of the rightmost points.

Figure 5.2 Tree diagram for the experiment of selecting and testing three metal parts (branches forming the simple event PPF are shown shaded)


Another visual device, the Venn diagram, is especially useful for depicting rela-
tionships between events. Venn diagrams are simple two-dimensional figures, often
rectangles or circles, whose enclosed regions are intended to depict a collection of
simple events, called points, in a sample space. Figure 5.3 shows a Venn diagram of
several events based on Example 5.1. Events like A and B that contain points in com-
mon are depicted as overlapping regions in the diagram. Events that do not contain
any common points, such as the events B = at most two parts pass the test and C =
exactly three parts pass the test, are shown as nonoverlapping regions. An event that
contains all the points of some other event is shown as surrounding the smaller event.
For example, the event A = at least two parts pass the test contains all of the simple
events in event C = exactly three parts pass the test, so C is shown inside of A in
Figure 5.3.

Figure 5.3 Venn diagram of the events A, B, and C in Example 5.1

Venn diagrams and tree diagrams are indispensable tools in many parts of probabil-
ity theory, but they are not essential to conducting statistical studies. We will use these
diagrams primarily as an aid for discussing certain probability concepts, but, beyond

that, their use is not emphasized. The interested reader may consult texts on probability
for more information on working with Venn diagrams.

Forming New Events


Simple events are fundamental to describing chance experiments, but the events
that are of most interest are usually much more complex. Indeed, it is not an exag-
geration to state that the majority of probability calculations involve techniques for
decomposing complex events into simpler ones. One of the primary methods for
creating complex events and, therefore, for unraveling them, involves the use of the
words and, or, and not. The following box shows how these words are used to build
new events from old ones.


definitions For a chance experiment and any two events A and B:

1. The event A or B consists of all simple events that are contained in either
A or B. A or B can also be described as the event that at least one of A or B
occurs.
2. The event A and B consists of all simple events common to both A and B. A
and B can be described as the event that both A and B occur.
3. The event A′, called the complement of A, consists of all simple events that
are not contained in A. A′ is the event that A does not occur.

Example 5.3 Refer to Example 5.1, the experiment of selecting and testing four metal parts. To
form the event A or B, we simply list all events that are in either A or B, or in both.
The easiest way to do this is to list all the events in A and then add the events in B
that are not duplicates of those in A. Thus

A or B = {PPPP, PPPF, PPFP, PFPP, FPPP, PPFF, PFPF, PFFP, FPPF,
FPFP, FFPP, PFFF, FPFF, FFPF, FFFP, FFFF}

For these two events, A or B happens to contain all 16 sample space points. In a
similar fashion, the event A and B, which consists only of the underlined events in
both A and B, is given by

A and B = {PPFF, PFPF, PFFP, FPPF, FPFP, FFPP}

In this case, it is possible to give a short verbal description of the event A and B;
namely, A and B = exactly two parts pass (and, hence, two fail) the stress test. Finally,
the complement of event A is

A′ = {PFFF, FPFF, FFPF, FFFP, FFFF}

A′ can also be verbally described as the event that at most one part passes the test.
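
These three operations correspond exactly to set union, intersection, and difference, so the results above can be verified directly. The Python sketch below (an illustration only) rebuilds the events of Example 5.1 as sets and forms A or B, A and B, and A′:

    from itertools import product

    sample_space = {"".join(seq) for seq in product("PF", repeat=4)}
    A = {s for s in sample_space if s.count("P") >= 2}  # at least two pass
    B = {s for s in sample_space if s.count("P") <= 2}  # at most two pass

    print(sorted(A | B))             # A or B (union): all 16 outcomes
    print(sorted(A & B))             # A and B (intersection)
    print(sorted(sample_space - A))  # A' (complement of A)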

When two events A and B have no simple events in common, we say that they
are mutually exclusive or disjoint. More intuitively, mutually exclusive events are
ones that cannot occur simultaneously; the occurrence of either event precludes the
occurrence of the other. In a Venn diagram, mutually exclusive events are depicted
as nonoverlapping regions. As we will see in Section 5.2, probability calculations in-
volving disjoint events are particularly simple. For this reason, we often try to decom-
pose complex events into collections of mutually exclusive events when computing
probabilities.
Several of the previous definitions can be extended to include events formed from
more than two events. These definitions are given in the next box.


definitions Given a chance experiment and any events A1, A2, A3, . . . , Ak:

1. The event A1 or A2 or A3 or . . . or Ak consists of all the simple events that are
contained in at least one of the events A1, A2, A3, . . . , or Ak. It can also be described
as the event that at least one of the events A1, A2, A3, . . . , or Ak occurs.
2. The event A1 and A2 and A3 and . . . and Ak consists of all simple events common
to all the events A1, A2, A3, . . . , and Ak. This event can be described as the event
that all of the events A1, A2, A3, . . . , and Ak occur.
3. Several events A1, A2, A3, . . . , and Ak are said to be mutually exclusive or
disjoint if no two of them have any simple events in common.

Example 5.4 Sampling inspection is a common method for ascertaining the quality level of
batches (called lots) of finished products. Sampling inspection can be used by a
manufacturer to check the quality of products prior to shipment or by a customer
to check the quality of incoming shipments before accepting them. In either case,
sampling inspection is done by first selecting a random sample of n items from a lot
and counting the number of sampled items that do not meet quality standards.
Suppose, for example, that n = 20 items are randomly selected from a large lot.
In this situation, an event that we might be interested in is A = the sample contains at
most one item that fails to meet quality standards. As you can imagine from reading the
other examples in this section, the sample space of the experiment of randomly select-
ing and testing 20 items is prohibitively large. Even a tree diagram is of no help in de-
picting the simple events or the event A itself. However, relying on only verbal descrip-
tions of the events, it is possible to decompose A into a combination of two less complex
events: B = no items fail inspection and C = exactly one item fails inspection. In fact,
it is not hard to see that the event B or C is the same as the event A. We write this as
A = B or C. Furthermore, B and C are mutually exclusive events. In Section 5.2, we show
how to use this fact to more easily compute the probability that A occurs.

Section 5.1 Exercises

1. A random sample, without replacement, of three items is to be selected from a population of five items (labeled a, b, c, d, and e).
   a. List all possible different samples.
   b. List the samples that correspond to the event A = items a and c are included in the sample.
   c. List the samples that correspond to the complement of the event A in part (b).

2. An engineering firm is constructing power plants at three different sites. Define the events E1, E2, and E3 as follows:
   E1 = the plant at site 1 is completed by the contract date
   E2 = the plant at site 2 is completed by the contract date
   E3 = the plant at site 3 is completed by the contract date
   Draw a Venn diagram that depicts these three events as intersecting circles. Shade the region on the Venn diagram corresponding to each of the following events (redraw the Venn diagram for each question):


   a. At least one plant is completed by the contract date.
   b. All plants are completed by the contract date.
   c. None of the plants is completed by the contract date.
   d. Only the plant at site 1 is completed by the contract date.
   e. Exactly one of the three plants is completed by the contract date.
   f. Either the plant at site 1 or site 2 or both of the two plants are completed by the contract date.

3. Let A and B denote the events A = there are more than three defective items in a random sample of ten items and B = there are fewer than six defectives in a random sample of ten items.
   a. Describe, in words, the event A and B.
   b. Describe, in words, the event A or B.
   c. Describe, in words, the complement of A.

4. Draw a Venn diagram depicting two events A and B that are not disjoint. Shade in the portion of this diagram that corresponds to the event A and B′.

5. Nuts and bolts used in aircraft manufacturing are called fasteners. To ensure that they are not loosened by vibrations during flight, some fasteners are slightly crimped so that they lock more tightly. The amount of crimping, however, must meet specific standards. To test finished fasteners, an initial inspection classifies them into two groups: those that meet standards and those that do not. Of those not meeting standards, some are completely defective and must be scrapped, whereas the rest can be run through a machine that readjusts the amount of crimping. Of the recrimped fasteners, some are corrected by the recrimping operation and pass inspection, whereas the remainder cannot be salvaged and are scrapped. Draw a tree diagram that depicts the testing and rework operations.

6. Information theory is concerned with the transmission of data, usually encoded as a stream of 0s and 1s, over communication channels. Because channels are "noisy," there is a chance that some 0s sent through the channel are mistakenly received at the other end as 1s, and vice versa. The majority of digits sent, however, are not altered by the channel. Draw a tree diagram that depicts the type of bit sent (either 0 or 1) and the type of bit received at the end of the channel.

7. Use a Venn diagram to find a simple expression for {A and B}′ in terms of A′ and B′.

5.2 Probability Concepts


Probability allows us to quantify the likelihood associated with uncertain events, that is,
events that result from chance experiments. Generally speaking, the probability of an
event can be thought of as the proportion of times that the event is expected to occur
in the long run. This definition works well for experiments that can be repeated many
times, such as in testing a large number of electronic components. After testing enough
components, we begin to get a good idea of the chance (i.e., probability) that the next
item tested will be defective or nondefective.
Probabilities are reported either as proportions (between 0 and 1) or as percentages
(between 0% and 100%). To simplify computations with probabilities, the shorthand
notation P(A) is used to denote the probability of an event A occurring. Thus the state-
ments P(A) = .30, the probability of event A occurring is .30, and the event A has a
30% chance of occurring are equivalent. As a general rule, it is best to write probabili-
ties as proportions when performing probability calculations, converting to percent-
ages only when it helps to interpret a probability statement.

Assigning Probabilities
Writing in his treatise Théorie Analytique des Probabilités (1812), mathematician and theo-
retical astronomer Pierre Simon de Laplace (1749–1827) stated that “at bottom, the theory


of probability is only common sense reduced to calculation.” With this brief statement,
Laplace recognized that any rigorous definition of probability must satisfy certain com-
monsense requirements. For example, the probability of any event must lie between 0 and
1. This is another way of stating the obvious condition that, in any number of repetitions
of an experiment, no event can occur less than 0% of the time nor more frequently than
100% of the time. In practice, this requirement provides a quick check on our probability
calculations; calculated values that lie outside the interval [0, 1] are immediate signals that
a mistake has occurred somewhere in the computations. Used correctly, the probability
formulas given in this chapter will never yield probabilities outside the interval [0, 1].
A second self-evident requirement is that probabilities of events must not lead to
logical inconsistencies. For example, it does not make sense to state that 90% of metal
parts pass a stress test and that 20% fail the test. These two probabilities are inconsistent
because we know that exactly 100%, not 110%, of the parts will either pass or fail the
test. In the same vein, it would not make sense to say that 90% pass and 5% fail the test,
since this implies the illogical conclusion that only 95% of all parts pass or fail the test.
To avoid nonsensical statements like these, we demand that the probabilities associated
with the simple events always total to exactly 1. Thus any sensible assignment of prob-
abilities to events must satisfy the following two basic requirements:

Probability Axioms
1. The probability of any event must lie between 0 and 1. That is, 0 ≤ P(A) ≤ 1 for any
event A.
2. The total probability assigned to the sample space of an experiment must be 1.

Within the limits imposed by these axioms, there are several ways to determine
probabilities: (1) as frequencies of occurrence, (2) from subjective estimates, (3) by
assuming that events are equally likely, and (4) by using density and mass functions
(see Section 5.4). Depending on the circumstances, each method has its merits. For
example, when it is possible to repeat a chance experiment, the “frequentist” approach
defines the probability of an event A to be the long-run ratio
P(A) = (number of times A occurs)/(number of times the experiment is repeated)
The justification for this approach is that, as the number of trials increases, we expect
this ratio to stabilize and eventually approach a limiting value, which we take as our
definition of P(A). For example, let A be the event that a package sent within the state
of California for 2nd-day delivery actually arrives within 1 day. The results from sending
10 such packages (the first 10 replications) are as follows:

Package No. 1 2 3 4 5 6 7 8 9 10
Did A occur N Y Y Y N N Y Y N N
Relative frequency of A 0 .5 .667 .75 .6 .5 .571 .625 .556 .5


Figure 5.4(a) shows how the relative frequency fluctuates rather substantially over
the course of the first 50 replications. But as the number of replications continues
to increase, Figure 5.4(b) illustrates how the relative frequency stabilizes. Using
Figure 5.4(b), we would be inclined to state that P(A) is close to .60.

Figure 5.4 Behavior of relative frequency: (a) initial fluctuation; (b) long-run stabilization
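
Readers who want to reproduce the behavior in Figure 5.4 can simulate it. The sketch below (not from the text) assumes the long-run probability of next-day arrival is the .60 suggested by the figure and tracks the relative frequency as the number of simulated packages grows.

    import random

    random.seed(1)
    p_true = 0.6   # assumed long-run probability of next-day arrival
    count = 0
    for n in range(1, 1001):
        if random.random() < p_true:   # package n arrives within one day
            count += 1
        if n in (10, 50, 100, 500, 1000):
            print(f"after {n:4d} packages, relative frequency = {count / n:.3f}")

Early relative frequencies fluctuate noticeably; by n = 1000 they settle near .6, mirroring panels (a) and (b) of the figure.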

Of course, the frequentist approach does not work when experiments cannot
be faithfully replicated, as is the case with sports competitions. In these instances,
subjective estimates, guided by the probability axioms, can be used to arrive at
numerical probabilities that certain teams will win or lose a game. Needless to say,
entire texts can and have been written comparing the various methods for assigning
probabilities to events. It is not our purpose to compare each of these methods. Instead,
in Section 5.4, we emphasize the technique that is most often used in statistical studies,
defining probabilities by means of mass and density functions.

The Addition Rule for Disjoint Events



Probability rules, or laws, are formulas that are intended to simplify the process of cal-
culating the probabilities of complex events. They achieve this purpose by first decom-
posing some event of interest into two or more less complex events whose probabili-
ties are more easily found. The formulas then describe how to recombine the simpler
probabilities to find the probability of the original event. One of the most frequently
used laws is the addition rule for disjoint events, which states that the probability of
the event A1 or A2 or A3 or . . . or Ak is simply the sum of the individual probabilities
P(A1) + P(A2) + P(A3) + . . . + P(Ak) as long as all the events A1, A2, A3, . . . , and Ak are
mutually exclusive. The addition rule is usually applied to an event E by first finding a
collection of less complicated events A1, A2, A3, . . . , and Ak that satisfy two conditions:
(1) the events A1, A2, A3, . . . , and Ak are disjoint and (2) E = A1 or A2 or A3 or . . . or Ak.
The events A1, A2, A3, . . . , Ak are sometimes said to partition the event E into mutually
exclusive events.


The Addition Rule for Disjoint Events


Disjoint, or mutually exclusive, events are events that cannot occur simultaneously.
For any two disjoint events A and B,

P(A or B) = P(A) + P(B)

More generally, for any collection of disjoint events A1, A2, A3, . . . , Ak,

P(A1 or A2 or A3 or . . . or Ak) = P(A1) + P(A2) + P(A3) + . . . + P(Ak)

Example 5.5 Suppose that you want to find the probability that at most one item fails to meet
quality standards in a random sample of n = 20 items from a large shipment of
such items. Denote the event of interest as A = at most one item fails to meet a
quality standard. In Example 5.4, we showed that A can be partitioned into the
events B = no items fail inspection and C = exactly one item fails inspection.
That is, we can write A = B or C, where B and C are disjoint events. According
to the addition rule for disjoint events, P(A) can be found by simply adding the
probabilities P(B) and P(C), both of which are easier to find than P(A). In fact,
in Section 5.4 we show that the binomial mass function can be used to find both
P(B) and P(C).

Complementary Events
The complement A′ of an event A was defined in Section 5.1 to be the collection of
simple events that are not in A. In more intuitive terms, it is helpful to think of A′ as the
opposite of A when trying to express A′ in words. For example, if A is the event that at
least one metal part passes a stress test, then the opposite event must be A′ = no metal
parts pass the stress test. Notice that we did not need to write down the sample space of
the experiment to arrive at this description of A′. Consider how you might describe the
complement of A′. Since A and A′ are opposites, the complement of A′ is simply
the event A itself, which we can write as (A′)′ = A.
Yet another way to describe the complement of an event A is to say that when
A does not occur, then, necessarily, its complement A′ has occurred. Viewed this
way, the symbol A is somewhat like a switch that is either on (A) or off (A′). The
truth-table logic you would use to describe electronic circuits can then be applied
to finding complements of complex events. For instance, consider how you might
go about finding the complement of the event A or B. If the event A or B does not
happen, then it must be true that both A and B do not happen, which we can express
by writing A′ and B′. In equation form, {A or B}′ = A′ and B′. Figure 5.5 shows how
a tree diagram can be used to demonstrate the same result. The branches of the
tree depict all possible combinations of the events A, B, A′, and B′. The top three
branches correspond to the event A or B, which implies that its complement must
be the bottom branch, A′ and B′.


Figure 5.5 Finding the complement of the event A or B

Because an event A and its complement A′ cannot occur simultaneously,
complementary events are special cases of mutually exclusive events. That is, A
and A′ are disjoint. Furthermore, because we are 100% certain that exactly one of
these events will occur, P(A or A′) = 1. Applying the formula for mutually exclusive
events yields

1 = P(A or A′) = P(A) + P(A′)

which is called the law of complementary events and is usually written in the form

P(A) = 1 − P(A′)

The usefulness of this simple formula lies in the fact that it is sometimes easier to find
the probability of the complement A′ rather than the probability of A itself.

definition When an event A does not occur, we say that its complement, denoted by A′,
has occurred, and vice versa. The probabilities of A and A′ are related by the
formula P(A) = 1 − P(A′).

Example 5.6 Refer to Example 5.5. Suppose you want to find the probability that, of the 20 items
randomly selected for inspection, at least one item fails to meet quality standards.
Denote this event by D = at least one item fails inspection. One approach to finding
this probability is to partition D into the events E1, E2, E3, . . . , E20, where, for
each i = 1, 2, 3, . . . , 20, Ei denotes the event that exactly i items fail inspection.
Since E1 through E20 are disjoint, the addition rule says that P(D) = P(E1) +
P(E2) + . . . + P(E20). As mentioned in Example 5.5, the binomial mass function
could then be used to find each P(Ei) in this summation.
Although the addition rule will give the correct value for P(D), an easier
method for finding P(D) is to use the law of complementary events, P(D) =
1 − P(D′). The complement of the event D = at least one item fails inspection is
the event D′ = no items fail inspection. As we will see in Section 5.4, finding P(D′)
requires only one computation with the binomial mass function, whereas the partition
method requires 20 separate computations.
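
Anticipating the binomial mass function of Section 5.4, the comparison in Example 5.6 can be sketched numerically. The per-item failure probability p = .05 below is an assumed value for illustration only; scipy supplies the binomial mass function.

    from scipy.stats import binom

    n, p = 20, 0.05   # 20 sampled items; the 5% failure rate is assumed for illustration

    # Complement route: P(D) = 1 - P(D'), a single pmf evaluation
    p_D = 1 - binom.pmf(0, n, p)

    # Partition route: P(D) = P(E1) + ... + P(E20), twenty pmf evaluations
    p_D_partition = sum(binom.pmf(i, n, p) for i in range(1, n + 1))

    print(p_D, p_D_partition)   # both print the same value, about .6415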


The General Addition Rule


As we have seen in this section, finding the probability of an event E can be simplified
considerably if it is possible to first express E in the form E = A or B, or more generally, in
the form E = A1 or A2 or . . . or Ak, where the events A1, A2, . . . , Ak are mutually exclusive.
There are times, however, when it is not so easy to break up an event E into disjoint events.
In such cases it is helpful to have another method for finding the probability of E.
The general addition rule is used to find the probability of an event E that can be
written in the form E = A or B, where events A and B are not necessarily disjoint. When
an event is expressed in the form A or B, its probability can be calculated from the fol-
lowing formula:
P(A or B) = P(A) + P(B) − P(A and B)
which is called the general addition rule. This formula can be applied to any two events
A and B.
Although it is indeed more generally applicable than the addition rule for disjoint
events, notice that the general addition rule presupposes that you are able to find the
probability of the event A and B, which can often be just as difficult as finding P(A or B).
However, as you will see in Section 5.3, when A and B satisfy certain additional condi-
tions, it is relatively easy to find P(A and B).

The General Addition Rule


For any two events A and B, which need not be mutually exclusive,

P(A or B) = P(A) + P(B) − P(A and B)

Figure 5.6 Venn diagram of A or B

Here is a simple intuitive justification for the general addition rule. Referring to the
Venn diagram in Figure 5.6, imagine that the events A and B represent circular rugs on
a floor and that we want to find the total floor area covered by these two rugs, analogous
to determining P(A or B). For the purposes of this example, think of P(A) and P(B) rep-
resenting the floor areas covered by each rug individually. To find the total area covered

Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has
deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
5.2 Exercises 207

by both rugs we could start by adding the areas of these two rugs, but then the floor area
where the two rugs overlap has been counted twice by this simple addition. The obvi-
ous remedy is to subtract the overlapping area, represented by P(A and B), once from
the sum, giving a final result of P(A) + P(B) − P(A and B). This is in essence how the
general addition rule works.

Example 5.7 In a certain residential suburb, 60% of all households get Internet service from the
local cable company, 80% get television service from that company, and 50% get
both services from that company. If a household is randomly selected, what is the
probability that it gets at least one of these two services from the company? With A =
{gets Internet service} and B = {gets TV service}, the given information implies that
P(A) = .6, P(B) = .8, and P(A and B) = .5. The general addition rule now yields

P(subscribes to at least one of the two services)
= P(A or B) = P(A) + P(B) − P(A and B) = .6 + .8 − .5 = .9
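
Because all three probabilities in Example 5.7 are given, the rule can be checked by simple counting. The sketch below (not from the text) recasts the percentages as counts in a hypothetical suburb of 1000 households.

    n = 1000
    internet = 600   # corresponds to P(A) = .6
    tv = 800         # corresponds to P(B) = .8
    both = 500       # corresponds to P(A and B) = .5

    # Inclusion-exclusion on counts: households with at least one service
    at_least_one = (internet + tv - both) / n
    print(at_least_one)   # 0.9, matching P(A or B)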

Section 5.2 Exercises


8. Two methods are proposed for testing a shipment of five items (call them A, B, C, D, and E). In method 1, an inspector randomly samples two of the five items and tests to see whether either item is defective. In method 2, an inspector randomly samples one item and tests it; the remaining four items are sent to a second inspector who randomly samples one item and tests it. Suppose that only item A is defective in the shipment.
   a. What is the probability that item A will be discovered by method 1?
   b. What is the probability that item A will be discovered by method 2?
   c. What general statement can you make regarding the effectiveness of the two methods? Can your statement be extended to methods involving samples of more than two items?

9. Human visual inspection of solder joints on printed circuit boards can be very subjective. Part of the problem stems from the numerous types of solder defects (e.g., pad nonwetting, knee visibility, voids) and even the degree to which a joint possesses one or more of these defects. Consequently, even highly trained inspectors can disagree when examining the same circuit board. The accompanying table shows the results of two inspectors who examined the same collection of 10,000 solder joints for a particular problem:

                                   Number of defective
                                   solder joints found
   Inspector A                            724
   Inspector B                            751
   Common to both inspectors              316

   a. How many defective solder joints were found by the two inspectors?
   b. How many defective solder joints found by inspector A were not found by inspector B?

10. For any collection of events A1, A2, A3, . . . , Ak, it can be shown that the inequality

    P(A1 and A2 and A3 and . . . and Ak) ≥ 1 − [P(A1′) + P(A2′) + P(A3′) + . . . + P(Ak′)]

    always holds. This inequality is particularly useful when each of the events has relatively high probability. Suppose, for example, that a system consists of ten components connected in series (cf. Example 5.8, Section 5.3) and that each component has a .999 probability of functioning without failure. What lower bound can you put on the reliability (i.e., the probability of functioning correctly) of the system built from these ten components?

11. For any collection of events A1, A2, A3, . . . , Ak, it can be shown that the inequality

    P(A1 or A2 or A3 or . . . or Ak) ≤ P(A1) + P(A2) + P(A3) + . . . + P(Ak)

    always holds. This inequality is most useful in cases where the events involved have relatively small probabilities. For example, suppose a system consists of five subcomponents connected in series (cf. Example 5.8) and that each component has a .01 probability of failing. Find an upper bound on the probability that the entire system fails.

12. Suppose that 55% of all adults regularly consume coffee, 45% regularly consume carbonated soda, and 70% regularly consume at least one of these two types of drinks.
    a. What is the probability that a randomly selected adult regularly consumes both coffee and soda?
    b. What is the probability that a randomly selected adult doesn't regularly consume at least one of these two products?
    c. What is the probability that a randomly selected adult regularly consumes coffee but does not regularly consume soda?

5.3 Conditional Probability and Independence


Conducting experimental studies is an iterative process. An initial guess or hypothesis is
compared with experimental data, new hypotheses are formed, more data is gathered, and
the process repeats itself until we are satisfied with the knowledge gained from experimen-
tation. The process of adjusting our view of the world as more information is gathered can
also be applied to calculating probabilities. In this context, we ask how the knowledge that
a certain event B has occurred can be used to update our initial assessment of the prob-
ability that another event A will occur. Sometimes, the probability that A occurs depends
heavily on whether B has occurred. In such cases, we use the methods of conditional
probability. At other times, when the occurrence or nonoccurrence of B has no effect at
all on the probability that A occurs, we say that A and B are independent events.
From the standpoint of probability calculations, independent events are especially
easy to work with. This is one of the primary reasons that statistical methods usually in-
corporate some sort of random procedure, such as random sampling or randomization,
as a method for ensuring that certain events will be independent.

Conditional Probability
Before shipping finished products, manufacturers routinely use automatic test equip-
ment (ATE) to assess the functionality of products and systems. In addition to giving
physical measurements of product characteristics, ATE machines can conduct a se-
quence of complex tests that eventually result in a final “thumbs up” or “thumbs down”
determination for the item being tested. Before testing, historical process data can be
used to estimate the probability that any particular item will function correctly. Sup-
pose, for example, that such records show that 95% of the items in a certain product line
perform correctly. Letting A denote the event that a randomly selected item is defect
free, we can then say that P(A) = .95. Now consider how this estimate may change
when we submit a particular item to an ATE test. Because the determinations given by
ATE are good but not perfect, we will want to give a good deal of weight, but not 100%,
to the ATE test result. Thus if the ATE test indicates that the item is defective, then we
will definitely want to reduce our estimate of P(A). Alternatively, if the item passes the
ATE test, then we will revise P(A) upward. In both cases, we want to update our estimate
of P(A) for the item being tested by factoring in the new information from the ATE test.


Let B = the item passes the ATE test. Then the conditional probability of A given
B is denoted by P(A|B). Conditional probabilities are computed from the following
definition:

P(A|B) = P(A and B)/P(B)

This formula can be justified by thinking of probability as the proportion of times that
an event occurs in a large number of trials N: About P(B) × N of the trials will result in
items that pass the ATE test and about P(A and B) × N of the trials will correspond to
items that not only pass the test but are truly defect-free. Thus P(A|B), the proportion
of items that are truly defect-free out of the total number passing the ATE test, should
be (P(A and B) × N)/(P(B) × N), which simplifies to P(A and B)/P(B).
Tree diagrams are very useful for summarizing problems that involve conditional
probabilities. Figure 5.7 shows such a diagram for our ATE example. Note that conditional
probabilities correspond to the branches on the tree. By writing the formula
P(A|B) = P(A and B)/P(B) in the form P(A and B) = P(B)P(A|B), we see that the probability
of taking a particular path through the diagram (from left to right) is simply the
product of the probabilities of the branches that comprise that path.

Figure 5.7 Tree diagram for depicting probabilities

definition Let A and B be two events with P(B) > 0. The conditional probability of A
occurring given that event B has already occurred is denoted by P(A|B) and can
be calculated from the formula P(A|B) = P(A and B)/P(B).
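The defining formula can also be read as a recipe for estimating conditional probabilities from data: count how often A and B occur together, and divide by how often B occurs. The simulation sketch below (all numbers assumed for illustration, not taken from the text) applies this to the ATE discussion, with A = item is defect-free and B = item passes the test.

    import random

    random.seed(2)
    trials = 100_000
    pass_count = 0        # how often B occurs
    good_and_pass = 0     # how often A and B occur together

    for _ in range(trials):
        good = random.random() < 0.95   # assumed P(defect-free) = .95
        # assumed test behavior: passes 99% of good items but only 10% of bad ones
        passes = random.random() < (0.99 if good else 0.10)
        if passes:
            pass_count += 1
            if good:
                good_and_pass += 1

    # Estimate of P(A | B); close to (.95)(.99)/[(.95)(.99) + (.05)(.10)] = .9947
    print(good_and_pass / pass_count)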

Independent Events
Conditional probability is used when the likelihood of occurrence of an event depends
on whether or not another event occurs. At the other end of the spectrum are events
that do not impose such restrictions on each other’s chances of occurring. Two events,
A and B, are said to be independent if the occurrence of either event has no effect
whatsoever on the likelihood of occurrence of the other. This definition readily extends
to any number of events.
To understand the role played by independence in probability calculations, con-
sider the following example. To filter certain harmful particles out of a given volume
of air, suppose we sequentially use two filters A and B, each of which captures a large
percentage of the particles in any air passing through it. In particular, filter A allows


only 5% of the particles to pass through, whereas filter B has about a 10% pass-through
rate. If we begin with a fixed volume of air containing V harmful particles, then after
passing through filter A, there should be (.05)V particles remaining. When this air is
screened through filter B, an additional 90% of the remaining particles are removed,
leaving a total of (.10)(.05)V particles after the two screenings. We then ask, Would it
make any difference if we changed the order in which the filtering is performed? This
is equivalent to asking, Do the two filters perform independently of one another? If the
filters are independent, then we should be able to reverse the filtering procedure without
changing the pass-through rates of the filters (see Figure 5.8). Thus filter B leaves (.10)
V particles, of which filter A then leaves (.05)(.10)V particles.

Figure 5.8 Two air filters acting independently

If we think of the pass-through rates as probabilities of the events A = filter A lets
through a harmful particle and B = filter B lets through a harmful particle, then independence
allows us to conclude that the proportion of particles left after applying both filters is

P(A and B) = P(A)P(B) = (.05)(.10)

and that the order in which we apply the filters does not affect the probability of the
events A and B. Independent events, then, allow us to reduce the calculation of P(A and
B) to a simple multiplication. Furthermore, it is easy to see from our filter example that
this multiplication formula can be extended to any number of independent filters A1, A2,
A3, . . . , Ak, whose overall pass-through rate should be

P(A1 and A2 and A3 and . . . and Ak) = P(A1)P(A2)P(A3) . . . P(Ak)

and that the order in which the filters are applied should not affect the final probability.


definition Two events, A and B, are independent events if the probability that either one
occurs is not affected by the occurrence of the other. In this case,

P(A and B) = P(A)P(B)

Several events, A1, A2, A3, . . . , Ak, are independent if the probability of each
event is unaltered by the occurrence of any subset of the remaining events. In
this case, the product rule can be applied to any subset of the k events. That is,
the probability that all the events in any subset occur equals the product of their
individual probabilities of occurring. In particular, for all k events,

P(A1 and A2 and A3 and . . . and Ak) = P(A1)P(A2)P(A3) . . . P(Ak)


Determining whether two (or more) events are independent is not quite as easy as
deciding whether they are mutually exclusive. With independent events, we rely either
on our intuition or on special procedures (such as random sampling and randomization).
Intuition is what we generally employ when we assume that different tosses of a coin or
different air filters are independent. With statistical methods, on the other hand, we rely
on random sampling, not intuition, to ensure that events are independent. Practically
speaking, we often assume independence when we do not know of any strong reasons
why the events should be related. At other times, independence provides a reasonable
approximation to the truth for the application at hand, but it may not be reasonable if
the situation changes a little. In our filter example, for instance, independence may be
a good assumption when the volume of particulate matter in the air is relatively large,
but it may cease to be valid for small volumes (e.g., after being screened by one filter,
the volume of particles may have dropped below the detection limit of the other filter).

Example 5.8 One branch of reliability theory, called topological reliability, is concerned with
calculating the reliability of systems comprising several components connected
in specific patterns. One common layout for components is the series system
(Figure 5.9), in which the system operates correctly only if each of its subcompo-
nents works correctly. A familiar example of such a system is a circuit with two
switches, both of which must be closed for the circuit to conduct electricity. It is
commonly assumed that the components are independent when performing reli-
ability calculations.
Figure 5.9 A two-component series system, which functions correctly only if both components function correctly

Suppose that the switches A and B in a two-component series system are
closed about 60% and 80% of the time, respectively. If we assume that the closing
of switch A occurs independently of switch B, the probability that the entire
circuit is closed is

P(circuit closed) = P(A closed and B closed)
                  = P(A closed)P(B closed)
                  = (.60)(.80) = .48

That is, the circuit will be closed about 48% of the time.

Combining Several Concepts


The independence of two events A and B carries over to their complements. In particu-
lar, if A and B are independent, then any pairing of A or its complement with B or its
complement will also produce a pair of independent events. That is, each of the pairs
of events A′ and B, A′ and B′, and A and B′ will be independent if A and B are inde-
pendent (cf. Exercise 25). To see how this fact can be used, let’s consider a frequently
asked probability question: What is the chance that at least one of a set of independent


events will occur? For two independent events, A and B, the event that at least one of
these events occurs can be written {A or B}. As we showed in our discussion of complementary
events, the complement of {A or B} is the event {A′ and B′}. Therefore, using
the additional knowledge that the complements of independent events must themselves
be independent, we can write

P(at least one of two independent events occurs)
= P(A or B) = 1 − P(A′ and B′) = 1 − P(A′)P(B′)

This formula can readily be extended to any number of independent events, A1, A2,
A3, . . . , Ak. That is,

P(at least one of k independent events occurs)
= 1 − P(A1′)P(A2′)P(A3′) . . . P(Ak′)
The “at least one” rule has numerous applications, two of which are given in the
following examples.
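
Before turning to those examples, here is a small helper (not from the text) that implements the "at least one" rule directly: the probability that at least one of several independent events occurs is 1 minus the product of the complements' probabilities.

    from math import prod

    def p_at_least_one(probs):
        """probs: individual probabilities of independent events A1, ..., Ak."""
        return 1 - prod(1 - p for p in probs)

    # Example 5.9's setting: 50 independent parts, each defective with probability .005
    print(p_at_least_one([0.005] * 50))   # about .2217

    # Two independent events with P(A) = .5 and P(B) = .6
    print(p_at_least_one([0.5, 0.6]))     # 1 - (.5)(.4) = .80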

Example 5.9 In an example demonstrating how vendor quality affects customer quality, H. S.
Gitlow and D. A. Wiesner (“Vendor Relations: An Important Piece of the Quality
Puzzle,” Quality Progress, 1988: 19–23) considered a hypothetical product consist-
ing of 50 critical parts, any one of which, if defective, could cause the finished
product to be defective. Suppose that each of these parts is purchased from a dif-
ferent vendor. It is therefore reasonable to assume that the condition of each part,
created by a different vendor, should be independent of the conditions of the oth-
ers. Furthermore, suppose that about 99.5% of all the parts supplied by a given
vendor are good. What is the overall proportion of assembled products that can be
expected to be defective?
To answer this question, let Di denote the event that the part purchased from the
ith vendor is defective, so that P(Di) = .005 and P(Di′) = .995. Then, the probability
we seek is

P(at least one of the 50 parts is defective) = 1 − P(D1′)P(D2′)P(D3′) . . . P(D50′)
= 1 − (.995)^50 = 1 − .7783 = .2217
This example demonstrates the important point that it is possible for complex
systems to have high failure rates even if the quality of their individual components
is relatively good.

Example 5.10 Consider the portion of an electronic circuit diagrammed in Figure 5.10. The cir-
cuit is primarily a parallel system (i.e., either switch A or both switches B and C
must function if the current is to flow from left to right). The branch containing
switches B and C, however, forms a series system. To compute the probability that a
closed circuit is made between the left and right sides of the diagram, we must find
the probability of the event {A or {B and C}}. Assuming that the switches function


independently of one another and that they are closed with probabilities P(A) = .80,
P(B) = .70, and P(C) = .90, we proceed as follows:

P(A or (B and C)) = P(A) + P(B and C) − P(A and (B and C))   [general addition rule applied to the events A and {B and C}]
                  = P(A) + P(B)P(C) − P(A)P(B)P(C)           [since A, B, and C are independent]
                  = .80 + (.70)(.90) − (.80)(.70)(.90) = .926

Thus the circuit is closed about 92.6% of the time. Since switch A is closed 80% of
the time, the probability that the circuit is closed must certainly exceed 80%, so our
answer makes sense.

Figure 5.10 Series and parallel circuit with three switches shown in their open positions
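
Example 5.10's answer can be double-checked by brute force: because the three switches are independent, each of the eight open/closed configurations has a probability equal to the product of its branch probabilities. The following sketch (not from the text) computes P(A or (B and C)) both ways.

    from itertools import product

    pA, pB, pC = 0.80, 0.70, 0.90   # probabilities that each switch is closed

    # Algebraic route: general addition rule plus independence
    print(pA + pB * pC - pA * pB * pC)   # 0.926

    # Enumeration route: sum the probabilities of all states that close the circuit
    total = 0.0
    for a, b, c in product([True, False], repeat=3):
        p_state = ((pA if a else 1 - pA) *
                   (pB if b else 1 - pB) *
                   (pC if c else 1 - pC))
        if a or (b and c):               # circuit closed for this configuration
            total += p_state
    print(total)                         # 0.926 again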

Section 5.3 Exercises

13. Five companies (A, B, C, D, and E) that make electrical relays compete each year to be the sole supplier of relays to a major automobile manufacturer. The auto company's records show that the probabilities of choosing a company to be the sole supplier are

    Supplier chosen:   A    B    C    D    E
    Probability:      .20  .25  .15  .30  .10

    a. Suppose that supplier E goes out of business this year, leaving the remaining four companies to compete with one another. What are the new probabilities of companies A, B, C, and D being chosen as the sole supplier this year?
    b. Suppose the auto company narrows the choice of suppliers to companies A and C. What is the probability that company A is chosen this year?

14. Refer to the tree diagram in Figure 5.7. Suppose you want to find the probability P(B | A) using the information available in the tree diagram. To do this, P(B | A) must be expressed in terms of conditional probabilities, like P(A | B) and P(A′ | B).
    a. Use the addition law to show that P(A) = P(A and B) + P(A and B′).
    b. Use the conditional probability formula to write P(A and B) in terms of P(A | B) and P(B). Develop a similar formula for P(A and B′) in terms of P(A | B′) and P(B′).
    c. Use parts (a) and (b) to show that

       P(B | A) = P(A | B)P(B) / [P(A | B)P(B) + P(A | B′)P(B′)]

       This formula, known as Bayes' theorem, is used to "turn conditional probabilities around"; that is, it allows us to express P(B | A) in terms of P(A | B) and P(A | B′).
    d. In Figure 5.7, the probability associated with any path from left to right through the tree is simply the product of the probabilities of the branches. Why?
    e. Use the observation in part (d) and the conditional probability formula for P(B | A) to justify Bayes' theorem.


15. In Exercise 5, suppose that 95% of the fasteners pass the initial inspection. Of those that fail inspection, 20% are defective. Of the fasteners sent to the recrimping operation, 40% cannot be corrected and are scrapped; the rest are corrected by the recrimping and then pass inspection.
    a. What proportion of fasteners that fail the initial inspection pass the second inspection (after the recrimping operation)?
    b. What proportion of fasteners pass inspection?
    c. Given that a fastener passes inspection, what is the probability that it passed the initial inspection and did not have to go through the recrimping operation?

16. In Exercise 6, suppose that there is a probability of .01 that a digit is incorrectly sent over a communication channel (i.e., that a digit sent as a 1 is received as a 0, or a digit sent as a 0 is received as a 1). Consider a message that consists of exactly 60% 1s.
    a. What is the proportion of 1s received at the end of the channel?
    b. If a 1 is received, what is the probability that a 1 was sent? Hint: Use the tree diagram from Exercise 6.

17. Suppose that A and B are independent events with P(A) = .5 and P(B) = .6. Can A and B be mutually exclusive events?

18. Probability calculations play an important role in modern forensic science (Aitken, C., Statistics and the Evaluation of Evidence for Forensic Scientists, John Wiley, New York, 1995). Suppose that a suspect is found whose blood type matches a rare blood type found at a crime scene. Let π denote the frequency with which people in the population have this particular blood type. Assuming that people in the population are sampled at random, answer the following questions:
    a. What is the probability that a randomly chosen person from the population does not have the same blood type as that found at the crime scene?
    b. What is the probability that none of n randomly chosen people will match the blood type found at the crime scene?
    c. What is the probability that at least one person in a random sample of n people will match the blood type found at the crime scene?
    d. Suppose that π = 10^−6. What is the probability that at least one person in a sample of one million will have a blood type matching that found at the crime scene?

19. In forensic science, the probability that any two people match with respect to a given characteristic (hair color, blood type, etc.) is called the probability of a match. Suppose that the frequencies of blood phenotypes in the population are as follows:

    A: .42   B: .10   AB: .04   O: .44

    a. What is the probability that two randomly chosen people both have blood type A?
    b. Repeat the calculation in part (a) for the other three blood types.
    c. Find the probability that two randomly chosen people have matching blood types. Note: A person can have only one phenotype.
    d. The probability that two people do not match for a given characteristic is called discriminating power. What is the discriminating power for the comparison of two people's blood types in part (c)?

20. A construction firm has bid on two different contracts. Let E1 be the event that the bid on the first contract is successful, and define E2 analogously for the second contract. Suppose that P(E1) = .4 and P(E2) = .3 and that E1 and E2 are independent.
    a. Find the probability that both bids are successful.
    b. Find the probability that neither bid is successful.
    c. Find the probability that at least one of the bids is successful.

21. Consider a system of components connected as shown in the following figure.

    [Figure: components 1 and 2 stacked in parallel, followed by components 3 and 4 in series]

    Components 1 and 2 are connected in parallel, so that their subsystem functions correctly if either component 1 or 2 functions. Components 3 and 4 are connected in series, so their subsystem works


only if both components work correctly. If all com- Each point plotted on a control chart can signal either
ponents work independently of one another and that a manufacturing process is operating correctly or
P(a given component works) 5 .9, calculate the that it is not operating correctly. However, even when
probability that the entire system works correctly. a process is running correctly, there is a small prob-
ability, say, 1%, that a charted point will mistakenly
22. The reviews editor for a certain scientific journal
signal that there is a problem with the process.
decides whether the review for any particular book
a. What is the probability that at least one of ten
should be short (1–2 pages), medium (3–4 pages),
points on a control chart signals a problem with
or long (5–6 pages). Data on recent reviews indicates
a manufacturing process when in fact the pro-
that 60% of them are short, 30% are medium, and the
cess is running correctly?
other 10% are long. Reviews are submitted in either
b. What is the probability that at least 1 of 25 points
Word or a typesetting program called LaTeX. For short
on a control chart signals a problem with a man-
reviews, 80% are in Word, whereas 50% of medium
ufacturing process when in fact the process is
reviews are in Word and 30% of long reviews are in
running correctly?
Word. Suppose a recent review is randomly selected.
a. What is the probability that the selected review 25. If A and B are independent events, show that A9
was submitted in Word format? and B are also independent. Hint: Use a Venn
b. Suppose you are told the selected review was diagram to show that P(A= and B) 5 P(B) 2
submitted in Word format. What is the probabil- P(A and B).
ity that the review was medium in length?
26. In October 1994, a flaw in a certain Pentium chip
23. In a certain population, 1% of all individuals are installed in computers was discovered that could re-
carriers of a particular disease. A diagnostic test for sult in a wrong answer when performing a division.
this disease has a 90% detection rate for carriers and The manufacturer initially claimed that the chance
a 5% detection rate for noncarriers. Suppose that of any particular division being incorrect was only
the diagnostic test is applied independently to two 1 in 9 billion, so that it would take thousands of
different samples from the same randomly selected years before a typical user encountered a mistake.
individual. However, statisticians are not typical users; some
a. What is the probability that both tests yield the modern statistical techniques are so computation-
same result? ally intensive that a billion divisions over a short
b. If both tests are positive, what is the probability time period is not outside the realm of possibility.
that the selected individual is a carrier? Assuming that the 1 in 9 billion figure is correct and
that results of different divisions are independent of
24. One of the assumptions underlying the theory of con-
one another. What is the probability that at least 1
trol charts (see Chapter 6) is that the successive points
error occurs in 1 billion divisions with this chip?
plotted on a chart are independent of one another.

5.4 Random Variables


Scientific and engineering studies rely heavily on numerical measurements derived
from experiments. Indeed, it is often easy to think in terms of measurements them-
selves, not physical outcomes, as the end products of an experiment. Although you can
imagine the various physical materials (the outcomes) that could result from repeating
a chemical reaction, for example, it is much more natural to think in terms of a numerical quantity
of interest, such as the yield that might occur. Because measurements predominate in
scientific studies, a mechanism is needed for extending probability concepts from the
realm of simple events to the more natural scientific domain of numerical outcomes.


Random Variables
When the same numerical characteristic can conceivably be measured on any out-
come of a chance experiment, we say that this quantity is a random variable. For
instance, the measured yield of a chemical reaction is a random variable. Random-
ness enters the picture because we expect there to be slight unpredictable differenc-
es between each repetition of the reaction, which, in turn, will be reflected in the
measured yields. There can be any number of random variables associated with a
chance experiment. In a chemical reaction, any quantifiable feature associated with
the reaction is a random variable (e.g., yield, density, weight, viscosity, volume, and
translucence of the material produced). To make them easier to work with, random
variables are usually denoted by single letters near the end of the alphabet. The yield
of a chemical reaction might simply be denoted by the letter x, the density of the ma-
terial by w, and so forth. The assignment of a letter to a random variable is sometimes
written in the form of an equation, such as x = yield of a chemical reaction or w =
density of the material produced in the reaction.
Technically speaking, the numerical values of a random variable are not the simple
events of a chance experiment. Instead, a random variable is a function that assigns
numerical values to the possible outcomes of a chance experiment, as illustrated in
Figure 5.11. Notice that it is possible for more than one point in the sample space to be
assigned the same real number. For instance, the random variable y = number of metal
parts that pass a stress test out of three randomly selected parts assigns the number y = 2
to each of the sample space points PPF, PFP, and FPP.

Figure 5.11 A random variable assigns numerical values to the outcomes in the sample space

Because measurements are either discrete or continuous, random variables are


also classified as discrete random variables or continuous random variables. Recall
that discrete measurements generally arise when we count things, whereas continuous
measurements are the result of using a measuring instrument. The yield of a chemical
reaction would be a continuous random variable, whereas the number of metal parts
passing a stress test would be a discrete random variable.


definitions A numerical characteristic whose value depends on the outcome of a chance


experiment is called a random variable. A random variable is discrete if its possible values form a finite set or, perhaps, an infinite sequence of real numbers. A random variable is continuous if its possible values span an entire interval of real numbers.

Events Defined by Random Variables


Although technically accurate, the description of a random variable as a function that
assigns numerical values to sample space outcomes is not essential to most statistical
applications. It is usually more helpful to think of random variables simply as variables
whose values are likely to lie within certain ranges of the real number line. For example, the event that at least two of four randomly selected metal parts pass a stress test can simply be depicted by the numbers x = 2, 3, and 4 on the real number line, where x = number of parts passing the stress test is a discrete random variable (Figure 5.12). In other words, we often suppress the picture of the sample space in Figures 5.11 and 5.12 and simply think of an event as a list or interval of numbers on the horizontal axis. With discrete variables, events correspond to finite or countable collections of points on the number line. For instance, the event {x ≥ 2} corresponds to the integers 2, 3, and 4 for the random variable x = number of parts passing the stress test. The event {y ≥ 2} corresponds to the infinite collection of integers y = 2, 3, 4, . . . for the variable y = number of parts tested until one is found that fails the stress test. For continuous random variables, events such as {x > 3.21}, {x ≤ 5.4}, or {18 ≤ x ≤ 21} all refer to the real numbers contained in these intervals.

Figure 5.12 The event that at least two parts pass a stress test and the random variable x = number of parts passing the stress test

The probability laws introduced in previous sections can be applied to events


defined by random variables. For instance, let x = length (in inches) of a randomly


selected manufactured part. Then an event {18 ≤ x ≤ 21} can, if desired, be partitioned into the disjoint events {18 ≤ x ≤ 21} = {18 ≤ x < 19} or {19 ≤ x < 20} or {20 ≤ x ≤ 21}. Notice that the particular choice of strict and inclusive inequality signs is what causes these events to be disjoint. The addition rule for disjoint events then states that P(18 ≤ x ≤ 21) = P(18 ≤ x < 19) + P(19 ≤ x < 20) + P(20 ≤ x ≤ 21). Similarly, because the event {x > 18} is the complement of the event {x ≤ 18}, the law of complementary events allows us to write P(x ≤ 18) = 1 − P(x > 18).

Probability Distributions
The mechanism for assigning probabilities to events defined by random variables is
to use either a mass function (for discrete random variables) or a density function (for
continuous variables). In either case, we first envision an event of interest as a particular
subset of the real number line. For discrete variables, the probability of the event is
defined to be the sum of the mass function values that lie within the event subset. For
continuous variables, the probability of an event is defined to be the area under the
portion of the density curve that lies over the event on the number line. Figure 5.13
shows how a mass or density function assigns a probability to any event of interest on the
real number line. When used to describe random variables, mass functions and density
functions are both called probability distributions.

Figure 5.13 Using mass or density functions to assign probabilities to events. For a discrete random variable, P(2 ≤ x ≤ 5) = p(2) + p(3) + p(4) + p(5); for a continuous random variable, P(2.3 ≤ x ≤ 5.4) is the area under the density curve above that interval.

When a probability distribution has one of the familiar distributional forms de-
scribed in Chapter 1, the methods described in that chapter can be used to find event
probabilities. For example, if we believe that the length, x, of a randomly selected part
can be described by a normal distribution with a mean of 20 cm and a standard devia-
tion of 1.8 cm, then probabilities associated with x are found by standardizing, as shown
in Chapter 1. Thus
$$P(18 \le x \le 21) = P\left(\frac{18 - 20}{1.8} \le z \le \frac{21 - 20}{1.8}\right) = P(-1.11 \le z \le .56) = .5788$$
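This calculation is easy to verify numerically. The short Python sketch below (an illustration added here, not part of the original example, and assuming the SciPy library is available) evaluates the same normal probability directly:

```python
from scipy.stats import norm

mu, sigma = 20, 1.8   # process mean and standard deviation (cm)

# P(18 <= x <= 21) for x ~ N(20, 1.8)
p = norm.cdf(21, loc=mu, scale=sigma) - norm.cdf(18, loc=mu, scale=sigma)
print(round(p, 4))    # 0.5775 (the text's .5788 comes from rounding z to -1.11 and .56)
```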


There are several ways to choose an appropriate probability distribution for describing
a random variable. In the upcoming examples and chapters, we will use the following
methods to justify our choices of probability distributions:
1. Examine a histogram of data, and select a familiar density or mass function
whose shape approximately matches that of the histogram.
2. Use a density or mass function recommended by previous studies or profes-
sional practice.
3. Verify conditions that are known to give rise to certain mass or density func-
tions (see binomial distributions in Section 1.6, normal distributions in
Section 5.6).

Example 5.11 Examples 5.4–5.6 describe several events related to the chance experiment of ran-
domly sampling and testing 20 items from a large shipment:

A = at most one of the sampled items fails to meet quality standards
B = none of the sampled items fails to meet quality standards
C = exactly one item fails to meet quality standards
D = at least one item fails to meet quality standards

These events can be recast in terms of the random variable x = number of items that fail to meet quality standards, as follows:

A = {x ≤ 1}
B = {x = 0}
C = {x = 1}
D = {x ≥ 1}
Because random sampling ensures that each of the 20 selections is independent of
the others, a binomial mass function is a good choice for describing probabilities
associated with x (see Section 1.6). Suppose that it is known from manufacturing
records that about 2% of all such items do not conform to quality standards. Using
π = .02 and n = 20 in the formula for the binomial mass function, we calculate the
probabilities of the previously described events as
$$P(x \le 1) = P(x = 0) + P(x = 1) = \frac{20!}{0!\,20!}(.02)^0(.98)^{20} + \frac{20!}{1!\,19!}(.02)^1(.98)^{19} = \underbrace{.6676}_{P(B)} + \underbrace{.2725}_{P(C)} = .9401$$
Thus if groups of 20 items are repeatedly selected, in the long run about 94% of all
groups should have at most one item failing to meet standards.


The probability that at least one item fails to meet quality standards is

P(x ≥ 1) = 1 − P(x = 0) = 1 − .6676 = .3324

Notice that the addition rule and the law of complementary events were used to simplify the computations of P(x ≤ 1) and P(x ≥ 1).
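For readers who want to check such binomial computations by machine, here is a minimal Python sketch using SciPy's binomial distribution (added here for illustration only):

```python
from scipy.stats import binom

n, pi = 20, 0.02                          # sample size and defect proportion
print(round(binom.cdf(1, n, pi), 4))      # P(x <= 1) = 0.9401
print(round(1 - binom.pmf(0, n, pi), 4))  # P(x >= 1) = 1 - P(x = 0) = 0.3324
```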

Mean and Variance of a Random Variable


The mean of a random variable x can be thought of as the long-run average value of x that
should occur in many repeated trials of a chance experiment. Fortunately, when the prob-
ability distribution of x is known, there is no need to actually perform repeated experimen-
tal trials. Instead, we define the mean to be the mean of the population described by the
mass or density function and then use the methods of Chapter 2 to compute it. The same
notation μ used to describe the mean of a population is now used to denote the mean of a random variable. For a known mass function p(x), the mean is defined as

$$\mu = \sum_x x\,p(x)$$

For a known density function f (x), the mean is given by

$$\mu = \int x\,f(x)\,dx$$

Similarly, the variance σ² of a random variable is calculated from the familiar formulas in Chapter 2. The standard deviation σ of a random variable is defined to be the square root of its variance:

$$\sigma^2 = \sum_x (x - \mu)^2\,p(x) \qquad\text{or}\qquad \sigma^2 = \int (x - \mu)^2 f(x)\,dx$$

The mean and standard deviation of a random variable frequently appear as param-
eters in the defining formulas for a mass or density function. For this reason, it is often
necessary to obtain estimates of μ and σ before probability calculations are possible. As discussed later in the chapter, statistics such as the sample mean, x̄, and sample standard deviation, s, are frequently used to provide such estimates.
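These definitions translate directly into a few lines of code. The sketch below computes μ and σ² for a small hypothetical mass function (the values and probabilities are invented purely for illustration):

```python
# Hypothetical discrete mass function: values and their probabilities (sum to 1)
x = [0, 1, 2, 3]
p = [0.1, 0.2, 0.3, 0.4]

mu = sum(xi * pi for xi, pi in zip(x, p))               # mean: sum of x * p(x)
var = sum((xi - mu) ** 2 * pi for xi, pi in zip(x, p))  # variance: sum of (x - mu)^2 * p(x)
print(mu, var)                                          # 2.0 1.0
```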

Example 5.12 The reliability of a product at time t, denoted by R(t), is defined as the probability
that the product is still working correctly after t units of time (see Section 6.6).
For complex products consisting of several parts and subassemblies, the time x until a product fails often follows an exponential distribution with parameter λ. In such applications, the mean of the distribution, μ = 1/λ, is called either the mean time between (or before) failures (MTBF) or the mean time to failure (MTTF). According to the definition of R(t), the reliability can be calculated from the formula

$$R(t) = P(x > t) = \int_t^{\infty} \lambda e^{-\lambda x}\,dx = e^{-\lambda t}$$


Suppose that the lifetime of a certain product follows an exponential distribution with an MTBF of 10,000 hr and that we want to find the proportion of such products that fail before 20,000 hr of service. Since MTBF = 1/λ, the value of λ for this distribution is λ = 1/10,000 = .0001 hr⁻¹, and the reliability function is given by

$$R(t) = P(x > t) = e^{-\lambda t} = e^{-.0001t}$$

For t = 20,000 hr, the reliability is then R(20,000) = e^(−(.0001)(20,000)) = e^(−2) = .1353. That is, about 13.53% of the products will last at least 20,000 hours. The proportion of products that do not last at least 20,000 hours is found by using the formula for complementary events:

P(product fails before 20,000 hr) = 1 − P(product lasts at least 20,000 hr) = 1 − .1353 = .8647
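The reliability computation is equally direct in code. A minimal sketch, assuming the exponential model above with MTBF = 10,000 hr:

```python
import math

mtbf = 10_000           # mean time between failures (hours)
lam = 1 / mtbf          # exponential parameter lambda = .0001 per hour

def reliability(t):
    # R(t) = P(x > t) = exp(-lambda * t) for an exponential lifetime
    return math.exp(-lam * t)

print(round(reliability(20_000), 4))      # 0.1353
print(round(1 - reliability(20_000), 4))  # 0.8647, the proportion failing before 20,000 hr
```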

Calculations with Random Variables


Mathematical operations (e.g., addition, multiplication, exponentiation, square roots)
can be applied to random variables. One reason for doing this is to reduce probability
statements about one random variable to statements about a more familiar random
variable whose probabilities are well known. This is what we do, for instance, when
simplifying statements about a normal random variable x. By performing the arithmetic
operations of subtraction (to form x − μ) and division (to form (x − μ)/σ), we even-
tually reduce probability statements about x to statements about the standard normal
variable z, whose probabilities are easily found in tables.

Example 5.13 Resistors come in two varieties, general purpose (with tolerances of 65% or greater)
and precision (with tolerances of 62% or less). The tolerance is the amount by which
the true resistance can deviate from the stated resistance. For example, a 6.0-kilohm
(kV) resistor with a tolerance of 610% can be expected to have a measured resis-
tance of 6.0 6 (.10)(6.0), that is, from 5.4 kV to 6.6 kV. Assuming that a uniform
density adequately describes the possible values of x, then the true resistance x of a
randomly selected 6.0-kV resistor is a random variable described by the density func-
tion (see Chapter 1 for the definition of uniform densities):
1
for 5.4 , x , 6.6
f (x) 5 c 1.2
0 otherwise
Suppose we want to find the probability that the conductance (defined as the reciprocal of resistance) is greater than a specified amount, say, .16 siemens (S). Writing this probability statement, we can take reciprocals of both sides to find

$$P(\text{conductance} > .16) = P\left(\frac{1}{x} > .16\right) = P(x < 6.25) = .7083$$
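A quick numerical check of this uniform-density calculation (an added illustration using SciPy, where scipy.stats.uniform is parameterized by the left endpoint loc and the interval width scale):

```python
from scipy.stats import uniform

x = uniform(loc=5.4, scale=1.2)   # true resistance ~ Uniform(5.4, 6.6)

# conductance 1/x > .16  is equivalent to  x < 1/.16 = 6.25
print(round(x.cdf(6.25), 4))      # 0.7083
```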


Mathematical operations can also be applied to several random variables. In sta-


tistical applications, for example, we commonly form sums of several random variables
and ask about the probability with which these sums assume various numerical values.
Such calculations are greatly simplified if the random variables involved are known to
be independent of one another. Two random variables, x and y, are said to be indepen-
dent if the events {x < a} and {y < b} are independent for all possible combinations of real numbers a and b. For example, if x represents the lifetime of a randomly chosen electronic component (measured in hours of service) and y denotes the lifetime of another randomly chosen component, then, intuitively, we expect that the event {x < 1000} should be independent of the event {y < 500}. In fact, we expect that the choice of 1000 and 500 is immaterial here, so that the events {x < a} and {y < b} should be indepen-
dent, regardless of the values of a and b.
Independent variables commonly arise from the application of random procedures
such as random sampling or randomization, or from an assumption of randomness.
For two discrete random variables with mass functions p1(x) and p2(y), independence
also means that their joint probability p(a, b) = P(x = a and y = b) equals the product of their mass functions; that is, p(a, b) = p1(a)p2(b) for any combination of values of a and b. Similarly, for continuous random variables with densities f1(x) and f2(y), independence allows their joint density f(x, y) to be written as a product of their individual densities: f(x, y) = f1(x)f2(y) for any values of x and y. Now let B denote a collection of points in the x–y plane. When x and y are independent, the probability
that the pair (x, y) lies in B is

$$P(B) = \sum_{(x,\,y)\in B} p(x, y) = \sum_{(x,\,y)\in B} p_1(x)\,p_2(y) \qquad x, y \text{ discrete}$$

$$P(B) = \iint_B f(x, y)\,dx\,dy = \iint_B f_1(x)\,f_2(y)\,dx\,dy \qquad x, y \text{ continuous}$$

Example 5.14 Images displayed on computer screens consist of thousands of small regions
called picture elements, or pixels for short. The intensity of the electron beam
focused at a given point (x0, y0) on a flat screen is usually described by two in-
dependent normal random variables x and y, with means x0 and y0, respectively.
That is, we represent the intensity of the beam by a joint density function of two
independent random variables. For example, Figure 5.14 shows a graph of the
joint density function describing an electron beam focused on the point (x0, y0) = (30, 50). The standard deviations of the two normal distributions are σx = .2 and σy = .2. Because x and y are independent, we can write the joint density as the product
$$f(x, y) = f_1(x)\,f_2(y) = \frac{1}{\sigma_x\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x - 30}{.2}\right)^2} \cdot \frac{1}{\sigma_y\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{y - 50}{.2}\right)^2} = \frac{1}{2\pi\sigma_x\sigma_y}\,e^{-\frac{1}{2}\left[\left(\frac{x - 30}{.2}\right)^2 + \left(\frac{y - 50}{.2}\right)^2\right]}$$


Figure 5.14 Joint density function near the point (x0, y0) = (30, 50)

The volume under this density that sits over a given region B in the x–y
plane describes the proportion of time that the electron beam spends in region
B. Although the joint density can be used to find the probability associated with
any set B of points near (30, 50) on the screen, the probability of some sets can
be found in an easier way. For example, if we want to find the proportion of time
that the beam spends in the region where x < 29.5 and y < 49.6, we can simply use the independence of x and y to obtain

P(x < 29.5 and y < 49.6) = P(x < 29.5)P(y < 49.6)

instead of integrating the density over the region B = {(x, y) | x < 29.5, y < 49.6}. Thus

$$P(x < 29.5 \text{ and } y < 49.6) = P\left(z < \frac{29.5 - 30}{.2}\right)P\left(z < \frac{49.6 - 50}{.2}\right) = P(z < -2.5)\,P(z < -2.0) = (.0062)(.0228) = .00014$$

The proportion of time that the beam spends in this region is very small.
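Because the probability factors into a product of two one-dimensional normal probabilities, the computation takes only a couple of lines. A sketch (added for illustration) using SciPy's normal cdf:

```python
from scipy.stats import norm

# Independent beam coordinates: x ~ N(30, .2) and y ~ N(50, .2)
p = norm.cdf(29.5, loc=30, scale=0.2) * norm.cdf(49.6, loc=50, scale=0.2)
print(round(p, 5))   # about 0.00014
```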


Example 5.15 In Sections 5.5 and 5.6, we will be concerned with sums and averages of indepen-
dent random variables. Suppose, for example, that two printed circuit boards are ran-
domly selected and tested. Let x be the number of defective computer chips found
on one board; let y be the number of defectives found on the other board. Suppose
the following mass functions describe x and y:

x: 1 2 3 4 y: 1 2 3 4
p1(x): .25 .25 .25 .25 p2(y): .25 .25 .25 .25

To find the mass function associated with the average number of defectives on two boards, w = (x + y)/2, we can use mutually exclusive events and independence to simplify each probability. For example, to find P(w = 2.5), first break up the event {(x + y)/2 = 2.5} into the disjoint events {x = 1 and y = 4}, {x = 2 and y = 3}, {x = 3 and y = 2}, and {x = 4 and y = 1}. Next, find the probabilities of these events by multiplying mass function values:

P(x = 1 and y = 4) = P(x = 1)P(y = 4) = (.25)(.25) = .0625
P(x = 2 and y = 3) = P(x = 2)P(y = 3) = (.25)(.25) = .0625
P(x = 3 and y = 2) = P(x = 3)P(y = 2) = (.25)(.25) = .0625
P(x = 4 and y = 1) = P(x = 4)P(y = 1) = (.25)(.25) = .0625

Finally, add the probabilities of these disjoint events to find P(w = 2.5) = .2500.
Proceeding in this manner gives the mass function of the average, w:

w: 1 1.5 2 2.5 3 3.5 4


p(w): .0625 .1250 .1875 .2500 .1875 .1250 .0625

The graphs of all three mass functions are shown in Figure 5.15. Notice that the
mass function of the average tends to bunch more closely around its mean than do
either of its constituent mass functions, p1(x) and p2( y).

Figure 5.15 The mass function of an average of two independent random variables
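The convolution argument in this example can be carried out exhaustively by enumerating all (x, y) pairs. A short Python sketch (illustrative only) reproduces the mass function of the average w:

```python
from fractions import Fraction
from collections import Counter

vals = [1, 2, 3, 4]
prob = Fraction(1, 4)             # each value of x (and of y) has probability .25

dist = Counter()
for x in vals:                    # independence: P(x and y) = P(x)P(y)
    for y in vals:
        dist[(x + y) / 2] += prob * prob

for w in sorted(dist):
    print(w, float(dist[w]))      # 1.0 0.0625, 1.5 0.125, ..., 4.0 0.0625
```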


Section 5.4 Exercises


27. Classify each of the following random variables as either discrete or continuous.
a. x = the number of flaws per square foot in a randomly selected sheet of fabric
b. y = the measured concentration of chemical in a solution
c. w = the proportion of oversize bolts in a randomly selected box of bolts
d. u = the number of errors per 1000 randomly selected lines of computer code
e. v = the breaking strength of a randomly selected metal bar
f. t = the lifetime of a randomly selected electronic component
g. x = the number of customer complaints in a randomly selected week

28. The probability mass function for the number x of coding errors found in 1000 randomly selected lines of computer code is given by

x: 0 1 2 3 4
p(x): .08 .15 .45 .27 .05

a. Calculate the mean number of coding errors for all such blocks of 1000 lines of code.
b. Calculate the variance and standard deviation of x.

29. A chemical supply company currently has in stock 100 pounds of a certain chemical, which it sells to its customers in 5-lb lots. Let x denote the number of lots ordered by a randomly selected customer, and suppose x has the following probability mass function:

x: 1 2 3 4
p(x): .2 .4 .3 .1

a. Compute the mean number of lots ordered by a customer.
b. Compute the variance of the number of lots ordered by a customer.
c. Compute the expected number of pounds left after a customer’s order is shipped.

30. Let x denote the number of ticketed airline passengers denied a flight because of overbooking. Suppose that x is a random variable for which p(x) = c(5 − x) for x = 0, 1, 2, 3, 4. Find the numerical value of c and then compute P(x > 0).

31. A contractor is required by a county planning department to submit from one to five different forms, depending on the nature of the project. Let y = number of forms required of the next contractor. Suppose that it is known that the probability that y forms are required is proportional to y; that is, p(y) = ky for y = 1, 2, 3, 4, and 5.
a. What is the numerical value of k?
b. What is the probability that at most three forms are required?
c. What is the expected number of forms required?
d. Find the standard deviation of the number of forms required.

32. Suppose that the reaction time (sec) to a certain stimulus is a continuous random variable with a density function given by

$$f(x) = \begin{cases} k/x & 1 \le x \le 10 \\ 0 & \text{otherwise} \end{cases}$$

a. Sketch a graph of f(x).
b. Find the numerical value of k.
c. What is the probability that x exceeds 3?
d. What is the probability that x lies within .25 sec of 3?

33. A printed circuit board (PCB) has 285 small holes, called “joints,” into which are inserted the thin leads or “pins” emanating from electronic components soldered to the PCB (see Example 5.20). Assuming that the quality of the solder joint at any pin is independent of the quality at any other pin, a binomial mass function can be used to describe x, the number of defective solder joints. Answer the following questions, given the probability that a given solder joint is defective is .01:
a. What are the mean and standard deviation of the number of defective solder joints on a PCB?
b. What proportion of all PCBs are defect-free?
c. What is the probability that a given PCB has two or more defective solder joints?

34. The Poisson mass function is often used in biology to model the number of bacteria in a solution.


Suppose a dilute suspension of bacteria is divided into several different test tubes. The number of bacteria x in a test tube has a Poisson mass function with a parameter λ that represents the mean number of bacterial cells contained in the different test tubes.
a. Express the probability that a particular test tube contains no bacteria, in terms of λ.
b. In terms of λ, what is the probability that a test tube contains at least one bacterial cell?
c. After a certain period of time, all of the test tubes are examined, and it is found that 40% of the tubes contain at least one bacterial cell. Use your answer from part (b) to estimate λ, the mean number of cells per test tube.

35. A standard procedure for testing safety glass is to drop a 1/2-lb iron ball onto a 12-in. square of glass supported on a frame (“Statistical Methods in Plastics Research and Development,” Quality Engr., 1989: 81–89). The height from which the ball is dropped is determined so that there is a 50% chance of breaking through the glass. A breakthrough is considered to be a failure, whereas a ball that is stopped by the glass (even if the glass cracks) is considered to be a success. Suppose that 100 sheets of safety glass are randomly selected and tested, and that no change has been made in the resin used to manufacture the glass.
a. What is the expected number of sheets that will experience a breakthrough?
b. What is the probability that 60 or more sheets will have a breakthrough?

36. The normal distribution is commonly used to model the variability expected when making measurements (Taylor, J. R., An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements, University Science Books, Sausalito, CA, 1997). In this context, a measured quantity x is assumed to have a normal distribution whose mean is assumed to be the “true” value of the object being measured. The precision of the measuring instrument determines the standard deviation of the distribution.
a. If the measurements of the length of an object have a normal probability distribution with a standard deviation of 1 mm, what is the probability that a single measurement will lie within 2 mm of the true length of the object?
b. Suppose the measuring instrument in part (a) is replaced with a more precise measuring instrument having a standard deviation of .5 mm. What is the probability that a measurement from the new instrument lies within 2 mm of the true length of an object?

37. Acceptance sampling is a method that uses small random samples from incoming shipments of products to assess the quality of the entire shipment. Typically, a random sample of size n is selected from a shipment, and each sampled item is tested to see whether it meets quality specifications. The number of sampled items that do not meet specifications is denoted by x. As long as x does not exceed a prespecified integer c, called the acceptance number, then the entire shipment is accepted for use. If x exceeds c, then the shipment is returned to the vendor. In practice, because n is usually small in comparison to the number of items in a shipment, a binomial distribution is used to describe the random variable x.
a. Suppose a company uses samples of size n = 10 and an acceptance number of c = 1 to evaluate shipments. If 10% of the items in a certain shipment are defective, what is the probability that this shipment will be returned to the vendor?
b. Suppose that a certain shipment contains no defective items. What is the probability that the shipment will be accepted by the sampling plan in part (a)?
c. Rework part (a) for shipments that are 5%, 20%, and 50% defective.
d. Let π denote the proportion of defective items in a given shipment. Use your answers to parts (a)–(c) to plot the probability of accepting a shipment (on the vertical axis) against π = 0, .05, .10, .20, and .50 (on the horizontal axis). Connect the points on the graph with a smooth curve. The resulting curve is called the operating characteristic (OC) curve of the sampling plan. It gives a visual summary of how the plan performs for shipments of differing quality.

38. Refer to Exercise 37. Acceptance sampling plans that use an acceptance number of c = 0 are given the name zero acceptance plans. Zero acceptance plans are not frequently used because, although they protect


against accepting shipments of inferior quality, they also tend to reject many shipments of good quality.
a. Let π denote the proportion of defective items in a shipment. Develop a general formula for the probability of accepting a shipment having π × 100% defective items.
b. Plot the OC curve for the zero acceptance plan that uses sample sizes of n = 10.
c. For what value of π is the probability of accepting a shipment about .05?

39. Qualification exams for becoming a state-certified welding inspector are based on multiple-choice tests. As in any multiple-choice test, there is a possibility that someone who is simply guessing the answers to each question might pass the test. Let x denote the number of correct answers given by a person who is guessing each answer on a 25-question exam, with each question having five possible answers (for each question, assume only one of the five choices is correct).
a. What type of probability distribution does x have?
b. For the 25-question test, what are the mean and standard deviation of x?
c. The exam administrators want to make sure that there is a very small chance, say, 1%, that a person who is guessing will pass the test. What minimum passing score should they allow on the exam to meet this requirement?

40. When used to model lifetimes of components, a probability distribution is said to be “memoryless” if, for a component that has already lasted (without failure) for t hours, the probability that it lasts for another s hours does not depend on t. That is, P(x ≥ t + s | x ≥ t) = P(x ≥ s). Show that the exponential distribution is memoryless.

41. The concept of the median of a set of data can also be applied to the probability distribution of a random variable. If x is a random variable with density function f(x), then the median of this distribution is defined to be the value μ̃ for which half the area under the density curve lies to the left of μ̃. That is, μ̃ is the solution to the equation

$$\int_{-\infty}^{\tilde{\mu}} f(x)\,dx = \frac{1}{2}$$

a. Suppose the lifetime x of an electronic assembly follows an exponential distribution with an MTBF of 500 hours (see Example 5.12 for the definition of MTBF). Find the median of this distribution. The median is the time by which half of all such assemblies will break down.
b. Is the median time to failure from part (a) larger or smaller than the mean time before failure (MTBF)?
c. From your answer to part (a), find a general formula (for any value of MTBF) for expressing the median time to failure in terms of the mean time before failure.

42. On a construction site, subcontractor A is responsible for completing the structural frame of a building. When this task is complete, subcontractor B then begins the task of installing electrical wiring and outlets. The following tables show estimated probabilities of completing each task in x days:

Framing time (days), x: 10 15 20 25 30
Probability, p1(x): .10 .20 .30 .30 .10

Wiring time (days), y: 5 10 15 20
Probability, p2(y): .20 .50 .20 .10

a. Calculate the expected completion time for each task.
b. Find the probability distribution of the total time for completing both tasks (assume that the framing and wiring tasks are independent).
c. What is the probability that the total time to complete both tasks is less than 35 days?
d. What is the expected time for completing both tasks?

43. Let x be the cost ($) of an appetizer and y be the cost of a main course at a certain restaurant for a customer who orders both courses. Suppose that x and y have the following joint distribution:

            y
        10    15    20
    5  .20   .15   .05
x   6  .10   .15   .10
    7  .10   .10   .05

a. Find the probability mass function of x.
b. Find the probability mass function of y.
c. Find the probability that x + y ≤ 21.
d. Are x and y independent?


5.5 Sampling Distributions 


The general objective of statistical inference, as we have noted in the chapter introduc-
tion, is to answer questions about the characteristics of populations and processes. In
particular, we wish to be able to make statements about population and process parame-
ters and to also accompany them by a measure of how much reliability or confidence we
have in our statements. Statistical inference is based on the interplay between random
samples (used to obtain data and calculate statistics), sampling distributions (which
describe the behavior of such statistics), and probability (which gives quantitative mea-
sures of reliability about what the statistics say). In this section and the next, we show
how these three tools are used in statistical inference.
The sampling distribution of a statistic is a mass or density function that characterizes all the possible values that the statistic can assume in repeated random samples. Depending on the particular statistic (e.g., x̄, s, s², x̃, range, IQR), we speak of the sampling distribution of x̄, the sampling distribution of s, and so forth. Every statistic has a sampling distribution. The sampling distribution sets the limits on which values of a statistic are likely and which are not.

definition The sampling distribution of a statistic is a mass or density function that char-
acterizes all the possible values that the statistic can assume in repeated random
samples from a population or process.

How Sampling Distributions Are Used


One way to approximate the sampling distribution of a statistic is to repeatedly select a
large number of random samples of size n from a given population. By calculating the
value of the statistic for each sample and forming a histogram of the results, we get an
approximate picture of the sampling distribution of the statistic. In turn, this picture
can be used to describe the values of the statistic that are likely to occur in any random
sample of size n.

Example 5.16 Suppose that we draw 1000 random samples, each of size n 5 25, from a normal pop-
ulation with a mean of 50 and a standard deviation of 2. If we calculate the mean x
of each sample, then the distribution of all 1000 x values gives a good approximation
to the sampling distribution of x. Figure 5.16 shows a histogram of the results of such
an experiment. Notice that the 1000 sample means stack up around the population
mean ( 5 50) and that variation among the sample means is smaller than variation
in the population. In particular, none of the sample means fall outside the range of
48.5 to 51.5 (i.e., none are more than 1.5 units away from ). In fact, it also appears
that very few sample means fall outside the interval 49 to 51; that is, they are gener-
ally within 1 unit of .


Figure 5.16 Approximating the sampling distribution of x̄ (n = 25): the population distribution and a histogram of the 1000 sample means
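A sampling experiment like the one in this example takes only a few lines with NumPy. The sketch below (an added illustration, with an arbitrarily chosen seed) draws the 1000 samples of size 25 and summarizes the resulting sample means:

```python
import numpy as np

rng = np.random.default_rng(1)                    # arbitrary seed for reproducibility
samples = rng.normal(loc=50, scale=2, size=(1000, 25))
means = samples.mean(axis=1)                      # 1000 sample means

print(means.mean())                 # close to the population mean, 50
print(means.std(ddof=1))            # close to 2/sqrt(25) = .4
print(means.min(), means.max())     # typically all between 48.5 and 51.5
```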

From the shape and location of the sampling distribution, we can begin to see
which values of the sample statistic are more likely to occur than others. In this sense,
the information in a sampling distribution provides a template for evaluating any sam-
ple, even future samples, from a population or process. In Figure 5.16, for instance, we can use the tails of the sampling distribution to place bounds on the values that x̄ can assume whenever we take random samples of size 25 from a normal population with a mean of 50 and a standard deviation of 2. Going a step further, we can reasonably say that the mean μ affects only the location of the histogram and that the value of σ affects only the spread of the sample results. If this is so, then we now know a lot about what to expect when sampling from any normal population whose standard deviation is σ = 2.

Example 5.17 Refer to Example 5.16. Suppose that next week we select a single sample of size 25
from a normal population whose standard deviation is known to be  5 2, but whose
mean  is unknown to us. If x 5 70 for this sample, then the results in Example 5.16
indicate that 70 is almost certainly no farther than 1.5 units away from the population
mean  and it is fairly likely that it is within 1 unit of . That is, we can infer that
the unknown population mean is almost certainly between 68.5 and 71.5, and we
can be reasonably confident that it is between 69 and 71. In this way, by using our
knowledge of what the sampling distribution of x looks like, we can begin to make
inferences about the likely values of the unknown population parameter .

General Properties of Sampling Distributions


As we noted previously, sampling distributions can be created for any statistic, not just x̄. For example, Figure 5.17 shows the approximate sampling distributions of the statistics x̄, x̃, s, and s², for the same 1000 samples of size n = 25 that were used to create Figure 5.16.
What about sampling from discrete populations? In particular, suppose we want to use samples of size 25 to estimate the proportion π of defectives being made by a certain process. Denoting defective items by a “1” and nondefectives by a “0,” the mass function

x: 0 1
p(x): .80 .20


Figure 5.17 Sampling distributions of x̄, x̃, s, and s²

describes such a process in which the proportion of defective items is 20%. By calculating the sample proportion defective p for each of 1000 random samples of size n = 25, an approximate sampling distribution for the statistic p can be formed (Figure 5.18). Note that this distribution has many more possible values than just the values x = 0 and x = 1 in the population (each of the values 0/25, 1/25, 2/25, 3/25, . . . , 25/25 is a possible value of p). The shape of this sampling distribution is similar to the one in Figure 5.16, although it contains some gaps because only the values of p shown previously are possible to attain in a sample of size 25.

Figure 5.18 Sampling distribution of p when n = 25 from a process with π = .20

Sampling experiments can reveal many general properties of sampling distributions.


Consider Table 5.1, which shows the means and standard deviations of the sampling


distributions in Figure 5.17 along with the actual values of the corresponding population parameters (for a normal population with μ = 50, σ = 2). The similarity between the
column of population parameters and the column of means of the sampling distribu-
tions leads us to conjecture that the center (i.e., the mean) of the sampling distribution
of a statistic may, in fact, coincide with the corresponding population parameter. When
this happens, we say that the statistic is unbiased, or that it is an unbiased estimator of
the population parameter. As we shall see in Section 5.6, some of the most important
statistics we have encountered so far are unbiased.

Table 5.1 Means and standard deviations of sampling distributions in Figure 5.17

Population parameter     Actual value   Sample mean of          Sample standard deviation
                                        sampling distribution   of sampling distribution
Mean, μ                  50             50.000                  .418
Median                   50             49.982                  .515
Standard deviation, σ    2              1.9831                  .2853
Variance, σ²             4              4.0139                  1.1528

Second, the standard deviations of the sampling distributions exhibit an important


feature: All are smaller than the population standard deviation (5 2). In fact, as we
shall see in Section 5.6, an even stronger statement can be made: For the majority of
statistics we have studied, the variation in the sampling distribution actually decreases
as the sample size increases.
Beyond these simple observations, additional questions immediately come to mind.
What role does the shape of the population play in controlling the shape of the sampling
distribution? What is the effect of increasing or decreasing the sample size? Exactly how
is the variation exhibited by the sampling distribution related to the variation in the
population? Most of these questions can be answered in general terms by conducting a
few more sampling experiments. Such experiments provide the motivation for the fol-
lowing general conclusions:

General Properties of Sampling Distributions


1. The sampling distribution of a statistic often tends to be centered at the value of the population parameter estimated by the statistic.
2. The spread of the sampling distributions of many statistics tends to grow smaller as the sample size increases.
3. As the sample size increases, sampling distributions of many statistics become more and more bell-shaped (more and more like normal distributions).

Finally, and perhaps most importantly, do we really have to conduct a lengthy sam-
pling experiment every time we want to make inferences based on a statistic generated
from a single sample? As we shall see in Section 5.6, the surprising answer is “no.” In


fact, the approximate shape of the sampling distribution is often known in advance, be-
fore taking even a single sample! Furthermore, knowing the specific shape of a sampling
distribution also enables us to calculate probabilities, which allow us to quantify exactly
what we mean by saying that, for example, the sample mean is highly likely to be within
1 unit of the population mean.

Section 5.5 Exercises


44. What primary purpose do sampling distributions serve in statistical inference?

45. Refer to Exercise 37. Suppose that a large lot of items is inspected by taking a random sample of size n and determining the number x of defective items in the sample. The result is then reported in terms of the proportion p = x/n of defective items in the sample. Assume that the binomial distribution can be used to describe the behavior of the random variable x.
a. Suppose that 5% of the items in a particular lot are defective and that a random sample of size n = 5 is to be taken from the lot. Calculate the probability that the sample proportion p falls within 1% of the true percent defective in the lot. That is, find P(.05 − .01 ≤ p ≤ .05 + .01).
b. Answer the question in part (a) for samples of size n = 25.
c. Answer the question in part (a) for samples of size n = 100. Hint: Use the normal approximation to the binomial distribution.

46. Random samples of size n are selected from a population that is uniformly distributed over the interval [10, 20]. Without sampling or performing any calculations, describe what you expect the sampling distribution of the range R (R = largest minus smallest value in a sample) to look like.
a. For samples of size n = 2, what do you predict the mean of the sampling distribution of R will be?
b. For samples of size n = 100, what do you predict the mean of the sampling distribution of R will be?
c. Will the variance of the sampling distribution of R for samples of size n = 2 be the same or different from the variance of the sampling distribution of R for samples of size n = 100? Give a simple justification for your answer based on the definition of the range for n = 2 versus n = 100.
d. What will happen to the variance of the sampling distribution of R as the sample size n increases? Give a simple justification for your answer based on the definition of the range.

47. The Food and Drug Administration (FDA) oversees the approval of both medical devices and new drugs. To gain FDA approval, a new device must be shown to perform at least as well, and hopefully better, than any similar device already on the market. Suppose a medical device company develops a new system for connecting intravenous tubes used on hospital patients. To be comparable to an already-existing product, the force required to disconnect two tubes joined by the new device must not exceed 5 lb. To estimate the maximum force required to disconnect two tubes, several tests are made. For a random sample of n connections, the forces x1, x2, x3, . . . , xn required to disconnect the tubes are recorded and the maximum, M, of the n readings is used to estimate the maximum necessary force for all such connections.
a. Suppose that the actual distribution of forces needed to disconnect tubes can be described by a uniform distribution on the interval [2, 4]. For a sample of size n = 2, do you expect the mean of the sampling distribution of M to be closer to 2 or 4?
b. Which do you expect to be larger, the mean of the sampling distribution of M for samples of size n = 2 or the mean of the sampling distribution of M for samples of size n = 100? Use the definition of the maximum of a sample to justify your answer.
c. Will the variance of the sampling distribution of M for samples of size n = 2 be the same or different from the variance of the sampling distribution of M for samples of size n = 100? Give a simple justification for your answer based on the definition of the sample maximum for n = 2 versus n = 100.


5.6 Describing Sampling Distributions  


Just as histograms of sample data provide approximations to population distributions,
sampling experiments (Section 5.5) furnish approximate pictures of sampling distribu-
tions. We now turn our attention to developing more precise summaries of sampling
distributions. This requires a slightly deeper investigation of the role played by random
sampling. For instance, Example 5.15 gives a glimpse of how random sampling and the
form of the statistic x̄ are brought together to form a more exact picture of the sampling distribution of x̄. The essential role of random sampling is to ensure that the sampled values can be considered to be independent. Independence, in turn, enables us to perform the necessary probability calculations to arrive at the distribution of the statistic.
In this section, we study in some detail the exact sampling distributions of the statistics x̄ (sample mean) and p (sample proportion). These two statistics appear in a great
many statistical techniques, and their sampling distributions serve as prototypes for all
other sampling distributions. In subsequent chapters, we will simply state the form of
the sampling distribution that applies to a given statistical technique.

Sampling Distribution of x̄
The sampling distribution of x̄, also called the sampling distribution of the mean, is the probability distribution that describes the behavior of x̄ in repeated random samples from a population or process. Like any distribution, the sampling distribution of x̄ has its own unique mean and standard deviation, which we denote by μ_x̄ and σ_x̄, respectively. The next general result relates μ_x̄ and σ_x̄ to the population or process mean and standard deviation.

Mean and Standard Deviation of the Sampling Distribution of x̄

Let x̄ be the sample mean of a random sample x1, x2, x3, . . . , xn from a population or process with mean μ and standard deviation σ. Then the mean of the sampling distribution of x̄ coincides with μ, regardless of the sample size n. The spread of the sampling distribution, described by σ_x̄, is equal to the population standard deviation divided by the square root of the sample size. That is,

$$\mu_{\bar{x}} = \mu \qquad\text{and}\qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

These equations hold regardless of the particular form of the population distribution. To emphasize the fact that it describes a sampling distribution, not a population, σ_x̄ is also called the standard error of x̄, or the standard error of the mean.
One of the key features of the standard error of the mean, σ_x̄, is that it decreases as the sample size increases. In fact, many statistics have this property (see Section 5.5). This makes intuitive sense, since we expect that more information ought to provide better estimates (i.e., smaller standard errors). As a result, increasing the size of a random sample has the desirable effect of increasing the probability that the estimate x̄ will lie close to the population mean μ.
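The 1/√n behavior of the standard error is easy to see numerically. A minimal sketch (added for illustration) for a population with σ = 2, as in Example 5.16:

```python
import math

sigma = 2.0
for n in (10, 30, 100):
    print(n, round(sigma / math.sqrt(n), 3))  # standard error shrinks as n grows
```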


Sampling from a Normal Population


When a population follows a normal distribution, it can be shown that the sampling distribution of x̄ is also normal, for any sample size n. The normality of x̄, along with the fact that its mean μ_x̄ and standard error σ_x̄ can be determined from μ and σ, is enough to completely characterize the sampling distribution of x̄ in this case. As a result, with the normal distribution, probabilities of events involving x̄ reduce to straightforward calculations. Figure 5.19 demonstrates the effect that increasing n has on the sampling distribution of x̄.

Figure 5.19 The probability that x̄ falls within a fixed distance from μ increases as n increases (shown for n = 10, 30, 100)



Sampling Distribution of x̄ (Normal Population)

When a population distribution is normal, the sampling distribution of x̄ is also normal, regardless of the size of the sample.

Example 5.18 Physical characteristics of manufactured products are often well described by
normal distributions. Suppose, for example, that we want to evaluate the length
(in cm) of certain parts in a production process based on the information in a
random sample of five such parts. The parts are required to have a nominal length
of 20 cm; past experience with this process indicates that the standard deviation
is known to be σ = 1.8 cm. If we assume that the lengths can be described by a


normal distribution, what is the probability that the mean of this sample will be
within 2 cm of the current process mean μ? That is, what is the probability that x̄ will lie between μ − 2 and μ + 2?
The solution to this type of problem lies in recognizing that the sampling distribution of x̄ is normal with a mean of μ_x̄ = μ and standard error of σ_x̄ = σ/√n = 1.8/√5 = .805. To find the probability P(μ − 2 < x̄ < μ + 2), we standardize, making sure to use the mean and standard error of x̄ while doing this:

$$P(\mu - 2 < \bar{x} < \mu + 2) = P\left(\frac{(\mu - 2) - \mu}{\sigma/\sqrt{n}} < z < \frac{(\mu + 2) - \mu}{\sigma/\sqrt{n}}\right) = P\left(\frac{-2}{.805} < z < \frac{2}{.805}\right) = .9868$$
That is, there is a 98.68% chance that the mean of a random sample of size n 5 5
will be within 2 units of the population mean . Notice how the unknown mean 
cancels itself during the standardization. In other words, we do not need to know (or
assume) a value for . Instead, when we select our sample of five parts, we can be
relatively confident that the sample mean will be no farther than 2 cm from the true
(unknown) process mean.
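
A quick numerical check of this calculation is easy to script. The following Python sketch (assuming SciPy is available; the variable names are ours) reproduces the .9868 figure up to table rounding:

    from math import sqrt
    from scipy.stats import norm

    sigma, n = 1.8, 5
    se = sigma / sqrt(n)                    # standard error of x-bar: about .805
    # P(mu - 2 < x-bar < mu + 2); mu cancels after standardizing
    prob = norm.cdf(2 / se) - norm.cdf(-2 / se)
    print(round(prob, 4))                   # about .987 (table rounding gives .9868)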

The Central Limit Theorem


As we have just seen, prior knowledge about the shape of a population distribution
determines the sampling distribution of x̄. Unfortunately, possessing such knowledge is
more often the exception than the rule. In many applications, we are faced with sam-
pling from populations whose distributions are, at best, only approximately understood
or that sometimes deviate markedly from normality.
The remedy for this problem is to rely more heavily on the sampling process and
less on our knowledge of the population. It is a fortunate and somewhat surprising fact
that a complete knowledge of a population distribution is not necessary, as long as we
compensate by selecting a large enough sample. By using a moderately large sample size
n, it can be shown that the sampling distribution of x̄ is approximately normal, regardless of
the particular population distribution. This result is known as the Central Limit Theorem.

The Central Limit Theorem


The sampling distribution of x̄ can be approximated by a normal distribution when the
sample size is sufficiently large, irrespective of the shape of the population distribution.
The larger the value of n, the better the approximation.

Although the sampling distribution of x̄ must reflect certain features of the popula-
tion being sampled (especially its location, μ), the shape of the sampling distribution is
primarily influenced by n. That is, as n increases, the particular shape of the population
(e.g., uniform, exponential, normal, Weibull) exerts less and less influence on the shape


of the sampling distribution, which becomes more and more normal in appearance.
Figure 5.20 illustrates this effect for several different populations. The closer the popu-
lation is to being normal, the more rapidly the sampling distribution of x̄ approaches
normality. For instance, we saw this behavior emerging in Figure 5.15, where even
small samples of size n = 2 from a uniform population result in a sampling distribution
that is already beginning to take on the characteristic normal shape.

Figure 5.20 The Central Limit Theorem: the sampling distribution of x̄ approaches a
normal distribution as the sample size increases (shown for a uniform population with
n = 2 and n = 10, an exponential population with n = 2 and n = 50, and a normal
population with n = 2 and n = 10)

Many authors use n ≥ 30 as a rough guide for what constitutes a "large enough"
sample size for invoking the Central Limit Theorem. This is not a bad rule in general,
but there are cases where substantially smaller values of n will suffice (e.g., with sym-
metric populations like the uniform and the normal), as well as cases where larger
sample sizes are needed (especially for highly skewed populations). As a rule, the less
symmetric a population is, the larger the sample size will have to be to ensure normality
of x̄. For example, in the case of an exponential population, sample sizes of 40 to 50 are
often required to achieve normality.

Example 5.19 Consider the distribution shown in Figure 5.21 for the amount purchased (rounded
to the nearest dollar) by a randomly selected customer at a particular gas station (a
similar distribution for purchases in Britain (in pounds) appeared in the article “Data
Mining for Fun and Profit,” Statistical Science, 2000: 111–131; there were big spikes at
the values 10, 15, 20, 25, and 30). The distribution is obviously quite nonnormal.  We
asked Minitab to select 1000 different samples, each consisting of n = 15 observations,


Figure 5.21 Probability distribution of x = amount of gasoline purchased ($)
(purchase amounts range from $5 to $60 in $5 increments)

and calculate the value of the sample mean x̄ for each one. Figure 5.22 is a histogram
of the resulting 1000 values; this is the approximate sampling distribution of x̄ under
the specified circumstances. This distribution is clearly approximately normal even
though the sample size is not very large. A normal quantile plot based on the 1000 x̄
values exhibits a very prominent linear pattern.

Figure 5.22 Approximate sampling distribution of the sample mean
amount purchased when n = 15 and the population is as shown in
Figure 5.21 (histogram of the 1000 sample means, which span roughly 18 to 36)
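
The Minitab experiment just described can be imitated with a short simulation. The sketch below (Python with NumPy) substitutes a generic right-skewed stand-in population for the spiked purchase-amount distribution, whose exact probabilities are not reproduced here; the qualitative behavior of the sample means is the same:

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 15, 1000
    # Stand-in skewed population (exponential with mean 20); not the exact
    # spiked distribution of Figure 5.21
    samples = rng.exponential(scale=20.0, size=(reps, n))
    means = samples.mean(axis=1)        # 1000 values of x-bar
    print(means.mean(), means.std())    # near 20 and 20/sqrt(15), about 5.2
    # A histogram of `means` is already roughly bell-shaped at n = 15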


Example 5.20 Printed circuit boards (PCBs), used in electronic equipment such as computers and
appliances, are laminated cards (usually green) upon which various electronic compo-
nents are mounted. One step in the manufacture of PCBs uses machines to automati-
cally insert the metal connecting pins on the components into the appropriate hole
patterns on a PCB. Components of each type (e.g., resistors, capacitors) are adhesively
mounted on large paper-tape rolls and fed into the machines, which then insert them
into a PCB. The amount of time it takes to insert all the components on a given PCB
varies somewhat from board to board because of machine downtime for replenish-
ing tape rolls and replacing components with broken pins. Suppose that an insertion
machine can complete a certain type of PCB in an average time of 3 minutes with a
standard deviation of .5 minute. If an order of 100 PCBs is run on this machine, what
is the probability that the average time to complete all the boards exceeds 3.1 minutes?
Viewing the completion times as a random sample from a population with
μ = 3 and σ = .5, we can calculate the mean and standard error of the sampling
distribution of the average completion time (of the 100 boards) as follows:

μx̄ = μ = 3 and σx̄ = σ/√n = .5/√100 = .05

Because the sample size n = 100 is large, the Central Limit Theorem allows us to
use the normal distribution to calculate the desired probability:

P(x̄ > 3.1) ≈ P(z > (3.1 − 3)/.05) = P(z > 2) = 1 − P(z ≤ 2) = 1 − .9772 = .0228

That is, there is only a 2.28% chance that the average completion time will exceed
3.1 minutes. Since x̄ > 3.1 is equivalent to 100x̄ > 100(3.1), we can also state that
there is a 2.28% chance that the total time for completing the 100 boards will exceed
310 minutes (5 hours, 10 minutes).
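
The same tail probability can be computed directly in code. A minimal sketch with SciPy, using μ = 3, σ = .5, and n = 100 from the example:

    from math import sqrt
    from scipy.stats import norm

    mu, sigma, n = 3.0, 0.5, 100
    se = sigma / sqrt(n)                    # .05
    prob = norm.sf(3.1, loc=mu, scale=se)   # P(x-bar > 3.1), the upper tail
    print(round(prob, 4))                   # 0.0228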

Sampling Distribution of the Sample Proportion


Qualitative information can also be included in statistical studies. To do this, we first
numerically code such information using the following simple device: The number “1”
is assigned to population members having a specified characteristic and “0” is assigned
to those that do not. The population that results from this 0–1 coding scheme is pictured
in Figure 5.23. The parameter of interest in this situation is π, the proportion of the
population that has the characteristic of interest. Notice that π is also the height of the
bar associated with the value of 1 in Figure 5.23.
Using the formulas in Chapter 2, we can calculate the mean, variance, and stan-
dard deviation of this population:

μ = Σ x p(x) = 0(1 − π) + 1(π) = π

σ² = Σ (x − μ)² p(x) = π(1 − π)

σ = √(π(1 − π))


Figure 5.23 The distribution of coded
values of a qualitative characteristic: "1"
denotes that the specified characteristic
is present (probability π); "0" indicates
that it is not (probability 1 − π)

Every random sample drawn from such a population will consist entirely of 0s and
1s. Suppose, for instance, that a particular sample of size 10 contains the observations
{0, 0, 1, 1, 0, 1, 0, 0, 1, 0}. Then the sample mean is (0 + 0 + 1 + 1 + 0 + 1 + 0 + 0 +
1 + 0)/10 = .40. That is, the sample mean is simply the proportion of 1s in the sample.
We use the notation p to denote the proportion of successes, also called the sample
proportion, in a random sample of size n.
Since p is actually a sample mean, we can use the earlier results in this section to
determine its sampling distribution. For example, the mean and standard error of the
sampling distribution of p are given by

μp = μ = π and σp = σ/√n = √(π(1 − π))/√n = √(π(1 − π)/n)

Furthermore, for a sufficiently large sample size n, the Central Limit Theorem in-
dicates that the sampling distribution of p will be approximately normal. Because
we record only whether each sampled item has a certain characteristic or not,
large samples are often easy to come by when estimating a population proportion
π. As a general rule, the accuracy of the normal approximation is best when both
nπ ≥ 5 and n(1 − π) ≥ 5.

Sampling Distribution of p
The mean and standard error of the sampling distribution of p are given by

μp = π and σp = √(π(1 − π)/n)

In addition, for a large enough n, the sampling distribution of p is approximately normal. In
general, the normal approximation is best when nπ ≥ 5 and n(1 − π) ≥ 5.
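
These facts are easy to verify by simulation. The sketch below (NumPy, with an arbitrary illustrative choice of π = .2 and n = 50 that is not taken from the text) draws many samples of 0s and 1s and compares the empirical behavior of p with π and √(π(1 − π)/n):

    import numpy as np

    rng = np.random.default_rng(0)
    pi, n, reps = 0.2, 50, 10_000             # illustrative values
    x = rng.binomial(1, pi, size=(reps, n))   # 0-1 coded observations
    p = x.mean(axis=1)                        # each sample mean is a sample proportion
    print(p.mean())                           # close to pi = .2
    print(p.std())                            # close to sqrt(.2*.8/50), about .0566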


The fact that the formulas for μp and σp both contain the unknown parameter π
might at first appear to negate the usefulness of the sampling distribution of p. After all,
if the population proportion π is unknown, how can we possibly find √(π(1 − π)/n)?
In practice, there are two relatively simple solutions to this problem: (1) Use a prede-
termined value of π that describes some hypothetical value of π against which the
sample data is to be compared or (2) use π = 1/2 in the formula for σp, which results
in a conservatively large value of σp.
The second approach is based on the observation that π(1 − π) ≤ .25 for any value
of π between 0 and 1.¹ This means that

σp = √(π(1 − π)/n) ≤ √(.25/n) = 1/(2√n)

no matter what the true value of π. Thus, by choosing the sample size n large enough,
1/(2√n) (and hence σp) can be made as small as desired. This approach is commonly
used in all forms of survey sampling.
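
For survey planning, the conservative bound translates directly into a sample-size rule. A minimal sketch (the target value .01 is an arbitrary illustration, not from the text):

    from math import ceil

    target_se = 0.01                    # desired ceiling on the standard error of p
    # Worst case pi = 1/2 gives se <= 1/(2*sqrt(n)); solve for n
    n = ceil((1 / (2 * target_se)) ** 2)
    print(n)                            # 2500 observations suffice for any true pi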

Example 5.21 Control charts are graphs that monitor the movements in a sample statistic (such
as x or p) in periodic samples taken from an ongoing process. Using the sampling
distribution of the statistic as a yardstick, values of the statistic “too far” away from the
center of the sampling distribution are taken to be signals of possible problems with
the process. For example, a p chart is often used to monitor the proportion of non-
conforming products in a manufacturing process. Using past data from the process, a
value of  is selected as being representative of the long-run behavior of the process.
Suppose, for example, that a certain process constantly generates an average of about
5% nonconforming products and that samples of size 100 are taken each day to test
whether the 5% nonconformance rate has changed. On one particular day, 12 non-
conforming products appear in the sample. How do we interpret this information?
Assuming that the process is behaving as it has in the past, we set  5 .05. For
this value of , n 5 100(.05) 5 5 and n(1 2 ) 5 100(.95) 5 95, so the condition
for applying the normal approximation is met. Furthermore, the mean and standard
deviation of the sampling distribution of p can be calculated:
(1 2 ) (.05)(1 2 .05)
p 5 .05 and 5 p 5 5 .0218
B n B 100
Because the sampling distribution of p is approximately normal, we can evaluate the
sample proportion of p 5 12y100 5 .12 by determining how far away it is from the mean
of .05. Since (.12 2 .05)y.0218 5 3.21, we see that the value of .12 is 3.21 standard
deviations above the process mean. In other words, this sample result has a very small
probability of occurring if the process is running as usual. Our conclusion is that it is
more likely that something has caused an increase in the process nonconformance rate.
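
The arithmetic of this p chart evaluation is short enough to script. A sketch using the example's values π = .05 and n = 100:

    from math import sqrt
    from scipy.stats import norm

    pi, n = 0.05, 100
    se = sqrt(pi * (1 - pi) / n)        # .0218
    p = 12 / 100
    z = (p - pi) / se                   # about 3.21 standard errors above the mean
    print(round(z, 2), norm.sf(z))      # upper-tail probability, roughly .0007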

¹ Writing π(1 − π) as 1/4 − (1/2 − π)², you can see that the maximum value 1/4 occurs when π = 1/2.
Alternatively, you could use calculus, setting the derivative of π(1 − π) equal to 0, to find that π = 1/2
maximizes the quantity π(1 − π).


Section 5.6 Exercises


48. The inside diameter of a randomly selected piston ring is a random variable with a mean of 12 cm and a standard deviation of .04 cm.
    a. Where is the sampling distribution of x̄ centered? What is the standard deviation of the sampling distribution of x̄?
    b. Answer the questions in part (a) for sample means based on samples of size 64.
    c. Which is more likely to lie within .01 cm of 12 cm, the mean of a random sample of size 16 or the mean of a random sample of size 64?

49. A survey of the members of a large professional engineering society is conducted to determine their views on proposed changes to an ASTM measurement standard. Suppose that 80% of the entire membership favor the proposed changes.
    a. Calculate the mean and standard error of the sampling distribution of the proportion of engineers in samples of size 25 who favor the proposed changes.
    b. Calculate the mean and standard error of the sampling distribution of the proportion of engineers in samples of size 25 who do not favor the proposed changes.
    c. Calculate the mean and standard error of the sampling distribution of the proportion of engineers in samples of size 100 who favor the proposed changes.

50. A random sample of size 25 is selected from a large batch of electronic components, and the proportion of defective items in the sample is recorded. The proportion of defective items in the entire batch, however, is unknown. What is the maximum value that the standard error of the sampling distribution of the sample proportion could have?

51. Refer to Exercise 48. Assume that the distribution of piston diameters is known to be normal.
    a. Calculate the probability P(11.99 ≤ x̄ ≤ 12.01) when n = 16.
    b. Calculate the probability P(11.99 ≤ x̄ ≤ 12.01) when n = 64.

52. Let x1, x2, x3, . . . , x100 denote the actual weights of 100 randomly selected bags of fertilizer.
    a. If the expected weight of each bag is 50 lb and the standard deviation of bag weights is known to be 1 lb, calculate the approximate value of P(49.75 ≤ x̄ ≤ 50.25) by relying on the Central Limit Theorem.
    b. If the expected weight per bag is 49.8 lb rather than 50 lb (so that, on average, the bags are underfilled), calculate P(49.75 ≤ x̄ ≤ 50.25).

53. The lifetime of a certain battery is normally distributed with a mean value of 8 hours and a standard deviation of 1 hour. There are four such batteries in a package.
    a. What is the probability that the average lifetime of the four batteries exceeds 9 hours?
    b. What is the probability that the total lifetime of the batteries will exceed 36 hours?
    c. If T denotes the total lifetime of the four batteries in a randomly selected package, find the numerical value of T0 for which P(T ≥ T0) = .95.
    d. Refer to your answer to part (c). Suppose the battery manufacturer guarantees that any package of batteries that does not yield a total lifetime of T0 hours will be replaced free of charge to the customer. If it costs the manufacturer $3.00 to replace a package of batteries (materials plus mailing to customer), calculate the expected replacement cost per package associated with a large shipment of batteries.

54. The Rockwell hardness of certain metal pins is known to have a mean of 50 and a standard deviation of 1.5.
    a. If the distribution of all such pin hardness measurements is known to be normal, what is the probability that the average hardness for a random sample of 9 pins is at least 52?
    b. What is the approximate probability that the average hardness in a random sample of 40 pins is at least 52?

55. Suppose that the sediment density (in g/cm³) of specimens from a certain region is normally distributed with a mean of 2.65 and a standard deviation of .85 ("Modeling Sediment and Water Column Interactions for Hydrophobic Pollutants," Water Research, 1984: 169–174).
    a. If a random sample of 25 such specimens is selected, what is the probability that the sample average sediment density is at most 3.00? Between 2.65 and 3.00?
    b. How large a sample would be required to ensure that the first probability in part (a) is at least .99?

56. The number of flaws x on an electroplated automobile grill is known to have the following probability mass function:

    x:    0   1    2    3
    p(x): .8  .1  .05  .05

    a. Calculate the mean and standard deviation of x.
    b. What are the mean and standard deviation of the sampling distribution of the average number of flaws per grill in a random sample of 64 grills?
    c. For a random sample of 64 grills, calculate the approximate probability that the average number of flaws per grill exceeds 1.

57. Only 2% of a large population of 100-ohm gold-band resistors have resistances that exceed 105 ohms.
    a. For samples of size 100 from this population, describe the sampling distribution of the sample proportion of resistors that have resistances in excess of 105 ohms.
    b. What is the probability that the proportion of resistors with resistances exceeding 105 ohms in a random sample of 100 will be less than 3%?

58. In Exercise 36, what is the probability that the average of two measurements will lie within 2 mm of the true length of the object?

59. Roughly speaking, the Central Limit Theorem says that sums of independent random variables tend to have (approximately) normal distributions. Similarly, it can be shown that products of independent positive random variables tend to have lognormal distributions. Recall from Section 1.5 that a random variable x is said to have a lognormal distribution with parameters μ and σ if the random variable y = ln(x) is normal with mean μ and standard deviation σ. The successive breaking of particles into finer and finer pieces, a process that can be modeled as a product of positive random variables, leads to lognormal particle size distributions. In particular, small particles suspended in the atmosphere (called aerosols) have radii that can be described by a lognormal distribution, with parameters μ = −2.62 and σ = .788 (Crow, E. L., and K. Shimizu, Lognormal Distributions: Theory and Applications, Marcel Dekker, New York, 1988: 337).
    a. Find the mean radius (in μm) of the atmospheric particles.
    b. What is the probability that an atmospheric particle will have a radius exceeding .12 μm?

Supplementary Exercises
60. Figure 5.5 shows how a tree diagram can be used to verify that {A or B}′ = {A′ and B′}. Use a Venn diagram to prove this fact.

61. A large farming area is divided into five parcels of land of different sizes, as follows:

    Parcel:       B1  B2  B3  B4  B5
    Size (acres): 15  20  25  10  20

    Because crop-bearing trees are uniformly planted within each parcel, the probability that a randomly sampled tree from the farm comes from a particular parcel is assumed to be proportional to the size of the parcel.
    a. What is the probability that a randomly chosen tree comes from one of the first three parcels of land?
    b. What is the probability that a randomly chosen tree does not come from parcel 5?

62. A complex assembly contains 20 critical components (labeled C1, C2, . . .), each having a probability of .95 of functioning correctly. Each component must function correctly for the entire assembly to function. Let A denote the event that the assembly fails to function correctly and let B denote the event that component C1 fails to function correctly.
    a. Give a verbal description of the expressions A | B and B | A.
    b. Does P(A | B) = P(B | A)?

63. A battery-operated tool requires that each of its four batteries operate correctly to provide sufficient power to the tool. If each battery operates independently of the others and each has a .10 chance of failing over a 30-hour period of operation, what is the probability that the tool will fail sometime during the 30-hour operating period?

64. Two pumps that are connected in parallel fail independently of one another on any given day. The probability that only one pump fails is .10, and the probability that neither of the two pumps fails is .05. What is the probability that both pumps fail on a given day? Hint: Use a Venn diagram.

65. Find a formula for the probability that at least one of two independent events occurs. (Hint: If events A and B are independent, then so are the pairs of events A′ and B, A and B′, and A′ and B′.)

66. Let x denote the number of nonzero digits in a randomly selected zip code.
    a. List the possible values of the random variable x.
    b. Can two or more zip codes have the same value of x?

67. A continuous random variable x has a density function of the form f(x) = .5x over the interval [0, b].
    a. Find b.
    b. Find the mean of the variable x.
    c. Find the standard deviation of the variable x.

68. According to the article "Optimization of Distribution Parameters for Estimating Probability of Crack Detection" (J. of Aircraft, 2009: 2090–2097), the following "Palmberg" equation is commonly used to determine the probability Pd(c) of detecting a crack of size c in an aircraft structure:

    Pd(c) = (c/c*)^β / [1 + (c/c*)^β]

    where c* is the crack size that corresponds to a .5 detection probability (and thus is an assessment of the quality of the inspection process).
    a. Verify that Pd(c*) = .5.
    b. What is Pd(2c*) when β = 4?
    c. Suppose an inspector inspects two different panels, one with a crack size of c* and the other with a crack size of 2c*. Again assuming β = 4 and also that the results of the two inspections are independent of one another, what is the probability that exactly one of the two cracks will be detected?

69. "Travelers" are documents that accompany a product as it sequences through various production steps. Travelers contain manufacturing instructions pertaining to the particular item or order. Suppose that each of 30 data fields on a particular traveler has a .5% chance of being filled out incorrectly. Assume that each field is independent of the others.
    a. What is the probability that a given traveler will contain at least one incorrect field?
    b. What is the probability that a traveler is free of errors?
    c. What is the probability of finding two or more errors on a traveler?

70. A continuous signal is sent over a communication channel. The number of errors per second, x, at the receiving end of the channel has a normal distribution with a mean and standard deviation of 3 and .8 errors per second, respectively.
    a. In any given 1-second period, what is the probability that no errors are transmitted?
    b. Find the probability of transmitting two or more errors per second.
    c. What is the probability that more than five errors per second will be transmitted?

71. Use spreadsheet (e.g., Excel™) or other software to approximate the sampling distribution of the sample mean.
    a. Generate at least 100 samples of size 10 from a uniform distribution on the interval [10, 20]. Create a histogram of the 100 sample means, and describe the shape of the histogram.
    b. Repeat part (a) by generating 100 samples of size 10 from an exponential distribution with a mean of 5.
    c. Compare the shapes of the histograms in parts (a) and (b), and offer an explanation for any differences that you observe.

72. An electrical appliance uses four 1.5-volt batteries. The batteries are connected in series so that the total voltage supplied to the appliance is the sum of the voltages in the four batteries. Suppose that the actual voltage of all 1.5-volt batteries is known to have a mean of 1.5 volts and a standard deviation of .2 volt.
    a. What are the mean and standard error of the sampling distribution of the average voltage in four randomly selected 1.5-volt batteries?
    b. What is the mean of the sampling distribution of the total voltage in four randomly selected 1.5-volt batteries?

73. Five randomly selected 100-ohm resistors are connected in a series circuit. Suppose that it is known that the population of all such resistors has a mean resistance of exactly 100 ohms with a standard deviation of 1.7 ohms.
    a. What is the probability that the average resistance in the circuit exceeds 105 ohms?
    b. What is the probability that the total resistance in the circuit differs from 500 ohms by more than 11 ohms?
    c. Find the number of resistors, n, for which P(490 ≤ T ≤ 510) = .95, where T denotes the total resistance in the circuit.

74. The article "Three Sisters Give Birth on the Same Day" (Chance, Spring 2001, 23–25) used the fact that three Utah sisters had all given birth on March 11, 1998 as a basis for posing some interesting questions regarding birth coincidences.
    a. Disregarding leap year and assuming that the other 365 days are equally likely, what is the probability that three randomly selected births all occur on March 11? Be sure to indicate what, if any, extra assumptions you are making.
    b. With the assumptions used in part (a), what is the probability that three randomly selected births all occur on the same day?
    c. The author suggested that, based on extensive data, the length of gestation (time between conception and birth) could be modeled as having a normal distribution with mean value 280 days and standard deviation 19.88 days. The due dates for the three Utah sisters were March 15, April 1, and April 4, respectively. Assuming that all three due dates are at the mean of the distribution, what is the probability that all births occurred on March 11? Hint: The deviation of birth date from due date is normally distributed with mean 0.
    d. Explain how you would use the information in part (c) to calculate the probability of a common birth date.

75. A friend who lives in Los Angeles makes frequent consulting trips to Washington, DC; 50% of the time she travels on airline #1, 30% of the time on airline #2, and the remaining 20% of the time on airline #3. For airline #1, flights are late into DC 30% of the time and late into LA 10% of the time. For airline #2, these percentages are 25% and 20%, whereas for airline #3 the percentages are 40% and 25%. If we learn that on a particular trip she arrived late at exactly one of the two destinations, what are the posterior probabilities of having flown on airlines #1, #2, and #3? Hint: From the tip of each first-generation branch on a tree diagram, draw three second-generation branches labeled, respectively, 0 late, 1 late, and 2 late.

76. A factory uses three production lines to manufacture cans of a certain type. The accompanying table gives percentages of nonconforming cans, categorized by type of nonconformance, for each of the three lines during a particular time period:

                        Line 1  Line 2  Line 3
    Blemish                15      12      20
    Crack                  50      44      40
    Pull-tab problem       21      28      24
    Surface defect         10       8      15
    Other                   4       8       2

    During this period, line 1 produced 500 nonconforming cans, line 2 produced 400 such cans, and line 3 was responsible for 600 nonconforming cans. Suppose that one of these 1500 cans is randomly selected.
    a. What is the probability that the can was produced by line 1? That the reason for nonconformance is a crack?
    b. If the selected can came from line 1, what is the probability that it had a blemish?
    c. Given that the selected can had a surface defect, what is the probability that it came from line 1?

77. One satellite is scheduled to be launched from Cape Canaveral in Florida, and another launching is scheduled for Vandenberg Air Force Base in California. Let A denote the event that the Vandenberg launch goes off on schedule, and let B represent the event that the Cape Canaveral launch goes off on schedule. If A and B are independent events with P(A) > P(B) and P(A or B) = .626, P(A and B) = .144, determine the values of P(A) and P(B).

78. A message is transmitted using a binary code of 0s and 1s. Each transmitted bit (0 or 1) must pass through three relays before reaching a receiver. At each relay, the probability is .20 that the bit sent is different from the bit received (a reversal). Assume that relays operate independently of one another.

    Transmitter → Relay 1 → Relay 2 → Relay 3 → Receiver

    a. If a 1 is sent from the transmitter, what is the probability that a 1 is sent by all three relays?
    b. If a 1 is sent from the transmitter, what is the probability that a 1 is received by the receiver? Hint: Use a tree diagram.
    c. Suppose that 70% of all bits sent from the transmitter are 1s. If a 1 is received by the receiver, what is the probability that a 1 was sent? Hint: Use a tree diagram.



6
Quality and Reliability
6.1 Terminology
6.2 How Control Charts Work
6.3 Control Charts for Mean and Variation
6.4 Process Capability Analysis
6.5 Control Charts for Attributes Data
6.6 Reliability

Introduction
Statistical methods for monitoring and improving the quality of manufactured goods
have been around since the early 1920s when Bell Laboratories engineer W. A.
Shewhart introduced the graphical control chart method for detecting possible
problems in manufacturing processes (Sections 6.2, 6.3, and 6.5). Current applica-
tions of statistical methods of quality assurance have widened to include service
industries as well as traditional manufacturing applications. Since the 1980s, there
has also been a greatly increased emphasis on the use of experimental design
techniques that seek to identify the key factors that lead to improvements in pro-
cesses and products. Experimental design methods, which were briefly described
in Section 4.3, are discussed in detail in Chapters 9 and 10.  Although the focus in
Chapter 6 is on the various control charts that have been developed to monitor
existing production systems, we also include a discussion of the important topic of
evaluating the reliability of finished products (Section 6.6).
The statistical tools underlying the methods of this chapter are fairly basic.
Calculations of tail areas of normal distributions are used in Section 6.4 to estimate the
capability of a production process to produce acceptable products. Control chart
methods in the remaining sections are based on knowing the sampling distribution
(Sections 5.5 and 5.6) of the various statistics used to describe the output of a

production process. Histograms not only provide convenient summaries of process
data but are also used to detect potential process problems (Section 6.1).

6.1 Terminology
Applying statistics to a specific field, such as quality control, requires some knowledge of
the jargon used in that field. The terminology introduced subsequently is used through-
out the remaining sections of this chapter. In some cases, familiar statistical terms (such
as discrete and continuous measurements) are given different names by quality practitio-
ners, making it necessary to know both names when working in this field.

Specification Limits
When product designs are translated into tangible entities, it becomes necessary to
precisely define the key characteristics of a product and each of its subcomponents. For
manufactured products, this is done by specifying the exact physical dimensions and
other quality characteristics that finished products should have. For services, specifica-
tions often take the form of rules for processing transactions or guidelines for interacting
with customers. In many cases, especially in manufacturing, a single value corresponds
to the most desired quality level for a given product characteristic. We refer to this value
as the nominal or target value of the quality characteristic.
Practically speaking, it is almost impossible to make each unit of product identical
to the next, so some flexibility is required in achieving target values. This is done by
choosing specification limits or tolerances that delineate the range of measured values
that we will accept as “close enough” to the target value, in the sense that products that
are within the specification range should be fit for their intended use.1 For example,
car doors are made with a certain nominal width, but specification limits are neces-
sary because doors cannot be too wide (or they may not close properly) or too narrow
(or they may fail to latch correctly). Quality characteristics that have both upper and
lower specification limits are said to have a two-sided tolerance. Those with only one
specification limit have a one-sided tolerance. Examples of characteristics with one-
sided tolerances include breaking strengths of materials, which have lower specifica-
tion limits, and the level of contaminants in a water supply, which have only upper
specification limits.
Nominal values and their associated specification limits are generally stated
together in an abbreviated form such as 1 in. ± .005 in., which describes a character-
istic with a nominal value of 1 in., a lower specification limit of .995 in., and an upper
specification limit of 1.005 in. Together, the nominal value and specification limits are
called the specifications or, more simply, the “specs.” When data do not exceed the
specification limits placed on them, we say that the particular process giving rise to the
data is “within specifications.” Otherwise, the process is said to “fail the specifications”
or to be “out of spec.”

¹ In the 1970s, quality was defined to be "fitness for use." Around 1983, the American Society for Quality
Control (ASQC) expanded the definition to "quality is the totality of features and characteristics of a product
or service that bear on its ability to satisfy given needs."
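
Translating the abbreviated spec notation into explicit limits is a one-line computation. A minimal Python sketch (the function name is ours, and the values come from the 1 in. ± .005 in. example above):

    def spec_limits(nominal, tol):
        # Return (LSL, USL) for a two-sided tolerance of the form nominal +/- tol
        return nominal - tol, nominal + tol

    lsl, usl = spec_limits(1.0, 0.005)
    print(lsl, usl)                     # 0.995 1.005
    print(lsl <= 1.003 <= usl)          # True: this measurement is within specs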


definitions The largest allowable value that a quality characteristic can have is called the
upper specification limit (USL); the smallest allowable value is called the lower
specification limit (LSL).

Conformance and Nonconformance


When a product or process fails to meet its specifications, there is a need to classify the
seriousness of the situation. Sometimes out-of-spec conditions lead to problems that are
very serious and prevent a product from ever being used. At other times, problems caused
by not meeting specifications may be only cosmetic. To distinguish between these two
extremes, quality practitioners have adopted the following classification scheme. Products
that do not meet their specifications are called nonconforming and the problems or flaws
in such nonconforming items are called nonconformities. A nonconforming product
is not necessarily unfit for any use. For example, the fact that a chemical concentration is
lower than its LSL does not necessarily mean that the chemical will fail to have a desired
effect; it may just require a longer reaction time than if the concentration exceeds the
LSL. In the garment industry, shirts that have minor nonconformities are still usable and
are sold as seconds in discount stores. However, when nonconformities become so serious
that a product is no longer fit for its intended use, we say that it is defective. A defective
product can contain one or more defects that cause it to be classified as defective.

definitions A product is nonconforming if it has one or more nonconformities that cause


it, or an associated product or service, not to meet a specification requirement.
A defective product is one that has one or more defects that cause it, or an associ-
ated product or service, not to satisfy intended usage requirements.

The Process Approach


In modern quality programs, each step in a product’s manufacture or each step in a
service procedure is viewed as a separate process to be performed. Every such process
has inputs (from the preceding process steps) and outputs (for use in the succeeding
steps). It is common practice to use a systems diagram to depict the various process steps
and their interconnections (Figure 6.1). Quality control efforts that are directed at key
processes or subprocesses, with an eye to solving problems and maintaining consistent
output, are called process control activities. When statistical methods are used for this
purpose, such activities are referred to as statistical process control (SPC). Manag-
ing these numerous applications of SPC and other quality improvement tools usually
requires well-designed implementation programs, the most popular of which are TQM
(Total Quality Management) and SIX-SIGMA. Detailed descriptions of these programs
can be found in many quality control textbooks.
Using statistical methods to control processes is accomplished by identifying key
product characteristics, measuring them, and then converting the data into sample statis-
tics. Both continuous and discrete measurements are used in quality control procedures.
Continuous data, those obtained from measuring instruments, is also called variables


Figure 6.1 System diagrams: (a) envisioning each step in creating a product
as a process, with inputs and outputs; (b) products and services broken into
a series of subprocesses

data in the quality professions. Discrete data, those that arise by counting things, is called
attributes data. These names are commonly used to describe the various statistical tech-
niques used in quality control. For example, control charts are classified as variables
control charts or attributes control charts, depending on the kind of data used to form
the charts (see Sections 6.2, 6.3, and 6.5).

Histograms
As shown in Figures 6.2 and 6.3, histograms are very effective tools for understanding
processes that generate variables data. Because many processes tend to produce vari-
ables data that follow normal distributions, normal curves are often superimposed over
such histograms. This technique is so commonly used that it is standard practice to
describe a process’s output by drawing a normal curve centered at the sample mean of
the data, sometimes without even including the histogram of the data. When specifica-
tion limits are included as well, we get a visual picture of how a process is behaving with
respect to its specifications (Figure 6.2). From this figure, it is easy to see how much of
the process data is nonconforming, that is, outside of the specification range.

Figure 6.2 Normal curve describing process measurements: the curve is centered
at the process average, with the LSL and USL marked on the measurement axis

Histograms are often used to give warnings of possible process problems. A smoothly
running process usually generates data whose histogram appears similar to that in Fig-
ure 6.2. Irregularities in a process are evidenced by histogram shapes that differ from
a normal curve. Figure 6.3 shows some of the typical histogram shapes that can occur
along with the most likely reasons for their appearance.
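
The nonconforming fraction suggested by Figure 6.2 can be computed from the fitted normal curve. A sketch with SciPy (the process values here are arbitrary illustrations, not data from the text):

    from scipy.stats import norm

    mu, sigma = 10.1, 0.05              # illustrative process mean and sd
    lsl, usl = 9.85, 10.15              # illustrative specification limits
    frac_nc = norm.cdf(lsl, mu, sigma) + norm.sf(usl, mu, sigma)
    print(frac_nc)                      # expected fraction outside specs, about .159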


Figure 6.3 Typical histograms of process data (each panel shows a histogram relative
to the LSL and USL) and the most likely explanation for each shape:
(a) Special (assignable) cause, measuring error, or data entry/recording error
(b) Process shifted to right, or measurements out of calibration
(c) Process shifted to left, or measurements out of calibration
(d) Two process streams mixed; data from two different times, operators, machines, etc.
(e) Process variation too large for specification limits
(f) Truncated data: nonconforming items not reported
(g) Granularity: too little data for the number of classes, inspectors rounding
measurements, or measuring instrument resolution not fine enough
(h) Process stable and capable of meeting specifications


Section 6.1 Exercises


1. General-purpose resistors are color-coded with a sequence of four rings that identify the nominal value of the resistance (in ohms) and the plus and minus tolerance (expressed as a percentage of the nominal) to be expected in the actual resistance. For example, bands (in order) of green, blue, brown, and gold denote a resistor with a nominal value of 560 ohms and a tolerance of ±5%. What are the specification limits (in ohms) for the process that produces such resistors?

2. Determining whether structural materials conform to specifications often requires special test equipment (which can be expensive) and test procedures (which require specialized training). Consequently, independent testing and evaluation labs have arisen to perform such tests. One of the measures of the quality of the services provided by such labs is the waiting time before test results are available. Does the characteristic waiting time have a one- or a two-sided tolerance?

3. A standard legal envelope is 4 inches wide by 9.5 inches long. Normally, 8.5-inch by 11-inch pages are folded in thirds before they are inserted into such envelopes. Viewing page folding as a process whose measurable output is the width of the folded page, answer the following questions:
    a. What specification limit does the envelope size place on the page-folding process?
    b. What are the penalties for exceeding the upper specification?

4. Citrus products must have a certain sugar content, measured in degrees Brix, to be judged satisfactory to sell to grocery stores. Suppose that a certain batch of oranges fails to meet the specified Brix level. Which classification would you apply to these oranges, defective or nonconforming?

5. Measurements are to be taken on each of the following characteristics. In each case, indicate whether the resulting measurements would be classified as variables or attributes data.
    a. The number of flaws per square foot in a large sheet of metal
    b. The concentration of a chemical solution used in an electroplating process
    c. The thread diameter of a bolt
    d. The number of bolts in a batch that have oversize thread diameters
    e. The proportion of bolts in a batch that have oversize thread diameters
    f. The torque applied to an airplane wing fastener (bolts and nuts used in aerospace are called fasteners)
    g. The number of errors in 1000 lines of computer code
    h. The time between breakdowns of a certain machine
    i. The breaking strength of a molded plastic part

6. The following are measurements (in inches) of a quality characteristic with specification limits of 2.50 ± .05 in.:

    2.54 2.52 2.50 2.52 2.50 2.50 2.47 2.48
    2.51 2.53 2.53 2.51 2.50 2.47 2.49 2.50
    2.50 2.50 2.46 2.48 2.48 2.50 2.51 2.53
    2.51 2.53 2.53 2.52 2.47 2.51

    a. Create a histogram of the data.
    b. Estimate the mean and standard deviation of the process from which this data was taken.
    c. What percentage of these measurements falls above the USL? What percentage of the measurements falls below the LSL?
    d. Assuming that the process from which the data was taken can be described by a normal density function, what percentage of the process data is expected to fall above the USL [use your estimates from part (b)]? What percentage of the process data is expected to fall below the LSL?
    e. Explain the reason for the difference in your answers to parts (c) and (d).

7. If the measuring instrument used in Exercise 6 is out of calibration and is giving readings that are, say, .02 in. higher than the true length of an object, what is the effect on the estimated proportions of conforming and nonconforming products?

8. A cork intended for use in a wine bottle is considered acceptable if its diameter is between 2.9 cm and 3.1 cm (so the lower specification limit is LSL = 2.9 cm, and the upper specification limit is USL = 3.1 cm).
    a. If cork diameter is a normally distributed variable with mean value 3.04 cm and standard deviation .02 cm, what is the probability that a randomly selected cork will conform to specification?
    b. If instead the mean value is 3.00 and the standard deviation is .05, is the probability of conforming to specification smaller or larger than it was in part (a)?

6.2 How Control Charts Work


The recognition that variation is unavoidable in every repetitive process was well understood
by the early pioneers of statistical quality control. To identify, and, when possible, eliminate
sources of process variation, W. A. Shewhart introduced the control chart method in 1924.
Shewhart envisioned two types of variation that, when combined, account for all the varia-
tion in a process. The first type, common cause variation, is the result of the myriad im-
perceptible changes, or common causes, that occur in the everyday operation of a process.
Common causes are essentially the noise in a production system and, as such, common
cause variation is considered to be expected, but uncontrollable variation. Controllable
variation, on the other hand, is variation for which we can find definite assignable causes,
also called special causes. Assignable causes are frequently found when there are changes in
brands of raw materials, turnover in the workforce, or machine wear or breakdown. Control
charts are designed as a method for detecting the existence of assignable causes.

Control Charts
Control charts are constructed by taking successive samples from the output of a pro-
cess, making measurements on the sampled items, and then plotting summary statistics
of these results. Figure 6.4 shows a typical control chart. The samples, also called sub-
groups, of size n are taken at regular intervals of time. For each subgroup, a summary
statistic is calculated and plotted (on the vertical axis) versus the subgroup number (on
the horizontal axis). Any statistic of interest can be calculated, but the most commonly
used are x̄ (subgroup mean), R (subgroup range), s (subgroup standard deviation), p
(proportion nonconforming), c (number of nonconformities), and u (nonconformities
per unit). A control chart derives its name from the name of the particular statistic

Figure 6.4 The Shewhart control chart: a subgroup statistic plotted against
subgroup number (1, 2, 3, 4, . . .), with a centerline, an upper control limit (UCL),
and a lower control limit (LCL); variation between the limits reflects common
causes, while points beyond either limit suggest assignable causes


calculated in the subgroups. For example, an x̄ chart (read "x bar chart") is one that
monitors successive subgroup means, an R chart monitors subgroup ranges, and so forth.
The control limits and centerline of a control chart are based on the sampling
distribution (see Sections 5.5 and 5.6) of the chart statistic. The smaller of the two con-
trol limits is called the lower control limit (LCL) and the larger one is called the upper
control limit (UCL). In the United States, control limits are set at a distance of 3 stan-
dard errors (i.e., 3 standard deviations of the subgroup statistic) from the mean of the
sampling distribution. This is based on the fact that many sampling distributions closely
approximate normal distributions, the majority of whose probability, about 99.73%, lies
within 3 standard deviations of the mean. For example, in an x̄ chart, the standard error
σ/√n of the sampling distribution of x̄ is used to establish the control limits, which in
theory would be set at μ ± 3σ/√n. In practice, of course, estimates of the process mean
μ and process standard deviation σ must be used in this formula. In England and other
countries, control limits are set by specifying the probability, typically around 99%, that
lies under the sampling distribution curve between the control limits.
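
The x̄ chart limits just described follow directly from μ ± 3σ/√n. A minimal sketch (the process estimates here are arbitrary illustrations):

    from math import sqrt

    mu_hat, sigma_hat, n = 50.0, 1.5, 5   # estimated process mean and sd; subgroup size
    se = sigma_hat / sqrt(n)
    lcl, ucl = mu_hat - 3 * se, mu_hat + 3 * se
    print(lcl, mu_hat, ucl)               # LCL, centerline, and UCL for the x-bar chart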
Plotted points that fall outside the control limits (i.e., above the UCL or below the LCL)
are interpreted as signals of possible special causes, whereas points within the control limits are
usually (but not always) associated with common cause variation, that is, the absence
of special causes. It is also important to remember that control limits are different from
specification limits, which are not plotted on a control chart.

Statistical Control
When all the points on a control chart lie between the control limits and when there are no
other anomalous patterns in the charted points, a process is said to be in a state of statistical
control or, more briefly, “in control.” Otherwise, the process is said to be “out of control.”
The phrase out of control, which can sometimes be misinterpreted, is only a way of indicat-
ing that control chart points are behaving in a nonrandom fashion. It does not imply that
the process itself is bad nor does it necessarily imply that any nonconforming products are
being made. “Out of control” simply means that assignable causes are likely to be present.
When control charts were first introduced, the primary signal of an “out-of-control”
condition was when one or more points were outside one of the control limits. If the
sampling distribution of the subgroup statistic is approximately normal, this means that
there is a probability of about .0027 (or .27%) that a control chart point will fall outside
one of the control limits when no assignable causes are present. That is, when a process is
running smoothly and no special causes are operating, there is a relatively small chance
(.27%) that a control chart point will give a false positive—mistakenly signaling the pres-
ence of a special cause. On the other hand, when special causes are present, there is also
a chance that the chart will fail to detect them. To increase the sensitivity of a chart for
detecting special causes, while still maintaining the false positive rate at .27%, an extended
set of “out-of-control” rules is often used. The “out-of-control” rules in Figure 6.5 are
commonly used by quality control software to help detect the presence of special causes.


Figure 6.5 Extended list of "out-of-control" rules for Shewhart charts. Each zone (A, B, C) has a width of 1 standard error, with Zone C adjacent to the centerline and Zone A adjacent to the control limits.

Test 1. One point beyond Zone A
Test 2. Nine points in a row in Zone C or beyond (on the same side of the centerline)
Test 3. Six points in a row steadily increasing or decreasing
Test 4. Fourteen points in a row alternating up and down
Test 5. Two out of three points in a row in Zone A or beyond
Test 6. Four out of five points in a row in Zone B or beyond
Test 7. Fifteen points in a row in Zone C (above and below centerline)
Test 8. Eight points in a row on both sides of centerline with none in Zone C
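
As an illustration of how software can apply such rules, here is a minimal sketch (ours, not taken from the text or from any particular package) that checks Test 1 and Test 2 against a sequence of charted points; the remaining tests follow the same pattern:

```python
# Sketch: automated checks for Tests 1 and 2 of Figure 6.5.
def test1(points, center, se):
    """Test 1: indexes of points beyond Zone A, i.e., more than
    3 standard errors from the centerline."""
    return [i for i, x in enumerate(points) if abs(x - center) > 3 * se]

def test2(points, center, run=9):
    """Test 2: indexes at which a run of `run` consecutive points on
    the same side of the centerline is completed."""
    signals, streak, side = [], 0, None
    for i, x in enumerate(points):
        s = 1 if x > center else -1          # side of the centerline
        streak = streak + 1 if s == side else 1
        side = s
        if streak >= run:
            signals.append(i)
    return signals
```

Applied to the groove depth data of Section 6.3, for example, test2 would stay silent: the longest same-side run there is eight points, one short of a Test 2 signal.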


Rational Subgroups

Selecting rational subgroups is key to the proper use of control charts. The name rational subgroup is intended to remind us that the subgroups are chosen in a thoughtful manner and are usually not random samples. Instead, rational subgroups should be
chosen in a way that maximizes the ability of the chart to detect special causes. The goal
is to have the variation within any rational subgroup represent the common cause varia-
tion in the process. In this way, any significant variation between subgroups can be
attributed to possible special causes. Randomness and sampling distributions enter the
picture when we make the assumption that there are no special causes at work, in which
case each rational subgroup can be considered to be a random sample from the process.
That is, if a process is in control, then successive items (and subgroups of such items)
should vary according to a system of random causes, which then permits us to use the
properties of a sampling distribution to form control limits.
One commonly used method for forming rational subgroups is to choose subgroup
elements over a fairly short span of time. The time span should be short enough so that it
is unlikely for the occurrence of a special cause to overlap two subgroups. For example,
if differences between raw materials are a potential source of process problems, then
subgroups should be formed such that all elements in each subgroup correspond to only
one type of raw material. Then, if a problem occurs when raw materials are changed,
the data in all subgroups occurring after the change of materials will differ from the data
in the subgroups taken before the change, and the control chart points calculated from
such subgroups will have a good chance of detecting the problem.
A general strategy for deciding how to form rational subgroups is (1) to decide which
causes are important to detect and which are not, then (2) to design subgroups that
maximize the chance of detecting the important causes and relegate the unimportant
causes to the within-subgroup variation. For instance, suppose that daily changes in
temperature are known to have a small, but inconsequential, effect on the lengths of
plastic parts, whereas impurities in batches of raw plastic pellets are known to have a
serious effect on part lengths. If each batch of pellets lasts, say, for 4 hours of produc-
tion, then subgroups of size 6 might be formed once an hour by selecting one part about
every 10 minutes after a new batch of pellets is opened. In this way, each subgroup of
6 would represent a specific batch, but several different temperatures would be repre-
sented over each 1-hour collection period.

Section 6.2 Exercises

9. Two identical machines are used to make a particular metal part. The finished parts from both machines are mixed together on a conveyor system that moves the parts to a subsequent assembly operation. Consider the following two methods for generating rational subgroups for a control chart of this process:
a. Method 1: Five parts per hour are sampled from the finished parts on the conveyor system each hour.
b. Method 2: Before reaching the conveyor system, a sample of five parts is taken from the output of machine 1; an hour later, five parts are taken from the output of machine 2; an hour later, five parts are sampled from machine 1; and so forth.
Which method of choosing rational subgroups would be better able to detect when one of the machines is not in statistical control?

10. When a process is in a state of statistical control, all of the points on a control chart should fall within the control limits. However, it is undesirable that all of the points should fall extremely near, or exactly on, the centerline of the control chart. Why?


11. U.S. companies commonly use 3-sigma limits to establish control limits. Some other countries (e.g., Great Britain) use control limits that are 3.09 sigmas from the chart's centerline.
a. Using the normal distribution, what is the probability that a single control chart point falls above the UCL in a 3-sigma control chart?
b. Using the normal distribution, what is the probability that a single control chart point falls above the UCL in a 3.09-sigma control chart?

12. Suppose that the measuring instrument used to obtain data from a certain process is out of calibration, so that each of its reported measurements is off by +1 units from the true value. What effect does this have on the signals given by the x̄ and R charts?

13. Using the extended list of "out-of-control" rules in Figure 6.5, determine whether the processes that give rise to the four control charts in the accompanying figure appear to be in statistical control. Circle any points at which an out-of-control condition is first signaled.

Figure for Exercise 13 (four control charts, each with its UCL and LCL marked)


6.3 Control Charts for Mean and Variation
In this section, we introduce the most commonly used Shewhart charts for monitoring
the mean and variation of a process. The important thing to remember about such
charts is that they are generally used in pairs, one chart to track the process average and
one for the process variation. Furthermore, the chart for process variation is created first
because its centerline is a key ingredient in calculating the control limits for the chart
that monitors the process average.
Shewhart originally used the sample mean x̄ and the sample range R as the subgroup statistics to use in control charts for variables data. These charts, called x̄ and R charts (read "x bar and R charts"), are still among the most frequently used variables control charts. They serve as the prototype for understanding how all other Shewhart charts are intended to operate.


Theoretically, the control limits for the x̄ chart are based on 3-sigma limits of the sampling distribution of the statistic x̄:

UCL = μ + 3σ/√n  and  LCL = μ − 3σ/√n

where μ and σ denote, respectively, the long-run process mean and standard deviation of the process. Of course, these formulas cannot be used directly since both μ and σ must first be estimated from the available process data. The process average is estimated by the average of k successive subgroup means:

x̿ = (1/k) Σ x̄i  (sum over i = 1, 2, . . . , k)

The estimate is denoted by x̿ (read "x double bar") because it is an average of several averages; x̿ is also called the grand mean of the subgroup means. To obtain a reasonable estimate of σ, the following two-stage procedure is used. First, the chart for process variation (the R chart) is brought into statistical control. This ensures that the process variation is stable and, therefore, that the centerline of the R chart is a reliable estimate of the average range of subgroups of size n from the process. Second, this centerline is converted into an estimate of the process standard deviation σ, which is then put into the expression x̿ ± 3σ̂/√n to obtain the approximate control limits for the x̄ chart. Fortunately, the control limits of the R chart also turn out to be simple functions of the centerline of the R chart.

The R Chart
To construct an R chart, we use the data from some number, k, of successive subgroups of process measurements. It is usually recommended that about 20 to 25 subgroups be used. If possible, the same sample size n is used to form each subgroup. The centerline of the R chart is denoted by R̄ and is calculated by averaging the sample ranges R1, R2, R3, . . . , Rk of the k subgroups:

R̄ = (1/k) Σ Ri  (sum over i = 1, 2, . . . , k)

R̄ serves as an estimate of μR, the mean of the sampling distribution of the ranges (for samples of size n) from the process. Let σR denote the standard deviation of this sampling distribution; the 3-sigma limits, μR ± 3σR, are used to form the control limits for the R chart. Assuming that the process measurements can be adequately described by a normal distribution, it can be shown that the control limits for the R chart are given by

UCL = D4R̄  and  LCL = D3R̄

where D3 and D4 are constants that depend on the subgroup size, n. Values of D3 and D4 are found in Appendix Table XI, which lists such constants for a variety of different types of control charts.
After finding the centerline and control limits, the R chart is constructed by simply plotting the k subgroup ranges Ri (i = 1, 2, . . . , k) versus the subgroup index, i, and then drawing horizontal lines to represent the centerline R̄ and control limits. Using the


“out-of-control” rules listed in Section 6.2, we examine the R chart to see whether these
k ranges seem to be in statistical control. If any out-of-control conditions are found, it is
recommended that the subgroup(s) associated with these problems be eliminated and
that the centerline and control limits be recalculated based on the reduced number of
subgroups. When doing this, subgroups should be eliminated only if definite assign-
able causes can be found for the out-of-control signal associated with these subgroups.
Out-of-control subgroups for which no assignable cause can be found should not be
eliminated.
When the R chart is deemed to be in a state of statistical control, the centerline R̄ can then be considered to be a reliable estimate of the average range (of samples of size n) from a normal population. This estimate can then be converted into an estimate for the process standard deviation by means of the formula

σ̂ = R̄/d2

where d2 is found in the table of control chart constants (Appendix Table XI). The estimate σ̂ of σ is used to calculate the control limits of the x̄ chart and to assess the capability of the process to meet the specification limits (see Section 6.4).
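
These R chart steps condense into a few lines of code. The sketch below is our own illustration; the default constants D3 = 0, D4 = 2.114, and d2 = 2.326 are the tabled values for subgroups of size n = 5 used later in this section, and other subgroup sizes would take their values from Appendix Table XI:

```python
# Sketch: R chart centerline, control limits, and process sd estimate.
def r_chart(ranges, d3=0.0, d4=2.114, d2=2.326):   # defaults assume n = 5
    r_bar = sum(ranges) / len(ranges)    # centerline R-bar
    lcl, ucl = d3 * r_bar, d4 * r_bar    # LCL = D3*R-bar, UCL = D4*R-bar
    sigma_hat = r_bar / d2               # estimated process sd, R-bar/d2
    return r_bar, lcl, ucl, sigma_hat
```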

The x̄ Chart

Once the R chart is in control, the x̄ chart is then constructed. Any subgroups that were eliminated during the construction of the R chart should automatically be eliminated from the x̄ chart calculations. Given that we have k valid subgroups of data, whose subgroup means are denoted by x̄1, x̄2, x̄3, . . . , x̄k, the centerline of the x̄ chart is just the average of the subgroup means,

x̿ = (1/k) Σ x̄i

as mentioned previously. The control limits are found by replacing μ and σ by the estimates x̿ and R̄/d2 in the control limit formulas:

UCL = μ + 3σ/√n ≈ x̿ + 3(R̄/d2)/√n  and  LCL = μ − 3σ/√n ≈ x̿ − 3(R̄/d2)/√n

Letting A2 = 3/(d2√n), we can write these estimated limits more simply as

UCL ≈ x̿ + A2R̄  and  LCL ≈ x̿ − A2R̄

where the constant A2 depends on the particular subgroup size, n, and is found in Appendix Table XI. These formulas show how the centerline R̄ of the R chart directly affects the control limits of the x̄ chart.
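
The corresponding x̄ chart step, again as a sketch of our own (the default A2 = .577 is the tabled value for n = 5 quoted in Example 6.1 below):

```python
# Sketch: x-bar chart centerline and control limits from subgroup means.
def xbar_chart(means, r_bar, a2=0.577):            # default assumes n = 5
    x_dbar = sum(means) / len(means)     # grand mean, the centerline
    return (x_dbar - a2 * r_bar,         # LCL = x-dbar - A2*R-bar
            x_dbar,                      # centerline
            x_dbar + a2 * r_bar)         # UCL = x-dbar + A2*R-bar
```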

Example 6.1 The process of making ignition keys for automobiles consists of trimming and pressing
raw key blanks, cutting grooves, cutting notches, and plating. Some of the dimensions,
such as the depth of grooves and notches, are critical to the proper functioning of the
keys. Table 6.1 contains measurements (in inches) of a particular groove depth on the

Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has
deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
6.3 Control Charts for Mean and Variation 259

side of each key. Due to the high volume of keys processed per hour, the sampling
frequency is chosen to be five keys every 20 minutes. For convenience, the subgroup
means and standard deviations are also given in Table 6.1, along with the grand mean
x̿ = .007966 and the average range R̄ = .002400. The relevant control chart constants for subgroups of size n = 5 are D4 = 2.114, D3 = 0, and A2 = .577 (Appendix Table XI). The initial estimates of the control limits for the R chart are

UCL = D4R̄ = (2.114)(.002400) = .005074
LCL = D3R̄ = (0)(.002400) = 0

The corresponding control chart is shown in Figure 6.6. Because there do not appear to be any out-of-control points in the chart, no subgroups need be dropped, and we can proceed immediately to the construction of the x̄ chart.

Table 6.1 Ignition key data for Example 6.1

Subgroup
number i        Groove depth (inches)          x̄i       Ri
  1    .0061  .0084  .0076  .0076  .0044     .00682   .0040
  2    .0088  .0083  .0076  .0074  .0059     .00760   .0029
  3    .0080  .0080  .0094  .0075  .0070     .00798   .0024
  4    .0067  .0076  .0064  .0071  .0088     .00732   .0024
  5    .0087  .0084  .0088  .0094  .0086     .00878   .0010
  6    .0071  .0052  .0072  .0088  .0052     .00670   .0036
  7    .0078  .0089  .0087  .0065  .0068     .00774   .0024
  8    .0087  .0094  .0086  .0073  .0071     .00822   .0023
  9    .0074  .0081  .0086  .0083  .0087     .00822   .0013
 10    .0081  .0065  .0075  .0089  .0097     .00814   .0032
 11    .0078  .0098  .0081  .0062  .0084     .00806   .0036
 12    .0089  .0090  .0079  .0087  .0090     .00870   .0011
 13    .0087  .0075  .0089  .0076  .0081     .00816   .0014
 14    .0084  .0083  .0072  .0100  .0069     .00816   .0031
 15    .0074  .0091  .0083  .0078  .0077     .00806   .0017
 16    .0069  .0093  .0064  .0060  .0064     .00700   .0033
 17    .0077  .0089  .0091  .0068  .0094     .00838   .0026
 18    .0089  .0081  .0073  .0091  .0079     .00826   .0018
 19    .0081  .0090  .0086  .0087  .0080     .00848   .0010
 20    .0074  .0084  .0092  .0074  .0103     .00854   .0029

                                   x̿ = .007966    R̄ = .002400

The control limits for the x̄ chart are

UCL = x̿ + A2R̄ = .007966 + (.577)(.002400) = .009351
LCL = x̿ − A2R̄ = .007966 − (.577)(.002400) = .006581


Figure 6.6 R chart for the data of Table 6.1: subgroup ranges plotted against sample number, with centerline R̄ = .002400, UCL = .005074, and LCL = 0

The x̄ chart is shown in Figure 6.7. None of the points is outside the control limits, although there is a run of eight consecutive points above the centerline (subgroups 8–15). According to the extended list of "out-of-control" rules in Section 6.2, this run of points is not quite long enough to signal an out-of-control condition.
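
All of the limits in this example can be reproduced from the two summary statistics of Table 6.1 (a quick numerical check of our own):

```python
# Check of Example 6.1's limits from the Table 6.1 summary statistics.
x_dbar, r_bar, a2, d4 = 0.007966, 0.002400, 0.577, 2.114
print(round(x_dbar + a2 * r_bar, 6))   # 0.009351 (x-bar chart UCL)
print(round(x_dbar - a2 * r_bar, 6))   # 0.006581 (x-bar chart LCL)
print(round(d4 * r_bar, 6))            # 0.005074 (R chart UCL)
```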

Figure 6.7 x̄ chart for the data of Table 6.1: subgroup means plotted against sample number, with centerline x̿ = .007966, UCL = .009351, and LCL = .006581

x̄ and s Charts

Various alternatives to x̄ and R charts have been proposed over the years. Because there are many different statistics available for measuring central tendency, along with several measures for variation, just about any combination of the two can be used to monitor a process average and variation. One combination that is frequently used is the x̄ and s chart. The procedure for constructing x̄ and s charts parallels that for x̄ and R charts: The variation chart (i.e., the s chart) is first brought into statistical control, then the x̄ chart is constructed using control limits formed from the centerline of the s chart.


Starting with k subgroups, each of size n, we denote the individual subgroup standard deviations by s1, s2, s3, . . . , sk. Their average, s̄, forms the centerline of the s chart:

s̄ = (1/k) Σ si  (sum over i = 1, 2, . . . , k)

s̄ is an estimate of μs, the mean of the sampling distribution of the sample standard deviation based on samples of size n. Following the usual 3-sigma procedure, control limits for the s chart can be shown to have the form

UCL = B4s̄  and  LCL = B3s̄

where B3 and B4 depend on the subgroup size, n, and are found in Appendix Table XI. In addition, to calculate the capability of a process, the standard deviation of the process measurements can be estimated by

σ̂ = s̄/c4

where c4 is yet another control chart constant found in Appendix Table XI. The same extended list of "out-of-control" rules used for x̄ and R charts can be applied to x̄ and s charts (see Figure 6.5).

For the x̄ chart, the grand average of the subgroup means forms the centerline of the chart, as follows:

x̿ = (1/k) Σ x̄i

Following the same procedure as with the x̄ and R charts, we form the control limits for the x̄ chart by substituting an estimate of σ into the theoretical 3-sigma limits. In this case, the estimate is s̄/c4, which is based on the s chart:

UCL = μ + 3σ/√n ≈ x̿ + 3(s̄/c4)/√n  and  LCL = μ − 3σ/√n ≈ x̿ − 3(s̄/c4)/√n

By letting A3 = 3/(c4√n), we can write these control limits in the simpler form

UCL = x̿ + A3s̄  and  LCL = x̿ − A3s̄
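
A parallel sketch for the x̄ and s combination (our own illustration; the defaults B3 = 0, B4 = 2.089, and A3 = 1.427 are the tabled constants for n = 5 quoted in Example 6.2):

```python
# Sketch: s chart and x-bar chart limits from subgroup statistics.
def xbar_s_charts(means, sds, b3=0.0, b4=2.089, a3=1.427):  # n = 5 defaults
    s_bar = sum(sds) / len(sds)              # s chart centerline
    x_dbar = sum(means) / len(means)         # x-bar chart centerline
    s_limits = (b3 * s_bar, b4 * s_bar)      # s chart (LCL, UCL)
    x_limits = (x_dbar - a3 * s_bar,         # x-bar chart (LCL, UCL)
                x_dbar + a3 * s_bar)
    return s_bar, s_limits, x_dbar, x_limits
```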

Example 6.2 In this example, we reanalyze the key groove data of Table 6.1, this time using x̄ and s charts. Using the average of the 20 subgroup standard deviations, s̄ = .0009672, along with the control chart constants B3 = 0 and B4 = 2.089 from Appendix Table XI (for subgroups of size n = 5), we calculate the control limits for the s chart to be

UCL = B4s̄ = (2.089)(.0009672) = .002020

and

LCL = B3s̄ = (0)(.0009672) = 0


The s chart, shown in Figure 6.8, does not exhibit any out-of-control conditions. With respect to the x̄ chart, the centerline is still calculated as the average of the subgroup averages:

x̿ = (1/k) Σ x̄i = .007966

as in Example 6.1. For subgroups of size n = 5, the factor A3 = 1.427 is found from Appendix Table XI. This gives control limits of

UCL = x̿ + A3s̄ = .007966 + (1.427)(.0009672) = .009346
LCL = x̿ − A3s̄ = .007966 − (1.427)(.0009672) = .006586

Note that these limits are very close to the limits obtained from the R chart (UCL = .009351 and LCL = .006581). Consequently, the x̄ chart is almost identical to that of Example 6.1, and, in particular, it gives no out-of-control signals (see Figure 6.9 below).
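
As a quick numerical check of our own on these results:

```python
# Check of Example 6.2's limits from the reported summary statistics.
s_bar, x_dbar, b4, a3 = 0.0009672, 0.007966, 2.089, 1.427
print(round(b4 * s_bar, 6))            # 0.00202  (s chart UCL, .002020 above)
print(round(x_dbar + a3 * s_bar, 6))   # 0.009346 (x-bar chart UCL)
print(round(x_dbar - a3 * s_bar, 6))   # 0.006586 (x-bar chart LCL)
```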
Figure 6.8 s chart for groove depth data: subgroup standard deviations plotted against sample number, with centerline s̄ = .000967, UCL = .002020, and LCL = 0

Figure 6.9 x̄ chart for groove depth data: subgroup means plotted against sample number, with centerline x̿ = .007966, UCL = .009346, and LCL = .006586


The choice between running a combination of x̄ and s charts or one of x̄ and R charts is largely a matter of personal preference. Although some authors recommend s charts in lieu of R charts because the sample standard deviation s makes more efficient use of the data than does R, the difference in efficiency is actually very small for the small subgroup sizes used in control charts, so x̄ and R charts will almost always lead to the same conclusions as x̄ and s charts. If control charts are done by hand, then the R chart method is definitely preferable because of the ease in calculating the ranges of small samples. If a computer is available, then computational difficulty is immaterial, and one might as well use the x̄ and s chart approach.

Section 6.3 Exercises

14. The control limits on x̄ charts become closer together as the subgroup size n is increased (i.e., the A2 factor decreases as n increases). For a process that is in statistical control, does this imply that a control chart point is more likely to fall outside the control limits of an x̄ chart based on a larger subgroup size rather than a smaller subgroup size?

15. Subgroups of four power units are selected once each hour from an assembly line, and the high-voltage output of each unit is measured. Suppose that the sum of the ranges of 30 such subgroups is 85.2. Calculate the centerline and control limits of an R chart for this data.

16. Hourly samples of size 3 are taken from a process that produces molded plastic containers, and a critical dimension is measured. Data from the most recent 20 samples is given here:

Hour  x1   x2   x3     Hour  x1   x2   x3
  1   .36  .39  .36     11   .36  .32  .36
  2   .33  .35  .30     12   .38  .47  .35
  3   .51  .41  .42     13   .29  .45  .39
  4   .42  .37  .34     14   .44  .38  .43
  5   .39  .38  .38     15   .38  .37  .37
  6   .33  .41  .45     16   .31  .43  .38
  7   .43  .39  .41     17   .39  .49  .35
  8   .41  .32  .32     18   .43  .36  .38
  9   .37  .42  .36     19   .40  .45  .32
 10   .26  .42  .32     20   .40  .40  .32

a. Construct an R chart for this data. Are any out-of-control signals indicated by this chart?
b. Construct an x̄ chart for this data, and check for signs of special causes.

17. Refer to the data of Exercise 16.
a. Construct an s chart for this data, and check for special causes.
b. Construct an x̄ chart for this data. Why are the control limits of this chart different from those in Exercise 16(b)?

18. When installing a bath faucet, it is important to properly fasten the threaded end of the faucet stem to the water-supply line. The threaded stem dimensions must meet product specifications, otherwise malfunction and leakage may occur. Authors of "Improving the Process Capability of a Boring Operation by the Application of Statistical Techniques" (Intl. J. Sci. Engr. Research, Vol. 3, Issue 5, May 2012) investigated the production process of a particular bath faucet manufactured in India. The article reported the threaded stem diameter (target value being 13 mm) of each faucet in 25 samples of size 4 as shown here:

Subgroup   x1     x2     x3     x4
  1    13.02  12.95  12.92  12.99
  2    13.02  13.10  12.96  12.96
  3    13.04  13.08  13.05  13.10
  4    13.04  12.96  12.96  12.97
  5    12.96  12.97  12.90  13.05
  6    12.90  12.88  13.00  13.05
  7    12.97  12.96  12.96  12.99
  8    13.04  13.02  13.05  12.97
  9    13.05  13.10  12.98  12.96
 10    12.96  13.00  12.96  12.99
 11    12.90  13.05  12.98  12.88
 12    12.96  12.98  12.97  13.02


Subgroup   x1     x2     x3     x4
 13    13.00  12.96  12.99  12.90
 14    12.88  12.94  13.05  13.00
 15    12.96  12.96  13.04  12.98
 16    12.99  12.94  13.00  13.05
 17    13.05  13.02  12.88  12.96
 18    13.08  13.06  13.10  13.05
 19    13.02  13.05  13.04  12.97
 20    12.96  12.90  12.97  13.05
 21    12.98  12.99  12.96  13.00
 22    12.97  13.02  12.96  12.99
 23    13.04  13.00  12.98  13.10
 24    13.02  12.90  13.05  12.97
 25    12.93  12.88  12.91  12.90

a. Construct an R chart for this data. Are there any out-of-control signals present?
b. Construct an x̄ chart for this data. Are there any out-of-control signals present?
c. If there are any out-of-control conditions found in parts (a) or (b), recalculate and interpret the revised x̄ and R charts after eliminating these subgroups. (Doing this assumes that assignable causes for out-of-control subgroups can be found prior to their elimination.)

19. The following table gives sample means and standard deviations, each based on subgroups of six observations of the refractive index of fiber-optic cable:

Day   x̄      s     Day   x̄      s
 1   95.47  1.30    13   97.02  1.28
 2   97.38   .88    14   95.55  1.14
 3   96.85  1.43    15   96.29  1.37
 4   96.64  1.59    16   96.80  1.40
 5   96.87  1.52    17   96.01  1.58
 6   95.52  1.27    18   95.39   .98
 7   96.08  1.16    19   96.58  1.21
 8   96.48   .79    20   96.43   .75
 9   96.63  1.48    21   97.06  1.34
10   96.50   .80    22   98.34  1.60
11   97.22  1.42    23   96.42  1.22
12   96.55  1.65    24   95.99  1.18

a. Construct an s chart for this data.
b. Construct an x̄ chart for this data.

20. In Exercise 19, suppose that an assignable cause was found for the unusually high average refractive index in subgroup 22.
a. Recompute the control limits for both the x̄ and s charts after removing the data from day 22.
b. Do the charts in part (a) indicate that there are any other out-of-control signals present?

21. Because processes are designed to produce products with fixed nominal dimensions, it is quite common to find that most of the variation in sample data occurs in the rightmost one or two decimal places. For example, the following data comes from a process making parts whose nominal length is required to be .254 inch:

Subgroup  x1    x2    x3     Subgroup  x1    x2    x3
  1   .258  .254  .256        11   .253  .257  .254
  2   .253  .251  .253        12   .252  .253  .258
  3   .252  .258  .256        13   .258  .253  .257
  4   .252  .252  .255        14   .251  .257  .256
  5   .254  .252  .256        15   .256  .254  .257
  6   .253  .254  .256        16   .251  .255  .253
  7   .251  .257  .257        17   .252  .256  .255
  8   .252  .251  .255        18   .251  .256  .253
  9   .251  .255  .257        19   .253  .252  .254
 10   .257  .255  .255        20   .255  .252  .253

To simplify control chart calculations for such data, practitioners often code the data by transforming the measurements into deviations from the nominal value and then multiplying by a suitable power of 10 to eliminate decimal points. In the foregoing data, for example, a reading of .258 would be converted to .258 − .254 (the deviation from the nominal value), and then multiplied by 1000. Thus .258 transforms into 4, .254 transforms into 0, .256 becomes 2, .251 becomes −3, and so forth.
a. Use the formulas for the control limits of x̄ and R charts to explain why the signals given by charting the deviations from nominal values will always be identical to the signals given by charting the untransformed process data.
b. Transform the data in this problem as described, and then create x̄ and R charts of the transformed data. Use the extended list of out-of-control conditions (Figure 6.5) to evaluate these charts.


c. For comparison, create the x̄ and R charts of the untransformed data, and evaluate these charts as in part (b).

22. Three-dimensional (3D) printing is a manufacturing technology that allows the production of three-dimensional solid objects through a meticulous layering process performed by a 3D printer. 3D printing has rapidly become a time-saving and economical way to create a wide variety of products such as medical implants, furniture, tools, and even jewelry. The article "Improving the Process Capability of a Boring Operation by the Application of Statistical Techniques" (MIT Intl. J. Mech. Engr., 2012: 31–38) considered the production process of metal castings by using a 3D printer manufactured by ZCorporation. Data was collected on 16 batches (each having two castings), where the outer diameter of each casting (in mm) was recorded. The target diameter of each casting was 60 mm. The corresponding data is given here:

Batch    x1      x2
  1   59.664  59.675
  2   59.661  59.648
  3   59.679  59.652
  4   59.665  59.654
  5   59.667  59.678
  6   59.673  59.657
  7   59.676  59.661
  8   59.648  59.651
  9   59.681  59.675
 10   59.655  59.672
 11   59.691  59.676
 12   59.682  59.651
 13   59.651  59.682
 14   59.668  59.685
 15   59.691  59.682
 16   59.661  59.673

a. Construct an R chart for this data. Are there any out-of-control signals present?
b. Construct an x̄ chart for this data. Are there any out-of-control signals present?

23. Reconsider the data from Exercise 16.
a. Estimate the process standard deviation.
b. Suppose the specification limits on the process are .40 ± .08. Assuming that a normal distribution can be used to describe the process measurements, estimate the proportion of the process measurements above the USL and below the LSL.

24. Reconsider the results from Exercise 17(a).
a. Estimate the process standard deviation.
b. If the specification limits for the process are .40 ± .08, estimate the proportions of the process measurements above the USL and below the LSL. Compare your results to those in Exercise 23(b).

6.4 Process Capability Analysis


After all special causes have been identified and eliminated, a process is said to be in a
state of statistical control. One of the desirable features of a controlled process is that it is
predictable, in the sense that the process average and standard deviation are reasonably
stable over time. This makes it possible to get a clear picture of how the process output
compares to the requirements, or specifications, that are placed on the process. Without
statistical control, it is difficult, if not impossible, to reliably evaluate the capability of a
process to perform as required.
Process capability is evaluated by comparing process performance with process re-
quirements. Since meeting specification limits is one of the most basic requirements, capa-
bility analyses usually involve specification limits somewhere in their calculations. Process
data, usually from a control chart, is used to describe how a process is actually performing.
Data from the chart’s subgroups is used to estimate the process average and standard devia-
tion. These, in turn, are transformed into estimates of the proportions of measurements that
fall inside or outside of the specification limits. This last step requires that an assumption be


made about the type of probability distribution that the process is thought to follow. Since
many process characteristics tend to follow normal distributions, the majority of capability
calculations are based on this distribution. In recent years, capability indexes have also been
developed for nonnormal process data. We do not discuss the calculations required for non-
normal data, which are much more laborious than the relatively simple computations for
normal processes, but we do provide references on this material for the interested reader.

Estimating the Process Mean and Variation


The best source of data for estimating process variation usually comes from the control
chart used to bring the process into statistical control. In particular, variation estimates
are derived from charts that monitor process variation, such as the R and s charts. As
we saw in Section 6.3, depending on which variation chart is used, the process standard
deviation  is estimated by one of two formulas,
R s
n5
 or n5

d2 c4
Both of these formulas are based only on the within-subgroups variation present in the
data. It is also good to keep in mind that both formulas are based on the assumption
that the process data follows a normal distribution. If you have reason to believe that
the process is not normally distributed, then these estimates would not be appropriate.
Another method of estimating the process standard deviation is to pool the subgroup data used to make the control chart and calculate the sample standard deviation, s, of the entire set of data. For instance, rather than computing R̄/d2 or s̄/c4 from, say, 20 subgroups of size 5, you could calculate s for the combined group of 100 measurements. The reason that this is permissible is that no assignable causes should be present in the control chart data for a process that is in control and, consequently, there should be no significant difference between the subgroup-to-subgroup variation and the within-subgroup variation (as estimated by R̄/d2 or s̄/c4), which makes subgroup pooling an acceptable procedure. If a process is not in control, however, s will usually be much larger than either R̄/d2 or s̄/c4. When evaluating process capability indexes, it is often useful to know which of these methods is being used to estimate the process variation. In the ensuing discussion, we denote the estimated process average and standard deviation by μ̂ and σ̂, regardless of the method of estimation used.
Capability studies generally use the grand mean x̿ of the subgroup data to estimate the process average, that is, μ̂ = x̿. This works well for data whose distributions are fairly symmetric, such as the normal distribution. Under the assumption that a process follows a normal distribution, the 3-sigma region on either side of the process average is often called the process spread:

process spread = μ̂ ± 3σ̂

This name arose from the fact that, for normal distributions, most (about 99.73%) of the process observations lie within 3 sigmas of the mean.
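
The competing variation estimates are easy to compare directly. The sketch below is our own (standard library only; the default d2 = 2.326 assumes subgroups of size 5) and contrasts the within-subgroup estimate R̄/d2 with the pooled sample standard deviation of the combined data:

```python
# Sketch: two estimates of the process standard deviation.
from statistics import stdev

def sigma_estimates(subgroups, d2=2.326):          # d2 assumes n = 5
    ranges = [max(g) - min(g) for g in subgroups]
    within = (sum(ranges) / len(ranges)) / d2      # R-bar/d2 (within-subgroup)
    pooled = stdev([x for g in subgroups for x in g])  # s of combined data
    return within, pooled   # roughly equal when the process is in control
```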

Nonconformance Rates
The proportions of the process measurements that fall above the upper specification
limit or below the lower limit are called nonconformance rates or nonconformance


proportions. Assuming that the process measurements follow a normal distribution, we estimate these rates as follows:

proportion above USL = P(x > USL) ≈ P(z > (USL − μ̂)/σ̂)

proportion below LSL = P(x < LSL) ≈ P(z < (LSL − μ̂)/σ̂)

In these definitions, x is a normal random variable whose distribution describes the process data, and z denotes a standard normal variable. The shaded regions in Figure 6.10 show the nonconformance proportions for normally distributed process data.

Figure 6.10 Nonconformance proportions for normally distributed process data: the shaded tail areas P(x < LSL) and P(x > USL) under the normal curve
Nonconformance rates are usually expressed in terms of either percentages (%)
or parts per million (ppm). One nonconforming (or defective) item in a collection of a
million items is called one part per million, abbreviated as 1 ppm. Thus a nonconfor-
mance rate of .25% could also be expressed as 2500 ppm. For convenience, Table 6.2
shows the equivalent percentage and ppm rates for a range of values that can occur in
practice. To establish small ppm rates, the standard normal table (Appendix Table I),
which is traditionally limited to values of z between −3 and +3 or so, must be extended
to accommodate the large z values required for calculations in ppm.

Table 6.2 Converting from percentage nonconforming to parts per million

Percentage (%)      Parts per million (ppm)
10.0                100,000
 5.0                 50,000
 1.0                 10,000
  .1                  1,000
  .01                   100
  .001                   10
  .0001                   1

Example 6.3 Because the control charts of the ignition key process in Example 6.1 do not indicate any out-of-control conditions, the process appears to be in statistical control. Suppose that the specification limits for the groove depth of the keys are .0072 ± .0020 inch.


Assuming that the process data is normally distributed, we can estimate the process standard deviation using the centerline of the R chart,

σ̂ = R̄/d2 = .002400/2.326 = .00103

Alternatively, the variation can be estimated by s̄/c4 from the centerline of the s chart. The nonconformance rates can then be estimated by

P(x > USL) ≈ P(z > (USL − μ̂)/σ̂) = P(z > (.0092 − .007966)/.00103) = P(z > 1.20) = .1151

P(x < LSL) ≈ P(z < (LSL − μ̂)/σ̂) = P(z < (.0052 − .007966)/.00103) = P(z < −2.69) = .0036

In percentage terms, we estimate that about 11.51% of the output of this process exceeds the upper specification limit, whereas only .36% is below the lower limit. This gives rise to a total percentage of 11.51% + .36% = 11.87%, which is unacceptably high. Thus statistical control alone does not necessarily guarantee that a process will successfully meet its specification limits.
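
These rates can be computed without a printed z table (a check of our own using the standard library; the small differences from .1151 and .0036 come from the text's rounding of z to two decimal places):

```python
# Check of Example 6.3's nonconformance rates.
from statistics import NormalDist

mu_hat, sigma_hat = 0.007966, 0.00103
usl, lsl = 0.0092, 0.0052
z = NormalDist()
above = 1 - z.cdf((usl - mu_hat) / sigma_hat)   # about .115  (z = 1.20)
below = z.cdf((lsl - mu_hat) / sigma_hat)       # about .0036 (z = -2.69)
print(f"above USL: {above:.4f}, below LSL: {below:.4f}")
```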

Capability Indexes
Process spread, as defined by the interval μ̂ ± 3σ̂, gives a measure of how a process is currently performing. The width of this interval is 6σ̂. Alternatively, the distance between the specification limits, USL − LSL, provides a measure of the maximum process spread we are willing to tolerate. By comparing the two measures, it is possible to give a very succinct summary of the capability of a process to meet its specification limits. We refer to the process spread as the actual process spread and to USL − LSL as the allowable process spread. The process capability index, denoted by Cp, is defined by the ratio

Cp = allowable spread/actual spread = (USL − LSL)/(6σ̂)

where σ̂ is an estimate of the process standard deviation.

The Cp index is interpreted as follows. If Cp = 1, then the process is said to be marginally capable of meeting its specification limits. This occurs when the process is exactly centered midway between its specification limits (i.e., when μ̂ = (USL + LSL)/2) and the actual process spread uses all of the allowable spread. As you can see from Figure 6.11, this is a fairly tenuous situation since even the slightest movement of the process mean will lead to an increase in the overall nonconformance rate of the process. Normally, we would like the Cp to exceed 1, since then there is a higher likelihood that the process measurements will be able to stay within the specification limits, even if the mean wanders a little. A Cp that exceeds 1.33 (i.e., an 8σ̂ spread that fits within the specification limits) is usually considered fairly good and is commonly used as a goal by many companies. On the other hand, Cp values that are less than 1 imply that a process is not capable of meeting the specification limits.

Figure 6.11 Interpretation of the process capability index: with Cp < 1.0 the process is not capable; with Cp = 1.0 it is marginally capable; with Cp > 1.0 it is capable.
The Cp is one of four commonly used indexes, originally invented in Japan, which
are routinely used in modern quality improvement programs. The indexes derive their
usefulness from the fact that they convey much information in a very simple fashion.
Capability indexes also have the advantage of being unitless measures, making them
useful for comparing related and unrelated processes alike. For example, if the copper
plating thickness (in inches) from a chemical plating process has a Cp of .81, whereas
the resistance (in ohms) of certain electronic components has a Cp of 2.30, then we can
conclude that the electronic process is the more capable of the two, even though their
measurement units, inches and ohms, are unrelated.


One drawback of the Cp is that it does not take the process location (i.e., the mean)
into account. For this reason, it is often said that Cp measures only the potential for a
process to meet its specifications. For example, Figure 6.12 shows two process distribu-
tions, both with Cp values of 2.0, one centered between the specification limits and the
other located near the upper limit. Although the latter process currently has a very high
nonconformance rate, it still has the potential to be capable because its Cp exceeds 1.0.
However, it will realize this potential only if the process average can be moved closer to
the center of the specification range.
An index that does take the process mean into account is the Cpk index:

Cpk = minimum[(USL − μ̂)/(3σ̂), (μ̂ − LSL)/(3σ̂)]


Figure 6.12 Cp is not affected by the process average: two process distributions, each with Cp = 2.0, one centered between LSL and USL and one located near the USL

For normally distributed data, μ̂ is taken to be the centerline of the x̄ chart and σ̂ is chosen to be R̄/d2, s̄/c4, or perhaps the combined-subgroup estimate s mentioned previously. The k in the subscript of Cpk refers to the so-called k factor:

k = |(USL + LSL)/2 − μ̂| / [(USL − LSL)/2]

which measures the extent to which the process location μ̂ differs from the midpoint of the specification region. It can be shown that k lies between 0 and 1 and that Cp and Cpk are related by the formula

Cpk = (1 − k)Cp

Since 0 ≤ k ≤ 1, this formula shows that Cpk never exceeds Cp and that Cpk = Cp precisely when the process is centered midway between its specification limits. When used together, Cp and Cpk give a clear picture of process performance as well as process potential.

Example 6.4 Nonconformance rates for the groove dimension data (Table 6.1) are calculated in Example 6.3, where we concluded that the process had poor capability. The reason for the poor capability can be found by comparing the Cp and Cpk indexes. Using the estimates

μ̂ = .007966  and  σ̂ = .00103

from Example 6.3 along with the specification limits USL = .0092 and LSL = .0052, we calculate the k factor as follows:

k = |(USL + LSL)/2 − μ̂| / [(USL − LSL)/2] = |(.0092 + .0052)/2 − .007966| / [(.0092 − .0052)/2] = .383


The Cp and Cpk indexes are

Cp = (.0092 − .0052)/[6(.00103)] = .647

Cpk = (1 − k)Cp = (1 − .383)(.647) = .399

Neither index exceeds 1.0. The value of Cpk = .399 is a measure of how the process is currently performing with respect to meeting the specifications. The fact that Cp and Cpk are unequal is evidence that the process location has shifted away from the center of the specification region. However, even if the process could be adjusted so that it is centered within the specification region, a Cp of .647, which is less than 1.0, indicates that the process will still not have good capability. Clearly, attention must be focused on reducing the variation of the groove cutting process as well as bringing the process average closer to the midpoint of the specifications.
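
The calculations of this example condense into a few lines (our own sketch, reproducing the values above):

```python
# Sketch: Cp, the k factor, and Cpk, checked against Example 6.4.
def capability(mu_hat, sigma_hat, lsl, usl):
    cp = (usl - lsl) / (6 * sigma_hat)                      # process potential
    k = abs((usl + lsl) / 2 - mu_hat) / ((usl - lsl) / 2)   # off-center factor
    return cp, k, (1 - k) * cp                              # Cpk = (1 - k)Cp

cp, k, cpk = capability(0.007966, 0.00103, 0.0052, 0.0092)
print(round(cp, 3), round(k, 3), round(cpk, 3))   # 0.647 0.383 0.399
```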

The Cp and Cpk indexes are used for quality characteristics that have two-sided
tolerances, that is, processes with both upper and lower specification limits. Some char-
acteristics, however, can have one-sided tolerances. The breaking strength of a material,
for instance, usually has a lower specification limit, but no upper specification, since
we normally want materials to have a certain minimum strength but we do not care by
how much they exceed that minimum. One-sided capability indexes are used for such
processes. In fact, the definitions of upper and lower capability indexes are contained
within the definition of the Cpk. For processes having only a lower specification limit,
LSL, the lower capability index Cpl is defined by

Cpl = (μ̂ − LSL)/(3σ̂)

Similarly, for processes having only an upper specification USL, the upper capability index Cpu is given by the formula

Cpu = (USL − μ̂)/(3σ̂)

The reason 3σ̂ rather than 6σ̂ appears in the denominators is that one-sided capability indexes compare only one side of the process distribution, the upper or lower, to the corresponding upper or lower specification limit.

Even when a process has both upper and lower specification limits, calculating Cpu and Cpl is worthwhile because the smaller of the two statistics indicates the direction in which the process average has shifted away from the nominal value. In fact, from the formulas it is apparent that Cpk is equal to the smaller of Cpl and Cpu:

Cpk = minimum[Cpl, Cpu]

For data that is normally distributed, it is convenient to transform Cpu and Cpl into their corresponding nonconformance rates using the following relationships:

P(x > USL) ≈ P(z > (USL − μ̂)/σ̂) = P(z > 3Cpu)

P(x < LSL) ≈ P(z < (LSL − μ̂)/σ̂) = P(z < −3Cpl)



Example 6.5 Using the results of Examples 6.3 and 6.4, we calculate the Cpu and Cpl indexes for the groove depth data of Table 6.1 as follows:

Cpl = (μ̂ − LSL)/(3σ̂) = (.007966 − .0052)/[3(.00103)] = .895

Cpu = (USL − μ̂)/(3σ̂) = (.0092 − .007966)/[3(.00103)] = .399

This information could be used, if desired, to calculate Cpk:

Cpk = minimum[Cpu, Cpl] = minimum[.399, .895] = .399

Because both Cpu and Cpl are less than 1.0, we can conclude that the process is not performing well with respect to meeting either of its specification limits. Furthermore, the fact that Cpu is the smaller of the two indexes means that the process average has shifted to the right of the midpoint of the specification region.
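
The one-sided indexes and their implied nonconformance rates can be verified the same way (our sketch, using the values of this example):

```python
# Sketch: one-sided capability indexes and implied nonconformance rates.
from statistics import NormalDist

mu_hat, sigma_hat, lsl, usl = 0.007966, 0.00103, 0.0052, 0.0092
cpl = (mu_hat - lsl) / (3 * sigma_hat)     # 0.895
cpu = (usl - mu_hat) / (3 * sigma_hat)     # 0.399
cpk = min(cpl, cpu)                        # 0.399, the smaller of the two
z = NormalDist()
ppm_above = (1 - z.cdf(3 * cpu)) * 1e6     # P(z > 3*Cpu): about 115,000 ppm
ppm_below = z.cdf(-3 * cpl) * 1e6          # P(z < -3*Cpl): about 3,600 ppm
```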

Another way of describing capability is to estimate what proportion of the allowed process spread, USL − LSL, is "used up" by the actual process spread 6σ̂. This proportion, 6σ̂/(USL − LSL), is called the capability ratio. As you can see, the capability ratio is simply the reciprocal of the Cp index,

capability ratio = 1/Cp

When expressed as a percentage, the capability ratio is referred to as the percentage of specification used by the process. Notice that the Cpk is not used in this definition. The reason is that one usually wants to know how much of the specification range is used under ideal circumstances (i.e., when the process is centered). For instance, in Example 6.4, the Cp index of the groove depth data was shown to be .647. This means that the capability ratio is 1/.647 = 1.55, or in percentage terms, 155%. In other words, the process spread 6σ̂ uses about 155% of the allowed tolerance, which is not good. Capable processes should use up less than 100% of the allowed tolerance.

Using Capability Indexes


The interpretation of capability indexes can be complicated by the presence of measure-
ment errors and assumptions about the process distribution. It should be stressed that the
interpretations given in this section are made under the assumptions that (1) a process is
in statistical control, (2) measurement errors are negligible, and (3) the process follows
a normal distribution. Keep in mind that capability indexes are statistics that arise from
taking sample data from a process. As such, capability indexes will exhibit some degree
of sampling variability. That is, you should expect these indexes to vary a little with each
new set of data. If desired, the amount of sampling variability can be estimated (see,
for example, Kane, V. E., "Process Capability Indexes," J. Quality Technology,
1986: 41–52). In addition, if you suspect that the process data does not follow a normal
distribution, then you will want to use indexes designed to handle nonnormal process
data. The references at the end of the chapter give methods for handling such situations.


Section 6.4 Exercises

25. Why must a process be in a state of statistical control before its capability can be measured?

26. A process has a Cp index of 1.2 and is centered on its nominal value. What proportion of the specification range is used by the process measurements?

27. A computer printout shows that a certain process has a Cp of 1.6 and a Cpk of .9. Assuming that the process is in control, what do these indexes say about the capability of this process?

28. A process with specification limits of 5 ± .01 has a Cp of 1.2 and a Cpk of 1.0. What is the estimated process average x̿ from which these indexes are calculated?

29. It can be shown that the following equation always holds for processes that can be described by a normal distribution (Farnum, N. R., Modern Statistical Quality Control and Improvement, Duxbury, Belmont, CA, 1994: 235):

proportion out of specification = P(z ≥ 3Cpk) + P(z ≥ 6Cp − 3Cpk)

Use this equation with the Cp and Cpk from Exercise 27 to estimate the proportion of the process that is not within the specification limits.

30. Use the formula given in Exercise 29 to calculate the proportion of the process that is out of specification in Exercise 28.

31. Use the data in Exercise 6 to calculate the Cp, Cpu, Cpl, and Cpk indexes. What do the indexes indicate about the capability of the process?

32. Using the data of Exercise 16, we estimated the process standard deviation in Exercise 23. If the specification limits for the process are .40 ± .08, calculate the Cp and Cpk indexes. What conclusions can you draw about the capability of the process?

33. The data of Exercise 21 was analyzed by first transforming it into deviations from the nominal value and then running x̄ and R charts on the transformed data. Suppose the specification limits on the process are .254 ± .01 inch.
a. Describe a procedure for calculating capability indexes from the transformed data.
b. Calculate the Cp, Cpu, Cpl, and Cpk indexes from the transformed data.

34. Based on your analysis in Exercise 18, if the specification limits for the process are 13 ± .2, calculate the Cp and Cpk indexes. What conclusions can you draw about the capability of the process?

35. Using the data of Exercise 22, if the specification limits for the process are 60 ± .4, calculate the Cp and Cpk indexes. What conclusions can you draw about the capability of the process?

6.5 Control Charts for  Attributes Data


In quality control, counted data is called attributes data. Attributes data arises when we
check products to see whether they possess a specified characteristic or attribute. Those
that have the attribute in question are said to be conforming and those without it are called
nonconforming. Attributes, or product characteristics, can be either precisely defined or
fairly subjective in nature. For example, the testing of electronic devices usually results in
a clear decision as to which devices are nonconforming (those that fail to function cor-
rectly) and which are conforming. Alternatively, when an injection-molded dashboard
of a car is inspected for small pinholes and other blemishes, deciding which dashboards
are nonconforming is a more subjective matter. Attributes measurements are frequently
used in situations where variable measurements are not practical and human judgment
is needed. In addition to classifying entire products as conforming or nonconforming, it is
also possible to count the number of flaws or nonconformities on a single unit of product.
Control charts for monitoring the proportion of nonconforming items in subgroups
are called p charts. Monitoring the number of nonconforming items is accomplished


with the np chart. For products that are created in distinct units, such as components or
appliances, the c chart is used to track the number of nonconformities in subgroups of
such items. When products are not made in distinct units, such as reels of wire, fabric,
or paper, then the u chart is used to monitor the number of nonconformities in speci-
fied “units” of such products.

Interpreting the Control Limits


The plotted points on control charts for nonconforming items (p and np charts) or on
charts for nonconformities (c and u charts) gauge the numbers or proportions of prob-
lems found in each subgroup. Points that are above the chart’s UCL indicate abnormally
high levels of problems. Assignable causes for such problems should be immediately
sought and eliminated. Points that fall below the LCL, however, indicate abnormally
low problem levels. Although such points certainly qualify as out of control, they are
also evidence that some assignable cause has brought about a temporary, but welcome,
improvement in the process. In this situation, the goal changes to one of finding the as-
signable cause and then taking steps, not to eliminate it, but to ensure that it continues
to exist in the future.

p and np Charts
The proportions of nonconforming items in successive subgroups of size n are plotted
on p charts. If we assume that all the subgroups come from a stable process in which
the true proportion of nonconforming items is π, each of the subgroup proportions p1,
p2, p3, . . . , pk is a statistic whose sampling distribution (see Section 5.5) has a mean and
standard deviation of

$$\mu_p = \pi \qquad \text{and} \qquad \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$

In theory, the 3-sigma control limits for the p chart are formed by μp ± 3σp. In
practice, of course, we must first estimate π and then substitute this estimate into
the formulas.

To estimate π, the k subgroup proportions are averaged. This average is denoted by

$$\bar{p} = \frac{1}{k}\sum_{i=1}^{k} p_i$$

and is used as the centerline of the chart. Substituting p̄ for π, we find the control limits
for the p chart as

$$UCL = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \qquad\qquad LCL = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$
Sometimes, because of the small values of p̄ that are encountered in practice, the LCL
can be negative. When this happens, we replace the LCL by 0 since it is impossible to
have negative nonconformance rates.
In the frequently occurring case where the subgroup sizes n1, n2, n3, . . . , nk are not
all equal, the calculations for the centerline and control limits are modified as follows.


Letting x1, x2, x3, . . . , xk denote the numbers of nonconforming items in each subgroup,
we estimate the centerline of the p chart by

$$\bar{p} = \frac{x_1 + x_2 + x_3 + \cdots + x_k}{n_1 + n_2 + n_3 + \cdots + n_k}$$

This formula is conveniently remembered as “the total number of nonconforming items
over the total sample size.” The formula is more general than the equal-samples formula
and, for that reason, it is sometimes the only formula cited by some texts for estimating π.
When the subgroup sizes are unequal, the control limits are calculated separately for each
subgroup. That is, the control limits for the ith subgroup are

$$UCL = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}} \qquad\qquad LCL = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}}$$

Example 6.6 Aerospace contractors and subcontractors must often demonstrate, using control charts,
that their manufacturing processes are capable of meeting ever-increasing quality stan-
dards for military systems and hardware (“Department of Defense Renews Emphasis
on Quality,” Quality Progress, March 1988: 19–21). Many such systems include printed
circuit board (PCB) assemblies with various electronic components soldered to them.
Components are soldered in place by means of a wave solder machine, which passes
the PCBs on a conveyor over a surface of liquid solder. Soldered PCBs are then con-
nected to test stations, which electronically test the circuits and classify each board as
either conforming or nonconforming. Table 6.3 contains records of the daily numbers
of rejected (nonconforming) PCBs for a 30-day period. For this data,
$$\bar{p} = \frac{14 + 22 + 9 + \cdots + 12}{286 + 281 + 310 + \cdots + 289} = .054$$

Table 6.3   Daily records of numbers of tested and rejected circuit board assemblies

Day  Rejects  Tested  Proportion      Day  Rejects  Tested  Proportion
 1     14      286      .049          16     15      297      .051
 2     22      281      .078          17     14      283      .049
 3      9      310      .029          18     13      321      .040
 4     19      313      .061          19     10      317      .032
 5     21      293      .072          20     21      307      .068
 6     18      305      .059          21     19      317      .060
 7     16      322      .050          22     23      323      .071
 8     16      316      .051          23     15      304      .049
 9     21      293      .072          24     12      304      .039
10     14      287      .049          25     19      324      .059
11     15      307      .049          26     17      289      .059
12     16      328      .049          27     15      299      .050
13     21      296      .071          28     13      318      .041
14      9      296      .030          29     19      313      .061
15     25      317      .079          30     12      289      .042


Since different numbers of PCBs are tested each day, the control limits for a p chart
of the data are calculated separately for each subgroup:
$$UCL = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}} = .054 + 3\sqrt{\frac{(.054)(1-.054)}{n_i}} = .054 + \frac{.6781}{\sqrt{n_i}}$$

$$LCL = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}} = .054 - 3\sqrt{\frac{(.054)(1-.054)}{n_i}} = .054 - \frac{.6781}{\sqrt{n_i}}$$

Figure 6.13 shows the p chart formed using these control limits. Note that the smaller the
subgroup size ni, the wider the control limits. Since the chart shows no signs of any
out-of-control conditions, we conclude that the process is in control and currently
operating at about a 5.4% nonconforming rate.

Figure 6.13   p chart of the data in Table 6.3 (centerline p̄ = .05385; average UCL = .09368, average LCL = .01402)
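As a rough sketch (ours, not the text’s; the variable names are arbitrary), the centerline and per-subgroup limits of Example 6.6 can be reproduced in a few lines of Python:

```python
import math

rejects = [14, 22, 9, 19, 21, 18, 16, 16, 21, 14, 15, 16, 21, 9, 25,
           15, 14, 13, 10, 21, 19, 23, 15, 12, 19, 17, 15, 13, 19, 12]
tested  = [286, 281, 310, 313, 293, 305, 322, 316, 293, 287, 307, 328, 296, 296, 317,
           297, 283, 321, 317, 307, 317, 323, 304, 304, 324, 289, 299, 318, 313, 289]

p_bar = sum(rejects) / sum(tested)          # centerline: .05385
for x, n in zip(rejects, tested):
    half = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
    lcl, ucl = max(0.0, p_bar - half), p_bar + half
    if not (lcl <= x / n <= ucl):
        print("out of control:", x, n)      # no subgroup is flagged here
```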

If you have the ability to choose constant subgroup sizes in your particular
application, then the p chart calculations can be further simplified. In fact, with
a constant base of comparison (i.e., constant subgroup size n), there is no need to
even convert the numbers of nonconforming items into the subgroup proportions
p1, p2, p3, . . . , pk. Instead, we can simply plot the numbers of nonconforming items
x1, x2, x3, . . . , xk on the chart. This chart is called an np chart because the number
of nonconforming items in a subgroup is simply n times the proportion of noncon-
forming items.
If x1, x2, x3, . . . , xk denote the numbers of nonconforming items in k subgroups,
then the centerline of the np chart is simply np̄, where p̄ is calculated by either of the


formulas for p̄ used with the p chart. Similarly, the 3-sigma control limits for the np
chart are found by multiplying each of the control limits of the p chart by n:

$$UCL = n\bar{p} + 3\sqrt{n\bar{p}(1-\bar{p})} \qquad\qquad LCL = n\bar{p} - 3\sqrt{n\bar{p}(1-\bar{p})}$$

As in the p chart, if the LCL turns out to be negative, we replace it by 0.

Example 6.7 In complex systems, items are routed through a succession of different processes be-
fore emerging as finished products or completed services. In “build to order” systems,
for example, individual orders are routed through slightly different paths from other
orders, according to a customer’s specific design requirements. A common method for
tracking an item’s progress during this journey is to attach paperwork to each order that
describes the requirements for every step of the production process. These documents,
often called travelers, are created before an order is processed. It is imperative that they
be correct, since incorrect travelers are essentially recipes for nonconforming products!
To monitor the quality of such paperwork, suppose that periodic samples
of 100 travelers are examined for errors, where a nonconforming document
is defined to be one that contains at least one error. Table 6.4 shows data from
25 daily samples of size 100 travelers and the corresponding numbers of noncon-
forming ones. The total number of nonconforming documents in the 25 samples is 272, so
p̄ = 272/[(25)(100)] = .1088 and, therefore, np̄ = 100(.1088) = 10.88. The control
limits are

$$UCL = n\bar{p} + 3\sqrt{n\bar{p}(1-\bar{p})} = 10.88 + 3\sqrt{10.88(1-.1088)} = 20.22$$

$$LCL = n\bar{p} - 3\sqrt{n\bar{p}(1-\bar{p})} = 10.88 - 3\sqrt{10.88(1-.1088)} = 1.54$$

Table 6.4   Numbers of documents containing errors in samples of 100 documents

Day  Number  Sample Size      Day  Number  Sample Size
 1     10       100           14     21       100
 2     12       100           15     20       100
 3     10       100           16     12       100
 4     11       100           17     11       100
 5      6       100           18      6       100
 6      7       100           19     10       100
 7     12       100           20     10       100
 8     10       100           21     11       100
 9      6       100           22     11       100
10     11       100           23     11       100
11      9       100           24      6       100
12     14       100           25      9       100
13     16       100


The np chart (Figure 6.14) shows one point (subgroup 14) above the UCL. Production
records for day 14 should be examined for a possible assignable cause. If one is found,
then subgroup 14 should be eliminated from the calculations, and an np chart with a
revised centerline and control limits should be used to monitor subsequent data.

Figure 6.14   np chart from Minitab for the data of Table 6.4 (centerline np̄ = 10.88; UCL = 20.22, LCL = 1.538; Minitab labels the first out-of-control point with a “1”)
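A minimal Python sketch of the same computation (ours, with assumed variable names) flags day 14 automatically:

```python
import math

failed = [10, 12, 10, 11, 6, 7, 12, 10, 6, 11, 9, 14, 16,
          21, 20, 12, 11, 6, 10, 10, 11, 11, 11, 6, 9]
n = 100
p_bar = sum(failed) / (len(failed) * n)              # .1088
center = n * p_bar                                   # centerline: 10.88
half = 3 * math.sqrt(n * p_bar * (1 - p_bar))
ucl, lcl = center + half, max(0.0, center - half)    # 20.22 and 1.54
flagged = [day for day, x in enumerate(failed, start=1) if x > ucl or x < lcl]
print(flagged)                                       # [14]
```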

c and u Charts
Because an object can have any number of flaws, or nonconformities, it is important
to establish an inspection unit when working with c and u charts. The inspection unit
defines the fixed unit of output that will be regularly sampled and examined for non-
conformities. Inspection units are often single units of product, such as a single printed
circuit board or a single television. Inspection units can also be collections of items,
which might be used when one examines accounting records for errors by looking at
batches of 100 accounting records per day. The inspection unit is then 100 records, and
the number of nonconformities for such an inspection unit is the total number of errors
found in each such batch. Products are usually grouped in batches like this when the
nonconformance rate is small and large samples are needed to detect nonconformities.
Choosing an inspection unit is especially important with continuous processes, such
as the production of long rolls of paper, wire, fabric, or metal. To count the number of
surface flaws in long rolls of metal, for example, it would not be practical to look at every
square foot of the metal surface. Instead, we decide on a fixed-size inspection unit, say, a
2-square-foot section of metal, and count the number of nonconformities found therein.
The number of nonconformities per unit (i.e., per inspection unit) is denoted by c.
To create a c chart, a sample of k successive inspection units is examined, and the num-
bers of nonconformities c1, c2, c3, . . . , ck found in these units are counted. The centerline
of the chart, denoted by c̄, is the average

$$\bar{c} = \frac{1}{k}\sum_{i=1}^{k} c_i$$


For a stable process, the number of nonconformities, c, is modeled by a Poisson distribu-
tion. Since the mean and variance of a Poisson distribution are equal (see Chapter 2) and
since the mean is estimated by the centerline c̄, the 3-sigma control limits of the c chart are

$$UCL = \bar{c} + 3\sqrt{\bar{c}} \qquad\qquad LCL = \bar{c} - 3\sqrt{\bar{c}}$$

As in the p and np charts, negative LCLs are replaced by 0. However, this problem can
be avoided if the inspection unit is chosen so that c̄ exceeds 9 (see Exercise 38).

Example 6.8 One measure of software quality is the number of coding errors made by program-
mers per 1000 lines of computer code. Using K to denote 1000, the inspection unit
“a thousand lines of code” is usually abbreviated as KLOC (i.e., K Lines Of Code).
The data in Table 6.5 shows the defects per KLOC obtained from weekly test logs in
a software company. The average number of errors per KLOC is c̄ = 134/30 = 4.467.
The upper and lower control limits of the chart are then

$$UCL = \bar{c} + 3\sqrt{\bar{c}} = 4.467 + 3\sqrt{4.467} = 10.807$$

$$LCL = \bar{c} - 3\sqrt{\bar{c}} = 4.467 - 3\sqrt{4.467} = -1.874$$

Table 6.5   Number of errors per 1000 lines of code

Week  Errors      Week  Errors
  1      6          16      3
  2      7          17      2
  3      7          18      0
  4      6          19      0
  5      8          20      1
  6      6          21      2
  7      5          22      5
  8      8          23      1
  9      1          24      7
 10      6          25      7
 11      2          26      1
 12      5          27      5
 13      5          28      5
 14      4          29      8
 15      3          30      8

Because the LCL is negative, we reset it to 0 and then construct the c chart shown in
Figure 6.15. Note that the two points at weeks 18 and 19 are touching the lower con-
trol limit and that there are several runs of points on the same side of the centerline.
According to the extended “out-of-control” rules in Section 6.2, these observations
do not quite qualify as out-of-control signals, but they are close. It might therefore


be rewarding to conduct a small search for reasons why the error rate was so low in
weeks 18 and 19.
Figure 6.15   c chart for the data of Table 6.5 (centerline c̄ = 4.467; UCL = 10.81, LCL = 0)
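Because the c chart limits depend only on c̄, the whole computation fits in a few lines. This Python sketch (our own illustration, not the text’s) mirrors Example 6.8:

```python
import math

errors = [6, 7, 7, 6, 8, 6, 5, 8, 1, 6, 2, 5, 5, 4, 3,
          3, 2, 0, 0, 1, 2, 5, 1, 7, 7, 1, 5, 5, 8, 8]
c_bar = sum(errors) / len(errors)                # 134/30 = 4.467
ucl = c_bar + 3 * math.sqrt(c_bar)               # 10.807
lcl = max(0.0, c_bar - 3 * math.sqrt(c_bar))     # negative, so reset to 0
print(round(c_bar, 3), lcl, round(ucl, 3))
```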

Sometimes it is neither possible nor convenient to use inspection units based on


collections of units of a product. This is especially the case for continuous processes,
such as the manufacture of sheet metal and plastic or rolls of paper, tubing, wire, and
fabric. It is not convenient, nor is it necessary, to inspect entire rolls of fabric for non-
conformities. Instead, smaller samples of such products are used to form control charts.
This is accomplished by first deciding on an inspection unit of a specified size, such as
a 2-square-foot area of fabric or, perhaps, a 2-yard section of wire. The second step is
to obtain small samples of the product for testing, but these samples need not neces-
sarily coincide with the chosen inspection unit. For example, a 4-yard section of wire
might be examined for flaws on one day and a half-yard section on another. To make a
fair comparison between samples of different sizes, however, we divide the number of
flaws found in any sample by the number of inspection units represented in the sample.
For instance, if three flaws are found in the 4-yard section of wire and we are using an
inspection unit of 2 yards, the nonconformity rate would be recorded as 1.5 flaws per
unit, since 4 yards represents two inspection units. Similarly, three flaws in a half-yard
section of wire is more serious, because this is equivalent to 12 flaws per 2 yards, or
12 flaws per unit.
To account for variable numbers of inspection units in our subgroups, c charts
are replaced by u charts based on the adjusted per unit rates described previously. If
subgroup i contains ci nonconformities and represents ni inspection units, then the non-
conformities per unit, ui, is simply
ci
ui 5
ni

Note that the numbers of inspection units, ni, represented in a sample does not have to
be an integer.


For k subgroups of such data, the statistics u1, u2, u3, . . . , uk are plotted on the u
chart. The centerline of the chart is

$$\bar{u} = \frac{\text{total nonconformities in the } k \text{ subgroups}}{\text{total number of inspection units}} = \frac{c_1 + c_2 + c_3 + \cdots + c_k}{n_1 + n_2 + n_3 + \cdots + n_k}$$

Because the subgroup size ni usually varies from sample to sample, control limits for
the u chart are computed separately for each subgroup:

$$UCL = \bar{u} + 3\sqrt{\frac{\bar{u}}{n_i}} \qquad\qquad LCL = \bar{u} - 3\sqrt{\frac{\bar{u}}{n_i}}$$

Example 6.9 The data in Table 6.6 shows the number of flaws found in 30 samples of fabric
and corresponding sizes of the samples examined (in square feet). Suppose
an inspection unit of 2 square feet is used to monitor the quality of this fabric.
Table 6.6 also shows the conversion of the raw nonconformity rates into the per unit
rates, ui. The u chart of this data (Figure 6.16) reveals several out-of-control points,
some bad (above the UCL) and some good (below the LCL). Before this control
chart can be used to monitor subsequent production, a search should be made
for possible assignable causes and then appropriate actions taken. A revised chart,
after eliminating out-of-control points, would then be used to monitor subsequent
samples from the process.

Table 6.6   Number of flaws and per unit rates in 30 fabric samples

  i    ci   Sample size (ft²)    ui         i    ci   Sample size (ft²)    ui
  1    12        3.9            6.15       16    29        9.8            5.92
  2    18        9.0            4.00       17    18        8.8            4.09
  3    27        6.7            8.06       18    28        7.1            7.89
  4    64        9.2           13.91       19    10        3.3            6.06
  5    11        3.6            6.11       20    47        5.9           15.93
  6    13        6.7            3.88       21    21        5.2            8.08
  7    25        8.3            6.02       22     6        5.6            2.14
  8    22        5.6            7.86       23    16        8.0            4.00
  9    43        6.1           14.10       24    27        8.9            6.07
 10    17        4.2            8.10       25    21        5.3            7.92
 11     0        8.4             .00       26    12        3.1            7.74
 12    14        6.8            4.12       27    19        6.2            6.13
 13     9        4.4            4.09       28    14        4.8            5.83
 14    16        5.2            6.15       29    42        8.3           10.12
 15     0        7.8             .00       30    19        4.7            8.09


Figure 6.16   u chart for the data of Table 6.6 (centerline ū = 6.496; average UCL = 11.48, average LCL = 1.508)
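The per unit conversion and the variable limits are again easy to script. The following Python sketch (ours, not from the text) reproduces the quantities behind Figure 6.16, using the 2-square-foot inspection unit of Example 6.9:

```python
import math

flaws = [12, 18, 27, 64, 11, 13, 25, 22, 43, 17, 0, 14, 9, 16, 0,
         29, 18, 28, 10, 47, 21, 6, 16, 27, 21, 12, 19, 14, 42, 19]
area  = [3.9, 9.0, 6.7, 9.2, 3.6, 6.7, 8.3, 5.6, 6.1, 4.2, 8.4, 6.8, 4.4, 5.2, 7.8,
         9.8, 8.8, 7.1, 3.3, 5.9, 5.2, 5.6, 8.0, 8.9, 5.3, 3.1, 6.2, 4.8, 8.3, 4.7]

units = [a / 2.0 for a in area]             # inspection units per sample
u = [c / n for c, n in zip(flaws, units)]   # per unit rates u_i
u_bar = sum(flaws) / sum(units)             # centerline: about 6.496
for i, n in enumerate(units, start=1):
    half = 3 * math.sqrt(u_bar / n)
    if not (u_bar - half <= u[i - 1] <= u_bar + half):
        print("sample", i, "is out of control")
```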

Section 6.5 Exercises

36. Explain the difference in the actions taken on a process when a point on a p chart exceeds the upper control limit versus the actions taken when a point falls below the lower control limit.

37. For a fixed subgroup size n, find the smallest value of p̄ that will give a positive lower control limit on a p chart.

38. Control limits for attributes charts are never negative, and it is desirable that they be positive. For a c chart, what values of the centerline c̄ will ensure that the lower control limit is positive?

39. The following data shows the number of nonconforming items found in 30 successive lots, each of size 50, of a finished product:

4 3 0 2 2 2 0 1 1 0
3 2 1 1 0 0 2 4 2 5
0 0 1 1 0 3 2 1 2 4

a. Construct a control chart for the proportion of nonconforming items per lot.
b. Interpret the chart in part (a).

40. On each of 25 days, 100 printed circuit boards are subjected to thermal cycling; that is, they are subjected to large changes in temperature, a procedure known to cause failures in boards with weak circuit connections. Of the boards tested, a total of 578 fail to work properly after the thermal cycling test.
a. From this information, calculate the centerline and control limits for a p chart.
b. The highest number of failures on a given day was 39 and the lowest number was 13. Would either of these points indicate an out-of-control condition?
c. If your answer to part (b) is “yes,” then eliminate the out-of-control point(s) from the data and recompute the centerline and control limits of the p chart.

41. After assembly and wiring of the individual keys, computer keyboards are tested by an automated test station that pushes each key several times. Daily records are kept of the number of keyboards inspected and the number that fail the inspection. Data from 25 successive manufacturing days is given here.

Day  Number tested  Number failed      Day  Number tested  Number failed
 1       2186            28            14       2376            25
 2       2131            21            15       2118            27
 3       2158            22            16       2251            14
 4       2307            14            17       2068            31
 5       2262            17            18       2242            23
 6       2379            27            19       2089            23
 7       2069            18            20       2387            36
 8       2264            20            21       2011            38
 9       2383            18            22       2059            13
10       2350            19            23       2045            11
11       2141            31            24       2375            20
12       2019            18            25       2029            30
13       2027            27

a. Calculate the centerline of a p chart for this data.
b. Construct the control limits for the p chart.
c. Are there any signs of out-of-control conditions in this data?

42. The following observations are the number of defects in 25 1-square-yard specimens of woven fabric (read across):

3 7 5 3 4 2 8 4 3 3 6 7 2
3 2 4 7 3 2 4 4 1 4 5 6

a. Construct a c chart for this data.
b. Check the chart for any out-of-control signals and, if necessary, eliminate such points from the data and reconstruct the c chart.

43. Off-color flaws in aspirin are caused by extremely small amounts of iron that change color when wet aspirin comes into contact with the sides of drying containers (“People: The Only Thing That Makes Quality Work,” Quality Progress, Sept. 1988: 63–67). Such flaws are not harmful but are nonetheless unattractive to consumers. At one Dow Chemical plant, a 250-lb sample is taken out of every batch of aspirin, and the number of off-color flaws is counted. The following table shows the numbers of flaws per 250-lb sample for a period of 25 days:

Day  Number of flaws      Day  Number of flaws
 1        46               14        49
 2        51               15        48
 3        56               16        59
 4        57               17        53
 5        37               18        61
 6        51               19        63
 7        47               20        42
 8        34               21        45
 9        30               22        43
10        44               23        42
11        47               24        39
12        51               25        38
13        46

Construct an appropriate control chart for this data, and examine it for any evidence of a lack of statistical control.

44. Forty consecutive automobile dashboards are examined for signs of pinholes in the plastic molding. The numbers of pinholes found are (read across)

6 2 3 2 5 2 2 3 2 4 9
4 0 5 0 6 5 4 2 3 3 1
4 1 7 3 3 5 7 3 6 7 6
4 5 3 8 5 4 3

a. Construct a control chart for the number of pinholes per dashboard.
b. Interpret the chart in part (a).

45. Painted metal panels are examined after baking at high temperatures to harden the paint. Because the manufacturer produces panels of several different sizes, inspectors simply record the number of blemishes found along with the known area of the panel (ft²). The following table shows the number of surface flaws found on 20 successive panels:

Panel  Number of flaws  Area of panel      Panel  Number of flaws  Area of panel
  1           3              .8              11           1              .6
  2           2              .6              12           3              .8
  3           3              .8              13           5              .8
  4           2              .8              14           4             1.0
  5           5             1.0              15           6             1.0
  6           5             1.0              16          12             1.0
  7          10              .8              17           3              .8
  8          12             1.0              18           3              .6
  9           4              .6              19           5              .6
 10           2              .6              20           1              .6

a. Construct a u chart for this data.
b. Examine the chart in part (a) for any out-of-control points.

6.6 Reliability
Implicit in our understanding of the term quality is a product’s ability to perform its
intended function for a reasonable period of time. Unless expressly designed for short-
term or one-time jobs, products that fail after only a brief period of use are not normally


considered to be of high quality. In addition to applying quality improvement methods
to create products, attention must also be paid to making these products last.
The field of reliability is concerned with the time aspect of quality. Reliability tech-
niques are used to estimate the useful lifetime of products, to detect and fix the types of
problems that occur with time, and to aid in establishing warranty, replacement, and
repair policies. Directly related to reliability are issues of product safety and product li-
ability, both of great importance to consumers and companies alike.

Failure Laws
The length of time that a product lasts until it fails, or ceases to operate correctly, is
called its lifetime. Lifetimes are measured in terms of how a product is used. Many
product lifetimes are simply measured in units of time (minutes, hours, etc.), as, for
example, in a wall clock battery that begins its useful life when installed in a clock and
fails sometime later when the clock stops. For items, such as lightbulbs, that usually do
not operate continuously, lifetimes refer to the accumulated operating time a product
experiences before failure (i.e., the total number of hours during which the bulb was
on). With tires, the number of miles driven is usually a better indicator of product life
than simply the time that the tires have been on the car. Mechanical devices, such as
springs, have lifetimes measured in cycles of operation, where, for example, a cycle
might be defined to be one compression and release of the spring. Whatever units are
used, time or cycles, we define a product’s lifetime to be a measure of the total accu-
mulated exposure to failure, often called the time on test, that the product experiences
prior to failure.
Lifetimes are modeled as continuous random variables and, as such, their prob-
ability distributions are described by probability density functions (pdf’s). Lifetimes can
take on nonnegative numerical values, even zero (e.g., products that fail immediately),
so density functions such as the exponential, Weibull, and lognormal are frequently
used to model lifetimes. Distributions that allow negative values, such as the normal
distribution, can also be used as long as their parameters are chosen in a manner that
gives negligible probability to negative lifetimes. When used to model lifetimes, density
functions are also called failure laws.
Choosing an appropriate failure law for a particular product or set of data can be
done in several ways:
1. There may be a physical or mathematical reason that justifies the use of a
particular density (e.g., the Central Limit Theorem justifies using the normal
distribution for sums and averages).
2. Quantile plots (see Section 2.4) may show that a particular density provides a
good fit to available data.
3. A failure law may have already been used by others and found to work well.
Because of the vast amount of research that has already been done on many products
and materials, item (3) in the preceding list often leads to a good failure law choice. It
is also useful to keep in mind the following brief list of situations that may provide the
necessary justification needed in item (1):
• Normal failure laws often apply in situations where lifetimes are the result of a sum of many other variable quantities.

• Exponential failure laws apply to products whose current ages do not have much effect on their remaining lifetimes. This is the “memoryless” property of exponential distributions (see Exercise 61). Typical applications: fuse lifetimes, interarrival times, alpha ray arrivals, Geiger counter ticks.

• Lognormal failure laws work well when the degradation in lifetime is proportional to the previous amount of degradation (typical applications: corrosion, crack growth, diffusion, metal migration, mechanical wear).

• Weibull failure laws are good models for the failure time of the weakest component of a system (e.g., capacitor, bearing, relay, and pipe joint failures).

Example 6.10 The lognormal distribution is often used to model tread wear of tires. To fit a log-
normal distribution to such data, suppose a tire manufacturer uses warranty data to
estimate that the mean time to failure (measured in total miles driven) for a certain
tire model is 40,000 miles with a standard deviation of 7500 miles. Denoting tire life-
times (in miles) by a random variable x, the parameters of the lognormal distribution
can be calculated using the formulas (see pages 69 and 77)

$$E(x) = e^{\mu + \sigma^2/2} \qquad \text{and} \qquad V(x) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$$

which can be solved for the lognormal parameters μ and σ:

$$\sigma^2 = \ln\left(1 + \frac{V(x)}{[E(x)]^2}\right) = \ln\left(1 + \frac{7500^2}{[40{,}000]^2}\right) = .034552$$

so σ = .185882 and

$$\mu = \ln(E(x)) - \frac{\sigma^2}{2} = \ln(40{,}000) - \frac{.034552}{2} = 10.57936$$
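These two conversion formulas are easily wrapped in a small function. The Python sketch below is our own (the function name is arbitrary) and simply reproduces the parameter values just computed:

```python
import math

def lognormal_params(mean, sd):
    """Convert the mean and standard deviation of a lifetime on the
    original (miles) scale into lognormal parameters mu and sigma."""
    sigma2 = math.log(1 + (sd / mean) ** 2)   # sigma squared
    mu = math.log(mean) - sigma2 / 2
    return mu, math.sqrt(sigma2)

mu, sigma = lognormal_params(40000, 7500)
print(mu, sigma)   # about 10.57936 and .185882
```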

Reliability and Hazard Functions


Letting f (x) be the density function (failure law) for a random variable x that describes
the lifetime of a product, the reliability at time t, denoted R(t), is the probability that
the product lasts longer than time t:

$$\text{reliability at time } t = R(t) = P(x > t) = \int_t^{\infty} f(x)\,dx$$

Directly related to R(t) is a function Z(t) called the failure rate or hazard function:

$$\text{failure rate at time } t = Z(t) = \frac{f(t)}{R(t)}$$

Z(t) is interpreted as the instantaneous rate of failure at time t, meaning that of those
items that have not failed before time t, the proportion that will fail in the small interval
of time from t to t + Δt is approximately Δt · Z(t). The failure rate function is very use-
ful for describing the manner in which failures occur.


The normal and lognormal distributions do not have closed-form expressions for
either the reliability or the hazard functions; however, the exponential and Weibull
distributions do have simple closed-form expressions for R(t) and Z(t):

Density | R(t) | Z(t)
Exponential: $f(x) = \lambda e^{-\lambda x}$  ($\lambda > 0$) | $e^{-\lambda t}$ | $\lambda$ (a constant)
Weibull: $f(x) = \frac{\beta}{\alpha^\beta}\, x^{\beta-1} e^{-(x/\alpha)^\beta}$  ($\alpha, \beta > 0$) | $e^{-(t/\alpha)^\beta}$ | $\frac{\beta}{\alpha^\beta}\, t^{\beta-1}$

(recall that the exponential is a special case of the Weibull when α = 1/λ and β = 1).

Figure 6.17 shows graphs of Z(t) for various values of β (the “shape” parameter) for
the Weibull distribution. Notice that for 0 < β < 1 the failure rate decreases with time, for
β = 1 (i.e., the exponential distribution) the failure rate is constant, and for β > 1 the failure
rate increases with time. In the case of the exponential distribution (β = 1), the fact that the
failure rate is constant is often interpreted as saying that products that have exponential fail-
ure laws are “memoryless.” That is, no matter how old such products are, their failure rates
are always the same. This means, after any time t, such products are essentially “as good as
new.” In fact, this may be a good approximation to the behavior of items such as fuses—if
a fuse has not burned out by time t, then it is probably very nearly as good as a new fuse.

Figure 6.17   Failure rates Z(t) of Weibull distributions for various values of the shape parameter β: decreasing for 0 < β < 1, constant for β = 1, increasing for β > 1
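A quick numerical check of this behavior (our own sketch, not the text’s; the choice α = 1 and the time points are arbitrary) evaluates the Weibull hazard at a few times:

```python
def weibull_hazard(t, alpha, beta):
    """Weibull failure rate: Z(t) = (beta / alpha**beta) * t**(beta - 1)."""
    return (beta / alpha ** beta) * t ** (beta - 1)

# Decreasing for beta < 1, constant for beta = 1, increasing for beta > 1:
for beta in (0.5, 1.0, 2.0):
    print(beta, [round(weibull_hazard(t, 1.0, beta), 3) for t in (0.5, 1.0, 2.0)])
```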

Example 6.11 In Example 6.10, warranty data on tire failures was used to estimate the parameters
 5 10.57936 and  5 .185882 of a lognormal distribution that describes tread wear
(in miles). Denoting tread life by x, the reliability function for x can be calculated using
the fact that ln(X) follows a normal distribution with mean  and standard deviation :
ln(t) 2  ln(t) 2 
R(t) 5 P(x . t) 5 P(ln(x) . ln(t)) 5 Pa z . b 5 1 2 Fa b
 

where F(z) denotes the cumulative probability for the standard normal distribution
(see Appendix Table I). Although there is no closed-form expression for R(t), it is easy
to use Table I or statistical software to create a graph of R(t), as shown in Figure 6.18.


Figure 6.18   Graph of the reliability function R(t) for the lognormal distribution of Example 6.11

Similarly, the hazard function Z(t) can be computed and plotted, as shown in Figure 6.19.
Notice that the failure rate is an increasing function for the lognormal distribution.
Figure 6.19   Graph of the hazard function Z(t) for the lognormal distribution of Example 6.11
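Graphs like Figures 6.18 and 6.19 can be produced with any statistical software. The following Python sketch (ours; it uses only the standard library’s NormalDist) computes R(t) and Z(t) for the fitted tire-life distribution:

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist

mu, sigma = 10.57936, 0.185882   # parameters fitted in Example 6.10
std_normal = NormalDist()

def R(t):
    """Reliability: P(lifetime > t) under the lognormal model."""
    return 1 - std_normal.cdf((log(t) - mu) / sigma)

def Z(t):
    """Hazard: lognormal density at t divided by R(t)."""
    z = (log(t) - mu) / sigma
    pdf = exp(-z * z / 2) / (t * sigma * sqrt(2 * pi))
    return pdf / R(t)

print(R(40000), Z(40000))   # reliability and failure rate at 40,000 miles
```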

System Reliability
Products that consist of large assemblies of components can be at risk of failure if one or
more of their individual parts fails. Studying how a product’s components are connected
and how this affects product lifetime is referred to as topological or system reliability.
Systems or assemblies are usually comprised of successive levels of subsystems
whose individual reliabilities are easy to estimate. By finding the subsystem reliabilities
first, one can often combine these estimates into an overall estimate of product reliabil-
ity. The particular combination depends on how the subsystems are connected.
Series systems are defined to be systems whose individual components are connect-
ed end-to-end in a “series.” Figure 6.20 shows a diagram of the typical series system. The
main aspect of such systems is that they can only function as long as every component


of the system functions correctly. Examples of series systems would be the tires on a
vehicle, batteries in a flashlight, and the power supply and CPU in a computer.

Figure 6.20   Diagram of a series system: Component 1, Component 2, Component 3, . . . , Component n connected end-to-end

If we denote the reliability at time t of the ith component by Ri(t), then the fundamental
theorem of series systems can be summarized as follows:

If all components in a series system function independently of one another, then the reliability function R(t) for the entire system is simply the product of the reliability functions of the components. That is:

$$R(t) = R_1(t) \cdot R_2(t) \cdot R_3(t) \cdots R_n(t)$$

Parallel systems are ones whose components function in parallel, that is, those
systems that will function as long as at least one of their components functions correctly.
Figure 6.21 shows a diagram of a typical parallel system comprised of n components.
Parallel systems are often used to build redundancy into a product; that is, the com-
ponents in parallel systems serve as “backups” for each other so that if one component
fails, then the entire system will not necessarily fail. Such systems are often used to
increase the reliability of a product. Examples of parallel systems include computer
routing systems, pacemakers, and safety systems on airplanes.
The fundamental result for computing the reliability R(t) of a parallel system in
terms of the reliabilities Ri(t) of its n components is

If all components in a parallel system function independently of one another, then the reliability function R(t) for the entire system is given as

$$R(t) = 1 - [1 - R_1(t)] \cdot [1 - R_2(t)] \cdot [1 - R_3(t)] \cdots [1 - R_n(t)]$$

Figure 6.21   Diagram of a parallel system: Component 1, Component 2, Component 3, . . . , Component n connected in parallel

Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has
deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
6.6 Reliability 289

The concepts of series and parallel systems can be used either separately or in
combination when analyzing the reliability of complex systems. The basic method is,
when possible, to break down a complex system into various combinations of series and/
or parallel subsystems. The reliabilities of such subsystems can then be calculated from
the theorems in this section and they, in turn, can often be combined to calculate the
overall product or system reliability.
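The two theorems translate directly into code. Here is a minimal Python sketch (our own helper functions, not from the text) for combining independent component reliabilities evaluated at a fixed time t:

```python
from math import prod

def series(*rels):
    """Reliability of independent components in series:
    the product of the component reliability values."""
    return prod(rels)

def parallel(*rels):
    """Reliability of independent components in parallel:
    one minus the product of the component unreliabilities."""
    return 1 - prod(1 - r for r in rels)

# Two components in series, backed up by a third in parallel (hypothetical values):
print(parallel(series(0.9, 0.9), 0.8))   # 0.962
```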

Example 6.12 Routers are used in the telecommunications industry to transmit data (in the form of
digitized electronic signals) from one location to another. Because many important
business and scientific organizations depend upon the continuous availability of data,
routing systems must be highly reliable. The usual way of increasing reliability in rout-
ing systems is to include various sources of redundancy in the form of parallel subsys-
tems. For example, Figure 6.22 shows a routing system that uses two identical routers
in parallel. In addition, each router contains four different power sources (arranged in
parallel) and two “supervisor cards” (also in parallel) that direct the router’s actions.
Assuming all power sources are of the same kind, each with reliability function
Rp(t), and that they act independently of one another, the reliability of each set of
four power sources is $1 - [1 - R_p(t)]^4$. Making the same assumptions for the super-
visor cards [cards are independent and have a common reliability function Rs(t)],
the reliability of each set of two cards is $1 - [1 - R_s(t)]^2$. Since the power sources
in each router are connected in series to the supervisor cards, the reliability of a
single router must be the product of the power source reliability and the supervisor
card reliability: $\{1 - [1 - R_p(t)]^4\} \cdot \{1 - [1 - R_s(t)]^2\}$.

Since both routers are connected in parallel, the overall reliability for the rout-
ing system is $1 - \left(1 - \{1 - [1 - R_p(t)]^4\} \cdot \{1 - [1 - R_s(t)]^2\}\right)^2$. The final step would
be to determine the particular form of the failure laws for the power sources and
supervisor cards (e.g., exponential or Weibull) and substitute these numerical expres-
sions into the overall reliability formula.
Figure 6.22   Redundancy in a routing system: two routers, four power supplies per router, two supervisor cards per router
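The router system’s reliability at a fixed time t can be evaluated numerically with the series and parallel helpers sketched earlier; the component values below are hypothetical, chosen only for illustration:

```python
from math import prod

def series(*rels):                       # as in the earlier sketch
    return prod(rels)

def parallel(*rels):
    return 1 - prod(1 - r for r in rels)

Rp, Rs = 0.95, 0.98   # assumed reliabilities of one power source, one card

router = series(parallel(Rp, Rp, Rp, Rp),   # four power sources in parallel
                parallel(Rs, Rs))           # two supervisor cards in parallel
system = parallel(router, router)           # two identical routers in parallel
print(router, system)                       # about 0.9996 and 0.99999983
```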


Section 6.6 Exercises


46. Intravenous (IV) tubes that deliver liquids (drugs, saline solution, food, etc.) to medical patients are connected by inserting a plastic prong (called a canula) from one tube through a rubber membrane in the connector of another tube. Canula systems are needleless and therefore eliminate the possibility of needle punctures to nurses or patients when connecting IV tubes. When disconnected, the surface of the rubber membrane closes up, resealing the end of the tube. However, after many such connections and disconnections, the rubber membrane may eventually wear out and fail to close properly.
Suppose the number of times (i.e., cycles) you can connect and disconnect such a system until a membrane wears out is modeled by an exponential distribution with a mean time to failure of μ = 500 cycles.
a. What is the probability that a given canula connection will last for at least 100 cycles?
b. Find the number of cycles, t, for which the reliability equals .95 (i.e., 95%).
c. Suppose a manufacturer of such systems wants to increase their reliability by specifying that R(100) = .95. What is the mean time to failure (in cycles) for such a system?

47. Reliability of mechanical springs is measured in terms of how many times (i.e., cycles) the spring can be compressed and released. Suppose that the lifetime of a certain type of spring can be modeled by a Weibull distribution with shape parameter β = 4 and scale parameter α = 600,000.
a. Calculate the reliability at t = 400,000 cycles.
b. Calculate the reliability at t = 800,000 cycles.
c. Calculate the reliability at t = α = 600,000 cycles. Note that R(α) is always the same number for any Weibull distribution.
d. Write the hazard function. Is the failure rate increasing, decreasing, or flat?

48. Is the exponential distribution a reasonable one for modeling human lifetimes? To answer this question, suppose the mean lifetime is about 75 years and that lifetimes follow an exponential distribution with parameter λ = 1/75 = .0133.
a. Use this model to calculate the percentage of people living over 150 years.
b. Next, calculate the percentage of people living less than 10 years.
c. Based on results in (a) and (b), do you think the exponential distribution is a good one for modeling human lifetimes?
d. Answer part (c) using only the “memoryless” property of the exponential distribution.

49. a. Assume that a certain product can be modeled with a normal failure law having a mean lifetime of μ = 10 years and standard deviation σ = 2 years. Use a spreadsheet program or other software to create a graph of the failure rate, Z(t), for such a product.
b. Based on your result in part (a), what type of failure rate (decreasing, constant, or increasing) do products with normal failure laws have?

50. Estimates of Weibull parameters can be obtained using simple linear regression (see Section 3.3). Denoting the lifetime of a product by t, the Weibull cumulative area function can be written as $F(x) = 1 - e^{-(x/\alpha)^\beta}$, and it is easy to show algebraically that $\ln[\ln(1/(1 - F(x)))] = \beta \ln(x) - \beta \ln(\alpha)$ (see page 93). For an ordered set of data $x_1 \le x_2 \le x_3 \le \cdots \le x_n$, we associate $x_i$ with the [(i − .5)/n]th sample quantile of the Weibull distribution. That is, we use $p_i = (i - .5)/n$ in place of F(x) and perform a regression of $\ln[\ln(1/(1 - p_i))]$ on $\ln(x_i)$ to find estimates of α and β.
a. Using the data of Example 2.18 (page 93), estimate the parameters of the Weibull distribution that fit this data.
b. Compare the estimates from part (a) with those obtained in Example 7.18 (page 338).

51. RAID (Redundant Arrays of Inexpensive Disks) structures consist of various combinations of computer disks that use parallel design elements to achieve high reliability. Suppose that a RAID system consists of three disks (A, B, and C) and three “mirror” disks that contain complete copies of the data in the first three disks. Suppose each A, B, and C disk is connected in parallel to its corresponding mirror disk and that the three such pairs of disks are connected in series.
a. Draw a diagram of this RAID system.
b. Suppose any disk has an exponential lifetime (in months) with parameter λ = .025. Calculate the reliability of this system.

Supplementary Exercises
52. When affixed to an object, each piece of paper in a pad of adhesive notepaper must stay in place but must also be easily removable. The strength of the adhesive used is a critical quality characteristic of such pads. For this type of product, does adhesive strength have a one- or a two-sided tolerance?

53. In a bottling process, a beam of light is passed through the necks of bottles passing by on a conveyor system. Underfilled bottles, which allow the beam of light to pass through, trip a sensor that routes the bottles off the conveyor system. Bottles with liquid levels above the level of the light beam do not trigger the sensor, thereby meeting the required fill specification; these bottles are then shipped to customers. Describe the shape of the distribution of fill volumes for the bottles that pass this inspection.

54. Instead of constructing x̄ and R charts for 30 subgroups of size 4, a friend suggests the simpler alternative of calculating the standard deviation s of the 30 means to establish 3-sigma limits for a control chart. That is, it is suggested that the 30 means be plotted on a chart with control limits $\bar{\bar{x}} \pm 3s$. Sample means that fall outside these control limits would indicate process problems. Explain what is wrong with this procedure.

55. A tool that drills holes in metal parts eventually wears out and periodically must be replaced. If the hole diameters drilled by this machine are monitored on a control chart, describe the type of pattern you would expect to see on the chart as the drill wears out.

56. A manufacturer of dustless chalk monitors the consistency of chalk by running an s chart on the density of chalk in subgroups of size 8. The most recent 24 such subgroups had the accompanying sample standard deviations (read across):

.204 .315 .096 .184 .230 .212 .322 .287
.145 .211 .053 .145 .272 .351 .159 .214
.388 .187 .150 .229 .276 .118 .091 .056

a. Construct an s chart based on this data.
b. Check the chart in part (a) for any out-of-control points. If there are any, eliminate them from the data and reconstruct the s chart. Repeat this process, if necessary, until there are no out-of-control signals in the s chart.

57. The deviations from nominal transformation in Exercise 21 can be used in so-called short-run processes. Even though small numbers of different-size parts are created by such processes, the deviations from the various nominal values of these parts provide information about the particular process, not the parts, that is common to all the parts. For example, consider a milling process in which metal bars of various sizes are machined to specified lengths. The size of the bars submitted to the machining process may vary from hour to hour, so there may be insufficient data to create control charts on any particular bar size. However, by subtracting the nominal value from each batch of bars, the resulting subgroups of data are sufficient to create a control chart for the milling process itself. The following table shows the raw length measurements of milled steel bars of various sizes, denoted P1, P2, P3, and P4. The nominal length for bars of type P1 is .125; for bars of type P2, .250; for P3, .375; and for P4, .500.

Subgroup    x1     x2     x3     x4    Part type
    1      .251   .252   .250   .249      P2
    2      .372   .378   .379   .375      P3
    3      .247   .249   .254   .251      P2
    4      .248   .247   .250   .252      P2
    5      .249   .249   .250   .249      P2
    6      .125   .127   .125   .126      P1
    7      .372   .374   .375   .376      P3
    8      .499   .502   .495   .503      P4
    9      .124   .121   .123   .126      P1
   10      .126   .126   .130   .122      P1
   11      .375   .374   .378   .379      P3
   12      .249   .249   .250   .247      P2
   13      .250   .253   .251   .248      P2
   14      .249   .250   .249   .249      P2
   15      .252   .250   .251   .247      P2
   16      .251   .249   .250   .250      P2
   17      .126   .127   .122   .125      P1
   18      .123   .123   .123   .128      P1
   19      .252   .250   .247   .248      P2
   20      .502   .496   .502   .502      P4

a. Using the nominal lengths given, convert this data into the deviations from nominal format.
b. Construct x̄ and R charts of the transformed data in part (a). Evaluate the charts, and comment on the milling process.

58. Explain why it is possible for all the measurements in a given sample to lie within the specification limits and for the same data to yield a nonzero estimate of the proportion of the process data that exceeds the specification limits.

59. For a certain process, x̄ and R charts based on subgroups of size 5 have centerlines of 14.5 and 1.163, respectively. Given that the process has specification limits of 12 and 16, calculate Cp, Cpu, Cpl, and Cpk.

60. For a fixed value of p̄, how large does the subgroup size n have to be to yield a positive lower control limit on a p chart?

61. A “memoryless” system or component is one that satisfies the following property: If it has already lasted for t1 hours, then the probability it lasts for another t2 hours is the same as its initial probability of lasting t2 hours. Prove that the exponential distribution is memoryless. That is, prove that $P(X > t_1 + t_2 \mid X > t_1) = P(X > t_2)$ for a random variable X that has an exponential distribution with parameter λ.

62. Suppose that two components with reliabilities R1(t) and R2(t) are connected in series, but that the two components do not necessarily function independently of one another. Show in this case that $\min\{R_1(t), R_2(t)\} \le R_1(t)R_2(t) + \frac{1}{4}$.

63. Show that any series system consisting of two components with reliabilities R1(t) and R2(t) can never have a system reliability R(t) that exceeds the reliability of its weakest link, that is, R(t) ≤ min{R1(t), R2(t)}.
a. Prove this under the assumption that the two components function independently of one another.
b. Then prove it in the more general case, where the two components may or may not function independently of one another.

64. A small system contains three components that are connected according to the following diagram. Assuming that the components all function independently of one another, find the general expression for the system reliability R(t) in terms of the component reliabilities R1(t), R2(t), and R3(t).

[Diagram: components 1 and 2 connected in series along the upper path, with component 3 in parallel along the lower path.]

Bibliography
DeVor, R. E., T. Chang, and J. W. Sutherland, Statistical Quality Design and Control (2nd ed.), Prentice Hall, New Jersey, 2006. Good discussion of several of the more advanced techniques of quality control.
Farnum, N. R., Modern Statistical Quality Control and Improvement, Duxbury Press, Belmont, CA, 1994. A comprehensive overview of control charts, acceptance sampling, experimental design, metrology, and the modern approach to quality.
Lloyd, D. K., and M. Lipow, Reliability: Management, Methods, and Mathematics, ASQC Press, Milwaukee, 1984. Classic text covering all aspects of reliability. Good explanations, with many examples.
Meeker, W. Q., and L. A. Escobar, Statistical Methods for Reliability Data, Wiley, New York, 1998. Complete, modern presentation of estimation, evaluation, and graphing of reliability functions.
Montgomery, D. C., Introduction to Statistical Quality Control (6th ed.), Wiley, New York, 2012. Comprehensive and easy to read, with good examples and problems.

7 Estimation and Statistical Intervals
7.1 Point Estimation
7.2 Large-Sample Confidence Intervals for
a Population Mean
7.3 More Large-Sample Confidence Intervals
7.4 Small-Sample Intervals Based on a Normal
Population Distribution
7.5 Intervals for μ1 − μ2 Based on Normal
Population Distributions
7.6 Other Topics in Estimation (Optional)

Introduction
The general objective of statistical inference is to use sample information as
a basis for drawing various types of conclusions. In an estimation problem, we
want to make an educated guess about the value of some population charac-
teristic or parameter, such as the population mean battery lifetime μ, the proportion π of all components of a certain type that need service while under warranty, or the difference μ1 − μ2 between the population mean lifetimes for two different types of batteries. The simplest type of estimate is a point estimate, a single number that represents our best guess for the value of the parameter. Thus we might report a point estimate of 758 hours for the population mean lifetime of all brand X 100-watt lightbulbs; we are not saying that μ = 758, only that sample data suggests 758 as a very plausible value for μ. Point estimation
is discussed in Section 7.1.


A point estimate of a parameter almost surely differs by at least a small amount


from the actual value of the parameter. That is, there is almost always at least a small
error in the estimate. Our estimate, for example, may be 758 hours when μ is actually
750 hours. It would be nice if our estimates could provide some indication of preci-
sion; this is the purpose of a confidence interval (interval estimate). Such estimates, as
well as several other types of intervals, are presented in Sections 7.2–7.5. Section 7.6
briefly considers several other topics relating to estimation.

7.1 Point Estimation


A point estimate of some parameter θ is a single number, calculated from sample data, that can be regarded as an educated guess for the value of θ. We might, for example, report 32.5 mpg as a point estimate of the population mean fuel efficiency μ for all cars of a particular type under specified conditions. Or we might decide that .350 is a point estimate for the proportion π of all individuals who would try a particular product again after using a free trial sample.

A point estimate is usually obtained by selecting a suitable statistic and calculating its value for the given sample data. For example, a natural statistic to use for estimating a population mean μ is the sample mean x̄, and a sensible way to estimate a population variance σ² is to compute the value of the sample variance s². The statistic used to calculate an estimate is sometimes called an estimator, and the symbol θ̂ is frequently used to denote either the estimator or the resulting estimate. Thus the statement

μ̂ = x̄ = 32.5

says that the point estimate of the population mean μ is 32.5 and that this estimate was calculated using the sample mean x̄ as the estimator.

Example 7.1 A commonly used method of estimating the size of a wildlife population is to perform
a capture/recapture experiment. Suppose a biologist wishes to estimate the number
of fish in a certain lake; that is, the parameter to be estimated is the population
size N. An initial sample of 100 fish is selected, each one is tagged, and the tagged
fish are returned to the lake. After a time period sufficient to allow the tagged fish to
mix with the other fish in the lake, a second sample of 250 fish is selected. If 25 of the
fish in the recapture sample are tagged, what is a sensible estimate for N? Because
10% of the fish in the recapture sample are tagged, it is reasonable to estimate that
10% of all fish in the lake are tagged. Since we know that a total of 100 fish were
initially tagged, this suggests that we use 1000 as a point estimate of N.
More generally, if M denotes the number of fish initially tagged, n the size of the
recapture sample, and x the number of tagged fish in the recapture sample (so x is a
random variable), the proposed estimator of N is N̂ = [Mn/x]. (The square bracket notation [c] denotes the largest whole number that is at most c; this takes care of cases where Mn/x is not a whole number.)
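The estimator is simple enough to express directly in code. Below is a minimal Python sketch of the capture/recapture computation; the function and argument names are our own illustration, not from the text.

from math import floor

def capture_recapture_estimate(M, n, x):
    # N-hat = [Mn/x], the largest whole number that is at most Mn/x.
    # M = number initially tagged, n = recapture sample size,
    # x = number of tagged individuals seen in the recapture sample.
    if x == 0:
        raise ValueError("no tagged individuals recaptured")
    return floor(M * n / x)

print(capture_recapture_estimate(100, 250, 25))   # 1000, as in Example 7.1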


Frequently, there is more than one estimator that can sensibly be used to calculate
an estimate, as the following example shows.

Example 7.2 Consider a population of N = 5000 invoices. Associated with each invoice is its “book value,” the recorded amount of that invoice. Let T = $1,761,300 denote the known total book value. Unfortunately, some of the book values are erroneous. An audit will be carried out by randomly selecting n invoices and determining the audited (i.e., correct) value for each one. Suppose the sample gives the following results:

Invoice:          1    2    3    4    5
Book value:      300  720  526  200  127
Audited value:   300  520  526  200  157
Error:             0  200    0    0  −30

Let ȳ = sample mean book value = $374.60, x̄ = sample mean audited value = $340.60, and ē = sample mean error = $34.00. Each of the following estimators for the total audited (i.e., correct) value and resulting estimates is sensible:

mean per unit statistic = N x̄;  estimate = 5000(340.60) = $1,703,000
difference statistic = T − N ē;  estimate = 1,761,300 − (5000)(34) = $1,591,300
ratio statistic = T(x̄/ȳ);  estimate = (1,761,300)(340.6/374.6) = $1,601,438
The choice among these estimates is not clear-cut. In fact, all three of the estimators
have been advocated by those employing statistical methodology in auditing.
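The three competing estimates are easy to reproduce; the following Python sketch (variable names ours) computes them from the invoice sample above.

N, T = 5000, 1_761_300                 # population size and known total book value
book    = [300, 720, 526, 200, 127]
audited = [300, 520, 526, 200, 157]

n = len(book)
y_bar = sum(book) / n                  # sample mean book value, $374.60
x_bar = sum(audited) / n               # sample mean audited value, $340.60
e_bar = sum(b - a for b, a in zip(book, audited)) / n   # sample mean error, $34.00

print(N * x_bar)           # mean per unit estimate:  1,703,000
print(T - N * e_bar)       # difference estimate:     1,591,300
print(T * x_bar / y_bar)   # ratio estimate:          about 1,601,438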

In situations where there is more than one sensible estimator available, criteria for
selecting an estimator are needed. We now turn to a brief discussion of desirable proper-
ties of estimators.

Properties of Estimators
One desirable property that a good estimator should possess is that it be unbiased. An
estimator is unbiased if, in repeated random samples, the numerical values of the es-
timator stack up around the population parameter that we are trying to estimate. An
often-used analogy is to think of each value of an estimator as a shot fired at a target, the
target being the population parameter of interest. As long as all the shots fall in a pattern
with the target value in the middle, we say that the shots are unbiased. Notice that we
do not require that any of the individual shots actually hit the target; we require only
that they be centered around the target value. If the majority of the shots are centered
somewhere else, then we say that they exhibit a certain amount of bias.
In terms of sampling distributions, an estimator is said to be unbiased if the mean of its sampling distribution coincides with the parameter that is being estimated. For instance, we know from Section 5.5 that the sampling distribution of the statistic x̄ has a mean value of μx̄, which equals the mean μ of the population from which the samples are taken. Then x̄ is said to be an estimator of the parameter μ and, because μx̄ = μ, x̄ is also an unbiased estimator of μ. In general, for any population parameter θ and any estimator θ̂ of that parameter, Figure 7.1 illustrates what it means for θ̂ to be unbiased or biased.

Figure 7.1 Sampling distribution of an estimator θ̂: (a) θ̂ is unbiased; (b) θ̂ is biased

definitions Denote a population parameter generically by the letter θ and denote any estimator of this parameter by θ̂. Then θ̂ is an unbiased estimator if μθ̂ = θ. Otherwise, θ̂ is said to be biased, and the quantity μθ̂ − θ is called the bias of θ̂.

Some of the most important statistics we have studied are unbiased estimators of certain population parameters. For example, it can be shown that the sample mean x̄ is an unbiased estimator of the population mean μ, the sample variance s² is an unbiased estimator of the population variance σ², and the sample proportion p is an unbiased estimator of the population proportion π. One important exception is the sample standard deviation s, which turns out to be a slightly biased estimator of the population standard deviation σ. Fortunately, for large samples, the amount of bias in s is negligible. For small samples from a normal population, there is a simple correction factor that can be applied to s that converts it into an unbiased statistic for estimating σ.

Unbiasedness does not imply that the estimate computed from any particular sample will coincide with the value of the parameter being estimated. Consider, for example, using the sample proportion p to estimate the population proportion π based on a sample of size n = 25, and suppose that π = .7. Then μp = .7, so the sampling distribution of p is centered at .7. However, with x denoting the number of “successes” in
the sample, p = x/25 ≠ .7 for any possible value of x. That is, even though p is unbiased for estimating π, the value of the estimate calculated from any particular sample will inevitably differ from π. Nevertheless, if sample after sample is selected and the value
of p calculated for each one, unbiasedness implies that the long-run average of these
estimates will be the correct value, .7.
A second desirable property that estimators often possess is consistency. If θ̂ denotes an estimator of some population parameter θ, then θ̂ is said to be consistent if the probability that it lies close to θ increases to 1 as the sample size increases. Simply stated, consistent estimators become more and more accurate as the sample size increases. That is, as you increase n, it becomes more and more likely that such estimators will be very close to the parameter they are intended to estimate. The most common method for showing that an estimator is consistent is to show that its standard error decreases as the sample size increases. For instance, because the standard error of x̄ is σx̄ = σ/√n, which must necessarily decrease as n increases, the sample mean qualifies as a consistent estimator of μ. This means that for any interval around μ,
no matter how small the interval, we can eventually select n large enough so that the
sampling distribution lies almost entirely within the interval. This property is illus-
trated in Figure 5.19. Although there are some estimators that are not consistent, such
examples are fairly rare. In fact, all of the statistical applications in this text involve
consistent estimators.

definition If the probability that an estimator θ̂ falls close to a population parameter θ can be made as near to 1 as desired by increasing the sample size n, then θ̂ is said to be a consistent estimator of θ.
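Both properties are easy to visualize with a small simulation. The Python sketch below (our own illustration, not from the text) draws many samples from a normal population: the sample means stack up around μ (unbiasedness), and their spread shrinks roughly like σ/√n as n grows (consistency).

import random
import statistics

random.seed(1)
mu, sigma = 100, 15

for n in (10, 100, 1000):
    means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(2000)]
    # The average of the x-bars is close to mu; their standard deviation
    # is close to sigma / sqrt(n) and shrinks as n increases.
    print(n, round(statistics.fmean(means), 2), round(statistics.stdev(means), 2))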

Section 7.1 Exercises


1. A single plastic part is randomly selected from a large population of such parts. Can the length of the chosen part be considered an unbiased estimator of the average length of all the parts?

2. A random sample of ten homes in a particular area, each heated with natural gas, is selected, and the amount of gas (therms) used during January is determined for each home. The resulting observations are 103, 156, 118, 89, 125, 147, 122, 109, 138, and 99.
a. Use an unbiased estimator to compute a point estimate of μ, the average amount of gas used by all houses in the area.
b. Use an unbiased estimator to compute a point estimate of π, the proportion of all homes that use over 100 therms.

3. Random samples of size n are taken from a normal population whose standard deviation is known to be 5.
a. For random samples of size n = 10, calculate the area under the sampling distribution curve for x̄ between the values μ − 1 and μ + 1. That is, find the probability that the sample mean lies within ±1 unit of the population mean.
b. Repeat the probability calculation in part (a) for samples of size n = 50, n = 100, and n = 1000.
c. Graph the probabilities you found in parts (a) and (b) versus their corresponding sample sizes, n. What can you conclude from this graph?

4. Random samples of n trees are taken from a large area of forest, and the proportion of diseased trees in
each sample is determined. The actual proportion of diseased trees, π, is unknown.
a. For random samples of size n = 10, calculate the area under the sampling distribution curve for p between the points π − .10 and π + .10. That is, find the probability that the sample proportion lies within ±.10 (i.e., 10%) of the population proportion. Use the formula for the upper bound on the standard error of p (see Section 5.6) in your calculations.
b. Repeat the probability calculation in part (a) for samples of size n = 50, n = 100, and n = 1000. (Use the normal approximation to the binomial.)
c. Graph the probabilities you found in parts (a) and (b) versus their corresponding sample sizes, n. What can you conclude from this graph?

5. Random samples of size n are selected from a normal population whose standard deviation σ is known to be 2.
a. Suppose you want 90% of the area under the sampling distribution of x̄ to lie within ±1 unit of a population mean μ. Find the minimum sample size n that satisfies this requirement.
b. Repeat the calculations in part (a) for areas of 80%, 95%, and 99%.
c. Plot the sample sizes found in parts (a) and (b) versus their corresponding probabilities. What can you conclude from this graph?

6. Each of 150 newly manufactured items is examined, and the number of surface flaws per item is recorded, yielding the following data:

Number of flaws:      0   1   2   3   4   5   6   7
Observed frequency:  18  37  42  30  13   7   2   1

Let x denote the number of flaws on a randomly chosen item, and assume that x has a Poisson distribution with parameter λ.
a. Find an unbiased estimator for λ and compute the estimate using the data. Hint: The mean of a Poisson random variable equals λ.
b. What is the standard error of the estimator in part (a)? Hint: The variance of a Poisson random variable also equals λ.

7.2 Large-Sample Confidence Intervals for a Population Mean
A point estimate, because it is a single number, by itself provides no information about
the precision and reliability of estimation. Consider, for example, using the statistic x̄ to calculate a point estimate for the true average breaking strength (g) of paper towels of a certain brand, and suppose that x̄ = 9322.7. Because of sampling variability, it is virtually never the case that x̄ = μ. The point estimate says nothing about how close it might be to μ. An alternative to reporting a single most plausible value of the parameter
being estimated is to calculate and report an entire interval of plausible values—an
interval estimate or confidence interval (CI). A confidence interval is always calculated
by first selecting a confidence level, which is a measure of the degree of reliability of the
interval. A confidence interval with a 95% confidence level for the true average break-
ing strength might have a lower limit of 9162.5 and an upper limit of 9482.9. Then at
the 95% confidence level, any value of μ between 9162.5 and 9482.9 is plausible. A confidence level of 95% implies that 95% of all samples would give an interval that includes μ, or whatever other parameter is being estimated, and only 5% of all samples
would yield an erroneous interval. The most frequently used confidence levels are 95%,
99%, and 90%. The higher the confidence level, the more strongly we believe that the
value of the parameter being estimated lies within the interval.
Information about the precision of an interval estimate is conveyed by the width of
the interval. If the confidence level is high and the resulting interval is quite narrow, our

Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has
deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
7.2 Large-Sample Confidence Intervals for a Population Mean 299

knowledge of the value of the parameter is reasonably precise. A very wide confidence in-
terval, however, gives the message that there is a great deal of uncertainty concerning the
value of what we are estimating. Figure 7.2 shows 95% confidence intervals for true average
breaking strengths of two different brands of paper towels. One of these intervals suggests
precise knowledge about , whereas the other suggests a very wide range of plausible values.

Figure 7.2 Confidence intervals indicating precise (brand 1) and imprecise (brand 2) information about μ

A Confidence Interval for μ with Confidence Level 95%
A confidence interval for a population or process mean μ is based on the following properties of the sampling distribution of x̄:

μx̄ = μ    σx̄ = σ/√n
When n is large, the x̄ distribution is approximately normal (this is the Central Limit Theorem). Standardizing x̄ by subtracting its mean value and dividing by its standard deviation gives the following standardized variable, denoted by z to emphasize that its distribution is approximately standard normal (the z curve):

z = (x̄ − μ)/(σ/√n)
The difficulty with this standardized variable is that, in practice, the value of the population or process standard deviation σ will almost never be known to an investigator. Consider instead the standardized variable in which σ is replaced by the sample standard deviation s:

(x̄ − μ)/(s/√n)

Because there is sampling variability in this second standardized variable both in the numerator (because of x̄) and in the denominator (the value of s will also vary from sample to sample), it would seem as though its distribution should be more spread out than the z curve. But appearances are deceiving! It turns out that when n is large, replacement of σ by s does not add much variability; in this case, the variable z = (x̄ − μ)/(s/√n) also has approximately a standard normal distribution.
A confidence interval with a 95% confidence level is obtained by starting with a
central z curve area of .95. As Figure 7.3 illustrates, the z critical values 1.96 and −1.96
capture this area (consult Appendix Table I).
The foregoing facts justify the following probability statement:
P(−1.96 < (x̄ − μ)/(s/√n) < 1.96) ≈ .95

Figure 7.3 Capturing a central z curve area of .95 (lower-tail and upper-tail areas of .025 each, z critical values −1.96 and 1.96)

Now let’s manipulate the inequalities inside the parentheses to isolate μ in the middle and move everything else to the two extremes. This is achieved as follows:
1. Multiply all three terms by s/√n.
2. Subtract x̄ from all three terms (leaving only −μ in the middle).
3. Multiply by −1 (causing the direction of each inequality to reverse).
The result is x̄ + 1.96(s/√n) > μ > x̄ − 1.96(s/√n), or, rewriting the terms in reverse order,

x̄ − 1.96·s/√n < μ < x̄ + 1.96·s/√n
These new inequalities are algebraically equivalent to those we started with, so
the probability associated with the new inequalities is also (approximately) .95. That is,
think of x̄ − 1.96(s/√n) as the lower limit and x̄ + 1.96(s/√n) as the upper limit of an interval. Both of these limits involve x̄ and s, so the values of both limits will vary from sample to sample. With a probability of approximately .95, the selected sample will be such that the value of μ is captured between these two interval limits. Substituting the values of n, x̄, and s from any particular sample into these expressions gives a confidence interval for μ with a confidence level of approximately 95%.

A large-sample confidence interval for μ with a confidence level of (approximately) 95% has

lower confidence limit = x̄ − 1.96·s/√n
upper confidence limit = x̄ + 1.96·s/√n

The interval is centered at x̄ and extends out the same distance, 1.96s/√n, to each side, so it can be written in abbreviated form as

x̄ ± 1.96·s/√n

This formula is valid whatever the shape of the population distribution.

The two limits x̄ ± (1.96)s/√n can also be obtained by replacing each < inside the parentheses in the probability statement by = and solving the two resulting equations for μ.

Example 7.3 The alternating-current (AC) breakdown voltage of an insulating liquid indicates
its dielectric strength. The article “Testing Practices for the AC Breakdown Voltage
Testing of Insulation Liquids” (IEEE Electrical Insulation Magazine, 1995: 21–26)
gave the accompanying sample observations on breakdown voltage (kV) of a particu-
lar circuit under certain conditions:
62 50 53 57 41 53 55 61 59 64 50 53 64 62 50 68
54 55 57 50 55 50 56 55 46 55 53 54 52 47 47 55
57 48 63 57 57 55 53 59 53 52 50 55 60 50 56 58

Figure 7.4 shows the output from the JMP software’s Analyze/Distribution com-
mand. The boxplot of the data shows a high concentration in the middle half of the
data (narrow box width). There is a single outlier at the upper end, but this value
is actually a bit closer to the median (55) than is the smallest sample observation.

Distributions: Voltage

Quantiles                          Moments
100.0%  maximum   68               Mean            54.708333
 99.5%            68               Std Dev          5.230672
 97.5%            67.1             Std Err Mean     0.7549825
 90.0%            62.1             Upper 95% Mean  56.227162
 75.0%  quartile  57               Lower 95% Mean  53.189505
 50.0%  median    55               N               48
 25.0%  quartile  50.5
 10.0%            47.9
  2.5%            42.125
  0.5%            41
  0.0%  minimum   41

Figure 7.4 Output from JMP for the breakdown voltage data from Example 7.3

Summary quantities include n = 48, x̄ = 54.7, and s = 5.23. The 95% confidence interval is then

54.7 ± 1.96(5.23/√48) = 54.7 ± 1.5 = (53.2, 56.2)

That is,

53.2 < μ < 56.2

with a confidence level of approximately 95%. The interval is reasonably narrow, indicating that we have precisely estimated μ. Note that our lower and upper interval
endpoints match JMP’s “Lower 95% Mean” and “Upper 95% Mean,” respectively.
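The interval in Example 7.3 can be reproduced with a few lines of code. Below is a minimal Python sketch (function name ours); the z critical value is obtained from the inverse standard normal cdf rather than from Appendix Table I.

from math import sqrt
from statistics import NormalDist

def large_sample_ci(x_bar, s, n, level=0.95):
    # x-bar +/- z * s / sqrt(n), where z captures a central area of `level`
    z = NormalDist().inv_cdf((1 + level) / 2)   # 1.96 when level = 0.95
    h = z * s / sqrt(n)
    return x_bar - h, x_bar + h

# Summary quantities from Example 7.3: n = 48, x-bar = 54.7, s = 5.23
print(large_sample_ci(54.7, 5.23, 48))   # approximately (53.2, 56.2)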


The 95% confidence interval for μ in the foregoing example is (53.2, 56.2). It is tempting to say that there is a 95% chance that μ is between 53.2 and 56.2. Do not yield to this temptation! The 95% refers to the long-run percentage of all possible samples resulting in an interval that includes μ. That is, if we consider taking sample after sample from the population and use each one separately to compute a 95% confidence interval, in the long run roughly 95% of these intervals will capture μ. Figure 7.5 illustrates this for 100 samples; 93 of the resulting intervals include μ, whereas 7 do not. Without knowing the value of μ, we cannot tell whether our interval (53.2, 56.2) is one of the
good 95% or the bad 5% of all intervals that might result. The confidence level refers to
the method used to construct the interval rather than to any particular calculated interval.

Figure 7.5 95% confidence intervals for μ from 100 different samples (* identifies an interval that does not include μ)

Other Confidence Levels and a General Formula


The confidence level of 95% was inherited from the probability .95 with which we be-
gan the derivation of the interval. This probability in turn dictated the use of the z criti-
cal value 1.96 in the confidence interval formula. It follows that if we want a confidence
level of 99%, we should identify the z critical value that captures a central z curve area
of .99. Figure 7.6 shows how this is done. In Appendix Table I, the closest entries for the
cumulative area .9950 are .9949, in the 2.5 row and .07 column, and .9951, in the same

row and .08 column. Thus 2.576 (or 2.58, to be conservative) should be used in the CI formula in place of 1.96 to obtain the higher confidence level.

Figure 7.6 Finding the z critical value for a 99% confidence level (central z curve area .99; each tail area .01/2 = .005; cumulative area .995, giving z critical values −2.576 and 2.576)
It should be clear at this point that any confidence level can be achieved simply by
finding the z critical value that captures the corresponding z curve area. For example, it is
easily verified that the interval from −1.28 to 1.28 captures about 80% of the area
under the z curve, so using 1.28 in place of 1.96 gives a CI with confidence level 80%.

A large-sample confidence interval for a population or process mean μ is given by the formula

x̄ ± (z critical value)·s/√n
As a general rule, this interval is appropriate when the sample size exceeds 30. The three
most commonly used confidence levels, 90%, 95%, and 99%, use critical values of 1.645,
1.96, and 2.576, respectively.
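For confidence levels not listed, the z critical value is just the standard normal quantile that leaves area (1 − confidence level)/2 in each tail. A quick Python check (our own illustration):

from statistics import NormalDist

for level in (0.80, 0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    print(f"{level:.0%} confidence: z critical value = {z:.3f}")
# 80%: 1.282, 90%: 1.645, 95%: 1.960, 99%: 2.576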

Why settle for 95% confidence when 99% confidence is possible? The price of a
higher confidence level is that the resulting interval is wider. The width of the 95%
interval is 2(1.96s/√n), whereas the 99% interval has a width of 2(2.576s/√n). The
higher reliability of the 99% interval entails a loss in precision (as indicated by the wider
interval). Many investigators think that a 95% confidence level gives a reasonable com-
promise between reliability and precision.

Choosing the Sample Size


The half-width 1.96s/√n of the 95% CI is sometimes called the bound on the error of estimation associated with a 95% confidence level; that is, with 95% confidence, the point estimate x̄ will be no farther than this from μ. Before obtaining data, an investi-
gator may wish to determine a sample size for which a particular value of the bound
is achieved. For example, with μ representing the average fuel efficiency (mpg) for all cars of a certain type, the objective of an investigation may be to estimate μ to within 1 mpg with 95% confidence. More generally, suppose we wish to estimate μ to within an amount B (the specified bound on the error of estimation) with 95% confidence. This implies that B = 1.96s/√n, from which

n = [1.96s/B]²

The difficulty with this formula is that calculating the value of n requires having s, which is of course not available until a sample has been selected. Instead, prior information about σ may be used as a basis for a reasonable guess for s. Alternatively, for a
population distribution that is not too skewed, dividing the range (difference between
the largest and smallest values) by 4 often gives a rough idea of what s might be.

Example 7.4 Refer to Example 7.3 on breakdown voltage. Suppose that the investigator believes
that almost all values in the population distribution are between 40 and 70. Then
(70 − 40)/4 = 7.5 gives a reasonable value for s. The appropriate sample size for estimating true average breakdown voltage to within 1 kV with confidence level 95% is now

n = [(1.96)(7.5)/1]² ≈ 217

The sample size associated with an error bound B for any other confidence level, such
as 99%, results from replacing 1.96 in the formula for n by the corresponding critical
value, for example, 2.576.
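The sample-size calculation is a one-liner in code; here is a minimal Python sketch (names ours) that rounds up to the next whole number:

from math import ceil

def sample_size_for_mean(s_guess, B, z=1.96):
    # Smallest n with z * s / sqrt(n) <= B, i.e. n >= (z * s / B)**2
    return ceil((z * s_guess / B) ** 2)

print(sample_size_for_mean(7.5, 1))          # 217, as in Example 7.4
print(sample_size_for_mean(7.5, 1, 2.576))   # 374 for 99% confidence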

One-Sided Confidence Intervals (Confidence Bounds)
The confidence intervals discussed thus far give both a lower confidence bound and an
upper confidence bound for μ. In some circumstances, an investigator will want only
one of these two types of bounds. For example, a psychologist may wish to calculate a
95% upper confidence bound for true average reaction time to a particular stimulus, or
a reliability engineer may want only a lower confidence bound for true average lifetime
of components of a certain type. It is easily verified that the cumulative area under the z curve to the left of 1.645 is .95, implying that

P((x̄ − μ)/(s/√n) < 1.645) ≈ .95

Manipulating the inequality inside the parentheses to isolate μ on one side gives the equivalent inequality μ > x̄ − 1.645s/√n; the expression on the right is the desired lower confidence bound. Starting with P(−1.645 < z) ≈ .95 and manipulating the
inequality results in the upper confidence bound. A similar argument gives a one-sided
bound associated with any other confidence level.

A large-sample upper confidence bound for μ is

μ < x̄ + (z critical value)·s/√n

and a large-sample lower confidence bound for μ is

μ > x̄ − (z critical value)·s/√n
The three most commonly used confidence levels, 90%, 95%, and 99%, use critical values
of 1.28, 1.645, and 2.33, respectively.

Example 7.5 Recently there has been increased use of titanium and its alloys in aerospace
and automotive applications. These alloys are highly durable and have a high
strength-to-weight ratio. However, machining of titanium is difficult due to its low
thermal conductivity. The authors of “Modelling and Multi-Objective Optimization of Process Parameters of Wire Electrical Discharge Machining Using Non-Dominated Sorting Genetic Algorithm-II” (J. of Engr. Manuf., 2012: 1186–2001) investigated different settings that impact wire electrical discharge machining of titanium 6-2-4-2. A characteristic of interest was surface roughness (in μm) of the metal after machining. In one particular investigation a sample of 54 surface roughness observations gave a sample mean of 1.9042 μm and a sample standard deviation of .1455 μm. An upper confidence bound for true average surface roughness μ with confidence 95% is

1.9042 + 1.645(.1455/√54) = 1.9042 + .0326 = 1.9368

That is, with a confidence level of 95%, the value of μ lies in the interval (−∞, 1.9368).
Since negative values for surface roughness are not possible, we revise this interval
to (0, 1.9368).
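A one-sided bound requires only the corresponding one-tail critical value. Here is the upper-bound computation of Example 7.5 as a short Python sketch (function name ours):

from math import sqrt

def upper_confidence_bound(x_bar, s, n, z=1.645):
    # mu < x-bar + z * s / sqrt(n); z = 1.645 gives a 95% upper bound
    return x_bar + z * s / sqrt(n)

# Surface roughness summary from Example 7.5: n = 54, x-bar = 1.9042, s = .1455
print(round(upper_confidence_bound(1.9042, 0.1455, 54), 4))   # 1.9368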

Section 7.2 Exercises


7. Assuming that n is large, determine the confidence level for each of the following two-sided confidence intervals:
a. x̄ ± 3.09s/√n   b. x̄ ± 2.81s/√n
c. x̄ ± 1.44s/√n   d. x̄ ± s/√n

8. What z critical value in the large-sample two-sided confidence interval for μ should be used to obtain each of the following confidence levels?
a. 98%   b. 85%
c. 75%   d. 99.9%

9. Discuss how each of the following factors affects the width of the large-sample two-sided confidence interval for μ:
a. Confidence level (for fixed n and s)
b. Sample size n (for fixed confidence level and s)
c. Sample standard deviation s (for fixed confidence level and n)

10. Each of the following is a confidence interval for μ = true average (i.e., population mean) resonance
frequency (Hz) for all tennis rackets of a certain type:

(114.4, 115.6)   (114.1, 115.9)

a. What is the value of the sample mean resonance frequency?
b. Both intervals were calculated from the same sample data. The confidence level for one of these intervals is 90% and for the other is 99%. Which of the intervals has the 90% confidence level, and why?

11. Suppose that a random sample of 50 bottles of a particular brand of cough syrup is selected, and the alcohol content of each bottle is determined. Let μ denote the average alcohol content for the population of all bottles of the brand under study. Suppose that the resulting 95% confidence interval is (7.8, 9.4).
a. Would a 90% confidence interval calculated from this same sample have been narrower or wider than the given interval? Explain your reasoning.
b. Consider the following statement: There is a 95% chance that μ is between 7.8 and 9.4. Is this statement correct? Why or why not?
c. Consider the following statement: We can be highly confident that 95% of all bottles of this type of cough syrup have an alcohol content that is between 7.8 and 9.4. Is this statement correct? Why or why not?
d. Consider the following statement: If the process of selecting a sample of size 50 and then computing the corresponding 95% interval is repeated 100 times, 95 of the resulting intervals will include μ. Is this statement correct? Why or why not?

12. Heavy-metal pollution of various ecosystems is a serious environmental threat, in part because of the potential transference of hazardous substances to humans via food. The article “Cadmium, Zinc, and Total Mercury Levels in the Tissues of Several Fish Species from La Plata River Estuary, Argentina” (Environmental Monitoring and Assessment, 1993: 119–130) reported the following summary data on zinc concentration (μg/g) in the liver of fish:

Species            n     x̄      s
Mugil liza        56    9.15   1.27
Pogonias cromis   61    3.08   1.71

a. Calculate a 95% two-sided confidence interval for population mean concentration for the Mugil liza species.
b. Calculate a 99% two-sided confidence interval for population mean concentration for the Pogonias cromis species. Why is this interval wider than the interval of part (a) even though it is based on a somewhat larger sample size?

13. Young people may feel they are carrying the weight of the world on their shoulders when in reality they are too often carrying an excessively heavy backpack. The article “Effectiveness of a School-Based Backpack Health Promotion Program” (Work, 2003: 113–123) reported the following data for a sample of 131 sixth graders: for backpack weight (lbs), x̄ = 13.83, s = 5.05; for backpack weight as a percentage of body weight, a 95% CI for the population mean was (13.62, 15.89).
a. Calculate and interpret a 99% CI for population mean backpack weight.
b. Obtain a 99% CI for population mean weight as a percentage of body weight.
c. The American Academy of Orthopedic Surgeons recommends that backpack weight be at most 10% of body weight. What does your calculation of part (b) suggest and why?

14. The article “Extravisual Damage Detection? Defining the Standard Normal Tree” (Photogrammetric Engr. and Remote Sensing, 1981: 515–522) discusses the use of color infrared photography in identification of normal trees in Douglas fir stands. Among data reported were summary statistics for green-filter analytic optical densitometric measurements on samples of both healthy and diseased trees. For a sample of 69 healthy trees, the sample mean dye-layer density was 1.028, and the sample standard deviation was .163.
a. Calculate a 95% two-sided CI for the true average dye-layer density for all such trees.
b. Suppose the investigators had made a rough guess of .16 for the value of s before collecting data. What sample size would be necessary to obtain an interval width of .05 for a confidence level of 95%?

15. The negative effects of ambient air pollution on children’s lung function have been well established, but less research is available about the effects of
indoor air pollution. The authors of “Indoor Air Pollution and Lung Function Growth Among Children in Four Chinese Cities” (Indoor Air, 2012: 3–11) investigated the relationship between indoor air pollution metrics and lung function growth among children ages 6–13 years living in four Chinese cities. For each subject in the study, the authors measured an important lung-capacity index known as FEV1, the forced volume (in ml) of air that is exhaled in 1 second. Higher FEV1 values are associated with greater lung capacity.
Burning coal inside houses can lead to increased levels of indoor air toxins that may have negative effects on lung function. Among the children in the study, 514 came from households that use coal for cooking or heating or both. Their FEV1 mean was 1427 with standard deviation 325. (Using a complex statistical procedure the authors went on to show that burning coal had a clear negative effect on mean FEV1 levels.)
a. Calculate and interpret a 95% (two-sided) confidence interval for true average FEV1 level in the population of all children from which the sample was selected.
b. Suppose the investigators had made a rough guess of 320 for the value of s before collecting data. What sample size would be necessary to obtain an interval width of 50 ml for a confidence level of 95%?

16. The article “Evaluating Tunnel Kiln Performance” (Amer. Ceramic Soc. Bull., August 1997: 59–63) gave the following summary information for fracture strengths (MPa) of n = 169 ceramic bars fired in a particular kiln: x̄ = 89.10, s = 3.73.
a. Calculate a two-sided confidence interval for true average fracture strength using a confidence level of 95%. Does it appear that true average fracture strength has been precisely estimated?
b. Suppose the investigators had believed a priori that the population standard deviation was about 4 MPa. Based on this supposition, how large a sample would have been required to estimate μ to within .5 MPa with 95% confidence?

17. When the population distribution is normal and n is large, the statistic s has approximately a normal distribution with μs ≈ σ and σs ≈ σ/√(2n). Use this fact to develop a large-sample two-sided confidence interval formula for σ. Then calculate a 95% confidence interval for the true standard deviation of the fracture strength distribution based on the data given in Exercise 16 (the cited paper gave compelling evidence in support of assuming normality).

18. Determine the confidence level for each of the following large-sample one-sided confidence bounds:
a. Upper bound: x̄ + .84s/√n
b. Lower bound: x̄ − 2.05s/√n
c. Upper bound: x̄ + .67s/√n

19. The charge-to-tap time (min) for a carbon steel in one type of open hearth furnace was determined for each heat in a sample of size 36, resulting in a sample mean time of 382.1 and a sample standard deviation of 31.5. Calculate a 95% upper confidence bound for true average charge-to-tap time.

20. A Brinell hardness test involves measuring the diameter of the indentation made when a hardened steel ball is pressed into material under a standard test load. Suppose that the Brinell hardness is determined for each specimen in a sample of size 32, resulting in a sample mean hardness of 64.3 and a sample standard deviation of 6.0. Calculate a 99% lower confidence bound for true average Brinell hardness for material specimens of this type.

21. The article “Ultimate Load Capacities of Expansion Anchor Bolts” (J. of Energy Engr., 1993: 139–158) gave the following summary data on shear strength (kip) for a sample of 3/8-in. anchor bolts: n = 78, x̄ = 4.25, s = 1.30. Calculate a lower confidence bound using a confidence level of 90% for true average shear strength.

7.3 More Large-Sample Confidence Intervals


In Section 7.2, we used properties of the sampling distribution of x̄ as a basis for obtaining a confidence interval formula for estimating μ when the sample size was large. In this section, we develop a large-sample interval formula for π, the proportion of individuals
or objects in a population or process that possess a particular characteristic, and also for
μ1 − μ2, the difference between two population or process means. These intervals are
based on sampling distribution properties of appropriate statistics.

A Large-Sample Confidence Interval for π


Let π denote the proportion of individuals or objects in a population or process that possess a particular characteristic (the successes). For example, π might represent the proportion of all components of a certain type that do not need service while under warranty, the proportion of all computers sold at a certain store that are laptop models, or the proportion of patients suffering from a certain disease who respond favorably to a particular treatment. An inference about π will be based on a random sample of size n selected from the population or process. The natural statistic for estimating π is the sample proportion

p = (number of successes in the sample)/n

For example, if n = 5 and the resulting sample is SFFSS (the first, fourth, and fifth sampled individuals possess the property of interest but the second and third do not), then p = 3/5 = .60. The value of p is also .60 for the outcomes SSSFF and SFSFS, whereas it is .20 for the outcome FSFFF and 1 for the outcome SSSSS. When n = 5, the six possible values of p are 0, .2, .4, .6, .8, and 1. The larger the sample size, the more values of p are possible.

The value of π for any such population is a fixed number between 0 and 1. If, however, we select sample after sample of size n from the same population or process, the value of p will vary from sample to sample. In the case n = 5, a first sample might give p = .6, a second sample p = .8, a third sample p = .6 again, and so on. The sampling distribution of the statistic p describes this long-run variation. Consider again n = 5 and suppose that π = .6. Using the same reasoning that led to the binomial distribution in Chapter 1, the long-run proportion of samples with p = 1 (corresponding to the single outcome SSSSS) is (.6)⁵ = .078. Similarly, there are five outcomes for which p = .8 (FSSSS, . . . , SSSSF), and the corresponding long-run proportion is 5(.6)⁴(.4) = .259. The complete sampling distribution is

p:                                    0     .2    .4    .6    .8    1
Long-run proportion (probability):  .010  .077  .230  .346  .259  .078

We can then easily verify that the mean value of the statistic p is

μp = 0(.010) + .2(.077) + … + 1(.078) = .60
That is, the sampling distribution of p is centered exactly at the value of what the statis-
tic is trying to estimate. This is true regardless of the values of π and n—the statistic is unbiased. Notice, however, that it is not highly likely that p = π; the sampling distribu-
tion is quite spread out about its mean value.
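This sampling distribution is just the binomial distribution re-expressed on the p scale. The short Python sketch below (our own illustration) reproduces the table for n = 5 and π = .6.

from math import comb

n, pi = 5, 0.6
for x in range(n + 1):
    prob = comb(n, x) * pi**x * (1 - pi)**(n - x)   # binomial probability
    print(f"p = {x/n:.1f}: probability {prob:.3f}")
# .010, .077, .230, .346, .259, .078 -- and the mean works out to pi = .6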

General properties of the sampling distribution of p:

1. μp = π
2. σp = √(π(1 − π)/n)
3. If both nπ > 5 and n(1 − π) > 5, the sampling distribution is approximately normal.

Because n is in the denominator under the square root in the expression for σp, the standard deviation decreases and the sampling distribution becomes more and more concentrated about π as the sample size increases. The two inequality conditions in the third property are designed to ensure that there is enough symmetry in the sampling distribution so that a normal curve with mean value π and standard deviation σp provides a good approximation to a histogram of the actual distribution. For example, if n = 100 but π = .02, there is too much (positive) skewness for the approximation to work well (much of the distribution is concentrated on the values 0, .01, .02, .03, and .04, and the rest trails out to 1, so there is almost no lower tail).

The foregoing properties allow us to form a variable having approximately a standard normal distribution when n is large:

z = (p − π)/√(π(1 − π)/n)

Using z* to denote an appropriate z critical value (1.96, 1.645, etc.), we have that

P(−z* < (p − π)/√(π(1 − π)/n) < z*) ≈ 1 − α (the corresponding central z curve area)

As suggested earlier in the derivation of our first confidence interval for , consider re-
placing each , inside the parentheses by 5 and solving the two resulting equations for
 to obtain the confidence limits. Unfortunately, these equations are not as easy to solve
as were the earlier ones. This is because π appears both in the numerator and in the
denominator. The equations are therefore both quadratic. Using the general formula for
the solution to a quadratic equation gives the following confidence interval.

A confidence interval for a population proportion π is

(p + z*²/(2n) ± z*·√(p(1 − p)/n + z*²/(4n²))) / (1 + z*²/n)

where z* denotes an appropriate z critical value, the − sign in the numerator gives the lower confidence limit, and the + sign gives the upper confidence limit. The critical values corresponding to the most frequently used confidence levels, 90%, 95%, and 99%, are 1.645, 1.96, and 2.576, respectively. A lower confidence bound for π results from using only the − sign in the formula (along with the appropriate z*), and using only the + sign gives an upper confidence bound.

Although the preceding interval was derived from the large-sample distribution
of p, recent research has shown that it performs well even when n is quite small.
Additionally, the actual confidence level achieved by the interval is almost always
quite close to the desired level corresponding to the choice of any particular z critical
value. For example, using 1.96 as the z critical value implies a desired confidence
level of 95%, and the actual confidence level (long-run capture percentage if the
formula is used repeatedly on different samples) will almost always be roughly 95%.
When n is quite large, the three terms in the CI formula involving z* are negligible
compared to the three remaining terms. In this case, the CI reduces to the traditional
interval

p ± (z critical value)·√(p(1 − p)/n)

This latter interval has the same general form as our earlier large-sample interval for μ.

Example 7.6 The article “Repeatability and Reproducibility for Pass/Fail Data” (J. of Testing and
Eval., 1997: 151–153) reported that in n = 48 trials in a particular laboratory, 16 resulted in ignition of a particular type of substrate by a lighted cigarette. Let π denote the long-run proportion of all such trials that would result in ignition. A point estimate for π is p = 16/48 = .333. A confidence interval for π with a confidence level of approximately 95% is

(.333 + (1.96)²/96 ± 1.96·√((.333)(.667)/48 + (1.96)²/9216)) / (1 + (1.96)²/48)

= (.373 ± .139)/1.08 = (.217, .474)

This interval is rather wide, indicating imprecise information about π. The traditional interval is

.333 ± 1.96·√((.333)(.667)/48) = .333 ± .133 = (.200, .466)
These two intervals would be in much closer agreement were the sample size sub-
stantially larger.
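Both computations in Example 7.6 follow directly from the formulas; a minimal Python sketch (function names ours) is given below. Tiny differences in the last digit relative to the example come from rounding p to .333 there.

from math import sqrt
from statistics import NormalDist

def score_interval(x, n, level=0.95):
    # CI for pi from the quadratic formula given in the box above
    z = NormalDist().inv_cdf((1 + level) / 2)
    p = x / n
    center = p + z**2 / (2 * n)
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - half) / (1 + z**2 / n), (center + half) / (1 + z**2 / n)

def traditional_interval(x, n, level=0.95):
    z = NormalDist().inv_cdf((1 + level) / 2)
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

print(score_interval(16, 48))         # about (.217, .475)
print(traditional_interval(16, 48))   # about (.200, .467)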

A Bound on the Error of Estimation


The quantity 1.96σp = 1.96√(π(1 − π)/n) gives a bound on the error of estimation with a 95% confidence level in the sense that in the long run, p should be within this distance of π for roughly 95% of all samples. If the desired value of the bound is B, equating this to 1.96σp and solving for the necessary sample size n gives

n = π(1 − π)[1.96/B]²

If some other confidence level is desired, the corresponding z critical value replaces 1.96. The difficulty with using this formula is that it involves the unknown π. A conservative approach utilizes the fact that π(1 − π) is largest when π = .5. The sample size resulting from this choice of π will be large enough so that the bound B is achieved with the desired confidence level no matter what the value of π.

Example 7.7 A survey is to be carried out to estimate the proportion of all registered voters in a
particular state who favor certain term limits for their state legislators. How many
people should be included in a random sample to estimate this proportion to within
the amount .05 with 95% confidence? Substituting π = .5 in the formula for n gives
$$n = .5(1 - .5)(1.96/.05)^2 = 384.16$$
so a sample size of 385 should be used. The resulting 95% confidence interval for
π will have a half-width of at most .05 regardless of the value of p. Notice that this
sample size is far larger than what appeared in the previous example, which explains
why that interval was so wide.
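In code, this conservative sample-size calculation is a one-liner; a minimal Python sketch (the helper name is ours):

from math import ceil

def conservative_n(B, z=1.96, pi=0.5):
    # n = pi(1 − pi)(z/B)^2; pi = .5 maximizes pi(1 − pi), so this n
    # achieves the bound B at the stated confidence level for any true pi.
    return ceil(pi * (1 - pi) * (z / B) ** 2)

print(conservative_n(0.05))  # 385, as in Example 7.7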

A Large-Sample Confidence Interval for μ1 − μ2


The symbols μ and σ have been used to denote the mean value and standard deviation,
respectively, of a population, process, or treatment response distribution. When two different
populations, processes, or treatments are being compared, different subscripts will be used
to differentiate characteristics of the first from those of the second. Similar notation is used
to distinguish between the two sample sizes, sample means, and sample standard deviations.

Notation

                                         Mean value   Variance   Standard deviation
Population, process, or treatment 1      μ1           σ1²        σ1
Population, process, or treatment 2      μ2           σ2²        σ2

                                         Sample size  Sample mean  Sample variance  Sample standard deviation
Sample from population, process,
  or treatment 1                         n1           x̄1           s1²              s1
Sample from population, process,
  or treatment 2                         n2           x̄2           s2²              s2

It is assumed that the observations in the first sample were obtained completely inde-
pendently from those in the second sample. Notice that our notation allows for the
possibility that the two sample sizes might be different. This might happen because
one population, process, or treatment is more expensive to sample than the other, or
perhaps because observations are “lost” in the course of obtaining data; for example,
several animals receiving a first diet die (hopefully for reasons unrelated to the diet).

Example 7.8 A study was carried out to compare population mean lifetimes (hr) for two different
brands of AA alkaline batteries used in a particular manner. Here, μ1 is the mean
lifetime of all brand 1 batteries and σ1 is the population standard deviation of brand 1
lifetimes; μ2 and σ2 are the mean value and standard deviation for the distribution of
brand 2 lifetimes. Values of the summary quantities calculated from the two resulting
samples are as follows:
Brand 1: n1 = 50   x̄1 = 4.15   s1 = 1.79
Brand 2: n2 = 45   x̄2 = 4.53   s2 = 1.64
Consider estimating the difference μ1 − μ2. The natural statistic for estimating
μ1 is x̄1, and the statistic x̄2 gives an estimate for μ2. The difference between the two
x̄’s then gives an estimate of the difference between the two μ’s. The point estimate
from the data is 4.15 − 4.53 = −.38. That is, we estimate that, on average, brand 2
batteries last .38 hr longer than do brand 1 batteries. If the labels 1 and 2 on the two
brands had been reversed, the point estimate would be +.38, and the interpretation
would be the same as with the original labeling.

Both x̄1 and x̄2 vary in value from sample to sample, and this will also be true of their
difference. For example, repeating the study described in Example 7.8 with the same
sample sizes might result in x̄1 = 4.02 and x̄2 = 4.75, giving the estimate −.73. Just as a
confidence interval for a single μ was based on properties of the x̄ sampling distribution,
a confidence interval for μ1 − μ2 is derived from properties of the sampling distribution
of the statistic x̄1 − x̄2. These properties follow from the following general results:
1. For any two random variables x and y,
   $\mu_{x-y}$ = mean value of the difference = $\mu_x - \mu_y$ = difference between the two means
2. If x and y are two independent random variables, then
   $\sigma^2_{x-y}$ = variance of the difference = $\sigma^2_x + \sigma^2_y$ = sum of the variances
3. If x and y are independent random variables, each with a normal distribution, then
   the difference x − y also has a normal distribution. If each variable is approximately
   normal, then the distribution of the difference is also approximately normal.

Properties of the Sampling Distribution of x̄1 − x̄2

1. $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2$, so that x̄1 − x̄2 is an unbiased statistic for estimating
   μ1 − μ2.
2. $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$, from which the standard deviation of x̄1 − x̄2 is
   $$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
3. If both population distributions are normal, the sampling distribution of x̄1 − x̄2 is normal.
4. If both sample sizes are large, then the sampling distribution of x̄1 − x̄2 will be
   approximately normal irrespective of the shapes of the two population distributions
   (a consequence of the Central Limit Theorem).


The unbiasedness of x̄1 − x̄2 means that the sampling distribution of this statistic is always centered at the value of what the statistic is trying to estimate. If,
for example, μ1 = 110 and μ2 = 100, then the sampling distribution is centered at
110 − 100 = 10, whereas if μ1 = 100 and μ2 = 105, the mean value of the statistic
is 100 − 105 = −5. In addition to knowing that the sampling distribution is centered at the right place, we would also like it to be highly concentrated about its
center. This will be the case if the variance and standard deviation of the statistic are
small. The two σ² values are in the numerator of the variance and the n’s are in the
denominator. So when there is little variability in the two population, process, or
treatment distributions (small values of σ²), the variance and standard deviation will
be small even when the sample sizes are small. On the other hand, a great deal of
variability in each distribution can be counteracted by increasing the sample sizes
to again obtain a small variance and standard deviation (at the price of expending
more resources to collect data).
Now consider using the foregoing results to standardize x̄1 − x̄2 when both sample
sizes are large. This entails subtracting the mean value of the statistic and then dividing by
its standard deviation. The standard deviation involves σ1² and σ2², and the values of these
variances are almost never available to an investigator. Fortunately, because of the large
n’s, we can replace the σ² values by the sample variances and still end up with a z variable.

When n1 and n2 are both large, the standardized variable
$$z = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
has approximately a standard normal distribution (the z curve). Using this variable in the
same way that z variables were used earlier to obtain confidence intervals for μ and for π
gives the following large-sample confidence interval formula for estimating μ1 − μ2:
$$\bar{x}_1 - \bar{x}_2 \pm (z \text{ critical value})\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
This formula is valid irrespective of the shapes of the two underlying distributions. The
three most frequently used confidence levels of 95%, 99%, and 90% are achieved by using
the critical values 1.96, 2.576, and 1.645, respectively.

Example 7.9 An experiment carried out to study various characteristics of anchor bolts resulted in
78 observations on shear strength (kip) of 3/8-in. diameter bolts and 88 observations
on strength of 1/2-in. diameter bolts. Summary quantities from Minitab follow, and
a comparative boxplot appears in Figure 7.7. The sample sizes, sample means, and
sample standard deviations agree with values given in the article “Ultimate Load
Capacities of Expansion Anchor Bolts” (J. Energy Engr., 1993: 139–158). The summaries suggest that the main difference between the two samples is in where they are
centered. Let’s now calculate a confidence interval for the difference between true
average shear strength for 3/8-in. bolts (μ1) and true average shear strength for 1/2-in.
bolts (μ2) using a confidence level of 95%:
$$4.25 - 7.14 \pm (1.96)\sqrt{\frac{(1.30)^2}{78} + \frac{(1.68)^2}{88}} = -2.89 \pm (1.96)(.2318) = -2.89 \pm .45 = (-3.34,\ -2.44)$$

Variable N Mean Median TrMean StDev SEMean


diam 3/8 78 4.250 4.230 4.238 1.300 0.147
Variable Min Max Q1 Q3
diam 3/8 1.634 7.327 3.389 5.075
Variable N Mean Median TrMean StDev SEMean
diam 1/2 88 7.140 7.113 7.150 1.680 0.179
Variable Min Max Q1 Q3
diam 1/2 2.450 11.343 5.965 8.447

Figure 7.7 A comparative boxplot of the shear strength data (strength by bolt type)

That is, with 95% confidence, −3.34 < μ1 − μ2 < −2.44. We can therefore be highly
confident that the true average shear strength for the 1/2-in. bolts exceeds that for
the 3/8-in. bolts by between 2.44 kip and 3.34 kip. Notice that if we relabel so that μ1
refers to 1/2-in. bolts and μ2 to 3/8-in. bolts, the confidence interval is now centered
at +2.89 and the value .45 is still subtracted and added to obtain the confidence
limits. The resulting interval is (2.44, 3.34), and the interpretation is identical to that
for the interval previously calculated.
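The interval in Example 7.9 can be verified in a few lines of Python (a sketch; the helper name is ours):

from math import sqrt

def two_sample_z_ci(xbar1, s1, n1, xbar2, s2, n2, z=1.96):
    # Large-sample CI: (x̄1 − x̄2) ± z·sqrt(s1²/n1 + s2²/n2)
    half = z * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = xbar1 - xbar2
    return (diff - half, diff + half)

print(two_sample_z_ci(4.25, 1.30, 78, 7.14, 1.68, 88))  # about (-3.34, -2.44)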


Section 7.3 Exercises


22. The American Taxpayer Relief Act of 2012 was passed by the U.S. Congress on January 1, 2013. This act helped address what became famously known as the “fiscal cliff” crisis. However, during the last months of 2012, heated debates concerning the crisis were ongoing in Congress, and there was growing concern political gridlock was preventing solution of the crisis by the end-of-year deadline. In mid-December, a USA Today–Gallup poll reported that only 18% of a sample of 1025 adult Americans approved of the job Congress was doing in working toward a solution to the looming fiscal cliff. Calculate a two-sided confidence interval using a 99% confidence level for the proportion of all U.S. adults who approved of the congressional handling of the crisis in December 2012.

23. TV advertising agencies face growing challenges in reaching audience members because viewing TV programs via digital streaming is increasingly popular. The Harris poll reported on November 13, 2012, that 53% of 2343 American adults surveyed said they have watched digitally streamed TV programming on some type of device.
a. Calculate and interpret a confidence interval at the 99% confidence level for the proportion of all adult Americans who have watched streamed programming.
b. What sample size would be required for the width of a 99% CI to be at most .05 irrespective of the value of p?

24. In a sample of 1000 randomly selected consumers who had opportunities to send in a rebate claim form after purchasing a product, 250 said they never did so (“Rebates: Get What You Deserve,” Consumer Reports, May 2009: 7). Reasons cited for their behavior included too many steps in the process, rebate amount too small, missed deadline, fear of being placed on a mailing list, lost receipt, and doubts about receiving the money. Calculate an upper confidence bound at the 95% confidence level for the true proportion of such consumers who never apply for a rebate. Based on this bound, is there compelling evidence that the true proportion of such consumers is smaller than 1/3? Explain your reasoning.

25. The technology underlying hip replacements has changed as these operations have become more popular (more than 250,000 in the United States in 2008). Starting in 2003, highly durable ceramic hips were marketed. Unfortunately, for too many patients the increased durability has been counterbalanced by an increased incidence of squeaking. The May 11, 2008 issue of The New York Times reported that in one study of 143 individuals who received ceramic hips between 2003 and 2005, 10 developed squeaking problems.
a. Calculate a lower confidence bound at the 95% confidence level for the true proportion of such hips that develop squeaking.
b. Interpret the 95% confidence level used in part (a).

26. Researchers have developed a chemical treatment that retards the growth of trees of a certain type whose branches pose a safety threat to power lines. However, an overly severe application of the treatment can cause trees to die. In an experiment involving one particular treatment level applied to 250 trees, 38 trees died.
a. Calculate and interpret a 95% confidence interval for the proportion of all such trees that would die if the treatment were applied at the tested level.
b. The traditional CI for π discussed in Section 7.3 is based on the sample proportion p having approximately a normal sampling distribution, so the confidence level is only approximate rather than exact. Recent research has shown that under certain circumstances, its actual confidence level can deviate dramatically from the nominal one chosen by the investigator (e.g., the actual level may be quite different from the 95% level selected). An article by two statisticians (Agresti, A., and B. A. Coull, “Approximate Is Better Than ‘Exact’ for Interval Estimation of a Binomial Proportion,” The American Statistician, May 1998: 119–126) has suggested the following remedy in the case
of a 95% confidence level: Add 2 to both the number of successes and the number of failures and then use the traditional formula. Do this for the data described in this exercise, and compare the resulting interval to the one you calculated in part (a).

27. Let π1 and π2 denote the proportion of successes in population 1 and population 2, respectively. An investigator sometimes wishes to calculate a confidence interval for the difference π1 − π2 between these two population proportions. Suppose random samples of size n1 and n2, respectively, are independently selected from the two populations, and let p1 and p2 denote the resulting sample proportions of successes. If the sample sizes are sufficiently large (apply the rule of thumb appropriate for a single proportion to each sample separately), the statistic p1 − p2 has approximately a normal sampling distribution with mean value π1 − π2 and standard deviation $\sqrt{\pi_1(1-\pi_1)/n_1 + \pi_2(1-\pi_2)/n_2}$. The estimated standard deviation of this statistic results from replacing each π under the square root by the corresponding p.
a. Use the foregoing facts to obtain a large-sample two-sided 95% confidence interval formula for estimating π1 − π2.
b. Is the response rate for questionnaires affected by including some sort of incentive to respond along with the questionnaire? In one experiment, 110 questionnaires with no incentive resulted in 75 being returned, whereas 98 questionnaires that included a chance to win a lottery yielded 66 responses (“Charities, No; Lotteries, No; Cash, Yes,” Public Opinion Quarterly, 1996: 542–562). Calculate a two-sided 95% CI for the difference between the true response proportions under these circumstances. Does the interval suggest that, in fact, the values of π1 and π2 are different? Explain your reasoning.
c. Recent research has shown that “coverage probability” and small-sample behavior are improved by adding one success and one failure to each sample and then using the formula you obtained in part (a). Do this for the data of part (b).

28. The article “The Effects of Cigarette Smoking and Gestational Weight Change on Birth Outcomes in Obese and Normal-Weight Women” (Amer. J. of Public Health, 1997: 591–596) reported on a random sample of 487 nonsmoking women of normal weight (body mass index between 19.8 and 26.0) who had given birth at a large metropolitan medical center. It was determined that 7.2% of these births resulted in children of low birth weight (less than 2500 g). The article also reported that 6.8% of a sample of 503 nonsmoking obese women (body mass index > 29) gave birth to children of low birth weight. Calculate a 95% lower confidence bound for the difference between the population proportion of normal-weight nonsmoking women and the population proportion of obese nonsmoking women who give birth to children of low birth weight. Hint: Refer to the previous problem.

29. Let π1 and π2 denote the proportions of successes in two different populations. Rather than estimate the difference π1 − π2 as described in Exercise 27, an investigator will often wish to estimate the ratio of the two π’s. If, for example, π1/π2 = 3, then successes occur three times as frequently in population 1 as they do in population 2. Alternatively, if the π’s refer to success proportions for two different treatments, then a ratio of 3 implies that the first treatment is three times as likely to result in a success as is the second treatment. Consider independent random samples of sizes n1 and n2 from the two different populations, which result in sample proportions p1 and p2, respectively. Also let u = number of successes in the first sample and v = number of successes in the second sample. When the n’s are both large, the statistic ln(p1/p2) has approximately a normal sampling distribution with approximate mean value and standard deviation ln(π1/π2) and $\sqrt{(n_1-u)/(un_1) + (n_2-v)/(vn_2)}$, respectively.
a. Use these facts to obtain a large-sample two-sided 95% CI for ln(π1/π2) and a CI for π1/π2 itself.
b. The article cited in Exercise 27 stated that in addition to 75 of 110 questionnaires without an incentive to respond being returned, 78 of
100 questionnaires that included a prepaid cash amount of $5 were returned. Calculate a 95% confidence interval for the ratio of the proportion of questionnaires returned when such a cash incentive is included to the proportion returned in the absence of any incentive. Does the interval suggest that such an incentive may not increase the likelihood of response?

30. A manufacturer of small appliances purchases plastic handles for coffeepots from an outside vendor. If a handle is cracked, it is considered defective and must be discarded. A very large shipment of handles is received. The proportion of defective handles, π, is of interest. How many handles from the shipment should be inspected to estimate π to within .1 with 99% confidence?

31. A manufacturer of exercise equipment is interested in estimating the proportion π of all purchasers of one of its products who still own the product two years after purchase. What sample size is required to estimate this proportion to within .05 with a confidence level of 90%?

32. Use the accompanying data to estimate with a 95% confidence interval the difference between true average compressive strength (N/mm²) for 7-day-old concrete specimens and true average strength for 28-day-old specimens (“A Study of Twenty-Five-Year-Old Pulverized Fuel Ash Concrete Used in Foundation Structures,” Proc. Inst. Civil Engrs., 1985: 149–165):
 7-day old:  n1 = 68   x̄1 = 26.99   s1 = 4.89
28-day old:  n2 = 74   x̄2 = 35.76   s2 = 6.43

33. Relative density was determined for one sample of second-growth Douglas fir 2 × 4s with a low percentage of juvenile wood and another sample with a moderate percentage of juvenile wood, resulting in the following data (“Bending Strength and Stiffness of Second-Growth Douglas Fir Dimension Lumber,” Forest Products J., 1991: 35–43):
Type       n    x̄      s
Low        35   .523   .0543
Moderate   54   .489   .0450
Estimate the difference between true average densities for the two types of wood in a way that conveys information about reliability and precision.

34. Is there any systematic tendency for part-time college faculty to hold their students to different standards than full-time faculty do? The article “Are There Instructional Differences Between Full-Time and Part-Time Faculty?” (College Teaching, 2009: 23–26) reported that for a sample of 125 courses taught by full-time faculty, the mean course GPA was 2.7186 and the standard deviation was .63342, whereas for a sample of 88 courses taught by part-timers, the mean and standard deviation were 2.8639 and .49241, respectively.
Calculate a confidence interval at the 99% level to estimate the true mean GPA difference between full-time and part-time faculty. Does it appear that true average course GPA for part-time faculty differs from that for faculty teaching full-time? Explain your reasoning.

35. An experiment was performed to compare the fracture toughness of high-purity Ni-maraging steel with commercial-purity steel of the same type. For 32 high-purity specimens, the sample mean toughness and sample standard deviation of toughness were 65.6 and 1.4, respectively, whereas for 32 commercial-purity specimens, the sample mean and sample standard deviation were 59.2 and 1.1, respectively. Estimate the difference between true average toughness for the high-purity steel and that for the commercial steel using a lower 95% confidence bound. Does your estimate demonstrate conclusively that this difference exceeds 5? Explain your reasoning.

36. An investigator wishes to estimate the difference between population mean lifetimes of two different brands of batteries under specified conditions. If the population standard deviations are both roughly 2 hr and equal sample sizes are to be selected, what value of the common sample size n will be necessary to estimate the difference to within .5 hr with 95% confidence?


7.4 Small-Sample Intervals Based on a Normal Population Distribution
Suppose we select a random sample of components of a certain type and determine the
lifetime of each one, resulting in data x1, x2, . . . , xn. This sample can be used as a basis
for calculating one of three different kinds of statistical intervals:

1. An interval of plausible values for the population mean lifetime, that is, a confidence interval for μ
2. An interval of plausible values for the lifetime of a single component of this type
that you are planning to buy at some time in the near future, that is, a prediction
interval for a single x value
3. An interval of values that includes a specified percentage, for example, 90%, of
the lifetime values for components in the population, that is, a tolerance interval
for a chosen percentage of x values in the population distribution

We have already seen how to calculate a z confidence interval for μ when n is large.
In this section, we assume that the sample has been selected from a normal population
distribution, and show how each of the three types of intervals can be obtained.

t Distributions and the One-Sample t Confidence Interval
When the population distribution is normal, the sampling distribution of x̄ is also normal
for any sample size n. This in turn implies that $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ has a standard normal distribution (the z curve). The large-sample interval for μ presented in Section 7.2
was based on replacing σ by s in z; for large n, little extra variability is introduced by this
substitution, so $(\bar{x} - \mu)/(s/\sqrt{n})$ also has approximately a standard normal distribution in
this case. However, for small n this is no longer true. The standardized variable with s in
the denominator varies much more in value from sample to sample than does the first
variable. The following proposition introduces a new type of probability distribution
needed for a small-sample interval.

Proposition Let x1, x2, . . . , xn be a random sample from a normal distribution. Then the standardized variable
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
has a type of probability distribution called a t distribution with n − 1 degrees of freedom (df).

A passing acquaintance with properties of t distributions is important for an understanding of various inferential procedures based on these distributions.


Properties of t Distributions
1. Any particular t distribution is specified by the value of a parameter called the number of degrees of freedom, abbreviated df. There is one t distribution with 1 df, another with
   2 df, yet another one with 3 df, and so on. The number of df for a t distribution can be
   any positive integer.
2. The density curve corresponding to any particular t distribution is bell-shaped and centered at 0, just like the z curve.
3. Any t curve is more spread out than the z curve.
4. As the number of df increases, the spread of the corresponding t curve decreases. Thus
   the most spread out of all t curves is the one with 1 df, the next most spread out is the
   one with 2 df, and so on.
5. As the number of df increases, the sequence of t curves approaches the z curve. (The
   z curve is sometimes referred to as the t curve with df = ∞.)

Figure 7.8 compares the z curve to several different t curves.

Figure 7.8 Comparison of the z curve to several t curves (t curves for 4 df and 12 df shown)

Formulas for the large-sample z intervals utilized z critical values, numbers like
1.96 and 2.33, that captured certain central or cumulative areas under the z curve.
Formulas for t intervals require t critical values, which play the same role for various t
curves. Appendix Table IV gives a tabulation of such values. Each row of the table corresponds to a different number of df, and each column gives critical values that capture
a particular central area and the corresponding cumulative area. For example, the t critical value at the intersection of the 12 df row and the .95 central area column is 2.179,
so the area under the 12 df t curve between −2.179 and 2.179 is .95. The cumulative
area under this t curve all the way to the left of 2.179 is the central area .95 plus the
lower tail area .025, or .975. This is illustrated in Figure 7.9. The critical value 2.179
can then be used to calculate a two-sided confidence interval with a confidence level
of 95%. A one-sided interval, which gives either an upper confidence bound or a lower
confidence bound, with confidence level 95% necessitates going to the .95 cumulative
area column; for 12 df, the required critical value is 1.782.


Figure 7.9 t critical values illustrated (12 df t curve): cumulative area .95 to the left of 1.782 (tail area .05); central area .95 between −2.179 and 2.179 (tail area .025, cumulative area .975)

As we move from left to right in any particular row of the table, the critical values increase. This is because capturing a larger central or cumulative area requires going farther
out into the tail of the t curve. Starting with 1 df, the rows increase by 1 df until reaching
30 df, and then they jump to 40, 60, 120, and finally to ∞; this last row contains z critical
values. Once past 30 df, there is little difference between the t curves and the z curve as far
as the areas of interest to us are concerned. Rather than using the 30, 40, 60, and 120 rows
or trying to interpolate, we recommend that z critical values be used whenever df > 30.
The large-sample z CI for μ was obtained by using the (approximate) standard
normal variable $z = (\bar{x} - \mu)/(s/\sqrt{n})$ as the basis for a probability statement and then
manipulating inequalities to isolate μ. An analogous derivation, based on the fact
that $t = (\bar{x} - \mu)/(s/\sqrt{n})$ has a t distribution with n − 1 df, gives the following one-sample t CI.

One-Sample t Confidence Intervals

Let x̄ and s be the sample mean and sample standard deviation of a random sample of size
n from a normal population or process distribution. Then a two-sided confidence interval
for the population or process mean μ has the form
$$\bar{x} \pm (t \text{ critical value}) \cdot \frac{s}{\sqrt{n}}$$
t critical values for the most frequently used confidence levels, corresponding to particular
central t curve areas, are given in Appendix Table IV. An upper confidence bound results
from replacing ± in the given formula by +, whereas a lower confidence bound uses − in
place of ±. For such a one-sided interval, a t critical value in the cumulative area column
corresponding to the desired confidence level is used.

Example 7.10 As part of a larger project to study the behavior of stressed-skin panels, a structural
component being used extensively in North America, the article “Time-Dependent
Bending Properties of Lumber” (J. of Testing and Eval., 1996: 187–193) reported
on various mechanical properties of Scotch pine lumber specimens. Consider the
following observations on modulus of elasticity (MPa) obtained 1 minute after loading in a certain configuration:
10,490 16,620 17,300 15,480 12,970 17,260 13,400 13,900
13,630 13,260 14,370 11,700 15,470 17,840 14,070 14,760
Figure 7.10 shows a normal quantile plot obtained from Minitab. The straightness
of the pattern in the plot provides strong support for assuming that the population
distribution of modulus of elasticity is at least approximately normal.
Figure 7.10 A normal quantile plot of the modulus of elasticity data (modulus versus normal quantile)


Hand calculation of the sample mean and standard deviation is simplified by
subtracting 10,000 from each observation: yi = xi − 10,000. It is easily verified that
$\sum y_i = 72{,}520$ and $\sum y_i^2 = 392{,}083{,}800$, from which $\bar{y} = 4532.5$ and $s_y = 2055.67$. Thus
$\bar{x} = 14{,}532.5$ and $s_x = 2055.67$ (adding or subtracting the same constant from each
observation does not affect variability). The sample size is 16, so a confidence interval
for population mean modulus of elasticity is based on 15 df. A confidence level of 95%
for a two-sided interval requires the t critical value of 2.131. The resulting interval is
$$\bar{x} \pm (t \text{ critical value})\frac{s}{\sqrt{n}} = 14{,}532.5 \pm (2.131)\frac{2055.67}{\sqrt{16}} = 14{,}532.5 \pm 1095.2 = (13{,}437.3,\ 15{,}627.7)$$
This interval is quite wide both because of the small sample size and because of the
large amount of variability in the sample. A 95% lower confidence bound is obtained
by using − and 1.753 in place of ± and 2.131, respectively.
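The entire calculation takes only a few lines in Python with SciPy (a sketch, assuming SciPy is installed; the variable names are ours):

from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

x = [10490, 16620, 17300, 15480, 12970, 17260, 13400, 13900,
     13630, 13260, 14370, 11700, 15470, 17840, 14070, 14760]
n, xbar, s = len(x), mean(x), stdev(x)   # 16, 14532.5, about 2055.67
tcrit = t.ppf(0.975, df=n - 1)           # 2.131 for 15 df
half = tcrit * s / sqrt(n)
print((xbar - half, xbar + half))        # about (13437.3, 15627.7)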

A Prediction Interval for a Single x Value


Rather than wanting to estimate the population or process mean μ based on a sample
x1, . . . , xn, the individual who obtained the data may wish to use it as a basis for predicting a single x value that has not yet been observed, for example, the lifetime of the next
component to be purchased, the number of calories in the next frozen dinner to be
consumed, and so on. Again we assume that the underlying distribution is normal. A
point prediction for x is just x̄, which is also a point estimate for μ. An entire interval of
plausible values for x is based on the prediction error x − x̄. The value of x is of course
subject to uncertainty, and so is x̄ before obtaining data. The expected or mean value
of the prediction error is
$$\mu_{x - \bar{x}} = \mu_x - \mu_{\bar{x}} = \mu - \mu = 0$$
Since the xi values in the sample are assumed independent of the “future” x value, the
variance of the prediction error is the sum of the variance of x and the variance of x̄:
$$\sigma^2_{x - \bar{x}} = \sigma^2_x + \sigma^2_{\bar{x}} = \sigma^2 + \frac{\sigma^2}{n} = \sigma^2\left(1 + \frac{1}{n}\right)$$
Statistical theory then says that if we use these results to standardize the prediction error
(with s² used in place of σ²), we obtain a t variable based on n − 1 df.

When the underlying distribution is normal, the standardized variable
$$t = \frac{x - \bar{x}}{s\sqrt{1 + \dfrac{1}{n}}}$$
has a t distribution based on n − 1 df. This implies that a two-sided prediction interval for
x has the form
$$\bar{x} \pm (t \text{ critical value}) \cdot s\sqrt{1 + \frac{1}{n}}$$
An upper prediction bound and a lower prediction bound result from using + and −,
respectively, in place of ± and selecting the appropriate t critical value from the corresponding cumulative area column of the table rather than the central area column.

The interpretation of a 95% prediction level is analogous to that of a 95% confidence level. If the two-sided interval is used repeatedly on different samples, in the long
run about 95% of the calculated intervals will include the value of x that is being predicted. (If the samples are selected from entirely different population distributions, such
as a sample of component lifetimes, then a sample of fuel efficiencies for automobiles,
then a sample of service times for customers, etc., then in the long run about 95% of the
intervals will include the actual values of the variables being predicted.) Notice that if
the 1 under the square root in the ± factor is suppressed, the earlier confidence interval
formula results. This implies that the prediction interval is wider than the confidence
interval, often much wider because 1 will generally dominate 1/n. There is a lot more
uncertainty in predicting the value of a single observation x than there is in estimating
a mean value μ.


Example 7.11 Reconsider the modulus of elasticity data introduced in the previous example. Suppose that one more specimen of lumber is to be selected for testing. A 95% prediction
interval for the modulus of elasticity of this single specimen uses the same t critical
value and values of n, x̄, and s used in the confidence interval calculation:
$$14{,}532.5 \pm (2.131)(2055.67)\sqrt{1 + \frac{1}{16}} = 14{,}532.5 \pm 4515.5 = (10{,}017.0,\ 19{,}048.0)$$
This interval is extremely wide, indicating that there is great uncertainty as to what
the modulus of elasticity for the next lumber specimen will be. Notice that the ±
factor for the confidence interval is 1095.2, so the prediction interval is roughly four
times as wide as the confidence interval.
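A matching Python sketch for the prediction interval, reusing the summary numbers from Example 7.10 (only the ± factor changes):

from math import sqrt
from scipy.stats import t

n, xbar, s = 16, 14532.5, 2055.67
tcrit = t.ppf(0.975, df=n - 1)       # 2.131 for 15 df
half = tcrit * s * sqrt(1 + 1 / n)   # prediction interval: x̄ ± t*·s·sqrt(1 + 1/n)
print((xbar - half, xbar + half))    # about (10017, 19048)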

Tolerance Intervals
Consider a population of automobiles of a certain type, and suppose that under specified
conditions, fuel efficiency (mpg) has a normal distribution with μ = 30 and σ = 2. Then
since the interval from −1.645 to 1.645 captures 90% of the area under the z curve, 90%
of all these automobiles will have fuel efficiency values between μ − 1.645σ = 26.71
and μ + 1.645σ = 33.29. But what if the values of μ and σ are not known? We can
take a sample of size n, determine the fuel efficiencies, x̄, and s, and form the interval
whose lower limit is x̄ − 1.645s and whose upper limit is x̄ + 1.645s. However, because
of sampling variability in the estimates of μ and σ, there is a good chance that the resulting interval will include less than 90% of the population values. Intuitively, to have an
a priori 95% chance of the resulting interval including at least 90% of the population
values, when x̄ and s are used in place of μ and σ, we should also replace 1.645 by some
larger number. For example, when n = 20, the value 2.310 is such that we can be 95%
confident that the interval x̄ ± 2.310s will include at least 90% of the fuel efficiency
values in the population.

Let k be a number between 0 and 100. A tolerance interval for capturing at least
k% of the values in a normal population distribution with a confidence level of 95% has
the form
$$\bar{x} \pm (\text{tolerance critical value}) \cdot s$$
Tolerance critical values for k = 90, 95, and 99 in combination with various sample sizes
are given in Appendix Table V. This table also includes critical values for a confidence level
of 99% (these values are larger than the corresponding 95% values). Replacing ± by +
gives an upper tolerance bound, and using − in place of ± results in a lower tolerance
bound. Critical values for obtaining these one-sided bounds also appear in Appendix
Table V.


Example 7.12 Let’s return to the modulus of elasticity data discussed in Examples 7.10 and 7.11,
where n = 16, x̄ = 14,532.5, s = 2055.67, and a normal quantile plot of the data indicated that population normality was quite plausible. For a confidence level of 95%,
a two-sided tolerance interval for capturing at least 95% of the modulus of elasticity
values for specimens of lumber in the population sampled uses the tolerance critical
value of 2.903. The resulting interval is
$$14{,}532.5 \pm (2.903)(2055.67) = 14{,}532.5 \pm 5967.6 = (8564.9,\ 20{,}500.1)$$
We can be highly confident that at least 95% of all lumber specimens have modulus
of elasticity values between 8564.9 and 20,500.1.
The 95% CI for μ was (13,437.3, 15,627.7), and the 95% prediction interval for
the modulus of elasticity of a single lumber specimen was (10,017.0, 19,048.0). Both
the prediction interval and the tolerance interval are substantially wider than the
confidence interval.
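The tolerance critical values in Appendix Table V come from specialized noncentral distribution computations, but Howe's closed-form approximation (our substitution here, not the method used in the text) reproduces them closely; a Python sketch:

from math import sqrt
from scipy.stats import norm, chi2

def tolerance_factor(n, capture=0.95, conf=0.95):
    # Howe's approximation for the two-sided normal tolerance factor:
    # k ≈ sqrt( (n−1)(1 + 1/n) z² / χ²_{1−conf, n−1} ),
    # where z captures the central proportion under the standard normal curve.
    df = n - 1
    z = norm.ppf((1 + capture) / 2)
    return sqrt(df * (1 + 1 / n) * z**2 / chi2.ppf(1 - conf, df))

print(tolerance_factor(16))  # about 2.90, vs. 2.903 in Appendix Table V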

Intervals Based on Nonnormal Population Distributions
The one-sample t CI for μ is robust to small or even moderate departures from normality unless n is quite small. By this we mean that if a t critical value for 95% confidence,
for example, is used in calculating the interval, the actual confidence level will be reasonably close to the nominal 95% level. If, however, n is small and the population
distribution is highly nonnormal, then the actual confidence level may be considerably
different from the one you think you are using when you obtain a particular critical
value from the t table. It would certainly be distressing to believe that your confidence
level is about 95% when in fact it is really more like 88%! The bootstrap technique,
introduced in Section 7.6, has been found to be quite successful at estimating parameters in a wide variety of nonnormal situations.
In contrast to the confidence interval, the validity of the prediction and tolerance
intervals described in this section is closely tied to the normality assumption. These
latter intervals should not be used in the absence of compelling evidence for normality.
The excellent reference Statistical Intervals, mentioned previously, discusses alternative
procedures of this sort for various other situations.

Section 7.4 Exercises


37. Determine the t critical value that will capture the desired t curve area in each of the following cases:
a. Central area = .95, df = 10
b. Central area = .95, df = 20
c. Central area = .99, df = 20
d. Central area = .99, df = 50
e. Upper-tail area = .01, df = 25
f. Lower-tail area = .025, df = 5

38. Determine the t critical value for a two-sided confidence interval in each of the following situations:
a. Confidence level = 95%, df = 10
b. Confidence level = 95%, df = 15
c. Confidence level = 99%, df = 15
d. Confidence level = 99%, n = 5
e. Confidence level = 98%, df = 24
f. Confidence level = 99%, n = 38


39. Determine the t critical value for a lower or an upper confidence bound for each of the situations described in Exercise 38.

40. According to the article “Fatigue Testing of Condoms” (Polymer Testing, 2009: 567–571), “tests currently used for condoms are surrogates for the challenges they face in use,” including a test for holes, an inflation test, a package seal test, and tests of dimensions and lubricant quality. The investigators developed a new test that adds cyclic strain to a level well below breakage and determines the number of cycles to break. A sample of 20 condoms of one particular type resulted in a sample mean number of 1584 and a sample standard deviation of 607. Calculate and interpret a confidence interval at the 99% confidence level for the true average number of cycles to break. (Note: The article presented the results of hypothesis tests based on the t distribution; the validity of these depends on assuming normal population distributions.)

41. Ultra high performance concrete (UHPC) is a relatively new construction material that offers strong adhesive properties with other materials. The authors of “Adhesive Power of Ultra High Performance Concrete from a Thermodynamic Point of View” (J. of Materials in Civil Engr., 2012: 1050–1058) investigated the intermolecular forces for UHPC connected to various substrates. As reported in the article, here are the work of adhesion measurements (in mJ/m²) for five samples of UHPC adhered to steel:
107.1 109.5 107.4 106.8 108.1
a. Is it plausible that the given sample observations were selected from a normal distribution?
b. Calculate a two-sided 95% confidence interval for the true average work of adhesion for UHPC adhered to steel. Does the interval suggest that 107 is a plausible value for the true average work of adhesion for UHPC adhered to steel? What about 110?

42. The article “Measuring and Understanding the Aging of Kraft Insulating Paper in Power Transformers” (IEEE Electrical Insul. Mag., 1996: 28–34) contained the following observations on degree of polymerization for paper specimens for which viscosity times concentration fell in a certain range:
418 421 421 422 425 427
431 434 437 439 446 447
448 453 454 463 465
a. Construct a boxplot of the data and comment on any interesting features.
b. Is it plausible that the given sample observations were selected from a normal distribution?
c. Calculate a two-sided 95% confidence interval for true average degree of polymerization (as did the authors of the article). Does the interval suggest that 440 is a plausible value for true average degree of polymerization? What about 450?

43. Haven’t you always wanted to own a Porsche? We investigated the Boxster (their cheapest model) and performed an online search at www.cars.com on December 30, 2012. Asking prices were well beyond our meager professorial salaries, so instead we focused on odometer readings (mileage). Here are reported readings for a sample of 16 Boxsters:
 1445  25,822  26,892  29,860
35,285  47,874  49,544  64,763
72,698  75,732  84,457  91,577
93,000 109,538 113,399 137,652
A normal quantile plot supports the assumption that mileage is at least approximately normally distributed. The R software reports the following summary statistics for this data:
> summary(odometer, digits=6)
    Min  1st Qu  Median    Mean
 1445.0 33928.8 68730.5 66221.1
 3rd Qu     Max
91932.8 137652.0
> sd(odometer)
37683.17
a. Estimate true average mileage in a way that conveys information about precision and reliability.
b. Predict the mileage for a single Porsche Boxster in a way that conveys information about precision and reliability. How does the prediction compare to the estimate calculated in part (a)?


44. A new concrete structure that experiences cracking within the first seven days after setting is often said to have experienced “early-age cracking.” This is usually a precursor to later-age cracking and other problems that lead to an overall weakening of the structure. According to the article “Early-Age Cracking Tendency and Ultimate Degree of Hydration of Internally Cured Concrete” (J. of Materials in Civil Engr., 2012: 1025–1033), more than 60% of surveyed transportation agencies regard early-age transverse cracking to be problematic. The authors investigated the effectiveness of a process known as internal curing to mitigate early-age cracking of bridge deck concretes.
One important mechanical property of concrete is its modulus of elasticity (in GPa), which is the material’s tendency to be deformed elastically when subjected to an applied force. A higher modulus of elasticity indicates a stiffer material. As reported in the article, the following are modulus of elasticity measurements for seven specimens of internally cured concrete that have been set for one week:
27.0 25.5 28.5 34.0 31.0 34.5 32.5
a. Is it plausible that this sample was selected from a normal population distribution?
b. Estimate true average modulus of elasticity for these mixtures in a way that conveys information about precision and reliability.
c. Predict the modulus of elasticity for a single mixture in a way that conveys information about precision and reliability. How does the prediction compare to the estimate calculated in part (b)?

45. The article “Concrete Pressure on Formwork” (Mag. of Concrete Res., 2009: 407–417) gave the following observations on maximum concrete pressure (kN/m²):
33.2 41.8 37.3 40.2 36.7
39.1 36.2 41.8 36.0 35.2
36.7 38.9 35.8 35.2 40.1
a. Is it plausible that this sample was selected from a normal population distribution?
b. SAS reports the following summary information for this data:
The MEANS Procedure
Analysis Variable : pressure
 Lower 95%    Upper 95%
 CL for Mean  CL for Mean  Mean        Std Error
 36.1892782   39.0373884   37.6133333  0.6639612
Calculate a two-sided 95% confidence interval for the population mean of maximum pressure and confirm the lower and upper endpoints reported by SAS.
c. Calculate an upper confidence bound with confidence level 95% for the population mean of maximum pressure.
d. Calculate an upper prediction bound with level 95% for the maximum pressure of a single observation. How does the prediction compare to the estimate calculated in part (b)?

46. A study of the ability of individuals to walk in a straight line (“Can We Really Walk Straight?” Amer. J. of Physical Anthro., 1992: 19–27) reported the accompanying data on cadence (strides per second) for a sample of n = 20 randomly selected healthy men:
.95 .85 .92 .95 .93 .86 1.00 .92 .85 .81
.78 .93 .93 1.05 .93 1.06 1.06 .96 .81 .96
A normal quantile plot gives substantial support to the assumption that the population distribution of cadence is approximately normal. A descriptive summary of the data from Minitab follows:
Variable  N   Mean    Median  TrMean  StDev   SEMean
cadence   20  0.9255  0.9300  0.9261  0.0809  0.0181
Variable  Min     Max     Q1      Q3
cadence   0.7800  1.0600  0.8525  0.9600
a. Calculate and interpret a 95% confidence interval for population mean cadence.
b. Calculate and interpret a 95% prediction interval for the cadence of a single individual randomly selected from this population.
c. Calculate an interval that includes at least 99% of the cadences in the population distribution using a confidence level of 95%.

47. A sample of 25 pieces of laminate used in the manufacture of circuit boards was selected and the amount of warpage (in.) under particular conditions was determined for each piece, resulting
in a sample mean warpage of .0635 and a sample standard deviation of .0065.
a. Calculate a prediction for the amount of warpage of a single piece of laminate in a way that provides information about precision and reliability.
b. Calculate an interval for which you can have a high degree of confidence that at least 95% of all pieces of laminate result in amounts of warpage that are between the two limits of the interval.

48. A more extensive tabulation of t critical values than what appears in this book shows that for the t distribution with 20 df, the areas to the right of the values .687, .860, and 1.064 are .25, .20, and .15, respectively. What is the confidence level for each of the following three confidence intervals for the mean μ of a normal population distribution? Which of the three intervals would you recommend be used, and why?
a. $(\bar{x} - .687s/\sqrt{21},\ \bar{x} + 1.725s/\sqrt{21})$
b. $(\bar{x} - .860s/\sqrt{21},\ \bar{x} + 1.325s/\sqrt{21})$
c. $(\bar{x} - 1.064s/\sqrt{21},\ \bar{x} + 1.064s/\sqrt{21})$

7.5 Intervals for μ1 − μ2 Based on Normal Population Distributions
In Section 7.3, we showed how to obtain a large-sample confidence interval for a difference between two population, process, or treatment means. The validity of the interval
required that the two samples be selected independently of one another, and the derivation involved standardizing x̄1 − x̄2 to obtain a variable having approximately a standard
normal distribution. In this section, we first consider two independent samples with at
least one of the sample sizes being small and then an interval calculated from paired data.

The Two-Sample t Interval


The one-sample t confidence interval for μ presented in Section 7.4 can be used for any
sample size n provided that the population distribution is at least approximately normal.
The validity of the two-sample t interval requires that both population, process, or treat-
ment response distributions be normal.

Proposition Consider two normal distributions with mean values μ1 and μ2, respectively. Suppose
a random sample of size n1 is selected from the first distribution, resulting in a sample
mean of x̄1 and a sample standard deviation of s1. A random sample from the second
distribution, selected independently of that from the first one, yields sample mean x̄2
and sample standard deviation s2. Then the standardized variable
$$t = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
has approximately a t distribution with df estimated from the sample by the following
formula:
$$df = \frac{\left[(se_1)^2 + (se_2)^2\right]^2}{\dfrac{(se_1)^4}{n_1 - 1} + \dfrac{(se_2)^4}{n_2 - 1}}$$
where $se = s/\sqrt{n}$ (Note: df should be rounded down to the nearest integer).


This implies that a confidence interval for μ1 − μ2 in this situation is
$$\bar{x}_1 - \bar{x}_2 \pm (t \text{ critical value})\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
t critical values corresponding to the most frequently used confidence levels appear in
Appendix Table IV.

The standardized variable in the box is identical to the one used in our previous de-
velopment of the large-sample interval; it is labeled t here simply to emphasize that it
now has approximately a t rather than a z distribution. The only difference between the
formulas for the two intervals is that the formula here uses a t critical value instead of a z
critical value. Separate normal quantile plots of the observations in the two samples can
be used as a basis for checking that the normality assumption is plausible.

Example 7.13 Which way of dispensing champagne, the traditional vertical method or a tilted beer-
like pour, preserves more of the tiny gas bubbles that improve flavor and aroma? The
following data was reported in the article “On the Losses of Dissolved CO2 during
Champagne Serving” (J. Agr. Food Chem., 2010: 8768–8775).
Temp (°C)   Type of Pour   n   Mean (g/L)   SD
18          Traditional    4      4.0       .5
18          Slanted        4      3.7       .3
12          Traditional    4      3.3       .2
12          Slanted        4      2.0       .3
Assuming the sampled distributions are normal, let’s calculate confidence intervals
for the difference between true average dissolved CO2 loss for the traditional pour
and that for the slanted pour at each of the two temperatures. For the 18°C tempera-
ture, the number of degrees of freedom for the interval is
$$\mathrm{df} = \frac{\left(\dfrac{.5^2}{4} + \dfrac{.3^2}{4}\right)^2}{\dfrac{(.5^2/4)^2}{3} + \dfrac{(.3^2/4)^2}{3}} = \frac{.007225}{.00147083} = 4.91$$

Rounding down, the CI will be based on 4 df. For a confidence level of 99%, Appendix Table IV gives t critical value = 4.604. The desired interval is

$$4.0 - 3.7 \pm (4.604)\sqrt{\frac{.5^2}{4} + \frac{.3^2}{4}} = .3 \pm (4.604)(.2915) = .3 \pm 1.3 = (-1.0,\ 1.6)$$
Thus, we can be highly confident that −1.0 < μ₁ − μ₂ < 1.6, where μ₁ and μ₂ are
true average losses for the traditional and slant methods, respectively. Notice that this
CI contains 0; so at the 99% confidence level, it is plausible that μ₁ − μ₂ = 0, that
is, that μ₁ = μ₂. Note that if the μ₁ and μ₂ labels had been reversed, the resulting inter-
val would have been (−1.6, 1.0), with exactly the same interpretation.


The df formula for the 12°C comparison yields df = .00105625/.00020208 = 5.23.
The required df is 5, and Appendix Table IV gives t critical value = 4.032 for a 99%
CI. The resulting interval is (.6, 2.0). Thus, 0 is not a plausible value for this dif-
ference. It appears from the CI that the true average loss when the slant method is
used is smaller than that when the traditional method is used, so the slant method is
better at this temperature. This, in fact, was the conclusion reported in the popular
media.
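These df and interval computations are easy to script. Here is a minimal sketch in Python (welch_ci is our own helper name, not from any particular package, and scipy is assumed to be available for the t critical value); the call at the bottom reproduces the 18°C comparison from Example 7.13:

    import math
    from scipy import stats

    def welch_ci(m1, s1, n1, m2, s2, n2, conf=0.99):
        """Two-sample t CI for mu1 - mu2 (independent samples, normal populations)."""
        se1_sq = s1**2 / n1                # (se1)^2 = s1^2/n1
        se2_sq = s2**2 / n2
        df = (se1_sq + se2_sq)**2 / (se1_sq**2/(n1 - 1) + se2_sq**2/(n2 - 1))
        df = math.floor(df)                # round down, as the boxed formula directs
        t_crit = stats.t.ppf(1 - (1 - conf)/2, df)
        half_width = t_crit * math.sqrt(se1_sq + se2_sq)
        return (m1 - m2) - half_width, (m1 - m2) + half_width, df

    print(welch_ci(4.0, .5, 4, 3.7, .3, 4))   # approximately (-1.0, 1.6) with df = 4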

There is a special confidence interval formula for the case of normal population
distributions having σ₁ = σ₂. It is called the pooled t confidence interval; “pooled”
refers to the fact that s₁ and s₂ are combined to estimate the common population stan-
dard deviation. Recent studies have shown that the behavior of this interval is rather
sensitive to the assumption of equal population standard deviations. If they are not in
fact the same, the actual confidence level may be quite different from the nominal
level (e.g., the actual level may deviate substantially from an assumed 95% level). For
this reason we recommend the use of the two-sample t interval we have described un-
less there is compelling evidence for at least approximate equality of the population
standard deviations.

A Confidence Interval from Paired Data


Let μ₁ denote the population mean height for all married males and μ₂ represent the
population mean height for all married females (both in inches). One way to estimate μ₁ − μ₂
would be to obtain two independent samples of heights, one for married males and the
other for married females, and (assuming normality) use the two-sample t interval just
discussed. Another possibility, though, is to randomly select n married couples and de-
termine the height of the male and the female in each couple. This results in a sample
of pairs of numerical values. The first observation might be (69, 66), the second (73, 63),
the third (66, 68), and so on. Because tall men tend to marry tall women and short men
tend to marry short women, it is unreasonable to think that the two variables height of
male and height of female in a married couple are independent. This invalidates the use
of the two-sample t interval.
Conceptualize the entire population of pairs from which our sample was selected.
For each such pair, we can subtract the second number from the first to obtain a differ-
ence value. The difference is 3 for the pair (69, 66), −2 for the pair (66, 68), and so on.
Now let μ_d denote the population mean difference, that is, the average of all differences
in the population. It can be shown that

μ_d = μ₁ − μ₂

where μ₁ is the population mean value of all first numbers within pairs and μ₂ is defined
similarly for all second numbers. The importance of this relationship is that if we can
obtain a CI for μ_d, it will also be a CI for μ₁ − μ₂. A CI for μ_d can be calculated from
the differences for pairs in the sample. In particular, if the population distribution of


the differences can be assumed to be normal, then a one-sample t interval based on the
sample differences is appropriate.

The Paired-t Interval

Let d̄ and s_d denote the sample mean and sample standard deviation, respectively, for a random sample of n differences. If the distribution from which this sample was selected is normal, a confidence interval for μ_d (i.e., for μ₁ − μ₂) is given by

$$\bar{d} \pm (t \text{ critical value}) \cdot \frac{s_d}{\sqrt{n}}$$

The t critical value is based on n − 1 df. If n is large, the Central Limit Theorem ensures the validity of the interval without the normality assumption.

Example 7.14 Example 7.10 in the previous section gave data on the modulus of elasticity obtained
1 minute after loading in a certain configuration. The cited article also gave the
values of modulus of elasticity obtained 4 weeks after loading for the same lumber
specimens. The data is presented here.

Observation 1 minute 4 weeks Difference


1 10,490 9110 1380
2 16,620 13,250 3370
3 17,300 14,720 2580
4 15,480 12,740 2740
5 12,970 10,120 2850
6 17,260 14,570 2690
7 13,400 11,220 2180
8 13,900 11,100 2800
9 13,630 11,420 2210
10 13,260 10,910 2350
11 14,370 12,110 2260
12 11,700 8620 3080
13 15,470 12,590 2880
14 17,840 15,090 2750
15 14,070 10,550 3520
16 14,760 12,230 2530

The normal quantile plot of the differences shown in Figure 7.11 appears to be
reasonably straight, though the point on the far left deviates somewhat from a line
determined by the other points. (Use of a formal inferential procedure presented in
Chapter 8 indicates that it is reasonable to assume that the population distribution of
the differences is approximately normal.)


[Figure 7.11 Normal quantile plot of the differences from Example 7.14]

The sample consists of 16 pairs, so a 99% confidence interval based on 15 df


requires the t critical value 2.947. With d̄ = 2635.6 and s_d = 508.64, the interval is

$$2635.6 \pm (2.947)\frac{508.64}{\sqrt{16}} = 2635.6 \pm 374.7 = (2260.9,\ 3010.3)$$
We can be highly confident, at the 99% confidence level, that the true average
modulus of elasticity after 1 minute exceeds that after 4 weeks by between roughly
2261 MPa and 3010 MPa. This interval is rather wide, partly because of the high
confidence level and partly because there is a reasonable amount of variability in the
sample differences.
Although the two-sample t CI should not be used here because the 1-minute
observations are not independent of the 4-week observations, the resulting interval
has limits of roughly 705 and 4566. This interval is a great deal wider than the cor-
rect interval. The reason for this is that there is much less variability in the differ-
ences than there is in either the 1-minute observations or the 4-week observations
(s_d = 509, s₁ = 2056, and s₂ = 1902).
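As a sketch of how the paired computation might be scripted (paired_t_ci is our own name; scipy is assumed for the t critical value), the interval above can be reproduced from the two columns of the data table:

    import math
    from statistics import mean, stdev
    from scipy import stats

    def paired_t_ci(x, y, conf=0.99):
        """CI for mu_d = mu1 - mu2 computed from paired samples x and y."""
        d = [xi - yi for xi, yi in zip(x, y)]   # within-pair differences
        n = len(d)
        dbar, sd = mean(d), stdev(d)
        t_crit = stats.t.ppf(1 - (1 - conf)/2, n - 1)
        hw = t_crit * sd / math.sqrt(n)
        return dbar - hw, dbar + hw

    one_min = [10490, 16620, 17300, 15480, 12970, 17260, 13400, 13900,
               13630, 13260, 14370, 11700, 15470, 17840, 14070, 14760]
    four_wk = [9110, 13250, 14720, 12740, 10120, 14570, 11220, 11100,
               11420, 10910, 12110, 8620, 12590, 15090, 10550, 12230]
    print(paired_t_ci(one_min, four_wk))        # approximately (2260.9, 3010.3)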

In practice, it is frequently the case that a CI calculated from paired data is


much narrower than a CI calculated from two independent samples. This is because
numbers within pairs often tend to be rather similar—when one is relatively large
(small), the other tends to be relatively large (small) also. The implication is that
the differences will show much less variation than that in either of two independent
samples. In Example 7.14, there is a natural pairing, but this is not always the case. In
medical experimentation, investigators frequently create matched pairs by selecting
patients so that within each pair, the two individuals are as similar as possible with
respect to age, general physical condition, and physiological variables, such as blood
pressure, heart rate, and so on. Then the differences within pairs will largely reflect
the differences between the two treatments rather than extraneous variation from all
other factors.


Section 7.5 Exercises


49. The firmness of a piece of fruit is an important indicator of fruit ripeness. The Magness–Taylor firmness (N) was determined for one sample of 20 golden apples with a shelf life of zero days, resulting in a sample mean of 8.74 and a sample standard deviation of .66, and another sample of 20 apples with a shelf life of 20 days, with a sample mean and sample standard deviation of 4.96 and .39, respectively. Calculate a confidence interval for the difference between true average firmness for zero-day apples and true average firmness for 20-day apples using a confidence level of 95%, and interpret the interval.

50. Anorexia nervosa (AN) is a psychiatric condition leading to substantial weight loss among women fearful of becoming overweight. The article “Adipose Tissue Distribution After Weight Restoration and Weight Maintenance in Women with Anorexia Nervosa” (Amer. J. of Clinical Nutr., 2009: 1132–1137) used whole-body magnetic resonance imagery to determine various tissue characteristics for both an AN sample of individuals who had undergone acute weight restoration and maintained their weight for a year and a comparable (at the outset of the study) control sample. Here is summary data on intermuscular adipose tissue (IAT, in kg).

Condition   Sample Size   Sample Mean   Sample SD
AN               16            .52         .26
Control           8            .35         .15

Assume that both samples were selected from normal distributions.
a. Calculate an estimate for true average IAT under the described AN protocol; do so in a way that conveys information about the reliability and precision of the estimation.
b. Calculate an estimate for the difference between true average AN IAT and true average control IAT; do so in a way that conveys information about the reliability and precision of the estimation. What does your estimate suggest about true average AN IAT relative to true average control IAT?

51. Refer to Exercise 42 in Section 7.4. The cited article also gave the following observations on degree of polymerization for specimens having viscosity times concentration in a higher range:

429 430 430 431 436 437 440 441 445 446 447

a. Construct a comparative boxplot for the two samples, and comment on any interesting features.
b. Calculate a 95% confidence interval for the difference between true average degree of polymerization for the middle range and that for the high range. Does the interval suggest that μ₁ and μ₂ may in fact be different? Explain your reasoning.

52. The degenerative disease osteoarthritis most frequently affects weight-bearing joints such as the knee. The article “Evidence of Mechanical Load Redistribution at the Knee Joint in the Elderly when Ascending Stairs and Ramps” (Annals of Biomed. Engr., 2008: 467–476) presented the following summary data on stance duration (ms) for samples of both older and younger adults.

Age        Sample Size   Sample Mean   Sample SD
Older           28            801          117
Younger         16            780           72

Assume that both stance duration distributions are normal.
a. Calculate and interpret a 99% CI for true average stance duration among elderly individuals.
b. Calculate a 99% CI for the difference between true average stance duration for the elderly and the younger individuals. Does your interval suggest that true average stance duration is larger among elderly individuals than among younger individuals?

53. Arsenic is a known carcinogen and poison. The standard laboratory procedures for measuring arsenic concentration (μg/L) in water are expensive. Consider the accompanying summary data and Minitab


output for comparing a laboratory method to a new relatively quick and inexpensive field method (from the article “Evaluation of a New Field Measurement Method for Arsenic in Drinking Water Samples,” J. of Envir. Engr., 2008: 382–388).

Two-Sample T-Test and CI
Sample   N   Mean    StDev   SE Mean
1        3   19.70    1.10     0.64
2        3   10.90    0.60     0.35
Estimate for difference: 8.800
95% CI for difference: (6.498, 11.102)

Calculate a two-sided 95% confidence interval for the difference in population means and confirm the lower and upper endpoints reported by Minitab. Based on the interval, what conclusion can you draw about the two methods? Why?

54. Suppose not only that the two population or treatment response distributions are normal but also that they have equal variances. Let σ² denote the common variance. This variance can be estimated by a “pooled” (i.e., combined) sample variance as follows:

$$s_p^2 = \left(\frac{n_1 - 1}{n_1 + n_2 - 2}\right)s_1^2 + \left(\frac{n_2 - 1}{n_1 + n_2 - 2}\right)s_2^2$$

(n₁ + n₂ − 2 is the sum of the df’s contributed by the two samples). It can then be shown that the standardized variable

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

has a t distribution with n₁ + n₂ − 2 df.
a. Use the t variable above to obtain a pooled t confidence interval formula for μ₁ − μ₂.
b. A sample of ultrasonic humidifiers of one particular brand was selected for which the observations on maximum output of moisture (oz) in a controlled chamber were 14.0, 14.3, 12.2, and 15.1. A sample of the second brand gave output values 12.1, 13.6, 11.9, and 11.2 (“Multiple Comparisons of Means Using Simultaneous Confidence Intervals,” J. of Quality Technology, 1989: 232–241). Use the pooled t formula from part (a) to estimate the difference between true average outputs for the two brands with a 95% confidence interval.
c. Estimate the difference between the two μ’s using the two-sample t interval discussed in this section, and compare it to the interval of part (b).

55. Along any major freeway we often encounter service (or logo) signs that give information on attractions, camping, lodging, food, and gas services in advance of the off-ramp that leads to such services. These signs typically do not provide information on distances. Researchers in Virginia, with cooperation from the Virginia Department of Transportation, performed an experiment to see if the addition of distance information on the service signs would affect drivers. The results of this experiment were reported in “Evaluation of Adding Distance Information to Freeway-Specific Service (Logo) Signs” (J. of Transp. Engr., 2011: 782–788).
In one investigation, the authors selected six sites along Virginia interstate highways where service signs are posted. For each site, crash data was obtained for a three-year period before distance information was added to the service signs and for a one-year period afterward. The number of crashes per year before and after the sign changes were made are given here:

Before: 15 26 66 115 62 64
After:  16 24 42  80 78 73

a. Calculate a confidence interval for the population mean difference in the number of crashes per year before and after the sign changes were made. Provide an interpretation for this interval.
b. If a seventh site were to be randomly selected among locations bearing service signs, between what values would you predict the difference in number of crashes to lie?

56. Lactation promotes a temporary loss of bone mass to provide adequate amounts of calcium for milk production. The paper “Bone Mass Is Recovered from Lactation to Postweaning in Adolescent Mothers with Low Calcium Intakes” (Amer. J. of Clinical Nutr., 2004: 1322–1326) gave the following data on total body bone mineral content (TBBMC) (g) for


a sample both during lactation (L) and in the postweaning period (P).

Subject     L      P
1        1928   2126
2        2549   2885
3        2825   2895
4        1924   1942
5        1628   1750
6        2175   2184
7        2114   2164
8        2621   2626
9        1843   2006
10       2541   2627

a. Construct a comparative boxplot of TBBMC for the lactation and postweaning periods and comment on any interesting features.
b. Estimate the difference between true average TBBMC for the two periods in a way that conveys information about precision and reliability. Does it appear plausible that the true average TBBMCs for the two periods are identical? Why or why not?

57. The paper “Quantitative Assessment of Glenohumeral Translation in Baseball Players” (Amer. J. of Sports Med., 2004: 1711–1715) considered various aspects of shoulder motion for a sample of pitchers and another sample of position players [glenohumeral refers to the articulation between the humerus (ball) and the glenoid (socket)]. The authors kindly supplied the following data (for 19 position players and 17 pitchers) on anteroposterior translation (mm), a measure of the extent of anterior and posterior motion, for both dominant and nondominant arms.

     Pos Dom Tr   Pos ND Tr   Pit Dom Tr   Pit ND Tr
1       30.31       32.54       27.63       24.33
2       44.86       40.95       30.57       26.36
3       22.09       23.48       32.62       30.62
4       31.26       31.11       39.79       33.74
5       28.07       28.75       28.50       29.84
6       31.93       29.32       26.70       26.71
7       34.68       34.79       30.34       26.45
8       29.10       28.87       28.96       21.49
9       25.51       27.59       31.19       20.82
10      22.49       21.01       36.00       21.75
11      28.74       30.31       31.58       28.32
12      27.89       27.92       32.55       27.22
13      28.48       27.85       29.56       28.86
14      25.60       21.95       28.64       28.58
15      20.21       21.59       28.58       27.15
16      33.77       32.48       31.99       29.46
17      32.59       32.48       27.16       21.26
18      32.60       31.61
19      29.30       27.46

a. Estimate the true average difference in translation between dominant and nondominant arms for pitchers in a way that conveys information about reliability and precision. Interpret the resulting estimate.
b. Repeat part (a) for position players.
c. The authors asserted that “pitchers have greater difference in side-to-side anteroposterior translation of their shoulders compared with position players.” Do you agree? Explain.

58. Dentists make many people nervous (even more so than statisticians!). To assess any effect of such nervousness on blood pressure, the systolic blood pressure of each of 60 subjects was measured both in a dental setting and in a medical setting (“The Effect of the Dental Setting on Blood Pressure Measurement,” Amer. J. of Public Health, 1983: 1210–1214). For each subject, the difference between dental setting pressure and medical setting pressure was computed; the resulting sample mean difference and sample standard deviation of the differences were 4.47 and 8.77, respectively. Estimate the true average difference between blood pressures for these two settings using a 99% confidence interval. Does it appear that the true average pressure is different in a dental setting than in a medical setting?

59. Antipsychotic drugs are widely prescribed for conditions such as schizophrenia and bipolar disease. The article “Cardiometabolic Risk of Second-Generation Antipsychotic Medications During First-Time Use in Children and Adolescents” (J. of the Amer. Med. Assoc., 2009: 1765–1773) reported on body composition and metabolic changes for individuals who had taken various antipsychotic drugs for short periods of time.


a. The sample of 41 individuals who had taken aripiprazole had a mean change in total cholesterol (mg/dL) of 3.75, and the estimated standard error s_d/√n was 3.878. Calculate a confidence interval with confidence level approximately 95% for the true average increase in total cholesterol under these circumstances (the cited article included this CI).
b. For the sample of 45 individuals who had taken olanzapine, the article reported (7.38, 9.69) as a 95% CI for true average weight gain (kg). What is a 99% CI?

7.6 Other Topics in Estimation (Optional)


Maximum Likelihood Estimation
Maximum likelihood estimation is a technique for automatically generating point esti-
mators. This widely used procedure can be applied to any mass or density function, and
the resulting estimators can be shown to have certain desirable statistical properties. As
its name suggests, this technique is based on trying to find the value of an estimator that
is most likely, given the particular set of sample data.

Example 7.15   In a random sample of ten electronic components, suppose that the first, third, and tenth components fail to function correctly when tested. Using the 0–1 coding scheme introduced in Section 5.6, we can write the data in this sample as x₁ = 1, x₂ = 0, x₃ = 1, x₄ = 0, …, x₁₀ = 1, where a “0” indicates that the component functioned correctly and a “1” indicates that it did not work correctly.
Since this data comes from a random sample, we can assume that the outcome
involving the first item sampled is independent of the outcome involving the second
component sampled, and so forth. Therefore, if π denotes the unknown proportion of defective components in the manufacturing process from which the sample was obtained, then the probability of getting the particular sample can be written as

P(x₁ = 1 and x₂ = 0 and x₃ = 1 and … and x₁₀ = 1)
   = P(x₁ = 1)·P(x₂ = 0)·P(x₃ = 1) ⋯ P(x₁₀ = 1)
   = π(1 − π)π ⋯ π = π³(1 − π)⁷

The expression π³(1 − π)⁷ represents the likelihood of our sample result occurring, and it is abbreviated as L(π) = π³(1 − π)⁷. We now ask, For what value of π is the observed sample most likely to have occurred? That is, we want to find the value of π that maximizes the probability π³(1 − π)⁷. This requires setting the derivative of L(π) equal to 0 and solving for π. However, to simplify the calculations, we first take the natural logarithm of L(π) = π³(1 − π)⁷:

ln(L(π)) = ln[π³(1 − π)⁷] = 3 ln(π) + 7 ln(1 − π)

and then take the derivative¹:

$$\frac{d}{d\pi}\ln(L(\pi)) = \frac{3}{\pi} - \frac{7}{1 - \pi}$$

¹ Since ln(x) is an increasing function of x, the value of π that maximizes ln(L(π)) will be the same value that maximizes L(π).


Setting this expression equal to 0 and solving for π, we find that the solution equals
3/10 = .30. The value .30 is said to be the maximum likelihood estimate of the pro-
cess proportion defective π. Notice that this estimate happens to be the ratio of the
number of defective components in the sample divided by the sample size, that is,
the sample proportion, p. In fact, this is true in general, regardless of the particular
sample data, so we can also say that the sample proportion is a maximum likelihood
estimator for a population or process proportion.

The technique in the previous example can be put into a general form that applies to any mass or density function. Let f(x) denote either a mass or density function that is defined by a set of parameters θ₁, θ₂, …, θ_k. Given the data x₁, x₂, x₃, …, xₙ in any random sample from a population whose distribution is described by f(x), we form the likelihood function

L(θ₁, θ₂, …, θ_k) = f(x₁)f(x₂)f(x₃) ⋯ f(xₙ)

where each f(xᵢ) is formed by simply substituting the ith data point xᵢ into the function f(x). When f(x) is a mass function, L can be interpreted as the probability that the sample result occurs. When f(x) is a density, L is not a probability and, in this case, we simply call it a likelihood function.

The maximum likelihood estimators of the parameters θ₁, θ₂, …, θ_k are the particular values of θ₁, θ₂, …, θ_k that maximize the function L(θ₁, θ₂, …, θ_k). The usual method for finding these parameter values is to treat L(θ₁, θ₂, …, θ_k) as a function of k variables and use calculus to find the extreme points of the function. For k = 1, ordinary differentiation is required; for k ≥ 2, partial derivatives are needed. Because L(θ₁, θ₂, …, θ_k) is a product of several functions, it is usually easier to work with its natural logarithm ln(L(θ₁, θ₂, …, θ_k)), which facilitates differentiation by converting L into a sum of functions:

ln(L(θ₁, θ₂, …, θ_k)) = ln(f(x₁)) + ln(f(x₂)) + ln(f(x₃)) + … + ln(f(xₙ))

Because ln(x) is an increasing function of x, the values of θ₁, θ₂, …, θ_k that maximize ln(L(θ₁, θ₂, …, θ_k)) are the same ones that maximize L(θ₁, θ₂, …, θ_k).

Example 7.16   The exponential distribution is commonly used to describe the lifetimes of certain products (see Example 5.12). Suppose that a sample of n = 12 electric appliances is tested continuously until each ceases to function. The length of time that each appliance lasted (in hours) follows:
10,502 9560 11,671 12,825 8987 7924
9508 8875 14,439 11,320 6549 10,654
To use maximum likelihood estimation to find the parameter λ of the exponential
distribution that describes this data, we proceed as follows. Suppose x₁, x₂, x₃, …, xₙ


is any random sample from an exponential distribution with parameter λ. Since the exponential density function is of the form f(x) = λe^(−λx), the likelihood function associated with the sample data is

$$L(\lambda) = f(x_1)f(x_2)f(x_3)\cdots f(x_n) = (\lambda e^{-\lambda x_1})(\lambda e^{-\lambda x_2})(\lambda e^{-\lambda x_3})\cdots(\lambda e^{-\lambda x_n}) = \lambda^n e^{-\lambda \sum x_i}$$

Taking logarithms,

$$\ln(L(\lambda)) = n\ln(\lambda) - \lambda \sum x_i$$

Equating the derivative of this function to 0 and solving for λ, we find

$$\frac{d}{d\lambda}\ln(L(\lambda)) = \frac{n}{\lambda} - \sum x_i = 0 \quad\text{so}\quad \lambda = \frac{n}{\sum x_i} = \frac{1}{\bar{x}}$$

Thus the maximum likelihood estimator of λ is λ̂ = 1/x̄. For the lifetime of the appliances, this estimate is λ̂ = 1/10,234.5 = .0000977 = 9.77 × 10⁻⁵.
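Because the MLE has a closed form here, a couple of lines of Python verify it, and a generic numerical maximizer (a sketch assuming scipy is available; the bounds are our own illustrative choice) lands on the same value:

    import numpy as np
    from scipy.optimize import minimize_scalar

    x = np.array([10502, 9560, 11671, 12825, 8987, 7924,
                  9508, 8875, 14439, 11320, 6549, 10654])

    lam_closed = 1 / x.mean()                   # MLE in closed form: 1/x-bar

    # Numerical check: maximize ln L(lambda) = n ln(lambda) - lambda * sum(x)
    neg_log_lik = lambda lam: -(len(x) * np.log(lam) - lam * x.sum())
    lam_numeric = minimize_scalar(neg_log_lik, bounds=(1e-6, 1e-3),
                                  method="bounded").x

    print(lam_closed, lam_numeric)              # both approximately 9.77e-05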

Example 7.17   In Example 2.17, n = 17 observations on length-diameter ratio were plotted on a normal quantile plot, and it was determined that a normal distribution provides a good fit to this data. To find the maximum likelihood estimators of the parameters μ and σ² of the normal distribution that best fits this data, let x₁, x₂, x₃, …, xₙ denote any random sample from a normal distribution. Then the likelihood function based on this data is as follows:

$$L(\mu, \sigma^2) = f(x_1)f(x_2)f(x_3)\cdots f(x_n)
= \left(\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2}\left(\frac{x_1-\mu}{\sigma}\right)^2}\right)
\left(\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2}\left(\frac{x_2-\mu}{\sigma}\right)^2}\right)
\cdots
\left(\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2}\left(\frac{x_n-\mu}{\sigma}\right)^2}\right)
= \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-\frac{1}{2}\sum\left(\frac{x_i-\mu}{\sigma}\right)^2}$$

Taking logarithms,

$$\ln(L(\mu, \sigma^2)) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum (x_i - \mu)^2$$


Since this is a function of two variables, the partial derivatives with respect to μ and σ² must be set to 0 and the resulting two equations solved. Omitting the details, we find the maximum likelihood estimators to be

$$\hat{\mu} = \bar{x} \qquad\qquad \hat{\sigma}^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$$

Note that the first estimator, μ̂, is unbiased, but the second estimator, σ̂², is slightly biased (recall that the unbiased estimator of σ² is the sample variance s², which uses a denominator of n − 1, not n). For the length-diameter ratio data, the maximum likelihood estimates are μ̂ = x̄ = 47.31 and σ̂² = 57.153.

Example 7.18 In Example 2.18, the following data on tensile strength for multi-wall carbon nano-
tubes was thought to follow a Weibull distribution:
17.4 22.3 23.7 30.0 44.2 49.3 52.7 54.8 62.1
66.2 84.9 90.1 90.3 91.1 99.5 101.6 108.5 109.5
119.1 127.0 132.9 140.8 141.0 175.0 231.8 259.7
In general, let x₁, x₂, x₃, …, xₙ be a random sample from a Weibull distribution with parameters α and β and density function

$$f(x) = \frac{\alpha}{\beta^{\alpha}}\, x^{\alpha - 1} e^{-(x/\beta)^{\alpha}}$$

As in Example 7.17, the likelihood function L(α, β) is a function of two variables. So we must take partial derivatives of ln(L(α, β)), set them equal to 0, and solve the two resulting equations. Omitting the algebraic details, we find the following equations:

$$\alpha = \left[\frac{\sum x_i^{\alpha}\ln(x_i)}{\sum x_i^{\alpha}} - \frac{\sum \ln(x_i)}{n}\right]^{-1} \qquad\qquad \beta = \left(\frac{\sum x_i^{\alpha}}{n}\right)^{1/\alpha}$$

These two equations cannot be solved explicitly for the maximum likelihood estimates α̂ and β̂. Instead, for each sample x₁, x₂, x₃, …, xₙ, the equations must be solved using an iterative numerical procedure. For the tensile strength data, the maximum likelihood estimates are α̂ = 1.727 and β̂ = 109.304. These estimates can be obtained by using the survival package in R or by using the optimization procedure PROC NLP in SAS.
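As a sketch of the iterative approach (assuming scipy is available; this is not the R or SAS routine just mentioned, only a direct numerical maximization of the Weibull log-likelihood for the tensile strength data):

    import numpy as np
    from scipy.optimize import minimize

    x = np.array([17.4, 22.3, 23.7, 30.0, 44.2, 49.3, 52.7, 54.8, 62.1,
                  66.2, 84.9, 90.1, 90.3, 91.1, 99.5, 101.6, 108.5, 109.5,
                  119.1, 127.0, 132.9, 140.8, 141.0, 175.0, 231.8, 259.7])

    def neg_log_lik(p):
        a, b = np.exp(p)   # optimize on the log scale so alpha, beta stay positive
        # Weibull log-density: ln(a) - a ln(b) + (a - 1) ln(x) - (x/b)^a
        return -np.sum(np.log(a) - a*np.log(b) + (a - 1)*np.log(x) - (x/b)**a)

    res = minimize(neg_log_lik, x0=[0.0, np.log(x.mean())], method="Nelder-Mead")
    print(np.exp(res.x))   # approximately [1.727, 109.304]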

As you can see from these examples, maximum likelihood estimators are not always
unbiased. In many cases, however, this bias can be removed by using a simple mul-
tiplicative correction factor. In Example 7.17, for instance, the maximum likelihood


estimator of σ² in a normal distribution is slightly biased, but that bias can be corrected
by simply multiplying the estimator by the factor n/(n − 1). Note that as n increases,
the bias becomes negligible and the correction factor is essentially equal to 1. Beyond
some slight problems with unbiasedness, maximum likelihood estimators have several
properties that make them highly useful in practice. The two most important properties
are listed in the following box.

Properties of Maximum Likelihood Estimators (MLEs)

1. For large n, the sampling distribution of an MLE is approximately normal and the estimator is unbiased or nearly so, with a variance smaller than that of any other estimator.
2. For any function g(·), if θ̂ is the MLE of a parameter θ, then g(θ̂) is the MLE of g(θ).

Example 7.19   In Example 7.16, we showed that the MLE of λ in an exponential distribution is λ̂ = 1/x̄. Since the mean of an exponential distribution is related to λ by the equation μ = 1/λ, the MLE of μ is simply 1/λ̂ = x̄. That is, given g(λ) = 1/λ, then since λ̂ is the MLE for λ, g(λ̂) is the MLE for g(λ).

Density Estimation
In many applications, populations or processes can be described by normal density
curves. Given a random sample of size n from a normal population, the density curve
can be approximated by simply using the sample statistics x̄ and s in place of the parameters μ and σ in the formula for the density curve:

$$f(x) \approx \frac{1}{\sqrt{2\pi s^2}}\, e^{-\frac{1}{2}\left(\frac{x - \bar{x}}{s}\right)^2}$$

Although this function can be graphed by itself, it is often good practice to superimpose a plot of f(x) over a histogram of the sample data from which x̄ and s were calculated. When the bars in the histogram represent densities (see p. 19), the graph of f(x) will be of the same scale as the histogram, because both will have a total area of 1. When the histogram bars are simply frequencies, then f(x) must be multiplied by an appropriate factor so that its area coincides with the area under the histogram. If w denotes the width of each histogram bar and there are n data points in the sample, then the total area encompassed by a frequency histogram is w·n. Therefore, to make the approximate density function plot correctly over such a histogram, we must plot the function w·n·f(x) instead of f(x).
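A short sketch of this overlay (assuming matplotlib is available; overlay_normal is our own helper name, and w is the common bar width):

    import numpy as np
    import matplotlib.pyplot as plt

    def overlay_normal(data, w):
        """Frequency histogram with bin width w plus the scaled curve w*n*f(x)."""
        data = np.asarray(data, dtype=float)
        n, xbar, s = len(data), data.mean(), data.std(ddof=1)
        plt.hist(data, bins=np.arange(data.min(), data.max() + w, w))
        xs = np.linspace(data.min() - s, data.max() + s, 200)
        fx = np.exp(-0.5*((xs - xbar)/s)**2) / np.sqrt(2*np.pi*s**2)
        plt.plot(xs, w * n * fx)   # scale by w*n so areas match the frequency bars
        plt.show()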


Kernel Density Estimation


Some populations and processes are not adequately described by common density
curves. In such cases, the population density curve can be approximated by using the
method of kernel density estimation. Creating a kernel density estimate is very similar
to creating a histogram. In a histogram, the bar over any class interval can be thought of
as a stack of several equal-size rectangles, each representing a single data point in that
class. In the kernel density estimate, these n rectangles are replaced by n normal density
curves centered at the n data points. The kernel function is then defined to be the aver-
age of these n normal densities. The kernel function is used as an approximation to the
population density curve. Figure 7.12 illustrates this procedure on a small data set. Note
that the kernel function in this figure is shown as the sum of the individual densities, not
the average, to highlight the shape of the kernel function.

[Figure 7.12 A kernel function for the data (1, 1.5, 1.8, 2, 2.1, 2.2, 2.5, 3)]

To put a normal density curve around each point in a set of data x₁, x₂, x₃, …, xₙ, we must determine the appropriate mean and variance to use. Let

$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

denote the sample variance of the n data points; the normal density centered at xᵢ has a mean and standard deviation of

μ = xᵢ        σ = λs

where λ is a positive number called the smoothing parameter or window width. The smoothing parameter controls the spread of each of the normal distributions centered at the data points. These distributions have densities defined by

$$f_i(x) = \frac{1}{\lambda s\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - x_i}{\lambda s}\right)^2} \qquad \text{for } -\infty < x < \infty$$


The kernel function is then given by the formula

$$k(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x) \qquad \text{for } -\infty < x < \infty$$

The effect of the smoothing constant is illustrated in the following example. Briefly, small values of λ yield kernel functions that follow the data very closely and, therefore, often have a choppy appearance similar to a histogram of the data. Larger values of λ lead to smoother-looking kernel functions.
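A minimal sketch of this construction in plain NumPy (kernel_density is our own helper, and lam plays the role of λ); the last two lines evaluate the kernel function for the small data set of Figure 7.12:

    import numpy as np

    def kernel_density(data, lam, grid):
        """Evaluate k(x) on grid: the average of n normal densities, one per point."""
        data = np.asarray(data, dtype=float)
        grid = np.asarray(grid, dtype=float)
        sd = lam * data.std(ddof=1)             # each curve has sigma = lambda * s
        z = (grid[:, None] - data[None, :]) / sd
        densities = np.exp(-0.5 * z**2) / (sd * np.sqrt(2*np.pi))
        return densities.mean(axis=1)           # k(x) = average of the n densities

    grid = np.linspace(0, 4, 200)
    kx = kernel_density([1, 1.5, 1.8, 2, 2.1, 2.2, 2.5, 3], lam=0.5, grid=grid)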

Example 7.20 The tragedy that befell the space shuttle Challenger and its astronauts in 1986 led to
a number of studies to investigate the reasons for mission failure. Attention quickly
focused on the behavior of the rocket engine’s O-rings. Here is data consisting of ob-
servations on x 5 O-ring temperature (°F) for each test firing or actual launch of the
shuttle rocket engine (Presidential Commission on the Space Shuttle Challenger
Accident, Vol. 1, 1986: 129–131).

31 40 45 49 52 53 57 58 58
60 61 61 63 66 67 67 67 67
68 69 70 70 70 70 72 73 75
75 76 76 78 79 80 81 83 84

The sample standard deviation of this data is s = 12.159. Suppose we choose a smoothing parameter of λ = .5. Starting with the leftmost point in the data, x₁ = 31, we then form the normal density curve with mean μ = 31 and standard deviation σ = λs = (.5)(12.159) = 6.0795:

$$f_1(x) = \frac{1}{6.0795\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - 31}{6.0795}\right)^2}$$

Proceeding to the next largest data point, x₂ = 40, we create a density curve with mean μ = 40 and σ = λs = 6.0795:

$$f_2(x) = \frac{1}{6.0795\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - 40}{6.0795}\right)^2}$$

After continuing in this manner through all n = 36 data points, we take the average of all 36 density functions to form the kernel function k(x). Figure 7.13(a) shows the plot of k(x) along with a histogram of the O-ring data. For comparison, Figure 7.13(b) shows a kernel function based on a value of λ = .2. Although the choice of λ is subjective, the value of λ = .5 provides a smoother fit to the data.


Density

.04

.03

.02

.01

0 Temperature
10 20 30 40 50 60 70 80 90
(a)

Density

.04

.03

.02

.01

0 Temperature
10 20 30 40 50 60 70 80 90
(b)

Figure 7.13 Kernel functions fit to the O-ring data of Example 7.20:
(a)  5 .5; (b)  5 .2


Bootstrap Confidence Intervals


The confidence interval formulas we developed in the preceding sections require a
knowledge of one or both of the following: (1) the exact distribution (e.g., the nor-
mal) of the population sampled and (2) a mathematical expression for the standard
error of the statistic used to form the interval. Although requirement 1 becomes less
important as the sample size increases, requirement 2 cannot be ignored. Compli-
cating the situation further is the fact that many statistics have standard error for-
mulas that are only approximations based on the assumption of normal populations.
The sample correlation coefficient, r, is one such example. Even under the assump-
tion of sampling from a bivariate normal population, the sampling distribution of r
has no simple form, and formulas for the standard error of r are only approximations.


In an effort to avoid such problems, Efron (“Bootstrap Methods: Another Look at


the Jackknife,” Annals of Statistics, 1979: 1–26) introduced a computer-intensive method
called the bootstrap method. The term bootstrap, a reference to the phrase “to pull oneself
up by one’s own bootstraps,” is intended to describe the way in which bootstrap procedures
approximate the sampling distribution of a statistic—by drawing large numbers of random
samples from a single sample of data. The bootstrap is only one example of a general class
of methods, called resampling procedures, based on the idea that there is information to
be gained by sampling from a sample. The bootstrap method is described as follows:

Outline of the Bootstrap Method


1. Obtain a random sample of size n from a population or process.
2. Generate a random sample of size n, with replacement, from the original sample
in step 1.
3. Calculate a statistic of interest for the sample in step 2.
4. Repeat steps 2 and 3 a large number of times to form the approximate sampling
distribution of the statistic.
Sampling with replacement (see Section 4.2) is the key to the proper use of the
bootstrap method. Otherwise, sampling n items without replacement from a set of n
data values would always yield the original n items and, hence, the same calculated
statistic for each sample. Much computational effort is necessary to draw repeated
samples, often as many as 1000, and to calculate and compile a sampling distribu-
tion. Such effort, which would have been out of the question in the precomputer
era, is a simple task for today’s computers.
Understanding how the bootstrap works is easier than understanding why it works.
At first glance, the procedure seems to give something for nothing. It requires no distri-
butional assumptions about a population, needs no standard error formulas, and gener-
ates the sampling distribution of a statistic from the information in only a single sample.
Roughly speaking, resampling methods work because random subsamples of a random
sample are also random samples from a population (see Sampling Rules, Section 4.2).
Consequently, each bootstrap sample qualifies as a genuine random sample of size n
drawn, with replacement, from a population. This means that statistics calculated from
such samples can properly be used to form a sampling distribution.

Bootstrap Intervals for the Mean


Bootstrap confidence intervals, also called bootstrap percentile intervals, for estimating a population or process mean μ are generated using the general format outlined previously. A large number, B, of bootstrap samples are randomly selected and the sample mean x̄ is calculated for each sample. A (1 − α)100% confidence interval for μ is formed by finding the upper and lower (α/2)100% percentiles of the B sample means. The bootstrap procedure can be applied to large-sample and small-sample problems alike. Choosing a value of B that makes B(α/2) an integer simplifies the work, because the percentiles can be found by just counting in B(α/2) units from both ends of the sorted list of sample means. Empirical studies have shown that values of B in the range of 500 to 1000 generally give good results. For the typical confidence levels used in practice (e.g., 90%, 95%, 99%), the choice B = 1000 will satisfy all of these requirements. In general, larger values of B should be used for larger confidence levels.


Example 7.21   In Example 7.3 (Section 7.2), the large-sample confidence interval formula x̄ ± 1.96s/√n was used to find a 95% confidence interval for the mean breakdown voltage (in kV) for a particular electronic circuit. Using the sample of n = 48 observations, this interval was determined to be (53.2, 56.2). For comparison, we now use this data to find a 95% bootstrap interval for the mean.

A histogram of B = 1000 sample means, drawn with replacement from the original sample, is shown in Figure 7.14. Since 1 − α = .95, the upper and lower endpoints of the confidence interval are found by counting in B(α/2) = 1000(.05/2) = 25 units from each end of the sorted list of 1000 sample means. For the sample means shown in Figure 7.14, the 25th smallest value is 53.21 and the 975th smallest value is 56.13, giving a confidence interval of (53.2, 56.1). Note how close the bootstrap interval is to the earlier interval (53.2, 56.2). This is not an accident. Bootstrap intervals usually agree closely with traditional confidence intervals when all the assumptions necessary for the traditional interval are met.

[Figure 7.14 B = 1000 bootstrapped sample means from the data in Example 7.3]
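The percentile computation itself is only a few lines. A minimal sketch in Python (bootstrap_mean_ci is our own name; the seed is arbitrary, and the breakdown-voltage data is not reproduced here):

    import numpy as np

    rng = np.random.default_rng(12345)

    def bootstrap_mean_ci(data, B=1000, conf=0.95):
        """Percentile bootstrap CI for a mean: resample with replacement B times."""
        data = np.asarray(data, dtype=float)
        means = np.array([rng.choice(data, size=len(data), replace=True).mean()
                          for _ in range(B)])
        alpha = 1 - conf
        return tuple(np.percentile(means, [100*alpha/2, 100*(1 - alpha/2)]))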


Two-Sample Bootstrap Intervals


Bootstrap methods can readily be applied to statistics based on two or more samples.
The procedures are intuitive extensions of the ones illustrated previously. For the case
of two independent random samples of size n1 and n2 (Section 7.5), each bootstrap
sample consists of a pair of samples, one of size n1 drawn from the first sample, the other
of size n2 drawn from the second sample. The difference of these two sample means is
recorded, and the next bootstrap pair is drawn until B such differences have been ob-
tained. The percentile method is then used to obtain the desired confidence limits. For
paired data, the procedure is even easier: Select B bootstrap samples from the original n
differences between the two paired samples of n data points. Again, we use the percentile
method to form the confidence limits.
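A sketch of the independent-samples version, reusing the same percentile idea (again with our own helper name and an arbitrary seed):

    import numpy as np

    rng = np.random.default_rng(4242)

    def bootstrap_diff_ci(x1, x2, B=1000, conf=0.95):
        """Percentile bootstrap CI for mu1 - mu2 from two independent samples."""
        x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
        diffs = [rng.choice(x1, size=len(x1), replace=True).mean()
                 - rng.choice(x2, size=len(x2), replace=True).mean()
                 for _ in range(B)]
        alpha = 1 - conf
        return tuple(np.percentile(diffs, [100*alpha/2, 100*(1 - alpha/2)]))

For paired data, applying the one-sample routine sketched earlier to the single sample of differences gives the interval directly.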


Comments
Since its inception in 1979, the bootstrap method has been successfully applied to many
different situations, including regression and correlation analysis, as well as other ad-
vanced statistical procedures. During this time, computer power and availability have
also dramatically increased, making the bootstrap a realistic option for data analysis. It
is now relatively easy to write macros in any statistical or spreadsheet software program
to carry out bootstrap computations.
As a rule, bootstrap intervals generally agree fairly well with traditional confidence
interval results when the assumptions necessary for the traditional interval are met. In
those cases where the assumptions are not met (e.g., when populations are not normally
distributed), bootstrap intervals offer the additional advantage of giving more realistic
results than traditional confidence intervals. For further reading on this subject, the
book by Efron and Tibshirani entitled An Introduction to the Bootstrap offers a useful
guide to applying the bootstrap (Efron, B., and R. J. Tibshirani, An Introduction to the
Bootstrap, Chapman and Hall, New York, 1993).

Section 7.6 Exercises

60. Refer to Exercise 42 of Section 7.4.
a. Use the bootstrap method to find a 95% bootstrap interval for the mean of the population from which the data of Exercise 42 was obtained.
b. Compare your result in part (a) to the 95% confidence interval found in Exercise 42(c).

61. Refer to Exercise 46 of Section 7.4.
a. Use the bootstrap method to find a 95% bootstrap interval for the mean of the population from which this data was obtained.
b. Compare your result in part (a) to the 95% confidence interval found in Exercise 46(a).

62. In Exercise 14 (Section 7.2), the sample mean and standard deviation of the dye-layer density of aerial photographs of 69 forest trees were found to be 1.028 and .163, respectively. Because the raw data is not available, a researcher suggests using a computer to generate a random sample of 69 observations from a normal distribution whose mean and standard deviation are 1.028 and .163, respectively. If necessary, after obtaining the sample, the data are adjusted so that their sample mean and standard deviation coincide exactly with 1.028 and .163. A 95% bootstrap interval is then generated using this simulated data.
a. Under what conditions will this procedure provide a reliable interval estimate?
b. Use the procedure outlined in this exercise to generate a 95% bootstrap interval for the average dye-layer density.
c. Compare your result in part (b) to the 95% confidence interval found in Exercise 14(a).

63. A random sample of n electronic assemblies is selected from a large shipment, and each assembly is tested on an automatic test station. The number x of assemblies that do not perform correctly is determined. Let π denote the proportion of assemblies in the entire shipment that are defective.
a. In terms of x, what is the maximum likelihood estimator of π?
b. Is the estimator in part (a) unbiased?
c. What is the MLE of (1 − π)⁵, the probability that none of the next five assemblies tested is defective?

64. Let x denote the proportion of an allotted time frame that a randomly selected worker spends performing a manufacturing task. Suppose the probability density function of x is

f(x) = (θ + 1)x^θ for 0 ≤ x ≤ 1, and f(x) = 0 otherwise

where the value of θ must be larger than −1.


a. Derive the maximum likelihood estimator of  point. The random variable x = time headway
for a random sample of size n. has been modeled by a shifted exponential dis-
b. A random sample of ten workers yielded the fol- tribution. For a random sample of ten headway
lowing data on x: .92, .79, .90, .65, .86, .47, .73, times—3.11, .64, 2.55, 2.20, 5.44, 3.42, 10.39,
.97, .94, .77. Use this data to obtain an estimate 8.93, 17.82, and 1.30—use the results from part
of . (a) to find estimates of  and .

65. The shear strength x of a random sample of spot 68. A specimen is weighed twice on the same scale. Let
welds is measured. Shear strengths (in psi) are as- x and y denote the two measurements. Suppose x
sumed to follow a normal distribution. and y are independent of one another and are as-
a. Find the maximum likelihood estimator of the sumed to follow normal distributions with the same
strength that is exceeded by 5% of the popula- mean  (the true weight of the specimen) and the
tion of welds. That is, find a maximum likeli- same variance 2.
hood estimator for the 95th percentile of the a. For a random sample of n specimens, show
normal distribution based on a random sample that the maximum likelihood estimator
of size n. Hint: Determine the relationship of 2 is given by (1y4n) ^ (xi 2 yi)2, where
between the 95th percentile and  and , then (x1, y1), (x2, y2),…,(xn, yn) denote the n pairs of
use the invariance property of MLEs. scale measurements. Hint: The sample vari-
b. A random sample of ten spot-weld strengths ance of two measurements z1 and z2 equals
yields the following data (in psi): 392, 376, (z1 2 z2)2y2.
401, 367, 389, 362, 409, 415, 358, 375. Use b. Five randomly chosen specimens are weighed,
the result in part (a) to find an estimate of the yielding the following data: (3.10, 3.12), (3.52,
95th percentile of the distribution of all weld 3.45), (4.22, 4.30), (2.98, 3.06), and (5.43, 5.38).
strengths. Use the result in part (a) to find an estimate of 2.

69. Suppose someone suggests using a smoothing parameter of λ = 2 to create a kernel density graph. Do you expect the graph to provide a useful picture of the data? Why?

70. Refer to the data in Exercise 46 of Section 7.4.
a. Use a smoothing parameter of λ = .5 to create a kernel density plot for this data.
b. Repeat part (a) using a smoothing parameter of λ = .3.
c. Which of the plots in parts (a) and (b) appears to fit the data better?

71. Suppose the smallest distance d between any two successive measurements in an ordered set of data (i.e., measurements sorted from smallest to largest) is 3 units.
a. If s denotes the sample standard deviation of the measurements in a sample of size n, would λ = d/(3s) lead to a kernel density graph with a choppy appearance or a smooth appearance? Why?
b. Will values of λ that are greater than d/(3s) lead to choppier- or smoother-looking kernel density estimates?


72. Refer to the data in Exercise 42 of Section 7.4.
a. Use a smoothing parameter of λ = .5 to create a kernel density plot for this data.
b. Repeat part (a) using a smoothing parameter of λ = .3.
c. Which of the plots in parts (a) and (b) appears to fit the data better?

73. A kernel function is fit to the data in a sample of size n. Later, a researcher realizes that the largest observation in the sample was actually a typographical error and, because the original lab data no longer exists, this data point is removed from the sample, leaving a sample of n − 1 measurements. The researcher wants to fit a new kernel function to the reduced sample of n − 1 data points. To produce a graph that has about the same smoothness as the original kernel function, will the value of λ have to be raised or lowered?

74. In Example 1.8 (Chapter 1), a histogram was fit to the energy consumption data (in BTUs) from a sample of 90 homes. Using this data, experiment with different values of λ until you find a value that gives a kernel density estimate that approximates the shape of the histogram of this data shown in Figure 1.7.

Supplementary Exercises
75. Exercise 4 of Chapter 1 presented a sample of n = 153 observations on ultimate tensile strength.
a. Obtain a lower confidence bound for population mean strength. Does the validity of the bound require any assumptions about the population distribution? Explain.
b. Is any assumption about the tensile strength distribution required prior to calculating a lower prediction bound for the tensile strength of the next specimen selected using the method described in this section? Explain.
c. Use a statistical software package to investigate the plausibility of a normal population distribution.
d. Calculate a lower prediction bound with a prediction level of 95% for the ultimate tensile strength of the next specimen selected.

76. Anxiety disorders and symptoms can often be effectively treated with benzodiazepine medications. It is known that animals exposed to stress exhibit a decrease in benzodiazepine receptor binding in the frontal cortex. The paper "Decreased Benzodiazepine Receptor Binding in Prefrontal Cortex in Combat-Related Posttraumatic Stress Disorder" (Amer. J. of Psychiatry, 2000: 1120–1126) described the first study of benzodiazepine receptor binding in individuals suffering from PTSD. The accompanying data on a receptor binding measure (adjusted distribution volume) was read from a graph in the paper.

PTSD: 10, 20, 25, 28, 31, 35, 37, 38, 38, 39, 39, 42, 46
Healthy: 23, 39, 40, 41, 43, 47, 51, 58, 63, 66, 67, 69, 72

a. Is it plausible that the population distributions from which these samples were selected are normal?
b. Calculate an interval for which you can be 95% confident that at least 95% of all healthy individuals in the population have adjusted distribution volumes lying between the limits of the interval.
c. Predict the adjusted distribution volume of a single healthy individual by calculating a 95% prediction interval. How does this interval's width compare to the width of the interval calculated in part (b)?
d. Estimate the difference between the true average measures in a way that conveys information about reliability and precision.

77. The article "Quantitative MRI and Electrophysiology of Preoperative Carpal Tunnel Syndrome in a Female Population" (Ergonomics, 1997: 642–649) reported that (−473.3, 1691.9) was a large-sample 95% confidence interval for the difference between true average
thenar muscle volume (mm³) for sufferers of carpal tunnel syndrome and true average volume for nonsufferers. Calculate a 90% confidence interval for this difference.

78. Acrylic bone cement is commonly used in total joint arthroplasty as a grout that allows for the smooth transfer of loads from a metal prosthesis to bone structure. The paper "Validation of the Small-Punch Test as a Technique for Characterizing the Mechanical Properties of Acrylic Bone Cement" (J. of Engr. in Med., 2006: 11–21) gave the following data on breaking force (N):

Temp   Medium   n   x̄        s
37°    Dry      6   325.73   34.97
37°    Wet      6   306.09   41.97

Assume that all population distributions are normal.
a. Estimate true average breaking force in a dry medium at 37° in a way that conveys information about reliability and precision. Interpret your estimate.
b. Estimate the difference between true average breaking force in a dry medium at 37° and true average force at the same temperature in a wet medium, and do so in a way that conveys information about precision and reliability. Then interpret your estimate.

79. An experiment was carried out to compare various properties of cotton/polyester spun yarn finished with softener only and yarn finished with softener plus 5% DP-resin ("Properties of a Fabric Made with Tandem Spun Yarns," Textile Res. J., 1996: 607–611). One particularly important characteristic of fabric is its durability, that is, its ability to resist wear. For a sample of 40 softener-only specimens, the sample mean stoll-flex abrasion resistance (cycles) in the filling direction of the yarn was 3975.0, with a sample standard deviation of 245.1. Another sample of 40 softener-plus specimens gave a sample mean and sample standard deviation of 2795.0 and 293.7, respectively. Calculate a confidence interval with confidence level 99% for the difference between true average abrasion resistances for the two types of fabric. Does your interval provide convincing evidence that true average resistances differ for the two types of fabric? Why or why not?

80. As reported by the Pew Research Center's Social and Demographic Trends Project in September 2012, a survey of 6500 American households revealed that a record 19% owed student loan debt in 2010 (a sharp increase from the 15% that owed such debt in 2007).
a. Calculate and interpret a 95% CI for the proportion of all American households in 2010 that owed student loan debt.
b. What sample size is required if the desired width of the 95% CI is to be at most .04, irrespective of the sample results?
c. Does the upper limit of the interval in part (a) specify a 95% upper confidence bound for the proportion being estimated? Explain.

81. Torsion during hip external rotation (ER) and extension may be responsible for certain kinds of injuries in golfers and other athletes. The article "Hip Rotational Velocities During the Full Golf Swing" (J. of Sports Sci. and Med., 2009: 296–299) reported on a study in which peak ER velocity and peak IR (internal rotation) velocity (both in deg·sec⁻¹) were determined for a sample of 15 female collegiate golfers during their swings. The following data was supplied by the article's authors:

Golfer   ER       IR       diff     z quan
1        -130.6   -98.9    -31.7    -1.28
2        -125.1   -115.9   -9.2     -0.97
3        -51.7    -161.6   109.9    0.34
4        -179.7   -196.9   17.2     -0.73
5        -130.5   -170.7   40.2     -0.34
6        -101.0   -274.9   173.9    0.97
7        -24.4    -275.0   250.6    1.83
8        -231.1   -275.7   44.6     -0.17
9        -186.8   -214.6   27.8     -0.52
10       -58.5    -117.8   59.3     0.00
11       -219.3   -326.7   107.4    0.17
12       -113.1   -272.9   159.8    0.73
13       -244.3   -429.1   184.8    1.28
14       -184.4   -140.6   -43.8    -1.83
15       -199.2   -345.6   146.4    0.52

a. Is it plausible that the differences came from a normally distributed population?
b. Estimate the true average difference in peak ER and IR velocities in a way that conveys
information about reliability and precision. Interpret the resulting estimate.

82. It is important that face masks used by firefighters be able to withstand high temperatures. In a test of one type of mask, the lenses in 11 of the 35 masks popped out at a temperature of 250°F. Calculate a lower confidence bound for the proportion of all such masks whose lenses would pop out at this temperature using both the method suggested in Section 7.3 and the method suggested in Exercise 26(b).

83. Suppose an investigator wants a confidence interval for the median μ̃ of a continuous distribution based on a random sample x1,…, xn without assuming anything about the shape of the distribution.
a. What is P(x1 < μ̃), the probability that the first observation is smaller than the median?
b. What is the probability that both the first and the second observations are smaller than the median?
c. Let yn = max {x1,…, xn}. What is P(yn < μ̃)? Hint: The condition that yn is less than μ̃ is equivalent to what about x1,…, xn?
d. With y1 = min {x1,…, xn}, what is P(μ̃ < y1)?
e. Using the results of parts (c) and (d), what is P(y1 < μ̃ < yn)? Regarding (y1, yn) as a confidence interval for μ̃, what is the associated confidence level?
f. An experiment carried out to study the curing time (hr) for a particular experimental adhesive yielded the following observations:

31.2  36.0  31.5  28.7  37.2
35.4  33.3  39.3  42.0  29.9

Referring back to part (e), determine the confidence interval and the associated confidence level.
g. Assuming that the data in part (f) was selected from a normal distribution (is this assumption justified?), calculate a confidence interval for μ (which for a normal distribution is identical to μ̃) using the same confidence level as in part (f), and compare the two intervals.

84. Consider the situation described in Exercise 83.
a. What is P(x1 < μ̃, x2 > μ̃, x3 > μ̃,…, xn > μ̃), that is, the probability that only the first observation is smaller than the median and all others exceed the median?
b. What is the probability that only x2 is smaller than the median and all other n − 1 observations exceed the median?
c. What is the probability that exactly one of the xi's is less than μ̃?
d. What is P(μ̃ < y2), where y2 denotes the second smallest xi? Hint: μ̃ < y2 occurs if either all n of the observations exceed the median or all but one of the xi's does.
e. With yn−1 denoting the second largest xi, what is P(μ̃ > yn−1)?
f. Using the results of parts (d) and (e), what is P(y2 < μ̃ < yn−1)? What does this imply about the confidence level associated with the interval (y2, yn−1)? Determine the interval and associated confidence level for the data given in Exercise 83.

85. Suppose we have obtained a random sample x1,…, xn from a continuous distribution and wish to use it as a basis for predicting a single new observation xn+1 without assuming anything about the shape of the distribution. Let y1 and yn denote the smallest and largest, respectively, of the n sample observations.
a. What is P(xn+1 < x1)?
b. What is P(xn+1 < x1 and xn+1 < x2), that is, the probability that xn+1 is the smallest of these three observations?
c. What is P(xn+1 < y1)? What is P(xn+1 > yn)?
d. What is P(y1 < xn+1 < yn), and what does this say about the prediction level associated with the interval (y1, yn)? Determine the interval and associated prediction level for the curing time data given in Exercise 83.
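The distribution-free interval of Exercises 83–85 is easy to check numerically. This is my own illustrative sketch, using the curing time data from Exercise 83(f) and the fact that, for a continuous distribution, each observation falls below the median with probability 1/2:

# Sketch: the interval (y1, yn) and its confidence level for a median.
# P(all n observations < median) = P(all n > median) = (1/2)**n, so the
# level of (y1, yn) as a CI for the median is 1 - 2*(1/2)**n.
cure_times = [31.2, 36.0, 31.5, 28.7, 37.2, 35.4, 33.3, 39.3, 42.0, 29.9]
n = len(cure_times)
y1, yn = min(cure_times), max(cure_times)
level = 1 - 2 * 0.5 ** n          # about .998 when n = 10
print(f"CI for the median: ({y1}, {yn}), confidence level {level:.4f}")

The same interval reappears in Exercise 85 as a prediction interval for a new observation, but with a lower associated level, since a single future observation is more variable than the population median.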
86. The derailment of a freight train due to the catastrophic failure of a traction motor armature bearing provided the impetus for a study reported in the article "Locomotive Traction Motor Armature Bearing Life Study" (Lubrication Engr., Aug. 1997: 12–19). A sample of 17 high-mileage traction motors was selected and the amount of cone penetration (mm/10) was determined both for the pinion bearing and
for the commutator armature bearing, resulting in the following data:

Motor:        1    2    3    4    5    6
Commutator:   211  273  305  258  270  209
Pinion:       226  278  259  244  273  236

Motor:        7    8    9    10   11   12
Commutator:   223  288  296  233  262  291
Pinion:       290  287  315  242  288  242

Motor:        13   14   15   16   17
Commutator:   278  275  210  272  264
Pinion:       278  208  281  274  268

Calculate an estimate of the population mean difference between penetration for the commutator armature bearing and penetration for the pinion bearing, and do so in a way that conveys information about the reliability and precision of the estimate. (Note: A normal quantile plot validates the necessary normality assumption.) Would you say that the population mean difference has been precisely estimated? Does it look as though population mean penetration differs for the two types of bearings? Explain.

87. The article cited in Exercise 86 also included the following data on percentage of oil remaining for the commutator bearings:

71.02  86.49  81.14  84.89  87.42
84.49  82.09  80.97  69.80  89.29
86.10  86.80  83.41  60.56  88.80
86.41  86.19

Would you use the one-sample t confidence interval to estimate the population mean and median? Estimate the population median percentage of oil left using the interval suggested in Exercise 84, and determine the corresponding confidence level.

88. Wire electrical-discharge machining (WEDM) is a process used to manufacture conductive hard metal components. It uses a continuously moving wire that serves as an electrode. Coated wires have been used to substantially increase the cutting speed and precision of the process. Coating on the wire electrode allows for cooling of the wire electrode core and provides an improved cutting performance. The article "High-Performance Wire Electrodes for Wire Electrical-Discharge Machining—A Review" (J. of Engr. Manuf., 2012: 1757–1773) gave the following sample observations on total coating layer thickness (in μm) of eight wire electrodes used for WEDM:

21  16  29  35  42  24  24  25

a. Is it plausible that the given sample observations were selected from a normal distribution?
b. Calculate and interpret a 95% CI for true average total coating layer thickness in all such electrodes.
c. Predict the total coating layer thickness for a single electrode in a way that conveys information about precision and reliability.

89. Nine Australian soldiers were subjected to extreme conditions that involved a 100-min walk with a 25-lb pack when the temperature was 40°C (104°F). One of them overheated (above 39°C) and was removed from the study. Here are the rectal Celsius temperatures of the other eight at the end of the walk ("Neural Network Training on Human Body Core Temperature Data," Combatant Protection and Nutrition Branch, Aeronautical and Maritime Research Laboratory of Australia, DSTO TN-0241, 1999):

38.4  38.7  39.0  38.5  38.5  39.0  38.5  38.6

We would like to get a 95% confidence interval for the population mean.
a. Compute the t-based confidence interval of Section 7.4.
b. Use the bootstrap method to find a 95% bootstrap interval for the population mean.
c. Compare your results in parts (a) and (b).
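For Exercise 89(b), the percentile bootstrap can be sketched in a few lines of Python; this is a generic illustration of the method (resample with replacement, recompute the mean, take quantiles), not the book's specific implementation:

# Sketch: percentile bootstrap CI for a population mean (Exercise 89(b)).
import numpy as np

temps = np.array([38.4, 38.7, 39.0, 38.5, 38.5, 39.0, 38.5, 38.6])
rng = np.random.default_rng(0)
boot_means = np.array([rng.choice(temps, size=temps.size, replace=True).mean()
                       for _ in range(5000)])
lo, hi = np.quantile(boot_means, [0.025, 0.975])
print(f"95% bootstrap CI for the mean: ({lo:.3f}, {hi:.3f})")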
90. Suppose that samples of size n1, n2, and n3 are independently selected from three different populations. Let μi and σi (i = 1, 2, 3) denote the population means and standard deviations, and consider estimating θ = a1μ1 + a2μ2 + a3μ3, where the ai's are specified numerical constants. A point estimate of θ is θ̂ = a1x̄1 + a2x̄2 + a3x̄3. When the sample sizes are all large, θ̂ has approximately a normal distribution with variance

σθ̂² = a1²·σ1²/n1 + a2²·σ2²/n2 + a3²·σ3²/n3
An estimated variance sθ̂² results from replacing the σi²'s by the si²'s; θ̂ can then be standardized to obtain a z variable from which the confidence interval θ̂ ± (z crit)·sθ̂ is obtained. Suppose that samples of three different brands of tires with identical lifetime ratings—a store brand (1) and two national brands (2 and 3)—are selected, and the lifetime of each tire is determined, resulting in the following data:

Brand   Sample size   Sample mean   Sample standard deviation
1       40            38,376        1522
2       32            41,569        1711
3       32            42,123        1645

Calculate and interpret a confidence interval with confidence level 95% for θ = μ1 − (μ2 + μ3)/2.

91. Recent information suggests that obesity is an increasing problem in America among all age groups. The Associated Press (October 9, 2002) reported that 1276 individuals in a sample of 4115 adults were found to be obese (a body mass index exceeding 30; this index is a measure of weight relative to height).
a. Estimate the proportion of all American adults who are obese in a way that conveys information about the reliability and precision of the estimate.
b. A 1998 survey based on people's own assessments revealed that 20% of all adult Americans consider themselves obese. Does the estimate of part (a) suggest that the 2002 percentage is more than 1.5 times the 1998 percentage? Explain.

92. The one-sample CI for a normal mean and PI for a single observation from a normal distribution were both based on the central t distribution. A CI for a particular percentile (e.g., the 1st percentile or the 95th percentile) of a normal population distribution is based on the noncentral t distribution. A particular distribution of this type is specified by both df and the value of the noncentrality parameter δ (δ = 0 gives the central t distribution). The key result is that the variable

t = [(x̄ − μ)/(σ/√n) − (z percentile)·√n] / (s/σ)

has a noncentral t distribution with df = n − 1 and δ = (−z percentile)·√n. Let t.025,v,δ and t.975,v,δ denote the critical values that capture lower tail area .025 and upper tail area .025, respectively, under the noncentral t curve with v df and noncentrality parameter δ (when δ = 0, t.025 = −t.975, since central t distributions are symmetric about 0).
a. Use the given information to obtain a formula for a 95% confidence interval for some particular percentile of a normal population distribution.
b. For δ = 6.58 and df = 15, t.025 and t.975 are (from Minitab) 4.1690 and 10.9684, respectively. Use this information to obtain a 95% CI for the 5th percentile of the modulus of elasticity distribution considered in Example 7.10.
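The noncentral t critical values quoted in Exercise 92(b) can be reproduced with scipy.stats.nct. In the sketch below, the CI form x̄ − t.975·s/√n ≤ (percentile) ≤ x̄ − t.025·s/√n is my reading of part (a), so treat it as an assumption; the sample quantities from Example 7.10 are not reproduced here.

# Sketch: noncentral-t critical values for a CI on a normal percentile.
# For the 5th percentile with n = 16: delta = -(z percentile)*sqrt(n)
#                                           = 1.645 * 4 = 6.58, df = 15.
from scipy.stats import nct

df, delta = 15, 6.58
t_lo = nct.ppf(0.025, df, delta)   # should be near the quoted 4.1690
t_hi = nct.ppf(0.975, df, delta)   # should be near the quoted 10.9684
print(f"t_.025 = {t_lo:.4f}, t_.975 = {t_hi:.4f}")
# 95% CI for the percentile (assumed form): (xbar - t_hi*s/4, xbar - t_lo*s/4)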

Bibliography
DeGroot, Morris, and Mark Schervish, Probability and Statistics (4th ed.), Addison-Wesley, Reading, MA, 2011. A very good exposition of the general principles of statistical inference at a level somewhat above that of our book.
Devore, Jay, and Kenneth Berk, Modern Mathematical Statistics with Applications (2nd ed.), Springer, New York, 2012. An excellent survey of general concepts of inference.
Hahn, Gerald, and William Meeker, Statistical Intervals, Wiley, New York, 2011. Everything you ever wanted to know about statistical intervals—confidence, prediction, tolerance, and others.


8  Testing Statistical Hypotheses
8.1 Hypotheses and Test Procedures
8.2 Tests Concerning Hypotheses About Means
8.3 Tests Concerning Hypotheses About a Categorical Population
8.4 Testing the Form of a Distribution
8.5 Further Aspects of Hypothesis Testing

Introduction
Estimation of a parameter does not explicitly involve making a decision; instead we
wish to determine the most plausible value (a point estimate) or a range of plausible
values (a confidence interval). In contrast, the objective of a hypothesis-testing analysis
is to decide which of two competing claims (hypotheses) is true. We have already
encountered an informal situation of this sort in the context of quality control: At
each time point, we used sample information to decide whether a process was
out of control. The decision rule involved control limits, with the out-of-control
conclusion justified only if the value of some quality statistic fell outside the limits.
In Section 8.1, we discuss the forms of hypotheses about parameters and the
general nature of test procedures for deciding between the two relevant hypotheses.
Test procedures based on t distributions are developed in Section 8.2 for testing
hypotheses about a single mean μ or about the difference μ1 − μ2 between two
means. Sections 8.3 and 8.4 introduce procedures for hypotheses about certain
population proportions and population distributions. Finally, in Section 8.5, we con-
sider a variety of issues and concepts relating to the behavior of test procedures.
Hypothesis testing methods, as well as estimation methods, will be used extensively
throughout the remainder of the book.


8.1 Hypotheses and Test Procedures 


A statistical hypothesis, or just hypothesis, is a claim or assertion either about one or
more population or process characteristics (parameters) or else about the form of the
population or process distribution. Here are some examples of legitimate hypotheses:
1. Parameter: π = proportion of e-mail messages emanating from a certain system that are undeliverable
   Hypothesis: π < .01
2. Parameters: μ1 = true average lifetime for a particular name-brand tire (miles)
   μ2 = true average lifetime for a less expensive store-brand tire
   Hypothesis: μ1 − μ2 > 10,000
3. Parameters: π1 = proportion of individuals in a certain population with an AA genotype for a particular genetic characteristic
   π2 = proportion of individuals with an Aa genotype
   π3 = proportion of individuals with an aa genotype
   Hypothesis: π1 = .25, π2 = .50, π3 = .25
4. Population distribution: f(x), where x = the time between successive adjustments of a lathe process to correct for tool wear
   Hypothesis: x has an exponential distribution, that is, f(x) = λe^(−λx) for some λ > 0
In any hypothesis-testing problem, there are two competing hypotheses under con-
sideration. One hypothesis might be μ = 1000 and the other μ ≠ 1000, or we might be
considering π = .10 versus π < .10. If it were possible to carry out a census of the entire
population, we would know which of the two hypotheses is correct, but almost always
our conclusion must be based on information in sample data. A test of hypotheses is a
method for using sample data to decide between the two competing hypotheses under
consideration. We initially assume that one of the hypotheses, the null hypothesis, is
correct; this is the “prior belief” claim. We then consider the evidence (sample data),
and we reject the null hypothesis in favor of the competing claim, called the alternative
hypothesis, only if there is convincing evidence against the null hypothesis.

definitions The null hypothesis, denoted by H0, is the assertion that is initially assumed to be
true. The alternative hypothesis, denoted by Ha, is the claim that is contradictory to
H0. The null hypothesis will be rejected in favor of the alternative hypothesis only if
sample evidence suggests that H0 is false. If the sample does not strongly contradict
H0, we will continue to believe in the truth of the null hypothesis. The two possible
conclusions from a hypothesis-testing analysis are then reject H0 or fail to reject H0.


Making a decision in a criminal trial is similar to what is involved in testing hypotheses.


The null hypothesis, the claim initially believed to be true, is that the accused is innocent
(“innocent until proven guilty”). The jury is instructed not to switch its belief to the alterna-
tive hypothesis that the accused is guilty unless there is serious and compelling evidence
for reaching that conclusion. The burden of proof is on the prosecution to demonstrate
conclusively from the evidence that the accused is guilty. In hypothesis testing, the burden
of proof is on the alternative hypothesis; in the absence of evidence strongly contradictory
to H0 and much more consistent with Ha, we continue to believe in the null hypothesis.
The selection of the claim believed true (H0) and the claim that will bear the
burden of proof (Ha) depends on the objectives of the study. In general, if an investi-
gator wishes to demonstrate conclusively that a particular assertion is correct, or wants
to see strong evidence for an assertion before taking action, that assertion should be
incorporated in Ha. Frequently in science, a researcher develops a new theory that
stands in contrast to currently accepted theory. If the current theory is identified as
H0, and the new theory as Ha (the research hypothesis), and if H0 can then be rejected,
the investigator will have compelling evidence that the new theory is correct.

Example 8.1 Because of machining process variability, bearings produced by a certain machine
do not have identical diameters. Let μ denote the true average diameter for bearings currently being produced. The machine was initially calibrated to achieve the design specification μ = .5 in. However, the manufacturer is now concerned that the diameters no longer conform to this specification. That is, the hypothesis μ ≠ .5 must now be considered a possibility. If sample evidence suggests that this latter hypothesis is indeed correct, the production process will have to be halted while recalibration takes place. Stopping the process is quite costly, so the manufacturer wants to be sure that recalibration is necessary before this is done. Under these circumstances, a sensible choice of hypotheses is
H0: μ = .5 (the specification is being met, so recalibration is unnecessary)
Ha: μ ≠ .5
Only compelling sample evidence would then result in H0 being rejected in favor
of Ha.

In many hypothesis-testing problems that we will consider, the null and alternative
hypotheses assume particular forms. H0 will be

population or process characteristic = some hypothesized value

Ha then results from replacing the "=" in H0 by one of the three possible inequalities >, <, or ≠; the relevant inequality again depends on the research objectives. One example of this is H0: σ = .002 versus Ha: σ < .002, where σ is the process standard deviation of bearing diameter.

Example 8.2 A pack of a certain brand of cigarettes displays the statement “1.5 mg nicotine aver-
age per cigarette by FTC method." Let μ denote the mean nicotine content per cigarette for all cigarettes of this brand. The advertised claim is that μ = 1.5. People
who smoke this brand would probably be disturbed if it turned out that true average
nicotine content exceeded the claimed value, since excessive nicotine ingestion is a
known health hazard. Suppose a sample of cigarettes of this brand is selected and the
nicotine content of each cigarette is determined. Evidence from this sample against
the company’s claim would have to be quite strong before the accusation is made
that the claim is false, since serious financial and legal consequences could ensue
from any such action. This suggests that we test
H0: μ = 1.5    (the advertised claim is correct)
against the alternative hypothesis
Ha: μ > 1.5    (true average nicotine level exceeds the advertised value)
and reject H0 in favor of Ha only if sample evidence is very compelling for this conclusion.

Since the alternative hypothesis in Example 8.2 asserted that μ > 1.5, it might have seemed sensible to state H0 as the inequality μ ≤ 1.5. This assertion is in fact the implicit null hypothesis, but we will state H0 explicitly as a claim of equality. There are several reasons for this. First of all, the development of a test procedure is most easily understood if there is a unique value of μ (or π, or whatever other parameter is under consideration) when H0 is true. Second, suppose sample data gives much more support to μ > 1.5 than to μ = 1.5. Then there would also be more support for μ > 1.5 than for μ ≤ 1.5. If, on the other hand, μ = 1.5 is much more plausible than μ > 1.5 in light of the data, then μ ≤ 1.5 would also be deemed more plausible than μ > 1.5. So the conclusion when testing H0: μ = 1.5 versus Ha: μ > 1.5 should be identical to that when considering the more realistic null hypothesis μ ≤ 1.5 against this alternative. Similarly, whatever conclusion is reached when testing H0: π = .1 versus Ha: π < .1 would also apply to the implicit null hypothesis H0: π ≥ .1.

Errors in Hypothesis Testing


Once hypotheses have been formulated, we need a method for using sample data to
determine whether H0 should be rejected. A decision rule used for this purpose is called
a test procedure. Just as a jury may reach the wrong verdict in a trial, there is some
chance that the use of a test procedure may result in an erroneous conclusion. One
incorrect conclusion in a judicial setting is for a jury to convict an innocent person, and
another is for a guilty person to be set free. Similarly, there are two possible errors to
consider when developing a test procedure.

definitions A type I error is the error of rejecting H0 when H0 is actually true.


A type II error consists of not rejecting H0 when H0 is false.

No reasonable test procedure can guarantee complete protection against either type of
error; this is the price we pay for basing our inference on sample data.


Example 8.3 Suppose you have to purchase tires for your vehicle and have narrowed your choice
to a certain name-brand tire and another tire sold only through a particular chain of
stores. The name-brand tire is more expensive to purchase than the store-brand tire, but
the extra expense would be justified if the lifetime of the former significantly exceeded
that of the latter. Let μ1 denote true average tire lifetime for the brand-name tire under specified testing conditions, and let μ2 denote true average lifetime for the store-brand tire under these conditions. You have decided that the extra expense can be justified only if μ1 exceeds μ2 by more than 10,000 miles, and you want to see persuasive evidence before incurring this extra expense. The natural choice of hypotheses is then
H0: μ1 − μ2 = 10,000
Ha: μ1 − μ2 > 10,000
A type I error here involves rejecting H0 and purchasing the name-brand tire when
its true average mileage does not exceed that of the store-brand tire by more than
10,000 miles. A type II error consists of not rejecting H0 and purchasing the less
expensive tire when the true average lifetime of the name-brand tire actually does
exceed that of the store brand by more than 10,000 miles.
Recall that when sampling a population or a process, sampling variability will
virtually always be present. In particular, the value of a sample mean x̄ may be rather different from the value of μ. In the tire situation, even if μ1 − μ2 does equal 10,000, the name-brand tires in the sample may be unusually good and the store-brand sample unusually bad, yielding data for which H0 should be rejected. On the other hand, perhaps μ1 − μ2 = 12,000, so H0 is false; yet there is some chance that the
store-brand sample would be unusually good and the name-brand sample not so
impressive, suggesting that H0 should not be rejected.

If a test procedure cannot offer guaranteed protection against committing either a


type I error or a type II error, we would at least like the chance of making either type of
error to be small.

definition The probability of making a type I error is denoted by α and is called the level of significance or significance level of the test. Thus a test with α = .01 is said to have a significance level of .01. This means that if H0 is actually true and the test procedure is used repeatedly on different samples selected from the population or process, in the long run H0 would be incorrectly rejected only 1% of the time. The probability of a type II error is denoted by β.

The ideal of α = 0 and β = 0 cannot be achieved as long as a conclusion is to be based on sample data. The test procedures used in practice allow the user to specify the significance level α to be employed in the test. So why would someone ever select a significance level like .10 or .05 when a smaller significance level such as .01 can also be employed? Why not always select a very small value for α? The answer is that the two error probabilities are inversely related to one another. Changing the test procedure to obtain a smaller probability of making a type I error inevitably makes it more likely that a type II error will be committed if H0 happens to be false (just as changing the rules of
evidence to make it less likely that an innocent person will be convicted also makes it more likely that a guilty person will go free). If a type I error is much more serious than a type II error, a very small value of α is reasonable. When a type II error could have quite unpleasant consequences, it is better to use a larger α to keep β under control. This leads to the following general principle for specifying a test procedure:

After thinking about the relative consequences of type I and type II errors, decide on the largest α that is tolerable for the situation under consideration. Then employ a test procedure that uses this maximum acceptable value—rather than anything smaller—as the significance level (because using a smaller level would increase β). In following this principle, we are making α as small as possible subject to keeping a clamp on β.

Thus if you decide that α = .05 is tolerable, you should not use a test with α = .01 or .001, because doing so would inflate β. The significance levels used most frequently in practice are .05 and .01 (a 1-in-20 or 1-in-100 chance of rejecting H0 when it is true), but the level that you decide to employ should reflect the seriousness of errors in your specific situation.

Test Statistics and P-Values


A test of hypotheses is carried out by employing what is called a test statistic, the function of the data that is computed and used to decide between H0 and Ha. Suppose, for example, that μ is the true average flexural strength of concrete beams of a certain type. These beams will not be used in a certain application unless there is strong evidence that μ exceeds 600 psi. The appropriate hypotheses then are H0: μ = 600 versus Ha: μ > 600. A sample of beams will be selected, and the strength determined for each one. Obviously the value of the sample mean x̄ will provide information about the value of μ. Recall the following properties of the sampling distribution of x̄:

μx̄ = μ (the sampling distribution is centered at μ)

When n is large, x̄ has approximately a normal sampling distribution (the Central Limit Theorem) with standard error

σx̄ = σ/√n, estimated by s/√n

in which case the standardized variables

z = (x̄ − μ)/(σ/√n)  and  z = (x̄ − μ)/(s/√n)

both have an approximately standard normal distribution (the z curve).
When H0 is true, μx̄ = 600, whereas when H0 is false, we expect x̄ to exceed 600. The difference x̄ − 600 is the distance between the sample mean and what we expect it to be when H0 is true. Consider the test statistic

z = (x̄ − 600)/(s/√n)

The division by s/√n expresses the distance as some number of (estimated) standard deviations of x̄. If, for example, z = 3.0, then the observed x̄ value is 3 standard deviations larger than what would be expected were H0 true—a result not very consistent with H0. A z value of .5 results from an x̄ value that is only half a standard deviation larger than what is expected when the null hypothesis is true; this distance is not at all contradictory to H0.
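As a concrete illustration of this standardization, here is a minimal Python sketch; the summary statistics are hypothetical numbers chosen so that z comes out to 3.0, the value discussed above.

# Sketch: standardizing a sample mean against the null value mu0 = 600.
import math

n, xbar, s = 36, 613.5, 27.0      # hypothetical summary statistics
mu0 = 600
z = (xbar - mu0) / (s / math.sqrt(n))
print(f"z = {z:.1f}")             # 3.0: x-bar sits 3 estimated SEs above 600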


Having decided on a test statistic and calculated its value for the given sample, we now
ask the following key question: If H0 is true, how likely is it that a test statistic value at least as
contradictory to H0 as the one obtained would result? If the likelihood of this is very small,
then the test statistic value is quite extreme relative to what the null hypothesis suggests and
very contradictory to H0. On the other hand, if there is a large chance of a value at least this
extreme occurring when H0 is true, then what was observed is reasonably consistent with H0.

definition The P-value, or observed significance level (OSL), is the probability, calculated assuming H0 is true, of obtaining a test statistic value at least as contradictory to H0 as the value that actually resulted. The smaller the P-value, the more contradictory is the data to H0. The null hypothesis should then be rejected if the P-value is sufficiently small. In particular, the following decision rule specifies a test with the desired significance level (type I error probability) α:
Reject H0 if P-value ≤ α.
Do not reject H0 if P-value > α.

Example 8.4 The recommended daily dietary allowance (RDA) for zinc among males older than
50 years is 15 mg/day (World Almanac, 1992). The article “Nutrient Intakes and Dietary
Patterns of Older Americans: A National Study” (J. of Gerontology, 1992: M145–M150)
reported the following data on zinc intake for a sample of males age 65–74 years:
n = 115    x̄ = 11.3    s = 6.43

Does this data suggest that μ, the average daily zinc intake for the entire population of males age 65–74, is less than the RDA? The relevant hypotheses are

H0: μ = 15
Ha: μ < 15
Figure 8.1 shows a boxplot of data consistent with the given summary quantities.
Roughly 75% of the sample observations are smaller than 15 (the top edge of the box is
at the upper quartile). Furthermore, the observed x̄ value, 11.3, is certainly smaller than 15, but this could be just the result of sampling variability when H0 is true. Is it plausible that a sample mean this much smaller than what was expected if H0 were true occurred as a result of chance variation, or is μ < 15 a better explanation for what was observed?
Figure 8.1 Boxplot for zinc intake data (vertical axis: zinc intake, with tick marks at 10, 20, and 30)


The appropriate test statistic for testing the stated hypotheses is

z = (x̄ − 15)/(s/√n)

Because n is large here, when H0 is true z has approximately a standard normal distribution (because z was formed by standardizing x̄ using 15, the mean value of x̄ under H0). This implies that the P-value will be a z-curve area. The test statistic value is

z = (x̄ − 15)/(s/√n) = (11.3 − 15)/(6.43/√115) = −3.7/.600 = −6.17

Values of z at least as contradictory to H0 as this are those even smaller than −6.17 (those resulting from x̄ values that are even farther below 15 than 11.3). Thus

P-value = P(z < −6.17 when H0 is true)
        = area under the standard normal (z) curve to the left of −6.17
        ≈ 0

There is virtually no chance of seeing a z value this extreme as a result of chance variation alone when H0 is true. If a significance level of .01 is used, then

P-value ≈ 0 ≤ .01 = α

so the null hypothesis should be rejected. Because the P-value is so small, the null hypothesis would in fact be rejected at any reasonable significance level, even .001 or smaller. The data is much more consistent with the conclusion that true average intake is in fact smaller than the RDA.
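The arithmetic in Example 8.4 is easily reproduced with software; a minimal sketch (scipy's standard normal cdf supplies the lower-tail area):

# Sketch: z statistic and lower-tailed P-value for the zinc intake data.
import math
from scipy.stats import norm

n, xbar, s, mu0 = 115, 11.3, 6.43, 15
z = (xbar - mu0) / (s / math.sqrt(n))
p_value = norm.cdf(z)             # lower-tailed: area to the left of z
print(f"z = {z:.2f}, P-value = {p_value:.2e}")   # z = -6.17, P-value ~ 0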

In Example 8.4, given that the alternative hypothesis asserted μ < 15, it might seem reasonable to state H0 as μ ≥ 15, previously referred to as the implicit null hypothesis. However, our null hypothesis is explicitly stated as a claim of equality (H0: μ = 15). On page 355 we asserted that the conclusion when using H0: μ = 15 versus Ha: μ < 15 would be identical to that when considering H0: μ ≥ 15 versus Ha: μ < 15. Let us see why this is the case.

In the previous example, we tested H0: μ = 15 versus Ha: μ < 15 and rejected H0 in favor of Ha. Thus we believe that μ < 15 is a much more plausible assertion than μ = 15. It follows logically that we would also believe μ < 15 to be much more plausible than the claim that μ = 16, or the claim that μ = 17, and so on. In other words, when we reject H0: μ = 15 in favor of Ha: μ < 15, we are also implicitly saying that μ < 15 is much more plausible than any value of μ that exceeds 15. This is why explicit consideration of the null hypothesis with a claim of equality is equivalent to considering the more realistic H0 that includes an appropriate inequality.

Let μ0 denote the value of μ asserted by the null hypothesis (μ0 = 15 in Example 8.4). The test statistic for testing hypotheses about μ when the sample size n is large is

z = (x̄ − μ0)/(s/√n)


When H0 is true, this test statistic will have approximately a standard normal distribution (this will be true for any test statistic labeled z in this book). The P-value is then a z-curve area that depends on the inequality in Ha:

Inequality in Ha   P-value                                         Type of test
>                  Area to the right of the calculated z           Upper-tailed
<                  Area to the left of the calculated z            Lower-tailed
≠                  2 · (tail area captured by the calculated z)    Two-tailed
These three cases are illustrated in Figure 8.2.
Figure 8.2 Determination of the P-value when the test statistic is z: (1) upper-tailed test, Ha contains the inequality >, and the P-value is the area in the upper tail of the z curve beyond the calculated z; (2) lower-tailed test, Ha contains the inequality <, and the P-value is the area in the lower tail; (3) two-tailed test, Ha contains the inequality ≠, and the P-value is the sum of the areas in the two tails.

As an example of the latter case, suppose that we are testing

H0: μ = .5 versus Ha: μ ≠ .5

where μ denotes true average bearing diameter. The large-sample test statistic is

z = (x̄ − .5)/(s/√n)

In this situation, values of x̄ either much larger or much smaller than .5, corresponding to z values far from zero in either direction, are inconsistent with H0 and give support to Ha. If, for example, z = −2.76, then

P-value = P(observing a z value at least as contradictory to H0 as −2.76 when H0 is true)
        = P(either z ≤ −2.76 or z ≥ 2.76 when z has approximately a standard normal distribution)
        = (area under z curve to the left of −2.76) + (area under z curve to the right of 2.76)
        = 2(area under z curve to the left of −2.76)
        = 2(.0029) = .0058

The P-value would also be .0058 if z = 2.76. Using a significance level of .05, H0 would be rejected because P-value ≤ α.
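The same tail-area logic extends to all three forms of Ha and is easily wrapped in a small helper; the function below is my own sketch (the name is hypothetical, not from the text):

# Sketch: large-sample z-test P-value for each form of Ha.
from scipy.stats import norm

def z_p_value(z, alternative):
    """Alternative is 'greater', 'less', or 'two-sided'."""
    if alternative == "greater":       # upper-tailed
        return norm.sf(z)              # area to the right of z
    if alternative == "less":          # lower-tailed
        return norm.cdf(z)             # area to the left of z
    return 2 * norm.sf(abs(z))         # two-tailed: double the tail area

print(round(z_p_value(-2.76, "two-sided"), 4))   # 0.0058, as computed above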

Section 8.1 Exercises


1. State whether each of the following assertions is a legitimate statistical hypothesis and why:
a. H: σ > 100   b. H: x̄ = 45
c. H: μ ≠ 2.0   d. H: s ≤ .50
e. H: σ1/σ2 < 1   f. H: x̄1 − x̄2 = 25.0
g. H: λ < .01, where λ is the parameter of an exponential distribution used to model component lifetime
h. H: π = .10, where π is the population proportion of components that need warranty service
i. H: x = sound intensity of a certain source (decibels) has a lognormal distribution
j. H: x = rupture strength of a certain material (10,000 N/cm²) has a Weibull distribution with α = 8 and β = 50

2. To decide whether the pipe welds in a nuclear power plant meet specifications, a random sample of welds is to be selected and the strength of each weld (force required to break the weld) determined. Suppose a population mean strength of 100 lb/in² is the dividing line between welds meeting specification or not doing so. Explain why it might be better to test the hypotheses H0: μ = 100 versus Ha: μ > 100 rather than H0: μ = 100 versus Ha: μ < 100.

3. Many older homes have electrical systems that use fuses rather than circuit breakers. A manufacturer of 40-amp fuses wants to make sure that the true average amperage at which its fuses burn out is indeed 40. If the average amperage is lower than 40, purchasers will complain because the fuses will have to be replaced too frequently, whereas if the average exceeds 40, the manufacturer might be liable for damage to an electrical system due to fuse malfunction. After obtaining data from a sample of fuses, what null and alternative hypotheses would be of interest to the manufacturer?

4. Before agreeing to purchase a large order of polyethylene sheaths for a particular type of high-pressure, oil-filled submarine power cable, a company wants to see conclusive evidence that the population standard deviation of sheath thickness is less than .05 mm. What hypotheses should be tested, and why? In this context, what are the type I and type II errors?

5. A new design for the braking system on a certain type of car has been proposed. For the current system, the true average braking distance at 40 mph under specified conditions is known to be 120 ft. It is proposed that the new design be implemented only if sample data strongly indicates a reduction in true average braking distance for the new design. State the relevant hypotheses, and describe the type I and type II errors in the context of this situation.

6. A mixture of pulverized fuel ash and Portland cement to be used for grouting should have a true average compressive strength of more than 1300 KN/m². The mixture will not be used unless experimental evidence indicates conclusively that the strength specification has been met. State the relevant hypotheses, and describe the type I and type II errors in the context of this problem.

7. A regular type of laminate is currently being used by a manufacturer of circuit boards. A special laminate
has been developed in an attempt to reduce warpage. The regular laminate will be used on one sample of specimens and the special laminate on another sample; the amount of warpage will then be determined for each specimen. The manufacturer will then switch to the special laminate only if it can be demonstrated that the true average amount of warpage for that laminate is less than for the regular laminate. State the relevant hypotheses, and describe the type I and type II errors in the context of this situation.

8. a. Use the definition of a P-value to explain why H0 would certainly be rejected if P-value = .0003.
b. Use the definition of a P-value to explain why H0 would definitely not be rejected if P-value = .350.

9. For which of the given P-values will the null hypothesis be rejected when using a test with a significance level of .05?
a. .001   b. .021   c. .078
d. .047   e. .156

10. For each of the given pairs of P-values and significance levels, state whether H0 should be rejected.
a. P-value = .084, α = .05
b. P-value = .003, α = .001
c. P-value = .048, α = .05
d. P-value = .084, α = .10
e. P-value = .039, α = .01
f. P-value = .017, α = .10

11. Let μ denote the true average reaction time to a certain stimulus. A test of H0: μ = 5 versus Ha: μ > 5 will be based on a large sample size so that when H0 is true, the test statistic z = (x̄ − 5)/(s/√n) has approximately a standard normal distribution (the z curve). Determine the value of z and the corresponding P-value in each of the following cases:
a. n = 50, x̄ = 5.23, s = .89
b. n = 35, x̄ = 5.72, s = 1.01
c. n = 40, x̄ = 5.35, s = 1.67

12. Newly purchased automobile tires of a certain type are supposed to be filled to a pressure of 34 psi. Let μ denote the true average pressure. A test of H0: μ = 34 versus Ha: μ ≠ 34 will be based on a large sample of tires so that the test statistic z = (x̄ − 34)/(s/√n) will have approximately a standard normal distribution when H0 is true. Determine the value of z and the P-value in each of the following cases:
a. n = 50, x̄ = 34.43, s = 1.06
b. n = 50, x̄ = 33.57, s = 1.06
c. n = 32, x̄ = 33.25, s = 1.89
d. n = 36, x̄ = 34.66, s = 2.53

13. It is specified that a certain type of iron should contain .85 gm of silicon per 100 gm of iron (.85%). The silicon content of each of 32 randomly selected iron specimens was determined, and the accompanying Minitab output resulted from a test of the appropriate hypotheses:

Variable   N    Mean    StDev   SE Mean   Z      P-Value
sil cont   32   0.8228  0.1894  0.0335    -0.81  0.42

a. What hypotheses were tested?
b. What conclusion would be reached for a significance level of .05, and why? Answer the same question for a significance level of .10.

14. Lightbulbs of a certain type are advertised as having an average lifetime of 750 hours. The price of these bulbs is very favorable, so a potential customer has decided to go ahead with a purchase arrangement unless it can be conclusively demonstrated that the true average lifetime is smaller than what is advertised. A random sample of 50 bulbs was selected, the lifetime of each bulb determined, and the appropriate hypotheses were tested using Minitab, resulting in the accompanying output:

Variable   N    Mean    StDev   SEMean   Z      P-Value
lifetime   50   738.44  38.20   5.40     -2.14  0.016

a. How can you tell from the output that the alternative hypothesis was not Ha: μ > 750?
b. What conclusion would be appropriate for a significance level of .05? A significance level of .01? What significance level and conclusion would you recommend?

15. A sample of 40 speedometers of a particular type is selected, and each speedometer is calibrated for accuracy at 55 mph, resulting in a sample mean and sample standard deviation of 53.87 and 1.36, respectively. Does this data suggest that the true average reading when speed is 55 mph is in fact something other than 55? State the relevant hypotheses, calculate the value
of the appropriate z statistic, determine the P-value, and state the conclusion for a significance level of .01.

16. To obtain information on the corrosion-resistance properties of a certain type of steel conduit, 35 specimens are buried in soil for an extended period. The maximum penetration (in mils) is then measured for each specimen, yielding a sample mean penetration of 52.7 and a sample standard deviation of 4.8. The conduits were manufactured with the specification that true average penetration be at most 50 mils. Does the sample data indicate that specifications have not been met? State the relevant hypotheses, calculate the value of the appropriate z statistic, determine the P-value, and state the conclusion for a significance level of .05.

17. Automatic identification of the boundaries of significant structures within a medical image is an area of ongoing research. The article "Automatic Segmentation of Medical Images Using Image Registration: Diagnostic and Simulation Applications" (J. of Medical Engr. and Tech., 2005: 53–63) discussed a new technique for such identification. A measure of the accuracy of the automatic region is the average linear displacement (ALD). The paper gave the following ALD observations for a sample of 49 kidneys (units of pixel dimensions).

1.38  0.44  1.09  0.75  0.66  1.28  0.51
0.39  0.70  0.46  0.54  0.83  0.58  0.64
1.30  0.57  0.43  0.62  1.00  1.05  0.82
1.10  0.65  0.99  0.56  0.56  0.64  0.45
0.82  1.06  0.41  0.58  0.66  0.54  0.83
0.59  0.51  1.04  0.85  0.45  0.52  0.58
1.11  0.34  1.25  0.38  1.44  1.28  0.51

a. Summarize and describe the data.
b. Is it plausible that ALD is at least approximately normally distributed? Must normality be assumed prior to testing hypotheses about true average ALD? Explain.
c. The authors commented that in most cases the ALD is better than or on the order of 1.0. Does the data in fact provide strong evidence for concluding that true average ALD under these circumstances is less than 1.0? Carry out an appropriate test of hypotheses.
8.2 Tests Concerning Hypotheses About Means

In this section, we consider hypotheses either about a single population or process mean μ or about a difference μ1 − μ2 between two such means. Our test procedures will utilize test statistics that have either exactly or approximately a t distribution when the null hypothesis H0 is true. This implies that the P-value for the test—the probability, calculated assuming that H0 is true, of observing a test statistic value at least as contradictory to the null hypothesis as what was obtained—will be a t-curve tail area of some sort. The particular tail area that is relevant depends on whether the alternative hypothesis Ha contains an inequality of the form >, <, or ≠.

P-Values for t Tests

   Inequality in Ha   Type of test   Determination of the P-value
   >                  Upper-tailed   Area under the relevant t curve to the right of the calculated t
   <                  Lower-tailed   Area under the relevant t curve to the left of the calculated t
   ≠                  Two-tailed     Twice the tail area captured by the calculated t under the relevant t curve

By the “relevant” t curve, we mean the one having the appropriate number of df. The three cases are illustrated in Figure 8.3.

[Figure 8.3 P-values for t tests: (1) upper-tailed (Ha contains >): P-value = area in the upper tail of the t curve for the relevant df, beyond the calculated t; (2) lower-tailed (Ha contains <): P-value = area in the lower tail; (3) two-tailed (Ha contains ≠): P-value = sum of the areas in the two tails.]

Appendix Table VI contains a tabulation of t-curve upper-tail areas. Each different column of the table is for a different number of df, and the rows are for calculated values of the test statistic t ranging from 0.0 to 4.0 in increments of .1. For example, the number .074 appears at the intersection of the 1.6 row and the 8 df column, so the area under the 8 df curve to the right of 1.6 (an upper-tail area) is .074. Because t curves are symmetric, .074 is also the area under the 8 df curve to the left of −1.6 (a lower-tail area).
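Software can supply these tail areas directly rather than by table lookup. As a minimal illustration (any statistical package has an equivalent function), R’s pt( ) reproduces the lookups just described:

   pt(1.6, df = 8, lower.tail = FALSE)   # upper-tail area to the right of 1.6: 0.074
   pt(-1.6, df = 8)                      # by symmetry, area to the left of -1.6: 0.074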
Suppose, for example, that a test of H0: μ = 100 versus Ha: μ > 100 is based on the 8 df t distribution. If the calculated value of the test statistic is t = 1.6, then the P-value for this upper-tailed test is .074. Because .074 exceeds .05, we would not be able to reject H0 at significance level .05. If the alternative hypothesis is Ha: μ < 100 and a test based on 20 df yields t = −3.2, then Appendix Table VI shows that the P-value is the captured lower-tail area .002. The null hypothesis can be rejected at either level .05 or .01. Consider testing H0: μ1 − μ2 = 0 versus Ha: μ1 − μ2 ≠ 0; the null hypothesis states that the means of the two populations are identical, whereas the alternative hypothesis states that they are different without specifying a direction of departure from H0. If the test is based on 20 df and t = 3.2, then the P-value for this two-tailed test is 2(.002) = .004. This would also be the P-value for t = −3.2. The tail area is doubled because values both larger than 3.2 and smaller than −3.2 are more contradictory to H0 than what was calculated (values farther out in either tail of the t curve). Notice that if the calculated value of t exceeds 4.0, for all but very small df’s the captured tail area is negligible. Also note that the table jumps from 40 df to 60 df to 120 df to ∞ (the z, or standard normal, curve). For example, for 45 df, one could either interpolate between 40 df and 60 df or use the z-curve area as an approximation.

Tests Concerning a Single Mean

Consider testing hypotheses about the mean μ of a single population or process. The null hypothesis will be a statement of equality, such as H0: μ = 100. The alternative hypothesis Ha will contain one of three possible inequalities. A general description of the test procedures necessitates using a symbol to denote the value of μ asserted to be true by the null hypothesis. We use μ0 to denote this null value. Thus the general form of the null hypothesis will be H0: μ = μ0, and the contradictory claim will be Ha: μ > μ0, Ha: μ < μ0, or Ha: μ ≠ μ0.

Suppose that the sample x1, . . . , xn has been randomly selected from a normal population or process distribution (recall from Chapter 2 that the plausibility of this can be checked by constructing a normal quantile plot). Then, as discussed in the development of a confidence interval for μ, the standardized variable

   t = (x̄ − μ)/(s/√n)

has a t distribution with n − 1 degrees of freedom. Our test statistic results from replacing μ by the null value μ0. For H0: μ = 100, this gives the test statistic

   t = (x̄ − 100)/(s/√n)

The key result is that when the null hypothesis is true, the test statistic has a t distribution based on n − 1 df; this is what justifies computing the P-value as described at the beginning of this section.

The One-Sample t Test

Null hypothesis: H0: μ = μ0

Test statistic: t = (x̄ − μ0)/(s/√n)

P-value: Calculated by reference to the t curve for n − 1 df. The test is upper-tailed when the alternative hypothesis is Ha: μ > μ0, lower-tailed in the case Ha: μ < μ0, and two-tailed if the alternative is Ha: μ ≠ μ0.

Assumption: x1, x2, …, xn is a random sample from a normal population or process distribution. If n is large (usually n > 30 suffices), this normality assumption is no longer necessary, because the Central Limit Theorem guarantees that the sampling distribution of x̄ is approximately normal whatever the shape of the population or process distribution. The test statistic can then be denoted by z rather than t, and the P-value is obtained from the z (standard normal) curve.

Example 8.5 Glycerol is a major by-product of ethanol fermentation in wine production and
contributes to the sweetness, body, and fullness of wines. The article “A Rapid and
Simple Method for Simultaneous Determination of Glycerol, Fructose, and Glucose
in Wine” (American J. of Enology and Viticulture, 2007: 279–283) includes the fol-
lowing observations on glycerol concentration (mg/mL) for samples of standard-
quality (uncertified) white wines: 2.67, 4.62, 4.14, 3.81, 3.83. Suppose the desired
concentration value is 4. Does the sample data suggest that true average concen-
tration is something other than the desired value? The normal quantile plot in
Figure 8.4 provides strong support for assuming that the population distribution of
glycerol concentration is normal. Let’s carry out a test of appropriate hypotheses
using the one-sample t test with a significance level of .05.

[Figure 8.4 Normal quantile plot for the data of Example 8.5: glycerol concentration plotted against z quantile.]
Our analysis employs a sequence of steps that we advocate using for any hy-
pothesis-testing investigation:
1. Parameter of interest: μ = true average glycerol concentration
2. Null hypothesis: H0: μ = 4
3. Alternative hypothesis: Ha: μ ≠ 4
4. Test statistic formula: t = (x̄ − 4)/(s/√n)   (do not substitute sample quantities yet)
5. Computation of test statistic value: x̄ = 3.814, s = .718, and

   t = (3.814 − 4)/(.718/√5) = −.58 ≈ −.6

6. Determination of the P-value: The test is based on n − 1 = 4 df. Appendix Table VI shows that the area under the 4 df curve to the right of .6 is .290.

   Therefore the area under the 4 df curve to the left of −.6 is .290. Because the test is two-tailed, P-value = 2(.290) = .580.
7. Conclusion: The specified significance level is α = .05. Since P-value = .580 > .05 = α, we cannot reject H0 at this (or any other reasonable) significance level. The data does not provide strong evidence for concluding that population mean glycerol concentration differs from 4. Notice that in not rejecting H0, we may be committing a type II error (not rejecting the null hypothesis when it is false); we hope, though, we came to this conclusion for the right reason!

The R output from a request to carry out the test follows. The P-value differs slightly from ours because R uses more decimal accuracy in computing t. Thus, if H0 were true, about 59% of all samples would yield a value of t more extreme than what we obtained. We decided not to reject H0 because −.58 is not in the most extreme 5% of all t values.
One Sample t-test
data: concentration
t = -0.5789, df = 4, p-value = 0.5937
alternative hypothesis: true mean is not equal to 4
95 percent confidence interval: 2.921875 4.706125
sample estimates: mean of x 3.814
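Output of this form can be produced by a single call to R’s t.test( ) function; one such call, assuming the five observations are stored in a vector named concentration (the name shown on the data: line above), is

   concentration <- c(2.67, 4.62, 4.14, 3.81, 3.83)
   t.test(concentration, mu = 4)   # two-tailed one-sample t test of H0: mu = 4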

Suppose the sample size in Example 8.5 had been 45 rather than 5, with the same values of x̄ and s. The normality assumption for glycerol concentration becomes unnecessary. The test statistic would be labeled z, and its value would be z = −1.74. Appendix Table I shows that the area under the z curve to the left of −1.74 is .0409, so the P-value is 2(.0409) = .0818 and H0 would be rejected at level .10 but not at levels .05 or .01.

Tests Concerning a Difference Between Two Means: Independent Samples
Hypothesis testing often is used as a basis for comparing two populations, processes, or treat-
ments. For example, data might be collected to decide whether population mean fuel ef-
ficiency for a particular compact car exceeds that for a certain midsize car by more than 4
miles per gallon. Alternatively, two coatings for retarding corrosion might be available for
treating a certain type of pipe. An experiment might then be carried out to decide whether
the true average amount of corrosion when the first coating is used differs from the true
average amount when the second coating is used; the two coatings are the treatments being
studied. The same notation for the two population, process, or treatment means employed
in connection with confidence intervals in the previous chapters will be used here:
   μ1 = mean of population or process 1, or the true average response when treatment 1 is applied
   μ2 = mean of population or process 2, or the true average response when treatment 2 is applied

Inferences about the value of μ1 relative to μ2 are based on two independently obtained random samples, one from the first population, process, or treatment and the other from the second. Let

   n1 = number of observations in the first sample
   x̄1 = sample mean of these n1 observations
   s1² = sample variance of these n1 observations

and n2, x̄2, and s2² are defined analogously with respect to the second sample. Assume that both population, process, or treatment response distributions are normal. A confidence interval for the difference μ1 − μ2 was based on the fact that the standardized variable

   t = (x̄1 − x̄2 − (μ1 − μ2)) / √(s1²/n1 + s2²/n2)

has approximately a t distribution. Suppose the null hypothesis is H0: μ1 − μ2 = 4 (i.e., the value of μ1 is 4 larger than the value of μ2). A test statistic results from replacing μ1 − μ2 in the numerator of t by the null value 4. The test statistic then has approximately a t distribution when the null hypothesis is true. The test will be upper-tailed if the alternative hypothesis is Ha: μ1 − μ2 > 4, lower-tailed if the alternative contains the inequality <, and two-tailed if ≠ appears in Ha.

A general description of the test procedure requires the use of a symbol for the null value; we use the Greek letter Δ for that purpose. Most frequently, in practice, Δ = 0, in which case the null hypothesis says there is no difference between the two μ’s.

The Two-Sample t Test

Null hypothesis: H0: μ1 − μ2 = Δ   (Δ denotes the null value, a number appropriate to the problem situation under consideration)

Test statistic: t = (x̄1 − x̄2 − Δ) / √(s1²/n1 + s2²/n2)

P-value: When H0 is true, the test statistic has approximately a t distribution with

   df = [(se1)² + (se2)²]² / [(se1)⁴/(n1 − 1) + (se2)⁴/(n2 − 1)]

where se = s/√n (df should be rounded down to the nearest whole number). The P-value should then be calculated by reference to the corresponding t curve according to whether the test is upper-, lower-, or two-tailed.

Assumptions: The two random samples are selected independently, both from underlying normal population, process, or treatment response distributions. If the sample sizes are large (usually both n1 > 30 and n2 > 30 will suffice), the Central Limit Theorem implies that the normality assumption is no longer necessary. In this case, the test statistic can be denoted by z, and the P-value calculated by reference to the z curve.
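When only summary statistics are available, as in several of the exercises at the end of this section, the boxed formulas can be evaluated directly. A minimal R sketch (the numbers happen to be those of Example 8.6, which follows; replace them with your own):

   n1 <- 10; xbar1 <- 2902.8; s1 <- 277.3              # first sample summaries
   n2 <- 8;  xbar2 <- 3108.1; s2 <- 205.9              # second sample summaries
   se1 <- s1/sqrt(n1); se2 <- s2/sqrt(n2)
   tstat <- (xbar1 - xbar2 - 0)/sqrt(se1^2 + se2^2)    # null value Delta = 0
   df <- floor((se1^2 + se2^2)^2/(se1^4/(n1 - 1) + se2^4/(n2 - 1)))   # round down
   pt(tstat, df)   # lower-tailed P-value; add lower.tail = FALSE if upper-tailed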

Example 8.6 The deterioration of many municipal pipeline networks across the country is a grow-
ing concern. One technology proposed for pipeline rehabilitation uses a flexible lin-
er threaded through existing pipe. The article “Effect of Welding on a High-Density
Polyethylene Liner” (J. of Materials in Civil Engr., 1996: 94–100) reported the fol-
lowing data on tensile strength (psi) of liner specimens both when a certain fusion
process was used and when this process was not used:
1. No fusion: 2748 2700 2655 2822 2511 3149 3257 3213 3220 2753
   n1 = 10   x̄1 = 2902.8   s1 = 277.3   se1 = 87.69

2. Fused: 3027 3356 3359 3297 3125 2910 2889 2902
   n2 = 8   x̄2 = 3108.1   s2 = 205.9   se2 = 72.80
Figure 8.5 shows normal probability plots from Minitab. These plots employ a
probability scale rather than the normal quantiles discussed previously, but the criti-
cal issue is the same: Is the pattern of plotted points reasonably close to linear? There
certainly is some wiggling in these plots, but not enough to suggest that the normal-
ity assumption is implausible. Furthermore, the P-values that appear along with the
plots are for formal tests of the assertion that the underlying distributions are normal
(we discuss this test in Section 8.4). Because each P-value exceeds .1, the hypothesis
of normality cannot be rejected.

[Figure 8.5 Normal probability plots from Minitab of the tensile strength data. NotFused panel: Mean 2903, StDev 277.3, N 10, RJ 0.944, P-Value >0.100. Fused panel: Mean 3108, StDev 205.9, N 8, RJ 0.939, P-Value >0.100.]

The authors of the article stated that the fusion process increased the average
tensile strength. The message from the comparative boxplot of Figure 8.6 is not all
that clear. Let’s carry out a test of hypotheses to see whether the data supports this
conclusion.

1. Let μ1 be the true average tensile strength of specimens when the no-fusion treatment is used and μ2 denote the true average tensile strength when the fusion treatment is used.
2. H0: μ1 − μ2 = 0   (no difference in the true average tensile strengths for the two treatments)
3. Ha: μ1 − μ2 < 0   (true average tensile strength for the no-fusion treatment is less than that for the fusion treatment, so the investigators’ conclusion is correct)

[Figure 8.6 A comparative boxplot of the tensile strength data (Type 1 and Type 2 samples; Strength axis from 2500 to 3400).]

4. The null value is Δ = 0, so the test statistic is

   t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)

5. We now compute both the test statistic value and the df for the test:

   t = (2902.8 − 3108.1) / √((277.3)²/10 + (205.9)²/8) = −205.3/113.97 = −1.8

   df = [(87.69)² + (72.80)²]² / [(87.69)⁴/9 + (72.80)⁴/7] = 15.94

   so the test will be based on 15 df.
6. Appendix Table VI shows that the area under the 15 df t curve to the right
of 1.8 is .046, so the P-value for a lower-tailed test is also .046. The following
Minitab output summarizes all the computations:
Twosample T for nofusion vs fused
N Mean StDev SE Mean
nofusion 10 2903 277 88
fused 8 3108 206 73
95% C.I. for mu nofusion-mu fused: (–448, 38)
T-Test mu nofusion = mu fused (vs <): T= – 1.80 P = 0.046 DF=15
7. Using a significance level of .05, we can barely reject the null hypothesis in favor of the alternative hypothesis, confirming the conclusion stated in the article. However, someone demanding more compelling evidence might select α = .01, a level for which H0 cannot be rejected.
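For comparison, the same Welch two-sample analysis can be run in R from the raw data (the vector names here are ours):

   nofusion <- c(2748, 2700, 2655, 2822, 2511, 3149, 3257, 3213, 3220, 2753)
   fused    <- c(3027, 3356, 3359, 3297, 3125, 2910, 2889, 2902)
   t.test(nofusion, fused, alternative = "less")   # Ha: mu1 - mu2 < 0
   # gives t = -1.80 on approximately 15.9 df and a P-value near .046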

Suppose the issue in Example 8.6 had been whether fusing increased true average strength by more than 100 psi. Then the relevant hypotheses would have been H0: μ1 − μ2 = −100 versus Ha: μ1 − μ2 < −100; that is, the null value would have been Δ = −100.

Tests Concerning a Difference Between Two Means: Paired Data
A comparison of two population, process, or treatment means is often carried out by col-
lecting data in pairs. Suppose, for example, that two different fertilizer formulations are
being compared with respect to crop yield. Variation in soil characteristics, amount of
precipitation, amount of sunshine, and various other factors can affect yield. To protect
against this extraneous variation, an investigator could select pairs of plots (the experi-
mental units) so that within each pair the two plots are as similar as possible with respect
to any characteristics that might have a bearing on yield. Then the first fertilizer could

be applied to one plot within each pair and the second formulation used on the other
plot. This pairing is really a special case of blocking, as discussed in Chapter 4. The
homogeneity of experimental units within each block (pair) makes it easier to detect a
difference between the treatments if a difference actually exists.
Again, let μ1 and μ2 denote the two population, process, or treatment response means. The pairs in a sample can be viewed as having been selected from a much larger population of pairs. Now conceptualize subtracting the second number in each such pair from the first number to obtain a population of differences. If we let μd denote the population mean difference, it follows that

   μd = μ1 − μ2

This relationship implies that any hypothesis about μ1 − μ2 is equivalent to a hypothesis about μd. For example, the assertion that μ1 − μ2 = 10 is the same as the claim μd = 10. But hypotheses about μd can be tested by using the sample differences. In particular, assuming that the underlying distribution of differences is normal, we can use a one-sample t test based on these sample differences.

The Paired t Test

Null hypothesis: H0: μd = Δ   (equivalent to H0: μ1 − μ2 = Δ), where μd denotes the population mean difference

Test statistic: t = (d̄ − Δ)/(sd/√n), where

   n = number of sample differences (pairs)
   d1, d2, . . . , dn = these sample differences
   d̄ = sample mean difference
   sd = sample standard deviation of the differences

P-value: Calculated from the t curve with n − 1 df as described previously. The test is upper-tailed, lower-tailed, or two-tailed, depending on whether the inequality in Ha is >, <, or ≠, respectively.

Assumptions: The sample differences d1, . . . , dn have been randomly selected from a difference population having a normal distribution. If n is large, the normality assumption is not necessary; the test statistic is labeled z, and the P-value is determined from the z curve.

Example 8.7 Musculoskeletal neck-and-shoulder disorders are all too common among office
staff who perform repetitive tasks using visual display units. The article “Upper-
Arm Elevation During Office Work” (Ergonomics, 1996: 1221–1230) reported
on a study to determine whether more varied work conditions would have any
impact on arm movement. The accompanying data was obtained from a sample
of n = 16 subjects. Each observation is the amount of time, expressed as a pro-
portion of total time observed, during which arm elevation was below 30°. The
two measurements from each subject were obtained 18 months apart. During
this period, work conditions were changed, and subjects were allowed to engage

in a wider variety of work tasks. Does the data suggest that true average time dur-
ing which elevation is below 30° differs after the change from what it was before
the change?
Subject:      1   2   3   4   5   6   7   8
Before:      81  87  86  82  90  86  96  73
After:       78  91  78  78  84  67  92  70
Difference:   3  −4   8   4   6  19   4   3

Subject:      9  10  11  12  13  14  15  16
Before:      74  75  72  80  66  72  56  82
After:       58  62  70  58  66  60  65  73
Difference:  16  13   2  22   0  12  −9   9
Figure 8.7 shows a normal probability plot of the 16 differences; the pattern in the plot is quite straight, supporting the normality assumption. A boxplot of these differences appears in Figure 8.8; the boxplot is located considerably to the right of zero, suggesting that perhaps μd > 0 (note also that 13 of the 16 differences are positive and only two are negative).

[Figure 8.7 A normal probability plot from Minitab of the differences in Example 8.7 (Mean 6.75, StDev 8.234, N 16, RJ 0.992, P-Value >0.100).]

[Figure 8.8 A boxplot of the differences in Example 8.7.]

Let’s now use the recommended sequence of steps to test the appropriate
hypotheses.

1. Let μd denote the true average difference between elevation time before the change in work conditions and time after the change.
2. H0: μd = 0   (there is no difference between true average time before the change and true average time after the change)
3. Ha: μd ≠ 0
4. t = (d̄ − 0)/(sd/√n) = d̄/(sd/√n)
5. n = 16, Σdi = 108, Σdi² = 1746, from which d̄ = 6.75, sd = 8.234, and

   t = 6.75/(8.234/√16) = 3.28 ≈ 3.3

6. Appendix Table VI shows that the area to the right of 3.3 under the t curve with 15 df is .002. The inequality in Ha implies that a two-tailed test is appropriate, so the P-value is approximately 2(.002) = .004 (Minitab gives .0051).
7. Since .004 < .01, the null hypothesis can be rejected at either significance level .05 or .01. It does appear that the true average difference between times is something other than zero; that is, true average time after the change is different from that before the change.

Suppose the question posed had been, Does it appear that the change in work conditions decreases true average time by more than 5? The relevant hypotheses would then be H0: μd = 5 versus Ha: μd > 5, for which the test statistic is t = (d̄ − 5)/(sd/√n).
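In R, this paired analysis amounts to a one-sample t test on the differences; a sketch using the 16 differences listed above (equivalently, t.test(before, after, paired = TRUE) on the original columns):

   d <- c(3, -4, 8, 4, 6, 19, 4, 3, 16, 13, 2, 22, 0, 12, -9, 9)
   t.test(d, mu = 0)   # two-tailed paired t test: t = 3.28 on 15 df, P-value about .005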

In Section 8.4, we show how a test of the null hypothesis that a population distribu-
tion is normal can be based on a normal quantile or probability plot. In Section 8.5, we
discuss several further aspects of hypothesis testing, including the determination of type
II error probabilities for t tests.

Section 8.2 Exercises

18. Give as much information as you can about the P-value of a t test in each of the following situations:
   a. Upper-tailed test, df = 8, t = 2.0
   b. Lower-tailed test, df = 11, t = −2.4
   c. Two-tailed test, df = 15, t = −1.6
   d. Upper-tailed test, df = 19, t = 2.4
   e. Upper-tailed test, df = 5, t = 5.0
   f. Two-tailed test, df = 40, t = −4.8

19. The paint used to make lines on roads must reflect enough light to be clearly visible at night. Let μ denote the true average reflectometer reading for a new type of paint under consideration. A test of H0: μ = 20 versus Ha: μ > 20 will be based on a random sample of size n from a normal population distribution. What conclusion is appropriate in each of the following situations?
   a. n = 15, t = 3.2, α = .05
   b. n = 9, t = 1.8, α = .01
   c. n = 24, t = 2.2

20. A certain pen has been designed so that true average writing lifetime under controlled conditions
(involving the use of a writing machine) is at least 10 hours. A random sample of 18 pens is selected, the writing lifetime of each is determined, and a normal quantile plot of the resulting data supports the use of a one-sample t test.
   a. What hypotheses should be tested if the investigators believe a priori that the design specification has been satisfied?
   b. What conclusion is appropriate if the hypotheses of part (a) are tested, t = −2.3, and α = .05?
   c. What conclusion is appropriate if the hypotheses of part (a) are tested, t = −1.8, and α = .01?
   d. What should be concluded if the hypotheses of part (a) are tested and t = −3.6?

21. The true average diameter of ball bearings of a certain type is supposed to be .5 in. A one-sample t test will be carried out to see whether this is the case. What conclusion is appropriate in each of the following situations?
   a. n = 13, t = 1.6, α = .05
   b. n = 13, t = −1.6, α = .05
   c. n = 25, t = −2.6, α = .01
   d. n = 25, t = −3.9

22. The article “The Foreman’s View of Quality Control” (Quality Engr., 1990: 257–280) described an investigation into the coating weights for large pipes resulting from a galvanized coating process. Production standards call for a true average weight of 200 lb per pipe. The accompanying descriptive summary and boxplot are from Minitab.

   Variable   N    Mean     Median   TrMean   StDev   SEMean
   ctg wt     30   206.73   206.00   206.81   6.35    1.16

   Variable   Min      Max      Q1       Q3
   ctg wt     193.00   218.00   202.75   212.00

   [Boxplot of coating weight; horizontal axis from 190 to 220.]

   a. What does the boxplot suggest about the status of the specification for true average coating weight?
   b. A normal quantile plot of the data was quite straight. Use the descriptive output to test the appropriate hypotheses.

23. Exercise 5 in Chapter 2 gave n = 12 observations on daily energy demand readings (kW h) for remote telecommunications stations throughout Cameroon, from which the sample mean and sample standard deviation are 32.59 and 10.66, respectively. Suppose the investigators had believed a priori that true average daily energy demand would be at most 30 kW h. Does the data contradict this prior belief? Assuming normality, test the appropriate hypotheses using a significance level of .05.

24. Reconsider the sample observations introduced in Exercise 15 in Chapter 2 on the required force (N) to cause initial cracks in a thin enclosure for a subdermally implanted biotelemetry device:

   2006.1  2065.2  2118.9  1686.6  1966.9  1792.5

   Suppose the device will not be used unless the true average required force to cause initial cracks exceeds 1800 N. Does this requirement appear to have been satisfied? State and test the appropriate hypotheses.

25. Poly(3-hydroxybutyrate) (PHB), a semicrystalline polymer that is fully biodegradable and biocompatible, is obtained from renewable resources. From a sustainability perspective, PHB offers many attractive properties though it is more expensive to produce than standard plastics. The authors of “The Melting Behaviour of Poly(3-Hydroxybutyrate) by DSC. Reproducibility Study” (Polymer Testing, 2013: 215–220) wanted to investigate various physical properties of PHB by using a differential scanning calorimeter (DSC). For each of 12 PHB specimens, the authors used a DSC to measure the melting point (in °C) of the polymer, which is the temperature for 99% completion of the fusion process.

   180.5  181.7  180.9  181.6  182.6  181.6
   181.3  182.1  182.1  180.3  181.7  180.5

   A normal probability plot of the data shows a reasonably linear pattern, so it is plausible that the population distribution of PHB melting points as
measured by DSC is at least approximately normal. The sample mean and standard deviation are 181.4 and .7242, respectively. Is there compelling evidence for concluding that true average melting point exceeds 181°C? Carry out a test of hypotheses using a significance level of .05.

26. The relative conductivity of a semiconductor device is determined by the amount of impurity “doped” into the device during its manufacture. A silicon diode to be used for a specific purpose requires an average cut-on voltage of .60 V, and if this is not achieved, the amount of impurity must be adjusted. A sample of diodes was selected and the cut-on voltage was determined. The accompanying SAS output resulted from a request to test the appropriate hypotheses.

   N    Mean        Std Dev     T           Prob>|T|
   15   0.0453333   0.0899100   1.9527887   0.0711

   (Note: SAS explicitly tests H0: μ = 0, so to test H0: μ = .60, the null value .60 must be subtracted from each xi; the reported mean is then the average of the (xi − .60) values. Also, SAS’s P-value is always for a two-tailed test.) What would be concluded for a significance level of .01? .05? .10?

27. Determine the number of degrees of freedom for the two-sample t test in each of the following situations:
   a. n1 = 10, n2 = 10, s1 = 5.0, s2 = 6.0
   b. n1 = 10, n2 = 15, s1 = 5.0, s2 = 6.0
   c. n1 = 10, n2 = 15, s1 = 2.0, s2 = 6.0
   d. n1 = 12, n2 = 24, s1 = 5.0, s2 = 6.0

28. Urban storm water can be contaminated by many sources, including discarded batteries. When ruptured, these batteries release metals of environmental significance. The article “Urban Battery Litter” (J. of Environ. Engr., 2009: 46–57) presented summary data for characteristics of a variety of batteries found in urban areas around Cleveland. Here are data on zinc mass (g) for two different brands of size D batteries:

   Brand       Sample Size   Sample Mean   Sample SD
   Duracell    15            138.52        7.76
   Energizer   20            149.07        1.52

   Assuming that both zinc mass distributions are at least approximately normal, carry out a test at significance level .05 to decide whether true average zinc mass is different for the two types of batteries.

29. Quantitative noninvasive techniques are needed for routinely assessing symptoms of peripheral neuropathies, such as carpal tunnel syndrome (CTS). The article “A Gap Detection Tactility Test for Sensory Deficits Associated with Carpal Tunnel Syndrome” (Ergonomics, 1995: 2588–2601) reported on a test that involved sensing a tiny gap in an otherwise smooth surface by probing with a finger; this functionally resembles many work-related tactile activities, such as detecting scratches or surface defects. When finger probing was not allowed, the sample average gap detection threshold for n1 = 8 normal subjects was 1.71 mm, and the sample standard deviation was .53; for n2 = 10 CTS subjects, the sample mean and sample standard deviation were 2.53 and .87, respectively. Does this data suggest that the true average gap detection threshold for CTS subjects exceeds that for normal subjects? State and test the relevant hypotheses using a significance level of .01.

30. According to the article “Fatigue Testing of Condoms” (Polymer Testing, 2009: 567–571), “tests currently used for condoms are surrogates for the challenges they face in use,” including a test for holes, an inflation test, a package seal test, and tests of dimensions and lubricant quality. The investigators developed a new test that adds cyclic strain to a level well below breakage and determines the number of cycles to break. The article reported that for a sample of 20 natural latex condoms of a certain type, the sample mean and sample standard deviation of the number of cycles to break were 4358 and 2218, respectively, whereas a sample of 20 polyisoprene condoms gave a sample mean and sample standard deviation of 5805 and 3990, respectively. Is there strong evidence for concluding that the true average number of cycles to break for the polyisoprene condom exceeds that for the natural latex condom by more than 1000 cycles? Carry out a test using a significance level of .01. (Note: The cited paper reported P-values of t tests for comparing means of the various types considered.)

31. Fusible interlinings are being used with increasing frequency to support outer fabrics and improve the shape and drape of various pieces of clothing. The article “Compatibility of Outer and Fusible Interlining Fabrics in Tailored Garments” (Textile Res. J., 1997: 137–142) gave the accompanying data on extensibility (%) at 100 gm/cm for both high-quality fabric (H) and poor-quality fabric (P) specimens:

   H: 1.2  .9  .7 1.0 1.7 1.7 1.1  .9 1.7 1.9 1.3 2.1 1.6 1.8 1.4 1.3 1.9 1.6  .8 2.0 1.7 1.6 2.3 2.0
   P: 1.6 1.5 1.1 2.1 1.5 1.3 1.0 2.6

   a. Construct normal quantile plots to verify the plausibility of both samples having been selected from normal population distributions.
   b. Construct a comparative boxplot. Does it suggest that there is a difference between true average extensibility for high-quality fabric specimens and that for poor-quality specimens?
   c. The sample mean and standard deviation for the high-quality sample are 1.508 and .444, respectively, and those for the poor-quality sample are 1.588 and .530. Use the two-sample t test to decide whether true average extensibility differs for the two types of fabrics.

32. The article cited in Exercise 41 in Chapter 7 gave the following data on work of adhesion measurements (in mJ/m²) for samples of ultra-high performance concrete adhered to two types of substrates:

   Substrate   Observations
   Steel:      107.1  109.5  107.4  106.8  108.1
   Glass:      122.4  124.6  121.6  120.6  123.3

   Assuming that both samples were selected from normal distributions, carry out a test of hypotheses to decide whether the true average work of adhesion for the glass substrate is more than 12 mJ/m² higher than that for the steel substrate.

33. The article “The Influence of Corrosion Inhibitor and Surface Abrasion on the Failure of Aluminum-Wired Twist-on Connections” (IEEE Trans. on Components, Hybrids, and Manuf. Tech., 1984: 20–25) reported data on potential drop measurements for one sample of connectors wired with alloy aluminum and another sample wired with EC aluminum. Does the accompanying SAS output suggest that the true average potential drop for alloy connections (type 1) is higher than that for EC connections (as stated in the article)? Carry out the appropriate test using a significance level of .01. In reaching your conclusion, what type of error might you have committed? Note: SAS reports the P-value for a two-tailed test.

   Type   N    Mean      Std Dev      Std Error
   1      20   17.4990   0.55012821   0.12301241
   2      20   16.9000   0.48998389   0.10956373

   Type   Variances   T        DF     Prob>|T|
   1      Unequal     3.6362   37.5   0.0008
   2      Equal       3.6362   38.0   0.0008

34. The article “Evaluation of a Ventilation Strategy to Prevent Barotrauma in Patients at High Risk for Acute Respiratory Distress Syndrome” (New England J. of Medicine, 1998: 355–358) reported on an experiment in which 120 patients with similar clinical features were randomly divided into a control group and a treatment group, each consisting of 60 of the patients. The sample mean ICU stay (days) and sample standard deviation for the treatment group were 19.9 and 39.1, respectively, whereas these values for the control group were 13.7 and 15.8.
   a. Calculate a point estimate for the difference between true average ICU stay for the treatment and control groups. Does this estimate suggest that there is a significant difference between true average stays under the two conditions?
   b. Answer the question posed in part (a) by carrying out a formal test of hypotheses. Is the result different from what you conjectured in part (a)?
   c. Does it appear that ICU stay for patients given the ventilation treatment is normally distributed? Explain your reasoning.

35. According to the article “Modelling and Predicting the Effects of Submerged Arc Weldment Process Parameters on Weldment Characteristics and Shape Profiles” (J. of Engr. Manuf., 2012: 1230–1240), the submerged arc welding (SAW) process is commonly used for joining thick plates and pipes. During welding, the SAW electrode causes a slight deformation on and in the surface of the base metal. This deformation is known as the

SAW weldment profile; research has shown that its shape could be related to plate melting efficiency. Authors of the article wanted to investigate how certain settings of the welding process affect macrostructure zones of the SAW weldment profile. The heat affected zone (HAZ), a band created within the base metal during welding, was of particular interest. The article reported the impact of various SAW process settings (including current, voltage, and welding speed) on characteristics of the weldment profile. In one investigation, the SAW process was run on various current settings (A) and the depth (mm) of the HAZ was recorded. The data below is partitioned across high (525 A) and nonhigh (<525 A) current settings:

   NonHigh: 1.04 1.15 1.23 1.69 1.92 1.98 2.36 2.49 2.72 1.37 1.43 1.57 1.71 1.94 2.06 2.55 2.64 2.82
   High:    1.55 2.02 2.02 2.05 2.35 2.57 2.93 2.94 2.97

   Does it appear that true average HAZ depth is larger for the high current condition than for the nonhigh current condition? Carry out a test of appropriate hypotheses using a significance level of .01.

36. Which factors are relevant to the time a consumer spends looking at a product on the shelf prior to selection? The article “Effects of Base Price Upon Search Behavior of Consumers in a Supermarket” (J. Econ. Psychol., 2003: 637–652) reported the following data on elapsed time (sec) for fabric softener purchasers and washing-up liquid purchasers; the former product is significantly more expensive than the latter. These products were chosen because they are similar with respect to allocated shelf space and number of alternative brands.

   Product             Sample Size   Sample Mean   Sample SD
   Fabric softener     15            30.47         19.15
   Washing-up liquid   19            26.53         15.37

   a. What if any assumptions are needed before the t inferential procedure can be used to compare true average elapsed times?
   b. Carry out a test of hypotheses to decide whether the true average difference in elapsed times differs from zero.

37. Exercise 54 in Chapter 7 presented a t variable appropriate for making inferences about μ1 − μ2 when both population distributions are normal and, in addition, it can be assumed that σ1 = σ2.
   a. Describe how this variable can be used to form a test statistic and test procedure, the pooled t test, for testing H0: μ1 − μ2 = Δ.
   b. Use the pooled t test to test the relevant hypotheses based on the SAS output given in Exercise 33.
   c. Use the pooled t test to reach a conclusion in Exercise 35.

38. The drug diethylstilbestrol was used for years by women as a nonsteroidal treatment for pregnancy maintenance, but it was banned in 1971 when research indicated a link with the incidence of cervical cancer. The article “Effects of Prenatal Exposure to Diethylstilbestrol (DES) on Hemispheric Laterality and Spatial Ability in Human Males” (Hormones and Behavior, 1992: 62–75) discussed a study in which ten males exposed to DES and their unexposed brothers underwent various tests. This is the summary data on the results of a spatial ability test:

   exposed mean = 12.6
   unexposed mean = 13.8
   standard error of difference = sd/√n = .5

   Does DES exposure appear to be associated with reduced spatial ability? State and test the appropriate hypotheses using α = .05. Does the conclusion change if α = .01 is used?

39. Parents often urge their children to “sit up straight” when dining to practice good table manners. Although proper posture is part of maintaining good etiquette, research has shown that it can also help in reducing musculoskeletal disorders (MSDs). The authors of “Reducing Musculoskeletal Disorders Among Computer Operators: Comparison Between Ergonomics Interventions at the Workplace” (Ergonomics, 2012: 1571–1585) investigated the impact of a workplace intervention for reducing MSDs for computer workers. For one group of workers the intervention was in the form of a short oral presentation on how to sit; the preferred heights of chairs, tables, keyboards, and screens; and

optimal positions of the back, shoulders, elbows, and wrists. Both an MSD score and a rapid upper limb assessment (RULA) score were obtained for each participant. The MSD score is the total number of painful body parts reported by the individual. The RULA score is a rating of the individual’s posture, with lower numbers indicating better posture. Each score was determined both before and after the oral presentation intervention. (The textbook author who found this article did find that his own posture improved at least while he was typing this exercise in the manuscript.)

   Measurement   Sample Size   Mean Difference (After–Before)   SD of Difference
   MSD Score     21            .19                              1.03
   RULA Score    21            −1.52                            1.56

   a. Assuming that the difference in MSD scores (After–Before) is approximately normal, carry out a test at significance level .05 to decide whether true average difference in MSD scores is different from zero.
   b. Assuming that the difference in RULA scores (After–Before) is approximately normal, carry out a test at significance level .05 to decide whether true average difference in RULA scores is different from zero.
   c. From parts (a) and (b) you should have found that for one score the intervention had a significant impact but not for the other score. Keeping in mind what the scores measure, can you offer an explanation of why this may have occurred? (For a group of computer workers who were exposed to a more rigorous type of intervention, the article reported that intervention was beneficial for both MSD and RULA scores.)

40. The article “Selection of a Method to Determine Residual Chlorine in Sewage Effluents” (Water and Sewage Works, 1971: 360–364) reported the results of an experiment in which two different methods of determining chlorine content were used on samples of Cl2-demand-free water for various doses and contact times. Observations are in mg/L.

   Sample:        1     2     3     4
   MSI method:   .39   .84  1.76  3.35
   SIB method:   .36  1.35  2.56  3.92

   Sample:        5     6     7     8
   MSI method:  4.69  7.70 10.52 10.92
   SIB method:  5.35  8.33 10.70 10.91

   Does the true average content measured by one method appear to differ from that measured by the other method? State and test the appropriate hypotheses. Does the conclusion depend on whether a significance level of .05, .01, or .001 is used?

41. Shoveling is not exactly a high-tech activity but will continue to be a required task even in our information age. The article “A Shovel with a Perforated Blade Reduces Energy Expenditure Required for Digging Wet Clay” (Human Factors, 2010: 492–502) reported on an experiment in which each of 13 workers was provided with both a conventional shovel and a shovel whose blade was perforated with small holes. The authors of the cited article provided the following data on stable energy expenditure [kcal/kg(subject)/lb(clay)]:

   Worker:         1      2      3      4      5      6      7
   Conventional:  .0011  .0014  .0018  .0022  .0010  .0016  .0028
   Perforated:    .0011  .0010  .0019  .0013  .0011  .0017  .0024

   Worker:         8      9     10     11     12     13
   Conventional:  .0020  .0015  .0014  .0023  .0017  .0020
   Perforated:    .0020  .0013  .0013  .0017  .0015  .0013

   Carry out a test of hypotheses at significance level .05 to see if true average energy expenditure using the conventional shovel exceeds that using the perforated shovel.

42. The article “Supervised Exercise Versus Non-Supervised Exercise for Reducing Weight in Obese Adults” (J. Sport. Med. Phys. Fit., 2009: 85–90) reported on an investigation in which participants were randomly assigned either to a supervised exercise program or a control group. Those in the control group were told only that


they should take measures to lose weight. After 4 months, the sample mean decrease in body fat for the 17 individuals in the experimental group was 6.2 kg with a sample standard deviation of 4.5 kg, whereas the sample mean and standard deviation for the 17 people in the control group were 1.7 kg and 3.1 kg, respectively. Assume normality of the two body fat loss distributions (as did the investigators). Does it appear that true average decrease in body fat is more than 2 kg larger for the experimental condition than for the control condition? Carry out a test of appropriate hypotheses using a significance level of .01.

43. The article “The Accuracy of Stated Energy Contents of Reduced-Energy, Commercially Prepared Foods” (J. of the Amer. Dietetic Assoc., 2010: 116–123) presented the accompanying data on vendor-stated gross energy and measured value (both in kcal) for 10 different supermarket convenience meals:

   Meal:     1    2    3    4    5    6    7    8    9   10
   Stated:  180  220  190  230  200  370  250  240   80  180
   Meas.:   212  319  231  306  211  431  288  265  145  228

   Carry out a test of hypotheses to decide whether the true average % difference from that stated differs from zero. (Note: The article stated “Although formal statistical methods do not apply to convenience samples, standard statistical tests were employed to summarize the data for exploratory purposes and to suggest directions for future studies.”)

8.3 Tests Concerning Hypotheses About a Categorical Population
In this section, we consider several hypothesis-testing situations involving categorical,
as opposed to numerical, populations. Suppose that each individual or object in the
population can be placed in one of k nonoverlapping categories. For example, systems
of a particular type may consist of four components, and the failure of each system
may be attributed to failure of one particular component. The four relevant categories
would then be “failure of first component,” . . . , “failure of fourth component.” The
null hypothesis will specify a particular value for each one of the category proportions
(i.e., probabilities). In the system example, H0 might specify that each of the long-run
failure proportions is .25; that is, a failure is equally likely to be attributed to any one of
the four components. A more complicated situation is that in which each individual or
object can be categorized with respect to two different categorical factors. For example,
each new automobile of a certain type might be classified with respect to color—white,
black, blue, etc.—and also with respect to the type of transmission—automatic or man-
ual. We shall consider testing the null hypothesis that categories of the first factor occur
independently of those of the second, for example, that car color is independent of type
of transmission, that political party registration is independent of preferred religious de-
nomination, and so on. These tests are based on a type of probability distribution that we
have not yet encountered, so we first digress from testing to introduce this distribution.

Chi-Squared Distributions
Just as with t distributions, there is not a single chi-squared distribution. Rather there
is an entire family of distributions. A particular member of the family is identified by
specifying some number of degrees of freedom. Thus there is one chi-squared distribu-
tion with 1 df, another with 2 df, yet another with 3 df, and so on. Curves corresponding
to several different chi-squared distributions are shown in Figure 8.9. There is no den-
sity to the left of zero, so negative values of chi-squared variables are precluded. Each

Figure 8.9 Chi-squared curves (df = 8, 12, and 20)

chi-squared curve is positively skewed; as the number of df increases, the curves stretch
farther and farther to the right and become more symmetric.
Our chi-squared tests are all upper-tailed, so the P-value is the area captured under
a particular chi-squared curve to the right of the calculated test statistic value. The
fact that t curves were all centered at zero allowed us to tabulate t-curve tail areas in a
relatively compact way, with the left margin giving values ranging from 0.0 to 4.0 on
the horizontal t scale and various columns displaying corresponding upper-tail areas for
various df’s. The rightward movement of chi-squared curves as df increases necessitates
a somewhat different type of tabulation. The left margin of Appendix Table VII displays
various upper-tail areas: .100, .095, .090, . . . , .005, and .001. Each column of the table
is for a different value of df, and the entries are values on the horizontal chi-squared axis
that capture these corresponding tail areas. For example, moving down to tail area .085
and across to the 2 df column, we see that the area to the right of 4.93 under the 2 df
chi-squared curve is .085 (see Figure 8.10). To capture this same upper-tail area under
the 10 df curve, we must go out to 16.54. In the 2 df column, the top row shows that if
the calculated value of the chi-squared variable is smaller than 4.60, the captured tail
area (the P-value) exceeds .10. Similarly, the bottom row in this column indicates that
if the calculated value exceeds 13.81, the tail area is smaller than .001 (P-value < .001).

Figure 8.10 Capturing a particular upper-tail area under a chi-squared curve (chi-squared curve for 2 df; shaded area = .085 to the right of 4.93)

Tests Based on Univariate Categorical Data


Suppose that each individual or object in a population or process can be placed in one
of k nonoverlapping categories. Let
π1 = population or long-run process proportion falling in the first category
  ⋮
πk = population or long-run process proportion falling in the kth category


The π's can also be interpreted as probabilities; πi is the probability that a randomly selected individual or object will fall in the ith category. The null hypothesis completely specifies the value of each πi; we denote these hypothesized values by adding a subscript 0 to each πi (as we used μ0 to denote the null value in a test involving μ):

πi0 = value of πi asserted to be true by the null hypothesis (i = 1, . . . , k)

As an example, suppose that the genotype for a particular genetic characteristic can be either AA, Aa, or aa (k = 3). The standard genetic argument in this situation implies the null hypothesis

H0: π1 = .25, π2 = .50, π3 = .25

The alternative hypothesis states simply that the specification in H0 is not correct—that is, at least one of the πi0's is incorrect (because the hypothesized values add to 1.0, if a particular value is incorrect, at least one other value must also be incorrect). A test of these
hypotheses will be based on a random sample taken from the population or process. Each
individual or object in the sample will belong in exactly one of the k categories; thus we
will have a sample consisting of univariate categorical data. For example, we might select
n = 100 individuals and find that the first has genotype Aa, the second has genotype aa,
the third and fourth both have genotype Aa, the fifth has genotype AA, and so on. Let
n1 = number of sampled individuals or objects falling in the first category
  ⋮
nk = number of sampled individuals or objects falling in the kth category

The ni values are called observed category frequencies or counts. In the genetics example with k = 3, we might have n = 100, n1 = 20, and n2 = 53, from which n3 = 100 − 20 − 53 = 27.
The central idea of the test procedure is to compare the observed counts with what would be expected were H0 true. If, for example, the three hypothesized values are .25, .50, and .25, and n = 100, then when the null hypothesis is true,

expected number in the first category = nπ10 = 100(.25) = 25
expected number in the second category = nπ20 = 100(.50) = 50
expected number in the third category = nπ30 = 100(.25) = 25

More generally,

expected frequency for category i when H0 is true = nπi0  (i = 1, . . . , k)
That is, expected frequencies under H0 are obtained by multiplying each hypothesized
value by the sample size. Intuitively, the data supports the null hypothesis when the
observed frequencies are similar to the expected frequencies. If some of the observed
frequencies differ substantially from what would be expected if H0 were true, the null
hypothesis is no longer tenable.
We now need a quantitative measure of how different the observed frequencies
are from the expected frequencies, assuming H0 is true. A first thought is to subtract each expected frequency from the corresponding observed frequency to obtain a deviation, square these deviations, and add them together. Symbolically, this would be Σ(ni − nπi0)². Suppose, however, that

n1 = 95, nπ10 = 100
n2 = 15, nπ20 = 20

Then both deviations are −5, so they both contribute the same amount to our quanti-
tative measure of discrepancy. However, the observed frequency for the first category
is only 5% smaller than what was expected, whereas the observed frequency for the
second category is fully 25% smaller than what we would expect if the null hypothesis
were true. Our proposed measure does not reflect the fact that, on a percentage basis,
the discrepancy for the second category is more sizable than that for the first category.
The chi-squared test statistic takes into account percentage deviations.

The Chi-Squared Test Based on Univariate Categorical Data

Hypotheses:
H0: π1 = π10, . . . , πk = πk0
Ha: the specification of the π's in H0 is not correct

Test statistic:

X² = Σ (ni − nπi0)²/(nπi0) = Σ (observed − expected)²/expected

where the sum is over the categories i = 1, . . . , k.

(Many sources denote this statistic by χ², read “chi-squared,” but to avoid confusing this with a parameter we don't want to use a Greek letter.) The smallest possible value of this test statistic is X² = 0 (when observed = expected for every category), which provides the strongest possible support for the null hypothesis. The larger the value of X², the stronger is the evidence against H0.

P-value: The key result underlying the test procedure is that when H0 is true and nπi0 > 5 for i = 1, . . . , k (i.e., all expected counts exceed 5), X² has approximately a chi-squared distribution with k − 1 df. The P-value is then approximately the area under the k − 1 df chi-squared curve to the right of the calculated X² value (information about tail areas for chi-squared curves appears in Appendix Table VII). If one or more expected counts is at most 5, categories should be combined in a sensible way so that the resulting expected counts are large enough.

Example 8.8 A number of psychologists have considered the relationship between various deviant be-
haviors and geophysical variables such as the lunar phase. The article “Psychiatric and
Alcoholic Admissions Do Not Occur Disproportionately Close to Patients’ Birthdays”
(Psychological Reports, 1992: 944–946) investigated whether the chance of a patient’s
admission date for a particular treatment is smaller or larger than would be the case

under the assumption of complete randomness. Disregarding leap year, there are 365
possible admission days, so complete randomness would imply a probability of 1/365
for each day. However, this results in far too many categories and expected counts that
are too small for the chi-squared test. So the following four categories were established:
1. Within 7 days of an individual’s birthday (7 days before to 7 days after)
2. Between 8 and 30 days, inclusive, from the birthday
3. Between 31 and 90 days, inclusive, from the birthday
4. More than 90 days from the birthday
Let πi denote the true proportion of individuals in category i (i = 1, 2, 3, 4). Then complete randomness with respect to admission date implies that

π1 = 15/365 = .041   π2 = 46/365 = .126   π3 = .329
π4 = 1 − (.041 + .126 + .329) = .504

Thus the relevant hypotheses are

H0: π1 = .041, π2 = .126, π3 = .329, π4 = .504

versus

Ha: the specification of the π's in H0 is not correct

The cited article gave data for n = 200 patients admitted for alcoholism treatment. The expected counts when H0 is true are then

expected count for category 1 = nπ10 = 200(.041) = 8.2
nπ20 = 200(.126) = 25.2   nπ30 = 200(.329) = 65.8
nπ40 = 200 − (8.2 + 25.2 + 65.8) = 100.8 [= 200(.504)]
Since all expected counts exceed 5, the chi-squared test can be used. The observed
counts along with their expected counterparts are as follows:
Category:    1     2     3      4
Observed:   11    24    69     96
Expected:   8.2  25.2  65.8  100.8
The value of the chi-squared statistic is thus

X² = (11 − 8.2)²/8.2 + (24 − 25.2)²/25.2 + (69 − 65.8)²/65.8 + (96 − 100.8)²/100.8
   = .96 + .06 + .16 + .23
   = 1.41
The test is based on k − 1 = 3 df. The smallest entry in the 3 df column of Appendix Table VII is 6.25, corresponding to an upper-tail area of .10. Because 1.41 < 6.25, the area captured to the right of 1.41 exceeds .10. That is, P-value > .10, so H0 can-
not be rejected at any reasonable significance level. Our analysis is consistent with
the title of the cited article; we have no evidence to suggest that admission date is
anything other than random.
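The same test is easy to replicate in software. Here is a minimal Python sketch of the calculation for the admissions data (an illustration assuming the scipy library; the original example relies on Appendix Table VII):

    from scipy.stats import chisquare

    observed = [11, 24, 69, 96]
    pi0 = [.041, .126, .329, .504]       # hypothesized category proportions
    expected = [200 * p for p in pi0]    # 8.2, 25.2, 65.8, 100.8

    result = chisquare(f_obs=observed, f_exp=expected)
    print(result.statistic)  # about 1.41
    print(result.pvalue)     # about .70, consistent with P-value > .10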


Testing for Homogeneity of Several Categorical Populations
Suppose now that an investigator is interested in several different categorical popula-
tions or processes, each one consisting of the same categories. For example, there
are gas stations selling four different brands of gasoline at a particular freeway inter-
change: Arco (A), Chevron (C), Mobil (M), and Union (U). Each station sells three
different grades of gasoline: regular (R), plus (P), and super (S). The four relevant
populations consist of the customers purchasing gasoline at each of the four stations,
and the grades of gas are the categories. For any particular one of these four popula-
tions, there is some proportion of individuals in each of the three categories; these
proportions sum to 1.0 for each population. Table 8.1 displays two different possible
configurations of population proportions. In the first one, the proportion of individu-
als in the R category is the same for each population, and the proportion of individu-
als in the P category is also identical for the four populations. This, of course, implies
that the proportion in the last category (S) is constant across the four populations. The
populations are said to be homogeneous with respect to the categories when this is
the case—that is, when the proportion in the first category is the same for all popula-
tions, the proportion in the second category is also identical for all populations, and
so on. The second configuration in Table 8.1 corresponds to nonhomogeneous popu-
lations; the proportions in the various categories are not constant across the popula-
tions. Of course, the first configuration in Table 8.1 is not the only one for which the
populations are homogeneous; any configuration for which the proportions in any
particular column are identical (e.g., .7 for the first column, .2 for the second, and .1
for the third) satisfies the stated condition.

Table 8.1 Two possible configurations of proportions for four categorical populations

(a) Homogeneous populations              (b) Nonhomogeneous populations

                    Category                                 Category
                  R    P    S                              R    P    S
             A   .50  .30  .20                        A   .50  .30  .20
Population   C   .50  .30  .20          Population    C   .60  .25  .15
             M   .50  .30  .20                        M   .50  .25  .25
             U   .50  .30  .20                        U   .65  .25  .10

The null hypothesis that we wish to test is that the populations are homogeneous.
For this purpose, we require a separate random sample from each of the populations;
let’s denote the corresponding sample sizes by n1, n2, and so on. Of the n1 individuals
or objects selected from the first population, some number will fall in the first category,
some number will be in the second category, and so on. This is also the case for the
samples from the other populations. The resulting category frequencies or counts can
be displayed in a rectangular table called a contingency table; there is a row for each
population and a column for each category. The row sums of these observed frequen-
cies are the sample sizes, so they are fixed by the experimenter. Table 8.2 shows one
possible set of observed frequencies when each sample size is 200.


Table 8.2 A contingency table in the case of four populations, each with three categories

                                Category
                       R     P     S    Sample size
                A     107    62    31    n1 = 200
Population      C      95    67    38    n2 = 200
                M     103    57    40    n3 = 200
                U      98    59    43    n4 = 200
Number in category    403   245   152       800

Homogeneity asserts that there is a common value of π1, the proportion in the first category, for all populations, a common value of π2 for all populations, and so on. If the values of these π's were known, then just as in the case of a single population, expected frequencies would result from multiplying these π's by the various sample sizes. Let's now assume that the populations are homogeneous and estimate the π's from the observed frequencies. Consider the frequencies in Table 8.2. Sensible estimates of π1, π2, and π3 are then just the proportions of the total sample size 800 falling in the various categories:
estimate of π1 = proportion of total sample size in first category = 403/800 = .50375
estimate of π2 = proportion of total sample size in second category = 245/800 = .30625
estimate of π3 = proportion of total sample size in third category = 152/800 = .19000

Multiplying these estimates by n1 = 200 gives the estimated expected frequencies for the sample from the first population (assuming homogeneity). For example,

estimated expected frequency for the first category in the first sample = 200(403/800) = 100.75

Notice that this estimated expected frequency is the product of the row total (200) and the column total (403) divided by the “grand” total (800). This is in fact the general prescription for obtaining estimated expected frequencies: (row total)(column total)/(grand total). Once these have been calculated, the value of a chi-squared statistic can be obtained exactly as in the case of a single population, by summing the quantities (observed − expected)²/expected over all cells in the contingency table.

The Chi-Squared Test for Homogeneity of Several Categorical Populations

(The word population may be replaced by process everywhere.)

Denote the number of populations by r and the number of categories for each population by k (the same categories for all populations).

Hypotheses:
H0: the populations are homogeneous with respect to the categories (i.e., the proportion of each population falling in the first category is the same for all populations, the proportion falling in the second category is also the same for all populations, and so on)
Ha: the populations are not homogeneous (for at least one of the categories, the proportions are not identical for all populations)

Test statistic: Suppose the observed counts are displayed in a contingency table consisting of r rows, one for the sample from each population, and k columns, one for each category (an r by k table). Then the estimated expected frequency corresponding to any particular observed frequency (i.e., to any particular cell of the table) is computed as

estimated expected frequency = (row total)(column total)/n

where n is the sum of the individual sample sizes. The test statistic is then

X² = Σ (observed − estimated expected)²/(estimated expected)

where the sum is over all cells of the contingency table.

P-value: When H0 is true and all estimated expected frequencies exceed 5, X² has approximately a chi-squared distribution with df = (r − 1)(k − 1). Because any value larger than the calculated X² is even more contradictory to H0, the test is upper-tailed and the P-value is approximately the area to the right of the calculated X² under the (r − 1)(k − 1) df chi-squared curve. If at least one estimated expected count is at most 5, categories should be combined in a sensible way.

Example 8.9 A company packages a particular product in cans of three different sizes, each one
using a different production line. Most cans conform to specifications, but a quality
control engineer has identified the following reasons for nonconformance:
1. Blemish on can
2. Crack in can
3. Improper pull tab location
4. Pull tab missing
5. Other
A sample of nonconforming units is selected from each of the three lines, and each
unit is categorized according to reason for nonconformity, resulting in the following
contingency table data:
                      Reason for nonconformity
                                                        Sample
              Blemish  Crack  Location  Missing  Other   size
            1    34      65      17       21      13     150
Production  2    23      52      25       19       6     125
line        3    32      28      16       14      10     100
Total            89     145      58       54      29     375


Does the data suggest that the proportions falling in the various nonconformance
categories are not the same for the three lines? The parameters of interest are the
various proportions, and the relevant hypotheses are
H0: the production lines are homogeneous with respect to the five nonconformance
categories
Ha: the production lines are not homogeneous with respect to the categories
To calculate X2, we must first compute the estimated expected frequencies (assuming
homogeneity). Consider the first nonconformance category for the first production
line. When the lines are homogeneous,
estimated expected number among the 150 selected units that are blemished
= (first row total)(first column total)/(total of sample sizes) = (150)(89)/375 = 35.60

The contribution of the cell in the upper-left corner to X² is then

(observed − estimated expected)²/(estimated expected) = (34 − 35.60)²/35.60 = .072
The other contributions are calculated in a similar manner. Table 8.3 shows Minitab
output for the chi-squared test. The observed count is the top number in each cell,
and directly below it is the estimated expected count. The contribution of each cell
to X² appears below the counts, and the test statistic value is X² = 14.159. All estimated expected counts exceed 5, so combining categories is unnecessary. The test is based on (3 − 1)(5 − 1) = 8 df. Our chi-squared table shows that the values that capture upper-tail areas of .08 and .075 under the 8 df curve are 14.06 and 14.26, respectively. Thus the P-value is between .075 and .08; Minitab gives P-value = .079. The null hypothesis of homogeneity should not be rejected at the usual significance levels of .05 or .01, but it would be rejected for the higher α of .10.
Table 8.3 Minitab output for the chi-squared test of Example 8.9
Expected counts are printed below observed counts
blem crack loc missing other Total
1 34 65 17 21 13 150
35.60 58.00 23.20 21.60 11.60
2 23 52 25 19 6 125
29.67 48.33 19.33 18.00 9.67
3 32 28 16 14 10 100
23.73 38.67 15.47 14.40 7.73
Total 89 145 58 54 29 375
Chisq = 0.072 + 0.845 + 1.657 + 0.017 + 0.169 + 1.498 + 0.278 +
1.661 + 0.056 + 1.391 + 2.879 + 2.943 + 0.018 + 0.011 +
0.664 = 14.159
df = 8, p = 0.079
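Comparable output can be obtained from other software. As an illustration (a sketch assuming the scipy library, not part of the original example), the chi2_contingency function computes the statistic, df, P-value, and estimated expected counts in one call:

    import numpy as np
    from scipy.stats import chi2_contingency

    counts = np.array([[34, 65, 17, 21, 13],
                       [23, 52, 25, 19,  6],
                       [32, 28, 16, 14, 10]])

    stat, pvalue, df, expected = chi2_contingency(counts)
    print(stat, df, pvalue)  # about 14.16 with 8 df, P-value about .079
    print(expected)          # estimated expected counts; upper-left entry 35.60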

Testing for Independence of Two Categorical Factors in a Single Population
Rather than comparing several different categorical populations or processes, consider
a single population or process in which each individual or object can be classified both

with respect to a first categorical factor A and with respect to a second such factor B. For
example, each car of a certain type manufactured in a particular year can be classified
with respect to body style—two-door coupe, four-door sedan, or hatchback—and with
respect to color—white, black, blue, green, or red. Suppose we take a sample of size n
and classify each sampled individual or object with respect to both the A factor (style)
and the B factor (color). The resulting counts can be displayed in a contingency table
having a row for each category of the A factor and a column for each category of the B
factor—a 3 by 5 table in the example under consideration. In this situation, neither the
row nor the column totals are fixed in advance, only the sum of all counts, which equals
n. The number in the upper-left corner would be the number of sampled automobiles
that are both coupes and white, and so on. The null hypothesis of interest in this situa-
tion is that the two factors A and B are independent; that is, knowing the body style does
not change the likelihood of a particular color and vice versa.
Although homogeneity and independence are two different scenarios, the follow-
ing can be shown: (1) The estimated expected frequencies in the test of independence
are calculated exactly as they were for the test of homogeneity: row total times column
total divided by n; (2) X2 is still an appropriate test statistic; (3) the test is still upper-
tailed; and (4) the test is based on the same number of df as the homogeneity test.

Fisher’s Exact Test


Suppose a company uses one of two methods (A and B) in the manufacture of printed
circuit boards. A random sample of 15 boards is taken from the production line and
each board is inspected for the existence of any major defects. The following table pro-
vides a cross-classification of the boards:
                  Method A   Method B
Defects Present       7          1
Defects Absent        1          6
Consider carrying out a test of hypotheses where the null asserts that production meth-
od is independent of board condition and the alternative is that there is dependence.
Here we would not be able to apply the chi-squared test due to the fact that estimated
expected frequencies will not all exceed 5. In such situations the chi-squared test is
known to yield unreliable results. Note in the following chi-square test output from
Minitab that a warning appears concerning cells having small expected counts.
Expected counts are printed below observed counts
Method A Method B Total
Defects Present 7 1 8
4.27 3.73
Defects Absent 1 6 7
3.73 3.27

Total 8 7 15
Chi-Sq = 8.040, DF = 1, P-Value = 0.005
4 cells with expected counts less than 5.

For a contingency table having more than two rows and two columns, if any estimated
expected count is at most 5, it may be possible to consolidate some categories and

generate a new contingency table whose estimated expected counts would all exceed 5.
However, this option would not be available for a contingency table with two rows and
two columns as the minimum number of categories for each variable has been reached.
Instead of using the chi-square approach, we now introduce a different method that is
popularly known as Fisher’s Exact Test.
Recall in our example that 8 out of the 15 boards were produced using Method A
and a total of 8 printed circuit boards had defects. If the null hypothesis of indepen-
dence between production method and board condition is true, given that Method A
accounts for 8 out of the 15 boards and that 8 out of the boards had defects, what is the
probability that we would obtain results at least as extreme as what we observed? This
probability is the P-value for Fisher’s Exact Test; it can be computed explicitly by using
a particular discrete distribution.
First, let us consider all possible contingency table configurations under the
assumption that Method A accounts for 8 out of the 15 boards and that 8 out of the
boards had defects. Figure 8.11 reveals that there are only 8 possible contingency
tables. If the null hypothesis is true, it can be shown that the probability of each of
the 8 possible outcomes can be determined by a discrete distribution known as the
hypergeometric. Statistical software packages can readily compute probabilities from
this distribution.

         A  B           A  B           A  B           A  B
Present  8  0  Present  7  1  Present  6  2  Present  5  3
Absent   0  7  Absent   1  6  Absent   2  5  Absent   3  4
Prob. = .0002  Prob. = .0087  Prob. = .0914  Prob. = .3046

         A  B           A  B           A  B           A  B
Present  4  4  Present  3  5  Present  2  6  Present  1  7
Absent   4  3  Absent   5  2  Absent   6  1  Absent   7  0
Prob. = .3807  Prob. = .1828  Prob. = .0305  Prob. = .0012

Figure 8.11 All possible contingency tables and corresponding hypergeometric probabilities
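The probabilities in Figure 8.11 are easy to verify with software. In the following Python sketch (an illustration assuming the scipy library), the count in the (Defects Present, Method A) cell is modeled as hypergeometric: a population of 15 boards containing 8 defective ones, with the 8 Method A boards acting as the draws:

    from scipy.stats import hypergeom

    # M = population size, n = defective boards, N = Method A boards
    dist = hypergeom(M=15, n=8, N=8)

    for k in range(1, 9):               # the 8 possible values of the cell count
        print(k, round(dist.pmf(k), 4))
    # k = 7 gives .0087; the most extreme tables (k = 8, k = 1) give .0002 and .0012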

With all table probabilities in hand, we can now obtain P-value information.
Our originally observed contingency table yielded 7 boards having defects manu-
factured by Method A. The corresponding table probability is .0087. To determine
the P-value we need to consider other tables that would be at least as extreme as
what was observed. This would include any tables having a corresponding prob-
ability that is less than or equal to .0087. From Figure 8.11 we see that only two
other tables qualify (with probabilities .0002 and .0012). Combining these probabilities, we have P-value = .0087 + .0002 + .0012 = .0101. Thus, at the .05 signifi-
cance level we can reject the null hypothesis of independence between production

method and board condition. Figure 8.12 is the corresponding output from SAS for
our example:

Fisher’s Exact Test

Cell (1,1) Frequency (F) 7


Left-sided Pr <= F 0.9998
Right-sided Pr >= F 0.0089

Table Probability (P) 0.0087


Two-sided Pr <= P 0.0101

Sample Size = 15

Figure 8.12 SAS Output for Fisher’s Exact Test

From the output, the P-value we computed corresponds to the probability reported next
to Two-sided Pr <= P as we were interested in testing if any type of dependence
existed. As the output suggests, we can use a directional alternative for Fisher’s Exact
Test as well. Consult the book by Agresti cited in the chapter bibliography for more
details on this test.
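Fisher's exact test is also built into many statistics libraries. A minimal Python sketch (an illustration assuming the scipy library, which the cited output does not use) reproduces the two-sided P-value shown in the SAS output:

    from scipy.stats import fisher_exact

    table = [[7, 1],
             [1, 6]]
    oddsratio, pvalue = fisher_exact(table, alternative='two-sided')
    print(pvalue)   # about .0101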

Section 8.3 Exercises


44. Say as much as you can about the P-value for a chi-squared test in each of the following situations:
a. X² = 7.5, df = 2   b. X² = 13.0, df = 6
c. X² = 18.0, df = 9   d. X² = 21.3, df = 4
e. X² = 5.0, df = 3

45. A statistics department at a large university maintains a tutoring service for students in its introductory service courses. The service has been staffed with the expectation that 40% of its clients would be from the business statistics course, 30% from engineering statistics, 20% from the statistics course for social science students, and the other 10% from the course for agriculture students. A random sample of n = 120 clients revealed 52, 38, 21, and 9 from the four courses. Does this data suggest that the percentages on which staffing was based are not correct? State and test the relevant hypotheses using α = .05.

46. Criminologists have long debated whether there is a relationship between weather and violent crime. The author of the article “Is There a Season for Homicide?” (Criminology, 1988: 287–296) classified 1361 homicides according to season, resulting in the accompanying data. Does this data suggest that the homicide rate somehow depends on the season? State the relevant hypotheses, then test using α = .05.

Season:    Winter  Spring  Summer  Fall
Frequency:   328     334     372    327

47. The article “Racial Stereotypes in Children's Television Commercials” (J. of Adver. Res., 2008: 80–93) reported the following frequencies with which ethnic characters appeared in recorded commercials that aired on Philadelphia television stations.

           African
Ethnicity: American  Asian  Caucasian  Hispanic
Frequency:    57       11      330         6

The 2000 census proportions for these four ethnic groups are .177, .032, .734, and .057, respectively. Does the data suggest that the proportions in commercials are different from the census proportions? Carry out a test of appropriate hypotheses using a significance level of .01.

48. An information retrieval system has ten storage locations. Information has been stored with the expectation that the long-run proportion of requests
for location i is given by πi = (5.5 − |i − 5.5|)/30. A sample of 200 retrieval requests gave the following frequencies for locations 1–10, respectively: 4, 15, 23, 25, 38, 31, 32, 14, 10, and 8. Use a chi-squared test at significance level .10 to decide whether the data is consistent with the a priori proportions.

49. The article “The Gap Between Wine Expert Ratings and Consumer Preferences” (Intl. J. of Wine Business Res., 2008: 335–351) studied differences between expert and consumer ratings by considering medal ratings for wines: gold (G), silver (S), or bronze (B). Three categories were then established:
1. Rating is the same [(G,G), (B,B), (S,S)].
2. Rating differs by one medal [(G,S), (S,G), (S,B), (B,S)].
3. Rating differs by two medals [(G,B), (B,G)].
The observed frequencies for these three categories were 69, 102, and 45, respectively. On the hypothesis of equally likely expert ratings and consumer ratings being assigned completely by chance, each of the 9 medal pairs has probability 1/9. Carry out an appropriate chi-squared test using a significance level of .10.

50. A random sample of smokers was obtained, and each individual was classified by both gender and age when he or she first started smoking. The data in the accompanying table is consistent with summary results reported in the article “Cigarette Tar Yields in Relation to Mortality in the Cancer Prevention Study II Prospective Cohort” (British Med. J., 2004: 72–79).

                Gender
              Male  Female
      <16      25     10
Age   16–17    24     32
      18–20    28     17
      >20      19     34

a. Calculate the proportion of males in each age category; do the same for females. Based on these proportions, does it appear there might be an association between gender and the age when an individual first smokes?
b. Carry out a test of hypotheses to decide whether there is an association between the two factors.

51. A placebo—that is, a fake medication or treatment—is well known to sometimes have a positive effect just because patients often expect the medication or treatment to be helpful. The article “Beware the Nocebo Effect” (The New York Times, Aug. 12, 2012) gave examples of a less familiar phenomenon: the tendency for patients informed of possible side effects to actually experience those side effects. The article cited a study reported in The Journal of Sexual Medicine in which a group of patients diagnosed with benign prostatic hyperplasia was randomly divided into two subgroups. One subgroup of size 55 received a compound of proven efficacy along with counseling that a potential side effect of the treatment was erectile dysfunction. The other subgroup of size 52 was given the same treatment without counseling. The percentage of the no-counseling subgroup that reported one or more sexual side effects was 15.3%, whereas 43.6% of the counseling subgroup reported at least one sexual side effect. State and test the appropriate hypotheses at significance level .05 to decide whether the nocebo effect is operating here. (Hint: First arrange the data into a contingency table comparing subgroup versus presence of side effects.)

52. A random sample of individuals who drive to work in a large metropolitan area was obtained, and each individual was categorized with respect to both size of vehicle and commuting distance (in miles). Does the accompanying data suggest that there is an association between type of vehicle and commuting distance?

                         Commuting Distance
                       0–<10   10–<20   ≥20
          Subcompact      6      27      19
Type of   Compact         8      36      17
vehicle   Midsize        21      45      33
          Full-size      14      18       6

X² = 14.16

a. Does this situation call for a test of homogeneity or a test of independence?
b. State and test the appropriate hypotheses using α = .05.

53. We often think that occupational hazards are primarily experienced by those who work under dangerous conditions (e.g., construction workers,
law enforcement officers, dockworkers). Clearly, a dangerous job can lead to illness or death. But can the psychological stress of a work environment affect employees' overall health? This issue was investigated in the article “Are There Health Effects of Harassment in the Workplace? A Gender-Sensitive Study of the Relationships Between Work and Neck Pain” (Ergonomics, 2012: 147–159). The researchers wanted to identify workplace physical and psychosocial risk factors for neck pain among male and female workers. They also wanted to study the relationship between neck pain and intimidation or sexual harassment in the workplace. (Advanced statistical techniques were used to show that neck pain was significantly associated with intimidation at work among both male and female workers.)

This study was based on a representative sample (5405 men, 3987 women) of the Quebec working population. The following cross-classification table for this sample on gender versus level of neck pain is consistent with data reported in the article:

                                Gender
                              Men   Women
      Never                  3048    1842
Pain  Occasionally           1767    1411
      At Least Fairly Often   590     734

Does it appear that there might be an association between gender and neck pain? Carry out a test of hypotheses using the .01 significance level.

54. The article cited in Exercise 53 classified each member of the sample of workers with respect to both gender and level of work-related psychological demands. The following table is consistent with summary results reported in the article:

                      Gender
                    Men   Women
            Low    1692    1324
Job Demand  Medium 1838    1352
            High   1875    1311

Does it appear that there might be an association between gender and work-related psychological demands? Carry out a test of hypotheses using the .05 significance level.

55. Children often suffer from a condition known as tonsillitis in which the tonsils become sore or swollen. When the condition becomes chronic, many sufferers have their tonsils surgically removed by the tonsillectomy (TE) procedure. TE is one of the most common surgeries performed in children and young adults worldwide. However, because of the invasive nature of the surgery, TE patients often experience severe postoperative complications. Tonsillotomy (TT), an alternative procedure to surgically removing the tonsils, has become increasingly popular because studies have shown it to be less invasive and to have lower risk of postoperative complications.

The article “Differences in Pain and Nausea in Children Operated on by Tonsillectomy or Tonsillotomy—a Prospective Follow-Up Study” (J. of Advanced Nursing, 2012) examined the differences in postoperative pain, nausea, and time of discharge in children 3–12 years of age after TE or TT. To compare differences in postoperative nausea, researchers kept track of the number of prescriptions of ondansetron (a drug to treat nausea and vomiting) that were issued to the TE and TT children. Four out of 34 TE children compared to none of the 53 TT children received such prescriptions.
a. Suppose we are interested in testing whether surgery method affects the provision of ondansetron prescriptions. Determine the estimated expected counts based on the chi-squared test method. Do all expected counts exceed 5?
b. Use Fisher's exact test to analyze this data and report the P-value based on a two-sided alternative (as did the authors of the cited article). If your software does not perform this test, there are many online calculators that will report the P-value based on this test. One such site is http://research.microsoft.com/en-us/um/redmond/projects/mscompbio/fisherexacttest

56. For many years, federal equal employment opportunity laws have prohibited compensation discrimination. However, according to the U.S. Equal Employment Opportunity Commission (EEOC), pay disparities continue to exist in various demographic groups. According to the EEOC website (visited on January 13, 2013), Section 10 of the EEOC
Compliance Manual describes the standards and suggested steps for investigating a charge of compensation discrimination. In the statistical analysis section, Fisher's exact test is recommended as the test of choice. The following is based on the example found in the EEOC Compliance Manual.

Suppose the employees of a particular company can be classified into one of two groups (1 and 2). There are 14 members in group 1 and 17 in group 2. Eight members of group 1 and three members of group 2 earn salaries greater than the company median salary. Use Fisher's exact test at significance level .05 to investigate whether group affiliation has an effect on salary status. (The previous exercise identifies a website that will carry out the calculations.)

8.4 Testing the Form of a Distribution 


An investigator, having obtained a sample x1, x2, . . . , xn from some underlying popu-
lation or process distribution, will often wish to know whether it is plausible that the
underlying distribution is a member of a particular family, such as the normal family,
Weibull family, or (in the case of discrete count data) the Poisson family. In this section,
we first present a special test for the normal case and then show more generally how a
test based on the chi-squared distribution can be carried out.

Is the Population Distribution Normal?


The validity of many inferential procedures, such as the one- and two-sample t inter-
vals and tests presented in this chapter and in Chapter 7, requires that the underlying
distribution(s) be at least approximately normal. If an assumption of normality is not justi-
fied, alternative methods of analyzing the data must be used. In Chapter 2, we suggested
the use of a normal quantile plot to assess the plausibility of the underlying distribution
being normal. The construction involved first determining the (.5yn)th quantile of the
standard normal distribution, the (1.5yn)th quantile, the (2.5yn)th quantile, and so on
[these are the values that separate, for i 5 1, . . . , n, the smallest 100((i 2 .5)yn)% of the
distribution from the remaining part]. These quantiles are then paired with the smallest
sample observation, the second smallest observation, the third smallest, and so on, and
the resulting pairs are plotted on a rectangular coordinate system. Normality is suggested
by a plot in which the points fall reasonably close to some straight line. A plot with a sub-
stantial nonlinear pattern of some sort (e.g., curvature, or one or more points far from the
line determined by the remaining points) casts doubt on population or process normality.
Some users of statistical methodology will not be comfortable with a subjective assess-
ment of the visual evidence in a plot. After all, people may argue about what is reasonably
close or what constitutes a substantial departure. Recall that in Chapter 3 we proposed the
sample correlation coefficient r as a measure of the strength of any linear relationship in a
bivariate sample. Consider the correlation coefficient r calculated from the pairs in a nor-
mal quantile plot to be our test statistic for the null hypothesis of normality. Because larger
observations are paired with larger z quantiles, the points in the plot increase in height when
moving from left to right. That is, the points in the plot slope upward, implying that r must
be positive. A value of r quite close to 1.0 gives evidence of a very straight pattern in the plot
and is thus supportive of normality. Suppose, for example, that we calculate r = .962. Then
any test statistic value smaller than .962 is even more contradictory to the null hypothesis
than what was obtained. For this reason, the test is lower-tailed; the P-value is the area under
the r sampling distribution curve (when H0 is true) to the left of the calculated r.


The test described in the next box involves a slight modification of what we have
so far suggested. For technical reasons, rather than using z quantiles corresponding to
(i − .5)/n, quantiles corresponding to (i − .375)/(n + .25) are used. These alternative
“plotting positions” do not greatly alter the appearance of the plot, but they have been
found to improve the behavior of the test.

The Ryan–Joiner Test for Normality

(This test is very similar to another procedure called the Wilk–Shapiro test.)

Null hypothesis: The sample x1, . . . , xn comes from a normal distribution.
Alternative hypothesis: The sampled distribution is not normal.
Test statistic: r = the sample correlation coefficient calculated from the (z quantile, observation) pairs, where the z quantiles are those for proportions (i − .375)/(n + .25), i = 1, . . . , n.
P-value: The sampling distribution of r when H0 is true is different for each sample size n. The P-value is the area under the appropriate one of these sampling distribution curves to the left of the calculated r. Appendix Table XII gives, for various sample sizes, the values that capture lower-tail areas of .10, .05, and .01. Unless the calculated value coincides with one of these tabulated values, one of the following four statements about the P-value can be made: (1) P-value > .10, (2) .05 < P-value < .10, (3) .01 < P-value < .05, (4) P-value < .01. The statistical package Minitab will give P-value information for this test upon request.

Example 8.10 The following sample of n = 17 observations on length-diameter ratio (LDR) measurements based on static pile load tests first appeared in Example 2.17.

Quantile: −1.89  −1.35  −1.05  −0.82  −0.63  −0.46  −0.30  −0.15  0.00
LDR:       30.86  37.68  39.04  42.78  42.89  42.89  45.05  47.08  47.08

Quantile:   0.15   0.30   0.46   0.63   0.82   1.05   1.35   1.89
LDR:       48.79  48.79  52.56  52.56  54.8   55.17  56.31  59.94
We asked Minitab to carry out the Ryan-Joiner test, and the result appears in Figure 8.13. The test statistic value is r = .990, and Appendix Table XII gives .9549 as the critical value that captures lower-tail area .10 under the r sampling distribution curve when n = 17 and the underlying distribution is actually normal. Since .990 > .9549, we conclude that P-value = area to the left of .990 > .10, which is what the Minitab output of Figure 8.13 reports. The P-value is larger than any reasonable significance level, so there is absolutely no reason to doubt that the length-diameter ratio is normally distributed.


[Normal probability plot of LDR with fitted line. Legend: Mean 47.31, StDev 7.560, N 17, RJ 0.990, P-Value >0.100]

Figure 8.13 Minitab output from the Ryan-Joiner test for the data of Example 8.10
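Because the Ryan–Joiner statistic is just a correlation coefficient, it can also be computed directly when software does not offer the test by name. The sketch below (a Python illustration assuming the numpy and scipy libraries; critical values still come from Appendix Table XII) computes r for the LDR data:

    import numpy as np
    from scipy.stats import norm

    def ryan_joiner_r(x):
        # Correlation between the ordered data and the z quantiles for
        # proportions (i - .375)/(n + .25), i = 1, ..., n
        x = np.sort(np.asarray(x, dtype=float))
        n = len(x)
        z = norm.ppf((np.arange(1, n + 1) - .375) / (n + .25))
        return np.corrcoef(z, x)[0, 1]

    ldr = [30.86, 37.68, 39.04, 42.78, 42.89, 42.89, 45.05, 47.08, 47.08,
           48.79, 48.79, 52.56, 52.56, 54.8, 55.17, 56.31, 59.94]
    print(ryan_joiner_r(ldr))   # about .990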

Chi-Squared Tests
Carrying out a chi-squared test requires that categories be established so that observed fre-
quencies can be compared with those expected if the hypothesized family is correct. Sup-
pose, for example, that we have observations on x = number of defects for a sample of 200
automobiles. Possible values of x are 0, 1, 2, . . . . A reasonable null hypothesis is that x has
a Poisson distribution. We might select the x value 0 as the first category, the value 1 as the
second category, 2 as the third category, 3 as the fourth category, and aggregate all x values
that are at least 4 as the remaining catchall category. The form of the Poisson mass function
is p(x) = e^(−λ)λ^x/x! for x = 0, 1, 2, . . . . Substituting x = 0, 1, 2, and 3 and multiplying each result by n = 200 would give the expected frequencies for the first four categories; the last ex-
pected frequency could then be obtained by adding the first four and subtracting from 200.
However, carrying this out requires that we have a value of the parameter λ. The null hypothesis states only that the distribution is Poisson, without specifying the correct λ. So the value of λ must be estimated from the data before a test can be conducted, and the cor-
rect way to do this is to use the method of maximum likelihood introduced in Chapter 7. The estimate should be based on the grouped data (i.e., the number of observations falling in each of the five categories) rather than the individual observations, but this is virtually never done. Instead, the estimate λ̂ = x̄ based on the full data is customarily used (this estimate is intuitively appealing because the mean value of a Poisson variable is just μ = λ).
Furthermore, the estimation of any parameters before calculating expected frequencies and
carrying out the test reduces the number of degrees of freedom on which the test is based.

Each parameter that must be estimated from the data before calculating expected frequencies and carrying out a chi-squared test reduces the number of df for the test by one. Thus if the test is based on k categories, all (estimated) expected counts are at least 5, and m parameters were estimated, the test is based on k − 1 − m df.


For a Poisson distribution, using the five categories suggested previously would result in a test based on df = 5 − 1 − 1 = 3 (provided that all expected counts were at least 5). A chi-squared test for normality (not recommended because the Ryan–Joiner test, as well as other tests, has smaller type II error probabilities for the same significance level) would require estimating both μ and σ, reducing degrees of freedom by two.

Example 8.11 Consider the accompanying data on the number of Larrea divaricata plants found
in each of n 5 48 identically shaped sampling regions (ecologists call such regions
quadrats), taken from the article “Some Sampling Characteristics of Plants and
Arthropods in the Arizona Desert” (Ecology, 1962: 567–571):
Number of plants: 0 1 2 3 at least 4
Frequency: 9 9 10 14 6
The author of the article fit a Poisson distribution to this data. Suppose that the six
observations in the last category were actually 4, 4, 5, 5, 6, and 6; it is easily verified that λ̂ = x̄ = 2.10 (the value reported in the article). The (estimated) expected frequency for the first category is then

48[e^(−2.1)(2.1)^0/0!] = 5.88
The other four expected frequencies, calculated in the same way, are 12.34, 12.96,
9.07, and (by subtraction) 7.75. All expected frequencies exceed 5, so the test will be
based on 5 − 1 − 1 = 3 df. The test statistic value is

X² = (9 − 5.88)²/5.88 + · · · + (6 − 7.75)²/7.75 = 6.31
The two smallest critical values in the 3 df column of our chi-squared table (Appen-
dix Table VII) are 6.25 and 6.36, corresponding to upper-tail areas of .100 and .095,
respectively. Thus the approximate P-value for the test is slightly less than .10. At a
significance level of either .05 or .01, there is little reason to doubt that the distribu-
tion of the number of plants per quadrat is Poisson.
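The same fit can be checked in software. In the following Python sketch (an illustration assuming the scipy library), the ddof argument removes the extra degree of freedom for the estimated λ:

    from scipy.stats import poisson, chisquare

    lam = 2.10                    # lambda-hat = x-bar from the full data
    n = 48
    observed = [9, 9, 10, 14, 6]  # counts for 0, 1, 2, 3, and at least 4 plants

    expected = [n * poisson.pmf(k, lam) for k in range(4)]
    expected.append(n - sum(expected))   # "at least 4" category by subtraction

    stat, pvalue = chisquare(observed, expected, ddof=1)  # df = 5 - 1 - 1 = 3
    print(stat, pvalue)   # about 6.31 with P-value just under .10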

In the case of continuous data, the categories are simply class intervals. For example, we might select the following six classes: (−∞, 85), (85, 95), (95, 100), (100, 105), (105, 115), and (115, ∞). After estimating any parameters, the estimated expected frequency for the fourth class would be n times the integral of f(x) from 100 to 105, where parameters in the density function f(x) are replaced by their estimates.
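As a concrete illustration (a sketch with made-up parameter estimates, assuming the scipy library; the passage above leaves the fitted density generic), suppose a normal model had been fit to the data. Then the estimated expected frequency for the class (100, 105) could be computed as

    from scipy.stats import norm

    n = 200
    mu_hat, sigma_hat = 98.0, 9.0   # hypothetical parameter estimates

    # n times the probability the fitted distribution assigns to (100, 105)
    expected_4th = n * (norm.cdf(105, mu_hat, sigma_hat)
                        - norm.cdf(100, mu_hat, sigma_hat))
    print(expected_4th)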

Section 8.4 Exercises


57. Consider the Ryan–Joiner test for population normality.
a. Give as much information as possible for the P-value in each of the following situations:
   i. n = 10, r = .95
   ii. n = 10, r = .90
   iii. n = 25, r = .983
   iv. n = 25, r = .915
b. For each of the situations in part (a), state whether the null hypothesis would be rejected when using a significance level of .05.


58. The article cited in Exercise 31 of Section 8.2 gave the following observations on bending rigidity (N · m) for medium-quality fabric specimens, from which the accompanying Minitab output was obtained:

24.6  12.7  14.4  30.6  16.1   9.5  31.5  17.2
46.9  68.3  30.8 116.7  39.5  73.8  80.6  20.3
25.8  30.9  39.2  36.8  46.6  15.6  32.3

[Normal probability plot of Bendrig with fitted line. Legend: Mean 37.42, StDev 25.81, N 23, RJ 0.911, P-Value <0.010]

Would you use a one-sample t confidence interval to estimate true average bending rigidity? Explain your reasoning.

59. The article from which the data in Exercise 44 of Chapter 7 was obtained also gave the following data on the compressive strength (in MPa) for 7 specimens of internally cured concrete that have been set for 28 days:

38.7  40.1  40.3  47.5  48.0  56.0  61.1

Minitab gives r = .953 as the value of the correlation coefficient test statistic and reported that P-value > .10. Would you use the one-sample t test to test hypotheses about the value of the true average compressive strength? Why or why not?

60. The data in Exercise 40 is paired, so a paired t analysis is appropriate if it is plausible that the values of the differences were selected from a normal distribution. Based on the accompanying plot from Minitab, does this appear to be the case?

[Minitab normal probability plot of the differences (Percent versus Difference), with legend: Mean 0.4137, StDev 0.3210, N 8, RJ 0.949, P-Value > 0.100]


61. The article cited in Exercise 88 of Chapter 7 gave the following observations on conductivity (% IACS) for eight wire electrodes used for wire electrical-discharge machining:

31   28   26   24   33   65   29   29

a. Employ software to perform a test for normality (such as the Ryan–Joiner test) using a significance level of .05.
b. Note that there is one unusually high conductivity reading. Suppose the researchers discovered there was a recording error for this observation. Remove it and repeat part (a). How does the removal of the observation affect the test for normality?

62. In a genetics experiment, investigators examined 300 chromosomes of a particular type and counted the number of sister-chromatid exchanges on each one ("On the Nature of Sister-Chromatid Exchanges in 5-Bromodeoxyuridine-Substituted Chromosomes," Genetics, 1979: 1251–1264). A Poisson model was hypothesized for the distribution of the number of exchanges. Test the fit of such a model to the accompanying data by first estimating λ and then combining the frequencies for x = 8 and x = 9.

x:          0   1    2    3    4    5    6    7    8   9
Frequency:  6   24   42   59   62   44   41   14   6   2

63. In an investigation into the distribution of output tuft weight x of cotton fibers when the input weight was x₀, a truncated exponential distribution, f(x) = (λe^{−λx})/(1 − e^{−λx₀}) for 0 < x < x₀, was hypothesized ("Some Studies on Tuft Weight Distributions in the Opening Room," Textile Res. J., 1976: 567–573). The mean value of this distribution is μ = (1/λ) − (x₀e^{−λx₀})/(1 − e^{−λx₀}). Replacing μ by x̄ and λ by λ̂ and solving for the latter quantity gives an estimate of λ. The expected frequencies for various categories (class intervals) can then be calculated. Use the accompanying data along with x̄ = 13.086 to decide whether the truncated exponential distribution is a plausible model (x₀ = 70 here).

Class:      0–<8   8–<16   16–<24   24–<32   32–<40   40–<48   48–<56   56–<64   64–<70
Frequency:  20     8       7        1        2        1        0        1        0

64. It is hypothesized that when homing pigeons are disoriented in a certain manner, they will exhibit no preference for any direction of flight after takeoff (the direction x, a continuous variable, should be uniformly distributed on the interval from 0° to 360°, so f(x) = 1/360 on this interval). To test this, 50 pigeons were disoriented and released, resulting in the following observed directions. Use a chi-squared test based on eight classes to test the appropriate hypotheses at a significance level of .05.

171   338   238   37    92    287   203   320   88
36    131   32    61    250   99    138   155   183
201   312   89    158   206   170   204   46    323
289   141   319   242   179   249   185   277   95
46    197   251   196   326   124   350   112   37
104   290   47    310   86
tributions in the Opening Room,” Textile Res. J.,

8.5 Further Aspects of Hypothesis Testing 


Our focus in hypothesis testing thus far has been on an intuitive development of test
procedures in various situations and their application to sample data. In this section, we
consider several somewhat more conceptual issues: the distinction between statistical
and practical significance of a test result, the interpretation and determination of type II
error probabilities, a test procedure that is distribution-free in the sense that its validity
does not depend on any restrictive assumptions, the relation between confidence in-
tervals and test procedures, and a general principle for construction of test procedures.


Statistical Versus Practical Significance


Carrying out a test amounts to deciding whether the value obtained for the test statistic
could plausibly have resulted when H0 is true. If the value does not deviate too much
from what is expected when the null hypothesis is true, there is no compelling reason
for rejecting H0 in favor of Ha. But suppose that the P-value is quite small, indicating a
test statistic value that is quite inconsistent with H0. One could continue to believe that
H0 is true and that such a value arose just through chance variation (a very unusual and
unrepresentative sample). However, in this case a more plausible explanation for what
was observed is that the null hypothesis is false and Ha is true.
When the P-value is smaller than the chosen significance level α, it is customary
to say that the result is statistically significant. The finding of statistical significance
means that, in the investigator’s opinion, the observed deviation from what was expect-
ed under H0 cannot plausibly be attributed to sampling variability alone. However,
statistical significance cannot be equated with the conclusion that the true situation
differs from what H0 states in any practical sense. That is, even after the null hypothesis
has been rejected, the data may suggest that there is no practical difference between the
true value of the parameter and what the null hypothesis asserts that value to be.

Example 8.12 Samples of two different automobile braking systems were selected and the braking
distance (ft) for each was determined under specified experimental conditions, resulting
in the following summary information:

n₁ = 100   x̄₁ = 120   s₁ = 5.0
n₂ = 100   x̄₂ = 118   s₂ = 5.0

Does it appear that true average braking distance for the first system differs from that
for the second system? The relevant hypotheses are H₀: μ₁ − μ₂ = 0 versus the alternative
Ha: μ₁ − μ₂ ≠ 0, and

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{120 - 118}{\sqrt{\dfrac{25}{100} + \dfrac{25}{100}}} = \frac{2}{.707} = 2.83$$

The P-value for this two-tailed z test is then 2 · (area under z curve to the right of 2.83) =
.0046. Thus the null hypothesis should be rejected at a significance level of .05 or
even at .01. We say that the data is statistically significant at either of these levels.
However, because of the rather large sample sizes and relatively small standard deviations,
it appears that μ₁ − μ₂ ≈ x̄₁ − x̄₂ = 2.0. From a practical point of view, a 2-foot
difference in true average braking distance would appear to be relatively unimportant.
This is an instance of statistical significance without any evidence of a practically
significant difference.
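A minimal sketch of this computation, assuming scipy for the normal tail area (the summary statistics are those given in the example):

from math import sqrt
from scipy import stats

n1, xbar1, s1 = 100, 120.0, 5.0
n2, xbar2, s2 = 100, 118.0, 5.0

z = (xbar1 - xbar2) / sqrt(s1**2 / n1 + s2**2 / n2)
p_value = 2 * stats.norm.sf(abs(z))   # two-tailed P-value
print(z, p_value)                     # about 2.83 and .0046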

Type II Error Probabilities


A test carried out at a specified significance level α is one for which the probability
of a type I error—the probability of rejecting the null hypothesis when it is true—is


the chosen α. Using a small significance level results in a test that has good protection
against the commission of a type I error. However, if at the same time the likelihood of
committing a type II error—not rejecting the null hypothesis when in fact it is false—is
large, then the test procedure will be quite ineffective at detecting departures from the
null hypothesis. For example, consider testing H₀: μ = 100 versus Ha: μ > 100 using a
test with a significance level of α = .01. If this test is used repeatedly on different samples
selected from the population of interest and if H₀ is in fact true, in the long run only
1% of all samples will result in the incorrect rejection of the null hypothesis. Suppose,
though, that the alternative μ = 105 represents an important departure from the null hypothesis,
but that in this situation β = P(type II error) = .75. Then if the test procedure
is used over and over on different samples and in fact μ really is 105 rather than 100, in
the long run only 25% of all samples will result in the rejection of H₀, whereas the other
75% of all samples will yield an incorrect conclusion. The test procedure has rather
poor ability to detect a departure from the null hypothesis that has substantial practical
significance. In general, it makes little sense to expend the resources necessary to acquire
sample data and carry out a test if the test procedure has very poor ability to detect important
departures from the null hypothesis. This is why we recommend investigating the
likelihood of committing a type II error before a test with a specified α is used.
One way to determine β is to use an appropriate set of curves. Figure 8.14 shows
three different β curves for a one-tailed t test (appropriate for either the alternative
Ha: μ > μ₀ or the alternative Ha: μ < μ₀). Obtaining β requires that we specify an alternative
value of μ (e.g., 105 in the situation considered in the previous paragraph) and
also that we select a realistic value of the population or process standard deviation σ.
Then we calculate the value of

$$d = \frac{(\text{alternative value of } \mu) - \mu_0}{\sigma}$$

the distance between the alternative value and the null value expressed as some number
of population standard deviations. Thus d = 2 means that the alternative value of μ is 2

[Figure 8.14 Selected β curves for the one-tailed t test. The vertical axis gives the associated value of β (0 to 1.0); the horizontal axis gives the value of d (0 to 3). The three curves shown correspond to α = .01, df = 6; α = .05, df = 6; and α = .01, df = 19.]


population standard deviations away from the null value. Finally, locate the value of d
on the horizontal axis, move directly up to the curve for n − 1 df, and move over to the
vertical axis to read the value of β.
The following general properties provide insight into how β behaves.
1. The larger the number of degrees of freedom, the lower is the corresponding β
curve for any value of d. Because df increases as the sample size increases, we
have the intuitively plausible result that β decreases as n increases.
2. The farther the alternative value of interest is from the null value, the larger the
value of d. Because every β curve decreases as d increases, it follows that β will be
smaller for an alternative value far from what the null hypothesis asserts than for
a value close to μ₀. Thus the test is more likely to detect a large departure from
μ₀ than a small departure.
3. The larger the value of σ, the smaller the value of d and the larger the resulting
value of β corresponding to any particular alternative value of μ. That is, the
more underlying variability there is in the population or process, the more difficult
it will be to detect a departure from H₀ of any given magnitude. Selecting
a relatively large value of σ for the calculation gives a pessimistic value of β.
In recent years, the use of β curves has been superseded by statistical software, which is
quicker and avoids the visual inaccuracies associated with the curves. In particular, Minitab
will determine the power of the one-sample t test, where power = 1 − β, once the difference
between the null value and alternative value of μ and also the value of σ have been
specified (small β is equivalent to large power; a powerful test is one that has large power
and therefore good ability to discriminate between the null hypothesis and the alternative
value of μ). In addition, instead of specifying n and asking for power, the user can specify
the desired power for the given difference and ask Minitab for the necessary sample size.

Example 8.13 The true average voltage drop from collector to emitter of insulated gate bipolar transistors
of a certain type is supposed to be at most 2.5 volts. An investigator selects a
sample of n = 10 such transistors and uses the resulting voltages as a basis for testing
H₀: μ = 2.5 versus Ha: μ > 2.5 using a t test with significance level α = .05. If the
standard deviation of the voltage distribution is σ = .100, how likely is it that H₀ will
not be rejected when in fact μ = 2.6?

The difference value is 2.6 − 2.5 = .1. Providing this information to Minitab
along with the sample size, value of α, and the fact that the test is upper-tailed
results in power = .8975, from which β ≈ .1. The investigator may think that this
value of β is too large for such a substantial departure from H₀. When Minitab is
supplied with the difference .1, σ = .1, and the target power of .95 (β = .05) for
an upper-tailed test with α = .05, the necessary sample size is returned as 13. The
actual power in this case is .9597, whereas using n = 12 would result in power
being somewhat below the target.
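Power and sample-size calculations of this kind are not unique to Minitab. Here is a minimal sketch assuming the statsmodels package is available; its answers should agree closely with the Minitab values quoted above:

from statsmodels.stats.power import TTestPower

analysis = TTestPower()
d = (2.6 - 2.5) / 0.100   # effect size: difference divided by sigma

# Power of the upper-tailed one-sample t test with n = 10, alpha = .05
power = analysis.solve_power(effect_size=d, nobs=10, alpha=0.05,
                             alternative='larger')
print(power)              # close to .8975, so beta is about .10

# Sample size needed to achieve power .95 against the same alternative
n_needed = analysis.solve_power(effect_size=d, power=0.95, alpha=0.05,
                                alternative='larger')
print(n_needed)           # a bit over 12, so n = 13 observations are required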

Type II error probabilities for other tests can be determined in a similar manner using
appropriate statistical software.


You might ask whether there is another test procedure, based on a different test statis-
tic (a different function of the sample data), that outperforms the one-sample t test in the
sense that it has the same significance level α but smaller type II error probabilities. It turns
out that there is no such test as long as the population distribution is normal. The one-
sample t test is really the best possible test in this situation. Furthermore, if the population
distribution is not too far from being normal, no test can improve on the one-sample t test
by very much. However, if the population distribution is highly nonnormal (heavy-tailed,
highly skewed, or multimodal), the t test should not be used. Then it is time to consult your
friendly neighborhood statistician to see what alternative methods of analysis are available.

An Alternative Two-Sample Test for Hypotheses About μ₁ − μ₂
Unfortunately, the two-sample t test does not have the same status as does the one-sample
test. The two-sample t test is an intuitively reasonable procedure that appears to
protect against both type I and type II errors, but it is not known whether it is the best
test in the sense described previously (smallest β for any given α). Furthermore, if the
population distributions are not normal, there are better tests available. We now give a
brief description of one such test, called the Wilcoxon rank-sum test or alternatively the
Mann–Whitney test (after the statisticians who discovered the procedure). The validity
of the test procedure requires that both population or process distributions be continuous
with the same shapes and spreads, so that the only possible difference between them
is the location of the center. The two-sample t test does not require equal variances, so
the new situation is more restrictive in this respect. However, the Wilcoxon test does not
require normal distributions, making it more widely applicable in this sense.

The test is based on a random sample from the first distribution and another random
sample, selected independently of the first one, from the second distribution. Let's
take the n₁ observations in the first sample and combine them with the n₂ observations
from the second sample. Suppose that there are no tied values in this combined sample
(all n₁ + n₂ observations are distinct). We assign a rank to each value in the combined
sample: The smallest value gets a rank of 1, the second smallest rank 2, and so on, until
finally the largest value has rank n₁ + n₂. The following example, with n₁ = 4 and n₂ = 3,
consists of observations on fuel efficiencies (mpg) for two different types of cars:

Distribution from which the


2 2 1 2 1 1 1
observation was selected:
Combined sample (ordered): 27.8 29.0 29.3 29.8 31.0 32.1 33.0
Rank: 1 2 3 4 5 6 7

Consider testing H₀: μ₁ − μ₂ = 0 versus Ha: μ₁ − μ₂ > 0. The key idea behind the
test is that, if the null hypothesis is true, observations from the two samples should be
intermingled in magnitude, so that the ranks are intermingled. However, when μ₁ exceeds
μ₂, observations in the first sample will tend to be larger than those in the second
sample. In this case, the larger ranks will be assigned to sample 1 observations and the
smaller ranks to the observations from sample 2. The Wilcoxon test statistic w is the
sum of the ranks assigned to observations in the first sample. For the data introduced,

w = sum of ranks for observations in sample 1 = 3 + 5 + 6 + 7 = 21


Because the inequality > appears in Ha, values of w larger than 21 are even more contradictory
to the null hypothesis than the value actually obtained. Thus

P-value = P(w ≥ 21 when H₀ is true)

Now the only set of four ranks for which w = 21 is the one that resulted, and the only
possible w value larger than 21 is w = 22, which occurs when the ranks are 4, 5, 6, and
7. So

P-value = P(ranks are 3, 5, 6, and 7 or 4, 5, 6, and 7 when H₀ is true)

But when the null hypothesis is true, all seven observations have actually been selected
from the same distribution, in which case any set of four ranks for the observations in
the first sample has the same chance of resulting—the set 1, 2, 5, 7 or the set 1, 3, 4, 6,
and so on. It is not difficult to see that there are 35 possible sets of four ranks that can be
selected from the ranks 1, . . . , 7.¹ Since only two of these 35 sets have w ≥ 21,

P-value = 2/35 = .0571

When H₀ is true and this test statistic is used repeatedly on different samples, in the long
run about 5.7% of all samples will give a w value at least as contradictory to the null
hypothesis as what we obtained. The P-value is small enough to justify rejection of H₀
at level .10 but not at level .05.
Unless n1 and n2 are quite small, it can be time-consuming to determine the sets of
ranks corresponding to w values at least as extreme as what was obtained to calculate the P-
value. We recommend using a statistical computer package for this purpose. The Wilcox-
on test is valid whatever the nature of the two distributions as long as they are continuous
with the same shapes and spreads. This test is often described as being distribution-free (or
nonparametric), meaning that it is valid for a wide variety of underlying distributions rather
than just one particular type of distribution. The t test is not distribution-free, because its
validity is predicated on the two distributions being at least approximately normal. There
are a number of other distribution-free tests in a statistician’s toolbox, many of them based
on ranks of the observations. The best of these tests, including the Wilcoxon test, perform
almost as well as tests such as the t test that are developed with specific types of distribu-
tions in mind. That is, for the same significance level α, type II error probabilities for the
distribution-free tests are not much larger than those of the best tests in various situations.
Consult one of the chapter references for more information on procedures of this type.
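For reference, here is a minimal sketch of the fuel-efficiency calculation above using scipy's exact Mann–Whitney test, which is equivalent to the Wilcoxon rank-sum test (scipy reports the statistic U = w − n₁(n₁ + 1)/2 rather than w itself):

from math import comb
from scipy.stats import mannwhitneyu

sample1 = [29.3, 31.0, 32.1, 33.0]   # ranks 3, 5, 6, 7, so w = 21
sample2 = [27.8, 29.0, 29.8]

res = mannwhitneyu(sample1, sample2, alternative='greater', method='exact')
print(res.statistic, res.pvalue)     # U = 21 - 10 = 11, and P-value = 2/35 = .0571

print(comb(7, 4))                    # 35 possible sets of four ranks, as in the text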

The Relationship Between Test Procedures


and Confidence Intervals
Suppose the two-sided large-sample confidence interval for a population mean μ at the
95% confidence level based on a particular sample is (103.5, 108.2). Consider using
this same sample to test, at a significance level of .05, the null hypothesis H₀: μ = μ₀
against the two-sided alternative Ha: μ ≠ μ₀. It is not difficult to see that if the null value
μ₀ is a number in the confidence interval, such as 105 or 107.5, then the P-value will
exceed .05, so H₀ cannot be rejected. If, however, μ₀ lies outside the confidence interval

¹In general, there are (n₁ + n₂)!/[(n₁!)(n₂!)] ways to select the n₁ ranks for the observations from the first sample.


(e.g., 100 or 110), then the P-value ≤ .05 and H₀ can be rejected. In other words, the
95% interval consists precisely of all values μ₀ for which the null hypothesis H₀: μ = μ₀
cannot be rejected at a significance level of .05. This is intuitively reasonable, since the
confidence interval consists of all plausible values of μ at the designated confidence level,
and not rejecting H₀ means that μ₀ is plausible. The following generalization of this

Let θ̂L denote the lower confidence limit for some parameter θ and θ̂U denote the upper
confidence limit, where the confidence level is 100(1 − α)%. Consider the test procedure
that rejects H₀: θ = θ₀ in favor of Ha: θ ≠ θ₀ if θ₀ lies outside the interval and does not
reject the null hypothesis if θ₀ falls between θ̂L and θ̂U. (Notice that there is no explicit test
statistic, but we still have a decision rule.) This test procedure has a significance level of α.

The result is important because a confidence interval can be used as a basis for
testing hypotheses, and, by the same token, there is a confidence interval procedure cor-
responding to any particular test procedure. (Our discussion has focused on two-sided
confidence intervals and two-tailed tests, but one-sided confidence intervals that specify
a lower or an upper confidence bound give rise to one-tailed tests and vice versa.) For
example, in Chapter 7 we discussed the bootstrap method for calculating confidence
intervals; these intervals also form the basis for bootstrap tests of hypotheses. Similarly,
the Wilcoxon rank-sum test, which was described previously, gives rise to a distribution-
free confidence interval for μ₁ − μ₂. In summary, the duality between tests and confi-
dence intervals has led to the development of many important inferential procedures.
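The duality is easy to see numerically. The sketch below uses an arbitrary illustrative sample (not data from the text): a null value μ₀ is rejected by the two-tailed t test at level .05 exactly when it falls outside the two-sided 95% t interval:

import numpy as np
from scipy import stats

x = np.array([103.9, 106.2, 104.8, 107.5, 105.1, 106.9, 104.4, 108.0])

# Two-sided 95% t confidence interval for the population mean
lo, hi = stats.t.interval(0.95, df=len(x) - 1, loc=x.mean(), scale=stats.sem(x))

for mu0 in (105.0, 103.0):                  # one null value inside the CI, one outside
    p = stats.ttest_1samp(x, popmean=mu0).pvalue
    print(mu0, lo <= mu0 <= hi, p > 0.05)   # the two checks always agree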

A General Principle for Obtaining Test Procedures


The test procedures considered so far have all been developed in an ad hoc manner;
an intuitively plausible test statistic was selected and its sampling distribution when
H0 is true was obtained so that the P-value could be calculated. Many frequently used
test procedures can be derived using a general technique called the likelihood ratio
principle. Suppose that the mass or density function for a single observation to be ran-
domly selected from some population or process is f(x; θ). Recall from our discussion
of maximum likelihood estimation that if the n sample observations x₁, . . . , xₙ are independently
selected from this distribution (a random sample), then the likelihood is the
joint mass or density function f(x₁; θ) · f(x₂; θ) ⋯ f(xₙ; θ), regarded as a function of θ.
For example, for a random sample from a Poisson distribution, the likelihood would be

$$\frac{\lambda^{x_1}e^{-\lambda}}{x_1!} \cdots \frac{\lambda^{x_n}e^{-\lambda}}{x_n!} = \frac{\lambda^{\sum x_i}\,e^{-n\lambda}}{x_1! \cdots x_n!}$$

Now consider general null and alternative hypotheses of the form H₀: θ ∈ Ω₀ versus
Ha: θ ∈ Ωa (∈ is read as "lies in the set . . ."). For example, Ω₀ might be the single
value 10, and Ωa might consist of all numbers except 10, whence the hypotheses are
H₀: θ = 10 versus Ha: θ ≠ 10. Now consider the following likelihood ratio test statistic:

$$\Lambda(x) = \text{likelihood ratio test statistic} = \frac{\text{maximum value of likelihood for all } \theta \in \Omega_0}{\text{maximum value of likelihood for all } \theta \in \Omega_a}$$


where x is compact notation for x₁, . . . , xₙ. If the numerator of this statistic is much
larger than the denominator (the ratio is much larger than 1.0), then there is a value of
θ specified by the null hypothesis for which the observed data is a lot more likely than it
would be for any value of θ specified by the alternative hypothesis. If, however, the ratio
is much smaller than 1.0, there is an alternative value of θ for which the observed data is
much more likely than would be the case if the null hypothesis were true. A ratio of the
latter sort therefore suggests rejecting H₀ in favor of Ha. Suppose, for example, that the
value of Λ(x) is .2. Then values of this statistic smaller than .2 are even more contradictory
to H₀ than what was obtained, implying that

P-value = P(Λ(x) ≤ .2 when H₀ is true)

Suppose that the population distribution is normal and that we wish to test the null
hypothesis H₀: μ = μ₀ against one of the three alternatives considered previously. It is
not at all obvious by inspection, and the argument requires a bit of tedious algebra, but
it can be shown that application of the likelihood ratio principle here gives rise to the
one-sample t test. So this test procedure can be derived from a general principle for test
construction rather than being justified simply on intuitive grounds. This is also true of
a number of test procedures to be considered in the next several chapters.
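To make the ratio concrete, here is a minimal sketch for the Poisson case with H₀: λ = λ₀ versus Ha: λ ≠ λ₀, using hypothetical data. The denominator is maximized at the maximum likelihood estimate λ̂ = x̄ (assumed here to differ from λ₀, so the maximum over Ωa equals the unrestricted maximum):

import numpy as np
from scipy import stats

x = np.array([3, 1, 4, 2, 5, 2, 3, 0, 2, 4])   # hypothetical Poisson sample
lam0 = 2.0                                      # null value of lambda

lam_hat = x.mean()                              # MLE of lambda
# Work on the log scale for numerical stability, then exponentiate
log_ratio = (stats.poisson.logpmf(x, lam0).sum()
             - stats.poisson.logpmf(x, lam_hat).sum())
print(np.exp(log_ratio))   # small values of the ratio argue for rejecting H0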

Section 8.5 Exercises


65. Let x denote the IQ of a child randomly selected from a certain large geographical region. Suppose x is known to have (approximately) a normal distribution with σ = 15. A parent group wishes to test the hypothesis H₀: μ = 100 versus Ha: μ > 100, hoping to reject the null hypothesis and be able to claim that the average IQ of their children exceeds the nationwide average. The test statistic in this situation is z = (x̄ − 100)/(15/√n).
a. Determine the P-value of the test for each of the following values of n when x̄ = 101 (which suggests that if there is a departure from H₀, it is of little practical significance): i. 100, ii. 400, iii. 1600, iv. 2500.
b. At a significance level of .01, rejecting H₀ is appropriate if the P-value ≤ .01, equivalent to z ≥ 2.33, that is, x̄ ≥ 100 + (2.33)(15)/√n. Determine the value of the type II error probability β when μ = 101 for each of the sample sizes given in part (a). Is a large sample size likely to result in rejecting H₀ even in the absence of a practically significant departure from H₀?

66. The Charpy V-notch impact test is to be applied to a sample of 20 specimens of a certain alloy to determine transverse lateral expansion at 110°F. To be suitable for a particular application, true average expansion should be less than 75 mils. The alloy will not be used unless there is strong evidence that the criterion has been met. Assuming a normal distribution and a test with α = .01, what is the probability that a type II error will be committed and the alloy not used when in fact μ = 72 and σ = 5? What is this probability when μ = 70 and σ = 5?

67. A sample of 15 radon detectors of a particular type is to be selected, and each will be exposed to 100 pCi/L of radon. The resulting data will be used to test whether the population mean reading is in fact 100. Suppose that the reading x has a normal distribution within the population. Write a paragraph or two explaining the following Minitab output to someone who is familiar with the elements of hypothesis testing but not with type II error probabilities:

Testing mean = null (versus not = null)
Calculating power for mean = null + difference
Alpha = 0.01  Sigma = 1
              Sample
Difference    Size     Power
0.5           15       0.1944
0.8           15       0.5619


Alpha = 0.01  Sigma = 0.8
              Sample
Difference    Size     Power
0.5           15       0.3311
0.8           15       0.7967
Alpha = 0.01  Sigma = 0.8
              Sample   Target   Actual
Difference    Size     Power    Power
0.5           42       0.9000   0.9047
0.8           19       0.9000   0.9147

68. The article "A Study of Wood Stove Particulate Emission" (J. of the Air Pollution Control Fed., 1979: 724–728) reported the following data on burn time (hr) for specimens of oak and pine. Use Wilcoxon's test at a significance level of .05 to decide whether true average burn time for oak exceeds that for pine. Hint: With n₁ = 6 and n₂ = 8, when H₀ is true, P(w ≥ c) = .054 for c = 58 and is .010 for c = 63.

Pine:  .98    1.40   1.33   1.52   .73    1.20
Oak:   1.72   .67    1.55   1.56   1.42   1.23   1.77   .48

69. In an experiment to compare the bond strength of two different adhesives, each adhesive was used in five bondings of two surfaces, and the force necessary to separate the two surfaces was determined for each bonding, resulting in the following data:

Adhesive 1:   229   286   245   299   250
Adhesive 2:   216   179   183   247   232

Use the Wilcoxon rank-sum test to decide whether true average bond strengths differ for the two adhesives. Hint: For these sample sizes, when H₀ is true, P(w ≥ c) = .048 for c = 36, .028 for c = 37, and .008 for c = 39. Furthermore, when H₀ is true, the distribution of w is symmetric about n₁(n₁ + n₂ + 1)/2, so in this case P(w ≤ c) = .048 for c = 19.

70. The confidence interval associated with Wilcoxon's rank-sum test has the following general form. First, subtract each observation in the first sample from every observation in the second sample to obtain a set of n₁n₂ differences. Then the confidence interval extends from the cth smallest of these differences to the cth largest difference, where the value of c depends on the desired confidence level. In the case n₁ = n₂ = 5, c = 4 results in a confidence level of 94.4%, which is as close to 95% as can be obtained. Determine this CI for the strength data in Exercise 69.

Supplementary Exercises
71. Have you ever been frustrated because you could not get a container of some sort to release the last bit of its contents? The article "Shake, Rattle, and Squeeze: How Much Is Left in That Container?" (Consumer Reports, May 2009: 8) reported on an investigation of this issue for various consumer products. Suppose five 6.0-oz tubes of toothpaste of a particular brand are randomly selected and squeezed until no more toothpaste will come out. Then each tube is cut open and the amount remaining is weighed, resulting in the following data (consistent with what the cited article reported): .53, .65, .46, .50, .37. Does it appear that the true average amount left is less than 10% of the advertised net contents?
a. Check the validity of any assumptions necessary for testing the appropriate hypotheses.
b. Carry out a test of the appropriate hypotheses using a significance level of .05. Would your conclusion change if a significance level of .01 had been used?
c. Describe in context type I and II errors, and say which error might have been made in reaching a conclusion.

72. The article cited in Exercise 25 of Section 8.2 gave the following data on mass crystallinity (in %) for 12 samples of the PHB polymer:

42.97   38.81   38.83   41.03   41.25   36.99
49.57   41.77   34.50   44.77   36.92   40.48

a. Is it plausible that the mass crystallinity for this type of polymer is normally distributed?
b. Suppose researchers wanted to investigate whether the true average mass crystallinity exceeds 40%. Carry out a test of appropriate hypotheses using a significance level of .05.


73. The following summary data on daily caffeine consumption for a sample of adult women appeared in the article "Caffeine Knowledge, Attitudes, and Consumption in Adult Women" (J. of Nutrition Educ., 1992: 179–184): n = 47, x̄ = 215 mg, s = 235 mg, range of data: 5–1176.
a. Does it appear plausible that the population distribution of daily caffeine consumption is normal? Is it necessary to assume a normal population distribution to test hypotheses about population mean consumption? Explain your reasoning.
b. Suppose it had previously been believed that population mean consumption was at most 200 mg. Does the given data contradict prior belief?

74. Contamination of mine soils in China is a serious environmental problem. The article "Heavy Metal Contamination in Soils and Phytoaccumulation in a Manganese Mine Wasteland, South China" (Air, Soil, and Water Res., 2008: 31–41) reported that, for a sample of 3 soil specimens from a certain restored mining area, the sample mean concentration of total Cu was 45.31 mg/kg with a corresponding (estimated) standard error of the mean of 5.26. It was also stated that the China background value for this concentration was 20. The results of various statistical tests described in the article were predicated on assuming normality.
Does the data provide strong evidence for concluding that the true average concentration in the sampled region exceeds the stated background value? Carry out a test at significance level .01.

75. In an investigation of the toxin produced by a certain poisonous snake, a researcher prepared 26 different vials, each containing 1 g of the toxin, and then determined the amount of antitoxin necessary to neutralize the toxin. The sample average amount of antitoxin necessary was found to be 1.89 mg, and the sample standard deviation was .42. Previous research had indicated that the true average neutralizing amount was 1.75 mg/g of toxin. Does the new data contradict the value suggested by prior research? State and test the relevant hypotheses using α = .05.

76. When the population distribution is normal, it can be shown that the variable X² = (n − 1)s²/σ² has a chi-squared distribution with n − 1 df. This can be used as a basis for testing H₀: σ = σ₀, as follows: Replace σ² in X² by its hypothesized value σ₀² to obtain a test statistic. If the alternative hypothesis is Ha: σ > σ₀, the P-value is the area under the n − 1 df chi-squared curve to the right of the calculated X² (an upper-tailed test).
a. To ensure reasonably uniform characteristics for a particular application, it is desired that the true standard deviation of the softening point of a certain type of petroleum pitch be at most .50°C. The softening points of ten different specimens were determined, yielding a sample standard deviation of .58°C. Assume that the distribution from which the observations were selected is normal. Does the data contradict the uniformity specification? State and test the appropriate hypotheses using α = .01.
b. Suppose that the investigator who performed the experiment described in part (a) had wished to test H₀: σ = .70 versus Ha: σ < .70. Can this test be carried out using the chi-squared table in this book? Why or why not?

77. Let π denote the proportion of "successes" in some population. Consider selecting a random sample of size n, and let p denote the sample proportion of successes (number of successes in the sample divided by n). Suppose we wish to test H₀: π = π₀. When H₀ is true and both nπ₀ > 5 and n(1 − π₀) > 5, the sampling distribution of p is approximately normal with mean value π₀ and standard deviation √(π₀(1 − π₀)/n). This implies that a "large-sample" test statistic is z = (p − π₀)/√(π₀(1 − π₀)/n) (i.e., we standardize p assuming that H₀ is true); the P-value is calculated as was done in Section 8.1 for a z test concerning μ.
Seat belts help prevent injuries in vehicle accidents, but they don't offer complete protection in extreme situations. A sample of 319 front-seat occupants involved in head-on collisions in a certain region resulted in 95 who sustained no injuries ("Influencing Factors on the Injury Severity of Restrained Front Seat Occupants in Car-to-Car Head-on Collisions," Accident Analysis and Prevention, 1995: 143–150). Does this data suggest that less than one-third of all such accidents result in no injuries? State and test the relevant hypotheses using a significance level of .05.


78. Some of the deadliest mass shootings in U.S. history occurred in 2012. These events led to many calls for stricter national gun control. On December 27, 2012, the Gallup organization reported that roughly 600 of 1038 American adults surveyed said they would be in favor of strengthening laws covering the sale of firearms.
a. Does this provide strong evidence for concluding that more than 50% of the population of American adults was in favor of making laws covering the sale of firearms more strict? Conduct an appropriate test of hypotheses using a .01 significance level. (Hint: Read the first paragraph of the previous problem.)
b. This poll was conducted December 19–22, just days after a mass shooting at an elementary school in Connecticut. Discuss what effects this event may have had on the poll's outcome.

79. Headability is the ability of a cylindrical piece of material to be shaped into the head of a bolt, screw, or other cold-formed part without cracking. The article "New Methods for Assessing Cold Heading Quality" (Wire J. Intl., Oct. 1996: 66–72) described the result of a headability impact test applied to 30 specimens of aluminum killed steel and 30 specimens of silicon killed steel. The sample mean headability rating number for the steel specimens was 6.43 and the sample mean for aluminum specimens was 7.09. Suppose that the sample standard deviations were 1.08 and 1.19, respectively. Do you agree with the article's authors that the difference in headability ratings is significant at the 5% level?

80. The article "Two Parameters Limiting the Sensitivity of Laboratory Tests of Condoms as Viral Barriers" (J. of Testing and Eval., 1996: 279–286) reported that, in brand A condoms, among 16 tears produced by a puncturing needle, the sample mean tear length was 74.0 μm, whereas for the 14 brand B tears, the sample mean length was 61.0 μm (determined using light microscopy and scanning electron micrographs). Suppose the sample standard deviations are 14.8 and 12.5, respectively (consistent with the sample ranges given in the article). The authors commented that the thicker brand B condom displayed a smaller mean tear length than the thinner brand A condom. Is this difference in fact statistically significant? State the appropriate hypotheses and test at α = .05.

81. Information about hand posture and forces generated by the fingers during manipulation of various daily objects is needed for designing high-tech hand prosthetic devices. The article "Grip Posture and Forces During Holding Cylindrical Objects with Circular Grips" (Ergonomics, 1996: 1163–1176) reported that for a sample of 11 females, the sample mean four-finger pinch strength (N) was 98.1 and the sample standard deviation was 14.2. For a sample of 15 males, the sample mean and sample standard deviation were 129.2 and 39.1, respectively.
a. A test carried out to see whether true average strengths for the two genders were different resulted in t = 2.51 and P-value = .019. Does the appropriate test procedure described in this chapter yield this value of t and the stated P-value?
b. Is there substantial evidence for concluding that true average strength for males exceeds that for females by more than 25 N? State and test the relevant hypotheses.

82. The article "Pine Needles as Sensors of Atmospheric Pollution" (Environ. Monitoring, 1982: 273–286) reported on the use of neutron-activity analysis to determine pollutant concentration in pine needles. According to the article's authors, "These observations strongly indicated that for those elements which are determined well by the analytical procedures, the distribution of concentration is lognormal. Accordingly, in tests of significance the logarithms of concentrations will be used." The given data refers to bromine concentration in needles taken from a site near an oil-fired steam plant and from a relatively clean site. The summary values are means and standard deviations of the log-transformed observations.

Site          Sample size   Mean of log concentration   StDev of log concentration
Steam plant   8             18.0                        4.9
Clean         9             11.0                        4.6

Let μ₁* be the true average log concentration at the first site and define μ₂* analogously for the second site.


a. Use the pooled t test (based on assuming normality and equal standard deviations), described in Exercise 37, to decide at significance level .05 whether the two concentration distribution means are equal.
b. If σ₁* and σ₂*, the standard deviations of the two log concentration distributions, are not equal, would μ₁ and μ₂, the means of the concentration distributions, be equal if μ₁* = μ₂*? Explain your reasoning.

83. The article cited in Exercise 78 of Chapter 7 gave additional data on breaking force (N):

Temp   Medium   n   x̄        s
22°    Dry      6   170.60   39.08
37°    Dry      6   325.73   34.97
22°    Wet      6   366.36   34.82
37°    Wet      6   306.09   41.97

a. Is there strong evidence for concluding that true average force in a dry medium at the higher temperature exceeds that at the lower temperature by more than 100 N?
b. Is there strong evidence for concluding that true average force in a wet medium at the lower temperature exceeds that at the higher temperature by more than 50 N?

84. Long-term exposure of textile workers to cotton dust released during processing can result in substantial health problems, so textile researchers have been investigating methods that will result in reduced risks while preserving important fabric properties. The accompanying data on roving cohesion strength (kN·m/kg) for specimens produced at five different twist multiples is from the article "Heat Treatment of Cotton: Effect on Endotoxin Content, Fiber and Yarn Properties, and Processability" (Textile Research J., 1996: 727–738):

Twist multiple:     1.054   1.141   1.245   1.370   1.481
Control strength:   .45     .60     .61     .73     .69
Heated strength:    .51     .59     .63     .73     .74

The authors of the cited article stated that strength for heated specimens appeared to be slightly higher on average than for the control specimens. Is the difference statistically significant? State and test the relevant hypotheses using α = .05.

85. Tardive dyskinesia refers to a syndrome comprising a variety of abnormal involuntary movements assumed to follow long-term use of antipsychotic drugs. An experiment carried out to investigate the effect of the drug deanol also used a placebo treatment, something that resembled deanol in every way but was known to be inert and have absolutely no medical effect. The two treatments were administered for 4 weeks each in random order to 14 patients, resulting in the following total severity index scores ("Double Blind Evaluation of Deanol in Tardive Dyskinesia," J. of the Amer. Med. Assoc., 1978: 1997–1998):

Patient:   1      2      3      4      5      6      7
Deanol:    12.4   6.8    12.6   13.2   12.4   7.6    12.1
Placebo:   9.2    10.2   12.2   12.7   12.1   9.0    12.4

Patient:   8      9      10     11     12     13     14
Deanol:    5.9    12.0   1.1    11.5   13.0   5.1    9.6
Placebo:   5.9    8.5    4.8    7.8    9.1    3.5    6.4

Does the data indicate that, on average, deanol yields a higher total severity index score than does the placebo treatment?

86. The authors of the article "Predicting Professional Sports Game Outcomes from Intermediate Game Scores" (Chance, 1992: 18–22) used statistical analysis to determine whether there was any merit to the idea that basketball games are not settled until the last quarter, whereas baseball games are "over" by the seventh inning. They also considered football and hockey. Data was collected for a sample of games of each type, selected from all games played during the 1990 season for baseball and football and during the 1990–1991 season for the other two sports. For each game, the late-game leader was determined, and it was noted whether the leader actually ended up winning the game. The leader was defined as the team ahead after three quarters in basketball and football, two periods in hockey, and seven innings in baseball. The results follow:

Sport        Leader wins   Leader loses
Basketball   150           39
Baseball     86            6
Hockey       65            15
Football     72            21


Do the four sports appear to be identical with respect to the proportion of games won by the late-game leader? State and test the appropriate hypotheses using α = .05. Do you think your conclusion can be attributed to a single sport being an anomaly?

87. As the population ages, there is increasing concern about accident-related injuries to the elderly. The article "Age and Gender Differences in Single-Step Recovery from a Forward Fall" (J. of Gerontology, 1999: M444–M450) reported on an experiment in which the maximum lean angle—the furthest a subject is able to lean and still recover in one step—was determined for both a sample of younger females (21–29 years) and a sample of older females (67–81 years). The following observations are consistent with summary data given in the article:

YF:   29   34   33   27   28   32   31   34   32   27
OF:   18   15   23   13   12

Does the data suggest that true average maximum lean angle for older females is more than 10 degrees smaller than it is for younger females? State and test the relevant hypotheses at significance level .10.

88. Adding computerized medical images to a database promises to provide great resources for physicians. However, there are other methods of obtaining such information, so the issue of efficiency of access needs to be investigated. The article "The Comparative Effectiveness of Conventional and Digital Image Libraries" (J. of Audiovisual Media in Medicine, 2001: 8–15) reported on an experiment in which 13 computer-proficient medical professionals were timed both while retrieving an image from a library of slides and while retrieving the same image from a computer database with a Web front end.

Subject:      1    2    3    4    5    6    7
Slide:        30   35   40   25   20   30   35
Digital:      25   16   15   15   10   20   7
Difference:   5    19   25   10   10   10   28

Subject:      8    9    10   11   12   13
Slide:        62   40   51   25   42   33
Digital:      16   15   13   11   19   19
Difference:   46   25   38   14   23   14

Does the true mean difference between slide retrieval time and digital retrieval time appear to exceed 10 sec? Be sure to check the validity of any assumptions on which your chosen inferential method is based.

89. The NCAA basketball tournament begins with 64 teams that are apportioned into four regional tournaments, each involving 16 teams. The 16 teams in each region are then ranked (seeded) from 1 to 16. During the 12-year period from 1991 to 2002, the top-ranked team won its regional tournament 22 times, the second-ranked team won 10 times, the third-ranked team won 5 times, and the remaining 11 regional tournaments were won by teams ranked lower than 3. Let Pij denote the probability that the team ranked i in its region is victorious in its game against the team ranked j. Once the Pij's are available, it is possible to compute the probability that any particular seed wins its regional tournament (a complicated calculation because the number of outcomes in the sample space is quite large). The paper "Probability Models for the NCAA Regional Basketball Tournaments" (The American Statistician, 1991: 35–38) proposed several different models for the Pij's.
a. One model postulated Pij = .5 + λ(j − i) with λ = 1/32 (from which P16,1 = λ, P16,2 = 2λ, etc.). Based on this, P(seed #1 wins) = .27477, P(seed #2 wins) = .20834, and P(seed #3 wins) = .15429. Does this model appear to provide a good fit to the data?
b. A more sophisticated model has Pij = .5 + .2813625(zi − zj), where the z's are measures of relative strengths related to standard normal percentiles (percentiles for successive highly seeded teams are closer together than is the case for teams seeded lower, and .2813625 ensures that the range of probabilities is the same as for the model in part (a)). The resulting probabilities of seeds 1, 2, or 3 winning their regional tournaments are .45883, .18813, and .11032, respectively. Assess the fit of this model.

90. One way to reduce the equipment problems that occur during die casting is to apply a thin coating to the core pins. The paper "Tool Treatment Extends


Core and Pin Life" (Die Casting Engineer, March/April 1999: 88) reported on an experiment in which one group of core pins was coated using the traditional nitride process and a second group was coated using a new thermal diffusion process. Use the accompanying data to decide at significance level .01 whether there is strong evidence for concluding that true average lifetime for the thermal treatment is more than four times that of the nitride treatment. Hint: Consider the parameter θ = 4μ₁ − μ₂ with corresponding estimator θ̂ = 4x̄₁ − x̄₂. This estimator is unbiased and normally distributed provided that the two population distributions are normal, and its variance can be determined from the fact that for any two independent random variables y₁ and y₂ and numerical constants a₁ and a₂, V(a₁y₁ + a₂y₂) = a₁²V(y₁) + a₂²V(y₂).

Nitride:   9000     20,000   10,000   20,000    21,000    3000     4000
Thermal:   49,000   23,000   20,000   100,000   114,000   35,000   30,000

Bibliography
Agresti, Alan, An Introduction to Categorical Data Analysis (2nd ed.), Wiley, New York, 2007. An excellent treatment of various aspects of categorical data analysis by one of the most prominent researchers in this area.

Devore, Jay L., Probability and Statistics for Engineering and the Sciences (8th ed.), Brooks/Cole–Cengage, Belmont, CA, 2012. A somewhat more comprehensive and slightly advanced treatment of hypothesis testing and other topics than what is presented in this book.

See also the books by Devore et al. and by DeGroot et al. listed in the Chapter 7 bibliography.

9
The Analysis of Variance
9.1 Terminology and Concepts
9.2 Single-Factor ANOVA
9.3 Interpreting ANOVA Results
9.4 Randomized Block Experiments

Introduction
As we saw in Chapter 8, there is more than one way to make comparisons be-
tween two populations or processes. Choosing the best approach involves using
one’s technical knowledge of a problem to select an appropriate statistical tech-
nique. In some cases, the independent samples test (Section 8.2) may be the best
approach. At other times, the paired-samples test (also Section 8.2) may be supe-
rior. In Chapter 9, both of these methods are extended to comparisons between
more than two population means. The independent samples test generalizes to the
single-factor analysis of variance (Section 9.2), whereas the paired-samples test
generalizes to the randomized block design (Section 9.4). The procedures in
Chapter 9 are the tip of a large statistical iceberg called experimental design,
which is discussed further in Chapter 10.
One of the important features of the designs in Chapter 9 is that they com-
bine the sample data from several populations into a test capable of de-
tecting when one or more of the population means differ from the rest. That is,
analysis of variance tests are not conducted by simply performing the two-sample
tests of Chapter 8 on all the different pairings of several populations. Only after
an analysis of variance test signals a possible difference between the population
means do we begin to conduct multiple comparisons of the populations, dis-
cussed in Section 9.3, to pinpoint the specific populations whose means differ
from one another.



9.1 Terminology and Concepts


Although the focus in this chapter is on detecting differences between several population
or process means, the primary tool used in these tests is based on a comparison of vari-
ances. Consequently, the procedures in this chapter have collectively become known as
the analysis of variance. This phrase is often shortened to the acronym ANOVA.
ANOVA methods are concerned with testing null hypotheses of the form
H0: the means of several populations or processes are the same
The alternative hypothesis Ha is that at least two of the means differ from each other. Letting μ1, μ2, μ3, …, μk denote k population or process means, H0 and Ha can be written as
H0: μ1 = μ2 = μ3 = … = μk
Ha: at least two of the μi's are different

Some typical ANOVA applications follow:


    Do four different brands of gasoline have different effects on automobile fuel effi-
ciency? (H0: the mean fuel efficiency (mpg) obtained is the same for all four brands.)
    Is there any difference in crop yields when five different fertilizers are used? (H0:
the mean crop yield per acre is the same for all five fertilizers.) What about using
four different watering schedules? (H0: the mean crop yield per acre is the same
for each watering schedule.)
    Will three different levels of a chemical concentration have differing effects on
an electroplating process? (H0: the mean plating thickness is the same for all
three concentration levels.)
In each of these examples, the populations share some common characteristic,
called a factor or treatment, whose various levels or treatment levels distinguish one
population from another. For example, when testing for possible differences in fuel effi-
ciency among four brands of gasoline, the factor of interest is gasoline brand, which has
four different levels, for example, brand 1, brand 2, brand 3, and brand 4. Alternatively,
we could refer to gasoline brand as a treatment with the four brands representing the
treatment levels. The k levels of a factor correspond to the k different populations being compared in the test of the hypothesis H0: μ1 = μ2 = μ3 = … = μk.
Comparisons between populations are made by choosing a numerical quantity,
called a response variable, that is measured for each sampled item. In the fuel effi-
ciency study, for example, fuel efficiency (mpg) would be the response variable, and we
would measure the mpg for cars selected to use gasoline of brand 1, brand 2, brand 3,
and brand 4, respectively. Using this terminology, these three examples can be sum-
marized as follows:

Factor                  Levels                                              Response
Gasoline                Brand 1, brand 2, brand 3, brand 4                  Fuel efficiency, in mpg
Fertilizer              Fertilizer 1, fertilizer 2, fertilizer 3            Crop yield, in bushels/acre
Chemical concentration  Concentration 1, concentration 2, concentration 3   Plating thickness, in mm


When an ANOVA problem is expressed in terms of a factor and a response, the goal
of the study is to determine whether the different factor levels have different effects on
the response variable. Think of the factor as the independent variable and the response
as the dependent variable. It often helps to draw a picture, as shown in Figure 9.1, to
visualize the data from an ANOVA study.
Figure 9.1 Visualizing data from an ANOVA study (the response values for each sample are plotted above factor levels 1, 2, 3, . . .)


Of course, populations may be characterized by several factors, not just one, and
each factor can have any number of levels. Populations that represent different levels of
a single factor are said to form a one-way classification, and we compare these popula-
tions using a single-factor (one-way) ANOVA. Characterizing populations by two fac-
tors is called a two-way classification, and so forth. Techniques for studying two or more
factors are presented in Chapter 10.

How ANOVA Tests Work


Like the two-sample tests in Chapter 8, ANOVA procedures use just one test for com-
paring k population means. Suppose, for example, that we select random samples from each of k = 4 populations and present the data in a graph such as that in Figure 9.1.
This is the natural extension of the independent samples situation of Section 8.2. To
determine whether the population means differ, the ANOVA approach compares the
variation between the four sample means to the inherent variability within each sample
(see Figure 9.2). The more the sample means differ, the larger will be the between-
samples variation shown at the right in Figure 9.2. The test statistic that compares these
two types of variation is the ratio of the between-samples variation to the within-samples
variation,
test statistic = (between-samples variation)/(within-samples variation)
Figure 9.3 shows how this test statistic behaves when there is no difference be-
tween the four means (i.e., when H0: μ1 = μ2 = μ3 = μ4 is true) and when the means
do differ (when H0 is false). In essence, large values of the test statistic tend to support


the alternative hypothesis (that some of the means differ from the others), whereas small
values of the statistic support the null hypothesis.

Figure 9.2 ANOVA methods compare within-samples variation to between-samples variation (dot plots of the four samples, with the variation between the sample means shown at the right)

F Distributions
When the hypothesis H0: μ1 = μ2 = μ3 = … = μk is true, it can be shown that the
test statistic described previously follows a continuous probability distribution called
an F distribution. F distributions arise in statistical tests that involve ratios of two varia-
tion measures, such as the ratio of between-samples variation to within-samples varia-
tion, as shown in Figure 9.3. The variation measures used in an F ratio are based on

Figure 9.3 How an ANOVA test works: when H0 is true, the ratio of between-samples variation to within-samples variation is small; when H0 is false, the ratio is large


certain sums of squares calculated from the sample data, and each sum of squares has
an associated number of degrees of freedom. The numerator degrees of freedom, de-
noted df1, is the number of degrees of freedom associated with the sum of squares in the
numerator of an F ratio. Similarly, df2 denotes the denominator degrees of freedom.
There is a different F distribution for every different combination of positive in-
tegers df1 and df2. For example, there is an F distribution with 4 numerator degrees
of freedom and 12 denominator degrees of freedom, another with 3 numerator degrees
of freedom and 20 denominator degrees of freedom, and so forth. Because they are
ratios of nonnegative quantities, variables that follow F distributions have only nonnega-
tive values, and their density curves have a shape similar to that shown in Figure 9.4.
ANOVA tests, as Figure 9.3 illustrates, are upper-tailed tests. In other words, only
large values of the F ratio lead to rejecting H0; small values do not reject H0. In terms
of P-values, this means that the P-value associated with a calculated F ratio is the area
under the F distribution to the right of the calculated F ratio. Figure 9.4 shows the P-value
associated with a calculated F ratio of 9.15 based on df1 5 4 and df2 5 6. Tables of critical
values for F distributions can be found in Appendix Table VIII.

Figure 9.4 P-value for an upper-tailed test: the shaded area (P-value = .01) under the F curve for df1 = 4 and df2 = 6 lies to the right of F = 9.15

The F table (Table VIII) contains critical F values associated with tail areas of .10, .05, .01, and .001. To use the table with an F ratio based on df1 = 4 and df2 = 6, read across the top of the table to find the column with df1 = 4 and read down the left side of the table to find the row with df2 = 6. At the intersection of this column and row, there will be four critical values, corresponding to right-tail probabilities of .10, .05, .01, and .001. For example, a P-value of .01 is associated with an F ratio of 9.15, a P-value of .05 is associated with an F ratio of 4.53, and so forth.
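These table lookups can also be carried out in software. Here is a minimal sketch, assuming Python with SciPy available (the variable names are ours), that reproduces the two values just quoted:

from scipy.stats import f

df1, df2 = 4, 6
# P-value for an upper-tailed test: area to the right of the calculated F ratio
print(f.sf(9.15, df1, df2))    # survival function (1 - cdf); approximately .01
# Critical value capturing an upper-tail area of .05
print(f.ppf(0.95, df1, df2))   # approximately 4.53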

Section 9.1 Exercises


1. Three types of wood (denoted A, B, and C) are being considered for use in a building project. Each type of wood differs in cost, so the builder is interested in keeping costs down as well as in selecting wood that will be strong enough. To determine whether there is a difference between the average strengths of the three types of wood, a random sample of ten beams of each type is selected and their strengths are measured.
a. What hypotheses would you test in such a study? Describe, in words, the parameters that appear in the hypotheses.
b. Suppose an ANOVA test indicates that beams of types A and B are not significantly different in strength from one another, but that both types are significantly stronger than beams of type C. If the builder’s objective is to use as strong a beam as possible, what type of beam should be used?


c. Suppose the ANOVA test does not reveal any significant differences in strength between the three types of beams. If the builder must use one of the three types, which type should be chosen?

2. Suppose you have a fixed budget to allocate to the samples used in a study of the effect of the factor “chemical concentration” on the plating thickness of electroplated plastic parts. Describe in general terms how you would allocate the samples. Specifically, what information would make you want to use fewer levels of chemical concentration and, correspondingly, more plastic parts at each concentration level? Conversely, what scenario would lead you to use a larger number of concentration levels and, therefore, fewer plastic parts per concentration level? Include the two sources of variation in an ANOVA experiment in your answers.

3. In a one-way ANOVA test for comparing the mean strengths (in kilograms) of three different alloys, suppose that the measuring instrument used is out of calibration, causing it to give readings that are consistently 2.5 kilograms higher than the true measured strength. Using the general description of the techniques given in this section, explain what effect you think such data would have on the results of an ANOVA test comparing samples of the three alloys. Do you think an ANOVA test based on accurate measurements of the same samples of alloys will lead to a different conclusion?

4. Repeated measurements in an ANOVA study are supposed to indicate what would happen if another researcher tried to repeat your study. In particular, simply measuring the same sampled item several times, which gives repeated measurements of that item, is not considered to be a valid form of replication. Instead, several different items should each be measured once. What is the danger in using repeated measurements of the same item instead of truly replicating an experimental result? What do you expect the effect on the F statistic to be if repeated measurements of a single item are used at each level of a factor?

5. As a simple method of determining which of k factor levels maximizes the average value of a certain response variable, inexperienced researchers sometimes calculate the k sample means and then simply select the factor level having the largest sample mean. This strategy has been called the “pick the winner” approach in the literature on experimental design. Explain what is wrong with this approach and why it does not take the place of an ANOVA test.

6. Use the table of F distribution critical values (Appendix Table VIII) to find
a. The F critical value based on df1 = 5 and df2 = 8 that captures an upper-tail area of .05
b. The F critical value based on df1 = 8 and df2 = 5 that captures an upper-tail area of .05
c. The F critical value based on df1 = 5 and df2 = 8 that captures an upper-tail area of .01
d. The F critical value based on df1 = 8 and df2 = 5 that captures an upper-tail area of .01
e. The 95th percentile of an F distribution with df1 = 3 and df2 = 20
f. The probability P(F ≤ 6.16) for df1 = 6 and df2 = 4
g. The probability P(4.74 ≤ F ≤ 7.87) for df1 = 10 and df2 = 5

7. Based on your answers to Exercise 6(a)–(d), what effect does interchanging df1 and df2 have on the critical F value (for a fixed upper-tail area)?

8. An experiment was carried out to compare flow rates for four different types of nozzles.
a. Samples of five type-A nozzles, six type-B nozzles, seven type-C nozzles, and six type-D nozzles were tested. ANOVA calculations yielded an F value of 3.68 with df1 = 3 and df2 = 20. State and test the relevant hypotheses using α = .01.
b. Analysis of the data using statistical software yielded a P-value of P = .029. Using α = .01, what conclusion would you draw regarding the test in part (a)?

9. In a test of the hypothesis H0: μ1 = μ2 = μ3 = μ4, samples of size 6 were selected from each of four populations, and an F statistic value of 4.12 was calculated (using the methods in the next section). The appropriate degrees of freedom for the F distribution in this exercise are df1 = 3, df2 = 20. Using α = .05, conduct the test to determine whether you can conclude that there are differences between μ1, μ2, μ3, and μ4.

9.2 Single-Factor ANOVA


Notation and Formulas
One way to test the null hypothesis H0: μ1 = μ2 = μ3 = … = μk is to compare the means of random samples selected from each of the k populations specified by H0. This
method of sampling is the basis of the completely randomized design. The sample
sizes n1, n2, n3, . . . , and nk do not have to be equal. Let xij denote the measured response
for the jth item in a sample from the ith population. The following notation will be used
in our computations:

ANOVA Notation
Sample sizes: n1, n2, n3, . . . , nk
Sample means: x̄1, x̄2, x̄3, . . . , x̄k
Sample variances: s1², s2², s3², . . . , sk²
Total sample size: n = n1 + n2 + n3 + … + nk
Grand average: x̿ = average of all n responses

The double bar in the notation for the grand average is meant to imply that x̿ is an average of averages. More accurately, x̿ is a weighted average of the k sample means:

x̿ = (n1/n)x̄1 + (n2/n)x̄2 + (n3/n)x̄3 + … + (nk/n)x̄k

where the weights n1/n, n2/n, n3/n, …, nk/n sum to 1 (because n = n1 + n2 + n3 + … + nk). Alternatively, x̿ can also be thought of as the sample mean of the combined group of n response values.
With this notation, the treatment sum of squares (denoted SSTr) and error sum of squares (denoted SSE) are defined as

SSTr = n1(x̄1 − x̿)² + n2(x̄2 − x̿)² + n3(x̄3 − x̿)² + … + nk(x̄k − x̿)²

SSE = Σj=1..n1 (x1j − x̄1)² + Σj=1..n2 (x2j − x̄2)² + Σj=1..n3 (x3j − x̄3)² + … + Σj=1..nk (xkj − x̄k)²

SSTr and SSE form the basis of the between-samples variation and within-samples
variation described in Section 9.1. Before these quantities are used to conduct the
hypothesis test, however, they must be adjusted to take into account the effects of sample
sizes. This is done later in the section.
SSE can also be written in the form

SSE = (n1 − 1)s1² + (n2 − 1)s2² + (n3 − 1)s3² + … + (nk − 1)sk²


which more clearly shows how SSE combines or pools the information in the k sample variances s1², s2², s3², . . . , sk². Together, these two sources of variation comprise the total sum of squares (denoted SST). That is,

SST = SSTr + SSE

where SST is the sum of squared deviations from the grand mean:

SST = Σi=1..k Σj=1..ni (xij − x̿)²
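To make these formulas concrete, the following sketch (assuming Python with NumPy; the three small samples are invented purely for illustration) computes SSTr, SSE, and SST and confirms the decomposition SST = SSTr + SSE:

import numpy as np

samples = [np.array([12.0, 14.0, 13.0]),
           np.array([15.0, 16.0, 17.0, 16.0]),
           np.array([11.0, 12.0, 10.0])]
all_data = np.concatenate(samples)
grand_avg = all_data.mean()    # the grand average (weighted mean of the sample means)

sstr = sum(len(s) * (s.mean() - grand_avg) ** 2 for s in samples)
sse = sum(((s - s.mean()) ** 2).sum() for s in samples)
sst = ((all_data - grand_avg) ** 2).sum()
print(np.isclose(sst, sstr + sse))    # True: SST = SSTr + SSE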

Hypothesis Tests
Until now, our ANOVA formulas have been merely arithmetic constructs. To put these
ingredients together to form a statistical procedure, we must be willing to make a few
assumptions about the populations being studied. ANOVA tests, in particular, are based
on the following assumptions:
ANOVA Assumptions
1. All of the k population variances are equal (i.e., σ1² = σ2² = σ3² = … = σk²).
2. Each of the k populations follows a normal distribution.
These assumptions are identical to those for the two-sample equal-variance procedures in Exercise 54 of Chapter 7 and Exercise 37 of Chapter 8, which, in fact, are just special cases (when k = 2) of the more general single-factor ANOVA test we are currently discussing. If the ratio of the largest sample variance to the smallest one does not exceed 4 by very much, then Assumption 1 is plausible. And for very small sample sizes, this rule is conservative, so 4 can be replaced by 6. Formal test procedures can be found in the chapter references. Assumption 2 can be checked by examining normal quantile plots of each sample or, if sample sizes are quite small, a single quantile plot of the deviations xij − x̄i calculated separately within each sample.
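Both checks are easy to script. A sketch of one way to do so, assuming Python with NumPy, SciPy, and Matplotlib (the sample arrays are invented for illustration):

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

samples = [np.array([12.0, 14.0, 13.0, 12.5]),
           np.array([15.0, 16.0, 17.0, 16.5]),
           np.array([11.0, 12.0, 10.0, 11.5])]

# Assumption 1: ratio of largest to smallest sample variance
# (plausible if it does not exceed 4 by very much)
variances = [s.var(ddof=1) for s in samples]
print(max(variances) / min(variances))

# Assumption 2: single normal quantile plot of the within-sample deviations
deviations = np.concatenate([s - s.mean() for s in samples])
stats.probplot(deviations, dist="norm", plot=plt)
plt.show()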
When sampling from normal populations, each sum of squares (such as SST, SSTr,
and SSE) has its own unique number of degrees of freedom. Furthermore, just as SST
can be decomposed into the sum of SSTr and SSE, the degrees of freedom associated with
these sums of squares also decompose in a similar fashion. In a one-way classification, the
total degrees of freedom (associated with SST) is n − 1, which equals the sum of k − 1 (the degrees of freedom for treatments) plus n − k (the degrees of freedom for error)¹:

Sums of Squares and Their Degrees of Freedom (Single-factor ANOVA)
Decomposition of sums of squares: SST = SSTr + SSE
Decomposition of degrees of freedom: n − 1 = (k − 1) + (n − k)

¹ The total degrees of freedom is always n − 1, regardless of the ANOVA test we use. However, the other sums of squares have df values that depend on the particular test. For example, the df for SSE in a one-way ANOVA is different from the df for SSE in a two-way ANOVA.


Our purpose in finding the degrees of freedom is to convert the sums of squares into mean squares by dividing each sum of squares by its associated df. Thus we define

Mean square for treatments (between-samples) = MSTr = SSTr/(k − 1)
Mean square error (within-samples) = MSE = SSE/(n − k)
MSTr and MSE serve as our measures of the between-samples and within-samples varia-
tion described in Section 9.1. All of this information is usually organized into an ANOVA
table (Figure 9.5). The ANOVA table is arranged in column form to emphasize the fact
that the sums of squares and degrees of freedom sum to SST and n − 1, respectively.
Source of variation              df       SS      MS      F          P-value
Between samples (treatments)     k − 1    SSTr    MSTr    MSTr/MSE
Within samples (error)           n − k    SSE     MSE
Total variation                  n − 1    SST

Figure 9.5 ANOVA table for the one-way classification

The entry in the F column of the ANOVA table is the test statistic value

F = MSTr/MSE

which is used to test the hypothesis H0: μ1 = μ2 = μ3 = … = μk. This F distribution has (k − 1, n − k) degrees of freedom since the numerator in MSTr/MSE has df = k − 1 and the denominator has df = n − k. As we mentioned in Section 9.1, the test procedure is always right-tailed; that is, the P-value associated with an F statistic is equal to the area to the right of the statistic under the appropriate F density curve. We reject H0 whenever the P-value of the test statistic F is less than or equal to the desired significance level α. Software packages usually include the P-value in the table.

One-Way ANOVA Test (Significance level α)
Null hypothesis: H0: μ1 = μ2 = μ3 = … = μk
Alternative hypothesis: At least two μi's are different.
Test statistic: F = MSTr/MSE
P-value: P is the area under the F density with (k − 1, n − k) degrees of freedom to the right of the calculated F.
Decision: Reject H0 if P-value ≤ α.

Example 9.1 Numerous factors contribute to the smooth running of an electric motor (“Increas-
ing Market Share Through Improved Product and Process Design: An Experimental
Approach,” Quality Engineering, 1991: 361–369). In particular, it is desirable to keep


motor noise and vibration to a minimum. To study the effect that the brand of bearing
has on motor vibration, five different motor bearing brands were examined by install-
ing each type of bearing on different random samples of six motors. The amount of
motor vibration (measured in microns) was recorded when each of the 30 motors
was running. The data for this study is given in Table 9.1. Because each sample of
six motors was selected independently of the other samples, this is a completely ran-
domized design with the factor brand at five levels (brand 1, brand 2, . . . , brand 5).
Determining whether the bearing brands have different effects on the response vari-
able (motor vibration) can be accomplished with a one-way ANOVA test. The null hypothesis is H0: μ1 = μ2 = μ3 = μ4 = μ5, where μi = average vibration (in microns) for motors using bearings of brand i. We use a significance level of .05 to conduct this test.
Table 9.1 Vibration (in microns) in five groups of electric motors
with each group using a different brand of bearing
Brand 1 Brand 2 Brand 3 Brand 4 Brand 5
13.1 16.3 13.7 15.7 13.5
15.0 15.7 13.9 13.7 13.4
14.0 17.2 12.4 14.4 13.2
14.4 14.9 13.8 16.0 12.7
14.0 14.4 14.9 13.9 13.4
11.6 17.2 13.3 14.7 12.3
Mean: 13.68 15.95 13.67 14.73 13.08
St. dev.: 1.194 1.167 .816 .940 .479

ANOVA Table
Source df SS MS F
Factor 4 30.88 7.72 8.45
Error 25 22.83 .913
Total 29 53.71

The ANOVA calculations proceed as follows. The sum of all n = 30 values in the data is 426.7, so the grand mean is x̿ = 426.7/30 = 14.22. Alternatively, we could use the sample means to find x̿:

x̿ = (n1/n)x̄1 + (n2/n)x̄2 + (n3/n)x̄3 + … + (n5/n)x̄5
  = (6/30)(13.68) + (6/30)(15.95) + (6/30)(13.67) + (6/30)(14.73) + (6/30)(13.08)
  = 14.22
Furthermore,

SSTr = n1(x̄1 − x̿)² + n2(x̄2 − x̿)² + n3(x̄3 − x̿)² + … + n5(x̄5 − x̿)²
     = 6(13.68 − 14.22)² + 6(15.95 − 14.22)² + … + 6(13.08 − 14.22)²
     = 30.88


and

SSE = (n1 − 1)s1² + (n2 − 1)s2² + (n3 − 1)s3² + … + (n5 − 1)s5²
    = (6 − 1)(1.194)² + (6 − 1)(1.167)² + … + (6 − 1)(.479)²
    = 22.83
Putting these results into the formulas for MSTr and MSE, we find

MSTr = SSTr/(k − 1) = 30.88/(5 − 1) = 7.72
MSE = SSE/(n − k) = 22.83/(30 − 5) = .913

which yields the test statistic value

F = MSTr/MSE = 7.72/.913 = 8.45
Using Appendix Table VIII for the F distribution with (k − 1, n − k) = (5 − 1, 30 − 5) = (4, 25) degrees of freedom, we find that the P-value associated with the test statistic F = 8.45 is less than .001. Since this P-value is smaller than the prescribed α of .05, we can reject the hypothesis that all five means are equal and conclude that the type of motor bearing used does have a significant effect on motor vibration. In particular, a visual inspection of the sample means in Table 9.1 suggests that brand 5 is the best choice for reducing vibration. In Section 9.3, we present a statistical procedure to sort out which brands are indeed the better ones to use.
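For reference, the entire test can be reproduced with a few lines of software; a minimal sketch, assuming Python with SciPy, applied to the Table 9.1 data:

from scipy.stats import f_oneway

brand1 = [13.1, 15.0, 14.0, 14.4, 14.0, 11.6]
brand2 = [16.3, 15.7, 17.2, 14.9, 14.4, 17.2]
brand3 = [13.7, 13.9, 12.4, 13.8, 14.9, 13.3]
brand4 = [15.7, 13.7, 14.4, 16.0, 13.9, 14.7]
brand5 = [13.5, 13.4, 13.2, 12.7, 13.4, 12.3]

F, p = f_oneway(brand1, brand2, brand3, brand4, brand5)
print(F, p)    # F is approximately 8.45 and the P-value is below .001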

Section 9.2 Exercises


10. Five brands of raw materials are tested for their effect on a process yield. Random samples of size 10 are used for each of the materials. Complete the following ANOVA table for this experiment:

Source of variation    df    SS       MS     F
Brand                        15.32
Error                                 .64
Total variation

11. An experiment was carried out to compare electrical resistivity for six different low-permeability concrete bridge deck mixtures. There were 26 measurements on concrete cylinders for each mixture; these were obtained 28 days after casting. The entries in the accompanying ANOVA table are based on information in the article “In-Place Resistivity of Bridge Deck Concrete Mixtures” (ACI Materials J., 2009: 114–122). A partial ANOVA table for this data follows:

Source     df    Sum of Squares    Mean Square    F
Mixture
Error                              13.929
Total            5664.415

a. Fill in the missing entries in the ANOVA table.
b. State the null and alternative hypotheses of interest in this experiment.
c. Use α = .05 to carry out the hypothesis test in part (b).

12. Super duplex stainless steels (SDSS) are iron-based alloys that offer an excellent combination of toughness and mechanical strength. Such alloys are useful for many applications in the chemical and petrochemical industries. Recent research


has shown that the pulsed current gas tungsten arc welding (PCGTAW) process offers superior SDSS welds compared to other methods. The authors of “Optimization of Experimental Conditions of the Pulsed Current GTAW Parameters for Mechanical Properties of SDSS UNS S32760 Welds Based on the Taguchi Design Method” (J. of the Air and Waste Mgmt. Assoc., 2012: 1978–1988) researched the impact of different PCGTAW process parameters on mechanical properties of the welds of a particular SDSS. One investigation focused on seeing how pulse current (A) of the PCGTAW affects the toughness (J) of the SDSS welds. Here are experimental results for toughness measurements under three pulse current settings:

Pulse Current: 100 100 100 120 120 120 140 140 140
Toughness:      39  47  44  52  56  53  40  46  42

Use α = .05 to conduct the test for whether there are any differences in the true average weld toughness that may be attributable to the different pulse currents.

13. The article “Influence of Contamination and Cleaning on Bond Strength to Modified Zirconia” (Dental Materials, 2009: 1541–1550) reported on an experiment in which 50 zirconium-oxide disks were divided into 5 groups of 10 each. Then a different contamination/cleaning protocol was used for each group. The following summary data on shear bond strength (MPa) appeared in the article:

Treatment:     1     2     3     4     5
Sample mean:  10.5  14.8  15.7  10.0  21.6
Sample sd:     4.5   6.8   6.5   6.7   6.0

a. State the hypotheses of interest in this experiment.
b. Using a significance level of .01, can you conclude that there is a difference between the mean shear bond strength of the five groups?

14. In “Investigation on Machining Performance of Inconel 718 Under High Pressure Cooling Conditions” (J. of Mech. Engr., 2012: 683–690), researchers varied selected high-pressure jet-assisted (HPJA) machining parameters for the nickel-based alloy Inconel 718 and investigated their effect on tool wear.
In one experiment, the researchers machined six specimens of Inconel 718 at each of three different HPJA coolant pressure levels (.6, 10, and 30 MPa) and recorded the corresponding average tool flank wear (ATFW), a combination of abrasive and depth of cut notch wear:

Pressure
.6:  145.00 158.14 157.32 409.42 143.00 135.50
10:   75.00 113.82  76.02 378.65  61.58 183.39
30:   94.03  65.90 102.31 131.62  53.12 108.41

Consider conducting an ANOVA test to see if there are any differences in the true mean ATFW caused by the different coolant pressures. The validity of an ANOVA test depends on the extent to which the two fundamental ANOVA assumptions (normal populations; equal population variances) are satisfied.
a. Create a single normal probability (quantile) plot based on the deviations of the sample data from the sample mean for each of the three samples. Does the assumption of normality appear to hold?
b. The assumption of equal population variances is plausible if the ratio of the largest sample variance to the smallest sample variance is not much more than 4. Is it plausible that the population variances are approximately equal?

15. It is common practice in many countries to destroy (shred) refrigerators at the end of their useful lives. In this process, material from insulating foam may be released into the atmosphere. The article “Release of Fluorocarbons from Insulation Foam in Home Appliances During Shredding” (J. of the Air and Waste Mgmt. Assoc., 2007: 1452–1460) gave the following data on foam density (g/L) for each of two refrigerators produced by four different manufacturers:

Manufacturer:   1    1    2    2    3    3    4    4
Foam Density: 30.4 29.2 27.7 27.1 27.1 24.8 25.5 28.8

Does it appear that true average foam density is not the same for all these manufacturers? State and test the relevant hypotheses using a significance level of α = .05. Summarize your analysis in an ANOVA table.

16. According to “Evaluating Fracture Behavior of Brittle Polymeric Materials Using an IASCB Specimen” (J. of Engr. Manuf., 2013: 133–140), researchers have recently proposed an improved test for the investigation of fracture toughness of brittle polymeric


materials. The authors applied this new fracture test to the brittle polymer polymethylmethacrylate (PMMA), more popularly known as Plexiglas, which is widely used in commercial products.
The test was performed by applying asymmetric three-point bending loads on PMMA specimens and varied the location of one of the three loading points to determine its effect on fracture load. In one experiment, three loading point locations based on different distances (mm) from the center of the specimen’s base were selected, resulting in the following fracture load data (kN):

Distance    Fracture Load
42          2.62 2.99 3.39 2.86
36          3.47 3.85 3.77 3.63
31.2        4.78 4.41 4.91 5.06

Here is the corresponding Minitab ANOVA table:

One-way ANOVA: Fracture versus Distance
Source   DF   SS       MS       F       P
Dist.     2   6.7653   3.3826   48.58   0.000
Error     9   0.6267   0.0696
Total    11   7.3920

a. Use your calculator to confirm Minitab’s computations.
b. At a significance level of .01, can you conclude there is a difference among true average fracture loads for the three loading point locations?
c. Returning to the Minitab output, note that the number reported under P corresponds to the P-value. Is the P-value exactly zero? What does it mean when Minitab reports 0.000?

17. In an experiment to study the possible effects of four different concentrations of a chemical on heights of newly grown plants, suppose that an ANOVA test is conducted and that plant height is measured in inches. At a later date, the experimenter decides that plant heights should have been measured in centimeters instead of inches. After multiplying the data in the original samples by 2.54 (1 in. = 2.54 cm), the experimenter wants to know what effect this data conversion will have on the conclusions drawn from the ANOVA test.
a. Use the formulas for SSTr, SSE, SST, MSE, and MSTr to discuss the effect that changing from inches to centimeters has on the ANOVA calculations.
b. Based on your conclusions in part (a), what general statement can you make about the effect of changing units of measure in an ANOVA test?

18. The accompanying summary data on skeletal-muscle citrate synthase activity (nmol/min/mg) appeared in the article “Impact of Lifelong Sedentary Behavior on Mitochondrial Function of Mice Skeletal Muscle” (J. of Gerontology, 2009: 927–939):

               Young   Old Sedentary   Old Active
Sample size      10          8             10
Sample mean     46.68      47.71          58.24

Suppose that the total sum of squares for the experimental data is SST = 2116.81.
a. Construct an ANOVA table for this experiment.
b. Using α = .05, can you conclude that true average activity differs for the three groups?

19. A study was conducted to determine whether certain physical properties of asphalt are related to portions of a gel permeation chromatogram of the asphalt (“Methodology for Defining LMS Portion in Asphalt Chromatogram,” J. of Materials in Civil Engr., 1997: 31–39). To determine whether certain bands or slices of the chromatogram can be used to distinguish different aging conditions in asphalt, samples of grade AC-10 asphalt were sampled from several sources and artificially aged, some samples for 5 hours and others for 24 hours. Another group of samples was not aged. The following table shows the percentage area of the same slice of the chromatograms of these samples (i.e., area of the slice as a percentage of the entire chromatogram):

                      Age of asphalt
                     0 hours   5 hours   24 hours
Mean                  3.43      3.18       3.22
Standard deviation     .22       .13        .11
Sample size            6         6          6

Can you conclude (using α = .05) that there is a difference between the means for the three age categories?


20. To assess the reliability of timber structures and related building design codes, many researchers have studied strength factors of structural lumber. In one such study (“Size Effects in Visually Graded Softwood Structural Lumber,” J. of Materials in Civil Engr., 1995: 19–29), three species of Canadian softwood were analyzed for bending strength. Because the amount of bending depends on the width and length of a board and the particular stress applied, the board dimensions were kept the same in each of the three wood species. Wood samples were selected from randomly selected sawmills, and, according to ASTM Standard D 4761, each sample was conditioned by kiln and air drying to achieve approximately a 15% moisture content. The results of the experiment are given here.

Bending strength (in MPa)
Douglas-Fir:      65 46 52 39 41 44
Hem-Fir:          45 48 32 30 47 50
Spruce Pine-Fir:  42 38 30 28 39 40

Using a significance level of 5%, conduct an ANOVA test to determine whether there is a difference in the mean bending strengths among the three types of wood.

21. Pegged mortise and tenon joints have been used to build wooden structures for centuries. Since the mid-1960s, there has been renewed interest in this method of timber connection because of its inherent strength compared with other methods of connection. In a recent study of the bearing strength of white oak dowels, a random sample of white oak boards was used to create several pegs, from which a random sample of pegs was drawn (“Characterization of Bearing Strength Factors in Pegged Timber Connections,” J. of Structural Engr., 1997: 326–332). To determine whether the bearing strength of a peg is affected by the direction with which forces are applied to the peg, three different peg orientations were used in the study: 0°, 45°, and 90°. The pegs were randomly assigned to one of the three orientations, and a stress measurement (in MPa) was recorded for each peg:

                 Peg orientation
Sample number    0°     45°    90°
1               17.7   22.0   19.3
2               17.4   18.7   20.8
3               17.1   20.5   27.5
4               17.3   19.5   19.6
5               16.8   17.4   19.3
6               22.4   22.0   22.3
7               22.3   19.4   22.9
8               20.4   18.3   19.6

a. Conduct an ANOVA test to determine whether the mean bearing strength of the pegs is affected by the orientation of the pegs in the joint connection (use α = .05).
b. Would you say the test results in part (a) are favorable or unfavorable for the practice of using wooden pegs in timber connections?

22. Friction in machining processes generates high cutting temperatures that ultimately lead to wear and thermal damage of cutting tools. Fluid is traditionally used to reduce cutting temperature, but this can lead to environmental pollution, health hazards, and higher production costs. An alternative and novel process known as dry cutting uses no cooling liquids and has shown great promise for the machining industry to produce components in an economical and ecologically desirable manner.
Within the dry cutting device an interchangeable cooling structure is placed near the cutting tip. The authors of “Design and Analysis of an Internally Cooled Smart Cutting Tool for Dry Cutting” (J. of Engr. Manuf., 2012: 585–591) investigated how various physical characteristics of the cooling compartment affect cutting temperature. Data from one experiment that compared thickness of the cooling structure (mm) to the corresponding cutting temperature (K) is given here:

Thickness    Temperature
0.5          425.60 426.95 424.30
1.0          415.38 415.04 418.71
1.5          416.91 418.84 418.63


Using a significance level of 5%, can it be concluded that there is a difference among the true mean temperature measurements for the three structure thicknesses?

23. In Exercise 3, a measuring instrument that was out of calibration was used to measure strengths (in kg) of three different alloys. Use the formulas in Section 9.2 to give a more specific answer to the question posed in Exercise 3. That is,
a. Using the formulas for SSTr, SSE, SST, MSE, MSTr, and F, describe the exact effect the calibration problem in Exercise 3 will have on the entries in the ANOVA table.
b. Based on your conclusions in part (a), what general statement can you make about the effect of calibration problems in measuring the response variable of a single-factor ANOVA test?

24. Check the validity of the two fundamental ANOVA assumptions for the data in Exercise 21 by following the steps stated in Exercise 14.

9.3 Interpreting ANOVA Results


Effects Plots
A useful way to summarize the results of an ANOVA test is to create a graph showing,
on average, how a response variable changes as the levels of the independent variable
change. Such graphs are called effects plots because they depict the effect of changing
the levels of the independent variable. For a factor with k levels, this amounts to simply plotting the sample averages x̄1, x̄2, x̄3, . . . , x̄k versus the integers 1, 2, 3, . . . , k. To make
the graph easier to read, the k averages are also joined by straight-line segments. By fol-
lowing these line segments from point to point, we get a clearer picture of the relation-
ship between the response and independent variables in an experiment.
Statistical software programs often include effects plots to accompany ANOVA cal-
culations. When you look at such printouts, remember that effects plots depict only the
between-samples variation in the experiment. They do not show the within-samples
variation and, consequently, cannot be used as substitutes for ANOVA tests. Techni-
cally, effects plots are only used after an ANOVA test shows that the independent vari-
able is statistically significant. Even then, effects plots give a general picture and do
not conclusively indicate which factor levels are truly distinct from others. Making that
determination requires the use of the multiple comparisons procedures presented later
in this section.

Example 9.2 In Example 9.1, we compared five different brands of motor bearings to find out
which brands, if any, are better for reducing motor vibration. Because the ANOVA
test in that example shows that the factor “brand” is statistically significant, it is per-
missible to create the effects plot for the five sample means given in Table 9.1. This
plot (Figure 9.6) clearly shows that the sample from brand 5 gives the lowest average
vibration. However, this fact still does not allow us to conclusively say that brand 5 is
the best. It might prove to be the case, for instance, that brands 1 and 3 are about the
same as brand 5 in their effectiveness for reducing vibration, even though the effects
plot shows that their sample means are slightly higher than the mean for brand 5.
Example 9.3 further clarifies the results of this study.
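Figure 9.6 can be re-created with a few lines of plotting code; a minimal sketch, assuming Python with Matplotlib, using the sample means from Table 9.1:

import matplotlib.pyplot as plt

means = [13.68, 15.95, 13.67, 14.73, 13.08]    # sample means from Table 9.1
levels = [1, 2, 3, 4, 5]
plt.plot(levels, means, marker="o")    # averages joined by line segments
plt.xticks(levels)
plt.xlabel("Brand")
plt.ylabel("Vibration (microns)")
plt.show()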


Figure 9.6 Effects plot of the data in Table 9.1 (mean vibration, in microns, plotted against brands 1 through 5)

Multiple Comparisons: Tukey’s Method


If the F statistic in a single-factor ANOVA test is not significant, then we have no statis-
tical evidence for concluding that the mean response differs at any of the k treatment
levels. Depending on the particular problem at hand, a nonsignificant ANOVA test
result can be as important as a significant result. Suppose, for instance, that the test in
Example 9.1 had turned out to be nonsignificant, so that no statistically significant dif-
ferences were detected between the five brands of bearings. As long as we are confident
that a sufficient amount of data was used to ensure the reliability of the experimental
results, a nonsignificant test result would allow us to freely choose any of the five brands
to use in producing electric motors. This would be very useful information because the
decision of which brand to use could then be based on other considerations, such as a
brand’s reliability or unit cost.
If the F statistic in an ANOVA test is significant, however, we must do further test-
ing before drawing conclusions. The most common method for doing this involves the
use of a multiple comparisons procedure. There are several such procedures in the
statistics literature. The one we present, called Tukey’s procedure, was developed by
Princeton statistician John Tukey, who is better known to scientists and engineers for
inventing the fast Fourier transform (FFT) method and for introducing the term bit as
a shortened version of binary digit.
Tukey’s procedure allows us to conduct separate tests to decide whether μi = μj for each pair of means in an ANOVA study of k population means. The method is based on the selection of a “family” significance level, α, that applies to the entire collection of pairwise hypothesis tests. For example, when using the Tukey procedure with a significance level of, say, 5%, we are assured that there is at most a 5% chance of obtaining a false positive among the entire set of pairwise tests. That is, there is at most a 5% chance of mistakenly concluding that two population means differ when, in fact, they are equal. This is very different from simply conducting all the pairwise tests as individual tests, each at α = .05, which can result in a high probability of finding false positives among the pairwise tests.


Consider first the case of equal sample sizes. Tukey’s procedure is based on comparing the distance between any two sample means, |x̄i − x̄j|, to a threshold value T that depends on α as well as on the MSE from the ANOVA test. The formula for T is

T = q√(MSE/ni)

where ni is the size of the sample drawn from each population. The value of q is found from a table of right-tail values of a statistic, q, that follows the Studentized range distribution. A table of the values of q is given in Appendix Table IX. The Studentized range distribution is a probability distribution that depends on a pair of degrees of freedom (k, m), where

k = number of population means to be compared
m = error degrees of freedom
  = n − k, for single-factor ANOVA
n = n1 + n2 + n3 + … + nk
  = total number of observations used in the ANOVA study

To determine whether two means μi and μj differ, we simply compare |x̄i − x̄j| to T. If |x̄i − x̄j| exceeds T, then we conclude that μi ≠ μj. Otherwise, we cannot conclude that there is a difference between the two means.

Tukey’s Procedure for Equal Sample Sizes
1. Select a family significance level α at which to conduct the hypothesis tests.
2. Compute T = q√(MSE/ni).
3. Conclude that μi ≠ μj if |x̄i − x̄j| > T.
4. Use bars to connect each pair of means x̄i and x̄j for which |x̄i − x̄j| does not exceed T in Step 3. The corresponding means μi and μj of such pairs are not considered to differ statistically from one another.

One easy method for keeping track of the results of all these pairwise tests is to arrange the sample means x̄1, x̄2, x̄3, . . . , x̄k in increasing order, plot the ordered means along a horizontal line, and then draw horizontal bars connecting pairs of means that are no farther than T units apart. These connecting bars are usually drawn in several rows beneath the corresponding means to keep the diagram uncluttered. The bars show which population means do not significantly differ from one another. Likewise, means that are not connected by a bar do differ significantly from one another. Figure 9.7 illustrates how this graphical procedure would be used to summarize the multiple comparisons of an ANOVA test using k = 4 populations.


Figure 9.7 Using bars to connect means that do not significantly differ, for x̄1 = 5, x̄2 = 2, x̄3 = 1, x̄4 = 3, and critical value T = 2.5 (ordered means: x̄3, x̄2, x̄4, x̄1)

Example 9.3 Because the ANOVA test in Example 9.1 is significant, it is necessary to conduct a multiple comparisons procedure to delineate exactly which of the five bearing brands are better than the others. Using Tukey’s procedure with a family significance level of α = .05, we calculate the critical distance between sample means to be

T = q√(MSE/ni) = (4.15)√(.913/6) = 1.62

where q is based on (k, n − k) = (5, 25) degrees of freedom and is approximated by interpolating between the values of q.05(5, 24) and q.05(5, 30) found in Appendix Table IX. The pairwise distances between the five sample means in Table 9.1 are then compared to T:

Samples    Distance                           Conclusion
1, 2       |13.68 − 15.95| = 2.27 > 1.62      μ1 differs from μ2
1, 3       |13.68 − 13.67| =  .01 < 1.62
1, 4       |13.68 − 14.73| = 1.05 < 1.62
1, 5       |13.68 − 13.08| =  .60 < 1.62
2, 3       |15.95 − 13.67| = 2.28 > 1.62      μ2 differs from μ3
2, 4       |15.95 − 14.73| = 1.22 < 1.62
2, 5       |15.95 − 13.08| = 2.87 > 1.62      μ2 differs from μ5
3, 4       |13.67 − 14.73| = 1.06 < 1.62
3, 5       |13.67 − 13.08| =  .59 < 1.62
4, 5       |14.73 − 13.08| = 1.65 > 1.62      μ4 differs from μ5

The information from these ten tests is summarized in Figure 9.8 by arranging
the five sample means in ascending order and then drawing rows of bars connecting
the pairs whose distances do not exceed T = 1.62. Starting at the left, the top row connects the means that do not significantly differ from x̄5; the next row shows the means that do not differ from x̄4; etc. Using this diagram along with the effects plot
(Figure 9.6), we can now summarize what is happening in the ANOVA test. Although
brand 5 has the lowest mean, it does not significantly differ from brands 1 and 3 in its
effect on vibration. We can conclude, however, that brand 5 is definitely better than
brands 2 and 4. Thus the choice of which bearing brand is best has been narrowed
to brands 1, 3, and 5. If we are satisfied that the average vibration levels produced
by these three brands are acceptable for use in the motors, then the choice could be


further narrowed by considering additional factors, such as unit cost and reliability.
Figure 9.9 shows the SAS output from the application of Tukey’s procedure.

Figure 9.8 Summarizing the ten comparisons from Tukey’s procedure for the data of Example 9.1 (means in ascending order: brand 5 = 13.08, brand 3 = 13.67, brand 1 = 13.68, brand 4 = 14.73, brand 2 = 15.95, with bars connecting means that do not differ)

Alpha = 0.05   df = 25   MSE = 0.913533
Critical Value of Studentized Range = 4.15336
Minimum Significant Difference = 1.6206

Means with the same letter are not significantly different.

Tukey Grouping      Mean    N    Brand
    A             15.9500   6      2
  B A             14.7333   6      4
  B C             13.6833   6      1
  B C             13.6667   6      3
    C             13.0833   6      5

Figure 9.9 Tukey’s Method in SAS
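Comparable output can be produced outside of SAS. A minimal sketch, assuming Python with SciPy (for the Studentized range quantile) and statsmodels (for the pairwise tests), applied to the Table 9.1 data:

import numpy as np
from scipy.stats import studentized_range
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Critical q and threshold T for Example 9.3 (k = 5 means, 25 error df)
q = studentized_range.ppf(0.95, 5, 25)    # approximately 4.153
print(q * np.sqrt(0.913 / 6))             # T, approximately 1.62

vibration = [13.1, 15.0, 14.0, 14.4, 14.0, 11.6,    # brand 1
             16.3, 15.7, 17.2, 14.9, 14.4, 17.2,    # brand 2
             13.7, 13.9, 12.4, 13.8, 14.9, 13.3,    # brand 3
             15.7, 13.7, 14.4, 16.0, 13.9, 14.7,    # brand 4
             13.5, 13.4, 13.2, 12.7, 13.4, 12.3]    # brand 5
brands = [b for b in (1, 2, 3, 4, 5) for _ in range(6)]
print(pairwise_tukeyhsd(vibration, brands, alpha=0.05))

The statsmodels routine also accepts unequal sample sizes, using a pairwise threshold in the spirit of the Tij modification described below.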

Experimental designs that use the same sample size for each treatment level are
called balanced designs, whereas those with different sample sizes for some treatment
levels are called unbalanced designs. For an unbalanced design, the Tukey procedure
is often run by choosing the minimum of the numbers n1, n2, n3, . . . , nk to use in the
calculation of the critical value T. This leads to a slightly larger value of T than neces-
sary for multiple comparisons; consequently, this practice is considered a conservative
procedure. That is, differences between sample means that exceed T would surely re-
main significant if larger values of ni were to be used in the calculation of T. Other mod-
ifications of Tukey’s procedure include using Tij = q√((MSE/2)(1/ni + 1/nj)) in place of q√(MSE/ni) when comparing two sample means based on unequal sample sizes.
One question that sometimes arises when first encountering multiple comparisons pro-
cedures is: Why not simply conduct such procedures at the outset and bypass the step of
conducting the ANOVA test? One answer is that most multiple comparison procedures tend
to be not quite so powerful as the ANOVA test for detecting differences between means. The
main reason for this is that, faced with a large number of pairwise hypothesis tests, multiple
comparisons procedures attempt to avoid the problem of making too many type I errors


(i.e., falsely detecting differences between means that are, in fact, equal) by using family
significance levels. These family significance levels essentially put more demands on the
individual pairwise tests than we might normally do if we were comparing only one pair of
means, not several. By controlling the overall, or family, error rate of all the tests, each of
the individual pairs tested must pass a higher standard (i.e., the significance levels for each
individual test are much smaller than the family error rate). The end result is that a multiple
comparisons procedure can sometimes miss significant findings that the ANOVA test would
not fail to detect. For these and other reasons, it is usually recommended that multiple com-
parisons procedures be run after determining that the appropriate ANOVA test is significant.
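To see concretely why a family significance level puts more demands on each individual test, note that if m independent tests are each run at level α, the chance of at least one type I error in the family is 1 − (1 − α)^m. The short calculation below is purely illustrative:

```python
# Familywise type I error probability for m independent tests,
# each conducted at significance level alpha = .05
alpha = 0.05
for m in (1, 5, 10, 20):
    family_error = 1 - (1 - alpha) ** m
    print(f"{m:2d} tests at alpha = .05 -> "
          f"P(at least one type I error) = {family_error:.3f}")
```

With ten pairwise comparisons, for example, the familywise error rate would already be about .40 if each test were naively run at α = .05, which is why each individual comparison must be held to a much stricter standard.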

Multiple Comparisons to a Control: Dunnett’s Method


Many scientific studies involve comparisons of several treatment populations to a fixed
control population. For example, in tests for levels of contaminants in water, water
samples taken downstream from an industrial discharge source are usually compared to
a control sample of water taken upstream from the source. Many biological studies com-
pare the potential effects of drugs or other treatments on treated samples to a control
sample that is not treated. In such studies, we are mainly interested in the comparisons
between the k 2 1 treatment means and the mean of a single control sample, but we are
not necessarily interested in making all possible pairwise comparisons between samples.
Multiple comparisons procedures, such as Tukey’s method, which take into account all
possible pairwise comparisons, are usually too conservative for applications involving
control groups. Consequently, alternative procedures, such as Dunnett’s method, are
used when only comparisons to a control are desired.
The steps in Dunnett's method are similar to those in Tukey's method except that
the critical value T is computed as

T = t(k − 1, n − k)√[MSE(1/ni + 1/nc)]

where nc denotes the sample size used in the control group and ni is the sample size of
the treatment group being compared to the control. The critical value t(k − 1, n − k),
called Dunnett's t, is based on (k − 1, n − k) degrees of freedom, where n is the total
of the sample sizes used in the experiment. Values of t(k − 1, n − k) can be found in
Appendix Table X. Instead of making all k(k − 1)/2 possible pairwise comparisons,
Dunnett's method involves only k − 1 comparisons of the k − 1 treatment means to the
single control group mean.

Example 9.4 To illustrate Dunnett’s method, we reconsider the data of Example 9.1. Suppose that
bearings of brand 2 are currently used to manufacture the electric motors and that
we want to compare each of four competing brands to brand 2. To conduct such a
test of k = 5 means, we would use Dunnett's method and compare the k − 1 = 4
treatment samples (brands 1, 3, 4, and 5) to the control sample (brand 2). Because
the sample sizes are equal, the same T value would be used for all four comparisons.
Using a family significance level of α = .05, we find

T = t(k − 1, n − k)√[MSE(1/ni + 1/nc)] = (2.61)√[(.913)(1/6 + 1/6)] = 1.440


where the value of t.05(4, 25) is found in Appendix Table X to be approximately 2.61.
The four comparisons to the control sample yield the following results:

Samples    Distance vs. T                      Conclusion
2, 1       |15.95 − 13.68| = 2.27 > 1.440      1 differs from 2
2, 3       |15.95 − 13.67| = 2.28 > 1.440      3 differs from 2
2, 4       |15.95 − 14.73| = 1.22 < 1.440
2, 5       |15.95 − 13.08| = 2.87 > 1.440      5 differs from 2
There is no need to create a bar diagram as in Tukey’s method because all four com-
parisons are being made to a single population, brand 2. The results of the test show
that brands 1, 3, and 5 each differ significantly from brand 2, so we are free to choose
among these three brands when considering a replacement for brand 2.
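Since Example 9.4 supplies the sample means and the Dunnett critical value T = 1.440, the four comparisons to the control are easy to verify in code. A minimal sketch (recent SciPy releases also offer scipy.stats.dunnett for raw data):

```python
# Brand 2 (the bearing currently in use) is the control group
control_mean = 15.95
treatment_means = {"brand 1": 13.68, "brand 3": 13.67,
                   "brand 4": 14.73, "brand 5": 13.08}

T = 1.440  # Dunnett critical value computed in Example 9.4

for brand, mean in treatment_means.items():
    diff = abs(control_mean - mean)
    verdict = "differs" if diff > T else "does not differ"
    print(f"{brand}: |15.95 - {mean:.2f}| = {diff:.2f} "
          f"-> {verdict} from brand 2")
```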

Fixed and Random Effects


Factor or treatment levels used in an experiment arise in essentially two ways, each of
which forces a different interpretation on the results of an ANOVA test. Sometimes, the
factor levels chosen may be the only ones of interest to us. This would be the case, for in-
stance, if the five brands of motor bearings studied in Examples 9.1–9.4 are the only brands
of such motor bearings currently available in the market. In this situation, our conclusions
pertain only to these five brands and to comparisons between them. A factor whose levels are
the only ones of interest in an experiment is called a fixed factor, and ANOVA models based
on such factors are said to be fixed effects models. Alternatively, the levels of a factor may
be only a sample from a larger population of possible levels. When this is the case, we call
the factor a random factor, and ANOVA models based on such factors are called random
effects models. For example, if the five brands studied in Example 9.1 are only a sample from
a large population of possible brands, then “brand” would be considered a random factor.
It is important to understand the difference between fixed and random factors for
two reasons: (1) The computations of F ratios for testing whether a factor is significant
usually depend on whether the factor is fixed or random, and (2) the interpretation of
the ANOVA results differs for the two types of factor. The first fact is especially impor-
tant when working with multifactor models (see Chapter 10). With more than one fac-
tor in a model, some factors may be random whereas others are fixed. In such cases, the
F ratios for random factors are often calculated differently from F ratios for fixed factors.
Fortunately, though, for single-factor ANOVA models, it turns out that the statistical test
procedure is identical for either the random or the fixed effects model.
For example, the single-factor ANOVA test of Example 9.1 would be conducted in
exactly the same manner, regardless of whether the factor “brand” was considered to
be fixed or random. The interpretations, though, would differ in the following ways. If
“brand” is a fixed factor, then we would report the ANOVA results by pointing out the
significant differences between the population means μ1, μ2, μ3, . . . , μk. Further-
more, the conclusions of the study would not be extended beyond these populations.
If “brand” is a random factor, however, then the purpose of the study is to extrapolate
the ANOVA findings to the larger population from which the factor levels are cho-
sen. In particular, we are interested in estimating how much of the variability in the


sample results is due to the variability between the various brands in the population
(from which the five brands in the study were selected) and how much is due to the
experimental error. These two components of variance sum to the total variation σ²:

σ² = σA² + σε²

where σA² denotes the variability in the population from which the treatment levels
are chosen and σε² is the experimental, or within-samples, error. In the random effects
model, the hypotheses we test are H0: σA² = 0 versus Ha: σA² > 0. For the case of equal
sample sizes, estimates of σε² and σA² are given by the formulas

σ̂ε² = MSE

σ̂A² = (MSTr − MSE)/ni

Example 9.5 The study of nondestructive forces and stresses in materials furnishes important
information for efficient engineering design. The paper “Zero-Force Travel-Time
Parameters for Ultrasonic Head-Waves in Railroad Rail” (Materials Evaluation,
1985: 854–858) reports on a study of travel time for a certain type of wave that results
from longitudinal stress of rails used for railroad track. Three measurements were
made on each of six rails randomly selected from a population of rails. The investiga-
tors used random effects ANOVA to decide whether some of the variation in travel
time could be attributed to “between-rail variability.” The data for this experiment
and the corresponding ANOVA table appear in Table 9.2.
The error variance is estimated by σ̂ε² = MSE = 16.17, and the estimated var-
iation in the population of rails is σ̂A² = (MSTr − MSE)/ni = (1862.1 − 16.17)/3 =
615.31. Furthermore, since the F ratio of 115.2 is highly significant (i.e., it has a low
P-value), we can conclude that the differences between rails are an important source
of travel-time variability.

Table 9.2 Wave travel times (in nanoseconds)


        Observations   Sample mean
Rail 1 55 53 54 54.00
Rail 2 26 37 32 31.67
Rail 3 78 91 85 84.67
Rail 4 92 100 96 96.00
Rail 5 49 51 50 50.00
Rail 6 80 85 83 82.67
ANOVA Table
Source df SS MS F
Treatments 5 9310.5 1862.1 115.2
Error 12 194.0 16.17
Total 17 9504.5
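Because Table 9.2 gives the complete data set, the ANOVA table and the two variance-component estimates can be verified numerically. Here is a minimal sketch in Python, assuming NumPy is available:

```python
import numpy as np

# Wave travel times (ns) for the six randomly selected rails (Table 9.2)
rails = np.array([[55, 53, 54],
                  [26, 37, 32],
                  [78, 91, 85],
                  [92, 100, 96],
                  [49, 51, 50],
                  [80, 85, 83]], dtype=float)

k, n_i = rails.shape                 # k = 6 rails, n_i = 3 measurements each
grand_mean = rails.mean()

# Between-rail (treatment) and within-rail (error) sums of squares
ss_tr = n_i * ((rails.mean(axis=1) - grand_mean) ** 2).sum()      # 9310.5
ss_e = ((rails - rails.mean(axis=1, keepdims=True)) ** 2).sum()   # 194.0

ms_tr = ss_tr / (k - 1)              # 1862.1
ms_e = ss_e / (k * (n_i - 1))        # 16.17
f_ratio = ms_tr / ms_e               # 115.2

# Variance-component estimates for the random effects model
var_error = ms_e                     # estimated error variance, 16.17
var_rails = (ms_tr - ms_e) / n_i     # estimated between-rail variance, 615.31
print(f_ratio, var_error, var_rails)
```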


Section 9.3 Exercises


25. Explain why creating an effects plot does not take the place of performing an
ANOVA test.

26. Refer to the data from Exercise 18.
a. Create an effects plot of the data.
b. Use Tukey's multiple comparisons procedure to determine which groups differ
from one another with respect to CS activity.

27. An experiment to compare the wall coverage area of five different brands of yellow
interior latex paint used 4 gallons of each brand. The sample means of the coverage
areas (in ft²/gal) for the five brands were: 462.0, 512.8, 437.5, 469.3, 532.1. The MSE
was 272.8 and the computed F statistic for the ANOVA test was found to be significant
at α = .05. Use Tukey's test (at α = .05) to investigate the pairwise differences between
the coverage areas of the five brands of paint.

28. In Exercise 27, suppose that the third sample mean is 427.5 (instead of 437.5).
Use Tukey's procedure to see which population averages can be considered different
from one another (α = .05). Use the method of placing bars under those means that
are not statistically different from one another. Write a short sentence summarizing
your conclusions. Assume the MSE remains the same as in Exercise 27.

29. Repeat Exercise 28 for the case where the sample means are 462.0, 502.8, 427.5,
469.3, 532.1 (i.e., the second and third sample means have been changed from their
original values in Exercise 27).

30. Refer to the data from Exercise 19.
a. Construct an effects plot for this data.
b. Use Tukey's method with α = .05 to determine which age categories differ from
each other.
c. Suppose that asphalt that is not aged is taken to be a control group. Use Dunnett's
method with α = .05 to decide whether one or both of the aged asphalt groups differ
from the control group.

31. Exercise 11 described an experiment in which 26 resistivity observations were
made on each of six different concrete mixtures. The article cited there gave the
following sample means: 14.18, 17.94, 18.00, 18.00, 25.74, 27.67. Apply Tukey's
method using α = .05 to identify significant differences, and describe your findings
(use MSE = 13.929).

32. In Exercise 16, samples of three different loading points were tested to determine
whether there were differences among their average fracture loads.
a. Draw an effects plot for the data.
b. Using α = .05, apply Tukey's method to determine which if any of the loading
points differ from the others.

33. Using a significance level of α = .05, apply Tukey's method to the data of
Exercise 12. Is there a pulse current that seems to be the best choice to yield
maximum average toughness?

9.4 Randomized Block Experiments 


Using one’s knowledge about a problem, whether it comes from technical experience or
common sense, often helps guide the choice of an experimental design. For instance,
consider how additional knowledge might affect a comparison of the fuel efficiency
(measured in miles per gallon) of several different brands of gasoline. Our first inclination
might be to conduct this study as a completely randomized design (Section 9.2) involv-
ing the hypothesis H0: μ1 = μ2 = μ3 = … = μk, where μi = average mpg obtained using
brand i. However, there is a potential problem: Experience tells us that compact cars get
better fuel efficiency (higher mpg) than mid-size cars, which, in turn, are more efficient
than luxury cars. So what would happen if our random sampling happened to produce
a disproportionate number of compact cars in sample 1 (cars that use brand 1)? Clearly,
the average mpg of the cars in such a sample would probably be higher than the average


mpg for the other samples, even if brand 1 were the worst of the four in terms of average
fuel efficiency. In fact, it is easy to imagine many scenarios in which the sample means
might reflect more about the particular sizes of the automobiles chosen than about the
efficiency of the gasoline brands. To avoid such problems, we should use an experimental
design that ensures that each brand of fuel is tested on the same range of car sizes.
External influences, such as car size, can be thought of as additional factors to be
included in an experimental design. There is usually no need to test such external factors
for statistical significance. Either common sense or technical knowledge tells us that they
are influential, and our reason for considering them is to make sure that they do not in-
validate conclusions about the factor in which we are truly interested (e.g., brand of fuel).
The effect of such external influences can be eliminated, or at least substantially
reduced, by using them as blocks in an experiment. Blocks are groups of items in a
population that have similar characteristics, such as the block of compact cars and the
block of mid-size cars. By making sure that a member of each block is included in each
of the samples, we can eliminate the effect of external factors on the differences between
average responses for the factor we are studying.
For example, to eliminate the influence of car size in the fuel efficiency study, we
could select a range of car sizes, call them B1, B2, B3, . . . , Bb, and then make sure that
each gasoline brand is used on a car from each of these blocks. Denoting the levels of
the factor “gasoline brand” by A1, A2, A3, . . . , Aa, we can summarize the data from such
an experiment in matrix form (Figure 9.10).
The design in Figure 9.10, called a randomized block design, is the natural exten-
sion of the paired-samples test of Section 8.2. In Figure 9.10, blocks of homogeneous
experimental units take the place of the data pairs of Section 8.2. Notice, for instance,
that the observations in any two rows of this matrix are paired because each level of the
blocking factor is represented in each row. Just as in the paired-samples test, the effect
of the different blocks is subtracted out when calculating the difference between any
two row means (i.e., the differences in the average responses for the levels of factor A).

Figure 9.10 Data layout for a randomized block design: the treatments A1, A2, . . . , Aa
form the rows, the blocks B1, B2, . . . , Bb form the columns, and each cell contains one
response value.

Sums of Squares
In a randomized block design, the total variation SST in the response variable decom-
poses into three terms, one representing the variation due to the differences in treatment

Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has
deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
9.4 Randomized Block Experiments 437

levels (SSTr), one representing the variation between the block means (SSB), and the error
term (SSE, which accounts for all other variation):
SST = SSTr + SSB + SSE
SST, SSTr, and SSB are computed from the formulas shown in the following box:

Sums of Squares Formulas for a Randomized Block Experiment

xij = observation on factor level i in block j

SST = Σi Σj (xij − x̄)²   (summing over i = 1, . . . , a and j = 1, . . . , b)

SSTr = b Σi (Āi − x̄)², where Āi denotes the mean of the data in the ith row

SSB = a Σj (B̄j − x̄)², where B̄j denotes the mean of the data in the jth column

The remaining term, SSE, is computed by rewriting the ANOVA decomposition as

SSE = SST − SSTr − SSB

Hypothesis Tests
Under the usual ANOVA assumptions of normal populations and equal variances, the
total degrees of freedom associated with SST is n − 1, where n = a·b. The degrees
of freedom for treatments and the blocking factor B are a − 1 and b − 1, respectively.
The remaining degrees of freedom, (a − 1)(b − 1), are associated with the error term:

ANOVA decomposition:   SST   =  SSTr   +  SSB   +  SSE
Degrees of freedom:    ab − 1 = (a − 1) + (b − 1) + (a − 1)(b − 1)

The mean squares are given by

MSTr (treatments) = SSTr/(a − 1)
MSB (blocks) = SSB/(b − 1)
MSE (error) = SSE/[(a − 1)(b − 1)]
The hypothesis tests for a randomized block design are summarized in the fol-
lowing box:

Randomized Block Test (Significance level α)

Hypothesis                 Test statistic     Decision
H0: there is no            F = MSTr/MSE       Reject H0 if F ≥ Fα
treatment effect                              with [a − 1, (a − 1)(b − 1)] df

H0: there is no            F = MSB/MSE        Reject H0 if F ≥ Fα
block effect                                  with [b − 1, (a − 1)(b − 1)] df

(In each case, the alternative hypothesis is that the particular effect exists.)
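The critical values Fα for these two tests come from Appendix Table VIII, or they can be computed directly from the F distribution. A small sketch using SciPy (assuming it is installed), with the a = 4 treatments and b = 10 blocks of Example 9.6 below:

```python
from scipy.stats import f

a, b, alpha = 4, 10, 0.05
df_err = (a - 1) * (b - 1)                    # error df = 27

f_crit_tr = f.ppf(1 - alpha, a - 1, df_err)   # treatment test: F.05(3, 27), about 2.96
f_crit_blk = f.ppf(1 - alpha, b - 1, df_err)  # block test: F.05(9, 27), about 2.25
print(f_crit_tr, f_crit_blk)
```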

Example 9.6 The application of statistics to crop studies, which began in the 1920s, frequently
makes use of a particular blocking variable, the plot. As farmers have long known, dif-
ferent plots of land have unique combinations of water, sunlight, and soil chemicals,
each having a significant effect on crop growth and yield. Oranges, for example, are
so sensitive to different amounts of sunlight that it is a well-known fact that the sweet-
est oranges come from the south side of the tree.2
In a study of different rootstocks for orange trees, four different varieties of
rootstock are tested by planting each variety on the same ten plots of land.3 The
numbers of oranges produced by these trees are recorded in Figure 9.11. In this
study, the factor A = "variety" has a = 4 levels. The blocking factor B = "plot" has
b = 10 levels.
                              Block (plot)
                 1     2     3     4     5     6    7     8     9     10    Average
Treatment   1    11    12    10    10    10    9    10    10    10    12    10.4
(variety)   2    12    12    10    10    10    9    10    10    10    12    10.5
            3    14    15    12    13    12    12   13    13    14    16    13.4
            4    12    13    10    10    11    9    11    12    11    14    11.3
Average:         12.25 13.0  10.5  10.75 10.75 9.75 11.0  11.25 11.25 13.5

Figure 9.11 Number of oranges per tree (in 100s) for Example 9.6

The grand average of the 40 values is x̄ = 11.4, and the ANOVA calculations for
this data are as follows:

SSTr = b Σi (Āi − x̄)²
     = 10[(10.4 − 11.4)² + (10.5 − 11.4)² + (13.4 − 11.4)² + (11.3 − 11.4)²] = 58.2

SSB = a Σj (B̄j − x̄)²
    = 4[(12.25 − 11.4)² + (13.0 − 11.4)² + … + (13.5 − 11.4)²] = 49.1

2. McPhee, J., Oranges, Farrar, Straus, Giroux, New York, 1967, p. 8.
3. Oranges, like roses, are grown by grafting plants with desirable characteristics onto the root structure
of another plant whose root system is known to be resistant to disease and other problems.


SST = Σi Σj (xij − x̄)²
    = (11 − 11.4)² + (12 − 11.4)² + (10 − 11.4)² + … + (11 − 11.4)² + (14 − 11.4)²
    = 113.6
By subtraction, SSE = SST − SSTr − SSB = 113.6 − 58.2 − 49.1 = 6.3. All of this
information is summarized in the ANOVA table:
Source of variation      df                     SS      MS      F
Treatments (variety)     a − 1 = 3              58.2    19.4    83.15
Blocks (plots)           b − 1 = 9              49.1    5.456   23.38
Error                    (a − 1)(b − 1) = 27    6.3     .2333
Total variation          ab − 1 = 39            113.6
Using a significance level of α = .05, we can conclude that the different va-
rieties do have different mean yields since F = MSTr/MSE = 83.15 has a P-value
smaller than .05. We can also conclude that the different plots have differing effects
on yield since F = MSB/MSE = 23.38 also has a P-value smaller than .05, although
this conclusion only confirms our original belief about the effect of different plots.
When H0 is rejected, Tukey’s method can be applied to identify significant dif-
ferences among treatments.
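All of the calculations in Example 9.6 can be reproduced with a short script. The following NumPy sketch (purely a verification of the arithmetic above) uses the data matrix of Figure 9.11:

```python
import numpy as np

# Oranges per tree (in 100s): rows = varieties (treatments), columns = plots (blocks)
y = np.array([[11, 12, 10, 10, 10, 9, 10, 10, 10, 12],
              [12, 12, 10, 10, 10, 9, 10, 10, 10, 12],
              [14, 15, 12, 13, 12, 12, 13, 13, 14, 16],
              [12, 13, 10, 10, 11, 9, 11, 12, 11, 14]], dtype=float)

a, b = y.shape
grand = y.mean()                                     # 11.4
ss_tr = b * ((y.mean(axis=1) - grand) ** 2).sum()    # 58.2
ss_b = a * ((y.mean(axis=0) - grand) ** 2).sum()     # 49.1
ss_t = ((y - grand) ** 2).sum()                      # 113.6
ss_e = ss_t - ss_tr - ss_b                           # 6.3

ms_tr = ss_tr / (a - 1)
ms_b = ss_b / (b - 1)
ms_e = ss_e / ((a - 1) * (b - 1))
print(ms_tr / ms_e, ms_b / ms_e)   # F ratios, about 83.15 and 23.38
```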

Section 9.4 Exercises


34. A pharmaceutical company wants to begin testing a drug designed to reduce
blood pressure. The company wants to test the drug by measuring the blood pres-
sures of two samples of people, those who take the drug for a prescribed period of
time and those who do not take this drug (or any other medications during the test
period). Because researchers know that several human characteristics (e.g., age,
weight, diet, exercise) may have considerable effects on the experimental results,
they want to run their experiment as a randomized block design. Using the charac-
teristics mentioned, describe how the researchers should go about creating the
blocks for such an experiment.

35. A consumer protection organization wants to compare the annual power consump-
tion of five different brands of dehumidifiers. Because power consumption depends on
the prevailing humidity level, each brand was tested at four different humidity levels,
ranging from moderate to heavy humidity. For each brand, a sample of four humidifiers
was randomly assigned, one each, to the four humidity levels. The resulting annual
power consumption (in kilowatt-hours) is given in the following table:

               Humidity level
Brand       1     2     3      4
  1        685   792   838    875
  2        722   806   893    953
  3        733   802   880    941
  4        811   888   952   1005
  5        828   920   978   1023

a. Using α = .01, can you conclude that there is a difference between the power
consumptions of the five brands?
b. Using α = .01, can you conclude that there are differences in power consumption
between the levels of the blocking factor "humidity"? Does this result support the
experimenters' use of humidity as a blocking factor?


36. A certain county uses three assessors to determine the values of residential
properties. To see whether the three assessors differ in their assessments, five
houses are selected and each assessor is asked to determine the market value of
each house. Let A denote the factor "assessors" and B denote the blocking factor
"houses." An ANOVA calculation revealed that SSA = 11.7, SSB = 113.5, and
SSE = 25.6.
a. Using α = .05, test the hypothesis that there are no differences between the
average values reported by the three assessors.
b. Based on the ANOVA results, was the use of houses as a blocking factor
warranted in this study?

37. The article "A Software-Based Resource Selection Process in Competitive
Network Environment Using ANOVA (A Case Study)" (Intl. J. of Comp. Appl.,
2012: 17–21) reported on a study in which three types of lathes were compared.
Each of three operators used each of the lathes for the equivalent of a full workday
shift. For each shift, the researchers recorded the percentage of acceptable products
manufactured by the operator. The data from the experiment is given here:

             Lathe Brand
              1    2    3
Operator 1   86   86   88
         2   85   86   91
         3   82   83   85

a. Using the three operators as blocks, can you conclude that there is a difference
among the percent of acceptable products due to lathes? (Use α = .05.)
b. Can you conclude that the different operators have differing effects on product
acceptability rate? (Use α = .05.)

38. In the article "The Effects of a Pneumatic Stool and a One-Legged Stool on
Lower Limb Joint Load and Muscular Activity During Sitting and Rising"
(Ergonomics, 1993: 519–535), the following data is given on the effort (measured
on the Borg scale) required by a subject to arise from sitting on four different stools.
Because it was suspected that different people could exhibit large differences in
effort, even from the same type of stool, a sample of nine people was selected and
each was tested on all four stools:

                        Subject
               1   2   3   4   5   6   7   8   9
Type of   A   12  10   7   7   8   9   8   7   9
stool     B   15  14  14  11  11  11  12  11  13
          C   12  13  13  10   8  11  12   8  10
          D   10  12   9   9   7  10  11   7   8

a. Using a significance level of α = .05, can you conclude that there is a difference
in the average effort required to rise from each type of stool?
b. Do the differences in rising effort that the researchers expected seem to be
confirmed by the data?

39. To assess the potential risks associated with failure of a particular process,
investigators often perform a failure modes and effects analysis (FMEA). An
FMEA identifies opportunities for failure, known as failure modes, in a given
process. Each mode is assessed with a numeric score based on (1) severity of the
consequences of failure, (2) likelihood of failure occurrence, and (3) likelihood
that failure would not be detected. The product of these scores is the risk priority
number (RPN) for the mode. Modes having the highest RPN values are usually
given the highest priority in carrying out further analyses.
The article "Continuous Quality Improvement in Investment Castings: An
Experimental Study using a Modified FMEA Approach Called FEAROM" (Eur. J.
of Sci. Res., 2012: 308–325) reported on a study that compared four design methods
(M1, M2, M3, M4) in preproduction trials of the upper range for a particular casting
valve. The design methods are applied by human operators, which introduces
potential operator-to-operator variation in RPN values. To account for this, each of
the four design methods was used (in random order) by all 21 individuals in the
study. The data was analyzed by the R software, giving the following output. Note
that the format of the ANOVA table in R is very similar to the one we use, except R
eliminates the row of "totals" and uses the word residuals instead of error. The
column labeled 'Pr(>F)' represents P-value.

            Df   Sum Sq   Mean Sq   F-value   Pr(>F)
DESIGN       ?   519515         ?         ?        ?
PERSON       ?        ?      5023         ?    0.445
Residuals    ?   293009         ?

a. Fill in the missing values in the table above.
b. Using α = .05, can it be concluded that there is a difference in the true average
RPN among the four design methods?
c. Do the person-to-person differences in RPN seem to be confirmed by the data?
Explain.

40. In the study described in Exercise 20, the wood grade is known to affect wood
strength. To incorporate this information, three wood grades were studied: SS (select
structural), grade 2, and grade 3. Wood grades are determined by visual inspection.
The following table shows bending strengths from testing wood samples of each
type and grade:

                              Wood grade
                          SS   Grade 2   Grade 3
Species Douglas Fir       65      43        41
        Hem-Fir           45      38        32
        Spruce-Pine-Fir   42      35        30

a. Using the three wood grades as blocks, can you conclude that there is a difference
between the mean bending strengths of the three species of wood? (Use α = .05.)
b. Can you conclude that there are differences between the mean bending strengths
for the three grades of wood? (Use α = .05.)
c. Suppose that wood with a large bending strength is needed for a particular
structure and that any wood grade is acceptable. Which type and grade of wood is
best for such a structure?
d. Explain why your conclusions about wood types in this experiment differ from
the conclusions reached in Exercise 20.

41. Example 4.15 (Chapter 4) describes a randomized block experiment for
comparing three different methods (A, B, and C) of curing concrete. Different
batches of concrete are used as the blocks in the experiment. For convenience, the
data from Table 4.1 is repeated here:

             Strength (in MPa)
Batch   Method A   Method B   Method C
  1       30.7       33.7       30.5
  2       29.1       30.6       32.6
  3       30.0       32.2       30.5
  4       31.9       34.6       33.5
  5       30.5       33.0       32.4
  6       26.9       29.3       27.8
  7       28.2       28.4       30.7
  8       32.4       32.4       33.6
  9       26.6       29.5       29.2
 10       28.6       29.4       33.2

a. Using a significance level of 5%, can you conclude that there is a difference in
mean concrete strength between the three curing methods?
b. Can you conclude that there are differences between the batch means?
(Use α = .05.)
c. Suppose that you ignore the fact that the batches are blocks in this experiment
and that you simply run a one-factor ANOVA test, treating the three columns of
data as three random samples. Using a significance level of .05, what conclusion
do you reach regarding the differences between the three curing methods?

Supplementary Exercises
42. The authors of "Statistical Analysis and Optimization Study on the Machinability
of Beryllium-Copper Alloy in Electro Discharge Machining" (J. of Engr. Manuf.,
2012: 1847–1861) investigated the machinability of beryllium-copper alloy in an
electro discharge machining (EDM) process. The accompanying data resulted from
an EDM process using an oil dielectric medium where researchers applied four
different EDM pulse times (μs) and recorded the corresponding material removal
rate (MRR, in mm³/s).

                          MRR
Pulse    20    0.1797   0.3353   0.4073   0.7548
Time     40    0.2433   0.3830   0.5625   0.7258
(μs)     60    0.2338   0.3372   0.5552   0.7453
         80    0.1341   0.2806   0.5502   0.8212

Use α = .05 to conduct the test for whether there are any differences in the true
average MRR that may be attributable to the different pulse times.

43. The lumen output was determined for three different brands of 60-watt soft-white
light bulbs, with eight bulbs of each brand tested. From the resulting lumen
measurements, the following sums of squares were computed: SSE = 4773.3 and
SSTr = 591.2.
a. State the hypotheses of interest. Describe, in words, the parameters that appear in
the hypotheses.
b. Compute each of the entries in the ANOVA table for this experiment.
c. Using α = .05, can you conclude that there are any differences between the
average lumen outputs for the three brands?

44. In the study described in Exercise 12, the authors also investigated how pulse
current affects the hardness of the SDSS welds. Hardness is measured in HV (known
as the Vickers number; higher values indicate harder metals).

Pulse Current:  100  100  100  120  120  120  140  140  140
Hardness:       326  296  312  245  273  276  299  296  282

Use α = .05 to conduct the test for whether there are any differences in the true
average weld hardness attributable to the different pulse currents.

45. In the special case where df1 = 1, the right-tail areas associated with an F
distribution are related to similar areas under a t distribution's density curve. In
particular, it can be shown that Fα = (tα/2)², for an F distribution with df1 = 1 and any
value of df2 and for a t distribution with df = df2. The subscripts α and α/2 on Fα and
tα/2 denote right-tail areas of α and α/2 under the density curves for the F and t
distributions, respectively.
a. Verify this relationship by looking up F.05(df1 = 1, df2 = 10) and t.025(df = 10) in
the F and t tables, Appendix Tables VIII and IV, respectively.
b. For α = .05, the values of tα/2 approach zα/2 = 1.96 as the degrees of freedom
increase. What limit does F.05(df1 = 1, df2) approach as df2 increases?

46. Consider the following data on plant growth after the application of five different
types of growth hormone:

        Data
A   13   17    7   14
B   21   13   20   17
C   18   15   20   17
D    7   11   18   10
E    6   11   15    8

a. Perform the F test for this single-factor ANOVA at α = .05.
b. Apply Tukey's procedure to this data with α = .05. Compare your results to the
conclusion obtained in part (a).

47. Consider a single-factor ANOVA in which samples of size 5 each are measured at
each of three levels of a certain factor. The means of the three samples are 10, 12, and
20. Find a value of SSE that satisfies the following two requirements:
(1) The calculated F statistic is larger than the tabled value of F for α = .05, df1 = 2,
and df2 = 12, so the hypothesis H0: μ1 = μ2 = μ3 is rejected at α = .05.
(2) When Tukey's procedure is applied, none of the three μi's can be said to differ
from one another (again using α = .05).

48. For the data referenced in Exercise 39, the article reported that there was a
difference in RPN means for the four design methods (M1, M2, M3, M4). Perform a
post hoc analysis by applying Tukey's procedure (as the authors did) using the
following output from the SAS software:

Alpha = 0.05 df = 60 MSE = 4883.488
Critical Value of Studentized Range = 3.73709
Minimum Significant Difference = 56.989

Means with the same letter are not significantly different.

Tukey Grouping      Mean     N    trt
     A            336.00    21    M2
     A
     A            301.00    21    M4
     B            171.43    21    M3
     B
     B            155.71    21    M1

49. In Exercise 47, suppose that the three sample means are 10, 15, and 20. Can you
now find a value of SSE that satisfies the two conditions in Exercise 47?

50. Helmet-mounted displays (HMDs) are computer displays that are presented on
see-through screens attached to the helmets of helicopter pilots. HMDs are normally
employed to aid night flights. In a study of HMDs, researchers tested Apache
helicopter pilots to determine whether the presence of in-flight vision problems has
an effect on a pilot's ability to focus the HMD panel. Thirteen pilots were divided
into two groups: those who experience certain in-flight vision problems and those
who do not. Subjects were asked to set the focus of the HMD for a fixed test pattern,
and their focus settings were then measured with a dioptometer ("Oculomotor
Responses with Aviator Helmet-Mounted Displays and Their Relation to In-Flight
Symptoms," Human Factors, 1995: 699–710). The data from one such experiment is
given here:

In-Flight symptom tested: Distance misperception (measurements are in diopters)

                             Symptom present    Symptom absent
Sample size                         9                  4
Sample mean                       2.83               2.70
Sample standard deviation         .172               .184

a. Using α = .01, conduct an ANOVA test to determine whether there is a difference
in the average focus settings between the two groups of pilots.
b. Which test procedure in Chapter 8 could have been used on this data in place of
the ANOVA test in part (a)?
c. Conduct the appropriate test you identified in part (b), using α = .01, and compare
your answer to the answer in part (a).

51. The results on the effectiveness of line drying on the smoothness of fabric were
studied in the paper "Line-Dried vs. Machine-Dried Fabrics: Comparison on
Appearance, Hand, and Consumer Acceptance" (Home Econ. Research J., 1984:
27–35). Smoothness scores were given for nine types of fabric and five different
drying methods. Because the different types of fabric were expected to have large
differences in smoothness, regardless of drying method, each of the five drying
methods was used on five samples of each fabric type. The smoothness scores for
this experiment were as follows:

                    Drying method
Fabric type      1     2     3     4     5
Crepe           3.3   2.5   2.8   2.5   1.9
Double knit     3.6   2.0   3.6   2.4   2.3
Twill           4.2   3.4   3.8   3.1   3.1
Twill mix       3.4   2.4   2.9   1.6   1.7
Terry           3.8   1.3   2.8   2.0   1.6
Broadcloth      2.2   1.5   2.7   1.5   1.9
Sheeting        3.5   2.1   2.8   2.1   2.2
Corduroy        3.6   1.3   2.8   1.7   1.8
Denim           2.6   1.4   2.4   1.3   1.6

a. Construct an ANOVA table for this experiment.
b. Using a significance level of .05, can you conclude that there is a difference
between the mean smoothness scores for the five drying methods?

52. A consumer protection organization carried out a study to compare the electricity
usage for four different types of residential air-conditioning systems. Each system
was installed in five homes and the monthly electricity usage (in kilowatt-hours) was
measured for a particular summer month. Because of the many differences that can
exist between residences (e.g., floor space, type of insulation, type of roof, etc.), five
different groups of homes were identified for study. From each group of homes of a
similar type, four homes were randomly selected to receive one of the four air-
conditioning systems. The resulting data is given in the table.

                             Type of home
                         1     2     3     4     5
Air-             1     116   118    97   101   115
conditioning     2     171   131   105   107   129
system           3     138   131   115    93   110
                 4     141   141   115    93    99

a. Construct an ANOVA table for this experiment.
b. Using a significance level of .05, can you conclude that there is a difference
between the monthly mean kilowatt-hours of electricity used by the four types of
air conditioners?

Bibliography
Montgomery, D. C., Design and Analysis of Experiments (8th ed.), Wiley, New York,
2012. The first half of the book gives a good introduction to statistical inference and
the analysis of variance method. The remaining chapters give an equally readable
account of the experimental design techniques described in Chapter 10.

Ott, R. L., and M. Longnecker, Introduction to Statistical Methods and Data Analysis
(6th ed.), Cengage Learning, Belmont, CA, 2008. A practitioner's guide to analysis of
variance and experimental design. Emphasizes applications, calculations, and
interpretation of results.

10
Experimental Design
10.1 Terminology and Concepts
10.2 Two-Factor Designs
10.3 Multifactor Designs
10.4 2^k Designs
10.5 Fractional Factorial Designs

Introduction
Methods of experimental design are used to evaluate the effects of several dif-
ferent treatments on a response variable. In the field of agronomy, where experi-
mental design techniques were first applied in the 1920s, different fertilizer blends
(the treatments) were applied to a crop in an effort to find the particular blend
that maximized crop yield (the response). The essential statistical ideas underlying
experimental design lie in the commonsense notion that the usefulness of the con-
clusions drawn from an experiment will critically depend on how the experiment
is conducted.
Scientific applications of experimental design methods are often called design
of experiments (abbreviated DOE). Furthermore, the designs discussed in this
chapter are from a special class called factorial designs. The multifactor designs
presented in this chapter are an extension of the single-factor designs discussed
in Chapter 9. Consequently, the terminology in Section 10.1 builds on that already
introduced in Chapter 9. Sections 10.2 and 10.3 show how to conduct factorial
experiments and how to interpret the results from such experiments.
Throughout the chapter, the statistical tool of analysis of variance
(ANOVA) is used to analyze the data from experiments and to make decisions
about whether a given factor has a significant impact on a response variable.


In addition, the graphical tools of effects plots and probability plots provide
very simple, yet powerful methods for visually summarizing the results of an ex-
periment and for sorting out factors that are influential from those that are not.
Effects plots, which were introduced in Section 9.3, are discussed in Section 10.2,
and probability (quantile) plots, first introduced in Section 2.4, are used through-
out Sections 10.4 and 10.5.
Sections 10.4 and 10.5 deal with a class of factorial designs called 2^k designs.
These designs have been widely used in industrial and scientific applications. Because
each factor in a 2^k design is restricted to only two levels, the resulting statistical
analyses are simplified, making these designs very intuitive and easy to use.

10.1 Terminology and Concepts 


Much of the terminology of experimental design has already been introduced in
Sections 4.3 and 9.1. Recall from those discussions that a response variable, or more
simply, a response, is a measurable characteristic of a product or process that we would
like to study. The object of the study is to determine the extent to which various factors
(also called independent variables) affect the values of the response variable. Experi-
ments are carried out by simply changing the levels of each factor and then measuring
whether, and by how much, the response changes. Experimental designs are specific
procedures that stipulate exactly how each factor is to be varied to obtain the most infor-
mation from the experimental data.
One of the most surprising things to come out of Fisher’s original work on experi-
mental design in the 1920s was the realization that the intuitive one-factor-at-a-time ap-
proach to experimentation has several disadvantages. One-factor-at-a-time experiments
are conducted by allowing one factor to vary at a time, keeping the levels of all other
factors fixed while doing so. By successively testing each factor in this manner, an ex-
perimenter hopes to determine both the individual and combined effects that the factors
have on a response variable. Fisher pointed out how inefficient the one-factor-at-a-time
approach is and suggested that better experiments could be designed by using factorial
designs along with the statistical tools of randomization, replication, and blocking
(Section 4.3).
There are several major difficulties with one-factor-at-a-time experiments: (1) They
require more experimental runs than do the factorial designs discussed in this chapter;
(2) they are incapable of detecting how the interplay between two or more factors influ-
ences a response variable; and (3) they usually cannot detect the specific levels of each
factor that will optimize a response variable. In short, one-factor-at-a-time experiments
fail to achieve most of the important goals of an experimenter.
We illustrate each of these shortcomings by reconsidering the discussion from
Example 4.15. In that example, two factors, the particular injection molding ma-
chine used (machine 1 or machine 2) and the brand of plastic pellets used (brand A
or brand B) in the machines were thought to affect the hardness of molded plastic
parts. In the terminology of experimental design, machine and brand are the factors
and plastic hardness is the response variable. Using the one-factor-at-a-time ap-
proach, an experimenter might conduct a series of six tests, as shown in Figure 10.1.


Figure 10.1 Experimental runs in a one-factor-at-a-time experiment (machine 1 vs. 2
on the horizontal axis, brand A vs. B on the vertical axis)

The arrows in the figure indicate the direction in which the factor is varied (e.g., test
runs for brand are first done for brand A and machine 1). Two experimental runs
are made at each fixed combination of factor settings to help increase the preci-
sion of the estimates derived from the data. Employing repeated measurements, or
replication, is an intuitive method often used in experiments to reduce errors intro-
duced by outside factors that can bias experimental results. Along the horizontal axis
in Figure 10.1, the experimenter holds the brand factor fixed (i.e., only brand A is
used) and allows the machine factor to vary. Then, holding the machine factor fixed
(at machine 1), the brand factor is varied as shown on the vertical axis. A total of
six experimental measurements are made using this one-factor-at-a-time method. To
estimate the effect of changing from machine 1 to machine 2, the experimenter can
compare the average of the two response values for machine 1 with the average of the
two responses for machine 2. The difference between these two averages is a measure
of how much the response changes when the machine factor is varied. Similarly,
the difference in the two averages associated with brands A and B can be used to
measure the effect of varying the brand factor.
Figure 10.1 highlights one of three problems with one-factor-at-a-time experi-
ments mentioned above: the inability of this design to capture all the information
about the interplay between factors. Suppose, for illustration, that the plastic of brand
B works about the same as brand A does in machine 1, but that brand B works signifi-
cantly better in machine 2 than in machine 1. If so, such information would not be
seen in the results of the experiment shown in Figure 10.1. Instead, the data from the
one-factor-at-a-time experiment would show that there was very little effect on hard-
ness when changing the brand factor, since brand B is evaluated only on machine 1.
From those results, an experimenter would incorrectly conclude that changing
plastic brands has little effect on the hardness of the molded parts. As you can see
from Figure 10.1, this potential problem is caused by the fact that the one-factor-at-
a-time approach does not include any experimental runs using plastic of brand B on
machine 2.
In contrast, the designs introduced in this chapter are constructed to expressly take
into account the possibility of significant interplay between factors. In statistics, such in-
terplay between factors is called interaction. Two or more factors are said to interact if,
as described in the previous paragraph, the magnitude of a factor’s effect on the response
variable depends on the particular level(s) of the other factor(s) in the experiment.


In our example, the effect of changing plastic brands on plastic hardness was negligible
when machine 1 was used, but the brand effect becomes substantial when machine 2
was used. Thus there is an interaction between brand and machine. Interactions be-
tween factors are discussed in more detail in Section 10.2.
Figure 10.2 shows an experimental design that does allow for detecting such an
interaction, if it exists. This design is an example of the factorial designs discussed
throughout the chapter. One of the important features of such designs is that experi-
mental tests are conducted at many, if not all, combinations of the levels of the factors.
In particular, note that the design in Figure 10.2 includes a test measurement for the
combination of machine 2 with plastic brand B. If there is an interaction between the
two factors, this design will be able to detect it.

Figure 10.2 A factorial design using two factors: four test runs, one at each
machine and brand combination, namely run 1 (machine 1, brand A), run 2
(machine 2, brand A), run 3 (machine 1, brand B), and run 4 (machine 2, brand B)

Another significant feature of the design in Figure 10.2 is that only one measure-
ment is made at each of the combinations of factor levels, which means that a total of
four experimental runs are needed. This brings up the question of whether this four-run
experiment is capable of estimating the factor effects with the same precision as the
one-factor-at-a-time experiment, in which each factor effect is estimated as the differ-
ence between two averages, each based on two measurements. To answer this question,
we denote the four test measurements in Figure 10.2 by y1, y2, y3, and y4. First consider
the factor machine. The difference y2 − y1 estimates the change in plastic hardness
caused by changing machines when brand A is used on both machines. Similarly, the
difference y4 − y3 estimates the effect of changing machines when brand B is used on
both. Therefore, by averaging these two estimates, we obtain a more precise estimate of
the effect of changing machines:

machine effect = ½[(y2 − y1) + (y4 − y3)]

By rearranging this expression, we can write the machine effect in the form

machine effect = ½(y2 + y4) − ½(y1 + y3)
which shows that the machine effect is estimated by the difference between two av-
erages, each based on two measurements, just as is done in the one-factor-at-a-time
experiment that uses six experimental runs. Thus the four-run factorial not only is able
to achieve the same degree of precision as the one-factor-at-a-time experiment but also


does so with fewer experimental runs. Using the same reasoning, we can show that the
factor brand is also measured with the same precision and can be written
brand effect = ½(y3 + y4) − ½(y1 + y2)
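For a concrete illustration of these two estimates, suppose the four factorial runs of Figure 10.2 produced the hardness values below (the numbers are hypothetical, invented only for this sketch):

```python
# Hypothetical hardness values for the four runs in Figure 10.2:
# y1 = (machine 1, brand A), y2 = (machine 2, brand A),
# y3 = (machine 1, brand B), y4 = (machine 2, brand B)
y1, y2, y3, y4 = 52.0, 55.0, 51.0, 60.0

machine_effect = (y2 + y4) / 2 - (y1 + y3) / 2  # average change, machine 1 to 2
brand_effect = (y3 + y4) / 2 - (y1 + y2) / 2    # average change, brand A to B

# Disagreement between the two single-brand machine estimates suggests
# interaction; this contrast is developed further in Section 10.4
interaction = ((y4 - y3) - (y2 - y1)) / 2
print(machine_effect, brand_effect, interaction)
```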
As the preceding paragraph illustrates, factorial experiments are more efficient than
one-at-a-time experiments. In fact, as you will see in later sections, the efficiency of
factorial designs compared to one-factor-at-a-time experiments increases as more and
more factors are included in an experiment. As Figure 10.2 shows, factorial experi-
ments achieve their efficiency by using the data more than once. Note, for example,
that the same four data values in Figure 10.2 are used in both of the effects estimates
described in the previous paragraph. Cuthbert Daniel, one of the pioneers in apply-
ing factorial designs to industrial processes, describes this feature of factorial designs as
“making each piece of data work twice,” an expression originally credited to the statisti-
cian W. J. Youden.1
To demonstrate that one-factor-at-a-time experiments do not generally yield the op-
timum settings for each factor, it is helpful to imagine what would happen if we were for-
tunate enough to know the exact relationship between the factors and the response vari-
able. Suppose, for instance, that such information is available for a particular response
value y and two factors whose measured values are denoted by x1 and x2. Thus we can
find the exact value of y associated with any two values of x1 and x2 and, therefore, create
a graph of y versus x1 and x2. Such a graph is called a response surface. Figure 10.3 is
an idealized example of a response surface, which illustrates how the percentage yield
y of a process might be related to the levels of two factors known to affect process yield.
From this graph, it is easy to find the particular values of x1 and x2 that will maximize the
percentage yield y. In a real experiment, of course, the shape of the response surface is
unknown, and the experimenter’s goal is to come as close as possible to the settings of x1
and x2 that optimize the response variable.

Figure 10.3 A response surface of process yield (in percent) versus the values x1 and x2 of two factors

1 Daniel, C., Application of Statistics to Industrial Experimentation, John Wiley & Sons, New York, 1976: 3.


Another way to summarize the information in a response surface is to create a
contour plot. Contour plots are similar to two-dimensional topographical maps in that
they consist of a series of lines in the plane connecting all points (x1, x2) with a common
y value. For example, by connecting all points (x1, x2) whose associated y value is 90%,
a contour line is formed in the plane. By comparing the contours associated with other
y values, the reader can then form a mental image of how the height of the response
surface changes. Figure 10.4 shows a contour plot created from the response surface of
Figure 10.3. Notice how much easier the contour plot makes the task of finding the x1
and x2 coordinates of the point where the surface achieves its maximum. For this reason,
we will now use the contour plot to illustrate why one-factor-at-a-time experiments gen-
erally fail to find the optimum factor settings.
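
A short Python sketch shows how such a surface and its contour plot might be drawn; the yield function below is hypothetical, chosen only to produce a single interior peak like the one in Figure 10.3.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical yield surface with one interior maximum (illustration only)
    x1, x2 = np.meshgrid(np.linspace(15, 55, 200), np.linspace(0, 50, 200))
    yield_pct = 90 * np.exp(-((x1 - 35) ** 2 / 300 + (x2 - 25) ** 2 / 250))

    cs = plt.contour(x1, x2, yield_pct, levels=[30, 40, 50, 60, 70, 80, 85])
    plt.clabel(cs, fmt="%d")          # label each contour with its yield value
    plt.xlabel("x1"); plt.ylabel("x2")
    plt.show()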

Figure 10.4 Contour plot of the response surface in Figure 10.3

Figures 10.5 and 10.6 show two different experimental strategies that could be fol-
lowed in a one-factor-at-a-time experiment. Suppose, for illustration, that a process is
currently running with the two factors set at the values associated with point A in the
figures. Starting with Figure 10.5, suppose that an experimenter begins by varying the
values of x1 (keeping x2 fixed) and tries to maximize the process yield. As Figure 10.5
shows, the best value of x1 occurs near point B in the figure. Next, keeping x1 fixed at
its value from point B, the experimenter then varies x2 until its optimum value is found
near point C. The experimenter would conclude that both factors had been optimized
and that the best process yield possible is about 86%.


Figure 10.5 Contour plot of a one-factor-at-a-time experiment: changing x1 to a new value, then changing x2

Figure 10.6 Contour plot of a one-factor-at-a-time experiment: separate searches for x1 and x2 values are combined


Alternatively, an experimenter could employ the strategy shown in Figure 10.6, in
which x1 is first varied until point B is found, x1 is returned to its original value from
point A, and then x2 is varied until its best value is found at point C. Putting these results
together, the experimenter might surmise that the best combination of x1 and x2 is at
point D, which uses the x1 coordinate from point B along with the x2 coordinate from
point C. This time, the experimenter concludes that the optimum process yield is about
79%. In both cases, the experimenter has indeed improved the process yield, but in
neither case has the optimum yield been located.
In practice, one-factor-at-a-time procedures usually require that several experi-
ments be conducted to ascertain the approximate location of the points B and C
illustrated in Figures 10.5 and 10.6. Thus not only do such experiments generally
fail to pinpoint optimal factor settings but several repeated tests are also needed to
do so. By comparison, factorial experiments require much less experimentation and
usually come closer to achieving the goal of finding optimum factor settings. To see
why this happens, consider Figure 10.7, which shows the results of running a facto-
rial experiment near the starting point A. Based on the size of the response values at
the four corners of this factorial design, it is readily apparent that the experimenter
should move in the direction indicated by the arrow in Figure 10.7. By repeating
this process at points B and C, the experimenter quickly determines the optimum
factor settings.

Figure 10.7 Using factorial designs to search for optimum factor settings


Section 10.1 Exercises

1. What statistical purpose does replication serve in an experimental design?

2. Factors A and B are thought to have an effect on a certain response value, y. The following table contains data on the response variable measured at each combination of the two levels of the factors used in a study:

                     Factor A level
                       1       2
   Factor B    1      5.2     7.4
   level       2      4.0     6.3

   a. Calculate an estimate of the effect of changing factor A from level 1 to level 2.
   b. Calculate an estimate of the effect of changing factor B from level 1 to level 2.

3. Suppose that the response surface for a two-factor experiment can be described by the function $f(x, y) = e^{-(1/2)[(x-2)^2 + (y-5)^2]}$.
   a. Use a computer package to create a graph of the response surface.
   b. From the graph in part (a), determine the approximate coordinates of the point (x, y) at which the response surface is at its maximum.
   c. Find an equation that describes the typical contour of the response surface.
   d. Sketch some of the contours using your answer to part (c). From this sketch, determine the approximate coordinates of the point at which the response surface achieves its maximum.

4. Suppose that the response surface for a two-factor experiment can be described by the function $f(x, y) = e^{-(x-y)^2}$.
   a. Use a computer package to create a graph of the response surface.
   b. Find an equation that describes the contours of the response surface.
   c. Sketch some of the contours using the equation(s) in part (b). Using these results, determine from these contours the approximate coordinates of the point(s) at which the response surface is at its maximum.

10.2 Two-Factor Designs 


In a two-factor design, two factors (labeled A and B in the ensuing discussion) are
specified along with the number of levels of each factor, which are denoted by a and b,
respectively. For example, suppose that we want to expand the motor vibration study de-
scribed in Example 9.1 to include two factors, A = brand of bearing used in the motor
and B = material used for the motor casing. If we decide to use five bearing brands and
three types of casing material, then a = 5 and b = 3 for such an experiment.
A two-factor design is often denoted as an a × b design (read "a by b design"). The
design in the previous paragraph would therefore be called a 5 × 3 design. In addition to
allowing us to quickly read the number of levels for each factor, this notation reminds us of
multiplication (e.g., 5 × 3 = 15), because the product of a and b happens to be the number
of distinct treatments (i.e., different combinations of factor levels) created by the two factors.
Thus in the motor vibration study, there are 15 distinct combinations of bearing brand and
casing material that must be included in the experiment. Although it is certainly possible to
conduct any number of tests at each factor–level combination, it simplifies the calculations
if we choose the same number of items for each such treatment. Designs that use the same
number of samples for each factor–level combination are called balanced designs. We will
use the letter r (which stands for repeated measures or replicates) to denote the common
sample size selected from each factor–level combination in a balanced design.


Figure 10.8 Data layout for a balanced two-factor design with r replicates per cell (each of the a × b cells contains r response values)

The most convenient way to keep track of the information in a two-way design is to
display it in matrix form, as shown in Figure 10.8. Each of the a × b combinations has
its own cell in which r response values are recorded. With r values in each of the a × b
cells, the total number of experimental runs is n = a × b × r.

Main Effects and Interactions


Graphs of the average response versus the factor levels can reveal much about the influ-
ence the factors have on a response variable. Such graphs are called effects plots since
they illustrate the effect that changing the levels of a factor has on the response variable.
Recall that effects plots for one-factor experiments were first introduced in Section 9.3.
One simple rule governs all effects plots: A plotted point corresponding to any factor
level (or factor–level combination) is simply the average of all response values in which
that factor level (or factor–level combination) is present. The following example illus-
trates the process of creating effects plots from the data in a two-way design matrix.
Suppose the data for a 3 × 2 design is as follows:

                    Factor B
                     1         2        Row averages
             1    10, 14    18, 14      14 = (10 + 14 + 18 + 14)/4
   Factor A  2    23, 21    16, 20      20 = (23 + 21 + 16 + 20)/4
             3    31, 27    21, 25      26 = (31 + 27 + 21 + 25)/4

   Column averages:   21 = (10 + 14 + ... + 27)/6     19 = (18 + 14 + ... + 25)/6

In the margins of the matrix, we have included the averages of all the responses in
the rows and columns. For instance, the average response for the first row is 14, which
is the average of all four numbers in that row. Notice that these four numbers each cor-
respond to the first level of factor A, which we will denote by A1 in the graphs that follow.
Also, because we have used a balanced design, each level of B is included an equal
number of times in these four numbers, which is what makes the average response of 14
a good representation of what to expect at level A1. As you can see, each level of B is also
represented in the four numbers used to find the average responses for levels A2 and A3.


Following our general rule for computing effects averages, we compute the average
responses for B1 and B2 from six numbers, because each column (i.e., each level of B)
in the matrix contains a total of six measurements.
By plotting the average response versus the levels of a factor, we obtain a graph of the
main effect of that factor. In Figure 10.9, for example, the plots of the main effects of factors
A and B in our example show that the average response tends to increase as factor A changes
from level A1 to A2 to A3, whereas changing factor B from B1 to B2 has the effect of decreas-
ing the average response from 21 to 19. Plotting both the A and B main effects on the same
graph allows you to easily compare the magnitudes of the A and B effects. In those cases
where the average response stays relatively constant from level to level (e.g., if the average
response had been 20 at all three levels of A), we say that a factor has no main effect.

Figure 10.9 Plotting the main effects of factors A and B

From Figure 10.9, we can see that going from level A1 to level A3 has the net effect of
raising the average response by 12 units (i.e., from 14 at A1 to 26 at A3) and that going from
B1 to B2 lowers the average response by 2 units (from 21 at B1 to 19 at B2). Looking at this
figure, it is tempting to want to treat A and B separately, by simply choosing desirable set-
tings first for A, then for B. If this were always the case, it would make the results of a two-way
experiment exceedingly easy to interpret. Unfortunately, things are not always that simple.
It is possible, as noted in Section 10.1, that two (or more) factors do not act independently
of one another. Two factors are said to interact when the effect of changing
the levels of one factor depends on the particular level of the other factor. This is the case
in our example. The following calculations show that the effect of changing factor A
depends on the particular setting of factor B:

   Effect of changing A (B fixed at B1):          Effect of changing A (B fixed at B2):
   A1 and B1: (10 + 14)/2 = 12                    A1 and B2: (18 + 14)/2 = 16
   A2 and B1: (23 + 21)/2 = 22                    A2 and B2: (16 + 20)/2 = 18
   A3 and B1: (31 + 27)/2 = 29                    A3 and B2: (21 + 25)/2 = 23
   Increase from A1 to A3: 29 − 12 = 17           Increase from A1 to A3: 23 − 16 = 7

Notice that the effect of going from A1 to A3 is an increase of 17 units when B is at level
B1, whereas the corresponding increase is only 7 units when B is at level B2. Thus the
effect of changing the levels of A seems to depend on the particular level of B.
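
The same marginal averages and conditional comparisons can be reproduced in a few lines of NumPy, using the 3 × 2 data of this example:

    import numpy as np

    # cells[i, j] holds the two replicates at level A(i+1), B(j+1)
    cells = np.array([[[10, 14], [18, 14]],
                      [[23, 21], [16, 20]],
                      [[31, 27], [21, 25]]], dtype=float)

    print(cells.mean(axis=(1, 2)))   # main-effect averages for A: [14. 20. 26.]
    print(cells.mean(axis=(0, 2)))   # main-effect averages for B: [21. 19.]

    cell_means = cells.mean(axis=2)  # 3 x 2 table of cell averages
    # Change from A1 to A3 at each fixed level of B; unequal values
    # signal interaction
    print(cell_means[2] - cell_means[0])   # [17. 7.]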


Like main effects, such two-factor interaction effects can also be plotted. This can
be done as shown in Figure 10.10 by overlaying separate graphs, one for each level of
factor B. Alternatively, the two values of B could be used on the horizontal axis with
three overlaid graphs (one for each level of A). The presence of interaction between
two factors is indicated by graphs that either cross one another or, more generally, are
not parallel. Parallel graphs, as depicted in Figure 10.11, are a sign of no interaction
between the factors. Why?
Figure 10.10 A two-factor interaction plot

Figure 10.11 A plot showing no interaction between factors A and B

Keep in mind that effects plots do not take the place of statistical tests. You should
always run an ANOVA test first to determine which of the effects are significant and
which are not. It may turn out, for example, that the interaction effect is not statistically
significant, in which case you can interpret the main effects without worrying about
factor interactions. At other times, you may discover that a factor that you initially
thought was important turns out to have no significant effect on the response variable.
When statistical testing shows that an interaction effect is significant, then the re-
sults of the experiment must be interpreted by examining the interaction plots, not the
main effects plots. When interactions exist, the conclusions drawn from the main effects
plots may or may not agree with those drawn from the interaction plots. On the other
hand, if the interaction between factors is not significant, then you can simply examine
and interpret the main effects plots. For instance, in our example, neither the main
effect for factor B nor the interaction effect is significant at α = .05 (see Exercise 6).
This means that we need only examine the main effects plot for factor A. If the goal of
the study is, say, to maximize the response value, then the main effects plot suggests that
we set factor A at level 3. Because the main effect for factor B is not significant and
because the interaction between A and B is not significant, choosing either level of B
should give substantially the same response value.

ANOVA Formulas
All ANOVA procedures share a common goal: to analyze the total variation (SST) in
a response variable by breaking it into identifiable sources of variation. This is accom-
plished by defining a separate sum of squares for each source of variation and then


decomposing SST into a sum of these components. Such formulas are called ANOVA
decompositions.
The general ANOVA decomposition for a two-factor analysis of variance is
SST = SSA + SSB + SS(AB) + SSE
where SSA, SSB, and SS(AB) denote the sums of squares associated with factor A, factor
B, and the AB interaction, respectively. SSE, the error or residual sum of squares, rep-
resents the variation from all sources of variation other than A, B, and their interaction.
The formulas for these sums of squares are given in the following box.2 Note that once
SST, SSA, SSB, and SSE are computed, SS(AB) can easily be found by rewriting the
ANOVA decomposition as
SS(AB) = SST − SSA − SSB − SSE

Sums of Squares Formulas (Balanced Two-Way ANOVA)

$$SSA = br\sum_{i=1}^{a} (\bar{A}_i - \bar{y})^2 \qquad\qquad SSB = ar\sum_{j=1}^{b} (\bar{B}_j - \bar{y})^2$$

SST = sum of squared deviations of all individual response values from the grand average, $\bar{y}$:

$$SST = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r} (y_{ijk} - \bar{y})^2$$

SSE = sum of squared deviations of response values from the corresponding cell means, $\bar{y}_{ij}$:

$$SSE = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r} (y_{ijk} - \bar{y}_{ij})^2$$

$$SS(AB) = SST - SSA - SSB - SSE$$

where
$y_{ijk}$ = kth observation when A is at level i and B is at level j
a = number of levels of factor A
b = number of levels of factor B
r = number of replications per cell
$\bar{A}_i$ = average of all response values associated with the ith level of factor A
$\bar{B}_j$ = average of all response values associated with the jth level of factor B
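
A direct translation of these formulas into code can help demystify them. The following Python sketch (our own helper, not from the text) computes the two-way sums of squares for a balanced layout stored as an (a, b, r) array; applied to the 5 × 3 × 2 data of Example 10.1 below, it reproduces SSA = 36.709, SSB = .705, SS(AB) = 11.571, and SSE = 1.670.

    import numpy as np

    def two_way_ss(y):
        """Sums of squares for a balanced two-factor design.

        y has shape (a, b, r): levels of A, levels of B, replicates per cell."""
        a, b, r = y.shape
        grand = y.mean()
        A_bar = y.mean(axis=(1, 2))              # average at each level of A
        B_bar = y.mean(axis=(0, 2))              # average at each level of B
        cell = y.mean(axis=2)                    # a x b table of cell means
        ss_a = b * r * np.sum((A_bar - grand) ** 2)
        ss_b = a * r * np.sum((B_bar - grand) ** 2)
        ss_t = np.sum((y - grand) ** 2)
        ss_e = np.sum((y - cell[:, :, None]) ** 2)
        ss_ab = ss_t - ss_a - ss_b - ss_e        # interaction by subtraction
        return ss_a, ss_b, ss_ab, ss_e, ss_t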

Hypothesis Tests
We now proceed to find the degrees of freedom and mean squares associated with each
source of variation. The total degrees of freedom is n − 1, where n = abr. The degrees
of freedom associated with a factor is simply its number of levels minus 1, and the

2 In the precomputer era, shortcut formulas were often used instead of the formulas we have given. The interested reader may consult other texts for these formulas.


degrees of freedom for an interaction term is the product of the degrees of freedom of
the corresponding factors. The error degrees of freedom equals ab(r – 1). Decomposi-
tion of degrees of freedom mimics that for sums of squares:
ANOVA decomposition:    SST = SSA + SSB + SS(AB) + SSE
Degrees of freedom:     abr − 1 = (a − 1) + (b − 1) + (a − 1)(b − 1) + ab(r − 1)

By dividing each sum of squares by its degrees of freedom, we form the mean squares:

$$MSA = \frac{SSA}{a-1} \qquad MSB = \frac{SSB}{b-1} \qquad MS(AB) = \frac{SS(AB)}{(a-1)(b-1)} \qquad MSE = \frac{SSE}{ab(r-1)}$$
These are used to form the F ratios used in our hypothesis tests. In a two-way ANOVA, we
can conduct separate tests for the presence of each main effect and the interaction effect.
In each such test, the null hypothesis is that the effect does not exist, and the alternative
hypothesis is that the effect is present. To conclude, for example, that the factor A (or B)
effect is present means that the average response differs at different levels of A (or B). The
following box summarizes the test procedures for a two-factor ANOVA. An ANOVA table
(Figure 10.12) provides the most convenient way to summarize these results.

Two-Way ANOVA Tests (Significance Level α)

To test these hypotheses:                 Test statistic         Degrees of freedom for P-value determination
H0A: There is no main effect for A        F_A = MSA/MSE          a − 1, ab(r − 1)
H0B: There is no main effect for B        F_B = MSB/MSE          b − 1, ab(r − 1)
H0AB: There is no interaction effect      F_AB = MS(AB)/MSE      (a − 1)(b − 1), ab(r − 1)

Reject H0 if P-value ≤ α. (In each case, Ha is that the particular effect exists.)
If H0AB is rejected, then the interaction plot takes precedence over the main effects plots
when interpreting the effects of A and B.

Source of variation    df                  SS        MS        F
Factor A               a − 1               SSA       MSA       MSA/MSE
Factor B               b − 1               SSB       MSB       MSB/MSE
AB interaction         (a − 1)(b − 1)      SS(AB)    MS(AB)    MS(AB)/MSE
Error                  ab(r − 1)           SSE       MSE
Total variation        abr − 1             SST

Figure 10.12 ANOVA table for the two-way classification
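
Once the sums of squares and degrees of freedom are in hand, completing the table is mechanical. Here is a minimal sketch using scipy's F distribution (the helper name two_way_table is ours):

    from scipy.stats import f

    def two_way_table(ss_a, ss_b, ss_ab, ss_e, a, b, r):
        """Print F ratios and P-values for a balanced two-way ANOVA."""
        df = {"A": a - 1, "B": b - 1,
              "AB": (a - 1) * (b - 1), "Error": a * b * (r - 1)}
        mse = ss_e / df["Error"]
        for name, ss in [("A", ss_a), ("B", ss_b), ("AB", ss_ab)]:
            F = (ss / df[name]) / mse
            p = f.sf(F, df[name], df["Error"])   # upper-tail area
            print(f"{name}: F = {F:.2f}, P-value = {p:.4f}")

With the sums of squares of Example 10.1 below (a = 5, b = 3, r = 2), this reproduces the F ratios 82.45, 3.17, and 12.99 reported in that example's ANOVA table.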


Technically speaking, the statistical tests just described are based on a fixed effects
model, in which the particular levels of A and B are assumed to be the only ones
of interest in the study. If, on the other hand, we think of the levels only as samples
from all of the possible levels of A and B, then a random effects model should be
used. Recall that the distinction between fixed and random effects was first introduced
in Section 9.3. Although the distinction between fixed and random factors does not
alter the ANOVA calculations for one-factor experiments (Chapter 9), this situation
changes for multifactor designs. In particular, the calculation of F ratios for random
effects models and mixed models (one factor fixed, the other random) is slightly
different from that for fixed effects models. These topics are beyond the scope of our
introductory discussion. Throughout this chapter, we consider all factors in a design
to be fixed factors.

Example 10.1 Refer to Example 9.1, where we examined the possible causes of electric motor vibra-
tion. Suppose that we have identified two product characteristics (factors) that are
thought to influence the amount of vibration (the response, measured in microns)
of running motors: factor A = the brand of bearing used in the motor and B = the
material used for the motor casing. Figure 10.13 shows the data from an experiment
in which a = 5 brands of bearings were tested along with b = 3 types of casing material
(steel, aluminum, and plastic). Two motors (r = 2) were constructed and tested
for each of the ab = 5 · 3 = 15 combinations of bearing brand and casing type, giving
a total sample size of abr = 5 · 3 · 2 = 30.

                        Factor B (casing material)
                     1             2             3         Averages:
            1    13.1, 13.2    15.0, 14.8    14.0, 14.3      14.07
            2    16.3, 15.8    15.7, 16.4    17.2, 16.7      16.35
Factor A    3    13.7, 14.3    13.9, 14.3    12.4, 12.3      13.48
(brand)     4    15.7, 15.8    13.7, 14.2    14.4, 13.9      14.62
            5    13.5, 12.5    13.4, 13.8    13.2, 13.1      13.25

Averages:           14.39         14.52         14.15

Figure 10.13 Data on electric motor vibration for Example 10.1

Before proceeding with the ANOVA calculations, it is instructive to look at the
margins of the data array in Figure 10.13. In particular, note that there appears to be
very little difference between the average responses for the three levels of factor B,
which is a preliminary indication that factor B may have little or no effect on reduc-
ing vibration.


The sum of all 30 response values is 430.6, so the grand average is $\bar{y} = 430.6/30 = 14.353$, which, with the row averages $\bar{A}_1 = 14.07$, $\bar{A}_2 = 16.35$, $\bar{A}_3 = 13.48$, $\bar{A}_4 = 14.62$, and $\bar{A}_5 = 13.25$, gives

$$SSA = b \cdot r \sum_{i=1}^{a} (\bar{A}_i - \bar{y})^2$$
$$= 3 \cdot 2\left[(14.07 - 14.353)^2 + (16.35 - 14.353)^2 + (13.48 - 14.353)^2 + (14.62 - 14.353)^2 + (13.25 - 14.353)^2\right]$$
$$= 6[6.118125] = 36.709$$

Similarly, the column averages $\bar{B}_1 = 14.39$, $\bar{B}_2 = 14.52$, and $\bar{B}_3 = 14.15$ yield

$$SSB = a \cdot r \sum_{j=1}^{b} (\bar{B}_j - \bar{y})^2 = 5 \cdot 2\left[(14.39 - 14.353)^2 + (14.52 - 14.353)^2 + (14.15 - 14.353)^2\right] = 10[.070467] = .705$$

The total sum of squares is the sum of the squared differences of all 30 values from $\bar{y}$:

$$SST = (13.1 - 14.353)^2 + \cdots + (13.1 - 14.353)^2 = 50.655$$

whereas SSE is the sum of the squared differences of each response value from its own cell mean,

$$SSE = \left[(13.1 - 13.15)^2 + (13.2 - 13.15)^2\right] + \left[(15.0 - 14.9)^2 + (14.8 - 14.9)^2\right] + \cdots = 1.670$$

By subtraction, SS(AB) = SST − SSA − SSB − SSE = 50.655 − 36.709 − .705 − 1.670 = 11.571. These results, along with their associated degrees of freedom and mean squares, are summarized in the following ANOVA table:

Source of variation           df                    SS        MS                    F
Factor A (bearing brand)      5 − 1 = 4             36.709    36.709/4 = 9.177      9.177/.1113 = 82.45
Factor B (casing material)    3 − 1 = 2               .705    .705/2 = .353         .353/.1113 = 3.17
AB interaction                (5 − 1)(3 − 1) = 8    11.571    11.571/8 = 1.446      1.446/.1113 = 12.99
Error                         5 · 3(2 − 1) = 15      1.670    1.670/15 = .1113
Total variation               5 · 3 · 2 − 1 = 29    50.655

At a significance level of α = .05, let's first test for the presence of any interactions.
Because the P-value for F = 12.99 (based on df1 = 8, df2 = 15) is less than
.001, H0AB must be rejected. It appears that there is interaction between the two
factors. Therefore we should consider the corresponding effects plot (see Figure 10.14)


to draw conclusions. Although the casing material does not have a significant effect
by itself, it does influence the A main effect (because the AB interaction is signifi-
cant). The lowest vibration occurs for bearing brand A3, but only if casing B3 (plastic
casing) is used with A3.

Figure 10.14 Effects plot for Example 10.1
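
Routine two-way ANOVA computations like these are usually delegated to software. As a check on Example 10.1, here is a sketch using Python's statsmodels package with the Figure 10.13 data entered in long format (one row per motor):

    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # (brand, casing) -> the two vibration replicates from Figure 10.13
    cells = {(1, 1): [13.1, 13.2], (1, 2): [15.0, 14.8], (1, 3): [14.0, 14.3],
             (2, 1): [16.3, 15.8], (2, 2): [15.7, 16.4], (2, 3): [17.2, 16.7],
             (3, 1): [13.7, 14.3], (3, 2): [13.9, 14.3], (3, 3): [12.4, 12.3],
             (4, 1): [15.7, 15.8], (4, 2): [13.7, 14.2], (4, 3): [14.4, 13.9],
             (5, 1): [13.5, 12.5], (5, 2): [13.4, 13.8], (5, 3): [13.2, 13.1]}
    rows = [(brand, casing, y)
            for (brand, casing), ys in cells.items() for y in ys]
    df = pd.DataFrame(rows, columns=["brand", "casing", "vibration"])

    model = smf.ols("vibration ~ C(brand) * C(casing)", data=df).fit()
    print(anova_lm(model))  # reproduces SSA, SSB, SS(AB), and SSE above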

Section 10.2 Exercises


5. Why do parallel line segments in effects plots indicate that there is no interaction between two factors?

6. In the example discussed on page 454, perform the necessary hypothesis tests to show that neither factor B nor the two-factor AB interaction is significant (using α = .05).

7. A fixed effects model is used to analyze two factors, each of which has five levels. Three replicated measurements are available for each combination of factor levels. Complete the following ANOVA table for this experiment:

   Source of variation    df    SS     MS    F
   Factor A                     20
   Factor B                            8.1
   AB interaction
   Error                               2
   Total variation              200

8. A chemical engineer conducts an experiment to test the effects of gas flow rate (factor A) and liquid flow rate (factor B) on the gas film heat transfer coefficient (in Btu/hr ft²). Four levels of each factor are used in the study, and two replications are conducted at each combination of factor levels:

                             Factor B
                   1          2          3          4
             1  200, 211   226, 219   240, 249   261, 250
   Factor    2  278, 267   312, 324   330, 337   381, 375
   A         3  369, 355   416, 402   462, 457   517, 524
             4  500, 487   575, 593   645, 632   733, 718

   a. Is there evidence of a significant interaction between the two factors? Use α = .01.
   b. Use α = .01 to test the hypothesis that gas flow rate has no effect on the heat transfer coefficient.
   c. Use α = .01 to test the hypothesis that liquid flow rate has no effect on the heat transfer coefficient.


9. The following data was obtained in an experiment to investigate whether the yield from a certain chemical process depends on either the chemical formulation of the input materials or the mixer speed, or on both factors:

                             Speed
                     60       70       80
                  189.7    185.1    189.0
              1   188.6    179.4    193.0
                  190.1    177.3    191.0
   Formulation
                  165.1    161.7    163.3
              2   165.9    159.8    166.6
                  167.6    161.6    170.3

   A statistical software package gave these results: SS(formulation) = 2253.44, SS(speed) = 230.81, SS(interaction) = 18.58, and SSE = 71.87.
   a. Does there appear to be interaction between the two factors? (Use α = .05.)
   b. Does the yield appear to depend on either the formulation or the speed? (Use α = .05.)

10. Draw an interaction plot for the data of Exercise 9.

11. Lightweight aggregate asphalt mix has been found to have lower thermal conductivity, which is desirable, than a conventional mix would have. The article "Influence of Selected Mix Design Factors on the Thermal Behavior of Lightweight Aggregate Asphalt Mixes" (J. of Testing and Eval., 2008: 1–8) reported on an experiment in which various thermal properties of mixes were determined. Three different binder grades were used in combination with three different coarse aggregate contents (%), with two observations made for each such combination, resulting in the conductivity data (W/m∙K) given here:

                        Coarse Aggregate Content (%)
                       38              41              44
   Asphalt    PG58   .835, .845     .822, .826     .785, .795
   Binder     PG64   .855, .865     .832, .836     .790, .800
   Grade      PG70   .815, .825     .800, .820     .770, .790

   a. Test for the presence of interaction between the two factors. Use α = .01.
   b. Use α = .01 to test the hypothesis that coarse aggregate content has no effect on thermal conductivity.
   c. Use α = .01 to test the hypothesis that asphalt binder grade has no effect on thermal conductivity.

12. Factorial designs have been used to study productivity of software engineers ("Experimental Design and Analysis in Software Engineering," Software Engineering Notes, 1995: 14–16). Suppose that an experiment is conducted to study the time it takes to code a software module. Factors that may affect the coding time are the size of the module and whether the programmer has access to a library of previously coded submodules. Module size is studied at two levels, large and small, whereas access to a library of submodules is either available or not. After running a two-factor design on sample modules, suppose that the interaction between module size and library access is found to be significant.
   a. If the goal is to reduce coding time, describe the conclusions you can draw from the experiment if the interaction plot looks like this:

   [Interaction plot: coding time versus module size (Small, Large), with separate lines labeled "No library access" and "Library access"]

   b. What possible reasons can you give for an interaction plot that looks like the following one?

   [Interaction plot: coding time versus module size (Small, Large), with separate lines labeled "Library access" and "No library access"]


13. The article "Fatigue Limits of Enamel Bonds with Moist and Dry Techniques" (Dental Materials, 2009: 1527–1531) described an experiment to investigate the ability of adhesive systems to bond to mineralized tooth structures. The response variable is shear bond strength (MPa), and two different adhesives—Adper Single Bond Plus (SBP) and OptiBond Solo Plus (OBP)—were used in combination with two different surface conditions. The accompanying data was supplied by the authors of the article. The first 12 observations came from the SBP-dry treatment, the next 12 from the SBP-moist treatment, the next 12 from the OBP-dry treatment, and the last 12 from the OBP-moist treatment.

   SBP-Dry     56.7  57.4  53.4  54.0  49.9  49.9  56.2  51.9  49.6  45.7  56.8  54.1
   SBP-Moist   49.2  47.4  53.7  50.6  62.7  48.8  41.0  57.4  51.4  53.4  55.2  38.9
   OBP-Dry     38.8  46.0  38.0  47.0  46.2  39.8  25.9  37.8  43.4  40.2  35.4  40.3
   OBP-Moist   40.6  35.5  58.7  50.4  43.1  61.7  33.3  38.7  45.4  47.2  53.3  44.9

   a. Construct a comparative boxplot of the data on the four different treatments and comment.
   b. Carry out an appropriate analysis of variance and state your conclusions (use a significance level of .01 for any tests). Include any graphs that provide insight.
   c. If a significance level of .05 is used for the two-way ANOVA, the interaction effect is significant (just as in general different glues work better with some materials than with others). So now it makes sense to carry out a one-way ANOVA on the four treatments SBP-D, SBP-M, OBP-D, and OBP-M. Do this and identify significant differences among the treatments.

14. Experiments often have more than one response value of interest. In the article "Towards Improving the Properties of Plaster Moulds and Castings" (J. Engr. Manuf., 1991: 265–269), a study was undertaken to determine the effects of carbon fiber (in %) and sand addition (in %) on two response variables, casting hardness and wet-mold strength.

   Sand           Carbon fiber     Casting     Wet-mold
   addition (%)   addition (%)     hardness    strength
    0             0                61.0        34.0
    0             0                63.0        16.0
   15             0                67.0        36.0
   15             0                69.0        19.0
   30             0                65.0        28.0
   30             0                74.0        17.0
    0             .25              69.0        49.0
    0             .25              69.0        48.0
   15             .25              69.0        43.0
   15             .25              74.0        29.0
   30             .25              74.0        31.0
   30             .25              72.0        24.0
    0             .50              67.0        55.0
    0             .50              69.0        60.0
   15             .50              69.0        45.0
   15             .50              74.0        43.0
   30             .50              74.0        22.0
   30             .50              74.0        48.0

   a. Construct an ANOVA table for the effects of these factors on wet-mold strength. Test for the presence of significant effects using α = .05.
   b. Construct an ANOVA table for the effects of these factors on casting hardness. Test for the presence of significant effects using α = .05.
   c. From your results in parts (a) and (b), what levels of each factor would you select to maximize wet-mold strength? What factor levels would you choose to maximize casting hardness?

10.3 Multifactor Designs 


The two-factor designs of Section 10.2 can be extended to include any number of factors A,
B, C, D, . . . , each with its own number of levels a, b, c, d, . . . , and so on. These factorial
designs, as they are called, require that experimental runs be made at all possible combina-
tions of the factor levels. As in the two-factor case, the total sample size for a factorial design
is the product of the number of factor levels times the number of replicates, r. A four-factor
experiment, for example, would require n = a · b · c · d · r sample measurements. Need-
less to say, sample sizes can grow rapidly as more and more factors are included in an experi-
ment, a problem that is addressed in Sections 10.4 and 10.5 of this chapter.


The “3” notation used to describe two-factor designs also provides compact descrip-
tions of multifactor designs. For instance, a 3 3 2 3 2 factorial design is one that has
three factors A, B, and C, with a 5 3 levels of A, b 5 2 levels of B, and c 5 2 levels of C.
Figure 10.15 shows a data layout for such a design. Note that the number of replicates, r,
is not included in this notation. To indicate that repeated measurements have been made
at each factor–level combination, we simply state this fact when referring to the design, for
example, a replicated 3 3 2 3 2 design, or a 3 3 2 3 2 design with r replicates.

Figure 10.15 Data layout for a 3 × 2 × 2 factorial design (three levels of factor A, two levels of B, two levels of C); each cell contains r observations

Main Effects and Interactions


In multifactor designs, main effects and two-factor interactions are interpreted in the
same manner as in two-factor designs (Section 10.2). With more than two factors,
however, the opportunity arises to incorporate even higher-order interactions, such as
the interaction between three or more factors. Notationally, a three-factor interaction
between factors A, B, and C is denoted by either ABC or A × B × C, a four-factor
interaction is denoted by either ABCD or A × B × C × D, and so forth. The basic rule
for interpreting interaction terms is the same as in Section 10.2: If an interaction is
statistically significant, then each of the component factors' effects will depend on the
particular combination of the other factors in the interaction term. For instance, the
presence of a significant ABC interaction means that the effect of factor A is different at
different settings of factors B and C, or, equivalently, that the effect of B depends on
the levels of A and C, or that the effect of C depends on the levels of A and B.
Operationally, the presence of an interaction term indicates that we must first look at in-
teraction plots, not main effects plots, when interpreting the experimental results. However,
factors that are not involved in significant interaction terms may be interpreted by simply
examining their main effects plots. For example, suppose that an ANOVA test reveals that
both the main effect for factor A and the BC interaction are significant. This means that we
can look at the main effects plot for A when deciding on the best setting for A, but the BC
interaction plot must be consulted when determining the best settings for factors B and C.

ANOVA Formulas
ANOVA decompositions for factorial designs contain a sum of squares term for every
possible main effect and interaction. For example, a factorial design based on factors A,
B, and C gives rise to three main effects terms (A, B, and C), three two-factor interactions


(AB, AC, and BC), and one three-factor interaction (ABC). The sum of squares for an
interaction term is denoted by putting the interaction term in parentheses after the SS
notation. Thus SS(AB) denotes the sum of squares associated with the AB interaction,
and so on. The ANOVA decomposition for a three-factor model is given by
SST = SSA + SSB + SSC + SS(AB) + SS(AC) + SS(BC) + SS(ABC) + SSE
where SST (the total sum of squares) measures the total variation in the response data
and SSE (the error sum of squares) is the variation from all sources other than the factors
included in the experiment.
For a three-factor design including every possible main effect and interaction, com-
putational formulas for sums of squares are given in the following box. The key is to start
by computing the sums of squares of main effects and then use these results to find sums
of squares for the two-factor interactions. Similarly, the sums of squares of the two-factor
interactions are used to find SS(ABC). Although the patterns evident in these formulas
can be extended to the case of four or more factors, in practice one usually relies on
statistical software to perform the calculations.

Sum of Squares Formulas (Balanced Three-Factor ANOVA)

Let T denote the grand total of all n = abcr response values.

$$SSA = \sum_{i=1}^{a} \frac{A_i^2}{bcr} - \frac{T^2}{abcr}, \quad \text{where } A_i = \text{sum of all data for the } i\text{th level of } A$$

$$SSB = \sum_{j=1}^{b} \frac{B_j^2}{acr} - \frac{T^2}{abcr}, \quad \text{where } B_j = \text{sum of all data for the } j\text{th level of } B$$

$$SSC = \sum_{k=1}^{c} \frac{C_k^2}{abr} - \frac{T^2}{abcr}, \quad \text{where } C_k = \text{sum of all data for the } k\text{th level of } C$$

$$SS(AB) = \sum_{i=1}^{a}\sum_{j=1}^{b} \frac{(AB)_{ij}^2}{cr} - \frac{T^2}{abcr} - SSA - SSB,$$
where $(AB)_{ij}$ = sum of all data for the ith level of A and jth level of B

$$SS(AC) = \sum_{i=1}^{a}\sum_{k=1}^{c} \frac{(AC)_{ik}^2}{br} - \frac{T^2}{abcr} - SSA - SSC,$$
where $(AC)_{ik}$ = sum of all data for the ith level of A and kth level of C

$$SS(BC) = \sum_{j=1}^{b}\sum_{k=1}^{c} \frac{(BC)_{jk}^2}{ar} - \frac{T^2}{abcr} - SSB - SSC,$$
where $(BC)_{jk}$ = sum of all data for the jth level of B and kth level of C

SS(ABC) = SST − SSA − SSB − SSC − SS(AB) − SS(AC) − SS(BC) − SSE

SST = sum of squared deviations of all n = abcr response values from the grand average of the data

SSE = sum over all cells of squared deviations of cell entries from corresponding cell means
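
For readers who prefer to program the box directly, the following NumPy sketch (the helper name three_way_ss is ours) applies the total-based formulas to a balanced response array of shape (a, b, c, r):

    import numpy as np

    def three_way_ss(y):
        """Total-based sums of squares for a balanced three-factor design.

        y has shape (a, b, c, r): levels of A, B, C and replicates."""
        a, b, c, r = y.shape
        cf = y.sum() ** 2 / y.size                  # correction factor T^2/(abcr)
        ss_a = np.sum(y.sum(axis=(1, 2, 3)) ** 2) / (b * c * r) - cf
        ss_b = np.sum(y.sum(axis=(0, 2, 3)) ** 2) / (a * c * r) - cf
        ss_c = np.sum(y.sum(axis=(0, 1, 3)) ** 2) / (a * b * r) - cf
        ss_ab = np.sum(y.sum(axis=(2, 3)) ** 2) / (c * r) - cf - ss_a - ss_b
        ss_ac = np.sum(y.sum(axis=(1, 3)) ** 2) / (b * r) - cf - ss_a - ss_c
        ss_bc = np.sum(y.sum(axis=(0, 3)) ** 2) / (a * r) - cf - ss_b - ss_c
        ss_t = np.sum((y - y.mean()) ** 2)
        ss_e = np.sum((y - y.mean(axis=3, keepdims=True)) ** 2)
        ss_abc = ss_t - ss_a - ss_b - ss_c - ss_ab - ss_ac - ss_bc - ss_e
        return dict(A=ss_a, B=ss_b, C=ss_c, AB=ss_ab, AC=ss_ac,
                    BC=ss_bc, ABC=ss_abc, Error=ss_e, Total=ss_t)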

For a three-factor design restricted to main effects and two-factor interactions (i.e., a
design that excludes the ABC interaction), we can determine the sums of squares for total
variation, main effects, and two-factor interactions using the computational formulas in


the foregoing box. However, now that we are excluding the ABC term, the SSE must
necessarily change. It is no longer just the sum of squared cell deviations (shown in the
foregoing box), but it will now be increased by an amount equal to SS(ABC). The correct
ANOVA decomposition will now be given by

SST = SSA + SSB + SSC + SS(AB) + SS(AC) + SS(BC) + SSE′

Rearrangement of this decomposition yields an expression for the new error sum of squares, SSE′:

SSE′ = SST − SSA − SSB − SSC − SS(AB) − SS(AC) − SS(BC)

Comparing SSE′ to the SSE for the full three-factor model, we see that the error term
now includes the ABC contribution in the sense that SSE′ = SSE + SS(ABC).

Similarly, if we want to restrict the model to only main effects, the ANOVA decomposition becomes

SST = SSA + SSB + SSC + SSE″

from which

SSE″ = SST − SSA − SSB − SSC

Again, the SSE term has simply absorbed the SSE for the full three-factor model and
all sums of squares of the terms omitted from this model. This also happens when terms
from a two-factor design are dropped (cf. page 457) or, in general, from a model having
any number of factors, including multiple regression models (discussed in Chapter 11).
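
This absorption is easy to demonstrate with software. The sketch below generates a small balanced data set with hypothetical effects, fits the full model and the main-effects-only model with statsmodels, and confirms that the reduced model's residual sum of squares equals SSE plus the sums of squares of every omitted term:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(1)
    # Balanced 2 x 2 x 2 design with r = 3; the effect sizes are hypothetical
    rows = [(i, j, k, 10 + 2 * i - j + rng.normal())
            for i in range(2) for j in range(2) for k in range(2)
            for _ in range(3)]
    df = pd.DataFrame(rows, columns=["A", "B", "C", "y"])

    full = anova_lm(smf.ols("y ~ C(A) * C(B) * C(C)", data=df).fit())
    reduced = anova_lm(smf.ols("y ~ C(A) + C(B) + C(C)", data=df).fit())

    omitted = ["C(A):C(B)", "C(A):C(C)", "C(B):C(C)", "C(A):C(B):C(C)"]
    # Reduced-model residual SS = full-model SSE + omitted terms' SS
    print(reduced.loc["Residual", "sum_sq"])
    print(full.loc["Residual", "sum_sq"] + full.loc[omitted, "sum_sq"].sum())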

Hypothesis Tests
Hypothesis tests concerning main effects and interactions are based on the familiar
ANOVA assumption that the response values at each fixed factor–level combination
follow a normal distribution and that the variances of these distributions are the same,
regardless of the particular factor–level combination. From these assumptions, a sepa-
rate degrees of freedom and mean square can be computed for each source of variation.
The total degrees of freedom is n − 1, where n is the total number of experimental
runs. The degrees of freedom for each main effect equals its number of levels minus 1
and the degrees of freedom for any interaction term is simply the product of the degrees
of freedom for its component factors. The mean square associated with any main ef-
fect or interaction equals its sum of squares divided by its degrees of freedom. All of this
information is summarized in the form of an ANOVA table. For example, Figure 10.16
shows the general form of the ANOVA table for a three-factor design.

Source of variation    df                       SS         MS         F
A                      a − 1                    SSA        MSA        MSA/MSE
B                      b − 1                    SSB        MSB        MSB/MSE
C                      c − 1                    SSC        MSC        MSC/MSE
AB                     (a − 1)(b − 1)           SS(AB)     MS(AB)     MS(AB)/MSE
AC                     (a − 1)(c − 1)           SS(AC)     MS(AC)     MS(AC)/MSE
BC                     (b − 1)(c − 1)           SS(BC)     MS(BC)     MS(BC)/MSE
ABC                    (a − 1)(b − 1)(c − 1)    SS(ABC)    MS(ABC)    MS(ABC)/MSE
Error                  abc(r − 1)               SSE        MSE
Total variation        abcr − 1                 SST

Figure 10.16 ANOVA table for a factorial design with three factors, A, B, and C


Example 10.2 Over the past decade researchers and consumers have shown increased interest
in renewable fuels such as biodiesel, a form of diesel fuel derived from vegetable
oils and animal fats. According to www.fueleconomy.gov, compared to petroleum
diesel, the advantages of using biodiesel include its nontoxicity, biodegradability and
lower greenhouse gas emissions. One popular biodiesel fuel is fatty acid ethyl ester
(FAEE). The authors of “Application of the Full Factorial Design to Optimization
of Base-Catalyzed Sunflower Oil Ethanolysis” (Fuel, 2013: 433−442) performed an
experiment to determine optimal process conditions for producing FAEE from the
ethanolysis of sunflower oils. In one study, the effects of three process factors on
FAEE purity (%) were investigated.

Factor Factor name Factor levels


A Reaction Temperature 25°C, 50°C, 75°C
B Ethanol-to-oil molar ratio 6:1, 9:1, 12:1
C Catalyst loading .75 wt.%, 1.00 wt.%, 1.25 wt.%

Table 10.1 shows the data from this 3 × 3 × 3 experiment. Note that there are
r = 2 repeated tests run at each combination of factor levels. Figure 10.17 shows
the resulting ANOVA table. All effects except the BC and ABC interaction effects
are significant at α = .05. Because some interaction terms are significant, the
interaction plots must be examined when drawing conclusions about the factor
effects.
Plots of all two-factor interactions are shown in Figure 10.18, along with the
main effects plots for the three factors. Suppose we are interested in maximizing
the value of the response variable, FAEE purity. Looking at the interaction plots,
the combination of factor levels that best accomplishes this objective is A = 75°C,
B = 12:1, and C = 1.25%. In this example, the conclusions from the interaction
plots agree with the conclusions that we would have drawn from inspecting the main
effects plots.

Table 10.1 Purity (%) of fatty acid ethyl ester

                                    Ratio
                    6:1                    9:1                    12:1
Loading      .75    1.00   1.25     .75    1.00   1.25     .75    1.00   1.25
Temp
25          81.07  88.71  95.42    81.54  89.12  96.32    86.07  92.05  97.02
            82.22  87.61  94.06    82.82  86.49  95.45    87.73  91.72  96.16
50          87.31  89.52  94.68    87.99  90.05  96.44    89.61  90.32  98.30
            87.94  88.75  95.45    88.98  90.42  96.47    89.02  90.61  96.62
75          90.66  91.60  93.65    92.14  92.55  97.41    92.88  96.12  97.66
            91.87  92.34  95.73    92.22  97.06  97.08    93.30  97.41  97.59


Source df SS MS F P
A 2 215.38 107.69 112.07 .0000
B 2 74.51 37.26 38.77 .0000
C 2 602.72 301.36 313.60 .0000
AB 4 13.45 3.36 3.50 .0200
AC 4 107.41 26.85 27.94 .0000
BC 4 4.37 1.09 1.14 .3598
ABC 8 12.47 1.56 1.62 .1649
Error 27 25.95 .961
Total 53 1056.26
Figure 10.17 ANOVA for the data of Table 10.1
Figure 10.18 Two-factor interaction plots and main effects plots for Example 10.2 (cell-mean FAEE purity plotted for each pair of the factors TEMP, RATIO, and LOAD, together with main effects plots of the three factors)
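
Plots like those in Figure 10.18 are easily produced from a long-format data set. The sketch below rebuilds Table 10.1 as a pandas DataFrame (the column names are ours) and draws one interaction panel with statsmodels:

    import matplotlib.pyplot as plt
    import pandas as pd
    from statsmodels.graphics.factorplots import interaction_plot

    temps, ratios, loads = [25, 50, 75], [6, 9, 12], [0.75, 1.00, 1.25]
    # Purities from Table 10.1, flattened one temperature block at a time;
    # within a block: ratio, then loading, then the two replicates.
    purity = [81.07, 82.22, 88.71, 87.61, 95.42, 94.06, 81.54, 82.82, 89.12,
              86.49, 96.32, 95.45, 86.07, 87.73, 92.05, 91.72, 97.02, 96.16,
              87.31, 87.94, 89.52, 88.75, 94.68, 95.45, 87.99, 88.98, 90.05,
              90.42, 96.44, 96.47, 89.61, 89.02, 90.32, 90.61, 98.30, 96.62,
              90.66, 91.87, 91.60, 92.34, 93.65, 95.73, 92.14, 92.22, 92.55,
              97.06, 97.41, 97.08, 92.88, 93.30, 96.12, 97.41, 97.66, 97.59]
    vals = iter(purity)
    rows = [(t, rat, ld, next(vals))
            for t in temps for rat in ratios for ld in loads for _ in range(2)]
    df = pd.DataFrame(rows, columns=["TEMP", "RATIO", "LOAD", "FAEE"])

    # One panel of Figure 10.18: mean purity vs. ratio, one line per temperature
    interaction_plot(x=df["RATIO"], trace=df["TEMP"], response=df["FAEE"])
    plt.show()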


Based on years of empirical evidence, the results of a factorial experiment usually show
that, over the range of factor levels studied, only a few factors are significant and even fewer
interaction terms are significant. When all main effects and interactions are significant, the
experimenter should carefully examine how the test runs were conducted to make sure that
correct procedures were followed. Recall from Section 4.3 that the proper method of con-
ducting repeated tests is to completely replicate the experimental conditions for each test.
For example, in Example 10.2, the two repeated tests made at A = 25°C, B =
6:1 molar ratio, and C = .75 wt.% catalyst loading should be conducted by resetting
the apparatus used in the first test, substituting a new sunflower oil sample using the
specified molar ratio and catalyst loading, allowing the temperature to change and be
reset to 25°C, and then running the second test. If, instead, the experimenter simply
leaves the apparatus from the first test in place and immediately conducts a second test,
then the variation between the two FAEE purity responses is more likely to be a mea-
sure of the repeatability of the purity measurement system. It will not truly capture the
experimental error we would expect for any sunflower oil sample under the conditions
A = 25°C, B = 6:1 molar ratio, and C = .75 wt.% catalyst loading. Test runs that are
incorrectly conducted by simply taking two successive measurements usually result in
underestimating the experimental error MSE, thereby artificially increasing the F ratios
on which hypothesis tests are based.

Section 10.3 Exercises


15. Highly precise finishing methods are important for the manufacturing of ultraprecision optical parts, but conventional polishing methods have proven to be unsatisfactory. Magnetic abrasive finishing (MAF), a relatively new technology that uses abrasive particles surrounded by magnets that generate a magnetic field around the polishing area, has drawn attention as an alternative finishing method. The authors of "Run-to-Run Process Control of Magnetic Abrasive Finishing Using Bonded Abrasive Particles" (J. of Engr. Manuf., 2012: 1963–1975) examined the impact of MAF process control parameters on finishing outcomes. To see whether average surface roughness (Ra) is affected by the abrasive size (A), abrasive quantity (B), and quill gap (C), an experiment using three sizes, three quantities, and three gaps was performed, with two replicates at each of the factor combinations. The resulting sums of squares were SSA = 210.67, SSB = 132.17, SSC = 2586.35, SS(AB) = 57.48, SS(AC) = 636.84, SS(BC) = 875.00, SS(ABC) = 888.52, SSE = 5416.67, and SST = 10,803.70.
   a. Construct an ANOVA table for this data.
   b. Test to see whether any interaction effects are significant at α = .05.
   c. Test to see whether any main effects are significant at α = .05.

16. Factorial designs have been used in forestry to assess the effects of various factors on the growth behavior of trees. In one such experiment, researchers thought that healthy spruce seedlings should bud sooner than diseased spruce seedlings ("Practical Analysis of Factorial Experiments in Forestry," Canadian J. of Forestry, 1995: 446–461). In addition, before planting, seedlings were also exposed to three levels of pH to see whether this factor has an effect on virus uptake into the root system. The following table shows data from a 2 × 3 experiment to study both factors:

                                      pH
                          3            5.5            7
   Health    Diseased   1.2, 1.4,    .8, .6,       1.0, 1.0,
   status               1.0, 1.2,    .8, 1.0,      1.2, 1.4,
                        1.4          .8            1.2
             Healthy    1.4, 1.6,    1.0, 1.2,     1.2, 1.4,
                        1.6, 1.6,    1.2, 1.4,     1.2, 1.2,
                        1.4          1.4           1.4


    The response variable is an average rating of five buds from a seedling. The
    ratings are 0 (bud not broken), 1 (bud partially expanded), and 2 (bud fully
    expanded).
    a. Using a significance level of 5%, conduct an ANOVA test for this data.
       Indicate which factors are significant and whether the interaction term
       is significant.
    b. Create an effects plot for the factors that were found to be significant
       in part (a).
    c. What conclusions can you draw regarding the effects of the two factors on
       bud rating?

17. The output of a continuous extruding machine that coats steel pipe with plastic
    was studied as a function of thermostat temperature profile (A, at three levels),
    type of plastic (B, at three levels), and the speed (C, at three levels) of the
    rotating screw that forces the plastic through a tube-forming die. Two
    replications were obtained at each factor–level combination, yielding a total of
    54 observations. The sums of squares were SSA = 14,144.44, SSB = 5,511.27,
    SSC = 244,696.39, SS(AB) = 1,069.62, SS(AC) = 62.67, SS(BC) = 331.67,
    SSE = 3127.50, and SST = 270,024.33.
    a. Construct an ANOVA table for this experiment.
    b. Use the appropriate F ratios to show that none of the two- or three-factor
       interactions is significant at α = .05.
    c. Which main effects are significant at α = .05?

18. To see whether the force in drilling is affected by the drilling speed (A), feed
    rate (B), or material used (C), an experiment using four speeds, three rates, and
    two materials was performed, with two replicate samples drilled at each
    combination of levels of the three factors. A software package was used to obtain
    the sums of squares for the experimental data: SSA = 19,149.73,
    SSB = 2,589,047.62, SSC = 157,437.52, SS(AB) = 53,238.21, SS(AC) = 9,033.73,
    SS(BC) = 91,880.04, SSE = 56,819.50, and SST = 2,983,164.81.
    a. Construct an ANOVA table for this experiment, and identify significant
       effects using α = .01.
    b. Is there any single factor that appears to have no effect on thrust force?
       If so, how would you go about choosing the level of this factor that would
       minimize thrust force?

19. An experiment was conducted to investigate how the length of steel bars is
    affected by the time of day (A), heat treatment applied (B), and machine used (C).
    The three times were 8:00 a.m., 11:00 a.m., and 3:00 p.m. Two types of heat and
    four machines were used. The data from this 3 × 2 × 4 factorial design is given
    in the following table. Note: Data is coded as 1000(length − 4.380); this does
    not affect the analysis.

    B1
          C1            C2            C3            C4
    A1    6, 9, 1, 3    7, 9, 5, 5    1, 2, 0, 4    6, 6, 7, 3
    A2    6, 3, 1, −1   8, 7, 4, 8    3, 2, 1, 0    7, 9, 11, 6
    A3    5, 4, 9, 6    10, 11, 6, 4  −1, 2, 6, 1   10, 5, 4, 8

    B2
          C1            C2            C3            C4
    A1    4, 6, 0, 1    6, 5, 3, 4    −1, 0, 0, 1   4, 5, 5, 4
    A2    3, 1, 1, −2   6, 4, 1, 3    2, 0, −1, 1   9, 4, 6, 3
    A3    6, 0, 3, 7    8, 7, 10, 0   0, −2, 4, −4  4, 3, 7, 0

    a. Construct an ANOVA table for this data.
    b. Test to see whether any interaction effects are significant at α = .05.
    c. Test to see whether any main effects are significant at α = .05.

20. The deposition of thick protective coatings on substrates can be facilitated by
    laser cladding, in which an alloy powder is melted on the substrate surface.
    Experiments were conducted to determine how three processing parameters, laser
    power (A), scanning velocity (B), and powder flow rate (C), affect the coating
    hardness (“Laser Cladding: An Experimental Study of Geometric Form and Hardness
    of Coating Using Statistical Analysis,” J. of Engr. Manuf., 2006: 1549–1554).
    Each factor had three levels, and there was one observation at each factor-level
    combination. The following corresponds to the ANOVA table from the article; only
    main effects and two-factor interactions were considered there:

    SOURCE    DF    SS         MS
    ?         ?     ?          63.24
    ?         ?     2034.74    ?
    ?         ?     480.26     ?
    ?         ?     ?          6.48
    ?         ?     729.04     ?
    ?         ?     115.26     ?
    Error     ?     ?          104.26
    Total     ?     ?


    a. Fill in the missing entries in the table.
    b. Identify significant effects using α = .01.

21. Recently, nickel titanium (NiTi) shape memory alloy (SMA) has become widely used
    in medical devices. This is attributable largely to the alloy’s shape memory
    effect (material returns to its original shape after heat deformation),
    superelasticity, and biocompatibility. An alloy element is usually coated on the
    surface of NiTi SMAs to prevent toxic Ni release. The alloy element is coated by
    laser cladding, a technique first described in Exercise 20.
    The authors of “Parametrical Optimization of Laser Surface Alloyed NiTi Shape
    Memory Alloy with Co and Nb by the Taguchi Method” (J. of Engr. Manuf., 2012:
    969–979) conducted a study to see whether the percent by weight of nickel in the
    alloyed layer is affected by carbon monoxide powder paste thickness (A, at three
    levels), scanning speed (B, at three levels), and laser power (C, at three
    levels). One observation was made at each factor-level combination (Note:
    Thickness column headings were incorrect in the cited article):

                         Paste Thickness
    Power    Speed    .2       .3       .4
    600      600      38.64    35.13    19.20
             900      38.16    34.24    26.23
             1200     37.54    33.46    30.44
    700      600      36.56    35.91    34.62
             900      39.16    33.10    28.71
             1200     37.06    31.78    21.50
    800      600      39.44    40.42    37.21
             900      39.34    37.64    35.65
             1200     39.30    34.97    32.50

    a. Construct an ANOVA table for this experiment, including all main effects and
       two-factor interactions (as did the authors of the cited article).
    b. Use the appropriate F ratios to show that none of the two-factor interactions
       is significant at α = .05.
    c. Which main effects are significant at α = .05?

22. A four-factor factorial design was used to investigate the effect of fabric (A),
    type of exposure (B), level of exposure (C), and fabric direction (D) on the
    extent of color change as measured by a spectrocolorimeter (from “Accelerated
    Weathering of Marine Fabrics,” J. Testing and Eval., 1992: 139–143). Two
    observations were made at each combination of the factor levels. The resulting
    mean squares were MSA = 2,207.329, MSB = 47.255, MSC = 491.783, MSD = .44,
    MS(AB) = 15.303, MS(AC) = 275.446, MS(AD) = .470, MS(BC) = 2.141, MS(BD) = .273,
    MS(CD) = .247, MS(ABC) = 3.714, MS(ABD) = 4.072, MS(ACD) = .767, MS(BCD) = .280,
    and MSE = .977. Perform an analysis of variance using α = .01 for all tests, and
    summarize your conclusions.

23. One property of automobile air bags that contributes to their ability to absorb
    energy is the permeability of the woven material used to construct the air bags.
    Understanding how permeability is influenced by various factors is important for
    increasing effectiveness. In one study, the effects of three factors were
    studied: temperature (A), fabric denier (B), and air pressure (C). Two specimens
    were measured at each factor-level combination (“Analysis of Fabrics Used in
    Passive Restraint Systems—Airbags,” J. of the Textile Institute, 1996: 554–571).

                                     Temperature
                      8                     50                    75
    Pressure    17.2  34.4  103.4    17.2  34.4  103.4    17.2  34.4  103.4
    420-D        73   157   332       52   125   281       37    95   276
                 80   155   332       51   118   264       31   106   281
    630-D        35    91   288       16    72   169       30    91   213
                 43    98   271       12    78   173       41   100   211
    840-D       125   234   477       90   149   338      102   170   307
                111   233   464      100   155   350       98   160   311
    (rows: Denier)


10.4 2^k Designs
The minimum number of experimental runs needed for a factorial experiment can
increase rapidly as more factors are added to an experiment. Recall, for instance, that
to study factor A at three levels, factor B at two levels, and factor C at four levels, a
minimum of 3 × 2 × 4 = 24 runs are needed, one run for each different combination
of factor levels. If each test run is replicated r times, then the total number of runs will
further increase by a factor of r. As a consequence, the cost of resources needed to con-
duct a factorial experiment can quickly become prohibitive.
One method of combating the problem of extremely large numbers of runs is to use
only two levels of each of the factors of interest. Using this approach to study k different
factors, each having only two levels, the minimum number of experimental runs need-
ed is 2 × 2 × 2 × ⋯ × 2 = 2^k, which is the reason such experiments are called 2^k factorial
designs. These designs are very popular in the research and development of products
and processes, not only because they require smaller sample sizes but also because the
associated statistical analyses are exceedingly simple and, if necessary, can even be done
by hand.

Coding Schemes and the Design Matrix


It is convenient to use coding schemes to describe the factor levels in a 2^k experiment.
Two such schemes are in common use, one based on + and − signs, the other based
on using lowercase English letters. The + and − sign scheme is particularly useful for
simplifying the computations needed for the analysis of a 2^k design. The other coding
scheme is better for compactly describing the particular combinations of factor levels
used in an experiment. It is useful to understand both methods.
In the + and − coding method, +1 is used to denote one level of a factor, often
called the high level, whereas −1 is used to denote the low level. For example, if
a factor such as temperature is studied at the two levels 60°F and 100°F, then we
would use −1 to code 60°F and +1 to code 100°F. Although it does not matter
which factor level is assigned +1 or −1, for numerical factors (such as temperature)
it is usually best to assign the +1 coding to the numerically larger of the two levels.
For qualitative factors, such as the brand of raw material used, it does not matter
which brand is assigned the +1 or −1 code. When creating models that relate the
factors to the response variable, the actual factor settings are called the uncoded
factor levels.

                    Uncoded factor levels    Coded factor levels
Factor A,           60°F                     −1 (low level of A)
temperature         100°F                    +1 (high level of A)

The ±1 coding scheme provides a quick method for listing all 2^k experimental
runs. Using capital letters A, B, C, . . . , to denote the names of the k factors in an experi-
ment, we form k columns of +1 and −1 values according to the following rule:


Creating the Design Matrix for a 2^k Experiment

Column 1: Starting with −1, create a column of length 2^k by alternating −1 and
+1 values.
Column 2: Create a column of alternating blocks of two −1 values and two +1
values.
Column 3: Create a column of alternating blocks of four −1 values and four +1
values.
Column 4: Create a column of alternating blocks of eight −1 values and eight +1
values.
.
.
.
Continue in this manner, using block sizes that are successive powers of 2, until all
k columns have been formed.
When these columns are placed side by side, they form the design matrix of the experi-
ment, in which each row specifies a particular combination of factor settings. That is,
each row constitutes one of the 2^k experimental test runs. The order in which these runs
are listed in the design matrix is called Yates standard order after Frank Yates, a col-
league of Fisher’s who helped develop the methodology of factorial designs.
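The block-alternation rule is easy to automate. Here is a minimal Python sketch (the
function name design_matrix and the use of NumPy are our own illustration, not part
of the text) that generates the design matrix of a 2^k experiment in Yates standard order:

```python
import numpy as np

def design_matrix(k):
    """Design matrix of a 2^k experiment in Yates standard order.

    Column j (j = 1, ..., k) alternates blocks of 2^(j-1) copies of -1
    with blocks of 2^(j-1) copies of +1, as described in the text.
    """
    n = 2 ** k
    cols = []
    for j in range(k):
        block = 2 ** j                                  # block size: 1, 2, 4, ...
        pattern = np.concatenate([-np.ones(block), np.ones(block)])
        cols.append(np.tile(pattern, n // (2 * block)))
    return np.column_stack(cols).astype(int)

# The k = 3 case reproduces the eight runs shown in Example 10.3.
print(design_matrix(3))
```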

Example 10.3  For a 2^3 experiment based on the factors A, B, and C, the eight experimental runs in
Yates standard order are as follows:

Run     A     B     C
1      −1    −1    −1
2      +1    −1    −1
3      −1    +1    −1
4      +1    +1    −1
5      −1    −1    +1
6      +1    −1    +1
7      −1    +1    +1
8      +1    +1    +1

The alternative coding scheme used with 2^k designs is based on lowercase letters a, b, c, d, . . . ,
which are intended to denote the high levels of the corresponding factors A, B, C, D, . . . . To denote a
particular experimental run, we form a string of lowercase letters, showing which factors in the run are
set to their high levels. Letters are omitted for factors that are set to their low levels. For instance, in a 2^3
experiment with factors A, B, and C, the combination of letters ab refers to the test run in which both A
and B are set at their high levels and C is set at its low level. Similarly, the letter b denotes the run with B
high and both A and C low. The notation (1) is used for the one test run in which all factors are set to their
low levels.


Example 10.4  Using the letter coding method, the eight test runs of the 2^3 experiment in Example
10.3 are coded as follows. The table shows the letter codes that correspond to runs
that have been written in Yates standard order:

Run     A     B     C     Letter code
1      −1    −1    −1     (1)
2      +1    −1    −1     a
3      −1    +1    −1     b
4      +1    +1    −1     ab
5      −1    −1    +1     c
6      +1    −1    +1     ac
7      −1    +1    +1     bc
8      +1    +1    +1     abc
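One possible helper for the letter coding (again our own illustration, not from the
text) converts a row of the design matrix to its code:

```python
def letter_code(row, factor_names="abcdefghijklmnopqrstuvwxyz"):
    """Letter code of one run: the lowercase letters of the factors set
    to their high (+1) level; '(1)' when every factor is at its low level."""
    code = "".join(name for name, level in zip(factor_names, row) if level == 1)
    return code if code else "(1)"

# Applied to each row of design_matrix(3), this reproduces the codes
# (1), a, b, ab, c, ac, bc, abc of Example 10.4.
```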

Conducting an Experiment
Yates’s method for generating the columns of the design matrix provides a quick
and organized method for laying out the factor–level combinations of a 2^k experi-
ment. However, when it comes to actually performing the experimental tests, test
runs should be conducted in random order. Randomization of experimental runs, first
discussed in Section 4.3, helps reduce the possible effects of unknown factors on the
test results.
To see why randomization is used, suppose that we begin to conduct the runs
in a 2^3 experiment in standard order (as in Example 10.3) but that unforeseen prob-
lems occur and only half the runs can be performed in one day, the remaining runs
being postponed until later in the week. Because the runs are not randomized,
factor C is always at its low level during the first day of testing. Later in the week,
the remaining runs will be conducted when C is at its high level and when other
external conditions may possibly have changed. Consequently, any effect that C has
on the response will be commingled with the effects of changing conditions during
the week. If statistical tests eventually show that factor C has a significant effect on
the response, the experimenter will not be able to tell whether this effect is really
caused by factor C or, instead, if it is caused by changes in other conditions that
might have arisen between the two days of testing. If the test runs had been ran-
domized, there would have been a much smaller chance that such external factors
could systematically influence the test results. For instance, it is highly unlikely that
a randomized run sequence would have resulted in having C always at its low level
during the first half of the runs.

Example 10.5  To randomize the test runs in a 2^3 experiment, first find the total number of runs
required, including replicated runs. For example, if we decide to conduct two rep-
licate runs for each factor–level combination, then a total of N = r·2^k = 2 × 2^3 = 16


test runs must be conducted. Using a random number generator in a statistical


computer program or spreadsheet, choose a random sample of size N, without
replacement, from the integers 1, 2, 3, . . . , N. Assign the first random number
chosen to the first row in the design matrix, the second random number to the
second row, and so forth. These numbers indicate the order in which the tests are
to be conducted.
Suppose, for instance, that the random sample of 1, 2, 3, . . . , 16 turns out
to be

8 6 10 16 4 15 7 3 14 1 11 12 5 13 2 9

Proceeding down the rows of the design matrix, we write the first set of eight random
numbers. Returning to the top row, we record the second set of eight random num-
bers. According to this randomization, the experimenter should begin by conduct-
ing run 2, followed by run 7, then run 8, run 5, run 5, run 2, and so forth. The
response value measured at each run is recorded in the row corresponding to its test
number.

Run     A     B     C     Run order     Responses
1      −1    −1    −1      8    14      y11, y12
2      +1    −1    −1      6     1      y21, y22
3      −1    +1    −1     10    11      y31, y32
4      +1    +1    −1     16    12      y41, y42
5      −1    −1    +1      4     5      y51, y52
6      +1    −1    +1     15    13      y61, y62
7      −1    +1    +1      7     2      y71, y72
8      +1    +1    +1      3     9      y81, y82
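A random run order like the one above can be produced with any random number
utility. A minimal Python sketch (our own illustration; the seed value is arbitrary and
shown only so the output is reproducible):

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # hypothetical seed, for reproducibility
r, k = 2, 3
N = r * 2 ** k                        # 16 total test runs

# A random permutation of 1..N; the first 2^k numbers go down the rows of
# the design matrix (first replicate), the next 2^k down the rows again.
run_order = rng.permutation(N) + 1
print(run_order.reshape(r, 2 ** k))
```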

Calculating Effects Estimates


Main effects and two-factor interaction effects can be plotted for a 2^3 experiment in
exactly the same manner as described for general factorial designs in Sections 10.2 and
10.3. Furthermore, in 2^k designs, it is also possible to calculate a numerical estimate for
each main effect and interaction effect. Main effects and two-factor interaction effects
are defined as follows:

definitions The main effect of a factor is the average response value for all test runs at the
high level of the factor minus the average response value for runs at the low level
of the factor.
The two-factor interaction effect is one-half of the difference between the main
effects of one factor calculated at the two levels of the other factor.


These definitions are best understood by considering a numerical example. The
2^4 experiment in Table 10.2 shows four process variables used in the first stages of
an industrial chemical reaction. The response variable is the percentage of a critical
chemical that is converted during the first stage of the reaction. (We thank Eric Ziegel
of AMOCO Corp. for providing this data.)

Table 10.2  2^4 experiment for studying the effects of four
factors on the percent yield of a chemical reaction

Factor                   Low level    High level
A, Pressure (psi)        14.0         20.0
B, Steam ratio           7.5          11.5
C, Throughput rate       .52          .66
D, Temperature (°F)      1150         1200

Run     A     B     C     D       y
1      −1    −1    −1    −1    27.22
2      +1    −1    −1    −1    25.19
3      −1    +1    −1    −1    23.23
4      +1    +1    −1    −1    18.93
5      −1    −1    +1    −1    25.32
6      +1    −1    +1    −1    22.61
7      −1    +1    +1    −1    26.80
8      +1    +1    +1    −1    20.20
9      −1    −1    −1    +1    44.53
10     +1    −1    −1    +1    42.44
11     −1    +1    −1    +1    43.78
12     +1    +1    −1    +1    37.66
13     −1    −1    +1    +1    42.16
14     +1    −1    +1    +1    38.97
15     −1    +1    +1    +1    48.85
16     +1    +1    +1    +1    42.05


The main effect for factor D in this experiment is calculated as follows:

    main effect for D = (1/8)(44.53 + 42.44 + 43.78 + ⋯ + 42.05)
                      − (1/8)(27.22 + 25.19 + 23.23 + ⋯ + 20.20)
                      = 42.56 − 23.69 = 18.87
That is, we estimate that changing factor D from its low to its high level results in an in-
crease of about 18.87 in the response variable. Figure 10.19 shows the main effect graph for
factor D. Note that the vertical distance (dashed line) in this graph is the numerical
value of the main effect. The sloped line connecting the two average response values
shows whether the effect is increasing or decreasing the response value as we change
from the low to the high level of the factor.


Figure 10.19  Main effect for factor D for the data of Table 10.2 (the average
response is 23.69 at the low level of D and 42.56 at the high level; the vertical
distance, 18.87, is the main effect)

The interaction effect between two factors A and B is denoted by writing either AB or
A × B. Both notations are found in the literature. The interaction between three fac-
tors A, B, and C is written either as ABC or A × B × C; four-factor interactions are
written ABCD or A × B × C × D, and so forth. To illustrate the calculation of a two-
factor interaction effect, consider the BC interaction for the experiment in Table 10.2.
Figure 10.20 shows the BC interaction graph created by plotting the average response
values for all four combinations of levels of B and C. Each plotted point is now the aver-
age of four data points, not eight. For instance, the point where B is low and C is low is
the average of the data points 27.22, 25.19, 44.53, and 42.44. With B on the horizontal
axis, the pairs of points with the same level of C are joined by line segments. These two
lines show the main effect of changing B from low to high while holding each level of C
fixed. As you can see from the graph, the effect of changing B from low to high is very
different for the two levels of C. The BC interaction is defined to be one-half of the dif-
ference between the main effect for B with C at its high level and the main effect for B
with C at its low level:

    BC interaction effect = (1/2)[(34.48 − 32.27) − (30.90 − 34.85)] = 3.08

Figure 10.20  BC interaction effect for the data of Table 10.2 (the interaction is half
of the B effect when C is held at +1, where the average response rises from 32.27 to
34.48, minus the B effect for C held at −1, where it falls from 34.85 to 30.90)


We leave it as an exercise for the reader to show that the calculation of an interac-
tion effect does not depend on the order in which the factors appear. That is, the BC


and CB interaction effects are exactly the same and are both treated as measures of the
same two-factor interaction.
The definitions of higher-order interaction effects become more complex as the
number of factors increases. For example, the three-factor ABC interaction is defined
to be one-half of the difference between the AB interaction values calculated at the two
levels of C. As is the case with all interaction calculations, this definition is symmetric
in the sense that the ABC interaction can also be calculated by using the difference
between the BC interactions at the two levels of A or by using the difference between
the AC interaction values at the two levels of factor B.
Fortunately, there is a much simpler method for calculating interactions of any
order. Starting with the design matrix, we first create additional columns by forming
all possible products (two at a time, three at a time, etc.) of columns in the design
matrix. It is convenient to append these columns to the right of the design matrix.
For example, an AB column is formed by multiplying the corresponding entries in
columns A and B, an ABC column is formed by multiplying across the rows of A, B,
and C, and so forth. Next, a contrast is calculated for each column in the extended
matrix by multiplying the signs in a particular column by the column of response
values and then summing. In the case where there are repeated runs at each factor–
level setting (as illustrated in Example 10.5), the column signs are multiplied by the
total of the responses at each factor–level combination. Each contrast is given the
name of the column from which it is constructed. For instance, in a 2^3 design, there
will be contrasts for A, B, C, AB, AC, BC, and ABC. The final step is to divide each
contrast by half the number of runs:

    effect estimate = contrast/(r·2^(k−1)) = contrast/(half the number of runs)

The resulting values will be the estimates for each main effect and each interaction
effect.
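For readers who wish to check the hand computations in the examples that follow,
here is a minimal Python sketch of the contrast method for an unreplicated design
(the function and variable names are our own, not part of the text):

```python
import numpy as np
from itertools import combinations

def effect_estimates(X, y, names):
    """All main-effect and interaction estimates for an unreplicated 2^k
    design.  X is the n-by-k design matrix of -1/+1 codes, y the n
    responses.  Each contrast is the signed sum of the responses using
    the appropriate product column of signs; dividing the contrast by
    half the number of runs (n/2) gives the effect estimate."""
    n, k = X.shape
    estimates = {}
    for order in range(1, k + 1):
        for cols in combinations(range(k), order):
            signs = np.prod(X[:, list(cols)], axis=1)   # e.g. the BC column
            label = "".join(names[c] for c in cols)
            estimates[label] = signs @ y / (n / 2)
    return estimates

# With X = design_matrix(4) and y the 16 responses of Table 10.2, this
# returns, for example, D = 18.87, BC = 3.08, and ABC = -.15.
```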

Example 10.6  To calculate all main effects and interaction effects for the 2^4 design in Table 10.2,
the design matrix (in Yates standard order) is first extended to include all possible
products of columns. For illustration, the BC and ABC columns are shown here.
As part of Exercise 25, the reader should fill in the remaining columns (AB, AC,
AD, BD, CD, ABD, ACD, BCD, and ABCD, omitted below).

Run     A     B     C     D     BC    ABC       y
1      −1    −1    −1    −1     +1     −1     27.22
2      +1    −1    −1    −1     +1     +1     25.19
3      −1    +1    −1    −1     −1     +1     23.23
4      +1    +1    −1    −1     −1     −1     18.93
5      −1    −1    +1    −1     −1     +1     25.32
6      +1    −1    +1    −1     −1     −1     22.61
7      −1    +1    +1    −1     +1     −1     26.80
8      +1    +1    +1    −1     +1     +1     20.20
9      −1    −1    −1    +1     +1     −1     44.53
10     +1    −1    −1    +1     +1     +1     42.44
11     −1    +1    −1    +1     −1     +1     43.78
12     +1    +1    −1    +1     −1     −1     37.66
13     −1    −1    +1    +1     −1     +1     42.16
14     +1    −1    +1    +1     −1     −1     38.97
15     −1    +1    +1    +1     +1     −1     48.85
16     +1    +1    +1    +1     +1     +1     42.05


As a check on your calculations, each column in the extended design matrix should
consist of exactly half +1s and half −1s. By multiplying each entry in an effect
column by the corresponding entry in the response column and summing, we obtain
the contrast for each effect. For instance, the A, BC, and ABC contrasts are

A contrast = −27.22 + 25.19 − 23.23 + 18.93 − 25.32 + 22.61
             − 26.80 + 20.20 − 44.53 + 42.44 − 43.78 + 37.66
             − 42.16 + 38.97 − 48.85 + 42.05
           = −33.84

BC contrast = 27.22 + 25.19 − 23.23 − 18.93 − 25.32 − 22.61
              + 26.80 + 20.20 + 44.53 + 42.44 − 43.78 − 37.66
              − 42.16 − 38.97 + 48.85 + 42.05
            = 24.62

ABC contrast = −27.22 + 25.19 + 23.23 − 18.93 + 25.32 − 22.61
               − 26.80 + 20.20 − 44.53 + 42.44 + 43.78 − 37.66
               + 42.16 − 38.97 − 48.85 + 42.05
             = −1.20
Because the total number of test runs is 16, each contrast is divided by 8 (half the
number of runs) to obtain the effect estimates:

    Main effect for A = −33.84/8 = −4.23

    BC interaction effect = 24.62/8 = 3.08

    ABC interaction effect = −1.20/8 = −.15

Analyzing a 2^k Experiment
Having obtained estimates of all main effects and interaction effects, we must now use a
statistical procedure to sort the important effects from the unimportant ones. The particu-
lar procedure used depends on whether the experiment is replicated. If only one response


value is measured for each test run, then there is no replication of test runs; consequently,
no estimate of experimental error is available. In this commonly occurring situation, the
recommended procedure is to create a normal quantile or probability plot of the effects and
fit, by eye, a straight line through the “small” effects, that is, the effects with magnitudes close
to zero. Only the effects that do not fall on or near the straight line are considered to be the
important ones. The effects falling near the line are thought to be due to experimental error
or “noise.” To date, there is no universally agreed-upon method for deciding which group
of “small” effects to fit with a straight line. Fortunately, though, decades of empirical stud-
ies have shown that the nonsignificant effects usually comprise the majority of the plotted
points, so fitting an appropriate straight line is usually fairly easy.
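Any statistics package will draw such a plot. As one possible sketch (using matplotlib
and scipy, our own choice of tools rather than anything prescribed by the text), the
effect estimates can be plotted against standard normal quantiles:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def effects_normal_plot(estimates):
    """Normal quantile plot of effect estimates: ordered effects against
    standard normal quantiles.  'Small' effects should fall near a straight
    line; effects well off the line are candidates for importance."""
    items = sorted(estimates.items(), key=lambda kv: kv[1])
    labels = [lab for lab, _ in items]
    values = np.array([val for _, val in items])
    n = len(values)
    # Blom plotting positions for the normal quantiles
    quantiles = stats.norm.ppf((np.arange(1, n + 1) - .375) / (n + .25))
    plt.scatter(values, quantiles)
    for v, q, lab in zip(values, quantiles, labels):
        plt.annotate(lab, (v, q))
    plt.xlabel("Effect")
    plt.ylabel("Normal quantile")
    plt.show()
```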

Example 10.7  The 15 effects A, B, C, D, AB, . . . , ABCD for the 2^4 experiment in Table 10.2 are shown
in a normal quantile plot in Figure 10.21. As expected, many of the small effects tend to
fall very close to a straight line (fit by eye). The effects that fall off the line appear to be
A, D, AB, BD, and BC, although it is possible the last three may be close enough to the
line to ignore. Based on these results, we tentatively propose that these five effects are
the only ones that matter in the experiment. In particular, factor D (temperature) has
a large positive effect on the response variable (% of chemical converted). Specifically,
changing D from its low to high level causes an increase in the percentage of chemi-
cal converted. The situation for the other factors is not as clear since the three smaller
interaction terms, AB, BD, and BC, are potentially significant, which means that their
interaction plots must be examined before deciding on the best settings for these factors.

Figure 10.21  Normal quantile plot of the effects (response is % conversion)

Another method for separating the important effects from the others is to assume
that certain higher-order effects are nonsignificant and to use these effects to obtain an
estimate of the experimental error. This procedure is based on decades of empirical
evidence suggesting that main effects and two-factor interaction effects are usually the
most important ones in an experiment. Given a choice, the method based on normal


probability plotting is usually more reliable than simply making assumptions about the
outcomes of an experiment. However, when a normal probability plot suggests that the
higher-order interactions may indeed be insignificant, there is more justification for
combining these effects to calculate an SSE.
Suppose that m effects, call them E1, E2, E3, . . . , Em, are thought to be insig-
nificant. To create an estimate of the variance of any effect, we form the average of the
squared effect values:

    variance of any effect ≈ (1/m) Σ_{i=1}^{m} E_i²

This estimate has m degrees of freedom associated with it. Consequently, confi-
dence intervals for the remaining effects in the experiment can be constructed by using
the following formula:

    confidence interval for effect E:  E ± (t critical value)·√((1/m) Σ_{i=1}^{m} E_i²)    (df = m)

Example 10.8  The normal quantile plot in Figure 10.21 shows that only a few of the main
effects and two-factor interactions are likely to be significant for the experiment in
Table 10.2. Consequently, it is reasonable to assume that at least the three-factor and
four-factor interactions are negligible and can be safely used to derive confidence
intervals for the remaining 10 effects. Combining these m = 5 effects allows us to
approximate the variance of any effect as follows:

     Effect name    Effect estimate    Squared effect
1    ABC            −.150              (−.150)²
2    ABD            −.185              (−.185)²
3    ACD             .150              (.150)²
4    BCD             .748              (.748)²
5    ABCD            .255              (.255)²
                                       Sum = .70375

    variance of any effect ≈ .70375/5 = .14075

We can then determine the 95% confidence interval for an effect E as follows:

    E ± t_{α/2}·√((1/m) Σ_{i=1}^{m} E_i²) = E ± (t critical value for 5 df)·√.14075
                                          = E ± (2.571)(.3752)
                                          = E ± .965
For instance, a 95% confidence interval for the D effect is 18.87 ± .965. Since this in-
terval does not contain 0, we conclude that the D effect is significantly different from 0.
Similarly, a 95% interval estimate for the BC interaction is 3.08 ± .965, which indicates
that the BC interaction is also significant. The same effects identified by the normal
quantile plot (A, D, AB, BC, and BD) turn out to be the only significant effects identi-
fied when we assume that the three- and four-factor interactions are negligible.
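The arithmetic of Example 10.8 takes only a few lines. A sketch in Python (our own
illustration; scipy supplies the t critical value):

```python
import numpy as np
from scipy import stats

# Higher-order effect estimates assumed negligible in Example 10.8:
# ABC, ABD, ACD, BCD, ABCD
pooled = np.array([-.150, -.185, .150, .748, .255])
m = len(pooled)
var_effect = (pooled ** 2).mean()                      # .14075
half_width = stats.t.ppf(.975, df=m) * np.sqrt(var_effect)
# 2.571 * .3752, or about .965

# Any effect whose magnitude exceeds .965 is judged significant at the
# 5% level, e.g. D = 18.87 and BC = 3.08.
```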


When test runs are replicated, that is, when r ≥ 2, then a 2^k experiment can be ana-
lyzed using ANOVA techniques. The sum of squares for any main effect or interaction
effect can easily be computed from the effect’s contrast:

    sum of squares for an effect = contrast²/(r·2^k) = contrast²/(total number of runs)

The error sum of squares, SSE, can be computed in two ways: (1) by calculating the
total sum of squares, SST, for the data and then subtracting the sums of squares of the
effect estimates or (2) directly, by finding the error variation for each of the 2^k test runs.
Both methods are illustrated in Example 10.9.
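Sketches of both computations in Python (our own illustration; here Y holds one row
per factor–level combination and one column per replicate):

```python
import numpy as np

def effect_sum_of_squares(sign_column, Y):
    """SS for one effect in a replicated 2^k design: the column of signs
    is applied to the response totals to get the contrast, and then
    SS = contrast^2 / (total number of runs)."""
    contrast = sign_column @ Y.sum(axis=1)
    return contrast ** 2 / Y.size

def error_sum_of_squares(Y):
    """SSE computed directly: within-combination variation about each
    run's mean, summed over all 2^k factor-level combinations."""
    return ((Y - Y.mean(axis=1, keepdims=True)) ** 2).sum()

# For the data of Table 10.3 below, the A column (-1, +1, ..., +1) yields
# a contrast of -47.0 and SS = 138.063, and error_sum_of_squares
# returns 26.5, matching the hand calculations in Example 10.9.
```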

Example 10.9 Compact discs (CDs) and digital video discs (DVDs) are manufactured by the same
process. First, a master disc is created by baking a photosensitive material on a round
glass plate. Next, timed pulses from a laser beam etch digital signals in a tight spiral
on the plate. The plate is then “developed” to reveal a sequence of surface “pits” that
encode the digital information. Master plates are electroplated to produce metal
“stampers,” which, when placed in plastic injection molding machines, press thou-
sands of copies of the final disc.
Some of the factors that affect the mastering stage are listed here, along with the
factor levels that were used in a 2^3 experiment on compact discs. The goal of the ex-
periment was to minimize an electronic response called “jitter,” which is a measure
of how well the CD can be read by a CD-ROM device. The factor “linear velocity” is
a measure of the speed with which the laser travels in a slowly increasing spiral path
as it burns the pits in the photosensitive material.
Factor Low level High level
Laser power 90% 110%
Developing time 20 sec 30 sec
Linear velocity 1.20 1.30
Data from two replicated runs of the 2^3 experiment is given in Table 10.3. The
16 test runs were conducted in random order, but the data is presented in the table
in Yates standard order. By extending the design matrix (see Example 10.6) and
applying the columns of “+” and “−” signs to the column of response totals, we
obtain the contrasts, effects, and sums of squares listed below Table 10.3.

Table 10.3  Data for the 2^3 experiment in Example 10.9

Run     A     B     C     Response values
1      −1    −1    −1     34    40
2      +1    −1    −1     26    29
3      −1    +1    −1     33    35
4      +1    +1    −1     21    22
5      −1    −1    +1     24    23
6      +1    −1    +1     23    22
7      −1    +1    +1     19    18
8      +1    +1    +1     18    18


Effect                               Symbol    Contrast    Effect estimate        Effect sum of squares
Laser power                          A         −47.0       −47.0/8 = −5.875       (−47.0)²/16 = 138.063
Linear velocity                      B         −37.0       −37.0/8 = −4.625       (−37.0)²/16 = 85.563
Developing time                      C         −75.0       −75.0/8 = −9.375       (−75.0)²/16 = 351.563
Laser power × Linear velocity        AB         −5.0        −5.0/8 = −.625         (−5.0)²/16 = 1.563
Laser power × Developing time        AC         41.0        41.0/8 = 5.125         (41.0)²/16 = 105.063
Linear velocity × Developing time    BC         −1.0        −1.0/8 = −.125         (−1.0)²/16 = .063
Laser power × Linear velocity
  × Developing time                  ABC         7.0         7.0/8 = .875           (7.0)²/16 = 3.063

The total sum of squares SST can be found by calculating the sample variance of all
16 measurements and then multiplying by 15. Thus SST = 15(6.8869)² = 711.441.
Subtracting all of the effects sums of squares from SST gives the value of SSE =
711.441 − 138.063 − 85.563 − ⋯ − .063 − 3.063 = 26.5. Alternatively, the error
variation can be calculated separately for each run by finding Σ_{i=1}^{r}(y_i − ȳ)² and then
summing all 2^3 = 8 results:
Run     Σ_{i=1}^{r}(y_i − ȳ)²
1       18.0
2        4.5
3        2.0
4         .5
5         .5
6         .5
7         .5
8         .0
        SSE = 26.5
Finally, converting SSE to MSE by dividing by the error degrees of freedom
(r − 1)2^k, we have the following ANOVA table from Minitab:

Source    DF    SS         MS         F         P
A          1    138.063    138.063     41.68    0.000 *
B          1     85.563     85.563     25.83    0.000 *
C          1    351.563    351.563    106.13    0.000 *
AB         1      1.563      1.563      0.47    0.512
AC         1    105.063    105.063     31.72    0.000 *
BC         1      0.063      0.063      0.02    0.894
ABC        1      3.063      3.063      0.92    0.364
Error      8     26.500      3.313
Total     15    711.441


At a significance level of α = .01, this table shows that the significant effects
are A, B, C, and the AC interaction. Because B does not appear to interact with A
or C (i.e., neither the AB nor the BC interaction is significant), we can immediately
conclude that increasing the linear velocity will, on average, cause the response vari-
able to decrease by about 4.625 units. Because the AC interaction is significant, it is
necessary to examine the AC interaction plot before deciding on the proper settings
for A and C (Figure 10.22). From the plot, we see that the settings that minimize the
response variable are A = −1 and C = +1. In this example, the conclusions from
the interaction plot do not agree with those from the main effects plots, which would
have (incorrectly) indicated that both A and C should be set at their −1 levels.
Figure 10.22  Laser power × developing time (AC) interaction plot for Example 10.9
(mean response versus developing time, with one line for laser power high, xA = +1,
and one for laser power low, xA = −1)

Fitting a Model
After the important effects have been identified, it is often useful to write an equation
for predicting the response value. Although in other applications this task would require
the methods of regression analysis (Chapters 3 and 11), the special arrangement of
factor levels in a factorial design makes it especially easy to find prediction equations. All

that is needed is to define k predictor variables, one for each factor in a 2^k experiment.
These predictor variables, also called indicator variables or dummy variables, use the
same +1 and −1 coding that we used to form the design matrix. For example, the indi-
cator variable for factor A is denoted by xA and is defined as follows:

    xA = +1 when A is at its high level
    xA = −1 when A is at its low level
In the same fashion, indicator variables xB, xC, . . . are defined for the remaining factors
in the experiment. Interaction terms are represented by products of the indicator vari-
ables for the factors comprising the interaction term. For instance, the AB interaction
term in a prediction equation is represented by the product xAxB, the ABC interaction
by the product xAxBxC, and so forth.


To write a prediction equation based on certain main effects and interactions, we


simply include the associated indicator variables (or products of indicators) accompanied
by coefficients that are exactly half of the corresponding effect estimates. For instance,
if factor A is to be included in the equation, then we include the term cxA, where the
coefficient c equals one-half of the estimated main effect for A. As another example, to
include the AB interaction term, we would write cxAxB, where the constant c is now one-
half of the estimated AB interaction effect. In this manner, a separate term is included
for each effect that we choose to put in the model. These terms, along with the grand
average of all the data, are then added together to form the desired prediction equation.

Example 10.10  In Example 10.9, our analysis showed that the important effects are A, B, C, and
the AC interaction. The corresponding effect estimates are −5.875 (A), −4.625 (B),
−9.375 (C), and 5.125 (AC). Furthermore, the grand average of all 16 data points
in the experiment is 25.313. Using indicator variables xA, xB, and xC, the prediction
equation based on A, B, C, and AC is

    predicted value of y = ŷ = 25.313 − 2.938xA − 2.313xB − 4.688xC + 2.563xAxC

This equation can then be used to find predicted values of the response variable
for selected values of factors A, B, and C. For example, suppose that we set A high
and both B and C low. This corresponds to the choice xA = +1, xB = −1, xC = −1.
Substituting these values into the prediction equation, we get

    ŷ = 25.313 − 2.938(+1) − 2.313(−1) − 4.688(−1) + 2.563(+1)(−1) = 26.813

Notice that the predicted value of 26.813 agrees reasonably well with the average of
the two response values (26 and 29) that were measured at this combination of factor
settings.
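The fitted equation is simple enough to encode directly. A sketch (the function name
is our own; the coefficients are half the effect estimates from Example 10.9):

```python
def predict_jitter(xA, xB, xC):
    """Prediction equation from Example 10.10: the grand mean plus half of
    each retained effect times the corresponding +/-1 indicator(s)."""
    return 25.313 - 2.938 * xA - 2.313 * xB - 4.688 * xC + 2.563 * xA * xC

print(predict_jitter(+1, -1, -1))   # 26.813, matching the text
```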

Prediction equations are used for several purposes: (1) to generate diagnostic checks
on the adequacy of the chosen model, (2) to create response surface and contour plots,
and (3) to establish factor settings that lie between the +1 and −1 levels. Discussing all
of these applications is beyond the scope of our presentation. However, Example 10.11
illustrates how the prediction equation can help in choosing factor settings.

Example 10.11 Based on our analysis of the compact disc experiment in Examples 10.9 and 10.10,
the prediction equation
    ŷ = 25.313 − 2.938xA − 2.313xB − 4.688xC + 2.563xAxC
should provide an adequate description of how the response variable is affected by
the factors A (laser power), B (linear velocity), and C (developing time). To increase
the speed with which discs are manufactured, the compact disc company would like
to set the linear velocity (factor B) as fast as possible, while shortening the developing
time as much as possible. Within the range of factor values studied in this experiment,


this means that they would like to operate the mastering process at the high setting for
B and the low setting for C. Given this situation, what setting should they choose for
the laser power if the goal is to minimize the response variable “jitter”?
Substituting xB = +1 and xC = −1 into the prediction equation and collecting
terms, we find that

    ŷ = 25.313 − 2.938xA − 2.313(+1) − 4.688(−1) + 2.563xA(−1)

or

    ŷ = 27.688 − 5.501xA
From this equation, we see that minimizing the response value y can be accomplished
by making xA as large as possible. Within the range of values studied in the experiment,
the best setting for xA should be xA = +1; that is, laser power should be set at its high
level of 110%. When this is done, the value of the response variable should be about
27.688 − 5.501(+1) = 22.187. If the value of 22.187 is small enough to satisfy customer
requirements for jitter, then the company can proceed to use these factor settings. If not,
then it can further reduce the jitter by choosing the settings xA = −1, xB = +1, and
xC = +1 as in Example 10.9, even though these settings will necessarily increase the
production time for each master disc (since developing time, C, will now be at its high level).

Section 10.4 Exercises

24. Write the design matrix (in Yates standard order) for a complete 2^3 experiment.
    Denote the response measurements associated with the runs as y1, y2, . . . , y8.
    a. Using the definition that the BC interaction is one-half the difference
       between the main effect for B with C at its high level and the main effect
       for B at its low level, write the formula for the BC interaction in terms of
       the data y1, y2, . . . , y8.
    b. Reversing the order of the factors, repeat the calculation in part (a) for
       the CB interaction.
    c. Show that the formulas in parts (a) and (b) are equivalent.

25. Fill in the remaining columns of contrasts for the 2^4 design in Example 10.6.

26. Polyolefin blends and composites can often improve the strength of existing
    polymers. In a study to determine which blends lead to increased material
    strength, composites of isotactic polypropylene (PP) and linear low-density
    polyethylene (LLDPE) were mixed with red mud (RM) particles (“Application of
    Factorial Design of Experiments to the Quantitative Study of Tensile Strength
    of Red Mud Filled PP/LLDPE Blends,” J. of Materials Science Letters, 1996:
    1343–1345). The factors studied were the ratio of PP to LLDPE and the amount of
    red mud particles (in parts per hundred parts of resin). The levels at which
    these factors were studied are given in the following table:

                      Lower level    Upper level
    PP/LLDPE ratio    .25            4
    RM particles      4              10

    Composites made with each combination of factor levels were strength tested,
    with the following results:

                                           Strength (in MPa)
    Run    PP/LLDPE ratio    RM particles    Replication 1    Replication 2
    1      4                 10              19.3             20.2
    2      .25               10               8.1              9.7
    3      4                 4               20.3             24.5
    4      .25               4               10.4             11.8

    a. Calculate the main effects and the two-factor interaction effect for this
       experiment.
    b. Create the ANOVA table for the experiment. Which factors appear to have an
       effect on strength? (Use α = .05.)
    c. Draw the main effects and interaction effects plots for the factors
       identified in part (b).
    d. Which settings (high or low) of the factors in part (b) lead to maximizing
       the strength of a composite?
    e. Using the important effects identified in part (b), write a model for
       predicting strength of a composite.

27. The following data resulted from a study of the dependence of welding current on
    three factors: welding voltage (A), wire feed speed (B), and tip-to-workpiece
    distance (C). Two levels of each factor were used, with two replicate
    observations made at each combination of factor levels.

    Test run    Response values
    (1)         200.0, 204.2
    a           215.5, 219.5
    b           272.7, 276.9
    ab          299.5, 302.7
    c           166.6, 172.6
    ac          186.4, 192.0
    bc          232.6, 240.8
    abc         253.4, 261.6

    a. Create the ANOVA table for this experiment.
    b. At α = .01, which effects appear to be important?

28. The article “Effect of Cutting Conditions on Tool Performance in CBN Hard
    Turning” (J. of Manuf. Processes, 2005: 10–16) reported the accompanying data,
    from a 2^3 design, on cutting speed (m/s), feed (mm/rev), depth of cut (mm), and
    tool life (min). Perform an ANOVA to investigate two-factor interactions and
    main effects.

    Obs    Cut spd    Feed     Cut depth    Life
    1      1.21       0.061    0.102        27.5
    2      1.21       0.168    0.102        26.5
    3      1.21       0.061    0.203        27.0
    4      1.21       0.168    0.203        25.0
    5      3.05       0.061    0.102         8.0
    6      3.05       0.168    0.102         5.0
    7      3.05       0.061    0.203         7.0
    8      3.05       0.168    0.203         3.5

29. As with many dried products, sun-dried tomatoes can exhibit an undesirable
    discoloration during the drying and storage process. A replicated 2^3 experiment
    was conducted in an effort to optimize color by considering storage time,
    temperature, and packaging type (“Use of Factorial Experimental Design for
    Analyzing the Effect of Storage Conditions on Color Quality of Sun-Dried
    Tomatoes,” Sci. Res. and Essays, 2012: 477–489). In the following table, higher
    values of the response variable (based on chromaticity measurements) are
    associated with higher color quality:

                                                        Color Quality
    Run    Storage time    Storage temp    Packaging    Replication 1    Replication 2
    1      −               −               −            2.38             2.40
    2      +               −               −            2.38             2.40
    3      −               +               −            2.42             2.40
    4      +               +               −            2.31             2.29
    5      −               −               +            2.38             2.40
    6      +               −               +            2.38             2.40
    7      −               +               +            1.94             1.94
    8      +               +               +            1.93             1.92

    a. Calculate all main effects and two-factor interaction effects.
    b. Construct an ANOVA table and use it as a basis for deciding which factors
       appear to affect color quality (Use α = .01).
    c. Create main effects and interaction effects plots for the factors identified
       in part (b).
    d. Which settings (high or low) of the factors in part (b) lead to maximizing
       color quality?

30. Self-consolidating concrete (SCC) is a highly flowable product that can easily
    fill heavily congested reinforcement areas. Despite its low viscosity, SCC also
    maintains high stability to prevent segregation. The authors of “Effect of SCC
    Mixture Composition on Thixotropy and Formwork Pressure” (J. Mater. Civ. Engr.,
    2012: 876–888) conducted a study to determine the effect of three mixture
    parameters—base material slump flow (A), sand-to-total aggregate ratio by volume
    (B), and relative content of coarse aggregate (C)—on characteristics of the
    resulting SCC mixtures. The following table gives the coded factor levels along
    with values of the time (s) required for the SCC mixture to reach 500-mm slump
    flow.

    Run    Slump    S/A    Coarse    Time
    1      −1       −1     −1        1.71
    2      −1       −1      1        3.19
    3      −1        1     −1        1.75
    4      −1        1      1        3.06
    5       1       −1     −1         .88
    6       1       −1      1        2.44
    7       1        1     −1        1.34
    8       1        1      1        3.37

    a. Calculate all main effects and interaction effects.
    b. Create a probability plot of the effects from part (a). Which effects appear
       to be important?
    c. Which settings (high or low) of the factors in part (b) lead to maximizing
       the response variable? Which settings lead to minimizing the value of the
       response variable?
    d. Determine a model equation relating time needed to reach 500-mm slump flow
       to the effects identified in part (b).

31. Combustion experiments of medium crude oil were conducted to determine which of
    three factors (oxygen partial pressure, oxygen flow rate, and oxygen molar
    concentration) affect various aspects of the combustion process (“Factorial
    Analysis of In Situ Combustion Experiments,” Trans. of the Institution of
    Chemical Engineers, 1991: 237–244). Two response variables, combustion time (in
    hours) and coke burnoff (in grams/hour), were studied using a full 2^3 design
    with no replications:

    Run    Partial pressure    Flow rate    Molar concentration    Combustion time    Coke burnoff
    1      −1                  −1           −1                     10.6               5.73
    2       1                  −1           −1                     11.2               5.70
    3      −1                   1           −1                     24.4               3.05
    4       1                   1           −1                     20.3               2.87
    5      −1                  −1            1                      9.2               5.57
    6       1                  −1            1                      7.0               5.87
    7      −1                   1            1                     14.3               3.13
    8       1                   1            1                     17.5               3.05

    a. For the response variable combustion time, calculate all main effects and
       interaction effects for this experiment.
    b. Create a probability plot of the effects in part (a). Which effects appear
       to be important?
    c. Which settings (high or low) of the factors in part (b) lead to maximizing
       combustion time? Which settings lead to minimizing combustion time?
    d. Determine a model equation relating combustion time to the effects
       identified in part (b).
    e. Repeat parts (a)–(d) for the response variable coke burnoff.

32. Impurities in the form of iron oxides lower the economic value and usefulness of
    industrial minerals, such as kaolins, to ceramic and paper-processing
    industries. A 2^4 experiment was conducted to assess the effects of four factors
    on the percentage of iron removed from kaolin samples (“Factorial Experiments in
    the Development of a Kaolin Bleaching Process Using Thiourea in Sulphuric Acid
    Solutions,” Hydrometallurgy, 1997: 181–197). The factors and their levels are
    displayed in the following table:

    Factor    Description    Units    Low level (−1)    High level (+1)
    A         H2SO4          M        .10               .25
    B         Thiourea       g/l      0.0               5.0
    C         Temperature    °C       70                90
    D         Time           min      30                150

    The data from an unreplicated 2^4 experiment is given in the table below:

    Test run    Iron extraction (%)    Test run    Iron extraction (%)
    (1)          7                     d           28
    a           11                     ad          51
    b            7                     bd          33
    ab          12                     abd         57
    c           21                     cd          70
    ac          41                     acd         95
    bc          27                     bcd         77
    abc         48                     abcd        99

    a. Calculate all main effects and two-factor interaction effects for this
       experiment.
    b. Create a probability plot of the effects. Which effects appear to be
       important?
    c. Which settings (high or low) of the factors in part (b) lead to maximizing
       the percentage of iron extracted?
    d. Write a model for predicting iron extraction percentage from the factors
       identified in part (b).

33. An unreplicated 2^5 experiment was performed to determine which factors affect
    the percent of arsenic removed from contaminated water by electrocoagulation
    (EC) (“Prediction of Arsenic Removal by Electrocoagulation: Model Development by
    Factorial Design,” J. Hazard. Toxic Radioact. Waste, 2011: 48–54). The factors
    and corresponding levels are shown here along with the resulting data.

    Factor    Description    Units    Low level (−1)    High level (+1)
    A         Time           s        30                120
    B         Current        amp      .6                3.0
    C         EC area        cm²      57                91.2
    D         Volume         L        1                 3
    E         Arsenic        mg/L     .23               1.18

    Test run  Removal (%)   Test run  Removal (%)   Test run  Removal (%)   Test run  Removal (%)
    (1)       48.70         d         35.70         e         57.20         de        36.40
    a         86.50         ad        59.60         ae        81.00         ade       52.50
    b         89.10         bd        69.10         be        85.10         bde       61.00
    ab        97.00         abd       89.10         abe       96.90         abde      89.30
    c         58.30         cd        37.00         ce        57.60         cde       47.50
    ac        84.80         acd       64.80         ace       78.80         acde      55.90
    bc        90.90         bcd       71.70         bce       87.30         bcde      58.50
    abc       95.20         abcd      93.90         abce      97.10         abcde     89.00

    a. Calculate all main effects and two-factor interaction effects.
    b. Create a probability plot of the effects. Three effects in particular should
       appear to be important; what are they?
    c. Which settings (high or low) of the factors in part (b) lead to maximizing
       the percentage of arsenic extracted?
    d. Develop a model equation for predicting arsenic removal percentage from the
       factors identified in part (b).

10.5 Fractional Factorial Designs 


The two experiments analyzed in Section 10.4 exhibit a phenomenon commonly found in 2^k designs: Only a few main effects and interactions are important. Most of the effects, especially the higher-order interactions, tend not to be significant. Early researchers quickly devised methods for taking advantage of this situation. One such method was discussed in Section 10.3, where higher-order interaction effects are sometimes assumed to be negligible and are then pooled to form an estimate of the experimental error for testing the remaining effects in an experiment. Another procedure that relies on the scarcity of significant interaction effects is the method of fractional factorial designs discussed in this section.

An important reason for using fractional factorial designs is that full factorial designs, in which all 2^k tests are conducted at least once, expend a large amount of resources in estimating interaction terms. That is, as the number of factors k increases, the ratio of the number of main effects to the total number of effects shrinks rapidly in a 2^k design. Table 10.4 (page 490) illustrates how quickly this ratio declines. For instance, in a full 2^6 experiment with 64 test runs, only 9.5% of the effects calculated are main effects. The remaining 90.5% of the estimates are devoted to interaction effects, many of which are not likely to be of statistical or practical importance. Because simple models based on main effects and, perhaps, some two-factor interactions tend to predominate in actual applications, full 2^k designs can be somewhat inefficient for studying large numbers of experimental factors.


Table 10.4  Percentage (rounded) of main effect estimates in a full 2^k design

Number      Number of     Number of            Total number of     Percentage of
of factors  main effects  interaction effects  effects (2^k − 1)   main effects
 1            1              0                    1                 100
 2            2              1                    3                  67
 3            3              4                    7                  43
 4            4             11                   15                  27
 5            5             26                   31                  16
 6            6             57                   63                   9.5
 7            7            120                  127                   5.5
 8            8            247                  255                   3.1
 9            9            502                  511                   1.8
10           10           1013                 1023                   1.0
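The percentages in Table 10.4 follow from simple counting: a full 2^k design provides 2^k − 1 estimable effects, of which exactly k are main effects. A minimal Python sketch (ours, for illustration; not part of the original text) reproduces the table:

for k in range(1, 11):
    total = 2**k - 1              # total number of effects in a full 2^k design
    interactions = total - k      # effects that are not main effects
    pct = 100 * k / total         # percentage of effects that are main effects
    print(f"{k:2d}  {k:4d}  {interactions:5d}  {total:5d}  {pct:6.1f}")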

Creating a Fraction of a 2^k Design


To reduce the problem of estimating large numbers of possibly unimportant interaction effects, fractional factorial designs are created by replacing some of the higher-order interaction terms by additional experimental factors. For example, suppose that we want to study four factors, A, B, C, and D, but that we want to use 8 test runs rather than the 16 runs required by a full 2^4 design. To do this, first write down the extended design matrix for the full 2^3 design (i.e., the 2^k design with 8 runs):

 A   B   C   AB  AC  BC  ABC
−1  −1  −1  +1  +1  +1  −1
+1  −1  −1  −1  −1  +1  +1
−1  +1  −1  −1  +1  −1  +1
+1  +1  −1  +1  −1  −1  −1
−1  −1  +1  +1  −1  −1  +1
+1  −1  +1  −1  +1  −1  −1
−1  +1  +1  −1  −1  +1  −1
+1  +1  +1  +1  +1  +1  +1
Next, since the highest-order interaction is least likely to be important, replace the ABC column by the letter D. This is abbreviated by writing D = ABC. Then erase all remaining interaction columns to obtain the design matrix:

 A   B   C   D
−1  −1  −1  −1
+1  −1  −1  +1
−1  +1  −1  +1
+1  +1  −1  −1
−1  −1  +1  +1
+1  −1  +1  −1
−1  +1  +1  −1
+1  +1  +1  +1


This four-column matrix is the design matrix of a fractional factorial design based on four factors. In fact, these 8 test runs correspond to certain rows in the full 2^4 design, as shown here (the rows shaded in the original figure are marked with *).

Run    A   B   C   D
 1*   −1  −1  −1  −1
 2    +1  −1  −1  −1
 3    −1  +1  −1  −1
 4*   +1  +1  −1  −1
 5    −1  −1  +1  −1
 6*   +1  −1  +1  −1
 7*   −1  +1  +1  −1
 8    +1  +1  +1  −1
 9    −1  −1  −1  +1
10*   +1  −1  −1  +1
11*   −1  +1  −1  +1
12    +1  +1  −1  +1
13*   −1  −1  +1  +1
14    +1  −1  +1  +1
15    −1  +1  +1  +1
16*   +1  +1  +1  +1

Because the 8 test runs comprise only a fraction of the 16 runs required in a full 2^4 design, we say that the 8-run experiment is a fractional factorial experiment. Furthermore, since this design uses only half of the 16 runs, we say that it is a half fraction of the full factorial design based on four factors.

All of the information about the 8-run design can be compactly summarized using the following notation system. The particular fractional factorial design we have created is denoted as a 2^(4−1) design. This notation carries the following information:
1. The design has 8 test runs (because 2^(4−1) = 2^3 = 8).
2. Four factors are studied in the experiment.
3. Each factor has two levels.
4. One factor (factor D) has been added to a full design based on 8 runs.
5. The design uses a fraction, 1/2^1, of the runs of a full 2^k design.
In general, any fractional factorial design can be described by the notation 2^(k−p), which is intended to convey that
1. The design has a total of 2^(k−p) test runs.
2. k factors are studied in the experiment.
3. Each factor has two levels.
4. p factors have been added to a full design based on 2^(k−p) runs.
5. The design uses a fraction, 1/2^p, of the runs of a full 2^k design.
The general procedure for creating a fractional factorial design is similar to that in the previous example: First, create the extended design matrix for a full design based on 2^(k−p) test runs, and then rename p of the interaction columns with the p additional factors. It is convenient to use sequential capital English letters to denote the factors. As we will see subsequently, the choice of which columns to replace with the additional factors is important and cannot simply be made arbitrarily.

Example 10.12  Suppose that you want to study five factors using only 8 test runs. How do you create a fractional factorial design to accomplish this? First, start with the full 2^3 design (i.e., the full 2^k design that has 8 runs). Write the column headings of the extended design matrix: A, B, C, AB, AC, BC, and ABC. Finally, choose two of the interaction columns, say, ABC and AC, and assign the additional two factors, D and E, to these columns. Denote this column assignment by writing D = ABC and E = AC. Because we are adding two factors (D and E) to a full design based on three factors (A, B, and C), this design is called a 2^(5−2) fractional factorial. To create the design matrix for this particular 2^(5−2) experiment, first write the design matrix in Yates standard order for the 2^3 experiment with factors A, B, and C. Then append columns D and E. The entries in D are found by multiplying the entries of columns A, B, and C. Similarly, the entries in column E are found by multiplying the entries of columns A and C. Exercise 35 asks you to write this design matrix.
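To make the construction concrete, here is a short Python sketch (our illustration, not code from the text) that builds the design matrix just described: the full 2^3 design in A, B, and C written in Yates standard order, with columns D and E computed from the generators D = ABC and E = AC.

from itertools import product

print(" A  B  C  D  E")
for c, b, a in product((-1, +1), repeat=3):   # a varies fastest: Yates order
    d = a * b * c                             # generator D = ABC
    e = a * c                                 # generator E = AC
    print(" ".join(f"{v:+d}" for v in (a, b, c, d, e)))

Running the sketch prints the eight runs of the 2^(5−2) design; comparing the D and E columns it produces with the A, B, C columns is a quick check on the column multiplications used in the construction.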

Finding the Alias Structure


The reward for using fractional factorial designs is a substantial reduction in the required number of test runs. It stands to reason, however, that there is also a price to pay. After all, how can a 2^(4−1) design with 8 test runs be expected to give exactly the same quality of information about four factors that a full 2^4 design with 16 runs can? What is lost in a fractional design is the ability to clearly distinguish some of the effects from one another. To illustrate, consider the 2^(4−1) design created previously by the assignment D = ABC. We immediately see that the D effect and the ABC effect cannot be distinguished from one another because the same column of +1s and −1s in the design matrix is used to compute both the ABC and D effects. Consequently, D and ABC are said to be aliases of one another. We also say that the D effect is confounded with the ABC effect. Of course, the reason that we chose to alias D with the ABC column in the first place was that we hoped the ABC effect would be negligible. If this turns out to be the case, then we will have obtained a main effect estimate for D using only 8 runs.

Unfortunately, the assignment D = ABC induces even more confounding than you might first imagine. Consider, for example, the AB and the CD interactions. In Exercise 36, you are asked to show, by multiplying the appropriate columns, that the AB and CD columns are identical. Thus not only are D and ABC aliased but AB and CD are also aliased. In fact, there are many sets of aliased effects generated by our original choice of D = ABC. The entire set of aliases in a fractional factorial design is called the alias structure of the design.

As is the case with all the other aspects of 2^k designs, there is a fairly easy method for writing down the alias structure of a fractional design. This method depends on some simple observations about multiplying columns of +1s and −1s:


1. First, the letter I denotes the column consisting entirely of +1s.
2. Note that any column multiplied by itself yields column I. For example, A · A = A² = I, B · B = B² = I, and so forth.
3. Multiplying column I by any other column does not change the column. For example, A · I = I · A = A.
Using these facts, we can obtain the alias structure of any fractional factorial as follows:

Finding the Alias Structure of a Fractional Factorial

1. First, write the p assignments of additional factors in equation form. These p equations are called the design generators.
2. Multiply each generator from Step 1 by its left side to put each generator into the form I = w, where w is a "word" composed of several letters representing particular experimental factors (e.g., D = ABC becomes I = ABCD). It is also possible to create words with "−" signs, such as D = −ABC. If this is done, the resulting design will use a different fraction of the runs from the full 2^k design.
3. Letting I = w1, I = w2, . . . , I = wp denote the p design generators from Step 2, form all possible products of the words wi (one at a time, two at a time, three at a time, etc.). Use the fact that squares of factors can be eliminated (e.g., A² = I and multiplying by I does not change anything). There will be a total of 2^p words formed. This collection is called the defining relation of the design.
4. Multiply each word in the defining relation by all 2^k − 1 effects based on k factors. Use the fact that squares of factors cancel out to simplify the products. The result is called the alias structure of the design.

As the following examples show, finding the alias structure is not as complicated a task as the procedure may indicate.

Example 10.13  Let's determine the alias structure of the 2^(4−1) design where D is aliased with ABC. The generator of this design is D = ABC. Multiplying both sides by D gives D · D = D(ABC), or I = ABCD. Since there is only one "word" in this equation, the defining relation is also of the form I = ABCD. Multiplying each of the 2^4 − 1 effects by the relation I = ABCD yields the following:

Effect  Aliases      Effect  Aliases
A   = BCD            BD    = AC
B   = ACD            CD    = AB
C   = ABD            ABC   = D
D   = ABC            ABD   = C
AB  = CD             ACD   = B
AC  = BD             BCD   = A
AD  = BC             ABCD  = I
BC  = AD

There is a lot of repetition in this list. Eliminating duplicate equations, we can summarize the alias structure of the design as follows:

A = BCD    AB = CD
B = ACD    AC = BD
C = ABD    AD = BC
D = ABC    ABCD = I

In summary: (1) Each main effect is aliased with a three-factor interaction; (2) all two-factor interactions are aliased with one another; and (3) the single four-factor interaction is aliased with the grand average of the data.

Example 10.14  The 2^(5−2) design of Example 10.12 provides a better illustration of how the defining relation is formed. Recall that the design generators in that example are D = ABC and E = AC. Writing these in the form I = ABCD and I = ACE, we can see that the defining relation is formed from the "words" ABCD and ACE and all possible products of these words. Since there is only one such product, namely, (ABCD)(ACE) = A²BC²DE = BDE, the defining relation is I = ACE = BDE = ABCD. Multiplying each of the 2^5 − 1 effects through by the defining relation gives the following alias structure (Exercise 37):

I = ACE = BDE = ABCD

A = CE = BCD = ABDE
B = DE = ACD = ABCE
C = AE = ABD = BCDE
D = BE = ABC = ACDE
E = AC = BD = ABCDE
AB = CD = ADE = BCE
AD = BC = ABE = CDE

Notice that each main effect is now aliased with at least one two-factor interaction as well as higher-order interactions in this design.
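The multiplication rules behind these examples are easy to mechanize: an effect "word" is a set of factor letters, and because squared letters cancel (A² = I), multiplying two words amounts to taking the symmetric difference of their letter sets. The following Python sketch (ours, for illustration; the helper name word_product is our own, not from the text) reproduces the defining relation and alias sets of Example 10.14.

from itertools import combinations

def word_product(*words):
    # Squared letters cancel (A*A = I), so the product of effect "words"
    # is the symmetric difference of their letter sets.
    letters = set()
    for w in words:
        letters ^= set(w)
    return "".join(sorted(letters)) or "I"

generators = ["ABCD", "ACE"]        # from I = ABCD and I = ACE

# Defining relation: all products of the generators, one at a time,
# two at a time, and so on.
defining = [word_product(*combo)
            for r in range(1, len(generators) + 1)
            for combo in combinations(generators, r)]
print("I =", " = ".join(defining))  # I = ABCD = ACE = BDE

# Aliases of an effect: multiply it by every word in the defining relation.
for effect in ("A", "E", "AB", "AD"):
    print(effect, "=", " = ".join(word_product(effect, w) for w in defining))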

Analyzing a Fractional Factorial Experiment


Fractional factorial designs are also called screening designs because they are used to
separate the few important effects from the many unimportant effects in the early stages
of experimentation. Because of their emphasis on studying a large number of factors
with as small a number of runs as possible, replicated fractional factorials are fairly
rare. It is much more likely to find fractional designs run using only one test run for

each combination of factor levels. Therefore, normal quantile or probability plots are
generally used to analyze fractional designs. In those fortunate cases where replicated
test runs are available, ordinary ANOVA tests can be used to distinguish the important
effects from the others.
To begin the analysis of an unreplicated fractional design, first compute all 2^(k−p) effects (this includes the grand average) associated with the design. Then construct a normal plot of all effects except the grand average. Analyze the plot in the usual fashion by fitting a straight line, by eye, through the effects with small magnitudes. Finally, use the alias structure to formulate the model that is most likely to explain the pattern in the plot. One common practice is to opt for main effects and two-factor interactions rather than higher-order effects when formulating a tentative model.
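As a computational sketch of this procedure (our own Python illustration; the helper name effects_2kp is hypothetical, not from the text), recall that in a two-level design the estimate of any effect is the mean response at the +1 runs of the corresponding column product minus the mean response at its −1 runs. Applied to a fractional design, aliased effects come out numerically identical, in agreement with the alias structure.

import numpy as np
from itertools import product

def effects_2kp(design, y):
    """design: (n, k) array of -1/+1 columns; y: n responses.
    Returns {effect name: estimate}; aliased effects repeat the same value."""
    n, k = design.shape
    letters = [chr(ord("A") + j) for j in range(k)]
    estimates = {}
    for mask in product((0, 1), repeat=k):
        if not any(mask):
            continue                      # skip the grand average
        cols = [j for j in range(k) if mask[j]]
        contrast = np.prod(design[:, cols], axis=1)
        name = "".join(l for l, m in zip(letters, mask) if m)
        # effect = mean(y at +1) - mean(y at -1) = 2 * average(contrast * y)
        estimates[name] = 2 * np.mean(contrast * y)
    return estimates

# Typical use on an unreplicated design (scipy/matplotlib assumed available):
#   eff = effects_2kp(design, y)
#   from scipy import stats; import matplotlib.pyplot as plt
#   stats.probplot(sorted(eff.values()), plot=plt); plt.show()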

Example 10.15 Pyrometallurgical processes are normally used to extract manganese from raw
mineral ores, but alternative methods based on chemical reactions are currently
being studied. One such method, based on reductive chemical leaching, uses
sucrose in a solution of sulfuric acid to extract manganese dioxide (“Fractional
Factorial Experiments in the Development of Manganese Dioxide Leaching by
Sucrose in Sulfuric Acid Solutions,” Hydrometallurgy, 1994: 215–230). In this in-
vestigation, five factors were studied to determine their effect on the percentage
of manganese dioxide, MnO2, obtained from the leaching process (Table 10.5,
page 496).
A 2^(5−1) design with generator E = ABCD was used. From the data in Table 10.5, a normal quantile plot of the effects was created (Figure 10.23, page 496). From this plot, it appears that only factors A (sucrose concentration), B (particle size of ore), and E (sulfuric acid concentration) have a significant effect on the percentage of MnO2 extracted by the leaching process. None of the interaction terms appears to be significant. The effect estimates are

Factor        Main effect
A (sucrose)    +10.69
B (size)       +11.19
E (H2SO4)      −32.69

From these results, we can conclude that raising the sucrose concentration and using ores of larger particle size tend to increase the MnO2 yield. In addition, because raising the sulfuric acid concentration tends to reduce the yield, it would be better to use the lower concentration. We divide the effects by 2 to obtain the model coefficients. In addition, we can write a model for predicting the percentage yield, y, given the (coded) values of the variables xA, xB, and xE.

ŷ = 42.97 + 5.35xA + 5.60xB − 16.35xE

The fact that lowering the sulfuric acid concentration has such a large effect on yield suggests that further experiments be conducted with even lower H2SO4 levels.
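As a quick numerical check (a small sketch of ours, not from the cited article), substituting coded settings into the fitted equation gives predicted yields; the settings favored by the analysis predict a yield of roughly 70%.

def predicted_yield(xA, xB, xE):
    # Fitted model from Example 10.15; coded inputs are -1 or +1.
    return 42.97 + 5.35 * xA + 5.60 * xB - 16.35 * xE

# High sucrose, large particle size, low sulfuric acid concentration:
print(predicted_yield(+1, +1, -1))   # 70.27, about a 70% MnO2 yield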


Table 10.5  2^(5−1) design for studying the effects of five factors on percentage yield of a chemical process

Factor  Factor name               Low level  High level
A       Sucrose (g/L)               5          10
B       Ore particle size (μm)    90–125     200–300
C       Mixing rate (min⁻¹)       150        200
D       Temperature (°C)           30         50
E       Sulfuric acid (M)           1          2

               Particle                                   Yield
Run  Sucrose   size      Agitation  Temperature  H2SO4     %
 1     −1       −1          −1          −1         +1     14.0
 2     +1       −1          −1          −1         −1     56.0
 3     −1       +1          −1          −1         −1     63.5
 4     +1       +1          −1          −1         +1     38.0
 5     −1       −1          +1          −1         −1     48.0
 6     +1       −1          +1          −1         +1     25.5
 7     −1       +1          +1          −1         +1     26.5
 8     +1       +1          +1          −1         −1     81.0
 9     −1       −1          −1          +1         −1     45.0
10     +1       −1          −1          +1         +1     25.0
11     −1       +1          −1          +1         +1     24.0
12     +1       +1          −1          +1         −1     51.5
13     −1       −1          +1          +1         +1     18.0
14     +1       −1          +1          +1         −1     67.5
15     −1       +1          +1          +1         −1     62.0
16     +1       +1          +1          +1         +1     42.0

Figure 10.23  Normal quantile plot of the effects (response is yield %); effect estimates (roughly −30 to 10) are plotted against normal quantiles, with separate symbols for Sucrose, Size, Agitate, Temp., and H2SO4


Section 10.5 Exercises

34. In a 2^(7−3) fractional factorial design,
a. How many factors are being studied?
b. How many experimental runs are required (assuming no replications)?
c. What fraction of the runs of a full 2^7 design are used by this experiment?

35. Fill in all the columns in the design matrix for the 2^(5−2) design of Example 10.12.

36. A 2^(4−1) design is specified by setting D = ABC.
a. Fill in the columns of the design matrix for this fractional factorial design.
b. By multiplying the appropriate columns in the design matrix from part (a), show that the AB and CD contrasts are identical.

37. Using the design generators I = ABCD and I = ACE, verify all the entries in the alias structure of the 2^(5−2) design of Example 10.14.

38. A quarter-fraction of a 2^7 experiment (factors A, B, . . . , G) is constructed using the design generators ABCDE = F and CDE = G.
a. How many experimental runs (assuming no replications) must be conducted?
b. Write down the alias structure for this design.

39. A fractional factorial experiment with 16 test runs was conducted to determine the effects of several factors on the antioxidant capacity in carotenoid extracts of the bacterium Thermus filiformis ("Evaluation of Biomass Production, Carotenoid Level and Antioxidant Capacity Produced by Thermus Filiformis Using Fractional Factorial Design," Braz. J. Microbiol., 2012: 126–134). The variables studied were temperature (at 65°C and 75°C), pH (at 7 and 8), tryptone (at 5 and 10 g/L), yeast extract (at 5 and 10 g/L), and Nitsch's trace elements (2 and 5 mL/L). The Nitsch's trace elements factor was aliased with the highest-order interaction term.
a. What are k and p for this 2^(k−p) design?
b. Determine the alias structure of the design.
c. Suppose that it is reasonable to assume that all interactions consisting of three or more factors are negligible. In this case, will any of the estimates of the remaining effects be aliased with one another?

40. Metal "leads" that protrude from electronic components often have their bases sealed with glass to protect against moisture ingress. Fractures in the glass can be caused by bending or twisting the leads and by large thermal changes. In an experiment designed to evaluate how different factors affect the peak stress applied to a glass seal, the following factors and factor levels were studied ("A Fractional Factorial Numerical Technique for Stress Analysis of Glass-to-Metal Lead Seals," J. of Electronic Packaging, 1994: 98–104):

Factor  Description                                    Low level (L, in.)  High level (H, in.)
s       Half the distance between neighboring leads      .025                .35
wlead   Horizontal width of lead                         .010                .020
hlead   Distance from package base to center of lead     .127                .381
rport   Radius of port in package for lead seal          .4572               .5588
twall   Wall thickness of package                        .030                .050

The design matrix for the study was

Run   s  wlead  hlead  rport  twall
 1    L    L      L      L      L
 2    L    L      L      H      H
 3    L    L      H      L      H
 4    L    L      H      H      L
 5    L    H      L      L      H
 6    L    H      L      H      L
 7    L    H      H      L      L
 8    L    H      H      H      H
 9    H    L      L      L      H
10    H    L      L      H      L
11    H    L      H      L      L
12    H    L      H      H      H
13    H    H      L      L      L
14    H    H      L      H      H
15    H    H      H      L      H
16    H    H      H      H      L

a. Find k and p for this 2^(k−p) design.
b. Determine the alias structure of this design.


41. In an effort to reduce the variation in copper plating thickness on printed circuit boards, a fractional factorial design was used to study the effect of three factors—anode height (up or down), circuit board orientation (in or out), and anode placement (spread or tight)—on plating thickness ("Characterization of Copper Plating Process for Ceramic Substrates," Quality Engr., 1990: 269–284). The following factor combinations were run:

Anode   Board        Anode      Thickness
height  orientation  placement  variation
 −1        −1           −1       11.63
 −1        +1           +1        3.57
 +1        −1           +1        5.57
 +1        +1           −1        7.36

a. Find k and p for this 2^(k−p) design.
b. Determine the alias structure of this design.
c. Calculate estimates of the effects for this experiment.
d. Assuming that the AB interaction is negligible, use this information to obtain an estimate of SSE and perform hypothesis tests for both main effects. (Use α = .05.)
e. From the results in part (d), which factors have a significant effect on plating thickness variation?
f. If the objective of the study is to minimize the variation in plating thickness, what setting of each factor do you recommend?

42. Lateritic nickel ore deposits are an important source of nickel. Atmospheric acid leaching (AL) has grown in popularity as a method to extract nickel from such deposits. In the AL process, a high concentration of ferric iron may remain in the leach solution which would diminish the purity of the desired nickel. A study was conducted to investigate how five AL process factors impact iron removal efficiency (%) from leach solutions. These factors were pH (2 versus 4), temperature (25°C and 85°C), neutralizing agents [15% (W/W) MgO and 25% (W/W) CaCO3], Fe/Ni ratio (6 versus 18), and stirring speed (200 and 500 rpm) ("The Effect of Iron Precipitation Upon Nickel Losses from Synthetic Atmospheric Nickel Laterite Leach Solutions: Statistical Analysis and Modelling," Hydrometallurgy, 2011: 140–152). Here is data from the resulting fractional factorial experiment:

pH   Temp  Agents  Ratio  Speed  Iron Removal (%)
−1    −1     −1      +1     +1       29.19
+1    −1     −1      −1     +1       84.72
−1    +1     −1      −1     −1       95.25
+1    +1     −1      +1     −1       96.08
−1    −1     +1      +1     −1       49.89
+1    −1     +1      −1     −1       87.92
−1    +1     +1      −1     +1       89.22
+1    +1     +1      +1     +1       96.17

a. What are k and p for this 2^(k−p) design?
b. Determine the alias structure of this design. Hint: Each of the last two design columns is a product of two of the initial three columns.
c. Calculate estimates of the effects for this study.
d. Create a normal probability plot for the effects determined in part (c) and identify any effects that appear to be important.

43. Exercise 39 described a half-fraction of a factorial experiment in which the Nitsch's trace elements factor was aliased with the highest-order interaction term. The response variable, antioxidant capacity, was measured in percent protection against singlet oxygen [O2(¹Δg)]. The cited article reported the following data:

Temp  pH   Yeast  Tryptone  Nitsch  %Prot
 −1   −1    −1      −1        +1     51.5
 +1   −1    −1      −1        −1     85.1
 −1   +1    −1      −1        −1     46.1
 +1   +1    −1      −1        +1     49.0
 −1   −1    +1      −1        −1     33.6
 +1   −1    +1      −1        +1     82.9
 −1   +1    +1      −1        +1     57.1
 +1   +1    +1      −1        −1     71.9
 −1   −1    −1      +1        −1     34.4
 +1   −1    −1      +1        +1     42.7
 −1   +1    −1      +1        +1     31.4
 +1   +1    −1      +1        −1     64.8
 −1   −1    +1      +1        +1      4.3
 +1   −1    +1      +1        −1     40.4
 −1   +1    +1      +1        −1     48.9
 +1   +1    +1      +1        +1     60.5

a. Calculate estimates of the various effects.


b. Suppose that additional experimentation shows that only those effects whose magnitudes exceed 15 are important. Which factors or interactions have a significant effect on percent protection?
c. Create an effects plot for the important effects identified in part (b).
d. If the objective of the study is to maximize percent protection, what setting of each factor do you recommend?

Supplementary Exercises
44. The following data was used to investigate whether the compressive strength of concrete depends on the type of capping material used or on type of curing method used. The numbers in the matrix are totals, each based on three replications. In addition, SSE = 4716.67 and SST = 35,954.31 for this data.

                        Curing method
                     1     2     3     4     5
Capping     1      1847  1942  1935  1891  1795
material    2      1779  1850  1795  1785  1626
            3      1806  1892  1889  1891  1756

a. Construct an ANOVA table for this experiment.
b. Using α = .01, test to see whether either factor or their interaction is significant. Describe your conclusions from these tests.

45. In an experiment to assess the effects of curing time (factor A) and type of mix (factor B) on the compressive strength of concrete cylinders, three different curing times were used in combination with four different mixes, with three replicate observations obtained for each of the 12 factor–level combinations. The resulting sums of squares were SSA = 30,763.0, SSB = 34,185.6, SSE = 97,436.8, and SST = 205,966.6.
a. Construct an ANOVA table for this experiment.
b. Using α = .05, can you conclude that there is a significant interaction between the two factors?
c. Test, at α = .05, the hypothesis that factor A has no effect on compressive strength.
d. Test, at α = .05, the hypothesis that factor B has no effect on compressive strength.

46. The authors of the article cited in Exercise 15 also performed an experiment to see whether the maximum peak to valley profile height (Rmax) is affected by the abrasive size (A), abrasive quantity (B), and quill gap (C); the experiment involved three sizes, three quantities, and three gaps, with two replicates at each of the factor combinations. The resulting sums of squares were SSA = 12,209.77, SSB = 19,641.09, SSC = 367,688.98, SS(AB) = 8721.72, SS(AC) = 40,008.11, SS(BC) = 44,347.01, SS(ABC) = 94,554.41, SSE = 334,393.64, and SST = 921,564.7275.
a. Construct an ANOVA table for this data.
b. Test to see whether any interaction effects are significant at α = .05.
c. Test to see whether any main effects are significant at α = .05.

47. Exercise 20 described an experiment involving three processing parameters: laser power (A), scanning velocity (B), and powder flow rate (C). Another experiment considered how depth penetration of the cladding layer is affected by these same factors. Each factor had three levels and there was one observation at each factor combination. Here is the ANOVA table from the article, which only considered main effects and two-factor interactions:

SOURCE  DF  SS        MS        F
?       ?   ?         ?         162.38
?       ?   0.080570  ?         ?
?       ?   ?         0.130195  ?
?       ?   ?         ?         0.56
?       ?   0.145137  ?         ?
?       ?   ?         ?         0.76
Error   ?   ?         0.006387
Total   ?   ?

a. Fill in the missing entries in the table.
b. Identify significant effects using α = .01.

48. The article "An Assessment of the Effects of Treatment, Time, and Heat on the Removal of Erasable Pen Marks" (J. Testing and Eval., 1991: 394–397) reports the following sums of squares for the response


variable "degree of removal of marks" (larger values of this variable are associated with more complete removal of marks): SSA = 39.171, SSB = .665, SSC = 21.508, SS(AB) = 1.432, SS(AC) = 15.953, SS(BC) = 1.382, SS(ABC) = 9.016, and SSE = 115.820. Four different laundry treatments (factor A), three different types of pen (factor B), and six different fabrics (factor C) were used in the experiment. Three observations were obtained for each combination of the factor levels. Perform an analysis of variance using α = .01 for all tests, and state your conclusions.

49. The article cited in Exercise 21 also reported on another experiment in which the authors investigated whether the percent by weight of nickel in the alloy layer is affected by niobium powder paste thickness (A, at three levels), scanning speed (B, at three levels), and laser power (C, at three levels). One observation was made at each factor-level combination, yielding the accompanying data (Note: Thickness column headings were incorrect in the cited article):

                  Paste Thickness
Power  Speed     .2      .3      .4
700     600    17.14   20.16   18.73
        900    24.75   17.19   26.54
       1200    18.78   18.80   21.42
800     600    26.55   13.03   18.92
        900    19.96   29.37   21.41
       1200    26.66   19.80   22.01
900     600    33.33   27.65   28.71
        900    37.33   28.81   23.22
       1200    34.98   26.40   15.44

a. Construct an ANOVA table for this experiment including only main effects and two-factor interactions (as did the authors of the cited article).
b. Use the appropriate F ratios to show that none of the two-factor interactions are significant at α = .05.
c. Which main effects are significant at α = .05?

50. Even under the increased levels of security sought by current airport security practices, airports try to assure rapid processing of individuals through security checkouts. In an experiment designed to find combinations of factors that will minimize travelers' processing times at security checkpoints, three factors were studied: the number of ticket checkers (2 or 3), the number of X-ray machines (1 or 2), and the number of metal detectors (1 or 2) ("Operation of Airport Security Checkpoints Under Increased Threat Conditions," J. of Transp. Engr., 1996: 264–269). Each of the possible combinations of these factors was studied by using eight separate random samples of 67 travelers. The processing times (in seconds) are summarized in the table below.

                                                       Processing time
      Ticket    X-ray     Metal      Number of                Standard
Test  checkers  machines  detectors  replicates    Mean       deviation
1        2         2         2          67         39.10        1.29
2        3         2         1          67         46.50        4.30
3        2         2         1          67         50.56        5.41
4        3         2         2          67         35.07        1.05
5        2         1         2          67         93.37       37.75
6        3         1         1          67         90.55       33.52
7        2         1         1          67         97.70       34.79
8        3         1         2          67         88.86       37.58

a. Calculate all main effects and interaction effects for this experiment.
b. Pool the standard deviations of the replicated runs to find a value for SSE.
c. Using the SSE from part (b), determine which effects are significant (at α = .05).
d. Which settings (high or low) of the factors in part (c) lead to minimizing processing time?
e. What is the best way to staff a security checkpoint if management wants to limit the number


of employees to five per checkpoint? Note: X-ray machines and metal detectors each require one operator.
f. Is the disparity in magnitudes of the standard deviations a possible cause for concern in this experiment?

51. Shea tree oxidation experiments were conducted to determine which of three factors (reaction time, air pressure, reaction temp.) affect various aspects in converting the woody biomass into a renewable biofuel. Optimal enzymatic conversion of the Shea tree into ethanol occurs when the cellulose content is maximized and lignin content is minimized ("Optimization of Pretreatment Conditions Using Full Factorial Design and Enzymatic Convertibility of Shea Tree Sawdust," Biomass and Bioenergy, 2013: 130–138). The response variable lignin removal (g/kg) was studied using a full 2^3 design with no replication:

Run  Time  Pressure  Temp  Lignin
 1    −1     −1       −1     30
 2    +1     −1       −1    110
 3    −1     +1       −1    241
 4    +1     +1       −1    192
 5    −1     −1       +1    116
 6    +1     −1       +1    201
 7    −1     +1       +1    230
 8    +1     +1       +1    191

a. Calculate all main effects and interaction effects for this experiment.
b. Create a probability plot of the effects in part (a).
c. Suppose that additional experimentation shows that only those effects whose magnitudes exceed 40 are important. Which factors or interactions have a significant effect on lignin removal?
d. Draw an effects plot for the important effects identified in part (c).
e. Suppose that additional experiments show that the AB and BC interactions are not significant. If the objective of the study is to maximize lignin removal, what setting of each factor do you recommend?

52. In an automated chemical coating process, the speed with which objects on a conveyor belt are passed through a chemical spray (belt speed), the amount of chemical sprayed (spray volume), and the brand of chemical used (brand) are factors that may affect the uniformity of the coating applied. A replicated 2^3 experiment was conducted in an effort to increase the coating uniformity. In the following table, higher values of the response variable are associated with higher surface uniformity:

                                    Surface uniformity
      Spray    Belt            Replication   Replication
Run   volume   speed   Brand        1             2
1       −1      −1      −1         40            36
2       +1      −1      −1         25            28
3       −1      +1      −1         30            32
4       +1      +1      −1         50            48
5       −1      −1      +1         45            43
6       +1      −1      +1         25            30
7       −1      +1      +1         30            29
8       +1      +1      +1         52            49

a. Calculate all main effects and two-factor interaction effects for this experiment.
b. Create the ANOVA table for this experiment. Which factors appear to have an effect on surface uniformity? (Use α = .01.)

53. A half-fraction of a 2^5 experiment is used to study the effects of heating time (A), quenching time (B), drawing time (C), position of heating coils (D), and measurement position (E) on the hardness of steel castings. The following data was obtained:

Test run  Obs     Test run  Obs
a         70.4    acd       66.6
b         72.1    ace       67.5
c         70.4    ade       64.0
d         67.4    bcd       66.8
e         68.0    bce       70.3
abc       73.8    bde       67.9
abd       67.0    cde       65.9
abe       67.8    abcde     68.0

Assuming that second- and higher-order interactions are negligible, conduct tests (at α = .01) for the presence of main effects.


Bibliography

Box, G. E. P., W. G. Hunter, and J. S. Hunter, Statistics for Experimenters (2nd ed.), Wiley, New York, 2005. This is one of the definitive texts on industrial experimental design, with emphasis on 2^k designs and fractional factorial designs.

Daniel, C., Applications of Statistics to Industrial Experimentation, Wiley, New York, 1976. A classic text that briefly, yet eloquently, explains 2^k and fractional factorial designs from the point of view of the practitioner. The author's considerable experience in applying these designs makes it a very valuable reference.

Montgomery, D. C., Design and Analysis of Experiments (8th ed.), Wiley, New York, 2012. This book gives complete coverage of experimental designs, including general factorials, blocking, 2^k designs, fractional factorial designs, and more. Rigorous treatment, good examples, and easy to read.

Myers, R. H., D. C. Montgomery, and C. M. Anderson-Cook, Response Surface Methodology: Process and Product Optimization Using Designed Experiments (3rd ed.), Wiley, New York, 2009. Easy-to-read presentations of response surface analysis and factorial designs. Includes some of the most recent developments and tools in experimental design.

11
Inferential Methods in Regression and Correlation

11.1 Regression Models Involving a Single Independent Variable
11.2 Inferences About the Slope Coefficient β
11.3 Inferences Based on the Estimated Regression Line
11.4 Multiple Regression Models
11.5 Inferences in Multiple Regression
11.6 Further Aspects of Regression Analysis

Introduction
Regression and correlation were introduced in Chapter 3 as techniques for describing and summarizing data consisting of observations on a dependent or response variable y and one or more independent variables. We first focused on the case of a single independent variable x and suggested constructing a scatterplot of sample data (x1, y1), . . . , (xn, yn) to gain preliminary insight into the nature of any relationship between the two variables. When the scatterplot exhibits a linear pattern, a line fit to the data by the principle of least squares provides a convenient summary of the approximate relationship; the coefficient of determination r² describes what proportion of the total variation in the observed y values can be attributed to this relation. Substituting a particular x value into the linear equation results in a prediction for the value of y that would be observed if one more observation were made at this particular x value.


When data is available on k independent variables (k ≥ 2) x1, . . . , xk, the same line of reasoning leads to a best-fit prediction equation having the general form ŷ = a + b1x1 + · · · + bkxk and a value of the coefficient of multiple determination R². Again, a point prediction of y results from substituting specified values of the x's into the prediction equation.

In this chapter, we introduce probabilistic models as a way of describing situations where there is uncertainty in the value of y even after the values of selected predictor variables have been specified. Such models are then used to test various hypotheses of interest and to calculate both confidence intervals for mean y values and prediction intervals for individual y values to be observed at some future time. We also show how the sample correlation coefficient r can be used to test hypotheses about the population correlation coefficient ρ.

11.1 Regression Models Involving a Single Independent Variable
A deterministic relationship between two variables x and y is one in which the value of y is completely and uniquely determined, with no uncertainty, by the value of x. Such a relationship can be described using traditional mathematical notation: y = f(x), where f(x) is a specified function, such as 10 + 2x, 5e^(−.2x), or 100 − 4√x. In many engineering and science applications, it is unreasonable to assume that the variables of interest are deterministically related. For example, there is presumably a strong relationship between x = engine horsepower and y = time to go from 0 mph to 60 mph. Yet it is possible for two different engines with the same x values to result in two different values of y, so that the value of the latter variable is not determined solely by the value of the former.

A description of the relations between variables x and y that are not deterministically related can be given by specifying a probabilistic model. The general form of an additive probabilistic model allows y to deviate from f(x) by a random amount. The model equation is

y = deterministic function of x + random deviation
  = f(x) + e

(The random deviation e is sometimes referred to as a random error.) Consider graphing f(x) on a two-dimensional rectangular coordinate system. If we fix x at the value x* and make an observation on y, in the absence of the random deviation, the resulting point (x*, y) would fall exactly on the graph. However, if e > 0, the point falls above the graph, whereas e < 0 implies that the point falls below the graph. So the role of the random deviation is to allow observed points to deviate from the graph of the deterministic function by random amounts.

Occasionally, some sort of theoretical argument will suggest an appropriate choice of f(x). Most frequently, though, a scatterplot of the data is used for this purpose. When

the plot shows a linear pattern, it is natural to take f(x) to be a linear function, resulting in what is called the simple linear regression model.

DEFINITIONS  The simple linear regression model assumes that there is a line with slope β and vertical or y intercept α, called the true or population regression line. When a value of the independent variable x is fixed and an observation on the dependent variable y is made, the variables are related by the model equation

y = α + βx + e

Without the random deviation e, all points would fall exactly on the population regression line. We shall assume that for any fixed x value, e has a normal distribution with mean value 0 (μ_e = 0) and standard deviation σ (σ_e = σ). We also assume that the random deviations e1, e2, . . . , en associated with different observations are independent of one another.

Figure 11.1 shows several observations in relation to the population regression line.

Figure 11.1  Several observations resulting from the simple linear regression model (observations at x = x1 and x = x2 deviate positively and negatively from the population regression line, which has slope β and vertical intercept α)

Randomness in e implies that y itself is subject to uncertainty. The foregoing assumptions about the distribution of e imply that the distribution of y values in repeated sampling satisfies certain properties. Consider y when x equals some fixed value x*, and let

μ_{y·x*} = mean or expected value of y when x = x*
σ²_{y·x*} = variance of y when x = x*
σ_{y·x*} = standard deviation of y when x = x*


For example, if x = engine horsepower and y = time to go from 0 to 60 mph, then

μ_{y·200} = mean time to go from 0 to 60 mph when horsepower is 200
σ_{y·200} = standard deviation of time when horsepower is 200

Because α and β in the equation y = α + βx* + e are fixed numbers, so is α + βx*. Taking the mean value on both sides of this equation then gives (since μ_e = 0)

μ_{y·x*} = α + βx*

which is just the height of the population regression line above the value x = x*. Similarly, taking the variance on both sides of the equation and using the fact that the variance of a constant is zero gives

σ²_{y·x*} = σ²_e = σ²        σ_{y·x*} = σ

That is, for a given value of x, the amount of variability in y is the same as the amount of variability in e, which in turn is the amount of variability about the population line. Finally, e is assumed to have a normal distribution, and the sum of a constant α + βx* and a normally distributed variable itself has a normal distribution. Thus the distribution of y for any fixed x value is normal.

For any fixed x value, the dependent variable y has a normal distribution with

(mean value of y for fixed x) = (height of the population regression line above x) = α + βx

(so the population regression line is the line of mean values) and

(standard deviation of y for fixed x value) = σ

The slope β of the population regression line is the mean or expected change in y associated with a 1-unit increase in x. The value of σ determines the extent to which (x, y) observations deviate from the population regression line—roughly speaking, it is the size of a "typical" deviation from the line. Most or even all of the (x, y) observations will fall quite close to the population line when σ is close to 0, but when σ is large there are likely to be some large deviations from the line. Finally, independence of the e's corresponding to different observations implies that the different y's are independent.

The key features of the model are illustrated in Figures 11.2 and 11.3. The three
normal curves in Figure 11.2 have identical spreads because the amount of variability
in y is the same at each x value.


Figure 11.2  The simple linear regression model: the population regression line y = α + βx (the line of mean values) with normal curves of common standard deviation σ centered at the mean values α + βx1, α + βx2, and α + βx3 for three different x values

Figure 11.3  Data from the simple linear regression model: (a) σ small; (b) σ large

Example 11.1  Recently the use of granite in construction and as an ornamental material has grown in popularity. However, due to its textural properties, granite is a difficult material to process by traditional machining methods. Abrasive waterjet (AWJ) is an advanced cutting process that has shown promise in improving granite machining. The authors of "Performance of Abrasive Waterjet in Granite Cutting: Influence of the Textural Properties" (J. of Materials in Civil Engr., 2012: 944–949) examined the effect of textural properties on the cutting performance of AWJ. The article suggested the simple linear regression model as a way to relate y = AWJ cut depth (mm) to x = granite grain size (mm).


Suppose that the parameter values for the actual model (as suggested by data in the cited article) are

β = −.4    α = 25.5    σ = .9 mm

Then for any particular fixed x value, y is normally distributed with

mean value = μ_{y·x} = 25.5 − .4x
standard deviation = σ_{y·x} = .9

For example, when x = 5, AWJ cut depth has mean value = 25.5 − .4(5) = 23.5 mm. Because 23.5 ± 2σ = 21.7 and 25.3, roughly 95% of all AWJ cut depths made when granite grain size is 5 mm will be between these limits. The slope β = −.4 is the mean decrease in AWJ cut depth associated with a 1-mm increase in granite grain size. Thus, if we make one observation on AWJ cut depth when x = 5 and another when x = 6, we expect the former cut depth to exceed the latter by .4 mm (but the actual difference in y values will almost always be either larger or smaller than this because observations will deviate from the population line).
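A brief simulation (our Python sketch, not part of the example) makes these statements concrete: generating many observations at x = 5 from the model reproduces both the mean cut depth of 23.5 mm and the roughly 95% coverage of the interval (21.7, 25.3).

import numpy as np

# Simulate Example 11.1 at x = 5: y = 25.5 - .4x + e, e ~ Normal(0, .9).
rng = np.random.default_rng(seed=1)
alpha, beta, sigma = 25.5, -0.4, 0.9

x = 5.0
y = alpha + beta * x + rng.normal(0.0, sigma, size=100_000)

print(y.mean())                          # close to 23.5 mm
print(np.mean((21.7 < y) & (y < 25.3)))  # close to .95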

In practice, the judgment as to whether the simple linear regression model is appropriate is virtually always based on sample data and a scatterplot. The plot should show a linear rather than a curved pattern, and the vertical spread of points should be relatively homogeneous throughout the range of x values. Figure 11.4 shows plots with three different patterns, only one of which is consistent with the model.

Figure 11.4  Some commonly encountered patterns in scatterplots: (a) consistent with the simple linear regression model; (b) suggests a nonlinear probabilistic model; (c) suggests that variability in y changes with x

Estimating Model Parameters

Estimates of the three parameters α, β, and σ² (or σ) are based on n sample observations (x1, y1), (x2, y2), . . . , (xn, yn) assumed to have been obtained independently according to the simple linear regression model; that is, y1 = α + βx1 + e1, y2 = α + βx2 + e2, and so on. Denote estimates of the intercept and slope by a and b, respectively. These estimates come from applying the principle of least squares introduced in Chapter 3; the least squares line has smaller sum of squared vertical deviations than does any other line.


The least squares estimates of the slope β and intercept α of the population regression
line are the slope and intercept, respectively, of the least squares line, given by

b = point estimate of β = Sxy/Sxx

a = point estimate of α = ȳ − b x̄

where

Sxy = Σ xiyi − (Σ xi)(Σ yi)/n

Sxx = Σ xi² − (Σ xi)²/n

The estimate of the population regression line is then just the least squares line
ŷ = a + bx

Let x* denote some particular value of the predictor variable x. Then a + bx* has two
different interpretations:
1. It is a point estimate of the mean y value when x = x* (i.e., of α + βx*).
2. It is a point prediction of an individual y value to be observed when x = x*.

Example 11.2 Variations in clay brick masonry weight have implications not only for structural and
acoustical design but also for design of heating, ventilating, and air conditioning sys-
tems. The article “Clay Brick Masonry Weight Variation” (J. of Architectural Engr., 1996:
135–137) gave a scatterplot of y = mortar dry density (lb/ft³) versus x = mortar air content (%)
for a sample of mortar specimens, from which the following representative data was read:
x: 5.7 6.8 9.6 10.0 10.7 12.6 14.4 15.0 15.3
y: 119.0 121.3 118.2 124.0 112.3 114.1 112.2 115.1 111.3
x: 16.2 17.8 18.7 19.7 20.6 25.0
y: 107.2 108.9 107.8 111.0 106.2 105.0
The scatterplot of this data in Figure 11.5 certainly suggests the appropriateness of the
simple linear regression model; there appears to be a substantial negative linear rela-
tionship between air content and density, one in which density tends to decrease as air
content increases.
The values of the summary statistics required for calculation of the least squares
estimates are
Σxi = 218.1    Σyi = 1693.6    Σxi² = 3577.01
Σxiyi = 24,252.54    Σyi² = 191,672.90


Figure 11.5 Scatterplot of the data from Example 11.2 (x = air content, %; y = dry density, lb/ft³)

from which

Sxy = 24,252.54 − (218.1)(1693.6)/15 = −372.404000

Sxx = 3577.01 − (218.1)²/15 = 405.836000

b = −372.404000/405.836000 = −.917622 ≈ −.9176

a = 1693.6/15 − (−.917622)(218.1/15) = 126.248889 ≈ 126.25
The equation of the estimated regression line (the least squares line) is then

ŷ = 126.25 − .9176x

Substitution of the air content value 12.0 into this equation gives ŷ = 115.24, which
can be interpreted either as a point estimate of the mean dry density for all specimens
whose air content is 12% or as a prediction for the dry density of a single mortar speci-
men whose air content is 12%.
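The hand calculations above are easy to reproduce in software. The following is a minimal sketch in Python with NumPy (an illustration, not the article's own analysis); the arrays hold the 15 (x, y) pairs listed at the start of this example:

import numpy as np

x = np.array([5.7, 6.8, 9.6, 10.0, 10.7, 12.6, 14.4, 15.0, 15.3,
              16.2, 17.8, 18.7, 19.7, 20.6, 25.0])
y = np.array([119.0, 121.3, 118.2, 124.0, 112.3, 114.1, 112.2, 115.1, 111.3,
              107.2, 108.9, 107.8, 111.0, 106.2, 105.0])

n = len(x)
Sxy = np.sum(x * y) - x.sum() * y.sum() / n   # about -372.404
Sxx = np.sum(x ** 2) - x.sum() ** 2 / n       # about 405.836

b = Sxy / Sxx                  # slope estimate, about -.9176
a = y.mean() - b * x.mean()    # intercept estimate, about 126.25

print(a + b * 12.0)            # estimate/prediction at 12% air content, about 115.24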

Inferences based on the fitted model require that the error standard deviation σ
be estimated. The estimate is based on calculating the vertical deviations from
the estimated regression line. First, the predicted or fitted values are obtained by
substituting the x values from the sample into the equation of the estimated regres-
sion line: ŷ1 = a + bx1, ŷ2 = a + bx2, and so on. The residuals are then the differences
between the observed y values and the predicted y values: y1 − ŷ1, . . . , yn − ŷn. These are
the vertical deviations from the points in the scatterplot to the estimated regression line
(least squares line). Squaring and summing these residuals gives the residual or error sum
of squares, denoted by either SSResid or by SSE:

SSResid = Σ(yi − ŷi)² = Syy − bSxy


Each sum of squares in statistics has associated with it a specified number of degrees
of freedom. In simple linear regression, SSResid is based on n − 2 df, because before
SSResid can be calculated, the two parameters α and β must be estimated, resulting
in a loss of 2 df (just as in the case of a single sample, estimating μ by x̄ gives the sum
of squares Σ(xi − x̄)² based on n − 1 df). The statistic for estimating the third model
parameter σ² is the mean square error, obtained by dividing error SS by its df:

estimate of σ² = se² = SSResid/(n − 2)

estimate of σ = se = √(se²)

Roughly speaking, se is the size of a typical deviation in the sample from the estimated
regression line.
In Chapter 3, SSResid was interpreted as a measure of the variation in observed y
values not explained by the approximate linear relationship between x and y. We also
introduced total sum of squares

SSTo = Syy = Σ(yi − ȳ)² = Σyi² − (Σyi)²/n
interpreted as a measure of total variation in the observed y values. In the present con-
text, the coefficient of determination

r² = 1 − SSResid/SSTo

is interpreted as the proportion of observed y variation that can be attributed to (or,
equivalently, explained by) the simple linear regression model relationship between y
and x. The closer r² is to 1.0, the better the model explains the y variation. The differ-
ence between SSTo and SSResid is itself a sum of squares, called the regression sum of
squares, which is interpreted as explained variation:

SSRegr = SSTo − SSResid    r² = SSRegr/SSTo

Example 11.3 Let’s reconsider the data on x = air content and y = mortar dry density from
Example 11.2. The first predicted value and residual are

ŷ1 = 126.248889 − .917622(5.7) = 121.0184
y1 − ŷ1 = 119.0 − 121.0184 = −2.0184

(The negative residual implies that the point (5.7, 119.0) lies below the estimated
regression line.) The relevant sums of squares are

SSTo = Syy = 191,672.90 − (1693.6)²/15 = 454.1693
SSResid = Syy − bSxy = 454.1693 − (−.917622)(−372.4040) = 112.4432


from which the coefficient of determination is

r² = 1 − 112.4432/454.1693 = .752

Thus roughly 75% of the observed variation in density can be attributed to the simple
linear regression model relationship between density and air content. SSResid is
based on 15 − 2 = 13 df, and the estimates of the “error” variance and standard
deviation are

se² = 112.4432/13 = 8.6495    se = 2.941

Figure 11.6 shows output from the SAS software package. Values on the output agree
quite closely with our hand calculations.
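The residual-based quantities can also be computed in a few lines. Here is a minimal Python sketch (illustrative, with the Example 11.2 data repeated so the snippet stands alone):

import numpy as np

x = np.array([5.7, 6.8, 9.6, 10.0, 10.7, 12.6, 14.4, 15.0, 15.3,
              16.2, 17.8, 18.7, 19.7, 20.6, 25.0])
y = np.array([119.0, 121.3, 118.2, 124.0, 112.3, 114.1, 112.2, 115.1, 111.3,
              107.2, 108.9, 107.8, 111.0, 106.2, 105.0])

n = len(x)
b = (np.sum(x * y) - x.sum() * y.sum() / n) / (np.sum(x ** 2) - x.sum() ** 2 / n)
a = y.mean() - b * x.mean()

resid = y - (a + b * x)              # residuals y_i - yhat_i
SSResid = np.sum(resid ** 2)         # about 112.44
SSTo = np.sum((y - y.mean()) ** 2)   # about 454.17

r2 = 1 - SSResid / SSTo              # about .752
se = np.sqrt(SSResid / (n - 2))      # about 2.94, based on 13 df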


Figure 11.6 SAS output for the data of Example 11.3


Exponential Regression
A scatterplot of data obtained in a scientific or engineering investigation will often show
curvature rather than a linear pattern. The scatterplot of Figure 11.7 shows a monotonic
pattern, a tendency for y to decrease as x increases (alternatively, y might tend to in-
crease as x increases). In this case, an exponential regression model may be a reasonable
way to relate y to x. The model equation is multiplicative rather than additive:
y = αe^(βx) · ε,    ε > 0

(The multiplicative random deviation is denoted by ε to avoid confusion with the base e
of the natural logarithm system, whose value is approximately 2.7182818.) The popula-
tion regression function is αe^(βx). When ε > 1, the point (x, y) lies above the graph of the
regression function, and ε < 1 implies that the point lies below the graph. Now consider
the percentage change in the population regression function when x increases by 1:

100[αe^(β(x+1)) − αe^(βx)]/αe^(βx) = 100(e^β − 1)

a constant not dependent on x. In simple linear regression, when x increases by 1 unit,
on average y will increase by a constant amount β; in this case, when x increases by
1 unit, on average y will increase (or decrease, if β < 0) by a constant percentage.

Figure 11.7 A scatterplot consistent with an exponential regression model
(y = time to rupture a brass specimen, x = applied stress)

Let’s now take the logarithm of both sides of the model equation:

y′ = ln(y) = ln(α) + βx + ln(ε) = α′ + β′x + ε′

where α′ = ln(α), β′ = β, and ε′ = ln(ε). This is exactly the equation for simple linear
regression. Thus to say that y and x are related via the exponential regression model is
the same as saying that ln(y) and x are related by the simple linear regression model
(provided that ln(ε) is normally distributed, which is equivalent to ε itself having a
lognormal distribution). In particular, using the previous formulas for the slope and
intercept of the least squares line on the (xi, ln(yi)) pairs gives point estimates of β and
ln(α), respectively. A point estimate of α results from taking the antilog of the estimate
for ln(α). Figure 11.8 shows the result of transforming the y values in Figure 11.7 by logs
and then fitting the simple linear regression model. The r² value from this regression is
obviously very high, so the simple linear regression model explains virtually all of the
observed variation in ln(time to rupture).

Figure 11.8 Minitab output from fitting the simple linear regression
model to the (x, ln(y)) pairs resulting from the data of Figure 11.7
(fitted line Y = 5.08298 − 5.55E-02 X, R-Sq = 98.8%)

The key point here is that making a transformation [transformed y′ = ln(y)] results
in the simple linear regression model. There are many other models nonlinear in y or x
for which a transformation on one or both of the variables recaptures the simple linear
regression model. Parameters of the original model can then be estimated in a relatively
straightforward way.
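To make the transformation idea concrete, here is a minimal sketch in Python. Because the rupture-time data of Figure 11.7 is not tabulated in the text, the sketch simulates data from an exponential regression model with assumed parameter values and then recovers estimates of α and β from a least squares fit to the (x, ln(y)) pairs:

import numpy as np

rng = np.random.default_rng(1)

# simulate from y = alpha * exp(beta * x) * eps with lognormal eps (assumed values)
alpha_true, beta_true = 160.0, -0.055
x = np.linspace(20, 70, 25)
y = alpha_true * np.exp(beta_true * x) * rng.lognormal(0.0, 0.1, x.size)

# simple linear regression of y' = ln(y) on x
yp = np.log(y)
n = x.size
b = (np.sum(x * yp) - x.sum() * yp.sum() / n) / (np.sum(x ** 2) - x.sum() ** 2 / n)
a = yp.mean() - b * x.mean()

beta_hat = b                # point estimate of beta
alpha_hat = np.exp(a)       # antilog recovers the estimate of alpha
print(alpha_hat, beta_hat)  # should be close to 160 and -.055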

Section 11.1 Exercises

1. The flow rate y (m³/min) in a device used for air-quality measurement depends on
   the pressure drop x (in. of water) across the device’s filter. Suppose that for x values
   between 5 and 20, the two variables are related according to the simple linear regression
   model with true regression line y = −.12 + .095x.
   a. What is the expected (i.e., true average) change in flow rate associated with a
      1-in. increase in pressure drop? Explain.
   b. What change in flow rate can be expected when pressure drop increases from
      10 in. to 15 in.?
   c. What is the expected (i.e., true average) flow rate when the pressure drop is
      10 in.? When the pressure drop is 15 in.?
   d. Suppose that σ = .025 and consider making repeated observations on flow rate
      when the pressure drop is 10 in. What is the long-run proportion of observed flow
      rates that will exceed .835 [that is, what is P(y > .835 when x = 10)]?

2. In a certain chemical process the reaction time y (hr) is known to be related according
   to the simple linear regression model to the temperature x (°F) in the chamber in which
   the reaction takes place. The model equation is y = 5.00 − .01x + e, with σ = .075.
   a. What is the true average change in reaction time associated with a 1°F increase in
      temperature? A 10°F increase in temperature?
   b. What is the true average reaction time when temperature is 200°F? When
      temperature is 250°F?
   c. What is P(2.4 < y < 2.6 when x = 250)? If an investigator makes five independent
      experimental runs, each for a temperature of 250°F, what is the probability that all
      five observed reaction times are between 2.4 and 2.6?

3. Let V be the vapor pressure of water (mm Hg) at a specific temperature T (°K). The
   Clausius–Clapeyron equation from physical chemistry suggests that y = ln(V) is related
   to x = 1/T according to the simple linear regression model.
   a. What is the implied probabilistic relationship between V and T?
   b. If the coefficients in the simple linear regression model are α = 20.607 and
      β = −5200.762, what would you predict for the value of vapor pressure when
      temperature is 300?

4. The article “Characterization of Highway Runoff in Austin, Texas, Area” (J. of Envir.
   Engr., 1998: 131–137) gave a scatterplot, along with the least squares line, of x = rainfall
   volume (m³) and y = runoff volume (m³) for a particular location. The accompanying
   values were read from the plot:

   x: 5 12 14 17 23 30 40 47
   y: 4 10 13 15 15 25 27 46
   x: 55 67 72 81 96 112 127
   y: 38 46 53 70 82 99 100

   a. Does a scatterplot of the data support the use of the simple linear regression model?
   b. Calculate point estimates of the slope and intercept of the population regression line.
   c. Calculate a point estimate of the true average runoff volume when rainfall volume is 50.
   d. Calculate a point estimate of the error standard deviation σ.
   e. What proportion of the observed variation in runoff volume can be attributed to the
      simple linear regression relationship between runoff and rainfall?

5. The bond behavior of reinforcing bars is an important determinant of strength and
   stability. The article “Experimental Study on the Bond Behavior of Reinforcing Bars
   Embedded in Concrete Subjected to Lateral Pressure” (J. of Materials in Civil Engr.,
   2012: 125–133) reported the results of one experiment in which the researchers applied
   varying levels of lateral pressure on 21 concrete cube specimens, each with an embedded
   16-mm plain steel round bar, and measured the corresponding bond capacity. Due to
   differing concrete cube strengths (fcu, in MPa), the applied lateral pressure was equivalent
   to a fixed proportion of the specimen’s fcu (0, .1fcu, . . . , .6fcu). Also, since bond strength
   can be heavily influenced by the specimen’s fcu, bond capacity was expressed as the ratio
   of bond strength (MPa) to √fcu.

   Pressure: 0 0 0 .1 .1 .1 .2
   Ratio: 0.123 0.100 0.101 0.172 0.133 0.107 0.217
   Pressure: .2 .2 .3 .3 .3 .4 .4
   Ratio: 0.172 0.151 0.263 0.227 0.252 0.310 0.365
   Pressure: .4 .5 .5 .5 .6 .6 .6
   Ratio: 0.239 0.365 0.319 0.312 0.394 0.386 0.320

   a. Does a scatterplot of the data support the use of the simple linear regression model?
   b. Calculate point estimates of the slope and intercept of the population regression line.
   c. Calculate a point estimate of the true average bond capacity when lateral pressure is .45fcu.
   d. Calculate a point estimate of the error standard deviation σ.

6. A study reported in the article “The Effects of Water Vapor Concentration on the
   Rate of Combustion of an Artificial Graphite in Humid Air Flow” (Combustion and
   Flame, 1983: 107–118) gave data on x = temperature of a nitrogen–oxygen mixture
   (1000s of °F) under specified conditions and y = oxygen diffusivity. Summary
   quantities are

   n = 9    Σxi = 12.6    Σyi = 27.68
   Σxi² = 18.24    Σxiyi = 40.968
   Σyi² = 93.3448

   a. Assuming that the variables are related by the simple linear regression model,
      determine the equation of the estimated regression line.
   b. Calculate a point estimate of mean diffusivity when temperature is 1.5. How does
      this point estimate compare to a point prediction of the diffusivity value that would
      result from making one more observation when temperature is 1.5?
   c. Estimate the error standard deviation σ.
   d. Calculate and interpret the coefficient of determination.

7. Timber piles are often used to buttress multiple-span simply supported (MSSS) bridges
   that are commonly found in rural areas. The authors of “Bridge Timber Piles Load
   Rating under Eccentric Loading Conditions” (J. Bridge Engr., 2012: 700–710) examined
   the effect of various geometric and structural characteristics on the critical rating (an
   overall structural assessment score) of MSSS bridges. The article reported the following
   data (read from a graph) for x = timber pile length (m) and y = critical rating for a
   particular timber profile at various damage levels.

   x = Timber pile length (m): 7.32 7.93 8.54 9.14 9.75
   y = Critical rating (damage = 0%): 59.09 54.79 49.74 44.11 37.99
   y = Critical rating (damage = 20%): 57.52 52.63 44.28 33.85 25.74
   y = Critical rating (damage = 40%): 43.94 30.70 19.12 9.77 2.48

   a. Create the scatterplots of the (x, y) pairs at each of the three damage levels. Does
      each scatterplot suggest that a simple linear regression model holds for the
      respective variables?
   b. For each pair, calculate point estimates of the slope and intercept of the respective
      population regression line and determine the corresponding coefficients of
      determination.
   c. Given the slope coefficients from the regression, summarize the relationship
      between critical rating and pile length as timber damage changes from 0%, to 20%,
      and to 40%.
   d. Calculate a point estimate of the error standard deviation σ for each of the pairs.
      How do these point estimates change as timber damage increases from 0% to 20%
      and then to 40%?

8. Exercise 30 in Section 3.4 gave data on x = testing temperature and y = dynamic shear
   modulus for a particular asphalt binder type. A scatterplot of x and y′ = log(y) shows a
   substantial linear pattern, suggesting that these variables are related by the simple linear
   regression model.
   a. What probabilistic model for relating y = dynamic shear modulus to x = testing
      temperature is implied by the simple linear regression relationship between x and y′?
   b. Summary quantities calculated from the data are

      n = 7    Σxi = 211.4    Σy′i = 40.64
      Σxi² = 8449.68    Σ(y′i)² = 282.58
      Σxiy′i = 917.48

      Calculate estimates of the parameters for the model in part (a), and then obtain a
      point prediction of dynamic shear modulus when temperature is 35°F.

9. The authors of the article “Long-Term Effects of Cathodic Protection on Prestressed
   Concrete Structures” (Corrosion, 1997: 891–908) presented a scatterplot of y = steady-state
   permeation flux (μA/cm²) versus x = inverse foil thickness (cm⁻¹); the substantial linear
   pattern was used as a basis for an important conclusion about material behavior. This is
   the Minitab output from fitting the simple linear regression model to the data.

   The regression equation is
   flux = –0.398 + 0.260 invthick
   Predictor     Coef     Stdev   t-ratio      p
   Constant   –0.3982    0.5051     –0.79  0.460
   invthick   0.26042   0.01502     17.34  0.000
   s = 0.4506   R-sq = 98.0%   R-sq(adj) = 97.7%
   Analysis of Variance
   Source       DF      SS      MS       F      p
   Regression    1  61.050  61.050  300.64  0.000
   Error         6   1.218   0.203
   Total         7  62.269


   Obs.  invthick  flux     Fit  Stdev.Fit  Residual  St.Resid
   1         19.8   4.3   4.758      0.242    –0.458     –1.20
   2         20.6   5.6   4.966      0.233     0.634      1.64
   3         23.5   6.1   5.722      0.203     0.378      0.94
   4         26.1   6.2   6.399      0.182    –0.199     –0.48
   5         30.3   6.9   7.493      0.161    –0.593     –1.41
   6         43.5  11.2  10.930      0.236     0.270      0.70
   7         45.0  11.3  11.321      0.253    –0.021     –0.06
   8         46.5  11.7  11.711      0.271    –0.011     –0.03

   a. Interpret the estimated slope and the coefficient of determination.
   b. Calculate a point estimate of true average flux when inverse foil thickness is 23.5.
   c. Predict the value of flux that would result from a single observation made when
      inverse foil thickness is 45.
   d. Verify that the sum of the residuals is zero and that squaring and summing the
      residuals results in the value of SSResid given in the output.

11.2 Inferences About the Slope Coefficient β


The slope β of the population regression line is the true average change in the depen-
dent variable y associated with a 1-unit increase in the independent variable x. The
slope of the least squares line, b, gives a point estimate of β. A confidence interval is
a more effective way to estimate a parameter than is a point estimate, because it gives
information about reliability (via the confidence level) and precision (from the width of
the interval). Recall that the development of the one-sample t confidence interval for a
population mean μ was based on properties of the sampling distribution of the statistic
x̄: μx̄ = μ, σx̄ = σ/√n, and that x̄ is normally distributed when the population itself has
a normal distribution. These results in turn implied that the standardized variable

t = (x̄ − μ)/(s/√n)

has a t distribution with n − 1 degrees of freedom, from which the interval estimate x̄ ±
(t critical value)(s/√n) emerges.
In the same way that the statistic x̄ varies in value from sample to sample, the sta-
tistic b does also. For example, if the slope of the population regression line is actually
β = 25.0, a first sample might result in b = 24.2, a second in an estimate of 26.5, a third
in 25.4, and so on.

Properties of the Sampling Distribution of b

1. μb = β (The sampling distribution is always centered at the value of what the statistic
is trying to estimate; that is, b is an unbiased statistic.)

2. σb = σ/√Sxx, where Sxx = Σ(xi − x̄)² = Σxi² − (Σxi)²/n.

The estimated standard deviation of b results from replacing σ by its estimate se:

sb = se/√Sxx

3. b is normally distributed (because e in the model equation is assumed to have a normal
distribution).


The smaller the value of σb, the more precisely β will tend to be estimated. Because σ
is in the numerator, the less variability there is about the population line, the smaller is
the standard deviation of b and the more concentrated is its sampling distribution. The
value of σ is, of course, not under our control. However, we may be able to have an
impact on the value of Sxx. Because this quantity is in the denominator of σb, the larger
its value, the smaller is the value of the standard deviation. Since Sxx is a measure of how
much the xi values in the sample spread out, the implication is that spreading out the
values of the independent variable tends to give a more precise estimate than if these
values were quite close together. Intuitively, if the sample xi values were highly concen-
trated, very small changes in the resulting yi’s might substantially affect the slope of the
least squares line, whereas such changes would have little effect on the slope if the xi’s
were quite spread out. So if the investigator can select the x values at which observations
will be made (frequently not possible in social science and business scenarios), they
should be spread out as much as possible while still preserving the approximate linearity
of the relationship between x and y.
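This effect of the spread of the x values is easy to see in a small simulation. The following Python sketch (illustrative parameter values assumed, not from any cited study) draws many samples from the model and compares the sampling variability of b for bunched versus spread-out x values:

import numpy as np

rng = np.random.default_rng(7)
alpha, beta, sigma = 50.0, 2.0, 3.0   # assumed population line and error SD

def slope_sd(x, reps=10_000):
    # simulate repeated samples at the fixed x values; return the SD of b
    Sxx = np.sum((x - x.mean()) ** 2)
    bs = np.empty(reps)
    for i in range(reps):
        y = alpha + beta * x + rng.normal(0.0, sigma, x.size)
        bs[i] = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    return bs.std()

x_narrow = np.linspace(9, 11, 10)   # x values bunched together (Sxx about 4.07)
x_wide = np.linspace(2, 18, 10)     # same n, x values spread out (Sxx about 260.7)

print(slope_sd(x_narrow))   # near sigma/sqrt(Sxx), about 1.49
print(slope_sd(x_wide))     # about .19, a much more precise estimator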

A Confidence Interval for the Slope Parameter β

Just as in the case of x̄ and μ, the foregoing properties allow us to form a t variable,
which then gives rise to the desired confidence interval.

The standardized variable

t = (b − β)/sb

has a t distribution based on n − 2 df. This in turn implies that a confidence interval
for β is

b ± (t critical value) sb

Appendix Table IV contains t critical values corresponding to the most frequently used
confidence levels.

Example 11.4 Let’s reconsider the data on air content and mortar dry density introduced in
Examples 11.2 and 11.3. In this context, β is the average or expected change in dry
density associated with an increase of 1% in air content. We previously calculated
Sxx = 405.836000, b = −.918, and se = 2.941, from which the estimated standard
deviation (standard error) of b is

sb = 2.941/√405.836 = .1460

The confidence interval is based on n − 2 = 15 − 2 = 13 df, and the correspond-
ing t critical value for a confidence level of 95% is 2.160. The confidence interval is

−.918 ± (2.160)(.1460) = −.918 ± .315 = (−1.233, −.603)


With a high degree of confidence, we estimate that an average decrease in density of
between .603 lb/ft³ and 1.233 lb/ft³ is associated with a 1% increase in air content (at
least for air content values between roughly 5 and 25%, corresponding to the x values
in our sample). The interval is reasonably narrow, indicating that the slope of the
population line has been precisely estimated. Notice that the interval includes only
negative values, so we can be quite confident of the tendency for density to decrease
as air content increases.
Looking back to the SAS output of Figure 11.6, we find the value of sb in the
Parameter Estimates table as the second number in the Standard Error column.
All of the widely used statistical packages include this estimated standard error in
their output. There is also an estimated standard error for the statistic a, from which a
confidence interval for the intercept α of the population regression line can be
calculated.
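The interval in Example 11.4 can be reproduced with a few lines of code; here is a minimal sketch in Python (SciPy assumed available), using the summary values computed earlier:

import numpy as np
from scipy.stats import t

n, b, se, Sxx = 15, -0.917622, 2.941, 405.836

sb = se / np.sqrt(Sxx)            # estimated standard error of b, about .1460
tcrit = t.ppf(0.975, df=n - 2)    # 2.160 for 13 df and a 95% confidence level

lower = b - tcrit * sb
upper = b + tcrit * sb
print(lower, upper)               # about (-1.233, -.603)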

Testing Hypotheses About β

The form of the null hypothesis when testing hypotheses about a population or process
mean μ was H0: μ = μ0, where the symbol μ0 (“mu naught”) represented the value of
μ asserted to be true by the null hypothesis, or simply the null value. The test statistic
resulted from using the null value to standardize x̄: t = (x̄ − μ0)/(s/√n). Let β0 denote
the null value when testing hypotheses about β. Then when H0: β = β0 is true, the
statistic t = (b − β0)/sb has a t distribution based on n − 2 df. The P-value for the test
(probability of obtaining a value of b more contradictory to H0 than the value actually
obtained from the given sample) is then a t curve tail area whose computation depends
on the nature of the inequality in the alternative hypothesis.

Null hypothesis: H0: β = β0

Test statistic: t = (b − β0)/sb, which is based on n − 2 df

Alternative hypothesis    Type of test    P-value

Ha: β > β0    Upper-tailed    Area under the n − 2 df t
                              curve to the right of the
                              calculated t
Ha: β < β0    Lower-tailed    Area under the n − 2 df t
                              curve to the left of the
                              calculated t
Ha: β ≠ β0    Two-tailed      Twice the area under the
                              n − 2 df t curve to the right
                              of the calculated |t|

Upper-tail areas captured under various t curves are given in Appendix Table VI. Because
t curves are symmetric about zero, these are also lower-tail areas.


In practice, the most frequently tested null hypothesis is H0: β = 0. When the slope
of the population regression line is zero, there is no useful linear relationship between x
and y. The usual alternative hypothesis is Ha: β ≠ 0, according to which there is a useful
linear relationship between the two variables. A test of these two hypotheses is often re-
ferred to as the model utility test in simple linear regression. Unless H0 can be rejected
at a reasonably small significance level, the simple linear regression model should not be
used as a basis for making various inferences (e.g., for predicting y from knowledge of x).
In practice, the model will generally be judged useful by this test when r² is reasonably
large. On occasion, the alternatives Ha: β > 0 or Ha: β < 0 may be of interest; the for-
mer says that there is in fact a positive linear relationship between the two variables (a
tendency for y to increase linearly as x increases). The test statistic in all three cases is
the t-ratio, b/sb.

Example 11.5 The presence of hard alloy carbides in high chromium white iron alloys results in
excellent abrasion resistance, making them suitable for materials handling in the
mining and materials processing industries. The accompanying data on x = retained
austentite content (%) and y = abrasive wear loss (mm³) in pin wear tests with garnet
as the abrasive was read from a plot in the article “Microstructure-Property Relation-
ships in High Chromium White Iron Alloys” (Intl. Materials Reviews, 1996: 59–82).
x: 4.6 17.0 17.4 18.0 18.5 22.4 26.5 30.0 34.0
y: .66 .92 1.45 1.03 .70 .73 1.20 .80 .91
x: 38.8 48.2 63.5 65.8 73.9 77.2 79.8 84.0
y: 1.19 1.15 1.12 1.37 1.45 1.50 1.36 1.29
A scatterplot of the data (not shown) suggests that the simple linear regression model
may specify a useful relationship between these two variables. Is this indeed the case?
Let’s base our analysis on the SAS output in Figure 11.9.


Figure 11.9 SAS output from a simple linear regression of the data in Example 11.5


The parameter of interest is β, the average change in wear loss associated with a
1% (i.e., 1-unit) increase in austentite content. The relevant hypotheses are

H0: β = 0 (the model is not useful)
Ha: β ≠ 0 (there is a useful linear relationship between the variables)

The test statistic is the model utility t-ratio t = b/sb. From the Parameter Estimates
table in Figure 11.9,

b = .007570    sb = .00192626    t = .007570/.00192626 = 3.93 ≈ 3.9

The two-tailed test is based on n − 2 = 15 df. In Appendix Table VI, the area under
the 15 df t curve to the right of 3.9 is .001, so the P-value for the test is roughly .002.
Figure 11.9 gives this P-value as .0013 (so the area to the right of 3.93 must be about
.00065). Clearly the P-value is smaller than either .05 or .01. H0 can obviously be
rejected in favor of the conclusion that there is a useful linear relationship. Notice
that the r² value is .507, which is not terribly impressive. But as long as n is not too
small, the model will be judged useful even when r² is moderate to small.
The article’s authors asserted that “increasing the austentite content leads to
greater wear rates with garnet as the abrasive.” The implied alternative hypothesis
is Ha: β > 0 (a positive linear relationship). The P-value for this upper-tailed test is
about .001 (more exactly, .00065), which clearly supports the authors’ contention.
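A sketch of the same model utility test in Python, using the values reported in the SAS output (SciPy assumed available):

from scipy.stats import t

n = 17
b, sb = 0.007570, 0.00192626

t_ratio = b / sb                              # about 3.93
p_two_tailed = 2 * t.sf(abs(t_ratio), n - 2)  # about .0013, matching Figure 11.9
p_upper = t.sf(t_ratio, n - 2)                # about .00065 for Ha: beta > 0

print(t_ratio, p_two_tailed, p_upper)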

Regression and ANOVA


An alternative to the t test for model utility is based on the decomposition of total sum
of squares into regression or model sum of squares and error sum of squares:

SSTo = SSRegr + SSResid

where df = 1 for SSRegr and df = n − 2 for SSResid. The two mean squares are then
MSRegr = SSRegr/1 and MSResid = SSResid/(n − 2), and the F ratio is given by F =
MSRegr/MSResid. The calculations are usually summarized in an ANOVA table, as
shown in Table 11.1.

Table 11.1 ANOVA table for simple linear regression

Source of variation   df      Sum of squares   Mean square       F                 P-value
Model (Regression)    1       SSRegr           SSRegr/1          MSRegr/MSResid    Area to right of
                                                                                   calculated F
Error                 n − 2   SSResid          SSResid/(n − 2)
Total                 n − 1   SSTo

Looking at the ANOVA table on the SAS output of Figure 11.9, we see that the
calculated F ratio for the data of Example 11.5 is F = 15.444, and the corresponding
P-value (the area under the F curve with 1 numerator and 15 denominator df to the
right of 15.444) is .0013. That this P-value is identical to the P-value for the model utility
t test is no accident: It can be shown that t² = F [(3.930)² = 15.444 in Example 11.5],
and the distribution of the square of a t variable with ν df is the F distribution with 1
numerator and ν denominator df. However, in multiple regression, the test for model
utility is an F test, and t tests are used for another purpose.
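The t² = F relationship is easy to verify numerically; here is a brief Python check using the values from Example 11.5 (SciPy assumed available):

from scipy.stats import t, f

t_val, err_df = 3.930, 15

F_val = t_val ** 2                    # 15.44, the F ratio in Figure 11.9
p_t = 2 * t.sf(t_val, err_df)         # two-tailed t test P-value
p_F = f.sf(F_val, 1, err_df)          # F test P-value with (1, 15) df

print(F_val, p_t, p_F)                # the two P-values agree (about .0013)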

Correlation Revisited
The sample correlation coefficient r was introduced in Chapter 3 as a measure of the
extent of linear association between values of x and y in a sample. An analogous mea-
sure for the entire population from which the sample of pairs was selected is called the
population correlation coefficient and is denoted by ρ. The most important properties
of r are also satisfied by ρ; in particular, −1 ≤ ρ ≤ 1, so the closer ρ is to 1 or −1, the
stronger the linear relationship within the population. The value ρ = 0 indicates the
complete absence of any linear relationship in the population. Even if ρ = 0, the value
of r will usually differ somewhat from zero because of sampling variability—r is a sta-
tistic and its value will vary from sample to sample in the same way that x̄ and b do. It
is therefore important to have a formal test of the null hypothesis that ρ = 0. The usual
test procedure assumes that (x1, y1), . . . , (xn, yn) have been randomly selected from a
bivariate normal population distribution (introduced in Section 3.6). This assumption
is difficult to check. A partial assessment of plausibility is based on constructing one
normal quantile plot of the x’s and another of the y’s. A nonlinear pattern in either plot
is a warning of implausibility.

A Test for Linear Association in a Bivariate Normal Population

Null hypothesis: H0: ρ = 0

Test statistic: t = r√(n − 2) / √(1 − r²)

When H0 is true, the test statistic has a t distribution based on n − 2 df, so a P-value is
computed as was done for previous tests. In particular, the usual alternative hypothesis is
Ha: ρ ≠ 0 (some linear association, positive or negative, in the population), for which the test
is two-tailed and the P-value is twice the tail area captured by the calculated |t|.

Example 11.6 Neurotoxic effects of manganese are well known and are usually caused by high oc-
cupational exposure over long periods of time. In the fields of occupational hygiene
and environmental hygiene, the relationship between lipid peroxidation, which is
responsible for deterioration of foods and damage to live tissue, and occupational
exposure has not been previously reported. The article “Lipid Peroxidation in Work-
ers Exposed to Manganese” (Scand. J. Work and Environ. Health, 1996: 381–386)
gave data on x = manganese concentration in blood (ppb) and y = concentration
(μmol/L) of malondialdehyde, which is a stable product of lipid peroxidation, both
for a sample of 22 workers exposed to manganese and for a control sample of 45 indi-
viduals. The value of r for the control sample was .29, from which

t = (.29)√(45 − 2) / √(1 − (.29)²) ≈ 2.0

The corresponding P-value for a two-tailed t test based on 43 df is roughly .052 (the
cited article reported only that P-value > .05). We would not want to reject the as-
sertion that ρ = 0 at either significance level .01 or .05. For the sample of exposed
workers, r = .83 and t ≈ 6.7, clear evidence that there is a linear relationship in the
entire population of exposed workers from which the sample was selected.
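The control-sample calculation can be checked in a couple of lines; here is a minimal Python sketch (SciPy assumed available):

import numpy as np
from scipy.stats import t

r, n = 0.29, 45   # control sample

t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)   # about 1.99
p_value = 2 * t.sf(abs(t_stat), n - 2)              # about .053 with 43 df

print(t_stat, p_value)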

The hypothesis H0: β = 0 for the model utility test in regression also asserts that
there is no linear relationship between x and y. Although it is certainly not obvious by
inspection, it can be shown that the t-ratio b/sb is algebraically identical to the t statistic
in the previous box for testing ρ = 0. The value of the latter statistic is easier to compute,
since it requires only r and not any of the calculations appropriate for regression.
Test procedures for H0: ρ = ρ0 when ρ0 ≠ 0 are rather complicated, as is the proce-
dure for obtaining a confidence interval for ρ. Please consult one of the chapter refer-
ences for further information.

Section 11.2 Exercises


10. Exercise 4 of Section 11.1 gave data on x = rainfall volume and y = runoff volume
    (both in m³). Use the accompanying Minitab output to decide whether there is a useful
    linear relationship between rainfall and runoff, and then calculate a confidence interval
    for the true average change in runoff volume associated with a 1-m³ increase in rainfall
    volume.

    The regression equation is
    runoff = –1.13 + 0.827 rainfall
    Predictor     Coef     Stdev   t-ratio      p
    Constant    –1.128     2.368     –0.48  0.642
    rainfall   0.82697   0.03652     22.64  0.000
    s = 5.240   R-sq = 97.5%   R-sq(adj) = 97.3%

11. In the same way that b/sb is the t-ratio for testing H0: β = 0, the t-ratio a/sa is
    appropriate for testing H0: α = 0, where sa is the estimated standard deviation of the
    statistic a and the test is again based on n − 2 df. The null hypothesis says that the
    vertical intercept of the population line is zero, so that the line passes through the
    origin (0, 0). Carry out this test using the information given in Exercise 10.

12. Use the computer output given in Exercise 9 of the previous section to decide whether
    the simple linear regression model specifies a useful relationship between flux and
    inverse foil thickness.

13. Exercise 22 (Section 3.3) of Chapter 3 gave SAS output from a regression of amount of
    oil recovered from wheat straw on amount of oil added.
    a. Does the simple linear regression model appear to specify a useful relationship
       between these two variables? State the relevant hypotheses, and carry out a test in
       two different ways.
    b. If the roles of the two variables were reversed, so that the amount of oil recovered
       from wheat straw was the independent variable, what would be the value of the
       t-ratio for testing model utility? (Answer without actually carrying out another
       regression analysis, and explain your reasoning.)

14. Exercise 20 (Section 3.3) of Chapter 3 presented data on y = dielectric constant and x = air void (%)
    for 18 asphalt mixture samples having 5% asphalt content. The following R output is
    from a simple linear regression of y on x:

                   Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)    4.858691    0.059768   81.293    <2e-16
    AirVoid       -0.074676    0.009923   -7.526  1.21e-06

    Residual standard error: 0.03551 on 16 degrees of freedom
    Multiple R-squared: 0.7797,  Adjusted R-squared: 0.766
    F-statistic: 56.63 on 1 and 16 DF, p-value: 1.214e-06

    Analysis of Variance Table
    Response: Dielectric
               DF    Sum Sq   Mean Sq  F value     Pr(>F)
    AirVoid     1  0.071422  0.071422   56.635  1.214e-06
    Residuals  16  0.020178  0.001261

    a. What are the values of SSRegr, SSResid, and SSTo?
    b. Determine and interpret the value of r² for this regression. What is the
       corresponding value of r? Note that the sign of r can be determined based on
       the output.
    c. Use the output to calculate a confidence interval with a confidence level of 95%
       for the slope β of the population regression line and interpret the resulting interval.
    d. Suppose it had previously been believed that when air void increased by 1 percent,
       the associated true average change in dielectric constant would be at least −.05.
       Does the sample data contradict this belief? State and test the relevant hypotheses.

15. Suppose that the unit of measurement for y = wear loss in Example 11.5 is changed
    from mm³ to in³, which amounts to multiplying each y value by the same conversion
    factor c. How does this change affect the value of the t-ratio for testing model utility?
    Explain your reasoning.

16. The value of the sample correlation coefficient is .722 for the n = 14 observations on
    average anterior maximum inclination angle (AMIA) in both the clockwise (Cl) and
    counterclockwise (Co) directions given in Exercise 10 (Section 3.2) of Chapter 3. Carry
    out a test at significance level .05 to decide whether these two variables are linearly
    related in the population from which the data was selected (assuming that the
    population distribution is bivariate normal).

17. A sample of n = 13 steel specimens was selected, and the values of x = nickel content
    and y = percentage austentite were determined, resulting in

    Σ(xi − x̄)² = 1.183    Σ(yi − ȳ)² = .05080
    Σ(xi − x̄)(yi − ȳ) = .2073

    Does there appear to be a positive linear relationship between these two variables in the
    sampled population? State and test the relevant hypotheses.

18. In what was surely an unpleasant data collection experience, the article “Annual
    Variations of Odor Concentrations and Emissions from Swine Gestation, Farrowing,
    and Nursery Buildings” (J. of the Air and Waste Mgmnt., 2011: 1361–1368) reported
    on monthly odor concentrations and emission rates from a Canadian swine farm for a
    period of one year. One study objective was to identify possible relationships, if any,
    between odor and presence of other gases such as ammonia (NH3), hydrogen sulfide
    (H2S), carbon dioxide (CO2), and methane (CH4). Identifying such relationships would
    be helpful in that the gas concentration could be used as an odor indicator.
    a. A scatterplot of the n = 32 observations on y = odor concentration (OU/m³) and
       x = H2S concentration (ppb) suggested the plausibility of a positive linear
       relationship. The coefficient of determination for the simple linear regression of
       y on x was .58. State and test the relevant hypotheses to see if the message from
       the scatterplot can be confirmed.
    b. A scatterplot of the n = 32 observations on y = odor concentration (OU/m³) and
       x = CH4 concentration (ppm) also suggested the plausibility of a positive linear
       relationship. The coefficient of determination for the simple linear regression of
       y on x was .33. State and test the relevant hypotheses to see if the message from
       the scatterplot can be confirmed.

19. How does lateral acceleration—side forces experienced in turns that are largely under
    driver control—affect nausea as perceived by bus passengers? The article “Motion
    Sickness in Public Road Transport: The Effect of Driver, Route, and Vehicle”
    (Ergonomics, 1999: 1646–1664) reported data on x = motion sickness dose (calculated
    in accordance with a British standard for evaluating
    similar motion at sea) and y = reported nausea (%). Relevant summary quantities are

    n = 17    Σxi = 221.1    Σyi = 193
    Σxi² = 3056.69    Σxiyi = 2759.6
    Σyi² = 2975

    Values of dose in the sample ranged from 6.0 to 17.6.
    a. Assuming that the simple linear regression model is valid for relating these two
       variables (this is supported by the raw data), calculate and interpret an estimate
       of the slope parameter that conveys information about the precision and reliability
       of estimation.
    b. Does it appear that there is a useful linear relationship between these two variables?
    c. Would it be sensible to use the simple linear regression model as a basis for
       predicting % nausea when dose = 5.0? Explain your reasoning.
    d. When Minitab was used to fit the simple linear regression model to the raw data,
       the observation (6.0, 2.50) was flagged as possibly having a substantial impact on
       the fit. Eliminate this observation from the sample and recalculate the estimate of
       part (a). Based on this, does the observation appear to be exerting an undue influence?

20. Mineral mining is one of the most important economic activities in Chile. Mineral
    products are frequently found in saline systems composed largely of natural nitrates.
    Freshwater is often used as a leaching agent for the extraction of nitrate, but the Chilean
    mining regions have scarce freshwater resources. An alternative leaching agent is
    seawater. The authors of “Recovery of Nitrates from Leaching Solutions Using
    Seawater” (Hydrometallurgy, 2013: 100–105) evaluated the recovery of nitrate ions from
    discarded salts using freshwater and seawater leaching agents. Tests were performed in
    salt columns irrigated at the same rate for a period of more than 150 hours. Here is data
    on x = leaching time (h), yfw = nitrate extraction percentage (freshwater), and ysw =
    nitrate extraction percentage (seawater):

    x: 25.5 31.5 37.5 43.5 49.5 55.5
    yfw: 25.7 43.2 55.3 62.9 68.6 73.2
    ysw: 26.4 40.1 50.2 57.4 62.7 67.3
    x: 61.5 67.5 73.5 79.5 85.5 91.5
    yfw: 76.7 79.4 81.8 83.7 85.1 86.5
    ysw: 71.4 74.7 77.8 80.3 82.3 84.1
    x: 97.5 103.5 109.5 115.5 121.5 127.5
    yfw: 87.7 88.6 89.6 90.5 90.7 91.2
    ysw: 85.5 86.6 87.9 89.0 89.9 90.6
    x: 133.5 139.5 145.5 151.5 157.5
    yfw: 91.9 92.5 93.1 93.9 94.7
    ysw: 91.2 91.8 92.3 92.8 93.3

    a. Construct scatterplots of yfw versus x and ysw versus x. Note the nonlinearity of the
       plots. Would it be reasonable to describe the patterns in both plots as curved and
       monotonic?
    b. In Section 3.4, we described how a power transformation can be applied to create a
       linear pattern in the transformed data. Using the transformation x′ = 1/x, construct
       scatterplots of yfw versus x′ and ysw versus x′. For each set of pairs, calculate point
       estimates of the slope and intercept of the respective population regression line.
    c. Does the simple linear regression model appear to specify a useful relationship
       between either dependent variable and x′ in part (b)? State and test the relevant
       hypotheses.
    d. The researchers concluded that the freshwater and seawater leaching agents yield
       similar nitrate extraction efficiencies. Using the regression models from part (b),
       calculate a point estimate of true nitrate extraction percentage when leaching time
       is 150 hours. Are the two estimates similar?

11.3 Inferences Based on the Estimated Regression Line 


Once the simple linear regression model has been judged useful by the model utility
test discussed in Section 11.2, the estimated model can be used as the basis for further
inferences. Let x* denote a particular value of the independent or predictor variable x.
In this section, we show how to obtain a confidence interval for the mean y value when

x = x* and also how to calculate a prediction interval for the value of a single y to be
observed at some time in the future when x = x*. For example, x might be the tensile
force applied to a steel specimen (1000s of lb) and y the resulting amount of elonga-
tion (thousandths of an inch). Then we might wish to calculate a confidence interval
(interval of plausible values) for the average amount of elongation for all specimens to
which a tensile force of 5000 lb is applied (so x* = 5). Alternatively, we might subject a
single specimen to a force of 5000 lb and wish to calculate a prediction interval (interval
of plausible values) for the resulting amount of elongation.
Recall that substituting a particular value x* into the equation of the estimated re-
gression line gives a number ŷ = a + bx* that has two different interpretations: It can be
regarded either as a point estimate of the mean y value when x = x* or as a point predic-
tion of the y value that would result from making a single observation when x has this
value. Because the point estimate and point prediction are single numbers, they convey
no information about the reliability or precision of estimation or prediction. An interval
gives information about reliability through its confidence or prediction level (e.g., 95%)
and about precision from the width of the interval.
Before we obtain sample data, both a and b are subject to sampling variability—that
is, they are both statistics whose values will vary from sample to sample. Suppose, for
example, that α = 50 and β = 2. Then a first sample of (x, y) pairs might give a = 52.35,
b = 1.895, a second sample might result in a = 46.52, b = 2.056, and so on. It follows that
ŷ = a + bx* itself varies in value from sample to sample, so it is a statistic. If the intercept
and slope of the population line are the aforementioned values 50 and 2, respectively, and
x* = 10, then this statistic is trying to estimate the value 50 + 2(10) = 70. The estimate
from a first sample might be 52.35 + 1.895(10) = 71.30, from a second sample might be
46.52 + 2.056(10) = 67.08, and so on. In the same way that a confidence interval for β
was based on properties of the sampling distribution of b, a confidence interval for a mean
y value in regression is based on properties of the sampling distribution of the statistic ŷ.

Properties of the Sampling Distribution of a + bx*

Let x* denote a particular value of the independent variable x. Then the sampling distribu-
tion of the statistic ŷ = a + bx* has the following properties:

1. The mean value of this statistic is α + βx*, so the sampling distribution is centered at
the value that the statistic is attempting to estimate (i.e., the statistic is unbiased for
estimating α + βx*).

2. The standard deviation of the statistic is

σŷ = σ√(1/n + (x* − x̄)²/Sxx)

The estimated standard deviation of the statistic ŷ, which we denote by sŷ, results from
replacing σ in this expression by its estimate, se.

3. The assumptions that any particular random deviation e in the model equation is nor-
mally distributed and that different deviations are independent of one another imply
that ŷ itself is normally distributed.

The values of both σŷ and sŷ increase as the value of (x* − x̄)² gets larger. That is,
these standard deviations increase in value as the specified value x* deviates farther from
x̄, the center of the x values for the sample observations. Thus the farther x* is from x̄, the
less precisely ŷ tends to estimate α + βx*.

A Confidence Interval for a Mean y Value

In the same way that a confidence interval for the slope was based on the t variable
t = (b − β)/sb, a confidence interval here is based on a standardized variable having a
t distribution.

The standardized variable

t = (ŷ − (α + βx*))/sŷ

has a t distribution based on n − 2 df, where

sŷ = se√(1/n + (x* − x̄)²/Sxx)

This implies that a confidence interval for α + βx*, the mean y value when x = x*, is

ŷ ± (t critical value) sŷ

The t critical values corresponding to the usual confidence levels are given in Appendix
Table IV; a value from the n − 2 df row of this table should be used.

Example 11.7 Corrosion of steel reinforcing bars is the most important durability problem for
reinforced concrete structures. Carbonation of concrete results from a chemical
reaction that lowers the pH value by enough to initiate corrosion of the rebar.
Representative data on x = carbonation depth (mm) and y = strength (MPa) for
a sample of core specimens taken from a particular building follow (read from
a plot in the article “The Carbonation of Concrete Structures in the Tropical
Environment of Singapore,” Magazine of Concrete Res., 1996: 293–300):
x: 8.0 15.0 16.5 20.0 20.0 27.5 30.0 30.0 35.0
y: 22.8 27.2 23.7 17.1 21.5 18.6 16.1 23.4 13.4
x: 38.0 40.0 45.0 50.0 50.0 55.0 55.0 59.0 65.0
y: 19.5 12.4 13.2 11.4 10.3 14.1 9.7 12.0 6.8
A scatterplot of the data (see Figure 11.11 on p. 529) gives strong support to use of the
simple linear regression model. Relevant quantities are as follows:
^ xi 5 659.0 ^ x2i 5 28,967.50 x 5 36.61111 Sxx 5 4840.7778
^ yi 5 293.2 ^ xiyi 5 9293.95 ^ y2i 5 5335.76
b 5 2.297561 a 5 27.182936 SSResid 5 131.2402
2
r 5 .766 se 5 2.8640


Let’s now calculate a confidence interval, using a 95% confidence level, for the
mean strength for all core specimens having a carbonation depth of 45 mm—that is,
a confidence interval for α + β(45). The interval is centered at

   ŷ = a + b(45) = 27.18 − .2976(45) = 13.79

The estimated standard deviation of the statistic ŷ is

   s_ŷ = 2.8640 √( 1/18 + (45 − 36.6111)²/4840.7778 ) = .7582

The 16 df t critical value for a 95% confidence level is 2.120, from which we determine the desired interval to be

   13.79 ± (2.120)(.7582) = 13.79 ± 1.61 = (12.18, 15.40)

The narrowness of this interval suggests that we have reasonably precise informa-
tion about the mean value being estimated. Remember that if we recalculated this
interval for sample after sample, in the long run about 95% of the calculated intervals
would include α + β(45). We can only hope that this mean value lies in the single
interval that we have calculated.
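To make the arithmetic easy to reproduce, here is a minimal Python sketch (standard library only; the function name mean_response_ci is ours, not from any package) that recomputes the interval from the summary quantities of Example 11.7:

```python
# A sketch of the 95% CI for the mean y value, using the summary
# statistics quoted in Example 11.7.
import math

n, a, b = 18, 27.182936, -0.297561             # sample size and least squares estimates
x_bar, s_xx, s_e = 36.61111, 4840.7778, 2.8640
t_crit = 2.120                                  # t critical value, 95% confidence, 16 df

def mean_response_ci(x_star):
    """Point estimate, estimated std. dev., and 95% CI for the mean y at x = x_star."""
    y_hat = a + b * x_star
    s_yhat = s_e * math.sqrt(1/n + (x_star - x_bar)**2 / s_xx)
    hw = t_crit * s_yhat
    return y_hat, s_yhat, (y_hat - hw, y_hat + hw)

print(mean_response_ci(45))   # about 13.79, .758, (12.18, 15.40)
print(mean_response_ci(35))   # narrower, since 35 is closer to the mean depth 36.61
```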
Figure 11.10 shows Minitab output resulting from a request to fit the simple
linear regression model and calculate confidence intervals for the mean value of
strength at depths of 45 mm and 35 mm. The intervals are at the bottom of the
output; note that the second interval is narrower than the first, because 35 is much
closer to x̄ than is 45. Figure 11.11 (on page 529) shows a Minitab scatterplot with
(1) curves corresponding to the confidence limits for each different x value and
(2) prediction limits, to be discussed shortly. Notice how the curves get farther and
farther apart as x moves away from x̄.

The regression equation is


strength = 27.2 – 0.298 depth
Predictor Coef Stdev t-ratio p
Constant 27.183 1.651 16.46 0.000
depth –0.29756 0.04116 –7.23 0.000
s = 2.864 R-sq = 76.6% R-sq(adj) = 75.1%
Analysis of Variance
SOURCE DF SS MS F P
Regression 1 428.62 428.62 52.25 0.000
Error 16 131.24 8.20
Total 17 559.86
Fit Stdev.Fit 95.0% C.I. 95.0% P.I.
13.793 0.758 (12.185, 15.401) (7.510, 20.075)
Fit Stdev.Fit 95.0% C.I. 95.0% P.I.
16.768 0.678 (15.330, 18.207) (10.527, 23.009)

Figure 11.10 Minitab regression output for the data of Example 11.7


[Scatterplot of strength versus depth with fitted line Y = 27.1829 − 0.297561X (R-Sq = 76.6%), 95% CI bands, and 95% PI bands]

Figure 11.11 Minitab scatterplot with confidence intervals and prediction intervals for
the data of Example 11.7

A Prediction Interval for a Single y Value


Suppose an investigator is contemplating making a single observation on the depen-
dent variable y at some future time when x has the value x*. Let y* denote the resulting
future observation. The point prediction for y* is ŷ = a + bx*, and this is also the point
estimate for α + βx*, the mean y value when x = x*. Consider the errors of estimation
and prediction:

   estimation error = estimate − true value = ŷ − (α + βx*)
   prediction error = prediction − true value = ŷ − y*

The estimation error is the difference between a random quantity (ŷ) and a fixed quan-
tity, whereas the prediction error is the difference between two random quantities. This
implies that there is more uncertainty associated with making a prediction than with
estimating a mean y value. The mean value of the prediction error is

   μ_(ŷ−y*) = μ_ŷ − μ_y* = α + βx* − (α + βx*) = 0

Furthermore, ŷ and y* are independent of one another, because the former is based on
the sample data and the latter is to be observed at some future time. This implies that

   σ²_(ŷ−y*) = σ²_ŷ + σ²_y* = σ²[ 1/n + (x* − x̄)²/Sxx ] + σ²


The standard deviation of the prediction error is the square root of this expression, and
the estimated standard deviation results from replacing σ² by s_e². Using these results to
standardize the prediction error gives a t variable from which the prediction interval is
obtained.

The standardized variable

   t = (ŷ − y*) / √(s_e² + s_ŷ²)

has a t distribution based on n − 2 df. This implies that a prediction interval for a future
y value y* to be observed when x = x* is

   ŷ ± (t critical value) · √(s_e² + s_ŷ²)

Without s_e² under the square root in the prediction interval formula, we would have
the confidence interval formula. This implies that the prediction interval (PI) is wider
than the confidence interval (CI)—often much wider, because s_e² is frequently much
larger than s_ŷ². The prediction level for the interval is interpreted in the same way that
a confidence level was previously interpreted. If a prediction level of 95% is used in
calculating interval after interval from different samples, in the long run about 95% of
the calculated intervals will include the value y* that is being predicted. Of course, we
will not know whether the single interval that we have calculated is one of the good 95%
until we have observed y*.

Example 11.8 Let’s return to the carbonation depth–strength data of Example 11.7 and calculate a
95% prediction interval for a strength value that would result from selecting a single
core specimen whose carbonation depth is 45 mm. Relevant quantities from that
example are
   ŷ = 13.79   s_ŷ = .7582   s_e = 2.8640

For a prediction level of 95% based on n − 2 = 16 df, the t critical value is 2.120,
exactly what we previously used for a 95% confidence level. The prediction interval
is then

   13.79 ± (2.120)√((2.8640)² + (.7582)²) = 13.79 ± (2.120)(2.963)
         = 13.79 ± 6.28 = (7.51, 20.07)

Plausible values for a single observation on strength when depth is 45 mm are (at
the 95% prediction level) between 7.51 MPa and 20.07 MPa. The 95% confidence
interval for mean strength when depth is 45 was (12.18, 15.40). The prediction in-
terval is much wider than this because of the extra (2.8640)² under the square root.
Figure 11.10, the Minitab output in Example 11.7, shows this interval as well as the
confidence interval.
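The corresponding prediction-interval arithmetic, again as a minimal Python sketch using the quantities quoted above:

```python
# A sketch of the 95% PI of Example 11.8; the extra s_e**2 term under the
# square root is what makes the PI wider than the CI.
import math

y_hat, s_yhat, s_e, t_crit = 13.79, 0.7582, 2.8640, 2.120

hw = t_crit * math.sqrt(s_e**2 + s_yhat**2)
print((y_hat - hw, y_hat + hw))   # approximately (7.51, 20.07)
```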


Simultaneous Intervals
Suppose we wish to calculate a confidence interval for the mean y value or a prediction
interval for a future y value both when x = x₁* and also when x = x₂*, two different values
of the predictor variable. If the confidence or prediction level for each individual inter-
val is 95%, then the joint or simultaneous level of confidence for both intervals will be
smaller than 95%. For example, from Examples 11.7 and 11.8, we can be 95% confident
that a y value to be observed when x = 45 will be in the interval (7.51, 20.07) and also
95% confident that a y value to be observed when x = 35 will lie in the interval (10.53,
23.01). The degree of confidence in the simultaneous statements

   7.51 < 1st y < 20.07,   10.53 < 2nd y < 23.01

must be less than 95%. It is very difficult to say exactly what the degree of simulta-
neous confidence is, because the two intervals are not based on independent data
sets [if they were, the simultaneous level would be 100(.95)² ≈ 90%]. What can be
said is that the simultaneous confidence level will be at least 100(1 − 2(.05))%, that
is, at least 90%. More generally, if k different intervals are calculated, each using a
confidence or prediction level of 100(1 − α)%, then the simultaneous confidence
or prediction level for all k intervals will be at least 100(1 − kα)%. Thus if three
different 99% confidence intervals were computed, the simultaneous confidence
level would be at least 97%. There is a special table of t critical values for which the
simultaneous level for k intervals is at least 95% (k = 2, 3, 4, …) and another such
table for at least 99%; the tabulated numbers are called Bonferroni t critical values
after the mathematician whose inequality justifies the “at least” statement. If more
than two or three of these intervals are calculated, they will have to be quite wide to
guarantee at least the desired level.
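The Bonferroni “at least” bound is simple to evaluate directly. A tiny Python sketch (the helper name is ours, purely for illustration):

```python
# Simultaneous confidence/prediction level is at least 100(1 - k*alpha)%
# when k intervals are each computed at level 100(1 - alpha)%.
def bonferroni_lower_bound(k, alpha):
    return 100 * (1 - k * alpha)

print(bonferroni_lower_bound(2, .05))   # two 95% intervals: at least 90%
print(bonferroni_lower_bound(3, .01))   # three 99% intervals: at least 97%
```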

Section 11.3 Exercises


21. Mist (airborne droplets or aerosols) is generated when metal-removing fluids are used in machining operations to cool and lubricate the tool and workpiece. Mist generation is a concern to OSHA, which has recently lowered substantially the workplace standard. The article “Variables Affecting Mist Generation from Metal Removal Fluids” (Lubrication Engr., 2002: 10–17) gave the accompanying data on x = fluid flow velocity for a 5% soluble oil (cm/sec) and y = the extent of mist droplets having diameters smaller than 10 μm (mg/m³):
      x:  89 177 189 354 362 442 965
      y: .40 .60 .48 .66 .61 .69 .99
    a. The investigators performed a simple linear regression analysis to relate the two variables. Does a scatterplot of the data support this strategy?
    b. What proportion of observed variation in mist can be attributed to the simple linear regression relationship between velocity and mist?
    c. The investigators were particularly interested in the impact on mist of increasing velocity from 100 to 1000 (a factor of 10 corresponding to the difference between the smallest and largest x values in the sample). When x increases in this way, is there substantial evidence that the true average increase in y is less than .6?
    d. Estimate the true average change in mist associated with a 1 cm/sec increase in velocity, and do so in a way that conveys information about precision and reliability.

22. Phenolic compounds are found in the effluents of coal conversion processes, petroleum refineries,


herbicide manufacturing, and fiberglass manufacturing. These compounds are toxic, carcinogenic, and have contributed over the past decades to environmental pollution of aquatic environments. In one study reported in “Photolysis, Biodegradation, and Sorption Behavior of Three Selected Phenolic Compounds on the Surface and Sediment of Rivers” (J. of Envir. Engr., 2011: 1114–1121), the authors examined the sorption characteristics of three selected phenolic compounds. The following data on y = sorbed concentration (μg/g) and x = equilibrium concentration (μg/mL) of 2,4-Dinitrophenol (DNP) in a particular natural river sediment was read from a graph in the article.
      x: 0.11 0.13 0.14 0.18 0.29 0.44 0.67 0.78 0.93
      y: 1.72 2.17 2.33 3.00 5.17 7.61 11.17 12.72 14.78
    a. Calculate point estimates of the slope and intercept of the population regression line.
    b. Using the simple linear regression model fit to this data, confirm that ŷ = 3.404, s_ŷ = .107 when x = .2, and ŷ = 6.616, s_ŷ = .088 when x = .4. Explain why s_ŷ is larger when x = .2 than when x = .4.
    c. Calculate a confidence interval with a confidence level of 95% for the true average DNP sorbed concentration of all river sediment specimens using an equilibrium concentration of .4.
    d. Calculate a prediction interval with a prediction level of 95% for the DNP sorbed concentration of a single river sediment specimen using an equilibrium concentration of .4.
    e. If a 95% CI is calculated for true average DNP sorbed concentration when equilibrium concentration is .2, what will be the simultaneous confidence level for both this interval and the interval calculated in part (c)?

23. Refer to Exercise 6 of Section 11.1.
    a. Predict oxygen diffusivity for a single observation to be made when temperature is 1500°F, and do so in a way that conveys information about reliability and precision.
    b. Would a prediction interval for diffusivity when temperature is 1200°F using the same prediction level as in part (a) be wider or narrower than the interval of part (a)? Answer without computing this second interval.

24. The simple linear regression model provides a very good fit to the data on rainfall and runoff volume given in Exercise 4 of Section 11.1. The equation of the least squares line is ŷ = −1.128 + .82697x, r² = .975, and s_e = 5.24. Use the fact that s_ŷ = 1.44 when rainfall volume is 40 m³ to predict runoff in a way that conveys information about reliability and precision. Does the resulting interval suggest that precise information about the value of runoff for this future observation is available? Explain your reasoning.

25. The article “Root Dentine Transparency: Age Determination of Human Teeth Using Computerized Densitometric Analysis” (Amer. J. of Physical Anthro., 1991: 25–30) reported on an investigation of methods for age determination based on tooth characteristics. A single observation on y = age (yr) was made for each of the following values of x = % of root with transparent dentine: 15, 19, 31, 39, 41, 44, 47, 48, 55, 64. Consider the following six intervals based on the resulting data: (i) a 95% CI for mean age when x = 35; (ii) a 95% PI for age when x = 35; (iii) a 95% CI for mean age when x = 42; (iv) a 95% PI for age when x = 42; (v) a 99% CI for mean age when x = 42; (vi) a 99% PI for age when x = 42. Without computing any of these intervals, what can be said about their relative widths?

26. During oil drilling operations, components of the drilling assembly may suffer from sulfide stress cracking. The article “Composition Optimization of High-Strength Steels for Sulfide Cracking Resistance Improvement” (Corrosion Sci., 2009: 2878–2884) reported on a study in which the composition of a standard grade of steel was analyzed. The following data on y = threshold stress (% SMYS) and x = yield strength (MPa) was read from a graph in the article (which also included the equation of the least squares line).
      x: 635 644 711 708 836 820 810
      y: 100  93  88  84  77  75  74
      x: 870 856 923 878 937 948
      y:  63  57  55  47  43  38
    a. Does a scatterplot support the use of the simple linear regression model for relating y to x?


    b. What proportion of observed variation in stress can be attributed to the approximate linear relationship between the two variables?
    c. Determine a 90% confidence interval for the true average threshold stress of all similar steel specimens whose yield strength is 800 MPa.
    d. Determine a 90% prediction interval for the threshold stress of a single steel specimen whose yield strength is 800 MPa.

27. Milk is an important source of protein. How does the amount of protein in milk from a cow vary with milk production? The article “Metabolites of Nucleic Acids in Bovine Milk” (J. of Dairy Science, 1984: 723–728) reported the accompanying data on x = milk production (kg/day) and y = milk protein (kg/day) for Holstein-Friesan cows:
      x: 42.7 40.2 38.2 37.6 32.2 32.2 28.0
      y: 1.20 1.16 1.07 1.13  .96 1.07  .85
      x: 27.2 26.6 23.0 22.7 21.8 21.3 20.2
      y:  .87  .77  .74  .76  .69  .72  .64
    Relevant calculated values include Sxx = 762.012, b = .024576, a = .175576, SSTo = .48144, and SSResid = .02120.
    a. Does the simple linear regression model specify a useful relationship between production and protein?
    b. Estimate true average protein for all cows whose production is 30 kg/day; use a confidence interval with a confidence level of 99%. Does the resulting interval suggest that this mean value has been precisely estimated? Explain your reasoning.
    c. Calculate a 99% prediction interval for the protein from a single cow whose production is 30 kg/day.

28. Obtain an expression for s_a, the estimated standard deviation of the intercept a of the least squares line. Then use the fact that t = (a − α)/s_a has a t distribution with n − 2 df to test H₀: α = 0 for the data in Exercise 27 (this null hypothesis says that the population regression line passes through the origin). Hint: When x = 0, ŷ = a + b(0) = a, and we have a general expression for s_ŷ.

11.4 Multiple Regression Models


The regression models considered thus far have involved relating the dependent or
response variable y to a single independent or predictor variable x. But it is virtually
always the case that a model relating y to two or more predictors will explain more
variation and provide better predictions than will a model with just a single predic-
tor. For example, we should be able to predict fuel efficiency of a car more precisely
from knowing both engine size and weight of the car than from knowing only one
of these variables. Let k denote the number of predictor variables to be used in a
model, and denote the predictors themselves by x₁, x₂, …, xₖ (previously x₁, x₂, …
represented various values of the single variable x, whereas now they represent dif-
ferent variables). For example, let y be the concentration of a certain chemical
contaminant in an industrial worker’s bloodstream. Then we might use the four
predictors

   x₁ = number of years of exposure to the contaminant
   x₂ = number of years since the last exposure
   x₃ = age of the worker
   x₄ = a quantitative index of body mass

It is almost never true that the value of y is completely and uniquely determined
by values of x₁, …, xₖ. A probabilistic relationship is obtained by starting with some
deterministic function f(x₁, …, xₖ) and adding (or perhaps multiplying by) a random
deviation e to incorporate uncertainty due to various other factors.


definitions   A general additive multiple regression model, which relates a dependent variable y to k predictor variables x₁, x₂, …, xₖ, is given by the model equation

   y = α + β₁x₁ + β₂x₂ + … + βₖxₖ + e

The random deviation e is assumed to be normally distributed with mean value 0 and variance σ² for any particular values of the predictors, and the e’s resulting from different observations are assumed to be independent of one another. The βᵢ’s are called population regression coefficients, and the deterministic portion α + β₁x₁ + … + βₖxₖ is the population regression function.

Let x₁*, x₂*, …, xₖ* denote particular values of the predictors. Then the model equation and assumptions about e imply that

   (mean y value when x₁ = x₁*, …, xₖ = xₖ*) = α + β₁x₁* + … + βₖxₖ*
   (variance of y when x₁ = x₁*, …, xₖ = xₖ*) = σ²

As in simple linear regression, if σ² is quite close to 0, any particular observed y value
will tend to be quite near its mean value. When σ² is large, many of the y observations
may deviate substantially from their mean y values.
The slope coefficient β in simple linear regression was interpreted as the mean
change in y associated with a 1-unit increase in the value of x. Each population regres-
sion coefficient in multiple regression has a similar interpretation. For example, β₂ is
the mean change in y associated with a 1-unit increase in x₂ provided that the values of
the remaining predictors x₁, x₃, …, xₖ are held fixed.

Example 11.9  Cardiorespiratory fitness is widely recognized as a major component of overall physi-
cal well-being. Direct measurement of maximal oxygen uptake (VO2max) is the
single best measure of such fitness, but direct measurement is time-consuming and
expensive. It is therefore desirable to have a prediction equation for VO2max in terms
of easily obtained quantities. Consider the variables

   y = VO2max (L/min)   x₁ = weight (kg)   x₂ = age (yr)
   x₃ = time necessary to walk 1 mile (min)
   x₄ = heart rate at the end of the walk (beats/min)

Here is one possible model, for male students, consistent with the information given
in the article “Validation of the Rockport Fitness Walking Test in College Males and
Females” (Research Quarterly for Exercise and Sport, 1994: 152–158):

   y = 5.0 + .01x₁ − .05x₂ − .13x₃ − .01x₄ + e   σ = .4

The population regression function is

   (mean y value for fixed x₁, …, x₄) = 5.0 + .01x₁ − .05x₂ − .13x₃ − .01x₄


For individuals whose weight is 76 kg, age is 20 yr, walk time is 12 min, and heart
rate is 140 beats/min,

   mean value of VO2max = 5.0 + .01(76) − .05(20) − .13(12) − .01(140) = 1.80 L/min

With 2σ = .80, it is quite likely (a probability of roughly .95) that an actual y value
observed when the xᵢ’s are as stated will be within .80 of the mean value, that is, in
the interval from 1.00 to 2.60.
The value β₂ = −.05 is interpreted as the average change in VO2max (here a
decrease) associated with a 1-year increase in age while weight, walk time, and heart
rate are all held fixed. The three other βᵢ’s associated with predictors have similar
interpretations.
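As a quick check of the arithmetic, here is a minimal Python sketch of this population regression function and the rough μ ± 2σ interval (the function name is ours):

```python
# The population regression function of Example 11.9.
def mean_vo2max(x1, x2, x3, x4):
    return 5.0 + .01*x1 - .05*x2 - .13*x3 - .01*x4

sigma = 0.4
mu = mean_vo2max(76, 20, 12, 140)     # weight, age, walk time, heart rate
print(mu)                              # 1.80 L/min
print((mu - 2*sigma, mu + 2*sigma))    # (1.00, 2.60)
```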

A Special Case: Polynomial Regression


Consider again the case of a single independent variable x, and suppose that a scatter-
plot of the n sample (x, y) pairs has the appearance of Figure 11.12. The simple linear
regression model is clearly not appropriate. It does, however, look as though a parabola,
the graph of a quadratic function y = α + β₁x + β₂x², would provide a very good fit to
the data for appropriately chosen values of α and the βᵢ’s. Because no quadratic would
give a perfect fit, we need a probabilistic model that allows observed points to deviate
from the parabola. Adding a random deviation e to the quadratic function gives such a
model:

   y = α + β₁x + β₂x² + e

Figure 11.12  A scatterplot consistent with a quadratic regression model

If we rewrite this equation with x₁ = x and x₂ = x², a special case of the general
multiple regression model with k = 2 results. Notice that one of the two predictors is a
mathematical function of the other one: x₂ = (x₁)². In general, in a multiple regression
model, it is perfectly legitimate to have one or more of the k predictors that are mathemati-
cal functions of other predictors. For example, we will shortly discuss models that include
an interaction predictor of the form x₃ = x₁x₂, a product of two other predictors. In particu-
lar, the general polynomial regression model begins with a single independent variable x
and creates predictors x₁ = x, x₂ = x², …, xₖ = x^k for some specified value of k.


definitions   The kth-degree polynomial regression model

   y = α + β₁x + β₂x² + … + βₖx^k + e

is a special case of the general additive multiple regression model with x₁ = x, x₂ = x², …, xₖ = x^k. The population regression function is

   (mean y value for fixed x) = α + β₁x + … + βₖx^k

The most important special case other than simple linear regression (k = 1) is the quadratic regression model

   y = α + β₁x + β₂x² + e

This model replaces the line of mean y values in simple linear regression with a parabolic curve of mean values α + β₁x + β₂x². If β₂ < 0, the curve opens downward, as in Figure 11.13(a), whereas it opens upward when β₂ > 0. A less frequently encountered case is that of cubic regression, in which k = 3.

Figure 11.13  Polynomial regression models: (a) quadratic regression model with β₂ < 0; (b) quadratic regression model with β₂ > 0; (c) cubic regression model with β₃ > 0

Example 11.10  Researchers have examined a variety of climatic variables in an attempt to gain an
understanding of the mechanisms that govern rainfall runoff. The article “The Appli-
cability of Morton’s and Penman’s Evapotranspiration Estimates in Rainfall-Runoff
Modeling” (Water Resources Bull., 1991: 611–620) reported on a study in which data
on x = cloud cover and y = daily sunshine (hr) was gathered from a number of different locations. The authors used a cubic regression model to relate these variables.
Suppose that the actual model equation for a particular location is

   y = 11 − .400x − .250x² + .005x³ + e

Then the regression function is

   (mean daily sunshine for given cloud cover x) = 11 − .400x − .250x² + .005x³

For example,

   (mean daily sunshine when cloud cover is 4) = 11 − .400(4) − .250(4)² + .005(4)³ = 5.72

If σ = 1, it is quite likely that an observation on daily sunshine made when x = 4
would be between 3.72 and 7.72 hr.
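The cubic regression function is just as easy to evaluate; a minimal Python sketch (function name ours):

```python
# The cubic regression function of Example 11.10.
def mean_sunshine(x):
    return 11 - .400*x - .250*x**2 + .005*x**3

mu = mean_sunshine(4)
print(mu)                  # 5.72 hr
print((mu - 2, mu + 2))    # with sigma = 1, an observation is quite likely in (3.72, 7.72)
```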


The interpretation of βᵢ given previously for the general multiple regression
model is not legitimate in polynomial regression. This is because all predictors are
functions of x, so xᵢ = xⁱ cannot be increased by 1 unit while the values of all other
predictors are held fixed. In general, the interpretation of regression coefficients
requires extra care when some predictor variables are mathematical functions of
other variables.

Interaction Between Variables


Suppose that an industrial chemist is interested in the relationship between product
yield (y) from a certain reaction and two independent variables, x₁ = reaction tem-
perature and x₂ = pressure at which the reaction is carried out. The chemist initially
proposes the relationship

   y = 1200 + 15x₁ − 35x₂ + e

for temperature values between 80 and 100 in combination with pressure values rang-
ing from 50 to 70. The population regression function 1200 + 15x₁ − 35x₂ gives the
mean y value for any particular values of the predictors. Consider this mean y value for
three different particular temperature values:

   x₁ = 90:   mean y value = 1200 + 15(90) − 35x₂ = 2550 − 35x₂
   x₁ = 95:   mean y value = 2625 − 35x₂
   x₁ = 100:  mean y value = 2700 − 35x₂

Graphs of these three mean y value functions are shown in Figure 11.14(a). Each graph
is a straight line, and the three lines are parallel, each with a slope of −35. Thus irre-
spective of the fixed value of temperature, the average change in yield associated with a
1-unit increase in pressure is −35.

Figure 11.14  Graphs of the mean y value for two different models: (a) 1200 + 15x₁ − 35x₂; (b) −4500 + 75x₁ + 60x₂ − x₁x₂


Since chemical theory suggests that the decline in average yield when pressure x₂
increases should be more rapid for a high temperature than for a low temperature, the
chemist now has reason to doubt the appropriateness of the proposed model. Rather
than the lines being parallel, the line for a temperature of 100 should be steeper than
the line for a temperature of 95, and that line in turn should be steeper than the line for
x₁ = 90. A model that has this property includes, in addition to predictors x₁ and x₂, a
third predictor variable, x₃ = x₁x₂. One such model is

   y = −4500 + 75x₁ + 60x₂ − x₁x₂ + e

for which the population regression function is −4500 + 75x₁ + 60x₂ − x₁x₂. This gives

   (mean y value when temperature is 100) = −4500 + (75)(100) + 60x₂ − 100x₂
                                          = 3000 − 40x₂
   (mean y value when temperature is 95) = 2625 − 35x₂
   (mean y value when temperature is 90) = 2250 − 30x₂

These are graphed in Figure 11.14(b), where it is clear that the three slopes are differ-
ent. Now each different value of x₁ yields a line with a different slope, so the average
change in yield associated with a 1-unit increase in x₂ depends on the value of x₁. When
this is the case, the two variables are said to interact.
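A small Python sketch makes the contrast concrete: under the additive model the x₂ slope is −35 at every temperature, while under the interaction model it changes with x₁. The model equations are the ones given above; the function names are ours:

```python
# Compare the x2 slope (change in mean y per 1-unit increase in x2)
# at several fixed temperatures x1.
def mean_yield_additive(x1, x2):
    return 1200 + 15*x1 - 35*x2

def mean_yield_interaction(x1, x2):
    return -4500 + 75*x1 + 60*x2 - x1*x2

for x1 in (90, 95, 100):
    slope_add = mean_yield_additive(x1, 51) - mean_yield_additive(x1, 50)
    slope_int = mean_yield_interaction(x1, 51) - mean_yield_interaction(x1, 50)
    print(x1, slope_add, slope_int)   # always -35 vs. -30, -35, -40
```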

definition   If the change in the mean y value associated with a 1-unit increase in one independent variable depends on the value of a second independent variable, there is interaction between these two variables. Denoting the two independent variables by x₁ and x₂, we can model this interaction by including as an additional predictor x₃ = x₁x₂, the product of the two independent variables.

The general equation for a multiple regression model based on two independent
variables x₁ and x₂ that also includes an interaction predictor is

   y = α + β₁x₁ + β₂x₂ + β₃x₃ + e   with x₃ = x₁x₂

When x₁ and x₂ do interact, this model will usually give a much better fit to resulting
data than would the no-interaction model. Failure to consider a model with interaction
too often leads an investigator to conclude incorrectly that the relationship between y
and a set of independent variables is not very substantial.
More than one interaction predictor can be included in the model when more than
two independent variables are available. If, for example, three independent variables x₁,
x₂, and x₃ are available, one possible model is

   y = α + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₁x₂ + β₅x₁x₃ + β₆x₂x₃ + e


One could even include a three-way interaction x₇ = x₁x₂x₃, although in practice this
is rarely done. In applied work, quadratic predictors such as x₁² and x₂² are often in-
cluded to model a curved relationship between y and several independent variables. A
frequently used model with k = 5 based on two independent variables x₁ and x₂ is the
full quadratic or complete second-order model

   y = α + β₁x₁ + β₂x₂ + β₃x₁x₂ + β₄x₁² + β₅x₂² + e

This model replaces the straight lines of Figure 11.14 with parabolas (each one is
the graph of the population regression function as x₂ varies when x₁ has a particular
value). Starting with four independent variables x₁, …, x₄, one could create a model
with four quadratic predictors and six two-way interaction predictor variables. Clearly,
a great many different models can be created from just a small number of independent
variables. In Section 11.6 we briefly discuss methods for selecting one model from a
number of competing models.

Qualitative Predictor Variables


Thus far we have explicitly considered the inclusion of only quantitative (numerical)
predictor variables in a multiple regression model. Using simple numerical coding,
qualitative (categorical) variables, such as bearing material (aluminum or copper/lead)
or type of wood (pine, oak, or walnut), can also be incorporated into a model. Let’s first
focus on the case of a dichotomous variable, one with just two possible categories—
male or female, U.S. or foreign manufacture, and so on. With any such variable, we
associate a dummy or indicator variable x whose possible values 0 and 1 indicate which
category is relevant for any particular observation.

Example 11.11  The article “Estimating Urban Travel Times: A Comparative Study” (Trans. Res.,
1980: 173–175) described a study relating the dependent variable y = travel time
between locations in a certain city and the independent variable x₂ = distance be-
tween locations. Two types of vehicles, passenger cars and trucks, were used in the
study. Let

   x₁ = 1 if the vehicle is a truck
   x₁ = 0 if the vehicle is a passenger car

One possible multiple regression model is

   y = α + β₁x₁ + β₂x₂ + e

The mean value of travel time depends on whether a vehicle is a car or a truck:

   mean time = α + β₂x₂        when x₁ = 0 (cars)
   mean time = α + β₁ + β₂x₂   when x₁ = 1 (trucks)

The coefficient β₁ is the difference in mean times between trucks and cars with
distance held fixed; if β₁ > 0, on average it will take trucks longer to traverse any
particular distance than it will for cars.


A second possibility is a model with an interaction predictor:

   y = α + β₁x₁ + β₂x₂ + β₃x₁x₂ + e

Now the mean times for the two types of vehicles are

   mean time = α + β₂x₂               when x₁ = 0
   mean time = α + β₁ + (β₂ + β₃)x₂   when x₁ = 1

For each model, the graph of the mean time versus distance is a straight line for
either type of vehicle, as illustrated in Figure 11.15. The two lines are parallel for
the first (no-interaction) model, but in general they will have different slopes when
the second model is correct. For this latter model, the change in mean travel time
associated with a 1-mile increase in distance depends on which type of vehicle is
involved—the two variables “vehicle type” and “distance” interact. Indeed, data
collected by the authors of the cited article suggested the presence of interaction.

Figure 11.15  Regression functions for models with one dummy variable (x₁) and one quantitative variable (x₂): (a) no interaction; (b) interaction

You might think that the way to handle a three-category situation is to define a
single numerical variable with coded values such as 0, 1, and 2 corresponding to the
three categories. This is incorrect, because it imposes an ordering on the categories that
is not necessarily implied by the problem context. The correct approach to incorporat-
ing three categories is to define two different dummy variables. Suppose, for example,
that y is the lifetime of a certain cutting tool, x₁ is cutting speed, and there are three
brands of tool being investigated. Then let

   x₂ = 1 if a brand A tool is used, 0 otherwise
   x₃ = 1 if a brand B tool is used, 0 otherwise

When an observation on a brand A tool is made, x₂ = 1 and x₃ = 0, whereas for a brand
B tool, x₂ = 0 and x₃ = 1. An observation made on a brand C tool has x₂ = x₃ = 0, and
it is not possible that x₂ = x₃ = 1 because a tool cannot simultaneously be both brand


A and brand B. The no-interaction model would have only the predictors x₁, x₂, and x₃.
The following interaction model allows the mean change in lifetime associated with a
1-unit increase in speed to depend on the brand of tool:

   y = α + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₁x₂ + β₅x₁x₃ + e

Construction of a picture like Figure 11.14 with a graph for each of the three possible
(x₂, x₃) pairs gives three nonparallel lines (unless β₄ = β₅ = 0).
More generally, incorporating a categorical variable with c possible categories into
a multiple regression model requires the use of c − 1 indicator variables (e.g., five
brands of tools would necessitate using four indicator variables). Thus even one cat-
egorical variable can add many predictors to a model.
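As an illustration of the c − 1 coding just described, here is a minimal Python sketch for the three tool brands (brand C is the baseline with x₂ = x₃ = 0):

```python
# Two indicator variables encode three brands; a third dummy would be redundant.
def brand_dummies(brand):
    x2 = 1 if brand == "A" else 0
    x3 = 1 if brand == "B" else 0
    return x2, x3

for brand in ("A", "B", "C"):
    print(brand, brand_dummies(brand))   # A -> (1, 0), B -> (0, 1), C -> (0, 0)
```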

Nonlinear Multiple Regression Models


Many nonlinear relationships can be put in the form of our basic additive model equa-
tion by transforming one or more of the variables. For example, taking the logarithm on
both sides of the multiplicative exponential model equation

   y = α₀ e^(β₁x₁ + β₂x₂ + … + βₖxₖ) · ε,   ε > 0

gives an equation of the desired form [with α = ln(α₀)]. An appropriate transformation
could be suggested by theory or by various plots of the data, such as those to be discussed
in Section 11.6. There are also relationships that cannot be linearized by means of a
transformation, necessitating more complex methods of analysis. Consult one of the
chapter references for more information.
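To see the linearization at work, the following Python sketch evaluates the multiplicative model at hypothetical coefficient values (α₀ = 2.0, β₁ = .3, β₂ = −.1, chosen purely for illustration) and confirms that ln(y) has the additive form:

```python
# Taking logs of y = alpha0 * exp(b1*x1 + b2*x2) * eps gives
# ln y = ln(alpha0) + b1*x1 + b2*x2 + ln(eps), which is linear in the betas.
import math

alpha0, b1, b2 = 2.0, 0.3, -0.1        # hypothetical values, for illustration only

def y_multiplicative(x1, x2, eps):
    return alpha0 * math.exp(b1*x1 + b2*x2) * eps

y = y_multiplicative(1.0, 2.0, eps=1.05)
print(math.log(y))
print(math.log(alpha0) + b1*1.0 + b2*2.0 + math.log(1.05))   # same value
```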

Section 11.4 Exercises

29. A trucking company considered a multiple regression model for relating the dependent variable y = total daily travel time for one of its drivers (hours) to the predictors x₁ = distance traveled (miles) and x₂ = the number of deliveries made. Suppose that the model equation is

      y = −.800 + .060x₁ + .900x₂ + e

    a. What is the mean value of travel time when distance traveled is 50 miles and three deliveries are made?
    b. How would you interpret β₁ = .060, the coefficient of the predictor x₁? What is the interpretation of β₂ = .900?
    c. If σ = .5 hour, what is the probability that travel time will be at most 6 hours when three deliveries are made and the distance traveled is 50 miles?

30. Consider the regression model y = −6.50 + .250x₁ + .600x₂ − .150x₃ + .160x₄ + e, where y = gasoline yield (% of crude oil), x₁ = crude oil gravity (°API), x₂ = crude oil vapor pressure (PSIA), x₃ = crude oil ASTM 10% point (°F), and x₄ = gasoline end point (°F).
    a. Interpret the population regression coefficients β₁ and β₃.
    b. What is the mean yield when x₁ = 40, x₂ = 5, x₃ = 230, and x₄ = 360?

31. High-alumina refractory castables have been extensively investigated in recent years because of their significant advantages over other refractory brick of the same class: lower production and application costs, versatility, and performance at high temperatures. The authors of “Processing of Zero-Cement Self-Flow Alumina Castables” (The Amer. Ceramic Soc. Bull., 1998: 60–66) proposed a quadratic regression model to describe the relationship between x = viscosity (MPa·sec) and y = free flow (%). Suppose the actual model is y = −296 + 2.20x − .003x² + e.


    a. Graph the true regression function y = −296 + 2.20x − .003x² for x values between 350 and 485.
    b. Would mean free flow percentage be higher for a viscosity value of 450 or 470?
    c. What is the change in mean free flow percentage when the viscosity increases from 450 to 460? From 460 to 470?

32. Let y = wear life of a bearing, x₁ = oil viscosity, and x₂ = load. Suppose that the multiple regression model relating life to viscosity and load is

      y = 125.0 + 7.750x₁ + .0950x₂ − .0090x₁x₂ + e

    a. What is the mean value of life when viscosity is 40 and load is 1100?
    b. When viscosity is 30, what is the change in mean life associated with an increase of 1 in load? When viscosity is 40, what is the change in mean life associated with an increase of 1 in load?

33. Let y = sales at a fast-food outlet (1000s of $), x₁ = number of competing outlets within a 1-mile radius, x₂ = population within a 1-mile radius (1000s of people), and x₃ be an indicator variable that equals 1 if the outlet has a drive-up window and 0 otherwise. Suppose that the true regression model is

      y = 10.00 − 1.2x₁ + 6.8x₂ + 15.3x₃ + e

    a. What is the mean value of sales when the number of competing outlets is 2, there are 8000 people within a 1-mile radius, and the outlet has a drive-up window?
    b. What is the mean value of sales for an outlet without a drive-up window that has 3 competing outlets and 5000 people within a 1-mile radius?

11.5 Inferences in Multiple Regression


We now assume that a dependent or response variable y is related to k independent,
predictor, or explanatory variables x₁, …, xₖ via the general additive multiple regression
model

   y = α + β₁x₁ + … + βₖxₖ + e

discussed in Section 11.4. Estimation of model parameters and other inferences are
based on a sample of n observations, each one consisting of k + 1 numbers: a value
of x₁, a value of x₂, …, a value of xₖ, and a value of y. As in simple linear regression,
the principle of least squares is used to estimate the population regression coefficients
α, β₁, …, βₖ. The least squares estimates a, b₁, b₂, …, bₖ are chosen to minimize the
sum of squared deviations:

   Σ_all obs [y − (a + b₁x₁ + … + bₖxₖ)]²

As described in Section 3.5, the minimization requires taking k + 1 partial derivatives,
equating these to zero to obtain a system of linear equations (the normal equations), and
solving this system for the estimates. There are formulas for the least squares estimates,
but the only sensible way to express them is to use the branch of mathematics called
matrix algebra. Fortunately, this is not necessary for our purposes; these formulas have
been programmed into all of the most popular statistical computer packages. When
using any particular package, it is necessary only to enter the data, make an appropriate
request, and know how to find the estimates on the output. The estimated regression
equation ŷ = a + b₁x₁ + … + bₖxₖ can then be used to estimate a mean y value or pre-
dict a single y value.


Example 11.12  The article “How to Optimize and Control the Wire Bonding Process: Part II” (Solid
State Technology, Jan. 1991: 67–72) described an experiment carried out to assess the
impact of the variables x₁ = force (g), x₂ = power (mW), x₃ = temperature (°C), and
x₄ = time (ms) on y = ball bond shear strength (g). The following data¹ was gener-
ated to be consistent with the information given in the article:

Observation Force Power Temperature Time Strength


1 30 60 175 15 26.2
2 40 60 175 15 26.3
3 30 90 175 15 39.8
4 40 90 175 15 39.7
5 30 60 225 15 38.6
6 40 60 225 15 35.5
7 30 90 225 15 48.8
8 40 90 225 15 37.8
9 30 60 175 25 26.6
10 40 60 175 25 23.4
11 30 90 175 25 38.6
12 40 90 175 25 52.1
13 30 60 225 25 39.5
14 40 60 225 25 32.3
15 30 90 225 25 43.0
16 40 90 225 25 56.0
17 25 75 200 20 35.2
18 45 75 200 20 46.9
19 35 45 200 20 22.7
20 35 105 200 20 58.7
21 35 75 150 20 34.5
22 35 75 250 20 44.0
23 35 75 200 10 35.7
24 35 75 200 30 41.8
25 35 75 200 20 36.5
26 35 75 200 20 37.6
27 35 75 200 20 40.3
28 35 75 200 20 46.0
29 35 75 200 20 27.8
30 35 75 200 20 40.3
A statistical computer package gave the following least squares estimates:
   a = −37.48   b₁ = .2117   b₂ = .4983   b₃ = .1297   b₄ = .2583

¹From the book Statistics for Engineering Problem Solving by Stephen Vardeman, an excellent exposition
of the territory covered by our book, albeit at a somewhat higher level.


Thus we estimate that .1297 g is the average change in strength associated with a
1-degree increase in temperature when the other three predictors are held fixed; the
other estimated coefficients are interpreted in a similar manner.
The estimated regression equation is

   ŷ = −37.48 + .2117x₁ + .4983x₂ + .1297x₃ + .2583x₄

A point prediction of strength resulting from a force of 35 g, power of 75 mW, tem-
perature of 200 degrees, and time of 20 ms is

   ŷ = −37.48 + (.2117)(35) + (.4983)(75) + (.1297)(200) + (.2583)(20) = 38.41 g

This is also a point estimate of the mean value of strength for the specified values of
force, power, temperature, and time.
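Although the text leaves the computation to statistical packages, the fit itself is routine linear algebra. Here is a sketch using numpy and the data of Example 11.12: build the design matrix with a leading column of 1s and solve the least squares problem. The printed coefficients should agree with a = −37.48, b₁ = .2117, b₂ = .4983, b₃ = .1297, b₄ = .2583 up to rounding:

```python
import numpy as np

# (force, power, temperature, time, strength), transcribed from the table above
data = np.array([
    (30, 60, 175, 15, 26.2), (40, 60, 175, 15, 26.3), (30, 90, 175, 15, 39.8),
    (40, 90, 175, 15, 39.7), (30, 60, 225, 15, 38.6), (40, 60, 225, 15, 35.5),
    (30, 90, 225, 15, 48.8), (40, 90, 225, 15, 37.8), (30, 60, 175, 25, 26.6),
    (40, 60, 175, 25, 23.4), (30, 90, 175, 25, 38.6), (40, 90, 175, 25, 52.1),
    (30, 60, 225, 25, 39.5), (40, 60, 225, 25, 32.3), (30, 90, 225, 25, 43.0),
    (40, 90, 225, 25, 56.0), (25, 75, 200, 20, 35.2), (45, 75, 200, 20, 46.9),
    (35, 45, 200, 20, 22.7), (35, 105, 200, 20, 58.7), (35, 75, 150, 20, 34.5),
    (35, 75, 250, 20, 44.0), (35, 75, 200, 10, 35.7), (35, 75, 200, 30, 41.8),
    (35, 75, 200, 20, 36.5), (35, 75, 200, 20, 37.6), (35, 75, 200, 20, 40.3),
    (35, 75, 200, 20, 46.0), (35, 75, 200, 20, 27.8), (35, 75, 200, 20, 40.3),
])
X = np.column_stack([np.ones(len(data)), data[:, :4]])   # design matrix
y = data[:, 4]
coefs, _, _, _ = np.linalg.lstsq(X, y, rcond=None)       # a, b1, b2, b3, b4
print(coefs)
print(X @ coefs)   # fitted values y-hat for all 30 observations
```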

Substituting the values of the predictors from the successive observations into the
equation for the estimated regression gives the predicted or fitted values ŷ₁, ŷ₂, …, ŷₙ. For
example, since the values of the four predictors for the last observation in Example 11.12
are 35, 75, 200, and 20, respectively, the corresponding predicted value is ŷ₃₀ = 38.41.
The residuals are the differences y₁ − ŷ₁, …, yₙ − ŷₙ. The last residual in Example 11.12
is 40.3 − 38.41 = 1.89. The closer the residuals are to zero, the better the job our estimat-
ed equation is doing in predicting the y values corresponding to values of the predictors in
our sample. Squaring these residuals and summing gives the residual or error sum of squares
Σ(yᵢ − ŷᵢ)², denoted by SSResid. The number of df associated with SSResid is n − (k + 1).
The explanation is that the k + 1 parameters α, β₁, …, βₖ have to be estimated from the
data before SSResid can be calculated, resulting in a loss of this many df (in simple linear
regression, k = 1, so df = n − 2). The variance σ² of a random deviation e in the model
equation is estimated by s_e² = SSResid/[n − (k + 1)], and s_e is the estimate of σ. For the
data of Example 11.12, SSResid = 665.12, so s_e² = 665.12/[30 − (4 + 1)] = 26.60 and the
estimated standard deviation is s_e = 5.16. We estimate that, roughly speaking, the size of a typi-
cal deviation of y from its mean value will be about 5.2 g.

Model Utility
A very important quantity introduced in Section 3.5 is the coefficient of multiple
determination, R², given by

   R² = 1 − SSResid/SSTo   where SSTo = Σ(yᵢ − ȳ)²

R² is interpreted as the proportion of variation in the observed y values that can be attributed
to (or explained by) the model relationship between y and the predictors. The closer R² is to
1, the more effectively the model has explained variation in y by relating it to the predictors.
The coefficient of multiple determination for the data of Example 11.12 is .714, so some-
what more than 70% of the observed variation in strength can be attributed to the model
relationship between strength and the four predictors force, power, temperature, and time.


The value of R² cannot decrease when an extra predictor is added to the model,
and it will generally increase. Furthermore, the value of R² can almost always be made
very close to 1 simply by using a model whose number of predictors is quite close to the
sample size, even if many of these predictors are “frivolous” in the sense that they would
contribute only marginally to explaining variation in y. Because R² can be misleading in
this way, a quantity called adjusted R² is included on multiple regression output from
most statistical computer packages. It is defined by

   adjusted R² = 1 − (SSResid/[n − (k + 1)]) / (SSTo/(n − 1))
               = 1 − [(n − 1)/(n − (k + 1))] · (SSResid/SSTo)

Replacing the bracketed factor on the far right by 1 gives R² itself. Since that factor
exceeds 1, the adjusted R² is smaller than R². This downward adjustment will
be small when R² is reasonably high and this has been achieved by using a model with
relatively few predictors compared to the sample size. For example, adjusted R² for the
model fit in Example 11.12 is .668, which is not all that much smaller than R² itself. The
adjustment will be more dramatic when R² is not so high or when k is large relative to n.
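A minimal Python sketch of these two formulas, using SSResid = 665.12 from Example 11.12. SSTo is not quoted in the text, so the value used here is inferred from R² = .714 and should be treated as approximate:

```python
n, k = 30, 4
ss_resid = 665.12
ss_to = 2325.3     # assumed: backed out from 1 - 665.12/SSTo = .714

r_sq = 1 - ss_resid / ss_to
adj_r_sq = 1 - (n - 1) / (n - (k + 1)) * (ss_resid / ss_to)
print(round(r_sq, 3), round(adj_r_sq, 3))   # about .714 and .668
```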
High values of R2 and adjusted R2 certainly suggest that the model fit is a useful one.
But how large should these values be before we draw this conclusion? It is desirable to
have a formal test procedure so that we will not be led astray by intuition. Recall that the
null hypothesis for the model utility test in simple linear regression was that  5 0; its
interpretation was that there is no useful linear relationship between y and the single pre-
dictor x. Here, the null hypothesis states that there is no useful linear relationship between
y and any of the k predictors included in the model. The test is based on F distributions,
which were first encountered in Chapter 9 in connection with the analysis of variance.

The Model Utility F Test in Multiple Regression

Null hypothesis:        H₀: β₁ = β₂ = … = βₖ = 0
Alternative hypothesis: Hₐ: at least one among β₁, …, βₖ is not zero

   Test statistic:  F = (R²/k) / ((1 − R²)/[n − (k + 1)]) = MSRegr/MSResid

where

   MSRegr = SSRegr/k
   MSResid = SSResid/[n − (k + 1)]
   SSRegr = SSTo − SSResid

The larger the value of R², the larger the value of F will be, implying that the test is upper-
tailed (as were tests in ANOVA). When H₀ is true, the test statistic has an F distribution
based on k numerator and n − (k + 1) denominator df. The P-value for the test is the area
under the corresponding F curve to the right of the calculated value of F. Partial informa-
tion about this P-value can be obtained from the table of F critical values given in Appendix
Table VIII. As usual, the null hypothesis is rejected if the P-value is less than or equal to the
chosen significance level.


A large value of R² is no guarantee that the model will be judged useful by the F test. If k is
large relative to n, F will not exceed 1 by a great deal and the P-value will not be very small.

Example 11.13  Returning to the bond shear strength data of Example 11.12, a model with k = 4
predictors was fit, so the relevant hypotheses are

   H₀: β₁ = β₂ = β₃ = β₄ = 0
   Hₐ: at least one of these four β’s is not zero

Figure 11.16 shows output from the JMP statistical package. The values of the es-
timated coefficients, s_e (Root Mean Square Error), R², and adjusted R² agree with
those given previously.


Figure 11.16  Multiple regression output from JMP for the data of Example 11.12

The value of the model utility F ratio is

   F = (R²/k) / ((1 − R²)/[n − (k + 1)]) = (.713959/4) / (.286041/(30 − 5)) = 15.60


This value also appears in the F Ratio column of the ANOVA table in Figure 11.16.
The largest F critical value for 4 numerator and 25 denominator df in our F table is
6.49, which captures an upper-tail area of .001. Thus P-value < .001. The ANOVA
table in the JMP output (Figure 11.16) shows that P-value < .0001. This is a highly
significant result. The null hypothesis should be rejected at any reasonable signifi-
cance level. We conclude that there is a useful linear relationship between y and at
least one of the four predictors in the model. This does not mean that all four predic-
tors are useful; we will say more about this subsequently.
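The arithmetic of the model utility test is easy to reproduce in software. Here is a minimal sketch in R, using the summary quantities from this example (the object names are ours); pf() returns the upper-tail F curve area, that is, the exact P-value:

R2 <- .713959; k <- 4; n <- 30
F.ratio <- (R2 / k) / ((1 - R2) / (n - (k + 1)))             # 15.60
pf(F.ratio, df1 = k, df2 = n - (k + 1), lower.tail = FALSE)  # P-value < .0001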

Inferences About an Individual bi


Just as the value of the estimated slope coefficient b in simple linear regression
varies from sample to sample, so too does the value of any estimated coefficient bi in
multiple regression. That is, bi is a statistic and therefore has a sampling distribution. It can be shown that the sampling distribution is normal (a consequence of the assumption that the random deviation e is normally distributed and that the various deviations are independent of one another). The mean value of the statistic bi is βi. That is, the sampling distribution is always centered at the value of what the statistic is trying to estimate, so the statistic is unbiased. We denote the estimated
standard deviation of bi by sbi; the formulas for these estimated standard deviations
are complicated, but their values will be available on output from all of the most
popular statistical computer packages. In the JMP output of Figure 11.16, the es-
timated standard deviations are shown in the Std Error column right next to the
estimated coefficients. These quantities are the basis for calculating confidence in-
tervals and testing hypotheses.

The standardized variable

$$t = \frac{b_i - \beta_i}{s_{b_i}}$$

has a t distribution based on n − (k + 1) df. This implies that a confidence interval for βi is

$$b_i \pm (t\ \text{critical value}) \cdot s_{b_i}$$

The test statistic for H0: βi = hypothesized value is

$$t = \frac{b_i - \text{hypothesized value}}{s_{b_i}}$$

The test is upper-, lower-, or two-tailed, depending on whether the inequality in Ha is >, <, or ≠. In practice, the most frequently tested null hypothesis is H0: βi = 0. The interpretation of this H0 is that, as long as the other predictors are retained in the model, the predictor xi provides no useful information about y.


Example 11.14 The JMP output of Figure 11.16 gives b2 = .498333, s_b2 = .070191, and error df = n − (k + 1) = 25. The t critical value for a confidence interval for β2 with a confidence level of 95% is 2.060. The confidence interval is

.498333 ± (2.060)(.070191) ≈ .498 ± .145 = (.353, .643)

We therefore estimate with a high degree of confidence that, when the value of power is increased by 1 mW while force, temperature, and time are all held fixed, the associated change in average strength will be between .353 gm and .643 gm.
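The same interval can be computed in one line of R from the quantities just cited (with the raw data and a fitted model object, R's confint() function returns such intervals directly):

b2 <- .498333; sb2 <- .070191; error.df <- 25
b2 + c(-1, 1) * qt(.975, error.df) * sb2   # 95% CI: (.353, .643)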

Example 11.15 In Example 3.15 from Section 3.5, we gave a data set consisting of 13 observations on the variables y = adsorption, x1 = extractable iron, and x2 = extractable aluminum. Figure 11.17 is the Minitab output from fitting the model y = α + β1x1 + β2x2 + β3x3 + e, where x3 = x1x2 is an interaction predictor.
Judging from the P-value of .000 for the model utility test, the fitted model
specifies a very useful relationship between y and the predictors. Provided that iron
and aluminum are retained in the model, does the interaction predictor appear to
provide useful information about adsorption? The relevant hypotheses are
H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is the t-ratio b3/s_b3, with value .0005278/.0006610 = .80. Our table of t curve tail areas shows that the area under the 13 − (3 + 1) = 9 df curve to the right of .8 is .222 (see Appendix Table VI), so the P-value for the two-tailed test is .444 (.445 according to Minitab). The null hypothesis should not be rejected at any reasonable significance level. It is very plausible that β3 = 0, from which we conclude that the interaction predictor does not appear to provide useful information beyond what is provided by the predictors iron and aluminum.


Figure 11.17 Minitab output for Example 11.15
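As a check on the P-value reported by Minitab, the two-tailed t curve area can be computed directly; a sketch in R using the numbers from this example:

t.ratio <- .0005278 / .0006610                    # .80
2 * pt(abs(t.ratio), df = 9, lower.tail = FALSE)  # about .445, as on the output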


More Intervals
Because the individual estimated coefficients vary from sample to sample, so will the value of ŷ = a + b1x1 + … + bkxk for fixed values of x1, …, xk. Properties of the sampling distribution of the statistic ŷ can be used to obtain both a confidence interval for a mean y value and a prediction interval for a single y value when the predictors have specified values. Both intervals are based on n − (k + 1) df and have the same general form as in the case of simple linear regression. The CI for a mean y value is

$$\hat{y} \pm (t\ \text{critical value}) \cdot s_{\hat{y}}$$

and the PI for a single as-yet-unobserved y value is

$$\hat{y} \pm (t\ \text{critical value})\sqrt{s_e^2 + s_{\hat{y}}^2}$$

where s_ŷ is the estimated standard deviation of the statistic ŷ. The PI is always wider than the corresponding CI.

Example 11.16 Figure 11.18 shows Minitab output from fitting the model, using only the predictors x1 and x2, to the adsorption data referred to in Example 11.15. About 95% of the observed variation in adsorption can be attributed to the model relationship. The P-value for model utility is .000, confirming the utility of the chosen model. The P-values corresponding to the t-ratios for the two β coefficients are .004 and .000, respectively, indicating that neither of these predictors should be deleted from the model when the other one is retained. That is, both predictors appear to provide useful information about y. The last line of the output gives estimation and prediction information when x1 = 200 and x2 = 40. The values of ŷ and s_ŷ are 29.16 and 1.76, respectively. The limits of both a 95% CI for mean adsorption and a 95% PI for a single adsorption value are also displayed. Notice how much wider the PI is than the CI. Even with a very high R2 value, there is still a reasonable amount of uncertainty involved in predicting a single value of adsorption.

Figure 11.18 Minitab output for Example 11.16
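With the adsorption data in a data frame, the whole analysis of this example takes only a few lines in R; predict() produces the CI and PI just discussed. A minimal sketch, assuming a data frame adsorb with columns adsorption, iron, and aluminum (our names, not part of any package):

fit <- lm(adsorption ~ iron + aluminum, data = adsorb)
new <- data.frame(iron = 200, aluminum = 40)
predict(fit, new, interval = "confidence", level = .95)  # CI for mean adsorption
predict(fit, new, interval = "prediction", level = .95)  # wider PI for a single value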


Eliminating a Group of Predictors


The null hypothesis in the model utility test asserts that none of the predictors is useful.
The usefulness of a single predictor can be assessed using a t-ratio. Sometimes an inves-
tigator will want to know whether any of the predictors in some specified group provides
useful information. Let
g = number of predictors in the group under investigation

The relevant hypotheses are

H0: all βs corresponding to the g predictors in the group have value 0
Ha: at least one of the βs referred to in H0 is not 0
The alternative hypothesis is interpreted as saying that at least one predictor in the group
does provide useful information about y.
The test is carried out by fitting two different models: the “full” model, consist-
ing of all predictors (those in the group of interest and those not being considered for
deletion), and the “reduced” model, which contains only those predictors not in the
specified group. This results in an SSResid(full) value and an SSResid(red) value. The
former SSResid cannot be larger than the latter, because it results from adding extra
predictors (those in the group) without deleting anything. The usefulness of at least one
predictor in the group is suggested by an SSResid(full) value that is a good deal smaller
than the SSResid(red) value, because much less variation is left unexplained by the full
model than by the reduced model. The test statistic is
$$F = \frac{[\text{SSResid(red)} - \text{SSResid(full)}]/g}{\text{SSResid(full)}/[n - (k + 1)]}$$

The test is upper-tailed and is based on the F distribution having g numerator df and n − (k + 1) denominator df.
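This full-versus-reduced comparison is exactly what R's anova() function carries out when it is handed two nested fitted models. A sketch, with a data frame and variable names that are ours:

reduced <- lm(y ~ x1 + x2, data = dat)            # predictors being kept
full    <- lm(y ~ x1 + x2 + x3 + x4, data = dat)  # plus the group under test
anova(reduced, full)  # F = [SSResid(red) - SSResid(full)]/g over SSResid(full)/[n - (k+1)]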

Example 11.17 For the bond shear strength data given in Example 11.12, the model with the four predictors force, power, temperature, and time gave SSResid = 665.12, R2 = .714, and adjusted R2 = .668. Now consider as the full model the complete second-order model containing not only x1–x4 but also 4 quadratic predictors and 6 interaction predictors, for a total of 14 predictors. The estimated regression equation is

strength = 21 − 2.30 force − .08 power + .836 temp − 3.99 time
           + .0240 for*pow − .0093 for*temp + .0755 for*time
           − .00467 pow*temp + .0237 pow*time + .0007 temp*time
           + .0152 forsqd + .00130 powsqd − .00011 tempsqd − .0078 timesqd

with SSResid(full) = 426.93, R2 = .816, adjusted R2 = .645, and P-value = .002 for the model utility F test. Should any of the second-order predictors be retained in the model? The relevant null hypothesis is

H0: β5 = β6 = … = β14 = 0


whereas the alternative hypothesis states that at least one of these βs is not zero (that is, there is at least one useful second-order predictor). The number of predictors in the subset being considered for deletion is g = 10, which is the numerator df; denominator df is 30 − (14 + 1) = 15. The test statistic value is

$$F = \frac{(665.12 - 426.93)/10}{426.93/15} = .84$$
for which P-value > .10. The null hypothesis should not be rejected at any reason-
able significance level. None of the second-order predictors appears to provide useful
information beyond what is contained in the four first-order predictors.

In Chapter 3, we discussed briefly fitting more general kinds of functions to bivariate or multivariate data using the LOWESS technique or a general additive relationship. Inferential techniques for fits of these types are still in the developmental stages.
Confidence intervals, for example, can be calculated using the bootstrap method pre-
sented in Chapter 7. Statistical packages such as R and SAS will do this sort of thing
without much difficulty. Please consult a more advanced reference for details. In the
next section, we consider further aspects of regression modeling, including checking
model adequacy and variable selection.

Section 11.5 Exercises

34. The article “Validation of the Rockport Fitness Walking Test in College Males and Females” (Research Quarterly for Exercise and Sport, 1994: 152–158) recommended the following estimated regression equation for relating y = VO2max (L/min, a measure of cardiorespiratory fitness) to the predictors x1 = gender (female = 0, male = 1), x2 = weight (lb), x3 = 1-mile walk time (min), and x4 = heart rate at the end of the walk (beats/min):

ŷ = 3.5959 + .6566x1 + .0096x2 − .0996x3 − .0080x4

a. How would you interpret the estimated coefficient b3 = −.0996?
b. How would you interpret the estimated coefficient b1 = .6566?
c. Suppose that an observation made on a male whose weight was 170 lb, walk time was 11 min, and heart rate was 140 beats/min resulted in VO2max = 3.15. What would you have predicted for VO2max in this situation, and what is the value of the corresponding residual?
d. Using SSResid = 30.1033 and SSTo = 102.3922, what proportion of observed variation in VO2max can be attributed to the model relationship?

35. Exercise 35 of Section 3.5 gave data on x1 = wire feed rate, x2 = welding speed, and y = deposition rate of a welding process. Minitab output from fitting the multiple regression model with x1 and x2 as predictors is given here.

The regression equation is
DepRate = 0.0558 + 0.375 FeedRate + 0.00278 WeldSpd

Predictor    Coef      Stdev     t-ratio   p
Constant     0.05580   0.07836    0.71     0.485
FeedRate     0.374917  0.007476  50.15     0.000
WeldSpd      0.002775  0.001121   2.47     0.023

s = 0.0448530   R-sq = 99.3%   R-sq(adj) = 99.2%

Analysis of Variance
SOURCE      DF   SS      MS      F        p
Regression   2   5.0726  2.5363  1260.71  0.000
Error       19   0.0382  0.0020
Total       21   5.1108

a. Carry out the model utility test.
b. Calculate and interpret a 95% confidence interval for β2, the population regression coefficient of x2.


c. When x1 = 11.5 and x2 = 40, the estimated standard deviation of ŷ is s_ŷ = .02438. Calculate a 95% confidence interval for true average deposition rate for the given values of x1 and x2.
d. Calculate a 95% prediction interval for the deposition rate resulting from a single experimental run with x1 = 11.5 and x2 = 40.

36. Exercise 37 of Section 3.5 gave R output for a regression of y = deposition over a specified time period on two complex predictors x1 and x2 defined in terms of PAH air concentrations for various species, total time, and total amount of precipitation. Use the output in that exercise to answer the following:
a. Does there appear to be a useful linear relationship between y and at least one of the predictors?
b. The estimated standard deviation of ŷ when x1 is 20,000 and x2 is .002 is s_ŷ = 21.7. Calculate a 95% confidence interval for the mean value of deposition under these circumstances.
c. Fitting the model with predictors x1 and x2 gave SSResid = 27,454, whereas fitting with x1, x2, and x3 = x1x2 resulted in SSResid = 20,519. Using α = .01, can we conclude that the x1x2 term adds useful information to a “reduced” model containing only x1 and x2? Note: when g = 1, the resulting F test gives the same conclusion as the t test for whether a single variable (here, x1x2) contributes useful information to a model.

37. The article “Analysis of the Modeling Methodologies for Predicting the Strength of Air-Jet Spun Yarns” (Textile Res. J., 1997: 39–44) reported on a study carried out to relate yarn tenacity (y, in g/tex) to yarn count (x1, in tex), percentage polyester (x2), first nozzle pressure (x3, in kg/cm2), and second nozzle pressure (x4, in kg/cm2). The estimate of the constant term in the corresponding multiple regression equation was 6.121. The estimated coefficients for the four predictors were −.082, .113, .256, and −.219, respectively, and the coefficient of multiple determination was .946.
a. Assuming that the sample size was n = 25, state and test the appropriate hypotheses to decide whether the fitted model specifies a useful linear relationship between the dependent variable and at least one of the four model predictors.
b. Again using n = 25, calculate the value of adjusted R2.
c. Calculate a 99% confidence interval for true mean yarn tenacity when yarn count is 16.5, yarn contains 50% polyester, first nozzle pressure is 3, and second nozzle pressure is 5 if the estimated standard deviation of predicted tenacity under these circumstances is .350.

38. A regression analysis carried out to relate y = repair time for a water filtration system (hr) to x1 = elapsed time since the previous service (months) and x2 = type of repair (1 if electrical and 0 if mechanical) yielded the following model based on n = 12 observations: ŷ = .950 + .400x1 + 1.250x2. In addition, SSTo = 12.72, SSResid = 2.09, and s_b2 = .312.
a. Does there appear to be a useful linear relationship between repair time and the two model predictors? Carry out a test of the appropriate hypotheses using a significance level of .05.
b. Given that elapsed time since the last service remains in the model, does type of repair provide useful information about repair time? State and test the appropriate hypotheses using a significance level of .01.
c. Calculate and interpret a 95% confidence interval for β2.
d. The estimated standard deviation of a prediction for repair time when elapsed time is 6 months and the repair is electrical is .192. Predict repair time under these circumstances by calculating a prediction interval with a 99% prediction level. Does the resulting interval suggest that the estimated model will give an accurate prediction? Why or why not?

39. The accompanying data on x = frequency (MHz) and y = power (W) for a certain laser configuration was read from a graph in the article “Frequency Dependence in RF Discharge Excited Waveguide CO2 Lasers” (IEEE J. of Quantum Electronics, 1984: 509–514):

x: 60 63 77 100 125 157 186 222
y: 16 17 19  21  22  20  15   5

Fitting a quadratic regression model to this data yielded the following summary quantities: a = −1.5127, b1 = .391902, b2 = −.00163141, SSResid = .29, SSTo = 202.87, and s_b2 = .00003391.


a. Why is b2 negative rather than positive?
b. What proportion of observed variation in output power can be attributed to the model relationship between power and frequency?
c. Carry out a test of hypotheses to decide whether the quadratic regression model is useful.
d. Carry out a test of hypotheses to decide whether the quadratic predictor should be retained in the model.
e. When x = 150, the estimated standard deviation of ŷ is s_ŷ = .1410. Calculate a 99% confidence interval for true average power when frequency is 150, and also a 99% prediction interval for a single output power observation to be made when frequency is 150.

40. The article “Sensitivity Analysis of a 2.5 kW Proton Exchange Membrane Fuel Cell Stack by Statistical Method” (J. of Fuel Cell Sci. and Tech., 2009: 1–6) used regression methodology to investigate the relationship between fuel cell power (W) and the independent variables x1 = H2 pressure (psi), x2 = H2 flow (stoc), x3 = air pressure (psi), and x4 = airflow (stoc). Here is the Minitab output from fitting the model with the aforementioned independent variables as predictors (also fit by the authors of the cited article):

Predictor   Coef      SE Coef   T       p
Constant    1507.3    206.8      7.29   0.000
x1          -4.282    4.969     -0.86   0.407
x2           7.46     62.11      0.12   0.907
x3          -0.9162   0.6227    -1.47   0.169
x4          90.60     24.84      3.65   0.004

s = 49.6885   R-sq = 59.6%   R-sq(adj) = 44.9%

SOURCE      DF   SS      MS      F      p
Regression   4   40048   10012   4.06   0.029
Res.Error   11   27158    2469
Total       15   67206

a. Does there appear to be a useful relationship between power and at least one of the predictors? Carry out a formal test of hypotheses.
b. Fitting the model with predictors x3, x4, and the interaction x3x4 gave R2 = .834. Does this model appear to be useful? Can an F test be used to compare this model to the model of part (a)? Explain.
c. Fitting the model with all 4 predictors as well as all second-order interactions gave R2 = .960 (this model was also fit by the investigators). Does it appear that at least one of the interaction predictors provides useful information about power over and above what is provided by the first-order predictors? State and test the appropriate hypotheses using a significance level of .05.

41. The article “The Undrained Strength of Some Thawed Permafrost Soils” (Canadian Geotechnical J., 1979: 420–427) reported the following data on undrained shear strength of sandy soil (y, in kPa), depth (x1, in m), and water content (x2, in %):

Obs   Depth   Watcont   Shstren
 1     8.9     31.5      14.7
 2    36.6     27.0      48.0
 3    36.8     25.9      25.6
 4     6.1     39.1      10.0
 5     6.9     39.2      16.0
 6     6.9     38.3      16.8
 7     7.3     33.9      20.7
 8     8.4     33.8      38.8
 9     6.5     27.9      16.9
10     8.0     33.1      27.0
11     4.5     26.3      16.0
12     9.9     37.8      24.9
13     2.9     34.6       7.3
14     2.0     36.4      12.8

Fitting the model with predictors x1 and x2 only gave SSResid = 894.95, whereas fitting the complete second-order model with predictors x1, x2, x1², x2², and x1x2 resulted in SSResid = 390.64. Carry out a test at significance level .01 to decide whether at least one of the second-order predictors provides useful information about shear strength.

42. Soluble dietary fiber (SDF) can provide health benefits by lowering blood cholesterol and glucose levels. The article “Effects of Twin-Screw Extrusion on Soluble Dietary Fiber and Physicochemical Properties of Soybean Residue” (Food Chemistry, 2013: 884–889) reported the following data on y = SDF content (%) in soybean residue and the three predictors x1 = extrusion temperature (in °C), x2 = feed moisture (in %), and x3 = screw speed (in rpm) of a twin-screw extrusion process.


Obs   x1   x2    x3    y
 1    35   110   160   11.13
 2    25   130   180   10.98
 3    30   110   180   12.56
 4    30   130   200   11.46
 5    30   110   180   12.38
 6    30   110   180   12.43
 7    30   110   180   12.55
 8    25   110   160   10.59
 9    30   130   160   11.15
10    30    90   200   10.55
11    30    90   160    9.25
12    25    90   180    9.58
13    35   110   200   11.59
14    35    90   180   10.68
15    35   130   180   11.73
16    25   110   200   10.81
17    30   110   180   12.68

a. The authors fit the complete second-order model with predictors x1, x2, x3, x1², x2², x3², x1x2, x1x3, and x2x3, which resulted in SSResid = .215 and SSTo = 16.798. Determine the corresponding values of R2 and adjusted R2.
b. If we include in the model only the predictors x1, x2, and x3, the corresponding SSResid = 11.428. Carry out a test at significance level .01 to decide whether at least one of the second-order predictors provides useful information about SDF content.

43. The use of high-strength steels (HSS) rather than aluminum and magnesium alloys in automotive body structures reduces vehicle weight. However, HSS use is still problematic because of difficulties with limited formability, increased springback, difficulties in joining, and reduced die life. The article “Experimental Investigation of Springback Variation in Forming of High Strength Steels” (J. of Manuf. Sci. and Engr., 2008: 1–9) included data on y = springback from the wall opening angle and x = blank holder pressure (BHP). Three different material suppliers and three different lubrication regimens (no lubrication, lubricant 1, and lubricant 2) were also utilized.
a. What predictors would you use in a model to incorporate supplier and lubrication information in addition to BHP?
b. The accompanying Minitab output resulted from fitting the model of part (a) (the article's authors also used Minitab; amusingly, they employed a significance level of .06 in various tests of hypotheses). Does there appear to be a useful relationship between the response variable and at least one of the predictors? Carry out a formal test of hypotheses.
c. When BHP is 1000, material is from supplier 1, and no lubrication is used, s_ŷ = .524. Calculate a 95% PI for the springback that would result from making an additional observation under these conditions.
d. From the output, it appears that lubrication regimen may not be providing useful information. A regression with the corresponding predictors removed resulted in SSResid = 48.426. What is the coefficient of multiple determination for this model, and what would you conclude about the importance of the lubrication regimen?
e. A model with predictors for BHP, supplier, and lubrication regimen, as well as predictors for interactions between BHP and both supplier and lubrication regimen, resulted in SSResid = 28.216 and R2 = .849. Does this model appear to improve on the model with just BHP and predictors for supplier? Use α = .05.

Predictor   Coef        SE Coef     T       p
Constant    21.5322     0.6782      31.75   0.000
BHP         -0.0033680  0.0003919   -8.59   0.000
Supp1_1     -1.7181     0.5977      -2.87   0.007
Supp1_2     -1.4840     0.6010      -2.47   0.019
Lub_1       -0.3036     0.5754      -0.53   0.602
Lub_2        0.8931     0.5779       1.55   0.133

s = 1.18413   R-sq = 77.5%   R-sq(adj) = 73.8%

SOURCE      DF   SS        MS       F       p
Regression   5   144.915   28.983   20.67   0.000
Res.Error   30    42.065    1.402
Total       35   186.980

44. Coir fiber, derived from coconut, is an eco-friendly material with great potential for use in construction. The article “Seepage Velocity and Piping Resistance of Coir Fiber Mixed Soils” (J. of Irrig. and Drainage Engr., 2008: 485–492) included several multiple regression analyses. The article’s authors kindly provided the accompanying data on x1 = fiber content (%), x2 = fiber length (mm), x3 = hydraulic gradient (no unit provided), and y = seepage velocity (cm/sec).


Obs cont lngth grad vel Obs cont lngth grad vel
1 0.0 0 0.400 0.027 26 1.5 50 1.141 0.058
2 0.0 0 0.716 0.050 27 1.5 50 1.474 0.082
3 0.0 0 0.925 0.080 28 1.5 50 1.581 0.112
4 0.0 0 1.098 0.099 29 1.5 50 1.983 0.144
5 0.0 0 1.226 0.107 30 1.0 25 0.462 0.028
6 0.0 0 1.427 0.140 31 1.0 25 0.705 0.059
7 0.0 0 1.709 0.178 32 1.0 25 0.987 0.084
8 0.0 0 1.872 0.200 33 1.0 25 1.154 0.101
9 0.5 50 0.380 0.022 34 1.0 25 1.479 0.150
10 0.5 50 0.774 0.040 35 1.0 25 1.786 0.194
11 0.5 50 1.056 0.060 36 1.0 25 1.957 0.218
12 0.5 50 1.329 0.111 37 1.0 40 0.419 0.030
13 0.5 50 1.598 0.158 38 1.0 40 0.705 0.050
14 0.5 50 1.799 0.188 39 1.0 40 0.979 0.068
15 1.0 50 0.410 0.026 40 1.0 40 1.226 0.091
16 1.0 50 0.577 0.038 41 1.0 40 1.470 0.126
17 1.0 50 0.748 0.049 42 1.0 40 1.744 0.168
18 1.0 50 0.927 0.060 43 1.0 60 0.436 0.034
19 1.0 50 1.090 0.070 44 1.0 60 0.650 0.051
20 1.0 50 1.239 0.088 45 1.0 60 0.889 0.068
21 1.0 50 1.496 0.111 46 1.0 60 1.222 0.093
22 1.0 50 1.744 0.134 47 1.0 60 1.477 0.112
23 1.0 50 1.915 0.145 48 1.0 60 1.726 0.139
24 1.5 50 0.444 0.014 49 1.0 60 1.983 0.173
25 1.5 50 0.821 0.037

a. Here is output from fitting the model with the three xi’s as predictors:

Predictor    Coef        SE Coef     T       p
Constant     -0.002997   0.007639    -0.39   0.697
fib cont     -0.012125   0.007454    -1.63   0.111
fib lngth    -0.0003020  0.0001676   -1.80   0.078
hyd grad      0.102489   0.004711    21.76   0.000

s = 0.0162355   R-sq = 91.6%   R-sq(adj) = 91.1%

Source           DF   SS         MS         F        p
Regression        3   0.129898   0.043299   164.27   0.000
Residual Error   45   0.011862   0.000264
Total            48   0.141760

How would you interpret the number −.0003020 in the Coef column of the output?
b. Does fiber content appear to provide useful information about velocity provided that fiber length and hydraulic gradient remain in the model? Carry out a test of hypotheses at α = .05.
c. Fitting the model with just fiber length and hydraulic gradient as predictors gave the estimated regression coefficients a = −.005315, b1 = −.0004968, and b2 = .102204 (the t-ratios for these two predictors are both highly significant). In addition, s_ŷ = .00286 when fiber length = 25 and hydraulic gradient = 1.2. Is there convincing evidence that true average velocity is something other than .1 in this situation? Carry out a test using a significance level of .05.
d. Fitting the complete second-order model (as did the article’s authors) resulted in SSResid = .003579. Does it appear that at least one of the second-order predictors provides useful information over and above what is provided by the three first-order predictors? Test the relevant hypotheses at α = .05.

11.6 Further Aspects of Regression Analysis


This last section surveys a variety of issues in regression analysis, including diagnostic
checks for model adequacy, identification of unusual observations, selection of a good
group of predictors from a candidate pool, problems associated with a strong linear


relationship among the predictors, and a model appropriate when y is a 0–1 variable
corresponding to a success–failure dichotomy.

Checking Model Adequacy


In Section 11.5, we presented inferential methods based on the general additive multiple re-
gression model. These methods are appropriate only if the model assumptions, for example,
normality of the random deviation e in the model equation, are satisfied. Checks of model
adequacy are usually based on the residuals, and in particular on various plots involving these or related quantities. Recall that the residuals are the differences y1 − ŷ1, y2 − ŷ2, …, yn − ŷn between observed and predicted y values. Before the data is obtained, each one of these residuals is subject to randomness; we do not know a priori whether any particular residual will be −3.2, 5.7, 0, or any other possible value. If the correct model has been fit, the mean value of any particular residual is zero. Unfortunately, the amount of variability in any particular residual will depend on the values of the predictors at which the corresponding observation is made. This can make it difficult to compare the various residuals to one another. One remedy for this difficulty is to standardize the residuals:

standardized residual = (residual − 0)/(estimated standard deviation of residual)
                      = residual/(estimated standard deviation of residual)

The most popular statistical packages will produce these standardized residuals on request.
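In R, for instance, rstandard() returns the standardized residuals of a fitted lm object, after which the diagnostic plots discussed in this section are one line each. A sketch with our own object names:

fit <- lm(adsorption ~ iron + aluminum, data = adsorb)
st.res <- rstandard(fit)            # residual / estimated sd of that residual
qqnorm(st.res); qqline(st.res)      # normal quantile plot to check normality
plot(fitted(fit), st.res)           # spread should look patternless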

Example 11.18 The adsorption data introduced in Example 3.15, repeated here, is used in several examples in the previous section. The residuals are based on the model with the two predictors x1 = iron content and x2 = aluminum content.

                                                 Estimated std.   Standardized
Obs   Iron   Aluminum   Adsorption   Residual      deviation        residual
 1     61       13          4         −.06305       3.64425         −.01730
 2    175       21         18       −1.70661        3.72079         −.45867
 3    111       24         14         .46130        4.02690          .11455
 4    124       23         18        3.34477        4.04931          .82601
 5    130       64         26       −3.64064        3.50644        −1.03827
 6    173       38         26         .58585        4.14741          .14126
 7    169       33         21       −2.21821        4.09222         −.54206
 8    169       61         30       −2.99022        4.07688         −.73346
 9    160       39         28        3.70238        4.18323          .88505
10    244       71         36       −8.93520        4.03193        −2.21611
11    257      112         65        4.29026        2.98776         1.43595
12    333       88         62        1.09857        2.99775          .36647
13    199       54         40        6.07079        4.18560         1.45040


Notice that the estimated standard deviations for the 11th and 12th observations
are much smaller than those of most other observations. This is because the x1 and
x2 values for these two observations are quite far from the center of the data. This is
analogous to the least squares line in simple linear regression being pulled toward an
observation whose x value is far to the left or right of the other x values; there is less
variability in the corresponding residual than for the other observations. The only
unusually large residual here is for the 10th observation; because the standardized
residual is −2.22, the residual −8.94 is more than 2 standard deviations smaller than
what would be expected if the correct model had been fit.

We previously advocated the use of a normal quantile plot to check a normality


assumption. In regression, we suggest that the assumption of normally distributed random
deviations be investigated by constructing a normal quantile plot of the standardized re-
siduals. A reasonably linear pattern in this plot suggests that normality is plausible.

Example 11.19 Figure 11.19 shows a normal quantile plot of the standardized residuals for the ad-
sorption data given in Example 11.18. The straightness of the plot casts little doubt
on the assumption that the random deviation e is normally distributed.


Figure 11.19 A normal quantile plot of the standardized residuals from Example 11.18

Another model assumption is that the variance σ2 of a random deviation is a constant; that is, it does not depend on the values of the predictors. This can be checked by plotting
the standardized residuals against each predictor in turn—one plot of standardized residu-
als versus x1, another of standardized residuals versus x2, and so on. Ideally, the points in
each of these plots should appear randomly placed with no discernible pattern. If there is
any marked tendency for the points in one of these plots to spread out substantially more at
one end than the other, the constant variance assumption is suspect. Remedial action must
be taken, and the advice of a statistician should be sought. Additionally, if there is substan-
tial curvature in a plot, the population regression function in the chosen model has been

Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has
www.ebook3000.com
deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
558 chapter 11 Inferential Methods in Regression and Correlation

incorrectly specified. It would then be necessary to try transforming one or more of the vari-
ables or introducing new predictors, for example, quadratic predictors. Some statisticians
suggest replacing plots of the standardized residuals (or residuals) versus each predictor by
a single omnibus plot of the standardized residuals (or residuals) versus the predicted values
(yn’s). Again, any marked deviation from randomness is a call for remedial action. A plot of
yn versus y gives a visual impression of how well the model is predicting for the observations
in the sample. The closer the points in this plot are to a 45° line, the better the predictions;
the vertical deviations from this line are just the residuals. Finally, if the observations were
obtained in time sequence, the standardized residuals should be plotted in time order to
see whether there is an effect over time. Such an effect might indicate that the e’s for suc-
cessive observations are not independent, necessitating a more complex model.

Example 11.20 Figure 11.20 shows the suggested plots for the adsorption data. Given that there are
only 13 observations in the data set, there is not much evidence of a pattern in any
of the first three plots other than randomness. The point at the bottom of each of
these three plots corresponds to the observation with the large residual. We will say
more about such observations subsequently. For the moment, there is no compelling
reason for remedial action.
Figure 11.20 Diagnostic plots for the adsorption data: (a) standardized residual versus x1, (b) standardized residual versus x2, (c) standardized residual versus ŷ, and (d) ŷ versus y


Identifying Unusual Observations


Two different types of unusual observations can occur in a regression data set: those with
extreme y values and those with extreme values of the predictors (the xi’s). Extreme y
values are indicated by standardized residuals quite different from zero. Minitab, for ex-
ample, will flag any observation for which a standardized residual exceeds 2 in absolute
value. There is one such observation in the adsorption data set.
One way to recognize an observation whose predictor values are unusual relies on
the fact that each predicted value is a linear function of all the observed y values:

ŷ1 = h11y1 + h12y2 + … + h1nyn
ŷ2 = h21y1 + h22y2 + … + h2nyn
⋮
The hij coefficients depend only on the values of the predictors for the various observa-
tions and not on the resulting y values. The coefficient h11 is the weight given to y1 in
computing the corresponding predicted value, and an analogous interpretation applies
to h22, . . . , hnn. Intuitively, a large value of hii for any particular i identifies an observa-
tion that is heavily weighted in calculating the corresponding predicted value. The first
observation is said to have high leverage—high potential influence—if h11 is large rela-
tive to the other hii’s. The influence is only potential because whether an observation is
actually influential depends on its y value as well as the values of the predictors. Minitab
will flag any observation whose hii exceeds 3(k + 1)/n (Σhii = k + 1, so an observation
is flagged if its hii is three times the average of all the hii’s). The hii’s for the adsorption
data are as follows:

.308    .278    .154    .145    .359    .103    .127    .133    .088    .152    .535    .531    .087

Since 3(k + 1)/n = 3(3)/13 = .692, no observation can be characterized as having high
leverage.
A commonly used strategy for assessing the impact of an “unusual” observation—
either a large standardized residual or high leverage—is to remove the observation from
the data set and refit the same model using the remaining observations. If any of the
calculated quantities, such as the bi’s, R2, and se, change substantially from their values
before deletion of the unusual observation, the regression analysis is unstable. When the
observation with the large standardized residual was removed from the adsorption data,
estimated coefficients and other quantities changed very little. When large changes do
occur, one possibility is to use a “robust” fitting technique for which estimated coef-
ficients are not so heavily affected by unusual observations as they may be for a least
squares fit. Consult one of the chapter references for more information on these matters.
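The hii’s are the diagonal entries of the “hat matrix,” which R reports via hatvalues(); the two flagging rules just described can then be scripted as in the following sketch (object names ours):

h <- hatvalues(fit)                 # one hii per observation
k <- length(coef(fit)) - 1          # number of predictors
n <- length(h)
which(h > 3 * (k + 1) / n)          # high-leverage observations
which(abs(rstandard(fit)) > 2)      # observations with extreme y values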

Model Selection
An investigator has obtained data on a response variable y and a “candidate pool” of p
predictors (some of which may be mathematical functions of others, such as interaction
or quadratic predictors) and wishes to fit a multiple regression model. Frequently, some
of these p predictors are only weakly related to y or contain information that duplicates


information provided by some of the other predictors. So the issue is how to select a
subset of predictors from the candidate pool to obtain an effective model.
One type of model selection strategy involves fitting all possible models, computing
one or more summary quantities from each fit, and comparing these quantities to iden-
tify the most satisfactory model. With p predictors in the pool, there are 2^p possible mod-
els when the model that contains none of the predictors is counted (because there are
two possibilities for each predictor, it could be included in the model or not included).
When p exceeds 5, it is obviously time-consuming to sit in front of a computer and
explicitly request that each possible model in turn be fit. Several of the most powerful
statistical computer packages have an all-subsets option, which will give limited output
from several of the best (according to criteria discussed shortly) models of each different
size. Once the field has been narrowed, the fit of each finalist can then be examined in
more detail. Minitab can be used for this purpose as long as p ≤ 31 (for p = 31, over
2 billion models are under consideration).
Suppose that p is small enough for the all-subsets option to be feasible. What crite-
ria can be used to select a winner? An obvious and appealing choice is the coefficient of
multiple determination, R2. Certainly for two models containing the same number of
predictors, if the corresponding R2 values are quite different, the model with the larger
value should be preferred to the one with the smaller value. However, using R2 as a
basis for choosing between models that contain different numbers of predictors is not
so straightforward. The reason is that adding a predictor to a model can never result in
a decrease in R2; there is almost always an increase, though it may be quite small. In
particular, let
R²i = largest R² value for any model containing i predictors (i = 1, 2, …, p)

Then R²1 ≤ R²2 ≤ … ≤ R²p. The objective then is not simply to find the model with the largest R² value; the model with all p predictors from the candidate pool does that. Instead, we should look for a model that contains relatively few predictors but has a large R² value. The model should be such that no other model containing more predictors yields much of an improvement in R² value. Suppose, for example, that p = 5 and that

R²1 = .427   R²2 = .733   R²3 = .885   R²4 = .898   R²5 = .901
The best three-predictor model seems to be a good choice, since it substantially im-
proves on the best one- and two-predictor models, whereas very little is gained by using
more than three predictors.
A small increase in R2 resulting from the addition of a predictor to a model may be
offset by the increased complexity of the new model and the reduction in df associated
with SSResid (resulting in less precise estimates and predictions). This is the rationale
for adjusted R2, which can either decrease or increase when a predictor is added to the
model. We can then think of identifying the model whose adjusted R2 is largest and then
consider only this model and any others whose adjusted R2 values are nearly as large.
When considering models containing some fixed number of predictors, for exam-
ple, k = 8, there may be several different models whose R2 and adjusted R2 values are
rather close to one another. By focusing only on the model with the highest values of
these two criterion measures, we may miss out on other good models that are easier
to interpret and use for estimation and prediction. For this reason, most all-subsets


procedures allow the analyst to specify some number of models c of each given size
(e.g., c = 3) for which output should be provided.
One other criterion for model selection that has been used with increasing frequency in recent years is Mallows’ CP. Let μi denote the mean or expected value of yi, which is the value of the response variable for the ith observation in our sample. Then after fitting any particular model, ŷi calculated from the fit provides an estimate of μi, and the total expected estimation error for all observations in the data set is Σ E[(ŷi − μi)²]. Mallows’ CP is an estimate of this total expected estimation error normalized in a certain way. It is desirable to choose a model for which CP is small. One additional consideration is that to protect against possible biases in estimates of population regression coefficients, it is desirable to have CP ≈ k + 1 when the model under consideration has k predictors.
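In R, the all-subsets search with these three criteria is available through regsubsets() in the leaps package. A sketch, assuming a data frame dat with response y and a four-predictor candidate pool (names ours):

library(leaps)
subsets <- regsubsets(y ~ x1 + x2 + x3 + x4, data = dat, nbest = 2)  # 2 best per size
s <- summary(subsets)
cbind(s$which, R2 = s$rsq, adjR2 = s$adjr2, Cp = s$cp)  # criteria for each model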

Example 11.21 The bond shear strength data introduced in Section 11.5 contains values of four
different independent variables x1–x4. We found that the model with only these four
variables as predictors was useful and that there was no compelling reason to con-
sider the inclusion of second-order predictors. Figure 11.21 is the Minitab output
that results from a request to identify the two best models of each given size.

Figure 11.21 Output from Minitab’s Best Subsets option

The best two-predictor model, with predictors power and temperature, seems to be a very good choice on all counts: R2 is significantly higher than for models with fewer predictors yet almost as large as for any larger models, adjusted R2 is almost at its maximum for this data, and CP is small and close to 2 + 1 = 3.

The choice of a “best” model in Example 11.21 seemed reasonably clear-cut. This
is often not the case. More typically, there will be several different models that are more
or less equally appealing in terms of the criteria discussed here. These finalists would
then have to be examined in more detail to choose the best model.
If the number of predictors in the candidate pool is too large or if suitable software
is not available, an alternative to an all-subsets or best regression approach is to use an
automatic selection procedure. The most easily understood such procedure is backward
elimination. First, fit the model containing all predictors in the candidate pool, then
eliminate predictors one by one until at some point all remaining predictors seem im-
portant. This involves looking at the t-ratios bi/s_bi on all coefficients for predictors in the


model at each stage of the process. The obvious candidate for elimination is the predic-
tor corresponding to the t-ratio closest to zero. The most frequently used rule of thumb
in practice is to stop eliminating predictors when all t-ratios either exceed 2 or are less
than −2. Some packages use F ratios, which are the squares of t-ratios, with a cutoff of 4.

Example 11.22 Figure 11.22 shows Minitab output from the backward elimination procedure applied to the bond shear strength data (this was done within Minitab’s Stepwise option). At the first stage, the t-ratio closest to zero was 1.01 for the β coefficient corresponding to the predictor force. Since this t-ratio is between −2 and 2, force is eliminated (this would also have been the case if the t-ratio had been −1.01). At the next stage, the model with the three remaining predictors was fit. The predictor time now qualifies for elimination, since the corresponding t-ratio 1.23 is closest to 0 and between −2 and 2. When the model with the two remaining predictors is fit, both the corresponding t-ratios exceed 2 in absolute value, and the procedure is terminated. The resulting model is the same one that we suggested previously based on all-subsets considerations.

Figure 11.22 Backward elimination output from Minitab



Another automatic selection procedure is forward selection, in which predictors from


the candidate pool are added to the model one-by-one until, at a certain point, none of
the predictors not already added appears useful. Suppose, for example, that p = 10 and
that x1, x6, and x9 have already been added. Then at the next stage, each four-predictor
model that includes these three predictors along with one of the predictors not yet added
would be fit (e.g., the model with predictors x1, x6, x9, and x2). The t-ratios for the coef-
ficients corresponding to not-yet-entered predictors are the basis for deciding whether to
enter at least one more predictor or to terminate. A ±2 cutoff is frequently employed.
Clearly, forward selection involves fitting many more models than is the case with back-
ward elimination; for example, at the first stage, all p one-predictor models must be fit to
decide whether at least one predictor should enter.


A variation on this is stepwise regression, in which predictors are added one-by-one


with the option of deleting a predictor at some later stage that was added previously. The
justification for this variation is that a predictor that earlier seemed important may be-
come redundant once several other predictors have been entered into the model. The
models identified by backward elimination, forward selection, and stepwise regression
may not be the same. Furthermore, none of these automatic selection procedures may
identify the model selected using the all-subsets criteria. Our recommendation is that
all-subsets be used in preference to any of the automatic selection procedures whenever
possible.
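R’s built-in step() function automates all three of these searches, though it should be said that it adds and drops predictors according to the AIC criterion rather than the t-ratio cutoffs described above. A sketch with our own names:

full <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
step(full, direction = "backward")                        # backward elimination
null <- lm(y ~ 1, data = dat)
step(null, scope = formula(full), direction = "forward")  # forward selection
step(null, scope = formula(full), direction = "both")     # stepwise regression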

Multicollinearity
When the values of the single predictor x in a simple linear regression analysis are all
quite close to one another, sb will usually be quite large, indicating that the slope coeffi-
cient β has been imprecisely estimated. The analogous situation in multiple regression is
referred to as multicollinearity. When the model to be fit includes the k predictors x1, . . . ,
xk, there is said to be multicollinearity if there is a strong linear relationship between these
predictors (so multicollinearity has nothing to do with the response variable y). Severe
multicollinearity leads to poorly estimated population regression coefficients and various
other problems. The most straightforward way to recognize the presence of multicol-
linearity is to fit k different regression models, each of which has one of the x variables as
the dependent variable and the other k − 1 predictors as the independent variables (e.g., if k = 5, there would be five regressions, the first with x1 as the dependent variable, the
second with x2 playing this role, and so on). If one or more of the resulting R2 values is
close to 1, multicollinearity exists. If you use Minitab to regress y against the k predictors,
a warning message will appear if any of these R2’s exceeds .99, and the package will not
allow you to include all predictors if any R2 exceeds .9999. Many analysts would be more
conservative and say that multicollinearity is a problem if any R2 exceeds .9.
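The diagnosis just described is easy to script. The sketch below (names ours) regresses each predictor on the others and reports the resulting R2 values; the equivalent variance inflation factors, 1/(1 − R2), are what the vif() function in R’s car package computes directly:

predictors <- c("x1", "x2", "x3", "x4")
r2 <- sapply(predictors, function(p) {
  f <- reformulate(setdiff(predictors, p), response = p)  # e.g., x1 ~ x2 + x3 + x4
  summary(lm(f, data = dat))$r.squared
})
r2[r2 > .9]     # predictors flagged by the conservative .9 rule
1 / (1 - r2)    # variance inflation factors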
When values of the predictor variable are under the control of the experimenter,
as was the case in the bond shear strength example, a careful choice of values will
preclude multicollinearity from arising. It is, however, often a problem in social sci-
ence or business applications, where data results simply from observation rather than
from intervention by an investigator. Statisticians have proposed various remedies
for the problems associated with multicollinearity, but a discussion would take us
beyond the scope of this book (after all, we want to leave something for your next
statistics course!).

Logistic Regression
The simple linear regression model is appropriate for relating a quantitative response
variable y to a quantitative predictor x. Suppose that y is a dichotomous variable with
possible values 1 and 0 corresponding to success and failure. Let π = P(S) = P(y = 1). Frequently, the value of π will depend on the value of some quantitative variable x. For example, the probability that a car needs warranty service of a certain kind might well depend on the car’s mileage, or the probability of avoiding an infection of a certain type might depend on the dosage in an inoculation. Instead of using just the symbol π for the success probability, we now use π(x) to emphasize the dependence of this probability


on the value of x. The simple linear regression equation y = α + βx + e is no longer
appropriate, for taking the mean value on each side of the equation gives

    μ_y = 1 · π(x) + 0 · (1 - π(x)) = π(x) = α + βx

Whereas π(x) is a probability and therefore must be between 0 and 1, α + βx need not
be in this range.
    Instead of letting the mean value of y be a linear function of x, we now consider a
model in which some function of the mean value of y is a linear function of x. In other
words, we allow π(x) to be a function of α + βx rather than α + βx itself. A function that
has been found quite useful in many applications is the logit function,

    π(x) = e^(α + βx) / (1 + e^(α + βx))

Figure 11.23 shows a graph of π(x) for particular values of α and β with β > 0. As x
increases, the probability of success increases. For β negative, the success probability
would be a decreasing function of x.

[Graph: π(x) on the vertical axis (0 to 1.0) rising as x increases from 10 to 80]

Figure 11.23 A graph of a logit function
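The logit function is easily evaluated and graphed with software. In R, plogis() computes
e^u/(1 + e^u) directly; the values α = -4 and β = .1 below are illustrative choices of ours,
not taken from the figure:

    # pi(x) = e^(a + b*x)/(1 + e^(a + b*x)), plotted over 10 <= x <= 80
    curve(plogis(-4 + 0.1 * x), from = 10, to = 80, ylim = c(0, 1),
          xlab = "x", ylab = "pi(x)")               # increasing, since b > 0
    curve(plogis(4 - 0.1 * x), add = TRUE, lty = 2) # decreasing, since b < 0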



Logistic regression means assuming that π(x) is related to x by the logit function.
Straightforward algebra shows that

    π(x) / (1 - π(x)) = e^(α + βx)

The expression on the left-hand side is called the odds. Suppose, for example, that
π(60)/[1 - π(60)] = 3. Then when x = 60 a success is three times as likely as a failure.
We now see that the logarithm of the odds is a linear function of the predictor. In
particular, the slope parameter β is the change in the log odds associated with a 1-unit
increase in x. This implies that the odds itself changes by the multiplicative factor e^β
when x increases by 1 unit.
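A quick numerical check of this multiplicative property in R, again using the illustrative
values α = -4 and β = .1:

    a <- -4; b <- 0.1
    odds <- function(x) plogis(a + b * x) / (1 - plogis(a + b * x))
    odds(61) / odds(60)            # exp(b) = 1.105..., whatever the starting x
    log(odds(61)) - log(odds(60))  # b itself: the change in log odds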


Fitting the logistic regression to sample data requires that the parameters α and β
be estimated. This is usually done using the maximum likelihood technique described
in Chapter 7. The details are quite involved, but fortunately the most popular statistical
computer packages will do this on request and provide quantitative and pictorial indica-
tions of how well the model fits.

Example 11.23 Here is data on launch temperature and the incidence of failure for O-rings in
24 space shuttle launches prior to the Challenger disaster of 1986.

Temperature Failure Temperature Failure


53 Y 70 Y
57 Y 72 N
63 N 73 N
66 N 75 N
67 N 75 Y
67 N 76 N
67 N 76 N
68 N 78 N
69 N 79 N
70 N 80 N
70 Y 81 N
70 N

Figure 11.24 shows Minitab output for a logistic regression analysis and a graph of
the estimated logit function from the R software. We have chosen to let π denote
the probability of failure. The graph of π̂(x) decreases as temperature increases because
failures tended to occur at lower temperatures than did successes. The estimate of
β is b = -.232, and the estimated standard deviation of b is s_b = .1082. Provided
that n is large enough, and we assume it is in this case, b has approximately a normal
distribution. If β = 0 (temperature does not affect the likelihood of O-ring failure),
z = b/s_b has approximately a standard normal distribution. The value of this z-ratio is
-2.14, and the P-value for a two-tailed test is .032 (some packages report a chi-square
value, which is just z², with the same P-value). At significance level .05, we reject the
null hypothesis of no temperature effect.
    The estimated odds of failure for any particular temperature value x is

    π̂(x) / (1 - π̂(x)) = e^(15.0429 - .232163x)

This implies that the odds ratio, the odds of failure at a temperature of x + 1 divided
by the odds of failure at a temperature of x, is

    {π̂(x + 1)/[1 - π̂(x + 1)]} / {π̂(x)/[1 - π̂(x)]} = e^(-.232163) = .7928


Binary Logistic Regression: Failure versus Temp


Logistic Regression Table
Odds 95% CI
Predictor Coef SE Coef z p Ratio Lower Upper
Constant 15.0429 7.37862 2.04 0.041
Temp -0.232163 0.108236 -2.14 0.032 0.79 0.64 0.98

(a)

[Graph: predicted probability of failure (0.0 to 1.0) versus temperature (55 to 80).
The fitted logistic curve decreases as temperature increases; the Y (failure) points lie
mostly at the lower temperatures and the N (no failure) points at the higher ones.
Legend: Y = Failure, N = No Failure; curve = Predicted Probability of Failure.]
(b)

Figure 11.24 (a) Logistic regression output from Minitab for Example 11.23;
(b) graph of estimated logistic function from R

The interpretation is that for each additional degree of temperature, we estimate that
the odds of failure will decrease by a factor of .79 (21%). A 95% CI for the true odds
ratio also appears on the output.
    The launch temperature for the Challenger mission was only 31°F. This temperature is much smaller than any value in the sample, so it is dangerous to extrapolate
the estimated relationship. Nevertheless, it appears that O-ring failure is virtually a
sure thing for a temperature this small.
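The fit reported in Figure 11.24 can be reproduced with R's glm() function. In the sketch
below, the 23 temperature-failure pairs are transcribed from the table, with failure coded
as 1 and no failure as 0:

    temp <- c(53, 57, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70,
              70, 72, 73, 75, 75, 76, 76, 78, 79, 80, 81)
    fail <- c( 1,  1,  0,  0,  0,  0,  0,  0,  0,  0,  1,  0,
                1,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0)
    fit <- glm(fail ~ temp, family = binomial)
    summary(fit)              # slope -.2322 with SE .1082, matching the output
    exp(coef(fit)[["temp"]])  # estimated odds ratio, about .79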

Our treatment of logistic regression modeling can be extended in an obvious way
to incorporate more than one predictor. The probability of success π is now a function
of the predictors x1, x2, . . . , xk:

    π(x1, . . . , xk) = e^(α + β1x1 + ⋯ + βkxk) / (1 + e^(α + β1x1 + ⋯ + βkxk))


Simple algebra yields an expression for the odds:

    π(x1, . . . , xk) / (1 - π(x1, . . . , xk)) = e^(α + β1x1 + ⋯ + βkxk)

The interpretation of βi (i = 1, . . . , k) is analogous to the interpretation for β given
in the logit function containing only a single predictor x. That is, for i = 1, . . . , k, the
following argument shows that the odds changes by the multiplicative factor e^(βi) when xi
increases by 1 unit and all other predictors remain fixed:

    π(x1, . . . , xi + 1, . . . , xk) / (1 - π(x1, . . . , xi + 1, . . . , xk))
        = e^(α + β1x1 + ⋯ + βi(xi + 1) + ⋯ + βkxk)
        = e^(α + β1x1 + ⋯ + βixi + ⋯ + βkxk + βi)
        = [π(x1, . . . , xk) / (1 - π(x1, . . . , xk))] · e^(βi)

Again, statistical software must be used to estimate parameters, calculate relevant standard deviations, and provide other inferential information.

Example 11.24 Data was obtained from 189 women who gave birth during a particular period
at the Baystate Medical Center in Springfield, Massachusetts, in order to iden-
tify factors associated with low birth weight. The accompanying Minitab output
resulted from a logistic regression in which the dependent variable indicated
whether (1) or not (0) a child had low birth weight (<2500 g), and predictors were
weight of the mother at her last menstrual period, age of the mother, and an indi-
cator variable for whether (1) or not (0) the mother had smoked during pregnancy.
Logistic Regression Table
Odds 95% CI
Predictor Coef SE Coef z p Ratio Lower Upper
Constant 2.06239 1.09516 1.88 0.060
Wt -0.01701 0.00686 -2.48 0.013 0.98 0.97 1.00
Age -0.04478 0.03391 -1.32 0.187 0.96 0.89 1.02
Smoke 0.65480 0.33297 1.97 0.049 1.92 1.00 3.70
It appears that age is not an important predictor of low birth weight, provided that the
two other predictors are retained. The other two predictors do appear to be informa-
tive. The point estimate of the odds ratio associated with smoking status is 1.92 (ratio
of the odds of low birth weight for a smoker to the odds for a nonsmoker); at the 95%
confidence level, the odds of a low-birth-weight child could be as much as 3.7 times
higher for a smoker than for a nonsmoker.
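The reported odds ratio and confidence interval for smoking status can be recovered
directly from the coefficient and standard error in the output (a large-sample z interval
on the log-odds scale, then exponentiated):

    b <- 0.65480; se <- 0.33297
    exp(b)                         # point estimate of the odds ratio: 1.92
    exp(b + c(-1, 1) * 1.96 * se)  # 95% CI: approximately (1.00, 3.70)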

Please see one of the chapter references for more information on logistic regres-
sion, including methods for assessing model effectiveness and adequacy.
We have reached the end of our exposition, but hopefully this is not the end of your
statistical education. Our hope is that you have enjoyed the journey through statistics
thus far and that you will find many opportunities to apply the concepts and methods
in the near future. Enjoy!!


Section 11.6 Exercises


45. Reconsider the data on x = inverse thickness and y = flux from Exercise 9 of
    Section 11.1. The values of the standardized residuals from a simple linear
    regression analysis and the corresponding normal quantiles follow:

    x:                      19.8   20.6   23.5   26.1
    Standardized residual: -1.20   1.64    .94   -.48
    z quantile:             -.85   1.43    .85   -.47

    x:                      30.3   43.5   45.0   46.5
    Standardized residual: -1.41    .70   -.06   -.03
    z quantile:            -1.43    .47   -.15    .15

    a. Does it appear plausible that the random deviations in the simple linear
       regression model equation are normally distributed?
    b. Construct a plot of the standardized residuals versus x and comment.

46. Exercise 41 of Section 11.5 gave data on y = shear strength of a soil specimen,
    x1 = depth, and x2 = water content. The data is presented again, along with the
    standardized residuals and corresponding normal scores obtained from fitting the
    complete second-order model.

    Obs  Shstren  Depth  Watcont    Stresid     NQuant
      1     14.7    8.9     31.5   -1.50075   -1.20448
      2     48.0   36.6     27.0     .53889     .89743
      3     25.6   36.8     25.9    -.52893    -.65862
      4     10.0    6.1     39.1    -.17350    -.26585
      5     16.0    6.9     39.2     .33350     .45321
      6     16.8    6.9     38.3     .04076    -.08767
      7     20.7    7.3     33.9    -.41791    -.45321
      8     38.8    8.4     33.8    2.16543    1.70991
      9     16.9    6.5     27.9     .22720     .26585
     10     27.0    8.0     33.1     .43788     .65862
     11     16.0    4.5     26.3     .19601     .08767
     12     24.9    9.9     37.8    -.90858    -.89743
     13      7.3    2.9     34.6   -1.53399   -1.70991
     14     12.8    2.0     36.4    1.02146    1.20448

    a. Construct a normal quantile plot of the standardized residuals to see whether
       it is plausible that the random deviations in the fitted model come from a
       normal distribution.
    b. Plot the standardized residuals against depth and against water content, and
       comment on the plots.

47. The accompanying table shows the smallest value of SSResid for each number of
    predictors k (k = 1, 2, 3, 4) for a regression problem in which y = cumulative
    heat of hardening in cement, x1 = % tricalcium aluminate, x2 = % tricalcium
    silicate, x3 = % aluminum ferrate, and x4 = % dicalcium silicate:

    Number of predictors k   Predictors        SSResid
    1                        x4                 880.85
    2                        x1, x2              58.01
    3                        x1, x2, x3          49.20
    4                        x1, x2, x3, x4      47.86

    In addition, n = 13 and SSTo = 2715.76.
    a. Use the criteria discussed in the text to recommend the use of a particular model.
    b. Would the forward selection method of model selection have considered the
       best two-predictor model? Explain your reasoning.

48. A study carried out to investigate the relationship between a response variable
    relating to pressure drops in a screen-plate bubble column and the predictors
    x1 = superficial fluid velocity, x2 = liquid viscosity, and x3 = opening mesh
    size resulted in the accompanying data (top of page 569; "A Correlation of
    Two-Phase Pressure Drops in Screen-Plate Bubble Column," Canad. J. of Chem.
    Engr., 1993: 460-463). The standardized residuals and hii values resulted from
    the model that included the three independent variables as predictors. Are there
    any unusual observations?


Data for Exercise 48

                                                Standardized
Obs  Velocity  Viscosity  Mesh size  Response     residual        hii
  1    2.14      10.00       .34       28.9       -.01721     .202242
  2    4.14      10.00       .34       26.1       1.34706     .066929
  3    8.15      10.00       .34       22.8        .96537     .274393
  4    2.14       2.63       .34       24.2       1.29177     .224518
  5    4.14       2.63       .34       15.7       -.68311     .079651
  6    8.15       2.63       .34       18.3        .23785     .267959
  7    5.60       1.25       .34       18.1        .06456     .076001
  8    4.30       2.63       .34       19.1        .13131     .074927
  9    4.30       2.63       .34       15.4       -.74091     .074927
 10    5.60      10.10       .25       12.0      -1.38857     .152317
 11    5.60      10.10       .34       19.8       -.03585     .068468
 12    4.30      10.10       .34       18.6       -.40699     .062849
 13    2.40      10.10       .34       13.2      -1.92274     .175421
 14    5.60      10.00       .55       22.8      -1.07990     .712933
 15    2.14     112.00       .34       41.8      -1.19311     .516298
 16    4.14     112.00       .34       48.6       1.21302     .513214
 17    5.60      10.10       .25       19.2        .38451     .152317
 18    5.60      10.10       .25       18.4        .18750     .152317
 19    5.60      10.10       .25       15.0       -.64979     .152317

49. The article "Anatomical Factors Influencing Wood Specific Gravity of Slash Pines
    and the Implications for the Development of a High-Quality Pulpwood" (TAPPI,
    1964: 401-404) reported the results of an experiment in which 20 specimens of
    slash pine wood were analyzed. A primary objective was to relate wood specific
    gravity (y) to various other wood characteristics. Consider the accompanying
    data (top of page 570) on y and the predictors x1 = number of fibers/mm2 in
    springwood, x2 = number of fibers/mm2 in summerwood, x3 = springwood %, x4 = %
    springwood light absorption, and x5 = % summerwood light absorption. Based on
    the accompanying Minitab output, which model(s) would you recommend
    investigating in more detail?

    (In the output, each X marks a predictor included in the model; the five
    predictor columns are, in order, sprngfib (x1), sumrfib (x2), %sprwood (x3),
    spltabs (x4), and sumltabs (x5).)

    Vars   R-Sq   R-Sq(adj)    C-p          S
    1      56.4        53.9   10.6   0.021832   X
    1      10.6         5.7   38.5   0.031245   X
    1       5.3         0.1   41.7   0.032155   X
    2      65.5        61.4    7.0   0.019975   X X
    2      62.1        57.6    9.1   0.020950   X X
    2      60.3        55.6   10.2   0.021439   X X
    3      72.3        67.1    4.9   0.018461   X X X
    3      71.2        65.8    5.6   0.018807   X X X
    3      71.1        65.7    5.6   0.018846   X X X
    4      77.0        70.9    4.0   0.017353   X X X X
    4      74.8        68.1    5.4   0.018179   X X X X
    4      72.7        65.4    6.7   0.018919   X X X X
    5      77.0        68.9    6.0   0.017953   X X X X X


Data for Exercise 49


                                                % Springwood   % Summerwood
       Springwood   Summerwood          %           light          light      Specific
Obs      fibers       fibers      Springwood     absorption     absorption     gravity
1 573 1059 46.5 53.8 84.1 .534
2 651 1356 52.7 54.5 88.7 .535
3 606 1273 49.4 52.1 92.0 .570
4 630 1151 48.9 50.3 87.9 .528
5 547 1135 53.1 51.9 91.5 .548
6 557 1236 54.9 55.2 91.4 .555
7 489 1231 56.2 45.5 82.4 .481
8 685 1564 56.6 44.3 91.3 .516
9 536 1182 59.2 46.4 85.4 .475
10 685 1564 63.1 56.4 91.4 .486
11 664 1588 50.6 48.1 86.7 .554
12 703 1335 51.9 48.4 81.2 .519
13 653 1395 62.5 51.9 89.2 .492
14 586 1114 50.5 56.5 88.9 .517
15 534 1143 52.1 57.0 88.9 .502
16 523 1320 50.5 61.2 91.9 .508
17 580 1249 54.6 60.8 95.4 .520
18 448 1028 52.2 53.4 91.8 .506
19 476 1057 42.9 53.2 92.9 .595
20 528 1057 42.4 56.6 90.0 .568
50. The accompanying Minitab output resulted from applying both the backward
    elimination method and the forward selection method to the wood specific
    gravity data given in Exercise 49. Explain for each method what occurred at
    every iteration of the algorithm.

    Backward elimination:
    Response is spgrav on 5 predictors, with N = 20
    Step            1         2         3         4
    Constant   0.4421    0.4384    0.4381    0.5179
    sprngfib  0.00011   0.00011   0.00012
    T-Value      1.17      1.95      1.98
    sumrfib   0.00001
    T-Value      0.12
    %sprwood -0.00531  -0.00526  -0.00498  -0.00438
    T-Value     -5.70     -6.56     -5.96     -5.20
    spltabs   -0.0018   -0.0019
    T-Value     -1.63     -1.76
    sumltabs   0.0044    0.0044    0.0031    0.0027
    T-Value      3.01      3.31      2.63      2.12
    S          0.0180    0.0174    0.0185    0.0200
    R-Sq        77.05     77.03     72.27     65.50

    Forward selection:
    Response is spgrav on 5 predictors, with N = 20
    Step            1         2
    Constant   0.7585    0.5179
    %sprwood -0.00444  -0.00438
    T-Value     -4.82     -5.20
    sumltabs             0.0027
    T-Value                2.12
    S          0.0218    0.0200
    R-Sq        56.36     65.50

51. The article "The Analysis and Selection of Variables in Linear Regression"
    (Biometrics, 1976: 1-49) considered a data set of 32 observations on the
    following variables: y = fuel efficiency, x1 = engine type (straight or V),
    x2 = number of cylinders, x3 = transmission type (manual or automatic),
    x4 = number of transmission speeds, x5 = engine size, x6 = horsepower,
    x7 = number of carburetor barrels, x8 = final drive ratio, x9 = weight, and
    x10 = quarter-mile time. Use the summary information (top of page 571) on the
    best model of each given size to select a model, and explain the rationale
    for your choice.


Number of     Variables                        Adjusted
predictors    included                     R2        R2
 1            9                          .756      .748
 2            2, 9                       .833      .821
 3            3, 9, 10                   .852      .836
 4            3, 6, 9, 10                .860      .839
 5            3, 5, 6, 9, 10             .866      .840
 6            3, 5, 6, 8, 9, 10          .869      .837
 7            3, 4, 5, 6, 8, 9, 10       .870      .832
 8            3, 4, 5, 6, 7, 8, 9, 10    .871      .826
 9            1, 3, 4, 5, 6, 7, 8, 9, 10 .871      .818
10            All independent variables  .871      .809

52. Refer to the wood specific gravity data presented in Exercise 49. The following
    R2 values resulted from regressing each predictor on the other four predictors
    (in the first regression, the dependent variable was x1 and the predictors were
    x2-x5, etc.): .628, .711, .341, .403, and .403. Does multicollinearity appear
    to be a substantial problem? Explain.

53. The article "Response Surface Methodology for Protein Extraction Optimization
    of Red Pepper Seed" (Food Sci. and Tech., 2010: 226-231) gave data on the
    response variable y = protein yield (%) and the independent variables
    x1 = temperature (°C), x2 = pH, x3 = extraction time (min), and
    x4 = solvent/meal ratio.
    a. Fitting the model with the four xi's as predictors yielded the following
       output:

       Predictor     Coef  SE Coef      T      P
       Constant    -4.586    2.542  -1.80  0.084
       x1         0.01317  0.02707   0.49  0.631
       x2          1.6350   0.2707   6.04  0.000
       x3         0.02883  0.01353   2.13  0.044
       x4         0.05400  0.02707   1.99  0.058

       Source       DF       SS      MS      F      P
       Regression    4  19.8882  4.9721  11.31  0.000
       Res. Error   24  10.5513  0.4396
       Total        28  30.4395

       Calculate and interpret the values of R2 and adjusted R2. Does the model
       appear to be useful?
    b. Fitting the complete second-order model gave the following results:

       Predictor        Coef    SE Coef      T      P
       Constant      -119.49      18.53  -6.45  0.000
       x1            -0.1047     0.2839  -0.37  0.718
       x2             28.678      3.625   7.91  0.000
       x3             0.4074     0.1303   3.13  0.007
       x4             0.2711     0.2606   1.04  0.316
       x1sqd       -0.000752   0.002110  -0.36  0.727
       x2sqd         -1.6452     0.2110  -7.80  0.000
       x3sqd       0.0002121  0.0005275   0.40  0.694
       x4sqd       -0.015152   0.002110  -7.18  0.000
       x1x2          0.02150    0.02687   0.80  0.437
       x1x3         0.000550   0.001344   0.41  0.688
       x1x4        -0.000800   0.002687  -0.30  0.770
       x2x3         -0.05900    0.01344  -4.39  0.001
       x2x4          0.03900    0.02687   1.45  0.169
       x3x4         0.002725   0.001344   2.03  0.062

       S = 0.268703  R-Sq = 96.7%  R-Sq(adj) = 93.4%

       Analysis of Variance
       Source       DF       SS      MS      F      P
       Regression   14  29.4287  2.1020  29.11  0.000
       Res. Error   14   1.0108  0.0722
       Total        28  30.4395

       Does at least one of the second-order predictors appear to be useful?
       Carry out an appropriate test of hypotheses.
    c. From the output in part (b), we conjecture that none of the predictors
       involving x1 are providing useful information. When these predictors were
       eliminated, the value of SSResid for the reduced regression model is 1.1887.
       Does this support the conjecture?
    d. Here is output from Minitab's best subsets option, with just the single
       best subset of each size identified. Which model(s) would you consider
       using (subject to checking model adequacy)?


                          Mallows
Vars  R-Sq  R-Sq(adj)        Cp         S
(the 14 X columns below indicate, in order, inclusion of x1, x2, x3, x4, x1sqd,
x2sqd, x3sqd, x4sqd, x1x2, x1x3, x1x4, x2x3, x2x4, x3x4)
1 52.7 50.9 174.4 0.73030 X
2 67.9 65.4 112.5 0.61349 X X
3 77.5 75.0 73.1 0.52124 X X X
4 83.4 80.7 50.8 0.45835 X X X X
5 90.9 88.9 21.4 0.34731 X X X X X
6 94.6 93.1 7.9 0.27422 X X X X X X
7 95.8 94.4 4.7 0.24683 X X X X X X X
8 96.2 94.6 5.1 0.24137 X X X X X X X X
9 96.4 94.7 6.1 0.23962 X X X X X X X X X
10 96.6 94.6 7.5 0.24132 X X X X X X X X X X
11 96.6 94.4 9.4 0.24716 X X X X X X X X X X X
12 96.6 94.1 11.2 0.25328 X X X X X X X X X X X X
13 96.7 93.8 13.1 0.26041 X X X X X X X X X X X X X
14 96.7 93.4 15.0 0.26870 X X X X X X X X X X X X X X

54. It seems reasonable that the size of a cancerous tumor should be related to the
    likelihood that the cancer will spread (metastasize) to another site. The
    article "Molecular Detection of p16 Promoter Methylation in the Serum of
    Patients with Esophageal Squamous Cell Carcinoma" (Cancer Res., 2001:
    3135-3138) investigated the spread of esophageal cancer to the lymph nodes.
    With x = size of a tumor (cm) and y = 1 if the cancer does spread, consider the
    logistic regression model with a = -2 and b = .5 (values suggested by data in
    the article).
    a. Tabulate values of x, π(x), the odds π(x)/[1 - π(x)], and the log odds for
       x = 0, 1, 2, . . . , 10.
    b. Explain what happens to the odds when x is increased by 1. Your explanation
       should involve the .5 that appears in the formula for π(x).
    c. For what value of x are the odds 1? 5? 10?

55. Kyphosis refers to severe forward flexion of the spine following corrective
    spinal surgery. A study carried out to determine risk factors for kyphosis
    reported the accompanying ages (months) for 40 subjects at the time of the
    operation; the first 18 subjects did have kyphosis and the remaining 22 did not.

    Kyphosis:      12  15  42  52  59  73
                   82  91  96 105 114 120
                  121 128 130 139 139 157

    No kyphosis:    1   1   2   8  11  18
                   22  31  37  61  72  81
                   97 112 118 127 131 140
                  151 159 177 206

    Use the Minitab logistic regression output below to decide whether age appears
    to have a significant impact on the presence of kyphosis.

    Logistic Regression Table for Exercise 55
                                                  Odds     95% CI
    Predictor      Coef     StDev      Z      P  Ratio  Lower  Upper
    Constant    -0.5727    0.6024  -0.95  0.342
    age        0.004296  0.005849   0.73  0.463   1.00   0.99   1.02

56. The following data resulted from a study commissioned by a large management
    consulting company to investigate the relationship between amount of job
    experience (months) for a junior consultant and the likelihood of the
    consultant being able to perform a certain complex task.

    Success:  8 13 14 18 20 21 21 22 25 26 28 29 30 32
    Failure:  4  5  6  6  7  9 10 11 11 13 15 18 19 20 23 27

    Interpret the Minitab logistic regression output below, and sketch a graph of
    the estimated probability of task performance as a function of experience.

    Logistic Regression Table for Exercise 56
                                                Odds     95% CI
    Predictor     Coef    StDev      Z      P  Ratio  Lower  Upper
    Constant    -3.211    1.235  -2.60  0.009
    age        0.17772  0.06573   2.70  0.007   1.19   1.05   1.36

57. Pillar stability is a most important factor to ensure safe conditions in
    underground mines. The authors of "Developing Coal Pillar Stability Chart
    Using Logistic Regression" (Intl. J. of Rock Mechanics & Mining Sci., 2013:
    55-60) used a logistic regression model to predict pillar stability. The
    article reported the following data on x1 = pillar height to width ratio,
    x2 = pillar strength to stress ratio, and pillar stability for 29 coal pillars.

    ID    x1    x2  Stable?  |  ID    x1    x2  Stable?
     1  1.80  2.40  Y        |  16  0.80  1.37  N
     2  1.65  2.54  Y        |  17  0.60  1.27  N
     3  2.70  0.84  Y        |  18  1.30  0.87  N
     4  3.67  1.68  Y        |  19  0.83  0.97  N
     5  1.41  2.41  Y        |  20  0.57  0.94  N
     6  1.76  1.93  Y        |  21  1.44  1.00  N
     7  2.10  1.77  Y        |  22  2.08  0.78  N
     8  2.10  1.50  Y        |  23  1.50  1.03  N
     9  4.57  2.43  Y        |  24  1.38  0.82  N
    10  3.59  5.55  Y        |  25  0.94  1.30  N
    11  8.33  2.58  Y        |  26  1.58  0.83  N
    12  2.86  2.00  Y        |  27  1.67  1.05  N
    13  2.58  3.68  Y        |  28  3.00  1.19  N
    14  2.90  1.13  Y        |  29  2.21  0.86  N
    15  3.89  2.49  Y        |

    The corresponding logistic regression output from R is given here:

    Coefficients:
                              Std.
                 Estimate    Error  z value  Pr(>|z|)
    (Intercept)   -13.146    5.184   -2.536    0.0112
    x1              2.774    1.477    1.878    0.0604
    x2              5.668    2.642    2.145    0.0319

    a. Use the output to determine whether the two predictor variables appear to
       have a significant impact on pillar stability. Use α = .1.
    b. Provide interpretations for e^2.774 and e^5.668.
    c. Determine an estimate (as the authors did) for the probability of pillar
       stability for each of the 29 pillars using the parameter estimates given in
       the output. Then label each pillar as "stable" if the estimated probability
       is at least .75 and "unstable" otherwise. How many of the pillars that were
       actually stable were correctly designated as "stable"? How many unstable
       pillars were correctly designated as "unstable"?

Supplementary Exercises
58. Suppose data was collected on y = bulk density (kg/m3) and x = moisture content
    (%) for a sample of six seeds of a particular type resulting in the
    accompanying scatterplot. Here is the Minitab output from a request to fit a
    simple linear regression model of y on x:

    The regression equation is
    density = 545 - 5.46 moisture

    Predictor     Coef  SE Coef      T      p
    Constant    545.23    28.19  19.34  0.000
    moisture    -5.463    1.786  -3.06  0.038

    [Scatterplot of density vs moisture: density (400 to 500 kg/m3) plotted
    against moisture (5.0 to 22.5%)]

    Noticing the relatively small P-value for the moisture predictor, a fellow
    student concludes that, based on the model utility test, there is a useful
    linear relationship between the two variables. Comment on the validity of this
    conclusion. How useful is this Minitab output (keeping in mind the scatterplot
    of the data)?


59. The accompanying data was read from a scatterplot in the article "Urban
    Emissions Measured with Aircraft" (J. of the Air and Waste Mgmt. Assoc., 1998:
    16-25). The response variable is ΔNOy and the explanatory variable is ΔCO.

    ΔCO:    50   60   95  108  135
    ΔNOy:  2.3  4.5  4.0  3.7  8.2
    ΔCO:   210  214  315  720
    ΔNOy:  5.4  7.2 13.8 32.1

    a. Fit an appropriate model to the data and judge the utility of the model.
    b. Predict the value of ΔNOy that would result from making one more observation
       when ΔCO is 400, and do so in a way that conveys information about precision
       and reliability. Does it appear that ΔNOy can be accurately predicted?
       Explain.
    c. The largest value of ΔCO is much greater than the other values. Does this
       observation appear to have had a substantial impact on the fitted equation?

60. Astringency is the quality in a wine that makes a wine drinker's mouth feel
    slightly rough, dry, and puckery. The paper "Analysis of Tannins in Red Wine
    Using Multiple Methods: Correlation with Perceived Astringency" (Amer. J.
    Enol. Vitic., 2006: 481-485) reported on an investigation to assess the
    relationship between perceived astringency and tannin concentration using
    various analytic methods. Here is data provided by the authors on x = tannin
    concentration by protein precipitation and y = perceived astringency as
    determined by a panel of tasters.

    x:  0.718  0.808  0.924  1.000
    y:  0.428  0.480  0.493  0.978
    x:  0.667  0.529  0.514  0.559
    y:  0.318  0.298 -0.224  0.198
    x:  0.766  0.470  0.726  0.762
    y:  0.326 -0.336  0.765  0.190
    x:  0.666  0.562  0.378  0.779
    y:  0.066 -0.221 -0.898  0.836
    x:  0.674  0.858  0.406  0.927
    y:  0.126  0.305 -0.577  0.779
    x:  0.311  0.319  0.518  0.687
    y: -0.707 -0.610 -0.648 -0.145
    x:  0.907  0.638  0.234  0.781
    y:  1.007 -0.090 -1.132  0.538
    x:  0.326  0.433  0.319  0.238
    y: -1.098 -0.581 -0.862 -0.551

    Relevant summary quantities are as follows:

    Σxi = 19.404,  Σyi = -.549,  Σxi² = 13.248032,
    Σyi² = 11.835795,  Σxiyi = 3.497811
    Sxx = 13.248032 - (19.404)²/32 = 1.48193150,  Syy = 11.82637622
    Sxy = 3.497811 - (19.404)(-.549)/32 = 3.83071088

    a. Fit the simple linear regression model to this data. Then determine the
       proportion of observed variation in astringency that can be attributed to
       the model relationship between astringency and tannin concentration.
    b. Calculate and interpret a confidence interval for the slope of the true
       regression line.
    c. Estimate true average astringency when tannin concentration is .6, and do
       so in a way that conveys information about reliability and precision.
    d. Predict astringency for a single wine sample whose tannin concentration is
       .6, and do so in a way that conveys information about reliability and
       precision.

61. In a discussion of the article "Tensile Behavior of Slurry Infiltrated Mat
    Concrete (SIMCON)" (ACI Materials J., 1998: 77-79), the discussant presented
    data on y = toughness (psi) and x = aspect ratio. He stated that "a (simple
    linear) regression analysis clearly shows that the aspect ratio is not a
    reliable variable that can be used to predict toughness." The following
    observations were read from a graph in the article:

    x: 500 500 500 500 500 715 715 715 715 715
    y:  33  34  35  38  40  35  36  37  39  44

    a. Why is the relationship between these two variables clearly not
       deterministic?
    b. Fit the simple linear regression model, and state whether you agree with
       the discussant's assessment.
    c. Even if the y values had been much closer together, so that the model could
       be judged useful, would there be any way to check model adequacy to decide
       whether a quadratic regression model would be more appropriate? Explain
       your reasoning.

62. The accompanying data on y = energy output (W) and x = temperature difference
    (K) was provided by the authors of the article "Comparison of Energy
    and Exergy Efficiency for Solar Box and Parabolic Cookers" (J. of Energy
    Engr., 2007: 53-62).

    x: 23.20 23.50 23.52 24.30 25.10 26.20
    y:  3.78  4.12  4.24  5.35  5.87  6.02
    x: 27.40 28.10 29.30 30.60 31.50 32.01
    y:  6.12  6.41  6.62  6.43  6.13  5.92
    x: 32.63 33.23 33.62 34.18 35.43 35.62
    y:  5.64  5.45  5.21  4.98  4.65  4.50
    x: 36.16 36.23 36.89 37.90 39.10 41.66
    y:  4.34  4.03  3.92  3.65  3.02  2.89

    The article's authors fit a cubic regression model to the data. Here is
    Minitab output from such a fit.

    The regression equation is
    y = -134 + 12.7 x - 0.377 x**2 + 0.00359 x**3

    Predictor        Coef    SE Coef       T      P
    Constant     -133.787      8.048  -16.62  0.000
    x             12.7423     0.7750   16.44  0.000
    x**2         -0.37652    0.02444  -15.41  0.000
    x**3        0.0035861  0.0002529   14.18  0.000

    s = 0.168354  R-Sq = 98.0%  R-Sq(adj) = 97.7%

    Analysis of Variance
    Source       DF       SS      MS       F      P
    Regression    3  27.9744  9.3248  329.00  0.000
    Res. Error   20   0.5669  0.0283
    Total        23  28.5413

    a. What proportion of observed variation in energy output can be attributed to
       the model relationship?
    b. Fitting a quadratic model to the data results in R2 = .780. Calculate
       adjusted R2 for this model and compare to adjusted R2 for the cubic model.
    c. Does the cubic predictor appear to provide useful information about y over
       and above that provided by the linear and quadratic predictors? State and
       test the appropriate hypotheses.
    d. When x = 30, s_ŷ = .0611. Calculate a 95% CI for true average energy output
       in this case, and also a 95% PI for a single energy output to be observed
       when temperature difference is 30.

63. Secondary settling tanks play an important role in the performance of
    suspended-growth activated-sludge processes. The article "Sludge Volume Index
    Settleability Measures" (Water Environ. Research, 1998: 87-93) included a
    scatterplot of y = final settled height fraction versus x = initial solids
    concentration (g/L), from which the following data was read:

    x:  .5  .9 1.1 1.7 2.0 2.2 2.7 3.0 3.3  4.2
    y: .06 .08 .10 .13 .15 .16 .18 .17 .15  .27
    x: 4.5 5.3 5.8 5.9 6.2 6.8 7.2 9.1 9.4 10.4
    y: .30 .25 .31 .32 .48 .43 .32 .40 .61  .57

    Summary quantities include n = 20, Σxi = 92.2, Σxi² = 591.46, Σyi = 5.44,
    Σyi² = 1.9674, Σxiyi = 33.577.

    a. The article included the statement "the linear correlation coefficient,
       r2 = .89." Is this entire statement correct? If not, why, and what part is
       correct?
    b. Carry out a test of appropriate hypotheses to see whether there is in fact
       a linear relationship between the two variables.
    c. The standardized residuals from fitting the simple linear regression model
       are (in increasing order of x values) -.04, -.05, .14, .13, .22, .21, .11,
       -.37, -1.04, .36, .63, -1.08, -.43, -.34, 2.40, .88, -1.62, -2.04, 1.90,
       and .05. Does a plot of the standardized residuals versus x show a
       disturbing pattern? Explain.

64. The use of microorganisms to dissolve metals from ores has offered an
    ecologically friendly and less expensive alternative to traditional methods.
    The dissolution of metals by this method can be done in a two-stage
    bioleaching process: (1) microorganisms are grown in culture to produce
    metabolites (e.g. organic acids) and (2) ore is added to the culture medium to
    initiate leaching. The article "Two-Stage Fungal Leaching of Vanadium from
    Uranium Ore Residue of the Leaching Stage using Statistical Experimental
    Design" (Annals of Nuclear Energy, 2013: 48-52) reported on a two-stage
    bioleaching process of vanadium by using the fungus Aspergillus niger. In one
    study, the authors examined the impact of the variables x1 = pH, x2 = sucrose
    concentration (g/L), and x3 = spore population (10^6 cells/ml) on y = oxalic acid
    production (mg/L). The accompanying SAS output resulted from a request to fit
    the model with predictors x1, x2, and x3 only.

    Source    DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model      3          5861301       1953767      7.53   0.0052
    Error     11          2855951        259632
    C. Total  14          8717252

    Fitting the complete second-order model resulted in SSResid = 541,632. Carry
    out a test at significance level .01 to decide whether at least one of the
    second-order predictors provides useful information about oxalic acid
    production.

65. The article cited in Exercise 64 also examined the effect of x1 = pH,
    x2 = sucrose concentration (g/L), and x3 = spore population (10^6 cells/ml) on
    y = gluconic acid production (mg/L). The accompanying SAS output resulted from
    a request to fit the model with predictors x1, x2, and x3 only.

    Source    DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model      3         74027925      24675975    178.18   <.0001
    Error     11          1523351        138486
    C. Total  14         75551276

    Fitting the complete second-order model resulted in SSResid = 805,534. Carry
    out a test at significance level .01 to decide whether at least one of the
    second-order predictors provides useful information about gluconic acid
    production.

66. The accompanying data was taken from the article "Applying Stepwise Multiple
    Regression Analysis to the Reaction of Formaldehyde with Cotton Cellulose"
    (Textile Research J., 1984: 157-165). The dependent variable is durable press
    rating, a quantitative measure of wrinkle resistance, and the four independent
    variables are formaldehyde concentration, catalyst ratio, curing temperature,
    and curing time, respectively.
    a. Fitting the model with the four independent variables as predictors
       resulted in the following Minitab output. Does the fitted model appear to
       be useful?

       The regression equation is
       durpr = -0.912 + 0.161 formconc + 0.220 catratio + 0.0112 temp + 0.102 time

       Predictor      Coef     StDev      T      p
       Constant    -0.9122    0.8755  -1.04  0.307
       formconc    0.16073   0.06617   2.43  0.023
       catratio    0.21978   0.03406   6.45  0.000
       temp       0.011226  0.004973   2.26  0.033
       time        0.10197   0.05874   1.74  0.095

       S = 0.8365  R-Sq = 69.2%  R-Sq(adj) = 64.3%

       Analysis of Variance
       Source       DF       SS      MS      F      P
       Regression    4  39.3769  9.8442  14.07  0.000
       Error        25  17.4951  0.6998
       Total        29  56.8720

    b. Estimate, in a way that conveys information about precision and
       reliability, the average change in durable press rating associated with a
       1-degree increase in curing temperature when formaldehyde concentration,
       catalyst ratio, and curing time all remain fixed.
    c. Given that catalyst ratio, curing temperature, and curing time all remain
       in the model, do you think that formaldehyde concentration provides useful
       information about durable press rating?

Data for Exercise 66


Formaldehyde Catalyst Curing Curing Durable
Obs concentration ratio temperature time press rating
1 8 4 100 1 1.4
2 2 4 180 7 2.2
3 7 4 180 1 4.6
4 10 7 120 5 4.9
5 7 4 180 5 4.6
6 7 7 180 1 4.7
7 7 13 140 1 4.6
8 5 4 160 7 4.5
9 4 7 140 3 4.8
10 5 1 100 7 1.4
11 8 10 140 3 4.7
12 2 4 100 3 1.6
13 4 10 180 3 4.5
14 6 7 120 7 4.7
15 10 13 180 3 4.8
16 4 10 160 5 4.6
17 4 13 100 7 4.3
18 10 10 120 7 4.9
19 5 4 100 1 1.7
20 8 13 140 1 4.6
21 10 1 180 1 2.6
22 2 13 140 1 3.1
23 6 13 180 7 4.7
24 7 1 120 7 2.5
25 5 13 140 1 4.5
26 8 1 160 7 2.1
27 4 1 180 7 1.8
28 6 1 160 1 1.5
29 4 1 100 1 1.3
30 7 10 100 7 4.6

    d. Now consider models based not only on these four independent variables but
       also on second-order predictors (four xi² predictors and six xixj
       predictors). Use a statistical computer package to identify a good model
       based on this candidate pool of predictors.

67. A study was carried out to investigate the relationship between brightness of
    finished paper (y) and the variables percentage of H2O2 by weight, percentage
    of NaOH by weight, percentage of silicate by weight, and process temperature
    ("Advantages of CEHDP Bleaching for High Brightness Kraft Pulp Production,"
    TAPPI, 1964: 170A-173A). Each independent variable was allowed to assume five
    different values, and these values were coded for regression analysis as
    follows:

    Coded value:    -2    -1     0     1     2
    H2O2            .1    .2    .3    .4    .5
    NaOH            .1    .2    .3    .4    .5
    Silicate        .5   1.5   2.5   3.5   4.5
    Temperature    130   145   160   175   190


    The data follow:

    Obs  H2O2  NaOH  Silicate  Temperature  Brightness
      1    -1    -1       -1        -1        83.9
      2     1    -1       -1        -1        84.9
      3    -1     1       -1        -1        83.4
      4     1     1       -1        -1        84.2
      5    -1    -1        1        -1        83.8
      6     1    -1        1        -1        84.7
      7    -1     1        1        -1        84.0
      8     1     1        1        -1        84.8
      9    -1    -1       -1         1        84.5
     10     1    -1       -1         1        86.0
     11    -1     1       -1         1        82.6
     12     1     1       -1         1        85.1
     13    -1    -1        1         1        84.5
     14     1    -1        1         1        86.0
     15    -1     1        1         1        84.0
     16     1     1        1         1        85.4
     17    -2     0        0         0        82.9
     18     2     0        0         0        85.5
     19     0    -2        0         0        85.2
     20     0     2        0         0        84.5
     21     0     0       -2         0        84.7
     22     0     0        2         0        85.0
     23     0     0        0        -2        84.9
     24     0     0        0         2        84.0
     25     0     0        0         0        84.5
     26     0     0        0         0        84.7
     27     0     0        0         0        84.6
     28     0     0        0         0        84.9
     29     0     0        0         0        84.9
     30     0     0        0         0        84.5
     31     0     0        0         0        84.6

    a. When the complete second-order coded model was fit, the estimate of the
       constant term was 84.67; the estimated coefficients of the linear
       predictors were .650, -.258, .133, and .108, respectively; the estimated
       quadratic coefficients were -.135, .028, .028, and -.072, respectively;
       and the estimated coefficients of the interaction predictors were .038,
       -.075, .213, .200, -.188, and .050, respectively. Calculate a point
       prediction of brightness when H2O2 is .4%, NaOH is .4%, silicate is 3.5%,
       and temperature is 175. What are the values of the residuals for the
       observations made with these values of the independent variables?
    b. Express the estimated regression in uncoded form.
    c. SSTo = 17.2567 and R2 for the model of part (a) is .885. When a model that
       includes only the four independent variables as predictors is fit,
       R2 = .721. Carry out a test at level .05 to decide whether at least one of
       the second-order predictors provides useful information about brightness.

68. Three sets of journal bearing tests were run on a Mil-L-8937-type film at each
    combination of three loads (psi) and three speeds (rpm). The wear life (hr)
    was recorded for each run, resulting in the following data ("Accelerated
    Testing of Solid Film Lubricants," Lubrication Engr., 1972: 365-372):

           Load                       Load
    Speed  (1000s)   Life      Speed  (1000s)   Life
     20       3     300.2       60       6      65.9
     20       3     310.8       60      10      10.7
     20       3     333.0       60      10      34.1
     20       6      99.6       60      10      39.1
     20       6     136.2      100       3      26.5
     20       6     142.4      100       3      22.3
     20      10      20.2      100       3      34.8
     20      10      28.2      100       6      32.8
     20      10     102.7      100       6      25.6
     60       3      67.3      100       6      32.7
     60       3      77.9      100      10       2.3
     60       3      93.9      100      10       4.4
     60       6      43.0      100      10       5.8
     60       6      44.5

    a. With w = wear life, s = speed, and l = load (in 1000s), fit the model with
       dependent variable w and predictors s and l, and assess the utility of the
       fitted model.
    b. The cited article contains the comment that a lognormal distribution is
       appropriate for wear life, since ln(w) is known to follow a normal law.
       The suggested model is w = [γ/(s^α l^β)]ε, where ε denotes a random
       deviation and α, β, and γ are parameters. Estimate the model parameters,
       and obtain a prediction interval for wear life when speed is 60 rpm and
       load is 6000 psi. (Hint: Transform the model equation so it has the
       appearance of the general additive multiple regression model equation.)
average the same amount of fat.
69. Normal hatchery processes in aquaculture inevitably produce stress in fish,
    which may negatively impact growth, reproduction, flesh quality, and
    susceptibility to disease. Such stress manifests itself in elevated and
    sustained corticosteroid levels. The article "Evaluation of Simple Instruments
    for the Measurement of Blood Glucose and Lactate, and Plasma Protein as Stress
    Indicators in Fish" (J. of the World Aquaculture Society, 1999: 276-284)
    described an experiment in which fish were subjected to a stress protocol and
    then removed and tested at various times after the protocol had been applied.
    The accompanying data on x = time (min) and y = blood glucose level (mmol/L)
    was read from a plot:

    x:   2   2   5   7  12  13  17  18  23  24  26  28
    y: 4.0 3.6 3.7 4.0 3.8 4.0 5.1 3.9 4.4 4.3 4.3 4.4
    x:  29  30  34  36  40  41  44  56  56  57  60  60
    y: 5.8 4.3 5.5 5.6 5.1 5.7 6.1 5.1 5.9 6.8 4.9 5.7

    Use the methods developed in this chapter to analyze the data, and write a
    brief report summarizing your conclusions (assume that the investigators are
    particularly interested in glucose level 30 min after stress).

70. The article "Evaluating the BOD POD for Assessing Body Fat in Collegiate
    Football Players" (Medicine and Science in Sports and Exercise, 1999:
    1350-1356) reports on a new air displacement device for measuring body fat.
    The customary procedure utilizes the hydrostatic weighing device, which
    measures the percentage of body fat by means of water displacement. Here is
    representative data read from a graph in the paper.

    Obs   BOD    HW      Obs   BOD    HW
     1    2.5   8.0      11   12.2  15.3
     2    4.0   6.2      12   12.6  14.8
     3    4.1   9.2      13   14.2  14.3
     4    6.2   6.4      14   14.4  16.3
     5    7.1   8.6      15   15.1  17.9
     6    7.0  12.2      16   15.2  19.5
     7    8.3   7.2      17   16.3  17.5
     8    9.2  12.0      18   17.1  14.3
     9    9.3  14.9      19   17.9  18.3
    10   12.0  12.1      20   17.9  16.2

    a. Use various techniques to decide whether it is plausible that the two
       techniques measure on average the same amount of fat.
    b. Use the data to develop a way of predicting an HW measurement from a BOD
       POD measurement, and investigate the effectiveness of such predictions.

71. Curing concrete is known to be vulnerable to shock vibrations, which may cause
    cracking or hidden damage to the material. As part of a study of vibration
    phenomena, the paper "Shock Vibration Test of Concrete" (ACI Materials J.,
    2002: 361-370) reported the accompanying data on peak particle velocity
    (mm/sec) and ratio of ultrasonic pulse velocity after impact to that before
    impact in concrete prisms:

    Obs   ppv  Ratio      Obs    ppv  Ratio
     1    160   .996      16     708   .990
     2    164   .996      17     806   .984
     3    178   .999      18     884   .986
     4    252   .997      19     526   .991
     5    293   .993      20     490   .993
     6    289   .997      21     598   .993
     7    415   .999      22     505   .993
     8    478   .997      23     525   .990
     9    391   .992      24     675   .991
    10    486   .985      25    1211   .981
    11    604   .995      26    1036   .986
    12    528   .995      27    1000   .984
    13    749   .994      28    1151   .982
    14    772   .994      29    1144   .962
    15    532   .987      30    1068   .986

    Transverse cracks appeared in the last 12 prisms, whereas there was no
    observed cracking in the first 18 prisms.
    a. Construct a comparative boxplot of ppv for the cracked and uncracked
       prisms, and comment. Then estimate the difference between true average ppv
       for cracked and uncracked prisms in a way that conveys information about
       precision and reliability.
    b. The investigators fit the simple linear regression model to the entire
       data set consisting of 30 observations, with ppv as the independent
       variable and
       ratio as the dependent variable. Use a statistical software package to fit
       several different regression models, and draw appropriate inferences.

72. Have you ever wondered whether soccer players suffer adverse effects from
    hitting "headers"? The authors of the article "No Evidence of Impaired
    Neurocognitive Performance in Collegiate Soccer Players" (The Amer. J. of
    Sports Medicine, 2002: 157-162) investigated this issue from several
    perspectives.
    a. The paper reported that 45 of the 91 soccer players in their sample had
       suffered at least one concussion, 28 of 96 nonsoccer athletes had suffered
       at least one concussion, and only 8 of 53 student controls had suffered at
       least one concussion. Analyze this data and draw appropriate conclusions.
    b. For the soccer players, the sample correlation coefficient calculated from
       the values of x = soccer exposure (total number of competitive seasons
       played prior to enrollment in the study) and y = score on an immediate
       memory recall test was r = -.220. Interpret this result.
    c. Here is summary information on score on a controlled oral word association
       test for the soccer and nonsoccer athletes:

       n1 = 26   x̄1 = 37.50   s1 = 9.13
       n2 = 56   x̄2 = 39.63   s2 = 10.19

       Analyze this data and draw appropriate conclusions.
    d. Considering the number of prior nonsoccer concussions, the values of
       mean ± sd for the three groups were .30 ± .67, .49 ± .87, and .19 ± .48.
       Analyze this data and draw appropriate conclusions.

Bibliography
Please see the bibliography for Chapter 3.

Appendix Tables

I.    The Standard Normal Distribution
II.   The Binomial Distribution
III.  The Poisson Distribution
IV.   t Critical Values for Confidence and Prediction Intervals
V.    Tolerance Critical Values for Normal Population Distributions
VI.   Tail Areas for t Curves
VII.  Chi-Squared Critical Values
VIII. F Critical Values
IX.   Studentized Range Critical Values
X.    Critical Values for Dunnett's Method
XI.   Control Chart Constants
XII.  Approximate Critical Values for the Ryan-Joiner Test of Normality


Table I  The standard normal distribution (cumulative z curve areas)

[Diagram: the tabulated area is the area under the standard normal (z) curve to the left of z*]

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

-3.8 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0000
-3.7 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001
-3.6 .0002 .0002 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001
-3.5 .0002 .0002 .0002 .0002 .0002 .0002 .0002 .0002 .0002 .0002
-3.4 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0002
-3.3 .0005 .0005 .0005 .0004 .0004 .0004 .0004 .0004 .0004 .0003
-3.2 .0007 .0007 .0006 .0006 .0006 .0006 .0006 .0005 .0005 .0005
-3.1 .0010 .0009 .0009 .0009 .0008 .0008 .0008 .0008 .0007 .0007
-3.0 .0013 .0013 .0013 .0012 .0012 .0011 .0011 .0011 .0010 .0010
-2.9 .0019 .0018 .0018 .0017 .0016 .0016 .0015 .0015 .0014 .0014
-2.8 .0026 .0025 .0024 .0023 .0023 .0022 .0021 .0021 .0020 .0019
-2.7 .0035 .0034 .0033 .0032 .0031 .0030 .0029 .0028 .0027 .0026
-2.6 .0047 .0045 .0044 .0043 .0041 .0040 .0039 .0038 .0037 .0036
-2.5 .0062 .0060 .0059 .0057 .0055 .0054 .0052 .0051 .0049 .0048
-2.4 .0082 .0080 .0078 .0075 .0073 .0071 .0069 .0068 .0066 .0064
-2.3 .0107 .0104 .0102 .0099 .0096 .0094 .0091 .0089 .0087 .0084
-2.2 .0139 .0136 .0132 .0129 .0125 .0122 .0119 .0116 .0113 .0110
-2.1 .0179 .0174 .0170 .0166 .0162 .0158 .0154 .0150 .0146 .0143
-2.0 .0228 .0222 .0217 .0212 .0207 .0202 .0197 .0192 .0188 .0183
-1.9 .0287 .0281 .0274 .0268 .0262 .0256 .0250 .0244 .0239 .0233
-1.8 .0359 .0351 .0344 .0336 .0329 .0322 .0314 .0307 .0301 .0294
-1.7 .0446 .0436 .0427 .0418 .0409 .0401 .0392 .0384 .0375 .0367
-1.6 .0548 .0537 .0526 .0516 .0505 .0495 .0485 .0475 .0465 .0455
-1.5 .0668 .0655 .0643 .0630 .0618 .0606 .0594 .0582 .0571 .0559
-1.4 .0808 .0793 .0778 .0764 .0749 .0735 .0721 .0708 .0694 .0681
-1.3 .0968 .0951 .0934 .0918 .0901 .0885 .0869 .0853 .0838 .0823
-1.2 .1151 .1131 .1112 .1093 .1075 .1056 .1038 .1020 .1003 .0985
-1.1 .1357 .1335 .1314 .1292 .1271 .1251 .1230 .1210 .1190 .1170
-1.0 .1587 .1562 .1539 .1515 .1492 .1469 .1446 .1423 .1401 .1379
-0.9 .1841 .1814 .1788 .1762 .1736 .1711 .1685 .1660 .1635 .1611
-0.8 .2119 .2090 .2061 .2033 .2005 .1977 .1949 .1922 .1894 .1867
-0.7 .2420 .2389 .2358 .2327 .2296 .2266 .2236 .2206 .2177 .2148
-0.6 .2743 .2709 .2676 .2643 .2611 .2578 .2546 .2514 .2483 .2451
-0.5 .3085 .3050 .3015 .2981 .2946 .2912 .2877 .2843 .2810 .2776
-0.4 .3446 .3409 .3372 .3336 .3300 .3264 .3228 .3192 .3156 .3121
-0.3 .3821 .3783 .3745 .3707 .3669 .3632 .3594 .3557 .3520 .3483
-0.2 .4207 .4168 .4129 .4090 .4052 .4013 .3974 .3936 .3897 .3859
-0.1 .4602 .4562 .4522 .4483 .4443 .4404 .4364 .4325 .4286 .4247
-0.0 .5000 .4960 .4920 .4880 .4840 .4801 .4761 .4721 .4681 .4641


Table I  The standard normal distribution (cumulative z curve areas)

[Diagram: the tabulated area is the area under the standard normal (z) curve to the left of z*]

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

0.0 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319  .5359
0.1 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714  .5753
0.2 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103  .6141
0.3 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480  .6517
0.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844  .6879
0.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190  .7224
0.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517  .7549
0.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823  .7852
0.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106  .8133
0.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365  .8389
1.0 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599  .8621
1.1 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810  .8830
1.2 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997  .9015
1.3 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162  .9177
1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306  .9319
1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429  .9441
1.6 .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535  .9545
1.7 .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625  .9633
1.8 .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699  .9706
1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761  .9767
2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812  .9817
2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854  .9857
2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887  .9890
2.3 .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913  .9916
2.4 .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934  .9936
2.5 .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951  .9952
2.6 .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963  .9964
2.7 .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973  .9974
2.8 .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980  .9981
2.9 .9981 .9982 .9982 .9983 .9984 .9984 .9985 .9985 .9986  .9986
3.0 .9987 .9987 .9987 .9988 .9988 .9989 .9989 .9989 .9990  .9990
3.1 .9990 .9991 .9991 .9991 .9992 .9992 .9992 .9992 .9993  .9993
3.2 .9993 .9993 .9994 .9994 .9994 .9994 .9994 .9995 .9995  .9995
3.3 .9995 .9995 .9995 .9996 .9996 .9996 .9996 .9996 .9996  .9997
3.4 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997  .9998
3.5 .9998 .9998 .9998 .9998 .9998 .9998 .9998 .9998 .9998  .9998
3.6 .9998 .9998 .9999 .9999 .9999 .9999 .9999 .9999 .9999  .9999
3.7 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999  .9999
3.8 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999 1.0000
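Every entry of Table I is a value of the standard normal cumulative distribution function, so the table can be reproduced, or extended, in software. A minimal sketch in Python, assuming the SciPy library is available (the text itself prescribes no software):

    from scipy.stats import norm  # SciPy assumed available

    # Cumulative area under the standard normal curve to the left of z,
    # i.e., the entries of Table I
    for z in (-2.94, 0.00, 1.96, 3.49):
        # e.g., norm.cdf(1.96) = .9750, the 1.9 row / .06 column entry
        print("z = %+.2f  area = %.4f" % (z, norm.cdf(z)))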


Table II  The binomial distribution

n = 5

x\p 0.05 0.1 0.2 0.25 0.3 0.4 0.5 0.6 0.7 0.75 0.8 0.9 0.95

0 .774 .590 .328 .237 .168 .078 .031 .010 .002 .001 .000 .000 .000
1 .203 .329 .409 .396 .360 .259 .157 .077 .029 .015 .007 .000 .000
2 .022 .072 .205 .263 .309 .346 .312 .230 .132 .088 .051 .009 .001
3 .001 .009 .051 .088 .132 .230 .312 .346 .309 .263 .205 .072 .022
4 .000 .000 .007 .015 .029 .077 .157 .259 .360 .396 .409 .329 .203
5 .000 .000 .000 .001 .002 .010 .031 .078 .168 .237 .328 .590 .774

n = 10

x\p 0.05 0.1 0.2 0.25 0.3 0.4 0.5 0.6 0.7 0.75 0.8 0.9 0.95

0 .599 .349 .107 .056 .028 .006 .001 .000 .000 .000 .000 .000 .000
1 .315 .387 .268 .188 .121 .040 .010 .002 .000 .000 .000 .000 .000
2 .075 .194 .302 .282 .233 .121 .044 .011 .001 .000 .000 .000 .000
3 .010 .057 .201 .250 .267 .215 .117 .042 .009 .003 .001 .000 .000
4 .001 .011 .088 .146 .200 .251 .205 .111 .037 .016 .006 .000 .000
5 .000 .001 .026 .058 .103 .201 .246 .201 .103 .058 .026 .001 .000
6 .000 .000 .006 .016 .037 .111 .205 .251 .200 .146 .088 .011 .001
7 .000 .000 .001 .003 .009 .042 .117 .215 .267 .250 .201 .057 .010
8 .000 .000 .000 .000 .001 .011 .044 .121 .233 .282 .302 .194 .075
9 .000 .000 .000 .000 .000 .002 .010 .040 .121 .188 .268 .387 .315
10 .000 .000 .000 .000 .000 .000 .001 .006 .028 .056 .107 .349 .599


Table II  The binomial distribution

n = 15

x\p 0.05 0.1 0.2 0.25 0.3 0.4 0.5 0.6 0.7 0.75 0.8 0.9 0.95

0 .463 .206 .035 .013 .005 .000 .000 .000 .000 .000 .000 .000 .000
1 .366 .343 .132 .067 .030 .005 .000 .000 .000 .000 .000 .000 .000
2 .135 .267 .231 .156 .092 .022 .004 .000 .000 .000 .000 .000 .000
3 .031 .128 .250 .225 .170 .064 .014 .002 .000 .000 .000 .000 .000
4 .004 .043 .188 .225 .218 .126 .041 .007 .001 .000 .000 .000 .000
5 .001 .011 .103 .166 .207 .196 .092 .025 .003 .001 .000 .000 .000
6 .000 .002 .043 .091 .147 .207 .153 .061 .011 .003 .001 .000 .000
7 .000 .000 .014 .040 .081 .177 .196 .118 .035 .013 .003 .000 .000
8 .000 .000 .003 .013 .035 .118 .196 .177 .081 .040 .014 .000 .000
9 .000 .000 .001 .003 .011 .061 .153 .207 .147 .091 .043 .002 .000
10 .000 .000 .000 .001 .003 .025 .092 .196 .207 .166 .103 .011 .001
11 .000 .000 .000 .000 .001 .007 .041 .126 .218 .225 .188 .043 .004
12 .000 .000 .000 .000 .000 .002 .014 .064 .170 .225 .250 .128 .031
13 .000 .000 .000 .000 .000 .000 .004 .022 .092 .156 .231 .267 .135
14 .000 .000 .000 .000 .000 .000 .000 .005 .030 .067 .132 .343 .366
15 .000 .000 .000 .000 .000 .000 .000 .000 .005 .013 .035 .206 .463

n = 20

x\p 0.05 0.1 0.2 0.25 0.3 0.4 0.5 0.6 0.7 0.75 0.8 0.9 0.95

0 .358 .122 .012 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000
1 .377 .270 .058 .021 .007 .000 .000 .000 .000 .000 .000 .000 .000
2 .189 .285 .137 .067 .028 .003 .000 .000 .000 .000 .000 .000 .000
3 .060 .190 .205 .134 .072 .012 .001 .000 .000 .000 .000 .000 .000
4 .013 .090 .218 .190 .130 .035 .005 .000 .000 .000 .000 .000 .000
5 .002 .032 .175 .202 .179 .075 .015 .001 .000 .000 .000 .000 .000
6 .000 .009 .109 .169 .192 .124 .037 .005 .000 .000 .000 .000 .000
7 .000 .002 .055 .112 .164 .166 .074 .015 .001 .000 .000 .000 .000
8 .000 .000 .022 .061 .114 .180 .120 .035 .004 .001 .000 .000 .000
9 .000 .000 .007 .027 .065 .160 .160 .071 .012 .003 .000 .000 .000
10 .000 .000 .002 .010 .031 .117 .176 .117 .031 .010 .002 .000 .000
11 .000 .000 .000 .003 .012 .071 .160 .160 .065 .027 .007 .000 .000
12 .000 .000 .000 .001 .004 .035 .120 .180 .114 .061 .022 .000 .000
13 .000 .000 .000 .000 .001 .015 .074 .166 .164 .112 .055 .002 .000
14 .000 .000 .000 .000 .000 .005 .037 .124 .192 .169 .109 .009 .000
15 .000 .000 .000 .000 .000 .001 .015 .075 .179 .202 .175 .032 .002
16 .000 .000 .000 .000 .000 .000 .005 .035 .130 .190 .218 .090 .013
17 .000 .000 .000 .000 .000 .000 .001 .012 .072 .134 .205 .190 .060
18 .000 .000 .000 .000 .000 .000 .000 .003 .028 .067 .137 .285 .189
19 .000 .000 .000 .000 .000 .000 .000 .000 .007 .021 .058 .270 .377
20 .000 .000 .000 .000 .000 .000 .000 .000 .001 .003 .012 .122 .358


Table II  The binomial distribution

n = 25

x\p 0.05 0.1 0.2 0.25 0.3 0.4 0.5 0.6 0.7 0.75 0.8 0.9 0.95

0 .277 .072 .004 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000
1 .365 .199 .023 .006 .002 .000 .000 .000 .000 .000 .000 .000 .000
2 .231 .266 .071 .025 .007 .000 .000 .000 .000 .000 .000 .000 .000
3 .093 .227 .136 .064 .024 .002 .000 .000 .000 .000 .000 .000 .000
4 .027 .138 .187 .118 .057 .007 .000 .000 .000 .000 .000 .000 .000
5 .006 .065 .196 .164 .103 .020 .002 .000 .000 .000 .000 .000 .000
6 .001 .024 .163 .183 .148 .045 .005 .000 .000 .000 .000 .000 .000
7 .000 .007 .111 .166 .171 .080 .015 .001 .000 .000 .000 .000 .000
8 .000 .002 .062 .124 .165 .120 .032 .003 .000 .000 .000 .000 .000
9 .000 .000 .030 .078 .134 .151 .061 .009 .000 .000 .000 .000 .000
10 .000 .000 .011 .042 .091 .161 .097 .021 .002 .000 .000 .000 .000
11 .000 .000 .004 .019 .054 .146 .133 .044 .004 .001 .000 .000 .000
12 .000 .000 .002 .007 .027 .114 .155 .076 .011 .002 .000 .000 .000
13 .000 .000 .000 .002 .011 .076 .155 .114 .027 .007 .002 .000 .000
14 .000 .000 .000 .001 .004 .044 .133 .146 .054 .019 .004 .000 .000
15 .000 .000 .000 .000 .002 .021 .097 .161 .091 .042 .011 .000 .000
16 .000 .000 .000 .000 .000 .009 .061 .151 .134 .078 .030 .000 .000
17 .000 .000 .000 .000 .000 .003 .032 .120 .165 .124 .062 .002 .000
18 .000 .000 .000 .000 .000 .001 .015 .080 .171 .166 .111 .007 .000
19 .000 .000 .000 .000 .000 .000 .005 .045 .148 .183 .163 .024 .001
20 .000 .000 .000 .000 .000 .000 .002 .020 .103 .164 .196 .065 .006
21 .000 .000 .000 .000 .000 .000 .000 .007 .057 .118 .187 .138 .027
22 .000 .000 .000 .000 .000 .000 .000 .002 .024 .064 .136 .227 .093
23 .000 .000 .000 .000 .000 .000 .000 .000 .007 .025 .071 .266 .231
24 .000 .000 .000 .000 .000 .000 .000 .000 .002 .006 .023 .199 .365
25 .000 .000 .000 .000 .000 .000 .000 .000 .000 .001 .004 .072 .277
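Each entry of Table II is the binomial probability P(X = x) = (n choose x)p^x(1 − p)^(n−x), rounded to three places. A minimal sketch under the same SciPy assumption:

    from scipy.stats import binom  # SciPy assumed available

    # Binomial probabilities P(X = x), as tabulated in Table II
    n, p = 25, 0.3
    for x in range(4, 6):
        # e.g., P(X = 5) = .103, matching the n = 25, p = .3 entry
        print("P(X = %d) = %.3f" % (x, binom.pmf(x, n, p)))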


Table III  The Poisson distribution

x\λ .1 .2 .3 .4 .5 .6 .7 .8 .9 1.0

0 .905 .819 .741 .670 .607 .549 .497 .449 .407 .368
1 .090 .164 .222 .268 .303 .329 .348 .359 .366 .368
2 .005 .016 .033 .054 .076 .099 .122 .144 .165 .184
3 .001 .003 .007 .013 .020 .028 .038 .049 .061
4 .001 .002 .003 .005 .008 .011 .015
5 .001 .001 .002 .003
6 .001

x\λ 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 15.0 20.0

0 .135 .050 .018 .007 .002 .001 .000 .000 .000 .000 .000
1 .271 .149 .073 .034 .015 .006 .003 .001 .000 .000 .000
2 .271 .224 .147 .084 .045 .022 .011 .005 .002 .000 .000
3 .180 .224 .195 .140 .089 .052 .029 .015 .008 .000 .000
4 .090 .168 .195 .175 .134 .091 .057 .034 .019 .001 .000
5 .036 .101 .156 .175 .161 .128 .092 .061 .038 .002 .000
6 .012 .050 .104 .146 .161 .149 .122 .091 .063 .005 .000
7 .003 .022 .060 .104 .138 .149 .140 .117 .090 .010 .001
8 .001 .008 .030 .065 .103 .130 .140 .132 .113 .019 .001
9 .003 .013 .036 .069 .101 .124 .132 .125 .032 .003
10 .001 .005 .018 .041 .071 .099 .119 .125 .049 .006
11 .002 .008 .023 .045 .072 .097 .114 .066 .011
12 .001 .003 .011 .026 .048 .073 .095 .083 .018
13 .001 .005 .014 .030 .050 .073 .096 .027
14 .002 .007 .017 .032 .052 .102 .039
15 .001 .003 .009 .019 .035 .102 .052
16 .001 .005 .011 .022 .096 .065
17 .001 .002 .006 .013 .085 .076
18 .001 .003 .007 .071 .084
19 .001 .004 .056 .089
20 .001 .002 .042 .089
21 .001 .030 .085
22 .020 .077
23 .013 .067
24 .008 .056
25 .005 .045
26 .003 .034
27 .002 .025
28 .001 .018
29 .013
30 .008
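Each entry of Table III is the Poisson probability P(X = x) = e^(−λ)λ^x/x!. A minimal sketch, again assuming SciPy:

    from scipy.stats import poisson  # SciPy assumed available

    # Poisson probabilities P(X = x), as tabulated in Table III
    lam = 2.0
    for x in range(4):
        # e.g., P(X = 1) = P(X = 2) = .271 when the mean is 2.0
        print("P(X = %d) = %.3f" % (x, poisson.pmf(x, lam)))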


Table IV  t critical values for confidence and prediction intervals

[Figure: t density curves — in one, the shaded central area lies between −(t critical value) and +(t critical value); in the other, the shaded cumulative area lies to the left of the t critical value]

(rows: degrees of freedom)

Central area = confidence/prediction level
   for two-sided interval:  80%  90%  95%  98%  99%  99.8%  99.9%
Cumulative area = confidence/prediction level
   for one-sided interval:  90%  95%  97.5%  99%  99.5%  99.9%  99.95%

  1 3.078 6.314 12.706 31.821 63.657 318.310 636.620


  2 1.886 2.920 4.303 6.965 9.925 22.326 31.598
  3 1.638 2.353 3.182 4.541 5.841 10.213 12.924
  4 1.533 2.132 2.776 3.747 4.604 7.173 8.610
  5 1.476 2.015 2.571 3.365 4.032 5.893 6.869
  6 1.440 1.943 2.447 3.143 3.707 5.208 5.959
  7 1.415 1.895 2.365 2.998 3.499 4.785 5.408
  8 1.397 1.860 2.306 2.896 3.355 4.501 5.041
  9 1.383 1.833 2.262 2.821 3.250 4.297 4.781
10 1.372 1.812 2.228 2.764 3.169 4.144 4.587
11 1.363 1.796 2.201 2.718 3.106 4.025 4.437
12 1.356 1.782 2.179 2.681 3.055 3.930 4.318
13 1.350 1.771 2.160 2.650 3.012 3.852 4.221
14 1.345 1.761 2.145 2.624 2.977 3.787 4.140
15 1.341 1.753 2.131 2.602 2.947 3.733 4.073
16 1.337 1.746 2.120 2.583 2.921 3.686 4.015
17 1.333 1.740 2.110 2.567 2.898 3.646 3.965
18 1.330 1.734 2.101 2.552 2.878 3.610 3.922
19 1.328 1.729 2.093 2.539 2.861 3.579 3.883
20 1.325 1.725 2.086 2.528 2.845 3.552 3.850
21 1.323 1.721 2.080 2.518 2.831 3.527 3.819
22 1.321 1.717 2.074 2.508 2.819 3.505 3.792
23 1.319 1.714 2.069 2.500 2.807 3.485 3.767
24 1.318 1.711 2.064 2.492 2.797 3.467 3.745
25 1.316 1.708 2.060 2.485 2.787 3.450 3.725
26 1.315 1.706 2.056 2.479 2.779 3.435 3.707
27 1.314 1.703 2.052 2.473 2.771 3.421 3.690
28 1.313 1.701 2.048 2.467 2.763 3.408 3.674
29 1.311 1.699 2.045 2.462 2.756 3.396 3.659
30 1.310 1.697 2.042 2.457 2.750 3.385 3.646
40 1.303 1.684 2.021 2.423 2.704 3.307 3.551
60 1.296 1.671 2.000 2.390 2.660 3.232 3.460
120 1.289 1.658 1.980 2.358 2.617 3.160 3.373
∞ 1.282 1.645 1.960 2.326 2.576 3.090 3.291
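Each t critical value is a quantile of the t distribution: a two-sided confidence/prediction level places cumulative area (1 + level)/2 to the left of the critical value. A minimal sketch, assuming SciPy:

    from scipy.stats import t  # SciPy assumed available

    # t critical values, as tabulated in Table IV
    df = 10
    for level in (0.80, 0.90, 0.95, 0.99):
        # e.g., the 95% two-sided value for df = 10 is 2.228
        print("%2.0f%%: %.3f" % (100 * level, t.ppf((1 + level) / 2, df)))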

Table V  Tolerance critical values for normal population distributions

Two-sided intervals One-sided intervals

Confidence level 95% 99% 95% 99%

% of population captured ≥90% ≥95% ≥99% ≥90% ≥95% ≥99% ≥90% ≥95% ≥99% ≥90% ≥95% ≥99%
(rows: sample size n)

  2 32.019 37.674 48.430 160.193 188.491 242.300 20.581 26.260 37.094 103.029 131.426 185.617
  3 8.380 9.916 12.861 18.930 22.401 29.055 6.156 7.656 10.553 13.995 17.370 23.896
  4 5.369 6.370 8.299 9.398 11.150 14.527 4.162 5.144 7.042 7.380 9.083 12.387
  5 4.275 5.079 6.634 6.612 7.855 10.260 3.407 4.203 5.741 5.362 6.578 8.939
  6 3.712 4.414 5.775 5.337 6.345 8.301 3.006 3.708 5.062 4.411 5.406 7.335
  7 3.369 4.007 5.248 4.613 5.488 7.187 2.756 3.400 4.642 3.859 4.728 6.412
  8 3.136 3.732 4.891 4.147 4.936 6.468 2.582 3.187 4.354 3.497 4.285 5.812
  9 2.967 3.532 4.631 3.822 4.550 5.966 2.454 3.031 4.143 3.241 3.972 5.389
10 2.839 3.379 4.433 3.582 4.265 5.594 2.355 2.911 3.981 3.048 3.738 5.074
11 2.737 3.259 4.277 3.397 4.045 5.308 2.275 2.815 3.852 2.898 3.556 4.829
12 2.655 3.162 4.150 3.250 3.870 5.079 2.210 2.736 3.747 2.777 3.410 4.633
13 2.587 3.081 4.044 3.130 3.727 4.893 2.155 2.671 3.659 2.677 3.290 4.472
14 2.529 3.012 3.955 3.029 3.608 4.737 2.109 2.615 3.585 2.593 3.189 4.337
15 2.480 2.954 3.878 2.945 3.507 4.605 2.068 2.566 3.520 2.522 3.102 4.222
16 2.437 2.903 3.812 2.872 3.421 4.492 2.033 2.524 3.464 2.460 3.028 4.123
17 2.400 2.858 3.754 2.808 3.345 4.393 2.002 2.486 3.414 2.405 2.963 4.037
18 2.366 2.819 3.702 2.753 3.279 4.307 1.974 2.453 3.370 2.357 2.905 3.960
19 2.337 2.784 3.656 2.703 3.221 4.230 1.949 2.423 3.331 2.314 2.854 3.892
20 2.310 2.752 3.615 2.659 3.168 4.161 1.926 2.396 3.295 2.276 2.808 3.832

25 2.208 2.631 3.457 2.494 2.972 3.904 1.838 2.292 3.158 2.129 2.633 3.601
30 2.140 2.549 3.350 2.385 2.841 3.733 1.777 2.220 3.064 2.030 2.516 3.447
35 2.090 2.490 3.272 2.306 2.748 3.611 1.732 2.167 2.995 1.957 2.430 3.334
40 2.052 2.445 3.213 2.247 2.677 3.518 1.697 2.126 2.941 1.902 2.364 3.249
45 2.021 2.408 3.165 2.200 2.621 3.444 1.669 2.092 2.898 1.857 2.312 3.180
50 1.996 2.379 3.126 2.162 2.576 3.385 1.646 2.065 2.863 1.821 2.269 3.125
60 1.958 2.333 3.066 2.103 2.506 3.293 1.609 2.022 2.807 1.764 2.202 3.038
70 1.929 2.299 3.021 2.060 2.454 3.225 1.581 1.990 2.765 1.722 2.153 2.974
80 1.907 2.272 2.986 2.026 2.414 3.173 1.559 1.965 2.733 1.688 2.114 2.924
90 1.889 2.251 2.958 1.999 2.382 3.130 1.542 1.944 2.706 1.661 2.082 2.883
100 1.874 2.233 2.934 1.977 2.355 3.096 1.527 1.927 2.684 1.639 2.056 2.850
150 1.825 2.175 2.859 1.905 2.270 2.983 1.478 1.870 2.611 1.566 1.971 2.741
200 1.798 2.143 2.816 1.865 2.222 2.921 1.450 1.837 2.570 1.524 1.923 2.679
250 1.780 2.121 2.788 1.839 2.191 2.880 1.431 1.815 2.542 1.496 1.891 2.638
300 1.767 2.106 2.767 1.820 2.169 2.850 1.417 1.800 2.522 1.476 1.868 2.608
∞ 1.645 1.960 2.576 1.645 1.960 2.576 1.282 1.645 2.326 1.282 1.645 2.326
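The one-sided entries of Table V have an exact expression through noncentral t quantiles — a standard result that this book does not derive, so the sketch below is offered only as a cross-check (the helper name one_sided_k is illustrative; SciPy assumed):

    from math import sqrt
    from scipy.stats import norm, nct  # SciPy assumed available

    def one_sided_k(n, capture, conf):
        # One-sided normal tolerance critical value: a noncentral t
        # quantile with noncentrality z_capture * sqrt(n), rescaled
        delta = norm.ppf(capture) * sqrt(n)
        return nct.ppf(conf, n - 1, delta) / sqrt(n)

    # n = 10, 95% confidence, at least 90% captured: Table V gives 2.355
    print(round(one_sided_k(10, 0.90, 0.95), 3))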


Table VI  Tail areas for t curves

[Figure: t curve; the tabulated entry is the area to the right of t]

df
t 1 2 3 4 5 6 7 8 9 10 11 12

0.0 .500 .500 .500 .500 .500 .500 .500 .500 .500 .500 .500 .500
0.1 .468 .465 .463 .463 .462 .462 .462 .461 .461 .461 .461 .461
0.2 .437 .430 .427 .426 .425 .424 .424 .423 .423 .423 .423 .422
0.3 .407 .396 .392 .390 .388 .387 .386 .386 .386 .385 .385 .385
0.4 .379 .364 .358 .355 .353 .352 .351 .350 .349 .349 .348 .348
0.5 .352 .333 .326 .322 .319 .317 .316 .315 .315 .314 .313 .313
0.6 .328 .305 .295 .290 .287 .285 .284 .283 .282 .281 .280 .280
0.7 .306 .278 .267 .261 .258 .255 .253 .252 .251 .250 .249 .249
0.8 .285 .254 .241 .234 .230 .227 .225 .223 .222 .221 .220 .220
0.9 .267 .232 .217 .210 .205 .201 .199 .197 .196 .195 .194 .193
1.0 .250 .211 .196 .187 .182 .178 .175 .173 .172 .170 .169 .169
1.1 .235 .193 .176 .167 .162 .157 .154 .152 .150 .149 .147 .146
1.2 .221 .177 .158 .148 .142 .138 .135 .132 .130 .129 .128 .127
1.3 .209 .162 .142 .132 .125 .121 .117 .115 .113 .111 .110 .109
1.4 .197 .148 .128 .117 .110 .106 .102 .100 .098 .096 .095 .093
1.5 .187 .136 .115 .104 .097 .092 .089 .086 .084 .082 .081 .080
1.6 .178 .125 .104 .092 .085 .080 .077 .074 .072 .070 .069 .068
1.7 .169 .116 .094 .082 .075 .070 .066 .064 .062 .060 .059 .057
1.8 .161 .107 .085 .073 .066 .061 .057 .055 .053 .051 .050 .049
1.9 .154 .099 .077 .065 .058 .053 .050 .047 .045 .043 .042 .041
2.0 .148 .092 .070 .058 .051 .046 .043 .040 .038 .037 .035 .034
2.1 .141 .085 .063 .052 .045 .040 .037 .034 .033 .031 .030 .029
2.2 .136 .079 .058 .046 .040 .035 .032 .029 .028 .026 .025 .024
2.3 .131 .074 .052 .041 .035 .031 .027 .025 .023 .022 .021 .020
2.4 .126 .069 .048 .037 .031 .027 .024 .022 .020 .019 .018 .017
2.5 .121 .065 .044 .033 .027 .023 .020 .018 .017 .016 .015 .014
2.6 .117 .061 .040 .030 .024 .020 .018 .016 .014 .013 .012 .012
2.7 .113 .057 .037 .027 .021 .018 .015 .014 .012 .011 .010 .010
2.8 .109 .054 .034 .024 .019 .016 .013 .012 .010 .009 .009 .008
2.9 .106 .051 .031 .022 .017 .014 .011 .010 .009 .008 .007 .007
3.0 .102 .048 .029 .020 .015 .012 .010 .009 .007 .007 .006 .006
3.1 .099 .045 .027 .018 .013 .011 .009 .007 .006 .006 .005 .005
3.2 .096 .043 .025 .016 .012 .009 .008 .006 .005 .005 .004 .004
3.3 .094 .040 .023 .015 .011 .008 .007 .005 .005 .004 .004 .003
3.4 .091 .038 .021 .014 .010 .007 .006 .005 .004 .003 .003 .003
3.5 .089 .036 .020 .012 .009 .006 .005 .004 .003 .003 .002 .002
3.6 .086 .035 .018 .011 .008 .006 .004 .004 .003 .002 .002 .002
3.7 .084 .033 .017 .010 .007 .005 .004 .003 .002 .002 .002 .002
3.8 .082 .031 .016 .010 .006 .004 .003 .003 .002 .002 .001 .001
3.9 .080 .030 .015 .009 .006 .004 .003 .002 .002 .001 .001 .001
4.0 .078 .029 .014 .008 .005 .004 .003 .002 .002 .001 .001 .001


Table VI  Tail areas for t curves

[Figure: t curve; the tabulated entry is the area to the right of t]

df
t 13 14 15 16 17 18 19 20 21 22 23 24

0.0 .500 .500 .500 .500 .500 .500 .500 .500 .500 .500 .500 .500
0.1 .461 .461 .461 .461 .461 .461 .461 .461 .461 .461 .461 .461
0.2 .422 .422 .422 .422 .422 .422 .422 .422 .422 .422 .422 .422
0.3 .384 .384 .384 .384 .384 .384 .384 .384 .384 .383 .383 .383
0.4 .348 .347 .347 .347 .347 .347 .347 .347 .347 .347 .346 .346
0.5 .313 .312 .312 .312 .312 .312 .311 .311 .311 .311 .311 .311
0.6 .279 .279 .279 .278 .278 .278 .278 .278 .278 .277 .277 .277
0.7 .248 .247 .247 .247 .247 .246 .246 .246 .246 .246 .245 .245
0.8 .219 .218 .218 .218 .217 .217 .217 .217 .216 .216 .216 .216
0.9 .192 .191 .191 .191 .190 .190 .190 .189 .189 .189 .189 .189
1.0 .168 .167 .167 .166 .166 .165 .165 .165 .164 .164 .164 .164
1.1 .146 .144 .144 .144 .143 .143 .143 .142 .142 .142 .141 .141
1.2 .126 .124 .124 .124 .123 .123 .122 .122 .122 .121 .121 .121
1.3 .108 .107 .107 .106 .105 .105 .105 .104 .104 .104 .103 .103
1.4 .092 .091 .091 .090 .090 .089 .089 .089 .088 .088 .087 .087
1.5 .079 .077 .077 .077 .076 .075 .075 .075 .074 .074 .074 .073
1.6 .067 .065 .065 .065 .064 .064 .063 .063 .062 .062 .062 .061
1.7 .056 .055 .055 .054 .054 .053 .053 .052 .052 .052 .051 .051
1.8 .048 .046 .046 .045 .045 .044 .044 .043 .043 .043 .042 .042
1.9 .040 .038 .038 .038 .037 .037 .036 .036 .036 .035 .035 .035
2.0 .033 .032 .032 .031 .031 .030 .030 .030 .029 .029 .029 .028
2.1 .028 .027 .027 .026 .025 .025 .025 .024 .024 .024 .023 .023
2.2 .023 .022 .022 .021 .021 .021 .020 .020 .020 .019 .019 .019
2.3 .019 .018 .018 .018 .017 .017 .016 .016 .016 .016 .015 .015
2.4 .016 .015 .015 .014 .014 .014 .013 .013 .013 .013 .012 .012
2.5 .013 .012 .012 .012 .011 .011 .011 .011 .010 .010 .010 .010
2.6 .011 .010 .010 .010 .009 .009 .009 .009 .008 .008 .008 .008
2.7 .009 .008 .008 .008 .008 .007 .007 .007 .007 .007 .006 .006
2.8 .008 .007 .007 .006 .006 .006 .006 .006 .005 .005 .005 .005
2.9 .006 .005 .005 .005 .005 .005 .005 .004 .004 .004 .004 .004
3.0 .005 .004 .004 .004 .004 .004 .004 .004 .003 .003 .003 .003
3.1 .004 .004 .004 .003 .003 .003 .003 .003 .003 .003 .003 .002
3.2 .003 .003 .003 .003 .003 .002 .002 .002 .002 .002 .002 .002
3.3 .003 .002 .002 .002 .002 .002 .002 .002 .002 .002 .002 .001
3.4 .002 .002 .002 .002 .002 .002 .002 .001 .001 .001 .001 .001
3.5 .002 .002 .002 .001 .001 .001 .001 .001 .001 .001 .001 .001
3.6 .002 .001 .001 .001 .001 .001 .001 .001 .001 .001 .001 .001
3.7 .001 .001 .001 .001 .001 .001 .001 .001 .001 .001 .001 .001
3.8 .001 .001 .001 .001 .001 .001 .001 .001 .001 .000 .000 .000
3.9 .001 .001 .001 .001 .001 .001 .000 .000 .000 .000 .000 .000
4.0 .001 .001 .001 .001 .000 .000 .000 .000 .000 .000 .000 .000


Table VI  Tail areas for t curves

[Figure: t curve; the tabulated entry is the area to the right of t]

df
t 25 26 27 28 29 30 35 40 60 120 ∞ (= z)

0.0 .500 .500 .500 .500 .500 .500 .500 .500 .500 .500 .500
0.1 .461 .461 .461 .461 .461 .461 .460 .460 .460 .460 .460
0.2 .422 .422 .421 .421 .421 .421 .421 .421 .421 .421 .421
0.3 .383 .383 .383 .383 .383 .383 .383 .383 .383 .382 .382
0.4 .346 .346 .346 .346 .346 .346 .346 .346 .345 .345 .345
0.5 .311 .311 .311 .310 .310 .310 .310 .310 .309 .309 .309
0.6 .277 .277 .277 .277 .277 .277 .276 .276 .275 .275 .274
0.7 .245 .245 .245 .245 .245 .245 .244 .244 .243 .243 .242
0.8 .216 .215 .215 .215 .215 .215 .215 .214 .213 .213 .212
0.9 .188 .188 .188 .188 .188 .188 .187 .187 .186 .185 .184
1.0 .163 .163 .163 .163 .163 .163 .162 .162 .161 .160 .159
1.1 .141 .141 .141 .140 .140 .140 .139 .139 .138 .137 .136
1.2 .121 .120 .120 .120 .120 .120 .119 .119 .117 .116 .115
1.3 .103 .103 .102 .102 .102 .102 .101 .101 .099 .098 .097
1.4 .087 .087 .086 .086 .086 .086 .085 .085 .083 .082 .081
1.5 .073 .073 .073 .072 .072 .072 .071 .071 .069 .068 .067
1.6 .061 .061 .061 .060 .060 .060 .059 .059 .057 .056 .055
1.7 .051 .051 .050 .050 .050 .050 .049 .048 .047 .046 .045
1.8 .042 .042 .042 .041 .041 .041 .040 .040 .038 .037 .036
1.9 .035 .034 .034 .034 .034 .034 .033 .032 .031 .030 .029
2.0 .028 .028 .028 .028 .027 .027 .027 .026 .025 .024 .023
2.1 .023 .023 .023 .022 .022 .022 .022 .021 .020 .019 .018
2.2 .019 .018 .018 .018 .018 .018 .017 .017 .016 .015 .014
2.3 .015 .015 .015 .015 .014 .014 .014 .013 .012 .012 .011
2.4 .012 .012 .012 .012 .012 .011 .011 .011 .010 .009 .008
2.5 .010 .010 .009 .009 .009 .009 .009 .008 .008 .007 .006
2.6 .008 .008 .007 .007 .007 .007 .007 .007 .006 .005 .005
2.7 .006 .006 .006 .006 .006 .006 .005 .005 .004 .004 .003
2.8 .005 .005 .005 .005 .005 .004 .004 .004 .003 .003 .003
2.9 .004 .004 .004 .004 .004 .003 .003 .003 .003 .002 .002
3.0 .003 .003 .003 .003 .003 .003 .002 .002 .002 .002 .001
3.1 .002 .002 .002 .002 .002 .002 .002 .002 .001 .001 .001
3.2 .002 .002 .002 .002 .002 .002 .001 .001 .001 .001 .001
3.3 .001 .001 .001 .001 .001 .001 .001 .001 .001 .001 .000
3.4 .001 .001 .001 .001 .001 .001 .001 .001 .001 .000 .000
3.5 .001 .001 .001 .001 .001 .001 .001 .001 .000 .000 .000
3.6 .001 .001 .001 .001 .001 .001 .000 .000 .000 .000 .000
3.7 .001 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000
3.8 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
3.9 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
4.0 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
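Each entry of Table VI is an upper-tail area P(T > t), the survival function of the t distribution. A minimal sketch, assuming SciPy:

    from scipy.stats import t  # SciPy assumed available

    # Area to the right of t = 2.0 under several t curves (Table VI)
    for df in (1, 12, 60):
        # e.g., df = 12 gives .034, matching the tabled value
        print("df = %2d: %.3f" % (df, t.sf(2.0, df)))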


Table VII   Chi-squared critical values

Right-tail area df = 1 df = 2 df = 3 df = 4 df = 5

>.100 <2.70 <4.60 <6.25 <7.77 <9.23


0.100 2.70 4.60 6.25 7.77 9.23
0.095 2.78 4.70 6.36 7.90 9.37
0.090 2.87 4.81 6.49 8.04 9.52
0.085 2.96 4.93 6.62 8.18 9.67
0.080 3.06 5.05 6.75 8.33 9.83
0.075 3.17 5.18 6.90 8.49 10.00
0.070 3.28 5.31 7.06 8.66 10.19
0.065 3.40 5.46 7.22 8.84 10.38
0.060 3.53 5.62 7.40 9.04 10.59
0.055 3.68 5.80 7.60 9.25 10.82
0.050 3.84 5.99 7.81 9.48 11.07
0.045 4.01 6.20 8.04 9.74 11.34
0.040 4.21 6.43 8.31 10.02 11.64
0.035 4.44 6.70 8.60 10.34 11.98
0.030 4.70 7.01 8.94 10.71 12.37
0.025 5.02 7.37 9.34 11.14 12.83
0.020 5.41 7.82 9.83 11.66 13.38
0.015 5.91 8.39 10.46 12.33 14.09
0.010 6.63 9.21 11.34 13.27 15.08
0.005 7.87 10.59 12.83 14.86 16.74
0.001 10.82 13.81 16.26 18.46 20.51
<0.001 >10.82 >13.81 >16.26 >18.46 >20.51
Right-tail area df = 6 df = 7 df = 8 df = 9 df = 10

>.100 <10.64 <12.01 <13.36 <14.68 <15.98


0.100 10.64 12.01 13.36 14.68 15.98
0.095 10.79 12.17 13.52 14.85 16.16
0.090 10.94 12.33 13.69 15.03 16.35
0.085 11.11 12.50 13.87 15.22 16.54
0.080 11.28 12.69 14.06 15.42 16.75
0.075 11.46 12.88 14.26 15.63 16.97
0.070 11.65 13.08 14.48 15.85 17.20
0.065 11.86 13.30 14.71 16.09 17.44
0.060 12.08 13.53 14.95 16.34 17.71
0.055 12.33 13.79 15.22 16.62 17.99
0.050 12.59 14.06 15.50 16.91 18.30
0.045 12.87 14.36 15.82 17.24 18.64
0.040 13.19 14.70 16.17 17.60 19.02
0.035 13.55 15.07 16.56 18.01 19.44
0.030 13.96 15.50 17.01 18.47 19.92
0.025 14.44 16.01 17.53 19.02 20.48
0.020 15.03 16.62 18.16 19.67 21.16
0.015 15.77 17.39 18.97 20.51 22.02
0.010 16.81 18.47 20.09 21.66 23.20
0.005 18.54 20.27 21.95 23.58 25.18
0.001 22.45 24.32 26.12 27.87 29.58
<0.001 >22.45 >24.32 >26.12 >27.87 >29.58


Table VII   Chi-squared critical values

Right-tail area df = 11 df = 12 df = 13 df = 14 df = 15

>.100 <17.27 <18.54 <19.81 <21.06 <22.30


0.100 17.27 18.54 19.81 21.06 22.30
0.095 17.45 18.74 20.00 21.26 22.51
0.090 17.65 18.93 20.21 21.47 22.73
0.085 17.85 19.14 20.42 21.69 22.95
0.080 18.06 19.36 20.65 21.93 23.19
0.075 18.29 19.60 20.89 22.17 23.45
0.070 18.53 19.84 21.15 22.44 23.72
0.065 18.78 20.11 21.42 22.71 24.00
0.060 19.06 20.39 21.71 23.01 24.31
0.055 19.35 20.69 22.02 23.33 24.63
0.050 19.67 21.02 22.36 23.68 24.99
0.045 20.02 21.38 22.73 24.06 25.38
0.040 20.41 21.78 23.14 24.48 25.81
0.035 20.84 22.23 23.60 24.95 26.29
0.030 21.34 22.74 24.12 25.49 26.84
0.025 21.92 23.33 24.73 26.11 27.48
0.020 22.61 24.05 25.47 26.87 28.25
0.015 23.50 24.96 26.40 27.82 29.23
0.010 24.72 26.21 27.68 29.14 30.57
0.005 26.75 28.29 29.81 31.31 32.80
0.001 31.26 32.90 34.52 36.12 37.69
<0.001 >31.26 >32.90 >34.52 >36.12 >37.69
Right-tail area df = 16 df = 17 df = 18 df = 19 df = 20

>.100 <23.54 <24.77 <25.98 <27.20 <28.41


0.100 23.54 24.76 25.98 27.20 28.41
0.095 23.75 24.98 26.21 27.43 28.64
0.090 23.97 25.21 26.44 27.66 28.88
0.085 24.21 25.45 26.68 27.91 29.14
0.080 24.45 25.70 26.94 28.18 29.40
0.075 24.71 25.97 27.21 28.45 29.69
0.070 24.99 26.25 27.50 28.75 29.99
0.065 25.28 26.55 27.81 29.06 30.30
0.060 25.59 26.87 28.13 29.39 30.64
0.055 25.93 27.21 28.48 29.75 31.01
0.050 26.29 27.58 28.86 30.14 31.41
0.045 26.69 27.99 29.28 30.56 31.84
0.040 27.13 28.44 29.74 31.03 32.32
0.035 27.62 28.94 30.25 31.56 32.85
0.030 28.19 29.52 30.84 32.15 33.46
0.025 28.84 30.19 31.52 32.85 34.16
0.020 29.63 30.99 32.34 33.68 35.01
0.015 30.62 32.01 33.38 34.74 36.09
0.010 32.00 33.40 34.80 36.19 37.56
0.005 34.26 35.71 37.15 38.58 39.99
0.001 39.25 40.78 42.31 43.81 45.31
<0.001 >39.25 >40.78 >42.31 >43.81 >45.31
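The chi-squared critical value with right-tail area α is the (1 − α) quantile of the chi-squared distribution. A minimal sketch, assuming SciPy (the tabled values are rounded, so the last digit can differ by one):

    from scipy.stats import chi2  # SciPy assumed available

    # Chi-squared critical values for df = 10 (Table VII)
    for alpha in (0.100, 0.050, 0.010, 0.001):
        # e.g., alpha = .05 gives 18.31 versus the tabled 18.30
        print("alpha = %.3f: %.2f" % (alpha, chi2.ppf(1 - alpha, 10)))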


Table VIII  F critical values

[Figure: F curve; the tabulated value captures the indicated upper-tail area]

Numerator df (columns); rows: denominator df


Area 1 2 3 4 5 6 7 8 9 10

 1 .100   39.86   49.50   53.59   55.83   57.24   58.20   58.91   59.44   59.86   60.19
.050  161.40  199.50  215.70  224.60  230.20  234.00  236.80  238.90  240.50  241.90
.010 4052.00 5000.00 5403.00 5625.00 5764.00 5859.00 5928.00 5981.00 6022.00 6056.00
 2 .100    8.53    9.00    9.16    9.24    9.29    9.33    9.35    9.37    9.38    9.39
.050   18.51   19.00   19.16   19.25   19.30   19.33   19.35   19.37   19.38   19.40
.010   98.50   99.00   99.17   99.25   99.30   99.33   99.36   99.37   99.39   99.40
.001  998.50  999.00  999.20  999.20  999.30  999.30  999.40  999.40  999.40  999.40
 3 .100    5.54    5.46    5.39    5.34    5.31    5.28    5.27    5.25    5.24    5.23
.050   10.13    9.55    9.28    9.12    9.01    8.94    8.89    8.85    8.81    8.79
.010   34.12   30.82   29.46   28.71   28.24   27.91   27.67   27.49   27.35   27.23
.001  167.00  148.50  141.10  137.10  134.60  132.80  131.60  130.60  129.90  129.20
 4 .100    4.54    4.32    4.19    4.11    4.05    4.01    3.98    3.95    3.94    3.92
.050    7.71    6.94    6.59    6.39    6.26    6.16    6.09    6.04    6.00    5.96
.010   21.20   18.00   16.69   15.98   15.52   15.21   14.98   14.80   14.66   14.55
.001   74.14   61.25   56.18   53.44   51.71   50.53   49.66   49.00   48.47   48.05
 5 .100    4.06    3.78    3.62    3.52    3.45    3.40    3.37    3.34    3.32    3.30
.050    6.61    5.79    5.41    5.19    5.05    4.95    4.88    4.82    4.77    4.74
.010   16.26   13.27   12.06   11.39   10.97   10.67   10.46   10.29   10.16   10.05
.001   47.18   37.12   33.20   31.09   29.75   28.83   28.16   27.65   27.24   26.92
 6 .100    3.78    3.46    3.29    3.18    3.11    3.05    3.01    2.98    2.96    2.94
.050    5.99    5.14    4.76    4.53    4.39    4.28    4.21    4.15    4.10    4.06
.010   13.75   10.92    9.78    9.15    8.75    8.47    8.26    8.10    7.98    7.87
.001   35.51   27.00   23.70   21.92   20.80   20.03   19.46   19.03   18.69   18.41
 7 .100    3.59    3.26    3.07    2.96    2.88    2.83    2.78    2.75    2.72    2.70
.050    5.59    4.74    4.35    4.12    3.97    3.87    3.79    3.73    3.68    3.64
.010   12.25    9.55    8.45    7.85    7.46    7.19    6.99    6.84    6.72    6.62
.001   29.25   21.69   18.77   17.20   16.21   15.52   15.02   14.63   14.33   14.08
 8 .100    3.46    3.11    2.92    2.81    2.73    2.67    2.62    2.59    2.56    2.54
.050    5.32    4.46    4.07    3.84    3.69    3.58    3.50    3.44    3.39    3.35
.010   11.26    8.65    7.59    7.01    6.63    6.37    6.18    6.03    5.91    5.81
.001   25.41   18.49   15.83   14.39   13.48   12.86   12.40   12.05   11.77   11.54
 9 .100    3.36    3.01    2.81    2.69    2.61    2.55    2.51    2.47    2.44    2.42
.050    5.12    4.26    3.86    3.63    3.48    3.37    3.29    3.23    3.18    3.14
.010   10.56    8.02    6.99    6.42    6.06    5.80    5.61    5.47    5.35    5.26
.001   22.86   16.39   13.90   12.56   11.71   11.13   10.70   10.37   10.11    9.89


Table VIII  F critical values

Numerator df (columns); rows: denominator df


Area 1 2 3 4 5 6 7 8 9 10

10 .100    3.29    2.92    2.73    2.61    2.52    2.46    2.41    2.38    2.35    2.32
.050 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98
.010 10.04 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94 4.85
.001 21.04 14.91 12.55 11.28 10.48 9.93 9.52 9.20 8.96 8.75
11 .100 3.23 2.86 2.66 2.54 2.45 2.39 2.34 2.30 2.27 2.25
.050 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85
.010 9.65 7.21 6.22 5.67 5.32 5.07 4.89 4.74 4.63 4.54
.001 19.69 13.81 11.56 10.35 9.58 9.05 8.66 8.35 8.12 7.92
12 .100 3.18 2.81 2.61 2.48 2.39 2.33 2.28 2.24 2.21 2.19
.050 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75
.010 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39 4.30
.001 18.64 12.97 10.80 9.63 8.89 8.38 8.00 7.71 7.48 7.29
13 .100 3.14 2.76 2.56 2.43 2.35 2.28 2.23 2.20 2.16 2.14
.050 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67
.010 9.07 6.70 5.74 5.21 4.86 4.62 4.44 4.30 4.19 4.10
.001 17.82 12.31 10.21 9.07 8.35 7.86 7.49 7.21 6.98 6.80
14 .100 3.10 2.73 2.52 2.39 2.31 2.24 2.19 2.15 2.12 2.10
.050 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60
.010 8.86 6.51 5.56 5.04 4.69 4.46 4.28 4.14 4.03 3.94
.001 17.14 11.78 9.73 8.62 7.92 7.44 7.08 6.80 6.58 6.40
15 .100 3.07 2.70 2.49 2.36 2.27 2.21 2.16 2.12 2.09 2.06
.050 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54
.010 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80
.001 16.59 11.34 9.34 8.25 7.57 7.09 6.74 6.47 6.26 6.08
16 .100 3.05 2.67 2.46 2.33 2.24 2.18 2.13 2.09 2.06 2.03
.050 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49
.010 8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78 3.69
.001 16.12 10.97 9.01 7.94 7.27 6.80 6.46 6.19 5.98 5.81
17 .100 3.03 2.64 2.44 2.31 2.22 2.15 2.10 2.06 2.03 2.00
.050 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45
.010 8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 3.59
.001 15.72 10.66 8.73 7.68 7.02 6.56 6.22 5.96 5.75 5.58
18 .100 3.01 2.62 2.42 2.29 2.20 2.13 2.08 2.04 2.00 1.98
.050 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41
.010 8.29 6.01 5.09 4.58 4.25 4.01 3.84 3.71 3.60 3.51
.001 15.38 10.39 8.49 7.46 6.81 6.35 6.02 5.76 5.56 5.39
19 .100 2.99 2.61 2.40 2.27 2.18 2.11 2.06 2.02 1.98 1.96
.050 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38
.010 8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43
.001 15.08 10.16 8.28 7.27 6.62 6.18 5.85 5.59 5.39 5.22


Table VIII  F critical values

Numerator df (columns); rows: denominator df


Area 1 2 3 4 5 6 7 8 9 10

20 .100    2.97    2.59    2.38    2.25    2.16    2.09    2.04    2.00    1.96    1.94
.050 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35
.010 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46 3.37
.001 14.82 9.95 8.10 7.10 6.46 6.02 5.69 5.44 5.24 5.08
21 .100 2.96 2.57 2.36 2.23 2.14 2.08 2.02 1.98 1.95 1.92
.050 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32
.010 8.02 5.78 4.87 4.37 4.04 3.81 3.64 3.51 3.40 3.31
.001 14.59 9.77 7.94 6.95 6.32 5.88 5.56 5.31 5.11 4.95
22 .100 2.95 2.56 2.35 2.22 2.13 2.06 2.01 1.97 1.93 1.90
.050 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30
.010 7.95 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26
.001 14.38 9.61 7.80 6.81 6.19 5.76 5.44 5.19 4.99 4.83
23 .100 2.94 2.55 2.34 2.21 2.11 2.05 1.99 1.95 1.92 1.89
.050 4.28 3.42 3.03 2.80 2.64 2.53 2.44 2.37 2.32 2.27
.010 7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.30 3.21
.001 14.20 9.47 7.67 6.70 6.08 5.65 5.33 5.09 4.89 4.73
24 .100 2.93 2.54 2.33 2.19 2.10 2.04 1.98 1.94 1.91 1.88
.050 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25
.010 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.26 3.17
.001 14.03 9.34 7.55 6.59 5.98 5.55 5.23 4.99 4.80 4.64
25 .100 2.92 2.53 2.32 2.18 2.09 2.02 1.97 1.93 1.89 1.87
.050 4.24 3.39 2.99 2.76 2.60 2.49 2.40 2.34 2.28 2.24
.010 7.77 5.57 4.68 4.18 3.85 3.63 3.46 3.32 3.22 3.13
.001 13.88 9.22 7.45 6.49 5.89 5.46 5.15 4.91 4.71 4.56
26 .100 2.91 2.52 2.31 2.17 2.08 2.01 1.96 1.92 1.88 1.86
.050 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22
.010 7.72 5.53 4.64 4.14 3.82 3.59 3.42 3.29 3.18 3.09
.001 13.74 9.12 7.36 6.41 5.80 5.38 5.07 4.83 4.64 4.48
27 .100 2.90 2.51 2.30 2.17 2.07 2.00 1.95 1.91 1.87 1.85
.050 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.31 2.25 2.20
.010 7.68 5.49 4.60 4.11 3.78 3.56 3.39 3.26 3.15 3.06
.001 13.61 9.02 7.27 6.33 5.73 5.31 5.00 4.76 4.57 4.41
28 .100 2.89 2.50 2.29 2.16 2.06 2.00 1.94 1.90 1.87 1.84
.050 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.19
.010 7.64 5.45 4.57 4.07 3.75 3.53 3.36 3.23 3.12 3.03
.001 13.50 8.93 7.19 6.25 5.66 5.24 4.93 4.69 4.50 4.35
29 .100 2.89 2.50 2.28 2.15 2.06 1.99 1.93 1.89 1.86 1.83
.050 4.18 3.33 2.93 2.70 2.55 2.43 2.35 2.28 2.22 2.18
.010 7.60 5.42 4.54 4.04 3.73 3.50 3.33 3.20 3.09 3.00
.001 13.39 8.85 7.12 6.19 5.59 5.18 4.87 4.64 4.45 4.29


Table VIII  F critical values

Numerator df (columns); rows: denominator df


Area 1 2 3 4 5 6 7 8 9 10

 30
.100    2.88    2.49    2.28    2.14    2.05    1.98    1.93    1.88    1.85    1.82
.050 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16
.010 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.07 2.98
.001 13.29 8.77 7.05 6.12 5.53 5.12 4.82 4.58 4.39 4.24
 40
.100 2.84 2.44 2.23 2.09 2.00 1.93 1.87 1.83 1.79 1.76
.050 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08
.010 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.89 2.80
.001 12.61 8.25 6.59 5.70 5.13 4.73 4.44 4.21 4.02 3.87
 60
.100 2.79 2.39 2.18 2.04 1.95 1.87 1.82 1.77 1.74 1.71
.050 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99
.010 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 2.63
.001 11.97 7.77 6.17 5.31 4.76 4.37 4.09 3.86 3.69 3.54
 90
.100 2.76 2.36 2.15 2.01 1.91 1.84 1.78 1.74 1.70 1.67
.050 3.95 3.10 2.71 2.47 2.32 2.20 2.11 2.04 1.99 1.94
.010 6.93 4.85 4.01 3.53 3.23 3.01 2.84 2.72 2.61 2.52
.001 11.57 7.47 5.91 5.06 4.53 4.15 3.87 3.65 3.48 3.34
120
.100 2.75 2.35 2.13 1.99 1.90 1.82 1.77 1.72 1.68 1.65
.050 3.92 3.07 2.68 2.45 2.29 2.18 2.09 2.02 1.96 1.91
.010 6.85 4.79 3.95 3.48 3.17 2.96 2.79 2.66 2.56 2.47
.001 11.38 7.32 5.78 4.95 4.42 4.04 3.77 3.55 3.38 3.24
240
.100 2.73 2.32 2.10 1.97 1.87 1.80 1.74 1.70 1.65 1.63
.050 3.88 3.03 2.64 2.41 2.25 2.14 2.04 1.98 1.92 1.87
.010 6.74 4.69 3.86 3.40 3.09 2.88 2.71 2.59 2.48 2.40
.001 11.10 7.11 5.60 4.78 4.25 3.89 3.62 3.41 3.24 3.09
∞
.100 2.71 2.30 2.08 1.94 1.85 1.77 1.72 1.67 1.63 1.60
.050 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83
.010 6.63 4.61 3.78 3.32 3.02 2.80 2.64 2.51 2.41 2.32
.001 10.83 6.91 5.42 4.62 4.10 3.74 3.47 3.27 3.10 2.96
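Each F critical value is the (1 − α) quantile of the F distribution for the given numerator and denominator df. A minimal sketch, assuming SciPy:

    from scipy.stats import f  # SciPy assumed available

    # F critical values for numerator df = 4, denominator df = 10
    for alpha in (0.100, 0.050, 0.010, 0.001):
        # e.g., alpha = .05 gives 3.48, matching Table VIII
        print("alpha = %.3f: %.2f" % (alpha, f.ppf(1 - alpha, 4, 10)))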

Table IX(a)  Studentized range critical values (α = .05)

Error df \ k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

  1 18.00 27.00 32.80 37.10 40.40 43.10 45.40 47.40 49.10 50.60 52.00 53.20 54.30 55.40 56.30 57.20 58.00 58.80 59.60
  2 6.08 8.33 9.80 10.90 11.70 12.40 13.00 13.50 14.00 14.40 14.70 15.10 15.40 15.70 15.90 16.10 16.40 16.60 16.80
  3 4.50 5.91 6.82 7.50 8.04 8.48 8.85 9.18 9.46 9.72 9.95 10.20 10.30 10.50 10.70 10.80 11.00 11.10 11.20
  4 3.93 5.04 5.76 6.29 6.71 7.05 7.35 7.60 7.83 8.03 8.21 8.37 8.52 8.66 8.79 8.91 9.03 9.13 9.23
  5 3.64 4.60 5.22 5.67 6.03 6.33 6.58 6.80 6.99 7.17 7.32 7.47 7.60 7.72 7.83 7.93 8.03 8.12 8.21
  6 3.46 4.34 4.90 5.30 5.63 5.90 6.12 6.32 6.49 6.65 6.79 6.92 7.03 7.14 7.24 7.34 7.43 7.51 7.59
  7 3.34 4.16 4.68 5.06 5.36 5.61 5.82 6.00 6.16 6.30 6.43 6.55 6.66 6.76 6.85 6.94 7.02 7.10 7.17
  8 3.26 4.04 4.53 4.89 5.17 5.40 5.60 5.77 5.92 6.05 6.18 6.29 6.39 6.48 6.57 6.65 6.73 6.80 6.87
  9 3.20 3.95 4.41 4.76 5.02 5.24 5.43 5.59 5.74 5.87 5.98 6.09 6.19 6.28 6.36 6.44 6.51 6.58 6.64
 10 3.15 3.88 4.33 4.65 4.91 5.12 5.30 5.46 5.60 5.72 5.83 5.93 6.03 6.11 6.19 6.27 6.34 6.40 6.47
 11 3.11 3.82 4.26 4.57 4.82 5.03 5.20 5.35 5.49 5.61 5.71 5.81 5.90 5.98 6.06 6.13 6.20 6.27 6.33
 12 3.08 3.77 4.20 4.51 4.75 4.95 5.12 5.27 5.39 5.51 5.61 5.71 5.80 5.88 5.95 6.02 6.09 6.15 6.21
 13 3.06 3.73 4.15 4.45 4.69 4.88 5.05 5.19 5.32 5.43 5.53 5.63 5.71 5.79 5.86 5.93 5.99 6.05 6.11
 14 3.03 3.70 4.11 4.41 4.64 4.83 4.99 5.13 5.25 5.36 5.46 5.55 5.64 5.71 5.79 5.85 5.91 5.97 6.03
 15 3.01 3.67 4.08 4.37 4.59 4.78 4.94 5.08 5.20 5.31 5.40 5.49 5.57 5.65 5.72 5.78 5.85 5.90 5.96
 16 3.00 3.65 4.05 4.33 4.56 4.74 4.90 5.03 5.15 5.26 5.35 5.44 5.52 5.59 5.66 5.73 5.79 5.84 5.90
 17 2.98 3.63 4.02 4.30 4.52 4.70 4.86 4.99 5.11 5.21 5.31 5.39 5.47 5.54 5.61 5.67 5.73 5.79 5.84
 18 2.97 3.61 4.00 4.28 4.49 4.67 4.82 4.96 5.07 5.17 5.27 5.35 5.43 5.50 5.57 5.63 5.69 5.74 5.79
 19 2.96 3.59 3.98 4.25 4.47 4.65 4.79 4.92 5.04 5.14 5.23 5.31 5.39 5.46 5.53 5.59 5.65 5.70 5.75

 20 2.95 3.58 3.96 4.23 4.45 4.62 4.77 4.90 5.01 5.11 5.20 5.28 5.36 5.43 5.49 5.55 5.61 5.66 5.71
 24 2.92 3.53 3.90 4.17 4.37 4.54 4.68 4.81 4.92 5.01 5.10 5.18 5.25 5.32 5.38 5.44 5.49 5.55 5.59
 30 2.89 3.49 3.85 4.10 4.30 4.46 4.60 4.72 4.82 4.92 5.00 5.08 5.15 5.21 5.27 5.33 5.38 5.43 5.47
 40 2.86 3.44 3.79 4.04 4.23 4.39 4.52 4.63 4.73 4.82 4.90 4.98 5.04 5.11 5.16 5.22 5.27 5.31 5.36
 60 2.83 3.40 3.74 3.98 4.16 4.31 4.44 4.55 4.65 4.73 4.81 4.88 4.94 5.00 5.06 5.11 5.15 5.20 5.24
120 2.80 3.36 3.68 3.92 4.10 4.24 4.36 4.47 4.56 4.64 4.71 4.78 4.84 4.90 4.95 5.00 5.04 5.09 5.13
∞ 2.77 3.31 3.63 3.86 4.03 4.17 4.29 4.39 4.47 4.55 4.62 4.68 4.74 4.80 4.85 4.89 4.93 4.97 5.01

Table IX(b)  Studentized range critical values (α = .01)

Error df \ k 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

  1 90.00 135.00 164.80 186.10 202.40 216.00 227.00 237.00 246.00 253.00 260.00 266.00 272.00 277.00 282.00 286.00 290.00 294.00 298.00
  2 14.00 19.00 22.30 24.70 26.60 28.20 29.50 30.70 31.70 32.60 33.40 34.10 34.80 35.40 36.00 36.50 37.00 37.50 37.90
  3 8.26 10.60 12.20 13.30 14.20 15.00 15.60 16.20 16.70 17.10 17.50 17.90 18.20 18.50 18.80 19.10 19.30 19.50 19.80
  4 6.51 8.12 9.17 9.96 10.60 11.10 11.50 11.90 12.30 12.60 12.80 13.10 13.30 13.50 13.70 13.90 14.10 14.20 14.40
  5 5.70 6.97 7.80 8.42 8.91 9.32 9.67 9.97 10.20 10.50 10.70 10.90 11.10 11.20 11.40 11.60 11.70 11.80 11.90
  6 5.24 6.33 7.03 7.56 7.97 8.32 8.61 8.87 9.10 9.30 9.49 9.65 9.81 9.95 10.10 10.20 10.30 10.40 10.50
  7 4.95 5.92 6.54 7.01 7.37 7.68 7.94 8.17 8.37 8.55 8.71 8.86 9.00 9.12 9.24 9.35 9.46 9.55 9.65
  8 4.74 5.63 6.20 6.63 6.96 7.24 7.47 7.68 7.87 8.03 8.18 8.31 8.44 8.55 8.66 8.76 8.85 8.94 9.03
  9 4.60 5.43 5.96 6.35 6.66 6.91 7.13 7.32 7.49 7.65 7.78 7.91 8.03 8.13 8.23 8.32 8.41 8.49 8.57
 10 4.48 5.27 5.77 6.14 6.43 6.67 6.87 7.05 7.21 7.36 7.48 7.60 7.71 7.81 7.91 7.99 8.07 8.15 8.22
 11 4.39 5.14 5.62 5.97 6.25 6.48 6.67 6.84 6.99 7.13 7.25 7.36 7.46 7.56 7.65 7.73 7.81 7.88 7.95
 12 4.32 5.04 5.50 5.84 6.10 6.32 6.51 6.67 6.81 6.94 7.06 7.17 7.26 7.36 7.44 7.52 7.59 7.66 7.73
 13 4.26 4.96 5.40 5.73 5.98 6.19 6.37 6.53 6.67 6.79 6.90 7.01 7.10 7.19 7.27 7.34 7.42 7.48 7.55
 14 4.21 4.89 5.32 5.63 5.88 6.08 6.26 6.41 6.54 6.66 6.77 6.87 6.96 7.05 7.12 7.20 7.27 7.33 7.39
 15 4.17 4.83 5.25 5.56 5.80 5.99 6.16 6.31 6.44 6.55 6.66 6.76 6.84 6.93 7.00 7.07 7.14 7.20 7.26
 16 4.13 4.78 5.19 5.49 5.72 5.92 6.08 6.22 6.35 6.46 6.56 6.66 6.74 6.82 6.90 6.97 7.03 7.09 7.15
 17 4.10 4.74 5.14 5.43 5.66 5.85 6.01 6.15 6.27 6.38 6.48 6.57 6.66 6.73 6.80 6.87 6.94 7.00 7.05
 18 4.07 4.70 5.09 5.38 5.60 5.79 5.94 6.08 6.20 6.31 6.41 6.50 6.58 6.65 6.72 6.79 6.85 6.91 6.96
 19 4.05 4.67 5.05 5.33 5.55 5.73 5.89 6.02 6.14 6.25 6.34 6.43 6.51 6.58 6.65 6.72 6.78 6.84 6.89
 20 4.02 4.64 5.02 5.29 5.51 5.69 5.84 5.97 6.09 6.19 6.29 6.37 6.45 6.52 6.59 6.65 6.71 6.76 6.82
 24 3.96 4.54 4.91 5.17 5.37 5.54 5.69 5.81 5.92 6.02 6.11 6.19 6.26 6.33 6.39 6.45 6.51 6.56 6.61
 30 3.89 4.45 4.80 5.05 5.24 5.40 5.54 5.65 5.76 5.85 5.93 6.01 6.08 6.14 6.20 6.26 6.31 6.36 6.41
 40 3.82 4.37 4.70 4.93 5.11 5.27 5.39 5.50 5.60 5.69 5.77 5.84 5.90 5.96 6.02 6.07 6.12 6.17 6.21
 60 3.76 4.28 4.60 4.82 4.99 5.13 5.25 5.36 5.45 5.53 5.60 5.67 5.73 5.79 5.84 5.89 5.93 5.98 6.02
120 3.70 4.20 4.50 4.71 4.87 5.01 5.12 5.21 5.30 5.38 5.44 5.51 5.56 5.61 5.66 5.71 5.75 5.79 5.83
∞ 3.64 4.12 4.40 4.60 4.76 4.88 4.99 5.08 5.16 5.23 5.29 5.35 5.40 5.45 5.49 5.54 5.57 5.61 5.65

Source: From E. S. Pearson and H. O. Hartley, Biometrika Tables for Statisticians, 1: 176–77. Reproduced by permission of the Biometrika Trustees.
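Studentized range critical values q(α; k, error df) are also available in software; the sketch below assumes SciPy version 1.7 or later, which added this distribution (the text does not reference it):

    from scipy.stats import studentized_range  # assumes SciPy >= 1.7

    # q critical values for k = 4 means and error df = 20 (Table IX)
    for alpha in (0.05, 0.01):
        # e.g., alpha = .05 gives 3.96; alpha = .01 gives 5.02
        print("alpha = %.2f: %.2f" % (alpha, studentized_range.ppf(1 - alpha, 4, 20)))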


Table X   Critical values for Dunnett’s Method

Two-sided comparisons

k = number of treatment means (excluding control)

Error df \ k 1 2 3 4 5 6 7 8 9

  5 2.57 3.03 3.29 3.48 3.62 3.73 3.82 3.90 3.97
  6 2.45 2.86 3.10 3.26 3.39 3.49 3.57 3.64 3.71
  7 2.36 2.75 2.97 3.12 3.24 3.33 3.41 3.47 3.53
  8 2.31 2.67 2.88 3.02 3.13 3.22 3.29 3.35 3.41
  9 2.26 2.61 2.81 2.95 3.05 3.14 3.20 3.26 3.32
 10 2.23 2.57 2.76 2.89 2.99 3.07 3.14 3.19 3.24
 11 2.20 2.53 2.72 2.84 2.94 3.02 3.08 3.14 3.19
 12 2.18 2.50 2.68 2.81 2.90 2.98 3.04 3.09 3.14
 13 2.16 2.48 2.65 2.78 2.87 2.94 3.00 3.06 3.10
 14 2.14 2.46 2.63 2.75 2.84 2.91 2.97 3.02 3.07
 15 2.13 2.44 2.61 2.73 2.82 2.89 2.95 3.00 3.04
 16 2.12 2.42 2.59 2.71 2.80 2.87 2.92 2.97 3.02
 17 2.11 2.41 2.58 2.69 2.78 2.85 2.90 2.95 3.00
 18 2.10 2.40 2.56 2.68 2.76 2.83 2.89 2.94 2.98
 19 2.09 2.39 2.55 2.66 2.75 2.81 2.87 2.92 2.96
 20 2.09 2.38 2.54 2.65 2.73 2.80 2.86 2.90 2.95
 24 2.06 2.35 2.51 2.61 2.70 2.76 2.81 2.86 2.90
 30 2.04 2.32 2.47 2.58 2.66 2.72 2.77 2.82 2.86
 40 2.02 2.29 2.44 2.54 2.62 2.68 2.73 2.77 2.81
 60 2.00 2.27 2.41 2.51 2.58 2.64 2.69 2.73 2.77
120 1.98 2.24 2.38 2.47 2.55 2.60 2.65 2.69 2.73
∞ 1.96 2.21 2.35 2.44 2.51 2.57 2.61 2.65 2.69
a Reproduced with permission from C. W. Dunnett, “New Tables for Multiple Comparison with a
Control,” Biometrics, Vol. 20, No. 3, 1964, and from C. W. Dunnett, “A Multiple Comparison Procedure
for Comparing Several Treatments with a Control,” Journal of the American Statistical Association, Vol. 50,
1955.


Table XI   Control chart constants

                   Process variation         Process average           Process standard deviation

Sample
size (n)    D3     D4     B3     B4     A2     A3     A6     A7     d2      c4      d3

 2 0.000 3.267 0.000 3.267 1.880 2.659 1.880 1.880 1.128 0.7979 0.853
 3 0.000 2.574 0.000 2.568 1.023 1.954 1.187 1.067 1.693 0.8862 0.888
 4 0.000 2.282 0.000 2.266 0.729 1.628 0.796 0.796 2.059 0.9213 0.880
 5 0.000 2.114 0.000 2.089 0.577 1.427 0.691 0.660 2.326 0.9400 0.864
 6 0.000 2.004 0.030 1.970 0.483 1.287 0.549 0.580 2.534 0.9515 0.848
 7 0.076 1.924 0.118 1.882 0.419 1.182 0.509 0.521 2.704 0.9594 0.833
 8 0.136 1.864 0.185 1.815 0.373 1.099 0.434 0.477 2.847 0.9650 0.820
 9 0.184 1.816 0.239 1.761 0.337 1.032 0.412 0.444 2.970 0.9693 0.808
10 0.223 1.777 0.284 1.716 0.308 0.975 0.365 0.419 3.078 0.9727 0.797
11 0.256 1.744 0.321 1.679 0.285 0.927 0.350 0.399 3.173 0.9754 0.787
12 0.283 1.717 0.354 1.646 0.266 0.886 0.317 0.382 3.258 0.9776 0.778
13 0.307 1.693 0.382 1.618 0.249 0.850 0.306 0.368 3.336 0.9794 0.770
14 0.328 1.672 0.406 1.594 0.235 0.817 0.282 0.356 3.407 0.9810 0.763
15 0.347 1.653 0.428 1.572 0.223 0.789 0.274 0.346 3.472 0.9823 0.756
16 0.363 1.637 0.448 1.552 0.212 0.763 0.257 0.337 3.532 0.9835 0.750
17 0.378 1.622 0.466 1.534 0.203 0.739 0.250 0.329 3.588 0.9845 0.744
18 0.391 1.608 0.482 1.518 0.194 0.718 0.237 0.322 3.640 0.9854 0.739
19 0.403 1.597 0.497 1.503 0.187 0.698 0.231 0.315 3.689 0.9862 0.734
20 0.415 1.585 0.510 1.490 0.180 0.680 0.218 0.308 3.735 0.9869 0.729
21 0.425 1.575 0.523 1.477 0.173 0.663 0.215 0.303 3.778 0.9876 0.724
22 0.434 1.566 0.534 1.466 0.167 0.647 0.204 0.298 3.819 0.9882 0.720
23 0.443 1.557 0.545 1.455 0.162 0.633 0.202 0.292 3.858 0.9887 0.716
24 0.451 1.548 0.555 1.445 0.157 0.619 0.192 0.288 3.895 0.9892 0.712
25 0.459 1.541 0.565 1.435 0.153 0.606 0.191 0.284 3.931 0.9896 0.708
a Values in this table were generated using MathCAD version 3.1 software.
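Two of these constants are easy to cross-check for normal samples: c4 has a closed form in gamma functions, and d2 is an integral involving the normal cdf. Both formulas are standard results not derived in this book; the sketch assumes SciPy:

    from math import exp, lgamma, sqrt
    from scipy.integrate import quad  # SciPy assumed available
    from scipy.stats import norm

    def c4(n):
        # c4 = sqrt(2/(n - 1)) * Gamma(n/2) / Gamma((n - 1)/2)
        return sqrt(2.0 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

    def d2(n):
        # d2 = integral of 1 - Phi(x)**n - (1 - Phi(x))**n over x
        g = lambda x: 1 - norm.cdf(x) ** n - (1 - norm.cdf(x)) ** n
        return quad(g, -8, 8)[0]

    print(round(c4(5), 4), round(d2(5), 3))  # Table XI: 0.9400 and 2.326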


Table XII  Approximate critical values for the Ryan-Joiner test of normality

(rows: sample size n; columns: significance level)

         .10      .05      .01

  4 .8951 .8734 .8318
  5 .9033 .8804 .8319
  6 .9114 .8893 .8409
  7 .9186 .8978 .8517
  8 .9248 .9054 .8622
  9 .9301 .9121 .8718
10 .9347 .9179 .8804
11 .9387 .9230 .8880
12 .9422 .9275 .8947
13 .9454 .9315 .9008
14 .9481 .9351 .9061
15 .9506 .9383 .9109
16 .9529 .9411 .9153
17 .9549 .9437 .9192
18 .9567 .9461 .9228
19 .9584 .9483 .9260
20 .9600 .9503 .9290
25 .9662 .9582 .9407
30 .9707 .9639 .9490
40 .9767 .9715 .9597
50 .9807 .9764 .9664
60 .9835 .9799 .9709
75 .9865 .9835 .9756

Source: Minitab Reference Manual.
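The Ryan-Joiner statistic is the correlation between the ordered sample and normal scores, so critical values like these can be approximated by simulation. A Monte Carlo sketch (the helper name rj_critical is illustrative, and the plotting positions (i − 3/8)/(n + 1/4) are the convention usually cited for this test, assumed here):

    import numpy as np
    from scipy.stats import norm  # NumPy/SciPy assumed available

    def rj_critical(n, alpha, reps=20000, seed=1):
        # Correlate sorted normal samples with normal scores; take the
        # lower alpha-quantile of the simulated correlations
        rng = np.random.default_rng(seed)
        i = np.arange(1, n + 1)
        scores = norm.ppf((i - 0.375) / (n + 0.25))
        rs = [np.corrcoef(np.sort(rng.standard_normal(n)), scores)[0, 1]
              for _ in range(reps)]
        return float(np.quantile(rs, alpha))

    print(round(rj_critical(20, 0.05), 3))  # Table XII lists .9503 for n = 20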

Answers to Odd-Numbered Exercises

Chapter 1

 1. a.  5 | 9
        6 | 3 8 8 5 3                    stem: ones
        7 | 2 3 0 6 0 9 8 4 7 8 7        leaf: tenths
        8 | 1 2 7
        9 | 0 7 7
       10 | 7
       11 | 6 3 8
       A value close to 8.0 is representative. There appears to be a
       substantial amount of dispersion in the data.
    b. There is clearly asymmetry, a skewness toward larger values
       (positive skewness).
    c. No    d. .148, or roughly 15%

 3. 3L | 1
    3H | 5 6 6 7 8                       stem: tenths
    4L | 0 0 0 1 1 2 2 2 2 2 3 4         leaf: hundredths
    4H | 5 6 6 7 8 8 8
    5L | 1 4 4
    5H | 5 8
    6L | 2
    6H | 6 6 7 8
    7L |
    7H | 5
    A specific gravity of roughly .45 is typical. The data spreads out
    quite a bit about this typical value. There is asymmetry in the
    distribution of values. The observation .75 appears at first glance
    to be a “mild” outlier, but this is simply a consequence of using
    repeated stems.

 5. a. Two-digit stems. One-digit stems would give a display with too
       few rows to be informative, and three-digit stems would result
       in far too many rows.
    b. 64 | 33 35 64 70
       65 | 06 26 27 83                  stem: thousands and hundreds
       66 | 05 14 94                     leaf: tens and ones
       67 | 00 13 45 70 70 90 98
       68 | 50 70 73 90
       69 | 00 04 27 36
       70 | 05 11 22 40 50 51
       71 | 05 13 31 65 68 69
       72 | 09 80
    c. 64 | 3 3 6 7
       65 | 0 2 2 8                      stems: thousands and hundreds
       66 | 0 1 9                        leaf: tens
       67 | 0 1 4 7 7 9 9
       68 | 5 7 7 9
       69 | 0 0 2 3
       70 | 0 1 2 4 5 5
       71 | 0 1 3 6 6 6
       72 | 0 8
       The second display is essentially as informative as the first.
       With 200 observations, the first display would be very cumbersome.


7. a. # Nonconforming   Frequency   Rel. freq.
            0                7         .117
            1               12         .200
            2               13         .217
            3               14         .233
            4                6         .100
            5                3         .050
            6                3         .050
            7                1         .017
            8                1         .017
                            60        1.001
      (The relative frequencies sum to 1.001 rather than 1 because of rounding.)
   b. .917, .867, 1 − .867 = .133
   c. The histogram has a substantial positive skew. It is centered somewhere between 2 and 3 and spreads out quite a bit about its center.
9. a. .99 (99%), .71 (71%)
   b. .64 (64%), .44 (44%)
   c. Strictly speaking, the histogram is not unimodal, but is close to being so with a moderate positive skew. A much larger sample size would likely give a smoother picture.
11. a. 0 | 3 3 9 5 5 9 4 5 1 5 2 3      stem: thousands
       1 | 2 2 0 0 3 2 1 8 6 8 4        leaf: hundreds
       2 | 1 4 1 2 3 4 4 7 7 1
       3 | 0 3 3 3 8 1 1
       4 | 3 7
       5 | 3 7 2 8 7
    A typical value is one in the low 2000s; there is much variability in the data, no gaps, and the display is close to being unimodal with a positive skew.
    b. Class          Frequency   Rel. Freq.
          0–<1000        12          .255
       1000–<2000        11          .234
       2000–<3000        10          .213
       3000–<4000         7          .149
       4000–<5000         2          .043
       5000–<6000         5          .106
                         47         1.000
       .489, .149; see the description in (a)
13. a. 589/1570 = .3752
    b. 1 − (589 + 190 + 176 + 157 + 115)/1570 = .2185
    c. (115 + 89 + 57 + 55 + 33 + 31)/1570 = .2420
    d. The shape of this histogram is positively skewed.
    [Histogram of herd size: percent of herds versus herd size, showing the strong positive skew.]
15. a. Yes, .518.
    b. .152.
    c. .408.
    d. The distribution is heavily positively skewed. Though angles can range from 0° to 90°, approximately 85% of all angles are less than 30°.
    [Histogram of angle: density versus angle (0° to 90°), showing the heavy positive skew.]
17. Class          Freq.   Rel. Freq.
    4000–<4200       1        .01
    4200–<4400       2        .02
    4400–<4600       9        .09
    4600–<4800      13        .13
    4800–<5000      18        .18
    5000–<5200      22        .22
    5200–<5400      20        .20
    5400–<5600       7        .07
    5600–<5800       7        .07
    5800–<6000       1        .01
                   100       1.00
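Tables like those in answers 7, 11(b), and 17 are easy to reproduce in software. The R sketch below tabulates hypothetical strength data into the classes of Exercise 17 and draws the matching histogram; the data vector is made up purely for illustration.

    # Frequency and relative-frequency table from raw data (illustrative).
    strength <- runif(100, 4000, 6000)              # hypothetical observations
    breaks   <- seq(4000, 6000, by = 200)           # class boundaries as in answer 17
    freq     <- table(cut(strength, breaks, right = FALSE))
    cbind(Freq = freq, RelFreq = round(prop.table(freq), 2))
    hist(strength, breaks = breaks, right = FALSE)  # histogram on the same classes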

    The histogram is quite symmetric and indeed approximately bell-shaped. A representative strength value is something in the neighborhood of 5000; the data spreads out rather substantially about this representative value.
19. a. (1/2)(2) = 1    b. .5    c. 5    d. 5.8
21. a. The density curve is a triangle over the interval [0, 10]. Total area under curve = (1/2)(base)(height) = (1/2)(10)(.2) = 1.
       [Figure: triangular density of height .2 over the interval from 0 to 10.]
    b. Proportion(x ≤ 3) = .18, Proportion(x ≥ 7) = .18, Proportion(x ≥ 4) = .68, Proportion(4 < x < 7) = .50
    c. 7.7639
23. b. .449, .699, .148    c. 115,129.25; 251.26
25. a. .75    b. .5 (50%)    c. .367
27. a. .70, .45    b. .10    c. .65    d. .45
29.    x:   0    1    2
    p(x):  .3   .6   .1
31. a. .9625    b. .2912    c. .7881
    d. .3037    e. .0456    f. ≈0    g. ≈0
33. a. ≈1.04 or larger    b. ≈2.675 or smaller
    c. larger than 2.05 or smaller than −2.05
35. a. .8413    b. .9876    c. .0668
37. a. .9664    b. .2451    c. 45.62 km/h
39. a. .7967    b. .0004
    c. Those larger than .399
41. a. .8633    b. .8643, .8159
43. a. Proportion(x > 120) = Proportion(x ≥ 120) = .9834
    b. .0905
    c. 125.90
45. a. [Figure: density curve over the interval from 100 to 225.]
    b. .4602    c. .3636    d. 140.18
47. a. .0456    b. .8474    c. 6.592
49. a. .5517
    b. Proportion(x > 200) = Proportion(x ≥ 200) = .1587
51. a. .3099    b. .4035 − .0678 = .3357
    c. 90th percentile = 1.3657, 10th percentile = .2501
53. a. .3799    b. .1557    c. .5330
55. a. Proportion(x ≤ 2) = .677 (from Table II)
    b. Proportion(x ≥ 5) = .043
    c. Proportion(x ≥ 11) = .000
57. a. Proportion(x ≤ 10) = .01 (Table III, μ = 20)
    b. Proportion(x > 20) = .428
    c. Proportion(10 ≤ x ≤ 20) = .556, Proportion(10 < x < 20) = .461
59. Using Table III (μ = 20), Proportion(x ≥ 15) = .894, Proportion(x ≤ 25) = .902.
61. a. The histogram is reasonably symmetric and bell-shaped. A representative value is about 90.
    b. Proportion(x ≥ 85) = .9231. Proportion(x < 95) = .9053.
    c. .0355 + .0414 + .1006 + .1775 + .2544/2 = .4822
63. a. .4445    b. 2.107
65. a. .2946, .0708, .0222    b. .0348
    c. 254.3 separates the fastest 10% of all times from the slowest 90%.
    d. The distribution is quite positively skewed.
67. a. .82    b. .18    c. .65, .27
69. b. .8647, .1353, .4712
71. b. .491, .269    c. 5.12
    d. 15.85 separates the largest 10% from the others.
73. b.    x:    1     3     4     6    12
       p(x):  .30   .10   .05   .15   .40
    c. .30
75. .4423

Chapter 2

1. a. x̄ = 640.5, x̃ = 582.5. The average sale price for a home in this sample was $640,500. Half the sales were for less than $582,500.
   b. Mean becomes 610.5, median is unchanged.
   c. x̄tr(20) = 591.2 ($591,200).
   d. x̄tr(15) = (591.2 + 596.3)/2 = 593.75 ($593,750).
3. a.  2 | 0 4 5 6 6 7 7 8
       3 | 0 1 2 3 3 4 4 6 6 6 6 7     stem: ones
       4 | 4 6 7 8                     leaf: tenths
       5 | 3
       6 |
       7 |
       8 |
       9 |
      10 | 1
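Several Chapter 2 answers (e.g., 1 and 7) involve means, medians, and trimmed means, all of which are one-liners in R. The sale-price vector below is hypothetical and simply shows the calls.

    # Mean, median, and trimmed means (illustrative data).
    price <- c(590, 582, 640, 596, 425, 812, 744, 583, 591, 602)  # hypothetical
    mean(price)                # sample mean
    median(price)              # sample median
    mean(price, trim = 0.20)   # 20% trimmed mean (drops 20% from each tail)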

    Due to the strong positive skew, the sample mean will be greater than the sample median.
   b. x̄ = 3.654, x̃ = 3.35
   c. By any amount. By no more than 6.7.
5. Due to the unusually large observation 59.31, the sample mean will be greater than the sample median. Since the mean can be inflated when an unusually large observation exists, the median (31.28) appears to be a more representative value.
7. x̃ = 68.0, 20% x̄tr = 66.2, 30% x̄tr = 67.5
9. a. 4/3 because of skewness
   b. 1.414, so μ < μ̃ because of negative skewness.
   c. .615, .707
11. μ = 1.614, μ̃ = 1.64, .032 (a bit more than 3% of all weeks)
13. 1.8
15. a. x̄ = 1939.367, and the deviations are 66.733, 125.833, 179.533, −252.767, 27.533, and −146.867.
    b. 27747.695, 166.576
17. a. Group 1 has mean = 9.86, SD = 2.67. Group 2 has mean = 8.93, SD = 2.37.
    b. Group 1 has range = 7, Group 2 has range = 8.
    c. [Comparative dotplots of Group 1 and Group 2 on a common scale from 4.8 to 13.2.]
    d. The standard deviation measures spread by incorporating the deviation of each observation from the sample mean. Many observations of Group 2 are clustered near its sample mean of 8.93, whereas the observations of Group 1 are farther away from its sample mean of 9.86. So, although the Group 2 data exhibits a larger range, it also yields the smaller standard deviation.
19. The sample mean of 17.67 can be considered a representative value for this data. The standard deviation is 6.41. In general, the size of a typical deviation from the sample mean is about 6.41. Some observations may deviate from 17.67 by a little more than this, some by less.
21. 76,683 and 76,910
23. a. .785    b. .688
25. a. 1.72    b. .3, 0
27. .423
29. σ² = nπ(1 − π) = (25)(.20)(.80) = 4, σ = 2.
    P(x > μ + 2σ) = P(x > 5 + 2(2)) = P(x > 9) = .017
31. .135
33. a. Lower quartile = 122, upper quartile = 135, IQR = 13
    b. The proximity of the upper quartile to the median suggests a negative skew. The variation seems quite large and there do not appear to be any outliers.
    c. Observations less than 102.5 and greater than 154.5 would be outliers, and observations less than 83 and greater than 174 would be extreme outliers.
    d. Decrease the maximum by any amount and the IQR remains unchanged.
35. μ = 3.51, σ = .146
37. min = 16; lower quartile = 87; median = 140; upper quartile = 210; max = 403. A mild high outlier is above 394.5 N and an extreme high outlier is above 579 N. The value 403 N is a mild outlier. The distribution has positive skew.
39. The most noticeable feature of the comparative boxplots is that machine 2's sample values have considerably more variation than do machine 1's sample values. However, a typical value, as measured by the median, seems to be about the same for the two machines. The only outlier that exists is from machine 1.
41. The endotoxin concentration in urban homes generally exceeds that in farm homes. The range of endotoxin concentrations for urban homes exceeds that for farm homes. For the urban homes data, there is one mild outlier (1) and one extreme outlier (104). For the farm homes data there is one mild outlier (64).
43. a. IQR = (qu − ql) = (133.44 − 97.43) = 36.01
    b. IQR = (13.34 − 9.74) = 3.6
45. The general pattern is reasonably straight and a departure from linearity is not clear-cut. One should not rule out normality of the tension distribution.
47. The plot shows some nontrivial departures from linearity, especially in the lower tail of the distribution. This indicates a normal distribution might not be a good fit to the population distribution of clubhead velocities for female golfers.
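The binomial facts used in answer 29 are quick to verify in R; pbinom gives the exact tail probability.

    # Binomial mean/SD and an exact tail probability (checks answer 29).
    n <- 25; p <- 0.20
    n * p                   # mean = 5
    sqrt(n * p * (1 - p))   # sd = 2
    1 - pbinom(9, n, p)     # P(X > 9) = P(X >= 10), approximately .017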

49. The corresponding probability plot appears sufficiently straight to lead us to agree with the argument that the distribution of fracture toughness in concrete specimens could well be modeled by a Weibull distribution.
51. Clearly, the variable IDT is not normally distributed, since its normal quantile plot is nonlinear. IDT is likely to be lognormally distributed since the normal quantile plot of ln(IDT) is quite linear.
53. a. Clearly, the variable, hourly median power, is not normally distributed, as the normal quantile plot is curvilinear.
    b. By taking the natural logarithm of the variable and constructing a normal quantile plot, the plot looks quite linear, indicating that it is plausible that these observations were sampled from a lognormal distribution.
55. The corresponding histogram shows the noise distribution is bimodal (but close to unimodal) with a positive skew and no outliers. The mean noise level is 64.89 dB and the median noise level is 64.7 dB. The IQR of the noise measurements is about 70.4 − 57.8 = 12.6 dB.
57. b. x̄16 = 12.53125, s16 = .532
59. a. The initial Se concentrations in the treatment and control groups are not that different. The median initial Se concentrations for the treatment and control groups are 10.3 mg/L and 10.5 mg/L, respectively, each with IQR of about 1.25 mg/L. So, the two groups of cows are comparable at the beginning of the study.
    b. The final Se concentrations of the two groups are extremely different. The median final Se concentration for the control group is 9.3 mg/L; the median Se concentration in the treatment group is now 103.9 mg/L, nearly a 10-fold increase.
61. a. Percentage within     Chebyshev's Rule     Empirical Rule
       1σ                    No statement         About 68%
       2σ                    At least 75%         About 95%
       3σ                    At least 89%         About 99.7%
       Chebyshev's inequality is more conservative than is the empirical rule.
    b. Percentage within     Chebyshev's Rule     Exponential
       1σ                    No statement         86.47%
       2σ                    At least 75%         95.02%
       3σ                    At least 89%         98.17%
    c. Chebyshev's inequality may not accurately estimate any particular distribution as it must accommodate all distributions.
63. a. x̄tr(6.7) = 10.67, x̄tr(13.3) = 10.58
    b. x̄tr(10) = (10.67 + 10.58)/2 = 10.625
    c. Interpolate between x̄tr(6.25) and x̄tr(12.5) to obtain x̄tr(10)
65. The mean and the midrange are sensitive to outliers. The median, the trimmed mean, and the midhinge are not sensitive to outliers.
67. a. Aortic root diameters for males have mean 3.64 cm, median 3.70 cm, standard deviation 0.269 cm, and IQR 0.40. The corresponding values for females are x̄ = 3.28 cm, x̃ = 3.15 cm, s = 0.478 cm, and IQR = 0.50 cm. Aortic root diameters are typically smaller for females than for males, and females show more variability. The distribution for males is negatively skewed, while the distribution for females is positively skewed.
    b. For females (n = 10), the 10% trimmed mean is the average of the middle 8 observations: x̄tr(10) = 3.24 cm. For males (n = 13), the 1/13 trimmed mean is 40.2/11 = 3.6545, and the 2/13 trimmed mean is 32.8/9 = 3.6444. Interpolating, the 10% trimmed mean is x̄tr(10) = 0.7(3.6545) + 0.3(3.6444) = 3.65 cm.
69. .0228, .1587

Chapter 3

1. The scatterplot exhibits a negative linear association between the variables.
3. The scatterplot exhibits a positive linear association between the variables. One unusual observation (with # beds = 68) deviates from the linear pattern.
5. b. Yes
   c. There appears to be an appropriate quadratic relationship (points fall closest to a parabola).
7. The scatterplot exhibits a negative linear association between the variables.
9. a. Positive    b. Negative    c. Positive
   d. Little or none    e. Negative    f. Little or none
11. r = .4806; a weak to moderate linear correlation exists.
13. If, for example, 18 is the minimum age of eligibility, then for most people y ≈ x − 18.
15. −.9
17. a. .733    b. .9985
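The "Exponential" column in answer 61(b) can be reproduced directly: for an exponential distribution the mean and standard deviation are equal, so the probability within k standard deviations of the mean reduces to a single pexp call. A minimal R check (the rate 1 is arbitrary; the result is the same for any rate):

    # P(|X - mu| <= k*sigma) for an exponential distribution (mu = sigma = 1/rate).
    k  <- 1:3
    mu <- 1                                # rate 1, so mu = sigma = 1
    pexp(mu + k * mu) - pexp(pmax(0, mu - k * mu))
    # 0.8647 0.9502 0.9817 -> the 86.47%, 95.02%, 98.17% entries in 61(b)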

19. a. ŷ = −305.88 + 9.96x. The coefficient of determination is .124, which is quite low. The linear regression model accounts for only 12.4% of the variability of colony density.
    b. ŷ = 34.37 + .78x. The coefficient of determination is .024, which is much lower than before. The linear regression model accounts for only 2.4% of the variability of colony density. The elimination of the observation has a drastic impact on the regression model.
21. a. The scatterplot reveals a roughly positive linear relationship.
    b. ŷ = −31.80 + .987x. A one-MPa increase in cube strength is associated with a .987 MPa increase in the predicted axial strength for these asphalt samples.
    c. r² = .630. That is, 63.0% of the observed variation in axial strength of asphalt samples of this type can be attributed to its linear relationship with cube strength.
    d. se = 6.625.
23. a. ŷ = 11.013 − .448x. A one percent increase in fiber weight is associated with a .448 MPa decrease in the predicted compressive strength.
    b. .694
    c. ŷ = 8.101 MPa
    d. The observed range for x was 0 to 10%. 25% is well outside this range and the extrapolated prediction could be unreliable. For x = 25, ŷ = −.187 MPa, a nonsensical value.
25. a. No; if the values of Cc were perfectly linearly related to the e0 values, then one line would exactly satisfy all points in the scatterplot.
    b. ŷ = −.144 + .337x
    c. .874
    d. ŷ = .227 when x = 1.10. Predicting y when x = .80 would not be advisable, as this is an example of extrapolation.
27. Data set #1: the scatterplot yields a rough linear relationship. Data set #2: the scatterplot reveals a quadratic relationship, so a linear relationship does not hold. Data set #3: the scatterplot shows a clear outlier. Without this observation, a linear relationship holds very well. Data set #4: the scatterplot (containing a clear outlier) shows a linear relationship does not hold.
29. a. It is not appropriate to fit a straight line to this data as there is clear curvature to the scatterplot.
    b. A scatterplot of (x, 1/y) yields rough linearity. The least squares line is 1/ŷ = .105 − 21.02x with corresponding r² = .868.
31. b. The ln(x) versus y transformation seems to do the best job, though it yields a somewhat low r² = .497.
    c. ŷ = .0197 − .0013·ln(5000) = .0086
33. a. No, there is a quadratic relationship between strength and thickness, so a quadratic model should be fit.
    b. ŷ = 14.521 + .0432x − .00006x². At x = 500, ŷ = 21.121. The residual plot shows no unusual pattern and R² = .780. The quadratic fit seems adequate.
35. a. ŷ = 4.479. Residual = 4.454 − 4.479 = −.025
    b. 1 − .03836/5.1109 = .9925
37. a. 92.34% of the observed variability in hydrocarbon deposition can be attributed to the given multiple regression model involving x1 and x2.
    b. ŷ = 37.476
    c. Yes, it is legitimate to interpret b2 in this way.
39. a. For ŷ = a + b1x1 + b2x2 + b3x3, R² = .0165. The second model gives R² = .9866. Clearly, the second model yields a superior fit to the data.
    b. ŷ = .3569, residual = −.1549.
    c. ŷ = .1801, residual = −.0219.
    d. The larger residual magnitude based on ŷ = a + b1x1 + b2x2 + b3x3 is reasonable given the corresponding low coefficient of determination.
41. a. a = 89.111, b1 = −.050, b2 = 6.564, b3 = −27.418, R² = .9175
    b. a = 55.703, b1 = .018, b2 = 8.719, b3 = −11.313, b4 = −.005, b5 = −.033, b6 = .105, R² = .9237
    c. a = 81.233, b1 = .123, b2 = −6.837, b3 = −42.035, b4 = −.005, b5 = −.033, b6 = .105, b7 = −.0001, b8 = 1.945, b9 = 10.241, R² = .9679
43. a. .030    b. .120    c. .105    d. 2.80    e. 4.90
45. 9375, .302, no
47. b. 35, 5, 26    c. .632
49. a. ŷ = 1.6932 + .0805x
    b. ŷ = −20.0514 + 12.1149x
    c. .975 for both regressions
51. a. 109.07    b. R² = .893
    c. ŷ < 0, which is ridiculous.
53. a. No
    b. ln(y) = −7.2557 + 8328.4/x, ŷ = 74.6, r² = .953
55. ln(y) = −3.7372 − .12395·ln(x), r² = 46.9%; y = .00829 when x = 5000
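Fits like those in answers 19-31 come straight from R's lm function; the snippet below shows the pattern on hypothetical data, including a reciprocal transformation of the kind used in answer 29(b).

    # Least squares fits and R-squared (illustrative data).
    x <- c(1, 2, 3, 4, 5, 6, 7, 8)
    y <- c(2.1, 2.9, 3.8, 5.2, 5.8, 7.1, 8.2, 8.8)   # hypothetical
    fit <- lm(y ~ x)
    coef(fit)                  # intercept and slope
    summary(fit)$r.squared     # coefficient of determination
    fit2 <- lm(I(1/y) ~ x)     # a transformed fit, as in answer 29(b)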

Chapter 4

1. Operational definitions are used to define measurement procedures. Benchmarks are existing objects or procedures used to compare two or more products or processes.
3. Example: temperature at 2:00 p.m. in a fixed, unshaded area on top of City Hall.
5. ISP ppm is an operational definition.
7. Here is one possibility. Divide the one-square-mile area of forest into 100 smaller regions, each of equal size (each area would be 1/10 of a mile by 1/10 of a mile). Call each region a cluster. Randomly sample n of these clusters. Within each sampled cluster, study all of the trees that are growing.
9. a. Both methods are capable of generating random samples from the block of trees. By Researcher A's suggestion the chance a tree is selected is (5/40)(6/25) = 0.03. Each tree has this same chance of selection, making this sampling scheme a random sampling scheme. By Researcher B's suggestion the chance of selection is (30/1000) = 0.03 for each tree.
   b. Stratified random sampling.
11. Use =RANDBETWEEN(1,1000), which uses sampling with replacement.
13. a. 10,000    b. 70    c. .0786    d. .0098
15. a. Since wi can be shown to be equal to Niσi / Σ(i=1..k) Niσi, this yields Neyman allocation.
    b. Since wi can be shown to be equal to Nj/(N1 + N2 + ⋯ + Nk) = Nj/N, this yields proportional allocation.
17. 2.576; the required sample size increases.
19. Biases tend to be eliminated when several measurements are averaged; but more importantly, the variation between repeated measurements gives a measure of experimental error.
21. a. Variation in fuel efficiency between 100-mile segments can be quantified if one measures fuel efficiency every segment. If one measures efficiency at the end of the 500-mile course, there is no measure of experimental error.
    b. The researcher should consider specifying those variables that may affect fuel efficiency. Examples include: type and condition of the vehicle, tire pressure, driving speed and style, environmental conditions, etc.
    c. To draw conclusions about the effectiveness of the new fuel additive, the researcher may want to assess the effectiveness under different experimental conditions by introducing experimental factors and blocking variables. For example, the researcher may wish to determine the effect that "vehicle type" has on the response.
23. Two basic experimental design principles are violated: replication and randomization.
25. x̄ = .3024, accuracy = (x̄ − x) = (.3024 − .300) = .0024. Precision = s = .0024083.
27. a. Measurement, m     Relative error
       .301                  .333%
       .303                 1%
       .299                 −.333%
       .305                 1.67%
       .304                 1.33%
    b. The maximum absolute error you would expect in a measured reading of 70 degrees Fahrenheit from this thermometer is (70)(.04) = 2.8 degrees Fahrenheit.
29. b. The Youden plot for this data shows many points near the 45-degree line, indicating that several of the laboratories are following slightly different versions of the test procedure. Lab 19 clearly made unusual measurements.
31. Suppose you take a random sample of size n with replacement. Then according to Rule 1 in Section 4.2, the complement of this random sample is also a random sample. Notice that the complement will contain no duplicates. Finally, using Rule 1 again, the complement of the complement will be a random sample and is equivalent to the original random sample but with duplicates discarded (i.e., a random sample without replacement).
33. a. The background samples of air would be used as a benchmark of the ambient levels of Cr(VI) in the air. Then the background samples can be compared to the plant samples in order to estimate the increase in Cr(VI) pollutant at chromite ore plants.
    b. ASTM Standard Test Method D5281-92 is the operational definition for how measurements are to be made. Using this method, the authors hope to reduce measurement variation so that any changes in Cr(VI) concentrations can be attributed to the chromite ore plants and not to variation in the measurement system.
    c. The location at which an air sample is taken can be considered an experimental factor (i.e., independent variable). The six sampling periods illustrate the experimental principle of replication. Distinguishing between wet and dry days constitutes blocking.
Chapter 5

1. a. There are 10 possible such samples of size 3: {a, b, c}, {a, b, d}, {a, b, e}, {a, c, d}, {a, c, e}, {a, d, e}, {b, c, d}, {b, c, e}, {b, d, e}, {c, d, e}
   b. A = {{a, b, c}, {a, c, d}, {a, c, e}}
   c. A′ = {{a, b, d}, {a, b, e}, {a, d, e}, {b, c, d}, {b, c, e}, {b, d, e}, {c, d, e}}
3. a. A and B is the event "either 4 or 5 defectives in the sample."
   b. A or B is the event "there is at least one defective in the sample."
   c. A′ is the event "there are at most 3 defectives in the sample."
5. [Tree diagram: parts either meet standards or do not; parts that do not meet standards are scrapped or have the crimp readjusted, after which they either meet standards or are scrapped.]
7. The event A and B is the shaded area where A and B overlap in a Venn diagram. Its complement consists of all events that are either not in A or not in B (or not in both). That is, the complement can be expressed as A′ or B′.
9. a. 1159 distinct joints were identified by the inspectors together.
   b. A and B′ contains 724 − 316 = 408 solder joints.
11. P(A1 or A2 or A3 or … or Ak) ≤ P(A1) + P(A2) + P(A3) + ⋯ + P(Ak) = .01 + .01 + ⋯ + .01 = 10(.01) = .10
13. a. P(A | E′) = P(A and E′)/P(E′) = P(A)/P(E′) = .20/(1 − .10) = .20/.90 = 20/90.
       P(B | E′) = P(B and E′)/P(E′) = P(B)/P(E′) = .25/(1 − .10) = .25/.90 = 25/90.
       P(C | E′) = P(C and E′)/P(E′) = P(C)/P(E′) = .15/(1 − .10) = .15/.90 = 15/90.
       P(D | E′) = P(D and E′)/P(E′) = P(D)/P(E′) = .30/(1 − .10) = .30/.90 = 30/90.
    b. P(A | B, D, E not chosen) = P(A)/(1 − (.25 + .30 + .10)) = .20/.35 = 20/35.
       P(C | B, D, E not chosen) = P(C)/(1 − (.25 + .30 + .10)) = .15/.35 = 15/35.
15. a. (.80)(.60) = .48
    b. .95 + (.05)(.80)(.60) = .974
    c. P(F | I) = P(F and I)/P(I) = .95/.974 = .9754
17. The probabilities of independent events A and B must satisfy the equation P(A and B) = P(A)·P(B). If A and B were also mutually exclusive, then P(A and B) would equal 0, which would mean that P(A)·P(B) = P(A and B) = 0. But P(A)·P(B) = .5·.6 = .3. So, A and B cannot be mutually exclusive.
19. a. (.42)(.42) = .1764    b. .01, .0016, .1936
    c. .1764 + .01 + .0016 + .1936 = .3816
    d. 1 − .3816 = .6184
21. .81 + .99 − .8019 = .9981
23. a. .9042    b. .7660
25. Using the addition law for exclusive events, P(B) = P(A and B) + P(A′ and B), which can be rearranged as P(A′ and B) = P(B) − P(A and B). Using the fact that A and B are independent, P(A and B) = P(A)·P(B), so P(A′ and B) = P(B) − P(A and B) = P(B) − P(A)·P(B) = [1 − P(A)]·P(B) = P(A′)·P(B), which shows that A′ and B are independent.
27. a. Discrete    b. Continuous    c. Discrete    d. Discrete    e. Continuous    f. Continuous    g. Discrete
29. a. 2.3    b. .81    c. 88.5 lb
31. a. k = 1/15    b. .40
    c. 11/3 = 3.667    d. 1.2472
33. a. Mean = 2.85; standard deviation = 1.6797
    b. .05702    c. .77883
35. a. Binomial; mean = 50
    b. Normal approximation (with continuity correction) to binomial gives .0287.
37. a. 1 − .736 = .264
    b. 1, because there will be no defectives in any sample
    c. .086 (for 5%); .624 (for 20%); .989 (for 50%)
39. a. Binomial with n = 25, π = 1/5
    b. Mean = 5; standard deviation = 2
    c. Closest integer score S that satisfies P(x ≥ S) = .01 is S = 11.
41. a. Median = 346.57 hours
    b. Median is smaller than mean.
    c. Median = −ln(.50)/λ = .693/λ = .693μ
43. a. P(x = 5) = .40; P(x = 6) = .35; P(x = 7) = .25
    b. P(y = 10) = .40; P(y = 15) = .40; P(y = 20) = .20
    c. No, because P(x = 5 and y = 10) = .20 ≠ (.40)(.40) = P(x = 5)·P(y = 10)
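Answer 35(b) invokes the normal approximation with a continuity correction. The R sketch below demonstrates that device on a generic binomial; the n and p here are made up, not those of the exercise.

    # Continuity correction: approximate P(X <= 45) for X ~ Bin(100, .5).
    n <- 100; p <- 0.5
    pbinom(45, n, p)                                  # exact: about .1841
    pnorm(45.5, mean = n*p, sd = sqrt(n*p*(1-p)))     # corrected approx: .1841
    pnorm(45,   mean = n*p, sd = sqrt(n*p*(1-p)))     # without correction: .1587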

45. a. 0, because x cannot take values between .2 and .3
    b. .36498
    c. .3544 (without continuity correction); .5098 (with continuity correction)
47. a. Mean of sampling distribution should be closer to 4.
    b. Mean of sampling distribution based on n = 100 will be closer to 4.
    c. Variance of sampling distribution based on n = 2 will be larger.
49. a. Mean = .80; standard deviation = .08
    b. Mean = .20; standard deviation = .08
    c. Mean = .80; standard deviation = .04
51. a. .6826    b. .9544
53. a. .0228    b. .0228; same as in (a)
    c. 8.8225 hours    d. $0.15 per package
55. a. .9803; .4803    b. 31.91, or n = 32
57. a. Sampling distribution is approximately normal with μp̂ = .02 and σp̂ = .014
    b. .7611
59. a. μx = exp(μ + σ²/2) = .099308    b. .2643
61. a. 2/3    b. 7/9
63. As long as P(A) and P(B) are both positive, A and B cannot be independent.
65. P(A or B) = 1 − P(A′)P(B′) if A and B are independent.
67. a. b = 2    b. μ = 4/3
    c. σ² = 32/243, so σ = .36289
69. a. .1396    b. .8604    c. .0099
71. a. The shape of the histogram should be symmetric and bell shaped.
    b. The shape of the histogram should be positively skewed.
    c. For the uniform distribution, a sample size of 10 is sufficiently large to produce a reasonable normal sampling distribution of x̄. However, for the exponential distribution, a sample size of 10 is not yet sufficiently large to produce a normal sampling distribution of x̄.
73. a. 0    b. .0038    c. 6
75. For flights coming into DC: P(1 | late) = .4918, P(2 | late) = .2459, P(3 | late) = .2623. For flights coming into LA: P(1 | late) = .3125, P(2 | late) = .375, P(3 | late) = .3125.
77. P(A) = .45, P(B) = .32

Chapter 6

1. Tolerance = ±(.05)(560) = ±28 ohm, so LSL = 532 and USL = 588.
3. a. The envelope puts an upper specification limit of 4.00 inches on the width of a folded letter.
   b. Possible penalties: refold letter (rework), bend letter to fit envelope (lower quality), reprint and fold new letter (scrap and rework).
5. a. Attributes data    b. Variables data    c. Attributes data    d. Attributes data    e. Attributes data    f. Variables data    g. Attributes data    h. Variables data    i. Variables data
7. Some unacceptable parts whose true lengths are .02 inch or less below the LSL will give measured lengths above the LSL (and will then be incorrectly classified as acceptable). Conversely, some acceptable parts whose true lengths are less than .02 inch below the USL will have measured lengths above the USL (which incorrectly classifies them as unacceptable).
9. Method 2 would be a better rational subgrouping scheme.
11. a. P(z > 3) = .0013    b. P(z > 3.09) = .001
13. Chart #1: Test #3 is found [six points in a row steadily increasing, starting with point #3].
    Chart #2: Even though there are no tests found, Test #7 (which requires that 15 points in a row be in zone C) seems likely to occur.
    Chart #3: Test #2 is found [nine points in a row on one side of the centerline, starting with point #2].
    Chart #4: Both Tests #5 and #6 are found, starting with point #1.
15. Centerline R̄ = 85.2/30 = 2.84
    UCLR = D4·R̄ = (2.282)(2.84) = 6.48
    LCLR = D3·R̄ = (0)(2.84) = 0
17. a. On the s chart no rules for statistical control are broken. So, we would conclude that the process variation is in statistical control.
    b. The control limits of Exercise 16(b) are based on a different formula compared to that used in 17(b). However, the control limits in both exercises are similar in values.
19. a. Centerline = 1.2642, UCL = 2.4905, LCL = .0379
    b. Centerline = 96.503, UCL = 98.1300, LCL = 94.8760
21. a. If each xi value is transformed into yi = b(xi − a), where a and b are constants and b > 0, then for any set of n values, ȳ = b(x̄ − a) and Ry = b·Rx. From these two relationships, simple algebra will show that, for example, x̄ > UCL (of the x-data) if and only if ȳ > UCL (of the y-data). That is, the x̄ charts based on untransformed data and transformed data will give the same signals. In the same manner, it can be shown that the R charts for both transformed and untransformed data give the same signals.
    b. R chart: centerline = 4.200, UCL = 10.8108, LCL = 0
       x̄ chart: centerline = .1833, UCL = 4.4799, LCL = −4.1133
       There are no "out-of-control" conditions in either chart.
    c. R chart: centerline = .0042, UCL = .01081, LCL = 0
       x̄ chart: centerline = .25418, UCL = .25848, LCL = .2499
       There are no "out-of-control" conditions in either chart.
23. a. .04932    b. P(x > USL) = .0228, P(x < LSL) = .1075
25. If a process is not in statistical control, then we do not have stable output and so there is no use in comparing this output to the specifications.
27. Since Cp > 1, we know that the process has the potential of meeting both specifications. However, since Cpk < 1, the process is not actually capable of meeting both specifications.
29. .0035
31. Cp = .785, Cpu = .733, Cpl = .837, Cpk = .733. The process does not have good capability.
33. a. To compute capability indexes on the transformed process data, the control chart statistics from the transformed data should be used. So, x̄ = .1833, R̄ = 4.200. Also, the process specifications need to be transformed. So, USL = 10 and LSL = −10. Using these values the Cp indexes can be computed for the transformed data.
    b. Cp = 1.34, Cpu = 1.32, Cpl = 1.37, Cpk = 1.32. The process is capable.
35. Cp = 10.789, Cpk = 1.842. The process is capable.
37. 9/(n + 9)
39. a. p̄ = 49/1500 = .032667, LCL = 0, UCL = .1081
    b. There are no signs of any "out of control" conditions and we conclude that the process is in statistical control.
41. a. .01043
    b. UCL = .01043 + 3·√[(.01043)(.98957)/ni]
       LCL = .01043 − 3·√[(.01043)(.98957)/ni]
    c. On day 21, the proportion of keyboards failing inspection is .0189. This value is above the upper control limit and the production process of that day should be investigated.
43. A c-chart is required in this case. Computations are: c̄ = 1179/25 = 47.16
    UCL = 47.16 + 3√47.16 = 67.76
    LCL = 47.16 − 3√47.16 = 26.56
    When analyzing the control chart we do not see any "out of control" conditions. However, the last 6 days of production have produced below-average numbers of flaws and these days may need to be investigated.
45. a. ū = 91/15.8 = 5.759. The general control chart formulas are:
       UCL = 5.759 + 3·√(5.759/ni)
       LCL = 5.759 − 3·√(5.759/ni)
    b. Panels #7 and #8 seem to have significantly larger flaw rates than the process average. Test #5 (two out of three points in a row outside of 2 standard deviations) is observed.
47. a. R(400,000) = e^−(400,000/600,000)^4 = .82075
    b. R(800,000) = e^−(800,000/600,000)^4 = .0424
    c. R(600,000) = e^−(600,000/600,000)^4 = 1/e = .3679
    d. Z(t) = (α/β)(t/β)^(α−1)·exp{−(t/β)^α} / exp{−(t/β)^α} = (α/β^α)·t^(α−1). The failure rate function is increasing.
49. b. The normal failure laws have an increasing rate.
51. b. R(t) = [R1(t)]³ = (1 − [1 − e^(−.025t)]·[1 − e^(−.025t)])³
53. The shape of the distribution of fill volumes that pass inspection is truncated on the left, since bottles with fill volumes below the lower specification have been inspected out, resulting in the left part of the distribution being "cut off." The histogram illustrates a normal distribution that has been truncated at the LSL.
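Control limits like those in answers 43 and 45 are one-liners once the average count is in hand. A minimal R check of the c-chart limits in answer 43:

    # c-chart limits from 25 days with 1179 total flaws (answer 43).
    cbar <- 1179 / 25         # 47.16
    cbar + 3 * sqrt(cbar)     # UCL = 67.76
    cbar - 3 * sqrt(cbar)     # LCL = 26.56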

55. As the drill wears out it may not be able to drill the hole diameters properly. On a control chart this problem will likely manifest itself as a slow trend down in the hole diameters that are being drilled. That is, the hole diameters may get smaller and smaller. Test #3 on the conditions for an "out of control" process may occur.
57. b. When analyzing the control chart we do not see any "out of control" conditions. We conclude that the milling process is in statistical control.
59. a. Cp = 1.33, Cpu = 1, Cpl = 1.67, Cpk = 1.
61. It can be shown that P(x > t1 + t2 | x > t1) = P(x > t1 + t2)/P(x > t1) = e^(−λt2) = P(x > t2)
63. a. Since this system is connected in series, the overall reliability is R(t) = R1(t)·R2(t). Note that R1(t) ≤ 1 and R2(t) ≤ 1, and so R(t) ≤ R1(t) and R(t) ≤ R2(t). Thus, the overall reliability never exceeds the reliability of any of its individual components. That is, R(t) ≤ min{R1(t), R2(t)}.
    b. In the case where the components are not necessarily independent, then R(t) = P(T > t) = P(both components last longer than t) = P(T1 > t and T2 > t). Since {T1 > t and T2 > t} is the intersection of the two events {T1 > t} and {T2 > t}, its probability cannot exceed P(T1 > t) or P(T2 > t). That is, R(t) ≤ min[P(T1 > t), P(T2 > t)] = min[R1(t), R2(t)].

Chapter 7

1. Yes, because the length x can also be thought of as a sample average based on a sample size of n = 1.
3. a. .4714
   b. .8414 (n = 50); .9544 (n = 100); approximately 1 (n = 1000)
   c. The probability that the sample mean lies within ±1 unit of μ increases as the sample size n increases.
5. a. √n = 2(1.645), so n ≥ 11
   b. 80%: √n = 2(1.282), so n ≥ 7
      95%: √n = 2(1.960), so n ≥ 16
      99%: √n = 2(2.576), so n ≥ 27
   c. Increasing the probability that x̄ lies within 1 unit of μ requires corresponding increases in the sample size n.
7. a. 99.8%    b. 99.5%    c. 85%    d. 68%
9. a. Increased interval width
   b. Decreased interval width
   c. Increased interval width
11. a. Narrower    b. No    c. No    d. No
13. a. (12.69, 14.97). We are 99% confident the average backpack weight of 6th graders is between 12.69 and 14.97 pounds.
    b. (13.26, 16.25).
    c. The average backpack weight as a percentage of body weight of 6th graders seems well above the recommendation, as 10% is well outside the interval (13.26, 16.25).
15. a. (1398.90, 1455.10). We are 95% confident that the true average FEV1 level for the given population is between 1398.90 and 1455.10 ml.
    b. 158
17. x̄ ± (z critical value)·(s/√n) = (3.332, 4.128)
19. 390.74 min
21. 4.062 kip
23. a. (.50, .56). We are 99% confident the proportion of all adult Americans who have watched streamed programming is between 50% and 56%.
    b. 664
25. a. .042
    b. If we were to sample repeatedly, the calculation method in (a) is such that π will exceed the calculated lower confidence bound for 95% of all possible random samples of n = 143 individuals who received ceramic hips.
27. a. p1 − p2 ± (z critical value)·√[p1(1 − p1)/n1 + p2(1 − p2)/n2]
    b. (−.118, .136), no
    c. (−.118, .135)
29. a. (A, B) = ln(p1/p2) ± (z critical value)·√[(n1 − u)/(n1u) + (n2 − v)/(n2v)], giving the approximate interval (e^A, e^B)
    b. (.970, 1.349), yes
31. 271
33. (.012, .056), using a 95% confidence level
35. 4.3, no
37. a. 2.228    b. 2.086    c. 2.845
    d. 2.680    e. 2.485    f. 2.571
39. a. 1.812    b. 1.753    c. 2.602    d. 3.747
    e. 2.1716 (from Minitab)    f. Roughly 2.43
41. a. Yes, a normal quantile plot shows a somewhat linear relationship.
    b. (106.4, 109.1). Based on this interval, 107 is a plausible value but 110 is not plausible for the true average work of adhesion.
43. a. We are 95% confident that the true average mileage is between 46,145.4 and 86,296.8.
    b. We are 95% confident that the mileage for a single vehicle is between 0 and 148,995.4. This interval is much wider than the interval from part (a).
45. a. Using a normal probability plot, we ascertain that it is plausible that this sample was taken from a normal population distribution.
    c. 38.78
    d. 42.29, a higher upper bound than that found in part (c).
47. a. A 95% prediction interval for the amount of warpage of a single piece of laminate is .0635 ± .0137
    b. (.0464, .0806)
49. (3.43, 4.13). Thus, with 95% confidence, we can say that the true average firmness for zero-day apples exceeds that of 20-day apples by between 3.43 and 4.13 N.
51. a. The most notable feature of these boxplots is the larger amount of variation present in the mid-range data as compared to the high-range data. Otherwise, both boxplots look reasonably symmetric and there are no outliers present.
    b. A 95% confidence interval for (μ mid-range − μ high-range) is (−7.84, 9.54). Since plausible values for (μ1 − μ2) are both positive and negative (i.e., the interval spans zero), we would conclude that there is not sufficient evidence to suggest that μ1 and μ2 differ.
53. Assuming sample "1" corresponds to the lab method, the CI says we're 95% confident that the true mean arsenic concentration measurement using the lab method is between 6.498 μg/L and 11.102 μg/L higher than using the field method.
55. a. A 95% confidence interval for μd is (−14.83, 26.50). Since this interval contains negative and positive values, there is not sufficient evidence to suggest that μd is different from zero.
    b. A 95% prediction interval for the difference d is (−48.85, 60.51).
57. a. (2.03, 6.10). We are 95% confident that the true mean difference between dominant and nondominant arm translation for pitchers is between 2.03 and 6.10.
    b. (−0.54, 1.01). We are 95% confident that the true mean difference between dominant and nondominant arm translation for position players is between −0.54 and 1.01.
    c. Let μ1 and μ2 represent the true mean differences in side-to-side AP translation for pitchers and position players, respectively. To generate a confidence interval for μ1 − μ2, we use the differences utilized in parts (a) and (b). A 95% confidence interval for μ1 − μ2 is (1.69, 5.98). Since both endpoints are positive, we concur with the authors' assessment that this difference is greater, on average, in pitchers than in position players.
59. a. (−3.85, 11.35)
    b. (7.02, 10.06)
61. a. A 95% bootstrap interval is (431.82, 445.65) based on 200 bootstrap replications. (Note that all bootstrap intervals will differ slightly from one another.)
    b. t interval: (430.51, 446.08); bootstrap interval: (431.82, 445.65)
63. a. MLE for π is x/n.
    b. x/n is an unbiased estimator of π.
    c. MLE for (1 − π)⁵ is (1 − x/n)⁵.
65. a. MLE is x̄ + 1.645·σ̂, where σ̂ = √((n − 1)/n)·s and s is the sample standard deviation of the data.
    b. 403.3
67. a. θ̂ = min(x1, x2, …, xn); λ̂ = 1/(x̄ − θ̂)
    b. θ̂ = .64; λ̂ = 1/(5.58 − .64) = .202
69. λ = 2 is too large; the resulting kernel density will not show much detail in the data.
71. a. The kernel density graph will have a very choppy appearance.
    b. Larger values of λ will result in smoother kernel density curves.
73. λ will have to be raised.
75. a. μ > 134.78
    b. Tensile strengths should be normally distributed.
    c. A histogram of the data appears approximately bell-shaped, so the normality assumption is a good one for this data.
    d. μ > 127.81
77. (−299.3, 1517.9)
79. (1024.0, 1336.0), yes
81. a. A normal probability plot shows it is reasonable to assume the sample was taken from a normal population distribution.
    b. Letting d = peak ER velocity − peak IR velocity, a 95% confidence interval for μd is (34.1, 130.9). Since both endpoints are positive, we conclude that IR and ER differ significantly, with ER being the higher of the two.
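Paired intervals like those in answers 55 and 57 are built from the within-pair differences; in R, t.test with paired = TRUE reports the same interval. The data below are hypothetical and only illustrate the call.

    # Paired-sample 95% CI for the mean difference (illustrative data).
    dominant    <- c(31.2, 29.8, 34.1, 30.5, 28.9, 33.0)
    nondominant <- c(27.4, 28.1, 30.2, 27.9, 27.0, 29.6)
    t.test(dominant, nondominant, paired = TRUE)$conf.int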

83. a. .5    b. .25    c. (.5)^n    d. (.5)^n
    e. 1 − 2(.5)^n, 100[1 − 2(.5)^n]%
    f. (28.7, 42.0), 99.8%
    g. (28.62, 40.28), narrower than (f)
85. a. 1/2    b. 1/3    c. 1/(n + 1), 1/(n + 1)
    d. 1 − 2/(n + 1), 100[1 − 2/(n + 1)]%, (28.7, 42.0), 81.8%
87. No, (69.80, 88.80), 99.97%
89. a. (38.46, 38.84)
    b. (Answers will vary.) For a simulation programmed in R using 1000 bootstrapped means, a 95% bootstrap interval for the population mean was (38.51, 38.81). This interval agrees closely with the interval from part (a).
91. a. (.296, .324)
    b. Since the interval value dips below 30%, we cannot conclude that the 2002 percentage is more than 1.5 times the 1998 percentage.

Chapter 8

1. a. Yes    b. No    c. Yes    d. No
   e. Yes    f. Yes    g. Yes    h. Yes
   i. Yes    j. Yes
3. H0: μ = 40 versus Ha: μ ≠ 40
5. H0: μ = 120 versus Ha: μ < 120. Type I: Conclude that the new system does reduce average distance when in fact it does not. Type II: Conclude that the new system does not reduce average distance when in fact it does.
7. With 1 for regular and 2 for special, H0: μ1 − μ2 = 0 versus Ha: μ1 − μ2 > 0. Type I: Conclude that the special outperforms the regular laminate when this is not the case. Type II: Conclude that the regular laminate is at least as good as the special laminate when in fact the special does yield an improvement.
9. a. Reject H0.    b. Reject H0.
   c. Don't reject H0.    d. Reject H0.
   e. Do not reject H0.
11. a. 1.83, .0336    b. 4.22, approximately 0
    c. 1.33, .0918
13. a. H0: π = .85 versus Ha: π ≠ .85
    b. Don't reject H0, because P-value > α. Same conclusion and reason.
15. H0: μ = 55 versus Ha: μ ≠ 55, z = −5.25, P-value ≈ 0, reject H0
17. a. Using software, x̄ = 0.75, x̃ = 0.64, s = .3025, IQR = .505. These summary statistics, as well as a boxplot (not shown), indicate substantial positive skewness, but no outliers.
    b. No, it is not plausible from the results in part (a) that the variable ALD is normal. However, since n = 49, normality is not required for the use of z inference procedures.
    c. H0: μ = 1.0 versus Ha: μ < 1.0. z = −5.79; at any reasonable significance level, we reject the null hypothesis. Yes, the data provides strong evidence that the true average ALD is less than 1.0.
19. a. P-value = P(t > 3.2) = .003, reject H0.
    b. P-value = P(t > 1.8) = .055, do not reject H0.
    c. P-value = P(t > −.2) = .578, do not reject H0.
21. a. P-value = 2·P(t > 1.6) = 2(.068) = .136, do not reject H0.
    b. P-value = 2·P(t < −1.6) = 2(.068) = .136, do not reject H0.
    c. P-value = 2·P(t < −2.6) = 2(.008) = .016, do not reject H0.
    d. P-value = 2·P(t < −3.9) ≈ 2(0) ≈ 0, reject H0.
23. H0: μ = 30 versus Ha: μ < 30. t = −0.84, P-value = .209, do not reject H0.
25. H0: μ = 181 versus Ha: μ > 181. t = 1.91, P-value = .041, reject H0.
27. a. 17    b. 21    c. 18    d. 26
29. H0: (μ1 − μ2) = 0 versus Ha: (μ1 − μ2) < 0. t = −2.46 ≈ −2.5, df = 15, P-value = .012. Do not reject H0.
31. a. Normal quantile plots show sufficient linearity for each data set. Therefore, it is plausible that both samples have been selected from normal population distributions.
    b. The comparative boxplot does not suggest a difference between average extensibility for the two types of fabrics.
    c. H0: (μH − μP) = 0 versus Ha: (μH − μP) ≠ 0. t = −.38, df = 10, P-value = .71. Do not reject H0.
33. H0: (μ1 − μ2) = 0 versus Ha: (μ1 − μ2) > 0. When assuming unequal variances, t = 3.6362, the corresponding df is 37.5, and the P-value for our upper-tailed test would be (.0008)/2 = .0004. (Note: P-value = P(t > 3.6362) = .0004.) Reject H0. We could have committed a Type I error.
35. H0: (μH − μNH) = 0 versus Ha: (μH − μNH) > 0. t = 2.09, df = 17, P-value = .026. Do not reject H0.
37. a. Use t = [(x̄1 − x̄2) − Δ] / [sp·√(1/n1 + 1/n2)] with corresponding df = (n1 + n2 − 2). sp is defined in Exercise 54 in Chapter 7.
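Answer 89(b) above mentions a bootstrap simulation programmed in R. A minimal version of that percentile bootstrap is sketched below; the data vector is hypothetical, and as the answer notes, results vary from run to run.

    # Percentile bootstrap CI for a population mean (illustrative).
    x <- c(38.2, 38.9, 38.5, 38.7, 38.4, 38.8, 38.6, 38.3, 38.9, 38.5)  # hypothetical
    boot.means <- replicate(1000, mean(sample(x, replace = TRUE)))
    quantile(boot.means, c(0.025, 0.975))   # 95% percentile interval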

    b. t = 3.6362, df = (n1 + n2 − 2) = (20 + 20 − 2) = 38, P-value = P(t > 3.6362) = (.0008/2) = .0004. Reject H0.
    c. t = 2.01074, df = (n1 + n2 − 2) = (9 + 18 − 2) = 25, P-value = P(t > 2.01074) = .0276. Do not reject H0.
39. a. For MSD, H0: μd = 0 versus Ha: μd ≠ 0. t = .85, df = 20, P-value = .408. Do not reject H0.
    b. For RULA, H0: μd = 0 versus Ha: μd ≠ 0. t = −4.47, df = 20, P-value < .001. Reject H0.
    c. Measurements were taken before and after intervention. The intervention in the form of a short oral presentation would most likely not lead to instant reductions in musculoskeletal disorders (MSD). However, such an intervention could cause an immediate change in one's posture, and therefore have a major impact on one's RULA score.
41. H0: μd = 0 versus Ha: μd > 0. t = 2.68, df = 12, P-value = .01. Reject H0.
43. First, compute the percent change, (measured − stated)/stated, for each meal. Let μ denote the true average percentage change for all supermarket convenience meals. H0: μ = 0 versus Ha: μ ≠ 0. t = 3.90, df = 9, P-value = .004. Reject H0.
45. H0: π1 = .40, π2 = .30, π3 = .20, π4 = .10 versus Ha: The Statistics Department's expectations are not correct. χ² = 1.57, df = 3, P-value > .10. Do not reject H0.
47. H0: π1 = .177, π2 = .032, π3 = .734, π4 = .057. The alternative hypothesis is that at least one of these proportions is incorrect. χ² = 19.6, P-value < .001. Reject H0.
49. H0: π1 = 3/9, π2 = 4/9, π3 = 2/9. The alternative hypothesis is that at least one of these proportions is incorrect. χ² = .6875, P-value > .10. Do not reject H0.
51. This is a χ² test of the homogeneity of several proportions. The hypotheses to test are:
    H0: the groups are homogeneous with respect to side effects versus
    Ha: the groups are not homogeneous with respect to side effects
    Counseling group: 24 had at least one side effect, 31 had none. No-counseling group: 8 had at least one side effect, 44 had none. χ² = 10.177, P-value = .0014. Reject H0.
53. This is a χ² test of the homogeneity of several proportions. The hypotheses to test are:
    H0: the genders are homogeneous with respect to neck pain versus
    Ha: the genders are not homogeneous with respect to neck pain
    χ² = 142.1, P-value < .001. Reject H0.
55. a. The four expected counts are 1.56, 2.44, 32.44, 50.56. Two of the cells have expected counts less than 5.
    b. Fisher's exact test P-value = .02083. We conclude at α = 5% that surgery method affects the provision of ondansetron.
57. a. i. Since r > .9347, P-value > .10
       ii. Since .8804 < r < .9180, .01 < P-value < .05
       iii. Since r > .9662, P-value > .10
       iv. Since r < .9408, P-value < .01
    b. i. Fail to reject H0, since P-value > .05
       ii. Reject H0, since P-value < .05
       iii. Fail to reject H0, since P-value > .05
       iv. Reject H0, since P-value < .05
59. The Ryan-Joiner test P-value is larger than .10, so we conclude that this data could reasonably have come from a normal population. We can safely use a one-sample t test to test hypotheses about the value of the true average compressive strength.
61. a. The Ryan-Joiner test P-value is less than .01, so it is implausible that this data came from a normal population. In particular, the observation 65 is a clear outlier.
    b. The Ryan-Joiner test P-value is larger than .10, so we conclude that the data without the outlier could reasonably have come from a normal population.
63. λ̂ = .0742; combine the last six intervals to obtain χ² = 1.34 based on 2 df, so a truncated exponential distribution is plausible.
65. a. .2514, .0918, .0038, .0004
    b. .9515, .8413, .3669, .1587
67. A two-sided test of H0: μ = 100 versus Ha: μ ≠ 100 at α = .01 with n = 15 is proposed. σ is thought to be between .8 and 1. The first two printouts show that power of detecting shifts of .5 or .8 will be very low. The last printout shows that power can be increased to 90% by increasing the sample size to 42 (for a .5 shift) and 19 (for a .8 shift).
69. P-value = .056, so at significance level .05, H0 cannot be rejected.
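Goodness-of-fit statistics like those in answers 45-49 can be checked with chisq.test. The observed counts below are hypothetical; the null proportions are those of answer 45.

    # Chi-square goodness-of-fit test against specified proportions.
    observed <- c(44, 28, 17, 11)                      # hypothetical counts
    chisq.test(observed, p = c(.40, .30, .20, .10))    # reports X-squared, df, P-value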

71. a. The corresponding probability plot suggests the data is consistent with a normally distributed population. So, we are comfortable proceeding with the t procedure.
    b. H0: μ = 0.6 versus Ha: μ < 0.6, t = −2.14, P-value = .0495, reject H0 when α = 5%, do not reject H0 when α = 1%.
    c. In this context, a Type I error would be to conclude that less than 10% of the tube's contents remain after squeezing, on average, when in fact 10% (or more) actually remains. When we rejected H0 at the 5% level, we may have committed a Type I error. A Type II error occurs if we fail to recognize that less than 10% of a tube's contents remains, on average, when that's actually true (i.e., we fail to reject the false null hypothesis of μ = 0.6 oz). When we failed to reject H0 at the 1% level, we may have committed a Type II error.
73. a. Since the mean, x̄ = 215, is so much lower than the midrange (about 585), one would suspect the distribution is positively skewed. However, it is not necessary to assume normality if the sample size is "large enough," due to the central limit theorem. Since n = 47, we can proceed with a test of hypothesis about the true mean consumption.
    b. H0: μ = 200 versus Ha: μ > 200, z = .44, P-value = .33, do not reject H0.
75. H0: μ = 1.75 versus Ha: μ ≠ 1.75, t = 1.70, P-value = .102, do not reject H0.
77. H0: π = 1/3 versus Ha: π < 1/3, z = −1.35, P-value = .0885, do not reject H0.
79. H0: (π1 − π2) = 0 versus Ha: (π1 − π2) ≠ 0, z = 2.25, P-value = .0244, reject H0.
81. a. H0: (μ1 − μ2) = 0 versus Ha: (μ1 − μ2) ≠ 0. Using the unpooled t-test statistic, we have t = 2.84 and df = 18. This results in a P-value = 2[P(t > 2.8)] = 2(.006) = .012. These values differ slightly from t = 2.51 and P-value = .019.
    b. H0: (μ1 − μ2) = 25 versus Ha: (μ1 − μ2) > 25; the unpooled t-test statistic value is .556, P-value = .278, do not reject H0.
83. a. H0: μ37,dry − μ22,dry = 100 versus Ha: μ37,dry − μ22,dry > 100. The relevant test statistic value is t = 2.58, df = 9, P-value = .015, reject H0 when α = 5%.
    b. H0: μ22,wet − μ37,wet = 50 versus Ha: μ22,wet − μ37,wet > 50. The relevant test statistic value is t = .46, df = 9, P-value = .328, do not reject H0.
85. H0: μd = 0 versus Ha: μd > 0, d̄ = .821, sd = 2.52, t = 1.22, P-value = .126, do not reject H0.
87. H0: μ1 − μ2 = 10 versus Ha: μ1 − μ2 > 10, t = 2.49, df = 5, P-value = .027, reject H0 when α = 5%.
89. a. H0: π1 = .27477, π2 = .20834, π3 = .15429, π4 = .3626. The alternative hypothesis is that at least one of these proportions is incorrect. χ² = 9.02, .01 < P-value < .05. Reject H0 when α = 5%. Thus, the above model is questionable.
    b. H0: π1 = .45883, π2 = .18813, π3 = .11032, π4 = .24272. The alternative hypothesis is that at least one of these proportions is incorrect. χ² = .157, P-value > .10. Do not reject H0. Thus, the proposed model appears to fit the data quite well.

Chapter 9

1. a. H0: μA = μB = μC, where μi = average strength of wood of Type i.
   b. Use either Type A or B, but choose the less expensive of the two types.
   c. Choose the least expensive of the three types.
3. The two ANOVA tests will give identical conclusions.
5. There is no way of knowing whether there is a statistically significant difference between the means. When there is no difference, the "pick the winner" strategy doesn't allow you to choose between the populations based on other criteria (e.g., cost, time, etc.).
7. F.05(5, 8) ≠ F.05(8, 5) and F.01(5, 8) ≠ F.01(8, 5)
9. F = 4.12 exceeds F.05(3, 30), so conclude that there is a difference among the means.
11. a. Source         df     SS         MS        F
       Treatments      5    3575.065   715.013   51.3
       Error         150    2089.350    13.929
       Total         155    5664.415
    b. H0: μ1 = ⋯ = μ6 versus Ha: at least two of the μi's are different.
    c. P-value is P(F5,150 ≥ 51.3) ≈ 0, reject H0.
13. a. H0: μ1 = ⋯ = μ5 versus Ha: at least two of the μi's are different.
    b. F = 4.14; using software, P-value = .0061, reject H0.
15. H0: μ1 = μ2 = μ3 = μ4 versus Ha: at least two of the μi's are different. F = 2.31, P-value > .10, do not reject H0.
17. a. SST, SSTr, and SSE are each multiplied by a factor of (2.54)², but the F ratio does not change.
Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has
deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
Answers to Odd-Numbered Exercises 619

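Note: F ratios and P-values in tables like the one in Problem 11 can be reproduced from the sums of squares alone. A minimal sketch in base R, using only the Problem 11 table values:

  # Problem 11(a): rebuild F and its P-value from SS and df
  SSTr <- 3575.065; df_tr <- 5       # treatments
  SSE  <- 2089.350; df_err <- 150    # error
  MSTr <- SSTr / df_tr               # 715.013
  MSE  <- SSE / df_err               # 13.929
  F_ratio <- MSTr / MSE              # about 51.3
  pf(F_ratio, df_tr, df_err, lower.tail = FALSE)  # P-value is essentially 0

The upper-tail F probability from pf() is exactly the P-value P(F5,150 ≥ 51.3) quoted in part (c).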
19. SSTr = .2164; SSE = .3870; F ratio = 4.194 is significant at the 5% level. There is evidence of a difference between the means.
21. a. SSTr = 25.80; SSE = 115.48; SST = 141.28; F ratio = 2.35 is not significant at the 5% level. There is no evidence of a difference between the means.
    b. Favorable, because the pegs have the same strength regardless of positioning.
23. a. The ANOVA table entries will be unchanged by the calibration error.
    b. If all data points are shifted (up or down) by the same common amount, this will not affect any of the entries in the ANOVA table; however, the means of each sample will shift by that same amount.
25. An effects plot does not show the within-samples variance, only the between-groups variation.
27. q.05(5, 15) = 4.37, so T = 36.09.
    437.5  462.0  469.3  512.8  532.1
29. T = 36.09, as in Problem 27.
    427.5  462.0  469.3  502.8  532.1
31. q.05(6, 150) ≈ q.05(6, 120) = 4.10, T = 3.00.
    14.18  17.94  18.00  18.00  25.74  27.67
33. q.05(3, 6) = 4.34, T = 7.92. There are 2 distinct sets: Set 1 (42.67, 43.33), Set 2 (53.67).
    42.67  43.33  53.67
35. a. The F ratio for Brands = 95.57 is significant at α = 1%. There is a difference between the brands.
    b. The F ratio for Humidity = 278.20 is significant at α = 1%. Humidity levels do affect power consumption, so it was wise to use humidity as a blocking factor.
37. a. The F ratio for Brand = 8.96 is significant at α = 5%. There is a difference among lathe brands.
    b. The F ratio for Operators = 10.78 is significant at α = 5%. There is a difference among operators.
39. a.
                Df   Sum Sq   Mean Sq     F-value   Pr(>F)
    DESIGN       3   519515   173171.67   35.46     <.0001
    PERSON      20   100460     5023.00    1.03     0.445
    Residuals   60   293009     4883.48
    b. Yes; P-value < .0001.
    c. The corresponding F ratio = 1.03 with P-value = .445. The person-to-person differences in RPN are not confirmed by the data.
41. a. The F ratio for Methods = 8.69 is significant at α = 5%. Curing methods do have differing effects on strength.
    b. The F ratio for Batches = 7.22 is significant at α = 5%. Different batches do have an effect on strength.
    c. The F ratio for Methods = 2.83, which is not significant at α = .05. Conclusion: Curing method does not have an effect on strength.
43. a. H0: μ1 = μ2 = μ3 versus Ha: at least two of the population means differ.
    b.
    Source   df   SS        MS      F
    Factor    2    591.20   295.6   1.3
    Error    21   4773.3    227.3
    Total    23   5364.50
    c. The corresponding P-value > .10. Do not reject H0.
45. a. F.05(1, 10) = 4.96 and t.025(10) = 2.228. (2.228)² ≈ 4.96; the equality is approximate because the F and t table entries are rounded.
    b. F.05(1, df2) approaches 3.8416, the square of 1.96.
47. MSTr = 140, so if the F ratio > F.05(2, 12) = 3.89, then MSE < 140/3.89 = 35.99. q.05(3, 12) = 3.77, so to have T > 10 (i.e., largest mean − smallest mean), MSE must exceed (5)(10/3.77)² = 35.18. Therefore, if 35.18 < MSE < 35.99, the two conditions will be satisfied. In terms of SSE, 422.16 < SSE < 431.88.
49. For condition 1 to be satisfied it can be shown that SSE < 385.60. For condition 2 to be satisfied it can be shown that SSE > 422.15. Therefore, no SSE value exists that can satisfy both conditions.
51. a.
    Source          df   SS       MS      F       P-value
    Drying method    4   14.962   3.741   36.70   0.000
    Fabric type      8    9.696   1.212   11.89   0.000
    Error           32    3.262   0.102
    Total           44   27.920
    b. The null hypothesis of interest is H0: there are no differences in mean smoothness scores for the five drying methods. The F ratio for "drying method" is F = 36.7, P-value < .001; reject H0.

Chapter 10

1. Replication allows you to obtain an estimate of the experimental error.
3. a. The surface is a dome over the x–y plane.
    b. The maximum occurs at x = 2, y = 5.
    c. The contours are circles centered at (x, y) = (2, 5).
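Note: the Tukey comparisons in Chapter 9 Problems 27–33 (and 47) all use the cutoff T = q·√(MSE/n): two sample means are declared different only if they differ by more than T. A sketch in base R; the MSE and per-sample n below are hypothetical stand-ins chosen so that T matches the 36.09 of Problem 27:

  k <- 5; df_err <- 15                              # number of means, error df
  q_crit <- qtukey(0.95, nmeans = k, df = df_err)   # about 4.37, as in Problem 27
  MSE <- 341; n <- 5                                # hypothetical values for illustration
  T_cutoff <- q_crit * sqrt(MSE / n)                # about 36.09
  # With raw data, TukeyHSD(aov(y ~ group)) reports the same comparisons.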
5. When the lines in the AB interaction plot are parallel, the effect of changing factor A (factor B) from one level to another will be the same for each level of factor B (factor A).
7.
    Source        DF   SS      MS     F
    Factor A       4    20.0    5     2.5
    Factor B       4    64.8   16.2   8.1
    Interaction   16    15.2    .95    .475
    Error         50   100.0    2
    Total         74   200.0
9. a. The F ratio for Interaction is 1.545, which does not exceed F.05(2, 12) = 3.89, so there is no evidence of interaction between the factors.
    b. The F ratio for Formulation = 376.25 and the F ratio for Speed = 19.269. Both factors have an effect on yield.
11. a. The F ratio for the Interaction = 1.10, which is not significant at α = .05.
    b. The F ratio for the Aggregate Content = 56.06, which is significant at α = .05.
    c. The F ratio for the Asphalt Grade = 14.12, which is significant at α = .05.
13. b. At the .01 level, there is not a statistically significant interaction between adhesive's and condition's effects on shear bond strength. Ignoring the interaction effect, condition (dry/moist) is not statistically significant, while adhesive (OBP/SBP) is highly statistically significant.
    c. Using Tukey's procedure for a one-way ANOVA on the four groups (OBP–D, OBP–M, SBP–D, SBP–M) yields the following result:
    OBP–D   OBP–M   SBP–M   SBP–D
    39.9    46.1    50.8    53.0
15. a. Software gives the following table:
    Source   DF   SS         MS        F      P-value
    A         2     210.67    105.33   0.53   0.60
    B         2     132.17     66.09   0.33   0.72
    C         2    2586.35   1293.18   6.45   0.01
    AB        4      57.48     14.37   0.07   0.99
    AC        4     636.84    159.21   0.79   0.54
    BC        4     875.00    218.75   1.09   0.38
    ABC       8     888.52    111.06   0.55   0.81
    Error    27    5416.67    200.62
    Total    53   10803.70
    b. There are no significant interaction effects.
    c. The only significant main effect is for C (quill gap), having F ratio = 6.45.
17. a.
    Source   DF   SS           MS           F
    A         2    14,144.44     7,072.22      61.06
    B         2     5,511.27     2,755.64      23.79
    C         2   244,696.39   122,348.20   1,056.27
    AB        4     1,069.62       267.20       2.31
    AC        4        62.67        15.67        .14
    BC        4       311.67        82.92        .72
    ABC       8     1,080.77       135.10       1.17
    Error    27     3,127.50       115.83
    Total    53   270,024.33
    b. All F ratios for interaction terms are smaller than the corresponding tabled F.05 values.
    c. All three main effects are significant.
19. a.
    Source   DF   SS          MS        F
    A         2      12.896     6.448    1.04
    B         1     100.042   100.042   16.10
    C         3     393.417   131.139   21.10
    AB        2       1.646      .823     .13
    AC        6      71.021    11.837    1.90
    BC        3       1.542      .514     .08
    ABC       6       9.771     1.628     .26
    Error    72     477.50      6.215
    Total    95   1,037.833
    b. The main effects for factors B and C are significant.
    c. None of the interaction terms is significant.
21. a. Software gives the following table:
    Source   DF   SS       MS       F       P-value
    A         2   124.60    62.30    4.85   0.04
    B         2    20.61    10.30    0.80   0.48
    C         2   356.95   178.47   13.89   0.00
    AB        4    57.49    14.37    1.12   0.41
    AC        4    61.39    15.35    1.19   0.38
    BC        4    11.06     2.76    0.22   0.92
    Error     8   102.78    12.85
    Total    26   734.88
    b. The F ratios for the AB, AC, and BC interactions are 1.12, 1.19, and .22, respectively. None of these is statistically significant at α = .05.
    c. The main effects for A (paste thickness) and C (laser power) have corresponding F ratios 4.85 and 13.89, respectively. Both are statistically significant at α = .05.
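Note: tables like those in Problems 15–21 are exactly what a three-factor factorial fit produces. A hedged R sketch; the data frame, response, and factor names below are hypothetical placeholders, not the exercises' data:

  # hypothetical balanced data: 3 levels per factor, 2 replicates (54 runs)
  dat <- expand.grid(A = factor(1:3), B = factor(1:3), C = factor(1:3), rep = 1:2)
  dat$y <- rnorm(nrow(dat))              # placeholder responses
  fit <- aov(y ~ A * B * C, data = dat)  # all main effects and interactions
  summary(fit)                           # Df, Sum Sq, Mean Sq, F value, Pr(>F)
  # Each F ratio is that term's mean square over the error mean square,
  # e.g. in Problem 21: 178.47 / 12.85 = 13.89 for factor C.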
23. a.
    Source   df   SS       MS       F         P-value
    A         2    34436    17218    436.92   0.000
    B         2   105793    52897   1342.3    0.000
    C         2   516398   258199   6552.04   0.000
    AB        4     6868     1717     43.57   0.000
    AC        4    10922     2731     69.29   0.000
    BC        4    10178     2545     64.57   0.000
    ABC       8     6713      839     21.3    0.000
    Error    27     1064       39
    Total    53   692372
    b. The F ratios for the AB, AC, BC, and ABC interactions are 43.57, 69.29, 64.57, and 21.3, respectively. These F ratios are all statistically significant at α = .01.
    c. The F ratios for the A, B, and C main effects are 436.92, 1342.3, and 6552.04, respectively. These F ratios are all statistically significant at α = .01.
25.
    AB   AC   AD   BD   CD   ABD  ACD  BCD  ABCD
    +1   +1   +1   +1   +1   −1   −1   −1   +1
    −1   −1   −1   +1   +1   +1   +1   −1   −1
    −1   +1   +1   −1   +1   +1   −1   +1   −1
    +1   −1   −1   −1   +1   −1   +1   +1   +1
    +1   −1   +1   +1   −1   −1   +1   +1   −1
    −1   +1   −1   +1   −1   +1   −1   +1   +1
    −1   −1   +1   −1   −1   +1   +1   −1   +1
    +1   +1   −1   −1   −1   −1   −1   −1   −1
    +1   +1   −1   −1   −1   +1   +1   +1   −1
    −1   −1   +1   −1   −1   −1   −1   +1   +1
    −1   +1   −1   +1   −1   −1   +1   −1   +1
    +1   −1   +1   +1   −1   +1   −1   −1   −1
    +1   −1   −1   −1   +1   +1   −1   −1   +1
    −1   +1   +1   −1   +1   −1   +1   −1   −1
    −1   −1   −1   +1   +1   −1   −1   +1   −1
    +1   +1   +1   +1   +1   +1   +1   +1   +1
    All contrasts and effects are shown here:
    Name   Contrast   Effect      Name   Contrast   Effect     Name   Contrast   Effect
    A      −33.84     −4.23       AC     −4.76      −0.595     ABC    −1.20      −0.150
    B       −6.94     −0.8675     AD     −2.56      −0.320     ABD    −1.48      −0.185
    C        3.98      0.4975     BC     24.62       3.0775    ACD     1.20       0.150
    D      150.94     18.8675     BD     15.42       1.9275    BCD     5.98       0.7475
    AB     −13.8      −1.725      CD      3.26       0.4075    ABCD    2.04       0.255
27. a.
    Source   df   SS        MS        F         P-value
    A         1    1685.1    1685.1    102.38   0.000
    B         1   21272.2   21272.2   1292.36   0.000
    C         1    5076.6    5076.6    308.42   0.000
    AB        1      36.6      36.6      2.22   0.174
    AC        1       0.4       0.4      0.03   0.877
    BC        1     109.2     109.2      6.63   0.033
    ABC       1      23.5      23.5      1.43   0.266
    Error     8     131.7      16.5
    Total    15   28335.3
    b. At α = .01, all three main effects are important, since each of their P-values is less than .001. No significant interaction effects exist when testing at α = .01.
29. a. Let A = storage time, B = storage temp, C = packaging type.
    Term   Effect
    A      −0.03125
    B      −0.24625
    C      −0.21125
    AB     −0.03125
    AC      0.02375
    BC     −0.21125
    ABC     0.02375
    b.
    Source   DF   SS         MS         F         P-value
    A         1   0.003906   0.003906     25.00   0.001
    B         1   0.242556   0.242556   1552.36   0.000
    C         1   0.178506   0.178506   1142.44   0.000
    AB        1   0.003906   0.003906     25.00   0.001
    AC        1   0.002256   0.002256     14.44   0.005
    BC        1   0.178506   0.178506   1142.44   0.000
    ABC       1   0.002256   0.002256     14.44   0.005
    Error     8   0.001250   0.000156
    Total    15   0.613144
    All interaction and main effects are significant at α = .01.
    d. A (storage time), B (storage temp), and C (packaging type) should be set to their low values.
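Note: in the contrast/effect table of Problem 25, every effect is its contrast divided by 8 = 2^(4−1) (for example, A: −33.84/8 = −4.23). A sketch of the computation in R; the response vector y is a random placeholder, since the exercise's data are not reproduced in this key:

  d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))
  y <- rnorm(16)                 # placeholder responses, in the same run order as d
  contrast_A <- sum(d$A * y)     # signed sum of the responses
  effect_A   <- contrast_A / 8   # divide by half the number of runs
  # Interaction columns are elementwise products, e.g. AB:
  effect_AB <- sum(d$A * d$B * y) / 8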
31. a.
    Term   Effect
    A      −.625
    B      9.625
    C      −4.625
    AB      .175
    AC     1.125
    BC     −1.825
    ABC    −.525
    b. Factors B and C appear to be significant.
    c. B at its high level and C at its low level.
    d. ŷ = 14.313 + 4.813xB − 2.313xC
    e.
    Term   Effect
    A       .002
    B      −2.693
    C       .067
    AB     −.132
    AC      .108
    BC      .063
    ABC    −.058
    Only factor B appears to be significant. Set factor B at its low level. The prediction equation is ŷ = 4.371 − 1.347xB.
33. a. Let A = time, B = current, C = EC area, D = volume, E = arsenic.
    Term   Effect     Term   Effect    Term   Effect
    A      20.019     AB     −3.169    BD      2.906
    B      26.119     AC     −1.181    BE     −1.456
    C       2.131     AD      2.131    CD      1.069
    D     −17.531     AE     −1.281    CE     −0.594
    E      −2.519     BC     −1.256    DE     −1.331
    b. The important effects appear to be the main effects A (time), B (current), and D (volume).
    c. A (time) and B (current) should be set to their high values. D (volume) should be set to its low value.
    d. Grand mean = 71.953, coefficient for A = (20.019/2) = 10.010, coefficient for B = (26.119/2) = 13.060, coefficient for D = (−17.531/2) = −8.766. So the prediction equation is: ŷ = 71.953 + 10.010xA + 13.060xB − 8.766xD.
35.
    A    B    C    D    E
    −1   −1   −1   −1   +1
    +1   −1   −1   +1   −1
    −1   +1   −1   +1   +1
    +1   +1   −1   −1   −1
    −1   −1   +1   +1   −1
    +1   −1   +1   −1   +1
    −1   +1   +1   −1   −1
    +1   +1   +1   +1   +1
37. By multiplying each of the 2^(5−2) − 1 = 7 effects through by the defining relation I = ACE = BDE = ABCD, you obtain the following alias structure:
    A = CE = BCD = ABDE, B = DE = ACD = ABCE, C = AE = ABD = BCDE, D = BE = ABC = ACDE, E = AC = BD = ABCDE, AB = CD = ADE = BCE, AD = BC = ABE = CDE
39. a. k = 5 and p = 1
    b. Let A = temp, B = pH, C = yeast, D = Tryptone, and E = Nitsch. A = BCDE, B = ACDE, C = ABDE, D = ABCE, E = ABCD, AB = CDE, AC = BDE, AD = BCE, AE = BCD, BC = ADE, BD = ACE, BE = ACD, CD = ABE, CE = ABD, DE = ABC
    c. The four-way interactions are confounded with the main effects. The two-way interactions are confounded with the three-way interactions. So, if all interactions consisting of three or more factors are negligible, none of the estimates of the remaining effects will be confounded with one another.
41. a. k = 3 and p = 1
    b. Let A = anode height, B = board orientation, C = anode placement. The design generator gives the defining relation I = −ABC, so the alias structure is: A = −BC, B = −AC, C = −AB
    c.
    Effect Name   Effect
    A             −3.135
    B             −1.135
    C             −4.925
    d. SSE = (−4.925)² = 24.26, SSTo = (s²)(3) = (3.4338)²(3) = 35.37, SSA = (−3.135)² = 9.83, SSB = (−1.135)² = 1.29. When testing at α = .05, neither factor A nor factor B is important, since their corresponding P-values (.639 and .856) are so large.
    e. Based on our analysis in part (d), we cannot conclude that factors A or B are significant. Also, we assumed factor C was not significant in order to test for the significance of factors A and B.
    f. Since we have found no significant differences between the factors, the decision about how to minimize the variation in plating thickness would not be made using the statistical analysis from the early parts of this problem. However, based solely on the sign of the effect for each factor, one might conclude that all three factors should be set at the high level (+1) in order to minimize the variation in plating thickness.
43. a. Let A = temp, B = pH, C = yeast, D = Tryptone, and E = Nitsch. Using the DOE command in Minitab, the effects estimates are:
    Term       Effect     Coef       Term   Effect    Coef
    Constant              50.288     AD     −1.400    −0.700
    A          23.750     11.875     AE     −1.050    −0.525
    B           6.850      3.425     BC     12.450     6.225
    C          −0.675     −0.337     BD     14.100     7.050
    D         −18.725     −9.363     BE     −2.700    −1.350
    E          −5.725     −2.863     CD     −4.125    −2.062
    AB         −8.075     −4.037     CE      8.225     4.112
    AC          4.200      2.100     DE     −6.675    −3.337
    Estimates of the three- and four-way interaction terms would be determined by the estimates of the corresponding aliased terms. See Exercise 39 for the alias structure.
    b. A (= BCDE), D (= ABCE)
    d. Settings that maximize percent protection are: A high and D low.
45. a.
    Source   df   SS           MS          F
    A         2    30,763.00   15,381.50   3.79
    B         3    34,185.60   11,395.20   2.81
    AB        6    43,581.20    7,263.53   1.79
    Error    24    97,436.80    4,059.87
    Total    35   205,966.60
    b. The F ratio for the interaction effect is 1.79, which is not significant at the 5% level.
    c. The F ratio for the A main effect is 3.79, which is significant at the 5% level.
    d. The F ratio for the B main effect is 2.81, which is not significant at the 5% level.
47. a.
    Source   df   SS         MS         F
    A         2   2.0742     1.0371     162.38
    B         2   0.080570   0.0403       6.31
    C         2   0.2604     0.130195    20.38
    AB        4   0.0143     0.0036       0.56
    AC        4   0.145137   0.0363       5.68
    BC        4   0.0194     0.0049       0.76
    Error     8   0.0511     0.006387
    Total    26   2.4195
    b. The significant effects are the A and C main effects.
49. a.
    Source   df   SS       MS       F
    A         2   326.67   163.34   5.89
    B         2    43.83    21.92   0.79
    C         2   123.84    61.92   2.23
    AB        4    48.51    12.13   0.44
    AC        4   168.26    42.06   1.52
    BC        4    23.49     5.87   0.21
    Error     8   221.68    27.71
    Total    26   956.28    36.78
    b. The two-way interactions AB, AC, and BC have corresponding F ratios .44, 1.52, and .21, respectively. None of these values is significant at the 5% level.
    c. Only main effect A is significant at the 5% level.
51. a. Let A = time, B = pressure, C = temp. Using the DOE command in Minitab, the effects estimates are:
    Term       Effect   Coef       Term   Effect    Coef
    Constant            163.88     AB     −63.25    −31.62
    A          19.25      9.63     AC       3.75      1.88
    B          99.25     49.62     BC     −47.25    −23.62
    C          41.25     20.63     ABC      1.25      0.62
    c. B, C, AB, BC
    d. Settings that maximize lignin removal are: B high and C high.
53. Caution: the test runs are not in Yates order. The pooled SS for 2-factor interactions is 18.12 with 10 degrees of freedom, so MSE = 1.812; SSA = .856, SSB = 11.391, SSC = 1.380, SSD = 44.56, and the SS for factor E = 14.25. Factors B, D, and E are significant at α = .01.
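Note: the Minitab effect/coefficient listings in Problems 43 and 51 obey a simple rule visible in the tables: each coefficient is half the corresponding effect, because an effect is the response change across two coded units (−1 to +1). The same numbers come from an ordinary regression on ±1 codes. A hedged R sketch with placeholder responses:

  d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
  d$y <- rnorm(8)                       # placeholder responses
  fit <- lm(y ~ A * B * C, data = d)    # saturated 2^3 model
  coefs <- coef(fit)[-1]                # slopes (the intercept is the grand mean)
  effects <- 2 * coefs                  # Minitab-style effect estimates
  round(cbind(Coef = coefs, Effect = effects), 3)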
Chapter 11

1. a. .095    b. .475    c. .830, 1.305    d. .4207
3. a. V = γ0 · γ1^(1/T) · ε    b. 26.341
5. a. Yes, a linear model seems appropriate.
    b. ŷ = .1012 + .4607x
    c. .3085
    d. .0011
7. a. Yes, a linear model seems appropriate for each pair of variables.
    b. ŷ(0%) = 123.501 − 8.711x, ŷ(20%) = 158.570 − 13.562x, ŷ(40%) = 167.282 − 17.113x
    c. As timber damage increases, the linear relationship between pile length and critical rating becomes increasingly negative.
    d. se = .45 at 0%, se = 3.01 at 20%, se = 4.7 at 40%. As timber damage increases, the estimated value se also increases.
9. a. For a one-unit increase in inverse foil thickness, one would expect a .260-unit increase in flux. 98% of the observed variation in flux can be attributed to the simple linear regression relationship between flux and inverse foil thickness.
    b. 5.712
    c. 11.302
11. H0: α = 0 versus Ha: α ≠ 0, t = a/sa = 1.128/2.368 = .48, P-value = .642, do not reject H0.
13. a. Method 1: hypothesis test, H0: β = 0 versus Ha: β ≠ 0, t = 54.56, P-value < .0001; reject H0 and conclude that there is a useful linear relationship between these two variables. Method 2: a confidence interval for β is b ± (t critical value)·sb. A 95% confidence interval for β is .87825 ± (2.179)(.01610) = (0.8432, 0.9133), using the t critical value for df = (n − 2) = (14 − 2) = 12. The plausible values are all positive, so we conclude there is a useful linear relationship between the two variables.
    b. The t ratio for testing model utility would be the same value regardless of which of the two variables was defined to be the independent variable. This can easily be seen by looking at the t test statistic for testing whether the population correlation coefficient is equal to zero. The only values required in that equation are the sample size (n) and the sample correlation coefficient (r), and neither r nor n depends on which variable is the independent variable.
15. As we saw in Exercise 13(b), the t ratio for testing model utility depends only on the sample size and the sample correlation coefficient. Neither of these quantities is unit dependent, so multiplying the dependent variable by a constant will have no effect on the t test statistic.
17. H0: ρ = 0 versus Ha: ρ > 0, t = 5.25, P-value < .0001, reject H0.
19. a. b = 1.378. There is, on average, a 1.378% increase in reported nausea for each unit increase in motion sickness dose.
    b. t = 3.422. Yes, there is a useful relationship between the two variables.
    c. It would be possible, but not advisable, because x = 5 is outside the range of the x data.
    d. b = 1.424
21. a. The scatterplot appears to be quite linear.
    b. .931
    c. If increasing velocity by 900 cm/sec results in an average change in the response of .6, then the true population slope coefficient is β = .6/900 = 6.667 × 10⁻⁴. H0: β = 6.667 × 10⁻⁴ versus Ha: β < 6.667 × 10⁻⁴, t = −.6016, P-value > .10, do not reject H0.
    d. We are 95% confident that the true average change in mist associated with a 1 cm/sec increase in velocity is between 4.26 × 10⁻⁴ and 8.159 × 10⁻⁴.
23. a. A 95% prediction interval is (3.2833, 3.6067).
    b. The interval when the temperature is 1200 degrees will be wider than when the temperature is 1500 degrees. This is because 1200 degrees is 200 degrees away from the mean temperature of 1400 degrees, whereas 1500 degrees is only 100 degrees away from the mean temperature.
25. The mean x value is 40.3. Intervals with x values farther away from this mean are wider. Also, prediction intervals are wider than confidence intervals, and 99% intervals are wider than 95% intervals. Therefore, (i) will be wider than (iii), (i) will be narrower than (ii), (ii) will be wider than (iv), and (iii) will be narrower than (iv) and (v).
27. a. t = 16.2, P-value ≈ 0; conclude there is a useful linear relationship.
    b. (.879, .947)    c. (.780, 1.046)
29. a. 4.9 hr
    b. When number of deliveries is held fixed, the average change in travel time associated with a 1-mile increase in distance traveled is .060 hr. When distance traveled is held fixed, the average change in travel time associated with one extra delivery is .900 hr.
    c. .9861
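Note: the slope inferences above (Problems 13–27) use the same two facts throughout: t = b/sb with n − 2 df, and the interval b ± (t critical value)·sb. A self-contained R sketch on simulated data (all numbers hypothetical):

  set.seed(1)
  x <- 1:14
  y <- 2 + 0.9 * x + rnorm(14, sd = 0.5)   # hypothetical data, n = 14
  fit <- lm(y ~ x)
  summary(fit)$coefficients   # b, s_b, t = b/s_b, and the two-sided P-value
  confint(fit, "x", 0.95)     # b +/- t.crit * s_b with df = n - 2 = 12
  # Problem 13(a) is exactly this computation: t = .87825/.01610 = 54.56, df = 12.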
31. b. For x = 350, ŷ = 106.5. For x = 485, ŷ = 65.325. The mean free flow percentage is higher when viscosity is 350.
    c. The change in mean free flow percentage as viscosity increases from 450 to 460 is 81.2 − 86.5 = −5.3. The change in mean free flow percentage as viscosity increases from 460 to 470 is 75.3 − 81.2 = −5.9.
33. a. 77.3    b. 40.4
35. a. To test H0: β1 = β2 = 0 versus Ha: at least one of β1 and β2 is not zero, F = MSRegr/MSResid = 1260.71 (from printout), P-value < .001, reject H0.
    b. A 95% confidence interval for β2: b2 ± (t critical)·sb2 = .002775 ± (2.093)(.001121) = (0.00043, 0.00512).
    c. For x1 = 11.5 and x2 = 40, ŷ = 4.478. A 95% confidence interval for the true average deposition rate is 4.478 ± (2.093)(.02438) = (4.42697, 4.52903).
    d. se = .04485; the prediction interval is 4.478 ± (2.093)·√((.04485)² + (.02438)²) = (4.371, 4.585). This interval contains the interval from part (c), as expected.
37. a. F = 87.6, P-value ≈ 0; there does appear to be a useful linear relationship between y and at least one of the predictors.
    b. .935    c. (9.095, 11.087)
39. b. .9986
    c. P-value ≈ 0; judge the model useful.
    d. t = 48, P-value ≈ 0; the quadratic predictor does appear useful.
    e. (20.00, 21.14), (19.44, 21.70)
41. F = 3.44, .05 < P-value < .10; conclude that the second-order predictors do not provide useful information.
43. a. The variable "supplier" has three categories, so we need two indicator variables: x2 = 1 for supplier 1 (0 otherwise), x3 = 1 for supplier 2 (0 otherwise). Likewise for "lubrication" we have two indicator variables: x4 = 1 for lubricant 1 (0 otherwise), x5 = 1 for lubricant 2 (0 otherwise).
    b. H0: β1 = β2 = β3 = β4 = β5 = 0 versus Ha: at least one βi ≠ 0. F = 20.67, P-value < .001, reject H0.
    c. (13.80, 19.09)
    d. .741, a negligible drop in R², suggesting the lubrication regimen indicator variables are not important. A formal "full" versus "reduced" model test confirms this suggestion.
    e. The corresponding "full" versus "reduced" model test uses the null hypothesis that the interaction terms are not statistically significant contributors to the model. F = 3.19, .01 < P-value < .05; reject H0 at the .05 level and conclude that the interaction terms, as a group, do contribute significantly to the regression model.
45. a. Since the plot of normal quantiles versus standardized residuals looks linear, we would conclude that the standardized residuals are normally distributed.
    b. The plot of x versus the standardized residuals has no discernible pattern, so we would conclude that our simple linear regression model assumptions are being met.
47. a. We would recommend the model with k = 2. This model has a substantially higher adjusted R² value than the model with k = 1, and the models with k = 3 and k = 4 give little improvement.
    b. No, a forward selection method would not have considered the k = 2 model described in the example. Forward selection would let x4 enter the model first and would not delete it at the next stage.
49. The model with four variables, including all but the summerwood fiber variable, would seem best. Its R² is as large as that of any of the models, including the 5-variable model; its adjusted R² is at its maximum and its CP is at its minimum. As a second choice, one might consider the model with k = 3, which excludes the summerwood fiber and springwood % variables.
51. The model using the three variables x3, x9, x10 would seem best. It has an adjusted R² only slightly smaller than the largest adjusted R². As a second choice, the two-predictor model is also quite good.
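Note: the "full versus reduced" tests in Problems 35, 41, and 43(d)–(e) all compute F = [(SSResid_red − SSResid_full)/q] / [SSResid_full/(n − k − 1)], where q predictors are dropped. In R this is one call to anova(); a sketch on hypothetical data:

  set.seed(2)
  dat <- data.frame(x1 = rnorm(30), x2 = rnorm(30),
                    x3 = rnorm(30), x4 = rnorm(30))
  dat$y <- 1 + dat$x1 + rnorm(30)            # hypothetical response
  full    <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
  reduced <- lm(y ~ x1, data = dat)
  anova(reduced, full)   # extra-sum-of-squares F test and its P-value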
53. a. R² = 1 − SSE/SST = 1 − 10.5513/30.4395 = .653 or 65.3%, while adjusted R² = 1 − MSE/MST = 1 − (10.5513/24)/(30.4395/28) = .596 or 59.6%. Yes, the model appears to be useful.
    b. The corresponding "full" versus "reduced" model test uses the null hypothesis that the 10 second-order interaction terms are not statistically significant contributors to the model. F = 13.21, P-value < .001; reject H0 at the .01 level and conclude that at least one of the second-order terms is a statistically significant predictor of protein yield.
    c. We want to compare the "full" model with 14 predictors in (b) to a "reduced" model with 5 fewer predictors (x1, x1², x1x2, x1x3, x1x4). F = .62, P-value > .10, fail to reject H0; it indeed appears that the five predictors involving x1 could all be removed.
    d. The "best" models seem to be the 7-, 8-, 9-, and 10-variable models. All of these models have high adjusted R² values and low Mallows' CP values compared to the other models.
55. H0: β = 0 versus Ha: β ≠ 0, z = .73, P-value = .463, do not reject the null hypothesis. There is insufficient evidence to claim that age has a significant impact on the presence of kyphosis.
57. a. For x1 = pillar height to width ratio, H0: β1 = 0 versus Ha: β1 ≠ 0, z = 1.878, P-value = .0604, reject H0. For x2 = pillar strength to stress ratio, H0: β2 = 0 versus Ha: β2 ≠ 0, z = 2.145, P-value = .0319, reject H0. Each of the variables appears to have a significant impact on pillar stability.
    b. The odds of pillar stability change by the multiplicative factor e^2.774 = 16.02 when x1 increases by 1 and x2 remains fixed. The odds of pillar stability change by the multiplicative factor e^5.668 = 289.46 when x2 increases by 1 and x1 remains fixed.
    c. The table of observations with corresponding probabilities and labels is shown below. Based on this, only two observations had a label that did not match actual stability status. The pillar with ID #3 was labeled as "unstable" when in fact it was stable. The pillar with ID #28 was labeled as "stable" when in fact it was unstable.
    ID   x1     x2     Stable?   Prob    Label
    1    1.8    2.4    Y         0.996   stable
    2    1.65   2.54   Y         0.997   stable
    3    2.7    0.84   Y         0.29    unstable
    4    3.67   1.68   Y         0.999   stable
    5    1.41   2.41   Y         0.988   stable
    6    1.76   1.93   Y         0.936   stable
    7    2.1    1.77   Y         0.938   stable
    8    2.1    1.5    Y         0.765   stable
    9    4.57   2.43   Y         1       stable
    10   3.59   5.55   Y         1       stable
    11   8.33   2.58   Y         1       stable
    12   2.86   2      Y         0.998   stable
    13   2.58   3.68   Y         1       stable
    14   2.9    1.13   Y         0.787   stable
    15   3.89   2.49   Y         1       stable
    16   0.8    1.37   N         0.041   unstable
    17   0.6    1.27   N         0.014   unstable
    18   1.3    0.87   N         0.01    unstable
    19   0.83   0.97   N         0.005   unstable
    20   0.57   0.94   N         0.002   unstable
    21   1.44   1      N         0.03    unstable
    22   2.08   0.78   N         0.05    unstable
    23   1.5    1.03   N         0.041   unstable
    24   1.38   0.82   N         0.009   unstable
    25   0.94   1.3    N         0.04    unstable
    26   1.58   0.83   N         0.017   unstable
    27   1.67   1.05   N         0.072   unstable
    28   3      1.19   N         0.872   stable
    29   2.21   0.86   N         0.105   unstable
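Note: Problems 55 and 57 are logistic regressions. Each fitted slope β gives an odds multiplier e^β (Problem 57(b): e^2.774 = 16.02), and the fitted probabilities are what label the pillars in 57(c). A small R sketch; the six rows below are copied from the 57(c) table, so the fit itself is only illustrative:

  pillars <- data.frame(
    x1 = c(1.8, 2.7, 2.1, 0.8, 1.3, 3.0),
    x2 = c(2.4, 0.84, 1.5, 1.37, 0.87, 1.19),
    stable = c(1, 1, 1, 0, 0, 0))
  fit <- glm(stable ~ x1 + x2, data = pillars, family = binomial)
  summary(fit)$coefficients        # z statistics and P-values, as in 57(a)
  exp(coef(fit))                   # odds multipliers, as in 57(b)
  predict(fit, type = "response")  # fitted stability probabilities, as in 57(c)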
59. a. A simple linear regression model seems to fit the data well. The least squares regression equation is ŷ = −.220 + .0436x. The model utility test obtained from Minitab produces a t test statistic equal to 12.72. The corresponding P-value is extremely small, so we have sufficient evidence to claim that ΔCO is a good predictor of ΔNOy.
    b. ŷ = −.220 + .04362(400) = 17.228. A 95% prediction interval produced by Minitab is (11.953, 22.503). Since this interval is so wide, it does not appear that ΔNOy is accurately predicted.
    c. The large ΔCO value has extremely high leverage. The least squares line obtained when excluding the value is ŷ = 1.00 + .0346x. The R² value with the value included is 96% and is reduced to 75% when the value is excluded. The value of se with the value included is 2.024 and with the value excluded is 1.96. So the large ΔCO value does appear to affect our analysis in a substantial way.
61. a. Same x values yet different y values.
    b. b = .01023, sb = .009577, t = 1.1, P-value ≈ .3; the model cannot be judged useful.
63. a. The statement is incorrect. r² is not the "linear correlation coefficient"; r² is the coefficient of determination. The linear correlation coefficient is r, and r = √.89 = .9434.
    b. H0: ρ = 0 versus Ha: ρ ≠ 0. The value of the t test statistic equals 12.06. The corresponding P-value is extremely small, so we reject the null hypothesis and conclude that there is a linear relationship between the two variables.
    c. As x increases, so does the variation in the standardized residuals. This fact is inconsistent with the constant variance assumption of a least squares regression analysis.
65. a. The full model contains k = 9 predictors; the reduced model contains 3 predictors. H0: β4 = β5 = ⋯ = β9 = 0 versus Ha: at least one of these β's is not zero.
    F = [(1523351 − 805534)/6] / [805534/(15 − 10)] = .743, P-value > .10, do not reject H0. There is not sufficient evidence to claim that the second-order predictors provide useful information beyond what is contained in the three first-order predictors.
67. a. ŷ = 84.67 + .650 − .258 + .133 + .108 − .135 + .028 + .028 − .072 + .038 − .075 + .213 + .200 − .188 + .050 = 85.39. The value of the residual for the one observation made under the specified conditions is (y − ŷ) = (85.4 − 85.39) = .01.
    b. Let z1, z2, z3, z4 denote the uncoded variables. Then z1 = .1x1 + .3, z2 = .1x2 + .3, z3 = x3 + 2.5, z4 = 15x4 + 160. Equivalently, x1 = 10z1 − 3, x2 = 10z2 − 3, x3 = z3 − 2.5, x4 = (z4 − 160)/15. Substitution yields the following least squares regression coefficients:
    Term       Coefficient
    Constant    76.437
    z1          −7.35
    z2           9.61
    z3           −.915
    z4           .09632
    z1²        −13.452
    z2²          2.798
    z3²          .02798
    z4²          −.0003201
    z1z2         3.750
    z1z3         −.7500
    z1z4         .14167
    z2z3         2.000
    z2z4         −.1250
    z3z4         .00333
    c. The full model contains k = 14 variables; the reduced model contains 4 variables. H0: β5 = ⋯ = β14 = 0 versus Ha: at least one of these β's is not zero. SSResid(full) = 1.9845, SSResid(reduced) = 4.8146. The value of the test statistic is
    F = [(4.8146 − 1.9845)/10] / [1.9845/(31 − 15)] = 2.28,
    .05 < P-value < .10, do not reject the null hypothesis. There is not sufficient evidence at the 5% level to claim that the second-order predictors provide useful information beyond what is contained in the four first-order predictors.
69. A plot of y versus x suggests that a simple linear regression model may be appropriate, but a graph of the residuals versus fitted values questions the validity of a simple linear regression model. Fitting higher-order models (such as second- and third-order) may be more appropriate. The second-order model has R² = 65.3% and adjusted R² = 62%, whereas the third-order model has R² = 70.7% and adjusted R² = 66.3%. Comparing adjusted R² values, the third-order model seems to perform slightly better. From the second-order model, we predict y (at x = 30) to be 3.45 + .0618(30) − .000377(30²) = 4.9647. From the third-order model, our estimate for y is 3.94 − .045(30) + .0041(30²) − .000048(30³) = 4.984. Both models give roughly the same estimate.
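Note: the F statistic in Problem 67(c) can be recomputed directly from the two residual sums of squares; the same arithmetic underlies Problem 65. In R:

  sse_red <- 4.8146; sse_full <- 1.9845   # from Problem 67(c)
  q <- 10                                 # predictors dropped: 14 - 4
  df_full <- 31 - 15                      # n - (k + 1)
  F_stat <- ((sse_red - sse_full) / q) / (sse_full / df_full)   # 2.28
  pf(F_stat, q, df_full, lower.tail = FALSE)  # about .066: between .05 and .10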
71. a. The boxplot shows that the shapes of the ppv for the cracked and uncracked prisms appear to be fairly symmetric. The boxplot further suggests that the ppv for the cracked prisms tend to be greater than the ppv for the uncracked prisms. Let μ1 = the true mean ppv for uncracked prisms and μ2 = the true mean ppv for cracked prisms. A 95% confidence interval for (μ1 − μ2), using the critical t value = 2.093 based on 19 df, is
    (482.7 − 827.4) ± 2.093·√(233.72²/18 + 295.32²/12) = −344.7 ± 2.093(101.494), or (−557.127, −132.273).
    b. Using Minitab, we can run best subsets regression with PPV, PPV², the indicator variable Crack? (0 if there is no crack present and 1 if there is a crack), and the interaction term PPV*Crack?. The best subsets regression suggests that the quadratic term PPV² is the single most useful predictor. The quadratic regression model, which has an R² value of 61.2%, has the equation ŷ = .996719 − .00000001(PPV)². The next most useful single predictor is the PPV term. That simple linear regression model, which has an R² value of 57.7%, has the equation ŷ = 1.00161 − .000018(PPV). Models involving more than one term don't appear to explain the ratio variable significantly better, since the R² values of such models are not much different from those of the models that simply use PPV² or PPV.
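Note: the interval in Problem 71(a) is the two-sample (unpooled) t interval (x̄1 − x̄2) ± t·√(s1²/n1 + s2²/n2). Plugging in the answer's own summary numbers in R:

  xbar1 <- 482.7; s1 <- 233.72; n1 <- 18    # uncracked prisms
  xbar2 <- 827.4; s2 <- 295.32; n2 <- 12    # cracked prisms
  se <- sqrt(s1^2 / n1 + s2^2 / n2)         # 101.494
  (xbar1 - xbar2) + c(-1, 1) * 2.093 * se   # (-557.127, -132.273)
  # With the raw data, t.test(ppv1, ppv2) reproduces this interval and its df.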
Index

A

Absolute error, 191
Acceptance sampling, 226
Accuracy of measuring data, 186–187
Addition rule in probability, 203–204, 206–207
Additive (probabilistic) model, 504, 534
Adjusted coefficient of multiple determination, 545
Alias structures, 492–494
All subsets regression, 560–561
Allowable process spread, 268
Alternative hypothesis, 353
American Society for Testing and Materials (ASTM), 164, 189
Analysis of variance (ANOVA), 414
  interpretation of results, 427–435
  multiple comparisons, 428–432
  randomized block, 435–441
  and regression, 521–522
  single-factor (one-way), 419–427
  two-factor (two-way), 457
Analytic studies, 9–10
ANOVA. See Analysis of variance (ANOVA)
ANOVA assumptions, 420
ANOVA decompositions, 420, 437, 457
ANOVA formulas, 456–457, 464–466
ANOVA table, 421, 466, 521
ANOVA tests, 415–416, 417
Assignable causes, 252
Association and causation, 115
Attributes control charts, 249
Attributes data, 249, 273–283
Automatic selection procedure, 561
Axioms of probability, 202

B

Backward elimination, 561
Balanced designs, 431, 453
Balanced three-factor ANOVA, 465
Bar charts, 22–23
Bayes' theorem, 213
Benchmarks, 165–166
Best regression approach, 561
Between samples variation, 419
Bias, 296
  in estimation, 295
Bimodal histogram, 21
Binomial distribution, 51–54
  mean value, 67
  Poisson approximation to, 55–56
  table, 584–586
  variance of, 75
Bivariate data, 4, 102
  fitting a line to, 117–132
Bivariate normal distribution, 154–155, 522
Bootstrap confidence intervals, 342–344, 551
Bootstrap percentile intervals, 343
Bound on the error of estimation, 172, 303–304, 310–311
Boxplots, 80, 83–86
  comparative, 84
  and outliers, 85–86

C

c charts, 278–282
Calibration, 186, 191
Capability, process, 265–273
  nonconformance rates, 266–268
Capability indexes, 268–272
Capability ratio, 272
Capture-recapture experiment, 294
Categorical data analysis, 380–394
  Fisher's exact test, 389–391
  homogeneity, test for, 385–389
  univariate, 381–384
Category proportions, 380
Causation, 115
Cells, 454
Censored data, 71
Census, 3
Center, measures of, 62–72
Centerlines, 253
Central limit theorem, 235–238
Chance experiments, 195–201
Chart statistic, 253
Chebyshev's inequality, 99–100
ANOVA tests, 415–416, 417 test of hypotheses, 405 Chebyshev’s inequality, 99–100
629

Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has
www.ebook3000.com
deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
630 Index

Chi-squared distributions, 380–381
  table, 381, 593–594
Chi-squared tests
  for distribution, 396–397
  for homogeneity, 386–387
  for univariate categorical data, 383
  upper-tailed test, 381
Class interval (classes), 16–17
Cluster sampling, 177
Clusters, 177
Code of Federal Regulations, 164
Coding schemes, 472–474
Coefficient of determination, 123–124, 511
Coefficient of multiple determination, 136–137, 144, 544–545
  model selection, 559–563
Coefficient of variation, 79
Common causes, 252
Comparative boxplots, 84
Comparative stem-and-leaf displays, 13
Complement of an event, 199, 204–205
Complete second-order model, 145, 539
Completely randomized design, 419
Components of variance, 434
Conditional probabilities, 208–215
Confidence bounds, 304–305, 309
Confidence intervals, 298, 299–303
  bootstrap, 342–344
  difference between means, 311–314, 327–335, 329–331, 344, 405
  large-sample, 298–317, 300, 303, 311–314
  for the mean, 343–344
  mean y value in regression, 527–529
  median, 349
  one sided, 304–305
  from paired data, 329–331
  paired t, 330
  pooled t, 329
  for population proportions, 309
  prediction intervals, 530
  regression coefficient, 547, 549
  sample size determination, 303–304, 310–311
  simultaneous, 531
  slope of regression line, 517
  and test procedures, 404–405
  Wilcoxon's rank-sum test, 407
Confidence levels, 172–173
  for confidence intervals, 298, 299–303
Conformance, 248
Conforming data, 273
Confounding, 492
Consistent estimator, 297
Contingency table, 385
Continuous distributions, 28–33, 46–50
  mean (expected) value, 220
  measures of center for, 67–70
  percentiles of, 86–87
  variance and standard deviation of, 76–78, 220
Continuous variables, 14
  histogram of, 16–20
  random, 216–217, 218
Contour, 450
Contour plots, 450, 451
Contrast, 478
Control charts, 240, 246, 252–256
  attributes data, 273–283
  c charts, 278–282
  constants, 258, 602
  for mean and variation, 256–265
  np charts, 274–278
  p charts, 240, 273–278
  R charts, 257–258, 263
  s charts, 260–263
  u charts, 274, 278–282
  variables, 249
  x̄ chart, 258–260, 260–263
Control limits, 253
  recomputing, 258
Controlled studies, 180
Correlation, 108, 522
  and the bivariate normal distribution, 154–155
Correlation coefficient, 108–117
  Pearson's sample, 108–114
  population, 114–115, 154, 522
Counts, 382
Covariance, 154
Critical values
  t, 319
  tolerance, 323
  z (standard normal), 299–300, 302–303
Cubic regression, 536
Cumulative proportion (frequency), 60

D

Data, 3
  attributes, 249, 273–283
  bivariate, 4
  censored, 71
  from experiments, 179–186
  measurement systems, 186–192
  measures of center for, 62–65
  measures of variability for, 72–74
  multivariate, 4
  operational definitions of, 162–166
  from sampling, 166–179
  univariate, 3
Decomposition, 420
Defective products, 248
Defining relations, 493
Degrees of freedom
  chi-squared distributions, 380
  chi-squared test, 396
  F distribution, 417
  interactions, 458
  single-factor ANOVA, 420
  single sample, 74
  Studentized range distribution, 429
  t distribution, 318, 319
  total, 457, 466
Deming, W. Edwards, 9, 163
Denominator degrees of freedom, 417
Density curves, 29
Density estimation, 21–22, 339
Density functions, 29
  joint, 153
  probability, 218
Density scale, 19

Dependent variables, 115, 117, 181
  ANOVA problems, 415
Descriptive statistics, 4, 195
Design generators, 493
Design matrix, 472–474, 490–491
Design of experiments (DOE), 179, 445
Destructive testing, 167
Deterministic relationship, 504
Deviations from the mean, 73
Discrete distributions, 33–34, 50–60
  mean (expected) value, 220
  measures of center for, 65–67
  normal approximations to, 43–44
  variance and standard deviation of, 74–76, 220
Discrete variables, 14
  histogram of, 14–16
  random, 216–217, 218
Disjoint events, 199, 200, 204–205
  addition rule for, 203–204
Distribution-free methods, 404
Distributions
  binomial, 51–54
  bivariate normal, 154–155, 522
  chi-squared, 380–381, 396–397
  continuous, 28–33
  discrete, 33–34
  exponential, 32–33
  F, 416–417
  hypergeometric, 390
  joint, 151–157
  lognormal, 46–47
  marginal, 153, 156
  measures of center for, 31, 65–70
  multivariate normal, 156
  normal, 36–46
  Poisson, 54–56
  probability, 218–220
  Rayleigh, 59
  sampling, 228–232
  standard normal, 38–41
  Studentized range, 429
  t, 318–321
  test of hypotheses, 394–399
  uniform, 34
  Weibull, 47–49
Dotplot, 13–14
Dummy (indicator) variable, 484, 539
Dunnett's method, 432–433
Dunnett's t, 432, 601

E

Ecological correlation, 117
Effects
  fixed, 433–434
  and interaction, 454–456
  main, 454–456
  random, 433–434
  sum of squares, 482
  variance, 481
Effects estimates, 475–479
Effects plots, 427–428, 454
Empirical rule, 77
Enumerative studies, 9–10
Error degrees of freedom, 458
Error sum of squares (SSE), 122, 136, 437, 457
  ANOVA notation, 419
  in regression, 510, 544
Error variance, 434
Errors in hypothesis testing, 355–357
  types I and II errors, 355
Estimate, interval. See Confidence intervals
Estimate, point, 293–294
Estimated regression line, inferences on, 525–533
Estimation error, 529
Estimator, point. See Point estimation
Events, 196–197
  complementary, 199, 204–205
  depicting, 197–198
  disjoint (mutually exclusive), 199, 200, 204–205
  forming, 198–200
  independent, 209–213
  simple, 196
Expected value. See Means
Experimental designs, 162, 179, 246, 445
Experimental errors, 182
Experimental units, 181
Experiments
  chance, 195–201
  data from, 179–186
  fractional factorial, 491
  randomized block, 435
  2^k, 474–475
Explanatory variables, 117. See also Predictor variables
Exponential distribution, 32–33
Exponential regression, 513–514
Extrapolation, danger of, 121
Extreme outliers, 85

F

F distribution, 416–417
  table, 417, 595–598
F table, 417
F test
  model utility in regression, 522, 545
  one-way ANOVA, 421
Factor levels, 181, 472
Factorial designs, 446, 452, 463–464
  fractional, 489–499
  2^k, 472–489
Factors, 414, 446
  interactions, 455
  main effect, 475
Failure, mean time between, 220
Failure laws, 284–285
Failure rates, 285
Family of distributions, 32
Family significance level, 429, 432
Fisher's exact test, 389–391
Fitted (predicted) values, 122, 510, 544
Five-number summary, 83
Fixed effects, 433–434
Fixed effects model, 459
Fixed factor, 433
Forward selection, 562
Fourths (quartiles), 81
Fractional factorial designs, 489–499
  and alias structures, 493
  experiments, 491
Frame, sampling, 168


Frequency, 14
  cumulative, 60
  relative, 14–15
Frequentist approach to probability, 202–203
Full factorial designs, 489
Full model, 550
Full quadratic model, 145, 539

G

Galton, Francis, 121–122
General addition rule, 206
General additive fit, 146–148
General additive model, 534
Goodness-of-fit tests, 119
Grand mean, 257

H

Half fraction design, 491
Hazard functions, 285–287
High level, 472
High leverage, 559
High-leverage observations, 129–130
Histograms, 14–23, 247
  of process data, 249–250
  shapes of, 21–22
Homogeneity, test for, 385–389
Hypergeometric distribution, 390
Hypothesis, 353
  and test procedures, 353–363
Hypothesis testing. See Test of hypotheses

I

Implicit null hypothesis, 359
Independence
  of joint distributions, 155–156
  and the sample mean, 233
  test for, 388–389
Independent events, 209–213
Independent variables, 115, 155–156, 181, 446. See also Predictor variables
  ANOVA problems, 415
Indicator variables, 484, 539
Inferential statistics, 6, 195
Influential observations, 559
Inspection units, 278
Interaction effects, 454–456, 477
Interaction predictors, 145
Interactions, 447–448
  and degrees of freedom, 458
  of factors, 455
  multifactor designs, 464
  two-factor designs, 454–456
  between variables, 537–539
Intercept of a line, 118
Interlaboratory comparisons, 189–191
International Organization for Standardization (ISO), 165
Interquartile range (IQR), 80–83
  and boxplots, 83–86
Interval estimates, 298
Invariance property, 339

J

Joint density function, 153
Joint distributions, 151–157
  mean values of, 154
Joint mass function, 152
Joint probabilities, 222

K

k-out-of-n system, 60
Kernel density estimation, 340–342
Kernel function, 340

L

Laplace, Pierre Simon de, 201–202
Large-sample confidence intervals, 298–317
Leaf, 10
Least squares, weighted, 137
Least squares coefficients, 135, 143
  in multiple regression, 542
Least squares estimates, 509
Least squares fit, 119
Least squares line, 119–121, 508–509
  assessing the fit of, 122–124
  polynomial functions, 135–136
  and residual plots, 127
  standard deviation about, 125–126
Level of significance, 356
Levels, 414, 446, 472
Leverage, 559
Likelihood functions, 336
Likelihood ratio principles, 405
Likelihood ratio test statistics, 405–406
Logistic regression, 563–567
Logit functions, 564
Lognormal distribution, 46–47, 242
  mean value, 69
  and quantile plots, 93
  variance of, 77
Low level, 472
Lower capability index, 271
Lower confidence bound, 305
Lower control limit (LCL), 253
Lower quartile, 80–81
Lower specification limit (LSL), 248
Lower-tailed test, 360, 363–364
LOWESS, 137, 138, 551

M

Main effects, 458
  of a factor, 475
  multifactor designs, 464
  two-factor designs, 454–456
Mann-Whitney test, 403–404
Marginal distribution, 153, 156
Mass function, 33
  binomial distribution, 52
  joint, 152
  Poisson distribution, 54
  probability, 218, 222
Matched pairs, 331
Maximum likelihood estimation, 335–339
Mean. See also specific distributions
  bootstrap confidence intervals, 343–344


  confidence interval for difference of, 311–314
  confidence intervals for, 343–344
  of continuous distributions, 67–69, 220
  control charts for, 256–265
  deviations from the, 73
  of discrete distributions, 65–67, 220
  distribution of sample, 233–235
  of a function, 72
  hypothesis testing about, 363–380
  of joint distributions, 154
  population, estimating, 171–175
  of random variables, 220–221
  sample, 62–63
  of sample population proportions, 239
  standard error of, 233
  test of hypotheses, 363–380, 365–367
  trimmed, 64–65
Mean square, 421, 466
Mean square error, 421, 511
Mean square for treatments (MSTr), 421
Mean squares, 458
Mean time to (before) failure, 220
Measurement systems, 186–192
Measures of center
  continuous distributions, 67–70
  data, 62–65
  discrete distributions, 65–67
Measures of variability, 72–74
Median
  confidence intervals, 349
  of continuous distributions, 69–70
  of a distribution, 31
  sample, 63–64
Memoryless property, 227, 285, 286
Metrology, 186
Midhinge, 99
Midrange, 99
Mild outliers, 85
Minitab, 4
Mixed models, 459
Model adequacy, 556–558
Model parameters, estimating, 508–512
Model selection, 559–563
Model utility, 522, 544–547
Model utility test
  multiple regression, 545
  simple linear regression, 520, 525
Monotonic pattern scatterplots, 513
Multicollinearity, 563
Multifactor designs, 463–471
  main effects, 464
Multimodal histogram, 21
Multiple comparison procedures, 428
  Dunnett's, 432
  Tukey's, 428–431
Multiple regression
  inferences in, 542–555
  models, 533–542
Multivariate data, 4
Multivariate data sets, 102
Multivariate normal distribution, 156
Mutually exclusive events, 199, 200
  addition rules for, 203–204, 206–207

N

National Institute of Standards and Technology (NIST), 165, 186
Negative skew, 22
Neyman allocation, 173
No main effect, 455
Nominal values, 247
Nonconformance rates, 266–268
Nonconforming data, 273
Nonconformities, 248, 266–268
  Poisson distribution, 279
Nondestructive testing, 167
Nonlinear regression models, 541
Nonlinear relationships, 132–140
Nonnormal population distributions, 324
Nonparametric tests, 404
Nonrandom samples, 170–171
Nonstandard normal distribution, 41–43
Normal distribution, 36–46
  bivariate, 154–155
  central limit theorem, 235–238
  discrete populations, approximations to, 43–44
  empirical rule, 77
  hypothesis testing, 359–360
  mean value of, 69
  nonstandard, 41–43
  percentiles of, 40, 87
  and quantile plots, 91–93, 557
  Ryan-Joiner test for, 395
  of the sample mean, 234
  small-sample intervals based on, 318–327
  standard, 38–41
  standard deviation of, 76–77
  testing for, 394–396
  variance of, 76–77
Normal equations, 542
  of the least squares line, 119
np charts, 274–278
Null hypothesis, 353
  implicit, 359
Null value, 365
Numerator degrees of freedom, 417

O

Observational studies, 180–181
Observed category frequencies, 382
Observed significance level (OSL), 358. See also P-values
Odds, 564
Odds ratio, 565
Offset, 186
One-factor-at-a-time experiments, 446–449
One-sample confidence interval, 318–321
One-sample prediction interval, 318, 321–323


One-sample t test, 402–403
  test of hypotheses, 365–367
One-sided confidence intervals, 304–305
One-sided tolerances, 247
One-tailed t test, 401–402
One-way ANOVA, 415
  F test, 421
Operating characteristic curve, 57
Operational definitions, 162–166
Out of control rules, 253–254
Outliers, 11
  in boxplots, 85–86
  and sample means, 63
  and sample medians, 63, 64
  and trimmed means, 64

P

p charts, 240, 273–278
P-values, 358
  for a chi-squared test, 381
  test of hypotheses, 357–361
  for t tests, 363
Paired data, 327, 371–374
  confidence interval from, 329–331
Paired t intervals, 330
Paired t test, 372
Parallel systems, 212–213, 288–289
Pareto diagrams, 23
Pearson's sample correlation coefficient, 108–111
  and the coefficient of determination, 124
  properties of, 111–114
Percentiles, 86–87
  of a normal distribution, 40
Plots for checking model adequacy, 556, 557, 558
Point estimates, 293–294
Point estimation, 294–298
  maximum likelihood, 335
  unbiased, 295–297
Poisson distribution, 54–56
  approximation to binomial, 55–56
  mean value, 67
  for nonconformities, 279
  table, 587
  variance of, 76
Polynomial functions, 135–137
Polynomial regression, 135–137, 535–537
Pooled t confidence intervals, 329
Pooled t test, 378
Population, 3, 166
Population correlation coefficient, 114–115, 522
Population proportions, 309
  estimating, 175–177
Population regression coefficients, 534
Population regression functions, 513, 534
Population regression line, 505
Positive skew, 16, 22
Power function relationship, 135
Power of a test, 402
Power transformations, 132–135
Practical significance, 400
Precision of measuring data, 187
Predictable controlled process, 265
Predicted values, 122, 484, 510, 544
  sampling distribution, 526
Prediction bounds, 322
Prediction errors, 529
Prediction intervals
  and confidence intervals, 530
  in multiple regression, 549
  one-sample, 318, 321–323
  in simple linear regression, 529–530
Predictor variables, 117, 140–151
  qualitative, 539–541
Predictors
  creating new, 145–146
  eliminating a group of, 550–551
  interaction, 145
  model selection, 559–563
  quadratic, 145
Probabilistic models, 504
Probability
  concepts of, 201–208
  conditional, 208–215
  joint, 222
  mass function, 218, 222
  of a match, 214
Probability density functions, 218
  joint, 222
Probability distributions, 218–220. See also individual distributions
Probability plots. See Quantile plots
Procedures, 170
Process, 9
Process capability, 265–273
  indexes, 268–272
  nonconformance rates, 266–268
Process control activities, 248
Process mean, 266
Process spread, 266
Process variation, 266
Product rule for probabilities, 210
Professional standards, 163–165
Proportion
  distribution of sample, 238–240
  population, estimate of, 175–177
Proportional allocation, 173

Q

Quadratic predictors, 145
Quadratic regression, 135–137, 535
  model, 536
Quadrats, 174, 177
Qualitative predictor variables, 539–541
Quantile, 87
Quantile plots, 90–97
  for normal distributions, 91–93, 557
  sample quantiles, 90
  Weibull distribution, 93–94
Quartiles, 80–83

R

R charts, 257–258, 263
R software, 4
Random deviation (error), 504
Random effects, 433–434
Random effects model, 459
Random experiments, 195
Random factors, 433
Random number generator, 168


Random sampling, 167, 168–170, 194
  and nonrandom samples, 170–171
  and sampling distributions, 228
  stratified sampling, 171–177
Random variables, 215–227
Randomization, 182–183, 446, 474
Randomized block design, 436
Randomized block experiments, 435–441
Range, 72
Rank, 403
Rational subgroups, 253–255
Rayleigh distribution, 59
Reduced models, 550
Redundancy, 288
Regression, 121–122
  and analysis of variance (ANOVA), 521–522
  cubic, 536
  exponential, 513–514
  line, 505, 525–533
  logistic, 563–567
  model selection, 559–563
  multiple, 533–542
  nonlinear, 541
  polynomial, 135–137, 535–537
  quadratic, 535
  simple linear, 505, 507
  single independent variable, 504–517
  slope of a line, 501, 505, 517–525
  unusual observations in, 559
Regression analysis, 117, 555–573
Regression coefficients, 547, 549
Regression sum of squares (SSRegr), 511
Relative error, 191
Relative frequency, 14–15
Reliability, 283–291
  and hazard functions, 285–287
  system, 287–289
  at time t, 285
Repeatability, 188–189
Repeated stems, 12
Replicate, 463
Replication, 182, 446, 447
Reproducibility, 188–189
Resampling procedures, 343
Research hypothesis, 354
Residual plots, 126–129, 557–558
Residual sum of squares (SSResid), 122–123, 136, 144, 510
  multiple regression, 544
Residuals, 122, 510
  multiple regression, 544
  standardized, 558
Resistant line, 129
Response surface, 449
Response variables, 117, 162, 181, 414, 446
  predicting, 484–486
Robust interval, 324
Ryan-Joiner test, 395
  for normality, 395, 603

S

s charts, 260–263
Sample mean, 62–63
  sampling distribution of, 233–238
Sample median, 63–64
Sample proportion, 175
  sampling distribution of, 238–240, 308
Sample regression line, 119–121
Sample size determination
  and confidence intervals, 303–304, 310–311
  estimation, 173
Sample space, 196
Samples, 3, 166
Sampling
  cluster, 177
Sampling data, 166–179
  with or without replacement, 168
  stratified, 171
Sampling distributions, 228–232
  of the chart statistic, 253
  of the difference between two means, 312–313
  of the estimated slope, 517, 547
  of a predicted value, 526
  of a sample mean, 233–238
  of a sample proportion, 238–240, 308
Sampling frames, 9, 168
Sampling inspection, 200
Sampling plans, 162
SAS software, 4
Scatterplot matrix, 142
Scatterplots, 102–107, 508
  and correlation, 108
  monotonic pattern, 513
  of nonlinear relationships, 132–140
  smoothing, 137–138
  Youden plots, 189–191
Screening designs, 494
Series system, 211, 212–213, 287–289
Shewhart, W.A., 252
Shewhart chart, 254, 256
Shifted Weibull distribution, 50
Significance, statistical vs. practical, 400
Significance level, 356
Simple events, 196
Simple linear regression model, 505, 507, 525
Simple random sampling (SRS), 171
Simultaneous confidence intervals, 531
Single-factor ANOVA, 419–427
  and boxplots, 85
  degrees of freedom, 420
  notation, 419
  test of hypotheses, 420–423
Six sigma, 248
Skewed histogram, 22
Skewed Weibull density curves, 48
Slope of a line, 118
  in multiple regression, 547
  in regression, 501, 505, 517–525
Small-sample intervals, 318–327
Smoothed histograms, 21–22
Smoothing a scatter plot, 137–138
Smoothing parameters, 340–341
Software packages, 4
Special causes, 252
Specification limits, 247–248

Standard deviation
    about the least squares line, 125–126
    of a continuous distribution, 76–78
    of a discrete distribution, 74–76
    of the normal distribution, 76–77
    sample, 73
    of the sample mean, 233
    of sampling statistics, 526, 528
Standard error, 173–174, 233
    of the sample proportion, 239
Standard normal distribution, 38–41
    table, 582–583
    table of values, 38
Standard order, 473–474
Standardized limits, 41
Standardized residuals, 558
    plot, 557–558
Standardized variables, 41
Standards, 163–165
    professional, 161
Statistic, 194
Statistical control, 253
Statistical hypothesis, 353
Statistical inferences, 194, 195
Statistical process control (SPC), 248
Statistical significance, 400
Statistically significant results, 400
Statistics, 1
    descriptive, 4
    inferential, 6
    scope of, 6–8
Stem, 10
Stem-and-leaf displays, 10–13
    comparative, 13
Stepwise regression, 563
Straight line, fitting to, 118–121
Strata, 171
Stratified sampling, 171–177
Studentized range distribution, 429
    table, 429, 599–600
Subgroups, 252
    rational, 253–255
Sum of squares, 436–437, 444–445, 456–457
    balanced three-factor ANOVA, 465
    block sum of squares (SSB), 437
    effect, 482
    error. See Error sum of squares (SSE)
    regression, 511
    residual. See Residual sum of squares (SSResid)
    total. See Total sum of squares (SSTo)
    treatment, 419, 437
Symmetric histogram, 22
System reliability, 287–289

T

t confidence interval, 318–321, 327, 329
t critical values, 319
t distributions, 318–321
t table, 319, 364, 588, 590–592
t test, 365–367, 368–369
    one sample, 365, 402–403
    one-tailed, 401–402
    paired, 372
    P-values, 363
    two sample, 368–369
Target value, 247
Test of hypotheses, 353, 437
    about categorical populations, 380–394
    about means, 363–380
    bootstrap, 405
    chi-squared tests, 396–397
    and confidence intervals, 404–405
    difference between means, 367–374, 371–374, 403–404
    distribution, form of, 394–399
    errors in, 355–357
    for a group of predictors, 550–551
    homogeneity, 385–388
    hypothesis testing, 437
    independence, 388–389
    large-sample, 359–361
    Mann-Whitney, 403–404
    means, 365–367
    model utility, 520, 545
    multifactor designs, 466–469
    normal distribution, 359–360
    one-sample t test, 365–367
    paired t test, 372
    procedures for, 405–406
    P-values, 357–361
    single-factor ANOVA, 420–423
    steps, 366–367
Test procedures, 355
    confidence intervals, 404–405
    hypothesis, 353–363
Test statistics, 357–361
Tolerance critical value, 323, 589
Tolerance intervals, 318, 323–324
Tolerances, 247
Topological reliability, 211, 287
Total degrees of freedom, 457, 466
Total quality management (TQM), 248
Total sum of squares (SSTo), 122, 437, 456–457
    ANOVA, 420
    regression, 136, 144
Transformations, 132–135, 514
    of data, 26
Treatment levels, 181, 414
Treatment sum of squares (SSTr), 419, 437
Treatments, 414
Tree diagrams, 197
Trimmed mean, 64–65
Trimming percentage, 64–65
True regression line, 505
Truncation, 12
Tukey, John, 428
Tukey’s method, 428–432
2^k designs, 472–489
    analyzing experiments, 479–484
    fraction of, 490
    models, fitting, 484–486
Two-factor ANOVA, 458
Two-factor designs, 453–463, 457–461
Two-factor interaction effects, 456, 475
Two-sample bootstrap intervals, 344
Two-sample t interval, 327–329
Two-sample t test, 368–369
Two-sample tests
    bootstrap confidence intervals, 344
    difference of means, 403–404
    t test, 368–369
Two-sided tolerances, 271
Two-tailed test, 360, 363–364
Type I error, 355
    probabilities, 358
Type II error, 355
    probabilities, 400–403

U

u charts, 274, 278–282
Unbalanced designs, 431
Unbiased estimators, 231, 295–297
Unbiasedness, 231
Uncorrelated variables, 156
Uniform distribution, 34
Unimodal histogram, 21
Unitless measures, 269
Univariate data, 3
    and hypothesis testing, 381–384
    visual displays for, 10–28
Unusual observations in regression, 559
Upper capability index, 271
Upper confidence bound, 305
Upper control limit (UCL), 253
Upper quartile, 80–81
Upper specification limit (USL), 248
Upper-tailed test, 360, 363–364
    ANOVA tests, 417
    chi-squared test, 381

V

Variability, measures of, 72–80
Variables, 3
    continuous, 14, 16–20
    control charts, 249
    dependent, 115, 181
    discrete, 14–16
    dummy (indicator), 484, 539
    independent, 115, 181
    random, 215–227
    response. See Response variables
    selection of, 559–563
Variables data, 248–249
Variance
    analysis of variance (ANOVA). See Chapter 9
    components of, 434
    of a continuous distribution, 76–78, 220
    of a difference, 312
    of a discrete distribution, 74–76, 220
    of an effect, 481
    model assumption for, 557
    of random variables, 220–221
    sample, 73
Variation
    coefficient of, 79
    control charts for, 256–265
Venn diagrams, 198

W

Weibull distribution, 47–49
    mean value of, 69
    parameter estimation, 290
    quantile plots, 93–94
    shifted, 50
    variance of, 77
Weighted least squares, 137
Whiskers, 83
Wilcoxon rank-sum confidence interval, 405
Wilcoxon rank-sum test, 403–404
    confidence intervals, 407
Window width, 340
Within-samples variation, 419

X

x charts, 258–260, 260–263

Y

Yates, Frank, 473
Yates’ standard orders, 473–474
Youden, W.J., 449
Youden plots, 189–191

Z

z confidence interval, 318
z critical values, 299–300, 302–303, 309–310
z curve, 38
z (standard normal) distribution, 38–41
    critical values, 299–300, 302–303
z table, 582–583
Zero acceptance plan, 226