Quality Control With R - An ISO Standards Approach (2015)
Series Editors
Robert Gentleman
Div. Public Health Sciences, San Mateo, California, USA
Kurt Hornik
Accounting & Statistics, WU Wien, Dept of Finance, Wien, Austria
Giovanni Parmigiani
Dana-Farber Cancer Institute, Boston, USA
Javier M. Moguerza
Department of Computer Science and Statistics, Rey Juan Carlos University,
Madrid, Spain
Use R!
This work is subject to copyright. All rights are reserved by the Publisher,
whether the whole or part of the material is concerned, specifically the rights of
translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or
by similar or dissimilar methodology now known or hereafter developed.
The publisher, the authors and the editors are safe to assume that the advice and
information in this book are believed to be true and accurate at the date of
publication. Neither the publisher nor the authors or the editors give a warranty,
express or implied, with respect to the material contained herein or for any errors
or omissions that may have been made.
Conventions
We use a homogeneous typeset throughout the book so that elements can be
easily identified by the reader. Text in Sans-Serif font is for software (e.g., R, Minitab). Text in teletype font within paragraphs is used for R components (packages, functions, arguments, objects, commands, variables, etc.).
The commands and scripts are formatted in blocks, using teletype font with a gray background. Moreover, the syntax is highlighted, so function names, character strings, and function arguments are colored (in the electronic version) or shown in different grayscales (printed version). Thus, an input block of code will look like this:
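For instance, a minimal input block could be (illustrative values):

pdensity <- c(10.68, 10.60, 10.57)
mean(pdensity)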
The text output appears just below the command that produces it, also with a gray background. Each line of the output is preceded by two hashes (##):
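Continuing the illustrative block above, the output would look like:

## [1] 10.61667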
There are quite a lot of examples in the book. They are numbered, start with the string Example (Brief title for the example), and finish with a square (□) at the end of the example. When an example is continued later in the chapter, the string (cont.) is added to the example title.
Throughout the book, when we talk about products, the discussion will very often also be suitable for services. Likewise, we use the term customer in a general manner when referring to customers and/or clients.
The Production
The book has been written in .Rnw files. Both the Eclipse + StatET IDE and RStudio have been used as editor and interface with R. Notice that if you have a different version of R or updated versions of the packages, you may not get exactly the same outputs. The session info of the machine where the code has been run is:
R version 3.2.1 (2015-06-18), x86_64-pc-linux-gnu
Locale: LC_CTYPE=es_ES.UTF-8 , LC_NUMERIC=C , LC_TIME=es_ES.UTF-8 ,
LC_COLLATE=es_ES.UTF-8 , LC_MONETARY=es_ES.UTF-8 ,
LC_MESSAGES=es_ES.UTF-8 , LC_PAPER=es_ES.UTF-8 , LC_NAME=es_ES.UTF-8 ,
LC_ADDRESS=es_ES.UTF-8 , LC_TELEPHONE=es_ES.UTF-8 ,
LC_MEASUREMENT=es_ES.UTF-8 , LC_IDENTIFICATION=es_ES.UTF-8
Base packages: base, datasets, graphics, grDevices, grid, methods, stats,
utils
Other packages: AcceptanceSampling 1.0-3, car 2.0-25, ctv 0.8-1,
downloader 0.3, e1071 1.6-4, Formula 1.2-1, ggplot2 1.0.1, Hmisc 3.16-0,
ISOweek 0.6-2, knitr 1.10.5, lattice 0.20-31, MASS 7.3-42, nortest 1.0-3,
qcc 2.6, qicharts 0.2.0, qualityTools 1.54, rj 2.0.3-1, rvest 0.2.0,
scales 0.2.5, SixSigma 0.8-1, spc 0.5.1, survival 2.38-3, XML 3.98-1.3,
xtable 1.7-4
Loaded via a namespace (and not attached): acepack 1.3-3.3, class 7.3-13,
cluster 2.0.2, colorspace 1.2-6, crayon 1.3.0, curl 0.9.1, digest 0.6.8,
evaluate 0.7, foreign 0.8-64, formatR 1.2, gridExtra 0.9.1, gtable 0.1.2,
highr 0.5, httr 1.0.0, labeling 0.3, latticeExtra 0.6-26, lme4 1.1-8,
magrittr 1.5, Matrix 1.2-0, memoise 0.2.1, mgcv 1.8-6, minqa 1.2.4,
munsell 0.4.2, nlme 3.1-121, nloptr 1.0.4, nnet 7.3-10, parallel 3.2.1,
pbkrtest 0.4-2, plyr 1.8.3, proto 0.3-10, quantreg 5.11, R6 2.1.0,
RColorBrewer 1.1-2, Rcpp 0.11.6, reshape2 1.4.1, rj.gd 2.0.0-1, rpart 4.1-10,
selectr 0.2-3, SparseM 1.6, splines 3.2.1, stringi 0.5-5, stringr 1.0.0,
tcltk 3.2.1, testthat 0.10.0, tools 3.2.1
Resources
The code and the figures included in this book are available at the book
companion website: http://www.qualitycontrolwithr.com . The data sets used in
the examples are available in the SixSigma package. Links and materials will be updated on a regular basis.
List of Figures
Fig. 1.1 Out of control process
Fig. 1.2 Chance causes variability
Fig. 1.3 Assignable causes variability
Fig. 1.4 Results under a normal distribution
Fig. 1.5 Typical control chart example
Fig. 1.6 R learning curve
Fig. 1.7 R Project website homepage
Fig. 1.8 CRAN web page
Fig. 1.9 Intuitive example control chart
Fig. 1.10 RStudio layout
Fig. 1.11 Example control chart
Fig. 1.12 RStudio new R markdown dialog box
Fig. 1.13 Markdown word report (p1)
Fig. 1.14 Markdown word report (p2)
Fig. 2.1 R GUI for Windows
Fig. 2.2 RStudio Layout
Fig. 2.3 RStudio Console
Fig. 2.4 RStudio Source
Fig. 2.5 RStudio Plots tab
Fig. 2.6 RStudio export graphic dialog box
Fig. 2.7 RStudio History
Fig. 2.8 RStudio Workspace
Fig. 2.9 RStudio Files pane
Fig. 2.10 RStudio Packages
Fig. 2.11 RStudio Help
Fig. 2.12 RStudio data viewer
Fig. 2.13 RStudio Import Dataset
Fig. 3.1 Intuitive Cause-and-effect diagram (qcc)
Fig. 3.2 Intuitive Cause-and-effect diagram (SixSigma)
Fig. 3.3 R Markdown Check sheet
Fig. 3.4 Filled Check sheet
Fig. 3.5 Control chart tool
Fig. 3.6 Pellets density basic histogram
Fig. 3.7 A histogram with options
Fig. 3.8 A lattice-based histogram
Fig. 3.9 A ggplot2-based histogram
Fig. 3.10 A simple barplot
Fig. 3.11 Basic Pareto chart
Fig. 3.12 Pareto chart with the qcc package
Fig. 3.13 Pareto chart with the qualityTools package
Fig. 3.14 Pareto chart with the qicharts package
Fig. 3.15 Scatter plot example
Fig. 3.16 Stratified box plots
Fig. 4.1 ISO Standards publication path
Fig. 4.2 ISO TC69 web page
Fig. 5.1 Thickness example: histogram
Fig. 5.2 Thickness example: histograms by groups
Fig. 5.3 Thickness example: simple run chart
Fig. 5.4 Thickness example: run chart with tests
Fig. 5.5 Thickness example: tier chart by shifts
Fig. 5.6 Thickness example: box plot (all data)
Fig. 5.7 Thickness example: box plots by groups
Fig. 5.8 Thickness example: lattice box plots
Fig. 5.9 Histogram with central tendency measures
Fig. 5.10 Normal distribution
Fig. 5.11 Histogram of non-normal density data
Fig. 5.12 Individuals control chart of non-normal density data
Fig. 5.13 Box-Cox transformation plot
Fig. 5.14 Control chart of transformed data
Fig. 5.15 Quantile-Quantile plot
Fig. 5.16 Quantile-Quantile plot (non normal)
Fig. 6.1 Error types
Fig. 6.2 OC Curves
Fig. 7.1 OC curve for a simple sampling plan
Fig. 7.2 OC curve risks illustration
Fig. 7.3 OC curve with the AcceptanceSampling package
Fig. 7.4 OC curve for the found plan
Fig. 7.5 Variables acceptance sampling illustration
Fig. 7.6 Probability of acceptance when p =AQL
Fig. 7.7 Probability of acceptance when p =LTPD
Fig. 8.1 Taguchi’s loss function and specification design
Fig. 8.2 Thickness example: One week data dot plot
Fig. 8.3 Reference limits in a Normal distribution
Fig. 8.4 Histogram of metal plates thickness
Fig. 8.5 Specification limits vs. reference limits
Fig. 8.6 Capability analysis for the thickness example
Fig. 9.1 Control charts vs. probability distribution
Fig. 9.2 Identifying special causes through individual points
Fig. 9.3 Patterns in control charts
Fig. 9.4 Control chart zones
Fig. 9.5 X-bar chart example (basic options)
Fig. 9.6 X-bar chart example (extended options)
Fig. 9.7 OC curve for the X-bar control chart
Fig. 9.8 Range chart for metal plates thickness
Fig. 9.9 S chart for metal plates thickness
Fig. 9.10 X-bar and S chart for metal plates thickness
Fig. 9.11 I & MR control charts for metal plates thickness
Fig. 9.12 CUSUM chart for metal plates thickness
Fig. 9.13 EWMA chart for metal plates thickness
Fig. 9.14 p chart for metal plates thickness
Fig. 9.15 np chart for metal plates thickness
Fig. 9.16 c chart for metal plates thickness
Fig. 9.17 u chart for metal plates thickness
Fig. 9.18 Decision tree for basic process control charts
Fig. 10.1 Single woodboard example
Fig. 10.2 Single woodboard example (smoothed)
Fig. 10.3 Woodboard example: whole set of profiles
Fig. 10.4 Woodboard example: whole set of smoothed profiles
Fig. 10.5 Woodboard example: Phase I
Fig. 10.6 Woodboard example: In-control Phase I group
Fig. 10.7 Woodboard example: Phase II
Fig. 10.8 Woodboard example: Phase II out of control
Fig. 10.9 Woodboard example: Profiles control chart
List of Tables
Table 1.1 CRAN task views
Table 1.2 Pellets density data (g/cm³)
Table 4.1 Standard development project stages
Table 5.1 Thickness of a certain steel plate
Table 6.1 Complex bills population
Table 6.2 Pellets density data
Table 7.1 Iterative sampling plan selection method
Table A.1 Shewhart constants
Footnotes
1 ISO Standards are continuously evolving. All references to standards throughout the book are specific to a given point in time; in particular, the end of June 2015.
Part I
Fundamentals
This part includes four chapters with the fundamentals of the three topics
covered by the book, namely: Quality Control, R, and ISO Standards. Chapter 1
introduces the problem through an intuitive example, which is also solved using
the R software. Chapter 2 comprises a description of the R ecosystem and a
complete set of explanations and examples regarding the use of R. In Chapter 3,
the seven basic quality tools are explored from the R and ISO perspectives.
Those straightforward tools will smoothly allow the reader to get used to both Quality Control and R. Finally, the importance of standards and how they are made can be found in Chapter 4.
© Springer International Publishing Switzerland 2015
Emilio L. Cano, Javier M. Moguerza and Mariano Prieto Corcoba, Quality Control with R, Use R!,
DOI 10.1007/978-3-319-24046-6_1
Abstract
This chapter introduces Quality Control by means of an intuitive example.
Furthermore, that example is used to illustrate how to use the R statistical
software and programming language for Quality Control. A description of R
outlining its advantages is also included in this chapter, all in all paving the way
to further investigation throughout the book.
1.1 Introduction
This chapter provides the necessary background to understand the fundamental
ideas behind quality control from a statistical perspective. It provides a review of
the history of quality control in Sect. 1.2. The nature of variability and the
different kinds of causes responsible for it within a process are described in
Sect. 1.3; this section also introduces the control chart, which is the fundamental
tool used in statistical quality control. Sect. 1.4 introduces the advantages of
using R for quality control. Sect. 1.5 develops an intuitive example of a control
chart. Finally, Sect. 1.6 provides a roadmap to getting started with R while
reproducing the example in Sect. 1.5.
1.2 A Brief History of Quality Control
Back in 1924, while working for the Bell Telephone Co. on certain problems related to the quality of some electrical components, Walter Shewhart set up the foundations of modern statistical quality control [16]. Until that time, the concept of quality was limited to checking that a product characteristic was within its design limits. Shewhart’s revolutionary contribution was the concept
of “process control.” From this new perspective, a product’s characteristic within
its design limits is only a necessary—but not a sufficient—condition to allow the
producer to be satisfied with the process . The idea behind this concept is that the
inherent and inevitable variability of every process can be tracked by means of
simple and straightforward statistical tools that permit the producer to detect the
moment when abnormal variation appears in the process. This is the moment
when the process can be labeled as “out of control,” and some action should be
put in place to correct the situation.
A simple example will help us understand this concept. Let’s suppose a factory is producing metal plates whose thickness is a critical attribute of the product according to customer needs. The producer will carefully control the
thickness of successive lots of product, and will make a graphical representation
of this variable with respect to time, see Fig. 1.1. Between points A and B the
process exhibits a small variability around the center of the acceptable range of
values. But something happens after point C, because the fluctuation of values is
much more evident, together with a shift in the average values in the direction of
the Upper Specification Limit (USL) . This is the point when it is said that the
process has gone out of control. After this period, the operator makes some kind of adjustment in the process (point E) that allows the process to come back to the original controlled state.
Fig. 1.1 Out of control. Example of an out-of-control process
It is worth noting that none of the points represented in this example are out
of the specification limits, which means that all the production is defect-free .
Although one could think that, after all, what really matters is the distinction between defects and non-defects, an out-of-control situation of a process is highly undesirable, since it is evident that the producer no longer controls the process and is at the mercy of chance. These ideas of statistical quality control were quickly
assimilated by industry and even today, almost one century after the pioneering
work of Shewhart, constitute one of the basic pillars of modern quality.
Fig. 1.2 Chance causes. Variability resulting from chance causes. The process is under control
Under a normal distribution, the probability of observing values far from the mean is very small and should lead us to question whether the process really is under control. If we combine this idea with the graphical representation of the process data over time, we will have developed the first and simplest of the control charts.
The control chart is the main tool used in statistical process control. A control chart is a time series plot of process data on which three lines are superimposed: the mean, the Upper Control Limit (UCL), and the Lower Control Limit (LCL). As a first approach, the upper and lower control limits are separated from the process mean by a magnitude equal to three standard deviations (3σ), thus setting up a clear boundary between those values that could be reasonably expected and those that should be the result of assignable causes. Figure 1.5 shows all the different parts of a typical control chart: the center line, calculated as the average value (μ) of the data points; the UCL, calculated as the average plus three standard deviations of the data points (μ + 3σ); and the LCL, calculated as the average minus three standard deviations of the data points (μ − 3σ). A chart constructed in this way is at the same time a powerful and simple tool that can be used to determine the moment at which a process gets out of control. The reasoning behind the control chart is that any time a data point falls outside the region between both control limits, there exists a very high probability that an assignable cause has appeared in the process.
Fig. 1.5 A typical control chart. Data points are plotted sequentially along with the control limits and the
center line
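As a minimal sketch of this construction (illustrative data; σ is roughly estimated here with the sample standard deviation, whereas proper control charts estimate it from subgroups or moving ranges, and packages such as qcc automate the whole procedure):

x <- c(10.68, 10.60, 10.57, 10.79, 10.77, 10.81,
       10.69, 10.61, 10.57, 10.77, 10.75, 10.72)  # illustrative measurements
center <- mean(x)              # center line
ucl <- center + 3 * sd(x)      # Upper Control Limit
lcl <- center - 3 * sd(x)      # Lower Control Limit
plot(x, type = "b", ylim = range(c(x, ucl, lcl)),
     xlab = "Sample", ylab = "Measurement")
abline(h = c(lcl, center, ucl), lty = c(2, 1, 2))  # limits and center line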
Although the criterion of one data point falling farther than three standard
deviations from the mean is the simplest one to understand based on the nature
of a normal process, some others also exist. For example:
Two of three consecutive data points farther than two standard deviations
from the mean;
Four of five consecutive data points farther than one standard deviation
from the mean;
Eight consecutive data points falling at the same side of the mean;
Six consecutive data points steadily increasing or decreasing;
Etc.
What do all these patterns have in common? The answer is simple in statistical terms: all of them correspond to situations of very low probability if chance variation were the only one present in the process. It should then be concluded that some assignable cause is in place and the process is out of control.
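As an illustration, one of these rules, eight consecutive points on the same side of the mean, can be checked in a few lines of R (a sketch; the function name runs_rule is ours, and packages such as qcc implement these tests for real use):

runs_rule <- function(x, k = 8) {
  side <- sign(x - mean(x))  # +1 above the mean, -1 below, 0 exactly on it
  r <- rle(side)             # lengths of consecutive runs on the same side
  any(r$lengths >= k & r$values != 0)
}
set.seed(1)
runs_rule(rnorm(30))                           # in-control data: usually FALSE
runs_rule(c(rnorm(15), rnorm(15, mean = 2)))   # shifted process: likely TRUE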
What Is R?
R is the evolution of the S language created in the Bell laboratories in the 1970s
by a group of researchers led by John Chambers and Rick Becker [2]. Note that,
in this sense, quality control and R are siblings, see Sect. 1.2 . Later on, in the
1990s Ross Ihaka and Robert Gentleman designed R as FOSS largely compatible
with S [5]. The open source choice definitely encouraged the scientific community to further develop R, and the R Core Team was created afterwards. At the beginning, R was mainly used in academia and research. Nevertheless, as R evolved it was used more and more in other environments, such as private companies and public administrations. Nowadays it is one of the most popular software packages for analytics.4
R is platform-independent: it is available for Linux, Mac, and Windows. It is FOSS and can be downloaded from the Comprehensive R Archive Network (CRAN)5 repository. A definition of R can be found in [4].
Why R?
The ways of using R described above may sound old-fashioned. However, this is a systematic way of working which, once appropriately learned, is far more effective than the usual point, click, drag, and drop features of software based on windows and menus. More often than not, such user-friendly Graphical User Interfaces (GUIs) keep users from thinking about what they are actually doing, just because there is a mechanical sequence of clicks that does the work for them. When users have to write what they want the machine to do, they must know what they want the software to do. Still, extra motivation is needed to start using R. Progress along the R learning curve is very slow at the beginning, and it takes a lot of time to learn things, see Fig. 1.6. This is discouraging for learners, especially when you are stressed by the need to get results quickly in a competitive environment. However, this initial effort is rewarding. Once one grasps the basics of the language and the new way of doing things, i.e., writing rather than clicking, impressive results are easily obtained. Moreover, the flexibility of having unlimited possibilities, both through the implemented functionality and one’s own developments, fosters the user’s creativity and allows asking questions and looking for answers, creating new knowledge for their organization.
Fig. 1.6 R learning curve. It takes a lot of time to learn something about R, but then you create new things
very quickly. The time units vary depending on the user’s previous skills. Note that the curve is asymptotic:
you never become an expert, but are always learning something new
In addition to the cost-free motivation, there are many reasons for choosing
R as the statistical software for quality control. We outline here some of the
strengths of the R project, which are further developed in the subsequent
sections:
It is Free and Open Source;
It runs on almost any system and configuration, and the installation is easy;
There is a base functionality for a wide range of statistical computation and
graphics, such as descriptive statistics, statistical inference, time series, data
mining, multivariate plotting, advanced graphics, optimization,
mathematics, etc;
The base installation can be enriched by installing contributed packages
devoted to particular topics, for example for quality control;
It has Reproducible Research and Literate Programming capabilities [14];
New functionality can be added to fulfill any user or company
requirements;
Interfacing with other languages such as Python , C , or Fortran is possible, as
well as wrapping other programs within R scripts ;
There is a wide range of options to get support on R, including the extensive
R documentation , the R community, and commercial support.
How to Obtain R
The official R project website12 is the main source of information to start with R.
Even though the website design is quite austere, it contains a lot of resources, see
Fig. 1.7.
Fig. 1.7 R project website homepage. The left menu bar provides access to basic R information, the
CRAN, and documentation
In the central part of the homepage we can find two blocks of information:
Getting Started: Provides links to the download pages and to the answers
to the frequently asked questions;
News: Feed with the recent news about R: new releases , conferences , and
issues of the R Journal .
In addition, the following sections are available from the left side menu:
About R: Basic information about R and the latest news;
Download, packages: A link to the CRAN repository;
R Project: Varied information about the R Project, its foundation, donors,
conferences, and some tools;
Documentation: It is one of the strengths of R. The Frequently Asked
Questions (FAQs ) is a good starting point. It is a short document with
general answers about the system, and also to very common questions
arising when starting to use R. The Manuals are quite complete and updated
with each release. There are different manuals for different levels;
Misc: This miscellaneous section provides links to other resources.
The links to download the R software and the link to CRAN lead to the
selection of a mirror. The R project is hosted at the Institute for Statistics and
Mathematics of WU (Wirtschaftsuniversität Wien, Vienna University of
Economics and Business). Mirrors are replicated servers throughout the world
maintained by private and public organizations that kindly contribute to the R
ecosystem. It is recommended to select a mirror near your location when
downloading CRAN files. The main server can be directly accessed without a
mirror selection at the URL : http://cran.r-project.org.
The CRAN web page, see Fig. 1.8, has links to download and install the software for Linux, Windows, and Mac. Linux users can add the repository of the selected mirror to their sources list and then install and update R in the usual way through the package management system. Windows and Mac users can download the installation files and install them by double-clicking. In any case, the installation is straightforward and the default settings are recommended for most users.
Fig. 1.8 CRAN web page. Access to the R software, including the sources, documentation, and other
information
Some other interesting resources are available on the CRAN web page. The source code can also be downloaded, not only for the latest release, but also for the current development version of R, and for older versions. From the left side menu, we
can access further resources. The most impressive one is the Packages section.
Add-on packages are optional software that extend the R software with more
code, data, and documentation. The R distribution itself includes about 30
packages. Nevertheless, the number of contributed packages is astonishing. At
the time this is written,13 more than 6500 packages are available at CRAN for a
number of different applications. Each contributed package has a web page at
CRAN with links to download the package (binary or source code), manuals,
and other resources. Moreover, CRAN is not the only repository for R packages.
Other R repositories are Bioconductor14 and Omegahat,15 and more and more
developers are using generic software repositories such as GitHub16 to publish
their packages. In total, the rdocumentation.org17 website records 7393
packages. The installation of add-on packages is straightforward in R, especially
for those available at CRAN, as will be shown in Chapter 2.
Another great resource at CRAN is the Task Views. A Task View is a collection of resources related to a given topic that brings together R functions, packages, documentation, links, and other materials, classified and commented by the Task View maintainer. The task views are maintained by contributors to the R Project who are experts on the subject. The Task Views available at CRAN are listed in Table 1.1. Currently, there is no Task View for Quality Control. Nevertheless, we include in Chapter 2 a sort of proposal for it.
Table 1.1 CRAN task views
Name Topic
Bayesian Bayesian inference
ChemPhys Chemometrics and computational physics
ClinicalTrials Clinical trial design, monitoring, and analysis
Cluster Cluster analysis and finite mixture models
DifferentialEquations Differential equations
Distributions Probability distributions
Econometrics Econometrics
Environmetrics Analysis of ecological and environmental data
ExperimentalDesign Design of experiments (DoE) and analysis of experimental data
Finance Empirical finance
Genetics Statistical genetics
Graphics Graphic displays, dynamic graphics, graphic devices, and visualization
HighPerformanceComputing High-performance and parallel computing with R
MachineLearning Machine learning and statistical learning
MedicalImaging Medical image analysis
MetaAnalysis Meta-analysis
Multivariate Multivariate statistics
NaturalLanguageProcessing Natural language processing
NumericalMathematics Numerical mathematics
OfficialStatistics Official statistics and survey methodology
Optimization Optimization and mathematical programming
Pharmacokinetics Analysis of pharmacokinetic data
Phylogenetics Phylogenetics, especially comparative methods
Psychometrics Psychometric models and methods
ReproducibleResearch Reproducible research
Robust Robust statistical methods
SocialSciences Statistics for the social sciences
Spatial Analysis of spatial data
SpatioTemporal Handling and analyzing spatio-temporal data
Survival Survival analysis
TimeSeries Time series analysis
WebTechnologies Web technologies and services
gR gRaphical models in R
Example 1.1.
Pellets density.
A certain ceramic process produces pellets whose density is a critical quality characteristic according to customer needs. The current technical specification states that the density of a pellet is considered acceptable if it is greater than 10.5 g/cm³. A sample of one pellet is taken and measured, following a standardized inspection process, after each hour of continuous operation. The complete set of inspection data for a one-day period is in Table 1.2 (ordered left to right).
Download R
1. Go to http://www.r-project.org;
2. Click on the Download R link;
3. Select a mirror close to your location, or the 0-Cloud first one to get
automatically redirected. Any of the links should work, though the
downloading time could vary;
4. Click on the Download R for Windows link;
5. Click on the base link;
6. Click on the Download R 3.x.x for Windows link;
7. Save the .exe file in a folder of your choice, for example the Downloads folder within your Users folder.
Install R
1. Open the folder where you saved the .exe file;
2. Double-click the R-X.x.x-win.exe file;
3. Accept the default options in the installation wizard.
Download RStudio
1. Go to http://www.rstudio.com;
2. Click on the Powerful IDE for R link;
3. Click on the Desktop link;
4. Click on the DOWNLOAD RSTUDIO DESKTOP button;
5. Click on the RStudio X.xx.xxx - Windows XP/Vista/7/8 link;
6. Save the .exe file in a folder of your choice, for example the Downloads folder within your Users folder.
Install RStudio
1. Make sure you have the latest version of Java18 installed on your system;
2. Open the folder where you saved the .exe file;
3. Double-click the RStudio-X.xx.xxx.exe file;
4. Accept the default options in the installation wizard.
Start RStudio
After the installation of R and RStudio, you will have on your desktop an icon for
RStudio. As for R, if your system is a 64-bit one, you will have two new icons:
one for R for 32 bits and another for R for 64 bits. In general, it is recommended
to run the version that matches your OS architecture. Double-click the RStudio icon and you should see something similar to the screen capture in Fig. 1.10.
More details about R and RStudio are provided in Chapter 2.
Fig. 1.10 RStudio application. This is what we see when starting RStudio
Install the qcc Package
1. Select the Packages tab in the lower-right pane;
2. Click on the Install button. A dialog box appears;
3. Type “qcc” in the Packages text box;
4. Click on Install. You will see some messages in the RStudio console.
Packages only need to be installed once, but they need to be loaded in the
workspace in every session that uses functions of the package (see next
steps).
Select and Set Your Working Directory (Optional)
By default, the R working directory is your home directory, e.g., My Documents.
This working directory is shown in the Files tab (lower-right pane) when
opening RStudio. The working directory can be changed following these steps,
even though it is not needed for the purpose of this example:
Select the Files tab in the lower-right pane;
Click the … button on the upper-right side of the Files pane (see Fig. 1.10);
Look for the directory you want to be the working directory. For example,
create a folder on your home directory called qcrbook;
Click on the Select button;
Click on the More… menu on the title bar of the files tab and select the Set
as working directory option;
Check that the title bar of the console pane (lower-left) now shows the path to your working directory.
1. Create a new R Script . You can use the File menu and select New File/R
Script, or click the New File command button (first icon on the toolbar) and
select R Script. A blank file is created and opened in your source editor pane .
2. Save your file. Even though it is empty, it is good practice to save the file at
the beginning and then save changes regularly while editing. By default, the
Save File dialog box goes to the current working directory. You can save the
file there, or create a folder for your scripts. Choose a name for your file, for
example “roadmap” or “roadmap.R”. If you do not write any extension,
RStudio does it for you.
3. Write the expressions you need to get the control chart in the script file:
Create a vector with the pellets data. The following expression creates a
vector with the values in Table 1.2 using the function c, and assigns
(through the operator <-) the vector to the symbol pdensity.
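pdensity <- c(10.6817, 10.6040, 10.5709, 10.7858,
              10.7668, 10.8101, 10.6905, 10.6079,
              10.5724, 10.7736, 11.0921, 11.1023,
              11.0934, 10.8530, 10.6774, 10.6712,
              10.6935, 10.5669, 10.8002, 10.7607,
              10.5470, 10.5555, 10.5705, 10.7723)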
If you are reading the electronic version of this chapter, you can copy
and paste the code. The code is also available at the book’s companion
website19 in plain text. Please note that when copying and pasting from a
pdf file, sometimes you can get non-ascii characters, spurious text, or
other inconsistencies. If you get an error or a different result when
running a pasted expression, please check that the expression is exactly
what you see in the book. In any case, we recommend typing everything,
at least at the beginning, in order to get used to the R mood.
Load the qcc package. This is done with the function library as follows:
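library(qcc)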
Use the function qcc to create a control chart for individual values of the
pellets density data:
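qcc(data = pdensity, type = "xbar.one")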
Remember to save your script. If you did not do so earlier, give the file a name.
Run Your Script
Now you have a script with three expressions. To run the whole script, click on the Source icon in the toolbar above your script. And that is it! You should get the control chart and text output of Fig. 1.11 in the Plots pane and in the console, respectively. As you may have noticed, it is not exactly the control chart in Fig. 1.9. So far we have provided the qcc function with the minimum amount of information it needs to produce the control chart. Further arguments can be provided to the function, and we can also add elements such as lines and text to the plot. The following code is the one that produced Fig. 1.9:
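A sketch consistent with the description below follows (the specification limit, 10.5 g/cm³, comes from Example 1.1; the text coordinates are illustrative, and the original code may differ):

qcc(data = pdensity, type = "xbar.one")         # first expression: the control chart
abline(h = 10.5, lty = 2)                       # second: line at the specification limit
text(x = 2, y = 10.53, labels = "Spec. limit")  # third: a label explaining the line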
Fig. 1.11 Example control chart. Your first control chart with R following the steps of the roadmap
After loading the qcc package, the first expression produces the plot; the
second one draws a horizontal line at the specification limit; and the third one
puts a text explaining the line. Do not worry for the moment about these details,
just keep in mind that we can go from simple expressions to complex programs
as needed. As for the text output in Figure 1.11, it is the structure of the object
returned by the qcc function. You will learn more about R objects and output
results in Chapter 2.
1. Create a new R Markdown file. You can use the File menu and select New
File/R Markdown …, or click the New File command button (first icon on the
toolbar) and select R Markdown. A dialog box appears to select the format of
your report, see Fig. 1.12. Type a title and author for your report, select the
option Word and click OK21;
2. A new file is shown in the source editor , but this time it is not empty. By
default, RStudio creates a new report based on a template;
3. Keep the first six lines as they are. They are the meta data to create the report,
i.e.: title, author, date, and output format;
4. Take a look at the next two paragraphs. Then delete this text and write
something suitable for your quality control report;
5.
The chunks of code are enclosed between the lines ```{r} and ```. Replace the expression in the first chunk of the template with the first two expressions of the script you created above;
6. Replace the expression in the second chunk of the template with the third expression of the script you created above;
7. Read the last paragraph of the template to see the difference between the two chunks of code. Replace this paragraph with something you want to say in your report.
8. You should have something similar to this in your R Markdown file:
---
title: "Quality Control with R: Intuitive Example"
author: "Emilio L. Cano"
date: "31/01/2015"
output: word_document
---
This is my first report of quality control using R.
First, I will create the data I need from the
example in the Book *Quality Control with R*. I
also need to load the `qcc` library:
```{r}
pdensity <- c(10.6817, 10.6040, 10.5709, 10.7858,
10.7668, 10.8101, 10.6905, 10.6079,
10.5724, 10.7736, 11.0921, 11.1023,
11.0934, 10.8530, 10.6774, 10.6712,
10.6935, 10.5669, 10.8002, 10.7607,
10.5470, 10.5555, 10.5705, 10.7723)
library(qcc)
```
And this is my first control chart using R:
```{r, echo=FALSE}
qcc(data = pdensity, type = "xbar.one")
```
It worked! Using R for quality control is great :)
9. Knit the report. Once the R Markdown is ready and saved, click on the knit
Word icon on the source editor toolbar. The resulting report is in Figs. 1.13
and 1.14. Note that everything needed for the report is in the R Markdown
file: the data, the analysis, and the text of the report. Imagine there is an error
in some of the data points of the first expression. All you need to get an
updated report is to change the bad data and knit the report again. This is a
simple example, but when reports get longer, changing things using the
typical copy-paste approach is far less efficient. Another example of a Lean
measure we are applying.
Fig. 1.13 Markdown word report. Page 1
Fig. 1.14 Markdown word report. Page 2
1.7 Conclusions and Further Steps
In this chapter we have intuitively introduced quality control by means of a
simple example and straightforward R code. The next chapters will develop different statistical techniques to deal with different quality control scenarios. These techniques include further types of control charts to deal with continuous and discrete manufacturing, as well as acceptance sampling and process capability analysis.
On the other hand, every chapter contains a section devoted to the International Organization for Standardization (ISO) standards that are relevant to the chapter contents. Moreover, Appendix B comprises the full list of standards published by the Application of Statistical Methods ISO Technical Committee (TC), i.e., ISO/TC 69. Relevant ISO standards for this introductory chapter are
those pertaining to vocabulary and symbols, as well as the more general ones
describing methodologies. Even though we will see them in more specific
chapters, these are the more representative generic standards of ISO/TC 69:
2. Chambers, J.M.: Software for Data Analysis. Programming with R. Statistics and Computing. Springer,
Berlin (2008)
[MATH][CrossRef]
3. Free Software Foundation, Inc.: Free Software Foundation website. http://gnu.org (2014). Accessed 10
July 2014
5. Ihaka, R., Gentleman, R.: R: a language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314 (1996)
6. ISO TC69/SC1–Terminology and Symbols: ISO 3534-1:2006 - Statistics – Vocabulary and symbols –
Part 1: General statistical terms and terms used in probability. Published standard. http://www.iso.org/
iso/catalogue_detail.htm?csnumber=40145 (2010)
7. ISO TC69/SC1–Terminology and Symbols: ISO 3534-2:2006 - Statistics – Vocabulary and symbols –
Part 2: Applied statistics. Published standard. http://www.iso.org/iso/catalogue_detail.htm?csnumber=
40147 (2014)
11. ISO TC69/SC4–Applications of statistical methods in process management: ISO 7870-1:2014 - Control
charts – Part 1: General guidelines. Published standard. http://www.iso.org/iso/catalogue_detail.htm?
csnumber=62649 (2014)
12. ISO TC69/SC6–Measurement methods and results: ISO 5725-1:1994 - Accuracy (trueness and
precision) of measurement methods and results – Part 1: General principles and definitions. Published
standard. http://www.iso.org/iso/catalogue_detail.htm?csnumber=11833 (2012)
13. ISO TC69/SC6–Measurement methods and results: ISO 10576-1:2003 - Statistical methods –
Guidelines for the evaluation of conformity with specified requirements – Part 1: General principles.
Published standard. http://www.iso.org/iso/catalogue_detail.htm?csnumber=32373 (2014)
14. Leisch, F.: Sweave: dynamic generation of statistical reports using literate data analysis. In: Härdle, W.,
Rönz, B. (eds.) Compstat 2002 — Proceedings in Computational Statistics, pp. 575–580. Physica,
Heidelberg (2002). http://www.stat.uni-muenchen.de/~leisch/Sweave
15. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical
Computing, Vienna (2015). http://www.R-project.org/
16. Shewhart, W.: Economic Control of Quality in Manufactured Products. Van Nostrom, New York (1931)
Footnotes
1 http://en.Wikipedia.org/wiki/Six_Sigma.
2 See more about free software at http://gnu.org/philosophy/free-sw.en.html.
3 Lean, or Lean Manufacturing, is a quality methodology based on the reduction of waste.
4 r4stats.com/articles/popularity.
5 http://cran.r-project.org.
6 http://blog.revolutionanalytics.com/local-r-groups.html.
7 http://www.revolutionanalytics.com.
8 http://www.rstudio.com.
9 http://www.openanalytics.eu.
10 http://www.tibco.com.
11 http://stackoverflow.com/tags/r.
12 http://www.r-project.org.
13 April 2015.
14 http://bioconductor.org.
15 http://omegahat.org.
16 http://github.org.
17 http://rdocumentation.org.
18 Check at http://www.java.com.
19 http://www.qualitycontrolwithr.com.
20 Visit http://www.miktex.com for a LaTeX distribution for Windows.
21 If there are missing or outdated packages, RStudio will ask for permission to install or update them. Just
answer yes.
© Springer International Publishing Switzerland 2015
Emilio L. Cano, Javier M. Moguerza and Mariano Prieto Corcoba, Quality Control with R, Use R!,
DOI 10.1007/978-3-319-24046-6_2
Abstract
This chapter introduces R as statistical software and programming language for
quality control. The chapter is organized as a kind of tutorial with lots of
examples ready to be run by the reader. Moreover, the code is available at the
book’s companion website. Even though the RStudio interface is also introduced
in the chapter, any other user interface can be used, including the R default GUI
and code editor.
2.1 Introduction
In this chapter, the essentials of the R statistical software and programming
language [27] are explained. This provides the reader with the basic knowledge
to start using R for quality control. You should try the code by yourself while reading this chapter, and therefore you need R and RStudio (optional but recommended) installed on your computer before continuing with the chapter.
Follow the step-by-step instructions explained in Sect. 1.6 of Chapter 1, or just
go to the R website1 and to the RStudio website,2 download the installation files,
and install them to your computer. If you are reading the electronic version of
this chapter, you can copy and paste the code in the examples.3 The code is also
available at the book’s companion website.4 In any case, we recommend typing
everything, at least at the beginning, in order to get used to the R mood.
In Chapter 1, we introduced the power of R for quality control, what it is, its history, etc. This chapter goes into the details of the software to take advantage of that power. We highlight here some of the R features explained in Sect. 1.4 of
Chapter 1:
R is the evolution of the S language, developed at Bell Laboratories (then
AT&T and Lucent Technologies) in the 1970s [6]. Note that it is the same
company where Walter Shewhart developed modern statistical quality
control 50 years before [34];
R is maintained by a foundation , a Core Team , and a huge community of
users and stakeholders, including commercial companies that make their
own developments;
R is Free and Open Source Software (FOSS ). Free as in free beer, and free
as in free speech [14];
R is also a programming language, and a system for statistical computing
and graphics;
R is platform independent: it runs in Windows, Mac, and Linux;
The way of interacting with R is by means of expressions , which are
evaluated in the R Console , or can be stored in R scripts to be run as
programs;
R has Reproducible Research and Literate Programming capabilities, which
has proven quite useful for quality control reports in Sect. 1.6, Chapter 1;
R base functionality provides a complete set of tools for statistical
computing and plotting, developed by time-honored experts;
R base functionality is expanded by an increasing number of contributed
packages for a wide range of applications, including some for quality
control;
The software can be customized creating new functions for particular
needs.
The toughest part for new R users is getting used to the interactivity with the system. Having to write expressions leads to errors which, especially at the beginning, are not easy to interpret. Nevertheless, those errors are usually caused by similar patterns. Below is a list of common errors made while writing R expressions. If you get an error when running an R expression, it can very likely be classified into one of these categories. Please take into account that these types of errors are not made only by beginners; they are part of the normal use of R. Practice will reduce the number of times errors are produced and, more importantly, the time it takes to realize where the problem is and fix the expression. The list contains concepts that you do not know about yet. Keep it as a reference and come back here whenever you get an error while reading the chapter and practicing with the code. Once you have completed the chapter, read the list again to fix the concepts.
Missing closing character. You need to close all the parentheses, square
brackets, curly brackets, or quotation marks you had opened in an
expression. Otherwise the expression is incomplete, and the console prompt
keeps waiting for you to finish it with the + symbol. If you are running a
script, R will try to continue the expression with the next one, and the error
message could be uninformative. So always check that you do not have a
missing closing symbol;
String characters without quotation marks. String characters must be
provided in quotation marks ("). Everything in an expression that is not in
quotation marks is evaluated in the workspace, and therefore it should exist,
either in the global environment or in other environments. Usually, the error
message indicates that the object does not exist, or something related to the
class of the object;
Missing parentheses in functions. Every function call must include parentheses, even if the function does not need any argument;
Missing arguments in a function call. Functions accept arguments, which
sometimes can be omitted either because they are optional or because they
have a default value. Otherwise they are mandatory and a message indicates
so;
Wrong arguments in functions. Sometimes it is due to the missing
quotation marks mentioned above. Check the class of the object you are
using as argument, and the class the function expects as argument;
Incompatible lengths. Data objects have a length that may be critical when
using them. For example, the columns of a data frame must be of the same
length;
Wrong data. If a vector of data is supposed to be, for example, numeric, but one or more of its components is something else, for example a character string, then computations over the vector might produce unexpected results. This is not always evident, as something that looks like a number may be interpreted by the computer as a character, for example due to spurious blank spaces or the like;
Other wrong syntax errors. Check the following:
– The arguments in a function are separated by commas (,);
– The conditions in loops and conditionals are in parentheses;
– You do not have wrong blank spaces, for example in the assignment
operator;
– Use a period (.) as decimal separator in numbers;
– Expressions are in different lines, or separated by a semicolon.
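A few toy expressions illustrate several of these categories (a hedged sketch; the failing lines are commented out so that the block still runs):

# mean(c(1, 2, 3)            # missing closing parenthesis: the prompt changes to "+"
# mean(c(1, 2 3))            # wrong syntax: missing comma between arguments
# read.csv(mydata.csv)       # string without quotation marks: object 'mydata.csv' not found
mean(c(1, 2, 3))             # correct call: balanced parentheses and commas
Sys.time()                   # parentheses are needed even with no arguments
as.numeric(c("1", "2", "x")) # wrong data: the "x" becomes NA, with a warning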
In the remainder of the chapter you will find an overview of R interfaces in Sect. 2.2; a description of the main R elements in Sect. 2.4; an introduction to RStudio in Sect. 2.5; Sects. 2.6 and 2.7 describe how to work with data within R
and with external data sources. This is the starting point for the application of the
quality control tools explained throughout the book; a QualityControl task view
is proposed in Sect. 2.8; finally, some ideas and thoughts about R and
Standardization are given in Sect. 2.9. Note that the specific functions and
packages for quality control are not included in this chapter, as they are
explained in detail in the corresponding chapter. For example, functions for
modelling processes are in Chapter 5, and so on. Appendix C is a complete cheat
sheet for quality control with R.
2.2 R Interfaces
The R base installation comes with a Command Line Interface (CLI) . This CLI
allows interacting with R using the R Console as outlined above, by means of
expressions and scripts. This is one of the hardest parts for beginners, especially
for those who do not have experience in programming. Luckily, the fact that R is open source software and a programming language at the same time allows more advanced interfaces to be developed to work with R. For the Windows and Mac versions of R, an extremely simple Graphical User Interface (GUI) is also included with the base installation. It can be started as any other application in the system; Figure 2.1 shows the GUI for Windows.
Fig. 2.1 R GUI for windows. The R GUI allows basic interaction with R through the R console; scripts can
be created using the R Editor; and the R Graphics device opens when invoking a plot. The menu bar
contains access to some basic operations such as installing packages, or save and load files
The approach followed in this book is to use an interface of the second type. This allows using all the capabilities of R, and the examples provided throughout the book can be used either in the built-in R GUI, both in the R console and as scripts in the R Editor, or in other available GUIs. In what follows, we explain one of the interfaces that has become very popular among a wide range of R users, including those using R in industry: RStudio. This choice does not mean that one interface is better than the others. In fact, we invite the reader to try out more than one and decide for themselves which one fits their needs best. Indeed, we have been using both RStudio and Eclipse + StatET to write
this book using Reproducible Research and Literate Programming techniques.
The good thing is that we can choose between several alternatives. Moreover, as
we remarked above, all the examples in the book are ready to use in any R
interface, or interactively in the console.
2.3 R Expressions
The way to interact with R is through R expressions, sometimes named as
commands. As explained above, R is interactive, in the sense that it responds to
given inputs. Such inputs are R expressions, which can be of several types,
mainly:
An arithmetic expression;
A logical expression;
A call to a function;
An assignment.
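For instance, a minimal example of each type:

2 * (3 + 1)         # arithmetic expression
2 * (3 + 1) >= 10   # logical expression: returns FALSE
sqrt(16)            # call to a function
x <- sqrt(16)       # assignment: stores the value in x, prints nothing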
Expressions are evaluated by R when run in the console or through a script. If
the expression is incomplete, the R Console prompt keeps waiting until the
expression is complete. If the expression is complete, R evaluates the expression,
producing a result. This result may show some output to the user, which can be
textual or graphical. Some expressions do not produce any visible output, the result being, for example, data stored in variables or written to disk.
One of the characteristics of R is that it works with in-memory data.
Nevertheless, we will need to work with expressions containing files in several
ways. Some of them are:
Read data files to use in data analysis;
Write data files to use later on;
Save plots to be included in reports using other software tools;
Create R scripts to write sets of expressions containing a complete analysis;
Create report files with code, results, data, and text suitable to be compiled
and delivered.
In summary, the purpose of using files in R can be either working with data,
or working with code. When files are involved in R expressions, we can provide
the file location using two approaches:
Through the absolute path, i.e., the location in the computer from the root
file system;
Through the relative path, i.e., the location in the computer from the
working directory of the R session (see below).
File paths must be provided as character strings, and therefore quotation marks must be used. When using Windows, it is important to note that the backslash character (“\”) is reserved for escaping8 in R, and Windows paths must be provided using either a forward slash (“/”) or a double backslash (“\\”) to separate folders and file names. For relative paths, the usual symbols for the current and parent directories (“.” and “..”, respectively) can be used.
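For example (the file and folder names below are hypothetical):

dat <- read.csv("C:/myproject/data/plates.csv")     # absolute path, forward slashes
dat <- read.csv("C:\\myproject\\data\\plates.csv")  # absolute path, double backslashes
dat <- read.csv("./data/plates.csv")                # relative to the working directory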
2.4 R Infrastructure
The R infrastructure is composed of the following elements:
The console
The editor
The graphical output
The history
The workspace
The working directory
In the R GUI, the console, the editor, and the graphical output are the three
windows that can be seen in Fig. 2.1. However, the history, the workspace, and
the working directory are hidden and we need to use coding to access them. As
remarked above, interfaces like RStudio allow more options in order to work
with those system-related elements. Moreover, advanced functionality is available: easy access to objects and functions; syntax highlighting; contextual menus; access to help; data exploration; etc. Nevertheless, the interface is actually a wrapper for the R system, and the level of interaction for the statistical analysis is the same: console and scripts.
2.5 Introduction to RStudio
RStudio is a Java-based application, and therefore having Java installed is a
prerequisite. Make sure you have the latest version of Java9 to avoid possible
issues. The RStudio interface is shown in Fig. 2.2.10 It has a layout of four panes
whose dimensions can be adjusted, and each pane can contain different types of
elements by means of tabs. Most of those elements are basic components of the
R system listed above. The default layout is as follows11:
Fig. 2.2 RStudio layout. The RStudio interface is divided into four panes: the console pane, the source
pane, the workspace and history pane, and the files, plots, packages, and help pane. The layout can be
modified through the global options in the Tools menu
1. Lower-left pane. This pane is for the R Console . It can also show system-
related elements such as the output of the system console when calling
system commands, for example to compile a report;
2. Upper-left pane. This pane is for the R Source. R Scripts are managed in this
pane. Other types of files can also be opened in this pane, for example text
files containing data, code in other programming languages, or report files.
Data sets are also shown in this pane;
3. Upper-right pane. This pane is mainly for the R History and the R Environment. Other tabs appear when using certain features of RStudio, such as package development or R Presentations;
4. Lower-right pane. This pane is the most populated. It has the following tabs:
Files. It is a system file explorer with basic functions. It can be linked to
the R working directory;
Plots. It is the RStudio graphics device. The plots generated in R are
shown here;
Packages. Shows the packages available in the system, and we can
install, uninstall, or update them easily;
Help. This tab provides access to all the R Documentation, including the
documentation of the installed contributed packages;
Viewer. This tab is used to develop web applications with RStudio,
which will not be covered in this book.
The R(Studio) Console
The R console in RStudio is located by default in the lower-left pane, see
Fig. 2.3. Its behavior is the same as in the standard R GUI: there is a prompt identified by the “>” symbol that is waiting for an expression. The user writes an expression after the prompt and presses the Enter or Return key. R evaluates the expression and produces a result. An issue that often puzzles newcomers is that if an expression is incomplete, the prompt changes to the “+” symbol, waiting for the rest of the expression. Most of the time the user thinks the expression was complete and does not know what to do. Usually, the cause is a missing closing parenthesis or the like, and the way to cancel the expression is to press the ESC key. Some details about the RStudio console:
console:
Fig. 2.3 RStudio console. The RStudio console provides interaction with R
We can go to the console prompt using the keyboard shortcut CTRL+2 from
anywhere;
The title bar of the RStudio console contains valuable information: the current working directory (see below);
The arrow next to the working directory path is to show the working
directory in the files pane;
When writing an expression, we can press TAB or CTRL+SPACEBAR to see a
contextual menu and select: available objects in the workspace and
functions; arguments of a function (within a function); or files and folders
of the working directory (within quotation marks);
The ESC key cancels the current expression;
CTRL+L clears the console;
The up and down arrow keys navigate through the history.
In what follows, R code is shown in gray background. Input expressions can
be written directly in the RStudio Console or script editor (or copy-pasted if you
are reading the electronic version of this book). The output produced by R is
shown in the book after two hash symbols (“##”) at the beginning of the line. For
example, the simplest expression we can input is a number. Type the number 1 at the console and press Enter:
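1

## [1] 1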
We can see that the result of this input expression is a line of text with the
number 1 between square brackets, followed by the number 1 itself. The number
in square brackets is an identifier that will be explained later. The result of
the expression is the same number that we wrote. One step beyond would be to
ask for a calculation, for example:
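2 + 2
## [1] 4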
Now the output is the result of the arithmetic expression. What happens if the
expression is incomplete?
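2 +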
As you may have realized, the > symbol changes to +, denoting that the
expression is not complete. The system remains in that state until either the
expression is completed or the ESC key is pressed, cancelling the expression.
Arithmetic expressions return the result of the operation. Another type of R
expressions are logical expressions, which return the TRUE or FALSE value:
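pi > 3
## [1] TRUE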
where pi is itself an expression that gets the value of the internal object
containing the value of π = 3.14159….
The log function computes the logarithm of a number. We can see the possible
arguments of the function through the str function13:
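str(log)
## function (x, base = exp(1))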
Therefore, log is a function that accepts two arguments: x, which has no
default value, and base, whose default value is the expression exp(1), i.e.,
the e constant. Thus, the value that we get with the log function is the
natural logarithm, i.e., with base e, of the number we pass as first argument,
or the logarithm with a different base if we pass the base argument. For
example, the decimal logarithm would be:
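log(100, base = 10)
## [1] 2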
The dot-dot-dot (…) argument means that the function accepts an undefined
number of arguments. So, will it work without arguments?
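A reconstruction (the function in question appears to be seq, see footnote 15):
seq()
## [1] 1
?seq   # open the documentation of the function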
The documentation for the function is then shown in the Help tab, lower-
right pane. As the function does not need any argument to work, could we use it
without parenthesis?
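seq   # typing the bare name prints the function's code
## function (...)
## UseMethod("seq")
## <bytecode: 0x...>
## <environment: namespace:base>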
Note that you might get a different number of elements per row: the output
width has been set to 50 characters for the generation of the book's code. You
can set your own preferred output width with the options function as follows:
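options(width = 80)   # e.g., 80 characters per line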
Fig. 2.4 RStudio source editor. The RStudio source editor can manage R scripts, reports, and code in other
programming languages, such as C++ and Python
Complex scripts can be run from the console or from other scripts. For
example, if we have a script called dayly.R that performs ordinary tasks that
we want to use in other scripts, e.g., loading packages, importing data, etc.,
we can run such a script with the following expression:
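source("dayly.R")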
Fig. 2.5 RStudio plots tab. Plots generated in RStudio are shown in the plots tab, lower-right pane
The RStudio graphics device includes a menu bar with several options that
make device management easier:
Navigation through the graphics history;
Zooming;
Export files to several formats using a dialog box;
Removing and clearing of graphics history.
The export menu includes three options for graphics: save as image, save as
pdf, and copy to clipboard. The first two open a dialog box with export
options such as the file extension (in the case of image), file path, and image
size, see Fig. 2.6.
Fig. 2.6 RStudio export graphic dialog box. A preview of the image is shown along with the export
options: image format and dimensions being the most relevant
Fig. 2.7 RStudio history. The R History can be easily consulted, searched, and used during an R session
through the History tab in the upper-right pane
The history can also be accessed via R code, see the documentation of the
functions loadhistory, savehistory, history.
The R Workspace
The objects that are available in R are stored in the workspace. The workspace is
organized in different environments. The Global Environment is the place in
which the objects we create through assignments are stored. Each loaded
package has its own environment. Environments are also created for calls to
functions, and the user can even create environments. For the scope of this book,
just keep track of your Global Environment tab, upper-right pane, see Fig. 2.8
where you will find useful information about the objects that are available to use
in your session. You can save, open, search, and clear objects in the workspace
through the menu bar of the Environment tab. To make actions only over
selected objects in the workspace, change the view using the upper-right icon on
the menu bar from “List” (default) to “Grid,” select the objects you want to save
or clear, and click the appropriate button. Remember to change again to the List
view in order to be able to explore the environment. An icon for importing
datasets stored in text files is also available (we will go over this later on).
Fig. 2.8 RStudio workspace. The R Workspace contains a list of available objects in the global
environment, which can be used in R expressions during the R Session
The R workspace can also be accessed via R code, see the documentation of
the functions ls, str, load, save, save.image, and rm.
Fig. 2.9 RStudio files pane. Interaction with the file system is possible through the Files pane, including
the setting and visualization of the R working directory
The R working directory and the file system can also be accessed via R code,
see the documentation of the functions getwd (returns the working directory),
setwd (sets the working directory), list.files, list.dirs, and dir. Actually, it
is common practice to include at the beginning of the scripts an expression to set
the working directory, for example:
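setwd("C:/my_quality_project")   # illustrative path; use your own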
Recall that the backslash character (“\”) has a special meaning in R and
Windows paths must be provided using either a forward slash (“/”) or a double
backward slash (“\\”) to separate folders and file names. This is particularly
important when copying and pasting paths from the address bar of the Windows
file explorer.
Packages
R functionality is organized by means of packages. The R distribution itself
includes about 30 packages. Some of them are loaded when starting R. In
addition, a number of contributed packages are available, see Sect. 1.4 in
Chapter 1. In order to use the functions of a package, it must be loaded in
advance. Obviously, the package must be installed in the system in order to be
loaded. The installation of a package is done once, while the package must be
loaded in the R workspace every time we want to use it. Both operations can be
done through the Packages pane of RStudio, see Fig. 2.10.
Fig. 2.10 RStudio packages. The Packages tab in the lower-right pane shows a list of installed packages
with links to the documentation and command buttons to manage the packages
To install a package, click on the Install icon in the Packages tab menu bar. A
dialog box opens where we can select whether to install the package from
CRAN or from a local file. To install a package from CRAN, type the name of
the package (or just the first letters to get a list) and click on the Install button. If
you select to install it from a local file, a dialog box appears to search the file.
This is useful for packages that are not published in official repositories, but are
available from other sources. Similarly to the R software, add-on packages are
regularly updated by their authors. Installed packages can be updated by clicking
the Update button in the command bar of the Packages tab. From the list of
installed packages we can also go to the documentation of the package by
clicking on its name, remove the package from the system by clicking on the
icon on the right, or load the package in the workspace by selecting the
check-box on the left. Nevertheless, even though installing packages through
the RStudio interface is comfortable, it is more convenient to load the
packages in the scripts as they are needed in the code, using the library
function. For example, to load the qcc package:
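library(qcc)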
Vectors
Creating Vectors
The most basic classes in R are vectors. They are also very important because
more complex data structures are usually composed of vectors. For example, the
columns of a data frame with the data of a process are actually vectors with
the values of different variables. Therefore, the explanations in this
subsection are mostly valid for working with objects whose class is data.frame
or list.
There are several ways of creating vectors. The most basic one is entering
the values interactively in the console using the scan function. If you type on the
console:
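x <- scan()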
then the console prompt changes to "1:", waiting for the first element of the
vector. Type, for example, 10 and press RETURN. Now the prompt changes to
"2:" and waits for the second element of the vector, and so on. Enter, for
example, two more values: 20 and 30. When you have finished, press RETURN
without any value and the scanning of values finishes. Your output should look
like this:
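1: 10
2: 20
3: 30
4:
## Read 3 items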
Now you have a vector whose name is "x" in your workspace. This is what
the assignment operator ("<-") did: assign the result of the scan function to
the "x" symbol.16 If you are using RStudio, check the Environment tab in the upper-
right pane, and see the information you have at a glance. Under the “Values”
group, you have the object x and some information about it: the data type (num),
its length (from index 1 to index 3, i.e., 3 elements), and the first values of
the vector (in this case all of them, as there are only a few). You can always access
this information from code either in the console or within a script. The following
expression gets the list of objects in your workspace using the ls function17:
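ls()
## [1] "x"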
And now you can ask for the structure of the x object with the str function:
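str(x)
##  num [1:3] 10 20 30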
If you input the variable symbol as an expression, you get its contents as
output:
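x
## [1] 10 20 30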
Sequences of integers can also be created using the colon operator (“:”)
between the first and last numbers of the intended sequence. For example, the
following expression creates a vector with the integer numbers from 1 to 10:
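y <- 1:10; y   # assign and print (the name y is illustrative)
##  [1]  1  2  3  4  5  6  7  8  9 10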
Notice how in the above code we have typed two expressions in the same
line, using a semicolon to separate them. Another useful function to
generate vectors is the rep function, which repeats values. When working with
vectors, it is common practice to combine the different ways of creating vectors:
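x1 <- c(10, 20, 30, 40)           # combine values (names x1..x4 illustrative)
x2 <- rep(5, times = 4)           # repeat a value
x3 <- rep(c(0, 1), times = 2)     # repeat a whole vector
x4 <- c("M1", "M2", "M3", "M4")   # a character vector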
The sequence of indices along a vector can also be generated with the
seq_along function:
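x5 <- seq_along(x4)
x5
## [1] 1 2 3 4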
Check that you have all five new vectors in your workspace. We have
created numeric and character vectors. Logical vectors can also be created:
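x6 <- x1 > 15   # which elements are greater than 15?
x6
## [1] FALSE  TRUE  TRUE  TRUE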
The TRUE and FALSE values are coerced to 1 and 0, respectively, when operating
with them. This is useful, for example, to count the number of elements
that are TRUE in a logical vector18:
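sum(x6)
## [1] 3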
If we want to label each element of this vector, for example because the
numbers are for different weeks, we can do so using the names function:
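names(x1) <- c("week1", "week2", "week3", "week4")   # labels illustrative
x1["week2"] <- 25   # replace a single element by its name (or index)
x1
## week1 week2 week3 week4
##    10    25    30    40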
The rest of the items remain unchanged. We can include integer vectors as
indices to select more than one element. For example, the following expression
gets the first and third elements of the x1 vector:
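x1[c(1, 3)]
## week1 week3
##    10    30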
We can also exclude elements from the selection instead of specifying the
included elements. Thus, the previous expression is equivalent to this one:
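x1[-c(2, 4)]
## week1 week3
##    10    30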
New elements can be added to a vector either by creating a new vector
combining the original one and the new element(s), or by assigning the new
element to an index greater than the last one, for example:
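x2 <- c(x2, 5)   # a new vector combining the original and the new element
x2[6] <- 5       # or assign directly to the index after the last one
x2
## [1] 5 5 5 5 5 5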
Ordering Vectors
Two functions are related to the ordering of vectors. Let us create a random
vector to illustrate them. The following expressions get a random sample
of size 10 from the digits 0 to 9. The set.seed function sets the seed in order
to make the example reproducible; see ?RNG to get help about random number
generation in R.
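A sketch (the name v is illustrative; the output depends on your R version):
set.seed(1234)
v <- sample(0:9, size = 10, replace = TRUE)
sort(v)    # the values, in increasing order
order(v)   # the positions that would sort the vector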
This function is very useful for sorting datasets, as we will see later. Both
the order function and the sort function accept a decreasing argument to get
the reverse result. In addition, the rev function reverses the order of any
vector; for example, the following expressions are equivalent:
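rev(sort(v))
sort(v, decreasing = TRUE)
Arithmetic operators are also vectorized: they operate element-wise and
recycle the shorter operand when the lengths differ, for example:
c(10, 20, 30) + 5
## [1] 15 25 35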
In this case, the 5 has been recycled to complete a 3-length vector.
Mathematical functions that take a single value as argument are vectorized:
applied to a vector, they return the result of the function for each value of
the original vector. For example, the sqrt function returns the square root of
a number:
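sqrt(c(4, 9, 16))
## [1] 2 3 4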
Matrices
We can extract and replace parts of a matrix in the same way as in vectors.
The only difference is that now we have two indices rather than one inside the
square brackets, separated by a comma. The first one is the row index, and
the second one is the column index. We can extract a whole row (column) by
leaving the second (first) index empty:
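m <- matrix(1:8, nrow = 4)   # a 4x2 matrix for illustration (name m assumed)
m[2, 1]    # the element in row 2, column 1
## [1] 2
m[3, ]     # the whole third row
## [1] 3 7
m[, 2]     # the whole second column
## [1] 5 6 7 8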
Notice that in the Environment tab of the upper-right pane of RStudio, the
matrix is under the “Data” group, instead of the “Values” one. As matrices have
two dimensions, i.e., rows and columns, they can be visualized in the RStudio
data viewer by clicking on the icon on the right of the list. The structure of
the matrix can also be obtained using the str function. See how the lengths of
the two dimensions are now shown, i.e., four rows and two columns:
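str(m)
##  int [1:4, 1:2] 1 2 3 4 5 6 7 8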
Marginal sums and means can be computed using the rowSums, colSums,
rowMeans, and colMeans functions, for example:
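rowSums(m)
## [1]  6  8 10 12
colMeans(m)
## [1] 2.5 6.5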
Lists
Creating Lists
Lists are data structures that can contain any other R objects, of different
types and lengths. Such objects can be created within the definition of the
list itself, or taken from the workspace. The elements of a list can also be
named; typically this is done when creating the list. In the following example,
we create a list whose name is "myList" and which has three components:
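myList <- list(mat = m, values = c(10, 25, 30), "unnamed")   # contents illustrative
myList
## $mat
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8
##
## $values
## [1] 10 25 30
##
## [[3]]
## [1] "unnamed"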
See the printout of the list. The first two elements are shown with their
names preceded by a $ symbol. This is because we named them in the list
definition. The third element has no name, and it is identified by its index
between double square brackets [[3]].
Accessing Lists
Similarly to vectors, the components of a list are indexed, and we can extract
each element of the list either by its index or by its name. In the latter case, we
can use the $ operator. See the following examples:
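myList[[1]]          # by index (the matrix itself)
myList$values        # by name, with the $ operator
myList[["values"]]   # by name, with double square brackets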
The difference between single and double square brackets is that, when using
double square brackets, we get the original object that is within the list, of
its own class, e.g., matrix. On the contrary, if we do the extraction using
single square brackets, as in vectors, we get an object of class list. This
makes it possible to select more than one element of the list, for example:
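myList[c(1, 2)]   # a list containing the first two components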
Notice that we can extract elements from the inner components of a list, for
example a column of the matrix that is the first element of the list. We can also
replace parts of an object as we had done with vectors and matrices:
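myList$mat[, 1]          # the first column of the matrix inside the list
## [1] 1 2 3 4
myList$values[2] <- 20   # replace an element of a component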
You can see the structure of a list in the workspace by looking at the
Environment tab in the upper-right pane of RStudio, or using the str function:
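str(myList)
## List of 3
##  $ mat   : int [1:4, 1:2] 1 2 3 4 5 6 7 8
##  $ values: num [1:3] 10 20 30
##  $ : chr "unnamed"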
Notice that the structure of a list shows the structure of each of its
elements. In the Environment tab (RStudio upper-right pane), the number of
elements of the list is shown and, by clicking on the icon next to the name of
the list, the list is expanded to show the structure of each element.
Data Frames
The usual way of working with data is by organizing them in rows and columns.
It is common that we have our data in such a way, either from spreadsheets, text
files, or databases. Columns represent variables, which are measured or observed
in a set of items, represented by rows. The class of R objects with such structure
is the data.frame class. We refer to them as data frames hereafter. Recall that
matrices are also organized in rows and columns. The difference is that a matrix
can only contain data of the same type, for example numbers or character
strings. However, the columns of a data frame can be of different types, e.g., a
numerical column for the measurement of a quality characteristic, another one
logical stating whether the item is nonconforming, another one a factor for the
machine where the item was produced, and so on.
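As an illustration, consider this small data frame (names and values are
illustrative):
pdata <- data.frame(weight = c(10.2, 9.8, 10.1, 10.4),
                    conforming = c(TRUE, TRUE, FALSE, TRUE))
pdata$weight          # access by name
## [1] 10.2  9.8 10.1 10.4
pdata[["weight"]]     # double square brackets: a vector
## [1] 10.2  9.8 10.1 10.4
pdata["weight"]       # single square brackets: a data frame
##   weight
## 1   10.2
## 2    9.8
## 3   10.1
## 4   10.4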
Notice that the access by name is equivalent to the access using double
square brackets. The difference between single and double square brackets is
whether the result is a data frame or a vector. As a two-dimensional data
object, we can also access data frame elements in the matrix fashion:
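pdata[2, 1]          # row 2, column 1
## [1] 9.8
pdata[, "weight"]    # all rows of the weight column
## [1] 10.2  9.8 10.1 10.4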
Data frame rows and columns always have names. Even if they are not
provided when creating the data frame, R assigns them: for columns, using the
letter V followed by a number (V1, V2, …); for rows, the default names are the
row indices. Row and column names can be consulted and changed afterwards in
the same way we explained above for factors and vectors, see the following
examples19 (we first create a copy of the data frame):
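pdata2 <- pdata
names(pdata2) <- c("w", "conf")
rownames(pdata2) <- paste("item", 1:4, sep = "_")
rownames(pdata2)
## [1] "item_1" "item_2" "item_3" "item_4"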
Ordering, Filtering, and Aggregating Data Frames
We already know that data frame columns are vectors. Therefore, we can use the
functions explained for vectors on data frames. For example, to sort the data
frame created above by the weight column, we use the extracting strategy by
means of the square brackets, passing as row indices the result of the order
function over the column (or columns) of interest:
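pdata[order(pdata$weight), ]
Filtering and aggregating follow the same logic; a sketch with the aggregate
function (assumed here as the aggregation tool):
pdata[pdata$weight > 10, ]                                # filter rows
aggregate(weight ~ conforming, data = pdata, FUN = mean)  # aggregate by groups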
where the third argument can be any function over a vector, typically
aggregation functions, see Appendix C.
Missing Values
The treatment of missing values is an important topic in data analysis in
general, and in quality control in particular, especially in the early stages
of data cleaning. Missing values are represented in R by the special value NA
(not available). If we try to do computations over vectors that include NAs,
for example the mean, we will get NA as a result, unless the argument na.rm
(remove NAs) is set to TRUE.
Such an argument is available in a number of functions and methods, but not
always. It may happen that NA values should actually have a value, but it was
not correctly identified when creating the data object. Then we can assign
other values to the NAs. For that purpose (and others), the is.na function is
very useful. First, let us create a new column in our data frame to illustrate
NAs. Suppose we measured the salt content of each element in the data frame in
addition to the weight. Unfortunately, for some reason the measurements could
not be taken for all of the items. We add this new information as we learnt above:
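pdata$salt <- c(2.1, NA, 1.9, 2.2)   # values illustrative; one item not measured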
Let us compute the means of the two numerical variables in the data frame:
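mean(pdata$weight)
## [1] 10.125
mean(pdata$salt)
## [1] NA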
There was no problem to compute the mean weight, as all the observations
are available. However, the mean salt could not be computed because there is a
missing value. To overcome this situation, we must tell the mean function to
omit the missing values:
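mean(pdata$salt, na.rm = TRUE)
## [1] 2.066667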
Another possible action over NAs is to assign a value. Let us suppose that the
missing value is due to the fact that the item had no salt at all, i.e., the
correct value should be zero. We can turn all the NA values into zeros (or any
other value) as follows:
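pdata$salt[is.na(pdata$salt)] <- 0
pdata$salt
## [1] 2.1 0.0 1.9 2.2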
In any case, data structures and types can be converted from one type to
another. For example, if we want vector3 to be a numeric vector, we coerce the
object to numeric:
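vector3 <- c("10", "20", "30")   # a hypothetical character vector
as.numeric(vector3)
## [1] 10 20 30
Dates stored as character strings are a frequent case. Suppose (values
illustrative) that our data frame has a date column in day/month/year format
and we try to sort it:
pdata$date <- c("13/02/2015", "25/01/2015", "5/03/2015", "9/01/2015")
sort(pdata$date)
## [1] "13/02/2015" "25/01/2015" "5/03/2015"  "9/01/2015"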
It did not work because the string "13/02/2015" is the first one in character
order. To make R understand that a variable is a date, we need to convert the
character string into a date. As you have likely guessed, we do that with an
as.xxx function. But in this case we need an important additional argument:
the format in which the date is stored in the character vector. In the case at
hand, we have a day/month/year format, which must be specified as follows (we
overwrite the date variable):
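pdata$date <- as.Date(pdata$date, format = "%d/%m/%Y")
pdata$date
## [1] "2015-02-13" "2015-01-25" "2015-03-05" "2015-01-09"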
Note that now the date column is of Date type, and the data is represented in
ISO format, i.e., “YYYY-MM-DD”. The format argument expects a character
string indicating the pattern used in the character strings that store the dates. In
our example, we are specifying that the string is formed by: (1) the day of the
month in decimal format (%d); (2) a forward slash; (3) the month of the year in
decimal format (%m); (4) another forward slash; and (5) the year with century
(%Y). Check the documentation for the strptime topic for more options. Now
we can sort the data frame by date:
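pdata[order(pdata$date), ]
##   weight conforming salt       date
## 4   10.4       TRUE  2.2 2015-01-09
## 2    9.8       TRUE  0.0 2015-01-25
## 1   10.2       TRUE  2.1 2015-02-13
## 3   10.1      FALSE  1.9 2015-03-05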
It can also be useful to create variables for the year, month, etc. for
aggregation, classification, stratification, or any other purpose. For example, if
we store the week we can plot control charts where the groups are the weeks. We
use the format function in the reverse sense, i.e., we turn dates into character
strings, see the following examples:
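format(pdata$date, format = "%Y")     # the year as a character string
## [1] "2015" "2015" "2015" "2015"
format(pdata$date, format = "%Y-%W")  # year and week of the year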
2.7 Data Import and Export with R
In the previous section we have created all data from scratch. There are many
situations in which we will use such strategy. However, raw data usually comes
from external sources, either because they are automatically recorded during the
process, or stored in databases, or manually entered in spreadsheets. The
easiest way to import data in R is using .csv files. CSV stands for Comma
Separated Values, and .csv files are text files in which each line corresponds
to an observation of a dataset, and the values for each column are separated
by a comma. Actually, the comma can be substituted by another character; for
example, when the comma is used as decimal point, semicolons are used instead
of commas to separate columns. The main advantage of using .csv files is that
they can be generated by most applications that store data, such as
spreadsheets, databases, etc. Furthermore, .csv files can be opened and edited
in spreadsheet applications such as Microsoft Office or LibreOffice, with
which most users are already familiar.
In the following, we will explain how to get data into R from .csv files. At
the end of the section, some directions are provided to import data from other
sources. A .csv file is available for downloading from the book's companion
website.
Importing .csv Files
In manufacturing it is common that PLCs (Programmable Logic Controllers)
record data regarding product quality features. Quite often such recording
machines can automatically generate data in .csv files. In such a case the
files are ready to work with in R. However, if we are exporting data from
spreadsheets, we must take into account that the resulting file will only
contain text, and what we see on the screen may be different from what we get
in the file. If the data in the file do not correspond with what we want, then
formats, formulas, or other application-specific options might be the cause.
Remove all the formats in numbers and characters. It is also recommended to do
the computations in R rather than using formulas in the spreadsheet. Make sure the
data in each column are consistent; for example, do not mix different data
types in the same column (text and numbers). Once you have your data ready
for exporting, select the "Save as …" menu option of your spreadsheet
application and select the .csv format in the "File type" list. Search for the location
where you want to save the file, for example your R working directory, choose a
name, and save the file. Depending on your system locale configuration, the
software usually decides the decimal point symbol and the separator for values.
For example, if your system is in English, the decimal point symbol will be the
period, and the separator, the comma; but if your system is, for example, in
Spanish, then the decimal point symbol will be the comma, and the separator, the
semicolon. These two formats are the most common ones.
For the examples below, you need to download the file http://www.
qualitycontrolwithr.com/lab.csv to your working directory. You can go to your
browser and download it as any other file. Alternatively, you can use the
download.file function21:
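download.file("http://www.qualitycontrolwithr.com/lab.csv",
              destfile = "lab.csv")   # see footnote 21 about redirecting URLs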
Now that you have a .csv file in your working directory, you can explore it.
If you click the file in the Files pane of RStudio, the text file is opened in
the source pane. This format is difficult to manage from a text editor, so
take a look just to see what it looks like, and close it. Before importing the
data into R, open the .csv file with your spreadsheet application, for example
Microsoft Excel.
Double-clicking the file in a file explorer window should work but, if it does
not, use the "File / Open …" menu of your spreadsheet application and search
for the file. It is possible that the spreadsheet application asks for the
format of your data. If so, just select the period as decimal symbol and the
comma as value separator. See how the data inside the .csv file look like your
usual spreadsheets, without formats, though. Now you can close the file; from
now on we will work with the data in R.
Fig. 2.13 RStudio import dataset dialog box. From the Environment tab we can import data in text files
through the Import Dataset menu
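The Import Dataset dialog generates two expressions similar to these (a sketch):
lab <- read.csv("<your_path>/lab.csv")
View(lab)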
where <your_path> is the path where you downloaded the .csv file, i.e., your
working directory if you used the previous code. The second expression is the
one that opened the data frame in the data viewer. It is just what RStudio does
when clicking on the icon to the right of a data frame or matrix in the
Environment tab. The first expression is the interesting one. Importing files
from the Import Dataset menu is useful to explore data files, or to import a
static data file once and then save the data processing as specific R data
files (extension .RData). However, the usual way of working is that data files
are regularly updated, either adding rows to the files or adding files to
folders. Thus, it is more efficient to automate the data import in the
scripts, and we do that with the read.csv function above. Note that the only
argument included was the file path, as the rest of the default options are
valid for standard .csv files, as is the case here. The read.csv function is
actually a wrapper of the read.table function; check the documentation for
more details and options.
Data Cleaning
Now we have the data available in our system, but raw data is likely to contain
errors. Before applying the methods described in the following chapters, make
sure that your data are ready for quality control. An exploratory data analysis
should be made to detect possible errors, find outliers, and identify missing
values. Some examples are given below. The first thing we must do is to verify
that the data frame is what we expect it to be. Check the number of rows, the
number of columns, and their types, either in the RStudio Environment tab or
in the console:
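str(lab)   # check the number of rows and columns and the column types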
Missing Values
Let us get a summary of the ph variable of the lab data frame.
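summary(lab$ph)
# if the NAs turn out to be recording errors, assign the correct value:
lab$ph[is.na(lab$ph)] <- 6.6   # illustrative value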
Once values have been assigned to the NAs, we have no missing values for the
ph variable. It is not always necessary to assign missing values. For example,
there are missing values for the other two numerical variables of the data
set, but we know from the process itself that ph is always measured, whereas
fat and salt are only measured for some items. Thus, it is normal to have NAs
in those columns. We just must be aware of it and take it into account when
making computations:
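mean(lab$fat, na.rm = TRUE)
mean(lab$salt, na.rm = TRUE)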
In addition to the is.na function, the anyNA and complete.cases functions are
useful to manage missing values. The former returns TRUE if at least one of
the values of its argument is NA. The latter returns a logical vector
indicating which rows of a data frame have no NA in any column. This can be
useful to get only the complete rows of a data frame.
Outliers
Outliers are another type of particular data that must be examined before
applying statistics to quality control. An outlier of a data set is, as defined in
[18], “a member of a small subset of observations that appears to be inconsistent
with the remainder of a given sample.” Outliers or outlying observations can be
typically attributed to one or more of the following causes:
Measurement or recording error;
Contamination;
Incorrect distributional assumption;
Rare observations.
Similarly to missing values, sometimes outliers are correct, or we just cannot
remove them or assign a different value. In such cases, robust techniques should
be applied, see [18]. But in some other cases, the value is either impossible or
extremely unlikely to occur, and it should be corrected or removed. In a dataset
with more variables, this removal means assigning the NA value.
To illustrate outliers, let us go back to the ph variable of our lab data frame.
You have probably already realized that there is a strange number in the
summary:
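summary(lab$ph)   # the maximum is suspiciously large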
As you might have guessed, the median and mean are close to 6.6 and the
minimum is 6.63, but the maximum is ten times those values. There is obviously
something inconsistent about that value. We can sort the dataset in descending
order and check the first values to see whether there are more extreme values:
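head(lab[order(lab$ph, decreasing = TRUE), ])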
We see that only row number 16 has the maximum value. After some
investigation, a wrong recording of the value was detected; the value should
have been 6.63. Note that this is a very common error when recording data.
Again, we can fix the problem as follows:
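lab$ph[16] <- 6.63   # row 16 held the wrongly recorded value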
Wrong Values
Finally, other wrong values may arise in the data. It is not always easy to detect
wrong values. For categorical variables, a frequency table is a good way to find
possible errors. Let us get a frequency table for the analyst variable using the
function table. The result is the count of rows for each possible value of the
variable used as argument:
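table(lab$analyst)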
Notice that there is an “analyst4” and an “analyst_4”. The former has only
one count, and the rest are 288 or above. Apparently, “analyst4” and “analyst_4”
are the same person, and we have again a recording error. Unfortunately, these
types of errors are quite common when manually recording data in spreadsheets.
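One way to fix it (a sketch; recall that read.csv imports strings as factors
by default):
lab$analyst[lab$analyst == "analyst4"] <- "analyst_4"
lab$analyst <- droplevels(lab$analyst)   # drop the now-unused level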
An error that is more difficult to detect is the one we have in the date
column. There is no value in row 24, but it is not detected as a missing value
because it was imported as an empty string rather than as a missing value:
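which(lab$date == "")
## [1] 24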
The which function returns the indices of a logical vector that are TRUE,
which is very useful for data extraction. In this case, we should have created
a column of type Date in advance and then looked for missing data in that
column, because the empty string is coerced to NA:
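lab$date <- as.Date(lab$date, format = "%Y-%m-%d")   # format assumed
which(is.na(lab$date))
## [1] 24
Once the data are clean, they can be written back to a new .csv file,
presumably with the write.csv function:
write.csv(lab, file = "lab_clean.csv", row.names = FALSE)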
The first argument of the function is the object in the workspace that
contains the data to be exported, preferably a matrix or data frame; the
second argument is the path to the output file; and the third argument avoids
creating a column with the row names (typically the row indices, unless row
names have been set).
Thus, lab_clean.csv has the same structure as lab.csv but with the wrong
data fixed. In fact, it is the same result as if we had edited the .csv file
with a spreadsheet application such as Microsoft Excel to correct the data and
then saved the file with a new name.
A strategy mixing both approaches is, however, the most efficient. The
following step-by-step procedure can be followed as a guide when planning a
quality control data analysis:
1. Create a folder structure for your quality control data analysis project.
The following could be a general proposal, to be adapted to the project
specifics, if any:
data: This folder shall contain the data files. It could contain sub-folders
such as “rawdata” and “cleandata”, “yyyy_mm_dd” (one folder per day),
etc.;
code: This folder shall contain the scripts;
reports: This folder shall contain the .Rmd files and their counterpart
compiled reports, as shown in Sect. 1.6, Chapter 1;
plots: This folder could contain the exported plots to be used by other
programs;
other…: Any other folder useful for the purpose of the quality control
data analysis.
2. Save the raw data file;
3. Create a script for data cleaning. This makes it possible to keep track of
the changes made, even including comments in the code for further reference
and lessons learned;
4. Export the clean data to a data file with a new name (from within the data
cleaning script);
5. Create scripts for the quality control data analysis. There might be several
different scripts, for example for exploratory data analysis, control charts,
capability analysis, etc.;
6. Create report files with the relevant results.
In addition to .csv files, data can be exported to many other formats. For
example, one or more objects in the workspace can be saved in a .RData file
using the save function. The following expression saves the lab data frame in
the lab.RData file:
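save(lab, file = "lab.RData")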
Later on, the data in a .RData file can be imported into the workspace with
the load function:
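load("lab.RData")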
It is up to the user which data and file formats to choose for their quality
control data analysis. All of them have advantages and disadvantages.
Depending on how the data will be used, it could be better to use .csv files,
e.g., when the data are bound to be used by other applications, or .RData
files, if only R will make use of them. In addition to .csv and .RData, many
other formats can be used in R for data import and export, see the following
subsection.
Modeling Quality
The packages in this section are installed with the base installation of R.
The base package contains basic functions to describe the process
variability. The summary function returns a numerical summary of a variable.
The function table returns frequency tables. The functions mean, median,
var, and sd compute the mean, median, variance, and standard deviation of
a sample, respectively. For two variables, we can compute the covariance
and the correlation with the functions cov and cor, respectively.
The stats package includes functions to work with probability
distributions. The functions for the density/mass function, cumulative
distribution function, quantile function, and random variate generation are
named in the form dxxx, pxxx, qxxx, and rxxx respectively, where xxx
represents a given theoretic distribution, including norm (normal), binom
(binomial), beta, geom (geometric), and so on, see ?Distributions for a
complete list. Linear models can be fitted using the lm function. Analysis
of Variance (ANOVA) can be done with the anova function. The ts and
arima functions are available for time series analysis.
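For example, for the normal distribution:
dnorm(0)      # density at 0
## [1] 0.3989423
pnorm(1.96)   # cumulative probability
## [1] 0.9750021
qnorm(0.975)  # quantile
## [1] 1.959964
rnorm(5)      # five random variates (output varies)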
Visualizing Quality
Standard plots can easily be made with the graphics package. It basically
works like a painter's canvas: you can start with a simple plot and then add
more details. The graphics, grid, and lattice packages are included in the R
base installation. The grid and lattice packages must be loaded before use, though.
The graphics package allows building standard plots using the plot (scatter
plots), hist (histograms), barplot (bar plots), and boxplot (box plots) functions.
Low-level graphics can also be drawn using the functions points, lines,
rect (rectangles), text, and polygon. Those functions can also be used to
annotate standard plots. Functions of x can be drawn with the curve
function.
The grid package implements a different way to create and modify plots at
run time, including support for interaction.
The lattice package [32] can plot a number of elegant plots with an
emphasis on multivariate data. It is based on Trellis plots.
ggplot2 is another package [36] providing elegant plots through the
grammar of graphics.
Cause-and-effect diagrams can be drawn with the cause.and.effect (qcc
package [33]) and the ss.ceDiag (SixSigma package [5]) functions.
To make Pareto charts the functions pareto.chart (qcc package),
paretoChart (qualityTools package [29]) and paretochart
(qicharts package [1]) can be used.
Control Charts
The qcc package [33] can produce several types of control charts, including:
xbar (mean), R (range), S (standard deviation), xbar.one (individual values),
p (proportion), np, c, u (nonconformities), and g (number of non-events
between events). The function qcc plots a control chart of the type specified
in the type argument for the data specified in the data argument. For charts
expecting data in groups, i.e., xbar, R, and S charts, the input data must be
prepared with the function qcc.groups, whose arguments are the vector with
the measurements and the vector with the groups identifiers. For attribute
charts where the size of groups is needed, e.g., p, np, and u, the sizes
argument is mandatory.
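A minimal sketch (the data here are simulated for illustration):
library(qcc)
# hypothetical data: 20 measurements in 5 groups of size 4
measurements <- rnorm(20, mean = 10)
sample.id <- rep(1:5, each = 4)
groups <- qcc.groups(data = measurements, sample = sample.id)
qcc(groups, type = "xbar")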
The qcc package makes it possible to implement customized control charts, see
demo("p.std.chart").
The functions ewma, cusum, and mqcc in the qcc package are for
exponentially weighted moving average control charts, cumulative sums
control charts, and multivariate control charts, respectively.
The SixSigma package can plot moving range control charts with the ss.cc
function.
The qicharts package provides the qic function to plot control charts and run
charts. It also has the trc function for multivariate data run charts.
The IQCC package [3] implements qcc control charts with a focus on Phase I
and Phase II analysis.
The qcr package [12] provides quality control charts and numerical results.
The MSQC package [31] is a toolkit for multivariate process monitoring.
Control chart Operating Characteristic (OC) curves: the oc.curves function in
the qcc package draws operating characteristic curves, which provide
information about the probability of not detecting a shift in the process.
Capability Analysis
The qcc package process.capability function performs a capability
analysis over a qcc object previously created.
The qualityTools package cp function returns capability indices and charts.
The SixSigma package contains functions to individually get the indices
(ss.ca.cp, ss.ca.cpk, ss.ca.z). A complete capability analysis including
plots can be done with the ss.ca.study function.
The mpcv package [8] performs multivariate process capability analysis
using the multivariate process capability vector.
The tolerance package [41] contains functions for calculating tolerance
intervals, useful to set specifications.
Acceptance Sampling
The AcceptanceSampling package [20] provides functionality for creating
and evaluating single, double, and multiple acceptance sampling plans. A
single sampling plan can be obtained with the find.plan function.
The acc.samp function in the tolerance package provides an upper bound
on the number of acceptable rejects or nonconformities in a process.
The Dodge package [15] contains functions for acceptance sampling ideas
originated by Dodge [10].
Design of Experiments
Please visit the ExperimentalDesign Task View to see all resources
regarding this topic.
CRAN Packages
AcceptanceSampling
base
Dodge
edcc
ggplot2
graphics
grid
IQCC
knitr
lattice
mpcv
MSQC
qcc
qcr
qicharts
qualityTools
SixSigma
spc
spcadjust
stats
tolerance
Books
Cano, E.L., Moguerza, J.M., Redchuk, A.: Six Sigma with R. Statistical
Engineering for Process Improvement, Use R!, vol. 36. Springer, New York
(2012).
Cano, E.L., Moguerza, J.M., Prieto, M.: Quality Control with R. An ISO
Standards Approach, Use R!. Springer, New York (2015).
Dodge, H., Romig, H.: Sampling Inspection Tables, Single and Double
Sampling. John Wiley and Sons (1959)
Montgomery, D.C. Statistical Quality Control, Wiley (2012)
Links
http://www.qualitycontrolwithr.com
http://www.sixsigmawithr.com
http://www.r-qualitytools.org
The default formats follow the rules of the ISO 8601 international standard
which expresses a day as “2001-02-28” and a time as “14:01:02” using
leading zeroes as here. (The ISO form uses no space to separate dates and
times: R does by default.)
Check the ISO 8601 international standard [17] for more details on date and
time data representation. As explained in Sect. 2.6, any format can be
obtained using the format function and the % conversion specifications. The
ISOweek package [4] could be useful if you have trouble getting weeks in ISO
format when using Windows.
As referenced in Sect. 2.7, part 4 of ISO 16269, Statistical interpretation of
data—Part 4: Detection and treatment of outliers [18], “provides detailed
descriptions of sound statistical testing procedures and graphical data analysis
methods for detecting outliers in data obtained from measurement processes. It
recommends sound robust estimation and testing procedures to accommodate the
presence of outliers.”
Some examples in this chapter generated random numbers. ISO 28640 [19],
“Random variate generation methods,” specifies methods for this technique.
Regarding data management and interchange, there are a number of
international standards developed by the ISO/IEC JTC 1/SC 32, check the
available standards in the subcommittee web page23 for further details.
Data is becoming a relevant topic in standardization. Recently, a Big Data
Study Group has been created within the ISO/IEC JTC 1 Technical Committee
(Information Technology).24 Keep updated on this standardization topic if your
quality control data is big.
R Certification
Even though there is not a specific reference to ISO Standards in the R project
documentation regarding the software, we can find in the R website homepage a
link to “R Certification.” There we can find “A Guidance Document for the Use
of R in Regulated Clinical Trial Environments,” a document devoted to
“Regulatory Compliance and Validation Issues.” Even though the document
focuses on the United States Food and Drug Administration (FDA) regulations,
many of the topics covered can be applied to or adopted for other fields. In
particular, the Software Development Life Cycle (SDLC) section, which is also
available in the R Certification web page as a standalone document, represents
“A Description of R’s Development, Testing, Release and Maintenance
Processes,” which can be used for certification processes if needed, for example,
in relation to ISO/IEC 12207 [16], Systems and software engineering—Software
life cycle processes.
References
1. Anhoej, J.: qicharts: quality improvement charts. http://www.CRAN.R-project.org/package=qicharts
(2015). R package version 0.2.0
3. Barbosa, E.P., Barros, F.M.M., de Jesus Goncalves, E., Recchia, D.R.: IQCC: improved quality control
charts. http://www.CRAN.R-project.org/package=IQCC (2014). R package version 0.6
4. Block, U.: Using an algorithm by Hatto von Hatzfeld: ISOweek: week of the year and weekday
according to ISO 8601. http://www.CRAN.R-project.org/package=ISOweek (2011). R package version
0.6-2
5. Cano, E.L., Moguerza, J.M., Redchuk, A.: Six sigma with R. In: Statistical Engineering for Process
Improvement, Use R!, vol. 36. Springer, New York (2012). http://www.springer.com/statistics/book/
978-1-4614-3651-5
6. Chambers, J.M.: Software for data analysis. In: Programming with R. Statistics and Computing.
Springer, Berlin (2008)
8. Ciupke, K.: Multivariate process capability vector based on one-sided model. Qual. Reliab. Eng. Int.
(2014). doi:10.1002/qre.1590. R package version 1.1
9. Conway, J., Eddelbuettel, D., Nishiyama, T., Prayaga, S.K., Tiffin, N.: RPostgreSQL: R interface to the
PostgreSQL database system. http://www.CRAN.R-project.org/package=RPostgreSQL (2013). R
package version 0.4
10. Dodge, H., Romig, H.: Sampling Inspection Tables, Single and Double Sampling. Wiley, New York
(1959)
11. Fellows, I.: Deducer: a data analysis gui for R. J. Stat. Softw. 49(8), 1–15 (2012). http://www.jstatsoft.
org/v49/i08/
12. Flores, M., Naya, S., Fernandez, R.: qcr: quality control and reliability. http://www.CRAN.R-project.
org/package=qcr (2014). R package version 0.1-18
13. Fox, J.: The R commander: a basic statistics graphical user interface to R. J. Stat. Softw. 14(9), 1–42
(2005). http://www.jstatsoft.org/v14/i09
14. Free Software Foundation, Inc.: Free Software Foundation website. http://www.gnu.org (2014)
[Retrieved 2014-07-10]
15. Godfrey, A.J.R., Govindaraju, K.: Dodge: functions for acceptance sampling ideas originated by H.F.
Dodge. http://www.CRAN.R-project.org/package=Dodge (2013). R package version 0.8
16. ISO/IEC JTC 1 – Information Technology: ISO/IEC 12207:2008, Systems and Software Engineering –
Software Life Cycle Processes. ISO – International Organization for Standardization (2008). http://
www.iso.org/iso/catalogue_detail?csnumber=43447
17. ISO TC154 – Processes, Data Elements and Documents in Commerce, Industry and Administration:
ISO 8601 Data Elements and Interchange Formats – Information Interchange – Representation of Dates
and Times. ISO – International Organization for Standardization (2004)
18. ISO TC69/SCS–Secretariat: ISO 16269-4:2010 – Statistical Interpretation of Data – Part 4: Detection
and Treatment of Outliers. Published Standard. http://www.iso.org/iso/catalogue_detail.htm?csnumber=
44396 (2010)
19. ISO TC69/SCS–Secretariat: ISO 28640:2010 – Random Variate Generation Methods. Published
Standard. http://www.iso.org/iso/catalogue_detail.htm?csnumber=42333 (2015)
20. Kiermeier, A.: Visualizing and assessing acceptance sampling plans: the R package
AcceptanceSampling. J. Stat. Softw. 26(6) (2008). http://www.jstatsoft.org/v26/i06/
21. Lang, D.T., The CRAN Team: XML: tools for parsing and generating XML within R and S-plus. http://
www.CRAN.R-project.org/package=XML (2015). R package version 3.98-1.3
23. Mukhin, D., James, D.A., Luciani, J.: ROracle: OCI based oracle database interface for R. http://www.
CRAN.R-project.org/package=ROracle (2014). R package version 1.1.12
24. Murrell, P.: R Graphics, 2nd edn. Chapman & HallCRC, Boca Raton (2011)
25. Ooms, J., James, D., DebRoy, S., Wickham, H., Horner, J.: RMySQL: database interface and MySQL
driver for R. http://www.CRAN.R-project.org/package=RMySQL (2015). R package version 0.10.1
26. R Core Team: Foreign: read data stored by minitab, S, SAS, SPSS, Stata, Systat, Weka, dBase, ….
http://www.CRAN.R-project.org/package=foreign (2015). R package version 0.8-64
27. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical
Computing, Vienna (2015). http://www.R-project.org/
28. Ripley, B., Lapsley, M.: RODBC: ODBC database access. http://www.CRAN.R-project.org/package=
RODBC (2015). R package version 1.3-12
29. Roth, T.: qualityTools: statistics in quality science. http://www.r-qualitytools.org (2012). R package
version 1.54
30. RStudio Team: RStudio: Integrated Development Environment for R. RStudio Inc., Boston, MA
(2012). http://www.rstudio.com/
31. Santos-Fernández, E.: Multivariate Statistical Quality Control Using R, vol. 14. Springer, Berlin
(2013). http://www.springer.com/statistics/computational+statistics/book/978-1-4614-5452-6
32. Sarkar, D.: Lattice: Multivariate Data Visualization with R. Springer, New York (2008). http://www.
lmdvr.r-forge.r-project.org. ISBN 978-0-387-75968-5
33. Scrucca, L.: qcc: an r package for quality control charting and statistical process control. R News 4/1,
11–17 (2004). http://www.CRAN.R-project.org/doc/Rnews/
34. Shewhart, W.: Economic Control of Quality in Manufactured Products. Van Nostrom, New York (1931)
35. Urbanek, S.: RJDBC: provides access to databases through the JDBC interface. http://www.CRAN.R-
project.org/package=RJDBC (2014). R package version 0.2-5
36. Wickham, H.: ggplot2: elegant graphics for data analysis. In: Use R! Springer, Berlin (2009)
37. Wickham, H., James, D.A., Falcon, S.: RSQLite: SQLite interface for R. http://www.CRAN.R-project.
org/package=RSQLite (2014). R package version 1.0.0
38. Xie, Y.: Dynamic Documents with R and Knitr. Chapman and Hall/CRC, Boca Raton, FL (2013).
http://www.yihui.name/knitr/. ISBN 978-1482203530
39. Xie, Y.: knitr: a comprehensive tool for reproducible research in R. In: Stodden, V., Leisch, F., Peng,
R.D. (eds.) Implementing Reproducible Computational Research. Chapman and Hall/CRC, Boca Raton
(2014). http://www.crcpress.com/product/isbn/9781466561595. ISBN 978-1466561595
40. Xie, Y.: knitr: a general-purpose package for dynamic report generation in R. http://www.yihui.name/
knitr/ (2015). R package version 1.10.5
41. Young, D.S.: tolerance: an R package for estimating tolerance intervals. J. Stat. Softw. 36(5), 1–39
(2010). http://www.jstatsoft.org/v36/i05/
Footnotes
1 http://www.r-project.org.
2 http://www.rstudio.com.
3 Please note that sometimes what you paste may not be exactly what you see in the book, and some
modifications could be needed.
4 http://www.qualitycontrolwithr.com.
5 http://www.rstudio.com.
6 http://www.ess.r-project.org.
7 http://www.walware.de/goto/statet.
8 Escaping means to provide a character string with special characters. For example, \n is for the special
character new line.
9 http://www.java.com.
10 The version used while writing this book was 0.99.xxx.
11 It can be changed through the Tools > Global options menu.
12 This means that only the first letters of the argument name can be provided. We do not recommend that,
though.
13 This function returns the structure of any R object.
14 Pressing the F1 key when the cursor is in a function name also works.
15 In this case not all the code is shown because seq is a built-in, compiled function.
16 The scan function also accepts arguments to scan data from files and text, check the function
documentation.
17 Note that the output might have more elements if further objects were created beforehand.
18 The sum function will be explained later.
19 We use the paste function to get a sequence of character strings, see Appendix C to see more functions
to work with strings.
20 We first fix the seed in order to make the example reproducible.
21 We use a different URL within the download.file function because it fails with redirecting URLs.
22 There are more packages able to deal with Excel files, check http://www.thertrader.com/2014/02/11/a-
million-ways-to-connect-r-and-excel/.
23 http://www.iso.org/iso/home/store/catalogue_tc/catalogue_tc_browse.htm?commid=45342.
24 http://www.jtc1bigdatasg.nist.gov.
Abstract
The aim of this chapter is to smoothly introduce the reader to Quality Control
techniques through the so-called Seven Basic Quality Control Tools: cause-and-
effect diagram, check sheet, control chart, histogram, Pareto chart, scatter
diagram, and stratification. These are basic but powerful tools when used wisely.
3.1 Origin
Kaoru Ishikawa is one of the main Japanese figures in the quality area. He
chose a set of seven very simple tools which, as a set, constitute an
improvement methodology [5]. Using these tools, most standard quality and
productivity problems become approachable, see [19]. The seven Ishikawa
quality tools are:
1. Cause-and-effect diagram1
2. Check sheet
3. Control chart
4. Histogram
5. Pareto chart
6. Scatter diagram
7. Stratification2
Next, we provide brief R examples of the use of each tool.
Fig. 3.1 Cause-and-effect diagram for the intuitive example. A horizontal straight line is drawn and the
effect is put on the right side. Then, lines for the groups stem from the center line, looking like a fishbone.
The possible causes within each group are finally printed beside those lines
Fig. 3.2 Cause-and-effect diagram for the intuitive example. The SixSigma package produces a more
elaborate diagram
□
Example 3.2.
Pellets density (cont.)
Although a check sheet can be designed with word processing or spreadsheet
software, we will use R to produce a check sheet with the data we have from the
previous examples. This has the advantage of reusing data, which reduces errors
and is part of the reproducible research approach explained in Chapter 1. Recall
from the previous section that we saved the causes and the effect in character
vectors in order to use them when necessary. Now we consolidate all those
vectors in a data frame:
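# names and values illustrative; the actual vectors come from Sect. 3.2
check_sheet <- rbind(
  data.frame(Group = "Machines",  Cause = c("Speed", "Lubricant")),
  data.frame(Group = "Materials", Cause = c("Supplier", "Humidity")))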
The rbind function binds rows of data frames with identical columns to
create larger data frames. Additional columns can be added to register events
by different criteria, for example: a column for each day of the week, to
analyze the check sheets weekly; a column for the machine where the pellet was
compacted; etc. In this example, let us suppose that the raw material could
come from three different suppliers, and we want to count the points out of
control for each supplier. Then we need to add three columns to the data frame
that will form the check sheet. For the moment, NA values can be assigned, see
Chapter 2 to find out more about NAs.
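check_sheet$supplier1 <- NA
check_sheet$supplier2 <- NA
check_sheet$supplier3 <- NA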
Fig. 3.3 R Markdown check sheet. We can produce tables in reports and in this way a check sheet is easy
to generate
Fig. 3.4 Filled check sheet. An operator registers the events per potential cause with simple tick marks
Example 3.3.
Pellets density (cont.)
In our illustrative example, the control chart for the pellets density
measurements plotted in Chapter 1 is reproduced in Fig. 3.5 in order to make
this chapter self-contained. To make such a control chart, we first need to
have the data in an R object, as follows:
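pdensity <- scan("pellets_density.txt")   # hypothetical file with the measurements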
Fig. 3.5 Control chart tool. The control chart advances possible problems before they reach the customer
and then plot the control chart for individual values using the qcc package as
follows:
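library(qcc)
pdensity.qcc <- qcc(pdensity, type = "xbar.one")   # object name illustrative
summary(pdensity.qcc)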
A summary of the object shows basic information. The assigned object is a
list whose contents can be used for further analysis, for example to get the
out-of-control points:
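pdensity.qcc$violations$beyond.limits
## [1] 11 12 13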
In this case, three points are beyond the control limits, namely measurements
11, 12, and 13. This means that they are highly unlikely to occur, and
therefore the cause of this situation should be investigated. This
investigation might result in a cause-and-effect diagram as explained in
Sect. 3.2. □
3.5 Histogram
A histogram provides an idea of the statistical distribution of the process
data, that is, whether the data are centered or not, and whether the data are
concentrated around the mean or sparse. To build a histogram, some intervals
are calculated at a first stage; then, at a second stage, the number of
observations inside each interval is counted. This number of observations is
known as the "frequency." Finally, the frequencies are represented using
vertical adjacent bars. The area of each bar is proportional to its frequency,
its width being the length of the corresponding interval.
Example 3.4.
Pellets density (cont.)
Basic histograms can be generated with short R expressions and the standard
graphics. For example, the histogram of the pellets density in the illustrative
example explained in Chapter 1 shown in Fig. 3.6 is obtained with the following
simple expression:
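hist(pdensity)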
Fig. 3.6 Pellets density basic histogram. A basic histogram is generated just with the hist function with
a numerical vector as argument
Notice that histograms generated with statistical software are usually built
with constant-width intervals, the height of the bars representing the
frequency (absolute or relative) of data points within each interval. The
number and width of the intervals are decided by the software using one of the
accepted rules, see Chapter 5 for details.
The R Core Team, see Chapter 1, is very concerned with making it possible to
draw plots with few options. This allows plots to be generated with
expressions as short as the one that produced the extremely simple histogram
in Fig. 3.6. Nevertheless, one of the strengths of R is its graphical
capability. Thus, more elements and styles can be added to the histogram just
with the graphics base package; see, for example, the following code, which
generates a more elaborate version of our histogram, shown in Fig. 3.7:
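hist(pdensity,
     main = "Pellets density",   # options illustrative; the book's differ
     xlab = "density",
     col  = "gray",
     breaks = "FD")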
Fig. 3.7 A histogram with options. Graphical options can be added to the hist function to generate nicer
histograms
See the documentation for the topics hist, plot, and par to find out more
about those and other options. Check also how to include mathematical
annotation through the expression function. On the other hand, there are some
packages specialized in elegant graphics that use a specific syntax. The
lattice package [22] is included in the R base installation, but it is not
loaded at start-up. The histogram function is the one to generate
lattice-based histograms. The panel argument accepts a number of functions to
make complex plots. For example, in the following code we add a density line
to the histogram, see Fig. 3.8 for the result:
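library(lattice)
histogram(~ pdensity, type = "density",
          panel = function(x, ...) {
            panel.histogram(x, ...)
            panel.densityplot(x, ...)   # add the density line
          })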
Fig. 3.8 A lattice-based histogram. A density line has been added to the histogram
Consult the lattice package documentation and its functions histogram and
xyplot for graphical options, and trellis.par.set and trellis.par.get for
themes and styles.
The ggplot2 package [24] can also plot histograms using the grammar of
graphics, see [24]. The following code creates the histogram in Fig. 3.9.3 Note
that the special grammar of ggplot2 builds the chart by adding components with
the “+” operator.
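A sketch of such ggplot2 code; the binwidth value is an illustrative assumption:

library(ggplot2)
ggplot(data.frame(pdensity = pdensity), aes(x = pdensity)) +
  geom_histogram(binwidth = 0.1)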
Fig. 3.9 A ggplot2-based histogram. In this function, the number of bars is determined by the binwidth
argument, which sets the interval width to create the frequency table
Example 3.5.
Pellets density (cont.)
Let us illustrate Pareto charts with the example we used in Sects. 3.2 and
3.3. The data gathered by the process owner in the check sheet shown in Fig. 3.4
can be saved into an R data frame with the following code, with which we also
draw a first simple bar plot. Note how we re-use the data recorded with the
previous tools.
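A sketch of such code; the defect categories and counts below are illustrative placeholders, not the actual values recorded in the check sheet of Fig. 3.4:

defects <- data.frame(
  type  = c("A", "B", "C", "D"),   # hypothetical defect categories
  count = c(12, 7, 4, 2))          # hypothetical counts from the check sheet
barplot(defects$count, names.arg = defects$type)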
This bar plot is useless for Pareto analysis. The following changes result in
Fig. 3.11, which is an actual Pareto Chart. Note that the options have been tuned
up to improve the readability of the plot, check the documentation of the par
function.
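A sketch of the kind of changes involved, under the same illustrative data:

ord <- order(defects$count, decreasing = TRUE)  # sort the bars by frequency
barplot(defects$count[ord],
        names.arg = defects$type[ord],
        las = 1,
        ylab = "Frequency")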
Fig. 3.11 A basic Pareto chart. By simply sorting the bars and reorganizing the axis information, a
simple bar plot becomes a useful Pareto chart
Even though we can use standard plots for Pareto charts, there are some
functions in contributed packages that can be useful. The pareto.chart function
in the qcc package returns the plot and a table containing the descriptive
statistics used to draw the Pareto chart. This table can be stored in an R object
for further use. The data input should be a named vector, so we first prepare our
data. The result is the chart in Fig. 3.12.
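A minimal sketch, reusing the illustrative counts above as a named vector:

library(qcc)
counts <- setNames(defects$count, defects$type)  # named vector input
pareto.chart(counts)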
Fig. 3.12 Pareto chart with the qcc package. A dot and line plot is plotted with the cumulative
percentage
The qualityTools package [21] also includes a function for Pareto charts,
namely paretoChart. It also uses a named vector and returns a frequency table,
see Fig. 3.13.
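A minimal sketch with the same named vector; argument values other than showTable are assumptions:

library(qualityTools)
paretoChart(counts, showTable = TRUE)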
Fig. 3.13 Pareto chart with the qualityTools package. The table below the chart can be removed by
setting the showTable argument to FALSE
Another option is the paretochart function in the qicharts package [1]. This
function expects a factor or character vector to make the counts by itself. We can
easily create this data structure with the rep function, see Chapter 2. The
following code produces Fig. 3.14:
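A sketch of such code, expanding the illustrative counts into a character vector with rep:

library(qicharts)
defect.obs <- rep(defects$type, times = defects$count)  # one element per observed defect
paretochart(defect.obs)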
Fig. 3.14 Pareto chart with the qicharts package. The output shows the frequency table used to plot the
chart
Note that when using such specific functions we usually lose some control
over the graphics. These functions are often convenient, but we may need
something different in the output for our quality control report. For example, we
could split and color the bars in Fig. 3.11 according to the supplier information
in the check sheet. On the other hand, adding lines and points with the
cumulative percentages to the Pareto chart in Fig. 3.11 is straightforward with
the points function. Moreover, customizing the packages’ functions is also
possible, as their source code is available. □
Example 3.6.
Pellets density (cont.)
To illustrate the example, we simulate the temperature with the following
code:
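A sketch of such a simulation; the linear relation, seed, and noise level are illustrative assumptions:

set.seed(1234)
temperature <- 20 + 5 * pdensity + rnorm(length(pdensity), sd = 0.5)
plot(pdensity ~ temperature)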
Fig. 3.15 Scatter plot example. Relations between variables can be found by using scatter plots. Cause-
and-effect relations must be validated through designed experiments, though
The input of the plot function can be a formula, as in this example, where the
left-hand expression is for the response variable (vertical axis) and the right-
hand expression is for the predictor variable (horizontal axis). Similarly to
histograms and other plots, scatter plots can also be generated with the lattice
and ggplot2 packages, check the functions xyplot and geom_point, respectively.
In this example,4 it is apparent that when one variable increases, the other
one also grows. Further investigation is usually needed to demonstrate a
cause-and-effect relationship, and to eventually set the optimal values of the
factors for process optimization. □
3.8 Stratification
In many cases, some numerical variables may have been measured for
different groups (also referred to as factors). When this information is available,
the quality control analysis should be performed by these groups (strata). We
illustrate the stratification strategy by means of the box plot, one of the most
useful graphical tools, which will be described in detail in Chapter 5. Using the
same scale on the vertical axis, the data distribution for each factor can be
visualized at a glance. It is important to register as much information as possible
about the factors into which the data can be split (operator, machine, laboratory,
etc.). Otherwise, masking effects or mixtures of populations may take place,
delaying the detection of possible problems.
Example 3.7.
Pellets density (cont.)
In the case of the example we have been using throughout this chapter, let us
assume that the observations of the density measurements correspond to the
three suppliers A, B, and C:
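A sketch of such a stratified box plot; the assignment of observations to suppliers is an illustrative assumption:

supplier <- factor(rep(c("A", "B", "C"), length.out = length(pdensity)))
boxplot(pdensity ~ supplier,
        xlab = "Supplier",
        ylab = "Density")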
□
Actually, stratification is a strategy that is used throughout the rest of the seven
basic quality tools, and in the application of any other statistical technique in
quality control. For example, it was stratification that we applied when designing
the check sheet to gather information by supplier. Stratification can also be
applied to histograms, scatter plots, or Pareto charts. We will see in Chapter 6
that stratified sampling is also a way to improve our estimations and predictions
about the process.
The seven basic quality tools are a topic covered by a number of authors, as
they constitute an effective and easy-to-implement problem-solving technique,
even being included in the Project Management Body of Knowledge (PMBoK)
[20]. In this regard, most of the published lists keep the six previous tools.
However, some of them replace “stratification” by “run chart” or “flow chart.” A
run chart is actually a simplified version of the control chart, in which data points
are plotted sequentially. Flow charts and similar diagrams such as process maps
are in fact a previous-to-stratification step. Those problem-structuring tools allow
dividing the process into steps or sub-processes, and identifying the different
factors that could influence the output, thereby defining the groups in which to
perform the stratified analysis.
A detailed explanation of process maps and how to get them with R can be found
in [3].
References
1. Anhoej, J.: Qicharts: quality improvement charts, url http://CRAN.R-project.org/package=qicharts. R
package version 0.2.0 (2015)
2. Cano, E.L., Moguerza, J.M., Redchuk, A.: Six sigma in a nutshell. In: Six Sigma with R, Use R!, vol.
36, pp. 3–13. Springer, New York (2012). doi:10.1007/978-1-4614-3652-2_1. url http://dx.doi.org/10.
1007/978-1-4614-3652-2_1
3. Cano, E.L., Moguerza, J.M., Redchuk, A.: Six sigma with R. Statistical Engineering for Process
Improvement, Use R!, vol. 36. Springer, New York (2012). url http://www.springer.com/statistics/book/
978-1-4614-3651-5
5. Ishikawa, K.: What is total quality control? The Japanese way. Prentice Hall Business Classics.
Prentice-Hall, Englewood Cliffs (1985)
6. Ishikawa, K.: Guide to Quality Control. Asian Productivity Organisation, Tokyo (1991)
7. ISO: ISO/IEC 31010:2009, Risk management – Risk assessment techniques. International standard
(2010)
8. ISO: ISO/IEC 19795-1:2006, Information technology – Biometric performance testing and reporting –
Part 1: Principles and framework. International standard (2016)
9. ISO TC69/SC1–Terminology and Symbols: ISO 3534-1:2006 - Statistics – Vocabulary and symbols –
Part 1: General statistical terms and terms used in probability. Published standard (2010). url http://
www.iso.org/iso/catalogue_detail.htm?csnumber=40145
10. ISO TC69/SC1–Terminology and Symbols: ISO 3534-4:2014 - Statistics – Vocabulary and symbols –
Part 4: Survey sampling. Published standard (2014). url http://www.iso.org/iso/catalogue_detail.htm?
csnumber=56154
12. ISO TC69/SC4–Applications of statistical methods in process management: ISO 7870-4:2011 - Control
charts – Part 4: Cumulative sum charts. Published standard (2011). url http://www.iso.org/iso/
catalogue_detail.htm?csnumber=40176
13. ISO TC69/SC4–Applications of statistical methods in process management: ISO 7870-3:2012 - Control
charts – Part 3: Acceptance control charts. Published standard (2012). url http://www.iso.org/iso/
catalogue_detail.htm?csnumber=40175
14. ISO TC69/SC4–Applications of statistical methods in process management: ISO 7870-2:2013 - Control
charts – Part 2: Shewhart control charts. Published standard (2013). url http://www.iso.org/iso/
catalogue_detail.htm?csnumber=40174
15. ISO TC69/SC4–Applications of statistical methods in process management: ISO 7870-1:2014 - Control
charts – Part 1: General guidelines. Published standard (2014). url http://www.iso.org/iso/catalogue_
detail.htm?csnumber=62649
16. ISO TC69/SC4–Applications of statistical methods in process management: ISO 7870-5:2014 - Control
charts – Part 5: Specialized control charts. Published standard (2014). url http://www.iso.org/iso/
catalogue_detail.htm?csnumber=40177
17. ISO TC69/SC7–Applications of statistical and related techniques for the implementation of Six Sigma:
ISO 13053-1:2011 - Quantitative methods in process improvement – Six Sigma – Part 1: DMAIC
methodology. Published standard (2011). url http://www.iso.org/iso/catalogue_detail.htm?csnumber=
52901
18.
ISO TC69/SC7–Applications of statistical and related techniques for the implementation of Six Sigma:
ISO 13053-2:2011 - Quantitative methods in process improvement – Six Sigma – Part 2: Tools and
techniques. Published standard (2011). url http://www.iso.org/iso/catalogue_detail.htm?csnumber=
52902
19. Kume, H.: Statistical Methods for Quality Improvement. The Association for Overseas Technical
Scholarships, Tokyo (1985)
20. PMI: A guide to the project management body of knowledge. Project Management Institute (PMI),
Newton Square (2013)
21. Roth, T.: Qualitytools: Statistics in Quality Science, url http://www.r-qualitytools.org. R package
version 1.54 (2012)
22. Sarkar, D.: Lattice: Multivariate Data Visualization with R. Springer, New York (2008). url http://
lmdvr.r-forge.r-project.org. ISBN 978-0-387-75968-5
23. Scrucca, L.: Qcc: an R package for quality control charting and statistical process control. R News 4/1,
11–17 (2004). url http://CRAN.R-project.org/doc/Rnews/
24. Wickham, H.: Ggplot2: Elegant Graphics for Data Analysis. Use R!. Springer, New York (2009)
Footnotes
1 Also known as Ishikawa diagram or “fishbone” diagram.
2 Some authors replace “stratification” by “flow chart” or “run chart” in the original list.
3 The ggplot function requires a data frame object as input data, so the pdensity vector is converted to a
data frame in the first argument.
4 As expected from the simulation!
5 Define, Measure, Analyze, Improve, and Control.
6 Descriptions are from the standards summaries at the ISO website http://www.iso.org.
7 At the time of writing, the development stage is Final Draft International Standard (FDIS); see Chapter 4
to find out more about standards development stages.
© Springer International Publishing Switzerland 2015
Emilio L. Cano, Javier M. Moguerza and Mariano Prieto Corcoba, Quality Control with R, Use R!,
DOI 10.1007/978-3-319-24046-6_4
Abstract
This chapter details the way ISO international standards for quality control are
developed. Quality Control starts with Quality, and standardization is crucial to
deliver products and services whose quality satisfies final users, whether they
are customers, organizations, or public bodies. The development process, carried
out by Technical Committees (TCs), follows a path until the standard is
finally adopted, including several types of intermediate deliverables. The work
of such TCs is outlined along with the general structure of ISO, and with a focus
on the TC in charge of statistical methods. Finally, the current and potential role
that R can play, not only as statistical software, but also as programming
language, is shown.
1. PWI. Preliminary Work Items are items that are not yet mature enough to be
incorporated into a programme of work, for example relating to emerging
technologies or recent discoveries. If a preliminary work item has not
progressed to the proposal stage in 3 years, it is automatically deleted from
the programme of work.
2. NP. A New Work Item Proposal can be for a new standard, a new part of an
existing standard, a technical specification (TS), or a publicly available
specification (PAS). The proposal can be made by different stakeholders,
such as the TC itself, a national body, or an organization in liaison, among
others. It must include at least an outline and a project leader. The (of course
standardized) form is circulated to the TC members for voting. Approval
requires a simple majority of P-members (see Sect. 4.1) and the commitment
of some of them to participate. Once approved, it is included in the TC
programme of work as a project.
3. WD. A first version of the Working Draft could have been submitted with the
NP. Once the project is accepted, the project leader and the experts nominated
during the approval work together to prepare/improve a working draft
conforming to Part 2 of the ISO/IEC Directives [94]. A working group can be
proposed by the TC Secretariat. ISO/IEC Directives Part 2 ensures that all
standards have the same structure and style. ISO Standards are published in
English and French7, so all efforts must be made to have English and French
versions of the text in order to avoid delays. When the WD is finished, it is
circulated to TC members as a first committee draft (CD).
4. CD. At this stage, national bodies provide comments on the CD. It is quite an
active stage in which technical details are discussed within the TC or SC,
both electronically and in in-person meetings. Comments are compiled by the
secretariat until an appropriate level of consensus is attained. In case of
doubt, a two-thirds majority is usually considered sufficient. During this stage
the CD can be discussed and revised until it is proposed as a DIS.
5. DIS. A draft international standard is circulated for voting and commenting to
all national bodies, not only to those involved in the TC/SC. At this stage,
technical comments, mandatory in case of negative vote, can be made.
Comments can be addressed by the secretary for the final draft. Before
stepping into the next stage, a report on the voting and decisions on
comments is circulated again, and finally an FDIS is prepared.
6. FDIS. This is the last stage before publication. The procedure is similar to
that of the DIS stage. However, editorial comments are expected rather than
technical ones.
7. ISO. The international standard is eventually published once the comments on
the FDIS have been addressed.
8. SR. After publication, an ISO Standard and other deliverables such as TRs are
subject to systematic review in order to determine whether they should be
confirmed, revised/amended, converted to another form of deliverable, or
withdrawn. For an ISO Standard, the maximum elapsed time before
systematic review is 5 years.
Figure 4.1 summarizes the standards development process, including
approximate target dates, see [95] for details. Please note that at any voting
stage, the document can be rejected and referred back to the TC/SC, which may
decide to resubmit a modified version, change the type of document (e.g., a
technical specification instead of an international standard), or cancel the project.
Fig. 4.1 ISO Standards publication path. Standardized process
Now we have a data.frame called DATA with ten observations and six
variables, namely: title, link, description, category, guid, pubDate. From each
variable we can extract information to build more useful variables, for example
by splitting the title variable to get the standard code as follows:
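A sketch of such code, assuming the standard code is the text before the first " - " separator in the title (the new variable name is illustrative):

DATA$std <- sapply(strsplit(DATA$title, " - ", fixed = TRUE), `[`, 1)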
Then, following an approach similar to that for an XML file, the information
in an HTML file can be retrieved. However, errors may arise due to the strict
rules of the XML specification. Web scraping is easier using the rvest package,
see the package documentation for details. In our example, the following code
gets the abstract for ISO 3534-1:200612:
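A sketch, assuming a current version of rvest and that the abstract can be located on the catalogue page with a CSS selector; the selector below is hypothetical:

library(rvest)
page <- rvest::read_html("http://www.iso.org/iso/catalogue_detail.htm?csnumber=40145")
abstract <- rvest::html_text(rvest::html_node(page, "div[itemprop='description']"))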
Now you can use the text for any purpose, for example to print out the
abstract with the cat function:
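For instance, assuming the scraped text was stored in abstract as above:

cat(abstract)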
Output:
Abstract
ISO 3534-1:2006 defines general statistical terms and terms used in
probability which may be used in the drafting of other International
Standards. In addition, it defines symbols for a limited number of these
terms.
2. Box, G.: Total quality: its origins and its future. In: Total Quality Management, pp. 119–127. Springer,
Berlin (1995)
3. Cano, E.L., Moguerza, J.M., Redchuk, A.: Six Sigma with R. Statistical Engineering for Process
Improvement, Use R!, vol. 36. Springer, New York (2012). http://www.springer.com/statistics/book/
978-1-4614-3651-5
4. Crosby, P.: Quality is Free: The Art of Making Quality Certain. Mentor Book. McGraw-Hill, New
York (1979). https://books.google.es/books?id=n4IubCcpm0EC
5. Eubank, R., Kupresanin, A.: Statistical Computing in C++ and R. Chapman & Hall/CRC The R
Series. Taylor & Francis, New York (2011). https://books.google.es/books?id=CmZRui57cWkC
6. Gardener, M.: Beginning R: The Statistical Programming Language. ITPro collection. Wiley, New
York (2012). https://books.google.es/books?id=iJoKYSWCubEC
7. ISO TC176/SC1 – Concepts and terminology: ISO 9000:2005 - Quality management systems –
Fundamentals and vocabulary. Published standard (2005). http://www.iso.org/iso/home/store/
catalogue_tc/catalogue_detail.htm?csnumber=42180
8. ISO TC176/SC2 – Quality systems: ISO 9001:2008 - Quality management systems – Requirements.
Published standard (2008). http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?
csnumber=46486
9. ISO TC176/SC2 – Quality systems: ISO 9004:2009 - Managing for the sustained success of an
organization – A quality management approach. Published standard (2009). http://www.iso.org/iso/
home/store/catalogue_tc/catalogue_detail.htm?csnumber=41014
10. ISO TC176/SC3 – Supporting technologies: ISO/TR 10017:2003 - Guidance on statistical techniques
for ISO 9001:2000. Published Technical Report (2003). http://www.iso.org/iso/home/store/catalogue_
tc/catalogue_detail.htm?csnumber=36674
11. ISO TC176/SC3 – Supporting technologies: ISO 19011:2011 - Guidelines for auditing management
systems. Published standard (2011). http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.
htm?csnumber=50675
12. ISO TC69/SC1–Terminology and Symbols: ISO 3534-1:2006 - Statistics – Vocabulary and symbols –
Part 1: General statistical terms and terms used in probability. Published standard (2010). http://www.
iso.org/iso/catalogue_detail.htm?csnumber=40145
13. ISO TC69/SC1–Terminology and Symbols: ISO 3534-3:2013 - Statistics – Vocabulary and symbols –
Part 3: Design of experiments. Published standard (2013). http://www.iso.org/iso/catalogue_detail.
htm?csnumber=44245
14. ISO TC69/SC1–Terminology and Symbols: ISO 3534-2:2006 - Statistics – Vocabulary and symbols –
Part 2: Applied statistics. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=40147
15. ISO TC69/SC1–Terminology and Symbols: ISO 3534-4:2014 - Statistics – Vocabulary and symbols –
Part 4: Survey sampling. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=56154
29. ISO TC69/SC5–Acceptance sampling: ISO 2859-4:2002 - Sampling procedures for inspection by
attributes – Part 4: Procedures for assessment of declared quality levels. Published standard (2010).
http://www.iso.org/iso/catalogue_detail.htm?csnumber=36164
30. ISO TC69/SC5–Acceptance sampling: ISO 3951-3:2007 - Sampling procedures for inspection by
variables – Part 3: Double sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot
inspection. Published standard (2010). http://www.iso.org/iso/catalogue_detail.htm?csnumber=40556
31. ISO TC69/SC5–Acceptance sampling: ISO 2859-1:1999/Amd 1:2011. Published standard (2011).
http://www.iso.org/iso/catalogue_detail.htm?csnumber=53053
32. ISO TC69/SC5–Acceptance sampling: ISO 28801:2011 - Double sampling plans by attributes with
minimal sample sizes, indexed by producer’s risk quality (PRQ) and consumer’s risk quality (CRQ).
Published standard (2011). http://www.iso.org/iso/catalogue_detail.htm?csnumber=44963
33. ISO TC69/SC5–Acceptance sampling: ISO 3951-4:2011 - Sampling procedures for inspection by
variables – Part 4: Procedures for assessment of declared quality levels. Published standard (2011).
http://www.iso.org/iso/catalogue_detail.htm?csnumber=44806
34. ISO TC69/SC5–Acceptance sampling: ISO 13448-1:2005 - Acceptance sampling procedures based on
the allocation of priorities principle (APP) – Part 1: Guidelines for the APP approach. Published
standard (2013). http://www.iso.org/iso/catalogue_detail.htm?csnumber=37429
35. ISO TC69/SC5–Acceptance sampling: ISO 3951-1:2013 - Sampling procedures for inspection by
variables – Part 1: Specification for single sampling plans indexed by acceptance quality limit (AQL)
for lot-by-lot inspection for a single quality characteristic and a single AQL. Published standard
(2013). http://www.iso.org/iso/catalogue_detail.htm?csnumber=57490
36. ISO TC69/SC5–Acceptance sampling: ISO 3951-2:2013 - Sampling procedures for inspection by
variables – Part 2: General specification for single sampling plans indexed by acceptance quality limit
(AQL) for lot-by-lot inspection of independent quality characteristics. Published standard (2013).
http://www.iso.org/iso/catalogue_detail.htm?csnumber=57491
37. ISO TC69/SC5–Acceptance sampling: ISO 13448-2:2004 - Acceptance sampling procedures based on
the allocation of priorities principle (APP) – Part 2: Coordinated single sampling plans for acceptance
sampling by attributes. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=37430
40. ISO TC69/SC5–Acceptance sampling: ISO 21247:2005 - Combined accept-zero sampling systems
and process control procedures for product acceptance. Published standard (2014). http://www.iso.org/
iso/catalogue_detail.htm?csnumber=34445
41. ISO TC69/SC5–Acceptance sampling: ISO 2859-10:2006 - Sampling procedures for inspection by
attributes – Part 10: Introduction to the ISO 2859 series of standards for sampling for inspection by
attributes. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?csnumber=39991
42. ISO TC69/SC5–Acceptance sampling: ISO 2859-1:1999 - Sampling procedures for inspection by
attributes – Part 1: Sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot
inspection. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?csnumber=1141
43. ISO TC69/SC5–Acceptance sampling: ISO 2859-3:2005 - Sampling procedures for inspection by
attributes – Part 3: Skip-lot sampling procedures. Published standard (2014). http://www.iso.org/iso/
catalogue_detail.htm?csnumber=34684
44. ISO TC69/SC5–Acceptance sampling: ISO 2859-5:2005 - Sampling procedures for inspection by
attributes – Part 5: System of sequential sampling plans indexed by acceptance quality limit (AQL) for
lot-by-lot inspection. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=39295
45. ISO TC69/SC5–Acceptance sampling: ISO 3951-5:2006 - Sampling procedures for inspection by
variables – Part 5: Sequential sampling plans indexed by acceptance quality limit (AQL) for
inspection by variables (known standard deviation). Published standard (2014). http://www.iso.org/
iso/catalogue_detail.htm?csnumber=39294
46. ISO TC69/SC5–Acceptance sampling: ISO 8422:2006 - Sequential sampling plans for inspection by
attributes. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?csnumber=39915
47. ISO TC69/SC5–Acceptance sampling: ISO 8423:2008 - Sequential sampling plans for inspection by
variables for percent nonconforming (known standard deviation). Published standard (2014). http://
www.iso.org/iso/catalogue_detail.htm?csnumber=41992
48. ISO TC69/SC5–Acceptance sampling: ISO 24153:2009 - Random sampling and randomization
procedures. Published standard (2015). http://www.iso.org/iso/catalogue_detail.htm?csnumber=42039
49. ISO TC69/SC6–Measurement methods and results: ISO 10725:2000 - Acceptance sampling plans and
procedures for the inspection of bulk materials. Published standard (2010). http://www.iso.org/iso/
catalogue_detail.htm?csnumber=33418
50. ISO TC69/SC6–Measurement methods and results: ISO 11843-2:2000 - Capability of detection – Part
2: Methodology in the linear calibration case. Published standard (2010). http://www.iso.org/iso/
catalogue_detail.htm?csnumber=20186
51. ISO TC69/SC6–Measurement methods and results: ISO 21748:2010 - Guidance for the use of
repeatability, reproducibility and trueness estimates in measurement uncertainty estimation. Published
standard (2010). http://www.iso.org/iso/catalogue_detail.htm?csnumber=46373
52. ISO TC69/SC6–Measurement methods and results: ISO 11648-2:2001 - Statistical aspects of
sampling from bulk materials – Part 2: Sampling of particulate materials. Published standard (2012).
http://www.iso.org/iso/catalogue_detail.htm?csnumber=23663
53. ISO TC69/SC6–Measurement methods and results: ISO 11843-5:2008 - Capability of detection – Part
5: Methodology in the linear and non-linear calibration cases. Published standard (2012). http://www.
iso.org/iso/catalogue_detail.htm?csnumber=42000
54. ISO TC69/SC6–Measurement methods and results: ISO 5725-1:1994 - Accuracy (trueness and
precision) of measurement methods and results – Part 1: General principles and definitions. Published
standard (2012). http://www.iso.org/iso/catalogue_detail.htm?csnumber=11833
55. ISO TC69/SC6–Measurement methods and results: ISO 5725-6:1994 - Accuracy (trueness and
precision) of measurement methods and results – Part 6: Use in practice of accuracy values. Published
standard (2012). http://www.iso.org/iso/catalogue_detail.htm?csnumber=11837
56. ISO TC69/SC6–Measurement methods and results: ISO/TR 13587:2012 - Three statistical approaches
for the assessment and interpretation of measurement uncertainty. Published standard (2012). http://
www.iso.org/iso/catalogue_detail.htm?csnumber=54052
57. ISO TC69/SC6–Measurement methods and results: ISO 11843-6:2013 - Capability of detection – Part
6: Methodology for the determination of the critical value and the minimum detectable value in
Poisson distributed measurements by normal approximations. Published standard (2013). http://www.
iso.org/iso/catalogue_detail.htm?csnumber=53677
58. ISO TC69/SC6–Measurement methods and results: ISO 5725-2:1994 - Accuracy (trueness and
precision) of measurement methods and results – Part 2: Basic method for the determination of
repeatability and reproducibility of a standard measurement method. Published standard (2013). http://
www.iso.org/iso/catalogue_detail.htm?csnumber=11834
59. ISO TC69/SC6–Measurement methods and results: ISO 5725-3:1994 - Accuracy (trueness and
precision) of measurement methods and results – Part 3: Intermediate measures of the precision of a
standard measurement method. Published standard (2013). http://www.iso.org/iso/catalogue_detail.
htm?csnumber=11835
60. ISO TC69/SC6–Measurement methods and results: ISO 5725-4:1994 - Accuracy (trueness and
precision) of measurement methods and results – Part 4: Basic methods for the determination of the
trueness of a standard measurement method. Published standard (2013). http://www.iso.org/iso/
catalogue_detail.htm?csnumber=11836
61. ISO TC69/SC6–Measurement methods and results: ISO 10576-1:2003 - Statistical methods –
Guidelines for the evaluation of conformity with specified requirements – Part 1: General principles.
Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?csnumber=32373
62. ISO TC69/SC6–Measurement methods and results: ISO 11095:1996 - Linear calibration using
reference materials. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=1060
63. ISO TC69/SC6–Measurement methods and results: ISO 11648-1:2003 - Statistical aspects of
sampling from bulk materials – Part 1: General principles. Published standard (2014). http://www.iso.
org/iso/catalogue_detail.htm?csnumber=33484
64. ISO TC69/SC6–Measurement methods and results: ISO 11843-1:1997 - Capability of detection – Part
1: Terms and definitions. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=1096
65. ISO TC69/SC6–Measurement methods and results: ISO 11843-3:2003 - Capability of detection – Part
3: Methodology for determination of the critical value for the response variable when no calibration
data are used. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?csnumber=
34410
66. ISO TC69/SC6–Measurement methods and results: ISO 11843-4:2003 - Capability of detection – Part
4: Methodology for comparing the minimum detectable value with a given value. Published standard
(2014). http://www.iso.org/iso/catalogue_detail.htm?csnumber=34411
67. ISO TC69/SC6–Measurement methods and results: ISO 11843-7:2012 - Capability of detection – Part
7: Methodology based on stochastic properties of instrumental noise. Published standard (2014).
http://www.iso.org/iso/catalogue_detail.htm?csnumber=53678
68. ISO TC69/SC6–Measurement methods and results: ISO 5725-5:1998 - Accuracy (trueness and
precision) of measurement methods and results – Part 5: Alternative methods for the determination of
the precision of a standard measurement method. Published standard (2014). http://www.iso.org/iso/
catalogue_detail.htm?csnumber=1384
69. ISO TC69/SC6–Measurement methods and results: ISO/TS 28037:2010 - Determination and use of
straight-line calibration functions. Published standard (2014). http://www.iso.org/iso/catalogue_detail.
htm?csnumber=44473
70. ISO TC69/SC6–Measurement methods and results: ISO/TS 21749:2005 - Measurement uncertainty
for metrological applications – Repeated measurements and nested experiments. Published standard
(2015). http://www.iso.org/iso/catalogue_detail.htm?csnumber=34687
71. ISO TC69/SC7–Applications of statistical and related techniques for the implementation of Six
Sigma: ISO/TR 29901:2007 - Selected illustrations of full factorial experiments with four factors.
Published standard (2007). http://www.iso.org/iso/catalogue_detail.htm?csnumber=45731
72. ISO TC69/SC7–Applications of statistical and related techniques for the implementation of Six
Sigma: ISO/TR 29901:2007/Cor 1:2009. Published standard (2009). http://www.iso.org/iso/
catalogue_detail.htm?csnumber=54056
73. ISO TC69/SC7–Applications of statistical and related techniques for the implementation of Six
Sigma: ISO/TR 12845:2010 - Selected illustrations of fractional factorial screening experiments.
Published standard (2010). http://www.iso.org/iso/catalogue_detail.htm?csnumber=51963
74. ISO TC69/SC7–Applications of statistical and related techniques for the implementation of Six
Sigma: ISO/TR 14468:2010 - Selected illustrations of attribute agreement analysis. Published
standard (2010). http://www.iso.org/iso/catalogue_detail.htm?csnumber=52900
75. ISO TC69/SC7–Applications of statistical and related techniques for the implementation of Six
Sigma: ISO 13053-1:2011 - Quantitative methods in process improvement – Six Sigma – Part 1:
DMAIC methodology. Published standard (2011). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=52901
76. ISO TC69/SC7–Applications of statistical and related techniques for the implementation of Six
Sigma: ISO 13053-2:2011 - Quantitative methods in process improvement – Six Sigma – Part 2: Tools
and techniques. Published standard (2011). http://www.iso.org/iso/catalogue_detail.htm?csnumber=
52902
77. ISO TC69/SC7–Applications of statistical and related techniques for the implementation of Six
Sigma: ISO/TR 12888:2011 - Selected illustrations of gauge repeatability and reproducibility studies.
Published standard (2011). http://www.iso.org/iso/catalogue_detail.htm?csnumber=52899
78. ISO TC69/SC7–Applications of statistical and related techniques for the implementation of Six
Sigma: ISO 17258:2015 - Statistical methods – Six Sigma – Basic criteria underlying benchmarking
for Six Sigma in organisations. Published standard (2015). http://www.iso.org/iso/catalogue_detail.
htm?csnumber=59489
79. ISO TC69/SC8–Application of statistical and related methodology for new technology and product
development: ISO 16336:2014 - Applications of statistical and related methods to new technology and
product development process – Robust parameter design (RPD). Published standard (2014). http://
www.iso.org/iso/catalogue_detail.htm?csnumber=56183
80. ISO TC69/SCS–Secretariat: ISO 11453:1996/Cor 1:1999. Published standard (1999). http://www.iso.
org/iso/catalogue_detail.htm?csnumber=32469
81. ISO TC69/SCS–Secretariat: ISO/TR 18532:2009 - Guidance on the application of statistical methods
to quality and to industrial standardization. Published standard (2009). http://www.iso.org/iso/
catalogue_detail.htm?csnumber=51651
82. ISO TC69/SCS–Secretariat: ISO 16269-4:2010 - Statistical interpretation of data – Part 4: Detection
and treatment of outliers. Published standard (2010). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=44396
83. ISO TC69/SCS–Secretariat: ISO 11453:1996 - Statistical interpretation of data – Tests and confidence
intervals relating to proportions. Published standard (2012). http://www.iso.org/iso/catalogue_detail.
htm?csnumber=19405
84. ISO TC69/SCS–Secretariat: ISO 16269-7:2001 - Statistical interpretation of data – Part 7: Median –
Estimation and confidence intervals. Published standard (2012). http://www.iso.org/iso/catalogue_
detail.htm?csnumber=30709
85. ISO TC69/SCS–Secretariat: ISO 5479:1997 - Statistical interpretation of data – Tests for departure
from the normal distribution. Published standard (2012). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=22506
86. ISO TC69/SCS–Secretariat: ISO/TR 13519:2012 - Guidance on the development and use of ISO
statistical publications supported by software. Published standard (2012). http://www.iso.org/iso/
catalogue_detail.htm?csnumber=53977
89. ISO TC69/SCS–Secretariat: ISO 2602:1980 - Statistical interpretation of test results – Estimation of
the mean – Confidence interval. Published standard (2015). http://www.iso.org/iso/catalogue_detail.
htm?csnumber=7585
90. ISO TC69/SCS–Secretariat: ISO 2854:1976 - Statistical interpretation of data – Techniques of
estimation and tests relating to means and variances. Published standard (2015). http://www.iso.org/
iso/catalogue_detail.htm?csnumber=7854
91. ISO TC69/SCS–Secretariat: ISO 28640:2010 - Random variate generation methods. Published
standard (2015). http://www.iso.org/iso/catalogue_detail.htm?csnumber=42333
92. ISO TC69/SCS–Secretariat: ISO 3301:1975 - Statistical interpretation of data – Comparison of two
means in the case of paired observations. Published standard (2015). http://www.iso.org/iso/
catalogue_detail.htm?csnumber=8540
93. ISO TC69/SCS–Secretariat: ISO 3494:1976 - Statistical interpretation of data – Power of tests relating
to means and variances. Published standard (2015). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=8845
94. ISO/IEC: ISO/IEC Directives – Part 2:2011. Principles to structure and draft documents intended to
become International Standards, Technical Specifications or Publicly Available Specifications.
ISO/IEC Directives (2011). http://www.iec.ch/members_experts/refdocs/
95. ISO/IEC: ISO/IEC Directives – Part 1:2014. Official procedures to be followed when developing and
maintaining an International Standard and procedures specific to ISO. ISO/IEC Directives (2014).
http://www.iec.ch/members_experts/refdocs/
96. ISO/IEC JTC 1/SC 22 - Programming languages, their environments and system software interfaces:
ISO/IEC.9899:2011 Information technology – Programming languages – C. Published standard
(2011). http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=57853
97. ISO/IEC JTC 1/SC 22 - Programming languages, their environments and system software interfaces:
ISO/IEC 14882:2014 Information technology – Programming languages – C++. Published standard
(2014). http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=64029
98. Juran, J., Defeo, J.: Juran’s Quality Handbook: The Complete Guide to Performance Excellence.
McGraw-Hill, New York (2010)
99. Matsumoto, M., Nishimura, T.: Mersenne twister: a 623-dimensionally equidistributed uniform
pseudo-random number generator. ACM Trans. Modell. Comput. Simul. 8, 3–30 (1998)
100. Moen, R.D., Nolan, T.W., Provost, L.P.: Quality Improvement Through Planned Experimentation, 3rd
edn. McGraw-Hill, New York (2012)
101. Pathak, M.: Beginning Data Science with R. SpringerLink: Bücher. Springer, New York (2014)
102. Taguchi, G., Chowdhury, S., Wu, Y.: Taguchi’s Quality Engineering Handbook. Wiley, Hoboken
(2005)
Footnotes
1 Check the up-to-date list at http://www.iso.org/iso/home/about/iso_members.htm.
2 Frank Kaplan, see http://asq.org/about-asq/who-we-are/bio_box.html.
3 http://dictionary.cambridge.org/dictionary/british/quality.
4 http://en.wikipedia.org/wiki/Quality_(business).
5 For example, ISO 14000—Environmental management.
6 http://www.iso.org/iso/home/standards.htm.
7 Sometimes also in Russian.
8 Check http://www.iso.org/iso/home/store/catalogue_tc/catalogue_tc_browse.htm?commid=560992&
development=on.
9 http://www.iso.org/iso/home/store/catalogue_tc/catalogue_tc_browse.htm?commid=45202&published=
on.
10 There is an RSS channel for each standard, but the abstract is not there.
11 The grep function uses regular expressions to match text info.
12 The syntax package::function avoids conflict with other packages’ functions with the same name.
Part II
Statistics for Quality Control
This part includes two chapters about basic Statistics, needed to perform
statistical quality control. The techniques explained throughout the book make
use of basic concepts such as mean, variance, sample, population, probability
distribution, etc. A practical approach is followed, without details about the
mathematical theory. Chapter 5 reviews descriptive statistics, probability, and
statistical inference with quality control examples illustrated with the R
software. Chapter 6 tackles the crucial task of appropriately taking samples from
a population, using intuitive examples and R capabilities.
© Springer International Publishing Switzerland 2015
Emilio L. Cano, Javier M. Moguerza and Mariano Prieto Corcoba, Quality Control with R, Use R!,
DOI 10.1007/978-3-319-24046-6_5
Abstract
This chapter provides the necessary background to understand the fundamental
ideas of descriptive and inferential statistics. In particular, the basic ideas and
tools used in the description, both graphical and numerical, of the inherent
variability always present in the real world are described. Additionally, some of
the most usual statistical distributions used in quality control, for both the
discrete and the continuous domains, are introduced. Finally, the very important
topic of statistical inference is covered with many examples of specific
applications of R to solve these problems. The chapter also summarizes a
selection of the ISO standards available to help users in the practice of
descriptive and inferential statistics.
Histogram
The histogram, briefly introduced in Chapter 3, is one of the most popular charts
used in statistics; it is simple to construct and simple to understand. A histogram
is a bar chart used for describing continuous variables; it shows the distribution
of the measurements of a variable. On the x-axis, each bar’s width represents an
interval of the possible values of the variable. The height of the bars (that is, the
y-axis) represents the frequency (relative or absolute) of the measurements
within each interval. The rule is that the area of the bars should be proportional
to the frequencies.
The histogram does not give us information about the behavior of the
measurements with respect to time; it is used to find the distribution of a
variable, that is:
Is the variable centered or asymmetric?
What is the variation like? Are the observations close to the central values,
or is it a spread distribution?
Is there any pattern that would prompt further analysis?
Is it a normal distribution?
To make a histogram of our data, we first determine the number of bins
(bars) that we are going to plot. Then, we decide on the width of the intervals
(usually the same for all intervals) and count the number of measurements within
each interval. Finally, we plot the bars. For intervals with equal widths, the
height of the bars will be equal to the frequencies.
The simplest way to create a histogram with R is by means of the standard
graphics capabilities, i.e., using the hist function. A simple call to this function
with the vector of data as argument plots the histogram. More elaborate charts
can be made using the lattice and ggplot2 packages, see Chapter 2. They are
especially useful if we need to visualize several histograms in the same chart, as
we will see in the following example. Remember that histograms are one of the
seven basic quality control tools, see Chapter 3, where you can find more
examples.
Example 5.1.
Metal plates thickness. Histogram.
Table 5.1 contains two sets of 12 measurements, each one corresponding to
the thickness of a certain steel plate produced on 2 successive days. The nominal
thickness of this product is 0.75 in. The production equipment was readjusted
after Day 1 because the engineer in charge of the production line concluded that
the product was thicker than required. The following code creates a data frame in
the R workspace1:
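A sketch of such code; the values below are simulated stand-ins, not the actual measurements of Table 5.1, and the object and variable names (plates, thickness, day) are assumptions reused in the following sketches:

set.seed(1)
plates <- data.frame(
  thickness = c(rnorm(12, mean = 0.758, sd = 0.005),   # Day1, before readjustment
                rnorm(12, mean = 0.750, sd = 0.005)),  # Day2, after readjustment
  day = rep(c("Day1", "Day2"), each = 12))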
Although in the case of the data corresponding to Day1 it may seem clear
that the numbers are larger than 0.75 in (only two data points out of twelve are
not larger than that value), a histogram will make the situation even clearer. Fig. 5.1
shows the histogram generated with the following code:
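A possible version of such code; the graphical options are illustrative:

hist(plates$thickness,
     main = "Metal plates thickness",
     xlab = "Thickness (in)",
     col = "gray")
abline(v = 0.75, lwd = 2)   # nominal value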
Fig. 5.1 Histogram. A histogram provides an idea of the data distribution. The x axis represents the
magnitude we are measuring. The y axis is for the frequency of data. Thus, hints about the range, central
values, and underlying probability distribution can be obtained
As was shown in Chapter 2, even though R can produce charts with very
little information (for example, in the above code only the first argument would
be needed), we can add options to get the desired result. Try the function with
and without the added options and see the differences. The meaning of each
argument used in the code is explained in its documentation.
There are more graphical options; you can always check the documentation
of the plotting function (in this case, hist) and of the graphical parameters
(par).
Having this chart will give the engineer a good argument to back up his
impression about the Day1 data. A good idea would be to represent the data
before and after the adjustment of the equipment, in search of evidence of the
improvement. A straightforward way to display charts for different groups in R
is using functions of the lattice package. The following code produces the chart
in Fig. 5.2.
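A minimal sketch using the assumed plates data frame:

library(lattice)
histogram(~ thickness | day, data = plates)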
Fig. 5.2 Histograms by groups. When different groups are in a data set, visualization by those groups is a
powerful tool
Run Chart
A run chart is a bidimensional chart where the x-axis represents a time line and
the variable that we want to monitor is plotted on the y-axis. These types of
charts are also called time-series charts when we have a time scale on the x-axis.
The points on the x-axis are not necessarily equally spaced in time (for
example, the volume of a series of containers whose production is sequential). Thus,
we will have a number of subgroups where a characteristic is measured, and we
have the order of the subgroups (notice that a subgroup may contain only one
element). Usually a center line is plotted in a run chart. It may represent a
target, the mean of the data, or any other value. Run charts allow us to detect
patterns that can be indicative of changes in a process. Changes entail variability
and, thus, less quality. In particular, if we detect cycles, trends, or shifts, we
should review our process. If we use the median as center line, then half of the
points are below the center line, and half of the points are above it.
If a process is in control, the position of each point with respect to the center line
is random. However, if the process is not in control, then non-random variation
appears in the run chart. In addition to the patterns detected by visualizing
the chart, additional numerical tests can be run to detect such non-random
variation.
Simple run charts can be plotted using R standard graphics by simply plotting a
vector of data with the plot function, and then adding a center line using the
abline function. The qicharts package provides a simple interface to plot run
charts and obtain run tests.
Example 5.2.
Metal plates thickness (cont.) Run chart.
The run chart corresponding to our example of plate thickness in Fig. 5.3 is
the simplest version of a run chart we can get with just the following two
expressions:
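A sketch of those two expressions, matching the arguments described below:

plot(plates$thickness, type = "b", pch = 16)
abline(h = median(plates$thickness), lwd = 2)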
Fig. 5.3 Simple run chart. The run chart provides insights about changes in the process
type: sets the type of representation for each data point (lines, points, both)
pch: sets the symbol to be plotted at each point
h: sets the value on the vertical axis at which a horizontal line will be plotted
lwd: sets the line width
The run chart produced by the qicharts package in Fig. 5.4 provides further
information. In particular, the number of observations, the longest run, and the
crossings.
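A sketch of such a call, using the arguments described below (the annotation labels are illustrative):

library(qicharts)
qic(plates$thickness,
    freeze = 12,           # compute the median with day 1 data only
    pre.text = "Day1",
    post.text = "Day2",
    runvals = TRUE)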
Fig. 5.4 Run chart with tests. Numerical tests can be run to detect process shifts
freeze: point on the x axis that divides the data set in two subsets
pre.text: text to annotate the pre-freeze period
post.text: text to annotate the post-freeze period
runvals: (logical) whether to print the statistics from the runs analysis
Note that the median has been computed only with Day1 data, as we have
frozen those data as in-control data. The numbers in brackets for the longest run
and the crossings are the limits for considering the process in control. In this
case, both criteria indicate that the distribution around the median cannot be
considered random. Details about the foundations of these tests can be found in
[2] and [15].
The sequential representation of the same data provides a substantially
different vision from what the histogram does. Now it seems evident that a
real change occurred after observation number 12 (corresponding to the
adjustment of the process after Day1).
If we analyze in detail the information provided by the histogram and the run
chart, we can conclude that neither is better; both are complementary and
necessary to understand the whole picture of the situation. □
Example 5.3.
Metal plates thickness (cont.) Tier chart.
Now let us suppose that our plate thickness data for each day can be divided
into two groups of equal size, each corresponding to a different working shift.
The following code adds this information to the data frame:
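A sketch of such code; the assignment of observations to shifts is an illustrative assumption:

plates$shift <- rep(rep(c("Shift1", "Shift2"), each = 6), times = 2)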
We can plot a dot plot using the dotplot function in the lattice package,
and then add lines to get a typical tier chart. The following code plots the tier
chart in Fig. 5.5.
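A sketch of such code; panel.average joins the group means with a line (the graphical details are assumptions):

library(lattice)
dotplot(thickness ~ shift | day, data = plates,
        panel = function(x, y, ...) {
          panel.dotplot(x, y, ...)
          panel.average(x, y, horizontal = FALSE, ...)  # line across group means
        })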
Fig. 5.5 Tier chart by shifts. A tier chart displays variability within and between groups
Note that the panel argument in lattice plots allows adding a lot of
sophistication to multivariate plots. The value for this argument is an anonymous
function in which the elements of the panel are drawn by calling panel.*
functions. Usually, the first element is the counterpart of the container function,
in our example, panel.dotplot. See [14] for details about lattice graphics. □
Box-and-Whisker Plot
The box-and-whisker plot is also known as the box plot. It graphically
summarizes the distribution of a continuous variable. The sides of the box are
the first and third quartiles (25th and 75th percentiles, respectively).2 Thus, inside
the box we have the middle 50 % of the data. The median is plotted as a line that
crosses the box. The extreme whisker values can be the maximum and minimum
of the data, or other limits beyond which the data are considered outliers. The
limits are usually taken as:

$$[\,Q_1 - 1.5 \times IQR,\; Q_3 + 1.5 \times IQR\,],$$

where Q1 and Q3 are the first and third quartiles, respectively, and IQR is the
interquartile range (Q3 − Q1). Quantiles and the IQR will be explained in detail in
Sect. 5.1.3.2. We can replace 1.5 with any other value in the boxplot function of R.
The outliers are plotted beyond the whiskers as isolated points and can be
labeled to identify their indices. The box plot tells us if the distribution is
centered or skewed (the position of the median with respect to the rest of the
data), if there are outliers (points outside the whiskers), or if the data are close
to the central values (small whiskers or boxes). This chart is especially useful
when we want to compare groups and check if there are differences among
them.
To create a box plot with R we can use the boxplot standard function or
functions from other packages, such as bwplot in the lattice package.
Example 5.4.
Metal plates thickness (cont.) Box plot.
For the example of the metal plates, we obtain the boxplot in Fig. 5.6 for the
whole data set using the following code:
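A minimal sketch with the assumed plates data frame:

boxplot(plates$thickness)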
Fig. 5.6 Box plot for all data. A box plot shows the distribution of the data
To compare groups using boxplots, we can add a formula to the function, see
Fig. 5.7:
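A minimal sketch of such a formula-based call:

boxplot(thickness ~ day, data = plates)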
Fig. 5.7 Box plot by groups. Displaying boxplots for different groups is a powerful visualization tool
We can even make comparisons using more than one grouping variable. The
lattice package is more appropriate for this task. In Fig. 5.8, we create a panel
for each day using the following expression:
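A minimal sketch, grouping by shift within each day panel:

library(lattice)
bwplot(thickness ~ shift | day, data = plates)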
Fig. 5.8 Lattice box plots. A deeper stratified analysis can be made using lattice graphics
□
1. The sample mean is the average value. This is the most widely used measure
due to its mathematical properties. The main inconvenience is that it is
sensitive to outliers (values far from the central values):

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
Example 5.5.
Metal plates thickness (cont.) Mean of the metal plates thickness.
The mean function over a vector returns the average of the values in the
vector. If there are missing values (NA), the returned value is NA, unless we set the
na.rm argument to TRUE, see Chapter 2 for details.
To make any computation by groups in R, we can use the tapply function, in
our example:
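A minimal sketch:

mean(plates$thickness)
tapply(plates$thickness, plates$day, mean)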
□
2. The median is the value that divides the data into two halves: one containing
the higher values, the other containing the lower values. It is not influenced
by outliers. If we have an even number of data, the average value of the two
central values is taken.
The median function computes the median of a numeric vector in R.
Example 5.6.
Metal plates thickness (cont.) Median of the metal plates thickness.
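A minimal sketch:

median(plates$thickness)
tapply(plates$thickness, plates$day, median)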
□
3. The mode is the most frequent value (or range of values in a continuous
variable). In a frequency table, it is the value that has the maximum
frequency. While the mean and the median are unique values, a data set
might have more than one mode. Many times this means that the sample data
come from merged populations, and we should measure categorical variables
in order to make an appropriate stratification.
There is no built-in function to calculate the mode in R. Instead, we need to
check the value or range of values that has the maximum frequency.
Example 5.7.
Metal plates thickness (cont.). Mode of the metal plates thickness.
For the example of the metal plates none of the values are repeated, so there
is no value that can be considered as the mode. However, we can check which
interval has the highest frequency. This is the modal interval. We can follow two
approaches. One is to divide the range into a number of intervals and check
which interval is the modal one, for example:
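A sketch of such code, taking roughly the square root of n as the number of intervals (an assumption consistent with the explanation below):

n <- length(plates$thickness)
intervals <- cut(plates$thickness, breaks = round(sqrt(n)))
freqs <- table(intervals)
names(freqs)[which.max(freqs)]   # the modal interval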
In the code above, we first divide the range of the variable into a (rounded)
number of intervals with the cut function, where n is the number of observations.
This is quite a common rule to start, but there are others; for example, to build a
histogram, Sturges’ formula [16] is used, type ?nclass.Sturges to see other rules.
Then we create an object with the frequency table, and finally the name of the
class whose frequency is maximal is shown.
The second approach is to use the intervals produced to construct a
histogram, which are taken with a rounded interval width. To do that, we simply
save the result of the hist function instead of plotting it, and then access the
object where that information was saved. We could also take the mid value of
the modal interval in order to provide a single value for the mode.
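A sketch of this second approach:

h <- hist(plates$thickness, plot = FALSE)
h$mids[which.max(h$counts)]   # mid point of the modal interval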
In the saved object, the element mids contains the mid points of the intervals,
and the counts element contains the frequencies.
Note that both approaches provide a similar result, which is definitely
enough for a descriptive analysis. Putting it all together, if we compare the central
tendency measures and the graphical tools (histogram and box plot), we can see
how the asymmetry shown by the histogram, with a slight positive skew (longer
right tail), results in a mean higher than the median and the mode. Fig. 5.9 shows
the measures on the histogram itself, produced with the following code.
Fig. 5.9 Histogram with central tendency measures. The mean of a sample data departs from the median to
the longest tail
□
2. plotted a histogram of the thickness data;
3. plotted a vertical line for each central tendency measure;
4. added a legend to the plot.
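A sketch of code following those steps, reusing the measures computed above (the graphical details are assumptions):

mode.value <- h$mids[which.max(h$counts)]
hist(plates$thickness, main = "", xlab = "Thickness (in)")
abline(v = mean(plates$thickness),   lty = 1, lwd = 2)
abline(v = median(plates$thickness), lty = 2, lwd = 2)
abline(v = mode.value,               lty = 3, lwd = 2)
legend("topright", legend = c("Mean", "Median", "Mode"),
       lty = 1:3, lwd = 2)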
Variability
Variability is statistics’ reason for being [1]. In this section we will see how to
measure such variability. The variance is the most important measure of
variability due to its mathematical properties. It is the average squared distance
from the mean, and we will represent it by σ2:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2.$$
Example 5.8.
Metal plates thickness (cont.) Variance.
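A minimal sketch; note that R's var function computes the sample variance, dividing by n − 1 rather than n:

var(plates$thickness)
tapply(plates$thickness, plates$day, var)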
□
The variance is in squared units compared with the mean. Hence, the standard
deviation, $\sigma = \sqrt{\sigma^2}$, is the most commonly used variability
measure. In quality control it is often estimated as

$$\hat{\sigma} = \frac{\bar{s}}{c_4} \qquad \text{or} \qquad \hat{\sigma} = \frac{\bar{R}}{d_2},$$

where c4 and d2 are tabulated constants for a given sample size n, and R is
the sample range (see below).
Example 5.9.
Metal plates thickness (cont.) Standard deviation.
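A minimal sketch:

sd(plates$thickness)
tapply(plates$thickness, plates$day, sd)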
□
The range (R) is the difference between the maximum value and the minimum
value, but it is strongly influenced by extreme values. Nevertheless, when we
have few data, it is used as a robust method to estimate the variability as outlined
above, see [12] for details:

$$R = x_{\max} - x_{\min}.$$

To calculate the range with R, we need to get the maximum and minimum
using the range function, and then take the difference.
Example 5.10.
Metal plates thickness (cont.) Sample range.
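A minimal sketch, computing the range by groups with an anonymous function:

diff(range(plates$thickness))
tapply(plates$thickness, plates$day, function(x) diff(range(x)))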
Note how we can add any customized function on the fly to make
computations by groups. The third argument of the tapply function in the above
code is a so-called anonymous function, with x as its argument, which is
then used within the function body. This strategy can also be used in vectorized
functions such as lapply and sapply; type ?lapply in the console and check the
documentation and examples on the topic.
□
Similarly to the median, we can compute the quartiles. These are the values
that divide the data into four parts. Thus, the median is the second quartile (Q₂).
The first quartile (Q₁) is the value that has 25 % of the data below it, and the third
quartile (Q₃) is the value above which 25 % of the data remain. The interquartile
range (IQR) is a measure of variability that avoids the influence of outliers. This
range contains the middle 50 % of the data:

$$IQR = Q_3 - Q_1$$
To calculate the quartiles with R we can use the quantile function. The
summary function applied to a numeric vector returns the five-number summary
(min, Q₁, median, Q₃, max) and the mean. The IQR function computes the
interquartile range. The mad function returns the median absolute deviation.
Example 5.11.
Metal plates thickness (cont.) MAD, Quartiles and interquartile range.
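Continuing with the stand-in thickness vector:

```r
quantile(thickness)   # quartiles (plus min and max)
summary(thickness)    # five-number summary and the mean
IQR(thickness)        # interquartile range
mad(thickness)        # median absolute deviation
```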
□
Frequency Tables
Usually, raw data become more difficult to interpret as the number of observations
increases. Thus, many times the first operation we do with the data is to build a
frequency table. For a discrete variable, the frequency of a value is the number of
times this particular value appears, and the relative frequency is the fraction of
times the value appears.
For continuous variables we need to arrange the data into classes. For
example, if in our example of plate thickness we want to count the number of
values above and below the nominal value of 0.75 in, we first create a new factor
variable.
We use the R function table to get frequency tables. To discretize a
numerical variable we can use the cut function or just combine logical
expressions and assignments.
Example 5.12.
Metal plates thickness (cont.) Frequency tables.
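A sketch with stand-in data:

```r
set.seed(1)
thickness <- rnorm(24, mean = 0.75, sd = 0.05)  # stand-in measurements
# Discretize around the nominal value and tabulate
position <- cut(thickness, breaks = c(-Inf, 0.75, Inf),
                labels = c("below", "above"))
table(position)              # absolute frequencies
prop.table(table(position))  # relative frequencies
```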
□
Hypergeometric Distribution
The hypergeometric distribution is used when, within a finite population of N
items, there is a certain number D of them that belong to a certain category, e.g.,
defectives. The problem consists in calculating the probability of obtaining x
items of that special category if a random sample of n items is taken from the
population without replacement. In this case, we say that X follows a
hypergeometric distribution and we denote it by X ∼ H(N, n, p), where p = D∕N.
That probability can be calculated by

$$P(X = x) = \frac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}$$

where, in general, $\binom{a}{b}$ is the binomial coefficient:

$$\binom{a}{b} = \frac{a!}{b!\,(a-b)!}$$
Example 5.13.
Metal plates thickness (cont.)
In our example, considering the population of the 24 plates produced in the 2
days, there is a total of seven plates thinner than the nominal value. If we take a
random sample of five plates, what is the probability of obtaining all of them
thicker than the nominal value?
If X is the random variable number of plates thinner than the nominal value,
the question to answer is: P(X = 0), and therefore we use the dhyper function. For
this particular random variable, N = 24, D = 7, and n = 5. Thus, we get the sought
probability as follows:
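In R this is (dhyper's m is the number of items in the special category, n the remaining items, and k the sample size):

```r
dhyper(x = 0, m = 7, n = 24 - 7, k = 5)  # P(X = 0) for X ~ H(24, 5, 7/24)
```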
Binomial Distribution
The binomial distribution is the appropriate distribution to deal with proportions.
It is defined as the total number of successes in n independent trials. By
independent trial we mean the so-called Bernoulli trial, whose outcome can be
success or failure, with p being the probability of success assumed as constant.
The binomial distribution is completely determined by the parameters p and n,
and its probability function is

$$P(X = x) = \binom{n}{x}\, p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n$$
Example 5.14.
Metal plates thickness (cont.) Binomial distribution.
In our example, on the second day just 6 out of the 12 plates (50 %) were
thinner than the nominal value. If we suppose that this rate remains constant (a
necessary hypothesis for the binomial distribution to be applicable), we can
calculate the probability of obtaining x = 0 plates thinner than the nominal value
in the next 5 units. Let X be the binomial random variable number of
thinner-than-nominal plates in a sample of size 5, X ∼ B(n = 5, p = 0.5). The
question is to compute P(X = 0), which can be done with the following
expression:
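```r
dbinom(0, size = 5, prob = 0.5)  # P(X = 0) for X ~ B(5, 0.5)
```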
This probability, just over 3 %, indicates that this would be a rare event.
Another example would be to get the probability of having more than one
thinner-than-nominal plate. We could add up the individual probabilities from 2 to 5, or
simply use the pbinom function to get P(X > 1) as follows:
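```r
pbinom(1, size = 5, prob = 0.5, lower.tail = FALSE)  # P(X > 1)
sum(dbinom(2:5, size = 5, prob = 0.5))               # equivalent sum
```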
□
Poisson Distribution
The Poisson distribution is useful to describe random processes where the events
occur at random and at a per unit basis, e.g., defects per unit surface, or defective
units per hour. In this distribution the rate of occurrence is supposed to be
constant and, theoretically, the number of events may range from zero to infinity.
The Poisson probability of observing x events in a sample unit for which the
average number of defects is λ is calculated as:

$$P(X = x) = \frac{e^{-\lambda}\,\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
Example 5.15.
Metal plates thickness (cont.)
If in our production of metal plates the rate of a certain surface defect were
0.2 defects/unit, what would be the probability that the next unit has zero
defects? Let X be the random variable number of defects per unit, X ∼ Po(λ = 0.2);
we want to know P(X = 0), computed in R as follows:
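```r
dpois(0, lambda = 0.2)  # P(X = 0) for X ~ Po(0.2)
```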
Fig. 5.10 Normal distribution. Within three standard deviations of the mean fall 99.7 % of the data;
within two standard deviations, 95.5 %; within one standard deviation, 68.3 %
P(X₂ ≤ x):
These results clearly show that a shift has taken place in the process, since
the fraction of values rose from 11.40 % to 56.03 %.
□
Data Transformation
In many situations we have to deal with non-normal distributions. Typical
non-normal distributions are skewed, a clearly different behavior from the
symmetry exhibited by the normal distribution.
Example 5.17.
Metal plates thickness (cont.) Non-normal data.
That could be the case, in our example, after some kind of change in the
production process in days 3, 4, and 5. The histogram departs from the
symmetric normal distribution, being better represented by a probability
distribution skewed to the right. First, let us add the new data to the data frame:
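The actual days 3, 4, and 5 measurements are not reproduced in this excerpt; a stand-in that simulates positively skewed data is sketched below:

```r
set.seed(1)
plates <- data.frame(thickness = rnorm(24, mean = 0.75, sd = 0.05),
                     day = rep(c("Day1", "Day2"), each = 12))
# Skewed stand-in for the new measurements
new.days <- data.frame(thickness = 0.75 + rexp(36, rate = 20),
                       day = rep(c("Day3", "Day4", "Day5"), each = 12))
plates <- rbind(plates, new.days)
```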
The histogram in Fig. 5.11 clearly depicts the new situation. The mean is
now far from the median, and a normal density (solid line) does not fit at all with
the histogram shape.
Fig. 5.11 Histogram of non-normal density data. The histogram shows a non-normal distribution with a
positive skew
□
Typical control charts can work well with non-normal data, since their basic goal
is to identify anomalous departures from a stable (normal) process. We will see
control charts in detail in Chapter 9. Nevertheless, we are going to illustrate here
a data transformation using control charts for individual values.
Example 5.18.
Metal plates thickness (cont.) Non-normal data control chart.
The control chart using the non-normal data would be the one in Fig. 5.12.
We use the following code to plot this control chart, see Chapter 9 for details on
the qcc package and function.
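A sketch with the stand-in skewed data (see Chapter 9 for the details of the qcc package):

```r
library(qcc)
thickness345 <- 0.75 + rexp(36, rate = 20)  # stand-in days 3-5 data
qcc(thickness345, type = "xbar.one")        # individuals control chart
```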
Fig. 5.12 Individuals control chart of non-normal density data. In principle, we detect two out-of-limits
points and a broken pattern rule
□
But if from the knowledge of the process one could conclude that this new
behavior will represent the process in the near future, it could be advisable to
adjust the data in order to account for the non-normality.
The idea is to transform the data with the help of an algorithm in such a way
that, after the transformation, they look like a true normal population. Then
the control chart, ideally conceived to detect assignable causes in a normal
population, will be even more effective.
The simplest algorithm used for this purpose is the Box-Cox
transformation. It consists in raising the original data to the power λ (or taking
logarithms of the data if λ = 0). There exists an optimum value of this parameter
for which the transformed data most closely follow a normal distribution.
There are several ways for performing Box-Cox data transformation with R.
One is the boxcox function in the MASS package, which provides a plot of the
possible values of λ against their log-likelihood, including a confidence interval.
Another one is to use the powerTransform function in the car package, which
provides numerical results. We will combine both approaches in the following
example.
Example 5.19.
Metal plates thickness (cont.) Non-normal data transformation.
The application of the transformation for the data corresponding to day 3, 4,
and 5 yields the plot in Fig. 5.13, generated with the following code:
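A hedged sketch of the two approaches combined, with the stand-in skewed data and assuming the estimated λ is not zero:

```r
library(MASS)
library(car)
library(qcc)
thickness345 <- 0.75 + rexp(36, rate = 20)    # stand-in skewed data
boxcox(thickness345 ~ 1)                      # log-likelihood profile of lambda
lambda <- coef(powerTransform(thickness345))  # numerical optimum of lambda
qcc(thickness345^lambda, type = "xbar.one")   # chart on the transformed data
```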
The fact that working with the transformed data yields no data points out of
the control limits is a clear indication that, actually, this process was statistically
in control.
□
$$\hat{p} = \frac{x}{n}$$

That is, the number of events (x) in n Bernoulli experiments over the number
of experiments.
For normal distributions the sample mean is an unbiased estimator of the
population mean:

$$E(\bar{x}) = \mu$$

The sampling distribution of this mean is, in turn, a normal distribution with
the following parameters:

$$\bar{x} \sim N\!\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right)$$
For the variance, the unbiased estimator is the sample variance, defined as:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
We have obtained an estimate of the parameter of interest for our process.
However, any estimate is linked to some uncertainty, and therefore we will
have to deal with some error level. To quantify this uncertainty, we use interval
estimation. Interval estimation consists in giving bounds for our estimate (LL
and UL, lower and upper limits). These limits are calculated so that we have
confidence in the fact that the real value of the parameter is contained within
them. This fact is stated as a confidence level and expressed as a percentage. The
confidence level reflects the percentage of times that the real value of the
parameter is assumed to be in the interval when repeating the sampling. Usually
the confidence level is represented by 100 × (1 − α) %, with α the significance
level, which is a measure of the error in our estimation. Common values for the
confidence level are 99, 95, or 90 %, corresponding, respectively, to α = 0.01,
α = 0.05, and α = 0.1.
A confidence interval is expressed as an inequality. If θ is a parameter, then
[LL, UL] means:

$$P(LL \leq \theta \leq UL) = 1 - \alpha$$
Example 5.20.
Metal plates thickness (cont.) Confidence interval for a proportion.
In Example 5.14 we estimated the proportion of plates thinner than the
nominal value as:
That was a point estimate; we can obtain a confidence interval with the
following code:
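The interval quoted next is reproduced by the Wilson score method without continuity correction:

```r
prop.test(x = 6, n = 12, correct = FALSE)$conf.int
```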
Thus, a confidence interval for the proportion of plates thinner than the
nominal value is [0.2538, 0.7462].
An exact test returns a wider (more conservative) interval:
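```r
binom.test(x = 6, n = 12)$conf.int  # exact (Clopper-Pearson) interval
```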
Other methods can be used and compared with the following code; see the
documentation of the binconf function to find out more.
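```r
library(Hmisc)
binconf(x = 6, n = 12, method = "all")  # Wilson, exact, and asymptotic intervals
```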
□
We can see that the two confidence intervals do not overlap, since the upper
limit of Day2 (0.7696) is smaller than the lower limit of Day1 (0.7885). We may
anticipate that a situation like this is a clear indication that the two means are
different. □
Example 5.22.
Metal plates thickness (cont.) Confidence interval for the variance.
Let’s calculate with R the 95 % confidence intervals for the variance of the
populations corresponding to Day1 and Day2. The function var.test does not
return a confidence interval for the variance of a population given a sample, but
for the ratio between variances. Nevertheless, we can easily construct the
confidence interval computing the limits in the formulae above.
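A sketch of those limits using the chi-square quantiles, with stand-in values for the Day1 sample:

```r
set.seed(1)
day1 <- rnorm(12, mean = 0.79, sd = 0.05)  # stand-in for the Day1 measurements
n <- length(day1); s2 <- var(day1); alpha <- 0.05
c(LL = (n - 1) * s2 / qchisq(1 - alpha / 2, df = n - 1),
  UL = (n - 1) * s2 / qchisq(alpha / 2, df = n - 1))
```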
□
Means
This test can be stated in various ways; we could, for example, test the null
hypothesis that the mean of one population is equal to the mean of another
population; these kinds of tests are called two-sided tests. For example,

$$H_0: \mu_1 = \mu_2 \qquad \text{vs.} \qquad H_1: \mu_1 \neq \mu_2$$

We can also state one-sided tests, such as

$$H_0: \mu_1 = \mu_2 \qquad \text{vs.} \qquad H_1: \mu_1 > \mu_2$$

The purpose of this last test is to demonstrate that the mean of the second sample
is smaller than that of the first one.
Example 5.23.
Metal plates thickness (cont.) Hypothesis tests for the mean.
The application of this last kind of test to our data corresponding to Day1
and Day2 yields the following result:
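A sketch with stand-in vectors for the two samples:

```r
set.seed(1)
day1 <- rnorm(12, mean = 0.79, sd = 0.05)    # stand-ins for the actual samples
day2 <- rnorm(12, mean = 0.73, sd = 0.05)
t.test(day1, day2, alternative = "greater")  # H1: mean(Day1) > mean(Day2)
```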
Since the p-value is very small (even lower than 0.01) we reject the null
hypothesis and consider that the Day1 values mean is larger than the Day2
mean. □
Variances
The null hypothesis of this test is usually stated as the ratio of the two variances
to be compared being equal to one, that is:

$$H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1 \qquad \text{vs.} \qquad H_1: \frac{\sigma_1^2}{\sigma_2^2} > 1$$

The purpose of this last (one-sided) test is to demonstrate that the variance of the
second sample is smaller than that of the first one.
Example 5.24.
Metal plates thickness (cont.) Hypothesis tests for the variance.
The application of the two-sided test to Day1 and Day2 data yields the
following result.
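Continuing with the stand-in day1 and day2 vectors from the previous sketch:

```r
var.test(day1, day2)  # two-sided F test for the ratio of the two variances
```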
Since the p-value is large (even larger than 0.1), we cannot reject the null
hypothesis, and thus we accept that the two variances are equal. □
Proportions
The test to decide if two proportions, p₁ and p₂, differ in a hypothesized value,
p₀, is similar to the test for means, that is:
Example 5.25.
Metal plates thickness (cont.) Hypothesis tests for proportions.
In our example, in the second day just 6 out of the 12 plates were thinner
than the nominal value of 0.75, while in the first day only 1 out of the 12 plates
was. The question is, are these two proportions equal? The test of proportions
gives the following result:
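```r
# 1 thinner plate out of 12 on Day1 vs. 6 out of 12 on Day2
prop.test(x = c(1, 6), n = c(12, 12))
```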
Since the p-value is greater than the 0.05 criterion, we cannot reject the null
hypothesis at a 95 % confidence level, thus accepting that the two proportions
are equal. The p-value is close to α = 0.05, though. A larger sample could be
drawn to perform the test again if we suspect that there is a real difference
between the proportions.
□
Normality
In many situations it is necessary to check if the data under analysis follow a
normal distribution. The reason for this is that many tests have been developed
under the hypothesis that the data are normal; therefore, if this requirement is not
fulfilled by the data, the results of the test could be misleading.
Example 5.26.
Metal plates thickness (cont.) Hypothesis tests for normality.
There are several statistical tests to check normality; the best known is the
Shapiro-Wilk test. The hypotheses are as follows:
H 0: The data are normally distributed
H 1: The data are not normally distributed
Let’s use this test to check normality for the 12 data points of Day1:
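```r
set.seed(1)
day1 <- rnorm(12, mean = 0.79, sd = 0.05)  # stand-in for the Day1 measurements
shapiro.test(day1)
```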
Hence, we cannot reject normality for the Day1 data. A graphical tool can also
be used to check normality: the Quantile-Quantile plot (or Q-Q plot). In this plot,
if data come from a normal distribution, the points lie approximately along a
straight line; see Fig. 5.15, which has been obtained with the following expressions:
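```r
qqnorm(day1)  # Q-Q plot of the (stand-in) Day1 data
qqline(day1)  # reference straight line
```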
Fig. 5.15 Quantile-Quantile plot. The points are approximately in a straight line
If we try with Day2 we get a very similar result. But what happens with data
from days 3, 4, and 5? Let us give it a try:
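```r
thickness345 <- 0.75 + rexp(36, rate = 20)  # stand-in skewed days 3-5 data
shapiro.test(thickness345)
```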
These data are clearly non-normal, and we reject the null hypothesis with
high confidence (the p-value is very small). We already anticipated this in
Sect. 5.2.2.3 and transformed the data using the Box-Cox transformation. In this case, non-
normality was clear just looking at the histogram in Fig. 5.11, but sometimes this
is not evident and we can confirm it with this simple test and a Quantile-Quantile
plot like the one in Fig. 5.16.
Fig. 5.16 Quantile-Quantile plot (non normal). The points clearly depart from the straight line
□
2. Chen, Z.: A note on the runs test. Model Assist. Stat. Appl. 5, 73–77 (2010)
3. Hsu, H.: Schaum's Outline of Probability, Random Variables, and Random Processes. Schaum's Outline
Series, 2nd edn. McGraw-Hill, New York (2010)
4. ISO TC69/SC1–Terminology and Symbols: ISO 3534-1:2006 - Statistics – Vocabulary and symbols –
Part 1: General statistical terms and terms used in probability. Published standard (2010). http://www.
iso.org/iso/catalogue_detail.htm?csnumber=40145
6. ISO TC69/SCS–Secretariat: ISO 11453:1996 - Statistical interpretation of data – Tests and confidence
intervals relating to proportions. Published standard (2012). http://www.iso.org/iso/catalogue_detail.
htm?csnumber=19405
7. ISO TC69/SCS–Secretariat: ISO 5479:1997 - Statistical interpretation of data – Tests for departure
from the normal distribution. Published standard (2012). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=22506
8. ISO TC69/SCS–Secretariat: ISO 2602:1980 - Statistical interpretation of test results – Estimation of the
mean – Confidence interval. Published standard (2015). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=7585
10. ISO TC69/SCS–Secretariat: ISO 3301:1975 - Statistical interpretation of data – Comparison of two
means in the case of paired observations. Published standard (2015). http://www.iso.org/iso/catalogue_
detail.htm?csnumber=8540
11. ISO TC69/SCS–Secretariat: ISO 3494:1976 - Statistical interpretation of data – Power of tests relating
to means and variances. Published standard (2015). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=8845
12. Montgomery, D.: Statistical Quality Control, 7th edn. Wiley, New York (2012)
13. Rumsey, D.: Statistics For Dummies. Wiley, New York (2011)
14. Sarkar, D.: Lattice: Multivariate Data Visualization with R. Springer, New York (2008). http://lmdvr.r-
forge.r-project.org. ISBN 978-0-387-75968-5
15. Schilling, M.F.: The surprising predictability of long runs. Math. Mag. 85, 141–149 (2012)
16. Sturges, H.A.: The choice of a class interval. J. Am. Stat. Assoc. 21, 65–66 (1926)
Footnotes
1 The data frame is also available in the SixSigma package.
2 Actually, a version of those quartiles called hinges; see [5] and ?boxplot.stats.
3 A normal distribution with μ = 0 and σ = 1.
© Springer International Publishing Switzerland 2015
Emilio L. Cano, Javier M. Moguerza and Mariano Prieto Corcoba, Quality Control with R, Use R!,
DOI 10.1007/978-3-319-24046-6_6
Abstract
Statistical Quality Control tries to predict the behavior of a given process
through the collection of a subset of data coming from the performance of the
process. This chapter showcases the importance of sampling and describes the
most important techniques used to draw representative samples. An example
using R on how to plot Operating Characteristic (OC) curves and its application
to determine the sample size of groups within a sampling process is shown.
Finally, the ISO Standards related to sampling are summarized.
Example 6.1.
Pool Liquid Density.
Let us suppose we have to determine the average density of the liquid
contained in a large pool. Let us also suppose this liquid contains a certain solid
compound dissolved in the base liquid; as long as the solid material will slowly
tend to fall downwards forced by gravity, density will not be uniform at different
depths in the pool.
If, based on ease of collection, we took samples from the surface of the pool,
the resulting average density so calculated would underestimate the real density
in the entire pool. In this case we can say that these samples do not adequately
represent the population parameter. □
Example 6.2.
Complex Bills.
A transactional process generates complex bills, consisting of many data
fields that have to be filled by the clerks. Thirty-two bills were produced
yesterday, and the supervisor wishes to check eight of them in detail. Which ones
should he choose? Table 6.1 shows the whole population of bills. The data in
Table 6.1 is in the ss.data.bills data frame of the SixSigma package and it is
available when loading the package:
Table 6.1 Complex bills population
Bill no Clerk Errors Bill no Clerk Errors Bill no Clerk Errors
1 Mary 2 9 John 0 17 John 1
2 Mary 2 10 John 1 18 John 0
3 John 0 11 John 2 19 John 0
4 John 1 12 Mary 1 20 John 0
5 John 2 13 Mary 1 21 John 0
6 John 0 14 Mary 1 22 John 0
7 John 0 15 John 0 23 Mary 1
8 John 0 16 John 1 24 Mary 1
Thus, we have a data frame with 32 observations and three variables: nbill
for the bill identification; clerk for the clerk's name; and errors for the count of
errors in the bill.
We have to select eight random numbers between 1 and 32 and choose the
bills with the selected identifiers as sample elements. In other words, we need to
take a random sample of the nbill variable in the ss.data.bills data frame.
To do that with R, we use the sample function. It has three important arguments:
the vector that contains the population to be sampled, the sample size, and
whether the sample is drawn with or without replacement. Replacement means that a
member of the population can be selected more than once. In this case, the
population is formed by the bills' identifiers, the size is equal to eight, and the
sample is without replacement.
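A sketch follows; the seed used in the book is not reproduced in this excerpt, so the seed below is illustrative and the selected bills will differ from those quoted next:

```r
library(SixSigma)
set.seed(1234)  # illustrative seed, for reproducibility only
my.sample <- sample(ss.data.bills$nbill, size = 8, replace = FALSE)
my.sample
```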
Note that in the above code we fix the seed using the set.seed function for
the sake of reproducibility of the example. In this way, anyone who runs the
code will get the same results. This is due to the fact that random numbers
generated with computers are actually pseudo-random because they are based on
an initial seed. In a production environment, the seed is rarely set, except in
specific conditions such as simulation experiments that should be verified by a
third party. Find out more about Random Number Generation (RNG) with R in
the documentation for the RNG topic (type ?RNG in the R console). The ISO 28640
standard deals with random variate generation methods; see [8].
The result is that the supervisor has to select bills No. 27, 23, 29, 3, 2, 16, 11,
and 13. We can save the sample in a new data frame as a subset of the population
as follows:
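```r
bills.sample <- ss.data.bills[ss.data.bills$nbill %in% my.sample, ]
```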
Based on this sample, the average number of defects in the population should
be estimated (see Chapter 5) as 1 defect per bill:
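```r
mean(bills.sample$errors)  # estimated average number of errors per bill
```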
□
Example 6.3.
Complex Bills (Cont.) Stratified sampling.
We can get in R the proportions of each clerk both in the population and in
the sample with the following code:
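Continuing with the objects from the previous sketches:

```r
prop.table(table(ss.data.bills$clerk))  # proportions in the population
prop.table(table(bills.sample$clerk))   # proportions in the random sample
```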
Thus, in order to stratify the sample, 25 % of the sample, namely 2 bills,
will be taken from Mary's production, and 75 % of the sample, namely 6 bills,
will be taken from John's production. In R, we can first extract the bills from
each stratum:
each stratum:
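```r
mary.bills <- ss.data.bills$nbill[ss.data.bills$clerk == "Mary"]
john.bills <- ss.data.bills$nbill[ss.data.bills$clerk == "John"]
```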
and then draw a sample from each stratum of the appropriate size:
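```r
set.seed(1234)  # illustrative seed
strat.sample <- c(sample(mary.bills, size = 2), sample(john.bills, size = 6))
```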
Based on this sample, the average number of defects in the population should
be estimated as a weighted mean:
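```r
sampled <- ss.data.bills[ss.data.bills$nbill %in% strat.sample, ]
stratum.means <- tapply(sampled$errors, sampled$clerk, mean)
# weights are the population proportions of each stratum
weighted.mean(stratum.means[c("Mary", "John")], w = c(0.25, 0.75))
```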
Example 6.4.
Complex Bills (Cont.) Cluster sampling.
Going back to the example of the bills, the clusters could be the different
customers for whom the bills are issued. By measuring the number of defects in the
bills corresponding to one or two customers, a good result could be obtained at a
much lower cost. □
Example 6.5.
Complex Bills (Cont.) Systematic sampling.
In our example of the bills it was decided to take a sample of 8 items, so an
item must be selected every 32/8 = 4 bills. We only have to decide, at random,
which of the first four bills will be selected as the first one in the sample (let this
number be n) and then continue selecting (n + 4), (n + 8), etc. □
1. The null hypothesis is true and is rejected (Error type I);
2. The null hypothesis is false and is not rejected (Error type II).
Fig. 6.1 illustrates these two possibilities for a typical control chart that keeps
track of sample average value, i.e., the X-bar chart, see Chapter 9. In this chart,
the null and alternative hypotheses are, respectively:

$$H_0: \mu = \mu_0 \qquad H_1: \mu \neq \mu_0$$
Fig. 6.1 Error types. Different error types for an x-bar chart
If H₀ were true (left part of the figure), a sample A could fall outside the
control limits, thus leading us to reject H₀. On the other hand, if H₀ were false
(right part of the figure), a sample B could fall within the control limits, thus
leading us to accept H₀.
The maximum probabilities for these situations to occur are denoted as α for
error type I and β for error type II, and they are set up in the design stage of any
hypothesis test. In particular, in Chapter 5 we showed that α is typically
set to 0.01, 0.05, or 0.1. It can be proved that there exists a specific relationship
among α, β, δ (the magnitude of the shift to be detected), and n (the sample size)
for every hypothesis test.
For the case of control charts it is very important to know what the capability
of the chart will be for detecting a certain change in the process, e.g., in the
process mean. This capability of detecting a change of a certain magnitude is
called the “power” of the chart. It can be shown that, for an x̄ chart with
three-sigma limits,

$$\beta = \Phi\left(3 - \delta\sqrt{n}\right) - \Phi\left(-3 - \delta\sqrt{n}\right)$$

where Φ is the standard normal distribution function and δ is the mean shift in
standard deviation units, so that the power is 1 − β.
We can easily plot OC curves for quality control with R. The function
oc.curves in the qcc package plots the operating characteristic curves for a ‘qcc’
object. We explain in detail objects whose class is qcc in Chapter 9. To illustrate
OC curves in this chapter, let us consider the example in Chapter 1.
Example 6.6.
Pellets Density.
In this example, a set of 24 measurements of the density of a given material
is available; see Table 6.2. In order to plot OC curves for an X-bar chart, we
need the data organized in rational subgroups. Let us assume that every four
measurements make up a group. Therefore, there are six samples whose size is
four. With this information, we can create a qcc object as mentioned above. First,
we need to create the data and the qcc.groups object as follows:
Table 6.2 Pellets density data
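The 24 density values of Table 6.2 are not reproduced in this excerpt, so the sketch below simulates comparable stand-in data; replace the simulated vector with the actual measurements:

```r
library(qcc)
set.seed(1)
density <- rnorm(24, mean = 10.8, sd = 0.1)  # hypothetical stand-in values
groups <- qcc.groups(density, sample = rep(1:6, each = 4))  # 6 samples of size 4
```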
Now we can create the qcc object, and plot the OC curves for that specific
control chart (see Fig. 6.2):
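Continuing with the groups object from the previous sketch:

```r
q <- qcc(groups, type = "xbar", plot = FALSE)
beta <- oc.curves(q)  # plots the OC curves and returns the beta values invisibly
```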
Fig. 6.2 OC curves. Each curve represents the type II error probability as a function of the deviation
from the process mean that the control chart is able to detect, for a different sample size
Fig. 6.2 shows the representation of β for different sample sizes. This figure is
very useful as it is the basis for determining the sample size required for
detecting a given process shift with a desired probability. Furthermore, if we
save the result of the oc.curves function in an R object, we can explore the
complete set of data and look for the best sampling strategy. The first rows of the
matrix created are as follows:
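```r
head(beta)  # rows: process shifts (in sd units); columns: sample sizes
```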
and we can check the type II error for each sample size for a given deviation
from the current process mean. For example, if we want to detect a departure of
1.5 standard deviations from the mean:
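```r
# beta for each sample size at the shift closest to 1.5 standard deviations
beta[which.min(abs(as.numeric(rownames(beta)) - 1.5)), ]
```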
With the current sample size (n = 4), the probability of a false negative β, i.e.,
the probability that the chart shows no signal while the process is out of control,
is near 50 %. We need groups of 10 to bring this value down to around 0.04, i.e.,
to achieve a power of at least 95 %. Note that we can choose the sample sizes to
plot through the n argument of the oc.curves function. On the other hand, the
function also provides OC curves for attributes control charts (see Chapter 9). □
References
1. Cochran, W.: Sampling Techniques. Wiley Series in Probability and Mathematical Statistics: Applied
Probability and Statistics. Wiley, New York (1977)
2. ISO TC69/SC1–Terminology and Symbols: ISO 3534-1:2006 - Statistics – Vocabulary and symbols –
Part 1: General statistical terms and terms used in probability. Published standard (2010). http://www.
iso.org/iso/catalogue_detail.htm?csnumber=40145
3. ISO TC69/SC1–Terminology and Symbols: ISO 3534-2:2006 - Statistics – Vocabulary and symbols –
Part 2: Applied statistics. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=40147
4. ISO TC69/SC1–Terminology and Symbols: ISO 3534-4:2014 - Statistics – Vocabulary and symbols –
Part 4: Survey sampling. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?
csnumber=56154
8. ISO TC69/SCS–Secretariat: ISO 28640:2010 - Random variate generation methods. Published standard
(2015). http://www.iso.org/iso/catalogue_detail.htm?csnumber=42333
9. Lohr, S.: Sampling: Design and Analysis. Advanced (Cengage Learning). Cengage Learning, Boston
(2009)
10. Montgomery, D.: Statistical Quality Control, 7th edn. Wiley Global Education, Hoboken (2012)
Footnotes
1 See the concept of sampling distribution in Chapter 5.
Part III
Delimiting and Assessing Quality
This part includes two chapters covering how to compare the quality standards
with the process. In Chapter 7, acceptance sampling techniques are reviewed.
Sampling plans are obtained in order to fulfill requirements pertaining to
producer’s risk and consumer’s risk. The sampled items are assessed against an
attribute (defective, non defective) or a variable (a given continuous quality
characteristic). Chapter 8 starts establishing the quality specifications, i.e., the
voice of the customer (VoC), in order to compare with the voice of the process
(VoP) through Capability Analysis. Then, examples using R illustrate the
methods, and the ISO Standards related to these topics are discussed.
© Springer International Publishing Switzerland 2015
Emilio L. Cano, Javier M. Moguerza and Mariano Prieto Corcoba, Quality Control with R, Use R!,
DOI 10.1007/978-3-319-24046-6_7
Abstract
Undoubtedly, an effective but expensive way of providing conforming items to a
customer is making a complete inspection of all items before shipping. In an
ideal situation, a process designed to assure zero defects would not need
inspection at all. In practice, a compromise between these two extremes is
attained, and acceptance sampling is the quality control technique that allows
reducing the level of inspection according to the process performance. This
chapter shows how to apply acceptance sampling using R and the related ISO
standards.
7.1 Introduction
The basic problem associated with acceptance sampling is as follows:
whenever a company receives a shipment of products (typically raw material)
from a supplier a decision has to be made about the acceptance or rejection of
the product. In order to make such a decision, the company selects a sample out
of the lot, measures a specified quality characteristic and, based on the results of
the inspection decides among:
Accepting the lot (and send it to the production line);
Rejecting the lot (and send it back to the supplier);
Taking another sample before deciding (if results are not conclusive).
Sampling plans can be classified into attribute and variable plans. The attribute case
corresponds to the situation where the inspection simply determines if the item is
“good” or “bad,” that is, whether or not it complies with a certain specification. This
kind of inspection is cheaper, but larger sample sizes are required. The variable
case, on the other hand, corresponds to the situation where the quality
characteristic is measured, thus allowing the inspector to decide based on the
value obtained. This kind of inspection is more expensive, but smaller sample
sizes are required.
In practice, the procedure to be followed is very simple: whenever the
company receives a shipment of N units, a random sample of n units is taken
from the lot, and if d or fewer units happen to be considered defective, then the
lot is accepted. The procedure described corresponds to the case of attribute
inspection; the variable case is somewhat more sophisticated but conceptually
equivalent. For details on sampling methods, see Chapter 6.
As any other hypothesis test, acceptance sampling is not a perfect tool but
just a useful one. There always exists the possibility of accepting a lot containing
too many defective items, as well as rejecting another one with very few
defectives. Fortunately, an upper bound for these two probabilities can be set up
in all cases by adequately selecting the parameters n and d.
This chapter provides the necessary background to understand the
fundamental ideas of acceptance sampling plans. Section 7.1 describes the
philosophy of the acceptance sampling problem. Sections 7.2 and 7.3,
respectively, develop the basic computational methods for attribute and variable
acceptance sampling as well as the way to implement them with R. Finally,
Sect.7.4 provides a selection of the ISO standards available to help users in the
practice of acceptance sampling.
N lot size;
n sample size (taken at random from the lot);
d maximum number of defective units in the sample for acceptance.
The result of the inspection of the sample is the number x of defective units found. Then:
1. Accept the lot if x ≤ d;
2. Reject the lot if x > d.
These sampling plans, the simplest ones, are called single sampling
plans because the lot's fate is decided based on the results of a unique sample.
There exist other kinds of sampling plans where two values of d are established:
the lot is accepted if x ≤ d_lower, rejected if x ≥ d_upper, and a second sample is taken if
d_lower < x < d_upper. These plans are called double sampling plans.
The performance of a determined sampling plan is described by its operating
characteristic (OC) curve. This curve is a graphical representation of the
probability of accepting the lot as a function of the lot’s defective fraction. This
probability can be computed by means of the binomial probability distribution
(see Chapter 5), as long as the lot size is much larger than the sample size
(n∕N < 0.1):

$$P_a = \sum_{x=0}^{d} \binom{n}{x}\, p^x (1-p)^{n-x} \qquad (7.1)$$

where P_a stands for the probability of accepting the lot and p stands for the
lot's fraction defective.
Example 7.1.
Single sampling plan.
If we assume that n = 100 and d = 5, the resulting OC curve should look like
Figure 7.1, which has been produced with the following R code:
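A hedged sketch that reproduces such a curve via Eq. (7.1), not the book's original listing:

```r
d <- 5; n <- 100
p <- seq(0, 0.15, by = 0.001)        # candidate fraction defective values
Pa <- pbinom(d, size = n, prob = p)  # P(accept) = P(X <= d), Eq. (7.1)
plot(p, Pa, type = "l", xlab = "Fraction defective (p)",
     ylab = "Probability of acceptance")
```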
Fig. 7.1 OC curve for a simple sampling plan. The parameters of this OC curve are n = 100, d = 5
□
There is a specific OC curve for every different sampling plan; this means that if
we change either n or d, the curve will also change. But the general behavior
of all the curves is similar: they start at P_a = 1 for p = 0, decrease more or
less rapidly as p increases, and finish at P_a = 0 for p = 1.
Two points in the OC curve are of special interest. The point of view of the
producer is that he requires a sampling plan having a high probability of
acceptance for a lot with a low (agreed) defective fraction. This low defective
fraction is called “acceptable quality level” (AQL), and the probability of such a
good quality lot being rejected is called “producer’s risk” (α). On the other hand,
the point of view of the customer is that he requires a sampling plan having a
high probability of rejection for a lot with a high (agreed) defective fraction.
This high defective fraction is called “lot tolerance percent defective” (LTPD),
and the probability of such a low quality lot being accepted is called
“consumer’s risk” (β). Figure 7.2 illustrates these two probabilities for a typical
OC curve.
Fig. 7.2 OC curve risks illustration. The fraction defective values AQL and LTPD are agreed. A
sampling plan yields then a producer’s risk α and a consumer’s risk β
The solution to this system is not easy, and not even feasible in all cases, so
that in general an acceptable solution is as far as we can go. By an
“acceptable” solution we mean a sampling plan that leads to actual α and β
values close enough to the target values. Traditionally, nomographs (also called
nomograms) have been used to get approximate values of n and d given α and β
with paper and pencil; see, for example, [15]. Computational methods can be
used, though, and R together with a simple iterative method will greatly help in
finding such an acceptable solution. The iterative method we suggest is as follows:
Step 1) Choose your target α and β values.
Step 2) Start with a sampling plan like n = 10 and d = 1. Calculate the α and β
values with R. Normally, such an initial plan will give α_actual close to α
target and β_actual much larger than β_target.
Step 3) There are two possibilities: If α_actual is the risk farther from its target
(α_actual > α_target), then change d to d + 1. Calculate the α and β values
with R and repeat Step 3.
Or
If β_actual is the risk farther from its target (β_actual > β_target), then
change n to n + δ_n. Calculate the α and β values with R and repeat Step 3.
Normally, δ_n should range between 10 and 50, depending on how large
the difference between α_actual and α_target is. Larger values of δ_n
correspond to larger differences.
Step 4) If the solution happens to be feasible, the final values obtained for α_actual
and β_actual will be close to their target values. If not, judgement will
have to be used in order to decide the best values for n and d.
Example 7.2.
Iterative method to select a sampling plan.
An example will illustrate this method. Let us suppose we need a sampling
plan that will provide us with α target = 0. 05 for AQL = 5 % and β target = 0. 10 for
LTPD = 16 %. The following R code runs the method above getting the result
in Table 7.1.
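A sketch of such a loop is below (not the book's original listing); the step size δ_n = 15 and the relative-distance decision criterion are assumptions chosen here so that the trace matches Table 7.1:

```r
AQL <- 0.05; LTPD <- 0.16
alpha.target <- 0.05; beta.target <- 0.10
n <- 10; d <- 1
for (i in 1:10) {
  alpha <- 1 - pbinom(d, n, AQL)  # producer's risk: P(reject | p = AQL)
  beta <- pbinom(d, n, LTPD)      # consumer's risk: P(accept | p = LTPD)
  cat(i, n, d, round(alpha, 2), round(beta, 2), "\n")
  # act on the risk that is relatively farther from its target
  if (alpha / alpha.target > beta / beta.target) {
    d <- d + 1
  } else {
    n <- n + 15  # assumed step size
  }
}
```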
Table 7.1 Iterative sampling plan selection method
Iteration n d α β Decision
1 10 1 0.09 0.51 Increase n
2 25 1 0.36 0.07 Increase d
3 25 2 0.13 0.21 Increase d
4 25 3 0.03 0.42 Increase n
5 40 3 0.14 0.10 Increase d
6 40 4 0.05 0.21 Increase n
7 55 4 0.14 0.05 Increase d
8 55 5 0.06 0.11 Increase d
9 55 6 0.02 0.20 Increase n
10 70 6 0.06 0.05 …
Note that we fix 10 iterations and make a decision depending on which risk
is farther away from the target. A customized function can be programmed
taking into account the specific problem at hand. In the eighth iteration we get a
plan that yields producer’s and customer’s risks very close to the targets.
□
Example 7.3.
OC curve and acceptance sampling plan with the AcceptanceSampling R
package.
The following code gets the OC curve for the sampling plan in Example 7.1,
i.e., with n = 100 and d = 5. The result is in Fig. 7.3.
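```r
library(AcceptanceSampling)
plan <- OC2c(n = 100, c = 5)  # single sampling plan (binomial by default)
plot(plan)
```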
Fig. 7.3 OC curve with the AcceptanceSampling package. Graphical parameters can be added to
customize the plot
Now let us compute the sampling plan proposed by the find.plan function
for the requirements in Example 7.2, i.e., α = 0. 05 for AQL = 5 % and β = 0. 10
for LTPD = 16 %. The arguments of the find.plan function are the producer risk
point (PRP) and consumer risk point (CRP). Each argument should be a vector
of two numbers, the first number being AQL or LTPD, and the second one being
the corresponding probability of acceptance, i.e., 1 −α and β, respectively.
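```r
library(AcceptanceSampling)
# PRP = c(AQL, 1 - alpha); CRP = c(LTPD, beta)
find.plan(PRP = c(0.05, 0.95), CRP = c(0.16, 0.10))
```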
Thus, the proposed plan is to draw samples of size 64 and reject the lot if
there are seven or more defectives. We can create an object of class OC2c for this
plan in order to plot the OC curve (see Fig. 7.4) and assess its performance. The
assess function returns the plan and its probabilities of acceptance, and
compares them with the required ones.
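```r
plan <- OC2c(n = 64, c = 6)
plot(plan)
assess(plan, PRP = c(0.05, 0.95), CRP = c(0.16, 0.10))
```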
Fig. 7.4 OC curve for the found plan. The found plan can be plotted and assessed
Note that the result is slightly different from the one obtained in Example 7.2.
Both are close to the risk targets and probably acceptable approximations for both
parties.
□
Throughout this section, the assumption was made that the binomial probability
distribution could be used for the purpose of calculating the probabilities
associated with the sampling process. As stated before, this assumption
holds as long as the sample size is small in comparison with the lot size (n ≪ N).
This guarantees that the probability of finding a defect in one
sampled item remains approximately constant. But in the general case this
assumption is not true, and the more accurate hypergeometric distribution, which
was described in Chapter 5, should be employed instead.
methods are the same, just changing probability functions to the appropriate
distribution. As for the AcceptanceSampling package, functions OC2c and
find.plan accept a type argument whose possible values are binomial,
hypergeom, poisson, and normal (the latter for sampling plans for variables in the
next section).
$$Z_{USL} = \frac{USL - \bar{x}}{\sigma} \qquad (7.2)$$

where USL is the Upper Specification Limit, x̄ is the sample mean, and σ is
the process standard deviation. This case corresponds to the situation where only
the USL exists, and the standard deviation of the distribution of the individual
values is assumed to be known. If the so-calculated Z_USL value is larger than k (a
value known as the “acceptability constant”), then the lot may be accepted. Fig. 7.5
illustrates this concept.
Fig. 7.5 Variables acceptance sampling illustration. Maximum allowable defective fraction and
acceptability constant
a) For a lot with a low (agreed) fraction defective (AQL), the probability of
rejection (producer’s risk) is equal to α;
b) For a lot with a high (agreed) fraction defective (LTPD), the probability of
acceptance (consumer’s risk) is equal to β.
Conceptually, the situation is illustrated in Figures 7.6 and 7.7. Figure 7.6
corresponds to the situation where the population has a defective fraction p₁
equal to AQL, whereas Figure 7.7 corresponds to the situation where the
population has a defective fraction p₂ equal to LTPD.
Fig. 7.6 Probability of acceptance when p=AQL. Probability of acceptance for a population with
defective fraction equal to AQL
Fig. 7.7 Probability of acceptance when p=LPTD. Probability of acceptance for a population with
defective fraction equal to LPTD
Sample size and the acceptability constant are chosen in such a way that the
probability of acceptance approximately corresponds to (1 − α) for Figure 7.6 and
β for Figure 7.7. Note that the sample size has a clear effect on the variance of the
distribution of the sample mean, as long as

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

where:

$$n = \left(\frac{z_{1-\alpha} + z_{1-\beta}}{z_{1-p_1} - z_{1-p_2}}\right)^2$$

and:

$$k = \frac{z_{1-p_1}\, z_{1-\beta} + z_{1-p_2}\, z_{1-\alpha}}{z_{1-\alpha} + z_{1-\beta}}$$
Example 7.4.
Variable acceptance sampling. Known standard deviation.
A simple example will illustrate how these formulae are implemented in R.
Let us suppose we wish to develop a variable sample plan where:
AQL: p₁ = 1 %
LTPD: p₂ = 5 %
producer's risk: α = 5 %
consumer's risk: β = 10 %
σ assumed known
To find the sampling plan for these requirements, we use again the find.plan
function. In this case, we need to add a new argument to the function call,
namely type, in order to get the sampling plan for continuous variables.
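```r
library(AcceptanceSampling)
find.plan(PRP = c(0.01, 0.95), CRP = c(0.05, 0.10),
          type = "normal", s.type = "known")
```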
1. Take a random sample of n items;
2. Compute the sample mean x̄;
3. Compute the Z_USL value in Eq. (7.2);
4. Compare Z_USL with k;
5. Decide whether to accept (Z_USL > k) or reject (Z_USL ≤ k) the lot.
Example 7.5.
Variable acceptance sampling. Implementation for the metal plates thickness
example.
A numerical example will illustrate the procedure. Let us simulate the
process described in Example 5.1 of Chapter 5. The quality characteristic was
the thickness of a certain steel plate produced in a manufacturing plant. Nominal
thickness of this product was 0.75 in. Let us assume that the standard deviation
is known and equal to 0.05, and the USL is 1 in. A simulated sample of this
process can be obtained with the following code:
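A sketch (the seed below is illustrative only):

```r
library(AcceptanceSampling)
plan <- find.plan(PRP = c(0.01, 0.95), CRP = c(0.05, 0.10),
                  type = "normal", s.type = "known")
set.seed(1)
x <- rnorm(plan$n, mean = 0.75, sd = 0.05)  # simulated sample of plan$n plates
```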
Now we compute the sample mean and the Z USL value as follows:
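```r
xbar <- mean(x)
z.usl <- (1 - xbar) / 0.05  # Eq. (7.2) with USL = 1 and known sigma = 0.05
z.usl
z.usl > plan$k              # accept the lot if TRUE
```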
As Z_USL > k, this lot must be accepted. We suggest that the reader run this
simulation for different values of the mean and standard deviation and see how
lots are rejected as mean shifts or increases in variation occur. The following
convenience function helps automate this decision process1:
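A hypothetical helper in this spirit (not the book's original function):

```r
accept.lot <- function(xbar, k, sigma = 0.05, usl = 1) {
  z.usl <- (usl - xbar) / sigma  # Eq. (7.2)
  ifelse(z.usl > k, "accept", "reject")
}
accept.lot(xbar = 0.92, k = plan$k)  # cf. the decision quoted in the text
```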
Thus, if a new sample whose mean is 0.92 is drawn, then the lot should be
rejected.
□
In the example above, we assumed that the standard deviation of the population
was known. If this is not the case, the sampling plan must be more conservative,
as we have less knowledge about the process. For a single specification limit and
unknown standard deviation, the acceptability constant k is unchanged and the
required sample size grows to:

$$n_s = n_\sigma \left(1 + \frac{k^2}{2}\right)$$

where n_σ is the sample size for the known-σ case.
Example 7.6.
Variable acceptance sampling (cont). Unknown standard deviation.
If the standard deviation in Example 7.4 is unknown, then we find the sampling
plan corresponding to the following conditions:
AQL: p₁ = 1 %
LTPD: p₂ = 5 %
producer's risk: α = 5 %
consumer's risk: β = 10 %
σ assumed unknown
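```r
library(AcceptanceSampling)
find.plan(PRP = c(0.01, 0.95), CRP = c(0.05, 0.10),
          type = "normal", s.type = "unknown")
```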
Notice that we only have to change the s.type argument in the find.plan
function (by default "known"). Now many more items need to be sampled in
order to achieve the objectives. We can simulate a new sample from our
production process, but now we need to estimate σ in Eq. (7.2) through the
sample standard deviation s.
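A sketch of that final step:

```r
plan.s <- find.plan(PRP = c(0.01, 0.95), CRP = c(0.05, 0.10),
                    type = "normal", s.type = "unknown")
set.seed(1)
x <- rnorm(plan.s$n, mean = 0.75, sd = 0.05)
(z.usl <- (1 - mean(x)) / sd(x))  # sigma now estimated by the sample sd
z.usl > plan.s$k                  # accept the lot if TRUE
```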
□
References
1. Ishikawa, K.: Guide to Quality Control. Asian Productivity Organisation, Tokyo (1991)
2. ISO TC69/SC1–Terminology and Symbols: ISO 3534-1:2006 - Statistics – Vocabulary and symbols –
Part 1: General statistical terms and terms used in probability. Published standard (2010). http://www.iso.org/iso/catalogue_detail.htm?csnumber=40145
3. ISO TC69/SC1–Terminology and Symbols: ISO 3534-2:2006 - Statistics – Vocabulary and symbols –
Part 2: Applied statistics. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?csnumber=40147
4. ISO TC69/SC1–Terminology and Symbols: ISO 3534-4:2014 - Statistics – Vocabulary and symbols –
Part 4: Survey sampling. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?csnumber=56154
10. ISO TC69/SC5–Acceptance sampling: ISO 2859-5:2005 - Sampling procedures for inspection by
attributes – Part 5: System of sequential sampling plans indexed by acceptance quality limit (AQL) for
lot-by-lot inspection. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?csnumber=39295
11. ISO TC69/SC5–Acceptance sampling: ISO 3951-5:2006 - Sampling procedures for inspection by
variables – Part 5: Sequential sampling plans indexed by acceptance quality limit (AQL) for inspection
by variables (known standard deviation). Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?csnumber=39294
12. ISO TC69/SC5–Acceptance sampling: ISO 24153:2009 - Random sampling and randomization
procedures. Published standard (2015). http://www.iso.org/iso/catalogue_detail.htm?csnumber=42039
13. Juran, J., Gryna, F.: Juran’s Quality Control Handbook. Industrial Engineering Series. McGraw-Hill,
New York (1988)
14. Kiermeier, A.: Visualizing and assessing acceptance sampling plans: the R package
AcceptanceSampling. J. Stat. Softw. 26(6), 1–20 (2008). http://www.jstatsoft.org/v26/i06/
15. Montgomery, D.: Statistical Quality Control, 7th edn. Wiley Global Education, New York (2012)
16. Schilling, E., Neubauer, D.: Acceptance Sampling in Quality Control. Statistics: A Series of Textbooks
and Monographs, 2nd edn. Taylor & Francis, Boca Raton (2009)
Footnotes
1 It is relatively easy to implement this in an on-line process via an R interface like Shiny (http://www.shiny.rstudio.com), possibly using automatically recorded measurements.
© Springer International Publishing Switzerland 2015
Emilio L. Cano, Javier M. Moguerza and Mariano Prieto Corcoba, Quality Control with R, Use R!,
DOI 10.1007/978-3-319-24046-6_8
Abstract
In order to assess quality, specification limits are to be established. In this
chapter a method to set specification limits taking into account customers’ and
producer’s loss is presented. Furthermore, the specification limits are the voice
of the customer, and quality can be assessed by comparing it with the voice of
the process, that is, its natural limits. Capability indices and the study of long-
and short-term variability do the job.
8.1 Introduction
In Chapter 4 we reviewed some definitions of Quality from several standpoints.
The fulfillment of some specifications seemed to be a generally accepted criterion
to assess Quality. In this chapter we provide some guidelines and resources to
establish such specifications, and how to measure and analyze the capability of a
process to fulfill them. The idea is to fix some specification limits, Upper and/or
Lower (USL and LSL respectively), and compare them with the natural limits of
the process. These natural limits are normally the same used as the Upper and Lower
Control Limits (UCL and LCL, respectively) in the Control Charts explained in
Chapter 9. This is the first caution a Quality analyst must take: making clear the
difference between Specification Limits and Control Limits. Specification Limits
are the voice of the customer1 (VoC). Natural limits (or Control Limits) are the
voice of the process (VoP). Thus, the capability of a process is a way of
assessing how the VoP is taking care of the VoC. Capability analysis quantifies
this fact through graphical tools, capability indices, and other metrics, thereby
measuring the Quality of our process.
$$L(Y) = k\,(Y - Y_0)^2$$

where Y is the quality characteristic and Y₀ is the target value. For a specific
process, k is obtained as:

$$k = \frac{L_c}{\Delta_c^2}$$

where Δ_c is the tolerance for the characteristic Y from the point of view of the
customer, and L_c is the loss for the customer when the characteristic is just out
of specification, i.e., at Y₀ ± Δ_c. Notice that the loss is zero when the process is
exactly at the target, and increases as the value of the characteristic departs from
this target; see Fig. 8.1.
Fig. 8.1 Taguchi’s loss function and specification design. The farther the target, the higher the loss. The
function is determined by the cost for the customer at the specification limits. The manufacturing limits are
then fixed at the point in which the function equals the cost for the producer
The cost of poor quality is usually higher for the customer than for the
producer. The reason is that a product or service delivered to a customer is
usually composed of a series of items or components. Thus, if a component of
the product or service is defective, then the whole thing must be repaired or
replaced, not to mention installation, transport, and collateral costs. However, a
defective item in the house of the producer entails a lesser cost, just that of
reworking or dismissing the individual component. Let L_m be the loss for the
producer. Then we can find a value of the product characteristic, Y₀ ± Δ_m, for
which the loss function equals that loss. Thus, as clearly shown in Fig. 8.1,
the manufacturing specification limit is tighter than the customer specification
limit. The distance between the two depends on the difference between the
customer's loss and the producer's loss, as the manufacturing tolerance can be
computed as:

$$\Delta_m = \Delta_c \sqrt{\frac{L_m}{L_c}}$$
Example 8.1.
Metal plates thickness.
We use the example described in Chapter 5. The quality characteristic under
study is the thickness of a certain steel plate. The nominal thickness of this product
is Y₀ = 0.75 in, with a standard deviation of 0.05 in. The production is organized
in two shifts, seven days a week, and a sample of n = 6 units is drawn from each
shift. The data frame ss.data.thickness2 in the SixSigma package contains the
thickness measurements for two given weeks.
The structure of the data frame and a sample of its first rows are:
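```r
library(SixSigma)
str(ss.data.thickness2)
head(ss.data.thickness2)
```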
A visualization of all the data is in Fig. 8.2 by means of a dot plot with the
lattice package [15] using the following code:
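A sketch of such a dot plot; the grouping column name used below is an assumption about the data frame, not taken from the text:

```r
library(lattice)
library(SixSigma)
dotplot(thickness ~ day, data = ss.data.thickness2)  # 'day' is an assumed column
```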
Fig. 8.2 Thickness example: one week data dot plot. Each point represents the thickness of one metal
plate
If the cost for the producer at the specification limit is L_m = 1 USD, then the
manufacturing tolerance is:
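As a numeric illustration only: the customer loss L_c is not quoted in this excerpt, so the value below is a hypothetical chosen to be consistent with the 0.75 ± 0.032 manufacturing limits used later in Example 8.4:

```r
Lc <- 2.44   # hypothetical customer loss (USD), not from the text
Lm <- 1      # producer loss (USD)
delta.c <- 0.05
(delta.m <- delta.c * sqrt(Lm / Lc))  # approx. 0.032
```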
Fig. 8.3 Reference limits in a Normal distribution. Within the reference limits fall 99.7 % of the data
At this point, we need data in order to listen to the VoP. On the one hand, we
need to estimate the reference limits, so we need a sample of an in-control
process and make inference about the probability distribution. If we can accept
that data come from a normally distributed process, then the reference limits are
just μ ± 3σ. We estimate μ and σ, and we are done, see Chapter 5 for inference
and estimation. On the other hand, samples are to be taken in order to
compare items’ actual measurements with specification limits. Thus, the easiest
way of assessing our quality is to count the number of items in the sample that
are correct. The proportion of correct units is the yield of the sample. The yield
of the process may be calculated taking into account rework (first time yield)
and several linked processes (rolled throughput yield). The proportion of defects
is the complement of the process yield. Defects per unit and defects per million
opportunities (DPMO) are other usual metrics. Either way, if the sample is
representative of the population, then we can estimate the yield of the process
through the sample proportion of correct items.
Example 8.2.
Metal plates thickness (cont.) Process yield.
For the sake of clarity, we use the customer specification limits, i.e., 0.75 ±
0.05. Thus, we can count the items out of specification in the sample as follows:
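A sketch; the column name is an assumption about the data frame:

```r
library(SixSigma)
thick <- ss.data.thickness2$thickness  # assumed column name
sum(thick < 0.70 | thick > 0.80)       # items outside 0.75 +/- 0.05
length(thick)                          # sample size
```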
Therefore, the yield of the sample is 78/84 = 92.86 %, and the proportion of
defects in the sample is 7.14 %. □
Example 8.3.
Metal plates thickness (cont.) Proportion of defects.
The first question would be: is the random variable that characterizes the
quality characteristic of our process normally distributed? The first tool we can
use is a histogram. Fig. 8.4 shows the histogram for all the samples of the week
in our example. Even though it seems normal, we can perform a hypothesis test
and see if we should reject normality:
Fig. 8.4 Histogram of metal plates thickness. The histogram provides clues about normality.
A normality test proves to be helpful when in doubt
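```r
shapiro.test(ss.data.thickness2$thickness)  # assumed column name
```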
And now we can estimate the likely proportion of defects of our process:
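```r
m <- mean(thick); s <- sd(thick)  # 'thick' as defined in the sketch above
# long-term proportion outside [0.70, 0.80] under the normal model
pnorm(0.70, mean = m, sd = s) + pnorm(0.80, mean = m, sd = s, lower.tail = FALSE)
```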
More than 8.6 % of the items will be, in the long term, out of specifications.
□
$$P_p = \frac{USL - LSL}{6\,\sigma_{LT}}$$

where σ_LT represents long-term (LT) variation. Note that this quotient is the
number of times the reference limits (natural variation) fit into the specification
limits. The lower the index, the greater the proportion of non-conforming items.
However, if the process is not centered on the target, we have a different
performance for larger values and for lower values. To overcome this situation,
the upper and lower process performance indices are defined as:

$$P_{pU} = \frac{USL - \mu}{3\,\sigma_{LT}}, \qquad P_{pL} = \frac{\mu - LSL}{3\,\sigma_{LT}}$$

and the minimum process performance index better reflects the performance
of the process:

$$P_{pk} = \min(P_{pU},\, P_{pL})$$
where μ and σ_LT are estimated by the overall sample mean x̄ and sample standard deviation s, respectively.
The above reasoning applies to the capability indices in the next section.
Fig. 8.5 shows the interpretation of the index depending on its value. If an index
is equal to 1, it means that the reference limits and the specification limits are of
equal width, and therefore we will get approximately (α × 100) % defects in the
long term. If the index is greater than 1, then the process is “capable” of
fulfilling the specifications, whilst if the index is lower than 1, then we have
poor quality and the proportion of defects is greater than α.
Fig. 8.5 Specification limits vs. reference limits. The larger the distance from the reference limits to the
specification limits, the better the performance. The plots correspond, left to right, to performance (or
capability) indices equal to, greater than, and lower than 1
Example 8.4.
Metal plates thickness. Process performance.
The sample mean and standard deviation were computed in the previous
example. Moreover, the producer's specification limits are 0.75 ± 0.032, i.e.,
USL = 0.782 and LSL = 0.718. Then we can easily calculate the performance
indices as follows:
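A sketch, using the reconstructed formulas above and the 'thick' vector from the earlier sketches:

```r
library(SixSigma)
thick <- ss.data.thickness2$thickness  # assumed column name
USL <- 0.782; LSL <- 0.718
m <- mean(thick); s.lt <- sd(thick)
Pp  <- (USL - LSL) / (6 * s.lt)
PpU <- (USL - m) / (3 * s.lt)
PpL <- (m - LSL) / (3 * s.lt)
Ppk <- min(PpU, PpL)
c(Pp = Pp, PpU = PpU, PpL = PpL, Ppk = Ppk)
```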
We will see in the following section how to compute performance indices
using the qcc R package.
□
The difference is in the data used and in the fact that the short-term (ST)
variability σ_ST is used for an in-control process instead of the LT variability.
Again, the lower the index, the greater the proportion of non-conforming
items. We also define upper and lower process capability indices as:

$$C_{pU} = \frac{USL - \mu}{3\,\sigma_{ST}}, \qquad C_{pL} = \frac{\mu - LSL}{3\,\sigma_{ST}}$$

and the minimum process capability index better reflects the performance of
the process:

$$C_{pk} = \min(C_{pU},\, C_{pL})$$

The short-term variability can be estimated as

$$\hat{\sigma}_{ST} = \frac{\bar{R}}{d_2} \quad \text{or} \quad \hat{\sigma}_{ST} = \frac{\bar{s}}{c_4}$$

where R̄ and s̄ are the average range and standard deviation of the
subgroups, respectively, and d₂ and c₄ are tabulated constants that only depend
on the sample size n. See Chapter 9 to find out more about control charts. Then,
the appropriate point estimators for the capability indices are obtained by
replacing μ and σ_ST in the expressions above with x̄ and σ̂_ST.
Finally, let us consider a process in which the target T is not centered in the
specification interval, i.e., T ≠ (USL + LSL)∕2. In this situation, we can use the
so-called Taguchi index, defined as:

$$C_{pm} = \frac{USL - LSL}{6\sqrt{\sigma^2 + (\mu - T)^2}}$$
Example 8.5.
Metal plates thickness. Capability indices.
In order to perform a capability analysis using the qcc package we need to
create a qcc object as if we wanted to plot a control chart (see Chapter 9) with
subgroups. Assuming that each shift is a subgroup:
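A sketch; the subgroup construction below assumes 6 consecutive measurements per shift, and the column name is an assumption about the data frame:

```r
library(qcc)
library(SixSigma)
sub <- rep(seq_len(nrow(ss.data.thickness2) / 6), each = 6)  # shift subgroups
groups <- qcc.groups(ss.data.thickness2$thickness, sub)      # assumed column
q <- qcc(groups, type = "xbar", plot = FALSE)
```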
Now we can get the capability indices and a graphical representation of the
process using the process.capability function, see Fig. 8.6:
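```r
process.capability(q, spec.limits = c(0.718, 0.782), target = 0.75)
```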
Fig. 8.6 Capability analysis for the thickness example. A histogram is shown and compared with the
specification limits and target, along with the computed indices
Notice that, in addition to point estimators, a confidence interval is provided,
which is very useful for monitoring the capability over time. It is apparent that this
illustrative process is not capable at all. □
References
1. Cano, E.L., Moguerza, J.M., Redchuk, A.: Six Sigma with R. Statistical Engineering for Process
Improvement, Use R!, vol. 36. Springer, New York (2012). http://www.springer.com/statistics/book/978-1-4614-3651-5
2. ISO TC69/SC1–Terminology and Symbols: ISO 3534-1:2006 - Statistics – Vocabulary and symbols –
Part 1: General statistical terms and terms used in probability. Published standard (2010). http://www.iso.org/iso/catalogue_detail.htm?csnumber=40145
3. ISO TC69/SC1–Terminology and Symbols: ISO 3534-2:2006 - Statistics – Vocabulary and symbols –
Part 2: Applied statistics. Published standard (2014). http://www.iso.org/iso/catalogue_detail.htm?csnumber=40147
11. ISO TC69/SC8–Application of statistical and related methodology for new technology and product
development: ISO 16336:2014 - Applications of statistical and related methods to new technology and
product development process – Robust parameter design (RPD). Published standard (2014). url http://
www.iso.org/iso/catalogue_detail.htm?csnumber=56183
12. Knight, E., Russell, M., Sawalka, D., Yendell, S.: Taguchi quality loss function and specification
tolerance design. Wiki (2007). url https://controls.engin.umich.edu/wiki/index.php/Taguchi_quality_
loss_function_and_specification_tolerance_design. In Michigan chemical process dynamics and
controls open textbook. Accessed 23 June 2015
13. Montgomery, D.: Statistical Quality Control, 7th edn. Wiley Global Education, New York (2012)
14. Pearn, W., Kotz, S.: In: Encyclopedia and Handbook of Process Capability Indices: A Comprehensive
Exposition of Quality Control Measures. Series on Quality, Reliability & Engineering Statistics. World
Scientific, Singapore (2006)
15. Sarkar, D.: Lattice: Multivariate Data Visualization with R. Springer, New York (2008). url http://
lmdvr.r-forge.r-project.org. ISBN 978-0-387-75968-5
16. Taguchi, G., Chowdhury, S., Wu, Y.: Taguchi’s Quality Engineering Handbook. Wiley, Hoboken (2005)
Footnotes
1 A current trend is to use voice of stakeholders (VoS) instead of VoC.
2 See Chapter 4 for details on ISO standards development stages.
3 Production is applicable to products and services for the scope of this chapter.
4 This way of fixing specifications is called functional tolerance in Taguchi’s method terminology.
Part IV
Control Charts
This Part contains two chapters dealing with the monitoring of processes. In
Chapter 9, the most important tool in statistical process control is explained:
control charts. Several types of control charts are shown in order to detect if a
process is out of control. By controlling the stability of the process, we may
anticipate future problems before products/services are received by the customer.
It is also a powerful improvement tool, as the investigation of special causes of variation may result in better procedures that avoid the root cause of the out-of-control situation. Chapter 10 presents a methodology to monitor processes where
a nonlinear function characterizes the quality characteristic. Thus, confidence
bands are computed for the so-called nonlinear profiles, allowing the monitoring
of processes under a similar methodology to the control charts approach.
© Springer International Publishing Switzerland 2015
Emilio L. Cano, Javier M. Moguerza and Mariano Prieto Corcoba, Quality Control with R, Use R!,
DOI 10.1007/978-3-319-24046-6_9
Abstract
Control charts constitute a basic tool in statistical process control. This chapter
develops the fundamentals of the most commonly applied control charts.
Although the general basic ideas of control charts are common, two main
different classes are to be considered: control charts for variables, where
continuous characteristics are monitored; and control charts for attributes, where
discrete variables are monitored. In addition, time-weighted charts, a special type of control chart, are also outlined in the chapter. Finally, to guide users in the practice of control charts, a selection of the available ISO standards is provided.
9.1 Introduction
In Chapter 1 we introduced quality control with an intuitive example based on
the use of a control chart. In fact, control charts are one of the most important
tools in Statistical Process Control (SPC). The underlying idea of control charts
is to build some natural limits for a given summary statistic of a quality
characteristic. Under the presence of common (natural) causes of variation, this
summary statistic is expected to remain within these limits. However, if the
statistic falls out of the natural limits, it is very unlikely that only natural
variability is present, and an investigation should be carried out in order to look
for possible assignable causes of variation, which should be eliminated [18]. In
practice, the natural limits will be estimated according to the sampling
distribution of the statistic to be monitored, and we will refer to the estimated
limits as “control limits.” In fact, every point in a control chart leads to a
hypothesis test: a point out of the control limits may imply an abnormal
performance of the process under study and, as a consequence, the process may
be considered to be out of control. On the contrary, if all points remain within the
control limits, the process may be considered to be in control. See Chapter 5 for
details about statistics, sampling distributions, and hypothesis tests.
This chapter develops the fundamentals of the most commonly applied
control charts, the basic tool used in SPC. The remainder of this section presents the basic ideas of control charts; Sect. 9.2 describes the control charts for variables as well as the special (time-weighted) charts; Sect. 9.3 describes the control charts for attributes. Finally, Sect. 9.5 provides a selection of the ISO standards available to help users in the practice of control charts.
A control chart has the following basic elements:
1. Center line (CL): the central value around which the statistic should vary, for example, the mean of the process;
2. Lower control limit (LCL): the value below which it is very unlikely for the statistic to occur when the process is in control;
3. Upper control limit (UCL): the counterpart of the LCL on the upper side of the CL. The LCL and UCL are symmetric if the probability distribution of the statistic to be monitored is symmetric (e.g., normal).
The control limits are completely different from the specification limits, that
is, the limits beyond which the process will not be accepted by the customer
(see Chap. 8). The control limits are computed as a confidence interval (see
Chap. 5) that comprises a high proportion of the values. Typical control limits are those at three standard deviations from the mean (μ ± 3σ). For a normal
probability distribution these limits include 99.73 % of the data. Thus, if nothing
abnormal is taking place in the process, there will only be a probability of 0.0027
for an individual observation to be outside the control limits. Moreover, a control
chart adds information about the variation of the process. Figure 9.1 shows how
both types of information are related.
Fig. 9.1 Control charts vs. probability distribution. The control chart shows the sequence of the
observations. The variation around the central line provides an idea of the probability distribution
Having said that, we are interested in knowing how many samples will be needed to detect a given change in the process. The probability of detection is the power of the control chart, i.e., 1 − β, where β is obtained from the OC curve mentioned above, see Chapter 6. Thus, to detect a mean shift corresponding to a given β, the average run length (ARL) is:

ARL = 1 / (1 − β).

The ARL indicates the expected number of samples needed to detect the change. Then, we should set the sampling frequency according to the time we are willing to wait before detecting a change. We leave the illustrative example for Sect. 9.2 in order to first introduce other concepts.
Fig. 9.2 Identifying special causes through individual points. When an individual point is out of the control limits, an investigation of the cause should be started, in order to eliminate the root of the problem
Special causes can also generate other persistent problems in a process. They
can be identified through patterns in the chart. Three important patterns that we
can detect are trends, shifts, and seasonality (Fig. 9.3).
Fig. 9.3 Patterns in control charts. We can identify several patterns in a control chart, namely: Recurring
cycles (seasonality) (a), Shifts (b), or Trends (c)
The following signals may help identify the above out-of-control situations.
Basically, the occurrence of any of the following circumstances is highly
unlikely, thus leading us to the conclusion that an assignable cause may be
present in the process.
Points out of control limits;
Seven consecutive points on the same side of the center line;
Six consecutive points either increasing or decreasing;
Fourteen consecutive points alternating up and down;
Any other unusual pattern.
In addition to the control limits, we may look inside them in order to anticipate possible problems. Three zones can be defined within the control limits (Fig. 9.4).
1. Zone C: ranges between the central line and one standard deviation;
2. Zone B: ranges between one and two standard deviations from the central
line;
3. Zone A: ranges between two and three standard deviations from the central
line.
Fig. 9.4 Control chart zones. The distribution of the observations in the three zones can convey out-of-
control situations
After the definition of these three zones, some other unusual patterns may
arise, namely:
Two out of three consecutive points in Zone A or above;
Four out of five consecutive points in Zone B or above;
Fifteen consecutive points in Zone C.
To compute the control limits (CL ± 3σ) we need to estimate σ. In this case, as we are monitoring means, we need the standard deviation of those means in order to compute the limits. From the sampling distribution of the sample mean (see Chapter 5), we know that its standard deviation is σ/√n. By the central limit theorem, we can use this result even for non-normal distributions. Thus, the formulae for the central line and the control limits of the mean chart are as follows:

CL = x̄ (the grand mean),    UCL = x̄ + 3 σ̂/√n,    LCL = x̄ − 3 σ̂/√n,

where σ̂ is estimated, for example, as R̄/d_2 or s̄/c_4.
Example 9.1.
Metal plates thickness. X-bar chart.
In this chapter, we will use the metal plates thickness example from Chapter 8. We recall that we have measurements of metal plates thickness made up of m = 14 samples of size n = 6, corresponding to 7 days, one sample for each of the two shifts in which the production is organized. The data frame ss.data.thickness2 is in the SixSigma package. The points of the X-bar chart are the following sample means:
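A sketch of their computation with tapply (column names day, shift, and thickness are assumed):

library(SixSigma)
with(ss.data.thickness2,
     tapply(thickness, paste(day, shift), mean))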
We could make the computations for the control limits using the formulae above and then plot the control chart using R graphical capabilities. This might be needed at some point, but in general it is more convenient to use contributed packages that do all the work. Even though several packages can plot control charts, we will focus on the qcc package [17], which is widely used in both academia and industry. Before plotting the X-bar chart, we show the main features of the functions in the package and the workflow to use them.
The use of the qcc package is simple. The main function is also named qcc,
and it returns a special object of class qcc. Even though the three entities have the same name, they are not the same; check Chapter 2 to find out more about packages, functions, and objects. The qcc function only needs two or three
arguments to create the object: one for the data, one for the type of chart, and
another for the sample sizes. The latter is only needed for certain types of charts,
as we will see later. The data argument can be one of the following: (1) a vector
of individual values; or (2) a matrix or a data frame containing one sample in
each row. In our case, we do not have the data structured in this way, but all the
measurements are in a column of the data frame, and the groups are identified in
another column of the data frame. We could transform the data using standard R
functions, but the qcc package includes a convenient function that does the job:
qcc.groups. This function requires as arguments two vectors of the same length:
data containing all the observations, and sample containing the sample
identifiers for each data value. Thus, we first transform our original data into a
matrix in the appropriate format for the qcc package:
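A sketch of the transformation (column names assumed as above):

library(qcc)
# one sample per row, identified by the day/shift combination
samples.thick <- qcc.groups(ss.data.thickness2$thickness,
                            paste(ss.data.thickness2$day,
                                  ss.data.thickness2$shift))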
Finally, we can obtain the numerical or graphical results by using the generic
functions summary and plot, see Fig. 9.5:
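For example (the object name is illustrative; qcc plots the chart by default):

thick.xbar <- qcc(samples.thick, type = "xbar")
summary(thick.xbar)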
Fig. 9.5 X-bar chart example (basic options). To get a control chart, just the data and the type of chart are
needed
The summary function returns: the call to the function; the title of the chart; a five-number summary (minimum, Q 1, median, Q 3, and maximum) plus the mean of the statistic monitored; the sample size n; the number of groups m; the
center value of the statistic and its standard deviation; and the control limits. All
this information can be accessed in the object of class qcc, as it is actually a list
with the following elements:
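The element names can be listed, for example, with:

names(thick.xbar)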
This information can be used for further purposes. Notice that some
information is not shown by the summary function, namely: type, nsigmas, and
violations.
The violations item is, in turn, a list of two elements: (1) the indices of the samples beyond the control limits; and (2) the indices of the samples that violate some of the rules listed above, e.g., too many points on the same side of the center line. In our example, only one out-of-control signal is shown: sample number three is beyond the control limits. Special causes of variation should be
investigated.
We can add options to the qcc function to customize our control chart. The
following is a brief description of the available options:
center: Phase I fixed center value;
std.dev: a fixed value for the standard deviation (or a method to estimate
it);
limits: Phase I fixed limits;
data.name: a character string, just for the plots;
labels: labels for the samples;
newdata: if provided, the data in the argument data is used as Phase I data,
i.e., to compute limits. A vertical line is plotted to separate Phase I and
Phase II data;
newsizes: sample sizes for the Phase II new data;
newlabels: labels for the new samples;
nsigmas: The number of standard deviations to compute the control limits
(by default 3);
confidence.level: if provided, control limits are computed as quantiles. For
example, a confidence level of 0.9973 is equivalent to 3 sigmas;
rules: rules for out-of-control signals. Experienced R users can add new
rules adapted to their processes;
plot: whether to plot the control chart or not.
On the other hand, the qcc.options function allows setting global options for the current session; check the function documentation (type ?qcc.options) to find out more. The call to the plot function over a qcc object also allows customizing parts of the chart. The following code sets options and adds arguments to our object of class qcc, see Fig. 9.6.
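A sketch with assumed option and argument values (see ?qcc.options and ?plot.qcc for the available names):

qcc.options(bg.margin = "white")   # global option for the session
plot(thick.xbar,
     title = "X-bar chart for the thickness of metal plates",
     xlab = "Sample (day/shift)")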
Fig. 9.6 X-bar chart example (options added). Options can be added to the qcc function and globally through qcc.options
Let us finish this example by obtaining the OC curves and ARL explained in Sect. 9.1.2. Now that we have an object of class qcc, we can use the oc.curves function to get the values of β for some values of n, including n = 6 as in the example, and plot the OC curve, see Fig. 9.7:
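For instance (oc.curves returns the β values invisibly; the exact indexing of the result is not shown here):

beta <- oc.curves(thick.xbar)
# ARL = 1 / (1 - beta) for each process shift and sample size
ARL <- 1 / (1 - beta)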
Fig. 9.7 OC curve for the X-bar control chart. The values of β can be stored and, afterwards, we can
calculate the ARL
In this section, we have explained the use of the qcc package in detail. The main ideas are the same for all types of control charts. Therefore, in what follows we provide fewer details, explaining just the features that differ among charts. Likewise, for the sake of space, OC curves and ARL are not explained for each specific control chart, see [15] for details.
The range chart monitors the sample ranges R_j. Its center line is the average range R̄, and the control limits are:

CL = R̄,    (9.1)
UCL = R̄ + 3 σ̂_R,    (9.2)
LCL = R̄ − 3 σ̂_R.    (9.3)
In this case, we estimate σ_R as σ̂_R = d_3 R̄ / d_2, see [15] for details. Usually the formulae are simplified as:

UCL = D_4 R̄,    LCL = D_3 R̄,

where:

D_3 = max(0, 1 − 3 d_3 / d_2),    D_4 = 1 + 3 d_3 / d_2.
Example 9.2.
Metal plates thickness (cont.). Range chart.
As we already have the matrix with the samples in the object samples.thick, we create the range chart in Fig. 9.8 by just changing the type of control chart as follows:
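That is:

qcc(samples.thick, type = "R")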
Fig. 9.8 Range chart for metal plates thickness. The range charts monitor variability
For the S chart, the monitored statistic is the sample standard deviation s_j, whose expected value is c_4 σ, and therefore the central line and the control limits are the following:

CL = s̄,    UCL = s̄ + (3 s̄ / c_4) √(1 − c_4²),    LCL = s̄ − (3 s̄ / c_4) √(1 − c_4²).

New constants B_3 and B_4 are then defined to simplify the formulae as follows:

UCL = B_4 s̄,    LCL = B_3 s̄,

where:

B_3 = max(0, 1 − (3 / c_4) √(1 − c_4²)),    B_4 = 1 + (3 / c_4) √(1 − c_4²).
Example 9.3.
Metal plates thickness (cont.) Standard deviation chart.
Figure 9.9 shows the S chart created with the following expression, again
using the matrix of samples:
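That is:

qcc(samples.thick, type = "S")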
Fig. 9.9 S chart for metal plates thickness. The S chart monitors variability through the standard deviation
of the samples
At the beginning of this section we pointed out that control charts for variables should be shown in pairs. Why is this so important? We have seen in the examples above that we might have out-of-control samples in terms of the mean values while the variability remains in control, and the opposite case may also occur. If we only monitor mean values, we will not be aware of such situations.
Example 9.4.
Metal plates thickness (cont.) X-bar & S chart.
To illustrate the importance of monitoring variability, let us simulate a new
shift sample with the following code:
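A sketch of such a simulation; the mean is kept on target but the standard deviation is inflated (the seed and the values are illustrative):

set.seed(1234)
new.shift <- matrix(rnorm(6, mean = 0.75, sd = 0.02), nrow = 1)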
In order to jointly plot the two control charts, we use the graphical parameter
mfrow to divide the graphics device in two rows. We add the new sample as
Phase II data:
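For example:

par(mfrow = c(2, 1))
qcc(samples.thick, type = "xbar", newdata = new.shift)
qcc(samples.thick, type = "S", newdata = new.shift)
par(mfrow = c(1, 1))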
As it is clearly shown in Fig. 9.10, the last point is in control from the point
of view of the mean, but it is out of control from the point of view of variability.
In fact, one of the six new values is even out of the specification limits. □
Fig. 9.10 X-bar and S chart for metal plates thickness. It is important to monitor both mean values and
variability
Example 9.5.
Metal plates thickness (cont.) I & MR control charts.
For illustrative purposes, let us use the first 24 values in the ss.data.thickness2 data frame to plot the individuals control chart. In this case, a vector
with the data is required, hence we do not need any transformation. On the other
hand, to plot the moving range control chart, we can create a matrix with two
artificial samples: one with the first original 23 values, and another one with the
last original 23 values. The following code plots the I & MR control charts in
Fig. 9.11.
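A sketch of both charts (xbar.one is the qcc type for individual values):

thick.ind <- ss.data.thickness2$thickness[1:24]
qcc(thick.ind, type = "xbar.one")              # I chart
# moving ranges: 23 artificial samples of two consecutive values
mr.samples <- cbind(thick.ind[1:23], thick.ind[2:24])
qcc(mr.samples, type = "R")                    # MR chart (n = 2)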
Fig. 9.11 Individual and moving range charts for metal plates thickness. The moving range assumes
sample size n = 2 to compute limits
CUSUM Chart
The CUSUM chart (cumulative sums) controls the process by means of the
difference of accumulated sums with respect to a target value (usually the mean).
This chart may be used with either grouped or individual values. For each sample, two statistics are monitored, the so-called cusum coefficients: one for the positive deviations, C_j^+, and another for the negative deviations, C_j^−, which are calculated as follows:

C_j^+ = max(0, x̄_j − (μ_0 + K) + C_{j−1}^+),
C_j^− = max(0, (μ_0 − K) − x̄_j + C_{j−1}^−),

where μ_0 is the target value and K is the reference (slack) value, usually set to half the shift to be detected.
Example 9.6.
Metal plates thickness (cont.) CUSUM chart.
The following expression plots the CUSUM chart in Fig. 9.12 using the
default settings, check the documentation of the function to learn more about
how to change them:
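That is:

cusum(samples.thick)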
Fig. 9.12 CUSUM chart for metal plates thickness. Process shifts are detected sooner than with Shewhart
charts
EWMA Chart
EWMA is the acronym for Exponentially Weighted Moving Average. This chart
permits the identification of small deviations. It is said that the EWMA chart has
memory, as each monitored value takes into account the information from
previously monitored values. It is especially appropriate when data deviate significantly from the normal distribution. The statistic to be monitored is a weighted moving average z_j computed as:

z_j = λ x̄_j + (1 − λ) z_{j−1},

where 0 < λ ≤ 1 is a constant weight and z_0 is set to the target value or to the overall mean.
Example 9.7.
Metal plates thickness (cont.) EWMA chart.
The following expression plots the EWMA chart in Fig. 9.13 using the
default settings, check the documentation of the function to learn more about
how to change them:
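That is:

ewma(samples.thick)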
Fig. 9.13 EWMA chart for metal plates thickness. The statistic plotted is in the same scale as the variable
Note that in addition to the points and lines of the statistic z j , the real value
of each sample or observation is plotted as a ‘+’ symbol. □
The p Chart
This chart is used to control proportions within groups of a certain size, such as lots, orders in a day, etc. The statistic to be monitored is the sample proportion p_j, whose standard deviation is:

σ_{p_j} = √( p̄ (1 − p̄) / n_j ),

where p̄ is the average proportion over all the samples, so that the control limits are p̄ ± 3 σ_{p_j}.
Note that if we have different sample sizes the control limits are not constant.
The n p Chart
The np control chart is used to monitor the number of items D_j with the characteristic to be controlled, rather than the proportion. Nevertheless, the type of data is the same as in the p chart, i.e., groups of a given size n, which must all be of the same size for this chart. The center line is the average number of items with the characteristic per sample, i.e., n p̄. In this case the control limits are calculated as:

UCL = n p̄ + 3 √( n p̄ (1 − p̄) ),    LCL = n p̄ − 3 √( n p̄ (1 − p̄) ).
Example 9.8.
Metal plates thickness (cont.) p and n p charts.
Suppose we want to monitor the proportion and the number of items whose
thickness is larger than the midpoint between the nominal value and the
specification limit, i.e., 0.775 in. We first need a vector with the number of such items in each sample, which can be calculated with the following expression:
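A sketch of the computation (column names assumed as in the previous examples); qcc needs the counts of nonconforming items together with the sample sizes:

thick.counts <- with(ss.data.thickness2,
                     tapply(thickness > 0.775, paste(day, shift), sum))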
The p control chart in Fig. 9.14 can now be obtained with the following call
to the qcc function:
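Following the sketch above, with constant sample size n = 6:

qcc(thick.counts, sizes = 6, type = "p")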
Fig. 9.14 p chart for metal plates thickness. The statistic monitored is the proportion of items in each
sample
The np control chart in Fig. 9.15 shows the same pattern. The choice between one and the other is most of the time a matter of which is easier for the team to interpret: proportions or counts. Moreover, the np chart can only be used when all the samples have the same size n. The corresponding call is sketched below.
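A sketch, reusing the counts computed above:

qcc(thick.counts, sizes = 6, type = "np")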
Fig. 9.15 n p chart for metal plates thickness. The statistic monitored is the number of items within a
category in each sample
□
The c Chart
The c control chart is used to control the total number of events for a given process on an events-per-interval basis, that is, a process that follows a Poisson distribution in which there could theoretically be an infinite number of possible events. In this kind of process we do not have a sample size from which a proportion could be calculated, as in the p and np charts. The most common application of this chart is to control the total number of nonconformities measured in a series of m samples of the same extension, either temporal or spatial.
Examples could be the number of unattended calls per hour, the number of nonconformities per day, etc. It can also be used to monitor the number of events in physical samples, e.g., when samples from a continuous material are taken (fabric, surfaces (ft²), liquid (l), etc.) and the number of events per sample is measured. The statistic of each sample is the count of events c_j. The center line is the average number of events per sample, c̄. As in a Poisson distribution the variance is the parameter λ, an estimator of the standard deviation is √c̄, and therefore the control limits are:

UCL = c̄ + 3 √c̄,    LCL = c̄ − 3 √c̄.
The u Chart
When, in the previous situation, each sample j contains a number n_j of items, possibly different across samples, and we count the total number of events x_j over all the elements within the sample, it may be interesting to monitor the average number of defects per item. In such a situation, the u chart should be used. The statistic to be monitored is u_j, the number of defects per unit in sample j, i.e.:

u_j = x_j / n_j,

and the center line is the average number of defects per unit over all the samples:

ū = Σ_j x_j / Σ_j n_j.

The control limits are ū ± 3 √( ū / n_j ); note that they are not constant when the sample sizes differ.
Example 9.9.
Metal plates thickness (cont.) c and u charts.
In addition to the thickness measurement, some metal plates (1, 2, or 3 per shift) are
inspected to find flaws in the surface. The inspector counts the number of flaws
in each inspected metal plate, and this information is in the column flaws of the
ss.data.thickness2 data frame.
To plot the c chart for all the metal plates in Fig. 9.16 we need the vector
with just the inspected items, i.e., removing the NA values:
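A sketch, assuming the flaw counts are in the flaws column:

flaws.insp <- ss.data.thickness2$flaws[!is.na(ss.data.thickness2$flaws)]
qcc(flaws.insp, type = "c")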
Fig. 9.16 c chart for metal plates thickness. The statistic monitored is the count of flaws in each individual
metal plate
Finally, if we want to monitor the average flaws per metal plate in each shift, then we need the u chart in Fig. 9.17, which results from the following code:
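A sketch of this computation: the total flaws and the number of inspected plates per shift are aggregated first (column names assumed):

flaws.shift <- with(ss.data.thickness2,
                    tapply(flaws, paste(day, shift), sum, na.rm = TRUE))
n.inspected <- with(ss.data.thickness2,
                    tapply(!is.na(flaws), paste(day, shift), sum))
qcc(flaws.shift, sizes = n.inspected, type = "u")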
Fig. 9.17 u chart for metal plates thickness. Note that, as we have a different number of inspected items in
each shift, the limits are not constant
□
We have focused on the use of the qcc R package [17]. Other packages can be used to plot control charts, e.g., IQCC [2], qcr [4], spc [14], and qicharts [1].
Packages also useful for quality control charting in R are, for example, the
spcadjust package [5], with functions for the calibration of control charts; and
the edcc package [20], specific for economic design of control charts.
Nevertheless, with the formulae provided for the control lines (center, upper
limit, and lower limit) you are prepared to plot your own control charts with the
R graphical functions in the packages graphics, lattice [16] or ggplot2 [19],
just plotting points and lines and adding control lines (CL, UCL, LCL). In this
way, you can customize any feature of your control chart. Furthermore, for the
sake of completeness, Appendix A contains the constants used in the formulae.
9.5 ISO Standards for Control Charts
The following are the most relevant standards on the topic of control charts:
ISO 7870-1:2014 Control charts—Part 1: General guidelines [12] . This
Standard presents the key elements and philosophy of the control chart
approach, and identifies a wide variety of control charts, such as the Shewhart control chart and specialized control charts.
ISO 7870-2:2013 Control charts—Part 2: Shewhart control charts [11]
. This Standard establishes a guide to the use and understanding of the
Shewhart control chart approach to the methods for statistical control of a
process. It is limited to the treatment of SPC methods using only
Shewhart’s charts. Some supplementary material that is consistent with the
Shewhart approach, such as the use of warning limits, analysis of trend
patterns and process capability is briefly introduced.
ISO 7870-3:2012 Control charts—Part 3: Acceptance control charts
[10] . This Standard provides guidance on the uses of acceptance control
charts and establishes general procedures for determining sample sizes,
action limits, and decision criteria. This chart is typically used when the process variable under study is normally distributed; however, it can also be applied to non-normal distributions. Examples are included to illustrate a
variety of circumstances in which this technique has advantages and to
provide details of the determination of the sample size, the action limits,
and the decision criteria.
ISO 7870-4:2011 Control charts—Part 4: Cumulative sum charts [8] .
This Standard provides statistical procedures for setting up cumulative sum
(CUSUM) schemes for process and quality control using variables
(measured) and attribute data.
ISO 7870-5:2014 Control charts—Part 5: Specialized control charts
[13]. This Standard establishes a guide to the use of specialized control charts in situations where the commonly used Shewhart control chart approach may be either not applicable or less efficient in detecting unnatural patterns of variation in the process.
ISO 11462-1:2001 Guidelines for implementation of statistical process
control (SPC)—Part 1: Elements of SPC [9]. This Standard provides guidelines for the implementation of an SPC system, and a variety of elements to guide an organization in planning, developing, executing, and/or evaluating an SPC system.
In addition, parts 1 and 2 of ISO 3534 (Vocabulary and symbols about
Statistics, Probability, and Applied Statistics) [6, 7] are also useful for the scope
of Control Charts.
References
1. Anhoej, J.: qicharts: quality improvement charts. http://CRAN.R-project.org/package=qicharts (2015).
R package version 0.2.0
2. Barbosa, E.P., Barros, F.M.M., de Jesus, E.G., Recchia, D.R.: IQCC: improved quality control charts.
http://CRAN.R-project.org/package=IQCC (2014). R package version 0.6
3. Cano, E.L., Moguerza, J.M., Redchuk, A.: Six Sigma with R. Statistical Engineering for Process
Improvement. Use R!, vol. 36. Springer, New York (2012). http://www.springer.com/statistics/book/
978-1-4614-3651-5
4. Flores, M., Naya, S., Fernandez, R.: qcr: quality control and reliability. http://CRAN.R-project.org/
package=qcr (2014). R package version 0.1-18
5. Gandy, A., Kvaloy, J.T.: Guaranteed conditional performance of control charts via bootstrap methods.
Scand. J. Stat. 40, 647–668 (2013). doi:10.1002/sjos.12006
[MATH][MathSciNet][CrossRef]
6. ISO TC69/SC1–Terminology and Symbols: ISO 3534-1:2006 - Statistics – Vocabulary and symbols –
Part 1: General statistical terms and terms used in probability. Published standard. http://www.iso.org/
iso/catalogue_detail.htm?csnumber=40145 (2010)
7. ISO TC69/SC1–Terminology and Symbols: ISO 3534-2:2006 - Statistics – Vocabulary and symbols –
Part 2: Applied statistics. Published standard. http://www.iso.org/iso/catalogue_detail.htm?csnumber=
40147 (2014)
10. ISO TC69/SC4–Applications of statistical methods in process management: ISO 7870-3:2012 - Control
charts – Part 3: Acceptance control charts. Published standard. http://www.iso.org/iso/catalogue_detail.
htm?csnumber=40175 (2012)
11. ISO TC69/SC4–Applications of statistical methods in process management: ISO 7870-2:2013 - Control
charts – Part 2: Shewhart control charts. Published standard. http://www.iso.org/iso/catalogue_detail.
htm?csnumber=40174 (2013)
12. ISO TC69/SC4–Applications of statistical methods in process management: ISO 7870-1:2014 - Control
charts – Part 1: General guidelines. Published standard. http://www.iso.org/iso/catalogue_detail.htm?
csnumber=62649 (2014)
13. ISO TC69/SC4–Applications of statistical methods in process management: ISO 7870-5:2014 - Control
charts – Part 5: Specialized control charts. Published standard. http://www.iso.org/iso/catalogue_detail.
htm?csnumber=40177 (2014)
14. Knoth, S.: spc: statistical process control—collection of some useful functions. http://CRAN.R-project.
org/package=spc (2015). R package version 0.5.1
15. Montgomery, D.: Statistical Quality Control, 7th edn. Wiley Global Education, New York (2012)
16. Sarkar, D.: Lattice: Multivariate Data Visualization with R. Springer, New York (2008). http://lmdvr.r-
forge.r-project.org. ISBN 978-0-387-75968-5
17. Scrucca, L.: qcc: an R package for quality control charting and statistical process control. R News 4/1,
11–17 (2004). http://CRAN.R-project.org/doc/Rnews/
18. Shewhart, W.: Economic Control of Quality in Manufactured Products. Van Nostrand, New York (1931)
19. Wickham, H.: ggplot2: Elegant Graphics for Data Analysis. Use R! Springer, New York (2009)
20. Zhu, W., Park, C.: edcc: An R package for the economic design of the control chart. J. Stat. Softw.
52(9), 1–24 (2013). http://www.jstatsoft.org/v52/i09/
© Springer International Publishing Switzerland 2015
Emilio L. Cano, Javier M. Moguerza and Mariano Prieto Corcoba, Quality Control with R, Use R!,
DOI 10.1007/978-3-319-24046-6_10
Abstract
In many situations, processes are often represented by a function that involves a
response variable and a number of predictive variables. In this chapter, we show
how to treat data whose relation between the predictive and response variables is
nonlinear and, as a consequence, cannot be adequately represented by a linear
model. This kind of data is known as nonlinear profiles. Our aim is to show how to build nonlinear control limits and a baseline prototype using a set of observed in-control profiles. Using R, we show how to handle situations in which nonlinear profiles arise and how to plot easy-to-use nonlinear control charts.
10.1 Introduction
In Chapter 9 we presented control charts, considered to be the basic tools of
statistical process control (SPC). In particular, control charts are useful to test the
stability of a process when measuring one or more response variables
sequentially. However, in many situations, processes are often represented by a
function (called profile) that involves a response variable and a number of
predictive variables. The simplest profiles are those coming from a linear
relation between the response and the predictive variables. Nevertheless, in
many cases, a nonlinear relation exists among the variables under study and,
therefore, more complex models are demanded by the industry. In this chapter,
we show how to treat data whose relation between the predictive and response
variables is nonlinear and, as a consequence, cannot be adequately represented by a linear model. This kind of data is known as nonlinear profiles.
One of the first approaches to nonlinear profiles in the industry can be
consulted in the document entitled “Advanced Quality System Tools” published
by Boeing in 1998 [1]. On page 91 of that publication, a location variability
chart of the flange-angle deviation from target at each location within a given
spar is shown, with the peculiarity that, in addition to the classical specification
limits, location averages and natural tolerance limits are included for each
location. These so-called natural tolerance limits are far from linear,
demonstrating that a nonlinear relation between the location and the flange-angle
deviation exists. A brief review on nonlinear profiles research can be consulted
in [12].
Example 10.1.
Engineered woodboards.
To illustrate this chapter, we will use a data set in the SixSigma package [2].
It is a variation of the example introduced in [11]. Our data set is made up of
observations over 50 items, named P1, …, P50. Each item corresponds to an
engineered woodboard. For each woodboard, 500 observations (measurements)
of its density are taken at locations 0.001 in apart within the board. The
observations have been sequentially obtained. Every five observations correspond to a sample from the same 4-h shift; that is, the first five observations correspond to the first 4-h shift, and so on. The data are available in the SixSigma package as data objects ss.data.wbx for the locations, and ss.data.wby for the density measurements. Note that ss.data.wby is a matrix in which each column contains the measurements corresponding to a woodboard. The column names make it easy to identify the woodboard and the group it belongs to. Fig. 10.1 graphically
represents board P1, using standard R graphics functions. The 500 locations
where the measurements were taken are in the x-axis, while the y-axis is for the
density measurements at each location. We will refer to each set of 500
measurements taken over a given board as a profile.
Fig. 10.1 Single woodboard example. Plot of the first board in the sample (named P1). The density
measurements (y axis) are plotted against their location within the woodboard (x axis)
□
Example 10.2.
Engineered woodboards (cont.)
In the case at hand, n = 500, where each x_i corresponds to one of the 500 locations taken 0.001 in apart within the board, and each y_i to the measurement taken at that location. The first 20 locations and measurements corresponding to board P1 are:
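For example:

library(SixSigma)
head(cbind(x = ss.data.wbx, y = ss.data.wby[, "P1"]), 20)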
The smoothing is performed with the smoothProfiles function; notice that the procedure works well, as the smoothed version closely fits the original profile (see the sketch below).
Function smoothProfiles accepts as first input argument a matrix containing
a set of profiles. In this case, as we are working only with profile P1, the input
matrix is the vector containing the measurements corresponding to profile P1
(ss.data.wby[, "P1"]). The second argument is the vector corresponding to the
locations where the measurements are taken (ss.data.wbx). The output of the
function is the smoothed version of profile P1, which is assigned to object
P1.smooth. The function allows the tuning of the SVM model inside. To this aim,
additional arguments may be provided to change the default settings (check the
function documentation for details by typing ?smoothProfiles). Note that this
function makes use of the e1071 library, and therefore it must be installed in R in
advance.
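A sketch of the call described above (the argument names are assumptions; check ?smoothProfiles):

P1.smooth <- smoothProfiles(profiles = ss.data.wby[, "P1"],
                            x = ss.data.wbx)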
Function plotProfiles plots a set of profiles given in a matrix. In this case, we have built a matrix with the smoothed version of profile P1 in the first column and profile P1 itself in the second column using the cbind function. We may plot the whole set of profiles with the following code (see Fig. 10.3):
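A sketch of both plots (argument names are assumptions; check ?plotProfiles):

# smoothed vs. original profile P1
plotProfiles(cbind(P1.smooth, ss.data.wby[, "P1"]), x = ss.data.wbx)
# whole set of 50 profiles (Fig. 10.3)
plotProfiles(ss.data.wby, x = ss.data.wbx)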
Fig. 10.3 Woodboard example: whole set of profiles. Plot of the 50 profiles. The density measurements (y
axis) are plotted against their location within the woodboard (x axis)
Fig. 10.4 Woodboard example: whole set of smoothed profiles. Plot of the 50 smoothed profiles. The
smoothed density measurements (y axis) are plotted against their location within the woodboard (x axis)
□
10.3.1 Phase I
Example 10.3.
Engineered woodboards (cont.) Phase I analysis.
We will divide the set of 50 profiles into two subsets. The first subset is made up of the profiles obtained within the first seven 4-h shifts, that is, profiles P1, …, P35. The second subset is made up of the remaining profiles, i.e., profiles P36, …, P50. In Phase I, we will use the profiles in the first group to search for a baseline subset of in-control profiles and, with this baseline subset, model the in-control process performance. We will refer to this first group as the Phase I group.
First, we create a matrix with the profiles in the Phase I group (columns 1–35
in matrix ss.data.wby):
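For example (the object name is illustrative):

wby.phase1 <- ss.data.wby[, 1:35]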
Next, we calculate and plot confidence bands from the profiles in the Phase I
group and an estimation of function f(x) in (10.1), see Fig. 10.5. In the following,
we will refer to f(x) as the prototype profile.
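A sketch using the climProfiles and plotProfiles functions of the SixSigma package (argument names are assumptions; check the documentation):

wb.limits <- climProfiles(profiles = wby.phase1, x = ss.data.wbx)
plotProfiles(wby.phase1, x = ss.data.wbx, cLimits = wb.limits)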
Fig. 10.5 Woodboard example: Phase I. Plot of the 35 Phase I group profiles, confidence bands, and
estimation of f(x)
The function outProfiles returns a list of three vectors. The first vector
(labOut) contains the labels of the out-of-control profiles. The second vector
(idOut) contains the indexes of the out-of-control profiles. This vector is given
for completeness as in some cases the index may be preferable to the label. The
third vector contains the proportion of times that each profile remains out of the
confidence bands. By default, the function considers a profile to be out of
control if the proportion of times that this profile remains out of the confidence
bands is over 0.5. The user may change this default value by including the tol
argument with the desired value. For instance, the user may change the value of
the tol parameter to 0.80. In this case, only one out-of-control profile arises,
P28. After some investigation, the user may consider that profile P28 is out of
control, and should not belong to the in-control baseline subset of profiles. As a
consequence, it is removed and the confidence bands are calculated again
without this profile, following in this way a classical Phase I SPC strategy.
With the following code the confidence bands are calculated again using 34
profiles, that is, without profile P28 in the Phase I group of profiles:
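A sketch (object names are illustrative):

wby.phase1b <- wby.phase1[, colnames(wby.phase1) != "P28"]
wb.limits2 <- climProfiles(profiles = wby.phase1b, x = ss.data.wbx)
plotProfiles(wby.phase1b, x = ss.data.wbx, cLimits = wb.limits2)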
Fig. 10.6 shows the 34 profiles (thin black lines), the confidence bands
calculated from these 34 profiles (thick blue lines) and the estimation of f(x)
(thick green line). In the plot it is apparent that some profiles are close to the
confidence bands. The following code looks for the new list of out-of-control
profiles using the 0.8 tolerance previously defined by the user.
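For instance:

outProfiles(profiles = wby.phase1b, x = ss.data.wbx,
            cLimits = wb.limits2, tol = 0.8)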
Fig. 10.6 Woodboard example: Phase I. Plot of the Phase I baseline in-control profiles, confidence bands,
and estimation of f(x)
In this case, the value NULL indicates that the vector is empty and, therefore, no out-of-control profiles were detected. It is clear that the process now seems to be in control, and these 34 profiles constitute the Phase I in-control baseline group of profiles. □
10.3.2 Phase II
Once the confidence bands and the baseline profiles have been determined, we
will use these confidence bands to check if some not previously analyzed
profiles (typically new ones) are out of control.
Example 10.4.
Engineered woodboards (cont.) Phase II analysis.
In Phase II, we will check whether the profiles in the last three 4-h shifts,
profiles P36, …, P50, are out of control. Notice that these profiles were not used
to estimate the confidence bands in Phase I. Next, we create a matrix with these
profiles:
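For example:

wby.phase2 <- ss.data.wby[, 36:50]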
In order to check if the new profiles are out of control, we just use function
outProfiles over the new set of profiles and the control limits calculated in
Phase I.
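Along these lines (object names follow the previous sketches):

wb.out2 <- outProfiles(profiles = wby.phase2, x = ss.data.wbx,
                       cLimits = wb.limits2, tol = 0.8)
wb.out2$labOut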
Profiles P46, P47, and P48 are considered to be out of control. The proportions of times that these profiles remain out of the confidence bands are 1, 0.95, and 1, respectively. As a consequence, the user may conclude that during
the last 4-h shift the process was not in-control and the causes should be
investigated.
We can plot the Phase II profiles, the confidence bands, the estimation of
f(x), and the out-of-control profiles. To do this, we use again the plotProfiles
function adding a new argument outControl with the labels or indexes of the
out-of-control profiles, see Fig. 10.7.
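A sketch (the outControl argument is named in the text; the remaining names follow the previous sketches):

plotProfiles(wby.phase2, x = ss.data.wbx, cLimits = wb.limits2,
             outControl = wb.out2$idOut)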
Fig. 10.7 Woodboard example: Phase II. Plot of the Phase II profiles, confidence bands, estimation of f(x),
and out-of-control profiles
Finally, for the sake of clarity, we can plot a graph only containing the out-
of-control profiles, the confidence bands, and the prototype, see Fig. 10.8. The
code to plot this chart is:
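A sketch, using the onlyout argument explained below:

plotProfiles(wby.phase2, x = ss.data.wbx, cLimits = wb.limits2,
             outControl = wb.out2$idOut, onlyout = TRUE)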
Fig. 10.8 Woodboard example: Phase II out of control. Plot of the Phase II out-of-control profiles,
confidence bands, estimation of f(x)
In order to plot only the out-of-control profiles, we must change the default
value of the argument onlyout to TRUE.
Example 10.5.
Engineered woodboards (cont.) Profiles control chart.
Function outProfiles provides, in the pOut element of the output list, a vector with the proportion of times that each profile is out of the confidence bands. Notice that pOut is independent of the tol parameter. To plot this chart (see Fig. 10.9) we use the function plotControlProfiles as follows:
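A sketch (argument names are assumptions; check ?plotControlProfiles):

plotControlProfiles(pOut = wb.out2$pOut, tol = 0.8)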
Fig. 10.9 Woodboard example: Profiles control chart. Sequential plot of the profiles out-of-control rate for
a given tolerance
□
References
1. Boeing Commercial Airplane Group, M.D.P.Q.A.D.: Advanced Quality System Tools, AQS D1-9000-
1. Toolbox (1998). url http://www.boeingsuppliers.com/supplier/d1-9000-1.pdf
2. Cano, E.L., Moguerza, J.M., Redchuk, A.: Six sigma with R. Statistical Engineering for Process
Improvement, Use R!, vol. 36. Springer, New York (2012). url http://www.springer.com/statistics/book/
978-1-4614-3651-5
3. Cano, J., Moguerza, J.M., Psarakis, S., Yannacopoulos, A.N.: Using statistical shape theory for the
monitoring of nonlinear profiles. Appl. Stoch. Model. Bus. Ind. 31(2), 160–177 (2015). doi:10.1002/
asmb.2059. url http://dx.doi.org/10.1002/asmb.2059
6. ISO TC69/SC6–Measurement methods and results: ISO 11843-5:2008 - Capability of detection – Part
5: Methodology in the linear and non-linear calibration cases. Published standard (2012). url http://
www.iso.org/iso/catalogue_detail.htm?csnumber=42000
7. ISO/TC 108, Mechanical vibration, shock and condition monitoring, Subcommittee SC 5, Condition
monitoring and diagnostics of machines: Condition monitoring and diagnostics of machines – Data
interpretation and diagnostics techniques – Part 1: General guidelines. Published standard (2012).
url http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=39836
8. ISO/TC 184, Automation systems and integration, Subcommittee SC 5, Interoperability, integration and
architectures of automation systems and applications: Automation systems and integration – Integration
of advanced process control and optimization capabilities for manufacturing systems – Part 1:
Framework and functional model. Published standard (2015). url http://www.iso.org/iso/catalogue_
detail.htm?csnumber=61131
9. Moguerza, J., Muñoz, A., Psarakis, S.: Monitoring nonlinear profiles using support vector machines. In:
Rueda, L., Mery, D., Kittler, J. (eds.) Progress in Pattern Recognition, Image Analysis and
Applications. Lecture Notes in Computer Science, vol. 4756, pp. 574–583. Springer, Heidelberg
(2007). doi:10.1007/978-3-540-76725-1_60. url http://dx.doi.org/10.1007/978-3-540-76725-1_60
10. Moguerza, J.M., Muñoz, A.: Support vector machines with applications. Stat. Sci. 21(3), 322–336
(2006)
[MATH][CrossRef]
11. Walker, E., Wright, W.: Comparing curves with additive models. J. Qual. Technol. 34(1), 118–129
(2002)
12. Woodall, W.H.: Current research in profile monitoring. Produção 17(3), 420–425 (2007). Invited paper
Appendix A: Shewhart Constants for Control Charts
The main Shewhart constants d_2, d_3, and c_4 can be obtained for any n using R as shown in the following examples:
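For instance, for samples of size n = 6, using the functions of the SixSigma package:

library(SixSigma)
ss.cc.getd2(n = 6)   # d2
ss.cc.getd3(n = 6)   # d3
ss.cc.getc4(n = 6)   # c4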
The rest of the Shewhart constants that can be found in any textbook are computed from those three basic constants. A full table of constants can also be generated using R. Table A.1 shows the constants used in this book. There are other constants, not covered in this book, which could also be computed using the appropriate formulae. A data frame with the constants in Table A.1 can be obtained with the following code:
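A sketch that rebuilds the table from the three basic constants, using the standard formulae (e.g., A2 = 3/(d2 √n)):

n  <- 2:25
d2 <- sapply(n, ss.cc.getd2)
d3 <- sapply(n, ss.cc.getd3)
c4 <- sapply(n, ss.cc.getc4)
data.frame(n, d2, d3, c4,
           A2 = 3 / (d2 * sqrt(n)),
           D3 = pmax(0, 1 - 3 * d3 / d2),
           D4 = 1 + 3 * d3 / d2,
           B3 = pmax(0, 1 - 3 * sqrt(1 - c4^2) / c4),
           B4 = 1 + 3 * sqrt(1 - c4^2) / c4)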
Table A.1 Shewhart constants
n d2 d3 c4 A2 D3 D4 B3 B4
2 1.1284 0.8525 0.7979 1.8800 0.0000 3.2665 0.0000 3.2665
3 1.6926 0.8884 0.8862 1.0233 0.0000 2.5746 0.0000 2.5682
4 2.0588 0.8798 0.9213 0.7286 0.0000 2.2821 0.0000 2.2660
5 2.3259 0.8641 0.9400 0.5768 0.0000 2.1145 0.0000 2.0890
6 2.5344 0.8480 0.9515 0.4832 0.0000 2.0038 0.0304 1.9696
7 2.7044 0.8332 0.9594 0.4193 0.0757 1.9243 0.1177 1.8823
8 2.8472 0.8198 0.9650 0.3725 0.1362 1.8638 0.1851 1.8149
9 2.9700 0.8078 0.9693 0.3367 0.1840 1.8160 0.2391 1.7609
10 3.0775 0.7971 0.9727 0.3083 0.2230 1.7770 0.2837 1.7163
11 3.1729 0.7873 0.9754 0.2851 0.2556 1.7444 0.3213 1.6787
12 3.2585 0.7785 0.9776 0.2658 0.2833 1.7167 0.3535 1.6465
13 3.3360 0.7704 0.9794 0.2494 0.3072 1.6928 0.3816 1.6184
14 3.4068 0.7630 0.9810 0.2354 0.3281 1.6719 0.4062 1.5938
15 3.4718 0.7562 0.9823 0.2231 0.3466 1.6534 0.4282 1.5718
16 3.5320 0.7499 0.9835 0.2123 0.3630 1.6370 0.4479 1.5521
17 3.5879 0.7441 0.9845 0.2028 0.3779 1.6221 0.4657 1.5343
18 3.6401 0.7386 0.9854 0.1943 0.3913 1.6087 0.4818 1.5182
19 3.6890 0.7335 0.9862 0.1866 0.4035 1.5965 0.4966 1.5034
20 3.7349 0.7287 0.9869 0.1796 0.4147 1.5853 0.5102 1.4898
21 3.7783 0.7242 0.9876 0.1733 0.4250 1.5750 0.5228 1.4772
22 3.8194 0.7199 0.9882 0.1675 0.4345 1.5655 0.5344 1.4656
23 3.8583 0.7159 0.9887 0.1621 0.4434 1.5566 0.5452 1.4548
24 3.8953 0.7121 0.9892 0.1572 0.4516 1.5484 0.5553 1.4447
25 3.9306 0.7084 0.9896 0.1526 0.4593 1.5407 0.5648 1.4352
TC69/SCS: Secretariat
ISO 11453:1996 Statistical interpretation of data—Tests and confidence
intervals relating to proportions.
ISO 11453:1996/Cor 1:1999 .
ISO 16269-4:2010 Statistical interpretation of data—Part 4: Detection and
treatment of outliers.
ISO 16269-6:2014 Statistical interpretation of data—Part 6: Determination
of statistical tolerance intervals.
ISO 16269-7:2001 Statistical interpretation of data—Part 7: Median—
Estimation and confidence intervals.
ISO 16269-8:2004 Statistical interpretation of data—Part 8: Determination
of prediction intervals.
ISO 2602:1980 Statistical interpretation of test results—Estimation of the
mean—Confidence interval.
ISO 2854:1976 Statistical interpretation of data—Techniques of estimation
and tests relating to means and variances.
ISO 28640:2010 Random variate generation methods.
ISO 3301:1975 Statistical interpretation of data—Comparison of two means
in the case of paired observations.
ISO 3494:1976 Statistical interpretation of data—Power of tests relating to
means and variances.
ISO 5479:1997 Statistical interpretation of data—Tests for departure from
the normal distribution.
ISO/TR 13519:2012 Guidance on the development and use of ISO statistical
publications supported by software.
ISO/TR 18532:2009 Guidance on the application of statistical methods to
quality and to industrial standardization.
RStudio
CTRL + number Go to panel:
1: Editor
2: Console
3: Help
4: History
5: Files
6: Plots
7: Packages
8: Environment
CTRL + SHIFT + K knit current R Markdown report
CTRL + SHIFT + I Compile current R Sweave (LaTeX) report
CTRL + S Save file
F1 Contextual help (upon the cursor position)
CTRL + F
Activates search (within different panels) 1
<console>
↑ Expressions history
CTRL+L Clear console
ESC Cancel current expression
<editor and console>
TAB Prompt menu:
Select objects in the workspace
Select function arguments (when in parenthesis)
Select list elements (after the $ character)
Select chunk options (when in chunk header)
Select files (when in quotes)
<editor>
CTRL + ENTER Run current line or selection
CTRL + SHIFT + S Source full script
CTRL + ALT + I Insert code chunk
CTRL + ALT + C Run current code chunk (within a chunk)
CTRL + SHIFT + P Repeat last code run
CTRL + SHIFT + C Comment current line or selection (add # at the beginning of the line)
CTRL + D Delete current line
Help
?, help Help on a function
General
; Separate expressions in the same line
<- Assignment operator
{ <code>} Code blocks within curly brackets
# Comment (ignores the remaining of the line)
` <string> ` (backtick) Allows using identifiers with special characters and/or blank spaces
Math Operators
+, -, *, / Arithmetic
Comparison Functions
all Are all elements TRUE?
Math Functions
sqrt Square root
exp, log Exponential and logarithmic
factorial Factorial
Vectors
c Create a vector (combine values)
[ ] Item selection
sort Sorting
Matrices
matrix Create a matrix
t Transpose a matrix
solve Inverse a matrix
Factors
factor Create a factor
Character String
nchar Get number of characters
Lists
list Create a list
Data Frames
data.frame Create a data frame
str Get data frame structure: column names, types, and sample data
head, tail Get first or last rows of a data frame
Files
download.file Download files
Data Simulation
set.seed Fix the seed 3
hist Histogram
mean Average
median Median
quantile Percentiles, quantiles
var Variance
sd Standard deviation
cor Correlation
Acceptance Sampling
Simple sampling plan
Double sampling plan
Assess plan
Control Charts
qcc Library
qcc.groups Create object with grouped data
qualityTools Package
cp Process capability indices
Pareto Analysis
qcc Package
cause.and.effect Cause-and-effect analysis
pareto.chart Pareto Chart
qualityTools Package
paretoChart Pareto chart
SixSigma Package
ss.ceDiag Cause-and-effect diagram
Probability
p* Distribution function for a given value
is.* Return a logic value TRUE if the object is of the specified class, for
example, numeric
as.* Coerce to the specified class
Vectorized Functions
tapply Apply a function to values for each level of a factor
lapply Apply a function to each element of a list returning a list
Programming
for Loop over the values of a vector or list
message
Reports
xtable Package
xtable Create tables in different formats, e.g., LaTeX, HTML
Package knitr
knit Converts Rmd, Rhtml, and Rnw files into HTML, MS Word, or PDF reports. See documentation at http://yihui.name/knitr/ .
Main options in a code chunk header:
Useful Links
R-Project: http://www.r-project.org
RStudio: http://www.rstudio.com
Easy R practice: http://tryr.codeschool.com/
List of colours with names: http://www.stat.columbia.edu/~tzheng/files/
Rcolor.pdf
http://www.cyclismo.org/tutorial/R/
http://www.statmethods.net/index.html
Recipes : http://www.cookbook-r.com/
Search documentation: http://www.rdocumentation.org/
http://www.computerworld.com/s/article/9239625/Beginner_s_guide_to_
R_Introduction
http://www.inside-r.org/
http://www.r-bloggers.com/
Google R styleguide: http://google-styleguide.googlecode.com/svn/trunk/
Rguide.xml
Book Six Sigma with R : www.sixsigmawithr.com
A
AcceptanceSampling
assess
find.plan
OC2c
B
base
anyNA
array
as.Date
as.numeric
c
class
colMeans
colnames
cut
data.frame
detach
diff
dir
exp
expression
factor
format
getwd
gl
grep
is.na
is.numeric
length
library
list
list.dirs
list.files
load
log
ls
matrix
max
mean
names
ncol
nrow
options
order
paste
range
rbind
rep
require
rev
rm
RNG
round
rownames
rowSums
sample
sapply
save
save.image
scan
search
seq
seq_along
set.seed
setwd
sort
source
sqrt
strsplit
subset
sum
summary
table
tapply
typeof
unlist
which
C
car
bcPower
powerTransform
constants
LETTERS
letters
pi
D
Deducer
F
foreign
G
ggplot2
geom_boxplot
geom_histogram
geom_point
ggplot
labs
graphics
abline
barplot
boxplot
curve
hist
legend
lines
par
plot
points
polygon
rect
text
grDevices
grid
H
h5
Hmisc
binconf
I
IQCC
ISOweek
K
knitr
L
lattice
bwplot
dotplot
histogram
llines
panel.dotplot
panel.superpose
trellis.par.get
trellis.par.set
xyplot
M
MASS
boxcox
MSQC
N
nortest
ad.test
Q
qcc
cause.and.effect
cusum
ewma
mqcc
oc.curves
pareto.chart
process.capability
qcc
qcc.groups
qcc.options
summary.qcc
qcr
qicharts
paretochart
qic
trc
qualityTools
cp
paretoChart
R
Rcmdr
RJDBC
RMongo
RMySQL
RODBC
ROracle
RPostgreSQL
RSQLite
rvest
html
html_nodes
html_text
S
SixSigma
climProfiles
outProfiles
plotControlProfiles
plotProfiles
smoothProfiles
ss.ca.cp
ss.ca.cpk
ss.ca.study
ss.ca.z
ss.cc
ss.cc.getc4
ss.cc.getd2
ss.cc.getd3
ss.ceDiag
ss.data.bills
ss.data.wbx
ss.data.wby
special values
FALSE
i
Inf
LETTERS
letters
month.abb
month.name
NA
NaN
NULL
pi
TRUE
stats
aggregate
anova
arima
bartlett.test
binom.test
chisq.test
coef
complete.cases
cov
dbinom
dhyper
dnorm
dpois
IQR
lm
mad
median
pbinom
phyper
pnorm
poisson.test
ppois
prop.test
qbinom
qchisq
qhyper
qnorm
qpois
qqline
qqnorm
quantile
rbinom
rhyper
rnorm
rpois
sd
shapiro.test
t.test
ts
var
var.test
weighted.mean
T
tolerance
acc.samp
U
utils
apropos
available.packages
browseVignettes
demo
download.file
example
head
help
history
install.packages
installed.packages
loadhistory
read.csv
read.table
remove.packages
savehistory
str
Sweave
vignette
write.csv
X
XLConnect
XML
xmlSApply
xmlTreeParse
xpathApply
xtable
print.xtable
xtable
Subject Index
Symbols
C p
C pkL
C pkU
C pk
C pmk
F ( x )
H 0
H 1
P p
P pkL
P pkU
P pk
Q 1
Q 2
Q 3
α
β
X 2
δ
γ
λ
μ
σ
σ 2
σ ST
θ
c 4
d 2
d 3
f ( x )
p -value
s 2
LATEX
FDIS
IDE
MDB
D 3
D 4
d 2
d 3
ANOVA
ANSI
AQL
ARL
BSI
CD
CLI
CL
CRAN
CSV
DBMS
DFSS
DIS
DMAIC
DPMO
DoE
EWMA
FAQ
FDA
FDIS
FOSS
GUI
HTML
ICS
IEC
IQR
ISO
JTC
LCL
LL
LSL
LTPD
LT
MAD
MR
NCD
NP
OBP
OC
ODBC
OS
PAS
PLC
PMBoK
PWI
QFD
RCA
RNG
RPD
RSS
RUG
R
SC
SDLC
SME
SPC
SR
SVM
TC
TMB
TR
TS
UCL
UL
URL
USL
VoC
VoP
WD
WG
XML
5Ms
A
acceptability constant
acceptance
acceptance quality level
acceptance sampling
for attributes
for variables
AENOR
aggregate values
alternative hypothesis
Anderson-Darling test
anonymous function
argument
assignable causes
asymmetric
attribute
attributes control charts
average
B
bar chart
baseline profile
Bernoulli trial
Big Data
binomial
binomial distribution
black belt
box plot
box-and-whisker plot
Box-Cox transformation
brainstorming
browser
C
c chart
capability analysis
capability index
capability indices
cause
cause-and-effect
cause-and-effect diagram
center line
central limit theorem
central tendency
character
characteristic
check sheet
chunk (code)
class
cluster
cluster sampling
column
comment
common causes
complementary event
confidence bands
confidence interval
confidence level
consumer’s risk
continuous distributions
continuous scale
continuous variable
control chart
constants
individuals
control chart power
control chart tests
control chart zones
control limits
cost of poor quality
crossings
csv file
customer
customer specification limit
customer’s risk
CUSUM chart
cusum coefficients
cycle
D
data
acquisition
database
export
raw data
treatment
data acquisition
data analysis
data cleaning
data collection plan
data frame
data import
data reuse
data structures
data transformation
data type
database
dataset
date
dates
debugging
decreasing
defect
defect-free
defective
defective fraction
defects
defects per million opportunities
defects per unit
degrees of freedom
density
design for six sigma
design of experiments
design specifications
destructive test
device
dimension
discrete distribution
distribution
normal distribution
distribution function
distribution parameters
distributions
binomial
exponential
geometric
hypergeometric
lognormal
negative binomial
normal
Poisson
uniform
Weibull
double
double sampling plan
E
effect
environment
error
error type I
error type II
escaping
special variability
estimation
estimator
event
EWMA chart
exclude
expectation
expected value
experimental design
exploratory data analysis
export graphics
export data
extract
extraction
extreme values
F
factor
files
finite population
first-time yield
fishbone diagram
fitness of use
flow chart
formula
fraction defective
frequency
frequency table
function
functional tolerance
G
global standard deviation
goodness of fit
graphical system
graphics
grid
par
graphics options
green belt
groups
H
histogram
histogram bars
history
hypothesis test
hypothesis testing
I
I Chart
image
import csv
import data
improvement
in-control process
independent event
independent samples
independent trial
index
individuals chart
individuals control chart
inference
inspection
integer
interface
Internet
interquartile range
interval
interval estimation
interview
IQR
Ishikawa diagram
Ishikawa, Kaoru
ISO
ISO Council
ISO members
ISO website
K
knit
L
label
larger-the-better
Lean
length
liaison
liaisons
category A
category B
category D
limits
control limits
specification limits
linear relation
list
location
log-Likelihood
logical
logical expressions
long-term variation
longest run
loss function analysis
lot
lower capability index
lower control limit
lower performance index
lower specification limit
M
manufacturing specification limit
markdown
matrix
maximum
mean
mean tests
measure
measurement
measurement methods
mechanical devices
median
median absolute deviation
Mersenne-Twister algorithm
minimum
minimum capability index
missing
missing values
mode
model
moving range
moving range chart
MR Chart
N
names
national body
natural limits
natural variation
nominal-is-best
nominal values
nomogram
nomograph
non-normality
non-random variation
nonlinear profile
nonlinearity
normal distribution
normality
np chart
null hypothesis
numeric
O
O-member
object
one-sided test
open source
operating characteristic
operating characteristic curve
operator
optimization
order
order data frame
out of control
out-of-control process
outlier
P
p chart
P-member
p-value
package
package installation
package update
panel (lattice)
parameter
Pareto analysis
Pareto chart
partial matching
path
pattern
percentile
Phase I control charts
Phase II control charts
plot
lines
text
plots
point estimation
point estimator
Poisson distribution
Poisson rate
poor quality
population
population mixture
power
predictive variable
print
probability
probability distribution
probability distribution model
problem-solving techniques
procedure
process
control
out of control
shift
process capability
process change
process control
process data
process map
process owner
process parameter
process performance indices
process shift
process tolerance
producer
producer’s risk
product
quality
profile
profiles control chart
programme of work
programming language
Ada
C
Fortran
Pascal
Prolog
Python
R
Ruby
proportion of defects
proportions test
prototype profile
pseudo-random numbers
Q
quality
characteristic
quality (definition)
quality assurance
quality characteristic
larger-the-better
nominal-is-best
smaller-the-better
quality control
history
plan
quality definition
quality function deployment
quality management systems
quantile
quantile function
quantile-quantile plot
quartile
R
R
CRAN
FAQ
base
code
community
conferences
console
contributors
definition
documentation
environment
expression
foundation
Help
history
installation
interface
journal
learning curve
libraries
mailing lists
manuals
markdown
object
output
packages
programming language
R-core
release
reports
script
scripts
session
source code
task views
website
R Commander
R Chart
R expression
random
random noise
random numbers
random sampling
random variable
random variate
randomness
range
Range chart
rare event
rational subgroups
raw data
recycling
reference interval
reference limits
regular expression
regularization
rejection
rejection region
relative frequency
reliability
replace
replacement
representative sample
reproducible report
reproducible research
requirement
response
response variable
reverse
risk management
robust parameter design
rolled throughput yield
row
RStudio
help
installation
layout
plots
plots pane
source
source editor
run chart
run tests
S
S Chart
sample
sample data
sample distribution
sample mean
sample size
sample standard deviation
sample variance
sampling
sampling distribution
sampling frequency
sampling plans
scatter plot
script
seasonality
secretariat (of a TC)
seed
sequence
Seven Quality Tools
Shapiro-Wilk test
Shewhart
constants
Shewhart charts
Shewhart control charts
shift
short-term variation
significance level
simple random sampling
simulation
single sampling plan
Six Sigma
skewness
smaller-the-better
smooth function
software
FOSS
Apache
C
commercial software
Eclipse
Emacs
Fortran
freedoms
Internet
Java
JMP
LibreOffice
licence
Linux
Mac
Microsoft Office
Minitab
MySQL
open source
pandoc
PHP
Python
R
RStudio
S
SAS
source code
SPSS
Stata
statistical software
support
Systat
Ubuntu
Windows
sort
source files
special causes
special values
specification
specification limit
specification limits
specification tolerance
specifications
specifications design
spreadsheet
standard
standard development
standard deviation
Standard deviation chart
standard errors
standard normal distribution
standard stages
standards
standards development
standards maintenance
statistic
statistical software
strata
stratification
stratified sampling
Student’s t
Sturges
subgroup
subprocess
subset data frame
summary
mean
summary statistic
support vector machines
system locale
systematic sampling
T
Taguchi capability index
Taguchi loss function
tally sheet
target
task view
TC69 Secretariat
TC69
TC69/AHG1
TC69/SC1
TC69/SC1/WG2
TC69/SC1/WG5
TC69/SC1/WG6
TC69/SC4
TC69/SC4/WG10
TC69/SC4/WG11
TC69/SC4/WG12
TC69/SC5
TC69/SC5/WG10
TC69/SC5/WG2
TC69/SC5/WG3
TC69/SC5/WG8
TC69/SC6
TC69/SC6/WG1
TC69/SC6/WG5
TC69/SC6/WG7
TC69/SC6/WG9
TC69/SC7
TC69/SC7/AHG1
TC69/SC7/WG1
TC69/SC7/WG2
TC69/SC7/WG3
TC69/SC8
TC69/SC8/WG1
TC69/SC8/WG2
TC69/SC8/WG3
TC69/WG3
tendency
terminology
tier chart
time series
time-dependent process
tolerance limits
trend
trivial causes
two-sided test
U
u chart
unbiased estimator
unbiasedness
upper capability index
upper control limit
upper performance index
upper specification limit
V
variability
variable
categorical
continuous
discrete
qualitative
quantitative
variables control charts
variables relation
variance
variance tests
variation
assignable cause
assignable causes
assignable variation
causes
chance variation
common causes
special causes
vector
vectorized functions
vignette
vital causes
voice of stakeholders
voice of the customer
voice of the process
VoS
W
WD
web scraping
weighted mean
weighted moving average
Wikipedia
working directory
workspace
X
x-bar chart
Y
yield
Footnotes
1 See ‘Edit’ menu for further options.
2 The double operators && and || evaluate a single condition, using only the first element of each operand, and return one logical value; the single operators & and | are applied element-wise over whole vectors.
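A minimal illustration of the difference (the logical vectors x and y are invented for this example):

x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)
x & y         # element-wise: one result per pair of elements
## [1]  TRUE FALSE FALSE
x[1] && y[1]  # single condition: one logical value
## [1] TRUE

Note that recent versions of R signal an error when && or || receive an operand of length greater than one, so the double operators should be reserved for single conditions, e.g., inside if statements.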
3 This makes results reproducible.
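This footnote presumably refers to fixing the seed of the pseudo-random number generator before a simulation (see the seed, pseudo-random numbers, and Mersenne-Twister algorithm entries above). A minimal sketch of that idea, with an arbitrary seed value chosen for the example:

set.seed(1234)   # fix the generator state; 1234 is an arbitrary choice
x <- rnorm(5)    # five pseudo-random normal variates
set.seed(1234)   # restore the same state
y <- rnorm(5)
identical(x, y)  # the two draws are exactly equal
## [1] TRUE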