Multivariate Process Control Chart For Controlling The False Discovery Rate


Industrial Engineering & Management Systems
Vol 11, No 4, December 2012, pp.385-389
ISSN 1598-7248, EISSN 2234-6473
http://dx.doi.org/10.7232/iems.2012.11.4.385
© 2012 KIIE

Multivariate Process Control Chart for Controlling the False Discovery Rate
Jang-Ho Park, Chi-Hyuck Jun*
Department of Industrial and Management Engineering, POSTECH, Pohang, Korea
(Received: November 19, 2012 / Revised: November 20, 2012 / Accepted: November 22, 2012)
ABSTRACT
With the development of computer storage and the rapidly growing ability to process large amounts of data, multivariate control charts have received increasing attention. Existing univariate and multivariate control charts take a single hypothesis testing approach to the process mean or variance, based on a plot of a single statistic. This paper proposes a multiple hypothesis testing approach to developing a new multivariate control scheme. The plotted Hotelling's $T^2$ statistics are used to compute the corresponding p-values, and the procedure for controlling the false discovery rate in multiple hypothesis testing is applied to the proposed control scheme. Numerical simulations were carried out to compare the performance of the proposed control scheme with the ordinary multivariate Shewhart chart in terms of the average run length. The results show that the proposed control scheme outperforms the existing multivariate Shewhart chart for all mean shifts considered.
Keywords: Average Run Length, False Discovery Rate, Multivariate Shewhart Control Chart, p-Value
* Corresponding Author, E-mail: [email protected]

1. INTRODUCTION
Traditionally, quality has been an essential part of manufacturing in various industries, such as the chemical, semiconductor, automobile, computer, and cell phone industries. These days, the importance of quality is also emphasized in service industries such as banking, telecommunications, and health care. Customers' demand for better quality products is growing stronger, especially since they can now share knowledge about quality through the Internet and social networks. Therefore, quality improvement is one of the most important aspects of business. Montgomery (2007) defines quality improvement as the reduction of variability in processes and products. There are two causes of variability in processes. The first is chance cause, which derives from random effects such as weather conditions and is considered natural. The other is assignable cause, such as a failure in machinery or faulty raw materials. Assignable causes can be controlled through a systematic methodology. Therefore, many companies have tried to reduce assignable causes.
Although there are various methods to improve quality, statistical process control (SPC) is regarded as the most scientific and valid approach. In most industries, univariate control charts are used to separately monitor key measurements on final products which in some way define the quality of that product (MacGregor and Kourti, 1995). However, univariate control charts cannot account for correlation among the variables. To overcome this difficulty, several multivariate control charts have been proposed based on chi-squared statistics and Hotelling's $T^2$ statistics. These charts can be considered extensions of the corresponding univariate control charts. The multivariate Shewhart control charts are an extension of the X-bar control chart (Hotelling, 1947), which is the simplest and most widely used.
Control charts can be interpreted using a p-value approach, which offers several advantages over traditional control charts. First, a p-value approach offers better graphical displays of the performance of the process and can incorporate more complex control procedures (Benjamini and Kling, 1999). Li et al. (2012) showed that, using univariate cumulative sum (CUSUM) charts, we can determine how strong the signal is and how stable the process is at a given time. Second, if we adopt a p-value approach, a control chart can be regarded as a sequence of single hypothesis tests. Therefore, if we can establish the distribution of the plotted statistic, we can control quality more easily by setting only the type I error (Lee and Jun, 2010, 2012). Finally, we can apply a multiple comparison procedure to control charts by testing several single hypotheses simultaneously (Benjamini and Kling, 1999).
In statistics, the multiple comparisons or multiple testing problem occurs when one considers a set of statistical inferences simultaneously (Miller, 1981). Since applying single hypothesis tests repeatedly inflates the false positive rate when many hypotheses are tested, the family-wise error rate (FWER) is used in multiple comparison procedures. The FWER is the probability that at least one false positive, or type I error, occurs among all the hypotheses tested. Many procedures to control the FWER have been proposed, such as the Bonferroni, Sidak, Tukey's, Holm's step-down, and Hochberg's step-up procedures. However, these procedures are not widely used because they give conservative results as the number of hypotheses increases, which reduces the usefulness of the tests. To overcome the weaknesses of the FWER, Benjamini and Hochberg (1995) proposed using the false discovery rate (FDR). The FDR is the expected proportion of false positives among all the rejected hypotheses. They also proposed a procedure for controlling the FDR.
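To make the contrast between FWER and FDR control concrete, the following is a minimal Python sketch, not code from the paper. It applies a Bonferroni test and the Benjamini and Hochberg step-up rule to the same p-values. The function names and the p-values are hypothetical and chosen only for illustration.

```python
def bonferroni_rejections(p_values, level):
    """Indices rejected when each test is run at level / m (controls the FWER)."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= level / m]


def bh_rejections(p_values, level):
    """Indices rejected by the Benjamini-Hochberg step-up rule at FDR level `level`."""
    m = len(p_values)
    # Sort the indices by p-value, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * level / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * level / m:
            k_max = rank
    # Reject the hypotheses with the k_max smallest p-values.
    return sorted(order[:k_max])


# Hypothetical p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(bonferroni_rejections(p_values, 0.05))  # [0]
print(bh_rejections(p_values, 0.05))          # [0, 1]
```

In this example the Bonferroni test rejects only the smallest p-value, while the step-up rule also rejects the second one, which illustrates why FWER-based procedures are described as conservative.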
There has been an effort to apply the FDR to univariate control charts. Lee and Jun (2010, 2012) proposed procedures to control the FDR for univariate X-bar charts and exponentially weighted moving average (EWMA) charts. They showed that by controlling the FDR, X-bar charts and EWMA charts give better performance than traditional control charts.
Motivated by these developments, the objective of this paper is to provide a new multivariate process control chart that controls the FDR. The remainder of this paper is organized as follows. Section 2 interprets multivariate process control in terms of p-values and proposes a new multivariate control scheme for controlling the FDR. Section 3 compares the performance of the new control scheme with that of the multivariate Shewhart chart using numerical experiments. Finally, Section 4 gives the conclusion.

2. MULTIVARIATE CONTROL SCHEME FOR CONTROLLING THE FDR
The multivariate Shewhart control chart (Hotelling, 1947) is used when there are two or more quality variables. It is based on Hotelling's $T^2$ statistic. Suppose that there are q quality variables and that the total number of observations is m. Also, suppose that each observation vector follows a multivariate normal distribution with mean vector $\mu_0$ and covariance matrix $\Sigma$. The proposed method monitors subgroups rather than single observations. If the subgroup size is n, the mean vector of the jth subgroup, $\bar{X}_j$ (j = 1, 2, 3, ...), is given by
$$\bar{X}_j = \frac{1}{n} \sum_{i=1}^{n} X_{ij} \qquad (1)$$

The vector $X_{ij}$ is the ith observation in the jth subgroup. The covariance matrix of the jth subgroup is given by

$$S_j = \frac{1}{n-1} \sum_{i=1}^{n} (X_{ij} - \bar{X}_j)(X_{ij} - \bar{X}_j)^T. \qquad (2)$$

If $\mu_0 = (\mu_{01}, \mu_{02}, \ldots, \mu_{0q})^T$ is the target mean vector, then the statistic for the jth subgroup of the multivariate SPC chart is defined by Hotelling's $T^2$ statistic (Anderson, 1958) as

$$T_j^2 = n(\bar{X}_j - \mu_0)^T S_j^{-1} (\bar{X}_j - \mu_0) \qquad (3)$$
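As an illustration of Eqs. (1) to (3), the following Python sketch computes the subgroup mean, the subgroup covariance matrix, and the $T^2$ statistic using only the standard library. It is not the authors' code. The function names, the small Gaussian-elimination solver, and the three-point subgroup are assumptions made for this example.

```python
def subgroup_mean(rows):
    """Eq. (1): component-wise mean of the n observation vectors."""
    n, q = len(rows), len(rows[0])
    return [sum(r[k] for r in rows) / n for k in range(q)]


def subgroup_cov(rows, mean):
    """Eq. (2): sample covariance matrix with divisor n - 1."""
    n, q = len(rows), len(mean)
    cov = [[0.0] * q for _ in range(q)]
    for r in rows:
        d = [r[k] - mean[k] for k in range(q)]
        for a in range(q):
            for b in range(q):
                cov[a][b] += d[a] * d[b] / (n - 1)
    return cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    q = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(q):
        pivot = max(range(col, q), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular; n > q is required")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, q):
            f = m[r][col] / m[col][col]
            for c in range(col, q + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * q
    for r in range(q - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, q))
        x[r] = (m[r][q] - s) / m[r][r]
    return x


def hotelling_t2(rows, target):
    """Eq. (3): n * (mean - target)^T S^{-1} (mean - target)."""
    n = len(rows)
    mean = subgroup_mean(rows)
    cov = subgroup_cov(rows, mean)
    d = [mean[k] - target[k] for k in range(len(target))]
    x = solve(cov, d)  # x = S^{-1} d, without forming the inverse explicitly
    return n * sum(dk * xk for dk, xk in zip(d, x))


# Hypothetical subgroup with n = 3 and q = 2, and a hypothetical target mean.
subgroup = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
t2 = hotelling_t2(subgroup, target=(1.0, 1.0))
print(round(t2, 6))  # 4.0
```

For this subgroup the mean is (0, 0) and the covariance matrix is (1 0.5; 0.5 1), so the statistic can be checked by hand: the quadratic form equals 4/3 and $T^2 = 3 \times 4/3 = 4$.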

When the process is in control, $T^2$ follows Hotelling's $T^2$ distribution with parameters q and n - 1:

$$T^2 \sim T^2_{q,n-1} \qquad (4)$$

The Hotelling $T^2$ distribution is related to the more familiar F distribution. The relationship between the Hotelling $T^2$ and the F distributions is

$$F = \frac{n-q}{q(n-1)} T^2 \qquad (5)$$

where F follows the F distribution with (q, n - q) degrees of freedom.

The upper control limit (UCL) of a multivariate SPC chart can be set using the above relationship, while the lower control limit (LCL) is 0. The UCL is obtained using Eq. (6).

$$T^2_{UCL} = \frac{q(n-1)}{n-q} F_{\alpha,q,n-q} \qquad (6)$$

Here $F_{\alpha,q,n-q}$ is the upper 100$\alpha$% critical point of the F distribution with (q, n - q) degrees of freedom. Therefore, the multivariate SPC chart indicates an out-of-control state when $T^2$ is greater than or equal to $T^2_{UCL}$.
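The control limit in Eq. (6) can be sketched in Python for the special case q = 2, where the F distribution with (2, $\nu$) degrees of freedom has the closed-form tail probability $P(F > x) = (1 + 2x/\nu)^{-\nu/2}$. This closed form is used here only to keep the example free of external libraries. For general q a statistical library would be needed. The function names and the subgroup size n = 5 are assumptions, since the text does not state a subgroup size.

```python
def f_tail_2(x, nu):
    """P(F > x) for an F distribution with (2, nu) degrees of freedom."""
    return (1.0 + 2.0 * x / nu) ** (-nu / 2.0)


def f_upper_quantile_2(alpha, nu):
    """Upper 100*alpha% point of F(2, nu), obtained by inverting f_tail_2."""
    return (nu / 2.0) * (alpha ** (-2.0 / nu) - 1.0)


def t2_ucl_q2(alpha, n):
    """Eq. (6) for q = 2 quality variables and subgroup size n."""
    q = 2
    if n <= q:
        raise ValueError("the subgroup size n must exceed q")
    return q * (n - 1) / (n - q) * f_upper_quantile_2(alpha, n - q)


alpha = 0.005  # type I error used in the experiments
n = 5          # hypothetical subgroup size, not taken from the paper
ucl = t2_ucl_q2(alpha, n)

# Converting the limit back through Eq. (5) should recover alpha.
f_at_ucl = (n - 2) / (2 * (n - 1)) * ucl
print(ucl, f_tail_2(f_at_ucl, n - 2))
```

The second printed value should agree with alpha up to rounding, which confirms that signaling when $T^2 \ge T^2_{UCL}$ is the same rule as signaling when the tail probability of the scaled statistic is at most $\alpha$.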
The multivariate Shewhart control scheme can be regarded as a sequence of single hypothesis tests of the form

$$H_0: \mu_j = \mu_0 \quad \text{vs.} \quad H_1: \mu_j \neq \mu_0 \quad (j = 1, 2, \ldots) \qquad (7)$$

If the null hypothesis $\mu_j = \mu_0$ is true, then the statistic $T_j^2$ follows the Hotelling $T^2$ distribution in Eq. (4). Therefore, the p-value of each subgroup is

$$p_j = P\{T^2 > T_j^2 \mid H_0\} = G_{q,n-q}\left(\frac{n-q}{q(n-1)} T_j^2\right) \qquad (8)$$

where $G_{q,n-q}$ is the tail probability of the F distribution with (q, n - q) degrees of freedom. The control chart indicates an out-of-control state when the jth p-value, $p_j$, is less than or equal to the type I error $\alpha$.
Anderson (1958) proved that if the null hypothesis is not true, the statistic $T^2$ follows a generalized $T^2$ distribution ($T'^2$). He also derived the relationship between the generalized $T^2$ distribution and the non-central F distribution ($F'$) with (q, n - q) degrees of freedom to be

$$F' = \frac{n-q}{q(n-1)} T'^2 \qquad (9)$$

where the non-centrality parameter ($\lambda$) is

$$\lambda = n(\mu - \mu_0)^T \Sigma^{-1} (\mu - \mu_0) \qquad (10)$$

If $\lambda$ is 0, the non-central F distribution is exactly the same as the F distribution.
The proposed control scheme, referred to below as the BH scheme, proceeds as follows.
Step 1: For each subgroup, compute the $T^2$ statistic.
Step 2: For each $T^2$ statistic, compute the p-value.
Step 3: Apply the Benjamini and Hochberg procedure.
1) Specify the FDR level q (here q denotes the FDR level, not the number of quality variables).
2) From the current testing point t back to the previous testing point t-r+1, sort the p-values in increasing order. If the ordered p-values are p(1), p(2), ..., p(r-1), p(r), the corresponding hypotheses are H(1), H(2), ..., H(r-1), H(r).
3) If certain p(i) (i = 1, ..., r) values satisfy

$$p_{(i)} \le \frac{q\,i}{r}, \quad (i = 1, \ldots, r) \qquad (11)$$

the current state is considered to be out-of-control and the hypothesis H(i) is rejected.

3. NUMERICAL EXPERIMENTS
In this section, numerical experiments are performed to compare the BH scheme with the multivariate Shewhart control chart. A theoretical average run length (ARL) for the multivariate Shewhart control chart is compared with a simulated ARL for the BH scheme. Since the ARL of the BH scheme is difficult to compute analytically, a Monte Carlo simulation approach is used. Two- and three-dimensional quality variables are used for the experiments.

3.1 Two-Dimensional Quality Variables
First, the theoretical ARL is compared with the simulated ARL using the p-value approach for the conventional multivariate Shewhart control chart. For two-dimensional quality variables, the mean vector (0, 0)^T and the covariance matrices (1 0.2; 0.2 1), (1 0.5; 0.5 1), and (1 0.8; 0.8 1) are used to compare the theoretical ARL and the simulated ARL. For the theoretical ARL, Eqs. (9) and (10) are used. For the simulated ARL, 10,000 iterations replicated 20 times are performed to compute an average value. Various mean shift sizes (0, 0.5, 1, 1.5, 2, 2.5, 3, 4, 5) were used for experimental purposes. If the mean shift size is 0.5, the shifted process has a mean vector of (0.5, 0.5)^T, and the covariance matrix is unchanged. The critical value $\alpha$ was set at 0.005. The results of this experiment are provided in Table 1. Table 1 indicates that the simulated results are almost exactly the same as the theoretical results. Therefore, a p-value approach is stable and appropriate for use with a multivariate Shewhart control chart. Also, the out-of-control ARLs differ across covariance matrices for the same mean shift. Table 2 shows the difference between the maximum and minimum eigenvalues of each two-dimensional covariance matrix. Tables 1 and 2 indicate that the out-of-control ARL drops more sharply when the difference between the maximum and minimum eigenvalues is small. In other words, the closer the quality variables are to being independent, the more quickly the control chart detects the out-of-control signal.

Table 1. ARL of multivariate Shewhart control chart for two-dimensional quality variables

             Covariance (1 0.2; 0.2 1)    Covariance (1 0.5; 0.5 1)    Covariance (1 0.8; 0.8 1)
Mean shift   Simulation    Theoretical    Simulation    Theoretical    Simulation    Theoretical
0            199.6347      200.0000       199.5738      200.0000       199.6769      200.0000
0.5          74.7893       74.8447        86.6555       86.4125        96.0968       96.0559
1            22.0309       22.0235        27.6506       27.7327        33.1505       33.2083
1.5          8.9513        8.9563         11.5473       11.5426        14.1588       14.1679
2            4.6744        4.6762         5.9996        6.0001         7.3766        7.3789
2.5          2.9133        2.9144         3.6920        3.6715         4.4844        4.4709
3            2.0582        2.0665         2.5402        2.5360         3.0394        3.0371
4            1.3552        1.3565         1.5612        1.5643         1.7967        1.7932
5            1.1151        1.1150         1.2119        1.2115         1.3263        1.3258

ARL: average run length.
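The in-control entries of Table 1 (mean shift 0) can be reproduced in spirit with a short Monte Carlo sketch in Python. It relies on the standard fact that, under the null hypothesis, a p-value from a continuous statistic is uniformly distributed on (0, 1), so the run length of the p-value rule is geometric with mean $1/\alpha$. The function names, the seed, and the single batch of 10,000 runs are choices made for this illustration and do not reproduce the paper's 20 replications.

```python
import random


def shewhart_run_length(alpha, rng):
    """Number of subgroups until the first p-value <= alpha for an in-control process."""
    t = 0
    while True:
        t += 1
        # Under H0 the p-value is Uniform(0, 1), so it is drawn directly.
        if rng.random() <= alpha:
            return t


def simulated_arl0(alpha, runs, seed):
    """Average run length over `runs` independent in-control sequences."""
    rng = random.Random(seed)
    return sum(shewhart_run_length(alpha, rng) for _ in range(runs)) / runs


alpha = 0.005
theoretical_arl0 = 1.0 / alpha  # mean of a geometric run length
estimate = simulated_arl0(alpha, runs=10_000, seed=1)
print(theoretical_arl0, estimate)
```

The simulated value should fall close to the theoretical value of 200, in line with the agreement between the Simulation and Theoretical columns at mean shift 0. Out-of-control ARLs would additionally require sampling shifted multivariate normal subgroups, which this sketch does not attempt.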
To compare the BH scheme with the multivariate Shewhart control chart, the mean vector (0, 0)^T and only the covariance matrix (1 0.5; 0.5 1) are used. The span size for the BH scheme was r = 10, 20, 30. For the multivariate Shewhart control chart ARLs, theoretical values were used. For the BH scheme simulation, 10,000 iterations replicated 20 times were used to compute average values. Various mean shift sizes (0, 0.5, 1, 1.5, 2, 2.5, 3, 4, 5) were used for experimental purposes, and the critical value $\alpha$ was set at 0.005. A shorter out-of-control ARL (ARL1) corresponds to a better control chart given the same in-control ARL (ARL0). For the multivariate Shewhart control scheme (MSCS), the ARL0 is simply $1/\alpha$. Some experiments show that the ARL0 of the BH scheme is approximately $1/\alpha$ when the FDR level of the BH scheme is $q = r\alpha$. Therefore, in this experiment, the FDR level was set at $r\alpha$. The results of this experiment are listed in Table 3. The ARL1 of the BH scheme is smaller than that of the multivariate Shewhart control chart for all mean shift sizes. Therefore, the BH scheme performs better for two-dimensional quality variables. Also, in the case of the BH scheme, a larger span size results in better performance when the mean shift is small. These tendencies are the same for the other covariance matrices.
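The signaling rule of the BH scheme with span r can be sketched in Python as follows, assuming the FDR level is set to r times $\alpha$. This is an illustrative reading of Step 3 in Section 2, not the authors' implementation. In particular, the handling of the first r - 1 testing points, where fewer than r p-values are available, is an assumption, and the function names and the p-value stream are hypothetical.

```python
from collections import deque


def bh_window_signals(window, fdr_level, span):
    """True if some ordered p-value p_(i) in the window satisfies p_(i) <= fdr_level * i / span."""
    ordered = sorted(window)
    return any(p <= fdr_level * i / span for i, p in enumerate(ordered, start=1))


def first_bh_signal(p_values, alpha, span):
    """1-based index of the first testing point at which the BH scheme signals, or None."""
    fdr_level = span * alpha     # assumed FDR level: r times alpha
    window = deque(maxlen=span)  # keeps the `span` most recent p-values
    for t, p in enumerate(p_values, start=1):
        window.append(p)
        # Assumption: a window shorter than `span` is tested against the same thresholds.
        if bh_window_signals(window, fdr_level, span):
            return t
    return None


def first_shewhart_signal(p_values, alpha):
    """1-based index of the first p-value <= alpha, or None."""
    for t, p in enumerate(p_values, start=1):
        if p <= alpha:
            return t
    return None


# Hypothetical p-value stream, for illustration only.
stream = [0.5, 0.015, 0.018, 0.4]
print(first_bh_signal(stream, alpha=0.01, span=3))  # 3
print(first_shewhart_signal(stream, alpha=0.01))    # None
```

In this stream no single p-value falls below $\alpha$ = 0.01, so the Shewhart rule never signals, but two moderately small p-values in the same window satisfy the second threshold and the BH scheme signals at the third point. This is the mechanism by which a window of recent p-values can shorten the out-of-control run length.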
Table 2. Maximum and minimum eigenvalues for two-dimensional covariance matrices

Covariance matrix    1st eigenvalue    2nd eigenvalue    Difference between maximum and minimum eigenvalue
(1 0.2; 0.2 1)       1.2               0.8               0.4
(1 0.5; 0.5 1)       1.5               0.5               1.0
(1 0.8; 0.8 1)       1.8               0.2               1.6

Table 3. ARLs of BH scheme and multivariate Shewhart control chart for two-dimensional quality variables

Mean shift    BH scheme (r = 10)    BH scheme (r = 20)    BH scheme (r = 30)    Multivariate Shewhart
0             200.8141              200.5796              201.6201              200.0000
0.5           114.9605              106.2124              98.7953               123.3485
1             50.0415               42.6149               36.5857               57.5694
1.5           23.7864               18.4067               15.2016               30.6586
2             12.8410               9.7025                8.9444                18.6685
2.5           7.8120                6.4639                6.4241                12.5258
3             5.3399                4.9407                4.9288                9.0127
4             3.3348                3.3243                3.3280                5.3909
5             2.5065                2.5162                2.5115                3.6733

ARL: average run length, BH scheme: Benjamini and Hochberg scheme.

3.2 Three-Dimensional Quality Variables
For three-dimensional quality variables, the same procedure is adopted to compare the performance of the BH scheme and the multivariate Shewhart control chart. The mean vector (0, 0, 0)^T and only the covariance matrix (1 0.5 0.5; 0.5 1 0.5; 0.5 0.5 1) are used. Other conditions, such as the number of iterations, number of replications, critical level, and mean shift sizes, were the same as those in the two-dimensional analysis set forth above. Table 4 lists the results of this experiment. The interpretation of these results is also the same as in the case of two-dimensional quality variables. In other words, the BH scheme performs better for three-dimensional quality variables. Also, in the case of the BH scheme, a larger span size results in better performance when the mean shift is small.

Table 4. ARLs of BH scheme and multivariate Shewhart control chart for three-dimensional quality variables

Mean shift    BH scheme (r = 10)    BH scheme (r = 20)    BH scheme (r = 30)    Multivariate Shewhart
0             200.8141              200.5796              201.6201              200.0000
0.5           114.9605              106.2124              98.7953               123.3485
1             50.0415               42.6149               36.5857               57.5694
1.5           23.7864               18.4067               15.2016               30.6586
2             12.8410               9.7025                8.9444                18.6685
2.5           7.8120                6.4639                6.4241                12.5258
3             5.3399                4.9407                4.9288                9.0127
4             3.3348                3.3243                3.3280                5.3909
5             2.5065                2.5162                2.5115                3.6733

ARL: average run length, BH scheme: Benjamini and Hochberg scheme.

4. CONCLUSION
This paper proposed a new multivariate process control scheme that controls the false discovery rate. The BH procedure is incorporated to control the FDR in the sense of multiple hypothesis testing. First, simulation studies showed that the use of p-values in the multivariate Shewhart control chart is appropriate. Second, it was shown that the proposed control scheme outperforms the conventional chart for two- and three-dimensional quality variables in terms of ARL.

ACKNOWLEDGMENTS
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea, funded by the Ministry of Education, Science and Technology (Project No. 2012-0001665).

REFERENCES
Anderson, T. W. (1958), An Introduction to Multivariate
Statistical Analysis, Wiley, New York, NY.
Benjamini, Y. and Hochberg, Y. (1995), Controlling the
false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal
Statistical Society B Methodological, 57(1), 289-300.
Benjamini, Y. and Kling, Y. (1999), A look at statistical
process control through the p-values, Research Paper: RP-SOR-99-08, Tel Aviv University, School
of Mathematical Science, Israel.
Hotelling, H. (1947), Multivariate quality control, illustrated by the air testing of sample bombsights. In:
Eisenhart, C. (ed.), Selected Techniques of Statistical Analysis for Scientific and Industrial Research,
and Production and Management Engineering, McGraw-Hill Books, New York, NY.
Lee, S. H. and Jun, C. H. (2010), A new control scheme
always better than X-bar chart, Communications in
Statistics-Theory and Methods, 39(19), 3492-3503.
Lee, S. H. and Jun, C. H. (2012), A process monitoring
scheme controlling false discovery rate, Communications in Statistics-Simulation and Computation,
41(10), 1912-1920.
Li, Z., Qiu, P., Chatterjee, S., and Wang, Z. (2012), Using p values to design statistical process control
charts, Statistical Papers, 1-17.
MacGregor, J. F. and Kourti, T. (1995), Statistical process control of multivariate processes, Control Engineering Practice, 3(3), 403-414.
Miller, R. G. (1981), Simultaneous Statistical Inference,
Springer-Verlag, New York, NY.
Montgomery, D. C. (2007), Introduction to Statistical
Quality Control, Academic Internet Publishers,
Ventura, CA.
