ILPMF
4, OCTOBER 2021
Abstract: Process monitoring, which is used to ensure product quality in semiconductor manufacturing, develops a control chart and alerts engineers whenever a control limit is exceeded. However, a late alarm could generate defects or scrap, and thus a prealarm mechanism is urgently needed. This study proposes an in-line predictive monitoring (ILPM) framework that uses process parameter monitoring (PPM) in the first phase and equipment parameter monitoring (EPM) in the second phase. PPM includes off-line training and in-line prediction, where we collect the first half of the time-series data and predict the second half for in-line quality control. To maintain robustness and accuracy, concept drift is used to update the ILPM model in real time; specifically, the EPM detects the loss of prediction with a cumulative sum (CUSUM) control chart or detects changes in equipment parameters with a Bayesian approach in order to develop a retraining mechanism. An empirical study of a semiconductor manufacturer indicates that the proposed ILPM framework improved both quality control and production capacity.

Note to Practitioners: Although some in-line monitoring approaches have been proposed for specific conditions, little research has been done to develop a prealarm method. In particular, this study proposes predictive monitoring that uses data from multiple sensors (or status variable identifications, SVIDs) to predict one sensor in one process step of a semiconductor manufacturer. The prealarm benefits early troubleshooting and equipment capacity. This study also applies concept drift and equipment parameter monitoring (EPM) to identify prediction model misalignment or equipment misalignment. In practice, root cause identification of the prediction error is critical and benefits model retraining and equipment maintenance.

Index Terms: Bayesian monitoring, concept drift, failure prognosis, gated recurrent unit (GRU), in-line prediction.

Manuscript received February 2, 2020; revised May 29, 2020; accepted July 27, 2020. Date of publication August 14, 2020; date of current version October 6, 2021. This article was recommended for publication by Associate Editor L. Moench and Editor F.-T. Cheng upon evaluation of the reviewers' comments. This work was supported by the Ministry of Science and Technology, Taiwan, under Grant MOST106-2218-E-031-001. (Corresponding author: Chia-Yen Lee.)
The authors are with the Institute of Manufacturing Information and Systems, National Cheng Kung University, Tainan 70101, Taiwan (e-mail: [email protected]).
Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TASE.2020.3014177
1545-5955 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: National Cheng Kung Univ.. Downloaded on November 02,2022 at 06:41:18 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

Semiconductor manufacturing involves capital-intensive and complex manufacturing network processes. To achieve high quality and high throughput, a large number of process parameters, called SVIDs, such as temperature, pressure, current, and voltage, are collected in real time. A control chart [1] monitors the SVIDs and sounds an alarm when a machine abnormality is detected or the product quality is flawed. As advances in technology allow semiconductor manufacturers to collect more SVIDs, interaction effects among the parameters become more complicated. Process engineers have identified the need for prealarm process prediction schemes that identify problems faster than the existing technology.

In the literature, data science and machine learning (DS/ML) techniques have proven helpful for process prognosis and fault detection and classification (FDC). The methodology for FDC in DS/ML can be categorized by the type of training task: supervised learning builds a function that maps inputs to outputs based on observed input–output pairs, and unsupervised learning identifies unknown patterns in data sets without predetermined labels (i.e., outputs) [3]. The supervised learning algorithms used in FDC include tree-based algorithms [4], [8], probabilistic least squares support vector regression [5], particle filtering [6], deep learning [7], support vector data descriptions [9], and virtual metrology [23]. The unsupervised learning algorithms used include adaptive resonance theory 1 (ART1) [10], spatial independent tests [11], uncorrelated multilinear principal component analysis (UMPCA) [12], fusion models [13], degradation analysis [33], and dual cointegration analysis [40]. Although DS/ML techniques are widely used in FDC, each algorithm has its own advantages and disadvantages [14], and model selection is based on performance metrics [e.g., accuracy, F1-score, and mean squared error (MSE)] and the applicable conditions in real applications.

Besides DS/ML techniques, process engineers usually use the metrology measures of control charts or sensors to monitor semiconductor manufacturing. Typically, using the sample mean as the centerline and the standard deviation as the measure of variability, the upper control limit (UCL) and lower control limit (LCL) can be calculated to effectively control the specification and parameter state. For the univariate control chart, the seminal work was developed by Shewhart [15]. From the frequentist point of view, with a fixed target value and assuming that the process mean and variance are fixed, i.e., that the production data follow a normal distribution, 99.7% of the data points will fall within three standard deviations of the centerline. However, Shewhart's control chart does not retain information from sequential or time-series data, and thus, it is not sensitive in detecting sudden data drift. Later, a cumulative sum (CUSUM) control chart was developed to accumulate the variations of sequential data points and then regulate them [16]. In addition, while other control charts treat the subgroups of samples individually, an exponentially weighted moving average (EWMA) chart was proposed to consider the relation and sequence of subgroup samples by assigning appropriate weights [1], [17]. An EWMA chart monitors the profile of the manufacturing process, shows the different
LEE et al.: ILPM FRAMEWORK 1669
importance of all prior data points, and weights the most recent samples highly. A variant of the EWMA chart, called moving centerline EWMA (MCEWMA), considers the residuals with varying control limits and fits an EWMA to univariate autocorrelated data to minimize the one-step-ahead prediction error [18]. Furthermore, change point detection was developed for profile monitoring, using a likelihood ratio statistic to detect the location and magnitude of shifts in linear profiles [19], in particular changes in the intercept, slope, and error variance. For a comprehensive review of profile control charts used to monitor manufacturing processes, see [20].

For semiconductor manufacturing, control chart techniques provide better precision control by recipe. A recipe is a specific set of SVIDs with the necessary steps and tool settings. Only multivariate control charts can handle the large numbers of SVIDs collected with today's technology. The most popular is Hotelling's T-squared statistic, which considers the covariance among variables [21], [22]; however, it is limited to the Gaussian distribution, which is rare in practice. Nonparametric control charts are also used, such as k-nearest neighbor detection, which evaluates the cumulative distance of an observation to its k nearest neighbors in the in-control learning samples without a distribution assumption [30], or the adaptive Mahalanobis distance, which considers the local covariance structure among variables with a different covariance matrix obtained on the nearest neighbors of each observation [24]. However, these methods were developed for either off-line or metrology analysis, and thus, the fault detection or alarm could come late, after defective wafers have been generated or a wafer has run several steps.

Motivated by the need to improve real-time monitoring in semiconductor manufacturing, this study develops a comprehensive in-line predictive monitoring (ILPM) framework embedded with a prealarm system using DS/ML so that anomalies can be identified and corrected earlier than allowed by the existing technology. The proposed ILPM framework consists of a two-phase monitoring scheme. In phase 1, the process parameter monitoring (PPM) collects the first half of the data sequence of multiple process parameters (also called SVIDs/sensors hereafter) in one process step to predict the second half of the data for a single sensor. This study uses the gated recurrent unit (GRU) [25] and suggests the CUSUM chart for predictive monitoring. However, the prediction bias may rise later. One reason is model misalignment, which indicates an obsolete prediction model (i.e., bias of the predicted value), and the other is equipment misalignment, which indicates sensor error (i.e., bias of the actual value) due to equipment deterioration in long-term operation. Thus, concept drift is used to detect changes in the data pattern over time and retrain the prediction model. In phase 2, the equipment parameter monitoring (EPM) detects the lifetime of parts and sudden changes in mechanical parameters such as stiffness, damping, or vibration by Bayesian monitoring. A comprehensive EPM procedure is developed by continuously updating and retraining the PPM prediction model to catch up with the updated information from new observations collected in the process step [27], [28]. Therefore, this study contributes to the monitoring literature, develops an ILPM framework for the prealarm of PPM in real time, and integrates the EPM for model retraining by concept drift.

The remainder of this article is organized as follows. Section II introduces the fundamentals and methodologies of the ILPM framework. Section III presents the proposed two-phase ILPM framework including PPM and EPM, for which we develop an off-line training module, an in-line prediction module, and a concept drift module. Section IV discusses the results of validating the proposed framework with an empirical study of a semiconductor assembly manufacturer. Section V presents the conclusion.

II. FUNDAMENTALS AND METHODOLOGIES

This section describes the fundamental DS/ML techniques and control charts: GRU, CUSUM, target baseline (TB), concept drift, and Bayesian monitoring. These techniques are used for the ILPM framework and the model retraining mechanism.

A. Gated Recurrent Units

The traditional artificial neural network (ANN) was developed with a multilayer feedforward structure or backpropagation mechanism [29]. Each input of an ANN is independent, and there is no memory function in the network structure. Hence, an ANN cannot handle complicated sequential problems, such as linguistic materials, self-correlated signals, or time-series data. The recurrent neural network (RNN), with a memory function and a storage unit, was developed to address this. Each input can be temporarily used as the available input of the next neuron. Long short-term memory (LSTM) is a type of RNN often used for time-series data sets [31]. The LSTM structure consists of an input gate, a forget gate, and an output gate, and each gate is determined as ON or OFF in the model training process. To reduce the number of parameters used in LSTM, the GRU was developed. A reset gate determines how to integrate a new input with the previous memory, and an update gate determines how much of the previous memory to retain [25].

The GRU structure has several variants; the following describes a typical variant. Let $s_t$ be the output state vector, $x_t$ be the input vector, $z_t$ be the update gate vector, and $r_t$ be the reset gate vector. Let $W_z$, $U_z$, and $b_z$ be the parametric matrices and intercept vector of the update gate. Let $W_r$, $U_r$, and $b_r$ be the parametric matrices and intercept vector of the reset gate. Let $W_s$, $U_s$, and $b_s$ be the parametric matrices and intercept vector for calculating the state candidate. The notation $\varphi$ represents a sigmoid function, and the operator $\circ$ denotes the Hadamard product. The state candidate $\tilde{s}_t$ is computed with the hyperbolic tangent function. The GRU is formulated in the following equations:

$$z_t = \varphi(W_z x_t + U_z s_{t-1} + b_z) \tag{1}$$
$$r_t = \varphi(W_r x_t + U_r s_{t-1} + b_r) \tag{2}$$
$$\tilde{s}_t = \tanh(W_s x_t + U_s (r_t \circ s_{t-1}) + b_s) \tag{3}$$
$$s_t = z_t \circ s_{t-1} + (1 - z_t) \circ \tilde{s}_t. \tag{4}$$

Fig. 1 shows the GRU. The update gate is formulated by (1) and forgets some information from the previous state, where $z_t$
1670 IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, VOL. 18, NO. 4, OCTOBER 2021
Fig. 2. Data transformation from samples to TB. (a) Centerline. (b) TB.
Fig. 1. GRU.
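As an illustration of (1)–(4), the following is a minimal pure-Python sketch of one GRU step. It is not the authors' implementation: the function names (`gru_step`, `zero_params`, `run_gru`), the dictionary layout of the parameters, and the zero initial state are assumptions made for this example, and no training is performed.

```python
import math

def sigmoid(a):
    """Logistic function, written as phi in (1) and (2)."""
    return 1.0 / (1.0 + math.exp(-a))

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gru_step(x_t, s_prev, p):
    """One GRU update following (1)-(4); p holds W*, U*, b* for gates z, r, s."""
    def affine(W, U, b, state):
        Wx, Us = matvec(W, x_t), matvec(U, state)
        return [wx + us + bi for wx, us, bi in zip(Wx, Us, b)]

    z_t = [sigmoid(a) for a in affine(p["Wz"], p["Uz"], p["bz"], s_prev)]  # (1)
    r_t = [sigmoid(a) for a in affine(p["Wr"], p["Ur"], p["br"], s_prev)]  # (2)
    gated = [r * s for r, s in zip(r_t, s_prev)]            # r_t ∘ s_{t-1}
    s_cand = [math.tanh(a)
              for a in affine(p["Ws"], p["Us"], p["bs"], gated)]           # (3)
    return [z * s + (1.0 - z) * c
            for z, s, c in zip(z_t, s_prev, s_cand)]                        # (4)

def zero_params(n_in, n_hidden):
    """Hypothetical helper: all-zero parameters, useful only for sanity checks."""
    p = {}
    for g in "zrs":
        p["W" + g] = [[0.0] * n_in for _ in range(n_hidden)]
        p["U" + g] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["b" + g] = [0.0] * n_hidden
    return p

def run_gru(xs, p, n_hidden):
    """Feed a sequence of input vectors through the cell, starting from s_0 = 0."""
    s = [0.0] * n_hidden
    for x_t in xs:
        s = gru_step(x_t, s, p)
    return s
```

With all-zero parameters both gates evaluate to 0.5 and the candidate to 0, so each step halves the previous state; this gives a quick check that the four equations are wired together as written.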
the raw data streams, detectors can be built by sequential analysis, control charts, monitoring distributions on two different time windows, and contextual approaches [27]. For monitoring the parameters of the learners, the parameters of a system described by mathematical equations, i.e., a model-based system, are estimated, and the detection of changes in the parameters provides information that learners can use to compensate for the control errors. For example, mechanical system identification methods such as genetic algorithms [32] and Bayesian optimization [34] are used to estimate the mechanical parameters that represent system aging or wear-out by minimizing the gap between numerical analysis and experimental results. For monitoring the prediction errors, an online adaptive learning procedure was developed to update the prediction model by considering the loss function estimated between the actual and predicted values of the new sample [27]. An advanced incremental learning algorithm is able to adapt to the evolution of the data-generating process over time. Two methods are suggested to update the model parameters: 1) analyzing the marginal errors with the most recently observed samples (i.e., edge retraining) or 2) adding these most recent samples into the training data set with a fixed/variable time window (i.e., complete retraining). For edge retraining, the dynamic weighted ensemble classifier is popular [35]; each classifier refers to an expert with a weight, and the weights of the classifiers are dynamically adjusted based on the prediction performance over time. For complete retraining, the design of the triggering mechanism and batch size determines the tradeoff between the model retraining time and prediction accuracy. This study focuses on off-line complete retraining because of the large amount of data by recipe and by step in the semiconductor manufacturing process (i.e., big data). The CUSUM control chart is used for change detection, and then, the model is retrained by adding new observations.

E. Bayesian Monitoring of Equipment Aging Process

Process drift exists in most mechanical systems; in particular, system aging and tool wear may occur in long-term operation. The mechanical parameters ideally change slowly over time, and thus, this study aims to identify sudden and significant drift of the mechanical parameters automatically by updating the varied process mean and variance required to monitor the operational status of the semiconductor manufacturing process. A Bayesian approach takes advantage of an online environment by using previous and current data to update the prior via the likelihood; thus, we use it to estimate the process mean [36], [37]. For Bayesian process monitoring, we explain three necessary components: the likelihood function, the prior distribution, and the posterior distribution.

First, the likelihood function represents the sample variability. Let $y_t$ be a continuous aging process characteristic such as a mechanical parameter of the equipment. Since sampling errors may occur, assume that $y_t$ follows a normal distribution with mean $\mu_t$ and variance $\sigma^2$ at time $t$, where $\sigma^2$ is an unknown constant and is assumed to combine the measurement and sampling errors. Second, the prior distribution represents the parameter variability. The parameter mean $\mu_t$ changes over time due to system aging, environmental changes, or human interference. Assume that the process mean $\mu_t$ follows a normal distribution with mean $\mu$ and variance $\gamma^2$. Third, the posterior distribution represents the updated parameter. To estimate and update the parameter mean $\mu_t$, the information of the parameter characteristic $y_t$ at time $t$ is used. The probability density function (p.d.f.) of $\mu_t$, given $y_t$, following the Bayes approach is

$$g(\mu_t \mid y_t) = \frac{f_1(y_t \mid \mu_t)\, f_2(\mu_t)}{\int f_1(y_t \mid \mu_t)\, f_2(\mu_t)\, d\mu_t} \tag{8}$$

where $g(\mu_t \mid y_t)$ is also a univariate normal distribution, called the posterior distribution. The posterior mean $\mu_{pt}$ and posterior variance $\sigma_{pt}^2$, which combine the prior information with the data, are written as

$$\mu_{pt} = E(\mu_t \mid y_t) = \frac{\frac{\mu}{\gamma^2} + \frac{y_t}{\sigma^2}}{\frac{1}{\gamma^2} + \frac{1}{\sigma^2}} = w\mu + (1 - w)\, y_t \tag{9}$$

$$\sigma_{pt}^2 = \mathrm{Var}(\mu_t \mid y_t) = \frac{1}{\frac{1}{\gamma^2} + \frac{1}{\sigma^2}} = \gamma^2 w \tag{10}$$

where $w = \sigma^2 / (\gamma^2 + \sigma^2)$ is the weight of the prior mean $\mu$. Hence, the posterior mean is the weighted average of the prior mean and the new sample, and the posterior inverse variance is the sum of the prior inverse variance and the inverse of the sample variance.

Assume that the equipment parameter $\mu_t$ is a fixed constant in a small time interval, for example, minutes or hours, so that the $y_t$ can be considered independent and identically distributed (i.i.d.) with the p.d.f. $f_1(y_t \mid \mu_t)$. Given a data set $y_1, \ldots, y_T$ with a sample size of $T$, the equipment mechanical parameters $\mu$, $\sigma^2$, and $\gamma^2$ can be estimated as follows. The estimate of the overall aging process mean $\mu$ is the double expectation of $y_t$ given $\mu_t$, and the estimator $\hat{\mu}$ can be formulated as follows:

$$\hat{\mu} = \frac{\sum_{t=1}^{T} y_t}{T}. \tag{11}$$

Assume that the correlation between $\mu_t$ and $\mu_{t+1}$ is close to 1, so that the difference between successive $y_t$'s is treated as sample error. Since $\mathrm{Var}(y_t - y_{t+1}) = 2\sigma^2$, the estimator $\hat{\sigma}^2$ can be formulated as follows:

$$\hat{\sigma}^2 = \frac{\sum_{t=2}^{T} \frac{(y_t - y_{t-1})^2}{2}}{T - 1}. \tag{12}$$

To estimate $\gamma^2$, first estimate the total variance. Assume that there is no covariance between the sample and the process. By the conditional expectation method, the total variance $V$ and its estimator can be formulated as follows:

$$V \equiv \mathrm{Var}(y_t) = E_{\mu_t}\big(\mathrm{Var}(y_t \mid \mu_t)\big) + \mathrm{Var}_{\mu_t}\big(E(y_t \mid \mu_t)\big) = \sigma^2 + \gamma^2 \tag{13}$$

$$\hat{V} = \frac{1}{T} \sum_{t=1}^{T} (y_t - \hat{\mu})^2. \tag{14}$$

Then, $\hat{\gamma}^2 = \hat{V} - \hat{\sigma}^2$ if it is positive; otherwise, set $\hat{\gamma}^2 = \delta \hat{V}$, where $\delta$ is a very small positive value.
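The estimators (11), (12), and (14) and the posterior update (9) and (10) can be sketched in a few lines of Python. This is an illustrative sketch under the normal-normal assumptions stated above, not the authors' code; the function names `estimate_parameters` and `posterior` and the default `delta=1e-6` are assumptions chosen for the example.

```python
def estimate_parameters(y, delta=1e-6):
    """Estimate (mu, sigma^2, gamma^2) from y_1..y_T per (11), (12), (14).

    delta is the 'very small positive value' used when V_hat - sigma2_hat
    is not positive; its default here is an assumption.
    """
    T = len(y)
    mu_hat = sum(y) / T                                              # (11)
    sigma2_hat = sum((y[t] - y[t - 1]) ** 2 / 2
                     for t in range(1, T)) / (T - 1)                 # (12)
    v_hat = sum((yt - mu_hat) ** 2 for yt in y) / T                  # (14)
    gamma2_hat = v_hat - sigma2_hat
    if gamma2_hat <= 0:
        gamma2_hat = delta * v_hat
    return mu_hat, sigma2_hat, gamma2_hat

def posterior(y_t, mu, sigma2, gamma2):
    """Posterior mean and variance of mu_t given one observation y_t."""
    w = sigma2 / (gamma2 + sigma2)        # weight of the prior mean
    post_mean = w * mu + (1 - w) * y_t    # (9)
    post_var = gamma2 * w                 # (10)
    return post_mean, post_var
```

For example, `estimate_parameters([1, 2, 3, 4])` gives a mean of 2.5, a sample-error variance of 0.5, and a prior variance of 0.75, and `posterior(2.0, 0.0, 1.0, 1.0)` weights the prior mean and the new sample equally. A constant series gives zero for both variances, a case this sketch does not guard against.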
finishing the collection of the first half of the data in one step. The data are used as the model input to predict the curve of the second half of the data. If the CUSUM chart shows that the predicted curve is out of control, the in-line prediction module triggers the prealarm mechanism; otherwise, the predicted remaining process is in control. The alarm confirmation stage is triggered by finishing the collection of the remaining data. If the CUSUM chart shows that the actual curve of the remaining data is out of control, the in-line prediction module issues an alarm confirmation, which confirms the previous stage and triggers a troubleshooting process. Typically, the proposed ILPM is built by recipe, by step, and by sensor, but it can be generalized based on a grouping technique (e.g., recipe group).

Fig. 4 shows the proposed ILPM framework with its two phases, PPM and EPM. Three modules are developed: off-line training, in-line prediction, and concept drift. The off-line training module trains the prediction model and estimates the TB. The in-line prediction module predicts the remaining part of the sequential data and calculates the CUSUM for the prealarm and alarm confirmation. The concept drift module updates the prediction model by detecting the bias of the prediction model and identifying the root cause (i.e., model misalignment or equipment misalignment) through Bayesian aging monitoring.

The steps of the proposed two-phase framework shown in Fig. 4 are explained as follows.

Off-Line Training Module:

Step 1.1: In-line data such as parameters, SVIDs, or sensor values for one step in one piece of equipment during the production process are collected from the manufacturing execution system (MES), FDC, or recipe management system (RMS).

Steps 1.2–1.3: Data preprocessing merges different data tables, removes outliers, imputes missing values, smooths the noise, and then divides the cleaned data into a training data set and a testing data set for model training.

$$\mathrm{MAPE} = \frac{100\%}{mn} \sum_{i=1}^{m} \sum_{t=n/2}^{n} \left| \frac{s_t - \hat{s}_t}{s_t} \right|. \tag{16}$$

In-Line Prediction Module:

Steps 2.1–2.3: For one new sample, after collecting the first-half sequence of the sensor values, the ILPM model predicts the second-half output sequence. In the prealarm stage, these steps calculate and cumulate the distances between the predicted values and the TB for CUSUM monitoring; i.e., this is an ILPM by predicted CUSUM.

Steps 2.4–2.6: If the predicted CUSUM chart shows out of control, the equipment triggers a prealarm, and engineers validate the alarm and troubleshoot. These steps keep collecting data until the end of the process step, whether the CUSUM shows in control or out of control.

Steps 2.7–2.9: After collecting the remaining second-half sequence of the sensor values, in the alarm-confirmation stage, these steps calculate and cumulate the distances between the actual values and the TB for CUSUM monitoring; i.e., this is real monitoring by actual CUSUM. If the actual CUSUM shows out of control, the equipment triggers an alarm, and engineers need to validate the alarm and troubleshoot.

Step 2.10: The consistency between the predicted CUSUM and the actual CUSUM is investigated. Four cases are discussed here. In two cases, either the predicted CUSUM prealarms and then the actual CUSUM alarms, or the predicted CUSUM does not prealarm and then the actual CUSUM does not alarm. These two cases show consistency between the predicted result and the actual result, and thus, no correction is needed for the ILPM prediction model. In the other two cases, either the predicted CUSUM prealarms but the actual CUSUM does not alarm (i.e., false alarm), or the predicted CUSUM does not prealarm but the actual CUSUM alarms (i.e., missing alarm). These two types of errors usually imply misalignment between the prediction model and the actual sensor values. Go to Step 3.1 for further investigation by the concept drift module.
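The prealarm, alarm-confirmation, and consistency checks of Steps 2.1–2.10 can be sketched as follows. The CUSUM form used here is an assumption: a standard one-sided tabular CUSUM on the absolute distance to the TB, with a hypothetical allowance `k` and decision limit `h`. The framework's own distance measure and control limits may differ, and the function names are illustrative only.

```python
def cusum_out_of_control(values, baseline, k, h):
    """One-sided tabular CUSUM on the distance between values and the TB.

    Assumed form: C_t = max(0, C_{t-1} + |v_t - TB_t| - k); signal if C_t > h.
    k (allowance) and h (decision limit) are hypothetical tuning parameters.
    """
    c = 0.0
    for v, b in zip(values, baseline):
        c = max(0.0, c + abs(v - b) - k)
        if c > h:
            return True
    return False

def consistency_case(prealarm, alarm):
    """Classify the four cases of Step 2.10."""
    if prealarm and alarm:
        return "consistent: alarm confirmed"
    if not prealarm and not alarm:
        return "consistent: in control"
    if prealarm and not alarm:
        return "false alarm"      # go to Step 3.1
    return "missing alarm"        # go to Step 3.1

def monitor_second_half(predicted, actual, baseline, k, h):
    """Run the prealarm stage on predictions, then confirm on actual values."""
    prealarm = cusum_out_of_control(predicted, baseline, k, h)
    alarm = cusum_out_of_control(actual, baseline, k, h)
    return prealarm, alarm, consistency_case(prealarm, alarm)
```

In use, `predicted` would be the model output for the second half of the step and `actual` the sensor values collected afterward; a "false alarm" or "missing alarm" result is what hands control to the concept drift module.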
Concept Drift Module:

Steps 3.1–3.2: These steps calculate the loss between the predicted and actual values. The loss is measured by the distance and cumulated for CUSUM monitoring; that is, the CUSUM chart monitors the prediction accuracy of the current ILPM model. Note that the loss can be defined in different forms, e.g., using the squared error as the distance with a quadratic penalty.

Steps 3.3–3.4: If the cumulative loss is over the control limit, then the error between the predicted value and the actual value is significant, and we need to investigate the root cause of the error (go to Step 3.5); that is, the loss is substantially affected by either the bias of the predicted value (i.e., prediction model misalignment due to a new sample bringing new information different from the past) or the bias of the actual value (equipment misalignment due to aging or wear-out). Otherwise, the prediction model is in control and applicable to the process for continuous monitoring (go to Step 3.4).

Steps 3.5–3.6: EPM is developed to investigate two things: remaining useful life (RUL) and process drift. For RUL, the predictive maintenance (PdM) model can be applied, and if the RUL exceeds a predetermined threshold, then the equipment or part is out of control; go to Step 3.8 for equipment maintenance due to worn-out equipment or part deterioration. Otherwise, the lifetime is in control; go to Step 3.7. Since RUL and PdM are outside the scope of this study, see [38] and [39] for details. We focus on identifying sudden and significant aging process drift by Bayesian monitoring.

Step 3.7: Implement Bayesian monitoring of the equipment aging process with respect to the mechanical parameter. In long-term operation, the equipment parameter mean may change over time. However, the estimate of the prior mean uses all the data points, which gives an inaccurate estimate of the equipment parameter because the process mean is weighed down by old data. By incorporating a weighting factor into the Bayesian approach, the estimation of the parameters can be rewritten as

$$\hat{\mu} = \frac{\sum_{t=1}^{T} \lambda^{T-t} y_t}{\sum_{t=1}^{T} \lambda^{T-t}} \tag{17}$$

$$\hat{\sigma}^2 = \frac{\sum_{t=2}^{T} \lambda^{T-t} \frac{(y_t - y_{t-1})^2}{2}}{\sum_{t=2}^{T} \lambda^{T-t}} \tag{18}$$

$$\hat{V} = \frac{\sum_{t=1}^{T} \lambda^{T-t} (y_t - \hat{\mu})^2}{\sum_{t=1}^{T} \lambda^{T-t}} \tag{19}$$

where $\lambda^{T-t}$ is the weighting coefficient for each time $t$, and $\lambda$ is a hyperparameter with $0 < \lambda < 1$. To weight away data older than $m$ observations, set $\lambda = \delta^{1/m}$, where $\delta$ is a very small value. For example, let $\delta = 0.0001$; if the number of observations to be weighted away is 100, 200, and 300, then $\lambda$ is 0.912, 0.955, and 0.970, respectively. Comparing the estimated process mean $\hat{\mu}_t$ and variance $\hat{\sigma}^2$ with
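The weighted estimators (17)–(19) and the choice of the forgetting factor can be illustrated with a short Python sketch. The function names `forgetting_factor` and `weighted_estimates` are hypothetical, and the code is only a direct transcription of the formulas, not the authors' implementation.

```python
def forgetting_factor(m, delta=1e-4):
    """lambda = delta**(1/m): data m observations old get weight delta."""
    return delta ** (1.0 / m)

def weighted_estimates(y, lam):
    """Exponentially weighted estimates of mu, sigma^2, and V per (17)-(19).

    y[i] holds y_t for t = i + 1, so w[i] = lam**(T - t) and the newest
    observation gets weight 1.
    """
    T = len(y)
    w = [lam ** (T - t) for t in range(1, T + 1)]
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)             # (17)
    sigma2_hat = sum(w[i] * (y[i] - y[i - 1]) ** 2 / 2
                     for i in range(1, T)) / sum(w[1:])                # (18)
    v_hat = sum(wi * (yi - mu_hat) ** 2
                for wi, yi in zip(w, y)) / sum(w)                      # (19)
    return mu_hat, sigma2_hat, v_hat
```

Rounding `forgetting_factor(100)`, `forgetting_factor(200)`, and `forgetting_factor(300)` to three decimals reproduces the values 0.912, 0.955, and 0.970 quoted above. As in the unweighted case, the prior variance would then be taken as the weighted total variance minus the weighted sample-error variance when that difference is positive.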
Fig. 7. Good product plots. (a) Distance measure between actual values and TB. (b) Actual CUSUM plot.
Fig. 9. Defective product plots. (a) Before ILPM model retraining. (b) After ILPM model retraining.
R EFERENCES [25] K. Cho et al., “Learning phrase representations using RNN encoder-
decoder for statistical machine translation,” 2014, arXiv:1406.1078.
[1] D. C. Montgomery, Introduction to Statistical Quality Control, 7th ed. [Online]. Available: http://arxiv.org/abs/1406.1078
Hoboken, NJ, USA: Wiley, 2013. [26] S.-Y. Hung, C.-Y. Lee, and Y.-L. Lin, “Data science for delamina-
[2] C. Wang, M. Yang, D. Xu, and H. Wu, “A novel integrated identification tion prognosis and online batch learning in semiconductor assembly
method of model structure and parameters for drive system,” in Proc. process,” IEEE Trans. Compon., Packag., Manuf. Technol., vol. 10, no. 2,
IEEE 27th Int. Symp. Ind. Electron. (ISIE), Jun. 2018, pp. 13–15. pp. 314–324, Feb. 2020.
[3] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to [27] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia,
Statistical Learning: With Applications in R, 1st ed. London, U.K.: “A survey on concept drift adaptation,” ACM Comput. Surv., vol. 46,
Springer, 2013. no. 4, pp. 1–37, Apr. 2014.
[4] C.-F. Chien, W.-C. Wang, and J.-C. Cheng, “Data mining for yield [28] I. Žliobaitė, M. Pechenizkiy, and J. Gama, “An overview of concept drift
enhancement in semiconductor manufacturing and an empirical study,” applications,” in Big Data Analysis: New Algorithms for a New Society,
Expert Syst. Appl., vol. 33, no. 1, pp. 192–198, Jul. 2007. N. Japkowicz and J. Stefanowski, Eds. Cham, Switzerland: Springer,
[5] T. Khawaja and G. Vachtsevanos, “A novel architecture for on-line 2016, pp. 91–114.
failure prognosis using probabilistic least squares support vector regres- [29] Hecht-Nielsen, “Theory of the backpropagation neural network,” in
sion machines,” in Proc. Annu. Conf. Prognostics Health Manage. Soc., Proc. Int. Joint Conf. Neural Netw., Jun. 1989, pp. 593–611.
San Diego, CA, USA, 2009, pp. 1–8. [30] Q. P. He and J. Wang, “Fault detection using the k-nearest neighbor rule
[6] E. Zio and G. Peloni, “Particle filtering prognostic estimation of the for semiconductor manufacturing processes,” IEEE Trans. Semicond.
remaining useful life of nonlinear components,” Rel. Eng. Syst. Saf., Manuf., vol. 20, no. 4, pp. 345–354, Nov. 2007.
vol. 96, no. 3, pp. 403–409, Mar. 2011. [31] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
[7] F. Jia, Y. Lei, J. Lin, X. Zhou, and N. Lu, “Deep neural networks: A Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
promising tool for fault characteristic mining and intelligent diagnosis [32] M. Iwasaki, M. Miwa, and N. Matsui, “GA-based evolutionary identi-
of rotating machinery with massive data,” Mech. Syst. Signal Process., fication algorithm for unknown structured mechatronic systems,” IEEE
vols. 72–73, pp. 303–315, May 2016. Trans. Ind. Electron., vol. 52, no. 1, pp. 300–305, Feb. 2005.
[8] C.-Y. Lee and T.-L. Tsai, “Data science framework for variable selection, metrology prediction, and process control in TFT-LCD manufacturing,” Robot. Comput.-Integr. Manuf., vol. 55, pp. 76–87, Feb. 2019.
[9] W. Huang, J. Cheng, and Y. Yang, “Rolling bearing fault diagnosis and performance degradation assessment under variable operation conditions based on nuisance attribute projection,” Mech. Syst. Signal Process., vol. 114, pp. 165–188, Jan. 2019.
[10] S.-C. Hsu and C.-F. Chien, “Hybrid data mining approach for pattern extraction from wafer bin map to improve yield in semiconductor manufacturing,” Int. J. Prod. Econ., vol. 107, no. 1, pp. 88–103, May 2007.
[11] C.-W. Liu and C.-F. Chien, “An intelligent system for wafer bin map defect diagnosis: An empirical study for semiconductor manufacturing,” Eng. Appl. Artif. Intell., vol. 26, nos. 5–6, pp. 1479–1486, May 2013.
[12] K. Paynabar, J. Jin, and M. Pacella, “Monitoring and diagnosis of multichannel nonlinear profile variations using uncorrelated multilinear principal component analysis,” IIE Trans., vol. 45, no. 11, pp. 1235–1247, Nov. 2013.
[13] K. Liu, N. Z. Gebraeel, and J. Shi, “A data-level fusion model for developing composite health indices for degradation modeling and prognostic analysis,” IEEE Trans. Autom. Sci. Eng., vol. 10, no. 3, pp. 652–664, Jul. 2013.
[14] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York, NY, USA: Springer-Verlag, 2008.
[15] W. A. Shewhart, Economic Control of Quality of Manufactured Product, 1st ed. New York, NY, USA: D. Van Nostrand Company, Inc, 1931.
[16] E. S. Page, “Continuous inspection scheme,” Biometrika, vol. 41, nos. 1–2, pp. 100–115, 1954.
[17] C. A. Lowry, W. H. Woodall, C. W. Champ, and S. E. Rigdon, “A multivariate exponentially weighted moving average control chart,” Technometrics, vol. 34, no. 1, pp. 46–53, 1992.
[18] D. C. Montgomery and C. M. Mastrangelo, “Some statistical process control methods for autocorrelation,” J. Qual. Technol., vol. 23, no. 3, pp. 173–193, 1991.
[19] M. A. Mahmoud, P. A. Parker, W. H. Woodall, and D. M. Hawkins, “A change point method for linear profile data,” Qual. Rel. Eng. Int., vol. 23, no. 2, pp. 247–268, 2007.
[20] W. H. Woodall, D. J. Spitzner, D. C. Montgomery, and S. Gupta, “Using control charts to monitor process and product quality profiles,” J. Qual. Technol., vol. 36, no. 3, pp. 309–320, Jul. 2004.
[21] H. Hotelling, “The generalization of Student’s ratio,” Ann. Math. Statist., vol. 2, no. 3, pp. 360–378, 1931.
[22] C.-Y. Lee and Z.-H. Dong, “Hierarchical equipment health index framework,” IEEE Trans. Semicond. Manuf., vol. 32, no. 3, pp. 267–276, Aug. 2019.
[23] F.-T. Cheng, H.-C. Huang, and C.-A. Kao, “Developing an automatic virtual metrology system,” IEEE Trans. Autom. Sci. Eng., vol. 9, no. 1, pp. 181–188, Jan. 2012.
[24] G. Verdier and A. Ferreira, “Adaptive Mahalanobis distance and k-nearest neighbor rule for fault detection in semiconductor manufacturing,” IEEE Trans. Semicond. Manuf., vol. 24, no. 1, pp. 59–68, Feb. 2011.
[33] A. Chehade and Z. Shi, “Sensor fusion via statistical hypothesis testing for prognosis and degradation analysis,” IEEE Trans. Autom. Sci. Eng., vol. 16, no. 4, pp. 1774–1787, Oct. 2019.
[34] J. Snoek, H. Larochelle, and R. P. Adams, “Practical Bayesian optimization of machine learning algorithms,” in Proc. 25th Int. Conf. Neural Inf. Process. Syst., Lake Tahoe, NV, USA, vol. 2, Dec. 2012, pp. 2951–2959.
[35] J. Z. Kolter and M. A. Maloof, “Dynamic weighted majority: An ensemble method for drifting concepts,” J. Mach. Learn. Res., vol. 8, pp. 2755–2790, Dec. 2007.
[36] C. J. Feltz and J.-J.-H. Shiau, “Statistical process monitoring using an empirical Bayes multivariate process control chart,” Qual. Rel. Eng. Int., vol. 17, no. 2, pp. 119–124, 2001.
[37] B. M. Colosimo and E. Del Castillo, Bayesian Process Monitoring, Control and Optimization. London, U.K.: Chapman & Hall, 2006.
[38] J. Lee, F. Wu, W. Zhao, M. Ghaffari, L. Liao, and D. Siegel, “Prognostics and health management design for rotary machinery systems—Reviews, methodology and applications,” Mech. Syst. Signal Process., vol. 42, nos. 1–2, pp. 314–334, Jan. 2014.
[39] C.-Y. Lee, T.-S. Huang, M.-K. Liu, and C.-Y. Lan, “Data science for vibration heteroscedasticity and predictive maintenance of rotary bearings,” Energies, vol. 12, no. 5, p. 801, Feb. 2019.
[40] Y. Hu and C. Zhao, “Fault diagnosis with dual cointegration analysis of common and specific nonstationary fault variations,” IEEE Trans. Autom. Sci. Eng., vol. 17, no. 1, pp. 237–247, Jan. 2020.

Chia-Yen Lee (Member, IEEE) received the Ph.D. degree in industrial and systems engineering from Texas A&M University, College Station, TX, USA, in 2012.
He is currently the Director and a Professor with the Institute of Manufacturing Information and Systems, National Cheng Kung University (NCKU), Tainan, Taiwan. His research interests include intelligent manufacturing systems, data science, productivity and efficiency analysis, and stochastic optimization. His research works appear in the European Journal of Operational Research, the IEEE TRANSACTIONS ON POWER SYSTEMS, Annals of Operations Research, the IEEE TRANSACTIONS ON ENGINEERING MANAGEMENT, and Applied Soft Computing.
Dr. Lee received the Best Practice Paper Award from the 17th Asia Pacific Industrial Engineering and Management Systems Conference (APIEMS2016), the 2016 Outstanding Young Industrial Engineer Award from the Chinese Institute of Industrial Engineers (CIIE), the 2017 Ta-You Wu Memorial Award of Distinguished Young Scholars from the Ministry of Science and Technology (MOST), Taiwan, the 2018 Kwoh-Ting Li Technology & Literature Lectureships Award of Distinguished Young Scholars from NCKU-Delta Electronics, and the 2019 Feng-Zhang Lu Memorial Medal from the Chinese Management Association.
Chao-Shian Wu received the M.S. degree from the Institute of Manufacturing Information and Systems, National Cheng Kung University, Tainan, Taiwan, in 2019.
His research interests include data science, machine learning, statistical process control, and operations research.
Mr. Wu received the 2019 Master Thesis Award at Information System Session from the Chinese Institute of Industrial Engineers (CIIE).

Yu-Hsin Hung (Member, IEEE) received the B.S. degree in statistics and the M.S. degree in manufacturing information and systems from National Cheng Kung University (NCKU), Tainan, Taiwan, in 2017 and 2019, respectively.
He is currently a Research Assistant with the Institute of Manufacturing Information and Systems, National Cheng Kung University. His research interests include statistics, optimization theory, control theory, intelligent manufacturing systems, and data science.
Mr. Hung received the 2019 Master Thesis Award at Production System Session from the Chinese Institute of Industrial Engineers (CIIE) and membership in the Phi Tau Phi Scholastic Honor Society in 2019.