
1668 IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, VOL. 18, NO. 4, OCTOBER 2021

In-Line Predictive Monitoring Framework


Chia-Yen Lee, Member, IEEE, Chao-Shian Wu, and Yu-Hsin Hung, Member, IEEE

Abstract: Process monitoring, which is used to ensure product quality in semiconductor manufacturing, develops a control chart and alerts engineers whenever a control limit is exceeded. However, a late alarm could generate defects or scrap, and thus, a prealarm is urgently needed. This study proposes an in-line predictive monitoring (ILPM) framework that uses process parameter monitoring (PPM) in the first phase and equipment parameter monitoring (EPM) in the second phase. PPM includes off-line training and in-line prediction, where we collect the first half of time-series data and predict the second half for in-line quality control. To maintain robustness and accuracy, the concept drift is used to update the ILPM model in real time; specifically, the EPM detects the loss of prediction by a cumulative sum (CUSUM) control chart or detects the change of equipment parameters by a Bayesian approach for developing a retraining mechanism. An empirical study of a semiconductor manufacturer indicates that the proposed ILPM framework improved both the quality control and production capacity.

Note to Practitioners: Although some in-line monitoring approaches have been proposed for specific conditions, little research has been done to develop a prealarm method; in particular, this study proposes predictive monitoring by using data from multiple sensors (or status variable identifications, SVIDs) to predict one sensor in one process step of a semiconductor manufacturer. The prealarm benefits early troubleshooting and equipment capacity. This study also applies concept drift and equipment parameter monitoring (EPM) to identify prediction model misalignment or equipment misalignment. In practice, root cause identification of the prediction error is critical and benefits model retraining and equipment maintenance.

Index Terms: Bayesian monitoring, concept drift, failure prognosis, gated recurrent unit (GRU), in-line prediction.

Manuscript received February 2, 2020; revised May 29, 2020; accepted July 27, 2020. Date of publication August 14, 2020; date of current version October 6, 2021. This article was recommended for publication by Associate Editor L. Moench and Editor F.-T. Cheng upon evaluation of the reviewers' comments. This work was supported by the Ministry of Science and Technology, Taiwan, under Grant MOST106-2218-E-031-001. (Corresponding author: Chia-Yen Lee.)
The authors are with the Institute of Manufacturing Information and Systems, National Cheng Kung University, Tainan 70101, Taiwan (e-mail: [email protected]).
Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TASE.2020.3014177
1545-5955 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: National Cheng Kung Univ.. Downloaded on November 02,2022 at 06:41:18 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

SEMICONDUCTOR manufacturing involves capital-intensive and complex manufacturing network processes. To achieve high quality and high throughput, a large number of process parameters, called SVIDs, such as temperature, pressure, current, and voltage, are collected in real time. A control chart [1] monitors the SVIDs and sounds an alarm when a machine abnormality is detected or the product quality is flawed. As advances in technology allow semiconductor manufacturers to collect more SVIDs, interaction effects among the parameters become more complicated. Process engineers have identified the need for prealarm process prediction schemes that will identify problems faster than the existing technology.

In the literature, data science and machine learning (DS/ML) techniques have proven helpful for process prognosis and fault detection and classification (FDC). The methodology for FDC in DS/ML can be categorized by the type of training task: supervised learning builds a function that maps the inputs to outputs based on observations including input–output pairs, and unsupervised learning identifies unknown patterns in data sets without predetermined labels (i.e., outputs) [3]. The supervised learning algorithms used in FDC include tree-based algorithms [4], [8], probabilistic least squares support vector regression [5], particle filtering [6], deep learning [7], support vector data descriptions [9], and virtual metrology [23]. The unsupervised learning algorithms used include adaptive resonance theory 1 (ART1) [10], spatial independent tests [11], uncorrelated multilinear principal component analysis (UMPCA) [12], fusion models [13], degradation analysis [33], and dual cointegration analysis [40]. Since DS/ML techniques are widely used in FDC, each algorithm has its own advantages and disadvantages [14], and model selection is based on performance metrics [e.g., accuracy, F1-score, and mean squared error (MSE)] and applicable conditions in real applications.

Besides DS/ML techniques, process engineers usually use the metrology measures of control charts or sensors to monitor semiconductor manufacturing. Typically, using the sample mean as the centerline and standard deviations as variability, the upper control limit (UCL) and lower control limit (LCL) can be calculated to effectively control the specification and parameter state. For the univariate control chart, the seminal work was developed by Shewhart [15]. Based on the frequentist point of view with a fixed target value and assuming that the process mean and variance are fixed, i.e., statistical theory with the normal distribution of production data, 99.7% of the data points will fall within the range of three times the standard deviation of the centerline for quality control. However, Shewhart's control chart does not memorize the information from sequential or time-series data, and thus, it is not sensitive to sudden data drift. Later, a cumulative sum (CUSUM) control chart was developed to accumulate the variations of sequential data points and then regulate them [16]. In addition, while other control charts treat the subgroups of samples individually, an exponentially weighted moving average (EWMA) chart was proposed to consider the relation and sequence of subgroup samples by assigning appropriate weights [1], [17]. An EWMA chart monitors the profile of the manufacturing process, shows the different importance of all prior data points, and weights the most recent samples highly. A variant of the EWMA chart, called moving centerline EWMA (MCEWMA), considers the residuals with varying control limits and fits an EWMA to the univariate autocorrelated data to minimize the one-step-ahead prediction error [18]. Furthermore, change point detection was developed for profile monitoring with a likelihood ratio statistic to detect the location and magnitude of the shifts in linear profiles [19], particularly, the parameter changes in intercept, slope, and error variance, respectively. For a comprehensive review of profile control charts used to monitor the manufacturing process, see [20].

For semiconductor manufacturing, control chart techniques provide better precision control by recipe. A recipe is a specific set of SVIDs with the necessary steps and tool settings. Only multivariate control charts can handle the large numbers of SVIDs collected with today's technology. The most popular is Hotelling's T-squared statistic, which considers the covariance among variables [21], [22]; however, it is limited to the Gaussian distribution, which is rare in practice. Nonparametric control charts are also used, such as the k-nearest neighbor detection evaluating the cumulative distance of an observation to its k-nearest neighbors in the learning samples under control without distribution assumption [30], or the adaptive Mahalanobis distance considering the local covariance structure among variables with a different covariance matrix obtained on the nearest neighbors of each observation [24]. However, these methods were developed in the literature for either off-line or metrology analysis, and thus, the fault detection or alarms could be late, arriving only after defective wafers have been generated or a wafer has run several steps.

Motivated by the need to improve real-time monitoring in semiconductor manufacturing, this study develops a comprehensive in-line predictive monitoring (ILPM) framework embedded with a prealarm system using DS/ML so that an anomaly can be identified and corrected earlier than allowed by the existing technology. The proposed ILPM framework consists of a two-phase monitoring scheme. In phase 1, the process parameter monitoring (PPM) collects the first half of the data sequence of multiple process parameters (also called SVIDs/sensors hereafter) in one process step to predict the second half of the data with respect to a single sensor. This study uses the gated recurrent unit (GRU) [25] and suggests the CUSUM chart for predictive monitoring. However, the prediction bias may rise later. One reason is model misalignment, which indicates an obsolete prediction model (i.e., bias of predicted value), and the other reason is equipment misalignment, which indicates sensor error (i.e., bias of actual value) due to equipment deterioration in long-term operation. Thus, the concept drift is used to detect the changes in the data pattern over time and retrain the prediction model. In phase 2, the equipment parameter monitoring (EPM) detects the lifetime of parts and sudden changes in mechanical parameters such as stiffness, damping, or vibration by Bayesian monitoring. A comprehensive EPM procedure is developed by continuously updating and retraining the PPM prediction model to catch up with the updated information of new observations collected in the process step [27], [28]. Therefore, this study contributes to the monitoring literature, develops an ILPM framework for prealarm of PPM in real time, and integrates the EPM for model retraining by concept drift.

The remainder of this article is organized as follows. Section II introduces the fundamentals and methodologies of the ILPM framework. Section III presents the proposed two-phase ILPM framework including PPM and EPM, and we develop an off-line training module, an in-line prediction module, and a concept drift module. Section IV discusses the results of validating the proposed framework with an empirical study of a semiconductor assembly manufacturer. Section V presents the conclusion.

II. FUNDAMENTALS AND METHODOLOGIES

This section describes the fundamental DS/ML techniques and control charts. They are GRU, CUSUM, target baseline (TB), concept drift, and Bayesian monitoring. These techniques are used for the ILPM framework and the model retraining mechanism.

A. Gated Recurrent Units

The traditional artificial neural network (ANN) was developed with a multilayer feedforward structure or backpropagation mechanism [29]. Each input of an ANN is independent, and there is no memory function in the network structure. Hence, an ANN cannot handle complicated sequential problems, such as linguistic materials, self-correlated signals, or time-series data. The recurrent neural network (RNN), with a memory function and a storage unit, was developed. Each input can be temporarily used as the available input of the next neuron. Long short-term memory (LSTM) is a type of RNN often used for time-series data sets [31]. The LSTM structure consists of an input gate, a forget gate, and an output gate, and each gate is determined as ON or OFF in the model training process. To reduce the parameters used in LSTM, GRU was developed. A reset gate determines how to integrate a new input with the previous memory, and an update gate determines how much of the previous memory to retain [25].

The GRU structure has several variants; the following describes a typical variant. Let $s_t$ be the output state vector, $x_t$ be the input vector, $z_t$ be the update gate vector, and $r_t$ be the reset gate vector. Let $W_z$, $U_z$, and $b_z$ be the parametric matrices and intercept vector of the update gate. Let $W_r$, $U_r$, and $b_r$ be the parametric matrices and intercept vector of the reset gate. Let $W_s$, $U_s$, and $b_s$ be the parametric matrices and intercept vector for calculating the state candidate. Notation $\phi$ represents a sigmoid function. The operator $\circ$ denotes the Hadamard product. The state candidate $\tilde{s}_t$ is computed with the hyperbolic tangent function. GRU is formulated in the following equations:

$$z_t = \phi(W_z x_t + U_z s_{t-1} + b_z) \tag{1}$$
$$r_t = \phi(W_r x_t + U_r s_{t-1} + b_r) \tag{2}$$
$$\tilde{s}_t = \tanh\left(W_s x_t + U_s (r_t \circ s_{t-1}) + b_s\right) \tag{3}$$
$$s_t = z_t \circ s_{t-1} + (1 - z_t) \circ \tilde{s}_t. \tag{4}$$

Fig. 1 shows the GRU. The update gate is formulated by (1) and forgets some information from the previous state, where $z_t$ determines how much of the previous memory to retain. The reset gate integrates the new input with the previous memory by (2) and generates the state candidate by (3). The new output state $s_t$ is calculated by (4).

Fig. 1. GRU.

B. CUSUM Control Chart

As mentioned earlier, Shewhart control charts do not memorize the information from sequential or time-series data, i.e., they are not sensitive to sudden data drift and small variations [16]. Therefore, another control chart, called CUSUM, was designed to overcome these limitations by calculating the cumulative sums of the deviations of the sample mean from a target value. Collect $m$ samples, each of size $n$, and compute the mean $\bar{x}_i$ of each sample. $\mu_0$ is the target for the process mean, and $C_m$ is the CUSUM including the $m$ samples. CUSUM is formulated in the following equation:

$$C_m = \sum_{i=1}^{m} (\bar{x}_i - \mu_0). \tag{5}$$

If the process remains in control centered at $\mu_0$, the CUSUM plot will show variations in a random pattern centered around zero. If the process mean shifts upward, the $C_m$ value will gradually drift upward, and vice versa if the process mean shifts downward.

To design the control limit of CUSUM, a tabular CUSUM [1] was developed, which accumulates deviations from the target value $\mu_0$ that are above target with one statistic $C_m^{+}$ (called the one-sided upper CUSUM) and those that are below target with another statistic $C_m^{-}$ (called the one-sided lower CUSUM). They are formulated as follows, where the starting values are $C_0^{+} = C_0^{-} = 0$:

$$C_m^{+} = \max\left[0,\; x_i - (\mu_0 + K) + C_{m-1}^{+}\right] \tag{6}$$
$$C_m^{-} = \max\left[0,\; (\mu_0 - K) - x_i + C_{m-1}^{-}\right]. \tag{7}$$

The parameter $K$, usually called the reference value, is half the distance between the target value $\mu_0$ (from the normal samples) and the out-of-control mean $\mu_1$ (from the abnormal samples). That is, $K = k|\mu_1 - \mu_0|$, where $k = 0.5$. If either $C_m^{+}$ or $C_m^{-}$ exceeds the decision interval $H$, the process is considered to be out of control. The smaller the reference value $K$, the smaller the mean shift detected in the process. An appropriate value for $H$ is four or five times the process standard deviation $\sigma$ [1]. Based on the normal distribution assumption, the average run length (ARL) is typically used to present the average number of points before an out-of-control point is shown in the process. $\mathrm{ARL}_0$ depends on the choices of both $H$ and $k$. For example, $H = 4\sigma$ yields an in-control $\mathrm{ARL}_0 = 168$ samples, whereas $H = 5\sigma$ yields an in-control $\mathrm{ARL}_0 = 465$ samples. In fact, $H = 4.77\sigma$ yields $\mathrm{ARL}_0 = 370$, which matches the $\mathrm{ARL}_0$ value for a Shewhart control chart with $3\sigma$ limits, i.e., $\mathrm{ARL}_0 = (1/0.0027) = 370$. In particular, $H = 5\sigma$ and $K = \sigma/2$ in CUSUM show a good ARL property when it is desired to detect a shift of about $1\sigma$ in the process mean.

C. Target Baseline

For one step in the manufacturing process, the sequential data of equipment SVIDs or sensor values can be monitored and collected immediately. These time-series samples may show some variations with noise. To extract the golden pattern of the time-series data points from the good products, several samples are collected randomly, and the median of the sensor values is calculated to obtain the centerline as shown in Fig. 2(a). The centerline is built using the median to avoid being affected by outliers. Since the time-series data points are not totally independent and may show some autocorrelation, the EWMA is used to eliminate noise and transform the centerline to the TB as shown in Fig. 2(b). The weighting parameter in EWMA controls the amount of influence of the previous observations on the current EWMA, i.e., EWMA weights the samples in geometrically decreasing order so that the most recent samples get higher weights, whereas the farthest samples contribute very little. Thus, the proposed CUSUM calculates the cumulative sums of the deviations of the sample mean from the TB in this study.

Fig. 2. Data transformation from samples to TB. (a) Centerline. (b) TB.

D. Concept Drift

In DS/ML, how to ensure that the prediction model can keep up with new information and provide an updated analysis is an important issue. Traditional supervised learning models usually assume that the training data set can identify the features or extract the characteristics from the real settings. In practice, environmental changes reduce prediction accuracy. Concept drift refers to the changes in the conditional distribution of the output, i.e., response variable, given the input features, whereas the distribution of the input may stay unchanged [27]. Therefore, concept drift is used to detect system anomalies by monitoring: 1) the raw data streams; 2) the parameters of the learners; or 3) the prediction errors. For monitoring the raw data streams, detectors can be built by sequential analysis, control charts, monitoring distributions on two different time windows, and contextual approaches [27]. For monitoring the parameters of the learners, the parameters of a system described by mathematical equations, i.e., a model-based system, are estimated, and the detection of changes of the parameters provides information that learners can use to compensate for the control errors. For example, mechanical system identification methods such as genetic algorithms [32] and Bayesian optimization [34] are used to estimate the mechanical parameters which represent the system aging or wear-out by minimizing the gap between numerical analysis and experimental results. For monitoring the prediction errors, an online adaptive learning procedure was developed to update the prediction model by considering the loss function estimated between the actual and predicted values of the new sample [27]. An advanced incremental learning algorithm is able to adapt to the evolution of the data-generating process over time. Two methods are suggested to update the model parameters: 1) by analyzing the marginal errors with the most recently observed samples (i.e., edge retraining) or 2) by adding these most recent samples into the training data set with a fixed/variable time window (i.e., complete retraining). For edge retraining, the dynamic weighted ensemble classifier is popular [35]; each classifier refers to an expert with a weight, and the weights of the classifiers are dynamically adjusted based on the prediction performance over time. For complete retraining, the design of triggering mechanisms and batch size identifies the tradeoff between the model retraining time and prediction accuracy. This study focuses on the off-line complete retraining due to the large amount of data by recipe by step in the semiconductor manufacturing process (i.e., big data). The CUSUM control chart is used for change detection, and then, the model is retrained by adding new observations.

E. Bayesian Monitoring of Equipment Aging Process

Process drift exists in most mechanical systems; in particular, system aging and tool wear may happen in long-term operation. The mechanical parameters ideally change slowly over time, and thus, this study aims to identify the sudden and significant drift of mechanical parameters automatically by updating the varied process mean and variance required to monitor the operational status of the semiconductor manufacturing process. A Bayesian approach takes advantage of an online environment by using previous and current data to update the prior via the likelihood; thus, we use it to estimate the process mean [36], [37]. For the Bayesian process monitoring, we explain three necessary components: the likelihood function, prior distribution, and posterior distribution.

First, the likelihood function presents the sample variability. Let $y_t$ be a continuous aging process characteristic such as the mechanical parameters of equipment. Since sampling errors may occur, assume that $y_t$ follows a normal distribution with mean $\mu_t$ and variance $\sigma^2$ at time $t$, where $\sigma^2$ is an unknown constant and assumed to be the combination of the measurement and sampling errors. Second, the prior distribution presents parameter variability. The characteristics of the parameter mean $\mu_t$ change over time due to system aging, environmental changes, or human interference. Assume that the process mean $\mu_t$ follows a normal distribution with mean $\mu$ and variance $\gamma^2$. Third, the posterior distribution presents the updated parameter. To estimate and update the parameter mean $\mu_t$, the information of the parameter characteristic $y_t$ at time $t$ is used. The probability density function (p.d.f.) of $\mu_t$, given $y_t$, following the Bayes approach is:

$$g(\mu_t \mid y_t) = \frac{f_1(y_t \mid \mu_t)\, f_2(\mu_t)}{\int f_1(y_t \mid \mu_t)\, f_2(\mu_t)\, d\mu_t} \tag{8}$$

where $g(\mu_t \mid y_t)$ is also a univariate normal distribution, called the posterior distribution. The posterior mean $\mu_{pt}$ and posterior variance $\sigma_{pt}^2$, combining the prior information with the data, are written as

$$\mu_{pt} = E(\mu_t \mid y_t) = \frac{\dfrac{\mu}{\gamma^2} + \dfrac{y_t}{\sigma^2}}{\dfrac{1}{\gamma^2} + \dfrac{1}{\sigma^2}} = w\mu + (1 - w)\,y_t \tag{9}$$

$$\sigma_{pt}^2 = \mathrm{Var}(\mu_t \mid y_t) = \frac{1}{\dfrac{1}{\gamma^2} + \dfrac{1}{\sigma^2}} = \gamma^2 w \tag{10}$$

where $w = \sigma^2/(\gamma^2 + \sigma^2)$ is considered as the weight of the prior mean $\mu$. Hence, the posterior mean is the weighted average of the prior mean and the new sample, and the posterior inverse variance is the combination of the prior inverse variance and the inverse of the sample variance.

Assume that the equipment parameter $\mu_t$ is a fixed constant in a small time interval, for example, minutes or hours, so that the $y_t$ can be considered as independent and identically distributed (i.i.d.) with the p.d.f. $f_1(y_t \mid \mu_t)$. Given a data set $y_1, \ldots, y_T$ with a sample size of $T$, the equipment mechanical parameters $\mu$, $\sigma^2$, and $\gamma^2$ can be estimated as follows. The estimate of the overall aging process mean $\mu$ is the double expectation of $y_t$ given $\mu_t$, and the estimator $\hat{\mu}$ can be formulated as follows:

$$\hat{\mu} = \frac{\sum_{t=1}^{T} y_t}{T}. \tag{11}$$

Assume that the correlation between $\mu_t$ and $\mu_{t+1}$ is close to 1, so that the difference between consecutive $y_t$'s can be considered as the sample error. Since $\mathrm{Var}(y_t - y_{t+1}) = 2\sigma^2$, the estimator $\hat{\sigma}^2$ can be formulated as follows:

$$\hat{\sigma}^2 = \frac{\sum_{t=2}^{T} \dfrac{(y_t - y_{t-1})^2}{2}}{T - 1}. \tag{12}$$

To estimate $\gamma^2$, first, estimate the total variance. Assume that there is no covariance between the sample and the process. By the conditional expectation method, the total variance $V$ and its estimator can be formulated as follows:

$$V \equiv \mathrm{Var}(y_t) = E_{\mu_t}\left(\mathrm{Var}(y_t \mid \mu_t)\right) + \mathrm{Var}_{\mu_t}\left(E(y_t \mid \mu_t)\right) = \sigma^2 + \gamma^2 \tag{13}$$

$$\hat{V} = \frac{1}{T}\sum_{t=1}^{T} (y_t - \hat{\mu})^2. \tag{14}$$

Then, $\hat{\gamma}^2 = \hat{V} - \hat{\sigma}^2$ if it is positive; otherwise, set $\hat{\gamma}^2 = \delta \hat{V}$, where $\delta$ is a very small positive value.
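The estimators (11)–(14) and the posterior update (9)–(10) can be illustrated with a minimal Python sketch. The function names `estimate_parameters` and `posterior`, and the default value of `delta`, are assumptions introduced here for illustration and do not come from the article.

```python
def estimate_parameters(y, delta=1e-6):
    """Estimate (mu, sigma2, gamma2) from a series y following (11)-(14).

    delta is the small positive fallback factor; its default is an assumption.
    """
    T = len(y)
    mu_hat = sum(y) / T                                        # (11)
    # (12): half of the mean squared successive difference
    sigma2_hat = sum((y[t] - y[t - 1]) ** 2 / 2 for t in range(1, T)) / (T - 1)
    v_hat = sum((yt - mu_hat) ** 2 for yt in y) / T            # (14)
    gamma2_hat = v_hat - sigma2_hat
    if gamma2_hat <= 0:
        gamma2_hat = delta * v_hat                             # fallback when V - sigma2 is not positive
    return mu_hat, sigma2_hat, gamma2_hat


def posterior(y_t, mu, sigma2, gamma2):
    """Posterior mean and variance of mu_t given y_t, following (9)-(10)."""
    w = sigma2 / (gamma2 + sigma2)          # weight of the prior mean
    return w * mu + (1 - w) * y_t, gamma2 * w
```

For instance, with prior mean 10, $\sigma^2 = 1$, and $\gamma^2 = 3$, a new observation of 12 gives $w = 0.25$, so the posterior mean moves most of the way toward the observation because the parameter variance dominates the sample variance.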

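The tabular CUSUM of (6) and (7) in Section II-B can be sketched in the same way. This is a minimal illustration under stated assumptions: the function name `tabular_cusum` and the choice to return the index of the first out-of-control point are hypothetical, and a scalar target `mu0` is used rather than a time-varying baseline.

```python
def tabular_cusum(x, mu0, K, H):
    """Run the one-sided upper and lower CUSUMs of (6)-(7) over the sequence x.

    Returns the index of the first point at which either statistic exceeds
    the decision interval H, or None if the sequence stays in control.
    """
    c_plus = 0.0    # C_0^+ = 0
    c_minus = 0.0   # C_0^- = 0
    for i, xi in enumerate(x):
        c_plus = max(0.0, xi - (mu0 + K) + c_plus)     # (6)
        c_minus = max(0.0, (mu0 - K) - xi + c_minus)   # (7)
        if c_plus > H or c_minus > H:
            return i
    return None
```

With $\mu_0 = 0$, $K = 0.5$, and $H = 5$ (i.e., $K = \sigma/2$ and $H = 5\sigma$ for $\sigma = 1$), a sustained shift to 2 adds 1.5 to the upper statistic at every point, so the limit is crossed on the fourth shifted observation.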

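A single GRU update following (1)–(4) in Section II-A can be written out with plain Python lists. The helper names `affine` and `gru_step`, and the dictionary layout of the parameters, are assumptions made for this sketch; a practical model would use a deep learning library and trained weights.

```python
import math


def sigmoid(v):
    """Element-wise sigmoid (the function phi in (1)-(2))."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(W, x, U, s, b):
    """Return W x + U s + b, with W and U given as lists of rows."""
    return [
        sum(w * xj for w, xj in zip(W[i], x))
        + sum(u * sj for u, sj in zip(U[i], s))
        + b[i]
        for i in range(len(b))
    ]


def gru_step(x, s_prev, p):
    """One GRU step; p maps the names Wz, Uz, bz, Wr, Ur, br, Ws, Us, bs to parameters."""
    z = sigmoid(affine(p["Wz"], x, p["Uz"], s_prev, p["bz"]))     # update gate (1)
    r = sigmoid(affine(p["Wr"], x, p["Ur"], s_prev, p["br"]))     # reset gate (2)
    gated = [ri * si for ri, si in zip(r, s_prev)]                # Hadamard product r_t with s_{t-1}
    cand = [math.tanh(a)
            for a in affine(p["Ws"], x, p["Us"], gated, p["bs"])]  # state candidate (3)
    return [zi * si + (1 - zi) * ci
            for zi, si, ci in zip(z, s_prev, cand)]               # new state (4)
```

With all weights and intercepts set to zero, both gates equal 0.5 and the state candidate is zero, so the new state is half of the previous state; this gives a quick sanity check of the wiring.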
III. ILPM FRAMEWORK

This section proposes the ILPM framework. We first illustrate the concept of ILPM in Fig. 3. For one piece of manufacturing equipment producing one specific product, the PPM phase monitors each individual process sensor/SVID/parameter (sensor hereafter) in one manufacturing process step. We build an in-line prediction module with a prealarm stage and an alarm-confirm stage. The prealarm stage is triggered by finishing the collection of the first half of the data in one step. The data are used as a model input which predicts the curve of the second half of the data. If the CUSUM chart shows that the predictive curve is out of control, the in-line prediction module triggers the prealarm mechanism; otherwise, the predicted remaining process is in control. The alarm confirmation stage is triggered by finishing the collection of the remaining data. If the CUSUM chart shows that the actual curve of the remaining data is out of control, the in-line prediction module issues an alarm confirmation which confirms the previous stage and triggers a troubleshooting process. Typically, the proposed ILPM is built by recipe by step by sensor, but it can be generalized based on the grouping technique (e.g., recipe group).

Fig. 3. ILPM for one specific equipment sensor.

Fig. 4 shows the proposed ILPM framework including two phases: PPM and EPM. Three modules including off-line training, in-line prediction, and concept drift are developed. The off-line training module is built to train the prediction model and estimate the TB. The in-line prediction module predicts the remaining part of sequential data and calculates the CUSUM for prealarm and alarm confirmation. The concept drift module is used to update the prediction model by detecting the bias of the prediction model and identifying the root cause (i.e., model misalignment or equipment misalignment) through the Bayesian aging monitoring.

The steps of the proposed two-phase framework shown in Fig. 4 are explained as follows.

Off-Line Training Module:

Step 1.1: In-line data such as parameters, SVIDs, or sensor values with respect to one step in one piece of equipment during the production process are collected from the manufacturing execution system (MES), FDC, or recipe management system (RMS).

Steps 1.2–1.3: Data preprocessing merges different data tables, removes the outliers, imputes the missing values, smooths the noise, and then divides the cleaned data into a training data set and a testing data set for model training.

Steps 1.4–1.5: These steps calculate the centerline by taking the median of the sensor values in all good samples, build the EWMA formulation by setting appropriate weights, and then convert the centerline to the TB.

Steps 1.6–1.7: Train the prediction model by the training data set including the pairs of input sequence and output sequence. Supervised learning builds the ILPM model by GRU and applies tenfold cross validation to the testing data set to evaluate the model's prediction accuracy. Here, the performance metrics are MSE and mean absolute percentage error (MAPE), commonly used in DS due to their statistical property and interpretability, respectively [14]. Given $m$ samples each with size $n$, where the first half of the data is for the input sequence and the second half is for the output sequence prediction, MSE is minimized in the training process, and then, MAPE can be calculated for better interpretation. They are defined as follows:

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{t=\frac{n}{2}}^{n} (s_t - \hat{s}_t)^2 \tag{15}$$

$$\mathrm{MAPE} = \frac{100\%}{mn}\sum_{i=1}^{m}\sum_{t=\frac{n}{2}}^{n} \left|\frac{s_t - \hat{s}_t}{s_t}\right|. \tag{16}$$

In-Line Prediction Module:

Steps 2.1–2.3: For one new sample, after collecting the first-half sequence of the sensor values, the ILPM model predicts the second-half output sequence. In the prealarm stage, these steps calculate and cumulate the distances between the predicted values and TB for CUSUM monitoring, i.e., this is an ILPM by predicted CUSUM.

Steps 2.4–2.6: If the predicted CUSUM chart shows out of control, the equipment triggers a prealarm, and engineers validate the alarm and troubleshoot. These steps keep collecting data whether the CUSUM shows in control or out of control until the end of the process step.

Steps 2.7–2.9: After collecting the remaining second-half sequence of the sensor values, in the alarm-confirm stage, these steps calculate and cumulate the distances between the actual values and TB for CUSUM monitoring; i.e., this is real monitoring by actual CUSUM. If the actual CUSUM shows out of control, the equipment triggers an alarm, and engineers need to validate the alarm and troubleshoot.

Step 2.10: The consistency between predicted CUSUM and actual CUSUM is investigated. Four cases are discussed here. In two cases, either the predicted CUSUM prealarms and then the actual CUSUM alarms, or the predicted CUSUM does not prealarm and then the actual CUSUM does not alarm. These two cases show a justified consistency between the predicted result and the actual result, and thus, no correction is needed for the ILPM prediction model. However, in the other two cases, either the predicted CUSUM prealarms but then the actual CUSUM does not alarm (i.e., false alarm), or the predicted CUSUM does not prealarm but then the actual CUSUM alarms (i.e., missing alarm). These two types of errors usually imply misalignments between the prediction model and the actual sensor values. Go to Step 3.1 for further investigation by the concept drift module.
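The prealarm stage, the alarm-confirm stage, and the four cases of Step 2.10 can be sketched as follows. This is a minimal illustration, assuming a tabular CUSUM measured against a per-time-point TB; the function names `cusum_out_of_control` and `check_step`, and the returned labels, are hypothetical.

```python
def cusum_out_of_control(values, tb, K, H):
    """Return True if the tabular CUSUM of values against the TB exceeds H."""
    c_plus = c_minus = 0.0
    for v, target in zip(values, tb):
        c_plus = max(0.0, v - (target + K) + c_plus)
        c_minus = max(0.0, (target - K) - v + c_minus)
        if c_plus > H or c_minus > H:
            return True
    return False


def check_step(predicted, actual, tb, K, H):
    """Compare the predicted CUSUM (prealarm) with the actual CUSUM (alarm)."""
    prealarm = cusum_out_of_control(predicted, tb, K, H)   # Steps 2.1-2.6
    alarm = cusum_out_of_control(actual, tb, K, H)         # Steps 2.7-2.9
    if prealarm == alarm:
        label = "consistent"          # no correction of the model needed
    elif prealarm:
        label = "false alarm"         # prealarm without a confirmed alarm
    else:
        label = "missing alarm"       # confirmed alarm without a prealarm
    return prealarm, alarm, label
```

Only the two inconsistent labels would hand the sample over to the concept drift module for further investigation.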

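Steps 1.4–1.5 of the off-line training module (median centerline, then EWMA smoothing into the TB) can be sketched as below. The function name `target_baseline`, the default EWMA weight of 0.2, and the initialization of the EWMA at the first centerline value are assumptions for illustration; the article does not state these choices.

```python
from statistics import median


def target_baseline(samples, weight=0.2):
    """Build the centerline and TB from good samples of equal length.

    samples: list of time series (one list of sensor values per good sample).
    weight:  EWMA weight given to the current centerline value (assumed default).
    """
    # Centerline: point-wise median across samples, robust to outliers
    centerline = [median(column) for column in zip(*samples)]
    # TB: EWMA of the centerline, started at its first value (an assumption)
    tb = [centerline[0]]
    for c in centerline[1:]:
        tb.append(weight * c + (1 - weight) * tb[-1])
    return centerline, tb
```

A larger weight makes the TB follow the centerline more closely, while a smaller weight gives more smoothing, which mirrors the role of the weighting parameter described for the EWMA.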

Fig. 4. ILPM DS framework.

Concept Drift Module:
Steps 3.1–3.2: These steps calculate the loss between the predicted and actual values. The loss is measured as a distance and accumulated for CUSUM monitoring; that is, the CUSUM chart monitors the prediction accuracy of the current ILPM model. The loss can take different forms, e.g., the squared error, which imposes a quadratic penalty on the distance.
Steps 3.3–3.4: If the cumulative loss exceeds the control limit, the error between the predicted and actual values is significant, and its root cause must be investigated (go to Step 3.5). In that case, the loss is substantially affected either by bias in the predicted value (i.e., prediction model misalignment, because a new sample brings information that differs from the past) or by bias in the actual value (i.e., equipment misalignment due to aging or wear-out). Otherwise, the prediction model is in control and remains applicable to the process for continuous monitoring (go to Step 3.4).
Steps 3.5–3.6: EPM is developed to investigate two things: remaining useful life (RUL) and process drift. For RUL, a predictive maintenance (PdM) model can be applied. If the RUL exceeds a predetermined threshold, the equipment or part is out of control; go to Step 3.8 for equipment maintenance due to worn-out equipment or part deterioration. Otherwise, the lifetime is in control; go to Step 3.7. Since RUL and PdM are outside the scope of this study, see [38] and [39] for details. We focus on identifying the sudden and significant aging process drift by Bayesian monitoring.
Step 3.7: Implement Bayesian monitoring of the equipment aging process with respect to the mechanical parameter. In long-term operation, the equipment parameter mean may change over time. However, the estimate of the prior mean uses all the data points, which gives an inaccurate estimate of the equipment parameter because the process mean is weighed down by old data. By incorporating a weighting factor into the Bayesian approach, the parameter estimates can be rewritten as

$$\hat{\mu} = \frac{\sum_{t=1}^{T} \lambda^{T-t} y_t}{\sum_{t=1}^{T} \lambda^{T-t}} \tag{17}$$

$$\hat{\sigma}^2 = \frac{\sum_{t=2}^{T} \lambda^{T-t} (y_t - y_{t-1})^2}{2 \sum_{t=2}^{T} \lambda^{T-t}} \tag{18}$$

$$\hat{V} = \frac{\sum_{t=1}^{T} \lambda^{T-t} (y_t - \hat{\mu})^2}{\sum_{t=1}^{T} \lambda^{T-t}} \tag{19}$$

where $\lambda^{T-t}$ is the weight of the observation at time $t$, and $\lambda$ is a hyperparameter with $0 < \lambda < 1$. If observations older than $m$ periods are to be weighted away, set $\lambda = \delta^{1/m}$, where $\delta$ is a very small value. For example, with $\delta = 0.0001$ and $m$ = 100, 200, and 300 observations to be weighted away, $\lambda$ is 0.912, 0.955, and 0.970, respectively. Comparing the estimated process mean $\hat{\mu}_t$ and variance $\hat{\sigma}^2$ with


the parameter mean $\mu_{pt}$ of the updated posterior distribution, the estimated parameter characteristics represent the long-term changes of aging, and the posterior parameter mean $\mu_{pt}$ represents the instant drift of the mechanical parameter in a more immediate situation. The posterior mean captures parameter drift rapidly because its weights are based on the parameter variance and the sample variance: a larger parameter variance makes the posterior put more weight on the new samples. In online parameter monitoring, the hypothesis test (20) determines whether the posterior mean belongs to the long-term estimated distribution. Note that a smoothing hyperparameter $\lambda$ is generally selected to estimate the equipment parameter mean and to construct the posterior distribution

$$H_0: \hat{\mu}_{pt} = \mu_t, \qquad H_1: \hat{\mu}_{pt} \neq \mu_t. \tag{20}$$

Fig. 5. Good product plots. (a) Distance measure between predicted values and TB. (b) Predicted CUSUM plot.

Fig. 6. Defective product plots. (a) Distance measure between predicted values and TB. (b) Predicted CUSUM plot.

Steps 3.8–3.9: If the Bayesian monitoring of the equipment parameter is out of control, this indicates equipment misalignment (i.e., actual value error/sensor error), and the equipment raises an alarm for maintenance or part replacement (go to Step 3.8). Otherwise, it indicates model misalignment, and the ILPM model needs to be corrected. Update the posterior distribution of the parameter mean $\mu_{pt}$ by the Bayesian approach according to the new sample (go to Step 3.9).
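The exponentially weighted estimates in (17)–(19) and the rule $\lambda = \delta^{1/m}$ can be illustrated with a minimal Python sketch. The function names `forgetting_factor` and `weighted_estimates` are hypothetical and not from the paper; the factor 2 in the denominator of the successive-difference variance is how this sketch reads (18). The posterior update and the test in (20) are not included.

```python
def forgetting_factor(delta: float, m: int) -> float:
    """lambda = delta**(1/m): an observation m steps old gets weight delta."""
    return delta ** (1.0 / m)


def weighted_estimates(y, lam):
    """Exponentially weighted estimates in the spirit of (17)-(19).

    y   : observations y_1..y_T (oldest first), at least two values
    lam : forgetting factor, 0 < lam < 1
    Returns (mu_hat, sigma2_hat, v_hat).
    """
    T = len(y)
    if T < 2:
        raise ValueError("need at least two observations")
    # w[i] is the weight lam^(T-t) of observation t = i + 1
    w = [lam ** (T - t) for t in range(1, T + 1)]
    w_sum = sum(w)
    # (17): weighted mean
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    # (18): weighted mean-square successive difference, t = 2..T
    #       (denominator factor 2 is this sketch's reading of the equation)
    num = sum(w[i] * (y[i] - y[i - 1]) ** 2 for i in range(1, T))
    den = 2.0 * sum(w[i] for i in range(1, T))
    sigma2_hat = num / den
    # (19): weighted variance around the weighted mean
    v_hat = sum(wi * (yi - mu_hat) ** 2 for wi, yi in zip(w, y)) / w_sum
    return mu_hat, sigma2_hat, v_hat


# Example from the text: delta = 0.0001, m = 100 gives lambda of about 0.912
lam = forgetting_factor(0.0001, 100)
```

With a smaller $m$, $\lambda$ is smaller and old data fade faster, so the estimates react sooner to a change in the parameter mean, at the cost of more noise.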
Steps 3.10–3.11: The new samples are added to the training data set for ILPM model retraining. The retraining mechanism consists of two subroutines and retrains the ILPM model off-line for each recipe, step, and sensor. One subroutine retrains the model periodically: the model training procedure is scheduled on a recurring basis using a job scheduler, and we suggest batch retraining after new data have been collected over a period of time. For batch retraining, a time window is useful to fix the size of the data set, either by keeping the recent samples and removing the oldest ones or by identifying representative samples. The other subroutine is based on drift detection, which can automatically schedule model retraining; here we suggest a data set in which the new samples form the majority or carry higher weights. After preparing the updated data set, go to Step 1.3.

IV. EMPIRICAL STUDY

This section describes the empirical study of a semiconductor assembly manufacturer in Taiwan used to validate the ILPM framework. We focus on one equipment type (e.g., electroplating) and collect data on seven pieces of equipment operating for 3 months in 2018. After data preprocessing, the prepared data set consists of 13 877 084 observations, 58 continuous variables, and 21 categorical variables. Without loss of generality, all data are transformed to protect proprietary information.

A. Phase I—PPM

1) Prealarm Stage: After data collection and preprocessing, we calculate the TB of good products and implement off-line training for the ILPM model. For one specific sensor, the ILPM model collects the first half of the data sequence in one process step and uses it to predict the second-half trend. The GRU prediction achieves an MAPE of 2.81%. Based on the ILPM, we calculate the CUSUM of the distance between the predicted values and the TB. For a good product, the predicted trend in PPM is shown in Fig. 5(a), and the predicted CUSUM is shown in Fig. 5(b). The predicted CUSUM is not out of control, so the prealarm is not triggered.
We use the same procedure for defective products. The ILPM model provides the predicted trend of the second half of the data sequence as shown in Fig. 6(a). Based on the ILPM, we calculate the CUSUM of the distance between the predicted values and the TB. The predicted CUSUM is out of control, so the prealarm is triggered as shown in Fig. 6(b).
2) Alarm-Confirm Stage: We follow up the process step of Fig. 5 for this good product. After collecting the actual values of the second half of the data sequence in this process step, we calculate the CUSUM of the distance between the actual values and the TB. For the good product, the actual trend in PPM is shown in Fig. 7(a), and the actual CUSUM is shown in Fig. 7(b). The actual CUSUM is not out of control, which confirms that the prealarm was correctly not triggered.
We follow up the process step of Fig. 6 for this defective product with the same procedure. After collecting the actual values of the second half of the data sequence in this process step, we calculate the CUSUM of the distance between the actual values and the TB. For this defective product, the actual trend in PPM is shown in Fig. 8(a), and the actual CUSUM is shown in Fig. 8(b). The actual CUSUM is out of control, which confirms the prealarm triggered in the previous stage and supports engineering troubleshooting.
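The prealarm and alarm-confirm checks both accumulate a distance to the TB and compare it with a control limit. A minimal sketch of that idea follows, assuming a one-sided tabular CUSUM on the absolute distance; the function name `cusum_alarm`, the allowance `k`, and the limit `h` are hypothetical choices for illustration, not the paper's settings or its exact CUSUM form.

```python
def cusum_alarm(values, tb, k=0.5, h=5.0):
    """One-sided CUSUM of the distance between a sequence and the TB.

    values : predicted (prealarm) or actual (alarm-confirm) sensor values
    tb     : reference values of the same length
    k      : allowance subtracted at each step (assumed value)
    h      : control limit (assumed value)
    Returns (cusum_path, first_alarm_index or None).
    """
    if len(values) != len(tb):
        raise ValueError("values and tb must have the same length")
    c = 0.0
    path = []
    first_alarm = None
    for i, (v, b) in enumerate(zip(values, tb)):
        distance = abs(v - b)            # assumed distance measure
        c = max(0.0, c + distance - k)   # accumulate only excess distance
        path.append(c)
        if first_alarm is None and c > h:
            first_alarm = i
    return path, first_alarm


tb = [1.0] * 10
good = [1.1] * 10                            # stays near the TB
drifting = [1.0 + 0.5 * i for i in range(10)]  # moves away from the TB

_, alarm_good = cusum_alarm(good, tb)        # no alarm is raised
_, alarm_bad = cusum_alarm(drifting, tb)     # alarm is raised at index 6
```

Running the same function on the predicted second half gives the prealarm, and running it on the actual second half once it is collected gives the confirmation.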


Fig. 7. Good product plots. (a) Distance measure between actual values and TB. (b) Actual CUSUM plot.

Fig. 8. Defective product plots. (a) Distance measure between actual values and TB. (b) Actual CUSUM plot.

TABLE I
CONFUSION MATRIX OF PPM VALIDATION

To validate the PPM, for the same recipe/step/sensor, we randomly select 60 samples of good products and 60 samples of defective products. The ILPM model shows high accuracy; in particular, 93.3% of the samples of good products are validated, i.e., the prediction can be applied, and none is misclassified, as shown in Table I. Nonapplicable samples are those to which the ILPM model cannot be applied because of a large amount of missing data or invalid formats caused by broken sensors, failures in data collection, or changes in the sensors' sampling rates. For the defective products, although the prediction performance is excellent, more than one-third of the samples cannot be validated completely because of nonapplicable and missing data, or because the second-half data collection sequence is shut down after the prealarm is triggered.

B. Phase II—EPM

1) Concept Drift for Model Misalignment: As described in Section II, because the environment is uncertain and changes over time and the data are collected as a stream, we use concept drift to detect changes and update the ILPM model by monitoring the loss between the predicted and actual values. First, we consider model misalignment, and in the next subsection, we consider equipment misalignment. For one specific sensor, Fig. 9(a) shows the poor prediction of the defective product by the current ILPM model, and the loss exceeds the CUSUM control limit. Thus, we retrain and update the model. After adding this sample to the data set, the ILPM model (i.e., the GRU prediction model) is retrained off-line. For the next new sample, a better prediction with a smaller MAPE is shown in Fig. 9(b).

Fig. 9. Defective product plots. (a) Before ILPM model retraining. (b) After ILPM model retraining.

2) Concept Drift for Equipment Misalignment: Because of equipment aging and deterioration, equipment misalignment could cause an irregular sensor that collects imprecise data. As mentioned, this study focuses on the aging process drift of equipment rather than lifetime estimation. In practice, it is time consuming and difficult to conduct several experiments on equipment deterioration in a factory, so we suggest a simulation method to generate data for one mechanical parameter and then apply the proposed Bayesian monitoring. We simulate the long-term operation of one piece of equipment. The mechanical parameter is estimated once per hour. We generate $n = 24 \cdot 30 \cdot 12 = 8640$ samples for EPM. Based on domain knowledge from engineers, there are three simulation models with the following hyperparameter settings.
3) Random Jump Model: First, we simulate rapid process drift data. A random jump (RJ) model assumes that the parameter mean stays constant over time and that drift occurs randomly with random magnitude. The RJ model is formulated as follows:

$$\mu_t = \begin{cases} \mu_{t-1}, & \text{with probability } 1-p \\ \mu_{t-1} + N(0, \eta^2), & \text{with probability } p \end{cases} \tag{21}$$

where $\eta$ and $p$ are the hyperparameters of the RJ model. If we assume that an RJ happens once per week on average, the jump probability is $p = 1/(24 \times 7) \approx 0.006$, and the jump magnitude is $\eta^2 = 4$.
4) Linear and Exponential Jump Model: Following Wang et al. [2], who observed that a deteriorated mechanical parameter usually follows an exponential pattern in a long-term process, we combine a linear slope and exponential growth of the process mean for simulation. The linear and exponential jump (LEJ) model is formulated as follows:

$$\mu_t = \begin{cases} \mu_{t-1} + c, & \text{if } t < t_c \\ k\mu_{t-1}, & \text{if } t \ge t_c \end{cases} \tag{22}$$

where $c$, $k$, and $t_c$ are the hyperparameters of the LEJ model. If we assume that the change point from linear to exponential occurs at 6 months, the hyperparameter of the change point is


$t_c = 24 \cdot 30 \cdot 6 = 4320$, the linear slope is $c = 0.002$, and the exponential rate is $k = 1.0005$.
5) Hybrid Jump Model: The hybrid jump (HJ) model combines the RJ model and the LEJ model. Through the Bayesian approach, the prior of the parameter mean and variance is given as the normal distribution $N(1, 1)$. For the long term, we calculate the hyperparameter $\lambda = \delta^{1/m} \approx 0.99$ by setting $m = 24 \cdot 30 = 720$ observations in a month and $\delta = 0.0001$. For each time period, we estimate a new parameter mean and variance and then use a three-sigma control limit to test whether the posterior mean is out of control. The results for the three simulation models are shown in Figs. 10–12, respectively.
For the RJ model, we generate the sample values of the mechanical parameter based on the proposed simulation as shown in Fig. 10(a) and then use the estimated mean and variance to build the parameter's process control limits as shown in Fig. 10(b). The result shows that monitoring the posterior mean is capable of detecting the jump points with large changes. The small jumps can be regarded as random errors and ignored. For the LEJ model, the Bayesian monitoring does not trigger an alarm for the smooth change of the parameter in the long term. The sample values generated from the simulation are shown in Fig. 11(a), and no out-of-control point is detected, as shown in Fig. 11(b). The test statistics move much closer to the control limit as the equipment parameter changes rapidly at the end of life. The HJ model describes a pattern of equipment deterioration that is more likely to occur in real practice. The proposed Bayesian parameter monitoring successfully detects instant process drift in the long-term operation as shown in Fig. 12(b).

Fig. 10. Bayesian monitoring of RJ model. (a) Sample value of mechanical parameter over time. (b) Control chart for parameter mean.

Fig. 11. Bayesian monitoring of LEJ model. (a) Sample value of mechanical parameter over time. (b) Control chart for parameter mean.

Fig. 12. Bayesian monitoring of HJ model. (a) Sample value of mechanical parameter over time. (b) Control chart for parameter mean.

V. CONCLUSION

This study proposed a two-phase DS framework with PPM in phase I and EPM in phase II. Three modules (off-line training, in-line prediction, and concept drift) were developed for the prealarm of defective products and for change detection that drives model retraining. For in-line prediction, the ILPM collected the first half of the data sequence and predicted the second half of the data sequence in the process step. The concept drift module identified model misalignment and equipment misalignment by Bayesian monitoring and updated the ILPM model continuously to maintain robustness and accuracy. An empirical study of a real semiconductor manufacturer was used to validate the proposed framework. The results showed high performance of the ILPM prediction, but some cases were nonapplicable, or the second-half data collection was shut down after the prealarm was triggered, which may cause difficulties in practical implementation. In practice, PPM supports the prealarm for troubleshooting. The prealarm gives more time for preparation (e.g., hold lot), and the operators can coordinate the line balance or capacity backup by production rescheduling. Furthermore, PPM identifies the important sensors or SVIDs that significantly affect the quality prediction, and thus, it supports a quick response to process adjustment through feature selection [26] or design of experiments (DOE). In particular, the interaction effect among multiple SVIDs can be preliminarily identified. In addition, EPM detects the sudden drift of a mechanical parameter caused by sensor error, wearing out, or equipment aging in long-term operation and supports equipment health maintenance. A simulation method for data generation was used to validate the proposed Bayesian monitoring.
We suggest three extensions of the research. First, in addition to the LSTM and GRU models, support vector regression, gradient boosting machine, and random forest could be used to improve the prediction accuracy or reduce the computation time of model training. Second, more investigation could help to determine the volume of data in the first data collection sequence required to predict the second (i.e., remaining) data sequence. A smaller amount collected in the first part can trigger the prealarm earlier but may also reduce the prediction accuracy in the second part. This tradeoff should be investigated for practical implementation of the ILPM model. Third, multivariate Bayesian monitoring of multiple mechanical parameters, which clarifies their relationship, could provide more comprehensive monitoring and more accurate control by developing feedforward compensation of the mechanical parameters.
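As an illustrative note on the simulation study in Section IV-B, the three mean-drift models can be sketched in a few lines of Python using the hyperparameters quoted in the text ($p = 1/(24 \times 7)$, $\eta^2 = 4$, $c = 0.002$, $k = 1.0005$, $t_c = 4320$, $n = 8640$). Several points are assumptions of this sketch rather than statements from the paper: the function name `simulate_mean`, the starting mean `mu0 = 1.0`, the fixed random seed, and the way the HJ model is formed (the LEJ update followed by a possible random jump in the same period). Only the mean path is simulated; observation noise and the Bayesian control chart are left out.

```python
import random


def simulate_mean(n=8640, model="HJ", mu0=1.0, p=1 / (24 * 7), eta=2.0,
                  c=0.002, k=1.0005, t_c=4320, seed=0):
    """Simulate the parameter mean mu_1..mu_n under the RJ, LEJ, or HJ model.

    eta is the jump standard deviation (eta**2 = 4 in the text).
    mu0, seed, and the HJ combination rule are assumptions of this sketch.
    """
    if model not in ("RJ", "LEJ", "HJ"):
        raise ValueError("model must be 'RJ', 'LEJ', or 'HJ'")
    rng = random.Random(seed)
    mu = mu0
    path = []
    for t in range(1, n + 1):
        if model in ("LEJ", "HJ"):
            # (22): linear slope before the change point, exponential after
            mu = mu + c if t < t_c else k * mu
        if model in ("RJ", "HJ") and rng.random() < p:
            # (21): with probability p, add a jump drawn from N(0, eta^2)
            mu += rng.gauss(0.0, eta)
        path.append(mu)
    return path


rj_path = simulate_mean(model="RJ")    # piecewise-constant mean with jumps
lej_path = simulate_mean(model="LEJ")  # smooth linear, then exponential drift
hj_path = simulate_mean(model="HJ")    # smooth drift plus random jumps
```

Each path has one value per simulated hour, so it can be fed period by period into whatever estimator and control limit are being evaluated.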


REFERENCES

[1] D. C. Montgomery, Introduction to Statistical Quality Control, 7th ed. Hoboken, NJ, USA: Wiley, 2013.
[2] C. Wang, M. Yang, D. Xu, and H. Wu, “A novel integrated identification method of model structure and parameters for drive system,” in Proc. IEEE 27th Int. Symp. Ind. Electron. (ISIE), Jun. 2018, pp. 13–15.
[3] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning: With Applications in R, 1st ed. London, U.K.: Springer, 2013.
[4] C.-F. Chien, W.-C. Wang, and J.-C. Cheng, “Data mining for yield enhancement in semiconductor manufacturing and an empirical study,” Expert Syst. Appl., vol. 33, no. 1, pp. 192–198, Jul. 2007.
[5] T. Khawaja and G. Vachtsevanos, “A novel architecture for on-line failure prognosis using probabilistic least squares support vector regression machines,” in Proc. Annu. Conf. Prognostics Health Manage. Soc., San Diego, CA, USA, 2009, pp. 1–8.
[6] E. Zio and G. Peloni, “Particle filtering prognostic estimation of the remaining useful life of nonlinear components,” Rel. Eng. Syst. Saf., vol. 96, no. 3, pp. 403–409, Mar. 2011.
[7] F. Jia, Y. Lei, J. Lin, X. Zhou, and N. Lu, “Deep neural networks: A promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data,” Mech. Syst. Signal Process., vols. 72–73, pp. 303–315, May 2016.
[8] C.-Y. Lee and T.-L. Tsai, “Data science framework for variable selection, metrology prediction, and process control in TFT-LCD manufacturing,” Robot. Comput.-Integr. Manuf., vol. 55, pp. 76–87, Feb. 2019.
[9] W. Huang, J. Cheng, and Y. Yang, “Rolling bearing fault diagnosis and performance degradation assessment under variable operation conditions based on nuisance attribute projection,” Mech. Syst. Signal Process., vol. 114, pp. 165–188, Jan. 2019.
[10] S.-C. Hsu and C.-F. Chien, “Hybrid data mining approach for pattern extraction from wafer bin map to improve yield in semiconductor manufacturing,” Int. J. Prod. Econ., vol. 107, no. 1, pp. 88–103, May 2007.
[11] C.-W. Liu and C.-F. Chien, “An intelligent system for wafer bin map defect diagnosis: An empirical study for semiconductor manufacturing,” Eng. Appl. Artif. Intell., vol. 26, nos. 5–6, pp. 1479–1486, May 2013.
[12] K. Paynabar, J. Jin, and M. Pacella, “Monitoring and diagnosis of multichannel nonlinear profile variations using uncorrelated multilinear principal component analysis,” IIE Trans., vol. 45, no. 11, pp. 1235–1247, Nov. 2013.
[13] K. Liu, N. Z. Gebraeel, and J. Shi, “A data-level fusion model for developing composite health indices for degradation modeling and prognostic analysis,” IEEE Trans. Autom. Sci. Eng., vol. 10, no. 3, pp. 652–664, Jul. 2013.
[14] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York, NY, USA: Springer-Verlag, 2008.
[15] W. A. Shewhart, Economic Control of Quality of Manufactured Product, 1st ed. New York, NY, USA: D. Van Nostrand Company, Inc, 1931.
[16] E. S. Page, “Continuous inspection scheme,” Biometrika, vol. 41, nos. 1–2, pp. 100–115, 1954.
[17] C. A. Lowry, W. H. Woodall, C. W. Champ, and S. E. Rigdon, “A multivariate exponentially weighted moving average control chart,” Technometrics, vol. 34, no. 1, pp. 46–53, 1992.
[18] D. C. Montgomery and C. M. Mastrangelo, “Some statistical process control methods for autocorrelation,” J. Qual. Technol., vol. 23, no. 3, pp. 173–193, 1991.
[19] M. A. Mahmoud, P. A. Parker, W. H. Woodall, and D. M. Hawkins, “A change point method for linear profile data,” Qual. Rel. Eng. Int., vol. 23, no. 2, pp. 247–268, 2007.
[20] W. H. Woodall, D. J. Spitzner, D. C. Montgomery, and S. Gupta, “Using control charts to monitor process and product quality profiles,” J. Qual. Technol., vol. 36, no. 3, pp. 309–320, Jul. 2004.
[21] H. Hotelling, “The generalization of student’s ratio,” Ann. Math. Statist., vol. 2, no. 3, pp. 360–378, 1931.
[22] C.-Y. Lee and Z.-H. Dong, “Hierarchical equipment health index framework,” IEEE Trans. Semicond. Manuf., vol. 32, no. 3, pp. 267–276, Aug. 2019.
[23] F.-T. Cheng, H.-C. Huang, and C.-A. Kao, “Developing an automatic virtual metrology system,” IEEE Trans. Autom. Sci. Eng., vol. 9, no. 1, pp. 181–188, Jan. 2012.
[24] G. Verdier and A. Ferreira, “Adaptive mahalanobis distance and k-nearest neighbor rule for fault detection in semiconductor Manufacturing,” IEEE Trans. Semicond. Manuf., vol. 24, no. 1, pp. 59–68, Feb. 2011.
[25] K. Cho et al., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” 2014, arXiv:1406.1078. [Online]. Available: http://arxiv.org/abs/1406.1078
[26] S.-Y. Hung, C.-Y. Lee, and Y.-L. Lin, “Data science for delamination prognosis and online batch learning in semiconductor assembly process,” IEEE Trans. Compon., Packag., Manuf. Technol., vol. 10, no. 2, pp. 314–324, Feb. 2020.
[27] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, “A survey on concept drift adaptation,” ACM Comput. Surv., vol. 46, no. 4, pp. 1–37, Apr. 2014.
[28] I. Žliobaitė, M. Pechenizkiy, and J. Gama, “An overview of concept drift applications,” in Big Data Analysis: New Algorithms for a New Society, N. Japkowicz and J. Stefanowski, Eds. Cham, Switzerland: Springer, 2016, pp. 91–114.
[29] Hecht-Nielsen, “Theory of the backpropagation neural network,” in Proc. Int. Joint Conf. Neural Netw., Jun. 1989, pp. 593–611.
[30] Q. P. He and J. Wang, “Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes,” IEEE Trans. Semicond. Manuf., vol. 20, no. 4, pp. 345–354, Nov. 2007.
[31] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[32] M. Iwasaki, M. Miwa, and N. Matsui, “GA-based evolutionary identification algorithm for unknown structured mechatronic systems,” IEEE Trans. Ind. Electron., vol. 52, no. 1, pp. 300–305, Feb. 2005.
[33] A. Chehade and Z. Shi, “Sensor fusion via statistical hypothesis testing for prognosis and degradation analysis,” IEEE Trans. Autom. Sci. Eng., vol. 16, no. 4, pp. 1774–1787, Oct. 2019.
[34] J. Snoek, H. Larochelle, and R. P. Adams, “Practical Bayesian optimization of machine learning algorithms,” in Proc. 25th Int. Conf. Neural Inf. Process. Syst., Lake Tahoe, NV, USA, vol. 2, Dec. 2012, pp. 2951–2959.
[35] J. Z. Kolter and M. A. Maloof, “Dynamic weighted majority: An ensemble method for drifting concepts,” J. Mach. Learn. Res., vol. 8, pp. 2755–2790, Dec. 2007.
[36] C. J. Feltz and J.-J.-H. Shiau, “Statistical process monitoring using an empirical Bayes multivariate process control chart,” Qual. Rel. Eng. Int., vol. 17, no. 2, pp. 119–124, 2001.
[37] B. M. Colosimo, and E. Del Castillo, Bayesian Process Monitoring, Control and Optimization. London, U.K.: Chapman & Hall, 2006.
[38] J. Lee, F. Wu, W. Zhao, M. Ghaffari, L. Liao, and D. Siegel, “Prognostics and health management design for rotary machinery systems—Reviews, methodology and applications,” Mech. Syst. Signal Process., vol. 42, nos. 1–2, pp. 314–334, Jan. 2014.
[39] C.-Y. Lee, T.-S. Huang, M.-K. Liu, and C.-Y. Lan, “Data science for vibration heteroscedasticity and predictive maintenance of rotary bearings,” Energies, vol. 12, no. 5, p. 801, Feb. 2019.
[40] Y. Hu and C. Zhao, “Fault diagnosis with dual cointegration analysis of common and specific nonstationary fault variations,” IEEE Trans. Autom. Sci. Eng., vol. 17, no. 1, pp. 237–247, Jan. 2020.

Chia-Yen Lee (Member, IEEE) received the Ph.D. degree in industrial and systems engineering from Texas A&M University, College Station, TX, USA, in 2012.
He is currently the Director and a Professor with the Institute of Manufacturing Information and Systems, National Cheng Kung University (NCKU), Tainan, Taiwan. His research interests include intelligent manufacturing systems, data science, productivity and efficiency analysis, and stochastic optimization. His research works appear in the European Journal of Operational Research, the IEEE TRANSACTIONS ON POWER SYSTEMS, Annals of Operations Research, the IEEE TRANSACTIONS ON ENGINEERING MANAGEMENT, and Applied Soft Computing.
Dr. Lee received the Best Practice Paper Award from the 17th Asia Pacific Industrial Engineering and Management Systems Conference (APIEMS2016), the 2016 Outstanding Young Industrial Engineer Award from the Chinese Institute of Industrial Engineers (CIIE), the 2017 Ta-You Wu Memorial Award of Distinguished Young Scholars from the Ministry of Science and Technology (MOST), Taiwan, the 2018 Kwoh-Ting Li Technology & Literature Lectureships Award of Distinguished Young Scholars from NCKU-Delta Electronics, and the 2019 Feng-Zhang Lu Memorial Medal from the Chinese Management Association.


Chao-Shian Wu received the M.S. degree from the Institute of Manufacturing Information and Systems, National Cheng Kung University, Tainan, Taiwan, in 2019.
His research interests include data science, machine learning, statistical process control, and operations research.
Mr. Wu received the 2019 Master Thesis Award at Information System Session from the Chinese Institute of Industry Engineers (CIIE).

Yu-Hsin Hung (Member, IEEE) received the B.S. degree in statistics and the M.S. degree in manufacturing information and systems from National Cheng Kung University (NCKU), Tainan, Taiwan, in 2017 and 2019, respectively.
He is currently a Research Assistant with the Institute of Manufacturing Information and Systems, National Cheng Kung University. His research interests include statistics, optimization theory, control theory, intelligent manufacturing systems, and data science.
Mr. Hung received the 2019 Master Thesis Award at Production System Session from the Chinese Institute of Industry Engineers (CIIE) and the membership in the Phi Tau Phi Scholastic Honor Society in 2019.
