Quantum Computing Assisted Deep Learning For Fault Detection and Diagnosis in Industrial Process Systems
Abstract
Quantum computing (QC) and deep learning techniques have attracted widespread attention in recent years. This paper proposes QC-based deep learning methods for fault diagnosis that exploit their
unique capabilities to overcome the computational challenges faced by conventional data-driven
approaches performed on classical computers. Deep belief networks are integrated into the proposed fault
diagnosis model and are used to extract features at different levels for normal and faulty process operations.
The QC-based fault diagnosis model uses a quantum computing assisted generative training process
followed by discriminative training to address the shortcomings of classical algorithms. To demonstrate its
applicability and efficiency, the proposed fault diagnosis method is applied to process monitoring of a continuous stirred tank reactor (CSTR) and the Tennessee Eastman (TE) process. The proposed QC-based deep learning approach achieves superior fault detection and diagnosis performance, with average fault detection rates of 79.2% and 99.39% for the CSTR and TE process, respectively.
Key words: Quantum computing, deep learning, process monitoring, fault detection
1. Introduction
Fault detection and diagnosis has been an active area of research in process systems engineering due
to the growing demand for ensuring safe operations and preventing malfunctioning of industrial processes
by detecting abnormal events [1, 2]. Furthermore, chemical plant accidents that cause tremendous environmental and economic losses provide an extra incentive to develop process monitoring techniques that effectively assure process safety and product quality in complex chemical process systems.
Data-driven approaches, often termed multivariate statistical process monitoring methods, have attracted
significant attention and have been widely applied to monitor industrial processes [3-5]. Such methods rely
on historical process data and rarely require detailed knowledge of the governing physical models of the
continuous or batch processes [6], thus making them relatively easier to implement in process control and
operations [7].
Quantum computing (QC) based applications have been gaining traction recently due to their unique capabilities, most visibly in the area of optimization, with applications in operational planning [8], molecular design [9, 10], process scheduling and operations [10, 11], logistics optimization [10, 12], and energy systems [13]. The randomness and uncertainty inherently associated with
QC operations, subject to internal magnetic fields, thermal fluctuations, and other noise sources, could be
a hindrance to optimization applications. However, this non-ideal behavior can be exploited to develop
efficient statistical machine learning techniques. QC-enhanced machine learning techniques have been
proposed for data fitting [14], pattern recognition [15], generative machine learning [16], handwriting
recognition [17], and quantum recommendation systems [18]. These QC-based data-driven techniques can
also be used in process control and monitoring for industrial processes. Quantum advantages offered by QC
in terms of speed and method of operation could benefit fault monitoring in complex process systems where
swift and precise fault detection is desired. However, the applicability of QC-based techniques is limited because commercially available quantum computers face several limitations, such as a small number of quantum bits (also termed qubits), limited connectivity, and a lack of quantum memory. As a result,
integrating QC-enhanced learning techniques with classical machine learning algorithms to overcome such
limitations becomes necessary and is a promising approach for process monitoring.
The applicability and capacity of basic classical data-driven methods for industrial process monitoring, such as principal component analysis (PCA), partial least squares (PLS), independent component analysis (ICA), and Fisher discriminant analysis (FDA), have been extensively studied [19, 20].
PCA and FDA are dimensionality reduction techniques that can be used to detect faults and discriminate
among classes of data by describing the trends in historical data through lower dimensional representations
[21, 22]. PLS and ICA are other powerful multivariate statistical tools widely used for fault detection and
diagnosis [23, 24]. Monitoring techniques based on these methods face some limitations which directly
affect their anomaly detection efficiency in complex process systems. PCA-based methods do not take into
account the temporal correlations between process data and information between classes when determining
the lower dimensional representations. FDA requires control limits for fault detection derived from the assumption that the measurement signals follow a multivariate Gaussian distribution, which may raise false alarms. The latent variables in PLS are often difficult to interpret, and the method carries a possible risk of overfitting. Several new variations of the basic data-driven monitoring methods have also been proposed
and applied to fault detection and diagnosis in industrial processes [25-28]. However, a large portion of
these analytical approaches are limited to linear and some specific nonlinear models. Also, the inherent nonlinear nature of complex process systems renders such methods inefficient, since a large portion of the process data is misclassified. Nonlinear classification techniques like support vector
machine (SVM) improve the fault classification performance for highly overlapped data. However, the
corresponding model complexity increases with the process data dimensions [22]. The extent of complex nonlinearities and correlations present in the process data makes it difficult for these classical data-driven methods to generalize to all complex process systems, restraining their applicability in practical situations.
The ability of artificial neural networks to approximate nonlinear relationships between the process
data and process states by generalizing the knowledge can be successfully applied to diagnose faults in
complex chemical process systems [29-32]. However, in some instances their generalization to multiple
faults is not always successful. Recently, deep learning has become a promising tool for smart fault
diagnosis due to powerful techniques like auto-encoder (AE) [33], restricted Boltzmann machine (RBM)
[34] and convolutional neural network (CNN) [35, 36]. Such deep learning models extract multiple levels
of abstraction from normal and faulty data, allowing them to achieve high classification accuracy. The
increasing complexity of industrial process systems requires deeper and more complex neural network architectures to learn process data features, and consumes growing computational resources for efficient process optimization and control [37]. Feature extractor models like the RBM could also be computationally
intractable to train through classical training algorithms. Therefore, there arises a need to develop high-
performance deep learning models for fault detection and diagnosis capable of overcoming limitations of
the current machine learning paradigms carried out on state-of-the-art classical computers.
There are several research challenges towards developing QC-based process monitoring techniques
that utilize deep learning architectures and ensure effective fault detection and diagnosis performance. One
such challenge is to design deep learning models and architectures that can extract faulty features from
small datasets, since in most industrial applications large amounts of data for faulty operations are seldom
available. A further challenge lies in training of such deep architectures as their complexity increases with
the number of hyper-parameters. Faults must be detected and diagnosed quickly to address safety concerns, which implies that the training process should be performed at reasonable computational cost. Limitations of the classical training algorithms for deep learning models and of QC devices also pose a computational challenge.
It is crucial to develop techniques that leverage both QC and classical computers to overcome such
challenges.
In this work, we develop a QC-based model and methods for fault detection and diagnosis of complex process systems that efficiently extract several levels of features for normal and faulty process operations using deep RBM-based architectures. For complex process systems with a large number of process measurements, training the RBMs is computationally challenging and might also result in suboptimal hyper-parameters that further affect the classification accuracy of fault detection models. To this end, we
train the RBM-based network in the QC-based deep learning model with a quantum assisted training
algorithm to overcome such computational challenges. The proposed model effectively detects faults in
complex process systems by leveraging the superior feature extraction and deep learning techniques to
facilitate proper discrimination between normal and faulty process states. Complexities such as
nonlinearities between process variables and correlations between historical data can also be handled by
this QC-based fault diagnosis model. The applicability of this QC-based deep learning method is
demonstrated through two case studies on statistical process monitoring of the closed-loop continuous
stirred tank reactor (CSTR) and the Tennessee Eastman (TE) process, respectively. These two processes
are commonly used in benchmarking applications to measure and compare the performance of the fault
diagnosis models. The CSTR simulation deals with a first-order reaction carried out in a tank, with seven process variables recorded at each step and three types of simulated faults, while the TE process is a relatively large industrial chemical manufacturing process with 52 process variables and 20 faults.
Computational challenges stemming from the large size of the RBM used for the case studies are effectively
tackled by the proposed QC-assisted training process. The obtained computational results for detecting
anomalies are compared against state-of-the-art data-driven models and deep fault detection models trained
on classical computers.
The major contributions of this work are summarized below:
• A novel QC-based deep learning model for detection and diagnosis of faults in complex process systems is proposed;
• The feature extractor network in the QC-based fault diagnosis model is trained with a novel training process that performs generative training assisted by quantum sampling;
• Case studies on the CSTR and TE process are presented with a comprehensive comparison against state-of-the-art fault detection methods using classical computers.
We first provide a brief background on RBMs and adiabatic quantum computing. These preliminaries in Section 2 are essential to the implementation of the proposed models and methods in this paper. The proposed QC-based deep learning model for fault diagnosis and the quantum assisted methods are presented in Section 3. Two industrial case studies are presented in Sections 4 and 5 to demonstrate the effectiveness of the proposed model. These are followed by a discussion of the quantum advantage perceived in the respective case studies in Section 6. Conclusions are drawn in Section 7.
2. Background
particular forms of noise have also been reported [44]. Adiabatic computation requires the gap between the ground state and the excited states to remain sufficiently large; adiabatic evolution is particularly susceptible to noise when this gap is small [45]. It has also been shown that under certain conditions, thermal interactions
with environment can improve the performance of AQC [46]. Apart from thermal fluctuations, several
internal and external factors contribute to the noise in quantum systems. Qubits in such devices can be
affected by the electronic control components and material impurities, which give rise to the external and
internal sources of noise, respectively. In the context of optimization, noisy qubits can drive the state of the system away from the global optimal solution toward sub-optimal states. However, from a machine learning
perspective, such noisy behavior and measurement uncertainty in quantum systems can be exploited to
approximate sample distributions that could be used to model the distribution of data, as will be introduced in the Quantum Generative Training section. Despite the several quantum advantages offered by AQC, the role of
current quantum technologies for process systems engineering is not well established. To provide a better
sense of the capabilities and challenges of AQC approaches, the presented background highlights the theory and practice of quantum computers through several references. Some of the major publications that explain the basic concepts underlying AQC can be found in [38] and [39]. Readers interested in the implementation of AQC-inspired methodologies can also refer to methodology-focused articles like [13] and [47].
Figure 1. a) Adiabatic quantum optimization (AQO) and b) Chimera architecture of the D-Wave processing unit
applied for pattern analysis and generation with applications in image generation [49], collaborative
filtering for movie recommendations [50], phone recognition [51], and many more. As the name suggests,
RBM is a restricted variant of the Boltzmann machine that forms an undirected bipartite graph, as shown in Figure 2, between neurons from two groups commonly termed visible and hidden units. An RBM network with m visible neurons and n hidden neurons represents the observable data and the dependencies between the observed variables, respectively [52]. The parameters of this undirected bipartite graph are the weights and biases. For a pair of a visible unit vi and a hidden unit hj, a real-valued weight wij is associated with the edge between them. Bias terms bi and cj are associated with the ith visible unit and the jth hidden unit, respectively.
The energy function of an RBM [53] for a joint configuration of binary or Bernoulli visible and hidden units $\mathbf{v} \in \{0,1\}^m$, $\mathbf{h} \in \{0,1\}^n$ is given by $E(\mathbf{v},\mathbf{h})$ in Eq. (1). Due to the absence of connections between units of the same layer, the hidden variables are mutually independent given the visible variables, and vice versa. A probability is assigned by the network to each possible pair of visible and hidden units through the RBM energy function, as shown in Eq. (2), where the normalization constant or partition function Z is defined by summing over all possible pairs of visible and hidden vectors. This joint probability distribution is a Gibbs or Boltzmann distribution. Due to the conditional independence between the variables in the same layer, the conditional distributions factorize nicely and simple expressions for the marginal distributions of the visible variables can be obtained. Eq. (3) gives the probability assigned to a visible vector v, obtained by summing over all possible hidden vectors.

$$E(\mathbf{v},\mathbf{h}) = -\sum_{i \in \text{visible}} b_i v_i - \sum_{j \in \text{hidden}} c_j h_j - \sum_{i,j} w_{ij} v_i h_j \quad (1)$$

$$p(\mathbf{v},\mathbf{h}) = \frac{1}{Z} \exp\left(-E(\mathbf{v},\mathbf{h})\right), \qquad Z = \sum_{\mathbf{v}',\mathbf{h}'} \exp\left(-E(\mathbf{v}',\mathbf{h}')\right) \quad (2)$$

$$p(\mathbf{v}) = \frac{1}{Z} \sum_{\mathbf{h}} \exp\left(-E(\mathbf{v},\mathbf{h})\right) \quad (3)$$
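To make Eqs. (1)–(3) concrete, the following minimal Python sketch evaluates the energy and joint probability of a toy Bernoulli RBM; the brute-force partition function is tractable only for very small m and n, and the variable names (W, b, c) are ours, mirroring the notation above.

```python
import numpy as np
from itertools import product

def rbm_energy(v, h, W, b, c):
    # Eq. (1): E(v, h) = -sum_i b_i v_i - sum_j c_j h_j - sum_{i,j} w_ij v_i h_j
    return -(b @ v) - (c @ h) - (v @ W @ h)

def joint_probability(v, h, W, b, c):
    # Eq. (2): p(v, h) = exp(-E(v, h)) / Z, where Z sums over all 2^(m+n)
    # joint configurations -- feasible only for toy-sized RBMs
    m, n = len(b), len(c)
    Z = sum(np.exp(-rbm_energy(np.array(vp), np.array(hp), W, b, c))
            for vp in product([0, 1], repeat=m)
            for hp in product([0, 1], repeat=n))
    return np.exp(-rbm_energy(v, h, W, b, c)) / Z
```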
Generative training deals with determining the weights and biases that maximize the likelihood or log-likelihood of the observed data. To maximize the probability p(v) assigned to a training data vector v by the RBM, the weights and biases of the network are updated such that the energy of that training data vector is lowered, while the energy of configurations sampled from the model is raised. The gradients of the log-likelihood of the training data with respect to the parameters of the RBM can be calculated from Eq. (4). The gradients can be interpreted as the difference between the expectation values under the distributions of the training data and the underlying model.

$$\frac{\partial \log p(\mathbf{v})}{\partial w_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \quad (4)$$
Learning rules to update the weights and biases can be derived from these log-likelihood gradients in order to maximize the log probability with stochastic gradient ascent. Eqs. (5), (6), (7) describe the update rules, where ε is the learning rate and α is the momentum. The terms $\langle v_i h_j \rangle_{\text{data}}$, $\langle v_i \rangle_{\text{data}}$, $\langle h_j \rangle_{\text{data}}$ are the clamped expectation values with a fixed v and can be efficiently computed from the training data using Eq. (8). This equation provides an unbiased sample of the clamped expectations, where $\sigma(x)$ is the logistic sigmoid function defined by $\sigma(x) = 1/(1+e^{-x})$. Eq. (9) similarly produces unbiased samples of the visible states, given a hidden vector h.

$$w_{ij}^{t+1} = w_{ij}^{t} + \varepsilon\left(\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}\right) + \alpha\,\Delta w_{ij}^{t} \quad (5)$$

$$b_{i}^{t+1} = b_{i}^{t} + \varepsilon\left(\langle v_i \rangle_{\text{data}} - \langle v_i \rangle_{\text{model}}\right) + \alpha\,\Delta b_{i}^{t} \quad (6)$$

$$c_{j}^{t+1} = c_{j}^{t} + \varepsilon\left(\langle h_j \rangle_{\text{data}} - \langle h_j \rangle_{\text{model}}\right) + \alpha\,\Delta c_{j}^{t} \quad (7)$$

$$P(h_j = 1 \mid \mathbf{v}) = \sigma\Big(c_j + \sum_{i} w_{ij} v_i\Big) \quad (8)$$

$$P(v_i = 1 \mid \mathbf{h}) = \sigma\Big(b_i + \sum_{j} w_{ij} h_j\Big) \quad (9)$$
The model expectations $\langle v_i h_j \rangle_{\text{model}}$, $\langle v_i \rangle_{\text{model}}$, $\langle h_j \rangle_{\text{model}}$ are difficult to estimate. They can be computed by randomly initializing the visible states and performing Gibbs sampling for a long time. However, this becomes computationally intractable as the number of visible and hidden units increases [54]. Hinton proposed a faster learning algorithm called contrastive divergence (CD) learning [55] that has become the standard way to train RBMs. Rather than approximating the model expectations by running a Markov chain until equilibrium is achieved, the k-step CD learning (CD-k) runs the Gibbs chain for only k steps to yield the samples $\langle v_i h_j \rangle_{k}$, $\langle v_i \rangle_{k}$, $\langle h_j \rangle_{k}$, as shown in Figure 2b. This learning algorithm works well despite the k-step reconstruction of the training data only crudely approximating the model expectations [55]. Theoretically, as $k \to \infty$ the update rules converge to the true gradient. However, in practice the updates are computed using a single-step (k = 1) reconstruction, which achieves sufficiently good performance.
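A minimal sketch of a single CD-1 update for one training vector follows, assuming the Bernoulli RBM of Eqs. (1)–(9); momentum is omitted for brevity, and the negative phase uses hidden probabilities in place of sampled states, a common variance-reduction choice [54].

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid from Eqs. (8)-(9)
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, eps=0.1):
    """One CD-1 parameter update for a single training vector v0."""
    # Positive (clamped) phase, Eq. (8)
    p_h0 = sigmoid(c + v0 @ W)
    h0 = (np.random.rand(len(c)) < p_h0).astype(float)
    # Single-step reconstruction, Eq. (9)
    p_v1 = sigmoid(b + W @ h0)
    p_h1 = sigmoid(c + p_v1 @ W)
    # Approximate gradient of Eq. (4) with k = 1 reconstruction statistics
    W += eps * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    b += eps * (v0 - p_v1)
    c += eps * (p_h0 - p_h1)
    return W, b, c
```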
Many significant applications involve real-valued data, for which the binary RBM would produce poor logistic representations. In such cases, a modified variant of the RBM can be used by replacing the Bernoulli visible units with Gaussian visible units [54]. The energy function then takes the form of Eq. (10), where σi is the standard deviation of the Gaussian noise for the ith visible unit. CD-1 can be used to learn the variance of the noise, but this is much more complicated than the binary case. An easier alternative is to normalize each data component to have zero mean and unit variance, and then use a noise-free model. The variance σ² is unity in this case.

$$E(\mathbf{v},\mathbf{h}) = \sum_{i \in \text{visible}} \frac{(v_i - b_i)^2}{2\sigma_i^2} - \sum_{j \in \text{hidden}} c_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} h_j w_{ij} \quad (10)$$
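The normalization suggested above is a one-line preprocessing step; a sketch (our helper, not part of the original work) is:

```python
import numpy as np

def standardize(X):
    """Scale each process variable to zero mean and unit variance so the
    noise-free Gaussian-Bernoulli RBM of Eq. (10) with sigma_i = 1 applies."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / sd, mu, sd
```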
Deep architectures can be constructed by stacking layers of RBMs together. Such deep architectures are termed deep belief networks (DBNs), where each RBM sub-network's hidden layer serves as the visible layer for the following RBM layer [56]. DBNs are trained in a greedy fashion by sequentially training each RBM layer, as sketched below. There have been many implementations and uses of DBNs in real-world applications due to their versatility and effective multiple-level feature extraction capabilities [57].
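A sketch of this greedy scheme, assuming a hypothetical train_rbm routine that fits one RBM and returns its parameters together with the hidden representation of its input:

```python
def train_dbn(X, hidden_layer_sizes, train_rbm):
    """Greedy layer-wise pre-training of a DBN: each trained RBM's hidden
    activations become the visible data for the next RBM in the stack."""
    data, layers = X, []
    for n_hidden in hidden_layer_sizes:
        W, b, c, data = train_rbm(data, n_hidden)  # hypothetical helper
        layers.append((W, b, c))
    return layers
```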
order to classify the state of this data vector, outputs from the pre-trained sub-networks DBN-N and DBN-F, which serve as k-dimensional approximations of the input data, are combined together.
The second step uses the combined approximate 2k-dimensional vector. It is passed on to the local classification sub-network that predicts the state of the original input data vector. The local classification deep neural network-based architecture yields the probabilities of the two possible states, normal and faulty. The local classifier follows a supervised discriminative learning strategy that uses class labels as an extra output layer. A graphical representation of the proposed QC-based fault diagnosis model is shown in Figure 3. Since the performance of DBN-based networks is known to be sub-optimal due to the presence of several local minima, generative training helps locate a desired local neighborhood near a good optimum, while discriminative training further refines the optimum by fine-tuning the model parameters.
Figure 3. Repeating sub-network in the proposed QC-based fault diagnosis model that uses deep belief
networks and local classifier to predict the state of the data samples
Figure 4. Deep belief network architecture used in the repeating sub-network of the QC-based fault diagnosis model that produces a high-level abstraction of the input data
$$\hat{Y}_1 = \sigma\left(\hat{X} W_g + C_g\right) \quad (11)$$
Following the first RBM layer, the second RBM layer in the DBN extracts higher level features from
the process data. Deep network architectures are generally preferred over shallow networks, but increasing model complexity requires a large amount of training data to achieve good model performance. Also, increasing the number of layers introduces size constraints on the following layers and might limit the model performance. Computational experiments conducted with one RBM layer yield lower performance than relatively deeper architectures. Therefore, two RBM layers are used in the DBN sub-networks. The binary output vector $\hat{Y}_1$ obtained from the first RBM layer serves as input to this layer, so the visible and hidden units of the second RBM are modeled as Bernoulli units. The weights matrix $W_b$, visible bias $B_b$, and hidden bias $C_b$ form the model parameters of this layer that need to be optimized. The update rules for these model parameters require the computation of the model expectations $\langle v_i h_j \rangle_{\text{model}}$, $\langle v_i \rangle_{\text{model}}$, and $\langle h_j \rangle_{\text{model}}$. Since the CD algorithm approximates the gradient for the update rules with a larger variance that might not always lead to the maximum likelihood estimate of the model parameters, the model expectations are instead estimated using quantum sampling implemented on a quantum computer.
AQC devices are explicitly built for optimization purposes, determining the ground state of a problem Hamiltonian. However, there has been experimental evidence suggesting that under certain conditions such devices sample approximately from a Boltzmann distribution at an effective temperature [59, 60]. The final states of the qubits are effectively described by a Boltzmann distribution when the strengths of the fields and couplings on the device are sufficiently small. Due to the presence of non-ideal interactions between the qubits and the environment, the AQC device can be used as a sampling engine [17]. A natural resemblance exists between the problem Hamiltonian, which takes the form of a QUBO problem, and the energy function of an RBM with Bernoulli units. Quantum sampling exploits this by embedding the RBM energy function onto the AQC device. The distribution of the excited states of the qubits can then be modeled as the Boltzmann distribution given in Eq. (12). An unknown scale parameter $\beta_{\text{eff}}$ dictates the effective temperature at which samples are drawn from the underlying Boltzmann distribution. The value of this parameter depends on the operating conditions of the AQC device, and it is a direct link between the problem Hamiltonian and the energy function. Although some techniques have been proposed to estimate the effective temperature [61], a constant value for $\beta_{\text{eff}}$ is empirically selected depending on the size of the RBM. Samples drawn from an AQC device follow a trend as shown in Figure 5.
Figure 5. RBM energy histograms for two sets of control parameters, obtained by increasing the actual parameters by a scaling factor, along with the effect of the scaling factor on the average energy
Control parameters used for the quantum sampling process are equivalent to the weights and biases of the RBM energy function provided that the scale parameter is unity. $\beta_{\text{eff}}$ can also be estimated by adjusting the actual control parameters by a user-defined scaling factor and analyzing the difference between the histograms of samples drawn from an AQC device, as shown in Figure 5. Selecting an appropriate scaling factor is a crucial task; increasing the scaling factor tends to reduce the average energy of the samples drawn through quantum sampling. Setting the value of the unknown scale parameter to one eliminates the need to analytically calculate $\beta_{\text{eff}}$ at each iteration of the training process, and reduces the required computational resources and time. It should also be noted that the AQC device can be used over the cloud, providing a cost-effective alternative to buying an expensive AQC device.

$$P(\mathbf{v},\mathbf{h}) = \frac{1}{Z} \exp\left(-\beta_{\text{eff}}\, E_{\text{RBM}}(\mathbf{v},\mathbf{h})\right) \quad (12)$$
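As a sketch of this embedding step, the RBM parameters can be mapped to QUBO coefficients so that the device Hamiltonian equals $E_{\text{RBM}}/\beta_{\text{eff}}$; the indexing convention (0..m−1 for visible units, m..m+n−1 for hidden units) is ours, and $\beta_{\text{eff}} = 1$ matches the empirical choice above.

```python
def rbm_to_qubo(W, b, c, beta_eff=1.0):
    """Map the RBM energy of Eq. (1) to a QUBO dictionary whose minimum
    corresponds to the lowest-energy (v, h) configuration. Dividing by
    beta_eff makes the device's Boltzmann samples (Eq. (12)) match the
    RBM distribution of Eq. (2)."""
    m, n = len(b), len(c)
    scale = 1.0 / beta_eff
    Q = {}
    for i in range(m):
        Q[(i, i)] = -scale * b[i]              # visible bias terms
    for j in range(n):
        Q[(m + j, m + j)] = -scale * c[j]      # hidden bias terms
    for i in range(m):
        for j in range(n):
            Q[(i, m + j)] = -scale * W[i, j]   # visible-hidden couplings
    return Q
```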
Figure 6. Quantum generative training through quantum sampling
With approximate knowledge of the underlying Boltzmann distribution, the model expectations are computed by drawing several samples corresponding to the RBM energy function via quantum sampling. Eqs. (13), (14), (15) use N samples drawn from adiabatic optimization runs to calculate the corresponding model expectation values required to update the model parameters. Figure 6 summarizes the quantum generative training process that uses quantum sampling to find the maximum likelihood estimates of the corresponding model parameters.

$$\langle v_i h_j \rangle_{\text{model}} = \frac{1}{N} \sum_{n=1}^{N} v_i^{n} h_j^{n} \quad (13)$$

$$\langle v_i \rangle_{\text{model}} = \frac{1}{N} \sum_{n=1}^{N} v_i^{n} \quad (14)$$

$$\langle h_j \rangle_{\text{model}} = \frac{1}{N} \sum_{n=1}^{N} h_j^{n} \quad (15)$$
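Given the N binary samples returned by the annealer, the estimators of Eqs. (13)–(15) are simple sample averages; a NumPy sketch (with rows as samples) is:

```python
import numpy as np

def model_expectations(V, H):
    """Estimate <v_i h_j>, <v_i>, <h_j> under the model from N quantum
    samples per Eqs. (13)-(15); V is N x m, H is N x n."""
    N = V.shape[0]
    vh = V.T @ H / N        # Eq. (13)
    v = V.mean(axis=0)      # Eq. (14)
    h = H.mean(axis=0)      # Eq. (15)
    return vh, v, h
```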
The update rules for the weights and biases of the second RBM in the DBN sub-network given in Eq. (5) converge to the minimum cross-entropy loss between the original input and the reconstructed input vector. The output from the second RBM layer, $\hat{Y}_2$ with entries bounded by [0,1], is obtained by multiplying the input data vector by the weights matrix, adding the corresponding hidden biases, and applying a sigmoid activation function, as given in Eq. (16). The output of the generative training model $\hat{Y}_2$ is a transformed version of the original input data vector $\hat{X}$. This transformation can be considered a higher-level abstraction of the historical process data and can be used as an input to the corresponding classifier to determine the state of the input data sample in the QC-based fault diagnosis model.

$$\hat{Y}_2 = \sigma\left(\hat{Y}_1 W_b + C_b\right) \quad (16)$$
Figure 7. Local classifier architecture that identifies normal or faulty data samples
The weights matrix $W_f$ and bias vector $b_f$ form the model parameters for the fully connected layer that connects each input to every hidden neuron. Nonlinear combinations of the extracted features can be easily learned with a fully connected layer, which is a major component of the discriminative training process. The output generated by this layer, $\hat{Y}_3$, is used to predict the score of the normal or faulty class; it is obtained by summing the bias and the product of the weights matrix with the input vector, followed by a ReLU activation function, as shown in Eq. (17). As the process data can be in either of the two states, normal or faulty, the weights matrix $W_s$ and bias vector $b_s$ predict the final class scores using the soft-max activation function in Eq. (18). Model parameters for the DBN-based sub-networks are fine-tuned by retraining the local classifier neural network classically with the backpropagation algorithm, which performs supervised learning of neural networks using gradient descent. The gradients of the loss function are estimated with respect to the model parameters of the local classifier sub-network in order to iteratively update the model parameter values. Minimizing the categorical cross-entropy loss for the classifier yields maximum likelihood estimates of the model parameters. The cross-entropy loss can be computed with the predicted class scores as shown in Eq. (19), where $Y_i^T$ are the true fault labels.

$$\hat{Y}_3 = \mathrm{ReLU}\left(\hat{Y}_2 W_f + b_f\right) \quad (17)$$

$$P_i = \frac{\exp\left(\hat{Y}_3 W_s^{i} + b_s^{i}\right)}{\sum_{i} \exp\left(\hat{Y}_3 W_s^{i} + b_s^{i}\right)} \quad (18)$$

$$\mathrm{Loss}_{CE} = -\sum_{i} Y_i^{T} \log P_i \quad (19)$$
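A NumPy sketch of the classifier's forward pass and loss for a single merged feature vector follows (Eqs. (17)–(19)); the max-subtraction is a standard numerical-stability step we add, not part of the formulas above.

```python
import numpy as np

def classifier_forward(y2, Wf, bf, Ws, bs):
    """Fully connected ReLU layer (Eq. (17)) followed by a two-class
    soft-max (Eq. (18)) over the merged DBN features y2."""
    y3 = np.maximum(0.0, y2 @ Wf + bf)      # Eq. (17)
    logits = y3 @ Ws + bs
    logits = logits - logits.max()          # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()                  # Eq. (18)

def cross_entropy_loss(p, y_true):
    # Eq. (19): categorical cross-entropy with one-hot true labels y_true
    return -np.sum(y_true * np.log(p + 1e-12))
```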
A QC-based fault diagnosis model for individual process faults is obtained by following the quantum
generative training and discriminative training process. The DBN-N sub-network in the generative model
is trained only once and can be re-used for each diagnosis model. To detect the unknown state of the process
data sample, both the normal state and faulty state abstractions of the data sample, generated as outputs of the DBN-based generative model, are merged. The local classifier then predicts the probabilities that the data sample belongs to the normal or faulty state. A threshold probability of 0.5 then determines the state of the new process data sample.
is introduced after 100 minutes of normal operation. The training and testing datasets corresponding to both
normal and faulty states used for this case study are provided in the Supplementary Information.
trained through the quantum generative training process. Cross-entropy loss is used as a performance metric to track the training progress. Mean square loss can also be used as a viable substitute for cross-entropy loss. The hidden layers or outputs of the DBN-N and DBN-F sub-networks are merged together as a single layer with 16 neurons. This ensures that the higher-level abstractions of the normal and faulty states are processed together. A fully connected layer with 16 neurons followed by a soft-max layer is attached to the merged outputs and forms the basis of the discriminative training. With the weights and biases obtained through the quantum generative training as starting points, the complete network is retrained with the Adam optimizer to minimize the categorical cross-entropy loss.
In order to draw samples from the AQC-based device, D-Wave's 2000Q quantum processing unit is used remotely over the cloud. The model expectations required to compute the weight and bias updates are calculated with these samples. This AQC-based device has 2,048 qubits and 5,600 couplers, which limit the size of a fully connected RBM energy function with an equal number of visible and hidden units to 52 units in each layer. 1,000 anneal runs are performed on this quantum processing unit, with each run lasting 20 μs. An embedding scheme for the corresponding RBM energy function is determined in a heuristic manner, and the obtained graph minor is re-used to eliminate unnecessary complications with the effective temperature parameter $\beta_{\text{eff}}$. This parameter is set to a constant value of one for the CSTR case study.
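For illustration, a hedged sketch of this sampling step using D-Wave's Ocean SDK is given below; it assumes the rbm_to_qubo helper sketched earlier and a configured cloud account, and the exact API surface may vary across SDK versions.

```python
# Sketch of quantum sampling on a D-Wave QPU (assumes Ocean SDK access is
# configured; rbm_to_qubo is the hypothetical helper sketched earlier).
import numpy as np
import minorminer
from dwave.system import DWaveSampler, FixedEmbeddingComposite

m, n = 15, 8  # e.g., the CSTR second-RBM layer sizes used in this study
W, b, c = np.zeros((m, n)), np.zeros(m), np.zeros(n)  # current RBM parameters

qpu = DWaveSampler()
Q = rbm_to_qubo(W, b, c, beta_eff=1.0)
# Find a graph minor once and re-use it for every training iteration,
# as described above, to keep beta_eff stable.
couplings = [edge for edge in Q if edge[0] != edge[1]]
embedding = minorminer.find_embedding(couplings, qpu.edgelist)
sampler = FixedEmbeddingComposite(qpu, embedding)
result = sampler.sample_qubo(Q, num_reads=1000, annealing_time=20)
samples = list(result.samples())  # 1,000 low-energy (v, h) configurations
```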
4.2. Fault Detection
Several computational experiments are conducted with the aforementioned experimental settings to
demonstrate the viability of the proposed QC-based fault diagnosis model. The local classifier detects the
state of each sample and classifies it as normal or faulty by predicting the likelihood of individual states. It
is important to note that the DBN-F sub-network is trained using the entire faulty dataset consisting of data
samples corresponding to each individual fault. This allows the local classifier to effectively learn a
discernible pattern between normal and faulty data samples. As a result, the proposed architecture is
sensitive towards previously unseen states and is capable of detecting samples belonging to unknown faulty
states. A probability control limit of 0.5 is used to classify the input data vector as normal or faulty. The
fault detection rate (FDR) and the false alarm rate (FAR) are reported for each fault classified with the
proposed QC-based fault diagnosis model in Table 1. FDR is defined as the fraction of faulty samples that
are accurately detected, and FAR is the fraction of normal data samples that are incorrectly classified as
faulty. If p is the number of faulty samples that are detected as faulty and q is the number of normal samples detected as faulty, then FDR and FAR can be computed with Eqs. (20) and (21), respectively. FDRs for the CSTR case study estimated with canonical variate dissimilarity analysis (CVDA) [62] are also reported in Table 1, where the control limits for fault detection are computed with the T² statistic.
$$\mathrm{FDR}\,(\%) = \frac{100\, p}{\text{total count of faulty samples}} \quad (20)$$

$$\mathrm{FAR}\,(\%) = \frac{100\, q}{\text{total count of normal samples}} \quad (21)$$
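These two metrics reduce to a few lines of code; a sketch with 1 = faulty and 0 = normal labels (our convention):

```python
def fdr_far(y_true, y_pred):
    """Fault detection rate and false alarm rate in percent, Eqs. (20)-(21)."""
    p = sum(1 for t, yp in zip(y_true, y_pred) if t == 1 and yp == 1)
    q = sum(1 for t, yp in zip(y_true, y_pred) if t == 0 and yp == 1)
    n_faulty = sum(y_true)
    n_normal = len(y_true) - n_faulty
    return 100.0 * p / n_faulty, 100.0 * q / n_normal
```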
Table 1. Fault detection results of the local classifier in the proposed QC-based deep learning
model for the CSTR case study
It can be clearly seen that the FDR of the QC-based fault diagnosis model improves significantly for the sensor fault, Fault 1. However, this may be accompanied by an increase in the number of false positives. As for the parametric Faults 2 and 3, the FDRs are higher than or comparable to the detection rates obtained by CVDA. This improvement in detection rates can be attributed to the accurate capture of
nonlinear process behavior by the QC-based fault diagnosis model. It is well known that the performance of deep architectures depends on hyper-parameters like the number of neurons used. Therefore, we also generate heatmaps of the FDRs as functions of the number of hidden units in the DBN-N and DBN-F sub-networks to determine the optimal configuration for the number of neurons in the hidden layers of the DBN sub-networks.
Figure 10. FDR heatmaps of the DBN sub-networks for the CSTR case study
The FDR maps for all three faults shown in Figure 10 indicate that the detection rates for Fault 1 are high for almost all DBN architectures. However, the detection results for Fault 2 and Fault 3 are relatively non-uniform. In the case of Fault 2, for a fixed number of hidden units in the Bernoulli RBM, the FDRs gradually decrease as the number of hidden neurons in the first RBM layer increases. In contrast, no discernible pattern is observed in the FDRs for Fault 3. The choice of the best performing DBN architecture, with 15 and 8 hidden units in the first and second RBM layers, respectively, can be clearly justified from these FDR heatmaps.
Among the 1,500 faulty samples in the testing dataset, the local classifier accurately classifies the dynamic input data samples with an average detection rate of 86.08%. This implies that 13.92% of the faulty samples are missed. Compared to the missed detection rate of 40.42% for the CVDA technique, the QC-based fault diagnosis model clearly outperforms this fault monitoring technique. The average FAR obtained over multiple computational experiments for the CSTR case study is 19.41%. These false alarms can be attributed to the difficulty of differentiating between the normal state and Fault 1, which deviates only slightly from normal operation and thus produces a higher number of false alarms. Although the proposed QC-based fault diagnosis model may produce roughly one false alarm for every five normal samples, it clearly outperforms the classical CVDA technique for all three faults with a significantly higher fault detection rate. Identifying faults for the CSTR case study is trivial, owing to the small-scale nature of the process system with its low number of process variables and few simulated faults. Performing fault identification for the CSTR case study manually, by visual inspection of the recorded process variables, could be much more cost efficient than implementing an advanced fault diagnosis methodology for minor pattern recognition. In the next case study, automatic fault identification is essential due to the large-scale nature of the nonlinear process system, for which we implement a QC-based deep learning model for fault identification.
Figure 11. Schematic of the Tennessee Eastman (TE) chemical process
The sub-networks DBN-N and DBN-F use the same architectural configuration to produce abstractions of the normal and faulty states, respectively. The first RBM layer in the DBN-based sub-networks consists of 52 visible Gaussian units and 26 Bernoulli hidden units. This RBM layer is set up to produce an output by a simple perceptron operation without sampling from the underlying Gaussian distribution. The following RBM layer uses 26 visible units, corresponding to the hidden layer of the first RBM, along with 20 hidden units. The output of the second RBM layer is produced by sampling from a binomial distribution with the hidden unit values as the means. A learning rate of 0.01 and a momentum of one are used to train the DBN sub-networks via quantum generative training. The learning rate for an RBM with Gaussian units should be at least one order of magnitude smaller than that for the corresponding binary RBM. Cross-entropy loss is used as a performance metric to track the progress of the quantum generative training process. A data vector with 40 dimensions is obtained after the high-level abstractions from the DBN-N and DBN-F sub-networks are concatenated. A fully connected layer with 40 neurons is attached to this input and forms the major component of the discriminator sub-network. Fine-tuning of the weights and biases obtained through quantum generative training is performed by training the discriminator with the Adam optimizer to minimize the categorical cross-entropy loss.
The quantum generative training process draws samples from the AQC-based quantum computer via quantum sampling to approximate the model expectations. A D-Wave 2000Q quantum processor with 2,048 qubits and 5,600 couplers is used remotely over the cloud for all computational experiments involving the QC-based fault detection model. The anneal schedule runs for 20 μs on this processor. To compute the model expectations, 1,000 anneal reads are used, i.e., 1,000 samples are drawn from the quantum computer. For a single RBM instance, an embedding scheme for the corresponding RBM energy function is found through a heuristic technique. Drawing samples from the quantum computer for the energy function requires the use of the same embedding scheme. It is important to re-use the same graph minor in order to minimize the variation in the effective temperature-dependent parameter $\beta_{\text{eff}}$. For this case study, the value of the unknown scale parameter is set to unity to avoid further complications associated with the hyper-parameter learning rules.
is 99.39%, meaning only 0.61% of the faulty samples remain undetected. A major challenge in developing fault diagnosis models is to adjust the trade-off between the FDRs and the FARs. An increase in FDR is usually accompanied by an increase in FAR. However, the false positive rate for the proposed QC-based diagnosis model is only 5.25%. With an average FDR of 99.39% and FAR of 5.25%, the proposed fault diagnosis model performs strongly and can efficiently differentiate faulty process data from normal states of operation.
Table 2. Fault detection results of the local classifier in the proposed QC-based deep learning model for
the TE process case study
Figure 12. FDR heatmaps of the DBN sub-networks for the TE process case study
context of fault identification, the FDR metric is modified to compute the classification accuracy for each fault class. To this end, the FDR is computed for each fault type with Eq. (22), where r is the number of samples of a particular faulty state that are accurately classified to that state.

$$\mathrm{FDR}\,(\%) = \frac{100\, r}{\text{total count of samples of this fault type}} \quad (22)$$
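The per-class FDRs of Eq. (22) are the diagonal of a row-normalized confusion matrix, which also underlies Figure 13 below; a small sketch:

```python
import numpy as np

def confusion_matrix_pct(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix in percent; diagonal entries are
    the per-class FDRs of Eq. (22)."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)
```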
For comparison purposes, the FDRs of some state-of-the-art data-driven and deep neural network-based
approaches are also reported. The diagnosis results obtained for the TE process using PCA [20] and DBN-
based fault diagnosis model [34] are also reported in Table 3. As evident from the diagnosis results, PCA
does not effectively detect several of the faults in the TE process. The diagnosis rates for faults 3, 9, and 15
are particularly poor. The inability of PCA to take into account the temporal correlations might be
contributing to its poor diagnosis performance. This shortcoming is overcome by the DBN-based fault diagnosis model, which strongly improves the diagnosis results for a significant portion of the faulty states. The FDRs for faults 3 and 9 improve with the DBN-based model, but the FDR for fault 15 is even worse than that of PCA. None of the faults of types 15 and 16 are diagnosed with the DBN-based model in [34].
However, the proposed QC-based fault diagnosis model provides an FDR of 44.9% for fault 15, which is significantly higher than both the PCA and the DBN-based model proposed in [34]. In the TE process, faults 3, 5, 9, 15, and 16 are particularly hard to detect and usually require significant model tweaking for a mediocre performance improvement. Among the poorly performing fault classes with the QC-based fault diagnosis model, only faults 9, 10, and 11 are random faults. Fault 3 is a step fault introduced in the feed temperature of the TE process. The DBN-based model in [34] is unable to detect any fault of types 15 and 16. Most state-of-the-art fault diagnosis models are unable to generalize to multiple faults. A similar trend in generalization to multiple faults is observed with the proposed QC-based fault diagnosis model. Investigating the significance of poorly detected faults on the generalization abilities of the QC-based fault diagnosis model forms the basis for the future scope of this work. On the other hand, the global classifier in the proposed QC-based diagnosis model classifies almost all fault states with a significantly higher accuracy than PCA achieves for faults 3 and 9. The resulting FDRs for the rare and hard-to-detect faults 15 and 16 are higher than those of both the PCA and DBN-based models as well. The lowest FDR reported by the QC-based fault diagnosis model is 38.1%, for fault 9. Although this diagnosis rate for fault 9 is lower than that of the DBN-based model, the diagnostic performance of the proposed model is clearly superior for faults that are rare and hard to detect. Apart from false positives, misclassification of faulty states is also possible and cannot be overlooked, as the repairs triggered by the detection of particular faults could be expensive.
The performance of this fault diagnosis model for fault identification can be represented by a confusion matrix, which allows visualization of both correct classification and misclassification. Figure 13 presents the diagnosis results of the global classifier in the QC-based fault diagnosis model in the form of a confusion matrix. The diagonal elements of the matrix are the FDRs for a particular class of samples. The last row of the matrix, labeled normal, corresponds to the FARs and contains some of the lowest values in the confusion matrix. The confusion matrix can also be used to determine the degree of resemblance between classes of samples. Faults with no similarities whatsoever to other faulty or normal states are relatively easy to diagnose, with lower chances of misidentification. Faults 1, 2, 5, 6, and 18 are a few such faults, with recorded detection rates as high as 100%.
The low FARs produced by the local classifiers in the QC-based fault diagnosis model are maintained
for the diagnosis results of the global classifier network. Several computational experiments are performed
for the global classifier network in order to estimate the extent of FARs for each corresponding fault. Figure
14 shows the FARs for each of the 20 faults simulated in the TE process. It should be noted that the highest FAR recorded is below one percent. Although an average FDR of 82.1% is reported for the DBN-based model [34], it should be noted that that framework is developed specifically for complex chemical processes. Its application to the TE process involves several data preprocessing steps, like variable sorting and time length selection, that are not considered in this case study performed with the QC-based deep learning model for fault diagnosis. Such preprocessing steps involving feature screening could improve the diagnosis results of the proposed QC-based deep learning model. However, to demonstrate the generalization capabilities of the proposed fault diagnosis model, feature screening is not performed for the TE process case study. With an average FDR of 80% and a total average FAR of 1.3%, the proposed QC-based fault diagnosis model can be used competitively against highly targeted state-of-the-art fault diagnosis methods implemented on classical computers, and for detection and diagnosis of rare faults in complex process systems.
Table 3. Comparison between different fault diagnosis models for the TE process case study with respect
to fault detection rates for each identified fault
Fault    PCA [20]    DBN-based model [34]    Proposed QC-based model
8        98          78                      76.3
9        8.38        57                      38.1
10       60.5        98                      44.6
11       78.88       87                      51.5
12       99.13       85                      81.3
13       95.38       88                      94.7
14       100         87                      86.9
15       14.13       0                       44.9
16       55.25       0                       68.3
17       95.25       100                     92.7
18       90.5        98                      95.6
19       41.13       93                      73.0
20       63.38       93                      89.9
Figure 13. Confusion matrix for the fault diagnosis results obtained by the global classifier
Figure 14. False alarm rates for the global classifier in the TE process case study
6. Quantum Advantage
Conventional classical learning techniques for training DBNs crudely approximate the log-likelihood gradients of the training data required for the parameter update rules. The CD-k algorithm instead approximates the contrastive divergence, defined as the difference between two Kullback–Leibler divergences [54]. It has also been demonstrated that the CD-k algorithm does not follow the gradient of any function [64]. Although CD-k converges to the true gradient in the limit of infinitely many reconstruction steps, running the algorithm that long is impractical. Beyond these approximation limitations, CD-k may take many iterations to converge due to the inherent noise in Gibbs sampling and the slow evolution towards the equilibrium probability distribution [17].
Quantum generative training circumvents some of the challenges posed by classical training techniques. For machine learning and deep learning applications, a quantum advantage can be quantified by the computational effort required to achieve a particular model performance. The required computation time could also be considered a factor in demonstrating the efficiency of quantum-inspired techniques over classical techniques. In the case studies, the performance profiles given by the loss curves for the second RBM layer in the DBN-F sub-network can be used to compare the performance of classical and quantum training techniques. Loss curves for all faults in the CSTR case study are shown in Figure 15 for both the CD-1 algorithm and the quantum sampling-based training approach. Similar curves for the TE process case study are also plotted for a few faults and are given in Figure 16. These particular faults in the TE process are chosen such that a clear distinction between the classical and quantum techniques can be observed. As seen in the plots, the QC-based training algorithm converges faster than the classical CD-1 algorithm. A clear quantum advantage can be perceived with the quantum assisted training techniques for the proposed fault diagnosis model. The computation time required to calculate gradients with both quantum and classical techniques is negligible in the case of the TE process; therefore, this is not an effective criterion for quantifying the superiority of quantum-inspired techniques over classical training algorithms. In addition, samples are drawn from the AQC device at each step of the quantum sampling process within 20 μs. This sampling time is independent of the size of the RBM network and does not increase with size, unlike classical training techniques. This implies that a computational time advantage could be clearly perceived for large networks trained with the quantum generative training process.
The approximation errors of the CD-1 algorithm for training DBNs could adversely affect its performance as the size of the RBM sub-networks grows very large. Markov chain based conventional training techniques would not be a feasible choice in such cases either. However, because quantum sampling draws samples from an underlying approximate Boltzmann distribution that models the joint probability of the RBM, the quantum generative training technique can deliver efficient performance, provided that the size of the RBM energy function does not exceed the scale of current AQC-based computers. As evident from the two case studies of process monitoring in nonlinear complex process systems, the proposed QC-based fault diagnosis model effectively detects faults with significantly higher detection rates and lower false positive rates. This implies that the proposed fault diagnosis model is a generalized approach and could work for most nonlinear complex process systems with little to no modification. With the increasing applicability of deep neural networks, a quantum advantage provides an extra edge to such approaches. To this end, it is important to note that computational speed could also contribute to the quantum advantage as the number of process variables increases. Faster convergence with quantum sampling means less computation to achieve the same model performance as classical techniques like the CD-1 algorithm. High computational speed coupled with faster convergence can ensure superior performance of such deep learning models and methods.
Figure 15. Loss curves for DBN trained with quantum and classical methods for the CSTR case study
Figure 16. Loss curves for DBN trained with quantum and classical techniques for the TE process case
study
7. Conclusion
In this paper, we proposed a QC-based fault diagnosis model to distinguish faulty states from normal
operating states in complex industrial chemical process systems. We integrated quantum assisted generative
training with classical discriminative training to detect and diagnose multiple faults introduced in the
system. The sampling abilities of AQC computers were exploited to perform quantum generative training
for the DBN-based sub-networks present in the proposed QC-based fault diagnosis model. The applicability of this model was demonstrated through two applications, on a CSTR and on the TE process, respectively. The obtained detection and diagnosis results indicated that the proposed QC-based fault diagnosis model clearly outperformed state-of-the-art data-driven approaches and deep neural network based models in most cases. A quantum advantage over classical training approaches was also perceived when using quantum generative training for the DBN-based sub-networks of the proposed fault diagnosis model.
References
[1] S. J. Qin, "Survey on data-driven industrial process monitoring and diagnosis," Annual
Reviews in Control, vol. 36, pp. 220-234, 2012.
[2] E. Russell, L. H. Chiang, and R. D. Braatz, Data-driven methods for fault detection and
diagnosis in chemical processes. London New York: Springer, 2000.
[3] M. Kano, K. Nagao, S. Hasebe, I. Hashimoto, H. Ohno, R. Strauss, et al., "Comparison of
multivariate statistical process monitoring methods with applications to the Eastman
challenge problem," Computers & Chemical Engineering, vol. 26, pp. 161-174, 2002.
[4] C. Shang and F. You, "Data Analytics and Machine Learning for Smart Process
Manufacturing: Recent Advances and Perspectives in the Big Data Era," Engineering, vol.
5, pp. 1010-1016, 2019.
[5] J. MacGregor and A. Cinar, "Monitoring, fault diagnosis, fault-tolerant control and
optimization: Data driven methods," Computers & Chemical Engineering, vol. 47, pp. 111-
120, 2012.
[6] L. Luo, R. J. Lovelett, and B. A. Ogunnaike, "Hierarchical monitoring of industrial
processes for fault detection, fault grade evaluation, and fault diagnosis," AIChE Journal,
vol. 63, pp. 2781-2795, 2017.
[7] Y. Chu and F. You, "Model-based integration of control and operations: Overview,
challenges, advances, and opportunities," Computers & Chemical Engineering, vol. 83, pp.
2-20, 2015.
[8] E. G. Rieffel, D. Venturelli, B. O’Gorman, M. B. Do, E. M. Prystay, and V. N.
Smelyanskiy, "A case study in programming a quantum annealer for hard operational
planning problems," Quantum Information Processing, vol. 14, pp. 1-36, 2015.
[9] A. Perdomo-Ortiz, N. Dickson, M. Drew-Brook, G. Rose, and A. Aspuru-Guzik, "Finding
low-energy conformations of lattice protein models by quantum annealing," Scientific
Reports, vol. 2, p. 571, 2012.
[10] A. Ajagekar, T. Humble, and F. You, "Quantum computing based hybrid solution strategies
for large-scale discrete-continuous optimization problems," Computers & Chemical
Engineering, vol. 132, p. 106630, 2020.
[11] T. T. Tran, M. Do, E. G. Rieffel, J. Frank, Z. Wang, B. O'gorman, et al., "A Hybrid
Quantum-Classical Approach to Solving Scheduling Problems," in Symposium on
Combinatorial Search, 2016.
[12] F. Neukart, G. Compostella, C. Seidel, D. von Dollen, S. Yarkoni, and B. Parney, "Traffic
Flow Optimization Using a Quantum Annealer," Frontiers in ICT, vol. 4, 2017.
[13] A. Ajagekar and F. You, "Quantum computing for energy systems optimization:
Challenges and opportunities," Energy, vol. 179, pp. 76-89, 2019.
[14] N. Wiebe, D. Braun, and S. Lloyd, "Quantum Algorithm for Data Fitting," Physical Review
Letters, vol. 109, 2012.
[15] S. Aaronson, "Quantum Machine Learning Algorithms," Nature Physics, vol. 11, pp. 291-
293, 2015.
[16] X. Gao, Z. Zhang, and L. Duan. (2017, November 01, 2017). An efficient quantum
algorithm for generative machine learning. arXiv e-prints. Available:
https://ui.adsabs.harvard.edu/abs/2017arXiv171102038G
[17] S. H. Adachi and M. P. Henderson, "Application of quantum annealing to training of deep
neural networks," arXiv preprint arXiv:1510.06356, 2015.
[18] I. Kerenidis and A. Prakash. (2016, March 01, 2016). Quantum Recommendation Systems.
arXiv e-prints. Available: https://ui.adsabs.harvard.edu/abs/2016arXiv160308675K
[19] L. H. Chiang, E. L. Russell, and R. D. Braatz, "Fault diagnosis in chemical processes using
Fisher discriminant analysis, discriminant partial least squares, and principal component
analysis," Chemometrics and Intelligent Laboratory Systems, vol. 50, pp. 243-252, 2000.
[20] S. Yin, S. X. Ding, A. Haghani, H. Y. Hao, and P. Zhang, "A comparison study of basic
data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee
Eastman process," Journal of Process Control, vol. 22, pp. 1567-1581, 2012.
[21] V. Venkatasubramanian, R. Rengaswamy, S. N. Kavuri, and K. Yin, "A review of process
fault detection and diagnosis Part III: Process history based methods," Computers &
Chemical Engineering, vol. 27, pp. 327-346, 2003.
[22] L. H. Chiang, M. E. Kotanchek, and A. K. Kordon, "Fault diagnosis based on Fisher
discriminant analysis and support vector machines," Computers & Chemical Engineering,
vol. 28, pp. 1389-1401, 2004.
[23] J. M. Lee, C. K. Yoo, and I. B. Lee, "Statistical process monitoring with independent
component analysis," Journal of Process Control, vol. 14, pp. 467-485, 2004.
[24] R. Jia, S. Zhang, and F. You, "Transfer learning for end-product quality prediction of batch
processes using domain-adaption joint-Y PLS," Computers & Chemical Engineering, vol.
140, p. 106943, 2020.
[25] J. F. Macgregor, C. Jaeckle, C. Kiparissides, and M. Koutoudi, "Process Monitoring and
Diagnosis by Multiblock Pls Methods," Aiche Journal, vol. 40, pp. 826-838, 1994.
[26] M. Kano, S. Hasebe, I. Hashimoto, and H. Ohno, "A new multivariate statistical process
monitoring method using principal component analysis," Computers & Chemical
Engineering, vol. 25, pp. 1103-1113, 2001.
[27] J. M. Lee, S. J. Qin, and I. B. Lee, "Fault detection and diagnosis based on modified
independent component analysis," Aiche Journal, vol. 52, pp. 3501-3514, 2006.
[28] Q. P. He, S. J. Qin, and J. Wang, "A new fault diagnosis method using fault directions in
fisher discriminant analysis," Aiche Journal, vol. 51, pp. 555-571, 2005.
[29] J. C. Hoskins, K. M. Kaliyur, and D. M. Himmelblau, "Fault-Diagnosis in Complex
Chemical-Plants Using Artificial Neural Networks," Aiche Journal, vol. 37, pp. 137-141,
1991.
[30] T. Sorsa and H. N. Koivo, "Application of Artificial Neural Networks in-Process Fault-
Diagnosis," Automatica, vol. 29, pp. 843-849, 1993.
[31] V. Venkatasubramanian and K. Chan, "A Neural Network Methodology for Process Fault-
Diagnosis," Aiche Journal, vol. 35, pp. 1993-2002, 1989.
[32] M. J. Willis, G. A. Montague, C. Dimassimo, M. T. Tham, and A. J. Morris, "Artificial
Neural Networks in Process Estimation and Control," Automatica, vol. 28, pp. 1181-1187,
1992.
[33] F. Y. Lv, C. L. Wen, Z. J. Bao, and M. Q. Liu, "Fault Diagnosis Based on Deep Learning,"
2016 American Control Conference (Acc), pp. 6851-6856, 2016.
[34] Z. P. Zhang and J. S. Zhao, "A deep belief network based fault diagnosis model for complex
chemical processes," Computers & Chemical Engineering, vol. 107, pp. 395-407, 2017.
[35] K. B. Lee, S. Cheon, and C. O. Kim, "A Convolutional Neural Network for Fault
Classification and Diagnosis in Semiconductor Manufacturing Processes," Ieee
Transactions on Semiconductor Manufacturing, vol. 30, pp. 135-142, 2017.
[36] H. Wu and J. S. Zhao, "Deep convolutional neural network model based chemical process
fault diagnosis," Computers & Chemical Engineering, vol. 115, pp. 185-197, 2018.
[37] C. Ning and F. You, "Optimization under uncertainty in the era of big data and deep
learning: When machine learning meets mathematical programming," Computers &
Chemical Engineering, vol. 125, pp. 434-448, 2019.
[38] T. Albash and D. A. Lidar, "Adiabatic quantum computation," Reviews of Modern Physics,
vol. 90, 2018.
[39] T. Kadowaki and H. Nishimori, "Quantum annealing in the transverse Ising model,"
Physical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary
Topics, vol. 58, pp. 5355-5363, 1998.
[40] A. Lucas, "Ising formulations of many NP problems," Frontiers in Physics, vol. 2, 2014.
[41] K. E. Hamilton and T. S. Humble, "Identifying the minor set cover of dense connected
bipartite graphs via random matching edge sets," Quantum Information Processing, vol.
16, p. 94, 2017.
[42] S. Okada, M. Ohzeki, M. Terabe, and S. Taguchi, "Improving solutions by embedding
larger subproblems in a D-Wave quantum annealer," Scientific Reports, vol. 9, 2019.
[43] D-Wave System Documentation (2018). Available:
https://docs.dwavesys.com/docs/latest/index.html
[44] J. Roland and N. J. Cerf, "Noise resistance of adiabatic quantum computation using random
matrix theory," Physical Review A, vol. 71, 2005.
[45] D. S. Wild, S. Gopalakrishnan, M. Knap, N. Y. Yao, and M. D. Lukin, "Adiabatic Quantum
Search in Open Systems," Physical Review Letters, vol. 117, 2016.
[46] M. H. Amin, P. J. Love, and C. J. Truncik, "Thermally assisted adiabatic quantum
computation," Phys Rev Lett, vol. 100, p. 060503, 2008.
[47] D. Venturelli, D. Marchand, and G. Rojo, "Job shop scheduling solver based on quantum
annealing," in Proc. of ICAPS-16 Workshop on Constraint Satisfaction Techniques for
Planning and Scheduling (COPLAS), pp. 25-34.
[48] E. R. David and L. M. James, "Information Processing in Dynamical Systems: Foundations
of Harmony Theory," in Parallel Distributed Processing: Explorations in the
Microstructure of Cognition: Foundations, ed: MITP, 1987, pp. 194-281.
[49] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural
networks," Science, vol. 313, pp. 504-507, 2006.
[50] R. Salakhutdinov, A. Mnih, and G. Hinton, "Restricted Boltzmann machines for
collaborative filtering," presented at the Proceedings of the 24th international conference
on Machine learning, Corvalis, Oregon, USA, 2007.
[51] A.-r. Mohamed and G. Hinton, "Phone recognition using restricted boltzmann machines,"
in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010,
pp. 4354-4357.
[52] A. Fischer and C. Igel, "Training restricted Boltzmann machines: An introduction," Pattern
Recognition, vol. 47, pp. 25-39, 2014.
[53] J. J. Hopfield, "Neural networks and physical systems with emergent collective
computational abilities," Proceedings of the national academy of sciences, vol. 79, pp.
2554-2558, 1982.
[54] G. E. Hinton, "A Practical Guide to Training Restricted Boltzmann Machines," in Neural
Networks: Tricks of the Trade: Second Edition, G. Montavon, G. B. Orr, and K.-R. Müller,
Eds., ed Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 599-619.
[55] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural
Computation, vol. 14, pp. 1771-1800, 2002.
[56] G. Hinton, "Deep belief networks," Scholarpedia, p. 5947, 2009.
[57] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets,"
Neural computation, vol. 18, pp. 1527-1554, 2006.
[58] A. K. Jain and B. Chandrasekaran, "39 Dimensionality and sample size considerations in
pattern recognition practice," in Handbook of Statistics. vol. 2, ed: Elsevier, 1982, pp. 835-
855.
[59] A. Perdomo-Ortiz, B. O'Gorman, J. Fluegemann, R. Biswas, and V. N. Smelyanskiy,
"Determination and correction of persistent biases in quantum annealers," Sci Rep, vol. 6,
p. 18628, 2016.
[60] Z. Bian, F. Chudak, W. Macready, A. Roy, R. Sebastiani, and S. Varotti, "Solving sat and
maxsat with a quantum annealer: Foundations, encodings, and preliminary results," arXiv
preprint arXiv:1811.02524, 2018.
[61] M. Benedetti, J. Realpe-Gómez, R. Biswas, and A. Perdomo-Ortiz, "Estimation of effective
temperatures in quantum annealers for sampling applications: A case study with possible
applications in deep learning," Physical Review A, vol. 94, p. 022308, 2016.
[62] K. E. S. Pilario, Y. Cao, and M. Shafiee, "Mixed kernel canonical variate dissimilarity
analysis for incipient fault monitoring in nonlinear dynamic processes," Computers &
Chemical Engineering, vol. 123, pp. 143-154, 2019.
[63] J. J. Downs and E. F. Vogel, "A Plant-Wide Industrial-Process Control Problem,"
Computers & Chemical Engineering, vol. 17, pp. 245-255, 1993.
[64] I. Sutskever and T. Tieleman, "On the convergence properties of contrastive divergence,"
in Proceedings of the thirteenth international conference on artificial intelligence and
statistics, 2010, pp. 789-795.