Shapley-Based Explainable AI For Clustering
1 Introduction
A significant limitation of existing AI techniques, one that threatens trust, adoption, and maturity in the manufacturing industry, is that it is exceedingly difficult to understand how developed models make predictions, particularly with arbitrarily deep and nonlinear neural networks. This lies at the forefront of what McKinsey & Company cited as the most significant challenge facing companies implementing Industry 4.0 solutions in 2020, outside of the COVID-19 pandemic: a limited understanding of the technology itself (Agrawal et al. 2021). In light of these central challenges, the field of explainable artificial intelligence (XAI) has recently emerged as a research area exploring various approaches. Explainable and trustworthy data-driven methodology can improve decision-making for fault diagnosis as well as the broader field of prognostics and health management (PHM), enabling cost-saving intelligent predictive maintenance and effective resource allocation strategies (Hrnjica and Softic 2020).
XAI methods vary by aspects such as explanation scope, model specificity, and the location of the explanations (Ahmed et al. 2022). In recent years, XAI methods have been developed for several applications including manufacturing cost estimation (Yoo and Kang 2021), predictive maintenance (Hrnjica and Softic 2020; Serradilla et al. 2021), privacy-preserving industrial applications (Ogrezeanu et al. 2022), semiconductor defect classification (Lee et al. 2022), and quality management in semiconductor manufacturing (Senoner et al. 2021). Most of these applications attempt to quantify feature attributions for predictions and global feature importance, which historically have been model-specific and derived from nonparametric, decision tree-based modeling (Ahmed et al. 2022). Features are also inherently explainable for linear regression models, with the coefficients directly giving the attribution (Sofianidis et al. 2021). However, for nonlinear parametric models, learned weight values may give misleading results, both due to differences in variable scale and the nonlinearities in the architecture. As a result, XAI methods have recently been developed to quantify feature attributions on a model-agnostic level, compatible with high-fidelity nonlinear supervised learning models of arbitrary complexity.
One prominent model-agnostic XAI technique is the Local Interpretable Model-Agnostic Explanations (LIME) method. The purpose of LIME is to explain individual black-box model predictions, accomplished by weighting perturbed data samples around a neighborhood of an observation of interest and fitting a low-fidelity explainer model (Molnar 2022). The output explainer model is typically a weighted linear and/or sparse method that is inherently explainable as a surrogate model, recovering the local decision boundaries around the observation of interest. The advantage of LIME is its model-agnostic capability of mixing low- and high-fidelity predictive models, taking advantage of the discrimination capabilities of high-fidelity nonlinear modeling while simultaneously allowing local explanations provided by the low-fidelity explainer. This multi-fidelity approach also enables the use of different feature sets; for example, highly accurate convolutional neural networks (CNNs) with learned feature representations can be utilized in conjunction with LASSO-regularized linear explainers trained on human-explainable features (Molnar 2022). However, there are significant limitations with LIME, including instability stemming from the sampling technique used. The resulting explanations may be brittle and only valid within the target observation's local neighborhood, which is in itself non-trivial to define (Molnar 2022).
Shapley-based techniques have been developed as an alternative to obtain both local and global feature attributions (Senoner et al. 2021). Originating from game theory in economics, the Shapley value was first introduced in 1951 by Lloyd Shapley, who later won the Nobel Memorial Prize in Economic Sciences in 2012. The concept of Shapley values addresses a fundamental question: in a cooperative game, how does one determine the fairest payoff for all players, considering their individual contributions to the game? In the last decade, Shapley values have been re-explored for machine learning contexts, in which a Shapley value is interpreted as the average marginal contribution of a feature across all possible feature subsets.
Computing the exact Shapley value is computationally expensive, and the complexity scales exponentially with the number of features, making exact computation exceedingly difficult when data are high-dimensional. As a result, several approximation methods have been proposed to obtain Shapley-inspired feature attributions. Štrumbelj and Kononenko proposed a stochastic Monte Carlo-based sampling approach that approximates Shapley values via permuting and splicing data instances (Štrumbelj and Kononenko 2014). Another approximation technique was later developed by Lundberg et al. with SHapley Additive exPlanations (SHAP), a class of deterministic methods originally developed for decision tree models (TreeSHAP) that has since been extended to provide additive explanations regardless of the model (Lundberg et al. 2017).
Since the inception of the SHAP method, researchers have also explored the utility of the values beyond simply offering post-hoc model interpretations. For example, Senoner et al. used information from SHAP analysis in the semiconductor domain to identify key process and quality drivers, recommending candidate improvement actions to significantly reduce yield loss at Hitachi Energy in Zurich, Switzerland (Senoner et al. 2021). In addition, Cooper et al. detailed how SHAP values provide valuable context for supervised clustering analysis, discovering key subgroups in a COVID-19 symptomatology case study (Cooper et al. 2021). Although the clusters from Cooper et al.'s work are derived from a pipeline containing several nonlinear transformations, such as those within the predictive model and stochastic dimensionality reduction, the resulting clusters are human-explainable and can be characterized with high precision by simple decision rules.
However, methods thus far have not extended Shapley-based clustering analysis to semi-supervised contexts in which only partial labeling is available to construct data-driven models, a key limitation for practical application. In addition, existing methods have not evaluated the utility of Shapley methods for model predictions with class imbalance. This paper aims to address these research gaps by proposing a new clustering framework extensible to semi-supervised problem scenarios. The main contributions of this paper are summarized as follows:
The first case study explores unsupervised and semi-supervised cases, whereas the second focuses on Shapley-explainable clustering for a fully supervised learning scenario. The case studies vary in data type (RGB image heatmaps versus time series data), level of supervision, as well as application area (semiconductor manufacturing versus aerospace prognostics), demonstrating the flexibility of the proposed methodology.
2 Methodology
This paper explores a novel explainable clustering framework. The overall objective is to obtain information-dense clusters that relate to the underlying predictions of a semi-supervised or fully supervised trained model, which can be further described with simple rules with high precision. The methodology utilizes the following tools:
The methodology addresses the need for explainable clustering extensible to semi-supervised fault diagnosis and prognosis problem scenarios common in manufacturing datasets. All steps will be further elaborated in this section. A block diagram of the methodology, illustrating the key components as applied for the unsupervised, semi-supervised, and fully supervised case studies, is provided in Fig. 1.
Fig. 1 Proposed clustering methodology primarily based on UMAP and HDBSCAN techniques for unsupervised, semi-supervised, and fully supervised cases. The framework is augmented with Shapley value analysis tied to classification and/or regression model predictions when appropriate for semi-supervised and fully supervised cases
2.1 Shapley Value Analysis
Mathematically, the Shapley value for a feature j and sample x is defined in Eq. 1 (Lundberg et al. 2017):

$$\phi_j(x) = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,(N - |S| - 1)!}{N!}\left[f_x(S \cup \{j\}) - f_x(S)\right] \tag{1}$$

in which F is the set of all features, N is the total number of features, and S is a subset of F that excludes feature j. The term $f_x(S \cup \{j\}) - f_x(S)$ is the estimated marginal contribution of adding feature j to the feature subset S for sample x, requiring the repeated evaluation of model f. The interpretation of the Shapley value is that it is the amount feature j contributes to the prediction of sample x beyond a baseline average prediction or expectation.
Local accuracy, the most essential property of Shapley theory, is defined in Eq. 2 and states that the sum of all attributions equals the prediction:

$$f(x) = \phi_0(f) + \sum_{j=1}^{N} \phi_j(x) \tag{2}$$

where $\phi_0(f)$ represents the baseline prediction of the model f, and the local prediction $f(x)$ is explainable via a linear sum of the obtained Shapley values.
In this methodology, Shapley value analysis is used as an informative transformation of the feature space leading up to the subsequent clustering steps. The result is being able to explain any target prediction from a trained nonlinear predictive model as a linear combination of the baseline prediction and the localized feature attributions provided by the Shapley values, as in Eq. 2. However, there is further utility in Shapley values, as they can provide necessary structure to derive clusters that relate to a target prediction and contain meaningful information content (Cooper et al. 2021). Two techniques for obtaining approximations of the Shapley values are utilized in this paper: Štrumbelj and Kononenko's Monte Carlo-based sampling approach (Štrumbelj and Kononenko 2014) as well as Lundberg et al.'s SHAP method (Lundberg et al. 2017). In their paper, Lundberg et al. proved that the obtained SHAP values satisfy three important properties, justifying their usage as Shapley approximations: local accuracy (Eq. 2), missingness, and consistency.
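As a brief illustration of the local accuracy property in Eq. 2, the following minimal Python sketch (using the open-source shap package; the model and data here are illustrative placeholders, not the case-study models) computes permutation-based SHAP values and checks that the baseline plus the attributions reconstructs each prediction:

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Placeholder model and data; any fitted model with a predict method works.
X = np.random.rand(200, 5)
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + np.random.normal(0, 0.1, 200)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Model-agnostic permutation explainer with X as the background masker.
explainer = shap.PermutationExplainer(model.predict, X)
explanation = explainer(X[:10])

# Local accuracy (Eq. 2): baseline + sum of attributions == prediction.
reconstructed = explanation.base_values + explanation.values.sum(axis=1)
err = np.abs(reconstructed - model.predict(X[:10])).max()
print(f"max deviation from Eq. 2: {err:.2e}")  # ~0 up to floating point
```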
subsequent learning of the UMAP-embedded space (McInnes et al. 2018). McInnes et al. found that compared to t-SNE, UMAP better captures global behavior and can be made flexible for several use cases depending on user-specified parameters.
UMAP revolves around three key tunable parameters: the number of neighbors used to estimate the size of the local neighborhood, the minimum distance controlling the tightness of packed points, and the number of extracted UMAP components (the dimensionality of the embedding). It is important to note that unlike other standard dimensionality reduction techniques such as principal components analysis (PCA), UMAP components do not preserve original densities; therefore, the relative distances between obtained clusters may not be meaningful. However, when the number of neighbors is set appropriately high to capture global behavior (depending on application and dataset sample size) and the minimum distance is low (e.g., equal to 0), UMAP-based clustering has been empirically successful in improved visualizations that pair well with Shapley analysis for added explainability (Cooper et al. 2021).
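A minimal sketch of this embedding step with the umap-learn package follows; the parameter values shown are illustrative choices in the spirit of the guidance above, not the exact settings used in the case studies:

```python
import numpy as np
import umap

# Placeholder for the per-sample SHAP matrix computed in the previous step.
shap_values = np.random.rand(1000, 20)

# A high n_neighbors favors global structure; min_dist=0 packs points
# tightly, which suits density-based clustering downstream.
reducer = umap.UMAP(n_neighbors=100, min_dist=0.0, n_components=2,
                    random_state=42)
embedding = reducer.fit_transform(shap_values)  # shape (1000, 2)
```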
rules will be learned via the SkopeRules method (Gardin et al. 2018), which learns highly discriminative rules in accordance with specified precision and recall thresholds; the rules are then deduplicated to ensure heterogeneity. This approach is inspired by Cooper et al.'s subgroup discovery approach for COVID-19 symptomatology, which similarly obtained rule-based descriptions of shared symptoms of COVID-19-positive patients (Cooper et al. 2021). However, in this paper, the approach is further constrained by the following three heuristics to prevent overfitting and maximize the utility and explainability provided by the rules:
1. Learned rules may only consist of the top 10 highest ranked features in mean absolute Shapley values;
2. Each rule may only have up to two terms;
3. Each rule may only be described in terms of the original feature values.
As a result, as shown in Fig. 1, these descriptions will only be provided for the fully
supervised PHM case study to explain the cluster subgroups in terms of the physical
meaning of the original tabular features.
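A sketch of this rule-learning step with the skope-rules package is shown below; the data, feature names, and thresholds are placeholders rather than the exact study settings, and cluster membership is framed as a binary target per cluster:

```python
import numpy as np
from skrules import SkopeRules

# Placeholder data: original feature values restricted to the top 10
# features by mean absolute Shapley value (heuristic 1), and a binary
# target marking membership in the cluster being described.
rng = np.random.default_rng(0)
X_top10 = rng.random((1000, 10))
y_cluster = (X_top10[:, 0] > 0.7).astype(int)
top10_names = [f"feature_{i}" for i in range(10)]  # hypothetical names

clf = SkopeRules(feature_names=top10_names,
                 precision_min=0.85,  # illustrative precision threshold
                 recall_min=0.10,     # illustrative recall threshold
                 max_depth=2,         # at most two terms per rule (heuristic 2)
                 n_estimators=30,
                 random_state=42)
clf.fit(X_top10, y_cluster)

# Each entry of rules_ is (rule_string, (precision, recall, n_occurrences)).
for rule, perf in clf.rules_[:3]:
    print(rule, perf)
```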
fundamental insights for the utility of Shapley-based clustering extended to the class of weakly labeled semi-supervised learning problems prevalent in the manufacturing industry.
anomalous; it is recommended that in practical use, operators examine the unclustered samples for potential anomalies or unconventional data relative to the overall distribution.
Table 1 Classification performance for Model 1 and Model 2, with both models trained on partial labels. Model 1 offers superior performance for classification
The SHAP method is implemented using Python 3.7 to quantify the feature attributions to the target prediction: fault classification. The permutation explainer is utilized for both models, taking approximately 3 hours to calculate the feature attributions for all 59,077 samples of 20 features each, benchmarked on a single machine with an Intel Core i7-10750H CPU @ 2.60 GHz and 32 GB of RAM. After the SHAP values have been computed, the UMAP and HDBSCAN combination is utilized to produce clustering results, as mentioned previously. For both models, the same UMAP settings are used as in the unsupervised case, and similar HDBSCAN settings with a minimum cluster size of 20 are used so as to achieve a comparable number of total clusters. Fig. 3 illustrates the semi-supervised SHAP clustering assignment for Model 1.
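Putting these steps together, a hedged end-to-end sketch of the SHAP-to-clusters pipeline might look as follows; the model and data are placeholders, only the minimum cluster size of 20 is taken from the text above, and the flattening of class-wise attributions is an assumption not spelled out in the paper:

```python
import hdbscan
import numpy as np
import shap
import umap
from sklearn.ensemble import RandomForestClassifier

# Placeholders standing in for the partially labeled semiconductor data
# (the real dataset has 59,077 samples of 20 features each).
X = np.random.rand(2000, 20)
y = np.random.randint(0, 3, 2000)  # e.g., normal, Fault 1, Fault 2
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Permutation-based SHAP attributions for the fault classification output.
explainer = shap.PermutationExplainer(model.predict_proba, X)
sv = explainer(X).values  # shape (n_samples, n_features, n_classes)

# Flatten class-wise attributions before embedding (an assumption).
flat = sv.reshape(sv.shape[0], -1)

embedding = umap.UMAP(min_dist=0.0, random_state=42).fit_transform(flat)
labels = hdbscan.HDBSCAN(min_cluster_size=20).fit_predict(embedding)
# Samples labeled -1 are HDBSCAN "noise" (unclustered) for expert review.
```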
Fig. 3 UMAP + HDBSCAN clusterings based on SHAP values for Model 1, demonstrating significantly fewer unclustered samples (~0.1%) as clusters now relate to the target prediction
The difference between the clustering results illustrated in Fig. 2 and Fig. 3 is stark. The additional SHAP-based transformation introduces significant structure to the clustering space, in which it becomes more evident which clusters are shaped by the various underlying model predictions. For example, the unclustered samples in addition to clusters 7, 8, and 9 from Fig. 3 are all distinctly anomalous upon inspection; cluster 7 contains most of the Fault 2 anomalies, and Fault 1 anomalies are mostly split between clusters 8 and 9. These strong associations between the clusters and faulty classes were not present in the unsupervised case illustrated in Fig. 2. Notably, just 0.1% of the dataset remains unclustered by HDBSCAN in this scenario, improving the clustering percentage (thereby reducing the number of designated noisy samples) by an order of magnitude compared to the purely unsupervised result. This significant improvement in clustering is made possible by having experts partially label just 1.6% of the dataset and train a supervised learning model based on those labels. The equivalent clustering assignment for Model 2 is presented in Fig. 4.
Once again, SHAP provides structure to the dataset that streamlines the clustering process, even when the underlying model misclassifies a significant portion of the known Fault 1 classifications. For Model 2, 0.5% of the dataset remained unclustered by HDBSCAN, still representing a significant improvement over the purely unsupervised case. Similar to the clustering obtained from Model 1, Model 2 separates most of the identified Fault 1 samples into two clusters (3 and 10). Interestingly, the normalized mutual information (NMI) between the two clustering assignments is 0.86, indicating strong alignment in the information content of the clusters.
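For reference, NMI between two clustering assignments can be computed directly with scikit-learn (a standard implementation choice, not a detail specified in the paper):

```python
from sklearn.metrics import normalized_mutual_info_score

# Toy stand-ins for the HDBSCAN assignments from Models 1 and 2.
labels_m1 = [0, 0, 1, 1, 2, 2]
labels_m2 = [1, 1, 0, 0, 2, 2]
print(normalized_mutual_info_score(labels_m1, labels_m2))  # 1.0 here;
# the paper reports NMI = 0.86 between the two model clusterings
```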
Fig. 4 UMAP + HDBSCAN clusterings based on SHAP values for Model 2, demonstrating
similar clustering quality despite differences in the underlying model performance
mode information with component-level granularity. While the challenge itself focused on predicting the remaining useful life (RUL) of the engine units, the labeled failure modes offer an opportunity for prognostics approaches incorporating XAI techniques.
The dataset consists of 8 subsets that contain 7 different failure modes, in which each failure mode is characterized by the presence of potentially overlapping faults encountered for 5 mechanical components: fan, low pressure compressor (LPC), high pressure compressor (HPC), low pressure turbine (LPT), and high pressure turbine (HPT). Existing work on this dataset has explored deep learning techniques for RUL estimation, with challenge winners implementing variations of convolutional neural network (CNN) architectures to achieve accurate predictions (Solís-Martín et al. 2021; DeVol et al. 2021; Lövberg 2021). These approaches focused solely on RUL prediction enabled by deep feature representations, but did not forecast failing components or target interpretability for their prognostic approaches.
In our work, we focus on constructing Shapley-explainable clusters to obtain subgroups describable in terms of features directly based on the original variables of the dataset. Table 2 presents a summary of these variables, which include dynamic operating scenario descriptions and time series sensor measurements (18 time series in total), as well as auxiliary variables that are held constant per cycle. We refer to the challenge formulation and documentation for more information about this benchmark dataset (Chao et al. 2021a, b).
We propose examining this benchmark dataset from multiple perspectives, building from our previous work, which introduced the reformulation of this benchmark problem for forecasting failures based on the labeled failure mode information (Cohen et al. 2023). To enhance the interpretability and prognostic utility of the data-driven model, we aim to: 1) predict the current health status, with validation possible using the binary health state label provided by NASA; 2) predict the failing component(s) responsible for the failure in addition to RUL prediction; and 3) explain the behavior of the predictive model by assessing feature attributions, which was not considered in any prior work. Expanding the number of outputs to a total of 7 for our model allows for the simultaneous detection of incipient faults, monitoring of equipment health, and prediction of the RUL until catastrophic failure. The prognostic insights from such an approach could allow for improved decision-making, resulting in swift resource allocation and appropriate maintenance staffing, reducing costs associated with expensive reactive maintenance policies (Selcuk 2016).
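To make the 7-output formulation concrete, a hedged PyTorch sketch is given below; the layer sizes, activations, and loss weighting are illustrative assumptions and not the actual xANN architecture:

```python
import torch
import torch.nn as nn

class PrognosticsNet(nn.Module):
    """Illustrative 7-output model: health state, 5 component-failure
    forecasts (fan, LPC, HPC, HPT, LPT), and RUL regression."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.binary_heads = nn.Linear(hidden, 6)  # health + 5 components
        self.rul_head = nn.Linear(hidden, 1)      # remaining useful life

    def forward(self, x):
        h = self.backbone(x)
        return torch.sigmoid(self.binary_heads(h)), self.rul_head(h)

# A joint loss could combine binary cross-entropy for the 6 classification
# outputs with MSE for RUL; equal weighting is an assumption here, e.g.:
# loss = bce(pred_bin, target_bin) + mse(pred_rul, target_rul)
bce, mse = nn.BCELoss(), nn.MSELoss()
```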
Table 2 Complete variable descriptions from the 2021 PHM Data Challenge (Chao et al. 2021b)
Variable  Symbol  Description  Units
A₁  unit  Unit number  -
A₂  cycle  Flight cycle number  -
A₃  Fc  Flight class  -
A₄  hs  Health state  -
W₁  alt  Altitude  ft
W₂  Mach  Flight Mach number  -
W₃  TRA  Throttle-resolver angle  %
W₄  T2  Total temp. at fan inlet  °R
Xs₁  Wf  Fuel flow  pps
Xs₂  Nf  Physical fan speed  rpm
Xs₃  Nc  Physical core speed  rpm
Xs₄  T24  Total temp. at LPC outlet  °R
Xs₅  T30  Total temp. at HPC outlet  °R
Xs₆  T48  Total temp. at HPT outlet  °R
Xs₇  T50  Total temp. at LPT outlet  °R
Xs₈  P15  Total pressure in bypass-duct  psia
Xs₉  P2  Total pressure at fan inlet  psia
Xs₁₀  P21  Total pressure at fan outlet  psia
Xs₁₁  P24  Total pressure at LPC outlet  psia
Xs₁₂  Ps30  Static pressure at HPC outlet  psia
Xs₁₃  P40  Total pressure at burner outlet  psia
Xs₁₄  P50  Total pressure at LPT outlet  psia
Table 3 Health state predictions and equipment forecasts for xANN prognostics model
Prediction Precision Recall F1-Score
Unhealthy 0.99 0.98 0.99
Healthy 0.96 0.98 0.97
No Fan Failure 0.96 0.96 0.96
Fan Failure 0.92 0.93 0.93
No LPC Failure 0.88 0.94 0.91
LPC Failure 0.85 0.75 0.80
No HPC Failure 0.94 0.97 0.96
HPC Failure 0.96 0.93 0.95
No HPT Failure 0.91 0.90 0.91
HPT Failure 0.92 0.92 0.92
No LPT Failure 0.89 0.77 0.82
LPT Failure 0.80 0.90 0.85
Fig. 5 Depiction of UMAP component space colored by current health state predictions learned from a) raw (min-max normalized) values versus b) Shapley values, with the visualization in b) showing separability
The Shapley-based clusters show visual separability but, most importantly, can also be described using the original features. Using the SkopeRules implementation in Python 3.7, high-precision rules describe the derived clusters pictured in Fig. 5 based on just one term: the current cycle. This is intuitive as an initial example because the engine units are healthy in initial operation (i.e., when the cycle number is low). Fig. 6 illustrates the same clusters colored by the cycle number, where it is visually clear that the mostly healthy cluster can be described by a low cycle number.
Fig. 6 Health state clusters colored by cycle, demonstrating the ease of describing the derived Shapley-based clusters in terms of the original feature scale
The health state prediction is an intuitive case in which one variable, the current cycle, is clearly dominant in explaining the prediction. However, the other predictions of failing components are significantly more challenging due to the overlap present between failures. To clarify the subgroups of failing components, the derived clusters will consist of the failure predictions only; in other words, explainable subgroups of predicted component failures will be identified.
The Shapley-based clustering results for forecasted fan failures will be illustrated as an example of successful explainable component-level prognostic clustering. First, the top 10 features are ranked by mean absolute Shapley value, which becomes the basis for the subsequent clustering and derived descriptions. Fig. 7 depicts the global feature importance ranking for predicting fan failures.
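The ranking itself is a simple aggregation; a minimal sketch follows, with placeholder variable names standing in for the study's actual arrays:

```python
import numpy as np

# shap_values: attributions for the fan-failure output, shape
# (n_samples, n_features); feature names follow Table 2 (placeholders here).
shap_values = np.random.randn(500, 18)
feature_names = [f"var_{i}" for i in range(18)]

importance = np.abs(shap_values).mean(axis=0)   # mean |SHAP| per feature
top10_idx = np.argsort(importance)[::-1][:10]   # descending rank
top10_names = [feature_names[i] for i in top10_idx]
print(top10_names)
```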
Fig. 7 Global feature importance ranking for predicting eventual fan failures, with
cycle and average physical fan speed representing the top 2 most influential features
for the xANN model (see Table 2 for complete variable descriptions)
Using the same UMAP and HDBSCAN settings as previously described, the clusters for eventual fan failures are depicted in Fig. 8. HDBSCAN detected two clusters (in addition to some noisy samples, removed from the Fig. 8 illustration), with cluster 0 comprising most of the predictions.
The highest performing rule identified by SkopeRules that describes the major cluster involves the physical fan speed: both the mean and the third quartile statistics have quantifiable thresholds that describe the dominant fan failure cluster with a precision of 0.97. This procedure is followed across all forecasted component failures, with Shapley values recomputed for each target prediction. Table 4 lists the identified rules for each of the clusters derived using the SkopeRules method across all target predictions, including health state and RUL.
Table 4 Identified rules for Shapley-based clusters in terms of original feature values (see Table
2 for complete variable descriptions)
Of the 16 total derived cluster descriptions in Table 4, 12 characterize their respective clusters with a precision exceeding 0.85 and highlight key contributions for the forecasted failures. This is particularly notable considering that these rules are constrained to comprise a maximum of just 2 terms, using variables limited to the top 10 globally important features as quantified by mean absolute Shapley value. Some of the identified variables align with prior expectations; for example, the physical fan speed variable characterizes the major fan failure cluster. However, other variables, such as the total pressure at the burner outlet explaining both forecasted HPT failure clusters, are surprising findings that would have been difficult to pinpoint without XAI techniques. Another example is that the RUL clusters can be described with near-perfect separability based on the altitude, perhaps suggesting that a prognostics model calibrated or normalized to handle dynamic operating conditions could clarify the degradation trend (Lövberg 2021). These discoveries can lead to the potential identification of root failure causes, particularly if forecasted failures are closely investigated by domain experts.
5 Discussion
describable clustering could lead to a paradigm shift in how operators interact and interface with AI systems. Under this framework, experts can explain localized predictions of interest as well as global trends with a unified approach, a step toward demystifying black-box approaches into more trustworthy and reliable "glass box" fault diagnosis models.
6 Conclusion
The Shapley-based clustering approach proposed in this paper derives explainability from existing data-driven fault diagnosis and prognosis models. By extending Shapley-based clustering methodology to semi-supervised problems under the critical lens of fault diagnosis and PHM, it is possible to utilize XAI techniques for practical intelligent manufacturing problems where labeled data are difficult to obtain. The main contributions of this paper are listed as follows:
7 Data Availability
The dataset for the 2021 PHM Data Challenge is publicly available from NASA's Prognostics Center of Excellence Data Set Repository, accessible for download from the following link: https://www.nasa.gov/content/prognostics-center-of-excellence-data-set-repository under the heading "17. Turbofan Engine Degradation Simulation-2". Upon publication, the authors intend to additionally include a link to a GitHub repository containing the code written for this benchmark dataset.
8 Declarations and Statements
9 References
Agrawal M, Dutta S, Kelly R, Millán I (2021) COVID-19: An inflection point for Industry 4.0. https://www.mckinsey.com/business-functions/operations/our-insights/covid-19-an-inflection-point-for-industry-40. Accessed 14 Oct 2021
Ahmed I, Jeon G, Piccialli F (2022) From Artificial Intelligence to Explainable Artificial Intelligence in Industry 4.0: A Survey on What, How, and Where. IEEE Trans Industr Inform 18:5031–5042. https://doi.org/10.1109/TII.2022.3146552
Chao MA, Kulkarni C, Goebel K, Fink O (2021a) Aircraft Engine Run-To-Failure Dataset Under Real Flight Conditions. NASA Ames Prognostics Data Repository
Chao MA, Kulkarni C, Goebel K, Fink O (2021b) PHM Society Data Challenge 2021, pp 1–6
Cohen J, Huan X, Ni J (2023) Fault Prognosis of Turbofan Engines: Eventual Failure Prediction and Remaining Useful Life Estimation. arXiv:2303.12982 (preprint)
Cohen J, Ni J (2022) Semi-Supervised Learning for Anomaly Classification Using Partially Labeled Subsets. J Manuf Sci Eng 144. https://doi.org/10.1115/1.4052761
Cohen J, Ni J (2021) A Deep Fuzzy Semi-supervised Approach to Clustering and Fault Diagnosis of Partially Labeled Semiconductor Manufacturing Data. In: Rayz J, Raskin V, Dick S, Kreinovich V (eds) Explainable AI and Other Applications of Fuzzy Techniques. Springer, Cham, pp 62–73
Cooper A, Doyle O, Bourke A (2021) Supervised Clustering for Subgroup Discovery: An Application to COVID-19 Symptomatology. Commun Comput Inf Sci 1525:408–422. https://doi.org/10.1007/978-3-030-93733-1_29
DeVol N, Saldana C, Fu K (2021) Inception Based Deep Convolutional Neural Network for Remaining Useful Life Estimation of Turbofan Engines. In: Annual Conference of the PHM Society. PHM Society
Gardin F, Gautier R, Goix N, et al (2018) skope-rules: machine learning with logical rules in Python. https://github.com/scikit-learn-contrib/skope-rules. Accessed 22 Oct 2022
Hrnjica B, Softic S (2020) Explainable AI in Manufacturing: A Predictive Maintenance Case Study. In: IFIP Advances in Information and Communication Technology. Springer, pp 66–73
Innes M (2018) Flux: Elegant machine learning with Julia. J Open Source Softw 3:602. https://doi.org/10.21105/joss.00602
Lee M, Jeon J, Lee H (2022) Explainable AI for domain experts: a post hoc analysis of deep learning for defect classification of TFT-LCD panels. J Intell Manuf 33:1747–1759. https://doi.org/10.1007/s10845-021-01758-3
Longadge R, Dongre S, Malik L (2013) Class Imbalance Problem in Data Mining Review. International Journal of Computer Science and Network 2. https://doi.org/10.48550/arxiv.1305.1707
Lövberg A (2021) Remaining Useful Life Prediction of Aircraft Engines with Variable Length Input Sequences. In: Annual Conference of the PHM Society. PHM Society
Lundberg SM, Lee S-I (2017) A Unified Approach to Interpreting Model Predictions. Adv Neural Inf Process Syst 30
McInnes L (2018) Using UMAP for Clustering — umap 0.5 documentation. https://umap-learn.readthedocs.io/en/latest/clustering.html. Accessed 20 Oct 2022
McInnes L, Healy J, Astels S (2017) hdbscan: Hierarchical density based clustering. J Open Source Softw 2:205. https://doi.org/10.21105/joss.00205
McInnes L, Healy J, Melville J (2018) UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. https://doi.org/10.48550/arxiv.1802.03426
Molnar C (2022) Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, 2nd edn.
Nainggolan R, Perangin-Angin R, Simarmata E, Tarigan AF (2019) Improved the Performance of the K-Means Cluster Using the Sum of Squared Error (SSE) optimized by using the Elbow Method. J Phys Conf Ser 1361:012015. https://doi.org/10.1088/1742-6596/1361/1/012015
Ogrezeanu I, Vizitiu A, Ciușdel C, et al (2022) Privacy-Preserving and Explainable AI in Industrial Applications. Applied Sciences 12:6395. https://doi.org/10.3390/APP12136395
Redell N (2020) ShapML.jl: A Julia package for interpretable machine learning with stochastic Shapley values. https://github.com/nredell/ShapML.jl. Accessed 23 Oct 2022
Schubert E, Sander J, Ester M, et al (2017) DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN. ACM Trans Database Syst 42. https://doi.org/10.1145/3068335
Selcuk S (2016) Predictive maintenance, its implementation and latest trends. Proc Inst Mech Eng B J Eng Manuf 231:1670–1679. https://doi.org/10.1177/0954405415601640
Senoner J, Netland T, Feuerriegel S (2021) Using Explainable Artificial Intelligence to Improve Process Quality: Evidence from Semiconductor Manufacturing. Manage Sci 68:5704–5723. https://doi.org/10.1287/mnsc.2021.4190
Serradilla O, Zugasti E, de Okariz JR, et al (2021) Adaptable and Explainable Predictive Maintenance: Semi-Supervised Deep Learning for Anomaly Detection and Diagnosis in Press Machine Data. Applied Sciences 11:7376. https://doi.org/10.3390/APP11167376
Shahapure KR, Nicholas C (2020) Cluster quality analysis using silhouette score. In: Proceedings of the 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), pp 747–748. https://doi.org/10.1109/DSAA49011.2020.00096
Sofianidis G, Rožanec JM, Mladenic D, Kyriazis D (2021) A Review of Explainable Artificial Intelligence in Manufacturing. https://doi.org/10.48550/arxiv.2107.02295
Solís-Martín D, Galán-Páez J, Borrego-Díaz J (2021) A Stacked Deep Convolutional Neural Network to Predict the Remaining Useful Life of a Turbofan Engine. Annual Conference of the PHM Society 13. https://doi.org/10.36001/PHMCONF.2021.V13I1.3110
Štrumbelj E, Kononenko I (2014) Explaining prediction models and individual predictions with feature contributions. Knowl Inf Syst 41:647–665. https://doi.org/10.1007/s10115-013-0679-x
Yoo S, Kang N (2021) Explainable artificial intelligence for manufacturing cost estimation and machining feature visualization. Expert Syst Appl 183:115430. https://doi.org/10.1016/J.ESWA.2021.115430