Time-Series Classification in Smart Manufacturing Systems: An Experimental Evaluation of State-of-the-Art Machine Learning Algorithms
Abstract

Manufacturing is being transformed into smart manufacturing, entering a new data-driven era fueled by digital technologies. The resulting Smart Manufacturing Systems (SMS) gather extensive amounts of diverse data, thanks to the growing number of sensors and rapid advances in sensing technologies. Among the various data types available in SMS settings, time-series data plays a pivotal role. Hence, Time-Series Classification (TSC) emerges as a crucial task in this domain. Over the past decade, researchers have introduced numerous methods for TSC, necessitating not only algorithmic development and analysis but also validation and empirical comparison. This dual approach holds substantial value for practitioners by streamlining choices and revealing insights into models' strengths and weaknesses. The objective of this study is to fill this gap by providing a rigorous experimental evaluation of the state-of-the-art Machine Learning (ML) and Deep Learning (DL) algorithms for TSC tasks in manufacturing and industrial settings. We first explored and compiled a comprehensive list of more than 92 state-of-the-art algorithms from both the TSC and manufacturing literature. Following this, we methodologically selected the 36 most representative algorithms from this list. To evaluate their performance across various manufacturing classification tasks, we curated a set of 22 manufacturing datasets, representative of different characteristics that cover diverse manufacturing problems. Subsequently, we implemented and evaluated the algorithms on the manufacturing benchmark datasets and analyzed the results for each dataset. Based on the results, ResNet, DrCIF, InceptionTime, and ARSENAL emerged as the top-performing algorithms, boasting an average accuracy of over 96.6 % across all 22 manufacturing TSC datasets. These findings underscore the robustness, efficiency, scalability, and effectiveness of convolutional kernels in capturing temporal features in time-series data collected from manufacturing systems for TSC tasks, as three out of the top four performing algorithms leverage these kernels for feature extraction. Additionally, the LSTM, BiLSTM, and TS-LSTM algorithms deserve recognition for their effectiveness in capturing features within manufacturing time-series data using RNN-based structures.

Keywords: Smart manufacturing; Industry 4.0; Time-series classification; Machine learning; AI
* Corresponding author.
E-mail address: [email protected] (T. Wuest).
https://doi.org/10.1016/j.rcim.2024.102839
Received 8 March 2024; Received in revised form 31 May 2024; Accepted 21 July 2024
Available online 30 July 2024
reduction [8], customer demand forecasting, and real-time machine monitoring for Remaining Useful Life (RUL) forecasting [9]. Time-series data is one of the main data types available in manufacturing settings [10]. It is becoming more ubiquitous thanks to the increasing number of sensors and sensing technologies, and it can be found across a wide range of industries outside of manufacturing. Stock market prices in financial markets [11], ECG data in healthcare [12], ecohydrology sensing data [13], positional data from smart wearable devices, high-resolution images of the sun over time [14], 3D depth-sensor Kinect data [15], and vibration, pressure, and temperature data coming from manufacturing sensors [16] are all examples of time-series data.

The general goal of time-series analytics is to approximate a dataset in terms of understanding the underlying relation between data points in the time-series and incorporating consideration of recognized patterns into generated predictions. Time-series analytics is considered one of the most challenging problems in data mining, mainly because of temporal dependencies, potential variable lengths, potential seasonality, trend, non-stationarity, and noise. In general, having ordered values adds a layer of complexity to a problem [17-19]. The primary types of time-series analysis are time-series classification, time-series forecasting, anomaly detection, and clustering [17]. Time-Series Classification (TSC) is a predictive task that leverages supervised learning approaches to learn from labeled data and categorize them into labeled classes. Time-series forecasting seeks to understand data components (such as trends, seasonality, and cycles) to predict future behaviors and values. Finally, time-series clustering and anomaly detection tasks, often grouped together, use unsupervised learning approaches to create groups or clusters of data with similar properties and/or to detect anomalous data.

Manufacturing presents a unique opportunity for leveraging time-series analytics given the increasing prevalence of industrial sensors. Examples of TSC applications in smart manufacturing systems include, but are not limited to, quality inspection and control, predictive maintenance, supply chain optimization, and energy management. With the growth of Industry 4.0 and smart manufacturing, new machine tools are already equipped with advanced sensing technologies, and existing legacy systems are rapidly outfitted with large numbers of new and powerful sensors. These sensors are capable of automatically accumulating various time-series data [20]. They connect the physical assets on the shop floor and beyond to a digital network using the Industrial Internet of Things (IIoT) to collect and share data. Combining this increasing quantity and quality of data with the ability to derive meaningful insights provides organizations with the means to develop a sustained competitive advantage. This business case advances the need for researchers to investigate methods that maximize the impact of time-series analytics and, at the same time, the need to ensure the rapid transfer of new knowledge to practitioners in industry.

The motivation behind this work is the fact that, besides focusing on algorithm development and analysis, it is essential to concurrently undertake the validation and empirical comparison of the numerous existing algorithms. This endeavor holds immense value for practitioners, as it narrows their options and provides insights into the strengths and weaknesses of available models. At the same time, the manufacturing community, especially industry, is in desperate need of practical guidance on the issue. This lack of practical guidance is the main motivation for this study. To date, such an investigation has not been conducted for TSC on manufacturing data sets, and the objective of our paper is to fill this pressing research gap.

The main focus of this paper, as depicted in Fig. 1, is to examine TSC algorithms that can be effectively and efficiently applied in the manufacturing domain. This study reviews, categorizes, and evaluates the state-of-the-art TSC algorithms on a diverse set of manufacturing problems represented by different data sets. To achieve this, the state-of-the-art TSC algorithms are first identified and extracted from the literature before we use our novel and transparent methodology to select the most representative (state-of-the-art and baseline) TSC algorithms from the different categories. During the study, we identified a gap between the TSC algorithms that are common in Computer Science (CS) and the ones dominant in the manufacturing literature. To cover the best and most advanced of both worlds, the initial list of TSC algorithms comprises both the TSC community in CS and the manufacturing literature, ensuring completeness. To identify the state-of-the-art TSC algorithms in smart manufacturing settings, we implement the down-selected algorithms on publicly available manufacturing datasets, perform an empirical comparative study on them, and carefully evaluate the results. It should be noted that in Fig. 1, the size of the bubbles does not have a specific meaning; the depiction is only for illustration purposes.
in manufacturing and industrial settings from our previous work that could potentially be used as a reference for TSC tasks [10]. We analyzed all 90 papers from the proposed ML TSC algorithms in detail, resulting in 36 papers from that list that provided sufficient information or explanation to allow the reproduction of their proposed ML algorithms. Furthermore, there are several studies from the TSC community in the CS field introducing and comparing TSC algorithms [19,21,23]. After removing duplicate algorithms that were used in multiple research studies, the result is a comprehensive list of 92 TSC algorithms reflecting the current state-of-the-art in both manufacturing and CS. Moreover, we categorized the algorithms in a hierarchical structure that helps with decision-making and algorithm selection. To the best of our knowledge, there is no existing research providing a comprehensive list of applicable ML algorithms that can be used for TSC tasks in the manufacturing domain.

TSC techniques and algorithms that have been proposed in the literature can be grouped into two main categories, namely conventional ML methods and Artificial Neural Network (ANN) & DL-based methods. DL is a specific subfield of ML where the learning happens in successive layers of feature representations in a neural network structure. There has been increasing attention on DL techniques in recent years across all ML applications, and many researchers consider ANN/DL techniques a separate category from conventional ML techniques. One reason behind that may be the automatic feature extraction ability of DL techniques, whereas in conventional ML techniques, hand-crafted features must be constructed and calculated before executing any ML task. We define conventional ML as the group of non-ANN/DL techniques and algorithms that use hand-crafted features in their learning process and, unlike ANN/DL algorithms, are incapable of learning features automatically. Some other authors may use the term "Traditional ML" for this group of algorithms, but we prefer "Conventional ML" for the sake of consistency with our previous research. It should also be noted that, due to the diversity of research and publications regarding ML across a multitude of sub-domains, industries, and applications, a variety of methods for classifying ML algorithms have emerged, resulting in a lack of consensus [3].
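To make this distinction concrete, the following is a minimal sketch (ours, not from the paper, using synthetic data) of a "conventional ML" TSC pipeline in the sense defined above: hand-crafted summary features are computed from each series before any learning takes place, in contrast to ANN/DL models that learn feature representations automatically.

```python
# Illustrative sketch: conventional ML TSC = hand-crafted features + classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 300))   # 100 synthetic univariate series of length 300
y = rng.integers(0, 2, size=100)  # synthetic binary class labels

def summary_features(series_batch):
    # Hand-crafted features: mean, std, min, max, and mean absolute change.
    return np.column_stack([
        series_batch.mean(axis=1),
        series_batch.std(axis=1),
        series_batch.min(axis=1),
        series_batch.max(axis=1),
        np.abs(np.diff(series_batch, axis=1)).mean(axis=1),
    ])

clf = RandomForestClassifier(random_state=0).fit(summary_features(X), y)
print(clf.predict(summary_features(X[:5])))
```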
Table 4
Conventional ML TSC algorithms using differential-based FE technique.

| Algorithm name | FE algorithm | Classification technique | Classification algorithm |
|---|---|---|---|
| KNN-CID | first-order differences | Distance-based | KNN |
| KNN-DDTW | first-order differences | Distance-based | KNN |
| KNN-DTDC | first-order differences, cosine transform | Distance-based | KNN |

We denote these algorithms with the classifier name followed by the distance metric. Moreover, Independent, Dependent, and Adaptive KNN-DTW are generalized versions of DTW adapted for multivariate time-series [47], and the Elastic Ensemble (EE) is an ensemble algorithm that works by ensembling eleven elastic metrics [21]. Table 7 shows these algorithms and their respective classification techniques and algorithms.
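The differential-based FE idea in Table 4 can be sketched in a few lines: classify on the first-order differences of each series rather than on the raw values. (KNN-DDTW pairs this representation with a DTW distance; a plain Euclidean 1-NN is used below only to keep the example short, and the data is synthetic.)

```python
# Simplified sketch of differential-based FE: x[t+1] - x[t] as the input representation.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 150))
y = rng.integers(0, 3, size=60)

X_diff = np.diff(X, axis=1)  # first-order differences of each series
knn = KNeighborsClassifier(n_neighbors=1).fit(X_diff, y)
print(knn.predict(np.diff(X[:3], axis=1)))
```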
Table 7
Instance-based conventional TSC ML algorithms that do not use any FE techniques.

| Algorithm name | Classification technique | Classification algorithm |
|---|---|---|
| LR | Statistical | LR |
| NB | Statistical | NB |
| LDA | Statistical | DA |
| QDA | Statistical | DA |
| SVM | Distance-based | SVM |
| BAG-DT | DT Ensemble | DT |
| RF | DT Ensemble | DT |
| GBM | DT Ensemble | DT |
| Extreme RF | DT Ensemble | RF |
| XGBoost | DT Ensemble | DT |
| KNN-EUC | Distance-based | KNN |
| KNN-DTW | Distance-based | KNN |
| KNN-LCSS | Distance-based | KNN |
| KNN-ERP | Distance-based | KNN |
| KNN-WDTW | Distance-based | KNN |
| KNN-MSM | Distance-based | KNN |
| KNN-TWE | Distance-based | KNN |
| KNN-DTW-I | Distance-based | KNN |
| KNN-DTW-D | Distance-based | KNN |
| KNN-DTW-A | Distance-based | KNN |
| EE | Algorithm Ensemble | 11 Elastic metrics |
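A hedged sketch of the KNN-with-elastic-distance family from Table 7, using tslearn (one of the packages listed in Section 3.3.1) on synthetic data:

```python
# 1-NN with a DTW elastic distance; metric="euclidean" would give KNN-EUC instead.
import numpy as np
from tslearn.neighbors import KNeighborsTimeSeriesClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 120, 1))  # (n_instances, length, n_dimensions)
y = rng.integers(0, 2, size=40)

knn_dtw = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric="dtw").fit(X, y)
print(knn_dtw.predict(X[:3]))
```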
Table 9
Conventional ML TSC algorithms using distance-based classification techniques. (Columns: Algorithm name, FE technique, FE algorithm.)
algorithms in each category from the initial compiled list of 92 algorithms. More details on each algorithm can be found in the referenced papers in subsequent sections.

Table 10
Conventional ML TSC algorithms using DT ensemble classification techniques.

| Algorithm name | Classification algorithm | FE technique | FE algorithm |
|---|---|---|---|
| BAG-DT | DT | None | None |
| RF | DT | None | None |
| GBM | DT | None | None |
| Extreme RF | RF | None | None |
| RotF | RF | Statistical | PCA |
| FS | DT | Shapelet-based | SAX, Shapelet Discovery |
| TSF | Time-series Tree | Interval-based | Summary Statistics |
| CIF | Time-series Tree | Interval-based | Summary Statistics + Catch22 |
| DrCIF | DT | Interval-based | Summary Statistics + Catch22 |
| TSBF | RF | Interval-based | Summary Statistics + BOP |
| STC | RF | Shapelet-based | Shapelet transform |
| XGBoost | DT | None | None |
| gRFS | RF | Shapelet-based | Shapelet transform |
| RISE | RF | Interval-based | Spectral features |
| PF | PF | None | None |
| TS-CHIEF | PF | Hybrid | Similarity measures, dictionary representations, interval-based transformations |

Table 11
Conventional ML TSC algorithms using algorithm ensemble classification techniques.

| Algorithm name | Classification algorithm | FE technique | FE algorithm |
|---|---|---|---|
| EE | 11 Elastic metrics | None | None |
| COTE | EE, ST, ACF, PS | Hybrid | Hybrid |
| CBOSS | BOSS | Dictionary-based | SFA |
| TDE | BOSS | Dictionary-based | SFA |
| ARSENAL | ROCKET | Kernel-based | Convolution kernels |

Table 12
Conventional ML TSC algorithms using meta ensemble classification techniques.

| Algorithm name | Classification algorithm | FE technique | FE algorithm |
|---|---|---|---|
| HIVE-COTE V1.0 | EE Ensemble, Shapelet Ensemble, BOSS Ensemble, TSF, RISE | Hybrid | Hybrid |
| HIVE-COTE V2.0 | STC, TDE, ARSENAL, DrCIF | Hybrid | Hybrid |

Feed-Forward Neural Networks (FFN) are the first and simplest type of ANN architecture that was proposed. The main characteristic of these networks is that information moves in only one direction, and there are no cycles or loops in the network. The Multilayer Perceptron (MLP) [53], FFT-MLP [54], Ensemble Sparse Supervised Model (ESSM) [55], and DA-NET [17] algorithms use FFN-based architectures for TSC tasks. Table 13 shows these algorithms and their respective FE architectures and algorithms.

Table 13
ANN & DL ML TSC algorithms using FFN feature engineering architectures.

| Algorithm name | FE technique | FE layers |
|---|---|---|
| MLP | FFN | MLP |
| FFT-MLP | FFN | FFT, MLP |
| DA-NET | FFN | MLP, Dual Attention (SEWA & SSAW), MLP |
| ESSM | FFN | Sparse filtering, MLP |

Convolutional Neural Networks (CNN) are another group of ANN, originally proposed by LeCun in 1998 for image analysis. Since then, many variations of CNN have been proposed and successfully applied to different tasks. CNN architectures have shown good results in time-series classification due to their powerful local feature capture capabilities. The main characteristic of CNN is the use of convolution kernels with trainable weights to find local patterns through high-dimensional nonlinear feature extraction [17]. The CNN [54], Time-CNN [19], FDC-CNN [56], Fully Convolutional Networks (FCN) [53], t-LeNet [19], HHO-ConvNet [57], 1DCNN [58,59], Encoder [19], MCDCNN [19], Multiple Time-Series Convolutional Neural Network (MTS-CNN) [20], MCNN [19], Dilated CNN [60], ResNet [23,53], Temporal Convolutional Networks (TCN) [61], InceptionTime [62], Inception-1DCNN [63], MultiVariate Convolutional Neural Network (MVCNN) [64], CWT-CNN [65,66], and GASF-CNN [67] algorithms use different CNN-based variants for TSC tasks. It is important to note that here we only explore the general architecture of these algorithms and the different possibilities for each category of algorithms. Although several algorithms may seem similar to each other, the fine details of each of them, such as the number of layers, the number of kernels in each layer, etc., are different. These details are out of the scope of this study, and we encourage interested readers to find them in the referenced papers for each algorithm. Table 14 shows these algorithms and their respective FE architectures and algorithms.

Table 14
ANN & DL ML TSC algorithms using CNN FE architectures.

| Algorithm name | FE layers |
|---|---|
| CNN | Conv1D, MLP |
| Time-CNN | Conv1D Layers, MLP Layers |
| FDC-CNN | Conv1D, MLP |
| FCN | Conv1D, MLP |
| t-LeNet | Conv1D, MLP |
| HHO-ConvNet | Conv1D, MLP |
| 1DCNN | Conv1D, MLP |
| Encoder | Conv1D, Attention, MLP |
| MCDCNN | Independent Conv1D on channels, MLP |
| MTS-CNN | Independent Conv1D on channels, Independent MLP on channels, MLP |
| MCNN | Window Slicing, Conv1D, MLP |
| Dilated CNN | Dilated Conv, MLP |
| ResNet | Conv1D, Residual Block, MLP |
| TCN | Dilated causal convolution, Residual block, MLP |
| InceptionTime | Conv1D, Inception Block, Residual Block, MLP |
| Inception-1DCNN | Inception Conv, Conv1D, MLP |
| MVCNN | 1 × 1 Conv, Inception Conv, MLP |
| CWT-CNN | CWT, Conv2D, MLP |
| GASF-CNN | PAA, GASF, Conv2D, MLP |
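As a concrete instance of the CNN-based family in Table 14, the following is a minimal sketch of an FCN-style block (Conv1D layers followed by global pooling and a softmax head), assuming tensorflow/keras as in Section 3.3.1. The filter and kernel sizes follow commonly used FCN settings, not an exact reproduction of any referenced implementation.

```python
# Minimal FCN-style 1D-CNN sketch for TSC (illustrative, not the authors' code).
from tensorflow import keras

def build_fcn(n_timesteps, n_channels, n_classes):
    inputs = keras.Input(shape=(n_timesteps, n_channels))
    x = inputs
    for filters, kernel in [(128, 8), (256, 5), (128, 3)]:
        x = keras.layers.Conv1D(filters, kernel, padding="same")(x)
        x = keras.layers.BatchNormalization()(x)
        x = keras.layers.Activation("relu")(x)
    x = keras.layers.GlobalAveragePooling1D()(x)   # collapse the time axis
    outputs = keras.layers.Dense(n_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)

model = build_fcn(n_timesteps=300, n_channels=1, n_classes=4)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Adding identity shortcuts around groups of these convolutional blocks yields the ResNet variant discussed above.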
Recurrent Neural Networks (RNN) are another popular type of architecture for ANN/DL algorithms for TSC tasks. RNNs were developed to handle sequential input data and are able to ingest data sequentially. The main characteristic of these networks is their memory, as they take information from prior time stamps to influence the current input and output. While other ANN architectures assume that the input variables are independent of each other, the output of an RNN depends on the prior elements within the sequence. In our review, the RNN [68], LSTM [61], Stacked LSTM [69], BI-LSTM [70], and TS-LSTM [65] algorithms use different variants of RNN-based architectures for TSC tasks. Table 15 shows these algorithms and their respective FE architectures and algorithms.

Table 15
ANN & DL ML TSC algorithms using RNN FE architectures.

| Algorithm name | FE technique | FE layers |
|---|---|---|
| RNN | RNN | RNN, MLP |
| LSTM | RNN | LSTM, MLP |
| Stacked LSTM | RNN | LSTM, MLP |
| BI-LSTM | RNN | Bi-LSTM, MLP |
| TS-LSTM | RNN | LSTM, Attention, MLP |

Some algorithms utilize a combination of both RNN and CNN layers inside their architectures. This makes use of both of these networks' characteristics and increases the overall performance of the TSC algorithm. Since this group cannot fit neatly into either of those categories, we denote these algorithms as CNN-RNN architectures. In our review, the ATT-1DCNN-GRU [71], CNN-LSTM [72], 4-layer CNN-LSTM [69], Time-series attentional prototype network (TapNet) [73], Multivariate LSTM-FCN (MLSTM-FCN) [74,75], Multivariate Attention LSTM-FCN (MALSTM-FCN) [75], and FFT-CNN-LSTM [76] algorithms use different variants of CNN-RNN-based architectures for the TSC task. Table 16 shows these algorithms and their respective FE architectures and algorithms.

Table 16
ANN & DL ML TSC algorithms using CNN-RNN FE architectures.

| Algorithm name | FE technique | FE layers |
|---|---|---|
| ATT-1DCNN-GRU | CNN-RNN | Conv1D, GRU, Attention, MLP |
| CNN-LSTM | CNN-RNN | Conv1D, LSTM |
| 4-layer CNN-LSTM | CNN-RNN | Conv1D, LSTM, MLP |
| TapNet | CNN-RNN | Conv1D, LSTM, MLP, Attention |
| MLSTM-FCN | CNN-RNN | LSTM, Conv1D, Squeeze & Excitation |
| MALSTM-FCN | CNN-RNN | LSTM, Attention, Conv1D, Squeeze & Excitation |
| FFT-CNN-LSTM | CNN-RNN | FFT, Conv2D, LSTM, MLP |

Finally, a more recent ANN architecture is the Generative Adversarial Network (GAN), developed by Goodfellow et al. [77]. The idea behind GAN is to make use of two different kinds of networks (i.e., discriminator and generative) and make them compete with each other to learn the joint probability between a set of input features and output classes [78]. Based on our review, the use of GAN neural networks for TSC tasks in manufacturing has been very limited, and we only found the WGAN-GP-based deep adversarial Transfer Learning (WDATL) algorithm that used this architecture [79].
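As a concrete instance of the RNN-based family in Table 15, the following is an illustrative sketch of a stacked-LSTM classifier (LSTM layers feeding an MLP head). The layer sizes are placeholders of ours, not the hyperparameters of any specific referenced algorithm.

```python
# Illustrative stacked-LSTM TSC sketch (placeholder hyperparameters).
from tensorflow import keras

def build_stacked_lstm(n_timesteps, n_channels, n_classes):
    inputs = keras.Input(shape=(n_timesteps, n_channels))
    x = keras.layers.LSTM(64, return_sequences=True)(inputs)  # pass full sequence on
    x = keras.layers.LSTM(64)(x)                              # keep last hidden state
    x = keras.layers.Dense(32, activation="relu")(x)          # MLP head
    outputs = keras.layers.Dense(n_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)

model = build_stacked_lstm(300, 1, 4)
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Wrapping an LSTM layer in keras.layers.Bidirectional(...) would give the
# BI-LSTM variant discussed above.
```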
2.2. Public manufacturing TSC datasets

One of the greatest challenges when studying time-series analytics for smart manufacturing applications is the availability of applicable public datasets [10]. Currently, there are a limited number of preprocessed manufacturing datasets that are publicly available for researchers and practitioners. As a result, ML researchers in the manufacturing domain either have to i) revert to primary data collected from machines, which is hard to come by, ii) do the preprocessing from scratch based on their application, or iii) turn to popular datasets found in other domains. While valuable, these datasets from other domains lack characteristics that are unique to and representative of manufacturing problems. There are many publicly available time-series datasets covering a range of applications outside of manufacturing, such as the medical domain, financial markets, human activity recognition, and speech recognition. The University of California Riverside (UCR) time-series archive [80] and the University of East Anglia (UEA) multivariate time-series classification archive [81] are the main examples of these publicly available datasets. Additionally, some time-series datasets are available at the UC Irvine (UCI) Machine Learning Repository. Each of these sources contains preprocessed datasets that can be used for research purposes. However, the number of manufacturing-related public datasets that can be used in TSC is limited and covers a narrow range of applications. A persuasive reason for this scarcity of manufacturing datasets is the reluctance of some authors to share their datasets and, in many cases, non-disclosure agreements with industry. As a result, manufacturing is missing out on the valuable advantages that strong data availability affords the CS field.

In this study, we aim to bridge this gap by gathering several available datasets from various manufacturing resources and preprocessing them with a standard and transparent methodology. The result is a repository of ready-to-use manufacturing-specific datasets that can be fed into ML algorithms to investigate their performance in a smart manufacturing setting.

The goal of this study is to perform the experimental evaluation of the previously mentioned TSC algorithms on manufacturing-related datasets and evaluate which algorithm(s) are best suited for that situation. To find a representative list of datasets from the manufacturing domain, we first gathered a list of 33 datasets from various resources. These are manufacturing-related datasets that can be used for different applications and ML tasks. Neupane et al. introduced five datasets that can be used for bearing fault detection and diagnosis applications [5]. The Prognostics and Health Management (PHM) Society is another resource that hosts a data-driven competition every year and has several public datasets that can be used for time-series research. Jia et al. reviewed these datasets from 2008 until 2017 in their study [91]. The more recent PHM challenge datasets are available on their website. NASA has a Prognostics Center of Excellence Data Set Repository, which is a collection of data sets donated by universities, agencies, or companies for prognosis purposes. Most of these are time-series datasets that fall into the manufacturing domain and can be used for our purpose. It is important to note that this is not an exhaustive list of all available datasets. Datasets with image data types are not included in this selection; we focus only on datasets with a time-series data type. Table 17 shows our initial list of datasets with their general characteristics.
Table 17
The initial list of 33 manufacturing datasets with different applications used for a variety of tasks.

| Dataset name | Domain | Associated application | ML task | Reference |
|---|---|---|---|---|
| Gas Sensor Temperature | Semiconductor | Detection limit Estimation | Forecasting | UCI ML Repository |
| Hydraulic systems | Railway | Condition Monitoring | Classification | UCI ML Repository |
| Gas sensors home activity | Chemical | Condition Diagnosis | Classification | UCI ML Repository |
| Control charts | Time-series General | Pattern Recognition | Classification | UCI ML Repository |
| PHM09 Gearbox | Gearbox | Anomaly Detection & Prognosis | Anomaly Detection, Forecasting | PHM Society |
| PHM10 CNC milling | Milling | RUL Estimation | Forecasting | PHM Society |
| PHM18 Ion Mill Etch | Semiconductor | RUL Estimation | Forecasting | PHM Society |
| PHM22_Rock_Drills | Rock drills | Fault Diagnosis | Classification | PHM Society |
| Paderborn University | Bearing | Fault Diagnosis | Classification | Neupane et al., 2020 [5] |
| NASA FEMTO | Bearing | RUL Estimation | Forecasting | Neupane et al., 2020 [5] |
| IMS_Bearing | Bearing | Fault Prognosis | Forecasting | Neupane et al., 2020 [5] |
| PHM08 NASA engine | Aerospace | RUL Estimation | Forecasting | PHM Society |
| PHM15 NASA HIRF | Energy | Fault Detection and Prognosis | Anomaly Detection, Forecasting | PHM Society |
| PHM19 NASA Crack | Manufacturing | Fault Estimation | Forecasting | PHM Society |
| PHM21 NASA Turbo Fan | Engines | RUL Estimation | Forecasting | PHM Society |
| Bearing_Univar | Bearing | Fault Diagnosis | Classification | Huang et al., 2013 [82] |
| NASA Milling BEST | Milling | Tool Wear Prognosis | Forecasting | Agogino et al., 2007 [83] |
| NASA MOSFET | Semiconductor | RUL Estimation | Forecasting | Celaya et al., 2011 [84] |
| Battery Dataset | Electronics | RUL Estimation | Forecasting | Saha et al., 2007 [85] |
| NASA IGBT Accelerated | Electronics | RUL Estimation | Forecasting | Celaya et al., 2009 [86] |
| NASA CFRP Composites | Manufacturing | Fault Diagnosis | Classification | Saxena et al. [87] |
| 3W | Oil Wells | Anomaly Detection | Anomaly Detection | Vargas et al., 2019 [88] |
| MFPT | Bearing | Fault Diagnosis | Classification | Neupane et al., 2020 [5] |
| Energy Consumption | Energy | Energy Estimation | Forecasting | Data.gov (a) |
| Metal Etching | Semiconductor | Fault Diagnosis | Classification | Wise et al., 1999 [89] |
| CWRU Bearing | Bearing | Fault Diagnosis | Classification | Neupane et al., 2020 [5] |
| SECOM | Semiconductor | Fault Diagnosis | Classification | UCI ML Repository |
| PHM11 Anemometer | Wind turbines | Anomaly Detection & Fault Prognosis | Anomaly Detection, Forecasting | PHM Society |
| PHM13 Maintenance | Unknown | Fault Diagnosis | Classification | PHM Society |
| PHM16 CMP | Semiconductor | MMR Prediction | Forecasting | PHM Society |
| PHM17 Train Bogie | Transportation | Fault Diagnosis | Anomaly Detection | PHM Society |
| NASA IMS Bearing | Bearing | Fault Diagnosis | Classification | Lee et al., 2007 [90] |
| Metal Etching Feature-set | Semiconductor | Fault Diagnosis | Classification | Wise et al., 1999 [89] |

(a) https://bloomington.data.socrata.com/stories/s/hgqr-8ivd.
differences. Thus, FCN [53] was selected for our experimental evaluation to assess the performance of this structure in TSC tasks. Apart from FCN, four other algorithms from the CNN-based TSC algorithms group were chosen. The Encoder [19] algorithm investigates the impact of the attention mechanism, the ResNet [53] algorithm examines the effect of residual blocks, and the InceptionTime [62] algorithm assesses the effect of both inception and residual modules.

Four algorithms were chosen from the RNN-based TSC algorithms group. The Stacked LSTM [69] algorithm is the baseline representative of RNN-based algorithms, the BI-LSTM [70] algorithm evaluates the effect of bidirectional connections in LSTM layers, and finally, the TS-LSTM [65] algorithm evaluates the effect of the attention mechanism in RNN-based architectures.

From the FFN-based TSC algorithms group, the MLP [62] algorithm was chosen as a benchmark for all classification tasks, and the DA-NET [17] algorithm evaluates transformer-like structures with forward connections for TSC tasks. One other algorithm was chosen as representative of the CNN-RNN-based TSC algorithms group. The MALSTM-FCN [75] algorithm evaluates a more sophisticated RNN-CNN structure coupled with attention and squeeze-and-excitation
Fig. 10. Genealogy of Algorithm ensemble and Meta ensemble classification algorithms.
Table 18
The final set of 19 conventional ML TSC algorithms selected for experimental evaluation.

| Algorithm name | FE technique | Classification technique | Proposed year | Reference |
|---|---|---|---|---|
| KNN-TWE | None | Distance-based | 2015 | Lines, J., & Bagnall, A. [92] |
| PF | None | DT ensemble | 2019 | Lucas, B., et al. [46] |
| KNN-DTW-I | None | Distance-based | 2017 | Shokoohi-Yekta, M., et al. [47] |
| EE | None | Algorithm ensemble | 2015 | Lines, J., & Bagnall, A. [92] |
| XGBoost | None | DT ensemble | 2016 | Chen, T., & Guestrin, C. [45] |
| KNN-DDTW | Differential-based | Distance-based | 2013 | Gorecki and Luczak [93] |
| FBL | Distance-based | Statistical | 2014 | B. Fulcher and N. Jones [22] |
| BOSS-VS | Dictionary-based | Distance-based | 2016 | Schäfer, P. [28] |
| DTWF | Dictionary-based | Distance-based | 2015 | Kate, R. J. [29] |
| TDE | Dictionary-based | Algorithm ensemble | 2021 | Middlehurst, M., et al. [37] |
| MrSQM | Dictionary-based | Statistical | 2021 | Nguyen, T. L., & Ifrim, G. [33] |
| LS | Shapelet-based | Statistical | 2014 | Grabocka et al. [94] |
| STC | Shapelet-based | DT ensemble | 2017 | Bostrom, A., & Bagnall, A. [39] |
| DrCIF | Interval-based | DT ensemble | 2021 | Middlehurst, M., et al. [37] |
| RISE | Interval-based | DT ensemble | 2018 | Lines, J., et al. [35] |
| RotF | Statistical | DT ensemble | 2006 | Rodriguez, J. J., et al. [44] |
| ARSENAL | Kernel-based | Algorithm ensemble | 2021 | Middlehurst, M., et al. [37] |
| ROCKET | Kernel-based | Statistical | 2020 | Dempster, A., et al. [41] |
| HIVE-COTE 2 | Hybrid | Meta ensemble | 2021 | Middlehurst, M., et al. [37] |

Table 19
The final set of 10 ANN & DL ML TSC algorithms selected for experimental evaluation.

| Algorithm name | FE technique | Classification technique | Proposed year | Reference |
|---|---|---|---|---|
| FCN | CNN | Statistical | 2017 | Wang, Z., et al. [53] |
| Encoder | CNN | Statistical | 2018 | Serrà, J., et al. [95] |
| ResNet | CNN | Statistical | 2017 | Wang, Z., et al. [53] |
| InceptionTime | CNN | Statistical | 2020 | Ismail Fawaz, H., et al. [62] |
| Stacked LSTM | RNN | Statistical | 2021 | Mekruksavanich, S., & Jitpattanakul, A. [69] |
| BI-LSTM | RNN | Statistical | 2021 | Bartosik, S. C., & Amirlatifi, A. [70] |
| TS-LSTM | RNN | Statistical | 2021 | Lee, W. J., et al. [65] |
| DA-NET | FFN | Statistical | 2022 | Chen, R., et al. [17] |
| MALSTM-FCN | CNN, RNN | Statistical | 2019 | Karim, et al. [75] |
| GASF-CNN | GASF, CNN | Statistical | 2019 | Martínez-Arellano, et al. [67] |

Table 20
The final set of seven benchmark ML algorithms selected for experimental evaluation.

| Algorithm name | FE technique | Classification technique | Proposed year | Reference |
|---|---|---|---|---|
| LR | None | Statistical | 1958 | Cox, D. R. [96] |
| NB | None | Statistical | 2004 | H. Zhang [97] |
| RF | None | DT ensemble | 2001 | Breiman, L. [98] |
| SVM | None | Distance-based | 1998 | Cortes, C., & Vapnik, V. [99] |
| KNN-EUC | None | Distance-based | 1951 | Fix & Hodges [100] |
| Ridge | None | Statistical | 1970 | Hoerl, A. E., & Kennard, R. W. [101] |
| MLP | FFN | Statistical | 2017 | Wang, Z., et al. [53] |

from the time-series signals (e.g., mean, standard deviation, min, max, etc.). We refer to this type of dataset as a "Feature-set". The main difference between these two kinds of datasets is the fact that consecutive data points in a given sample of the former type are dependent on each other, and there might be some degree of autocorrelation between them. This is not true in the latter case, where consecutive data points can be considered independent. Since TSC on raw time-series is considered a more challenging task, we chose raw time-series datasets for this review. From the list in Table 17, SECOM, PHM11 Anemometer, PHM13 Maintenance, PHM16 CMP, PHM17 Train Bogie, NASA IMS Bearing, and Metal Etching Feature-set are feature-sets, and the rest are raw time-series.

Second, we checked the dataset documentation and recorded the ML tasks that each dataset was designed for. This information is recorded in the "ML Task" column of Table 17. For example, the PHM 2021 NASA Turbofan engine dataset was gathered from a run-to-failure experiment and was originally designed for forecasting tasks and RUL estimation applications. Although preprocessing actions may be available to change this assumption, we only chose the datasets labeled for classification tasks for this review. As a result, 11 datasets met these conditions and were chosen.

Since the datasets stem from different sources, there are many differences between them. The differences range from different numbers of files and folders per sample, to different file types used to store the data (e.g., CSV, txt, Matlab, etc.), to varying lengths of time series, to whether the observations were scaled after being recorded. The selected datasets must have a standard structure and characteristics to be comparable. Thus, a significant amount of preprocessing work was required to complete this step. To do so, we loaded each dataset into a single-file dataset with the structure defined in Section 2.1 (see Fig. 2), dealt with varying lengths with a predefined method that seemed logical for a given dataset, and standardized the observations to have a standard normal distribution with a mean equal to zero and unit standard deviation. Then the dataset instances and dimensions were shuffled randomly to avoid any biases in later steps.

A recurring and controversial question to resolve in this kind of research is whether to standardize the datasets or not. Most of the past research in the literature indicates it is advisable to standardize time-series data [23]. The reasoning is threefold: First, if summary measures such as mean and variance can be used to discriminate, then the problem is considered trivial and can thus be solved with simple methods such as thresholding. Second, some algorithms perform standardization internally, so non-standardized datasets can distort comparisons of algorithms. Finally, some datasets are already standardized (see Table 21), so standardizing the rest allows for a less biased comparison.
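A minimal sketch of this preprocessing step follows. Whether normalization is applied globally or per series is an implementation choice of ours for illustration; a global z-normalization is shown here, along with the random shuffling of instances described above.

```python
# Sketch of z-normalization (zero mean, unit std) plus instance shuffling.
import numpy as np

def standardize_and_shuffle(X, y, seed=42):
    """X: (n_instances, length, n_dims) array; y: (n_instances,) labels."""
    X = (X - X.mean()) / X.std()                       # standard normal scaling
    idx = np.random.default_rng(seed).permutation(len(X))
    return X[idx], y[idx]                              # shuffle to avoid ordering bias

X = np.random.default_rng(3).normal(loc=5.0, scale=2.0, size=(50, 200, 2))
y = np.arange(50) % 3
X_std, y_std = standardize_and_shuffle(X, y)
print(round(float(X_std.mean()), 6), round(float(X_std.std()), 6))  # ~0.0, ~1.0
```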
There were some datasets from which we were able to extract multiple datasets with different characteristics. For example, in the case of the CWRU dataset, the data was collected from three sensors installed on the device, namely Drive End accelerometer data (DE), Fan End accelerometer data (FE), and Base accelerometer data (BA). The data was collected at different sample rates (i.e., 12,000 samples/second and 48,000 samples/second), and the number of experiments in each of these conditions is different. As a result, we extracted six different datasets with different characteristics from the raw data, as illustrated in Table 22. We differentiated these datasets with different names for the sake of clarity. For instance, "CWRU_12k_DE_uni" refers to the univariate dataset including only the data from the DE sensor gathered at the 12k sample rate.

The result is 22 preprocessed datasets that are ready to be ingested by the selected TSC algorithms to evaluate their performance over multiple datasets. Based on Demsar et al., a number of datasets greater than ten is considered sufficient for this kind of performance evaluation [102]. Tables 21 and 22 show the details of the dataset characteristics before and after the preprocessing step for reproducibility. The bolded items in Tables 21 and 22 refer to a reduced set of eleven independent datasets that will be used as a scenario in the experiments.

Table 21
Dataset initial characteristics before preprocessing. (Columns: Dataset name, Balanced, Varying length, Standardized, # Classes, (N), (T), (M), # instances.)

Table 22
Dataset final characteristics after preprocessing. (Columns: Dataset name, Varying length, Scaling, (N), (T), (M), # Instances.)

3.3. Experimental setup

In this section, we illustrate the details of the experimental evaluation and the evaluation metrics.

3.3.1. Implementation of experimental evaluation
We implemented our experiments in two separate Python environments. For conventional ML algorithms, we used Python 3.8 with scikit-learn 1.2.2, sktime 0.18.0, tsfresh 0.20.0, tslearn 0.5.3.2, xgboost 1.7.5, mrsqm 0.0.1, and pyts 0.12.0 as the major ML Python packages; for DL algorithms, we used Python 3.9 with tensorflow 2.11.0 and torch 2.0.0. All these experiments were run on an AMD Threadripper Pro 5975WX with 32 cores (64 threads), 2x RTX A5000 (24 GB) GPUs, and 128 GB or 512 GB of memory. The difference in available memory is due to an upgrade made necessary by some of the more demanding algorithm/dataset combinations. We set a 48-hour runtime limit for each run of an algorithm on a dataset. This seems reasonable given that each of the 36 algorithms was run five times on each of the 22 datasets, and running this number of experiments for 48 h each on a single CPU core would accumulate to approximately 21 years. In practice, we conducted experiments equivalent to approximately two years of runtime. These time limits apply to running the algorithm once on a single dataset. The algorithms were stopped manually after exceeding the runtime limit, and an accuracy of zero was recorded for them in subsequent calculations. Additionally, there were instances when the system ran out of memory. This decision is again based on the study's purpose of providing practical guidance to readers.
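The paper enforced the runtime budget manually; the following is one way such a per-run limit could be automated. This sketch is our illustration under that assumption, not the authors' harness.

```python
# Sketch: run a training job in a subprocess and kill it past a time budget.
import multiprocessing as mp

def run_with_timeout(target, args=(), timeout_s=48 * 3600):
    proc = mp.Process(target=target, args=args)
    proc.start()
    proc.join(timeout_s)           # wait at most timeout_s seconds
    if proc.is_alive():            # still running: treat as a failed ("time") run
        proc.terminate()
        proc.join()
        return None                # caller records accuracy = 0
    return "finished"

if __name__ == "__main__":
    import time
    print(run_with_timeout(time.sleep, args=(2,), timeout_s=5))   # finished
    print(run_with_timeout(time.sleep, args=(10,), timeout_s=1))  # None
```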
Our goal is to test the algorithms in their default parameter settings and assess how well they generalize on manufacturing datasets with a minimum amount of parameter or hyperparameter tuning. Therefore, we did not carry out any parameter or hyperparameter tuning, and we ran the algorithms with the default settings proposed by the referenced paper or the Python package authors. Unless explicitly stated in the original paper proposing the algorithm, we have not conducted any external tuning. More details for each algorithm can be found in the associated papers. While we acknowledge that this choice might impact some algorithms more than others, we believe it to be the best way to avoid additional bias and in accordance with our objective to provide insights to practitioners and academics alike.

To avoid categorizing algorithms into separate groups based on their capability to handle univariate or multivariate TSC, we adopted a different approach in our methodology. We treated the ability to handle multivariate time-series as an internal capability of the algorithm. In cases where an algorithm did not possess the multivariate capability, we used only the first dimension of the dataset as input for that specific algorithm. By doing so, we ensured a consistent experimental setup throughout our evaluation.

To better evaluate the algorithms on unseen data, we trained each algorithm five times on each dataset using a five-fold cross-validation approach. Each fold uses a different 80/20 percent train/test split with a different randomly shuffled initialization, which enables us to take the mean accuracy and standard deviation over the five runs to reduce the bias due to the initial values and provide an analysis of uncertainty and variability. For DL algorithms, we ensured that all models converged during the training phase. We achieved this by choosing a high number of epochs for the given architecture and applying an early stopping callback if the model loss did not improve for 50 epochs.

3.3.2. Evaluation metrics
To evaluate the performance of different algorithms on multiple datasets, we followed the recommendations of Demsar [102] and adopted the non-parametric Friedman test [102] to reject the null hypothesis. The null hypothesis being tested is that all classifiers perform the same and the observed differences are merely random. Then, the significance of differences between compared classifiers is measured using pairwise post-hoc analysis by a Wilcoxon signed-rank test with Holm correction (α = 0.05). A Critical Difference (CD) diagram is used to intuitively visualize the performance of these classifiers [103]. Since some of our datasets are imbalanced, we use the weighted F1 score as the measurement of accuracy.

Moreover, we followed the recommendations by Wang et al. [53] and reported the "Average Accuracy", "Number of Wins", "Average Rank", and "Mean Per-Class Error (MPCE)" evaluation metrics based on classification error. Average accuracy is the average statistic of a given algorithm over all datasets. The number of wins indicates the number of times that a given algorithm outperformed all other algorithms, counting ties for all winning algorithms. The average rank is defined to measure the algorithm's difference over multiple datasets. MPCE is a robust baseline criterion that calculates the mean error rates by considering the number of classes and can provide additional insights. The equation is as follows:

\[ \mathrm{MPCE} = \frac{1}{K} \sum_{k=1}^{K} \frac{e_k}{D_k} \]

where K is the number of datasets, D_k represents the number of classes in dataset k, and e_k represents the error rate on the k-th dataset. These two evaluation approaches are widely accepted among the TSC community.

Tables 23 and 24 summarize the parameters and configurations of the algorithms used to generate the results. We also documented the multivariate capability, the processing units used for each of the algorithms, and the parallelization capability of each algorithm. While there may be other implementations of these algorithms with different characteristics, these tables record the specific implementations used in this study. For the algorithms capable of running on GPU, we ran the algorithms on all available GPU cores, and for the algorithms running on CPU, we used 20 CPU cores for each algorithm. More details about each algorithm can be found in the referenced Python libraries.

4. Results & discussion

In this section, we present the results and analysis of our experimental evaluation from different perspectives and offer additional insights. We did not obtain results for all algorithms on all datasets within our defined constraints. Our objective in this evaluation was to assess the performance of classifiers based on the original authors' (or default) recommended configurations without any optimization. While it is possible that we could have tailored these algorithms' parameters to function more accurately on the challenging datasets, our intention was to avoid introducing bias into our results by doing so. Instead, we aim to determine which algorithms demonstrate better generalization capabilities across different problems and exhibit robust performance without dataset-specific optimizations.

Table 25 shows the algorithms' accuracy on all datasets. Each number is the average statistic over five runs. Moreover, the table contains the overall average accuracy (AVG ACC), number of wins (WIN), average rank (AVG Rank), and mean per-class error (MPCE) metrics. The result "time" in the table denotes that the corresponding algorithm failed to generate any result within the defined time and resource constraints, and "OOM" indicates that the algorithm was stopped because it needed more than the available 512 GB of memory. The dataset names have been abbreviated to fit into the table. The numbers in Table 25 are the result of approximately two years of experiment runtime.

The ResNet, DrCIF, InceptionTime, and ARSENAL algorithms show the best overall performance in our experiment. They were all able to achieve an average accuracy higher than 96.6 % on the 22 datasets, which is very impressive. The ResNet algorithm achieved the best results on the WIN and AVG Rank metrics, and DrCIF achieved the best results in AVG ACC and MPCE. The InceptionTime and ARSENAL algorithms also show overall competitive results in this experiment.

The DrCIF and ARSENAL algorithms belong to the conventional ML category, proving that DL algorithms are not always the best solution and that there are very powerful algorithms among conventional ML algorithms as well. DrCIF employs interval-based feature extraction techniques and derives a set of features referred to as catch22 features, in addition to summary statistic features obtained from intervals within the time series. Subsequently, it leverages a DT ensemble classification technique for classification purposes. On the other hand, ARSENAL adopts an ensemble approach by employing multiple ROCKET algorithms for classification. Each ROCKET classifier utilizes a range of convolution kernels for feature extraction and relies on the ridge classifier for carrying out the classification process.

These results also show the robustness, efficiency, scalability, and power of convolution kernels in capturing temporal features in time-series data, as three out of the four best-performing algorithms use these kernels for feature extraction.

The LSTM, BiLSTM, and TS-LSTM algorithms are another noteworthy group in this experiment. These algorithms are based on RNN architectures and show comparatively very good overall performance, which demonstrates the effectiveness of RNN-based structures in capturing features in time-series data.
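To make the summary metrics defined in Section 3.3.2 and reported in Table 25 concrete, the following sketch computes AVG ACC, WIN, AVG Rank, and MPCE from a small synthetic accuracy matrix (rows = datasets, columns = algorithms). Ties in the rank computation are broken arbitrarily here, unlike in a proper Friedman ranking.

```python
# Sketch: Table 25 summary metrics from an accuracy matrix (synthetic data).
import numpy as np

acc = np.array([[0.90, 0.95, 0.80],    # accuracy per dataset (row) / algorithm (col)
                [0.85, 0.97, 0.88],
                [0.99, 0.99, 0.70]])
n_classes = np.array([2, 4, 3])        # D_k: number of classes in dataset k

avg_acc = acc.mean(axis=0)                               # AVG ACC
wins = (acc == acc.max(axis=1, keepdims=True)).sum(0)    # WIN (ties count for all)
ranks = (-acc).argsort(axis=1).argsort(axis=1) + 1       # 1 = best per dataset
avg_rank = ranks.mean(axis=0)                            # AVG Rank
mpce = ((1 - acc) / n_classes[:, None]).mean(axis=0)     # MPCE = mean of e_k / D_k
print(avg_acc, wins, avg_rank, mpce)
```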
Table 23
Conventional ML TSC algorithms parameters. "Multivariate capability" refers to whether the current implementation of the algorithm can process three-dimensional multivariate time-series data. "CPU/GPU" indicates the type of processing units employed for conducting our experiments. Lastly, "parallelization" denotes whether the implementation supports the execution of the algorithm across multiple cores, either on the CPU or GPU. (Columns: Algorithm name, Source, Algorithm parameters, Multivariate capability, CPU/GPU, Parallelization.)
Table 24
ANN and DL TSC algorithms parameters.

| Algorithm name | Source | Algorithm parameters | Multivariate capability | CPU/GPU | Parallelization |
|---|---|---|---|---|---|
| MLP | dl-4-tsc (a) | Optimization = AdaDelta, Loss = Entropy, Epochs = 1000, Batch = 16, Learning rate = 0.1 | YES | GPU | YES |
| FCN | dl-4-tsc | Optimization = Adam, Loss = Entropy, Epochs = 1000, Batch = 16, Learning rate = 0.001 | YES | GPU | YES |
| Encoder | dl-4-tsc | Optimization = Adam, Loss = Entropy, Epochs = 100, Batch = 12, Learning rate = 0.00001 | YES | GPU | YES |
| ResNet | dl-4-tsc | Optimization = Adam, Loss = Entropy, Epochs = 1000, Batch = 32, Learning rate = 0.001 | YES | GPU | YES |
| InceptionTime | dl-4-tsc | Optimization = Adam, Loss = Entropy, Epochs = 1000, Batch = 32, Learning rate = 0.001, kernel_size = 41 | YES | GPU | YES |
| Stacked LSTM | Based on [69] | Optimization = RMSprop, Loss = Entropy, Epochs = 200, Batch = 64, Learning rate = 0.001 | YES | GPU | YES |
| BI-LSTM | Based on [70] | Optimization = RMSprop, Loss = Entropy, Epochs = 500, Batch = 64, Learning rate = 0.001 | YES | GPU | YES |
| TS-LSTM | Based on [65] | Optimization = Adam, Loss = Entropy, Epochs = 500, Batch = 64, Learning rate = 0.001 | YES | GPU | YES |
| DA-NET | DANET (b) | Optimization = Adam, Loss = Entropy, Epochs = 100, Batch = 16, Learning rate = 0.001 | YES | GPU | YES |
| MALSTM-FCN | MLSTM-FCN (c) | Optimization = Adam, Loss = Entropy, Epochs = 100, Batch = 128, Learning rate = 0.001 | YES | GPU | YES |
| GASF-CNN | Based on [67] | Optimization = Adam, Loss = Entropy, Epochs = 500, Batch = 64, Learning rate = 0.001 | YES | GPU | YES |

(a) https://github.com/hfawaz/dl-4-tsc.
(b) https://github.com/Sample-design-alt/DANet.
(c) https://github.com/houshd/MLSTM-FCN.
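The evaluation protocol of Section 3.3 — five runs per dataset via shuffled five-fold cross-validation (each fold an 80/20 split) with the 50-epoch-patience early stopping used for the DL models — can be sketched as follows. `build_model` is a placeholder of ours for any Keras classifier compiled with an accuracy metric.

```python
# Sketch of the cross-validation protocol with early stopping (illustrative).
import numpy as np
from sklearn.model_selection import KFold
from tensorflow import keras

def evaluate(build_model, X, y, epochs=1000, seed=0):
    scores = []
    for train_idx, test_idx in KFold(5, shuffle=True, random_state=seed).split(X):
        model = build_model()
        stop = keras.callbacks.EarlyStopping(monitor="loss", patience=50,
                                             restore_best_weights=True)
        model.fit(X[train_idx], y[train_idx], epochs=epochs,
                  callbacks=[stop], verbose=0)
        _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
        scores.append(acc)
    return float(np.mean(scores)), float(np.std(scores))  # mean accuracy, std over 5 runs
```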
One particularly interesting outcome of this experiment is the underperformance of certain algorithms that are typically considered state-of-the-art in the TSC literature. Specifically, algorithms such as KNN-DTW, HIVE-COTE V2.0, and STC faced limitations in completing runs within the experiment's defined maximum runtime, resulting in poor overall performance. It is quite plausible that these algorithms could have demonstrated remarkable accuracy had we been able to run them continuously until obtaining conclusive outcomes. Notably, HIVE-COTE V2.0 impressively achieved 100 % accuracy in two out of the three instances when it managed to produce any results at all. These findings likely stem from the unique characteristics of the manufacturing datasets employed in this study, characterized by long time-series lengths and a substantial number of instances. We believe that this result holds important
Table 25
BEARING_U 0.1788 0.1554 0.1486 0.4233 0.4167 0.4400 time 0.3279 time 0.9816 time 0.4739 time 0.9996 time 0.2078 0.3671 time 1.0000
PHM22_M 0.8152 0.9262 0.5158 0.9826 0.9866 0.9719 OOM 0.9972 time 0.5747 time 0.0301 time 0.9980 time 0.9044 0.9874 time 0.9819
PHM22_PIN_U 0.8774 0.9544 0.5118 0.9823 0.9647 0.9164 OOM 0.9875 time 0.7909 time 0.2949 time 0.9930 time 0.9112 0.9877 time 0.9889
PHM22_PO_U 0.8507 0.9141 0.4225 0.9731 0.8735 0.9264 OOM 0.8336 time 0.7605 time 0.1929 time 0.9917 time 0.8765 0.9801 time 0.9771
PHM22_PDIN_U 0.8145 0.9266 0.5163 0.9828 0.9863 0.9534 OOM 0.9883 time 0.7720 time 0.0302 time 0.9879 time 0.9040 0.9870 time 0.9809
ETCHING_M 0.6203 0.7081 0.5491 0.7703 0.7699 0.8041 0.7868 0.7743 0.8054 0.8913 0.8183 0.7563 0.7735 0.8706 0.7505 0.7223 0.7639 0.8795 0.7897
MFPT_48_U 0.4719 0.5685 0.7608 0.6411 0.7556 0.4032 OOM 0.6148 0.6623 0.9640 0.6345 0.6302 0.3478 0.9995 0.9984 0.5462 0.6173 1.0000 1.0000
MFPT_96_U 0.5229 0.5795 0.6776 0.6409 0.7459 0.4122 OOM 0.5909 0.6627 0.9774 0.6734 0.8164 time 0.9995 0.9989 0.5646 0.5711 1.0000 1.0000
PADER_64_U 0.4622 0.4875 0.2574 0.5354 0.6734 0.4218 OOM 0.8305 time 0.6230 time OOM time 0.9814 time 0.4307 0.6893 time 0.9660
PADER_4_U 0.4885 0.4978 0.3642 0.6703 0.5933 0.6501 OOM 0.8579 time 0.7788 time 0.3105 time 0.9251 time 0.5240 0.7231 time 0.9030
PADER_64_M 0.4601 0.4874 0.2580 0.5358 0.6726 0.4088 OOM 0.8829 time time time OOM time 0.9958 time 0.4312 0.6885 time 0.9648
PADER_4_M 0.4895 0.4976 0.3647 0.6697 0.5936 0.9086 OOM 0.9405 time 0.6144 time 0.3106 time 0.9955 time 0.5229 0.7233 time 0.9034
Hydra_10_M 0.8050 0.9420 0.1774 0.9723 0.6795 0.5587 OOM 0.7392 0.2185 0.7633 0.5414 0.2928 0.2462 0.9859 0.3291 0.9168 0.9660 0.9815 0.9805
Hydra_100_M 0.8680 0.9638 0.1774 0.9615 0.6766 0.5808 OOM 0.8228 0.6068 0.6908 time 0.5646 time 0.9896 time 0.9023 0.9641 0.9918 0.9822
Gas_sensors 0.5328 0.5790 0.4755 0.6267 0.6464 0.5555 0.5306 0.7427 0.6132 0.6712 0.4577 0.4770 time 0.8389 0.3435 0.6386 0.7096 0.8384 0.6796
Control_charts 0.7585 0.9414 0.9685 0.9783 0.980 0.9023 0.4695 0.9502 0.5894 0.9750 0.9816 0.9683 0.9784 0.9967 0.9917 0.8380 0.9231 0.9933 0.6647
CWRU_12D_U 0.2107 0.2375 0.5094 0.5443 0.6161 0.4323 OOM 0.7756 0.9639 0.9933 0.9313 0.3104 time 0.9999 time 0.3829 0.7418 0.9976 0.9997
CWRU_12D_M 0.2566 0.2488 0.5250 0.4894 0.5653 0.4835 OOM 0.8253 0.8873 0.9986 0.9219 0.4525 time 1.0000 time 0.3835 0.7075 0.9995 0.9995
CWRU_12F_U 0.3180 0.3131 0.2489 0.2664 0.3532 0.3322 OOM 0.7276 0.8569 0.9933 0.8638 0.3350 time 0.9998 0.5357 0.2641 0.5599 0.9981 0.9998
CWRU_12F_M 0.3696 0.3618 0.3383 0.3014 0.3266 0.5437 OOM 0.8567 0.9014 0.9978 0.8863 0.5251 0.1903 1.0000 0.6931 0.2709 0.4666 0.9998 0.9994
CWRU_48D_U 0.2764 0.3122 0.2541 0.6875 0.6562 0.5416 OOM 0.8908 time 0.8974 time 0.4001 time 0.9987 time 0.3765 0.8097 0.9981 0.9984
CWRU_48D_M 0.2785 0.3112 0.2536 0.6881 0.6474 0.5307 OOM 0.9147 time 0.9578 time 0.4000 time 0.9997 time 0.3831 0.8141 0.8862 0.9980
AVG ACC 0.533 0.587 0.422 0.697 0.690 0.622 0.081 0.812 0.353 0.803 0.350 0.390 0.115 0.979 0.256 0.586 0.761 0.571 0.9440
WIN 0 0 0 0 0 0 0 0 0 1 0 0 0 4 0 0 0 3 3
AVG Rank 25.14 22.91 26.14 19.45 20.23 22.59 31.93 17.77 26.32 16.36 26.16 25.73 31.25 5.45 28.09 23.68 18.07 16.34 10.11
MPCE 0.096 0.085 0.119 0.065 0.066 0.083 0.180 0.043 0.130 0.050 0.133 0.120 0.175 0.007 0.139 0.086 0.054 0.088 0.017
| Dataset name | ARSNL | KNNDTWI | TDE | EE | H-COTE2 | MrSQM | ROCKET | FCN | ENCODER | ResNet | InceptionTime | LSTM | Bi-LSTM | TS-LSTM | DA-NET | MALSTM-FCN | GASF-CNN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BEARING_U | 1.0000 | time | time | time | time | 1.0000 | 1.0000 | 0.9996 | 0.2877 | 0.9996 | 0.9996 | 0.9980 | 0.8312 | 0.9934 | 0.6917 | 0.5095 | 0.9011 |
| PHM22_M | 0.9985 | time | time | time | time | 0.9830 | 0.9979 | 0.9983 | 0.9978 | 0.9986 | 0.9986 | 0.9941 | 0.9979 | 0.9982 | 0.9817 | 0.9974 | 0.9976 |
| PHM22_PIN_U | 0.9966 | time | time | time | time | 0.9957 | 0.9933 | 0.9980 | 0.9953 | 0.9983 | 0.9775 | 0.9997 | 0.9955 | 0.9972 | 0.9838 | 0.9957 | 0.9936 |
| PHM22_PO_U | 0.9917 | time | time | time | time | 0.9908 | 0.9801 | 0.9952 | 0.9889 | 0.9974 | 0.9895 | 0.9964 | 0.9919 | 0.9852 | 0.9013 | 0.9824 | 0.9931 |
| PHM22_PDIN_U | 0.9969 | time | time | time | time | 0.9807 | 0.9916 | 0.9968 | 0.9972 | 0.9981 | 0.9956 | 0.9889 | 0.9928 | 0.9966 | 0.9146 | 0.9947 | 0.9954 |
| AVG ACC | 0.966 | 0.357 | 0.157 | 0.124 | 0.128 | 0.911 | 0.928 | 0.958 | 0.903 | 0.969 | 0.966 | 0.912 | 0.924 | 0.960 | 0.809 | 0.847 | 0.747 |
| WIN | 8 | 0 | 0 | 0 | 2 | 1 | 4 | 3 | 0 | 10 | 9 | 2 | 0 | 0 | 0 | 0 | 0 |
| AVG Rank | 5.55 | 26.09 | 28.48 | 30.18 | 28.86 | 13.27 | 9.89 | 6.59 | 11.68 | 4.16 | 5.34 | 10.02 | 11.43 | 8.89 | 17.50 | 15.70 | 17.64 |
| MPCE | 0.012 | 0.132 | 0.162 | 0.173 | 0.168 | 0.028 | 0.023 | 0.014 | 0.024 | 0.011 | 0.012 | 0.026 | 0.021 | 0.013 | 0.051 | 0.038 | 0.066 |
implications for both researchers and practitioners engaged in TSC tasks within smart manufacturing systems, offering valuable insights for their endeavors.

4.1. Benchmark algorithms

Algorithms specifically designed for TSC tasks should provide improvements in accuracy over existing benchmark algorithms. A benchmark classifier that treats each series as a plain feature vector, ignoring the potential autocorrelation, is the natural starting point for TSC tasks. However, TSC problems have certain characteristics that make them challenging, such as long series with many redundant or correlated attributes, variable lengths, seasonality, trend, non-stationarity, and noise. Since standard classifiers may struggle with these characteristics, there have been efforts to design classifiers that can compensate for them. However, not all TSC problems exhibit these characteristics, and benchmarking against standard classifiers can provide insights into the datasets' characteristics. Table 26 summarizes the comparison of seven benchmark algorithms on the 22 datasets and provides four metrics to fully evaluate the different approaches.

Table 26
Performance comparison of seven benchmark algorithms on 22 datasets.
Metric    LR     NB     RF     SVM    KNN-EUC  Ridge  MLP
AVG ACC   0.587  0.422  0.697  0.690  0.622    0.533  0.813
WIN       0      1      3      2      2        0      14
AVG Rank  4.55   5.91   2.86   2.91   4.27     5.41   2.09
MPCE      0.085  0.119  0.065  0.066  0.083    0.096  0.043

Fig. 11. Critical difference diagrams for seven benchmark classifiers on the 22 datasets.

Fig. 11 shows the critical difference diagram for the seven benchmark algorithms listed in Table 20. The cliques are formed using a pairwise Wilcoxon test. The existence of a clique between a pair of algorithms means that they are not significantly different from each other over the tested datasets.

Table 26 and Fig. 11 consistently demonstrate that MLP outperforms the other six benchmark algorithms, with RF coming in as the second-best performer. These results are calculated from the raw accuracy measures provided in Table 25 and are not repeated here. This outcome aligns with our expectations, given MLP's prowess in addressing complex and nonlinear problems, irrespective of any time-related factors.
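To make the vector-based benchmark concrete, the following minimal sketch flattens each series into a plain feature vector and fits two of the benchmark classifiers with scikit-learn; the data shapes and hyperparameters are illustrative placeholders, not the settings used in our experiments.

    # Minimal sketch of the vector-based benchmark idea: each time series of
    # length T is treated as a plain T-dimensional feature vector, ignoring
    # autocorrelation. Shapes and hyperparameters here are illustrative only.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 150))   # 200 univariate series, each of length T=150
    y = rng.integers(0, 2, size=200)  # binary labels (placeholder data)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    for clf in (RandomForestClassifier(random_state=0),
                MLPClassifier(max_iter=500, random_state=0)):
        clf.fit(X_tr, y_tr)                       # series treated as flat vectors
        acc = accuracy_score(y_te, clf.predict(X_te))
        print(type(clf).__name__, f"accuracy = {acc:.3f}")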
4.2. Conventional TSC algorithms

Due to the fundamental differences in structure and learning schema between conventional ML and DL algorithms, we conducted separate comparisons for each group to identify the best algorithms for manufacturing problems. In this group, we included 19 algorithms, all of which except XGBoost were explicitly designed to solve TSC problems. Due to the competitive performance of XGBoost in TSC classification tasks [104], it was included in this group alongside the other algorithms. Additionally, we included RF as the top-performing non-DL algorithm based on the benchmark comparison. Table 27 summarizes the comparison results.

Table 27
Performance comparison of 20 conventional algorithms on 22 datasets.
Metric (first ten algorithms)
AVG ACC   0.697  0.081  0.353  0.803  0.350  0.390  0.115  0.979  0.256  0.586
WIN       0      0      0      1      0      0      0      11     0      0
AVG Rank  9.32   16.34  13.45  7.98   13.16  12.07  15.93  2.32   14.34  11.32
MPCE      0.065  0.180  0.130  0.050  0.133  0.120  0.175  0.007  0.139  0.086
Metric (second ten algorithms)
AVG ACC   0.761  0.571  0.944  0.966  0.357  0.157  0.124  0.128  0.911  0.928
WIN       0      3      4      12     0      0      0      2      1      4
AVG Rank  8.61   8.43   4.70   2.61   13.27  14.61  15.50  14.75  6.50   4.77
MPCE      0.054  0.088  0.017  0.012  0.132  0.162  0.173  0.168  0.028  0.023

Fig. 12 presents the critical difference diagram for this analysis. We removed the six worst-performing algorithms (PF, LS, DTWF, TDE, EE, and HIVE-COTE 2) from this figure to keep it discernible. The difference between the average ranks in Table 27 and Fig. 12 is due to this decision, aimed at making the results more digestible for readers.

Fig. 12. Critical difference diagrams for 20 conventional algorithms on the 22 datasets.

The DrCIF algorithm demonstrated the best results among the 20 conventional ML algorithms. It achieved the highest ranking in three out of four metrics in Table 27, as well as the best results in Fig. 12. ARSENAL is another highly powerful algorithm. In Fig. 12, it ranks as the second-best algorithm and is not significantly inferior to DrCIF. It also achieved the highest number of wins among all algorithms. These results underscore the effectiveness of both interval-based and kernel-based FE techniques when combined with ensemble learning classification techniques in TSC tasks.
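For readers who want a starting point for interval-based and kernel-based ensembles such as DrCIF and ARSENAL, the sketch below uses sktime-style estimators; the module paths and constructor arguments are assumptions that may differ across sktime/aeon versions, and the data is a random placeholder rather than our experimental setup.

    # Hedged sketch: fitting an interval-based (DrCIF) and a kernel-based
    # (Arsenal) ensemble with sktime-style estimators. Module paths and
    # argument names are assumptions that may vary across library versions.
    import numpy as np
    from sktime.classification.interval_based import DrCIF
    from sktime.classification.kernel_based import Arsenal

    # Placeholder data in the (N, M, T) layout sktime expects for numpy input:
    # 60 series, 1 channel, length 100.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(60, 1, 100))
    y = np.array([0, 1] * 30)

    for clf in (DrCIF(n_estimators=50, random_state=0),
                Arsenal(num_kernels=500, n_estimators=10, random_state=0)):
        clf.fit(X[:40], y[:40])
        print(type(clf).__name__, "test accuracy:", clf.score(X[40:], y[40:]))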
4.3. ANN & DL algorithms

Many reasons can be cited for the popularity of DL algorithms in recent years. DL algorithms excel at automatically learning and extracting features from the data, reducing the need for manual FE and extensive domain expertise. Moreover, their scalability and flexibility make them practical for handling large, high-dimensional datasets with long time-series. Additionally, the availability of advanced hardware and computational resources, such as GPUs, has further facilitated the widespread adoption of DL algorithms across various domains. In this part, we assessed these capabilities to see how well DL algorithms can approximate time-series problems in the manufacturing domain and to find the best performers amongst them. We compared ten algorithms with different architectures, and MLP, as the benchmark in this group, was included for comparison. Table 28 summarizes the comparison results. Except for MLP, DA-NET, GASF-CNN, and MALSTM-FCN, all other tested DL algorithms show competitive results on the AVG ACC and MPCE metrics. ResNet is superior in all four defined metrics based on the Table 28 results, and it can be considered the best ANN & DL algorithm for TSC tasks in the manufacturing domain.

Table 28
Performance comparison of eleven ANN & DL algorithms on 22 datasets.
Metric    MLP    FCN    Encoder  ResNet  InceptionTime  LSTM   BiLSTM  TS-LSTM  DA-NET  MALSTM-FCN  GASF-CNN
AVG ACC   0.812  0.958  0.903    0.969   0.966          0.912  0.924   0.960    0.809   0.847       0.747
WIN       0      5      0        14      12             1      0       1        0       0           0
AVG Rank  9.23   3.55   6.07     2.07    2.77           5.64   6.30    4.61     8.82    8.36        8.59
MPCE      0.043  0.014  0.024    0.011   0.012          0.026  0.021   0.013    0.051   0.038       0.066

Fig. 13 presents the critical difference diagram for this analysis. The results from this analysis agree with Table 28 and show that ResNet is superior among all eleven ANN & DL algorithms. InceptionTime, FCN, and TS-LSTM take the next places, respectively, although they are not significantly different from ResNet based on the pairwise Wilcoxon test. This shows that they are also very powerful algorithms.

Fig. 13. Critical difference diagrams for eleven DL algorithms on the 22 datasets.
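As an illustration of the convolutional architectures in this group, the following is a minimal FCN-style 1D CNN in Keras, in the spirit of the baselines of [53]; the layer sizes shown follow the common FCN baseline pattern and are assumptions here, not our tuned configurations.

    # Minimal FCN-style 1D CNN for time-series classification (Keras).
    # Filter counts and kernel sizes follow the common FCN baseline pattern
    # but are assumptions here, not our tuned experimental settings.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_fcn(series_length: int, n_channels: int, n_classes: int) -> tf.keras.Model:
        inputs = layers.Input(shape=(series_length, n_channels))
        x = inputs
        for filters, kernel in ((128, 8), (256, 5), (128, 3)):
            x = layers.Conv1D(filters, kernel, padding="same")(x)
            x = layers.BatchNormalization()(x)
            x = layers.Activation("relu")(x)
        x = layers.GlobalAveragePooling1D()(x)  # temporal features -> fixed vector
        outputs = layers.Dense(n_classes, activation="softmax")(x)
        model = models.Model(inputs, outputs)
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    model = build_fcn(series_length=150, n_channels=1, n_classes=3)
    model.summary()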
4.4. Results for univariate time-series (UTSC) and multivariate time-series (MTSC)

As mentioned earlier (refer to Fig. 3), the TSC algorithms are viewed as comprehensive modules that receive raw time-series data with the shape N*T*M on one end and predict the corresponding labels on the other end. We assume that the ability to handle multivariate time-series data and to extract discriminative features from the different dimensions of a given dataset is an internal capability of the algorithm. However, there may be situations where our focus is specifically on working with univariate time-series data, without requiring additional dimensions. It is worth noting that algorithms designed to handle multivariate data can naturally accept univariate data as well. The list of univariate and multivariate datasets can be found in Table 21.

To manage the size of the experiment effectively, we divided the datasets into two distinct groups. The first group of experiments focuses on testing the performance of the ten best-performing algorithms on the twelve univariate datasets. Table 29 summarizes the comparison results. All compared algorithms show an average accuracy of more than 95 %, proving that all of them are capable of generating competitive results in UTSC tasks. The ResNet algorithm, however, shows superior performance in all four metrics. InceptionTime is the second-best algorithm, with tied results in two out of four metrics.

Fig. 14 presents a critical difference diagram for this analysis, which indicates that there are no significant differences between any pairs of algorithms according to the pairwise Wilcoxon test. This suggests that all of these algorithms are capable of delivering satisfactory accuracy in UTSC tasks. However, when considering the four metrics provided in Table 29, it becomes evident that the ResNet algorithm performs slightly better than the others, followed by InceptionTime, ARSENAL, and the FCN algorithm, in that order. Additionally, taking into account the runtime and computational expenses of these algorithms can serve as an additional factor for decision-making when distinguishing among equally competent algorithms. The results of this evaluation are elaborated upon in Section 4.6.
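The N*T*M convention used above can be made concrete with a short sketch; the shapes below are placeholders chosen for illustration.

    # Sketch of the N*T*M data layout: N instances, T time steps, M channels.
    # A univariate dataset is simply the M=1 special case, which is why
    # multivariate-capable classifiers accept univariate input unchanged.
    import numpy as np

    N, T = 100, 300
    univariate = np.random.default_rng(1).normal(size=(N, T))

    # Promote to the common 3D layout by adding a singleton channel axis.
    as_mtsc = univariate[:, :, np.newaxis]      # shape (N, T, 1)
    print(univariate.shape, "->", as_mtsc.shape)

    # A 3-sensor multivariate dataset in the same convention:
    multivariate = np.random.default_rng(2).normal(size=(N, T, 3))
    assert multivariate.shape == (N, T, 3)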
Table 29
Performance comparison of top 10 algorithms on 12 univariate datasets.
Metric    ResNet  InceptionTime  FCN    TS-LSTM  LSTM   DrCIF  ARSENAL  RISE   ROCKET  BiLSTM
AVG ACC   0.995   0.995          0.983  0.991    0.992  0.989  0.980    0.957  0.969   0.964
WIN       7       6              2      0        2      0      5        3      2       0
AVG Rank  2.75    3.62           5.21   6.87     5.75   5.71   4.33     6.50   6.92    7.33
MPCE      0.001   0.001          0.005  0.002    0.002  0.003  0.006    0.019  0.009   0.008
Fig. 14. Critical difference diagrams for top 10 algorithms on 12 univariate (UTSC) datasets.
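The statistics behind such a critical difference analysis can be sketched as follows; the accuracy matrix is a random stand-in, and the clique-forming step with multiple-testing correction used for the actual diagrams is omitted for brevity.

    # Sketch of the statistics behind a critical difference analysis:
    # average ranks per algorithm plus pairwise Wilcoxon signed-rank tests
    # on per-dataset accuracies. Data below is a random placeholder.
    import itertools
    import numpy as np
    from scipy.stats import rankdata, wilcoxon

    algorithms = ["ResNet", "InceptionTime", "ARSENAL", "DrCIF"]
    rng = np.random.default_rng(3)
    acc = rng.uniform(0.90, 1.00, size=(12, len(algorithms)))  # 12 datasets x 4 algos

    # Rank algorithms on each dataset (rank 1 = best accuracy).
    ranks = rankdata(-acc, axis=1)
    for name, r in zip(algorithms, ranks.mean(axis=0)):
        print(f"{name}: average rank {r:.2f}")

    # Pairwise Wilcoxon signed-rank tests over datasets.
    for (i, a), (j, b) in itertools.combinations(enumerate(algorithms), 2):
        stat, p = wilcoxon(acc[:, i], acc[:, j])
        print(f"{a} vs {b}: p = {p:.3f}")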
Table 30
Performance comparison of top 10 algorithms on 10 multivariate datasets.
Metric    ResNet  InceptionTime  FCN    TS-LSTM  BiLSTM  DrCIF  ARSENAL  RISE   ROCKET  Encoder
AVG ACC   0.937   0.931          0.928  0.923    0.877   0.967  0.948    0.928  0.879   0.890
WIN       4       3              2      0        0       5      3        0      2       0
AVG Rank  3.25    3.70           4.45   5.90     8.30    3.15   4.40     6.90   6.60    8.35
MPCE      0.023   0.024          0.025  0.026    0.038   0.013  0.019    0.027  0.040   0.036
The second group of experiments evaluates the performance of the top ten best-performing algorithms on the ten multivariate datasets. Table 30 summarizes the comparison results. Seven algorithms show an average accuracy higher than 92 % in this comparison, showing that all of them are competitive for MTSC tasks. However, the DrCIF algorithm was able to outperform all other algorithms in all four metrics. The ResNet, InceptionTime, and ARSENAL algorithms take the next places, respectively.

Fig. 15 displays the critical difference diagram for this analysis. While the differences between the compared algorithms are not significant enough to reject the null hypothesis in the pairwise Wilcoxon test, an examination of the metrics in Table 30 reveals that DrCIF slightly outperforms the other algorithms, with the ResNet, InceptionTime, and ARSENAL algorithms following closely. Once more, the analysis of runtime and computational expenses in Section 4.6 can provide additional insights for decision-making purposes.

4.5. Results on reduced datasets

In this section, we conducted the experiment using a reduced set of eleven datasets. As part of the methodology, we had augmented some of the datasets to create a more diverse dataset repository with the goal of covering a wider range of problems. For instance, the PHM2022 dataset was initially a multivariate dataset with three dimensions derived from three different sensors. We split each sensor's data into a univariate dataset and augmented it, resulting in four distinct datasets. In this analysis, we removed these augmented datasets to ensure a more distinct dataset collection and to test whether the augmentation process introduced any biases. The relevant datasets are highlighted in bold font in Table 21.

To effectively manage the scale of the experiment, we tested the top ten best-performing algorithms in terms of the AVG ACC metric on the aforementioned eleven datasets. Table 31 provides a summary of the results. The results indicate that all these algorithms perform very well on the reduced datasets. The DrCIF algorithm is superior in two metrics, and the ARSENAL algorithm is superior in the other two.

Fig. 16 presents the critical difference diagram for this analysis. Although the Wilcoxon test failed to reject the null hypothesis for most algorithm pairs, the ARSENAL and DrCIF algorithms show marginally better performance than the others.

Fig. 17 presents boxplots of 19 algorithms for all accuracy measures. These 19 algorithms were selected after removing those that failed to produce results for all datasets, as well as low-performing algorithms like LR, NB, and GASF-CNN, to enhance the clarity of the plot. These boxplots offer a visual summary of the distribution, skewness, and presence of outliers in the accuracy measurements, aiding in the assessment of the reliability and robustness of the different algorithms in comparison. For instance, ARSENAL and DrCIF exhibit consistent results with few outliers, while algorithms such as BiLSTM and MrSQM display larger boxes, indicating less consistency.
Fig. 15. Critical difference diagrams for top 10 algorithms on 10 multivariate (MTSC) datasets.
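The per-sensor split described above for PHM2022 amounts to slicing the channel axis of the data array; a minimal sketch with placeholder shapes and names follows (the actual preprocessing pipeline is not reproduced here).

    # Sketch of splitting a 3-sensor multivariate dataset into three
    # univariate datasets, as described for PHM2022 above. The array and
    # dataset names are placeholders, not the actual preprocessing code.
    import numpy as np

    N, T, M = 500, 1024, 3                       # instances, length, sensors
    X_multi = np.random.default_rng(7).normal(size=(N, T, M))
    y = np.random.default_rng(8).integers(0, 4, size=N)

    univariate_sets = {
        f"phm2022_sensor{m}": (X_multi[:, :, m:m + 1], y)  # keep shape (N, T, 1)
        for m in range(M)
    }
    for name, (X_u, _) in univariate_sets.items():
        print(name, X_u.shape)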
Table 31
Performance comparison of 10 best-performing algorithms on 11 datasets.
Metric    DrCIF  RISE   ARSENAL  ROCKET  FCN    ResNet  InceptionTime  BiLSTM  TS-LSTM  MrSQM
AVG ACC   0.972  0.915  0.963    0.918   0.946  0.946   0.944          0.892   0.942    0.901
WIN       4      2      6        3       3      4       4              0       0        1
AVG Rank  3.59   6.50   3.32     5.73    4.54   4.09    3.86           8.41    6.82     8.14
MPCE      0.011  0.026  0.014    0.029   0.020  0.020   0.021          0.032   0.020    0.031
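The four summary metrics reported in these tables can be computed from a datasets-by-algorithms accuracy matrix, as sketched below; here MPCE is assumed to be the per-dataset error rate divided by the number of classes, averaged over datasets, following the common strong-baseline convention, and the numbers are placeholders.

    # Sketch of the four summary metrics (AVG ACC, WIN, AVG Rank, MPCE)
    # from an accuracy matrix. MPCE is assumed to be mean(error / n_classes)
    # over datasets, per the common strong-baseline definition.
    import numpy as np
    from scipy.stats import rankdata

    acc = np.array([[0.97, 0.95, 0.96],          # rows = datasets
                    [0.99, 0.98, 0.99],          # cols = algorithms
                    [0.92, 0.96, 0.94]])
    n_classes = np.array([2, 4, 3])              # classes per dataset

    avg_acc = acc.mean(axis=0)
    wins = (acc == acc.max(axis=1, keepdims=True)).sum(axis=0)  # ties all count
    avg_rank = rankdata(-acc, axis=1).mean(axis=0)
    mpce = ((1.0 - acc) / n_classes[:, None]).mean(axis=0)

    for j, name in enumerate(["Algo A", "Algo B", "Algo C"]):
        print(name, f"AVG ACC={avg_acc[j]:.3f} WIN={wins[j]} "
                    f"AVG Rank={avg_rank[j]:.2f} MPCE={mpce[j]:.4f}")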
4.6. Runtime and computational expense evaluation

As shown in the previous sections, there are situations where we need evaluation metrics beyond accuracy to make better decisions. Runtime can serve as such a metric, and it is directly related to an algorithm's computational complexity. While accuracy reflects an algorithm's ability to correctly classify data, runtime considerations offer a different dimension of evaluation.

In real-world applications, especially those that require time-sensitive decisions or where the available computational resources are limited, the computational efficiency of an algorithm can be just as important as its accuracy. Faster algorithms with shorter runtimes are more practical for real-time and high-throughput systems, where quick decisions are imperative. They can operate efficiently on standard CPU configurations, removing the need for high-performance computers with advanced GPUs. Additionally, runtime assessments help identify trade-offs between computational complexity and accuracy, allowing practitioners to choose the most suitable algorithms based on their specific application requirements. Therefore, considering both accuracy and runtime in TSC tasks provides a more comprehensive perspective, enabling the selection of the right balance between classification performance and computational efficiency.
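Runtime bookkeeping of this practical kind can be as simple as wrapping the fit and predict calls with a wall-clock timer, as sketched below with a placeholder classifier; note that this simple timer does not model the parallel execution used in our experiments.

    # Sketch of per-phase runtime logging around fit/predict, similar in
    # spirit to how runtimes were recorded here (the real experiments ran
    # many algorithms in parallel, which this simple timer does not model).
    import time
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X_tr, y_tr = rng.normal(size=(400, 150)), rng.integers(0, 2, 400)
    X_te = rng.normal(size=(100, 150))

    clf = RandomForestClassifier(random_state=0)

    t0 = time.perf_counter()
    clf.fit(X_tr, y_tr)
    fit_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    clf.predict(X_te)
    pred_s = time.perf_counter() - t0

    print(f"fit: {fit_s:.2f} s, predict: {pred_s:.2f} s, total: {fit_s + pred_s:.2f} s")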
It is difficult to compare the runtimes and computational expenses of different algorithms for several reasons, such as differences in the Python software packages and in the available hardware resources. Moreover, some algorithms were run on CPU while others ran on GPU. We also ran several algorithms in parallel on different CPU and GPU cores, and we do not know the effect of doing so on the final runtime (which was considered out of scope for this study). Although computational complexity assessments have been conducted by researchers over the years [28,31,105], our approach took a more practical route to gain insight into relative algorithm performance by recording runtime data for all experiments. Fig. 18 provides the boxplot summaries of the runtime information for 19 algorithms, and Fig. 19 plots the average accuracy against runtime.

These results must be considered an indicator only of the scalability of the tested algorithms, as they were not obtained through a rigorous methodology but as additional results of the main experiment. Scalability is a very important factor to consider when choosing the best algorithm for a given problem. If the problem involves a large dataset, or we anticipate the need to scale up the system in the future, we must consider the classifier's scalability. The upper-left part of Fig. 19 showcases algorithms that exhibit commendable scalability while maintaining an acceptable level of accuracy. Algorithms like ARSENAL, ROCKET, and RISE fall into this category. Conversely, the upper-right part of the figure presents algorithms such as InceptionTime, FCN, and ResNet, where high accuracies come at the expense of extended runtimes. Some algorithms with exceptionally high runtimes were excluded from this figure. Notably, algorithms like HIVE-COTE V2.0 belong to this group and may encounter scalability issues to a degree that renders them impractical for certain applications.

Finally, Fig. 20 shows the scatter plot of runtime in minutes versus accuracy in percentage for the five top-performing algorithms (i.e., ARSENAL, DrCIF, TS-LSTM, ResNet, and InceptionTime). In the upper part of the figure, each algorithm has 22 data points (marked with color-coded circles). In the lower part of the figure, the points with an accuracy of less than 90 % and a runtime higher than 1000 min were removed to provide a zoomed-in comparison of the algorithms and their variabilities. The ellipses mark the 95 % confidence interval around the mean (marked with the star), assuming a bivariate normal distribution of the points along the horizontal and vertical axes, meaning 95 % of the data points fall within each respective ellipse. The scatter plot shows that ARSENAL and TS-LSTM have the minimum variability in both accuracy and runtime among the compared algorithms, making them good candidates when performance stability is needed. In contrast, the ResNet and InceptionTime algorithms both have high accuracy and runtime variabilities. Using them requires caution, and they are less reliable in this regard.

4.7. Practical implications

Accuracy and runtime were discussed in detail in the previous sections. However, when TSC algorithms are selected for manufacturing settings in practical applications, there are several other implications to consider besides high accuracy and low runtime. Data requirements, ease of implementation, required computational resources, and model interpretability are among these and are briefly discussed in the following.
Fig. 18. Runtime box plots for 19 algorithms on 22 datasets. The vertical axis is the logarithmic transformation of the runtime in seconds and the horizontal axis lists the different algorithms.
Fig. 19. Average Runtime vs. Accuracy for 19 algorithms on 22 datasets. The horizontal axis is the average runtime of each algorithm in minutes and the vertical axis
is the average accuracy of each algorithm in percentage.
First, before choosing any TSC algorithm for a use case, the data quality, the data quantity, and the effort needed to preprocess the data should be considered, as these factors can affect the model's accuracy regardless of the chosen algorithm. For example, DL models typically need larger datasets to perform well and avoid overfitting, and in manufacturing environments, running the machines to collect more data can be very expensive and even infeasible in some cases. In such situations, conventional ML approaches that can work with less data might be more effective.

Second, the ease of implementation should be considered when choosing a TSC algorithm. Choosing an easy-to-deploy, open-source, and well-documented algorithm that requires minimal parameter tuning and technical expertise is favorable for practitioners and saves implementation time. Moreover, DL models need GPUs and considerably higher computational power to run efficiently; thus, the required computational resources should also be considered.

Finally, it is very important to choose a TSC algorithm that provides interpretable predictions. This is an important factor in manufacturing use cases because the prediction results are often used for root cause analysis and fault diagnosis, supporting continuous process improvement. For instance, DT-based classification algorithms, which may have lower accuracy, may produce more valuable predictions in sensitive cases than a highly accurate CNN model operating as a "black box", due to the clear understanding of how the predictions were made.
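As a small illustration of the interpretability argument, a shallow decision tree exposes its learned rules directly, unlike a CNN operating as a black box; the data and feature names below are invented placeholders.

    # Sketch of the interpretability contrast: a shallow decision tree's
    # learned rules can be printed and audited, unlike a CNN's weights.
    # Data and "feature" names are invented placeholders.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(5)
    X = rng.normal(size=(300, 4))                    # e.g., summary statistics
    y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)    # synthetic fault label

    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(export_text(tree, feature_names=["mean", "std", "slope", "peak"]))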
5. Conclusions, future work, and limitations

Manufacturing industries are in dire need of AI and ML platforms to facilitate their transition towards Industry 4.0 and smart manufacturing systems, yet a disconnect exists between the state-of-the-art ML algorithms in the computer science literature and those utilized in the manufacturing literature. Furthermore, practitioners within the manufacturing domain may lack the technical knowledge required to navigate the increasing number of algorithms, each with slight variations in structure and performance. This challenge is compounded by the multitude of parameters and hyperparameters that need tuning after selecting an algorithm. In our efforts, we have undertaken exhaustive legwork, conducting experiments across various scenarios and introducing algorithms that demonstrate strong out-of-the-box performance, particularly on manufacturing datasets. In doing so, we face yet another challenge in time-series analytics for smart manufacturing applications, which is the scarcity of applicable public datasets, with only limited preprocessed manufacturing datasets accessible to both researchers and practitioners. Consequently, ML researchers in manufacturing either must rely on original data collected from machines, which can be challenging and infeasible in many cases, or start from scratch with data preprocessing tailored to their specific applications. In a recent effort, we provided a structured overview of the current state of time-series pattern recognition in manufacturing, emphasizing practical problem-solving approaches. Building upon this foundation, in this study we have developed a specialized ML framework for TSC in smart manufacturing systems, empowering manufacturers to address diverse challenges within the industry.
Fig. 20. Scatterplot of all runtime vs. accuracies for the five top-performing algorithms (up). 95 % confidence interval ellipses (down). The horizontal axis is the
runtime of each algorithm in minutes and the vertical axis is the accuracy of each algorithm in percentage. Average accuracy and runtime are depicted by the
star markers.
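Under the bivariate-normal assumption stated in Section 4.6, the 95 % confidence ellipses of Fig. 20 follow from the eigendecomposition of the sample covariance scaled by the chi-square quantile; a sketch with placeholder data:

    # Sketch of a 95% confidence ellipse for runtime-vs-accuracy points under
    # a bivariate normal assumption: the ellipse axes come from the covariance
    # eigendecomposition, scaled by the chi-square quantile. Placeholder data.
    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(9)
    runtime = rng.normal(60, 15, size=22)     # minutes (placeholder)
    accuracy = rng.normal(96, 1.5, size=22)   # percent (placeholder)

    pts = np.column_stack([runtime, accuracy])
    mean = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)

    eigvals, eigvecs = np.linalg.eigh(cov)
    scale = chi2.ppf(0.95, df=2)              # 95% mass for a 2D Gaussian
    half_axes = np.sqrt(eigvals * scale)      # ellipse semi-axis lengths
    angle = np.degrees(np.arctan2(eigvecs[1, 1], eigvecs[0, 1]))

    print("center:", mean, "semi-axes:", half_axes, "angle (deg):", angle)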
In this paper, we present the largest empirical study of TSC algorithms in the manufacturing domain to date. The entire experiment required nearly two years of machine runtime, which we parallelized to ensure feasibility within our resource limitations. Based on our results, ResNet, DrCIF, InceptionTime, and ARSENAL emerged as the top-performing algorithms, boasting an average accuracy of over 96.6 % across all 22 datasets. Notably, DrCIF and ARSENAL belong to the conventional ML algorithms category, highlighting that DL algorithms are not always the optimal choice and that powerful alternatives exist. These findings underscore the robustness, efficiency, scalability, and effectiveness of convolutional kernels in capturing temporal features in time-series data collected from manufacturing systems for TSC tasks, as three out of the top four performing algorithms leverage these kernels for feature extraction. Additionally, the LSTM, BiLSTM, and TS-LSTM algorithms deserve recognition for their effectiveness in capturing features within time-series data using RNN-based recurrent structures. It is important to note that these results were derived empirically on manufacturing time-series data for the defined scope, and this study was not designed to draw any theoretical conclusions beyond that scope.

We also used runtime as a supplementary metric to aid the decision-making process, shedding light on the trade-offs between accuracy and computational efficiency.

There are several topics and subtopics that we did not discuss in this work and that can be considered in future works. These include, but are not limited to, addressing time-series data with variable lengths for TSC, addressing very long time-series collected from high-frequency systems, investigating the impact of different normalization techniques on the performance of TSC models in manufacturing, investigating recent generative AI, transformer-based, and LLM-based techniques and algorithms for TSC tasks in manufacturing, and more. These unexplored areas present opportunities for further investigation and advancement in this growing field.

There are several limitations that must be considered when interpreting the results of this paper. This paper was written under certain assumptions, timelines, and resource limitations, with a primary focus on providing a comprehensive TSC framework in smart manufacturing systems. While the complete removal of subjectivity and biases is impossible and arguably not desirable, we intend to maintain transparency by articulating the process and methodology used, enabling our audience to understand our biases, intent, understanding, and their influence on the content of this paper. In particular, the following limitations are worth noting:
In the methodology section, it was assumed that newer algorithms within the same classification categories would outperform their predecessors. While this assumption is intuitively reasonable, it is worth acknowledging that it may not hold true in all cases, as some specific algorithms developed for specific problems may exist that refute this assumption. However, studies like this involve making a number of decisions about how information is collected and analyzed, how experiments are conducted, etc. Often there is no one "correct" approach. Instead, we focused on being as transparent as possible in explaining all the steps to increase the clarity and reproducibility of our work. We adopted this assumption to help the downselection of representative algorithms within each category and to manage the scope of the experiment, as it was not feasible to run all of the initial 92 algorithms on all datasets within a reasonable timeframe.

The recorded runtimes were measured under the condition that five algorithms were run simultaneously on parallel CPU cores. It is important to acknowledge that this concurrent execution may impact the runtimes in certain cases. Results might have varied if we had the resources to run the algorithms individually.

Although we tried to include as many TSC algorithms as possible, it is plausible that some algorithms (especially brand-new ones) were not included due to the timing of the search and the continuous and fast-paced nature of research in this field. Nevertheless, the developed methodology can be applied to these new algorithms in the future to expand on the presented results. Moreover, although different evaluation metrics were utilized in this study, there might be some limitations and biases in the evaluation metrics used. Finally, the results discussed in this paper are only applicable to the studied domain and problem, and the findings are not generalizable to other domains.

CRediT authorship contribution statement

Mojtaba A. Farahani: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. M.R. McCormick: Writing – review & editing, Writing – original draft, Visualization, Investigation, Formal analysis. Ramy Harik: Writing – review & editing, Validation, Project administration, Funding acquisition. Thorsten Wuest: Writing – review & editing, Writing – original draft, Validation, Supervision, Resources, Project administration, Methodology, Investigation, Funding acquisition, Conceptualization.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Thorsten Wuest and Ramy Harik report financial support was provided by the National Science Foundation. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgment

This material is based upon work supported by the National Science Foundation under Grant No. 2119654 and 2420964. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

The authors express their appreciation for the contributions of and valuable discussions with Robert Gianinny in the early stages of the study, which improved the quality of this paper. The authors express their appreciation to the Robotics and Computer-Integrated Manufacturing reviewers for their feedback and resulting improvements.

References

[1] S. Sinha, E. Bernardes, R. Calderon, T. Wuest, Digital Supply Networks: Transform Your Supply Chain and Gain Competitive Advantage With Disruptive Technology and Reimagined Processes, McGraw-Hill Education, 2020.
[2] A. Kusiak, Smart manufacturing, Int. J. Prod. Res. 56 (1–2) (2018) 508–517, https://doi.org/10.1080/00207543.2017.1351644.
[3] T. Wuest, D. Weimer, C. Irgens, K.-D. Thoben, Machine learning in manufacturing: advantages, challenges, and applications, Prod. Manuf. Res. 4 (1) (2016) 23–45, https://doi.org/10.1080/21693277.2016.1192517.
[4] M. Babic, M.A. Farahani, T. Wuest, Image based quality inspection in smart manufacturing systems: a literature review, Proc. CIRP 103 (2021) 262–267, https://doi.org/10.1016/j.procir.2021.10.042.
[5] D. Neupane, J. Seok, Bearing fault detection and diagnosis using case western reserve university dataset with deep learning approaches: a review, IEEE Access 8 (2020) 93155–93178, https://doi.org/10.1109/ACCESS.2020.2990528.
[6] M.M. Rahman, M.A. Farahani, T. Wuest, Multivariate time-series classification of critical events from industrial drying hopper operations: a deep learning approach, J. Manuf. Mater. Process. 7 (5) (2023) 164, https://doi.org/10.3390/jmmp7050164.
[7] M. Torkjazi, A.K. Raz, Data-driven approach with machine learning to reduce subjectivity in multi-attribute decision making methods, in: 2023 IEEE International Systems Conference (SysCon), 2023, pp. 1–8, https://doi.org/10.1109/SysCon53073.2023.10131094.
[8] H. Khosravi, H. Sahebi, R. Khanizad, I. Ahmed, Identification of the factors affecting the reduction of energy consumption and cost in buildings using data mining techniques (arXiv:2305.08886), ArXiv (2023), http://arxiv.org/abs/2305.08886.
[9] H. Zhang, Q. Zhang, S. Shao, T. Niu, X. Yang, Attention-based LSTM network for rotatory machine remaining useful life prediction, IEEE Access 8 (2020) 132188–132199, https://doi.org/10.1109/ACCESS.2020.3010066.
[10] M.A. Farahani, M.R. McCormick, R. Gianinny, F. Hudacheck, R. Harik, Z. Liu, T. Wuest, Time-series pattern recognition in smart manufacturing systems: a literature review and ontology, J. Manuf. Syst. (2023).
[11] H.N. Bhandari, B. Rimal, N.R. Pokhrel, R. Rimal, K.R. Dahal, R.K.C. Khatri, Predicting stock market index using LSTM, Mach. Learn. Appl. 9 (2022) 100320, https://doi.org/10.1016/j.mlwa.2022.100320.
[12] P. Jain, W.F. Alsanie, D.O. Gago, G.C. Altamirano, R.A. Sandoval Núñez, A. Rizwan, S.A. Asakipaam, A cloud-based machine learning approach to reduce noise in ECG arrhythmias for smart healthcare services, Comput. Intell. Neurosci. 2022 (2022) 1–11, https://doi.org/10.1155/2022/3773883.
[13] M.A. Farahani, A. Vahid, A.E. Goodwell, Evaluating ecohydrological model sensitivity to input variability with an information-theory-based approach, Entropy 24 (7) (2022) 994, https://doi.org/10.3390/e24070994.
[14] A. Zafari, A. Khoshkhahtinat, P.M. Mehta, N.M. Nasrabadi, B.J. Thompson, D. da Silva, M.S.F. Kirk, Attention-based generative neural image compression on solar dynamics observatory, in: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), 2022, pp. 198–205, https://doi.org/10.1109/ICMLA55696.2022.00035.
[15] M. Akyash, H. Mohammadzade, H. Behroozi, A dynamic time warping based kernel for 3D action recognition using kinect depth sensor, in: 2020 28th Iranian Conference on Electrical Engineering (ICEE), 2020, pp. 1–5, https://doi.org/10.1109/ICEE50131.2020.9260988.
[16] Z.M. Çınar, A. Abdussalam Nuhu, Q. Zeeshan, O. Korhan, M. Asmael, B. Safaei, Machine learning in predictive maintenance towards sustainable smart manufacturing in industry 4.0, Sustainability 12 (19) (2020) 8211, https://doi.org/10.3390/su12198211.
[17] R. Chen, X. Yan, S. Wang, G. Xiao, DA-Net: dual-attention network for multivariate time series classification, Inf. Sci. (N.Y.) 610 (2022) 472–487, https://doi.org/10.1016/j.ins.2022.07.178.
[18] Q. Yang, X. Wu, 10 Challenging problems in data mining research, Int. J. Inf. Technol. Decis. Mak. 05 (04) (2006) 597–604, https://doi.org/10.1142/S0219622006002258.
[19] H.I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, P.-A. Muller, Deep learning for time series classification: a review, Data Min. Knowl. Discov. 33 (4) (2019) 917–963, https://doi.org/10.1007/s10618-019-00619-1.
[20] C.-Y. Hsu, W.-C. Liu, Multiple time-series convolutional neural network for fault detection and diagnosis and empirical study in semiconductor manufacturing, J. Intell. Manuf. 32 (3) (2021) 823–836, https://doi.org/10.1007/s10845-020-01591-0.
[21] A. Bagnall, A. Bostrom, J. Large, J. Lines, The great time series classification bake off: an experimental evaluation of recently proposed algorithms, extended version (arXiv:1602.01711), ArXiv (2016), http://arxiv.org/abs/1602.01711.
[22] B.D. Fulcher, N.S. Jones, Highly comparative feature-based time-series classification, IEEE Trans. Knowl. Data Eng. 26 (12) (2014) 3026–3037, https://doi.org/10.1109/TKDE.2014.2316504.
[23] A.P. Ruiz, M. Flynn, J. Large, M. Middlehurst, A. Bagnall, The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances, Data Min. Knowl. Discov. 35 (2) (2021) 401–449, https://doi.org/10.1007/s10618-020-00727-3.
[24] E. Keogh, K. Chakrabarti, S. Mehrotra, M. Pazzani, Locally adaptive dimensionality reduction for indexing large time series databases, in: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, 2001.
[25] P. Schäfer, M. Högqvist, SFA: a symbolic fourier approximation and index for similarity search in high dimensional datasets, in: Proceedings of the 15th International Conference on Extending Database Technology, 2012, pp. 516–527, https://doi.org/10.1145/2247596.2247656.
[26] P. Senin, S. Malinchik, SAX-VSM: interpretable time series classification using SAX and vector space model, in: 2013 IEEE 13th International Conference on Data Mining, 2013, pp. 1175–1180, https://doi.org/10.1109/ICDM.2013.52.
[27] P. Schäfer, The BOSS is concerned with time series classification in the presence of noise, Data Min. Knowl. Discov. 29 (6) (2015) 1505–1530, https://doi.org/10.1007/s10618-014-0377-7.
[28] P. Schäfer, Scalable time series classification, Data Min. Knowl. Discov. 30 (5) (2016) 1273–1298, https://doi.org/10.1007/s10618-015-0441-y.
[29] R.J. Kate, Using dynamic time warping distances as features for improved time series classification, Data Min. Knowl. Discov. 30 (2) (2016) 283–312, https://doi.org/10.1007/s10618-015-0418-x.
[30] P. Schäfer, U. Leser, Multivariate time series classification with WEASEL+MUSE (arXiv:1711.11343), ArXiv (2018), http://arxiv.org/abs/1711.11343.
[31] M. Middlehurst, W. Vickers, A. Bagnall, Scalable dictionary classifiers for time series classification, 2019, Vol. 11871, pp. 11–19, https://doi.org/10.1007/978-3-030-33607-3_2.
[32] A. Bagnall, M. Flynn, J. Large, J. Lines, M. Middlehurst, A tale of two toolkits, report the third: on the usage and performance of HIVE-COTE v1.0, 2020, Vol. 12588, pp. 3–18, https://doi.org/10.1007/978-3-030-65742-0_1.
[33] T.L. Nguyen, G. Ifrim, MrSQM: fast time series classification with symbolic representations (arXiv:2109.01036), ArXiv (2022), http://arxiv.org/abs/2109.01036.
[34] H. Deng, G. Runger, E. Tuv, M. Vladimir, A time series forest for classification and feature extraction (arXiv:1302.2277), ArXiv (2013), http://arxiv.org/abs/1302.2277.
[35] J. Lines, S. Taylor, A. Bagnall, Time series classification with HIVE-COTE: the hierarchical vote collective of transformation-based ensembles, 2018.
[36] M. Middlehurst, J. Large, A. Bagnall, The canonical interval forest (CIF) classifier for time series classification, in: 2020 IEEE International Conference on Big Data (Big Data), 2020, pp. 188–195, https://doi.org/10.1109/BigData50022.2020.9378424.
[37] M. Middlehurst, J. Large, M. Flynn, J. Lines, A. Bostrom, A. Bagnall, HIVE-COTE 2.0: a new meta ensemble for time series classification, Mach. Learn. 110 (11–12) (2021) 3211–3243, https://doi.org/10.1007/s10994-021-06057-9.
[38] L. Ye, E. Keogh, Time series shapelets: a novel technique that allows accurate, interpretable and fast classification, Data Min. Knowl. Discov. 22 (1–2) (2011) 149–182, https://doi.org/10.1007/s10618-010-0179-5.
[39] A. Bostrom, A. Bagnall, Binary shapelet transform for multiclass time series classification, 2017, pp. 24–46.
[40] I. Karlsson, P. Papapetrou, H. Boström, Generalized random shapelet forests, Data Min. Knowl. Discov. 30 (5) (2016) 1053–1085, https://doi.org/10.1007/s10618-016-0473-y.
[41] A. Dempster, F. Petitjean, G.I. Webb, ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels, Data Min. Knowl. Discov. 34 (5) (2020) 1454–1495, https://doi.org/10.1007/s10618-020-00701-z.
[42] A. Shifaz, C. Pelletier, F. Petitjean, G.I. Webb, TS-CHIEF: a scalable and accurate forest algorithm for time series classification, Data Min. Knowl. Discov. 34 (3) (2020) 742–775, https://doi.org/10.1007/s10618-020-00679-8.
[43] D.F. Silva, V.M.A.D. Souza, G.E.A.P.A. Batista, Time series classification using compression distance of recurrence plots, in: 2013 IEEE 13th International Conference on Data Mining, 2013, pp. 687–696, https://doi.org/10.1109/ICDM.2013.128.
[44] J.J. Rodriguez, L.I. Kuncheva, C.J. Alonso, Rotation forest: a new classifier ensemble method, IEEE Trans. Pattern Anal. Mach. Intell. 28 (10) (2006) 1619–1630, https://doi.org/10.1109/TPAMI.2006.211.
[45] T. Chen, C. Guestrin, XGBoost: a scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 785–794, https://doi.org/10.1145/2939672.2939785.
[46] B. Lucas, A. Shifaz, C. Pelletier, L. O'Neill, N. Zaidi, B. Goethals, F. Petitjean, G.I. Webb, Proximity forest: an effective and scalable distance-based classifier for time series, Data Min. Knowl. Discov. 33 (3) (2019) 607–635, https://doi.org/10.1007/s10618-019-00617-3.
[47] M. Shokoohi-Yekta, B. Hu, H. Jin, J. Wang, E. Keogh, Generalizing DTW to the multi-dimensional case requires an adaptive approach, Data Min. Knowl. Discov. 31 (1) (2017) 1–31, https://doi.org/10.1007/s10618-016-0455-0.
[48] L.C. Günther, S. Kärcher, T. Bauernhansl, Activity recognition in manual manufacturing: detecting screwing processes from sensor data, Proc. CIRP 81 (2019) 1177–1182, https://doi.org/10.1016/j.procir.2019.03.288.
[49] Q. Li, Y. Gu, N. Wang, Application of random forest classifier by means of a QCM-based E-nose in the identification of chinese liquor flavors, IEEE Sens. J. 17 (6) (2017) 1788–1794, https://doi.org/10.1109/JSEN.2017.2657653.
[50] A. Zafari, A. Khoshkhahtinat, P. Mehta, M.S. Ebrahimi Saadabadi, M. Akyash, N.M. Nasrabadi, Frequency disentangled features in neural image compression, in: 2023 IEEE International Conference on Image Processing (ICIP), 2023, pp. 2815–2819, https://doi.org/10.1109/ICIP49359.2023.10222816.
[51] A. Khoshkhahtinat, A. Zafari, P.M. Mehta, M. Akyash, H. Kashiani, N.M. Nasrabadi, Multi-context dual hyper-prior neural image compression, 2023, https://doi.org/10.48550/ARXIV.2309.10799.
[52] M. Akyash, H. Mohammadzade, H. Behroozi, DTW-merge: a novel data augmentation technique for time series classification, 2021, https://doi.org/10.48550/ARXIV.2103.01119.
[53] Z. Wang, W. Yan, T. Oates, Time series classification from scratch with deep neural networks: a strong baseline (arXiv:1611.06455), ArXiv (2016), http://arxiv.org/abs/1611.06455.
[54] O. Mey, W. Neudeck, A. Schneider, O. Enge-Rosenblatt, Machine learning-based unbalance detection of a rotating shaft using vibration data, in: 2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), 2020, pp. 1610–1617, https://doi.org/10.1109/ETFA46521.2020.9212000.
[55] F. Zhang, J. Yan, P. Fu, J. Wang, R.X. Gao, Ensemble sparse supervised model for bearing fault diagnosis in smart manufacturing, Robot. Comput. Integr. Manuf. 65 (2020) 101920, https://doi.org/10.1016/j.rcim.2019.101920.
[56] K.B. Lee, S. Cheon, C.O. Kim, A convolutional neural network for fault classification and diagnosis in semiconductor manufacturing processes, IEEE Trans. Semicond. Manuf. 30 (2) (2017) 135–142, https://doi.org/10.1109/TSM.2017.2676245.
[57] N.A. Golilarz, A. Addeh, H. Gao, L. Ali, A.M. Roshandeh, H. Mudassir Munir, R.U. Khan, A new automatic method for control chart patterns recognition based on ConvNet and Harris Hawks meta heuristic optimization algorithm, IEEE Access 7 (2019) 149398–149405, https://doi.org/10.1109/ACCESS.2019.2945596.
[58] R. Meyes, N. Hütten, T. Meisen, Transparent and interpretable failure prediction of sensor time series data with convolutional neural networks, Proc. CIRP 104 (2021) 1446–1451, https://doi.org/10.1016/j.procir.2021.11.244.
[59] T. Zan, Z. Liu, H. Wang, M. Wang, X. Gao, Control chart pattern recognition using the convolutional neural network, J. Intell. Manuf. 31 (3) (2020) 703–716, https://doi.org/10.1007/s10845-019-01473-0.
[60] O. Yazdanbakhsh, S. Dick, Multivariate time series classification using dilated convolutional neural network (arXiv:1905.01697), ArXiv (2019), http://arxiv.org/abs/1905.01697.
[61] D. Janka, F. Lenders, S. Wang, A. Cohen, N. Li, Detecting and locating patterns in time series using machine learning, Control Eng. Pract. 93 (2019) 104169, https://doi.org/10.1016/j.conengprac.2019.104169.
[62] H.I. Fawaz, B. Lucas, G. Forestier, C. Pelletier, D.F. Schmidt, J. Weber, G.I. Webb, L. Idoumghar, P.-A. Muller, F. Petitjean, InceptionTime: finding AlexNet for time series classification, Data Min. Knowl. Discov. 34 (6) (2020) 1936–1962, https://doi.org/10.1007/s10618-020-00710-y.
[63] J. Xu, H. Lv, Z. Zhuang, Z. Lu, D. Zou, W. Qin, Control chart pattern recognition method based on improved one-dimensional convolutional neural network, IFAC-PapersOnLine 52 (13) (2019) 1537–1542, https://doi.org/10.1016/j.ifacol.2019.11.418.
[64] C.-L. Liu, W.-H. Hsaio, Y.-C. Tu, Time series classification with multivariate convolutional neural network, IEEE Trans. Indust. Electron. 66 (6) (2019) 4788–4797, https://doi.org/10.1109/TIE.2018.2864702.
[65] W.J. Lee, K. Xia, N.L. Denton, B. Ribeiro, J.W. Sutherland, Development of a speed invariant deep learning model with application to condition monitoring of rotating machinery, J. Intell. Manuf. 32 (2) (2021) 393–406, https://doi.org/10.1007/s10845-020-01578-x.
[66] J. Grezmak, P. Wang, C. Sun, R.X. Gao, Explainable convolutional neural network for gearbox fault diagnosis, Proc. CIRP 80 (2019) 476–481, https://doi.org/10.1016/j.procir.2018.12.008.
[67] G. Martínez-Arellano, G. Terrazas, S. Ratchev, Tool wear classification using time series imaging and deep learning, Int. J. Adv. Manuf. Technol. 104 (9–12) (2019) 3647–3662, https://doi.org/10.1007/s00170-019-04090-6.
[68] K. Lee, M. Chung, S. Kim, D.H. Shin, Damage detection of catenary mooring line based on recurrent neural networks, Ocean Eng. 227 (2021) 108898, https://doi.org/10.1016/j.oceaneng.2021.108898.
[69] S. Mekruksavanich, A. Jitpattanakul, LSTM networks using smartphone data for sensor-based human activity recognition in smart homes, Sensors 21 (5) (2021) 1636, https://doi.org/10.3390/s21051636.
[70] S.C. Bartosik, A. Amirlatifi, Machine learning assisted lithology prediction utilizing toeplitz inverse covariance-based clustering (TICC), Geo-Extreme 2021 (2021) 232–241, https://doi.org/10.1061/9780784483701.023.
[71] H. Liu, R. Ma, D. Li, L. Yan, Z. Ma, Machinery fault diagnosis based on deep learning for time series analysis and knowledge graphs, J. Signal Process. Syst. 93 (12) (2021) 1433–1455, https://doi.org/10.1007/s11265-021-01718-3.
[72] C. Giannetti, A. Essien, Y.O. Pang, A novel deep learning approach for event detection in smart manufacturing, 2019.
[73] X. Zhang, Y. Gao, J. Lin, C.-T. Lu, TapNet: multivariate time series classification with attentional prototypical network, in: Proceedings of the AAAI Conference on Artificial Intelligence 34, 2020, pp. 6845–6852, https://doi.org/10.1609/aaai.v34i04.6165.
[74] S. Fahle, T. Glaser, B. Kuhlenkötter, Investigation of machine learning models for a time series classification task in radial–axial ring rolling, in: G. Daehn, J. Cao, B. Kinsey, E. Tekkaya, A. Vivek, Y. Yoshida (Eds.), Forming the Future, Springer International Publishing, 2021, pp. 589–600, https://doi.org/10.1007/978-3-030-75381-8_48.
[75] F. Karim, S. Majumdar, H. Darabi, S. Harford, Multivariate LSTM-FCNs for time series classification, Neural Netw. 116 (2019) 237–245, https://doi.org/10.1016/j.neunet.2019.04.014.
[76] J.-H. Lee, J. Kang, W. Shim, H.-S. Chung, T.-E. Sung, Pattern detection model using a deep learning algorithm for power data analysis in abnormal conditions, Electronics (Basel) 9 (7) (2020) 1140, https://doi.org/10.3390/electronics9071140.
[77] I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial networks (arXiv:1406.2661), ArXiv (2014), http://arxiv.org/abs/1406.2661.
[78] S.A. Israel, J.H. Goldstein, J.S. Klein, J. Talamonti, F. Tanner, S. Zabel, P.A. Sallee, L. McCoy, Generative adversarial networks for classification, in: 2017 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), 2017, pp. 1–4, https://doi.org/10.1109/AIPR.2017.8457952.
[79] G. Xiang, K. Tian, Spacecraft intelligent fault diagnosis under variable working conditions via wasserstein distance-based deep adversarial transfer learning, Int. J. Aerosp. Eng. 2021 (2021) 1–16, https://doi.org/10.1155/2021/6099818.
[80] H.A. Dau, A. Bagnall, K. Kamgar, C.-C.M. Yeh, Y. Zhu, S. Gharghabi, C.A. Ratanamahatana, E. Keogh, The UCR time series archive, IEEE/CAA J. Autom. Sinica 6 (6) (2019) 1293–1305, https://doi.org/10.1109/JAS.2019.1911747.
[81] A. Bagnall, H.A. Dau, J. Lines, M. Flynn, J. Large, A. Bostrom, P. Southam, E. Keogh, The UEA multivariate time series classification archive (arXiv:1811.00075), ArXiv (2018), http://arxiv.org/abs/1811.00075.
[82] E.G.J. Huang, [Doctoral dissertation], University of Cincinnati, 2013.
[83] A. Agogino, K. Goebel, Milling Data Set, NASA Prognostics Data Repository, 2007.
[84] J.R. Celaya, A. Saxena, S. Saha, K. Goebel, Prognostics of power MOSFETs under thermal stress accelerated aging using data-driven and model-based methodologies, in: Proceedings of the Annual Conference of the Prognostics and Health Management Society, 2011.
[85] B. Saha, K. Goebel, Battery Data Set, NASA Prognostics Data Repository, 2007.
[86] J. Celaya, P. Wysocki, K. Goebel, IGBT Accelerated Aging Data Set, NASA Prognostics Data Repository, 2009.
[87] A. Saxena, K. Goebel, C.C. Larrosa, F.-K. Chang, CFRP Composites Data Set, NASA Prognostics Data Repository, n.d., https://www.nasa.gov/content/prognostics-center-of-excellence-data-set-repository.
[88] R.E.V. Vargas, C.J. Munaro, P.M. Ciarelli, A.G. Medeiros, B.G. do Amaral, D.C. Barrionuevo, J.C.D. de Araújo, J.L. Ribeiro, L.P. Magalhães, A realistic and public dataset with rare undesirable real events in oil wells, J. Petrol. Sci. Eng. 181 (2019) 106223, https://doi.org/10.1016/j.petrol.2019.106223.
[89] B.M. Wise, N.B. Gallagher, S.W. Butler, D.D. White, G.G. Barna, A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process, J. Chemom. 13 (3–4) (1999) 379–396, https://doi.org/10.1002/(SICI)1099-128X(199905/08)13:3/4<379::AID-CEM556>3.0.CO;2-N.
[90] J. Lee, H. Qiu, G. Yu, J. Lin, Bearing Data Set, NASA Prognostics Data Repository, 2007, https://www.nasa.gov/content/prognostics-center-of-excellence-data-set-repository.
[91] X. Jia, B. Huang, J. Feng, H. Cai, J. Lee, Review of PHM data competitions from 2008 to 2017, Methodol. Anal. (2018).
[92] J. Lines, A. Bagnall, Time series classification with ensembles of elastic distance measures, Data Min. Knowl. Discov. 29 (3) (2015) 565–592, https://doi.org/10.1007/s10618-014-0361-2.
[93] T. Górecki, M. Łuczak, Using derivatives in time series classification, Data Min. Knowl. Discov. 26 (2) (2013) 310–331, https://doi.org/10.1007/s10618-012-0251-4.
[94] J. Grabocka, N. Schilling, M. Wistuba, L. Schmidt-Thieme, Learning time-series shapelets, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 392–401, https://doi.org/10.1145/2623330.2623613.
[95] J. Serrà, S. Pascual, A. Karatzoglou, Towards a universal neural network encoder for time series (arXiv:1805.03908), ArXiv (2018), http://arxiv.org/abs/1805.03908.
[96] D.R. Cox, The regression analysis of binary sequences, J. R. Statist. Soc.: Ser. B (Methodol.) 20 (2) (1958) 215–232, https://doi.org/10.1111/j.2517-6161.1958.tb00292.x.
[97] H. Zhang, The optimality of naive bayes, 2004.
[98] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32, https://doi.org/10.1023/A:1010933404324.
[99] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (3) (1995) 273–297, https://doi.org/10.1007/BF00994018.
[100] E. Fix, J.L. Hodges, Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties, USAF School of Aviation Medicine, Randolph Field, Texas, 1951.
[101] A.E. Hoerl, R.W. Kennard, Ridge regression: biased estimation for nonorthogonal problems, Technometrics 12 (1) (1970) 55–67.
[102] J. Demsar, Statistical comparisons of classifiers over multiple data sets, 2006.
[103] A. Benavoli, G. Corani, F. Mangili, Should we really use post-hoc tests based on mean-ranks?, 2016.
[104] P. Liu, Vibration time series classification using parallel computing and XGBoost, in: 2023 IEEE International Conference on Prognostics and Health Management (ICPHM), 2023, pp. 192–199, https://doi.org/10.1109/ICPHM57936.2023.10193920.
[105] A. Shifaz, C. Pelletier, F. Petitjean, G.I. Webb, Elastic similarity and distance measures for multivariate time series (arXiv:2102.10231), ArXiv (2023), http://arxiv.org/abs/2102.10231.