Article
A Parallel Approach to Enhance the Performance of Supervised
Machine Learning Realized in a Multicore Environment
Ashutosh Ghimire and Fathi Amsaad *
Department of Computer Science and Engineering, Wright State University, Dayton, OH 45435, USA;
[email protected]
* Correspondence: [email protected]
Abstract: Machine learning models play a critical role in applications such as image recognition,
natural language processing, and medical diagnosis, where accuracy and efficiency are paramount.
As datasets grow in complexity, so too do the computational demands of classification techniques.
Previous research has achieved high accuracy but required significant computational time. This paper
proposes a parallel architecture for Ensemble Machine Learning Models, harnessing multicore CPUs
to expedite performance. The primary objective is to enhance machine learning efficiency without
compromising accuracy through parallel computing. This study focuses on benchmark ensemble
models including Random Forest, XGBoost, ADABoost, and K Nearest Neighbors. These models are
applied to tasks such as wine quality classification and fraud detection in credit card transactions. The
results demonstrate that, compared to single-core processing, machine learning tasks run 1.7 times
and 3.8 times faster for small and large datasets on quad-core CPUs, respectively.
Keywords: machine learning; parallel computing; accuracy; performance; ensemble model; multicore
processing
Citation: Ghimire, A.; Amsaad, F. A Parallel Approach to Enhance the Performance of Supervised Machine Learning Realized in a Multicore Environment. Mach. Learn. Knowl. Extr. 2024, 6, 1840–1856. https://doi.org/10.3390/make6030090
Academic Editor: Mehmed Kantardzic
Received: 18 June 2024
Revised: 25 July 2024
Accepted: 28 July 2024
Published: 2 August 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Machine learning models are increasingly used in many life-critical applications such as image recognition, natural language processing, and medical diagnosis [1]. These models are applied to learn from more complex datasets, which facilitates more precise findings and enhances the ability to identify patterns and connections among different aspects of information. ML models are particularly valuable in scenarios like medical decision making, where distinguishing between various types of illnesses is crucial. The ML learning process often involves training data on specialized classifiers while also determining which attributes to prioritize for optimal results. Moreover, there are various types of machine learning models, as given in Figure 1, and practitioners have the flexibility to choose from a range of approaches, utilize specific classifiers, and evaluate which method best aligns with their objectives.
One of the most challenging aspects of machine learning, particularly with large datasets, is the speed at which data processing occurs. Lengthy training periods for massive datasets can significantly slow down the training and evaluation processes. Moreover, large datasets may require substantial computational power and memory, further exacerbating processing delays [2]. Addressing these challenges necessitates the use of efficient computing methods and optimizing resource allocation to expedite processing. Techniques such as parallel processing, distributed computing, and cloud computing offer viable solutions to improve processing efficiency.
Several research efforts have explored the potential of parallelization to accelerate supervised machine learning. A significant body of work has focused on data parallelism, where the dataset is partitioned across multiple cores to expedite training and prediction. Techniques such as data partitioning, feature partitioning, and model partitioning have been investigated.
Figure 1. Types of machine learning models: supervised learning (classification and regression), unsupervised learning (clustering), and reinforcement learning.
While existing research has made considerable progress in parallelizing supervised ma-
chine learning, several challenges and limitations still persist. Many proposed approaches
exhibit limited scalability, failing to effectively utilize the full potential of modern multi-
core processors as the problem size grows. Furthermore, achieving optimal performance
often requires intricate algorithm-specific optimizations, hindering the development of
general-purpose parallelization frameworks. Consequently, there remains a need for inno-
vative techniques that can address these challenges and deliver substantial performance
improvements across a wide range of machine learning algorithms and hardware platforms.
This paper proposes a novel parallel processing technique tailored for ensemble clas-
sifiers and KNN algorithms. This research aims to address these limitations and explore
the potential of parallel processing to enhance performance on multicore architectures.
By effectively partitioning data, parallelizing classifier training, and optimizing communi-
cation, the proposed approach seeks to improve computational efficiency and scalability in
multicore environments. The methodology will be evaluated using real-world datasets to
demonstrate its effectiveness in handling large-scale machine learning tasks.
The proposed parallel approach to enhance supervised machine learning boasts sig-
nificant potential across diverse sectors. By dramatically accelerating model training and
inference while preserving accuracy, this methodology is poised to revolutionize indus-
tries such as healthcare, finance, and retail [7]. For instance, in healthcare, it can expedite
drug discovery, enable real-time patient monitoring, and facilitate precision medicine [8].
Within the financial domain, it can bolster fraud detection, risk assessment, and algorithmic
trading. Furthermore, retailers can benefit from optimized inventory management, person-
alized marketing, and improved customer experience. The potential applications extend to
manufacturing, agriculture, and environmental science, where rapid data processing and
analysis are paramount.
The remainder of this paper is organized as follows. Section 2 summarizes the contributions of this work, Section 3 discusses the related work to this research, and Section 4 describes the data and materials used. Section 5 presents the proposed parallel processing methodology in detail, including data partitioning, parallel classifier training, and optimization techniques. Section 6 describes the experimental results, comparing the performance of the proposed method with existing approaches. Finally, Section 7 offers a comprehensive conclusion of the findings, highlighting the contributions of this research and outlining potential avenues for future work.
2. Contributions
Many widely used machine learning models, such as Random Forest, are based on
ensemble modeling, as shown in Figure 2. This paper introduces an enhanced parallel
processing method designed to accelerate the training of ensemble models. While parallel
processing techniques have been proposed previously, our approach demonstrates a sub-
stantial improvement in efficiency, as training with four cores is 3.8 times faster compared to
using a single core. Notably, our method maintains consistent accuracy across different core
configurations, ensuring that performance is not compromised as computational resources
are scaled.
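To put the headline numbers in perspective, a back-of-the-envelope Amdahl's-law reading (not a calculation made by the authors) estimates how much of the training workload must run in parallel to yield a 3.8-times speedup on four cores:

```latex
S(n) = \frac{1}{(1 - f) + f/n}, \qquad
S(4) = 3.8 \;\Rightarrow\; (1 - f) + \frac{f}{4} = \frac{1}{3.8} \approx 0.263
\;\Rightarrow\; f \approx 0.98
```

That is, roughly 98% of the training work parallelizes, corresponding to a parallel efficiency of 3.8/4 = 0.95 on a quad-core CPU.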
Figure 2. Overview of the architecture of the Random Forest classifier machine learning model.
3. Related Works
Numerous advancements in the field of parallel neural network training have been
proposed to enhance the efficiency and speed of training processes. A notable contribution
is a model that achieves a 1.3-times speedup over sequential neural network training
by analyzing sub-images at each level and filtering out non-faces at successive levels,
revealing functional limitations in the face detection technique due to the application of
three neural network cascades [9].
A map-reduce approach was applied in a new ML technique that partitions train-
ing data into smaller groups, distributing it randomly across multiple GPU processes
before each epoch. This method uses sequence bucketing optimization to manage training
sequences based on input lengths, leading to improved training performance by lever-
aging parallel processing [10]. Moreover, the backpropagation algorithm was used to
create a hardware-independent prediction tool, demonstrating quicker response times in
parallelizing ANN algorithms for applications like financial forecasting [11].
Another study focused on enhancing the efficiency of parallelizing the historical data
method across simultaneous computer networks. It discussed the necessity of different
numbers of input neurons for single-layer and multi-layer perceptrons to predict sensor
drift values, which depend on the neural network architecture [12]. This highlights the
from lighting, occlusion, pose, expression, and other factors. This algorithm can be used to
improve the accuracy of facial recognition programs.
A parallel machine called ArMenX, together with a demonstration of how parallel learning can be implemented on it, is proposed in [25]. In this model, the DSPs and transputers that make up the device are studied: neural network calculations are carried out on the DSPs, while the transputers handle the dynamic allocation of the DSPs, and the model is implemented to reuse a pre-existing learning algorithm.
A parallel neural network with a confidence algorithm, as well as a parallel neural
network with a success/failure algorithm to achieve high performance in the test problem
of letter recognition from a set of phonemes, are presented [26]. In this method, data
partitioning allows us to decompose complex problems into smaller, more manageable
problems. This allows each neural network to better adapt to its particular subproblem.
In addressing the complexity of time-dependent systems, a prediction method for neu-
ral networks with parallelism is proposed. By dividing the process into short time intervals
and distributing the load across the network, this method enhances load balancing and
improves approximation and forecasting capabilities, as demonstrated on a sunspot time series forecasting task [27].
A meta-heuristic called Ant Colony Optimization (ACO), inspired by natural processes,
has been parallelized to solve problems like the Traveling Salesman Problem (TSP). This
parallelization aims to expedite the solution process while maintaining algorithm quality,
highlighting the growing interest in parallelizing algorithms and metaheuristics with the
advent of parallel architectures [28].
Finally, recurrent neural network (RNN) training has been optimized through paral-
lelization and a two-stage network structure, significantly accelerating training for tasks
with varying input sequence lengths. The sequence bucketing and multi-GPU paralleliza-
tion techniques enhance training speed and efficiency, as illustrated by the application of
LSTM RNN to online handwriting recognition tasks [29].
The existing literature underscores the critical need for efficient processing of large
datasets using parallel computing techniques. Studies such as those by [23,30] have demon-
strated the effectiveness of parallel algorithms in reducing computation time and enhancing
the scalability of machine learning models. However, the approach in [23] focuses on spe-
cific algorithms like Ant Colony Optimization, and [30] is limited to image-based datasets.
The proposed method builds upon this foundational work by offering a generalized
parallel approach that is versatile across various machine learning models and datasets.
With the increasing popularity of ensemble modeling due to its robustness, stability, and re-
duction in overfitting, our algorithm enhances the parallel processing of ensemble models
by integrating advanced multicore processing capabilities. Our approach not only achieves
significant speedup, but it also maintains high accuracy, addressing the limitations identi-
fied in previous studies. This advancement provides a more holistic solution that can be
widely applied, bridging the gap between theoretical research and practical implementation
in diverse industrial applications.
2. Wine Quality Metrics: This dataset, sometimes referred to as Dataset2 in this paper, was obtained from an online source [32]. In total, the dataset includes 4898 wines with 11 input features, namely total sulfur dioxide, alcohol, pH, density, free sulfur dioxide, sulfates, chlorides, citric acid, residual sugar, fixed acidity, and volatile acidity. It has one output, called Quality, with class ratings ranging from 3 to 9, where a higher rating indicates better quality. The problem is a supervised ML task in which a model is built to predict the quality rating from these attributes; a minimal loading sketch is given below.
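The snippet below sketches one way to load Dataset2 and separate the 11 input features from the Quality label, assuming the semicolon-delimited CSV layout of the UCI distribution; the file name, split ratio, and random seed are illustrative choices rather than settings reported in the paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed file name and delimiter for the UCI white wine quality data (Dataset2).
df = pd.read_csv("winequality-white.csv", sep=";")

# Eleven physicochemical input features; "quality" (ratings 3-9) is the label.
X = df.drop(columns=["quality"])
y = df["quality"]

# Keep the labels separate for training and evaluation via a held-out test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # roughly (3918, 11) and (980, 11) for 4898 wines
```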
Table 1. Table showing summaries of the datasets considered for the proposed model.
5. Proposed Model
5.1. Overview
In this study, a parallel processing technique is presented for quickly and effectively
employing an ensemble classifier and KNN to divide a large amount of data into multiple
categories. The workflow plan for solving the problem using a Random Forest machine
learning model is shown in Figure 2. In order to evaluate the proposed parallel processing
ensemble models, two problems were considered. The overall implementation design of
the proposed algorithm is explained in Figure 3.
Figure 3. Schematic diagram of the proposed parallel processing technique for an ensemble model.
The first aim was to detect fraudulent credit card transactions based on the transaction details and amount. The features were compressed into a lower-dimensional
subspace using dimensionality reduction techniques. The advantage of decreasing the
dimension of the feature space is that less storage space is needed, and the learning
algorithm can operate much more speedily. The second goal was to estimate the quality of
a wine; as such, certain useful attributes were first extracted from the raw dataset. Some of
the attributes were redundant due to their high correlation with each other. The quality of
each wine was considered as the label of Dataset2, while the class of the transaction was
considered for Dataset1. The labels are kept separately for training and evaluation.
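As a rough illustration of the preprocessing described above, the sketch below compresses the transaction features of Dataset1 with PCA and drops one member of each highly correlated attribute pair in Dataset2; the component count and the 0.9 correlation threshold are assumptions made for the example, not values taken from the paper.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def reduce_transaction_features(X_credit: pd.DataFrame, n_components: int = 10) -> np.ndarray:
    """Project the credit card transaction features (Dataset1) onto a
    lower-dimensional subspace so less storage is needed and training runs faster."""
    return PCA(n_components=n_components).fit_transform(X_credit)

def drop_correlated(X_wine: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Remove redundant wine attributes (Dataset2) whose absolute pairwise
    correlation with an earlier column exceeds the threshold."""
    corr = X_wine.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X_wine.drop(columns=redundant)
```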
This study presents a parallel processing technique designed to efficiently utilize
ensemble classifiers such as Random Forest, AdaBoost, XGBoost classifier, and KNN
for large-scale data categorization. Random Forest is an ensemble learning method for
classification, regression, etc., and it operates by constructing a multitude of decision trees
at training time and outputting the class that is the mode of the classes (classification) or
mean prediction (regression) of the individual trees. This ensemble of decision trees can
provide better generalization performance compared to a single decision tree.
The training data are fed into the ensemble classifier, along with the parameters for
parallel processing using n cores. First, the classifier ensemble is divided into groups of decision trees so that the classifier uses all the available cores on the
machine. Once the classifier is trained, it is used to predict the class labels for the test data.
The accuracy of the classifier is then computed, and the execution time for training is noted.
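A minimal sketch of this train-predict-time loop is given below, reusing the train/test split from the earlier snippet. It assumes the scikit-learn implementations and their n_jobs parameter as the mechanism for spreading the decision trees (or, for KNN, the neighbor queries at prediction time) across n cores; the authors' released code may organize the parallelism differently, and AdaBoost is omitted here because its boosting rounds are inherently sequential.

```python
import time
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def run_on_n_cores(model, X_train, y_train, X_test, y_test):
    """Fit the classifier and predict on the test data, recording the
    wall-clock time and the resulting accuracy."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    elapsed = time.perf_counter() - start
    return elapsed, accuracy_score(y_test, y_pred)

# Repeat the experiment for different core counts; n_jobs controls how many
# cores the trees (RF) or neighbor queries (KNN) are spread across.
for cores in (1, 2, 4, 8):
    models = {
        "RF": RandomForestClassifier(n_estimators=100, n_jobs=cores, random_state=42),
        "KNN": KNeighborsClassifier(n_neighbors=5, n_jobs=cores),
    }
    for name, model in models.items():
        t, acc = run_on_n_cores(model, X_train, y_train, X_test, y_test)
        print(f"{name} | cores={cores} | time={t:.2f} s | accuracy={acc:.4f}")
```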
To comprehensively evaluate the models and generalize their performance, three
distinct hardware configurations were selected. The chosen algorithms were executed on
all three devices using both datasets. The subsequent analysis and comparison of these
results are detailed in Section 6, providing insights into how the algorithms performed
across varied hardware setups. The overall implementation design is explained in Figure 3.
Table 2. Table showing the hardware specifications of the devices used for the experiment.
Figure 5. Execution time for Random Forest (left) and XGBoost (right) on various devices.
6.1.1. Generalization Across Various Hardware Configurations
The experiment was extended to different devices with varying hardware configurations, as detailed in Section 5.5. This analysis is crucial for generalizing and evaluating the proposed algorithm across different platforms with varying operating systems and processors.
Figure 5 compares the execution times of Random Forest and XGBoost across the three devices. The graph shows that Device 1 was the fastest for both single-core and multicore processing for both models. Despite this, the pattern of reduction in execution time with the addition of more cores is similar for each model across all devices.
While comparing the accuracy of the four different ensemble models trained with Dataset1 on all three devices, the accuracy of all models was preserved across all devices, indicating the robustness of the models' performance in different hardware environments. In this comparison, the Random Forest, XGBoost, ADABoost, and KNN classifiers achieved accuracies of 99.96%, 99.92%, 99.91%, and 99.84%, respectively.
6.1.2. Speed Improvement Analysis
Table 3 presents the multicore performance improvement speeds of various models trained with Dataset1 on Device1. The Random Forest model demonstrated a notable improvement, scaling almost linearly with the number of cores up to 16, achieving a speedup of 9.05 times. The ADABoost model showed similar trends with a significant speedup, peaking at 7.28 times with 8 cores, but it then slightly decreased with 16 cores. XGBoost and KNNs exhibited more modest improvements, with XGBoost reaching a maximum speedup of 2.25 times and KNNs achieving up to 3.1 times, indicating less efficiency in utilizing multiple cores compared to Random Forest and ADABoost.
Table 3. The multicore performance improvement speed of various models trained with Dataset1
on Device1.
Table 4. The multicore performance improvement speed of various models trained with Dataset1
on Device2.
Table 5. The multicore performance improvement speeds of the various models trained with Dataset1
on Device3.
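For clarity, the "improvement speed" reported in these tables is the usual parallel speedup, which can be paired with parallel efficiency to judge how well the cores are utilized; the efficiency figure below is a derived illustration rather than a quantity reported in the tables.

```latex
S_n = \frac{T_1}{T_n}, \qquad E_n = \frac{S_n}{n}
```

For example, the Random Forest speedup of 9.05 on 16 cores corresponds to an efficiency of 9.05/16 ≈ 0.57, consistent with scaling that is close to, but not exactly, linear.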
Figure 6. Accuracy of the various models' performance with Dataset2 when tested on Device2.
Figure 7 displays the execution times of the same models when trained on Dataset2 and tested on Device2. As the number of cores increased, there was a noticeable decrease in the execution time for all models. Random Forest, which initially had the highest execution time of 1.54 s with a single core, reduced to 0.76 s with 8 cores. Similarly, ADABoost's execution time decreased from 0.60 s to 0.40 s, and XGBoost's from 0.56 s to 0.47 s. KNNs consistently showed the lowest execution times, starting at 0.08 s and reducing slightly to 0.06 s with 8 cores. This reduction in execution time with additional cores demonstrates the efficiency gains achieved through parallel processing.
Figure 7. Execution time for the various models' performance with Dataset2 when tested on Device2.
Table 6 presents the multicore performance improvement speeds of the various models trained with Dataset2 on Device2, with values in seconds. The speedup metrics for Random Forest, XGBoost, ADABoost, and KNNs demonstrate how well these models utilize multiple cores. Random Forest showed a consistent and significant improvement, with a speedup increasing from 1.45 times with 2 cores to 2.02 times with 8 cores. XGBoost exhibited a more modest speedup, peaking at 1.19 times with 8 cores, indicating limited scalability with additional cores. ADABoost showed a variable pattern, achieving a speedup of 1.57 times with 2 cores, slightly decreasing with 4 cores, and stabilizing at 1.5 times with 8 cores. KNNs demonstrated moderate improvement, with speedup values rising from 1.14 times with 2 cores to 1.33 times with 4 and 8 cores. Overall, these results highlight that, while Random Forest and ADABoost benefit significantly from multicore processing, XGBoost and KNNs show more limited gains, underscoring the varying efficiency of parallel processing across different models.
Table 6. The multicore performance improvement speeds of the various models trained with Dataset2
on Device2.
Figure 8. Performance speed for Random Forest (left) and XGBoost (right) on two datasets.
Table 7. Comparison of the performance speed using quad-core processors against previous studies.
6.5. Discussions
The results of the implemented parallel computing algorithm reveal significant im-
provements in execution time while maintaining high accuracy across different machine
learning models. The Random Forest classifier, in particular, showed a noteworthy enhance-
ment in performance compared to other models. The divide-and-conquer strategy intrinsic
to the Random Forest algorithm inherently benefits from parallel processing as each tree
in the forest can be built independently and simultaneously. This modular nature allows
the Random Forest classifier to fully leverage the multiple cores available, leading to a
substantial reduction in execution time without sacrificing accuracy. For instance, the accu-
racy remained consistent at 99.96% across different core counts, while the execution time
improved markedly from 148.97 s with a single core to 38.24 s with four cores. This demon-
strates that the Random Forest classifier is particularly well suited for parallel processing,
providing both efficiency and reliability in handling large datasets.
In contrast, other models such as K-Nearest Neighbors (KNNs) exhibited less signifi-
cant improvements. While the KNNs model did benefit from parallel computing, the nature
of its algorithm—relying heavily on distance calculations for classification—did not par-
allelize as efficiently as the tree-based structure of Random Forest. As a result, the KNNs
model’s execution time showed only a modest decrease from 4.35 s with one core to 1.52 s
with eight cores, with the accuracy remaining stable at 99.84%. This comparison highlights
that the specific characteristics of the algorithm play a crucial role in determining the
extent of improvement achieved through parallel processing. Models like Random Forest,
with their inherently independent and parallelizable tasks, demonstrate more pronounced
performance gains, whereas models with more interdependent calculations, such as KNNs,
show relatively modest improvements.
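The independence argument can be made concrete with a toy experiment: because each tree sees only its own bootstrap sample, the trees can be fitted by separate worker processes and their votes merged afterwards. The joblib-based sketch below is illustrative only; it is not the authors' implementation, the helper names are hypothetical, and it assumes NumPy arrays with integer class labels.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.tree import DecisionTreeClassifier

def fit_one_tree(X, y, seed):
    """Fit a single decision tree on a bootstrap sample of the training data.
    Each call is independent, which is what makes the forest parallelizable."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap resample
    return DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])

def parallel_forest(X, y, n_trees=100, n_cores=4):
    """Build the trees of a simple forest across n_cores worker processes."""
    return Parallel(n_jobs=n_cores)(
        delayed(fit_one_tree)(X, y, seed) for seed in range(n_trees)
    )

def forest_predict(trees, X):
    """Majority vote over the independently trained trees."""
    votes = np.stack([t.predict(X) for t in trees])
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes
    )
```

By contrast, a KNN prediction for one query point needs distances to the whole training set, so its parallelism is limited to splitting the query batch or the distance computations, which is why its gains in Table 6 are smaller.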
6.6. Limitations
Despite the promising results, several limitations need to be acknowledged. First,
the performance gains observed were heavily dependent on the hardware configuration,
particularly the number of available cores and the architecture of the multicore processor.
In environments with fewer cores or less advanced hardware, the speedup may be less
significant. Second, the parallel approach introduced additional complexity in terms of
implementation and debugging, which could pose challenges for practitioners with limited
experience in parallel computing. Additionally, while the method was tested on a diverse
set of machine learning algorithms, certain algorithms or models with inherently sequential
operations may not benefit as much from parallelization. Finally, the overhead associated
with parallel processing, such as inter-thread communication and synchronization, can
diminish the performance improvements if not managed efficiently. Future work should
focus on addressing these limitations by optimizing the parallelization strategies and
exploring adaptive approaches that can dynamically adjust based on the hardware and
dataset characteristics.
7. Conclusions
Across all scenarios, the accuracy of the models remained consistently high, demonstrating robustness to increased core utilization. At the same time, there was a clear inverse relationship between processing time and the number of cores employed. Notably, processing times decreased markedly with greater core utilization,
enabling accelerated training without sacrificing accuracy. Particularly for larger datasets,
multicore processing yielded significant speed enhancements compared to smaller datasets.
This study also generalized its findings across three distinct devices with varied hard-
ware configurations.
Looking ahead, future research could explore the extension of these techniques to
multicore GPUs, an area not covered in this study. Additionally, further investigations
could compare the outcomes with the state-of-the-art parallel computing methodologies
leveraging GPUs for machine learning applications.
Author Contributions: This manuscript benefited from the combined efforts of A.G. and F.A. A.G.
conceived and developed all the components of the research and drafted the initial manuscript. F.A.
provided valuable supervision throughout the process. Both A.G. and F.A. contributed significantly
to refining the writing. All authors have read and agreed to the published version of the manuscript.
Funding: This research is funded by a grant provided by the Air Force Research Lab (AFRL) through
the Assured and Trusted Digital Microelectronics Ecosystem (ADMETE) grant, BAA-FA8650-18-S-
1201, which was awarded to Wright State University, Dayton, Ohio, USA. This project was carried
out under CAGE Number 4B991 and DUNS Number 047814256.
Data Availability Statement: The dataset used in this work can be downloaded from the web
page at https://archive.ics.uci.edu/dataset/186/wine+quality (accessed on 16 June 2024) and
https://www.kaggle.com/datasets/yashpaloswal/fraud-detection-credit-card (accessed on 17 June
2024). The source code is available at the GitHub repository at https://github.com/aashutoshghimire/
parallel-ensemble-model (accessed on 30 July 2024).
Conflicts of Interest: The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
ML Machine Learning
CPU Central Processing Unit
RF Random Forest
ANN Artificial Neural Network
PPT Pattern Parallel Training
MPI Message Passing Interface
DNN Deep Neural Network
GPU Graphical Processing Unit
t-SNE t-Distributed Stochastic Neighbor Embedding
MCPU Million Cycles Per Second
LM Levenberg–Marquardt
ACO Ant Colony Optimization
XG XGBoost Classifier
KNN K-Nearest Neighbors
References
1. Onishi, S.; Nishimura, M.; Fujimura, R.; Hayashi, Y. Why Do Tree Ensemble Approximators Not Outperform the Recursive-Rule
eXtraction Algorithm? Mach. Learn. Knowl. Extr. 2024, 6, 658–678. [CrossRef]
2. Ghimire, A.; Asiri, A.N.; Hildebrand, B.; Amsaad, F. Implementation of secure and privacy-aware ai hardware using distributed
federated learning. In Proceedings of the 2023 IEEE 16th Dallas Circuits and Systems Conference (DCAS), Denton, TX, USA,
14–16 April 2023; pp. 1–6.
3. Dey, S.; Mukherjee, A.; Pal, A.; P, B. Embedded deep inference in practice: Case for model partitioning. In Proceedings of the 1st
Workshop on Machine Learning on Edge in Sensor Systems, New York, NY, USA, 10 November 2019; pp. 25–30.
4. Li, B.; Gao, E.; Yin, J.; Li, X.; Yang, G.; Liu, Q. Research on the Deformation Prediction Method for the Laser Deposition
Manufacturing of Metal Components Based on Feature Partitioning and the Inherent Strain Method. Mathematics 2024, 12, 898.
[CrossRef]
5. Wiggers, W.; Bakker, V.; Kokkeler, A.B.; Smit, G.J. Implementing the conjugate gradient algorithm on multi-core systems. In
Proceedings of the 2007 International Symposium on System-on-Chip, Tampere, Finland, 20–21 November 2007; pp. 1–4.
6. Capra, M.; Bussolino, B.; Marchisio, A.; Shafique, M.; Masera, G.; Martina, M. An updated survey of efficient hardware
architectures for accelerating deep convolutional neural networks. Future Internet 2020, 12, 113. [CrossRef]
7. Chapagain, A.; Ghimire, A.; Joshi, A.; Jaiswal, A. Predicting breast cancer using support vector machine learning algorithm. Int.
Res. J. Innov. Eng. Technol. 2020, 4, 10.
8. Ghimire, A.; Tayara, H.; Xuan, Z.; Chong, K.T. CSatDTA: Prediction of Drug–Target Binding Affinity Using Convolution Model
with Self-Attention. Int. J. Mol. Sci. 2022, 23, 8453. [CrossRef] [PubMed]
9. Turchenko, V.; Paliy, I.; Demchuk, V.; Smal, R.; Legostaev, L. Coarse-Grain Parallelization of Neural Network-Based Face Detection
Method. In Proceedings of the 2007 4th IEEE Workshop on Intelligent Data Acquisition and Advanced Computing Systems:
Technology and Applications, Dortmund, Germany, 6–8 September 2007; pp. 155–158. [CrossRef]
10. Doetsch, P.; Golik, P.; Ney, H. A comprehensive study of batch construction strategies for recurrent neural networks in MXNet.
arXiv 2017, arXiv:1705.02414.
11. Casas, C.A. Parallelization of artificial neural network training algorithms: A financial forecasting application. In Proceedings of
the 2012 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr), New York, NY, USA,
29–30 March 2012; pp. 1–6.
12. Turchenko, V.; Triki, C.; Grandinetti, L.; Sachenko, A. Parallel Algorithm of Enhanced Historical Data Integration Using Neural
Networks. In Proceedings of the 2005 IEEE Intelligent Data Acquisition and Advanced Computing Systems: Technology and
Applications, Sofia, Bulgaria, 5–7 September 2005; pp. 66–73. [CrossRef]
13. Wang, J.; Han, Z. Research on speech emotion recognition technology based on deep and shallow neural network. In Proceedings
of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 3555–3558.
14. Naik, D.S.B.; Kumar, S.D.; Ramakrishna, S.V. Parallel processing of enhanced K-means using OpenMP. In Proceedings of the 2013
IEEE International Conference on Computational Intelligence and Computing Research, Enathi, India, 26–28 December 2013;
pp. 1–4. [CrossRef]
15. Todorov, D.; Zdraveski, V.; Kostoska, M.; Gusev, M. Parallelization of a Neural Network Algorithm for Handwriting Recognition:
Can we Increase the Speed, Keeping the Same Accuracy. In Proceedings of the 2021 44th International Convention on Information,
Communication and Electronic Technology (MIPRO), Opatija, Croatia, 27 September–1 October 2021; pp. 932–937.
16. Sun, S.; Chen, W.; Bian, J.; Liu, X.; Liu, T. Ensemble-compression: A new method for parallel training of deep neural networks. In
Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Cham, Switzerland,
2017; pp. 187–202.
17. Guan, S.U.; Li, S. Parallel Growing and Training of Neural Networks Using Output Parallelism. Trans. Neur. Netw. 2002,
13, 542–550. [CrossRef] [PubMed]
18. Chen, X.; Xiang, S.; Liu, C.L.; Pan, C.H. Vehicle Detection in Satellite Images by Parallel Deep Convolutional Neural Networks.
In Proceedings of the 2013 2nd IAPR Asian Conference on Pattern Recognition, Naha, Japan, 5–8 November 2013; pp. 181–185.
[CrossRef]
19. Farber, P.; Asanovic, K. Parallel neural network training on Multi-Spert. In Proceedings of the 3rd International Conference on
Algorithms and Architectures for Parallel Processing, Melbourne, Australia, 12 December 1997; pp. 659–666. [CrossRef]
20. Suri, N.N.R.; Deodhare, D.; Nagabhushan, P. Parallel Levenberg-Marquardt-Based Neural Network Training on Linux Clusters—A
Case Study. In Proceedings of the ICVGIP, Hyderabad, India, 16–18 December 2002; pp. 1–6.
21. Thulasiram, R.; Rahman, R.; Thulasiraman, P. Neural network training algorithms on parallel architectures for finance applications.
In Proceedings of the 2003 International Conference on Parallel Processing Workshops, Kaohsiung, Taiwan, 6–9 October 2003;
pp. 236–243. [CrossRef]
22. Aggarwal, K. Simulation of artificial neural networks on parallel computer architectures. In Proceedings of the 2010 International
Conference on Educational and Information Technology, Chongqing, China, 17–19 September 2010; Volume 2, pp. V2-255–V2-258.
[CrossRef]
23. Fejzagić, E.; Oputić, A. Performance comparison of sequential and parallel execution of the Ant Colony Optimization algorithm
for solving the traveling salesman problem. In Proceedings of the 2013 36th International Convention on Information and
Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 20–24 May 2013; pp. 1301–1305.
24. Pu, Z.; Wang, K.; Yan, K. Face Key Point Location Method based on Parallel Convolutional Neural Network. In Proceedings of
the 2019 2nd International Conference on Safety Produce Informatization (IICSPI), Chongqing, China, 28–30 November 2019;
pp. 315–318. [CrossRef]
25. Autret, Y.; Thepaut, A.; Ouvradou, G.; Le Drezen, J.; Laisne, J. Parallel learning on the ArMenX machine by defining sub-networks.
In Proceedings of the 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan), Nagoya, Japan, 25–29
October 1993; Volume 1, pp. 915–918. [CrossRef]
26. Lee, B. Parallel neural networks for speech recognition. In Proceedings of the International Conference on Neural Networks
(ICNN’97), Houston, TX, USA, 12 June 1997; Volume 4, pp. 2093–2097. [CrossRef]
27. Dai, Q.; Xu, S.H.; Li, X. Parallel Process Neural Networks and Its Application in the Predication of Sunspot Number Series. In
Proceedings of the 2009 Fifth International Conference on Natural Computation, Tianjian, China, 14–16 August 2009; Volume 1,
pp. 237–241. [CrossRef]
28. Petkovic, D.; Altman, R.; Wong, M.; Vigil, A. Improving the explainability of Random Forest classifier—User centered approach.
In Proceedings of the Biocomputing 2018, Kohala Coast, HI, USA, 3–7 January 2018; pp. 204–215. [CrossRef]
29. Khomenko, V.; Shyshkov, O.; Radyvonenko, O.; Bokhan, K. Accelerating recurrent neural network training using sequence
bucketing and multi-GPU data parallelization. In Proceedings of the 2016 IEEE First International Conference on Data Stream
Mining & Processing (DSMP), Lviv, Ukraine, 23–27 August 2016; pp. 100–103. [CrossRef]
30. Borhade, P.; Deshmukh, R.; Murarka, S.; Agarwal, R. Image Classification using Parallel CPU and GPU Computing. Int. J. Eng.
Adv. Technol. 2020, 9, 839–843. [CrossRef]
31. Oswal, Y.P. Fraud Detection Credit Card. 2023. Available online: https://www.kaggle.com/datasets/yashpaloswal/fraud-
detection-credit-card/data (accessed on 17 June 2024).
32. Dua, D.; Graff, C. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer
Sciences. 2017. Available online: http://archive.ics.uci.edu/ml (accessed on 13 July 2024).
33. Fodor, I.K. A Survey of Dimension Reduction Techniques; Technical Report; Lawrence Livermore National Lab. (LLNL): Livermore,
CA, USA, 2002.
34. Kazemi, F.; Asgarkhani, N.; Jankowski, R. Machine learning-based seismic fragility and seismic vulnerability assessment of
reinforced concrete structures. Soil Dyn. Earthq. Eng. 2023, 166, 107761. [CrossRef]
35. Loh, W.Y. Classification and regression trees. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2011, 1, 14–23. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.