
machine learning & knowledge extraction

Article
A Parallel Approach to Enhance the Performance of Supervised
Machine Learning Realized in a Multicore Environment
Ashutosh Ghimire and Fathi Amsaad *

Department of Computer Science and Engineering, Wright State University, Dayton, OH 45435, USA;
[email protected]
* Correspondence: [email protected]

Abstract: Machine learning models play a critical role in applications such as image recognition,
natural language processing, and medical diagnosis, where accuracy and efficiency are paramount.
As datasets grow in complexity, so too do the computational demands of classification techniques.
Previous research has achieved high accuracy but required significant computational time. This paper
proposes a parallel architecture for Ensemble Machine Learning Models, harnessing multicore CPUs
to expedite performance. The primary objective is to enhance machine learning efficiency without
compromising accuracy through parallel computing. This study focuses on benchmark ensemble
models including Random Forest, XGBoost, ADABoost, and K Nearest Neighbors. These models are
applied to tasks such as wine quality classification and fraud detection in credit card transactions. The
results demonstrate that, compared to single-core processing, machine learning tasks run 1.7 times
and 3.8 times faster for small and large datasets on quad-core CPUs, respectively.

Keywords: machine learning; parallel computing; accuracy; performance; ensemble model; multicore
processing

Citation: Ghimire, A.; Amsaad, F. A Parallel Approach to Enhance the Performance of Supervised Machine Learning Realized in a Multicore Environment. Mach. Learn. Knowl. Extr. 2024, 6, 1840–1856. https://doi.org/10.3390/make6030090

Academic Editor: Mehmed Kantardzic

Received: 18 June 2024; Revised: 25 July 2024; Accepted: 28 July 2024; Published: 2 August 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Machine learning models are increasingly used in many life-critical applications such as image recognition, natural language processing, and medical diagnosis [1]. These models are applied to learn from more complex datasets, which facilitates more precise findings and enhances the ability to identify patterns and connections among different aspects of information. ML models are particularly valuable in scenarios like medical decision making, where distinguishing between various types of illnesses is crucial. The ML learning process often involves training data on specialized classifiers while also determining which attributes to prioritize for optimal results. Moreover, there are various types of machine learning models, as given in Figure 1, and practitioners have the flexibility to choose from a range of approaches, utilize specific classifiers, and evaluate which method best aligns with their objectives.

One of the most challenging aspects of machine learning, particularly with large datasets, is the speed at which data processing occurs. Lengthy training periods for massive datasets can significantly slow down the training and evaluation processes. Moreover, large datasets may require substantial computational power and memory, further exacerbating processing delays [2]. Addressing these challenges necessitates the use of efficient computing methods and optimizing resource allocation to expedite processing. Techniques such as parallel processing, distributed computing, and cloud computing offer viable solutions to improve processing efficiency.

Several research efforts have explored the potential of parallelization to accelerate supervised machine learning. A significant body of work has focused on data parallelism, where the dataset is partitioned across multiple cores to expedite training and prediction. Techniques such as data partitioning, feature partitioning, and model partitioning have been investigated to distribute computational load efficiently [3,4]. Additionally, researchers have delved into algorithm-level parallelism, exploring opportunities to parallelize specific operations within machine learning algorithms. For instance, parallel implementations of matrix operations, gradient calculations, and optimization algorithms have been developed to exploit multicore architectures [5,6].

Figure 1. Hierarchical structure of machine learning algorithms: Machine Learning branches into Supervised Learning (Classification and Regression), Unsupervised Learning (Clustering), and Reinforcement Learning.

While existing research has made considerable progress in parallelizing supervised ma-
chine learning, several challenges and limitations still persist. Many proposed approaches
exhibit limited scalability, failing to effectively utilize the full potential of modern multi-
core processors as the problem size grows. Furthermore, achieving optimal performance
often requires intricate algorithm-specific optimizations, hindering the development of
general-purpose parallelization frameworks. Consequently, there remains a need for inno-
vative techniques that can address these challenges and deliver substantial performance
improvements across a wide range of machine learning algorithms and hardware platforms.
This paper proposes a novel parallel processing technique tailored for ensemble clas-
sifiers and KNN algorithms. This research aims to address these limitations and explore
the potential of parallel processing to enhance performance on multicore architectures.
By effectively partitioning data, parallelizing classifier training, and optimizing communi-
cation, the proposed approach seeks to improve computational efficiency and scalability in
multicore environments. The methodology will be evaluated using real-world datasets to
demonstrate its effectiveness in handling large-scale machine learning tasks.
The proposed parallel approach to enhance supervised machine learning boasts sig-
nificant potential across diverse sectors. By dramatically accelerating model training and
inference while preserving accuracy, this methodology is poised to revolutionize indus-
tries such as healthcare, finance, and retail [7]. For instance, in healthcare, it can expedite
drug discovery, enable real-time patient monitoring, and facilitate precision medicine [8].
Within the financial domain, it can bolster fraud detection, risk assessment, and algorithmic
trading. Furthermore, retailers can benefit from optimized inventory management, person-
alized marketing, and improved customer experience. The potential applications extend to
manufacturing, agriculture, and environmental science, where rapid data processing and
analysis are paramount.
The remainder of this paper is organized as follows. Section 2 summarizes the contributions of this work, and Section 3 discusses the related work. Section 4 describes the data and materials used. Section 5 presents the proposed parallel processing methodology in detail, including data partitioning, parallel classifier training, and optimization techniques. Section 6 describes the experimental results, comparing the performance of the proposed method with existing approaches. Finally, Section 7 offers a comprehensive conclusion of the findings, highlighting the contributions of this research and outlining potential avenues for future work.

2. Contributions
Many widely used machine learning models, such as Random Forest, are based on
ensemble modeling, as shown in Figure 2. This paper introduces an enhanced parallel
processing method designed to accelerate the training of ensemble models. While parallel
processing techniques have been proposed previously, our approach demonstrates a sub-
stantial improvement in efficiency, as training with four cores is 3.8 times faster compared to
using a single core. Notably, our method maintains consistent accuracy across different core
configurations, ensuring that performance is not compromised as computational resources
are scaled.

Figure 2. Overview of the architecture of the Random Forest classifier machine learning model.

The major contributions of this research include the following:


1. Enhanced Parallel Processing: We present an improved parallel processing technique
that effectively leverages multicore environments to boost training speed.
2. Efficient Workload Distribution: Our method optimally distributes computational
tasks, significantly reducing processing time while maintaining model accuracy.
3. Comprehensive Validation: The proposed approach is rigorously validated through
experiments with diverse datasets and hardware configurations, demonstrating its
robustness and practical applicability.
4. Advancement in Parallel Machine Learning: This work advances the field by pro-
viding a practical solution for managing large-scale datasets and complex models,
thereby contributing to the state-of-the-art developments in parallel machine learning.

3. Related Works
Numerous advancements in the field of parallel neural network training have been
proposed to enhance the efficiency and speed of training processes. A notable contribution
is a model that achieves a 1.3 times speedup over sequential neural network training by analyzing sub-images at each level and filtering out non-faces at successive levels, while revealing functional limitations in the face detection technique due to the application of three neural network cascades [9].
A map-reduce approach was applied in a new ML technique that partitions train-
ing data into smaller groups, distributing it randomly across multiple GPU processes
before each epoch. This method uses sequence bucketing optimization to manage training
sequences based on input lengths, leading to improved training performance by lever-
aging parallel processing [10]. Moreover, the backpropagation algorithm was used to
create a hardware-independent prediction tool, demonstrating quicker response times in
parallelizing ANN algorithms for applications like financial forecasting [11].
Another study focused on enhancing the efficiency of parallelizing the historical data
method across simultaneous computer networks. It discussed the necessity of different
numbers of input neurons for single-layer and multi-layer perceptrons to predict sensor
drift values, which depend on the neural network architecture [12]. This highlights the
challenges and strategies involved in implementing complex parallel models on single-processor systems.
The complexity and inefficiency of implementing numerous parallel models on single-
processor systems have been addressed by a novel parallel ML approach. This method
simplifies complex ML models using a parallel processing framework, and it includes
implementations like Systolic array, Warp, MasPar, and Connection Machine in parallel
computing systems [13]. Parallel neural network training processes are significantly influ-
enced by the extensive usage of training data, especially in voice recognition. The division
of batch sizes among multiple training processors is crucial to achieve outcomes compara-
ble to single-processor training. However, this division can slow down the full training
process due to the overhead associated with parallelism [13].
Iterative K-means clustering, typically unsuitable for large data volumes due to longer
completion times, was adapted for parallel processing. This approach achieves notable
performance improvements, as evidenced by an accuracy of 74.76 and an execution time
of 56 s. Additionally, exemplar parallelism in backpropagation neural network training
shows minimal overhead, significantly enhancing the speed and communication between
processes [14,15].
The ensemble–compression approach aggregates local model results through an en-
semble, aligning their outputs instead of their variables. This method, though it increases
model size, demonstrates better performance in modular networks by reducing test in-
accuracy significantly [16,17]. The use of deep learning convolutional neural networks
(DNNs) has also been proposed to improve face recognition and feature point calibration
techniques, despite the considerable time required for training and testing [18].
A parallel implementation of the backpropagation algorithm and the use of the LM
nonlinear optimization algorithm have shown promise in training neural networks ef-
ficiently across multiple processors. These methods demonstrate significant speedup
capabilities and are suited for handling real-world, high-dimensional, large-scale problems
due to the parallelizable nature of the Jacobian computation [19,20].
Neural network training for applications such as price forecasting in finance has been
improved through parallel and multithreaded backpropagation algorithms, outperforming
conventional autoregression models in accuracy. The use of OpenMP and MPI results
further illustrates the effectiveness of training set parallelism [21].
Various general-purpose hardware techniques and parallel architecture models have
been summarized in the literature, highlighting the importance of cost-effective approaches
to parallel ML implementation [22]. Moreover, the application of Ant Colony Optimiza-
tion (ACO) for solving the Traveling Salesman Problem (TSP) showcases the benefits of
parallelizing metaheuristics to speed up solution processes while maintaining quality
standards [23].
A neural network training method with applications in finance, such as price forecasting, is presented in [21]. The model implements four distinct parallel and multithreaded backpropagation neural network algorithms. The authors analyzed the performance of these algorithms and compared the outcomes to those of a conventional autoregression model; comparing the OpenMP and MPI results, training-set parallelism was found to outperform all other variants considered.
A summary of the various studies conducted in this area is presented in [22], which describes various general-purpose hardware techniques, surveys parallel architecture models and topologies, and concludes with a cost-based comparison of the techniques.
Deep learning convolutional neural network (DNN)-based models have been pro-
posed to enhance the understanding of face recognition and feature point calibration
techniques [24]. In addition, it has been proposed to increase the training example size and
develop a cascade structure algorithm. The proposed algorithm is resilient to interference
from lighting, occlusion, pose, expression, and other factors. This algorithm can be used to
improve the accuracy of facial recognition programs.
A parallel machine, called ArMenX, and a demonstration of how parallel learning can be implemented on it are proposed in [25]. The device combines DSPs and transputers: neural network calculations are carried out on the DSPs, while the transputers handle dynamic allocation of the DSPs, and the model is implemented using a pre-existing learning algorithm.
A parallel neural network with a confidence algorithm, as well as a parallel neural
network with a success/failure algorithm to achieve high performance in the test problem
of letter recognition from a set of phonemes, are presented [26]. In this method, data
partitioning allows us to decompose complex problems into smaller, more manageable
problems. This allows each neural network to better adapt to its particular subproblem.
In addressing the complexity of time-dependent systems, a prediction method for neu-
ral networks with parallelism is proposed. By dividing the process into short time intervals
and distributing the load across the network, this method enhances stress balancing and
improves approximation and forecasting capabilities, as demonstrated in a sunspot time
series forecasting [27].
A meta-heuristic called Ant Colony Optimization (ACO), inspired by natural processes,
has been parallelized to solve problems like the Traveling Salesman Problem (TSP). This
parallelization aims to expedite the solution process while maintaining algorithm quality,
highlighting the growing interest in parallelizing algorithms and metaheuristics with the
advent of parallel architectures [28].
Finally, recurrent neural network (RNN) training has been optimized through paral-
lelization and a two-stage network structure, significantly accelerating training for tasks
with varying input sequence lengths. The sequence bucketing and multi-GPU paralleliza-
tion techniques enhance training speed and efficiency, as illustrated by the application of
LSTM RNN to online handwriting recognition tasks [29].
The existing literature underscores the critical need for efficient processing of large
datasets using parallel computing techniques. Studies such as those by [23,30] have demon-
strated the effectiveness of parallel algorithms in reducing computation time and enhancing
the scalability of machine learning models. However, the approach in [23] focuses on spe-
cific algorithms like Ant Colony Optimization, and [30] is limited to image-based datasets.
The proposed method builds upon this foundational work by offering a generalized
parallel approach that is versatile across various machine learning models and datasets.
With the increasing popularity of ensemble modeling due to its robustness, stability, and re-
duction in overfitting, our algorithm enhances the parallel processing of ensemble models
by integrating advanced multicore processing capabilities. Our approach not only achieves
significant speedup, but it also maintains high accuracy, addressing the limitations identi-
fied in previous studies. This advancement provides a more holistic solution that can be
widely applied, bridging the gap between theoretical research and practical implementation
in diverse industrial applications.

4. Data and Materials


This research discusses the performance of machine learning (ML) classification when
combined with distributed computing and CPU cores. For this experiment, two kinds of
datasets of different sizes were used for different problems in order to observe the pattern
of the proposed parallel algorithm for ensemble models with respect to size. The overall
summary of the datasets is given in Table 1. The datasets are as follows:
1. Credit Card Transactions: This dataset, referred to as Dataset1 in this paper, was obtained from an online source [31]. In total, the dataset has 284,807 transaction instances with 30 input features, such as time, transaction amount, and 28 anonymized components. It has one output label called Class, the target variable, where 1 indicates a fraudulent transaction and 0 indicates a non-fraudulent transaction.
2. Wine Quality Metrics: This dataset, referred to as Dataset2 in this paper, was obtained from an online source [32]. In total, the dataset includes 4898 wines with 11 input features, namely total sulfur dioxide, alcohol, pH, density, free sulfur dioxide, sulfates, chlorides, citric acid, residual sugar, fixed acidity, and volatile acidity. It has one output called Quality, with class ratings ranging from 3 to 9, where a higher rating indicates better quality. Both problems are supervised ML classification tasks, in which a model is built from labeled data to predict the output class for new instances.
Table 1. Summary of the datasets considered for the proposed model.

Description Dataset1 Dataset2


Number of samples 284,807 4898
Number of features 30 11
Number of output classes 2 7

5. Proposed Model
5.1. Overview
In this study, a parallel processing technique is presented for quickly and effectively
employing an ensemble classifier and KNN to divide a large amount of data into multiple
categories. The workflow plan for solving the problem using a Random Forest machine
learning model is shown in Figure 2. In order to evaluate the proposed parallel processing
ensemble models, two problems were considered. The overall implementation design of
the proposed algorithm is explained in Figure 3.

Figure 3. Schematic diagram of the proposed parallel processing technique for an ensemble model: the raw data are processed and sampled into subsets, each subset is used to train a decision tree on a separate core, and the trained trees are combined by voting into the final ensemble model, which is evaluated on the test data and used to predict labels for new data.

At first, the aim was to detect the fraud transaction of a credit card based on the
transaction details and amount. The features were compressed into a lower-dimensional
subspace using dimensionality reduction techniques. The advantage of decreasing the
dimension of the feature space is that less storage space is needed, and the learning
algorithm can operate much more speedily. The second goal was to estimate the quality of
a wine; as such, certain useful attributes were first extracted from the raw dataset. Some of
the attributes were redundant due to their high correlation with each other. The quality of
each wine was considered as the label of Dataset2, while the class of the transaction was
considered for Dataset1. The labels are kept separately for training and evaluation.
This study presents a parallel processing technique designed to efficiently utilize
ensemble classifiers such as Random Forest, AdaBoost, XGBoost classifier, and KNN
for large-scale data categorization. Random Forest is an ensemble learning method for
classification, regression, etc., and it operates by constructing a multitude of decision trees
at training time and outputting the class that is the mode of the classes (classification) or
Mach. Learn. Knowl. Extr. 2024, 6 1846

mean prediction (regression) of the individual trees. This ensemble of decision trees can
provide better generalization performance compared to a single decision tree.
The training data are fed into the ensemble classifier, along with the parameters for
parallel processing using n cores. At first, the classifier ensemble is divided into groups of decision trees so that the classifier can use all the available cores on the
machine. Once the classifier is trained, it is used to predict the class labels for the test data.
The accuracy of the classifier is then computed, and the execution time for training is noted.
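To make this step concrete, the following minimal Python sketch (illustrative only, not the authors' code) shows how an ensemble classifier can be trained on a configurable number of cores and timed, assuming scikit-learn is available; the synthetic data and the n_cores value are placeholders:

import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for Dataset1/Dataset2.
X, y = make_classification(n_samples=50_000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

n_cores = 4  # number of CPU cores assigned to training
clf = RandomForestClassifier(n_estimators=100, n_jobs=n_cores, random_state=0)

start = time.perf_counter()
clf.fit(X_train, y_train)  # the trees are built in parallel across n_cores workers
elapsed = time.perf_counter() - start

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"cores={n_cores}  training time={elapsed:.2f} s  accuracy={accuracy:.4f}")

Repeating the run with n_cores set to 1, 2, 4, 8, and 16 reproduces the kind of core-count sweep reported in Section 6.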
To comprehensively evaluate the models and generalize their performance, three
distinct hardware configurations were selected. The chosen algorithms were executed on
all three devices using both datasets. The subsequent analysis and comparison of these
results are detailed in Section 6, providing insights into how the algorithms performed
across varied hardware setups. The overall implementation design is explained in Figure 3.

5.2. Data Preprocessing


This subsection details the preprocessing steps applied to each dataset described
in Section 4. For Dataset1, initial features obtained from credit card transactions were
reduced to 28 components using Principal Component Analysis (PCA) [33]. To ensure user
confidentiality, the original features were transformed into a new set of orthogonal features
through PCA, resulting in anonymized columns. As the open-source dataset already had
PCA applied, it was unnecessary to repeat this preprocessing in the experiments.
For both of the datasets, standardization was applied to ensure that each feature
contributed equally to the analysis. Standardization involves scaling the data such that it
has a mean of zero and a standard deviation of one. This process helps in making features
comparable by centering them around the mean and scaling them based on standard
deviation. It is particularly beneficial for models that are sensitive to the scale of data, such
as KNN, Random Forest, XGBoost, and AdaBoost, as it can improve model performance
and consistency. Additionally, standardization can enhance the convergence speed of
gradient-based algorithms by making the optimization process more stable.
The final preprocessing step was to split the dataset into training and testing sets,
with 80% of the data used for training and 20% for testing, which is a common approach in
many studies [34]. Splitting data into training and testing sets is crucial in machine learning
to ensure unbiased evaluation, to detect overfitting, and to assess the generalization ability
of a model to new and unseen data.
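A minimal sketch of this preprocessing pipeline, assuming pandas and scikit-learn and a hypothetical file name for Dataset2, could look as follows:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical file name for Dataset2; the UCI wine-quality CSV uses ';' as separator.
df = pd.read_csv("winequality-white.csv", sep=";")
X = df.drop(columns=["quality"])
y = df["quality"]

# 80/20 train/test split, as described above.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardization: zero mean and unit variance, fitted on the training set only
# so that no information from the test set leaks into the scaler.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)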

5.3. Multicore Processing for Ensemble Models


The ensemble learning method constructs multiple classifiers, such as decision trees in the case of RF, and combines their predictions. The parallelization of the training process can
significantly improve the computational efficiency, especially for large datasets. The core
idea is to train individual decision trees in parallel, leveraging the inherent independence
between the trees. Figure 3 shows the training of an ensemble classifier with a parallel
processing technique. The training of the classifiers is parallelized through the following
three major steps.

5.3.1. Data Partitioning


The training data are partitioned into multiple subsets by randomly sampling in-
stances, which is called bootstrap sampling. The technique was used in the ensemble
model to create multiple subsets of the training data for building an individual classifier. It
involves randomly sampling instances from the original training dataset with replacement.
This means that each subset can contain duplicate instances, and some instances from the
original dataset may not be included in a particular subset. By creating these bootstrap
samples, the classifier in the ensemble model (EM) is trained on a slightly different subset
of the data, introducing diversity among the trees. This diversity helps to reduce the overall
variance and improve the generalization performance of the EM. The process of bootstrap
sampling works as given in Algorithm 1.
Mach. Learn. Knowl. Extr. 2024, 6 1847

Algorithm 1: Bootstrap Sampling Algorithm for Generating Subsets from the Original Dataset
Data: Original training dataset T with size N and number of subsets M
Result: List of bootstrap subsets S
1 Initialize an empty list of subsets S;
2 for i ← 1 to M do
3     Initialize an empty subset subset_i;
4     while size of subset_i < N do
5         Randomly select an instance from T with replacement;
6         Add the instance to subset_i;
7     Add subset_i to S;
8 return S;
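A direct Python transcription of Algorithm 1 is sketched below (using NumPy for sampling with replacement; function and variable names are illustrative):

import numpy as np

def bootstrap_subsets(T, M, seed=None):
    """Generate M bootstrap subsets of dataset T, each of size N = len(T),
    by sampling with replacement, as in Algorithm 1."""
    rng = np.random.default_rng(seed)
    T = np.asarray(T)
    N = len(T)
    subsets = []
    for _ in range(M):
        idx = rng.integers(0, N, size=N)  # N draws with replacement
        subsets.append(T[idx])            # duplicates expected; some instances omitted
    return subsets

# Example: four bootstrap subsets of a toy dataset of ten instances.
S = bootstrap_subsets(np.arange(10), M=4, seed=0)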

5.3.2. Parallel Tree Construction


Each subset of the training data is assigned to a separate processor or computing node.
These processors or nodes independently build a decision tree using the assigned subset
of the training data. During the tree construction process, a random subset of features
is selected at each node split, further introducing diversity among the trees. The CART (Classification and Regression Trees) algorithm, which is used for tree construction,
is executed in parallel on each processor or node [35]. Synchronization mechanisms
are required to ensure that the parallel tree construction processes do not interfere with
each other and that the final results are correctly aggregated. By parallelizing the tree
construction process, the computational load is distributed across multiple processors or
nodes, leading to significant performance improvements, especially for large datasets.
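One possible realization of this step is sketched below, assuming joblib and scikit-learn decision trees are available; scikit-learn's Random Forest parallelizes tree construction internally in a similar spirit, so this is an illustration rather than a replacement for it:

import numpy as np
from joblib import Parallel, delayed
from sklearn.tree import DecisionTreeClassifier

def fit_tree(X, y, idx, seed):
    """Fit one CART tree on a bootstrap subset; a random feature subset is tried at each split."""
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=seed)
    tree.fit(X[idx], y[idx])
    return tree

def parallel_forest(X, y, n_trees=100, n_cores=4, seed=0):
    """Build n_trees CART trees in parallel on n_cores workers (X and y are NumPy arrays)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    boot_idx = [rng.integers(0, n, size=n) for _ in range(n_trees)]  # bootstrap samples
    return Parallel(n_jobs=n_cores)(
        delayed(fit_tree)(X, y, idx, seed + i) for i, idx in enumerate(boot_idx)
    )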

5.3.3. Ensemble Combination


Once all classifiers are trained, they are combined to form the final model. The combi-
nation process is typically straightforward and does not require parallelization as it involves
a simple aggregation of the predictions of individual classifiers. The combination process
does not require significant computational resources as it involves simple operations like
counting votes or calculating averages.
In a Random Forest classifier, the process of making a final prediction involves multiple
steps. Firstly, for a given test instance, each decision tree in the Random Forest makes a
prediction of the classes for the classification task. The final prediction is determined by
a majority voting among the individual tree predictions. The class with the most votes is
assigned as the final prediction.
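Continuing the illustrative parallel_forest sketch above, the aggregation step reduces to a majority vote over the per-tree predictions (scipy.stats.mode is assumed to be available):

import numpy as np
from scipy import stats

def forest_predict(trees, X_test):
    """Combine the ensemble by majority voting over the per-tree class predictions."""
    votes = np.stack([tree.predict(X_test) for tree in trees])  # shape: (n_trees, n_samples)
    majority, _ = stats.mode(votes, axis=0, keepdims=False)
    return majority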

5.4. Parallelization of the K-Nearest Neighbors Algorithm


The parallelization of KNNs follows a process similar to the multicore processing of ensemble models. Firstly, the training data are partitioned into multiple subsets, as explained in Section 5.3.1. For parallel KNNs training, each subset of the training data is assigned to a separate processor/node, and a separate KNNs model is built on each processor/node using only its assigned subset.
When a new data point requires classification, it is distributed to all processors/nodes. Each processor/node computes the distances between the new point and the instances in its subset, finds the k-nearest neighbors within that subset, and makes a class prediction based on those neighbors. To obtain the final prediction, the predictions from all processors/nodes are combined using one of two techniques. The first technique is majority voting, where the class predicted by the highest number of processors/nodes is chosen as the final prediction; this is a simple and effective approach that does not consider the relative distances of the neighbors. The second technique is distance-weighted voting, where the class predictions are weighted by the inverse of the distances from the neighbors, and the class with the highest weighted sum is chosen as the final prediction; this approach takes the relative distances of the neighbors into account, potentially improving the accuracy of the final prediction.
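A compact sketch of this scheme (illustrative helper names, NumPy and joblib assumed) is given below; each worker holds one partition, returns its local k nearest neighbors, and the partial results are merged with distance-weighted voting:

import numpy as np
from joblib import Parallel, delayed

def local_neighbors(X_part, y_part, query, k):
    """Return the k nearest (distance, label) pairs found within one data partition."""
    d = np.linalg.norm(X_part - query, axis=1)
    idx = np.argsort(d)[:k]
    return list(zip(d[idx], y_part[idx]))

def parallel_knn_predict(partitions, query, k=5, n_cores=4):
    """partitions: list of (X_part, y_part) NumPy array pairs, one per processor/node."""
    partial = Parallel(n_jobs=n_cores)(
        delayed(local_neighbors)(Xp, yp, query, k) for Xp, yp in partitions
    )
    # Merge the local candidates and keep the k globally nearest neighbors.
    candidates = sorted((p for part in partial for p in part), key=lambda p: p[0])[:k]
    # Distance-weighted voting: each label is weighted by the inverse of its distance
    # (majority voting would simply count the candidate labels instead).
    weights = {}
    for dist, label in candidates:
        weights[label] = weights.get(label, 0.0) + 1.0 / (dist + 1e-9)
    return max(weights, key=weights.get)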

5.5. Hardware Specifications


The experiments were conducted on three multicore environments with the specifica-
tions given in Table 2.

Table 2. Hardware specifications of the devices used for the experiment.

Specifications          Device1                         Device2                              Device3
Device Name             Dell OptiPlex Tower Plus 7010   HP Envy 13 x360                      MacBook Air M1 2020
Processor               Intel i7-13700K                 AMD Ryzen 7 4700U                    Apple M1
Number of cores         16 cores                        8 cores                              8 cores
Clock Speed             3.40 GHz                        2.0 GHz (Base)/4.1 GHz (Max Boost)   3.2 GHz
Memory                  32 GB                           16 GB                                8 GB
Operating System        Windows 11                      Windows 10                           macOS Sonoma 14.5
Programming Language    Python 3                        Python 3                             Python 3

6. Results and Discussion


This study investigated the impact of increasing the number of cores on the accu-
racy and speed of machine learning algorithms with a parallel processing architecture.
The results demonstrate that, while the accuracy remains relatively constant, the execution time decreases as the number of cores increases. The results are analyzed in the
following subsections.

6.1. Performance with Dataset1


For the larger dataset in the experiment, Dataset1, the classification models achieved
high accuracy, with the highest being 99.96% for Device1. The results show that the accuracy
obtained by Random Forest, ADABoost, and XGBoost were 99.96%, 99.91%, and 99.92%,
respectively. Despite slight differences in accuracy among the models, the accuracy was
maintained regardless of the number of cores used.
Figure 4 also shows the execution times for the various models tested on Device1 while
training with Dataset1. For better visualization, Figure 4 was divided into two parts based
on the range of execution times. Models like XGBoost and KNN exhibited lower training
times compared to Random Forest and ADABoost. The slowest model without parallel
processing was Random Forest, with an execution time of 148.97 s, while the slowest model
with 16-core processing was ADABoost, with an execution time of 20.08 s. The fastest
model for single-core training was KNNs at 4.35 s, followed by XGBoost at 30.80 s; with 16 cores, XGBoost took 24.55 s for training while KNNs took 1.40 s.
Figure 4. Execution time for the various models' performance with Dataset1 tested on Device1.

6.1.1. Generalization across Various Hardware Configurations

The experiment was extended to different devices with varying hardware configurations, as detailed in Section 5.5. This analysis was crucial for generalizing and evaluating the proposed algorithm across different platforms with varying operating systems and processors.
Figure 5 compares the performance of the Random Forest and XGBoost classifiers across three devices. The graph shows that Device1 was the fastest for both single-core and multicore processing for both models. Despite this, the pattern of reduction in the execution time with the addition of more cores was similar for each model across all devices.

Figure 5. Execution time for Random Forest (left) and XGBoost (right) on various devices.

While comparing the accuracy of the four different ensemble models trained with Dataset1 on all three devices, the accuracy of all models was preserved across all devices, indicating the robustness of the models' performance in different hardware environments. In this comparison, the Random Forest, XGBoost, ADABoost, and KNN classifiers achieved an accuracy of 99.96%, 99.92%, 99.91%, and 99.84%, respectively.

6.1.2. Speed Improvement Analysis

Table 3 presents the multicore performance improvement speeds of various models trained with Dataset1 on Device1. The Random Forest model demonstrated a notable improvement, scaling almost linearly with the number of cores up to 16, achieving a speedup of 9.05 times. The ADABoost model showed similar trends with a significant speedup, peaking at 7.28 times with 8 cores, but it then slightly decreased with 16 cores. XGBoost and KNNs exhibited more modest improvements, with XGBoost reaching a maximum speedup of 2.25 times and KNNs achieving up to 3.1 times, indicating less efficiency in utilizing multiple cores compared to Random Forest and ADABoost.

Table 3. The multicore performance improvement speed of various models trained with Dataset1
on Device1.

Core Random Forest XGBoost ADABoost KNNs


1 1 1 1 1
2 1.98 1.32 2.2 1.61
4 3.89 1.55 4.13 2.44
8 6.6 2.14 7.28 2.95
16 9.05 2.25 7.06 3.1

Table 4 illustrates the performance improvements on Device2. Random Forest contin-


ued to show significant speedup, reaching up to 4.01 times with 8 cores, although this was
less than the improvement seen on Device1. ADABoost also performed well, achieving a
speedup of 4.39 times with 8 cores. XGBoost and KNNs displayed more limited improve-
ments, with XGBoost peaking at 1.87 times and KNNs at 2.12 times. The reduced speedup
compared to Device1 suggests that Device2 may have less efficient multicore processing
capabilities or other hardware limitations affecting performance.

Table 4. The multicore performance improvement speed of various models trained with Dataset1
on Device2.

Core Random Forest XGBoost ADABoost KNN


1 1 1 1 1
2 1.72 1.55 1.86 1.22
4 2.53 1.85 3.02 1.94
8 4.01 1.87 4.39 2.12

Table 5 provides the multicore performance improvement speeds on Device3. Random


Forest showed a maximum speedup of 4.52 times with 8 cores, while ADABoost reached
a speedup of 4.4 times, indicating strong parallel processing capabilities on this device.
XGBoost and KNNs again demonstrated more modest improvements, with XGBoost
peaking at 2.13 times and KNNs at 2.15 times. Overall, Device3 appeared to perform better
than Device2 but not as well as Device1 in terms of multicore speedup, highlighting the
variability in multicore processing efficiency across different hardware configurations.

Table 5. The multicore performance improvement speeds of the various models trained with Dataset1
on Device3.

Core Random Forest XGBoost ADABoost KNNs


1 1 1 1 1
2 1.94 1.63 1.91 1.35
4 2.95 1.85 3.7 2.01
8 4.52 2.13 4.4 2.15

6.2. Performance with Dataset2


Figure 6 illustrates the accuracy of the various machine learning models—Random
Forest, XGBoost, ADABoost, and KNNs—when trained on Dataset2 and tested on Device2.
The accuracy metrics remained constant across all core configurations, indicating that in-
creasing the number of cores does not impact the accuracy of these models. Random Forest
and KNNs exhibited stable accuracy rates of 70.71% and 81.0%, respectively. ADABoost
achieved the highest accuracy at 86.0%, while XGBoost maintained an accuracy of 67.96%.
This consistency suggests that the classification performance of the ensemble models is
robust to changes in computational resources when parallelized by the proposed algorithm.
Figure 6. Accuracy of the various models' performance with Dataset2 when tested on Device2.

Figure 7 displays the execution times of the same models when trained on Dataset2 and tested on Device2. As the number of cores increased, there was a noticeable decrease in the execution time for all models. Random Forest, which initially had the highest execution time of 1.54 s with a single core, reduced to 0.76 s with 8 cores. Similarly, ADABoost's execution time decreased from 0.60 s to 0.40 s, and XGBoost's from 0.56 s to 0.47 s. KNNs consistently showed the lowest execution times, starting at 0.08 s and reducing slightly to 0.06 s with 8 cores. This reduction in execution time with additional cores demonstrates the efficiency gains achieved through parallel processing.

Figure 7. Execution time for the various models' performance with Dataset2 when tested on Device2.

Table 6 presents the multicore performance improvement speeds of various models trained with Dataset2 on Device2. The speedup values for Random Forest, XGBoost, ADABoost, and KNNs demonstrate how well these models utilize multiple cores. Random Forest showed a consistent and significant improvement, with a speedup increasing from 1.45 times with 2 cores to 2.02 times with 8 cores. XGBoost exhibited a more modest speedup, peaking at 1.19 times with 8 cores, indicating limited scalability with additional cores. ADABoost showed a variable pattern, achieving a speedup of 1.57 times with 2 cores, slightly decreasing with 4 cores, and stabilizing at 1.5 times with 8 cores. KNNs demonstrated moderate improvement, with speedup values rising from 1.14 times with 2 cores to 1.33 times with 4 and 8 cores. Overall, these results highlight that, while Random Forest and ADABoost benefit significantly from multicore processing, XGBoost and KNNs show more limited gains, underscoring the varying efficiency of parallel processing across different models.

Table 6. The multicore performance improvement speeds of the various models trained with Dataset2
on Device2.

Core Random Forest XGBoost ADABoost KNNs


1 1 1 1 1
2 1.45 1.07 1.57 1.14
4 1.75 1.09 1.42 1.33
8 2.02 1.19 1.5 1.33

6.3. Comparison of the Speed Improvements between Datasets


Figure 8 illustrates the performance speed of the Random Forest and XGBoost models
on the two datasets, comparing their execution times across different numbers of cores.
For Random Forest, Dataset1 demonstrated a substantial decrease in execution time with
increasing cores, achieving the most significant speedup by reaching 4.01 times with 8 cores.
Dataset2 also showed improved performance with more cores, though to a lesser extent,
peaking at 2.02 times with 8 cores. In contrast, XGBoost’s execution times exhibited modest
improvements. For Dataset1, the speedup reached 1.87 times with 8 cores, while for
Dataset2, the performance gains were minimal, with a peak speedup of only 1.09 times
regardless of the number of cores. This comparison highlights that, while Random Forest
benefited significantly from parallel processing on both datasets, XGBoost’s performance
improvements were more restrained, particularly for Dataset2.

Figure 8. Performance speed for Random Forest (left) and XGBoost (right) on two datasets.

6.4. Comparison of the Results to the Existing Works


Table 7 presents a comparison of the proposed techniques with existing works for vari-
ous problems. For the Traveling Salesman Problem, the proposed technique of parallelizing
the Ant Colony Optimization (ACO) algorithm resulted in a 1.007 times faster execution.
In the case of handwriting recognition using the MNIST digits dataset, the proposed back-
propagation algorithm with an exemplar parallelism of neural networks (NNs) achieved a
2.4 times speedup, albeit with a 6.7 percent decrease in accuracy. For image classification
using convolutional neural networks (CNNs), the proposed technique of data partitioning
and parallel training with multicores resulted in a 1.05 times faster execution. In the pro-
posed model for wine quality classification and fraud detection using the Random Forest
classifier, the proposed technique achieved a 2.22 times and 3.89 times faster execution, respectively, while keeping accuracy essentially unchanged.

Table 7. Comparison of the performance speed using quad-core processors against previous studies.

Paper            Problem                       Algorithm                      Result
[23]             TSP                           Ant Colony Optimization        1.007 times faster
[15]             MNIST digits                  Backpropagation                2.4 times
—                Image classification          Convolutional Neural Network   1.05 times faster
Proposed Model   Wine Quality Classification   Random Forest Classifier       1.75 times faster
Proposed Model   Fraud Detection               Random Forest Classifier       3.89 times faster

6.5. Discussions
The results of the implemented parallel computing algorithm reveal significant im-
provements in execution time while maintaining high accuracy across different machine
learning models. The Random Forest classifier, in particular, showed a noteworthy enhance-
ment in performance compared to other models. The divide-and-conquer strategy intrinsic
to the Random Forest algorithm inherently benefits from parallel processing as each tree
in the forest can be built independently and simultaneously. This modular nature allows
the Random Forest classifier to fully leverage the multiple cores available, leading to a
substantial reduction in execution time without sacrificing accuracy. For instance, the accu-
racy remained consistent to 99.96% across different core counts, while the execution time
improved markedly from 148.97 s with a single core to 38.24 s with four cores. This demon-
strates that the Random Forest classifier is particularly well suited for parallel processing,
providing both efficiency and reliability in handling large datasets.
In contrast, other models such as K-Nearest Neighbors (KNNs) exhibited less signifi-
cant improvements. While the KNNs model did benefit from parallel computing, the nature
of its algorithm—relying heavily on distance calculations for classification—did not par-
allelize as efficiently as the tree-based structure of Random Forest. As a result, the KNNs
model’s execution time showed only a modest decrease from 4.35 s with one core to 1.52 s
with eight cores, with the accuracy remaining stable at 99.84%. This comparison highlights
that the specific characteristics of the algorithm play a crucial role in determining the
extent of improvement achieved through parallel processing. Models like Random Forest,
with their inherently independent and parallelizable tasks, demonstrate more pronounced
performance gains, whereas models with more interdependent calculations, such as KNNs,
show relatively modest improvements.

6.6. Limitations
Despite the promising results, several limitations need to be acknowledged. First,
the performance gains observed were heavily dependent on the hardware configuration,
particularly the number of available cores and the architecture of the multicore processor.
In environments with fewer cores or less advanced hardware, the speedup may be less
significant. Second, the parallel approach introduced additional complexity in terms of
implementation and debugging, which could pose challenges for practitioners with limited
experience in parallel computing. Additionally, while the method was tested on a diverse
set of machine learning algorithms, certain algorithms or models with inherently sequential
operations may not benefit as much from parallelization. Finally, the overhead associated
with parallel processing, such as inter-thread communication and synchronization, can
diminish the performance improvements if not managed efficiently. Future work should
focus on addressing these limitations by optimizing the parallelization strategies and
exploring adaptive approaches that can dynamically adjust based on the hardware and
dataset characteristics.

7. Conclusions
Across all scenarios, the accuracy of the models remained consistently high, demonstrating robustness to increased core utilization. At the same time, there was a clear inverse relationship between processing time and the number of cores employed: processing times decreased markedly with greater core utilization, enabling accelerated training without sacrificing accuracy. Particularly for larger datasets,
multicore processing yielded significant speed enhancements compared to smaller datasets.
This study also generalized its findings across three distinct devices with varied hard-
ware configurations.
Looking ahead, future research could explore the extension of these techniques to
multicore GPUs, an area not covered in this study. Additionally, further investigations
could compare the outcomes with the state-of-the-art parallel computing methodologies
leveraging GPUs for machine learning applications.

Author Contributions: This manuscript benefited from the combined efforts of A.G. and F.A. A.G.
conceived and developed all the components of the research and drafted the initial manuscript. F.A.
provided valuable supervision throughout the process. Both A.G. and F.A. contributed significantly
to refining the writing. All authors have read and agreed to the published version of the manuscript.
Funding: This research is funded by a grant provided by the Air Force Research Lab (AFRL) through
the Assured and Trusted Digital Microelectronics Ecosystem (ADMETE) grant, BAA-FA8650-18-S-
1201, which was awarded to Wright State University, Dayton, Ohio, USA. This project was carried
out under CAGE Number 4B991 and DUNS Number 047814256.
Data Availability Statement: The dataset used in this work can be downloaded from the web
page at https://archive.ics.uci.edu/dataset/186/wine+quality (accessed on 16 June 2024) and
https://www.kaggle.com/datasets/yashpaloswal/fraud-detection-credit-card (accessed on 17 June
2024). The source code is available at the GitHub repository at https://github.com/aashutoshghimire/
parallel-ensemble-model (accessed on 30 July 2024).
Conflicts of Interest: The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript:
ML Machine Learning
CPU Central Processing Unit
RF Random Forest
ANN Artificial Neural Network
PPT Pattern Parallel Training
MPI Message Passing Interface
DNN Deep Neural Network
GPU Graphical Processing Unit
t-SNE t-Distributed Stochastic Neighbor Embedding
MCPU Million Cycles Per Second
LM Levenberg–Marquardt
ACO Ant Colony Optimization
XG XGBoost Classifier
KNN K-Nearest Neighbors

References
1. Onishi, S.; Nishimura, M.; Fujimura, R.; Hayashi, Y. Why Do Tree Ensemble Approximators Not Outperform the Recursive-Rule
eXtraction Algorithm? Mach. Learn. Knowl. Extr. 2024, 6, 658–678. [CrossRef]
2. Ghimire, A.; Asiri, A.N.; Hildebrand, B.; Amsaad, F. Implementation of secure and privacy-aware ai hardware using distributed
federated learning. In Proceedings of the 2023 IEEE 16th Dallas Circuits and Systems Conference (DCAS), Denton, TX, USA,
14–16 April 2023; pp. 1–6.
3. Dey, S.; Mukherjee, A.; Pal, A.; P, B. Embedded deep inference in practice: Case for model partitioning. In Proceedings of the 1st
Workshop on Machine Learning on Edge in Sensor Systems, New York, NY, USA, 10 November 2019; pp. 25–30.

4. Li, B.; Gao, E.; Yin, J.; Li, X.; Yang, G.; Liu, Q. Research on the Deformation Prediction Method for the Laser Deposition
Manufacturing of Metal Components Based on Feature Partitioning and the Inherent Strain Method. Mathematics 2024, 12, 898.
[CrossRef]
5. Wiggers, W.; Bakker, V.; Kokkeler, A.B.; Smit, G.J. Implementing the conjugate gradient algorithm on multi-core systems. In
Proceedings of the 2007 International Symposium on System-on-Chip, Tampere, Finland, 20–21 November 2007; pp. 1–4.
6. Capra, M.; Bussolino, B.; Marchisio, A.; Shafique, M.; Masera, G.; Martina, M. An updated survey of efficient hardware
architectures for accelerating deep convolutional neural networks. Future Internet 2020, 12, 113. [CrossRef]
7. Chapagain, A.; Ghimire, A.; Joshi, A.; Jaiswal, A. Predicting breast cancer using support vector machine learning algorithm. Int.
Res. J. Innov. Eng. Technol. 2020, 4, 10.
8. Ghimire, A.; Tayara, H.; Xuan, Z.; Chong, K.T. CSatDTA: Prediction of Drug–Target Binding Affinity Using Convolution Model
with Self-Attention. Int. J. Mol. Sci. 2022, 23, 8453. [CrossRef] [PubMed]
9. Turchenko, V.; Paliy, I.; Demchuk, V.; Smal, R.; Legostaev, L. Coarse-Grain Parallelization of Neural Network-Based Face Detection
Method. In Proceedings of the 2007 4th IEEE Workshop on Intelligent Data Acquisition and Advanced Computing Systems:
Technology and Applications, Dortmund, Germany, 6–8 September 2007; pp. 155–158. [CrossRef]
10. Doetsch, P.; Golik, P.; Ney, H. A comprehensive study of batch construction strategies for recurrent neural networks in MXNet.
arXiv 2017, arXiv:1705.02414.
11. Casas, C.A. Parallelization of artificial neural network training algorithms: A financial forecasting application. In Proceedings of
the 2012 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr), New York, NY, USA,
29–30 March 2012; pp. 1–6.
12. Turchenko, V.; Triki, C.; Grandinetti, L.; Sachenko, A. Parallel Algorithm of Enhanced Historical Data Integration Using Neural
Networks. In Proceedings of the 2005 IEEE Intelligent Data Acquisition and Advanced Computing Systems: Technology and
Applications, Sofia, Bulgaria, 5–7 September 2005; pp. 66–73. [CrossRef]
13. Wang, J.; Han, Z. Research on speech emotion recognition technology based on deep and shallow neural network. In Proceedings
of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 3555–3558.
14. Naik, D.S.B.; Kumar, S.D.; Ramakrishna, S.V. Parallel processing of enhanced K-means using OpenMP. In Proceedings of the 2013
IEEE International Conference on Computational Intelligence and Computing Research, Enathi, India, 26–28 December 2013;
pp. 1–4. [CrossRef]
15. Todorov, D.; Zdraveski, V.; Kostoska, M.; Gusev, M. Parallelization of a Neural Network Algorithm for Handwriting Recognition:
Can we Increase the Speed, Keeping the Same Accuracy. In Proceedings of the 2021 44th International Convention on Information,
Communication and Electronic Technology (MIPRO), Opatija, Croatia, 27 September–1 October 2021; pp. 932–937.
16. Sun, S.; Chen, W.; Bian, J.; Liu, X.; Liu, T. Ensemble-compression: A new method for parallel training of deep neural networks. In
Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Cham, Switzerland,
2017; pp. 187–202.
17. Guan, S.U.; Li, S. Parallel Growing and Training of Neural Networks Using Output Parallelism. Trans. Neur. Netw. 2002,
13, 542–550. [CrossRef] [PubMed]
18. Chen, X.; Xiang, S.; Liu, C.L.; Pan, C.H. Vehicle Detection in Satellite Images by Parallel Deep Convolutional Neural Networks.
In Proceedings of the 2013 2nd IAPR Asian Conference on Pattern Recognition, Naha, Japan, 5–8 November 2013; pp. 181–185.
[CrossRef]
19. Farber, P.; Asanovic, K. Parallel neural network training on Multi-Spert. In Proceedings of the 3rd International Conference on
Algorithms and Architectures for Parallel Processing, Melbourne, Australia, 12 December 1997; pp. 659–666. [CrossRef]
20. Suri, N.N.R.; Deodhare, D.; Nagabhushan, P. Parallel Levenberg-Marquardt-Based Neural Network Training on Linux Clusters—A
Case Study. In Proceedings of the ICVGIP, Hyderabad, India, 16–18 December 2002; pp. 1–6.
21. Thulasiram, R.; Rahman, R.; Thulasiraman, P. Neural network training algorithms on parallel architectures for finance applications.
In Proceedings of the 2003 International Conference on Parallel Processing Workshops, Kaohsiung, Taiwan, 6–9 October 2003;
pp. 236–243. [CrossRef]
22. Aggarwal, K. Simulation of artificial neural networks on parallel computer architectures. In Proceedings of the 2010 International
Conference on Educational and Information Technology, Chongqing, China, 17–19 September 2010; Volume 2, pp. V2-255–V2-258.
[CrossRef]
23. Fejzagić, E.; Oputić, A. Performance comparison of sequential and parallel execution of the Ant Colony Optimization algorithm
for solving the traveling salesman problem. In Proceedings of the 2013 36th International Convention on Information and
Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 20–24 May 2013; pp. 1301–1305.
24. Pu, Z.; Wang, K.; Yan, K. Face Key Point Location Method based on Parallel Convolutional Neural Network. In Proceedings of
the 2019 2nd International Conference on Safety Produce Informatization (IICSPI), Chongqing, China, 28–30 November 2019;
pp. 315–318. [CrossRef]
25. Autret, Y.; Thepaut, A.; Ouvradou, G.; Le Drezen, J.; Laisne, J. Parallel learning on the ArMenX machine by defining sub-networks.
In Proceedings of the 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan), Nagoya, Japan, 25–29
October 1993; Volume 1, pp. 915–918. [CrossRef]
26. Lee, B. Parallel neural networks for speech recognition. In Proceedings of the International Conference on Neural Networks
(ICNN’97), Houston, TX, USA, 12 June 1997; Volume 4, pp. 2093–2097. [CrossRef]

27. Dai, Q.; Xu, S.H.; Li, X. Parallel Process Neural Networks and Its Application in the Predication of Sunspot Number Series. In
Proceedings of the 2009 Fifth International Conference on Natural Computation, Tianjian, China, 14–16 August 2009; Volume 1,
pp. 237–241. [CrossRef]
28. Petkovic, D.; Altman, R.; Wong, M.; Vigil, A. Improving the explainability of Random Forest classifier—User centered approach.
In Proceedings of the Biocomputing 2018, Kohala Coast, HI, USA, 3–7 January 2018; pp. 204–215. [CrossRef]
29. Khomenko, V.; Shyshkov, O.; Radyvonenko, O.; Bokhan, K. Accelerating recurrent neural network training using sequence
bucketing and multi-GPU data parallelization. In Proceedings of the 2016 IEEE First International Conference on Data Stream
Mining & Processing (DSMP), Lviv, Ukraine, 23–27 August 2016; pp. 100–103. [CrossRef]
30. Borhade, P.; Deshmukh, R.; Murarka, S.; Agarwal, R. Image Classification using Parallel CPU and GPU Computing. Int. J. Eng.
Adv. Technol. 2020, 9, 839–843. [CrossRef]
31. Oswal, Y.P. Fraud Detection Credit Card. 2023. Available online: https://www.kaggle.com/datasets/yashpaloswal/fraud-detection-credit-card/data (accessed on 17 June 2024).
32. Dua, D.; Graff, C. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. 2017. Available online: http://archive.ics.uci.edu/ml (accessed on 13 July 2024).
33. Fodor, I.K. A Survey of Dimension Reduction Techniques; Technical Report; Lawrence Livermore National Lab. (LLNL): Livermore,
CA, USA, 2002.
34. Kazemi, F.; Asgarkhani, N.; Jankowski, R. Machine learning-based seismic fragility and seismic vulnerability assessment of
reinforced concrete structures. Soil Dyn. Earthq. Eng. 2023, 166, 107761. [CrossRef]
35. Loh, W.Y. Classification and regression trees. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2011, 1, 14–23. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
