
Applied Soft Computing 135 (2023) 110035


One-dimensional VGGNet for high-dimensional data



Sheng Feng a, Liping Zhao a, Haiyan Shi a, Mengfei Wang b, Shigen Shen c,∗, Weixing Wang b,∗

a Department of Computer Science and Engineering, Shaoxing University, Shaoxing, 312000, China
b School of Information Engineering, Chang’an University, Xi’an, 710064, China
c School of Information Engineering, Huzhou University, Huzhou, 313000, China

ARTICLE INFO

Article history:
Received 24 September 2022
Received in revised form 12 December 2022
Accepted 11 January 2023
Available online 20 January 2023

Dataset link: https://github.com/fengsheng13/datasets

Keywords:
High-dimensional data
Deep learning classification
One-dimensional visual geometry group network (1D_VGGNet)
One-dimensional convolution
Comprehensive evaluation (CE)

ABSTRACT

We consider a deep learning model for classifying high-dimensional data and seek to achieve optimal evaluation accuracy and robustness based on multicriteria decision-making (MCDM) for high-dimensional data analysis applications during comprehensive evaluation (CE) activities. We propose a novel one-dimensional visual geometry group network (1D_VGGNet) to overcome the problem that high-dimensional data are too complicated and unstable to be feasibly applied. Then, to effectively handle one-dimensional MCDM, we present a 1D_VGGNet classifier to replace the two-dimensional convolution operation applied to image data with a one-dimensional convolution operation applied to one-dimensional MCDM. Furthermore, to solve the invariance problem of the generated feature maps, the maxpooling kernel size can be flexibly adjusted to effectively meet the requirements of reducing the feature map dimension and speeding up training and prediction on different datasets. The improvement is reasonable for various high-dimensional data application scenarios. Moreover, we propose a novel objective function to accurately evaluate network performance since the objective function includes a variety of representative performance evaluation metrics, and the average value is calculated as one of the CE metrics. The experimental results illustrate that the proposed framework outperforms a one-dimensional convolutional neural network (1D_CNN) for comprehensive classification on the Shaoxing University student achievement dataset and the MIT-BIH Arrhythmia database and achieves average gains of 36.3% and 12.1% in terms of the designated evaluation metric.

© 2023 Elsevier B.V. All rights reserved.

1. Introduction

Recently, we have witnessed rapid advances in big data for many applications, which is known as "information explosion" [1–3]. Artificial intelligence (AI) techniques have been combined with the internet of things (IoT), referred to as the artificial intelligence of things (AIoT). Examples of AIoT include Healthcare 4.0 [4] and Industry 4.0 [5,6]. Using AIoT, we can not only store massive data in clouds but also carry out big data analyses for human daily production and life. With the maturity of AIoT, many areas have experienced geometric data growth. These areas can be electronically represented with high-dimensional data, i.e., audio, video, and electronic text document data [7–13]. Deep learning classifiers and comprehensive evaluations (CEs) can be used to determine the deep meaning behind data [14]. However, since high-dimensional data are too complicated and unstable to be feasibly applied, we need to design a more applicable classifier to deal with them [15]. Despite the efforts of many scholars to improve the classification and evaluation performance, we still face many challenges, such as how to design a more flexible classifier that can adapt to complex and diverse high-dimensional data, and there has been no perfect solution. Thus, it is very important to design a more robust algorithm.

In recent years, the source codes for many traditional classifiers have gradually been opened for scholars to study [16]. However, many classifiers for high-dimensional data still have high-cost solutions. Conversely, one well-known classifier, the visual geometry group network (VGGNet) [17], can be used to improve the prior state-of-the-art configurations with very small (3 × 3) convolution filters and push the network depth to more weight layers than ILSVRC-2013, its close competitor [18]. With (3 × 3) receptive fields, two-dimensional VGGNet (2D_VGGNet) can use 3 × 3 layers instead of 7 × 7 layers to increase the discrimination. By increasing the network depth, we find that (3 × 3) filters are efficient. However, since this approach is designed for image data, it is still challenging to apply it to high-dimensional data and multicriteria decision-making (MCDM) applications, which are used to sort alternatives [19].

∗ Corresponding authors.
E-mail addresses: [email protected] (S. Feng), [email protected] (L. Zhao), [email protected] (H. Shi), [email protected] (M. Wang), [email protected] (S. Shen), [email protected] (W. Wang).

https://doi.org/10.1016/j.asoc.2023.110035
1568-4946/© 2023 Elsevier B.V. All rights reserved.

To mimic the visual perception of living beings, a typical convolutional neural network (CNN) contains convolutional layers and pooling layers to form a feedforward neural network, which has a deep structure [20]. The convolution layer is responsible for extracting the features of the input data and includes multiple convolution kernels. The pooling layer is responsible for reducing the sampling dimension, computing consumption, memory consumption and network complexity, removing redundancies and compressing features [21]. Specifically, one-dimensional convolutional neural networks (1D_CNNs) can be used to keep a small number of translations of the input data invariant to a large extent [22]. The side effect of CNNs is that with increasing network depth, the classification accuracy of the model does not significantly improve. Therefore, a reasonable solution is to combine 2D_VGGNet, which can push the depth to more weight layers, with 1D_CNNs to keep a small number of translations of the input data invariant and improve the classification accuracy in various scenarios with high-dimensional data.

Moreover, to solve the MCDM optimization problem (MCDMOP) for CE applications, we propose a novel framework called the one-dimensional VGGNet (1D_VGGNet) framework to classify and evaluate objects in applications with high-dimensional data. The proposed framework is a novel framework for preserving the evaluation comments of experts and improving the evaluation accuracy when high-dimensional data are too complicated and unstable to be feasibly applied. We then introduce a novel objective function to accurately evaluate the performance of the network model since the objective function includes a variety of representative performance evaluation metrics, and the average value is calculated as the CE metric.

The main contributions of this work can be summarized as follows:

▶ To achieve optimal evaluation accuracy and robustness based on MCDM for high-dimensional data analysis applications during comprehensive evaluation (CE) activities, we propose a novel framework called 1D_VGGNet to classify and evaluate objects in applications with high-dimensional data. It can be used to classify objects accurately and preserve the evaluation comments of experts to make the classification results consistent with expert opinions as much as possible.

▶ To effectively solve the problem of one-dimensional MCDM, we replace the two-dimensional convolution of 2D_VGGNet applied to image data by one-dimensional convolution applied to one-dimensional MCDM. Furthermore, to solve the problem of the invariance of the generated feature maps, the kernel size of maxpooling can be flexibly adjusted to effectively meet the requirements of reducing the dimension of the feature maps and speeding up training and prediction on different datasets. The improvement is reasonable for various high-dimensional data application scenarios.

▶ To effectively solve the MCDMOP, we present a novel objective function to accurately evaluate the performance of the network model since the objective function includes a variety of representative performance evaluation metrics, and the average value is calculated as one of the CE metrics.

The remainder of our paper is organized as follows. In Section 2, the related work is briefly surveyed and compared. We introduce the preprocessing method of MCDM and our objective function to accurately evaluate the performance of the network model in Section 3. We formally define our 1D_VGGNet architecture and one-dimensional convolution in Section 4, and we present the kernel size adjustment method to effectively meet the requirements of reducing the feature map dimension and speeding up training and prediction on different datasets. In Section 5, extensive experimental results are shown to verify our proposed framework. Finally, we conclude our paper in Section 6.

2. Related work

Based on deep learning and one-dimensional convolutional neural network theory, our goal is to scientifically analyze and evaluate high-dimensional data and mine the deep meaning behind the data. Thus, the related work can be divided into three parts: (1) research on high-dimensional data, (2) research on deep learning classification and (3) research on 1D_CNNs.

High-dimensional data: There are various approaches focused on analyzing high-dimensional data in statistical science [23–28]. For instance, to solve the problem of the number of covariates (also called features) exceeding the sample size, Ahmed et al. focused on high-dimensional regression analysis to select random subsets of covariates [15]. The ensemble approach can be used to analyze each subset and obtain the results. The authors demonstrated that the approach surpasses penalty methods in the real world. To address the problem that the discriminative information is masked by high dimensions, Li et al. considered high-dimensional data classification by discarding the irrelevant data [29]. The authors found that the high-dimensional data classification framework performs very well for a small number of training samples. To address the lack of generalization power to select an optimal subset of features, Salesi et al. proposed a feature selection algorithm that applies the Fisher score filter algorithm [30]. Its fitness function uses information theory-based criteria, and the algorithm outperforms other feature selection algorithms. To reduce the side effect of performance degradation as a deep autoencoder network becomes deeper, Wickramasinghe et al. [31] considered unsupervised feature learning with a ResNet autoencoder. The authors found that unsupervised feature learning outperforms standard deep autoencoders by adding residual connections.

Deep learning classification: The remarkable progress in deep learning classification has spread to many domains [32–38]. For hyperparameter optimization in lung nodule classification, Zhang et al. [39] considered a nonstationary kernel to optimize the hyperparameter configuration. The authors showed that the correlation function relies on relative locations rather than the distance to the optimal point. To address the problem that a CNN can become trapped in a local region, Xiao et al. [40] focused on image fusion and employed U-Net for segmentation tasks. The authors showed that the network can outperform some state-of-the-art methods in terms of network complexity. Although there have been many methods to improve the original architecture of convolutional neural networks, landmark results were reported in 2015. Simonyan et al. [41] considered the depth aspect of CNN architecture design and proposed a novel VGGNet model. The authors steadily increased the network depth by adding more convolutional layers. The experimental results showed that the network can achieve outstanding accuracy on classification tasks and is applicable to image recognition tasks. Then, for hand gesture recognition, Ding et al. [42] focused on a VGG-16 model to extract features with a reduced number of data. The authors showed that the method can be used to achieve excellent real-time recognition performance.

One-dimensional CNN: Since a raw one-dimensional dataset can be directly fed into 1D_CNNs [43–45], 1D_CNNs can outperform their 2D counterparts for 1D signal applications, and they have the following advantages: (1) they have low computational complexity; (2) they can learn challenging 1D signal tasks with relatively shallow architectures; and (3) with their low computational requirements, they can perform real-time and low-cost tasks [46]. Thus, to clarify the applications of 1D_CNNs, Li et al. [47] focused on the effectiveness of feature extraction, where the position of the feature does not matter. The authors showed various applications of 1D_CNNs in time series prediction

and signal identification of electrocardiograms. Then, to solve the problem of extracting features from adjacent spectral information, Kang et al. [48] considered a feature extraction network with atrous convolution. The authors showed that the method outperformed some 3D_CNN models. To address multiple pollutant factors, Dai et al. [49] considered a one-dimensional multiscale convolution kernel based on temporal and spatial characteristics. Thus, the advantages of CNN and LSTM models can be exploited. The authors showed that the network can outperform a CNN model. For fog infrastructure consisting of fog-based gateways, Cheikhrouhou et al. [50] considered a modular 1D_CNN. The inference module of the model can analyze 1D signals and initiate real-time emergency countermeasures. The experimental results showed that outstanding performance can be achieved using this algorithm.

Many previous works have focused on feature extraction and one-dimensional multiscale convolution kernels of 1D_CNNs [48,49,51] to ensure satisfactory evaluation performance and efficiency. Our 1D_VGGNet framework provides an effective solution to classify and evaluate objects in applications with high-dimensional data. Moreover, our efforts focus on the preservation of the evaluation comments of experts and improve the VGGNet classifier for high-dimensional data to make the classification results consistent with expert opinions as much as possible. Then, to solve the problem of the invariance of the generated feature maps, the kernel size of maxpooling can be flexibly adjusted to effectively meet the requirements of reducing the dimension of feature maps and speeding up training and prediction on different datasets. The improvement is reasonable for various high-dimensional data application scenarios. Moreover, we propose a novel objective function to accurately evaluate the performance of the network model since the objective function includes a variety of representative performance evaluation metrics, and the average value is calculated as one of the CE metrics. To the best of our knowledge, this is the first framework to present a 1D_VGGNet framework used to classify objects based on the evaluation comments of experts for high-dimensional data classification and analysis to overcome the problem that high-dimensional data are too complicated and unstable to be feasibly applied.

3. Preprocessing method of multicriteria decision-making

As mentioned above, 2D_VGGNet can be used to improve the prior state-of-the-art configurations with very small (3 × 3) convolution filters and push the depth to more weight layers than ILSVRC-2013, its close competitor [18]. With a very small kernel size of 3 receptive fields throughout the whole network, 2D_VGGNet can incorporate three convolution layers with a kernel size of 3 instead of a single layer with a kernel size of 7, which makes the decision function more discriminative. However, since this approach is designed for image data, we need to improve the network architecture in terms of high-dimensional data with MCDM.

3.1. Preprocessing procedure

For high-dimensional data with MCDM, we focus on how to improve the CE metric to preserve the evaluation comments of experts and adapt to complex and diverse high-dimensional data. Thus, we first introduce the preprocessing procedure of CE and define it as H = ⟨F, R, W⟩. Suppose that F is the set of all features. It can be defined as

F = {f_1, f_2, ..., f_i, ..., f_m},    (1)

where m is the number of features. Because high-dimensional data can describe very complex data structures, the value of m can be very large.

Next, suppose that R represents all remarks. It can be defined as

R = {r_1, r_2, ..., r_i, ..., r_n} ∈ R^n,    (2)

where n is the number of remark levels, and r_i ∈ R. We define the weight distribution W about F as

W = {w_1, w_2, ..., w_i, ..., w_m} | \sum_{i=1}^{m} w_i = 1,    (3)

where w_i is the weight that corresponds to feature f_i.

Therefore, supposing the dataset D contains s samples with m features, we can define the dataset as

D = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,m} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{s,1} & a_{s,2} & \cdots & a_{s,m} \end{bmatrix},    (4)

where a_{i,j} ∈ R indicates the score using R on each feature f_i based on expert knowledge independently.

3.2. Multicriteria decision-making optimization problem (MCDMOP)

When considering multiple criteria (features), the decision-making problem of localizing optimal samples is a challenging task in modern decision-making science. Its potential applications include engineering, economic, and military scenarios. To clearly describe our objective, we define our optimization problem as achieving the maximum classification accuracy with our 1D_VGGNet model relative to that of 1D_CNNs and the minimum impact of a single feature for MCDM. We denote Inc(κ) as the increase in the accuracy of multiple evaluation metrics based on the ground-truth classification results of expert scoring:

Inc(κ) = (κ_1D_VGGNet − κ_1D_CNN) / κ_1D_CNN.    (5)

Hence, κ_1D_VGGNet and κ_1D_CNN are the accuracy rates of the multiple evaluation metrics in terms of κ ∈ {Accuracy, Jaccard, Recall, F1, Precision, R2, AUC, Avg}, where AUC is the area under the curve and Avg is the average; the Python sklearn module is used to evaluate our 1D_VGGNet model and the 1D_CNNs with the ground-truth classification results of expert scoring.

Thus, our MCDMOP can be formulated as

maximize_{{D},{F},{R},{y_i}} Inc(κ)    (6)

subject to

y_i = [0, ..., r],
ŷ_i^1D_VGGNet = [0, ..., r],
ŷ_i^1D_CNN = [0, ..., r],    (7)
κ_j ∈ [0, 1],
κ_t ∈ (−∞, 1],

where y_i is an integer indicating the class of [a_{i,1}, a_{i,2}, ..., a_{i,m}] as the ground-truth classification results of expert scoring. ŷ_i^1D_VGGNet is an integer indicating the class of the same sample predicted by our 1D_VGGNet model. ŷ_i^1D_CNN is an integer indicating the class of the same sample predicted by the 1D_CNN model. Moreover, based on y_i and ŷ_i^1D_VGGNet, we can obtain κ_1D_VGGNet using the Python sklearn module. Based on y_i and ŷ_i^1D_CNN, we can obtain κ_1D_CNN.
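To make the evaluation procedure concrete, the following sketch shows one way to compute the metric set and the increase Inc(κ) of Eq. (5) with the Python sklearn module mentioned above; the function and variable names are illustrative and are not part of the authors' released code.

```python
import numpy as np
from sklearn import metrics

def evaluate(y_true, y_pred, n_classes):
    """Compute the metric vector kappa = (Accuracy, Jaccard, Recall, F1, Precision, R2, AUC, Avg)."""
    # One-hot encode labels and predictions so that a macro-averaged AUC can be computed.
    y_true_1h = np.eye(n_classes)[np.asarray(y_true)]
    y_pred_1h = np.eye(n_classes)[np.asarray(y_pred)]
    kappa = {
        "Accuracy": metrics.accuracy_score(y_true, y_pred),
        "Jaccard": metrics.jaccard_score(y_true, y_pred, average="macro"),
        "Recall": metrics.recall_score(y_true, y_pred, average="macro"),
        "F1": metrics.f1_score(y_true, y_pred, average="macro"),
        "Precision": metrics.precision_score(y_true, y_pred, average="macro"),
        "R2": metrics.r2_score(y_true, y_pred),
        "AUC": metrics.roc_auc_score(y_true_1h, y_pred_1h, average="macro"),
    }
    kappa["Avg"] = float(np.mean(list(kappa.values())))  # Avg is one of the CE metrics
    return kappa

def inc(kappa_vgg, kappa_cnn):
    """Relative increase of each metric, Eq. (5): (kappa_1D_VGGNet - kappa_1D_CNN) / kappa_1D_CNN."""
    return {k: (kappa_vgg[k] - kappa_cnn[k]) / kappa_cnn[k] for k in kappa_vgg}
```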

Furthermore, j ∈ {1, 2, 3, 4, 5, 7} is related to the multiple evaluation metrics in terms of Accuracy, Jaccard, Recall, F1, Precision and AUC. t ∈ {6, 8} is related to the metrics in terms of R2 and Avg. In addition, we use Eq. (5) to calculate the increase Inc(κ) in the accuracy of the evaluation metrics based on the ground-truth classification results of expert scoring. Finally, our goal is to maximize Inc(κ) by adjusting our model parameters.

4. One-dimensional visual geometry group network architecture

Consider a CNN with L convolution layers. Let x^l represent the outputs from layers l (l ∈ {1, ..., L}) and x^0 = D. Assume the training set ℵ = {(x_1, y_1), ..., (x_n, y_n)}, where x_i = [a_{i,1}, a_{i,2}, ..., a_{i,m}] ∈ R^m is an m-dimensional vector and y_i = [0, ..., r] is an integer indicating the class of x_i. The purpose in designing a neural network is to classify these vectors with the neurons in the convolution layers, and the output layer should contain r neurons.

Then, for each class of r neurons, the network input is x^0, and its output is the classification scores of a_i. The network can extract features from a_i that are linearly separable in the output layer. Moreover, because the high-dimensional data are too complicated and unstable to be applied, the classification of x^0 is a challenging task. Consequently, our 1D_VGGNet model, which has a deeper network architecture, can accurately learn a feature transformation function for x^0. In addition, because the traditional 2D_VGGNet model is designed for image classification, it cannot be directly applied to high-dimensional data classification. Therefore, we first design the architecture of the 1D_VGGNet model for high-dimensional data D and then detail the model training process.

4.1. Architecture of 1D_VGGNet

Fig. 1 shows the architecture of our 1D_VGGNet model. The input data are s × m high-dimensional data D (s is the total number of samples), and the batch size is 64. The features of D are extracted through 5 groups of convolution layers, where each group is composed of multiple one-dimensional convolutional layers Conv1D in series. The essence of convolution is to extract the features of high-dimensional data with the parameters of the convolution kernel and obtain the results through matrix dot transport and summation operations. The rectified linear unit (ReLU) activation function after each convolutional layer enhances the feature learning ability of the model. At the end of each group is the one-dimensional pooling layer that uses maxpool1D, which is a downsampling operation to reduce the number of parameters in a feature map and increase the computing speed and the receptive field. In addition, it should be noted that in the feature extraction stage, the number of convolution layers L is variable and positively correlated with the number of parameters of the model. With the increase in L, the number of parameters of the model increases. The classification stage after the feature extraction stage consists of three fully connected (FC) layers. The first two FC layers each have 128 channels. After each layer, the ReLU activation function is used to mitigate gradient disappearance. The third FC layer has 5 channels, which are relative to the classification result ŷ_i of a_i using the maximum value of r channels. These classification results correspond to five export scoring ranges: {[90, 100], [80, 90), [70, 80), [60, 70), [0, 60)}.

4.2. One-dimensional convolution layer

Convolution is a basic operation of analytical mathematics and adopts a discrete form in the field of deep learning. As the basic component of convolutional neural networks, the essence of convolution is to extract the features of the high-dimensional data with the parameters of the convolution kernel and to obtain the results through matrix dot transport and summation operations. On this basis, the deep network can extract more complex features from low-level features. The forward propagation formula of the neural network can be described as follows.

x^{l+1} = H(x^l w^{l+1} + b^{l+1}).    (8)

Here, w^{l+1} ∈ R^{W^l} represents the weight vector, and b^{l+1} ∈ R represents the bias of layer l (W^{l+1} is the width of the output of layer l). x^l ∈ R^{W^l} represents the input of layer l + 1. H(·): R → R is an activation function that implements a nonlinear transformation and acts as a threshold function. Moreover, in the convolution operation, each convolution kernel acts as a filter to extract different kinds of features.

To form a complex expression space, it is necessary to add nonlinear mapping. Therefore, the ReLU activation function is introduced as follows.

H_relu(x) = max(0, x) = {0 if x < 0; x if x ≥ 0}.    (9)

H'_relu(x) = {0 if x < 0; 1 if x ≥ 0}.    (10)

This clearly indicates that H_relu(x) has a unilateral asymmetric structure; its derivative in R+ is always 1 and does not saturate R+, which effectively alleviates the phenomenon of gradient disappearance. Moreover, the model can arbitrarily approach any nonlinear structure, so it has the ability to deal with complex data.

4.3. Pooling layer

To reduce the number of feature map parameters, improve the computing speed, and increase the receptive field, a pooling layer is applied for the downsampling operation. This operation can make the model pay more attention to the global feature than the local location while retaining some important feature information, improving fault tolerance, and preventing overfitting. The pooling layer can be represented as follows.

x_out = α P(x_in) + β.    (11)

where α and β are the proportional and horizontal deviations, respectively. P(·) is the downsampling function. x_in and x_out are the input and output of the pooling layer, respectively. Moreover, we select the maximum pooling operator. It gives the maximum value in the adjacent area of the input feature map, and the expression is as follows.

x_out^down(i) = max_{(i−1)γ ≤ c ≤ iγ} {x_in(c | i)}.    (12)

Here, x_in(c | i) is the activation value of the cth neuron of the input, the position interval of the set is (i − 1)γ ≤ c ≤ iγ, i is the location of the feature map, and γ is the kernel size.

Compared with the convolution operation, the pooling operation only changes the dimension of the feature map but does not change the depth D of the feature map. Therefore, the number of pooling layers is closely related to the dimension of the feature map.

Fig. 1. Architecture of our 1D_VGGNet model.

Table 1
1D_VGGNet parameters.

No.  Layer        Amount  Size  L   W    D
0    Input D      –       –     64  m    1
1    Conv1D       64      3     64  m    64
2    Conv1D       64      3     64  m    64
3    fpool^γ1     –       γ1    64  W1   64
4    Conv1D       128     3     64  W1   128
5    Conv1D       128     3     64  W1   128
6    fpool^γ2     –       γ2    64  W2   128
7    Conv1D       256     3     64  W2   256
8    Conv1D       256     3     64  W2   256
9    Conv1D       256     3     64  W2   256
10   fpool^γ3     –       γ3    64  W3   256
11   Conv1D       512     3     64  W3   512
12   Conv1D       512     3     64  W3   512
13   Conv1D       512     3     64  W3   512
14   fpool^γ4     –       γ4    64  W4   512
15   Conv1D       512     3     64  W4   512
16   Conv1D       512     3     64  W4   512
17   Conv1D       512     3     64  W4   512
18   fpool^γ5     –       γ5    64  W5   512
19   FC           –       –     64  1    128
20   FC           –       –     64  1    128
21   FC           –       –     64  1    5
22   Output       –       –     64  1    5

Taking the maxpool1D layer f_pool^γ with a kernel size γ as an example, the length L = t of high-dimensional data D remains unchanged, and the width W = m decreases with the increase in the pooling layer. The formula is as follows.

W_out = \lfloor (W_in + 2 × ϑ_1 − ϑ_2 × (γ − 1) − 1) / ϑ_3 \rfloor + 1 = \lfloor W_in / γ \rfloor, if ϑ_1 = 0, ϑ_2 = 1, and ϑ_3 = γ.    (13)

where W_in and W_out are the widths of the input and output of the pooling layer, respectively. ϑ_1, ϑ_2 and ϑ_3 are the padding, dilation and stride of the pooling layer, respectively. On this basis, we can obtain the formula for the amount n of the pooling layers and the width m of D as follows.

W_n = \lfloor m / \prod_{i=1}^{n} γ_i \rfloor ≥ 1, 1 ≤ n ≤ 5.    (14)

Here, W_n is the width of the nth output feature map of f_pool^γ, and the constraint is W_n ≥ 1. ⌊·⌋ is the rounding down operation, and the parameters are listed in Table 1. n = 1 and 5 indicate the first and the last f_pool^γ layers, respectively.

Theorem 1. Let m represent the number of features of high-dimensional data D; then, if m ≥ γ^5, γ ≥ 2, the 1D_VGGNet model has five f_pool^{γ≥2} layers. Otherwise, the number of f_pool^{γ≥2} layers in the 1D_VGGNet model is ⌊log_{γ≥2} m⌋, while the number of f_pool^{γ=1} layers is 5 − ⌊log_{γ≥2} m⌋.

Proof. (1) If m ≥ γ^5, γ ≥ 2, then W_5 = ⌊m/γ^5⌋ ≥ 1. Thus, the 1D_VGGNet model can have five f_pool^{γ≥2} layers. (2) If m < γ^5, γ ≥ 2, then n = ⌊log_{γ≥2} m⌋ < 5. Otherwise, a contradiction of the definition of W_n ≥ 1 would exist; that is, W_5 = ⌊m/γ^5⌋ | (m < γ^5) = 0. For a special case, if ⌊log_{γ≥2} m⌋ = 0, then m = 1. Thus, there is no space for the downsampling operation. Because the f_pool^{γ=1} layer has the feature of dimension invariance, to retain the dimension of the feature maps, the 1D_VGGNet model can utilize (5 − ⌊log_{γ≥2} m⌋ = 5) f_pool^{γ=1} layers. In addition, if 1 ≤ ⌊log_{γ≥2} m⌋ < 5, then the 1D_VGGNet model can perform the ⌊log_{γ≥2} m⌋ downsampling operation. Thus, the number of f_pool^{γ≥2} layers in the 1D_VGGNet model is ⌊log_{γ≥2} m⌋, while the number of f_pool^{γ=1} layers is 5 − ⌊log_{γ≥2} m⌋. For instance, if m = 23, the number of f_pool^{γ=3} layers in the 1D_VGGNet model is ⌊log_3 23⌋ = 2, while the number of f_pool^{γ=1} layers is 5 − ⌊log_3 23⌋ = 3. Thus, the parameters in the above table are γ_{1,2} = 3 and γ_{3,4,5} = 1.

4.4. Fully connected layer

To map the features abstracted by the feature extraction stage to the label space of a specific dimension and obtain the loss result, we employ a three-layer FC network in the classification stage. The neurons in each layer can be connected with all the neurons in the previous and next layers, and the input and output are extended into one-dimensional vectors. Therefore, the FC layer has the largest number of parameters, which is (L^{l−1} × W^{l−1} × D^{l−1} + 1) × L^l × W^l × D^l. The FC layer is a special case of a convolution layer.

4.5. Training the 1D_VGGNet model

To analyze the quality of the training process and determine whether the model converges, we define the objective function called the loss function to determine how accurate the 1D_VGGNet model is at classifying the samples in ℵ by measuring the gap between the predicted result and the ground truth.

Our main goal in training 1D_VGGNet is to minimize the incorrectly classified samples. To show all parameters of w^l and b^l in a single vector, we can augment w^l and b^l as w^l_(w|b^l) = [b^l, w_1, ..., w_{W^l}]. Moreover, we can also augment x^l with 1 as x^l_(x|1) = [1, x_1, ..., x_{W^l}]. Therefore, we formulate this objective as follows.

E = − \sum_{i=1}^{n} y_i log σ(x^l_(x|1) w^{l+1}_(w|b^{l+1})),    (15)

σ(x^l_(x|1) w^{l+1}_(w|b^{l+1})) = 1 / (1 + e^{−x^l_(x|1) w^{l+1}_(w|b^{l+1})}).    (16)

Here, σ: R → [0, 1] is the logistic sigmoid function. Furthermore, to find the minimum of E, we use gradient descent with the cross-entropy loss function. To implement the gradient descent method, we should obtain the derivative of σ(x) with respect to its parameter as follows.

∂σ(x)/∂x = σ(x)(1 − σ(x)).    (17)

Then, we can obtain the partial derivative of E by the chain rule as follows.

∂E/∂w_i = (σ(x^l_i w^{l+1}) − y_i) x_i,    (18)

∂E/∂w_0 = σ(x^l_i w^{l+1}) − y_i.    (19)
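A small NumPy sketch of Eqs. (15)–(19), assuming the augmented input x_(x|1) and the augmented weight vector w_(w|b) introduced above; the function and variable names are ours, and the snippet is only meant to make the objective and gradient expressions concrete.

```python
import numpy as np

def sigmoid(z):
    # Eq. (16): logistic sigmoid.
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_gradient(x_aug, y, w_aug):
    """x_aug: (n, W+1) inputs augmented with a leading 1; y: (n,) labels in {0, 1};
    w_aug: (W+1,) weights augmented with the bias. Returns the objective of Eq. (15)
    and the gradient stated in Eqs. (18)-(19)."""
    p = sigmoid(x_aug @ w_aug)          # predicted probabilities
    E = -np.sum(y * np.log(p))          # objective, Eq. (15)
    grad = x_aug.T @ (p - y)            # (sigma(xw) - y) x, bias component included
    return E, grad

# One gradient-descent step with learning rate eta, as used to minimize E:
# w_aug -= eta * grad
```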

Fig. 2. Flowchart of our 1D_VGGNet model.

In the error backpropagation process, the loss function is used to calculate the error. Then, the error E^{l+1} is convoluted with the parameter w^l_{ij} to propagate the error to the previous layer to obtain the error E^l as follows.

E^l_j = \sum_{i=1}^{W^{l+1}} (E^{l+1}_i)' w^l_{ij}.    (20)

Here, (·)' indicates the matrix transpose operations. Then, the weight gradient ∂w^l_{ij} is calculated as follows.

∂w^l_{ij} = x^l_j E^{l+1}_i.    (21)

In addition, it is not sufficient to propagate the dataset once in the neural network. We need to propagate the dataset many times in the same neural network. As the number of epochs ϕ increases, the update times of the parameters in the neural network also increase, and the network changes from underfitting to overfitting. We update the network parameters w and b using the gradient descent method as follows.

E → { ŵ^l = w^l − η ∂w^l ; b̂^l = b^l − η ∂b^l }.    (22)

Here, w^l and ŵ^l are the weights before and after updating, respectively. b^l and b̂^l are the biases before and after updating, respectively. η is the learning rate, which is used to control the step size of the parameter update process.

Furthermore, to minimize E and improve the convergence of the gradient descent method, we use the Adam optimizer [52] to dynamically estimate and adjust the η values of w^l and b^l using the first and second moments of the gradient.

Fig. 2 shows the flowchart of our 1D_VGGNet model. In the first step, we use the preprocessing process to obtain a high-dimensional dataset D. Next, we input D into the five groups of convolution layers for feature extraction. Then, in the classification stage, we use a three-layer FC network to map features to the label space of a specific dimension and obtain the loss result. Finally, we use the softmax function to obtain the final category. In the second step, we use Theorem 1 to determine the number of f_pool^{γ≥2} layers and f_pool^{γ=1} layers in the five f_pool^γ layers. In the third step, we use Eqs. (13) and (14) to calculate the width W_n of the nth output feature map of f_pool^γ based on the amount n of the pooling layers and the width m of D. In the fourth step, we use the objective function (6) in the MCDMOP to evaluate the increase Inc(κ) of our 1D_VGGNet model and the 1D_CNNs with the ground-truth classification results based on expert scoring. Finally, we adjust the values of hyperparameters γ, ϕ and η. Based on these hyperparameters, we readjust the architecture of our 1D_VGGNet model to improve the classification performance.

Table 2
Framework parameters.

Parameters                      Value
Dataset (D)                     Shaoxing University      MIT-BIH [53]
Kernel size of Conv1D           3                        same
Kernel size of maxpool1D (γ)    1, 2, 3, 5, 7, 11, 13    same
FC layer channels (r)           5                        6
Epoch (ϕ)                       4000                     200
Batch size                      64                       same
Learning rate (η)               1.0 × 10−6               1.0 × 10−3
Feature number (m)              23                       256

5. Experiment and analysis

Here, we present the CE metric of our proposed framework. The experiments are performed on the Shaoxing University dataset, which includes the scores of more than 700 students in two semesters, and the MIT-BIH Arrhythmia Database, which contains 48 electrocardiographic (ECG) records of 47 individuals.

5.1. Framework parameters and datasets

The experiments are performed using MATLAB R2021a and PyCharm 2021.3 with PyTorch 1.10.0. The operating system of the computer is Windows 10 64-bit, the central processing unit (CPU) is an AMD Ryzen 9 5900HX with Radeon Graphics 3.30 GHz and 8 cores, and the RAM is 64 GB 1600 MHz.

The framework parameters are listed in Table 2. To verify the high performance of our framework, we use the student achievement dataset of Shaoxing University and the MIT-BIH Arrhythmia Database as the datasets D. Next, we set the kernel size of Conv1D to 3 and the kernel size γ of maxpool1D to 1, 2, 3, 5, 7, 11, 13 for the two datasets. Then, we set the r channels of the FC layer to 5 (6) for the two datasets. Furthermore, during the training phase, we set the number of epochs ϕ, batch size and learning rate η to 4000 (200), 64 and 1.0 × 10−6 (1.0 × 10−3) for the two datasets, respectively. Moreover, for consistency, we set the feature number m to 23 (256).

5.1.1. Shaoxing University datasets

Dataset D is composed of data of the Shaoxing University first semester dataset. Each row corresponds to the scores of all s = 746 students for one feature f_i. There are a total of m = 23 features. The data of some features are evenly distributed with good spacing. However, the data of other features are relatively concentrated, so the spacing is poor. This means that the distribution characteristics of the dataset are highly representative and cover most cases.
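To tie together the update rule of Eq. (22), the Adam optimizer mentioned above and the hyperparameters listed in Table 2, the sketch below shows a typical PyTorch training loop. The data loader, the model instance and the choice of the cross-entropy criterion are placeholders standing in for the setup described in this section, not the authors' exact script.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=4000, lr=1e-6, device="cpu"):
    """Minimal training loop: forward pass, loss, backward pass (Eqs. (20)-(21)), Adam update (Eq. (22))."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()                         # cross-entropy loss over the class labels y_i
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # Adam adapts eta per parameter [52]
    for epoch in range(epochs):
        for x, y in loader:                                   # x: (64, 1, m) batches, y: integer classes
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()                                   # error backpropagation
            optimizer.step()                                  # parameter update with learning rate eta
    return model
```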

Table 3
Robustness evaluation of our framework with 2 groups of convolution layers on the Shaoxing University datasets.
Dataset Classifier Accuracy Jaccard Recall F1 Precision R2 AUC Avg
γ =1
1st_2nd 1D_CNN2G 0.748 0.277 0.456 0.439 0.426 0.107 0.687 0.449
γ =1
1D_VGG2G 0.724 0.283 0.450 0.432 0.416 0.070 0.679 0.436
γ =2
1D_VGG2G 0.658 0.285 0.421 0.401 0.383 −0.044 0.653 0.394
γ =3
1D_VGG2G 0.646 0.283 0.410 0.393 0.378 −0.065 0.645 0.384
γ =5
1D_VGG2G 0.637 0.282 0.395 0.388 0.381 −0.072 0.635 0.378
γ =7
1D_VGG2G 0.635 0.275 0.383 0.382 0.383 −0.079 0.628 0.372
γ =9
1D_VGG2G 0.608 0.275 0.357 0.356 0.374 −0.123 0.605 0.350
γ =11
1D_VGG2G 0.621 0.247 0.350 0.352 0.361 −0.126 0.608 0.345
γ =13
1D_VGG2G 0.528 0.221 0.242 0.208 0.336 −0.280 0.526 0.254
Inc(κi ) −0.032 0.022 −0.013 −0.016 −0.023 −0.346 −0.012 −0.027
γ =1
2nd_1st 1D_CNN2G 0.751 0.269 0.417 0.427 0.458 −0.010 0.664 0.425
γ =1
1D_VGG2G 0.732 0.257 0.400 0.409 0.440 0.075 0.652 0.424
γ =2
1D_VGG2G 0.686 0.257 0.377 0.387 0.420 0.004 0.632 0.395
γ =3
1D_VGG2G 0.625 0.259 0.347 0.355 0.395 −0.097 0.604 0.355
γ =5
1D_VGG2G 0.617 0.251 0.337 0.345 0.386 −0.117 0.598 0.345
γ =7
1D_VGG2G 0.625 0.251 0.339 0.348 0.393 −0.104 0.600 0.350
γ =9
1D_VGG2G 0.615 0.238 0.321 0.329 0.391 −0.126 0.589 0.337
γ =11
1D_VGG2G 0.609 0.241 0.329 0.334 0.368 −0.145 0.593 0.333
γ =13
1D_VGG2G 0.493 0.197 0.214 0.160 0.423 −0.346 0.508 0.236
Inc(κi ) −0.025 −0.045 −0.041 −0.042 −0.039 −8.500 −0.018 −0.004

The data of the second semester dataset have similar characteristics. The number of students is reduced to s = 742, but the data distribution of each feature f_i has similar distribution characteristics as the corresponding feature of the first semester. This means that the classifier can train the model using the first semester data as the training set and the second semester data as the test set for classification, and vice versa.

5.1.2. MIT-BIH database

To address more high-dimensional data criteria, we select the MIT-BIH Arrhythmia Database [53] with 256 dimensions for the comparative experimental dataset. The database contains 48 ECG records sampled at 360 Hz of 47 individuals. We take the ‘101’, ‘105’, ‘114’, ‘118’, ‘124’, ‘201’, ‘210’ and ‘217’ records as the test set and the remaining records as the training dataset. Only the modified limb lead II (MLII) data are used for training and testing. We select a segment with a length of 256 as the 256-dimensional feature centered on each peak. Subsequently, we select six categories as the r = 6 channels of the classifier, which are ’normal beat’, ’premature contract’, ’pace beat’, ’atmospheric premature beat’, ’fusion of vascular and normal beat’, and ’beat not classified during learning’. Moreover, to improve the accuracy of the last category, we include all the data of this category from the 2017 Physionet Challenge [54] in the training dataset, thereby increasing the number of training samples. Finally, the sample size of the training dataset is 22935 × 256, and the sample size of the test set is 4568 × 256.

5.2. Test phase

5.2.1. Shaoxing University datasets

(1) Two groups of convolution layers

Since 1D_CNN can work well with various network depths, first, we evaluate the classification performance of our 1D_VGGNet model with 2 groups of convolution layers, 1D_VGG2G^γ, as shown in the previous figure. We compare our framework with the state-of-the-art tracker from the literature, a 1D_CNN with 2 groups of convolution layers, 1D_CNN2G^γ [51].

Then, to evaluate the effect of our framework, we use multiple evaluation metrics, including the accuracy score, macro Jaccard score, recall score, F1 score, precision score, R2 score, and area under the curve (AUC) score, in the Python sklearn module to evaluate our framework with the ground-truth classification results based on expert scoring. The score of the results for γ is reported for both semesters of the student achievement dataset.

The first part of Table 3 shows the scores based on the various evaluation metrics using the data of the 1st semester as the training set and the data of the 2nd semester as the test set. For the different values of γ, the first- and second-best algorithms, which are 1D_CNN2G^{γ=1} and 1D_VGG2G^{γ=1}, respectively, are obtained when γ = 1. The average scores of these algorithms are 0.449 and 0.436, respectively. As the value of γ increases, the performance of our algorithm gradually declines, and the worst performance occurs when γ = 13. The average score of our 1D_VGG2G^{γ=13} model is 0.254, which is 43.4% worse than that of the best 1D_CNN2G^{γ=1} algorithm. For the Jaccard metric, a slightly better performance is achieved using our 1D_VGG2G^{γ=1} model (0.283) compared with the 1D_CNN2G^{γ=1} model (0.277). For the other metrics, our 1D_VGG2G^{γ=1} model obtains worse results than the best 1D_CNN2G^{γ=1} algorithm. Furthermore, the second part of Table 3 shows exactly the same conclusions using the data of the 2nd semester as the training set and the data of the 1st semester as the test set. These results lead to exactly the same conclusion, i.e., the best algorithms, which are 1D_CNN2G^{γ=1} and 1D_VGG2G^{γ=1}, are obtained when γ = 1. The average scores are 0.425 and 0.424, respectively. In particular, the performance scores of the best 1D_CNN2G^{γ=1} algorithm are slightly worse than those of the second best 1D_VGG2G^{γ=1} algorithm for the R2 metric. This means that our 1D_VGG2G^{γ=1} model provides better model fitting performance than the 1D_CNN2G^{γ=1} model. Therefore, the conclusion is that 1D_CNN2G^{γ=1} achieves better performance than our 1D_VGG2G^γ model with 2 groups of convolution layers and various values of γ, but it has worse model fitting.

(2) Five groups of convolution layers

Since the well-known 2D_VGGNet model can achieve the best performance in the case of five groups of convolution layers, as shown in the previous figure, we compare the performance of the one-dimensional and two-dimensional models with five groups of convolution layers. Moreover, since high-dimensional data cannot be directly imported into the 2D_VGGNet model, when m = 23, we change the one-dimensional data 1 × m of

Table 4
Robustness evaluation of our framework with 5 groups of convolution layers on the Shaoxing University datasets.
Dataset Classifier Accuracy Jaccard Recall F1 Precision R2 AUC Avg
γ =1
1st_2nd 1D_CNN5G 0.856 0.286 0.576 0.587 0.611 0.329 0.766 0.573
γ =1
1.5D_VGG5G 0.927 0.332 0.881 0.894 0.917 0.741 0.930 0.803
γ =1
1D_VGG5G 0.929 0.326 0.854 0.865 0.884 0.693 0.917 0.781
γ =2
1D_CNN5G 0.747 0.284 0.461 0.442 0.426 0.109 0.688 0.451
γ =2
2D_VGG5G 0.476 0.269 0.282 0.249 0.235 −0.359 0.542 0.242
γ =2
1D_VGG5G 0.615 0.271 0.414 0.376 0.351 −0.150 0.643 0.360
γ =3
1D_CNN5G 0.732 0.281 0.446 0.433 0.420 0.084 0.678 0.439
γ =3
2D_VGG5G 0.639 0.185 0.326 0.302 0.312 −0.175 0.609 0.314
γ =3
1D_VGG5G 0.736 0.273 0.493 0.478 0.500 0.055 0.703 0.463
γ =5
1D_CNN5G 0.714 0.262 0.440 0.416 0.398 0.023 0.674 0.418
γ =5
1D_VGG5G 0.791 0.258 0.628 0.590 0.588 0.178 0.785 0.545
γ =7
1D_CNN5G 0.716 0.247 0.410 0.401 0.394 0.021 0.656 0.406
γ =7
1D_VGG5G 0.798 0.282 0.580 0.576 0.585 0.229 0.758 0.544
γ =9
1D_CNN5G 0.743 0.263 0.431 0.423 0.418 0.083 0.671 0.433
γ =9
1D_VGG5G 0.756 0.256 0.537 0.519 0.567 0.088 0.730 0.493
γ =11
1D_CNN5G 0.694 0.238 0.370 0.375 0.410 0.003 0.631 0.389
γ =11
1D_VGG5G 0.918 0.353 0.706 0.703 0.701 0.556 0.841 0.683
γ =13
1D_CNN5G 0.636 0.182 0.276 0.252 0.258 −0.118 0.568 0.293
γ =13
1D_VGG5G 0.511 0.182 0.200 0.135 0.102 −0.325 0.500 0.186
Inc(κi ) 0.085 0.140 0.483 0.474 0.447 1.106 0.197 0.363
γ =1
2nd_1st 1D_CNN5G 0.885 0.293 0.667 0.736 0.897 0.508 0.814 0.686
γ =1
1.5D_VGG5G 0.941 0.359 0.921 0.874 0.865 0.648 0.950 0.794
γ =1
1D_VGG5G 0.961 0.347 0.945 0.951 0.958 0.880 0.966 0.858
γ =2
1D_CNN5G 0.752 0.272 0.419 0.430 0.465 0.128 0.665 0.447
γ =2
2D_VGG5G 0.507 0.248 0.267 0.227 0.237 −0.299 0.538 0.246
γ =2
1D_VGG5G 0.660 0.258 0.363 0.371 0.408 −0.042 0.619 0.377
γ =3
1D_CNN5G 0.764 0.267 0.424 0.433 0.466 0.012 0.671 0.434
γ =3
2D_VGG5G 0.798 0.231 0.443 0.429 0.416 0.121 0.690 0.447
γ =3
1D_VGG5G 0.874 0.287 0.649 0.651 0.680 0.439 0.803 0.626
γ =5
1D_CNN5G 0.737 0.259 0.402 0.412 0.446 0.091 0.654 0.429
γ =5
1D_VGG5G 0.845 0.253 0.612 0.599 0.606 0.316 0.781 0.573
γ =7
1D_CNN5G 0.721 0.264 0.405 0.410 0.427 0.062 0.653 0.420
γ =7
1D_VGG5G 0.836 0.252 0.588 0.598 0.626 0.328 0.766 0.571
γ =9
1D_CNN5G 0.755 0.248 0.405 0.412 0.442 0.109 0.659 0.433
γ =9
1D_VGG5G 0.788 0.253 0.544 0.554 0.576 0.201 0.736 0.522
γ =11
1D_CNN5G 0.721 0.243 0.385 0.391 0.419 0.045 0.642 0.407
γ =11
1D_VGG5G 0.923 0.350 0.698 0.713 0.732 0.584 0.836 0.691
γ =13
1D_CNN5G 0.614 0.182 0.268 0.242 0.258 −0.153 0.561 0.282
γ =13
1D_VGG5G 0.489 0.182 0.200 0.131 0.098 −0.357 0.500 0.178
Inc(κi ) 0.086 0.184 0.417 0.292 0.068 0.732 0.187 0.252

γ =1 γ =1 γ =11
each row in D to two-dimensional data 4 × 6 and use 1e-5 to 1.5D_VGG5G , 1D_VGG5G and 1D_VGG5G , with average scores
fill the insufficient data at the end. Next, Conv1D and maxpool1D of 0.803, 0.781 and 0.683, respectively. The fourth best algorithm
are replaced by Conv2D and maxpool2D for 2D_VGGNet. Then, γ =1
is 1D_CNN5G , with an average score of 0.573. Then, as the value
when γ = 1, the functions of maxpool1D and maxpool2D are γ
of γ increases, the performance of 1D_CNN5G shows a gradual
almost the same and similar to those of the one-dimensional γ
γ =1
downward trend, while that of 1D_VGG5G shows the character-
model. Thus, we name the case of γ = 1 1.5D_VGG5G . Finally, istics of first increasing and then decreasing. The highest score
due to the limitation of the two-dimensional data 4 × 6, the value
occurs when γ = 11. Furthermore, limited by the value range
of γ cannot be greater than 4. Therefore, the two-dimensional γ
γ =2 of γ , the performance of 2D_VGG5G shows an increasing trend.
comparison models use γ = 2, 3 and are named 2D_VGG5G [17]
γ =3 This shows that increasing the value of γ cannot improve the
and 2D_VGG5G , respectively. γ
γ γ performance of the 1D_CNN5G algorithms, but it can improve the
The top half of Table 4 shows the 1D_CNN5G , 2D_VGG5G and γ γ
γ performance of the 1D_VGG5G and 2D_VGG5G algorithms to some
1D_VGG5G models with the same evaluation metric as the top half γ =11
of the previous table on the same dataset. As we expected, the extent. Although the performance of 1D_VGG5G cannot surpass
γ =1
algorithms with five groups of convolution layers are far better that of 1D_VGG5G , it can be proven that the numerical change in
γ
than the algorithms with two groups of convolution layers in γ can affect the performance of 1D_VGG5G . Moreover, for all the
γ =1
all evaluation metrics. Above all, the best top 3 algorithms are evaluation metrics, our 1D_VGG5G model completely surpasses

Fig. 4. Macroaverage ROC analysis of our framework on the Shaoxing University


datasets for two semesters. (a) 1st semester and (b) 2nd semester.

Fig. 3. Confusion matrix analysis of our framework on the Shaoxing University other four categories reach 0.97 or above, which indicates that the
datasets for two semesters. (a) 1st semester and (b) 2nd semester.
classification accuracy of the model for the 1st semester dataset
is very high. Moreover, Fig. 3(b) shows the test results of the
γ =1
2nd semester dataset using the same model trained with the 1st
1D_CNN5G , with an improvement of 36.3%. Especially in terms semester dataset. As expected, the first three categories still show
γ =1
of R2, our 1D_VGG5G model is improved by 110.6% compared a high TP rate. However, the TP rate of the latter two categories
γ =1
with 1D_CNN5G , that is, 0.693 versus 0.329, respectively. This decreased significantly. In particular, the last class has a large
γ =1 difference of 0.33. This shows that the classifier has difficulty
shows that our 1D_VGG5G model has a much better model fitting
ability. Furthermore, using the same dataset, the bottom half of classifying the last class of the 2nd semester dataset.
Table 4 shows exactly the same conclusions as the bottom half Next, to overcome class imbalance in the datasets, we use the
of the previous table, except that the first two algorithms switch macroaverage AUC score to analyze our framework on the same
places. When γ = 1, the effects of maxpool1D and maxpool2D datasets. Fig. 4(a) shows the receiver operating characteristic
are almost the same and are infinitely close to the classification (ROC) curves of the top 2D_VGGNet and top 3 one-dimensional
γ =1
performance of 1D_VGGNet. The conclusions are that the best classifiers on the 1st semester test set. Using 1D_VGG5G im-
γ =1 proves the TP rate to more than 0.95, showing far better per-
algorithm is our 1D_VGG5G model; the single evaluation metric
with the greatest improvement is R2, which is improved by 73.2%. formance than the other three classifiers. Moreover, the AUC
γ =3 scores of these classifiers correspond to the data in the previ-
The best 2D_VGGNet model is 2D_VGG5G . Therefore, our frame-
work consistently provides the best classification performance in ous table. Furthermore, Fig. 4(b) shows the ROC curve on the
terms of the accuracy score, macro Jaccard score, recall score, F1 2nd semester test set. The outcomes of the four classifiers have
γ =1 γ =1
score, precision score, R2 score, AUC score and Avg score. similar trends, but the performance of 1D_VGG5G , 1D_CNN5G
γ =1 γ =3
On this basis, we use the best classifier, 1D_VGG5G , to predict and 2D_VGG5G is decreased compared with the performance
γ =11
the student achievement dataset for two semesters. Fig. 3(a) on the 1st semester data, while the performance of 1D_VGG5G
γ =1
shows the normalized confusion matrix for classifying the 1st improves. Nevertheless, 1D_VGG5G always maintains the best
γ =1
semester dataset with our 1D_VGG5G model trained on the 2nd classification performance.
semester dataset. As shown in the figure, other than the true Then, to study the classification results of each class, we use
γ =1
positive (TP) ratio of class 1, which is 0.8, the TP ratios of the the ROC curve to analyze our 1D_VGG5G model on the same

γ =1 γ =1
Fig. 5. ROC analysis of our 1D_VGG5G model on the Shaoxing University Fig. 6. PR analysis of our 1D_VGG5G model on the Shaoxing University datasets
datasets for two semesters. (a) 1st semester and (b) 2nd semester. for two semesters. (a) 1st semester and (b) 2nd semester.

datasets. Fig. 5(a) shows the ROC curve of the test set on the to the characteristics of the dataset, we choose the MSELoss loss
1st semester data for each class. Except for class 1, AUC scores function in PyTorch to calculate the loss value. Among the above
γ =11 γ =5 γ =5
above 0.96 are reached for all classes. In particular, an AUC score three algorithms, 2D_VGG5G , 1D_CNN5G and 1D_VGG5G , have
γ =1
of 1 is achieved for class 5, which indicates that 1D_VGG5G has the best configurations, with average scores of 0.573, 0.605 and
excellent classification performance on the 1st semester test set. 0.678, respectively. Moreover, as the value of γ increases, the
In addition, Fig. 5(b) shows the ROC curves of the five classes three algorithms show irregular fluctuation characteristics. Thus,
on the 2nd semester test set. Except for class 1, the scores of the value change in γ can change the performance of the classifier
all other classes decreased to varying degrees. In particular, class to a certain extent. Moreover, for all the evaluation metrics, our
5 decreases most significantly; that is, its AUC score decreases γ =5 γ =5
1D_VGG5G model completely surpasses 1D_CNN5G , with an im-
γ =1 γ =5
by 0.167. In summary, 1D_VGG5G shows excellent classification provement of 12.1%. Especially with respect to R2, our 1D_VGG5G
performance, and each class obtains an AUC score of more than γ =5
is improved by 72.5% compared with 1D_CNN5G , that is, 0.540
0.83. γ =5
versus 0.313, respectively. This means that our 1D_VGG5G model
Finally, corresponding to the previous figure, we use the
γ =1 has a much better model fitting ability. Therefore, the conclusion
precision–recall (PR) curve to analyze 1D_VGG5G for multiclass
is that our framework consistently provides the best classification
classification. Fig. 6(a) shows a left–right symmetrical relation-
performance for all the evaluation metrics.
ship with the previous figure. In addition, the order of average
Fig. 7 shows the normalized confusion matrix for classifying
precision (AP) from small to large is consistent with that of γ =5
the AUC score, which reflects the one-to-one correspondence the test set of the MIT-BIH database with our 1D_VGG5G model.
between the ROC and PR curves. Fig. 6(b) shows exactly the same As shown in the figure, the TP ratios of classes 1, 2, 3 and 6 reach
γ =1 0.89 or above, which indicates that the classification accuracy of
effect as Fig. 6(a). In summary, our 1D_VGG5G model can achieve
better classification precision. the model is very high. However, the TP rates of classes 4 and 5
decrease significantly. This shows that the classifier has difficulty
5.2.2. MIT-BIH database classifying classes with few training samples.
γ γ γ Finally, Fig. 8 shows the ROC curves of the best classifiers for
Table 5 shows the 1D_CNN5G , 2D_VGG5G and 1D_VGG5G models γ =5
with the same evaluation metric as in the previous table using the test set of the MIT-BIH database. 1D_VGG5G improves the
the MIT-BIH Arrhythmia Database. Since we need to apply the TP rate to more than 0.75, showing better performance than the
2D_VGGNet model on high-dimensional data, when m = 256, other classifiers. The AUC scores of these classifiers correspond
γ =5
we change the one-dimensional data 1 × m of each row in D to the data in the previous table. Moreover, 1D_CNN5G and
γ =11
to two-dimensional data 16 × 16. Furthermore, to better adapt 2D_VGG5G show similar trends, and both yield smaller values

Table 5
Robustness evaluation of our framework with 5 groups of convolution layers on the MIT-BIH database.
Classifier Accuracy Jaccard Recall F1 Precision R2 AUC Avg
γ =1
1D_CNN5G 0.898 0.283 0.650 0.648 0.660 0.254 0.814 0.601
γ =1
1.5D_VGG5G 0.368 0.061 0.167 0.090 0.061 −0.428 0.500 0.117
γ =1
1D_VGG5G 0.867 0.270 0.632 0.623 0.619 0.108 0.802 0.560
γ =2
1D_CNN5G 0.876 0.270 0.639 0.639 0.669 −0.964 0.808 0.420
γ =2
2D_VGG5G 0.817 0.240 0.640 0.627 0.633 −1.986 0.803 0.253
γ =2
1D_VGG5G 0.868 0.266 0.635 0.619 0.614 −0.413 0.805 0.485
γ =3
1D_CNN5G 0.857 0.265 0.670 0.653 0.651 −0.587 0.821 0.476
γ =3
2D_VGG5G 0.861 0.277 0.622 0.615 0.616 0.115 0.797 0.558
γ =3
1D_VGG5G 0.744 0.256 0.570 0.555 0.595 −0.582 0.758 0.414
γ =5
1D_CNN5G 0.882 0.303 0.647 0.640 0.638 0.313 0.812 0.605
γ =5
2D_VGG5G 0.708 0.195 0.557 0.558 0.625 −0.231 0.743 0.451
γ =5
1D_VGG5G 0.930 0.301 0.685 0.696 0.760 0.540 0.835 0.678
γ =7
1D_CNN5G 0.845 0.273 0.688 0.658 0.649 −0.279 0.828 0.523
γ =7
2D_VGG5G 0.856 0.285 0.615 0.617 0.624 0.103 0.793 0.556
γ =7
1D_VGG5G 0.882 0.275 0.640 0.645 0.659 −0.817 0.808 0.442
γ =9
1D_CNN5G 0.908 0.287 0.686 0.695 0.709 −0.157 0.834 0.566
γ =9
2D_VGG5G 0.368 0.061 0.167 0.090 0.061 −0.428 0.500 0.117
γ =9
1D_VGG5G 0.851 0.256 0.633 0.624 0.631 −0.258 0.801 0.505
γ =11
1D_CNN5G 0.840 0.253 0.645 0.634 0.642 −0.771 0.806 0.436
γ =11
2D_VGG5G 0.888 0.283 0.648 0.632 0.621 0.128 0.813 0.573
γ =11
1D_VGG5G 0.823 0.241 0.625 0.622 0.639 −2.002 0.795 0.249
γ =13
1D_CNN5G 0.822 0.245 0.620 0.596 0.593 −0.327 0.793 0.477
γ =13
2D_VGG5G 0.793 0.271 0.608 0.604 0.627 −0.712 0.784 0.425
γ =13
1D_VGG5G 0.803 0.228 0.606 0.589 0.595 −1.200 0.784 0.344
Inc(κi ) 0.054 −0.007 0.059 0.088 0.191 0.725 0.028 0.121

Fig. 8. Macroaverage ROC analysis of our framework on the MIT-BIH database.

Fig. 7. Confusion matrix analysis of our framework on the MIT-BIH database.

5.3. Training phase

5.3.1. Shaoxing University datasets

To analyze the training performance of our model, we extend our comparative experiments to the batch loss analysis of our framework on the student achievement dataset for two semesters. We compare our framework with the state-of-the-art 1D_CNN and 2D_VGGNet classifiers mentioned above. Fig. 9(a) presents the batch loss using the 1st semester dataset as the training set for the top 2D_VGGNet and top 3 one-dimensional classifiers. 1D_CNN5G (γ = 1) is the first classifier to control the batch loss below 0.1. However, as the number of iterations reaches 30 000, the performance of 1D_VGG5G (γ = 1) gradually surpasses that of 1D_CNN5G (γ = 1), and its batch loss approaches zero. Although the four classifiers perform well, with the batch loss decreasing as the number of iterations increases, 1D_VGG5G (γ = 11) has the slowest convergence rate, and 2D_VGG5G (γ = 3) has the largest fluctuation. Furthermore, Fig. 9(b) shows the batch loss using the 2nd semester dataset as the training set. Although all four classifiers achieve faster convergence speeds than those in Fig. 9(a), our 1D_VGG5G (γ = 1) model still shows the best performance and has the minimum batch loss.
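The per-batch loss curves discussed above can be collected with a lightweight callback. The sketch below is an assumed Keras workflow, not the authors' code; the Adam optimizer and the 1.0 × 10−6 learning rate follow the training settings reported later for the Shaoxing University datasets, while the batch size and loss choice are illustrative.

```python
import tensorflow as tf

class BatchLossLogger(tf.keras.callbacks.Callback):
    """Record the loss reported after every training batch."""
    def __init__(self):
        super().__init__()
        self.batch_losses = []

    def on_train_batch_end(self, batch, logs=None):
        self.batch_losses.append(float(logs["loss"]))

def train_with_batch_logging(model, x_train, y_train, epochs=10, batch_size=32):
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1.0e-6),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    logger = BatchLossLogger()
    model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
              callbacks=[logger], verbose=0)
    return logger.batch_losses   # one value per batch, as plotted in Fig. 9
```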

Fig. 9. Batch loss analysis of our framework on the Shaoxing University datasets for two semesters. (a) 1st semester and (b) 2nd semester.

Fig. 10. Training loss analysis of our framework on the Shaoxing University datasets for two semesters. (a) 1st semester and (b) 2nd semester.

Then, we further extend our comparative experiments to analyze the training loss of our framework on the same dataset. Fig. 10(a) shows the training loss relative to each iteration in the previous figure using the dataset of the 1st semester as the test set for the four classifiers. It is clearly shown that the training loss of 1D_VGG5G (γ = 1) is much lower than that of the other three classifiers and is infinitely close to 0.9. 1D_VGG5G (γ = 11) requires 42 000 iterations before it gradually surpasses 1D_CNN5G (γ = 1) and becomes the classifier with the second smallest training loss. Moreover, Fig. 10(b) shows the training loss relative to each iteration in the previous figure using the dataset of the 2nd semester as the test set. All four classifiers show the same trend as that in Fig. 10(a), and they each have lower training loss values. However, at the initial stage of training, the training loss values of the four classifiers fluctuate greatly. Furthermore, 2D_VGG5G (γ = 3) always obtains the worst training performance.

To analyze the gradual change in the training accuracy with the training loss, we further extend our comparative experiments to the training accuracy analysis of our framework on the same dataset. Fig. 11(a) presents the training accuracy relative to the training model in the previous figure using the dataset of the 1st semester as the test set for the four classifiers. In the initial stage of training, the best accuracy is obtained using 1D_CNN5G (γ = 1). However, when the number of iterations reaches approximately 17 000, 1D_VGG5G (γ = 1) completely surpasses 1D_CNN5G (γ = 1) to become the classifier with the highest training accuracy, and its accuracy is infinitely close to 1. Although 1D_VGG5G (γ = 11) and 2D_VGG5G (γ = 3) are the classifiers with the largest fluctuations, when the number of iterations reaches 40 000, 1D_VGG5G (γ = 11) gradually surpasses 1D_CNN5G (γ = 1) to become the classifier with the second highest training accuracy. Furthermore, Fig. 11(b) shows the training accuracy relative to the training model in the previous figure using the dataset of the 2nd semester as the test set. All four classifiers show the same trend as that in Fig. 11(a), and higher training accuracy values are obtained. When 1D_VGG5G (γ = 1) advances to approximately 10 000 iterations, it surpasses 1D_CNN5G (γ = 1), and when 1D_VGG5G (γ = 11) advances to approximately 38 000 iterations, it also surpasses 1D_CNN5G (γ = 1). Overall, our framework surpasses the training performance of 1D_CNN and 2D_VGGNet and shows a more stable and reliable model training ability.

Furthermore, to validate the performance of the training model, we validate the loss change on the same dataset. Fig. 12(a) shows the validation loss, with a change trend similar to that in the previous figure, using the dataset of the 2nd semester as the test set for the four classifiers. However, in the end stage, the performance of 1D_VGG5G (γ = 11) is no longer similar to that of 1D_CNN5G (γ = 1) but is constantly close to that of 1D_VGG5G (γ = 1), indicating that the performance of 1D_VGG5G (γ = 11) is significantly improved in the validation stage. In addition, Fig. 12(b) shows exactly the same trend using the dataset of the 1st semester as the test set. The validation losses of 1D_VGG5G (γ = 1) and 1D_VGG5G (γ = 11) are significantly lower than those of 1D_CNN5G (γ = 1) and 2D_VGG5G (γ = 3). Moreover, the validation losses of the four classifiers are higher than the corresponding training losses in the previous figure. As the training loss decreases, the validation loss decreases synchronously, which indicates that the training model is in the process of self-improvement and that the design of the model framework is very reasonable.

In addition, we validate the change in the accuracy on the same dataset, as shown in Fig. 13. In the initial stage of training, the best validation accuracy is obtained using 1D_CNN5G (γ = 1).
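The training/validation curves in Figs. 10–13 can be obtained directly from the Keras training history; the helper below is a hedged sketch of that bookkeeping (the validation split and the epoch count are placeholders), assuming the model has already been compiled with metrics=["accuracy"].

```python
def fit_and_track(model, x_train, y_train, x_val, y_val, epochs=50, batch_size=32):
    """Train while recording the four curves compared in Figs. 10-13."""
    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        epochs=epochs, batch_size=batch_size, verbose=0)
    return {
        "train_loss": history.history["loss"],
        "val_loss": history.history["val_loss"],
        "train_acc": history.history["accuracy"],
        "val_acc": history.history["val_accuracy"],
    }
```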

Fig. 11. Training accuracy analysis of our framework on the Shaoxing University datasets for two semesters. (a) 1st semester and (b) 2nd semester.

Fig. 12. Validation loss analysis of our framework on the Shaoxing University datasets for two semesters. (a) 2nd semester and (b) 1st semester.

However, when the number of iterations reaches approximately 14 000 in Fig. 13(a) and 10 000 in Fig. 13(b), 1D_VGG5G (γ = 1) completely surpasses 1D_CNN5G (γ = 1) to become the classifier with the highest validation accuracy, and its accuracy is infinitely close to 0.97. We find that the validation accuracies of 1D_VGG5G (γ = 1) and 1D_VGG5G (γ = 11) are significantly higher than those of 1D_CNN5G (γ = 1) and 2D_VGG5G (γ = 3). Moreover, the validation accuracies of the four classifiers are lower than the corresponding training accuracies in the previous figure. Based on these results, with the continuous reduction in the training loss, the training accuracy continues to increase. Moreover, the validation loss is reduced synchronously, and the validation accuracy increases, which shows that the model framework design is in an ideal stage with the best performance.

Finally, Table 6 describes the training performance analysis based on the time cost for the Shaoxing University dataset. First, when the value of γ is the same, the 1D_CNN model takes the least time, while the 1.5D_VGGNet and 2D_VGGNet models always take the most time, for example, 15199 s versus 31828 s, respectively. Second, the relative time cost of 1D_VGGNet with respect to 1.5D_VGGNet is approximately −0.3; for example, the increase (1D_VGG vs. 1.5D_VGG) is −0.295 for the 1st semester data. Finally, compared with 2D_VGGNet, 1D_VGGNet always shows an increase of approximately −0.5, that is, it takes roughly half the time. Moreover, when γ = 3, this gap is largest, with an increase (1D_VGG vs. 2D_VGG) of −0.555 on the 1st semester data. It can be seen that 1D_CNN always takes the least time, and 1D_VGGNet always takes approximately half as much time as 2D_VGGNet.

5.3.2. MIT-BIH database

Fig. 14 presents the batch loss of the best classifiers using the MIT-BIH database as the training set. As shown in the figure, the fluctuation amplitude of 1D_VGG5G (γ = 5) is the largest, followed by that of 2D_VGG5G (γ = 11), and the fluctuation amplitude of 1D_CNN5G (γ = 5) is the smallest. It can be seen that the 1D_VGG5G (γ = 5) and 2D_VGG5G (γ = 11) models have more parameters to adjust, so training is more difficult. However, all three algorithms reduce the loss until it is infinitely close to zero.

Fig. 15 shows the training loss relative to each iteration in the previous figure using 1000 randomly selected training samples from the MIT-BIH database as the test set for the three classifiers. It clearly shows that the training loss values of 1D_VGG5G (γ = 5) and 1D_CNN5G (γ = 5) are much lower than those of the other classifiers and are infinitely close to 0. Furthermore, 2D_VGG5G (γ = 11) always obtains the worst training performance and the largest fluctuation amplitude.

Fig. 16 presents the training accuracy relative to the training model in the previous figure using the same 1000 selected training samples from the MIT-BIH database as the test set for the three classifiers. Similar to the conclusion drawn from the previous figure, the change trends of 1D_VGG5G (γ = 5) and 1D_CNN5G (γ = 5) are similar, and the training accuracy is infinitely close to 1.0. 2D_VGG5G (γ = 11) always obtains the worst accuracy and the largest fluctuation range. Overall, our framework is similar to the 1D_CNN framework, surpasses the training performance of 2D_VGGNet, and shows a more stable and reliable model training ability.
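The "Increase" rows in Table 6 (and in Table 7 below) are relative time-cost differences. The sketch below shows one way such values can be measured and compared; timed_fit and the wall-clock timer are our illustrative choices, not the measurement code used in the paper.

```python
import time

def timed_fit(model, x, y, **fit_kwargs):
    """Return the wall-clock training time of one model in seconds."""
    start = time.perf_counter()
    model.fit(x, y, **fit_kwargs)
    return time.perf_counter() - start

def relative_increase(t_a, t_b):
    """Relative time-cost increase of model a over model b."""
    return (t_a - t_b) / t_b

# Reproducing two entries of Table 6 (1st semester data):
print(round(relative_increase(22432, 31828), 3))   # 1D_VGG vs. 1.5D_VGG -> -0.295
print(round(relative_increase(15216, 34014), 3))   # 1D_VGG vs. 2D_VGG   -> -0.553
```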

Table 6
Training performance analysis based on the time cost for the Shaoxing University dataset.

Dataset  Classifier                        Time cost (s)
1st      1D_CNN5G (γ = 1)                  15199
         1.5D_VGG5G (γ = 1)                31828
         1D_VGG5G (γ = 1)                  22432
         Increase (1D_VGG vs. 1.5D_VGG)    −0.295
         1D_CNN5G (γ = 2)                  12462
         2D_VGG5G (γ = 2)                  34014
         1D_VGG5G (γ = 2)                  15216
         Increase (1D_VGG vs. 2D_VGG)      −0.553
         1D_CNN5G (γ = 3)                  14415
         2D_VGG5G (γ = 3)                  33762
         1D_VGG5G (γ = 3)                  15018
         Increase (1D_VGG vs. 2D_VGG)      −0.555
2nd      1D_CNN5G (γ = 1)                  15166
         1.5D_VGG5G (γ = 1)                31795
         1D_VGG5G (γ = 1)                  21607
         Increase (1D_VGG vs. 1.5D_VGG)    −0.320
         1D_CNN5G (γ = 2)                  12478
         2D_VGG5G (γ = 2)                  32262
         1D_VGG5G (γ = 2)                  15560
         Increase (1D_VGG vs. 2D_VGG)      −0.518
         1D_CNN5G (γ = 3)                  14491
         2D_VGG5G (γ = 3)                  32537
         1D_VGG5G (γ = 3)                  15010
         Increase (1D_VGG vs. 2D_VGG)      −0.539

Fig. 13. Validation accuracy analysis of our framework on the Shaoxing University datasets for two semesters. (a) 2nd semester and (b) 1st semester.

Fig. 16. Training accuracy analysis of our framework on the MIT-BIH database.

Fig. 14. Batch loss analysis of our framework on the MIT-BIH database.

Fig. 17 shows the validation loss, with a change trend similar to those in the previous figure, using 1000 randomly selected testing samples from the MIT-BIH database as the test set for the three classifiers. The fluctuation range of the three algorithms is increased. Moreover, 1D_VGG5G (γ = 5) is similar to 1D_CNN5G (γ = 5), and its validation loss can be reduced to a minimum value. 2D_VGG5G (γ = 11) always obtains a high validation loss and the maximum amplitude. As the training loss and validation loss decrease synchronously, the design of the model framework is very reasonable, and it has self-improvement abilities.
In addition, we validate the accuracy change on the same 1000 selected testing samples of the MIT-BIH database in Fig. 18. We can obtain a similar conclusion: the validation accuracies of 1D_VGG5G (γ = 5) and 1D_CNN5G (γ = 5) are significantly higher than that of 2D_VGG5G (γ = 11). Moreover, the validation accuracy of the three classifiers is lower than the corresponding training accuracy in the previous figure. Therefore, while the training loss decreases, the training accuracy increases.

Fig. 15. Training loss analysis of our framework on the MIT-BIH database.

Table 8
Overall analysis for 1D_VGGNet. The compared works are [23], [24], [32], [33], [43], [44] and ours.

Category      Feature                                  Supported by
Theory        Internet of Things (IoT)                 All compared works except [24]
              High-dimensional data                    All compared works
              Deep learning classification             All compared works except [24]
Methodology   One-dimensional convolution layers       Four of the compared works, including ours
              Changeable kernel size of maxpool1D      Ours only
              Multicriteria decision-making (MCDM)     [43], [44] and ours
              MCDM optimization problem (MCDMOP)       Ours only
              Comprehensive evaluation (CE)            All compared works

Fig. 17. Validation loss analysis of our framework on the MIT-BIH database.

Fig. 18. Validation accuracy analysis of our framework on the MIT-BIH database.

5.4. Overall analysis

Table 8 reports the overall performance based on various features. First, from the theoretical perspective, high-dimensional data analysis can be realized. Moreover, except for [24], all other works have successfully implemented the IoT and deep learning classification theories. Second, in terms of the methodology, the application scenarios involved in these methods, which range from COVID-19 detection to weather forecasting, can be classified as CE applications. Although one-dimensional convolution layers have been applied to some extent, only our framework effectively combines Conv1D with a changeable kernel size of maxpool1D and fully utilizes their respective advantages. Furthermore, only the work in [43,44] and our framework realize MCDM applications. In particular, only our framework successfully combines MCDM and MCDMOP to effectively improve the classification accuracy. Therefore, our framework is considerably better than existing methods in terms of theory and methodology.
Table 7
Training performance analysis based on the time cost for the MIT-BIH database.

Classifier                        Time cost (s)
1D_CNN5G (γ = 5)                  832
2D_VGG5G (γ = 5)                  4756
1D_VGG5G (γ = 5)                  2358
Increase (1D_VGG vs. 1D_CNN)      1.834
1D_CNN5G (γ = 11)                 790
2D_VGG5G (γ = 11)                 8133
1D_VGG5G (γ = 11)                 2115
Increase (1D_VGG vs. 2D_VGG)      −0.740

Moreover, because the validation loss decreases synchronously and the validation accuracy increases, the conclusion is that 1D_VGG5G (γ = 5) has a reasonable model design.

Finally, Table 7 describes the training performance analysis based on the time cost for the MIT-BIH database. First, when the values of γ are the same, the relative time cost of 1D_VGGNet with respect to 1D_CNN increases by approximately 1.834, for example, 2358 s versus 832 s, respectively. Then, compared with 2D_VGGNet, 1D_VGGNet always shows a negative increase; for example, the increase (1D_VGG vs. 2D_VGG) is −0.740. It can be seen that 1D_CNN always takes the least time, 1D_VGGNet is second, and 2D_VGGNet takes the most time.

6. Conclusion

To improve deep learning classification and achieve optimal evaluation accuracy and robustness based on MCDM for applications in high-dimensional data analysis during CE activities, we propose a novel 1D_VGGNet model for high-dimensional data classification and analysis to overcome the problem that high-dimensional data are too complicated and unstable to be feasibly applied. Moreover, 1D_VGGNet is far better than 2D_VGGNet in terms of classification accuracy and time consumption. In addition, 1D_VGGNet can directly use high-dimensional data as input without first reshaping them into two-dimensional data, as required by 2D_VGGNet; therefore, 1D_VGGNet has wider applicability. Furthermore, to solve the invariance problem of the generated feature maps, the maxpooling kernel size can be flexibly adjusted to effectively meet the requirements of reducing the feature map dimension and speeding up training and prediction on different datasets, which is reasonable for various high-dimensional data application scenarios. Moreover, we propose a novel objective function to accurately evaluate the performance of the network model, since the objective function includes a variety of representative performance evaluation metrics and their average value is calculated as one of the CE metrics. The experimental results on the student achievement datasets of Shaoxing University and the MIT-BIH Arrhythmia Database show that our 1D_VGGNet model outperforms the 1D_CNN model in most scenarios, except for the case with two groups of convolution layers. With five groups of convolution layers, our algorithm achieves much better performance and is more robust than the 1D_CNN model, achieving average gains of 36.3% and 12.1% in terms of the designated evaluation metric.
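For completeness, the following sketch shows the kind of VGG-style one-dimensional network described above: five groups of Conv1D layers, each followed by a MaxPooling1D layer whose pool size can be adjusted per dataset. The filter counts, dense-layer width, kernel size of 3 and the example input length are assumptions made for illustration; they are not the exact configuration released with the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_1d_vggnet(input_len=256, n_classes=6, pool_size=2,
                    groups=((64, 2), (128, 2), (256, 3), (512, 3), (512, 3))):
    """VGG-16-like stack of five Conv1D groups with an adjustable pooling kernel."""
    model = models.Sequential()
    model.add(layers.Input(shape=(input_len, 1)))
    for filters, n_convs in groups:
        for _ in range(n_convs):
            model.add(layers.Conv1D(filters, kernel_size=3,
                                    padding="same", activation="relu"))
        # The maxpooling kernel size is the tunable element discussed above.
        model.add(layers.MaxPooling1D(pool_size=pool_size))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dense(n_classes, activation="softmax"))
    return model

model = build_1d_vggnet()      # pool_size can be enlarged for longer records
model.summary()
```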

6.1. Limitations

(1) We design our 1D_VGGNet for preserving the evaluation comments of experts and improving the evaluation accuracy when high-dimensional data are too complicated and unstable to be feasibly applied. However, experts have deficiencies in terms of knowledge and interests. Thus, it is necessary to recognize the limitations of experts. First, experts' own knowledge is defective. This includes knowledge defects caused by the uncertainty of modern knowledge, knowledge defects caused by a lack of information, and knowledge defects due to the knowledge composition of experts. Second, experts have their own interests, which affect objectivity and fairness. This includes the influence of self-interest, the autocracy of expert opinions, the influence of decision-making organs, and the influence of interest groups. Third, expert decision-making conflicts with democracy. Even in an objective and fair situation, experts only provide certain professional knowledge. They cannot provide value judgments, let alone make choices about the public's risk tolerance and attitude. Therefore, value judgments on risk regulation should be made by the public themselves rather than by technical experts.

(2) For the Shaoxing University datasets, which have relatively few dimensions and samples, the classification accuracy of our 1D_VGGNet regularly increases first and then decreases with increasing γ. However, for the MIT-BIH database, which has a relatively large number of dimensions and samples, the classification accuracy of our 1D_VGGNet is characterized by irregular fluctuations, and specific rules cannot be summarized. From the training loss, training accuracy, validation loss and validation accuracy curves in the training phase, it can be seen that for the Shaoxing University datasets, the curves of our 1D_VGGNet are smooth. In contrast, for the MIT-BIH database, the curves of our 1D_VGGNet fluctuate greatly. Based on this, as the number of dimensions and samples increases, the classification performance of our 1D_VGGNet decreases and fluctuates greatly.

(3) In the training phase, when the learning rate η is 1.0 × 10−6 and the number of iterations is 48 000, the best training performance on the Shaoxing University datasets, which have relatively few dimensions and sample numbers, is obtained by our 1D_VGGNet. For the MIT-BIH database, which has a relatively large number of dimensions and samples, the best training performance is obtained by our 1D_VGGNet when the learning rate η is 1.0 × 10−3 and the number of iterations is 17 950. As the number of dimensions and samples decreases, our 1D_VGGNet needs a smaller learning rate and more iterations to obtain the best training performance.

(4) We can use our 1D_VGGNet to predict future developments. However, when accessing all data at the same time, our 1D_VGGNet cannot meet the data processing requirements, resulting in an increasing amount of unprocessed data. In addition, once constructed, our 1D_VGGNet does not continuously integrate new information into the model; instead, it periodically rebuilds a new model from scratch. This is not only time consuming but can also lead to outdated models. In view of the above problems, our future efforts will focus on incremental and online algorithms [55], as they can constantly incorporate information into their models. Thus, processing time and space can be minimized. Due to their continuous large-scale and real-time processing capabilities, these algorithms have recently received more attention in the context of big data.

6.2. Future works

(1) The VGGNet model is well known for using five groups of convolution layers to improve the classification performance of deep learning. Our research focuses on the most representative VGG-16 model, which has obvious advantages over CNN. However, the VGGNet family also includes other network structures, such as VGG-11, VGG-13, and VGG-19. Therefore, for diverse real-world datasets, future works should consider making full use of the diverse network structures of the VGGNet model and taking full advantage of the characteristics of these structures to improve the classification performance of deep learning.

(2) For the Shaoxing University datasets, as the value of γ increases, the performance of our 1D_VGG5G first increases and then decreases, and the best score occurs when γ = 11. Moreover, for the MIT-BIH database, as the value of γ increases, our algorithm shows irregular fluctuation characteristics. Thus, changing the value of γ can change the performance of the classifier to a certain extent. Based on this, we suggest that regardless of the dataset used to train the classifier, it is necessary to tune the value of γ to find the best classification accuracy.

(3) For the MIT-BIH database, to improve the accuracy of the last category, we included all the data of this category from the 2017 Physionet Challenge data [54] in the training dataset, thereby increasing the number of training samples. As shown in the confusion matrix analysis, the TP ratios of classes 1, 2, 3 and 6 reach 0.89 or above, which indicates that the classification accuracy of the model is very high. However, the TP rates of classes 4 and 5 decreased significantly. This shows that the classifier has difficulty classifying the classes with few training samples. Based on this, we believe that an effective approach to improve the classification accuracy is to appropriately increase the number of samples in each category to improve the TP ratios.

(4) Based on the training performance analysis, it can be seen that the 1D_CNN always takes the least time, our 1D_VGGNet ranks second, and 2D_VGGNet takes the most time. The bottleneck of our network is the model fitting ability, which can be addressed by increasing the number of model parameters. Therefore, the time cost must be increased to train more model parameters. An effective improvement approach is to use graphics processing unit (GPU) programming based on the compute unified device architecture (CUDA). Compared with a CPU, the main advantage of using a GPU is the improvement in throughput, which means that more parallel code can be executed on the GPU at the same time than on the CPU. However, GPUs cannot speed up recursive algorithms or algorithms that cannot be parallelized.

CRediT authorship contribution statement

Sheng Feng: Conceptualization, Methodology, Software, Data curation, Writing – original draft, Formal analysis, Funding acquisition, Investigation, Project administration, Resources, Validation, Visualization. Liping Zhao: Funding acquisition. Haiyan Shi: Investigation. Mengfei Wang: Investigation. Shigen Shen: Supervision, Writing – review & editing. Weixing Wang: Supervision, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data that support the findings of this study are publicly available at https://github.com/fengsheng13/datasets.

Acknowledgments

This work is supported in part by the National Science Foundation of China No. 62271321 and 61871289, Zhejiang Provincial Natural Science Foundation of China under Grants LZ22F020002, LTY22F020003, LGG22F010004, and LY19F020014, Ministry of Education Industry University Cooperation Collaborative Education Project under Grant 202101011025, and Humanities and Social Sciences Project of Shaoxing University under Grant 2021LJ001. The views expressed are solely those of the authors.

Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.asoc.2023.110035.

References

[1] L. Abualigah, A. Diabat, M. Abd Elaziz, Intelligent workflow scheduling for big data applications in IoT cloud computing environments, Cluster Comput. 24 (1) (2021) 1–20.
[2] C. Huang, L. Zhu, Robust evaluation method of communication network based on the combination of complex network and big data, Neural Comput. Appl. 33 (3) (2021) 887–896.
[3] J. Wang, C. Xu, J. Zhang, R. Zhong, Big data analytics for intelligent manufacturing systems: A review, J. Manuf. Syst. 62 (2021) 738–752, http://dx.doi.org/10.1016/j.jmsy.2021.03.005.
[4] A. Raza, K.P. Tran, L. Koehl, S. Li, Designing ECG monitoring healthcare system with federated transfer learning and explainable AI, Knowl.-Based Syst. 236 (2022) 107763, http://dx.doi.org/10.1016/j.knosys.2021.107763.
[5] I.F. del Amo, J.A. Erkoyuncu, M. Farsi, D. Ariansyah, Hybrid recommendations and dynamic authoring for AR knowledge capture and re-use in diagnosis applications, Knowl.-Based Syst. 239 (2022) 107954, http://dx.doi.org/10.1016/j.knosys.2021.107954.
[6] B. Mrugalska, J. Ahmed, Organizational agility in industry 4.0: A systematic literature review, Sustainability 13 (15) (2021) 8272.
[7] S. Feng, H. Shi, L. Huang, S. Shen, S. Yu, H. Peng, C. Wu, Unknown hostile environment-oriented autonomous WSN deployment using a mobile robot, J. Netw. Comput. Appl. 182 (2021) 103053.
[8] S. Feng, K. Hu, E. Fan, L. Zhao, C. Wu, Kalman filter for spatial-temporal regularized correlation filters, IEEE Trans. Image Process. 30 (2021) 3263–3278.
[9] S. Feng, S. Shen, L. Huang, A.C. Champion, S. Yu, C. Wu, Y. Zhang, Three-dimensional robot localization using cameras in wireless multimedia sensor networks, J. Netw. Comput. Appl. 146 (2019) 102425.
[10] S. Feng, C. Wu, Y. Zhang, S. Shen, Collaboration calibration and three-dimensional localization in multi-view system, Int. J. Adv. Robot. Syst. 15 (6) (2018) 1729881418813778.
[11] S. Feng, W. Cheng-Dong, Y. Zhang, Dynamic localization of mobile robot based on improved APIT, J. Beijing Univ. Posts Telecommun. 5 (2016) 67–71.
[12] C. Wu, S. Feng, Y. Zhang, Dynamic localization of mobile robot based on asynchronous Kalman filter, J. Northeastern Univ. (Nat. Sci.) 34 (3) (2013) 312–316.
[13] S. Feng, C.-d. Wu, Y.-z. Zhang, Z.-x. Jia, Grid-based improved maximum likelihood estimation for dynamic localization of mobile robots, Int. J. Distrib. Sens. Netw. 10 (3) (2014) 271547.
[14] B. Ziółko, D. Emms, M. Ziółko, Fuzzy evaluations of image segmentations, IEEE Trans. Fuzzy Syst. 26 (4) (2017) 1789–1799.
[15] S.E. Ahmed, S. Amiri, K. Doksum, Ensemble linear subspace analysis of high-dimensional data, Entropy 23 (3) (2021) 324.
[16] G. Beliakov, M. Gagolewski, S. James, Hierarchical data fusion processes involving the Möbius representation of capacities, Fuzzy Sets and Systems 433 (2022) 1–21, http://dx.doi.org/10.1016/j.fss.2021.02.006.
[17] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition: International conference on learning representations, 2015, arXiv preprint arXiv:1409.1556.
[18] M.D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: European Conference on Computer Vision, Springer, 2014, pp. 818–833.
[19] Y. Qin, X. Wang, Z. Xu, Ranking tourist attractions through online reviews: A novel method with intuitionistic and hesitant fuzzy information based on sentiment analysis, Int. J. Fuzzy Syst. 24 (2021) 755–777.
[20] J.-C. Guo, F.-B. Yan, G. Wan, X.-J. Hu, S. Wang, A deep learning method for the recognition of solar radio burst spectrum, PeerJ Comput. Sci. 8 (2022) e855.
[21] Y. He, X. Gong, Radio signal searching based on convolution neural network, Ordnance Ind. Autom. 36 (10) (2017) 88–92.
[22] S.R. Chitturi, D. Ratner, R.C. Walroth, V. Thampy, E.J. Reed, M. Dunne, C.J. Tassone, K.H. Stone, Automated prediction of lattice parameters from X-ray powder diffraction patterns, J. Appl. Crystallogr. 54 (6) (2021) 1799–1810.
[23] A. Malekloo, E. Ozer, M. AlHamaydeh, M. Girolami, Machine learning and structural health monitoring overview with emerging technology and high-dimensional data source highlights, Struct. Health Monit. 21 (4) (2022) 1906–1955.
[24] Q. Fan, Y.-C. Hsu, R.P. Lieli, Y. Zhang, Estimation of conditional average treatment effects with high-dimensional data, J. Bus. Econom. Statist. 40 (1) (2022) 313–327.
[25] Y. Zhang, R. Zhu, Z. Chen, J. Gao, D. Xia, Evaluating and selecting features via information theoretic lower bounds of feature inner correlations for high-dimensional data, European J. Oper. Res. 290 (1) (2021) 235–247.
[26] B. Guan, Y. Zhao, Y. Yin, Y. Li, A differential evolution based feature combination selection algorithm for high-dimensional data, Inform. Sci. 547 (2021) 870–886.
[27] T.C. Lux, L.T. Watson, T.H. Chang, Y. Hong, K. Cameron, Interpolation of sparse high-dimensional data, Numer. Algorithms 88 (1) (2021) 281–313.
[28] U. Laa, D. Cook, S. Lee, Burning sage: Reversing the curse of dimensionality in the visualization of high-dimensional data, J. Comput. Graph. Statist. (2021) 1–10.
[29] Y. Li, Y. Chai, H. Yin, B. Chen, A novel feature learning framework for high-dimensional data classification, Int. J. Mach. Learn. Cybern. 12 (2) (2021) 555–569.
[30] S. Salesi, G. Cosma, M. Mavrovouniotis, TAGA: Tabu asexual genetic algorithm embedded in a filter/filter feature selection approach for high-dimensional data, Inform. Sci. 565 (2021) 105–127.
[31] C.S. Wickramasinghe, D.L. Marino, M. Manic, ResNet autoencoders for unsupervised feature learning from high-dimensional data: Deep models resistant to performance degradation, IEEE Access 9 (2021) 40511–40520.
[32] Ç. Oğuz, M. Yağanoğlu, Detection of COVID-19 using deep learning techniques and classification methods, Inf. Process. Manage. 59 (5) (2022) 103025.
[33] E. Choi, K. An, K.-T. Kang, Deep-learning-based microfluidic droplet classification for multijet monitoring, ACS Appl. Mater. Interfaces 14 (13) (2022) 15576–15586.
[34] W. Chen, K. Shi, Multi-scale attention convolutional neural network for time series classification, Neural Netw. 136 (2021) 126–140.
[35] Y. Fan, W. Pang, S. Lu, HFPQ: Deep neural network compression by hardware-friendly pruning-quantization, Appl. Intell. 51 (2021) 7016–7028.
[36] H. Louati, S. Bechikh, A. Louati, C.-C. Hung, L.B. Said, Deep convolutional neural network architecture design as a bi-level optimization problem, Neurocomputing 439 (2021) 44–62.
[37] J. Qiu, C. Chen, S. Liu, H.-Y. Zhang, B. Zeng, Slimconv: Reducing channel redundancy in convolutional neural networks by features recombining, IEEE Trans. Image Process. 30 (2021) 6434–6445.
[38] S.-J. Wang, Y. He, J. Li, X. Fu, MESNet: A convolutional neural network for spotting multi-scale micro-expression intervals in long videos, IEEE Trans. Image Process. 30 (2021) 3956–3969.
[39] M. Zhang, H. Li, S. Pan, J. Lyu, S. Ling, S. Su, Convolutional neural networks based lung nodule classification: A surrogate-assisted evolutionary algorithm for hyperparameter optimization, IEEE Trans. Evol. Comput. 25 (5) (2021) 869–882.
[40] B. Xiao, B. Xu, X. Bi, W. Li, Global-feature encoding U-net (GEU-net) for multi-focus image fusion, IEEE Trans. Image Process. 30 (2020) 163–175.
[41] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014, arXiv preprint arXiv:1409.1556.
[42] I. Ding Jr., N.-W. Zheng, M.-C. Hsieh, Hand gesture intention-based identity recognition using various recognition strategies incorporated with VGG convolution neural network-extracted deep learning features, J. Intell. Fuzzy Systems 40 (4) (2021) 7775–7788.
[43] N. Sabor, G. Gendy, H. Mohammed, G. Wang, Y. Lian, Robust arrhythmia classification based on QRS detection and a compact 1D-CNN for wearable ECG devices, IEEE J. Biomed. Health Inf. (2022).
[44] M. Zhu, J. Xie, Investigation of nearby monitoring station for hourly PM2.5 forecasting using parallel multi-input 1D-CNN-biLSTM, Expert Syst. Appl. 211 (2023) 118707.
[45] J. He, P. Wu, Y. Tong, X. Zhang, M. Lei, J. Gao, Bearing fault diagnosis via improved one-dimensional multi-scale dilated CNN, Sensors 21 (21) (2021) 7319.
[46] M.G. Ragab, S.J. Abdulkadir, N. Aziz, Q. Al-Tashi, Y. Alyousifi, H. Alhussian, A. Alqushaibi, A novel one-dimensional CNN with exponential adaptive gradients for air pollution index prediction, Sustainability 12 (23) (2020) 10090.
[47] Z. Li, F. Liu, W. Yang, S. Peng, J. Zhou, A survey of convolutional neural networks: Analysis, applications, and prospects, IEEE Trans. Neural Netw. Learn. Syst. (2021) 1–21, http://dx.doi.org/10.1109/TNNLS.2021.3084827.
[48] B. Kang, I. Park, C. Ok, S. Kim, ODPA-CNN: One dimensional parallel atrous convolution neural network for band-selective hyperspectral image classification, Appl. Sci. 12 (1) (2021) 174.
[49] H. Dai, G. Huang, J. Wang, H. Zeng, F. Zhou, Prediction of air pollutant concentration based on one-dimensional multi-scale CNN-LSTM considering spatial-temporal characteristics: A case study of Xi'an, China, Atmosphere 12 (12) (2021) 1626.
[50] O. Cheikhrouhou, R. Mahmud, R. Zouari, M. Ibrahim, A. Zaguia, T.N. Gia, One-dimensional CNN approach for ECG arrhythmia analysis in fog-cloud environments, IEEE Access 9 (2021) 103513–103523.
[51] D. Neupane, Y. Kim, J. Seok, J. Hong, CNN-based fault detection for smart manufacturing, Appl. Sci. 11 (24) (2021) 11732.
[52] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2014, arXiv preprint arXiv:1412.6980.
[53] G.B. Moody, R.G. Mark, The impact of the MIT-BIH arrhythmia database, IEEE Eng. Med. Biol. Mag. 20 (3) (2001) 45–50.
[54] G.D. Clifford, C. Liu, B. Moody, H.L. Li-wei, I. Silva, Q. Li, A. Johnson, R.G. Mark, AF classification from a short single lead ECG recording: The PhysioNet/computing in cardiology challenge 2017, in: 2017 Computing in Cardiology (CinC), IEEE, 2017, pp. 1–4.
[55] V. Losing, B. Hammer, H. Wersing, Incremental on-line learning: A review and comparison of state of the art algorithms, Neurocomputing 275 (2018) 1261–1274.
