Rank 1

This paper presents a winning solution for the PhysioNet Challenge 2021, utilizing an ensemble of Residual CNNs with a multi-head attention mechanism for classifying ECG signals into 26 groups. The model incorporates various loss functions and optimizes classification thresholds using evolutionary methods, achieving the highest scores across all tested lead configurations. The solution demonstrates improved generalizability and consistent performance, ranking first among 39 teams.

Classification of ECG Using Ensemble of Residual CNNs with Attention Mechanism

Petr Nejedly, Adam Ivora, Radovan Smisek, Ivo Viscor, Zuzana Koscova, Pavel Jurak, Filip Plesinger

Institute of Scientific Instruments of the Czech Academy of Sciences, Brno, Czech Republic

Abstract

This paper introduces a winning solution (team ISIBrno-AIMT) to the PhysioNet Challenge 2021. The method is based on the ResNet deep neural network architecture with a multi-head attention mechanism for ECG classification into 26 independent groups. The model is optimized using a mixture of loss functions, i.e., binary cross-entropy, a custom challenge score loss function, and a sparsity loss function. Probability thresholds for each classification class are estimated using an evolutionary optimization method. The final model consists of three submodels forming a majority voting classification ensemble. The proposed method classifies ECGs with a variable number of leads, e.g., 12-lead, 6-lead, 4-lead, 3-lead, and 2-lead. The algorithm was validated and tested on the external hidden datasets (CPSC, G12EC, undisclosed set, UMich), achieving a challenge score of 0.58 for all tested lead configurations. The total training time was approximately 27 hours, i.e., 9 hours per model. The presented solution was ranked first across all 39 teams in all categories.

1. Introduction

Cardiovascular diseases are the most common cause of death globally, accounting for 32 percent of all deaths in 2019 [1]. Heart disorders are usually analyzed using electrocardiographic (ECG) signals with a length of 10-60 seconds, acquired from the body surface. The ECG signal shows the electrical activity of the heart atria and ventricles and therefore informs about heart rhythm and beat morphology.

Current automated algorithms to analyze the ECG signal are based on machine-learning (using expert features) or deep-learning methods. A specialty of deep-learning methods is that they extract features by themselves during the training process from a raw or transformed ECG signal. These deep-learning methods are usually based on convolutional layers and are called convolutional neural networks (CNN). The need to train very complex CNNs led to the invention of the Residual Network (ResNet) architecture [2], which implements residual blocks to improve gradient propagation during training.

For this Computing in Cardiology 2021 challenge [3], we introduce a solution using an ensemble of a custom variant of ResNet neural networks accompanied by a multi-head attention mechanism. This solution arises from our solution to last year's Computing in Cardiology 2020 challenge [4], where we used a ResNet-GRU with an attention mechanism [5]. With that method, we were able to achieve an acceptable validation score; however, the final test scores showed poor performance on the undisclosed testing database. This indicated that our method was able to classify data that originated from the same hospital very well, but it did not generalize to other institutions. This year's solution tries to address the drawbacks of the previous one by introducing several changes to our method.

The investigation of literature from the previous year's solutions [6] led us to change the preprocessing step by introducing data filtering to maximize generalizability across institutions. The filtering should narrow the ECG frequency band as much as possible (at the cost of possibly discarding some ECG information that might be useful). We believe that this might be a good idea since we are not aware of data quality, types of artifacts, and distortions in the undisclosed testing sets. Secondly, we utilize z-score normalization, while last year we used physical units in mV. In addition, models are trained using a custom loss function which consists of three parts, i.e., cross-entropy, custom challenge loss, and custom sparsity loss.

The custom challenge loss optimization was proposed by [7], where the continuous equivalent of the binary OR operator was used to design a differentiable approximation of the challenge metric. This helps the model to learn the similarity of diagnoses. Next, we introduce the custom sparsity loss, which forces the model to output probability values close to 0 or 1, which helps with the final threshold optimization to binarize the output. Lastly, the class-specific thresholds are found using a differential evolution genetic algorithm. The final model consists of three subunits creating the model ensemble.

Computing in Cardiology 2021; Vol 48 Page 1 ISSN: 2325-887X DOI: 10.22489/CinC.2021.014
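The two loss ingredients named in the Introduction, the sparsity penalty and the continuous OR, can be illustrated numerically. The sketch below is only an illustration and assumes the forms stated in Section 2, SL(p) = -4p(p - 1) and X OR Y ≈ X + Y - XY. The function names are hypothetical, and plain Python floats stand in for the tensors that would be used in training.

```python
def sparsity_loss(p: float) -> float:
    """Parabolic penalty: 0 at p = 0 or p = 1, maximal (1.0) at p = 0.5."""
    return -4.0 * p * (p - 1.0)


def soft_or(x: float, y: float) -> float:
    """Continuous counterpart of logical OR for values in [0, 1]."""
    return x + y - x * y


def soft_or_count(outputs, targets):
    """Sum of soft ORs over classes (hypothetical helper name)."""
    return sum(soft_or(x, y) for x, y in zip(outputs, targets))


if __name__ == "__main__":
    # The penalty grows as the probability moves toward 0.5.
    for p in (0.0, 0.1, 0.5, 0.9, 1.0):
        print(f"p={p:.1f}  SL={sparsity_loss(p):.2f}")
    # On binary inputs soft_or reproduces the truth table of OR.
    print([soft_or(a, b) for a in (0, 1) for b in (0, 1)])
```

Because both expressions are polynomials in the probabilities, they stay differentiable, which is what allows them to be used inside a gradient-based training loop.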

2. Methods

For this challenge, we have introduced a fully autonomous cloud-based solution for training and deployment of deep-learning models utilizing publicly available Python libraries such as NumPy, SciPy, scikit-learn and PyTorch. For training and validation, the public challenge dataset was split into two sub-datasets in ratios of 80 percent and 20 percent, respectively. The dataset stratification was iteratively optimized by a method available in scikit-multilearn based on [8]. The data preprocessing consists of several steps described below:
1. Provided data are expanded into a fixed 12-lead configuration. If any lead is missing, the particular matrix row is filled with zeros. This transformation always outputs a matrix with dimensions (12, time).
2. Resampling: Data are resampled to the sampling frequency of 500 Hz. Polyphase filtering is used when the original sampling frequency is 1000 Hz; otherwise, the FFT method is used for resampling.
3. Filtering: Data are filtered using a zero-phase method with a 3rd-order Butterworth bandpass filter with a frequency band from 1 Hz to 47 Hz.
4. Normalization: Each ECG channel is normalized using a z-score.
5. Zero-padding: Data are zero-padded to 8192 samples in the time domain. If a signal is longer than 8192 samples, a randomly positioned segment of 8192 samples is cut from it.
6. Augmentation: During the training phase, the lead configuration is chosen randomly (e.g., 12, 6, 4, 3, 2). Leads that are not used are filled with zeros.

The model architecture is built from custom ResNet blocks that utilize large convolution kernels (size 15 in the first convolutional layer and size 9 in the subsequent residual convolutional layers, with a stride of 2). The output from the convolutional layers is forwarded through the multi-head attention mechanism and subsequently pooled with adaptive max pooling. The resulting feature vector is concatenated with a binary ECG lead indicator and classified by fully connected layers.

The model has two output heads. The first head outputs the logits forwarded into the BCE loss function. The second output forms an additional small neural network that processes logits from the first output head and optimizes the challenge score and the sparsity of probabilities, i.e., the challenge loss and the sparsity loss. The small network does not propagate gradients into the bottom layers. This means that the bottom layers of the model, which perform feature extraction, are optimized by BCE, and the top layer that outputs the probabilities is optimized by the challenge and sparsity losses.

The model is trained using the Adam optimizer for 50 epochs with a learning rate of 1e-3, a batch size of 128, and an L2 regularization parameter of 1e-4, while reducing the learning rate by a factor of 0.1 after every 20th epoch. The optimization loss function is composed of three units, i.e., binary cross-entropy (BCE), custom challenge loss (CL), and custom sparsity loss (SL). The custom challenge loss (a differentiable approximation of the challenge score) forces the network to maximize the challenge score, accounting for class weights. The method was proposed by [7] during the previous year of the challenge. In addition, we propose a sparsity loss derived from the parabolic curve that penalizes the network for outputting probability values that are close to 0.5, thus forcing it to output probability values close to 0 or 1, which helps with the final threshold optimization.

$$\mathrm{Loss} = \sum_{\mathrm{batch}} \mathrm{BCE}(t, p) - \mathrm{CL}(t, p) + \mathrm{SL}(p) \tag{1}$$

$$\mathrm{SL}(p) = -4p(p - 1) \tag{2}$$

$$\mathrm{CL}(t, p) = \sum_{ij} w_{ij}\, a_{ij}(t, p) \tag{3}$$

In general, the inputs to the loss functions are targets $t$ and probabilities $p$. The challenge loss is estimated from the modified multi-class confusion matrix entries $a_{ij}$ and their corresponding weights $w_{ij}$. The authors of [7] proposed to estimate the normalization constant $N$ for the modified confusion matrix entries $a_{ij}$ using a continuous version of the logical OR function, which makes the loss function differentiable.

$$N = \sum_{i} (X_i \lor Y_i) \approx \sum_{i} (X_i + Y_i - X_i Y_i) \tag{4}$$

where $X_i$ and $Y_i$ are the outputs and targets for a given class $i$, respectively. Since we are interested in maximizing the challenge score, we invert the sign of the challenge loss in Eq. 1 and use standard minimization optimizers.

The model with the best validation performance is selected, and class thresholds are optimized by a differential evolution genetic algorithm. In general, this process requires a large amount of computation since we are exploring a 26-dimensional space with bounds from 0 to 1. The benefit of the sparsity loss is that the majority of model probability outputs are located close to 0 or 1, which speeds up the threshold optimization. The estimated class-specific thresholds are the same for all lead configurations. The random dataset split, model training, and threshold optimization are repeated three times to create the model ensemble. Each model in the ensemble outputs binary indicators for each class; the final output for each class is therefore decided by majority vote.

3. Results

In a local validation, we achieved a score of 0.69; only 12-lead performance was investigated. However, this result is biased since the local validation set is used for

[Figure 1. Block labels recoverable from the image: Input, Conv, BatchNorm, LeakyReLU, AvgPool, Dropout, ResBlock, MultiHeadAttention (Q, K, V), GlobalMaxPool, Lead Indicator, Concat, Linear, Sigmoid, Output #1 (BCE Loss), Output #2 (Challenge Loss, Sparsity Loss). Pipeline labels: Public Dataset, Iterative Stratification, Training: 80%, Validation: 20%, Training-Validation Loop: 50 epochs, Adam, StepLR, Differential Evolution, Class Thresholds, Save model and threshold, Ensemble x3.]

Figure 1: Residual block (a), Model architecture (b), Training pipeline (c)
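The training pipeline in Figure 1 includes a differential evolution step that searches for one threshold per class within the bounds 0 to 1. The following is a minimal sketch of such a search, not the authors' implementation: it is a hand-written rand/1/bin differential evolution in pure Python, the objective is plain label accuracy used as a stand-in for the challenge metric (which is not reproduced here), and the toy data, function names, and hyperparameters (`pop_size`, `f`, `cr`) are all assumptions made for illustration.

```python
import random


def binarize(probs, thresholds):
    """Apply one threshold per class to a list of probability rows."""
    return [[int(p >= t) for p, t in zip(row, thresholds)] for row in probs]


def accuracy(probs, targets, thresholds):
    """Toy objective: fraction of correct labels (stand-in for the challenge metric)."""
    preds = binarize(probs, thresholds)
    hits = sum(p == t for pr, tr in zip(preds, targets) for p, t in zip(pr, tr))
    return hits / (len(targets) * len(targets[0]))


def optimize_thresholds(probs, targets, pop_size=20, generations=50,
                        f=0.8, cr=0.7, seed=0):
    """Minimal rand/1/bin differential evolution over thresholds in [0, 1]."""
    rng = random.Random(seed)
    dim = len(probs[0])
    # Seed the population with the default 0.5 thresholds so the result
    # can never score below that baseline.
    pop = [[0.5] * dim] + [[rng.random() for _ in range(dim)]
                           for _ in range(pop_size - 1)]
    scores = [accuracy(probs, targets, ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(1.0, max(0.0, v)))  # clip to the bounds
                else:
                    trial.append(pop[i][d])
            s = accuracy(probs, targets, trial)
            if s >= scores[i]:  # greedy selection (maximization)
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Toy data: 3 classes, probabilities skewed so that 0.5 is a poor threshold.
probs = [[0.30, 0.80, 0.55], [0.10, 0.70, 0.65],
         [0.35, 0.20, 0.90], [0.05, 0.60, 0.60]]
targets = [[1, 1, 0], [0, 1, 0], [1, 0, 1], [0, 0, 0]]
thresholds, score = optimize_thresholds(probs, targets)
print(thresholds, score)
```

In this toy setting each threshold only matters relative to a handful of probability values, so the objective is piecewise constant; the same property holds for real model outputs that cluster near 0 or 1, which is consistent with the paper's remark that the sparsity loss speeds up the search.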

Ranking  Team          Validation  CPSC test  G12EC test  Undisclosed test  UMich test  Test set
1        ISIBrno-AIMT  0.63        0.71       0.61        0.54              0.59        0.58
2        DSAIL-SNU     0.59        0.61       0.59        0.54              0.57        0.57
3        NIMA          0.63        0.76       0.60        0.44              0.58        0.55

Table 1: Challenge scores for top 3 teams in all-leads category.
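The scored entry in Table 1 is an ensemble: according to Section 2, each of the three submodels binarizes its probabilities with its own class-specific thresholds, and the final label for each class is decided by majority vote. A minimal sketch of that last step is shown below; the probabilities, thresholds, and function names are hypothetical, and only four classes are used instead of 26 to keep the example readable.

```python
def binarize(probs, thresholds):
    """Turn class probabilities into 0/1 indicators using per-class thresholds."""
    return [int(p >= t) for p, t in zip(probs, thresholds)]


def majority_vote(votes):
    """Set a class label when more than half of the models voted for it."""
    n_models = len(votes)
    return [int(2 * sum(column) > n_models) for column in zip(*votes)]


# Hypothetical outputs of three submodels for one recording (four classes).
model_probs = [[0.92, 0.10, 0.55, 0.03],
               [0.88, 0.40, 0.45, 0.08],
               [0.35, 0.05, 0.70, 0.02]]
# Hypothetical class-specific thresholds, one list per submodel.
model_thresholds = [[0.5, 0.3, 0.6, 0.5],
                    [0.5, 0.3, 0.4, 0.5],
                    [0.5, 0.3, 0.6, 0.5]]

votes = [binarize(p, t) for p, t in zip(model_probs, model_thresholds)]
labels = majority_vote(votes)
print(votes)   # per-model binary indicators
print(labels)  # ensemble decision per class
```

With three voters a tie cannot occur, so no tie-breaking rule is needed in this configuration.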

the selection of the best model and subsequent automatic threshold optimization. In addition, we performed a generalization test by evaluating model performance on an external dataset (Hefei high tech Cup - ECG Human-Machine Intelligence Competition held by the Tianchi platform) [9], and achieved a challenge score of 0.77 (not all scored classes were present). Lastly, we performed another generalization test by excluding the G12EC dataset from training and using it as a full holdout, achieving a challenge score of 0.53.

The performance of our algorithm (ISIBrno-AIMT team) was estimated on the hidden validation set during the official phase of the challenge. Tab. 1 shows a comparison of the three best-performing teams. Tab. 2 shows our validation challenge scores for specific lead configurations. For the hidden test set, we received a score of 0.58 across all configurations. The total training time was approximately 27 hours, i.e., 9 hours per model.

Leads  Validation  Test  Ranking
12     0.64        0.58  1st
6      0.62        0.58  1st
4      0.63        0.58  1st
3      0.63        0.58  1st
2      0.62        0.58  1st

Table 2: Challenge scores for our final selected entry (team ISIBrno-AIMT) scored on the hidden validation set, and one-time scoring on the hidden test set as well as the ranking on the hidden test set.

4. Discussion and Conclusions

This paper introduces a method for the classification of ECG with a variable number of leads. We have developed a Residual CNN network with an attention mechanism optimized by a mixture of loss functions, i.e., binary cross-entropy, a differentiable approximation of the challenge score, and a sparsity loss function. Subsequently, a differential evolution algorithm was used for class-specific threshold optimization.

In comparison to our previous solution [5] from the PhysioNet/CinC challenge 2020 [4], we have improved the preprocessing steps (filtering, normalization, and data augmentation). We have also updated the architecture of the model using larger convolutional kernels. In the presented solution, we performed local tests to check the generalization abilities of the model. We believe that signal filtering in the narrow frequency range of 1-47 Hz helped with the generalization of our model, the lack of which was probably the critical drawback of last year's solution. We also introduced the sparsity loss, whose properties help with the class-specific threshold optimization.

In this paper, we described our solution to the PhysioNet/CinC Challenge 2021, which performed the best across all teams and categories. The presented model showed consistent classification performance across all lead configurations, answering the challenge topic "Will Two Do?".

Acknowledgments

This work was supported with a project by the Czech Academy of Sciences RVO: 68081731 and with a project by the Czech Technological Agency: FW01010305.

References

[1] WHO Cardiovascular diseases (CVDs). https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds). Accessed: 2021-08-30.
[2] He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. arXiv 2015;.
[3] Reyna MA, Sadr N, Perez Alday EA, Gu A, Shah A, Robichaux C, et al. Will Two Do? Varying Dimensions in Electrocardiography: the PhysioNet/Computing in Cardiology Challenge 2021. Computing in Cardiology 2021;48:1–4.
[4] Perez Alday EA, Gu A, Shah A, Robichaux C, Wong AKI, Liu C, et al. Classification of 12-lead ECGs: the PhysioNet/Computing in Cardiology Challenge 2020. Physiological Measurement 2020;41.
[5] Nejedly P, Ivora A, Viscor I, Halamek J, Jurak P, Plesinger F. Utilization of Residual CNN-GRU With Attention Mechanism for Classification of 12-lead ECG. Computing in Cardiology 2020;1–4.
[6] Natarajan A, Chang Y, Mariani S, Rahman A, Boverman G, Vij S, et al. A Wide and Deep Transformer Neural Network for 12-Lead ECG Classification. Computing in Cardiology 2020;1–4.
[7] Vicar T, Hejc J, Novotna P, Ronzhina M, Janousek O. ECG Abnormalities Recognition Using Convolutional Network With Global Skip Connections and Custom Loss Function. Computing in Cardiology 2020;1–4.
[8] Sechidis K, Tsoumakas G, Vlahavas I. On the Stratification of Multi-label Data. Machine Learning and Knowledge Discovery in Databases 2011;145–158.
[9] Wang D, Meng Q, Chen D, Zhang H, Xu L. Automatic Detection of Arrhythmia Based on Multi-Resolution Representation of ECG Signal. Sensors 2020;20(6).

Address for correspondence:

Petr Nejedly
Královopolská 147, 612 00 Brno, Czech Republic
[email protected]