Base Paper
A New Multi-Stage Combined Kernel Filtering Approach for ECG Noise
Removal
PII: S0022-0736(17)30418-1
DOI: doi: 10.1016/j.jelectrocard.2017.10.009
Reference: YJELC 52524
Please cite this article as: Tayel Mazhar B., Eltrass Ahmed S., Ammar Abeer I., A New
Multi-Stage Combined Kernel Filtering Approach for ECG Noise Removal, Journal of
Electrocardiology (2017), doi: 10.1016/j.jelectrocard.2017.10.009
This is a PDF file of an unedited manuscript that has been accepted for publication.
As a service to our customers we are providing this early version of the manuscript.
The manuscript will undergo copyediting, typesetting, and review of the resulting proof
before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that
apply to the journal pertain.
ACCEPTED MANUSCRIPT
A New Multi-Stage Combined Kernel Filtering Approach for
ECG Noise Removal
Mazhar B. Tayel (a), Ahmed S. Eltrass* (b), Abeer I. Ammar (c)
(a, b, c) Department of Electrical Engineering, Faculty of Engineering, Alexandria University, Alexandria 21544, Egypt.
Abstract
Electrocardiogram (ECG) signals are contaminated with different artifacts and noise sources, which make it
harder to analyze the ECG signals and to obtain an accurate diagnosis of heart diseases. In this paper, a
new multi-stage combined adaptive filtering design based on the Kernel Recursive Least Squares Tracker
(KRLST) and Kernel Recursive Least Squares with Approximate Linear Dependency (ALDKRLS)
algorithms is proposed for removing artifacts and noise sources while preserving the low-frequency
components and the tiny features of the ECG signal. The capability of the proposed approach is demonstrated
by investigating several ECG signals from the MIT-BIH database and comparing the results with other
existing adaptive filtering techniques. The results show that the combined ALDKRLS-KRLST approach is
superior in terms of attenuating artifact components, sensitivity of ECG peak detection, and heart
disease diagnosis. This demonstrates the effectiveness of the proposed technique as a framework for
obtaining high-resolution ECG from noisy ECG recordings.
Keywords: ECG; Adaptive Filters; Least Mean Square; Recursive Least Square; Kernel; Baseline Wander; Power Line
Interference; ElectroMyoGraphy;
List of Abbreviations
B2P : Back-to-the-Prior
BSS : Blind Source Separation
BW : Baseline Wander
ECG : ElectroCardioGram
EMD : Empirical Mode Decomposition
EMG : ElectroMyoGraphy
FNs : False Negatives
GPs : Gaussian Processes
ICA : Independent Component Analysis
KAF : Kernel Adaptive Filter
KLMS : Kernel Least Mean Square
KRLST : Kernel Recursive Least Squares Tracker
LMS : Least Mean Square
MIT-BIH : Massachusetts Institute of Technology–Beth Israel Deaconess Medical Center
MSE : Mean Square Error
NKLMS : Normalized Kernel Least Mean Square
1. Introduction
The ElectroCardioGram (ECG) is a representation of the electrical activity of the heart muscles and is
widely used in clinical studies for the interpretation and identification of heart disorders. According to the
most recent statistics provided by the World Health Organization (WHO), heart diseases remain the leading
specific cause of mortality in every region of the world. The ECG signal is a low-amplitude voltage signal that
is easily distorted by different noise sources, which cause poor signal quality and inaccurate clinical
diagnosis. The most common noise sources are the Power Line Interference (PLI), which consists of a 60 Hz
(or 50 Hz in some countries) sinusoid and its harmonics; the Baseline Wander (BW), which results from
respiration or patient movement and has frequencies ranging from 0.15 Hz to 3 Hz; and the ElectroMyoGraphy
(EMG) noise, which is a broad-spectrum electrical potential generated by muscle cells [1]. These noise
sources should be suppressed or cancelled to obtain reliable, high-quality ECG recordings and
consequently to provide accurate heart disease diagnosis. ECG noise removal is very important for accurate
analysis in many ECG applications, e.g., beat classification [2], QRS detection [3], analysis of asymptomatic
arrhythmia [4], fetal ECG signal extraction from the maternal abdominal ECG [5], ECG signal data
compression [6], and the detection of heart diseases or disorders [1].
ECG noise removal is a challenging and crucial task because the ECG noise is time-varying and
non-stationary and, in particular, overlaps spectrally with the ECG signal. Several approaches have been
proposed for removing different noise sources from the ECG signal. Blind Source Separation (BSS)
techniques such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA)
require linearly independent multichannel ECG recordings to be utilized in ECG noise removal [7]. These
techniques provide good performance but at a high computational cost and with a large amount of operating
memory. Over the past several years, methods based on the Wavelet Transform (WT) were proposed to filter
signals that have multi-resolution characteristics such as the ECG signal [8, 9]. Recently, artificial neural
networks have also been employed to remove Gaussian and Baseline Wander (BW) noise in ECG signals
[10]. One of the common and effective methods used in ECG noise removal is the adaptive filter architecture,
which is based on minimizing the error between the input ECG signal contaminated with different noise
sources and a reference signal that is well correlated with the ECG noise source, in order to estimate the
noise characteristics [11]. Adaptive filtering techniques have superior performance in tracking the
non-stationary changes in both the ECG signal and the noise source.
The aim of this work is to propose a new multi-stage kernel adaptive filtering approach for ECG noise
removal with high accuracy and low computational cost. The proposed approach is based on the Kernel Recursive
Least Squares with Approximate Linear Dependency (ALDKRLS) [12] and Kernel Recursive Least Squares
Tracker (KRLST) [13] algorithms. The remainder of this paper is structured as follows. Section 2
presents the structure of Kernel Adaptive Filtering (KAF), followed by an explanation of the proposed
multi-stage ALDKRLS-KRLST filtering design for ECG noise removal. Section 3 discusses the results of the
proposed filtering approach, followed by a critical comparison with other existing adaptive filtering
techniques. The sensitivity performance and the assessment of classification accuracy are also presented in
section 3. The discussion and conclusions are provided in section 4.
2. Theory
Conventional filters are not efficient in ECG noise removal because of the nonlinearity of the ECG data and the
overlapping spectra of the ECG signal and the noise, while adaptive filters are capable of resolving these
difficulties. An adaptive filter is a digital filter that self-adjusts its transfer function according to a
linear or nonlinear optimization algorithm driven by an input signal from an unknown environment [14]. Linear
adaptive filtering algorithms perform well only with linear data. For nonlinear data like the ECG signal,
the mapping between the desired signal and the input signal is nonlinear, so nonlinear adaptive filters
achieve better performance than linear filters. The Kernel Adaptive Filter (KAF) is a type of nonlinear adaptive
filter. Different KAFs have already been utilized in ECG noise removal, such as the Kernel Least Mean Squares
(KLMS) and the Normalized KLMS (NKLMS) [15].
2.1. Kernel Adaptive Filtering (KAF)
Kernel methods offer an efficient online solution to many problems in nonlinear signal
processing. The main idea of kernel filtering is to transform the input data into a high-dimensional feature
space in order to use the linear structure of this space to implement well-established linear adaptive algorithms
such as the LMS [16] and RLS [17]. Kernel methods have the advantages of having convex loss functions
and of being only moderately complex to implement [14, 18].
Consider N pairs of training data {𝑥(𝑖), 𝑦(𝑖)}, 𝑖 = 1, …, 𝑁, where 𝑥(𝑖) is the input vector at time 𝑖 and 𝑦(𝑖) is the
output signal, and let 𝑓𝑖(.) be the learned function at time 𝑖. When a new sample is available at time 𝑖, the
function 𝑓𝑖(.) makes a prediction 𝑦̂(𝑖) = 𝑓𝑖(𝑥(𝑖)). The aim of online learning is to update the prediction
function sequentially by minimizing the cost or loss function 𝐽. With kernel methods, the prediction function
𝑓(.) usually takes the form:
$$f(x) = \langle W, \varphi(x) \rangle_{H} = W^{T} \varphi(x) \qquad (1)$$
where 𝜑(𝑥) is the transformed data, i.e., the nonlinear mapping of the input data induced by a Mercer kernel
𝑘(𝑥, 𝑥′), which maps the input 𝑥 to a high-dimensional feature space 𝐻, 𝑊 is a weight vector in 𝐻 that should
be estimated, and 〈. , .〉𝐻 denotes the inner product in 𝐻. Updating the prediction function amounts to
updating the weight vector 𝑊 by minimizing the loss function 𝐽. The loss function 𝐽 in kernel adaptive
filtering is chosen to be the Mean Square Error (MSE) because of its favourable properties such as smoothness,
low computational complexity, and convexity, which is a very important feature that prevents the algorithms
from becoming stuck in a local minimum when solving the optimization problem of minimizing the loss function 𝐽.
The loss function 𝐽 can be expressed as:
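For the MSE criterion, one standard form that is consistent with Eq. (1) is sketched below. It is written here as an assumption (a sample average over the N training pairs); the normalization used by the authors may differ.

```latex
J(W) = \frac{1}{N} \sum_{i=1}^{N} \bigl( y(i) - W^{T} \varphi(x(i)) \bigr)^{2}
```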
Linear algorithms such as LMS and RLS are then applied to calculate 𝑊 and obtain the optimum solution.
Note that the online system requires updating its solution when new data becomes available, and the
functional representation of classical kernel-based algorithms increases linearly with the number of processed
data. This leads to a growing complexity for each consecutive update.
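The growth described above can be illustrated with a minimal sketch of a kernel LMS filter that uses no sparsification, so every processed sample adds one term to the kernel expansion. The class name `NaiveKLMS`, the step size, the kernel width, and the synthetic sine target are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def gaussian_kernel(x, centers, width):
    """Gaussian kernel between one input x and every row of centers."""
    diff = centers - x
    return np.exp(-np.sum(diff * diff, axis=1) / (2.0 * width ** 2))


class NaiveKLMS:
    """Kernel LMS without sparsification: one stored centre per sample."""

    def __init__(self, step_size=0.5, width=0.3):
        self.step_size = step_size  # hypothetical value
        self.width = width          # hypothetical kernel width
        self.centers = []           # stored inputs (grows with every update)
        self.coeffs = []            # expansion coefficients

    def predict(self, x):
        if not self.centers:
            return 0.0
        k = gaussian_kernel(np.asarray(x, dtype=float),
                            np.asarray(self.centers), self.width)
        return float(np.dot(self.coeffs, k))

    def update(self, x, y):
        err = y - self.predict(x)
        # Every sample becomes a new basis, so memory and the cost of
        # predict() grow linearly with the number of processed samples.
        self.centers.append(np.asarray(x, dtype=float))
        self.coeffs.append(self.step_size * err)
        return err


rng = np.random.default_rng(0)
inputs = rng.uniform(-1.0, 1.0, size=(200, 1))
targets = np.sin(3.0 * inputs[:, 0])   # synthetic nonlinear target

model = NaiveKLMS()
errors = [model.update(x, y) for x, y in zip(inputs, targets)]
print("stored bases after 200 samples:", len(model.centers))
```

After 200 updates the expansion holds 200 bases, which is the linear growth that sparsification criteria are meant to control.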
In order to obtain feasible online kernel algorithms, complexity growth can be reduced by representing the
solution using only a subset of relevant bases selected according to a certain criterion. Most KAFs are designed
specifically for stationary data, which is not the case for ECG recordings. Therefore, the non-stationary
behaviour of ECG data should be tracked with KAF algorithms that include a mechanism for computing
a solution that gives more weight to more recent data. Based on these considerations, the KRLS
Tracker (KRLST) algorithm [13] is employed, for the first time, to track non-stationary ECG data that
exhibit nonlinear relationships by forgetting past information and by tracking changes in the target latent
function. In this algorithm, a sensible tracking mechanism is obtained by handling the uncertainty about
the input–output relationship, which can be considered as the latent function, and by studying the problem of
how older data should be forgotten. For a set of input/output pairs, the KRLST filter predicts an unknown
output given the corresponding input and the available data at time 𝑡 using two main steps: (1) the
formulation of a probabilistic framework based on Gaussian Processes (GPs) and (2) the strategy for forgetting
past information (see the Appendix for a detailed explanation).
2.2.2 ALDKRLS Algorithm
The second challenge with kernel algorithms is that the size of the solution increases linearly with the
number of training data. This may degrade the computational efficiency and increase the system complexity when
applying kernel filters in ECG noise removal. In order to control this growing structure in the proposed filter,
the ALDKRLS algorithm [12] is utilized. In this algorithm, the computational efficiency is enhanced by
incorporating the Approximate Linear Dependency (ALD) sparsification technique into the KRLS algorithm.
Given a new sample, the ALD algorithm distinguishes between two cases: (1) the sample is approximately
dependent on past samples and (2) the sample is not approximately dependent on past samples. Each sample
is added to the dictionary, or not, according to an important parameter called the sensitivity threshold, which
decides whether a basis is accepted into the dictionary.
The ALD online prediction setup assumes a sequential stream of input/output pairs
{(𝑥1, 𝑦1), (𝑥2, 𝑦2), …}, 𝑥𝑖 ∈ 𝜒, 𝑦𝑖 ∈ ℝ. Assume that at time step 𝑡, after having observed 𝑡 − 1 training samples
$\{x_i\}_{i=1}^{t-1}$, a dictionary consisting of a subset of the training samples, $D_{t-1} = \{\hat{x}_j\}_{j=1}^{m_{t-1}}$, has been collected.
When a new sample 𝑥𝑡 is presented, the algorithm tests whether 𝜙(𝑥𝑡) is approximately linearly dependent on the
dictionary vectors. If not, 𝑥𝑡 is added to the dictionary. Consequently, all training samples up to
time 𝑡 can be approximated by linear combinations of the vectors in 𝐷𝑡. To avoid adding the training sample
𝑥𝑡 to the dictionary, coefficients $a = (a_1, \ldots, a_{m_{t-1}})^{T}$ that satisfy the Approximate Linear Dependence
(ALD) condition are sought:
$$\delta_t \overset{\text{def}}{=} \min_{a} \Bigl\| \sum_{j=1}^{m_{t-1}} a_j \phi(\hat{x}_j) - \phi(x_t) \Bigr\|^{2} \le \upsilon \qquad (3)$$
where 𝜐 is the ALD sensitivity threshold (the level of sparsity), which must be in the range [0, 0.05] to
minimize the MSE [12], 𝜙 is the feature mapping induced by the kernel function, 𝛿𝑡 is the squared distance
between 𝜙(𝑥𝑡) and the subspace spanned by the dictionary vectors, and 𝑎𝑗 are the entries of
the coefficient vector 𝑎. If the ALD condition holds, 𝜙(𝑥𝑡) can be approximated by a linear combination of the
current dictionary items. By performing the minimization in (3), the optimal coefficient vector can be
obtained [12, 19].
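As a rough illustration of the ALD test, the sketch below uses the closed-form solution of the minimization in Eq. (3), where the optimal coefficients are a = K⁻¹k and the residual is δ = k(x_t, x_t) − kᵀa, with K the kernel matrix of the dictionary and k the kernel vector between the dictionary and the new sample. The function names, the kernel width, and the small `jitter` term added for numerical stability are assumptions made for this example. A practical ALDKRLS implementation would update K⁻¹ recursively instead of solving a linear system at every step.

```python
import numpy as np


def gaussian_kernel(a, b, width):
    """Gaussian kernel between two input vectors."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * width ** 2)))


def ald_test(dictionary, x_new, threshold, width=1.0, jitter=1e-10):
    """Return (delta, coeffs, is_dependent) for a candidate sample.

    delta is the squared feature-space distance between the new sample
    and the span of the dictionary; the sample is treated as
    approximately linearly dependent when delta <= threshold.
    """
    m = len(dictionary)
    K = np.array([[gaussian_kernel(u, v, width) for v in dictionary]
                  for u in dictionary])
    k_vec = np.array([gaussian_kernel(u, x_new, width) for u in dictionary])
    # jitter is a hypothetical regularizer that keeps the solve well-posed
    coeffs = np.linalg.solve(K + jitter * np.eye(m), k_vec)
    delta = gaussian_kernel(x_new, x_new, width) - float(k_vec @ coeffs)
    return delta, coeffs, delta <= threshold


def build_dictionary(samples, threshold, width=1.0):
    """Keep only the samples that fail the ALD test (novel samples)."""
    dictionary = [np.asarray(samples[0], dtype=float)]
    for x in samples[1:]:
        _, _, dependent = ald_test(dictionary, x, threshold, width)
        if not dependent:
            dictionary.append(np.asarray(x, dtype=float))
    return dictionary


samples = [[0.0], [0.0], [5.0]]
dictionary = build_dictionary(samples, threshold=1e-4)
print("dictionary size:", len(dictionary))
```

In this toy stream the repeated sample is rejected as dependent and the distant one is accepted, so the dictionary holds two of the three samples.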
In the current work, the advantages of the ALDKRLS and KRLST algorithms are combined in a new
three-stage design for ECG noise removal. The first and second stages employ the ALDKRLS algorithm to
remove the 60 Hz PLI and BW, respectively, while the KRLST algorithm is utilized in the third stage to
remove the EMG artifacts. The basic building block of the proposed ECG noise removal design is shown in
Fig. 1. As shown in Fig. 1, the primary input is the desired ECG signal 𝑆1 contaminated with a noise 𝑛1, and the
reference input is a noise 𝑛2 which is correlated with the contaminating noise 𝑛1. This reference input can be
BW, PLI, or EMG artifacts. Neither noise source, 𝑛1 or 𝑛2, should be correlated with the signal 𝑆1. The
adaptive algorithm adapts the coefficients 𝑊 of the filter to generate an output 𝑦 similar to the noise
contaminating the ECG input signal. Then, this output 𝑦 is subtracted from the input signal to compute
the filter error 𝑒 = (𝑆1 + 𝑛1) − 𝑦, which is minimized by adapting the filter coefficients 𝑊.
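[Figure: block diagram. The primary input 𝑠1 + 𝑛1 and the reference input 𝑛2 feed an adaptive FIR digital filter
with weights 𝑊, updated by the adaptive algorithm (ALDKRLS or KRLST); the filter output 𝑦 is subtracted from the
primary input to give the error 𝑒.]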
Fig. 1. Adaptive filter structure for noise removal [20].
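[Fig. 2. Block diagram of the combined ALDKRLS-KRLST multi-stage filter. The input 𝑥 = 𝑠1 + 𝑛1 passes through three
cascaded stages: stage 1 (ALDKRLS, weights W1) takes the 60 Hz PLI reference 𝑥1 and produces the estimate 𝑦1 and
error 𝑒1; stage 2 (ALDKRLS, weights W2) takes the BW reference 𝑥2 and produces 𝑦2 and 𝑒2; stage 3 (KRLST, weights W3)
takes the EMG reference 𝑥3 and produces 𝑦3 and 𝑒3.]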
Fig. 2 illustrates the detailed design of the combined ALDKRLS-KRLST multi-stage filter. The input
signal to the first stage is the ECG signal 𝑆1 contaminated with different artifacts 𝑛1: PLI, BW, and EMG.
A reference signal of 60 Hz PLI is introduced to the first stage to find the optimum estimate 𝑦1 of the
PLI interference. The PLI estimate 𝑦1 is then subtracted from the input ECG signal (𝑆1 + 𝑛1) to obtain
the output of the first stage 𝑒1, which is the ECG signal without the PLI. The output 𝑒1 is then forwarded to
the second stage as its input signal. The same procedure is repeated in the second and third stages to remove
the BW and EMG artifacts, respectively. The final output 𝑒3 is the filtered ECG signal after removing all
artifacts.
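The cascade structure, in which each stage subtracts its noise estimate and forwards the error to the next stage, can be sketched as follows. To keep the example short, each stage uses a plain normalized LMS (NLMS) update as a linear stand-in; it is not the ALDKRLS or KRLST algorithm. The synthetic 7 Hz "clean" signal, the sinusoidal stand-ins for PLI and BW, the quadrature references, and the step size are all assumptions made for illustration, and only two of the three stages are shown.

```python
import numpy as np


def nlms_stage(primary, reference, step_size=0.01, eps=1e-8):
    """One noise-cancelling stage: returns e[n] = primary[n] - y[n].

    reference has shape (n_samples, n_taps); y[n] is the filter's
    current estimate of the noise contained in primary[n].
    """
    weights = np.zeros(reference.shape[1])
    error = np.empty(len(primary))
    for n in range(len(primary)):
        u = reference[n]
        y = weights @ u
        error[n] = primary[n] - y
        weights += step_size * error[n] * u / (u @ u + eps)
    return error


def cascade(primary, references, step_size=0.01):
    """Feed the error output of each stage into the next stage."""
    signal = primary
    for ref in references:
        signal = nlms_stage(signal, ref, step_size)
    return signal


def quadrature_reference(freq_hz, t):
    """Sine and cosine at freq_hz, so amplitude and phase can both adapt."""
    phase = 2.0 * np.pi * freq_hz * t
    return np.column_stack([np.sin(phase), np.cos(phase)])


fs = 360.0                                       # samples per second
t = np.arange(3600) / fs
clean = np.sin(2.0 * np.pi * 7.0 * t)            # stand-in for the ECG
pli = 0.5 * np.sin(2.0 * np.pi * 60.0 * t + 0.7)
bw = 0.8 * np.sin(2.0 * np.pi * 0.5 * t + 1.9)
noisy = clean + pli + bw

filtered = cascade(noisy, [quadrature_reference(60.0, t),
                           quadrature_reference(0.5, t)])

half = len(t) // 2                               # skip the adaptation transient
mse_before = np.mean((noisy[half:] - clean[half:]) ** 2)
mse_after = np.mean((filtered[half:] - clean[half:]) ** 2)
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.4f}")
```

Replacing `nlms_stage` with a kernel-based stage would keep the same cascade wiring, since each stage only needs a primary input and a reference.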
Table 1 shows the parameters of the ALDKRLS and the KRLST filters used in the ECG noise removal
design. The kernel type is chosen to be Gaussian for both filters with a kernel parameter of 32. The forgetting
factor λ ∈ (0, 1] controls how quickly past data are forgotten: λ = 1 corresponds to no forgetting (stationary
data), while 0 < λ < 1 discounts older data to a degree set by λ. Since the ECG signal is non-stationary, a
forgetting factor slightly below one is used. In the proposed design, the forgetting factor is selected to be
0.99 with a maximum dictionary size of 100 for the KRLST filter. With the use of the ALDKRLS filter, the maximum
dictionary size can be increased to 1000 with a sensitivity threshold of 10⁻⁴, as shown in Table 1. Note that the
ALD sensitivity threshold 𝜐 determines whether a basis is accepted into the dictionary, which slows down the
dictionary growth and consequently reduces the computational complexity and the memory consumption. The value of
the ALD threshold 𝜐 must be in the range [0, 0.05] to minimize the MSE [13]. In the proposed design, a
sensitivity threshold of 𝜐 = 10⁻⁴ gives the best results.
Table 1. The parameters of ALDKRLS and KRLST filters used in the proposed ECG noise removal design.
| Filter Type | Kernel Type (common) | Kernel Parameter (common) | Maximum Dictionary Size M (common) | ALD Threshold 𝜐 (dedicated) | Forgetting Factor λ (dedicated) | Regularization Parameter 𝜎𝑛² (dedicated) |
|---|---|---|---|---|---|---|
| ALDKRLS | Gaussian | 32 | 1000 | 10⁻⁴ | -- | -- |
3. Results
In order to demonstrate the effectiveness of the proposed ECG noise removal technique, 30 ECG signals
from the Massachusetts Institute of Technology–Beth Israel Deaconess Medical Center (MIT-BIH)
Arrhythmia Database [21] are investigated. A case study is shown in Fig. 3, where an ECG record (record
100) from the MIT-BIH database is used as a clean ECG signal with a sampling frequency of 360 samples per
second and number of samples equals 3600 [21]. Different types of artifacts taken from the MIT-BIH noise
stress test database are added to the clean ECG signal as follows: the 60 Hz PLI noise, the baseline wander (in
Fig. 3 shows the filtering results of the proposed ALDKRLS-KRLST approach and other existing adaptive
filters (LMS, KLMS, NKLMS, and KRLS). It can be noted from Fig. 3c that the LMS method fails to detect
the start of the ECG signal and causes a distortion in the ECG signal morphology because of its low
sensitivity to the nonlinear behaviour of ECG signals. These drawbacks are partially enhanced using the
kernelized LMS algorithms but with a moderate morphology distortion because of the non-stationarity of
ECG signals. With the use of KRLS algorithm, the detection of the ECG signal start is significantly enhanced
but still with a small degree of distortion in the ECG wave morphology (see Fig. 3f). Fig. 3g reveals the
superior performance of the proposed ALDKRLS-KRLST over other adaptive filtering techniques.
Fig. 3. ECG noise removal results: (a) clean ECG signal; (b) noisy ECG signal; filtered ECG signal with (c) LMS; (d) KLMS; (e)
NKLMS; (f) KRLS; and (g) the proposed ALDKRLS-KRLST.
The performance of the ALDKRLS-KRLST approach is evaluated by investigating the morphology of the
ECG frequency components before and after removing the artifacts. We have calculated the Power Spectral
Densities (PSDs) of the contaminated ECG signal and of the output ECG signal after each stage. Table 2 shows
the attenuation percentages of the maximum-power components of PLI, BW, and EMG present in the ECG
signal after each stage using the proposed ALDKRLS-KRLST approach and other adaptive filtering
techniques. It can be noted from Table 2 that the PLI and BW components are moderately attenuated using the
LMS, KLMS, NKLMS, and KRLS techniques. The EMG components are poorly attenuated with the LMS,
KLMS, and NKLMS methods and moderately attenuated with the KRLS. In contrast, the proposed
ALDKRLS-KRLST approach has the maximum attenuation percentage for all artifact components,
revealing its superior performance over other adaptive filtering techniques. Also, the results show that the
original ECG components are not affected after each filtering stage, and that the tiny features of the ECG
signal are preserved.
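One plausible way to compute an attenuation percentage from spectra is sketched below. The paper does not state its exact formula, so the power-ratio definition used here, the periodogram estimate, and the function names are assumptions for illustration only.

```python
import numpy as np


def power_at(signal, fs, freq_hz):
    """Periodogram power at the FFT bin closest to freq_hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq_hz))]


def attenuation_percent(before, after, fs, freq_hz):
    """Percentage reduction in power at freq_hz (assumed definition)."""
    p_before = power_at(before, fs, freq_hz)
    p_after = power_at(after, fs, freq_hz)
    return 100.0 * (1.0 - p_after / p_before)


fs = 360.0
t = np.arange(3600) / fs
before = np.sin(2.0 * np.pi * 60.0 * t)   # synthetic 60 Hz component
after = 0.5 * before                      # amplitude halved by a filter
print(round(attenuation_percent(before, after, fs, 60.0), 2))
```

Halving the amplitude reduces the power to one quarter, so this toy case gives an attenuation of 75 % under the assumed definition.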
Table 2. Attenuation percentages of components of maximum power of PLI, BW, and EMG artifacts using the combined
ALDKRLS-KRLST approach and other adaptive filtering techniques.
| Artifact Frequency | LMS | KLMS | NKLMS | KRLS | Proposed Combined Design |
|---|---|---|---|---|---|
| PLI artifact (60 Hz) | 58 % | 62 % | 68 % | 71 % | 96 % |
| BW artifact (1.7 Hz) | 37 % | 49 % | 56 % | 63 % | 80.7 % |
| EMG artifact (112 Hz) | 17 % | 29 % | 35 % | 47 % | 77.2 % |
| EMG artifact (119 Hz) | 11 % | 24 % | 31 % | 41 % | 68.8 % |
| EMG artifact (138 Hz) | 15 % | 27 % | 31 % | 44 % | 74.6 % |
All values are attenuation percentages (%).
The performance of the combined ALDKRLS-KRLST approach is further investigated by comparing the
filtered ECG signal and the noisy ECG signal using statistical evaluation metrics such as the MSE, cross
correlation, and the consumed time. As shown in Fig. 4, these evaluation metrics were calculated for the
proposed design and other adaptive filtering techniques. It can be noted that the LMS method has a poor
performance because the mapping between the input and output ECG signal is extremely nonlinear. The
performance is enhanced using the kernel filters (KLMS, NKLMS, and KRLS), which update the solution
Figs. 4a and 4b show that the KRLST algorithm has the lowest MSE and the highest cross correlation values
because of its ability to deal with ECG data whose input-output relationships change over time. However,
the KRLST method has the largest computational time (9.7 sec) of all the algorithms because its
computational time and complexity grow linearly with the number of processed data for each track. As
shown in Fig. 4c, the ALDKRLS algorithm consumes much less time (2.3 sec) than KRLST because it
reduces the input data to a sparse dictionary of bases. However, the ALDKRLS has a slightly higher MSE and
a lower cross correlation than the KRLST algorithm (see Figs. 4a and 4b). Combining the ALDKRLS and
KRLST algorithms in the proposed design exploits the advantages of both algorithms, while keeping the
computational time as low as possible (4.8 sec). It can be noted from Fig. 4 that the proposed cascade,
which uses the ALDKRLS algorithm in the first two stages and KRLST in the last
stage, performs better than a cascade that uses KRLST in the first two stages and ALDKRLS in the last
stage.
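The MSE and cross-correlation metrics discussed above can be computed along the following lines. The paper does not define its cross-correlation measure precisely, so the zero-lag normalized (Pearson-type) coefficient used here is an assumption, as are the function names.

```python
import numpy as np


def mean_square_error(reference, estimate):
    """Mean of the squared sample-wise differences."""
    r = np.asarray(reference, dtype=float)
    e = np.asarray(estimate, dtype=float)
    return float(np.mean((r - e) ** 2))


def cross_correlation(reference, estimate):
    """Zero-lag normalized cross-correlation, in the range [-1, 1]."""
    r = np.asarray(reference, dtype=float)
    e = np.asarray(estimate, dtype=float)
    r = r - r.mean()
    e = e - e.mean()
    return float(np.dot(r, e) / np.sqrt(np.dot(r, r) * np.dot(e, e)))


t = np.arange(360) / 360.0
reference = np.sin(2.0 * np.pi * 5.0 * t)   # synthetic test signal
estimate = reference + 0.1                  # same shape, constant offset

print(mean_square_error(reference, estimate))
print(cross_correlation(reference, estimate))
```

The constant offset raises the MSE but leaves the normalized cross correlation at one, which is why the two metrics are reported together.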
Fig. 4. (a) MSE; (b) cross correlation; and (c) consumed time of the proposed ALDKRLS-KRLST approach and other adaptive filtering
methods.
The accurate detection of ECG wave parameters is very important for precise diagnosis of heart diseases.
Therefore, in the current work, the sensitivity of the combined ALDKRLS-KRLST design and other adaptive
filtering techniques in detecting the ECG peak locations of P, Q, R, S and T waves is evaluated by
determining the True Positives (𝑇𝑃𝑠) and False Negatives (𝐹𝑁𝑠) of wave parameters. The sensitivity of peak
detection can be expressed as [22]
$$\text{Sensitivity} = \frac{TPs}{TPs + FNs} \times 100\,\% \qquad (4)$$
where 𝑇𝑃𝑠 are the peak locations of the reconstructed ECG signal which are matched with the peak locations
in the clean ECG, and 𝐹𝑁𝑠 are the peak locations in the clean ECG which are not found in the reconstructed
ECG signal.
The ECG peak points are calculated using the same algorithm for all investigated filtering techniques in
order to compare their sensitivity with the same metric. We have employed the wavelet transform algorithm
in [23] for ECG peak detection because of its high performance compared to other algorithms based on
differentiation. The sensitivity is calculated for each wave (P, Q, R, S and T) with a tolerance of zero, one,
and four samples. Fig. 5 shows the sensitivity of all ECG waves with zero tolerance using the proposed
design and other adaptive filtering techniques. It can be noted that the proposed ALDKRLS-KRLST
algorithm accurately detects all ECG waves and provides the highest sensitivity compared to all other
techniques.
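The sensitivity in Eq. (4), evaluated at different sample tolerances, can be sketched as below. The greedy one-to-one matching of peaks and the example peak locations are assumptions made for illustration; the peak detector itself is outside the scope of this sketch.

```python
def peak_sensitivity(clean_peaks, detected_peaks, tolerance=0):
    """Sensitivity (%) = 100 * TPs / (TPs + FNs), as in Eq. (4).

    A clean peak counts as a true positive when an unused detected
    peak lies within `tolerance` samples of it; otherwise it is a
    false negative.
    """
    remaining = sorted(detected_peaks)
    true_positives = 0
    for peak in sorted(clean_peaks):
        match = next((d for d in remaining if abs(d - peak) <= tolerance), None)
        if match is not None:
            true_positives += 1
            remaining.remove(match)  # each detection can match only once
    false_negatives = len(clean_peaks) - true_positives
    total = true_positives + false_negatives
    return 100.0 * true_positives / total if total else 0.0


# Hypothetical sample indices of one wave type in the clean and filtered ECG
clean = [100, 400, 700, 1000]
detected = [100, 401, 704, 1300]

for tol in (0, 1, 4):
    print(tol, peak_sensitivity(clean, detected, tolerance=tol))
```

For these example locations the sensitivity rises from 25 % at zero tolerance to 50 % and 75 % at tolerances of one and four samples.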
[Figure: bar chart titled "Sensitivity Without Tolerance". Vertical axis: Sensitivity (%), 0 to 100. Horizontal axis:
Filtering Techniques (LMS, KLMS, NKLMS, KRLS, ALDKRLS, KRLST, Proposed Design).]
Fig. 5. Sensitivity of peak detection for the proposed ECG noise removal design and other adaptive filtering methods.
Fig. 5 shows that the LMS method cannot detect any peak of the ECG signal except the P wave. With the
use of kernel-based filters (KLMS, NKLMS, and KRLS), the detection sensitivity is improved for the Q, R, and S
waves, but is still not significantly enhanced for the P and T waves. The detection sensitivity of the ALDKRLS
method is highly enhanced for the Q, R, and S waves but is still not significantly improved for the P and T waves.
Both the KRLST algorithm and the combined ALDKRLS-KRLST approach accurately detect the Q, R, and S
waves, and provide higher sensitivity for the P and T waves than all other algorithms. However, the
computational time of the proposed ALDKRLS-KRLST design is much lower than that of the KRLST algorithm (see
Fig. 4). This reveals the effectiveness of the proposed technique in diagnosing diseases which require accurate
detection of the P and T waves, such as myocardial infarction, junctional rhythm, atrioventricular block,
and myocardial ischemia [24].
In order to demonstrate the effectiveness of the proposed design in heart disease diagnosis, we investigate
the diagnosis accuracy of Atrial Fibrillation (AF) for ECG signals filtered using the combined
ALDKRLS-KRLST approach and using other adaptive filtering techniques. AF is the most common cardiac
arrhythmia, occurring in 1-2 % of the general population, and is associated with significant mortality and
morbidity. In the current study, the fuzzy logic algorithm in [25] is
employed for classification, and the features extracted by this classifier are RR interval irregularity, P-wave
absence, T-wave presence, and the noise level. The training set consists of 300 records available at
https://physionet.org/challenge/2017/. The AF diagnosis accuracy is investigated using 15 records from the
MIT-BIH Atrial Fibrillation Data Base (AFDB). The AF classification accuracy for the 15 clean records without
any noise sources is 80 %. After adding noise and artifact sources and removing them using the proposed
design and other adaptive filtering methods, the AF classifier is applied to the reconstructed ECG signals to
compare the improvement achieved by each filtering technique. Table 3 summarizes the classification accuracy
results for the 15 test records. It can be noted that the classification accuracy with the proposed ALDKRLS-KRLST
filter design (73.3 %) is very close to the accuracy for clean records (80 %) and is significantly larger than that
of all other traditional adaptive filtering techniques (< 33.5 %). Also, Table 3 shows that the classification
accuracy of the proposed ALDKRLS-KRLST design is higher than that of the ALDKRLS algorithm and is the same as that
of the KRLST algorithm. However, as discussed in section 3.1, the computational efficiency of the proposed
ALDKRLS-KRLST design is superior to that of the KRLST algorithm.
Table 3. AF classification accuracy for the proposed ECG noise removal design and other adaptive filtering methods.
| Filtering technique | LMS | KLMS | NKLMS | KRLS | ALDKRLS | KRLST | Proposed Design |
|---|---|---|---|---|---|---|---|
| True classification (A) | 2 | 4 | 4 | 5 | 10 | 11 | 11 |
| False classification (N) | 13 | 11 | 11 | 10 | 5 | 4 | 4 |
| Accuracy (%) | 13.3 % | 26.6 % | 26.6 % | 33.3 % | 66.6 % | 73.3 % | 73.3 % |
4. Discussion and Conclusions
In this paper, a new three-stage ALDKRLS-KRLST technique is proposed for removing ECG noise sources
such as PLI, BW, and EMG without changing the important features and characteristics of the ECG signal.
One of the main advantages of employing a cascade of three filters rather than a single filter is that the
coefficients of three independent adaptive filters adapt more simply and quickly than those of one filter. Also,
after each stage of the cascade design, the ECG signal with one of the three artifacts attenuated can be
obtained for further investigation or analysis in different clinical studies. Moreover, the ECG signal quality
can be enhanced by using multiple processing stages that eliminate one artifact type at a time. To the authors'
knowledge, the cascade design for ECG noise removal remains largely
unexplored [26].
In the proposed cascade design, we have combined the ALDKRLS and KRLST algorithms in order to
make full use of the advantages of both techniques. The KRLST algorithm is able to track
non-stationary data whose input-output relationships change over time. However, its computational time is
relatively high because the consumed time grows linearly with the number of processed data for each
track. Therefore, the ALDKRLS algorithm is employed in the first two stages in order to reduce the input data
to a sparse dictionary of bases before the tracking stage with the KRLST algorithm. A critical comparison
between the combined ALDKRLS-KRLST approach and other adaptive filtering techniques such as the LMS,
KLMS, NKLMS, and KRLS [4, 15] is made using different evaluation metrics.
In order to evaluate the performance of the proposed ALDKRLS-KRLST design, the morphology of the ECG
frequency components and artifacts is investigated before and after removing the artifact sources. The results
show that the ALDKRLS-KRLST approach attenuates, on average, 96 % of the PLI; 80.7 % of the
maximum-power component of BW (1.7 Hz); and 77.2 %, 68.8 %, and 74.6 % of the maximum-power
components of EMG at frequencies of 112 Hz, 119 Hz, and 132 Hz, respectively. Note that all artifacts are adequately
attenuated while preserving the tiny and important features of the ECG signal. It can be noted from Table 2
that the proposed design attenuates the contaminating artifacts in the ECG signal more effectively than all
other adaptive filtering techniques [15]. Other approaches based on Blind Source Separation (BSS) techniques
have achieved good performance in removing different ECG artifacts [7]. However, these techniques require
independent multichannel ECG recordings and high computational time. On the other hand, the proposed
approach can operate with a single-channel ECG recording, which makes it attractive for the personal healthcare
environment [27].
We further investigate the performance of the proposed design by comparing the denoised ECG signal after
filtering with the noisy ECG signal using statistical evaluation metrics such as the MSE and cross correlation.
It can be noted from Fig. 4 that the proposed design has the highest cross correlation and the lowest MSE
compared to other adaptive filtering techniques [15]. However, the computational time of the proposed
ALDKRLS-KRLST design is higher (around 4 sec) than that of other adaptive filtering techniques (from 1 to 3 sec).
Note that the combined design of ALDKRLS and KRLST gives a lower computational time than a cascade
design of the same filter (either ALDKRLS or KRLST). The computational efficiency of the ALDKRLS-KRLST
filter can be improved with the use of other forgetting schemes in the tracking algorithm. This is
currently under development, and the results will be reported in the near future.
The sensitivity of the ALDKRLS-KRLST approach in detecting the peaks of the ECG waves (P, Q, R, S, and T)
is investigated and compared with other adaptive filtering techniques. As shown in Fig. 5, the LMS
algorithm can only detect the P wave, while the kernel-based algorithms can detect the Q, R, and S waves but
give low sensitivity for the detection of the P and T waves. In contrast, the proposed design accurately detects the Q,
R, and S waves and provides high sensitivity for the P and T waves. The calculations reveal that the accuracy of
P and T wave detection with the ALDKRLS-KRLST approach is higher by at least 15 % and 23 %,
respectively, than with other adaptive filtering techniques. This indicates that the proposed approach can improve the
diagnosis accuracy of heart diseases whose diagnosis is based on the characteristics of the T and P waves [24]. The
performance of the proposed technique in detecting the ECG peaks is better than that of the Wavelet Transform (WT) [9]
and Empirical Mode Decomposition (EMD) [28] techniques. However, due to its relative complexity, the
proposed approach requires a higher computational time than the WT and EMD methods.
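The formula used for peak-detection sensitivity is not given here. As a hedged illustration only, the sketch below assumes the common definition Se = TP / (TP + FN), where a reference peak counts as detected when a detected peak lies within a tolerance window around it. The function name `peak_sensitivity`, the greedy matching rule, and the tolerance value are assumptions, not the authors' procedure.

```python
def peak_sensitivity(reference_peaks, detected_peaks, tolerance):
    """Fraction of reference peaks matched by a detection within +/- tolerance samples."""
    if not reference_peaks:
        raise ValueError("at least one reference peak is required")
    remaining = sorted(detected_peaks)
    true_positives = 0
    for ref in sorted(reference_peaks):
        # Greedy choice: the closest detection that has not been used yet.
        match = min(remaining, key=lambda d: abs(d - ref), default=None)
        if match is not None and abs(match - ref) <= tolerance:
            true_positives += 1
            remaining.remove(match)  # each detection may match only one reference peak
    false_negatives = len(reference_peaks) - true_positives
    return true_positives / (true_positives + false_negatives)

if __name__ == "__main__":
    # Sample indices of annotated and detected peaks (illustrative numbers).
    reference = [100, 300, 500]
    detected = [102, 298, 650]
    print("Sensitivity:", peak_sensitivity(reference, detected, tolerance=5))
```

The same function can be applied separately to each wave type (P, Q, R, S, T) to obtain a per-wave sensitivity.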
The diagnosis accuracy of Atrial Fibrillation (AF) with the combined ALDKRLS-KRLST
approach and with other adaptive filtering techniques is investigated by employing a fuzzy logic
classification algorithm [25] over different ECG measurements. The results show that the
performance of the classifier improves after removing the noise sources with any of the filtering techniques.
However, the performance of the ALDKRLS-KRLST technique is significantly higher than that of the other adaptive
filtering techniques. It can be noted from Table 3 that ECG denoising with the ALDKRLS-KRLST approach
allows a classification accuracy improvement of around 40 % compared with the traditional adaptive filtering
techniques (LMS, KLMS, NKLMS, and KRLS) for AF diagnosis. Another two-stage ECG denoising
algorithm based on EMD and statistical approaches [29] has achieved accuracy comparable to that of our proposed
approach in AF detection. However, a weakness of EMD-based filtering techniques is their limited
accuracy in detecting the peaks of ECG waves [30]. This demonstrates the effectiveness of the proposed technique
in improving the accuracy of heart disease diagnosis.
Appendix
For a set of input/output pairs $D_t = \{x_i, y_i\}_{i=1}^{t}$, where $x_i \in \mathbb{R}^D$ are D-dimensional input vectors and $y_i \in \mathbb{R}$
are scalar outputs, the KRLST filter predicts an unknown output $y_{t+1}$ given the corresponding input $x_{t+1}$ and
the data available at time $t$ using the following two main steps [13]:
1- Defining a probabilistic framework based on Gaussian Processes (GPs)
In a Bayesian setting, a model that describes the observations is needed. Following the standard setup of
Gaussian Process (GP) regression, observations can be described as the sum of an unobservable latent
function of the inputs plus an unknown zero-mean Gaussian noise $\varepsilon_i$ as follows:
$$y_i = f(x_i) + \varepsilon_i \tag{A1}$$
Equation (A1) implies that the likelihood of the latent function is $P(y_i|f_i) = N(y_i|f_i, \sigma_n^2)$, where the
shorthand notation $f_i = f(x_i)$ is used, and the noise power is assumed to be constant and equal to $\sigma_n^2$. GPs are
stochastic processes that are defined by a mean function and a covariance function. To perform Bayesian
inference, a prior over the latent function is needed, which is taken to be a zero-mean GP with a covariance
function $k(x, x')$, also known as the kernel. The prior joint distribution of the vector $f_t = [f_1, \ldots, f_t]^T$ is a
zero-mean multivariate Gaussian with covariance matrix $K_t$, with elements $[K_t]_{i,j} = k(x_i, x_j)$. In the standard
KRLS setting, the predictive mean is often expressed as $\hat{y}_{t+1} = K_{t+1}^T \alpha_t$, where $\alpha_t$ are the kernel weights,
which can be obtained by $\alpha_t = (K_t + \sigma_n^2 I)^{-1} y_t$. In this formulation, the noisy observations $y_t$ are used so that
the kernel matrix includes a regularization term $\sigma_n^2 I$, where $I$ is the identity matrix.
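The regularized KRLS solution described above can be illustrated numerically. The following is a minimal batch sketch under stated assumptions, not the recursive algorithm itself: it assumes a Gaussian kernel with a hypothetical width of 1.0 (the kernel choice is not fixed by this appendix), and the function names are illustrative.

```python
import numpy as np

def gaussian_kernel(a, b, width=1.0):
    """Gaussian kernel k(a, b); the kernel type and width are assumptions."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * width ** 2)))

def krls_weights(inputs, outputs, noise_power):
    """Kernel weights alpha_t = (K_t + sigma_n^2 I)^{-1} y_t."""
    t = len(inputs)
    K = np.array([[gaussian_kernel(inputs[i], inputs[j]) for j in range(t)]
                  for i in range(t)])
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(K + noise_power * np.eye(t),
                           np.asarray(outputs, dtype=float))

def krls_predict(inputs, alpha, x_new):
    """Predictive mean y_hat_{t+1} = K_{t+1}^T alpha_t for a new input."""
    k_vec = np.array([gaussian_kernel(x, x_new) for x in inputs])
    return float(k_vec @ alpha)

if __name__ == "__main__":
    inputs = [[0.0], [1.0], [2.0]]
    outputs = [0.0, 1.0, 0.5]
    alpha = krls_weights(inputs, outputs, noise_power=0.01)
    print("Prediction at x = 1.5:", krls_predict(inputs, alpha, [1.5]))
```

A larger noise power gives stronger regularization, so the prediction follows the noisy observations less closely.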
Assume that the posterior at time $t$, including the most recent observation, is a known Gaussian $P(f_t|D_t) =
N(f_t|\mu_t, \Sigma_t)$, where $\mu_t$ is the mean of $f_t$ and $\Sigma_t$ is the covariance of $f_t$. This posterior should be updated to
include a new observation $(x_{t+1}, y_{t+1})$ at time $t + 1$. The likelihood of $f_{t+1} = f(x_{t+1})$ given the new
observation is:
The predictive distribution of a new observation $y_{t+1}$ given past data is given by:
$$P(y_{t+1}|D_t) = N(y_{t+1}|\hat{y}_{t+1}, \hat{\sigma}_{y,t+1}^2) \tag{A4}$$
where the mean and the variance of this Gaussian are $\hat{y}_{t+1} = q_{t+1}^T \mu_t$ and $\hat{\sigma}_{y,t+1}^2 = \sigma_n^2 + \hat{\sigma}_{f,t+1}^2$.
This illustrates how probabilistic predictions for new observations are obtained, and how these new
observations can be included in the posterior once they are available. Only $\mu_{t+1}$, $\Sigma_{t+1}$, and $Q_{t+1}$ will be reused
in the next iteration, and the remaining parameters will be computed on demand.
2- Forgetting strategy
In a time-varying scenario, only recent samples have relevant information, whereas the information
contained in older samples is actually misleading. In such a case, it is very important to have a KRLS
tracker that is able to forget past information and track changes in the target latent function. All information
available up to time $t$ is stored in the posterior GP over the dictionary bases:
$$(f(x)|D_t) \sim \mathcal{GP}\left(K_t(x)^T Q_t \mu_t,\; k(x, x') + K_t(x)^T Q_t (\Sigma_t - K_t) Q_t K_t(x')\right) \tag{A5}$$
where the notation $(f(x)|D_t) \sim \mathcal{GP}(\text{mean}, \text{covariance})$ means that $(f(x)|D_t)$ is a stochastic function drawn
from a GP with certain mean and covariance functions, and $K_t(x)$ denotes the vector of covariances between
$x$ and all the bases in the dictionary at time $t$. In order to make the KRLST algorithm able to adapt to
nonstationary data, it should be able to “forget” past samples, i.e., to intentionally force the posterior
$P(f(x)|D_t)$ to lose some data. This can be done by linearly combining $(f(x)|D_t)$ with another independent
GP called the forgetting noise $n(x)$, which holds no information about the data. Since the new posterior after
forgetting is a linear combination of two GPs, it is also a GP; it is denoted as $(\tilde{f}(x)|D_t)$
and can be expressed as:
$$(\tilde{f}(x)|D_t) = \alpha (f(x)|D_t) + \beta n(x) \tag{A6}$$
where $\alpha, \beta > 0$ are used to adapt the trade-off between the informative GP $(f(x)|D_t)$ and the uninformative
forgetting noise $n(x)$. The posterior GP after forgetting, $P(\tilde{f}(x)|D_t)$, should also be expressed in terms of a
distribution over the latent points in the dictionary, $N(\tilde{\mu}_t, \tilde{\Sigma}_t)$. The posterior after forgetting can be expressed
as:
Different definitions for $\alpha$, $\beta$, and $n(x)$ lead to different types of forgetting. In the Back-to-the-Prior
(B2P) forgetting method, the GP $n(x)$ is selected to act as a noise. Note that $n(x)$ holds no information about
the data and is independent of $(f(x)|D_t)$. Assume for a moment that we want to forget all past data; then we
must set $\alpha = 0$ to completely remove the informative GP. In that case, the posterior GP would be $\beta n(x)$.
When no data has been observed, the distribution of the posterior should, by definition, be equal to the prior,
and hence $\beta n(x)$ must be a scaled version of the GP prior. This scale can be chosen to be 1, and consequently
the distribution of the noise becomes $n(x) \sim \mathcal{GP}(0, k(x, x'))$. It can be noted that, with this choice, setting $\alpha = 0$
implies that $\beta = 1$. This shows that once $n(x)$ has been defined, the distribution of $(\tilde{f}(x)|D_t)$ can be
obtained from equation (A7). Since $n(x)$ is independent of $(f(x)|D_t)$, the distribution of $(\tilde{f}(x)|D_t)$ with B2P
can be expressed as:
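Equations (A7) and (A8) do not appear in this copy of the manuscript. The following is a hedged reconstruction, not the authors' typeset equations. It assumes that (A7) is (A5) rewritten with the post-forgetting parameters $\tilde{\mu}_t$ and $\tilde{\Sigma}_t$, and that (A8) follows from substituting (A5) and $n(x) \sim \mathcal{GP}(0, k(x, x'))$ into (A6), using the independence of the two GPs so that their covariances add. The last line is obtained by matching the two expressions term by term.

```latex
% Assumed form of (A7): (A5) with the post-forgetting parameters.
\begin{align*}
(\tilde{f}(x)|D_t) &\sim \mathcal{GP}\Big(K_t(x)^T Q_t \tilde{\mu}_t,\;
  k(x,x') + K_t(x)^T Q_t (\tilde{\Sigma}_t - K_t) Q_t K_t(x')\Big)
\end{align*}

% Assumed form of (A8): mean scaled by alpha, covariances scaled by
% alpha^2 and beta^2 and added because the two GPs are independent.
\begin{align*}
(\tilde{f}(x)|D_t) &\sim \mathcal{GP}\Big(\alpha K_t(x)^T Q_t \mu_t,\;
  (\alpha^2 + \beta^2)\, k(x,x')
  + \alpha^2 K_t(x)^T Q_t (\Sigma_t - K_t) Q_t K_t(x')\Big)
\end{align*}

% Matching means and covariances of the two expressions.
\begin{align*}
\tilde{\mu}_t = \alpha \mu_t, \qquad
\alpha^2 + \beta^2 = 1, \qquad
\tilde{\Sigma}_t = \alpha^2 \Sigma_t + (1 - \alpha^2) K_t
\end{align*}
```

With $\alpha^2 = \lambda$, the last line is consistent with the forgetting updates (A9) and (A10).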
Comparing (A7) and (A8) and identifying terms, it can be deduced that $\tilde{\mu}_t = \alpha \mu_t$ and $\alpha^2 + \beta^2 = 1$. This
defines the relationship between the posterior distribution before and after the forgetting process. The
forgetting process depends on a single positive parameter, $\alpha$, and its corresponding parameter $\beta = \sqrt{1 - \alpha^2}$.
This latter formula implies that $\alpha$ cannot be greater than 1. The values of $\alpha$ are therefore in the range from 0
(all past data is forgotten) to 1 (no forgetting occurs). Letting $\alpha^2 = \lambda$, the forgetting updates are finally:
$$\Sigma_t \leftarrow \lambda \Sigma_t + (1 - \lambda) K_t \tag{A9}$$
$$\mu_t \leftarrow \sqrt{\lambda}\, \mu_t \tag{A10}$$
where $\lambda \in (0, 1]$ is the forgetting factor. The forgetting factor indicates whether the data are stationary ($\lambda = 1$)
or non-stationary, and to which degree ($0 < \lambda < 1$). In the current study, the ECG signal is a highly non-stationary
signal and hence the forgetting factor should be close to one. In the proposed design, $\lambda = 0.99$ gives the best
results. The application of the KRLST algorithm in ECG denoising can be summarized in the following steps:
Set the initial parameters such as the forgetting factor $\lambda$, the kernel function $k(x, x')$, the budget or
the number of bases $M$, and the regularization or the noise power $\sigma_n^2$.
Observe the first input/output pair $(x_1, y_1)$ and then initialize the inference parameters $\mu_1$, $\Sigma_1$, and
$Q_1$ as follows:
$$\mu_1 = \frac{y_1 k(x_1, x_1)}{\sigma_n^2 + k(x_1, x_1)} \tag{A11}$$
$$\Sigma_1 = k(x_1, x_1) - \frac{k(x_1, x_1)^2}{\sigma_n^2 + k(x_1, x_1)} \tag{A12}$$
$$Q_1 = \frac{1}{k(x_1, x_1)} \tag{A13}$$
where $k(x_1, x_1)$ is the initial covariance function for the first data point $x_1$.
Add 𝑥1 to the bases dictionary. Then, for each time instant 𝑡 = 1, 2, …, use the B2P forgetting
(A14)
$$\gamma_{t+1}^2 = k_{t+1} - K_{t+1}^T q_{t+1} \tag{A15}$$
$$\hat{\sigma}_{f,t+1}^2 = \gamma_{t+1}^2 + q_{t+1}^T h_{t+1} \tag{A16}$$
where $h_{t+1} = \Sigma_t q_{t+1}$.
Observe the actual output $y_{t+1}$ after computing the output predictive mean and the output predictive
variance as follows:
$$\hat{y}_{t+1} = q_{t+1}^T \mu_t \tag{A17}$$
$$\hat{\sigma}_{y,t+1}^2 = \sigma_n^2 + \hat{\sigma}_{f,t+1}^2 \tag{A18}$$
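The steps above can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' implementation. Equation (A14) is not shown in this copy, so the sketch assumes $q_{t+1} = Q_t K_{t+1}$, as in the KRLST formulation of [13], and takes $k_{t+1} = k(x_{t+1}, x_{t+1})$. The Gaussian kernel, its width, and all function names are hypothetical. The posterior update with the new observation and the dictionary budget handling are not listed here and are therefore omitted.

```python
import numpy as np

def gaussian_kernel(a, b, width=1.0):
    """Gaussian kernel k(a, b); the kernel type and width are assumptions."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * width ** 2)))

def initialize(x1, y1, noise_power, kernel=gaussian_kernel):
    """Initial mu_1, Sigma_1, Q_1 from the first pair, following (A11)-(A13)."""
    k11 = kernel(x1, x1)
    mu = np.array([y1 * k11 / (noise_power + k11)])               # (A11)
    Sigma = np.array([[k11 - k11 ** 2 / (noise_power + k11)]])    # (A12)
    Q = np.array([[1.0 / k11]])                                   # (A13)
    return mu, Sigma, Q

def forget_b2p(mu, Sigma, K, lam):
    """Back-to-the-prior forgetting, following (A9) and (A10)."""
    return np.sqrt(lam) * mu, lam * Sigma + (1.0 - lam) * K

def predict(x_new, dictionary, mu, Sigma, Q, noise_power, kernel=gaussian_kernel):
    """Predictive mean and variance of y_{t+1}, following (A15)-(A18)."""
    k_vec = np.array([kernel(x, x_new) for x in dictionary])      # K_{t+1}
    q = Q @ k_vec                          # ASSUMED form of (A14), taken from [13]
    gamma2 = kernel(x_new, x_new) - k_vec @ q                     # (A15)
    h = Sigma @ q
    var_f = gamma2 + q @ h                                        # (A16)
    y_hat = q @ mu                                                # (A17)
    var_y = noise_power + var_f                                   # (A18)
    return float(y_hat), float(var_y)

if __name__ == "__main__":
    noise_power, lam = 0.01, 0.99
    dictionary = [[0.0]]                       # first input x_1 stored as a basis
    mu, Sigma, Q = initialize(dictionary[0], 1.0, noise_power)
    K = np.array([[gaussian_kernel(dictionary[0], dictionary[0])]])
    mu, Sigma = forget_b2p(mu, Sigma, K, lam)
    print(predict([0.5], dictionary, mu, Sigma, Q, noise_power))
```

With $\lambda = 1$ the forgetting step leaves $\mu_t$ and $\Sigma_t$ unchanged, while smaller values pull the posterior back toward the prior, which widens the predictive variance.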
References
[1] R. M. Rangayyan and N. P. Reddy, Biomedical signal analysis: a case-study approach. Annals of Biomedical
Engineering, 30.7 (2002), 983-983.
[2] V. X. Afonso, W. J. Tompkins, T. Q. Nguyen, and S. Luo, ECG beat detection using filter banks. IEEE Transactions
on Biomedical Engineering, 46.2 (1999): 192-202.
[3] Y. H. Hu, W. J. Tompkins, J. L. Urrusti, and V. X. Afonso, Applications of artificial neural networks for ECG signal
detection and classification. Journal of Electrocardiology, 26, 66–73 (1993).
[4] N. V. Thakor and Y. S. Zhu, Applications of adaptive filtering to ECG analysis: Noise cancellation and arrhythmia
detection. IEEE Transactions on Biomedical Engineering, 38.8 (1991): 785–794.
[5] A. Khamene and S. Negahdaripour, A new method for the extraction of fetal ECG from the composite abdominal
signal. IEEE Transactions on Biomedical Engineering, 47.4 (2000): 507-516.
[6] Y. Zigel, A. Cohen, and A. Katz, ECG signal compression using analysis by synthesis coding. IEEE Transactions on
Biomedical Engineering, 47.10 (2000):1308-1316.
[7] M. P. S. Chawla, A comparative analysis of principal component and independent component techniques for
electrocardiograms, Neural Computing and Applications, 18.6 (2009):539–556.
[8] S. Poornachandra, Wavelet-based denoising using subband dependent threshold for ECG signals, Digital signal
processing, 18.1 (2008): 49-55.
[9] B. N. Singh and A. K. Tiwari, Optimal selection of wavelet basis function applied to ECG signal denoising. Digital
signal processing, 16.3 (2006): 275-287.
[10] N. Dey, T. Dash, and S. Dash, ECG signal denoising by functional link artificial neural network (FLANN),
International Journal of Biomedical Engineering and Technology. 7.4 (2011): 377–389.
[11] N. V. Thakor and Y-S. Zhu, Applications of adaptive filtering to ECG analysis: Noise cancellation and arrhythmia
detection. IEEE Transactions on Biomedical Engineering. 38.8 (1991):785–794.
[12] Y. Engel, S. Mannor, and R. Meir, The kernel recursive least squares algorithm. IEEE Transactions on Signal
Processing. 52.8 (2004): 2275–2285.
[13] S. Van Vaerenbergh, M. Lázaro-Gredilla, and I. Santamaría, Kernel recursive least-squares tracker for time-varying
regression. IEEE Transactions on Neural Networks and Learning Systems. 23.8 (2012): 1313–1326.
[14] W. Liu, J. C. Príncipe, and S. Haykin, Kernel adaptive filtering: A comprehensive introduction, Vol. 57, John Wiley
& Sons, Inc., Hoboken, New Jersey, 2011.
[15] M. Ghasemi, ECG noise cancellation using Kernel adaptive filtering. Doctoral dissertation, California State
University, Northridge, 2013.
[16] W. Liu, P. P. Pokharel, and J. C. Principe, The Kernel Least-Mean-Square Algorithm. IEEE Transactions on Signal
Processing. 56.2 (2008): 543–554.
[17] Y. Engel, S. Mannor, and R. Meir, The kernel recursive least-squares algorithm. IEEE Transactions on Signal
Processing. 52.8 (2004):2275-2285.
[18] C. Campbell, Kernel methods: a survey of current techniques. Neurocomputing. 48.1 (2002): 63-84.
[19] S. Van Vaerenbergh and I. Santamaría, Online Regression with Kernels. Regularization, Optimization, Kernels, and
Support Vector Machines. Chapman and Hall/CRC, 2014, pp. 477-501.
[20] M. G. Bellanger, Adaptive Digital Filters, CRC Press, 2001 Jul 20.
[21] G. B. Moody, R. G. Mark, and A. L. Goldberger, PhysioNet: A web-based resource for the study of physiologic
signals. IEEE Engineering in Medicine and Biology Magazine, 20.3 (2001): 70-75.
[22] A. Gacek and W. Pedrycz (Eds.), ECG Signal processing, classification and interpretation: Comprehensive
framework of computational intelligence. Springer Science & Business Media, 2011.
[23] C. Li, C. Zheng, and C. Tai, Detection of ECG characteristic points using wavelet transforms. IEEE Transactions on
biomedical Engineering, 42.1 (1995): 21-28.
[24] B. Surawicz and T. Knilans, Chou's electrocardiography in clinical practice: adult and pediatric. Elsevier Health
Sciences, 2008 Apr 22.
[25] R. Ceylan, Y. Özbay, and B. Karlik, A novel approach for classification of ECG arrhythmias: Type-2 fuzzy
clustering neural network, Expert Systems with Applications. 36.3 (2009): 6721-6726.
[26] J. A. Urigüen, and B. Garcia-Zapirain, EEG artifact removal—state-of-the-art and guidelines, Journal of neural
engineering, 12(3), p.031001, 2015.
[27] K. T. Sweeney, H. Ayaz, T. E. Ward, M. Izzetoglu, S. F. McLoone, and B. Onaral, A Methodology for
Validating Artifact Removal Techniques for Physiological Signals, IEEE Transactions on Information
Technology in Biomedicine, vol. 16, no. 5, pp. 918-926, 2012.
[28] M. Blanco-Velasco, B. Weng, and K. E. Barner, ECG signal denoising and baseline wander correction based on the
empirical mode decomposition, Computers in biology and medicine, 38(1), 1-13, 2008.
[29] J. Lee, D. D. McManus, S. Merchant, and K. H. Chon, Automatic motion and noise artifact detection in holter ECG
data using empirical mode decomposition and statistical approaches, IEEE Transactions on Biomedical
Engineering, 59(6), 1499-1506, 2012.
[30] A. J. Nimunkar, and W. J. Tompkins, R-peak detection and signal averaging for simulated stress ECG using EMD,
29th Annual International Conference of Engineering in Medicine and Biology Society, EMBS 2007, IEEE, 2007.