Synopsis - Main Content
1 ABSTRACT
2 LITERATURE REVIEW
3 PROBLEM DEFINITION
4 METHODOLOGY
5 FACILITIES REQUIRED
6 BIBLIOGRAPHY/REFERENCES
1 ABSTRACT
Cardiac diseases are among the leading causes of death worldwide. Early and accurate diagnosis
of cardiac diseases can help prevent complications and improve patient outcomes. Electrocardio-
gram (ECG) is a widely used biomedical signal to measure the electrical activity of the heart and
detect abnormalities. However, manual interpretation of ECG patterns can be time-consuming,
subjective, and prone to errors.
In this project, we propose an automatic cardiac disease diagnosis system based on machine
learning techniques. We will first use a pretrained convolutional neural network (CNN) such as
AlexNet or GoogLeNet to extract bottleneck features from ECG signals, and then group those
features into two clusters using a clustering algorithm. If successful, this line of approach can
avoid or reduce the excessive effort associated with labelling ECG data. Furthermore, we also
plan to apply these pretrained CNNs, one at a time, to both spectrogram images derived from
ECG signals and images of the time-domain ECG signals, to build end-to-end cardiac disease
diagnosis systems.
Keywords
Cardiac diseases, early diagnosis, electrocardiogram (ECG), machine learning, convolutional
neural network (CNN), automatic diagnosis, clustering algorithms, bottleneck features, spectro-
gram images, time-domain ECG signals, end-to-end diagnosis.
2 LITERATURE REVIEW
The detection and classification of heart diseases have undergone major advancements with
the integration of machine learning (ML) and signal processing techniques. Electrocardiogram
(ECG) wavelets, in particular, offer a way to analyze and decompose the ECG signal into various
components, allowing for more precise and accurate detection. This literature review summarizes
the significant contributions and research in the domain of heart disease detection using ECG
wavelets and machine learning.
1. Basics of the ECG Signal
(a) The ECG signal is a representation of the electrical activity of the heart over time. Its
interpretation can offer insights into heart function and detect abnormalities (Smith,
1996) [1].
2. Wavelet Transform in ECG Analysis
(a) Wavelets allow the decomposition of ECG signals into multi-resolution levels, which
help in effectively isolating distinct features and characteristics of the signal (Addison,
2002) [2].
(b) Inoue and Kobayashi (2000) [3] utilized wavelet-based features to detect QRS com-
plexes, which are significant components of the ECG waveform.
3. Application of Machine Learning in ECG Analysis
(a) Machine learning has introduced automated methods to analyze ECG signals and
predict heart diseases (Kher, 2018) [4].
(b) Rajpurkar et al. (2017) [5] used a convolutional neural network to detect arrhythmias
with a level of accuracy that rivals cardiologists.
4. ECG Wavelets and ML for Heart Disease Detection
(a) Acharya et al. (2017) [6] proposed an integrated approach combining ECG wavelets
and ML, demonstrating improved accuracy in detecting cardiac anomalies.
(b) Kapoor and Bhatia (2019) [7] emphasized the benefits of wavelet-transformed fea-
tures, which when coupled with ML models, can provide enhanced results.
5. Machine Learning Models for ECG Classification
(a) Decision trees, SVM, and neural networks have been popular choices for classifying
ECG signals (Osowski and Linh, 2001) [8].
(b) Clifford et al. (2017) [9] showcased the potential of deep learning models in capturing
intricate patterns within ECG signals.
6. Challenges and Limitations
(a) Handling noise and artifacts in ECG signals remains a challenge, which affects the
accuracy of wavelet-based techniques (Luz et al., 2016) [10].
(b) Data imbalance, where certain heart diseases have fewer instances in datasets, may
skew the results of ML models (Jun et al., 2018) [11].
7. Future Outlook
(a) Integration of more advanced deep learning techniques promises further enhancement
in detection accuracy (Hannun et al., 2019) [12].
(b) The potential fusion of ECG with other diagnostic methods and its subsequent anal-
ysis using ML could provide a holistic approach to heart disease detection (Natarajan
et al., 2020) [13].
8. Real-Time Monitoring with Wearables
(a) With wearable technology advancements, continuous ECG monitoring combined with
ML can enable real-time heart disease detection (Wang et al., 2020) [14].
(b) This real-time analysis can lead to personalized healthcare interventions and timely
treatments (Miotto et al., 2021) [15].
3 PROBLEM DEFINITION
Our main problem is automatic cardiac disease diagnosis from ECG data, a task that aims to
identify the presence and type of heart problems from ECG signals. ECG data is a type of
biomedical signal that records the electrical activity of the heart. However, transfer learning
and clustering of ECG data have not yet been widely explored for automatic cardiac disease
diagnosis. Transfer learning is a technique that leverages knowledge learned in one domain to
improve performance in another. Clustering is a method that groups similar data points together
based on a similarity criterion. We will address these issues by proposing two frameworks: one
that uses transfer learning and another that uses clustering to improve the performance and
robustness of automatic cardiac disease diagnosis. Clustering is especially helpful when labelled
data is not available, since obtaining disease labels is expensive: it requires highly skilled experts
who may need to spend hours, if not days, labelling the data. Our problem can therefore be
decomposed into three sub-problems:
1. Removing noise from the ECG signals as a preprocessing step, since recorded ECG signals
are generally noisy due to patient movement.
2. How to retrain pretrained CNNs with ECG spectrogram images and time-domain ECG
waveform images as inputs for ECG analysis.
3. How to extract bottleneck features from the pretrained CNN in both cases and then cluster
the data into two classes (normal and abnormal) to address the problem of unlabelled ECG
data.
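A minimal sketch of the denoising in sub-problem 1, using a zero-phase Butterworth band-pass filter from SciPy. The 0.5–40 Hz pass-band and the 360 Hz sampling rate are common ECG choices assumed here, not values specified in this synopsis.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 360.0  # assumed sampling rate (Hz), typical of PhysioNet recordings

# 4th-order Butterworth band-pass: removes baseline wander (< 0.5 Hz,
# e.g. from patient movement) and high-frequency noise (> 40 Hz).
b, a = butter(4, [0.5, 40.0], btype="bandpass", fs=fs)

def denoise_ecg(signal):
    # filtfilt applies the filter forward and backward, so the output
    # has no phase distortion (the waveform shape is preserved).
    return filtfilt(b, a, signal)

# Demonstration on a synthetic signal: a 5 Hz in-band component plus
# slow 0.1 Hz baseline drift mimicking movement artifacts.
t = np.arange(0, 10, 1 / fs)
clean = np.sin(2 * np.pi * 5 * t)
drift = 0.5 * np.sin(2 * np.pi * 0.1 * t)
denoised = denoise_ecg(clean + drift)
```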
4 METHODOLOGY
We propose two approaches to address the problem of automatic cardiac disease identification.
The first approach is to retrain a pretrained convolutional neural network (CNN) with ECG
spectrogram and ECG time domain waveforms as inputs. The second approach is to extract
bottleneck features from the pretrained CNN in both cases and then cluster the data into two
classes (normal and abnormal) to address the problem of unlabelled ECG data.
A CNN is a type of neural network that can learn to extract features from images or signals
by applying multiple layers of filters. A pretrained CNN is a CNN that has been trained on a
large dataset of images or signals, such as ImageNet or PhysioNet, and can be used as a feature
extractor or fine-tuned for a specific task. In our case, we use a pretrained CNN called ResNet-
50, which has been trained on ImageNet, a dataset of over 14 million images belonging to 1000
classes.
To use ResNet-50 for ECG analysis, we need to convert the ECG signals into images that can be
fed into the network. One way to do this is to use spectrograms, which are visual representations
of the frequency spectrum of a signal. A spectrogram can capture the temporal and spectral
characteristics of an ECG signal, such as the QRS complex, the ST segment and the T wave.
To generate spectrograms from ECG signals, we use a short-time Fourier transform (STFT)
with a window size of 256 samples and a hop size of 64 samples. Another way to convert ECG
signals into images is to use time domain waveforms, which are simply plots of the amplitude
of the signal over time. A time domain waveform can capture the shape and duration of an
ECG signal, such as the P wave, the QRS complex and the T wave. To generate time domain
waveforms from ECG signals, we simply normalize the signal values between 0 and 1 and resize
them to fit the input size of ResNet-50.
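The spectrogram conversion described above can be sketched with SciPy's STFT. The 256-sample window and 64-sample hop follow the text; the sampling rate and the min-max scaling to [0, 1] are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

fs = 360.0  # assumed sampling rate (Hz)

def ecg_to_spectrogram(signal):
    # STFT with a 256-sample window and a 64-sample hop
    # (hop = nperseg - noverlap).
    f, t, Zxx = stft(signal, fs=fs, nperseg=256, noverlap=256 - 64)
    # Log-magnitude spectrogram, min-max scaled to [0, 1] so it can be
    # saved as an image and resized to ResNet-50's 224x224 input.
    spec = np.log1p(np.abs(Zxx))
    return (spec - spec.min()) / (spec.max() - spec.min())

sig = np.sin(2 * np.pi * 5 * np.arange(0, 10, 1 / fs))
img = ecg_to_spectrogram(sig)  # rows: 129 frequency bins; columns: time frames
```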
We will then retrain ResNet-50 with both spectrogram and time domain waveform images as
inputs. We use a balanced dataset of 10,000 ECG signals from PhysioNet, where 5,000 signals
are normal and 5,000 signals are abnormal (including arrhythmia, myocardial infarction, and
congestive heart failure). We will split the dataset into training and test sets, using 80% of the
signals for training, and then preprocess the signals to remove noise using standard noise-reduction
algorithms before feeding them to the network. We will then retrain the network.
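The retraining step can be sketched in Keras (part of the TensorFlow stack listed under Facilities Required). The frozen backbone, the two-class softmax head, and the optimizer settings are our illustrative assumptions rather than final design decisions; weights="imagenet" loads the pretrained backbone described above.

```python
import tensorflow as tf

def build_ecg_classifier(weights="imagenet"):
    # Pretrained ResNet-50 backbone without its 1000-class ImageNet head.
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=(224, 224, 3)
    )
    base.trainable = False  # freeze the backbone; train only the new head
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(2, activation="softmax"),  # normal / abnormal
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-4),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# In practice: model = build_ecg_classifier()
#              model.fit(train_images, train_labels, epochs=10)
```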
Our next approach will target the scenario of unlabelled data using bottleneck features. A
bottleneck feature is a low-dimensional representation of an input that captures its essential in-
formation, which can be obtained by passing an input through a pretrained CNN and extracting
the output of one of its intermediate layers. In our case, we use ResNet-50 as the pretrained
CNN and extract the output of its last convolutional layer, which has a dimensionality of 2048.
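Extracting the 2048-dimensional bottleneck features can be sketched as follows. Using pooling="avg" to average the last convolutional feature map into one 2048-dimensional vector per image is our assumption about where to tap the network.

```python
import numpy as np
import tensorflow as tf

def build_feature_extractor(weights="imagenet"):
    # ResNet-50 without its classification head; global average pooling
    # over the last convolutional feature map yields one 2048-dimensional
    # vector (the bottleneck feature) per input image.
    return tf.keras.applications.ResNet50(
        include_top=False, weights=weights, pooling="avg",
        input_shape=(224, 224, 3),
    )

# weights=None keeps this sketch download-free; pass "imagenet" in practice.
extractor = build_feature_extractor(weights=None)
batch = np.zeros((4, 224, 224, 3), dtype=np.float32)
features = extractor.predict(batch, verbose=0)  # one 2048-vector per image
```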
We will then apply a clustering algorithm to group the bottleneck features into two clusters:
normal and abnormal. Clustering is an unsupervised learning technique that can find patterns
in data without using labels. In our case, we plan to use K-means clustering, which is a simple
and efficient algorithm that assigns each data point to one of K clusters based on its distance to
the cluster centroids. We use the same dataset as before, but without using labels. We extract
bottleneck features from both spectrogram and time domain waveform images using ResNet-50.
We will then apply K-means clustering with K=2 to both sets of features separately. We will
evaluate the clustering performance by measuring accuracy against the ground-truth labels,
which are used only for evaluation, not for clustering.
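The clustering and evaluation step can be sketched with scikit-learn. Because K-means assigns arbitrary cluster IDs, accuracy is computed under the better of the two possible cluster-to-label mappings; the well-separated synthetic 2048-dimensional features below stand in for real bottleneck features.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-ins for bottleneck features: two separated Gaussian
# blobs in 2048 dimensions, labelled 0 (normal) and 1 (abnormal).
normal = rng.normal(0.0, 1.0, size=(100, 2048))
abnormal = rng.normal(3.0, 1.0, size=(100, 2048))
X = np.vstack([normal, abnormal])
y = np.array([0] * 100 + [1] * 100)

# K-means with K=2; labels y are NOT used for clustering.
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Cluster IDs are arbitrary, so score both possible ID-to-label mappings.
acc = max(np.mean(pred == y), np.mean(pred == 1 - y))
```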
5 FACILITIES REQUIRED
1. Software: Programming languages such as Python, together with libraries such as
TensorFlow, scikit-learn, and NumPy, for data analysis, signal processing, and model creation.
2. ECG Database: Access to a broad, labelled ECG database is essential for training and
evaluating the model.
3. Hardware:
(a) Computer: A computer with sufficient processing speed and memory to handle data
processing, model training, and experimentation.
(b) GPU: Because neural networks can exploit parallel processing, a GPU can drastically
shorten training time, particularly for deep learning tasks.
(c) High-resolution display: A high-resolution display makes it easier to view results and
graphs.
(d) Internet connectivity: Downloading databases, libraries, and research articles may
require an internet connection.
Cloud computing platforms can also be considered for projects with limited resources.
6 BIBLIOGRAPHY/REFERENCES
References