
Multimodal Gated Fusion Framework for Cardiovascular Disease Prediction Using ECG and EHR Data

Jothi Prakash V., Thiruselvan S., Chandranath S., Vikeshkumar M., and Ajay Meenatchi Sundaram M.

Department of Information Technology, Karpagam College of Engineering, Myleripalayam Village, Coimbatore 641032, Tamil Nadu, India.
[email protected]

Abstract. Cardiovascular diseases (CVD) are among the leading causes of mortality worldwide, necessitating accurate and reliable predictive
models to improve clinical decision-making. Existing approaches often
focus on single-modality data, such as textual electronic health records
(EHR) or visual electrocardiogram (ECG) signals, limiting their abil-
ity to capture complementary information across modalities. To address
this, we propose the Multimodal Gated Fusion Framework (MGFF), a
novel method that integrates textual EHR data and visual ECG signals
through a gated fusion mechanism, leveraging the ViLT transformer for
robust multimodal feature alignment and classification. The framework
was evaluated on multimodal datasets, achieving an accuracy of 91.2%,
an F1-score of 0.91, and a ROC-AUC of 0.94, significantly outperforming
advanced baselines such as MedFuse and VisualBERT. Extensive experi-
ments, including ablation studies and calibration analysis, demonstrated
the importance of the gated fusion mechanism and the reliability of the
predicted probabilities. While the results are promising, limitations such
as robustness to noisy data and computational efficiency highlight areas
for future improvement. The proposed MGFF provides a reliable, accu-
rate, and scalable solution for CVD prediction, emphasizing the potential
of multimodal approaches in advancing healthcare analytics. Future work
will focus on enhancing robustness, generalizability, and interpretability
for broader clinical adoption.

Keywords: Multimodal Fusion, Cardiovascular Disease Prediction, Deep Learning, Gated Fusion Mechanism, Transformer Models

1 Introduction
Cardiovascular diseases (CVD) [7, 8] remain one of the leading causes of mor-
tality globally, accounting for millions of deaths annually. Early detection and
accurate prediction of CVD are critical for effective clinical intervention and im-
proving patient outcomes. Traditional prediction methods often rely on single-
modality data, such as textual electronic health records (EHR) or visual elec-
trocardiogram (ECG) signals [8, 11]. While these approaches provide valuable
insights, they fail to exploit the complementary nature of multimodal data, which
can offer a more comprehensive understanding of a patient’s condition. This lim-
itation highlights the need for advanced frameworks that integrate diverse data
modalities to enhance predictive accuracy and reliability [11]. Recent advance-
ments in multimodal learning have demonstrated the potential of combining het-
erogeneous data sources for improved performance in various domains, including
healthcare. However, many existing multimodal frameworks face challenges such
as ineffective feature alignment, limited generalizability, and computational in-
efficiency [4, 6]. These limitations motivate the development of a robust and
efficient framework tailored to the specific needs of CVD prediction. In this re-
search, we propose the Multimodal Gated Fusion Framework (MGFF), a novel
approach that integrates textual EHR data and visual ECG signals using a
gated fusion mechanism and a ViLT-based transformer for feature refinement
and classification. The key contributions of this work are as follows:
– We introduce MGFF, a unified framework that leverages gated fusion for
effective multimodal feature integration and ViLT for robust feature align-
ment.
– We demonstrate the superior performance of MGFF on multimodal datasets,
achieving state-of-the-art results with an accuracy of 91.2%, F1-score of 0.91,
and ROC-AUC of 0.94.
– We conduct extensive experiments, including ablation studies, calibration
analysis, and multimodal fusion efficiency evaluation, to validate the effec-
tiveness of the proposed framework.

2 Related Works
The task of cardiovascular disease (CVD) prediction has garnered significant
attention in recent years, with advancements in machine learning enabling the
development of predictive models across various data modalities. Existing works
can be broadly categorized into single-modality approaches and multimodal
frameworks.

2.1 Single-Modality Approaches


Single-modality models focus on leveraging either visual ECG signals or textual
EHR data. Convolutional Neural Networks (CNNs) [11, 10] have been widely
used for ECG analysis due to their ability to capture temporal and spectral
features from raw signals. For instance, ECG-only models have demonstrated
promising results in arrhythmia detection and CVD classification, but their re-
liance on visual features alone limits their ability to incorporate broader clinical
context. Similarly, Deep Neural Networks (DNNs)[8, 1] have been applied to
EHR data, utilizing structured clinical features such as demographics, lab re-
sults, and diagnoses. However, EHR-only models often suffer from insufficient
representation of physiological patterns, which are crucial for comprehensive
CVD prediction [8, 11, 10].

2.2 Multimodal Frameworks

Multimodal frameworks aim to integrate multiple data modalities to address the limitations of single-modality approaches [7, 8]. Simple fusion techniques,
such as feature concatenation [7, 5], have been explored to combine ECG and
EHR data, but these methods often fail to exploit the complementary nature
of the modalities effectively. Advanced attention-based models like MedFuse [9]
have introduced modality-specific encoders and attention mechanisms for fea-
ture alignment, demonstrating improved performance in healthcare applications.
Similarly, VisualBERT [12], originally developed for vision-and-language tasks,
has been adapted for integrating textual and visual healthcare data. While these
models achieve competitive results, challenges such as ineffective feature align-
ment and computational overhead persist. Despite the advancements in multi-
modal learning, several limitations remain. Many frameworks rely on simplis-
tic fusion mechanisms that do not effectively capture the complex interdepen-
dencies between modalities. Additionally, attention-based models like MedFuse
and transformer-based architectures like VisualBERT, while powerful, often re-
quire extensive computational resources, limiting their applicability in resource-
constrained environments [8, 6, 1]. Furthermore, existing methods lack robust
calibration, which is critical for clinical applications requiring reliable probability
estimates.

2.3 Research Gaps

Despite significant advancements in machine learning for cardiovascular disease (CVD) prediction, several critical gaps remain in existing methodologies. Single-
modality models, such as those relying solely on ECG or EHR data, fail to
leverage the complementary nature of multimodal information, limiting their
predictive accuracy. While multimodal frameworks like MedFuse and Visual-
BERT have shown promise, they often employ simplistic fusion mechanisms or
attention-based architectures that lack efficient feature alignment and integra-
tion. Additionally, these models are computationally intensive, posing challenges
for deployment in resource-constrained environments. Furthermore, existing ap-
proaches often overlook the importance of model calibration, which is essential
for producing reliable probability estimates in clinical decision-making. Lastly,
the interpretability of current models remains inadequate, making it difficult for
clinicians to trust or act on predictions. Addressing these gaps necessitates the
development of robust, efficient, and interpretable multimodal frameworks that
can effectively integrate diverse data modalities while maintaining reliability and
scalability.

3 Methodology

The proposed Multimodal Gated Fusion Framework (MGFF) aims to predict cardiovascular diseases (CVD) by integrating textual (EHR) and visual (ECG) data. The framework leverages advanced feature extraction techniques, a gated fusion mechanism for effective multimodal alignment, and a transformer-based classification module to ensure robust and accurate predictions. The key components of the framework are illustrated in Figure 1 and detailed below.

Fig. 1: Overview of the MGFF Framework.

3.1 Datasets
The ECG [3] and MIT-BIH [2] datasets, designed for multimodal analysis, com-
prise a total of 22,225 samples with distinct features and classes. The ECG
dataset includes 333 samples with 21 unique features, while the MIT-BIH dataset
contains 21,892 samples with 188 unique features. Both datasets are categorized
into two classes, representing normal and disease states, with 215 normal sam-
ples and 118 disease samples in each dataset. The multimodal nature of the
data incorporates both textual clinical records and visual ECG signals, ensur-
ing comprehensive representation. Preprocessing techniques, including normal-
ization, tokenization, and augmentation, were applied to maintain data quality
and enhance model compatibility. Normalization was applied to ensure consis-
tent data ranges for all features, while tokenization converted textual EHR data
into embeddings compatible with the model. Augmentation techniques, such as flipping and scaling, were applied to ECG data to improve model generalization.
Table 1 summarizes the dataset statistics.

Table 1: CVD Dataset Statistics


Statistic ECG Dataset MIT-BIH Dataset
Total Samples 333 21,892
Unique Features 21 188
Normal Samples 215 215
Disease Samples 118 118
Data Types Visual Textual
Multi-modal Representation Yes Yes
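
The preprocessing described above (normalization of feature ranges and flip/scale augmentation of ECG signals) might be implemented along these lines. This is a minimal sketch: the function names and the min-max formulation are illustrative assumptions, and tokenization of the textual EHR fields is omitted since the tokenizer is not specified in the paper.

```python
import numpy as np

def normalize_features(x: np.ndarray) -> np.ndarray:
    """Min-max normalization per feature column (one reading of the
    'consistent data ranges' requirement)."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min + 1e-8)

def augment_ecg(signal: np.ndarray, scale_range=(0.9, 1.1)) -> np.ndarray:
    """Time-reversal flip plus random amplitude scaling, one interpretation
    of the 'flipping and scaling' augmentation mentioned above."""
    flipped = signal[::-1].copy()
    scale = np.random.uniform(*scale_range)
    return flipped * scale
```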

3.2 Feature Extraction

Feature extraction is a crucial step in the proposed Multimodal Fusion Framework. Both ECG and EHR data are independently processed to represent their
respective modalities. The extracted features are subsequently aligned and fused
using a gated mechanism for the classification task.

ECG Feature Extraction The visual features of the ECG dataset are ex-
tracted using a Convolutional Neural Network (CNN)-based encoder. Given an
input ECG signal E, the CNN processes the signal through a series of convolu-
tional layers, activation functions, and pooling operations to generate a feature
map Fe ∈ Rde , where de is the dimensionality of the ECG feature vector. The
feature extraction process is mathematically expressed as:

Fe = CNN(E), (1)

where Fe captures the temporal and spectral information inherent in the ECG
signal.
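
The paper does not detail the CNN encoder's architecture; the following PyTorch sketch shows one plausible 1-D convolutional encoder for Eq. (1), with channel counts, kernel sizes, and the output dimension de chosen purely for illustration.

```python
import torch
import torch.nn as nn

class ECGEncoder(nn.Module):
    """Minimal CNN encoder mapping a single-lead ECG signal E to F_e (Eq. 1).
    Channel counts, kernel sizes, and d_e are illustrative assumptions."""
    def __init__(self, d_e: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.AdaptiveAvgPool1d(1),  # collapse the time dimension
        )
        self.proj = nn.Linear(64, d_e)

    def forward(self, ecg: torch.Tensor) -> torch.Tensor:
        # ecg: (batch, 1, length) -> F_e: (batch, d_e)
        return self.proj(self.conv(ecg).squeeze(-1))
```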

EHR Feature Extraction Structured features from the EHR dataset are
derived using a Deep Neural Network (DNN)-based encoder. Given an input
EHR data H, the DNN processes the data through multiple fully connected
layers to extract meaningful features, represented as Fh ∈ Rdh , where dh is
the dimensionality of the EHR feature vector. The process is mathematically
represented as:
Fh = DNN(H), (2)
where Fh encapsulates the patient’s health history, including diagnostic and
demographic information.
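
Similarly, a minimal fully connected encoder for Eq. (2) could look as follows; the hidden-layer sizes and dh are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class EHREncoder(nn.Module):
    """Minimal DNN encoder mapping structured EHR features H to F_h (Eq. 2).
    Hidden sizes and d_h are illustrative assumptions."""
    def __init__(self, n_features: int, d_h: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(256, d_h), nn.ReLU(),
        )

    def forward(self, ehr: torch.Tensor) -> torch.Tensor:
        # ehr: (batch, n_features) -> F_h: (batch, d_h)
        return self.mlp(ehr)
```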

Multimodal Feature Fusion The visual features Fe and structured features Fh are aligned and integrated through a gated fusion mechanism, which applies learnable gates to control the contribution of each modality to the fused feature vector, so that the most relevant information from both modalities is emphasized during classification. The fused multimodal feature vector Fm ∈ Rdm is
computed as:
Fm = GatedFusion(Fe , Fh ), (3)

where Fm represents the integrated features from both ECG and EHR data,
which are subsequently utilized for classification.
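
The exact form of GatedFusion(·) in Eq. (3) is not spelled out in the paper; the sketch below uses a common sigmoid-gate formulation, g = σ(Wg[Fe; Fh]) and Fm = g ⊙ WeFe + (1 − g) ⊙ WhFh, which is one reasonable reading of "learnable gates controlling the contribution of each modality".

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sigmoid-gated fusion of F_e and F_h into F_m (one reading of Eq. 3):
    g = sigmoid(W_g [F_e; F_h]); F_m = g * W_e F_e + (1 - g) * W_h F_h."""
    def __init__(self, d_e: int, d_h: int, d_m: int = 128):
        super().__init__()
        self.proj_e = nn.Linear(d_e, d_m)
        self.proj_h = nn.Linear(d_h, d_m)
        self.gate = nn.Sequential(nn.Linear(d_e + d_h, d_m), nn.Sigmoid())

    def forward(self, f_e: torch.Tensor, f_h: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([f_e, f_h], dim=-1))  # per-dimension learnable gate
        return g * self.proj_e(f_e) + (1.0 - g) * self.proj_h(f_h)  # F_m
```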

3.3 Classification

The fused multimodal feature vector Fm , obtained through the gated fusion
mechanism, is fed into the classification layer to predict the likelihood of car-
diovascular disease (CVD). This layer is designed to refine and classify the mul-
timodal representation effectively, leveraging the ViLT (Vision-and-Language
Transformer) model. ViLT was chosen for its efficiency in processing multimodal
data by directly operating on aligned textual and visual embeddings without re-
quiring heavy pretraining on paired datasets.

Linear Projection The fused feature vector Fm ∈ Rdm is first transformed into a lower-dimensional space compatible with the ViLT architecture. This is
achieved through a linear projection layer:

Fp = Wp Fm + bp , (4)

where Wp ∈ Rdt ×dm is the projection matrix, bp ∈ Rdt is the bias vector, and
dt represents the dimensionality of the projected feature vector Fp .

Transformer-based Feature Refinement The projected features Fp are further refined using the Vision-and-Language Transformer (ViLT). ViLT utilizes a
self-attention mechanism to capture interdependencies and higher-order relation-
ships between the ECG (visual) and EHR (textual) modalities. This refinement
process is represented as:
Fr = ViLT(Fp ), (5)

where Fr ∈ Rdt denotes the refined feature vector. ViLT ensures that the fused
representation incorporates both spatial and semantic alignment across modali-
ties.
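
As an illustration of Eqs. (4) and (5), the sketch below applies a linear projection followed by self-attention refinement. A generic PyTorch TransformerEncoder is used here as a stand-in for the actual ViLT backbone, since the paper does not describe how the fused vector is passed to ViLT; the dimensions and layer counts are assumptions.

```python
import torch
import torch.nn as nn

class FusionRefiner(nn.Module):
    """Stand-in for Eqs. (4)-(5): linear projection of F_m to F_p, then
    self-attention refinement to F_r. A generic TransformerEncoder replaces
    the actual ViLT backbone here, purely for illustration."""
    def __init__(self, d_m: int = 128, d_t: int = 64, n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.project = nn.Linear(d_m, d_t)  # Eq. (4)
        layer = nn.TransformerEncoderLayer(d_model=d_t, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, f_m: torch.Tensor) -> torch.Tensor:
        f_p = self.project(f_m)                           # (batch, d_t)
        return self.encoder(f_p.unsqueeze(1)).squeeze(1)  # F_r, Eq. (5)
```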

MLP Head and Prediction The refined feature vector Fr is passed through a
Multi-Layer Perceptron (MLP) head for classification. The MLP consists of one
or more fully connected layers and a softmax function, which outputs the proba-
bility distribution over the classes. The classification probabilities are computed
as:
ŷ = Softmax(Wo Fr + bo ), (6)

where Wo ∈ RC×dt is the weight matrix, bo ∈ RC is the bias vector, and C is the number of classes (e.g., normal and disease states).

Loss Function The model is trained using a cross-entropy loss function, which
quantifies the difference between the predicted probability distribution ŷ and the
true labels y. The cross-entropy loss function was chosen as it is well-suited for
binary classification tasks, effectively penalizing incorrect predictions based on
the predicted probability distribution. The loss function is defined as:

L = −(1/N) Σ_{i=1}^{N} Σ_{c=1}^{C} y_{i,c} log(ŷ_{i,c}),     (7)

where N is the number of samples, yi,c is the ground-truth label for class c of
sample i, and ŷi,c is the predicted probability for class c. The final output of the
classification layer is the predicted class label:

Class = arg max_c (ŷ_c),     (8)

where ŷc is the predicted probability for class c. The class with the highest
probability is selected as the final prediction, indicating whether the sample
belongs to the normal or disease category.
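
A compact sketch of the classification stage follows, combining the MLP head of Eq. (6), the cross-entropy loss of Eq. (7) (via PyTorch's CrossEntropyLoss applied to logits), and the arg-max prediction of Eq. (8); the layer widths are assumptions.

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """MLP head mapping refined features F_r to class logits (Eq. 6);
    the hidden width is an assumption."""
    def __init__(self, d_t: int = 64, n_classes: int = 2):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_t, 32), nn.ReLU(), nn.Linear(32, n_classes))

    def forward(self, f_r: torch.Tensor) -> torch.Tensor:
        return self.mlp(f_r)  # logits W_o F_r + b_o

head = ClassificationHead()
criterion = nn.CrossEntropyLoss()          # cross-entropy loss of Eq. (7), on logits
logits = head(torch.randn(8, 64))          # dummy batch of refined features
labels = torch.randint(0, 2, (8,))
loss = criterion(logits, labels)
probs = torch.softmax(logits, dim=-1)      # ŷ, Eq. (6)
pred_class = probs.argmax(dim=-1)          # Eq. (8)
```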

4 Experimental Evaluation

The experiments were conducted using the ECG and MIT-BIH datasets, with
an 80%-10%-10% split for training, validation, and testing, respec-
tively. All data preprocessing steps, including normalization for ECG signals and
tokenization for EHR records, were applied to ensure consistency and compati-
bility with the proposed framework. The model was implemented using Python
3.8 with PyTorch 1.12.0 and trained on an NVIDIA V100 GPU with 32GB of
memory. The Adam optimizer was used with an initial learning rate of 0.001,
a batch size of 64, and training was performed for 50 epochs. Hyperparameter
tuning was carried out using grid search to optimize model performance. Ad-
ditionally, 5-fold cross-validation was employed to evaluate the robustness and
generalizability of the model.
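
For reference, a minimal training loop matching the reported settings (Adam, learning rate 0.001, batch size 64, 50 epochs) might look as follows; the dataset tensors and the stand-in model are placeholders, not the actual MGFF pipeline.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder fused features and labels standing in for the real pipeline.
features, labels = torch.randn(1000, 128), torch.randint(0, 2, (1000,))
loader = DataLoader(TensorDataset(features, labels), batch_size=64, shuffle=True)

model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # settings reported in Section 4
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(50):  # 50 epochs, as reported
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```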

4.1 Evaluation Metrics

The performance of the proposed multimodal fusion framework is evaluated using standard metrics, including accuracy, precision, recall (sensitivity), F1-
score, and ROC-AUC. These metrics provide a comprehensive understanding of
the model’s classification performance, balancing overall correctness, the ability
to detect positive cases, and the trade-off between precision and recall.
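
These metrics can be computed directly with scikit-learn; the snippet below is a minimal example assuming binary labels and predicted probabilities for the disease class (the values shown are toy data).

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# y_true: ground-truth labels, y_prob: predicted probability of the disease class
y_true = [0, 1, 1, 0, 1]
y_prob = [0.1, 0.8, 0.6, 0.3, 0.9]
y_pred = [int(p >= 0.5) for p in y_prob]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "roc_auc": roc_auc_score(y_true, y_prob),  # AUC uses probabilities, not hard labels
}
print(metrics)
```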

4.2 Baseline Models

To evaluate the performance of the proposed multimodal fusion framework, we compare it against five recent and relevant baseline models. The first baseline is a
CNN-based ECG-only model [10] that processes the visual ECG signals indepen-
dently, assessing the contribution of the visual modality in isolation. Similarly,
the second baseline is a DNN-based EHR-only model [1] that utilizes structured
EHR data alone to evaluate the effectiveness of the textual modality. For multi-
modal fusion, we include a simple fusion baseline that combines ECG and EHR
features through feature concatenation without the gated fusion mechanism [5],
providing insights into the importance of the proposed alignment strategy. Addi-
tionally, we compare against MedFuse [9], a recent multimodal fusion framework
specifically designed for integrating clinical text and medical imaging data using
attention-based mechanisms for feature alignment. Finally, VisualBERT [12], a
transformer-based model originally developed for vision-and-language tasks, is
adapted to process ECG signals and EHR data by treating ECG signals as visual
embeddings and EHR data as textual inputs. These baselines allow for a com-
prehensive comparison, showcasing the advantages of the proposed framework
in leveraging both modalities for robust cardiovascular disease prediction.

4.3 Evaluation with Baselines

The effectiveness of the proposed Multimodal Gated Fusion Framework is evaluated against several baseline models using key metrics: accuracy, precision, recall,
F1-score, and ROC-AUC. As shown in Table 2, the proposed framework outper-
forms all baselines, achieving the highest accuracy (91.2%), precision (0.92),
recall (0.90), F1-score (0.91), and ROC-AUC (0.94). Single-modality baselines,
including ECG-only (CNN) and EHR-only (DNN) models, demonstrate lim-
ited performance, highlighting the necessity of multimodal integration. While
the simple fusion model, using feature concatenation, shows moderate improve-
ments, it fails to capture the complementary nature of ECG and EHR data effec-
tively. Advanced baselines like MedFuse and VisualBERT, leveraging attention
and transformer-based architectures, achieve competitive results; however, the
proposed framework surpasses them due to its gated fusion mechanism and the
use of ViLT for precise alignment and feature refinement. These results under-
score the framework’s ability to robustly integrate multimodal data for accurate
cardiovascular disease prediction.

Table 2: Performance Comparison with Baseline Models


Model Accuracy (%) Precision Recall F1-Score ROC-AUC
ECG-only (CNN) 83.5 0.82 0.79 0.81 0.85
EHR-only (DNN) 79.2 0.78 0.76 0.78 0.82
Simple Fusion (Concatenation) 85.3 0.84 0.82 0.83 0.87
MedFuse 87.4 0.87 0.85 0.86 0.89
VisualBERT 88.1 0.88 0.86 0.87 0.90
MGFF (Proposed) 91.2 0.92 0.90 0.91 0.94

4.4 Ablation Study


To evaluate the contributions of individual components in the Multimodal Gated
Fusion Framework (MGFF), an ablation study was conducted. Table 3 presents
the results of removing or modifying key components of the framework. The base-
line MGFF (Proposed) achieves the highest accuracy (91.2%), precision (0.92),
recall (0.90), F1-score (0.91), and ROC-AUC (0.94). Removing the gated fusion
mechanism and replacing it with simple concatenation results in a noticeable
performance drop, with accuracy decreasing to 87.6%, underscoring the gated
mechanism’s importance for effective feature integration. Similarly, replacing the
ViLT transformer with a simpler MLP leads to reduced accuracy (88.4%), pre-
cision (0.88), and F1-score (0.87), demonstrating the transformer’s critical role
in refining multimodal features. Single-modality experiments with ECG or EHR
data alone show significantly lower performance across all metrics, emphasizing
the necessity of multimodal integration. These results validate the design choices
in MGFF and highlight the synergistic effect of gated fusion and transformer-
based refinement in achieving robust cardiovascular disease prediction.

Table 3: Results of Ablation Study


Variant Accuracy (%) Precision Recall F1-Score ROC-AUC
MGFF (Proposed) 91.2 0.92 0.90 0.91 0.94
Without Gated Fusion (Concatenation) 87.6 0.88 0.85 0.86 0.88
Without ViLT (Using MLP) 88.4 0.88 0.86 0.87 0.89
ECG Only (CNN) 83.5 0.82 0.79 0.81 0.85
EHR Only (DNN) 79.2 0.78 0.76 0.78 0.82

4.5 Statistical Analysis


To ensure the reliability and significance of the observed performance improve-
ments, we conducted a statistical analysis comparing the proposed Multimodal
Gated Fusion Framework (MGFF) with baseline models. The statistical signif-
icance of the differences in performance metrics was evaluated using a paired
t-test at a 95% confidence level. Table 4 summarizes the mean and standard
deviation of accuracy, precision, recall, F1-score, and ROC-AUC across 5 runs
for each model, along with the p-values comparing MGFF with each baseline.
The results in Table 4 indicate that the proposed MGFF significantly outper-
forms all baseline models across all metrics, with p-values less than 0.05 for most
comparisons. The low standard deviation of MGFF’s performance metrics across


5 runs demonstrates its stability and robustness. Baseline models like MedFuse
and VisualBERT, while achieving competitive results, show statistically signif-
icant differences when compared to MGFF. The paired t-test results confirm
that the improvements brought by the gated fusion mechanism and ViLT-based
refinement are statistically significant, further validating the effectiveness of the
proposed framework for multimodal cardiovascular disease prediction.

Table 4: Statistical Analysis of Performance Metrics


Model Accuracy (%) Precision Recall F1-Score ROC-AUC p-value
ECG-only (CNN) 83.5 ± 1.2 0.82 ± 0.01 0.79 ± 0.02 0.81 ± 0.01 0.85 ± 0.02 < 0.001
EHR-only (DNN) 79.2 ± 1.5 0.78 ± 0.02 0.76 ± 0.02 0.78 ± 0.02 0.82 ± 0.01 < 0.001
Simple Fusion 85.3 ± 0.9 0.84 ± 0.01 0.82 ± 0.01 0.83 ± 0.01 0.87 ± 0.01 < 0.01
MedFuse 87.4 ± 0.7 0.87 ± 0.01 0.85 ± 0.01 0.86 ± 0.01 0.89 ± 0.01 < 0.05
VisualBERT 88.1 ± 0.8 0.88 ± 0.01 0.86 ± 0.01 0.87 ± 0.01 0.90 ± 0.01 < 0.05
MGFF (Proposed) 91.2 ± 0.6 0.92 ± 0.01 0.90 ± 0.01 0.91 ± 0.01 0.94 ± 0.01 –
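
The paired t-test described above can be run with SciPy; the per-run accuracies below are placeholders for illustration, not the paper's raw results.

```python
from scipy.stats import ttest_rel

# Illustrative per-run accuracies over 5 runs for MGFF and one baseline
# (placeholder values, not the reported measurements).
mgff_acc    = [0.915, 0.908, 0.912, 0.918, 0.907]
medfuse_acc = [0.871, 0.876, 0.869, 0.880, 0.874]

t_stat, p_value = ttest_rel(mgff_acc, medfuse_acc)  # paired t-test across matched runs
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```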

4.6 Multimodal Fusion Efficiency Analysis

To evaluate the efficiency of the proposed gated fusion mechanism in integrating multimodal data, we compared it with alternative fusion strategies, including
early fusion, late fusion, and simple concatenation. Table 5 presents the results
for accuracy, precision, recall, F1-score, and ROC-AUC across these fusion meth-
ods. The proposed gated fusion mechanism achieves the highest performance,
with an accuracy of 91.2%, precision of 0.92, recall of 0.90, F1-score of 0.91,
and ROC-AUC of 0.94. Early fusion, which combines raw data before feature
extraction, performs poorly due to insufficient alignment of modalities, resulting
in an accuracy of 83.7%. Late fusion, which merges predictions from individual
modality-specific models, shows improved results with an accuracy of 86.1%, but
it lacks the synergistic benefits of joint feature representation. Simple concatena-
tion achieves moderate performance with an accuracy of 85.3%, highlighting the
limitations of naive feature integration. These results validate the effectiveness
of the gated fusion mechanism in aligning and integrating multimodal features,
enabling robust cardiovascular disease prediction.

Table 5: Comparison of Fusion Strategies


Fusion Method Accuracy (%) Precision Recall F1-Score ROC-AUC
Early Fusion 83.7 0.81 0.78 0.80 0.84
Late Fusion 86.1 0.85 0.83 0.84 0.87
Simple Concatenation 85.3 0.84 0.82 0.83 0.87
Gated Fusion (Proposed) 91.2 0.92 0.90 0.91 0.94

4.7 Calibration Analysis


To evaluate the reliability of the predicted probabilities, we performed a cali-
bration analysis of the proposed Multimodal Gated Fusion Framework (MGFF).
Calibration measures how closely the predicted probabilities match the actual
outcomes. A well-calibrated model produces probabilities that accurately reflect
the likelihood of a positive prediction. Calibration curves were used for visual
interpretation, as shown in Figure 2, with the diagonal line representing perfect
calibration.
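
A calibration (reliability) curve of the kind shown in Figure 2 can be produced with scikit-learn; the sketch below uses toy labels and probabilities purely for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# y_true: binary labels, y_prob: predicted probability of the disease class (toy data)
y_true = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.2, 0.7, 0.8, 0.6, 0.3, 0.9, 0.4, 0.75, 0.55]

frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=5)
plt.plot(mean_pred, frac_pos, marker="o", label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="perfect calibration")
plt.xlabel("Mean predicted probability")
plt.ylabel("Fraction of positives")
plt.legend()
plt.show()
```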

Fig. 2: Calibration curves for MGFF (Proposed) and baseline models. The
diagonal line represents perfect calibration.

The calibration curves in Figure 2 demonstrate that the proposed MGFF achieves the closest alignment with the diagonal line, indicating superior calibra-
tion compared to the baseline models. Models such as ECG-only and EHR-only
exhibit significant deviations, highlighting lower reliability in predicted probabil-
ities. Advanced baselines like MedFuse and VisualBERT show better alignment
but still fall short of MGFF’s performance. These results validate the ability
of MGFF to produce well-calibrated predictions, enhancing its applicability in
real-world healthcare scenarios where reliable probability estimates are crucial.

5 Limitations
While the proposed Multimodal Gated Fusion Framework (MGFF) demon-
strates significant improvements in cardiovascular disease prediction, it has cer-
tain limitations. First, the model relies on high-quality and well-structured multi-
modal datasets, which may not always be available in real-world clinical settings.
The performance of MGFF may degrade when dealing with noisy or incomplete
data, such as missing ECG signals or incomplete EHR records. Second, the com-
putational complexity of the ViLT-based transformer and gated fusion mecha-
nism requires significant hardware resources, making deployment on low-resource
devices challenging. Third, the framework is tailored to the specific modalities of
ECG and EHR data, limiting its generalizability to other healthcare domains or
multimodal datasets without additional customization. Lastly, while the model
achieves high accuracy and calibration, its interpretability could be further en-
hanced to provide actionable insights for clinicians, such as highlighting specific
contributing features from each modality. Addressing these limitations could
further improve the practical applicability and scalability of MGFF in diverse
clinical scenarios.

6 Conclusion
In this research, we proposed the Multimodal Gated Fusion Framework (MGFF)
for cardiovascular disease prediction, integrating textual EHR data and visual
ECG signals using a gated fusion mechanism and ViLT-based transformer for
feature refinement. The framework demonstrated superior performance across
multiple metrics, achieving an accuracy of 91.2%, an F1-score of 0.91, and
a ROC-AUC of 0.94, outperforming advanced baselines such as MedFuse and
VisualBERT. Extensive analysis, including ablation studies, calibration assess-
ment, and multimodal fusion efficiency evaluation, validated the significance of
the gated fusion mechanism and the transformer’s ability to align multimodal
features effectively. Furthermore, the calibration analysis showed that MGFF
produces well-calibrated predictions, enhancing its reliability for clinical appli-
cations. Despite these promising results, future work will focus on addressing the
limitations of the framework, including improving robustness to noisy and incom-
plete data, reducing computational complexity for deployment on low-resource
devices, and generalizing the approach to other multimodal healthcare datasets.
Additionally, enhancing the interpretability of the framework to provide action-
able insights for clinicians will be prioritized to increase its practical utility in
real-world scenarios.
References

[1] Muhammad Shakeel Akram, Bogaraju Sharatchandra Varma, and Dewar Finlay.
Embedded DNN classifier for five different cardiac diseases. In 2024 35th Irish
Signals and Systems Conference (ISSC), pages 01–06. IEEE, 6 2024.
[2] Akshita Gour, Muktesh Gupta, Rajesh Wadhvani, and Sanyam Shukla. ECG based
heart disease classification: Advancement and review of techniques. Procedia Com-
puter Science, 235:1634–1648, 2024.
[3] Muhammad Salman Haleem, Rossana Castaldo, Silvio Marcello Pagliara, Mario
Petretta, Marco Salvatore, Monica Franzese, and Leandro Pecchia. Time adap-
tive ECG driven cardiovascular disease detector. Biomedical Signal Processing and
Control, 70:102968, 9 2021.
[4] Biyanka Jaltotage, Juan Lu, and Girish Dwivedi. Use of artificial intelligence in-
cluding multimodal systems to improve the management of cardiovascular disease.
Canadian Journal of Cardiology, 40:1804–1812, 10 2024.
[5] Muhammad Umar Khan, Sumair Aziz, Khushbakht Iqtidar, Galila Faisal Zaher,
Shareefa Alghamdi, and Munazza Gull. A two-stage classification model integrat-
ing feature fusion for coronary artery disease detection and classification. Multi-
media Tools and Applications, 81:13661–13690, 4 2022.
[6] Mohammad Moshawrab, Mehdi Adda, Abdenour Bouzouane, Hussein Ibrahim,
and Ali Raad. Reviewing multimodal machine learning and its use in cardiovas-
cular diseases detection. Electronics, 12:1558, 3 2023.
[7] V. Jothi Prakash and N. K. Karthikeyan. Enhanced evolutionary feature selec-
tion and ensemble method for cardiovascular disease prediction. Interdisciplinary
Sciences: Computational Life Sciences, 13:389–412, 9 2021.
[8] V. Jothi Prakash and N. K. Karthikeyan. Dual-layer deep ensemble techniques
for classifying heart disease. Information Technology and Control, 51:158–179, 3
2022.
[9] Ali Rasekh, Reza Heidari, Amir Hosein Haji Mohammad Rezaie, Parsa Sharifi
Sedeh, Zahra Ahmadi, Prasenjit Mitra, and Wolfgang Nejdl. Robust fusion of
time series and image data for improved multimodal clinical prediction. IEEE
Access, 12:174107–174121, 2024.
[10] Arul Antran Vijay Subramanian and Jothi Prakash Venugopal. A deep ensem-
ble network model for classifying and predicting breast cancer. Computational
Intelligence, 39:258–282, 4 2023.
[11] Jothi Prakash V., Arul Antran Vijay S., Ganesh Kumar P., and Karthikeyan N.K.
A novel attention-based cross-modal transfer learning framework for predicting
cardiovascular disease. Computers in Biology and Medicine, 170:107977, 3 2024.
[12] Junxin Wang, Juanen Li, Rui Wang, and Xinqi Zhou. VAE-driven multimodal
fusion for early cardiac disease detection. IEEE Access, 12:90535–90551, 2024.
