
MultimodNet A Multimodal Deep Learning Model For COPD Staging Based On Chest X-Ray and Pulmonary Function Tests

The document presents a study on MultimodNet, a multimodal deep learning model designed to enhance the diagnosis and staging of Chronic Obstructive Pulmonary Disease (COPD) by integrating chest X-ray images and Pulmonary Function Test (PFT) data. This approach aims to improve diagnostic accuracy and patient outcomes by classifying COPD into four stages: Initial, Progressive, Complicated, and Critical, leveraging both structural and functional data. The research highlights the potential of AI-driven solutions in advancing disease detection and management in healthcare.


Proceedings of the Third International Conference on Automation, Computing and Renewable Systems (ICACRS-2024)

IEEE Xplore Part Number: CFP24CB5-ART; ISBN: 979-8-3315-3242-0

MultimodNet: A Multimodal Deep Learning Model for COPD Staging based on Chest X-Ray and Pulmonary Function Tests

2024 3rd International Conference on Automation, Computing and Renewable Systems (ICACRS) | 979-8-3315-3242-0/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICACRS62842.2024.10841790


A. Krishnaveni 1, Professor, Department of Mechanical Engineering, Government College of Engineering, Tirunelveli, India
G. Vinoth Rajkumar 2, Assistant Professor, Department of Electronics and Communication Engineering, J.P. College of Engineering, Tenkasi, India
J. Relin Francis Raj 3, Associate Professor, Department of Electronics and Communication Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, India
R. Santhana Krishnan 4, Assistant Professor, Department of Electronics and Communication Engineering, SCAD College of Engineering and Technology, Cheranmahadevi, India
S. Murali 5, Assistant Professor, Department of Computer Science and Engineering, Velammal College of Engineering and Technology, Madurai, India
Imran Javeed Settu 6, Assistant Professor, Department of Electronics and Communication Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Chennai, India

Abstract— Chronic Obstructive Pulmonary Disease (COPD) represents a major global health challenge, and early, accurate diagnosis is essential for improving patient outcomes. Traditional diagnostic approaches often rely on single-modality assessments, which may not fully capture the complexity of the disease. This research introduces MultimodNet, a deep learning framework that integrates both chest X-ray images and Pulmonary Function Test (PFT) data to provide a more comprehensive classification of COPD stages. By combining structural information from chest X-rays with functional lung metrics such as FEV1, FVC, and the FEV1/FVC ratio, MultimodNet aims to enhance the accuracy and reliability of COPD diagnosis. The core strength of MultimodNet lies in its multimodal approach, which processes and fuses diverse data types to overcome the limitations of using a single modality. This integration helps the model provide more accurate predictions of COPD progression, classifying patients into one of four stages: Initial, Progressive, Complicated, and Critical. These stages reflect the disease's clinical trajectory, from the onset of symptoms through to advanced stages where life-threatening complications may arise. By leveraging both imaging and clinical data, MultimodNet offers a diagnostic tool that can assist healthcare providers in making early and precise decisions, leading to better treatment planning and improved patient care. The framework has the potential not only to enhance diagnostic workflows but also to contribute to the broader field of AI-driven healthcare solutions, where multimodal data integration plays a key role in advancing disease detection and management.

Keywords— Multimodal Data Integration, MultimodNet, AlexNet, Pulmonary Function Test, COPD Diagnosis, Adam Optimizer.

I. INTRODUCTION

Chronic Obstructive Pulmonary Disease (COPD) is a significant global health concern, ranking among the leading causes of morbidity and mortality worldwide. It is a progressive respiratory condition characterized by persistent airflow limitation and associated with chronic inflammation of the airways and lungs. COPD has a profound impact on individuals' quality of life, leading to symptoms such as breathlessness, chronic cough, and frequent exacerbations that often require hospitalization. These exacerbations can cause rapid declines in lung function, placing a substantial burden on healthcare systems and resulting in significant socioeconomic costs.

Early and accurate diagnosis is critical in mitigating these effects, enabling timely interventions that can slow disease progression, improve patient outcomes, and reduce the overall healthcare burden. Traditional diagnostic approaches for COPD have predominantly relied on Pulmonary Function Tests (PFTs), which measure critical parameters such as Forced Expiratory Volume in one second (FEV1), Forced Vital Capacity (FVC), and the FEV1/FVC ratio. While PFTs provide essential quantitative insights into lung function, they offer no structural information about lung anatomy, which is often crucial for a comprehensive understanding of disease progression. Conversely, chest X-rays are commonly employed to visualize lung structure, identify abnormalities, and rule out other conditions. However, these imaging methods lack the functional details that PFTs provide. Relying on either modality independently may result in incomplete assessments, leading to underdiagnosis, misclassification, or delayed treatment. Thus, there is a pressing need for diagnostic systems that combine these complementary data sources to provide a holistic understanding of COPD. Recent advances in machine learning and deep learning have transformed medical diagnostics, offering powerful tools for the analysis of complex data. Many existing systems leverage deep learning models, such as convolutional neural networks (CNNs), to

979-8-3315-3242-0/24/$31.00 ©2024 IEEE 1683


Authorized licensed use limited to: AMRUTA INSTITUTE OF ENGINEERING & MANAGEMENT SCIENCES. Downloaded on March 17, 2025 at 16:24:30 UTC from IEEE Xplore. Restrictions apply.

extract meaningful features from medical images such as chest X-rays. Similarly, machine learning algorithms have been applied to clinical datasets, including PFT metrics, for disease prediction and classification. However, most of these methods focus on a single data modality, which limits their diagnostic potential. Although some studies have explored multimodal approaches, their scope has often been constrained by insufficient integration strategies, suboptimal model architectures, or a lack of comprehensive validation.

MultimodNet is a multimodal deep learning framework designed to enhance the diagnosis and staging of Chronic Obstructive Pulmonary Disease (COPD). It integrates structural data from chest X-rays with functional insights from Pulmonary Function Tests (PFTs), addressing the limitations of traditional single-modality methods. By classifying COPD into four stages (Initial, Progressive, Complicated, and Critical), the model supports early detection and enables personalized treatment planning. This approach enhances diagnostic accuracy, supports better patient outcomes, and illustrates the potential of AI-driven multimodal solutions in managing chronic diseases like COPD.

II. RELATED WORKS

Ikechukwu et al. applied deep learning to chest X-rays (CXRs) for early COPD detection using the VinDr-CXR dataset. The Xception model, fine-tuned to achieve a recall of 98.2%, outperformed ResNet50 [1]. Grad-CAM and SHAP were used to provide explainability, highlighting deep learning's potential for early COPD diagnosis in resource-limited settings. V. Koshta et al. introduced Fourier Decomposition Method models using the DCT and DFT to classify asthma and COPD based on lung sound signals [2]. The models achieved high accuracy, with DCT reaching 99.4% for asthma vs. COPD and DFT reaching 99.8% for asthma vs. normal, using statistical attributes and classifiers such as SVM, kNN, and ensemble classifiers. A. Kinikar et al. used the GOLD criteria to develop prediction models for determining the severity of COPD [3]. The models used machine learning classifiers trained on derived features, such as XGBoost, SVC, Naive Bayes, Random Forest, and a Hard Voting Ensemble. They forecast COPD severity on a scale of 1 to 4, aiding early identification and treatment, and attained an accuracy of 97.6%. H. J. Davies et al. developed a COPD simulator that generates COPD-like data from healthy subjects to enhance deep learning model training [4]. The simulator's effectiveness was validated through analysis of waveforms and FEV1/FVC ratios, achieving an area under the curve of 0.75 in detecting COPD when trained on surrogate data. J. E. Nikshya et al. introduced the CCA-RFE Selector (CCARS) algorithm for early leukemia detection by integrating multi-omics data. The algorithm enhanced feature selection and classification accuracy, outperforming existing methods such as PCA and Lasso Regression. This approach, which improved accuracy and reduced computational time, could also be applied to COPD detection, leveraging multi-omics data for better diagnostics [5].

S. Kumar et al. developed the "FuzzyGuard" framework, which uses cough audio, lung sounds, and CT images for early COPD detection [6]. The framework, employing ensemble learning with an RVFL neural network, achieved accuracy rates of 96.98% (cough model), 99.97% (CT model), and 98.65% (FuzzyGuard), improving diagnosis and patient outcomes. P. Sahu et al. used a 1D CNN model optimized with Adam and RMSprop to identify COPD [7]. Adam outperformed RMSprop (92% and 88%, respectively), with a training accuracy of 94% and a validation accuracy of 90% on the ICBHI 2017 dataset. In respiratory health analytics, this model showed promise as a trustworthy diagnostic tool. S. Jha et al. followed a similar strategy for early COPD detection [8]. N. Vodnala et al. used deep learning methods including CNNs, LSTMs, and a CNN-LSTM hybrid model to differentiate between COPD and asthma [9]. Using features such as the spectral centroid and Mel-Frequency Cepstral Coefficients, the CNN-LSTM model reached 93% accuracy, proving effective in classifying respiratory diseases from cough characteristics and improving diagnostic accuracy. M. S. Karthikeyan et al. proposed U-Net-Attention-TBNet for accurate tuberculosis lesion detection in chest X-rays [10]. Combining U-Net with attention mechanisms, it surpassed ResNet-, DenseNet-, and VGGNet-based U-Nets on the CheXpert dataset, achieving high accuracy, precision, recall, and F1 scores. This strategy could potentially be applied to enhance COPD detection.

Rukumani Khandhan et al. applied deep transfer learning for early COPD prediction using breathing sound recordings [11]. Inception, ResNet, and VGGNet were fine-tuned on a dataset of sounds from individuals with COPD and other lung disorders. These models achieved high precision, enabling cost-effective early detection and management of COPD. G. R. Khanaghavalle et al. developed a COPD severity diagnostic system using the RespiratoryDatabase@TR dataset. Feature extraction techniques such as spectrograms and chromagrams, paired with data augmentation, enhanced the ResNet50 model's performance [12]. This sound-based methodology demonstrated potential for accurate early COPD severity classification, offering new approaches to global respiratory disease management. G. S. Marepalli et al. used two deep learning models, a CNN and an LSTM, to identify COPD from respiratory audio in the Respiratory Sound Database. The LSTM model achieved 98.2% accuracy, while the CNN model reached 92.3% [13]. These models demonstrated high precision, highlighting the potential of early COPD detection to improve patient outcomes. P. Sahu et al. proposed a deep learning framework for COPD severity classification using lung sound recordings. The framework extracted features from the Open Respiratory Sound and Respiratory@TR datasets, achieving 95.76% accuracy in two-class classification and 93% in multiclass classification, and outperformed traditional methods for early COPD detection. Ezhil E. Nithila et al. developed "DensePneumoNet," a deep learning algorithm using DenseNet for pneumonia detection from CT chest images. It achieved superior performance across all evaluation metrics. The approach, leveraging DenseNet's dense blocks and transition layers, demonstrated effectiveness and could also be applied to COPD detection [15]. J. Chawla et al. proposed a hybrid SVM+ResNet50 prediction model for diagnosing COPD that achieved 93% accuracy [16]. A combined dataset from the


National Center for Biotechnology Information and the National Institutes of Health in the United States was used in the model. This approach could also improve the detection and treatment of COPD.

III. PROPOSED WORK

A. Introduction and Motivation

Chronic Obstructive Pulmonary Disease (COPD) is a leading cause of morbidity and mortality worldwide, which makes early and accurate diagnosis important for better management and treatment outcomes. While Pulmonary Function Tests (PFTs) have traditionally been used to assess the extent of lung dysfunction, they alone do not provide a comprehensive understanding of disease progression. Chest X-rays, which offer structural insights into the lungs, can be paired with clinical data for enhanced diagnosis. Recent advances in deep learning provide an opportunity to improve medical diagnostics, especially in the interpretation of medical images. This research proposes a multimodal deep learning approach that integrates chest X-ray images with PFT data to classify the stages of COPD. Specifically, we use the Chest X-ray14 dataset for image data and the COPDGene dataset for clinical data. The aim is to enhance the accuracy of COPD classification by combining both modalities, ultimately supporting more precise patient diagnoses.

B. Data Collection and Preprocessing

Datasets
This study uses two primary datasets:
• Chest X-ray14 Dataset: This dataset consists of over 100,000 labeled chest X-ray images. Its label set has no explicit COPD class, and emphysema is the closest COPD-related finding. For this study, the dataset is filtered to select images related to COPD cases.
• COPDGene Dataset: This dataset provides clinical data, including key pulmonary function test parameters such as FEV1, FVC, and the FEV1/FVC ratio, which are essential for assessing lung function in COPD patients.

Preprocessing of Chest X-ray Images
The chest X-ray images are resized to 224x224 pixels, the default input size of the ImageNet-pretrained DenseNet121 model, a highly effective deep learning architecture for medical image classification. Image normalization scales pixel values to the [0, 1] range, ensuring uniformity across the dataset. Data augmentation techniques, including random rotations, zooming, and flipping, are applied to artificially expand the dataset and reduce overfitting.

Preprocessing of PFT Data
The clinical PFT data (FEV1, FVC, and the FEV1/FVC ratio) are normalized via min-max scaling, which maps each feature value x to (x - min) / (max - min) so that all features lie in the same [0, 1] range. This step is critical for effective model training. The chest X-ray images and PFT data are aligned so that each image corresponds to the correct clinical data for each patient.

C. Model Architecture
A multimodal deep learning model is proposed to classify COPD stages by integrating features from chest X-rays and PFT data. The architecture consists of two main components:
• DenseNet121 for Chest X-ray Processing: DenseNet121, a Convolutional Neural Network (CNN) architecture widely used for medical image classification, is chosen for its dense connections, which improve feature reuse and gradient flow. The model is pre-trained on the ImageNet dataset and fine-tuned on the chest X-ray images to learn relevant patterns. DenseNet121 generates feature vectors representing high-level information from the chest X-rays.
• FCN for PFT Data Processing: A fully connected network (FCN) processes the PFT data. This network consists of multiple dense layers that transform the features (FEV1, FVC, FEV1/FVC ratio) into a numerical representation suitable for combining with the features extracted from the chest X-rays, giving both modalities a unified representation for classification.

Fusion of Image and PFT Data:
The outputs of DenseNet121 and the FCN are concatenated to form a combined feature vector, which is passed through additional dense layers to produce the final classification. A softmax activation function in the output layer predicts one of the COPD stages: Initial, Progressive, Complicated, or Critical.

D. Training Setup
The model was trained using a batch size of 32 to balance training efficiency and memory usage. The Adam optimizer was chosen for its ability to handle large models and diverse datasets effectively. Categorical cross-entropy served as the loss function, the standard choice for multi-class classification tasks. To prevent overfitting, early stopping was implemented with a patience of 10 epochs, terminating training when validation accuracy stopped improving.

E. Handling Imbalanced Data
The COPD data exhibited class imbalance, with certain stages of COPD being underrepresented. To address this issue, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training data, generating synthetic samples for the underrepresented classes. This produced a more balanced training set, ensuring equitable representation of all COPD stages during training.

F. Model Evaluation and Performance Metrics
The model's performance was evaluated using accuracy, precision, recall, and F1 score. These metrics provided insight into the reliability and effectiveness of the model in classifying COPD stages. Evaluating these metrics helped identify strengths and potential weaknesses in predictions, enabling fine-tuning to


improve overall performance. These evaluations also confirmed that the model maintained a balance between correctly identifying cases and minimizing false positives and false negatives, which is essential for its utility in clinical applications.

Fig. 1 summarizes the research flow, from data preparation to model evaluation.

Fig. 1. Multimodal COPD Classification Architecture

The pseudocode (Table I) outlines a multimodal deep learning framework integrating chest X-ray images and pulmonary function test (PFT) data for COPD classification. It incorporates DenseNet121 for image feature extraction, a dense network for PFT data processing, and a fusion layer to enhance predictive accuracy.

Table I: Multimodal Deep Learning Approach for COPD: Pseudocode

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Input, Concatenate
from tensorflow.keras.models import Model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import shuffle
from imblearn.over_sampling import SMOTE

# Step 1: Data Preparation
def load_data():
    # Mockup: replace with actual dataset loading
    xray_images = np.random.rand(1000, 224, 224, 3)  # Simulated images, pixels in [0, 1]
    pft_data = np.random.rand(1000, 3)  # Simulated PFT features (FEV1, FVC, FEV1/FVC)
    labels = np.random.randint(0, 4, 1000)  # 4 classes: Initial, Progressive, Complicated, Critical
    return xray_images, pft_data, labels

# Flatten, split, and normalize data
def preprocess_data(xray_images, pft_data, labels):
    # Flatten each image and append its 3 PFT values, giving one row per
    # patient (SMOTE below requires a 2-D feature matrix).
    features = np.hstack([xray_images.reshape(len(xray_images), -1), pft_data])
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=42, stratify=labels)
    # Fit min-max scaling on the training PFT columns only, then apply the
    # same transform to the test set so no test statistics leak into training.
    scaler = MinMaxScaler()
    x_train[:, -3:] = scaler.fit_transform(x_train[:, -3:])
    x_test[:, -3:] = scaler.transform(x_test[:, -3:])
    return x_train, x_test, y_train, y_test

# Handle imbalance using SMOTE (training data only)
def handle_imbalance(x_train, y_train):
    smote = SMOTE(random_state=42)
    x_res, y_res = smote.fit_resample(x_train, y_train)
    # SMOTE appends the synthetic rows at the end; shuffle so that Keras'
    # validation_split (which takes the last 20% of rows) is not all synthetic.
    x_train_balanced, y_train_balanced = shuffle(x_res, y_res, random_state=42)
    return x_train_balanced, y_train_balanced

# Step 2: Model Architecture
def build_model():
    # Image branch: ImageNet-pretrained DenseNet121 without its classifier head
    xray_input = Input(shape=(224, 224, 3), name="Xray_Input")
    densenet = DenseNet121(weights='imagenet', include_top=False, input_tensor=xray_input)
    xray_features = GlobalAveragePooling2D()(densenet.output)  # 1024-d image vector

    # PFT branch: dense layer over (FEV1, FVC, FEV1/FVC)
    pft_input = Input(shape=(3,), name="PFT_Input")
    pft_features = Dense(128, activation='relu')(pft_input)

    # Fusion and classification
    combined_features = Concatenate()([xray_features, pft_features])
    x = Dense(256, activation='relu')(combined_features)
    output = Dense(4, activation='softmax', name="Output")(x)  # one probability per stage

    # Build the model
    model = Model(inputs=[xray_input, pft_input], outputs=output)
    return model

# Step 3: Model Training
def train_model(model, xray_train, pft_train, y_train):
    # Labels are integers 0-3, so the sparse (integer-label) form of
    # categorical cross-entropy is used.
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    # Stop once validation accuracy has not improved for 10 epochs.
    early_stopping = tf.keras.callbacks.EarlyStopping(
        monitor='val_accuracy', patience=10, restore_best_weights=True)
    history = model.fit(
        [xray_train, pft_train], y_train,
        validation_split=0.2, batch_size=32, epochs=100,
        callbacks=[early_stopping]
    )
    return history


# Step 4: Model Evaluation
def evaluate_model(model, xray_test, pft_test, y_test):
    results = model.evaluate([xray_test, pft_test], y_test)
    print(f"Test Loss: {results[0]}, Test Accuracy: {results[1]}")

# Full Workflow
xray_images, pft_data, labels = load_data()
x_train, x_test, y_train, y_test = preprocess_data(xray_images, pft_data, labels)
x_train_balanced, y_train_balanced = handle_imbalance(x_train, y_train)

# Split balanced data back into X-ray and PFT subsets (the last 3 columns are PFT)
xray_train = x_train_balanced[:, :-3].reshape(-1, 224, 224, 3)
pft_train = x_train_balanced[:, -3:]

model = build_model()
history = train_model(model, xray_train, pft_train, y_train_balanced)

xray_test = x_test[:, :-3].reshape(-1, 224, 224, 3)
pft_test = x_test[:, -3:]
evaluate_model(model, xray_test, pft_test, y_test)

IV. RESULTS ANALYSIS AND DISCUSSION

This setup in Table II aligns with state-of-the-art configurations and includes relevant preprocessing, training, and evaluation steps for multimodal deep learning research.

Table II: Simulation and Model Configuration Parameters

| Parameter | Value |
|---|---|
| Data Sources | Google Open Images Dataset, Synthetic Data (GANs) |
| Key Metadata Collected | Tiger Images, Environmental Conditions (Lighting, Terrain, Camera Angles) |
| Data Annotation Tools | RectLabel (Bounding Box Annotation) |
| Data Preprocessing Techniques | Data Augmentation (Flips, Rotations, Cropping, Brightness/Contrast Adjustments), GAN Augmentation |
| Normalization Scale | [0, 1] |
| Model Initialization | Pre-trained Faster R-CNN with ResNet50 backbone (COCO-trained) |
| Transfer Learning | Fine-tuning, Freezing initial layers, Gradual unfreezing |
| Training/Validation/Testing Split | Training: 80%, Validation: 10%, Testing: 10% |
| Anchor Box Optimization | Based on typical tiger dimensions |
| Loss Function | Focal Loss |
| Hyperparameter Tuning | Bayesian Optimization (Learning rate, Weight decay, Anchor sizes) |
| Evaluation Metrics | Accuracy, Precision, Recall, F1 Score |
| k-fold Cross-Validation | Yes |
| Tracking Algorithm | Kalman Filter |
| Bounding Box Optimization | Non-Maximum Suppression (NMS) |
| Confidence Threshold | 0.5 (Detections with a confidence score below 50% are filtered out) |
| Model Deployment | NVIDIA Jetson Edge Devices, Real-Time Monitoring |
| Continuous Learning | Active Learning Feedback Loop, Periodic Fine-Tuning |
| Number of Training Epochs | 50 |
| Batch Size | 16 |
| Learning Algorithm | Adam Optimizer |
| Learning Rate | 0.0001 |

A. Accuracy:
Table III and Fig. 2 show the accuracy of the three models (MultimodNet, AlexNet, and VGG16) across the four stages of COPD classification: Initial, Progressive, Complicated, and Critical. MultimodNet consistently outperforms AlexNet and VGG16 in every stage, achieving the highest accuracy in the Initial Stage (95.231%), Progressive Stage (95.008%), Complicated Stage (94.989%), and Critical Stage (94.256%). AlexNet follows closely behind, while VGG16 shows the lowest accuracy across all stages. The superior performance of MultimodNet is likely due to its multimodal architecture, which integrates both chest X-ray images and clinical data (PFTs) and thereby improves the model's ability to classify COPD stages.

Table III: Stage-wise Accuracy Evaluation of Different Models

| Model | Initial (%) | Progressive (%) | Complicated (%) | Critical (%) |
|---|---|---|---|---|
| MultimodNet | 95.231 | 95.008 | 94.989 | 94.256 |
| AlexNet | 93.008 | 93.024 | 92.952 | 92.336 |
| VGG16 | 89.006 | 90.112 | 90.208 | 90.054 |

Fig. 2. Performance Visualization of Models Accuracy Across COPD Stages (series: MultimodNet, AlexNet, VGG16; values as in Table III)

B. Precision:
Table IV and Fig. 3 compare the precision of three deep learning models (MultimodNet, AlexNet, and VGG16) across the four stages of COPD: Initial, Progressive, Complicated, and Critical. MultimodNet demonstrates superior precision across all


stages of COPD classification, maintaining high precision in the Initial Stage (95.088%), Progressive Stage (95.118%), Complicated Stage (94.022%), and Critical Stage (94.154%). AlexNet performs slightly worse than MultimodNet, with precision values of 92.996% for the Initial Stage, 92.921% for the Progressive Stage, 92.967% for the Complicated Stage, and 93.084% for the Critical Stage. VGG16 exhibits the lowest precision across all stages, with 89.227% for the Initial Stage, 90.542% for the Progressive Stage, 90.238% for the Complicated Stage, and 89.421% for the Critical Stage. The higher precision of MultimodNet reflects its use of both chest X-ray images and clinical PFT data, which allows better discrimination between classes and fewer false positives. AlexNet and VGG16, in contrast, are image-only models with no access to the functional PFT information, which contributes to their lower precision.

Table IV: Stage-wise Precision Evaluation of Different Models

| Model | Initial (%) | Progressive (%) | Complicated (%) | Critical (%) |
|---|---|---|---|---|
| MultimodNet | 95.088 | 95.118 | 94.022 | 94.154 |
| AlexNet | 92.996 | 92.921 | 92.967 | 93.084 |
| VGG16 | 89.227 | 90.542 | 90.238 | 89.421 |

Fig. 3. Performance Visualization of Models Precision Across COPD Stages (series: MultimodNet, AlexNet, VGG16; values as in Table IV)

C. Recall:
Table V and Fig. 4 compare the recall of MultimodNet, AlexNet, and VGG16 across the four stages of COPD: Initial, Progressive, Complicated, and Critical. MultimodNet outperforms the other two models, with recall values of 95.088% in the Initial Stage, 95.084% in the Progressive Stage, 95.042% in the Complicated Stage, and 95.124% in the Critical Stage. This superior recall indicates that MultimodNet is highly effective at correctly identifying true positive instances and makes fewer false negative predictions. In comparison, AlexNet shows lower recall, with values of 92.996% for the Initial Stage, 92.339% for the Progressive Stage, 92.854% for the Complicated Stage, and 93.124% for the Critical Stage. While these values are relatively high, they trail MultimodNet by 2.0 to 2.7 percentage points in every stage, with the largest gap in the Progressive Stage. VGG16 has the lowest recall across all stages, with 89.227% for the Initial Stage, 90.542% for the Progressive Stage, 90.238% for the Complicated Stage, and 89.118% for the Critical Stage. This indicates that VGG16 struggles more to identify all true positive cases, which could result in more missed diagnoses. MultimodNet's high recall demonstrates its ability to detect COPD stages effectively, minimizing false negatives and ensuring better detection of patients in need of treatment. Its advantage is attributed to the integration of chest X-ray images and clinical PFT data, which allows more robust feature extraction and higher sensitivity than image-only models such as AlexNet and VGG16.

Table V: Stage-wise Recall Evaluation of Different Models

| Model | Initial (%) | Progressive (%) | Complicated (%) | Critical (%) |
|---|---|---|---|---|
| MultimodNet | 95.088 | 95.084 | 95.042 | 95.124 |
| AlexNet | 92.996 | 92.339 | 92.854 | 93.124 |
| VGG16 | 89.227 | 90.542 | 90.238 | 89.118 |

Fig. 4. Performance Visualization of Models Recall values Across COPD Stages (series: MultimodNet, AlexNet, VGG16; values as in Table V)

D. F1 Score:
Table VI and Fig. 5 compare the F1 scores of MultimodNet, AlexNet, and VGG16 across the four stages of COPD: Initial, Progressive, Complicated, and Critical. The F1 score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), so it is high only when a model keeps both false positives and false negatives low. True negatives do not enter the score. MultimodNet leads in all stages, with F1 scores of 95.088% in the Initial Stage, 95.101% in the Progressive Stage, 94.529% in the Complicated Stage, and 94.637% in the Critical Stage. These values follow from Tables IV and V; for example, for the Progressive Stage, 2 × 95.118 × 95.084 / (95.118 + 95.084) ≈ 95.101. This consistent performance across all stages reflects the robustness and

effectiveness of MultimodNet, which combines multiple modalities (chest X-rays and clinical PFT data) for more accurate prediction. The high F1 scores indicate that MultimodNet is well balanced in terms of precision and recall, keeping both false positives and false negatives low. AlexNet follows with F1 scores of 92.996% in the Initial Stage, 92.629% in the Progressive Stage, 92.910% in the Complicated Stage, and 93.104% in the Critical Stage. Although these scores are lower than those of MultimodNet, they still represent solid performance, particularly in the Critical Stage. AlexNet is a conventional CNN that uses image data only, so it cannot exploit the PFT data that MultimodNet integrates. VGG16 performs the worst, with F1 scores of 89.227% in the Initial Stage, 90.542% in the Progressive Stage, 90.238% in the Complicated Stage, and 89.269% in the Critical Stage. It is also an image-only model and is less suited to multimodal learning, so its ability to balance precision and recall in the COPD classification task is limited; its lowest scores occur in the Initial and Critical Stages.

Table VI: Stage-wise F1 Score Evaluation of Different Models
Model         Initial (%)   Progressive (%)   Complicated (%)   Critical (%)
MultimodNet   95.088        95.101            94.529            94.637
AlexNet       92.996        92.629            92.910            93.104
VGG16         89.227        90.542            90.238            89.269

Fig. 5. Performance Visualization of Models' F1 Score Values Across COPD Stages (MultimodNet, AlexNet, and VGG16; values as in Table VI)

The F1 score is a critical measure of a model's reliability, particularly when dealing with imbalanced datasets or when both false positives and false negatives are costly. MultimodNet's higher F1 scores indicate stronger performance on complex datasets, where a balance between precision and recall is essential for diagnosing COPD at its different stages. The combination of multiple modalities gives MultimodNet an edge in making reliable predictions. In contrast, AlexNet and VGG16 demonstrate solid performance, but their F1 scores are lower in every stage because both their precision and their recall are lower.

V. CONCLUSION
The comparison of MultimodNet, AlexNet, and VGG16 across the COPD stages shows that the multimodal architecture of MultimodNet provides clear advantages in classification performance. By integrating chest X-ray images and clinical PFT data, MultimodNet achieves higher accuracy, precision, recall, and F1 scores than AlexNet and VGG16, which are image-based models. MultimodNet outperforms the other models in every stage, including the more complex Progressive, Complicated, and Critical stages. This enhanced performance is attributed to the model's ability to handle diverse data sources effectively, resulting in more accurate and sensitive detection. In contrast, AlexNet and VGG16, which rely on image data alone, show lower performance across all stages, including the more complex cases. These findings underscore the value of multimodal learning for medical diagnostics and highlight the capability of MultimodNet to diagnose COPD accurately at different stages.

Future enhancements could involve integrating additional multimodal data, such as patient demographics, genetic information, and environmental factors, to further improve the accuracy and robustness of COPD stage classification. Exploring advanced deep learning techniques such as transformers or self-supervised learning could also enhance model performance in complex, real-world clinical settings.