Project Report: Chest X-ray Classification Using CNN with Attention Mechanism
Table of Contents
Project Report: Chest X-ray Classification Using CNN with Attention Mechanism
1. Introduction
2. Dataset and Data Loading
Dataset Summary
3. Data Preprocessing
4. Model Architecture
4.1 CNN Layers and Residual Blocks with SE Attention
5. Hyperparameter Optimization
6. Training Procedure and Evaluation
6.1 Callbacks
6.2 Training and Validation
6.3 Test Performance
7. Results
8. Conclusion
9. Code Explanation
9.1 Dataset and Data Loading
9.2 Data Preprocessing
9.3 Model Architecture
9.4 Hyperparameter Optimization
9.5 Training Procedure and Evaluation
9.6 Results
1. Introduction
The COVID-19 pandemic highlighted the critical role of diagnostic imaging in screening,
diagnosis, and treatment planning. Chest X-rays (CXR) have become a standard diagnostic
tool, particularly useful in early detection of COVID-19 and differentiation from other
pulmonary diseases, such as pneumonia. This project aims to develop a deep learning model
that accurately classifies Chest X-ray images into three categories:
COVID-19
Normal
Pneumonia
This classification task is inherently challenging due to similarities in visual patterns across
different diseases, limited annotated datasets, and the need for high sensitivity and specificity.
To tackle these challenges, the model is based on a Convolutional Neural Network (CNN)
architecture augmented with attention mechanisms (Squeeze-and-Excitation blocks) and
residual connections to enhance feature extraction and class discrimination.
2. Dataset and Data Loading
/content/Data/train
├── COVID19
├── NORMAL
├── PNEUMONIA
/content/Data/test
├── COVID19
├── NORMAL
├── PNEUMONIA
Each image is labelled based on its containing folder. These images are loaded using
TensorFlow’s image_dataset_from_directory function, which automatically assigns labels
based on folder names. Two datasets are generated:
Training Set: Used to fit the model.
Testing Set: Used for final evaluation of the model after training.
Dataset Summary
Image Resolution: Resized to 128x128 pixels for uniformity.
3. Data Preprocessing
The raw images underwent the following preprocessing steps to ensure compatibility with the
neural network and to enhance training performance:
Resizing: Each image was resized to 128x128 pixels, a common input size for CNN
architectures, balancing model accuracy and computational efficiency.
Batching: Images were loaded in batches of 32 to optimize memory usage and training speed.
Normalization: Pixel values were scaled to a range of 0-1. This step standardizes input
values, helping the model converge faster by stabilizing gradients.
Sample Visualization: A subset of images and labels was visualized after preprocessing to
confirm the integrity of data loading and class distribution. Sample dataset images after
preprocessing are shown in Figure 1.
Figure 1: Sample images from the dataset after preprocessing
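The normalization step can be illustrated with a minimal NumPy sketch. The helper name normalize_images and the toy batch are assumptions made for illustration and are not taken from the project code; in a Keras pipeline the same scaling is commonly done with a tf.keras.layers.Rescaling(1.0 / 255) layer.

```python
import numpy as np


def normalize_images(batch):
    """Scale 8-bit pixel values from [0, 255] to [0, 1] as float32."""
    return batch.astype(np.float32) / 255.0


# Toy batch with shape (batch, height, width, channels); values are illustrative.
toy_batch = np.array([[[[0, 128, 255]]]], dtype=np.uint8)
scaled = normalize_images(toy_batch)
print(scaled.shape, scaled.min(), scaled.max())
```

Keeping inputs in a small, fixed range means the first convolution sees values on a similar scale for every image, which is the stabilizing effect described above.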
4. Model Architecture
The model architecture was designed to maximize the accuracy of Chest X-ray image
classification using a blend of CNN, residual connections, and attention mechanisms,
specifically Squeeze-and-Excitation (SE) blocks. This architecture was chosen for its balance
of accuracy and efficiency, allowing for the capture of important visual cues while remaining
computationally feasible.
4.1 CNN Layers and Residual Blocks with SE Attention
Conv2D Layer: Applies 32 filters of size (3x3) with padding to preserve spatial dimensions.
Residual Blocks:
Two Residual Blocks with 64 and 128 filters, respectively, followed by batch normalization
and ReLU activations.
Each block contains two Conv2D layers, where the output is added to the input (shortcut
connection) to form the residual connection. This helps in mitigating the vanishing gradient
problem and allows efficient feature propagation.
Squeeze-and-Excitation (SE) Block: Embedded within each residual block to learn channel-wise
attention weights. The block compresses the spatial dimensions of each feature map to a single
value per channel, then applies two dense layers to generate a scaling factor for each channel,
which strengthens the network's focus on important features.
Conv2D Layer with 256 filters and kernel size (3x3), followed by batch normalization and
ReLU activation.
Global Average Pooling: Reduces each feature map to a single value, creating a compact
feature vector.
Dense Layer with 512 units and Dropout (0.5) for regularization.
Output Layer:
Dense Layer with 3 units (one per class) and a softmax activation function to produce
probability scores for each class.
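The layer list above can be sketched in Keras as follows. This is an illustrative reconstruction rather than the project's exact code: the SE reduction ratio of 16, the 1x1 projection on the shortcut when the channel count changes, and the absence of any pooling between blocks are assumptions, since the report does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers


def se_block(x, ratio=16):
    """Channel attention: squeeze each feature map, then rescale the channels."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)                 # one value per channel
    s = layers.Dense(channels // ratio, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)    # weights in (0, 1)
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])


def residual_block(x, filters, use_se=True):
    """Two 3x3 convolutions with batch norm, optional SE, and a shortcut."""
    shortcut = x
    if x.shape[-1] != filters:
        # Assumed 1x1 projection so the shortcut matches the new channel count.
        shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    if use_se:
        y = se_block(y)
    y = layers.Add()([shortcut, y])
    return layers.ReLU()(y)


def build_model(input_shape=(128, 128, 3), num_classes=3):
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, padding="same")(inputs)
    x = residual_block(x, 64)
    x = residual_block(x, 128)
    x = layers.Conv2D(256, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


model = build_model()
```

The projection on the shortcut is needed only because each block changes the number of channels; without it the element-wise addition would fail on mismatched shapes.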
5. Hyperparameter Optimization
To further improve model performance, hyperparameter optimization was performed using a
custom Artificial Rabbit Optimization (ARO) algorithm. This approach iteratively searched
for the best combination of hyperparameters by exploring a reduced search space. The
primary hyperparameters optimized include:
Batch Size: 64
Filters: 128
6. Training Procedure and Evaluation
6.1 Callbacks
Early Stopping: Monitors validation loss to halt training when improvements stagnate.
Learning Rate Reduction: Reduces the learning rate by half if validation loss does not
improve, helping the model converge to a better local minimum.
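The halving rule can be made concrete with a small pure-Python simulation. This is a simplified sketch of the plateau logic, not the Keras implementation: it ignores options such as a minimum improvement threshold or a cooldown period, and the starting learning rate and the validation losses are made-up values.

```python
def simulate_lr_schedule(val_losses, initial_lr=1e-3, factor=0.5, patience=3):
    """Return the learning rate in effect after each epoch's plateau check."""
    lr = initial_lr
    best = float("inf")
    wait = 0  # epochs since the last improvement in validation loss
    schedule = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor  # halve the learning rate on a plateau
                wait = 0
        schedule.append(lr)
    return schedule


# Hypothetical validation losses: improvement stalls after the second epoch.
losses = [0.90, 0.70, 0.71, 0.72, 0.70, 0.69, 0.70]
schedule = simulate_lr_schedule(losses)
print(schedule)
```

With a patience of 3, the rate is cut once, after the third consecutive epoch that fails to beat the best loss seen so far.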
6.2 Training and Validation
Epochs: 100, with early stopping based on validation loss.
7. Results
The model demonstrated robust classification performance, achieving a test accuracy of
96.5%. This high accuracy suggests that the CNN architecture, enhanced by attention
mechanisms and residual connections, effectively distinguishes between COVID-19,
NORMAL, and PNEUMONIA classes in Chest X-ray images.
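As a rough illustration of how a test accuracy figure is derived from softmax outputs, the sketch below computes accuracy and a confusion matrix with NumPy. The function name, the toy probabilities, and the labels are assumptions for illustration only; the class order follows the alphabetical folder order that image_dataset_from_directory uses when assigning integer labels.

```python
import numpy as np

CLASS_NAMES = ["COVID19", "NORMAL", "PNEUMONIA"]  # alphabetical folder order


def evaluate_predictions(probabilities, labels, num_classes=3):
    """Return overall accuracy and a (true, predicted) confusion matrix."""
    predicted = np.argmax(probabilities, axis=1)
    accuracy = float(np.mean(predicted == labels))
    confusion = np.zeros((num_classes, num_classes), dtype=int)
    for true, pred in zip(labels, predicted):
        confusion[true, pred] += 1
    return accuracy, confusion


# Toy softmax outputs for four images (not real model predictions).
probabilities = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.3, 0.4, 0.3],
])
labels = np.array([0, 1, 2, 2])
accuracy, confusion = evaluate_predictions(probabilities, labels)
print(accuracy)
print(confusion)
```

The confusion matrix is worth reporting alongside accuracy because it shows which pairs of classes are mistaken for each other, for example pneumonia read as normal.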
Learning Curves
The plots in Figure 2 show the training and validation accuracy and loss across epochs:
Training Loss and Validation Loss Plot: Shows smooth convergence, with validation loss
stabilizing around the optimal value.
8. Code Explanation
8.1 Dataset and Data Loading
/content/Data/train
├── COVID19
├── NORMAL
├── PNEUMONIA
/content/Data/test
├── COVID19
├── NORMAL
├── PNEUMONIA
import tensorflow as tf

train_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/Data/train",
    image_size=(128, 128),
    batch_size=32,
    label_mode="int",
)
test_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/Data/test",
    image_size=(128, 128),
    batch_size=32,
    label_mode="int",
)
8.2 Data Preprocessing
Steps performed on the dataset include resizing, batching, and normalization. Visualizing a
few samples to verify preprocessing:
visualize_samples(train_dataset)
8.3 Model Architecture
This model architecture includes convolutional layers, residual blocks, and
Squeeze-and-Excitation (SE) attention mechanisms to improve focus on important features.
    if use_se:
        x = se_block(x)
    x = Add()([shortcut, x])
    x = ReLU()(x)
    return x

    # Final Layers
    x = GlobalAveragePooling2D()(x)
    x = Dense(512, activation='relu')(x)
    x = Dropout(0.5)(x)
    x = Dense(256, activation='relu')(x)
    x = Dropout(0.5)(x)
    outputs = Dense(3, activation="softmax")(x)

model = build_model()
model.summary()
8.4 Hyperparameter Optimization
Using the Artificial Rabbit Optimization (ARO) algorithm, the following hyperparameters
were optimized:
# Initialize the rabbits (randomly select values from the search space)
print(f"Iteration {iter+1}/{max_iter}")
fitness_scores = []
params = [
    search_space["learning_rate"][int(rabbit[0] * (len(search_space["learning_rate"]) - 1e-6))],
    search_space["batch_size"][int(rabbit[1] * (len(search_space["batch_size"]) - 1e-6))],
    search_space["filters"][int(rabbit[2] * (len(search_space["filters"]) - 1e-6))],
    search_space["kernel_size"][int(rabbit[3] * (len(search_space["kernel_size"]) - 1e-6))],
    search_space["dropout_rate"][int(rabbit[4] * (len(search_space["dropout_rate"]) - 1e-6))],
    search_space["num_hidden_units"][int(rabbit[5] * (len(search_space["num_hidden_units"]) - 1e-6))],
params_flat = [
# Ensure that params_flat and each rabbit array have the same shape
if len(params_flat) != 7:
fitness_scores.append(fitness)
best_fitness = fitness
for i in range(num_rabbits):
    if rabbits[i].shape != best_rabbit.shape:
    # Ensure that the rabbit stays within the bounds of the search space
    rabbits[i] = np.clip(rabbits[i], 0, 1)
return best_rabbit
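The indexing expression used in the fragment above maps each coordinate of a rabbit's position, a value in [0, 1], to one entry of a discrete list of candidates. The sketch below isolates only that decoding step and does not implement the ARO update rules. The candidate values in search_space and the helper name decode_rabbit are hypothetical; the six keys are the ones visible in the fragment, although the original code checks for seven entries.

```python
import numpy as np

# Hypothetical candidate values; the report does not list the real search space.
search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 32, 64],
    "filters": [32, 64, 128],
    "kernel_size": [3, 5],
    "dropout_rate": [0.3, 0.5],
    "num_hidden_units": [256, 512],
}


def decode_rabbit(rabbit, space):
    """Map a position vector in [0, 1] to one candidate per hyperparameter."""
    params = {}
    for value, (name, choices) in zip(rabbit, space.items()):
        # Subtracting 1e-6 keeps value == 1.0 from indexing past the last entry.
        index = int(value * (len(choices) - 1e-6))
        params[name] = choices[index]
    return params


# Clip first, as the fragment does, so every coordinate stays inside [0, 1].
rabbit = np.clip(np.array([0.0, 0.99, 1.0, 0.5, 0.2, 0.7]), 0, 1)
decoded = decode_rabbit(rabbit, search_space)
print(decoded)
```

Because the position is continuous while the choices are discrete, small moves of a rabbit often decode to the same configuration, which is one reason a reduced search space keeps the number of distinct training runs manageable.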
8.5 Training Procedure and Evaluation
Callbacks
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=5, restore_best_weights=True
    ),
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
]
history = model.fit(
    train_dataset,
    validation_data=test_dataset,
    epochs=100,
    callbacks=callbacks,
)
Test Performance
8.6 Results
The following plots show training and validation accuracy and loss over epochs, providing
insights into model performance and generalization:
import matplotlib.pyplot as plt

def plot_metrics(history):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    plt.show()

plot_metrics(history)
o The base model is frozen to prevent its weights from being updated during training,
ensuring that only the new layers we add are trained on the chest X-ray dataset.
Custom Layers:
o GlobalAveragePooling2D: Reduces the spatial dimensions of the feature maps while
o Fully Connected Layers: A dense layer with 512 units and ReLU activation is added to
capture high-level features from the output of the base model. A dropout rate of 30% is
applied to prevent overfitting.
o Output Layer: The final layer is a dense layer with a softmax activation function, which
classifies the input into one of three classes (COVID-19, Normal, and Pneumonia).
Hyperparameter tuning
To further improve model performance, hyperparameter optimization was performed using a
custom Artificial Rabbit Optimization (ARO) algorithm. This approach iteratively searched
for the best combination of hyperparameters by exploring a reduced search space. The
primary hyperparameters optimized include:
Batch Size: 64
Filters: 128
Model Training and Evaluation: After compilation, the model is trained on the training set
of the chest X-ray dataset and validated on the separate test set. Early stopping is used to
limit overfitting to the training data.
The model's performance is evaluated on the test dataset, and the following metrics are
reported:
Learning Curves: The learning curves for training and validation accuracy and loss are
plotted to visualize the model's performance throughout training. These plots, shown in
Figure 3, help identify overfitting or underfitting by comparing training and validation
trends.
Training Accuracy: As the epochs progress, the training accuracy improves steadily,
indicating that the model is learning to classify chest X-rays effectively.
Training and Validation Loss: The training and validation losses are expected to
decrease over time, indicating that the model is successfully minimizing the loss
function.
Figure 3: Training and Validation Loss and Accuracy on VGG-16
Code Explanation:
This code snippet defines a function to build a deep learning model using VGG16 as the base
architecture for image classification, specifically for chest X-ray classification tasks. The
VGG16 model, pretrained on the ImageNet dataset, is used as a feature extractor by
excluding its top fully connected layers and freezing its weights, which ensures that the
model leverages the rich features learned from ImageNet without modifying the base model
during training. Custom fully connected layers are added on top of VGG16 to tailor the
model for the specific task, including two dense layers with ReLU activation and dropout for
regularization. The final output layer uses the softmax activation function for multi-class
classification. The model is compiled with the Adam optimizer (with a small learning rate of
0.0001), sparse categorical cross-entropy loss, and accuracy as the evaluation metric. This
setup allows the model to efficiently learn from the chest X-ray dataset while preventing
overfitting through dropout regularization, making it suitable for tasks like classifying
different conditions (e.g., normal, pneumonia, tuberculosis) in chest X-ray images.
    # Output layer
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return model
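Only the tail of the function survives in the listing above. The following is a sketch consistent with the description in this section, not the original code: the function name build_vgg16_model, the 128x128 input size, the width of the second dense layer (256), the second dropout rate, and the weights argument are assumptions. Input preprocessing specific to VGG16 is omitted.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_vgg16_model(num_classes=3, input_shape=(128, 128, 3), weights="imagenet"):
    """Frozen VGG16 feature extractor with a small trainable classifier on top."""
    base = tf.keras.applications.VGG16(
        include_top=False,      # drop the original fully connected layers
        weights=weights,        # "imagenet" downloads the pretrained weights
        input_shape=input_shape,
    )
    base.trainable = False      # keep the convolutional weights fixed

    inputs = layers.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Dense(256, activation="relu")(x)   # assumed width
    x = layers.Dropout(0.3)(x)                    # assumed rate

    # Output layer
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling build_vgg16_model() with the default arguments fetches the ImageNet weights on first use; passing weights=None builds the same graph with random weights, which is convenient for checking shapes offline.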
10. Conclusion
In this project, we compared the performance of a custom Convolutional Neural Network
(CNN) and a pretrained VGG16 model for the task of chest X-ray classification. The custom
CNN model, tailored specifically for this task, achieved a test loss of 0.2959 and a test
accuracy of 96.5%, demonstrating a strong ability to classify chest X-rays. The VGG16 model,
which used pretrained ImageNet weights as a frozen feature extractor with new classification
layers trained for the same task, reached a lower test loss of 0.13 but a slightly lower test
accuracy of 95%. Both models showed strong results: the VGG16 model benefited from transfer
learning, drawing on rich features learned from a large, diverse dataset, while the custom
CNN delivered the higher accuracy and could be optimized further. In conclusion, both models
demonstrated high accuracy and could be valuable tools for automating the analysis of chest
X-rays, with VGG16 holding a slight edge in test loss and the custom CNN a slight edge in
accuracy.