Project Report Chest Xray classification (2)

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 18

Project Report: Chest X-ray Classification Using CNN with

Attention Mechanism

Table of Contents
Project Report: Chest X-ray Classification Using CNN with Attention Mechanism................1
1. Introduction........................................................................................................................2
2. Dataset and Data Loading..................................................................................................2
Dataset Summary...............................................................................................................3
3. Data Preprocessing.............................................................................................................3
4. Model Architecture.............................................................................................................4
4.1 CNN Layers and Residual Blocks with SE Attention..................................................4
5. Hyperparameter Optimization............................................................................................6
6. Training Procedure and Evaluation....................................................................................6
6.1 Callbacks......................................................................................................................6
6.2 Training and Validation................................................................................................7
6.3 Test Performance..........................................................................................................7
7. Results................................................................................................................................7
8. Conclusion..........................................................................................................................8
9. Code Explanation...................................................................................................................8
9.1 Dataset and Data Loading............................................................................................8
9.2 Data Preprocessing.......................................................................................................9
9.3 Model Architecture.......................................................................................................9
9.4 Hyperparameter Optimization....................................................................................10
9.5 Training Procedure and Evaluation............................................................................13
9.6 Results........................................................................................................................13
1. Introduction
The COVID-19 pandemic highlighted the critical role of diagnostic imaging in screening,
diagnosis, and treatment planning. Chest X-rays (CXR) have become a standard diagnostic
tool, particularly useful in early detection of COVID-19 and differentiation from other
pulmonary diseases, such as pneumonia. This project aims to develop a deep learning model
that accurately classifies Chest X-ray images into three categories:

COVID-19

Normal

Pneumonia

This classification task is inherently challenging due to similarities in visual patterns across
different diseases, limited annotated datasets, and the need for high sensitivity and specificity.
To tackle these challenges, the model is based on a Convolutional Neural Network (CNN)
architecture augmented with attention mechanisms (Squeeze-and-Excitation blocks) and
residual connections to enhance feature extraction and class discrimination.

2. Dataset and Data Loading


The dataset used in this project consists of Chest X-ray images classified into three
categories, with directory structures as follows:

/content/Data/train

├── COVID19

├── NORMAL

├── PNEUMONIA

/content/Data/test

├── COVID19

├── NORMAL

├── PNEUMONIA
Each image is labelled based on its containing folder. These images are loaded using
TensorFlow’s image_dataset_from_directory function, which automatically assigns labels
based on folder names. Two datasets are generated:

Training Set: Used for model training.

Testing Set: Used for final evaluation of the model after training.

Dataset Summary
Image Resolution: Resized to 128x128 pixels for uniformity.

Batch Size: 32 images per batch.

Label Encoding: Labels are integers with COVID-19 = 0, NORMAL = 1, PNEUMONIA = 2.

3. Data Preprocessing
The raw images underwent the following preprocessing steps to ensure compatibility with the
neural network and to enhance training performance:

Resizing: Each image was resized to 128x128 pixels, a common input size for CNN
architectures, balancing model accuracy and computational efficiency.

Batching: Images were loaded in batches of 32 to optimize memory usage and training speed.

Normalization: Pixel values were scaled to a range of 0-1. This step standardizes input
values, helping the model converge faster by stabilizing gradients.

Sample Visualization: A subset of images and labels was visualized after preprocessing to
confirm the integrity of data loading and class distribution. Sample dataset images after
preprocessing are shown in Figure 1.
Figure 1: Sample images from the dataset after preprocessing

4. Model Architecture
The model architecture was designed to maximize the accuracy of Chest X-ray image
classification using a blend of CNN, residual connections, and attention mechanisms,
specifically Squeeze-and-Excitation (SE) blocks. This architecture was chosen for its balance
of accuracy and efficiency, allowing for the capture of important visual cues while remaining
computationally feasible.

4.1 CNN Layers and Residual Blocks with SE Attention


Input Layer: Accepts images of size 128x128 with 3 channels (RGB).
Initial Convolutional Block:

Conv2D Layer: Applies 32 filters of size (3x3) with padding to preserve spatial dimensions.

Batch Normalization: Normalizes activations, stabilizing training.

ReLU Activation: Introduces non-linearity.

Max Pooling (2x2): Reduces spatial dimensions by a factor of 2.

Residual Blocks:

Two Residual Blocks with 64 and 128 filters, respectively, followed by batch normalization
and ReLU activations.

Each block contains two Conv2D layers, where the output is added to the input (shortcut
connection) to form the residual connection. This helps in mitigating the vanishing gradient
problem and allows efficient feature propagation.

Squeeze-and-Excitation (SE) Block: Embedded within each residual block to learn channel-
wise attention weights. This block compresses the spatial dimensions of feature maps, applies
two dense layers to generate scaling factors for each channel, enhancing the network's focus
on important features.

Additional Convolutional Layer:

Conv2D Layer with 256 filters and kernel size (3x3), followed by batch normalization and
ReLU activation.

Max Pooling (2x2): Further spatial dimension reduction.

Fully Connected Layers:

Global Average Pooling: Reduces each feature map to a single value, creating a compact
feature vector.

Dense Layer with 512 units and Dropout (0.5) for regularization.

Another Dense Layer with 256 units and Dropout (0.5).

Output Layer:

Dense Layer with 3 units (for the three classes) and a SoftMax activation function to produce
probability scores for each class.
5. Hyperparameter Optimization
To further improve model performance, hyperparameter optimization was performed using a
custom Artificial Rabbit Optimization (ARO) algorithm. This approach iteratively searched
for the best combination of hyperparameters by exploring a reduced search space. The
primary hyperparameters optimized include:

Learning Rate: [1e-4, 1e-3, 1e-2]

Batch Size: [32, 64]

Filters: [64, 128]

Kernel Size: (3x3)

Dropout Rate: [0.3, 0.5]

Number of Hidden Units: [256, 512]

The ARO algorithm identified the following optimal parameters:

Learning Rate: 0.001

Batch Size: 64

Filters: 128

Kernel Size: (3, 3)

Dropout Rate: 0.3

Number of Hidden Units: 256

6. Training Procedure and Evaluation


The model was trained using the AdamW optimizer with a weight decay term to improve
generalization. The sparse_categorical_crossentropy loss function was used, as the labels
were encoded as integers.

6.1 Callbacks
Early Stopping: Monitors validation loss to halt training when improvements stagnate.

Learning Rate Reduction: Reduces the learning rate by half if validation loss does not
improve, helping the model converge to a better local minimum.
6.2 Training and Validation
Epochs: 100, with early stopping based on validation loss.

Validation Accuracy: Consistently high, indicating strong model generalization on unseen


data.

6.3 Test Performance


After training, the model was evaluated on the test dataset, achieving:

Test Loss: 0.2959

Test Accuracy: 96.5%

7. Results
The model demonstrated robust classification performance, achieving a test accuracy of
96.5%. This high accuracy suggests that the CNN architecture, enhanced by attention
mechanisms and residual connections, effectively distinguishes between COVID-19,
NORMAL, and PNEUMONIA classes in Chest X-ray images.

Learning Curves

The following plots of Figure 2 show the training and validation accuracy and loss across
epochs:

Figure 2: Training and Validation Accuracy; Loss


Training Accuracy and Validation Accuracy Plot: Demonstrates steady improvement with no
overfitting.

Training Loss and Validation Loss Plot: Shows smooth convergence, with validation loss
stabilizing around the optimal value.

8. Code Explanation
8.1 Dataset and Data Loading

The dataset structure used for the project:

/content/Data/train
├── COVID19
├── NORMAL
├── PNEUMONIA

/content/Data/test
├── COVID19
├── NORMAL
├── PNEUMONIA

Loading the dataset using TensorFlow:

import tensorflow as tf

train_dataset = tf.keras.preprocessing.image_dataset_from_directory(
"/content/Data/train",
image_size=(128, 128),
batch_size=32,
label_mode="int"
)

test_dataset = tf.keras.preprocessing.image_dataset_from_directory(
"/content/Data/test",
image_size=(128, 128),
batch_size=32,
label_mode="int"
)

8.2 Data Preprocessing

Steps performed on the dataset include resizing, batching, and normalization. Visualizing a
few samples to verify preprocessing:

import matplotlib.pyplot as plt

def visualize_samples(dataset, num_images=9):


plt.figure(figsize=(10, 10))
for images, labels in dataset.take(1):
for i in range(num_images):
ax = plt.subplot(3, 3, i + 1)
plt.imshow(images[i].numpy().astype("uint8"))
plt.title(f"Label: {labels[i].numpy()}")
plt.axis("off")

visualize_samples(train_dataset)

8.3 Model Architecture

This model architecture includes convolutional layers, residual blocks, and Squeeze-and-
Excitation (SE) attention mechanisms to improve focus on important features.

Below is the model code implementing convolutional, residual, and SE blocks:

from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU,


GlobalAveragePooling2D, Dense, Input, Add, MaxPooling2D, Dropout,
GlobalAveragePooling2D
from tensorflow.keras.models import Model

def se_block(input_tensor, reduction_ratio=16):


"""Squeeze and Excitation Block"""
filters = input_tensor.shape[-1]
se = GlobalAveragePooling2D()(input_tensor)
se = Dense(filters // reduction_ratio, activation='relu')(se)
se = Dense(filters, activation='sigmoid')(se)
return tf.keras.layers.multiply([input_tensor, se])

def residual_block(x, filters, use_se=False):


"""Residual Block with optional Squeeze-and-Excitation"""
shortcut = x
x = Conv2D(filters, (3, 3), padding="same")(x)
x = BatchNormalization()(x)
x = ReLU()(x)

x = Conv2D(filters, (3, 3), padding="same")(x)


x = BatchNormalization()(x)

if use_se:
x = se_block(x)

x = Add()([shortcut, x])
x = ReLU()(x)
return x

def build_model(input_shape=(128, 128, 3)):


inputs = Input(shape=input_shape)
x = Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
x = MaxPooling2D(pool_size=(2, 2))(x)

# Residual blocks with SE Attention


x = residual_block(x, 64, use_se=True)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = residual_block(x, 128, use_se=True)
x = MaxPooling2D(pool_size=(2, 2))(x)

# Final Layers
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
outputs = Dense(3, activation="softmax")(x)

model = Model(inputs, outputs)


return model

model = build_model()
model.summary()

8.4 Hyperparameter Optimization

Using the Artificial Rabbit Optimization (ARO) algorithm, the following hyperparameters
were optimized:

def artificial_rabbit_optimization(search_space, train_dataset,


test_dataset, num_rabbits=5, max_iter=5):

# Initialize the rabbits (randomly select values from the search space)

rabbits = np.random.uniform(0, 1, size=(num_rabbits, 7)) # Ensure


rabbits have 7 parameters

best_rabbit = rabbits[0].copy() # Initialize best_rabbit with the same


shape as each rabbit

best_fitness = -np.inf # Start with a very low value

for iter in range(max_iter):

print(f"Iteration {iter+1}/{max_iter}")

# For each rabbit, map the values to the search space

fitness_scores = []

for i, rabbit in enumerate(rabbits):

# Map the rabbit's position to the search space

params = [

search_space["learning_rate"][int(rabbit[0] *
(len(search_space["learning_rate"]) - 1e-6))],

search_space["batch_size"][int(rabbit[1] *
(len(search_space["batch_size"]) - 1e-6))],

search_space["filters"][int(rabbit[2] *
(len(search_space["filters"]) - 1e-6))],
search_space["kernel_size"][int(rabbit[3] *
(len(search_space["kernel_size"]) - 1e-6))],

search_space["dropout_rate"][int(rabbit[4] *
(len(search_space["dropout_rate"]) - 1e-6))],

search_space["num_hidden_units"][int(rabbit[5] *
(len(search_space["num_hidden_units"]) - 1e-6))],

# Flatten the params to avoid issues with tuples (kernel_size


is a tuple)

params_flat = [

params[0], params[1], params[2], # Learning rate, batch


size, filters

params[3][0], params[3][1], # Flatten kernel size

params[4], params[5] # Dropout rate, num hidden units

# Ensure that params_flat and each rabbit array have the same
shape

if len(params_flat) != 7:

raise ValueError("Flattened parameters have incorrect


shape")

# Evaluate the fitness of the current rabbit (model accuracy)

fitness = fitness_function(params_flat, train_dataset,


test_dataset)

fitness_scores.append(fitness)

# Update the best rabbit found so far

if fitness > best_fitness:

best_fitness = fitness

best_rabbit = np.array(params_flat) # Store the best


rabbit as a numpy array with 7 values
# Update the rabbits' positions

for i in range(num_rabbits):

# Ensure that both rabbits[i] and best_rabbit have the same


shape

if rabbits[i].shape != best_rabbit.shape:

raise ValueError("Mismatch in shape between rabbits[i] and


best_rabbit")

rabbits[i] += np.random.rand(len(rabbits[i])) * (best_rabbit -


rabbits[i])

# Ensure that the rabbit stays within the bounds of the search
space

rabbits[i] = np.clip(rabbits[i], 0, 1)

# Return the best hyperparameters found

return best_rabbit

# Optimal parameters obtained from ARO algorithm


optimal_params = {
"learning_rate": 0.001,
"batch_size": 64,
"filters": 128,
"dropout_rate": 0.3,
"hidden_units": 256
}

# Applying optimal parameters in model training


optimizer =
tf.keras.optimizers.Adam(learning_rate=optimal_params["learning_rate"])
model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy",
metrics=["accuracy"])

8.5 Training Procedure and Evaluation

Callbacks

Implementing early stopping and learning rate reduction:

callbacks = [
tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
restore_best_weights=True),
tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
patience=3)
]

Training and Validation

Training the model on the dataset:

history = model.fit(
train_dataset,
validation_data=test_dataset,
epochs=100,
callbacks=callbacks
)

Test Performance

Evaluating model performance on the test set:

test_loss, test_accuracy = model.evaluate(test_dataset)


print(f"Test Loss: {test_loss}")
print(f"Test Accuracy: {test_accuracy}")

8.6 Results

The following plots show training and validation accuracy and loss over epochs, providing
insights into model performance and generalization:

# Plotting accuracy and loss curves


import matplotlib.pyplot as plt

def plot_metrics(history):
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

ax1.plot(history.history["accuracy"], label="Training Accuracy")


ax1.plot(history.history["val_accuracy"], label="Validation Accuracy")
ax1.set_title("Training and Validation Accuracy")
ax1.legend()

ax2.plot(history.history["loss"], label="Training Loss")


ax2.plot(history.history["val_loss"], label="Validation Loss")
ax2.set_title("Training and Validation Loss")
ax2.legend()

plt.show()
plot_metrics(history)

9. Exploiting VGG-16 Pretrained Model for Chest X-Ray Classification


The architecture used in this project is based on VGG16, a deep CNN that has been pre-
trained on the ImageNet dataset. The core idea behind transfer learning is to utilize the
pretrained weights from VGG16 on a large dataset (ImageNet) and then fine-tune the model
for chest X-ray classification.
VGG16 Base Model:
o We load the VGG16 model, excluding its top fully connected layers, and set it to operate

on input images of size 128x128 with 3 colour channels (RGB).

o The base model is frozen to prevent its weights from being updated during training,
ensuring that only the new layers we add are trained on the chest X-ray dataset.

Custom Layers:
o GlobalAveragePooling2D: Reduces the spatial dimensions of the feature maps while

retaining important information.

o Fully Connected Layers: A dense layer with 512 units and ReLU activation is added to
capture high-level features from the output of the base model. A dropout rate of 30% is
applied to prevent overfitting.

o Output Layer: The final layer is a dense layer with a SoftMax activation function, which
classifies the input into one of three classes (for example, 'normal', 'pneumonia', and
'tuberculosis').

Hyperparameter tuning
To further improve model performance, hyperparameter optimization was performed using a
custom Artificial Rabbit Optimization (ARO) algorithm. This approach iteratively searched
for the best combination of hyperparameters by exploring a reduced search space. The
primary hyperparameters optimized include:

Learning Rate: [1e-4, 1e-3, 1e-2]

Batch Size: [32, 64]

Filters: [64, 128]

Kernel Size: (3x3)

Dropout Rate: [0.3, 0.5]

Number of Hidden Units: [256, 512]

The ARO algorithm identified the following optimal parameters:

Learning Rate: 0.001

Batch Size: 64
Filters: 128

Kernel Size: (3, 3)

Dropout Rate: 0.3

Number of Hidden Units: 256

Model Training and Evaluation: After compiling the model, the model is trained on the
chest X-ray dataset, using a training set and validating on a separate test set. Early stopping is
utilized to ensure that the model does not overfit to the training data.

The model's performance is evaluated on the test dataset, and the following metrics are
reported:

 Test Loss: 0.13

 Test Accuracy: 0.95

Learning Curves The learning curves for both training and validation accuracy and loss are
plotted to visualize the model's performance throughout the training process. These plots help
in identifying issues such as overfitting or underfitting by comparing training and validation
trends and are shown in Figure 3.

 Training Accuracy: As the epochs progress, the training accuracy improves steadily,
indicating that the model is learning to classify chest X-rays effectively.

 Validation Accuracy: A stable or improving validation accuracy suggests that the


model generalizes well to unseen data.

 Training and Validation Loss: The training and validation losses are expected to
decrease over time, indicating that the model is successfully minimizing the loss
function.
Figure 3: Training; Validation Loss and Accuracy on VGG-16

Code Explanation:
This code snippet defines a function to build a deep learning model using VGG16 as the base
architecture for image classification, specifically for chest X-ray classification tasks. The
VGG16 model, pretrained on the ImageNet dataset, is used as a feature extractor by
excluding its top fully connected layers and freezing its weights, which ensures that the
model leverages the rich features learned from ImageNet without modifying the base model
during training. Custom fully connected layers are added on top of VGG16 to tailor the
model for the specific task, including two dense layers with ReLU activation and dropout for
regularization. The final output layer uses the softmax activation function for multi-class
classification. The model is compiled with the Adam optimizer (with a small learning rate of
0.0001), sparse categorical cross-entropy loss, and accuracy as the evaluation metric. This
setup allows the model to efficiently learn from the chest X-ray dataset while preventing
overfitting through dropout regularization, making it suitable for tasks like classifying
different conditions (e.g., normal, pneumonia, tuberculosis) in chest X-ray images.

# Build the model using VGG16 as the base


def build_model_vgg16(input_shape=(128, 128, 3), num_classes=3,
dropout_rate=0.3, num_hidden_units=512):
# Load VGG16 with pretrained weights, excluding the top fully connected
layers
base_model = VGG16(weights='imagenet', include_top=False,
input_shape=input_shape)
# Freeze the VGG16 layers
for layer in base_model.layers:
layer.trainable = False

# Add custom fully connected layers


x = base_model.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(num_hidden_units, activation='relu')(x)
x = layers.Dropout(dropout_rate)(x)
x = layers.Dense(num_hidden_units // 2, activation='relu')(x)
x = layers.Dropout(dropout_rate)(x)

# Output layer
outputs = layers.Dense(num_classes, activation='softmax')(x)

# Create the model


model = models.Model(inputs=base_model.input, outputs=outputs)

return model

# Instantiate the model with optimal hyperparameters found by ARO


model = build_model_vgg16()

# Compile the model with optimized learning rate


model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])

10. Conclusion
In this project, we compared the performance of a custom Convolutional Neural Network
(CNN) and a pretrained VGG16 model for the task of chest X-ray classification. The custom
CNN model, tailored specifically for this task, achieved a test loss of 0.2959 and a test
accuracy of 96.5%, demonstrating its strong ability to classify chest X-rays effectively. On
the other hand, the VGG16 model, which leveraged pretrained weights from ImageNet and
fine-tuned for the same classification task, performed slightly better, with a test loss of 0.13
and a test accuracy of 95%. While both models showed impressive results, the VGG16 model
benefited from the transfer learning approach, utilizing rich features learned from a large,
diverse dataset. The custom CNN, however, provided competitive performance and could be
further optimized for specific tasks. In conclusion, both models demonstrated high accuracy
and could be valuable tools for automating the analysis of chest X-rays, with VGG16 offering
a slight edge due to its ability to leverage pretrained knowledge.

You might also like