Capstone Project AIML CV1 Interim Report

This document summarizes a capstone project to build a CNN model to detect pneumonia in chest radiograph images. It introduces pneumonia, describes the given data and problem statement, which is to identify lung opacity in images and locate affected areas. Exploratory data analysis is performed including visualizations of sample images and metadata. A CNN model is developed and optimized to classify images as normal, not normal or having lung opacity.


Pneumonia Detection challenge

CAPSTONE PROJECT REPORT


Detect Pneumonia by detecting lung opacity using CNN model

Team Members (Jan 2022 Batch Group A)

Naresh Jani
Praveen Kumar
Rutvij Shishangiya
Varun Nair
Swati Jha

Date: 6th January 2023


Mentored by: Mr. Gaurav Srivastava


Contents
1. Abstract
2. Introduction
a) About Pneumonia
b) Pneumonia Diagnosis and detection
c) Chest Radiographs Basics
3. Summary of Problem Statement, Data and Findings
a) Problem Statement
b) Given Data
c) Findings
4. Overview of the final process-data pre-processing steps and the algorithms used
a) Visualizations – Exploratory Data Analysis
a.i) Distribution of lung opacity in patients
a.ii) Age-wise value counts for all records
a.ii) Graph represents which age group has more infection cases
a.iii) Bar chart view of the ratio of Sex and the different lung opacities
a.iv) Lung opacity scatter plot
a.v) DICOM image Metadata
a.vi) Visualization of Sample X-Ray images for each category
a.vii) Visualization of Sample X-Ray images with bounding boxes
5. Deciding Models and Model Building
a) Suitable algorithm for the given problem
Fig 5.a.i): Architecture of the CNN model
Fig 5.a.ii): Fit the model
Fig 5.a.iii) Train loss and Train accuracy graph
Fig 5.a.iv) Confusion Matrix and classification report
6. Improve model performance


1. Abstract

This is the report of the capstone project, carried out as part of the academic requirements for completing the PGP programme in AIML from Great Learning, Great Lakes University.
The objective of the project is to build a CNN model to detect the presence of pneumonia in a set of DICOM chest radiograph images.

2. Introduction

a) About Pneumonia
Pneumonia is an infection that inflames the air sacs in one or both lungs. The
air sacs may fill with fluid or pus (purulent material), causing cough with
phlegm or pus, fever, chills, and difficulty breathing. A variety of organisms,
including bacteria, viruses and fungi, can cause pneumonia.
It is particularly life-threatening for infants, children and people older than 65, and for people with health problems or weakened immune systems. Signs and symptoms of pneumonia may include:

• Chest pain when you breathe or cough
• Confusion or changes in mental awareness (in adults aged 65 and older)
• Cough, which may produce phlegm
• Fatigue
• Fever, sweating and shaking chills
• Lower than normal body temperature (in adults older than 65 and people with weak immune systems)
• Nausea, vomiting or diarrhoea
• Shortness of breath

Going by the statistics, 2.5 million people died of pneumonia worldwide in 2019, 600,000 of whom were children under 5 years of age. Three out of ten infant deaths caused by pneumonia occur in the first month of life. Between 2000 and 2012, infant mortality decreased by more than half.


b) Pneumonia Diagnosis and detection

Pneumonia can be difficult to diagnose because the symptoms are variable and often very similar to those of a cold or influenza. To diagnose pneumonia, doctors review the medical history, conduct a physical exam and run tests. If pneumonia is suspected, further tests may include blood tests, a chest X-ray, pulse oximetry, a sputum test or even a CT scan.

On a chest X-ray, pneumonia usually manifests as an area of increased opacity. Comparing chest X-rays of the patient taken at different time points, and correlating them with clinical symptoms and history, helps in making the diagnosis.

To detect pneumonia, we need to detect inflammation of the lungs.

In this project, we are challenged to build an algorithm that detects a visual signal for pneumonia in medical images. Specifically, the algorithm needs to automatically locate lung opacities on chest radiographs.

Fig 2.b.1 shows an image of normal lungs. There is a mass of tissue surrounding the lungs and between them, containing skin, muscle, fat, bone, the heart and the large blood vessels. This translates into a lot of information on the chest radiograph that is not useful for detecting lung opacity.

Fig 2.b.1


c) Chest Radiographs Basics

When the image is taken, an X-ray beam passes through the body and reaches a detector on the other side. Low-density tissues such as the lungs, which are full of air, absorb very little of the X-rays and appear black in the image. Dense tissues such as bones absorb the X-rays and appear white in the image.

In short:
• Black = Air
• White = Bone
• Grey = Tissue or fluid
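To make the Black/White/Grey convention concrete, here is a minimal NumPy sketch that buckets pixel brightness into the three categories. It assumes a MONOCHROME2-style image (higher values are brighter), and the 0.25 and 0.75 cut-offs are hypothetical values chosen only for illustration, not calibrated tissue densities.

```python
import numpy as np

# Hypothetical cut-offs on a 0-1 brightness scale (illustrative only,
# not calibrated attenuation values).
AIR_MAX = 0.25
BONE_MIN = 0.75

def label_intensities(pixels: np.ndarray) -> np.ndarray:
    """Map each pixel to 'air', 'tissue/fluid' or 'bone' by brightness."""
    pixels = pixels.astype(np.float64)
    lo, hi = pixels.min(), pixels.max()
    # Rescale to [0, 1]; a constant image is mapped to all zeros.
    norm = (pixels - lo) / (hi - lo) if hi > lo else np.zeros_like(pixels)
    labels = np.full(norm.shape, "tissue/fluid", dtype=object)
    labels[norm <= AIR_MAX] = "air"    # dark: little X-ray absorption
    labels[norm >= BONE_MIN] = "bone"  # bright: strong X-ray absorption
    return labels

toy = np.array([[0, 128, 255]], dtype=np.uint8)
print(label_intensities(toy))
```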

By convention, the left side of the subject appears on the right side of the screen, and a small “L” marker (indicating the patient's left side) can be seen in the top right corner of the image. In a normal image the lungs appear black, but various structures are projected over them, mainly the rib cage bones, the main airways, the blood vessels and the heart.

3. Summary of Problem Statement, Data and Findings


a) Problem Statement

Computer vision can be used in healthcare to identify diseases. In pneumonia detection, we need to detect inflammation of the lungs. In this challenge, we are required to build an algorithm that detects a visual signal for pneumonia in medical images.

Specifically, our objective is to:

• Build a model that identifies whether a CXR (chest X-ray) image has lung opacity or not.
• Build an algorithm that automatically locates lung opacities on chest radiographs and reports the affected areas as bounding boxes.
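For the second objective, a predicted bounding box is typically compared with a ground-truth box using intersection over union (IoU). The sketch below is a minimal version assuming boxes in the (x, y, width, height) layout of stage_2_train_labels.csv, with (x, y) as the top-left corner; the function name `iou` is our own.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap rectangle: clamp to zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

print(iou((100, 100, 200, 200), (200, 200, 200, 200)))
```

An IoU of 1.0 means a perfect match and 0.0 means no overlap; a prediction is usually counted as correct only above some IoU threshold.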

b) Given Data

In the dataset, some of the images are labelled “Not Normal No Lung Opacity”. This extra third class indicates that although pneumonia was determined not to be present, there was nonetheless some type of abnormality on the image, and this finding may often mimic the appearance of true pneumonia.

DICOM original images: medical images are stored in a special format called DICOM files (*.dcm). They contain a combination of header metadata and the underlying raw image arrays for pixel data.

Original link to the dataset: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data

The dataset contains:

• stage_2_detailed_class_info.csv – contains the attributes patientId and class information (Normal / Not Normal / Lung Opacity)
• stage_2_train_labels.csv – contains the attributes patientId, x, y, width, height and Target
• stage_2_train_images – 26684 images in .dcm format, used for training the model
• stage_2_test_images – 3000 images in .dcm format
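A minimal pandas sketch of loading the two CSV files and attaching the class label to each box row is shown below. To keep it runnable, tiny in-memory stand-ins with made-up patient IDs and coordinates replace the real files, and the class column name `class` is an assumption about the CSV header.

```python
import io
import pandas as pd

# In practice: pd.read_csv("stage_2_train_labels.csv") and
# pd.read_csv("stage_2_detailed_class_info.csv"). Made-up rows below.
labels_csv = io.StringIO(
    "patientId,x,y,width,height,Target\n"
    "p1,,,,,0\n"
    "p2,264,152,213,379,1\n"
    "p2,562,152,256,453,1\n"
)
class_csv = io.StringIO(
    "patientId,class\n"
    "p1,Normal\n"
    "p2,Lung Opacity\n"
    "p2,Lung Opacity\n"
)
labels = pd.read_csv(labels_csv)
classes = pd.read_csv(class_csv)

# The class file may repeat a patient once per box, so keep one row per
# patient before merging; otherwise the merge multiplies box rows.
classes = classes.drop_duplicates(subset="patientId")
train = labels.merge(classes, on="patientId", how="left")
print(train)
```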

c) Findings

There is a difference between the number of training images and the number of rows given in the .csv files:
No. of training images: 26684
No. of rows given in the csv file: 30227

The difference arises because there are multiple rows for the same patient, one for each bounding box.
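The row/patient mismatch can be checked directly with pandas. The toy frame below (made-up rows) mimics the layout of stage_2_train_labels.csv; on the real file, `n_rows` and `n_patients` should correspond to the 30227 and 26684 figures above.

```python
import pandas as pd

# Toy stand-in: one row per bounding box, and a single row with an
# empty x for a patient without any box.
labels = pd.DataFrame({
    "patientId": ["p1", "p2", "p2", "p3"],
    "x": [None, 264.0, 562.0, None],
    "Target": [0, 1, 1, 0],
})

n_rows = len(labels)
n_patients = labels["patientId"].nunique()
# Rows beyond the first one for each patient are the "extra" rows.
extra_rows = int(labels["patientId"].duplicated().sum())
print(f"rows: {n_rows}, unique patients: {n_patients}, extra rows: {extra_rows}")
```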


4. Overview of the final process-data pre-processing steps and the algorithms used

a) Visualizations – Exploratory Data Analysis

From our EDA, we learned that there are 26684 unique patients. Overall, the
distribution of data is imbalanced with Target class being only 31.6% of the
whole dataset. This tends to result in bias. We have addressed such data issues
using augmentation techniques or used sampling methods to equally represent
data classes.
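One of the sampling methods mentioned above, random undersampling, can be sketched as follows. This is an illustration, not necessarily the exact procedure used in this project; it assumes a one-row-per-patient table with a binary `Target` column (so a patient's multiple boxes are not split across classes), and the data are made up.

```python
import pandas as pd

def undersample(df: pd.DataFrame, label_col: str = "Target", seed: int = 0) -> pd.DataFrame:
    """Randomly drop rows from larger classes so every class has equal size."""
    n_min = int(df[label_col].value_counts().min())
    # GroupBy.sample draws n rows independently within each class.
    balanced = df.groupby(label_col).sample(n=n_min, random_state=seed)
    return balanced.reset_index(drop=True)

# Toy data with the positive class in the minority (made-up values).
df = pd.DataFrame({"patientId": [f"p{i}" for i in range(10)],
                   "Target": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]})
print(df["Target"].value_counts(normalize=True))
balanced = undersample(df)
print(balanced["Target"].value_counts())
```

Undersampling discards data; the alternatives are oversampling or augmenting the minority class, or weighting the loss by class frequency.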

Here are some of our findings from the DICOM image dataset. A few patients have multiple bounding boxes:

• 3266 patients have 2 bounding boxes defined
• 119 patients have 3 bounding boxes defined
• 13 patients have 4 bounding boxes defined
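Counting bounding boxes per patient reproduces this kind of breakdown. As a consistency check, these counts account exactly for the extra CSV rows noted in the Findings: 3266 × 1 + 119 × 2 + 13 × 3 = 3543 = 30227 - 26684. The sketch below uses made-up rows in the stage_2_train_labels.csv layout, where patients without opacity have an empty x.

```python
import pandas as pd

# Toy stand-in for stage_2_train_labels.csv (made-up rows).
labels = pd.DataFrame({
    "patientId": ["p1", "p2", "p2", "p3", "p3", "p3", "p4"],
    "x": [None, 10.0, 300.0, 50.0, 200.0, 400.0, 120.0],
})

# count() skips NaN, so patients without any box get 0.
boxes_per_patient = labels.groupby("patientId")["x"].count()
# How many patients have 0, 1, 2, ... boxes.
distribution = boxes_per_patient.value_counts().sort_index()
print(distribution)
```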

We will run a binary classification to predict patients with pneumonia.

a.i) Distribution of lung opacity in patients

• 45.06% of patients have Lung Opacity
• 23.52% of patients have a healthy/Normal lung image
• 31.41% of patients have No Lung Opacity but may have another lung abnormality
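Percentages like these can be computed with `value_counts(normalize=True)`. The result depends on whether it is computed per patient or per CSV row, because patients with several boxes appear more than once; the sketch below de-duplicates by patient first. The class strings, the column name `class` and the rows themselves are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for stage_2_detailed_class_info.csv (made-up rows).
classes = pd.DataFrame({
    "patientId": ["p1", "p2", "p2", "p3", "p4"],
    "class": ["Normal", "Lung Opacity", "Lung Opacity",
              "No Lung Opacity / Not Normal", "Lung Opacity"],
})

# One row per patient first, so patients with several boxes count once.
per_patient = classes.drop_duplicates(subset="patientId")
share = per_patient["class"].value_counts(normalize=True).mul(100).round(2)
print(share)
```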


a.ii) Age-wise value counts for all records


The patient distribution shows that most patients, and most pneumonia patients, are aged between 40 and 60. More records are observed for male patients than for female patients.


a.ii) Graph represents which age group has more infection cases

As per the chart below, patients aged between 40 and 60 have the most cases of infection.
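Age-group counts like these can be produced by binning the `PatientAge` value read from the DICOM header. In the sketch below the age bands and the toy rows are hypothetical, chosen only to show the `pd.cut` and `groupby` pattern.

```python
import pandas as pd

# Toy per-patient table: age from the DICOM header, Target from the labels CSV.
patients = pd.DataFrame({
    "age": [8, 25, 37, 45, 52, 58, 61, 70],
    "Target": [0, 0, 1, 1, 1, 0, 1, 0],
})

# Hypothetical age bands; right=False makes each band [lower, upper).
bins = [0, 20, 40, 60, 120]
band_labels = ["<20", "20-39", "40-59", "60+"]
patients["age_group"] = pd.cut(patients["age"], bins=bins,
                               labels=band_labels, right=False)

# Number of positive (pneumonia) cases per age group; observed=False
# keeps empty bands in the output.
cases = patients.groupby("age_group", observed=False)["Target"].sum()
print(cases)
```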

a.iii) Bar chart view of the ratio of Sex and the different lung opacities

The number of male patients with pneumonia is greater than the number of female patients with pneumonia.


Distribution of the three classes based on patient sex.

We can clearly see that male patients have a higher number of observations than female patients in all the classes.
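A cross-tabulation gives both the raw counts and the per-class sex ratio behind a bar chart like this one. The toy rows are made up, and the 'M'/'F' codes are assumed to match the `PatientSex` values in the DICOM headers.

```python
import pandas as pd

# Toy per-patient table (made-up rows).
patients = pd.DataFrame({
    "sex": ["M", "M", "F", "M", "F", "M"],
    "class": ["Lung Opacity", "Normal", "Lung Opacity",
              "No Lung Opacity / Not Normal", "Normal", "Lung Opacity"],
})

# Raw counts: rows are classes, columns are sexes.
counts = pd.crosstab(patients["class"], patients["sex"])
# normalize="index" turns each class row into male/female fractions.
ratios = pd.crosstab(patients["class"], patients["sex"], normalize="index")
print(counts)
print(ratios)
```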

a.iv) Lung opacity scatter plot

From the scatter plots below, it can be observed that lung opacity is seen most often in the age group 35-50, followed by the group between 20 and 35.

We also observe that patients in the lower age group (under 20) have fewer pneumonia cases than older patients. Above the age of 65 there are fewer pneumonia patients, as the overall number of patients in that age group decreases.


a.v) DICOM image Metadata


DICOM (Digital Imaging and Communications in Medicine) is a standard for storing, processing and transmitting medical images and related information. DICOM images contain the metadata about the patient shown below. We parsed the DICOM images using the dcmread function of the pydicom library.
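A minimal sketch of pulling patient-level fields out of the header is shown below. `pydicom.dcmread` with `stop_before_pixels=True` skips loading the pixel data, which speeds up metadata-only EDA. The helper names are our own, the attribute keywords (`PatientID`, `PatientSex`, `PatientAge`, `ViewPosition`) are assumed to be present in these files, and the age parser only handles year values such as '51' or '051Y'.

```python
from types import SimpleNamespace

def read_metadata(path):
    """Read only the DICOM header (no pixel data) with pydicom."""
    import pydicom  # imported here so extract_fields works without pydicom
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return extract_fields(ds)

def extract_fields(ds):
    """Pull a few patient-level attributes from a Dataset-like object."""
    age = getattr(ds, "PatientAge", None)
    return {
        "patientId": getattr(ds, "PatientID", None),
        "sex": getattr(ds, "PatientSex", None),
        # Assumed year-based ages, stored as '51' or DICOM AS form '051Y'.
        "age": int(str(age).rstrip("Y")) if age not in (None, "") else None,
        "view": getattr(ds, "ViewPosition", None),
    }

# A stand-in object with the same attribute names keeps the example
# runnable without a .dcm file; the values are made up.
fake = SimpleNamespace(PatientID="p1", PatientSex="M",
                       PatientAge="051Y", ViewPosition="PA")
print(extract_fields(fake))
```

Applying `read_metadata` to every file and collecting the dictionaries into a DataFrame gives the per-patient table used for the age and sex plots.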


a.vi) Visualization of Sample X-Ray images for each category


a.vii) Visualization of Sample X-Ray images with bounding boxes
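Overlaying a box from stage_2_train_labels.csv only requires its (x, y, width, height) values. In a notebook one would normally use matplotlib's `Rectangle` patch; the dependency-free NumPy sketch below instead burns the outline into a copy of the pixel array. The function name and defaults are hypothetical.

```python
import numpy as np

def draw_box(image, box, value=255, thickness=2):
    """Return a copy of a 2-D image with an (x, y, width, height) box outlined."""
    out = image.copy()
    h, w = out.shape
    x, y, bw, bh = (int(round(v)) for v in box)
    # Clip the box to the image so partly visible boxes are still drawn.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + bw, w), min(y + bh, h)
    if x0 >= x1 or y0 >= y1:
        return out  # box lies entirely outside the image
    t = thickness
    # 255 suits uint8 images; pass a larger value for 12/16-bit data.
    out[y0:min(y0 + t, y1), x0:x1] = value   # top edge
    out[max(y1 - t, y0):y1, x0:x1] = value   # bottom edge
    out[y0:y1, x0:min(x0 + t, x1)] = value   # left edge
    out[y0:y1, max(x1 - t, x0):x1] = value   # right edge
    return out

img = np.zeros((10, 10), dtype=np.uint8)
boxed = draw_box(img, (2, 3, 5, 4), value=255, thickness=1)
print(boxed)
```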

5. Deciding Models and Model Building

a) Suitable algorithm for the given problem

In the given problem, each data sample contains an image as input together with information on whether the patient is affected by pneumonia. We need to build a model that learns from the given data so that, given a new sample image, it can accurately classify whether the image is of a pneumonia-affected person or not.

This makes it a deep learning problem: since the inputs are image features (plus metadata) and the output is a classification of the image, convolutional neural network (CNN) models are the appropriate models to adopt. Building the model involves defining the input layer and the feature extraction layers, using activation functions, learning appropriate weights and adding a classifier.
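To illustrate how these pieces fit together, the NumPy sketch below runs a single forward pass: a convolution layer for feature extraction, a ReLU activation, max pooling, and a dense classifier with a sigmoid output. It does not reproduce the architecture in Fig 5.a.i, the weights are random rather than trained, and it needs NumPy 1.20 or later for `sliding_window_view`. In practice the model would be built and fitted with a deep learning framework such as Keras, which also handles training by backpropagation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(image, kernels, bias):
    """Valid 2-D cross-correlation of one (H, W) image with K kernels."""
    kh, kw = kernels.shape[1:]
    windows = sliding_window_view(image, (kh, kw))  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijab,kab->kij", windows, kernels) + bias[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling over the last two axes of (K, H, W)."""
    k, h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:, : h2 * size, : w2 * size]
    return x.reshape(k, h2, size, w2, size).max(axis=(2, 4))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(image, params):
    """Input -> conv (feature extraction) -> ReLU -> pool -> dense -> sigmoid."""
    features = max_pool(relu(conv2d(image, params["kernels"], params["conv_bias"])))
    flat = features.reshape(-1)
    logit = flat @ params["dense_w"] + params["dense_b"]
    return sigmoid(logit)  # probability of the positive (pneumonia) class

rng = np.random.default_rng(0)
image = rng.random((64, 64))  # stand-in for a resized, normalised CXR
params = {
    "kernels": rng.normal(0, 0.1, (4, 3, 3)),
    "conv_bias": np.zeros(4),
    "dense_w": rng.normal(0, 0.01, 4 * 31 * 31),  # 64-3+1=62, pooled to 31
    "dense_b": 0.0,
}
print(forward(image, params))
```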


Fig 5.a.i): Architecture of the CNN model

Fig 5.a.ii): Fit the model

Fig 5.a.iii) Train loss and Train accuracy graph


Fig 5.a.iv) Confusion Matrix and classification report
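For the binary case, the figures in a confusion matrix and classification report come from four counts. The sketch below computes them by hand for the positive class; scikit-learn's `confusion_matrix` and `classification_report` provide the same figures, with rows as the true class and columns as the predicted class. The labels here are made up.

```python
import numpy as np

def binary_report(y_true, y_pred):
    """Confusion matrix plus precision/recall/F1 for the positive class (1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Rows = actual class (0, 1); columns = predicted class (0, 1).
    matrix = np.array([[tn, fp], [fn, tp]])
    return matrix, {"precision": precision, "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
matrix, scores = binary_report(y_true, y_pred)
print(matrix)
print(scores)
```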

6. Improve model performance

1) Approaches to improve model performance: to be covered in the Final Report.
2) In the next milestone, performance can be improved using transfer learning, adding more layers and hyperparameter tuning.
3) Transfer learning can be applied using models such as VGG-16, VGG-19, ResNet etc.
4) Explore whether any further data augmentation techniques can improve performance.
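As one example of an augmentation worth evaluating, a horizontal flip must also move each bounding box. The sketch below (hypothetical helper name, made-up toy image) shows the coordinate update for (x, y, width, height) boxes.

```python
import numpy as np

def hflip_with_boxes(image, boxes):
    """Mirror an (H, W) image left-right and move (x, y, width, height) boxes to match."""
    flipped = image[:, ::-1].copy()
    width = image.shape[1]
    # A box spanning [x, x + bw) ends up spanning
    # [width - x - bw, width - x) after mirroring; y is unchanged.
    new_boxes = [(width - x - bw, y, bw, bh) for (x, y, bw, bh) in boxes]
    return flipped, new_boxes

img = np.arange(12).reshape(3, 4)
flipped, boxes = hflip_with_boxes(img, [(0, 1, 1, 2)])
print(flipped)
print(boxes)
```

Whether flipping suits chest radiographs is an empirical question: it swaps the apparent left/right anatomy, including the heart position and the “L” marker, so its effect on validation performance should be checked.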
