Capstone Project AIML CV1 Interim Report
Naresh Jani
Praveen Kumar
Rutvij Shishangiya
Varun Nair
Swati Jha
Pneumonia Detection challenge
Contents
1. Abstract
2. Introduction
a) About Pneumonia
b) Pneumonia Diagnosis and detection
c) Chest Radiographs Basics
3. Summary of Problem Statement, Data and Findings
a) Problem Statement
b) Given Data
c) Findings
4. Overview of the final process-data pre-processing steps and the algorithms used
a) Visualizations – Exploratory Data Analysis
a.i) Distribution of lung opacity in patients
a.ii) Age-wise value counts for all records
a.ii) Graph represents which age group has more infection cases
a.iii) Bar chart view of the ratio of Sex and the different lung opacities
a.iv) Lung opacity scatter plot
a.v) DICOM image Metadata
a.vi) Visualization of Sample X-Ray images for each category
a.vii) Visualization of Sample X-Ray images with bounding boxes
5. Deciding Models and Model Building
a) Suitable algorithm for the given problem
Fig 5.a.i) Architecture of the CNN model
Fig 5.a.ii) Fit the model
Fig 5.a.iii) Train loss and Train accuracy graph
Fig 5.a.iv) Confusion Matrix and classification report
6. Improve model performance
1. Abstract
This report documents the capstone project carried out as part of the
academic requirement for completing the PGP programme on AIML from
Great Learning, Great Lakes University.
The objective of the project is to build a CNN model that detects the presence
of pneumonia in a given set of DICOM images.
2. Introduction
a) About Pneumonia
Pneumonia is an infection that inflames the air sacs in one or both lungs. The
air sacs may fill with fluid or pus (purulent material), causing cough with
phlegm or pus, fever, chills, and difficulty breathing. A variety of organisms,
including bacteria, viruses and fungi, can cause pneumonia.
It is a life-threatening disease, particularly for infants, children, people
older than 65, and people with health problems or weakened immune systems.
Signs and symptoms of pneumonia may include:
According to the statistics, 2.5 million people around the world died from
pneumonia in 2019; 600,000 of them were children under 5 years of age. Three
out of ten infant deaths caused by pneumonia occur in the first month of life.
Between 2000 and 2012, infant mortality decreased by more than half.
Fig 2.b.1 depicts an image with normal lungs. It is observed that there is a mass
of tissue surrounding the lungs and between the lungs. These areas contain
skin, muscles, fat, bones, and the heart and big blood vessels. That translates
into a lot of information on the chest radiograph that is not useful for detecting
the lung opacity.
Fig 2.b.1
In the process of taking the image, an X-ray passes through the body and
reaches a detector on the other side. Tissues with sparse material, such as
lungs, which are full of air, do not absorb the X-rays and appear black in the
image. Dense tissues such as bones absorb the X-rays and appear white in the
image.
In short -
• Black = Air
• White = Bone
• Grey = Tissue or fluid
By convention, the left side of the subject appears on the right side of the
screen. A small L can also be seen at the top right corner. In a normal image,
the lungs appear black, but other structures are projected onto them: mainly
the rib cage bones, main airways, blood vessels and the heart.
• Build a model to identify whether CXR images have lung opacity or not.
• Build an algorithm that automatically locates lung opacities on chest
radiographs and reports the affected area through bounding boxes.
b) Given Data
In the dataset, some of the features are labelled “Not Normal No Lung
Opacity”. This extra third class indicates that while pneumonia was determined
c) Findings
The difference is explained by the fact that some patients have multiple rows,
each with a different bounding box.
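The relationship between label rows and patients can be sketched as follows. This is a minimal illustration only: the column names (patientId, x, y, width, height, Target) are assumptions about the label table, and the rows are made up for the example rather than taken from the dataset.

```python
from collections import defaultdict

# Made-up label rows for illustration; the column names are assumptions.
rows = [
    {"patientId": "p1", "x": 264.0, "y": 152.0, "width": 213.0, "height": 379.0, "Target": 1},
    {"patientId": "p1", "x": 562.0, "y": 152.0, "width": 256.0, "height": 453.0, "Target": 1},
    {"patientId": "p2", "x": None, "y": None, "width": None, "height": None, "Target": 0},
]


def boxes_per_patient(rows):
    """Collect each patient's bounding boxes; patients without opacity get an empty list."""
    boxes = defaultdict(list)
    for row in rows:
        patient_boxes = boxes[row["patientId"]]  # makes sure every patient appears once
        if row["Target"] == 1:
            patient_boxes.append((row["x"], row["y"], row["width"], row["height"]))
    return dict(boxes)


grouped = boxes_per_patient(rows)
n_rows = len(rows)          # number of label rows
n_patients = len(grouped)   # number of unique patients
```

In this toy table there are more rows than unique patients because one patient contributes two boxes, which is the same effect described above.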
From our EDA, we learned that there are 26,684 unique patients. Overall, the
data is imbalanced: the positive Target class makes up only 31.6% of the whole
dataset. Such an imbalance tends to bias a model toward the majority class. We
addressed it using augmentation techniques or sampling methods so that the
classes are represented equally.
Here are some of our findings from the DICOM image dataset:
A few patients have multiple bounding boxes.
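Random oversampling is one simple sampling method of this kind. The sketch below is an illustration under assumptions, not the procedure used in the project: the report does not state which sampling method was applied, the helper name `oversample` is hypothetical, and the toy data is made up.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra samples (with replacement) only for classes below the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: 7 negatives and 3 positives (illustrative, not the project's data).
samples = [f"img_{i}" for i in range(10)]
labels = [0] * 7 + [1] * 3
balanced_samples, balanced_labels = oversample(samples, labels)
counts = Counter(balanced_labels)
```

Oversampling of this kind is normally applied to the training split only, so that duplicated images do not leak into validation or test data.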
• 31.41% of patients have No Lung Opacity but may have another lung
abnormality
a.ii) Graph represents which age group has more infection cases
As per the chart below, patients aged between 40 and 60 account for more cases
of infection.
a.iii) Bar chart view of the ratio of Sex and the different lung opacities
From the scatter plots below, it can be observed that lung opacity is most
frequent in the 35-50 age group, followed by the 20-35 group.
We also observe that patients in the lower age group (under 20) have fewer
pneumonia cases than older patients. Above 65, there are fewer pneumonia
patients because the overall number of patients decreases.
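The age-group comparison above can be reproduced with a simple binning step. The following is a minimal sketch under assumptions: the bin edges follow the groups named in the text (with 50-65 filled in as the implied remaining group), the helper name `age_group` is hypothetical, and the (age, Target) pairs are made up.

```python
from collections import Counter


def age_group(age, edges=(20, 35, 50, 65)):
    """Return a label such as '20-35' for the half-open bin [20, 35) that contains age."""
    lower = 0
    for upper in edges:
        if age < upper:
            return f"{lower}-{upper}"
        lower = upper
    return f"{lower}+"


# Made-up (age, Target) pairs for illustration only.
records = [(12, 0), (28, 1), (41, 1), (44, 1), (47, 0), (58, 1), (70, 0)]

# Count positive (Target == 1) cases per age group.
positives = Counter(age_group(age) for age, target in records if target == 1)
```

Comparing these counts against the total number of patients per group helps separate a genuinely lower infection rate from a group that simply has fewer patients.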
In the given problem, the data sample contains images as input, together with
information on which patients are affected by pneumonia. We need to build a
model that learns from the given data and, given a new sample image,
accurately classifies whether or not the image is of a person affected by
pneumonia.
This is clearly a deep learning problem. Because the inputs are image features
and metadata and the output is a classification of the image, convolutional
neural network (CNN) models are an appropriate choice. The model involves
building the input layer and feature extraction layers, using activation
functions, and applying appropriate weights and classifiers.
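To make the feature extraction step concrete, the sketch below applies one convolution filter followed by a ReLU activation to a tiny image, using plain Python. It is an illustration only and not the model built in this project: the 4x4 image and the hand-picked edge kernel are assumptions, and in a real CNN the kernel weights are learned during training.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation a CNN convolution layer applies."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """ReLU activation: negative responses are set to zero."""
    return [[max(0, value) for value in row] for row in feature_map]


# Toy 4x4 "image": dark (0) left half, bright (1) right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

features = relu(conv2d(image, kernel))
```

The resulting feature map is non-zero only in the column where the dark-to-bright edge lies; a classifier placed after such layers works on these extracted features rather than on raw pixels.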