KLE Society's

KLE Technological University

A Mini Project Report

CLASSIFICATION OF IMBALANCED MEDICAL IMAGE DATA

submitted in partial fulfillment of the requirement for the degree of

Bachelor of Engineering
In
Computer Science and Engineering

Submitted By

Atharv Kadole 01fe20bcs041

Under the guidance of

Mrs. Nirmala Patil

SCHOOL OF COMPUTER SCIENCE & ENGINEERING

HUBLI–580 031 (India).

Academic year 2022-23
KLE Society's
KLE Technological University

2022 - 2023

SCHOOL OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE

This is to certify that the Mini Project entitled CLASSIFICATION OF IMBALANCED
MEDICAL IMAGE DATA is a bona fide work carried out by the student team Mr.
Akshath Raj – 01fe20bcs293, Mr. Vaiybhav Balachenna – 01fe20bcs297, Mr. Atharv
Kadole – 01fe20bcs041, in partial fulfillment of completion of the Fifth semester B. E. in
Computer Science and Engineering during the year 2022 – 2023. The project report has
been approved as it satisfies the academic requirement with respect to the project work
prescribed for the above said programme.

Guide Head, SoCSE

Mrs. Nirmala Patil Dr. Meena S. M

External Viva:
Name of the Examiners Signature with date
1.
2.
ABSTRACT

Breast cancer is one of the most important factors affecting human health. Diagnosing
this disease involves the use of pathological breast cancer images, and the classification
of these images plays an important role in clinical treatment and computer-aided
diagnosis. Deep learning techniques provide an effective way to construct an end-to-end
model that computes final classification labels from the raw pixels of medical images.
However, imbalanced class distribution in medical image data, which leads to
misclassification, is a great challenge in this field. In this project, we handle the problem
of class imbalance by applying resampling techniques to balance the medical image data.
Multiple state-of-the-art deep learning models are used for multi-class classification of
the image data. These models are trained on both the imbalanced and the balanced image
data to compare the results. To improve the accuracy of predictions, ensemble learning is
used: an ensemble model that combines the trained deep learning models can achieve
better performance than any single contributing model.

Keywords: Medical Image Classification, Deep Learning, Ensemble Learning

ACKNOWLEDGEMENTS

We take this opportunity to thank Dr. Ashok Shettar (Vice-Chancellor, KLE

Technological University, Hubli), Dr. Prakash Tewari (Dean of Academic Affairs, KLE
Technological University, Hubli) and Dr. Meena S M (Head of School of Computer
Science and Engineering, KLE Technological University, Hubli).

We also take this opportunity to thank Mrs. Nirmala Patil, our guide, for providing us
with an academic environment that nurtured our practical skills and contributed to our
project's success.

We sincerely thank Mr. Mahesh Patil, Mini Project Coordinator, for their support,
inspiration, and wholehearted cooperation during the course of completion.

Akshath Raj
Vaiybhav Balachenna
Atharv Kadole
TABLE OF CONTENTS
1. INTRODUCTION
1.1 Preamble
1.2 Motivation
1.3 Objectives of the project
1.4 Literature Survey
1.5 Problem Definition
2. PROPOSED SYSTEM
2.1 Description of Proposed System
2.2 Description of Target Users
2.3 Advantages of Proposed System
2.4 Scope
3. SOFTWARE REQUIREMENT SPECIFICATION
3.1 Overview of SRS
3.2 Requirement Specifications
3.2.1 Functional Requirements
3.2.2 Nonfunctional Requirements
3.2.2.1 Performance Requirements
3.2.2.2 Usability
3.2.3 Use Case Diagram
3.2.4 Use Case Description
3.3 Software and Hardware requirement specifications
4. SYSTEM DESIGN
4.1 Architecture of the system
4.2 Data Set Description
5. IMPLEMENTATION
5.1 Proposed Methodology
6. TESTING
6.1 Acceptance Testing
6.2 Unit Testing
7. RESULTS AND DISCUSSIONS
7.1 Results
7.2 Discussions
8. CONCLUSION AND FUTURE SCOPE
8.1 Conclusion
8.2 Future scope
9. REFERENCES
10. APPENDIX
A Gantt Chart
C Description of Tools & Technology used
D Blue Print
Classification of Imbalanced Medical Image Data
------------------------------------------------------------------------------------------------------------

1. INTRODUCTION

1.1 Preamble
Breast cancer is one of the most common cancers among women. It arises when cells
in the breast grow abnormally and form a mass of tissue, or tumor, which is
classified as benign or malignant. A malignant tumor is cancerous, whereas a benign
tumor is non-cancerous. The disease is diagnosed by biopsy, and researchers have
studied various automated approaches to its diagnosis. The maturity of the tumor
stroma can be classified from histological images. Thermography and mammography
images are also used for diagnosis. Thermographic images are captured by cameras
and analyzed according to the intensity of infrared radiation. Of the two,
mammography gives the more accurate result.
However, in many medical and clinical cases, it can be hard to collect a balanced
dataset for training since some diseases have a low prevalence. This leads to the data
imbalance problem, namely, the number of samples in different classes is not
balanced. Imbalanced data can significantly degrade the performance of models.
Many models that perform well on balanced datasets cannot achieve good
performance on their imbalanced counterparts. To address class imbalance,
resampling techniques such as oversampling and undersampling can be used.
The breast cancer images are classified using various machine learning and deep
learning techniques. Computer-aided diagnosis is an important research field in
medical image classification, where the goal of most tasks is to differentiate
between the different benign and malignant classes and to predict the correct class
of the breast cancer image. With the development of deep learning, medical image
classification has achieved remarkable progress. Usually, the training of deep
learning models needs plenty of labeled samples that belong to different classes.
Various state-of-the-art deep learning models such as CNN, VGG-19 and ResNet-50
are used for multi-class classification of medical image data.
Ensemble learning is a machine learning technique that combines several base
models in order to produce one optimal predictive model. Ensemble learning
strategies are beneficial in deep learning based medical image classification, as
assembling diverse models combines their strengths in focusing on different
features while compensating for the weaknesses of individual models. The final
prediction from these ensembling techniques is obtained by combining results from
several base models. Averaging, weighted averaging and voting are some of the
ways the results are combined to obtain a final prediction.
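
The averaging and voting strategies mentioned above can be sketched as follows. This is a minimal illustration, not the project's code: the function names and the probability values are hypothetical, and three classes are used instead of eight for brevity.

```python
from collections import Counter


def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


def average_ensemble(prob_lists):
    """Average the class probabilities of several models for one sample."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]


def majority_vote(prob_lists):
    """Each model votes for its most probable class; the most common vote wins."""
    votes = [argmax(p) for p in prob_lists]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical softmax outputs of three models for a single image.
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]

avg = average_ensemble(model_probs)
predicted_class = argmax(avg)          # class chosen by averaging
voted_class = majority_vote(model_probs)  # class chosen by voting
```

Averaging uses the confidence of every model, whereas voting only uses each model's top choice, so the two strategies can disagree when one model is very confident.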

1.2 Motivation
• Cancer is a major health issue and, given its recent rapid increase, may become
the number one cause of death in the coming decades.
• Imbalanced class distribution is a general problem for real-world medical data
and particularly for cancer data.
• Incorrect classification of cells as cancerous or noncancerous may lead to serious
health consequences.
• As such, analysis of healthcare and treatment data is crucial for doctors to
predict the class of a cancer at an early stage and to make the right clinical
decisions.

1.3 Objectives
• To rebalance the imbalanced data using resampling methods.
• To classify each image into its corresponding class using deep learning models.
• To evaluate the accuracy of deep learning models.
• To compare the results of classification of imbalanced and balanced data.
• To ensemble the trained models to improve accuracy.
• To predict the class of unseen random pathological images from the testing set.

1.4 Literature Survey

[1] Classifying medical images is a fundamental problem in computer vision,
alongside image segmentation and detection. Many methods are used to classify
images, such as Artificial Neural Networks (ANN), Support Vector Machines (SVM)
and Convolutional Neural Networks (CNN). CNNs have played a vital role in
classification in recent years, owing to the power of deep learning compared with
other types of machine learning. In this paper, the basic concept of the CNN is
summarized. Moreover, various CNN-based models are briefly discussed.

[2] Medical image classification plays an essential role in clinical treatment and
teaching tasks. With traditional methods, much time and effort must be spent on
extracting and selecting classification features. The deep neural network is an
emerging machine learning method that has proven its potential for different
classification tasks. Notably, the convolutional neural network dominates with the
best results on varying image classification tasks. However, medical image datasets
are hard to collect because labeling them requires a lot of professional expertise.
Therefore, this paper researches how to apply convolutional neural network (CNN)
based algorithms to a chest X-ray dataset to classify pneumonia. Three techniques
are evaluated through experiments: a linear support vector machine classifier with
local rotation and orientation-free features, transfer learning on two convolutional
neural network models (Visual Geometry Group VGG16 and InceptionV3), and a
capsule network trained from scratch. Data augmentation is a data preprocessing
method applied to all three methods. The results of the experiments show that data
augmentation is generally an effective way for all three algorithms to improve
performance. Also, transfer learning is a more useful classification method on a
small dataset than a support vector machine with oriented FAST and rotated BRIEF
(ORB) features or a capsule network. In transfer learning, retraining specific
features on a new target dataset is essential to improve performance. The second
important factor is a network complexity that matches the scale of the dataset.

[3] In this paper, a semi-supervised learning-based image classification method is
proposed, which uses a small amount of labelled pathological image data to train the
network model, and then integrates the features extracted by the network to classify
the image. The results show that the classification effect of the neural network is
better than that of convolutional neural networks and other traditional image
classification models. To some extent, it can reduce the dependence of neural
networks on a large number of training samples, and effectively reduce overfitting
of the network. An analysis of the overall classification accuracy and kappa
coefficient of different classification methods under different sample numbers shows
that both increase with the number of training samples. Especially with a small
number of training samples, the classification accuracy of the counter neural
network is about 10% higher than that of other neural networks and traditional
classification methods, and the advantages are more obvious.

1.5 Problem Definition

To classify multiclass imbalanced medical image data using deep neural networks
and ensemble techniques.


2. PROPOSED SYSTEM

2.1 Description of Proposed System

The system classifies breast cancer images into 8 classes, namely Adenosis,
Fibroadenoma, Phyllodes Tumor, Tubular Adenoma, Ductal Carcinoma, Lobular
Carcinoma, Mucinous Carcinoma and Papillary Carcinoma. It reads the images and
assigns a label to each image. To balance the uneven distribution of classes in the
image data, the system combines oversampling and undersampling. SMOTE
(Synthetic Minority Oversampling Technique) is used for oversampling: rather than
replicating minority class samples, it synthesizes new minority instances between
existing minority instances. Tomek links elimination is used for undersampling; it
removes noisy and borderline majority class samples. The deep learning models
CNN, VGG-19 and ResNet-50 are trained on both the imbalanced and the balanced
data to compare the results. Ensemble learning techniques, namely summed average
and weighted average ensembles, are used to improve the accuracy of predictions.
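
The idea of synthesizing a new minority instance between existing ones can be sketched as below. This is a simplified illustration of the SMOTE interpolation step written with the Python standard library only; it is not the library implementation used in the project, and the function name, the toy 2-D points and the choice k=2 are assumptions. For images, each sample would be a flattened pixel vector.

```python
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point on the segment between a random minority
    sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # Other minority samples, sorted by squared Euclidean distance to base.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # position along the segment, in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]


# Toy minority class with four 2-D samples (hypothetical values).
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.5], [3.0, 3.0]]
rng = random.Random(42)
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(4)]
```

Because every synthetic point is a convex combination of two real minority samples, it always lies inside the region already covered by the minority class, which is what distinguishes this approach from plain replication.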

2.2 Description of Target Users

With the help of the system, possible errors made by pathologists and physicians,
such as those caused by inexperience, fatigue or stress, can be reduced, and the
medical data can be examined in a shorter time and in a more detailed manner. The
system can be used in hospitals and clinics for clinical treatment and diagnosis, to
predict the class of breast cancer in a given pathological breast cancer image more
easily and in less time.

2.3 Advantages of Proposed System

• Multi-class classification is implemented in the system.
• It easily predicts the class of a given breast cancer image.
• It combines multiple models to give a more accurate final output.
• Predictions from the system can be used for treatment of the disease.

2.4 Scope
As of now, our proposed system is trained only on histopathological images of 40X
magnification, but in future it can be trained on images of any magnifying factor.


3. SOFTWARE REQUIREMENT SPECIFICATIONS

3.1 Overview of SRS

A software requirements specification (SRS) is a detailed description of a software
system to be developed, covering its functional and non-functional requirements.
The SRS is developed based on the agreement between customer and contractors. It
may include use cases describing how the user is going to interact with the software
system. The SRS document should contain all the requirements necessary for project
development. To develop the software system, we need a clear understanding of it,
which requires continuous communication with customers to gather all
requirements.

3.2 Requirement Specifications

Requirement Specifications are mainly of two types. They are Functional
requirements and Non-Functional Requirements.

3.2.1 Functional Requirements

• The system shall process the input images and label each with its
corresponding class.
• The system shall visualize the distribution of different classes.
• The system shall balance the input data using resampling techniques.
• The system shall classify the images using individual deep learning
models.
• The system shall ensemble deep learning models to improve the
prediction accuracy.

3.2.2 Non-Functional Requirements

3.2.2.1 Performance Requirements
• The system should be able to give more than 85% accuracy for
unseen testing data.
• The system should be able to predict the class of breast cancer
image within 1 second.

3.2.2.2 Usability
• The system should be user friendly to input the image data and
predict its corresponding class.

3.2.3 Use Case Diagram

Figure 1: Use Case Diagram

3.2.4 Use Case Description

Use Case: Balance Dataset
Primary Actor: System
Goal in Context: To resample dataset such that different classes have equal
number of images
Pre-Conditions: Imported dataset should be imbalanced

Scenarios:
• Dataset can be balanced by using oversampling techniques
• Dataset can be balanced by using undersampling techniques
• Dataset can be balanced by combining both oversampling and
undersampling
Exceptions:
• Dataset is already balanced.
• Dataset cannot be balanced since it is very large and has high
dimensions
Frequency of Use: Whenever dataset is imported

3.3 Software and Hardware Requirement Specifications

• 8 GB RAM or higher.
• Minimum hard disk space of 16 GB.
• Windows OS.
• Python with its libraries must be installed.
• GPU is needed to train the model rapidly.
• Intel Core(TM) i5 @ 2.40GHz processor or higher.


4. SYSTEM DESIGN

4.1 Architecture of the System

Architecture of the three deep learning models used in the system:
• CNN
A Convolutional Neural Network (CNN) with 6 convolution layers is defined and
trained on both imbalanced and balanced data. The convolution layers extract the
features of the image. The other layers of the CNN are the input layer, pooling
layers, a fully connected layer and an output layer. Images of dimension 64 x 64
are passed through the input layer. A pooling layer, placed between two
convolution layers, reduces the spatial size of the feature maps produced by
convolution. Flatten converts the pooled feature maps into a vector for the fully
connected layer. Dense connects the fully connected layer to the output layer. The
softmax activation function is used at the output layer to transform the raw
outputs of the neural network into a vector of probabilities. Figure 2 shows the
architecture of the CNN.

Figure 2: Architecture of CNN

• VGG-19
VGG-19 is a convolutional neural network that is 19 layers deep. A pretrained
version of the network, trained on more than a million images from the ImageNet
database, can be loaded. The pretrained network can classify images into 1000
object categories, such as keyboard, mouse, pencil, and many animals. The output
layer of the model can be changed as needed. After loading the base model, we
flatten its output. Dense then connects the flattened layer to an output layer of 8
units, one for each class of breast cancer. The softmax activation function is used
as this is a multi-class classification problem. Figure 3 shows the VGG-19
architecture.

Figure 3: Architecture of VGG-19

• ResNet-50
ResNet-50 is a convolutional neural network that is 50 layers deep. ResNet, short
for Residual Network, is a classic neural network used as a backbone for many
computer vision tasks. We load a pretrained version of the network, trained on
more than a million images from the ImageNet database, which can classify
images into 1000 object categories. We add an input layer that takes images of
dimension 64 x 64 and an output layer of 8 classes with the softmax activation
function, since this is a multi-class classification problem. Figure 4 shows the
ResNet-50 architecture.

Figure 4: Architecture of ResNet-50
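
All three models above end in a softmax output layer over the 8 classes. The following minimal sketch shows how softmax turns raw output scores into a probability vector; the score values are hypothetical and the function is written with the standard library only for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of the last layer for one image (8 classes).
logits = [2.0, 1.0, 0.1, -1.0, 0.5, 0.0, 1.5, -0.5]
probs = softmax(logits)

# The predicted class is the index with the highest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Softmax preserves the ordering of the raw scores, so the class with the largest score always receives the largest probability.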

4.2 Data Set Description
Dataset Source:
• The name of the dataset is BreakHis_v1.
• It has 8 classes, namely Adenosis, Fibroadenoma, Phyllodes Tumor, Tubular
Adenoma, Ductal Carcinoma, Lobular Carcinoma, Mucinous Carcinoma and
Papillary Carcinoma.
• Source of Dataset: Breast Cancer Histopathological Database (BreakHis) -
Laboratório Visão Robótica e Imagem (ufpr.br)
• This database has been built in collaboration with the P&D Laboratory –
Pathological Anatomy and Cytopathology, Parana, Brazil.

Dataset Analysis:
• It is composed of 9,109 microscopic images of breast tumor tissue.
• It is collected from 82 patients using different magnifying factors (40X, 100X,
200X, and 400X).
• Due to hardware limitation, we use images of magnifying factor 40X for
classification which has 1995 images.

Dataset Pre-Processing:
• The dataset is highly imbalanced among the classes.
• For balancing, oversampling using SMOTE and undersampling using Tomek
links are combined.
• SMOTE (Synthetic Minority Oversampling TEchnique) is used for oversampling;
it generates synthetic samples of the minority class.
• Undersampling is done by eliminating Tomek links, which are pairs of samples
of opposite classes that are each other's nearest neighbors.
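
The definition of a Tomek link given above can be sketched as follows. This is an illustrative standard-library version on toy 2-D points, not the implementation used in the project; the function names, the points and the two-label setup ("major"/"minor") are assumptions.

```python
def nearest_neighbour(i, points):
    """Index of the point closest to points[i] (squared Euclidean distance)."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(points[i], points[j])),
    )


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    and carry different class labels."""
    links = []
    for i in range(len(points)):
        j = nearest_neighbour(i, points)
        if i < j and labels[i] != labels[j] and nearest_neighbour(j, points) == i:
            links.append((i, j))
    return links


# Toy data: points 0 and 1 are very close but belong to different classes.
points = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [5.0, 5.0]]
labels = ["major", "minor", "major", "major", "minor"]

links = tomek_links(points, labels)
# Undersampling removes the majority-class member of each link.
to_remove = sorted({i if labels[i] == "major" else j for i, j in links})
```

Removing the majority-class member of each link cleans up the class boundary, which is why Tomek links elimination pairs naturally with SMOTE oversampling.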


5. IMPLEMENTATION

5.1 Proposed Methodology

The system is divided into three main sections. In the first section, we import the
imbalanced medical image dataset and balance it using resampling techniques.
SMOTE, an oversampling technique, and Tomek links elimination, an
undersampling technique, are combined to give the balanced image dataset. In the
next section, we train each individual deep learning model on both the balanced and
the imbalanced data to classify the images. The deep learning models used here are
CNN, VGG-19 and ResNet-50. The accuracy of each individual deep learning model
is calculated from the predictions obtained from the model. In the last section, all
three deep learning models are ensembled. Summed average and weighted average
are the ensembling techniques used; they combine the predictions of all three deep
learning models to improve the overall accuracy of the final output.

Figure 5: Block diagram of the proposed system


6. TESTING

6.1 Acceptance Testing

Test ID: 1
Input Description: Import dataset
Expected Output: Found 1995 files belonging to 8 classes
Actual Output: Found 1995 files belonging to 8 classes
Status: Pass

Test ID: 2
Input Description: Predict class of random image from testing set
Expected Output: Should display the original and predicted class of given input image
Actual Output: It displays the original and predicted class of given input image
Status: Pass

Table 1: Acceptance test plan

6.2 Unit Testing

Test ID: 1
Input Description: Balancing dataset using resampling techniques
Expected Output: Should equalize the distribution of different classes of breast cancer
Actual Output: It equalizes the distribution of different classes of breast cancer
Status: Pass

Test ID: 2
Input Description: Obtain accuracy of CNN Model (Balanced)
Expected Output: Should print the accuracy of CNN Model (Balanced)
Actual Output: It prints the accuracy of CNN Model (Balanced)
Status: Pass

Test ID: 3
Input Description: Obtain accuracy of VGG-19 Model (Balanced)
Expected Output: Should print the accuracy of VGG-19 Model (Balanced)
Actual Output: It prints the accuracy of VGG-19 Model (Balanced)
Status: Pass

Test ID: 4
Input Description: Obtain accuracy of ResNet-50 Model (Balanced)
Expected Output: Should print the accuracy of ResNet-50 Model (Balanced)
Actual Output: It prints the accuracy of ResNet-50 Model (Balanced)
Status: Pass

Table 2: Unit test plan


7. RESULTS AND DISCUSSIONS

7.1 Results
Dataset Visualization:
Figure 6 shows that the image dataset is highly imbalanced.

Figure 6: Distribution of different classes

Resampling:
The dataset is split into training and testing data with 80% training and 20% testing
data. Figure 7 shows distribution of classes before resampling and Figure 8 shows
distribution of classes after resampling.

Figure 7: Distribution of different classes in train-test data before resampling


Figure 8: Distribution of different classes in train-test data after resampling
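
The 80/20 split described above can be sketched as follows. This is an assumed random split over image indices, written with the standard library for illustration; the function name and the seed are hypothetical, and the project may have used a library routine instead.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into training and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# The 40X subset contains 1995 images.
train_idx, test_idx = train_test_split_indices(1995)
print(len(train_idx), len(test_idx))  # prints "1596 399"
```

The 399 test images match the total support shown in the classification reports for the imbalanced data.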

CNN:
In Figure 9 we can compare the training and validation accuracy of imbalanced and
balanced data plotted against epochs. In Figure 10 we can compare the confusion
matrix obtained from imbalanced and balanced data.
Accuracy of CNN on imbalanced data is 90.48% and for balanced data is 93.55%.

Figure 9: Accuracy plot of CNN on imbalanced (left) and balanced (right) data

Figure 10: Prediction comparison using confusion matrix of CNN on imbalanced
(left) and balanced (right) data
CNN Classification Report (Imbalanced):

              precision    recall  f1-score   support

         B_A       1.00      0.77      0.87        22
         B_F       0.90      0.92      0.91        48
         B_P       0.88      0.94      0.91        16
         B_T       0.96      0.90      0.93        29
         M_D       0.91      0.95      0.93       171
         M_L       0.83      0.91      0.87        32
         M_M       0.95      0.85      0.90        47
         M_P       0.85      0.82      0.84        34

    accuracy                           0.90       399
   macro avg       0.91      0.88      0.89       399
weighted avg       0.91      0.90      0.90       399

CNN Classification Report (Balanced):

              precision    recall  f1-score   support

         B_A       0.94      0.98      0.96       171
         B_F       0.97      0.94      0.96       171
         B_P       0.95      1.00      0.97       171
         B_T       0.97      0.91      0.94       171
         M_D       0.81      0.96      0.88       169
         M_L       0.97      0.92      0.95       169
         M_M       0.95      0.95      0.95       171
         M_P       0.95      0.82      0.88       171

    accuracy                           0.94      1364
   macro avg       0.94      0.94      0.94      1364
weighted avg       0.94      0.94      0.94      1364
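
The summary rows of these reports can be reproduced from the per-class values. The sketch below uses the per-class recall and support from the imbalanced CNN report above to show how the macro average (every class counts equally) differs from the weighted average (classes count in proportion to their support), and how the F1-score combines precision and recall. The helper names are ours, introduced only for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean over classes, weighted by the number of samples per class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# Per-class recall and support from the imbalanced CNN report (B_A ... M_P).
recall = [0.77, 0.92, 0.94, 0.90, 0.95, 0.91, 0.85, 0.82]
support = [22, 48, 16, 29, 171, 32, 47, 34]

macro = macro_average(recall)
weighted = weighted_average(recall, support)
f1_b_a = f1_score(1.00, 0.77)  # precision and recall of class B_A

print(round(macro, 2), round(f1_b_a, 2))  # prints "0.88 0.87"
```

On imbalanced data the weighted average is pulled towards the largest class (M_D, with 171 of 399 samples), which is why the macro average is the more informative summary for minority classes.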

VGG-19:
In Figure 11 we can compare the training and validation accuracy of imbalanced
and balanced data plotted against epochs. In Figure 12 we can compare the
confusion matrix obtained from imbalanced and balanced data.
Accuracy of VGG-19 on imbalanced data is 51.38% and for balanced data is
45.68%.


Figure 11: Accuracy plot of VGG-19 on imbalanced (left) and balanced (right) data

Figure 12: Prediction comparison using confusion matrix of VGG-19 on
imbalanced (left) and balanced (right) data

VGG-19 Classification Report (Imbalanced):

              precision    recall  f1-score   support

         B_A       0.15      0.50      0.24        22
         B_F       0.51      0.40      0.45        48
         B_P       0.33      0.31      0.32        16
         B_T       0.62      0.45      0.52        29
         M_D       0.65      0.73      0.69       171
         M_L       0.59      0.31      0.41        32
         M_M       0.45      0.19      0.27        47
         M_P       0.48      0.38      0.43        34

    accuracy                           0.51       399
   macro avg       0.47      0.41      0.41       399
weighted avg       0.55      0.51      0.51       399

VGG-19 Classification Report (Balanced):

              precision    recall  f1-score   support

         B_A       0.30      0.65      0.41       171
         B_F       0.54      0.33      0.41       171
         B_P       0.71      0.35      0.46       171
         B_T       0.84      0.54      0.65       171
         M_D       0.34      0.62      0.44       170
         M_L       0.70      0.51      0.59       170
         M_M       0.31      0.16      0.21       171
         M_P       0.51      0.50      0.50       171

    accuracy                           0.46      1366
   macro avg       0.53      0.46      0.46      1366
weighted avg       0.53      0.46      0.46      1366

ResNet-50:
In Figure 13 we can compare the training and validation accuracy of imbalanced
and balanced data plotted against epochs. In Figure 14 we can compare the
confusion matrix obtained from imbalanced and balanced data.
Accuracy of ResNet-50 on imbalanced data is 31.08% and for balanced data is
25.11%.

Figure 13: Accuracy plot of ResNet-50 on imbalanced (left) and balanced (right)
data


Figure 14: Prediction comparison using confusion matrix of ResNet-50 on
imbalanced (left) and balanced (right) data

ResNet-50 Classification Report (Imbalanced):

              precision    recall  f1-score   support

         B_A       0.08      0.68      0.15        22
         B_F       0.27      0.12      0.17        48
         B_P       0.22      0.12      0.16        16
         B_T       0.25      0.14      0.18        29
         M_D       0.67      0.46      0.54       171
         M_L       0.31      0.28      0.30        32
         M_M       0.29      0.04      0.07        47
         M_P       0.36      0.24      0.29        34

    accuracy                           0.31       399
   macro avg       0.31      0.26      0.23       399
weighted avg       0.44      0.31      0.34       399

ResNet-50 Classification Report (Balanced):

              precision    recall  f1-score   support

         B_A       0.16      0.80      0.26       171
         B_F       0.26      0.06      0.10       171
         B_P       0.21      0.06      0.09       171
         B_T       0.30      0.12      0.17       171
         M_D       0.52      0.25      0.34       170
         M_L       0.63      0.46      0.53       170
         M_M       0.24      0.04      0.07       171
         M_P       0.40      0.22      0.28       171

    accuracy                           0.25      1366
   macro avg       0.34      0.25      0.23      1366
weighted avg       0.34      0.25      0.23      1366

Ensemble Model:
The weighted ensemble model gives the same accuracy as the CNN model because the
CNN has been assigned the highest weight. Therefore, the final accuracy of the
model is 93.55%.
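
A weighted average ensemble of this kind can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function name weighted_average_ensemble, the weights (0.6, 0.2, 0.2) and the toy probabilities for two images and three classes are all hypothetical, since the report does not list the actual weights and the real task has eight classes.

```python
import numpy as np


def weighted_average_ensemble(prob_list, weights):
    """Average per-model class probabilities using one weight per model.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    weights:   one non-negative weight per model (normalised internally).
    Returns the combined probabilities and the predicted class indices.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()              # make the weights sum to 1
    stacked = np.stack(prob_list, axis=0)          # (n_models, n_samples, n_classes)
    combined = np.tensordot(weights, stacked, axes=1)  # (n_samples, n_classes)
    return combined, combined.argmax(axis=1)


# Toy softmax outputs (hypothetical values) for 2 images and 3 classes.
cnn_probs = np.array([[0.90, 0.05, 0.05], [0.10, 0.80, 0.10]])
vgg_probs = np.array([[0.20, 0.70, 0.10], [0.30, 0.30, 0.40]])
resnet_probs = np.array([[0.30, 0.30, 0.40], [0.20, 0.20, 0.60]])

# Assumed weights, with the CNN given the largest share.
combined, labels = weighted_average_ensemble(
    [cnn_probs, vgg_probs, resnet_probs], weights=[0.6, 0.2, 0.2]
)
print(combined)
print(labels)
```

In this toy case the ensemble picks the same class as the CNN for both images, which illustrates how a dominant weight can make the ensemble follow the highest-weighted model.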

7.2 Discussions
We compared each deep learning model on both the imbalanced and the balanced
data and found that accuracy improved after balancing the dataset.


8. CONCLUSION AND FUTURE SCOPE

8.1 Conclusion
We conclude that the CNN gives the best accuracy, 93.55%, compared with VGG-19
and ResNet-50. The ensemble model, which combines the predictions of all three
deep learning models, gives the same accuracy because the CNN has been assigned
the highest weight in the weighted average ensemble.

8.2 Future Scope

In future work, more deep learning models could be trained, and the system could
be extended to classify images at different magnification factors.



10. APPENDIX

A Gantt Chart

B Description of Tools and Technology Used

Python:
Python is a high-level, general-purpose programming language. Its design philosophy
emphasizes code readability with the use of significant indentation. Python is
dynamically-typed and garbage-collected. It supports multiple programming
paradigms, including structured, object-oriented and functional programming.

TensorFlow:
TensorFlow is a free and open-source software library for machine learning and
artificial intelligence. It can be used across a range of tasks but has a particular focus
on training and inference of deep neural networks.

Keras:
Keras is a free, open-source, high-level deep learning API developed by Google
for implementing neural networks. It is written in Python and is designed to make
the implementation of neural networks easy. It also supports multiple backends
for neural network computation.

