Project Report ML

Machine Learning project Report

Uploaded by banaroy57

Ovarian Cancer Detection from Histopathology Images

SUBMITTED BY

Session: 2024-25

A Project Report Submitted to Cotton University for the degree of Master of Computer
Applications, for the subject Machine Learning

GUIDED BY

SAGARIKA SENGUPTA

DEPARTMENT OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY


COTTON UNIVERSITY, ASSAM

ABSTRACT
Ovarian cancer is among the leading causes of cancer-related deaths among women
worldwide. Accurate and timely diagnosis through histopathological image analysis plays a crucial
role in improving survival rates. This project applies deep learning to automate the detection and
subtype classification of ovarian cancer from histopathological images. We evaluate the performance
of four state-of-the-art convolutional neural networks (CNNs) – VGG-19, DenseNet-121, ResNet-152,
and Inception V3 – on a publicly available Kaggle dataset covering four ovarian cancer subtypes.
Using transfer learning from ImageNet weights followed by fine-tuning, the best model, DenseNet-121,
reached 95% accuracy, followed by VGG-19 (85%), Inception V3 (80%), and ResNet-152 (80%). These
results highlight the potential of such models for clinical adoption. Future work will focus on
enhancing explainability and scaling the solution to larger datasets.

TABLE OF CONTENTS

1. Introduction
2. Contributions of the work
3. Literature Review
3.1 Overview of Related Work
3.2 Key Research Gaps
3.2.1 Lack of studies
3.2.2 Limited Dataset
3.2.3 Few Comparative Analysis of Different CNN Architectures
3.3 How This Work Addresses Gaps
4. Dataset Description
5. Proposed Methodology
5.1 Pipeline Overview
5.1.1 Data Preprocessing
5.1.2 Model Training
5.1.3 Evaluation
5.2 Model Architecture
5.2.1 VGG-19
5.2.2 ResNet-152
5.2.3 Inception V3
5.2.4 DenseNet-121
5.3 Training Details
6. Result and Description
6.1 Accuracy Loss Graph
6.2 Confusion Matrices
6.3 Performance Metrics
7. Conclusion
8. References

1. INTRODUCTION

Cancer is a leading cause of death worldwide, and ovarian cancer is one of the most lethal
gynaecologic malignancies. Globally, approximately 225,000 cases of ovarian cancer are diagnosed
annually, with around 140,000 deaths. Ovarian cancer is particularly lethal because it is hard to
detect in its early stages, when it often presents only nonspecific symptoms; as a result, it is
frequently diagnosed late, when it is far more difficult to treat. Accurately identifying the stage and
type of the tumour is therefore essential for further diagnosis and treatment planning.

Histopathology is the microscopic examination of biopsy or other tissue samples. Understanding
microscopic structures and their activity at the cellular, subcellular, tissue, and organ levels is
necessary to understand how the disease progresses. Despite recent advances, automating the
detection of ovarian cancer in histopathological images remains challenging because of the
complexity of tumour morphology and the lack of large, annotated datasets.

The objective of this project is to design and evaluate a robust machine learning pipeline
capable of accurately classifying histopathological images of ovarian cancer using deep learning
architectures.

This study applies transfer learning with state-of-the-art convolutional neural networks (CNNs)
to achieve reliable and effective classification. The work places strong emphasis on accuracy,
robustness, and clinical applicability, which are critical for real-world deployment. By tackling these
issues, it aims to contribute to more dependable and widely applicable solutions in the field.

2. CONTRIBUTIONS OF THE WORK

• Implementation of four advanced CNN architectures – VGG-19, DenseNet-121, ResNet-152,
and Inception V3 – for classifying four ovarian cancer subtypes.
• Use of transfer learning and fine-tuning to improve performance on a limited dataset.
• Use of data augmentation to increase the variation of images in the limited dataset.
• Comprehensive comparative analysis of model performance based on accuracy and loss.
• Development of preprocessing and augmentation pipelines to enhance model robustness.
• Visualization of model performance using accuracy and loss graphs, confusion matrices, etc.

3. LITERATURE REVIEW

3.1 Overview of Related Work

Recent studies have highlighted the efficacy of deep learning in medical image
analysis. CNNs, in particular, have shown remarkable success in tasks like tumour
classification and segmentation. Architectures like VGG and ResNet have been widely
adopted for histopathological image analysis. However, most existing studies focus on larger
datasets and overlook ovarian cancer-specific challenges.

3.2 Key Research Gaps

3.2.1 Lack of studies:


Despite the growing interest in applying artificial intelligence and deep learning to
medical diagnostics, there remains a notable gap in research focused on ovarian cancer
detection. Most studies in this domain tend to prioritize more common cancers, such as
breast, lung, or skin cancer, leaving ovarian cancer underrepresented. This lack of
attention is concerning given the high mortality rate associated with ovarian cancer,
which is often due to late-stage diagnosis.

3.2.2 Limited Dataset:


One of the key challenges in leveraging deep learning for ovarian cancer detection is
the scarcity of high-quality, annotated datasets. Training robust and accurate deep
learning models, particularly convolutional neural networks, requires a large volume of
labeled data to ensure the model can effectively learn patterns and generalize to unseen
cases. However, datasets for ovarian cancer, especially those that include
histopathological images, are limited in size and availability.

3.2.3 Few Comparative Analysis of Different CNN Architectures:


While convolutional neural networks (CNNs) have shown promise in histopathological
image analysis, there is a lack of comprehensive studies comparing different CNN
architectures for ovarian cancer detection specifically. Such analyses are crucial for
identifying the most effective models for this application, as different architectures
may vary in their ability to handle the unique challenges posed by ovarian cancer
histopathology, such as subtle morphological variations. Without these comparative
studies, it becomes difficult to determine which models are best suited for this critical
task, limiting progress in the field.

3.3 How This Work Addresses Gaps

This project addresses these gaps by exploring multiple CNN architectures tailored to ovarian
cancer detection, employing transfer learning and data augmentation techniques to mitigate
dataset limitations, and conducting a detailed comparative analysis.

4. DATASET DESCRIPTION
The "Histopathology Images of Ovarian Cancer Detection" dataset, available on Kaggle,
provides histopathological images for classifying ovarian cancer subtypes. The dataset is intended
for research in medical imaging and machine learning, focusing on the classification of ovarian
cancer from histopathology slides. It can be used to train models that distinguish ovarian cancer
subtypes, in support of the broader goal of improving diagnostic accuracy and treatment planning
for ovarian cancer. Ethical considerations were observed in using the dataset.

The dataset contains histopathology images collected from various ovarian cancer cases.
These images are taken from stained tissue samples and are categorized into 4 subtypes of ovarian
cancer. This type of dataset is crucial for training and testing machine learning algorithms designed to
recognize patterns in histological data for the automatic diagnosis of cancerous tissues.

Structure
• Total Images: 800.
• Classes: Clear_Cell, Endometri, Mucinous, Serous.
• Image Resolution: Varied; resized to 224x224 or 299x299 depending on the model.
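Each network expects a fixed input size, so every image must be resized before training. The sketch below is a minimal illustration, not the project's code: `INPUT_SIZES` records the default input sizes of the Keras ImageNet models (224x224 for VGG-19, ResNet-152, and DenseNet-121; 299x299 for Inception V3), and the hypothetical helper `resize_nearest` performs a nearest-neighbour resize on an image stored as a list of pixel rows. In practice the image loader handles this step, for example through the `target_size` argument of Keras's `flow_from_directory`.

```python
# Default input sizes (height, width) of the Keras ImageNet models used here.
INPUT_SIZES = {
    "VGG-19": (224, 224),
    "ResNet-152": (224, 224),
    "DenseNet-121": (224, 224),
    "Inception V3": (299, 299),
}


def resize_nearest(image, out_h, out_w):
    """Resize an image given as a list of rows (each row a list of pixels)
    by nearest-neighbour sampling. Hypothetical helper for illustration."""
    in_h, in_w = len(image), len(image[0])
    resized = []
    for r in range(out_h):
        # Map each output row back to a source row.
        src_r = min(in_h - 1, r * in_h // out_h)
        row = []
        for c in range(out_w):
            # Map each output column back to a source column.
            src_c = min(in_w - 1, c * in_w // out_w)
            row.append(image[src_r][src_c])
        resized.append(row)
    return resized


if __name__ == "__main__":
    # A tiny 4x6 "image" of RGB tuples stands in for a real slide tile.
    tile = [[(r, c, 0) for c in range(6)] for r in range(4)]
    h, w = INPUT_SIZES["Inception V3"]
    out = resize_nearest(tile, h, w)
    print(len(out), len(out[0]))
```

Nearest-neighbour sampling keeps the sketch short; image libraries usually default to smoother interpolation such as bilinear.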
Key Features of the Dataset
1. Images:
• The dataset consists of histopathology images (mainly in .jpg and .png format).
• Each image corresponds to a particular biopsy sample.
• The resolution and size of images may vary, though they are typically high-quality
images suitable for detailed analysis.
2. Cancer Subtypes:
• The dataset includes images categorized according to different ovarian cancer subtypes.
These subtypes are typically classified based on tissue characteristics and include:
◦ Clear Cell
◦ Endometri
◦ Mucinous
◦ Serous
3. Imbalance in Classes:
• As in many medical datasets, the number of images is imbalanced across subtypes, which is
a common challenge in classification tasks. Special attention was given during data
preprocessing to handling this class imbalance.
4. Image Characteristics:
• Resolution and Size: The images may vary in resolution but typically show detailed tissue
structure, so deep learning models must extract features from high-dimensional data.
• Color Channels: The images are in color because of the staining techniques used in
histopathology, which may call for special preprocessing (e.g., normalization of the color
channels) before training.

5. Preprocessing Considerations:
1. Normalization: Histopathology images typically need normalization or
standardization before they are suitable for training deep learning models.

2. Augmentation: Given the limited size of the dataset, augmentation techniques
(e.g., rotation, flipping, scaling) may be necessary to artificially increase the amount of
training data and improve model generalization.

3. Resizing: Many machine learning models, including the CNNs used here, require a
consistent input size, so images are resized to a fixed dimension (e.g., 224x224 pixels).

4. Splitting the Dataset: The dataset needs to be split into training, validation, and
test sets to evaluate model performance effectively.
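The splitting and class-imbalance points above can be sketched in plain Python. The helpers `stratified_split` and `balanced_class_weights` are hypothetical, and the 70/15/15 split fractions and per-class counts in the example are made up for illustration; they are not the dataset's actual figures. Stratifying keeps each subtype's share roughly equal across the training, validation, and test sets, and inverse-frequency weights make errors on rare subtypes cost more during training.

```python
import random
from collections import Counter, defaultdict


def stratified_split(samples, fractions=(0.7, 0.15, 0.15), seed=42):
    """Split (path, label) pairs into train/val/test lists so that each
    class keeps roughly the same proportion in every subset."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val, test = [], [], []
    for items in by_class.values():
        rng.shuffle(items)
        n_train = round(len(items) * fractions[0])
        n_val = round(len(items) * fractions[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * n_in_class).
    Rare classes get weights above 1, common classes below 1."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


if __name__ == "__main__":
    # Made-up per-class counts, for illustration only.
    counts = {"Clear_Cell": 40, "Endometri": 120, "Mucinous": 60, "Serous": 100}
    samples = [(f"{cls}/{i}.jpg", cls) for cls, n in counts.items() for i in range(n)]
    train, val, test = stratified_split(samples)
    print(len(train), len(val), len(test))
    print(balanced_class_weights([label for _, label in train]))
```

Keras's `model.fit` accepts a dictionary like this through its `class_weight` argument, keyed by integer class index rather than by class name.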

5. PROPOSED METHODOLOGY
The aim of the project is to detect ovarian cancer in histopathological tissue images and classify it into one of four subtypes.
5.1 Pipeline Overview
5.1.1 Data Preprocessing
• The project code was developed and run on Google Colab; the packages required for the
study were installed at the start of the notebook.
• The Keras ‘ImageDataGenerator’ class was used for normalization and for augmentation of
the training images. It was also used to split the data into training and validation subsets.
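As a rough illustration of what this preprocessing does, the sketch below implements rescaling and simple augmentations in plain Python on an image stored as rows of RGB tuples. It assumes the generator was configured with options such as `rescale=1./255`, `horizontal_flip=True`, and `vertical_flip=True`; the report does not list the exact settings. Unlike Keras's `rotation_range`, which samples arbitrary angles and fills the exposed corners, this sketch uses only 90-degree rotations, which lose no pixels. In this sketch, augmentation is meant for training images only; validation and test images would only be rescaled.

```python
import random


def rescale(image, factor=1.0 / 255):
    """Map 8-bit pixel values into [0, 1], as rescale=1./255 does in Keras."""
    return [[tuple(v * factor for v in px) for px in row] for row in image]


def hflip(image):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def vflip(image):
    """Mirror the image top to bottom."""
    return [list(row) for row in reversed(image)]


def rot90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image, rng):
    """Apply a random combination of flips and 90-degree rotations.
    Tissue tiles have no canonical orientation, so these transforms
    produce plausible new training samples."""
    if rng.random() < 0.5:
        image = hflip(image)
    if rng.random() < 0.5:
        image = vflip(image)
    for _ in range(rng.randrange(4)):  # 0, 90, 180 or 270 degrees
        image = rot90(image)
    return image


if __name__ == "__main__":
    rng = random.Random(0)
    tile = [[(10 * r, 10 * c, 255) for c in range(3)] for r in range(3)]
    print(augment(rescale(tile), rng))
```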
5.1.2 Model Training
• VGG-19, Inception V3, DenseNet-121, and ResNet-152 were trained on the dataset.
• Dense and Dropout layers were added on top of each pretrained network as the
classification head; dropout helps reduce overfitting of the model.
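The report does not specify the exact head, so the sketch below assumes a simple one: Dropout followed by a Dense layer with a softmax over the four subtypes, written in plain Python to show what each layer computes. In Keras these correspond to `tf.keras.layers.Dropout` and `tf.keras.layers.Dense(4, activation="softmax")`. Keras's dropout is the inverted form shown here: units are dropped only during training, and the survivors are scaled up so that inference needs no rescaling. The function names are illustrative.

```python
import math
import random


def dropout(values, rate, training, rng):
    """Inverted dropout: during training, zero each unit with probability
    `rate` and scale survivors by 1 / (1 - rate) so the expected activation
    is unchanged; at inference time, pass values through untouched."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def dense(values, weights, biases):
    """Fully connected layer: out[j] = sum_i values[i] * weights[i][j] + biases[j]."""
    return [
        sum(v * w_row[j] for v, w_row in zip(values, weights)) + b
        for j, b in enumerate(biases)
    ]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def head(features, weights, biases, rate=0.5, training=False, rng=None):
    """Hypothetical classifier head: Dropout -> Dense -> softmax over 4 subtypes."""
    rng = rng or random.Random(0)
    x = dropout(features, rate, training, rng)
    return softmax(dense(x, weights, biases))


if __name__ == "__main__":
    feats = [0.2, 1.5, -0.3]                 # stand-in for pooled CNN features
    w = [[0.1, -0.2, 0.3, 0.0]] * 3          # 3 inputs x 4 classes (toy values)
    print(head(feats, w, [0.0, 0.0, 0.0, 0.0]))
```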
5.1.3 Evaluation
• Models were assessed using accuracy.
• Accuracy and loss graphs and confusion matrices were prepared to visualize the
performance of each model.
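The accuracy and confusion matrices can be computed as in the following minimal sketch (plain Python; scikit-learn's `confusion_matrix` and `accuracy_score` in `sklearn.metrics` give the same results). The labels in the example are toy values, not project results. Rows index the true subtype and columns the predicted subtype, so correct predictions lie on the diagonal.

```python
CLASSES = ["Clear_Cell", "Endometri", "Mucinous", "Serous"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Count predictions: cm[r][c] = images of true class r predicted as c."""
    index = {c: i for i, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def accuracy(cm):
    """Fraction of correct predictions, i.e. diagonal total over grand total."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(map(sum, cm))
    return correct / total


if __name__ == "__main__":
    y_true = ["Serous", "Serous", "Mucinous", "Clear_Cell"]    # toy labels
    y_pred = ["Serous", "Mucinous", "Mucinous", "Clear_Cell"]
    cm = confusion_matrix(y_true, y_pred)
    for name, row in zip(CLASSES, cm):
        print(f"{name:<11}", row)
    print("accuracy:", accuracy(cm))
```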
5.2 Model Architecture
5.2.1 VGG-19
VGG-19 is a deep convolutional neural network known for its simplicity and
effectiveness in image classification tasks. Despite being relatively simple compared
to more advanced models (like ResNet or DenseNet), VGG-19 still performs
remarkably well in many image classification challenges, including medical image
classification.
Key Features of VGG-19
• Simple Architecture: VGG-19 has 19 weight layers: 16 convolutional layers followed by
3 fully connected layers (its 5 max-pooling layers have no weights and are not counted). This
simplicity is its strength in some applications.

• Deep Layers: VGG-19's deep architecture allows it to learn complex hierarchical features
from images.

• Pretrained Weights: The model comes with weights pretrained on ImageNet, enabling
transfer learning to a specific dataset. Fine-tuning on a medical dataset, such as
histopathology images, can significantly improve performance.
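The layer count above can be checked from the VGG-19 configuration. In this sketch, the list `VGG19_CONV` encodes the 3x3 convolution widths of VGG-19 with `"M"` marking max-pooling, following the original VGG design; the helper names are illustrative. It confirms 16 convolutional plus 3 fully connected weight layers and computes the size of the convolutional base that is reused for transfer learning.

```python
# VGG-19: numbers are output channels of 3x3 convolutions, "M" is 2x2 max-pooling.
VGG19_CONV = [64, 64, "M", 128, 128, "M",
              256, 256, 256, 256, "M",
              512, 512, 512, 512, "M",
              512, 512, 512, 512, "M"]
VGG19_FC = [4096, 4096, 1000]  # fully connected layers of the ImageNet classifier


def count_layers(conv_cfg, fc_cfg):
    """Return (conv layers, pooling layers, weight layers)."""
    n_conv = sum(1 for v in conv_cfg if v != "M")
    n_pool = sum(1 for v in conv_cfg if v == "M")
    return n_conv, n_pool, n_conv + len(fc_cfg)


def conv_params(conv_cfg, in_channels=3, k=3):
    """Weights (k*k*c_in*c_out) plus biases (c_out) of the conv stack."""
    total, c_in = 0, in_channels
    for v in conv_cfg:
        if v == "M":
            continue  # pooling has no parameters
        total += k * k * c_in * v + v
        c_in = v
    return total


if __name__ == "__main__":
    print(count_layers(VGG19_CONV, VGG19_FC), conv_params(VGG19_CONV))
    # (16, 5, 19) 20024384
```

The roughly 20 million parameters of this convolutional base are what the ImageNet weights supply; a new classification head replaces the three fully connected layers.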

5.2.2 ResNet-152

ResNet-152 (Residual Networks with 152 layers) is one of the most powerful
deep learning architectures for image classification, specifically designed to tackle the
vanishing gradient problem in very deep networks. It has achieved remarkable
performance in image recognition tasks, including medical image analysis.
Key Features of ResNet-152
• Depth: ResNet-152 consists of 152 layers, which allows it to learn a more complex set of
features compared to shallower models.

• Residual Blocks: It uses residual connections, which help to mitigate the vanishing
gradient problem and enable better training of very deep networks.

• Pre-trained Weights: ResNet-152 is often used with pre-trained weights from large-scale
image datasets (such as ImageNet), which can significantly boost performance on small or
specialized datasets like histopathology images.
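The residual idea can be shown in a few lines. This is a toy sketch: `transform` stands in for the convolution and batch-normalization layers of a real block (ResNet-152 uses "bottleneck" blocks of 1x1, 3x3, and 1x1 convolutions), and `chain_gradient` models a stack of scalar linear layers only to illustrate why the shortcut keeps gradients from vanishing. Both function names are hypothetical.

```python
def residual_block(x, transform):
    """Compute y = F(x) + x for a vector x (a list of floats).

    `transform` plays the role of F, the stacked layers inside a ResNet
    block; it must return a vector of the same length so the identity
    shortcut can be added element-wise."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same shape as x")
    return [f + xi for f, xi in zip(fx, x)]


def chain_gradient(w, depth, residual):
    """Toy model of gradient flow through `depth` stacked scalar layers.

    A plain layer f(x) = w * x has derivative w, so the gradient after
    `depth` layers is w ** depth. A residual layer y = w * x + x has
    derivative (1 + w), so the shortcut stops the gradient from shrinking
    towards zero when w is small."""
    per_layer = (1.0 + w) if residual else w
    return per_layer ** depth


if __name__ == "__main__":
    zero_branch = lambda v: [0.0] * len(v)
    # If F learns nothing, the block is an identity mapping.
    print(residual_block([1.0, -2.0, 3.0], zero_branch))
    print(chain_gradient(0.1, 50, residual=False))  # vanishingly small
    print(chain_gradient(0.1, 50, residual=True))
```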

5.2.3 Inception V3
InceptionV3 is a highly efficient deep learning model for image
classification, built to tackle the challenges of both accuracy and computational
efficiency. It is particularly well-suited for medical image classification tasks,
including histopathology image analysis, as it incorporates advanced techniques to
enhance feature learning.
Key Features of Inception V3
• Efficient Architecture: InceptionV3 is built from "Inception" modules, which apply
convolutions of several filter sizes (and pooling) to the same input in parallel and concatenate
their outputs, making it efficient at learning complex image features.

• Auxiliary Classifiers: An auxiliary classifier attached to an intermediate layer adds an
extra loss term during training, injecting gradient deeper into the network; in Inception V3 it
acts mainly as a regularizer.

• Pre-trained Weights: Like ResNet-152, InceptionV3 can be used with pre-trained weights
(e.g., trained on ImageNet), which helps the model generalize better on specialized tasks like
cancer detection.
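A toy, one-dimensional version of an Inception module is sketched below; it is illustrative only. Real Inception V3 modules operate on 2-D feature maps and use 1x1, 3x3, and factorized (for example 1x7 and 7x1) convolutions, but the core idea is the same: several branches process the same input in parallel, and their outputs are concatenated along the channel axis. The helper names are hypothetical.

```python
def conv1d_same(signal, kernel):
    """1-D convolution (cross-correlation, as in CNN layers) with zero
    'same' padding, so the output length equals the input length.
    Assumes an odd kernel length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def max_pool_same(signal, size=3):
    """Max-pooling with stride 1 and a window clipped at the borders."""
    half = size // 2
    n = len(signal)
    return [max(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def inception_module(signal, kernels):
    """Run one convolution branch per kernel plus a pooling branch on the
    same input, then stack the outputs: one list per output channel
    (the "filter concatenation" step)."""
    branches = [conv1d_same(signal, k) for k in kernels]
    branches.append(max_pool_same(signal))
    return branches


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    kernels = [[1.0], [0.25, 0.5, 0.25], [0.1, 0.2, 0.4, 0.2, 0.1]]  # sizes 1, 3, 5
    for channel in inception_module(x, kernels):
        print(channel)
```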

5.2.4 DenseNet-121
DenseNet-121 is a highly efficient convolutional neural network architecture in which,
within each dense block, every layer is connected to every subsequent layer in a feed-forward
fashion, so that each layer receives the feature maps of all preceding layers as input. This dense
connectivity helps alleviate the vanishing gradient problem, enhances feature reuse, and makes
the model more parameter-efficient.
Key Features of DenseNet-121
• Dense Blocks: Each layer in a dense block receives input from all previous layers,
improving feature reuse and network performance.

• Parameter Efficiency: DenseNet-121 requires fewer parameters than other deep networks
like ResNet or Inception. Because each layer has direct access to all the feature maps that came
before it, each layer can be narrow, adding only a small number of new feature maps (the growth
rate, 32 in DenseNet-121).

• Pretrained Weights: Like other models (e.g., ResNet and Inception), DenseNet-121 can be
initialized with weights pretrained on ImageNet, which can significantly improve the model's
performance on the target dataset.
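Dense connectivity makes the number of feature maps grow linearly inside each block. The sketch below traces this for DenseNet-121, assuming the standard configuration from the DenseNet paper: a 64-channel stem, four dense blocks of 6, 12, 24, and 16 layers, growth rate 32, and transition layers that halve the channel count. The function names are illustrative.

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Each layer in a dense block sees the concatenation of all earlier
    feature maps and adds `growth_rate` new ones, so the channel count
    grows linearly: in_channels + num_layers * growth_rate."""
    channels = in_channels
    for _ in range(num_layers):
        channels += growth_rate  # concatenate this layer's new maps
    return channels


def densenet_channels(blocks=(6, 12, 24, 16), growth_rate=32, stem=64, compression=0.5):
    """Channel count at the end of each dense block of DenseNet-121;
    transition layers between blocks multiply the channels by `compression`."""
    c = stem
    trace = []
    for i, n in enumerate(blocks):
        c = dense_block_channels(c, n, growth_rate)
        trace.append(c)
        if i < len(blocks) - 1:
            c = int(c * compression)  # transition layer
    return trace


if __name__ == "__main__":
    print(densenet_channels())  # [256, 512, 1024, 1024]
```

The final 1024 channels are the feature vector that global average pooling passes to the classification head.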

5.3 Training Details

• Pretrained Weights: Initialized with ImageNet weights.

• Optimizer: Adam with learning rate scheduling.

• Batch Size: Varied by model (e.g., 20 or 24).

• Epochs: 20 or 25.

• Framework: TensorFlow.
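The report does not state which learning-rate schedule was used, so the sketch below assumes a hypothetical step decay and pairs it with a single-parameter version of the Adam update to show what the optimizer computes. The update follows the bias-corrected Adam rule, and the defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-7) match those of Keras's `Adam`; the initial rate, decay factor, and interval are placeholders. A schedule with the `(epoch, lr)` signature can be passed to `tf.keras.callbacks.LearningRateScheduler`.

```python
import math


def step_decay(initial_lr=1e-3, drop=0.5, every=10):
    """Return a schedule(epoch, lr) that multiplies the rate by `drop`
    every `every` epochs. Hypothetical values; the signature matches what
    Keras's LearningRateScheduler passes (the current lr is ignored)."""
    def schedule(epoch, lr=None):
        return initial_lr * drop ** (epoch // every)
    return schedule


def adam_step(theta, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar parameter. `state` holds (m, v, t)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, (m, v, t)


if __name__ == "__main__":
    # Minimize f(x) = x**2 (gradient 2x) for 25 "epochs" of one step each.
    schedule = step_decay(initial_lr=0.1, every=10)
    x, state = 5.0, (0.0, 0.0, 0)
    for epoch in range(25):
        x, state = adam_step(x, 2 * x, state, schedule(epoch))
    print(round(x, 4))
```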

6. RESULT AND DESCRIPTION

6.1 Accuracy Loss Graph

[Figure: Accuracy and loss graph, VGG-19]

[Figure: Accuracy and loss graph, DenseNet-121]

[Figure: Accuracy and loss graph, Inception V3]

[Figure: Accuracy and loss graph, ResNet-152]

6.2 Confusion Matrices

[Figure: Confusion matrix, VGG-19]

[Figure: Confusion matrix, DenseNet-121]

[Figure: Confusion matrix, Inception V3]

[Figure: Confusion matrix, ResNet-152]

6.3 Performance Metrics

Model           Accuracy   Loss
VGG-19          85.00%     0.2130
DenseNet-121    95.00%     0.2152
Inception V3    80.00%     0.545
ResNet-152      80.00%     0.578
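The table reports accuracy only. If per-class sensitivity (recall) and specificity are wanted, they can be derived one-vs-rest from each model's confusion matrix, as in this sketch. The confusion matrix in the example is invented purely to exercise the code and is not one of the project's results; the function name is illustrative.

```python
def one_vs_rest_rates(cm):
    """Return (sensitivity, specificity) for each class.

    Class i is treated as 'positive' and all other classes as 'negative';
    cm[r][c] counts images of true class r predicted as class c."""
    n = len(cm)
    total = sum(map(sum, cm))
    rates = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                         # class i predicted as another class
        fp = sum(cm[r][i] for r in range(n)) - tp    # other classes predicted as i
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        rates.append((sensitivity, specificity))
    return rates


if __name__ == "__main__":
    # Invented 4-class confusion matrix, for illustration only.
    cm = [[5, 0, 0, 0],
          [0, 4, 1, 0],
          [0, 0, 5, 0],
          [0, 1, 0, 4]]
    for name, (sens, spec) in zip(["Clear_Cell", "Endometri", "Mucinous", "Serous"],
                                  one_vs_rest_rates(cm)):
        print(f"{name:<11} sensitivity={sens:.2f} specificity={spec:.2f}")
```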

7. CONCLUSION
This project demonstrates the effectiveness of deep learning in automating ovarian cancer
detection from histopathological images. Four deep learning architectures were used: ResNet-152,
Inception V3, DenseNet-121, and VGG-19. Each model was trained on 180 images. Among the
evaluated architectures, DenseNet-121 achieved the best overall performance, with 95% accuracy.
Such models have the potential to improve diagnosis, leading to better patient outcomes and more
personalized treatment plans.
Future work will address the limitations noted above, chiefly the small dataset and the class
imbalance. The model could also incorporate other medical data, such as genetic data, for further
analysis. In addition, other pretrained models such as GoogLeNet and MobileNet could be evaluated
on the same dataset and on larger datasets to compare and improve accuracy further.

8. REFERENCES
• Kaggle (https://kaggle.com)
• DigitalSreeni (YouTube channel) (https://www.youtube.com/@DigitalSreeni)
• ChatGPT (https://chatgpt.com)
• Gemini (https://gemini.google.com)