Capstone Review 2

GITAM (Deemed to be University)
School of Technology, Hyderabad
Department of Computer Science and Engineering

PREDICTION OF BREAST CANCER USING HISTOPATHOLOGY IMAGES

Project Batch No: P601
Guide Name: Dr. Figlu Mohanty

KVL BHAVYA - HU21CSEN0101141
CH HARIKA - HU21CSEN0101163
K PRANATHI - HU21CSEN0100954
K SURYA TEJA - HU21CSEN0101927

CONTENTS
Introduction

Abstract

Review of literature Survey

Problem statement

Objectives

Dataset analysis

Identification of Tools/Technologies

Flowchart of complete project

References

INTRODUCTION
● Breast cancer is a life-threatening disease; early detection improves survival rates.
● Machine learning, especially deep learning, is transforming histopathology image analysis.
● Techniques include pre-trained CNNs, transfer learning, and hybrid models.
● The ten research papers reviewed report model performance and describe the datasets used.
● The review highlights challenges in current methodologies.
● It also identifies areas for future research to improve breast cancer detection.

ABSTRACT
Breast cancer remains a global health concern, with early detection playing a critical role in
improving patient outcomes. This project focuses on developing an automated system to predict
and classify breast cancer using histopathology images, utilizing advanced machine learning
techniques. A comprehensive analysis of existing research reveals the effectiveness of
pre-trained convolutional neural networks (CNNs) and traditional classifiers in distinguishing
between benign and malignant tissues. By applying these techniques, the project aims to
enhance the accuracy, efficiency, and reliability of breast cancer diagnosis, providing valuable
assistance to pathologists in clinical settings.

REVIEW OF LITERATURE SURVEY
● Analysis of histopathological images for prediction of breast cancer using traditional classifiers with pre-trained CNN
○ Abstract: Pre-trained CNNs combined with traditional classifiers to classify breast cancer histopathology images.
○ Data set: BreaKHis Dataset: 7,909 images (2,480 benign, 5,429 malignant).
○ Algorithm: VGG16, VGG19, Xception, ResNet50 + SVM & Logistic Regression.
○ Results: ResNet50 + LR achieved the highest accuracy of 93.27%, while ResNet50 + SVM achieved 92.5%.
○ Observations: ResNet50 performed well across different magnifications, especially at 40X and 100X.
○ Research Gap: Limitations in dataset size and diversity. Future work could explore data augmentation and subtype classification.

● An Efficient Deep Learning Architecture to Predict Gene Expression from Breast Cancer Histopathology Images
○ Abstract: Gene expression prediction from histopathology images using deep learning.
○ Data set: TCGA Dataset: 495 WSIs, RNA sequencing data for 138 genes. External TMA dataset: 498 patients.
○ Algorithm: EfficientNet, RegNet, DenseNet, hist2RNA model for gene expression prediction + Voting Classifier for subtype classification.
○ Results: Gene Expression Prediction: Spearman correlation 0.82 across patients. AUROC ranged from 0.63 to 0.89 for different cancer subtypes.
○ Observations: The hist2RNA model showed computational efficiency, reducing energy consumption 50x compared to patch-based methods.
○ Research Gap: Needs validation across more diverse datasets for improved generalizability.

REVIEW OF LITERATURE SURVEY
● Transforming Breast Cancer Identification: An In-Depth Examination of Advanced Machine Learning Models Applied to Histopathological Images
○ Abstract: Explored advanced ML models like transfer learning for breast cancer identification.
○ Data set: Kaggle Breast Histopathology Dataset: 2,453 images (IDC positive/negative).
○ Algorithm: Transfer Learning Models: ResNet50, ResNet101, VGG16, VGG19.
○ Results: ResNet50 achieved the highest accuracy of 92.2%, with AUC of 91.0% and recall of 95.7%.
○ Observations: Transfer learning models, especially ResNet50, are effective in identifying IDC.
○ Research Gap: Future work should focus on larger datasets and further model optimization.

● LMHistNet: Levenberg-Marquardt Based Deep Neural Network for Classification of Breast Cancer Histopathological Images
○ Abstract: Proposed LMHistNet, a CNN optimized with the Levenberg-Marquardt algorithm for breast cancer classification.
○ Data set: 8 classes of breast cancer images (adenosis, fibroadenoma, carcinomas, etc.). Magnifications: 40X, 100X, 200X, 400X.
○ Algorithm: LMHistNet with Convolutional Block Attention Module, Group Normalized block, Spatial Factorization, Batch Normalization, Hinge Loss.
○ Results: Achieved 99% accuracy for binary classification and 88% for multiclass classification. High precision and recall at 100X magnification.
○ Observations: LMHistNet outperformed several state-of-the-art architectures, particularly in binary classification.
○ Research Gap: Further testing on larger datasets and integration with real-world clinical workflows required.

REVIEW OF LITERATURE SURVEY
● A Critical Analysis and Classification of Breast Cancer Using Histopathology Images
○ Abstract: Comparative analysis of various CNN and transfer learning models for breast cancer classification.
○ Data set: BreaKHis and BACH databases (publicly available).
○ Algorithm: Efficient B3, ViT-AMCNet, Ensembled Transfer Learning (ETL), Curriculum Feature Alignment Network (CFAN).
○ Results: Efficient B3 achieved 98% accuracy. Other models like CFAN and Inception-ResNet-V2 also showed high performance.
○ Observations: High accuracy across multiple models, with Efficient B3 leading in precision and recall.
○ Research Gap: More work needed in model optimization and segmentation challenges for better clinical applicability.

● Advancing Breast Cancer Prediction and Early Detection with Advanced Deep Learning Models
○ Abstract: Utilized transfer learning and deep neural networks for early breast cancer detection.
○ Data set: Breast Histology Dataset: 260 whole-slide images, augmented to 72,900 patches.
○ Algorithm: ResNet50, AlexNet, GoogleNet + Patch-based dataset, Data Augmentation (rotation, mirroring).
○ Results: ResNet50 achieved 97.11% accuracy, demonstrating the potential of deep learning for early breast cancer detection.
○ Observations: Data augmentation improved model performance, with ResNet50 excelling in classification.
○ Research Gap: Needs additional testing across diverse datasets and clinical trials for validation.

REVIEW OF LITERATURE SURVEY
● Breast Cancer Diagnosis from Histopathology Images Using Deep Neural Network and XGBoost
○ Abstract: Combined transfer learning with gradient boosting (XGBoost) for breast cancer classification.
○ Data set: BreaKHis Dataset: 7,909 images (benign/malignant) at magnifications 40X, 100X, 200X, 400X.
○ Algorithm: VGG16, VGG19, ResNet50, DenseNet (pre-trained models) + XGBoost, LightGBM, CatBoost.
○ Results: DenseNet201 + XGBoost achieved the highest accuracy: 40X (93.6%), 100X (91.3%), 200X (93.8%), 400X (89.1%). Overall average accuracy: 91.9%.
○ Observations: XGBoost proved robust across different magnification levels, showing high sensitivity and precision.
○ Research Gap: Needs further improvements, such as stain normalization and broader classification of cancer subtypes.

● A deep learning framework to predict breast cancer recurrence from histopathology images
○ Abstract: Proposed BCR-Net framework for predicting breast cancer recurrence using histopathology images.
○ Data set: H&E and Ki67-stained breast cancer images from 151 anonymized patients.
○ Algorithm: CNN-scorer for patch scoring, Multiple Instance Learning (MIL) + Attention-Based Pooling for slide-level classification.
○ Results: AUC for H&E WSIs: 0.775; AUC for Ki67 WSIs: 0.811. Accuracy for high risk: 71.1% (H&E), 79.2% (Ki67).
○ Observations: Intelligent patch sampling improved model efficiency, with MIL enhancing interpretability.
○ Research Gap: Needs more robust validation, especially for clinical use in risk prediction.

REVIEW OF LITERATURE SURVEY
● A deep-learning framework to predict cancer treatment response from histopathology images through imputed transcriptomics
○ Abstract: ENLIGHT-DeepPT uses AI to predict cancer treatment response from histopathology slides, outperforming current methods across multiple cancer types and therapies.
○ Data set: TCGA cohorts and five patient cohorts for treatments like trastuzumab, crizotinib, and PARPi, sourced from Genomic Data Commons and Cancer Imaging Archive.
○ Algorithm: Combines DeepPT (predicts mRNA expression from images) and ENLIGHT (predicts treatment response based on gene expression).
○ Results: ENLIGHT-DeepPT achieved an odds ratio of 2.28 and outperformed state-of-the-art methods in predicting treatment response.
○ Observations: It generalizes well across cancer types and does not require matched WSI and response data for training.
○ Research Gap: More research is needed on its application to other cancers, larger datasets, and integration with clinical variables.

● AL-KINDI CENTER FOR RESEARCH AND DEVELOPMENT
○ Abstract: This study evaluates four pre-trained transfer learning models (ResNet50, ResNet101, VGG16, VGG19) for breast cancer detection using histopathology images. ResNet50 achieved the highest performance with 92.2% accuracy, 91.0% AUC, and 95.7% recall, while VGG19 showed the weakest results.
○ Data set: 2,453 histopathology images, categorized into two groups: those featuring invasive ductal carcinoma (IDC) and those without IDC.
○ Algorithm: Four pre-trained transfer learning models were applied: ResNet50, ResNet101, VGG16, and VGG19. The models were evaluated based on accuracy, AUC, recall, and loss. Preprocessing included resizing, de-noising, segmenting, and smoothing the images.
○ Results: ResNet50 outperformed the other models, achieving accuracy of 92.2%, AUC of 91.0%, recall of 95.7%, and loss of 3.5%. VGG19 had the weakest performance.
○ Observations: ResNet50 is identified as the optimal model for breast cancer detection from histopathology images, significantly outperforming the other models.
○ Research Gap: Potential further research could involve applying ResNet50 to larger datasets and evaluating its performance across a broader range of cancer types. Other deep learning models or combinations of techniques could also be explored.

LITERATURE SUMMARY
● The literature survey reviews ten research papers on machine learning and deep learning for breast
cancer prediction using histopathology images. Key techniques include pre-trained CNNs like
ResNet50, VGG16, and Xception for feature extraction.
● Classifiers such as Support Vector Machines (SVM) and Logistic Regression are commonly used
alongside CNNs.
● Advanced architectures like LMHistNet and transfer learning models like DenseNet and EfficientNet
show high accuracy.
● Methods explored include gene expression prediction, multi-class classification, and gradient
boosting classifiers (XGBoost).
● Challenges identified include limited dataset diversity, varying image magnifications, and the need for
models that generalize across datasets.

PROBLEM STATEMENT
● Breast cancer is a leading cause of morbidity and mortality among women globally.
● Increasing incidence demands efficient diagnostic tools for early detection and improved patient
outcomes.
● Traditional methods rely on manual examination of histopathology slides, which is:
a. Time-consuming
b. Subject to human error
● Growing volume of medical imaging data challenges pathologists in maintaining accuracy and
efficiency.
● Project goal: Develop a robust deep learning model (e.g., CNNs, VGG16, VGG19) to:
i. Accurately analyze and classify histopathology images.
ii. Provide reliable tools for pathologists.
iii. Ensure timely and accurate diagnoses to improve clinical outcomes.

OBJECTIVES
● Develop a Hybrid Classification Model: Integrate CNN-based feature extraction with traditional
texture methods (gray-level co-occurrence matrix, GLCM, and local binary patterns, LBP) and deep
learning architectures (VGG16 and VGG19) to capture intricate features in breast cancer
histopathology images.
● Implement Feature Reduction Techniques: Identify and retain the most relevant features from
the dataset to enhance classification accuracy and efficiency.
● Compare Classifier Performance: Evaluate various classification techniques (Random Forest,
SVM, CNN) using selected features, aiming to determine the most effective approach for breast
cancer classification based on metrics like accuracy, precision, and recall.
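
As a rough illustration of the traditional texture features named in the first objective, the sketch below computes a basic 8-neighbour LBP histogram and the contrast of a horizontal GLCM in plain Python. It is a minimal sketch under stated assumptions, not the project's implementation: the function names (lbp_code, lbp_histogram, glcm_contrast), the toy 4x4 patch, and the choice of a single horizontal offset are all illustrative, and a real pipeline would more likely rely on an image-processing library.

```python
from collections import Counter

def lbp_code(img, r, c):
    """Basic 8-neighbour local binary pattern code for the pixel at (r, c)."""
    center = img[r][c]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes over all interior pixels."""
    rows, cols = len(img), len(img[0])
    counts = Counter(lbp_code(img, r, c)
                     for r in range(1, rows - 1)
                     for c in range(1, cols - 1))
    total = sum(counts.values())
    return [counts.get(k, 0) / total for k in range(256)]

def glcm_contrast(img, levels):
    """Contrast of the co-occurrence matrix of horizontally adjacent pixels."""
    glcm = [[0] * levels for _ in range(levels)]
    for row in img:
        for a, b in zip(row, row[1:]):
            glcm[a][b] += 1
    total = sum(map(sum, glcm))
    return sum(glcm[i][j] * (i - j) ** 2
               for i in range(levels) for j in range(levels)) / total

# Toy 4x4 grey-level patch with values in 0..3 (illustrative only).
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]

# Hand-crafted feature vector: 256 LBP bins followed by one GLCM statistic.
features = lbp_histogram(patch) + [glcm_contrast(patch, levels=4)]
```

A vector like features could then be concatenated with CNN features before feature reduction and classification, which is one way to read the hybrid model described above.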

DATASET ANALYSIS
● BreaKHis Dataset: Comprises 7,909 microscopic images of breast tumor tissue from 82
patients.
● Contains 2,480 benign and 5,429 malignant samples.
● Images have a resolution of 700x460 pixels, in 3-channel RGB, 8-bit depth per channel, and are
in PNG format.
● Collected using the SOB (surgical open biopsy) method, also called partial mastectomy or
excisional biopsy.
● Benign Tumors: Adenosis (A), Fibroadenoma (F), Phyllodes Tumor (PT), Tubular
Adenoma (TA).
● Malignant Tumors: Ductal Carcinoma (DC), Lobular Carcinoma (LC), Mucinous Carcinoma
(MC), Papillary Carcinoma (PC).
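
The class labels and the benign/malignant imbalance described above can be handled in code roughly as follows. This is a sketch under assumptions: the file-name pattern (SOB_<B|M>_<type>-<year>-<slide id>-<magnification>-<sequence>.png) is assumed from the dataset's usual naming and should be checked against the downloaded files, and parse_filename and balanced_class_weights are hypothetical helper names. The weighting rule, total / (number of classes x class count), is a common "balanced" heuristic rather than something the slides prescribe.

```python
import re

# Tumour-type abbreviations as listed in the dataset description.
TUMOR_TYPES = {
    "A": "adenosis", "F": "fibroadenoma",
    "PT": "phyllodes tumor", "TA": "tubular adenoma",
    "DC": "ductal carcinoma", "LC": "lobular carcinoma",
    "MC": "mucinous carcinoma", "PC": "papillary carcinoma",
}

# Assumed pattern: SOB_<B|M>_<type>-<year>-<slide id>-<magnification>-<seq>.png
FILENAME_RE = re.compile(
    r"^SOB_(?P<cls>[BM])_(?P<type>[A-Z]{1,2})-(?P<year>\d+)-"
    r"(?P<slide>[^-]+)-(?P<mag>\d+)-(?P<seq>\d+)\.png$"
)

def parse_filename(name):
    """Extract label, tumour type and magnification from an image file name."""
    match = FILENAME_RE.match(name)
    if match is None or match["type"] not in TUMOR_TYPES:
        raise ValueError(f"unrecognised file name: {name}")
    return {
        "label": "benign" if match["cls"] == "B" else "malignant",
        "tumor_type": TUMOR_TYPES[match["type"]],
        "magnification": int(match["mag"]),
    }

def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}

info = parse_filename("SOB_B_TA-14-4659-40-001.png")

# Class counts taken from the dataset description: 2,480 benign, 5,429 malignant.
weights = balanced_class_weights({"benign": 2480, "malignant": 5429})
```

With these counts the benign class receives a weight above 1 and the malignant class a weight below 1, which is one simple way to keep the majority class from dominating training.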

DATASET ANALYSIS
● Dataset Significance:
○ Serves as a benchmarking tool for evaluating breast cancer classification models.
○ Facilitates research on early detection and diagnosis of breast cancer.
● Data Preprocessing Needs:
○ Image Resizing: Standardizing input sizes for models.
○ Normalization: Scaling pixel values for improved performance.
○ Data Augmentation: Techniques like rotation and flipping to enhance dataset diversity.
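
The three preprocessing needs listed above can be sketched on a tiny single-channel image as follows. This is a minimal illustration in plain Python, assuming nearest-neighbour resizing, division by 255 for normalization, and a horizontal flip plus a 90-degree rotation for augmentation; the function names and the 2x2 tile are hypothetical. With real 700x460 RGB images, the same steps would normally be done with OpenCV, Pillow, or NumPy.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

def normalize(img):
    """Scale 8-bit pixel values from 0..255 to 0.0..1.0."""
    return [[px / 255.0 for px in row] for row in img]

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def rotate_90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img):
    """Return the original image plus a flipped and a rotated copy."""
    return [img, flip_horizontal(img), rotate_90_clockwise(img)]

# Toy 2x2 grey-level tile (illustrative only).
tile = [[0, 255],
        [128, 64]]

resized = resize_nearest(tile, 4, 4)                     # standardise input size
batch = [normalize(view) for view in augment(resized)]  # augment, then scale
```

Resizing before augmentation keeps every augmented copy at the same input size, which matters once the images are stacked into batches for a CNN.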

IDENTIFICATION OF TECHNOLOGIES
● Programming Language:
◦ Python: Chosen for its extensive libraries and frameworks.
● Deep Learning Framework:
◦ TensorFlow/Keras: For building and training CNN models.
● Image Processing Libraries:
◦ OpenCV: For image pre-processing.
◦ Pillow (PIL): For image manipulation.
● Data Handling:
◦ NumPy: For numerical operations.
◦ Pandas: For data manipulation.

IDENTIFICATION OF TECHNOLOGIES
● Model Evaluation:
◦ Matplotlib: For visualizing model performance.
● Development Environment:
◦ Jupyter Notebook: For interactive coding and documentation.
● Hardware Requirements:
◦ GPU: For efficient training of deep learning models.

FLOWCHART OF A COMPLETE PROJECT
1. Data Collection: BreaKHis Dataset
2. Image Preprocessing: Resizing, Normalization
3. Feature Extraction
4. Classification
5. Performance Evaluation: Accuracy, Precision, Recall, F1 Score
6. Output: Benign or Malignant classification
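
The four evaluation metrics named in the flowchart can be computed from predicted and true labels as sketched below. The function name binary_metrics, the choice of "malignant" as the positive class, and the eight example labels are assumptions made for illustration, not results from the project.

```python
def binary_metrics(y_true, y_pred, positive="malignant"):
    """Accuracy, precision, recall and F1 for a two-class problem."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels for eight images (illustrative only).
y_true = ["malignant"] * 4 + ["benign"] * 4
y_pred = ["malignant", "malignant", "malignant", "benign",
          "benign", "benign", "malignant", "malignant"]

scores = binary_metrics(y_true, y_pred)
```

Because the dataset has more malignant than benign images, precision and recall are worth reporting alongside accuracy, since accuracy alone can look good for a model that favours the majority class.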

REFERENCES
● K. Gupta and N. Chawla, "Analysis of Histopathological Images for Prediction of Breast Cancer Using Traditional Classifiers with Pre-Trained CNN," Procedia Computer Science, vol. 167, pp. 878-889, 2020, doi: 10.1016/j.procs.2020.03.427.
● R. K. Mondol, E. K. A. Millar, P. H. Graham, L. Browne, A. Sowmya, and E. Meijering, "hist2RNA: An Efficient Deep Learning Model to Predict Gene Expression from Breast Cancer Histopathology Images," Cancers, vol. 15, no. 9, art. 2569, 2023, doi: 10.3390/cancers15092569.
● R. K. Ray, A. A. Linkon, M. S. Bhuiyan, R. M. Jewel, N. Anjum, B. P. Ghosh, M. T. Mia, Badruddowza, M. S. U. Sarker, and M. Shaima, "Transforming Breast Cancer Identification: An In-Depth Examination of Advanced Machine Learning Models Applied to Histopathological Images," Journal of Computer Science and Technology Studies, vol. 6, no. 1, pp. 155-161, Jan. 2024, doi: 10.32996/jcsts.2024.6.1.16.
● S. S. Koshy and L. J. Anbarasi, "LMHistNet: Levenberg-Marquardt Based Deep Neural Network for Classification of Breast Cancer Histopathological Images," IEEE Access, vol. 12, pp. 52051-52062, 2024, doi: 10.1109/ACCESS.2024.3385011.
● P. Singh, R. Kumar, M. Gupta, and A. J. Obaid, "A Critical Analysis and Classification of Breast Cancer Using Histopathology Images," in Proc. 2024 Int. Conf. Knowl. Eng. Commun. Syst. (ICKECS), 2024, pp. 1-8, doi: 10.1109/ICKECS61492.2024.10617282.
● P. Garg, P. Sharma, and U. M. Prakash, "Advancing Breast Cancer Prediction and Early Detection with Advanced Deep Learning Models," in 2024 Fourth International Conference on Advances in Electrical, Computing, Communication, and Sustainable Technologies (ICAECT), 2024, doi: 10.1109/ICAECT60202.2024.10469397.
● A. Maleki, M. Raahemi, and H. Nasiri, "Breast Cancer Diagnosis from Histopathology Images Using Deep Neural Network and XGBoost," preprint submitted to Elsevier, 2024.
● Z. Su, M. K. K. Niazi, T. E. Tavolara, S. Niu, G. H. Tozbikian, R. Wesolowski, and M. N. Gurcan, "BCR-Net: A deep learning framework to predict breast cancer recurrence from histopathology images," PLOS ONE, vol. 18, no. 4, Apr. 2023.
● D.-T. Hoang et al., "A deep-learning framework to predict cancer treatment response from histopathology images through imputed transcriptomics," Nature Cancer, vol. 5, 2024, doi: 10.1038/s43018-024-00793-2.

THANK YOU
