Assessment_of_Deep_Learning_Algorithms_to_Predict_

This study presents a novel approach using deep learning algorithms to classify breast cancer histopathological images, utilizing a private dataset from Morocco. The ResNet50 and Xception models demonstrated high accuracy and sensitivity in detecting carcinoma cases, achieving overall accuracies of 84.5% and 88%, respectively. The research highlights the potential of digital pathology and artificial intelligence to enhance diagnostic accuracy in breast cancer pathology.

Assessment of Deep Learning Algorithms to Predict Histopathological Diagnosis of Breast Cancer: First Moroccan Prospective Study on a Private Dataset
HAJAR EL AGOURI
Mohammed V Military Instruction Hospital: Hopital Militaire d'Instruction Mohammed V
https://orcid.org/0000-0001-5692-3160
Mohammed Azizi
Datapathology
Hicham El Attar
Pathological anatomy laboratory Ennasr El jadida
Mohammed El Khannoussi
Datapathology
Azeddine Ibrahimi
Medical University of Lodz Faculty of Pharmacy: Uniwersytet Medyczny w Lodzi Wydzial
Farmaceutyczny
Rachad Kabbaj
National Institute of Oncology
Habiba Kadiri
National Institute of Oncology
Salma BEKARSABEIN
National Institute of Oncology
Soumaya ECH-CHARIF
National Institute of Oncology
Chaimaa Mounjid
Mohammed V University of Rabat Faculty of Sciences: Universite Mohammed V de Rabat Faculte des
Sciences
Basma El Khannoussi
Mohammed V Souissi University Faculty of Medicine and Pharmacy Rabat: Universite Mohammed V de
Rabat Faculte de Medecine et de Pharmacie Rabat

Research note

Keywords: Breast cancer, digital pathology, artificial intelligence, Deep learning, Machine learning,
Convolutional Neural Networks

DOI: https://doi.org/10.21203/rs.3.rs-855143/v1

License:   This work is licensed under a Creative Commons Attribution 4.0 International License.
Read Full License

Abstract
Objective: Breast cancer is a critical public health issue and a leading cause of cancer-related deaths
among women worldwide. Early detection and diagnosis can effectively increase the chances of survival.
The aim of this work is to develop a computational approach based on deep convolutional neural networks
for efficient classification of breast cancer histopathological images, using a dataset that we created
ourselves. Two deep neural network architectures were used in combination with a gradient boosted trees
classifier. Images were classified into three classes: normal tissue-benign lesions, in situ carcinoma
and invasive carcinoma.

Results: The ResNet50 and Xception models achieved comparable results, with a small advantage for
features extracted with Xception. The proposed classification achieved a high degree of precision and
good generalization performance, and avoided a possible overfitting scenario due to the limited size of
the data. In addition, we report high sensitivity for the detection of carcinoma cases, which is important
in the diagnostic pathology workflow to assist pathologists in diagnosing breast cancer with precision.

Introduction
According to the World Health Organization, breast cancer (BC) is the leading cause of cancer-related
death among women [1]. In Morocco, 11,747 new cases of BC in women were diagnosed during the last year,
representing about 19.8% of all new cancer cases and 38.9% of all cancers in women [2].

Around the world, we are faced with an exponential increase in cancer cases, growing numbers of
patients from an aging population, and a shortage of trained pathologists [3]. Moreover, there is a need
for accuracy in the histopathologic diagnosis of BC, as patient demand for accurate diagnostics and
personalized therapy is increasing [4,5]. The trend towards digitization of pathology data has therefore
opened the door to faster, more precise and more reproducible diagnosis through computerized image
analysis [6].

In addition, this will revolutionize the laborious work of the pathologist, which is often repetitive and
time consuming and is subject to significant intra- and inter-observer variability [7,8]. Facing these
issues, it is urgent to develop automatic and accurate methods for histopathological image analysis,
especially for classification tasks.

Recently, we have witnessed groundbreaking improvements in digital pathology (DP) and artificial
intelligence (AI), promising to change the way we detect and treat BC in the near future [9]. The most
promising advance in AI is machine learning (ML), and particularly deep learning (DL) [10]. In breast
pathology, convolutional neural networks (CNNs) are the favored deep learning approach for BC
classification and detection [11,12].

Methods
Study description:
The study was performed prospectively in the histopathology department of the National Institute of
Oncology in Rabat, over a period of six months from January 2020 to June 2020, and involved 116 breast
surgical specimens. Only specimens with a diagnosis of invasive breast carcinoma (IBC) of non-specific
type were included. All diagnoses of IBC of a specific type, as well as tumors lysed after neoadjuvant
chemotherapy, were excluded.

In this study, the tumor tissue samples were stained with hematoxylin-eosin (HE), photographed at x200
equivalent magnification, and exported to JPEG format using Olympus cellSens Entry software. This
process was performed by one pathologist, at light microscopy, using an Olympus BX43 microscope coupled
with a DX73 camera. Furthermore, two qualified consultant breast pathologists, who had completed brief
training in the use of the digital microscopy system, were recruited to participate in the validation study.

Dataset collection:
We collected a total of 328 HE stained images. Each image is labeled with one of three classes (Fig. 1):
invasive carcinoma (IC) (group 2), in-situ carcinoma (IS-C) (group 1), and benign, i.e. benign lesions
and/or normal tissue (group 0). The labeling was performed by two pathologists, who provided only
diagnostic information from the image contents, without specifying the area of interest for the classification.

Proposed methodology:
When a high-resolution pathological image (2048×1536 pixels) is input, our goal is to accurately
classify it into one of three categories: normal or benign, IS-C and IC. To achieve this, we proposed
and tested a method for BC classification inspired by the experimental protocol of
Alexander Rakhlin et al. [13]. Each phase of our work is described in the following subsections:
1. Data pre-processing and augmentation:
First, the original images are downscaled by a factor of two in each dimension in order to accelerate
the later operations. After a color normalization step, we performed 40 random color augmentations for
each image. The augmentation consists of an affine transformation of the input image's pixel intensities,
which allowed us to multiply the size of the dataset by 40. Each image was then used to generate 20
randomly extracted patches of a fixed size (750 x 750 pixels), later processed by the CNNs (Fig. 2A).
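
The pre-processing step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the pipeline used in the study: the function names, the scale and shift ranges of the affine transformation and the random seed are assumptions chosen for the example, and only the image size, the 40 augmentations and the 20 crops of 750 x 750 pixels come from the text.

```python
import random


def sample_color_affine(rng, scale_range=(0.9, 1.1), shift_range=(-10.0, 10.0)):
    """Draw one (scale, shift) pair per RGB channel (ranges are assumptions)."""
    return [(rng.uniform(*scale_range), rng.uniform(*shift_range)) for _ in range(3)]


def apply_color_affine(pixel, params):
    """Apply value * scale + shift to each channel and clip to [0, 255]."""
    return tuple(
        min(255, max(0, round(value * scale + shift)))
        for value, (scale, shift) in zip(pixel, params)
    )


def sample_crop_boxes(rng, width, height, crop=750, n_crops=20):
    """Return n_crops random (left, top, right, bottom) boxes inside the image."""
    if crop > width or crop > height:
        raise ValueError("crop is larger than the image")
    boxes = []
    for _ in range(n_crops):
        left = rng.randint(0, width - crop)   # randint is inclusive on both ends
        top = rng.randint(0, height - crop)
        boxes.append((left, top, left + crop, top + crop))
    return boxes


rng = random.Random(0)                        # fixed seed, for reproducibility only
width, height = 2048 // 2, 1536 // 2          # image size after halving
augmentations = [sample_color_affine(rng) for _ in range(40)]
boxes = sample_crop_boxes(rng, width, height)
```

With a real image library, each `(scale, shift)` triple would be applied to every pixel of the normalized image before the crop boxes are cut out.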

2. Feature extraction:
For our use case, we opted for two deep CNN architectures: the ResNet50 and Xception models. Both
models are pre-trained on the publicly available 'ImageNet' dataset, which contains more than
1 million annotated images (about 150 GB) across several categories. Both models were used to compute
a descriptor vector for each crop. The feature vectors of the 20 crops of a single image were then
combined through a pooling operation to generate one feature vector per image (Fig. 2B).
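
The pooling of crop descriptors into a single image descriptor can be sketched as below. The text does not say which pooling operation was used, so the element-wise mean and maximum shown here are assumptions chosen for illustration, and `pool_descriptors` is a hypothetical helper name. The crop descriptors themselves would come from the pre-trained CNNs and are represented here by plain lists of floats.

```python
def pool_descriptors(crop_descriptors, mode="mean"):
    """Combine per-crop feature vectors into one vector, element by element."""
    if not crop_descriptors:
        raise ValueError("at least one crop descriptor is required")
    length = len(crop_descriptors[0])
    if any(len(descriptor) != length for descriptor in crop_descriptors):
        raise ValueError("all crop descriptors must have the same length")
    columns = list(zip(*crop_descriptors))    # one tuple per feature dimension
    if mode == "mean":
        return [sum(column) / len(column) for column in columns]
    if mode == "max":
        return [max(column) for column in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


# Two toy crop descriptors with two features each (made-up values).
toy_crops = [[1.0, 4.0], [3.0, 0.0]]
mean_vector = pool_descriptors(toy_crops, mode="mean")
max_vector = pool_descriptors(toy_crops, mode="max")
```

Whatever the pooling choice, the output has the same length as one crop descriptor, so the classifier sees one fixed-size vector per augmented image regardless of the number of crops.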

3. Machine Learning classification:
We performed supervised classification using an XGBoost model. XGBoost is an optimized distributed
gradient boosting library that can be executed efficiently on a GPU (Graphics Processing Unit)
workstation, allowing quick training and evaluation of the model. Gradient boosting models are used
extensively in machine learning because of their speed, accuracy, and robustness against overfitting.

4. Evaluation metrics:
To validate our approach, we used cross-validation. The augmented images extracted from the same
original image were placed in the same fold to prevent information leakage. We used a 6-fold
cross-validation strategy, yielding an accuracy for each fold and then an average global accuracy.
Because of the very limited number of images available for this study, we could not set aside an
additional separate test set. We computed a prediction for each augmented image and then combined the
decisions made for the 40 augmentations through a voting strategy, in order to obtain a single
prediction for each image. In addition, we evaluated the performance in two scenarios, each
corresponding to one CNN architecture used for feature extraction. We also compared the actual and
predicted classes using a confusion matrix.
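
The two safeguards described above, keeping all augmentations of an image in one fold and voting over its augmentations, can be sketched as follows. This is an illustrative sketch under simplifying assumptions: folds are assigned round-robin over sorted image identifiers (no shuffling or class stratification), ties in the vote go to the label seen first, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def assign_folds(source_ids, n_folds=6):
    """Map each original image id to one fold, shared by all its augmentations."""
    unique_ids = sorted(set(source_ids))
    return {image_id: index % n_folds for index, image_id in enumerate(unique_ids)}


def split_indices(source_ids, fold_of, test_fold):
    """Return (train, test) sample indices for one cross-validation round."""
    train = [i for i, s in enumerate(source_ids) if fold_of[s] != test_fold]
    test = [i for i, s in enumerate(source_ids) if fold_of[s] == test_fold]
    return train, test


def majority_vote(source_ids, predictions):
    """Combine per-augmentation predictions into one label per original image."""
    votes = defaultdict(Counter)
    for image_id, label in zip(source_ids, predictions):
        votes[image_id][label] += 1
    # most_common(1) returns the first-seen label when counts are tied
    return {image_id: counter.most_common(1)[0][0] for image_id, counter in votes.items()}


# 328 original images with 40 augmentations each, as in the text.
source_ids = [image_id for image_id in range(328) for _ in range(40)]
fold_of = assign_folds(source_ids)
train, test = split_indices(source_ids, fold_of, test_fold=0)
train_images = {source_ids[i] for i in train}
test_images = {source_ids[i] for i in test}

toy_vote = majority_vote(["a", "a", "a", "b", "b"], [2, 2, 0, 1, 1])
```

Because the split is made on original image identifiers rather than on augmented samples, no image contributes augmentations to both the training and the test side of a round.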

Results
We extracted 328 images from HE stained digital slides, of which 152 were non-carcinoma and 176 were
carcinoma images. The carcinoma class included images of IC and IS-C, while the non-carcinoma class
contained images of normal tissues as well as benign lesions. In our study, we performed multi-class
classification into three classes: group 0 (benign): 152 images, group 1 (IS-C): 70 images, and
group 2 (IC): 106 images. From the results of this classification system, we computed the
corresponding metrics for a binary classification case and a multi-class classification case.

We obtained the following results by performing 6-fold cross-validation, training on 273 images and
testing on 55 images in each of the 6 rounds (Table 1).

ResNet50 model:
The ResNet50 model correctly predicted 277 out of 328 instances (142 benign instances were correctly
predicted as benign, 85 IC as invasive, and 50 IS-C as in situ), while 51 cases were misclassified.
For 3-class classification, majority voting showed good results, achieving an overall accuracy of
84.5%.
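
The overall accuracy quoted above follows directly from the reported counts, as the short check below shows. The class sizes and the numbers of correct predictions are taken from the Results text. The per-class recall values are derived here from those counts for illustration and are not figures reported by the authors.

```python
# Counts reported in the text for the ResNet50 features.
class_sizes = {"benign": 152, "IS-C": 70, "IC": 106}
correct = {"benign": 142, "IS-C": 50, "IC": 85}

total = sum(class_sizes.values())             # all images
n_correct = sum(correct.values())             # correctly classified images
n_wrong = total - n_correct                   # misclassified images
accuracy = n_correct / total                  # overall 3-class accuracy

# Fraction of each class that was correctly recognised (derived, not reported).
recall = {name: correct[name] / class_sizes[name] for name in class_sizes}

print(f"accuracy = {accuracy:.1%}")
for name, value in recall.items():
    print(f"recall[{name}] = {value:.1%}")
```

The derived recalls suggest that, under these counts, IS-C is the class most often misclassified (50 of 70 correct), compared with 142 of 152 for benign and 85 of 106 for IC.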

We also found that the overall accuracy increases when only two classes (non-carcinoma and carcinoma)
are considered (90% versus 84.5%). This indicates that many of the 3-class errors are confusions
between in situ and invasive carcinoma, which share similar features. In addition, the proposed model
achieved an overall sensitivity of 93% for carcinoma classification, which means that our classifier
was very good at detecting cancer.
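
The relation between the 3-class and the binary metrics can be illustrated by collapsing a confusion matrix. The matrix below is a made-up toy example, not the study's confusion matrix (which is only available in the supplementary table). Rows are actual classes and columns are predicted classes, in the order benign, IS-C, IC, and `collapse_to_binary` is a hypothetical helper.

```python
def collapse_to_binary(matrix, carcinoma_classes=(1, 2)):
    """Collapse a 3-class confusion matrix into (TP, FN, FP, TN) for carcinoma."""
    tp = fn = fp = tn = 0
    for actual, row in enumerate(matrix):
        for predicted, count in enumerate(row):
            actual_positive = actual in carcinoma_classes
            predicted_positive = predicted in carcinoma_classes
            if actual_positive and predicted_positive:
                tp += count                   # carcinoma detected (IS-C/IC mix-ups included)
            elif actual_positive:
                fn += count                   # carcinoma called benign
            elif predicted_positive:
                fp += count                   # benign called carcinoma
            else:
                tn += count                   # benign called benign
    return tp, fn, fp, tn


# Toy matrix with invented counts: rows = actual, columns = predicted.
toy_matrix = [
    [18, 1, 1],    # benign
    [2, 6, 2],     # IS-C
    [1, 3, 16],    # IC
]
total = sum(sum(row) for row in toy_matrix)
three_class_accuracy = sum(toy_matrix[i][i] for i in range(3)) / total

tp, fn, fp, tn = collapse_to_binary(toy_matrix)
binary_accuracy = (tp + tn) / total
sensitivity = tp / (tp + fn)                  # share of carcinoma cases detected
```

In this toy case the five IS-C/IC mix-ups count as errors in the 3-class setting but as true positives in the binary one, which is why the binary accuracy is higher.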

Xception model:
For instance, among the 152 normal/benign cases, 144 were correctly classified, 4 were wrongly
classified as IC and 4 as IS-C. We also noticed that the Xception network achieved a maximum overall
accuracy of 88% for three classes, slightly higher than that of the ResNet50 model.

In comparison with ResNet50, the Xception model showed higher results on all the evaluation metrics for
the binary classification, as well as for the 3-class classification. Additionally, we report a
high sensitivity (95%) for carcinoma cases, which is of great significance in the diagnostic pathology
workflow.

Discussion
DP and AI in breast pathology:
The automation of BC diagnosis is essential and requires digitalization of the histological slides using
a whole-slide imaging (WSI) system [14], which could help pathologists improve the accuracy of
diagnostic processes [15].

DP has the potential to transform the way in which pathology services are delivered across the globe.
Indeed, it makes telepathology consultation between expert pathologists easier [16] and provides tools
for a more efficient workflow and higher reproducibility [17], especially in challenging situations such
as the COVID-19 pandemic. The goal of DP is not to take over the pathologist’s work, but to improve
accuracy and reduce human error [18].

However, laboratories with integrated DP workflows are still rare. In Morocco, a developing country,
we are the first to introduce AI into the routine pathology workflow.

In breast pathology, rapid advances in AI along with the growth of DP offer a promising approach to
meet the urgent need for more accurate detection, classification and prediction [19]. ML and DL
algorithms have been widely successful and have shown high performance in BC diagnosis,
prognosis, and prediction of response to treatment [20-23].

Comparison with the state-of-the-art


First of all, it is worth mentioning that a few Moroccan studies, performed by biomedical engineers
and data scientists, have proposed different approaches [24,25] for BC diagnosis using ML on public
datasets. To our knowledge, ours is the first study conducted by pathologists that successfully
assesses AI algorithms for automated diagnosis of BC using both binary and multi-class classification
in one research work, based on our own private dataset.

The effectiveness of our proposed DL approach can be compared with various state-of-the-art studies
on the classification of BC histopathology images. Most of these studies are based on publicly
available datasets [26-30]. In contrast, most medical image datasets are much smaller because of
patient privacy issues and the need for expert annotation and labelling [4]. In our study, we used our
own dataset, which is limited in size compared with public image datasets.

The experimental results showed a testing accuracy for BC detection comparable to that of existing
state-of-the-art methods. For instance, Spanhol et al. [31] achieved an accuracy of approximately 84%.
In our work, the overall accuracy is 84.5% with ResNet50 and 88% with Xception. Our methods therefore
perform similarly to this previous work, even though our training considered 3 classes. Besides, the
dataset used in that study contains approximately 2000 images for the referred magnification, which is
a significantly larger training set. Moreover, the images in the previous study were selected so that
only regions relevant for diagnosis were present, whereas in our case some patches in the training and
testing sets may not contain the information most relevant for correct classification, which can lower
the classification accuracy.

In the work of Araujo et al. [32], the authors reported an accuracy of 77.8% for 4-class and 83.3% for
binary classification. The sensitivity of their method for cancer cases reached 95.6%. Our proposed
classification likewise obtained a high sensitivity for carcinoma cases, which is of great
significance in the diagnostic pathology workflow, as the harm resulting from a false negative
(patient remains without diagnosis) is much greater than that of a false positive (patient undergoes
additional procedures and treatments such as chemotherapy). In addition, we achieved a high degree of
accuracy, i.e. 90% (ResNet50) and 91% (Xception), for binary classification tasks.

Conclusions
In this paper, we proposed a simple and effective method for the classification of HE stained histological
BC images in the case of very small training data (328 samples). To increase the robustness of the
classifier, we opted for a hybrid pipeline and used strong data augmentation and deep convolutional
features extracted with publicly available pre-trained CNNs. For the classification task, our results
revealed good discriminatory power both for differentiating between benign and malignant images and
for classifying them into the three sub-categories.

Limitations
Although the presented work has demonstrated the classification capacity of AI algorithms for BC
histopathology images, we were challenged by the limited size of the dataset, which can lead to
overfitting. To circumvent this issue, we opted for a hybrid pipeline and strong data augmentation.

Currently, we are working on extending our dataset with other pathology laboratories, as well as on the
detection of invasive BC of a specific type, in order to improve the accuracy of classification.
Moreover, our project for the implementation of the WSI system is boosting the BC diagnostic workflow.
In our future work, we intend to use and evaluate other pre-trained CNN models for the feature
extraction stage, and to extend the application to other types of cancer, such as colorectal, lung or
prostate cancer.

Abbreviations
The following abbreviations are used in this manuscript:

BC: Breast cancer,

IBC: Invasive breast carcinoma,

IS-C: In-situ carcinoma,

IC: Invasive carcinoma,

HE: Hematoxylin and eosin,

AI: Artificial intelligence,

ML: Machine learning, DL: Deep learning,

CNN: Convolutional Neural Networks,

WSI: Whole-slide imaging,

DP: Digital pathology

Declarations
Ethics approval and consent to participate:

The study protocol and study methodology were approved by the Human Research Ethics Committee of
the University of Mohammed V, Faculty of Medicine and Pharmacy. All participants gave written
informed consent; in case of inability to give written consent, a legal representative had to provide
consent.

Consent for publication:

Not applicable.

Availability of data and material:

All data generated or analysed during this study are included in this published article.

Competing interests:

The authors declare that they have no competing interests.

Funding:

The publication charge of this study will be covered by the Institute of Cancer Research.

Acknowledgment:

We gratefully thank the Institute of Cancer Research for funding this manuscript. We would like to
particularly acknowledge the support of the team and partners of Datapathology, the first Moroccan
startup that combines medical and digital expertise to develop new tools for diagnosis and precision
pathology.

Authors’ contributions:

EH conceived of the idea for the study, designed the study, analysed and interpreted the data, and drafted
the manuscript.

AM designed the deep learning method, and helped to draft the manuscript.

ELH and EM conceived of the idea for the study, analysed and interpreted the data.

AI contributed to the statistical analysis.

RK, KH, BS, ES, EB performed the histological examination of all breast surgical specimens employed in
this study.

ES, EB provided annotations of all the digitized slide images employed in this study.

MC coordinated the literature search and helped to draft the manuscript.

EB supervised, reviewed and validated the study.

All authors critically reviewed the manuscript and provided final approval for submission.

References
1. World Health Organization facts on Breast Cancer https://www.who.int/cancer/prevention/diagnosis-screening/breast-cancer/en/
2. The Global Cancer Observatory https://gco.iarc.fr/today/data/factsheets/populations/900-world-fact-sheets.pdf
3. Robertson S, Azizpour H, Smith K, Hartman J. Digital image analysis in breast pathology-from image
processing techniques to artificial intelligence. Transl Res 2018;194:19-35.
4. Asmaa Ibrahim, Paul Gamble, Ronnachai Jaroensri, Mohammed M. Abdelsamea, Craig H. Mermel,
Po-Hsuan Cameron Chen, Emad A. Rakha, Artificial intelligence in digital breast pathology:
Techniques and applications, The Breast, Volume 49, 2020.
5. Acs, B, Rantalainen, M, Hartman, J (Karolinska Institutet, Stockholm, Sweden) Artificial intelligence as
the next step towards precision pathology. J Intern Med; 2020; 288: 62– 81.
6. Bera, K., Schalper, K. A., Rimm, D. L., Velcheti, V. & Madabhushi, A. Artificial intelligence in digital
pathology - new tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 16, 703–715
(2019).
7. Rakha EA, Aleskandarani M, Toss MS, et al. Breast cancer histologic grading using digital
microscopy: concordance and outcome association. J Clin Pathol 2018;71(8):680–6.
8. Rakha EA, Ahmed MA, Aleskandarany MA, et al. Diagnostic concordance of breast pathologists:
lessons from the National Health Service breast screening programme pathology external quality
assurance scheme. Histopathology 2017;70(4):632–42.
9. Williams B, Hanby A, Millican-Slater R, et al. Digital pathology for primary diagnosis of screen-
detected breast lesions - experimental data, validation and experience from four centres.
Histopathology 2020;76:968–75.
10. James H. Harrison, John R. Gilbertson, Matthew G. Hanna, Niels H. Olson, Jansen N. Seheult, James
M. Sorace, Michelle N. Stram; Introduction to Artificial Intelligence and Machine Learning for
Pathology. Arch Pathol Lab Med 2021; doi: https://doi.org/10.5858/arpa.2020-0541-CP
11. Karan Gupta, Nidhi Chawla, Analysis of Histopathological Images for Prediction of Breast Cancer
Using Traditional Classifiers with Pre-Trained CNN, Procedia Computer Science, Volume 167, 2020.
12. Mercan E, Mehta S, Bartlett J et al. Assessment of machine learning of breast pathology structures
for automated differentiation of breast cancer and high-risk proliferative lesions. JAMA Netw Open
2019; 2: e198777.
13. Alexander Rakhlin, Alexey Shvets, Vladimir Iglovikov, Alexandr A. Kalinin, Deep Convolutional Neural
Networks for Breast Cancer Histology Image Analysis, bioRxiv 259911; 2018
14. Colling R, Pitman H, Oien K, et al. Artificial intelligence in digital pathology: a roadmap to routine use
in clinical practice. J Pathol 2019; 249: 143–50.
15. Rakha EA, Toss M, Shiino S, Gamble P, Jaroensri R, Mermel CH, Chen PC. Current and future
applications of artificial intelligence in pathology: a clinical perspective. J Clin Pathol. 2020
16. Jahn SW, Plass M, Moinfar F. Digital Pathology: Advantages, Limitations and Emerging Perspectives.
J Clin Med. 2020
17. Benjamin Moxley-Wyles, Richard Colling, Clare Verrill, Artificial intelligence in pathology: an overview,
Diagnostic Histopathology, Volume 26, Issue 11, 2020.
18. Cui, M., Zhang, D.Y. Artificial intelligence and computational pathology. Lab Invest 101, 412–422
(2021)
19. Zhou, Xiaomin et al. “A Comprehensive Review for Breast Histopathology Image Analysis Using
Classical and Deep Neural Networks.” IEEE Access 8 (2020): 90931-90956.
20. D. Steiner, R. MacDonald, Y. Liu, and et al. Impact of Deep Learning Assistance on the
Histopathologic Review of Lymph Nodes for Metastatic Breast Cancer. The American Journal of
Surgical Pathology, 42(12):1636–1646, 2018
21. Coudray, N., Tsirigos, A. Deep learning links histology, molecular signatures and prognosis in cancer.
Nat Cancer 1, 755–757 (2020)
22. Whitney J, Corredor G, Janowczyk A, et al. Quantitative nuclear histomorphometry predicts oncotype
DX risk categories for early stage ER+ breast cancer. BMC Canc 2018;18(1):610.
23. Naik, N., Madani, A., Esteva, A. et al. Deep learning-enabled breast cancer hormonal receptor status
determination from base-level H&E stains. Nat Commun 11, 5727 (2020).
24. Ghadi, Abderrahim & Saoud, Hajar & Ghailani, M.. (2019). Proposed approach for breast cancer
diagnosis using machine learning. 10.1145/3368756.3369089.
25. Asri, Hiba et al. “Using Machine Learning Algorithms for Breast Cancer Risk Prediction and
Diagnosis.” ANT/SEIT (2016).
26. BreakHis: Breast Cancer Histopathological Database BreakHis, (2015).
Available: http://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis
27. Bioimaging 2015 Breast Histology Classification Challenge, (2015).
Available: https://rdm.inesctec.pt/dataset/nis-2017-003.
28. TUPAC: The Tumor Proliferation Assessment Challenge 2016, (2016).
Available: http://tupac.tue-image.nl/
29. Camelyon 2016: Camelyon Grand Challenge 2016, (2016).
Available: https://camelyon16.grand-challenge.org/Data/.
30. BACH: The Grand Challenge on BreAst Cancer Histology images, (2018).
Available: https://iciar2018-challenge.grand-challenge.org/.
31. Spanhol, F.A.; Oliveira, L.S.; Petitjean, C.; Heutte, L. Breast cancer histopathological image
classification using Convolutional Neural Networks. In Proceedings of the 2016 International Joint
Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016.
32. T. Araújo, G. Aresta, E. Castro, J. Rouco, P. Aguiar, C. Eloy, A. Polónia, and A. Campilho,
“Classification of breast cancer histology images using convolutional neural networks,” PloS one,
vol. 12, no. 6, p. e0177544, 2017.

Table
Due to technical limitations, table 1 xlsx is only available as a download in the Supplemental Files
section.

Figures

Figure 1

Examples of breast histopathological images in our dataset: (A) normal; (B) benign; (C) in situ carcinoma;
and (D) invasive carcinoma (Hematoxylin-Eosin stain, original magnification x200)

Figure 2

An overview of the proposed methodology. Figure 2A: Illustration of data augmentation: from original
image to augmented crops. Figure 2B: Illustration of the convolutional neural network: from input to
output image.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.

table.xlsx
