
Fine-Tuning and Training of DenseNet for Histopathology Image Representation Using TCGA Diagnostic Slides

Abtin Riasatian1, Morteza Babaie1, Danial Maleki1, Shivam Kalra1, Mojtaba Valipour2, Sobhan Hemati1, Manit Zaveri1, Amir Safarpoor1, Sobhan Shafiei1, Mehdi Afshari1, Maral Rasoolijaberi1, Milad Sikaroudi1, Mohd Adnan1, Sultaan Shah3, Charles Choi3, Savvas Damaskinos3, Clinton JV Campbell4, Phedias Diamandis5, Liron Pantanowitz6, Hany Kashani1, Ali Ghodsi2,7, and H.R. Tizhoosh1,7

1 Kimia Lab, University of Waterloo, 200 University Ave. W., Waterloo, ON, Canada
2 School of Computer Science, University of Waterloo, 200 University Ave. W., Waterloo, ON, Canada
3 Huron Digital Pathology, 1620 King Street North, St. Jacobs, ON, Canada
4 Department of Pathology and Molecular Medicine, McMaster University, Hamilton, Canada
5 Laboratory Medicine and Pathobiology, University of Toronto, ON, Canada
6 Department of Pathology, University of Pittsburgh Medical Center, PA, USA
7 Vector Institute, 661 University Ave Suite 710, Toronto, ON, Canada

arXiv:2101.07903v1 [eess.IV] 20 Jan 2021

Abstract—Feature vectors provided by pre-trained deep artificial neural networks have become a dominant source for image representation in recent literature. Their contribution to the performance of image analysis can be improved through fine-tuning. As an ultimate solution, one might even train a deep network from scratch with the domain-relevant images, a highly desirable option which is generally impeded in pathology by the lack of labeled images and the computational expense. In this study, we propose a new network, namely KimiaNet, that employs the topology of the DenseNet with four dense blocks, fine-tuned and trained with histopathology images in different configurations. We used more than 240,000 image patches with 1000×1000 pixels acquired at 20× magnification through our proposed "high-cellularity mosaic" approach to enable the usage of weak labels of 7,126 whole slide images of formalin-fixed paraffin-embedded human pathology samples publicly available through The Cancer Genome Atlas (TCGA) repository. We tested KimiaNet using three public datasets, namely TCGA, endometrial cancer images, and colorectal cancer images, by evaluating the performance of search and classification when the corresponding features of different networks are used for image representation. As well, we designed and trained multiple convolutional batch-normalized ReLU (CBR) networks. The results show that KimiaNet provides superior results compared to the original DenseNet and smaller CBR networks when used as a feature extractor to represent histopathology images.

Index Terms—Histopathology, Deep Learning, Transfer Learning, Image Search, Image Classification, Deep Features, Image Representation, TCGA

I. INTRODUCTION

Conventional light microscopy is a well-established technology with centuries of history. The adoption of digital pathology, which replaces the microscope with a digital scanner and computer monitor, has gained momentum in recent years. The digitization of whole slide images (WSIs) offers many advantages such as more efficient workflows, easier collaboration and telepathology, and new biological insights into histopathology data through the usage of image processing and computer vision algorithms to detect relevant clinicopathologic patterns (Gurcan et al., 2009; Tizhoosh and Pantanowitz, 2018).

The recent progress in machine learning, particularly deep learning, provides a major argument for proponents of modern pathology to justify the benefits of going digital. Pre-trained, fine-tuned and de novo trained deep networks are being proposed for diverse classification, prediction and retrieval tasks. However, processing WSIs, with or without machine learning, has its own challenges (Madabhushi, 2009). WSIs of histopathology are large files containing complex histologic patterns. Hence, compact and expressive image representation, which is fundamental to image analysis in pathology, presents many challenges. Handcrafted features, i.e., image descriptors that have been manually designed based on general image processing knowledge, have been in use as a solution for several decades (Jegou et al., 2011). Many studies, however, have shown that deep features, i.e., high-level embeddings in a properly trained deep network, can outperform handcrafted features in most applications (Kumar et al., 2017). As a result, many different convolutional architectures have been trained and introduced to provide features either directly or through transfer learning (Shin et al., 2016; Mormont et al., 2018). This has created new questions in computer vision research, and consequently in medical image analysis, as to which network topology is most suitable for a given task. Is transfer learning sufficient to solve specific problems? What challenges and benefits does training an entire network from scratch entail? (Niazi et al., 2019; Yu et al., 2016; Burt et al., 2018).

Manuscript submitted for publication on December 31, 2019. Corresponding authors: Morteza Babaie (email: [email protected]), H.R. Tizhoosh (email: [email protected]).
In this work, we focus on one of the most successful convolutional topologies in the literature, namely the DenseNet (Huang et al., 2017; Zhang et al., 2019a,b). We fine-tuned and trained the DenseNet architecture with a large number of histopathology patches at 20 times magnification (20×) and compare its search and classification performance, as a feature extractor, against the original DenseNet trained with 1.2 million natural images from ImageNet (Deng et al., 2009). While it is generally expected that fine-tuning and training should deliver more accurate results than transfer learning, the data and experimental challenges inherent in this may easily prevent the expected effects (Glorot and Bengio, 2010; Najafabadi et al., 2015; Srivastava et al., 2015).

In the following sections, we briefly review the relevant literature. We describe how we fine-tuned and re-trained a network, using a large public dataset, which we have named KimiaNet. We report the results of applying KimiaNet on three public datasets for search and classification. We provide details of all experiments to demonstrate the superiority of the KimiaNet features over the original DenseNet features. As well, we report the details of four smaller networks that we designed and trained for bench-marking against KimiaNet.

II. LITERATURE REVIEW

The literature on the applications of deep learning in digital pathology is diverse and contains a multitude of approaches (Janowczyk and Madabhushi, 2016; Niazi et al., 2019; Campanella et al., 2019). As we are focused on image representation, in this section we only review recent works that have used, fine-tuned or trained deep networks for different purposes in digital pathology (Janowczyk and Madabhushi, 2016).

Pre-Trained Networks – Two early examples of off-the-shelf feature extractors are Overfeat (Sharif Razavian et al., 2014) and DeCaf (Donahue et al., 2014), used successfully in breast cancer classification by capturing and combining different fully connected (FC) layers (Spanhol et al., 2017). Combining Inception (V3) features (extracted from multi-magnification pathology images) with a fully connected layer has been used for binary classification in breast cancer metastasis analysis (Liu et al., 2017).

Feature selection from pre-trained deep networks has also been performed in many pathology tasks like HEp-2 protein classification (Phan et al., 2016). Using features from pre-trained networks on pathology domains is another way to classify pathology problems (Mormont et al., 2018). For instance, features from DeepLoc (Almagro Armenteros et al., 2017) have been used to classify protein subcellular localization (Kraus et al., 2017). As well, deep features from pre-trained networks have been successfully used for image search (Kalra et al., 2019a,b).

Fine-Tuned Networks – It is generally expected that fine-tuning should increase the accuracy of a pre-trained network or the expressiveness of its features in pathology image classification (Kieffer et al., 2017). Faust et al. (2018) fine-tuned the last two convolutional blocks of a VGG-19 model initialized with ImageNet weights. More specifically, they utilized annotated patches of size 1024 by 1024 at 20× magnification from a neuropathology dataset with 13 distinct classes for transfer learning. Eventually, they reported a 95% accuracy for the classification task. Faust et al. (2019) also fine-tuned a VGG-19 model utilizing an extended version of the latter dataset with 74 distinct classes annotated employing the World Health Organization (WHO) tumour classification scheme. Ultimately, they reported a training validation accuracy of 66% over the images spanning the 74 trained classes.

Trained Networks – Coudray et al. (2018) trained an Inception V3 model on 1,176 lung WSIs from the TCGA dataset. They extracted patches of size 512×512 pixels from Lung Adenocarcinoma (LUAD), Lung Squamous Cell Carcinoma (LUSC), and healthy lung tissue slides. Subsequently, they predicted the diagnosis associated with any given WSI by assessing the majority vote among the patch-level predictions of all tiles of the same slide. Finally, they reported around 97% accuracy for both 20× and 5× magnifications.

Fu et al. (2019) employed transfer learning on 17,396 fresh-frozen tissue images from the TCGA dataset. Their dataset contained 42 classes, including 28 tumour types and 14 healthy tissues. After removing the non-informative tiles with minimal gradient magnitude, they fine-tuned an Inception V4 utilizing 6.5 million patches of size 512×512 pixels at 20× magnification extracted from the training set WSIs. In the end, they evaluated the power of the learned visual representation by investigating the connection between the WSIs and genome information.

Wei et al. (2019) trained a ResNet architecture using five pathological and benign patterns of lung parenchyma, annotated by three pathologists. They also reported a high level of agreement between their model outcome and the pathologists' final diagnoses.

Campanella et al. (2019) suggested a framework based on multiple instance learning and deep neural networks to deliver WSI-level diagnoses while avoiding time-consuming and expensive annotations. First, they trained several ResNet models on different subsets of a dataset made of 44,732 private slides to learn visual representations at both 20× and 5× magnifications. The dataset included prostate cancer, basal cell carcinoma, and metastatic breast cancer involving the lymph nodes. Next, they passed a subset of the most suspicious tiles from each WSI through a Recurrent Neural Network (RNN) to produce the WSI-level prediction. Bilaloglu et al. (2019) trained a deep convolutional neural network called PathCNN on a dataset comprised of lung cancer, kidney cancer, breast cancer, and non-neoplastic tissues from the TCGA repository. In particular, they extracted regions of size 512 × 512 pixels at 20× magnification and then downsampled them to 299 × 299 pixel patches. Similar to previous studies, they predicted the WSI-level diagnosis based on the aggregation of the patch-level predictions.

There are a multitude of other possible approaches to customizing deep solutions for histopathology images. Multi-instance learning (Campanella et al., 2019), teacher-student learning (Watanabe et al., 2017), and visual dictionaries (Zhu et al., 2018) are among the most investigated, to mention a few. Here, we are mainly focused on the training and fine-tuning of the most commonly used deep topologies.
As the literature shows, we still need to investigate the effect of fine-tuning and training from scratch on the expected performance improvement, especially when using deep embeddings as image features for various tasks. This should ideally be performed using a large public dataset with raw heterogeneous cases not particularly curated for training, hence easily repeatable on many other repositories. As well, images should be processed at high magnification (e.g., 20× or higher) and with large enough patches (e.g., 500 × 500 µm², roughly 1000 × 1000 pixels) to model and cover the workflow for most diagnostic cases. We will investigate the fine-tuning and training of such a network and test its features for search and classification using three public datasets.

Although proposing new architectures and learning algorithms has been the main vehicle of progress within the AI community in recent years, customizing existing topologies is absolutely necessary and may still not be easily possible due to both data and computational challenges. The TCGA repository, for instance, is large and publicly available. However, due to the absence of pixel-level and regional labels for its gigapixel files, it is not readily available for training deep networks. Solving practical challenges like this to exploit the discriminative power of deep networks appears to also be a valuable way of knowledge creation. The contribution of the proposed high-cellularity mosaic as a whole-slide image representation is that it enables the usage of unlabelled image data by providing an implicit regional annotation, i.e., the mosaic patches of high cellularity.

Our main contribution is to propose a specialized patch selection method to create a high-cellularity collection – called cellMosaic – to enable the usage of weak WSI-level labels of a public multi-organ cancer image archive at high magnification, with high-resolution patches and no downsampling, for the training of a densely connected network. The proposed KimiaNet can then be employed for feature extraction in histopathology.

III. KIMIANET - DATA AND TRAINING

We name our network "KimiaNet"¹ as we are convinced that the actual information hidden in big image data can only be extracted through extensive fine-tuning, or better, training from scratch. As there has been extensive research on network topologies, we have chosen a dependable network, called DenseNet, as the basis for our investigations.

Customizing well-established architectures for specific and sensitive tasks not only appears to be justified but also seems to be necessary for the sake of application-oriented categorization and end-user awareness. One example is certainly "CheXNet", a fine-tuned DenseNet using x-ray images (Rajpurkar et al., 2017).

Based on the observation that convolutional networks can be very deep, yet more accurate, and still efficient to train "if they contain shorter connections between layers close to the input and those close to the output", Gao Huang and his colleagues introduced the densely connected convolutional networks (Huang et al., 2017). The network consists of several dense blocks with preceding convolutional and pooling layers (Figure 1). Most recent works refer to the DenseNet topology as a reliable candidate solution for image representation in histopathology (Campanella et al., 2019). Besides the popularity of DenseNet (Lee et al., 2017; Liu et al., 2018), we have already experimented with its features in our previous works (Babaie and Tizhoosh, 2019; Kalra et al., 2019a,b). In addition, compared to networks benchmarked through top-10 matching evaluation on ImageNet, DenseNet is certainly a compact architecture with a smaller footprint; the size of the DenseNet-121 topology with almost 7M parameters amounts to almost 10% of EfficientNet-B7 (66M parameters) and 0.8% of FixResNeXt-101 32×48d (829M parameters)².

A. Public Image Datasets

It is paramount to use public data such that results are reproducible by other researchers. We downloaded and used three public image datasets: pan-cancer images from The Cancer Genome Atlas (TCGA) repository (≈ 33,000 WSIs for 32 primary diagnoses), endometrial cancer images (≈ 3,300 patches from 4 classes), and colorectal cancer images (5,000 patches from 8 classes). Whereas TCGA provides WSIs with a primary diagnosis for the entire image, both the colorectal and endometrial datasets contain labelled patches. We used most TCGA images for training and some for testing; both the colorectal and endometrial datasets were exclusively used for testing.

1) TCGA Images: The TCGA repository (i.e., Genomic Data Commons, GDC³) with 30,072 WSIs is a publicly available repository (Gutman et al., 2013; Tomczak et al., 2015; Cooper et al., 2018). We recorded 29,120 fully readable WSI files at 20× magnification (approximately 6 terabytes in compressed form) to prepare the dataset for training. Although 40× magnification images were also available in many cases, we used 20× magnification to maximize the size of the dataset. The dataset contains 25 anatomic sites with 32 cancer subtypes. Brain, endocrine, gastrointestinal tract, gynecological, hematopoietic, liver/pancreaticobiliary, melanocytic, prostate/testis, pulmonary, and urinary tract had more than one primary diagnosis such that they could be used for subtype classification. Of the 29,120 WSIs, 26,564 specimens were neoplasms, and 2,556 were non-neoplastic. A total of 17,425 files comprised of frozen section digital slides were removed from the dataset due to their lower quality. We did not use frozen sections because the freezing artefacts in these images can "confound routine pathological examination or image analysis algorithms" (Cooper et al., 2018). We kept 11,579 permanent hematoxylin and eosin (H&E) sections for training and testing. We did not remove manual pen markings from the slides when present. This pre-selection of TCGA images will be further refined to assemble the training, validation and testing datasets (see Section III-C).

1 The Persian/Arabic word kīmiyā and its Greek version khēmeía appear to originate from the Coptic word kēme (meaning Egypt) and are believed to be the root of the word "chemistry", associated with the alchemy that tried to purify metals and convert them to gold.
2 https://sotabench.com/benchmarks/image-classification-on-imagenet
3 https://portal.gdc.cancer.gov/
Fig. 1. DenseNet architecture of KimiaNet: an input patch passes through a convolutional and a pooling layer, then four dense blocks (Dense Block 1 to Dense Block 4), a final pooling layer and a linear layer.

TABLE I
ENDOMETRIUM DATASET (SUN ET AL., 2019)

Class                              Number of patches
Normal Endometrium (NE)            1,333
Endometrial Polyp (EP)             636
Endometrial Hyperplasia (EH)       798
Endometrial Adenocarcinoma (EA)    535

2) Endometrium dataset: Recently, the endometrium dataset was introduced to compare the classification ability of a deep learning method (HIENet) against four experienced pathologists (Sun et al., 2019). In this dataset, there are four classes of endometrial tissue, namely normal, endometrial polyp, endometrial hyperplasia, and endometrial adenocarcinoma. Table I shows the class distribution of all 3,302 images in the endometrium dataset. Patches of size 640 × 480 pixels are extracted from 20× or 10× magnification WSIs and saved as JPEG files⁴. Although all slides have been prepared and scanned at the same hospital, considerable stain variation can be observed across the endometrium patches.

3) Colorectal Cancer Dataset: One of the first digital pathology classification datasets, the colorectal cancer images (Kather et al., 2016), consists of 5,000 samples in 8 classes with 625 small patches (150×150 pixels) in each class. The labels in this dataset are tumour epithelium, simple stroma, complex stroma, immune cells, debris, normal mucosal glands, adipose tissue and background patches.

Figure 2 shows sample images for each of the three datasets.

Fig. 2. Sample patches from the three datasets: TCGA repository (top two rows with the 1st row for good quality and 2nd row for low quality samples), endometrial cancer (3rd row), and colorectal cancer (bottom row).

B. Processing Unlabelled Big WSI Data

Due to the large size of digital pathology images, representing WSI files is still an obstacle for many tasks in computational pathology (Tizhoosh and Pantanowitz, 2018). Most approaches therefore focus on patch processing. The TCGA data only provides WSIs such that patches have to be extracted. As well, the images are not labelled in the common sense. That means there is no manual delineation of regions of interest (i.e., malignant pixels); TCGA WSIs are associated with a primary diagnosis for the entire image, which may also contain healthy tissue. This makes the creation of a dataset of patches somewhat difficult, as we need to feed labelled patches (i.e., small sub-images) into a deep network.

Yottixel is a recently proposed image search engine for histopathology (Kalra et al., 2019a,b). We used a modified Yottixel indexing followed by post-processing to extract labelled patches from the TCGA dataset (see Algorithm 1). Yottixel assembles a "mosaic" of each WSI at magnification mI through patching and clustering at magnification mC, with patches of size l × l grouped into nC clusters. The mosaic is a rather small collection of patches, i.e., p percent of the image area, at magnification mI to represent the entire WSI. Here, the main difference with the original Yottixel mosaic is the function cellMosaic (line 13, Algorithm 1) that modifies the mosaic M into a new mosaic M′ by removing all patches with low cellularity, i.e., by keeping the top TCell percent of the cellularity-sorted patches. Based on the assumption that many high-grade carcinomas have higher cellularity levels compared with healthy tissue, M′ enables us to use the WSI label (i.e., the primary diagnosis) for all remaining patches of each WSI, hence making the TCGA data usable for training a network.

4 Download: https://doi.org/10.6084/m9.figshare.7306361.v2
Fig. 3. A WSI and its selected mosaic patches (left), Yottixel mosaic with 80 patches (middle), modified cellMosaic with 16 patches (right).

Algorithm 1 Modified Yottixel Algorithm
1: mI ← 20× ▷ Magnification for indexing
2: mC ← 5× ▷ Magnification for clustering
3: l ← 1000 ▷ Patch size l × l at mI
4: nC ← 9 ▷ Number of clusters at mC
5: p ← 15% ▷ Mosaic percentage
6: TCell ← 20% ▷ Top cases among sorted cellularity
7: A ← readWSI(fileName) ▷ Read an image
8: procedure YottixelIndex(A, mI, mC, l, nC, p, TCell)
9:   S ← Segment(A, mC) ▷ Separate tissue/background
10:  P ← Patching(A, S, mI, l) ▷ Get all patches
11:  C ← KMeansCluster(P) ▷ Cluster patches
12:  M ← getMosaic(C, p, A) ▷ Select a mosaic
13:  M′ ← cellMosaic(M, TCell) ▷ Keep cell patches
14:  F ← Network(M′) ▷ Get features
15:  return F ▷ Set of features for A
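To make the selection logic concrete, a minimal Python sketch of Algorithm 1 follows, operating on patches already extracted from the segmented tissue region. The mean-RGB descriptor and the cellularity and network callables are simplified, hypothetical stand-ins for the components described in the text, not the authors' released implementation.

# A minimal, illustrative sketch of Algorithm 1's mosaic selection.
# `cellularity` maps a patch to a float and `network` maps a patch to a
# feature vector; both are stand-ins for the routines described here.
import numpy as np
from sklearn.cluster import KMeans

def cell_mosaic_features(patches, cellularity, network,
                         n_c=9, p=0.15, t_cell=0.20):
    """patches: list of HxWx3 uint8 arrays; returns one feature per kept patch."""
    # Cluster patches by a simple color descriptor (mean RGB) as a
    # stand-in for Yottixel's clustering at low magnification.
    desc = np.stack([q.reshape(-1, 3).mean(axis=0) for q in patches])
    labels = KMeans(n_clusters=n_c, n_init=10).fit_predict(desc)
    # Mosaic: take p percent of the patches of every cluster.
    mosaic = []
    for c in range(n_c):
        members = [q for q, lb in zip(patches, labels) if lb == c]
        mosaic.extend(members[: max(1, int(p * len(members)))])
    # cellMosaic (line 13): keep the top t_cell percent by cellularity.
    mosaic.sort(key=cellularity, reverse=True)
    kept = mosaic[: max(1, int(t_cell * len(mosaic)))]
    return [network(q) for q in kept]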
C. Training Data

To create the dataset, we first eliminated frozen sections so that only permanent section diagnostic slides were left; the low quality of frozen section images might negatively affect training. In order to create a versatile dataset, we divided the data into groups with the most detailed labels, each unique group being specified by a combination of the 'morphology', 'primary diagnosis' and 'tissue or organ of origin' labels. Then, we removed the groups with fewer than 20 cases so that the dataset can be used at the most detailed level by specifying the label of each class with the mentioned combination; hence, each class has at least 2 test cases (ten percent of 20). For example, one of the deleted groups was ['8020/3', 'Carcinoma, undifferentiated, NOS', 'Tail of pancreas'], which had only one case. As a result of this process, 2 of the 32 classes of primary diagnoses were removed: the UCEC (Uterine Corpus Endometrial Carcinoma) class due to the morphology information not being reported at the time of creating the dataset, and the DLBC (Lymphoid Neoplasm Diffuse Large B-cell Lymphoma) class for not having any detailed groups with at least 20 cases. We used a tumour type categorization based on established literature (Cooper et al., 2018). For instance, LUAD (lung adenocarcinoma), LUSC (lung squamous cell carcinoma) and MESO (mesothelioma) would all fall under "Pulmonary Tumours". This is particularly useful for the evaluation of horizontal search. The test and validation datasets were chosen by randomly selecting from 10% of the WSIs within each class. The rest of the slides were assigned to the training dataset. All cases that did not contain diagnostic or morphological information were removed from all datasets. This resulted in a test dataset of 777 slides, a validation dataset of 776 slides and a training dataset of 7,375 diagnostic slides. All three sets are disjoint. In particular, we selected both the test and validation sets to only consist of patients with a single diagnostic slide. As well, WSIs with no magnification information or with magnification lower than 20× were removed. This led to the creation of a test dataset of 744 slides, a validation dataset of 741 slides and a training dataset of 7,126 slides (a total of 8,611 WSIs). Extracting 500µm × 500µm patches at 20× finally resulted in 1,198,118 patches for training, 121,801 patches for validation, and 116,088 patches for testing.

Pathologists generally use different patch sizes and samples to find different types of information. For instance, 10× is used for gross features and infiltrates, 20× for more detailed histology patterns, and 40× for fine nuclear and cellular details. Both the magnification level and the patch size are set based on empirical evidence and computational convenience (some works have used similar settings as in this work, e.g., see Faust et al. (2018)). The algorithms proposed in this work can be run for any magnification and any patch size if the desired histologic features are apparent and/or the computational resources are available.
The dataset was still not suitable for training, as the WSIs were labeled with a diagnosis, and not the patches; however, we intended to train the network at the patch level. The patch dataset included multiple healthy/benign patches associated with a diseased WSI. This would perhaps confuse any deep network during training. As carcinomas are generally associated with uncontrolled cell growth, mainly embodied in areas with an unusually high presence of cell nuclei (e.g., small cell carcinoma is extremely hypercellular), the cellularity of patches can be used to eliminate most benign/healthy patches (Travis, 2014)⁵. We chose hypercellularity to automate patch selection as this is one of the principal hallmark features of cancer that spans most neoplasms. Cellularity can be used as an initial filter to select patches with a higher probability of malignancy. However, we realize that reactive or inflammatory tissue features may in some cases also be included in this set. While we do recognize that non-neoplastic cell types may show hypercellular regions, we feel these caveats are outweighed by the superior automation provided by this approach. Similar approaches have been used recently when patches with minimal gradient magnitudes were eliminated (Fu et al., 2019).

Hence, we measured the cellularity of each patch. This was done by first deconvolving the patch color from RGB to hematoxylin and eosin channels using color deconvolution (Onder et al., 2014)⁶. Then a binary mask was created from the hematoxylin channel using a constant threshold, set empirically, to get the cellularity ratio for each patch (Figure 4). Finally, the training and validation dataset patches were each sorted with respect to their ratio of cellularity (the number of pixels with value 1 in the created mask over the number of all patch pixels). The top TCell = 20% of patches (lines 6 and 13, Algorithm 1) with regard to their sorted ratio were selected, with the additional constraint of having a file size larger than a specific threshold (e.g., >100 KB). This resulted in final training and validation datasets containing 242,202 and 24,646 patches, from 7,126 and 741 WSIs, respectively. Hence, we are using an 80%-10%-10% split for training/validation/testing to assign as many samples as possible to the training data. Besides, the restrictions on the test data also geared the ratio toward a larger training split.

Fig. 4. Samples for cell segmentation: KIRC (nuclei ratio 13%), GBM (nuclei ratio 41%), STAD (nuclei ratio 61%).

5 Although abnormal and disrupted tissue architecture is another major criterion to select candidate patches, this may be more challenging to accomplish compared to cellularity measurements.
6 We used the function available in the HistomicsTK library on GitHub: the RGB image is transformed into optical density space, and then projected onto the stain vectors in the columns of the stain matrix W, a 3×3 matrix containing the color vectors in columns. For two-stain images the third column is zero and will be complemented using the cross-product. For deconvolving an H&E stained image, W is set to [[0.650, 0.072, 0], [0.704, 0.990, 0], [0.286, 0.105, 0]].
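As an illustration, a minimal NumPy sketch of the cellularity measurement follows. The deconvolution uses the standard Beer-Lambert (optical density) formulation with the H&E stain matrix W quoted in footnote 6; the binary threshold of 0.5 is a hypothetical placeholder for the empirically set constant, not the value used in our experiments.

# A minimal sketch of the cellularity ratio: deconvolve an RGB patch
# into H&E channels and threshold the hematoxylin channel. The 0.5
# threshold is a placeholder for the empirically chosen constant.
import numpy as np

W = np.array([[0.650, 0.072, 0.0],
              [0.704, 0.990, 0.0],
              [0.286, 0.105, 0.0]])
W[:, 2] = np.cross(W[:, 0], W[:, 1])   # complement the empty third stain
W /= np.linalg.norm(W, axis=0)         # unit-length stain vectors

def cellularity_ratio(rgb, threshold=0.5):
    """Fraction of pixels whose hematoxylin density exceeds the threshold
    (a proxy for the density of cell nuclei)."""
    od = -np.log10((rgb.astype(float) + 1.0) / 256.0)   # optical density
    stains = od.reshape(-1, 3) @ np.linalg.inv(W).T     # H, E, residual
    hema = stains[:, 0].reshape(rgb.shape[:2])
    return float((hema > threshold).mean())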
D. Training

We ran the training/fine-tuning in several configurations to generate KimiaNet-I (last DenseNet-121 block trained), KimiaNet-II (last two DenseNet-121 blocks trained), KimiaNet-III (last three DenseNet-121 blocks trained), and KimiaNet-IV (all DenseNet-121 blocks trained).

We used the PyTorch platform to train and test the models described above. We trained each model on 4 Tesla V100 GPUs with 32GB of memory per GPU. We set the batch size to 256, 128, 128 and 64 for KimiaNet-I, KimiaNet-II, KimiaNet-III and KimiaNet-IV, respectively. Each network was trained for roughly 20 epochs. Each epoch took approximately 60, 75, 90 and 110 minutes for KimiaNet configurations I, II, III and IV, respectively. We used 30 classes of primary diagnoses as the KimiaNet output (see Table VI in the Appendix). The stopping criterion for training was a decrease in validation accuracy for three consecutive epochs. We used the Adam optimizer (Kingma and Ba, 2014) with an initial learning rate of 0.0001 and a scheduler that decreases the learning rate every 5 epochs. We set cross-entropy as the loss function to measure the performance of the classification model. For each of the models, we used the ImageNet pre-trained weights for initialization. One has to bear in mind that DenseNet has been trained with 1.2 million images for 1,000 classes; KimiaNet has been trained with 242,202 images (data ratio = 242,202/1,200,000 ≈ 0.20) for 30 classes (class ratio = 30/1,000 = 0.03).

Figure 5 shows the convergence behaviour of all four models. In most experiments, convergence was observable after 10 epochs. Clearly, the highest accuracy values were achieved when we trained KimiaNet-IV by re-training all DenseNet-121 weights. The general trend of achieving higher accuracy by fine-tuning more blocks is also visible.

IV. EXPERIMENTS

Once the training of the different versions of KimiaNet was completed, we used the three datasets to measure the generalization capability of KimiaNet's features extracted from its last pooling layer. The DenseNet features from the same layer were extracted as well for comparison. When searching through features/barcodes to find matches, we used the k-NN algorithm (with k = 3) to find the top k matched (most similar) features/barcodes. The top k matched images through search can be retrieved along with their corresponding metadata. However, we treat the search like a classifier to quantify its performance.
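A minimal PyTorch sketch of this feature extraction is given below, assuming the weights are available as a DenseNet-121 checkpoint; the checkpoint file name is hypothetical. torchvision's DenseNet applies a ReLU and global average pooling after its feature stack, which the sketch replicates to obtain the last-pooling-layer embedding.

# A minimal sketch of deep-feature extraction from the last pooling
# layer of a DenseNet-121/KimiaNet model. The checkpoint name below
# is hypothetical, not a released file.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.densenet121(pretrained=True)   # ImageNet weights, or:
# model.load_state_dict(torch.load("kimianet_weights.pth"))  # hypothetical
model.eval()

def deep_features(batch):
    """batch: (N, 3, H, W) float tensor -> (N, 1024) feature vectors."""
    with torch.no_grad():
        fmap = F.relu(model.features(batch))     # convolutional feature maps
        pooled = F.adaptive_avg_pool2d(fmap, 1)  # last pooling layer
    return pooled.flatten(1)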
Fig. 5. Training/validation accuracy for different KimiaNet configurations (training and validation curves for KN-I to KN-IV over 20 epochs).

TABLE II
3-NEAREST NEIGHBORS ACCURACY (%) FOR THE HORIZONTAL SEARCH AMONG 744 WSIS FOR THE DIFFERENTLY FINE-TUNED/TRAINED KIMIANET CONFIGURATIONS. THE LAST COLUMN SHOWS THE IMPROVEMENT OF ACCURACY (%) THROUGH KIMIANET COMPARED TO DENSENET.

Tumor Type        Patient #   DN   I    II   III  IV   diff
Brain             74          72   96   97   99   99   +27
Breast            91          53   86   87   91   91   +38
Endocrine         72          65   86   89   93   92   +28
Gastro.           88          53   74   81   80   84   +31
Gynaec.           30          13   43   40   47   57   +44
Head/neck         32          25   75   69   81   88   +63
Liver             51          43   67   69   80   88   +45
Melanocytic       28          18   57   54   75   86   +68
Mesenchymal       13          23   38   62   69   69   +46
Prostate/testis   53          57   89   91   94   96   +39
Pulmonary         86          56   83   86   85   86   +30
Urinary tract     123         59   89   89   88   89   +30

Fig. 6. Horizontal search results (accuracy, in percentage) for TCGA data: DN vs. KN-I to KN-IV for each tumour type.

So far, we have mentioned the major parameters for KimiaNet. Clearly, any solution based on deep networks has many parameters and hyperparameters that need to be adjusted (Cui and Bai, 2019). In the case of histopathology gigapixel images, additional parameters may be added for operations such as segmentation, patching and clustering (Kalra et al., 2019a).
A. TCGA Experiments: Classification through Search

To evaluate the distinctive power of the provided features, two types of experiments were performed on the testing images: 1) horizontal search, which means measuring how accurately the algorithm can find the tumour type across the entire test dataset, and 2) vertical search, which is defined as finding the right primary diagnosis of a tumour type among the slides of a specific primary site (containing different primary diagnoses). The features were "barcoded" for faster search (Tizhoosh et al., 2016; Kalra et al., 2019a). Barcoding refers to the binarization of deep features based on their point-to-point changes: an increase is encoded as 1 and a decrease as 0, i.e., going from a to b is encoded as 1 whereas going from b to a is encoded as 0 when a < b. Not only are binary operations much faster than arithmetic operations on real-valued features, but we have also already observed that encoding the gradient of deep features may even increase the matching accuracy (Kumar et al., 2018).
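As a sketch (our code, not the authors' released implementation), barcoding a feature vector and comparing two barcodes can be written as:

# A minimal sketch of feature barcoding and Hamming matching:
# consecutive increases are encoded as 1, decreases as 0 (ties are
# encoded as 1 here), and barcodes are compared by Hamming distance.
import numpy as np

def barcode(features):
    """Binarize a deep-feature vector via its point-to-point changes."""
    return (np.diff(features) >= 0).astype(np.uint8)

def hamming(b1, b2):
    return int(np.count_nonzero(b1 != b2))

# Example: 1024-d feature vectors yield 1023-bit barcodes.
f1, f2 = np.random.rand(1024), np.random.rand(1024)
print(hamming(barcode(f1), barcode(f2)))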
We used the k-nearest neighbors algorithm (k-NN) with the Hamming distance to compare the barcoded features of individual patches. For horizontal search the classification accuracy was used, whereas for vertical search we calculated the F1 scores. The results for horizontal search are reported in Table II and Figure 6.

Analysis of Horizontal Search – As Table II shows, KimiaNet improves the search accuracy in all cases (on average by 41%±14%). A Kolmogorov-Smirnov test of normality delivered a test statistic of D = 0.25079 for DenseNet (p = 0.27417) and D = 0.28253 for KimiaNet-IV (p = 0.2433). Hence, both distributions did not differ significantly from a normal distribution. DN showed an average accuracy of 44.8% (std = 19.9%) whereas KN-IV delivered an average accuracy of 85.4% (std = 11.6%). With a 68% improvement, melanocytic malignancies benefited the most from the KimiaNet features. However, the performance for head and neck (63%) and mesenchymal (46%) cases has also shown a substantial increase over the average improvement. Although brain has the lowest improvement (27%), its search accuracy reaches 99% with KimiaNet compared to 72% with DenseNet.

For cancer subtyping, we ran the vertical search, where we confined the search to each tumour site to extract the correct diagnosis for each primary site. WSIs were recognized through the "median-of-min" approach: the minimum Hamming distance for each patch of the query mosaic was calculated when compared to all patches of other WSIs. The median value of all minimum distances was taken as the matching score for the query WSI. Figure 7 shows sample queries and results for both DenseNet and KimiaNet. Examination of the t-SNE visualization, applied to the test images, showed that KimiaNet features have superior class discrimination (Figure 8). Detailed results are reported in Table III. Here, we used the F1-measure to account for both sensitivity and specificity.
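A short sketch of the "median-of-min" matching rule, under the assumption that each WSI is represented by the list of barcodes of its mosaic patches:

# A minimal sketch of the "median-of-min" WSI matching score: for
# every patch barcode of the query mosaic, take the minimum Hamming
# distance to the candidate WSI's patch barcodes; the median of these
# minima is the WSI-to-WSI matching score (lower is better).
import numpy as np

def wsi_matching_score(query_barcodes, candidate_barcodes):
    mins = []
    for q in query_barcodes:
        dists = [np.count_nonzero(q != c) for c in candidate_barcodes]
        mins.append(min(dists))
    return float(np.median(mins))

# The best-matching WSI is the candidate with the smallest score;
# a 3-NN vote over these scores then yields the predicted diagnosis.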
Fig. 7. Results for two sample query WSIs: search results based on KimiaNet features (top row for each query WSI) and DenseNet features (bottom row for each query WSI) and their assigned TCGA primary diagnoses. For the query PCPG, KimiaNet retrieved PCPG, PCPG, PCPG while DenseNet retrieved PCPG, PCPG, ACC; for the query OV, KimiaNet retrieved OV, OV, OV while DenseNet retrieved CESC, OV, CESC. For TCGA project IDs see Table VI in the Appendix.

TABLE III
k-NN RESULTS FOR THE VERTICAL SEARCH AMONG 744 WSIS. THE F1-MEASURE HAS BEEN REPORTED HERE INSTEAD OF SIMPLE CLASSIFICATION ACCURACY. FOR TCGA CODES SEE TABLE VI IN THE APPENDIX.

Site              Subtype   nslides   DN   I     II   III   IV
Brain             LGG       39        71   75    82   85    81
Brain             GBM       35        77   73    80   83    81
Endocrine         THCA      51        94   98    98   99    100
Endocrine         ACC       6         25   25    20   55    44
Endocrine         PCPG      15        57   75    73   80    85
Gastro.           ESCA      14        50   73    50   83    78
Gastro.           COAD      32        65   76    75   75    76
Gastro.           STAD      30        63   77    73   84    86
Gastro.           READ      12        22   30    26   29    30
Gynaeco.          UCS       3         75   86    60   75    86
Gynaeco.          CESC      17        88   97    84   97    94
Gynaeco.          OV        10        67   89    74   95    95
Liver, panc.      CHOL      4         29   40    31   40    40
Liver, panc.      LIHC      35        86   94    87   97    96
Liver, panc.      PAAD      12        70   73    56   82    76
Melanocytic       SKCM      24        92   94    94   98    94
Melanocytic       UVM       4         0    40    40   86    67
Prostate/testis   PRAD      40        99   100   99   100   100
Prostate/testis   TGCT      13        96   100   96   100   100
Pulmonary         LUAD      38        65   73    72   69    78
Pulmonary         LUSC      43        69   74    74   75    84
Pulmonary         MESO      5         0    0     0    33    75
Urinary tract     BLCA      34        90   96    93   93    93
Urinary tract     KIRC      50        83   95    99   97    97
Urinary tract     KIRP      28        77   91    91   91    91
Urinary tract     KICH      11        48   86    78   84    86

Analysis of Vertical Search – The higher discrimination power of the KimiaNet features is clear. This is not only manifested in the t-SNE visualization (Figure 8) but also quantified in the F1-measures (Table III). For all subtypes, the F1 score of KimiaNet is higher than that of DenseNet.

B. Endometrium Data Experiments: Classification

To verify the performance of the deep features of KimiaNet on pathology datasets in comparison with other networks, we conducted several experiments on the endometrium images as one of the most recently released datasets. One of these experiments compares KimiaNet to another histopathology feature extractor, a fine-tuned VGG-19 trained using 838,644 human-annotated histopathologic patches spanning 74 different lesional and non-lesional tissue types (Faust et al., 2019). We extracted deep features from all patches at each network's default input size (224 × 224 pixels for DenseNet, 1000 × 1000 pixels for KimiaNet, and 1024 × 1024 pixels for the fine-tuned VGG-19). Subsequently, the features of each network were used to classify the images using SVM (Support Vector Machines). Ten-fold cross-validation with fixed folds was performed to train the best classifier in each case. The Cubic SVM outperformed all other SVM versions. As Figure 9 demonstrates, with 81.41% the KimiaNet features showed an improvement, surpassing HIENet with 76.91% and the fine-tuned VGG-19 with 76.38%. Figure 10 illustrates the confusion matrices of DenseNet and KimiaNet.
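A minimal scikit-learn sketch of this protocol is given below; reading "Cubic SVM" as a third-degree polynomial kernel is our interpretation, and the shuffled folds stand in for the fixed folds used in the actual experiments. X is the matrix of extracted deep features (one row per patch) and y holds the four endometrium labels.

# A minimal sketch of the classification protocol: deep features
# fed to an SVM with ten-fold cross-validation. Interpreting "Cubic
# SVM" as a degree-3 polynomial kernel is an assumption.
from sklearn.svm import SVC
from sklearn.model_selection import KFold, cross_val_score

def evaluate_features(X, y):
    clf = SVC(kernel="poly", degree=3)                       # "Cubic" SVM
    folds = KFold(n_splits=10, shuffle=True, random_state=0) # fixed folds
    return cross_val_score(clf, X, y, cv=folds).mean()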
TABLE IV
RESULTS FOR THE COLORECTAL CANCER DATASET.

Methods                                                Accuracy
Combined features (Kather et al., 2016)                87.40%
Fine-tuned VGG-19 on 74 classes (Faust et al., 2019)   93.58%
DenseNet                                               94.90%
KN-I                                                   96.38%
KN-IV                                                  96.80%
Ensemble of CNNs (Nanni et al., 2018)                  97.60%

C. Colorectal Data Experiments: Classification

In the last experiment, we repeated the previous experiments with the colorectal dataset. KimiaNet delivered higher accuracy than DenseNet and the fine-tuned VGG-19 (Table IV), whereas only an ensemble CNN approach provided higher accuracy. The confusion matrix for KimiaNet shows a pronounced diagonal (Figure 11).

D. High-Cellularity Mosaic

The cellMosaic is a novel approach for using weak/soft labels on valuable large datasets like TCGA that are not annotated at the pixel level. Using the cellMosaic has apparently enabled KimiaNet to outperform DenseNet for TCGA diagnostic images. As we envision using KimiaNet as a feature extractor for histopathology (and not as a classifier), the cellMosaic may also have induced a bias toward high-grade carcinomas and perhaps inflammations with high cellularity. The additional experiments with two other datasets provided confidence that KimiaNet can in fact represent histopathology patterns in a variety of single patches, and not only in a cellMosaic as for the TCGA whole-slide images. For instance, the CRC dataset contains classes like adipose, debris, normal and background with low to no cellularity at all, and the KimiaNet features were able to distinguish them from tumour epithelium with high cellularity. However, more experiments with other datasets would be beneficial to more closely define KimiaNet's application domain.

E. CBR Nets: Small versus large

According to recently published results, smaller topologies may in fact provide better or the same results for medical images as those delivered by large networks pre-trained and tested using datasets like ImageNet (Raghu et al., 2019).
Fig. 8. t-SNE visualization of randomly selected test images for DenseNet (left) and KimiaNet (right).

Fig. 9. Endometrium dataset: SVM accuracy for different deep features: fine-tuned VGG vs. DenseNet vs. HIENet vs. KimiaNet for different input sizes.

Fig. 10. Confusion matrices for DenseNet (left) and KimiaNet features (right) for the endometrium dataset. Diagonal accuracies for DenseNet: EA 88.0%, EH 77.6%, EP 64.5%, NE 81.8%; for KimiaNet: EA 88.6%, EH 82.0%, EP 71.7%, NE 84.9%.

Fig. 11. CRC dataset confusion matrix for KimiaNet features. Diagonal accuracies: AT 97.8%, BP 99.5%, CS 93.4%, De 96.5%, IC 98.1%, NMG 97.6%, SS 94.6%, TE 97.0%.

TABLE V
AVERAGE AND STANDARD DEVIATION OF THE ACHIEVED SCORE (ACCURACY FOR HORIZONTAL AND F1 SCORE FOR VERTICAL SEARCH) FOR EACH TOPOLOGY FOR TCGA IMAGES. nmax IS THE NUMBER OF TIMES THE MAXIMUM SCORE WAS ACHIEVED (OUT OF 12 AND 26 CLASSES FOR HORIZONTAL AND VERTICAL SEARCH, RESPECTIVELY) AND nweights IS THE APPROXIMATE NUMBER OF PARAMETERS OF THE TOPOLOGY.

Topology         Horizontal   nmax   Vertical   nmax   nweights
CBR Small        48±20        0      69±24      1      2 Millions
CBR Mod. Small   46±22        0      68±29      1      5.5 Millions
CBR LargeT       55±20        0      69±23      3      8.5 Millions
CBR LargeW       50±22        0      71±24      2      8.5 Millions
DN               45±20        0      64±28      0      7 Millions
KN-IV            85±12        12     81±18      23     7 Millions

Compact and simple networks made of repetitions of convolutional, batch-normalization and ReLU layers, also called CBR nets, have been found to be quite accurate, compared to larger standard ImageNet models, for applications such as retinal and x-ray image identification. Raghu et al. (2019) investigated large networks like ResNet50 (~23.5M weights) and Inception-V3 (~23M weights) and showed that, for instance, the Small CBR network (~2M weights) generated comparable results. The authors, however, did not investigate the popular DenseNet-121 (~7M weights). In order to verify our results in light of these recent experiments, we implemented and tested multiple CBR network topologies to validate against KimiaNet.

We implemented the Small, LargeT and LargeW topologies as suggested by Raghu et al. (2019) with a slight modification of adding a dense layer before the last fully connected layer⁷.

7 The performance of the CBR features was very low without the additional dense layer.
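For illustration, a hedged PyTorch sketch of a CBR network with the extra dense layer of footnote 7 follows; the channel widths, kernel size and dense-layer width are illustrative assumptions, not the exact topologies of Raghu et al. (2019).

# A minimal sketch of a CBR network: repeated blocks of convolution,
# batch normalization and ReLU, followed by the extra dense layer
# (footnote 7) before the final classification layer. Widths and
# kernel size are illustrative assumptions.
import torch.nn as nn

def cbr_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True))

class CBRNet(nn.Module):
    def __init__(self, widths=(32, 64, 128, 256), num_classes=30):
        super().__init__()
        blocks, c_in = [], 3
        for w in widths:                      # one CBR block per width
            blocks.append(cbr_block(c_in, w))
            c_in = w
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.dense = nn.Linear(widths[-1], 512)   # extra dense layer
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        return self.classifier(self.dense(x).relu())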
We also implemented a modified Small network by adding a fifth CBR block to the Small topology. The networks were initialized with random weights and trained from scratch for around 40 epochs using the same TCGA dataset as used for KimiaNet. All settings were the same as for the training of KimiaNet, except that the learning rate was initialized with 0.003 and the batch size was 32.

Table V shows the results of the experiments for horizontal and vertical search. For horizontal search, it can be observed that KimiaNet achieves a considerably higher average accuracy, with more consistency, than all other networks, having a 30% difference with the second-best result (LargeT, with around 1.5M more parameters than KimiaNet). In addition, KimiaNet achieves the maximum accuracy for all of the classes (12 in horizontal search) while none of the other networks reached the maximum accuracy for any of the classes. There is a similar pattern for vertical search. The average of the F1 scores for KimiaNet is 10% higher than the second-best network, LargeW, which, again, has more parameters than KimiaNet. KimiaNet's low standard deviation of F1 scores shows its stability compared to the other networks, and the number of maximum F1 scores (23 out of 26 classes) confirms its superiority over the CBR networks.

V. SUMMARY AND CONCLUSIONS

The question of image representation is generally an important topic in computer vision and becomes critical in digital pathology due to the texture complexity, polymorphism, and sheer size of WSIs. Examining recent works clearly shows that high-level embeddings in artificial neural networks are considered the most robust and expressive source for image representation. Pre-trained networks such as DenseNet that draw their discrimination power from intensive training with millions of natural (non-medical) images have found widespread usage in medical image analysis. Several attempts have been reported in the literature to fine-tune or train deep networks with histopathology images, a desirable task that is impeded by the lack of labelled image data and the need for high-performance computing devices.

In this work, we proposed KimiaNet in several fine-tuned configurations by using a clustering-based mosaic structure for image representation, modified by relying on high cellularity, in order to enable the usage of WSI-level labels in archives with no pixel-level annotations. Three public datasets were employed to generate the results. While there are several approaches to the application of neural networks to medical image analysis, the narrow focus on a single problem/approach within a given study limits guidance on how best to optimize neural networks for pathology tasks.

Our main contribution in this paper was exploiting a diverse, multi-organ public image repository like TCGA at 20× magnification to extract large patches, 1000 × 1000 pixels at high resolution, for training a densely connected network with weak labels to serve as a feature extractor. As well, we showed that fine-tuning a deep network on a sufficiently large number of histopathology images delivers better performance than a pre-trained network model. Finally, we think the publicly available data and code are valuable for the computational pathology community. The proposed high-cellularity mosaic crucially facilitated the training but may have introduced a bias toward certain histologic features at the expense of visual similarities. Future works have to carve out a comprehensive list of histopathology applications that could benefit from KimiaNet for image representation.

Acknowledgements – The authors would like to thank the Ontario government for the ORF-RE (Ontario Research Fund - Research Excellence) and NSERC (Natural Sciences and Engineering Research Council of Canada) grants that have funded this research.

Data/Code Availability – KimiaNet and the folds for the validation of all datasets will be available for download from https://kimia.uwaterloo.ca.

REFERENCES

Almagro Armenteros, J.J., Sønderby, C.K., Sønderby, S.K., Nielsen, H., Winther, O., 2017. Deeploc: prediction of protein subcellular localization using deep learning. Bioinformatics 33, 3387–3395.
Babaie, M., Tizhoosh, H.R., 2019. Deep features for tissue-fold detection in histopathology images. arXiv preprint arXiv:1903.07011.
Bilaloglu, S., Wu, J., Fierro, E., Sanchez, R.D., Ocampo, P.S., Razavian, N., Coudray, N., Tsirigos, A., 2019. Efficient pan-cancer whole-slide image classification and outlier detection using convolutional neural networks. bioRxiv, 633123.
Burt, J.R., Torosdagli, N., Khosravan, N., RaviPrakash, H., Mortazi, A., Tissavirasingham, F., Hussein, S., Bagci, U., 2018. Deep learning beyond cats and dogs: recent advances in diagnosing breast cancer with deep neural networks. The British Journal of Radiology 91, 20170545.
Campanella, G., Hanna, M.G., Geneslaw, L., Miraflor, A., Silva, V.W.K., Busam, K.J., Brogi, E., Reuter, V.E., Klimstra, D.S., Fuchs, T.J., 2019. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine 25, 1301–1309.
Cooper, L.A., Demicco, E.G., Saltz, J.H., Powell, R.T., Rao, A., Lazar, A.J., 2018. Pancancer insights from the cancer genome atlas: the pathologist's perspective. The Journal of Pathology 244, 512–524.
Coudray, N., Ocampo, P.S., Sakellaropoulos, T., Narula, N., Snuderl, M., Fenyö, D., Moreira, A.L., Razavian, N., Tsirigos, A., 2018. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nature Medicine 24, 1559.
Cui, H., Bai, J., 2019. A new hyperparameters optimization method for convolutional neural networks. Pattern Recognition Letters 125, 828–834.
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L., 2009. Imagenet: A large-scale hierarchical image database, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE. pp. 248–255.
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T., 2014. Decaf: A deep convolutional activation feature for generic visual recognition, in: International Conference on Machine Learning, pp. 647–655.
Faust, K., Bala, S., van Ommeren, R., Portante, A., Al Qawahmed, R., Djuric, U., Diamandis, P., 2019. Intelligent feature engineering and ontological mapping of brain tumour histomorphologies by deep learning. Nature Machine Intelligence 1, 316–321.
Faust, K., Xie, Q., Han, D., Goyle, K., Volynskaya, Z., Djuric, U., Diamandis, P., 2018. Visualizing histopathologic deep learning classification and anomaly detection using nonlinear feature space dimensionality reduction. BMC Bioinformatics 19, 173.
Fu, Y., Jung, A.W., Torne, R.V., Gonzalez, S., Vohringer, H., Jimenez-Linan, M., Moore, L., Gerstung, M., 2019. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. bioRxiv, 813543.
Glorot, X., Bengio, Y., 2010. Understanding the difficulty of training deep feedforward neural networks, in: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256.
Gurcan, M.N., Boucheron, L., Can, A., Madabhushi, A., Rajpoot, N., Yener, B., 2009. Histopathological image analysis: A review. IEEE Reviews in Biomedical Engineering 2, 147.
Gutman, D.A., Cobb, J., Somanna, D., Park, Y., Wang, F., Kurc, T., Saltz, J.H., Brat, D.J., Cooper, L.A., Kong, J., 2013. Cancer digital slide archive: an informatics resource to support integrated in silico analysis of tcga pathology data. Journal of the American Medical Informatics Association 20, 1091–1098.
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q., 2017. Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708.
Janowczyk, A., Madabhushi, A., 2016. Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. Journal of Pathology Informatics 7.
Jegou, H., Perronnin, F., Douze, M., Sánchez, J., Perez, P., Schmid, C., 2011. Aggregating local image descriptors into compact codes. IEEE Transactions on Pattern Analysis and Machine Intelligence 34, 1704–1716.
Kalra, S., Choi, C., Shah, S., Pantanowitz, L., Tizhoosh, H., 2019a. Yottixel–an image search engine for large archives of histopathology whole slide images. arXiv preprint arXiv:1911.08748.
Kalra, S., Tizhoosh, H., Shah, S., Choi, C., Damaskinos, S., Safarpoor, A., Shafiei, S., Babaie, M., Diamandis, P., Campbell, C.J., et al., 2019b. Pan-cancer diagnostic consensus through searching archival histopathology images using artificial intelligence. arXiv preprint arXiv:1911.08736.
Kather, J.N., Weis, C.A., Bianconi, F., Melchers, S.M., Schad, L.R., Gaiser, T., Marx, A., Zöllner, F.G., 2016. Multi-class texture analysis in colorectal cancer histology. Scientific Reports 6, 27988.
Kieffer, B., Babaie, M., Kalra, S., Tizhoosh, H.R., 2017. Convolutional neural networks for histopathology image classification: Training vs. using pre-trained networks, in: 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA), IEEE. pp. 1–6.
Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kraus, O.Z., Grys, B.T., Ba, J., Chong, Y., Frey, B.J., Boone, C., Andrews, B.J., 2017. Automated analysis of high-content microscopy data with deep learning. Molecular Systems Biology 13.
Kumar, M.D., Babaie, M., Tizhoosh, H.R., 2018. Deep barcodes for fast retrieval of histopathology scans, in: 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. doi:10.1109/IJCNN.2018.8489574.
Kumar, M.D., Babaie, M., Zhu, S., Kalra, S., Tizhoosh, H.R., 2017. A comparative study of cnn, bovw and lbp for classification of histopathological images, in: 2017 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE. pp. 1–7.
Lee, H.S., Jung, H., Agarwal, A.A., Kim, J., 2017. Can deep neural networks match the related objects?: A survey on imagenet-trained classification models. arXiv preprint arXiv:1709.03806.
Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., Pietikäinen, M., 2018. Deep learning for generic object detection: A survey. arXiv preprint arXiv:1809.02165.
Liu, Y., Gadepalli, K., Norouzi, M., Dahl, G.E., Kohlberger, T., Boyko, A., Venugopalan, S., Timofeev, A., Nelson, P.Q., Corrado, G.S., et al., 2017. Detecting cancer metastases on gigapixel pathology images. arXiv preprint arXiv:1703.02442.
Madabhushi, A., 2009. Digital pathology image analysis: opportunities and challenges. Imaging in Medicine 1, 7.
Mormont, R., Geurts, P., Marée, R., 2018. Comparison of deep transfer learning strategies for digital pathology, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 2262–2271.
Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R., Muharemagic, E., 2015. Deep learning applications and challenges in big data analytics. Journal of Big Data 2, 1.
Nanni, L., Ghidoni, S., Brahnam, S., 2018. Ensemble of convolutional neural networks for bioimage classification. Applied Computing and Informatics.
Niazi, M.K.K., Parwani, A.V., Gurcan, M.N., 2019. Digital pathology and artificial intelligence. The Lancet Oncology 20, e253–e261.
Onder, D., Zengin, S., Sarioglu, S., 2014. A review on color normalization and color deconvolution methods in histopathology. Applied Immunohistochemistry & Molecular Morphology 22, 713–719.
Phan, H.T.H., Kumar, A., Kim, J., Feng, D., 2016. Transfer learning of a convolutional neural network for hep-2 cell image classification, in: 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI), IEEE. pp. 1208–1211.
Raghu, M., Zhang, C., Kleinberg, J., Bengio, S., 2019. Transfusion: Understanding transfer learning for medical imaging, in: Advances in Neural Information Processing Systems, pp. 3342–3352.
Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., Bagul, A., Langlotz, C., Shpanskaya, K., et al., 2017. Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv preprint arXiv:1711.05225.
Sharif Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S., 2014. Cnn features off-the-shelf: an astounding baseline for recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 806–813.
Shin, H.C., Roth, H.R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D., Summers, R.M., 2016. Deep convolutional neural networks for computer-aided detection: Cnn architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging 35, 1285–1298.
Spanhol, F.A., Oliveira, L.S., Cavalin, P.R., Petitjean, C., Heutte, L., 2017. Deep features for breast cancer histopathological image classification, in: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1868–1873. doi:10.1109/SMC.2017.8122889.
Srivastava, R.K., Greff, K., Schmidhuber, J., 2015. Training very deep networks, in: Advances in Neural Information Processing Systems, pp. 2377–2385.
Sun, H., Zeng, X., Xu, T., Peng, G., Ma, Y., 2019. Computer-aided diagnosis in histopathological images of the endometrium using a convolutional neural network and attention mechanisms. IEEE Journal of Biomedical and Health Informatics, 1–1. doi:10.1109/JBHI.2019.2944977.
Tizhoosh, H.R., Pantanowitz, L., 2018. Artificial intelligence and digital pathology: Challenges and opportunities. Journal of Pathology Informatics 9.
Tizhoosh, H.R., Zhu, S., Lo, H., Chaudhari, V., Mehdi, T., 2016. Minmax radon barcodes for medical image retrieval, in: International Symposium on Visual Computing, Springer. pp. 617–627.
Tomczak, K., Czerwińska, P., Wiznerowicz, M., 2015. The cancer genome atlas (tcga): an immeasurable source of knowledge. Contemporary Oncology 19, A68.
Travis, W.D., 2014. Pathology and diagnosis of neuroendocrine tumors: lung neuroendocrine. Thoracic Surgery Clinics 24, 257–266.
Watanabe, S., Hori, T., Le Roux, J., Hershey, J.R., 2017. Student-teacher network learning with enhanced features, in: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE. pp. 5275–5279.
Wei, J.W., Tafe, L.J., Linnik, Y.A., Vaickus, L.J., Tomita, N., Hassanpour, S., 2019. Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks. Scientific Reports 9, 3358.
Yu, D., Deng, L., Seide, F.T.B., Li, G., 2016. Discriminative pretraining of deep neural networks. US Patent 9,235,799.
Zhang, J., Lu, C., Li, X., Kim, H.J., Wang, J., 2019a. A full convolutional network based on densenet for remote sensing scene classification. Math. Biosci. Eng 16, 3345–3367.
Zhang, K., Guo, Y., Wang, X., Yuan, J., Ding, Q., 2019b. Multiple feature reweight densenet for image classification. IEEE Access 7, 9872–9880.
Zhu, S., Li, Y., Kalra, S., Tizhoosh, H.R., 2018. Multiple disjoint dictionaries for representation of histopathology images. Journal of Visual Communication and Image Representation 55, 243–252.

VI. APPENDIX

TABLE VI
THE TCGA CODES (IN ALPHABETICAL ORDER) OF ALL 32 PRIMARY DIAGNOSES AND THE CORRESPONDING NUMBER OF EVIDENTLY DIAGNOSED PATIENTS IN THE DATASET (TCGA = THE CANCER GENOME ATLAS).

Code   Primary Diagnosis                                           #Patients
ACC    Adrenocortical Carcinoma                                    86
BLCA   Bladder Urothelial Carcinoma                                410
BRCA   Breast Invasive Carcinoma                                   1097
CESC   Cervical Squamous Cell Carcinoma and Endocervical Adenoc.   304
CHOL   Cholangiocarcinoma                                          51
COAD   Colon Adenocarcinoma                                        459
DLBC   Lymphoid Neoplasm Diffuse Large B-cell Lymphoma             48
ESCA   Esophageal Carcinoma                                        185
GBM    Glioblastoma Multiforme                                     604
HNSC   Head and Neck Squamous Cell Carcinoma                       473
KICH   Kidney Chromophobe                                          112
KIRC   Kidney Renal Clear Cell Carcinoma                           537
KIRP   Kidney Renal Papillary Cell Carcinoma                       290
LGG    Brain Lower Grade Glioma                                    513
LIHC   Liver Hepatocellular Carcinoma                              376
LUAD   Lung Adenocarcinoma                                         522
LUSC   Lung Squamous Cell Carcinoma                                504
MESO   Mesothelioma                                                86
OV     Ovarian Serous Cystadenocarcinoma                           590
PAAD   Pancreatic Adenocarcinoma                                   185
PCPG   Pheochromocytoma and Paraganglioma                          179
PRAD   Prostate Adenocarcinoma                                     499
READ   Rectum Adenocarcinoma                                       170
SARC   Sarcoma                                                     261
SKCM   Skin Cutaneous Melanoma                                     469
STAD   Stomach Adenocarcinoma                                      442
TGCT   Testicular Germ Cell Tumors                                 150
THCA   Thyroid Carcinoma                                           507
THYM   Thymoma                                                     124
UCEC   Uterine Corpus Endometrial Carcinoma                        558
UCS    Uterine Carcinosarcoma                                      57
UVM    Uveal Melanoma                                              80
