Abstract— Cervical cancer, as one of the most frequently diagnosed cancers worldwide, is curable when detected early. Histopathology images play an important role in the precision medicine of cervical lesions. However, few computer aided algorithms have been explored on cervical histopathology images due to the lack of public datasets. In this paper, we release a new cervical histopathology image dataset for automated precancerous diagnosis. Specifically, 100 slides from 71 patients are annotated by three independent pathologists. To show the difficulty of the task, benchmarks are obtained through both fully and weakly supervised learning. Extensive experiments based on typical classification and semantic segmentation networks are carried out to provide strong baselines. In particular, a strategy of assembling classification, segmentation, and pseudo-labeling is proposed to further improve the performance. The Dice coefficient reaches 0.7833, indicating the feasibility of computer aided diagnosis and the effectiveness of our weakly supervised ensemble algorithm. The dataset and evaluation codes are publicly available. To the best of our knowledge, this is the first public cervical histopathology dataset for automated precancerous segmentation. We believe that this work will attract researchers to explore novel algorithms for automated cervical diagnosis, thereby assisting doctors and patients clinically.

Index Terms— Cervical histopathology, classification, dataset, segmentation, weakly supervised learning.

I. INTRODUCTION

Cervical cancer is one of the leading causes of cancer death in women aged 20 to 39 years, with 10 premature deaths per week [1]. It is observed that cervical lesion is a continuous disease progressing from mild dysplasia to cervical cancer [2]. Fortunately, cervical precancerous lesions can be identified and treated clinically to reduce the risk of developing invasive cancer. Routinely, once a cervical lesion is detected through examinations of the Pap smear, human papillomavirus (HPV), and colposcopy, histopathological screening, considered the gold standard, is adopted to finalize subsequent treatments. Thus, the accuracy and efficiency of cervical histopathological screening are of vital importance. However, the giga-pixel scale of whole slide images (WSIs) places high demands on the professionalism and concentration of pathologists. Therefore, computer aided diagnosis (CAD) is in urgent demand.

Deep learning has shown great potential in CAD since the emergence of histopathology datasets such as CAMELYON16 [3] and BACH [4] for breast cancer, DigestPath [5] for colon cancer, and PAIP [6] for liver cancer. However, a recent study of CAD for pan-cancer shows that the correlation between algorithm-predicted and pathologist-estimated tumor purity is the lowest for cervical cancer among 42 tissue types [7], which highlights the distinctiveness of the cervix from other tissues and the necessity of a special study on the cervix. Nevertheless, to the best of our knowledge, there is no specially designed public histopathology dataset for CAD of cervical precancerous lesions. The scarcity of public data further hinders the development of related algorithms. Therefore, we release a new public dataset called MTCHI to help researchers without a medical background to develop and compare automated algorithms. The MTCHI dataset contains 100 cervical WSIs at 10× magnification. Specifically, 20 WSIs containing 101 regions of interest (RoIs) are provided with pixel-level annotations, and the remaining 80 have image-level annotations. Considering diagnostic subjectivity and experience, the data are annotated into four categories (i.e., normal, CIN 1, CIN 2, and CIN 3) by three independent pathologists according to the severity of cervical lesions as described in [8].
This work is supported by grants from the National Natural Science Foundation of China (NSFC, U1931202, 62076033), the Beijing Municipal Science and Technology Commission (Z201100007520001, Z131100004013036), and the BUPT Excellent Ph.D. Students Foundation (CX2019217). (Zhu Meng and Zhicheng Zhao contributed equally to this work.) (Corresponding author: Zhicheng Zhao.)
Z. Meng and B. Li are with the Beijing University of Posts and Telecommunications, Beijing, China (e-mail: [email protected]; [email protected]).
Z. Zhao and F. Su are with the Beijing University of Posts and Telecommunications, and are also with the Beijing Key Laboratory of Network System and Network Culture, Beijing, China (e-mail: [email protected]; [email protected]).
L. Guo is with the Department of Pathology, School of Basic Medical Sciences, Third Hospital, Peking University Health Science Center, Beijing, China (e-mail: [email protected]).

Automated precancerous diagnosis of cervical histopathology may encounter multiple challenges. First, the acquisition location and incision direction of the biopsied tissues determine the appearance of the cervical basement membrane in the histopathology images, leading to uncertainty in spatial morphology and placing high demands on the ability of algorithms to handle diverse data. Second, cervical carcinogenesis develops gradually from mild lesion to cancer, and the lesion grading is subjective without precise quantification criteria, which introduces considerable annotation noise. In addition, compared with tissues such as breast and colon, cervical tissues are usually strip-shaped with small areas, which makes deep models difficult to fit.
Furthermore, pixel-level annotated data are scarce, which hinders the generalization performance of fully supervised algorithms. In this paper, extensive experiments are conducted based on the analysis of image patches cropped from the WSIs, and strong baselines are provided for future algorithm comparison on the MTCHI dataset. Specifically, both fully and weakly supervised learning are considered, and the ensemble of classification, segmentation, and pseudo-labeling achieves the best accuracy.

The remainder of this paper is organized as follows: Section II introduces related datasets and deep learning algorithms. Section III describes our dataset construction and evaluation metrics. Section IV discusses the evaluated methods, including fully supervised and weakly supervised approaches. Section V presents the experiments and discussion. Section VI provides the conclusions.

II. RELATED WORK

A. Previous Datasets

1) Previous Cervical Datasets: CAD on cervical Pap smear images has attracted much attention because of public datasets. For example, the Herlev dataset [9] focuses on the segmentation and classification of the nucleus and cytoplasm of a single cell on Pap smear images. In addition, the ISBI 2014 [10] and ISBI 2015 [11] datasets are designed to extract the boundaries of individual cytoplasm and nucleus from real and synthetic overlapping cervical cytology images. These datasets pay much attention to the features of cytology, including the segmentation of nucleus and cytoplasm, the overlap and separation between cells, and the lesion grade of each cell. In contrast, cervical histopathology images contain rich information on tissue structure, concerning both histology and cytology. Therefore, a public cervical histopathology image dataset is urgently needed.

2) Previous Histopathology Datasets: CAMELYON16 is a large histopathology dataset for the detection of cancer metastasis on sentinel lymph nodes of breast cancer patients. It contains 400 WSIs with giga-pixels, attracting numerous researchers to delve into the CAD of histopathology images. PatchCamelyon [12] extracts small patches of size 96px × 96px from CAMELYON16 to pose a simple and direct binary metastasis classification task. BreaKHis [13] contains 7,909 breast cancer histopathology images (700px × 460px) acquired from 82 patients for benign and malignant classification of tumors. BreastPathQ [14] scores cancer cellularity for tumor burden assessment in 2,579 breast pathology patches (512px × 512px) at 20× magnification. BACH attracts many algorithms by promoting a detailed microscopy breast image classification (normal, benign, in situ carcinoma, and invasive carcinoma) with patches and WSIs at 20× magnification. DigestPath evaluates algorithms through signet ring cell detection from 155 patients and 872 colonoscopy tissue screening slices (3,000px × 3,000px on average) of gastric mucosa and intestine. PAIP contains 100 liver WSIs with multiple magnifications to detect and segment areas of carcinogenic cells and calculate the area of the tumor burden. In the aforementioned datasets, cells of different grades in the breast, intestine, and liver possess distinct morphological differences. The exploration on pan-cancer has shown that algorithms that perform well in other cancers encounter obstacles in cervical lesions [7]. Therefore, a public cervical histopathology image dataset is required to specifically explore the CAD of cervical precancerous lesions.

B. Related Methods

1) CAD for Cervical Histopathology Images: The CAD of cervical precancerous histopathology images strongly depends on the extraction of structural features, which is extremely challenging. Thus, some algorithms attempted to start with simple samples. For example, the algorithms in [15], [16], [17] ingeniously selected and divided simple samples into upper, middle, and lower layers parallel to the basement membrane. Each layer was further cut into several small patches to extract features such as color, texture, cell distribution, and deep learning semantic information. These features were finally fused to determine the lesion grade of the whole tissue. However, accurately cutting a sample into three layers is complicated in practical applications. Wang et al. [18] achieved basement membrane segmentation via a generative adversarial network [19], but the basement membrane is occasionally invisible because of incomplete tissue. All of the abovementioned experiments were conducted on the basis of private datasets.

2) Fully Supervised Learning for Histopathology Images: Convolutional neural networks (CNNs) have achieved remarkable results in natural image processing; accordingly, transfer learning for the automatic processing of histopathology images has become increasingly popular. The WSIs are often first cropped into small patches by a sliding window. The patches are then classified or segmented through CNNs, and the results are stitched into a diagnostic map. For example, Fu et al. [7] performed a pan-cancer computational histopathology analysis with Inception-v4 [20]. The lightweight ShuffleNet [21] was used in [22] to identify microsatellite instability and mismatch-repair deficiency in colorectal tumors. ResNet-34 [23], VGG-16 [24], and Inception-v4 were adopted in [25] to detect invasive breast cancer. Wang et al. [26] utilized GoogLeNet [27] with hard example guided training to locate tumors in breast and colon images. HookNet [28], a semantic segmentation network derived from U-Net [29], was designed to aggregate patch features of multiple resolutions for breast and lung cancer. In particular, many outstanding algorithms have been proposed since the establishment of the CAMELYON16 dataset. Liu et al. [30] achieved high accuracy with Inception-v3 [31]. Lin et al. [32] proposed ScanNet, based on a modified VGG-16 network in which the last three fully connected layers are replaced with fully convolutional layers to avoid the boundary effect of network predictions. Takahama et al. [33] extracted patch features with a classification model (GoogLeNet) and then input them into a segmentation model to obtain probability heatmaps. Guo et al. [34] proposed a similar method, but only applied the segmentation stage to the tumor regions detected by the classification model. Khened et al.
and CIN 2”, if both lesions can be found in the slide with obvious corresponding regions. Therefore, a slide has at least one label and up to four labels (i.e., normal, CIN 1, CIN 2, and CIN 3). However, cervical tissue is diverse in morphology, and the diagnoses are largely experiential. Annotator A and Annotator C work independently. The annotation differences between the two annotators are contrasted in Fig. 3. Only 51.25% of the slides are identified consistently by the two annotators, indicating the difficulty of cervical precancerous diagnosis. Of the two diagnostic results, the information from the professional Annotator C is recommended. The multi-labels for these data indicate that the slides contain the relevant regions, but do not specify the corresponding locations. When high-resolution slides are cut into small patches for processing, the label of each patch is unknown. Therefore, the 80 slides are valuable materials for unsupervised and weakly supervised algorithms.
C. Evaluation Metrics

The performance of the algorithms is evaluated against the pixel-level ground truth. Four evaluation metrics are applied to compare the automated diagnostic results with the ground truth to fairly measure the performance of the models. First, the Dice coefficient, a commonly used evaluation metric in medical image segmentation tasks, is used to measure the degree of coincidence between the prediction and the truth. The Dice coefficient is defined as

$$\mathrm{Dice} = \frac{1}{4}\sum_{i=1}^{4}\frac{2\,|P_i \cap T_i|}{|P_i| + |T_i|}, \qquad (1)$$

where $P_i$ denotes the regions predicted to be category $i$ ($i = 1, 2, 3, 4$ denotes normal, CIN 1, CIN 2, and CIN 3, respectively) and $T_i$ denotes the truth. Second, mean Intersection over Union (mIoU), a commonly used evaluation metric in natural image segmentation, is also applied in this task. The definition of mIoU is slightly different from that of the Dice coefficient. mIoU is defined as

$$\mathrm{mIoU} = \frac{1}{4}\sum_{i=1}^{4}\frac{|P_i \cap T_i|}{|P_i \cup T_i|}. \qquad (2)$$

Note that forcing the category of the entire region by counting the categories of most pixels is not recommended, because many complex regions have been removed in advance from the test images. Although this trick yields better results, it is not an optimal means for future exploration. In addition, the background is ignored in the evaluation.
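For concreteness, a minimal NumPy sketch of Eqs. (1) and (2) is given below. It assumes the prediction and ground truth are integer label maps with 0 for background (which is ignored) and 1–4 for the four categories; the handling of categories absent from a test image may differ in the released evaluation code, so this is an illustration rather than the official implementation.

```python
import numpy as np

def dice_and_miou(pred: np.ndarray, truth: np.ndarray, num_classes: int = 4):
    """Eqs. (1) and (2): mean Dice and mean IoU over the four categories.

    pred, truth: integer label maps of the same shape, with 0 for background
    (ignored in the evaluation) and 1..4 for normal, CIN 1, CIN 2, CIN 3.
    """
    valid = truth > 0                      # the background is ignored
    dice, iou = [], []
    for i in range(1, num_classes + 1):
        p = (pred == i) & valid            # P_i restricted to annotated pixels
        t = truth == i                     # T_i
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        dice.append(2.0 * inter / max(p.sum() + t.sum(), 1))
        iou.append(inter / max(union, 1))  # absent classes score 0 here
    return float(np.mean(dice)), float(np.mean(iou))
```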
IV. EVALUATED METHODS

Considering that the high resolution of WSIs exceeds the load capacity of GPUs, experiments on the MTCHI dataset are carried out by image patch analysis. The slides from the pixel-annotated training set are first cropped into small patches (400px × 400px) with an overlapping stride of 100px. Then, the patches with foreground proportions of less than 20% are discarded before training. The remaining 7,724 patches are used for training the fully supervised classification and segmentation models, as well as for extracting pseudo-labeled patches from the image-annotated slides for weakly supervised learning. The networks used for classification and segmentation are described in Section IV-A and Section IV-B. The strategy of assembling classification and segmentation is introduced in Section IV-C. The weakly supervised learning strategy for MTCHI is discussed in Section IV-D. Post-processing is presented in Section IV-E.
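As an illustration, the patch extraction step above can be sketched as follows; the binary tissue-foreground mask is assumed to be computed beforehand (e.g., by thresholding the white background), a detail not specified here.

```python
import numpy as np

def extract_patches(slide: np.ndarray, foreground: np.ndarray,
                    size: int = 400, stride: int = 100, min_fg: float = 0.2):
    """Crop size x size patches with the given overlapping stride and keep
    only those whose foreground proportion is at least min_fg (20%)."""
    kept = []
    h, w = slide.shape[:2]
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            if foreground[y:y + size, x:x + size].mean() >= min_fg:
                kept.append(((y, x), slide[y:y + size, x:x + size]))
    return kept  # list of ((top, left), patch) pairs
```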
A. Fully Supervised Classification

Popular classification networks are adopted in the fully supervised experiments, including MobileNet-v2 [42], VGG, GoogLeNet, Inception-v3, DenseNet [43], and ResNet. All classification networks are initialized with parameters pre-trained on ImageNet [44]. For a pixel-annotated patch, the classification truth is the category with the largest area except background, i.e., one of normal, CIN 1, CIN 2, and CIN 3. The learning rate is initialized to 0.001 and decreased using the cosine annealing strategy. The experimental results of the 30th epoch are stored for comparison when the cross-entropy loss (CE-loss) is used to constrain the model fitting, while those of the 50th epoch are stored when the adaptive elastic loss (AE-loss) [45] is used. The outputs of the classification networks are the confidence probabilities of the four categories, and the category with the largest confidence probability is regarded as the prediction result. Each patch corresponds to a single diagnostic result; therefore, it is necessary to fill the whole patch with that result to obtain a diagnosis heatmap of the same size as the input patch.
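The rule above for deriving the classification truth from a pixel-annotated patch can be written in a few lines; this sketch assumes masks are stored as integer label maps (0 for background, 1–4 for the categories) and is not the authors' released code.

```python
import numpy as np

def patch_label(mask: np.ndarray) -> int:
    """Classification truth of a pixel-annotated patch: the category with
    the largest area, excluding background (0). Returns 1..4 for
    normal, CIN 1, CIN 2, CIN 3."""
    counts = np.bincount(mask.ravel(), minlength=5)[1:5]  # drop background
    return int(np.argmax(counts)) + 1
```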
The structures of the classification networks are described below.

1) MobileNet-v2: MobileNet-v2 contains very few network parameters while completing approximately the same function as traditional convolutions, thereby accelerating the feedforward process. Specifically, depthwise separable convolutions are embedded instead of regular convolutions. An expansion layer is assigned before the depthwise convolution to expand the feature channels, and a projection layer is assigned after it to reduce the dimensions. The expansion, depthwise, and projection layers form a bottleneck residual block. Multiple blocks are stacked to extract patch features.

2) VGG: Convolutional layers with the same kernel size of 3 × 3 are stacked in the VGG network. The number of feature-map channels is increased step by step through the convolutional layers. The feature-map size is down-sampled five times through five max pooling layers with a stride of 2. Three fully connected layers are assigned at the end of the VGG network to obtain the classification results. In addition to the 5 max pooling layers and the 3 fully connected layers, VGG-16 contains 13 convolutional layers and VGG-19 contains 16 convolutional layers. Due to the large number of parameters, VGG has a good ability to extract features but is time-consuming.

3) GoogLeNet and Inception-v3: GoogLeNet, also known as Inception-v1, contains fewer parameters than VGG. It is composed of multiple Inception modules. An Inception module aggregates convolutional layers (1 × 1, 3 × 3, 5 × 5) and pooling operations (3 × 3) to deal with different scales. Dimension reductions and projections are judiciously applied before the convolutions with large kernel sizes. In Inception-v3, the Inception modules are improved through factorization into smaller convolutions.

4) DenseNet: DenseNet-121 and DenseNet-169 are stacked from dense blocks. In a dense block, each layer takes all preceding feature-maps as input. Each layer can access the gradient directly from the loss and the original input image, enabling implicit deep supervision.

5) ResNet: Different from VGG-19, ResNet replaces the max pooling layers, except the first one, with convolutions of stride 2. For ResNet-18 and ResNet-34, two 3 × 3 convolutional layers constitute a residual block. The input of the residual block is connected to the output of the convolutional layers to avoid gradient vanishing during training. When the networks are deeper (e.g., ResNet-50 and ResNet-101), the residual block is composed of three convolutional layers with kernel sizes of 1 × 1, 3 × 3, and 1 × 1. Multiple residual blocks are stacked to extract features.

B. Fully Supervised Segmentation

When there are multiple categories in one patch, it is an inaccurate practice to take the category with the most pixels in the patch as the label, so pixel-wise classification (i.e., semantic segmentation) is required. The segmentation networks used for experiments in this paper include FCN [46], SegNet [47], DeepLab v3+, U-Net and its variants such as ENS-UNet [48], Res-UNet [49], and UNET3+ [50], and HookNet. The learning rate is initially set to 0.02 and decreased by a factor of 0.5 after every 10 epochs. The networks are all trained with CE-loss. The output of a semantic segmentation network has the same length and width as the input image. Semantic segmentation can be regarded as pixel-level classification, and thus each pixel has a corresponding prediction result. The segmentation network structures are introduced below.

1) FCN: The fully connected layers of VGG are replaced by convolutional layers, constituting a fully convolutional network (FCN). During the feature extraction process by VGG, the feature-map size is reduced from x to 1/32x. In FCN32s, the feature-map of size 1/32x is restored to the original size through a deconvolution layer to obtain the segmentation results. FCN16s first up-samples the feature-map of size 1/32x, then sums it with the feature-map of size 1/16x, and finally recovers it to the input size through a deconvolution layer.

2) SegNet: SegNet employs an encoder-decoder structure to implement semantic segmentation. Here, VGG-16 is regarded as the encoder to extract features. The decoder consists of convolutional layers and up-sampling operations. During the max pooling in the encoder, the pooling locations are stored as indexes for the up-sampling of the decoder.

3) U-Net: U-Net was originally designed for cell segmentation, and its output size is smaller than the input size. To obtain a diagnostic heatmap with the same size as the input patch, all convolutional layers in U-Net are modified with zero padding. The encoder extracts high-dimensional semantic features by max pooling and convolutional layers, while the decoder gradually restores the feature size through convolutional and deconvolutional layers. Four sets of different-scale features in the encoder and decoder are cascaded by skip connections to improve the position accuracy of the segmentation results.

4) U-Net Variants (ENS-UNet, Res-UNet, UNET3+): ENS-UNet inserts a noise suppression block into every skip connection path of U-Net. Res-UNet replaces all convolutional layers and skip connections in U-Net with residual blocks. In the decoder of UNET3+, features are densely concatenated with all shallow features from the encoder.

5) HookNet: A U-Net-like structure without skip connections is treated as one branch of HookNet. The structures of the two branches in HookNet are exactly the same, but the input patches have different fields of view. Specifically, a 400px × 400px patch is resized to 284px × 284px as input to the context branch, and the same patch is center cropped to 284px × 284px as input to the target branch. The second layer of the decoder in the context branch is center cropped and cascaded to the first layer of the decoder in the target branch. Since the channel number in HookNet-x is elevated gradually, the output channel number x of the first convolutional layer determines the parameter quantity. Experiments on the basis of HookNet-16 and HookNet-64 are conducted in this paper.
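The preparation of the two branch inputs can be sketched as follows, using PIL for resizing; the interpolation mode is an assumption, since it is not specified here.

```python
import numpy as np
from PIL import Image

def hooknet_inputs(patch: np.ndarray, in_size: int = 284):
    """Build the two HookNet inputs from one 400x400 patch: the context
    branch receives the whole patch resized to 284x284 (wider field of
    view at lower resolution); the target branch receives a 284x284
    center crop at the original resolution."""
    context = np.asarray(
        Image.fromarray(patch).resize((in_size, in_size), Image.BILINEAR))
    h, w = patch.shape[:2]
    top, left = (h - in_size) // 2, (w - in_size) // 2
    target = patch[top:top + in_size, left:left + in_size]
    return context, target
```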
6) DeepLab v3+: DeepLab v3+ adopts ResNet as the encoder to extract features. The high-dimensional features from the end of the encoder are fed into an Atrous Spatial Pyramid Pooling (ASPP) module.
TABLE I
Results of fully supervised classification. Overlapping is better than non-overlapping for most networks. MobileNet-v2 works the fastest; ResNet-101 works the best for non-overlapping, and VGG-19 works the best for overlapping.
feature extractor M. For weakly supervised segmentation, the ground truth is the mask filled with the pseudo-label. In addition, assigning the same pseudo-label to an entire patch is coarse and yields false pixel-level labels. Thus, for the ensemble of classification and segmentation, pseudo learning is only applied to the classification branch to balance the information and noise. Considering that the assisted blocks at points B and C need more parameters, only the block at point A is used for the weakly supervised ensemble. First, the mixed training set S is used to train the ResNet backbone and the assisted block for 30 epochs. Second, the weights of the classification branch are fixed, and the remaining segmentation layers are trained using the original training set S0 for 10 epochs with a learning rate of 0.001. In the inference process, the output of the assisted block is normalized by the sigmoid function and multiplied with the segmentation output. Note that the backbone contains atrous convolutions to aggregate the information of multiple receptive fields; namely, the classification branch in the ensemble is not exactly the same as ResNet. Thus, it is retrained instead of directly loading the weights of the weakly supervised ResNet.
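In inference, this fusion reduces to a per-category gating of the segmentation output. A PyTorch-style sketch is given below under assumed tensor shapes; the exact layout in the authors' implementation is not specified.

```python
import torch

def ensemble_a_inference(seg_logits: torch.Tensor,
                         cls_logits: torch.Tensor) -> torch.Tensor:
    """Multiply the segmentation output by the sigmoid-normalized output
    of the assisted block at point A.

    seg_logits: (B, 4, H, W) per-pixel scores from the segmentation branch.
    cls_logits: (B, 4) patch-level scores from the assisted block.
    """
    gate = torch.sigmoid(cls_logits)             # normalize to (0, 1)
    return seg_logits * gate[:, :, None, None]   # broadcast over H, W
```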
E. Post-processing

The diagnosis results of patches are stitched into the entire diagnostic map. A simple method is to stitch non-overlapping patches (0% overlapping). However, such stitching is noisy due to insufficient information at the edges of each patch. Therefore, the patches are cropped with overlapping. Specifically, the network output of a pixel $\vec{P} = \langle p_1, p_2, p_3, p_4 \rangle$ denotes the confidence probabilities of $\langle$normal, CIN 1, CIN 2, CIN 3$\rangle$. When $N$ pixels overlap, the fused confidence probability is $\bar{P} = \frac{1}{N}\sum_{k=1}^{N} \vec{P}_k$. The final diagnosis result of the pixel is the category with the highest confidence probability in $\bar{P}$. The larger the overlap area is, the more context information is gathered in each pixel, but at the cost of more computing resources.
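The overlapping fusion can be sketched as a running average of patch confidence maps, assuming softmax outputs and the (top, left) origin of each patch are available:

```python
import numpy as np

def stitch_with_overlap(outputs, coords, map_shape, num_classes: int = 4):
    """Average per-pixel confidence vectors over all N overlapping patches
    and take the most confident category, as in the fusion rule above.

    outputs: list of (size, size, 4) softmax maps; coords: (top, left) origins.
    """
    h, w = map_shape
    acc = np.zeros((h, w, num_classes), dtype=np.float64)
    cnt = np.zeros((h, w, 1), dtype=np.float64)
    for (y, x), out in zip(coords, outputs):
        ph, pw = out.shape[:2]
        acc[y:y + ph, x:x + pw] += out
        cnt[y:y + ph, x:x + pw] += 1.0
    mean = acc / np.maximum(cnt, 1.0)   # the fused confidence \bar{P}
    return mean.argmax(axis=-1) + 1     # categories indexed 1..4
```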
V. RESULTS AND DISCUSSION

A. Fully Supervised Learning

1) Fully Supervised Classification: The classification networks described in Section IV-A are all adopted for the experiments on the MTCHI dataset. As shown in Table I, although the networks play to their advantages in different aspects, there are obvious differences in speed and accuracy when they are applied to cervical precancerous CAD. MobileNet-v2 runs the fastest because its floating point operations (FLOPs) are only 0.94G. Although Inception-v3 and ResNet-101 are very deep, they are still faster than VGG-19 because of their special connection modes, namely, the Inception block and the residual block. The training performance of AE-loss is better than that of CE-loss in most experiments, because AE-loss better fits the relationship between the categories of cervical precancerous lesions. Generally, overlapping post-processing achieves good results because it makes up for the lack of foreground information in a single patch. However, there are also some special cases in which overlapping post-processing decreases accuracy. For example, in Fig. 5 (b), the wrong prediction of a patch misleads the diagnosis of adjacent patches. In Fig. 5 (c), although the whole region is annotated as CIN 1 by pathologists, some regions are actually normal tissues, which is clearly displayed by 75% overlapping. Although the overlapping post-processing in (c) is closer to the real situation, it inevitably causes a loss of accuracy when compared with the ground truth.

2) Fully Supervised Segmentation: The results of the fully supervised segmentation networks are fairly compared without overlapping in Table II. Compared with U-Net, the U-Net variants (ENS-UNet, Res-UNet, and UNET3+) increase
Fig. 5. Three examples from ResNet-101 trained with CE-loss. I, T, P0, and P75 represent the input image, the ground truth, the non-overlapping result, and the 75% overlapping result, respectively. The white, green, blue, and red parts represent normal tissue and lesions of CIN 1, CIN 2, and CIN 3, respectively. (a) The 75% overlapping makes up for the lack of edge information in non-overlapping. (b) The wrong prediction results mislead the judgement of adjacent patches in overlapping post-processing. (c) The upper left corner of the sample is closer to normal tissue than to CIN 1, and the result of overlapping is closer to the real situation than the annotation.
TABLE II
Results of fully supervised segmentation and ensemble with assisted block. Ensemble-A, B, and C represent the results of the assisted block at points A, B, and C.
TABLE III
Weakly supervised classification results with 75% overlapping. Acc represents the accuracy compared with the patch-level ground truth. The results using ResNet-101 as M for extracting pseudo-labeled patches are better than those using MobileNet-v2. When ResNet-101 is adopted as M, Inception-v3 shows the best effect when all pseudo data are introduced, ResNet-101 shows the best effect when 50% are introduced, and VGG-19 shows the best effect when 25% are introduced.
represents the existence of the corresponding category in the patch, so it is coarse and improper to fill the segmentation truth with the pseudo-label. As shown in Table IV, the Dice coefficient reaches 0.674 (75% overlapping) when 50% of Sp is used for training, which is only slightly improved compared with the fully supervised segmentation result of 0.65 (non-overlapping). Therefore, pseudo-labeling is unsuitable for weakly supervised segmentation because of too much noise.

3) Weakly Supervised Ensemble: As mentioned above, pseudo-labeling is valuable in classification rather than segmentation. Thus, the pseudo dataset Sp is only used for training the classification branch in the ensemble, while the segmentation branch is still trained with the pixel-level annotated S0. Considering that the training of classification and segmentation is separate, Ensemble-B and C are difficult to converge due to more parameters and the lack of segmentation loss; thus, only Ensemble-A is adopted for weakly supervised ensemble learning. The Dice coefficient of Ensemble-A reaches 0.7833 in Table V, which is significantly higher than the 0.7559 in Table II. Therefore, image-level annotated data are valuable for cervical precancerous CAD.

C. Discussion

The classification networks transferred from natural image processing tasks can quickly fit the cervical histopathology data. Classification networks are prone to misjudgement due to insufficient foreground information, which is offset by overlapping post-processing through aggregating the diagnosis information of adjacent patches. The segmentation networks accurately locate the boundaries of the lesion regions, but the segmentation performance is decreased because of holes inside the prediction heatmaps. By assigning an assisted block to the segmentation network, the advantages of both classification and segmentation are combined and good performance is achieved.

In fact, pixel-level annotation is time-consuming, while image-level annotation is relatively abundant. Despite the image-level annotation, the label of a cropped patch is unknown due to the large size of the cervical histopathology image. Therefore, weakly supervised learning is a promising direction. Pseudo-labeling is adopted in this paper for weakly supervised learning. Experiments show that the accuracy of the classification model for extracting pseudo-labeled data is critical to the purity of the training set, which strongly affects the final performance. The anti-noise performance of networks [...]

• [...] limit, and the cropped patch has a limited field of view. Therefore, it is expected to explore methods based on multi-scale image fusion to gather rich structural information to simulate the pathologists' process of determining the lesion location and grade at different resolutions.
• Due to the complexity of cervical tissue structure, pathologists are subjective in the diagnosis of lesions. In addition, absolute accuracy cannot be achieved when labeling lesion regions with polygonal lines on low-resolution images. Therefore, weakly supervised learning that only uses annotations as references is encouraged to resist labeling errors introduced by various factors.
• It is much easier to obtain unlabeled data. Hence, unsupervised learning algorithms are strongly recommended.

VI. CONCLUSIONS

In this paper, a new cervical histopathology image dataset called MTCHI is introduced, and a precancerous diagnosis task is designed to evaluate the performance of automated diagnosis. Four evaluation metrics (Dice coefficient, mIoU, AP, and WP) are provided particularly for this task. Both fully and weakly supervised algorithms are discussed. Extensive experiments based on classification and segmentation networks are carried out to demonstrate the feasibility of CAD for cervical precancerous lesions. The high accuracy of the ensemble of fully and weakly supervised strategies demonstrates the potential of unlabeled data in improving performance. The dataset is publicly available for researchers to reproduce and explore novel algorithms, and is ultimately helpful for diagnosis.

REFERENCES

[1] R. L. Siegel, K. D. Miller, and A. Jemal, “Cancer statistics, 2020,” Ca-Cancer J. Clin., vol. 70, no. 1, pp. 7–30, 2020.
[2] R. M. Richart and B. A. Barron, “A follow-up study of patients with cervical dysplasia,” Am. J. Obstet. Gynecol., vol. 105, no. 3, pp. 386–393, 1969.
[3] B. E. Bejnordi et al., “Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer,” JAMA, vol. 318, no. 22, pp. 2199–2210, 2017.
[4] G. Aresta et al., “Bach: Grand challenge on breast cancer histology images,” Med. Image Anal., vol. 56, pp. 122–139, 2019.
[5] J. Li et al., “Signet ring cell detection with a semi-supervised learning framework,” in IPMI, 2019, pp. 842–854.
[6] Y. J. Kim et al., “Paip 2019: Liver cancer segmentation challenge,” Med. Image Anal., vol. 67, 2020.
[7] Y. Fu et al., “Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis,” Nature Cancer, vol. 1, no. 8, pp. 800–810, 2020.
[8] T. M. Darragh et al., “The lower anogenital squamous terminology standardization project for hpv-associated lesions: background and consensus recommendations from the college of american pathologists and the american society for colposcopy and cervical pathology,” Arch. Pathol. Lab. Med., vol. 136, no. 10, pp. 1266–1297, 2012.
[9] J. Jantzen, J. Norup, G. Dounias, and B. Bjerregaard, “Pap-smear benchmark data for pattern classification,” Nature inspired Smart Inf. Syst. (NiSIS), pp. 1–9, 2005.
[10] Z. Lu et al., “Evaluation of three algorithms for the segmentation of overlapping cervical cells,” IEEE J. Biomed. Health Inform., vol. 21, no. 2, pp. 441–450, 2016.
[11] Z. Lu, G. Carneiro, and A. P. Bradley, “An improved joint optimization of multiple level set functions for the segmentation of overlapping cervical cells,” IEEE Trans. Image Process., vol. 24, no. 4, pp. 1261–1272, 2015.
[12] B. S. Veeling, J. Linmans, J. Winkens, T. Cohen, and M. Welling, “Rotation equivariant cnns for digital pathology,” in MICCAI. Springer, 2018, pp. 210–218.
[13] F. A. Spanhol, L. S. Oliveira, C. Petitjean, and L. Heutte, “A dataset for breast cancer histopathological image classification,” IEEE Trans. Biomed. Eng., vol. 63, no. 7, pp. 1455–1462, 2015.
[14] M. Peikari, S. Salama, S. Nofech-Mozes, and A. L. Martel, “Automatic cellularity assessment from post-treated breast surgical specimens,” Cytometry A, vol. 91, no. 11, pp. 1078–1087, 2017.
[15] P. Guo et al., “Nuclei-based features for uterine cervical cancer histology image analysis with fusion-based classification,” IEEE J. Biomed. Health Inform., vol. 20, no. 6, pp. 1595–1607, 2015.
[16] S. De et al., “A fusion-based approach for uterine cervical cancer histology image classification,” Comput. Med. Imag. Grap., vol. 37, no. 7-8, pp. 475–487, 2013.
[17] H. A. AlMubarak et al., “A hybrid deep learning and handcrafted feature approach for cervical cancer digital histology image classification,” Int. J. Healthcare Inf. Syst. Informatics, vol. 14, no. 2, pp. 66–87, 2019.
[18] D. Wang, C. Gu, K. Wu, and X. Guan, “Adversarial neural networks for basal membrane segmentation of microinvasive cervix carcinoma in histopathology images,” in ICMLC, vol. 2. IEEE, 2017, pp. 385–389.
[19] I. Goodfellow et al., “Generative adversarial networks,” Commun. ACM, vol. 63, no. 11, pp. 139–144, 2020.
[20] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” arXiv preprint arXiv:1602.07261, 2016.
[21] X. Zhang, X. Zhou, M. Lin, and J. Sun, “Shufflenet: An extremely efficient convolutional neural network for mobile devices,” in CVPR, 2018, pp. 6848–6856.
[22] A. Echle et al., “Clinical-grade detection of microsatellite instability in colorectal tumors by deep learning,” Gastroenterology, vol. 159, no. 4, pp. 1406–1416, 2020.
[23] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778.
[24] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[25] H. Le et al., “Utilizing automated breast cancer detection to identify spatial distributions of tumor infiltrating lymphocytes in invasive breast cancer,” Am. J. Pathol., 2020.
[26] Y. Wang et al., “Pathological image classification based on hard example guided cnn,” IEEE Access, vol. 8, pp. 114249–114258, 2020.
[27] C. Szegedy et al., “Going deeper with convolutions,” in CVPR, 2015, pp. 1–9.
[28] M. van Rijthoven et al., “Hooknet: multi-resolution convolutional neural networks for semantic segmentation in histopathology whole-slide images,” Med. Image Anal., vol. 68, 2020.
[29] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in MICCAI. Springer, 2015, pp. 234–241.
[30] Y. Liu et al., “Detecting cancer metastases on gigapixel pathology images,” arXiv preprint arXiv:1703.02442, 2017.
[31] C. Szegedy et al., “Rethinking the inception architecture for computer vision,” in CVPR, 2016, pp. 2818–2826.
[32] H. Lin et al., “Scannet: A fast and dense scanning framework for metastastic breast cancer detection from whole-slide image,” in WACV. IEEE, 2018, pp. 539–546.
[33] S. Takahama et al., “Multi-stage pathological image classification using semantic segmentation,” in ICCV, 2019, pp. 10702–10711.
[34] Z. Guo et al., “A fast and refined cancer regions segmentation framework in whole-slide breast pathological images,” Scientific Reports, vol. 9, no. 1, pp. 1–10, 2019.
[35] M. Khened et al., “A generalized deep learning framework for whole-slide image segmentation and analysis,” arXiv preprint arXiv:2001.00258, 2020.
[36] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in ECCV, 2018, pp. 801–818.
[37] D.-H. Lee, “Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks,” in ICML Workshop, vol. 3, no. 2, 2013.
[38] H. Tokunaga et al., “Negative pseudo labeling using class proportion for semantic segmentation in pathology,” in ECCV, 2020.
[39] Y. Li et al., “Self-loop uncertainty: A novel pseudo-label for semi-supervised medical image segmentation,” in MICCAI. Springer, 2020, pp. 614–623.
[40] H.-T. Cheng et al., “Self-similarity student for partial label histopathology image segmentation,” in ECCV. Springer, 2020, pp. 117–132.
[41] S. Shaw et al., “Teacher-student chain for efficient semi-supervised histology image classification,” ICLR Workshop, 2020.
[42] M. Sandler et al., “Mobilenetv2: Inverted residuals and linear bottlenecks,” in CVPR, 2018, pp. 4510–4520.
[43] G. Huang et al., “Densely connected convolutional networks,” in CVPR, 2017, pp. 4700–4708.
[44] O. Russakovsky et al., “Imagenet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
[45] Z. Meng et al., “Adaptive elastic loss based on progressive inter-class association for cervical histology image segmentation,” in ICASSP, 2020, pp. 976–980.
[46] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in CVPR, 2015, pp. 3431–3440.
[47] V. Badrinarayanan, A. Kendall, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495, 2017.
[48] Z. Meng, Z. Fan, Z. Zhao, and F. Su, “Ens-unet: End-to-end noise suppression u-net for brain tumor segmentation,” in EMBC. IEEE, 2018, pp. 5886–5889.
[49] K. Cao and X. Zhang, “An improved res-unet model for tree species classification using airborne high-resolution images,” Remote Sens., vol. 12, no. 7, p. 1128, 2020.
[50] H. Huang et al., “Unet 3+: A full-scale connected unet for medical image segmentation,” in ICASSP. IEEE, 2020, pp. 1055–1059.