0% found this document useful (0 votes)
20 views13 pages

Nuclei-Based Features For Uterine Cervical Cancer Histology Image Analysis With Fusion-Based Classification

This paper presents a novel approach for classifying cervical intraepithelial neoplasia (CIN) grades using nuclei-based features from histology images. The study introduces new acellular and atypical cell concentration features and employs a fusion-based classification method, achieving an accuracy of 88.5% in grading CIN. The research aims to enhance the diagnostic process for cervical cancer by automating the assessment of histology slides traditionally performed by expert pathologists.

Uploaded by

gdheepak1979
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
20 views13 pages

Nuclei-Based Features For Uterine Cervical Cancer Histology Image Analysis With Fusion-Based Classification

This paper presents a novel approach for classifying cervical intraepithelial neoplasia (CIN) grades using nuclei-based features from histology images. The study introduces new acellular and atypical cell concentration features and employs a fusion-based classification method, achieving an accuracy of 88.5% in grading CIN. The research aims to enhance the diagnostic process for cervical cancer by automating the assessment of histology slides traditionally performed by expert pathologists.

Uploaded by

gdheepak1979
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 20, NO.

6, NOVEMBER 2016 1595

Nuclei-Based Features for Uterine Cervical


Cancer Histology Image Analysis With
Fusion-Based Classification
Peng Guo, Koyel Banerjee, R. Joe Stanley, Senior Member, IEEE, Rodney Long, Member, IEEE,
Sameer Antani, Member, IEEE, George Thoma, Senior Member, IEEE, Rosemary Zuna, Shelliane R. Frazier,
Randy H. Moss, Senior Member, IEEE, and William V. Stoecker

Abstract—Cervical cancer, which has been affecting women


worldwide as the second most common cancer, can be cured if
detected early and treated well. Routinely, expert pathologists
visually examine histology slides for cervix tissue abnormality
assessment. In previous research, we investigated an automated,
localized, fusion-based approach for classifying squamous epithe-
lium into Normal, CIN1, CIN2, and CIN3 grades of cervical
intraepithelial neoplasia (CIN) based on image analysis of 61 dig-
itized histology images. This paper introduces novel acellular and
atypical cell concentration features computed from vertical seg-
ment partitions of the epithelium region within digitized histology
images to quantize the relative increase in nuclei numbers as the Fig. 1. CIN grade label examples highlighting the increase of immature atypi-
CIN grade increases. Based on the CIN grade assessments from two cal cells from epithelium bottom to top with increasing CIN severity. (a) Normal.
expert pathologists, image-based epithelium classification is inves- (b) CIN 1. (c) CIN 2. (d) CIN 3.
tigated with voting fusion of vertical segments using support vec-
tor machine and linear discriminant analysis approaches. Leave-
one-out is used for the training and testing for CIN classification, Detection of cervical cancer and its precursor lesion is accom-
achieving an exact grade labeling accuracy as high as 88.5%. plished through a Pap test, a colposcopy to visually inspect the
Index Terms—Cervical cancer, cervical intraepithelial neopla- cervix, and microscopic interpretation of histology slides by
sia (CIN), fusion-based classification, image processing, linear dis- a pathologist when biopsied cervix tissue is available. Micro-
criminant analysis (LDA), support vector machine (SVM). scopic evaluation of histology slides by a qualified pathologist
has been used as a standard of diagnosis [2]. As a part of the
I. INTRODUCTION pathologist diagnostic process, cervical intraepithelial neoplasia
N 2008, there were 529 000 new cases of invasive cervical (CIN) is a premalignant condition for cervical cancer in which
I cancer reported worldwide [1]. While the greatest impact of
cervical cancer prevalence is in the developing world, invasive
the atypical cells are identified in the epithelium by the visual
inspection of histology slides [3]. As shown in Fig. 1, cervical
cervical cancer continues to be diagnosed in the US each year. biopsy diagnoses include Normal (that is, no CIN lesion), and
three grades of CIN: CIN1, CIN2, and CIN3 [3]–[5]. CIN1 cor-
Manuscript received February 20, 2015; revised June 19, 2015 and August responds to mild dysplasia (abnormal change), whereas CIN2
8, 2015; accepted September 22, 2015. Date of publication October 26, 2015; and CIN3 are used to denote moderate dysplasia and severe
date of current version December 6, 2016. This work was supported in part
by the Intramural Research Program of the National Institutes of Health, the dysplasia, respectively. Histologic criteria for CIN include in-
National Library of Medicine, and the Lister Hill National Center for Biomedical creasing immaturity and cytologic atypia in the epithelium.
Communications. As CIN increases in severity, the epithelium has been ob-
P. Guo, K. Banerjee, R. Joe Stanley, and R. H. Moss are with the Depart-
ment of Electrical and Computer Engineering, Missouri University of Sci- served to show delayed maturation with an increase in imma-
ence and Technology, Rolla, MO 65409-0040 USA (e-mail: [email protected]; ture atypical cells from bottom to top of the epithelium [4]–[7].
[email protected]; [email protected]). As shown in Fig. 1, atypical immature cells are seen mostly in
R. Long, S. Antani, and G. Thoma are with the Lister Hill National Center
for Biomedical Communications, National Library of Medicine, National Insti- the bottom third of the epithelium for CIN 1 [see Fig. 1(b)].
tutes of Health, DHHS, Bethesda, MD 20894 USA (e-mail: [email protected]; For CIN2, the atypical immature cells appear in the bottom two
[email protected]; [email protected]). thirds of the epithelium [see Fig. 1(c)], and for CIN 3, atypical
R. Zuna is with the Department of Pathology, University of Oklahoma
Health Sciences Center, Oklahoma, OK 73117 USA (e-mail: rosemary-zuna@ immature cells lie in the full thickness of the epithelium [see
ouhsc.edu). Fig. 1(d)]. When these atypical cells extend beyond the epithe-
S. R. Frazier is with the Surgical Pathology Department, University of Mis- lium, i.e., through the basement membrane and start to enter
souri Hospitals and Clinics, Columbia, MO 65202 USA (e-mail: FrazierSR@
health.missouri.edu). into the surrounding tissues and organs, it may indicate invasive
W. V. Stoecker is with Stoecker & Associates, Rolla, MO 65401 USA (e-mail: cancer [3]. In addition to analyzing the progressively increasing
[email protected]). quantity of atypical cells from bottom to top of the epithe-
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org. lium, identification of the nuclei atypia is also significant [3].
Digital Object Identifier 10.1109/JBHI.2015.2483318 Nuclei atypia is a characteristic of nuclei enlargement, thereby
2168-2194 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
Authorized licensed use limited to: SRMSee
Institute of Science and Technology Kattankulathur.
http://www.ieee.org/publications Downloaded on January 28,2025
standards/publications/rights/index.html at 13:06:13
for more UTC from IEEE Xplore. Restrictions apply.
information.
1596 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 20, NO. 6, NOVEMBER 2016

resulting in different shapes and sizes of the nuclei present within


the epithelium region. Visual assessment of this nuclei atypia
may be difficult, due to the large number of nuclei present and
the complex visual field, i.e., tissue heterogeneity. This may
contribute to diagnostic grading repeatability problems and in-
ter and intrapathologist variation [6]–[8].
Computer-assisted methods (digital pathology) have been ex-
plored for CIN diagnosis in other studies [5], [9]–[12] and pro-
vided the foundation for the work reported in [9]. These meth- Fig. 2. Example of incorrect medial axis determined using distance transform
only (solid line). The desired medial axis is manually drawn and is overlaid
ods examined texture features [11], nuclei determination and on the image (dashed line). The left-hand, right-hand, and interior sections are
Delaunay triangulation analysis [12], medial axis determination labeled on the bounding box image to highlight the limitations of the distance
[5], and localized CIN grade assessment [5]. A more detailed transform algorithm.
review of digital pathology techniques is presented in that pa-
per [9]. Our research group previously investigated a localized Step 5: Fuse the CIN grades from each vertical segment to
fusion-based approach to classify the epithelium region into the obtain the CIN grade of the whole epithelium for image-
different CIN grades, as determined by an expert pathologist [9]. based classification.
We examined 66 features including texture, intensity shading, The following sections present each step in detail.
Delaunay triangle (DT) features (such as area and edge length),
and weighted density distribution features, which yielded an
A. Medial Axis Detection and Segments Creation
exact CIN grade label classification result of 70.5% [9].
The goal of this paper, performed in collaboration with the Medial axis determination used a distance-transform-based
National Library of Medicine, is to automatically classify 61 [13], [14] approach from [9]. The distance-transform-based ap-
manually segmented cervical histology images into four dif- proach from [9] had difficulties in finding the left- and right-hand
ferent grades of CIN and to compare results with CIN grade end-axis portions of the epithelium axis in nearly rectangular
determined by an expert pathologist. The research presented in and triangular regions. Fig. 2 shows an example of an incor-
this paper extends the study in [9] to the development of new im- rect medial axis estimation using a distance-transform-based
age analysis and classification techniques for individual vertical approach (solid line) and the manually labeled desired medial
segments to allow improved whole-image CIN grade determina- axis (dashed line).
tion. Specifically, we present new image analysis techniques to Accordingly, the algorithm from [9] used the bounding box of
determine epithelium orientation and image analysis and to find the epithelium to obtain a center line through the bounding box
and characterize acellular and nuclei regions within the epithe- and intersecting the center line with the epithelium object. The
lium. We also present comparative CIN grading classification resulting center line was divided into a left-hand segment (20%),
analysis versus two expert pathologists CIN grading of the 61 a right-hand segment (20%), and the interior segment (60%).
image dataset. These divisions of the epithelium can be observed in Fig. 2. The
The order of the remaining sections of this paper is as fol- interior 60% portion of the distance-transform-based medial
lows. Section II presents the methods used in this paper. Section axis was retained as a part of the final medial axis. The left-
III describes the experiments performed. Section IV presents and right-hand cutoff points of the interior distance transform
and analyzes the results obtained and a discussion. Section V axis were determined as the closest Euclidean distance points
provides the study conclusions. from the distance transform axis to the center line points on the
20% left- and right-hand segments. As done in [9], the left- and
right-hand cutoff points are projected to the median bounding
II. METHODS
box points for the remaining left-hand 20% and right-hand 20%
The images analyzed included 61 full-color digitized histol- portions of the axis. The projected left- and right-hand segments
ogy images of hematoxylin and eosinophil preparations of tissue are connected with the interior distance transform axis to yield
sections of normal cervical tissue and three grades of cervical the final medial axis.
carcinoma in situ. An additional image, labeled as CIN1 by two The epithelium’s orientation was determined using a novel
experts (RZ and SF), was used for image processing algorithm approach based on the bounding box and the final medial axis.
parameter determination. The same experimental dataset was Using the bounding box, a comparison was performed of the
used in [9]. The entire classification process, as utilized in [9], number of nuclei distributed over eight masks that are created
of the segmented epithelium images was performed using the from eight control points (P1 , P2 , P3,..., P8 ) at the corners and
following five-step approach: the midpoints of the bounding box edges (see Fig. 3).
Step 1: Locate the medial axis of the segmented epithelium The masks are used for computing the ratios of the number of
region. detected nuclei to the areas of the masks. The control points used
Step 2: Divide the segmented image into ten vertical seg- for determining the masks are shown as P1–P8 in Fig. 3. For each
ments, orthogonal to the medial axis. control point combination, the number of nuclei is computed for
Step 3: Extract features from each of the vertical segments. each mask using the algorithm presented in Section II-B1. Let
Step 4: Classify each of these segments into one of the CIN n represent the set of the number of nuclei computed from
grades. masks 1–8, given as n = {n1 , n2 , . . . , n8 }, as designated in
Authorized licensed use limited to: SRM Institute of Science and Technology Kattankulathur. Downloaded on January 28,2025 at 13:06:13 UTC from IEEE Xplore. Restrictions apply.
GUO et al.: NUCLEI-BASED FEATURES FOR UTERINE CERVICAL CANCER HISTOLOGY IMAGE ANALYSIS 1597

Fig. 3. Bounding box of epithelium with control points labeled.

Fig. 5. Epithelium image example with vertical segment images (I1 ,I2 ,
I3 , . . . , I1 0 ) determined from bounding boxes after dividing the medial axis
into ten line segment approximations after medial axis computation.

normalizing the size of the epithelium region. The medial axis


top/bottom orientation is determined as vij = maxij (v). The
resulting medial axis is partitioned into ten segments of approx-
imately equal length, perpendicular line slopes are estimated at
the midpoints of each segment, and vertical lines are projected
at the end points of each segment to generate ten vertical seg-
ments for analysis. The partitioning of the epithelium image
into ten vertical segments was performed to facilitate localized
CIN classifications within the epithelium that can be fused to
provide an image-based CIN assessment, as done in [9]. Fig. 5
provides an example of the medial axis partitioning and the ten
vertical segments obtained.

B. Feature Extraction
Features are computed for each of the ten vertical segments
of the whole image I1 ,I2 ,I3 , . . . , I10 . All the segments of one
whole image are feature extracted in a sequence, from left to
right I1 to I10 (see Fig. 5).
Fig. 4. Bounding box partitioning with masks combinations shown based on
control points from Fig. 3 as a part of epithelium orientation determination In total, five different types of features were obtained in this
algorithm. (a) Mask 1 and Mask 2. (b) Mask 3 and Mask 4. (c) Mask 5 and study, including: 1) texture features (F1–F10) [9]; 2) cellularity
Mask 6. (d) Mask 7 and Mask 8. features (F11–F13); 3) nuclei features (F14, F15); 4) acellular
(light area) features (F16–F22); 4) combination features (F23,
Fig. 4(a)–(d). The eccentricity, defined as the ratio of the fitted F24); and 5) advanced layer-by-layer triangle features (F25–
ellipse foci distance to the major axis length as given in [16], F27). To give a brief introduction of the extracted features, Ta-
is computed for the entire epithelium image mask, given as e, ble I is presented showing the feature label and brief description
and for each mask image, denoted as ei . Then, the eccentricity in every row for each feature.
weighted nuclei ratios are calculated for each mask combination, 1) Texture and Cellular Features: The texture and color fea-
given as v = {v12 , v34 , v56 , v78 , v21 , v43 , v65 , v87 }, where tures were used in our previous work and are described in [9].
    The use of color in histopathological image analysis is also
n1 n1 e described in [10] and [11]. For texture features, both first-order
v12 =
max(n) n2 e1 structural measures derived directly from the image segment and
    second-order statistical methods based on the gray-level cooc-
n3 n3 e
v34 = currence matrix (GLCM) [5], [18] were employed. A gray-scale
max(n) n4 e3
    luminance version of the image was created in order to compute
n5 n5 e the statistics of energy, correlation, contrast, and uniformity of
v56 =
max(n) n6 e5 the segmented region; these statistics are then used to generate
    features (F1–F10) shown in Table I. The texture features include
n7 n7 e
v78 = contrast (F1), energy (F2), correlation (F3), and uniformity (F4)
max(n) n8 e7
of the segmented region, combined with the same statistics (con-
etc., and max(n) denotes the maximum area of the eight parti- trast, energy, and correlation) generated from the GLCM of the
tioned masks. The term ni /max(n) is used as a scale factor for segment (F5–F10, see Table I).

Authorized licensed use limited to: SRM Institute of Science and Technology Kattankulathur. Downloaded on January 28,2025 at 13:06:13 UTC from IEEE Xplore. Restrictions apply.
1598 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 20, NO. 6, NOVEMBER 2016

TABLE I Cluster centers are found from the luminance image using
FEATURE DESCRIPTION
K-means clustering [9] for three different regions (K = 3) de-
noted as clustLight, clustMedium, and clustDark for the light,
Label Description
medium, and dark cluster centers, respectively. Then, the ratios
F1 Contrast of segment: Intensity contrast between a pixel and its neighbor are calculated based on (1)–(3) [9]
over the segment image.
F2 Energy of segment: Squared sum of pixel values in the segment image. numLight
Acellular ratio =
F3 Correlation of segment: How correlated a pixel is to neighbors over the numLight + numM edium + numDark
segment image.
F4 Segment homogeneity: Closeness of the distribution of pixels in the (1)
segment image to the diagonal elements.
F5, F6 Contrast of GLCM: Local variation in GLCM in horizontal and vertical numM edium
directions. Cytoplasm ratio =
F7, F8 Correlation of GLCM: Joint probability occurrence (periodicity) of
numLight + numM edium + numDark
elements in the segment image in the horizontal and vertical directions.
F9, F10 Energy of GLCM: Sum of squared elements in the GLCM in horizontal
(2)
and vertical directions.
F11 Acellular ratio: Proportion of object regions within segment image with
numDark
N uclei ratio =
light pixels (acellular). numLight + numM edium + numDark
F12 Cytoplasm ratio: Proportion of object regions within segment image with
medium pixels (cytoplasm). (3)
F13 Nuclei ratio: Proportion of object regions within segment image with
dark pixels (nuclei). where Acellular ratio (F11), Cytoplasm ratio (F12), and Nu-
F14 Average nucleus area: Ratio of total nuclei area over total number of
nuclei. clei ratio (F13) represent the cellular features in Table I, and
F15 Background to nuclei area ratio: Ratio of total background area to total numLight, numMedium, and numDark represent the number of
nuclei area.
F16 Intensity ratio: Ratio of average light area image intensity to background
pixels that were assigned to the clusters of light, medium, and
intensity. dark, respectively. These features correspond to intensity shad-
F17 Ratio R: Ratio of average light area red to background red. ing features developed in [9].
F18 Ratio G: Ratio of average light area green to background green.
F19 Ratio B: Ratio of average light area blue to background blue.
2) Nuclei Features: The dark shading color feature discussed
F20 Luminance ratio: Ratio of average light area luminance to background above corresponds to nuclei, which appear within epithelial cells
luminance. in various shapes and sizes. Nuclei tend to increase in both num-
F21 Ratio light area: Ratio of light area to total area.
F22 Light area to background area ratio: Ratio of total light area to ber and size as the CIN level increases. This linkage between
background area. the nuclei characteristics and CIN levels motivates our devel-
F23 Ratio acellular number to nuclei number: Ratio of number of light areas
to number of nuclei.
opment of algorithms for nuclei detection feature extraction. In
F24 Ratio acellular area to nuclei area: Ratio of total light area to total nuclei this paper, the algorithms of nuclei detection and nuclei fea-
area. ture extraction are developed to obtain features to facilitate CIN
F25 Triangles in top layer: Number of triangles in top layer.
F26 Triangles in mid layer: Number of triangles in middle layer. classification. Specifically, we carry out the following steps:
F27 Triangles in bottom layer: Number of triangles in bottom layer. 1) nuclei feature preprocessing: average filter, image sharp-
ening, and histogram equalization;
2) nuclei detection: clustering, hole filling, small-area elim-
ination, etc;
3) nuclei feature extraction.
In preprocessing, vertical image segments are processed indi-
vidually. After converting the segment into a gray-scale image
I, an averaging filter is applied as in (4), where ∗ denotes con-
volution
⎡ ⎤
1 2 1
1 ⎢ ⎥
A= ⎣ 2 4 2 ⎦ ∗ I. (4)
16
1 2 1
After the average-filtered image is obtained, an image sharp-
ening method is used to emphasis the dark shading part, which
Fig. 6. Sample shading representatives within epithelium image used for de- is expressed as (5), following the methods in [17]
termining cellular features.
I sharp en = kI − A (5)
The luminance images showed regions with three different where I sharp en is the sharpened image, and k is an empirically
intensities, marked as light, medium, and dark areas within each determined constant of 2.25. The average-filtered image A and
single-segmented luminance image, as shown in Fig. 6 of normal the sharpened image I sharp en are shown in Fig. 8. In the final
cervical histology. The light areas correspond to acellular areas; preprocessing step, we apply histogram equalization using the
the medium areas correspond to cytoplasm; and the dark areas MATLAB function histeq to the sharpened image (I sharp en ) (in
correspond to nuclei. particular, to enhance details of the nuclei atypia).

Authorized licensed use limited to: SRM Institute of Science and Technology Kattankulathur. Downloaded on January 28,2025 at 13:06:13 UTC from IEEE Xplore. Restrictions apply.
GUO et al.: NUCLEI-BASED FEATURES FOR UTERINE CERVICAL CANCER HISTOLOGY IMAGE ANALYSIS 1599

Fig. 7. Example of image preprocessing. (a) Original luminance image I. (b)


Sharpened image Ish a rp e n obtained after average filtering of I.

Fig. 8. Example of image processing steps to obtain nuclei cluster pixels from
K-means algorithm from histogram equalized image. (a) Histogram equalized
image determined from Fig. 7(b). (b) Mask image obtained from K-means Fig. 9. Image examples of nuclei detection algorithm. (a) Image with pre-
algorithm with pixels closest to nuclei cluster. liminary nuclei objects obtained from clustering [step 1—Fig. 8(c)]. (b) Image
closing to connect nuclei objects (step 2). (c) Image with hole filling to produce
nuclei objects (step 3). (d) Image opening to separate nuclei objects (step 4). (e)
Image with nonnuclei (small) objects eliminated (step 5).
The nuclei detection algorithm is described as follows using
the equalized histogram image as the input.
1) Step 1: Cluster the histogram-equalized image into clus- algorithm, and the resulting mask image with pixels closest to
ters of background (darkest), nuclei, and darker and lighter the nuclei cluster from the K-means algorithm in Step 1. The
(lightest) epithelium regions using the K-means algorithm nuclei detection algorithm steps 2–5 are illustrated in Fig. 9.
(K = 4). Generate a mask image containing the pixels The nuclei features are calculated as follows: With the de-
closest to the nuclei cluster (second darkest). tected nuclei shown as white objects in the final binary images
2) Step 2: Use the MATLAB function imclose with a circular [see Fig. 9(e)], the nuclei features are calculated as
structuring element of radius 4 to perform morphological N uclei Area
closing on the nuclei mask image. Average nucleus area = (6)
N uclei N umber
3) Step 3: Fill the holes in the image from Step 2 with MAT-
LAB’S imfill function for this process. N onN uclei Area
Ratio of background to nuclei area =
4) Step 4: Use the MATLAB’S imopen to perform morpho- N uclei Area
(7)
logical opening with a circular structuring element of ra-
dius 4 on the image from Step 3. where Average nucleus area (F14) and Ratio of background
5) Step 5: Eliminate small area noise objects (nonnuclei ob- to nuclei area (F15) represent ratios obtained from the final
jects) within the epithelium region of interest from the nuclei images as shown in Fig. 9(e). In (6) and (7), Nucle-
mask in Step 4, with the area opening operation using the iArea represents the total area for all nuclei detected (all white
MATLAB function bwareaopen. pixels); NucleiNumber indicates the total number of white re-
Fig. 8 shows an example of a sharpened image before and after gions [number of objects in Fig. 9(e)]; AverageNucleusArea
histogram equalization, which is input to the nuclei detection is the ratio of nuclei area to the nuclei number, which tends

Authorized licensed use limited to: SRM Institute of Science and Technology Kattankulathur. Downloaded on January 28,2025 at 13:06:13 UTC from IEEE Xplore. Restrictions apply.
1600 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 20, NO. 6, NOVEMBER 2016

Fig. 10. Example L∗ image for light area detection. Fig. 12. Thresholded image of Fig. 11.

Fig. 11. Adaptive histogram equalized image of Fig. 10. Fig. 13. Example image of light area clusters after K-means clustering.

to increase with higher CIN grade; NonNucleiArea area repre-


sents the total number of pixels in the black nonnucleus region
within the epithelium region [black pixels within epithelium in
Fig. 9(e)]. RatioBackgroundNucleusArea denotes the ratio of
the nonnuclei area to nuclei area. We expect larger values of Av-
erageNucleusArea to correspond to increasing CIN grade, and
RatioBackgroundNucleusArea to decrease with increasing CIN
grade.
3) Acellular Features: Extracting the light area regions, de-
scribed previously as “light shading,” is challenging due to the
color and intensity variations in the epithelium images. We eval-
uated each of the L∗ -, a∗ -, and b∗ -planes of CIELAB color space Fig. 14. Example morphological dilation and final light area mask. (a) Mor-
for characterizing the light areas, and determined empirically phological dilation and erosion process after K-means clustering. (b) Final light
that L∗ provides the best visual results. The following outlines area mask, after eliminating regions with areas smaller than 100 pixels.
the methods we used to segment the histology images.
1) Step 1: Convert the original image from RGB color space
to L∗ a∗ b∗ color space, then select the luminance compo- dark nuclei regions and to retain the lighter nuclei and
nent L∗ (see Fig. 10). epithelium along with the light areas (see Fig. 12).
2) Step 2: Perform adaptive histogram equalization on 4) Step 4: Segment the light areas using the K-means al-
the image from step 1 using MATLAB’S adapthisteq. gorithm based on [9], with K equal to 4. The K-means
adapthisteq operates on small regions (tiles) [15] for con- algorithm input is the histogram-equalized image from
trast enhancement so that that the histogram of the out- Step 2 multiplied by the binary thresholded image from
put region matches a specified histogram and combines Step 3. A light area clustering example is given in Fig. 13.
neighboring tiles using bilinear interpolation to eliminate 5) Step 5: Remove all objects having an area less than 100
artificially induced boundaries (see Fig. 11). pixels from the image, determined empirically, using the
3) Step 3: After the image has been contrast adjusted, the MATLAB function regionprops [15]. A morphological
image is binarized by applying an empirically determined closing is performed with a disk structure element of ra-
threshold of 0.6. This step is intended to eliminate the dius 2. An example result is shown in Fig. 14.

Authorized licensed use limited to: SRM Institute of Science and Technology Kattankulathur. Downloaded on January 28,2025 at 13:06:13 UTC from IEEE Xplore. Restrictions apply.
GUO et al.: NUCLEI-BASED FEATURES FOR UTERINE CERVICAL CANCER HISTOLOGY IMAGE ANALYSIS 1601

Using the light area mask, the acellular features (from Table I)
are computed and are given as follows:
Light Area Intensity
Intensity ratio = (8)
Background Intensity
Light Area Red
Ratio R = (9)
Background Red
Light Area Green
Ratio G = (10)
Background Green
Light Area Blue
Ratio B = (11)
Background Blue
Light Area Luminance
Luminance ratio = (12)
Background Luminance
Light Area
Ratio light area = (13) Fig. 15. Example of nuclei detection comparison between the circular-Hough
Segment Area method and the method presented in this paper. (a) Original vertical segment. (b)
Light Area Example of circular-Hough method; note the nuclei misses and false detections.
Ratio light area = (14) (c) Nuclei detected using the algorithm from Section II-B2.
Background Area
where SegmentArea gives the epithelium area within the ver-
tical segment; LightArea denotes the area of all light area re-
gions; LightNumber corresponds to the number of light areas;
BackgroundArea represents the total number of nonnuclei and
nonlight area pixels inside the epithelium within the vertical seg-
ment (i.e., background area); LightAreaIntensity, LightAreaRed,
LightAreaGreen, LightAreaBlue, and LightAreaLuminance are
the average intensity, red, green, blue, and luminance values,
respectively, of the light areas within the epithelium of the
vertical segment; BackgroundIntensity, BackgroundRed, Back- Fig. 16. Distribution of nuclei centroids as vertices for DTs in bottom layer
groundGreen, BackgroundBlue, and BackgroundLuminance are (green), mid layer (red), and top layer (blue).
the average intensity, red, green, blue, and luminance values,
respectively, of the nonnuclei and nonlight area pixels within
the epithelium of the vertical segment. tics (means, standard deviations) of these quantities were also
4) Combination features: After both the nuclei features and included as features. In applying the CHT to our experimental
the acellular features were extracted, two new features were dataset, we observed that for some images, this method some-
calculated with the intent to capture the relative increase in times fails to locate noncircular irregularly-shaped nuclei; on
nuclei numbers as CIN grade increases. These features are the the other hand, this method does (incorrectly) detect some non-
ratio of the acellular number to the nuclei number (F23), and nuclei regions as nuclei, which leads to incorrect vertices being
the ratio of the acellular area to the total nuclei area (F24). The input to the (DT) method, thus degrading the triangle features
equations for calculating the combination features are presented calculated in downstream processing. To overcome the short-
as follows: comings of the CHT for nuclei detection, we use the centroids
Light N umber of the nuclei detected based on the method presented in Section
Ratio acellular number to nuclei number = II-B2. An example comparison between the previous circular-
N uclei N umber
(15) Hough method and the method in this paper is presented in
Fig. 15. Circles indicate the locations of detected nuclei.
Light Area
Ratio acellular area to nuclei area = (16) In this paper, we use the DT method, but restrict the geomet-
N uclei Area rical regions it can act upon, as follows. Before forming the DTs
where LightNumber and NucleiNumber represent the total num- with the vertices provided by the nuclei detection results from
ber of light area and nuclei objects, respectively, as found in Section II-B2, the vertical segment being processed is subdi-
Sections II-B2 and II-B3 (see Fig. 6). vided into three vertical layers as illustrated in Fig. 16. The aim
5) Triangle Features: In previous research, triangle features is to associate the presence of increasing nuclei throughout the
have been investigated based on the circular-Hough Transform epithelium with increasing CIN grades, namely: abnormality of
(CHT) [18] to detect nuclei for use as vertices in DT formulation the bottom third of the epithelium roughly corresponds to CIN1;
[19] to obtain the triangles [9], [10]. Features were computed abnormality of the bottom two-thirds, to CIN2; and abnormality
which included triangle area and edges length, and simple statis- of all three layers, to CIN3. We refer to these layers as bottom,

Authorized licensed use limited to: SRM Institute of Science and Technology Kattankulathur. Downloaded on January 28,2025 at 13:06:13 UTC from IEEE Xplore. Restrictions apply.
1602 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 20, NO. 6, NOVEMBER 2016

was assigned to the ten vertical segments for that image. For
the left-out test image, each vertical segment was classified into
one of the CIN grades using the SVM/LDA classifier.
Then, the CIN grades of the vertical segment images were
fused to obtain the CIN grade of the entire test epithelium image
(see Fig. 5). The fusion of the CIN grades of the vertical segment
images was completed using a voting scheme. The CIN grade of
the test epithelium image was assigned to the most frequently
occurring class assignment for the ten vertical segments. In
the case of a tie among the most frequently occurring class
assignments for the vertical segments, the test image is assigned
to the higher CIN class. For example, if there is a tie between
CIN1 and CIN2, then the image is designated as CIN2. The
leave-one-image-out training/testing approach was performed
Fig. 17. DTs in bottom layer (green lines), mid layer (red lines), and top layer separately for each expert’s CIN labeling of the experimental
(blue lines). dataset. For evaluation of epithelium classification, three scoring
schemes were implemented.
mid, and top. (See Fig. 16, the green circles stand for the top Scheme 1 (Exact Class Label): The first approach is exact
layer vertices, red for mid layer, and blue for bottom.) classification meaning that if the class label automatically as-
After locating the vertices for DT, the DT algorithm iteratively signed to the test image is the same as the expert class label,
selects point triples to become vertices of each new triangle cre- then the image is considered to be correctly labeled. Otherwise,
ated. Delaunay triangulation exhibits the property that no point the image is considered to be incorrectly labeled.
lies within the circles that are formed by joining the vertices Scheme 2 (Off-by-One Class Label): The second scoring ap-
of the triangles [5]. As shown in Fig. 17, all the triangles in proach is an Off-by-One classification scheme, known as “win-
three layers formed using DT are unique and do not contain any dowed class label” in previous research [9]. If the predicted
points within the triangles. The features are obtained according CIN grade level for the epithelium image is only one grade off
to the triangles in three different layers, including the number of as compared to the expert class label, the classification result is
triangles in top layer (F25), number of triangles in middle layer considered correct. For example, if the expert class label CIN2
(F26), and number of triangles in bottom layer (F27). was classified as CIN1 or CIN 3, the result would be considered
correct. If expert class label CIN1 was classified as CIN3, the
III. EXPERIMENTS PERFORMED result would be considered incorrect.
Scheme 3 (Normal Versus CIN): For the third scoring scheme,
We carried out three sets of experiments, which are described
the classification result would be considered incorrect when a
in this section. The experimental dataset consisted of 61 digi-
Normal grade was classified as any CIN grade and vice versa.
tized histology images, which were CIN labeled by two experts
(RZ and SF) (RZ: 16 Normal, 13 CIN1, 14 CIN2, and 18 CIN3;
SF: 14 Normal, 14 CIN1, 17 CIN2, and 16 CIN3). B. Classification of the Whole Epithelium
For the second set of experiments, features were extracted
A. Fusion-Based CIN Grade Classification of Vertical
from the whole epithelium image following the steps shown in
Segment Images
Fig. 18, which also gives the comparison between the whole-
For the first set of experiments, all the features extracted from epithelium image classification and fusion-based classification
the vertical segment images were used as inputs to train the over vertical segments (see Section III-A).
SVM/LDA classifier. The LIBSVM [22] implementation of the The whole-epithelium image classification in this section is
SVM [20] and LDA [21] classifiers were utilized in this study. done without generating any of the individual vertical segment
The SVM implementation uses a linear kernel and the four images (see Fig. 19 as an example for nuclei feature detec-
weights were the fractions of the images in each CIN class to tion over the whole image). The experiment was investigated to
the entire image set (fraction of the image set that is Normal, compare the performance of the fusion-based epithelium clas-
fraction of the image set that is CIN1, etc.). sification (see Section III-A) to the performance obtained by
Individual features were normalized by subtracting the mean classifying the epithelium image as a whole. Features extracted
training set feature value and dividing by the standard deviation from the whole image were used as inputs to the SVM/LDA
training set feature value. Due to the limited size of the image set, classifier using the same leave-one-image out approach. The
a leave-one-image-out approach was investigated for classifier same scoring schemes as presented in Section III-A were used
training and testing. For this approach, the classifier is trained to evaluate the performance of the whole epithelium classifica-
based on the individual vertical segment feature vectors for all tion. Again, the leave-one-image-out training/testing approach
but the left-out epithelium image (used as the test image). For was performed separately for each expert’s CIN labeling of the
classifier training, the expert truthed CIN grade for each image experimental dataset.

Authorized licensed use limited to: SRM Institute of Science and Technology Kattankulathur. Downloaded on January 28,2025 at 13:06:13 UTC from IEEE Xplore. Restrictions apply.
GUO et al.: NUCLEI-BASED FEATURES FOR UTERINE CERVICAL CANCER HISTOLOGY IMAGE ANALYSIS 1603

TABLE II
CONFUSION MATRIX RESULTS FOR FUSION-BASED CLASSIFICATION USING ALL
27 FEATURES (F1–F27) FOR SVM AND LDA CLASSIFIERS FOR BOTH EXPERTS

Expert RZ: SVM/LDA

Normal (16) CIN1 (13) CIN2 (14) CIN3 (18)

Normal 14/14 0/0 0/0 0/0


CIN1 2/2 12/11 0/0 0/0
CIN2 0/0 1/1 12/14 3/3
CIN3 0/0 0/1 2/0 15/15
Expert SF: SVM/LDA

Normal (14) CIN1 (14) CIN2 (17) CIN3 (16)


Normal 10/10 2/3 0/0 0/0
CIN1 4/3 9/9 1/1 0/0
CIN2 0/0 3/1 16/16 1/1
CIN3 0/1 0/1 0/0 15/15

out approach based on all the 27 features generated. Then,


the vertical segment classifications were fused using a voting
scheme to obtain the CIN grade of the epithelium image. We
Fig. 18. Fusion-based approach versus whole image approach.
evaluated the performance of these epithelium image classifi-
cations using the three approaches presented in Section III-A.
Table II shows the confusion matrices for the classification re-
sults obtained using the fusion-based approach, for the SVM
and LDA classifiers, respectively, for both experts (RZ and SF).
In the following, we provide summary comments for these
Table II results, and compare them with the previous results
published in [9], which used the RZ expert CIN labeling of the
image set. 1) For the Exact Class Label, we obtained an ac-
curacy of 86.9%/82.0% (RZ/SF) using the SVM classifier and
Fig. 19. Example image of nuclei detection over whole image without creating
vertical segments; the top image is the original epithelium image; bottom is the
88.5%/82.0% (RZ/SF) using the LDA classifier, (previous [9]:
nuclei mask of this image. 62.3% LDA). 2) For the Normal Versus CIN scoring scheme,
SVM classifier accuracy was 96.7%/90.2% (RZ/SF) and LDA
classifier 96.7%/90.2% (RZ/SF) (previous [9]: 88.5% LDA). 3)
C. Feature Evaluation and Selection For the Off-by-One class scoring scheme, SVM had an accu-
racy of 100%/100% (RZ/SF) and LDA, 98.4%/96.7% (RZ/SF)
For feature evaluation and selection, a SAS implementation (previous [9]: 96.7%).
of multinomial logistic regression (MLR) [24]–[27] and a Weka In the order to evaluate the performance of the fusion-based
attribute information gain evaluator were employed. For SAS approach for epithelium classification, we also carried out clas-
analysis, MLR is used for modeling nominal outcome vari- sification using the entire epithelium image. For the whole im-
ables, where the log odds of the outcomes are modeled as a age classification, we again used the SVM and LDA classifiers.
linear combination of the predictor variables [23]–[26]. The p- Table III shows the whole image classification results for both
values obtained from the MLR output are used as a criterion for experts.
selecting features with p-values less than an appropriate alpha From Table III, the Exact Class Label scoring scheme pro-
(α) value [23]–[26]. For Weka analysis, the algorithm ranks the vided an accuracy of 67.2%/54.1% (RZ/SF) and 50.8%/54.1%
features by a parameter called “attributes information gain ratio (RZ/SF) using the SVM and LDA classifiers, respectively.
(AIGR),” where the higher the ratio, the more significant the The Normal versus CIN scoring scheme yielded an accuracy
feature will be for the classification results. For both methods, of 98.4%/88.6% (RZ/SF) and 80.3%/88.6% (RZ/SF) for the
the automatically generated labels of the vertical segmentations SVM and LDA classifiers, respectively. The Off-by-One scor-
and the feature data are given as input. ing scheme obtained an accuracy of 90.2%/96.7% (RZ/SF) and
93.4%/95.1% for the SVM and LDA classifiers, respectively.
IV. EXPERIMENTAL RESULTS AND ANALYSIS The corresponding accuracy figures from the previous re-
search [9] for the LDA classifier are given in the following.
A. Experimental Results
Exact Class Label scoring: 39.3%; Normal versus CIN scoring:
We obtained the vertical segment image classifications (CIN 78.7%; Off-by-One scoring (called “windowed class” in [9]):
grading) using the SVM/LDA classifier with a leave-one-image- 77.0%.

Authorized licensed use limited to: SRM Institute of Science and Technology Kattankulathur. Downloaded on January 28,2025 at 13:06:13 UTC from IEEE Xplore. Restrictions apply.
1604 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 20, NO. 6, NOVEMBER 2016

TABLE III TABLE IV


CONFUSION MATRIX RESULTS FOR WHOLE IMAGE CLASSIFICATION USING ALL CONFUSION MATRIX RESULTS FOR FUSION-BASED CLASSIFICATION USING
27 FEATURES (F1–F27) FOR SVM AND LDA CLASSIFIERS FOR BOTH EXPERTS REDUCED FEATURE SET FOR SVM AND LDA CLASSIFIERS FOR BOTH EXPERTS

Expert RZ: SVM/LDA Expert RZ: SVM/LDA

Normal (16) CIN1 (13) CIN2 (14) CIN3 (18) Normal (16) CIN1 (13) CIN2 (14) CIN3 (18)

Normal 15/9 0/4 0/1 0/0 Normal 14/14 0/0 0/0 0/0
CIN1 1/5 8/5 3/2 3/1 CIN1 2/2 10/12 0/0 0/0
CIN2 0/1 2/3 8/8 5/8 CIN2 0/0 2/1 12/13 3/1
CIN3 0/1 3/1 3/3 10/9 CIN3 0/0 1/0 2/1 15/17
Expert SF: SVM/LDA Expert SF: SVM/LDA

Normal (14) CIN1 (14) CIN2 (17) CIN3 (16) Normal (14) CIN1 (14) CIN2 (17) CIN3 (16)
Normal 9/10 2/3 0/0 0/0 Normal 11/10 3/2 0/0 0/0
CIN1 5/3 6/8 2/3 1/1 CIN1 3/3 9/11 1/1 0/0
CIN2 0/1 5/2 11/8 7/8 CIN2 0/1 2/1 16/16 1/1
CIN3 0/0 1/1 4/6 7/7 CIN3 0/0 0/0 0/0 15/15

TABLE V
CIN DISCRIMINATION RATES FOR FUSION-BASED CLASSIFICATION USING ALL
For feature evaluation and selection experiments, all 27 fea- FEATURES, WHOLE IMAGE CLASSIFICATION, AND REDUCED FEATURE SET
tures extracted from the individual vertical segments were used FUSION-BASED CLASSIFICATION FOR BOTH EXPERTS
as inputs to the SAS MLR algorithm. We used α = 0.05 as
the threshold to determine statistically significant features. The Fusion-based classification
overall 27 features with p-values are presented in Table VII (see SVM (RZ/SF) LDA (RZ/SF)
appendix). From Table VII, features with a p-value smaller than
0.05 are considered statistically significant features. Exact Class Label 86.9%/82.0% 88.5%/82.0%
Normal versus CIN 96.7%/90.2% 96.7%/90.2%
In addition, all 27 features and the truth labels were used as Off-by-One 100%/100% 98.4%/96.7%
input for the Weka information gain evaluation algorithm [27]. Whole image classification
The algorithm ranks the features by an AIGR which ranges from SVM (RZ/SF) LDA (RZ/SF)
Exact Class Label 67.2%/54.1% 50.8%/54.1%
0 to 1. The larger the ratio is, the more likely that the feature is Normal versus CIN 98.4%/88.6% 80.3%/88.6%
considered by the algorithm. The 27 features and corresponding Off-by-One 90.2%/96.7% 93.4%/95.1%
AIGR values are shown in Table VII (see appendix). Reduced feature set
SVM (RZ/SF) LDA (RZ/SF)
Based on the statistically significant features shown in Ta- Exact Class Label 83.6%/83.6% 88.5%/85.3%
ble VII, we selected the feature set consisting of features F1, Normal versus CIN 96.7%/90.2% 95.1%/90.2%
F3, F4, F5, F7, F9, F10, F12, F13, F14, F15, F18, F21–F24, Off-by-One 98.4%/100% 100%/98.4%

F26, F27 as the input feature vectors for the fusion-based classi-
fication. Note that all these features were selected based on the
SAS MLR test of statistical significance except for F23 and F24, of 88.5%/85.3% (RZ/SF), Normal versus CIN classification
which were selected since they have relatively high information of 95.1%/90.2% (RZ/SF), and Off-by-One classification of
gain ratio (AIGR) among the 27 features (the second place and 100%/98.4% (RZ/SF). The highest correct classification rates
third place in Table VII of appendix). Our experiment compared obtained in previous work using the same experimental dataset
classification accuracies using this reduced set of features to the and leave-one-out training/testing approach with the LDA clas-
results using the entire 27 features set, and also compared to the sifier are summarized as follows [9]: Exact Class label of 70.5%,
classification accuracies obtained in the previous research [9]. Normal versus CIN of 90.2%, and Off-by-One classification of
The reduced feature classifications were done for the fusion- 100%.
based method only to remain consistent with the previous re-
search [9]. The classification algorithms (SVM/LDA) were ap-
B. Analysis of Results
plied to the reduced features, followed by fusing of the vertical
segment classifications to obtain the CIN grade of the epithe- In this section, we use the classification results from Section
lium. The resulting classifications obtained in this approach are IV-A, to compare 1) the performance among the scoring ap-
shown as confusion matrices in Table IV for both experts. proaches, 2) the performance between the SVM and LDA clas-
From Table IV, the following correct classification rates were sifiers, and 3) the performance between the previous research
obtained for the reduced features using the SVM-based classi- [9] and this study. Table V gives an overview of the correct
fier: Exact Class Label classification of 83.6%/83.6% (RZ/SF), recognition rates in different classification schemes examined
Normal versus CIN classification of 96.7%/90.2% (RZ/SF), in this paper.
and Off-by-One classification of 98.4%/100% (RZ/SF). From From Table V, the fusion-based classification approach shows
Table IV, the correct classification rates using the LDA improvement (except for same results for Normal versus CIN)
classifier were obtained as: Exact Class Label classification compared to the whole image classification approach, when

Authorized licensed use limited to: SRM Institute of Science and Technology Kattankulathur. Downloaded on January 28,2025 at 13:06:13 UTC from IEEE Xplore. Restrictions apply.
GUO et al.: NUCLEI-BASED FEATURES FOR UTERINE CERVICAL CANCER HISTOLOGY IMAGE ANALYSIS 1605

TABLE VI
SUMMARY OF CLASSIFICATION ACCURACIES: PREVIOUS RESEARCH VERSUS
REDUCED FEATURE SET RESULTS IN THIS PAPER

Previous work [9] Current Work (RZ/SF)


LDA SVM LDA

Fusion-based Exact Class Label 62.3% 86.9%/82.0% 88.5%/82.0%


classification Normal versus CIN 88.5% 96.7%/90.2% 96.7%/90.2%
Off-by-One 96.7% 100%/100% 98.4%/96.7%
Whole image Exact Class Label 39.3% 67.2%/54.1% 50.8%/54.1%
classification Normal versus CIN 78.7% 98.4%/88.5% 80.3%/88.5%
Off-by-One 77.0% 90.2%/96.7% 93.4%/95.1%
Reduced feature Exact Class Label 70.5% 83.6%/83.6% 88.5%/85.3%
classification Normal versus CIN 90.2% 96.7%/90.2% 95.1%/90.2%
Off-by-One 100% 98.4%/100% 100%/98.4%
Fig. 20. Misclassification example of a CIN2 image labeled as a CIN3.

all the feature vectors are used as input for the classifiers. For
the fusion-based versus whole image classification, the fusion-
based approach shows an improvement of 19.7% (minimum
improvement from the two experts from 67.2% to 86.9%) for
SVM and 27.9% (minimum improvement from the two experts
from 54.1% to 82.0%) for LDA using the Exact Class Label
scoring scheme. For the Normal versus CIN scoring scheme, an
accuracy improvement of 1.7% (minimum improvement from
the two experts from 88.5% to 90.2%) for the LDA classifier,
Fig. 21. Misclassification example of a CIN3 image labeled as a CIN1.
although we note an accuracy decline of 1.7% (from 98.4% to
96.7%) was observed for the SVM classifier. For Off-by-One
scoring scheme, classification accuracy increases 3.3% (mini-
mum of two experts from 96.7% to 100%) and 1.6% (minimum ample of an image with an expert label of CIN2 (RZ) that was
of two experts from 95.1% to 96.7%) for SVM and LDA, re- labeled as a CIN3 by the LDA classifier.
spectively. Inspecting Fig. 20, nuclei bridge across the epithelium and
With feature reduction added to fusion-based classification, are relatively uniform in density in the lower left-hand portion of
the fusion-based method improves in half the comparisons. the epithelium (see arrow). The nuclei features and the layer-by-
Specifically, the Exact Class Label accuracy for SVM declines layer DT features, particularly in the vertical segments contain-
by 3.3% (minimum improvement from two experts of 86.9% ing the lower left-hand portion of the epithelium, provide for a
to 83.6%) and LDA’s accuracy yields zero improvement (mini- higher CIN grade. In other regions of the epithelium, the nuclei
mum from two experts of 88.5% to 88.5%). For Normal versus density is not as uniform across the epithelium, which could
CIN, there is no improvement (0%) for SVM for both experts, provide for a less severe CIN grade label for the epithelium.
and a 1.6% loss (minimum from two experts of 96.7%–95.1%) Fig. 21 shows an example of an image with an expert label
in accuracy for LDA. For Off-by-One, the SVM classifier has a of CIN3 (RZ) that was labeled as a CIN1 by the LDA classifier.
1.6% decline (minimum of experts from 100% to 98.4%), and This image has texture, nuclei distribution, and color typical
LDA has zero improvement (from 98.36% to 98.36%), and LDA of a CIN3 grade. However, the white gaps present along the
has a gain of 1.6% (minimum of experts from 98.4% to 100%). epithelium were detected as acellular regions, leading to the
Among all of classification results obtained by the two differ- misclassification.
ent classifiers, the highest come from the fusion-based classifi- The overall algorithm was found to be robust in successful
cation. The highest Exact Class Label classification accuracy by identification of nuclei. Nuclei in the two lightest-stained slides
the two experts was 88.5%/85.3% (LDA, reduced feature set). and the two darkest-stained slides were counted. An average
In comparison, SVM obtained 83.6%/83.6% by both experts for of 89.2% of nuclei in all four slides was detected. The 89.2%
the reduced feature set. The accuracies for Normal versus CIN nuclei detection rate observed represents an advance over the
and Off-by-One are relatively high for both experts (above 90% results of Veta et al. [28], who found 85.5% to 87.5% of nuclei
for both SVM and LDA classifiers, and for both the full and the (not strictly comparable, as these results were for breast cancer).
reduced feature sets). A summary of the results from this study The finding of a high percentage of nuclei in the lightest- and
and from the previous research [9] is shown in Table VI below. darkest-stained slides shows that the algorithm is adaptable and
Note that only the LDA classifier was reported in [9]. robust with regard to varying staining.
In examining the classification results, the majority of the Ex- The approach in this study expands the techniques of other
act Class Label classification errors are off-by-one CIN grade. researchers who often process but a single cell component: the
This is supported with the high Off-by-One classification rates nucleus. We show in this paper that the transition from benign
for the different experiments performed. Fig. 20 shows an ex- to cancer affects the whole cell. We have shown that not only

Authorized licensed use limited to: SRM Institute of Science and Technology Kattankulathur. Downloaded on January 28,2025 at 13:06:13 UTC from IEEE Xplore. Restrictions apply.
1606 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 20, NO. 6, NOVEMBER 2016

nuclei, but in fact features of the entire cell, including intercel- De were the same training and testing sets used. The classifi-
lular spaces, are changed due to the more rapidly growing cells. cation results presented in this study for the two experts only
Thus, one of the top four features by p-value is the proportion differed by greater than 8.2% in the Exact Class Label of the
of regions of cytoplasm in the image (F12). whole image. Thus, the experimental results suggest that the
We also sought to use layers to better represent the CIN tran- involvement of nuclei and nuclei-related features using vertical
sition stages. The number of DTs in the middle layer was also segment classification and fusion for obtaining the image-based
one of the top four features by p-value, validating our approach CIN classification is an improvement over the existing methods
of analysis by layers. The last two features with the most signif- for automated CIN diagnosis. Even though our method out-
icant p-values were the energy of a GLCM (the sum of squared performed published results, we note that there is potential for
elements in GLCM in horizontal and vertical directions). The further improvement.
energy in the GLCM appears to capture the growing biological
disorder as the CIN grade increases.
We emphasize that between the previous research [9] and V. SUMMARY AND FUTURE WORK
our paper, 1) the training and testing datasets are the same; 2) In this study, we developed new features for the automated
the classifier (LDA) is the same and we investigated the SVM CIN grade classification of segmented epithelium regions. New
classifier; and 3) the scoring schemes, (Exact Class Label, Nor- features include nuclei ratio features, acellular area features,
mal versus CIN, and Off-by-One) are the same (in the previous combination features, and layer-by-layer triangle features. We
research, Off-by-One was called “windowed class”). There are carried out epithelium image classification based on these
two differences between the previous and current work. First, ground truth sets: 1) two experts labeled 62 whole epithelium
CIN classification results are reported for two experts (RZ and images as Normal, CIN1, CIN2, and CIN3, and 2) investiga-
SF) in this study to demonstrate CIN classification improve- tor labeling of ten vertical segments within each epithelium
ment over previous work, even with variations in the expert image into the same four CIN grades. The vertical segments
CIN truthing of the experimental dataset. Second, three acel- were classified using an SVM or LDA classifier, based on the
lular features (F18, F21, F22) and two layer-by-layer triangle investigator-labeled training data of the segments with a leave-
(F26, F27) nuclei features were found to be significant (from one-out approach. We used a novel fusion-based epithelium im-
Table VII), which are new in this paper, contribute to improved age classification method which incorporates a voting scheme
CIN discrimination capability over previous work. to fuse the vertical segment classifications into a classification
For the fusion-based method applied to all the feature vectors, of the whole epithelium image. We evaluated the classification
Table VI shows that Exact Class Label accuracy increases by results with three scoring schemes, and compared the classifi-
19.7% (from 62.3% to 82.0% for the lower of the two expert cation differences by classifiers, by scoring schemes, and the
results) for LDA. For the whole image method, LDA improved classification results of this paper as compared to our previous
by 14.8% (from 39.3% to 54.1% for the lower of the two expert work [9].
results). For fusion-based classification with reduced feature We found that the classification accuracies yielded in this
vectors, accuracy increases by 14.8% (from 70.5% to 85.3% for study with nuclei features outperformed that of the previous
the lower of the two experts) for LDA. Since Exact Class Label work [9]. Using the LDA classifier upon the reduced set of fea-
is the most stringent of the scoring schemes we used, we inter- tures, and based on an Off-by-One classification scoring scheme
pret these results as showing a substantial gain in classification for epithelium region classification, correct prediction rates as
in classification accuracy when using the nuclei features and high as 100% were obtained. Normal versus CIN classification
nuclei-related features. rates were as high as 96.72%, whereas the rates for Exact Class
The Off-by-One classification achieved excellent classifica- Labels were as high as 88.52% using a reduced set of features.
tion accuracy (100%) with the SVM and LDA classifiers, which Future research may include the use of adaptive critic design
matches the results from the previous study [9]. This classifica- methods for classification of CIN grade. Also, it is important to
tion metric gives more evidence of the similarity of neighboring include more cervix histology images to obtain a comprehensive
classes (Normal/CIN1 or CIN1/CIN2 or CIN2/CIN3) and the dataset for different CIN grades. With the enhancement of the
difficulties in discriminating between them [6]–[8]. It is also dataset, inter- or intrapathologist variations can be incorporated
consistent with the intra- and interpathologist variation in la- [6].
beling of these images. The two experts for this study differed Gwilym Lodwick, among his many contributions to diag-
in the CIN labeling of five images (out of 61) (or 8.2%) in the nostic radiology, contributed to our basic knowledge of pattern
experimental dataset, with the experts differing by only 1 CIN recognition by both humans and computers. The importance
grade (higher or lower) in each of the five cases. of diagnostic signs, which he also termed minipatterns, was
Overall, the 88.5%/85.3% accuracy by the two experts of stated: “Signs, the smallest objects in the picture patterns of
Exact Class Label prediction using the reduced features is disease, are of vital importance to the diagnostic process in that
23.0% higher than published results for automated CIN diag- they carry the intelligence content or message of the image”
nosis (62.3%) as presented by Keenan et al. in [12], 17.3% [29]. In this context, Professor Lodwick also maintained that
higher than the accuracy of the method used by Guillaud et al. these signs are at the heart of the human diagnostic process.
(68%) in [11], and 14.8% higher than the accuracy of the method The results of our study appear to indicate that the new layer-
by De [9], although we note that only in the comparison with by-layer and vertical segment nuclei features, in the domain of

Authorized licensed use limited to: SRM Institute of Science and Technology Kattankulathur. Downloaded on January 28,2025 at 13:06:13 UTC from IEEE Xplore. Restrictions apply.
GUO et al.: NUCLEI-BASED FEATURES FOR UTERINE CERVICAL CANCER HISTOLOGY IMAGE ANALYSIS 1607

cervical cancer histopathology, provide useful signs or minipat- inter-pathologist variation and factors associated with disagreement,”
terns to facilitate improved diagnostic accuracy. With the advent Histopathology, vol. 16, no. 4, pp. 371–376, 1990.
[8] C. Molloy, C. Dunton, P. Edmonds, M. F. Cunnane, and T. Jenkins, “Eval-
of advanced image processing techniques, these useful signs uation of colposcopically directed cervical biopsies yielding a histologic
may now be employed to increase the accuracy of computer diagnosis of CIN 1, 2,” J. Lower Genital Tract Dis., vol. 6, pp. 80–83,
diagnosis of cervical neoplasia, potentially enabling earlier di- 2002.
[9] S. De, R. J. Stanley, C. Lu, R. Long, S. Antani, G. Thoma, and R. Zuna,
agnosis for a cancer that continues to exact a significant toll on “A fusion-based approach for uterine cervical cancer histology image
women worldwide. classification,” Comput. Med. Imag. Graph., vol. 37, pp. 475–487, 2013.
[10] J. V. D. Marel, W. G. V. Quint, M. Schiffman, M. M. Van-de-Sandt,
R. E. Zuna, S. Terence-Dunn, K. Smith, C. A. Mathews, M. A. Gold,
APPENDIX J. Walker, and N. Wentzensen, “Molecular mapping of high-grade cervical
intraepithelial neoplasia shows etiological dominance of HPV16,” Int. J.
TABLE VII Cancer, vol. 131, pp. E946–E953, 2012.
FEATURES WITH CORRESPONDING P-VALUES AND AIGR [11] M. Guillaud, K. Adler-Storthz, A. Malpica, G. Staerkel, J. Matisic,
D. V. Niekirk, D. Cox, N. Poulin, M. Follen, and C. MacAulay, “Subvisual
chromatin changes in cervical epithelium measured by texture image anal-
Feature p-value AIGR Feature p-value AIGR
ysis and correlated with HPV,” Gynecologic Oncol., vol. 99, pp. S16–S23,
2006.
F1 0.0013 0.226 F15 0.1101 0.505
[12] S. J. Keenan, J. Diamond, W. G. McCluggage, H. Bharucha, D. Thompson,
F2 > 0.05 0.21 F16 > 0.05 0.2713
P. H. Bartels, and P. W. Hamilton, “An automated machine vision system
F3 0.0182 0.026 F17 > 0.05 0.2897
for the histological grading of cervical intraepithelial neoplasia (CIN),” J.
F4 0.0425 0.204 F18 0.0201 0.2357
Pathol., vol. 192, no. 3, pp. 351–362, 2000.
F5 0.0604 0.171 F19 > 0.05 0.2717
[13] C. R. Maurer, Q. Rensheng, and V. Raghavan, “A linear time algorithm
F6 > 0.05 0.0309 F20 > 0.05 0.2990
for computing exact Euclidean distance transforms of binary images in
F7 0.0051 0.2057 F21 0.0320 0.3608
arbitrary dimensions,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25,
F8 > 0.05 0.079 F22 0.0646 0.3295
no. 2, pp. 265–270, Feb. 2003.
F9 0.0001 0.080 F23 > 0.05 0.3975
[14] C. R. Rao, H. Toutenburg, A. Fieger, C. Heumann, T. Nittner, and
F10 0.0001 0.034 F24 > 0.05 0.4713
S. Scheid, Linear Models: Least Squares Alternatives (Springer Series
F11 > 0.05 0.205 F25 > 0.05 0.1001
Statistics). New York, NY, USA: Springer-Verlag, 1999.
F12 0.0001 0.169 F26 0.0001 0.1037
[15] K. Banerjee, “Uterine cervical cancer histology image feature extraction
F13 0.0033 0.2287 F27 0.0001 0.2644
and classification,” M.S. thesis, Dept. Electr. Comput. Eng., Missouri
F14 0.0037 0.1800
Univ. Sci. Technol., Rolla, MO, USA, 2014.
[16] Mathworks. (2015). [Online]. Available: http://www.mathworks.com/
help/images/ref/regionprops.html
[17] R. Gonzalez and R. Woods, Digital Image Processing, 2nd ed. Englewood
Cliffs, NJ, USA: Prentice-Hall, 2002.
ACKNOWLEDGMENT [18] J. Borovicka. (2003). Circle detection using Hough transforms. Course
Project: COMS30121. Image Process. Comput. Vis. [Online]. Available:
The authors would like to thank the medical expertise and http://linux.fjfi.cvut.cz/pinus/bristol/imageproc/hw1/report.pdf
collaboration of Dr. M. Schiffman and Dr. N. Wentzensen, both [19] F. R. Preparata and M. I. Shamos, Computational Geometry: An Introduc-
tion. New York, NY, USA: Springer-Verlag, 1985.
from the National Cancer Institute’s Division of Cancer Epi- [20] W. J. Krzanowski, Principles Multivariate Analysis: A User’s Perspective.
demiology and Genetics. New York, NY, USA: Oxford Univ. Press, 1988.
[21] R. E. Fan, P. H. Chen, and C. J. Lin, “Working set selection using sec-
ond order information for training SVM,” J. Mach. Learn. Res., vol. 6,
REFERENCES pp. 1889–1918, 2005.
[22] C. C. Chang and C. J. Lin, “LIBSVM: A library for support vector ma-
[1] World Health Organization, Department of Reproductive Health and Re- chines,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 27, pp. 1–27, 2011.
search and Department of Chronic Diseases and Health Promotion, Com- [23] A. Agresti, Introduction Categorical Data Analysis. New York, NY, USA:
prehensive Cervical Cancer Control: A Guide to Essential Practice, 2nd Wiley, 1996.
ed. Geneva, Switzerland: WHO Press, 2006, p. 7. [24] M. Pal, “Multinomial logistic regression-based feature selection for hy-
[2] J. Jeronimo, M. Schiffman, L. R. Long, L. Neve, and S. Antani, “A tool for perspectral data,” Int. J. Appl. Earth Obs. Geoinform., vol. 14, no. 1,
collection of region based data from uterine cervix images for correlation pp. 214–220, 2012.
of visual and clinical variables related to cervical neoplasia,” in Proc. [25] T. Li, S. Zhu, and M. Ogihara, “Using discriminant analysis for multi-class
IEEE 17th Symp. Comput.-Based Med. Syst., 2004, pp. 558–562. classification: An experimental investigation,” Knowl. Inf. Syst., vol. 14,
[3] V. Kumar, A. Abba, N. Fausto, and J. Aster, “The female genital tract,” in no. 4, pp. 453–472, 2006.
Proc. Robbins and Cotran Pathologic Basis of Disease, 9th ed., V. Kumar, [26] D. Hosmer and S. Lemeshow, Applied Logistic Regression, 2nd ed. New
A. K. Abbas, J. C. Aster, Eds. Philadelphia, PA, USA: Saunders, 2014, York, NY, USA: Wiley, 2000.
ch. 22, pp 1017–1021. [27] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and
[4] L. He, L. R. Long, S. Antani, and G. R. Thoma, “Computer assisted I. H. Witten, “The WEKA data mining software: An update,” SIGKDD
diagnosis in histopathology,” in Sequence and Genome Analysis: Methods Explor., vol. 11, no. 1, pp. 10–18, 2009.
and Applications, Z. Zhao, Ed. Hong Kong: iConcept Press, 2011, ch. 15, [28] M. Veta, P. J. V. Diest, R. Kornegoor, A. Huisman, M. A. Viergever, and
pp. 271–287. J. P. W. Pluim, “Automatic nuclei segmentation in H&E stained breast
[5] Y. Wang, D. Crookes, O. S. Eldin, S. Wang, P. Hamilton, and J. Diamond, cancer histopathology images,” PLoS ONE, vol. 8, no. 7, art. no. e70221,
“Assisted diagnosis of cervical intraepithelial neoplasia (CIN),” IEEE J. 2013, doi:10.1371/journal.pone.0070221
Sel. Topics Signal Process., vol. 3, no. 1, pp. 112–121, Feb. 2009. [29] G. Lodwick, “Diagnostic signs and minipatterns,” in Proc Annu. Symp.
[6] W. G. McCluggage, M. Y. Walsh, C. M. Thornton, P. W. Hamilton, A. Date, Comput. Appl. Med. Care, 1980, vol. 3, pp. 1849–1850.
L. M. Caughley, and H. Bharucha, “Inter- and intra-observer variation in
the histopathological reporting of cervical squamous intraepithelial lesions
using a modified Bethesda grading system,” Brit. J. Obstetrics Gynaecol.,
vol. 105, no. 2, pp. 206–210, 1988.
[7] S. M. Ismail, A. B. Colclough, J. S. Dinnen, D. Eakins, D. M. Evans,
E. Gradwell, J. P. O’Sullivan, J. M. Summerell, and R. New-
combe, “Reporting cervical intra-epithelial neoplasia (CIN): Intra- and Authors’ photographs and biographies not available at the time of publication.

Authorized licensed use limited to: SRM Institute of Science and Technology Kattankulathur. Downloaded on January 28,2025 at 13:06:13 UTC from IEEE Xplore. Restrictions apply.

You might also like