1 Gopakumar2017 PDF
1 Department of Earth and Space Sciences, IIST, India – 695547
2 Department of Instrumentation and Applied Physics, IISc, India – 560012
3 Department of Electrical Engineering, Indian Institute of Technology Tirupati, India – 517006
* Corresponding author: e-mail: [email protected]
This article has been accepted for publication and undergone full peer review but has not
been through the copyediting, typesetting, pagination and proofreading process, which
may lead to differences between this version and the Version of Record. Please cite this
article as doi: 10.1002/jbio.201700003
This article is protected by copyright. All rights reserved.
network to classify different stages of parasite infections, while [19] uses an SVM classifier on scale invariant features (SIFT) [20]. These methods operate on different datasets (differing in stain and dataset size) and report sensitivity in the range 81.7% to 95% and specificity in the range 92.59% to 100%. For example, the work in [8] reports a sensitivity of 83% and a specificity of 98% on a small dataset of 55 slide images that used Leishman stain.
In this paper, we address detection as well as counting of Plasmodium falciparum infected Red Blood Cells (RBCs) from Leishman stained microscope slide images. The choice of Leishman stain was motivated by the recent study [21], which compared the use of Leishman and Giemsa stains for malaria diagnosis and suggested Leishman as a good alternative. Even though Giemsa staining is more commonly used, the Leishman staining method provides better clarity for visualization of the nuclear content (chromatin pattern) within the cells. Since there are White Blood Cells (WBCs) in actual blood samples, and our objective is to deal only with RBCs (not to classify WBCs), Leishman is a good staining alternative, as it provides dark blue staining of the nuclear chromatin structures. This helps us to easily locate WBCs (which are nucleated) and ensure that they are segregated from infected RBCs. A comparative evaluation of conventional staining methods and immunological techniques for the diagnosis of malaria can be found in [22].
As discussed earlier, although there have been a large number of attempts to automate malaria diagnosis, almost all of them inspect only the best focused image to identify the infected cells. This has not only increased the cost of the device, since sophisticated methods are employed to generate the in-focus image, but has also introduced the possibility of misclassifying dark image artifacts (due to dust on the camera/relay lens) as parasites. Also, parasites appear differently on slides depending on their life-cycle stage, and it is often hard to get all of them in a single focus. We address these problems by acquiring a focus stack of images. The method is simple, and though it seems slow at first, the additional information acquired as part of the focal stack is not discarded but is used in subsequent image processing. By avoiding sophisticated methods to generate the in-focus image and by using off-the-shelf optical and electronic components, the cost of the system was reduced considerably. By making use of the focus stack in subsequent processing, we have also increased the overall detection efficiency. As of now, the lowest price of an automated slide scanner is US $25,000 (PathScope [23]), whereas the component cost (not the commercial cost) of the instrument proposed here is as low as US $1500.
For the image analysis, a CNN that operates directly on the focus stack of images is employed to identify malarial infection. We compare the results of detection of malaria infected RBCs in terms of sensitivity and specificity obtained by 1) a support vector machine classifier trained on the statistical and textural features extracted from the suspected parasite locations (similar to [6, 13, 14]), and CNN classifiers trained on 2) the 32 × 32 patches surrounding the suspected locations of the best focus image and 3) the focus stack of 32 × 32 patches surrounding the suspected locations. The results show that the system operating on the focus stack produces the best results both in terms of sensitivity and specificity and hence support the design of our custom-built focus stack collecting portable microscope.
The main contributions of the paper are: 1) the method of identifying and processing only the suspected parasite locations and not the entire slide, thereby saving computing power; 2) the proposal of using a focus stack of image patches for malaria detection instead of a single focused image, thereby achieving better accuracy; 3) the custom-built portable slide scanner to facilitate focus stack acquisition in a cost-effective manner; 4) the use of a CNN rather than a conventional classifier such as SVM, to exploit the ability of the classifier to operate directly on the focus stack; and 5) the automated cell counter making use of a proposed RBC segmentation strategy.
This paper is organized as follows: section 2 gives an overview of the operating framework, section 3 discusses the detection of parasite infected cell locations, and section 4 addresses the automated counting procedure. A description of the quantitative analysis is provided in section 5, followed by the conclusions drawn from the present study in section 6.

2. Overview of the Framework
This section discusses the proposed prototype slide scanner used to collect the dataset and outlines the proposed method for detecting and counting infected RBCs.

2.1. Experimental setup and Dataset generation
Figure 1 Schematic representation of the prototype slide scanner
The dataset used in the study consists of videos containing focus stacks of multiple Fields-of-View (FoVs) of Leishman stained slide images prepared using WBC-spiked cultured malarial samples. The P. falciparum malaria culture is maintained in 5% hematocrit with O+ red blood cells containing Roswell Park Memorial Institute (RPMI) media. For preparation of blood mimicking the patient sample, the malaria culture is
then spiked with a very small amount of WBCs extracted by partial centrifugation of a blood sample. About 10 µL of the resultant sample is pipetted onto a clean glass slide (base slide) and spread using a wedge slide, held at an angle of approximately 45° with the base slide. The wedge slide is moved horizontally over the base slide to produce a thin smear of the resultant sample. The prepared smears are exposed to 5% Leishman stain for 10 minutes for fixing and staining the cells. The stained slides are mounted with cover glass after DePeX addition for proper preservation. The slides are then sequentially imaged under the imaging setup.
A custom-built, focus-stack collecting, bright-field transmission microscope setup, built with inexpensive, off-the-shelf optical components and a camera unit, is employed to capture the dataset for the present study. The objective behind the realization of such a slide scanner is to devise an affordable and reliable cyto-diagnostic platform that can provide focus stacks and function as an automated alternative to traditional microscopic tests for malaria diagnosis in resource-limited settings. The optical setup of the prototype slide scanner unit consists of a white light LED source and the necessary collimating lens arrangement for uniform illumination of the sample plane and subsequent acquisition of digital images of the slide. A low-cost 40X objective lens (Lawrence and Mayo) is employed to magnify the sample features. For a digital imaging system like ours, the characteristic of interest is the digital resolution, which is an interplay of the magnification of the objective lens used in the setup and the pixel size of the camera sensor used for imaging. For clear identification of sample features from the respective FoV images, the system should have good digital resolution. With the given setup, our objective is to cover a larger FoV without compromising the digital resolution of the setup. To meet these requirements, a digital colour camera (DFK22BUC03, Imaging Source) with a very fine pixel size of 6 µm and a relay lens unit is employed to capture the videos/images of the slide. This has enabled us to get very good results in terms of the quality of recorded images even at a lower magnification of 40X, which sufficed for the diagnostic requirement. A custom-built 3-D printed motorized translational stage integral to the developed prototype is employed to navigate the slide across multiple FoVs. The motorized stage holds the sample slide in place and translates the slide independently along the x, y and z directions. The stage was built by assembling three readily available low-cost linear actuator stepper motor units (Nema 11). Three separate digital motor driver units (DM422C) supply the required driving current to the respective motors. Each of the three axes of the motorized translational stage can be independently controlled, enabling whole slide scanning. Appropriate electrical signals to enable the motors and to control the speed and direction of the motors are supplied by two off-the-shelf micro-controller boards (Arduino UNO). The Imaging Source camera was plugged into an on-board processor (Intel NUC) to record the videos. The on-board processor also communicates with the micro-controller boards through the Python Firmata serial communication link to facilitate the software control of the motors. An LCD display unit with touch-screen is attached to the prototype slide scanner for user interaction and display purposes.
Figure 1 shows the schematic representation of the developed prototype slide scanner unit. All major electronic components of the prototype slide scanner, including the on-board processor, display and motor drivers, are powered using a regulated switching power supply with multiple voltage tappings (RT-125D, MeanWell). The micro-controller boards are powered from the USB ports of the Intel NUC board. Motor control and simultaneous data acquisition are carried out by executing Python (version 2.7) code on the on-board processor. The custom-built prototype slide scanner employs a z-stacking/scanning approach to acquire the whole slide images and a passive focusing mechanism to determine the in-focus images from multiple FoVs. The z-stacks are recorded after translating the slide by appropriately actuating the lateral direction motors. In the process of z-stacking, the z-axis motor is translated in steps along the direction of the optical axis, and the focus stack videos are recorded during the entire span. The z-stacking approach adopted in the prototype offers multiple advantages. Since a passive focusing mechanism is employed, the system relies on commonly used opto-mechanical components and does not require any high-precision piezo positioning units, thus reducing the overall system cost to less than US $1500. Further, the additional information derived from the focus stacks can be used for examination of very fine sample features which may not be evident from the analysis of just the best focus image, and for differentiating imaging artifacts (small dark spots in images due to dust particles on the camera sensor/relay lens) from similar sample features, thereby minimizing the cases of misclassification. Further, focal stacking is essential while examining thicker sections of the slides.
We have used 765 FoVs containing 62015 cells, of which 1191 cells are infected. The ground truth was determined in consultation with experts working in the field, who inspected the focus stack of images; this helped them to clearly differentiate the infected locations from other artifacts. The FoVs for our experiments are selected from 2 slides. For each FoV of a slide, the z-axis motor is used to translate in steps across the best focused frame, capturing multiple images. In this way, all valid FoVs of the slide are captured. We use variance as the focus measure to direct the z-axis motor and ensure that we always move across the best focused frame. In an ideal setup, once the best focused frame location is fixed using the z-axis motor, taking a fixed number of frames in both directions about it, and doing so for all FoVs, would suffice. But in our setup, due to vibrations of the motors, there was a small shift in the best focused location across the movement of different motors. We address this problem by readjusting (automatically) between successive focus scans, keeping track of the z-motor position that produced the best focused frame in the last scan. We can also use the same focus measure to decide whether to image a FoV (i.e., whether it contains sufficient cells to image). For the present study, image variance computed from the gray scale equivalent image is employed as the focus measure
to identify the best focus image in a given focus stack. The
variance computation can be expressed as follows (Eq. (1)).
$$ F = \frac{1}{M N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( G(i,j) - \mu \right)^{2} \qquad (1) $$
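The focus measure of Eq. (1) can be sketched in a few lines of Python. This is only an illustration: it assumes that G(i, j) is the gray level at pixel (i, j) of an M × N frame and that µ is the mean gray level of that frame, and it assumes that the best focused frame is the one with the largest variance. The function names `focus_measure` and `best_focus_index` are hypothetical and are not taken from the authors' code.

```python
def focus_measure(gray):
    """Image variance F of Eq. (1) for a gray-scale frame given as a list of rows."""
    pixels = [float(v) for row in gray for v in row]
    mu = sum(pixels) / len(pixels)  # mean gray level of the frame
    return sum((v - mu) ** 2 for v in pixels) / len(pixels)


def best_focus_index(stack):
    """Index of the frame with the largest variance (assumed here to be the best focused)."""
    scores = [focus_measure(frame) for frame in stack]
    return scores.index(max(scores))


# Toy focus stack: a flat frame, a high-contrast checkerboard frame, another flat frame.
flat = [[0.5] * 4 for _ in range(4)]
sharp = [[float((i + j) % 2) for j in range(4)] for i in range(4)]
stack = [flat, sharp, flat]
best = best_focus_index(stack)  # the checkerboard frame has the highest variance
```

A featureless frame scores zero, while a frame with strong local contrast scores high, which is why the same measure could also indicate whether a FoV contains enough cells to be worth imaging.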
[Figure 2 plot: image variance (y-axis, 0 to 900) against frame number in the video stack (x-axis, 0 to 3000)]
Figure 2 Plot showing image variance (Y Axis) across 13 focus stacks (the best focused image for each FoV is marked in red).
Fig.3 shows the best focused image from a FoV stack. It contains four infected RBCs (encircled in red), an imaging artifact on an RBC due to dust on the camera sensor (encircled in black) and a WBC (encircled in blue).

2.2. Framework
have found that the CNN working on the focus stack gave the best performance. The detailed discussion can be found in section 3.
Section 4 discusses the problem of segmenting the cells towards taking the count of infected RBCs. In the segmentation procedure, we make use of the typical size of an RBC to determine the number of cells in the slide, particularly when there are clumps of cells. Since a WBC is typically bigger than an RBC, we separate WBCs first, before deciding the number of RBCs by subsequent segmentation. Once the infected locations are identified and the cells are appropriately segmented out, the counting of infected RBCs is trivial. The flowchart of the procedure is shown in Fig.5.
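The counting procedure relies on the typical size of an RBC to estimate how many cells a clump contains. The arithmetic can be sketched as follows, using figures quoted in the paper's segmentation section (15 pixels correspond to 2.19 µm, and the typical cell diameter is 41 pixels). The function names, the rounding rule and the floor of one cell per segment are assumptions of this sketch rather than details stated in the paper.

```python
import math

PIXELS_PER_UM = 15 / 2.19  # imaging scale quoted in the paper: 15 pixels per 2.19 micrometres


def um_to_pixels(length_um):
    """Convert a length in micrometres to pixels for this imaging setup."""
    return length_um * PIXELS_PER_UM


def disk_area(radius_px):
    """Area in pixels of a disk of the given radius (pi * r^2)."""
    return math.pi * radius_px ** 2


def expected_cell_count(segment_area_px, typical_diameter_px=41):
    """Segment area divided by typical cell area, rounded (rounding is an assumption)."""
    ratio = segment_area_px / disk_area(typical_diameter_px / 2)
    return max(1, round(ratio))


r_small = um_to_pixels(3.0)  # radius in pixels of a 6 micrometre cell
r_large = um_to_pixels(4.0)  # radius in pixels of an 8 micrometre cell
```

An estimate based on area alone will undercount heavily overlapping cells, since their combined footprint is smaller than the sum of their individual areas.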
Figure 5 The flowchart depicting cell counting
Figure 6 The centroid of the regional minima superimposed on the image shown in Fig.3

3.2. Detecting infected locations
Once the candidate locations are determined, the next step is to identify the locations which are really infected by parasites. We treat this as a binary classification problem where

3.2.2. Classification by CNN
In addition to using SVM on hand-engineered features, we have used custom-built CNNs (Fig. 7) for detecting the infected locations. One of the designed CNNs operates directly on RGB candidate patches selected from the best focused image, while the other operates on the focus stack of patches. We compare the advantage of using the focus stack over only the best focused image in identifying the infected cells.
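The two CNN inputs can be pictured as follows: a 32 × 32 patch cut around a candidate location from the best focused frame alone, or the same window cut from every frame of the focus stack and collected into one multi-frame input. The sketch below assumes that frames are stored as lists of rows, that the window is shifted inwards at the image border, and that every frame is at least as large as the patch. The function names are hypothetical, and how the authors arrange the frames into channels is not specified here.

```python
def extract_patch(frame, row, col, size=32):
    """Cut a size x size patch centred (as nearly as possible) on (row, col).

    The window is shifted inwards at the borders (an assumption of this sketch)
    so that the patch always has the full size.
    """
    height, width = len(frame), len(frame[0])
    top = min(max(row - size // 2, 0), height - size)
    left = min(max(col - size // 2, 0), width - size)
    return [r[left:left + size] for r in frame[top:top + size]]


def extract_patch_stack(frames, row, col, size=32):
    """Cut the same window from every frame of a focus stack."""
    return [extract_patch(f, row, col, size) for f in frames]


# Toy example: 5 frames of 48 x 64 pixels, each pixel holding its frame index.
frames = [[[k] * 64 for _ in range(48)] for k in range(5)]
single = extract_patch(frames[2], row=10, col=60)      # best-focus-only input
stacked = extract_patch_stack(frames, row=10, col=60)  # focus-stack input
```

Using the same window for every frame keeps the patches spatially aligned, so a change in appearance across the stack reflects focus rather than position.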
The CNN is a biologically inspired feed-forward multi-layer artificial neural network that maps an input to an output. The connectivity pattern between its neurons is inspired by the organization of the animal visual cortex, so that neurons respond to overlapping regions tiling the visual field. Internally, it can be thought of as a composition of functions, each implementing simple convolutions on an input feature map using learned kernels, interleaved with non-linear and pooling operations and followed by locally or fully connected layers [28]. With the advent of high-end computing capability, the CNN has recently become the de facto standard for classification and has provided reliable results in the medical domain. CNNs have been successfully used for detecting microcalcification on mammograms [29], classifying interstitial lung diseases [30], detecting pathologic cases in chest X-rays [31], detecting lung nodules in chest radiographs [29] and detecting mitosis in breast histology images [32]. Recently, [33] studied the capability of CNNs (both transfer learning and standalone classification) in deciding whether a sample is malaria infected or not. Their dataset contains 27578 RBCs from Giemsa stained slide images, and they report that the CNN as a standalone classifier produced a better classification result (mean accuracy 97.37%) than transfer learning based classification (91.99%).
The basic building blocks of a CNN are convolution, Rectified Linear Unit (ReLU) and sub-sampling. Feature extraction happens at the convolution layers. The activation function, ReLU, suppresses a feature value (ideally to 0) if it is negative; otherwise it reproduces the same feature at the output. Thus, the ReLU introduces the needed non-linearity to the features extracted by the convolution layers. Since the gradient of ReLU is 1 for positive inputs, the vanishing gradient problem [34] does not arise in such cases during training; this problem is more severe with the sigmoid activation function, especially at the lower layers of the network, due to the chain rule of backpropagation. The sub-sampling layer introduces small shift and scale invariances to the features, as the convolutions thereafter operate on a scaled-down version of the input. Note that, during training, the only parameters that have to be learned in the CNN (Fig. 7) are the kernel weights used by the convolutions. Intuitively, the operation at a convolution block using the learned kernel on an input image can be thought of as extracting very local background and foreground features, such as (but not exactly) low-pass and high-pass filters, edges (just like operating a Sobel [35] kernel on the image) and corners. Typically such low-level features are extracted at convolution layers lower in the hierarchy, and subsequent convolutions in upper layers operate on these low-level features to possibly extract more high-level, task-specific features. Thus, we are actually learning the feature detectors (not the actual features), which, when applied to test images, extract the needed features.
Because of its capability to deal directly with multi-channel images, the CNN opens up a good space for analysing focus stacks, particularly in malaria diagnosis. The basic design of the CNN used in our experiment is shown in Fig.7. Here, the input is a D channel image patch of size 32 × 32. The C, R, and P in the blocks represent Convolution, ReLU, and Pooling respectively. The size of the kernels used for convolution is shown under each block, where the subscript shows the number of kernels used. The size of the output map computed at each step is shown above the blocks. All pooling blocks perform max pooling over a 2 × 2 area with a stride of 2. We have chosen this architecture based on the following observations. As we are dealing with patches around suspected parasite locations, and the typical RBC size in our images is 41 × 41, the input to the CNN is set to 32 × 32, which is large enough to hold the neighbourhood needed for making the final decision. We have used standard CNN building blocks: convolution, ReLU and sub-sampling. We prefer max-pooling based sub-sampling to average pooling since we do not want to average out the details. We have chosen the standard kernel size 5 × 5 for the feature extraction. This being a binary classification, the number of output neurons was set to 2. Once the input and output are set as discussed, a reasonably deep CNN, not necessarily the chosen one, should give a fair classification. We chose a moderately deep architecture (neither shallow nor too deep) having 4 feature extraction (convolution) layers for our experiment. The non-linearity in the features is introduced by ReLU or max pooling towards the top layers. The output layer producing the deciding feature is set as a convolution layer (which basically turns out to be a fully connected layer, due to the size of the input map) and is motivated by one of the most successful ImageNet CNN models [36]. This made us avoid pooling after R2 and C4, as they are already at the lowest dimension at the feature level. However, the non-linearity of the features from layer C2 is ensured through max pooling.
The CNN is learned by a variant of the backpropagation algorithm and we have used the logarithmic loss of the softmax
output (softmax log-loss, Eq. (2)) as the error function [37].
$$ y = - \sum_{i,j} \log \frac{e^{x_{ijc}}}{\sum_{t=1}^{T} e^{x_{ijt}}} \qquad (2) $$
In Eq. (2), T is the total number of classes, and at the soft-max layer there is one neuron corresponding to each class. Since the number of classes in our case is 2 (infected and healthy), the network produces output at 2 neurons for each input patch. In the ideal case, for an infected patch, the outputs at the first and second neurons will be 1 and 0 respectively. For a healthy patch, this will be reversed to 0 and 1. The soft-max response (the term inside the log function) computes the chance of the neuron's output being 1, where x_{ijc} is the output at the neuron corresponding to the true class (c) for an input. Note that, in our case, c is 1 for an infected patch and 2 for a healthy patch. At the time of testing, the soft-max log-loss layer (SML) is excluded and the network assigns the label of the class which yields the maximum response. We have used CNN building blocks developed for Matlab [37] to design our network.
stacks (3 in each row) of such patches used for training the CNN are provided in Fig. 9. The cells shown in the first row are really infected, while those in the second and third rows are healthy. Note that the cells in the second row have an artifact due to dust and have almost the same appearance across the stack, unlike the parasite in the first row, which changes in appearance across the stack. Thus, the sets that we have selected capture the change profile of the parasite/artifact across the stack and serve the main motive of the experiment: checking whether the focus stack improves the detection accuracy in malarial cases. Thus, in the third experiment, a CNN is trained using these focus stacks of patches, in the same way and using the same candidate locations used to train the CNN on the best focused patches.
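The soft-max log-loss of Eq. (2) can be illustrated for a single patch, where the sum over (i, j) reduces to one term. This is a minimal sketch and not the Matlab building blocks used by the authors: the function names are hypothetical, the scores are invented for illustration, and classes are indexed from 0 here (0 for infected, 1 for healthy), whereas the text uses c = 1 and c = 2.

```python
import math


def softmax(scores):
    """Soft-max response of each output neuron for one input patch."""
    m = max(scores)  # subtracting the maximum avoids overflow without changing the result
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_log_loss(scores, true_class):
    """Negative log of the soft-max response at the true class (0-based index)."""
    return -math.log(softmax(scores)[true_class])


def predict(scores):
    """At test time the loss layer is dropped: pick the neuron with the largest response."""
    return max(range(len(scores)), key=lambda t: scores[t])


scores = [2.0, 0.0]  # hypothetical outputs of the two neurons (infected, healthy)
loss_if_infected = softmax_log_loss(scores, 0)  # small: the scores favour "infected"
loss_if_healthy = softmax_log_loss(scores, 1)   # larger: the scores contradict "healthy"
label = predict(scores)
```

The loss is small when the neuron of the true class dominates and grows as the network assigns more weight to the wrong class, which is what backpropagation then corrects.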
[Figure labels: Clear LTh, Difficult LTh, Clear GTh, Difficult Set; Clear Set = Clear LTh | Clear GTh]
Figure 12 Comparing local adaptive and global thresholding: A) the best focused image B) the region of interest, results for local adaptive and global thresholding respectively.
13.(b)). However, note that the cell regions are better segmented from the background, with fewer breaks/holes, when compared to the result of local thresholding. We will make use of this to isolate more cells, as discussed in the next subsection.

Separating out the segmented cells and clumps
The area and solidity [27] are computed for each object segmented out by local thresholding and are checked against the measures for typical RBCs. The area is computed by counting the number of pixels in the segment, and solidity is measured as the ratio of this area to the area of the convex hull holding the segment. We have set the lower bound for cell area as 1400 pixels and the upper bound as 2400 pixels. If the area of a segment is within these bounds and its solidity is greater than 0.85, it is qualified as a single cell. The bounds on cell area are set from the fact that the typical cell diameter for an RBC is between 6 and 8 micrometers (µm), and by considering that 15 pixels correspond to 2.19 micrometers in our imaging setup. Thus a 3 µm radius corresponds to ~20.6 pixels and a 4 µm radius corresponds to ~27.4 pixels, and the corresponding cell regions must therefore contain roughly 1400 pixels and 2400 pixels (π × r²) respectively. However, the threshold for solidity (0.85) is set empirically. Once the qualified cells are identified from global thresholded image, these are then masked out. By this masking, a few cells which were earlier part of a clump may get isolated in the global thresholded image. This can be seen by analysing the segmented cells in Fig. 13.(c) in the regions of interest marked in Fig. 12. The newly qualified cells are decided by computing the 'area' and 'solidity' measures for the objects remaining in the global thresholded image after the masking operation. The cells identified by this procedure are shown in Fig. 13.(c). All other segments need to be processed separately and are shown in Fig. 13.(d).
Figure 14 Top row : WBCs; Bottom row : infected RBCs
The Leishman stain colours nuclear chromatin structures dark blue (Fig. 14). Being nucleated, all WBCs are stained dark blue, unlike the non-nucleated RBCs. Except for lymphocytes, all WBCs are much larger than RBCs. Thus, we use colour and cell size information to distinguish WBCs from the background and RBCs. We address this as a classification problem. We collected 1000 pixel samples each from RBCs, WBCs and the background region. The colour components at these pixel locations in the LAB colour space (A & B) [38] are used to build likelihood models for RBCs, WBCs and background. We have used a Gaussian to model the distribution, since it was observed that the data is spread more or less as a Gaussian distribution. These models are then used to decide whether a pixel belongs to an RBC, a WBC, or the background. For any test pixel, if the likelihood of WBC is greater than the likelihoods of RBC and background, it is identified as a candidate pixel from a WBC. For any segment thus identified, if its area is less than the maximum area considered for an RBC, it can be either a heavily infected RBC or a lymphocyte. If the RBCs are healthy, then we can differentiate between lymphocytes and RBCs using the colour alone. However, when there is heavy infection, the infected RBCs also stain blue due to the chromatin structures of the parasites. Fortunately, in such cases, there will be dark parasite spots in the infected RBCs, as shown in Fig. 14. In order to identify such spots, we have considered pixels whose intensity in all channels is less than 0.4 (on a scale of 1). If sufficient such pixels (10 connected pixels, in our experiments) are found, the cell is identified as an infected RBC, else as a lymphocyte. The threshold for identifying the black spot is decided by considering 50 heavily infected cells having dark parasite areas. With this procedure, we
could correctly locate WBCs (47 WBCs) without any false positives from the 765 slide images used in this study. Finally, the WBCs are masked out and are excluded from the rest of the procedure to segment cells from the clumps. The WBC identified for the slide image in Fig. 3 is shown in Fig.15.(a). The masked out image which is to be further processed for segmenting the cells is shown in Fig.15.(b).
Figure 15 Masking out WBCs: A) WBCs identified B) WBCs excluded from clumps processing

Get the centres of the cells
Figure 16 Cell centroids for clumps: A) finding cell centroids for the segment marked using bounding box in B; B) cell centroids identified for the clumps
The centroid is the point lying farthest from the cell boundary. Thus, if we measure the distance of each point on the cell from the background, the cell center will be the farthest from the background. This means that the points lying on the periphery of the cell all have zero distance, while a pixel just inside the boundary has unit distance, and so on, with the highest value at the center. If there is only one cell, then the overall maximum in the distance transform occurs at the centroid. However, for a cell clump, the distance transform is supposed to have as many local maxima as the number of cells contained in it. These local maxima are due to the high distance transform value of each cell, which has a boundary with the background for the majority of its perimeter, though there is a portion which clumps it with other cells. There are efficient algorithms that compute this distance metric [39]. Therefore, we compute the distance transform of the segments, and regional maxima points are identified as initial centroids of cells in the segment. Since the typical cell diameter is 41 pixels, the region for determining the local maxima is fixed as a disk of diameter 41. If the number of centroids identified for a segment does not match the number of cells expected from the clump (segment area/typical cell area), the computed centroid regions are masked out from the segment and the remaining centroids are picked out using the distance transform of the masked out segment. Towards this, the centroids of the regional maxima points from the distance transform of the masked out segment are computed, but this time reducing the radius by 25% (empirically set) of the previous value, and the top qualifying regional maxima points are picked out as centroids. The top qualifying local maxima points are identified based on how large the distance value is in the distance transform at the points of interest. This process is repeated until the desired number of centroids is found. From our experience, the method worked quite well once we could correctly identify the number of cells in clumps. However, if there are heavily overlapping cases, the number of cells in the clump could not be accurately determined, as the method that we have proposed is based only on the cell area. This can be seen in the case of the clump shown immediately to the right of the bounding box (just above the WBC) in Fig. 16. In this case, only two centroids are identified, even though there are three cells.
The cell centroids identified for the clumps of cells in Fig.3 are shown in Fig.16. Fig.16.(a) shows the different steps for finding the cell centroids for the segment marked in Fig.16.(b). The first row of Fig.16.(a) shows, respectively, a clump in the original image selected for processing, its binary image, distance transform, cell centroids, and cell centroids superimposed on the clump. Since the expected number of cell centroids for this segment is five but only four were found, the required centroids are determined, as explained earlier, by masking out the computed centroids and taking maxima on the distance transform of the masked image. This is shown in the second row of Fig.16.(a). The images are the binary image with the centroid regions masked out, its distance transform, all the centroids identified and the image with the centroids superimposed. Note that all clumps in Fig.3 except the one shown in Fig.16.(a) produced the correct number of centroids right from the first level of processing (the processing shown in the first row of Fig.16.(a)).

Split the cells from clumps
Once the centroids of cells in clumps are determined, the cells are segmented using watershed [40]. The centroids found are used as the foreground markers, and the background image generated by the global (Otsu's) threshold is used as the background marker. We use the gradient magnitude as the segmentation function. That is, we impose minima at the marker locations and apply the watershed algorithm on the gradient magnitude image. The output of the watershed based segmentation is shown in Fig.17.

5. Results and Discussion
In this section, we provide and analyse the results of the parasite detection and segmentation procedures presented in the previous sections. As noted earlier, pathologists working in the field have marked infected locations on the slide images. The number of infected and healthy cells is then correctly identified
manually by verifying the segmentation produced by the automated counting procedure. Thus, the infected parasite locations marked by experts on the slide images, along with the number of infected and healthy cells, provide the ground truth for the present study. We analyse parasite detection quantitatively in terms of false positives and false negatives. We analyse segmentation quantitatively by measuring how close the cell count produced by the automated counting procedure is to the manually verified count. We also assess the visual quality of the proposed segmentation.

Figure 17 Marker Based Watershed Segmentation: A) The foreground cell markers embedded on cells B) super-imposed cell boundaries on slide image shown in Fig.3

As per the WHO manual on malaria diagnosis [1], malaria detection from a blood smear requires the examination of 800 high power (100X) FoVs. To cover the prescribed physical slide area, the developed system requires imaging of only 128 FoVs (at 40X magnification). The focus stack acquisition from the required number of FoVs takes about 11 minutes. This is followed by analysis of the focus stack to identify the infected and healthy RBCs. The analysis is conducted in Matlab 2014a on a 64-bit Windows 7 operating system running on an Intel i3 machine @ 3.10 GHz with 4 GB RAM. Processing each FoV (i.e., a slide image of dimension 480 × 720 × 3) took around 6.5 seconds on average: ~5 seconds for segmentation and ~1.5 seconds for parasite detection.

In order to analyse the effectiveness of using the focus stack in accurately detecting parasite locations, we performed a 10-fold cross validation experiment on the classifiers: SVM on hand engineered features, CNN on the best focused patches and CNN on the focus stack of patches. In this experiment, the 5600 positive patches, along with 5600 negative patches selected at random from the available 326934 negative patches, constitute the training dataset. The 11200 training patches are divided into 10 sets by selecting samples at random while ensuring 560 positive and 560 negative samples in each set. Nine sets are used to train the classifiers as discussed in the previous section, and the remaining set is used for validation. Ten such runs are made, so that each set is used once for validation of the classifiers trained with the remaining 9 sets. We measure the effectiveness of parasite detection in terms of sensitivity, specificity and Matthews Correlation Coefficient (MCC). All these measures are computed from the number of true positives (TP), false negatives (FN), false positives (FP) and true negatives (TN). The infected cells that are correctly classified as infected fall in TP, and those that are misclassified as healthy fall in FN. Similarly, the healthy cells that are correctly classified fall in TN while the misclassified ones fall in FP. The sensitivity (TP/(TP+FN)) measures the ability of a classifier to correctly identify an infected cell, while the specificity (TN/(FP+TN)) measures its ability to correctly identify healthy cells. For any classification system, there is a trade-off between these two quantities. MCC takes this into consideration and turns out to be a better measure than simple accuracy, especially in cases where the numbers of positives and negatives are quite unbalanced. The MCC is defined by

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(FP + TP)(TP + FN)(FN + TN)(TN + FP)}} \qquad (3)$$

Figure 18 The ROC for the proposed classifiers: CNN on focus stack, CNN on the best focused image and SVM on features. (Axes: False Positive Rate against True Positive Rate; curves: CNN on Focus Stack, CNN on Best Focused, SVM on Best Focused.)

As per the definition in Eq. (3), the MCC value can range between -1 and 1. A value of 1 corresponds to perfect classification, 0 corresponds to a classifier no better than a random guess, and -1 corresponds to the worst classifier. The average sensitivity, specificity and MCC computed for each of the classifiers across the 10 folds, along with their standard deviations (std), are shown in Table 1. The results show that the CNN working on the focus stack has an advantage in reducing false positives and false negatives, as indicated by the highest values for all measures in Table 1. The Receiver Operating Characteristic (ROC) for the first cross validation fold is shown in Fig. 18, where we plot the true positive rate (sensitivity) against the false positive rate (1 - specificity). The area under the curve (AUC) is a measure of the goodness of classification: an ideal classifier should give unit area and a classifier that guesses at random should give 0.5. The mean area under the curve for the 10-fold cross validation experiment turned out to be 0.9992 for the CNN on focus stack, 0.9987 for the CNN on the best focused image and 0.9910 for the SVM on features. The standard deviations are respectively 7.5764e−4,
9.9871e−4 and 0.0037. The high mean AUC and the low standard deviation indicate that the CNN working on the focus stack offers superior performance.

Table 1 Average Sensitivity, Specificity and MCC along with their standard deviation in 10 fold cross validation: A) SVM on Features and CNN on B) Patches C) Focus Stack

Metric        Method - A        Method - B        Method - C
Sensitivity   96.38% (0.88)     98.91% (0.36)     99.14% (0.37)
Specificity   95.43% (0.85)     99.39% (0.31)     99.62% (0.18)
MCC           0.9181 (0.0150)   0.9831 (0.0039)   0.9877 (0.0032)

In order to perform a detailed analysis of the capability of the CNN operating on the focus stack, in our second experiment we trained the classifiers using relatively few samples. The classifiers are trained by taking, at random, 60% of the positive samples and an equal number of negative samples. A separate 20% of the positive samples (+ve: samples which are really infected) and an equal number of negative samples (-ve: samples which are really healthy) are used to validate the network during training. The trained system is then tested on all patches in all slide images. The numbers of training, validation and testing patches are given explicitly in Table 2. All classifiers are trained with exactly the same set of candidate parasite locations in order to facilitate a fair comparison.

Table 2 Number of Samples (#) used in Training and Validation

Patches   # Un-rotated   # Rotated       # Train              # Validation         # Test
+ve       1400           1400×4 (5600)   60% of 5600 (3360)   20% of 5600 (1120)   1400
-ve       326934         -               3360                 1120                 326934

[Figure 19: two panels plotting train and val error]

assess the accuracy of the network at each epoch during training. The accuracy of the network on this validation set can be seen in the plot (blue) shown in Fig.19. In order to avoid over-fitting on the training data, we chose for testing the network that gives the minimum error on the validation set, and not on the training set. These turned out to be the trained network at iteration 36 from the set of classifiers trained to work on the best focused patches and the network at iteration 85 from the set of classifiers trained to work on the focus stack of patches.

For a test image, the trained classifiers can be applied at the candidate locations and the parasite locations can be marked. As noted earlier in subsection 3.1, the candidate locations are found from regional minima of the intensities. Depending on the classifier selected, the 32 × 32 RGB patch, the focus stack of patches or the features are extracted and used for the classification. The parasite locations identified by the different classifiers for the test image provided in Fig.3 are shown in Fig.20. The ground truth parasite locations are provided in Fig.20 (a), and Fig.20 (b) provides the locations detected by the SVM classifier. Fig.20 (c) and (d) are respectively the locations identified by the CNN trained on the best focused patches and the CNN trained on the focus stack. Note that the SVM trained on the features wrongly identifies a healthy cell as infected, while the CNN working on the best focused patch misses one infected cell.
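The sampling scheme of the second experiment (60% of the positive patches plus an equal number of negatives for training, and a separate 20% plus an equal number of negatives for validation) can be sketched as follows. This is only an index-level illustration using Python's standard library, not the authors' Matlab code; the function name `balanced_split`, its parameters and the fixed seed are assumptions made for the example.

```python
import random


def balanced_split(n_pos, n_neg, train_frac=0.6, val_frac=0.2, seed=0):
    """Draw disjoint, class-balanced training and validation index sets.

    Hypothetical helper: indices stand in for image patches.
    """
    rng = random.Random(seed)
    n_train = round(train_frac * n_pos)
    n_val = round(val_frac * n_pos)
    # Sampling without replacement keeps training and validation disjoint.
    pos = rng.sample(range(n_pos), n_train + n_val)
    neg = rng.sample(range(n_neg), n_train + n_val)
    return {
        "train": {"pos": pos[:n_train], "neg": neg[:n_train]},
        "val": {"pos": pos[n_train:], "neg": neg[n_train:]},
    }


# Patch counts reported in the text: 5600 positive and 326934 negative patches.
split = balanced_split(n_pos=5600, n_neg=326934)
print(len(split["train"]["pos"]), len(split["train"]["neg"]))
print(len(split["val"]["pos"]), len(split["val"]["neg"]))
```

With the patch counts quoted in the text, each class contributes 3360 training and 1120 validation indices, and no index appears in both sets.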
Metric        Method - A   Method - B   Method - C
Sensitivity   92.87%       96.56%       96.98%
Specificity   93.84%       98.27%       98.50%
MCC           0.4435       0.7033       0.7302

only identified by the CNN operating on the focus stack, and so is the case with the next three true negatives (d)-(f). The cells shown in Fig.21 (d)-(f) are really healthy cells whose marks are due to dust on the sensor, and they are correctly identified as healthy only by the CNN operating on the focus stack. It can be clearly seen that the dust area does not change considerably across the focus stack, unlike the change around the parasite locations. The last two images (g)-(h) are infected cells which are wrongly marked as healthy by all three classifiers. It can be seen from the cell images shown that it is hard to reach a decision just by looking at the cells in the best focused slide image (shown in the middle row) and, as reflected by the confusion matrix in Table 3, the CNN operating on the focus stack has the upper hand in taking the correct decision compared to the classifiers working on the single best focused image. This can be observed from the highest MCC measure in Table 4, obtained for the CNN running on the focus stack. The table also reveals that the proposed system could correctly classify 1156 cells out of the total 1191 infected cells, yielding a sensitivity of 97.06%, and could correctly classify 59912 cells out of the total 60824 healthy cells, yielding a specificity of 98.50%. The sensitivity and specificity of the system using the SVM classifier trained on features of the patches are 92.95% and 93.82% respectively, and those of the CNN trained on the best focused image are 96.64% and 98.27%. Note that a considerably infected cell can easily be identified from the best focused image itself, while the difficulty lies in the case of a cell at an early stage of infection, as can be seen in Fig. 8. In such cases of early infection, though the variation across the stack is minimal, the CNN working on the focus stack could differentiate parasites from artifacts like dust, where the changes across the focus stack are insignificant. This further improves the accuracy in terms of sensitivity, specificity and MCC, as observed in Table 4.

In order to evaluate the effectiveness of the automated procedure for counting, the confusion matrix generated by the classification discussed in section 3.2 is shown in Table 5. The cells are counted by the automated procedure developed in section 4, unlike the results in Table 3 where the cells are counted manually. The corresponding sensitivity and specificity obtained by the completely automated system are also provided in Table 6. It can be easily verified that the results provided in Table 3 and Table 5 are comparable. Now, the parasitemia level reported by the automated system can be computed. It is defined as the ratio of the total number of infected RBCs to the total number of RBCs considered, and is typically expressed as a percentage. The actual parasitemia level is defined by the ground truth and is 1.92% (1191/62015). The statistics in Table 5 show that the CNN on the focus stack has produced the closest prediction, 3.34% (2068/61911), compared to the CNN on the best focused patch (3.57%) as well as the SVM on hand engineered features (7.83%).

In order to assess separately the parasite detection and automated counting procedure on the FoVs collected from the two slides used in this study, the confusion matrices are provided in Table 7. There were 392 FoVs from Slide-1 and 373 from Slide-2. Table 7 (a) provides the statistics for the first slide while Table 7 (b) provides the statistics for the second slide. The automated counts given by the proposed system for TP, FN, FP and TN are also provided in Table 7, within brackets. The corresponding sensitivity and specificity computed for the detection procedure are comparable for both slides and are respectively 96.42% & 97.68% and 98.46% & 98.55%.

Table 7 Confusion Matrices for CNN on Focus stack for A) Slide-1 and B) Slide-2

            A) Slide-1                   B) Slide-2
            Infected    Healthy          Infected    Healthy
Infected    566 (566)   21 (22)          590 (590)   14 (14)
Healthy     506 (506)   32352 (32293)    406 (406)   27560 (27514)

We have seen that the number of cells identified by the automated counting procedure comes very close to the manually verified count. Now, we analyse the visual quality of the segmentation. The slide images used in this study had considerable variation in cell overlap, focus and staining. Fig.22 shows this variability, where Fig.22 (a) and (b) show different amounts of staining and Fig.22 (c) shows that a few of the cell images are not in focus. The corresponding segmentation result is provided in the second row. Fig.22 (d) shows a clump of two cells wrongly identified as a single cell due to the large overlapping region between the cells, and Fig.22 (e) shows the case where a single cell gets segmented as if there were two cells.

Figure 23 Comparing segmentation results: A) the result of segmentation by CellX method [41] for the image in Fig.3 B) Another focused image, the cells segmented using C) CellX method [41] and D) by the proposed method.

From the result of segmentation obtained on our dataset, we infer that our proposed method of segmentation performed better than the method in [41].
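The sensitivity, specificity, MCC and parasitemia figures quoted in this section can be re-derived from the confusion-matrix counts. The sketch below does so for the Table 7 counts, following the definitions given in the text and Eq. (3). It assumes that the rows of Table 7 are the ground truth and the columns the predicted class; the function names and the dictionary layout are illustrative only and are not part of the original Matlab implementation.

```python
from math import sqrt


def sensitivity(tp, fn):
    """TP / (TP + FN): fraction of infected cells detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """TN / (FP + TN): fraction of healthy cells recognised as healthy."""
    return tn / (fp + tn)


def mcc(tp, fn, fp, tn):
    """Matthews Correlation Coefficient as in Eq. (3)."""
    denom = sqrt((fp + tp) * (tp + fn) * (fn + tn) * (tn + fp))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def parasitemia(infected, total):
    """Infected RBCs as a percentage of all RBCs considered."""
    return 100.0 * infected / total


# (TP, FN, FP, TN) per slide from Table 7: manual counts and the
# automated counts that appear within brackets.
manual = {"Slide-1": (566, 21, 506, 32352), "Slide-2": (590, 14, 406, 27560)}
automated = {"Slide-1": (566, 22, 506, 32293), "Slide-2": (590, 14, 406, 27514)}

for name, (tp, fn, fp, tn) in manual.items():
    print(name, round(100 * sensitivity(tp, fn), 2), round(100 * specificity(tn, fp), 2))

# Pool both slides by summing each count position.
tp, fn, fp, tn = (sum(c) for c in zip(*manual.values()))
a_tp, a_fn, a_fp, a_tn = (sum(c) for c in zip(*automated.values()))

actual = parasitemia(tp + fn, tp + fn + fp + tn)                  # ground truth
predicted = parasitemia(a_tp + a_fp, a_tp + a_fn + a_fp + a_tn)   # automated system
print(round(actual, 2), round(predicted, 2), round(mcc(tp, fn, fp, tn), 4))
```

Pooling the two slides gives 1156 true positives out of 1191 infected cells and 59912 true negatives out of 60824 healthy cells, matching the counts quoted in the text. The predicted parasitemia counts every cell classified as infected (TP + FP), which is why the large number of false positives relative to the few truly infected cells pushes it above the ground-truth level.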