Abstract This paper introduces a focus stacking based approach for automated quantitative detection of Plasmodium falciparum malaria from blood smears. For the detection, a custom designed convolutional neural network (CNN) operating on a focus stack of images is used. The cell counting problem is addressed as a segmentation problem, and we propose a two level segmentation strategy. The use of a CNN operating on a focus stack for the detection of malaria is the first of its kind; it not only improved the detection accuracy (both in terms of sensitivity (97.06%) and specificity (98.50%)) but also favoured processing on cell patches and avoided the need for hand-engineered features. The slide images are acquired with a custom-built portable slide scanner that is made from low-cost, off-the-shelf components and is suitable for point-of-care diagnostics. The proposed approach of combining sophisticated algorithmic processing with inexpensive instrumentation can potentially benefit clinicians in malaria diagnosis.

CNN based Malaria Diagnosis from Focus-stack of Blood Smear Images Acquired using Custom-built Slide Scanner

G. Gopakumar 1, Swetha M. 2, Sai Siva Gorthi 2, Gorthi R. K. Sai Subrahmanyam 3,*

1. Introduction

Malaria is a deadly infectious disease transmitted by female Anopheles mosquitoes. In 2015, 214 million malaria cases were reported worldwide, causing an estimated death toll of 438,000. Five species of the protozoan genus Plasmodium [1] (falciparum, vivax, ovale, malariae, and knowlesi) infect humans. Among these five, Plasmodium falciparum is the most common species infecting humans (~75%) and accounts for the majority of deaths. The next major share (~20%) of infections is caused by vivax, while knowlesi rarely infects humans. Typically, malaria is diagnosed by inspecting Giemsa/Leishman stained blood smears under a bright field microscope. Though there are other methods [2], such as antigen-based rapid diagnostic tests and the use of the polymerase chain reaction to detect the parasite's DNA, microscopic examination remains the gold standard for the diagnosis of malaria due to its low cost and widespread acceptance. However, manual examination of the slides is cumbersome and demands highly experienced clinicians. Often the result varies from clinician to clinician, producing non-standardized results. Thus, an automated system that aids the clinician in efficient detection of malaria is highly beneficial and can add great diagnostic value. Further, the development of an affordable and automated diagnostic platform can minimize dependency on the expert for conducting these tests and facilitate performing them in resource limited conditions. This has generated huge interest in affordable point-of-care [3] diagnostic systems.

In the direction of developing automated point-of-care malaria diagnostics, a large number of works have been done on both thick [4–6] and thin [7–10] smear slide images. Computer vision based algorithms are also widely employed for identification of malaria infected samples from slide images; [11] reviews some of these techniques. The work done in [4] uses a two stage algorithm for automatic detection of parasites in thick blood films. In the first stage, the sensitivity of the system is maximized, while the second stage tries to reduce false positives by closely examining the cells that are classified in the first stage. There are studies that use statistical features such as mean, variance, and kurtosis at suspected parasite locations, followed by classification based on genetic programming [5] and support vector machines (SVM [12]) [6, 13, 14]. The intensity level differences as well as features extracted from the morphology of the parasites have also been studied for their capability in detecting the presence of parasites in thin films [15–17]. The work done in [18] extracts histogram based texture features and uses a neural

1 Department of Earth and Space Sciences, IIST, India – 695547
2 Department of Instrumentation and Applied Physics, IISc, India – 560012
3 Department of Electrical Engineering, Indian Institute of Technology Tirupati, India – 517006
* Corresponding author: e-mail: [email protected]

This article has been accepted for publication and undergone full peer review but has not
been through the copyediting, typesetting, pagination and proofreading process, which
may lead to differences between this version and the Version of Record. Please cite this
article as doi: 10.1002/jbio.201700003
This article is protected by copyright. All rights reserved.
network to classify different stages of parasite infection, while [19] uses an SVM classifier on scale invariant features (SIFT) [20]. These methods operate on different datasets (differing in stain and dataset size) and report sensitivity in the range 81.7% to 95% and specificity in the range 92.59% to 100%. For example, the work in [8] reports a sensitivity of 83% and a specificity of 98% on a small dataset of 55 slide images that used Leishman stain.

In this paper, we address detection as well as counting of Plasmodium falciparum infected Red Blood Cells (RBCs) from Leishman stained microscope slide images. The choice of Leishman stain was motivated by the recent study [21] which compared the use of Leishman and Giemsa stains for malaria diagnosis and suggested Leishman as a good alternative. Even though Giemsa staining is more commonly used, the Leishman staining method provides better clarity for visualization of the nuclear content (chromatin pattern) within the cells. Since there are White Blood Cells (WBCs) in actual blood samples, and our objective is to deal only with RBCs (not to classify WBCs), Leishman is a good staining alternative as it provides dark blue staining of the nuclear chromatin structures. This helps us to easily locate WBCs (being nucleated) and ensure that they are segregated from infected RBCs. A comparative evaluation of conventional staining methods and immunological techniques for the diagnosis of malaria can be found in [22].

As discussed earlier, though there are a large number of attempts to automate malaria diagnosis, almost all of them inspect only the best focused image to identify the infected cells. This has not only increased the cost of the device, as sophisticated methods are employed to generate the in-focus image, but also introduced the possibility of misclassifying dark image artifacts (due to dust on the camera/relay lens) as parasites. Also, parasites appear differently on slides depending on their life-cycle stage, and it is often hard to get all of them in a single focus. We address these problems by acquiring a focus stack of images. The method is simple, and though it seems slow at first, the additional information acquired as part of the focal stack is not discarded but is made use of in subsequent image processing. By avoiding sophisticated methods to generate the in-focus image and by using off-the-shelf optical and electronic components, the cost of the system was reduced considerably. By making use of the focus stack in subsequent processing, we have also increased the overall detection efficiency. As of now, the lowest price of an automated slide scanner is US $25,000 (PathScope [23]), whereas the component cost (not the commercial cost) of the instrument proposed here is as low as US $1500.

For the image analysis, a CNN that operates directly on the focus stack of images is employed to identify malarial infection. We compare the results of detection of malaria infected RBCs in terms of sensitivity and specificity obtained by 1) a support vector machine classifier trained on the statistical and textural features extracted from the suspected parasite locations (similar to [6, 13, 14]), and CNN classifiers trained on 2) the 32 × 32 patches surrounding the suspected locations of the best focus image and 3) the focus stack of 32 × 32 patches surrounding the suspected locations. The results show that the system operating on the focus stack produces the best results both in terms of sensitivity and specificity, and hence support the design of our custom built focus stack collecting portable microscope.

The main contributions of the paper are 1) the method of identifying and processing only the suspected parasite locations and not the entire slide, thereby saving computing power, 2) the proposal of using a focus stack of image patches for malaria detection instead of a single focused image, thereby reporting better accuracy, 3) the custom-built portable slide scanner to facilitate focus stack acquisition in a cost effective manner, 4) the use of a CNN rather than a conventional classifier such as SVM, to exploit the ability of the classifier to operate directly on the focus stack, and 5) the automated cell counter making use of a proposed RBC segmentation strategy.

This paper is organized such that section 2 gives an overview of the operating framework, section 3 discusses the detection of parasite infected cell locations, and section 4 addresses the automated counting procedure. A description of the quantitative analysis is provided in section 5, followed by the conclusions drawn from the present study in section 6.

2. Overview of the Framework

This section discusses the proposed prototype slide scanner used to collect the dataset and outlines the proposed method for detecting and counting infected RBCs.

2.1. Experimental setup and Dataset generation

Figure 1 Schematic representation of the prototype slide scanner

The dataset used in the study consists of videos containing focus stacks of multiple Fields-of-View (FoVs) of Leishman stained slide images prepared using WBC-spiked cultured malarial samples. The P. falciparum malaria culture is maintained in 5% hematocrit with O+ red blood cells containing Roswell Park Memorial Institute (RPMI) media. For preparation of blood mimicking the patient sample, the malaria culture is
then spiked with a very small amount of WBCs extracted by partial centrifugation of a blood sample. About 10 µL of the resultant sample is pipetted onto a clean glass slide (base slide) and spread using a wedge slide, held at an angle of approximately 45° with the base slide. The wedge slide is moved horizontally over the base slide to produce a thin smear of the sample. The prepared smears are exposed to 5% Leishman stain for 10 minutes for fixing and staining the cells. The stained slides are mounted with cover glass after DePeX addition for proper preservation. The slides are then sequentially imaged under the imaging setup.

A custom built focus-stack collecting bright-field transmission microscope setup, built with inexpensive, off-the-shelf optical components and a camera unit, is employed to capture the dataset for the present study. The objective behind the realization of such a slide scanner is to devise an affordable and reliable cyto-diagnostic platform that can provide focus stacks and function as an automated alternative to traditional microscopic tests for malaria diagnosis in resource limited settings. The optical setup of the prototype slide scanner unit consists of a white light LED source and the necessary collimating lens arrangement for uniform illumination of the sample plane and subsequent acquisition of digital images of the slide. A low-cost 40X objective lens (Lawrence and Mayo) is employed to magnify the sample features. For a digital imaging system like ours, the characteristic of interest is the digital resolution, which is an interplay of the magnification of the objective lens used in the setup and the pixel size of the camera sensor used for imaging. For clear identification of sample features from the respective FoV images, the system should have good digital resolution. With the given setup, our objective is to cover a larger FoV without compromising the digital resolution of the setup. To meet these requirements, a digital colour camera (DFK22BUC03, Imaging Source) with a very fine pixel size of 6 µm and a relay lens unit are employed to capture the videos/images of the slide. This has enabled us to get very good results in terms of the quality of recorded images even at a lower magnification of 40X, which sufficed for the diagnostic requirement. A custom-built 3-D printed motorized translational stage integral to the developed prototype is employed to navigate the slide across multiple FoVs. The motorized stage holds the sample slide in place and translates the slide independently along the x, y and z directions. The stage was built by assembling three readily available low-cost linear actuator stepper motor units (Nema 11). Three separate digital motor driver units (DM422C) supply the required driving current to the respective motors. Each of the three axes of the motorized translational stage can be independently controlled, enabling whole slide scanning. Appropriate electrical signals to enable the motors and to control their speed and direction are supplied by two off-the-shelf micro-controller boards (Arduino UNO). The Imaging Source camera was plugged into an on-board processor (Intel NUC) to record the videos. The on-board processor also communicates with the micro-controller boards through the Python Firmata serial communication link to facilitate the software control of the motors. An LCD display unit with touch-screen is attached to the prototype slide scanner for user interaction and display purposes.

Figure 1 shows the schematic representation of the developed prototype slide scanner unit. All major electronic components of the prototype slide scanner, including the on-board processor, display and motor drivers, are powered using a regulated switching power supply with multiple voltage tappings (RT-125D, MeanWell). The micro-controller boards are powered from the USB ports of the Intel NUC board. Motor control and simultaneous data acquisition are carried out by executing Python (version 2.9) code on the on-board processor. The custom-built prototype slide scanner employs a z-stacking/scanning approach to acquire the whole slide images and a passive focusing mechanism to determine the in-focus images from multiple FoVs. The Z stacks are recorded after translating the slide by appropriately actuating the lateral direction motors. In the process of z-stacking, the z-axis motor is translated in steps along the direction of the optical axis, and the focus stack videos are recorded during the entire span. The z-stacking approach adopted in the prototype offers multiple advantages. Since a passive focusing mechanism is employed, the system relies on commonly used opto-mechanical components and does not require any high precision piezo positioning units, thus reducing the overall system cost to less than US $1500. Further, additional information derived from the focus stacks can be used for examination of very fine sample features which may not be evident from the analysis of just the best focus image, and for differentiating imaging artifacts (small dark spots in images due to dust particles on the camera sensor/relay lens) from similar sample features, thereby minimizing the cases of misclassification. Further, focal stacking is essential while examining thicker sections of the slides.

We have used 765 FoVs containing 62015 cells, of which 1191 cells are infected. The ground truth was determined in consultation with experts working in the field, who inspected the focus stack of images; this helped them to clearly differentiate the infected locations from other artifacts. The FoVs for our experiments are selected from 2 slides. For each FoV of a slide, the z-axis motor is used to translate in steps across the best focused frame, capturing multiple images. In this way, all valid FoVs of the slide are captured. We use variance as the focus measure to direct the z-axis motor and ensure that we always move across the best focused frame. In an ideal setup, once the best focused frame location is fixed using the z-axis motor, taking a fixed number of frames in both directions about it, and doing so for all FoVs, would suffice. But in our setup, due to vibrations of the motors, there was a small shift in the best focused location across the movement of different motors. We address this problem by automatically readjusting between successive focus scans, keeping track of the z-motor position that produced the best focused frame in the last scan. We can also use the same focus measure to decide whether to image a FoV (i.e., whether it contains sufficient cells to image). For the present study, image variance computed from the gray scale equivalent image is employed as the focus measure
to identify the best focus image in a given focus stack. The variance computation can be expressed as follows (Eq. (1)).

$$F = \frac{1}{M N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( G(i, j) - \mu \right)^{2} \tag{1}$$

Here M and N are the number of rows and columns of the frame (M = 480, N = 720), and µ is the average intensity of the gray level image G, defined by $\mu = \frac{1}{M N} \sum_{i=1}^{M} \sum_{j=1}^{N} G(i, j)$.
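The focus measure of Eq. (1), and its use for picking the best focused frame of a stack, can be sketched as follows. This is only a minimal illustration in plain Python and not the implementation used on the scanner: frames are assumed to be already converted to gray scale and stored as lists of rows, and the names `focus_measure` and `best_focus_index` are hypothetical.

```python
def focus_measure(gray):
    """Image variance F of Eq. (1) for a gray scale frame given as a list of rows."""
    pixels = [value for row in gray for value in row]
    count = len(pixels)                      # M * N
    mu = sum(pixels) / count                 # average intensity of the frame
    return sum((value - mu) ** 2 for value in pixels) / count


def best_focus_index(stack):
    """Index of the frame with the highest variance in a focus stack."""
    scores = [focus_measure(frame) for frame in stack]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 2 x 2 "frames" (illustrative values only): flat, mild and strong contrast.
blurred = [[5, 5], [5, 5]]
soft = [[4, 6], [6, 4]]
sharp = [[0, 10], [10, 0]]
stack = [blurred, soft, sharp, soft, blurred]

print(best_focus_index(stack))
```

In this toy stack the variance rises towards the sharpest frame and falls again, so the selected index is that of the middle frame.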
Fig. 2 plots the focus measure computed for each of 13 adjacent FoVs collected during a horizontal scan. As expected, for each FoV stack, the focus measure increases monotonically, attains its maximum value at the best focused frame, and then decreases. In Fig. 2, the best focused image of each FoV is marked with a red asterisk, and the points marked in green correspond to either the last frame of a FoV or the first frame of a new FoV.

Figure 2 Plot showing image variance (Y axis) against frame number in the video stack (X axis) across 13 focus stacks (the best focused image for each FoV is marked in red).

Fig. 3 shows the best focused image from a FoV stack. It contains four infected RBCs (encircled in Red), an imaging artifact on an RBC due to dust on the camera sensor (encircled in Black), and a WBC (encircled in Blue).

Figure 3 Best focused image from a stack. Infected RBCs are encircled in Red, an RBC with an artifact is encircled in Black and a WBC is encircled in Blue.

2.2. Framework

The flowchart for the detection of candidate parasite locations using a classifier can be found in Fig. 4. The regional minima locations are identified as the candidate locations, and a 32 × 32 patch surrounding each such location is used to represent it. Depending on the classifier, either the patches themselves or features extracted from the patches are used to classify the candidate location as infected or healthy. We have experimented with three classifiers: an SVM classifier on hand engineered features, a CNN classifier operating only on the best focused patch, and a CNN classifier operating on the focus stack of patches, and have found that the CNN working on the focus stack gave the best performance. The detailed discussion can be found in section 3.

Figure 4 The flowchart depicting detection of parasite locations

Section 4 discusses the problem of segmenting the cells towards taking the count of infected RBCs. In the segmentation procedure, we make use of the typical size of an RBC to determine the number of cells in the slide, particularly when there are clumps of cells. Since a WBC is typically bigger than an RBC, we separate WBCs first, before deciding the number of RBCs by subsequent segmentation. Once the infected locations are identified and the cells are appropriately segmented out, the counting of infected RBCs is trivial. The flowchart of the procedure is shown in Fig. 5.

Figure 5 The flowchart depicting cell counting

3. Proposed Methodology: Detecting infected locations

In this section, we discuss the proposed method for detecting parasite locations. The idea is to classify the identified candidate locations as either infected or healthy. We experiment with the traditional way of extracting features and feeding them to an SVM classifier. Also, we experiment with the use of a trending deep
learning network: CNN. Two CNNs are designed for this; the first operates on patches from the best focused image of a FoV while the second operates on the focus stack of patches.

3.1. Identifying candidate locations

The candidate locations are identified from the best focused image. The best focused image for each FoV is identified as the one with the highest focus measure (variance). The behaviour of the focus measure across the focus stack of images for 13 FoVs can be found in Fig. 2, and the best focused image for an arbitrary FoV is shown in Fig. 3. Once the best focused image is retrieved, the next step is to detect candidate locations. The parasites appear darker in slide images, and hence suspected candidate locations of the parasite can be identified as the local intensity minima of the image over a neighbourhood spanning the radius of a typical RBC. We can refine these candidate locations by excluding those that fall in the background region. The background for this purpose can easily be found by Otsu's global thresholding [24], since there is high contrast between the cell region and the background. Fig. 6 shows the suspected parasite locations identified (dilated version of the centroids of the regional minima [25, 26]), superimposed on the original image shown in Fig. 3. Though the darkness of the parasite may change slightly across staining, we are looking only for the regional minima to identify the candidate locations (irrespective of the level of darkness). Thus every parasite location will be identified as a candidate location, and in our experiments the method has identified all such locations. When there is noise, its location also qualifies as a candidate location for a parasite, but it will be taken care of by the classifier since noisy locations have a different profile across the stack when compared to parasite locations.

Figure 6 The centroid of the regional minima superimposed on the image shown in Fig. 3

3.2. Detecting infected locations

Once the candidate locations are determined, the next step is to identify the locations which are really infected by parasites. We treat this as a binary classification problem where we need to classify each of the candidate locations as either infected or healthy. We experiment with both a traditional SVM classifier trained on hand engineered statistical as well as textural features and the trending CNN deep learning classifier. There were a total of 1400 positive patches, which are the patches around ground truth parasite locations. Note that a cell may have more than one parasite location marked on it. There were 326934 negative patches, which are the patches surrounding local intensity minima where there is no parasite. As the number of positive candidate patches is very small compared to the number of negative patches, we have rotated the positive patches by 90°, 180° and 270° and thereby increased the number of positive samples to 5600.

3.2.1. Classification by SVM

Each of the identified candidate locations is represented by the 32 × 32 RGB patch surrounding it in the best focused image. Unlike SVM based classification that uses statistical features alone [6, 13, 14], we propose to use texture features as well. Altogether, fourteen features are considered. At parasite locations, there will be a gradient difference, and the surrounding textural characteristics differ when compared to a healthy patch. Widely used features to characterize the texture of a patch are 'Contrast', 'Correlation', 'Energy' and 'Homogeneity'. These features are extracted from the gray level co-occurrence matrix (GLCM) of the region as defined in [27].

The parasite locations in a Leishman stained slide image appear darker, and hence such locations can be characterized by statistical features such as the minimum, maximum, mean and variance of the intensities of the patch, as well as the minimum and maximum gradient magnitude observed in the 32 × 32 region. Since the parasite infection is very local, especially in the early stage of infection, we have considered small (3 × 3) non-overlapping sub-regions of the 32 × 32 patch and have computed the minimum and maximum values of the mean and variance observed over all sub-regions. Thus there are 4 texture features, 4 statistical features computed from all pixel intensities of the patch, 2 features computed from the gradient of the patch, and 4 features computed from the 3 × 3 sub-regions, constituting the final set of 14 features.

The SVM classifier with a radial basis function (RBF, σ = 0.7) kernel is then trained on the features of the training patches. The kernel parameter σ is fixed after experimenting with a range of values between 0 and 1 in steps of 0.1.

3.2.2. Classification by CNN

In addition to using SVM on hand-engineered features, we have used a custom-built CNN (Fig. 7) for detecting the infected locations. One of the designed CNNs operates directly on RGB candidate patches selected from the best focused image, while the other operates on the focus stack of patches. We compare the advantage of using the focus stack over only the best focused image in identifying the infected cells.
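As a point of reference for the hand-engineered baseline, the ten non-texture features of Section 3.2.1 (statistical, gradient and sub-region features) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code: the patch is assumed to be a gray scale list of rows, gradients are approximated by central differences at interior pixels, rows and columns that do not fill a complete 3 × 3 sub-region are dropped, and the four GLCM texture features are omitted. The name `patch_features` is hypothetical.

```python
def patch_features(gray, block=3):
    """Ten non-texture features of a square gray scale patch (list of rows)."""
    size = len(gray)
    pixels = [v for row in gray for v in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((v - mean) ** 2 for v in pixels) / n

    # Gradient magnitude from central differences at interior pixels (assumption).
    grads = []
    for i in range(1, size - 1):
        for j in range(1, size - 1):
            gx = (gray[i][j + 1] - gray[i][j - 1]) / 2
            gy = (gray[i + 1][j] - gray[i - 1][j]) / 2
            grads.append((gx * gx + gy * gy) ** 0.5)

    # Mean and variance of every non-overlapping block x block sub-region.
    # Rows/columns that do not fill a whole block are dropped (assumption).
    block_means, block_vars = [], []
    for top in range(0, size - block + 1, block):
        for left in range(0, size - block + 1, block):
            cell = [gray[top + di][left + dj]
                    for di in range(block) for dj in range(block)]
            m = sum(cell) / len(cell)
            block_means.append(m)
            block_vars.append(sum((v - m) ** 2 for v in cell) / len(cell))

    return [min(pixels), max(pixels), mean, variance,   # 4 statistical features
            min(grads), max(grads),                     # 2 gradient features
            min(block_means), max(block_means),         # 4 sub-region features
            min(block_vars), max(block_vars)]


# A bright 32 x 32 patch with one dark 3 x 3 spot, loosely mimicking a parasite.
patch = [[200] * 32 for _ in range(32)]
for i in range(15, 18):
    for j in range(15, 18):
        patch[i][j] = 50
features = patch_features(patch)
```

For the toy patch, the minimum sub-region mean drops to the intensity of the dark spot while the whole-patch mean stays close to the background value, which illustrates why sub-region statistics help with very local infections.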

Figure 7 The CNN designed for malaria detection

The CNN is a biologically inspired feed-forward multi-layer artificial neural network mapping an input into an output. The connectivity pattern between its neurons is inspired by the organization of the animal visual cortex, so as to respond to overlapping regions tiling the visual field. Internally, it can be thought of as a composition of functions, each implementing simple convolutions on the input feature map using learned kernels, interleaved with non-linear and pooling operations and followed by locally or fully connected layers [28]. With the advent of high-end computing capability, the CNN has recently become the de-facto standard for classification and has provided reliable results in the medical domain. CNNs have been successfully used in detecting micro calcification on mammograms [29], classifying interstitial lung diseases [30], detecting pathologic cases in chest X-rays [31], detecting lung nodules in chest radiographs [29] and detecting mitosis in breast histology images [32]. Recently, [33] studied the capability of CNNs (both transfer learning and standalone classification capability) in deciding whether a sample is malaria infected or not. The dataset that they used contains 27578 RBCs from Giemsa stained slide images, and they reported that the CNN as a standalone classifier produced a better classification result (mean accuracy 97.37%) than the transfer learning based classification (91.99%).

The basic building blocks of a CNN are convolution, Rectified Linear Unit (ReLU) and sub-sampling. Feature extraction happens at the convolution layers. The activation function, ReLU, suppresses a feature value (to 0) if it is negative; otherwise it reproduces the same feature at the output. Thus, the ReLU introduces the needed non-linearity to the features extracted by the convolution layers. Since the gradient of ReLU is 1 for positive inputs, the vanishing gradient problem [34], which is more severe with the sigmoid activation function, especially at lower layers of the network due to the chain rule of backpropagation, does not arise for such inputs during training. The sub-sampling layer introduces small shift and scale invariances to the features, as the subsequent convolutions operate on a scaled down version of the input. Note that, during training, the parameters that have to be learned in the CNN (Fig. 7) are only the kernel weights used by the convolutions. Intuitively, the operation of a convolution block using the learned kernel on an input image can be thought of as extracting very local background and foreground features such as (but not exactly) low pass and high-pass filters, edges (just like operating a Sobel [35] kernel on the image) and corners. Typically such low level features are extracted at convolution layers lower in the hierarchy, and subsequent convolutions in upper layers operate on these low level features to possibly extract more high-level, task specific features. Thus, we are actually learning the feature detectors (not the actual features), which extract the needed features when operated on test images.

Because of its capability to deal directly with multi-channel images, the CNN opens up a good space for analysing focus stacks, particularly in malaria diagnosis. The basic design of the CNN used in our experiment is shown in Fig. 7. Here, the input is a D channel image patch of size 32 × 32. The C, R, and P in the blocks represent Convolution, ReLU, and Pooling respectively. The size of the kernels used for convolution is shown under each block, where the subscript shows the number of kernels used. The size of the output map computed at each step is shown above the blocks. All pooling blocks perform max pooling over a 2 × 2 area with a stride of 2. We have chosen this architecture based on the following observations. As we are dealing with patches around suspected parasite locations, and the typical RBC size in our images is 41 × 41, the input to the CNN is set to 32 × 32, which is large enough to hold the neighbourhood needed for making the final decision. We have used standard CNN building blocks: convolution, ReLU and sub-sampling. We prefer max-pooling based sub-sampling to average pooling since we do not want to average out the details. We have chosen the standard kernel size of 5 × 5 for the feature extraction. This being a binary classification, the number of output neurons was set to 2. Once the input and output are set as discussed, a reasonably deep CNN, not necessarily the chosen one, should give a fair classification. We chose a moderately deep architecture (neither shallow nor too deep) having 4 feature extraction (convolution) layers for our experiment. The non-linearity in the features is introduced by ReLU or max pooling towards the top layers. The output layer producing the deciding feature is set as a convolution layer (which effectively acts as a fully connected layer, due to the size of its input map) and is motivated by one of the most successful ImageNet CNN models [36]. This led us to avoid pooling after R2 and C4, as the maps are already at the lowest dimension at the feature level. However, the non-linearity of the feature from layer C2 is ensured through max pooling.

The CNN is learned by a variant of the backpropagation algorithm and we have used the logarithmic loss of the softmax
output (softmax log-loss, Eq. (2)) as the error function [37].

$$y = -\sum_{i,j} \log\left(\frac{e^{x_{ijc}}}{\sum_{t=1}^{T} e^{x_{ijt}}}\right) \tag{2}$$

In Eq. (2), T is the total number of classes, and at the soft-max layer there is one neuron corresponding to each class. Since the number of classes in our case is 2 (infected and healthy), for each input patch the network produces output at 2 neurons. In the ideal case, for an infected patch, the outputs at the first and second neurons will be 1 and 0 respectively. For a healthy patch, this is reversed to 0 and 1. The soft-max response (the term inside the log function) computes the chance of the neuron's output being 1, where $x_{ijc}$ is the output at the neuron corresponding to the true class (c) for an input. Note that, in our case, c is 1 for an infected patch and 2 for a healthy patch. At the time of testing, the soft-max log-loss layer (SML) is excluded and the network assigns the label of the class which yields the maximum response. We have used CNN building blocks developed for Matlab [37] to design our network.

stacks (3 in each row) of such patches used for training the CNN are provided in Fig. 9. The cells shown in the first row are really infected, while those in the second and third rows are healthy. Note that the cells in the second row have an artifact due to dust and have almost the same appearance across the stack, unlike the parasite in the first row, whose appearance changes across the stack. Thus, the sets that we have selected capture the change profile of the parasite/artifact across the stack and serve the main motive of the experiment: checking whether the focus stack improves the detection accuracy in malarial cases. Thus, in the third experiment, a CNN is trained using these focus stacks of patches, in the same way and using the same candidate locations that were used to train the CNN on the best focused patches.
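The softmax log-loss of Eq. (2) and the test-time labelling rule can be illustrated with a minimal sketch. It is not the authors' Matlab implementation: the function names are illustrative, each input patch is treated as one term of the sum, and classes are indexed from 0 (0 = infected, 1 = healthy) rather than 1 and 2 as in the text.

```python
import math

def softmax(scores):
    """Soft-max response for one input: exp(x_t) / sum over t of exp(x_t)."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_log_loss(batch_scores, true_classes):
    """Sum over inputs of -log(soft-max response at the true-class neuron)."""
    loss = 0.0
    for scores, c in zip(batch_scores, true_classes):
        loss -= math.log(softmax(scores)[c])
    return loss

def predict(scores):
    """Test-time rule: drop the loss layer and take the largest response."""
    return max(range(len(scores)), key=lambda t: scores[t])

# Toy two-class example (assumed scores, not data from the paper).
scores = [[2.0, -1.0],   # output neurons for a confidently infected patch
          [0.0, 0.0]]    # output neurons for an undecided patch
labels = [0, 1]          # 0 = infected, 1 = healthy
loss = softmax_log_loss(scores, labels)
```

A confidently correct patch contributes almost nothing to the loss, while the undecided patch contributes log 2, which is the behaviour the error function is meant to penalise.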

3.2.3. CNN on the Best Focused Patches

For each of the selected candidate parasite locations for training, a 32 × 32 RGB patch is extracted from the best focused frame surrounding the suspected location. These are then used to train the CNN classifier shown in Fig. 7. Note that the dimension of input (D) is 3 since we are using only one RGB patch for each candidate location.

Figure 8 Focus stacks of 4 cells each containing 9 images (along columns): Last three stacks are those of infected cells while the first row represents a healthy cell with dust on it
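The patch extraction of Section 3.2.3 can be sketched as a simple crop around a candidate location. This is only an illustration: the function name, the nested-list image representation and the border handling (shifting the window inwards) are assumptions, since the paper does not say how candidates near the image border are treated.

```python
def extract_patch(frame, row, col, size=32):
    """Crop a size x size window around (row, col) from an H x W x 3 frame.

    Where the window would cross the image border it is shifted inwards
    (an assumed policy, not described in the paper).
    """
    height, width = len(frame), len(frame[0])
    if height < size or width < size:
        raise ValueError("frame is smaller than the requested patch")
    top = min(max(row - size // 2, 0), height - size)
    left = min(max(col - size // 2, 0), width - size)
    return [r[left:left + size] for r in frame[top:top + size]]

# Synthetic 480 x 720 RGB frame standing in for the best focused frame.
frame = [[(y % 256, x % 256, 0) for x in range(720)] for y in range(480)]

# Hypothetical candidate location close to the top-right corner.
patch = extract_patch(frame, row=5, col=710)
```

The result is a 32 × 32 grid of RGB triples, i.e. an input of dimension D = 3 as stated above.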

3.2.4. CNN on 32 × 32 focus stack patches

In this section, we start with the motivation behind using the focus stack in recognizing parasite locations. Fig. 8 shows 4 focus stacks (one in each row), where the 5th image is from the best focused frame. The remaining 4 images on each side are the images obtained after skipping 4 frames away from the best focused frame on both sides of the stack. These are actually tiled cell images (each of size 51 × 51) cropped selectively from the frames and then scaled to fit the width of the manuscript. The cell in the first row is healthy while the others are infected by the parasite. Note that, in the infected cases, the parasites come into focus and then fade away, unlike the dust on the healthy cell, which has an almost constant appearance across the stack. Though there is a variation across the focus stack in case of parasite infection, the change is minimal, especially at an early stage of infection. Our intention is to capture these changes with a minimum number of images (to restrict computational complexity), so we chose a patch from the best focused frame and two patches far away from the best focused frame, one on each side. We have chosen 32 × 32 RGB patches for training, since this is large enough to capture the neighbourhood needed to decide the infection, the typical cell size being 41 × 41. Thus, each focus stack taken to train the CNN now contains 3 RGB patches (32 × 32 × 9). A few of the focus

Figure 9 Sample focus stacks (3 in each row) used in training CNN: Cells in first row are infected while those in second and third rows are healthy. Note that the cells in second row have an artifact due to dust and have almost the same appearance across the stack.

4. Automatic segmentation and counting of infected RBCs

The detection of malaria infected RBCs is a sub-problem of the more general problem of determining the parasitemia level. In order to count the number of infected RBCs, each cell has to be separately identified. We address this problem by automatic segmentation of cells, which is discussed in this section.

4.1. Segmentation procedure

The proposed segmentation is a cascaded two step procedure: segmenting the cells which are more or less separated and
segmenting the cells from clumps. We employ an adaptive thresholding strategy to segment cells from the background (Fig. 10), followed by marker based watershed segmentation for cells that form clumps (Fig. 11).

Figure 10 Procedure to segment more or less separated cells. [Flowchart nodes: Focused Image; Local Thresholding; Global Thresholding; LTh Image; GTh Image; ¬Clear LTh & GTh Image; Separate Isolated cells (twice); Clear LTh; Difficult LTh; Clear GTh; Difficult Set; Clear Set = Clear LTh | Clear GTh]

Figure 11 Procedure to segment cells from clumps

4.1.1. Segmenting the cells from the background

In order to segment the more or less separated cells, a strategy combining adaptive and global thresholding is proposed. Once the clear cases are segmented from the background, the next level of processing is to separate the cells from clumps, if any. The main steps involved are shown as a flowchart in Fig. 10. Local thresholding schemes segment cells based on local statistics, and hence better separability can be achieved, especially in slowly varying regions between cells; with a global threshold, such a region would have formed a cell clump. However, a local threshold can produce many holes/breaks in cells, especially if the cell area is of constant intensity. This does not happen with segmentation by global thresholding, as there is high inter-class variability between cell region and background. We make use of this complementary nature to obtain a good segmentation. This is accomplished by first identifying isolated cells in the local thresholded image (Clear LTh), then masking them in the global thresholded image (¬Clear LTh & GTh Image), and finally looking for good cells (Clear GTh) in the remaining segments. Once isolated cells are identified from the local and global thresholded images, the objects left in the global thresholded image (the Difficult Set in Fig. 10) are recognized to contain multiple cells (as cell clumps) and are processed separately (Fig. 11). The following subsections explain these procedures in detail.

Local Adaptive Thresholding

A Leishman stained slide image appears in good contrast, and cell regions appear totally different in colour when compared to the background. Rather than using a single global threshold to segment all cells, we find an adaptive threshold at each pixel position by considering the intensity values in the pixel's neighbourhood (21 × 21). We then assign a pixel to the cell area if its intensity is lower than this threshold. We define the threshold to be slightly (by ε) lower than the Gaussian weighted neighbourhood pixel intensities, so that smooth regions inside the cell are rightly assigned to the cell area and the background region to the background. It also helps to produce a good dark threshold for the pixels on and around the cell wall. We could separate the cells better by local adaptive thresholding than by using a single global threshold. This is highlighted in Fig. 12.(b) by taking two sub-regions of interest. The segmentation results for the full image are provided in the top row of Fig. 13 and are, respectively, the local and global thresholded images. The size of the averaging kernel for picking the local threshold is set to 21 × 21, which provided the best result when experimenting with a range of sizes between 3 and 27; for the same reason, ε is set to 0.005.

Figure 12 Comparing local adaptive and global thresholding: A) the best focused image B) the region of interest, results for local adaptive and global thresholding respectively.

Global Thresholding

Otsu's method [24] is used for global thresholding. However, in order to reduce the effect of variation in brightness (in different regions of the FoV) on the threshold, Otsu's threshold is computed for all quarter regions and the thresholds are applied separately. The effect of the global threshold on the image in Fig. 3 is shown in Fig. 13.(b). As noted, we could isolate more cells by local thresholding (Fig. 13.(a)) when compared to the results by global thresholding (Fig.
13.(b)). However, note that the cell regions are better segmented from the background, with fewer breaks/holes, when compared to the result of local thresholding. We make use of this to isolate more cells, as discussed in the next subsection.

Figure 13 Segmentation by local adaptive as well as global thresholds: A) local adaptive thresholding B) global (Otsu's) thresholding C) the well separated cells segmented out D) cells to be segmented from clumps

Separating out the segmented cells and clumps

The area and solidity [27] are computed for each object segmented out by local thresholding and are checked against the measures for typical RBCs. The area is computed by counting the number of pixels in the segment, and solidity is measured as the ratio of this area to the area of the convex hull holding the segment. We have set the lower bound for cell area to 1400 pixels and the upper bound to 2400 pixels. If the area of a segment is within these bounds and its solidity is greater than 0.85, the segment is qualified as a single cell. The bounds on cell area are set from the fact that the typical cell diameter for an RBC is between 6 and 8 micrometers (µm) and by considering that 15 pixels correspond to 2.19 micrometers in our imaging setup. Thus a 3 µm radius corresponds to ~20.6 pixels and a 4 µm radius corresponds to ~27.4 pixels, and the corresponding cell regions must therefore contain approximately 1400 pixels and 2400 pixels (π × r²) respectively. The threshold for solidity (0.85), however, is set empirically. Once the qualified cells are identified, they are masked out from the global thresholded image. By this masking, a few cells which were earlier part of a clump may get isolated in the global thresholded image. This can be seen by analysing the segmented cells in Fig. 13.(c) in the regions of interest marked in Fig. 12. The newly qualified cells are decided by computing the 'area' and 'solidity' measures for the objects remaining in the global thresholded image after the masking operation. The cells identified by this procedure are shown in Fig. 13.(c). All other segments need to be processed separately and are shown in Fig. 13.(d).

4.1.2. Segmenting the cells from clumps

The flowchart for the procedure to segment cells from clumps is shown in Fig. 11. The number of cells contained in each segment is determined based on the size of a typical RBC. However, being bigger, WBCs can masquerade as a clump of RBCs. So the first step is to mask out all WBCs, if present, in the difficult set produced after segmenting out the well separated RBCs (Fig. 10). The next step is to decide the approximate centroids of the cells. Finally, we apply marker based watershed to segment cells from clumps, where the cell centroids identified are used as cell markers. The following subsections explain this procedure in detail.

Mask out WBCs by Bayesian classification

Figure 14 Top row: WBCs; Bottom row: infected RBCs

The Leishman stain colours nuclear chromatin structures dark blue (Fig. 14). Being nucleated, all WBCs are stained dark blue, unlike the non-nucleated RBCs. Except for lymphocytes, all other WBCs are much larger than RBCs. Thus, we use colour and cell size information to distinguish WBCs from the background and RBCs. We address this as a classification problem. We collected 1000 pixel samples each from RBCs, WBCs and the background region. The colour components at these pixel locations in the LAB colour space (A & B) [38] are used to build likelihood models for RBCs, WBCs and background. We have used a Gaussian to model the distribution, since it was observed that the data follow a more or less Gaussian distribution. These models are then used to decide whether a pixel belongs to an RBC, a WBC, or the background. For any test pixel, if the likelihood of WBC is greater than the likelihoods of RBC and background, it is identified as a candidate pixel from a WBC. For any segment thus identified, if its area is less than the maximum area considered for an RBC, it can be either a heavily infected RBC or a lymphocyte. If the RBCs are healthy, then we can differentiate between lymphocytes and RBCs just using the colour. However, when there is heavy infection, the infected RBCs also stain blue due to the chromatin structures of the parasites. Luckily, in such cases, there will be dark parasite spots in the infected RBCs, as shown in Fig. 14. In order to identify such spots, we have considered pixels whose intensity in all channels is less than 0.4 (on a scale of 1). If sufficient such pixels (10 connected pixels, in our experiments) are found, the cell is identified as an infected RBC, else as a lymphocyte. The threshold for identifying the black spot is decided by considering 50 heavily infected cells having dark parasite areas. With this procedure, we
could correctly locate WBCs (47 WBCs) without any false positives from the 765 slide images used in this study. Finally, the WBCs are masked out and are excluded from the rest of the procedure to segment cells from the clump. The WBC identified for the slide image in Fig. 3 is shown in Fig. 15.(a). The masked out image which is to be further processed for segmenting the cells is shown in Fig. 15.(b).

Figure 15 Masking out WBCs: A) WBCs identified B) WBCs excluded from clumps processing

Get the centres of the cells

Figure 16 Cell centroids for clumps: A) finding cell centroids for the segment marked using bounding box in B B) cell centroids identified for the clumps

The centroid is the point lying farthest from the cell boundary. Thus, if we measure the distance of each point on the cell from the background, the cell center will be the farthest from the background. This means that the points on the periphery of the cell all have zero distance, a pixel just inside the boundary has unit distance, and so on, with the highest value at the center. If there is only one cell, then the overall maximum in the distance transform occurs at the centroid. However, for a cell clump, the distance transform is supposed to have as many local maxima as the number of cells contained in it. These local maxima arise because each cell has a high distance transform value: most of its boundary faces the background, even though a portion of it is clumped with other cells. There are efficient algorithms that compute this distance metric [39]. Therefore, we compute the distance transform of the segments, and the regional maxima points are identified as initial centroids of the cells in the segment. Since the typical cell diameter in pixels is 41, the region for determining the local maxima is fixed as a disk of diameter 41. If the number of centroids identified for a segment does not match the number of cells expected from the clump (segment area/typical cell area), the computed centroid regions are masked out from the segment and the remaining centroids are picked out using the distance transform of the masked out segment. Towards this, the centroids of the regional maxima points from the distance transform of the masked out segment are computed, but this time reducing the radius by 25% (empirically set) of the previous value, and the top qualifying regional maxima points are picked out as centroids. The top qualifying local maxima points are identified based on how big the distance value is in the distance transform at the points of interest. This process is repeated until the desired number of centroids is found. From our experience, the method worked quite well once we could correctly identify the number of cells in the clumps. However, in heavily overlapping cases, the number of cells in the clump could not be accurately determined, as the method that we have proposed is based only on the cell area. This can be seen in the case of the clump shown immediately to the right of the bounding box (just above the WBC) in Fig. 16. In this case, only two centroids are identified, even though there are three cells.

The cell centroids identified for the clumps of cells in Fig. 3 are shown in Fig. 16. Fig. 16.(a) shows the different steps for finding the cell centroids for a segment marked in Fig. 16.(b). The first row of Fig. 16.(a) shows, respectively, a clump in the original image selected for processing, its binary image, distance transform, cell centroids, and cell centroids superimposed on the clump. Since the expected number of cell centroids for this segment is five but only four were found, the required centroids are determined, as explained earlier, by masking out the computed centroids and taking maxima on the distance transform of the masked image. This is shown in the second row of Fig. 16.(a). The images are the binary image with the centroid regions masked out, its distance transform, all the centroids identified, and the image with the centroids superimposed. Note that all clumps in Fig. 3 but the one shown in Fig. 16.(a) produced the correct number of centroids right from the first level of processing (the processing shown in the first row of Fig. 16.(a)).

Split the cells from clumps

Once the centroids of the cells in clumps are determined, the cells are segmented using watershed [40]. The centroids found are used as foreground markers, and the background image generated by the global (Otsu's) threshold is used as the background marker. We use the gradient magnitude as the segmentation function. That is, we impose minima at the marker locations and apply the watershed algorithm on the gradient magnitude image. The output of the watershed based segmentation is shown in Fig. 17.

5. Results and Discussion

In this section, we provide and analyse the results of the parasite detection and segmentation procedures presented in the previous sections. As noted earlier, the pathologists working in the field have marked the infected locations on the slide images. The number of infected and healthy cells is then correctly identified
manually by verifying the segmentation produced by the automated counting procedure. Thus the infected parasite locations marked by experts on the slide images, along with the number of infected and healthy cells, provide the ground truth for the present study. We perform the quantitative analysis of parasite detection in terms of false positives and false negatives. We perform the quantitative analysis of segmentation by seeing how close the cell count identified by the automated counting procedure is to the manually verified count. We also assess the visual quality of the proposed segmentation.

Figure 17 Marker Based Watershed Segmentation: A) The foreground cell markers embedded on cells B) super-imposed cell boundaries on slide image shown in Fig.3

As per the WHO manual on Malaria diagnosis [1], malaria detection from blood smear requires the examination of 800 high power (100X) FoVs. To cover the prescribed physical slide area, the developed system requires imaging of only 128 FoVs (with 40X magnification). The focus stack acquisition from the required number of FoVs takes about 11 minutes. This is followed by analysis of the focus stack for identifying the infected and healthy RBCs. This analysis is conducted in Matlab 2014a installed on a Windows 7, 64 bit operating system running on an Intel i3 machine @ 3.10 GHz with 4 GB RAM. The processing of each FoV (i.e., a slide image of dimension 480 × 720 × 3) took around 6.5 seconds on average: ~5 seconds for segmentation and ~1.5 seconds for parasite detection.

In order to analyse the effectiveness of using the focus stack in accurately detecting parasite locations, we have performed a 10 fold cross validation experiment on the classifiers: SVM on hand engineered features, CNN on the best focused patches and CNN on the focus stack of patches. In this experiment, the 5600 positive patches along with 5600 negative patches, selected at random from the available 326934 negative patches, constitute the training dataset. The entire 11200 training patches are divided into 10 sets by selecting samples at random but ensuring 560 positive and 560 negative samples in each set. Then, 9 sets are used to train the classifiers as discussed in the previous section, and the remaining set is used for validation. Ten such runs were made, where in each run a different set is used for validation of the classifiers trained with the remaining 9 sets. We measure the effectiveness of parasite detection in terms of sensitivity, specificity and Matthews Correlation Coefficient (MCC). All these measures are computed from the number of true positives (TP), false negatives (FN), false positives (FP) and true negatives (TN). The infected cells which are correctly classified as infected fall in TP, and those that are misclassified as healthy fall in FN. Similarly, the healthy cells that are correctly classified fall in TN while the misclassified cells fall in FP. The sensitivity (TP/(TP+FN)) measures the ability of a classifier to correctly identify an infected cell, while specificity (TN/(FP+TN)) measures the ability of a classifier to correctly identify healthy cells. For any classification system, there is a trade-off between these two quantities. MCC takes this into consideration and turns out to be a better measure than simple accuracy, especially in cases where the numbers of positives and negatives are quite unbalanced. The MCC is defined by

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(FP + TP)(TP + FN)(FN + TN)(TN + FP)}} \tag{3}$$

Figure 18 The ROC for the proposed classifiers: CNN on focus stack, CNN on the best focused image and SVM on features. [Plot: True Positive Rate against False Positive Rate, both from 0 to 1; curves: CNN on Focus Stack, CNN on Best Focused, SVM on Best Focused]

As per the definition in Eq. (3), the MCC value can range between -1 and 1. A value of 1 corresponds to perfect classification, 0 corresponds to no better than a random guess, while -1 corresponds to the worst classifier. The average sensitivity, specificity and MCC measures computed for each of the classifiers across the 10 folds, along with their standard deviations (std), are shown in Table 1. The results show that the CNN working on the focus stack has an advantage in reducing false positives and false negatives, as indicated by the highest values for all measures in Table 1. The Receiver Operating Characteristics (ROC) for the first cross validation fold is shown in Fig. 18, where we plot the true positive rate (sensitivity) against the false positive rate (1 - specificity). The area under the curve (AUC) is a measure of the goodness of classification, where an ideal classifier should give unit area and a classifier that does a random guess should give 0.5. The mean area under the curve for the 10 fold cross validation experiment that we have conducted turned out to be 0.9992 for the CNN on focus stack, 0.9987 for the CNN on the best focused image and 0.9910 for the SVM on features. The standard deviations reported are respectively 7.5764e−4,
9.9871e−4 and 0.0037. The high value of the mean AUC and the low value of the standard deviation reveal that the CNN working on the focus stack offers superior performance.

Table 1 Average Sensitivity, Specificity and MCC along with their standard deviation in 10 fold cross validation: A) SVM on Features and CNN on B) Patches C) Focus Stack

Metric        Method - A        Method - B        Method - C
Sensitivity   96.38% (0.88)     98.91% (0.36)     99.14% (0.37)
Specificity   95.43% (0.85)     99.39% (0.31)     99.62% (0.18)
MCC           0.9181 (0.0150)   0.9831 (0.0039)   0.9877 (0.0032)

In order to perform a detailed analysis of the capability of the CNN operating on the focus stack, in our second experiment we have trained the classifiers using relatively few samples. The classifiers are trained by taking, at random, 60% of the positive samples and an equal number of negative samples. A separate 20% of the positive samples (+ve: samples which are really infected) and an equal number of negative samples (-ve: samples which are really healthy) are used to validate the network during training. The trained system is then tested on all patches in all slide images. The numbers of training, testing and validation patches are explicitly provided in Table 2. All classifiers are trained with exactly the same set of candidate parasite locations in order to facilitate a fair comparison.

Table 2 Number of Samples (#) used in Training and Validation

Patches   # Un-rotated   # Rotated         # Train              # Validation         # Test
+ve       1400           1400×4 (5600)     60% of 5600 (3360)   20% of 5600 (1120)   1400
-ve       326934         -                 3360                 1120                 326934

Figure 19 The behaviour of CNN training A) on patches from the best focused image and B) on focus stack of patches. [Plots: train and val error, from 0 to 0.15, against training epoch, from 0 to 100]

Note that only 3360 negative samples out of the total 333352 negative patches available are used for training, since we have only a very limited number of positive patches. The learning behaviour of the CNNs can be found in Fig. 19, where the left side plot corresponds to the CNN learning only from the best focused patches while the right side plot corresponds to the CNN learned to operate on the focus stack of patches. We have used a set of independent samples to assess the accuracy of the network at each epoch during training. The accuracy of the network on this validation set can be seen in the plot (blue) shown in Fig. 19. In order to avoid over-fitting on the training data, we chose the network for testing as the one that gives minimum error on the validation set and not on the training set. These turned out to be the trained network at iteration 36 from the set of classifiers trained to work on the best focused patches and the network at iteration 85 from the set of classifiers trained to work on the focus stack of patches.

For a test image, the trained classifiers can be applied at candidate locations and the parasite locations can be marked. As noted earlier in subsection 3.1, the candidate locations are found from regional minima of the intensities. Depending on the classifier selected, the 32 × 32 RGB patch, the focus stack of patches or the features are extracted and used for the classification. The parasite locations identified by the different classifiers for the test image provided in Fig. 3 are shown in Fig. 20. The ground truth parasite locations are provided in Fig. 20 (a), and Fig. 20 (b) provides the locations detected by the SVM classifier. Fig. 20 (c) and (d) are respectively the locations identified by the CNN trained on the best focused patches and the CNN trained on the focus stack. Note that the SVM trained on the features wrongly identifies a healthy cell as infected, while the CNN working on the best focused patch misses one infected cell.

Figure 20 A) The ground-truth parasite locations in the slide image shown in Fig.3. The parasite locations identified by B) SVM trained on features C) CNN trained on best focused image and D) CNN trained on focus stack.

The confusion matrix generated for each of the classifiers is shown in Table 3. It can be seen from Table 3 (c) that the CNN trained on the focus stack produced superior performance with minimum false positives and false negatives. Fig. 21 shows the focus stacks of 8 cell images which are resized to 40 × 40. The middle row holds the best focused image; the first and last rows hold the images which are respectively the 16th image after skipping 15 images on either side of the focus stack, as discussed in subsection 3.2.4. The first 3 cell images (a)-(c) are the true positives (infected)
Table 4 Sensitivity, Specificity and MCC: A) SVM on Features and CNN on B) Patches C) Focus Stack

Metric        Method - A   Method - B   Method - C
Sensitivity   92.95%       96.64%       97.06%
Specificity   93.82%       98.27%       98.50%
MCC           0.4430       0.7036       0.7305
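As a check on Eq. (3), the following sketch recomputes sensitivity, specificity and MCC from the manually verified counts of Table 3 and compares them with the values of Table 4. It assumes that the rows of Table 3 are the ground truth and the columns the prediction; this orientation is inferred from the reported sensitivities and is not stated explicitly. The function name `detection_metrics` is illustrative.

```python
import math

def detection_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (fp + tp) * (tp + fn) * (fn + tn) * (tn + fp)
    )
    return sensitivity, specificity, mcc

# Counts read from Table 3 as (TP, FN, FP, TN), under the assumption that
# rows are ground truth and columns are predictions.
table3 = {
    "SVM on features": (1107, 84, 3756, 57068),
    "CNN on best focused patches": (1151, 40, 1053, 59771),
    "CNN on focus stack": (1156, 35, 912, 59912),
}

results = {name: detection_metrics(*counts) for name, counts in table3.items()}
for name, (sens, spec, mcc) in results.items():
    print(f"{name}: sensitivity {100 * sens:.2f}%, "
          f"specificity {100 * spec:.2f}%, MCC {mcc:.4f}")
```

The MCC stays well below the sensitivity and specificity figures because the healthy class is roughly fifty times larger than the infected class, so even a small false positive rate yields many false positives relative to the true positives.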
Figure 21 A - C) are the three true positives (focus stacks) only
identified by CNN on focus stack D - F) are three true negatives
only identified by CNN on focus stack G - H) are two infected cells Table 5 Confusion Matrices (Automated): A) SVM on Features
missed out by all the classifiers B) CNN on Patches C) CNN on Focus Stack

Infected Healthy Infected Healthy Infected Healthy


Infected 1107 85 1151 41 1156 36
Table 3 Confusion Matrices : A) SVM on Features B) CNN on
Healthy 3741 56978 1053 59666 912 59807
Patches C) CNN on Focus Stack A B C

Infected Healthy Infected Healthy Infected Healthy


Infected 1107 84 1151 40 1156 35
Healthy 3756 57068 1053 59771 912 59912 Table 6 Sensitivity, Specificity and MCC (Automated): A) SVM
A B C on Features and CNN on B) Patches C) Focus Stack

only identified by the CNN operating on the focus stack Metric Method - A Method - B Method - C
and so as the case with the next three true negatives (d)-(f). Sensitivity 92.87% 96.56% 96.98%
The cells shown in Fig.21 (d)-(f) are really healthy cells Specificity 93.84% 98.27% 98.50%
and the marks are due to dust on the sensor and are cor- MCC 0.4435 0.7033 0.7302
rectly identified as healthy only by the CNN operating on
the focus stack. It can be clearly seen that the dust area does
not change considerably across the focus stack unlike the
change around the parasite locations. The last two images Table 7 Confusion Matrices for CNN on Focus stack for A) Slide-1
and B) Slide-2
(g)-(h) are the infected cells which are wrongly marked as
healthy by all three classifiers. It can be seen from the cell images
shown that it is hard to reach a decision just by looking at the cells
in the best focused slide image (shown in the middle row). As reflected
by the confusion matrix in Table 3, the CNN operating on the focus stack
has the upper hand in taking the correct decision compared to the
classifiers working on the single best focused image. This can be
observed from the highest MCC measure in Table 4 for the CNN running on
the focus stack. The table also reveals that the proposed system
correctly classified 1156 cells out of the total 1191 infected cells,
yielding a sensitivity of 97.06%, and correctly classified 59912 cells
out of the total 60824 healthy cells, yielding a specificity of 98.50%.
The sensitivity and specificity of the system using the SVM classifier
trained on features of the patches are 92.95% and 93.82% respectively,
and those of the CNN trained on the best focused image are 96.64% and
98.27%. Note that a considerably infected cell can easily be identified
from the best focused image itself, while the difficulty lies in the
case of a cell at an early stage of infection, as can be seen in Fig. 8.
In such cases of early infection, though the variation across the stack
is minimal, the CNN working on the focus stack could differentiate
parasites from artifacts like dust, where the changes across the focus
stack are insignificant. This further results in the improvement of
accuracy in terms of sensitivity, specificity and MCC as observed in
Table 4.

In order to evaluate the effectiveness of the automated procedure for
counting, the confusion matrix generated by the classification discussed
in section 3.2 is shown in Table 5. The cells are counted by the
automated procedure developed in section 4, unlike the results in Table
3 where the cells are counted manually. The corresponding sensitivity
and specificity obtained by the completely automated system are also
provided in Table 6. It can be easily verified that the results provided
in Table 3 and Table 5 are comparable. Now, the parasitemia level
reported by the automated system can be computed. It is defined as the
ratio of the total number of infected RBCs to the total number of RBCs
considered and is typically expressed as a percentage. The actual
parasitemia level is defined by the ground truth and is 1.92%
(1191/62015). The statistics in Table 5 show that the CNN on the focus
stack has produced the closest prediction, 3.34% (2068/61911), compared
to the CNN on the best focused patch (3.57%) as well as the SVM on hand
engineered features (7.83%).

In order to assess separately the parasite detection and automated
counting procedure on the FoVs collected from the two slides used in
this study, the confusion matrices are provided in Table 7. There were
392 FoVs from Slide-1 and 373 from Slide-2. Table 7 (a) provides the
statistics for the first slide while Table 7 (b) provides the statistics
for the second slide. The automated counts given by the proposed system
for TP, FN, FP and TN are also provided in Table 7, within brackets. The
corresponding sensitivity and specificity computed for the detection
procedure are comparable for both slides: the sensitivities are 96.42%
and 97.68% and the specificities are 98.46% and 98.55%.

From the result of segmentation obtained on our dataset, we infer that
our proposed method of segmentation performed better than the method in
[41].

Table 7 (automated counts in brackets)

                        A                                B
            Infected     Healthy           Infected     Healthy
Infected    566 (566)    21 (22)           590 (590)    14 (14)
Healthy     506 (506)    32352 (32293)     406 (406)    27560 (27514)
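The per-slide and pooled figures quoted above can be re-derived from the Table 7 counts. The short Python sketch below is illustrative only and is not the authors' code: the function and variable names are hypothetical, the table rows are assumed to be ground truth and the columns predictions (which is consistent with the quoted sensitivities), and MCC is computed with the standard Matthews correlation coefficient formula, which is assumed to be the definition used for Table 4.

```python
from math import sqrt


def detection_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, MCC) for one 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # fraction of infected cells flagged as infected
    specificity = tn / (tn + fp)  # fraction of healthy cells labelled healthy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


def parasitemia_percent(infected, total):
    """Infected RBCs as a percentage of all RBCs considered."""
    return 100.0 * infected / total


# (TP, FN, FP, TN) read from Table 7, assuming rows are ground truth.
manual = {"slide1": (566, 21, 506, 32352), "slide2": (590, 14, 406, 27560)}
automated = {"slide1": (566, 22, 506, 32293), "slide2": (590, 14, 406, 27514)}


def pooled(counts):
    """Sum the per-slide confusion matrices element-wise."""
    return tuple(sum(values) for values in zip(*counts.values()))


for name, cm in manual.items():
    sens, spec, _ = detection_metrics(*cm)
    print(f"{name}: sensitivity {100 * sens:.2f}%, specificity {100 * spec:.2f}%")

tp, fn, fp, tn = pooled(manual)
sens, spec, mcc = detection_metrics(tp, fn, fp, tn)
print(f"pooled: sensitivity {100 * sens:.2f}%, specificity {100 * spec:.2f}%, "
      f"MCC {mcc:.3f}")

# Ground-truth parasitemia counts the truly infected cells (TP + FN).
actual = parasitemia_percent(tp + fn, tp + fn + fp + tn)

# The automated system reports every cell it labels infected (TP + FP),
# over the number of cells found by the automated counting procedure.
a_tp, a_fn, a_fp, a_tn = pooled(automated)
reported = parasitemia_percent(a_tp + a_fp, a_tp + a_fn + a_fp + a_tn)
print(f"parasitemia: actual {actual:.2f}%, reported {reported:.2f}%")
```

Pooling the two slides gives 1156 of 1191 infected cells and 59912 of 60824 healthy cells, which matches the totals quoted for the proposed system. The reported parasitemia exceeds the actual one because false positives are counted as infected cells. The MCC printed here is only a consistency check under the stated assumption and is not compared against Table 4.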
We have seen that the number of cells identified by the automated
counting procedure comes very close to the manually verified count. Now,
we analyse the visual quality of segmentation. The slide images used in
this study varied considerably in cell overlap, focus and staining.
Fig. 22 shows this variability: Fig. 22 (a) and (b) show different
amounts of staining and Fig. 22 (c) shows that a few of the cell images
are not in focus. The corresponding segmentation result is provided in
the second row. Fig. 22 (d) shows a clump of two cells wrongly
identified as a single cell due to the large overlapping region between
the cells, and Fig. 22 (e) shows a case where a single cell gets
segmented as if there were two cells.

Figure 22 Segmentation results: A), B), C) Images at different staining
level and focus D) Wrong segmentation of a clump containing two cells
E) Wrong segmentation of a single cell into two cells

The result of the proposed segmentation procedure is compared with the
result of the method in [41], where the authors used the membrane
pattern for identifying the cell boundaries. We have used the
implementation provided in the link associated with [41]. The parameters
to define the membrane pattern are set as explained in the manuscript
[41]. For this purpose, 60% of the FoVs are selected at random and, from
each selected FoV, one cell is used to define the parameters. The result
of segmentation on the slide image in Fig. 3 is shown in Fig. 23 (a).
Fig. 23 (c) and (d) show respectively the results obtained on the slide
image shown in Fig. 23 (b) by the method described in [41] and our
method. We have also shown in Fig. 24 the segmentation results on twelve
more sub-images for comparison. The results are ordered such that the
top row shows the sub-images, the second row holds the segmentation
obtained by following the membrane pattern [41] and the third row holds
the result of segmentation by the proposed approach. It can be seen that
the method in [41] missed quite a few cells. This is because when there
is more than one type of cell in the slide image under study, a unique
membrane pattern is often difficult to find. In our dataset, though most
of the RBCs are nearly circular and have almost the same size, some of
them take a very different shape and size. Also, a heavily infected RBC
takes a completely different membrane profile, and so do WBCs.

Figure 23 Comparing segmentation results: A) the result of segmentation
by CellX method [41] for the image in Fig. 3 B) Another focused image,
the cells segmented using C) CellX method [41] and D) by the proposed
method.

Figure 24 Comparing segmentation results on 12 sub-images: Original
image in first row, segmentation by CellX [41] in second row and the
result of the proposed method in third row.

Further, by inspecting the confusion matrices in Table 3 and Table 5, it
can be seen that the number of false positives for the classification by
CNN remained the same. This means that no cell marked with more than one
false positive location is wrongly segmented into two cells such that
the marked locations fall into multiple segments. Also, no two adjacent
cells marked as false positive come into the same segmented region. The
count difference of 15 in the number of false positives by SVM is due to
the fact that adjacent cells are identified as false positive but are
counted as single cells since they fall in the same segment. These are
heavily overlapped cases and one such case is shown in Fig. 22 (d). The
count difference of 105 in healthy cells between the manual and the
automated cell counting is due to the same reason. Also, there is a
count difference of one cell in the number of false negatives. In this
case, one cell marked positive was wrongly split into two segments. One
segment was identified as infected by all the three classifiers, but
none of them identified the second, which contained the marked ground
truth location, as infected. This case is shown in Fig. 22 (e). However,
the comparable results in the confusion matrices shown in Tables 3 and 5
reveal that overall
the segmentation was good and produced a cell count comparable with the
manual method.

Note that we have used the results in Table 1 to understand how good the
training is across different cross validation folds (10 fold), run by
considering all infected patches. The large value of the mean accuracy
and the low standard deviation reveal that the system trains effectively
across the different training sets used in the cross-validation
experiment. Note that we have used 90% of infected cells for each cross
validation experiment reported in Table 1, while we have used only 60%
of infected patches for the results reported in Tables 3 to 6. Thus, the
result in Table 1 indicates how good the training is (mean accuracy and
standard deviation) across the folds in the 10 fold cross validation
experiment, while Tables 3 to 6 help us in establishing the generality
of the approach and its robustness in classification even with 60% of
the data, and validate the segmentation and classification procedures
altogether.

We have also compared the detection accuracy of our method with the
accuracy of a patch based malaria detection method reported in [19].
Though the stain for the slide images used in [19] was Giemsa, we
adopted the approach of using the LBP/VAR [42] and SIFT features [20] as
reported in the manuscript [19]. The LBP/VAR feature with the given
specification is computed with the source code available at [43]. The
number of false positives and false negatives obtained by this method on
our dataset is shown in Table 8. Comparing the sensitivity and
specificity of the method with those reported in Table 6, it can be
understood that the CNN trained on the focus stack has produced the best
performance.

Table 8 Performance of the Method in [19]

            Infected    Healthy      Method - [19]
Infected    984         208          Sensitivity    82.55
Healthy     4091        56628        Specificity    93.26

6. Conclusions

Malaria is a deadly infectious disease affecting a few million
individuals around the globe. A high degree of sensitivity and
specificity is desired for any malaria diagnosis system. In this paper,
we have proposed a completely automated, custom-built, portable,
cost-effective prototype system with the necessary instrumentation as
well as image analysis/classification algorithms that can be used for
quantitative malaria detection. The results produced in this paper
suggest using a CNN over traditional classifiers like SVM for
considerable improvement in specificity and sensitivity, and also
suggest operating on a focus stack of images for malaria diagnosis. We
strongly believe that this work makes a significant step forward in the
malaria eradication programme run by many countries and organisations
worldwide.

Acknowledgements. The authors would like to acknowledge Project
Associate Mr. Abhishek Pathak, Optics & Microfluidics (OMI) lab,
Department of Instrumentation & Applied Physics (IAP), IISc, Bangalore
for preparing the Leishman stained sample slides that were used in the
experiments. The authors also sincerely thank Mr. Jayesh Adhikari,
Project Assistant at OMI Lab, IAP, IISc, Bangalore for helping with the
design and building of the translational stage which was used in the
setup for slide scanning. Also, we thank VK Jagannath of IISc for the
constructive suggestions and S Rajesh & Dr. BV Kranthi of IISc for
helping us to prepare the ground-truth for the dataset used in this
study.

Key words: Malaria Diagnosis

References

[1] W.H.O., Basic malaria microscopy - Part I: Learner’s guide (World Health Organization, 2010).
[2] P. Zimmerman and R. Howes, Curr Opin Infect Dis 28(5), 446–454 (2015).
[3] H. Zhu, S. O. Isikman, O. Mudanyali, A. Greenbaum, and A. Ozcan, Lab Chip 13, 51–67 (2013).
[4] M. Elter, E. Haßlmeyer, and T. Zerfaß, Detection of malaria parasites in thick blood films, in: 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, (Aug 2011), pp. 5140–5144.
[5] I. K. E. Purnama, F. Z. Rahmanti, and M. H. Purnomo, Malaria parasite identification on thick blood film using genetic programming, in: Instrumentation, Communications, Information Technology, and Biomedical Engineering (ICICI-BME), 2013 3rd International Conference on, (Nov 2013), pp. 194–198.
[6] A. Pinkaew, T. Limpiti, and A. Trirat, Automated classification of malaria parasite species on thick blood film using support vector machine, in: 2015 8th Biomedical Engineering International Conference (BMEiCON), (Nov 2015), pp. 1–5.
[7] A. Ravendran, K. W. T. R. T. de Silva, and R. Senanayake, Moment invariant features for automatic identification of critical malaria parasites, in: 2015 IEEE 10th International Conference on Industrial and Information Systems (ICIIS), (Dec 2015), pp. 474–479.
[8] V. V. Makkapati and R. M. Rao, Segmentation of malaria parasites in peripheral blood smear images, in: 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, (April 2009), pp. 1361–1364.
[9] A. Mehrjou, T. Abbasian, and M. Izadi, Automatic malaria diagnosis system, in: Robotics and Mechatronics (ICRoM), 2013 First RSI/ISM International Conference on, (Feb 2013), pp. 205–211.
[10] Y. Purwar, S. L. Shah, G. Clarke, A. Almugairi, and A. Muehlenbachs, Malaria Journal 10(1), 364 (2011).
[11] F. B. Tek, A. G. Dempster, and I. Kale, Malaria Journal 8(1), 153 (2009).
[12] B. E. Boser, I. M. Guyon, and V. N. Vapnik, A training algorithm for optimal margin classifiers, in: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT ’92 (ACM, New York, NY, USA, 1992), pp. 144–152.
[13] S. S. Savkare and S. P. Narote, Automated system for malaria parasite identification, in: Communication, Information Computing Technology (ICCICT), 2015 International Conference on, (Jan 2015), pp. 1–4.
[14] W. Preedanan, M. Phothisonothai, W. Senavongse, and S. Tantisatirapong, Automated detection of Plasmodium falciparum from giemsa-stained thin blood films, in: 2016 8th International Conference on Knowledge and Smart Technology (KST), (Feb 2016), pp. 215–218.
[15] D. Anggraini, A. S. Nugroho, C. Pratama, I. E. Rozi, A. A. Iskandar, and R. N. Hartono, Automated status identification of microscopic images obtained from malaria thin blood smears, in: Electrical Engineering and Informatics (ICEEI), 2011 International Conference on, (July 2011), pp. 1–6.
[16] S. Kareem, I. Kale, and R. C. S. Morling, Automated malaria parasite detection in thin blood films:- a hybrid illumination and color constancy insensitive, morphological approach, in: Circuits and Systems (APCCAS), 2012 IEEE Asia Pacific Conference on, (Dec 2012), pp. 240–243.
[17] L. h. Zou, J. Chen, J. Zhang, and N. Garcia, Malaria cell counting diagnosis within large field of view, in: Digital Image Computing: Techniques and Applications (DICTA), 2010 International Conference on, (Dec 2010), pp. 172–177.
[18] H. A. Nugroho, S. A. Akbar, and E. E. H. Murhandarwati, Feature extraction and classification for detection malaria parasites in thin blood smear, in: 2015 2nd International Conference on Information Technology, Computer, and Electrical Engineering (ICITACEE), (Oct 2015), pp. 197–201.
[19] N. Linder, R. Turkki, M. Walliander, A. Mårtensson, V. Diwan, E. Rahtu, M. Pietikäinen, and M. Lundin, PLoS ONE 9(8), e104855 (2014).
[20] D. G. Lowe, Int. J. Comput. Vision 60(2), 91–110 (2004).
[21] S. Sathpathi, A. K. Mohanty, P. Satpathi, S. K. Mishra, P. K. Behera, G. Patel, and A. M. Dondorp, Malaria Journal 13(1), 1–5 (2014).
[22] P. Samir, P. Chitra, and A. Urhekar, Journal of Evolution of Medical and Dental Sciences 2(7), 712–724 (2013).
[23] Pathscope TM slide scanner; digipath inc. : Pathology delivered digitally, http://www.digipath.biz/pr/PathScope.pdf, 2016, Accessed: 2016-12-7.
[24] N. Otsu, IEEE Transactions on Systems, Man and Cybernetics 9(1), 62–66 (1979).
[25] E. J. Breen and R. Jones, Comput. Vis. Image Underst. 64(3), 377–389 (1996).
[26] P. Soille, Morphological image analysis: principles and applications, 2nd edition (Springer, 2003).
[27] G. Gopakumar, V. K. Jagannadh, S. S. Gorthi, and G. R. K. S. Subrahmanyam, Journal of Microscopy 261(3), 307–319 (2016).
[28] Y. LeCun and Y. Bengio, The handbook of brain theory and neural networks (MIT Press, Cambridge, MA, USA, 1998), chap. Convolutional Networks for Images, Speech, and Time Series, pp. 255–258.
[29] S. C. B. Lo, H. P. Chan, J. S. Lin, H. Li, M. T. Freedman, and S. K. Mun, Neural Networks 8(7), 1201–1214 (1995).
[30] Q. Li, W. Cai, X. Wang, Y. Zhou, D. D. Feng, and M. Chen, Medical image classification with convolutional neural network, in: Control Automation Robotics Vision (ICARCV), 2014 13th International Conference on, (Dec 2014), pp. 844–848.
[31] Y. Bar, I. Diamant, L. Wolf, S. Lieberman, E. Konen, and H. Greenspan, Chest pathology detection using deep learning with non-medical training, in: 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), (April 2015), pp. 294–297.
[32] D. C. Cireşan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, in: Mitosis Detection in Breast Cancer Histology Images with Deep Neural Networks, edited by K. Mori, I. Sakuma, Y. Sato, C. Barillot, and N. Navab (Springer Berlin Heidelberg, Berlin, Heidelberg, 2013), pp. 411–418.
[33] Z. Liang, A. Powell, I. Ersoy, M. Poostchi, K. Silamut, K. Palaniappan, P. Guo, M. A. Hossain, A. Sameer, R. J. Maude, J. X. Huang, S. Jaeger, and G. Thoma, CNN-based image analysis for malaria diagnosis, in: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), (Dec 2016), pp. 493–496.
[34] K. Rohan, Vanishing of gradients, https://ayearofai.com/rohan-4-the-vanishing-gradient-problem-ec68f76ffb9b, 2016, Accessed: 2017-04-10.
[35] I. Sobel, History and definition of the sobel operator, Tech. rep., Feb 2014, Presented at Stanford A.I. Project 1968.
[36] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, Return of the devil in the details: Delving deep into convolutional nets, in: British Machine Vision Conference, (2014).
[37] A. Vedaldi and K. Lenc, CoRR abs/1412.4564 (2014).
[38] R. S. Hunter, J. Opt. Soc. Am. 48(12), 985–993 (1958).
[39] R. Fabbri, L. D. F. Costa, J. C. Torelli, and O. M. Bruno, ACM Comput. Surv. 40(1), 2:1–2:44 (2008).
[40] L. Vincent and P. Soille, IEEE Transactions on Pattern Analysis and Machine Intelligence 13(6), 583–598 (1991).
[41] S. Dimopoulos, C. E. Mayer, F. Rudolf, and J. Stelling, Bioinformatics 30(18), 2644–2651 (2014), http://www.csb.ethz.ch/tools/software/cellx.html.
[42] T. Ojala, M. Pietikainen, and T. Maenpaa, IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7), 971–987 (2002).
[43] LBP/VAR implementation; centre for machine vision and signal analysis; university of oulu, http://www.cse.oulu.fi/CMV/Downloads/LBPMatlab, 2016, Accessed: 2016-10-15.
Graphical Abstract

This paper addresses the quantitative diagnosis of malaria due to
Plasmodium falciparum, which is accomplished by detecting and counting
infected cells from a focus stack of blood smear images acquired using a
custom-built slide scanner. For detection, a custom designed
convolutional neural network (CNN) is used. The cell counting problem is
addressed as a segmentation problem and a two level segmentation
strategy is proposed. The use of a CNN operating on a focus stack for
the detection of malaria is the first of its kind; it not only improved
sensitivity and specificity but also avoided the need for
hand-engineered features.

