3 Abbas2016
Neural Comput & Applic
DOI 10.1007/s00521-016-2474-6
ORIGINAL ARTICLE

Abstract Malaria parasitemia is the quantitative measurement of the parasites in the blood to grade the degree of infection. Light microscopy is the most well-known method used to examine the blood for parasitemia quantification. The visual quantification of malaria parasitemia is laborious, time-consuming and subjective. Although automating the process is a good solution, the available techniques are unable to evaluate cases such as anemia and hemoglobinopathies, which deviate from normal RBC morphology. The main aim of this research is to examine the microscopic images of stained thin blood smears using a variety of computer vision techniques, grading malaria parasitemia on factors independent of RBC morphology. The proposed methodology is based on an inductive approach: color segmentation of malaria parasites through an adaptive Gaussian mixture model (GMM) algorithm. The quantification accuracy of RBCs is improved by splitting the occlusions of RBCs with the distance transform and local maxima. Further, infected and non-infected RBCs are classified to properly grade parasitemia. The training and evaluation have been carried out on an image dataset with respect to ground truth data, determining the degree of infection with a sensitivity of 98 % and a specificity of 97 %. The accuracy and efficiency of the proposed scheme in the context of being automatic were proved experimentally, surpassing other state-of-the-art schemes. In addition, this research addressed the process with factors independent of RBC morphology. Eventually, this can be considered a low-cost solution for malaria parasitemia quantification in massive examinations.

Keywords Malaria · Parasitemia · Thin blood smears · Machine aided · Features mining

Tanzila Saba (corresponding author)
[email protected]; [email protected]
1 Department of Computer Science, Islamia College Peshawar, Peshawar, KPK, Pakistan
2 College of Computer and Information Science, Prince Sultan University, Riyadh 11586, Saudi Arabia
3 Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
4 College of Computer and Information Systems, Al-Yamamah University, Riyadh 11512, Saudi Arabia
5 College of Computer and Information Sciences, King Saud University, Riyadh 12372, Saudi Arabia

1 Introduction

Malaria is a serious infectious disease caused by the genus Plasmodium, a blood parasite injected by the female Anopheles mosquito into the human body. According to the World Health Organization's annual malaria report 2013 [1], malaria takes the life of a child every 45 min. The plasmodium attacks the RBCs, which are blood components. Parasitemia, the quantitative measurement of the parasites in blood, is used to check the severity of malaria [2]. For this purpose, visual quantification through light microscopy is still the most prevalent and commonly practiced method because of its availability and economical methods of testing [3, 4]. The microscopy examination of malaria involves two types of slides, i.e., thick and thin blood smears. The thick blood smear tests are mainly used in malaria to test for the presence or absence of plasmodium in the blood. The thin blood smear tests are used for detailed examination of malaria such as quantification of parasitemia, species identification and life cycle
classification. According to the recommendations of WHO in [3] and the revised version in 2004 [5], the thin blood smear must be examined under 70–100 windows while diagnosing malaria via microscopy; the number of infected RBCs is counted among 100 RBCs in each window. Physicians frequently ask for the thin blood smear test in severe stages of malaria under the microscope through visual quantification, which has proven to be laborious and time-consuming, and the results are often erroneous due to the massive number of on-going examinations [4].

The rest of this paper is arranged as follows. Section 2 discusses the research background and current challenges, Sect. 3 presents the proposed methodology framework, Sect. 4 reports experimental results, analysis and discussion, and finally Sect. 5 concludes the paper.

2 Research background

This section summarizes the categories of the tools and techniques presented in the literature for the task of quantification of malaria parasitemia and its grading. For a detailed study of the existing tools and techniques, interested readers are referred to the survey reported in [2].

A noticeable number of studies addressed automatic malaria parasitemia quantification. The majority of them tried to resolve problems such as image luminance, low contrast, poor illumination and out-of-focus images in the preprocessing step [37–40]. However, most of these problems were resolved due to the advent of high-quality imaging tools. The well-known techniques employed by the majority of the studies as preprocessing steps are: histogram equalization (HE) [14–17], brightness preserving dynamic HE (BPDHE) [18], smallest univalue segment assimilating nucleus (SUSAN) [19], smoothing the image through a median filter with edge preservation through the Laplacian [11, 20] and, in the same way but with unsharp masking for edge preservation, the approach of [21–23]. The underlying study smooths the image with a median filter of kernel size 3 × 3, because a larger kernel would remove the parasites, particularly in their initial stages; unsharp masking is used to preserve the edges of red blood cells and parasites. These two methods were selected on the basis of their positive results in experiments on 74 images of the standard dataset obtained from [24].

Further, according to the literature survey, we can broadly divide the methodologies previously adopted for automatic malaria diagnosis or parasitemia estimation into two approaches, deductive and inductive [41, 42]. The deductive approach is a top-down strategy starting with foreground and background separation, followed by red blood cell segmentation. Finally, the parasites are studied. In contrast, the inductive approach is bottom-up, i.e., the deductive approach in reverse [43–45]. Both of these approaches suffered from factors dependent on RBC morphology. However, the inductive approach is better because it has more freedom to select morphology-independent factors. The color, intensities, shape, size, area, radius and all other morphology-related factors of RBCs are highly variable in different patients. In the same way, we cannot depend on the morphology of parasites except for color. Moreover, in the literature, we also discovered another serious problem, that of occluded RBCs. The occluded RBCs problem was addressed by few studies, on the same morphology-dependent grounds, while the majority ignore the problem altogether.

The deductive approach adopted by [19] and [7] is seriously affected in the presence of dense occluded RBCs. The study in [19] is also dependent on area granulometry for RBC size estimation (the size is only constant when the RBC is healthy), and there is no specification for occlusions of RBCs. The authors of [25, 26] trained SVM and PCA for classification, while for features extraction they also relied on morphology-dependent factors, i.e., area and radius. In addition, the study of [25] was dependent on a bimodal histogram. Further, neither study has a clear approach on how to address the occlusions of RBCs. The studies [6], [27] and [8] are dependent on the circularity of RBCs (detection through the circular Hough transform). The circularity of RBCs is very sensitive and can be disturbed by even slight pressure on the slide during preparation. For features extraction, fixed area, radius and edges suit normal RBCs, but due to malaria and other diseases these features alter frequently [13]. However, these are adopted by the majority of the research studies such as [7, 10, 11]. Moreover, the authors in [7] also counted the number of infected RBCs based on the number of parasites, which is not acceptable in medical cases, as the authors in [13] stated that an infected RBC is counted as one, regardless of the number of parasites in it. The occluded RBCs problem is addressed in the work mentioned in [10], through the method developed in [28], but in dense occluded RBCs the method will affect the accuracy. The segmentation of RBCs based on the nucleic approach exposes a problem, as RBCs have no nuclei, and the studies considered the parasites as nuclei. The studies based on the nucleic approach will be seriously affected when the RBCs become really nucleated, such as when the RBCs' life span is near the end, or the RBCs are highly matured. The nucleic approach is followed by several studies reported in [29, 30] and, for the segmentation of RBCs, in [31]. The segmentation based on chromatin dots offers no surety that the maximum and minimum intensity levels will be the same in all images, and in addition, these studies are highly susceptible to noise. On the same grounds the studies of [32, 33] addressed the segmentation of the parasites. In addition, single RBCs may have noisy chromatin dots; since single dots are not considered by experts as parasites, false results will be reported and accuracy will be at risk [46].

2.1 Research challenges

Automatic tools and techniques introduced previously provide better solutions for the mentioned problems, but mostly deal with dependent factors of RBC's morphology
[2]. For example, the circularity of RBCs is not a universal case, yet the majority of previous studies, reported in [6–8], considered RBCs as round or elliptical in shape and red in color. Size, area and any other fixed geometrical factors of RBCs are also risk factors, true only in normal situations [9], and have been considered in [7, 10, 11] as well. A slight deviation from the proposed models of these studies will abruptly reduce the accuracy and efficiency and may even generate no response in some cases. In addition, occluded RBCs are also a serious issue that has not been properly addressed in the past. The term occlusion is used because of clumping and overlapping RBCs [12]. Clump means to glue: RBCs glued to each other in the form of long chains are an indication of iron deficiency (common in malaria) in the blood. Overlapped RBCs are formed due to inappropriate slide preparation. The occluded RBCs affect the accuracy in terms of malaria parasitemia [12]. Malaria parasitemia is the percentage ratio of infected RBCs to all RBCs present on the slide [13].

$$\%MP = \frac{iRs}{aRs} \times 100 \tag{1}$$

where iRs and aRs represent the number of infected RBCs and the number of all RBCs in a single window, respectively.

3 Proposed methodology

... which is less distributed in the image, will be the eligible color of the feature. In this regard, the most suitable probabilistic approach is the Gaussian mixture model with expectation maximization to determine the mean, weight and co-variance of the colors distributed in the image as presented in the equation:

$$\{W_k, \mu_k, CV_k\}, \quad \forall k_{copts} \in \mathrm{Color} \tag{2}$$

where $\{W_k, \mu_k, CV_k\}$ are the weight, mean and co-variance matrix of the kth color component. The color components are determined with the Gaussian mixture model by assigning them to the pixel through the normal distribution probability as mentioned in Eq. (3)

$$P(k \mid f_x) = \frac{W_k \, N(f_x \mid \mu_k, CV_k)}{\sum_k W_k \, N(f_x \mid \mu_k, CV_k)} \tag{3}$$

where k is the color values in a group vector, $W_k$ are the weights given as $\left(\sum_{k=1}^{K} W_k = 1\right)$ and $f_x(W_1, \ldots, W_K; f_1, \ldots, f_K)$.

Next, the spatial variance is calculated from horizontal and vertical variances of the kth color components, which are presented in Eqs. (4) and (5), respectively.

$$V_v(k) = \frac{1}{|Y|_k} \sum_y P(k \mid f_x) \, |y_v - M_v(k)|^2 \tag{4}$$

where $M_v(k) = \frac{1}{|Y|_k} \sum_y P(k \mid f_x) \, y_v$, where $y_v$ and $x_h$ are the y-coordinate and x-coordinate of the pixel x, while $|Y|_k$ and $|X|_k$ are given as $|Y|_k = \sum_y P(k \mid f_x)$ and $|X|_k = \sum_x P(k \mid f_x)$, respectively.

The total variance of a color component k is given as:

$$V(k) = V_v(k) + V_h(k). \tag{6}$$

Further, we normalized V(k) to the range [0, 1] as,

$$V(k) = \frac{V(k) - \min_k V(k)}{\max_k V(k) - \min_k V(k)}. \tag{7}$$

Thus, the weighted sum of the color spatial-distribution feature $F_s(x, f)$ is defined as:

$$F_s(x, f) \propto \sum_k P(k \mid f_x) \cdot (1 - V(k)). \tag{8}$$

The weighted feature color is also normalized to the range [0, 1].

Results with the proposed technique are presented for visual inspection and compared with the ground truth images marked by medical experts as shown in Fig. 2. The verification has been made by another panel of medical experts from Saidu Medical College Swat, KPK, Pakistan. The segmented features are slightly dilated for clear visibility.

As the parasite in its initial stages is in the form of threads and can span an area of at least 50 pixels (empirically checked by experimenting on more than 45 images out of 74), smaller areas are identified as noise and removed from the image. After segmentation of parasites, both the original image and the resulting image containing the parasites are converted to binary form for further processing.
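The color spatial-distribution feature of Eqs. (3), (4) and (6)–(8) can be illustrated with a minimal sketch. It makes several assumptions that are not in the paper: each pixel carries a single scalar intensity instead of a color vector, the mixture parameters are supplied by hand rather than fitted with expectation maximization, and the horizontal variance is assumed to mirror Eq. (4) with x-coordinates. All function names are hypothetical.

```python
import math


def gaussian_pdf(value, mean, variance):
    """Density of a 1-D normal distribution N(value | mean, variance)."""
    return math.exp(-((value - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def posterior(value, weights, means, variances):
    """Eq. (3): P(k | f_x) for every component k at one pixel."""
    joint = [w * gaussian_pdf(value, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def spatial_feature(image, weights, means, variances):
    """Per-pixel feature of Eq. (8), up to the proportionality constant."""
    rows, cols = len(image), len(image[0])
    pixels = [(r, c) for r in range(rows) for c in range(cols)]
    post = {p: posterior(image[p[0]][p[1]], weights, means, variances) for p in pixels}

    total_variance = []
    for k in range(len(weights)):
        mass = sum(post[p][k] for p in pixels)                    # |Y|_k and |X|_k
        mean_y = sum(post[p][k] * p[0] for p in pixels) / mass    # M_v(k)
        mean_x = sum(post[p][k] * p[1] for p in pixels) / mass    # assumed M_h(k)
        v_v = sum(post[p][k] * (p[0] - mean_y) ** 2 for p in pixels) / mass  # Eq. (4)
        v_h = sum(post[p][k] * (p[1] - mean_x) ** 2 for p in pixels) / mass  # assumed mirror of Eq. (4)
        total_variance.append(v_v + v_h)                          # Eq. (6)

    low, high = min(total_variance), max(total_variance)
    span = high - low
    normalised = [(v - low) / span if span > 0 else 0.0 for v in total_variance]  # Eq. (7)

    return [[sum(p_k * (1.0 - v_k) for p_k, v_k in zip(post[(r, c)], normalised))
             for c in range(cols)] for r in range(rows)]


# Toy 5 x 5 image: bright background with one dark, spatially compact pixel.
image = [[0.8] * 5 for _ in range(5)]
image[2][2] = 0.2
feature = spatial_feature(image, weights=[0.9, 0.1], means=[0.8, 0.2], variances=[0.01, 0.01])
```

In this toy image the dark component is concentrated at one location, so its normalized variance is the smallest and the feature is high only at that pixel, which matches the idea that the least spatially distributed color is the feature color.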
Fig. 3 The separation process: a input original image, b binary image of the original, c single RBCs and d separated occluded RBCs
3.2 Occluded red blood cells splitting

The precise grading of malaria parasitemia depends on the accurate quantification of RBCs (infected and non-infected). The accuracy of quantifying RBCs (infected and non-infected) mainly suffers from occlusions (clumps and overlaps of RBCs). The process of splitting occlusions needs to be designed in a way to save processing time on

Fig. 4 Overall process after separation of occluded RBCs from single RBCs: a image of occluded RBCs, b distance transform of the image presented in a, c local maxima of the occluded RBCs, d centroids of the occluded RBCs for drawing circles, e mapping of the drawn circles on the initial points of the boundaries of the occluded RBCs and f final mapped and cleaved RBCs in their constituent number

... separation of occluded RBCs from single RBCs in the image.

3.2.1 Checking for occluded RBCs

We double checked for the presence of occluded RBCs, i.e., with a median area check and a median elongation check. First, we find the convex hulls of all the RBCs present under the current window through Eq. (9). We find the areas and elongation of the convex hulls through Eqs. (10)–(12), respectively. Using these two measures, we find a normalized variance among all the RBCs. Through experimentation, we found that if the variance is higher than 0.2 in the case of area and higher than 0.5 in the case of elongation, then occluded RBCs exist, and vice versa.

where |X| is the finite set of points, $x_i$ is a point in |X|, and $a_i$ is the weight assigned to $x_i$; the sum of the weights must be equal to 1, i.e., the weights are normalized.

$$\mathrm{Area}_{RBC} = \text{No. of pixels} \tag{10}$$

where no. of pixels = pixels defining the convex hull object of the RBCs.

$$\mathrm{Elongation}_{RBC} = \frac{L_{RBC}}{B_{RBC}} \tag{11}$$

where $L_{RBC}$ is the major axis and $B_{RBC}$ is the minor axis of each convex hull (RBCs).

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} \tag{12}$$
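The area and elongation checks of Sect. 3.2.1 can be sketched as below. The paper does not state how the variance is normalized, so this sketch assumes each measure is divided by its median before the population variance of Eq. (12) is taken, and it assumes that exceeding either threshold signals occlusions. The function names are hypothetical; the thresholds 0.2 and 0.5 are the ones reported in the text.

```python
from statistics import median, pvariance


def normalized_variance(values):
    """Population variance (Eq. 12) of the values divided by their median.

    Dividing by the median is an assumption made for this sketch.
    """
    centre = median(values)
    return pvariance([v / centre for v in values])


def has_occlusions(areas, elongations, area_threshold=0.2, elongation_threshold=0.5):
    """Flag a window as containing occluded RBCs.

    areas       -- convex-hull pixel counts, one per object (Eq. 10)
    elongations -- major axis / minor axis, one per object (Eq. 11)
    """
    return (normalized_variance(areas) > area_threshold
            or normalized_variance(elongations) > elongation_threshold)


# Window with similar single cells only.
singles = has_occlusions([100, 102, 98, 101, 99], [1.0, 1.05, 1.1, 1.0, 1.02])

# Same window with one large, elongated clump.
clumped = has_occlusions([100, 102, 98, 101, 400], [1.0, 1.05, 1.1, 1.0, 3.5])
```

A clump of several cells has a much larger hull area than the median cell, so the normalized variance rises well above the threshold, while a window of similar single cells stays close to zero.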
Fig. 5 Occluded RBCs splitting through the proposed technique: a, c and e are original images, while b, d and f are the results obtained by drawing the circles on the centroid positions obtained through local maxima from the distance transform and then mapping the circles with the occlusion to obtain the actual cleaved number of RBCs
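The first two stages of the splitting shown in Figs. 4 and 5, the distance transform and its local maxima, can be sketched on a toy mask of two overlapping discs. This is only an illustration under stated assumptions: the distance transform is a brute-force Euclidean version suitable for tiny images, the `min_ratio` cut-off that discards shallow maxima is an assumption of this sketch, and the circle drawing and boundary mapping steps are not covered. All names are hypothetical.

```python
import math


def two_disc_mask(rows=13, cols=21, centers=((6, 6), (6, 14)), radius=5):
    """Toy binary mask: two overlapping discs standing in for occluded RBCs."""
    return [[any((r - cr) ** 2 + (c - cc) ** 2 <= radius ** 2 for cr, cc in centers)
             for c in range(cols)] for r in range(rows)]


def distance_transform(mask):
    """Euclidean distance from each foreground pixel to the nearest background pixel."""
    rows, cols = len(mask), len(mask[0])
    background = [(r, c) for r in range(rows) for c in range(cols) if not mask[r][c]]
    dist = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                dist[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return dist


def local_maxima(dist, min_ratio=0.7):
    """Pixels strictly greater than all 8 neighbours and above min_ratio * global peak."""
    rows, cols = len(dist), len(dist[0])
    peak = max(max(row) for row in dist)
    found = []
    for r in range(rows):
        for c in range(cols):
            value = dist[r][c]
            if value == 0.0 or value < min_ratio * peak:
                continue
            neighbours = [dist[nr][nc]
                          for nr in range(max(r - 1, 0), min(r + 2, rows))
                          for nc in range(max(c - 1, 0), min(c + 2, cols))
                          if (nr, nc) != (r, c)]
            if all(value > n for n in neighbours):
                found.append((r, c))
    return found


# Each surviving maximum is a candidate centroid for one cell of the clump.
centroids = local_maxima(distance_transform(two_disc_mask()))
```

The number of maxima gives the number of constituent cells in the clump; on real smears the maxima would usually need smoothing or merging before circles are drawn around them.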
Fig. 6 Parasite imposition process with proposed technique on images having occluded RBCs. a, d Present original images, b, e have cleaved
occluded RBCs and imposition of parasites on them and c, f present the imposition of parasites on single RBCs
For malaria parasitemia grading, we have performed the following steps.

3.3.1 Imposition of segmented parasites

The parasites, which are segmented in the first step, are imposed on the single RBCs after splitting the occluded RBCs into single RBCs if occlusions exist; otherwise this step follows directly after segmentation of parasites. The imposition of parasites is needed to identify the infected RBCs and count them in order to estimate the percentage malaria parasitemia. The imposition of parasites is simply the addition of the two binary images, i.e., the one which has single RBCs and the other having segmented parasites, as they are in opposite signs, to cancel the effects of noise and any other artifact. The visual results for inspection are presented in Fig. 6.

As all the RBCs are separated and single, identifying the infected RBCs is needed for counting. We used one specific quality of infected RBCs: considering the outer boundaries of all the RBCs, we encircle with green those that are infected, on the basis that the parent, or outer, boundary has a child boundary. RBCs having no child boundary are considered as non-infected RBCs. From medical literature an RBC, in case of malaria, is considered infected based on the presence of plasmodium in it. An infected RBC having many plasmodium parasites will count as one infected RBC. The visual results for this phase are shown in Figs. 7, 8.

Fig. 7 Parasite imposition on images having no occluded RBCs. a, c Input images, while b, d are the resultant after imposition of parasites

Fig. 8 Identification of infected RBCs. a, c Present original images, while b, d present infected RBCs highlighted with the red boundaries

3.3.3 Segmentation of infected RBCs

In segmentation of infected RBCs, we followed the same concept as we did in the identification. We took an empty binary image of the same size as the one in which all RBCs (infected and non-infected) are present. Then, we highlight those areas with 1's which we identified in the image having RBCs (infected and non-infected). Adding the image in which areas are highlighted to the image having both infected and non-infected RBCs resulted in an image having infected RBCs. The whole process is visualized in Fig. 9, while the results are presented in Fig. 10.

Fig. 9 Process of infected RBCs segmentation. a Is original binary image, b The empty image with areas highlighted as the infected RBCs area, c present infected RBCs, resulted through proposed technique and finally d contains all non-infected RBCs

Fig. 10 Segmentation process of the infected RBCs in slide images having occluded RBCs. a, f Represent the input images to this module while b presents the infected RBCs in the cleaved RBCs. d, g Present infected RBCs existed in the single RBCs, c, f are the non-infected RBCs in the cleaved RBCs. e, h Present the non-infected RBCs in the single RBCs

3.3.4 Counting infected and non-infected RBCs

Following segmentation, counting infected and non-infected RBCs is a simple task. For automatic counting, we used the MATLAB built-in function 'bwlabel'. The RBCs segmentation and counting results are shown in Figs. 11, 12 and Table 1.

3.3.5 Estimation of percentage malaria parasitemia

all RBCs. Having the total count of infected and non-infected RBCs, the percentage of malaria parasitemia can be estimated by using the formula described in Eq. (1).

3.3.6 Malaria parasitemia grading

According to the book at [34] and to the studies [35, 36], the percentage of malaria parasitemia should be examined in 100–200 windows and can be graded to one of the grades or levels listed in Table 2.

Finally, malaria parasitemia is graded to the levels mentioned in Table 2. Further, for testing purposes, we assumed 40000 RBCs per window and estimated the results based on this assumption with the result of each image, because each image is a single window.
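The counting and estimation steps of Sects. 3.3.4 and 3.3.5 can be sketched in a few lines. Note the substitutions: instead of MATLAB's `bwlabel` the sketch uses a small breadth-first 8-connected labeling, and instead of the parent/child boundary test it marks an RBC as infected when any parasite pixel falls inside it. Both are stand-ins chosen for this illustration, and the function names are hypothetical. An RBC with several parasites is still counted once, as the text requires.

```python
from collections import deque


def label_components(mask):
    """Label 8-connected foreground regions; returns (label grid, region count)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    cr, cc = queue.popleft()
                    for nr in range(max(cr - 1, 0), min(cr + 2, rows)):
                        for nc in range(max(cc - 1, 0), min(cc + 2, cols)):
                            if mask[nr][nc] and labels[nr][nc] == 0:
                                labels[nr][nc] = count
                                queue.append((nr, nc))
    return labels, count


def count_cells(rbc_mask, parasite_mask):
    """Return (infected RBCs, all RBCs); each RBC counts once however many parasites it holds."""
    labels, total = label_components(rbc_mask)
    infected = {labels[r][c]
                for r in range(len(labels)) for c in range(len(labels[0]))
                if parasite_mask[r][c] and labels[r][c]}
    return len(infected), total


def percent_parasitemia(infected, total):
    """Eq. (1): percentage of infected RBCs among all RBCs in a window."""
    return 100.0 * infected / total


def to_mask(rows):
    """Turn strings of '0'/'1' into a boolean grid."""
    return [[ch == "1" for ch in row] for row in rows]


rbcs = to_mask(["110011011",
                "110011011"])
parasites = to_mask(["100000000",
                     "010000010"])
infected, total = count_cells(rbcs, parasites)   # first cell holds two parasite pixels
```

With the counts reported in Table 1 (7 infected and 26 non-infected RBCs), `percent_parasitemia(7, 7 + 26)` gives about 21.21, in line with the percentage listed there.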
Fig. 11 Segmentation process of the infected RBCs in slide images having all single RBCs. a, d Represent the original input images, b, e present
the infected RBCs while c, f present the non-infected RBCs
4.1 Ground truth data preparation

The images obtained from DPDx [16] were printed as forms and distributed among three pathologists. Each form has a single image of a thin blood smear, its manually estimated statistics and the marking of the parasites in the image. These forms were verified by another panel of three medical experts. The data collection was made in the Department of Pathology, Saidu Medical College, Saidu Sharif Swat, KPK, Pakistan.

4.2 Inter-rater agreement

The collected data are first checked for inter-rater reliability agreement through a variation of Cohen's Kappa (Two Raters) called Fleiss' Kappa through Eq. (13).

$$\kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e} \tag{13}$$

where $\bar{P} = \frac{1}{N}\sum_{i=1}^{N} P_i$ and $\bar{P}_e = \sum_{j=1}^{k} p_j^2$, N = total number of subjects, and i = 1, 2, 3, …, N and j = 1, 2, 3, …, k represent subjects and categories, respectively. The Fleiss' Kappa calculation for the collected data is $\kappa = 0.96$, which shows strongly reliable data.

4.3 Quantitative evaluation of the proposed occluded RBCs splitting technique

We first check the relationship between the counts of red blood cells (automatically after occlusions splitting and manually made by the experts) through Pearson's correlation coefficient. The relationship between the two variables is shown in Fig. 13. For the same purpose, we also computed the confusion matrix based precision, recall and F-measure with Eqs. (14), (15) and (16) through the confusion matrix in Table 3.

$$\mathrm{Precision} = \frac{T_p}{T_p + F_p} \tag{14}$$

$$\mathrm{Recall} = \frac{T_p}{T_p + F_n} \tag{15}$$

$$F\text{-measure} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{16}$$

where $T_p$ = correctly counted as red blood cells, $T_n$ = correctly counted as non-red blood cells, $F_p$ = incorrectly counted as red blood cells and $F_n$ = incorrectly counted as non-red blood cells.

The achieved precision, recall and F-measure by counting the RBCs after splitting the occluded RBCs with the proposed technique are 0.973766, 0.989544 and 0.985951, respectively.

4.4 Statistical analysis of the overall framework

Moreover, in the same way, we analyzed the overall results of percentage malaria parasitemia estimation through Pearson's correlation coefficient to find the relationship between manually and automatically estimated percentage malaria parasitemia depicted in Fig. 14. Confusion matrix based-sensitivity and

Fig. 12 Counting process results. a, d, g Original input images, b, e, h are images labeled as infected RBCs, while images c, f, i are labeled as non-infected RBCs

Table 1 Complete statistics after examination of single thin blood smear image

Complete statistics
All red blood cells        32
Infected RBC               7
Non-Infected RBC           26
Percentage MP 100 RBC      21.212122
%MP 40 K RBC               8484.848485
Grade                      C
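The evaluation measures of Sects. 4.2 and 4.3 (Eqs. 13–16) can be sketched as follows. The input layout for Fleiss' Kappa, a table with one row per subject holding the number of raters who chose each category, and the per-subject agreement $P_i$ are taken from the standard definition of the statistic rather than from the paper. The function names and the sample counts are hypothetical and are not the study's data.

```python
def precision_recall_f(tp, fp, fn):
    """Eqs. (14)-(16) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def fleiss_kappa(ratings):
    """Eq. (13). ratings[i][j] = number of raters assigning subject i to category j.

    Every subject must be rated by the same number of raters (at least two).
    """
    n_subjects = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])

    # Per-subject agreement P_i (standard Fleiss definition).
    p_i = [(sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in ratings]
    p_bar = sum(p_i) / n_subjects

    # Share of all assignments that went to each category, p_j.
    p_j = [sum(row[j] for row in ratings) / (n_subjects * n_raters)
           for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


# Hypothetical counts, for illustration only.
precision, recall, f_measure = precision_recall_f(tp=90, fp=10, fn=30)

# Three raters, two categories, four subjects with full agreement.
kappa_full = fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]])
```

A kappa of 1 means the raters agree on every subject, while values near 0 mean agreement no better than chance.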
information technology. Middle East University, Amman, Jordan, p 116
31. Zou L-H, et al (2010) Malaria cell counting diagnosis within large field of view. In: International conference on digital image computing: techniques and applications (DICTA)
32. Somasekar J (2011) Computer vision for malaria parasite classification in erythrocytes. Int J Comput Sci Eng 3(6):2251–2256
33. Makkapati VV, Rao RM (2009) Segmentation of malaria parasites in peripheral blood smear images. In: IEEE international conference on acoustics, speech and signal processing
34. Hänscheid T, Valadas E, Grobusch M (2000) Automated malaria diagnosis using pigment detection. Parasitol Today 16(12):549–551
35. Homel M, Gilles HM (1998) Malaria. In: Colliet L, Balows A, Sussman M (eds) Microbiology and microbial, infections, 9th edn. Topley & Wilson's, Arnold
36. Iyar DRBK Malaria Diagnostics. 2013 [cited 2014 7-04-2014]; Available from: http://www.slideshare.net/iyerbk/malaria-diagnostics. pp. 56–62. doi:10.1179/1743131X13Y.0000000063
37. Saba T, Rehman A (2012) Effects of artificially intelligent tools on pattern recognition. Int J Mach Learn Cybernet 4:155–162. doi:10.1007/s13042-012-0082-z
38. Rehman A, Saba T (2014) Neural network for document image preprocessing. Artif Intell Rev 42(2):253–273. doi:10.1007/s10462-012-9337-z
39. Saba T, Rehman A, Altameem A, Uddin M (2014) Annotated comparisons of proposed preprocessing techniques for script recognition. Neural Comput Appl 25(6):1337–1347. doi:10.1007/s00521-014-1618-9
40. Norouzi A, Rahim MSM, Altameem A, Saba T, Rada AE, Rehman A, Uddin M (2014) Medical image segmentation methods, algorithms, and applications. IETE Tech Rev. doi:10.1080/02564602.2014.906861
41. Neamah K, Mohamad D, Saba T, Rehman A (2014) Discriminative features mining for offline handwritten signature verification. 3D Res. doi:10.1007/s13319-013-0002-3
42. Rehman A, Saba T (2014) Features extraction for soccer video semantic analysis: current achievements and remaining issues. Artif Intell Rev 41(3):451–461. doi:10.1007/s10462-012-9319-1
43. Saba T, Rehman A (2012) Machine learning and script recognition. Lambert Academic publisher, Saarbrueken, pp 56–68
44. Joudaki S, Mohamad D, Saba T, Rehman A, Al-Rodhaan M, Al-Dhelaan A (2014) Vision-based sign language classification: a directional review. IETE Tech Rev 31(5):383–391. doi:10.1080/02564602.2014.961576
45. Muhsin ZF, Rehman A, Altameem A, Saba T, Uddin M (2014) Improved quadtree image segmentation approach to region information. Imaging Sci J 62(1):56–62. doi:10.1179/1743131X13Y.0000000063
46. Saba T, Al-Zahrani S, Rehman A (2012) Expert system for offline clinical guidelines and treatment. Life Sci J 9(4):2639–2658