Formella 2003 Automatic
Rodríguez-Damián M. (1), Cernadas E. (1), Formella A. (1), González A. (2)
(1) Departamento de Informática. Universidade de Vigo. [email protected], [email protected], [email protected]
(2) Departamento de Bioloxía Vexetal e Ciencias do Solo. Universidade de Vigo. [email protected]
As Lagoas s/n, E-32004 Ourense, Spain
Species of the Urticaceae family have great allergenic potential, but human sensitivity is specific to each species, so the identification of these species is of special interest in clinical practice. Unfortunately, palynologists cannot distinguish the species of this family using traditional pollen-counting methods. In addition, these methods are tedious and therefore not well suited to routine on-line analysis. This paper presents a complete system for the classification of allergenic pollen species of the Urticaceae family. The images are taken with an optical microscope. A coarse border of each pollen grain is estimated using the Hough transform, and the accurate border is then obtained by applying snakes to this coarse border. A set of shape measures is then computed and used to discriminate between species. A statistical evaluation of the detection and classification capabilities of our method is provided and discussed.
Urticaceae pollen is an allergen with very important clinical repercussions [1]. It is present in the Spanish atmosphere all year round, in varying concentrations. The pollen grains are transported by wind. In the month of peak flowering, when the pollen grains reach the mucous membrane of the respiratory tract, they produce important alterations in the organism, such as allergic rhinitis and type I asthma. The reaction is caused by a surface structure of the pollen grains named the oncus, a thickening of the intine (the inner layer of the pollen wall) under the apertures (specialized regions of the spore wall that are thinner than elsewhere on the grain and usually differ in structure and ornamentation) [2]. In clinical practice, it is possible to distinguish IgE-mediated sensitivity to the genus Parietaria from sensitivity to the genus Urtica. No antigenic cross-reaction between the two genera has been detected [3]. Among the Iberian species, the pollen of the genus Parietaria is one of the most common and also one of those with the greatest allergenic potential [4].
Because of this allergic reaction, clinical practice has a special interest in differentiating the pollen. In both genera, however, the morphology of the pollen is quite similar: the pollen grains are small and of almost equal size (13-17 µm in Parietaria; 13-19 µm in Urtica), they exhibit similar roundness, and they lack pronounced ornamentation of the exine. They have pore-type apertures: 3-4 porate in Parietaria and 3-4 zonoporate or pantoporate in Urtica [5]. Figure 2 shows a well-defined pollen grain of each species. With the traditional methods of pollen analysis, optical microscopy and scanning electron microscopy (SEM), it is very difficult to distinguish the pollen grains of the genus Urtica from those of the genus Parietaria by human inspection. As a result, palynologists identify only the Urticaceae family in counting analyses. In addition, these procedures are very tedious and therefore not adequate for daily analysis.

Many pattern recognition systems can be partitioned into components such as image acquisition, object extraction, feature generation, feature selection, classifier design, and system evaluation. A sensor converts the microscopy preparation into digital data. An object extractor isolates the objects of interest from the image (in our case, pollen grains). A feature generator measures object properties that are useful for classification. The feature selection stage chooses the subset of features that are the best candidates. Once the classifier has been designed, the system evaluation stage quantifies its performance. During the design of the system, one may go back and redesign earlier stages to improve the overall performance.

In this study, digital images of microscopy preparations are taken using the infrastructure of the palynology laboratory (a microscope connected to an analog camera). The 35 mm slides are taken at a magnification of 40 times. Afterwards, the slides are digitized by a commercial laser scanner at a resolution of 2048x3072 pixels with 24-bit color. The accumulated spatial resolution at which the preparations are digitized is 8196 dots per mm. Each preparation contains pollen grains of a single type: Parietaria judaica, Urtica urens or Urtica membranacea.

A preliminary approach to discriminating the pollen was presented in [6], where the pollen grains were selected manually on the images. The results on a limited dataset of well-defined pollen grains were encouraging, so the new version of the system employs the Hough transform to isolate the pollen grains. This technique makes an automatic system possible; however, some of the pollen grains do not exhibit a regular circular shape in the preparation. To obtain a good boundary for every pollen grain, the active contour method (snakes) is employed. Finally, a set of shape features is computed for every pollen grain and used to discriminate the pollen grains. In the following sections, we describe every stage of the system in more detail.

Proceedings of Acivs 2003 (Advanced Concepts for Intelligent Vision Systems), Ghent, Belgium, September 2-5, 2003

From the viewpoint of computer vision, pollen grains of the Urticaceae family are nearly circular. The Hough transform [7, 8] is a well-known method for circle detection which must be applied to binary edge images. The techniques employed to obtain a good binary image are explained next, followed by a brief summary of pollen detection using the Hough transform.

Edges in a grey-level image are the discontinuities in the image intensity profile, which can be regarded as the boundaries between objects or regions; in our particular case, between pollen grains and the preparation background. The simplest way to extract prominent edges is to generate an edge strength image and then threshold it. The edge strength image can be computed by applying edge detection operators to the original image. These operators normally provide good edge maps in ideal images, but they frequently have difficulties in establishing the connectivity of edge segments in very noisy images. Basically, we are interested in getting well-defined edges of the external border of the pollen grains. A thin border is also desired, to obtain as few points as possible (these points will be the input to the Hough transform). Hence, we divide the edge extraction algorithm into four steps: pre-processing, edge detection, thresholding, and post-processing.

Fig. 1 Stages of binarization of the image. A) Original image. B) After applying a median filter and an edge detector. C) The thresholded image. D) The final morphologically treated and thinned image.
The pre-processing step is very simple: the original image (shown in Figure 1A) is filtered by a median filter [9] to attenuate spurious noise. In the second step, a gradient operator, e.g. the Sobel operator, is applied to the filtered image. The result is an edge strength image in which the edges of objects are highlighted. This effect can be seen in Figure 1B. The resulting image is still not a binary image. To obtain a binary image, we use thresholding, which is undoubtedly one of the most popular segmentation approaches for converting a grey-level or floating-point image into a binary image. Although the selection of optimum thresholds has remained a challenge over decades, many methods for automatic threshold selection have been reported [10]. We calculate the threshold with the Otsu method [10, 11], which maximizes the between-class variance. The final result is a binary edge image, as displayed in Figure 1C.

However, this image contains many internal points that represent the internal structure of the pollen grain. These points are not necessary to detect the pollen grains, so a fourth step is added. We post-process the binary edge image to remove artifacts coming from imperfections of the microscopy preparations and from the internal structure of the pollen grains. The aim is to keep only the external boundaries of the pollen grains, as thin as possible. To this end, we employ binary mathematical morphology operators. The most basic operations are erosion and dilation. Erosion reduces the size of the regions of interest, which is most readily accomplished by iteratively peeling off single-pixel layers from the outer boundary of each region. Dilation can be seen as the opposite process, entailing the iterative addition of single-pixel layers to the boundary of each region of interest to increase the region's size [12, 13]. In particular, we apply an erosion operation using a matrix of 3 x 3 pixels. The result of this operation is subtracted from the first binary image, which gives us the border of the pollen grains. To get a thinner border, a classical thinning algorithm is used [14]. The final image is displayed in Figure 1D. Even though the border is not precise, it is sufficient to find the position of the pollen grains in the image using the Hough transform described below.
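The thresholding and border-extraction steps described above can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes an 8-bit edge strength image stored as a list of rows, it omits the median filter, the Sobel operator and the final thinning, and the function names (`otsu_threshold`, `binarize`, `erode3x3`, `border`) are chosen here only for illustration.

```python
def otsu_threshold(values, levels=256):
    """Grey level t that maximizes the between-class variance when the
    pixels <= t and the pixels > t are taken as the two classes."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels in the low class
    sum0 = 0  # sum of the grey levels in the low class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # proportional to the variance
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(image, threshold):
    """Binary image: 1 where the edge strength exceeds the threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in image]


def erode3x3(binary):
    """Erosion with a 3 x 3 matrix; pixels on the image frame are cleared."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(binary[y + dy][x + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


def border(binary):
    """Binary image minus its erosion: only the outer layer remains."""
    eroded = erode3x3(binary)
    return [[b - e for b, e in zip(brow, erow)]
            for brow, erow in zip(binary, eroded)]
```

A typical call chain under these assumptions would be `border(binarize(img, otsu_threshold([v for row in img for v in row])))`, followed by a thinning step that is not shown.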
As already mentioned, the shape of a pollen grain can be approximated by a circle. Examples of the three types of pollen grains are displayed in Figure 2; their similarity and roundness are evident. The Hough transform is a well-accepted method for geometric object detection. It provides an elegant way of extracting global features, such as curve segments, from an image.
Figure 2. Examples of the three types of pollen: Parietaria judaica (above), Urtica membranacea (middle) and Urtica urens (below).
The Hough transform is basically a template matching scheme, as pointed out in [15]. The template searched for is a circle, which can be represented as follows. Let (x, y) be an edge pixel on a circle with centre coordinates (a, b) and radius r. Then the circle can be expressed by:

$$(x - a)^2 + (y - b)^2 = r^2 \qquad (1)$$
In the Hough transform, every edge pixel votes in the parameter space (a, b, r) for all the circles that could pass through it, so that circles appear as peaks of the accumulated votes. We consider only the range of expected radii of pollen grains in order to decrease the computational cost, and we establish a threshold that indicates which points of the parameter space represent true circles. Both are critical decisions. If the minimum radius is low, the probability of recognizing preparation artefacts as pollen grains is high, but if it is high, we may lose the small pollen grains. The decision is also influenced by the general difficulty of detecting small circles, which is caused by the masking effect that circles of large circumference produce in the voting. In fact, the votes of these larger circles appear dispersed around the solution, and some of them can hide the center of a small circle.

The final detection of the positions of the pollen grains in the preparations is performed by thresholding the resulting parameter space. Different threshold values result in different trade-offs between true positives and false positives. With a low threshold, many points are considered to be local maxima, and hence imperfections are included; with a high threshold, on the contrary, only the most well-defined pollen grains are recognized. This problem is far from trivial, and no generally applicable technique yet guarantees a solution to it. Peak finding is further complicated by false peaks which do not correspond to objects in the image. These may be generated by noise which has some structure to it, or by the interplay of pixels on different objects.
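The voting over a restricted range of radii and the thresholding of the parameter space can be sketched as follows. This is a plain circle Hough transform written for illustration under simple assumptions (edge pixels given as integer coordinates, one-degree angular steps, a sparse dictionary as accumulator); the names `hough_circles`, `circle_pixels`, `r_min`, `r_max` and `vote_threshold` are hypothetical and no peak-finding refinement is attempted.

```python
import math


def circle_pixels(cx, cy, r, n_angles=360):
    """Integer pixels on a circle; used here only to build test data."""
    pixels = set()
    for k in range(n_angles):
        theta = 2 * math.pi * k / n_angles
        pixels.add((round(cx + r * math.cos(theta)),
                    round(cy + r * math.sin(theta))))
    return pixels


def hough_circles(edge_points, width, height, r_min, r_max,
                  vote_threshold, n_angles=360):
    """Accumulate votes in (a, b, r) space and keep cells above a threshold.

    Every edge pixel (x, y) votes for the centres (a, b) that satisfy
    (x - a)^2 + (y - b)^2 = r^2 for each radius r in [r_min, r_max].
    """
    accumulator = {}
    for x, y in edge_points:
        for r in range(r_min, r_max + 1):
            voted = set()  # an edge pixel votes at most once per cell
            for k in range(n_angles):
                theta = 2 * math.pi * k / n_angles
                a = round(x - r * math.cos(theta))
                b = round(y - r * math.sin(theta))
                if 0 <= a < width and 0 <= b < height and (a, b) not in voted:
                    voted.add((a, b))
                    cell = (a, b, r)
                    accumulator[cell] = accumulator.get(cell, 0) + 1
    return [(a, b, r, votes) for (a, b, r), votes in accumulator.items()
            if votes >= vote_threshold]
```

Lowering `vote_threshold` or `r_min` in this sketch returns more cells, which mirrors the trade-off between true and false detections discussed above.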
Figure 3 shows the original image of a pollen microscopy preparation, and Figure 4 shows the detected pollen grains. Every recognized pollen grain is marked with a bold circle drawn over it.

Figure 4. The image of Figure 3 after applying the Hough transform; the detected pollen grains are highlighted.
The results of the Hough transform are the radius and center of every detected pollen grain. These data and the gradient of the image are the input to the pollen grain classification module, which consists of three steps: accurate pollen boundary extraction using snakes, shape feature extraction, and classification.
Active contours, also known as snakes, are frequently used to fit the boundary of an object accurately. The points defining the boundary, a polyline, are moved during the process so as to minimize an energy composed of three terms: a continuity term, a curvature term, and an image term [16]:

$$E = \alpha E_{cont} + \beta E_{curv} + \gamma E_{image} \qquad (2)$$

where $\alpha$, $\beta$ and $\gamma$ are weights. The first term represents the point spacing; the goal is not to minimize the distance between points but to drive the points toward some average separation. The second term ensures that the points maintain a smooth curve, much like a spline fit. The last term represents the gradient of the image. The gradient is used since it enhances the edge values. In our algorithm, the original image filtered by the Sobel operator is used as the gradient image.

The radii and centers of the circles provided by the Hough transform are used to obtain a first approximation of the borders by taking a set of equidistant points on the ideal circles. We then try to fit each border to the actual boundary, assuming equal weights for the three energy terms mentioned above. Figure 5 shows two examples. In the upper example (A), the boundary of the pollen grain (triangles) is still close to the input points (squares), but the improved fit is easy to appreciate. In the lower example, by contrast, the two sets of points coincide because the pollen grain has no irregularities on its boundary.
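To make the three energy terms concrete, the following sketch performs one greedy pass over a snake initialized on a Hough circle. It follows the general idea of the greedy algorithm in [16] but is a simplification written for illustration: each term is min-max normalized over the 3 x 3 neighbourhood of a point, the gradient image is a plain list of rows, and the names `circle_points` and `greedy_snake_step` are hypothetical.

```python
import math

# Candidate moves for a point; (0, 0) comes first so that ties keep the point.
OFFSETS = [(0, 0)] + [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if (dx, dy) != (0, 0)]


def circle_points(cx, cy, r, n):
    """n equidistant integer points on the circle found by the Hough step."""
    return [(round(cx + r * math.cos(2 * math.pi * k / n)),
             round(cy + r * math.sin(2 * math.pi * k / n)))
            for k in range(n)]


def _normalize(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def greedy_snake_step(points, grad, alpha=1.0, beta=1.0, gamma=1.0):
    """Move every point to the neighbour with the lowest weighted energy."""
    height, width = len(grad), len(grad[0])
    n = len(points)
    mean_spacing = sum(math.dist(points[i], points[i - 1])
                       for i in range(n)) / n
    new_points = list(points)
    for i in range(n):
        prev_pt = new_points[i - 1]
        next_pt = new_points[(i + 1) % n]
        candidates = [(points[i][0] + dx, points[i][1] + dy)
                      for dx, dy in OFFSETS]
        candidates = [(x, y) for x, y in candidates
                      if 0 <= x < width and 0 <= y < height]
        if not candidates:
            continue
        # Continuity: deviation from the average spacing between points.
        e_cont = _normalize([abs(mean_spacing - math.dist(c, prev_pt))
                             for c in candidates])
        # Curvature: squared second difference along the polyline.
        e_curv = _normalize([(prev_pt[0] - 2 * x + next_pt[0]) ** 2 +
                             (prev_pt[1] - 2 * y + next_pt[1]) ** 2
                             for x, y in candidates])
        # Image: strong gradient means low energy.
        e_img = _normalize([-grad[y][x] for x, y in candidates])
        total = [alpha * a + beta * b + gamma * c
                 for a, b, c in zip(e_cont, e_curv, e_img)]
        new_points[i] = candidates[total.index(min(total))]
    return new_points
```

In practice the step would be repeated until few points move; the default arguments correspond to the equal weights assumed in the text.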
Figure 5. Examples of snakes: the squares represent the points computed by the Hough transform (initial border) and the triangles represent the contour after applying snakes (actual border).

The following shape features are computed for every pollen grain:

- Perimeter (P). Number of pixels on the pollen boundary.
- Roundness (R). Relation between the squared perimeter and the area.
- Centroid (CT). Center of gravity.
- Mean distance to CT (Dmean). Average distance between CT and the pollen boundary.
- Maximum distance to CT (Dmax). Largest distance between CT and the pollen boundary.
- Minimum distance to CT (Dmin). Smallest distance between CT and the pollen boundary.
- Rate of change (R1). Maximum distance to CT divided by minimum distance to CT: Dmax/Dmin.
- Rate of change (R2). Maximum distance to CT divided by mean distance to CT: Dmax/Dmean.
- Rate of change (R3). Minimum distance to CT divided by mean distance to CT: Dmin/Dmean.
- Diameter (DM). Largest distance between two points on the pollen boundary.
- Holes (H). Sum of the differences between Dmax and the distance between CT and the pollen boundary.
- Boundary roughness (BR). The border points are fitted to a normal distribution with parameters height, center, and width; BR is the ratio height/width.
- Radius dispersion (RD). Standard deviation of the distances between CT and the pollen boundary.
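Most of the distance-based features in this list can be computed directly from the boundary points. The sketch below is an illustration under stated assumptions, not the code used in this work: the centroid is taken as the mean of the boundary points, RD uses the population standard deviation, roundness and boundary roughness are left out because they need the area and a curve fit, and the function name `shape_features` is hypothetical.

```python
import math


def shape_features(boundary):
    """Distance-based shape features of a closed boundary.

    `boundary` is a list of (x, y) points. The boundary is assumed not to
    pass through its own centroid, otherwise Dmin is zero and R1 is undefined.
    """
    n = len(boundary)
    cx = sum(x for x, _ in boundary) / n
    cy = sum(y for _, y in boundary) / n
    dists = [math.hypot(x - cx, y - cy) for x, y in boundary]
    d_mean = sum(dists) / n
    d_max, d_min = max(dists), min(dists)
    return {
        "P": n,                                   # perimeter in pixels
        "CT": (cx, cy),                           # centroid
        "Dmean": d_mean,
        "Dmax": d_max,
        "Dmin": d_min,
        "R1": d_max / d_min,
        "R2": d_max / d_mean,
        "R3": d_min / d_mean,
        "DM": max(math.dist(p, q) for p in boundary for q in boundary),
        "H": sum(d_max - d for d in dists),       # holes
        "RD": math.sqrt(sum((d - d_mean) ** 2 for d in dists) / n),
    }
```

For a perfectly circular boundary R1, R2 and R3 equal 1 and H and RD equal 0, so departures from these values measure how far a grain is from a circle.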
The shape features listed above are measured once the shapes of the pollen grains have been recognized. Human experts use morphological information to distinguish the different types of pollen, so the selection of the features takes into account the morphological characteristics employed by expert palynologists and in other works [17].

Classification is based on the Mahalanobis distance between the feature vector $\mathbf{x}$ of a pollen grain and each class $j$:

$$d_j(\mathbf{x}) = (\mathbf{x} - \mathbf{m}_j)^T \, \Sigma^{-1} \, (\mathbf{x} - \mathbf{m}_j) \qquad (3)$$

where $\Sigma$ is the covariance matrix of the training set and $\mathbf{m}_j$ is the mean class prototype of class $j$. The mean class prototype is the mean vector of each class in the training set. We assume the same covariance matrix for all classes. As the number of microscopy preparations is still limited, the training set is constructed using N-1 images and the test is carried out on the excluded image (i.e. the leave-one-image-out approach). If this image is correctly classified, a hit is counted. The process is repeated N times, each time excluding a different image of the dataset.

A total of 27 images of Urticaceae species (9 Parietaria judaica, 6 Urtica urens and 10 Urtica membranacea) have been used in this study. The pollen was collected in different places of Galicia (NW Spain). The number of pollen grains per image ranges from 2 to 16. The total number of pollen grains of each species is: 51 Parietaria judaica, 18 Urtica urens and 116 Urtica membranacea. Since the Hough transform is very time consuming, it is applied to a sub-sampled version of the image, reduced in size by a factor of four.
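The distance of equation (3) and the leave-one-image-out evaluation can be sketched in a few lines. The code below is only an illustration under assumptions that go beyond the text: a grain is assigned to the class with the smallest distance, hits are counted per pollen grain, the covariance matrix is assumed to be non-singular, and all function names and the `(image_id, label, features)` sample layout are hypothetical.

```python
def mean_vector(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def covariance(vectors):
    """Covariance matrix of the whole training set (shared by all classes)."""
    n = len(vectors)
    mu = mean_vector(vectors)
    dim = len(mu)
    return [[sum((v[i] - mu[i]) * (v[j] - mu[j]) for v in vectors) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x


def mahalanobis(x, prototype, cov):
    """(x - m)^T cov^{-1} (x - m), computed without forming the inverse."""
    diff = [xi - mi for xi, mi in zip(x, prototype)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))


def classify(x, prototypes, cov):
    return min(prototypes, key=lambda lab: mahalanobis(x, prototypes[lab], cov))


def leave_one_image_out(samples):
    """Fraction of grains classified correctly; samples are
    (image_id, label, feature_vector) triples."""
    hits = total = 0
    for held_out in sorted({image for image, _, _ in samples}):
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        cov = covariance([f for _, _, f in train])
        prototypes = {lab: mean_vector([f for _, l, f in train if l == lab])
                      for lab in {l for _, l, _ in train}}
        for _, label, features in test:
            hits += classify(features, prototypes, cov) == label
            total += 1
    return hits / total
```

Holding out whole images rather than single grains keeps grains from the same preparation out of the training set, which is the point of the leave-one-image-out scheme.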
After detection, the coordinates of the detected pollen grains are mapped back to the original images. The results are presented and discussed in the following.

The performance of detecting pollen positions may be influenced by the different decisions taken during the processing. The effect of varying the parameters is assessed by Free Response Operating Characteristic (FROC) curves [19], which plot the percentage of correctly detected pollen grains as a function of the number of false positive detections per image (false pollen grains). True and false hits are counted visually by overlaying the circles detected by the Hough transform on the original image, as can be seen in Figure 4. The best performance is the highest number of correct detections with the lowest number of false positive detections. The final results are shown in Figure 6. Regarding the binary edge image, the FROC curves of Figure 6A show that the best score is obtained by applying a median filter of 3x3 pixels to attenuate the noise in the original image. Larger filter masks seem to destroy significant edges in the images. Regarding the range of radii, the FROC curves of Figure 6B show that the selection of the range is important to achieve acceptable true positive rates with low false positive rates per image. The best range tested is 13 to 25 pixels. For pollen detection, the system sensitivity is 82% with almost no false positives per image. The FROC curves of Figure 6C present the behavior for every species with the best parameters established (median filter of 3x3 pixels and range of radii of 13 to 25 pixels). It can be observed that the system performance is more or less uniform across species.

Figure 6. Comparison of pollen detection in images of microscopy preparations using FROC curves (see text for explanation).

The input to the classification stage is the set of isolated pollen grains: 45 Parietaria judaica, 13 Urtica urens and 112 Urtica membranacea. Using all the shape features described in section 4.2, the percentage of correct pollen grain classification is 76.04%. Although this is higher than random classification (33.33%), it is rather low for an acceptable pollen discrimination system. The reason might be that some of the shape features have poor classification capability or, even worse, distort the entire classification. Hence, an important issue is how to select features that discriminate well between the pollen species. We use a modified Floating Search Method (FSM) [20], taking as the criterion the rate of correct classification based on the Mahalanobis distance with the leave-one-image-out approach. Because FSM as discussed in [20] can be trapped in cycles, we additionally stop the search after a number of steps equal to three times the number of features. The percentage of correct classification using this search is 85.62%. The best subset includes the following features: mean distance to centroid Dmean, rate of change R1 (Dmax/Dmin), rate of change R2 (Dmax/Dmean), maximum distance to centroid Dmax, and holes H. The highest rates of correct classification are reached for Urtica membranacea, with 95%, and for Parietaria judaica, with almost 89%, but the rate drops for Urtica urens.
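The feature selection step can be illustrated with a plain sequential floating forward search in the spirit of [20]. This sketch is not the modified method used here: the only element taken from the text is the cap of three times the number of features on the number of steps, and `criterion` is a hypothetical callable standing in for the leave-one-image-out classification rate of a feature subset.

```python
def floating_search(n_features, criterion, max_steps=None):
    """Return (best_score, best_subset) found by a floating forward search.

    `criterion(subset)` must return a score to maximize for a list of
    feature indices; at least one feature is assumed.
    """
    if max_steps is None:
        max_steps = 3 * n_features  # guard against cycling
    current = []
    best = {}  # subset size -> (score, subset)
    steps = 0
    while len(current) < n_features and steps < max_steps:
        # Inclusion: add the feature that helps the current subset most.
        remaining = [f for f in range(n_features) if f not in current]
        added = max(remaining, key=lambda f: criterion(current + [f]))
        current = current + [added]
        score = criterion(current)
        steps += 1
        size = len(current)
        if size not in best or score > best[size][0]:
            best[size] = (score, list(current))
        # Conditional exclusion: drop features while that beats the best
        # subset of the smaller size seen so far.
        while len(current) > 2 and steps < max_steps:
            removed = max(current, key=lambda f: criterion(
                [g for g in current if g != f]))
            reduced = [g for g in current if g != removed]
            reduced_score = criterion(reduced)
            if reduced_score > best[len(reduced)][0]:
                current = reduced
                best[len(reduced)] = (reduced_score, list(reduced))
                steps += 1
            else:
                break
    return max(best.values(), key=lambda item: item[0])


def toy_criterion(subset):
    """Toy score: features 1 and 3 are useful, every feature has a cost."""
    return 10 * len(set(subset) & {1, 3}) - len(subset)


best_score, best_subset = floating_search(5, toy_criterion)
```

With a real criterion each call retrains and tests the classifier, so caching the scores of already evaluated subsets would be a natural refinement.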
A complete system to automatically discriminate and count Urticaceae species (Urtica membranacea, Urtica urens, and Parietaria judaica) has been described. A Hough transform technique is applied to microscopy preparations of pollen to obtain a coarse estimate of the border of the pollen grains. The pollen boundary is then refined using snakes, and the pollen species are discriminated using shape features. The system achieves 85.6% correct classification. The set of shape parameters that offers the best discrimination is: mean distance to centroid Dmean, rate of change R1 (Dmax/Dmin), rate of change R2 (Dmax/Dmean), maximum distance to centroid Dmax, and holes H. These initial results are very encouraging and suggest that this method can be innovative in pollen analysis, with important repercussions for the treatment of allergies in clinical practice. On the one hand, further evaluation is still needed, such as comparing the performance of the system with that achieved by a group of experts; a more consistent evaluation also requires a larger number of samples. On the other hand, the performance may be improved by incorporating new features and by studying the internal border of the pollen grains and their internal structures.
[1] Negrini, A.C.; Ariano, R.; Delbono, G.; Ebbli, A.; Quaglia, A. and Arobba, D. Incidence of sensitization to the pollens of Urticaceae (Parietaria), Poaceae and Oleaceae (Olea europea) and pollen rain in Liguria (Italy). Aerobiologia, 8: 355-358. 1992.
[2] Casas, C., J. Márquez, M. Suárez-Cervera, J. Seoane-Camba. Immunocytochemical localization of allergenic proteins in Parietaria judaica L. (Urticacae) pollen grains. European J. Cell. Biol. 70: 179-188. 1986.
[3] Corbi A.L., Corte, J., J. Bousquet, A. Basomba, A. Cistero, J. Garcia-Selles, G. D'Amato, J. Carreira. Allergenic cross-reactivity pollen of Urticaceae. Int. Arch. Allergy Appl. Immunol. 77: 377-383. 1985.
[4] Ayuso R.; Carreira, J. and Polo, F. Quantification of the major allergen of several Parietaria pollens by an anti-Par 1 monoclonal antibody-based ELISA. Analysis of crossreac among purified Par J 1, Par o 1 and Par m 1 allergens. Clinical Experimental Allergy, 25(10): 993-888. 1995.
[5] Saa-Otero, M.P., Suárez-Cervera M. and Graca, V.R. Atlas de Polen de Galicia I. Diputación de Ourense. 358 pp. 1986.
[6] Rodríguez Damián M., Cernadas Eva, Formella A., De Sá-Otero M. Pilar. 2002. Pollen classification of three types of plants of the family Urticaceae. 12th Portuguese conference on pattern recognition. Aveiro, Portugal 27-26 june. ISBN 972-789-067-9.
[7] Soo-Chang Pei, Ji-Hwei Horng. Circular arc detection based on Hough transform. Pattern Recognition Lett. 16 (1995), 615-625.
[8] Davies E.R. Machine Vision: Theory, Algorithms, Practicalities. Academic Press Ltd, 24/28 Oval Road, London NW1 7DX, United Kingdom, 1990.
[9] Petrou Maria, Bosdogianni Panagiota. Image Processing. John Wiley & Sons. Great Britain. 2000.
[10] Ritter G.X. and Wilson J.N. Handbook of Computer Vision in Image Algebra. CRC Press, 1996.
[11] Otsu N. 1979. A threshold selection method from gray-level histograms. IEEE Trans. on Systems, Man and Cybernetics, vol. 9, no. 1.
[12] Serra Jean. Mathematical Morphology, Vol. 1. Academic Press. 1982.
[13] Seul Michael, O'Gorman Lawrence, Sammon Michael J. Practical Algorithms for Image Analysis. Cambridge University Press, 2000.
[14] Pavlidis Theo. 1982. Algorithms for Graphics and Image Processing. Computer Science Press.
[15] Grimson W. Eric L. and Huttenlocher Daniel P. On the sensitivity of the Hough transform for object recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, no. 3, pp. 255-274, Mar 1990.
[16] Williams D.J. and Shah M. A Fast Algorithm for Active Contour and Curvature Estimation. CVGIP: Image Understanding, 55 (1), pp. 14-26. January, 1992.
[17] da Fontoura Costa Luciano, Marcondes Cesar Roberto. Shape Analysis and Classification: Theory and Practice. CRC Press, 2001.
[18] Duda R.O., Hart P.E. and Stork D.G. Pattern Classification. John Wiley & Sons, 2001.
[19] MacMillan N.A., Creelman C.D. Detection Theory: A User's Guide. Cambridge University Press, Cambridge. 1991.
[20] Pudil P., Novovicova J. and Kittler J. 1994. Floating search methods in feature selection. Pattern Recognition Lett. 15: 119-125.