Analysis of Automatic Image Classification Methods For Urticaceae Pollen

Neurocomputing 522 (2023) 181–193
https://doi.org/10.1016/j.neucom.2022.11.042
Article history:
Received 16 February 2022
Revised 22 September 2022
Accepted 15 November 2022
Available online 26 November 2022

Keywords: Image classification; Deep learning; Machine learning; Hierarchical strategy; Pollen grains

Abstract

Pollen classification is considered an important task in palynology. In the Netherlands, two genera of the Urticaceae family, named Parietaria and Urtica, have high morphological similarities but induce allergy at very different levels. Therefore, the distinction between these two genera is very important. Within this group, the pollen of Urtica membranacea is the only species that can be recognized easily under the microscope. For the research presented in this study, we built a dataset of 6472 pollen images and our aim was to find the best possible classifier on this dataset by analysing different classification methods, both machine learning and deep learning-based. For the machine learning-based methods, we measured both texture and moment features based on images of the pollen grains. Various feature selection techniques and classifiers, as well as a hierarchical strategy, were implemented for pollen classification. For the deep learning-based methods, we compared the performance of six popular Convolutional Neural Networks: AlexNet, VGG16, VGG19, MobileNet V1, MobileNet V2 and ResNet50. Results show that, compared with flat classification models, a hierarchical strategy yielded the highest accuracy of 94.5% among the machine learning-based methods. Among the deep learning-based methods, ResNet50 achieved an accuracy of 99.4%, slightly outperforming the other neural networks investigated. In addition, we investigated the influence on performance of reducing the size of the image dataset to 1000 and 500 images, respectively. Results demonstrated that on smaller datasets, ResNet50 still achieved the best classification performance. An ablation study was implemented to help understand why the deep learning-based methods outperformed the other models investigated. Using Urticaceae pollen as an example, our research provides a strategy for selecting a classification model for pollen datasets with highly similar pollen grains to support palynologists, and could potentially be applied to other image classification tasks.

© 2022 Leiden Institute of Advanced Computer Science, Leiden University. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
learning-based methods [3][8] and deep learning-based methods [5][9][10][11][12].

Machine learning methods need to be fed with manually selected features before they can extract these from images. The so-called handcrafted features used in machine learning techniques are mostly based on shape, texture and other related properties of pollen grain images. The extracted features play an important role in the performance of classification. In addition, suitable feature selection methods and classifiers are also crucial for machine learning-based classification methods.

In the work of del Pozo-Baños et al. [13], a combination of geometrical and texture characteristics was proposed as the discriminative features for a 17-class pollen dataset. Incorporation of Linear Discriminant Analysis (LDA) and Least Squares Support Vector Machines (LS-SVM) accomplished the best performance of 94.92% accuracy. Marcos et al. [14] extracted four texture features, including Gray-Level Co-occurrence Matrices (GLCM), log-Gabor filters (LGF), Local Binary Patterns (LBP) and Discrete Tchebichef Moments (DTM), from a pollen image dataset with 15 classes. Fisher's Discriminant Analysis (FDA) and K-Nearest Neighbour (KNN) were subsequently applied to perform dimensionality reduction and multivariate classification, yielding an accuracy of 95%. Manikis et al. [8] used texture features obtained by GLCM and seven geometrical features computed from the binary mask of a pollen image dataset. A Random Forest (RF) classifier was used in the classification stage; with this classifier 88.24% accuracy was achieved on 6 pollen classes. Machine learning methods thus show highly varying results, seemingly dependent on the dataset used.

Instead of relying on manually designed features, deep learning methods automatically extract image features through the convolutional layers of the network. In recent years, many state-of-the-art Convolutional Neural Networks (CNNs) have been applied to pollen classification tasks. In the work of Sevillano et al. [5], a pretrained AlexNet was used to classify a dataset with 46 different classes of pollen grains. By incorporating data augmentation and cross-validation techniques, an accuracy of 98% was achieved. In the work presented by Battiato et al. [4], both AlexNet and SmallerVGGNet were implemented to classify five classes of pollen grains, with 13,000 images. The two networks obtained a performance of 89.63% and 89.73% accuracy, respectively. A seven-layer deep Convolutional Neural Network designed by Daood et al. [9] was trained on a dataset of 30 pollen classes and accomplished a 94% correct classification rate. Astolfi et al. [15] analysed a pollen dataset composed of 73 pollen categories. They compared the performance of eight state-of-the-art CNNs: Inception-V3, VGG16, VGG19, ResNet-50, NASNet, Xception, DenseNet-201 and Inception-ResNet-V2. They showed that DenseNet-201 and ResNet-50 achieved superior performance against the other CNNs, with accuracies of 95.7% and 94.0%, respectively.

Based on the analysis of the related work mentioned above, both machine learning and deep learning-based methods have achieved comparable performance on pollen datasets. However, the pollen datasets used in these studies are derived from species or genera from different plant families [16]. The morphology of each class of pollen is already clearly distinctive under the microscope for human analysts. For example, the public POLEN23E dataset [3] consists of 23 pollen classes from the Brazilian Savannah, derived from 23 genera in 15 families. Each class of pollen has a different shape, size and texture. The other public pollen dataset from the Brazilian Savannah, called POLLEN73S, which was analysed by Astolfi et al. [15], has 73 pollen classes with clearly variable colour, shape and other morphological differences. These distinct features ensured the high performance of the classification models applied. In this research, however, we are more interested in distinguishing genera of the same family Urticaceae, namely Parietaria and Urtica, which are morphologically very similar but cause completely different allergy levels. Pollen of the two genera cannot currently be distinguished easily by a palynologist; the species Urtica membranacea represents the only species that can be specifically distinguished.

Parietaria and Urtica are two genera commonly encountered in the Netherlands. The occurrence of Parietaria plants is strongly increasing, and Parietaria can induce severe allergy in hay fever patients while Urtica does not [12]. Species from the genus Parietaria, as well as Urtica membranacea, originate from the Mediterranean area and are now increasing in Northern Europe. Due to climate change these species can maintain themselves in northern countries such as the Netherlands. The pollen grains from these taxa exhibit a similar roundness and are all very small, but differ in the following features: 1) number of pores: Parietaria and Urtica have 3 to 4 pores, while this is variable for Urtica membranacea (usually 5 to 10, i.e. pantoporate); 2) the average size of Parietaria pollen is slightly smaller (11–18 µm) and it has a coarser and more irregular surface than Urtica. Urtica pollen are bigger on average (15.2–21.1 µm) and often have a more pronounced thickened exine around the pore (annulus). The shape of Urtica membranacea is slightly angular, and it is easily distinguished because of its small size (10–12 µm) and high number of pores. Although these pollen grains have the aforementioned differences, it is not possible for experts to distinguish the three different classes by the naked eye using a light microscope, mainly because of their small size. Therefore, in order to improve the accuracy and efficiency of Urticaceae pollen classification, automatic algorithms are required.

Currently, very few studies have focused on pollen classification within the Urticaceae family. Rodríguez-Damián et al. [16] extracted both geometrical and texture features and probed three classifiers: Support Vector Machines (SVM), Multi-Layer Perceptron (MLP) and Minimum Distance Classifier (MDC). The best performance of an 88% success rate was reached on a total of 291 pollen images of the three species Parietaria judaica, Urtica urens and Urtica membranacea. Compared with their relatively small Urticaceae dataset, we aimed to analyse a much larger dataset that includes all species (Parietaria judaica, Parietaria officinalis, Urtica dioica, Urtica urens and Urtica membranacea) present in the Netherlands. We grouped these five species into 3 classes: Parietaria (Parietaria judaica, Parietaria officinalis), Urtica (Urtica urens, Urtica dioica) and Urtica membranacea. Both Parietaria and Urtica dominate in the Netherlands but cause totally different allergy levels. Urtica membranacea is an exotic Mediterranean species and it is the only species that can be easily distinguished. Hence our choice of three labels; our study is thus based on a three-class classification task. The best performance achieved in our study is 99.4% by a ResNet50. It is also possible to perform a classification task over all five species (see Supplementary Table 1). Another challenge is that the pollen grains that we used were unacetolyzed. Acetolyzed pollen grains are those in which all pollen material is destroyed by acetolysis, with the exception of sporopollenin, which forms the outer pollen wall, the exine. In contrast to acetolyzed pollen grains, unacetolyzed pollen keep their original organic features, which are less apparent. To the best of our knowledge, our previous work [12] was the first and only time that CNNs were applied and compared for the analysis of unacetolyzed Urticaceae pollen grains. In this study, we extended this work further and aimed to find an automatic classification model with the best performance among both machine learning-based and deep learning-based methods for our unacetolyzed Urticaceae dataset. In general, a deep learning model requires a large dataset as input. However, in practice there are many limitations for researchers in collecting a sufficiently large dataset. Subsequently, we were curious about how machine learning-based and deep learning-based methods work
on a smaller-sized image dataset. Therefore, two additional experiments on smaller datasets were designed to compare the performance of different classification models. For a 1000-image dataset, a ResNet50 yielded the best performance of 96.3%, while for a 500-image dataset it achieved the best accuracy of 93.3%.

2. Methods

2.1. Sample and image preparation

2.1.1. Sample preparation of the pollen grains

Our pollen data included both fresh pollen specimens and dry pollen specimens [12]. Fresh pollen specimens were collected by an experienced biologist in the surroundings of Leiden and The Hague (the Netherlands) during the flowering seasons of 2018 and 2019. Dry pollen specimens were collected from the herbarium of Naturalis Biodiversity Center, Leiden, the Netherlands, using identification keys and descriptions. For each species in our dataset, pollen samples from 4 to 8 plants were taken, from different geographical locations, in order to cover as much variation as possible (see Supplementary Table 2). Microscope slides were freshly prepared by aerobiological experts from Naturalis Biodiversity Center. The thecae of open flowers were carefully opened on a microscope slide using tweezers. Non-pollen materials were manually removed. The pollen grains were mounted using a glycerin:water:gelatin (7:6:1) solution with 2% phenol and stained with Safranin (0.002% w/v). Cover slips were sealed with paraffin. Each slide contained only one plant of each species of Urticaceae.
2.1.2. Image capturing and pre-processing

The slide area rich in pollen was scanned automatically using a Zeiss Observer Z1 microscope with a Plan Apochromat 100× objective (NA 1.4), equipped with a Hamamatsu C9100 EM-CCD camera. As pollen grains are three-dimensional, it is difficult to set a single focal plane for pollen samples. Therefore, we captured 20 image slices along the Z axis for the pollen grains, with a step size of 1.8 µm. After obtaining a stack of images including pollen, the grains were detected and cropped; this is referred to as the 3D pollen stack. Fig. 1(a) shows an example of a slice from the raw image. Fig. 1(b) shows all 20 slices of different focal depths of an individual pollen grain. In total, 6472 individual pollen stack images were captured. Three categories were included for the image classification study: (1) Parietaria (including Parietaria judaica, Parietaria officinalis), (2) Urtica (Urtica dioica, Urtica urens), and (3) Urtica membranacea (see Fig. 2).

As shown in Fig. 1(b), not all of the 20 slices in the Z-stack were in focus. In order to obtain as many informative features as possible, all Z-stack images were further processed using a Z-stack projection method [17]. Z-stack projection is a method of analysing and highlighting specific features from all slices in a stacked image without incorporating out-of-focus blurriness. The selected projections were Standard Deviation (STD), Minimum Intensity (MIN), and Extended Focus (EXT) [18], which are shown in Fig. 1(c) and Fig. 2. The three projections per pollen grain were treated as three separate channel images for the input of the supervised classification models. Classification models require same-sized input images, which is usually achieved by resizing all images to the same size. However, for pollen images captured by a microscope, the morphology and details of the pollen grains are expected to be changed by resizing. We did not opt for resizing, as the resized images might ignore the original size differences of the pollen grains, which are a potentially important diagnostic feature. There are several ways to preserve this nature of the features. One can crop images of the pollen grains to the same size from a slide-scan image. However, some pollen grains are located very close to each other, so that cropping to the same size might cause incomplete pollen separation. Therefore, we chose another approach, which is to pad the cropped images so that the resulting images all have the same size. For the padding size, the biggest size among all individual pollen images was selected, i.e. 276 × 276 pixels. The padding value was set to the median value at the edge of each pollen image in order to make the content of the padded images more natural.

After the pre-processing of the images, we aimed to find the best classification model for our Urticaceae pollen dataset. Machine learning and deep learning-based classification models were constructed and the performance of each model was evaluated and compared.
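To make this pre-processing concrete, the following is a minimal sketch of the three projections and the median-edge padding, assuming a Z-stack stored as a NumPy array. The Extended Focus step is only approximated here with a per-pixel Laplacian focus measure; the paper uses the method of [18].

```python
import numpy as np
from scipy.ndimage import laplace

def project_stack(stack):
    """stack: (20, h, w) array of gray-scale slices -> (STD, MIN, EXT)."""
    std_proj = stack.std(axis=0)                  # Standard Deviation projection
    min_proj = stack.min(axis=0)                  # Minimum Intensity projection
    # EXT approximation: per pixel, keep the value from the sharpest slice,
    # using the absolute Laplacian as the focus measure.
    sharpness = np.abs(np.stack([laplace(s.astype(float)) for s in stack]))
    best = sharpness.argmax(axis=0)[None]         # (1, h, w) slice indices
    ext_proj = np.take_along_axis(stack, best, axis=0)[0]
    return std_proj, min_proj, ext_proj

def pad_to(img, size=276):
    """Pad img to size x size with the median intensity of its border."""
    border = np.concatenate([img[0], img[-1], img[:, 0], img[:, -1]])
    dh, dw = size - img.shape[0], size - img.shape[1]
    return np.pad(img, ((dh // 2, dh - dh // 2), (dw // 2, dw - dw // 2)),
                  constant_values=float(np.median(border)))
```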
2.2. Machine learning methods

2.2.1. Feature extraction and selection

Machine learning methods require manual selection of relevant features before extracting these from images. One challenge is how to select an appropriate set of features for classification. By observing the characteristics of Urticaceae pollen grains, we noticed that Parietaria has a coarser ornamentation on the surface of its pollen grains, Urtica has thickened pores and Urtica membranacea has an angular outline. Texture attributes of the surface, together with shape features, were therefore considered the appropriate pollen descriptors for Urticaceae pollen grains. We aimed to include as many representative features as possible for Urticaceae pollen classification, given the high morphological similarities. The following selected features have been proven successful in classification tasks for pollen recognition: GLCM, LBP, Gabor filter texture features and Histogram of Oriented Gradients (HOG). These features have provided satisfactory results, as reported in [4][14]. Both First Order Statistics (FOS), which are derived from statistical properties of the intensity histogram of an image, and Wavelet measurements, which provide a texture analysis based on the Discrete Wavelet Transform (DWT), have been included as they have been successfully used in pattern recognition of cells [19][20]. In addition, the seven Hu invariant moments and three shape measures derived from the invariants, referred to as Extension, Dispersion and Elongation (EDE), were included as invariant descriptors for shape [21]. So, based on the aforementioned image-based studies [22][23], we have selected six texture features and two moment-based features to represent the characteristics of the pollen grains in our study. Table 1 shows the selected features with the dimensions of each feature vector.

HOG features in combination with an SVM classifier have proven to be a representative texture descriptor in the image recognition field [24]. In the procedure of HOG feature extraction, we divided an image into several small connected regions, a.k.a. cells. Each cell returns a 9 × 1 feature vector. In order to be more invariant to changes in shadowing and illumination, a larger region, referred to as the block, is formed. The block consists of four cells and returns a 36 × 1 feature vector. In the experiment, a pollen image of size 276 × 276 can be divided into 100 blocks; consequently, a 3600 × 1 feature vector is returned at the end.

LBP is an invariant descriptor that can be used for texture classification. An n-digit binary number is obtained by comparing each pixel with its n neighbour pixels on a circle with radius r, and is used to compute a histogram. In our study, we fine-tuned the parameters and set them to n = 24, r = 3. Similar to the HOG feature extraction procedure, the image was also divided into 16 smaller blocks. In this manner a 416 × 1 LBP feature vector is returned.
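A minimal sketch of these two descriptors is given below, using scikit-image. The cell and block layout is our assumption, chosen so that the output sizes approximate the 3600 × 1 HOG vector and reproduce the 416 × 1 LBP vector (16 blocks × 26 uniform-pattern bins) described above.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern

def hog_descriptor(img):
    """HOG over a 276 x 276 image; 9 orientations, 2 x 2 cells per block."""
    return hog(img, orientations=9, pixels_per_cell=(27, 27),  # ~10 x 10 cells
               cells_per_block=(2, 2), block_norm='L2-Hys')

def lbp_descriptor(img, n=24, r=3, grid=4):
    """Uniform LBP (n = 24, r = 3), histogrammed over a grid x grid layout."""
    codes = local_binary_pattern(img, P=n, R=r, method='uniform')
    h, w = codes.shape
    feats = []
    for i in range(grid):                          # 4 x 4 = 16 blocks
        for j in range(grid):
            block = codes[i * h // grid:(i + 1) * h // grid,
                          j * w // grid:(j + 1) * w // grid]
            hist, _ = np.histogram(block, bins=n + 2, range=(0, n + 2),
                                   density=True)   # 26 bins per block
            feats.append(hist)
    return np.concatenate(feats)                   # 16 * 26 = 416 values
```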
Fig. 1. The workflow of pollen image acquisition. (a) One plane of a raw pollen image. (b) 20 slices at different focal depths of gray-scale images of one individual pollen grain. (c) 3 different projections along the Z axis; STD = Standard Deviation Projection, MIN = Minimum Intensity Projection, EXT = Extended Focus Projection. (d) The padded images of each projection.
GLCM characterizes the texture of images by considering the spatial relationship of pairs of pixels in an image. The GLCM is created based on a statistical rule P(i, j; d, θ), which refers to the number of times that gray-level j occurs at a distance d and in a direction θ from gray-level i. Our experiment set d = 1 and the directions θ = 0°, 45°, 90°, 135°. We further calculated the properties of the matrix defined by Haralick et al. [25] as the extracted feature vector. Finally, a 24 × 1 feature vector was returned.
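A sketch of this computation with scikit-image is shown below; the six graycoprops properties are our assumption for the Haralick-derived measures (6 properties × 4 angles = 24 values, matching the 24 × 1 vector in Table 1).

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_descriptor(img):
    """img: 2-D uint8 array -> 24-value GLCM vector (6 props x 4 angles)."""
    glcm = graycomatrix(img, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ('contrast', 'dissimilarity', 'homogeneity',
             'energy', 'correlation', 'ASM')
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])
```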
A 5 × 1 texture feature vector of FOS, comprising standard deviation of intensity, smoothness, skewness, uniformity and entropy, was calculated [20]. These measures reflect the statistical properties of the intensity histogram of each pollen image. Wavelet-based texture measurements show the image details in different directions after the DWT. We calculated the mean, standard deviation and entropy of intensity in three directions (horizontal, vertical, diagonal) of each pollen image. The dimension of the wavelet measurement is 9 × 1.

Another commonly used texture descriptor is the Gabor filter: it reflects the frequency content in a specific direction of a localized region of the image. In this study, 12 Gabor filters are designed at four directions 0°, 45°, 90°, 135° with three different frequencies π/4, π/2, 3π/4. Therefore, the dimension of the Gabor filter feature vector is 60 × 1.

Hu moments are normally extracted from pollen images for their scale and rotation invariance. A total of seven moment invariants as proposed by Hu [26] were extracted. EDE features are derived from the 1st and 2nd order invariants. Even though the morphological differences between the pollen of the genera are subtle, it was expected that these image moment features could play a role in the pollen classification task.

Fig. 2. A sample pollen grain of each category in our dataset. (a) Parietaria judaica. (b) Urtica urens. (c) Urtica membranacea. Each column represents the STD, MIN, EXT projections of each pollen grain, respectively.

Table 1
The dimension of the feature vector of each feature.

Feature        Dimension
HOG            3600 × 3
LBP            416 × 3
Gabor filter   60 × 3
GLCM           24 × 3
FOS            5 × 3
Wavelet        9 × 3
Hu moments     7 × 3
EDE            3 × 3
Total          4124 × 3

Each pollen image in our dataset consists of 3 projections (STD, MIN, and EXT) obtained by projecting the 20-slice Z-stack images. Fig. 2 shows that the features of each projection are different, especially the texture features. In order to include as much information of the pollen grain dataset as possible, we calculated the 8 features for the 3 projections of each pollen image and concatenated these together as the final feature vector (cf. Table 1). Therefore, after feature extraction, the dimension of the feature vector reaches 4124 × 3. Compared with public pollen image datasets like POLEN23E and POLLEN73S [3][15] and the 2D Urticaceae pollen images used in [16], our dataset, based on a method of projection of 3D images, might intrinsically provide more representative features. This partially underlies the high performance results that we have achieved.
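The final descriptor can then be assembled as sketched below. The extractors not shown earlier (gabor_descriptor, fos_descriptor, wavelet_descriptor, hu_descriptor and ede_descriptor) are hypothetical stand-ins for the remaining rows of Table 1.

```python
import numpy as np

def pollen_descriptor(projections):
    """projections: dict with keys 'STD', 'MIN', 'EXT' -> (4124, 3) matrix."""
    extractors = (hog_descriptor, lbp_descriptor, gabor_descriptor,
                  glcm_descriptor, fos_descriptor, wavelet_descriptor,
                  hu_descriptor, ede_descriptor)      # the 8 features of Table 1
    columns = [np.concatenate([f(projections[p]) for f in extractors])
               for p in ('STD', 'MIN', 'EXT')]        # 4124 values per projection
    return np.column_stack(columns)
```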
validation means the ratio of training data to validation data is 8:2, while 10-fold cross-validation means the ratio is 9:1. We compared the performance of the deep learning models with 5-fold and 10-fold cross-validation, respectively. After 5/10-fold cross-validation, 5/10 models were obtained and tested on the test dataset. In this study, hard voting was adopted to calculate the final accuracy, rather than the average accuracy of the 5/10 models. Hard voting sums the votes for class labels from each model and predicts the class with the majority of votes. The experimental results show that the hard voting technique further improves the classification performance on the test dataset.
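A minimal sketch of this hard-voting step, assuming the per-model predictions are integer class labels:

```python
import numpy as np

def hard_vote(predictions):
    """predictions: (k_models, n_samples) int labels -> majority label per sample."""
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    votes = np.apply_along_axis(                     # per-sample vote counts
        lambda col: np.bincount(col, minlength=n_classes), 0, predictions)
    return votes.argmax(axis=0)                      # class with the most votes
```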
3. Experiment results and discussion

We derived results from two parts: a comparison of pollen classification performance with different machine learning algorithms, and an analysis of the performance of deep learning neural networks. Two additional experiments show how machine learning-based and deep learning-based methods work on smaller-size image datasets. Our results are based on performance measures.

3.1. Results with machine learning methods
Table 2
Performance comparison of different flat classification models. Standard deviation over the cross-validation subsets is given in brackets.

Classifier  Hyperparameters            Feature selection (threshold)  Cross-validation  Precision        Recall           F1 score         Accuracy
SVM         kernel='rbf', C = 4        PCA (0.8)                      10-fold           0.913 (±0.012)   0.913 (±0.012)   0.913 (±0.012)   0.913 (±0.012)
                                                                      5-fold            0.915 (±0.008)   0.915 (±0.008)   0.915 (±0.008)   0.915 (±0.008)
RF          Estimators = 500           SelectFromModel (mean)         10-fold           0.886 (±0.015)   0.886 (±0.015)   0.886 (±0.015)   0.886 (±0.015)
                                                                      5-fold            0.884 (±0.010)   0.884 (±0.010)   0.884 (±0.010)   0.884 (±0.010)
MLP         Solver='sgd',              PCA (0.85)                     10-fold           0.898 (±0.011)   0.898 (±0.011)   0.898 (±0.011)   0.898 (±0.011)
            Maxiter = 300                                             5-fold            0.890 (±0.008)   0.890 (±0.008)   0.890 (±0.008)   0.890 (±0.008)
Adaboost    Estimators = 500,          Mutual Information (2000)      10-fold           0.789 (±0.014)   0.754 (±0.021)   0.749 (±0.025)   0.754 (±0.021)
            LR = 0.5                                                  5-fold            0.784 (±0.015)   0.745 (±0.024)   0.743 (±0.028)   0.748 (±0.024)
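As an illustration, the best flat model of Table 2 could be assembled in scikit-learn as below. The feature scaling step is our assumption; the PCA threshold (80% of the explained variance) and the SVM hyperparameters follow the table.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# PCA keeping 80% of the explained variance, then an RBF-kernel SVM with C = 4.
flat_svm = make_pipeline(StandardScaler(),
                         PCA(n_components=0.8),
                         SVC(kernel='rbf', C=4))
# flat_svm.fit(X_train, y_train); accuracy = flat_svm.score(X_test, y_test)
```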
Table 3
Performance comparison of hierarchical classification models.
performance of the flat classification models, while Adaboost achieved the lowest performance. In Table 3 the best combination is SVM + SVM, which obtained an accuracy of 94.5% and a value of 0.941 for each of hP, hR and hF. The reason why hP, hR and hF are equal is that our simple hierarchical tree structure has only 3 layers, and each parent node has only 2 children. According to the definitions of hP and hR (cf. Eq. (5)), in this case the calculations of hP, hR and hF are equal. Based on our experiments, a hierarchical model which combines SVM + PCA at both levels was considered the best model among the machine learning-based methods.
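A sketch of such a two-level tree is given below. The split order (Urtica membranacea vs. the rest at level 1, Parietaria vs. Urtica at level 2) is our assumption about the tree layout, and `base` would be the PCA + SVM pipeline shown above.

```python
import numpy as np
from sklearn.base import clone

class TwoLevelClassifier:
    """Two-level hierarchy: root splits off one leaf class, a sub-model does the rest."""
    def __init__(self, base, leaf_label=2):          # e.g. 2 = Urtica membranacea
        self.leaf_label = leaf_label
        self.root, self.sub = clone(base), clone(base)

    def fit(self, X, y):
        self.root.fit(X, y == self.leaf_label)       # leaf class vs. the rest
        rest = y != self.leaf_label
        self.sub.fit(X[rest], y[rest])               # e.g. Parietaria vs. Urtica
        return self

    def predict(self, X):
        pred = np.full(len(X), self.leaf_label)
        rest = ~self.root.predict(X).astype(bool)    # samples routed to level 2
        if rest.any():
            pred[rest] = self.sub.predict(X[rest])
        return pred
```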
3.2. Results with deep learning methods

Our starting point has been to work with commonly available deep learning methods: AlexNet, VGG16, VGG19, ResNet50, MobileNet V1 and MobileNet V2. First of all, the total of 6472 pollen grain images was randomly divided into a training set and a test set in a ratio of 9:1. The test set was composed of images that were not seen by the model during the training process and that were used to test the trained classification model. Secondly, considering that deep learning models require a huge amount of data, a data augmentation technique was applied. Thirdly, similar to the machine learning models, 5-fold and 10-fold cross-validation were used to prevent overfitting and to increase the robustness as well as the generalization ability of the deep learning models. The data augmentation process was performed for each cross-validation set independently.

Based on these aforementioned procedures, we fine-tuned six representative deep learning classification models. The pretrained AlexNet was implemented in the PyTorch framework. The whole network, i.e. five convolutional layers and three fully connected layers, was fine-tuned with a learning rate of 0.001. We set the batch size to 128 and the number of batches to 4000. Fig. 4(a) shows the performance plot of AlexNet. The plot shows the accuracy and loss for both the training and validation dataset. The accuracy drastically increases in the first 700 batches and then converges gradually. AlexNet achieved an accuracy of 94.1% with a standard deviation of (±0.002) using 10-fold cross-validation, while 5-fold cross-validation retrieved a comparable accuracy of 92.4% (±0.002) (see Table 4). The average accuracy and standard deviation were calculated by training each model three times. The accompanying precision, recall and F1 score all reached the same value of 0.941. The reason why these three measurements are so similar is that, in this case, the number of False Positive (FP) samples is nearly equal to the number of False Negatives (FN). The consistency of these measurements shows the reliability of the model. Six positive samples and six negative samples among the three classes of pollen grains, as classified by AlexNet, are shown in Fig. 5(a) and (b). The actual label, predicted label and confidence score of each sample are indicated. Labels 1 to 3 represent the 3 classes of pollen: Parietaria, Urtica and Urtica membranacea. In Fig. 5(a), the positive samples clearly show the distinguishing properties. Urtica pollen has obviously thickened pores compared with pollen of Parietaria and Urtica membranacea. Pollen of Urtica membranacea has more angular outlines than pollen of the other two genera. Fig. 5(b) illustrates that, when the properties of the 3 classes of pollen are not clearly displayed, the network misclassifies these samples because of the high similarities among the 3 classes.
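A sketch of the AlexNet fine-tuning setup described above, in PyTorch (assuming torchvision ≥ 0.13 for the weights argument; the SGD optimizer and momentum value are our assumptions, the learning rate and batch size follow the text):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.alexnet(weights='IMAGENET1K_V1')      # ImageNet-pretrained AlexNet
model.classifier[6] = nn.Linear(4096, 3)             # new head for the 3 classes
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

def train_step(images, labels):                      # one batch of 128 images
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```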
Fig. 4. The performance plots of (a) AlexNet, (b) VGG16, (c) VGG19, (d) MobileNet V1, (e) MobileNet V2, and (f) ResNet50, in terms of training loss, etc., with respect to the
number of epochs.
Table 4
Classification performance of the different deep learning classification models. Standard deviation over three training runs of each model is given in brackets.
Fig. 5. Examples of classification performed by AlexNet. (a) Positive samples with their predicted label and confidence score. (b) Negative samples with their predicted label
and confidence score.
Similarly, the pretrained VGG16 and VGG19 models were fine-tuned with a batch size of 64 and the number of epochs set to 30. The whole network was fine-tuned using a learning rate of 2e-5 without freezing any layers. Fig. 4(b) and (c) show the performance plots of VGG16 and VGG19, respectively. Both plots show that the models converge well in the training process. Table 4 lists the detailed measurements of these two models. For 10-fold cross-validation, VGG16 obtained an average accuracy of 98.3% with a
standard deviation of (±0.001), while VGG19 achieved a comparable average accuracy of 98.6% with (±0.002).

The pretrained ResNet50, MobileNet V1 and MobileNet V2 models were constructed based on Keras as well, and all of these models were fine-tuned on our pollen dataset. Table 4 shows that the light-weight MobileNets achieved an accuracy comparable to the VGGNets, but with a slightly higher standard deviation. This means that light models are not as robust as heavy-weight models. ResNet50 obtained the highest performance of 99.4% with 10-fold cross-validation among the six models investigated, due to its deeper network layers and its residual structure. Consequently, ResNet50 was selected as the best performing model among all the models that we implemented. In addition, both 5-fold and 10-fold cross-validation achieved comparable performance for all of the deep learning models studied.
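For reference, a transfer-learning sketch of the ResNet50 setup in Keras; the classification head, optimizer and learning rate are our assumptions, while the ImageNet initialization and the 276 × 276 input follow the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

base = keras.applications.ResNet50(weights='imagenet', include_top=False,
                                   input_shape=(276, 276, 3), pooling='avg')
outputs = layers.Dense(3, activation='softmax')(base.output)  # 3 pollen classes
model = keras.Model(base.input, outputs)                      # all layers trainable
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(train_images, train_labels, validation_split=0.1, epochs=30)
```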
Figs. 6 and 7 show the positive and negative samples classified by VGG16 and ResNet50, respectively. Compared with AlexNet (Fig. 5), the confidence scores of the pollen classified by VGG16 were higher. The reason is that VGG16 has deeper layers, which results in the extraction of more detailed and distinct features of the pollen data. ResNet50 has a much deeper and more complex network structure, and its classification accuracy was higher than that of VGG16 (Table 4). For the positive samples in Fig. 7(a), the confidence score of ResNet50 was almost 1.00, which was higher than that of VGG16. In the test dataset, only three negative samples, shown in Fig. 7(b), were misclassified, owing to the high performance of the ResNet50 model.

After analysing automatic classification models based on both machine learning and deep learning methods on our pollen dataset, we observed that the ResNet50 neural network reached an accuracy of 99.4% (±0.002), which is 4.9% higher compared to the hierarchical machine learning model. Deep learning-based methods thus perform better at classifying our Urticaceae pollen grains. In addition to our pollen image dataset, we have applied the deep learning classifiers to other pollen image datasets available to us. These have not been used in training/testing, but are used as unseen samples to probe the classifiers from our study. The classification results with these additional datasets confirm the findings from our study. Early results with these extra datasets, based on VGG16, have already been reported in [12]. With our ResNet50 model, the results with unseen data are even better. In Supplementary Table 4 these results are summarized.

3.3. Results on smaller-size image datasets

It is common knowledge that the training process of a deep learning model requires a large dataset. However, in daily practice, there are limitations in the collection of sufficient samples and images. Therefore, we examined the robustness of both machine learning-based and deep learning-based methods when facing a smaller dataset. Are machine learning-based and deep learning-based methods comparable in performance? To answer this question, starting from the original data, two smaller pollen image datasets, consisting of 1000-sized and 500-sized image subsets, were constructed. These image subsets were randomly selected from the 6472 images, and the ratio of the 3 classes was 1:1:1. The experimental results on the smaller datasets shown in Table 5 are based on one round of selection.

On both smaller pollen datasets (1000 and 500 images), the same six deep learning-based models were applied. For the machine learning models, we re-tuned the hyperparameters of the best performing flat model (SVM) and hierarchical model (SVM + SVM). Table 5 shows the performance of both machine learning-based and deep learning-based methods on the two smaller image datasets. Compared with the 88% accuracy of the flat model on the 1000-image dataset, the 93.9% accuracy obtained by the hierarchical model demonstrated that the hierarchical strategy improves the performance. The performance of the hierarchical model was, however, still lower than that of the deep learning models, except for AlexNet. We obtained similar results with the larger 6472-image dataset (Table 3 and Table 4). This is probably due to the fact that AlexNet has a shallow layer structure, which includes only five convolutional layers and three fully connected layers. The results indicate that extracting as many features as possible manually, as well as using a hierarchical strategy, outperforms a shallow deep learning neural network such as AlexNet. For the 500-image dataset we obtained similar results.

In order to obtain sufficient information for a statistical analysis of the performance of the models, we used a cross-validation approach over the entire image set with two differently sized groups of subsets¹. We implemented 5-fold cross-validation to select five image subsets from the 6472-image dataset, as well as 10-fold cross-validation to select ten smaller image subsets. With this selection method, the average performance of all subsets from the different models was compared, the results of which are given in Supplementary Table 5. The experiments confirmed that ResNet50 achieved the best performance on both the 1000- and 500-sized datasets.

Deep learning-based methods show a better performance on both the large and the smaller pollen datasets. An ablation study was conducted to help understand why this difference was obtained. Convolutional layers of deep neural networks can capture more representative features compared with extracting handcrafted features manually. We visualized intermediate feature maps of VGG16 and ResNet50 in Fig. 8 and Fig. 9 to provide extra insight into the procedure of feature extraction. In each layer of the model, different features are extracted. In Fig. 8, the feature maps of convolutional layers 1, 4 and 7 of VGG16 are shown. In Fig. 9, we show the feature maps of the convolutional layer in stage 1 and the three bottlenecks in stage 2 of ResNet50. The structure of ResNet50 can be divided into five stages [35]. Stage 1 consists of 1 convolutional layer, and stages 2–5 consist of different numbers of bottleneck structures. From both feature maps, we can conclude that, in the first several convolutional layers, basic pollen features such as edges and textures (surface ornamentation) are clearly displayed, as was also found in [12]. With an increase in the number of network layers, more and more complex and abstract features influence the performance of pollen classification. For example, in convolutional layer 4 of VGG16 and the first bottleneck in stage 2 of ResNet50, other important parts of the pollen, such as the pores, are highlighted. In the higher layers of the network, only the most representative features are retained, but these features are difficult to grasp. With the help of a deep convolutional network that can extract different features from low level (detail) to high level (abstract), the best score in pollen classification tasks is achieved.

In addition, all of the techniques, i.e. transfer learning, data augmentation and hard voting, clearly contributed to improving the performance of the deep learning models under study. Table 6 shows to what extent the accuracy can be improved by the different techniques, applied on the approximately 1000-sized image subsets. Five 1000-sized image subsets were selected via 5-fold cross-validation. The average performance of the five subsets was calculated and the results are shown in Table 6. The first row shows that ResNet50 achieved 81.4% accuracy if the model was trained from scratch. Transfer learning improved the accuracy to 95.0% using pre-trained parameters which were trained on the ImageNet dataset. Based on the 95.0% accuracy of transfer learning, the ResNet50 model with data augmentation improved the accuracy to 96.2%. The accuracy is 1.2% higher than without data augmentation, which

¹ The two differently sized groups of subsets consist of 1000- and 500-sized image subsets; these sizes are given as an indication, the real number is slightly higher.
Fig. 6. Examples of classification performed by VGG16. (a) Positive samples with their predicted label and confidence score. (b) Negative samples with their predicted label
and confidence score.
Fig. 7. Examples of classification performed by ResNet50. (a) Positive samples with their predicted label and confidence score. (b) Negative samples with their predicted label
and confidence score.
Table 5
Performance comparison of the different methods on smaller-size image datasets. Standard deviation over three training runs of each model is given in brackets.
is shown in the second row. Data augmentation helped to increase the variety and the size of the image data. Hard voting predicted the class labels with the majority of votes of the different classification models. The combination of all these techniques significantly improved the accuracy of the ResNet50 model to 97.5%. The third row of Table 6 shows that without transfer learning, the accuracy of the ResNet50 model was only 86.1%. We can conclude that, in this study, transfer learning plays a more important role in the performance of the deep learning models than data augmentation and hard voting. Because of these advanced techniques, we achieved great success with our deep learning models in the classification of our Urticaceae pollen data. Supplementary Table 6 shows the ablation study of ResNet50 based on the 10-fold cross-validation selection method.

4. Conclusion

This study aimed to find the automatic classification model with the best performance to classify Urticaceae pollen grains. Pollen grains of this family have high morphological similarity while they induce different allergenic levels. Few researchers focused
Fig. 8. Example of feature maps of VGG16. (a) Parietaria. (b) Urtica. (c) Urtica membranacea.
Fig. 9. Example of feature maps of ResNet50. (a) Parietaria. (b) Urtica. (c) Urtica membranacea. Column 1 represents the input data; columns 2–5 are the output of the convolutional layer in stage 1 and the outputs of the three bottlenecks in stage 2 of ResNet50, respectively.
Table 6
Ablation study with ResNet50. The average performance of ResNet50 on five (approximately) 1000-sized image subsets, selected via the 5-fold cross-validation method, is given. Standard deviation over the five subsets is given in brackets. Numbers in italics refer to training without transfer learning and without data augmentation, respectively.
[19] P. Bountris, E. Farantatos, N. Apostolou, Advanced image analysis tools development for the early stage bronchial cancer detection, World Academy of Science, Engineering and Technology 1 (9).
[20] L. Cao, M. de Graauw, K. Yan, L. Winkel, F.J. Verbeek, Hierarchical classification strategy for phenotype extraction from epidermal growth factor receptor endocytosis screening, BMC Bioinform. 17 (196), doi:10.1186/s12859-016-1053-2.
[21] G.A. Dunn, A.F. Brown, Alignment of fibroblasts on grooved surfaces described by a simple geometric transformation, J. Cell Sci. 83 (1986) 313–340.
[22] C. Chudyk, H. Castaneda, R. Leger, I. Yahiaoui, F. Boochs, Development of an automatic pollen classification system using shape, texture and aperture features, in: LWA 2015 Workshops: KDML, FGWM, IR, and FGDB, 2015, pp. 65–74.
[23] M. Rodríguez-Damián, E. Cernadas, A. Formella, P. Sá-Otero, Pollen classification using brightness-based and shape-based descriptors, in: Proceedings – International Conference on Pattern Recognition, vol. 2, 2004, pp. 212–215, doi:10.1109/icpr.2004.1334098.
[24] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: Proceedings – 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, 2005, doi:10.1109/CVPR.2005.177.
[25] R.M. Haralick, K. Shanmugam, I. Dinstein, Textural features for image classification, IEEE Trans. Syst., Man Cybern. 3 (6) (1973) 610–621, doi:10.1109/TSMC.1973.4309314.
[26] M.K. Hu, Visual pattern recognition by moment invariants, IRE Trans. Inform. Theory 8 (2) (1962) 179–187, doi:10.1109/TIT.1962.1057692.
[27] J. Tang, S. Alelyani, H. Liu, Feature selection for classification: A review, in: Data Classification: Algorithms and Applications, 2014, pp. 37–64, doi:10.1201/b17320.
[28] D. Jain, V. Singh, Feature selection and classification systems for chronic disease prediction: A review, Egypt. Inform. J. (2018), doi:10.1016/j.eij.2018.03.002.
[29] M.J. Cossetin, J.C. Nievola, A.L. Koerich, Facial expression recognition using a pairwise feature selection and classification approach, in: Proceedings of the International Joint Conference on Neural Networks, 2016, pp. 5149–5155, doi:10.1109/IJCNN.2016.7727879.
[30] M.C. Popescu, L.M. Sasu, Feature extraction, feature selection and machine learning for image classification: A case study, in: 2014 International Conference on Optimization of Electrical and Electronic Equipment (OPTIM), 2014, pp. 968–973, doi:10.1109/OPTIM.2014.6850925.
[31] C.N. Silla, A.A. Freitas, A survey of hierarchical classification across different application domains, Data Min. Knowl. Disc. 22 (2011) 31–72, doi:10.1007/s10618-010-0175-9.
[32] S. Kiritchenko, S. Matwin, A.F. Famili, Functional annotation of genes using hierarchical text categorization, in: Proc. of the BioLINK SIG: Linking Literature, Information and Knowledge for Biology (held at ISMB-05), 2005.
[33] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Adv. Neural Inform. Process. Syst. (2012).
[34] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: International Conference on Learning Representations, 2015.
[35] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778, doi:10.1109/CVPR.2016.90.
[36] A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861, 2017.
[37] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.C. Chen, MobileNetV2: Inverted residuals and linear bottlenecks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520, doi:10.1109/CVPR.2018.00474.
[38] A. Mahbod, G. Schaefer, R. Ecker, I. Ellinger, Pollen grain microscopic image classification using an ensemble of fine-tuned deep convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[39] D.G. Arias, M.V.M. Cirne, J.E. Chire, H. Pedrini, Classification of pollen grain images based on an ensemble of classifiers, in: Proceedings – 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017, 2017, pp. 234–240, doi:10.1109/ICMLA.2017.0-153.

Marcel Polling received his Ph.D. degree at Naturalis Biodiversity Center (the Netherlands) and the Natural History Museum, University of Oslo, Norway, in 2021. He has a background in palynology, having studied Earth Sciences and worked as a biostratigraphic consultant in Wales, United Kingdom, for 4.5 years. During his Ph.D. he focused on innovative methods of pollen recognition, including automatic image recognition and DNA metabarcoding. He is currently employed at Wageningen Environmental Research as a Molecular Ecologist, mainly working on projects utilizing genetics to study and monitor biodiversity.

Lu Cao received her Ph.D. from LIACS, Leiden University, the Netherlands, in 2014. Her background is in Computer Science and her research focuses on high-throughput imaging, image and data analysis. Between 2014 and 2016, she worked as a postdoc researcher in the Department of Anatomy and Embryology, Leiden University Medical Center (LUMC). In 2017, she worked in the Department of Applied Stem Cell Technologies (AST) at the University of Twente as a postdoctoral fellow. Before joining LIACS, she worked at Cytocypher B.V. as an image and data analyst. She started her work at LIACS as an assistant professor in 2019.

Barbara Gravendeel leads the Evolutionary Ecology group at Naturalis Biodiversity Center in Leiden, the Netherlands. She received her Ph.D. in Plant Systematics at Leiden University in 2000. After having been a Fulbright Visiting Professor at Harvard University from 2006 to 2007, Barbara was a full professor at the University of Applied Sciences Leiden from 2011 to 2018. Since 2019, Barbara holds a chair in Plant Evolution at the Radboud Institute for Biological and Environmental Sciences in Nijmegen, the Netherlands. She has a background in Botany and her research focuses on rapid evolutionary changes of plants.

Fons Verbeek is head of the Imaging & Bioinformatics group at LIACS, Leiden University, the Netherlands. He received his Ph.D. in Applied Physics from Delft University of Technology and subsequently worked as a postdoc researcher at the University of Amsterdam. After that he worked as a research fellow at the Hubrecht Institute in Utrecht. In 2003, he started to work at LIACS, Leiden University, as a researcher/group leader. In 2008, he became an associate professor. As of 2013 he was affiliated with Fiocruz, Rio de Janeiro, Brazil, as a visiting professor. In 2016, he was appointed full professor at Leiden University, the Netherlands. His research area includes image analysis with a focus on microscopy imaging. His research stretches from feature extraction and segmentation to classification strategies in the domain of the life sciences.