
Neurocomputing 522 (2023) 181–193

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Analysis of automatic image classification methods for Urticaceae pollen


classification
Chen Li a, Marcel Polling b,c, Lu Cao a, Barbara Gravendeel b,d, Fons J. Verbeek a,*

a LIACS, Leiden University, Leiden, The Netherlands
b Wageningen Environmental Research, The Netherlands
c Naturalis Biodiversity Center, Leiden, The Netherlands
d Radboud Institute of Biological and Environmental Sciences, Nijmegen, The Netherlands

* Corresponding author. E-mail address: [email protected] (F.J. Verbeek).

https://doi.org/10.1016/j.neucom.2022.11.042

Article history:
Received 16 February 2022
Revised 22 September 2022
Accepted 15 November 2022
Available online 26 November 2022

Keywords:
Image classification
Deep learning
Machine learning
Hierarchical strategy
Pollen grains

Abstract
Pollen classification is considered an important task in palynology. In the Netherlands, two genera of the Urticaceae family, named Parietaria and Urtica, have high morphological similarity but induce allergy at very different levels. Therefore, the distinction between these two genera is very important. Within this group, the pollen of Urtica membranacea is the only species that can be recognized easily under the microscope. For the research presented in this study, we built a dataset from 6472 pollen images and our aim was to find the best possible classifier on this dataset by analysing different classification methods, both machine learning and deep learning-based methods. For the machine learning-based methods, we measured both texture and moment features based on images of the pollen grains. Varied feature selection techniques, classifiers, as well as a hierarchical strategy were implemented for pollen classification. For the deep learning-based methods, we compared the performance of six popular Convolutional Neural Networks: AlexNet, VGG16, VGG19, MobileNet V1, MobileNet V2 and ResNet50. Results show that, compared with flat classification models, a hierarchical strategy yielded the highest accuracy, 94.5%, among the machine learning-based methods. Among the deep learning-based methods, ResNet50 achieved an accuracy of 99.4%, slightly outperforming the other neural networks investigated. In addition, we investigated the influence on performance of reducing the size of the image dataset to 1000 and 500 images, respectively. The results demonstrated that on the smaller datasets, ResNet50 still achieved the best classification performance. An ablation study was implemented to help understand why the deep learning-based methods outperformed the other models investigated. Using Urticaceae pollen as an example, our research provides a strategy for selecting a classification model for pollen datasets with highly similar pollen grains to support palynologists, and could potentially be applied to other image classification tasks.

© 2022 Leiden Institute of Advanced Computer Science, Leiden University. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

The analysis of pollen grains is widely used in the detection and monitoring of airborne allergenic particles. In recent years, pollen seasons have been prolonged due to global warming and climate change [1]. This subsequently causes an increase in hay fever patients, who are affected by rising allergenic pollen levels in the air [2]. In palynological research, identification of pollen grains plays a key role in suggesting safe treatments for patients with allergic rhinitis. It helps patients and medical professionals to monitor the levels of airborne allergenic pollen and thus plan outdoor activities and medication accordingly. Pollen recognition analysis is often implemented by human visual inspection under the microscope, and includes the identification of differences in shape, texture, size and other specific features of pollen categories [3]. However, merely relying on human inspection for pollen classification tasks is unrealistic, as the size of image datasets is rapidly increasing due to high-throughput screening, while the expertise needed to perform this detailed analysis is rapidly disappearing. Another limitation of manual classification is that it may introduce classification biases between inspectors when the differences among pollen categories are very subtle. Thus, automatic classification techniques are now being developed that have proven to perform well in pollen classification tasks [3-7].
Researchers have adopted different approaches to automate the process of pollen classification. In general, the two main technical approaches to pollen image classification tasks are machine learning-based methods [3,8] and deep learning-based methods [5,9-12].

Machine learning methods need to be fed with manually selected features before they can extract these from images. The so-called handcrafted features used in machine learning techniques are mostly based on shape, texture and other related properties of the pollen grain images. The extracted features play an important role in the performance of the classification. In addition, suitable feature selection methods and classifiers are also crucial for machine learning-based classification methods.
In the work of del Pozo-Baños et al. [13], a combination of geometrical and texture characteristics was proposed as the discriminative features for a 17-class pollen dataset. Incorporation of Linear Discriminant Analysis (LDA) and Least Squares Support Vector Machines (LS-SVM) accomplished the best performance of 94.92% accuracy. Marcos et al. [14] extracted four texture features, including Gray-Level Co-occurrence Matrices (GLCM), log-Gabor filters (LGF), Local Binary Patterns (LBP) and Discrete Tchebychev Moments (DTM), from a pollen image dataset with 15 classes. Fisher's Discriminant Analysis (FDA) and K-Nearest Neighbour (KNN) were subsequently applied to perform dimensionality reduction and multivariate classification. This yielded an accuracy of 95%. Manikis et al. [8] used texture features obtained by GLCM and seven geometrical features computed from the binary mask of a pollen image dataset. A Random Forest (RF) classifier was used in the classification stage; with this classifier, 88.24% accuracy was achieved on 6 pollen classes. Machine learning methods thus show highly varying results, seemingly dependent on the dataset used.
Instead of manual design of the features, deep learning methods automatically extract image features through the convolutional layers of the network. In recent years, many state-of-the-art Convolutional Neural Networks (CNNs) have been applied to pollen classification tasks. In the work of Sevillano et al. [5], a pretrained AlexNet was used to classify a dataset with 46 different classes of pollen grains. By incorporating data augmentation and cross-validation techniques, an accuracy of 98% was achieved. In the work presented by Battiato et al. [4], both AlexNet and SmallerVGGNet were implemented to classify five classes of pollen grains, with 13,000 images. The two networks obtained a performance of 89.63% and 89.73% accuracy, respectively. A seven-layer deep Convolutional Neural Network designed by Daood et al. [9] was trained on a dataset of 30 pollen classes and accomplished a 94% correct classification rate. Astolfi et al. [15] analysed a pollen dataset composed of 73 pollen categories. They compared the performance of eight state-of-the-art CNNs: Inception-V3, VGG16, VGG19, ResNet-50, NASNet, Xception, DenseNet-201 and Inception-ResNet-V2. They showed that DenseNet-201 and ResNet-50 achieved superior performance against the other CNNs, with an accuracy of 95.7% and 94.0%, respectively.
Based on the analysis of the related work mentioned above, both machine learning and deep learning-based methods have achieved comparable performance on pollen datasets. However, the pollen datasets used in these studies are derived from species or genera from different plant families [16]. The morphology of each class of pollen is then already clearly distinctive under the microscope for human analysts. For example, the public POLEN23E dataset [3] consists of 23 pollen classes from the Brazilian Savannah, derived from 23 genera in 15 families. Each class of pollen has a different shape, size and texture. The other public pollen dataset from the Brazilian Savannah, called POLLEN73S, which was analysed by Astolfi et al. [15], has 73 pollen classes with clearly variable colour, shape and other morphological differences. These distinct features ensured the high performance of the classification models applied. In this research, however, we are more interested in distinguishing genera of the same family Urticaceae, namely Parietaria and Urtica, which are morphologically very similar but cause completely different allergy levels. Pollen of the two genera cannot currently be distinguished easily by a palynologist; Urtica membranacea represents the only species that can be specifically distinguished.
Parietaria and Urtica are two genera commonly encountered in the Netherlands. The occurrence of Parietaria plants is very much increasing, and it can induce severe allergy in hay fever patients, while Urtica does not [12]. Species from the genus Parietaria as well as Urtica membranacea originate from the Mediterranean area and are now increasing in Northern Europe. Due to climate change, these species can maintain themselves in northern countries such as the Netherlands. The pollen grains from these taxa exhibit a similar roundness and are all very small, but differ in the following features: 1) number of pores: Parietaria and Urtica have 3 to 4 pores, while this is variable for Urtica membranacea (usually 5 to 10, i.e., pantoporate); 2) the average size of Parietaria pollen is slightly smaller (11-18 µm) and it has a coarser and more irregular surface than Urtica; Urtica pollen are bigger on average (15.2-21.1 µm) and often have a more pronounced thickened exine around the pore (annulus); the shape of Urtica membranacea is slightly angular and it is easily distinguished because of its small size (10-12 µm) and high number of pores. Although these pollen grains have the aforementioned differences, it is not possible for experts to distinguish the three different classes by the naked eye using a light microscope. This is mainly because of their small size. Therefore, in order to improve the accuracy and efficiency of Urticaceae pollen classification, automatic algorithms are required.
Currently, very few studies have focused on pollen classification of the Urticaceae family. Rodríguez-Damián et al. [16] extracted both geometrical and texture features and probed three classifiers: Support Vector Machines (SVM), Multi-Layer Perceptron (MLP) and a Minimum Distance Classifier (MDC). The best performance, a success rate of 88%, was reached on a total of 291 pollen images of the three species Parietaria judaica, Urtica urens and Urtica membranacea. Compared with their relatively small Urticaceae dataset, we aimed to analyse a much larger dataset that includes all species (Parietaria judaica, Parietaria officinalis, Urtica dioica, Urtica urens and Urtica membranacea) present in the Netherlands. We grouped these five species into 3 classes: Parietaria (Parietaria judaica, Parietaria officinalis), Urtica (Urtica urens, Urtica dioica) and Urtica membranacea. Both Parietaria and Urtica dominate in the Netherlands but cause totally different allergy levels. Urtica membranacea is an exotic Mediterranean species and the only species that can be easily distinguished. Hence our starting point of three labels, and thus our study is based on a three-class classification task. The best performance achieved in our study is 99.4%, by ResNet50. It is also possible to do a classification task over all five species (see Supplementary Table 1). Another challenge is that the pollen grains that we used were unacetolyzed. Acetolyzed pollen grains are those in which all pollen material is destroyed by acetolysis, with the exception of the sporopollenin that forms the outer pollen wall, the exine. In contrast to acetolyzed pollen grains, unacetolyzed pollen keep their original organic features, which are less apparent. To the best of our knowledge, our previous work [12] was the first and only time that CNNs were applied and compared for the analysis of unacetolyzed Urticaceae pollen grains. In this study, we extended this work further and aimed to find the automatic classification model with the best performance among both machine learning-based and deep learning-based methods for our unacetolyzed Urticaceae dataset. In general, a deep learning model requires a large dataset as input. However, in practice there are many limitations for researchers in collecting a sufficiently large dataset. Subsequently, we were curious how machine learning-based and deep learning-based methods work on a smaller-sized image dataset.

Therefore, two additional experiments on smaller datasets were designed to compare the performance of the different classification models. For a 1000-image dataset, ResNet50 yielded the best performance of 96.3%, while for a 500-image dataset it achieved the best accuracy of 93.3%.

2. Methods

2.1. Sample and image preparation
2.1.1. Sample preparation of the pollen grains

Our pollen data included both fresh and dry pollen specimens [12]. Fresh pollen specimens were collected by an experienced biologist in the surroundings of Leiden and The Hague (the Netherlands) during the flowering seasons of 2018 and 2019. Dry pollen specimens were collected from the herbarium of Naturalis Biodiversity Center, Leiden, the Netherlands, using identification keys and descriptions. For each species in our dataset, pollen samples from 4 to 8 plants were taken, from different geographical locations, in order to cover as much variation as possible (see Supplementary Table 2). Microscope slides were freshly prepared by aerobiological experts from Naturalis Biodiversity Center. The thecae of open flowers were carefully opened on a microscope slide using tweezers. Non-pollen materials were manually removed. The pollen grains were mounted using a glycerin:water:gelatin (7:6:1) solution with 2% phenol and stained with Safranin (0.002% w/v). Cover slips were sealed with paraffin. Each slide contained only one plant of each species of Urticaceae.
2.1.2. Image capturing and pre-processing

The slide area rich in pollen was scanned automatically using a Zeiss Observer Z1 microscope with a Plan Apochromat 100× objective (NA 1.4), equipped with a Hamamatsu c9100 EM-CCD camera. As pollen grains are three-dimensional, it is difficult to set a single focal plane for pollen samples. Therefore, we captured 20 slices of images along the Z axis for each pollen grain, with a step size of 1.8 µm. After obtaining a stack of images including pollen, the grains were detected and cropped; this is referred to as the 3D pollen stack. Fig. 1(a) shows an example of a slice from the raw image. Fig. 1(b) shows all 20 slices of different focal depths of an individual pollen grain. In total, 6472 individual pollen stack images were captured. Three categories were included for the image classification study: (1) Parietaria (including Parietaria judaica, Parietaria officinalis), (2) Urtica (Urtica dioica, Urtica urens), and (3) Urtica membranacea (see Fig. 2).
As shown in Fig. 1(b), not all of the 20 slices in the Z-stack were in focus. In order to obtain as many informative features as possible, all Z-stack images were further processed using a Z-stack projection method [17]. Z-stack projection is a method of analysing and highlighting specific features from all slices in a stacked image without incorporating out-of-focus blurriness. The selected projections were Standard Deviation (STD), Minimum Intensity (MIN), and Extend Focus (EXT) [18], which are shown in Fig. 1(c) and Fig. 2. The three projections per pollen grain were treated as three separate channel images for the input of the supervised classification models.
Classification models require same-sized input images, which is usually achieved by resizing. However, for pollen images captured by a microscope, resizing changes the morphology and details of the pollen grains. We did not opt for resizing, as resized images would ignore the original size differences of the pollen grains, which is a potentially important diagnostic feature. There are several ways to preserve this aspect of the features. One can crop images of the pollen grains to the same size from a slide-scan image. However, some pollen grains are located very close to each other, so that cropping to the same size might cut pollen grains apart. Therefore, we chose another approach: padding the cropped images so that the resulting images all have the same size. For the padding size, the biggest size of all individual pollen images was selected, i.e., 276 × 276 pixels. The padding value was set to the median value at the edge of each pollen image in order to make the content of the padded images look more natural.
After the pre-processing of the images, we aimed to find the best classification model for our Urticaceae pollen dataset. Machine learning and deep learning-based classification models were constructed, and the performance of each model was evaluated and compared.
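To make this step concrete, the sketch below computes the STD and MIN projections of a grain stack and pads a cropped image onto a fixed 276 × 276 canvas filled with the median edge value. This is a minimal sketch under our own assumptions: it uses NumPy, the function names are ours, and the Extend Focus projection is approximated by a per-pixel sharpest-slice selection, whereas the paper uses the method of [18].

import numpy as np

def z_projections(stack):
    # stack: (20, H, W) array of gray-scale focal slices of one pollen grain.
    std_proj = stack.std(axis=0)    # STD projection: highlights texture variation
    min_proj = stack.min(axis=0)    # MIN projection: darkest structures
    # Crude stand-in for Extend Focus: per pixel, take the slice with the
    # largest local gradient magnitude, i.e. the sharpest slice.
    gy, gx = np.gradient(stack, axis=(1, 2))
    sharpest = (np.abs(gx) + np.abs(gy)).argmax(axis=0)
    ext_proj = np.take_along_axis(stack, sharpest[None], axis=0)[0]
    return std_proj, min_proj, ext_proj

def pad_to_canvas(img, size=276):
    # Pad a cropped grain to size x size, filling with the median edge value.
    h, w = img.shape
    edge = np.concatenate([img[0], img[-1], img[:, 0], img[:, -1]])
    canvas = np.full((size, size), np.median(edge), dtype=img.dtype)
    top, left = (size - h) // 2, (size - w) // 2
    canvas[top:top + h, left:left + w] = img
    return canvas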

Fig. 1. The workflow of pollen image acquisition. (a) One plane of a raw pollen image. (b) 20 slices at different focal depths of gray-scale images of one individual pollen grain. (c) Three different projections along the Z axis: STD = Standard Deviation Projection, MIN = Minimum Intensity Projection, EXT = Extend Focus Projection. (d) The padded images of each projection.

2.2. Machine learning methods

2.2.1. Feature extraction and selection

Machine learning methods require manual selection of relevant features before extracting these from images. One challenge is how to select an appropriate set of features for classification. By observing the characteristics of Urticaceae pollen grains, we noticed that Parietaria has a coarser ornamentation on the surface of its pollen grains, Urtica has thickened pores and Urtica membranacea has an angular outline. Texture attributes of the surface and shape features were therefore considered the appropriate pollen descriptors for Urticaceae pollen grains. Due to the high morphological similarities, we aimed to include as many representative features as possible for Urticaceae pollen classification. The following selected features have been proven successful in classification tasks for pollen recognition: GLCM, LBP, Gabor filter texture features and Histogram of Oriented Gradients (HOG); these features have provided satisfactory results as reported in [4,14]. Both First Order Statistics (FOS), which are derived from statistical properties of the intensity histogram of an image, and Wavelet measurements, a texture analysis based on the Discrete Wavelet Transform (DWT), have been included as they have been used successfully in pattern recognition of cells [19,20]. In addition, the seven Hu invariant moments and three shape measures derived from the invariants, referred to as Extension, Dispersion and Elongation (EDE), were included as invariant descriptors for shape [21]. So, based on the aforementioned image-based studies [22,23], we selected six texture features and two moment-based features to represent the characteristics of the pollen grains in our study. Table 1 shows the selected features with the dimension of each feature vector.

Table 1
The dimension of the feature vector of each feature.

Feature       | Dimension
HOG           | 3600 × 3
LBP           | 416 × 3
Gabor filter  | 60 × 3
GLCM          | 24 × 3
FOS           | 5 × 3
Wavelet       | 9 × 3
Hu moments    | 7 × 3
EDE           | 3 × 3
Total         | 4124 × 3

HOG features in combination with an SVM classifier have proven to be a representative texture descriptor in the image recognition field [24]. In the procedure of HOG feature extraction, we divided an image into several small connected regions, aka cells. Each cell returns a 9 × 1 feature vector. In order to be more invariant to changes in shadowing and illumination, a larger region, referred to as the block, is formed. The block consists of four cells and returns a 36 × 1 feature vector. In the experiment, a pollen image of size 276 × 276 can be divided into 100 blocks; consequently, a 3600 × 1 feature vector is returned at the end.
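A minimal sketch of this block-wise HOG computation, assuming the scikit-image implementation (the paper does not name its library). The cell size below is our assumption; the paper only fixes the per-cell (9 × 1) and per-block (36 × 1) dimensions and the total of 100 blocks.

from skimage.feature import hog

# img: one 276 x 276 projection image, e.g. the STD channel.
feature_vector = hog(img,
                     orientations=9,            # 9-bin gradient histogram per cell
                     pixels_per_cell=(27, 27),  # assumed cell size
                     cells_per_block=(2, 2),    # 4 cells -> one 36 x 1 block vector
                     block_norm='L2-Hys',
                     feature_vector=True)       # all blocks concatenated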
LBP is an invariant descriptor that can be used for texture classification. An n-digit binary number is obtained by comparing each pixel with its n neighbour pixels on a circle with radius r, and is used to compute the histogram. In our study, we fine-tuned the parameters and set n = 24, r = 3. Similar to the HOG feature extraction procedure, the image was also divided into 16 smaller blocks. In this manner a 416 × 1 LBP feature vector is returned.
GLCM characterizes the texture of images by considering the spatial relationship of pairs of pixels in an image. The GLCM is created based on a statistical rule P(i, j; d, θ), which refers to the number of times that gray-level j occurs at a distance d and in a direction θ from gray-level i. Our experiment set d = 1 and the directions θ = 0°, 45°, 90°, 135°. We further calculated the properties based on the matrix, as defined by Haralick et al. [25], as the extracted feature vector. Finally, a 24 × 1 feature vector was returned.
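Both descriptors are available in scikit-image, which we assume here; the particular set of Haralick-style properties below is our choice, picked so that 6 properties × 4 directions yield the 24 × 1 GLCM vector reported above. The input image must be quantized to integer gray levels.

import numpy as np
from skimage.feature import local_binary_pattern, graycomatrix, graycoprops

def lbp_histogram(img, n=24, r=3):
    # Uniform LBP with n sampling points on a circle of radius r;
    # 'uniform' coding with P = 24 gives 26 distinct codes.
    codes = local_binary_pattern(img, P=n, R=r, method='uniform')
    hist, _ = np.histogram(codes, bins=26, range=(0, 26), density=True)
    return hist   # concatenated over 16 sub-blocks in the paper -> 416 x 1

def glcm_features(img_uint8):
    # Co-occurrence matrices at d = 1 and theta = 0, 45, 90, 135 degrees.
    glcm = graycomatrix(img_uint8, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ['contrast', 'dissimilarity', 'homogeneity',
             'energy', 'correlation', 'ASM']          # assumed property set
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])  # 24 values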
A 5 × 1 texture feature vector of FOS, comprising standard deviation of intensity, smoothness, skewness, uniformity and entropy, was calculated [20]. These measures reflect the statistical properties of the intensity histogram of each pollen image. Wavelet-based texture measurements show the image details in different directions after the DWT. We calculated the mean, standard deviation and entropy of intensity in three directions (horizontal, vertical, diagonal) of each pollen image. The dimension of the wavelet measurement is 9 × 1.
Another commonly used texture descriptor is the Gabor filter: it reflects the frequency content in a specific direction of a localized region of the image. In this study, 12 Gabor filters were designed at four directions 0°, 45°, 90°, 135° with three different frequencies π/4, π/2, 3π/4. Therefore, the dimension of the Gabor filter feature vector is 60 × 1.
Hu moments are normally extracted from pollen images for their scale and rotation invariance. A total of seven moment invariants as proposed by Hu [26] were extracted. EDE features are derived from the 1st and 2nd order invariants. Even though the morphological differences between the pollen of the genera are subtle, it was expected that these image moment features could play a role in the pollen classification task.

Fig. 2. A sample pollen grain of each category in our dataset. (a) Parietaria judaica. (b) Urtica urens. (c) Urtica membranacea. Each column represents the STD, MIN, EXT projections of each pollen grain, respectively.
Each pollen image in our dataset consisted of 3 projections (STD, MIN, and EXT) obtained by projecting the 20-slice Z-stack images. Fig. 2 shows that the features of each projection are different, especially the texture features. In order to include as much information of the pollen grain dataset as possible, we calculated the 8 features for the 3 projections of each pollen image and concatenated these together as the final feature vector (cf. Table 1). Therefore, after feature extraction, the dimension of the feature vector reaches 4124 × 3. Compared with public pollen image datasets like POLEN23E and POLLEN73S [3,15] and the 2D Urticaceae pollen images used in [16], our dataset, based on a method of projection of 3D images, might intrinsically yield more representative features. This partially underlies the high performance results that we have achieved.
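A sketch of how the final descriptor could be assembled. The extractor helpers are hypothetical stand-ins for the eight feature groups of Table 1; flattened, the result has 3 × 4124 entries, corresponding to the 4124 × 3 dimension reported above.

import numpy as np

# Hypothetical per-projection extractors for the 8 feature groups of Table 1.
FEATURE_EXTRACTORS = [hog_features, lbp_features, gabor_features, glcm_features,
                      fos_features, wavelet_features, hu_moments, ede_features]

def pollen_descriptor(std_proj, min_proj, ext_proj):
    # 8 feature groups x 3 projections, concatenated into one vector.
    parts = []
    for proj in (std_proj, min_proj, ext_proj):    # 4124 values per projection
        for extract in FEATURE_EXTRACTORS:
            parts.append(np.asarray(extract(proj), dtype=float).ravel())
    return np.concatenate(parts)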

In order to remove redundant and irrelevant features, feature selection and dimensionality reduction techniques were applied. Feature selection returns prominent subsets of features, while dimensionality reduction creates new features of lower dimension from the original features. Feature selection includes filter methods, wrapper methods and embedded methods [27]. These methods have been shown to improve the accuracy of classification studies [28,29].
In this study, the feature selection methods Mutual Information, SelectFromModel and Principal Component Analysis (PCA) were assessed. Mutual Information is a filter feature selection method, in which a subset of the K features most relevant to the target labels is chosen; the selection of the number K is mostly based on experience. The embedded method SelectFromModel works more flexibly: it selects the most relevant features according to the performance of the machine learning model during the training process. This integrated approach ensures that the selected features are the best for the model. Alternatively, PCA is widely used for feature dimensionality reduction. It is a process of computing the principal components and preserving the first few of them that maximize the variance between different classes. PCA is known for obtaining lower-dimensional features and improving the accuracy of machine learning models in many fields, such as image recognition [13,30].
In short, these feature selection and dimensionality reduction methods were applied after feature extraction. Subsequently, classification models were used as the last step to classify the pollen grains.
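All three techniques map directly onto scikit-learn, which we assume here (the paper does not name its library); X_train and y_train stand for the extracted feature matrix and the class labels:

from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, SelectFromModel, mutual_info_classif

# Filter method: keep the K features most informative about the class labels.
mi = SelectKBest(mutual_info_classif, k=2000).fit(X_train, y_train)

# Embedded method: keep features whose importance is >= the mean importance.
sfm = SelectFromModel(RandomForestClassifier(n_estimators=500),
                      threshold='mean').fit(X_train, y_train)

# Dimensionality reduction: keep the components explaining 80% of the variance.
pca = PCA(n_components=0.8).fit(X_train)

X_mi, X_sfm, X_pca = (mi.transform(X_train),
                      sfm.transform(X_train),
                      pca.transform(X_train))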
2.2.2. Classifiers

Once features are extracted from images, an efficient classification model is required so that it performs well on the pollen classification task. A large number of classification approaches exist. In this study, SVM, RF, MLP and Adaboost classifiers were used, as they have been shown to perform well in previous pollen classification studies [4,8,16].
Combined with the extracted features and feature selection methods, the classifiers were trained and the hyperparameters were tuned based on the performance in the experiment.
trained networks implemented in the Keras Library were fine-
2.2.3. Hierarchical strategy tuned on the TensorFlow platform. All experiments were executed
A flat classification model is a straight-forward approach for on a dedicated server equipped with two NVidia GeForce GTX 2070
taxonomic classification tasks. Only one classifier is used to classify with 8 GB GPUs using Linux Ubuntu operating system.
all classes. However, the process ignores potential hierarchical
structure among different classes which could reduce performance 2.3.2. Data augmentation
[31]. Hierarchical classification can be seen as a particular tree- Deep learning models need a large number of image datasets
structured approach. It merges the classes which are more similar covering diverse scenarios. Data augmentation techniques play
into subgroups and classifies these subgroups separately. Varied an important role in increasing the variety of images. Furthermore,
classifiers are used to classify classes at different hierarchical levels if appropriate transforms are applied to a dataset, data augmenta-
until reaching the leaf nodes. Hierarchical strategy has been used tion can greatly improve the performance and reduce overfitting.
in many classification tasks [20]32 and has proven to increase In our case, differences between pollen of Urticaceae genera are
the performance compared with flat classification models. very subtle and slight configurational changes during image cap-
For our work, we structured the three classes of pollen grains as turing may affect the classification performance. Therefore, a large
a hierarchical tree as shown in Fig. 3. We used a local classifier per amount of training data was needed for our study.
parent node approach to train a two-stage classifier for each parent In order to simulate the possible transforms of pollen data,
node in the hierarchical tree. In the first stage we merged Parietaria brightness and flip transforms were most obvious and straightfor-
and Urtica into one subgroup based on high morphological similar- ward to select and therefore applied as augmentation options.
ity. Urtica membranacea is a distinct species that can already be Other transforms like rotation, zoom range, etc., were not selected.
clearly distinguished under the light microscope and it was there-
fore treated as the other subgroup. In the second stage, Parietaria 2.3.3. Cross-validation and hard voting
and Urtica were subsequently classified. At both stages we selected Cross-validation is applied in an image training process to
the best classifier and feature selection method for each parent improve the effectiveness, robustness and generalization ability
node in order to get a better performance of the hierarchical clas- of deep learning models, as well as to prevent overfitting. In this
sification model. study, K values of 5 or 10 were used [11]38. A 5-fold cross-
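A minimal sketch of this local-classifier-per-parent-node scheme with two SVM + PCA stages in scikit-learn. The integer class codes are our own convention and the C values are taken from Table 3; this is an illustration, not the authors' exact code.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Classes: 1 = Parietaria, 2 = Urtica, 3 = Urtica membranacea (assumed coding).
stage1 = make_pipeline(PCA(n_components=0.8), SVC(kernel='rbf', C=6))
stage2 = make_pipeline(PCA(n_components=0.8), SVC(kernel='rbf', C=4))

def fit_hierarchy(X, y):
    stage1.fit(X, np.where(y == 3, 3, 0))   # stage 1: membranacea vs. merged group
    stage2.fit(X[y != 3], y[y != 3])        # stage 2: Parietaria vs. Urtica

def predict_hierarchy(X):
    pred = stage1.predict(X)
    merged = pred == 0                       # samples routed to the second stage
    if merged.any():
        pred[merged] = stage2.predict(X[merged])
    return pred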
2.3. Deep learning methods

2.3.1. Convolutional neural networks

We selected several well-established deep learning models for pollen classification. AlexNet achieved the best performance on the ImageNet classification task in 2012 [33]. In order to prove that convolutional network depth affects image classification accuracy, Simonyan et al. [34] proposed VGGNet for large-scale image recognition; the 16-layer VGG16 and 19-layer VGG19 networks have proven to be the two best-performing convolutional neural networks in other studies. ResNet introduces a residual learning framework to ease the training process of very deep networks, up to 152 layers [35]. Even though ResNet has a lower complexity than the VGGNets, it still has millions of parameters, making the network computationally heavy. A more light-weight set of neural networks, named MobileNets, was designed in order to embed them into mobile devices and other applications [36]. In this study, we selected the aforementioned models, i.e., AlexNet, VGG16, VGG19, ResNet50, and MobileNet V1 and V2 [37], to classify our pollen dataset.
In addition, we used a transfer learning technique to alleviate the computational burden of training from scratch. With our three-class pollen dataset, we fine-tuned an AlexNet pretrained on the ImageNet dataset in the PyTorch framework. The other pretrained networks, implemented in the Keras library, were fine-tuned on the TensorFlow platform. All experiments were executed on a dedicated server equipped with two NVidia GeForce GTX 2070 GPUs with 8 GB of memory, running the Linux Ubuntu operating system.
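As an illustration of this transfer-learning setup, the sketch below loads an ImageNet-pretrained ResNet50 in Keras, replaces the 1000-class head by a 3-class softmax and fine-tunes the whole network. The optimizer, input size and learning rate here are our assumptions; the per-network training settings actually used are reported in Section 3.2.

import tensorflow as tf

base = tf.keras.applications.ResNet50(weights='imagenet', include_top=False,
                                      pooling='avg', input_shape=(276, 276, 3))
outputs = tf.keras.layers.Dense(3, activation='softmax')(base.output)
model = tf.keras.Model(base.input, outputs)

# Fine-tune all layers on the pollen images (no layers frozen).
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# model.fit(train_ds, validation_data=val_ds, epochs=30)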

2.3.2. Data augmentation

Deep learning models need large image datasets covering diverse scenarios. Data augmentation techniques play an important role in increasing the variety of images. Furthermore, if appropriate transforms are applied to a dataset, data augmentation can greatly improve performance and reduce overfitting. In our case, the differences between pollen of the Urticaceae genera are very subtle, and slight configurational changes during image capturing may affect the classification performance. Therefore, a large amount of training data was needed for our study.
To simulate the possible transforms of the pollen data, brightness and flip transforms were the most obvious and straightforward to select and were therefore applied as augmentation options. Other transforms, like rotation, zoom range, etc., were not selected.
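With the Keras ImageDataGenerator, this restriction to brightness and flip transforms could look as follows (the brightness range is an assumed value; the paper does not quantify it):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Only brightness and flips; rotation, zoom range, etc. are deliberately left out.
augmenter = ImageDataGenerator(brightness_range=(0.8, 1.2),
                               horizontal_flip=True,
                               vertical_flip=True)
# train_flow = augmenter.flow(x_train, y_train, batch_size=64)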
2.3.3. Cross-validation and hard voting

Cross-validation is applied in the training process to improve the effectiveness, robustness and generalization ability of deep learning models, as well as to prevent overfitting. In this study, K values of 5 and 10 were used [11,38]. 5-fold cross-validation means that the ratio of training data to validation data is 8:2, while for 10-fold cross-validation the ratio is 9:1. We compared the performance of the deep learning models with 5-fold and 10-fold cross-validation, respectively. After 5/10-fold cross-validation, 5/10 models were obtained and tested on the test dataset. In this study, hard voting was adopted to calculate the final accuracy, rather than the average accuracy of the 5/10 models. Hard voting sums the votes for the class labels from each model and predicts the class with the majority of votes. The experimental results show that the hard voting technique further improves the classification performance on the test dataset.
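A sketch of the hard-voting step over the fold models; it assumes that each model outputs per-class probabilities (e.g. a softmax layer):

import numpy as np

def hard_vote(models, X_test, n_classes=3):
    # Majority vote over the K models produced by K-fold cross-validation.
    votes = np.zeros((len(X_test), n_classes), dtype=int)
    for model in models:                              # the 5 or 10 fold models
        pred = model.predict(X_test).argmax(axis=1)   # each model's class labels
        votes[np.arange(len(X_test)), pred] += 1
    return votes.argmax(axis=1)                       # class with the most votes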

2.4. Performance evaluation

Before addressing the results, we first introduce the performance measures that we used. These were:

precision = TP / (TP + FP)        (1)

recall = TP / (TP + FN)        (2)

F1 score = (2 · precision · recall) / (precision + recall)        (3)

where TP refers to true positives, TN represents true negatives, FP is false positives and FN is false negatives. High precision and recall values verify good performance of a model against false positives and false negatives [5]. The F1 score is an overall measure which combines precision and recall. A high F1 score means that a model retrieved both few false positives and few false negatives, which proves the consistency of those measures and the reliability of the model. Precision, recall and F1 score were calculated as the average weighted by the number of true instances for each class in our experiments. The accuracy of the classification model was calculated as the number of true predictions divided by the total number of samples.
The performance measures mentioned above are commonly applied to flat classification models. However, they are not suitable for hierarchical classification models, since they do not differentiate the misclassification errors among the different hierarchical stages. Instead, we adopted the measures suggested in [32], which include hierarchical precision (hP), hierarchical recall (hR) and hierarchical f-measure (hF). These are defined as follows:

hP = Σᵢ |Ĉᵢ ∩ Ĉ′ᵢ| / Σᵢ |Ĉ′ᵢ|        (4)

hR = Σᵢ |Ĉᵢ ∩ Ĉ′ᵢ| / Σᵢ |Ĉᵢ|        (5)

hF = (2 · hP · hR) / (hP + hR)        (6)

where Ĉᵢ is the set of true classes of sample i together with all of their ancestors, and Ĉ′ᵢ is the set of predicted classes together with all of their ancestors. Ancestors here refer to all the nodes connected to the specific true/predicted class node in the hierarchical tree structure.
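Eqs. (4)-(6) translate directly into set operations. The sketch below assumes each sample is represented by the set of its true (respectively predicted) class plus all ancestors in the tree of Fig. 3:

def hierarchical_prf(true_sets, pred_sets):
    # true_sets / pred_sets: one set per sample, e.g.
    # {'Urticaceae', 'Parietaria+Urtica', 'Urtica'} for a three-level path.
    overlap = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    hp = overlap / sum(len(p) for p in pred_sets)   # Eq. (4)
    hr = overlap / sum(len(t) for t in true_sets)   # Eq. (5)
    hf = 2 * hp * hr / (hp + hr)                    # Eq. (6)
    return hp, hr, hf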
3. Experiment results and discussion

We derived results from two parts: a comparison of pollen classification performance with different machine learning algorithms, and an analysis of the performance of deep learning neural networks. Two additional experiments show how machine learning-based and deep learning-based methods work on smaller-size image datasets. Our results are based on the performance measures introduced above.

3.1. Results with machine learning methods

For the machine learning methods, the 6472 pollen images were divided into training and test datasets in a ratio of 9:1. In this experiment, we compared the performance of each classification model using 5-fold and 10-fold cross-validation, respectively. In addition, a grid search technique [39] was applied in the training process to search for optimal hyperparameters automatically. With this technique, a list of hyperparameter values is defined beforehand, and the set of parameters that maximizes the accuracy of the model is returned.
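The grid search corresponds to scikit-learn's GridSearchCV (an assumption about the implementation); the parameter grid below mirrors the kernel functions (linear, rbf, poly) and penalty values (1 to 10) described for the SVM in the next paragraph:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {'kernel': ['linear', 'rbf', 'poly'],
              'C': list(range(1, 11))}              # penalty values 1..10
search = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy')
search.fit(X_train_pca, y_train)                    # features after PCA (0.8)
print(search.best_params_)                          # e.g. {'C': 4, 'kernel': 'rbf'}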
Table 2 shows the performance of each classifier with the corresponding hyperparameter settings and feature selection methods. In Table 2 we present the results of an SVM with a Radial Basis Function (RBF) kernel. This SVM, with a penalty parameter C = 4, was shown to be optimal for this type of data. These parameters were determined by the grid search technique: two dictionaries which included kernel functions (linear, rbf, poly) and penalty values (from 1 to 10) were set, and the best parameter combination was returned. The other SVM parameters were kept at their default values. The highest score, with an accuracy of 91.5% (±0.008) and an F1 score of 0.915, was achieved in combination with a PCA threshold of 0.8 and 5-fold cross-validation. A PCA threshold of 0.8 means that the first few principal components, for which the ratio of accumulated data variation to total data variation is greater than 0.8, are preserved while all others are discarded. In this case, the final size of the selected feature vector is 179 × 1 (see Supplementary Table 3). For the RF classifier, the number of trees was set to 500. The SelectFromModel function with threshold 'mean' embedded in the classifier achieved the best performance of 88.6% (±0.015) with 10-fold cross-validation. The threshold 'mean' was set according to the importance of each feature: a feature whose importance is greater than or equal to the mean is kept, while the others are discarded. The final size of the selected feature vector is 2064 × 1. Similarly, all of these parameters were fine-tuned by the grid search technique.
Furthermore, we carried out experiments with the MLP and Adaboost classifiers. The MLP led to a best accuracy of 89.8% (±0.011) with the following settings: the optimizer was Stochastic Gradient Descent (SGD); the maximum number of iterations was 300; and PCA feature reduction was applied at a threshold of 0.85. The final size of the feature vector after PCA feature reduction becomes 337 × 1. Adaboost reached a performance of 75.4% (±0.021) with the number of estimators set to 500 and a Learning Rate (LR) of 0.5. Mutual Information plays an important role in the accuracy of the Adaboost classifier, because it selected the 2000 features most relevant to the target classes of the pollen dataset. In addition, 5-fold and 10-fold cross-validation obtained comparable performance for the different machine learning-based classification models.
In order to further improve on the performance of the flat classification models, we applied a hierarchical classification strategy. We implemented different combinations of flat models to form a two-level hierarchical structure: SVM + SVM, SVM + MLP, MLP + SVM, SVM + RF and RF + SVM. Table 3 shows all permutations with SVM for the hierarchical classification model, except for Adaboost. This is because SVM achieved the best
performance among the flat classification models, while Adaboost achieved the lowest performance. Table 3 shows that the best combination is SVM + SVM, which obtained an accuracy of 94.5% and 0.941 for each of hP, hR and hF. The reason why hP, hR and hF are equal is that our simple hierarchical tree structure has only 3 layers and each parent node has only 2 children; according to the definitions of hP and hR (cf. Eq. (5)), in this case the calculations of hP, hR and hF coincide. Based on our experiments, the hierarchical model which combined SVM + PCA at both levels was considered the best model among the machine learning-based methods.
Table 2
Performance comparison of different flat classification models. Standard deviation, of each subset via cross-validation, is given in brackets.

Classifier | Hyperparameters | Feature selection (threshold) | Cross-validation | Precision | Recall | F1 score | Accuracy
SVM | kernel='rbf', C=4 | PCA (0.8) | 10-fold | 0.913 (±0.012) | 0.913 (±0.012) | 0.913 (±0.012) | 0.913 (±0.012)
SVM | kernel='rbf', C=4 | PCA (0.8) | 5-fold | 0.915 (±0.008) | 0.915 (±0.008) | 0.915 (±0.008) | 0.915 (±0.008)
RF | Estimators=500 | SelectFromModel (mean) | 10-fold | 0.886 (±0.015) | 0.886 (±0.015) | 0.886 (±0.015) | 0.886 (±0.015)
RF | Estimators=500 | SelectFromModel (mean) | 5-fold | 0.884 (±0.010) | 0.884 (±0.010) | 0.884 (±0.010) | 0.884 (±0.010)
MLP | Solver='sgd', Maxiter=300 | PCA (0.85) | 10-fold | 0.898 (±0.011) | 0.898 (±0.011) | 0.898 (±0.011) | 0.898 (±0.011)
MLP | Solver='sgd', Maxiter=300 | PCA (0.85) | 5-fold | 0.890 (±0.008) | 0.890 (±0.008) | 0.890 (±0.008) | 0.890 (±0.008)
Adaboost | Estimators=500, LR=0.5 | Mutual Information (2000) | 10-fold | 0.789 (±0.014) | 0.754 (±0.021) | 0.749 (±0.025) | 0.754 (±0.021)
Adaboost | Estimators=500, LR=0.5 | Mutual Information (2000) | 5-fold | 0.784 (±0.015) | 0.745 (±0.024) | 0.743 (±0.028) | 0.748 (±0.024)

Table 3
Performance comparison of hierarchical classification models.

Level 1 classifier (hyperparameters; feature selection) | Level 2 classifier (hyperparameters; feature selection) | hP | hR | hF | Accuracy
SVM (kernel='rbf', C=6; PCA 0.8) | SVM (kernel='rbf', C=4; PCA 0.8) | 0.941 | 0.941 | 0.941 | 0.945
SVM (kernel='rbf', C=4; PCA 0.8) | MLP (solver='sgd', Maxiter=300; PCA 0.85) | 0.920 | 0.920 | 0.920 | 0.916
MLP (solver='sgd', Maxiter=300; PCA 0.85) | SVM (kernel='rbf', C=4; PCA 0.8) | 0.929 | 0.929 | 0.929 | 0.933
SVM (kernel='rbf', C=4; PCA 0.8) | RF (Estimators=500; SelectFromModel mean) | 0.907 | 0.907 | 0.907 | 0.897
RF (Estimators=500; SelectFromModel mean) | SVM (kernel='rbf', C=4; PCA 0.8) | 0.913 | 0.913 | 0.913 | 0.911

3.2. Results with deep learning methods

Our starting point has been to work with commonly available deep learning methods: AlexNet, VGG16, VGG19, ResNet50, MobileNet V1 and MobileNet V2. First of all, the total of 6472 pollen grain images was randomly divided into a training set and a test set in a ratio of 9:1. The test set was composed of images that were not seen by the model during the training process and that were used to test the trained classification model. Secondly, considering that deep learning models require a huge amount of data, a data augmentation technique was applied. Thirdly, similar to the machine learning models, 5-fold and 10-fold cross-validation were used to prevent overfitting and to increase the robustness as well as the generalization ability of the deep learning models. The data augmentation process was performed for each cross-validation set independently.
Based on these aforementioned procedures, we fine-tuned six representative deep learning classification models. The pretrained AlexNet was implemented in the PyTorch framework. The whole network, five convolutional layers and three fully connected layers, was fine-tuned with a learning rate of 0.001. We set the batch size to 128 and the number of batches to 4000. Fig. 4(a) shows the performance plot of AlexNet: the accuracy and loss for both the training and validation dataset. The accuracy increases drastically in the first 700 batches and then converges gradually. AlexNet achieved an accuracy of 94.1% with a standard deviation of (±0.002) using 10-fold cross-validation, while 5-fold cross-validation retrieved a comparable accuracy of 92.4% (±0.002) (see Table 4). The average accuracy and standard deviation were calculated by training each model three times. The accompanying precision, recall and F1 score likewise reached 0.941. The reason why these three measurements are so similar is that, in this case, the number of False Positive (FP) samples is nearly equal to the number of False Negatives (FN). The consistency of these measurements shows the reliability of the model. Six positive samples and six negative samples among the three classes of pollen grains classified by AlexNet are shown in Fig. 5(a) and (b). The actual label, predicted label and confidence score of each sample are indicated; labels 1 to 3 represent the 3 classes of pollen: Parietaria, Urtica, Urtica membranacea. In Fig. 5(a), the positive samples clearly show the distinguishing properties: Urtica pollen has obviously thickened pores compared with pollen of Parietaria and Urtica membranacea, and pollen of Urtica membranacea has a more angular outline than pollen of the other two genera. Fig. 5(b) illustrates that, when the properties of the 3 classes of pollen are not clearly displayed, the network misclassifies these samples because of the high similarities among the 3 classes.
Similarly, the pretrained VGG16 and VGG19 models were fine-tuned with a batch size of 64 and the number of epochs set to 30.
Fig. 4. The performance plots of (a) AlexNet, (b) VGG16, (c) VGG19, (d) MobileNet V1, (e) MobileNet V2, and (f) ResNet50, in terms of training loss, etc., with respect to the number of epochs.

Table 4
Classification performance of different deep learning classification models. Standard deviation, training each model three times, is given in brackets.

Model | Cross-validation | Precision | Recall | F1 score
AlexNet | 10-fold | 0.941 (±0.002) | 0.941 (±0.002) | 0.941 (±0.002)
AlexNet | 5-fold | 0.924 (±0.002) | 0.924 (±0.002) | 0.924 (±0.002)
VGG16 | 10-fold | 0.983 (±0.001) | 0.983 (±0.001) | 0.983 (±0.001)
VGG16 | 5-fold | 0.985 (±0.002) | 0.985 (±0.002) | 0.985 (±0.002)
VGG19 | 10-fold | 0.986 (±0.002) | 0.986 (±0.002) | 0.986 (±0.002)
VGG19 | 5-fold | 0.988 (±0.003) | 0.988 (±0.003) | 0.988 (±0.003)
ResNet50 | 10-fold | 0.994 (±0.002) | 0.994 (±0.002) | 0.994 (±0.002)
ResNet50 | 5-fold | 0.993 (±0.002) | 0.993 (±0.002) | 0.993 (±0.002)
MobileNet V1 | 10-fold | 0.981 (±0.003) | 0.981 (±0.003) | 0.981 (±0.003)
MobileNet V1 | 5-fold | 0.980 (±0.002) | 0.980 (±0.002) | 0.980 (±0.002)
MobileNet V2 | 10-fold | 0.985 (±0.003) | 0.985 (±0.003) | 0.985 (±0.003)
MobileNet V2 | 5-fold | 0.984 (±0.003) | 0.984 (±0.003) | 0.984 (±0.003)


Fig. 5. Examples of classification performed by AlexNet. (a) Positive samples with their predicted label and confidence score. (b) Negative samples with their predicted label
and confidence score.

The whole network was fine-tuned using a learning rate of 2e-5 without freezing any layers. Fig. 4(b) and (c) show the performance plots of VGG16 and VGG19, respectively. Both plots show that the models converge well in the training process. Table 4 lists the detailed measurements of these two models.
For 10-fold cross-validation, VGG16 obtained an average accuracy of 98.3% with a standard deviation of (±0.001), while VGG19 achieved a comparable average accuracy of 98.6% (±0.002).
The pretrained ResNet50, MobileNet V1 and MobileNet V2 models were constructed in Keras as well, and all of these models were fine-tuned on our pollen dataset. Table 4 shows that the light-weight MobileNets achieved an accuracy comparable to the VGGNets, but with a slightly higher standard deviation. This means that the light models are not as robust as the heavy-weight models. ResNet50 obtained the highest performance, 99.4% with 10-fold cross-validation, among the six models investigated, due to its deeper network layers and its residual structure. Consequently, ResNet50 was selected as the best performing model among all the models that we implemented. In addition, both 5-fold and 10-fold cross-validation achieved comparable performance for all of the deep learning models studied.
Fig. 6 and Fig. 7 show the positive and negative samples classified by VGG16 and ResNet50, respectively. Compared with AlexNet (Fig. 5), the confidence scores of the pollen classified by VGG16 were higher. The reason is that VGG16 has deeper layers, which results in the extraction of more detailed and distinct features of the pollen data. ResNet50 has a much deeper and more complex network structure, and its classification accuracy was higher than that of VGG16 (Table 4). For the positive samples in Fig. 7(a), the confidence score of ResNet50 was almost 1.00, higher than that of VGG16. In the test dataset, only three negative samples, shown in Fig. 7(b), were misclassified, owing to the high performance of the ResNet50 model.
After analysing automatic classification models based on both machine learning and deep learning methods on our pollen dataset, we observed that the ResNet50 neural network reached an accuracy of 99.4% (±0.002), which is 4.9% higher than the hierarchical machine learning model. Deep learning-based methods thus perform better in classifying our Urticaceae pollen grains. In addition to our pollen image dataset, we have applied the deep learning classifiers to other pollen image datasets available to us. These were not used in training/testing but served as unseen samples to probe the classifiers from our study. The classification results with these additional datasets confirm the findings from this study. Early results with these extra datasets, based on VGG16, have already been reported in [12]. With our ResNet50 model, the results with unseen data are even better. These results are summarized in Supplementary Table 4.
3.3. Results on smaller-size image datasets

It is common knowledge that the training process of a deep learning model requires a large dataset. However, in daily practice, there are limitations on the collection of sufficient samples and images. Therefore, we examined the robustness of both machine learning-based and deep learning-based methods when facing a smaller dataset. Are machine learning-based and deep learning-based methods comparable in performance? To answer this question, starting from the original data, two smaller pollen image datasets, consisting of 1000-image and 500-image subsets, were constructed. These image subsets were randomly selected from the 6472 images, and the ratio of the 3 classes was 1:1:1. The experimental results on the smaller datasets shown in Table 5 are based on one round of selection.
On both smaller pollen datasets (1000 and 500 images), the same six deep learning-based models were applied. For the machine learning models, we re-tuned the hyperparameters of the best performing flat model (SVM) and hierarchical model (SVM + SVM). Table 5 shows the performance of both the machine learning-based and deep learning-based methods on the two smaller image datasets. Compared with the 88% accuracy of the flat model on the 1000-image dataset, the 93.9% accuracy obtained by the hierarchical model demonstrated that the hierarchical strategy improves performance. The performance of the hierarchical model was, however, still lower than that of the deep learning models, except for AlexNet; we obtained similar results with the larger 6472-image dataset (Table 3 and Table 4). This is probably due to the fact that AlexNet has a shallow layer structure, which includes only five convolutional layers and three fully connected layers. The results indicate that manually extracting as many features as possible, combined with a hierarchical strategy, outperforms a shallow deep learning neural network such as AlexNet. For the 500-image dataset we obtained similar results.
In order to obtain sufficient information for a statistical analysis of the performance of the models, we used a cross-validation approach over the entire image set with two differently sized groups of subsets.¹ We implemented 5-fold cross-validation to select five image subsets from the 6472-image dataset, as well as 10-fold cross-validation to select ten smaller image subsets. With this selection method, the average performance over all subsets was compared between the different models; the results are given in Supplementary Table 5. The experiments confirmed that ResNet50 achieved the best performance on both the 1000- and 500-sized datasets.
¹ The two differently sized groups of subsets consist of 1000- and 500-sized image subsets; these sizes are given as an indication, the real number is slightly higher.
Deep learning-based methods show a better performance on both the large and the smaller pollen datasets. An ablation study was conducted to help understand why this difference was found. Convolutional layers of deep neural networks can capture more representative features than manually extracted handcrafted features. We visualized intermediate feature maps of VGG16 and ResNet50 in Fig. 8 and Fig. 9 to provide extra insight into the feature extraction procedure. In each layer of the model, different features are extracted. In Fig. 8, the feature maps of convolutional layers 1, 4 and 7 of VGG16 are shown. In Fig. 9, we show the feature maps of the convolutional layer in stage 1 and of the three bottlenecks in stage 2 of ResNet50; the structure of ResNet50 can be divided into five stages [35], where stage 1 consists of one convolutional layer and stages 2-5 consist of different numbers of bottleneck structures. From both feature maps, we can conclude that in the first several convolutional layers basic pollen features such as edges and textures (surface ornamentation) are clearly displayed, as was also found in [12]. With an increasing number of network layers, more and more complex and abstract features influence the performance of pollen classification. For example, in convolutional layer 4 of VGG16 and the first bottleneck in stage 2 of ResNet50, other important parts of the pollen, such as the pores, are highlighted. In the higher layers of the network, only the most representative features are retained, but these features are difficult to interpret. With the help of a deep convolutional network that can extract features from low level (detail) to high level (abstract), the best score in pollen classification tasks is achieved.
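The feature maps of Figs. 8 and 9 can be reproduced by probing intermediate layers. The sketch below builds such a probe model in Keras; the layer names are the standard ones of Keras' ResNet50 and match the column headers of Fig. 9:

import tensorflow as tf

resnet = tf.keras.applications.ResNet50(weights='imagenet')
layer_names = ['conv1_conv', 'conv2_block1_out',
               'conv2_block2_out', 'conv2_block3_out']
probe = tf.keras.Model(resnet.input,
                       [resnet.get_layer(n).output for n in layer_names])
# feature_maps = probe(tf.expand_dims(pollen_image, 0))  # one preprocessed image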
In addition, all of the techniques, i.e., transfer learning, data augmentation and hard voting, clearly contributed to improving the performance of the deep learning models under study. Table 6 shows to what extent the accuracy can be improved by the different techniques, applied to the (around) 1000-sized image subsets. Five 1000-sized image subsets were selected via 5-fold cross-validation. The average performance over the five subsets was calculated and the results are shown in Table 6. The first row shows that ResNet50 achieved 81.4% accuracy when the model was trained from scratch. Transfer learning improved the accuracy to 95.0%, using pre-trained parameters trained on the ImageNet dataset. Starting from the 95.0% accuracy of transfer learning, data augmentation improved the accuracy of the ResNet50 model to 96.2%.

This accuracy is 1.2% higher than without data augmentation, as shown in the second row. Data augmentation helped to increase the variety and the size of the image data. Hard voting predicted the class labels with the majority of votes of the different classification models. The combination of all these techniques significantly improved the accuracy of the ResNet50 model, to 97.5%. The third row of Table 6 shows that without transfer learning the accuracy of the ResNet50 model was only 86.1%. We can conclude that, in this study, transfer learning plays a more important role in the performance of the deep learning models than data augmentation and hard voting. Because of these advanced techniques, we achieved great success with our deep learning models in the classification of our Urticaceae pollen data. Supplementary Table 6 shows the ablation study of ResNet50 based on the 10-fold cross-validation selection method.


Fig. 6. Examples of classification performed by VGG16. (a) Positive samples with their predicted label and confidence score. (b) Negative samples with their predicted label
and confidence score.

Fig. 7. Examples of classification performed by ResNet50. (a) Positive samples with their predicted label and confidence score. (b) Negative samples with their predicted label
and confidence score.

Table 5
Performance comparison of different methods on smaller-size image datasets. Standard deviation, training each model three times, is given in brackets.

Dataset | AlexNet | VGG16 | VGG19 | ResNet50 | MobileNet V1 | MobileNet V2 | Flat model | Hierarchical model
1000 images | 0.916 (±0.006) | 0.943 (±0.006) | 0.943 (±0.012) | 0.963 (±0.012) | 0.947 (±0.015) | 0.950 (±0.010) | 0.880 | 0.939
500 images | 0.861 (±0.032) | 0.920 (±0.000) | 0.920 (±0.020) | 0.933 (±0.012) | 0.927 (±0.012) | 0.907 (±0.023) | 0.760 | 0.896

This accuracy of 96.2% is 1.2% higher than without data augmentation, as shown in the second row of Table 6. Data augmentation helped to increase the variety and the size of the image data. Hard voting predicted the class labels by taking the majority vote of different classification models. The combination of all these techniques significantly improved the accuracy of the ResNet50 model, to 97.5%. The third row of Table 6 shows that without transfer learning, the accuracy of the ResNet50 model was only 86.1%. We can conclude that, in this study, transfer learning plays a more important role in the performance of the deep learning models than data augmentation and hard voting do. Owing to these techniques, our deep learning models performed very well on the classification of our Urticaceae pollen data. Supplementary Table 6 shows the ablation study of ResNet50 based on the 10-fold cross-validation selection method.
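To make the three techniques concrete, the sketch below outlines how they can be combined for ResNet50 in Keras. It is an illustrative outline under our own assumptions: the input size, augmentation settings, optimizer and ensemble size are placeholders, not the exact configuration used in this study.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 3  # Parietaria, Urtica, Urtica membranacea

def build_resnet50(pretrained=True, augment=True):
    # Transfer learning: start from ImageNet weights instead of training
    # from scratch.
    base = keras.applications.ResNet50(
        weights="imagenet" if pretrained else None,
        include_top=False, input_shape=(224, 224, 3), pooling="avg")
    inputs = keras.Input(shape=(224, 224, 3))
    x = inputs
    if augment:
        # Data augmentation: label-preserving transforms, active during
        # training only (the transforms here are illustrative choices).
        x = layers.RandomFlip("horizontal_and_vertical")(x)
        x = layers.RandomRotation(0.1)(x)
    x = keras.applications.resnet.preprocess_input(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(base(x))
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

def hard_vote(models, images):
    # Hard voting: every model casts one vote per image; the majority wins.
    votes = np.stack([m.predict(images).argmax(axis=1) for m in models])
    majority = lambda v: np.bincount(v, minlength=NUM_CLASSES).argmax()
    return np.apply_along_axis(majority, axis=0, arr=votes)

Training one such model per cross-validation fold and combining the trained models with hard_vote mirrors the ensemble idea behind the last column of Table 6.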
Fig. 8. Example of feature maps of VGG16. (a) Parietaria. (b) Urtica. (c) Urtica membranacea. Column 1 represents the input data; columns 2–4 are the outputs of convolutional layers 1, 4, and 7, respectively.

Fig. 9. Example of feature maps of ResNet50. (a) Parietaria. (b) Urtica. (c) Urtica membranacea. Column 1 represents the input data; columns 2–5 are the output of the convolutional layer in stage 1 and the outputs of the three bottlenecks in stage 2 of ResNet50, respectively.
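Feature maps such as those in Figs. 8 and 9 can be obtained by reading out intermediate activations of a trained network. The snippet below is one possible way to do this in Keras; the probed layer names follow the standard Keras ResNet50 naming and match the columns of Fig. 9, while the surrounding code is our own sketch rather than the script used for the figures.

import numpy as np
from tensorflow import keras

# Layers to visualize; these names exist in the stock Keras ResNet50 model
# and correspond to the columns of Fig. 9.
LAYERS = ["conv1_conv", "conv2_block1_out", "conv2_block2_out",
          "conv2_block3_out"]

base = keras.applications.ResNet50(weights="imagenet", include_top=False)
probe = keras.Model(inputs=base.input,
                    outputs=[base.get_layer(name).output for name in LAYERS])

def feature_maps(image_batch):
    """Return one activation tensor per probed layer for a batch of images."""
    x = keras.applications.resnet.preprocess_input(
        np.asarray(image_batch, dtype="float32"))
    return probe.predict(x)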

Table 6
Ablation study with ResNet50. The average performance of ResNet50 over five roughly 1000-image subsets, selected via the 5-fold cross-validation selection method, is given; the standard deviation over the five subsets is given in brackets. Rows 2 and 3 give the results obtained without data augmentation and without transfer learning, respectively.

                             Training from     With/without        With/without data   With hard
                             scratch           transfer learning   augmentation        voting
All techniques               0.814 (±0.025)    0.950 (±0.017)      0.962 (±0.004)      0.975 (±0.002)
Without data augmentation    0.814 (±0.025)    0.950 (±0.017)      0.950 (±0.017)      0.971 (±0.022)
Without transfer learning    0.814 (±0.025)    0.814 (±0.025)      0.837 (±0.026)      0.861 (±0.023)

4. Conclusion

This study aimed to find the automatic classification model with the best performance for classifying Urticaceae pollen grains. Pollen grains of this family have high morphological similarity while they induce different allergenic levels. Few researchers have focused on classification of pollen of the Urticaceae nettle family to genus and species level. For our research, a pollen grain image dataset of the Urticaceae family was constructed, consisting of 6472 images. The pollen grains were unacetolyzed and, to our knowledge, such images had not been used for pollen image classification tasks before, except in our own previous work [12]. Two approaches to image classification, machine learning-based methods and deep learning-based methods, were implemented and analysed. For the machine learning-based methods, six texture features and two moment features were extracted; subsequently, several popular feature selection techniques and classifiers were applied. Compared with flat classification models, a hierarchical strategy was confirmed to be clearly more successful on this classification task: among the different machine learning methods, the highest performance of 94.5% accuracy was achieved by the hierarchical classification models. For the deep learning-based methods, six well-established deep Convolutional Neural Networks were used to perform the classification task. Together with data augmentation, cross-validation and hard voting techniques, the pretrained ResNet50 model, which achieved an accuracy of 99.4% (±0.002), was considered the best classification model among the six models investigated.
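As an illustration of the hierarchical idea (first separate the easily recognized Urtica membranacea, then let a second classifier handle the two similar genera), a minimal sketch with scikit-learn could look as follows; the kernel choice and hyperparameters are placeholders, not the tuned settings from this study.

import numpy as np
from sklearn.svm import SVC

# Assumed label coding: 1 = Parietaria, 2 = Urtica,
# 3 = Urtica membranacea (the easy class).
class HierarchicalSVM:
    def __init__(self):
        self.stage1 = SVC(kernel="rbf")  # U. membranacea vs. the rest
        self.stage2 = SVC(kernel="rbf")  # Parietaria vs. Urtica

    def fit(self, X, y):
        self.stage1.fit(X, y == 3)            # binary: easy class or not
        rest = y != 3
        self.stage2.fit(X[rest], y[rest])     # genus level on the hard pair
        return self

    def predict(self, X):
        pred = np.full(len(X), 3)
        hard = ~self.stage1.predict(X).astype(bool)  # not U. membranacea
        if hard.any():
            pred[hard] = self.stage2.predict(X[hard])
        return pred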
From our comparison of machine learning-based with deep learning-based methods, we conclude that deep learning-based methods perform better for pollen image classification. Two additional experiments demonstrated that deep learning models are more successful on both large and smaller-sized datasets. One reason is that deep learning models can extract more representative features of pollen images, from the low (detailed) to the high (abstract) level, whereas the performance of machine learning methods is highly dependent on the quality of the features extracted from the image dataset. In addition, transfer learning, data augmentation and hard voting techniques drastically improved the performance of the deep learning models; an ablation study showed that the accuracy of the deep learning models improved step by step as these techniques were added. Deeper networks such as Inception-V3, DenseNets, and NASNets have been shown to perform well on datasets in the public domain. Nevertheless, ResNet50 already yielded an accuracy of 99.4% on our dataset; we may apply these deeper networks to a larger dataset in the future. Our work clearly demonstrates what automatic classification methods can accomplish for highly similar images of pollen species in the Urticaceae family. This technique can be applied more broadly to similar pollen from other families, and the method could potentially be extended to cope with other image classification tasks.

CRediT authorship contribution statement

Chen Li: Conceptualization, Investigation, Methodology, Validation, Writing - original draft. Marcel Polling: Conceptualization, Investigation, Resources, Writing - review & editing. Lu Cao: Conceptualization, Investigation, Writing - review & editing, Supervision. Barbara Gravendeel: Resources, Writing - review & editing. Fons J. Verbeek: Conceptualization, Writing - review & editing, Supervision.

Data availability

Data will be made available on request.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Chinese Scholarship Council through Leiden University and the European Union's Horizon 2020 research and innovation programme under H2020 MSCA-ITN-ETN Grant agreement No 765000 Plant.ID. We would like to thank Xiaoqin Tang, Zhan Xiong, Shima Javanmardi and other colleagues who assisted us through feedback during discussions and meetings.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.neucom.2022.11.042.

References

[1] D. Gennaro, J.C.-N. Herberto, P.M.O. Olga, et al., The effects of climate change on respiratory allergy and asthma induced by pollen and mold allergens, Allergy: Eur. J. Allergy Clin. Immunol. 75 (9) (2020) 2219–2228, https://doi.org/10.1111/all.14476.
[2] T. Biedermann, L. Winther, S.J. Till, P. Panzner, A. Knulst, E. Valovirta, Birch pollen allergy in Europe (2019), https://doi.org/10.1111/all.13758.
[3] A.B. Gonçalves, J.S. Souza, G.G. Da Silva, M.P. Cereda, A. Pott, M.H. Naka, H. Pistori, Feature extraction and machine learning for the classification of Brazilian savannah pollen grains, PLoS ONE 11 (6), https://doi.org/10.1371/journal.pone.0157044.
[4] S. Battiato, A. Ortis, F. Trenta, L. Ascari, M. Politi, C. Siniscalco, Detection and classification of pollen grain microscope images, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2020, https://doi.org/10.1109/CVPRW50498.2020.00498.
[5] V. Sevillano, K. Holt, J.L. Aznarte, Precise automatic classification of 46 different pollen types with convolutional neural networks, PLoS ONE 15 (6), https://doi.org/10.1371/journal.pone.0229751.
[6] S. Dunker, E. Motivans, D. Rakosy, D. Boho, P. Mäder, T. Hornick, T.M. Knight, Pollen analysis using multispectral imaging flow cytometry and deep learning, New Phytologist, https://doi.org/10.1111/nph.16882.
[7] M. Pospiech, Z. Javrková, P. Hrabec, P. Štarha, S. Ljasovská, J. Bednář, B. Tremlová, Identification of pollen taxa by different microscopy techniques, PLoS ONE, https://doi.org/10.1371/journal.pone.0256808.
[8] G.C. Manikis, K. Marias, E. Alissandrakis, L. Perrotto, E. Savvidaki, N. Vidakis, Pollen grain classification using geometrical and textural features, in: IST 2019 – IEEE International Conference on Imaging Systems and Techniques Proceedings, 2019, https://doi.org/10.1109/IST48021.2019.9010563.
[9] A. Daood, E. Ribeiro, M. Bush, Pollen grain recognition using deep learning, in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10072, 2016, pp. 321–330, https://doi.org/10.1007/978-3-319-50835-1_30.
[10] J. Schiele, F. Rabe, M. Schmitt, M. Glaser, F. Haring, J.O. Brunner, B. Bauer, B. Schuller, C. Traidl-Hoffmann, A. Damialis, Automated classification of airborne pollen using neural networks, in: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society EMBS, 2019, pp. 4474–4478, https://doi.org/10.1109/EMBC.2019.8856910.
[11] V. Sevillano, J.L. Aznarte, Improving classification of pollen grain images of the POLEN23E dataset through three different applications of deep learning convolutional neural networks, PLoS ONE 13 (9), https://doi.org/10.1371/journal.pone.0201807.
[12] M. Polling, C. Li, L. Cao, F. Verbeek, L.A. de Weger, J. Belmonte, C. De Linares, J. Willemse, H. de Boer, B. Gravendeel, Neural networks for increased accuracy of allergenic pollen monitoring, Sci. Rep., https://doi.org/10.1038/s41598-021-90433-x.
[13] M. del Pozo-Baños, J.R. Ticay-Rivas, J.B. Alonso, C.M. Travieso, Features extraction techniques for pollen grain classification, Neurocomputing 150 (2015) 377–391, https://doi.org/10.1016/j.neucom.2014.05.085.
[14] J.V. Marcos, R. Nava, G. Cristóbal, R. Redondo, B. Escalante-Ramírez, G. Bueno, Ó. Déniz, A. González-Porto, C. Pardo, F. Chung, T. Rodríguez, Automated pollen identification using microscopic imaging and texture analysis, Micron 68 (2015) 36–46, https://doi.org/10.1016/j.micron.2014.09.002.
[15] G. Astolfi, A.B. Gonçalves, G.V. Menezes, F.S.B. Borges, A.C.M.N. Astolfi, E.T. Matsubara, M. Alvarez, H. Pistori, POLLEN73S: An image dataset for pollen grains classification, Ecol. Inform. 60, https://doi.org/10.1016/j.ecoinf.2020.101165.
[16] M. Rodríguez-Damián, E. Cernadas, A. Formella, M. Fernández-Delgado, P. De Sá-Otero, Automatic detection and classification of grains of pollen based on shape and texture, IEEE Trans. Syst., Man Cybern. Part C: Appl. Rev. 36 (4) (2006) 531–542, https://doi.org/10.1109/TSMCC.2005.855426.
[17] W. Rasband, ImageJ, National Institutes of Health, Bethesda, Maryland, USA.
[18] F. Aguet, D. Van De Ville, M. Unser, Model-based 2.5-D deconvolution for extended depth of field in brightfield microscopy, IEEE Trans. Image Process. 17 (7) (2008) 1144–1153, https://doi.org/10.1109/TIP.2008.924393.

[19] P. Bountris, E. Farantatos, N. Apostolou, Advanced image analysis tools development for the early stage bronchial cancer detection, World Academy of Science, Engineering and Technology 1 (9).
[20] L. Cao, M. de Graauw, K. Yan, L. Winkel, F.J. Verbeek, Hierarchical classification strategy for phenotype extraction from epidermal growth factor receptor endocytosis screening, BMC Bioinform. 17 (196), https://doi.org/10.1186/s12859-016-1053-2.
[21] G.A. Dunn, A.F. Brown, Alignment of fibroblasts on grooved surfaces described by a simple geometric transformation, J. Cell Sci. 83 (1986) 313–340.
[22] C. Chudyk, H. Castaneda, R. Leger, I. Yahiaoui, F. Boochs, Development of an automatic pollen classification system using shape, texture and aperture features, in: LWA 2015 Workshops: KDML, FGWM, IR, and FGDB, 2015, pp. 65–74.
[23] M. Rodríguez-Damián, E. Cernadas, A. Formella, P. Sá-Otero, Pollen classification using brightness-based and shape-based descriptors, in: Proceedings – International Conference on Pattern Recognition, vol. 2, 2004, pp. 212–215, https://doi.org/10.1109/icpr.2004.1334098.
[24] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: Proceedings – 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, 2005, https://doi.org/10.1109/CVPR.2005.177.
[25] R.M. Haralick, I. Dinstein, K. Shanmugam, Textural features for image classification, IEEE Trans. Syst., Man Cybern. 3 (6) (1973) 610–621, https://doi.org/10.1109/TSMC.1973.4309314.
[26] M.K. Hu, Visual pattern recognition by moment invariants, IRE Trans. Inform. Theory 8 (2) (1962) 179–187, https://doi.org/10.1109/TIT.1962.1057692.
[27] J. Tang, S. Alelyani, H. Liu, Feature selection for classification: A review, in: Data Classification: Algorithms and Applications, no. 28, 2014, pp. 37–64, https://doi.org/10.1201/b17320.
[28] D. Jain, V. Singh, Feature selection and classification systems for chronic disease prediction: A review (2018), https://doi.org/10.1016/j.eij.2018.03.002.
[29] M.J. Cossetin, J.C. Nievola, A.L. Koerich, Facial expression recognition using a pairwise feature selection and classification approach, in: Proceedings of the International Joint Conference on Neural Networks, 2016, pp. 5149–5155, https://doi.org/10.1109/IJCNN.2016.7727879.
[30] M.C. Popescu, L.M. Sasu, Feature extraction, feature selection and machine learning for image classification: A case study, in: 2014 International Conference on Optimization of Electrical and Electronic Equipment, OPTIM 2014, 2014, pp. 968–973, https://doi.org/10.1109/OPTIM.2014.6850925.
[31] C.N. Silla, A.A. Freitas, A survey of hierarchical classification across different application domains (2011), https://doi.org/10.1007/s10618-010-0175-9.
[32] S. Kiritchenko, S. Matwin, A.F. Famili, Functional annotation of genes using hierarchical text categorization, in: Proc. of the BioLINK SIG: Linking Literature, Information and Knowledge for Biology (held at ISMB-05).
[33] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Adv. Neural Inform. Process. Syst. (2012).
[34] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[35] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778, https://doi.org/10.1109/CVPR.2016.90.
[36] A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[37] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.C. Chen, MobileNetV2: Inverted residuals and linear bottlenecks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520, https://doi.org/10.1109/CVPR.2018.00474.
[38] A. Mahbod, G. Schaefer, R. Ecker, I. Ellinger, Pollen grain microscopic image classification using an ensemble of fine-tuned deep convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[39] D.G. Arias, M.V.M. Cirne, J.E. Chire, H. Pedrini, Classification of pollen grain images based on an ensemble of classifiers, in: Proceedings – 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017, 2017, pp. 234–240, https://doi.org/10.1109/ICMLA.2017.0-153.

Chen Li received her Master's degree from China Agricultural University in 2018. She is a Ph.D. candidate in the Bioinformatics group at the Leiden Institute of Advanced Computer Science (LIACS), Leiden University, the Netherlands. Her research interests focus on image analysis and computer vision, especially bioimage analysis.

Marcel Polling received his Ph.D. degree at Naturalis Biodiversity Center (the Netherlands) and the Natural History Museum, University of Oslo, Norway, in 2021. He has a background in palynology, having studied Earth Sciences and worked as a biostratigraphic consultant in Wales, United Kingdom for 4.5 years. During his Ph.D. he focused on innovative methods of pollen recognition, including automatic image recognition and DNA metabarcoding. He is currently employed at Wageningen Environmental Research as a Molecular Ecologist, mainly working on projects utilizing genetics to study and monitor biodiversity.

Lu Cao received her Ph.D. from LIACS, Leiden University, the Netherlands, in 2014. Her background is in Computer Science and her research focuses on high-throughput imaging and image and data analysis. Between 2014 and 2016, she worked as a postdoc researcher in the Department of Anatomy and Embryology, Leiden University Medical Center (LUMC). In 2017, she worked in the Department of Applied Stem Cell Technologies (AST) at the University of Twente as a postdoctoral fellow. Before joining LIACS, she worked at Cytocypher B.V. as an image and data analyst. She has been an assistant professor at LIACS since 2019.

Barbara Gravendeel leads the Evolutionary Ecology group at Naturalis Biodiversity Center in Leiden, the Netherlands. She received her Ph.D. in Plant Systematics at Leiden University in 2000. After having been a Fulbright Visiting Professor at Harvard University from 2006 to 2007, Barbara was a full professor at the University of Applied Sciences Leiden from 2011 to 2018. Since 2019, Barbara has held a chair in Plant Evolution at the Radboud Institute for Biological and Environmental Sciences in Nijmegen, the Netherlands. She has a background in Botany and her research focuses on rapid evolutionary changes in plants.

Fons Verbeek is head of the Imaging & Bioinformatics group at LIACS, Leiden University, the Netherlands. He received his Ph.D. in Applied Physics from Delft University of Technology and subsequently worked as a postdoc researcher at the University of Amsterdam. After that he worked as a research fellow at the Hubrecht Institute in Utrecht. In 2003, he started working at LIACS, Leiden University, as a researcher/group leader. In 2008, he became an associate professor. As of 2013 he was affiliated with Fiocruz, Rio de Janeiro, Brazil, as a visiting professor. In 2016, he was appointed full professor at Leiden University, the Netherlands. His research area includes image analysis with a focus on microscopy imaging, and his research stretches from feature extraction and segmentation to classification strategies in the domain of the life sciences.
