Abstract- In this paper, a new type of multilayer rule-based classifier is proposed and applied to image
classification problems. The proposed approach is entirely data-driven and fully automatic. It is generic and can
be applied to various classification and prediction problems, but in this paper we focus on image processing, in
particular. The core of the classifier is a fully interpretable, understandable, self-organised set of IF…THEN…
fuzzy rules based on the prototypes autonomously identified by using a one-pass type training process. The
classifier can self-evolve and be updated continuously without a full retraining. Due to the prototype-based
nature, it is non-parametric; its training process is non-iterative, highly parallelizable and computationally
efficient. At the same time, the proposed approach is able to achieve very high classification accuracy on
various benchmark datasets, surpassing most of the published methods and approaching human performance.
In addition, it can start classification from the first image of each class in the same way as humans do, which
makes the proposed classifier suitable for real-time applications. Numerical examples of benchmark image
processing demonstrate the merits of the proposed approach.
Keywords- fuzzy rule based classifiers, deep learning, non-parametric, non-iterative, self-evolving structure
1. Introduction
Nowadays, deep learning has gained a lot of popularity in both the academic circles and the general public
thanks to the very quick advance in computational resources (both hardware and software) [20], [26]. A number
of publications have demonstrated that deep convolutional neural networks (DCNNs) can produce highly
accurate results in various image processing problems including, but not limited to, handwritten digits
recognition [12], [13], [21], [40], object recognition [18], [23], [42], human action recognition [10], [41], human
face recognition [19], [33], [46], remote sensing image classification [44], [50], etc. Some publications suggest
that the DCNNs can match the human performance on handwritten digits recognition problems [12], [13].
Indeed, DCNN is a powerful technique that provides high classification rates. There are also recently introduced
approaches exploiting deep models for image understanding [31], [32] by learning informative hidden
representations from visual features of images through DCNNs.
However, DCNNs have a number of deficiencies and shortcomings. For example, they require a huge
amount of training data, are usually offline, lack transparency and their internal parameters cannot be easily
interpreted; they involve ad hoc decisions concerning the internal structure; they have no proven guaranteed
convergence; they have limited parallelization ability. It is also well-known that DCNN-based approaches are
not able to deal with uncertainty. They perform classification quite well when the validation images share
similar feature properties with the training images, however, they require a full retraining for images from
unseen classes as well as for images with feature properties different from that of the training images.
On the other hand, traditional fuzzy rule-based (FRB) systems are well known for being an efficient
approach to deal with uncertainties. FRB systems have been successfully used for classification [8], [24]
offering transparent and interpretable structure. Their design also traditionally requires handcrafting
membership functions, assumptions to be made and parameters to be selected. More recently, very efficient
data-driven FRB classifiers were proposed which can learn autonomously from the data (streams) [2], [8], and
self-evolve, however even they could not reach the levels of performance achieved by deep learning classifiers
mainly because of their quite simple and small internal structure.
In this paper, we offer a principally new approach, which combines the advantages of the recently
introduced self-organising non-parametric FRB systems [2], [7], applied to classification problems [3], with the
concept of a massively parallel multi-layer structure that deep learning benefits from. This results in a
principally new type of a multi-layer neuro-fuzzy architecture, which we call Deep Rule-Based (DRB) system
and demonstrate its performance on various image classification problems. The proposed DRB approach
employs a massively parallel set of 0-order fuzzy rules [3], [7], [8] as the learning engine, which self-organizes a
transparent and human understandable IF…THEN FRB system structure. Each IF…THEN… fuzzy rule of the
DRB system consists of a (large) number of prototypes, which are not pre-determined, but are identified through
a fully autonomous, online, non-iterative, non-parametric training process. These prototypes are the most
representative actual data samples (images) at which the data density reaches a local maximum (the locally
most typical images); they are used to automatically form data clouds (cluster-like groupings of data with similar
properties) by attracting the other data samples (images) to them [7]. The training process of the DRB system
can start “from scratch”, and more importantly, it can start classification from the first image of each class in the
same way as humans do, and is able to consistently self-evolve and self-update its structure and meta-parameters
with newly observed training images, which makes the proposed classifier suitable for real-time applications.
The proposed DRB approach is generic, but in this paper we limit our study to image
classification. We use only the very fundamental image transformation techniques such as normalization,
rotation, scaling and segmentation. In this way, the generalization ability of the well-known (low and high level)
feature descriptors from the field of computer vision, which we use (described in the next section) is further
improved. These pre-processing steps are common for the computer vision literature, but we do not use one
specific pre-processing technique which is often used (elastic deformation [12], [13]) because of its low
reproducibility and somewhat controversial nature.
The DRB classifier has a general architecture and, in comparison with the DCNN-based approaches, is simpler,
entirely data-driven and fully automatic, yet it is able to perform highly accurate classification on
various benchmark problems surpassing the state-of-the-art methods, including mainstream deep learning. Its
prototype-based nature also allows the training process to be non-parametric, non-iterative and highly
parallelizable since it concerns only the visual similarity between the identified prototypes and the unlabelled
samples. As a result, it is faster by several orders of magnitude, does not require accelerated hardware such as
GPU, HPC and can be ported on chip and still be continuously learning.
Moreover, thanks to the fact that only the general principles are involved in the proposed approach, the
DRB system can be easily modified and extended to various classification and prediction problems. In
summary, if compared with the state-of-the-art approaches, the proposed DRB classifier has the following
unique properties:
i) it is free from prior assumptions and user- and problem- specific parameters;
ii) it offers a human-interpretable and self-evolving structure;
iii) its training process is fully online, transparent, non-iterative, non-parametric (it is prototype-based);
iv) its training process can start “from scratch”;
v) its training process is highly parallelizable.
Numerical experiments based on various benchmark image classification datasets (handwritten digits
recognition, remote sensing image recognition and object recognition) demonstrate its excellent performance.
The remainder of this paper is organized as follows. Section 2 introduces the general multi-layer
architecture of the proposed approach. Section 3 briefly describes the feature descriptors involved in the DRB
classifier. The training process and validation process of the proposed DRB classifier are presented in Section 4.
Numerical examples are given in Section 5, and this paper is concluded by Section 6.
3. Feature Extraction
In this section, we will briefly describe the feature descriptors that are employed in the DRB classifier to
make it self-contained. Feature extraction can be viewed as a projection from the original images to a feature
space that makes the images from different classes separable, namely, a mapping I → x. Current feature descriptors can
be divided into three categories based on their descriptive abilities [44], namely: “low-level”, “medium-level”
and “high-level”. Different feature descriptors have different advantages. In general, low-level feature
descriptors work very well on problems where low-level visual features, e.g., spectral, texture, and structure,
play the dominant role. In contrast, high-level feature descriptors work better on classifying images with high-
diversity and nonhomogeneous spatial distributions because they can learn more abstract and discriminative
semantic features.
In this paper, two low-level feature descriptors (GIST and HOG) are employed, and we further create a
combination of both to improve their descriptive ability. However, as the low-level feature descriptors are not
enough to handle efficiently complex, large-scale problems, we also use one of the most widely used high-level
feature descriptors (a pre-trained VGG-VD-16 [42]). It has to be stressed that the high-level feature descriptor is
directly used without further tuning and is a part of the pre-processing layer.
As there is no interdependence of different images within the feature extraction stage, it can be parallelized
massively to further reduce the processing time. Once the global features (either low- or high-level) of the image
are extracted and stored, there is no need to repeat the same process again.
We also have to stress that this paper describes a general DRB approach and the feature descriptors are not
necessarily limited to GIST, HOG or the pre-trained VGG-VD-16 only. Alternative feature descriptors can be
used, e.g., CaffeNet [22], SIFT [34], etc., and further combinations of different visual features can also be
considered. One may further consider refining the commonly used visual features into more informative
representations by uncovering an appropriate latent subspace [30]. However, selecting the most suitable feature
descriptor(s) for a particular problem requires prior knowledge about the problem, and this is out of the scope of
this paper.
3.1. Employed Low-Level Feature Descriptors
A. GIST Descriptor
The GIST feature descriptor gives an impoverished and coarse version of the principal contours and textures of
an image [38]. In the proposed DRB classifier, we use the same GIST descriptor as described in [38] without
any modification; it extracts a 1×512 dimensional feature vector denoted by g(I) = [g1(I), g2(I), ..., g512(I)].
B. HOG Descriptor
The HOG descriptor [14] has been proven to be very successful in various computer vision tasks, such as object
detection, texture analysis and image classification. In the DRB classifier, although the size of the images varies
for different problems, we used the default block size of 2×2 and changed the cell size to fix the
dimensionality of the HOG features to 1×576, denoted by h(I) = [h1(I), h2(I), ..., h576(I)].
To improve the distinctiveness of the HOG feature vectors of images between different classes, we expand
the value range of the HOG vectors by a nonlinear nonparametric function [4], [5] (expression (1)), where

sgn(x) = 1 for x > 0; 0 for x = 0; −1 for x < 0,

and the nonlinearly mapped HOG feature vector of I is denoted by h̄(I).
C. Combined GIST-HOG Features
To further improve the descriptive ability of the GIST and HOG feature descriptors, in this paper, we
further combine the GIST and HOG feature vectors to create a new, more descriptive integrated vector as
follows:
f(I) = [ g(I)/‖g(I)‖ , h̄(I)/‖h̄(I)‖ ],          (2)
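As an illustration (our own sketch, not the authors' code), the integrated vector of expression (2) can be built in NumPy; the 1×512 GIST and 1×576 HOG dimensionalities follow the descriptors above, while the random vectors merely stand in for real descriptor outputs:

```python
import numpy as np

def combined_feature(gist: np.ndarray, hog: np.ndarray) -> np.ndarray:
    """Concatenate per-descriptor L2-normalised GIST and HOG vectors, expression (2)."""
    # Each sub-vector is scaled by its own Euclidean norm before concatenation,
    # so that neither descriptor dominates the integrated representation.
    return np.concatenate([gist / np.linalg.norm(gist), hog / np.linalg.norm(hog)])

# Hypothetical 1x512 GIST and 1x576 (mapped) HOG vectors for one image.
rng = np.random.default_rng(0)
g, h = rng.random(512), rng.random(576)
f = combined_feature(g, h)
print(f.shape)  # (1088,)
```

Because each half is unit-norm, the combined vector always has Euclidean norm √2, regardless of the raw descriptor scales.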
However, as the pre-trained model requires input images of the size of 227×227 pixels, it is, in fact,
not well suited to handling problems with small-size images of simple semantic content.
IF (I ~ Pc,1) OR (I ~ Pc,2) OR … OR (I ~ Pc,Nc) THEN (class c)          (3)

where "~" denotes similarity, which can also be seen as a fuzzy degree of satisfaction/membership [7] or
typicality [6]; I is a particular image and x is its corresponding feature vector; x can be g(I), h̄(I),
f(I) or v(I); Pc,j is the jth visual prototype of the cth class; pc,j is the corresponding feature vector of Pc,j
and has the same dimensionality as x; j = 1, 2, ..., Nc; Nc is the number of prototypes of the cth class;
c = 1, 2, ..., C.
Examples of AnYa type fuzzy rules generalized from the popular handwritten digits recognition problem,
MNIST dataset [27] for digits “2”, “3”, “5” and “8” are visualized in Table II. As we can see, AnYa type fuzzy
rules in the table provide a very intuitive representation of the classification mechanism. Moreover, each AnYa type
fuzzy rule can be interpreted as a number of simpler fuzzy rules, each with a single prototype, connected by the "OR"
operator. As a result, massive parallelization is possible.
Table II. Illustrative Example of AnYa Fuzzy Rules with MNIST Dataset (the prototypes are actual handwritten digit images; they are not reproduced here)

IF (I ~ [prototype 1]) OR (I ~ [prototype 2]) OR … OR (I ~ [prototype Nc]) THEN (digit 3)
IF (I ~ [prototype 1]) OR (I ~ [prototype 2]) OR … OR (I ~ [prototype Nc]) THEN (digit 5)
IF (I ~ [prototype 1]) OR (I ~ [prototype 2]) OR … OR (I ~ [prototype Nc]) THEN (digit 8)
In the remainder of this section, we will describe the training and validation processes as well as the
decision-making mechanism of the proposed DRB classifier.
4.1. Training of the DRB System
Due to the highly parallel structure of the proposed system, in this subsection, we summarize the main
procedure of the training process of a single FRB subsystem, namely the c th one.
Stage 0: System Initialization
The cth FRB subsystem is initialized by the first image of the cth class, Ic,1. We firstly apply vector
normalization to the global feature vector of Ic,1, denoted by xc,1 (xc,1 = [xc,1,1, xc,1,2, ..., xc,1,d]; d is the
dimensionality):

x̄c,1 = xc,1 / ‖xc,1‖          (4)
With the vector normalization, the Euclidean distance between two normalized data samples z̄i and z̄j depends only on the angle between them: ‖z̄i − z̄j‖² = 2(1 − cos(θi,j)). The meta-parameters of the cth FRB subsystem and of its first data cloud are then initialized as follows:

k ← 1;  μc ← x̄c,1;  Nc ← 1;  Pc,Nc ← Ic,1;  p̄c,Nc ← x̄c,1;  Sc,Nc ← 1;  rc,Nc ← ro          (5)
where k is the current time instance; μc is the global mean of all the observed data samples of the cth class;
p̄c,Nc is the mean of the feature vectors of the images associated with the first data cloud with the visual prototype
Pc,Nc; Sc,Nc is the number of images associated with the data cloud; rc,Nc is the radius of the area of influence of the data
cloud; ro is a small value used to stabilize the initial status of newly formed data clouds. Data clouds are very
much like clusters, but are nonparametric and do not have a specific pre-determined, regular shape. They
directly represent the local ensemble properties of the observed data samples [7].
In this paper, we use ro = √(2(1 − cos 30°)) to define the degree of similarity at the edge of a data cloud.
We need to stress that ro is not a problem-specific parameter and requires no prior knowledge to be determined.
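For reference, this fixed value can be computed directly (a small sketch of ours, not part of the original paper):

```python
import math

# r_o = sqrt(2 * (1 - cos(30 deg))): the Euclidean distance between two
# unit-normalised vectors separated by a 30-degree angle, used as the
# initial radius of every newly formed data cloud.
r_o = math.sqrt(2 * (1 - math.cos(math.radians(30))))
print(round(r_o, 4))  # 0.5176
```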
Stage 1: Preparation
For the newly arrived kth (k ← k + 1) training image that belongs to the cth class, denoted by Ic,k, we
firstly apply the vector normalization (expression (4)) to its corresponding feature vector: x̄c,k = xc,k/‖xc,k‖. Then
the global mean μc is updated as follows:

μc ← ((k − 1)/k) μc + (1/k) x̄c,k          (6)
We then calculate the data densities of all the existing prototypes Pc,j (j = 1, 2, ..., Nc, where Nc is the
number of identified prototypes) and of Ic,k, as detailed in [6]:

D(Pc,j) = 1 / (1 + ‖p̄c,j − μc‖² / σc²)          (7a)

D(Ic,k) = 1 / (1 + ‖x̄c,k − μc‖² / σc²)          (7b)

where σc² = Xc − ‖μc‖²; Xc is the average squared norm of the observed normalized data samples, which is
equal to 1, so that σc² = 1 − ‖μc‖².
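A minimal NumPy sketch of the recursive mean update (6) and the density calculation (7), assuming unit-normalised samples so that σc² = 1 − ‖μc‖² (the function and variable names are ours, not the paper's):

```python
import numpy as np

def normalise(x):
    """Vector normalisation, expression (4)."""
    return x / np.linalg.norm(x)

def update_mean(mu, x_bar, k):
    """Recursive update of the class-wise global mean, expression (6)."""
    return (k - 1) / k * mu + x_bar / k

def density(v, mu):
    """Data density, expressions (7a)/(7b), at a unit-normalised vector v."""
    sigma2 = 1.0 - float(np.dot(mu, mu))  # scatter; positive once samples differ
    return 1.0 / (1.0 + float(np.dot(v - mu, v - mu)) / sigma2)

# Toy stream of 5 hypothetical feature vectors of one class.
rng = np.random.default_rng(1)
samples = [normalise(rng.random(8)) for _ in range(5)]
mu = samples[0]
for k, x in enumerate(samples[1:], start=2):
    mu = update_mean(mu, x, k)
print(all(0.0 < density(s, mu) <= 1.0 for s in samples))  # True
```

The density is bounded in (0, 1] and reaches its maximum where a sample coincides with the current mean, which is what makes the local maxima suitable as prototypes.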
Condition 1:

IF ( D(Ic,k) > max_{j=1,2,...,Nc} D(Pc,j) ) OR ( D(Ic,k) < min_{j=1,2,...,Nc} D(Pc,j) )
THEN ( Ic,k is a new prototype )          (8)
Once Condition 1 is satisfied, Ic,k is set to be a new prototype and it initializes a new data cloud (Nc ← Nc + 1):

Pc,Nc ← Ic,k;  p̄c,Nc ← x̄c,k;  Sc,Nc ← 1;  rc,Nc ← ro          (9)
If Condition 1 is not met, we find the nearest prototype to Ic,k, denoted by Pc,n, using equation (10):

Pc,n = argmin_{j=1,2,...,Nc} ‖x̄c,k − p̄c,j‖          (10)
Before we associate Ic,k with the data cloud of Pc,n, Condition 2 is checked to see whether Ic,k lies within
the area of influence of Pc,n:

Condition 2:

IF ( ‖x̄c,k − p̄c,n‖ ≤ rc,n ) THEN ( Ic,k is assigned to the data cloud of Pc,n )          (11)
If Condition 2 is met, I c , k is assigned to the data cloud formed around the prototype Pc , n and the meta-
parameters of this data cloud are updated as follows:
Sc,n ← Sc,n + 1;   p̄c,n ← ((Sc,n − 1)/Sc,n) p̄c,n + (1/Sc,n) x̄c,k;   rc,n² ← (rc,n² + σc,n²)/2          (12)

where σc,n² = 1 − ‖p̄c,n‖².
Otherwise, Ic,k is outside the influence area of the nearest data cloud and, therefore, a new data
cloud is initialized with Ic,k as its prototype (Nc ← Nc + 1). The meta-parameters of the new data cloud
are then set using expression (9).
The system then returns to Stage 1 to process the next image. After all the training samples have been processed, the
system goes to the final stage and generates the AnYa type fuzzy rule.
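The per-sample structure update described above can be sketched as follows (our own illustrative code; Condition 2 is written as a distance-versus-radius test since expression (11) is only summarised in this excerpt, and the data density is passed in as a function):

```python
import math
import numpy as np

R_O = math.sqrt(2 * (1 - math.cos(math.radians(30))))  # initial cloud radius

class Cloud:
    """One data cloud: prototype mean p, support S and squared radius r2."""
    def __init__(self, x):
        self.p, self.S, self.r2 = x.copy(), 1, R_O ** 2

def train_step(clouds, x, density):
    """One training step of a single FRB subsystem for a normalised vector x."""
    d_x = density(x)
    d_p = [density(c.p) for c in clouds]
    # Condition 1, expression (8): x is denser or sparser than every prototype.
    if d_x > max(d_p) or d_x < min(d_p):
        clouds.append(Cloud(x))                    # x becomes a new prototype
        return
    # Expression (10): nearest prototype.
    n = int(np.argmin([np.linalg.norm(x - c.p) for c in clouds]))
    c = clouds[n]
    if np.linalg.norm(x - c.p) ** 2 <= c.r2:       # Condition 2 (sketched)
        # Expression (12): update support, prototype mean and radius.
        c.S += 1
        c.p = (c.S - 1) / c.S * c.p + x / c.S
        c.r2 = 0.5 * (c.r2 + (1.0 - float(np.dot(c.p, c.p))))
    else:
        clouds.append(Cloud(x))                    # outside the influence area

# Toy run over 5 hypothetical unit-normalised samples of one class; for the
# sketch the class mean is held fixed instead of updated per expression (6).
rng = np.random.default_rng(0)
xs = [v / np.linalg.norm(v) for v in rng.random((5, 8))]
mu = np.mean(xs, axis=0)
dens = lambda v: 1.0 / (1.0 + float(np.dot(v - mu, v - mu)) / (1.0 - float(np.dot(mu, mu))))
clouds = [Cloud(xs[0])]                            # Stage 0: first image
for x in xs[1:]:
    train_step(clouds, x, dens)
print(sum(c.S for c in clouds))  # 5: every sample belongs to exactly one cloud
```

Note that every sample either joins an existing cloud or founds a new one, so the cloud supports always sum to the number of samples seen, and the loop never revisits old samples, which is what makes the training one-pass.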
Stage 3: Fuzzy Rules Generation
Once the training process has been finished, the system will generate one AnYa fuzzy rule based on the
identified prototypes:
Rulec: IF (I ~ Pc,1) OR (I ~ Pc,2) OR … OR (I ~ Pc,Nc) THEN (class c)          (13)
If more training samples are available later, the FRB subsystem can continue the processing cycle from
Stage 1 and update the fuzzy rules accordingly.
The flowchart of the training process of the FRB subsystem is depicted in Fig. 2.
Fig. 2. Flowchart of the training process of the FRB subsystem
4.2. Validation Process

During validation, each FRB subsystem produces a score of confidence for an image I through its closest prototype:

λc(I) = max_{j=1,2,...,Nc} exp(−‖x̄ − p̄c,j‖²)          (14)

As a result, one can get C scores of confidence λ(I) = [λ1(I), λ2(I), ..., λC(I)] per image, which are the
inputs of the overall decision-maker of the DRB classifier.
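Expression (14) reduces the disjunction of prototypes inside each rule to a single max; a sketch (our code, with hypothetical 2-D prototypes):

```python
import numpy as np

def confidence(x_bar, prototypes):
    """Score of confidence, expression (14): the strongest prototype match."""
    # The "OR" over prototypes becomes a max over per-prototype similarities,
    # exp(-d^2), which is 1 exactly when the sample coincides with a prototype.
    return max(float(np.exp(-np.linalg.norm(x_bar - p) ** 2)) for p in prototypes)

# A validation vector that coincides with one prototype scores exactly 1.
x = np.array([0.6, 0.8])
protos = [np.array([1.0, 0.0]), x]
print(confidence(x, protos))  # 1.0
```

Because each prototype's contribution is independent, the max can be evaluated over all prototypes in parallel, which is the source of the parallelizability claimed above.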
4.3. Decision-Making Mechanism
For a single FRB system, the overall decision-maker (the last layer in Fig. 1) decides the label of the
validation image using the "winner-takes-all" principle as follows:

label(I) = argmax_{c=1,2,...,C} λc(I)          (15)
In some applications, e.g., face recognition, remote sensing, object recognition, etc., where local
information may play a more important role than global information, one can consider segmenting (both the
training and validation) images to capture local information. In such cases, the 0-order FRB subsystems are
trained with segments of training images instead of the full images. The overall label of a validation image is
given as an integration of the scores of confidence that the FRB subsystems assign to its segments,
denoted by Sg1, Sg2, …, SgT:

label(I) = argmax_{c=1,2,...,C} ( (1/T) Σ_{i=1}^{T} λc(Sgi) )          (16)
If an FRB ensemble [23] is used, the label of the validation image is obtained by integrating the
scores of confidence that the FRB systems give to the image [4]:

label(I) = argmax_{c=1,2,...,C} ( (1/K) Σ_{i=1}^{K} λc,i(I) )          (17)

where K is the number of FRB systems in the ensemble and λc,i(I) is the score of confidence given by the ith FRB
system for class c.
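The decision rules (15)-(17) share the same argmax-over-(averaged)-scores shape; a compact sketch of ours, with hypothetical confidence rows:

```python
import numpy as np

def label_single(scores):
    """Winner-takes-all, expression (15): scores is the 1 x C confidence vector."""
    return int(np.argmax(scores))

def label_integrated(score_rows):
    """Expressions (16)/(17): average the T (segment) or K (ensemble member)
    rows of per-class confidences, then pick the winning class."""
    return int(np.argmax(np.mean(score_rows, axis=0)))

# Two hypothetical sub-systems scoring a 3-class problem.
rows = np.array([[0.2, 0.9, 0.1],
                 [0.6, 0.5, 0.4]])
print(label_single(rows[0]))   # 1
print(label_integrated(rows))  # 1  (column means: 0.4, 0.7, 0.25)
```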
The detailed architecture of the proposed DRB for handwritten digits recognition for the training process is
shown in Fig.4. The architecture for the validation process is given in Fig.5.
The pre-processing block of the proposed DRB classifier for handwritten digits recognition consists of the
following layers, where we adopt the same rotation and scaling operation as used in references [12], [13] but
without using elastic distortion:
1. Normalization layer, which applies linear normalization to map the original pixel value range of [0, 255]
into the range [0, 1].
2. Scaling layer, which resizes the training images from their original size of 28×28 into 7 (S = 7)
different sizes: i) 28×22, ii) 28×24, iii) 28×26, iv) 28×28, v) 28×30, vi) 28×32 and vii) 28×34.
3. Rotation layer, which rotates the images by 11 (R = 11) different angles: i) −15°, ii) −12°, iii) −9°, iv) −6°,
v) −3°, vi) 0°, vii) 3°, viii) 6°, ix) 9°, x) 12° and xi) 15°.
4. Segmentation layer, which extracts the central area (22×22) from the training images. It discards the
borders, which consist mostly of white pixels with little or no information.
The scaling and rotation layers create 77 (S×R = 77) new training sets from the original one with respect to
different scaling sizes and rotation degrees [4]. As a result, we train 77 DRB systems in regards to the 77
new training sets and later form an ensemble. Each DRB system consists of 10 AnYa type 0-order fuzzy rules
with a large number of prototypes connected with a disjunction (Logical “OR”) as shown in Table II,
corresponding to digits “0” to “9”. For each validation image, we just apply the normalization and segmentation
operations.
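A sketch of how the 77 (S×R) augmented training sets can be generated with Pillow (our illustration; the paper does not prescribe an implementation, and reading "28×22" as width×height is our assumption):

```python
from PIL import Image

SIZES = [(28, s) for s in (22, 24, 26, 28, 30, 32, 34)]  # 7 scales (S = 7)
ANGLES = [-15, -12, -9, -6, -3, 0, 3, 6, 9, 12, 15]      # 11 angles (R = 11)

def variants(img):
    """Yield the S*R = 77 scaled-and-rotated versions of one training image."""
    for size in SIZES:
        scaled = img.resize(size)
        for a in ANGLES:
            yield scaled.rotate(a)                       # degrees, counter-clockwise

digit = Image.new("L", (28, 28))                         # stand-in for an MNIST digit
print(len(list(variants(digit))))  # 77
```

Each of the 77 (scale, angle) combinations then feeds its own DRB system, and the 77 systems are combined at the end with the ensemble rule (17).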
Since the images within the MNIST dataset are quite small and simple, high-level feature descriptors are
not suitable for this problem. Therefore, the feature descriptor used by the DRB classifier in this experiment is
GIST, HOG or the combined GIST and HOG (CGH) features. However, due to the different descriptive abilities
of these features, the performance of the DRB classifier is somewhat different. The recognition accuracy of the
proposed DRB classifier using different feature descriptors is tabulated in Table III. The corresponding average
training times for the 10 fuzzy rules are tabulated in Table IV.
Table III. Comparison between the Proposed Approach and the State-of-the-Art Approaches

Approach                                              Accuracy
DRB-GIST                                              99.30%
DRB-HOG                                               98.86%
DRB-CGH                                               99.32%
DRB Ensemble                                          99.44%
DRB Cascade [5]                                       99.55%
Large Convolutional Networks [40]                     99.40%
Large Convolutional Networks [21]                     99.47%
Committee of 7 Convolutional Neural Networks [12]     99.73%
Committee of 35 Convolutional Neural Networks [13]    99.77%

Training time: less than 2 minutes for each part of the DRB classifiers (Core i7-4790, 3.60 GHz, 16 GB DDR3, no GPU, no elastic distortion, no tuned parameters), versus almost 14 hours for each of the DNNs in [12], [13] (Core i7-920, 2.66 GHz, 12 GB DDR3, 2× GTX 480 and 2× GTX 580 GPUs, with elastic distortion and tuned parameters). No hardware information is available for [40], [21], both of which use tuned parameters but no elastic distortion.
By further creating a DRB ensemble consisting of a DRB classifier trained with GIST features and a DRB
classifier trained with HOG features, we achieve a better recognition performance, which is tabulated in Table
III as well. In our previous work, we also proposed a DRB cascade [5] that further improves the recognition
accuracy by using an SVM for conflict resolution, which is also presented in Table III. The conflict resolution
only applies to a small number (about 5%) of the validation data for which the two highest confidence values are
close to each other and thus there may be two possible winners with similar overall scores [5]. One of the
important advantages of the proposed DRB classifier is that it provides the per-rule/per-class confidence
levels in a clear and explicit form.
Only 56 images are incorrectly recognized by the proposed DRB ensemble; they are depicted in Fig. 3
with the corresponding labels given above them. As we can see, none of these digits is written
clearly and the majority of them are far from normal handwriting styles.
One of the most distinctive advantages of the proposed DRB classifier is its evolving ability, which means
that there is no need for complete re-training of the classifier when new data samples are available. To illustrate
this advantage, we train the DRB classifier with images in the form of an image stream (video). Meanwhile, the
execution time and the recognition accuracy are recorded during the process. In this example, we use the
original training set without rescaling or rotation, which speeds up the process significantly. The relationship
curves of the training time (the average for each of the 10 fuzzy rules) and recognition accuracy with the
growing amount of the training samples are depicted in Fig. 6.
Table IV. Computation Time for the Learning Process per Sub-system (in seconds)

Fuzzy Rule #     1      2      3      4      5      6      7      8      9      10
Digit           "0"    "1"    "2"    "3"    "4"    "5"    "6"    "7"    "8"    "9"
GIST          39.26  32.39  41.95  45.72  37.17  34.90  37.36  35.89  42.99  36.90
HOG           72.03  70.99  82.47  92.73  73.46  67.53  68.48  77.93  75.83  69.90
CGH           96.54  88.93  99.21 113.52  91.53  85.19  91.92  89.12 104.08  92.26
The architecture of the proposed DRB classifier, as shown in Fig.8, consists of the following layers:
1. Normalization layer;
2. Rotation layer, which rotates the images by i) 0°, ii) 90°, iii) 180° and iv) 270° to improve the
generalization ability of the classifier.
3. Segmentation layer, which splits each image into smaller pieces using a 64×64 sliding window with
a step size of 32 pixels in both horizontal and vertical directions. The segmentation layer cuts one image into
49 pieces.
4. Feature descriptor, which extracts the combined GIST and HOG features from each segment.
5. FRB system, which consists of 9 fuzzy rules, each of which is trained on the segments of images of
a particular class within the dataset.
6. Decision-maker, which generates the labels using equation (16).
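The segment counts quoted above follow from simple sliding-window arithmetic; a sketch (assuming 256×256 input images, which is consistent with the 49 pieces stated here, and covering the 192×192/step-64 variant used later with the high-level descriptor):

```python
def segment_corners(size, window, step):
    """(row, col) top-left corners of a sliding window over a square image."""
    starts = range(0, size - window + 1, step)
    return [(r, c) for r in starts for c in starts]

# 64x64 window, step 32, on a 256x256 image -> 7 positions per axis -> 49 pieces
print(len(segment_corners(256, 64, 32)))   # 49
# 192x192 window, step 64, on a 256x256 image -> 2 positions per axis -> 4 pieces
print(len(segment_corners(256, 192, 64)))  # 4
```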
Following the commonly used experimental protocol [17], we firstly transform the images into grey-level
ones and train the proposed DRB classifier with randomly selected 20% of images of each class and use the
remainder as a validation data set. The experiment is repeated 5 times and the average accuracy is reported in
Table V. Visual examples of the extracted IF…THEN… rules per class during experiments are given in Table
VI.
The performance of the proposed DRB is also compared with the state-of-the-art approaches as follows:
1. Transfer Learning with Deep Representations (TLDP) [41];
2. Two-Level Feature Representation (TLFP) [17];
3. Bag of Visual Words (BoVW) [47];
4. Scale-Invariant Feature Transform with Sparse Coding (SIFTSC) [11];
5. Spatial Pyramid Matching Kernel (SPMK) [25].
and the recognition accuracies of the comparative approaches are reported in Table V as well. One can see that
the proposed approach is able to produce a significantly better recognition result than the best current methods.
Furthermore, by using a smaller step size, the DRB classifier can grasp more details, and this leads to a better
recognition performance.
Table V. Comparison between the Proposed Approach and the State-of-the-Art Approaches
Method Accuracy (%)
TLDP [41] 82.13
TLFP [17] 90.94
BoVW [47] 87.41
SIFTSC [11] 87.58
SPMK [25] 82.85
DRB-GCH 92.95
DRB-VGG 97.70
To show the evolving ability of the proposed DRB classifier, we randomly select 20% of the images of
each class for validation and train the DRB classifier with 10%, 20%, 30%, 40%, 50%, 60%, 70% and 80% of
the dataset. The experiment is repeated five times and the average accuracy is tabulated in Table VII. The
average time for training is also reported, however, due to the unbalanced classes, the training time as tabulated
in Table VII is the overall training time of the nine fuzzy rules.
As handwritten digits images in the MNIST dataset are much simpler, the low-level feature descriptors are
sufficient for problems of this type. In contrast, remote sensing images have more fine details and a variety of
semantic contents. Therefore, we further introduce the high-level feature descriptor, namely, the pre-trained
VGG-VD-16 model, into the DRB classifier and use the original RGB remote sensing images for training. The
architecture of the DRB classifier is adjusted as depicted in Fig. 9 to accommodate the high-level feature
descriptor. As one can see, the adjusted DRB classifier is different from the one using low-level feature
descriptors in terms of the following layers:
1. Segmentation layer, which splits each image into smaller pieces using a 192×192 pixels sliding
window with a step size of 64 pixels in both horizontal and vertical directions. The segmentation layer cuts
one image into 4 pieces.
2. Scaling layer, which resizes the image segments to the size of 227×227 pixels;
3. Feature descriptor, which extracts a 1×4096 dimensional feature vector from each segment.
The rotation layer, FRB layer and decision-maker are the same as shown in Fig. 8. Then, the
experiments in Tables V and VII are repeated using the same experimental protocol, and the new results are
tabulated in the respective Tables.
From the above experiments one can see that by using the high-level feature descriptor, both the recognition
accuracy and the computational efficiency of the DRB classifier on the remote sensing problem are significantly
boosted.
5.3. UCMerced Dataset
UCMerced dataset [47] consists of fine spatial resolution remote sensing images of 21 challenging scene
categories (including airplane, beach, building, etc.). Each category contains 100 images of the same image size
(256×256 pixels). The example images of the 21 classes are shown in Fig.10.
Following the commonly used experimental protocol [17], we randomly select 80% of images of each class
for training and use the remainder as a validation set. The experiment is repeated 5 times and the average
accuracy is reported in Table VIII. In this experiment, we use the same architecture as depicted in Fig. 9.
The performance of the proposed DRB is also compared with the state-of-the-art approaches as follows:
1. Two-Level Feature Representation (TLFP) [17];
2. Bag of Visual Words (BoVW) [47];
3. Scale-Invariant Feature Transform with Sparse Coding (SIFTSC) [11];
4. Spatial Pyramid Matching Kernel (SPMK) [25],[48];
5. Multipath Unsupervised Feature Learning (MUFL) [15];
6. Random Convolutional Network (RCNet) [50];
7. Linear SVM with Pre-Trained CaffeNet (SVM+Caffe) [39];
8. LIBLINEAR Classifier with the VGG-VD-16 Features (LIBL+VGG) [44];
9. Linear SVM with the VGG-VD-16 Features (SVM+VGG).
Table VIII. Comparison between the Proposed Approach and the State-of-the-Art Approaches
Approach Accuracy Approach Accuracy
TLFP [17] 91.12% RCNet [50] 94.53%
BoVW [47] 76.80% SVM+ Caffe [39] 93.42%
SIFTSC [11] 81.67% LIBL+VGG [44] 95.21%
SPMK [48] 74.00% SVM+VGG 94.48%
MUFL [15] 88.08% DRB 96.14%
From the comparison given in Table VIII one can see that the proposed DRB classifier, again, produced the
best classification performance. Similarly, we randomly select 20% of the images of each class for
validation and train the DRB classifier with 10%, 20%, 30%, 40%, 50%, 60% and 70% of the dataset. The
experiment is repeated 5 times, and the average accuracy and time required for training (per rule) are tabulated
in Table IX. One can see from Table IX that the DRB classifier can achieve 95%+ classification accuracy with
less than 20 seconds for training each fuzzy rule in addition to the highly interpretable structure and
ability to continue to learn and evolve automatically.
Table IX. Results with Different Amount of Training Samples
Ratio 10% 20% 30% 40%
Accuracy (%) 83.48 88.57 90.80 92.19
Time (in seconds) 0.27 1.36 3.96 5.83
Ratio 50% 60% 70% 80%
Accuracy (%) 93.48 94.19 95.14 96.10
Time (in seconds) 10.29 11.52 15.49 18.15
Table X. Comparison between the Proposed Approach and the State-of-the-Art Approaches

Approach          Accuracy (%)
                  15 Training Samples   30 Training Samples
CBDN [28] 57.7 65.4
CLFH [23] 57.6 66.3
DECN [49] 58.6 66.9
LSPM [45] 67.0 73.2
LCLC [43] 65.4 73.4
DEFEATnet [18] 71.3 77.6
CSAE [35] 64.0 71.4
SVM+VGG 78.9 83.5
DRB 81.9 84.5