Learning Feature Fusion Strategies for Various Image Types to Detect Salient Objects
Muhammad Iqbal, Syed S. Naqvi, Will N. Browne, Christopher Hollitt and Mengjie Zhang
Abstract
1. Introduction
Visual saliency has recently attracted considerable attention in computer vision research, giving
birth to a new sub-domain known as salient object detection [1]. The task of salient
object detection is to detect the salient, attention-grabbing object(s)
in a scene and subsequently segment them in their entirety [2, 3]. It is similar to the
problem of figure-ground segmentation [4, 5, 6], but differs from the traditional
segmentation problem in that the goal is simply to find the most salient object
rather than completely partitioning the image into perceptually homogeneous
regions [7]. Salient object detection thus amounts to marking regions of
interest in a scene, which facilitates various computer vision applications, e.g.
image segmentation [8], image retrieval [9, 10], picture collage [11, 12], object
recognition [13] and image compression [14].
Most methods specialized for the task of salient object detection concentrate
on constructing deterministic, tailor-made features [15, 16], such as color or color
gradient, and apply heuristics to combine them. A class of models [17, 18, 19] uses
low, mid and high-level features to learn a single set of weighting parameters
for combining features, but applies them across multiple types of images, e.g.
images with cluttered backgrounds or multiple objects of interest. Therefore,
such techniques inherently lose generalization when operated on test sets containing
images with different properties and sets of features. An alternative
approach is to learn model parameters using an assembly of weak learners, which
increases generalization. However, the quality of the final solution depends upon the
performance of the individual learners and can be degraded if one of the learners is
not optimal [20].
A learning classifier system (LCS) is a rule-based machine learning technique
in which each rule relates sections of the feature space to a classification and
a measure of accuracy [21, 22]. To address the loss of generalization
on unseen image types and to make the system general across image types,
we previously utilized the strength of LCS to autonomously divide the feature
space into niches and construct rules covering each image type [23]. The aim of
this paper is to extend and demonstrate the LCS technique proposed in [23]
by fully investigating the niching of image types and demonstrating performance on
a wide range of domains and against salient object detection benchmark techniques.
The rest of the paper is organized as follows. Section 2 briefly describes
related work in salient object detection. Section 3 details the proposed LCS
technique to detect salient objects in an image. Section 4 introduces
the datasets, parameter settings, and performance measures used in the
experimentation. Section 5 presents experimental results and compares them with
existing state-of-the-art systems. Section 6 provides an analysis of the evolved
classifier rules obtained using the proposed LCS system. Section 7 concludes
this work and outlines future work.
2. Background
The LCS method proposed in this study to compute saliency
maps is based on XCS [32], a well-tested LCS model. In XCS, the learning
agent evolves a population [P] of classifiers, as depicted in Figure 1, where
each classifier consists of a rule and a set of associated parameters estimating
the quality of the rule. Each rule is of the form ‘if condition then action’,
where the condition is used to match input observations, and the corresponding
action predicts the class label for a given observation. Commonly, the condition
in a rule is represented by a conjunction of predicates, using one predicate for
each corresponding input feature, and the action is represented by a numeric
constant.
action set [A] is formed, which consists of the classifiers in [M] that advocate
the chosen action a. After receiving an environmental reward, the associated parameters of all
classifiers in [A] are updated. When appropriate, new classifiers are produced
using an evolutionary mechanism, usually a GA. Additionally, in XCS, overly
specific classifiers may be subsumed by more general and accurate classifiers
in order to reduce the number of classifiers in the final population [33]. For a
complete description, the interested reader is referred to the original XCS papers
by Wilson [32, 34], and to the algorithmic details by Butz and Wilson [35].
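To make the rule representation and update cycle described above concrete, the following minimal Python sketch (illustrative only, not the authors' implementation) shows an interval-based classifier together with match set formation, action set formation and a standard XCS-style parameter update; the class and function names are introduced here purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Classifier:
    # 'if condition then action': one (low, high) interval per input feature
    condition: List[Tuple[float, float]]
    action: int                 # numeric constant predicting the class label
    prediction: float = 10.0    # estimated payoff
    error: float = 0.0          # estimated prediction error
    fitness: float = 0.01       # accuracy-based fitness

    def matches(self, x: List[float]) -> bool:
        return all(lo <= xi <= hi for (lo, hi), xi in zip(self.condition, x))

def form_match_set(population: List[Classifier], x: List[float]) -> List[Classifier]:
    """[M]: all classifiers whose condition matches the input observation x."""
    return [cl for cl in population if cl.matches(x)]

def form_action_set(match_set: List[Classifier], a: int) -> List[Classifier]:
    """[A]: classifiers in [M] that advocate the chosen action a."""
    return [cl for cl in match_set if cl.action == a]

def update_action_set(action_set: List[Classifier], reward: float, beta: float = 0.2) -> None:
    """Widrow-Hoff style update of prediction and error after the reward."""
    for cl in action_set:
        cl.prediction += beta * (reward - cl.prediction)
        cl.error += beta * (abs(reward - cl.prediction) - cl.error)
```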
al. [19] used least squares regression to learn weights for eye fixation prediction
using basic saliency features (i.e., color, intensity and orientation). Both
discriminative approaches lose generalization on a subset of images because a
single weighting scheme is applied to the features for all image types. Navjot
et al. [39] applied a constrained Particle Swarm Optimization method to determine
an optimal weight vector for combining different features to obtain a saliency
map. The single solution set learned after the evolutionary process hinders the
generalization performance of their method on a large set of testing images.
The recent discriminative regional feature integration (DRFI) work of Jiang
et al. [7] employs regression to automatically select and integrate features. The
inclusion of high-dimensional features for learning a regression model enables
their approach to discover discriminative features and achieve robust performance
on unseen data. However, computing high-dimensional features and
running a regressor for each image to increase generalization comes at the cost
of additional computational time and limits scalability with an increasing number
of regions per image. Despite the inclusion of multiple segmentations at
different scales, region-based saliency computation is prone to inherent limitations,
such as inappropriate annotation and non-uniform saliency assignment.
Additionally, in scenarios where feature performance is heavily dependent upon
important feature-related parameters, joint optimization of such parameters cannot
easily be incorporated into the automatic feature integration process.
Moreover, alternating optimization for fusing varied multiview features, which
takes into account the complementary information of multiview data, has been
proposed in related work [40, 41].
All the above-mentioned approaches achieve reasonably good results; however,
they lose generalization capability because they only learn a single set of weights
to combine features for all image types. AdaBoost [18, 20] learns the task of
salient object detection using an assembly of weak learners (hence increasing
generalization). However, the quality of the final solution depends upon the
performance of the individual learners and can be affected drastically by any one learner
in the decision tree, badly affecting the generalization of the overall system [42].
Once again, AdaBoost does not divide the feature space into niches depending
upon image types, which affects generalization on unseen images. Due to the
cooperative nature of the evolved rules, the LCS technique has an inherent
capability to autonomously divide the feature space into niches. Previously we
adapted a supervised classifier system, known as XCS with Computed Action
(XCSCA) [43], to learn different combinations of image features and construct
rules covering each image type [23]. This study substantiates that work by
fully investigating the niching of different image types and demonstrating
performance on a wide range of domains and salient object detection benchmark
techniques.
Algorithm 1: The training process in the proposed approach. Here
maxProbs denotes the maximum number of training instances, which is
usually greater than the number of images in the training set S because a
single image can be used more than once as a training instance.
Data: A set [S] of training images.
Result: An evolved population [P] of classifiers.
1 numProbs ← 0
2 repeat
3   s ← randomly select an input image from S
4   numProbs ← numProbs + 1
5   [M] ← generate match set out of [P] using s
6   Am ← compute action (i.e., saliency map) of each classifier m ∈ [M]
7   Em ← compute error between the target saliency map and each Am
8   update associated parameters of each classifier m ∈ [M] using Em
9   run GA, when deemed appropriate
10 until numProbs = maxProbs
11 return P
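For readability, Algorithm 1 can be rendered as the following Python sketch; generate_match_set, compute_saliency_map, pixelwise_error, update_parameters and maybe_run_ga are hypothetical helpers standing in for the operations named in the pseudocode, and the ground-truth attribute is an assumed data-structure detail.

```python
import random

def train(training_images, population, max_probs=20_000):
    # Sketch of Algorithm 1; images may be revisited, so max_probs can exceed
    # the number of training images. The helper functions below are hypothetical.
    num_probs = 0
    while num_probs < max_probs:
        s = random.choice(training_images)              # line 3: random image
        num_probs += 1
        match_set = generate_match_set(population, s)   # line 5: [M] from [P]
        for m in match_set:
            a_m = compute_saliency_map(m, s)            # line 6: classifier action
            e_m = pixelwise_error(a_m, s.ground_truth)  # line 7: error vs target map
            update_parameters(m, e_m)                   # line 8: parameter update
        maybe_run_ga(match_set, population)             # line 9: GA when appropriate
    return population                                   # line 11: evolved [P]
```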
• f0: A global feature that assigns low saliency to colors that vary widely in
the spatial domain, based on the work of Liu et al. [1]. To compute
the spatial variance of colors in an image, all the colors are modeled by
Algorithm 2: The testing process in the proposed approach.
Data: A set [T] of testing images, and a population [P] of classifiers.
Result: The number of correctly and falsely classified positive and negative
pixel values for all images ∈ T.
1 numImages ← 0
2 repeat
3   t ← get an input image from T
4   numImages ← numImages + 1
5   [M] ← generate match set out of [P] using t
6   best ← choose the best classifier using the fitness Fm of each classifier m ∈ [M]
7   Abest ← compute action (i.e., saliency map) of the best classifier
8   E ← compute error between the target saliency map and the computed saliency map Abest
9   TPt ← number of correctly classified positive pixel values for t
10  TNt ← number of correctly classified negative pixel values for t
11  FPt ← number of falsely classified positive pixel values for t
12  FNt ← number of falsely classified negative pixel values for t
13 until numImages = total number of images in T
14 return TP, TN, FP, FN
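A corresponding sketch of Algorithm 2 is given below; as before, generate_match_set, compute_saliency_map and confusion_counts are hypothetical helpers, and the attribute names on the image object are assumptions.

```python
def test(testing_images, population):
    # Sketch of Algorithm 2: pick the fittest matching classifier per image and
    # accumulate pixel-wise confusion counts against the ground truth.
    TP, TN, FP, FN = [], [], [], []
    for t in testing_images:
        match_set = generate_match_set(population, t)          # line 5: [M]
        best = max(match_set, key=lambda m: m.fitness)         # line 6: best by fitness
        saliency = compute_saliency_map(best, t)               # line 7: action of best
        tp, tn, fp, fn = confusion_counts(saliency, t.ground_truth)  # lines 9-12
        TP.append(tp); TN.append(tn); FP.append(fp); FN.append(fn)
    return TP, TN, FP, FN                                      # line 14
```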
• f2: A global feature that computes the spatial distribution of pixels in a
cluster with respect to the image center, inspired by the work of Fu et al.
[45]. It is measured as the mean Euclidean distance from the pixels in a
cluster to the image center (see the sketch after this list).
the boundary of the window. The number of such super-pixels that have
their pixels both inside and outside the window boundary must be low for
a window having a high probability of containing the object(s).
such that there exists a ground distance between the features.1 Consequently,
conditions in classifier rules will be encoded as a concatenation of real-valued
intervals so that the system can evolve generalized classifiers. To aid in un-
derstanding the novel encoding scheme, the classifier matching mechanism is
depicted in Figure 2.
Figure 2: The novel encoding scheme to match an input image against the classifier population
in order to enable generalization in classifier rules.
If the current input state is not matched by any classifier in the population
during the training phase, a random classifier is created to cover the current
input. However, if an input is not matched during the testing phase, we compute
the Euclidean distance of each classifier condition from the current input and
consider the closest |P| × tournamentSize classifiers to be matched, where |P|
is the number of classifiers in the population and tournamentSize is a constant
parameter, usually chosen as 0.4.
1 We used the fast implementation of EMD with thresholded ground distances based on
the work of Pele and Werman [50].
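A sketch of this testing-phase fallback is given below; representing each interval condition by its centre point when computing the Euclidean distance is an assumption made for illustration, as the paper does not specify how the distance to an interval condition is measured.

```python
import numpy as np

def fallback_match(population, x, tournament_size=0.4):
    # Rank classifiers by the Euclidean distance of their condition from the
    # input x and treat the closest |P| * tournamentSize classifiers as matched.
    # Using interval centres to represent each condition is an assumption.
    centres = np.array([[(lo + hi) / 2.0 for lo, hi in cl.condition]
                        for cl in population])
    d = np.linalg.norm(centres - np.asarray(x, dtype=float), axis=1)
    k = max(1, int(len(population) * tournament_size))
    closest = np.argsort(d)[:k]
    return [population[i] for i in closest]
```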
In this work, the action (the saliency map) in a classifier rule is computed
as a linear function of the input image and a weight vector w, similar to
XCSCA [43]. In this function, the computed two-dimensional features are linearly
combined with the evolved weights to produce the required saliency map for the
matched input image, as given in Equation 1, where x0 denotes a constant input
parameter. The weights are evolved using recursive least squares, as described
in [51].
SaliencyMap = w_0 x_0 + \sum_{i=0}^{8} w_{i+1} f_i    (1)
Error = 1 − (TP + TN) / (TP + FP + TN + FN),    (2)
where TP and TN are the numbers of correctly classified positive and negative
pixel values (for a single image) between the computed saliency map and
the ground truth, and FP and FN represent the falsely classified pixel values.
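Equations 1 and 2 can be sketched as follows; the 0.5 binarisation threshold used to obtain TP, TN, FP and FN from the computed saliency map is an assumption for illustration.

```python
import numpy as np

def saliency_map(weights, features, x0=0.5):
    # Eq. (1): weights = [w0, w1, ..., w9]; features = the nine feature maps
    # f0..f8, all of identical size.
    s = weights[0] * x0
    for i, f in enumerate(features):
        s = s + weights[i + 1] * f
    return s

def classification_error(saliency, ground_truth, threshold=0.5):
    # Eq. (2): 1 - pixel-wise accuracy of the binarised saliency map against
    # the binary ground truth (the 0.5 threshold is an assumption).
    pred = saliency >= threshold
    gt = ground_truth >= 0.5
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    return 1.0 - (tp + tn) / (tp + tn + fp + fn)
```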
4. Experiment Design
This section describes the data sets, parameter settings, performance measures,
and the state-of-the-art methods used in this study to compare the results of
the proposed XCSCA approach.
4.1. Data Sets
This work employs the following commonly used benchmark data sets in the
field of computer vision: Microsoft Research Asia (MSRA) [44], Salient Object
Dataset (SOD) [52], and Segmentation Evaluation Dataset (SED2) [53].
The MSRA data set comprises 25,000 images in total and includes
ground truth annotations in the form of labeled rectangles from multiple users.
These ground truth annotations classify multiple objects as one by placing them
in a single rectangle and do not provide pixel-wise accuracy. To remedy
these shortcomings, a set of 1000 images was manually segmented by a single user
to obtain binary masks [54]. These ground truth masks account for
pixel-wise accuracy and multiple objects and are widely accepted in the field of
computer vision as a standard benchmark for saliency evaluation. All images
and the respective ground truth are resized to 200 × 200 to improve computational
efficiency. Consequently, the computed feature maps are of the same size.
Figure 3 shows some representative images with their ground truth and respective
features from the three data sets. It can be observed that no single feature
accurately captures the ground truth.
Figure 3: Representative images from the MSRA, SOD and SED2 data sets (top to bottom).
From left to right: image, ground truth, features f0–f8 in the same order as listed in Section 3.
The SOD [52] benchmark contains 300 images that are difficult for salient
object detection due to the cluttered backgrounds and ambiguous salient object
scenes. Boundary level ground truth information is available for all images. The
SED2 [53] contains 100 two-object images along with pixel-wise ground truth
and is a subset of the segmentation evaluation database.
4.2. Parameter Settings
For all the experiments conducted in this study, the number of classifiers
used is 2000 and the number of training instances is 20, 000. The selection
method used to select two parent classifiers in the GA is tournament selection
with tournament size ratio 0.4. GA subsumption is activated whereas action
set subsumption is deactivated. The remaining learning parameters and their
values used in this study are shown in Table 1, which are commonly used in the
literature, as suggested by Butz and Wilson [35].
The training has been conducted on the MSRA data set only as it includes
images of different types, e.g. cluttered background, multiple objects, large
salient objects, small salient objects, and faces, persons, objects and text. The
resulting classifier populations are tested on each of the three data sets. All
the experiments have been repeated 30 times with a different seed in each run,
where the MSRA data set has been randomly divided into a training set of 700
and a test set of 300 images in each experiment. Each result reported in this
work is average of the 30 runs.
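The experimental protocol can be summarised by the following sketch; the seed handling shown is an assumption, as the paper only states that each of the 30 runs uses a different seed and a fresh random 700/300 split of the 1000 MSRA images.

```python
import random

def make_splits(msra_images, runs=30, train_size=700, base_seed=1):
    # One (train, test) split per run; a different seed is used in each run.
    splits = []
    for run in range(runs):
        rng = random.Random(base_seed + run)
        shuffled = list(msra_images)
        rng.shuffle(shuffled)
        splits.append((shuffled[:train_size], shuffled[train_size:]))
    return splits
```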
Table 1: The learning parameters used in the proposed method.
Parameter  Description  Value
F_I  initial fitness value of a new classifier  0.01
α  fitness fall-off rate used to calculate fitness of a classifier  0.1
β  learning rate used to update fitness of a classifier  0.2
ε_0  error threshold in accuracy under which a classifier is considered to be accurate  10
ν  fitness exponent used to calculate fitness of a classifier  5
θ_GA  threshold above which the GA is applied in a match set  25
χ  probability of applying (two-point) crossover in the GA to produce two offspring  0.8
μ  probability of mutating an attribute in an offspring  0.01
θ_del  experience threshold above which fitness of a classifier may be considered in its probability of deletion  20
δ  fraction of mean fitness of population below which fitness of a classifier may be considered in its probability of deletion  0.1
θ_sub  experience threshold above which a classifier may subsume another classifier  20
F_r  fitness reduction factor used to reduce the average fitness of two parent classifiers before assigning it to offspring  0.1
r_0  a parameter used in the covering operation to determine the distribution range of the spread in each interval  0.7
m_0  a parameter used in the mutation operation to modify the distribution range of intervals in offspring  0.5
x_0  a constant input parameter used to compute action of a classifier  0.5
δ_rls  a scaling factor used to initialize the covariance matrix in a classifier  100
monotonically decreasing trend and has been agreed upon by researchers in the
document retrieval field.
In order to compare the segmentation quality of different methods, the
saliency maps are thresholded adaptively based on their average intensity values
to compute the precision, recall and F-measure metrics. The F-measure in this case
measures how accurately the saliency map segments the whole object from an image.
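This adaptive evaluation can be sketched as follows; the factor of 2 on the mean saliency and the weight β² = 0.3 in the F-measure are conventions from the salient object detection literature and are assumptions here, since the text only states that the threshold is based on the average intensity.

```python
import numpy as np

def adaptive_segmentation_scores(saliency, ground_truth, beta2=0.3):
    # Threshold the saliency map at a multiple of its mean intensity, then
    # compute precision, recall and the weighted F-measure against the binary
    # ground truth. The factor 2 and beta^2 = 0.3 are assumptions.
    thresh = 2.0 * saliency.mean()
    pred = saliency >= thresh
    gt = ground_truth > 0.5
    tp = np.sum(pred & gt)
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    f_measure = (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-12)
    return precision, recall, f_measure
```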
The proposed approach is compared with the nine features used as individual
saliency map generators, as described in Section 3, and with seven state-of-the-art
methods. The state-of-the-art methods selected for comparison are Dense
and Sparse Saliency (DSR) [56], Manifold Ranking (MR) [57], Bayesian Saliency
Model (BSM) [58], Cluster-based Saliency (CS) [59], Low Rank Matrix Recovery
(LRK) [60], Soft Image Abstraction (SIA) [61] and Hierarchical Saliency
(HS) [62]. These benchmark methods were selected because they are recent, widely
accepted by the community, and have made either their methods or their results
available for public use.
5. Results
This section compares the results obtained using the proposed approach with
the individual feature based methods and the seven state-of-the-art methods.
This indicates that 20% to 30% of pixels belong to the annotated salient objects
for all datasets. At the other extreme of the PR curve, for lower recall values,
only the proposed LCS method retains high precision values in most cases as
compared with the individual feature based methods. This behavior corresponds
to the smoother saliency maps of the proposed method, which maintain more true
positives at higher thresholds, resulting in completely highlighted salient objects.
The high variability of the PR curves of the features demonstrates
the performance gaps amongst individual features. It is noteworthy that the
proposed XCSCA method is able to identify these large performance gaps and
improve upon individual feature performance by minimizing them. The
average PR results on all datasets in the bottom row of Figure 4 demonstrate
the performance gains of 38.8% and 2.7% achieved by the XCSCA method as
compared with the worst and best performing individual features, respectively.
Figure 4: Comparison of individual features with XCSCA on all datasets in terms of PR
curves. From top row to bottom row: MSRA, SOD, SED2 and average PR curve on all
datasets.
Figure 5: Comparison of individual features with the proposed XCSCA method in terms of
segmentation based measures, i.e. F-measure, precision and recall. (a) MSRA, (b) SOD, (c)
SED2 and (d) average measures on all datasets.
It can be observed that several methods perform close to the proposed method
at lower threshold values; however, the proposed method maintains the highest
precision at higher threshold values (corresponding to lower recall) as compared
with the state-of-the-art methods. The increased precision in the saliency results
of the proposed method (at similar recall values) can be attributed to the
appropriate weighting of features within each niche, resulting in the suppression of
background noise in cluttered scenarios.
In terms of the interpolated PR curves, DSR performs best among the compared
methods on the MSRA dataset and HS performs best on the SOD dataset.
However, on average XCSCA outperforms all compared methods, as shown in
the bottom row of Figure 6.
6. Further Discussions
Figure 6: Comparison of methods with respect to PR curves. From top row to bottom row:
MSRA, SOD, SED2 and average PR curve on all datasets.
Figure 7: Comparison of the state-of-the-art methods (BSM, CS, DSR, HS, LRK, MR and SIA)
with the proposed XCSCA method in terms of segmentation based measures, i.e. F-measure,
precision and recall. (a) MSRA, (b) SOD, (c) SED2 and (d) average measures on all datasets.
thresholds. The segmentation results also reveal that the proposed method
consistently obtained the highest precision as compared with the other methods. The
robustness of our approach in completely highlighting the salient objects and
assigning uniform saliency inside object contours, as compared with the state-of-the-art
methods, can be observed in the visual comparison shown in Figure 8.
In most cases, the saliency maps produced by the proposed LCS method best
capture the precise location of salient objects, highlight complete salient objects,
and effectively suppress background noise as compared with the state-of-the-art
methods.
Figure 8: Visual comparison of selected saliency methods on representative images from the
MSRA (test images), SOD and SED2 datasets. From left to right: input image, ground truth
(GT), MR, DSR, HS, LRK and the proposed XCSCA.
object recognition applications. The last row of Figure 8 shows an example where
the proposed method matches the object details more precisely than depicted in
the ground truth and highlights both objects more evenly than the state-of-the-art.
The first and fourth rows of Figure 8 exhibit examples of proper highlighting of the
complete salient object and suppression of background noise as compared with the
state-of-the-art techniques. The fourth row of Figure 8 shows an example of a
complex scene with an ambiguous salient object. The state-of-the-art methods
struggle to completely highlight the salient regions and include unwanted
background noise, as in columns 5 and 6. The proposed method performs better
than the state-of-the-art in capturing the salient region while suppressing the
unwanted background noise.
Table 2: A sample of the experienced and accurate classifier rules, obtained in a typical run, using the proposed LCS system. The names of images
covered (i.e. matched) by a rule are listed under the rule.
No. Condition Weight Vector
1 [0.00 1.00] [0.03 0.68] [0.00 1.00] [0.22 1.00] [0.00 0.89] [0.00 0.44] [0.65 1.00] [0.02 0.66] [0.00 0.43] -416.2,-352.9,183.8,-268.0,1111.5,-141.0,701.9,752.8,1135.5,397.1
0 13 13700, 0 18 18530, 0 24 24558, 0 2 2301, 0 6 6646, 10 268015474 2bd515353c, 10 52202431.dscn0209.png ,10 59242003.img 4193, 1 55 55005, 1 57 57848
2 [0.00 0.47] [0.00 0.80] [0.00 0.74] [0.00 0.82] [0.00 0.43] [0.00 0.27] [0.99 1.00] [0.00 0.77] [0.00 1.00] -400.5,-72.7,980.0,49.6,272.7,141.5,1311.2,294.8,-110.6,628.8
0 18 18530, 0 19 19593, 0 21 21781, 0 2 2301, 0 7 7821, 10 260372546 03d18a4e9e, 10 267183829 e8538b16ff.png ,10 268122508 c361b4db6c, 10 268252475 3716432836 m,
10 268262571 a713359657, 10 41105485.05 0226 aabbspoth2, 10 43642602.pasqueflowerpasqueflowerearlyeve2, 10 49513258.0509180006.fsdm, 10 52202430.dscn0206,
10 52202431.dscn0209, 10 54628916.jan8 06 536, 10 54628917.jan8 06 537, 10 54628920.jan8 06 540, 10 66830630.sejklgec.sept10 06 746, 1 26 26013, 1 51 51078, 1 55 55005,
1 57 57848, 1 58 58671, 1 61 61269, 1 65 65984
3 [0.00 1.00] [0.20 0.90] [0.40 1.00] [0.00 0.69] [0.00 0.93] [0.00 0.52] [0.15 1.00] [0.00 0.79] [0.00,0.70] -559.7,255.1,517.9,296.3,15.9,472.2,539.3,40.2,627.7,787.9
0 11 11164, 0 11 11981, 0 12 12891, 0 13 13347, 0 13 13386, 0 13 13700, 0 15 15022, 0 16 16704, 0 16 16768, 0 18 18565, 0 19 19593, 0 1 1288, 0 1 1409, 0 1 1427,
0 21 21244, 0 21 21394, 0 24 24453, 0 2 2756, 0 4 4240, 0 5 5091, 0 5 5291, 0 6 6646, 0 7 7478, 0 7 7917, 0 8 8859, 0 9 9398, 0 9 9453, 10 00000069 018, 10 00000093 007,
10 144439584 97e8823d39, 10 1557631 c5d89caa41, 10 156234186 1b16bc540f, 10 161208056 248b2c2ab6, 10 224830165 1fc5dcfecf, 10 236916255 307bc2272c,
10 262725611 c3ce2b827e, 10 264977628 cde9f779bc, 10 266774040 f40061481f, 10 267183829 e8538b16ff, 10 267412727 1727822888, 10 267744690 ac99310c04,
10 267780676 e2346e581a, 10 267781224 2039c2b3fb, 10 267781788 2765356aeb, 10 267935839 30e321dbe5, 10 267975486 b9cd08a46f, 10 267976432 b4e9429cff,
10 268001846 db119074c8, 10 268014760 9676395938, 10 268129990 04e858de05, 10 268168010 d3852a2e6e, 10 268211019 270132655c, 10 268217391 a99ec26edb,
10 268231155 a6210a13b7, 10 268231902 1586ec90e2, 10 268252475 3716432836 m, 10 29912713.stream5sm, 10 36687635.gpzoo przewalski, 10 38173647.pbcats06,
10 40677597.img 1543, 10 43047581.ds20050506 0127awfcat, 10 44859877.10, 10 52202392.dscn0140, 10 52202445.dscn0226, 10 52202453.dscn0243, 10 52202457.dscn0251,
10 52202466.dscn0266, 10 58229768.20060405022 object of fancy, 10 59242003.img 4193, 10 59271038.138 3822, 10 59271075.img 1787,
10 66830586.6whtssec.sept10 06 705, 10 96835757 c609dbaa80, 10 98048098 af08c0f2df, 1 26 26013, 1 26 26532, 1 27 27034, 1 35 35795,
1 43 43206, 1 47 47611, 1 47 47818, 10 66830586.6whtssec.sept10 06 705, 1 51 51078, 1 53 53496, 1 54 54098, 1 56 56172, 1 56 56265, 1 57 57848,
1 60 60686, 1 62 62852, 1 63 63372, 1 63 63566, 1 65 65815, 1 65 65984, 1 66 66979, 1 67 67033
are computed by applying the corresponding classifier weights.
Figure 9: Images grouped by three representative experienced rules from the population.
As the selected classifiers are highly general and experienced, multiple images
are covered by more than one classifier, as can be observed in Figure 9.
It is noteworthy that each group encompasses a variety of images belonging to
different types. This is because images are grouped based on a variety of
extracted features, where each feature captures a different property of the image.
Figure 10: Saliency grouping results corresponding to the images in Figure 9.
The grouped saliency responses in Figure 10 elaborate the niching property
of the system, as saliency responses with different characteristics are covered by
different rules. In general, group 2 includes saliency outputs where high saliency is
uniformly assigned inside object contours, as can be observed in the red highlighted
cases. Groups 1 and 3 feature noisy saliency responses, where the background
confusion included in the two cases differs in its characteristics. The red
squares show particularly clear examples for each group. It can be observed that
the background clutter included in group 1 saliency outputs appears to capture
valid segments of the image belonging to the background. Conversely, the
background noise included in group 3 saliency outputs is scattered or dispersed.
7. Conclusions
The goal of this study was to investigate the effectiveness of learning classifier
systems in combining different image features to detect salient objects
in images of different complexity levels. This goal was successfully achieved by
incorporating a novel encoding scheme in a learning classifier system that
computes actions using a linear combination of features. The proposed approach
effectively learned different feature combinations for various types of images
with different difficulty levels. The obtained results indicate that the proposed
approach outperforms nine individual feature based methods and seven state-of-the-art
combinatorial methods in terms of precision-recall curves and F-measures.
Further, it is observed that the proposed method preserves more details of objects
than the state-of-the-art methods, which may benefit object recognition
applications. Future work includes investigating richer features (such as those
computed by deep neural networks) and experimenting with the LCS technique on
image classification benchmarks.
Acknowledgments
This work was supported in part by the Marsden Fund of the New Zealand
Government under contract VUW1209, administered by the Royal Society of
New Zealand.
References
[1] T. Liu, Z. Yuan, J. Sun, J. Wang, N. Zheng, X. Tang, H.-Y. Shum, Learning
to Detect a Salient Object, IEEE Transactions on Pattern Analysis and
Machine Intelligence 33 (2) (2011) 353–367.
[5] S. Liu, X. Bai, Discriminative Features for Image Classification and Re-
trieval, Pattern Recognition Letters 33 (6) (2012) 744–751.
[6] H. Zhang, X. Bai, J. Zhou, J. Cheng, H. Zhao, Object Detection Via Struc-
tural Feature Selection and Shape Model, IEEE Transactions on Image
Processing 22 (12) (2013) 4984–4995.
[7] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, S. Li, Salient Object De-
tection: A Discriminative Regional Feature Integration Approach, in: Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recog-
nition, 2013, pp. 2083–2090.
[8] M.-M. Cheng, G.-X. Zhang, N. J. Mitra, X. Huang, S.-M. Hu, Global Con-
trast based Salient Region Detection, in: Proceedings of IEEE Conference
on Computer Vision and Pattern Recognition, 2011, pp. 409–416.
[10] X. Yang, X. Qian, T. Mei, Learning Salient Visual Word for Scalable Mobile
Image Retrieval, Pattern Recognition 48 (10) (2015) 3093–3101.
[11] J. Wang, L. Quan, J. Sun, X. Tang, H.-Y. Shum, Picture Collage, in: Pro-
ceedings of IEEE Conference on Computer Vision and Pattern Recognition,
2006, pp. 347–354.
[16] S. Goferman, L. Zelnik-Manor, A. Tal, Context-Aware Saliency Detection,
IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (10)
(2012) 1915–1926.
[18] A. Borji, Boosting Bottom-Up and Top-Down Visual Features for Saliency
Estimation, in: Proceedings of IEEE Conference on Computer Vision and
Pattern Recognition, 2012, pp. 438–445.
[26] I. Kukenys, W. N. Browne, M. Zhang, Transparent, Online Image Pattern
Classification Using a Learning Classifier System, in: Applications of Evo-
lutionary Computation, Vol. 6624 of Lecture Notes in Computer Science,
Springer Berlin Heidelberg, 2011, pp. 183–193.
[36] S. S. Naqvi, W. N. Browne, C. Hollitt, Optimizing Visual Attention Models
for Predicting Human Fixations Using Genetic Algorithms, in: Proceedings
of IEEE Congress on Evolutionary Computation, 2013, pp. 1302–1309.
[38] N. Tong, H. Lu, Y. Zhang, X. Ruan, Salient Object Detection via Global
and Local Cues, Pattern Recognition 48 (10) (2015) 3258–3267.
[41] J. Yu, Y. Rui, D. Tao, Click Prediction for Web Image Reranking Using
Multimodal Sparse Coding, IEEE Transactions on Image Processing 23 (5)
(2014) 2019–2032.
[44] T. Liu, Z. Yuan, J. Sun, J. Wang, N. Zheng, X. Tang, H.-Y. Shum, Learning
to Detect a Salient Object, IEEE Transactions on Pattern Analysis and
Machine Intelligence 33 (2) (2011) 353–367.
[46] B. Alexe, T. Deselaers, V. Ferrari, What is an Object?, in: Proceedings of
IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp.
73–80.
[50] O. Pele, M. Werman, Fast and Robust Earth Mover’s Distances, in: Pro-
ceedings of International Conference on Computer Vision, 2009, pp. 460–
467.
[54] R. Achanta, S. Hemami, F. Estrada, S. Süsstrunk, Frequency-tuned Salient
Region Detection, in: Proceedings of IEEE Conference on Computer Vision
and Pattern Recognition, 2009, pp. 1597–1604.
[56] X. Li, H. Lu, L. Zhang, X. Ruan, M.-H. Yang, Saliency Detection via
Dense and Sparse Reconstruction, in: IEEE International Conference on
Computer Vision, 2013, pp. 2976–2983.
[57] C. Yang, L. Zhang, H. Lu, X. Ruan, M.-H. Yang, Saliency Detection via
Graph-Based Manifold Ranking, in: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, 2013, pp. 3166–3173.
[58] Y. Xie, H. Lu, M.-H. Yang, Bayesian Saliency via Low and Mid Level Cues,
IEEE Transactions on Image Processing 22 (5) (2013) 1689–1698.
[60] X. Shen, Y. Wu, A Unified Approach to Salient Object Detection via Low
Rank Matrix Recovery, in: Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition, 2012, pp. 853–860.
[61] M.-M. Cheng, J. Warrell, W.-Y. Lin, S. Zheng, V. Vineet, N. Crook, Ef-
ficient Salient Region Detection with Soft Image Abstraction, in: IEEE
International Conference on Computer Vision, 2013, pp. 1529–1536.
[62] Q. Yan, L. Xu, J. Shi, J. Jia, Hierarchical Saliency Detection, in: Proceed-
ings of the IEEE Conference on Computer Vision and Pattern Recognition,
2013, pp. 1155–1162.