
Author’s Accepted Manuscript

Learning Feature Fusion Strategies for Various Image Types to Detect Salient Objects

Muhammad Iqbal, Syed S. Naqvi, Will N. Browne, Christopher Hollitt, Mengjie Zhang

www.elsevier.com/locate/pr

PII: S0031-3203(16)30104-2
DOI: http://dx.doi.org/10.1016/j.patcog.2016.05.020
Reference: PR5741
To appear in: Pattern Recognition
Received date: 7 March 2016
Revised date: 20 April 2016
Accepted date: 4 May 2016

Cite this article as: Muhammad Iqbal, Syed S. Naqvi, Will N. Browne, Christopher Hollitt and Mengjie Zhang, Learning Feature Fusion Strategies for Various Image Types to Detect Salient Objects, Pattern Recognition, http://dx.doi.org/10.1016/j.patcog.2016.05.020
This is a PDF file of an unedited manuscript that has been accepted for
publication. As a service to our customers we are providing this early version of
the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting galley proof before it is published in its final citable form.
Please note that during the production process errors may be discovered which
could affect the content, and all legal disclaimers that apply to the journal pertain.
Learning Feature Fusion Strategies for Various Image Types to Detect Salient Objects

Muhammad Iqbal∗, Syed S. Naqvi, Will N. Browne, Christopher Hollitt, Mengjie Zhang

School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
Abstract

Salient object detection is the task of automatically localizing objects of interest in a scene by suppressing the background information, which facilitates various machine vision applications such as object segmentation, recognition and tracking. Combining features from different feature modalities has been demonstrated to enhance the performance of saliency prediction algorithms, and different feature combinations are often suited to different types of images. However, existing saliency learning techniques attempt to apply a single feature combination across all image types and thus lose generalization in the test phase when considering unseen images. Learning classifier systems (LCSs) are an evolutionary machine learning technique that evolves a set of rules, based on niched genetic reproduction, which collectively solve the problem. It is hypothesized that the LCS technique has the ability to autonomously learn different feature combinations for different image types. Hence, this paper further investigates the application of LCS for learning image-dependent feature fusion strategies for the task of salient object detection. The obtained results show that, by evolving generalized rules to compute saliency maps, the proposed method outperforms the individual feature based methods and seven combinatorial techniques in detecting salient objects from three well-known benchmark datasets of various types and difficulty levels.

∗ Corresponding Author: Muhammad Iqbal, School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand; Email: [email protected]; Phone: +64 4 463 5233 x 8874
Email addresses: [email protected] (Muhammad Iqbal), [email protected] (Syed S. Naqvi), [email protected] (Will N. Browne), [email protected] (Christopher Hollitt), [email protected] (Mengjie Zhang)

Preprint submitted to Pattern Recognition May 20, 2016
Keywords: Object Detection, Saliency Map, Learning Classifier Systems,
XCS, Pattern Recognition

1. Introduction

Visual saliency has recently attracted much computer vision research, giving
birth to a new sub-domain known as salient object detection [1]. For salient
object detection, the task is to detect the salient, attention grabbing object(s)
in a scene and subsequently segment it in its entirety [2, 3]. It is similar to the
problem of figure-ground segmentation [4, 5, 6], but differs from the traditional
segmentation problem as the task is simply to find the most salient object
rather than completely partitioning the image into perceptually homogeneous
regions [7]. Salient object detection is actually the task of marking regions of
interest in a scene, which facilitates various computer vision applications, e.g.
image segmentation [8], image retrieval [9, 10], picture collage [11, 12], object
recognition [13] or image compression [14].
Most methods specialized for the task of salient object detection concentrate
on constructing deterministic tailor-made features [15, 16] such as color or color
gradient and apply heuristics to combine them. A class of models [17, 18, 19] uses
low, mid and high-level features to learn a single set of weighting parameters
for combining features, but apply them across multiple types of images, e.g.
images with cluttered backgrounds or multiple objects of interest. Therefore,
such techniques inherently lose generalization when operated on test sets with
different images having various properties and sets of features. An alternative approach is to learn model parameters using an assembly of weak learners, which increases generalization. However, the quality of the final solution depends upon the performance of the individual learners and can be degraded if any one of them is not optimal [20].

A learning classifier system (LCS) is a rule-based machine learning technique
in which each rule relates sections of the feature space with a classification and
a measure of accuracy [21, 22]. To address the issue of loss in generalization
on unseen image types and to make the system general for all image types,
previously we utilized the strength of LCS to autonomously divide the feature
space into niches and construct rules covering each image type [23]. The aim of
this paper is to extend and demonstrate the LCS technique proposed in [23]
by fully investigating niching of image types and demonstrating performance on
a wide range of domains and salient object detection benchmark techniques.
The rest of the paper is organized as follows. Section 2 briefly describes
the related work in salient object detection. In Section 3 the proposed LCS
technique to detect salient objects in an image is detailed. Section 4 introduces
the datasets, parameter settings, and performance measures used in the exper-
imentation. In Section 5 experimental results are presented and compared with
existing state-of-the-art systems. Section 6 provides an analysis of the evolved
classifier rules obtained using the proposed LCS system. The final section concludes this work and outlines future directions.

2. Background

This section introduces the necessary background in learning classifier sys-


tems, and the related work in salient object detection.

2.1. Learning Classifier Systems

Traditionally, an LCS represents a rule-based agent that incorporates a ge-


netic algorithm (GA) and machine learning to solve a given task by evolving
a population of interpretable classifiers. Each classifier covers a part of the
feature space that may overlap with other classifiers. The LCS tech-
nique has been successfully applied to a wide range of problems including clas-
sification, data mining, control, modeling, image processing and optimization
problems [24, 25, 26, 27, 28, 29, 30, 31].

The proposed LCS method for computing saliency maps, presented in this study, is based on XCS [32], a well-tested LCS model. In XCS, the learning agent evolves a population [P] of classifiers, as depicted in Figure 1, where each classifier consists of a rule and a set of associated parameters estimating the quality of the rule. Each rule is of the form ‘if condition then action’, where the condition is used to match input observations, and the corresponding action predicts the class label for a given observation. Commonly, the condition in a rule is represented by a conjunction of predicates, one predicate for each corresponding input feature, and the action is represented by a numeric constant.
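To make this rule representation concrete, the following minimal Python sketch (not part of the original system) shows a classifier with the real-interval conditions that the proposed system adopts later (Section 3.2), a numeric action and the usual XCS bookkeeping parameters; only the initial fitness value (0.01) is taken from Table 1, the other defaults are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Classifier:
    condition: List[Tuple[float, float]]   # one (lower, upper) interval per input feature
    action: int                            # class label (or, later, a computed action)
    prediction: float = 0.0                # estimated payoff
    error: float = 0.0                     # estimated prediction error
    fitness: float = 0.01                  # accuracy-based fitness (F_I in Table 1)
    experience: int = 0                    # number of updates this rule has received
    numerosity: int = 1                    # micro-classifiers represented by this rule

    def matches(self, state: List[float]) -> bool:
        """A rule matches an input if every feature falls inside its interval."""
        return all(lo <= x <= hi for (lo, hi), x in zip(self.condition, state))

# Example: a rule that matches inputs whose first feature is in [0.0, 0.5] and
# whose second feature is in [0.2, 1.0], and that advocates class 1.
rule = Classifier(condition=[(0.0, 0.5), (0.2, 1.0)], action=1)
print(rule.matches([0.3, 0.7]))  # True
```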

Figure 1: Overview of a learning classifier system [28].

In XCS, on receiving the environmental input state s, a match set [M] is formed consisting of the classifiers from the population [P] that have conditions matching the input s. For every action a_i in the set of all possible actions, if a_i is not represented in [M] then a covering classifier is randomly generated. After that, an action a is selected to be performed on the environment and an action set [A] is formed, which consists of the classifiers in [M] that advocate a. After receiving an environmental reward, the associated parameters of all classifiers in [A] are updated. When appropriate, new classifiers are produced using an evolutionary mechanism, usually a GA. Additionally, in XCS overly specific classifiers may be subsumed by more general and accurate classifiers in order to reduce the number of classifiers in the final population [33]. For a complete description, the interested reader is referred to the original XCS papers by Wilson [32, 34], and to the algorithmic details by Butz and Wilson [35].

2.2. Salient Object Detection

Visual attention is a fundamental research problem in psychology, neuro-


science, and computer vision literature. Researchers have built computational
models of visual attention to predict where humans are likely to fixate [36].
Recently, this work has been expanded to identify salient objects in a scene for
object detection and localization. Salient object detection is a difficult problem
in computer vision, as natural scenes can contain cluttered backgrounds (making it difficult to distinguish an object from the background based on its features) as well as multiple objects.
Deterministic methods for detecting salient objects rely on carefully hand-crafted features, but they usually combine them linearly, thus neglecting the relative importance of individual features [16]. Machine learning approaches have the ability to learn
feature importance during combination, which enhances their performance in
challenging cases such as scenes with cluttered backgrounds and multiple ob-
jects [37].
Tong et al. [38] used 73 texture and color features by exploring both global
and local cues to compute a saliency map. However, the simplistic nature of
feature combination (i.e., the average of the local and global features) compro-
mises the final saliency output on difficult cases of saliency detection. Judd et
al. [17] learned a model of saliency from 33 features (including low, mid and
high level features) to predict human eye fixations. They used support vector
machines (SVMs) with linear kernels to learn feature weightings, while Zhao et
al. [19] used least square regression to learn weights for eye fixation prediction
using basic saliency features (i.e., color, intensity and orientation). Both of these discriminative approaches lose generalization on a subset of images because a single weighting scheme is applied to the features for all image types. Singh et al. [39] applied a constrained Particle Swarm Optimization method to determine an optimal weight vector to combine different features to obtain a saliency
map. The single solution set learned after the evolutionary process hinders the
generalization performance of their method on the large set of testing images.
The recent discriminative regional feature integration (DRFI) work of Jiang
et al. [7] employs regression to automatically select and integrate features. The
inclusion of high dimensional features for learning a regression model enables
their approach to discover discriminative features and achieve robust perfor-
mance on unseen data. However, computing high dimensional features and
running a regressor for each image to increase generalization comes at a cost
of additional computational time and limits scalability with increasing num-
ber of regions per image. Despite the inclusion of multiple segmentations at
different scales, the region based saliency computation is prone to inherent limi-
tations, such as inappropriate annotation and non-uniform saliency assignment.
Additionally, in scenarios where feature performance is heavily dependent upon
important feature-related parameters, joint optimization of such parameters cannot be easily incorporated into the automatic feature integration process.
Moreover, alternating optimization approaches for fusing varied multiview features, which take into account the complementary information of multiview data, have also been proposed [40, 41].
All the above-mentioned approaches achieve reasonably good results; however, they lose generalization capability as they only learn a single set of weights to combine features for all image types. AdaBoost [18, 20] learns the task of salient object detection using an assembly of weak learners (hence increasing generalization). However, the quality of the final solution depends upon the performance of the individual learners, and a single poorly performing learner in the decision tree can badly affect the generalization of the overall system [42].
Once again, AdaBoost does not divide the feature space into niches depend-
ing upon image types, which affects generalization on unseen images. Due to the cooperative nature of the evolved rules, the LCS technique has an inherent ca-
pability to autonomously divide the feature space into niches. Previously we
adapted a supervised classifier system, known as XCS with Computed Action
(XCSCA) [43], to learn different combinations of image features and construct
rules covering each image type [23]. This study substantiates that work by
fully investigating the niching of different image types and demonstrating per-
formance on a wide range of domains and salient object detection benchmark
techniques.

3. Salient Object Detection using Learning Classifier Systems

Commonly, salient objects in an image are detected by determining a saliency


map for the input image, which is a computed image that attempts to emphasize
the object(s) to be detected. In a supervised learning system, it is necessary to
provide the target saliency map (i.e., the ground truth) along with the input
image during training. A ground truth is a manually segmented binary image
that emphasizes the object to be detected. During the testing process, only the
image features computed using the input image are provided to the system. The
training and testing processes used in the proposed XCSCA-based approach to
detecting salient objects are briefly described in Algorithm 1 and Algorithm 2,
respectively.
The remainder of this section explains the new methods incorporated in
XCSCA to compute saliency maps in an effort to detect salient objects from
various types of images by learning different feature fusion strategies. The
new extensions introduced in the proposed technique are: the design of input
features, the mechanism to match an input image with a population of classifiers,
the mechanism to compute actions that are essentially the saliency maps, and
the error function to calculate an error between a computed saliency map and
the target saliency map.

Algorithm 1: The training process in the proposed approach. Here maxProbs denotes the maximum number of training instances, which is usually greater than the number of images in the training set S because a single image can be used more than once as a training instance.
Data: A set [S] of training images.
Result: An evolved population [P] of classifiers.
1 numProbs ← 0
2 repeat
3   s ← randomly select an input image from S
4   numProbs ← numProbs + 1
5   [M] ← generate match set out of [P] using s
6   A_m ← compute action (i.e., saliency map) of each classifier m ∈ [M]
7   E_m ← compute error between the target saliency map and each A_m
8   update associated parameters of each classifier m ∈ [M] using E_m
9   run GA, when deemed appropriate
10 until numProbs = maxProbs
11 return [P]

Algorithm 2: The testing process in the proposed approach.
Data: A set [T] of testing images, and a population [P] of classifiers.
Result: The number of correctly and falsely classified positive and negative pixel values for all images ∈ T.
1 numImages ← 0
2 repeat
3   t ← get an input image from T
4   numImages ← numImages + 1
5   [M] ← generate match set out of [P] using t
6   best ← choose the best classifier using the fitness F_m of each classifier m ∈ [M]
7   A_best ← compute action (i.e., saliency map) of the best classifier
8   E ← compute error between the target saliency map and the computed saliency map A_best
9   TP_t ← number of correctly classified positive pixel values for t
10  TN_t ← number of correctly classified negative pixel values for t
11  FP_t ← number of falsely classified positive pixel values for t
12  FN_t ← number of falsely classified negative pixel values for t
13 until numImages = total number of images in T
14 return TP, TN, FP, FN

3.1. Design of Input Features

Instead of matching the input image at pixel level against the conditions of classifier rules, we compute the following nine saliency-based features for each input image. Each of these produces a two-dimensional real-valued array. We carefully select potential features from previous work [44, 45, 8, 46, 47] that are suited to the task of salient object detection and have previously been shown to correlate with visual attention in different experimental settings. To thoroughly evaluate the learning performance of models, we have chosen features that complement each other well, with each one performing better than the others for a particular image type in previous experiments.

• f 0: A global feature that assigns low saliency to colors that vary a lot in
the spatial domain, which is based on the work of Liu et al. [1]. To compute
the spatial variance of colors in an image, all the colors are modeled by

Gaussian Mixture Models (GMM). Afterwards, each pixel in the spatial


domain is assigned to a color component of the GMM. Next, horizontal
and vertical variances of each color component of the GMM are calculated
and added to obtain the total variations of colors in the spatial domain.
Finally, the colors having high total variance are assigned low saliency,
while the colors exhibiting low variance in the spatial domain are assigned
high saliency values.

• f 1: A global feature that captures the contrast between clusters obtained


through k-means segmentation, inspired by the work of Fu et al. [45]. The
contrast cue of a cluster is computed by accumulating its distance to all
other clusters.

• f 2: A global feature computing the spatial distribution of pixels in a cluster with respect to the image center, inspired by the work of Fu et al. [45]. The global spatial distribution of pixels in a cluster is measured as the mean Euclidean distance between the pixels in the cluster and the image center (a minimal sketch of this computation is given after this list).

• f 3: A region based feature that computes the global contrast between


spatial neighboring regions only [8]. The image is first segmented using
graph-based image segmentation. Next, a quantized color histogram is
constructed for each region. Afterwards, saliency for each region is com-
puted as its weighted color contrast to all other regions in the image. The
weights are the number of pixels contained by a region to emphasize con-
trast to larger regions, while the contrast itself is measured by the color
distance metric between the two regions.

• f 4, f 5: Two low-level region-based color features adapted from the work


of Naqvi et al. [47]. One color feature for each region is computed by
accumulating the earth mover’s distance (EMD) of its LAB histogram
from the histograms of all other regions in the image, while the other is
computed by measuring EMD between the histograms of a region and
its neighboring regions only. Additionally, contrast from the boundary
regions is also exploited in these features to enhance their discriminative
power.

• f 6: A mid-level feature that uses the objectness of image windows to


highlight salient objects, based on the work of Alexe et al. [46]. The
objectness measure for a window is based on four image cues. The first
cue is multi-scale image saliency based on the work of Hou et al. [48]; the
second cue is the color contrast of a window from its surrounding regions,
computed as the Chi-square distance between the Lab histograms of the
window and its surrounding super-pixels; the third cue is computed by
measuring the edge density inside a window; finally, the last cue counts
the number of super-pixels that have their pixels both inside and outside
the boundary of the window. The number of such super-pixels that have
their pixels both inside and outside the window boundary must be low for
a window having a high probability of containing the object(s).

• f 7: A feature that groups regions based on their objectness score. Similar


regions in terms of objectness scores are merged to form a larger region.
For each region, its difference from all other regions is computed in terms
of objectness scores to form a difference matrix of size equal to the square
of the number of regions. From the difference matrix a global threshold is
calculated by finding the smallest difference that exist between neighbors.
Afterwards, a local process compares regions only with their neighboring
regions and groups those having a difference less than the global threshold
by assigning them the same objectness.

• f 8: A feature that highlights salient patterns based on the work of Naqvi


et al. [47]. The salient patterns are determined by finding any outstanding
patches that have a large distance from neighboring patches. Match dis-
tance is employed to compute the histogram distance between patches due
to its ability to capture cross-bin similarities/dissimilarities. In addition,
intra-patch variance is exploited for computational efficiency.
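As referenced in the description of f 2 above, the following minimal Python sketch illustrates one of the simpler features under stated assumptions: pixels are clustered by k-means on colour and each cluster is scored by the mean Euclidean distance of its pixels from the image centre, following the idea in Fu et al. [45]. The number of clusters and the mapping from distance to a saliency value are not specified here, so both are illustrative assumptions rather than the original implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def f2_center_distribution(image: np.ndarray, k: int = 6) -> np.ndarray:
    """image: H x W x 3 float array; returns an H x W saliency-like map."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)

    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    center = np.array([(h - 1) / 2.0, (w - 1) / 2.0])
    dist_to_center = np.linalg.norm(coords - center, axis=1)

    # Mean distance of each cluster's pixels from the image centre.
    cluster_dist = np.array([dist_to_center[labels == c].mean() for c in range(k)])

    # Assumption: clusters far from the centre receive low saliency.
    score = 1.0 - (cluster_dist - cluster_dist.min()) / (np.ptp(cluster_dist) + 1e-12)
    return score[labels].reshape(h, w)
```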

3.2. Input Matching Scheme

If classifier conditions in an LCS are matched directly to the computed image


features, then it will be hard to evolve any generalized classifier rules due to the
large-sized two-dimensional image features [26]. Therefore, in order to enable
generalization, we introduce a novel encoding scheme to match an input image
against classifier conditions. In this encoding scheme, each computed image
feature fi will be encoded as a real-valued constant di to be matched against
classifier conditions. The real-valued constant for a feature is computed using
earth mover’s distance (EMD) [49] from a two-dimensional artificial feature con-
sisting of all ones. EMD is a way of converting an image into a number, which
is defined as the minimum cost for transforming one histogram into the other
such that there exists a ground distance between the features.1 Consequently,
conditions in classifier rules will be encoded as a concatenation of real-valued
intervals so that the system can evolve generalized classifiers. To aid in un-
derstanding the novel encoding scheme, the classifier matching mechanism is
depicted in Figure 2.

Figure 2: The novel encoding scheme to match an input image against the classifier population
in order to enable generalization in classifier rules.

If the current input state is not matched by any classifier in the population
during the training phase, a random classifier is created to cover the current
input. However, if an input is not matched during the testing phase, we com-
pute the Euclidean distance of each classifier condition from the current input and consider the closest |P| × tournamentSize classifiers to be matched, where |P| is the number of classifiers in the population and tournamentSize is a constant parameter, usually chosen as 0.4.
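A minimal sketch of this encoding and matching scheme follows, with two assumptions flagged: the authors use a fast thresholded EMD [50] between each two-dimensional feature map and an all-ones reference map, for which a 1-D Wasserstein distance between value distributions stands in here; and the "distance of a condition from an input" used in the test-time fallback is taken to be the distance from the input to the interval centres, which the text does not specify.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def encode_features(feature_maps):
    """Encode each 2-D feature map as a scalar distance from an all-ones reference."""
    reference = np.ones(1)  # a point mass at 1 stands in for the all-ones map
    return np.array([wasserstein_distance(f.ravel(), reference) for f in feature_maps])

def matches(condition, d):
    """condition: list of (lower, upper) intervals; d: encoded input vector."""
    return all(lo <= x <= hi for (lo, hi), x in zip(condition, d))

def match_set(conditions, d, tournament_size=0.4, training=True):
    """Return indices of matching classifiers; fall back to the nearest ones at test time."""
    matched = [i for i, cond in enumerate(conditions) if matches(cond, d)]
    if matched or training:
        return matched  # during training, an empty match set triggers covering instead
    centers = np.array([[(lo + hi) / 2.0 for lo, hi in cond] for cond in conditions])
    dist = np.linalg.norm(centers - np.asarray(d), axis=1)
    k = max(1, int(len(conditions) * tournament_size))
    return list(np.argsort(dist)[:k])
```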

3.3. Computing Actions/Saliency Maps

In an LCS, usually, the action in a classifier rule is represented by a fixed


scalar value. However, to learn a task that has a large number of classes, it is
beneficial to use a mechanism to compute actions instead of using fixed scalar
actions in classifier rules [43, 28].

1 We used the fast implementation of EMD with thresholded ground distances based on
the work of Pele and Werman [50].

In this work, the action (the saliency map) in a classifier rule is computed
as a linear function of the input image and a weight vector w, similar to XC-
SCA [43]. In this function, the computed two-dimensional features are linearly
combined with the evolved weights to produce the required saliency map for the
matched input image, as given in Equation 1. Here x0 denotes a constant input
parameter. The weights are evolved using recursive least squares, as described
in [51].


SaliencyMap = w_0 x_0 + Σ_{i=0}^{8} w_{i+1} f_i                (1)

To form a match set we use the real-valued distance (denoted by d) for


each of the two-dimensional features, but to compute the saliency map the
corresponding feature itself is used.
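The sketch below illustrates Equation 1 together with a plain recursive least squares (RLS) update in the style used by XCSF [51]. Treating every pixel of the target map as one RLS training sample is an assumption made for illustration; the paper only states that the weights are evolved with RLS. The covariance initialisation uses δ_rls = 100 from Table 1, and x_0 = 0.5.

```python
import numpy as np

def compute_saliency_map(weights, feature_maps, x0=0.5):
    """weights: length-10 vector; feature_maps: list of nine H x W arrays (f0..f8)."""
    saliency = weights[0] * x0 * np.ones_like(feature_maps[0])
    for i, f in enumerate(feature_maps):
        saliency += weights[i + 1] * f
    return saliency

def rls_update(weights, cov, feature_maps, target_map, x0=0.5):
    """One RLS pass over the pixels of a training image (illustrative only)."""
    X = np.stack([x0 * np.ones_like(feature_maps[0])] + list(feature_maps), axis=-1)
    X = X.reshape(-1, len(weights))            # one row of inputs per pixel
    y = target_map.ravel()
    for x_t, y_t in zip(X, y):
        gain = cov @ x_t / (1.0 + x_t @ cov @ x_t)
        weights = weights + gain * (y_t - x_t @ weights)
        cov = cov - np.outer(gain, x_t @ cov)
    return weights, cov

# Typical initialisation (delta_rls = 100, as in Table 1):
weights = np.zeros(10)
cov = 100.0 * np.eye(10)
```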

3.4. Error Function


The goal of the proposed approach is to evolve saliency maps for each input
image such that the error between the computed saliency map and the target
saliency map is minimized. The error is calculated using an error function that
determines the difference between the computed saliency map and the ground
truth of the input image. The calculated error is used to update the associated
parameters in the corresponding classifiers. The error function used in this work
is:

Error = 1 − (TP + TN) / (TP + FP + TN + FN),                (2)
where TP and TN are the numbers of correctly classified positive and negative pixel values (for a single image) between the computed saliency map and the ground truth, and FP and FN represent the falsely classified positive and negative pixel values.
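A minimal sketch of Equation 2 is given below. The computed saliency map is real-valued, so it must be binarised before pixel counts can be taken; thresholding it at its mean intensity is an assumption made purely for illustration (the ground truth is already a binary mask).

```python
import numpy as np

def saliency_error(saliency_map, ground_truth):
    """Pixel-wise classification error between a saliency map and a binary ground truth."""
    pred = saliency_map >= saliency_map.mean()   # assumed binarisation rule
    gt = ground_truth.astype(bool)
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    return 1.0 - (tp + tn) / float(tp + fp + tn + fn)
```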

4. Experiment Design

This section describes the data sets, parameter settings, performance measures, and the state-of-the-art methods used in this study to compare the results of the proposed approach, XCSCA.

4.1. Data Sets

This work employs the following commonly used benchmark data sets in the
field of computer vision: Microsoft Research Asia (MSRA) [44], Salient Object
Dataset (SOD) [52], and Segmentation Evaluation Dataset (SED2) [53].
The MSRA data set is comprised of 25,000 images in total and includes
ground truth annotations in the form of labeled rectangles from multiple users.
These ground truth annotations classify multiple objects as one by placing them
in a single rectangle and also do not cater for pixel-wise accuracy. To remedy
these effects, a set of 1000 images were manually segmented by a single user
to obtain binary masks [54]. These ground truth masks consider the effect of
pixel-wise accuracy and multiple objects and is widely accepted in the field of
computer vision as a standard benchmark for saliency evaluation. All images
and respective ground truth are resized to 200 × 200 to leverage computational
efficiency. Consequently the computed feature maps are of the same size. Fig-
ure 3 shows some representative images with their ground truth and respective
features from three data sets. It can be observed that no single feature accu-
rately captures the ground truth.
Figure 3: From left to right: image, ground truth, and features f 0 − f 8 in the same order as listed in Section 3 (rows, top to bottom: MSRA, SOD, SED2).

The SOD [52] benchmark contains 300 images that are difficult for salient
object detection due to the cluttered backgrounds and ambiguous salient object
scenes. Boundary level ground truth information is available for all images. The
SED2 [53] contains 100 two-object images along with pixel-wise ground truth
and is a subset of the segmentation evaluation database.

4.2. Parameter Settings

For all the experiments conducted in this study, the number of classifiers used is 2000 and the number of training instances is 20,000. The selection method used to select the two parent classifiers in the GA is tournament selection with a tournament size ratio of 0.4. GA subsumption is activated, whereas action set subsumption is deactivated. The remaining learning parameters and their values are shown in Table 1; these are the values commonly used in the literature, as suggested by Butz and Wilson [35].
The training has been conducted on the MSRA data set only as it includes
images of different types, e.g. cluttered background, multiple objects, large
salient objects, small salient objects, and faces, persons, objects and text. The
resulting classifier populations are tested on each of the three data sets. All
the experiments have been repeated 30 times with a different seed in each run,
where the MSRA data set has been randomly divided into a training set of 700
and a test set of 300 images in each experiment. Each result reported in this
work is average of the 30 runs.

4.3. Performance Measures

The performance is measured using precision-recall (PR) curves and the


segmentation quality. The PR curves are drawn by computing 256 pairs of
average precision and recall values. To compute the 256 pairs, the saliency
maps are thresholded using 256 thresholds in the range 0 to 255 to generate 256
binary maps. The binary maps are compared with the corresponding ground
truth map for each image to get 256 pairs of precision and recall values for each
image. The average of precision recall pairs over the whole dataset gives the
final PR curve. The increase of recall values along the x-axis of the PR curves corresponds to a decrease of the threshold value from 255 to 0.
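The following minimal sketch illustrates this construction: 256 thresholds applied to 8-bit saliency maps, precision and recall computed per image and threshold, and then averaged over the dataset. The interpolation rule from the next paragraph ("maximum precision for each recall" [55]) is shown as a running maximum; everything else follows the procedure described above.

```python
import numpy as np

def pr_curve(saliency_maps, ground_truths):
    """saliency_maps: list of H x W uint8 arrays; ground_truths: matching binary masks."""
    precisions = np.zeros((len(saliency_maps), 256))
    recalls = np.zeros((len(saliency_maps), 256))
    for i, (sal, gt) in enumerate(zip(saliency_maps, ground_truths)):
        gt = gt.astype(bool)
        for t in range(256):
            pred = sal >= t                          # binary map at this threshold
            tp = np.sum(pred & gt)
            precisions[i, t] = tp / max(pred.sum(), 1)
            recalls[i, t] = tp / max(gt.sum(), 1)
    return precisions.mean(axis=0), recalls.mean(axis=0)   # average PR pairs

def interpolate(precision, recall):
    """Maximum precision for each recall level, giving a monotone interpolated curve."""
    order = np.argsort(recall)
    p, r = precision[order], recall[order]
    return np.maximum.accumulate(p[::-1])[::-1], r
```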
We present the original (i.e., empirical) PR curves as well as interpolated PR curves in the results section. The interpolated PR curves are computed by the method of “maximum precision for each recall” [55], which ensures a monotonically decreasing trend and has been agreed upon by researchers in the document retrieval field.

Table 1: The learning parameters used in the proposed method.
Parameter Description Value
F_I initial fitness value of a new classifier 0.01
α fitness fall-off rate used to calculate fitness of a classifier 0.1
β learning rate used to update fitness of a classifier 0.2
ε_0 error threshold in accuracy under which a classifier is considered to be accurate 10
ν fitness exponent used to calculate fitness of a classifier 5
θ_GA threshold above which the GA is applied in a match set 25
χ probability of applying (two-point) crossover in the GA to produce two offspring 0.8
μ probability of mutating an attribute in an offspring 0.01
θ_del experience threshold above which fitness of a classifier may be considered in its probability of deletion 20
δ fraction of mean fitness of population below which fitness of a classifier may be considered in its probability of deletion 0.1
θ_sub experience threshold above which a classifier may subsume another classifier 20
F_r fitness reduction factor used to reduce the average fitness of two parent classifiers before assigning it to offspring 0.1
r_0 a parameter used in the covering operation to determine the distribution range of the spread in each interval 0.7
m_0 a parameter used in the mutation operation to modify the distribution range of intervals in offspring 0.5
x_0 a constant input parameter used to compute the action of a classifier 0.5
δ_rls a scaling factor used to initialize the covariance matrix in a classifier 100
In order to compare the segmentation quality of different methods, the
saliency maps are thresholded adaptively based on their average intensity values
to compute precision, recall and F-measure metrics. The F-measure in this case
is a measure of accuracy of the saliency map to completely segment the whole
object from an image.
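A minimal sketch of this segmentation-quality measure follows. The exact adaptive rule and the F-measure weighting are not stated in the text, so the common choices of twice the mean intensity as the threshold and β² = 0.3 are assumptions used only for illustration.

```python
import numpy as np

def segmentation_quality(saliency_map, ground_truth, beta2=0.3):
    """Adaptive-threshold segmentation metrics for one saliency map."""
    thresh = 2.0 * saliency_map.mean()           # assumed adaptive threshold
    pred = saliency_map >= thresh
    gt = ground_truth.astype(bool)
    tp = np.sum(pred & gt)
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    f_measure = ((1 + beta2) * precision * recall) / max(beta2 * precision + recall, 1e-12)
    return precision, recall, f_measure
```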
The proposed approach is compared with the nine features used as individual
saliency map generators, as described in Section 3, and with seven state-of-the-art
methods. The state-of-the-art methods selected for comparison include Dense
and Sparse Saliency (DSR) [56], Manifold Ranking (MR) [57], Bayesian Saliency
Model (BSM) [58], Cluster-based Saliency (CS) [59], Low Rank Matrix Recov-
ery (LRK) [60], Soft Image Abstraction (SIA) [61] and Hierarchical Saliency
(HS) [62]. The benchmark methods are selected as they are recent, widely ac-
cepted by the community and have made either their methods or their results
available for public use.

5. Results

This section shows the performance comparison of the obtained results using
the proposed approach with the individual feature based methods and seven
state-of-the-art methods.

5.1. Comparison with Individual Feature Based Methods

5.1.1. Precision Recall Curves Based Comparison


Figure 4 shows the performance of individual features in comparison with
the proposed XCSCA method on all chosen benchmark datasets in terms of
PR curves (empirical as well as interpolated). The performance of methods is
summarized and ranked by the area under the precision recall curve (AUCPR).
It can be observed from Figure 4 that at a threshold of 0 the precision values lie
between 0.2 and 0.3 (at the right extreme of the PR curves) for all datasets.

This indicates that 20% to 30% of the pixels belong to the annotated salient objects
for all datasets. At the other extreme of the PR curve for lower recall values,
only the proposed LCS method retains high precision values for most cases as
compared with the individual feature based methods. This behavior corresponds
to the smoother saliency maps of the proposed method that maintain more true
positives at higher thresholds, resulting in completely highlighted salient objects.
The high variability in terms of the PR curves of the features demonstrates
the performance gaps amongst individual features. It is noteworthy that the
proposed XCSCA method is able to identify these huge performance gaps and
improve upon individual feature performance by minimizing these gaps. The
average PR results on all datasets in the bottom row in Figure 4 demonstrate
the performance gains of 38.8% and 2.7% achieved by the XCSCA method as
compared with the worst and best performing individual features, respectively.

5.1.2. Segmentation Quality Based Comparison


Figure 5 shows the segmentation performance of the individual features in
comparison with the proposed XCSCA method on all datasets. The high vari-
ability of the precision, recall and F-measure metrics can be observed from
Figure 5. The average F-measure results on all datasets in Figure 5 (d) show
performance improvements of 29.8% and 4.5% obtained by the proposed XCSCA
method as compared with the worst and best performing individual features,
respectively.

5.2. Comparison with State-of-the-Art Methods

5.2.1. Precision Recall Curves Based Comparison


Figure 6 shows the performance of seven state-of-the-art methods in com-
parison with the proposed XCSCA method on all chosen benchmark datasets in
terms of PR curves (empirical as well as interpolated). In terms of the empirical
PR curves, the proposed XCSCA method outperforms all the state-of-the-art
techniques on all datasets. The overall superior quality of the saliency maps pro-
duced by the proposed method can be observed from the average PR results. It

18
Figure 4: Comparison of individual features with XCSCA on all datasets in terms of PR
curves. From top row to bottom row: MSRA, SOD, SED2 and average PR curve on all
datasets.

Figure 5: Comparison of individual features with the proposed XCSCA method in terms of segmentation based measures, i.e., F-measure, precision and recall. (a) MSRA, (b) SOD, (c) SED2 and (d) average measures on all datasets.

It can be observed that several methods perform closer to the proposed method
on lower threshold values, however the proposed method maintains the highest
precision on higher threshold values (corresponding to lower recall) as compared
with the state-of-the-art methods. The increased precision in the saliency re-
sults of the proposed method (at similar recall values) can be attributed to the
appropriate weighting of features within each niche, resulting in suppression of
background noise in cluttered scenarios.
In terms of the interpolated PR curves, DSR performs better than all meth-
ods on the MSRA dataset and HS performs better than all methods on the
SOD dataset. However, on average XCSCA performs better than all methods
as shown in the bottom row in Figure 6.

5.2.2. Segmentation Quality Based Comparison


Figure 7 shows the comparison of methods in terms of the quality of their
induced segmentation for MSRA, SOD, and SED2 datasets. Figure 7 suggests
that none of the methods consistently outperforms all other methods on this
benchmark. However, on average the proposed method outperforms all the
state-of-the-art methods on all datasets. It is noteworthy that the proposed
method has consistently the highest precision as compared with all other state-
of-the-art methods on all datasets, suggesting that the proposed method main-
tains the highest number of true positives inside object contours in the seg-
mented output. The high precision and F-measure performance of the proposed
method can be ascribed to the suppression of background noise achieved by the
proposed method especially on cluttered background images. Additionally, the
robust performance on this benchmark can be partly attributed to the learning of
adaptive thresholds during the training stage of the proposed method.

6. Further Discussions

6.1. Qualitative Comparison


According to the precision recall curve based comparison, the proposed
method maintained higher precision than the state-of-the-art methods at higher thresholds.

Figure 6: Comparison of methods with respect to PR curves. From top row to bottom row:
MSRA, SOD, SED2 and average PR curve on all datasets.

Figure 7: Comparison of methods with respect to segmentation based measures, i.e., F-measure, precision and recall. (a) MSRA, (b) SOD, (c) SED2 and (d) average measures on all datasets. The methods are sorted according to their F-measure scores.

Also, the segmentation results reveal that the proposed method con-
sistently obtained the highest precision as compared with other methods. The
robustness of our approach in completely highlighting the salient objects and
uniform saliency assignment inside object contours as compared with the state-
of-the-art methods can be observed by the visual comparison shown in Figure 8.
In most cases, the saliency maps produced by the proposed LCS method exhibit
the highest quality of capturing the precise location of salient objects, complete
salient objects and effective suppression of background noise as compared with
the state-of-the-art methods.
Figure 8: Visual comparison of selected saliency methods on representative images from the MSRA (test images), SOD and SED2 datasets. Columns, from left to right: input image, ground truth (GT), MR, DSR, HS, LRK and XCSCA; rows, from top to bottom: MSRA, SOD, SED2.

A noteworthy quality of the proposed saliency maps is the preservation of object details, as shown by the third and last rows of Figure 8, which may benefit
object recognition applications. The last row of Figure 8 shows an example of
saliency where the proposed method matches the object details more precisely
than depicted in the ground truth and highlights both objects more evenly than
state-of-the-art. The first and fourth rows of Figure 8 exhibit examples of proper
highlighting of complete salient object and suppression of background noise as
compared with the state-of-the-art techniques. The fourth row image in Figure 8
shows an example of a complex scene with an ambiguous salient object. The
state-of-the-art methods struggle in completely highlighting the salient regions
and include unwanted background noise as in columns 5 and 6. The proposed
method performs better than the state-of-the-art in capturing the salient region,
while suppressing the unwanted background noise.

6.2. Analysis of Evolved Rules

To further analyze the niching scheme of the evolved solutions, we performed


an experiment on 500 test images from the MSRA dataset. Three highly ex-
perienced representative rules were selected from the population as shown in
Table 2.
The images and corresponding saliency maps covered by each rule were
placed into distinct groups. To visualize image and saliency groups formed by
the rules, a dissimilarity matrix was created by utilizing the feature distances
for each image. Afterwards, 2D mapping and normalization of the data was
performed to display the images and saliency maps belonging to a group in the
form of an embedded image. As numerous images were covered by each indi-
vidual rule, it is not feasible to visualize all of them. Hence only a fixed sized
grid of size 200×200 is chosen and image thumbnails of size 50×50 are used for
visualization purposes. The arrangement of images is subject to the dissimilar-
ities between their feature distances in encoded feature space. Figure 9 shows
the image groups formed by the representative rules, while Figure 10 shows the
corresponding saliency maps. The images are placed into groups based on their
feature composition and arranged in the embedded image according to their
mutual distances in encoded feature space. The corresponding saliency maps

25
Table 2: A sample of the experienced and accurate classifier rules, obtained in a typical run, using the proposed LCS system. The names of images
covered (i.e. matched) by a rule are listed under the rule.
No. Condition Weight Vector
1 [0.00 1.00] [0.03 0.68] [0.00 1.00] [0.22 1.00] [0.00 0.89] [0.00 0.44] [0.65 1.00] [0.02 0.66] [0.00 0.43] -416.2,-352.9,183.8,-268.0,1111.5,-141.0,701.9,752.8,1135.5,397.1
0 13 13700, 0 18 18530, 0 24 24558, 0 2 2301, 0 6 6646, 10 268015474 2bd515353c, 10 52202431.dscn0209.png ,10 59242003.img 4193, 1 55 55005, 1 57 57848
2 [0.00 0.47] [0.00 0.80] [0.00 0.74] [0.00 0.82] [0.00 0.43] [0.00 0.27] [0.99 1.00] [0.00 0.77] [0.00 1.00] -400.5,-72.7,980.0,49.6,272.7,141.5,1311.2,294.8,-110.6,628.8
0 18 18530, 0 19 19593, 0 21 21781, 0 2 2301, 0 7 7821, 10 260372546 03d18a4e9e, 10 267183829 e8538b16ff.png ,10 268122508 c361b4db6c, 10 268252475 3716432836 m,
10 268262571 a713359657, 10 41105485.05 0226 aabbspoth2, 10 43642602.pasqueflowerpasqueflowerearlyeve2, 10 49513258.0509180006.fsdm, 10 52202430.dscn0206,
10 52202431.dscn0209, 10 54628916.jan8 06 536, 10 54628917.jan8 06 537, 10 54628920.jan8 06 540, 10 66830630.sejklgec.sept10 06 746, 1 26 26013, 1 51 51078, 1 55 55005,
1 57 57848, 1 58 58671, 1 61 61269, 1 65 65984
3 [0.00 1.00] [0.20 0.90] [0.40 1.00] [0.00 0.69] [0.00 0.93] [0.00 0.52] [0.15 1.00] [0.00 0.79] [0.00,0.70] -559.7,255.1,517.9,296.3,15.9,472.2,539.3,40.2,627.7,787.9

0 11 11164, 0 11 11981, 0 12 12891, 0 13 13347, 0 13 13386, 0 13 13700, 0 15 15022, 0 16 16704, 0 16 16768, 0 18 18565, 0 19 19593, 0 1 1288, 0 1 1409, 0 1 1427,
0 21 21244, 0 21 21394, 0 24 24453, 0 2 2756, 0 4 4240, 0 5 5091, 0 5 5291, 0 6 6646, 0 7 7478, 0 7 7917, 0 8 8859, 0 9 9398, 0 9 9453, 10 00000069 018, 10 00000093 007,
10 144439584 97e8823d39, 10 1557631 c5d89caa41, 10 156234186 1b16bc540f, 10 161208056 248b2c2ab6, 10 224830165 1fc5dcfecf, 10 236916255 307bc2272c,
10 262725611 c3ce2b827e, 10 264977628 cde9f779bc, 10 266774040 f40061481f, 10 267183829 e8538b16ff, 10 267412727 1727822888, 10 267744690 ac99310c04,
10 267780676 e2346e581a, 10 267781224 2039c2b3fb, 10 267781788 2765356aeb, 10 267935839 30e321dbe5, 10 267975486 b9cd08a46f, 10 267976432 b4e9429cff,
10 268001846 db119074c8, 10 268014760 9676395938, 10 268129990 04e858de05, 10 268168010 d3852a2e6e, 10 268211019 270132655c, 10 268217391 a99ec26edb,
10 268231155 a6210a13b7, 10 268231902 1586ec90e2, 10 268252475 3716432836 m, 10 29912713.stream5sm, 10 36687635.gpzoo przewalski, 10 38173647.pbcats06,
10 40677597.img 1543, 10 43047581.ds20050506 0127awfcat, 10 44859877.10, 10 52202392.dscn0140, 10 52202445.dscn0226, 10 52202453.dscn0243, 10 52202457.dscn0251,
10 52202466.dscn0266, 10 58229768.20060405022 object of fancy, 10 59242003.img 4193, 10 59271038.138 3822, 10 59271075.img 1787,
10 66830586.6whtssec.sept10 06 705, 10 96835757 c609dbaa80, 10 98048098 af08c0f2df, 1 26 26013, 1 26 26532, 1 27 27034, 1 35 35795,
1 43 43206, 1 47 47611, 1 47 47818, 10 66830586.6whtssec.sept10 06 705, 1 51 51078, 1 53 53496, 1 54 54098, 1 56 56172, 1 56 56265, 1 57 57848,
1 60 60686, 1 62 62852, 1 63 63372, 1 63 63566, 1 65 65815, 1 65 65984, 1 66 66979, 1 67 67033
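The following minimal sketch illustrates the visualisation procedure described before Table 2 and used to produce Figures 9 and 10: pairwise dissimilarities are built from the encoded feature distances of each image, and a 2-D mapping then places the thumbnails on a fixed grid. The paper does not name the embedding method, so classical multidimensional scaling (scikit-learn's MDS with a precomputed dissimilarity matrix) is an assumption.

```python
import numpy as np
from sklearn.manifold import MDS

def embed_group(encoded_features, grid=200, thumb=50):
    """encoded_features: N x 9 array of per-image feature distances (d0..d8).
    Returns top-left coordinates for N thumbnails on a grid x grid canvas."""
    diffs = encoded_features[:, None, :] - encoded_features[None, :, :]
    dissimilarity = np.linalg.norm(diffs, axis=-1)              # N x N pairwise distances
    xy = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissimilarity)
    xy = (xy - xy.min(axis=0)) / (np.ptp(xy, axis=0) + 1e-12)   # normalise to [0, 1]
    return np.round(xy * (grid - thumb)).astype(int)
```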

Figure 9: Images grouped by three representative experienced rules from the population.

As the selected classifiers are highly general and experienced, multiple images
are covered by more than one classifier, as can be observed in Figure 9.
It is noteworthy that each group encompasses a variety of images belonging to
different types. This is due to the fact that images are grouped based on a
variety of extracted features where each feature captures a different property of
the image.
Figure 10: Saliency grouping results corresponding to the images in Figure 9.

The grouped saliency response in Figure 10 elaborates the niching property of the system, as saliency responses with different characteristics are covered by different rules. In general, group 2 includes saliency outputs where high saliency is
uniformly assigned inside object contours as can be observed by the red high-
lighted cases. Groups 1 and 3 feature noisy saliency response where the back-
ground confusion included in both cases differs in its characteristics. The red
squares show particularly clear examples for each group. It can be observed that
the background clutter included by group 1 saliency outputs appears to cap-
ture valid segments of the image belonging to the background. Conversely, the
background noise included in group 3 saliency outputs is scattered or dispersed.

Figure 11: Mean distance for images covered by an individual rule (average encoded distance per feature f 0 − f 8, one curve for each of Rules 1–3).

The saliency outputs enclosed in black boxes belong to images covered by


more than one rule. The different saliency response generated by different rules
can be easily observed for each duplicated image, e.g. the suppressed back-
ground saliency response by the group 3 rule for the hand image as compared
with the saliency output in group 1 is noteworthy. Conversely, groups 2 and 3
generate similar saliency outputs for the cycle image.
Figure 11 shows the average results for the feature distances of images cov-
ered by an individual rule. This result depicts an important property of the
niching scheme of the proposed system. It can be observed that the features
f 4 − f 8 appear to be similar for all the images and these distinct rules are cre-
ated to accommodate the varying nature of features f 0 − f 3 for these images.
It is also noted that the influence of f 4 and f 5 has been greatly reduced by
the learnt weighting, i.e. effectively a “don’t care”, showing the system can
overcome the inclusion of a limited number of redundant or irrelevant features.

7. Conclusions

The goal of this study was to investigate the effectiveness of learning clas-
sifier systems in combining different image features to detect salient objects
from different complexity-level images. This goal was successfully achieved by
incorporating a novel encoding scheme in a learning classifier system that com-
putes actions using a linear combination of features. The proposed approach

29
effectively learned different feature combinations for various types of images
with different difficulty levels. The obtained results indicate that the proposed
approach outperforms nine individual feature based methods and seven state-of-
the-art combinatorial methods, in terms of precision-recall curves and F-measures.
Further, it is observed that the proposed method preserves more details of ob-
jects than the state-of-the-art methods, which may benefit object recognition
applications. Future work includes investigating the use of richer features (such as those computed by deep neural networks) and experimenting with the LCS technique on image classification benchmarks.

Acknowledgments

This work was supported in part by the Marsden Fund of the New Zealand
Government under contract VUW1209, administered by the Royal Society of
New Zealand.

References

[1] T. Liu, Z. Yuan, J. Sun, J. Wang, N. Zheng, X. Tang, H.-Y. Shum, Learning
to Detect a Salient Object, IEEE Transactions on Pattern Analysis and
Machine Intelligence 33 (2) (2011) 353–367.

[2] Y.-Z. Song, X. Bai, P. M. Hall, L. Wang, In Search of Perceptually Salient


Groupings, IEEE Transactions on Image Processing 20 (4) (2011) 935–947.

[3] S. S. Naqvi, W. N. Browne, C. Hollitt, Salient Object Detection via Spectral


Matting, Pattern Recognition 51 (2016) 209–224.

[4] R. Kimchi, M. A. Peterson, Figure-Ground Segmentation Can Occur With-


out Attention, Psychological Science 19 (7) (2008) 660–668.

[5] S. Liu, X. Bai, Discriminative Features for Image Classification and Re-
trieval, Pattern Recognition Letters 33 (6) (2012) 744–751.

[6] H. Zhang, X. Bai, J. Zhou, J. Cheng, H. Zhao, Object Detection Via Struc-
tural Feature Selection and Shape Model, IEEE Transactions on Image
Processing 22 (12) (2013) 4984–4995.

[7] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, S. Li, Salient Object De-
tection: A Discriminative Regional Feature Integration Approach, in: Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recog-
nition, 2013, pp. 2083–2090.

[8] M.-M. Cheng, G.-X. Zhang, N. J. Mitra, X. Huang, S.-M. Hu, Global Con-
trast based Salient Region Detection, in: Proceedings of IEEE Conference
on Computer Vision and Pattern Recognition, 2011, pp. 409–416.

[9] L. Shao, M. Brady, Specific Object Retrieval Based on Salient Regions,


Pattern Recognition 39 (10) (2006) 1932–1948.

[10] X. Yang, X. Qian, T. Mei, Learning Salient Visual Word for Scalable Mobile
Image Retrieval, Pattern Recognition 48 (10) (2015) 3093–3101.

[11] J. Wang, L. Quan, J. Sun, X. Tang, H.-Y. Shum, Picture Collage, in: Pro-
ceedings of IEEE Conference on Computer Vision and Pattern Recognition,
2006, pp. 347–354.

[12] S. Goferman, A. Tal, L. Zelnik-Manor, Puzzle-Like Collage, Computer


Graphics Forum 29 (2) (2010) 459–468.

[13] Y.-C. Chen, V. M. Patel, R. Chellappa, P. J. Phillips, Salient Views and


View-Dependent Dictionaries for Object Recognition, Pattern Recognition
48 (10) (2015) 3053–3066.

[14] L. Itti, Automatic Foveation for Video Compression Using a Neurobiolog-


ical Model of Visual Attention, IEEE Transactions on Image Processing
13 (10) (2004) 1304–1318.

[15] D. A. Klein, S. Frintrop, Center-surround Divergence of Feature Statistics


for Salient Object Detection, in: Proceedings of International Conference
on Computer Vision, 2011, pp. 2214–2219.

[16] S. Goferman, L. Zelnik-Manor, A. Tal, Context-Aware Saliency Detection,
IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (10)
(2012) 1915–1926.

[17] T. Judd, K. Ehinger, F. Durand, A. Torralba, Learning to Predict Where


Humans Look, in: Proceedings of International Conference on Computer
Vision, 2009, pp. 2106–2113.

[18] A. Borji, Boosting Bottom-Up and Top-Down Visual Features for Saliency
Estimation, in: Proceedings of IEEE Conference on Computer Vision and
Pattern Recognition, 2012, pp. 438–445.

[19] Q. Zhao, C. Koch, Learning a Saliency Map Using Fixated Locations in


Natural Scenes, Journal of Vision 11 (3) (2011) 1–15.

[20] Q. Zhao, C. Koch, Learning Visual Saliency by Combining Feature Maps


in a Nonlinear Manner Using AdaBoost, Journal of Vision 12 (6) (2012)
1–15.

[21] J. H. Holland, L. B. Booker, M. Colombetti, M. Dorigo, D. E. Goldberg,


S. Forrest, R. L. Riolo, R. E. Smith, P. L. Lanzi, W. Stolzmann, S. W.
Wilson, What Is a Learning Classifier System?, in: Learning Classifier
Systems, From Foundations to Applications, Springer, 2000, pp. 3–32.

[22] L. Bull, T. Kovacs, Foundations of Learning Classifier Systems: An Intro-


duction, Springer, 2005.

[23] M. Iqbal, S. S. Naqvi, W. N. Browne, C. Hollitt, M. Zhang, Salient Object


Detection Using Learning Classifier Systems that Compute Action Map-
pings, in: Proceedings of the Genetic and Evolutionary Computation Con-
ference, 2014, pp. 525–532.

[24] L. Bull, Applications of Learning Classifier Systems, Springer, 2004.

[25] K. Shafi, T. Kovacs, H. A. Abbass, W. Zhu, Intrusion Detection with Evolu-


tionary Learning Classifier Systems, Natural Computing 8 (1) (2009) 3–27.

[26] I. Kukenys, W. N. Browne, M. Zhang, Transparent, Online Image Pattern
Classification Using a Learning Classifier System, in: Applications of Evo-
lutionary Computation, Vol. 6624 of Lecture Notes in Computer Science,
Springer Berlin Heidelberg, 2011, pp. 183–193.

[27] M. Behdad, L. Barone, T. French, M. Bennamoun, On XCSR for Electronic


Fraud Detection, Evolutionary Intelligence 5 (2) (2012) 139–150.

[28] M. Iqbal, W. N. Browne, M. Zhang, XCSR with Computed Continuous


Action, in: Proceedings of the Australasian Joint Conference on Artificial
Intelligence, 2012, pp. 350–361.

[29] M. Iqbal, W. N. Browne, M. Zhang, Learning Complex, Overlapping and


Niche Imbalance Boolean Problems Using XCS-Based Classifier Systems,
Evolutionary Intelligence 6 (2) (2013) 73–91.

[30] M. Iqbal, W. N. Browne, M. Zhang, Reusing Building Blocks of Extracted


Knowledge to Solve Complex, Large-Scale Boolean Problems, IEEE Trans-
actions on Evolutionary Computation 18 (4) (2014) 465–480.

[31] M. Iqbal, W. N. Browne, M. Zhang, Extending XCS with Cyclic Graphs


for Scalability on Complex Boolean Problems, Evolutionary Computation, doi: 10.1162/EVCO_a_00167.

[32] S. W. Wilson, Classifier Fitness Based on Accuracy, Evolutionary Compu-


tation 3 (2) (1995) 149–175.

[33] T. Kovacs, Evolving Optimal Populations with XCS Classifier Systems,


Tech. Rep. CSR-96-17 and CSRP-9617, University of Birmingham, UK
(1996).

[34] S. W. Wilson, Generalization in the XCS Classifier System, in: Proceedings


of the Genetic Programming Conference, 1998, pp. 665–674.

[35] M. V. Butz, S. W. Wilson, An Algorithmic Description of XCS, Soft Com-


puting 6 (3-4) (2002) 144–153.

[36] S. S. Naqvi, W. N. Browne, C. Hollitt, Optimizing Visual Attention Models
for Predicting Human Fixations Using Genetic Algorithms, in: Proceedings
of IEEE Congress on Evolutionary Computation, 2013, pp. 1302–1309.

[37] L. Mai, F. Liu, Comparing Salient Object Detection Results without


Ground Truth, in: Proceedings of the European Conference on Computer
Vision, 2014, pp. 76–91.

[38] N. Tong, H. Lu, Y. Zhang, X. Ruan, Salient Object Detection via Global
and Local Cues, Pattern Recognition 48 (10) (2015) 3258–3267.

[39] N. Singh, R. Arya, R. Agrawal, A Novel Approach to Combine Features for


Salient Object Detection Using Constrained Particle Swarm Optimization,
Pattern Recognition 47 (4) (2014) 1731–1739.

[40] J. Yu, D. Tao, Y. Rui, J. Cheng, Pairwise Constraints Based Multiview


Features Fusion for Scene Classification, Pattern Recognition 46 (2) (2013)
483–496.

[41] J. Yu, Y. Rui, D. Tao, Click Prediction for Web Image Reranking Using
Multimodal Sparse Coding, IEEE Transactions on Image Processing 23 (5)
(2014) 2019–2032.

[42] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning:


Data Mining, Inference, and Prediction, Springer, 2009.

[43] P. L. Lanzi, D. Loiacono, Classifier Systems That Compute Action Map-


pings, in: Proceedings of the Genetic and Evolutionary Computation Con-
ference, 2007, pp. 1822–1829.

[44] T. Liu, Z. Yuan, J. Sun, J. Wang, N. Zheng, X. Tang, H.-Y. Shum, Learning
to Detect a Salient Object, IEEE Transactions on Pattern Analysis and
Machine Intelligence 33 (2) (2011) 353–367.

[45] H. Fu, X. Cao, Z. Tu, Cluster-Based Co-Saliency Detection, IEEE Trans-


actions on Image Processing 22 (10) (2013) 3766–3778.

[46] B. Alexe, T. Deselaers, V. Ferrari, What is an Object?, in: Proceedings of
IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp.
73–80.

[47] S. S. Naqvi, W. N. Browne, C. Hollitt, Combining Object-Based Local


and Global Feature Statistics for Salient Object Search, in: Proceedings of
International Conference on Image and Vision Computing New Zealand,
2013, p. 6.

[48] X. Hou, L. Zhang, Saliency Detection: A Spectral Residual Approach,


in: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2007, pp. 1–8.

[49] Y. Rubner, C. Tomasi, L. J. Guibas, The Earth Mover’s Distance as a


Metric for Image Retrieval, International Journal of Computer Vision 40 (2)
(2000) 99–121.

[50] O. Pele, M. Werman, Fast and Robust Earth Mover’s Distances, in: Pro-
ceedings of International Conference on Computer Vision, 2009, pp. 460–
467.

[51] P. L. Lanzi, D. Loiacono, S. W. Wilson, D. E. Goldberg, Generalization


in the XCSF Classifier System: Analysis, Improvement, and Extension,
Evolutionary Computation 15 (2) (2007) 133–168.

[52] A. Borji, D. N. Sihite, L. Itti, Salient Object Detection: A Benchmark,


in: Proceedings of the European Conference on Computer Vision, Part II,
2012, pp. 414–429.

[53] S. Alpert, M. Galun, R. Basri, A. Brandt, Image Segmentation by Prob-


abilistic Bottom-Up Aggregation and Cue Integration, in: Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, 2007,
pp. 1–8.

[54] R. Achanta, S. Hemami, F. Estrada, S. Süsstrunk, Frequency-tuned Salient
Region Detection, in: Proceedings of IEEE Conference on Computer Vision
and Pattern Recognition, 2009, pp. 1597–1604.

[55] K. Boyd, K. H. Eng, C. D. Page, Area under the Precision-Recall Curve:


Point Estimates and Confidence Intervals, in: Machine Learning and
Knowledge Discovery in Databases, 2013, pp. 451–466.

[56] X. Li, H. Lu, L. Zhang, X. Ruan, M.-H. Yang, Saliency Detection via
Dense and Sparse Reconstruction, in: IEEE International Conference on
Computer Vision, 2013, pp. 2976–2983.

[57] C. Yang, L. Zhang, H. Lu, X. Ruan, M.-H. Yang, Saliency Detection via
Graph-Based Manifold Ranking, in: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, 2013, pp. 3166–3173.

[58] Y. Xie, H. Lu, M.-H. Yang, Bayesian Saliency via Low and Mid Level Cues,
IEEE Transactions on Image Processing 22 (5) (2013) 1689–1698.

[59] H. Fu, X. Cao, Z. Tu, Cluster-Based Co-Saliency Detection, IEEE Trans-


actions on Image Processing 22 (10) (2013) 3766–3778.

[60] X. Shen, Y. Wu, A Unified Approach to Salient Object Detection via Low
Rank Matrix Recovery, in: Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition, 2012, pp. 853–860.

[61] M.-M. Cheng, J. Warrell, W.-Y. Lin, S. Zheng, V. Vineet, N. Crook, Ef-
ficient Salient Region Detection with Soft Image Abstraction, in: IEEE
International Conference on Computer Vision, 2013, pp. 1529–1536.

[62] Q. Yan, L. Xu, J. Shi, J. Jia, Hierarchical Saliency Detection, in: Proceed-
ings of the IEEE Conference on Computer Vision and Pattern Recognition,
2013, pp. 1155–1162.

