
Markaki 2019

IET Image Processing

Research Article

Image sorting via a reduction in travelling salesman problem

ISSN 1751-9659
Received on 17th July 2018
Revised 23rd November 2018
Accepted on 18th September 2019
E-First on 6th November 2019
doi: 10.1049/iet-ipr.2018.5880
www.ietdl.org

Smaragda Markaki1, Costas Panagiotakis1,2 , Dimitra Lasthiotaki1


1Department of Management Science and Technology, Hellenic Mediterranean University, 72100 Agios Nikolaos, Crete, Greece
2Institute of Computer Science, FORTH, 70013 Heraklion, Crete, Greece
E-mail: [email protected]

Abstract: The authors define and approximately solve the problem of unsupervised image sorting, which is considered as a kind
of content-based image clustering. Content-based image sorting is the creation of a route that passes through all the
images once, in such an order that each image is followed by an image with similar content. In the end, an image ordering (e.g.
a slideshow) is automatically produced, so that images with similar content are close to each other. This problem
resembles the problem known in the literature as the ‘travelling salesman problem’ (TSP). In this work, the authors propose
two classes of methods (nearest-neighbour and genetic methods) that have also been applied to the TSP. Their
computational efficiency and accuracy are discussed over six datasets that have been created from the GHIM-10K
dataset. The experimental results demonstrate that the proposed methods efficiently solve the image sorting problem, producing
image sequences that almost agree with human intuition.

1 Introduction

During the last decade, human access to multimedia creation, mainly through mobile devices, and the tendency to share the results through social networks [1] have caused an explosion of image data available on the web. The representation of image content and categorisation is generally time consuming. The development of image content representation and indexing/clustering techniques aims to facilitate access, content search, and automatic categorisation (tagging/labelling).
In this work, we present an unsupervised method of content-based image sorting for the images of a large database. This problem is considered as a content-based image clustering problem [2, 3]. According to the image clustering problem, the goal is to find a mapping of a set of given images into clusters (classes) such that each class contains images that belong to the same category. The problem of content-based image sorting is to find the minimum path that visits all the images once. The sorting result should give a sequence of images such that each image is followed by the image that looks most similar. The proposed methodology can be used to represent the image content by creating a slideshow and to speed up image indexing and clustering, as images with similar content are located close to each other.
The problem of image sorting is quite similar to the well-studied ‘travelling salesman problem’ (TSP) [4–6]. TSP refers to the problem of finding the minimum route for a salesman who, starting at his starting point, should visit a number of cities and then return to the original location so that the total distance of his trip is the minimum possible, with the limitation to visit each city once. Fig. 1 depicts an unsorted database of images (up) and an image sorting (down) that is the output of the proposed method. According to our knowledge, this is the first work that defines and faces the problem of image sorting.
The rest of the paper is organised as follows. Section 2 presents the related work. In Sections 3 and 4, the problem formulation and the proposed methodology are described, respectively. Experimental results and conclusions are presented in Sections 5 and 6, respectively.

Fig. 1 Overview of our approach. An unsorted database of images (up) and an output of the proposed method (down)

2 Related work
2.1 Image classification and representation

Image classification, which is a problem similar to image sorting, has been studied intensively over the last decade. There are methods based on supervised learning, where a training dataset is given and each test image is categorised into the predefined classes. In supervised learning, various techniques have been employed including support vector machines (SVMs), boosting classifiers and multiple instance learning [7]. These approaches suffer from several critical challenges including difficulty in handling a large number of categories, requirement of a predefined set of target classes of interest and extensive labelling efforts for constructing training data. In addition, they are not scalable, since they are typically designed to annotate images only with a predefined set of labels [7].
The unsupervised learning methods can overcome this problem but they cannot provide human interpretable label information for each image or category [7]. The image sorting problem can also be faced by supervised and unsupervised learning methods. However, in this particular problem, the unsupervised learning methods are preferable to the supervised ones since they overcome the problem of creating training data without losing information in the description of the system output, as happens in the image classification problem. In this work, we propose unsupervised learning methods to face the problem.
Unsupervised image clustering methods partition the unlabelled images into disjoint clusters based on a similarity measure. There are several approaches to this problem like topic modelling [8] and

IET Image Process., 2020, Vol. 14 Iss. 1, pp. 31-39 31


© The Institution of Engineering and Technology 2019
Fig. 2 Schema of NN algorithm
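The NN scheme of Fig. 2 and its fixed-start variant NN1 can be illustrated with a minimal Python sketch. It assumes a precomputed distance matrix `D` given as a list of lists; the function names and the `closed` flag are introduced here for illustration only and are not taken from the paper. With `closed=False` the cost is that of an open path (the image sorting objective), with `closed=True` the return leg of the TSP tour is added.

```python
def path_cost(D, route, closed=False):
    """Sum of distances between consecutive items of `route`.

    closed=False: open path (image sorting); closed=True: TSP tour with return leg.
    """
    cost = sum(D[a][b] for a, b in zip(route, route[1:]))
    if closed:
        cost += D[route[-1]][route[0]]
    return cost


def nn1_route(D, start=0):
    """NN1: start from a fixed item and always move to the nearest unvisited item."""
    unvisited = [j for j in range(len(D)) if j != start]
    route = [start]
    while unvisited:
        last = route[-1]
        nxt = min(unvisited, key=lambda j: D[last][j])  # nearest as-yet-unvisited item
        route.append(nxt)
        unvisited.remove(nxt)
    return route


def nn_route(D, closed=False):
    """NN: run the greedy construction from every start and keep the cheapest route."""
    routes = (nn1_route(D, s) for s in range(len(D)))
    return min(routes, key=lambda r: path_cost(D, r, closed))
```

For example, with six toy one-dimensional ‘cities’ at positions 0, 1, 5, 2, 10 and 6 and `D[i][j]` equal to the absolute difference of positions, `nn1_route(D, 0)` visits the points in increasing order of position. NN costs N greedy runs instead of one, which is consistent with NN being slower than NN1 while never returning a worse route.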

graph-based methods [9]. In [7], a novel algorithm to cluster and annotate a set of input images jointly has been proposed, where the images are clustered into several discriminative groups and each group is identified with representative labels automatically. A set of these label-based representations is then refined collectively through a non-negative matrix factorisation with sparsity and orthogonality constraints, where the refined representations are employed to cluster and annotate the input images jointly.
The image representation is also one of the most important issues in image clustering and labelling and object recognition. The image representation is also required for the image sorting problem. In the literature, several visual features, e.g. colour layout descriptor (CLD) [10, 11], scale-invariant feature transform (SIFT) [12], speeded up robust features (SURF) [13] and histogram of oriented gradients (HOG) [14], have been used to encode visual information of images. Supervised techniques [15] that use learned features give higher performances than the unsupervised ones but have the limitations of selection of the training set and classes.

2.2 Travelling salesman problem

The image sorting problem is also related to the problem known in the literature as ‘the travelling salesman problem’ (TSP), except that in this case we have images instead of cities and there is no need to return to the point of origin as required in the TSP. The goal of the TSP is to find a routing of a salesman who starts from a home location, visits a prescribed set of cities and returns to the original location in such a way that the total distance travelled is the minimum one and each city is visited exactly once. Although a business tour of a modern day travelling salesman may not seem to be too complex in terms of route planning, the TSP in its generality represents a typical ‘hard’ combinatorial optimisation problem [5].
The mathematical formulation of TSP is given hereafter: given a set of N cities S = {C_1, C_2, …, C_N} and the distance D(C_i, C_j) between each pair of cities, what is the shortest possible route (r) that visits each point exactly once and returns to the point of origin [see (1)]:

F_{TSP}(r) = \sum_{k=1}^{N-1} D(I_{r(k)}, I_{r(k+1)}) + D(I_{r(N)}, I_{r(1)})    (1)

where I_{r(k)} denotes the kth point visited by the route r. The global minimum of this function corresponds to the optimal solution of the TSP.
The ‘simplicity’ of the statement of the problem is deceptive: the TSP is one of the most intensely studied problems in computational mathematics and yet no efficient solution method is known for the general case. The TSP is NP-hard, meaning that no polynomial-time exact algorithm is known for it (and none exists unless P = NP), so for large instances approximations through heuristics are the only practical option. The main difficulty of this problem is the enormous number of possible tours: (n − 1)!/2 for a symmetric tour of n cities. As the number of cities in the problem increases, the number of permutations of valid tours also increases. It is this factorial growth that makes the task of solving the TSP immense even for modestly sized problems [16].
Many methods have been developed for solving TSP, including exact algorithms and approximate algorithms [5, 6]. The exact algorithms find the optimal solution among all valid solutions in a finite number of steps. However, because of their exponential complexity, they become infeasible when the scale of the TSP becomes large. In contrast, the approximate algorithms, especially many bioinspired algorithms, can obtain acceptable solutions for many NP-hard problems within a (relatively) short running time [17].
The nearest-neighbour (NN) algorithm is one of the first algorithms used for the TSP and perhaps the most natural heuristic for it; it is implemented in the proposed method as well. In this algorithm, one mimics the traveller whose rule of thumb is always to go next to the nearest as-yet-unvisited location [18], i.e. the salesman chooses the nearest unvisited city as his next move. This algorithm quickly yields a reasonably short route. In the version used here, the NN starts from every vertex and chooses the best tour obtained [19]. The steps followed by the NN algorithm are presented in Fig. 2.
A variation of the NN algorithm, called the NN1 algorithm, is also used in the proposed method. The NN1 starts its tour from a fixed vertex i_1, goes to the nearest vertex i_2, then to the nearest vertex i_3 (from i_2) distinct from i_1 and i_2, etc. [19].
Another class of algorithms that is widely used to solve this type of problem and is implemented in the proposed method is the genetic algorithm (GA) [20]. GA belongs to the category of metaheuristic algorithms and is inspired by the natural selection process that belongs to the larger category of evolutionary

Fig. 3 Schema of GA algorithm
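The selection, crossover and mutation loop of Fig. 3 can be illustrated with the following Python sketch for the open-path cost used in image sorting. The paper does not state which genetic operators or parameter values were used, so everything specific here is an assumption made for illustration: tournament selection, order crossover (OX), swap mutation, elitism, and the default values of `pop_size`, `generations` and `mutation_rate`.

```python
import random


def open_path_cost(D, route):
    """Cost of visiting the items in order, without returning to the start."""
    return sum(D[a][b] for a, b in zip(route, route[1:]))


def order_crossover(p1, p2, rng):
    """OX: copy a random slice of p1, fill the other positions in the order of p2."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(p1[i:j + 1])
    fill = iter(g for g in p2 if g not in kept)
    return [g if g is not None else next(fill) for g in child]


def swap_mutation(route, rng, rate):
    """With probability `rate`, exchange two randomly chosen positions."""
    route = route[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(route)), 2)
        route[i], route[j] = route[j], route[i]
    return route


def tournament(pop, D, rng, k=3):
    """Fitter individuals are more likely to become parents."""
    return min(rng.sample(pop, k), key=lambda r: open_path_cost(D, r))


def ga_route(D, pop_size=60, generations=200, mutation_rate=0.3, seed=0):
    rng = random.Random(seed)
    n = len(D)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]  # random permutations
    for _ in range(generations):
        best = min(pop, key=lambda r: open_path_cost(D, r))
        nxt = [best]  # elitism: the best route so far always survives
        while len(nxt) < pop_size:
            child = order_crossover(tournament(pop, D, rng),
                                    tournament(pop, D, rng), rng)
            nxt.append(swap_mutation(child, rng, mutation_rate))
        pop = nxt
    return min(pop, key=lambda r: open_path_cost(D, r))
```

Each chromosome is a permutation of the item indices, so every individual is a valid route. Because of elitism, the best cost in the population never increases from one generation to the next, at the price of many more cost evaluations than NN or NN1.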

algorithms. GA is generally composed of two processes. The first process is the selection of individuals for the production of the next generation and the second process is manipulation of the selected individuals to form the next generation by crossover and mutation techniques. The selection mechanism determines which individuals are chosen for mating (reproduction) and how many offspring each selected individual produces. The main principle of the selection strategy is ‘the better is an individual; the higher is its chance of being parent’ [16].
The GA operation is based on the Darwinian principle of ‘survival of the fittest’ and it implies that the fitter individuals are more likely to survive and have a greater chance of passing their good genetic features to the next generation. In GA, each individual, i.e. chromosome, that is a member of the population represents a potential solution to the problem. There are a number of possible chromosome representations due to a wide variety of problem types [16].
The TSP consists of a number of cities, where each pair of cities has a corresponding distance. The aim is to visit all the cities such that the total distance travelled will be minimised. Obviously, a solution, and therefore a chromosome which represents that solution to the TSP, can be given as an order, that is a path, of the cities. The procedure for solving TSP can be viewed as a process flow given in Fig. 3 [16].
GA is one of the most used evolutionary computation algorithms that gives a good solution for TSP, but it has high computational cost. In [21], the affinity propagation clustering technique is used to reduce the computational time. The main idea of this work is to cluster the cities into smaller clusters and independently solve the TSP in each cluster using GA, so that the solution is reached in less computational time. Another approach using the K-means clustering algorithm to restructure the travelling route by reconnecting each cluster has been proposed in [17]. The clusters have been ranked in advance according to the distance among cluster centres.
Linear programming approaches have also been used to solve TSP. In [4], a time-expanded integer linear programming formulation has been proposed. The approach works with carefully designed partially time-expanded networks, which are used to produce upper as well as lower bounds, and which are iteratively refined until convergence.
Ant colony optimisation algorithms [22, 23] have also been applied to the TSP. They send out a large number of virtual ant agents to explore many possible routes, finding approximated solutions to the TSP.

3 Image sorting problem

In this work, we define the image sorting problem as follows: given a set of N images S = {I_1, I_2, …, I_N} and the visual/content distance D(I_i, I_j) between each pair of images, what is the shortest possible route (r) through the images which accesses each image exactly once. Let F_{IS}(r) be the path cost of a route r [see (2)]:

F_{IS}(r) = \sum_{k=1}^{N-1} D(I_{r(k)}, I_{r(k+1)})    (2)

The global minimum of this function corresponds to the optimal solution of the image sorting problem. It holds that

F_{IS}(r) = F_{TSP}(r) - D(I_{r(N)}, I_{r(1)})    (3)

which means that the methods applied on TSP can also be applied on the problem of image sorting. It seems that the image sorting problem inherits the ‘complexity hardness’ from the TSP, therefore it makes sense to compute only approximated solutions to this problem as the number of given images increases.

4 Methodology

In this work, the NN algorithm, the NN1 algorithm and the GA, as described in Section 2, were applied to the unsupervised image sorting of a large database. The NN1 algorithm is a variation of the NN. The main difference between them is the fact that the NN1 algorithm always starts from a fixed point, the first image of the given database. The NN and NN1 can be easily and quickly implemented, but they might sometimes miss shorter paths. On the other hand, the GA procedure requires more time to complete the process but it usually produces higher-quality solutions than those by NN and NN1.
In order to apply these three TSP methods on the image sorting problem, we have modified them for minimising the cost function of (2) instead of (1). This is done by removing the last term D(I_{r(N)}, I_{r(1)}) from the objective function that they try to minimise, as defined in (3).
The image sorting problem is associated with a distance function D in order to measure the similarity between the images [see (2)]. The distance function D can be directly computed in an unsupervised way on selected image descriptors [10] or using supervised methods, e.g. neural networks [24]. The proposed methodology is flexible on the selection of image descriptors or supervised methods. In this framework, similarly to [10, 25], we use MPEG-7 visual descriptors [26], like the CLD, a low cost and compact descriptor, which suffices to describe smoothly the changes in visual content (mainly colour and motion variations). CLD captures the spatial layout of the representative colours on a grid superimposed on a region or image. The representation is based on coefficients of the discrete cosine transform [11].
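A simplified CLD-style descriptor and the distance of (4) can be sketched in Python as follows. This is an illustration under stated assumptions, not the MPEG-7 reference implementation: the image is a list of rows of (R, G, B) tuples with at least 8 rows and 8 columns, the colour conversion is the full-range JPEG-style YCbCr transform, the nonlinear quantisation stage is omitted, and the numbers of retained coefficients (`n_y`, `n_c`) are illustrative defaults. The distance follows the squared-difference form of (4) as it appears in this text; standard MPEG-7 CLD matching takes a square root per colour channel, which the hypothetical `per_channel_root` flag switches on.

```python
import math


def rgb_to_ycbcr(r, g, b):
    """Full-range JPEG-style conversion (an assumption of this sketch)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def block_means(img, grid=8):
    """Average colour of each cell of a grid x grid partition (blocks may differ in size)."""
    h, w = len(img), len(img[0])
    tiny = []
    for bi in range(grid):
        r0, r1 = bi * h // grid, (bi + 1) * h // grid
        row = []
        for bj in range(grid):
            c0, c1 = bj * w // grid, (bj + 1) * w // grid
            px = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(tuple(sum(p[c] for p in px) / len(px) for c in range(3)))
        tiny.append(row)
    return tiny


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block."""
    n = len(block)
    def a(k): return math.sqrt((1 if k == 0 else 2) / n)
    def c(k, i): return math.cos((2 * i + 1) * k * math.pi / (2 * n))
    return [[a(u) * a(v) * sum(block[i][j] * c(u, i) * c(v, j)
                               for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]


def zigzag(m):
    """Read a square matrix in zigzag order, low frequencies first."""
    n = len(m)
    idx = sorted(((i, j) for i in range(n) for j in range(n)),
                 key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [m[i][j] for i, j in idx]


def cld(img, n_y=6, n_c=3):
    """Return (DY, DCb, DCr): the first zigzag-scanned DCT coefficients per channel."""
    tiny = [[rgb_to_ycbcr(*px) for px in row] for row in block_means(img)]
    chans = [[[px[c] for px in row] for row in tiny] for c in range(3)]
    return tuple(zigzag(dct2(ch))[:k] for ch, k in zip(chans, (n_y, n_c, n_c)))


def cld_distance(d1, d2, per_channel_root=False):
    """Sum over Y, Cb, Cr of the squared coefficient differences (optionally rooted)."""
    sums = [sum((a - b) ** 2 for a, b in zip(c1, c2)) for c1, c2 in zip(d1, d2)]
    return sum(math.sqrt(s) for s in sums) if per_channel_root else sum(sums)
```

Filling a distance matrix then amounts to computing `cld` once per image and `cld_distance` once per pair of images, which is the input expected by the route construction methods.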

Fig. 4 Sample images (two per class) from GHIM-10K dataset that have been used to create Datasets 1–6

Similarly to [10, 25, 26], we used the following function D to measure the content distance of two CLDs, {DY, DCb, DCr} and {DY′, DCb′, DCr′}:

D = \sum_i (DY_i - DY'_i)^2 + \sum_i (DCb_i - DCb'_i)^2 + \sum_i (DCr_i - DCr'_i)^2    (4)

where (DY_i, DCb_i, DCr_i) represent the ith DCT coefficients of the Y (luminance) and Cb, Cr (chrominance) colour components.
The extraction process of the descriptor from an image consists of four stages: image partitioning, representative colour detection, DCT transformation and nonlinear quantisation of the zigzag-scanned coefficients. In the first stage, the image is divided into 64 blocks (8 blocks × 8 blocks). Since the dimensions of the input image are not necessarily multiples of 8, the blocks can differ in their size, although the pixels are distributed among them as uniformly as possible. In the following stage, a single representative colour is selected from each block. Consequently, a tiny image representation of size 8 × 8 is obtained. Any method to compute each representative colour can be applied, but the average of the pixel colours in a block is sufficient in general. In the third stage, each of the three colour components is transformed by an 8 × 8 DCT, so three sets of 64 DCT coefficients are obtained. According to CLD, the YCbCr colour space has been used [26]. In the last stage, each set of coefficients is zigzag-scanned and a few low-frequency coefficients are nonlinearly quantised [11].

5 Experimental results

In our experimental results, we have included two sets of experiments. In the first set, we have created datasets of images where the image classification is given and the goal is to measure if the ordering produced by the proposed frameworks agrees with the given classification. So, if we apply the image sorting in a set of images of different classes, the optimal solution should give a route that minimises the transitions between the classes. An optimal ordering yields no transitions beyond the unavoidable ones between consecutive classes. This remark is used in our experimental results to evaluate the accuracy of the proposed methods.
In the second set of experiments, we have tested the proposed method on sampled images from given videos. In these experiments, the ground truth is given by the time order of the sampled images. So, the goal is to automatically find the time order of the sampled images of the given video. According to our knowledge this is the first work that defines and faces the problem of image sorting, so it is not possible to include comparisons with other methods from the literature that solve the image sorting problem.

5.1 Evaluation metrics

Every method is evaluated in terms of two key parameters: its running time and the quality of the tours that it produces [18]. So, for each subset of data, the accuracy, path cost and computational time are presented for each of the three algorithms used:

• In the first set of experiments, the accuracy of the image sorting is calculated as the ratio of the number of images that are sequenced in the path next to images of the same category to the number of all the images of the route. We take advantage of the fact that the optimal result of the image sorting should respect the transitions of classes according to image categories. Therefore, a human intuition-based metric is to get the accuracy of the computed path by taking into account the number of transitions of classes in the computed image route. The fewer the transitions, the better the accuracy.
• In the second set of experiments, where the time order of the sampled images is given, the accuracy is defined by the ratio of the images that have been placed correctly in time in the route, relative to the previous image of the tour, to the number of all the images of the route.
• The path cost is the total distance between the images [see (2)] that the proposed method tries to minimise. According to our experiments, the accuracy and path cost are highly correlated. In the image classification-based evaluation (Tables 1–6), the method with the highest accuracy always gives the lowest path cost.
• The computational time is the time (in seconds) required to complete the process of each algorithm. In the computational time we only measure the execution times of TSP-based methods, without taking into account the computations of image descriptors and distance matrices.

5.2 Image classification-based evaluation

In this work, the GHIM-10K dataset [27], which consists of 20 different categories and 10,000 images in total, was used in order to create datasets to evaluate the results of image sorting methods (our intention is to make the code implementing the proposed method, together with the datasets and the experimental results, publicly available). The content of images varies from sunset, buildings, flowers, cars, mountains, animals, insects etc. Each category includes 500 images of size 400 × 300 or 300 × 400 in the JPEG format. Six subsets of images (the selection of both categories and images was random) were created as defined hereafter:

• Dataset 1: 500 images in total, 100 images from each category; categories: snowy mountains, flowers, butterflies, sunset, motorbikes.
• Dataset 2: 250 images in total, 50 images from each category; categories: fireworks, snowy mountains, meadows, butterflies, sunsets.
• Dataset 3: 150 images in total, 50 images from each category; categories: fireworks, snowy mountains, meadows.
• Dataset 4: 100 images in total, 50 images from each category; categories: fireworks, snowy mountains.
• Dataset 5: 20 images in total, 10 images from the category fireworks and 5 images from each of the following categories: snowy mountains and meadows.
• Dataset 6: 15 images in total, 5 images from each category; categories: fireworks, snowy mountains, meadows.

So, the size of the datasets varies from 15 (Dataset 6) to 500 (Dataset 1) images and the number of categories varies from two (Dataset 4) to five (Datasets 1 and 2). Fig. 4 depicts the sample images (two per class)

Table 1 Results of NN, NN1 and GA algorithms on Dataset 1 (500 images)
                 NN        NN1       GA
accuracy, %      73.7      72.7      75.2
path cost        159,770   160,713   159,181
comp. time, s    2.82      0.01      502.91

Table 2 Results of NN, NN1 and GA algorithms on Dataset 2 (250 images)
                 NN        NN1       GA
accuracy, %      80.4      77.7      81.2
path cost        75,805    76,761    73,808
comp. time, s    0.27      0.01      63

Table 3 Results of NN, NN1 and GA algorithms on Dataset 3 (150 images)
                 NN        NN1       GA
accuracy, %      93.2      90.5      94.6
path cost        41,363    42,297    40,049
comp. time, s    0.09      0.005     18.1

Table 4 Results of NN, NN1 and GA algorithms on Dataset 4 (100 images)
                 NN        NN1       GA
accuracy, %      100       98.9      100
path cost        27,193    27,715    26,224
comp. time, s    0.04      0.004     6.47

Table 5 Results of NN, NN1 and GA algorithms on Dataset 5 (20 images)
                 NN        NN1       GA
accuracy, %      88.2      88.2      100
path cost        5811      6006      5648
comp. time, s    0.01      <0.001    0.23

Table 6 Results of NN, NN1 and GA algorithms on Dataset 6 (15 images)
                 NN        NN1       GA
accuracy, %      83.3      83.3      100
path cost        4972      5084      4820
comp. time, s    0.01      <0.001    0.16
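The transition-based accuracy reported in Tables 1–6 can be illustrated with a short Python sketch. The text does not give a closed formula, so the normalisation below is an assumption: with N images and C classes, at least C − 1 class transitions are unavoidable, every additional transition is counted as a miss, and the accuracy is taken as 1 − misses/(N − C). This reading reproduces the 83.3% reported for two misses on the 15-image, three-class Dataset 6, but it remains a reconstruction. The function names are introduced here for illustration.

```python
def class_transitions(labels):
    """Number of positions where consecutive images belong to different classes."""
    return sum(a != b for a, b in zip(labels, labels[1:]))


def sorting_accuracy(labels):
    """Assumed accuracy: 1 - misses / (N - C), where misses are the transitions
    beyond the C - 1 that any ordering of C classes needs. Requires N > C."""
    n = len(labels)
    c = len(set(labels))
    misses = class_transitions(labels) - (c - 1)
    return 1.0 - misses / (n - c)
```

For instance, the label sequence A A A A B B B B C C C C C A B contains four transitions, i.e. two misses, and gives an accuracy of 10/12, which rounds to 83.3%.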

from GHIM-10K dataset that have been used to create Datasets 1–6. The results after completion of the experiments are given below.
Tables 1–6 present the results of NN, NN1 and GA algorithms on Datasets 1–6, respectively. In every case, the GA algorithm attains the lowest path cost and an accuracy at least as high as NN and NN1, while NN performs at least as well as NN1. The NN1 method is the most computationally efficient method. Therefore, from the proposed methods, it can be concluded that the GA produces high-quality solutions with the best accuracy among the three algorithms although it requires more time to complete the process. The NN algorithm produces solutions with good enough accuracy with respect to the time required to complete the process. The algorithm with the minimum running time, as expected, is the NN1 algorithm. However, it does not produce good enough results compared to the others.

Fig. 5 Sequence of images of Dataset 6 according to (a) NN algorithm, (b) NN1 algorithm, (c) GA algorithm

Fig. 5 depicts the three sequences of images of Dataset 6 according to (a) NN, (b) NN1 and (c) GA algorithms. In this example, NN and NN1 give similar results (having almost the same path cost) with two misses yielding 83.3% accuracy, while

the results of GA algorithm agree with the image classes of ground truth yielding 100% accuracy.

5.3 Video-based evaluation

In the video-based evaluation, four public videos (available online: https://www.dropbox.com/sh/rpysux4oa746jty/B265lHwpAB) have been used:

• Hall Monitor [25] (300 frames),
• Foreman [25] (300 frames),
• Coast_guard [25] (300 frames),
• Data [25] (375 frames),

under two different sampling rates. So, we create eight subsets of images as defined hereafter:

• Dataset 7: The Hall Monitor video is sampled with a sampling rate of 33.3% (one frame for every three frames), resulting in 100 images.
• Dataset 8: The Hall Monitor video is sampled with a sampling rate of 1 Hz (one frame for every 25 frames), resulting in 12 images.
• Dataset 9: The Foreman video is sampled with a sampling rate of 33.3% (one frame for every three frames), resulting in 100 images.
• Dataset 10: The Foreman video is sampled with a sampling rate of 1 Hz (one frame for every 25 frames), resulting in 12 images.
• Dataset 11: The Coast_guard video is sampled with a sampling rate of 33.3% (one frame for every three frames), resulting in 100 images.
• Dataset 12: The Coast_guard video is sampled with a sampling rate of 1 Hz (one frame for every 25 frames), resulting in 12 images.
• Dataset 13: The Data video is sampled with a sampling rate of 33.3% (one frame for every three frames), resulting in 125 images.
• Dataset 14: The Data video is sampled with a sampling rate of 1 Hz (one frame for every 25 frames), resulting in 15 images.

After completion of the experiments the results are presented below.

Table 7 Results of NN, NN1 and GA algorithms on Dataset 7 (100 images)
                 NN        NN1       GA
accuracy, %      93.9      97.0      97.0
path cost        1059.7    1075.4    1029.5
comp. time, s    0.05      0.01      7.4

Table 8 Results of NN, NN1 and GA algorithms on Dataset 8 (12 images)
                 NN        NN1       GA
accuracy, %      90.9      72.7      90.9
path cost        341.6     379.0     341.6
comp. time, s    0.006     <0.001    0.1

Table 9 Results of NN, NN1 and GA algorithms on Dataset 9 (100 images)
                 NN        NN1       GA
accuracy, %      73.7      73.7      87.9
path cost        5841.6    5841.6    5658.0
comp. time, s    0.05      0.01      7.4

Table 10 Results of NN, NN1 and GA algorithms on Dataset 10 (12 images)
                 NN        NN1       GA
accuracy, %      72.7      72.7      63.6
path cost        1742.7    1907.3    1736.3
comp. time, s    0.006     <0.001    0.1

Table 11 Results of NN, NN1 and GA algorithms on Dataset 11 (100 images)
                 NN        NN1       GA
accuracy, %      100       75.8      100
path cost        1836.4    1955.9    1836.4
comp. time, s    0.05      0.01      7.4

Table 12 Results of NN, NN1 and GA algorithms on Dataset 12 (12 images)
                 NN        NN1       GA
accuracy, %      90.9      54.6      90.9
path cost        971.7     1107.6    971.7
comp. time, s    0.006     <0.001    0.1

Table 13 Results of NN, NN1 and GA algorithms on Dataset 13 (125 images)
                 NN        NN1       GA
accuracy, %      56.4      63.5      57.1
path cost        5510.9    5649.4    5466.9
comp. time, s    0.07      0.005     12.8

Table 14 Results of NN, NN1 and GA algorithms on Dataset 14 (15 images)
                 NN        NN1       GA
accuracy, %      57.1      64.3      64.3
path cost        2535.0    2793.3    2533.9
comp. time, s    0.01      0.008     0.2

Tables 7–14 present the results of NN, NN1 and GA algorithms on Datasets 7–14, respectively. In most of the cases the GA algorithm outperforms NN and NN1, while NN outperforms NN1. The GA algorithm yields the lowest path cost in every case. In the datasets of Hall Monitor and Coast_guard, we get better performances than in the corresponding datasets of Foreman. This was expected due to the fast and unpredictable content changes of the Foreman video (see Fig. 6). The worst results were obtained for the video Data, as it consists of several shots (while the others are one-shot videos). This video has been selected in order to show that it is not possible to predict the order of shots by the proposed method. However, the proposed method can be used to create a ‘smooth’ and ‘short’ path between different shots as in the Data video (see Fig. 7).
Next, we analyse the results of Tables 7 and 8 to examine how the frame rates affect the results. Due to the higher frame rate of Dataset 7, the consecutive frames of Dataset 7 are closer to each other, having smaller distances between them than the distances of consecutive frames of Dataset 8, where more unpredictable content changes appeared between consecutive frames. This fact explains the better accuracy results on datasets with higher frame rates e.g. Dataset 9. Similar results are also obtained on Datasets 9–14.
Figs. 6–9 depict the three sequences of images (according to NN, NN1 and GA algorithms) of Dataset 10 (Foreman), Dataset 14 (Data), Dataset 8 (Hall Monitor) and Dataset 12 (Coast_guard), respectively. In some cases, the methods produce the image sequence in inverse time order, which is equivalent in path cost to the normal one.
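The time-order accuracy used for the video datasets can be illustrated with a short Python sketch. The exact counting rule is not spelled out in the text, so the following is an assumption: a consecutive pair of the computed route counts as correct when the two frames are temporal neighbours in the sampled sequence, in either direction (so that a fully reversed sequence is not penalised), and the count is divided by N − 1. This reading is consistent with reported values such as 90.9% = 10/11 for the 12-image datasets. The function name is introduced here for illustration.

```python
def order_accuracy(route):
    """route[k] is the time index (0, 1, 2, ...) of the kth image in the computed sequence.

    A consecutive pair is a hit when the two frames are adjacent in time,
    forwards or backwards (assumed counting rule).
    """
    hits = sum(abs(a - b) == 1 for a, b in zip(route, route[1:]))
    return hits / (len(route) - 1)
```

For example, a 12-image route that follows the first six frames in order and then jumps to the last frame and walks backwards contains a single wrong step and scores 10/11, i.e. 90.9%.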

Fig. 6 Image sequence of NN (first row), NN1 (second row) and GA (third row) algorithms on Dataset 10 (12 images)

Fig. 7 Image sequence of NN (first row), NN1 (second row) and GA (third row) algorithms on Dataset 14 (15 images)

Fig. 8 Image sequence of NN (first row), NN1 (second row) and GA (third row) algorithms on Dataset 8 (12 images)

Fig. 9 Image sequence of NN (first row), NN1 (second row) and GA (third row) algorithms on Dataset 12 (12 images)

5.4 Evaluation of CLD

In this experiment, we evaluate the performance of CLD compared with an image retrieval schema that uses SURF with bag-of-features (SURF-BOF) description [28] according to the implementation of Matlab (https://www.mathworks.com/help/vision/ref/retrieveimages.html) [28]. The vocabulary is quantised using the K-means algorithm. Finally, this method returns the cosine similarity metric between two images. We used one minus the cosine similarity metric as the distance between the images. The platform we used for implementing all the experiments was an Intel Core i7 2.8 GHz, 12 GB RAM under Windows 10. All of our tests were run on Matlab 2018a.
Fig. 10a depicts the computational times for the computation of the distance matrix using CLD and SURF-BOF descriptions, as a function of the number of images according to the first set of experiments. CLD is considerably faster (about 100 times faster) than the SURF-BOF-based description, since it is a very simple representation without any step with heavy computations. The computational complexity of both methods is O(N^2), as both of them have to fill the distance matrix between all pairs of images. Figs. 10b and c depict the accuracy of the NN method using CLD and SURF-BOF description on the ten datasets of the first and second experiments, respectively. In most of the cases, the CLD-based description clearly outperforms the SURF-BOF description.

6 Conclusions

In this paper, we define and solve the unsupervised image sorting problem. This problem is considered as a content-based image clustering problem and resembles the problem known in the

IET Image Process., 2020, Vol. 14 Iss. 1, pp. 31-39 37


© The Institution of Engineering and Technology 2019
while the salesman is travelling. According to the online image
sorting problem, the images can be updated in real time and the
goal is to select the appropriate image in order to find a routing
with the minimum cost. The proposed algorithms in the current
work cannot be applied efficiently on the online image sorting
problem. This can be seen as an expansion of the current work,
which can be faced by applying the methodology proposed in [30].

7 References
[1] Gygli, M., Grabner, H., Gool, L.V.: ‘Video summarization by learning
submodular mixtures of objectives’. Proc. of the IEEE Conf. on Computer
Vision and Pattern Recognition, Boston, USA, 2015, pp. 3090–3098
[2] Agrawal, A., Karnick, H.: ‘Unsupervised image clustering’. PhD thesis,
Indian Institute of Technology, Kanpur, 2009
[3] Ahmed, N.: ‘Recent review on image clustering’, IET Image Process., 2015,
9, (11), pp. 1020–1032
[4] Boland, N., Hewitt, M., Vu, D.M., et al.: ‘Solving the traveling salesman
problem with time windows through dynamically generated time-expanded
networks’. Int. Conf. on AI and OR Techniques in Constraint Programming
for Combinatorial Optimization Problems, Thessaloniki, Greece, 2017, pp.
254–262
[5] Gutin, G., Punnen, A.P.: ‘The traveling salesman problem and its variations’,
vol. 12, (Springer Science & Business Media, Berlin, 2006)
[6] Weise, T., Chiong, R., Lassig, J., et al.: ‘Benchmarking optimization
algorithms: an open source framework for the traveling salesman problem’,
IEEE Comput. Intell. Mag., 2014, 9, (3), pp. 40–52
[7] Hong, S., Choi, J., Feyereisl, J., et al.: ‘Joint image clustering and labeling by
matrix factorization’, IEEE Trans. Pattern Anal. Mach. Intell., 2016, 38, (7),
pp. 1411–1424
[8] Liu, D., Chen, T.: ‘Unsupervised image categorization and object localization
using topic models and correspondences between images’. IEEE 11th Int.
Conf. on Computer Vision, 2007, ICCV 2007, Rio de Janeiro, Brazil, 2007,
pp. 1–7
[9] Kim, G., Faloutsos, C., Hebert, M.: ‘Unsupervised modeling of object
categories using link analysis techniques’. 2008 IEEE Conf. Computer Vision
and Pattern Recognition, Anchorage, AK, USA, 2008, p. 342
[10] Panagiotakis, C., Doulamis, A., Tziritas, G.: ‘Equivalent key frames selection
based on ISO-content principles’, IEEE Trans. Circuits Syst. Video Technol.,
2009, 19, (3), pp. 447–451
[11] Ventura Royo, C.: ‘Image-based query by example using MPEG-7 visual
descriptors’, 2010
[12] De Sande, K.V., Gevers, T., Snoek, C.: ‘Evaluating color descriptors for
object and scene recognition’, IEEE Trans. Pattern Anal. Mach. Intell., 2010,
32, (9), pp. 1582–1596
[13] Bay, H., Ess, A., Tuytelaars, T., et al.: ‘Speeded-up robust features (surf)’,
Comput. Vis. Image Underst., 2008, 110, (3), pp. 346–359
[14] Dalal, N., Triggs, B.: ‘Histograms of oriented gradients for human detection’.
IEEE Computer Society Conf. on Computer Vision and Pattern Recognition,
Fig. 10 Comparisons of CLD and SURF-BOF descriptors 2005, CVPR 2005, San Diego, USA, 2005, vol. 1, pp. 886–893
[15] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ‘Imagenet classification with deep
(a) The computational time to compute the distance matrix using CLD and SURF-
convolutional neural networks’. Advances in Neural Information Processing
BOF description, (b) The accuracy of the NN method using CLD and SURF-BOF Systems, Lake Tahoe, USA, 2012, pp. 1097–1105
description on datasets of the first experiment, (c) The accuracy of the NN method [16] Razali, N.M., Geraghty, J.: ‘Genetic algorithm performance with different
using CLD and SURF-BOF description on datasets of the second experiment selection strategies in solving TSP’. Proc. of the World Congress on
Engineering, London, UK, 2011, vol. 2, pp. 1134–1139
[17] Deng, Y., Liu, Y., Zhou, D.: ‘An improved genetic algorithm with initial
literature as the TSP. A solution of this problem automatically population strategy for symmetric TSP’, Math. Probl. Eng., 2015, 2015, pp.
creates a slideshow of the given image set, which is very helpful on 1–6
representation of the images. We have modified three TSP-based [18] Johnson, D.S., McGeoch, L.A.: ‘The traveling salesman problem: a case
study in local optimization’, Local Search Comb. Optim., 1997, 1, pp. 215–
methods: the NN, the NN1 and the genetic methods to solve the 310
image sorting problem using CLD for image representation. [19] Gutin, G., Yeo, A., Zverovich, A.: ‘Traveling salesman should not be greedy:
In order to evaluate the proposed frameworks, we have domination analysis of greedy-type heuristics for the TSP’, Discrete Appl.
performed two sets of experiments. In the first set, we used datasets Math., 2002, 117, (1–3), pp. 81–86
[20] Georgilakis, P.S., Doulamis, N.D., Doulamis, A.D., et al.: ‘A novel iron loss
of images where the image classification is given and the goal is to reduction technique for distribution transformers based on a combined genetic
measure whether the resulting image ordering agrees with the algorithm-neural network approach’, IEEE Trans. Syst. Man Cybern. C, Appl.
given classification. In the second set of experiments, we have Rev., 2001, 31, (1), pp. 16–34
tested the proposed method on sampled images from given videos, [21] El-Samak, A.F., Ashour, W.: ‘Optimization of traveling salesman problem
using affinity propagation clustering and genetic algorithm’, J. Artif. Intell.
where the ground truth is given by the time order of the sampled Soft Comput. Res., 2015, 5, (4), pp. 239–245
images. In addition, we show that CLD outperforms a SURF-BOF [22] Dorigo, M., Gambardella, L.M.: ‘Ant colonies for the travelling salesman
description in terms of computational efficiency and accuracy. problem’, Biosystems, 1997, 43, (2), pp. 73–81
Conclusively, the proposed methods yield accurate results on [23] Mahi, M., Baykan, Ö.K., Kodaz, H.: ‘A new hybrid method based on particle
swarm optimization, ant colony optimization and 3-OPT algorithms for
image sorting problem, showing the effectiveness of the proposed traveling salesman problem’, Appl. Soft Comput., 2015, 30, pp. 484–490
methodology even if a simple image descriptor has been used. [24] Zagoruyko, S., Komodakis, N.: ‘Learning to compare image patches via
According to our experimental results, an open problem that convolutional neural networks’. Proc. of the IEEE Conf. on Computer Vision
can be considered as future work is the automatic prediction of and Pattern Recognition, Boston, USA, 2015, pp. 4353–4361
[25] Panagiotakis, C., Ovsepian, N., Michael, E.: ‘Video synopsis based on a
video-shots ordering. This problem is still open and it requires sequential distortion minimization method’. Int. Conf. on Computer Analysis
high-level video features [29]. Additionally, the unsupervised of Images and Patterns, York, UK, 2013, pp. 94–101
methods, like the proposed one, may fail to predict the order of [26] Manjunath, B.S., Ohm, J.-R., Vasudevan, V.V., et al.: ‘Color and texture
shots since it is related to human perception; hence, this problem descriptors’, IEEE Trans. Circuits Syst. Video Technol., 2001, 11, (6), pp.
703–715
may be only well faced by supervised methods. [27] Liu, G.-H., Yang, J.-Y., Li, Z.: ‘Content-based image retrieval using
The TLP as well as the image sorting can be also defined under computational visual attention model’, Pattern Recognit., 2015, 48, (8), pp.
the real-time (online) conditions [30], where the cities arrive online 2554–2566

38 IET Image Process., 2020, Vol. 14 Iss. 1, pp. 31-39


© The Institution of Engineering and Technology 2019
[28] Philbin, J., Chum, O., Isard, M., et al.: ‘Object retrieval with large [30] Blom, M., Krumke, S.O., de Paepe, W.E., et al.: ‘The online TSP against fair
vocabularies and fast spatial matching’. IEEE Conf. on Computer Vision and adversaries’, INFORMS J. Comput., 2001, 13, (2), pp. 138–148
Pattern Recognition, 2007, CVPR'07, Minneapolis, USA, 2007, pp. 1–8
[29] Radenović, F., Tolias, G., Chum, O.: ‘CNN image retrieval learns from bow:
unsupervised fine-tuning with hard examples’. European Conf. on Computer
Vision, Amsterdam, Netherlands, 2016, pp. 3–20
