
Journal of Computer Applications, ISSN: 0974-1925, Volume 5, Issue EICA2012-1, February 10, 2012

Image Processing

Sparse Representation for Image Similarity
Assessment using SIFT Algorithm
Shanmugam C
Assistant Professor (Senior)
Dept of ECE
Sri Shakthi Institute of Engineering and
Technology, Coimbatore-62.
Nandhini K
Dept of ECE
Sri Shakthi Institute of Engineering and
Technology, Coimbatore-62.
Poovizhi P
Dept of ECE
Sri Shakthi Institute of Engineering and
Technology, Coimbatore-62.
Priyadharshni C
Dept of ECE
Sri Shakthi Institute of Engineering and
Technology, Coimbatore-62.
Abstract - Assessment of image similarity is fundamentally important to numerous
multimedia applications. The goal of similarity assessment is to automatically assess the
similarities among images in a perceptually consistent manner. In this project, we interpret
the image similarity assessment problem as an information fidelity problem. More
specifically, we propose a feature-based approach to quantify the information that is present
in a reference image and how much of this information can be extracted from a test image to
assess the similarity between the two images. Here, we extract the feature points and their
descriptors from an image, followed by learning the dictionary/basis for the descriptors in
order to interpret the information present in this image. Then, we formulate the problem of
the image similarity assessment in terms of sparse representation. A key strength of this
project is that even after an image is subjected to geometrical variations such as rotation,
shifting, change in orientation, or change in background, we can still correctly identify
the similarity between the two images. To evaluate the applicability of the proposed feature-
based sparse representation for image similarity assessment (FSRISA) technique, we apply
FSRISA to three popular applications, namely, image copy detection, retrieval, and
recognition by properly formulating them to sparse representation problems.
Keywords: image, sparse representation, image similarity assessment.
I. INTRODUCTION
Image similarity assessment is fundamentally important to numerous multimedia
information processing systems and applications, such as compression, restoration,
enhancement, copy detection, retrieval, and recognition/classification. The major goal of
image similarity assessment is to design algorithms for automatic and objective evaluation
of similarity in a manner that is consistent with subjective human evaluation.
Figure 1: Some examples of image manipulations. (a) Original image; (b) zoomed and shifted version; (c) original image; (d) background-modified version.
A. Steps in FSRISA
- Feature extraction (SIFT)
- Dictionary learning (K-SVD)
- Information extraction

Feature extraction is discussed in Section II. Dictionary learning using the K-SVD algorithm is discussed in Section III. Information extraction to calculate the similarity value is discussed in detail in Section IV. Applications are described in Section V, and results are discussed in Section VI.

B. Block Diagram
Block diagram of FSRISA: Reference image and test image → Feature extraction → Dictionary learning → Information extraction via sparse coding → Similarity value.
C. Algorithm
Input: A reference image I1 and a test image I2.
Output: The similarity value between I1 and I2, i.e., Sim(I1, I2).
1. Extract the SIFT feature vectors y1i, i = 1, 2, ..., K1, from I1, followed by learning the dictionary feature D1 sparsely representing y1i.
2. Extract the SIFT feature vectors y2j, j = 1, 2, ..., K2, from I2, followed by learning the dictionary feature D2 sparsely representing y2j.
3. Perform l1-minimization by solving Eq. (3) for y2j, j = 1, 2, ..., K2, with respect to D12 = [D1|D2].
4. Calculate the reconstruction errors, E1j and E2j, for y2j, j = 1, 2, ..., K2, with respect to D1 and D2, respectively.
5. Perform voting by comparing E1j and E2j for y2j, j = 1, 2, ..., K2, and get the percentages of votes, V1 and V2, with respect to D1 and D2, respectively.
6. Calculate Sim(I1, I2) = (V1 - V2 + 1) / 2.
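The following is a minimal Python sketch of the algorithm above, written for illustration only. The helper names extract_sift_descriptors, learn_dictionary, and sparse_code are assumptions standing in for the SIFT, K-SVD, and l1-minimization steps described in Sections II-IV; this is not the authors' implementation.

```python
import numpy as np

def similarity_from_dictionaries(D1, D2, Y2, sparse_code):
    """Steps 3-6: code each test descriptor against the joint dictionary [D1|D2],
    vote on which sub-dictionary reconstructs it better, and map the votes to [0, 1]."""
    D12 = np.hstack([D1, D2])                       # joint dictionary, shape (M, N1 + N2)
    N1 = D1.shape[1]
    votes_d1 = 0
    for j in range(Y2.shape[1]):
        y = Y2[:, j]
        x = sparse_code(y, D12)                     # step 3: l1-minimization, Eq. (3)
        x1, x2 = x.copy(), x.copy()
        x1[N1:] = 0.0                               # keep only D1 coefficients -> error E1j
        x2[:N1] = 0.0                               # keep only D2 coefficients -> error E2j
        e1 = np.linalg.norm(y - D12 @ x1)           # step 4: reconstruction errors
        e2 = np.linalg.norm(y - D12 @ x2)
        votes_d1 += int(e1 < e2)                    # step 5: vote for the better sub-dictionary
    v1 = votes_d1 / Y2.shape[1]                     # percentage of votes for D1
    v2 = 1.0 - v1                                   # percentage of votes for D2
    return (v1 - v2 + 1.0) / 2.0                    # step 6: Eq. (4)

def fsrisa_similarity(img_ref, img_test, extract_sift_descriptors, learn_dictionary, sparse_code):
    """Steps 1-2 (features and dictionaries) followed by the comparison above."""
    Y1 = extract_sift_descriptors(img_ref)          # SIFT descriptors of I1, shape (M, K1)
    Y2 = extract_sift_descriptors(img_test)         # SIFT descriptors of I2, shape (M, K2)
    D1 = learn_dictionary(Y1)                       # dictionary feature of I1 (the finer one)
    D2 = learn_dictionary(Y2)                       # dictionary feature of I2
    return similarity_from_dictionaries(D1, D2, Y2, sparse_code)
```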
II. EXTRACTION OF FEATURES (SIFT)
SIFT (Scale-Invariant Feature Transform) is a method for extracting distinctive invariant
features from images that can be used to perform reliable matching between different views
of an object or scene. The features are invariant to image scale and rotation, and are shown
to provide robust matching across a substantial range of affine distortion, change in 3D
viewpoint, addition of noise, and change in illumination. The features are highly distinctive,
in the sense that a single feature can be correctly matched with high probability against a
large database of features from many images.
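For concreteness, one common way to obtain such descriptors (not prescribed by the paper, which only requires SIFT features) is OpenCV's SIFT implementation; each descriptor is a 128-dimensional vector, and the sketch below returns them as columns of a matrix so that they can feed the dictionary-learning step.

```python
import cv2
import numpy as np

def extract_sift_descriptors(image_path):
    """Return SIFT descriptors as an M x K matrix (M = 128, K = number of keypoints)."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    if descriptors is None:                       # no keypoints were detected
        return np.empty((128, 0))
    return descriptors.T.astype(np.float64)       # one descriptor per column
```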
III. DICTIONARY LEARNING (K-SVD)
Given a set of K SIFT feature vectors, yi ∈ R^(M×1), i = 1, 2, ..., K, we apply K-SVD to find the dictionary D of size M×N, M < N << K, by formulating the problem as:

min_{D, {xi}} Σi ||yi − D xi||_2^2   subject to   ||xi||_0 ≤ L, i = 1, 2, ..., K,   ... (1)

where xi ∈ R^(N×1) is the sparse representation coefficient vector of yi; ||xi||_0, the l0-norm of xi, counts the number of nonzero coefficients of xi; and L is the maximum desired number of nonzero coefficients of xi. We apply K-SVD to solve Eq. (1) in an iterative manner with two stages: (i) sparse coding stage: apply OMP (orthogonal matching pursuit) to solve xi for each yi while fixing D; and (ii) dictionary update stage: update D together with the nonzero coefficients of xi. The two stages are performed iteratively until convergence. It should be noted that the l0-minimization formulation in Eq. (1) can be converted into an l1-minimization problem, and other dictionary learning algorithms (e.g., the online dictionary learning algorithm) can also be applied in the dictionary feature extraction stage. The obtained dictionary feature D is an overcomplete dictionary, D = [d1 | d2 | ... | dN] ∈ R^(M×N), containing N prototype feature-vector atoms as the column vectors of D. Each original feature vector yi ∈ R^(M×1), i = 1, 2, ..., K, can be sparsely represented as a linear combination of the atoms defined in D, satisfying ||yi − D xi||_2 ≤ ε, where ε ≥ 0 is an error tolerance.
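The sketch below illustrates the two-stage K-SVD loop just described, using NumPy and scikit-learn's OMP solver for the sparse coding stage. It follows Eq. (1) with target sparsity L but is a simplified illustration under those assumptions, not the reference K-SVD implementation.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(Y, n_atoms, sparsity, n_iter=10, seed=0):
    """Simplified K-SVD: Y is M x K; returns a dictionary D of shape M x n_atoms."""
    rng = np.random.default_rng(seed)
    M, K = Y.shape
    # Initialize D with randomly chosen training vectors, normalized to unit l2-norm.
    D = Y[:, rng.choice(K, n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12

    for _ in range(n_iter):
        # Sparse coding stage: OMP with at most `sparsity` nonzero coefficients per signal.
        X = orthogonal_mp(D, Y, n_nonzero_coefs=sparsity)      # shape (n_atoms, K)

        # Dictionary update stage: refresh each atom (and its coefficients) from the SVD
        # of the residual restricted to the signals that actually use this atom.
        for k in range(n_atoms):
            users = np.flatnonzero(X[k, :])
            if users.size == 0:
                continue
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]
            X[k, users] = S[0] * Vt[0, :]
    return D
```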
IV. INFORMATION EXTRACTION
After obtaining the dictionary feature for each image, we formulate the image similarity assessment based on dictionary feature matching as a sparse representation problem, described as follows. First, consider the two sets of SIFT feature (column) vectors with length M, y1i, i = 1, 2, ..., K1, and y2j, j = 1, 2, ..., K2, extracted, respectively, from the two images I1 and I2, where K1 and K2 are the numbers of feature vectors of I1 and I2, respectively. The dictionary features of I1 and I2 are D1 (of size M×N1) and D2 (of size M×N2), respectively, where M < N1 and M < N2. Hence, y1i = D1 x1i and y2j = D2 x2j, where x1i and x2j are the two sparse coefficient (column) vectors, with lengths N1 and N2, of y1i and y2j, respectively.
Obviously, if y1i and y2j can be matched, y1i can be represented sparsely and linearly with respect to D2. On the other hand, y2j can be represented sparsely and linearly with respect to D1. To assess the similarity between a reference image I1 and a test image I2, exploiting the discriminative characteristic of sparse representation, we want to quantify how much information present in I1 can be extracted from I2. A sparse representation problem for representing each SIFT feature vector y2j of I2 with respect to the joint dictionary D12 = [D1|D2] can be defined as

xj = arg min_x ||x||_0   subject to   ||y2j − D12 x||_2 ≤ ε,   ... (2)
where xj, with length (N1 + N2), is the sparse coefficient vector of the length-M feature vector y2j of I2, D12 = [D1|D2] of size M×(N1 + N2) is the joint dictionary concatenating D1 and D2, and ε ≥ 0 is an error tolerance. To find the sparsest solution for xj, Eq. (2) can be cast into an l1-minimization problem as

xj = arg min_x ( ||y2j − D12 x||_2^2 + λ ||x||_1 ),   ... (3)
where λ is a positive real-valued regularization parameter. In this project, we apply an efficient sparse coding algorithm, called the SpaRSA (sparse reconstruction by separable approximation) algorithm, to solve Eq. (3) in order to find the sparse representation xj of y2j with respect to the dictionary D12. SpaRSA is a very efficient iterative algorithm, where each step is obtained by solving an optimization subproblem involving a quadratic term with a diagonal Hessian plus the original sparsity-inducing regularizer. Of course, Eq. (3) can also be solved directly via a greedy algorithm such as OMP, or via other l1-minimization algorithms.
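As an illustration only, the same l1-regularized problem of Eq. (3) can be handed to any generic solver. The sketch below uses scikit-learn's Lasso as a stand-in for SpaRSA, with the regularization weight rescaled so that its objective matches Eq. (3); the value of lam is an assumed placeholder, not a value reported in the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_code(y, D12, lam=0.05):
    """Approximately solve Eq. (3): min_x ||y - D12 x||_2^2 + lam * ||x||_1."""
    # scikit-learn's Lasso minimizes (1 / (2 * n_samples)) * ||y - Dx||^2 + alpha * ||x||_1,
    # so alpha is rescaled to keep both terms in the same proportion as Eq. (3).
    n_samples = D12.shape[0]
    solver = Lasso(alpha=lam / (2 * n_samples), fit_intercept=False, max_iter=5000)
    solver.fit(D12, y)
    return solver.coef_
```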
It is expected that the positions of the nonzero coefficients in xj (or the atoms selected from D12) should be highly concentrated on only one sub-dictionary (e.g., D1 or D2), and the remaining coefficients in xj should be zero or small enough. Also, it is intuitive to expect that the atoms for sparsely representing y2j should mostly be selected from the sub-dictionary D2, learned from the feature vectors extracted from the image I2 itself, instead of from D1. If the parameters for learning the two dictionaries (D1 and D2) are adequately tuned, the manner of atom selection in the sparse coding process may change accordingly. That is, we intend to make the sparse coefficients xj (or the used atoms), solved by performing sparse coding for y2j, more consistent with our expectation in order to help similarity assessment. More specifically, we expect y2j to use more atoms from D2 for its representation when I2 and I1 are visually different. On the other hand, we expect y2j to use more atoms from D1 for its representation when I2 and I1 are visually similar. The details are described below. Based on the obtained solution xj of Eq. (3), we can calculate
the reconstruction error as ||y2j − D12 xj||_2. By letting the elements in xj corresponding to the atoms from D2 be zeros, we can get the reconstruction error E1j, obtained using only the atoms from D1 for reconstructing y2j. On the other hand, by letting the elements in xj corresponding to the atoms from D1 be zeros, we can get the reconstruction error E2j, obtained using only the atoms from D2 for reconstructing y2j. If E1j < E2j, it is claimed that the atoms from D1 are more suitable for representing y2j than those from D2, and D1 gets a vote. Otherwise, if E2j < E1j, y2j is more suitably represented by the atoms from D2 (the dictionary learned from the feature vectors of I2 itself) than by those from D1, and D2 gets a vote. Considering all the SIFT feature vectors y2j, j = 1, 2, ..., K2, of I2, the obtained percentages of votes for D1 and D2 are denoted by V1 and V2, 0 ≤ V1, V2 ≤ 1, respectively. Based on the voting strategy, we define the similarity between the two images, I1 and I2, as:

Sim(I1, I2) = (V1 − V2 + 1) / 2,   ... (4)

where the range of (V1 − V2) is [−1, 1], which can be shifted to [0, 1], resulting in the Sim(I1, I2) defined in Eq. (4).
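As a quick numeric check of Eq. (4), with illustrative numbers only: if 140 of K2 = 200 test descriptors reconstruct better with atoms from D1, then V1 = 0.7, V2 = 0.3, and Sim(I1, I2) = (0.7 − 0.3 + 1)/2 = 0.7. The same computation in code:

```python
def similarity_from_vote_counts(votes_d1, votes_d2):
    """Eq. (4): convert raw vote counts into percentages V1, V2 and then into Sim in [0, 1]."""
    total = votes_d1 + votes_d2
    v1, v2 = votes_d1 / total, votes_d2 / total
    return (v1 - v2 + 1.0) / 2.0

print(similarity_from_vote_counts(140, 60))   # 0.7 for the illustrative counts above
```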
A larger Sim(I1, I2) indicates that more atoms from the dictionary D1 learned from I1 can well represent the feature vectors y2j extracted from I2. This implies that a considerable amount of the information (denoted by D1) present in I1 can be extracted (via sparse coding, by solving Eq. (3)) from I2. On the other hand, a smaller Sim(I1, I2) indicates that the most suitable atoms for representing the y2j extracted from I2 come from D2, learned from I2 itself. This implies that little or no information present in I1 can be extracted from I2. Hence, the larger Sim(I1, I2) is, the more similar the images I1 and I2 are. Obviously, if I1 is visually very different from I2, V2 is larger than V1. Nevertheless, if I1 is visually similar to I2, V2 will not be larger than V1 in all instances. That is, better (or comparable) reconstruction performance for y2j may be achieved using D1 as the dictionary rather than D2, because some feature vectors extracted from I2 can be matched by feature vectors extracted from I1.

To achieve this goal, we propose three rules for tuning the parameters used by K-SVD for learning the two dictionaries, D1 and D2: (i) the number of atoms in D1 should be larger than that in D2 (N1 > N2); (ii) the number of iterations (J1) that K-SVD performs for learning D1 should be larger than that (J2) for learning D2 (J1 > J2); and (iii) the target sparsity (L1), i.e., the number of nonzero coefficients used to represent each feature vector when learning D1, should be larger than that (L2) for learning D2 (L1 > L2). According to the rules designed above, when I1 is visually similar to I2 and D1 is finer than D2, the l1-minimizer for solving Eq. (3) may prefer more promising atoms from D1 than from D2 to reconstruct y2j, resulting in V1 > V2 and a larger Sim(I1, I2).
Otherwise, when I1 is visually different from I2, most atoms for reconstructing y2j will still be selected from D2, resulting in V1 < V2 and a smaller Sim(I1, I2). The proposed FSRISA technique is summarized in the Algorithm given above.
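In code, the three tuning rules simply become asymmetric K-SVD hyper-parameters when learning the two dictionaries. The specific numbers below are illustrative placeholders (the paper does not report values here), reusing the ksvd sketch from Section III:

```python
# Rules (i)-(iii): make the reference dictionary D1 "finer" than the test dictionary D2
# by giving it more atoms (N1 > N2), more K-SVD iterations (J1 > J2), and a larger
# target sparsity (L1 > L2). All numeric values below are assumed for illustration.
D1 = ksvd(Y1, n_atoms=256, sparsity=8, n_iter=20)   # dictionary of the reference image I1
D2 = ksvd(Y2, n_atoms=128, sparsity=4, n_iter=10)   # dictionary of the test image I2
```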
The major reason for performing sparse coding with respect to the joint dictionary consisting of D1 and D2, instead of D1 only, can be addressed as follows. When I1 (the reference image) is visually different from I2 (the test image), D1 and D2 are significantly different. In this scenario, the idea behind our FSRISA is somewhat related to that of sparse coding-based image classification approaches, or sparse coding-based image decomposition approaches. We use a similar concept to quantify the similarity between I2 and I1, which may be interpreted as either (i) classifying I2 into I1 or I2 itself, or (ii) decomposing I2 into the components of I1 and/or those of I2 itself. When I1 is visually similar to I2, D1 and D2 are similar, which is enforced by making D1 finer than D2 in FSRISA. Hence, the above discussion is also valid in this scenario. Moreover, why we do not perform sparse coding with respect to only one
dictionary D1 can be explained as follows. When performing sparse coding for the feature vectors of I2 with respect to a dictionary D1 whose atoms may not be suitable for sparsely representing them, the sparse coding procedure still attempts to minimize the reconstruction errors. Based on our experience, the reconstruction errors obtained with respect to a related dictionary are then usually not well distinguishable from those obtained with respect to an unrelated one. On the other hand, it is not easy to define a bounded score based on the reconstruction error obtained from only one dictionary.
V. APPLICATIONS
In this section, we introduce three multimedia applications of the proposed FSRISA: image copy detection, retrieval, and recognition.
A. Image Copy Detection via FSRISA
Digital images distributed through the Internet may suffer from several possible manipulations, such as (re)compression, noise addition, contrast/brightness adjustment, and geometrical operations. To ensure trustworthiness, image copy detection techniques have emerged to search for duplicates and forgeries.
Image copy detection can be achieved via a content-based copy detection approach, which measures the similarity/distance between an original image and a possible copy by comparing their extracted image features; SIFT-based features have recently been investigated for this purpose. In this section, we study content-based image copy detection by applying the proposed FSRISA approach.
A user can perform image copy detection to detect possible copies of her/his original image on the Internet or in an image database. To detect whether a test image I2 is actually a copy of a query image I1 with dictionary feature D1 of size M×N1, we first extract the SIFT feature vectors y2j, j = 1, 2, ..., K2, and learn the dictionary feature D2 of size M×N2 of I2, such that D1 is finer than D2.

Then, we perform l1-minimization by solving Eq. (3) for each y2j with respect to D12 = [D1|D2], and perform voting to get the percentages of votes, V1 and V2, with respect to D1 and D2, respectively. Finally, based on Eq. (4), the similarity between I1 and I2 can be calculated as Sim(I1, I2). Given an empirically determined threshold τ, if Sim(I1, I2) ≥ τ, then I2 is determined to be a copy of I1; otherwise, I1 and I2 are determined to be unrelated. The computational complexity of performing FSRISA-based image copy detection can also be analyzed in a similar manner.
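A sketch of the copy-detection decision, reusing the helpers sketched in the earlier sections; the threshold value is an assumed placeholder, since the paper only states that it is determined empirically, and learn_dictionary is an assumed wrapper over the ksvd sketch.

```python
def learn_dictionary(Y):
    # Assumed wrapper over the ksvd sketch from Section III; parameter values are placeholders.
    return ksvd(Y, n_atoms=256, sparsity=8, n_iter=20)

def is_copy(img_query, img_test, threshold=0.6):
    """Declare img_test a copy of img_query when Sim(I1, I2) reaches the (assumed) threshold."""
    sim = fsrisa_similarity(img_query, img_test,
                            extract_sift_descriptors, learn_dictionary, sparse_code)
    return sim >= threshold
```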
B. Image Retrieval via FSRISA
The most popular image retrieval approach is content-based image retrieval (CBIR), where
the most common technique is to measure the similarity between two images by comparing
their extracted image features.
In the proposed scheme, for a query image, we extract its dictionary feature (with N atoms)
and transmit it to an image database, where each image is stored together with its dictionary
feature and original SIFT feature vectors. For comparing the query image IQ and each
database image IDi, i = 1, 2, ..., size_of_database, where size_of_database denotes the total
number of database images, we apply the proposed FSRISA scheme to perform the l1-
minimization (Eq. (3)) for each SIFT feature vector of IDi with respect to the dictionary
consisting of the dictionary features of the two images. Then, we calculate the reconstruction
errors of all the stored feature vectors of IDi and perform voting to get the similarity value
between the two images (Eq. (4)) to be the score of IDi.
Finally, we retrieve the top Q database images with the largest scores. Similarly, the
computational complexity of comparing the two images can be analyzed except that the
complexity for extracting the dictionary feature of each database image can be excluded due
to the fact that the process can be performed in advance during database construction.
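A retrieval sketch built on the same helpers; for brevity it recomputes each database image's features on the fly, whereas the scheme above stores the dictionary features and SIFT vectors of database images in advance. The parameter q (the number of images to return) is an assumed placeholder.

```python
def retrieve_top_q(img_query, database_images, q=10):
    """Score every database image with FSRISA and return the indices of the q best matches."""
    scored = []
    for idx, img_db in enumerate(database_images):
        sim = fsrisa_similarity(img_query, img_db,
                                extract_sift_descriptors, learn_dictionary, sparse_code)
        scored.append((sim, idx))
    scored.sort(reverse=True)                 # largest similarity value first
    return [idx for _, idx in scored[:q]]
```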
C. Image Recognition via FSRISA
Consider a well-classified image database, where each class includes several images with
the same object, but with different variations. Given a query image, a user may enquire
which class the image belongs to. For image recognition, sparse representation techniques
have been extensively used.
The major idea is to exploit the fact that the sparsest representation is naturally
discriminative. Among all of the subsets of atoms in a dictionary, it selects the subset that
most compactly expresses the input signal and rejects all of the others with less compact
representations. More specifically, image recognition/classification can be achieved by
representing the feature of the query image as a linear combination of those training samples
from the same class in a dictionary.
Moreover, prior work concluded that sparse representation-based face recognition algorithms should be extended to less constrained conditions (e.g., variations in object pose or misalignment). In order not to incur such a constraint, both variability-invariant features and sparse representation should be properly integrated. In addition, in that work the dictionary consists of several subsets of image features (down-sampled image pixels were used), where each subset contains the features of several training images belonging to the same class.
Nevertheless, if the number of classes, the number of training images in each class, or the feature dimension of a training image is too large, the dictionary size will be very large. This induces very high computational complexity when performing sparse coding for the feature vector(s) of a query image. Hence, we propose an image recognition approach in which we assess the similarity between a query image and each class of training images based on the proposed FSRISA. In the training stage, for the i-th image class, i = 1, 2, ..., C, where C denotes the number of classes in an image database, we extract the SIFT feature vectors of each image as the training samples. Then, we apply K-SVD to learn the dictionary DCi of size M×NCi to be the dictionary feature of the i-th image class, where M (= 128) denotes the length of a SIFT feature vector and NCi denotes the number of atoms of DCi. In the recognition stage, for a query image IQ, we extract the SIFT feature vectors yQj, j = 1, 2, ..., KQ, where KQ denotes the number of SIFT feature vectors, and the dictionary feature DQ.
Then, we apply FSRISA to assess the similarity between IQ and the i-th image class, i = 1, 2, ..., C, by performing the l1-minimization (similar to Eq. (3)) to obtain the sparse representation coefficients xQj for each yQj of IQ with respect to the joint dictionary [DCi|DQ] consisting of DCi and DQ. Then, we calculate the reconstruction errors for yQj with respect
to DCi and DQ, respectively, and perform voting for each yQj. Based on Eq. (4), we can calculate the similarity between IQ and the i-th image class, denoted by Sim(IQ, Class-i). Finally, the query image IQ is determined to belong to the class with the largest Sim(IQ, Class-i). Moreover, the computational complexity of assessing the similarity between the query image IQ and the i-th class of images can also be analyzed approximately, noting that the dictionary feature extraction for the i-th class of images can be performed in advance during database construction. In our image recognition approach, the sparse coding procedure must be performed for a query image against each image class, which is indeed computationally expensive. The complexity of our approach can be reduced by applying more efficient sparse coding techniques, such as multi-core OMP.
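A recognition sketch using the per-class dictionaries DCi learned in the training stage; it reuses the joint-dictionary coding and voting routine from the Algorithm sketch, with DCi playing the role of D1 and the query dictionary DQ playing the role of D2. The function and variable names are assumptions, not the authors' code.

```python
def recognize(img_query, class_dictionaries):
    """Return the class label with the largest Sim(IQ, Class-i).
    class_dictionaries maps label -> pre-learned class dictionary DCi (shape M x NCi)."""
    YQ = extract_sift_descriptors(img_query)        # SIFT feature vectors yQj of the query
    DQ = learn_dictionary(YQ)                       # dictionary feature of the query image
    best_label, best_sim = None, -1.0
    for label, DCi in class_dictionaries.items():
        sim = similarity_from_dictionaries(DCi, DQ, YQ, sparse_code)   # Sim(IQ, Class-i)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim
```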
VI. CONCLUSION
Thus, we have presented a method for assessing the similarity between two images. The core idea is a feature-based image similarity assessment technique that explores the two aspects of a feature detector, representation and matching, within our FSRISA framework. We then properly formulate the image copy detection, retrieval, and recognition problems as sparse representation problems and solve them based on our FSRISA.
REFERENCES
1. L.-W. Kang, C.-Y. Hsu, H.-W. Chen, C.-S. Lu, C.-Y. Lin, and S.-C. Pei, "Feature-based sparse representation for image similarity assessment," IEEE Transactions on Multimedia, vol. 13, no. 5, October 2011.
2. L. W. Kang, C. Y. Hsu, H. W. Chen, and C. S. Lu, "Secure SIFT-based sparse representation for image copy detection and recognition," Proc. of IEEE Int. Conf. on Multimedia and Expo, Singapore, July 2010.
3. D. Nistér and H. Stewénius, "Scalable recognition with a vocabulary tree," IEEE Conf. on Computer Vision and Pattern Recognition, 2006, pp. 2161-2168.
4. D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
5. M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, November 2006.
6. Z. Wang and A. C. Bovik, "Mean squared error: Love it or leave it? A new look at signal fidelity measures," IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98-117, January 2009.
7. H. R. Sheikh and A. C. Bovik, "Image information and visual quality," IEEE Trans. on Image Processing, vol. 15, no. 2, pp. 430-444, February 2006.
8. Z. Wu, Q. Ke, J. Sun, and H.-Y. Shum, "A multi-sample, multi-tree approach to bag-of-words image representation for image retrieval."
