Journal of Computer Applications, ISSN: 0974-1925, Volume-5, Issue EICA2012-1, February 10, 2012

Image Processing
Sparse Representation for Image Similarity Assessment using SIFT Algorithm

Shanmugam C, Assistant Professor (Senior), Dept. of ECE, Sri Shakthi Institute of Engineering and Technology, Coimbatore-62.
Nandhini K, Dept. of ECE, Sri Shakthi Institute of Engineering and Technology, Coimbatore-62.
Poovizhi P, Dept. of ECE, Sri Shakthi Institute of Engineering and Technology, Coimbatore-62.
Priyadharshni C, Dept. of ECE, Sri Shakthi Institute of Engineering and Technology, Coimbatore-62.

Abstract - Assessment of image similarity is fundamentally important to numerous multimedia applications. The goal of similarity assessment is to automatically assess the similarities among images in a perceptually consistent manner. In this project, we interpret the image similarity assessment problem as an information fidelity problem. More specifically, we propose a feature-based approach to quantify the information that is present in a reference image and how much of this information can be extracted from a test image in order to assess the similarity between the two images. We extract the feature points and their descriptors from an image, then learn a dictionary/basis for the descriptors in order to interpret the information present in the image. We then formulate the image similarity assessment problem in terms of sparse representation. The prime importance of this project is that even after an image is subjected to geometrical variations such as rotation, shifting, change in orientation, or change in background, we can still correctly identify the similarity between the two images. To evaluate the applicability of the proposed feature-based sparse representation for image similarity assessment (FSRISA) technique, we apply FSRISA to three popular applications, namely, image copy detection, retrieval, and recognition, by properly formulating them as sparse representation problems.

Keywords: image, sparse, image similarity assessment.

I. INTRODUCTION

Image similarity assessment is fundamentally important to numerous multimedia information processing systems and applications, such as compression, restoration, enhancement, copy detection, retrieval, and recognition/classification. The major goal of image similarity assessment is to design algorithms for automatic and objective evaluation of similarity in a manner that is consistent with subjective human evaluation.
Figure 1: Some examples of image manipulations. (a) original image; (b) zoomed and shifted version; (c) original image; (d) background-modified version.

A. Steps in FSRISA
- Feature extraction: SIFT
- Dictionary learning: K-SVD
- Information extraction

Feature extraction is discussed in Section II. Dictionary learning using the K-SVD algorithm is discussed in Section III. Information extraction to calculate the similarity value is discussed in detail in Section IV. Applications are described in Section V. Results are discussed in Section VI.
B. Block diagram

[Block diagram: the reference image and the test image each pass through Feature Extraction; the extracted features feed Dictionary Learning and Information extraction via Sparse Coding, which outputs the Similarity Value.]

C. Algorithm

Input: A reference image I1 and a test image I2.
Output: The similarity value between I1 and I2, i.e., Sim(I1, I2).
1. Extract the SIFT feature vectors y_1i, i = 1, 2, ..., K1, from I1, followed by learning the dictionary feature D1 sparsely representing y_1i.
2. Extract the SIFT feature vectors y_2j, j = 1, 2, ..., K2, from I2, followed by learning the dictionary feature D2 sparsely representing y_2j.
3. Perform l1-minimization by solving Eq. (3) for y_2j, j = 1, 2, ..., K2, with respect to D12 = [D1 | D2].
4. Calculate the reconstruction errors, E_1j and E_2j, for y_2j, j = 1, 2, ..., K2, with respect to D1 and D2, respectively.
5. Perform voting by comparing E_1j and E_2j for y_2j, j = 1, 2, ..., K2, and get the percentages of votes, V1 and V2, with respect to D1 and D2, respectively.
6. Calculate Sim(I1, I2) = (V1 - V2 + 1)/2.

II. EXTRACTION OF FEATURES (SIFT)

SIFT (Scale Invariant Feature Transform) is a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images.

III. DICTIONARY LEARNING (K-SVD)

Given a set of K SIFT feature vectors, $y_i \in \mathbb{R}^{M \times 1}$, $i = 1, 2, \ldots, K$, we apply K-SVD to find the dictionary D of size M x N, with M < N << K, by formulating the problem as:

$\min_{D, \{x_i\}} \sum_{i} \| y_i - D x_i \|_2^2 \quad \text{subject to} \quad \| x_i \|_0 \le L, \quad i = 1, 2, \ldots, K,$ ... (1)

where $x_i \in \mathbb{R}^{N \times 1}$ is the sparse representation coefficient vector of $y_i$, $\| x_i \|_0$, the l0-norm of $x_i$, counts the number of nonzero coefficients of $x_i$, and L is the desired maximum number of nonzero coefficients of $x_i$. We apply K-SVD to solve Eq. (1) in an iterative manner with two stages: (i) sparse coding stage: apply OMP (orthogonal matching pursuit) to solve $x_i$ for each $y_i$ while fixing D; and (ii) dictionary update stage: update D together with the nonzero coefficients of $x_i$. The two stages are iterated until convergence. It should be noted that the l0-minimization formulation in Eq. (1) can be converted into an l1-minimization problem, and other dictionary learning algorithms (e.g., the online dictionary learning algorithm) can also be applied in the dictionary feature extraction stage. The obtained dictionary feature D is an overcomplete dictionary, $D = [d_1, d_2, \ldots, d_N] \in \mathbb{R}^{M \times N}$, containing N prototype feature-vector atoms as the column vectors of D. Each original feature vector $y_i \in \mathbb{R}^{M \times 1}$, $i = 1, 2, \ldots, K$, can be sparsely represented as a linear combination of the atoms defined in D, satisfying $\| y_i - D x_i \|_2 \le \varepsilon$, where $\varepsilon \ge 0$ is an error tolerance.
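To make the pipeline concrete, the following is a minimal sketch of Steps 1 and 2 (SIFT extraction and dictionary learning) in Python. It assumes OpenCV for SIFT and uses scikit-learn's MiniBatchDictionaryLearning with an OMP coding step as a stand-in for K-SVD, which scikit-learn does not provide; the file names, atom counts, and sparsity levels are illustrative assumptions, not values from the paper.

```python
# Sketch of Steps 1-2 of the FSRISA algorithm: extract SIFT descriptors from
# each image and learn an overcomplete dictionary per image. Assumes OpenCV
# (cv2) and scikit-learn. MiniBatchDictionaryLearning is a stand-in for the
# K-SVD learner used in the paper; its internal update rule differs from
# K-SVD, so this only illustrates the data flow.
import cv2
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def sift_descriptors(image_path):
    """Return the K x 128 matrix of SIFT descriptors y_i of one image."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, descriptors = sift.detectAndCompute(gray, None)
    return descriptors.astype(np.float64)            # shape (K, M), M = 128

def learn_dictionary(descriptors, n_atoms, sparsity):
    """Learn an M x N dictionary D whose atoms sparsely represent the descriptors."""
    learner = MiniBatchDictionaryLearning(
        n_components=n_atoms,                        # N atoms, with M < N << K
        transform_algorithm="omp",                   # OMP for the sparse-coding step
        transform_n_nonzero_coefs=sparsity,          # target sparsity L when coding
        random_state=0,
    )
    learner.fit(descriptors)                         # rows are the samples y_i
    return learner.components_.T                     # D of size M x N

# Per the tuning rules in Sec. IV, the reference dictionary D1 is learned to be
# "finer" than D2 (N1 > N2, L1 > L2). File names and parameters are illustrative.
y1 = sift_descriptors("reference.jpg")
y2 = sift_descriptors("test.jpg")
D1 = learn_dictionary(y1, n_atoms=256, sparsity=8)
D2 = learn_dictionary(y2, n_atoms=128, sparsity=4)
```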
IV. INFORMATION EXTRACTION

After obtaining the dictionary feature for each image, we formulate the image similarity assessment based on dictionary feature matching as a sparse representation problem, described as follows. First, consider the two sets of SIFT feature (column) vectors of length M, y_1i, i = 1, 2, ..., K1, and y_2j, j = 1, 2, ..., K2, extracted, respectively, from the two images I1 and I2, where K1 and K2 are the numbers of feature vectors of I1 and I2, respectively. The dictionary features of I1 and I2 are D1 (of size M x N1) and D2 (of size M x N2), respectively, where M < N1 and M < N2. Hence, y_1i = D1 x_1i and y_2j = D2 x_2j, where x_1i and x_2j are the sparse coefficient (column) vectors, of length N1 and N2, of y_1i and y_2j, respectively. Obviously, if y_1i and y_2j can be matched, y_1i can be represented sparsely and linearly with respect to D2; conversely, y_2j can be represented sparsely and linearly with respect to D1.

To assess the similarity between a reference image I1 and a test image I2, exploiting the discriminative characteristic of sparse representation, we want to quantify how much of the information present in I1 can be extracted from I2. A sparse representation problem for representing each SIFT feature vector y_2j of I2 with respect to the joint dictionary D12 = [D1 | D2] can be defined as

$\hat{x}_j = \arg\min_{x_j} \| x_j \|_0 \quad \text{subject to} \quad \| y_{2j} - D_{12} x_j \|_2 \le \varepsilon,$ ... (2)

where x_j, of length (N1 + N2), is the sparse coefficient vector of y_2j (of length M) of I2, D12 = [D1 | D2] of size M x (N1 + N2) is the joint dictionary concatenating D1 and D2, and $\varepsilon \ge 0$ is an error tolerance. To find the sparsest solution for x_j, Eq. (2) can be cast as an l1-minimization problem:

$\hat{x}_j = \arg\min_{x_j} \left( \| y_{2j} - D_{12} x_j \|_2^2 + \lambda \| x_j \|_1 \right),$ ... (3)

where $\lambda$ is a positive real parameter. In this project, we apply an efficient sparse coding algorithm, the SpaRSA (sparse reconstruction by separable approximation) algorithm, to solve Eq. (3) in order to find the sparse representation x_j of y_2j with respect to the dictionary D12. SpaRSA is a very efficient iterative algorithm, where each step is obtained by solving an optimization subproblem involving a quadratic term with a diagonal Hessian plus the original sparsity-inducing regularizer. Of course, Eq. (3) can also be solved directly via a greedy algorithm, such as OMP, or by other l1-minimization algorithms.

It is expected that the positions of the nonzero coefficients in x_j (or the selected atoms from D12) will be highly concentrated in only one sub-dictionary (D1 or D2), and that the remaining coefficients in x_j will be zero or small enough. It is also intuitive to expect that the atoms for sparsely representing y_2j will mostly be selected from the sub-dictionary D2, learned from the feature vectors extracted from the image I2 itself, instead of D1. If the parameters for learning the two dictionaries (D1 and D2) are adequately tuned, the manner of atom selection in the sparse coding process may change accordingly. That is, we intend to make the sparse coefficients x_j (or the atoms used), obtained by performing sparse coding for y_2j, more consistent with our expectation so as to aid similarity assessment. More specifically, we expect y_2j to use more atoms from D2 when I2 and I1 are visually different, and more atoms from D1 when I2 and I1 are visually similar.
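As a rough illustration of Eq. (3), the sketch below codes a single test descriptor against the joint dictionary D12 = [D1 | D2]. The paper uses SpaRSA; scikit-learn's Lasso (coordinate descent) is substituted here as the l1 solver, and the regularization weight lam is an assumed, illustrative value.

```python
# Sketch of the l1-minimization in Eq. (3) for one test descriptor y_2j against
# the joint dictionary D12 = [D1 | D2]. The paper solves this with SpaRSA;
# scikit-learn's Lasso (coordinate descent) is substituted here, and the
# regularization weight lam is an assumed, illustrative value.
import numpy as np
from sklearn.linear_model import Lasso

def sparse_code_joint(y2j, D1, D2, lam=0.05):
    """Return the sparse coefficient vector x_j (length N1 + N2) for y_2j."""
    D12 = np.hstack([D1, D2])                        # M x (N1 + N2) joint dictionary
    # Lasso minimizes (1 / (2 * n_samples)) * ||y - D x||_2^2 + alpha * ||x||_1,
    # i.e. Eq. (3) up to a constant scaling of the data-fit term.
    solver = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    solver.fit(D12, y2j)
    return solver.coef_                              # x_j, mostly zeros

# Example: code every test descriptor y_2j, j = 1, ..., K2.
# X = np.array([sparse_code_joint(y, D1, D2) for y in y2])
```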
The details are described later in this subsection. Based on the obtained solution x_j of Eq. (3), we can calculate the reconstruction error as ||y_2j - D12 x_j||_2. By setting the elements of x_j corresponding to the atoms from D2 to zero, we obtain the reconstruction error E_1j, using only the atoms from D1 to reconstruct y_2j. On the other hand, by setting the elements of x_j corresponding to the atoms from D1 to zero, we obtain the reconstruction error E_2j, using only the atoms from D2 to reconstruct y_2j. If E_1j < E_2j, the atoms from D1 are claimed to be more suitable than those from D2 for representing y_2j, and D1 gets a vote. Otherwise, if E_2j < E_1j, y_2j is more suitably represented by the atoms from D2 (the dictionary learned from the feature vectors y_2j themselves) than by D1, and D2 gets a vote. Considering all the SIFT feature vectors of I2, y_2j, j = 1, 2, ..., K2, the obtained percentages of votes for D1 and D2 are denoted by V1 and V2, with 0 <= V1, V2 <= 1, respectively. Based on this voting strategy, we define the similarity between the two images I1 and I2 as:

Sim(I1, I2) = (V1 - V2 + 1)/2, ... (4)

where the range of (V1 - V2) is [-1, 1], which is shifted to [0, 1], resulting in the Sim(I1, I2) defined in Eq. (4). A larger Sim(I1, I2) indicates that more atoms from the dictionary D1 learned from I1 can well represent the feature vectors y_2j extracted from I2. This implies that a considerable amount of the information (represented by D1) present in I1 can be extracted (via sparse coding, by solving Eq. (3)) from I2. On the other hand, a smaller Sim(I1, I2) indicates that the most suitable atoms for representing the y_2j extracted from I2 come from D2, learned from I2 itself. This implies that little or no information present in I1 can be extracted from I2. Hence, the larger Sim(I1, I2) is, the more similar the images I1 and I2 are. Obviously, if I1 is visually very different from I2, V2 is larger than V1. Nevertheless, if I1 is visually similar to I2, V2 will not be larger than V1 in all instances; that is, better (or similar) reconstruction performance for y_2j may be achieved using D1 as the dictionary rather than D2, because some feature vectors extracted from I2 can be matched by feature vectors extracted from I1. To achieve this goal, we propose three rules for tuning the parameters used by K-SVD for learning the two dictionaries, D1 and D2: (i) the number of atoms in D1 should be larger than that in D2 (N1 > N2); (ii) the number of iterations J1 that K-SVD performs for learning D1 should be larger than the number J2 for learning D2 (J1 > J2); and (iii) the target sparsity L1, i.e., the number of nonzero coefficients for representing each feature vector when learning D1, should be larger than L2 for learning D2 (L1 > L2). According to these rules, when I1 is visually similar to I2 and D1 is finer than D2, the l1-minimizer solving Eq. (3) may prefer more promising atoms from D1 than from D2 to reconstruct y_2j, resulting in V1 > V2 and a larger Sim(I1, I2). Otherwise, when I1 is visually different from I2, most atoms for reconstructing y_2j will still be selected from D2, resulting in V1 < V2 and a smaller Sim(I1, I2).
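The voting rule and Eq. (4) can be sketched as follows; this is an illustrative outline of Steps 4 to 6 under the same assumptions as the earlier sketches, reusing the hypothetical sparse_code_joint helper, and is not the authors' implementation.

```python
# Sketch of Steps 4-6: reconstruction errors E_1j and E_2j are obtained by
# zeroing the coefficients belonging to the other sub-dictionary, each
# descriptor votes for D1 or D2, and Eq. (4) maps (V1 - V2) into [0, 1].
# Reuses the hypothetical sparse_code_joint helper from the previous sketch.
import numpy as np

def similarity(y2, D1, D2):
    """Sim(I1, I2) from the test descriptors y2 (K2 x M) and dictionaries D1, D2."""
    D12 = np.hstack([D1, D2])
    n1 = D1.shape[1]                                  # number of atoms in D1
    votes_d1 = 0
    for y2j in y2:
        xj = sparse_code_joint(y2j, D1, D2)
        x_from_d1 = xj.copy(); x_from_d1[n1:] = 0.0   # keep only the D1 atoms
        x_from_d2 = xj.copy(); x_from_d2[:n1] = 0.0   # keep only the D2 atoms
        e1 = np.linalg.norm(y2j - D12 @ x_from_d1)    # E_1j
        e2 = np.linalg.norm(y2j - D12 @ x_from_d2)    # E_2j
        if e1 < e2:                                   # D1 represents y_2j better
            votes_d1 += 1
    v1 = votes_d1 / len(y2)                           # percentage of votes for D1
    v2 = 1.0 - v1                                     # percentage of votes for D2
    return (v1 - v2 + 1.0) / 2.0                      # Eq. (4)
```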
The proposed FSRISA technique is summarized in the Algorithm above. The rationale for performing sparse coding with respect to the dictionary consisting of both D1 and D2, instead of D1 only, can be addressed as follows. When I1 (the reference image) is visually different from I2 (the test image), D1 and D2 are significantly different. In this scenario, the idea behind FSRISA is somewhat related to that of sparse coding-based image classification or sparse coding-based image decomposition. We use a similar concept to quantify the similarity between I2 and I1, which may be interpreted either as (i) classifying I2 into I1 or into I2 itself, or as (ii) decomposing I2 into the components of I1 and/or those of I2 itself. When I1 is visually similar to I2, D1 and D2 are similar, and in FSRISA it is additionally enforced that D1 is finer than D2; hence, the above discussion remains valid in this scenario. Moreover, the reason why we do not perform sparse coding with respect to only one
dictionary, D1, can be explained as follows. When performing sparse coding for the feature vectors of I2 with respect to a dictionary D1 whose atoms may not be suitable for sparsely representing them, the sparse coding procedure still attempts to minimize the reconstruction errors. In our experience, the reconstruction errors obtained with respect to related and unrelated dictionaries are usually not well distinguishable. Furthermore, it is not easy to define a bounded score based on the reconstruction error obtained from only one dictionary.

V. APPLICATIONS

In this section, we introduce three multimedia applications of the proposed FSRISA: image copy detection, retrieval, and recognition.

A. Image Copy Detection via FSRISA

Digital images distributed through the Internet may suffer from several possible manipulations, such as (re)compression, noising, contrast/brightness adjustment, and geometrical operations. To ensure trustworthiness, image copy detection techniques have emerged to search for duplicates and forgeries. Image copy detection can be achieved via content-based copy detection, which measures the similarity/distance between an original image and a possible copy by comparing their extracted image features; SIFT-based features have recently been investigated for this purpose. In this section, we study content-based image copy detection by applying the proposed FSRISA approach. A user can perform image copy detection to detect possible copies of her/his original image on the Internet or in an image database. To detect whether a test image I2 is actually a copy of a query image I1 with dictionary feature D1 of size M x N1, we first extract the SIFT feature vectors y_2j, j = 1, 2, ..., K2, and learn the dictionary feature D2 of size M x N2 of I2, such that D1 is finer than D2. Then, we perform l1-minimization by solving Eq. (3) for each y_2j with respect to D12 = [D1 | D2], and vote to get the percentages of votes, V1 and V2, with respect to D1 and D2, respectively. Finally, based on Eq. (4), the similarity between I1 and I2 is calculated as Sim(I1, I2). Given an empirically determined threshold T, if Sim(I1, I2) >= T, then I2 is determined to be a copy of I1; otherwise, I1 and I2 are determined to be unrelated. The computational complexity of FSRISA-based image copy detection can be analyzed in a similar manner.
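Under the same assumptions as the earlier sketches (the hypothetical sift_descriptors, learn_dictionary, and similarity helpers), the copy-detection decision reduces to a threshold test on Sim(I1, I2); the threshold value below is purely illustrative.

```python
# Sketch of FSRISA-based copy detection: I2 is declared a copy of I1 when
# Sim(I1, I2) reaches an empirically chosen threshold. Reuses the hypothetical
# helpers from the earlier sketches; tau = 0.6 is purely illustrative.
def is_copy(reference_path, test_path, tau=0.6):
    y1 = sift_descriptors(reference_path)
    y2 = sift_descriptors(test_path)
    D1 = learn_dictionary(y1, n_atoms=256, sparsity=8)   # D1 finer than D2
    D2 = learn_dictionary(y2, n_atoms=128, sparsity=4)
    return similarity(y2, D1, D2) >= tau
```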
B. Image Retrieval via FSRISA

The most popular image retrieval approach is content-based image retrieval (CBIR), where the most common technique is to measure the similarity between two images by comparing their extracted image features. In the proposed scheme, for a query image, we extract its dictionary feature (with N atoms) and transmit it to an image database, where each image is stored together with its dictionary feature and its original SIFT feature vectors. To compare the query image IQ and each database image IDi, i = 1, 2, ..., size_of_database, where size_of_database denotes the total number of database images, we apply the proposed FSRISA scheme to perform the l1-minimization of Eq. (3) for each SIFT feature vector of IDi with respect to the dictionary consisting of the dictionary features of the two images. Then, we calculate the reconstruction errors of all the stored feature vectors of IDi and perform voting to get the similarity value between the two images (Eq. (4)), which becomes the score of IDi. Finally, we retrieve the top Q database images with the largest scores. The computational complexity of comparing two images can be analyzed similarly, except that the complexity of extracting the dictionary feature of each database image can be excluded, since this can be done in advance during database construction.

C. Image Recognition via FSRISA

Consider a well-classified image database, where each class includes several images of the same object but with different variations. Given a query image, a user may ask which class the image belongs to. For image recognition, sparse representation techniques have been used extensively. The major idea is to exploit the fact that the sparsest representation is naturally discriminative: among all subsets of atoms in a dictionary, it selects the subset that most compactly expresses the input signal and rejects all others with less compact representations. More specifically, image recognition/classification can be achieved by representing the feature of the query image as a linear combination of the training samples from the same class in a dictionary. Moreover, it has been claimed that sparse representation-based face recognition should be extended to less constrained conditions (e.g., variations in object pose or misalignment). In order not to incur such a constraint, both variability-invariant features and sparse representation should be properly integrated. In addition, in such approaches the dictionary consists of several subsets of image features (down-sampled image pixels were used), where each subset contains the features of several training images belonging to the same class. Nevertheless, if the number of classes, the number of training images in each class, or the feature dimension of a training image is too large, the dictionary will become very large, inducing very high computational complexity when performing sparse coding for the feature vector(s) of a query image. We therefore propose an image recognition approach in which we assess the similarity between a query image and each class of training images based on the proposed FSRISA. In the training stage, for the i-th image class, i = 1, 2, ..., C, where C denotes the number of classes in the image database, we extract the SIFT feature vectors of each image as the training samples. Then, we apply K-SVD to learn the dictionary DCi of size M x NCi as the dictionary feature of the i-th image class, where M (= 128) denotes the length of a SIFT feature vector and NCi denotes the number of atoms of DCi. In the recognition stage, for a query image IQ, we extract the SIFT feature vectors yQj, j = 1, 2, ..., KQ, where KQ denotes the number of SIFT feature vectors, and the dictionary feature DQ. Then, we apply FSRISA to assess the similarity between IQ and the i-th image class, i = 1, 2, ..., C, by performing the l1-minimization (similar to Eq. (3)) to obtain the sparse representation coefficients xQj for each yQj of IQ with respect to the dictionary DCi_Q consisting of DCi and DQ. Then, we calculate the reconstruction errors for yQj with respect to DCi and DQ, respectively, and perform voting for each yQj. Based on Eq. (4), we can calculate the similarity between IQ and the i-th image class, denoted by Sim(IQ, Class-i). Finally, the query image IQ is determined to belong to the class with the largest Sim(IQ, Class-i). The computational complexity of assessing the similarity between the query image IQ and the i-th class of images can also be approximately analyzed in the same way, where the dictionary feature extraction for the i-th class can be performed in advance during database construction. In our image recognition approach, the sparse coding procedure must be performed for the query image against each image class, which is indeed computationally expensive. The complexity of our approach can be reduced by applying more efficient sparse coding techniques, such as multi-core OMP.
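A possible outline of the recognition stage, again reusing the hypothetical helpers from the earlier sketches: class dictionaries DCi are learned offline from each class's training descriptors, and the query is assigned to the class with the largest Sim(IQ, Class-i). Names and parameters here are assumptions for illustration.

```python
# Sketch of FSRISA-based recognition: each class i has a dictionary D_Ci
# learned offline from the SIFT descriptors of its training images; the query
# is assigned to the class with the largest Sim(I_Q, Class-i). Reuses the
# hypothetical helpers from the earlier sketches.
def recognize(query_path, class_dictionaries):
    """class_dictionaries: dict mapping class name -> D_Ci (M x N_Ci)."""
    yq = sift_descriptors(query_path)
    Dq = learn_dictionary(yq, n_atoms=128, sparsity=4)
    scores = {
        name: similarity(yq, D_ci, Dq)                # D_Ci plays the role of D1
        for name, D_ci in class_dictionaries.items()
    }
    return max(scores, key=scores.get)                # class with the largest Sim
```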
VI. CONCLUSION

We have presented a method for assessing the similarity between two images. The core contribution is a feature-based image similarity assessment technique that explores the two aspects of a feature detector, representation and matching, within our FSRISA framework. We then properly formulate the image copy detection, retrieval, and recognition problems as sparse representation problems and solve them based on FSRISA.

REFERENCES

1. L. W. Kang, C. Y. Hsu, H. W. Chen, C. S. Lu, C. Y. Lin, and S. C. Pei, "Feature-based sparse representation for image similarity assessment," IEEE Transactions on Multimedia, vol. 13, no. 5, October 2011.
2. L. W. Kang, C. Y. Hsu, H. W. Chen, and C. S. Lu, "Secure SIFT-based sparse representation for image copy detection and recognition," Proc. of IEEE Int. Conf. on Multimedia and Expo, Singapore, July 2010.
3. D. Nistér and H. Stewénius, "Scalable recognition with a vocabulary tree," IEEE Conf. on Computer Vision and Pattern Recognition, 2006, pp. 2161-2168.
4. D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
5. M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, November 2006.
6. Z. Wang and A. C. Bovik, "Mean squared error: Love it or leave it? A new look at signal fidelity measures," IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98-117, Jan. 2009.
7. H. R. Sheikh and A. C. Bovik, "Image information and visual quality," IEEE Trans. on Image Processing, vol. 15, no. 2, pp. 430-444, Feb. 2006.
8. Z. Wu, Q. Ke, J. Sun, and H.-Y. Shum, "A multi-sample, multi-tree approach to bag-of-words image representation for image retrieval."