A SIFT-Based Forensic Method For Copy-Move Attack Detection and Transformation Recovery
features also in the digital forensics domain; in fact, SIFT features have been used for fingerprint detection [29], shoeprint image retrieval [30], and also for copy–move detection [31], [12], [32].

A. Review of the SIFT Algorithm

Most of the algorithms proposed in the literature for detecting and describing local visual features usually require two steps. The first is the detection step, in which interest points are localized, while in the second step robust local descriptors are built so as to be invariant with respect to orientation, scale, and affine transformations. A comprehensive analysis of several local descriptors is provided in [33], while local affine region detectors are surveyed in [34]. These works confirm that SIFT features [13] are a good solution because of their robust performance and relatively low computational costs.

This method can be roughly summarized as the following four steps: 1) scale-space extrema detection; 2) keypoint localization; 3) assignment of one (or more) canonical orientations; 4) generation of keypoint descriptors.

In other words, given an input image $I(x, y)$, SIFT features are detected at different scales using a scale-space representation implemented as an image pyramid. The pyramid levels are obtained by Gaussian smoothing and subsampling of the image resolution, while interest points are selected as local extrema (minimum/maximum) in the scale-space. These keypoints, referred to as $\mathbf{x}_i$ in the following, are extracted by applying a computable approximation of the Laplacian of Gaussian called Difference of Gaussians (DoG). Specifically, a DoG image $D$ is given by:

$$D(x, y, \sigma) = L(x, y, k_i\sigma) - L(x, y, k_j\sigma),$$

where $L(x, y, k\sigma)$ is the convolution of the original image $I(x, y)$ with the Gaussian blur $G(x, y, k\sigma)$ at scale $k\sigma$.

In order to guarantee invariance to rotations, the algorithm assigns to each keypoint a canonical orientation $o$. To determine this orientation, a gradient orientation histogram is computed in the neighborhood of the keypoint. Specifically, for an image sample $L(x, y, \sigma)$ at scale $\sigma$ (the scale in which that keypoint was detected), the gradient magnitude $m(x, y)$ and orientation $\theta(x, y)$ are precomputed using pixel differences

$$m(x, y) = \sqrt{\bigl(L(x+1, y) - L(x-1, y)\bigr)^2 + \bigl(L(x, y+1) - L(x, y-1)\bigr)^2} \tag{1}$$

$$\theta(x, y) = \tan^{-1}\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \tag{2}$$

An orientation histogram with 36 bins is formed, with each bin covering approximately 10°. Each sample in the neighboring window added to a histogram bin is weighted by its gradient magnitude and by a Gaussian-weighted circular window with $\sigma$ equal to 1.5 times the scale of the keypoint. The peaks in this histogram correspond to dominant orientations. Once these keypoints are detected, and canonical orientations are assigned, SIFT descriptors are computed at their locations in both image plane and scale-space. Each feature descriptor consists of a histogram $f$ of 128 elements, obtained from a 16 × 16 pixel area around the corresponding keypoint. This area is selected using the coordinates $(x, y)$ of the keypoint as the center and its canonical orientation as the origin axis. The contribution of each pixel is obtained by accumulating image gradient magnitude $m(x, y)$ and orientation $\theta(x, y)$ in scale-space, and the histogram is computed as the local statistics of gradient orientations (considering eight bins) in 4 × 4 subpatches.

Summarizing the above, given an image $I$, this procedure ends with a list of $N$ keypoints, each of which is completely described by the following information: $\mathbf{x}_i = \{x, y, \sigma, o, f\}$, where $(x, y)$ are the coordinates in the image plane, $\sigma$ is the scale of the keypoint (related to the level of the image-pyramid used to compute the descriptor), $o$ is the canonical orientation (used to achieve rotation invariance), and $f$ is the final SIFT descriptor.

B. Our Contribution

A very preliminary work on copy–move forgery detection based on SIFT features was proposed in [31], but in that paper no estimation of the parameters of the applied geometric transformation is performed and, furthermore, extended numerical results to evaluate real performances of the methodology (e.g., true/false positive rates) are not provided. Another very recent work has been presented in [32]. Although the technique is able to deal with region extraction by resorting to a correlation map, it cannot manage affine transformation and, also in this case, quantitative results on the reliability of the estimate of geometric transformation parameters are not given; in addition, this approach adopts many different empirical thresholds whose setting seems not to be completely unsupervised. Moreover, none of these contributions accurately considers the case of multiple copy–move forgeries. As we will show later, this is a key point in a realistic forensic scenario, since a forged image often contains several cloned areas (as in the case of Fig. 2).

Our proposed method is placed in this scenario: it is able to detect a copy–move forgery attack and then to estimate the geometrical transformation used in it. Multiple copy–move forgeries are managed by performing a robust feature matching procedure and then a clustering on the keypoint coordinates in order to separate the different cloned areas. These two tasks are fundamental since otherwise, in case of multiple cloning, it is often impossible to detect and separate each forgery and also to estimate the geometric transformation. Estimating the geometric parameters with accuracy is deemed a fundamental task not only to understand how the cloned patch has been processed [35] and possibly to infer the counterfeiter’s motive, but also to compare the original source block of the image and the forged one on a common ground. Furthermore, a reliable estimate of the transformation allows us to register the two patches for a possible deeper forensics analysis [36]. The method proposed hereafter is able to deal with affine geometric transformations and, as demonstrated by experimental results, also gives reliable estimates of the transformation parameters. Our technique relies on a single empirical threshold, which regulates the clustering operation and has been determined by a training procedure on a general dataset. This is a very important issue also in comparison with similar techniques like that in [32].
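Returning to the orientation assignment reviewed in Section II-A, the pixel-difference gradients of (1) and (2) and the 36-bin, Gaussian-weighted orientation histogram can be illustrated with a short Python/NumPy sketch. This is only a minimal illustration, not the reference SIFT implementation: the function names, the square window of assumed half-width `radius`, and the absence of any histogram smoothing or peak interpolation are all assumptions made for brevity.

```python
import numpy as np

def gradient_mag_ori(L):
    """Gradient magnitude and orientation of a smoothed image L via pixel differences."""
    L = np.asarray(L, dtype=float)
    dx = np.zeros_like(L)
    dy = np.zeros_like(L)
    # Arrays are indexed as L[y, x]; border pixels keep a zero gradient.
    dx[:, 1:-1] = L[:, 2:] - L[:, :-2]   # L(x+1, y) - L(x-1, y)
    dy[1:-1, :] = L[2:, :] - L[:-2, :]   # L(x, y+1) - L(x, y-1)
    m = np.hypot(dx, dy)                 # magnitude, as in (1)
    theta = np.arctan2(dy, dx)           # orientation in radians, as in (2)
    return m, theta

def orientation_histogram(L, x, y, scale, radius=8, bins=36):
    """36-bin orientation histogram around keypoint (x, y) detected at `scale`.

    `radius` (window half-width in pixels) is a hypothetical parameter
    chosen for this sketch only.
    """
    m, theta = gradient_mag_ori(L)
    h, w = m.shape
    sigma = 1.5 * scale                  # Gaussian window: 1.5 times the keypoint scale
    hist = np.zeros(bins)
    for v in range(max(y - radius, 0), min(y + radius + 1, h)):
        for u in range(max(x - radius, 0), min(x + radius + 1, w)):
            weight = np.exp(-((u - x) ** 2 + (v - y) ** 2) / (2.0 * sigma ** 2))
            deg = np.degrees(theta[v, u]) % 360.0
            b = int(deg // (360.0 / bins)) % bins   # each bin spans 10 degrees
            hist[b] += weight * m[v, u]
    return hist
```

The index of the largest histogram entry, multiplied by the bin width, gives a coarse estimate of the dominant orientation of the neighborhood.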
1102 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 6, NO. 3, SEPTEMBER 2011
Fig. 3. Overview of the proposed system. SIFT matched pairs and clusters.
III. PROPOSED METHOD

The proposed approach is based on the SIFT algorithm to extract robust features which allow it to discover whether a part of an image was copy–moved and, furthermore, which geometrical transformation was applied. In fact, the copied part has basically the same appearance as the original one, thus keypoints extracted in the forged region will be quite similar to the original ones. Therefore, matching among SIFT features can be adopted for the task of determining possible tampering. A simple schematization of the whole system is shown in Fig. 3: the first step consists of SIFT feature extraction and keypoint matching, the second step is devoted to keypoint clustering and forgery detection, while the third one estimates the geometric transformation that occurred, if tampering has been detected.

A. SIFT Features Extraction and Multiple Keypoint Matching

Given a test image, a set of keypoints $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ with their corresponding SIFT descriptors $\{f_1, \ldots, f_n\}$ is extracted. A matching operation is performed in the SIFT space among the $f_i$ vectors of each keypoint to identify similar local patches in the test image. The best candidate match for each keypoint $\mathbf{x}_i$ is found by identifying its nearest neighbor from all the other $(n-1)$ keypoints of the image, which is the keypoint with the minimum Euclidean distance in the SIFT space. In order to decide that two keypoints match (i.e., “are these two descriptors the same or not?”), simply evaluating the distance between two descriptors with respect to a global threshold does not perform well. This is due to the high dimensionality of the feature space (128), in which some descriptors are much more discriminative than others.

We can obtain a more effective procedure, as suggested in [13], by using the ratio between the distance of the closest neighbor to that of the second-closest one, and comparing it with a threshold $T$ (often fixed to 0.6). For the sake of clarity, given a keypoint we define a similarity vector $D = \{d_1, d_2, \ldots, d_{n-1}\}$ that represents the sorted Euclidean distances with respect to the other descriptors. Following this idea, the keypoint is matched only if this constraint is satisfied:

$$d_1 / d_2 < T, \quad \text{where } T \in (0, 1) \tag{3}$$

We refer to this procedure as the 2NN test. This matching procedure has one main drawback: it is unable to manage multiple keypoint matching. This is a key aspect in case of copy–move forgeries since it may happen that the same image area is cloned over and over (see, for example, Fig. 2). In other words, it only finds matches between keypoints whose SIFT descriptors are very different from those of the rest of the set (i.e., features that are globally distinctive). Therefore, the case of cloned patches is very critical since the keypoints detected in those regions are very similar to each other.

For this reason, we propose a novel matching procedure that is a generalization of (3), and is able to deal with multiple copies of the same features. Our generalized 2NN test (referred to as g2NN) starts from the observation that in a high-dimensional feature space such as that of SIFT features, keypoints that are different from the one considered share very high and very similar values (in terms of Euclidean distances) among them. Instead, similar features show low Euclidean distances with respect to the others. The idea of the 2NN test is that the ratio between the distance of the candidate match and the distance of the second nearest neighbor is low in the case of a match (e.g., lower than 0.6) and very high in case of two “random features” (e.g., greater than 0.6). Our generalization consists of iterating the 2NN test between $d_i / d_{i+1}$ until this ratio is greater than $T$ (in our experiments we set this value to 0.5). If $k$ is the value at which the procedure stops, each keypoint in correspondence to a distance in $\{d_1, \ldots, d_k\}$ (where $1 \le k < n$) is considered as a match for the inspected keypoint.

Finally, by iterating over keypoints in $X$, we can obtain the set of matched points. All the matched keypoints are retained, but isolated ones are no longer considered in subsequent processing steps. Already at this stage a draft idea of the authenticity of the image is provided. However, images that legitimately contain areas with very similar texture may yield matched keypoints that might induce false alarms. The following two steps of the proposed methodology reduce this possibility.

B. Clustering and Forgery Detection

To identify possible cloned areas, an agglomerative hierarchical clustering [37] is performed on spatial locations (i.e., $(x, y)$ coordinates) of the matched points. Hierarchical clustering creates a hierarchy of clusters which may be represented by a tree structure. The algorithm starts by assigning each keypoint to a cluster; then it computes all the reciprocal spatial distances among clusters, finds the closest pair of clusters, and finally merges them into a single cluster. Such computation is iteratively repeated until a final merging situation is achieved. The way this final merging can be accomplished is basically conditioned both by the linkage method adopted and by the threshold used to stop cluster grouping.

Several linkage methods exist in the literature and our experiments evaluate their performance to estimate the best cutoff
AMERINI et al.: SIFT-BASED FORENSIC METHOD FOR COPY–MOVE ATTACK DETECTION AND TRANSFORMATION RECOVERY 1103
threshold for forgery detection (see Section IV-A for a detailed description of such experiments). In particular, three different linkage methods have been evaluated: Single, Centroid, and Ward’s linkage. Given two clusters $P$ and $Q$, respectively containing $n_P$ and $n_Q$ objects (where $x_{Pi}$ and $x_{Qj}$ indicate the $i$th and the $j$th object in the clusters $P$ and $Q$), the linkage methods operate as follows:
1) Single linkage uses the smallest Euclidean distance between objects in the two clusters: $\mathrm{dist}(P, Q) = \min_{i,j} \lVert x_{Pi} - x_{Qj} \rVert_2$

a flat patch). Anyway, this is a very well-known open issue in SIFT-related scientific literature.

C. Geometric Transformation Estimation

When an image has been classified as nonauthentic, the proposed method can determine which geometrical transformation was used between the original area and its copy–moved version. Let the matched point coordinates be, for the two areas, $\tilde{\mathbf{x}}_i = (x, y, 1)^T$ and $\tilde{\mathbf{x}}'_i = (x', y', 1)^T$, respectively; their geometric relationships can be defined by an affine homography which is represented by a 3 × 3 matrix $H$ as

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{4}$$
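Before the matrix of (4) can be estimated, matched keypoints have to be found and grouped, as described in Sections III-A and III-B. The Python sketch below illustrates those two stages under stated assumptions. The names `g2nn_matches` and `cluster_matched_points` are hypothetical; distances are computed by brute force, which is only practical for small keypoint sets; the stopping index is read as the number of consecutive ratio tests that pass; and the tree is cut with SciPy's `fcluster` using a plain distance cutoff, which is one possible reading of "the threshold used to stop cluster grouping" rather than the setting used in the experiments.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def g2nn_matches(desc, T=0.5):
    """Index pairs (i, j), i < j, matched by the generalized 2NN ratio test."""
    desc = np.asarray(desc, dtype=float)
    n = len(desc)
    # Brute-force pairwise Euclidean distances between descriptors.
    diff = desc[:, None, :] - desc[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    pairs = set()
    for i in range(n):
        order = np.argsort(dist[i])
        order = order[order != i]          # drop the inspected keypoint itself
        d = dist[i, order]                 # sorted distances d_1 <= ... <= d_{n-1}
        k = 0
        # Iterate the ratio test d_k / d_{k+1} < T until it fails.
        while k + 1 < len(d) and d[k + 1] > 0 and d[k] / d[k + 1] < T:
            k += 1
        for j in order[:k]:                # the k nearest neighbors are matches
            pairs.add((min(i, int(j)), max(i, int(j))))
    return sorted(pairs)

def cluster_matched_points(xy, cutoff, method="ward"):
    """Agglomerative clustering of matched keypoint coordinates.

    `method` may be "single", "centroid" or "ward"; `cutoff` is an
    assumed distance threshold at which the cluster tree is cut.
    """
    Z = linkage(np.asarray(xy, dtype=float), method=method)
    return fcluster(Z, t=cutoff, criterion="distance")
```

In use, the descriptors of all keypoints would be passed to `g2nn_matches`, and the $(x, y)$ coordinates of the keypoints that appear in at least one pair would then be passed to `cluster_matched_points`; keypoints without any match are simply left out.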
TABLE II
FOURTEEN DIFFERENT COMBINATIONS OF GEOMETRIC TRANSFORMATIONS
APPLIED TO THE ORIGINAL PATCH FOR THE MICC-F2000 DATASET
applied to the $x$ and $y$ axes of the cloned image part (e.g., in one of the attacks the $x$ and $y$ axes are scaled by 30%, and no rotation is performed).
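For an attack of this kind, the 3 × 3 affine matrix of (4) and its recovery from matched point coordinates can be sketched in Python as follows. This is an illustrative least-squares fit, not the estimation procedure of the paper: it assumes at least three non-collinear, outlier-free matches, and the function names as well as the rotation-after-scaling convention used to compose and decompose the matrix are assumptions. The 30% scaling is modeled here as an assumed factor of 1.3.

```python
import numpy as np

def compose_affine(sx, sy, theta_deg, tx, ty):
    """3x3 affine matrix: scale by (sx, sy), rotate by theta, translate by (tx, ty)."""
    t = np.radians(theta_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    H = np.eye(3)
    H[:2, :2] = R @ np.diag([sx, sy])    # assumed convention: rotation after scaling
    H[:2, 2] = [tx, ty]
    return H

def estimate_affine(src, dst):
    """Least-squares 3x3 affine matrix H with dst ~ H @ src in homogeneous coordinates."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if len(src) < 3:
        raise ValueError("at least three matched points are needed")
    A = np.hstack([src, np.ones((len(src), 1))])   # rows (x, y, 1)
    # Solve A @ M = dst for the 3x2 parameter matrix M.
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    H = np.eye(3)
    H[:2, :] = M.T
    return H

def decompose_affine(H):
    """Recover (sx, sy, theta in degrees, tx, ty), assuming no shear."""
    sx = np.hypot(H[0, 0], H[1, 0])
    sy = np.hypot(H[0, 1], H[1, 1])
    theta_deg = np.degrees(np.arctan2(H[1, 0], H[0, 0]))
    return sx, sy, theta_deg, H[0, 2], H[1, 2]

# Synthetic example: both axes scaled by an assumed factor of 1.3, no rotation.
H_true = compose_affine(1.3, 1.3, 0.0, 40.0, -25.0)
src = np.array([[0, 0], [10, 0], [0, 10], [7, 3]], dtype=float)
dst = (H_true @ np.c_[src, np.ones(len(src))].T).T[:, :2]
H_est = estimate_affine(src, dst)
```

With exact correspondences, `H_est` agrees with `H_true` up to floating-point error. Mismatched keypoints would bias a plain least-squares fit, so a robust estimator would be preferable on real matches.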
TABLE VI
TRANSFORMATION PARAMETER ESTIMATION ON IMAGE CARS. THE VALUES $t_x$ AND $t_y$ ARE EXPRESSED IN PIXELS WHILE $\theta$ IS IN DEGREES
Fig. 4. Some examples of tampered images are pictured in the first row; the corresponding detection results are reported in the second row.
Fig. 6. Examples of tampered images with multiple cloning are shown in the first row, while the detection results are reported in the second row.
Fig. 7. Error analysis of tampered images misdetection for each different attack (in percentage).
[33] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 10, pp. 1615–1630, Oct. 2005.
[34] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, “A comparison of affine region detectors,” Int. J. Comput. Vision, vol. 65, no. 1/2, pp. 43–72, 2005.
[35] W. Wei, S. Wang, X. Zhang, and Z. Tang, “Estimation of image rotation angle using interpolation-related spectral signatures with application to blind detection of image forgery,” IEEE Trans. Inf. Forensics Security, vol. 5, no. 3, pp. 507–517, Sep. 2010.
[36] W. Lu, A. L. Varna, and M. Wu, “Forensic hash for multimedia information,” in Proc. SPIE Media Forensics and Security, San Jose, CA, 2010.
[37] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning. New York: Springer, 2003.
[38] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[39] M. Fischler and R. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM, vol. 24, no. 6, pp. 381–395, 1981.
[40] T.-T. Ng, S.-F. Chang, J. Hsu, and M. Pepeljugoski, Columbia Photographic Images and Photorealistic Computer Graphics Dataset ADVENT, Columbia Univ., New York, Tech. Rep., 2004.

Irene Amerini received the Laurea degree in computer engineering in 2006 and the Ph.D. degree in computer engineering, multimedia, and telecommunication in 2011, both from the University of Florence, Italy.
She was a visiting scholar at Binghamton University, Binghamton, NY, in 2010. Currently she is a postdoctoral researcher at the Image and Communication Laboratory, Media Integration and Communication Center, University of Florence, Florence, Italy. Her main research interests focus on multimedia forensics and image processing.

Lamberto Ballan (S’08) received the Laurea degree in computer engineering in 2006 and the Ph.D. degree in computer engineering, multimedia, and telecommunication in 2011, both from the University of Florence, Italy.
He was a visiting scholar at Télécom ParisTech/ENST, Paris, in 2010. Currently he is a postdoctoral researcher at the Media Integration and Communication Center, University of Florence, Florence, Italy. His main research interests focus on multimedia information retrieval, image and video analysis, pattern recognition, and computer vision.

Roberto Caldelli (M’10) received the Laurea degree in electronic engineering in 1997 and the Ph.D. degree in computer science and telecommunication in 2001, both from the University of Florence, Florence, Italy.
Currently he is an assistant professor at the Media Integration and Communication Center of the University of Florence. He is also a member of CNIT. His main research activities, witnessed by several publications, include digital image processing, image and video digital watermarking, and multimedia forensics.

Alberto Del Bimbo (M’90) is a full professor of computer engineering at the University of Florence, Florence, Italy, where he is also the director of the Master in Multimedia, and the director of the Media Integration and Communication Center. His research interests include pattern recognition, multimedia information retrieval, and human–computer interaction. He has published more than 250 publications in some of the most distinguished scientific journals and international conferences, and is the author of the monograph Visual Information Retrieval.
Dr. Del Bimbo is an IAPR fellow and Associate Editor of Multimedia Tools and Applications, Pattern Analysis and Applications, Journal of Visual Languages and Computing, and the International Journal of Image and Video Processing, and was an Associate Editor of Pattern Recognition, IEEE TRANSACTIONS ON MULTIMEDIA, and IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE.

Giuseppe Serra received the Laurea degree in computer engineering in 2006, and the Ph.D. degree in computer engineering, multimedia, and telecommunication in 2010, both from the University of Florence, Florence, Italy.
He is a postdoctoral researcher at the Media Integration and Communication Center, University of Florence. He was a visiting scholar at Carnegie Mellon University, Pittsburgh, PA, and at Télécom ParisTech/ENST, Paris, in 2006 and 2010, respectively. His research interests include image and video analysis, multimedia ontologies, image forensics, and multiple-view geometry.