Duplicate Image Detection and Comparison Using Single Core, Multiprocessing, and Multithreading
ISSN No:-2456-2165
Abstract:- Most of us keep a large number of images on our computers, and large collections tend to accumulate duplicates over time, so it is prudent to manage storage efficiently. Detecting duplicates within a set of images is a time-consuming task that can be automated, and the duplicate data can then be removed to save space. As we use our phones more, the number of unwanted duplicate photo and picture files on the device grows haphazardly, potentially in every folder. Duplicate photos consume a lot of phone memory and slow down the phone's performance, and finding and removing them manually is difficult. Since the naked eye is not well suited to judging structural similarity between images, we propose a novel approach based on structural information degradation. As a practical solution to this problem, we create a structural similarity index and demonstrate it with a set of images from our database. Finding similar and duplicate photos among these samples can be a time-consuming task, and duplicate photo finders come in handy in this situation. Finally, we compare the computation time and power required by processing on multiple cores versus a single core, and provide benchmarks and graphical representations for each.

Keywords:- Single core; Multithreading; Multiprocessing; RGB; Luminance; Contrast; Structure; Similarity Index.

I. INTRODUCTION

With the fast advancement of Internet technology and the growing use of technical devices, people can easily take, transfer, and share photos over the internet. As a result, duplicate pictures make up a substantial percentage of image data. Duplicate image detection is the process of detecting duplicate copies of a query picture in a set of images efficiently and effectively.

Common devices like smartphones often duplicate images in the form of junk files, which are not protected by any security measures. These files are easily accessible to an unauthorized user, and the owner is often unaware that they exist. They can contain important information that could be exposed without the user's permission.

To provide a proper and effective solution to this problem, we came up with measures that can detect and even eradicate such files with user permission.

We implement the solution on multiple cores so that we can compare it with sequential single-core execution and report our findings.

By running the script with different numbers of cores and comparing the results, it becomes possible to predict which approach is better for which type of operation.

For example, a single-core or sequential algorithm may be efficient for a small dataset consisting of just 5 images, as it may use less CPU power, whereas when the dataset consists of 1000+ images, multiprocessing might be more efficient at detecting the duplicate images in the dataset.

II. LITERATURE REVIEW

[1] Notes that methods for evaluating perceptual picture quality have used a range of known aspects of the human visual system to quantify the visibility of errors (differences) between a deteriorated image and the original image. The authors offer an alternative, complementary paradigm for quality evaluation based on the deterioration of structural information, built on the notion that human visual perception is highly adapted for obtaining structural information from a scene. They build a Structural Similarity Index as an illustration of this notion, demonstrating its promise with a set of intuitive examples and comparisons to both subjective evaluations and state-of-the-art objective approaches on a database of photos compressed using JPEG and JPEG2000.

[2] For picture registration, a similarity measure based on values from the images' respective Fourier Transforms is presented. The method generates signatures based on image content rather than image annotation and hence does not require human intervention. It computes the final rank for measuring similarity using both the real and complex components of the FFT. Any reliable method must precisely represent all objects in a picture, and different strategies may be required depending on the size of the image data. This paper gives an overview of how to use the OpenCV library to create a rating scheme for finding the similarity and further introducing a comparison metric based on the
[3] Introduces a method for edge detection in images based on OpenCV, using computer vision and image processing algorithms, along with algorithms for determining the exact number of copper cores in a microwire. First, the wire's interior structure is photographed with a high-resolution camera. Second, image pre-processing is implemented using OpenCV image processing methods. Third, because morphological opening and closing operations blur image borders, they are employed to segment the images. Finally, contour tracking is used to determine the exact number of copper cores. Experiments using Borland C++ Builder 6.0 show that OpenCV-based image edge detection methods are straightforward, have a high level of code integration, and locate image edges with high accuracy.

[4] Proposes parallel processing techniques for astronomical data processing on multicore processors. Astronomers who prefer Python as their scripting language and use PyRAF/IRAF for data processing are the target audience. Three problems of varying difficulty were analyzed on three distinct types of multicore processors to show the benefits of parallelizing data processing tasks in terms of execution time. The parallel code can be implemented rather easily thanks to Python's native multiprocessing module. Three multiprocessing approaches (Pool/Map, Process/Queue, and Parallel Python) were also compared.

[5] Discusses the popularity of Python among numerical communities, which stems from easy-to-use number-crunching modules such as NumPy, SciPy, Dask, Numba, and others. To use all of the available CPU cores, these modules often use multi-threading for efficient multi-core parallelism. However, when used together in a single application, their threads can interfere with one another, resulting in overhead and inefficiency.

[6] Presents a novel, deeply embedded robotics middleware and programming environment. It uses a multithreaded, publish-subscribe design pattern for microcontroller applications and provides a Unix-like software interface. By providing a modular and standards-oriented platform, it improves on existing deeply embedded open-source systems. The system architecture is based on a POSIX application programming interface and a publish-subscribe object request broker.

[7] Image inpainting is a technique for estimating missing pixel values in an image by combining pixel value information from nearby pixels with prior knowledge gained through learning the object class. The paper presents a fast image inpainting method that is very accurate

[8] Gives specifics of image capture, address segmentation, and recognition, which are crucial techniques in automatic letter sorting machines, as well as the applications of pattern recognition technology to postal automation in China. Letter sorting machines with pattern recognition technology are frequently employed in mail processing centers to sort mail mechanically. Since the 1990s, approximately 100 letter sorting machines have been deployed throughout China. Siemens (previously AEG), NEC, Solystic (formerly Actel-Bell), and SRI (Shanghai Research Institute of Postal Science, China Post Group) were among the companies that produced these machines.

[9] Proposes an approach for extracting distinctive invariant features from photos that can be used to reliably match multiple views of the same object or scene. The features are invariant to image rotation and scaling, and they have been shown to give reliable matching across a substantial range of affine distortion, 3D perspective change, noise addition, and lighting change. The features are highly distinctive in the sense that a single feature can be matched correctly, with high probability, against a vast database of features from multiple photos. The paper also explains how to apply these features to object recognition. Individual features are matched to a collection of features from known objects using a fast nearest-neighbor technique, then a Hough transform is used to identify clusters belonging to a single object, and finally a least-squares solution is used to verify consistent pose parameters.

[10] Presents a region-level visual consistency verification scheme for partial-duplicate search that determines whether there are visually consistent region (VCR) pairs between images. To handle the problem of distinguishing partial-duplicate images from non-partial-duplicate images after local feature matching, the candidate VCRs are created by mapping the regions segmented from candidate pictures onto the query image using the attributes of the matched local features.

III. ALGORITHM USED

➢ Serial execution Algorithm

A sequential algorithm, also called a serial algorithm, is a computer program that runs sequentially from beginning to end, rather than concurrently or in parallel.

Most conventional computer algorithms are sequential algorithms, even if they aren't explicitly labeled as such, because sequential execution is a background assumption.

Concurrency and parallelism are two different notions in general, although they frequently overlap (many distributed algorithms are both concurrent and parallel), hence the term
• OpenCV
OpenCV (Open Source Computer Vision Library) is a library of programming functions aimed mainly at real-time computer vision. The library is cross-platform and free to use under the open-source BSD license.
• Multiprocessing
multiprocessing is a package that supports spawning processes using an API similar to that of the threading module. Because it runs separate processes, the multiprocessing module lets the programmer fully leverage multiple processors on a given machine. It runs on both Unix and Windows. The multiprocessing module also introduces APIs that have no analogs in the threading module.
• Threading
This module constructs higher-level threading interfaces on top of the lower-level _thread module. The _thread module provides low-level primitives for working with multiple threads (also called lightweight processes or tasks), that is, multiple threads of control sharing their global data space. For synchronization, simple locks (also called mutexes or binary semaphores) are provided. The threading module provides an easier-to-use, higher-level threading API built on top of _thread.
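The similarity between the threading and multiprocessing APIs can be illustrated with a minimal sketch. It is not the script used in this paper: `busy_work` is a hypothetical CPU-bound stand-in for one image comparison, and the task sizes are arbitrary. The same helper starts either threads or processes because `threading.Thread` and `multiprocessing.Process` both accept `target` and `args` and expose `start()` and `join()`.

```python
import multiprocessing
import threading
import time


def busy_work(n):
    """Hypothetical CPU-bound task standing in for one image comparison."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(sizes):
    # Run one task after another in the calling thread.
    for n in sizes:
        busy_work(n)


def run_workers(worker_cls, sizes):
    # worker_cls is threading.Thread or multiprocessing.Process; both take
    # the same target/args arguments and offer start() and join().
    workers = [worker_cls(target=busy_work, args=(n,)) for n in sizes]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()


def timed(fn, *args):
    # Return the wall-clock time in seconds taken by fn(*args).
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    sizes = [200_000] * 4  # arbitrary illustrative workload
    print("sequential     ", timed(run_sequential, sizes))
    print("multithreading ", timed(run_workers, threading.Thread, sizes))
    print("multiprocessing", timed(run_workers, multiprocessing.Process, sizes))
```

The `if __name__ == "__main__":` guard matters for the multiprocessing case, since child processes may re-import the main module on platforms that do not fork. Timings depend on the hardware and are therefore not shown.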
V. PROPOSED APPROACH
Execution time and CPU consumption are compared for sequential, multiprocessing, and multithreading execution, and visualizations are created by examining and evaluating all three approaches.

The script can download multiple data sources in parallel and collect the results after each download by creating a new thread for each downloaded resource.

This means that each subsequent download is not waiting on the download of earlier web pages. In this case, the

The comparison between the two images is performed based on these 3 features.

This system calculates the Structural Similarity Index between 2 given images, which is a value between -1 and +1.

A value of +1 indicates that the 2 given images are very similar or the same, while a value of -1 indicates that the 2 given images are very different. Often these values are adjusted to be in the range [0, 1], where the extremes hold the same meaning.
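One way to adjust a Structural Similarity Index from [-1, +1] to [0, 1] is a linear rescaling, sketched below. Both the linear mapping and the duplicate threshold of 0.95 are assumptions made for illustration; the text does not state which adjustment or cut-off is used.

```python
def rescale_ssim(score):
    """Linearly map an SSIM score from [-1, 1] to [0, 1] (assumed mapping)."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("SSIM score must lie in [-1, 1]")
    return (score + 1.0) / 2.0


def is_duplicate(score, threshold=0.95):
    # threshold is a hypothetical cut-off on the rescaled score,
    # not a value taken from the paper.
    return rescale_ssim(score) >= threshold
```

With this mapping the extremes keep their meaning: -1 becomes 0 (very different) and +1 becomes 1 (identical).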
Luminance: Luminance is measured by averaging over all the pixel values. It is denoted by μ (mu), and the formula is given below:
$$\mu_x = \frac{1}{N} \sum_{i=1}^{N} x_i$$
Fig 3: where $x_i$ is the $i$-th pixel value of the image $x$ and $N$ is the total number of pixel values.
Source: https://www.cns.nyu.edu/pub/eero/wang03-reprint.pdf
Fig 4: Source: https://www.cns.nyu.edu/pub/eero/wang03-reprint.pdf
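The luminance term, and how it combines with contrast and structure, can be sketched in plain Python. This is a simplified single-window (global) form of the index from reference [1], assuming 8-bit grayscale pixel values given as flat lists and the commonly used constants K1 = 0.01 and K2 = 0.03. Practical implementations compute the index over local windows and average the results, so this sketch is not necessarily what the system in this paper runs.

```python
def luminance(pixels):
    """Mean pixel value: mu = (1/N) * sum(x_i)."""
    return sum(pixels) / len(pixels)


def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM of two equally sized flat pixel lists.

    Simplified sketch: the whole image is treated as one window.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mu_x, mu_y = luminance(x), luminance(y)
    # Contrast terms: unbiased variances of each image.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    # Structure term: covariance between the two images.
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    # Small constants that stabilise the division (assumed K1, K2).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Comparing an image with itself gives a score of 1, while comparing a high-contrast image with its inverse gives a negative score, in line with the -1 to +1 range described above.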
VII. RESULT
Fig 6: Generating duplicate images and the similarity

Fig 8: Source: https://www.jesusdaily.com/can-you-spot-the-difference-in-these-two-mona-lisa-pictures-there-are-4-total/

Fig 9: Thread Number analysis where the x-axis is Image count and the y-axis is Time in seconds

The line graph can be broken down into chunks, each corresponding to a group of 8 cores producing output together for their share of the work.

CPU utilization reaches 88% for both multiprocessing and multithreading (depending on the hardware, sometimes 100%).

CPU utilization remains lower, at around 26%, which is normal for a single-process, single-threaded application.

Fig 14: Analyzing the CPU usage in MultiProcessing

Fig 15: Analyzing the CPU usage in Single core Processing

VIII. CONCLUSION

Threading is sufficient and better suited to applications such as GUIs and networking, as it is less resource-intensive. Multiprocessing and multithreading have similar performance in I/O-bound operations.

REFERENCES

[1]. Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600-612.
[2]. Narayanan, S., & Thirivikraman, P. K. (2015). Image similarity using Fourier transform. Journal Impact Factor, 6(2), 29-37.
[3]. Xie, G., & Lu, W. (2013). Image Edge Detection Based On Opencv. International Journal of Electronics and Electrical Engineering, 1(2), 104-6.
[4]. Singh, N., Browne, L. M., & Butler, R. (2013). Parallel astronomical data processing with Python: Recipes for multicore machines. Astronomy and Computing, 2, 1-10.
[5]. Malakhov, A. (2016, July). Composable multi-threading for Python libraries. In Proceedings of the Python in Science Conferences.
[6]. Meier, L., Honegger, D., & Pollefeys, M. (2015, May). PX4: A node-based multithreaded open source robotics framework for deeply embedded platforms. In 2015 IEEE International Conference on Robotics and Automation (ICRA) (pp. 6235-6240). IEEE.
[7]. Hosoi, T., Kobayashi, K., Ito, K., & Aoki, T. Fast image inpainting using similarity of subspace method.
[8]. Y. Lu, X. Tu, S. Lu and P. S. P. Wang, "Application of pattern recognition technology to postal automation in China" in Pattern Recognition and Machine Vision-in Honor and Memory of Professor King-Sun Fu, Copenhagen, Denmark: River Pub. Co, pp. 367-381, Mar. 2010.
[9]. D. G. Lowe, "Distinctive image features from scale-invariant keypoints", Int. J. Comput. Vis., vol. 60, no. 2, pp. 91-110, 2004.
[10]. Zhili Zhou, Q. M. Jonathan Wu, Yimin Yang, and Xingming Sun. 2020. Region-Level Visual Consistency Verification for Large-Scale Partial-Duplicate Image Search. ACM Trans. Multimedia Comput. Commun. Appl. 16, 2, Article 54 (June 2020), 25 pages. DOI: https://doi.org/10.1145/3383582.
[11]. A. Landge and P. Mane, "Near duplicate image matching techniques," 2016 International Conference on Information Communication and Embedded Systems (ICICES), 2016, pp. 1-5, doi: 10.1109/ICICES.2016.7518863.
[12]. Thyagharajan, K. K., & Kalaiarasi, G. (2020). A Review on Near-Duplicate Detection of Images using Computer Vision Techniques. Archives of Computational Methods in Engineering, 28. doi: 10.1007/s11831-020-09400-w.
[13]. Kumar, P. J., Ellappan, V., & Badala, P. (2016). Image duplication detection. International Journal of Pharmacy and Technology, 8, 25632-25639.