Content Based Image Retrieval in Peer-To-Peer Networks

Most current CBIR systems are based on the centralized computing model. Some are stand-alone applications, while
others are web-based. A centralized system maintains central nodes to handle query requests and keeps the entire
feature descriptor database on a central server. Once the relevant images have been identified by a feature similarity
measure, the content is transferred directly from the content server to the requesting host. The drawback of a centralized
system is its limited scalability as the volume of retrieval requests and the size of image databases grow. The worldwide
infrastructure of computers and networks has created an opportunity to collect vast amounts of data and to share
computers and resources on an unprecedented scale. In the last few years, the Peer-to-Peer (P2P) model has emerged as
a powerful and attractive paradigm for developing Internet-scale file systems and sharing resources. This paper surveys
the approaches to designing content-based image retrieval in a P2P environment.

Keywords: Image retrieval, feature extraction, peer-to-peer network, relevance feedback

1. INTRODUCTION
With advances in Internet and digital image sensor technologies, the volume of digital images produced by
scientific, educational, industrial, and other applications and available to users has increased enormously. From cameras
and webcams to printers and scanners, the hardware is becoming sleeker, faster, and cheaper. As the cost of
equipment decreases, the market widens, allowing more consumers to
create their own images. A quick browse around the web easily turns up graphic artwork from various artists,
news photos from around the world, corporate images of new products and services, and much more. Online sites such
as Facebook and Instagram give billions of people the ability to share their photographs. The Internet has clearly proven
itself a catalyst in the growth of digital imaging. With such large multimedia databases becoming a
reality in various domains, methods for organizing image databases and retrieving images efficiently have become
important.
Content-based image retrieval (CBIR) is the process of retrieval of desired images from a huge collection
based on the visual contents of an image (such as color, texture, shape, and spatial layout) that can be extracted
from the images.

1.1 Content Based Image Retrieval

To overcome the limitations of text-based image retrieval, content-based image retrieval (CBIR) was introduced
in the early 1990s. During the past few years, remarkable progress has been made in the development of
CBIR systems. However, many challenging research problems remain and continue to attract researchers from
several disciplines.
CBIR makes direct use of the content of an image rather than depending on human annotation of
metadata with keywords. The system treats each image as a combination
of pixels characterized by low-level features such as color, shape, and texture, and represents these features
as vectors called the descriptors of the image. It extracts these primitive features by automated
techniques and then uses them for searching and retrieval.

1.2 General Architecture of CBIR Systems

Figure 1.2.1 shows a general architecture of a content-based image retrieval system. Two main functionalities are
supported by this system: data insertion and query processing. The data insertion subsystem is responsible for the
extraction of appropriate features from images and storing them in the image database. This process is usually
performed off-line. The query processing allows the user to specify a query by means of a query pattern and to
visualize the retrieved similar images. The query-processing module extracts a feature vector from a query pattern
and applies a metric to evaluate the similarity between the query image and the database images. Next, it ranks
the database images in a decreasing order of similarity to the query image and forwards the most similar images
to the interface module.

Fig. 1.2.1 General Architecture of CBIR System

The visual contents of the images in the database are extracted and described by multi-dimensional feature vectors.
To find the desired images, users provide an example image or sketch to the retrieval system. The system
converts this example into a feature vector of the same form. The similarities between the feature vector of the
query and those of the images in the database are then computed, and retrieval is performed with the help
of an indexing scheme, which provides an efficient way to search the database.
Recent retrieval systems also integrate users' relevance feedback into the retrieval process to generate
perceptually and semantically more meaningful results.
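
The insertion and query steps described above can be illustrated with a minimal sketch. It assumes a coarse RGB color histogram as the descriptor and Euclidean distance as the similarity metric; the function names, the bin count, and the toy "images" (short pixel lists) are illustrative choices, not part of any system cited in this paper.

```python
import math

def color_histogram(pixels, bins_per_channel=4):
    """Quantize (r, g, b) pixels (0-255 each) into a normalized histogram.

    Assumes bins_per_channel divides 256 evenly.
    """
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def euclidean(u, v):
    """Distance between two feature vectors (smaller means more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_images(query_vec, database, top_k=3):
    """Return image ids ordered from most to least similar to the query."""
    scored = sorted(database.items(), key=lambda item: euclidean(query_vec, item[1]))
    return [image_id for image_id, _ in scored[:top_k]]

# Data insertion (normally off-line): extract and store one descriptor per image.
database = {
    "red.png": color_histogram([(250, 10, 10)] * 4),
    "blue.png": color_histogram([(10, 10, 250)] * 4),
    "mixed.png": color_histogram([(250, 10, 10)] * 2 + [(10, 10, 250)] * 2),
}

# Query processing: extract the query descriptor, then rank the database.
query = color_histogram([(240, 20, 20)] * 4)
ranking = rank_images(query, database)
```

A real system would replace the linear scan in `rank_images` with the indexing scheme mentioned above.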

Even when these factors are considered in its design, a CBIR system sometimes fails to retrieve the
desired images precisely. It is then useful to refine the results over multiple iterations using the user's feedback
on the retrieved images. In relevance feedback, the retrieval
system presents a ranked set of images that it judges relevant to the user's initial query, iteratively asks
the user to mark which of them are relevant, and uses these judgments to improve the retrieval
result.
In recent years many commercial systems such as WebSeek [5] and Netra [1], and experimental systems such
as MIT's Photobook [2] and WBIIS [3], have been proposed and used for visual search. These systems are based on
the centralized computing model. One limitation of a centralized CBIR system is that feature
extraction, indexing, and querying are all done centrally, which can be computationally expensive
and is difficult to scale up.

To provide better scalability, a decentralized system can be designed on the peer-to-peer (P2P) paradigm.
Each node in a P2P network acts both as a client that requests images and as a server that redistributes them
[4]. In P2P file systems, files are stored on the end-user machines (peers) rather than on a central server and, as
opposed to the traditional client-server model, files are transferred directly between peers. In general, P2P systems
allow decentralized sharing of the distributed computational resources and contents of individual peers.

We foresee several advantages of CBIR in a P2P network. First, as more users
join the P2P network, the image collection grows enormously through individual contributions, which adds
diversity and variety. Second, a decentralized
retrieval approach overcomes the scalability problem of image retrieval. Furthermore, the storage, information, and computational cost can be distributed among the
peers, which allows many individual computers together to achieve higher throughput [4].

2. RELEVANCE FEEDBACK
A CBIR system sometimes fails to retrieve the desired images precisely. It is then useful to refine the results using
iterative feedback from the user. In relevance feedback, the
retrieval system presents a ranked set of images that it judges relevant to the user's initial query, iteratively
asks the user to mark which of them are relevant, and uses these judgments to improve
the retrieval result. Relevance feedback can be considered a learning problem: the user provides feedback
examples from the retrieved results of a query, and the system learns from these examples to refine the retrieval results.
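
One simple way to picture a single feedback round is a Rocchio-style query update, which moves the query vector toward the images marked relevant and away from those marked irrelevant. This paper does not name Rocchio's method; it is used here only as a familiar illustration, and the weights `alpha`, `beta`, and `gamma` are assumed values.

```python
def mean_vector(vectors, dim):
    """Component-wise mean; a zero vector when no examples were given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def refine_query(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: q' = alpha*q + beta*mean(relevant) - gamma*mean(irrelevant)."""
    dim = len(query)
    pos = mean_vector(relevant, dim)
    neg = mean_vector(irrelevant, dim)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]

# One feedback round on 2-dimensional toy descriptors.
query = [0.5, 0.5]
relevant = [[1.0, 0.0], [0.8, 0.2]]   # images the user marked as relevant
irrelevant = [[0.0, 1.0]]             # images the user marked as irrelevant
refined = refine_query(query, relevant, irrelevant)
```

The refined vector is then used to rank the database again, and the loop repeats until the user is satisfied.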

According to Mitchell’s [8] definition, machine learning deals with the question of how to design
computer programs that automatically improve with experience. In this view, any task whose performance,
measured in some defined way, improves with experience can be considered a machine-learning
task. In CBIR, relevance feedback is a task of improving retrieval performance, and the experience is the set of
feedback examples provided by the users. Hence, classical machine-learning methods, such as decision tree
learning [9], artificial neural networks [8], Bayesian learning [15], and kernel-based learning [122], can be used
for relevance feedback in CBIR.

Users are usually reluctant to provide many feedback examples, so the number of training samples is
very small, typically fewer than ten in each round of a feedback session. In contrast, the feature dimensionality of
CBIR systems is usually high. Hence, the crucial issue for relevance feedback in CBIR systems
is how to learn from small training samples in a very high-dimensional feature space. This makes many learning
methods, such as decision tree learning and artificial neural networks, unsuitable for CBIR [10]. The key issues
in treating relevance feedback in CBIR as a small-sample learning problem are how to learn quickly from
small sets of feedback samples so that retrieval accuracy improves effectively; how to accumulate the knowledge learned
from feedback; and how to integrate low-level visual and high-level semantic features in the query. However,
most published work has focused on the first issue.

Compared with other learning methods, Bayesian learning has been reported to be more efficient. Vasconcelos
and Lippman treated the feature distribution as a Gaussian mixture and used Bayesian inference for learning during
the feedback iterations of a query session [13]. The richer information captured by the mixture model also makes
region-based image matching possible.

The approach proposed in [14] uses Monte Carlo sampling to search for the set of samples that minimizes
the expected number of future iterations. Entropy is used
to estimate that number, given the ambiguity expressed by the current probability
distribution of the target image over all test images. Tong and Chang [12] proposed an SVM active learning
algorithm that selects samples so as to maximally reduce the size of the version space in which the class boundary lies. They
observed that selecting the points near the SVM boundary achieves this goal and is more efficient than other
schemes, which require exhaustive trials on all the test items. Therefore, in their work, the points near the SVM
boundary are used to approximate the most informative points, and the most positive images are chosen as the
ones farthest from the boundary on the positive side in the feature space.
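
The two selection rules in this paragraph can be sketched as follows. The sketch assumes a linear decision function f(x) = w·x + b whose weights are already known; in practice `weights` and `bias` would come from an SVM trained on the feedback examples, and the values and image names below are purely illustrative.

```python
def decision_value(weights, bias, x):
    """Signed score f(x) = w.x + b; the class boundary is where f(x) = 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def select_for_feedback(weights, bias, unlabeled, n_query=2):
    """Most informative images: those closest to the boundary (smallest |f(x)|)."""
    by_margin = sorted(
        unlabeled, key=lambda name: abs(decision_value(weights, bias, unlabeled[name]))
    )
    return by_margin[:n_query]

def top_results(weights, bias, unlabeled, k=2):
    """Most positive images: those farthest from the boundary on the positive side."""
    by_score = sorted(
        unlabeled, key=lambda name: decision_value(weights, bias, unlabeled[name]),
        reverse=True,
    )
    return by_score[:k]

# Hypothetical classifier and unlabeled pool of 2-dimensional descriptors.
weights, bias = [1.0, -1.0], 0.0
unlabeled = {
    "a": [0.9, 0.1],
    "b": [0.55, 0.45],
    "c": [0.4, 0.6],
    "d": [0.1, 0.9],
}
ask_user_about = select_for_feedback(weights, bias, unlabeled)
show_as_results = top_results(weights, bias, unlabeled)
```

Images "b" and "c" lie nearest the boundary, so they are the ones shown to the user for labeling, while "a" is returned as the strongest match.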

Some researchers treat relevance feedback in CBIR as a pattern recognition or classification
problem. From this viewpoint, the positive and negative examples provided by the user are
training examples on which a classifier can be trained. The classifier then separates the whole data set into relevant
and irrelevant groups. Many existing pattern recognition tools can be adopted for this task, and
different kinds of classifiers have been tried, such as a linear classifier [15], a nearest-neighbor classifier
[14], a Bayesian classifier [10], and support vector machines (SVM) [12]. In this category, the most popular
algorithm is that of [12], where an SVM classifier is trained to separate the positive and negative examples.
The SVM classifier then divides all images in the database into two groups, relevant and irrelevant to a
given query.
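
As a small illustration of this classification view, the sketch below uses a plain nearest-neighbor rule in place of the SVM: an image is labeled relevant when its closest feedback example is a positive one. The function names and toy vectors are assumptions made for the example, not a reproduction of any cited method.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def is_relevant(image_vec, positives, negatives):
    """1-nearest-neighbor rule over the user's feedback examples."""
    d_pos = min(squared_distance(image_vec, p) for p in positives)
    d_neg = min(squared_distance(image_vec, n) for n in negatives)
    return d_pos <= d_neg

def split_database(database, positives, negatives):
    """Partition image ids into (relevant, irrelevant) groups for the query."""
    relevant, irrelevant = [], []
    for image_id, vec in database.items():
        (relevant if is_relevant(vec, positives, negatives) else irrelevant).append(image_id)
    return relevant, irrelevant

# Feedback examples from one round, then classification of the whole database.
positives = [[0.9, 0.1]]
negatives = [[0.1, 0.9]]
database = {"img1": [0.8, 0.3], "img2": [0.2, 0.7], "img3": [0.6, 0.4]}
relevant, irrelevant = split_database(database, positives, negatives)
```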

3. CONTENT BASED IMAGE RETRIEVAL IN P2P NETWORK


Since the mid-1990s, many CBIR systems have been proposed and developed, including QBIC [5], WebSEEK [5],
WBIIS [6], SIMPLIcity [96], MARS [6], NeTra [5], Photobook [2], and other systems for domain-specific
applications. In these systems, images are represented as feature vectors, and similarity is based on the
distance between the feature vectors. One drawback of these systems is that feature extraction,
indexing, and querying are carried out centrally, which can be computationally intensive and is
difficult to scale up.

One of the promising future trends in CBIR is distributed computing for data collection, data
processing, and information retrieval. By extending the centralized system model to a distributed one, we can not only increase the
size of image collections easily but also overcome the scalability bottleneck by distributing the
task of image retrieval. The P2P network is a recently evolved paradigm for distributed computing.

P2P file sharing applications can accomplish tasks that are difficult for conventional centralized
computing models. For example, by distributing data storage over networked computers, one can obtain
virtual storage far larger than what a single local computer can hold. In addition, one may
improve data security by distributing pieces of an encrypted file over many computers. Likewise, one may
distribute computation among different computers to achieve high throughput. A Peer-to-Peer
(P2P) network offers a completely decentralized and distributed paradigm on top of the physical network, which
avoids the coordinator bottleneck problem.

Emerging P2P networks or the implementations such as Gnutella [19], Napster [18], Freenet [17],
LimeWire [16], and eDonkey [15] offer the following advantages:

• Distributed Resource: The storage, information and computational cost can be distributed among the
peers, allowing many individual computers to achieve a higher throughput [36].
• Increased Reliability: The P2P network improves reliability by eliminating dependence on centralized
coordinators. In other words, the P2P network can still be operational even after a certain portion of peers
is down [96].
• Comprehensiveness of Information: The P2P network has the potential to reach every computer on the
Internet.
P2P networks, which are formed by equally privileged nodes connecting to each other in a self-organizing
way, have been one of the most important architectures for data sharing. Popular P2P file-sharing
networks such as eDonkey count millions of users [98] and tens of millions of files. The ever-growing amount
of multimedia data and the computational power of P2P networks create both the need and the potential for large-scale
multimedia retrieval applications such as content-based image sharing. P2P networks are well known for their
efficiency, scalability, and robustness in file sharing. To provide extended search functionality such as
content-based image retrieval (CBIR), one must face the following challenges:
• In contrast to centralized environments, data in P2P network is distributed among different nodes, thus a
CBIR algorithm needs to index and search for images in a distributed manner.
• Unlike distributed servers/clouds, nodes in P2P networks have limited network bandwidth and
computational power, thus the algorithm should keep the network cost low and the workload among
nodes balanced.
• As P2P networks are under constant churn, where nodes join and leave the network frequently, the index
needs to be updated dynamically to adapt to such changes.
To support content indexing and to avoid message flooding, structured overlay networks such as
Distributed Hash Tables (DHTs) [7] are often implemented on the top of a physical network. By organizing the
nodes in a structured way, messages can be efficiently routed between any pair of nodes, and the index integrity
can be maintained during network churn.

For CBIR, most existing systems adopt a global feature approach: an image is represented as
a high-dimensional feature vector (e.g., a color histogram), and the similarity between images is measured by the
distance between their feature vectors [3,4,5]. Usually, the feature vectors are indexed by a distributed
high-dimensional index or by Locality Sensitive Hashing (LSH) over the DHT overlay. However, because of the limitation
known as the “curse of dimensionality”, most of these solutions incur high network costs or serious workload imbalance
among nodes when the dimensionality of the feature vectors is high.
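
To make the LSH idea concrete, the following sketch uses random-hyperplane hashing, one common LSH family, to turn a feature vector into a short bucket key that could then be stored under a DHT. The choice of this LSH family, the number of bits, and the `responsible_node` mapping (a plain modulo) are assumptions for illustration; the systems cited above may use different schemes.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes; all peers must share the same seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    key = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key

def responsible_node(key, n_nodes):
    """Hypothetical placement rule: bucket key modulo the number of nodes."""
    return key % n_nodes

planes = make_hyperplanes(dim=4, n_bits=8)
descriptor = [0.4, 0.1, 0.3, 0.2]
key = lsh_key(descriptor, planes)
node = responsible_node(key, n_nodes=16)
```

Vectors separated by a small angle agree on most bits and so tend to land in the same bucket, which lets a query contact only the node holding that bucket instead of flooding the network.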

3.1 Query Broadcasting in P2P Domain


In a P2P CBIR system, when a peer initiates a search for an image, it broadcasts a query request to its connected peers.
Those peers forward the request to their own connected peers, and the process continues. Unlike the client-server
architecture of the web, a P2P network allows individual computers, which join and leave the network frequently,
to share information directly with each other without the help of dedicated servers. Each peer acts as a server and
as a client simultaneously. A peer becomes a member of the network by connecting to one
or more peers already in the network.
Messages are sent over multiple hops from one peer to another, and each peer looks up its locally shared
collection and responds to queries. This model of query broadcasting is wasteful because peers are forced
to handle irrelevant query messages. This type of search is called a Brute Force Search (BFS).
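
The cost of broadcasting can be seen in a small simulation. The sketch below floods a query over a toy overlay with a hop limit (a time-to-live, as used in Gnutella-style networks) and counts every forwarded message; the topology, peer names, and the `ttl` parameter are assumptions made for illustration.

```python
from collections import deque

def flood_query(neighbors, holdings, start, wanted, ttl=3):
    """Broadcast a query hop by hop; return (peers with a hit, messages sent)."""
    seen = {start}
    hits = []
    messages = 0
    frontier = deque([(start, ttl)])
    while frontier:
        peer, hops_left = frontier.popleft()
        if wanted in holdings.get(peer, ()):
            hits.append(peer)              # this peer answers from its local collection
        if hops_left == 0:
            continue                       # hop limit reached, stop forwarding
        for nxt in neighbors.get(peer, ()):
            messages += 1                  # every forward costs a message, even a duplicate
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops_left - 1))
    return hits, messages

# Toy overlay: five peers, only E shares the wanted image.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
holdings = {"C": {"beach"}, "E": {"sunset"}}
hits, messages = flood_query(neighbors, holdings, "A", "sunset", ttl=3)
```

In this run nine messages are sent to find a single hit, and most of them reach peers that hold nothing relevant, which is the waste described above.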

Several solutions have been proposed for the query broadcasting problem. Chord [7], CAN [7], Pastry
[7], and Tapestry [12] tackle it by distributing the index storage across different peers, thus sharing the workload of
a centralized index server. The distributed infrastructure of both CAN and Chord uses the Distributed Hash Table
(DHT) technique to map a filename to a key; each peer is responsible for storing a certain range of (key, value)
pairs. When a peer looks for a file, it hashes the filename to a key and asks the peer responsible for this key for
the actual storage location of that file. Chord models the key as an m-bit identifier and arranges the
peers in a logical ring topology to determine which peer is responsible for storing which (key, value) pair. CAN
models the key as a point in a D-dimensional Cartesian coordinate space, and each peer is responsible
for the (key, value) pairs inside its own region. Such systems speed up key lookup (data location)
and reduce the message passing it requires.
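
The Chord-style mapping from a filename to its responsible peer can be sketched in a few lines. The example hashes the name to an m-bit identifier and picks the first peer at or after that identifier on the ring. It assumes a global, sorted view of peer identifiers for clarity; real Chord resolves the same question through finger-table routing without any peer knowing the full list. The value of `M_BITS`, the peer identifiers, and the filename are illustrative.

```python
import hashlib
from bisect import bisect_left

M_BITS = 8  # identifier space of size 2**m; real deployments use a much larger m

def chord_id(name, m_bits=M_BITS):
    """Hash a filename (or peer address) to an m-bit identifier on the ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m_bits)

def successor(key, sorted_node_ids):
    """First peer whose identifier is >= key, wrapping around the ring."""
    i = bisect_left(sorted_node_ids, key)
    return sorted_node_ids[i % len(sorted_node_ids)]

# Four peers on a ring of 256 identifiers.
node_ids = sorted([10, 80, 150, 220])

key = chord_id("sunset.jpg")
owner = successor(key, node_ids)   # the peer that stores the (key, location) pair
```

A peer looking for "sunset.jpg" computes the same key and asks `owner` for the file's storage location, so no broadcast is needed.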

Some extensions of DHTs that perform content-based retrieval and textual similarity matching are proposed
by Tang et al. [12] and Harren et al. [15]. Although DHTs are scalable, their performance under the dynamic
conditions of prevalent P2P systems is still unknown because of the penalty incurred when peers join and leave [4]. Since
DHTs mandate a specific network structure and incur this penalty, some
researchers have proposed methods that operate in the prevalent dynamic P2P environments, for example, Gnutella.
Crespo proposed a routing indices approach for retrieving text documents in P2P systems. Under this scheme,
each peer maintains a routing index, which it uses to forward queries to the peers expected to hold
more documents of the same category as the query. This method requires all peers to agree upon a set of
document categories.
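
The forwarding decision under a routing index can be sketched as follows. The sketch assumes each peer keeps, for every neighbor, a count of documents per agreed category reachable through that neighbor, and forwards a query to the neighbors with the highest count first. The data layout, peer names, and counts are hypothetical.

```python
def forward_order(routing_index, category):
    """Neighbors ordered by how many documents of the category they lead to.

    Neighbors with no documents in the category are skipped entirely.
    """
    ranked = sorted(
        routing_index.items(),
        key=lambda item: item[1].get(category, 0),
        reverse=True,
    )
    return [neighbor for neighbor, counts in ranked if counts.get(category, 0) > 0]

# Routing index held by one peer: neighbor -> {category: reachable document count}.
routing_index = {
    "peerB": {"nature": 120, "sports": 5},
    "peerC": {"nature": 10, "sports": 300},
    "peerD": {"sports": 40},
}

nature_route = forward_order(routing_index, "nature")
sports_route = forward_order(routing_index, "sports")
```

Compared with flooding, a "nature" query is never sent to peerD, which is why the scheme depends on all peers sharing the same category set.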

4. CONCLUSION

This paper has surveyed approaches to designing content-based image retrieval with relevance feedback in
a P2P environment. It highlights the importance of feedback mechanisms for improving the
retrieval accuracy of an image retrieval system. It also presents the challenges of query broadcasting in the P2P domain for image retrieval
and reviews techniques that allow a CBIR system to work efficiently in a P2P environment.

References
[1] Ma W. and Manjunath B., “NeTra: A Toolbox for Navigating Large Image Databases”, In Proceedings of the IEEE International Conference
on Image Processing. pp. 568–571, 1997.
[2] Pentland, A. Picard, “Photobook: Tools for Content-based Manipulation of Image Databases”, In Proceedings of SPIE. Vol. 2185, pp. 34–
47, 1994.
[3] SETI: homepage: http://www.setiathome.ssl.berkeley.edu/.
[4] Sia, K. C. NG, “Bridging the P2P and WWW Divide with DISCOVIR—DIStributed COntent-based Visual Information Retrieval”, In
Poster Proceedings of The Twelfth International World Wide Web Conference, Poster ID S172. Hungary.
[5] Smith R. and Chang S. F. “An Image and Video Search Engine for the World-Wide Web”, In Proceedings of SPIE. Vol. 3022, pp. 84–95,
1997.
[6] Sripanidkulchai, K. Maggs, “Efficient Content Location Using Interest Based Locality in Peer-to-Peer Systems”. In Proceedings of IEEE
INFOCOM, pp. 222-228, 2003.
[7] Stoica, I. Morris, “Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications” In Proceedings of ACM SIGCOMM pp. 149–
160, 2001.
[8] J. Laaksonen, M. Koskela, and E. Oja, “PicSOM: Self-Organizing Maps for Content-Based Image Retrieval,” in Proceedings of
International Joint Conference on NN, pp. 112-118 July 1999.
[9] S.D.MacArthur, C.E.Brodley, and C.-R.Shyu,“Relevance Feedback Decision Trees in Content-Based Image Retrieval,” in IEEE Workshop
on Content-Based Access of Image and Video Libraries, pp. 68–72, 2000.
[10] Z. Su, H. J. Zhang, and S. Ma, “Relevant Feedback using a Bayesian Classifier in Content-Based Image Retrieval,” in SPIE Electronic
Imaging, San Jose, CA, pp. 174-182, January 2001.
[11] K. Tieu and P. Viola, “Boosting Image Retrieval,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 236-242, 2000.
[12] S. Tong and E. Chang, “Support Vector Machine Active Learning for Image Retrieval,” in ACM Multimedia, Ottawa, Canada, pp. 128-
132, 2001.
[13] N. Vasconcelos and A. Lippman, “Learning from User Feedback in Image Retrieval Systems,” in NIPS’99, Denver, CO, pp. 28-36,
1999.
[14] P. Wu and B. S. Manjunath, “Adaptive Nearest Neighbour Search for Relevance Feedback in Large Image Database,” in ACM
Multimedia Conference, Ottawa, Canada, pp. 145-152, 2001.
[15] Y. Wu, Q. Tian, and T. S.Huang, “Discriminant EM Algorithm with Application to Image Retrieval,” in IEEE CVPR, South Carolina,
pp. 128-136, 2000.
[16] The eDonkey2000 homepage: http://www.edonkey2000.com.
[17] Freenet: The Freenet homepage. http://freenet.sourceforge.net.
[18] The Limewire homepage: http://www.limewire.org
[19] Napster: The Napster homepage. http://www.napster.com.
[20] Abe, N. Mamitsuka, “Query Learning Strategies Using Boosting and Bagging”, in Proceedings of the 15th International Conference on
Machine Learning, Madison, pp. 1–9, 1998.
