Seminar Report 2
Introduction
An image retrieval system is a computer system for browsing, searching and retrieving images
from a large database of digital images. Most traditional and common methods of image retrieval
utilize some method of adding metadata such as captioning, keywords, or descriptions to the
images so that retrieval can be performed over the annotation words. Manual image annotation is
time-consuming, laborious and expensive; to address this, there has been a large amount of
research done on automatic image annotation. Additionally, the increase in social web
applications and the semantic web have inspired the development of several web-based image
annotation tools[1].
The first microcomputer-based image database retrieval system was developed at MIT, in the
1990s, by Banireddy Prasaad, Amar Gupta, Hoo-min Toong, and Stuart Madnick[1].
Advances in data storage and image acquisition technologies have enabled the creation of large
image datasets. In this scenario, it is necessary to develop appropriate information systems to
efficiently manage these collections. The most common approach is the so-called CBIR (Content-Based Image Retrieval) system. Basically, CBIR systems try to retrieve images similar to
a user-defined specification or pattern (e.g., shape sketch, image example). Their goal is to support
image retrieval based on content properties (e.g., shape, color, texture), usually encoded into
feature vectors. One of the main advantages of the CBIR approach is the possibility of an automatic
retrieval process, instead of the traditional keyword-based approach, which usually requires very
laborious and time-consuming previous annotation of database images. The CBIR technology has
been used in several applications such as fingerprint identification, biodiversity information
systems, digital libraries, crime prevention, medicine, historical research, among others[2].
A key component of the Content Based Image Retrieval system is feature extraction. A feature is
a characteristic that can capture a certain visual property of an image either globally for the whole
image, or locally for objects or regions. Two key issues arise in CBIR systems: first, how the extracted features can represent image content, and second, how to determine the similarity between images based on their extracted features. One common technique that addresses both issues is the vector model. This model represents an image as a vector of features, and the difference between two images is measured via the distance between their feature vectors. There are two approaches to searching, browsing, and retrieving images. The first one is based on textual information attributed
to the images manually by a human. This is called concept-based or text-based image indexing. A
human describes the images according to the image content, the caption, or the background
information. However, the representation of an image with text requires significant effort and can
be expensive, tedious, time consuming, subjective, incomplete, and inconsistent. To overcome the
limitations of the text-based approach, the second approach, Content-Based Image Retrieval
(CBIR) techniques are used. In a CBIR system, images are automatically indexed by summarizing
their visual features such as color, texture, and shape. These features are automatically extracted
from the images. Recent years have seen a rapid increase in the size of digital image collections.
Every day, both military and civilian equipment generates gigabytes of images. A huge amount of
information is out there. However, we cannot access or make use of the information unless it is
organized so as to allow efficient browsing, searching, and retrieval. Image retrieval has been a
very active research area since the 1970s, with the thrust from two major research communities,
database management and computer vision. These two research communities study image retrieval
from different angles, one being text-based and the other visual-based[4].
In the early 1990s, because of the emergence of large-scale image collections, the two difficulties
faced by the manual annotation approach became more and more acute. To overcome these
difficulties, content-based image retrieval was proposed. That is, instead of being manually
annotated by text-based key words, images would be indexed by their own visual content, such as
color and texture. Since then, many techniques in this research direction have been developed and
many image retrieval systems, both research and commercial, have been built. The advances in
this research direction are mainly contributed by the computer vision community[5].
Many image retrieval techniques have been developed by researchers and scientists; some of the most important and widely used techniques are shown in Figure 2.1.
Figure 2.1: Widely used image retrieval techniques.
Examples of text-based image retrieval queries would be "search results for flowers" or even "search results for flowers added on 2014-10-05". So, the keyword may refer to the image name, the date of adding, deleting, or modifying the image, and so on. Text-based image retrieval is also called description-based image retrieval. It is used, for example, to retrieve XML documents containing images based on the textual information of a specific multimedia query. To overcome the limitations of CBIR, TBIR represents the visual content of images by manually assigned keywords/tags. It allows a user to present his/her information need as a textual query, and to find the relevant images based on the match between the textual query and the manual annotations of the images [7].
It is an old method, dating back to the 1970s, and it requires text as input to search for images. A typical weakness of this approach is, for example, its sensitivity to misspellings in queries and annotations.
In content-based image retrieval, images are searched and retrieved on the basis of the similarity of their visual content to a query image, using features of the image. A feature extraction module is used to extract low-level image features from the images in the collection. Commonly extracted image features include color, texture, and shape.
Unlike the previous technique, Content-Based Image Retrieval takes a query image as input, and the goal is to search for similar images by comparing color, texture, or shape. An example query for this technique would be something like "search for results similar to this image containing flowers".
So, the user owns an image of a flower, and the search returns images similar to that query image [7].
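As a concrete illustration of this query-by-example idea, the sketch below ranks a database of feature vectors by their distance to the query image's feature vector. It is a minimal NumPy sketch with hypothetical 64-dimensional features (for example, a colour histogram or texture statistics); it is not taken from any of the cited systems.

```python
import numpy as np

def rank_by_distance(query_vec, database_vecs):
    """Vector-model retrieval: rank database images by the Euclidean distance
    between feature vectors; the smaller the distance, the more similar the image."""
    dists = np.linalg.norm(database_vecs - query_vec, axis=1)
    return np.argsort(dists)          # indices of database images, best match first

# Hypothetical usage: 1000 database images described by 64-D feature vectors.
database = np.random.rand(1000, 64)
query = np.random.rand(64)
top_five = rank_by_distance(query, database)[:5]
```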
Multimodal fusion image retrieval involves data fusion and machine learning algorithms. Data
fusion, also known as combination of evidence, is a technique of merging multiple sources of
evidence. By using multiple modalities, effects such as the skimming effect, the chorus effect, and the dark horse effect can be exploited [7].
Image retrieval based on the semantic meaning of the images is currently being explored by many
researchers. This is one of the efforts to close the semantic gap problem. In this context, there are
two main approaches: Annotating images or image segments with keywords through automatic
image annotation or adopting the semantic web initiatives [7].
The difference between the user’s information need and the image representation is called the semantic gap in CBIR systems. The limited retrieval accuracy of image retrieval systems is essentially due to this intrinsic semantic gap. In order to reduce the gap, relevance feedback is very helpful in CBIR systems. The basic idea behind relevance feedback is to integrate the subjectivity of human perception into the query and to involve the user in evaluating the retrieval results. Depending on the user’s interaction, the similarity measures are then automatically refined. Many CBIR algorithms have been proposed, and most of them focus on effectively finding a specific image, or a group of images relevant to the query image, using a similarity computation phase. But the user's interaction is needed to get better results [7].
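One simple and widely cited way to refine the query from such feedback is Rocchio-style query-point movement: move the query's feature vector toward the images the user marked relevant and away from those marked non-relevant. The sketch below illustrates only that general idea; it is not the specific refinement scheme of any system discussed here, and the weights alpha, beta, and gamma are conventional defaults, not prescribed values.

```python
import numpy as np

def refine_query(query_vec, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style query refinement: pull the query feature vector toward the
    feature vectors of images judged relevant and push it away from non-relevant ones."""
    new_query = alpha * np.asarray(query_vec, dtype=float)
    if len(relevant) > 0:
        new_query += beta * np.mean(relevant, axis=0)
    if len(non_relevant) > 0:
        new_query -= gamma * np.mean(non_relevant, axis=0)
    return new_query
```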
Image Retrieval has been used in several applications, such as medicine, fingerprint identification,
biodiversity information systems, digital libraries, crime prevention, historical research, among
others [3].
The number of medical images produced by digital devices has increased more and more. For
instance, a medium-sized hospital usually performs procedures that generate medical images that
require hundreds or even thousands of gigabytes within a short space of time. The task of taking care of such a huge amount of data is hard and time-consuming. That is one of the reasons that have motivated research in the field of Content-Based Image Retrieval. In fact, the medical domain is
frequently mentioned as one of the main areas where Content Based Image Retrieval finds its
application [3].
Biologists gather many kinds of data for biodiversity studies, including spatial data, and images of
living beings. Ideally, Biodiversity Information Systems (BIS) should help researchers to enhance
or complete their knowledge and understanding about species and their habitats by combining
textual, image content-based, and geographical queries. An example of such a query might start
by providing an image as input (e.g., a photo of a fish) and then asking the system to "Retrieve all
database images containing fish whose fins are shaped like those of the fish in this photo" [3].
There are several digital libraries that support services based on image content. One example is
the digital museum of butterflies, aimed at building a digital collection of Taiwanese butterflies.
This digital library includes a module responsible for content-based image retrieval based on color,
texture, and patterns. In a different image context, another content-based image retrieval digital library supports geographical image retrieval. The system manages air photos which can be
retrieved through texture descriptors. Place names associated with retrieved images can be
displayed by cross-referencing with a Geographical Name Information System (GNIS) gazetteer
[3].
Feature (content) extraction is the basis of content-based image retrieval. In a broad sense, features
may include both text-based features (key words, annotations) and visual features (color, texture,
shape, faces). However, since there already exists rich literature on text-based feature extraction
in the DBMS and information retrieval research communities, we will confine ourselves to the
techniques of visual feature extraction. Within the visual feature scope, the features can be further
classified as general features and domain specific features. The former includes color, texture, and
shape features while the latter is application-dependent and may include, for example, human faces
and fingerprints. The domain-specific features are better covered in the pattern recognition literature
and may involve much domain knowledge which we will not have enough space to cover in this
paper. Therefore, the remainder of the section will concentrate on those general features which can
be used in most applications. Because of the subjectivity of perception, there does not exist a single best representation for a given feature. As we will see soon, for any given feature there exist multiple
representations which characterize the feature from different perspectives [5].
1. Color
The color feature is one of the most widely used visual features in image retrieval. It is relatively
robust to background complication and independent of image size and orientation. Some
representative studies of color perception and color spaces can be found in the literature. In image retrieval, the
color histogram is the most commonly used color feature representation. Statistically, it denotes
the joint probability of the intensities of the three color channels. Swain and Ballard proposed
histogram intersection, an L1 metric, as the similarity measure for the color histogram. To take into
account the similarities between similar but not identical colors, Ioka and Niblack et al. introduced
an L2-related metric in comparing the histograms. Furthermore, considering that most
color histograms are very sparse and thus sensitive to noise, Stricker and Orengo proposed using
the cumulated color histogram. Their research results demonstrated the advantages of the proposed
approach over the conventional color histogram approach. Besides the color histogram, several
other color feature representations have been applied in image retrieval, including color moments
and color sets. To overcome the quantization effects, as in the color histogram, Stricker and Orengo
proposed using the color moments approach. The mathematical foundation of this approach is that
any color distribution can be characterized by its moments. Furthermore, since most of the
information is concentrated on the low-order moments, only the first moment (mean), and the
second and third central moments (variance and skewness) were extracted as the color feature
representation. Weighted Euclidean distance was used to calculate the color similarity. To
facilitate fast search over large-scale image collections, Smith and Chang proposed color sets as
an approximation to the color histogram. They first transformed the (R, G, B) color space into a
perceptually uniform space, such as HSV, and then quantized the transformed color space into M
bins. A color set is defined as a selection of colors from the quantized color space. Because color
set feature vectors were binary, a binary search tree was constructed to allow a fast search. The
relationship between the proposed color sets and the conventional color histogram was further
discussed[3].
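To make the color histogram and the Swain and Ballard histogram intersection concrete, the sketch below computes a coarse RGB histogram with NumPy and Pillow and compares two histograms. The choice of 8 bins per channel and the file-path interface are illustrative assumptions, not prescriptions from the cited work.

```python
import numpy as np
from PIL import Image

def color_histogram(path, bins=8):
    """Coarse RGB colour histogram (bins per channel), normalized to sum to 1."""
    pixels = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    return (hist / hist.sum()).ravel()

def histogram_intersection(h1, h2):
    """Swain-Ballard histogram intersection: larger values mean more similar images."""
    return float(np.minimum(h1, h2).sum())
```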
2. Texture
Texture refers to the visual patterns that have properties of homogeneity that do not result from
the presence of only a single color or intensity. It is an innate property of virtually all surfaces,
including clouds, trees, bricks, hair, and fabric. It contains important information about the
structural arrangement of surfaces and their relationship to the surrounding environment.
Because of its importance and usefulness in pattern recognition and computer vision, there are
rich research results from the past three decades. Now, it further finds its way into image
retrieval. More and more research achievements are being added to it. In the early 1970s,
Haralick et al. proposed the co-occurrence matrix representation of texture features. This
approach explored the gray level spatial dependence of texture. It first constructed a co-
occurrence matrix based on the orientation and distance between image pixels and then extracted
meaningful statistics from the matrix as the texture representation. Many other researchers
followed the same line and further proposed enhanced versions. For example, Gotlieb and
Kreyszig studied the statistics originally proposed by Haralick et al. and experimentally found that contrast, inverse difference moment, and entropy had the biggest discriminatory power. Motivated by the
psychological studies in human visual perception of texture, Tamura et al. explored the texture
representation from a different angle. They developed computational approximations to the
visual texture properties found to be important in psychology studies. The six visual texture
properties were coarseness, contrast, directionality, linelikeness, regularity, and roughness. One
major distinction between the Tamura texture representation and the co-occurrence matrix
representation is that all the texture properties in Tamura representation are visually meaningful,
whereas some of the texture properties used in co-occurrence matrix representation may not be
(for example, entropy). This characteristic makes the Tamura texture representation very
attractive in image retrieval, as it can provide a more user-friendly interface. The QBIC system
and the MARS system further improved this texture representation. In the early 1990s, after the
wavelet transform was introduced and its theoretical framework was established, many
researchers began to study the use of the wavelet transform in texture representation. Smith and Chang used the statistics (mean and variance) extracted from the wavelet subbands as the
texture representation. This approach achieved over 90% accuracy on the 112 Brodatz texture
images. To explore the middle-band characteristics, a tree-structured wavelet transform was used
by Chang and Kuo to further improve the classification accuracy. The wavelet transform was
also combined with other techniques to achieve better performance. Gross et al. used the wavelet
transform, together with KL expansion and Kohonen maps, to perform texture analysis.
Thyagarajan et al. and Kundu et al. combined the wavelet transform with a co-occurrence matrix
to take advantage of both statistics-based and transform-based texture analyses. There also were
quite a few review papers in this area. An early review paper, by Weszka et al., compared the
texture classification performance of the Fourier power spectrum, second-order gray level statistics
(co-occurrence matrix), and first-order statistics of gray level differences. They tested the three
methods on two sets of terrain samples and concluded that the Fourier method performed poorly
while the other two were comparable. Ohanian and Dubes compared and evaluated four types
of texture representations, namely Markov random field representation, multichannel filtering
representation, fractal-based representation, and co-occurrence representation. They tested the
four texture representations on four test sets, with two being synthetic (fractal and Gaussian
Markov random field) and two being natural (leather and painted surfaces). They found that co-
occurrence matrix representation performed best in their test sets. In a more recent paper, Ma and
Manjunath evaluated the texture image annotation by various wavelet transform representations,
including orthogonal and bi-orthogonal wavelet transforms, the tree-structured wavelet
transform, and the Gabor wavelet transform. They found that the Gabor transform was the best
among the tested candidates which matched human vision study results [8].
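The co-occurrence (Haralick-style) representation discussed above can be sketched with scikit-image's graycomatrix/graycoprops. Entropy is not one of graycoprops' built-in properties, so it is computed directly from the normalized matrices; the choice of distances and angles below is purely illustrative.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_uint8, distances=(1, 2), angles=(0, np.pi/4, np.pi/2, 3*np.pi/4)):
    """Co-occurrence texture features for an 8-bit grayscale image, averaged
    over the chosen distances and orientations."""
    glcm = graycomatrix(gray_uint8, distances=distances, angles=angles,
                        levels=256, symmetric=True, normed=True)
    feats = {prop: float(graycoprops(glcm, prop).mean())
             for prop in ("contrast", "homogeneity", "energy", "correlation")}
    # Entropy computed directly from the normalized co-occurrence matrices.
    p = glcm.astype(float)
    log_p = np.log2(p, where=p > 0, out=np.zeros_like(p))
    feats["entropy"] = float(-(p * log_p).sum() / (p.shape[2] * p.shape[3]))
    return feats
```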
3. Shape
In image retrieval, depending on the applications, some require the shape representation to be
invariant to translation, rotation, and scaling, while others do not. Obviously, if a representation
satisfies the former requirement, it will satisfy the latter as well. Therefore, in the following we
will focus on shape representations that are transformation invariant. In general, the shape
representations can be divided into two categories, boundary-based and region-based. The former
uses only the outer boundary of the shape while the latter uses the entire shape region. The most
successful representatives for these two categories are Fourier descriptor and moment invariants.
The main idea of a Fourier descriptor is to use the Fourier transformed boundary as the shape
feature. Some early work can be found in the literature. To take into account the digitization noise in the image
domain, Rui et al. proposed a modified Fourier descriptor which is both robust to noise and
invariant to geometric transformations. The main idea of moment invariants is to use region-based
moments, which are invariant to transformations, as the shape feature. Hu identified seven such moments. Based on his work, many improved versions emerged. Based on the discrete version
of Green’s theorem, Yang and Albregtsen proposed a fast method of computing moments in binary
images. Motivated by the fact that most useful invariants were found by extensive experience and
trial-and-error, Kapur et al. developed algorithms to systematically generate and search for a given
geometry’s invariants. Realizing that most researchers did not consider what happened to the
invariants after image digitization, Gross and Latecki developed an approach which preserved the
qualitative differential geometry of the object boundary, even after an image was digitized. A framework of algebraic curves and invariants has been proposed to represent complex objects in a
cluttered scene by parts or patches. Polynomial fitting is done to represent local geometric
information, from which geometric invariants are used in object matching and recognition. Some
recent work in shape representation and matching includes the finite element method (FEM), the turning function, and the wavelet descriptor. The FEM defines a stiffness matrix which describes
how each point on the object is connected to the other points. The eigenvectors of the stiffness
matrix are called modes and span a feature space. All the shapes are first mapped into this space
and similarity is then computed based on the eigenvalues. Along a similar line of the Fourier
descriptor, Arkin et al. developed a turning function-based method for comparing both convex and
concave polygons. Chuang and Kuo used the wavelet transform to describe object shape. It
embraced the desirable properties such as multiresolution representation, invariance, uniqueness,
stability, and spatial localization. For shape matching, chamfer matching attracted much research
attention. Barrow et al. first proposed the chamfer matching technique which compared two
collections of shape fragments at a cost proportional to the linear dimension, rather than the area. To further speed up the chamfer matching process, Borgefors proposed a hierarchical chamfer matching algorithm. The matching was done at different resolutions, from coarse to fine. Li and Ma showed that the geometric moments method (region-based) and the Fourier descriptor (boundary-based) are related by a simple linear transformation. Babu et al. compared the
performance of boundary-based representations (chain code, Fourier descriptor, UNL Fourier
descriptor), region-based representations (moment invariants, Zernike moments, pseudo-Zernike moments), and combined representations (moment invariants
and Fourier descriptor, moment invariants and UNL Fourier descriptor). Their experiments
showed that the combined representations outperformed the simple representations. In addition to
2D shape representations, there were many methods developed for 3D shape representations. Wallace and Wintz presented a technique for normalizing Fourier descriptors which retained all shape information and was computationally efficient. They also took advantage of an interpolation property of the Fourier descriptor which resulted in an efficient representation of 3D shapes. Wallace and Mitchell proposed using a hybrid structural/statistical local shape analysis algorithm for 3D
shape representation. Further, Taubin proposed using a set of algebraic moment invariants to
represent both 2D and 3D shapes which greatly reduced the computation required for shape
matching [5].
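As a rough illustration of the boundary-based Fourier descriptor idea (not the specific modified descriptor of Rui et al.), the sketch below turns an ordered contour into complex samples, takes the FFT, and normalizes the coefficients so that the descriptor is insensitive to translation, scale, rotation, and starting point; this is one common recipe, assumed here for illustration.

```python
import numpy as np

def fourier_descriptor(contour_xy, n_coeffs=16):
    """Boundary-based shape feature: FFT of the contour treated as complex samples.
    Dropping the DC term removes translation; dividing by the first coefficient's
    magnitude removes scale; keeping only magnitudes removes rotation and the
    dependence on the starting point."""
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]
    coeffs = np.fft.fft(z)
    coeffs = coeffs[1:n_coeffs + 1]                    # discard the DC component
    return np.abs(coeffs) / (np.abs(coeffs[0]) + 1e-12)
```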
4. Segmentation
Segmentation is very important to image retrieval. Both the shape feature and the layout feature
depend on good segmentation. In this subsection we will describe some existing segmentation
techniques used in both computer vision and image retrieval. Lybanon et al. investigated a
morphological operation (opening and closing) approach in image segmentation. They tested their
approach in various types of images, including optical astronomical images, infrared ocean images,
and magnetograms. While this approach was effective in dealing with the above scientific image
types, its performance needs to be further evaluated for more complex natural scene images. Hansen and Higgins exploited the individual strengths of watershed analysis and relaxation labeling. Since fast algorithms exist for the watershed method, they first used the watershed to subdivide an image into catchment basins. They then used relaxation labeling to refine and update
the classification of catchment basins initially obtained from the watershed to take advantage of
the robustness of relaxation labeling to noise. Li et al. proposed a fuzzy entropy-based
segmentation approach. This approach is based on the fact that local entropy maxima correspond
to the uncertainties among various regions in the image. This approach was very effective for
images whose histograms do not have clear peaks and valleys. Other segmentation techniques
based on Delaunay triangulation, fractals, and edge flow can be found in the literature. All the above-mentioned algorithms are automatic. A major advantage of this type of segmentation algorithm is that it can
extract boundaries from a large number of images without occupying human time and effort.
However, in an unconstrained domain, for nonpreconditioned images, the automatic segmentation
is not always reliable. What an algorithm can segment in this case is only regions, but not objects.
To obtain high-level objects, which is desirable in image retrieval, human assistance is needed. Samadani and Han proposed a computer-assisted boundary extraction approach, which combined manual inputs from the user with the image edges generated by the computer. Daneels et al.
developed an improved method of active contours. Based on the user’s input, the algorithm first
used a greedy procedure to provide fast initial convergence. Second, the outline was refined by
using dynamic programming. Rui et al. proposed a segmentation algorithm based on clustering
and grouping in spatial–color–texture space. The user defines where the attractor (object of
interest) is, and the algorithm groups regions into meaningful objects. One last comment worth
mentioning in segmentation is that the requirements of segmentation accuracy are quite different
for shape features and layout features. For the former, accurate segmentation is highly desirable
while for the latter, a coarse segmentation may suffice [6].
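To illustrate the watershed idea mentioned above (and not the specific Hansen and Higgins method, whose relaxation-labeling stage is omitted), the sketch below uses scikit-image: a Sobel gradient magnitude serves as the elevation map, and two hard-coded intensity thresholds provide the markers; both thresholds are assumptions chosen for illustration only.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import sobel
from skimage.segmentation import watershed

def watershed_regions(gray, low=30, high=150):
    """Marker-based watershed segmentation of a grayscale image."""
    elevation = sobel(gray.astype(float))   # gradient magnitude as elevation map
    markers = np.zeros(gray.shape, dtype=int)
    markers[gray < low] = 1                 # assumed background marker
    markers[gray > high] = 2                # assumed object marker
    labels = watershed(elevation, markers)  # catchment basins grown from markers
    return ndi.label(labels == 2)[0]        # connected components of object regions
```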
Image texture has been proven to be a powerful feature for retrieval and classification of images.
In fact, an important number of real world objects have distinctive textures. These objects range
from natural scenes such as clouds, water, and trees, to manmade objects such as bricks, fabrics,
and buildings. During the last three decades, a large number of approaches have been devised for
describing, classifying and retrieving texture images. Some of the proposed approaches work in
the image space itself. Under this category, we find those methods using edge density, edge
histograms, or co-occurrence matrices. Most of the recent approaches extract texture features from
transformed image space. The most common transforms are Fourier wavelet and Gabor
transforms. One recent technique makes use of the local distribution of the edge points to characterize the texture of an image. The description is represented by a 2-D array of LBP-like codes, called the LBEP image, from which two histograms are derived to constitute the feature vectors of the texture [8].
This study considers some of the state-of-the-art texture analysis methods recently described in the literature. This includes methods working in a transformed space (such as wavelet,
Gabor or Fourier spaces) and some methods working in the image space itself, such as edge
histogram- and Local Binary Pattern-based methods. All these techniques have been reported to
produce very good results.
Methods Working in Pixel Space
Edge information is considered one of the most
fundamental texture primitives [29]. This information is used in different forms to describe texture
images. Edge histogram (also known as gradient vector) is among the most popular of these forms.
A gradient operator (such as Sobel operator) is applied to the image to obtain gradient magnitude
and gradient direction images. From these two images a histogram of gradient directions is
constructed. It records the gradient magnitude of the image edges at various directions.
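A minimal sketch of such an edge/gradient histogram, using SciPy's Sobel filters, might look as follows; the number of orientation bins and the magnitude weighting are common choices assumed here, not requirements of the cited method.

```python
import numpy as np
from scipy import ndimage as ndi

def edge_histogram(gray, n_bins=8):
    """Histogram of gradient directions, weighted by gradient magnitude,
    computed from Sobel responses."""
    gx = ndi.sobel(gray.astype(float), axis=1)   # horizontal gradient
    gy = ndi.sobel(gray.astype(float), axis=0)   # vertical gradient
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)                # angles in [-pi, pi]
    hist, _ = np.histogram(direction, bins=n_bins, range=(-np.pi, np.pi),
                           weights=magnitude)
    return hist / (hist.sum() + 1e-12)            # normalized feature vector
```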
The LBP-based approach was first introduced by Ojala et al. in 1996. It uses an operator called Local
Binary Pattern (LBP in short), characterized by its simplicity, accuracy and invariance to
monotonic changes in gray scale caused by illumination variations. Several extensions of the
original LBP-based texture analysis method were proposed since then, such as a rotation and
scaling invariant method and a multi-resolution method. In its original form, LBP operator assigns
to each image pixel the decimal value of a binary string that describes the local pattern around the
pixel [9].
Figure 6.1 illustrates how LBP code is calculated.
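In the same spirit as Figure 6.1, the sketch below computes the basic 8-neighbour LBP code of a single 3x3 patch. The clockwise neighbour ordering is one common convention (other orderings are equally valid), and scikit-image's local_binary_pattern offers a ready-made, more general implementation.

```python
import numpy as np

def lbp_code(patch):
    """Basic 8-neighbour LBP code of the centre pixel of a 3x3 patch: threshold the
    neighbours against the centre and read the resulting bit string as a decimal."""
    center = patch[1, 1]
    # Clockwise neighbour order starting at the top-left corner (a common convention).
    rows = [0, 0, 0, 1, 2, 2, 2, 1]
    cols = [0, 1, 2, 2, 2, 1, 0, 0]
    bits = (patch[rows, cols] >= center).astype(np.uint8)
    return int((bits * (1 << np.arange(8))).sum())

# Example: LBP code of a single patch.
print(lbp_code(np.array([[6, 5, 2],
                         [7, 6, 1],
                         [9, 8, 7]])))
```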
Three well-known edge detection techniques, Sobel, Canny, and Laplacian of Gaussian (LoG), were tested. Edge detection using the Sobel operator is the fastest among the three techniques but is also the most sensitive to noise, which leads to a much deteriorated accuracy for the retrieval process. The Canny algorithm produces a better characterization of the edges but is relatively slow, which noticeably affects the speed of the overall retrieval process. LoG is chosen as it produces a good trade-off between execution time and retrieval accuracy [9].
This operation applies an LBP-like coding to the edge image E. Various LBEP masks have been tested: an 8-neighbour mask, a 12-neighbour mask, and a 24-neighbour mask, as shown in Figure 6.3. The use of the 24-neighbour mask noticeably slows down the retrieval process (mainly at the level of histogram calculation) without a significant improvement in accuracy. Further investigation showed that the 12-neighbour mask leads to better retrieval results. Figure 6.3 shows the 8- and 12-neighbourhood masks that have been considered.
In medicine to date, virtually all Picture Archiving and Communications Systems (PACS) retrieve
images simply by indices based on patient name, technique, or some observer-coded text of
diagnostic findings. Fields of text tags, such as patient demographics, diagnostic codes (ICD-9,
American College of Radiology diagnostic codes), image view-plane (sagittal, coronal, etc.) and
so on usually are the first handles on this process. This textual approach, however, fails to fully
account for quantitative and shape relationships of medically relevant structures within an image
that are visible to a trained observer but not codable in conventional database terms. Suitable
database structures addressing the visual/spatial properties of medical images and more effective
techniques to deal with different types of knowledge are necessary. Here, we consider a
CBIR system for normal anatomical regions (heart and great vessels, liver, renal and splenic
parenchyma, and backbone) present in CT studies of the chest and abdomen. The proposed system
automatically extracts co-occurrence texture features both at the global (organ) level and local
(pixel) level, and then uses these features to measure the similarity between various organ images
of a CT organ image database. One of the major challenges in building such a system is to
determine the best similarity metric to be used in the context of texture features for CT image
databases. In our approach, we will investigate the effectiveness of several metrics in performing
similarity retrieval based on both pixel- and global-based co-occurrence texture features [1].
Background
General CBIR systems that extract automatically low-level image features from pixel data have
been intensively explored by several researchers (Stan et al., Niblack et al., Mehrotra et al., Pentland et al.); although anatomic information rests on visual appearances, which makes it a natural feature to use in retrieval, there has been little work done to build medical CBIR systems. Glatard et al.
introduced a CBIR system which uses Gabor filters extracted from segmented cardiac Magnetic
Resonance Imaging (MRI)’s to perform clinically relevant queries on large image databases that
do not require user supervision. Brodley et al. introduced a CBIR system for the retrieval of CT
lung images; their proposed system uses several features (such as co-occurrence texture features,
Fourier descriptors and moments), and relies on expert interaction with the system in addition to
various machine learning and computer vision techniques. Wei et al. proposed a CBIR system for
the mammography imaging modality using the co-occurrence texture signatures as global features.
In our approach, we use both global and local-level co-occurrence texture features to retrieve
normal anatomical regions produced by Computed Tomography imaging modality. Since there is
no similarity measure known to perform the best for the CT modality, we compare eight similarity
measures and show how the selection of a similarity measure affects the retrieval precision [4].
Figure 7.2
Texture Features
In medical image processing, texture is especially important, because it is difficult to classify
human body organ tissues using shape or gray level information. Several methods have been
applied towards the analysis and characterization of texture within medical images including
fractal dimension, run-length encoding, discrete wavelet transform, and co-occurrence matrices.
In our current implementation, we use the Haralick co-occurrence texture model and its texture
descriptors that capture the spatial dependence of gray-level values and texture structures within
an image. We are using the following ten descriptors: entropy, energy (angular second moment),
contrast, homogeneity, sum mean, variance, correlation, maximum probability, inverse difference
moment and cluster tendency. These descriptors are calculated at both local (pixel) and global-
level depending on the similarity measures to be used and the fundamental structures present in
the images, as shown in Figure 7.2. To compute global-level features, the normalized co-occurrence
matrices are calculated in four directions and five displacements generating twenty matrices per
segmented image. The ten Haralick features are calculated for each of the twenty matrices and
then, the twenty values are averaged and recorded as a mean feature vector for the corresponding
segmented image. To compute pixel-level features, a 5-by-5 neighborhood is considered for each
pixel within the segmented region, generating one co-occurrence matrix per 5-by-5 neighborhood
region. While co-occurrence matrices are normally defined for a fixed distance and direction when
calculated at the global level, for the pixel-level approach, since the neighborhood of the pixel is
small, we do not calculate the co-occurrence along fixed directions and displacements, but instead
consider all the pixel pairs within that neighborhood. Thus, our implementation produces a single
co-occurrence matrix for each pixel rather than for each choice of distance and direction. Then,
for each co-occurrence matrix (each pixel), we calculate ten Haralick features which can be related
to specific characteristics in the image. Since the gray-levels for our images range from 0 to 4096,
for reasons of computational efficiency, the number of gray levels can be reduced if one chooses
to bin them, thus reducing the size of the co-occurrence matrix. In our approach, before calculating
the matrices, we applied a linear binning such that the range [0, 4096] was mapped to the range
[0,256]. From the pixel-level data, we derived
1) The means vector-based data consists of the average of the normalized pixel-level data for
each region such that the texture representation of that corresponding region is a vector instead
of a set of vectors given by the pixels’ vector representation within that region.
2) The binned-histogram data represents each region by histograms obtained by binning the normalized pixel-level feature values.
3) The texture signatures are clustered representations of the normalized local-level data obtained
using a k-d tree clustering algorithm. The k-d tree algorithm iteratively divides the data space using
predefined stopping criteria. In our approach, we implemented two stopping criteria: the first
criterion was to establish a minimum variance within the subset to be divided to prevent creating
redundant clusters and over-splitting; the second stopping criterion was used to enforce a minimum
cluster size as a percentage of the original data set. This was done to maintain significant size
within the clusters and to prevent outliers from uncontrollably growing the tree. Multiple
signatures were developed by varying both the variance and minimum cluster size and used for
the directed Hausdorff distance calculation and retrieval evaluation.
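The pixel-level scheme described above (one co-occurrence matrix per pixel, built from all pixel pairs in a 5-by-5 neighbourhood after linearly binning the [0, 4096] range down to 256 gray levels) can be sketched as follows. This is an illustrative reimplementation under those stated assumptions, not the authors' code.

```python
import numpy as np

def bin_gray_levels(image_16bit):
    """Linear binning of the [0, 4096] intensity range down to [0, 255]."""
    return (image_16bit.astype(np.float64) * 255.0 / 4096.0).astype(np.uint8)

def pixel_cooccurrence(window, levels=256):
    """One co-occurrence matrix for a single pixel's 5x5 neighbourhood, built from
    all unordered pixel pairs in the window (no fixed distance or direction)."""
    m = np.zeros((levels, levels), dtype=np.float64)
    vals = window.ravel()
    for i in range(len(vals)):
        for j in range(i + 1, len(vals)):
            m[vals[i], vals[j]] += 1.0
            m[vals[j], vals[i]] += 1.0      # keep the matrix symmetric
    return m / m.sum()                       # normalize to joint probabilities
```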
Similarity Measures
Similarity metrics are measures that describe how similar two images are (organs in our case). We
implement eight similarity measures as follows:
1) Euclidean distance,
2) Minkowski 1-distance (city block distance or L1 norm),
3) Chi-square (χ²) statistics (used to distinguish whether the distributions of the descriptors differ
from each other),
4) weighted mean-variance (WMV; uses the means and standard deviations for each of the
considered features),
5) Jeffrey-divergence (used to compute the distance between class distributions of two values of
the same feature),
6) Cramer-von Mises distance (similar to the squared Euclidean distance but calculated between
the cumulative distributions),
7) Kolmogorov-Smirnov distance (defined as the maximal discrepancy between the cumulative
distributions; used for unbinned data distributions and invariant to arbitrary monotonic transformations), and
8) Hausdorff distance (used on texture signatures). The mathematical definitions of these metrics can be found in the cited literature.
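A few of these measures are easy to sketch for two feature vectors or histograms a and b (pure NumPy; the epsilon guard in the chi-square statistic is an implementation convenience, not part of the formal definition):

```python
import numpy as np

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def city_block(a, b):
    """Minkowski 1-distance (city block distance / L1 norm)."""
    return float(np.abs(a - b).sum())

def chi_square(a, b, eps=1e-12):
    """Chi-square statistic between two histograms (eps avoids division by zero)."""
    return float((((a - b) ** 2) / (a + b + eps)).sum())

def kolmogorov_smirnov(a, b):
    """Maximal discrepancy between the cumulative distributions of a and b."""
    return float(np.abs(np.cumsum(a) - np.cumsum(b)).max())
```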
Evaluation of results
We evaluate the system’s similarity retrieval results using precision as a performance metric. The
precision is calculated as the number of relevant retrieved images divided by the total number of
retrieved images returned for the query. A retrieved image is considered to be relevant if it belongs to
to the same anatomical region as the query image. In our current implementation, we look only at
the best five retrieval results and evaluate the eight similarity measures with respect to these top
similarities. Since several similarity measures are used for both pixel-level and global-level data,
we need to compare them and find out which similarity measures work best for pixel-level and global-level data. The best similarity metric result for pixel-level data would also be compared with the best result from global-level data. At the pixel level, we compared the best similarity metric for the means vector-based representation with the best ones for the binned-histogram and texture-signature representations [8].
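A minimal sketch of this precision-at-5 evaluation (with hypothetical label lists; not the authors' evaluation code) could be:

```python
def precision_at_k(retrieved_regions, query_region, k=5):
    """Fraction of the top-k retrieved images whose anatomical region matches
    the region of the query image."""
    top = retrieved_regions[:k]
    return sum(1 for region in top if region == query_region) / float(len(top))

# Hypothetical example: three of the top five results share the query's region.
print(precision_at_k(["liver", "liver", "heart", "liver", "spleen"], "liver"))  # 0.6
```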
Figure 7.3: Precision at the global level when each image from the database becomes a query image.
Figure 7.4
[1] E. H. Adelson and J. R. Bergen. Spatiotemporal energy models for the perception of motion. JOSA A, 2(2):284–299, 1985.
[2] J. R. Bach, C. Fuller, A. Gupta, A. Hampapur, B. Horowitz, R. Humphrey, R. Jain, and C. Shu.
Virage image search engine: an open framework for image management. In SPIE Conf. on Storage
and Retrieval for Image and Video Databases IV, 2670: 76–87, 1996.
[3] S. Belongie, C. Carson, H. Greenspan, and J. Malik. Color- and texture-based image segmentation using EM and its application to content-based image retrieval. In ICCV, pages 675–682, 1998.
[4] A. C. Bovik, M. Clark, and W. S. Geisler. Multichannel texture analysis using localized spatial
filters. IEEE PAMI, 12(12):55–73, 1990.
[5] P. Brodatz. Textures: A Photographic Album for Artists and Designers. Dover, New York, NY,
1966.
[6] J. D. Daugman. Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression. IEEE ASSP, 36:1169–1179, 1988.
[7] F. Farrokhnia and A. K. Jain. A multi-channel filtering approach to texture segmentation. In CVPR, pages 364–370, 1991.
[8] M. Henning, M. Nicolas, and B. David, et al. "A review of content based image retrieval systems in medical applications - clinical benefits and future directions," International Journal of Medical Informatics, vol. 73, no. 1, pp. 1-21, 2004.
[9] Asad Ali, Saeed Murtaza, Aamir Saeed Malik. "Content Based Image Retrieval Using Daubechies Wavelet Transform," Proceedings of the 2nd National Workshop on Trends in Information Technology (NWTIT), pp. 110-115, 2003.