Ch-7-Multimedia Data Retrieval and Management
Multimedia Systems
MM data retrieval
Features are a smarter way to represent MM data
content than their original format
e.g., color and texture for an image
Today we focus on the models that are most suitable for representing, interpreting, describing and comparing such features
E.g., color histograms for images by using the Euclidean distance as
similarity measure
Content-based search
First approach to search for MM objects relies on standard text-based
techniques, provided objects come with a precise textual description
of what they represent/describe, i.e., of their semantics
However, the “annotation” of MM objects is a subjective, time-consuming, and tedious process (completely manual!)
A more convenient approach, suitable for managing large DBs, is to automatically extract from MM objects a set of relevant (low-level) numerical features that, at least partially, convey some of the semantics of the objects
Clearly, which features are the “best” to extract depends on the specific medium and on the application at hand (i.e., what we are looking for)
The general scenario
In general, we have a two-level scenario:
[Figure: the two-level scenario]
The reference architecture
For the text-based approach, the image querying problem can be simply transformed into a traditional information retrieval (IR) problem
as we saw for textual document retrieval, and
as we will see when speaking about “MM data annotation”…
For content-based information retrieval (CBIR), more sophisticated query evaluation techniques are required
[Figure: reference architecture; components: GUI (optional), query image, feature extraction, image segmentation, query processor, query engine, index, image DB, feature DB; flows: query, results, visualize]
Variables of the CBIR problem
How the set of relevant results is determined depends:
on which low-level features are used to characterize the MM data content
on the similarity criterion (distance function) used to compare such features
on how DB objects are ranked with respect to the query
on whether the user is interested in the whole MM data query or only in a part of it
CBIR problem
Simplest case:
each MM data object (i.e., an image) is characterized using global low-level features, and the result of a query consists of the set of DB objects that “best match” the visual characteristics of the target object, according to a predefined similarity criterion, which is in turn based on such features
This is also known as the Nearest Neighbor (NN) search problem
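The NN search described above can be sketched as a brute-force scan of the feature DB. This is only an illustration under stated assumptions: the object ids, the 3-D feature vectors and the function names `euclidean` and `knn_search` are hypothetical, and a real system would typically use an index rather than a full scan.

```python
import heapq
import math

def euclidean(a, b):
    """L2 distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_search(query, database, k=1, distance=euclidean):
    """Return the k (distance, object_id) pairs closest to the query.

    `database` maps a (hypothetical) object id to its global feature vector.
    Every object is compared with the query: a sequential scan.
    """
    scored = ((distance(query, feats), obj_id) for obj_id, feats in database.items())
    return heapq.nsmallest(k, scored)

# Toy feature DB with made-up 3-D feature vectors (illustration only)
db = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.1, 0.8, 0.1],
    "img_c": [0.8, 0.2, 0.0],
}
result = knn_search([1.0, 0.0, 0.0], db, k=2)
print([obj_id for _, obj_id in result])
```

Swapping the `distance` argument changes the similarity criterion without touching the search itself.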
Representing color
In a digital image, the color space that encodes the color
content of each pixel of the image is necessarily discretized
This depends on how many bits per pixel (bpp) are used
Example:
if one represents images in the RGB space by using 8 × 3 = 24 bpp,
the number of possible distinct colors is 2^24 = 16,777,216
With 8 bits per channel, we have 256 possible values on each
channel
Although discrete, the possible color values are still too many if
one wants to compactly represent the color content of an image
Reducing the number of colors also aims at achieving some robustness in the matching process
(e.g., the two RGB values (123,078,226) and (121,080,230) are almost
indistinguishable)
[Figure: color histogram with D = 64 bins]
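As an illustration of how a compact D = 64 representation can be obtained, the sketch below quantizes each RGB channel uniformly into 4 levels (4 × 4 × 4 = 64 bins) and counts pixels per bin. Uniform quantization and the function names `bin_index` and `color_histogram` are assumptions made for this example; the slides do not state which quantization scheme is used.

```python
def bin_index(pixel, levels_per_channel=4):
    """Map an (r, g, b) pixel with 8 bits per channel to a histogram bin."""
    step = 256 // levels_per_channel          # width of each quantization interval
    r, g, b = (c // step for c in pixel)
    return (r * levels_per_channel + g) * levels_per_channel + b

def color_histogram(pixels, levels_per_channel=4):
    """Normalized color histogram with levels_per_channel**3 bins."""
    hist = [0.0] * (levels_per_channel ** 3)
    for pixel in pixels:
        hist[bin_index(pixel, levels_per_channel)] += 1.0
    total = len(pixels)
    return [count / total for count in hist] if total else hist

# The two nearly indistinguishable colors from the text fall in the same bin
same_bin = bin_index((123, 78, 226)) == bin_index((121, 80, 230))
hist = color_histogram([(123, 78, 226), (121, 80, 230)])
print(len(hist), same_bin)
```

With this quantization, small differences between RGB values usually do not change the histogram, although two colors that straddle a bin boundary still end up in different bins.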
Further examples
Two D=64 color histograms
Comparing color histograms
Since histograms are vectors, we can use any Lp-norm to measure the
distance (dissimilarity) of two color histograms
However, doing so we are not taking into account colors’ correlation
Depending on the query and the dataset, we might therefore obtain low-quality
results
Weighted Lp-norms and relevance feedback can partially alleviate the
problem…
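A minimal sketch of the (optionally weighted) Lp-norm on histograms follows. The function name `lp_distance`, the toy 4-bin histograms and the weight values are assumptions for illustration only.

```python
def lp_distance(h, q, p=2.0, weights=None):
    """(Weighted) Lp distance between two histograms with the same number of bins."""
    if len(h) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if weights is None:
        weights = [1.0] * len(h)              # plain Lp-norm
    return sum(w * abs(a - b) ** p for w, a, b in zip(weights, h, q)) ** (1.0 / p)

# Toy 4-bin histograms (made-up values)
h = [0.5, 0.5, 0.0, 0.0]
q = [0.5, 0.0, 0.5, 0.0]
d1 = lp_distance(h, q, p=1)                                # Manhattan distance
d2 = lp_distance(h, q, p=2)                                # Euclidean distance
dw = lp_distance(h, q, p=2, weights=[1.0, 4.0, 1.0, 1.0])  # bin 1 counts more
print(d1, d2, dw)
```

Each bin is compared only with the bin of the same index, so the result does not depend on how perceptually similar the colors of bins 1 and 2 are; this is the correlation issue mentioned above.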
Sample queries based on color (1)
[Figure: query image and results retrieved using Euclidean distance on 32-D HSV histograms]
Sample queries based on color (2)
[Figure: query image and results retrieved using Euclidean distance on 32-D HSV histograms]
Quadratic distance
Consider two histograms h and q, both with D bins
Their quadratic distance is defined as:
$$L_A(h, q; A) = \sqrt{\sum_{i=1}^{D} \sum_{j=1}^{D} a_{i,j}\,(h_i - q_i)(h_j - q_j)} = \sqrt{(h - q)^T A\,(h - q)}$$
where A = {a_{i,j}} is called the (color-)similarity matrix
The value of a_{i,j} is the “similarity” of the i-th and the j-th colors (a_{i,i} = 1)
Note that:
when A is a diagonal matrix we are back to the weighted Euclidean distance,
when A = I (the identity matrix) we obtain the L2 distance
In order to guarantee that L_A is indeed a distance (L_A(h,q;A) ≥ 0 ∀ h,q), it
is sufficient that A is a symmetric positive definite matrix
Quadratic distance vs. Euclidean distance
As a simple example, let D = 3, with colors red, orange, and blue
Consider 3 pure-color images and the corresponding histograms:
h1 = (1,0,0), h2 = (0,1,0), h3 = (0,0,1)
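The red/orange/blue example can be worked through with a short sketch. The similarity matrix below is an assumption chosen only for illustration (red and orange are given similarity 0.8, blue is unrelated to both); it is symmetric positive definite, and the function name `quadratic_distance` is hypothetical.

```python
import math

def quadratic_distance(h, q, A):
    """sqrt((h - q)^T A (h - q)) for histograms h, q and similarity matrix A."""
    diff = [a - b for a, b in zip(h, q)]
    n = len(diff)
    total = sum(A[i][j] * diff[i] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(total)

h1, h2, h3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)   # red, orange, blue

identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

# Assumed color-similarity matrix: red ~ orange (0.8), blue unrelated
A = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

# With A = I (Euclidean distance) all three images are equally far apart
print(quadratic_distance(h1, h2, identity), quadratic_distance(h1, h3, identity))
# With the similarity matrix, red is closer to orange than to blue
print(quadratic_distance(h1, h2, A), quadratic_distance(h1, h3, A))
```

Under Euclidean distance every pair of pure-color images is at distance sqrt(2), whereas the quadratic distance with this A brings red and orange closer together (about 0.63) while leaving blue at sqrt(2) from both.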
Representing texture (1)
Texture provides information about the uniformity, granularity
and regularity of the image surface
Representing texture (2)
Tamura features correspond to properties of a texture which
are readily perceived, that is coarseness, contrast and
directionality
(3-D feature vector)
Coarseness - coarse vs. fine
Contrast - high vs. low contrast
Directionality - directional vs. non-directional
Texture extraction with Gabor filters
A Gabor filter is a Gaussian modulated by a sinusoid, which
can reveal the presence of a pattern along a certain direction
and at a certain scale (frequency)
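To make the "Gaussian modulated by a sinusoid" idea concrete, the sketch below samples the real part of a Gabor filter on a small grid and applies it to a striped patch. The parameterization (`sigma`, `theta`, `wavelength`, `gamma`, `phase`) is one common convention and is an assumption here, as are the function names; the exact formula used in the slides is not recoverable.

```python
import math

def gabor_kernel(size, sigma, theta, wavelength, gamma=1.0, phase=0.0):
    """Real part of a Gabor filter sampled on a size x size grid (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid oscillates along direction theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + phase)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def response(patch, kernel):
    """Filter response at the patch center (element-wise product, summed)."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))

# Synthetic 9x9 patch whose intensity oscillates along x with period 4 pixels
half = 4
stripes = [[math.cos(2 * math.pi * x / 4.0) for x in range(-half, half + 1)]
           for _ in range(-half, half + 1)]

r_0 = response(stripes, gabor_kernel(9, 2.0, 0.0, 4.0))            # same direction
r_90 = response(stripes, gabor_kernel(9, 2.0, math.pi / 2, 4.0))   # orthogonal
print(r_0, r_90)
```

The filter tuned to the direction and scale of the pattern gives a much larger response than the orthogonal one; collecting such responses over several directions and scales is one way to build a texture feature vector.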
Representing shape
Once one has succeeded in extracting an object’s contour,
the next step is how to represent/encode it
A common approach is to navigate the contour, which leads
to an ordering of the pixels in the contour:
{ (x(t), y(t)) : t = 1, …, M }
Comparing shapes
The most common way to measure the (dis-)similarity of two shape vectors of equal length D is based on Euclidean distance (L2)
However, with Euclidean distance we have to face a basic problem: sensitivity to “alignment of values”
A distance that is robust to such misalignments exists, and is called “Dynamic Time Warping” (DTW)
In order to understand how it works, we first need to introduce some important concepts related to time series…
How to measure similarity between time series
Given two time series of equal length D, the most common way to measure their (dis-)similarity is based on Euclidean distance
However, with Euclidean distance we have to face two basic problems
1. High-dimensionality: (very) large D values
2. Sensitivity to “alignment of values”
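Since the DFT is a linear transformation, we have:
$$L_2(s,q)^2 = \sum_{t=0}^{D-1} |s_t - q_t|^2 = E(s - q) = E(S - Q) = \sum_{f=0}^{D-1} |S_f - Q_f|^2 = L_2(S,Q)^2$$
where S and Q are the DFTs of s and q, and E(·) denotes the energy of a signal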
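The distance-preservation property can be checked numerically. The sketch below assumes the unitary DFT (normalization factor 1/sqrt(D)), which is the convention under which the squared distances coincide exactly; the function names `dft` and `sq_l2` and the choice of keeping k = 2 coefficients are hypothetical, and the two short series are used only as sample data.

```python
import cmath
import math

def dft(x):
    """Unitary DFT: X_f = (1/sqrt(D)) * sum_t x_t * exp(-2*pi*i*f*t/D)."""
    D = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / D) for t in range(D)) / math.sqrt(D)
            for f in range(D)]

def sq_l2(a, b):
    """Squared Euclidean distance; works for real and complex sequences."""
    return sum(abs(u - v) ** 2 for u, v in zip(a, b))

s = [1, 2, 5, 4, 3, 7]
q = [2, 3, 2, 1, 3, 4]
S, Q = dft(s), dft(q)

time_dist = sq_l2(s, q)            # squared distance in the time domain
freq_dist = sq_l2(S, Q)            # squared distance in the frequency domain
k = 2                              # keep only the first k coefficients (assumed value)
partial = sq_l2(S[:k], Q[:k])      # never exceeds the full squared distance
print(time_dist, round(freq_dist, 6), round(partial, 6))
```

Because the truncated sum drops only non-negative terms, the distance computed on the first few coefficients is a lower bound of the true distance, which is what makes working with a few coefficients attractive when D is large.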
An example: EEG data
Sampling rate: 128 Hz
Another example
[Figure: time series s (128 points) and s’, the approximation of s with 4 Fourier coefficients; time axis from 0 to 140]
data values   Fourier coefficients   First 4 Fourier coefficients
0.4995        1.5698                 1.5698
0.5264        1.0485                 1.0485
0.5523        0.7160                 0.7160
0.5761        0.8406                 0.8406
0.5973        0.3709                 0.3709
0.6153        0.4670                 0.4670
0.6301        0.2667                 0.2667
0.6420        0.1928                 0.1928
0.6515        0.1635
0.6596        0.1602
0.6672        0.0992
0.6751        0.1282
0.6843        0.1438
0.6954        0.1416
0.7086        0.1400
0.7240        0.1412
0.7412        0.1530
0.7595        0.0795
0.7780        0.1013
0.7956        0.1150
0.8115        0.1801
0.8247        0.1082
0.8345        0.0812
0.8407        0.0347
0.8431        0.0052
0.8423        0.0017
0.8387        0.0002
…             ...
The “alignment” problem
Euclidean distance, as well as other Lp-norms, is not robust w.r.t. even
small contractions/expansions of the signal along the time axis
E.g., speech signals
Intuitively, we would need a distance measure that is able to “match” a
point of time series s even with “surrounding” points of time series q
Alternatively, we may view the time axis as a “stretchable” one
A distance like this exists, and is called “Dynamic Time Warping” (DTW)!
How to compute the DTW (1)
Assume that the two time series s and q have the same length D
Note that with DTW this is not necessary anymore!
Construct a D×D matrix d, whose element d_{i,j} is the distance
between s_i and q_j
We take d_{i,j} = (s_i - q_j)^2, but other possibilities exist (e.g., |s_i - q_j|)
Example with D = 6:
t   0  1  2  3  4  5
s   1  2  5  4  3  7
q   2  3  2  1  3  4
L2(s,q)^2 = 29
Matrix d (rows: values of s, from bottom to top; columns: values of q, from left to right):
s = 7 | 25 16 25 36 16  9
s = 3 |  1  0  1  4  0  1
s = 4 |  4  1  4  9  1  0
s = 5 |  9  4  9 16  4  1
s = 2 |  0  1  0  1  1  4
s = 1 |  1  4  1  0  4  9
q =   |  2  3  2  1  3  4
The “rules of the game”:
Start from (0,0) and end in (D-1,D-1)
Take one step at a time
At each step, move only by increasing i, j, or both
I.e., never go back!
“Jumps” are not allowed!
Sum all distances you have found in the “warping path”
How to compute the DTW (2)
The figure shows a possible warping path w, whose “cost” is 21
The “Euclidean path” moves only along the main diagonal, and costs 29
[Figure: the matrix d with the warping path w highlighted]
The DTW is the minimum cost among all the warping paths
How to compute the DTW (3)
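From the d matrix, incrementally build a new matrix WP, whose elements
wp_{i,j} are recursively defined as:
$$wp_{i,j} = d_{i,j} + \min\{wp_{i-1,j-1},\; wp_{i-1,j},\; wp_{i,j-1}\}$$
with wp_{0,0} = d_{0,0}, and ignoring terms with a negative index
(rows: values of s, from bottom to top; columns: values of q, from left to right)
s     | d                   | WP
7     | 25 16 25 36 16  9   | 40 22 31 43 24 15
3     |  1  0  1  4  0  1   | 15  6  7 11  8  6
4     |  4  1  4  9  1  0   | 14  6  9 18  8  5
5     |  9  4  9 16  4  1   | 10  5 11 18  7  5
2     |  0  1  0  1  1  4   |  1  2  2  3  4  8
1     |  1  4  1  0  4  9   |  1  5  6  6 10 19
q     |  2  3  2  1  3  4   |  2  3  2  1  3  4
Then set d_DTW(s,q) = wp_{D-1,D-1} (here, 15)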
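The construction of WP can be sketched with a small dynamic-programming routine, run here on the two series of the example. The function name `dtw` and its signature are assumptions; the local cost is the squared difference used in the slides, and the routine also accepts series of different lengths.

```python
def dtw(s, q, local_cost=lambda a, b: (a - b) ** 2):
    """Return (DTW cost, WP matrix) for two sequences s and q.

    wp[i][j] is the minimum cost of a warping path from (0, 0) to (i, j),
    where each step increases i, j, or both by exactly one.
    """
    n, m = len(s), len(q)
    inf = float("inf")
    wp = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = local_cost(s[i], q[j])
            if i == 0 and j == 0:
                wp[i][j] = d
                continue
            best_prev = min(
                wp[i - 1][j - 1] if i > 0 and j > 0 else inf,  # diagonal step
                wp[i - 1][j] if i > 0 else inf,                # step along s
                wp[i][j - 1] if j > 0 else inf,                # step along q
            )
            wp[i][j] = d + best_prev
    return wp[n - 1][m - 1], wp

s = [1, 2, 5, 4, 3, 7]
q = [2, 3, 2, 1, 3, 4]
cost, wp = dtw(s, q)
print(cost)          # 15, versus 29 for the "Euclidean path" along the diagonal
for row in reversed(wp):
    print(row)       # printed with s = 7 on top, as in the slide
```

The matrix is filled in O(D^2) time, which is far cheaper than enumerating all warping paths explicitly.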
Back to the shape context: sample queries [BCP05]
[Figure: query image and retrieved results over a DB of 1100 objects’ contours; R = relevant (same type of fish)]
This is not the whole story…
Of course, many other feature models (and corresponding distance functions) have been defined in the literature for MM data
Please refer to [SWS+00, LZL+07, DJL+08] for detailed pointers
This was just a way to provide some concrete examples of MM features and distance functions for comparing them!
Note that, besides “generic” features, any specific image domain/application needs to extract and manage specific features, which in general require much more sophisticated tools than the ones we have seen
E.g., face/fingerprint recognition
Nonetheless, what is important to stress is that the problem of how to search in large image DBs remains (almost) the same!