
Key-frame Detection in Raw Video Streams

Stanislava Budvytytė Armantas Ostreika


[email protected] [email protected]
Kaunas University of Technology, Department of Multimedia Engineering
Studentų St. 50, LT-51368 Kaunas, Lithuania

Abstract. First, this paper gives a review of condensed video representation techniques, in particular keyframe detection. Second, a histogram comparison method used for recognizing the environment is described. Third, a modification of the histogram comparison method for keyframe detection in raw video streams is proposed. The algorithm is tested on video records of car routes filmed while driving through a city.
Keywords: Condensed video representation, key frame detection, shot detection, video analysis, raw video streams.

1 Introduction

Video is a manifold medium and it requires a lot of storage space in both digital and analog formats. In the recent decade digital storage formats and devices have developed rapidly, and the price of storage space has decreased vastly. At the same time various digital devices with built-in video cameras have been created. As a result, the amount of video information is increasing enormously, and the demand for video information analysis and processing techniques is increasing with it.

In this paper we focus only on the analysis of the visual content of video: we analyse methods used for extracting frame features and for comparing frames. The methods can be grouped into two groups: shot boundary detection algorithms and key frame extraction methods. An overview of these methods is given in Section 2.

Even taking only the visual component of a video stream, one does not get a continuous information flow. Depending on the storage format, visual information is composed of a huge number of still images — key frames — and other supportive data describing their flow from one key frame to another. The key frames used for data storage and the key frames that are interesting for a user as a compact way to review video content are not the same. These differences are described in Section 3.1.

In Section 3.2 we introduce a method for representing raw video as a set of key frames. Our method is mainly based on the technique described in [10]. In that paper the authors described a method for histogram comparison and tested it indoors for recognizing environmental objects and their changes. We found it an interesting approach and applied it to key frame extraction in raw video streams filmed by route tracking equipment and body-mounted cameras.

Our experiments, results and other observations are described in Section 4. In Section 5 we conclude with a discussion and assumptions about future work.

2 Related Work

The amount of information stored in various video formats takes terabytes of space on hard disks and other storage devices, and it has a tendency to increase further. Problems of condensed video representation and of video storage and analysis have been researched from different perspectives for many years now. The problem of condensed video representation is closely related to other video and image analysis problems — shot detection and pattern matching. In the solution of the problem an important role is played by the heuristic methodology of human perception.

In [1] a review of different video shot detection and condensed representation methods is given. This paper is a good choice to begin research in the area. Shot boundary detection and condensed representation algorithms use different features and metrics extracted from video frames or from parts of frames called regions of interest (ROI) [1]. The authors name the possible choices for these video components to analyze:
• Features such as luminance/color, a luminance/color histogram, image edges, motion, or coefficients of different transformations, e.g. the discrete Fourier transform or the discrete cosine transform.

• Spatial feature domain — a single pixel, rectangular blocks, arbitrarily shaped blocks, or the whole frame.

• Feature similarity metric — a mathematical expression of the similarity or dissimilarity of the frames or ROIs.

• Temporal domain of the continuity metric — the simplest way is to compare two neighboring frames; an n-frame window uses all frames in a window and is the commonly used option; the interval since the last shot change computes statistics of changes from the last shot or from one of the last detected shots.

• Shot change detection methods such as static thresholding, adaptive thresholding, probabilistic detection, and trained classifiers are also used.

The next important aspect of shot detection is performance evaluation [1]. For this purpose standard technical characteristics are used: recall, precision, and accuracy.

The subproblem of condensed video representation has several specific techniques for video analysis. The first is highlighting — extraction of the frames that contain the most relevant information of a whole video or of a single shot; it is sometimes called key frame extraction, too. Hierarchical highlighting tries to extract "a hierarchy or a tree of highlighted frames, so that user can then interactively request more detail on the part on the video that interests him" [1]. The skimming technique tries to extract appropriate features from video segments instead of images.

A fuzzy video content representation technique is presented in [2]. The focus is on traditional video sequences made of a collection of different shots. So, firstly, video shot cut detection is applied to detect sequences of frames with similar visual content. The next step is color/motion segmentation of each frame in a single shot. It is performed using "the recursive shortest spanning tree (RSST) algorithm, called M-RSST" [2], which is adapted for both color and motion segmentation. The algorithm uses two parameters — an initial image resolution level to detect the boundary blocks of an image and an adaptive threshold for terminating the algorithm. "[. . . ] At each iteration, the Euclidean distance of color or motion intensities between two neighboring segments, weighted by the harmonic mean of their areas, is first calculated and then the distance histogram is created. The half of maximum histogram value is considered as the appropriate threshold for the given iteration. The segmentation is terminated if no segments are merged from one step to another." [2] As a result, for each frame color properties such as the size, location, and average color of the color components of the frame are calculated. In parallel, motion properties such as the size, location, and average motion vectors of the motion components of the frame are determined. All color/motion properties are classified into pre-determined classes so that each feature vector corresponds to a specific class. In each class a degree of membership is allocated. The authors call this allocation a fuzzy classification. It means that every segment can belong to any number of classes, but with a different degree of membership. Depending on this classification, a fuzzy multidimensional histogram is created for each frame. Finally, frames with similar content are discarded. This is done by a content-based sampling algorithm which calculates correlations between different frames.

A number of video analysis parameters that are useful both in shot cut and in key-frame detection are described in [9]. The paper describes an algorithm for low bit-rate video coding. A key role in the algorithm is played by two thresholds. The first is a high fixed threshold used as a security measure, forcing the algorithm to record a frame once a given number of frames has passed since the last encoded frame. The second is an adaptive threshold which depends on the average frame comparison value since the last frame detected for encoding: if the current value exceeds the average, the frame is encoded. An extra memory parameter ensures that the adaptive threshold does not force frames to be encoded too quickly. Moreover, a limit parameter prevents skipping a frame when it is on the verge of being encoded¹. The last parameter is a time span measured in milliseconds, used to avoid frame encodings that are too close in time.

In [6] a histogram-based fuzzy c-means clustering algorithm for video segmentation is presented. For each frame RGB color space histograms are created with respect to the YCbCr color space and compared to the histogram of the previous frame. The results of the comparison are sorted into three groups using a fuzzy c-means clustering algorithm: shot change (SC), suspected shot change (SSC), and no shot change (NSC).

¹ The algorithm is developed for real-time low bit-rate video encoding.
All frames in SC are recognized as shot change key frames. The remaining shot change frames are selected from the SSC group using heuristic methods.

A two-stage hierarchical video summary extraction method is described in [3]. In the first stage, an alpha-trimmed average histogram for the whole sequence of frames is created. Alpha-trimmed histograms are created for each frame in the sequence, too. Each frame histogram is compared to the average histogram, and the comparison values are grouped by fuzzy clustering into a number of classes. Based on a star selection algorithm, the best key frames are selected. In the second stage of the method, a hierarchical structure of key frames is created based on user-defined parameters.

3 Methods

3.1 Background of Video Analysis

We begin by specifying the terms for the elements of video streams and the stages of analysis. From the analysis point of view, the structure of a video can be analysed from two different perspectives — logical and physical.

From the physical point of view, a video unit can be a shot, a scene, or a sequence. Shots, which are the smallest physical video units, consist of one or more consecutively generated and recorded frames representing continuous action in time and space [12]. The end of one shot and the beginning of another is detected by a break in the continuous view. These breaks are called shot boundaries and are formed by camera breaks and editing points. Shots that are semantically related in time and space are usually grouped together to form a scene. In some cases related scenes are grouped to make a sequence.

Moreover, because of video compression methods, frames are physically not all the same within a video sequence. For example, in the MPEG standard there are three types of frames: I-frames, P-frames, and B-frames. An I-frame (intra-frame), sometimes called a key frame, is encoded as a single image, with no reference to any past or future frames. A P-frame (predicted frame) is encoded relative to the past reference frame. And a B-frame (bi-directionally predicted frame) is encoded relative to the past reference frame, the future reference frame, or both frames [7].

Logically, starting from the lowest level, one can assume that a video stream consists of a sequence of still images — frames. At this point we do not analyse the content of a frame. Single frames retrieved from video sequences can in some cases be analysed by image analysis techniques. We do not take this into account and regard a frame as the smallest possible part of a video sequence. In our case a shot is a continuous sequence of related frames. A shot boundary is an editing point or a camera break. As in the physical video structure, a scene is a sequence of related shots.

3.2 Keyframe Extraction in Raw Video Streams

As noted before, the concern of our research is continuous, non-edited video streams filmed by a body-mounted camera or by route tracing equipment. Theoretically, such a record consists of a single sequence of frames. The view in the frame changes continuously in time and space, without editing points or camera breaks, so the whole record can be considered a single shot. Shot boundary techniques, e.g. [11], used on such streams do not give a sufficient result for condensed representation.

As we are interested in logical video analysis, the semantics of keyframes is our main concern. Several keyframe extraction methods have been developed, too [3, 13], and they all start from shot boundary detection; only then are the key frames extracted.

Definition 1 We define a keyframe as a frame in a shot whose image is different in comparison to the previous frames.

Most of the algorithms developed for shot cut or key frame detection are tested on clean video streams, where shot boundaries and key frames are easy to determine. In this paper a method for raw video streams is developed. The challenge is that such streams usually are shaky and have a lot of noise in the view. The scene changes constantly or stays still for an undefined period of time, and there are no real shot boundaries. All changes are natural environmental changes caused by movement of the camera or movement in the environment.

Our method was developed on the basis of the open source shot boundary detection program Shotdetect [11]. The original method of frame comparison was changed and the user interface was adapted to the needs of our algorithm.
3.2.1 The Idea of the Algorithm

The video records that we analyse share the following characteristics: they have no editing points (of any type, e.g. cut, fade, dissolve, wipe, or computer generated transformations [1]), and they are filmed by a body-mounted or other kind of wearable camera. Such records are shaky and sometimes noisy; they might have an unfocused or low contrast view at some points. Moreover, the change of scene is either constant (e.g. the user is driving) or none (e.g. the user is standing at a red traffic light).

We take a video record which satisfies these conditions and, starting from the first frame, for every Nth frame we calculate a measurement for frame comparison. If a frame is recognized as different enough compared to the previous one, it is recorded as an image.

Each frame is divided into rectangular segments, and for every segment a hue histogram is extracted. Then the difference of the two histograms from the respective segments of two frames is calculated. Histograms are considered different if the distance between them is greater than the selected percent of the maximum possible distance. Two frames are considered different if a selected number of segments is recognized as different.

As the parameter for the histogram, hue from the HSV color space was selected, because this color space is much closer to how the human eye really sees colors [8]. Although the RGB color space is much easier to use in calculations, real understanding of an image is more important in image analysis applications (including video). A sketch of this per-sector hue histogram extraction is given below.
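As a concrete illustration of the idea above, the following C++ sketch extracts one hue histogram per rectangular sector, sampling every density-th pixel of every density-th row. It is our own minimal reconstruction, not the Shotdetect-based implementation: the Frame layout and the names rgbToHue and createHistograms are assumptions made for the example.

// A minimal sketch of the per-sector hue histogram described above.
// All names here are our own illustrations, not the identifiers of the
// actual Shotdetect-based implementation.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct Frame {
    int width, height;
    std::vector<uint8_t> rgb;  // interleaved R,G,B bytes, row-major
};

// Hue of one RGB pixel in degrees [0, 360), by the standard RGB->HSV formula.
static double rgbToHue(uint8_t r, uint8_t g, uint8_t b) {
    double rf = r / 255.0, gf = g / 255.0, bf = b / 255.0;
    double mx = std::max({rf, gf, bf}), mn = std::min({rf, gf, bf});
    double d = mx - mn;
    if (d == 0.0) return 0.0;                  // grey pixel: hue undefined, use 0
    double h;
    if (mx == rf)      h = std::fmod((gf - bf) / d, 6.0);
    else if (mx == gf) h = (bf - rf) / d + 2.0;
    else               h = (rf - gf) / d + 4.0;
    h *= 60.0;
    return h < 0.0 ? h + 360.0 : h;
}

// One hue histogram per rectangular sector. sectorsPerSide is the square
// root of the paper's `sectors` parameter (1, 2, 3 or 4 for 1, 4, 9, 16
// sectors). Only every `density`-th pixel of every `density`-th row is
// sampled, as described in the paper.
std::vector<std::vector<int>> createHistograms(const Frame& f, int density,
                                               int sectorsPerSide, int bins) {
    std::vector<std::vector<int>> hist(sectorsPerSide * sectorsPerSide,
                                       std::vector<int>(bins, 0));
    for (int y = 0; y < f.height; y += density) {
        for (int x = 0; x < f.width; x += density) {
            size_t i = 3 * (size_t(y) * f.width + x);
            double hue = rgbToHue(f.rgb[i], f.rgb[i + 1], f.rgb[i + 2]);
            int sx = std::min(x * sectorsPerSide / f.width,  sectorsPerSide - 1);
            int sy = std::min(y * sectorsPerSide / f.height, sectorsPerSide - 1);
            int bin = std::min(int(hue / 360.0 * bins), bins - 1);
            hist[sy * sectorsPerSide + sx][bin]++;
        }
    }
    return hist;
}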
3.2.2 Parameters

The following parameters are used in the algorithm.

• Step — taking into account the fact that the view in the video stream has no cut changes, a step for skipping frames in the comparison process is used. In [5] a similar parameter was applied; it is claimed that the parameter can be selected as the number of frames per second, and experiments were performed with a value equal to 22.

• Density of pixels — in order to make calculations faster, a pixel density parameter was added. If it is equal to 1, every pixel is taken into account; otherwise the given number of pixels is skipped in every row and column.

• Sectors — the number of rectangular sectors into which the frame is divided. Possible values: 1 (the whole frame), 4, 9, 16 — theoretically any perfect square.

• Different sectors — the maximum number of differing rectangular sectors allowed for a frame to still be recognized as similar. The parameter has to be smaller than the number of sectors in a frame.

• Histogram bins — the number of histogram bins among which the hue values are divided.

• Similarity percent — two histograms are considered not similar if the distance between them is greater than or equal to this percent of the maximum possible distance between histograms.

It is important to note that the authors of the histogram comparison tested their method on indoor scenes for visible pattern matching [10]. In the case when we take the sectors parameter equal to 1 and the different sectors parameter equal to 0, we actually use their method for pattern matching. Unfortunately, they do not give the parameter value at which they consider two images similar. An illustrative summary of these parameters is sketched below.
3.2.3 The Algorithm

Pseudocode of the video stream processing algorithm is given in Algorithm 1, and pseudocode for comparing two frames is given in Algorithm 2. Some of the pseudocode methods are used in both algorithms and share the same name. The pseudomethod F := ExtractVideoFrame(V) replaces the code that is necessary to extract a frame from the video stream. (We use methods that are described in the FFmpeg [4] multimedia processing library.) The pseudomethod CreateHistograms(F, density, sectors, bins) creates hue histograms for every rectangular sector of the frame F. It takes into account every density-th pixel of the frame rows and columns and puts its hue value into one of the bins. An auxiliary pseudomethod val := PercentValue(bins, density, sectors, percent) calculates the value which is later compared with the distance between histograms to identify whether the histograms are similar or not.

As mentioned above, Algorithm 2 describes how two frames are compared in order to decide whether they are similar or not. Firstly, we take the histograms of the previous frame: H_prev := GetHistograms(). Then we create hue histograms for the new frame — CreateHistograms(F, density, sectors, bins) — and take them in the same way: H := GetHistograms(). For every two corresponding histograms of the current and previous frames, extended signatures are created.

The comparison of the histograms is done according to the method described in [10]. Based on the data of the two histograms, the extended signatures are generated: {S_prev[i], S[i], length} := CreateExtendedSignatures(H_prev[i], H[i]). Following the definitions of signature and extended signature, we take two histograms and analyse every pair of their elements. If either element of a pair is not equal to zero, both elements and the index of the histogram bin are recorded into the respective extended signatures. "[. . . ] Given a pair of signatures to be compared, the number of bins is the same. Moreover, each bin in both signatures represents the same bin in the histograms." [10]. In Figure 2 the extended signatures of the two histograms A and B of Figure 1 are shown. Together with each pair of extended signatures, the length of the original histogram is stored; it is needed in further calculations. A sketch of this construction is given after the figures.

[Figure 1 shows two example histograms, panels 1.1 H(A) and 1.2 H(B), over bins 0–7.]

Figure 1: Example of two histograms

[Figure 2 shows the corresponding extended signatures, panels 2.1 ExS(A) and 2.2 ExS(B), keeping only bins 0, 1, 2, 4, 5, 6.]

Figure 2: The extended signatures of the histograms in Figure 1
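A minimal sketch of this construction, under our reading of [10] as summarized above: every bin in which at least one of the two histograms is non-zero is copied, with its index, into both extended signatures, and the original histogram length is kept for the modulo distance. The type and function names are our own, not those of [10].

// A sketch of the extended-signature construction quoted above. Bins that
// are zero in both histograms are dropped; each kept bin stores its index
// and its count. Names and types are our own reading of [10].
#include <vector>

struct ExtendedSignature {
    std::vector<int> bin;    // index of the bin in the original histogram
    std::vector<int> value;  // count of that bin in this histogram
};

void createExtendedSignatures(const std::vector<int>& hA,
                              const std::vector<int>& hB,
                              ExtendedSignature& sA, ExtendedSignature& sB,
                              int& length) {
    length = static_cast<int>(hA.size());    // hA and hB have the same size
    for (int i = 0; i < length; ++i) {
        if (hA[i] != 0 || hB[i] != 0) {      // skip pairs that are both zero
            sA.bin.push_back(i);  sA.value.push_back(hA[i]);
            sB.bin.push_back(i);  sB.value.push_back(hB[i]);
        }
    }
}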

The comparison of the generated extended signatures is performed by the method Dmod[i] := ModuloDistance(S_prev[i], S[i], length). It is an implementation of the Modulo distance algorithm described in the same paper [10]. As our histograms are hue histograms, they are modulo type histograms:

[. . . ] In a modulo type histogram or signature [. . . ] the first bin and the last bin are considered to be adjacent to each other, and hence, it forms a closed circle, due to the nature of the data type. [10]

If the modulo distance between two signatures (and thus between the two histograms) is greater than the val parameter, the histograms are considered not similar. Finally, if more sectors of the frames are recognized as different than the diff value allows, a new key frame is recorded and the current frame becomes the previous frame.

The described process compares frames till it reaches the end of the record.
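We do not reproduce the exact Modulo distance algorithm of [10] here, so the sketch below substitutes a distance with the same defining property, namely that the first and last bins are adjacent and the histogram forms a closed circle: the one-dimensional circular earth mover's distance, which has a closed form as the sum of absolute deviations of the prefix-sum differences from their median. It is a stand-in for illustration, not the algorithm of [10], and it assumes equal total mass in both histograms (true here, since corresponding sectors sample the same pixel grid).

// A circular stand-in for the Modulo distance of [10]: the 1-D circular
// earth mover's distance between two equal-mass histograms, computed as
// sum_i |prefix_i - median(prefix)|, where prefix accumulates bin-wise
// mass differences around the circle.
#include <algorithm>
#include <cstdlib>
#include <vector>

long long circularDistance(const std::vector<int>& hA,
                           const std::vector<int>& hB) {
    size_t n = hA.size();                    // both histograms have n bins
    std::vector<long long> prefix(n);
    long long run = 0;
    for (size_t i = 0; i < n; ++i) {
        run += hA[i] - hB[i];                // running mass difference
        prefix[i] = run;
    }
    std::vector<long long> sorted = prefix;
    std::nth_element(sorted.begin(), sorted.begin() + n / 2, sorted.end());
    long long median = sorted[n / 2];        // optimal circular offset
    long long dist = 0;
    for (long long p : prefix) dist += std::llabs(p - median);
    return dist;
}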
3.2.4 Variations of the Algorithm

In Section 3.2.2 we described the set of parameters used in the algorithm. In this section the exact value combinations used in the algorithm are named.

• Step — the step between the frames to be compared. In most of the experiments it was 24. Some testing was also performed with a step equal to the frame rate per second of the selected record (it might differ between records).

• Density of pixels — the final experiments were performed with the value equal to 4 (every fourth pixel in every fourth line).

• Sectors — one of the key parameters in our algorithm. The algorithm was tested taking the whole frame (1 sector), or 4, 9, or 16 rectangular sectors.

• Different sectors — the maximum number of different sectors allowed for similar frames. For the whole frame the parameter is equal to zero; for the others it must be smaller than the number of sectors. In the variations that were tested, the different sectors parameter was smaller than half of the sectors number.

• Histogram bins — the algorithm is designed so that the number of histogram bins can be 2^n, where n theoretically can be any integer. The algorithm was tested with 2^8 = 256 bins.

• Similarity percent — probably the most important parameter in the algorithm, because it decides whether two histograms are similar or not. Our selected values varied between 25 and 5 percent.

Data: V – a video stream; step – frame step; density – density of pixels; sectors – number of frame sectors; diff – number of different sectors; bins – number of histogram bins; percent – percent of distance between two histograms.
Result: KS – a set of selected keyframes.
begin
    F := ExtractVideoFrame(V);
    i := 0; j := 0; val := 0;
    while F ≠ NULL do
        j++;
        if i > 0 then
            if j = step then
                CompareFrames(F, F_prev, density, sectors, bins, val, diff);
                i++;
                F_prev := F;
                j := 0;
        else
            CreateHistograms(F, density, sectors, bins);
            val := PercentValue(bins, density, sectors, percent);
            SaveFrame(F);
            i++;
            F_prev := F;
        F := ExtractVideoFrame(V);
end

Algorithm 1: ProcessVideo

Data: F – frame; F_prev – previous frame; density – density of pixels; sectors – number of frame sectors; bins – number of histogram bins; val – comparison value; diff – number of different sectors.
Result: If the frames are not similar — record a new keyframe.
begin
    H_prev := GetHistograms();
    CreateHistograms(F, density, sectors, bins);
    H := GetHistograms();
    param := 0;
    for i ← 0 to sectors do
        {S_prev[i], S[i], length} := CreateExtendedSignatures(H_prev[i], H[i]);
        Dmod[i] := ModuloDistance(S_prev[i], S[i], length);
        if Dmod[i] >= val then
            param++;
    if param > diff then
        SaveFrame(F);
        F_prev := F;
end

Algorithm 2: CompareFrames
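Putting the earlier sketches together, the control flow of Algorithm 2 can be rendered compactly as below. This reuses the Frame type, createHistograms, and circularDistance from the sketches above, with circularDistance standing in for the Modulo distance of [10]; val is assumed to be precomputed, as the paper's PercentValue does, as the similarity percent of the maximum possible distance.

// A compact rendering of Algorithm 2 using the sketches above. Returns
// true when more than `diff` sectors differ, i.e. when the algorithm
// would record frame `f` as a new keyframe.
bool framesDiffer(const Frame& f, const Frame& fPrev,
                  int density, int sectorsPerSide, int bins,
                  long long val, int diff) {
    auto hPrev = createHistograms(fPrev, density, sectorsPerSide, bins);
    auto h     = createHistograms(f,     density, sectorsPerSide, bins);
    int param = 0;                       // sectors recognized as different
    for (size_t i = 0; i < h.size(); ++i)
        if (circularDistance(hPrev[i], h[i]) >= val)
            ++param;
    return param > diff;                 // true -> save F as a keyframe
}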

4 Experiments and Results

As mentioned earlier, the described algorithm was implemented on the basis of the open source program Shotdetect [11]. It is written in the C++ programming language. Processing of the video stream is based on methods from the FFmpeg library, released under the LGPL license [4].

We experimented with two groups of videos: video records filmed by a body-mounted camera (slow but shaky movement, passing through objects, etc.) and route-recording videos filmed from a car while driving through a city (stable enough, with smooth movement and smoothly appearing and disappearing objects). For illustrating our results we use the video records made while driving through a city.

Firstly, we would like to note that the step parameter is one of the critical parameters in our application. For example, with a very small step close or equal to 1, analysis time increases drastically and the method becomes very sensitive to such view noise as shadows of trees. For this reason, the step value was empirically selected to be 24. A small step has another disadvantage, too: if some object in the view appears and disappears very slowly, a small step is not able to detect the environmental changes, and keyframes showing this object are not created.

Another problem solved by using the step value is the recognition of zooms and pans. They are much easier to detect, and keyframes are extracted at more meaningful stages of zooming or panning.

In Figure 4, keyframe selection using the histogram comparison method described in [10] is shown. Here hue histograms are extracted from the whole image for comparing image patterns. The maximum difference allowed for two histograms to be recognized as similar is 25% of the maximum possible modulo distance between them. The similarity value was also selected empirically, because the authors of the histogram comparison method do not give information about the level of difference allowed in their experiments. The selected step value is equal to 24; it is used in order to be able to compare the results of this method with the results of the one we propose.

This method of analysis has a big disadvantage — it is very sensitive to small noisy motion and shadows (even with the selected step value). A number of key frames differing only in shadows are recorded. On the other hand, a number of smoothly appearing objects are missing from the representation.

On the last page, in Figure 6, an example of the frames of the same video sequence is shown: every 24th frame of a 70-second record. And in Figure 5 the key frames extracted from the same sequence using our proposed algorithm are given.

For the comparison of the parameters of the methods we give a graphical example in Figure 3. In Figure 3.1 the normalized selection criterion over the whole frame is shown, and in Figure 3.2 the criterion for the same sequence using our proposed method, dividing the frame into 4 sectors, is given. It is very obvious that when sectors are used the selection criterion gets emphasised and more environmental highlights are selected into the sequence of keyframes. In both charts the straight bold line marks the similarity percent value.

[Figure 3: two line charts of the selection criterion (vertical axis 0–800000) over frames 0–73, panels 3.1 and 3.2.]

Figure 3: Selection criteria of the example video stream: Fig. 3.1, the normalized criterion using the whole frame, and Fig. 3.2, the criterion with the frame divided into 4 sectors

5 Discussion and Future Work

The whole problem of key frame selection in a video sequence is hard to define unambiguously.
[Figure 4: a grid of 12 keyframes, panels 4.1–4.12.]

Figure 4: An example output of testing the method described in [10] (Figure 4.12 is the last frame of the video sequence)

[Figure 5: a grid of 24 keyframes, panels 5.1–5.24.]

Figure 5: An example output of testing our proposed method (Figure 5.24 is the last frame of the video sequence)
Manual selection would probably give a slightly different sequence of keyframes. In some cases different users might select quite different frames, depending on the content of the sequence and on their own attitude.

Our proposed method seems to work well with route-tracking video records, as it is able to select the frames that represent such changes in the route as turns and new visual environmental objects. At the same time, objects such as other cars or clouds are a big disturbance in this kind of data and force new, unmeaningful keyframes to be extracted. One possible direction for further research could be an attempt to exclude particular objects (different cars) from the selection process.

Another disturbance for the method is cant (tilt) changes in the view. As the method uses precise rectangular blocks, such changes are detected as keyframes, although the view is very similar to the previous frames. A tolerance for view changes of this kind should be implemented for the method.

Acknowledgement

We would like to acknowledge the Siemens Master Program, which allowed Stanislava Budvytytė to study as a visiting student in the Digital Media master program at Bremen University. Also, we thank the Automemento project group — Lutz Dickmann, Tobias Lensing, supervisor Stéphane Beauregard, and the rest of the members of the project group "AutoMemento meets Persuasive Technology" — for support and collaboration in the initial stage of this research.

References

[1] C. Cotsaces, N. Nikolaidis, and I. Pitas. Video shot detection and condensed representation: a review. IEEE Signal Processing Magazine, 23(2):28–37, 2006.

[2] A. D. Doulamis, N. D. Doulamis, and S. D. Kollias. A fuzzy video content representation for video summarization and content-based retrieval. Signal Processing, 80(6):1049–1067, 2000.

[3] A. M. Ferman and A. M. Tekalp. Two-stage hierarchical video summary extraction to match low-level user browsing preferences. IEEE Transactions on Multimedia, 5(2):244–256, 2003.

[4] FFmpeg. Last visited 2007.08.08, available at http://code.google.com/soc/2007/ffmpeg/about.html.

[5] A. Hanjalic. Shot-boundary detection: unraveled and resolved? IEEE Transactions on Circuits and Systems for Video Technology, 12(2):90–105, 2002.

[6] C.-C. Lo and S.-J. Wang. Video segmentation using a histogram-based fuzzy c-means clustering algorithm. Computer Standards & Interfaces, 23(5):429–438, 2001.

[7] MPEG.ORG. Last visited 2008.04.16, available at http://www.mpeg.org.

[8] M. J. Pickering and S. Rüger. Evaluation of key frame-based retrieval techniques for video. Computer Vision and Image Understanding, 92(2–3):217–235, 2003.

[9] J. Sastre, P. Usach, A. Moya, V. Naranjo, and J. López. Shot detection method for low bit-rate H.264 video coding. In Proceedings of the 14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 2006.

[10] F. Serratosa and A. Sanfeliu. Signatures versus histograms: definitions, distances and algorithms. Pattern Recognition, 39(5):921–934, 2006.

[11] Shotdetect. Last visited 2008.04.16, available at http://shotdetect.nonutc.fr/shotdetect/.

[12] R. Tusch, H. Kosch, and L. Böszörményi. VIDEX: an integrated generic video indexing approach. In ACM Multimedia, pages 448–451, 2000.

[13] X. Zhu, J. Fan, A. K. Elmagarmid, and X. Wu. Hierarchical video content description and summarization using unified semantic and visual similarity. Multimedia Systems, 9(1):31–53, 2003.

[Figure 6: a grid of 75 frames, panels 6.1–6.75.]

Figure 6: An example output of the test video. Every 24th frame of the sequence is given (Figure 6.75 is the last frame of the video sequence)
