MPEG-7
• MPEG-7 overview
– What is…
– Why?
– Objectives and scope
– Main elements and organization.
• MPEG-7 Audio
– Low-level features
– High-level tools
What is MPEG-7?
• “Multimedia Content Description Interface”
• ISO/IEC standard by MPEG (Moving Picture Experts Group)
• Providing meta-data for multimedia
• MPEG-1, -2, -4: make content available;
MPEG-7: makes content accessible, retrievable, filterable,
manageable (via device / computer).
• Multiple degrees of interpretation of the information’s meaning
• Support as broad a range of applications as possible.
• A compatible (with existing tech) and extensible standard.
Why MPEG-7?
• “The value of information often depends on how
easy it can be found, retrieved, accessed,
filtered and managed.”
• Past: poverty of the digital multimedia sources
-> Simplicity of the access mechanisms
• Now: growing amount of audiovisual information
-> Identifying and managing them efficiently is
becoming more difficult.
e.g. “record only news about sport.”
Why MPEG-7?
• For future multimedia services, content
representation and description may have to be
addressed jointly.
• Many services dealing with content
representation will have to deal first with content
description
– “a non-described content may be useless”
• Need for access only to the content description:
– New original services (e.g. optimizing personal time)
– Adaptation to networks and terminal capabilities
Application domains
• Broadcast media selection (e.g., radio channel, TV
channel).
• Digital libraries (e.g., film, video, audio and radio
archives).
• E-Commerce (e.g., personalized advertising).
• Education (e.g., repositories of multimedia courses,
multimedia search for support material).
• Home Entertainment (e.g., management of personal
multimedia collections, including manipulation of content,
e.g. karaoke).
• Journalism (e.g. searching speeches of a certain
politician using his name, his voice or his face).
• Multimedia directory services (e.g. yellow pages, G.I.S).
• Surveillance and remote sensing.
MPEG-7 Objectives
Standardize content-based description for various
types of audiovisual information
processing chain:
An example of architecture
[Figure: illustration of log-attack time; time axis t with the attack start T0 and end T1 marked]
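The log-attack time shown in the figure can be sketched in a few lines. This is a minimal illustration, not the normative extraction procedure: the function name `log_attack_time` and the `start_fraction` threshold used to locate T0 are assumptions, and T1 is simply taken as the time of the envelope maximum.

```python
import math

def log_attack_time(envelope, sample_rate, start_fraction=0.02):
    """Return log10(T1 - T0), with T0 and T1 measured in seconds.

    envelope: sampled amplitude envelope of one sound.
    start_fraction: hypothetical threshold (fraction of the peak) marking T0.
    """
    envelope = list(envelope)
    peak = max(envelope)
    if peak <= 0:
        raise ValueError("envelope has no positive values")
    threshold = start_fraction * peak
    # T0: first sample whose envelope exceeds the assumed start threshold.
    t0_index = next(i for i, v in enumerate(envelope) if v > threshold)
    # T1: first sample at which the envelope reaches its maximum.
    t1_index = envelope.index(peak)
    duration = (t1_index - t0_index) / sample_rate
    if duration <= 0:
        raise ValueError("attack has zero duration at this sample rate")
    return math.log10(duration)
```

For example, an envelope that rises from silence to its peak over two samples at 100 Hz has an attack of 0.02 s, so the function returns log10(0.02).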
Low-level Features (details)
• Timbral Spectral: (spectral features in a linear-frequency
space)
– SpectralCentroid:
• power-weighted average of the frequency
of the bins in the linear power spectrum.
• distinguishing musical instrument timbres
– 4 Ds for the harmonic (regularly spaced) components of signals:
• HarmonicSpectralCentroid
• HarmonicSpectralDeviation
• HarmonicSpectralSpread
• HarmonicSpectralVariation
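The "power-weighted average of the frequency of the bins" can be written out directly. The sketch below assumes a one-sided linear power spectrum whose bins run evenly from 0 Hz to the Nyquist frequency; the function name `spectral_centroid` is illustrative and the standard's exact framing and windowing details are not reproduced here.

```python
def spectral_centroid(power_spectrum, sample_rate):
    """Power-weighted mean frequency (Hz) of a one-sided power spectrum.

    power_spectrum: power per bin, bin 0 at 0 Hz, last bin at sample_rate / 2
    (an assumed layout for this sketch).
    """
    n_bins = len(power_spectrum)
    if n_bins < 2:
        raise ValueError("need at least two bins")
    bin_width = (sample_rate / 2) / (n_bins - 1)
    total_power = sum(power_spectrum)
    if total_power == 0:
        return 0.0  # silent frame: centroid is undefined, report 0 Hz
    weighted = sum(k * bin_width * p for k, p in enumerate(power_spectrum))
    return weighted / total_power
```

A spectrum with more power in the upper bins yields a higher centroid, which is why the value helps separate bright timbres from dull ones.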
Low-level Features (details)
• Spectral Basis: (low-dimensional projections of a spectral space to
aid compactness and recognition)
– AudioSpectrumBasis:
• a series of (time-varying / statistically independent) basis functions
derived from the singular value decomposition of a normalized
power spectrum.
– AudioSpectrumProjection:
• low-dimensional features of a spectrum after projection upon a reduced-rank
basis.
– independent subspaces of a spectrum correlate strongly
with different sound sources.
– Provide more salience using less space.
• With Sound Classification and Indexing Description Tools.
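The basis and projection steps can be sketched with NumPy. This is a simplified illustration under stated assumptions: frames are normalized by their L2 norm, the basis is taken straight from the SVD, and the optional independence step mentioned above is omitted. The function name `spectrum_basis` is hypothetical.

```python
import numpy as np

def spectrum_basis(power_spectrogram, rank):
    """Return (basis, projection) for a frames-by-bins power spectrogram.

    basis:      bins x rank matrix of the leading right singular vectors.
    projection: frames x rank low-dimensional features.
    """
    x = np.asarray(power_spectrogram, dtype=float)
    # Assumed normalization: scale each frame to unit L2 norm.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave silent frames as all zeros
    x_norm = x / norms
    # Rows of vt are orthonormal spectral basis functions.
    _, _, vt = np.linalg.svd(x_norm, full_matrices=False)
    basis = vt[:rank].T
    projection = x_norm @ basis
    return basis, projection
```

If the frames are mixtures of only a few spectral shapes, a basis of that rank reconstructs the normalized spectrogram almost exactly, which is the sense in which the projection gives "more salience using less space".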
High-level audio Description Tools
(Ds and DSs)
• Exchange some generality for descriptive richness:
– a smaller set of audio features (as compared to visual
features) that may canonically represent a sound without
domain-specific knowledge.
• Audio Signature (DS)
• Musical Instrument Timbre
• Melody
• General Sound Recognition and Indexing
• Spoken Content
High-level audio Description Tools
(details)
• Audio Signature Description Scheme
– SpectralFlatness Ds
– a unique content identifier for the purpose of
robust automatic identification
– e.g. audio fingerprinting
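Spectral flatness is commonly defined as the ratio of the geometric mean to the arithmetic mean of the power values in a frequency band. The sketch below uses that definition for a single band; the function name `spectral_flatness` and the `floor` guard against log(0) are assumptions, and the band layout used by the standard is not reproduced.

```python
import math

def spectral_flatness(power_band, floor=1e-12):
    """Geometric mean / arithmetic mean of the power values in one band.

    Values near 1 indicate a noise-like (flat) band, values near 0 a tonal one.
    floor: hypothetical lower bound applied to avoid taking log(0).
    """
    if not power_band:
        raise ValueError("band is empty")
    values = [max(p, floor) for p in power_band]
    mean_log = sum(math.log(v) for v in values) / len(values)
    geometric_mean = math.exp(mean_log)
    arithmetic_mean = sum(values) / len(values)
    return geometric_mean / arithmetic_mean
```

A signature would then be a compact sequence of such per-band values over time, which tends to survive moderate distortion better than the raw samples do.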
High-level audio Description Tools
(details)
• Musical Instrument Timbre Description Tools
– HarmonicInstrumentTimbre Ds:
• LogAttackTime Descriptor
– PercussiveInstrumentTimbre Ds:
• SpectralCentroid Descriptor
High-level audio Description Tools
(details)
• Melody Description Tools:
– efficient, robust, and expressive melodic similarity
matching.
– MelodyContour Description Scheme:
• terse, efficient melody contour / rhythm
– MelodySequence Description Scheme:
• verbose, complete, expressive melody / rhythm.
• Interval encoding
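A terse contour can be illustrated by quantizing the interval between successive notes into five levels. The sketch assumes notes are given as integer MIDI numbers and uses semitone thresholds (a step of one or two semitones maps to ±1, a leap of three or more to ±2); both the input format and the exact thresholds are assumptions made for illustration.

```python
def melody_contour(midi_notes):
    """Map successive note intervals to a five-level contour in {-2,...,2}."""
    contour = []
    for previous, current in zip(midi_notes, midi_notes[1:]):
        interval = current - previous  # signed interval in semitones
        if interval == 0:
            level = 0                  # repeated note
        elif abs(interval) <= 2:
            level = 1                  # step: half or whole tone
        else:
            level = 2                  # leap: minor third or larger
        contour.append(level if interval >= 0 else -level)
    return contour

# C4 D4 D4 F4 E4 C4
print(melody_contour([60, 62, 62, 65, 64, 60]))  # [1, 0, 2, -1, -2]
```

Because the contour discards exact pitches, two renditions in different keys produce the same sequence, which suits query-by-humming style matching; a verbose encoding would keep the exact intervals and note durations instead.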
High-level audio Description Tools
(details)
• General Sound Recognition and Indexing
Description Tools:
– SoundModel Description Scheme
– SoundClassificationModel Description Scheme
• a set of SoundModel DS -> multi-way classifier
– SoundModelStatePath Descriptor
• indices to states generated by a SoundModel of a
segment
– immediately applied to sound effects
– automatically index and segment sound tracks.
– Low -> mid -> high level analyses
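The idea of a multi-way classifier built from several sound models, together with a state path for the winning model, can be sketched with small hidden Markov models. Everything concrete here is an assumption for illustration: the models are discrete-observation HMMs stored in plain dictionaries, observations are pre-quantized symbols, and the labels and probabilities are invented.

```python
import math

def _log(p):
    return math.log(p) if p > 0 else float("-inf")

def viterbi(model, observations):
    """Return (best log-probability, most likely state path) for one HMM."""
    start, trans, emit = model["start"], model["trans"], model["emit"]
    states = range(len(start))
    # scores[s]: best log-probability of any path ending in state s.
    scores = [_log(start[s]) + _log(emit[s][observations[0]]) for s in states]
    backpointers = []
    for obs in observations[1:]:
        step, new_scores = [], []
        for s in states:
            prev = max(states, key=lambda r: scores[r] + _log(trans[r][s]))
            step.append(prev)
            new_scores.append(scores[prev] + _log(trans[prev][s]) + _log(emit[s][obs]))
        backpointers.append(step)
        scores = new_scores
    last = max(states, key=lambda s: scores[s])
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    path.reverse()
    return scores[last], path

def classify(models, observations):
    """Pick the label whose model scores highest; return (label, state path)."""
    results = {label: viterbi(m, observations) for label, m in models.items()}
    best = max(results, key=lambda label: results[label][0])
    return best, results[best][1]

# Hypothetical models over two symbols: 0 = "loud frame", 1 = "quiet frame".
MODELS = {
    "dog_bark": {"start": [1.0, 0.0],
                 "trans": [[0.5, 0.5], [0.0, 1.0]],
                 "emit": [[0.9, 0.1], [0.2, 0.8]]},
    "hum": {"start": [1.0], "trans": [[1.0]], "emit": [[0.1, 0.9]]},
}
```

Calling `classify(MODELS, [0, 0, 1, 1])` selects the `dog_bark` model and returns the state indices along its best path, which plays the role of the state-path description of the segment.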
High-level audio Description Tools
(details)
• Spoken Content Description Tools:
– detailed description of words spoken within an
audio stream.
– indexing into and retrieval of an audio stream
– indexing of multimedia objects annotated with
speech.
• Recall of audio/video data by memorable spoken events.
– a character or person spoke a particular word
• Spoken Document Retrieval
– separate spoken documents
• Annotated Media Retrieval
– photograph retrieved using a spoken annotation
[Figure: block diagram of timbre descriptor extraction.
Signal -> power spectrum -> SpectralCentroid.
Signal -> signal envelope -> LogAttackTime, TemporalCentroid.
Signal -> sliding analysis window -> STFT -> harmonic peaks detection (with f0) ->
instantaneous HarmonicSpectralCentroid, HarmonicSpectralSpread,
HarmonicSpectralDeviation and HarmonicSpectralVariation (the last using a delay, z-1).]