Mpeg 7

The document discusses MPEG-7, an ISO/IEC standard for describing multimedia content. MPEG-7 aims to make multimedia content accessible, retrievable, filterable and manageable by providing metadata. It includes low-level audio features like spectrum and timbre as well as high-level tools for sound recognition, melody description and spoken content indexing. MPEG-7 descriptions can be automatically extracted or manually created to support applications in media selection, digital libraries, e-commerce and more.

Uploaded by Aland Media

MPEG-7

• MPEG-7 overview
– What is…
– Why?
– Objectives and scope
– Main elements and organization.
• MPEG-7 Audio
– Low-level features
– High-level tools
What is MPEG-7?
• “Multimedia Content Description Interface”
• ISO/IEC standard by MPEG (Moving Picture Experts Group)
• Providing meta-data for multimedia
• MPEG-1, -2, -4: make content available;
MPEG-7: makes content accessible, retrievable, filterable,
manageable (via device / computer).
• Allows multiple degrees of interpretation of the information’s meaning
• Support as broad a range of applications as possible.
• A compatible (with existing tech) and extensible standard.
Why MPEG-7?
• “The value of information often depends on how
easy it can be found, retrieved, accessed,
filtered and managed.”
• Past: digital multimedia sources were scarce
-> simple access mechanisms were enough
• Now: a growing amount of audiovisual information
-> identifying and managing it efficiently is
becoming more difficult.
e.g. “record only news about sport.”
Why MPEG-7?
• For future multimedia services, content
representation and description may have to be
addressed jointly.
• Many services dealing with content
representation will have to deal first with content
description
– “a non-described content may be useless”
• Need for access only to the content description:
– New original services (e.g. optimizing personal time)
– Adaptation to networks and terminal capabilities
Application domains
• Broadcast media selection (e.g., radio channel, TV
channel).
• Digital libraries (e.g., film, video, audio and radio
archives).
• E-Commerce (e.g., personalized advertising).
• Education (e.g., repositories of multimedia courses,
multimedia search for support material).
• Home Entertainment (e.g., management of personal
multimedia collections, including manipulation of content,
e.g. karaoke).
• Journalism (e.g. searching speeches of a certain
politician using his name, his voice or his face).
• Multimedia directory services (e.g. yellow pages, G.I.S).
• Surveillance and remote sensing.
MPEG-7 Objectives
Standardize content-based description for various
types of audiovisual information

• Independent from media support (encoding and storage)


• Different granularity
– Low-level features: shape, size, key, tempo changes, etc.
– High-level semantic info: “scene with a barking brown dog on the
left and with the sound of passing cars in the background.”
• Meaningful in the context of the application
– Same material -> different types of features and combinations
e.g. timbre vs. loudness
MPEG-7 Objectives
• Information about the content
– The form: e.g. the coding format used
– Conditions for accessing the material:
e.g. Intellectual property rights / price
– Classification: e.g. parental rating
– Links to other relevant materials
– The context: e.g. “Olympic Games 1996, final of 200 meter
hurdles, men”
• Information present in the content:
– Combination of low-level and high-level descriptors
Scope of the Standard
[Figure: processing chain]
An example of architecture
• Pull: (Client Queries -> Descriptions repository -> Matched Ds)
• Push: (Filter descriptions -> Programmed actions)
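The pull and push modes above can be sketched in a few lines of Python. Everything here is a hypothetical illustration rather than part of MPEG-7: the dictionary-shaped descriptions, the field names `genre` and `topic`, and the helper names `matches`, `pull`, and `push` are all assumptions.

```python
# Hypothetical descriptions: plain dicts stand in for MPEG-7 descriptions.
REPOSITORY = [
    {"id": "clip1", "genre": "news", "topic": "sport"},
    {"id": "clip2", "genre": "news", "topic": "politics"},
    {"id": "clip3", "genre": "film", "topic": "sport"},
]


def matches(description, criteria):
    """True if every key/value pair in criteria appears in the description."""
    return all(description.get(k) == v for k, v in criteria.items())


def pull(repository, query):
    """Pull: a client query is matched against a descriptions repository."""
    return [d for d in repository if matches(d, query)]


def push(stream, criteria, action):
    """Push: incoming descriptions are filtered and trigger a programmed action."""
    for description in stream:
        if matches(description, criteria):
            action(description)


# "Record only news about sport": the programmed action here just logs the id.
recorded = []
push(REPOSITORY, {"genre": "news", "topic": "sport"},
     lambda d: recorded.append(d["id"]))
```

In both modes only the descriptions are inspected, never the audiovisual material itself, which is the point of having access to the content description alone.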
Where are the descriptions from?
• Preservation of existing descriptive data (e.g.
scripts) through production/delivery
• Generated automatically by capture devices
(e.g. time or GPS location in a camera)
• Extracted automatically & semi-automatically
(i.e. with some human assistance)
• Manually produced (e.g. for legacy material such
as existing film archives)
Main Elements of MPEG-7
• Relationship among elements introduced above.
Descriptions
• MPEG-7 approaches the description of content from
several viewpoints.
• A set of methods and tools for the different
viewpoints of the description (not a monolithic system)
• Interrelated and can be combined in many ways.
• Associated with the content itself: (searching, filtering)
• Location: (document vs. stream)
– physically located with the material, or
– somewhere else on the globe
• Interoperability with other metadata standards: (XML)
Major Functionalities
• MPEG-7 Systems
• MPEG-7 Description Definition Language
• MPEG-7 Visual
• MPEG-7 Audio
• MPEG-7 Multimedia Description Schemes
• Reference Software: the eXperimentation Model (test)
• MPEG-7 Conformance (syntax checking)
• MPEG-7 Extraction and use of descriptions (technical
report)
MPEG-7 Audio
• Audio provides structures—building upon
some basic structures from the MDS—for
describing audio content.
• Low-level Descriptors:
– audio features that cut across many applications
• High-level Description Tools:
– more specific to a set of applications.
Low-level Features
Low-level Features (details)
• Basic: (temporally sampled scalar values for general use)
– AudioWaveform Descriptor
• waveform envelope: (for display purposes).
– AudioPower Descriptor
• temporally-smoothed instantaneous power:
(quick summary of a signal)
• Silence segment: (no significant sound)
– aid further segmentation of the audio stream, or as a hint
not to process a segment
– Applicable to all kinds of signals
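A minimal sketch of the three basic ideas above, assuming non-overlapping frames and a fixed power threshold for silence. The frame length, the threshold value, and all function names are assumptions for illustration; the normative extraction is defined in ISO/IEC 15938-4.

```python
import math


def frames(samples, hop):
    """Split samples into consecutive non-overlapping frames of `hop` samples."""
    return [samples[i:i + hop] for i in range(0, len(samples) - hop + 1, hop)]


def waveform_envelope(samples, hop):
    """Per-frame (min, max) pairs, enough to draw a compact waveform display."""
    return [(min(f), max(f)) for f in frames(samples, hop)]


def audio_power(samples, hop):
    """Per-frame mean of the squared samples (temporally smoothed power)."""
    return [sum(x * x for x in f) / len(f) for f in frames(samples, hop)]


def silent_frames(power, threshold=1e-4):
    """Flag frames whose power falls below an assumed threshold."""
    return [p < threshold for p in power]


# 100 samples of silence followed by 100 samples of a sine tone.
signal = [0.0] * 100 + [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
power = audio_power(signal, hop=50)
silence = silent_frames(power)
```

The silence flags can then serve as a hint to skip frames or as candidate segment boundaries.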
Low-level Features (details)
• Basic Spectral: (single time-frequency analysis of signal)
– AudioSpectrumEnvelope: (Base class)
• the short-term power spectrum:
(display, synthesize, general-purpose search)
– AudioSpectrumCentroid:
• is the spectrum dominated by high or low frequencies?
– AudioSpectrumSpread:
• is the power spectrum concentrated near the spectral centroid, or spread
out over the spectrum?
• helps distinguish pure-tone and noise-like sounds
– AudioSpectrumFlatness: (the presence of tonal components)
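The basic spectral descriptors above can be illustrated with a simplified sketch. It assumes a naive DFT on one short frame, measures centroid and spread on a log-frequency axis in octaves relative to 1 kHz, and computes flatness over the whole spectrum instead of per frequency band. These simplifications, and all function names, are assumptions; this is not the normative MPEG-7 extraction.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def centroid_and_spread(power, sample_rate, n, ref_hz=1000.0):
    """Power-weighted mean and RMS deviation of log2(f / ref_hz), in octaves."""
    # Skip the DC bin, whose log-frequency is undefined.
    pairs = [(math.log2(k * sample_rate / n / ref_hz), p)
             for k, p in enumerate(power) if k > 0]
    total = sum(p for _, p in pairs)
    centroid = sum(o * p for o, p in pairs) / total
    spread = math.sqrt(sum((o - centroid) ** 2 * p for o, p in pairs) / total)
    return centroid, spread


def flatness(power, floor=1e-12):
    """Geometric mean over arithmetic mean of the power bins (DC skipped)."""
    bins = [max(p, floor) for p in power[1:]]
    geometric = math.exp(sum(math.log(p) for p in bins) / len(bins))
    return geometric / (sum(bins) / len(bins))


# A 1 kHz tone sampled at 8 kHz (exactly bin 8 of a 64-point frame) and an impulse.
tone = [math.sin(2 * math.pi * 8 * i / 64) for i in range(64)]
impulse = [1.0] + [0.0] * 63
tone_centroid, tone_spread = centroid_and_spread(power_spectrum(tone), 8000, 64)
```

A pure tone yields a small spread and a flatness near 0, while the impulse, whose spectrum is flat, yields a flatness near 1.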
Low-level Features (details)
• Signal Parameters: (periodic or quasi-periodic signals)
– AudioFundamentalFrequency:
• fundamental frequency with a “confidence measure”, since
“pitch-tracking” methods are not perfectly accurate
– AudioHarmonicity:
• distinction between sounds with a
harmonic / inharmonic / non-harmonic spectrum
Low-level Features (details)
• Timbral Temporal: (temporal characteristics of segments
of sounds, musical timbre)
– LogAttackTime
– TemporalCentroid
• where in time the energy of a signal is focused.
• Useful when attack times are identical
[Figure: illustration of log-attack time. Signal envelope(t) plotted against t, with the attack between T0 and T1.]
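Both timbral temporal descriptors can be sketched from a sampled signal envelope. The log-attack time is commonly given as log10(T1 - T0), and the temporal centroid as the envelope-weighted mean time. How T0 and T1 are picked (here: fractions of the envelope peak) is an assumption of this sketch, as are the function and parameter names.

```python
import math


def log_attack_time(envelope, sample_rate, start_frac=0.02, stop_frac=1.0):
    """log10 of the time between attack start T0 and attack end T1, in seconds.

    T0 is taken as the first sample above start_frac * peak and T1 as the
    first sample reaching stop_frac * peak; both fractions are assumptions.
    """
    peak = max(envelope)
    t0 = next(i for i, v in enumerate(envelope) if v > start_frac * peak)
    t1 = next(i for i, v in enumerate(envelope) if v >= stop_frac * peak)
    # Guard against a zero-length attack, which would give log10(0).
    return math.log10(max(t1 - t0, 1) / sample_rate)


def temporal_centroid(envelope, sample_rate):
    """Envelope-weighted mean time, in seconds."""
    total = sum(envelope)
    return sum(i * v for i, v in enumerate(envelope)) / total / sample_rate


# Triangular envelope: 100-sample linear attack, 100-sample linear decay, 1 kHz rate.
envelope = [i / 100 for i in range(101)] + [1 - i / 100 for i in range(1, 101)]
lat = log_attack_time(envelope, sample_rate=1000)
centroid = temporal_centroid(envelope, sample_rate=1000)
```

Two sounds with the same attack time but different decays share `lat` and differ in `centroid`, which is why the temporal centroid is useful when attack times are identical.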
Low-level Features (details)
• Timbral Spectral: (spectral features in a linear-frequency
space)
– SpectralCentroid:
• power-weighted average of the frequency
of the bins in the linear power spectrum.
• distinguishing musical instrument timbres
– 4 Ds for harmonic regularly-spaced components of signals:
• HarmonicSpectralCentroid
• HarmonicSpectralDeviation
• HarmonicSpectralSpread
• HarmonicSpectralVariation
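As a sketch of the timbral spectral idea, the snippet below computes a power-weighted centroid over linear-frequency bins and, for a single frame of harmonic peaks, an amplitude-weighted centroid with a centroid-normalized spread. The harmonic formulas follow a commonly cited form and should be checked against ISO/IEC 15938-4; the function names and the peak-list input format are assumptions.

```python
import math


def spectral_centroid(freqs_hz, power):
    """Power-weighted average frequency of the bins of a linear power spectrum."""
    return sum(f * p for f, p in zip(freqs_hz, power)) / sum(power)


def harmonic_centroid_and_spread(peaks):
    """Amplitude-weighted centroid and normalized spread of harmonic peaks.

    `peaks` is a list of (frequency_hz, amplitude) pairs for one analysis frame.
    """
    total = sum(a for _, a in peaks)
    centroid = sum(f * a for f, a in peaks) / total
    energy = sum(a * a for _, a in peaks)
    deviation = math.sqrt(
        sum(a * a * (f - centroid) ** 2 for f, a in peaks) / energy)
    return centroid, deviation / centroid


# Three harmonics of a 220 Hz tone with decaying amplitudes.
peaks = [(220.0, 1.0), (440.0, 0.5), (660.0, 0.25)]
h_centroid, h_spread = harmonic_centroid_and_spread(peaks)
```

These are per-frame (instantaneous) values; a descriptor for a whole sound would aggregate them over its frames.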
Low-level Features (details)
• Spectral Basis: (low-dimensional projections of a spectral space to
aid compactness and recognition)
– AudioSpectrumBasis:
• a series of (time-varying / statistically independent) basis functions
derived from the singular value decomposition of a normalized
power spectrum.
– AudioSpectrumProjection:
• low-dimensional features of a spectrum after projection upon a reduced rank
basis.
– independent subspaces of a spectrum correlate strongly
with different sound sources.
– Provide more salience using less space.
• With Sound Classification and Indexing Description Tools.
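The basis-and-projection idea can be sketched with a rank-1 reduction. In place of a full singular value decomposition, this sketch uses power iteration to find only the leading right singular vector of a matrix of spectra (rows are frames), then projects each frame onto it. The spectrum normalization and any independence step are omitted, and the function names are assumptions.

```python
import math


def dominant_basis(frames, iterations=100):
    """Leading right singular vector of `frames` (rows = spectra) by power iteration."""
    n_bins = len(frames[0])
    v = [1.0 / math.sqrt(n_bins)] * n_bins
    for _ in range(iterations):
        # Apply X^T X to v without forming the matrix product explicitly.
        scores = [sum(x * w for x, w in zip(row, v)) for row in frames]
        v = [sum(s * row[j] for s, row in zip(scores, frames))
             for j in range(n_bins)]
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return v


def project(frames, basis):
    """One coefficient per frame: the spectrum's projection onto the basis vector."""
    return [sum(x * w for x, w in zip(row, basis)) for row in frames]


# Three frames sharing one spectral shape at different gains (a rank-1 example).
shape = [1.0, 2.0, 2.0]
spectra = [[gain * s for s in shape] for gain in (1.0, 3.0, 0.5)]
basis = dominant_basis(spectra)
coefficients = project(spectra, basis)
```

Here three 3-bin spectra collapse to three scalars plus one shared basis vector, which is the compactness argument made above.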
High-level audio Description Tools
(Ds and DSs)
• Exchange some generality for descriptive richness:
– a smaller set of audio features (as compared to visual
features) that may canonically represent a sound without
domain-specific knowledge.
• Audio Signature (DS)
• Musical Instrument Timbre
• Melody
• General Sound Recognition and Indexing
• Spoken Content
High-level audio Description Tools
(details)
• Audio Signature Description Scheme
– SpectralFlatness Ds
– a unique content identifier for the purpose of
robust automatic identification
– e.g. audio fingerprinting
High-level audio Description Tools
(details)
• Musical Instrument Timbre Description Tools
– HarmonicInstrumentTimbre Ds:
• LogAttackTime Descriptor
– PercussiveInstrumentTimbre Ds:
• SpectralCentroid Descriptor
High-level audio Description Tools
(details)
• Melody Description Tools:
– efficient, robust, and expressive melodic similarity
matching.
– MelodyContour Description Scheme:
• terse, efficient melody contour / rhythm
– MelodySequence Description Scheme:
• verbose, complete, expressive melody / rhythm.
• Interval encoding
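A terse contour can be sketched by quantizing the interval between adjacent notes into five levels: large or small steps up or down, or no change. The thresholds of 50 and 250 cents, the MIDI note input, and the function names are assumptions of this sketch; the normative quantization table is in ISO/IEC 15938-4.

```python
def contour_step(cents):
    """Quantize a signed interval in cents to one of -2, -1, 0, 1, 2."""
    size = abs(cents)
    if size < 50:
        level = 0
    elif size < 250:
        level = 1
    else:
        level = 2
    return level if cents >= 0 else -level


def melody_contour(midi_notes):
    """Contour of a note sequence given as MIDI note numbers (100 cents each)."""
    intervals = [100 * (b - a) for a, b in zip(midi_notes, midi_notes[1:])]
    return [contour_step(c) for c in intervals]


# C4 D4 E4 C4 C4 G4
contour = melody_contour([60, 62, 64, 60, 60, 67])
```

Because the contour discards exact interval sizes, it tolerates imprecise input such as a hummed query, at the cost of the detail that the more verbose sequence-style description keeps.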
High-level audio Description Tools
(details)
• General Sound Recognition and Indexing
Description Tools:
– SoundModel Description Scheme
– SoundClassificationModel Description Scheme
• a set of SoundModel DS -> multi-way classifier
– SoundModelStatePath Descriptor
• indices to states generated by a SoundModel of a
segment
– immediately applied to sound effects
– automatically index and segment sound tracks.
– Low -> mid -> high level analyses
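The relationship between a sound model, its state path, and a multi-way classifier can be sketched with hidden Markov models. This sketch assumes discrete-observation HMMs with strictly positive probabilities and uses Viterbi decoding. The two toy models, the symbol alphabet (0 = quiet, 1 = loud), and the function names are assumptions for illustration, and real systems would work on continuous spectral features.

```python
import math


def viterbi(model, observations):
    """Most likely state path and its log-probability for a discrete HMM."""
    start, trans, emit = model["start"], model["trans"], model["emit"]
    n_states = len(start)
    # score[s]: best log-probability of any path ending in state s.
    score = [math.log(start[s] * emit[s][observations[0]])
             for s in range(n_states)]
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s])
                         + math.log(emit[s][obs]))
        back.append(pointers)
    state = max(range(n_states), key=lambda s: score[s])
    best_score = score[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1], best_score


def classify(models, observations):
    """Multi-way classifier: pick the model label with the best Viterbi score."""
    return max(models, key=lambda label: viterbi(models[label], observations)[1])


UNIFORM = [[0.5, 0.5], [0.5, 0.5]]
MODELS = {
    "bark": {"start": [0.5, 0.5], "trans": UNIFORM,
             "emit": [[0.9, 0.1], [0.1, 0.9]]},
    "hum": {"start": [0.5, 0.5], "trans": UNIFORM,
            "emit": [[0.05, 0.95], [0.2, 0.8]]},
}
state_path, _ = viterbi(MODELS["bark"], [0, 1, 0, 1])
```

The list returned as `state_path` plays the role of the state-path descriptor (one state index per frame), and `classify` plays the role of the classification model built from a set of sound models.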
High-level audio Description Tools
(details)
• Spoken Content Description Tools:
– detailed description of words spoken within an
audio stream.
– indexing into and retrieval of an audio stream
– indexing of multimedia objects annotated with
speech.
• Recall of audio/video data by memorable spoken events.
– a character or person spoke a particular word
• Spoken Document Retrieval
– separate spoken documents
• Annotated Media Retrieval
– photograph retrieved using a spoken annotation
[Figure: Timbre Descriptor Estimation. Block diagram with the labels: Signal; signal envelope, LogAttackTime, TemporalCentroid; power spectrum, SpectralCentroid; sliding analysis window, STFT, harmonic peaks detection, f0; instantaneous HarmonicSpectralSpread, HarmonicSpectralCentroid, HarmonicSpectralDeviation, HarmonicSpectralVariation; z-1 delay.]


MPEG-7 Audio Amendment 2
will include extended functionality of audio metadata that is
complementary to the low-level audio descriptors in ISO/IEC 15938-4,
providing high-level description tools like chord pattern and rhythm
pattern, both of which support compact representation of timbre and rhythm.