Module 7 - Multimedia Information Retrieval
Uploaded by Aathmika Vijay

Module 7

MULTIMEDIA INFORMATION
RETRIEVAL

VIT - Dr.D.SARASWATHI 1
• Overview of Spoken Language Audio Retrieval
• Non-Speech Audio Retrieval
• Graph Retrieval
• Imagery Retrieval
• Video Retrieval

1. Overview of Spoken Language Audio Retrieval
• Spoken language audio retrieval combines speech recognition and text retrieval to let users search digitized audio-visual content that contains spoken language.

Basic Goal: Spoken Term Detection
• The fundamental objective is to identify the time spans in an
audio database where a specific query occurs. This process is
known as Spoken Term Detection.
• For instance, if a user searches for “US President,” the system
should locate relevant segments, including utterances
related to “Obama.”
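As a minimal illustration (the data and function names here are hypothetical), spoken term detection over a word-aligned ASR transcript can be sketched as an exact-match scan that returns time spans:

```python
# Toy spoken term detection: given word-level ASR output with timestamps,
# return the time spans where every word of the query appears in order.
def spoken_term_detection(transcript, query):
    """transcript: list of (word, start_sec, end_sec); query: string."""
    terms = query.lower().split()
    words = [w.lower() for w, _, _ in transcript]
    hits = []
    for i in range(len(words) - len(terms) + 1):
        if words[i:i + len(terms)] == terms:
            # Span runs from the start of the first matched word
            # to the end of the last matched word.
            hits.append((transcript[i][1], transcript[i + len(terms) - 1][2]))
    return hits

asr = [("the", 0.0, 0.2), ("us", 0.2, 0.5), ("president", 0.5, 1.1),
       ("spoke", 1.1, 1.5), ("today", 1.5, 1.9)]
print(spoken_term_detection(asr, "US President"))  # [(0.2, 1.1)]
```

A real system would also need to handle recognition errors and semantic matches (e.g. "Obama" for "US President"), which exact matching cannot capture.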

Speech Recognition Models
• The first step involves transcribing spoken content into text
using speech recognition techniques.
• Various models, such as RNN/LSTM and DNN, are employed
to achieve accurate transcription.

Text Retrieval
• Once transcribed, the spoken content becomes searchable.
• Text retrieval approaches are then used to search over the
transcriptions.
• The goal is to retrieve relevant content based on user
queries.
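Once transcripts exist, standard text-retrieval scoring applies. A minimal TF-IDF ranking sketch (toy transcripts, hypothetical helper name):

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Rank documents (e.g. ASR transcripts) by a simple TF-IDF score."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: in how many transcripts each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    ranked = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = sum(tf[t] * math.log(n / df[t])
                    for t in query.lower().split() if t in df)
        ranked.append((score, i))
    return sorted(ranked, reverse=True)  # best-matching transcript first

transcripts = ["the president gave a speech today",
               "weather report for the weekend",
               "the president visited a school"]
print(tfidf_rank("president speech", transcripts))
```

Production systems would use an inverted index and a tuned weighting scheme (e.g. BM25) rather than scoring every document.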

Lattices and Beyond
• To handle the inherent errors in speech recognition, we work
with lattices—graphical representations of multiple
recognition hypotheses.
• Lattices allow us to explore different recognition paths and
improve retrieval accuracy.
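The idea of searching all lattice paths, not just the 1-best transcript, can be sketched with a toy lattice (structure and probabilities invented for illustration). The score below is the total probability of paths containing the query term:

```python
# Toy word lattice: node -> list of (next_node, word, arc_probability).
lattice = {
    0: [(1, "recognize", 0.6), (1, "wreck a nice", 0.4)],
    1: [(2, "speech", 0.7), (2, "beach", 0.3)],
    2: [],  # final node
}

def term_posterior(lattice, start, term):
    """Sum the probabilities of all lattice paths that contain the term."""
    total = 0.0
    stack = [(start, 1.0, False)]  # (node, path probability, term seen?)
    while stack:
        node, p, seen = stack.pop()
        if not lattice[node]:          # reached a final node
            total += p if seen else 0.0
            continue
        for nxt, word, ap in lattice[node]:
            stack.append((nxt, p * ap, seen or term in word.split()))
    return total

print(term_posterior(lattice, 0, "speech"))  # ≈ 0.7
```

Even if the 1-best path were "wreck a nice beach", the lattice still assigns substantial probability to paths containing "speech", which is why lattice-based retrieval is more robust than searching a single transcript.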

Semantic Retrieval
• Going beyond basic term detection, we aim for semantic
retrieval of spoken content.
• This involves understanding context, speaker intent, and
deeper meaning within the audio.

Spoken Document Retrieval
Challenges and Directions
• Despite advances, speech recognition always produces
errors.
• Researchers are exploring new directions to enhance
retrieval performance and address challenges.

2. Non-Speech Audio Retrieval
• While speech audio retrieval focuses on transcribing spoken
language, non-speech audio retrieval deals with identifying
and retrieving other types of audio content.

Importance of Non-Speech Audio Retrieval
• Beyond speech, audio databases contain various non-speech
sounds, including music, environmental noise, and sound
effects.
• Retrieving relevant non-speech audio is crucial in fields
like music, movie/video production, and multimedia
content.

SoundFisher: A User-Extensible System
• Thom Blum et al. (1997) introduced SoundFisher, a user-extensible sound classification and retrieval system.
• SoundFisher draws from multiple disciplines to classify and retrieve non-speech audio.
• It allows users to define custom sound classes and extend the system's capabilities.

Acoustic Features for Indexing
• Similar to image indexing, where visual feature vectors are
used, non-speech audio retrieval employs acoustic features.
• These features capture properties such as duration,
loudness, pitch, and spectral characteristics.
• Acoustic feature vectors enable efficient indexing and
matching of non-speech audio segments.
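A minimal sketch of such a feature vector, computed here with NumPy on a synthetic test tone (the feature set and function name are illustrative, not SoundFisher's actual features):

```python
import numpy as np

def acoustic_features(signal, sr):
    """Small acoustic feature vector: duration, loudness (RMS),
    zero-crossing rate (a rough pitch proxy), and spectral centroid."""
    duration = len(signal) / sr
    rms = float(np.sqrt(np.mean(signal ** 2)))
    # Zero-crossing rate: crossings per sample (rough brightness/pitch cue).
    zcr = float(np.mean(np.abs(np.diff(np.sign(signal)))) / 2)
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1 / sr)
    centroid = float(np.sum(freqs * spectrum) / np.sum(spectrum))
    return np.array([duration, rms, zcr, centroid])

sr = 16000
t = np.arange(sr) / sr                  # one second of audio
tone = np.sin(2 * np.pi * 440 * t)      # 440 Hz test tone
print(acoustic_features(tone, sr))      # centroid lands near 440 Hz
```

Segments are then indexed by these vectors, and retrieval reduces to nearest-neighbor search in the feature space.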

Applications

• Music Retrieval: Identifying music tracks, genres, and artists in audio databases.
• Environmental Sound Retrieval: Locating specific sounds (e.g., birdsong, traffic, waves) in large audio collections.
• Sound Effects Retrieval: Finding relevant sound effects for multimedia production.

Challenges
• Non-speech audio can be highly diverse, making
classification and retrieval complex.
• Handling variations in recording quality, background noise,
and context is essential.

• Non-speech audio retrieval enriches our understanding of audio content beyond spoken language.

3. Graph Retrieval
• Graph-based models play a crucial role in information
retrieval, especially when dealing with complex and
interconnected data.
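A classic graph-based retrieval model is PageRank, which ranks interlinked documents by link structure. A compact sketch (graph and values invented for illustration; assumes every node has at least one outgoing link):

```python
# Toy PageRank: rank nodes of a directed graph (e.g. linked documents)
# by the stationary distribution of a random surfer.
def pagerank(graph, damping=0.85, iters=50):
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in graph.items():
            for m in outs:
                new[m] += damping * rank[n] / len(outs)
        rank = new
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # "c" receives the most link mass
```

Here "c" ranks highest because it is linked from both "a" and "b"; the same machinery underlies ranking in many graph-structured retrieval settings.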

4. Imagery Retrieval
• There are many real-world applications in which content-based image retrieval (CBIR) plays an important role. Some examples are medicine, forensics, security, and remote sensing.

• Content-Based Image Retrieval (CBIR) is a way of retrieving images
from a database. In CBIR, a user specifies a query image and gets the
images in the database similar to the query image. To find the most
similar images, CBIR compares the content of the input image to the
database images.
• More specifically, CBIR compares visual features such as shapes, colors, texture, and spatial information, and measures the similarity between the query image and the database images with respect to those features.
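One of the simplest content features is a color histogram. A hedged NumPy sketch (synthetic images, illustrative function names) comparing a query image against candidates by histogram intersection:

```python
import numpy as np

def color_histogram(image, bins=8):
    """Per-channel color histogram, flattened and normalized to sum to 1."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def histogram_intersection(h1, h2):
    """1.0 for identical normalized histograms, smaller = less similar."""
    return float(np.minimum(h1, h2).sum())

rng = np.random.default_rng(0)
query_img = rng.integers(0, 256, (32, 32, 3))   # stand-in for a real image
same = query_img.copy()
other = rng.integers(0, 256, (32, 32, 3))
hq = color_histogram(query_img)
print(histogram_intersection(hq, color_histogram(same)))   # 1.0
print(histogram_intersection(hq, color_histogram(other)))  # below 1.0
```

A real CBIR system would combine several such features (color, texture, shape) and rank the whole database by the resulting similarity scores.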

Text-Based Image Retrieval

Feature Extraction – Global

Feature Extraction – Local
• Local features describe visual patterns or structures identifiable in
small groups of pixels. For example, edges, points, and various image
patches.
• The descriptors used to extract local features consider the regions centered around the detected visual structures. These descriptors transform a local pixel neighborhood into a vector representation.
• One of the most widely used local descriptors is SIFT (Scale-Invariant Feature Transform). It consists of a keypoint detector and a descriptor, and it is invariant to image rotation and scale. However, it has drawbacks, such as its fixed-length vector encoding and its large memory footprint.
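Once local descriptors are extracted (SIFT's are 128-dimensional), images are compared by matching descriptors. A sketch of the standard nearest-neighbor matching with Lowe's ratio test, using synthetic descriptors in place of real SIFT output:

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Ratio test: accept a match only if the nearest descriptor in
    desc_b is clearly closer than the second nearest."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, k = np.argsort(dists)[:2]     # nearest and second nearest
        if dists[j] < ratio * dists[k]:
            matches.append((i, int(j)))
    return matches

rng = np.random.default_rng(1)
desc_b = rng.normal(size=(10, 128))      # descriptors from image B
# Image A shares three local structures with B (slightly noisy copies).
desc_a = desc_b[:3] + rng.normal(scale=0.01, size=(3, 128))
print(match_descriptors(desc_a, desc_b))  # [(0, 0), (1, 1), (2, 2)]
```

The number of surviving matches is then a natural similarity score between the two images.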
Deep Neural Networks

Similarity Measures
• Similarity measures quantify how similar a database image is
to our query image. The selection of the right similarity
measure has always been a challenging task.
• The structure of feature vectors drives the choice of the
similarity measure. There are two types of similarity
measures: distance measures and similarity metrics.

Distance
• A distance measure typically quantifies the dissimilarity of
two feature vectors. We calculate it as the distance between
two vectors in some metric space.
• Manhattan distance, Mahalanobis distance, and Histogram
Intersection Distance (HID) are some examples of distance
measure functions.
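Two of the named distances are simple to state directly (a minimal sketch on normalized histogram vectors):

```python
import numpy as np

def manhattan(x, y):
    """L1 distance: sum of absolute coordinate differences."""
    return float(np.sum(np.abs(x - y)))

def histogram_intersection_distance(h1, h2):
    """HID: 0 for identical normalized histograms; larger = more dissimilar."""
    return 1.0 - float(np.minimum(h1, h2).sum())

x = np.array([0.2, 0.5, 0.3])
y = np.array([0.3, 0.4, 0.3])
print(manhattan(x, y))                        # ≈ 0.2
print(histogram_intersection_distance(x, y))  # ≈ 0.1
```

Mahalanobis distance additionally weights coordinates by the inverse covariance of the feature distribution, which these two ignore.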

Similarity Metrics
• A similarity metric quantifies the similarity between two feature
vectors.
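The most common example is cosine similarity, which scores the angle between feature vectors rather than their magnitudes (a minimal sketch):

```python
import numpy as np

def cosine_similarity(x, y):
    """1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, 2 * a))                         # ≈ 1.0
print(cosine_similarity(a, np.array([3.0, 0.0, -1.0])))    # ≈ 0.0
```

Because it ignores magnitude, cosine similarity is a natural fit for normalized feature or embedding vectors.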

5. Video Retrieval

• Video retrieval is a fascinating field that involves searching for relevant videos based on user queries.

Objective of Video Retrieval
• The primary goal of video retrieval is to select a video
that corresponds to a given text query. Typically,
videos are returned as a ranked list of candidates,
scored using document retrieval metrics.
• Given a text query and a pool of candidate videos, the
task is to identify the most relevant video content.
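With an embedding-based system, this ranking step reduces to cosine similarity between the query embedding and each candidate video embedding. A toy sketch (the embeddings here are invented; a real system would produce them with text and video encoders such as those listed later):

```python
import numpy as np

def rank_videos(query_emb, video_embs):
    """Rank candidate videos by cosine similarity to a text query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    scores = v @ q                    # cosine similarity per video
    return np.argsort(-scores), scores  # best-first ranking, raw scores

query = np.array([1.0, 0.0, 1.0])
videos = np.array([[0.9, 0.1, 1.1],   # close to the query
                   [0.0, 1.0, 0.0],   # unrelated
                   [1.0, 0.5, 0.8]])
order, scores = rank_videos(query, videos)
print(order)  # video 0 ranked first
```

The ranked list is then evaluated with standard document retrieval metrics such as recall@k.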

Methods and Techniques
• Video retrieval can be classified into two main categories:
• Text-Based Video Retrieval: In this approach, users input
representative keywords or a single image (or a group of
images) to search for desired videos.
• Content-Based Video Retrieval: Here, the query is based on the actual content of the videos, such as visual features, audio, and other modalities.

• Deep learning techniques play a crucial role in video-text retrieval.
Some notable methods include:
• ECO (Efficient Convolutional Network for Online Video
Understanding): A network architecture that considers long-term
content while enabling fast per-video processing
• Mixture of Embedding Experts: Learning a text-video embedding
from incomplete and heterogeneous data
• Frozen in Time: A joint video and image encoder for end-to-end
retrieval
• CLIP4Clip: Transferring knowledge from the CLIP model to video-
language retrieval in an end-to-end manner
• CoCa (Contrastive Captioners are Image-Text Foundation Models):
Applying contrastive loss between unimodal image and text
embeddings for video retrieval
Datasets and Benchmarks:
• Researchers evaluate video retrieval models on various
datasets. Some popular ones include:
• Kinetics, ActivityNet, MSR-VTT, MSVD, HowTo100M, Charades-
STA, and more
• Subtasks within video retrieval include:
• Video-Text Retrieval, Video Grounding, Video-Adverb Retrieval,
and Replay Grounding
