
Bag-of-Words

Object Bag of ‘words’


Image matching
• Brute force approach:
  • 250,000 images → ~31 billion image pairs
    – at 2 pairs per second → 1 year on 500 machines
  • 1,000,000 images → 500 billion pairs
    – 15 years on 500 machines
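A quick back-of-the-envelope check of these figures in Python (a minimal sketch, using the slide's assumptions of 2 pairs per second on 500 machines):

# Back-of-the-envelope check of the brute-force numbers above.
def pairs(n):
    return n * (n - 1) // 2              # unordered image pairs

def years(n, pairs_per_sec=2, machines=500):
    seconds = pairs(n) / (pairs_per_sec * machines)
    return seconds / (3600 * 24 * 365)

print(pairs(250_000), years(250_000))      # ~31.2 billion pairs, ~1 year
print(pairs(1_000_000), years(1_000_000))  # ~500 billion pairs, ~15.9 years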
Image matching
• For city-sized datasets, fewer than 0.1% of image pairs actually match
• Key idea: only consider likely matches
• How do we know if a match is likely?
• Solution: use fast global similarity measures
  – For example, a bag-of-words representation
Object Bag of ‘words’
Origin 1: Texture recognition
[Figure: textures represented as histograms over a universal dictionary of texture elements]

Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Origin 2: Bag-of-words models
• Orderless document representation: frequencies of words from a dictionary [Salton & McGill, 1983]

[Figure: US Presidential Speeches Tag Cloud, http://chir.ag/phernalia/preztags/]
Origin 2: Bag-of-words models

(1) John likes to watch movies. Mary likes too.
(2) John also likes to watch football games.

Dictionary: {"John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5, "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10}

(1) → [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
(2) → [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
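A minimal Python sketch of how these two vectors are produced (the dictionary order is taken from the slide; the punctuation stripping is a simplification):

from collections import Counter

docs = ["John likes to watch movies. Mary likes too.",
        "John also likes to watch football games."]

# Dictionary from the slide: word -> vector position.
vocab = ["John", "likes", "to", "watch", "movies",
         "also", "football", "games", "Mary", "too"]

for doc in docs:
    counts = Counter(doc.replace(".", "").split())  # crude tokenization
    print([counts[w] for w in vocab])
# [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
# [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]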
Bag of words
face, flowers, building

Works pretty well for image retrieval, recognition, and matching
Images as histograms of visual words
• Inspired by ideas from text retrieval
  – [Sivic and Zisserman, ICCV 2003]

[Figure: histogram of visual word frequencies for an image]
Quiz: What is BoW for one image?
• A histogram of local feature vectors in an image
• A visual dictionary
• The feature vector of a local image patch
• A histogram of local features in the collection of images
Bag of features: outline
1. Extract features
2. Learn "visual vocabulary"
3. Quantize features using visual vocabulary
4. Represent images by frequencies of "visual words"

Quantize: approximate a value by one whose amplitude is restricted to a prescribed set of values.
1. Feature extraction
Detect patches [Mikolajczyk and Schmid '02; Matas, Chum, Urban & Pajdla '02; Sivic & Zisserman '03]
→ Normalize patch
→ Compute SIFT descriptor [Lowe '99]
Slide credit: Josef Sivic
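As a concrete illustration, a minimal sketch of this step using OpenCV's SIFT implementation (assumes OpenCV >= 4.4, where SIFT_create is available; the image file name is hypothetical):

import cv2

img = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
sift = cv2.SIFT_create()
# Detect interest points and compute a 128-D SIFT descriptor for each patch.
keypoints, descriptors = sift.detectAndCompute(img, None)
print(descriptors.shape)  # (number of keypoints, 128)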


2. Learning the visual vocabulary
[Figure: feature descriptors clustered in descriptor space; the cluster centers become the visual vocabulary]

Slide credit: Josef Sivic
K-means clustering
• Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k:

  D(X, M) = \sum_{k=1}^{K} \sum_{i \in \text{cluster } k} (x_i - m_k)^2

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
  – Assign each data point to the nearest center
  – Recompute each cluster center as the mean of all points assigned to it
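A minimal NumPy sketch of this algorithm (the convergence test and the handling of empty clusters are implementation choices, not part of the slide):

import numpy as np

def kmeans(X, K, iters=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # random init
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        new_centers = centers.copy()
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:          # keep old center if cluster is empty
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, labels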
K-means clustering

https://en.wikipedia.org/wiki/File:K-means_convergence.gif
From clustering to vector quantization
• Clustering is a common method for learning a visual vocabulary or codebook
  – Unsupervised learning process
  – Each cluster center produced by k-means becomes a codevector
  – Codebook can be learned on a separate training set
  – Provided the training set is sufficiently representative, the codebook will be "universal"
• The codebook is used for quantizing features
  – A vector quantizer takes a feature vector and maps it to the index of the nearest codevector in a codebook
  – Codebook = visual vocabulary
  – Codevector = visual word
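A minimal sketch of such a vector quantizer (assumes the codebook is an array of k-means centers, one row per codevector):

import numpy as np

def quantize(features, codebook):
    # Map each feature vector to the index of its nearest codevector,
    # i.e., to a visual word.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

# Usage: words = quantize(descriptors, centers)  # centers from k-means above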
Example visual vocabulary

Fei-Fei et al. 2005


Image patch examples of visual words

Sivic et al. 2005


3. Image representation
[Figure: histogram of codeword frequencies for an image]
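Putting steps 3 and 4 together, a minimal sketch of computing a BoW histogram for one image (the normalization at the end is one common choice, not mandated by the slide; assumes at least one descriptor):

import numpy as np

def bow_histogram(descriptors, codebook):
    # Assign each descriptor to its nearest codevector (visual word)...
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    # ...then count how often each word occurs in the image.
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()  # normalize so image size does not matter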
Large-scale image matching
• Bag-of-words models have been useful in matching an image to a large database of object instances

[Figure: 11,400 images of game covers (Caltech games dataset); how do I find this image in the database?]
Large-scale image search
• Build the database:
  – Extract features from the database images
  – Learn a vocabulary using k-means (typical k: 100,000)
  – Compute weights for each word
  – Create an inverted file mapping words → images
Weighting the words
• Just as with text, some visual words are more discriminative than others
  – "the", "and", "or" vs. "cow", "AT&T", "Cher"
• The larger the fraction of documents a word appears in, the less useful it is for matching
  – e.g., a word that appears in all documents is not helping us
TF (term frequency) - IDF (inverse document frequency) weighting
• Instead of computing a regular histogram distance, we'll weight each word by its inverse document frequency

• Inverse document frequency (IDF) of word j:

  IDF(j) = \log \frac{\text{number of documents}}{\text{number of documents in which } j \text{ appears}}
TF-IDF weighting
• To compute the value of bin j in image I:

  (term frequency of j in I) × (inverse document frequency of j)
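A minimal sketch of this weighting applied to a stack of BoW histograms (the maximum(..., 1) guard against words that appear in no document is an implementation choice):

import numpy as np

def idf(histograms):
    # histograms: (num_images, num_words) array of raw word counts.
    n_docs = len(histograms)
    doc_freq = (histograms > 0).sum(axis=0)     # documents containing word j
    return np.log(n_docs / np.maximum(doc_freq, 1))

def tfidf(histograms):
    # Bin j of image I becomes: (term frequency of j in I) * IDF(j).
    return histograms * idf(histograms)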


Inverted file
• Each image has ~1,000 features
• We have ~1,000,000 visual words
  → each histogram is extremely sparse (mostly zeros)
• Inverted file
  – mapping from words to documents
Inverted file
• Can quickly use the inverted file to compute similarity between a new image and all the images in the database
  – Only consider database images whose bins overlap the query image
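A minimal sketch of an inverted file and of querying it (scoring by the number of shared words is a crude stand-in; a real system would accumulate TF-IDF-weighted contributions):

from collections import defaultdict

def build_inverted_file(image_words):
    # image_words: {image_id: set of visual words present in that image}
    inverted = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            inverted[w].add(image_id)
    return inverted

def rank_candidates(query_words, inverted):
    # Touch only the database images whose bins overlap the query.
    hits = defaultdict(int)
    for w in query_words:
        for image_id in inverted[w]:
            hits[image_id] += 1
    return sorted(hits, key=hits.get, reverse=True)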
Spatial pyramid
• BoW disregards all information about the spatial layout of the features
• Fix: compute a histogram in each spatial bin

Slide credit: D. Hoiem
Spatial pyramid

[Lazebnik et al. CVPR 2006]


Slide credit: D. Hoiem
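A minimal sketch of a two-level spatial pyramid built on top of quantized features (keypoint coordinates are assumed available; the per-level weighting from Lazebnik et al. is omitted):

import numpy as np

def spatial_pyramid(xy, words, width, height, vocab_size, levels=2):
    # xy: (N, 2) keypoint positions; words: (N,) visual word indices.
    cells = []
    for level in range(levels):
        g = 2 ** level                       # g x g grid at this level
        col = np.minimum((xy[:, 0] * g / width).astype(int), g - 1)
        row = np.minimum((xy[:, 1] * g / height).astype(int), g - 1)
        for i in range(g):
            for j in range(g):
                in_cell = (row == i) & (col == j)
                cells.append(np.bincount(words[in_cell],
                                         minlength=vocab_size))
    return np.concatenate(cells)  # all per-cell histograms, concatenated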
