CVlecture 4

The document discusses computer vision techniques for object recognition, including a history of the challenges in recognition from the 1980s to present. It covers four important recognition problems (recognition, detection, segmentation, pose estimation), and describes datasets like PASCAL VOC and COCO that are used to evaluate recognition algorithms. Deep learning approaches have helped advance recognition by learning features from large amounts of labeled training data rather than relying on hand-crafted features.

Uploaded by

David B

COMPUTER VISION

Learning and higher-level computer vision

Andrew French
Today
We will start to look at some higher-level processing in computer vision
• “Higher level” = Understanding what we see in images.
• In future weeks we will look at tracking, which can also be higher-level
• To understand image content, we first need to know what is in an
image
• How to recognise objects

• Here we present a history in the lead up to deep learning…

http://cocodataset.org
PART 1
The recognition challenge
Recognition

Recognition is hard
• There is very large variation in the visual information
• Requires learning from “prior experience”
[An Invitation to 3D Vision Y. Ma, S. Soatto, J. Kosecka, S. Sastry]
Recognition

4 important recognition problems:


1. Recognition (identify the main (foreground) object in an image)
2. Detection (find the location of all objects)
3. Segmentation (assign all pixels to objects)
4. Pose (find the location of the object parts)
Detection
• Find the location of every object in the scene, reporting each
as a bounding box
Semantic Image Segmentation
• Process of partitioning the image into “meaningful” segments
• Group pixels based on “common” properties
• Recently: semantic image segmentation
Pose estimation
• We assume that the object class and the location of the object
(as a bounding box) are known
• The aim is to localise the object parts
Object recognition

• Recognition: “does the image contain any instances of a particular object class?” (cars, people, dogs, etc.)
• In the 1980s–90s: identify specific, known objects in an image
• Example: detect a particular stapler, screwdriver, etc.
Object recognition

• This problem was solved e.g. using SIFT [Lowe 2004]

• Observe that the objects (train, frog) in the image are allowed to vary in 3D
(e.g. scale, rotation) and can be partially occluded
• However, they are literally the same objects as the templates…
Object (class) recognition

• In the late 2000s/2010s: specific object recognition became object class recognition
• Large variation in object appearance (e.g. see the chairs above)
• Real world images with background clutter
• System needs to be robust to large variation in object pose, illumination and occlusion
The VOC Object Recognition Challenge
Classification Task:
• For each of the 20 object classes, predict the
presence/absence of at least one object of that class in a
test image
• Participants are required to provide a real-valued confidence
of the object’s presence for each test image so that a
precision/recall curve can be drawn

Detection Task:
• For each of the 20 classes, predict the bounding boxes of
each object of that class in a test image (if any), with
associated real-valued confidence.

Segmentation Task:
• For each test image, predict the object class of each pixel,
or “background” if the pixel does not belong to one of the 20
specified classes
The VOC Object Recognition Challenge
• A prediction (for an algorithm) is made in terms of “comparing”
a test image with a model for a particular object class.

Define
• True Positive (TP) = the algorithm makes a correct prediction
about the presence of an object in an image
• False Positive (FP) = the algorithm predicts the presence of an
object, but that object is not present in the image
• False Negative (FN) = the algorithm misses an object

(also see Evaluation tutorial later)

The VOC Object Recognition Challenge
• Assume that the algorithm is tested on N test images
• For these images we know the “Ground Truth” i.e. the
classes of all the objects in those images
• Hence, we can measure all true detections (TP), false
detections (FP), and missed detections (FN) for a
particular value of a detection threshold

For every threshold value measure

• Precision = TP/(TP+FP)
• Recall = TP/(TP+FN)

Performance Measure
• Draw Precision vs Recall Curve
The PASCAL Visual Object Classes Challenge: A Retrospective. Int J Comput Vis
(2015) 111:98–136
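The threshold sweep described above can be sketched in a few lines. This is a minimal illustration, not the official VOC evaluation code: the function names, the toy scores and the convention of returning precision 1.0 when nothing is predicted are all assumptions made for the example.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall at one confidence threshold.

    scores: one real-valued confidence per test image.
    labels: ground truth, True if the object class is present.
    """
    tp = fp = fn = 0
    for score, present in zip(scores, labels):
        predicted = score >= threshold
        if predicted and present:
            tp += 1          # correct detection
        elif predicted and not present:
            fp += 1          # false detection
        elif not predicted and present:
            fn += 1          # missed detection
    # Assumed convention: precision is 1.0 when no predictions are made.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pr_curve(scores, labels):
    """One (precision, recall) point per distinct score used as threshold."""
    thresholds = sorted(set(scores), reverse=True)
    return [precision_recall(scores, labels, t) for t in thresholds]


# Toy data (made up for illustration): five test images.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [True, False, True, True, False]
curve = pr_curve(scores, labels)
```

Lowering the threshold moves along the curve: recall can only rise, while precision typically falls as more false positives are admitted.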
Precision-recall curves
Recognition vs Detection
• Aeroplanes

• Best achievable result:

• Detection is much harder, but much more useful!
Other challenges
• COCO – Common Objects In
Context

• 330k images
• 1,500,000 object instances
• http://cocodataset.org/
Other challenges
• Image-Net
• 14,000,000 images
• 1000 object classes
• http://image-net.org/
ImageNet privacy update
• A 2021 research paper on
obfuscating people’s faces in
ImageNet
• Although people aren’t often the
focus of a category, they are in the
background
• By annotating and blurring faces in
ImageNet, the team demonstrate that
accuracy falls by only 0.68%, a very
small drop.
• Paves the way for privacy-aware
recognition
Reminder: Talk about papers from tutorial! ☺
An aside: How to collect the data?
E.g.
https://www.zooniverse.org/projects/meredithspalmer/snapshot-mountain-zebra/classify
Other recognition problems
• Detect fine-grained facial attributes
• Very fine representation of face: e.g. Bald, curly hair,
glasses, moustache, makeup, etc.
PART 2
Learning the recognition models
Object (Class) Recognition
Aim
• Build a model for recognising a specific object class
e.g. aeroplanes.
We need 3 things:
• Data:
• Images containing objects from that class and
images from all other classes
• Feature extraction:
• We will not work with pixels but with features
extracted from them
• Machine Learning:
• From the features extracted, we will learn a model
that recognises this particular object class
Data
We can use PASCAL VOC for this (20 classes)
• Each object is cropped out and rescaled to a fixed
resolution
• I_a, a = 1, …, A: images containing objects from that class
• I_d, d = 1, …, D: images from all other classes
Feature extraction

• Pixel intensities are not good features as they vary a lot depending on illumination and viewpoint
• Plus there are millions of pixels!
• Replace pixels with features extracted from them
• For all images compute f_a = f(I_a) in R^D, f_d = f(I_d) in R^D
• f() is a function for computing features
• This could be, e.g., HOG features (we’ll see an example
next, as these have been introduced earlier in the course),
but it could be many different kinds of features.
HOG Features
• Divide image into a grid of cells (e.g. 8x8)
• Compute edges and their orientation for every
pixel location
• Compute histogram of gradient orientations in
each cell
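The three steps above can be sketched as follows. This is a simplified illustration rather than a full HOG implementation: the 9 unsigned orientation bins, the magnitude-weighted votes and the central-difference gradient are assumptions borrowed from common practice, and block normalisation is left out entirely. The function name `cell_histograms` is made up for the example.

```python
import math


def cell_histograms(image, cell=8, bins=9):
    """Per-cell histograms of gradient orientation.

    image: list of rows of grey values (plain Python lists).
    Returns hists[cell_row][cell_col] = list of `bins` values.
    """
    h, w = len(image), len(image[0])
    rows, cols = h // cell, w // cell
    hists = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    bin_width = 180.0 / bins  # unsigned orientation: 0 to 180 degrees
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences give the edge strength and direction.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / bin_width), bins - 1)
            cy, cx = y // cell, x // cell
            if cy < rows and cx < cols:
                # Each pixel votes for its orientation bin, weighted by
                # gradient magnitude (an assumed, common choice).
                hists[cy][cx][b] += magnitude
    return hists


# Toy 16x16 image with a single vertical edge between columns 7 and 8.
image = [[0 if x < 8 else 1 for x in range(16)] for _ in range(16)]
hists = cell_histograms(image)
```

For the toy image, all the gradient energy falls into the first orientation bin of each cell, because the only edge is vertical and its gradient points horizontally.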
Inverting Features
Inverse HOG:
• From HOG features try to
reconstruct the image
• The amazingly-titled “HOGgles” ☺ (HOG Goggles)
• http://www.cs.columbia.edu/~vondrick/ihog/
• https://ieeexplore.ieee.org/document/6751109
More generally:
• https://www.robots.ox.ac.uk/~vedaldi/assets/pubs/mahendran15understanding.pdf
(a 2015 paper about inverting features, inc. CNNs)
Bags of Features
• Some features are obviously good
representations of some objects e.g.
HoGs and people
• Sometimes it’s not clear what
features should be used
• Bag of Features methods analyse
the large set of very specific features
generated by a training set of images
and identify a small set of useful,
more generic features
Origin 1: Bag-of-words models
• Orderless document representation: frequencies of words from a dictionary
• Repetition of words suggests importance?

US Presidential Speeches Tag Cloud

Origin 2: Texture recognition
Texture is characterized by the repetition of basic elements or textons
• For stochastic textures (sand, dirt etc), it is the identity of the textons, not their
spatial arrangement, that matters
Origin 2: Texture recognition
histogram

Universal texton dictionary

Bags of features for object recognition
• First, take a bunch of images
• Extract features, and build up a “dictionary” or “visual vocabulary”
– a list of common features
• Given a new image, extract features
• For each feature, find the closest visual word in the dictionary
• Build a histogram to represent the image
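The pipeline above can be sketched with toy 2-D feature vectors. This is an illustrative sketch only: the plain k-means loop with a naive "first k features" initialisation is an assumption (real systems typically cluster high-dimensional descriptors with a library implementation), and the names `build_vocabulary` and `bag_of_features` are made up for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(feature, words):
    """Index of the closest visual word to this feature."""
    return min(range(len(words)), key=lambda k: sq_dist(feature, words[k]))


def build_vocabulary(features, k, iterations=10):
    """Cluster training features into k visual words (plain k-means)."""
    words = [list(f) for f in features[:k]]  # naive initialisation (assumed)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for f in features:
            groups[nearest(f, words)].append(f)
        for j, group in enumerate(groups):
            if group:  # keep the old word if its cluster is empty
                words[j] = [sum(col) / len(group) for col in zip(*group)]
    return words


def bag_of_features(features, words):
    """Histogram of visual-word counts representing one image."""
    hist = [0] * len(words)
    for f in features:
        hist[nearest(f, words)] += 1
    return hist


# Toy training features pooled from "a bunch of images" (made-up values).
training = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
vocabulary = build_vocabulary(training, k=2)

# Features from a new image are mapped to their closest visual words.
new_image_features = [(0.2, 0.1), (9, 9), (10, 12)]
histogram = bag_of_features(new_image_features, vocabulary)
```

The histogram discards where each feature occurred in the image, which is the "orderless" property inherited from bag-of-words document models.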

face, flowers, building

Learning the visual vocabulary
[Figure: extracted features → clustering → visual vocabulary]

Slide credit: Josef Sivic

Viola-Jones Recognition
• Developed for face detection, but general
• Basic idea: slide a window across image and
evaluate a face model at every location
• Sliding window detector must evaluate tens of thousands of
location/scale combinations
• Faces are rare: 0–10 per image
• Key ideas
• Integral images for fast feature
evaluation
• Boosting for feature selection
• Attentional cascade for fast rejection
of non-face windows
Features
• Four basic types
• Easy to calculate
• The white areas are subtracted from the black ones
• A novel representation - the integral image - makes feature
extraction faster, and allows consideration of more features.
Integral Images
• The integral image computes a value at each pixel
(x,y) that is the sum of the pixel values above and to
the left of (x,y), inclusive
• This can quickly be computed in one pass through the image

• Cumulative row sum:
s(x, y) = s(x−1, y) + i(x, y)
• Integral image:
ii(x, y) = ii(x, y−1) + s(x, y)
where i(x, y) is the original image, with s(−1, y) = 0 and ii(x, −1) = 0
Integral Images
• Pixel values can be summed over arbitrary rectangles
quickly
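A minimal sketch of both ideas, using the same recurrences for s(x, y) and ii(x, y) as above. The function names and the inclusive-corner convention for rectangles are assumptions made for the example.

```python
def integral_image(img):
    """Integral image computed in a single pass.

    img: list of rows of pixel values, indexed img[y][x].
    ii[y][x] is the sum of all pixels above and to the left, inclusive.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        s = 0  # cumulative row sum s(x, y)
        for x in range(w):
            s += img[y][x]
            ii[y][x] = s + (ii[y - 1][x] if y > 0 else 0)
    return ii


def rect_sum(ii, x0, y0, x1, y1):
    """Sum of pixels in the rectangle with inclusive corners
    (x0, y0) top-left and (x1, y1) bottom-right: at most four lookups."""
    total = ii[y1][x1]
    if x0 > 0:
        total -= ii[y1][x0 - 1]       # strip to the left
    if y0 > 0:
        total -= ii[y0 - 1][x1]       # strip above
    if x0 > 0 and y0 > 0:
        total += ii[y0 - 1][x0 - 1]   # corner was subtracted twice
    return total


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
bottom_right_block = rect_sum(ii, 1, 1, 2, 2)  # 5 + 6 + 8 + 9 = 28
```

Because every rectangle costs the same handful of lookups regardless of its size, a rectangle feature is equally cheap to evaluate at any scale.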
Feature Extraction
• Features are extracted from sub windows of a sample image.
• The base size for a sub window is 24 by 24 pixels.
• Each of the four feature types is scaled and shifted across all possible
combinations
• In a 24 pixel by 24 pixel sub window there are ~160,000 possible features to be calculated.
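The order of magnitude can be checked by enumeration. The sketch below counts every scale and position of each base shape inside a 24 by 24 window. Which shapes to count is an assumption: the slide mentions four feature types, whereas this count treats both orientations of the two- and three-rectangle features separately, plus the four-rectangle feature, which is one common way of arriving at a figure near 160,000.

```python
def count_features(base_w, base_h, window=24):
    """Number of placements of one rectangle-feature shape in the window.

    The shape is scaled in whole multiples of its base width and height,
    and shifted to every position where it fits entirely inside the window.
    """
    count = 0
    for w in range(base_w, window + 1, base_w):
        for h in range(base_h, window + 1, base_h):
            count += (window - w + 1) * (window - h + 1)
    return count


# Base shapes as (width, height) in units of rectangles (assumed set).
shapes = {
    "two-rectangle, horizontal": (2, 1),
    "two-rectangle, vertical": (1, 2),
    "three-rectangle, horizontal": (3, 1),
    "three-rectangle, vertical": (1, 3),
    "four-rectangle": (2, 2),
}
counts = {name: count_features(w, h) for name, (w, h) in shapes.items()}
total = sum(counts.values())  # 162336
```

Each window has only 576 pixels, so the feature set is heavily overcomplete, which is why a selection step is needed.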
Feature Selection
• Faces are complex and variable – we need a lot of features to capture
all possible examples
• We can’t possibly use all 160,000

• Can we create a good classifier using just a small subset of all possible features?

• How to select such a subset?

• Boosting is a classification scheme that works by combining weak learners into a more accurate ensemble classifier
• A weak learner need only do better than chance

• Training consists of multiple boosting rounds

Boosting
• Need a training set of labelled (object/non-object) examples
• Start with all examples equally weighted
• Learn a series of recognition rules (classifiers)
• Re-weight examples so that an example misclassified by the nth classifier
becomes more important to the (n+1)th
Boosting

• No single rule/classifier can separate complex objects from complex backgrounds: but a combination can
Boosting

• Weights are determined automatically
• Details of the weight learning
algorithm are beyond scope of
this module
Viola-Jones Version
• Weak classifiers threshold a single feature x_i:
o(x_i) = 1 if x_i > thresh, −1 otherwise

• At each stage of boosting

• Given re-weighted data from previous stage
• Train all K (160,000) single-feature classifiers
• Select the single best classifier at this stage
• Combine it with the other previously selected classifiers
• Re-weight the data
• Learn all K classifiers again, select the best, combine, reweight
• Repeat until you have T classifiers selected
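The loop above can be sketched with single-feature threshold classifiers. The slides leave the weight-learning details out of scope, so the update rule used here is standard discrete AdaBoost, swapped in as an assumption rather than taken from the lecture. The `polarity` parameter (letting a stump fire on either side of its threshold) and all function names are likewise illustrative.

```python
import math


def stump_predict(x, feature, thresh, polarity):
    """Weak classifier: threshold a single feature, output +1 or -1."""
    return polarity if x[feature] > thresh else -polarity


def best_stump(X, y, weights):
    """Train every single-feature classifier and keep the one with the
    lowest weighted error on the current (re-weighted) data."""
    best = None
    for feature in range(len(X[0])):
        for thresh in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                err = sum(w for x, label, w in zip(X, y, weights)
                          if stump_predict(x, feature, thresh, polarity) != label)
                if best is None or err < best[0]:
                    best = (err, feature, thresh, polarity)
    return best


def adaboost(X, y, rounds):
    """Select `rounds` weak classifiers, re-weighting after each one."""
    n = len(X)
    weights = [1.0 / n] * n  # all examples equally weighted
    ensemble = []
    for _ in range(rounds):
        err, feature, thresh, polarity = best_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this classifier
        ensemble.append((alpha, feature, thresh, polarity))
        # Misclassified examples gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * label *
                                stump_predict(x, feature, thresh, polarity))
                   for x, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all selected weak classifiers."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score > 0 else -1


# Toy 1-D problem: positives sit in the middle, so no single threshold works.
X = [[1], [2], [3], [4], [5], [6]]
y = [-1, -1, 1, 1, -1, -1]
one_rule = adaboost(X, y, rounds=1)
three_rules = adaboost(X, y, rounds=3)
```

On the toy data a single stump must misclassify some examples, while three boosted stumps separate the classes, which mirrors the point that a combination of rules can succeed where no single rule does.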
Cascading Classifiers
• A 200-feature ensemble classifier can achieve 95% correct results
• Not good enough
• Learn simple (few feature) classifiers that can reject obviously non-face regions
• Focus effort (classifiers with more features) on harder regions
• Days to train, but very fast once built

Example two-feature stage 1 classifier.

Aim is to minimize false negatives.
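The early-rejection idea can be sketched as a chain of stages, cheapest first. In a real detector each stage would be a boosted classifier over rectangle features; the two scoring functions and thresholds below (mean brightness, then contrast) are hypothetical stand-ins chosen only to keep the example short.

```python
def cascade_classify(window, stages):
    """Run a window through the stages in order.

    stages: list of (score_fn, threshold) pairs, cheapest first.
    A window is rejected as soon as any stage scores it below threshold,
    so later (more expensive) stages only see the harder windows.
    """
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False  # rejected early
    return True  # survived every stage


calls = []  # records which stages were evaluated, for illustration


def stage1(window):
    """Hypothetical cheap stage: mean brightness."""
    calls.append("stage1")
    return sum(window) / len(window)


def stage2(window):
    """Hypothetical costlier stage: contrast (max minus min)."""
    calls.append("stage2")
    return max(window) - min(window)


# Early thresholds are set low so that true faces are rarely rejected
# (few false negatives), at the cost of passing on some non-faces.
stages = [(stage1, 0.2), (stage2, 0.5)]

blank = [0.0, 0.0, 0.0, 0.0]
textured = [0.1, 0.9, 0.5, 0.5]
blank_result = cascade_classify(blank, stages)        # stops after stage 1
textured_result = cascade_classify(textured, stages)  # runs both stages
```

Since most windows in an image are obviously not faces, most of them cost only one cheap stage, which is what makes the detector fast once built.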
Learned Features are Task-Specific
Face Detection
Learned Features are Task-Specific
Profile Detection
Results
Classical learning in Vision
The classic approach, then, applies learned operations to user-defined features

[Figure: image → feature vector (HoG, LBP, histograms, etc.) → SVM → “Not a root tip”]

1. Design/choose features
2. Design/choose a classifier
3. Train the classifier
Classical learning in Vision
• Designing features can become a trial and error process

Root tip Root crossover

• Learning will fail if the user limits it to the wrong features

• Some approaches try to reduce reliance on the user

• Bag of Words clusters the results of applying the user-defined set of
feature-detection operators to form a more generic visual vocabulary
• Viola-Jones selects from a much larger set of user-defined features
Deep Learning – the future?
• Deep learning does not use any pre-computed features
• Feature detection and classification are integrated
• Deep methods learn:
1. Which features are needed to make classification possible
2. How to do the classification given those features

[Figure: Input Image → Deep Network → “Not a root tip”]

Next time….!
Summary
• In previous lectures we have looked at segmentation and pixel-level information
• What does it mean to have higher level understanding of images?
• We looked at Recognition problems:
• Recognising the main object in an image
• Detecting all instances of an object
• Segmenting all pixels within an object (semantic segmentation)
• Pose: locating all components of an object
Note: although we talk about objects we really mean classes
E.g. happy faces versus sad faces, mountain bikes versus racing bikes?
