Visual Perception and Learning in the Open World

CISC3027 Special Topics in Computer and Information Science

Lecture 8: Recognition with More Sensors / Modalities (Part 2)

Instructor: Shu Kong


Email: [email protected]
Office: E11 4025
How do we specify THAT object so that a robot can get it for you?

We need some interaction: human-computer interaction (HCI)!

Detect object instances using language
What’s next?

• Visual signals alone are not enough.

• Real-world, open-world applications rely on multiple sensors, not just an RGB camera.
Multi-modality / multi-sensor

• RGB + Infrared
• RGB + depth
• RGB + LiDAR
• RGB + IMU
• RGB + language
RGB + Infrared for object detection

• Again, why use infrared?

Tempe, Arizona, Sunday night, March 19, 2018

No person?

A single-modal RGB sensor cannot capture all objects, e.g., under poor illumination.
Thermal cameras capture stronger signatures for objects that emit heat.
What about objects that do not emit heat? Let’s fuse modalities.

HDR (high-dynamic-range) scenes

• Can we use infrared only without RGB? Why (not)?

Beam splitter for spatial synchronization

How to fuse RGB+Infrared?

[1] Devaguptapu, et al. Borrow from Anywhere: Pseudo Multi-modal Object Detection in Thermal Imagery. CVPRW, 2019

Late fusion of RGB + Infrared

[Diagram: detections from modality x1 and modality x2 are combined by a fusion step; panels (a) Pooling, (b) NMS, (c) Average, (d) ProbEn]

(a) Naïve approach: pool the single-modal detections together.
This will likely produce overlapping detections.
(b) Remove overlapping detections with Non-Maximum Suppression (NMS).
This is a waste of information: the evidence from the suppressed modality is discarded.
(c) To fuse modalities rather than suppress one, let’s try averaging the scores (but the average can never exceed the more confident modality’s score).
Intuitively, a probabilistic approach to fusion should boost scores when modalities agree.
(d) We propose probabilistic ensembling (ProbEn), a non-learned approach derived from first principles.

Dalal & Triggs. “Histograms of oriented gradients for human detection”. CVPR, 2005.
Chen, Shi, Ye, Mertz, Ramanan, Kong. “Multimodal Object Detection via Probabilistic Ensembling”. ECCV, 2022
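
The heuristic fusion rules (a) to (c) can be sketched in a few lines. This is a minimal illustration, not code from the lecture or the paper: we assume each detection is an `(x1, y1, x2, y2, score)` tuple for a single class, and the helper names and the IoU threshold of 0.5 in `nms` are our own choices.

```python
from typing import List, Tuple

# Assumed detection format: (x1, y1, x2, y2, score) for one object class.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pool(rgb: List[Box], thermal: List[Box]) -> List[Box]:
    """(a) Pooling: concatenate the single-modal detections (duplicates remain)."""
    return rgb + thermal


def nms(dets: List[Box], iou_thresh: float = 0.5) -> List[Box]:
    """(b) Greedy NMS: keep the top-scoring box, drop boxes overlapping it, repeat."""
    remaining = sorted(dets, key=lambda d: d[4], reverse=True)
    kept: List[Box] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best, d) < iou_thresh]
    return kept


def average_score(s_rgb: float, s_thermal: float) -> float:
    """(c) Averaging: the result never exceeds max(s_rgb, s_thermal)."""
    return 0.5 * (s_rgb + s_thermal)


rgb_dets = [(10, 10, 50, 90, 0.7)]
thermal_dets = [(12, 11, 52, 92, 0.8), (100, 100, 140, 180, 0.6)]
pooled = pool(rgb_dets, thermal_dets)  # 3 boxes; the first two cover the same person
kept = nms(pooled)                     # the 0.7 RGB box is suppressed
print(kept)
print(average_score(0.7, 0.8))         # lower than the thermal score 0.8
```

NMS keeps only the thermal box, so the RGB evidence for the same person is thrown away; averaging keeps both but yields 0.75, below the thermal score of 0.8. Neither rewards the two modalities for agreeing, which is what (d) addresses.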
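
Probabilistic ensembling of RGB & Infrared detections
ProbEn is the optimal fusion strategy under the conditional independence assumption between modality x1 (RGB) and modality x2 (thermal):
$$p(x_1 \mid y) = p(x_1 \mid x_2, y)$$
Applying Bayes rule, then conditional independence, p(x1, x2 | y) = p(x1 | y) p(x2 | y):
$$p(y \mid x_1, x_2) = \frac{p(x_1, x_2 \mid y)\, p(y)}{p(x_1, x_2)} \propto p(x_1, x_2 \mid y)\, p(y) = \frac{p(x_1 \mid y)\, p(y) \cdot p(x_2 \mid y)\, p(y)}{p(y)} \propto \frac{p(y \mid x_1)\, p(y \mid x_2)}{p(y)}$$
The last step uses p(x_i | y) p(y) = p(y | x_i) p(x_i), where p(x_i) does not depend on y.

ProbEn:
• multiply the single-modal posteriors;
• divide by the class prior;
• re-normalize.

Class    Prior p(y)   RGB p(y|x1)   Thermal p(y|x2)   p(y|x1) p(y|x2) / p(y)   Re-norm
person   0.5          0.7           0.8               0.7*0.8 / 0.5 = 1.12     0.9
car      0.5          0.3           0.2               0.3*0.2 / 0.5 = 0.12     0.1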
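
The worked example above can be reproduced with a short sketch. The function name `proben_two` and the dictionary representation of class posteriors are our own illustrative choices, not an API from the paper.

```python
from typing import Dict


def proben_two(p_rgb: Dict[str, float], p_thermal: Dict[str, float],
               prior: Dict[str, float]) -> Dict[str, float]:
    """Two-modality ProbEn: p(y|x1,x2) ∝ p(y|x1) p(y|x2) / p(y)."""
    # Multiply the single-modal posteriors and divide by the class prior.
    unnorm = {k: p_rgb[k] * p_thermal[k] / prior[k] for k in prior}
    # Re-normalize so the fused posterior sums to 1.
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}


prior = {"person": 0.5, "car": 0.5}
p_rgb = {"person": 0.7, "car": 0.3}
p_thermal = {"person": 0.8, "car": 0.2}

fused = proben_two(p_rgb, p_thermal, prior)
print({k: round(v, 3) for k, v in fused.items()})  # {'person': 0.903, 'car': 0.097}
```

Because both modalities favour `person`, the fused score (1.12 / 1.24, about 0.903) exceeds both single-modal scores (0.7 and 0.8), whereas averaging would give 0.75.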


The optimal strategy to fuse is to sum logits


• Not to average softmax scores;
softmax posterior logits • Not to average logits;
exp(si[k]) • But to sum logits! sum logits
p (y=k | xi) =
∑𝑗 exp(si[j])
exp( s1[k] + s2[k] )
p (y=k | x1, x2) ∝
∝ exp(si[k]) p(y=k)
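
A minimal sketch comparing the three options on the person/car example, assuming each detector outputs raw class logits `s_i`. Here the logits are constructed as log-probabilities so that their softmax matches the table; the helper names are hypothetical.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: p[k] = exp(s[k]) / sum_j exp(s[j])."""
    m = max(logits)
    exps = [math.exp(s - m) for s in logits]
    z = sum(exps)
    return [e / z for e in exps]


def fuse_sum_logits(s1: List[float], s2: List[float],
                    prior: Optional[List[float]] = None) -> List[float]:
    """p(y=k|x1,x2) ∝ exp(s1[k] + s2[k]) / p(y=k); uniform prior if None."""
    fused = [a + b for a, b in zip(s1, s2)]
    if prior is not None:
        # Dividing by p(y=k) is subtracting log p(y=k) in logit space.
        fused = [f - math.log(p) for f, p in zip(fused, prior)]
    return softmax(fused)


# Logits whose softmax reproduces the example: RGB (0.7, 0.3), thermal (0.8, 0.2).
s_rgb = [math.log(0.7), math.log(0.3)]
s_thermal = [math.log(0.8), math.log(0.2)]

sum_logits = fuse_sum_logits(s_rgb, s_thermal)
avg_logits = softmax([(a + b) / 2 for a, b in zip(s_rgb, s_thermal)])
avg_softmax = [(a + b) / 2 for a, b in zip(softmax(s_rgb), softmax(s_thermal))]
print(round(sum_logits[0], 3), round(avg_logits[0], 3), round(avg_softmax[0], 3))
# 0.903 0.753 0.75
```

Averaging softmax scores or logits stays close to the single-modal scores, while summing logits boosts the class both modalities agree on, matching the 0.903 from the worked example.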

ProbEn handles missing modalities.
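
One way to see this: when only x1 is observed, the correct posterior is just p(y | x1), and the product rule reduces to exactly that when the missing term is dropped. The sketch below generalizes the two-modality rule to M modalities, p(y | x_1..x_M) ∝ ∏ p(y | x_i) / p(y)^(M-1); this generalization and the `None`-for-missing convention are our illustration, not code from the paper.

```python
from typing import Dict, List, Optional


def proben(posteriors: List[Optional[Dict[str, float]]],
           prior: Dict[str, float]) -> Dict[str, float]:
    """ProbEn over any number of modalities, assuming conditional independence:
    p(y | observed x_i) ∝ prod_i p(y|x_i) / p(y)^(n - 1), n = #observed modalities.
    A missing modality (None) is simply left out of the product."""
    observed = [p for p in posteriors if p is not None]
    if not observed:
        return dict(prior)  # no evidence at all: the posterior is the prior
    n = len(observed)
    unnorm = {}
    for k, p_k in prior.items():
        prod = 1.0
        for post in observed:
            prod *= post[k]
        unnorm[k] = prod / p_k ** (n - 1)
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}


prior = {"person": 0.5, "car": 0.5}
p_rgb = {"person": 0.7, "car": 0.3}
p_thermal = {"person": 0.8, "car": 0.2}

print(proben([p_rgb, p_thermal], prior))  # both modalities: person ≈ 0.903
print(proben([None, p_thermal], prior))   # RGB missing: the thermal posterior is returned
```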

Probabilistic late-fusion of RGB + Infrared
● ProbEn outperforms heuristic fusion methods, e.g., averaging and NMS.
● ProbEn still improves even when the conditional independence assumption does not hold.
● ProbEn with off-the-shelf detectors achieves a 26% relative improvement!

[Figure: Log-average miss rate on the KAIST dataset (lower is better) for RGB, Thermal, MidFusion, Pooling, NMS, average fusion, ProbEn, and ProbEn (3). Architecture sketches: single-modal RGB and thermal detectors (feature extractor, detector head, detections); MidFusion (RGB and thermal feature extractors, feature fusion, detector head); ProbEn (ensembling the detections of an RGB detector and a thermal detector).]

[Comparison with published methods: RPN+BDT [CVPRW 2017], TC-DET [ECCV 2020], IATDNN [InfoFusion 2019], IAF RCNN [PR 2019], CIAN [InfoFusion 2019], MSDS [BMVC 2018], AR-CNN [ICCV 2019], MBNet [ECCV 2020], MLPD [RA-L 2021], GAFF [WACV 2021] (0.65), ProbEn (3 w/ GAFF) (0.51).]
Chen, Shi, Ye, Mertz, Ramanan, Kong. “Multimodal Object Detection via Probabilistic Ensembling”. ECCV, 2022
Multi-modality / multi-sensor

• RGB + Infrared
• RGB + depth
• RGB + LiDAR
• RGB + IMU
• RGB + language
RGB + depth

• Why use depth?

RGB + depth for semantic segmentation

• Intuitively, we need to pay more attention to far (and therefore small) objects, which depth helps identify.