Digging Into Sample Assignment Methods For Object Detection
Hiroto Honda
- homepage: https://hirotomusiker.github.io/
from: [H1]
How Object Detection Works
Example of 2-stage Detector [H1][3]: Faster R-CNN [1] + Feature Pyramid Network [2]
Object Detectors Decomposed
[Figure: an object detector decomposed into backbone, neck, and RoI head]
- 1-stage (single-shot) detector: backbone + neck
- 2-stage detector: backbone + neck + RoI head
from: [H1]
Region Proposal Network
from: [H1]
[Figure: RPN input and a visualization of an objectness channel (corresponding to one of three anchors)]
Anchors
from: [H1]
Anchors on Each Grid Cell
from: [H1]
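The anchors placed on each grid cell can be sketched as follows. This is a minimal illustration, not any library's implementation; the scale and aspect-ratio values are placeholders, and the ratio is taken as w/h here:

```python
import itertools

def make_anchors(scales, ratios):
    """Anchor (w, h) pairs for one grid cell: every scale x aspect-ratio
    combination, keeping each anchor's area equal to scale**2 (ratio = w/h)."""
    anchors = []
    for s, r in itertools.product(scales, ratios):
        w = s * r ** 0.5
        h = s / r ** 0.5
        anchors.append((w, h))
    return anchors

# e.g. 2 scales x 3 ratios -> 6 anchors per grid cell
make_anchors([32, 64], [0.5, 1.0, 2.0])
```

The same (w, h) set is replicated at every grid cell of the feature map, centered on that cell.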
IoU (Intersection over Union) of boxes A and B:
IoU = |A ∩ B| / |A ∪ B|
[Figure: example box pairs with IoU = 0.95 (high overlap) and IoU = 0.15 (low overlap)]
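The IoU definition above can be written as a small function, a minimal sketch assuming boxes in [x1, y1, x2, y2] corner format:

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    # Intersection rectangle (empty if the boxes do not overlap)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes give IoU = 1, disjoint boxes give 0, and partial overlap falls in between.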
IoU Matrix for Anchor-GT Matching
The IoU matrix holds the IoU between every anchor (at each position) and every GT box. Each anchor is labeled by its maximum IoU over the GT boxes:
- foreground: IoU ≥ T1 (matched with a GT box)
- background: IoU < T2
- ignored: T2 ≤ IoU < T1
T1 and T2: predefined threshold values
from: [H1]
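The threshold rule above can be sketched as a small function. This is a simplified illustration (real RPN implementations also force-match the best anchor for each GT box, which is omitted here):

```python
def assign_anchors(iou_matrix, t1, t2):
    """Label each anchor from its max IoU over GT boxes.
    iou_matrix[a][g] = IoU(anchor a, GT box g).
    Returns (labels, matched_gt): label 1 = foreground, 0 = background,
    -1 = ignored (T2 <= IoU < T1); matched_gt[a] is the best GT index or None."""
    labels, matched_gt = [], []
    for ious in iou_matrix:
        best_gt = max(range(len(ious)), key=lambda g: ious[g])
        best_iou = ious[best_gt]
        if best_iou >= t1:
            labels.append(1)           # foreground: classification + regression loss
            matched_gt.append(best_gt)
        elif best_iou < t2:
            labels.append(0)           # background: classification loss only
            matched_gt.append(None)
        else:
            labels.append(-1)          # ignored: no loss
            matched_gt.append(None)
    return labels, matched_gt
```

With T1 = 0.7 and T2 = 0.3, an anchor whose best IoU is 0.8 becomes foreground, 0.4 is ignored, and 0.05 is background.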
Sample Assignment of RPN
Box Regression After Sample Assignment
Δx = (x - xa) / wa
Δy = (y - ya) / ha
Δw = log(w / wa)
Δh = log(h / ha)
from: [H1]
RPN learns relative size and location between GT boxes and anchors
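The four regression targets above can be computed directly, a minimal sketch assuming (center x, center y, width, height) box format:

```python
import math

def encode_box(gt, anchor):
    """Regression targets (dx, dy, dw, dh) from an anchor to its matched GT box.
    Both boxes are given as (center x, center y, width, height)."""
    x, y, w, h = gt
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa,      # dx: center offset, normalized by anchor width
            (y - ya) / ha,      # dy: center offset, normalized by anchor height
            math.log(w / wa),   # dw: log scale ratio
            math.log(h / ha))   # dh: log scale ratio
```

A perfectly matching anchor yields (0, 0, 0, 0); the log scale makes size errors symmetric for growing and shrinking.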
RetinaNet / EfficientDet
[Figure: RetinaNet: backbone features (C2-C5), FPN neck with lateral additions (P2-P7), RetinaNetHead (stem, cls_subnet -> cls_score)]
EfficientDet
[Figure: EfficientDet: input image (BGR, H, W), EfficientNet backbone (C2-C5), BiFPN neck (P2-P7), RetinaNetHead (stem, cls_subnet -> cls_score)]
Sample Assignment of RetinaNet and EfficientDet
Same as RPN, except that the number of anchors and the IoU threshold values differ [3]. Each anchor is labeled by its maximum IoU with the GT boxes:
- foreground: IoU ≥ T1 (matched with a GT box)
- background: IoU < T2
- ignored: T2 ≤ IoU < T1 [RetinaNet only]
T1 and T2: predefined threshold values
YOLO v1 / v2 / v3 / v4 / v5
[Figure: darknet53 backbone with YOLO Layers on P3-P5]
v3 assignment:
- each GT box is matched with its single max-IoU anchor (e.g. the anchor with IoU 0.98 in the figure), which becomes foreground
- ignored: IoU between prediction and GT > T1 (T1: predefined threshold value)
v4 / v5 assignment:
- foreground (v4: IoU > T1, v5: box w, h ratio < Ta): objectness = 1, regression target
- background (v4: IoU ≤ T1, v5: box w, h ratio ≥ Ta): objectness = 0, no regression loss
YOLOv5 assigns three feature points for one target center -> higher recall
Target assignment differs greatly between the YOLO versions, so which one is the best?
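The v5 shape-based rule mentioned above can be sketched as follows. This is a simplified illustration; the default Ta value is an assumption (YOLOv5's `anchor_t` hyperparameter is commonly 4.0), and the real implementation vectorizes this over all anchors:

```python
def v5_shape_match(gt_wh, anchor_wh, ta=4.0):
    """YOLOv5-style shape matching: an anchor is a candidate for a GT box
    when neither the width nor the height ratio exceeds the threshold Ta,
    in either direction (GT/anchor or anchor/GT)."""
    rw = gt_wh[0] / anchor_wh[0]
    rh = gt_wh[1] / anchor_wh[1]
    # Worst-case ratio across both axes and both directions
    ratio = max(rw, 1.0 / rw, rh, 1.0 / rh)
    return ratio < ta
```

Because the test is a ratio rather than an IoU, one GT box can match several anchors of similar shape, which raises recall.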
“Anchor-Free” Detectors
- Assign all the grid cells that fall into the GT box, but only at the appropriate scale
- A 'center-ness' score is additionally used to suppress low-quality predictions far from the GT center
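The center-ness score mentioned above (as defined in FCOS) can be computed from a location's distances to the four sides of its GT box:

```python
import math

def centerness(l, t, r, b):
    """FCOS center-ness from a location's distances to the GT box's
    left/top/right/bottom sides: 1 at the box center, approaching 0
    as the location nears an edge."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
```

Multiplying this score into the classification confidence down-weights predictions from cells far from the GT center, so NMS removes them.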
Objects as Points (CenterNet)
IoU threshold = mean(IoUs) + std(IoUs)
[Figure: anchors split into positives, background (negative), and ignored]
- multiple anchors can be assigned to one GT
- high recall, but low-quality positives are included
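The adaptive threshold above can be sketched in a few lines. A minimal sketch: the use of the population standard deviation (`statistics.pstdev`) is an assumption about which std convention is meant:

```python
import statistics

def adaptive_iou_threshold(ious):
    """Adaptive positive/negative boundary for one GT box:
    the mean plus the standard deviation of its candidate anchors' IoUs.
    Anchors whose IoU exceeds this value become positives."""
    return statistics.mean(ious) + statistics.pstdev(ious)
```

Because the threshold adapts per GT box, easy objects (tight IoU distribution) and hard objects (spread-out IoUs) each get a sensible number of positives instead of a single fixed cut-off.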
Conclusion