Summary: The talk discusses advancements in dynamic scene understanding for embodied autonomous agents, covering semantic segmentation, panoptic segmentation, and multi-object tracking. It introduces the SAL model for Lidar segmentation, which uses pseudo-labeling and text prompting to segment and classify objects in 3D, and the SeMoLi method for motion-inspired pseudo-labeling to improve 3D object detection using minimal labeled data.

Open-world Segmentation and Tracking in 3D

Laura Leal-Taixé | GTC | March 18-21, 2024


Towards Embodied Autonomous Agents

What is around me? How do they move? Where am I?



Dynamic Scene Understanding
The task: understand every pixel of a video.

The tasks can be arranged by the number of classes they cover, from a handful to thousands to the open world:

• Semantic segmentation: assign a class label (tree, car, person, road, …) to every pixel.
• Panoptic segmentation: additionally separate individual instances (person 1, person 2, person 3).
• Multi-object tracking & segmentation: associate instances over time.
• 4D panoptic segmentation: unify segmentation and tracking in space and time.

REAL-WORLD: covering more classes and more frames means annotating ever more data ("Annotating 100 similar frames… sure, some more data", "Let's get some data!").

PSEUDO-LABELS: instead of annotating everything manually, generate the labels automatically.
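To make the distinctions between these tasks concrete, here is a minimal illustrative sketch (not from the talk; the array shapes and class ids are assumptions) of how per-point labels could look for each task:

```python
import numpy as np

# Illustrative per-point label layouts for one Lidar/image sequence
# (hypothetical shapes and class ids, not any dataset's actual format).
N, T = 100_000, 10  # points per frame, number of frames

# Semantic segmentation: one class id per point (e.g. 0=road, 1=car, 2=person).
semantic = np.zeros(N, dtype=np.int32)

# Panoptic segmentation: class id plus instance id per point
# ("stuff" classes such as road keep instance id 0).
panoptic = np.zeros((N, 2), dtype=np.int32)

# Multi-object tracking & segmentation / 4D panoptic segmentation:
# the instance (track) id must stay consistent across all T frames.
panoptic_4d = [np.zeros((N, 2), dtype=np.int32) for _ in range(T)]
```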
Better Call SAL: Towards Learning to Segment Anything in Lidar

A. Osep et al. "Better Call SAL: Towards Learning to Segment Anything in Lidar." arXiv:2403.13129

Hello SAL, segment trash cans and fire hydrants.
Better Call SAL
(Towards) a Lidar foundation model

Inference: the SAL model takes a Lidar point cloud and text prompts, and predicts instances together with their semantics (1. car, 2. person, 3. road, …, C. traffic sign).

Training: Segment Anything in Lidar via Vision Foundation Model to Lidar distillation. Unlabeled camera and Lidar data are fed to the SAL pseudo-label engine, built on CLIP and SAM, which generates the labels used to train the SAL model.

Do we segment and classify any object by manually annotating large-scale data in 3D (as was done in 2D)? Or do we benefit from existing 2D annotations and models? We benefit from existing 2D annotations and models!


Overview of SAL
Training: pseudo-labeling and model

Segment Anything in Lidar: Vision Foundation Model to Lidar distillation. Unlabeled camera and Lidar data → SAL pseudo-label engine (CLIP + SAM) → SAL training with pseudo-labels (distillation) → SAL model.

• Pseudo-label engine: transfer from 2D foundation models to 3D labels
• SAL model: zero-shot segmentation via text prompting
The SAL model
segments and classifies.

Inputs: a Lidar point cloud and text prompts. Outputs: instances (1. car, 2. person, 3. road, …, C. traffic sign) and their semantics.

Architecture: a backbone feeds an instance decoder driven by object queries. For each query, the decoder predicts an objectness score and an instance mask (class-agnostic segmentation) as well as a CLIP token; matching the CLIP token against CLIP embeddings of the text prompts yields zero-shot classification.
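As a rough illustration of the zero-shot classification step, here is a minimal sketch assuming per-instance CLIP tokens and prompt embeddings are already available; the tensor shapes and helper name are illustrative, not SAL's actual API:

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(instance_clip_tokens, prompt_embeddings):
    """Match each predicted instance to its most similar text prompt.

    instance_clip_tokens: (num_instances, d) CLIP-space tokens from the decoder.
    prompt_embeddings: (num_prompts, d) CLIP text embeddings of prompts
        such as "car", "person", "road", "traffic sign".
    """
    inst = F.normalize(instance_clip_tokens, dim=-1)
    txt = F.normalize(prompt_embeddings, dim=-1)
    similarity = inst @ txt.T          # cosine similarities, (num_instances, num_prompts)
    return similarity.argmax(dim=-1)   # index of the best-matching prompt per instance

# Placeholder example; real inputs would come from the decoder and a CLIP text encoder.
labels = zero_shot_classify(torch.randn(5, 512), torch.randn(4, 512))
```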
How do we train such a segment anything model?
The Pseudo-Label Engine
Label transfer from 2D to 3D
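The general recipe for transferring 2D masks onto Lidar points might look like this minimal sketch, assuming a calibrated pinhole camera and a 2D instance-id image (e.g. from SAM); the variable names and pipeline details are assumptions, not the paper's implementation:

```python
import numpy as np

def transfer_masks_to_points(points_xyz, K, T_cam_from_lidar, masks_2d):
    """Give each Lidar point the id of the 2D instance mask it projects into.

    points_xyz: (N, 3) Lidar points.
    K: (3, 3) camera intrinsics; T_cam_from_lidar: (4, 4) extrinsics.
    masks_2d: (H, W) integer image of instance ids, 0 = background.
    """
    H, W = masks_2d.shape
    pts_h = np.c_[points_xyz, np.ones(len(points_xyz))]   # homogeneous coordinates
    cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]           # points in the camera frame
    in_front = cam[:, 2] > 0
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                           # perspective division
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    labels = np.zeros(len(points_xyz), dtype=np.int32)    # 0 = unlabeled
    labels[valid] = masks_2d[v[valid], u[valid]]
    return labels
```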
SAL in action
Zero-shot panoptic segmentation
SAL in action
Text prompting beyond class vocabularies

Hello SAL, segment streetcar.


SAL in action
Text prompting beyond class vocabularies

Hello SAL, segment store front.


SAL in action
Text prompting beyond class vocabularies

Hello SAL, segment curb.


Better Call SAL

Easily scalable: get data --> pseudo-label --> re-train the model.

There are still some instances the model will not recognize. Is there anything we can do?

Moving objects are the most critical ones… so why not find a solution for those?
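The scalability loop above could be expressed roughly as follows; `collect_unlabeled_scans`, `pseudo_label_engine`, and `train` are hypothetical placeholders, not the released code:

```python
def self_training_round(model, collect_unlabeled_scans, pseudo_label_engine, train):
    """One round of the scale-up loop: get data -> pseudo-label -> re-train."""
    scans = collect_unlabeled_scans()                  # new unlabeled camera + Lidar data
    pseudo_labels = [pseudo_label_engine(s) for s in scans]
    return train(model, scans, pseudo_labels)          # re-train on the new pseudo-labels
```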
SeMoLi: Motion-inspired Pseudo-Labeling for 3D Object Detection in Lidar

J. Seidenschwarz et al. "SeMoLi: What Moves Together Belongs Together." arXiv:2402.19463


Motion-inspired Pseudo-Labeling for 3D Object Detection in Lidar
Problem Statement

Inputs: labeled Lidar streams ("just a little bit") and unlabeled Lidar streams (lots).

Output: an object detector (for objects observed moving).

Our Method: SeMoLi
Segment Moving in Lidar for Pseudo-Labeling

Two stages: pseudo-label generation, then object detector training.

Segmenting moving objects with GNNs

• Create a graph where nodes are point trajectories and edges connect trajectories that might belong to the same object.
• Message passing: nodes and edges communicate and update their respective features.
• Classify edges as active or not, then apply correlation clustering and extract bounding boxes.
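A compact sketch of the graph-network idea, with simplified message passing and an edge classification head; the feature dimensions and update rules are illustrative assumptions rather than SeMoLi's exact architecture:

```python
import torch
import torch.nn as nn

class EdgeClassifier(nn.Module):
    """Toy graph network: nodes are point trajectories, edges are candidate
    "belongs to the same object" links, classified after message passing."""

    def __init__(self, node_dim=64, edge_dim=64):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * node_dim + edge_dim, edge_dim), nn.ReLU())
        self.node_mlp = nn.Sequential(nn.Linear(node_dim + edge_dim, node_dim), nn.ReLU())
        self.edge_head = nn.Linear(edge_dim, 1)  # score: edge active or not

    def forward(self, node_feats, edge_index, edge_feats, num_rounds=3):
        src, dst = edge_index  # (E,) source / target trajectory indices per edge
        for _ in range(num_rounds):
            # Edge update: combine both endpoint trajectories with the edge feature.
            edge_feats = self.edge_mlp(
                torch.cat([node_feats[src], node_feats[dst], edge_feats], dim=-1))
            # Node update: sum incoming edge messages, then mix with the node feature.
            agg = torch.zeros(node_feats.shape[0], edge_feats.shape[-1])
            agg.index_add_(0, dst, edge_feats)
            node_feats = self.node_mlp(torch.cat([node_feats, agg], dim=-1))
        # Edges scored positive would be kept; correlation clustering over the kept
        # edges groups trajectories into objects, and 3D boxes are fit per group.
        return self.edge_head(edge_feats).squeeze(-1)

# Example with random placeholder features: 10 trajectories, 30 candidate edges.
scores = EdgeClassifier()(torch.randn(10, 64), torch.randint(0, 10, (2, 30)), torch.randn(30, 64))
```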
Evaluation of SeMoLi
Object Detection and Cross-Dataset Generalization

Waymo validation set: [results showing the remaining gap between ground-truth labels and our pseudo-labels, and improvements over prior art]

Waymo -> Argoverse2 transfer: [cross-dataset generalization results]

Baseline (DBSCAN++): Najibi et al. "Motion Inspired Unsupervised Perception and Prediction in Autonomous Driving." ECCV 2022.
Take-home messages

Pseudo-labeling is a powerful tool for leveraging strong 2D foundation models for 3D tasks.

Geometric and 3D motion cues are still waiting to be fully explored!

Our goal is to open up possibilities in 3D without requiring labeled data (in 3D).
Questions?

Laura Leal-Taixé | GTC | March 18-21, 2024
