AI VIETNAM
All-in-One Course
Module 10 - Project
Multi-Task Learning
Nguyen Quoc Thai
Year 2023
1
Objectives
! Multi-task Learning for Computer Vision
[Diagram: three tasks (Task 1, Task 2, Task 3), each with its own training data and model; Feature-based MTL and Parameter-based MTL link the tasks to improve generalization]
2
Outline
Ø Introduction
Ø Deep Multi-Task Architectures
Ø Optimization Strategy
Ø Experiment
3
Introduction
! Single-Task Learning
Ø Image Classification
[Diagram: input image → MODEL (LeNet, ResNet, …) → Class: CAT]
4
Introduction
! Single-Task Learning
Ø Image Segmentation
[Diagram: input image → MODEL (UNet) → per-pixel label map, where 0 = background, 1 = DOG, 2 = CAT]
5
Introduction
! Single-Task Learning
Ø Object Detection
[Diagram: input image → detection MODEL (e.g., Faster R-CNN, YOLO) → DOG – 0.98, CAT – 0.87 with bounding boxes]
Assigns labels and bounding boxes to objects in the image
6
Introduction
! Single-Task Learning
[Diagram: single-task learning — each of Task 1, Task 2, Task 3 has its own training data and model, trained and generalized independently]
7
Introduction
! Multi-Task Learning
[Diagram: multi-task learning — Task 1, Task 2, Task 3 are trained jointly with one shared model that generalizes across all tasks]
8
Introduction
! Motivation
Ø Learning multiple tasks jointly with the aim of mutual benefit
Ø Improves generalization on other tasks
o Caused by the inductive bias provided by the auxiliary task
9
Introduction
! Multi-Task Learning
[Diagram: jointly trained tasks (Task 1, Task 2, Task 3), raising two design questions]
Ø What to share?
Ø How to share?
10
Introduction
! MTL Methods (based on what to share?)
Ø Feature-based MTL
o Aims to learn common features among different tasks
Ø Parameter-based MTL
o Uses the model parameters learned for one task to help learn the parameters of other tasks
Ø Instance-based MTL
o Identifies data instances from one task that are useful for other tasks
11
Introduction
! MTL Methods (based on how to share?)
Ø Feature-based MTL
o Feature learning approach
o Deep learning approach
Ø Parameter-based MTL
o Low-Rank approach
12
Introduction
! Feature Learning Approach
Ø Why need to learn common feature representations?
o Original features may not have enough expressive power
Ø Two sub-categories
o Feature transformation approach
o Feature selection approach
13
Introduction
! Feature Learning Approach
Ø Feature transformation approach
o The learned features are a linear or nonlinear transformation of the original
feature representation
o Multi-task feedforward NN
[Diagram: a multi-task feedforward NN with inputs 1…d, shared hidden layers, and separate outputs for task 1 and task 2]
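A minimal PyTorch sketch of such a multi-task feedforward NN (layer sizes and names are illustrative assumptions, not taken from the slides): one shared hidden layer learns the common representation, and two task-specific heads produce the outputs for task 1 and task 2.

```python
import torch
import torch.nn as nn

class MultiTaskFeedforward(nn.Module):
    """Shared hidden layer + one output head per task (illustrative sizes)."""
    def __init__(self, in_dim=16, hidden_dim=64, out_dim_t1=10, out_dim_t2=1):
        super().__init__()
        # Shared transformation of the original d-dimensional features
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        # Task-specific output layers
        self.head_task1 = nn.Linear(hidden_dim, out_dim_t1)  # e.g. classification logits
        self.head_task2 = nn.Linear(hidden_dim, out_dim_t2)  # e.g. regression output

    def forward(self, x):
        h = self.shared(x)
        return self.head_task1(h), self.head_task2(h)

# Quick shape check
y1, y2 = MultiTaskFeedforward()(torch.randn(8, 16))
print(y1.shape, y2.shape)  # torch.Size([8, 10]) torch.Size([8, 1])
```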
14
Introduction
! Feature Learning Approach
Ø Feature selection approach
o Select a subset of the original features as the learned representation
o Eliminates useless features based on different criteria
15
Introduction
! Low-Rank Approach
Ø Assumes the model parameters of different
tasks share a low-rank subspace
16
Introduction
! Deep Learning Approach
Ø Deep Multi-Task Architectures
o Encoder-Focused
o Decoder-Focused
Ø Optimization Strategy Methods
o Task Balancing
o Other: Heuristics, Gradient Sign Dropout
17
Outline
Ø Introduction
Ø Deep Multi-Task Architectures
Ø Optimization Strategy
Ø Experiment
18
Deep Multi-Task Architectures
! Deep Multi-Task Architectures used in Computer Vision
Ø Encoder-Focused: MTL Baseline, Cross-Stitch Networks, NDDR-CNN, MTAN
Ø Decoder-Focused: PAD-Net, PAP-Net, MTI-Net
Ø Other: ASTMT
19
Deep Multi-Task Architectures
! Encoder-Focused
Ø Share the task features in the encoding stage
[Diagram: a shared encoder (soft/hard sharing) feeding task-specific heads for Task A, Task B, Task C]
20
Deep Multi-Task Architectures
! Encoder-Focused
Ø Hard Parameter Sharing
o Generally applied by sharing the hidden layers between all tasks
o Keep several task-specific output layers
[Diagram: hidden layers shared by all tasks, followed by task-specific output layers for Task A, Task B, Task C]
21
Deep Multi-Task Architectures
! Encoder-Focused
Ø Soft Parameter Sharing
o Each task has its own model with its own parameters
o Uses a linear combination in every layer of the task-specific networks
[Diagram: a separate network per task (Task A, Task B, Task C), with features combined across the networks at every layer]
22
Deep Multi-Task Architectures
! Encoder-Focused
Ø Cross-Stitch Networks
o Shares the activations amongst all single-task networks in the encoder
[Diagram: two single-task networks (Task A, Task B) whose activations are mixed by learned weights α at shared points]
23
Deep Multi-Task Architectures
! Encoder-Focused
Ø Cross-Stitch Networks
o Shares the activations amongst all single-task networks in the encoder
o Cross connection
[Diagram: cross-stitch units with learned weights α inserted between the Conv blocks of the Task A and Task B networks]
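A minimal sketch of a cross-stitch unit for two tasks, assuming a single learned 2×2 mixing matrix α applied to whole feature maps (whether α is shared across channels or learned per channel is an implementation choice):

```python
import torch
import torch.nn as nn

class CrossStitchUnit(nn.Module):
    """Mixes the activations of two task-specific networks with learned weights α."""
    def __init__(self):
        super().__init__()
        # 2x2 mixing matrix, initialized close to identity (mostly task-specific)
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1],
                                                [0.1, 0.9]]))

    def forward(self, feat_a, feat_b):
        # Linear combination of the two activation maps at the same layer
        out_a = self.alpha[0, 0] * feat_a + self.alpha[0, 1] * feat_b
        out_b = self.alpha[1, 0] * feat_a + self.alpha[1, 1] * feat_b
        return out_a, out_b

# Example: insert the unit between the Conv blocks of the Task A and Task B networks
stitch = CrossStitchUnit()
fa, fb = torch.randn(1, 32, 56, 56), torch.randn(1, 32, 56, 56)
fa, fb = stitch(fa, fb)
```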
24
Deep Multi-Task Architectures
! Encoder-Focused
Ø Multi-Task Attention Networks
o Uses a shared backbone network in conjunction with task-specific attention modules in the encoder
[Diagram: a shared encoder with a task-specific attention module attached at each stage, producing task-specific features for each task]
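A rough sketch in the spirit of MTAN's task-specific attention module; the real MTAN design also merges features from the previous attention stage, so this simplified version (which only gates the shared backbone features, with assumed layer choices) is illustrative only:

```python
import torch
import torch.nn as nn

class TaskAttentionModule(nn.Module):
    """Soft attention mask over shared encoder features; one module per task and stage."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),  # element-wise mask in [0, 1]
        )

    def forward(self, shared_feat):
        # Task-specific features = shared backbone features gated by the learned mask
        return self.attn(shared_feat) * shared_feat

# One attention module per task, all reading the same shared encoder features
shared_feat = torch.randn(2, 64, 32, 32)
task_feats = [TaskAttentionModule(64)(shared_feat) for _ in range(3)]
```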
25
Deep Multi-Task Architectures
! Decoder-Focused
[Diagram: a shared encoder (soft/hard) produces initial task-specific predictions for Task A, Task B, Task C, which are then refined by exchanging information across tasks in the decoding stage]
26
Deep Multi-Task Architectures
! Decoder-Focused
Ø PAD-Net
o Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous
Depth Estimation and Scene Parsing
27
Deep Multi-Task Architectures
! Decoder-Focused
Ø PAD-Net
o Deep Multimodal Distillation
28
Outline
Ø Introduction
Ø Deep Multi-Task Architectures
Ø Optimization Strategy
Ø Experiment
29
Optimization Strategy
! Task Balancing Approaches
Ø Set a unique weight for each task
\mathcal{L}_{MTL} = \sum_i w_i \, \mathcal{L}_i
Ø Use SGD to minimize the objective
W_{shared} = W_{shared} - \gamma \sum_i w_i \frac{\partial \mathcal{L}_i}{\partial W_{shared}}
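A minimal sketch of this weighted objective in a PyTorch training step; the toy model, losses, and the fixed weights w_i are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Toy hard-sharing model: shared trunk + three task heads (sizes are illustrative)
class ToyMTL(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = nn.Linear(8, 32)
        self.heads = nn.ModuleList([nn.Linear(32, 1) for _ in range(3)])
    def forward(self, x):
        h = torch.relu(self.shared(x))
        return [head(h) for head in self.heads]

model = ToyMTL()
criterions = [nn.MSELoss() for _ in range(3)]
w = [1.0, 0.5, 0.25]  # fixed per-task weights w_i (hyperparameters)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.randn(16, 8)
targets = [torch.randn(16, 1) for _ in range(3)]

optimizer.zero_grad()
losses = [c(p, t) for c, p, t in zip(criterions, model(x), targets)]
total = sum(wi * li for wi, li in zip(w, losses))  # L_MTL = sum_i w_i * L_i
total.backward()  # gradient on the shared weights is the weighted sum of per-task gradients
optimizer.step()
```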
30
Optimization Strategy
! Uncertainty Weighting
Ø Use the homoscedastic uncertainty to balance the single-task losses
Ø Optimize the model weights W and noise parameters
\mathcal{L}(W, \sigma_1, \sigma_2) = \frac{1}{2\sigma_1^2} \mathcal{L}_1(W) + \frac{1}{2\sigma_2^2} \mathcal{L}_2(W) + \log \sigma_1 \sigma_2
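A minimal sketch of this loss with learnable noise parameters; following common practice (an implementation choice, not from the slide), the learnable variables are log σ² rather than σ for numerical stability:

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Homoscedastic-uncertainty weighting of two task losses."""
    def __init__(self):
        super().__init__()
        # Learn s_i = log(sigma_i^2) instead of sigma_i directly (more stable)
        self.log_var = nn.Parameter(torch.zeros(2))

    def forward(self, loss1, loss2):
        precision = torch.exp(-self.log_var)  # 1 / sigma_i^2
        # 1/(2*sigma1^2)*L1 + 1/(2*sigma2^2)*L2 + log(sigma1*sigma2)
        # (note: log(sigma1*sigma2) = 0.5 * sum_i log(sigma_i^2))
        return 0.5 * (precision[0] * loss1 + precision[1] * loss2 + self.log_var.sum())

# The noise parameters are optimized jointly with the model weights W
weighting = UncertaintyWeightedLoss()
total = weighting(torch.tensor(1.3), torch.tensor(0.7))
```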
31
Optimization Strategy
! Dynamic Weight Averaging (DWA)
Ø Learns to average task weighting over time by considering the rate of change of loss
for each task
w_k(t) = \frac{N \exp\left(r_k(t-1)/T\right)}{\sum_i \exp\left(r_i(t-1)/T\right)}, \qquad r_k(t-1) = \frac{\mathcal{L}_k(t-1)}{\mathcal{L}_k(t-2)}
where t is the training time, r_k is the relative loss change, N is the number of tasks, and T is a temperature controlling the softness of the task weighting
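A minimal sketch of computing the DWA weights from the loss history; using per-epoch average losses and equal weights for the first two epochs are conventions assumed here:

```python
import torch

def dwa_weights(loss_prev, loss_prev2, T=2.0):
    """Dynamic Weight Averaging.

    loss_prev  -- average loss of each task at epoch t-1
    loss_prev2 -- average loss of each task at epoch t-2
    Returns one weight per task; the weights sum to N (the number of tasks).
    """
    prev = torch.as_tensor(loss_prev, dtype=torch.float32)
    prev2 = torch.as_tensor(loss_prev2, dtype=torch.float32)
    r = prev / prev2                             # relative loss change r_k(t-1)
    return torch.softmax(r / T, dim=0) * len(r)  # N * exp(r_k/T) / sum_i exp(r_i/T)

# Tasks whose loss decreases more slowly get a larger weight
print(dwa_weights([0.8, 1.2, 0.5], [1.0, 1.1, 0.9]))
```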
32
Optimization Strategy
! Other methods
Ø Gradient Normalization
Ø Dynamic Task Prioritization
33
Quiz
34
Outline
Ø Introduction
Ø Deep Multi-Task Architectures
Ø Optimization Strategy
Ø Experiment
35
Experiment
! NYUD-v2 Dataset
36
Experiment
! Model
[Diagram: the two compared models — Hard Parameter Sharing (shared encoder with task-specific heads) and Soft Parameter Sharing (one network per task with cross-network connections), each handling Task A, Task B, Task C]
37
Experiment
! Code
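The course notebook itself is not reproduced here; below is a hedged sketch of the hard-parameter-sharing variant used in the experiment: a small shared convolutional encoder with two dense-prediction heads (e.g., semantic segmentation and depth on NYUD-v2). All layer choices and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HardSharingDenseModel(nn.Module):
    """Shared conv encoder + task-specific decoders (segmentation, depth)."""
    def __init__(self, num_classes=13):
        super().__init__()
        self.encoder = nn.Sequential(  # shared between all tasks
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

        def decoder(out_ch):  # task-specific decoder head
            return nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(32, out_ch, 4, stride=2, padding=1),
            )

        self.seg_head = decoder(num_classes)  # semantic segmentation logits
        self.depth_head = decoder(1)          # depth map

    def forward(self, x):
        feat = self.encoder(x)
        return self.seg_head(feat), self.depth_head(feat)

seg, depth = HardSharingDenseModel()(torch.randn(2, 3, 64, 64))
print(seg.shape, depth.shape)  # [2, 13, 64, 64] and [2, 1, 64, 64]
```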
38
Summary
Deep Multi-Task Architectures
Ø Encoder-Focused: MTL Baseline, Cross-Stitch Networks, NDDR-CNN, MTAN
Ø Decoder-Focused: PAD-Net, PAP-Net, MTI-Net
Ø Other: ASTMT
Optimization Strategy
Ø Task Balancing: Uncertainty Weighting, Gradient Normalization, DWA, DTP
39
Thanks!
Any questions?
40