0% found this document useful (0 votes)

15 views11 pages

Taskonomy: Disentangling Task Transfer Learning

The document presents a computational framework called Taskonomy that models the relationships among various visual tasks to enhance transfer learning. By establishing a taxonomic map of task transferability, the framework aims to reduce the need for labeled data and improve the efficiency of training models across related tasks. The authors propose a systematic approach to identify and exploit these relationships, ultimately facilitating the development of comprehensive perception systems.

Uploaded by

shriya221009

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views11 pages

Taskonomy: Disentangling Task Transfer Learning

Uploaded by

shriya221009

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

Taskonomy: Disentangling Task Transfer Learning

Amir R. Zamir1,2 Alexander Sax1∗ William Shen1∗ Leonidas Guibas1 Jitendra Malik2 Silvio Savarese1
1 2
Stanford University University of California, Berkeley
http://taskonomy.vision/

Abstract 3D
Edges

Do visual tasks have a relationship, or are they unre-

lated? For instance, could having surface normals sim-
plify estimating the depth of an image? Intuition answers
these questions positively, implying existence of a structure Point
Matching Reshading
Object Class.
(1000 class)
among visual tasks. Knowing this structure has notable val-
Curvature
ues; it is the concept underlying transfer learning and pro-
vides a principled way for identifying redundancies across 3D Edges
tasks, e.g., to seamlessly reuse supervision among related 2.5D Segm.
Normals
tasks or solve many tasks in one system without piling up Reshading Z-Depth Distance Semantic
Triplet Segm.
the complexity. Point
Matching
Cam. Pose

We proposes a fully computational approach for model-

Normals
ing the structure of space of visual tasks. This is done via
finding (first and higher-order) transfer learning dependen-
Cam. Pose
cies across a dictionary of twenty six 2D, 2.5D, 3D, and 2D Segm. (fixated)
Room
semantic tasks in a latent space. The product is a computa- 2D Keypoints Autoencoding Cam. Pose Layout
(non-fixated)
tional taxonomic map for task transfer learning. We study Denoising
Vanishing Pts
the consequences of this structure, e.g. nontrivial emerged
relationships, and exploit them to reduce the demand for
labeled data. We provide a set of tools for computing and Figure 1: A sample task structure discovered by the computational
task taxonomy (taskonomy). It found that, for instance, by combining the
probing this taxonomical structure including a solver users learned features of a surface normal estimator and occlusion edge detector,
can employ to find supervision policies for their use cases. good networks for reshading and point matching can be rapidly trained
with little labeled data.

1. Introduction X to Y when many pairs of (x, y) s.t. x ∈ X, y ∈ Y are

given as training data. This is usually referred to as fully su-
Object recognition, depth estimation, edge detection, pervised learning and often leads to problems being solved
pose estimation, etc are examples of common vision tasks in isolation. Siloing tasks makes training a new task or a
deemed useful and tackled by the research community. comprehensive perception system a Sisyphean challenge,
Some of them have rather clear relationships: we under- whereby each task needs to be learned individually from
stand that surface normals and depth are related (one is a scratch. Doing so ignores their quantifiably useful relation-
derivate of the other), or vanishing points in a room are use- ships leading to a massive labeled data requirement.
ful for orientation. Other relationships are less clear: how
Alternatively, a model aware of the relationships among
keypoint detection and the shading in a room can, together,
tasks demands less supervision, uses less computation, and
perform pose estimation.
behaves in more predictable ways. Incorporating such
The field of computer vision has indeed gone far without
a structure is the first stepping stone towards develop-
explicitly using these relationships. We have made remark-
ing provably efficient comprehensive/universal perception
able progress by developing advanced learning machinery
models [32, 4], i.e. ones that can solve a large set of tasks
(e.g. ConvNets) capable of finding complex mappings from
before becoming intractable in supervision or computation
∗ Equal. demands. However, this task space structure and its effects

13712
are still largely unknown. The relationships are non-trivial, supervised learning [20, 6, 15, 100, 17, 80], which are stud-
and finding them is complicated by the fact that we have ied across various fields [70, 91, 10]. We review the topics
imperfect learning models and optimizers. In this paper, most pertinent to vision within the constraints of space:
we attempt to shed light on this underlying structure and Self-supervised learning methods leverage the inherent
present a framework for mapping the space of visual tasks. relationships between tasks to learn a desired expensive one
Here what we mean by “structure” is a collection of com- (e.g. object detection) via a cheap surrogate (e.g. coloriza-
putationally found relations specifying which tasks supply tion) [65, 69, 15, 100, 97, 66]. Specifically, they use a
useful information to another, and by how much (see Fig. 1). manually-entered local part of the structure in the task space
We employ a fully computational approach for this pur- (as the surrogate task is manually defined). In contrast, our
pose, with neural networks as the adopted computational approach models this large space of tasks in a computational
function class. In a feedforward network, each layer succes- manner and can discover obscure relationships.
sively forms more abstract representations of the input con- Unsupervised learning is concerned with the redundan-
taining the information needed for mapping the input to the cies in the input domain and leveraging them for forming
output. These representations, however, can transmit statis- compact representations, which are usually agnostic to the
tics useful for solving other outputs (tasks), presumably if downstream task [6, 47, 18, 7, 30, 74]. Our approach is not
the tasks are related in some form [80, 17, 56, 44]. This is unsupervised by definition as it is not agnostic to the tasks.
the basis of our approach: we computes an affinity matrix Instead, it models the space tasks belong to and in a way
among tasks based on whether the solution for one task can utilizes the functional redundancies among tasks.
be sufficiently easily read out of the representation trained Meta-learning generally seeks performing the learning
for another task. Such transfers are exhaustively sampled, at a level higher than where conventional learning occurs,
and a Binary Integer Programming formulation extracts a e.g. as employed in reinforcement learning [19, 29, 26],
globally efficient transfer policy from them. We show this optimization [2, 79, 46], or certain architectural mecha-
model leads to solving tasks with far less data than learn- nisms [25, 28, 84, 62]. The motivation behind meta learn-
ing them independently and the resulting structure holds on ing has similarities to ours and our outcome can be seen as
common datasets (ImageNet [75] and Places [101]). a computational meta-structure of the space of tasks.
Being fully computational and representation-based, the Multi-task learning targets developing systems that can
proposed approach avoids imposing prior (possibly incor- provide multiple outputs for an input in one run [48, 16].
rect) assumptions on the task space. This is crucial because Multi-task learning has experienced recent progress and the
the priors about task relations are often derived from either reported advantages are another support for existence of a
human intuition or analytical knowledge, while neural net- useful structure among tasks [90, 97, 48, 73, 70, 48, 16, 94,
works need not operate on the same principles [60, 31, 38, 58, 9, 63]. Unlike multi-task learning, we explicitly model
43, 99, 85]. For instance, although we might expect depth the relations among tasks and extract a meta-structure. The
to transfer to surface normals better (derivatives are easy), large number of tasks we consider also makes developing
the opposite is found to be the better direction in a compu- one multi-task network for all infeasible.
tational framework (i.e. suited neural networks better). Domain adaption seeks to render a function that is de-
An interactive taxonomy solver which uses our model veloped on a certain domain applicable to another [42, 96,
to suggest data-efficient curricula, a live demo, dataset, and 5, 77, 50, 24, 34]. It often addresses a shift in the input do-
code are available at http://taskonomy.vision/. main, e.g. webcam images to D-SLR [45], while the task
is kept the same. In contrast, our framework is concerned
2. Related Work with output (task) space, hence can be viewed as task/output
adaptation. We also perform the adaptation in a larger space
Assertions of existence of a structure among tasks date among many elements, rather than two or a few.
back to the early years of modern computer science, e.g.
with Turing arguing for using learning elements [92, 95] 3. Method
rather than the final outcome or Jean Piaget’s works on
developmental stages using previously learned stages as We define the problem as follows: we want to max-
sources [71, 37, 36], and have extended to recent works [73, imize the collective performance on a set of tasks T =
70, 48, 16, 94, 58, 9, 63]. Here we make an attempt to actu- {t1 , ..., tn }, subject to the constraint that we have a limited
ally find this structure. We acknowledge that this is related supervision budget γ (due to financial, computational, or
to a breadth of topics, e.g. compositional modeling [33, 8, time constraints). We define our supervision budget γ to be
11, 21, 53, 89, 87], homomorphic cryptography [40], life- the maximum allowable number of tasks that we are willing
long learning [90, 13, 82, 81], functional maps [68], certain to train from scratch (i.e. source tasks). The task dictionary
aspects of Bayesian inference and Dirichlet processes [52, is defined as V=T ∪ S where T is the set of tasks which we
88, 87, 86, 35, 37], few-shot learning [78, 23, 22, 67, 83], want solved (target), and S is the set of tasks that can be
transfer learning [72, 81, 27, 61, 64, 57], un/semi/self- trained (source). Therefore, T − T ∩ S are the tasks that

3713
(I) Task-specific Modeling (II) Transfer Modeling (III) Task Affinity (IV) Compute Taxonomy
Layout Normals Reshading Layout Normals Reshading
Normalization

... ... 2D Segm.

2D Edges
ce 2D Segm. 3D Keypoints 2.5D Segm 2D Segm. 3D Keypoints 2.5D Segm
Denoising
Autoencoding
s pa
t AHP task affinities
tpu
Ou Layout
Cam. Pose
ose Vanishing Pts.
Binary Integer 2D Keypoints
(fix) Cam. Pose
) Program
e
ac ion
Jigsa
Jigs
Jigsaw (nonfix)
Distance
Sp ntat Normals In-painting
k
s s e Z-Depth Reshading
Ta epre Colorization
Egomotion
(r Scene Class.s.
3D Keypoints Occlusion Edges
e
p ac 1st Order
Matching Object Class. (100)

ts Task-specific 2nd Order Object Class. (1000)

0)
Semantic Segm.
pu 3rd Order
2.5D Segm.Randomm Proj.
In Frozen
Curvature

Figure 2: Computational modeling of task relations and creating the taxonomy. From left to right: I. Train task-specific networks. II. Train (first
order and higher) transfer functions among tasks in a latent space. III. Get normalized transfer affinities using AHP (Analytic Hierarchy Process). IV. Find
global transfer taxonomy using BIP (Binary Integer Program).

Query Image Surface Normals Eucl. Distance Object Class. Scene Class.
we want solved but cannot train (“target-only”), T ∩ S are
Top 5 prediction: Top 2 prediction:
the tasks that we want solved but could play as source too, sliding door living room
and S − T ∩ S are the “source-only” tasks which we may home theater, home theatre
studio couch, day bed
television room

china cabinet, china closet

not directly care about to solve (e.g. jigsaw puzzle) but can entertainment center

be optionally used if they increase the performance on T . Jigsaw puzzle Colorization 2D Segm. 2.5D Segm. Semantic Segm.
The task taxonomy (taskonomy) is a computationally
found directed hypergraph that captures the notion of task
transferability over any given task dictionary. An edge be-
tween a group of source tasks and a target task represents a
feasible transfer case and its weight is the prediction of its Vanishing Points 2D Edges 3D Edges 2D Keypoints 3D Keypoints
performance. We use these edges to estimate the globally
optimal transfer policy to solve T . Taxonomy produces a
family of such graphs, parameterized by the available su-
pervision budget, chosen tasks, transfer orders, and transfer
3D Curvature Image Reshading In-painting Denoising Autoencoding
functions’ expressiveness.
Taxonomy is built using a four step process depicted in
Fig. 2. In stage I, a task-specific network for each task in S
is trained. In stage II, all feasible transfers between sources
and targets are trained. We include higher-order transfers Cam. Pose (non-fixated) Cam. Pose (fixated) Triplet Cam. Pose Room Layout Point Matching
which use multiple inputs task to transfer to one target. In
stage III, the task affinities acquired from transfer function
performances are normalized, and in stage IV, we synthe-
size a hypergraph which can predict the performance of any
transfer policy and optimize for the optimal one. Figure 3: Task Dictionary. Outputs of 24 (of 26) task-specific networks
A vision task is an abstraction read from a raw image. for a query (top left). See results of applying frame-wise on a video here.
We denote a task t more formally as a function ft which
maps image I to ft (I). Our dataset, D, contains for each generalize to out-of-dictionary tasks. The more regular /
task t a set of training pairs (I, ft (I)), e.g. (image, depth). better sampled the space, the better the generalization. We
Task Dictionary: Our mapping of task space is done evaluate this in Sec. 4.2 with supportive results. For evalu-
via (26) tasks included in the dictionary, so we ensure they ation of the robustness of results w.r.t the choice of dictio-
cover common themes in computer vision (2D, 3D, seman- nary, see the supplementary material.
tics, etc) to the elucidate fine-grained structures of task Dataset: We need a dataset that has annotations for ev-
space. See Fig. 3 for some of the tasks with detailed def- ery task on every image. Training all of our tasks on exactly
inition of each task provided in the supplementary material. the same pixels eliminates the possibility that the observed
It is critical to note the task dictionary is meant to be transferabilities are affected by different input data peculiar-
a sampled set, not an exhaustive list, from a denser space ities rather than only task intrinsics. We created a dataset of
of all conceivable visual tasks. This gives us a tractable 4 million images of indoor scenes from about 600 build-
way to sparsely model a dense space, and the hypothesis is ings; every image has an annotation for every task. The
that (subject to a proper sampling) the derived model should images are registered on and aligned with building-wide

3714
...

Transfers Results (2k training images)

Ds→t Ground Task

Surface Normal
I Es 3rd order Input Truth Specific Reshade Layout 2D Segm. Autoenc. Scratch

Estimation
nd
2 order

Ground Task
Representation Es(I) Reshade Layout 2D Segm. Autoenc. Scratch

Segmentation
Transfer Function Input Truth Specific

Source Task Encoder Target Task Output

2.5D
Frozen (e.g., curvature) (e.g., surface normal)

Figure 4: Transfer Function. A small readout function is trained to map Figure 5: Transfer results to normals and 2.5D Segmentation from
representations of source task’s frozen encoder to target task’s labels. If 5 different source tasks. The spread in transferability among sources is
order> 1, transfer function receives representations from multiple sources. apparent. “Scratch” was trained from scratch without transfer learning.

meshes similar to [3, 98, 12] enabling us to programmati- Higher-Order Transfers: Multiple source tasks can
cally compute the ground truth for many tasks without hu- contain complementary information for solving a target task
man labeling. For the tasks that still require labels (e.g. (see examples in Fig 6). We include higher-order transfers
scene classes), we generate them using Knowledge Distil- which are the same as first order but receive multiple rep-
lation [41] from known methods [101, 55, 54, 75]. See the resentations in the input. Thus, our transfers are functions
supplementary material for full details of the process and D : ℘(S) → T , where ℘ is the powerset operator.
a user study on the final quality of labels generated using
As there is a combinatorial explosion in the number of
Knowledge Distillation (showing < 7% error).
feasible higher-order transfers (|T | × |S|k for k th
order),
3.1. Step I: Task-Specific Modeling we employ a sampling procedure with the goal of filtering
out higher-order transfers that are less likely to yield good
We train a fully supervised task-specific network for results, without training them. We use a beam search: for
each task in S. Task-specific networks have an encoder- transfers of order k ≤ 5 to a target, we select its 5 best
decoder architecture homogeneous across all tasks, where sources (according to 1st order performances) and include
the encoder is large enough to extract powerful represen- all of their order-k combination. For k ≥ 5, we use a beam
tations, and the decoder is large enough to achieve a good of size 1 and compute the transfer from the top k sources.
performance but is much smaller than the encoder. We also tested transitive transfers (s → t1 → t2 ) which
showed they do not improve the results, and thus, were not
3.2. Step II: Transfer Modeling include in our model (results in supplementary material).
Given a source task s and a target task t, where s ∈ S
and t ∈ T , a transfer network learns a small readout func- 3.3. Step III: Ordinal Normalization using Analytic
tion for t given a statistic computed for s (see Fig 4). The Hierarchy Process (AHP)
statistic is the representation for image I from the encoder We want to have an affinity matrix of transferabilities
of s: Es (I). The readout function (Ds→t ) is parameterized across tasks. Aggregating the raw losses/evaluations Ls→t
by θs→t minimizing the loss Lt : from transfer functions into a matrix is obviously problem-
h i atic as they have vastly different scales and live in different
Ds→t := arg min EI∈D Lt Dθ Es (I) , ft (I) , (1) spaces (see Fig. 7-left). Hence, a proper normalization is
θ
needed. A naive solution would be to linearly rescale each
where ft (I) is ground truth of t for image I. Es (I) may or row of the matrix to the range [0, 1]. This approach fails
may not be sufficient for solving t depending on the relation when the actual output quality increases at different speeds
between t and s (examples in Fig. 5). Thus, the performance w.r.t. the loss. As the loss-quality curve is generally un-
of Ds→t is a useful metric as task affinity. We train transfer known, such approaches to normalization are ineffective.
functions for all feasible source-target combinations. Instead, we use an ordinal approach in which the output
Accessibility: For a transfer to be successful, the latent quality and loss are only assumed to change monotonically.
representation of the source should both be inclusive of suf- For each t, we construct Wt a pairwise tournament matrix
ficient information for solving the target and have the in- between all feasible sources for transferring to t. The ele-
formation accessible, i.e. easily extractable (otherwise, the ment at (i, j) is the percentage of images in a held-out test
raw image or its compression based representations would set, Dtest , on which si transfered to t better than sj did (i.e.
be optimal). Thus, it is crucial for us to adopt a low-capacity Dsi →t (I) > Dsj →t (I)).
(small) architecture as transfer function trained with a small We clip this intermediate pairwise matrix Wt to be in
amount of data, in order to measure transferability condi- [0.001, 0.999] as a form of Laplace smoothing. Then we
tioned on being highly accessible. We use a shallow fully divide Wt′ = Wt /WtT so that the matrix shows how many
convolutional network and train it with little data (8x to times better si is compared to sj . The final tournament ratio
′
120x less than task-specific networks). matrix is positive reciprocal with each element wi,j of Wt′ :

3715
Autoencoding
Image GT (Normals) Fully Supervised Image GT (Reshade) Fully Supervised Object Class. (1000)
Scene Class
Curvature
Denoising
2D Edges
Occlusion Edges
Egomotion
Cam. Pose (fix)
2D Keypoint
3D Keypoint
Cam. Pose (nonfix)
Matching
nd nd
Reshading
{Occlusion Edges + Curvature } = 2 order transfer { 3D Keypoints + Surface Normals } = 2 order transfer Z-Depth
Distance
Normals
Layout
2.5D Segm.
2D Segm.
Semantic Segm.
Vanishing Pts.

2.5D Se ts.
Po ose m.

.
m. . P Seg .

Va ntic Segm.
t C pe j.

nis eg .
sha roj.

Tas hin m.
pec ts.
De rvatu g

hin on
Dis epth
E No nce
no re

Z-Dding

Va gomrmals
clu 2D Eising

nti Cla g

C Jig ting
2Dion Edges

Se Scenatchin0)

No tance

ific
3D Key dges

In-ose ( on

Z-Dding
(no (fix)

In- c Se ss

in )

2.5 Layoals
Ob Tas domation

Re m Ping
las Layfix)

Pain gm

Dis epth
Ca am D gm
Re ypoin t

Raoloriz saw

la od )
t C e C 0)

Curizati 0)

Ca Ego Edges

Ca 3D Key saw
se p o t
Colass ( lass
Ob Au ss ( ific

m. mo es

S m
2D Jigting

ndo h )
clu D ing

Se 2D Segut
M 100 t

(no int
sh a t

Ob S ss. (1 ing

D rvatuon
Oc 2enois re
Ke poin

poin
jec k-S Pro

Pa fix
s. ( o u

jec toe 100

Ra Matcnfix
Cu codin

2 gP

k-S g P
jec cen 00

lo 10

sio Edg
nis oti

rm
ta

c
n

Po ey
t C nc
ma e

D
n

K
toe

P
se

n
s

ma
Au

tC
Oc

m.
jec
Figure 6: Higher-Order Transfers. Representations can contain com-

Ob
plementary information. E.g. by transferring simultaneously from 3D Figure 7: First-order task affinity matrix before (left) and after (right)
Edges and Curvature individual stairs were brought out. See our publicly Analytic Hierarchy Process (AHP) normalization. Lower means better
available interactive transfer visualization page for more examples. transfered. For visualization, we use standard affinity-distance method
dist = e−β·P (where β = 20 and e is element-wise matrix exponential).
See supplementary material for the full matrix with higher-order transfers.

′ EI∈Dtest [Dsi →t (I) > Dsj →t (I)]

wi,j = . (2) the relative importance of each target task and ℓi specifying
EI∈Dtest [Dsi →t (I) < Dsj →t (I)]
the relative cost of acquiring labels for each task.
We quantify the final transferability of si to t as the cor- The BIP is parameterized by a vector x where each trans-
responding (ith ) component of the principal eigenvector of fer and each task is represented by a binary variable; x indi-
Wt′ (normalized to sum to 1). The elements of the principal cates which nodes are picked to be source and which trans-
eigenvector are a measure of centrality, and are proportional fers are selected. The canonical form for a BIP is:
to the amount of time that an infinite-length random walk on maximize cT x ,
Wt′ will spend at any given source [59]. We stack the prin- subject to Ax b
cipal eigenvectors of Wt′ for all t ∈ T , to get an affinity
matrix P (‘p’ for performance)—see Fig. 7, right. and x ∈ {0, 1}|E|+|V| .
This approach is derived from Analytic Hierarchy Pro- Each element ci for a transfer is the product of the im-
cess [76], a method widely used in operations research to portance of its target task and its transfer performance:
create a total order based on multiple pairwise comparisons. ci := rtarget(i) · pi . (3)
Hence, the collective performance on all targets is the sum-
3.4. Step IV: Computing the Global Taxonomy mation of their individual AHP performance, pi , weighted
by the user specified importance, ri .
Given the normalized task affinity matrix, we need to
devise a global transfer policy which maximizes collective Now we add three types of constraints via matrix A to
performance across all tasks, while minimizing the used su- enforce each feasible solution of the BIP instance corre-
pervision. This problem can be formulated as subgraph se- sponds to a valid subgraph for our transfer learning prob-
lection where tasks are nodes and transfers are edges. The lem: Constraint I: if a transfer is included in the subgraph,
optimal subgraph picks the ideal source nodes and the best all of its source nodes/tasks must be included too, Con-
edges from these sources to targets while satisfying that straint II: each target task has exactly one transfer in, Con-
the number of source nodes does not exceed the supervi- straint III: supervision budget is not exceeded.
sion budget. We solve this subgraph selection problem us- Constraint I: For each row ai in A we require ai · x ≤ bi ,
ing Boolean Integer Programming (BIP), described below, where 
which can be solved optimally and efficiently [39, 14]. |sources(i)| if k = i

Our transfers (edges), E, are indexed by i with the form ai,k = −1 if (k − |E|) ∈ sources(i) (4)

0
({si1 , . . . , simi }, ti ) where {si1 , . . . , simi } ⊂ S and ti ∈ T . otherwise
We define operators returning target and sources of an edge: bi = 0. (5)
sources Constraint II: Via the row a|E|+j , we enforce that each
{si1 , . . . , simi }, ti 7−−−−−→ {si1 , . . . , simi }

target has exactly one transfer:
{si1 , . . . , simi }, ti
target
ti . a|E|+j,i := 2 · ✶{target(i)=j} , b|E|+j := −1. (6)

7−−−−→
Constraint III: the solution is enforced to not exceed the

Solving a task t by fully supervising it is denoted as {t}, t .
We also index the targets T with j so that in this section, i budget. Each transfer i is assigned a label cost ℓi , so
is an edge and j is a target. a|E|+|V|+1,i := ℓi , b|E|+|V|+1 := γ. (7)
The parameters of the problem are: the supervision bud- The elements of A not defined above are set to 0. The
get (γ) and a measure of performance on a target from each problem is now a valid BIP and can be optimally solved in
of its transfers (pi ), i.e. the affinities from P . We can also a fraction of a second [39]. The BIP solution x̂ corresponds
optionally include additional parameters of: rj specifying to the optimal subgraph, which is our taxonomy.

3716
Supervision Budget 2 Supervision Budget 8 Supervision Budget 15 Supervision Budget 26
Colo
orizaation 2D Segm. Vanishing Pts. Occlusion Edges Cam. Pose (nonfix)
Cam. Pose (fix) Normals
Egomotion
Jig
gsaw
w 2D Keypoints Denoising Occlusion Edges Matching Normals
Semantic Segm.
Jigsaw
Transfer Order 1
Cam. Pose (fix)
Reshading 2D Edges 3D Keypoints Normals
Reshading
Reshading
Distance
i
Curvature
Supervision Budget 8 - Order 4 (zoomed)
2.5D Segm. Distance Layout Reshading
Semantic Segm. Autoencoding Z-Depth 2D Segm.
Vanishing Pts. Semantic Z-Depth
2.5D Segm. In-ppaintting Egomotion
Distance Matching Layout Curvature Segm. Jiggsaw Matching
Egomotion
Z-Depth Egomotion Z-Depth 2D Edges
Cam. Pose (nonfix) Semantic Segm. Randdom Proj
o.
Coloorizaation Denoising
Occlusion Edges Normals 3D Keypoints Rando
om Proj
o.
3D Keypoints Vanishing Pts. 2D
2D Edges Random Proj
Object Class. Vanishing o. 2.5D Segm. Autoencoding Segm. 2D
(1000)
Pts. 2.5D Segm.
Cam. Pose (fix)
Scene Class. Edges
Cam. Pose (nonfix) Cam. Pose (fix) In-painting Autoencoding 3D Keypoints
Scene Class. Scene Class. Collorization In-ppainttingg Layout Denoising
Autoencoding
g Object Class. 2D Segm. 2D Keypoints Autoencoding
Curvature Object Class. (1000) Jig
gsaw
w Denoising
Layout (1000) 2D Edges Cam. Pose (nonfix)
In-paintting
g 2D Segm. Scene Class. 2D Keypoints Colorizattion Occlusion Edges Object Class. (1000)
Denoising Curvature
Rando
om Proj
o. 2D Keypoints Matching Distance

Colorrizattion Denoising Matching Occlusion Edges

Autoencoding Autoencoding 2D Keypoints 2D Edges
2D Segm. Occlusion Edges Distance Layout
In-painting
3D Keypoints
2D Keypoints Cam. Pose
Transfer Order 2

Normals
Z-Depth 2D Edges 2D Edges Semantic Segm. 2D Keypoints (fix) Vanishing Pts.
2D Keypoints Semantic Segm. Scene Class. Normals Jiigsaaw Egomotion
Cam. Pose (fix) Cam. Pose Curvature Cam. Pose
Scene Class. 2.5D Segm. Layout
Distance 2D Segm. Colo
orizaation (fix) Curvature 2.5D Segm.
3D Keypoints Object Class. Cam. Pose (nonfix)
Jigssaw (nonfix)
(1000) Rando
om Proj
o. Randomm Proj
o. 3D Keypoints
Z-Depth Coloorizaation Cam. Pose (fix)
Denoising Curvature
Layout
Distance
Normals Semantic Egomotion Reshading 2D Segm.
Normals In-painting
2.5D Segm. Matching Object Class. (1000)
Cam. Pose (nonfix) Semantic
Segm. Distance
Scene Class. Distance . Jiigsaw
w Segm. Z-Depth
Reshading Occlusion Edges Cam. Pose (fix) Z-Depth Vanishing Pts. 2D Edges In-painting Scene Class. Reshading
2.5D Segm. 2D Keypoints In-ppaintting
Egomotion Normals Curvature Autoencoding Reshading
Colorization
Object Class. Cam. Pose (nonfix) Layout Vanishing Pts. Layout Matching Z-Depth Egomotion
(1000) Randdom Proj
o. Reshading Cam. Pose (nonfix)
Vanishing Pts. Jigsaaw 3D Keypoints Denoising
Coloorizaation Vanishing Pts.
Scene Class.
Egomotion Occlusion Edges 2D Segm.
In-ppainnting Object Class.
Jiigsaw
w Matching Denoising Autoencoding (1000) Randdom Proj
o.
3D Keypoints Occlusion Edges
2D Segm. Jiggsaaw Object Class.
2D Edges
In-paintting 2D Segm. (1000)
Matching
Colorrizattion Denoising Semantic
Autoencoding Cam. Pose (fix)
Autoencoding Object Class.
Egomotion In-ppainnting
Layout Jiigsaw
w Segm.
Cam. Pose (fix) Reshading Denoising (1000)
Transfer Order 4

2D Edges Z-Depth
Z-Depth
Semantic Segm.
2D Edges
Distance Egomotion
Vanishing Pts. 2.5D Random Proj
o.
2D Keypoints Cam. Pose (nonfix) 2D Segm. Cam. Pose (nonfix) Segm.
Cam. Pose (fix) Layout
Cam. Pose Vanishing Pts. Layout Denoising Curvature
Distance 2D Segm. 2D Keypoints Normals Matching Coloorizattion
(fix) Cam. Pose Autoencoding
3D Keypoints Vanishing Pts. Scene Class.
igsaaw Reshading
Denoising Curvature (nonfix) Autoencoding
Layout Distance Scene Class.
Normals Normals In-painting Z-Depth 2D Keypoints Ranndom Proj
o.
2.5D Segm. Matching Z-Depth Reshading 2D Edges Normals
Scene Class Coloorizaation 2D Keypoints
Colorization Occlusion Edges
Reshading Occlusion Edges Egomotion Distance
Scene Class. andoom Proj
o. Occlusion
Egomotion Matching Semantic Segm.
Object Class. Cam. Pose (nonfix) 3D Keypoints Occlusion Edges Edges
Semantic Segm.
(1000) Randdom Proj
o. Vanishing Pts. Matching 3D Keypoints Curvature
Semantic Segm. 2.5D Segm.
In-ppainnting Object Class. (1000) Curvature
Jiigsaw
w 2.5D Segm.Random Proj o.
Object Class. 3D Keypoints
Curvature 2.5D Segm.
(1000)

Figure 8: Computed taxonomies for solving 22 tasks given various supervision budgets (x-axes), and maximum allowed transfer orders (y-axes). One
is magnified for better visibility. Nodes with incoming edges are target tasks, and the number of their incoming edges is the order of their chosen transfer
function. Still transferring to some targets when tge budget is 26 (full budget) means certain transfers started performing better than their fully supervised
task-specific counterpart. See the interactive solver website for color coding of the nodes by Gain and Quality metrics. Dimmed nodes are the source-only
tasks, and thus, only participate in the taxonomy if found worthwhile by the BIP optimization to be one of the sources.

4. Experiments works are all trained using the same hyperparameters as the
task-specific networks, except that we anneal the learning
With 26 tasks in the dictionary (4 source-only tasks), our rate earlier since they train much faster. Detailed definitions
approach leads to training 26 fully supervised task-specific of architectures, training process, and experiments with dif-
networks, 22 × 25 transfer networks in 1st order, and 22 × ferent encoders can be found in the supplementary material.
25 th

k for k order, from which we sample according to the
Data Splits: Our dataset includes 4 million images. We
procedure in Sec. 3. The total number of transfer functions
made publicly available the models trained on full dataset,
trained for the taxonomy was ∼3,000 which took 47,886
but for the experiments reported in the main paper, we
GPU hours on the cloud.
used a subset of the dataset as the extracted structure stabi-
Out of 26 tasks, we usually use the following 4 as source- lized and did not change when using more data (explained
only tasks (described in Sec. 3) in the experiments: col- in Sec. 5.2). The used subset is partitioned into training
orization, jigsaw puzzle, in-painting, random projection. (120k), validation (16k), and test (17k) images, each from
However, the method is applicable to an arbitrary partition- non-overlapping sets of buildings. Our task-specific net-
ing of the dictionary into T and S. The interactive solver works are trained on the training set and the transfer net-
website allows the user to specify any desired partition. works are trained on a subset of validation set, ranging from
Network Architectures: We preserved the architectural 1k images to 16k, in order to model the transfer patterns un-
and training details across tasks as homogeneously as possi- der different data regimes. In the main paper, we report all
ble to avoid injecting any bias. The encoder architecture is results under the 16k transfer supervision regime (∼10% of
identical across all task-specific networks and is a fully con- the split) and defer the additional sizes to the supplementary
volutional ResNet-50 without pooling. All transfer func- material and website (see Sec. 5.2). Transfer functions are
tions include identical shallow networks with 2 conv layers evaluated on the test set.
(concatenated channel-wise if higher-order). The loss (Lt ) How good are the trained task-specific networks? Win
and decoder’s architecture, though, have to depend on the rate (%) is the proportion of test set images for which a
task as the output structures of different tasks vary; for all baseline is beaten. Table 1 provides win rates of the task-
pixel-to-pixel tasks, e.g. normal estimation, the decoder is a specifc networks vs. two baselines. Visual outputs for a ran-
15-layer fully convolutional network; for low dimensional dom test sample are in Fig. 3. The high win rates in Table 1
tasks, e.g. vanishing points, it consists of 2-3 FC layers. and qualitative results show the networks are well trained
All networks are trained using the same hyperparameters and stable and can be relied upon for modeling the task
regardless of task and on exactly the same input images. space. See results of applying the networks on a YouTube
Tasks with more than one input, e.g. relative camera pose, video frame-by-frame here. A live demo for user uploaded
share weights between the encoder towers. Transfer net- queries is available here.

3717
Task avg rand Task avg rand Task avg rand Supervision Budget Increase (→)
Budget
Denoising 100 99.9 Layout 99.6 89.1 Scene Class. 97.0 93.4
Autoenc. 100 99.8 2D Edges 100 99.9 Occ. Edges 100 95.4
Reshading 94.9 95.2 Pose (fix) 76.3 79.5 Pose (nonfix) 60.2 61.9
Inpainting 99.9 - 2D Segm. 97.7 95.7 2.5D Segm. 94.2 89.4
Curvature 78.7 93.4 Matching 86.8 84.6 Egomotion 67.5 72.3
Normals 99.4 99.5 Vanishing 99.5 96.4 2D Keypnt. 99.8 99.4
Z-Depth 92.3 91.1 Distance 92.4 92.1 3D Keypnt. 96.0 96.9
Mean 92.4 90.9
Table 1: Task-Specific Networks’ Sanity: Win rates vs. random (Gaus-
sian) network representation readout and statistically informed guess avg.

To get a sense of the quality of our networks vs. state-of-

the-art task-specific methods, we compared our depth esti- max transfer order=1
Gain
max transfer order=4 max transfer order=1
Quality
max transfer order=4

mator vs. released models of [51] which led to outperform- Figure 9: Evaluation of taxonomy computed for solving the full task
ing [51] with a win rate of 88% and losses of 0.35 vs. 0.47 dictionary. Gain (left) and Quality (right) values for each task using the
policy suggested by the computed taxonomy, as the supervision budget
(further details in the supplementary material). In general, increases(→). Shown for transfer orders 1 and 4.
we found the task-specific networks to perform on par or
better than state-of-the-art for many of the tasks, though we

Taxonomy
ImageNet[49]

Noroozi.[65]
Zhang.[100]
Agrawal.[1]

full sup.
do not formally benchmark or claim this.

Zamir.[97]
Wang.[93]
scratch
Order Increase (→)
4.1. Evaluation of Computed Taxonomies Order Task
Depth 88 88 93 89 88 84 86 43 -
.03 .04 .04 .03 .04 .03 .03 .02 .02
Fig. 8 shows the computed taxonomies optimized to 80 52 83 74 74 71 75 15 -
Scene Cls. 3.30
solve the full dictionary, i.e. all tasks are placed in T and S 2.76 3.56 3.15 3.17 3.09 3.19 2.23 2.63

(except for 4 source-only tasks that are in S only). This was Sem. Segm. 78 79 82 85 76 78 84 21 -
1.74 1.88 1.92 1.80 1.85 1.74 1.71 1.42 1.53
79 54 82 76 75 76 76 34 -
Object Cls. 4.08
done for various supervision budgets (columns) and max- 3.57 4.27 3.99 3.98 4.00 3.97 3.26 3.46

imum allowed order (rows) constraints. Still seeing trans- Normals 97 98 98 98 98 97 97 6 -

.22 .30 .34 .28 .28 .23 .24 .12 .15
fers to some targets when the budget is 26 (full dictionary) 80 93 92 89 90 84 87 40 -
2.5D Segm. .21 .34 .34 .26 .29 .22 .24 .16 .17
means certain transfers became better than their fully super-
Occ. Edges 93 96 95 93 94 93 94 42 -
vised task-specific counterpart. .16 .19 .18 .17 .18 .16 .17 .12 .13

Curvature 88 94 89 85 88 92 88 29 -
While Fig. 8 shows the structure and connectivity, Fig. 9 .25 .28 .26 .25 .26 .26 .25 .21 .22
79 78 83 77 76 74 71 59 -
quantifies the results of taxonomy recommended transfer Egomotion 8.60 8.58 9.26 8.41 8.34 8.15 7.94 7.32 6.85
policies by two metrics of Gain and Quality, defined as: Layout 80 76 85 79 77 78 70 36 -
.66 .66 .85 .65 .65 .62 .54 .37 .41
Gain: win rate (%) against a network trained from scratch
using the same training data as transfer networks’. That Figure 10: Generalization to Novel Tasks. Each row shows a novel
test task. Left: Gain and Quality values using the devised “all-for-one”
is, the best that could be done if transfer learning was not transfer policies for novel tasks for orders 1-4. Right: Win rates (%) of the
utilized. This quantifies the gained value by transferring. transfer policy over various self-supervised methods, ImageNet features,
Quality: win rate (%) against a fully supervised network and scratch are shown in the colored rows. Note the large margin of win
trained with 120k images (gold standard). by taxonomy. The uncolored rows show corresponding loss values.
Each column in Fig. 9 shows a supervision budget. As Fig. 10 (left) shows the Gain and Quality of the transfer
apparent, good results can be achieved even when the super- policy found by the BIP for each task. Fig. 10 (right) com-
vision budget is notably smaller than the number of solved pares the taxonomy suggested policy against some of the
tasks, and as the budget increases, results improve (ex- best existing self-supervised methods [93, 100, 65, 97, 1],
pected). Results are shown for 2 maximum allowed orders. ImageNet FC7 features [49], training from scratch, and a
fully supervised network (gold standard).
4.2. Generalization to Novel Tasks
The results in Fig. 10 (right) are noteworthy. The large
The taxonomies in Sec. 4.1 were optimized for solving win margin for taxonomy shows that carefully selecting
all tasks in the dictionary. In many situations, a practitioner transfer policies depending on the target is superior to fixed
is interested in a single task which even may not be in the transfers, such as the ones employed by self-supervised
dictionary. Here we evaluate how taxonomy transfers to a methods. ImageNet features which are the most popular
novel out-of-dictionary task with little data. off-the-shelf features in vision are also outperformed by
This is done in an all-for-one scenario where we put one those policies. Additionally, though the taxonomy transfer
task in T and all others in S. The task in T is target-only policies lose to fully supervised networks (gold standard) in
and has no task-specific network. Its limited data (16k) is most cases, the results often get close with win rates in 40%
used to train small transfer networks to sources. This basi- range. These observations suggests the space has a rather
cally localizes where the target would be in the taxonomy. predicable and strong structure. For graph visualization of

3718
Taxonomy Significance Test Transferring to ImageNet Transferring to MIT Places
(Spearman’s correlation = 0.823) (Spearman’s correlation = 0.857)
Top-1 Top-1
9
Top-5 Top-5

Accuracy
Accuracy
3

Supervision Budget Supervision Budget

Figure 11: Structure Significance. Our taxonomy compared with ran-
dom transfer policies (random feasible taxonomies that use the maximum
allowable supervision budget). Y-axis shows Quality or Gain, and X-axis
is the supervision budget. Green and gray represent our taxonomy and ran- Figure 12: Evaluating the discovered structure on other datasets:
dom connectivities, respectively. Error bars denote 5th –95th percentiles. ImageNet [75] and MIT Places [101]. Y-axis shows accuracy on the
external benchmark while bars on x-axis are ordered by taxonomy’s pre-
the all-for-one taxonomy policies please see the supplemen- dicted performance based on our dataset. A monotonically decreasing plot
corresponds to preserving identical orders and perfect generalization.
tary material. The solver website allows generating the tax-
onomy for arbitrary sets of target-only tasks.
size and architecture of task-specific networks and transfer
5. Significance Test of the Structure networks by 4x), we found the outputs to be remarkably
stable leading to almost no change in the output taxonomy
The previous evaluations showed good transfer results in computed on top. Detailed results and experimental setup
terms of Quality and Gain, but how crucial is it to use our of each tests are reported in the supplementary material.
taxonomy to choose smart transfers over just choosing any
transfer? In other words, how significant/strong is the dis- 6. Limitations and Discussion
covered structure of task space? Fig. 11 quantifies this by We presented a method for modeling the space of visual
showing the performance of our taxonomy versus a large tasks by way of transfer learning and showed its utility in
set of taxonomies with random connectivities. Taxonomy reducing the need for supervision. The space of tasks is an
outperformed all other connectivities by a large margin sig- interesting object of study in its own right and we have only
nifying both existence of a strong structure in the space as scratched the surface in this regard. We also made a number
well as a good modeling of it by our approach. See the sup- of assumptions in the framework which should be noted.
plementary material for full experimental details. Model Dependence: We used a computational approach
and adopted neural networks as our function class. Though
5.1. Evaluation on MIT Places & ImageNet
we validated the stability of the findings w.r.t various archi-
To what extent are our findings dataset dependent, and tectures and datasets, it should be noted that the findings are
would the taxonomy change if done on another dataset? We in principle model and data specific.
examined this by finding the ranking of all tasks for trans- Compositionality: We performed the modeling via a set
ferring to two target tasks of object classification and scene of common human-defined visual tasks. It is natural to con-
classification on our dataset. We then fine tuned our task- sider a further compositional approach in which such com-
specific networks on other datasets (MIT Places [101] for mon tasks are viewed as observed samples which are com-
scene classification, ImageNet [75] for object classification) posed of computationally found latent subtasks.
and evaluated them on their respective test sets and metrics. Space Regularity: We performed modeling of a dense
Fig. 12 shows how the results correlate with taxonomy’s space via a sampled dictionary. Though we showed a good
ranking from our dataset. The Spearman’s rho between the tolerance w.r.t. to the choice of dictionary and transferring
taxonomy ranking and the Top-1 ranking is 0.857 on Places to out-of-dictionary tasks, this outcome holds upon a proper
and 0.823 on ImageNet showing a notable correlation. See sampling of the space as a function of its regularity. More
the supplementary material for full experimental details. formal studies on properties of the computed space is re-
quired for this to be provably guaranteed for a general case.
5.2. Universality of the Structure
Lifelong Learning: We performed the modeling in one
We employed a computational approach with various de- go. In many cases, e.g. lifelong learning, the system is
sign choices. It is important to investigate how specific to evolving and the number of mastered tasks constantly
those the discovered structure is. We did stability tests by increase. Such scenarios require augmentation of the
computing the variance in our output when making changes structure with expansion mechanisms based on new beliefs.
in one of the following system choices: I. architecture of
task-specific networks, II. architecture of transfer func- Acknowledgement: We acknowledge the support of NSF
tion networks, III. amount of data available for training (DMS-1521608), MURI (1186514-1-TBCJE), ONR MURI
transfer networks, IV. datasets, V. data splits, VI. choice (N00014-14-1-0671), Toyota(1191689-1-UDAWF), ONR
of dictionary. Overall, despite injecting large changes (e.g. MURI (N00014-13-1-0341), Nvidia, Tencent, a gift by
varying the size of training data of transfer functions by 16x, Amazon Web Services, a Google Focused Research Award.

3719
References [19] Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever,
and P. Abbeel. Rl2: Fast reinforcement learning via slow
[1] P. Agrawal, J. Carreira, and J. Malik. Learning to see by reinforcement learning. arXiv preprint arXiv:1611.02779,
moving. In Proceedings of the IEEE International Confer- 2016. 2
ence on Computer Vision, pages 37–45, 2015. 7
[20] D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vin-
[2] M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, cent, and S. Bengio. Why does unsupervised pre-training
D. Pfau, T. Schaul, and N. de Freitas. Learning to learn help deep learning? Journal of Machine Learning Re-
by gradient descent by gradient descent. In Advances in search, 11(Feb):625–660, 2010. 2
Neural Information Processing Systems, pages 3981–3989,
[21] A. Faktor and M. Irani. clustering by composition–
2016. 2
unsupervised discovery of image categories. In European
[3] I. Armeni, S. Sax, A. R. Zamir, and S. Savarese. Joint
Conference on Computer Vision, pages 474–487. Springer,
2d-3d-semantic data for indoor scene understanding. arXiv
2012. 2
preprint arXiv:1702.01105, 2017. 4
[22] L. Fe-Fei et al. A bayesian approach to unsupervised one-
[4] S. Arora, A. Bhaskara, R. Ge, and T. Ma. Provable bounds
shot learning of object categories. In Computer Vision,
for learning some deep representations. In International
2003. Proceedings. Ninth IEEE International Conference
Conference on Machine Learning, pages 584–592, 2014. 1
on, pages 1134–1141. IEEE, 2003. 2
[5] Y. Aytar and A. Zisserman. Tabula rasa: Model transfer
[23] L. Fei-Fei, R. Fergus, and P. Perona. One-shot learning of
for object category detection. In Computer Vision (ICCV),
object categories. IEEE transactions on pattern analysis
2011 IEEE International Conference on, pages 2252–2259.
and machine intelligence, 28(4):594–611, 2006. 2
IEEE, 2011. 2
[24] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars.
[6] Y. Bengio, A. Courville, and P. Vincent. Representa-
Unsupervised visual domain adaptation using subspace
tion learning: A review and new perspectives. IEEE
alignment. In Proceedings of the IEEE international con-
transactions on pattern analysis and machine intelligence,
ference on computer vision, pages 2960–2967, 2013. 2
35(8):1798–1828, 2013. 2
[25] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-
[7] P. Berkhin et al. A survey of clustering data mining tech-
learning for fast adaptation of deep networks. arXiv
niques. Grouping multidimensional data, 25:71, 2006. 2
preprint arXiv:1703.03400, 2017. 2
[8] E. Bienenstock, S. Geman, and D. Potter. Compositionality,
mdl priors, and object recognition. In Advances in neural [26] C. Finn, S. Levine, and P. Abbeel. Guided cost learn-
information processing systems, pages 838–844, 1997. 2 ing: Deep inverse optimal control via policy optimization.
CoRR, abs/1603.00448, 2016. 2
[9] H. Bilen and A. Vedaldi. Integrated perception with re-
current multi-task neural networks. In Advances in neural [27] C. Finn, X. Y. Tan, Y. Duan, T. Darrell, S. Levine, and
information processing systems, pages 235–243, 2016. 2 P. Abbeel. Deep spatial autoencoders for visuomotor learn-
[10] J. Bingel and A. Søgaard. Identifying beneficial task rela- ing. In Robotics and Automation (ICRA), 2016 IEEE Inter-
tions for multi-task learning in deep neural networks. arXiv national Conference on, pages 512–519. IEEE, 2016. 2
preprint arXiv:1702.08303, 2017. 2 [28] C. Finn, T. Yu, J. Fu, P. Abbeel, and S. Levine. Generalizing
[11] O. Boiman and M. Irani. Similarity by composition. In skills with semi-supervised reinforcement learning. CoRR,
Advances in neural information processing systems, pages abs/1612.00429, 2016. 2
177–184, 2007. 2 [29] C. Finn, T. Yu, T. Zhang, P. Abbeel, and S. Levine. One-
[12] A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Nießner, shot visual imitation learning via meta-learning. CoRR,
M. Savva, S. Song, A. Zeng, and Y. Zhang. Matterport3d: abs/1709.04905, 2017. 2
Learning from rgb-d data in indoor environments. arXiv [30] I. K. Fodor. A survey of dimension reduction techniques.
preprint arXiv:1709.06158, 2017. 4 Technical report, Lawrence Livermore National Lab., CA
[13] Z. Chen and B. Liu. Lifelong Machine Learning. Morgan (US), 2002. 2
& Claypool Publishers, 2016. 2 [31] R. M. French. Catastrophic forgetting in connectionist net-
[14] I. I. CPLEX. V12. 1: Users manual for cplex. International works: Causes, consequences and solutions. Trends in Cog-
Business Machines Corporation, 46(53):157, 2009. 5 nitive Sciences, 3(4):128–135, 1999. 2
[15] C. Doersch, A. Gupta, and A. A. Efros. Unsupervised vi- [32] R. Ge. Provable algorithms for machine learning problems.
sual representation learning by context prediction. In Pro- PhD thesis, Princeton University, 2013. 1
ceedings of the IEEE International Conference on Com- [33] S. Geman, D. F. Potter, and Z. Chi. Composition systems.
puter Vision, pages 1422–1430, 2015. 2 Quarterly of Applied Mathematics, 60(4):707–736, 2002. 2
[16] C. Doersch and A. Zisserman. Multi-task self-supervised [34] R. Gopalan, R. Li, and R. Chellappa. Domain adaptation
visual learning. arXiv preprint arXiv:1708.07860, 2017. 2 for object recognition: An unsupervised approach. In Com-
[17] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, puter Vision (ICCV), 2011 IEEE International Conference
E. Tzeng, and T. Darrell. Decaf: A deep convolutional on, pages 999–1006. IEEE, 2011. 2
activation feature for generic visual recognition. In Inter- [35] A. Gopnik, C. Glymour, D. Sobel, L. Schulz, T. Kushnir,
national conference on machine learning, pages 647–655, and D. Danks. A theory of causal learning in children:
2014. 2 Causal maps and bayes nets. 111:3–32, 02 2004. 2
[18] J. Donahue, P. Krähenbühl, and T. Darrell. Adversarial fea- [36] A. Gopnik, C. Glymour, D. M. Sobel, L. E. Schulz,
ture learning. arXiv preprint arXiv:1605.09782, 2016. 2 T. Kushnir, and D. Danks. A theory of causal learning in

3720
children: causal maps and bayes nets. Psychological re- [55] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ra-
view, 111(1):3, 2004. 2 manan, P. Dollár, and C. L. Zitnick. Microsoft coco: Com-
[37] A. Gopnik, A. N. Meltzoff, and P. K. Kuhl. The scientist in mon objects in context. In European conference on com-
the crib: Minds, brains, and how children learn. William puter vision, pages 740–755. Springer, 2014. 4
Morrow & Co, 1999. 2 [56] F. Liu, G. Lin, and C. Shen. CRF learning with CNN
[38] A. Graves, G. Wayne, and I. Danihelka. Neural turing ma- features for image segmentation. CoRR, abs/1503.08263,
chines. CoRR, abs/1410.5401, 2014. 2 2015. 2
[39] I. Gurobi Optimization. Gurobi optimizer reference man- [57] Z. Luo, Y. Zou, J. Hoffman, and L. Fei-Fei. Label efficient
ual, 2016. 5 learning of transferable representations across domains and
[40] K. Henry. The theory and applications of homomorphic tasks. 2
cryptography. Master’s thesis, University of Waterloo, [58] J. Malik, P. Arbeláez, J. Carreira, K. Fragkiadaki, R. Gir-
2008. 2 shick, G. Gkioxari, S. Gupta, B. Hariharan, A. Kar, and
[41] G. Hinton, O. Vinyals, and J. Dean. Distilling the knowl- S. Tulsiani. The three rs of computer vision: Recognition,
edge in a neural network. arXiv preprint arXiv:1503.02531, reconstruction and reorganization. Pattern Recognition Let-
2015. 4 ters, 72:4–14, 2016. 2
[42] J. Hoffman, T. Darrell, and K. Saenko. Continuous mani- [59] N. Masuda, M. A. Porter, and R. Lambiotte. Random walks
fold based adaptation for evolving visual domains. In Pro- and diffusion on networks. Physics Reports, 716-717:1 –
ceedings of the IEEE Conference on Computer Vision and 58, 2017. Random walks and diffusion on networks. 5
Pattern Recognition, pages 867–874, 2014. 2 [60] M. Mccloskey and N. J. Cohen. Catastrophic interference in
[43] Y. Hoshen and S. Peleg. Visual learning of arithmetic oper- connectionist networks: The sequential learning problem.
ations. CoRR, abs/1506.02264, 2015. 2 The Psychology of Learning and Motivation, 24:104–169,
[44] F. Hu, G.-S. Xia, J. Hu, and L. Zhang. Transferring deep 1989. 2
convolutional neural networks for the scene classification of [61] L. Mihalkova, T. Huynh, and R. J. Mooney. Mapping and
high-resolution remote sensing imagery. Remote Sensing, revising markov logic networks for transfer learning. In
7(11):14680–14707, 2015. 2 AAAI, volume 7, pages 608–614, 2007. 2
[45] I.-H. Jhuo, D. Liu, D. Lee, and S.-F. Chang. Robust visual [62] T. Mikolov, Q. V. Le, and I. Sutskever. Exploiting simi-
domain adaptation with low-rank reconstruction. In Com- larities among languages for machine translation. CoRR,
puter Vision and Pattern Recognition (CVPR), 2012 IEEE abs/1309.4168, 2013. 2
Conference on, pages 2168–2175. IEEE, 2012. 2 [63] I. Misra, A. Shrivastava, A. Gupta, and M. Hebert. Cross-
[46] D. P. Kingma and J. Ba. Adam: A method for stochastic stitch networks for multi-task learning. In Proceedings
optimization. CoRR, abs/1412.6980, 2014. 2 of the IEEE Conference on Computer Vision and Pattern
[47] D. P. Kingma and M. Welling. Auto-encoding variational Recognition, pages 3994–4003, 2016. 2
bayes. arXiv preprint arXiv:1312.6114, 2013. 2 [64] A. Niculescu-Mizil and R. Caruana. Inductive transfer for
[48] I. Kokkinos. Ubernet: Training auniversal’convolutional bayesian network structure learning. In Artificial Intelli-
neural network for low-, mid-, and high-level vision us- gence and Statistics, pages 339–346, 2007. 2
ing diverse datasets and limited memory. arXiv preprint [65] M. Noroozi and P. Favaro. Unsupervised learning of vi-
arXiv:1609.02132, 2016. 2 sual representations by solving jigsaw puzzles. In European
[49] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet Conference on Computer Vision, pages 69–84. Springer,
classification with deep convolutional neural networks. In 2016. 2, 7
Advances in neural information processing systems, pages [66] M. Noroozi, H. Pirsiavash, and P. Favaro. Represen-
1097–1105, 2012. 7 tation learning by learning to count. arXiv preprint
[50] B. Kulis, K. Saenko, and T. Darrell. What you saw is not arXiv:1708.06734, 2017. 2
what you get: Domain adaptation using asymmetric ker- [67] M. Norouzi, T. Mikolov, S. Bengio, Y. Singer, J. Shlens,
nel transforms. In Computer Vision and Pattern Recogni- A. Frome, G. S. Corrado, and J. Dean. Zero-shot learn-
tion (CVPR), 2011 IEEE Conference on, pages 1785–1792. ing by convex combination of semantic embeddings. arXiv
IEEE, 2011. 2 preprint arXiv:1312.5650, 2013. 2
[51] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and [68] M. Ovsjanikov, M. Ben-Chen, J. Solomon, A. Butscher,
N. Navab. Deeper depth prediction with fully convolutional and L. Guibas. Functional maps: a flexible representation
residual networks. In 3D Vision (3DV), 2016 Fourth Inter- of maps between shapes. ACM Transactions on Graphics
national Conference on, pages 239–248. IEEE, 2016. 7 (TOG), 31(4):30, 2012. 2
[52] B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum. [69] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A.
Human-level concept learning through probabilistic pro- Efros. Context encoders: Feature learning by inpainting. In
gram induction. Science, 350(6266):1332–1338, 2015. 2 Proceedings of the IEEE Conference on Computer Vision
[53] B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Ger- and Pattern Recognition, pages 2536–2544, 2016. 2
shman. Building machines that learn and think like people. [70] A. Pentina and C. H. Lampert. Multi-task learning with
Behavioral and Brain Sciences, pages 1–101, 2016. 2 labeled and unlabeled tasks. stat, 1050:1, 2017. 2
[54] Y. Li, H. Qi, J. Dai, X. Ji, and Y. Wei. Fully con- [71] J. Piaget and M. Cook. The origins of intelligence in chil-
volutional instance-aware semantic segmentation. arXiv dren, volume 8. International Universities Press New York,
preprint arXiv:1611.07709, 2016. 4 1952. 2

3721
[72] L. Y. Pratt. Discriminability-based transfer between neural [89] D. G. R. Tervo, J. B. Tenenbaum, and S. J. Gershman. To-
networks. In Advances in neural information processing ward the neural implementation of structure learning. Cur-
systems, pages 204–211, 1993. 2 rent opinion in neurobiology, 37:99–105, 2016. 2
[73] S. R. Richter, Z. Hayder, and V. Koltun. Playing for bench- [90] C. Tessler, S. Givony, T. Zahavy, D. J. Mankowitz, and
marks. In International Conference on Computer Vision S. Mannor. A deep hierarchical approach to lifelong learn-
(ICCV), 2017. 2 ing in minecraft. In AAAI, pages 1553–1561, 2017. 2
[74] S. T. Roweis and L. K. Saul. Nonlinear dimension- [91] S. Thrun and L. Pratt. Learning to learn. Springer Science
ality reduction by locally linear embedding. science, & Business Media, 2012. 2
290(5500):2323–2326, 2000. 2 [92] A. M. Turing. Computing machinery and intelligence.
[75] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, Mind, 59(236):433–460, 1950. 2
S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, [93] X. Wang and A. Gupta. Unsupervised learning of visual
et al. Imagenet large scale visual recognition challenge. representations using videos. In Proceedings of the IEEE
International Journal of Computer Vision, 115(3):211–252, International Conference on Computer Vision, pages 2794–
2015. 2, 4, 8 2802, 2015. 7
[76] R. W. Saaty. The analytic hierarchy process – what it is [94] X. Wang, K. He, and A. Gupta. Transitive invariance
and how it is used. Mathematical Modeling, 9(3-5):161– for self-supervised visual representation learning. arXiv
176, 1987. Mat/d Modelling, Vol. 9, No. 3-5, pp. 161-176, preprint arXiv:1708.02901, 2017. 2
1987. 5 [95] T. Winograd. Thinking machines: Can there be? Are we,
[77] K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting volume 200. University of California Press, Berkeley, 1991.
visual category models to new domains. Computer Vision– 2
ECCV 2010, pages 213–226, 2010. 2
[96] J. Yang, R. Yan, and A. G. Hauptmann. Adapting svm clas-
[78] R. Salakhutdinov, J. Tenenbaum, and A. Torralba. One-shot
sifiers to data with shifted distributions. In Data Mining
learning with a hierarchical nonparametric bayesian model.
Workshops, 2007. ICDM Workshops 2007. Seventh IEEE
In Proceedings of ICML Workshop on Unsupervised and
International Conference on, pages 69–76. IEEE, 2007. 2
Transfer Learning, pages 195–206, 2012. 2
[97] A. R. Zamir, T. Wekel, P. Agrawal, C. Wei, J. Malik, and
[79] J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and
S. Savarese. Generic 3d representation via pose estimation
P. Abbeel. Trust region policy optimization. CoRR,
and matching. In European Conference on Computer Vi-
abs/1502.05477, 2015. 2
sion, pages 535–553. Springer, 2016. 2, 7
[80] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carls-
[98] A. R. Zamir, F. Xia, J. He, A. Sax, J. Malik, and S. Savarese.
son. Cnn features off-the-shelf: an astounding baseline
Gibson Env: Real-world perception for embodied agents.
for recognition. In Proceedings of the IEEE conference on
In 2018 IEEE Conference on Computer Vision and Pattern
computer vision and pattern recognition workshops, pages
Recognition (CVPR). IEEE, 2018. 4
806–813, 2014. 2
[81] D. L. Silver and K. P. Bennett. Guest editors introduction: [99] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals.
special issue on inductive transfer learning. Machine Learn- Understanding deep learning requires rethinking general-
ing, 73(3):215–220, 2008. 2 ization. CoRR, abs/1611.03530, 2016. 2
[82] D. L. Silver, Q. Yang, and L. Li. Lifelong machine learning [100] R. Zhang, P. Isola, and A. A. Efros. Colorful image col-
systems: Beyond learning algorithms. In in AAAI Spring orization. In European Conference on Computer Vision,
Symposium Series, 2013. 2 pages 649–666. Springer, 2016. 2, 7
[83] R. Socher, M. Ganjoo, C. D. Manning, and A. Ng. Zero- [101] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva.
shot learning through cross-modal transfer. In Advances Learning deep features for scene recognition using places
in neural information processing systems, pages 935–943, database. In Advances in neural information processing
2013. 2 systems, pages 487–495, 2014. 2, 4, 8
[84] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and
R. Salakhutdinov. Dropout: A simple way to prevent neural
networks from overfitting. Journal of Machine Learning
Research, 15:1929–1958, 2014. 2
[85] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan,
I. J. Goodfellow, and R. Fergus. Intriguing properties of
neural networks. CoRR, abs/1312.6199, 2013. 2
[86] J. B. Tenenbaum and T. L. Griffiths. Generalization, sim-
ilarity, and bayesian inference. Behavioral and Brain Sci-
ences, 24(4):629640, 2001. 2
[87] J. B. Tenenbaum, C. Kemp, T. L. Griffiths, and N. D. Good-
man. How to grow a mind: Statistics, structure, and abstrac-
tion. science, 331(6022):1279–1285, 2011. 2
[88] J. B. Tenenbaum, C. Kemp, and P. Shafto. Theory-based
bayesian models of inductive learning and reasoning. In
Trends in Cognitive Sciences, pages 309–318, 2006. 2

3722

Transfer Learning Through Embedding Spaces (Z-Lib - Io)
No ratings yet
Transfer Learning Through Embedding Spaces (Z-Lib - Io)
223 pages
(Computing Supplement 11) Dr. K. Daniilidis (Auth.), Prof. Dr. W. Kropatsch, Prof. Dr. R. Klette, Prof. Dr. F. Solina, Prof. Dr. R. Albrecht (Eds.) - Theoretical Foundations of Computer Vision-Springe
No ratings yet
(Computing Supplement 11) Dr. K. Daniilidis (Auth.), Prof. Dr. W. Kropatsch, Prof. Dr. R. Klette, Prof. Dr. F. Solina, Prof. Dr. R. Albrecht (Eds.) - Theoretical Foundations of Computer Vision-Springe
259 pages
Lec 16
No ratings yet
Lec 16
76 pages
Deep Learning For Economists
No ratings yet
Deep Learning For Economists
52 pages
Pap 1
No ratings yet
Pap 1
54 pages
Editing Models With Task Arithmetic
No ratings yet
Editing Models With Task Arithmetic
31 pages
Transferability in Deep Learning: A Survey: Junguang Jiang
No ratings yet
Transferability in Deep Learning: A Survey: Junguang Jiang
64 pages
A State-Of-The-Art Computer Vision Adopting Non - E
No ratings yet
A State-Of-The-Art Computer Vision Adopting Non - E
33 pages
Research On Learning Representations in Computer Vision
No ratings yet
Research On Learning Representations in Computer Vision
52 pages
Efficient Adaptation of Large Vision Transformer Via Adapter Re Composing Paper Conference
No ratings yet
Efficient Adaptation of Large Vision Transformer Via Adapter Re Composing Paper Conference
20 pages
Merging and Disentangling Views in Visual Reinforcement Learning For Robotic Manipulation
No ratings yet
Merging and Disentangling Views in Visual Reinforcement Learning For Robotic Manipulation
22 pages
An Image Is Worth More Than 16x16 Patches
No ratings yet
An Image Is Worth More Than 16x16 Patches
23 pages
Dinov 2
No ratings yet
Dinov 2
31 pages
【更有效的掩码模型】Architecture-Agnostic Masked Image Modeling - From ViT Back to CNN
No ratings yet
【更有效的掩码模型】Architecture-Agnostic Masked Image Modeling - From ViT Back to CNN
19 pages
When Creating A Narrow AI - Hierarchy and Nonlocality of Neural Network Skills
No ratings yet
When Creating A Narrow AI - Hierarchy and Nonlocality of Neural Network Skills
19 pages
L U V U C - P N 3DM: Earning From Nlabelled Ideos Sing ON Trastive Redictive Eural Apping
No ratings yet
L U V U C - P N 3DM: Earning From Nlabelled Ideos Sing ON Trastive Redictive Eural Apping
19 pages
Crocov 2
No ratings yet
Crocov 2
19 pages
Mining The Knowledge of Weights For Optimizing Neural Network Structures
No ratings yet
Mining The Knowledge of Weights For Optimizing Neural Network Structures
16 pages
Master Inspera
No ratings yet
Master Inspera
45 pages
Domande ANN
No ratings yet
Domande ANN
28 pages
Transfer Learning For Visual Categorization A Survey
No ratings yet
Transfer Learning For Visual Categorization A Survey
16 pages
Cross Training
No ratings yet
Cross Training
11 pages
DL Tutorial NIPS2015 PDF
No ratings yet
DL Tutorial NIPS2015 PDF
133 pages
Application of Data Augmentation On Deep Learning
No ratings yet
Application of Data Augmentation On Deep Learning
13 pages
A Survey of Deep Learning - From Activations To Transformers
No ratings yet
A Survey of Deep Learning - From Activations To Transformers
12 pages
Unit - Ichp 4
No ratings yet
Unit - Ichp 4
19 pages
Scaling Robot Learning With Semantically Imagined Experience
No ratings yet
Scaling Robot Learning With Semantically Imagined Experience
21 pages
Recent Advances in Deep Learning For Object Detection
No ratings yet
Recent Advances in Deep Learning For Object Detection
26 pages
Hassani Unsupervised Multi-Task Feature Learning On Point Clouds ICCV 2019 Paper
No ratings yet
Hassani Unsupervised Multi-Task Feature Learning On Point Clouds ICCV 2019 Paper
12 pages
2022 - Al-Halah Et Al - Zero Experience Required
No ratings yet
2022 - Al-Halah Et Al - Zero Experience Required
11 pages
Large Self-Supervised Models Bridge The Gap in Domain Adaptive Object Detection
No ratings yet
Large Self-Supervised Models Bridge The Gap in Domain Adaptive Object Detection
16 pages
Transferring To Real-World Layouts: A Depth-Aware Framework For Scene Adaptation
No ratings yet
Transferring To Real-World Layouts: A Depth-Aware Framework For Scene Adaptation
11 pages
Local Aggregation For Unsupervised Learning of Visual Embeddings
No ratings yet
Local Aggregation For Unsupervised Learning of Visual Embeddings
13 pages
2021 - Task Switching Network For Multi-Task Learning - Sun Et Al
No ratings yet
2021 - Task Switching Network For Multi-Task Learning - Sun Et Al
10 pages
4.1 - Unsupervised Visual Representation Learning by Context Prediction
No ratings yet
4.1 - Unsupervised Visual Representation Learning by Context Prediction
10 pages
Network Morphism
No ratings yet
Network Morphism
9 pages
Lecture 19
No ratings yet
Lecture 19
19 pages
Lecun 20181015 Ihes Gomax PDF
No ratings yet
Lecun 20181015 Ihes Gomax PDF
109 pages
Learning Without Forgetting
No ratings yet
Learning Without Forgetting
13 pages
Gravimetic Feeders
100% (1)
Gravimetic Feeders
26 pages
2404.18144v1 Pages 7
No ratings yet
2404.18144v1 Pages 7
10 pages
2019 Researchstatement
No ratings yet
2019 Researchstatement
6 pages
NeurIPS 2023 Clusterfomer Clustering As A Universal Visual Learner Paper Conference
No ratings yet
NeurIPS 2023 Clusterfomer Clustering As A Universal Visual Learner Paper Conference
14 pages
Meta-Learning With Versatile Loss Geometries - For Fast Adaptation Using Mirror Descent
No ratings yet
Meta-Learning With Versatile Loss Geometries - For Fast Adaptation Using Mirror Descent
7 pages
Weakly Supervised Contrastive Learning
No ratings yet
Weakly Supervised Contrastive Learning
10 pages
Domain Generalization On Constrained
No ratings yet
Domain Generalization On Constrained
12 pages
230623-Paper-Learning and Representing Object Shape Through An Array of Orientation Columns
No ratings yet
230623-Paper-Learning and Representing Object Shape Through An Array of Orientation Columns
14 pages
Network Morphism
No ratings yet
Network Morphism
11 pages
Revisiting Self-Supervised Visual Representation Learning PDF
No ratings yet
Revisiting Self-Supervised Visual Representation Learning PDF
10 pages
Master's Thesis Deep Learning For Visual Recognition: Remi Cadene Supervised by Nicolas Thome and Matthieu Cord
No ratings yet
Master's Thesis Deep Learning For Visual Recognition: Remi Cadene Supervised by Nicolas Thome and Matthieu Cord
58 pages
Jia Bin Huang Research Statement
No ratings yet
Jia Bin Huang Research Statement
6 pages
ReviewPaper TransferLearning
No ratings yet
ReviewPaper TransferLearning
6 pages
Imagenet Classification With Deep Convolutional Neural Networks
No ratings yet
Imagenet Classification With Deep Convolutional Neural Networks
7 pages
Proximity Based Automatic Data Annotation For Autonomous Driving
No ratings yet
Proximity Based Automatic Data Annotation For Autonomous Driving
10 pages
Lecture 12 Learning in Vision 2022
No ratings yet
Lecture 12 Learning in Vision 2022
100 pages
Self Supervised Learning
No ratings yet
Self Supervised Learning
5 pages
MobileNetV2 Inverted Residuals and Linear Bottlenecks
No ratings yet
MobileNetV2 Inverted Residuals and Linear Bottlenecks
11 pages
SANDEL-3308 Install Manual Rev G 9-25-03
No ratings yet
SANDEL-3308 Install Manual Rev G 9-25-03
130 pages
Config WCM
100% (1)
Config WCM
17 pages
Module 1 - Introduction To Computer Networks
No ratings yet
Module 1 - Introduction To Computer Networks
9 pages
Woodmizer LT15 Parts
No ratings yet
Woodmizer LT15 Parts
39 pages
X2 Interface - LTE
100% (1)
X2 Interface - LTE
41 pages
Transfer Learning Using VGG-16 With Deep Convoluti
No ratings yet
Transfer Learning Using VGG-16 With Deep Convoluti
9 pages
Think and Decide Think and Observe: 3 Quarter Week 1 Lesson Plan Mathematics 4 I. Objectives
100% (1)
Think and Decide Think and Observe: 3 Quarter Week 1 Lesson Plan Mathematics 4 I. Objectives
3 pages
IPCC Inventory Software Manual
No ratings yet
IPCC Inventory Software Manual
66 pages
Huawei SUN2000 30KTL-A - 33KTL - 40KTL User Manual (Issue04 - 2016!06!20)
No ratings yet
Huawei SUN2000 30KTL-A - 33KTL - 40KTL User Manual (Issue04 - 2016!06!20)
108 pages
Module 8 Artificial Intelligence in Monitoring and Evaluation
No ratings yet
Module 8 Artificial Intelligence in Monitoring and Evaluation
23 pages
7C4 Nakul Narang Is Lab File
No ratings yet
7C4 Nakul Narang Is Lab File
57 pages
PDF Succinctly
100% (1)
PDF Succinctly
60 pages
Full Ordinary Differential Equations Principles and Applications Cambridge IISc Series 1st Edition A. K. Nandakumaran PDF All Chapters
No ratings yet
Full Ordinary Differential Equations Principles and Applications Cambridge IISc Series 1st Edition A. K. Nandakumaran PDF All Chapters
65 pages
Snowplow 101 Guide To Marketing Attribution - 2023
No ratings yet
Snowplow 101 Guide To Marketing Attribution - 2023
16 pages
Tree Menu Magic 2
No ratings yet
Tree Menu Magic 2
77 pages
So3 b1 Unit Test U8a PDF
No ratings yet
So3 b1 Unit Test U8a PDF
5 pages
Extended Essay BM IB
No ratings yet
Extended Essay BM IB
51 pages
Circulation
No ratings yet
Circulation
56 pages
2018 M.SC 2nd Sem
No ratings yet
2018 M.SC 2nd Sem
12 pages
05 RSB Cluster
No ratings yet
05 RSB Cluster
14 pages
8051 UNIT 1-Material
No ratings yet
8051 UNIT 1-Material
38 pages
Modul8 Manual
No ratings yet
Modul8 Manual
113 pages
Design Rotor V-Shape Permanent Magnets-Good
No ratings yet
Design Rotor V-Shape Permanent Magnets-Good
4 pages
Pathfinder Solution Overview
No ratings yet
Pathfinder Solution Overview
2 pages
Emfd Eec
No ratings yet
Emfd Eec
2 pages
Go Bag Policy March 2023
No ratings yet
Go Bag Policy March 2023
5 pages
Programming Unit Vocabulary 1
No ratings yet
Programming Unit Vocabulary 1
4 pages
Naat Nisa Brochure 2023...
No ratings yet
Naat Nisa Brochure 2023...
4 pages
Brakes Volvo Trucks
No ratings yet
Brakes Volvo Trucks
2 pages
CLADLOK Flat Panel Datasheet
No ratings yet
CLADLOK Flat Panel Datasheet
2 pages
Pyramid Image Processing: Exploring the Depths of Visual Analysis
From Everand
Pyramid Image Processing: Exploring the Depths of Visual Analysis
Fouad Sabry
No ratings yet
Hidden Surface Determination: Unveiling the Secrets of Computer Vision
From Everand
Hidden Surface Determination: Unveiling the Secrets of Computer Vision
Fouad Sabry
No ratings yet
Kernel Methods: Fundamentals and Applications
From Everand
Kernel Methods: Fundamentals and Applications
Fouad Sabry
No ratings yet

Taskonomy: Disentangling Task Transfer Learning

Uploaded by

Taskonomy: Disentangling Task Transfer Learning

Uploaded by

Taskonomy: Disentangling Task Transfer Learning

Do visual tasks have a relationship, or are they unre-

We proposes a fully computational approach for model-

1. Introduction X to Y when many pairs of (x, y) s.t. x ∈ X, y ∈ Y are

... ... 2D Segm.

ts Task-specific 2nd Order Object Class. (1000)

china cabinet, china closet

Transfers Results (2k training images)

Source Task Encoder Target Task Output

jec toe 100

′ EI∈Dtest [Dsi →t (I) > Dsj →t (I)]

Colorrizattion Denoising Matching Occlusion Edges

To get a sense of the quality of our networks vs. state-of-

imum allowed order (rows) constraints. Still seeing trans- Normals 97 98 98 98 98 97 97 6 -

Supervision Budget Supervision Budget

You might also like