Label Propagation For Deep Semi-Supervised Learning
data with unsupervised objectives on all data, where the latter act as regularization [41, 38]. Or, an existing classifier can be used to assign pseudo-labels [24, 35], which is another form of algorithmic supervision. Using a powerful classifier trained on carefully annotated data can provide high-quality pseudo-labels, opening the door to learning from real unlabeled, large scale data. In such omni-supervised learning [31], the fully supervised performance on the labeled part is actually the lower bound. This only refreshes the interest in inductive semi-supervised methods.

In this paper, we use efficient transductive label propagation [43] to infer pseudo-labels for unlabeled data, which are used to train the classifier. Label propagation is a graph-based method, and in this work the graph is constructed by exploiting the embeddings obtained by the classification network itself. Thus, the proposed method alternates between two steps. First, the network is trained from labeled and pseudo-labeled data. The second step uses the embeddings of the network trained in the previous step to construct a nearest neighbor graph. Label propagation is then used to infer pseudo-labels for unlabeled images, as well as a certainty score per image and per class. Training is performed on all data, using certainty-based weights.

We experimentally show on standard datasets that the proposed method outperforms other semi-supervised approaches. The less labeled data is available, the more pronounced the advantage of the proposed approach is.

2. Related work

The literature is rich in the problem of semi-supervised learning (SSL); the reader is referred to [3] for an extensive overview. The same holds for SSL in image classification [10, 16, 4, 37]. In this section, we mostly restrict the discussion to approaches that use deep learning for SSL and perform the training on a large image collection with mini-batch optimization.

Prior work on semi-supervised deep learning for image classification is divided into two main categories. The first consists of methods, e.g. [15, 23, 34, 38], that add an unsupervised loss term (often called a regularizer) into the loss function. This term is applied to either all images or only the unlabeled ones. Methods in the second category, e.g. [24, 36], assign pseudo-labels to the unlabeled examples. The pseudo-labeled data are then used in training with a supervised loss, such as cross entropy. Both categories use a standard loss term that is trained with supervision from labeled images. A thorough evaluation of SSL for deep image classification can be found in Oliver et al. [27].

Our contribution belongs to the second category, and is conceptually and implementation-wise orthogonal to the first. It is therefore straightforward to combine the proposed method with any method from the first category. We do combine it with [38] as shown in Section 5.

Unsupervised loss in deep SSL. Assuming that every training image, labeled or not, belongs to a single category, a natural requirement on the classifier is to make a confident prediction on the training set. This idea was formalized by Sajjadi et al. [35], where the regularizer is designed to minimize the entropy of the network output. Such a loss term is easily combined with other terms. A similar combination is performed for denoising auto-encoders that are applied on all images in an unsupervised manner [32].

A direction attracting a lot of attention is that of consistency loss, where two related cases, e.g. coming from two similar images, or produced by two networks with related parameters, are encouraged to have similar network outputs. Sajjadi et al. [34] are the first, to our knowledge, to use a consistency loss between the outputs of a network on random perturbations of the same image. Laine and Aila [23] rather apply consistency between the output of the current network and the temporal average of outputs during training. The state-of-the-art mean teacher (MT) method [38] replaces output averaging by averaging of network parameters. Consistency loss is commonly measured by the squared Euclidean distance. The Jensen-Shannon divergence is used instead by Qiao et al. [29], while complementarity of the two networks is enforced via adversarial examples. A similar idea is proposed by Miyato et al. [26].

Pseudo-labeling in deep SSL. Lee [24] uses the current network to infer pseudo-labels of unlabeled examples, by choosing the most confident class. These pseudo-labels are treated like human-provided labels in the cross entropy loss. Its impact is similar to that of entropy minimization [35]; in both cases the network is forced to make more confident predictions. The same principle is adopted by Shi et al. [36], where the authors further add a contrastive loss to the consistency loss. Our method is different from all such prior work in that pseudo-labels are inferred by label propagation rather than network predictions.

Label propagation has been extensively used in a transductive setup (see chapter 11 of [3]). Recently, Douze et al. [7] perform label propagation on a large image dataset with CNN descriptors for few-shot learning. Unseen images are classified via online label propagation, which requires storing the entire dataset, while the network is trained in advance and descriptors are fixed. Our work is different in that we perform label propagation on the training set offline while training the network, such that inference is possible without accessing the original training set. Learning by association [17] can be seen as two steps of propagation on a constrained bi-partite graph between labeled and unlabeled examples. Graph transduction game (GTG) [9], a form of label propagation, has been used to obtain pseudo-labels [8] as in our work, but in this case the network is pre-trained, the graph remains fixed and there is no weighting mechanism. We compare to this approach in Section 5.
3. Preliminaries

In this section we formulate the semi-supervised learning problem and then we discuss the classifier, different loss functions that are commonly used in prior work, and finally the transductive learning approach that our method is based on. In our experiments we use a convolutional neural network (CNN) to perform image classification, but this formulation applies to any network architecture in any domain.

Problem formulation. We assume a collection of n examples X := (x_1, ..., x_l, x_{l+1}, ..., x_n) with x_i ∈ 𝒳. The first l examples x_i for i ∈ L := {1, ..., l}, denoted by X_L, are labeled according to Y_L := (y_1, ..., y_l) with y_i ∈ C, where C := {1, ..., c} is a discrete label set for c classes. The remaining u := n − l examples x_i for i ∈ U := {l+1, ..., n}, denoted by X_U, are unlabeled. The goal in SSL is to use all examples X and labels Y_L to train a classifier that maps previously unseen samples to class labels.

Classifier. The network takes an input example from 𝒳 and produces a vector of class confidence scores. We denote it by f_θ : 𝒳 → R^c, where θ are the network parameters. It is conceptually divided in two parts. The first is a feature extraction network φ_θ : 𝒳 → R^d mapping the input to a feature vector, or descriptor. We denote the descriptor of the i-th example by v_i := φ_θ(x_i). The second typically consists of a fully connected (FC) layer applied on top of φ_θ and followed by softmax, producing a vector of confidence scores. Function f_θ is the mapping from input space directly to confidence scores. The output of the network for the i-th example is f_θ(x_i) and the prediction is the one of maximum confidence score

ŷ_i := arg max_j f_θ(x_i)_j,    (1)

where subscript j denotes the j-th dimension of the vector.

Supervised loss. In supervised learning, the network is trained by minimizing a supervised loss term of the form

L_s(X_L, Y_L; θ) := Σ_{i=1}^{l} ℓ_s(f_θ(x_i), y_i),    (2)

which applies only to the labeled examples in X_L. Such a term is part of the total loss when training a network in a semi-supervised setup [36, 38, 29]. A standard choice for the loss function ℓ_s in classification is cross-entropy, given by ℓ_s(s, y) := − log s_y for s ∈ R^c and y ∈ C.

Pseudo-labeling is the process of assigning a pseudo-label ŷ_i to each example x_i for i ∈ U. Denoting by Ŷ_U := (ŷ_{l+1}, ..., ŷ_n) the collection of pseudo-labels for X_U, the following additional pseudo-label loss term applies

L_p(X_U, Ŷ_U; θ) := Σ_{i=l+1}^{n} ℓ_s(f_θ(x_i), ŷ_i),    (3)

where again ℓ_s is any supervised loss function like cross-entropy. An example is the approach proposed by Lee [24], who first trains network f_θ with (2) and then assigns pseudo-labels according to (1) for i ∈ U.

Unsupervised loss is another common alternative, where the loss function applies to both labeled and unlabeled examples and encourages consistency under different transformations of the data or the network. The so-called consistency loss [36, 38] is defined as

L_u(X; θ) := Σ_{i=1}^{n} ℓ_u(f_θ(x_i), f_θ̃(x̃_i)),    (4)

where x̃_i refers to a different transformation of example x_i. Note that according to the standard practice of data augmentation, every forward pass of x_i during training is performed under some random transformation. Parameter set θ̃ is either equal to θ or any other transformation of it, such as a moving average over the sequence of network updates [38]. A simple choice of ℓ_u is the squared Euclidean distance, i.e. ℓ_u(s, s̃) := ‖s − s̃‖² for s, s̃ ∈ R^c, forcing the two outputs to be as close as possible.
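For illustration, a consistency term in the spirit of (4) can be written compactly in PyTorch; the sketch below is ours, not part of our actual implementation: the augment callable and the stop-gradient on the second pass are illustrative choices, and in mean teacher [38] the second output would come from an exponential-moving-average copy of the network.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, augment):
    # Two forward passes of the same batch under different random transformations,
    # compared with the squared Euclidean distance, in the spirit of Eq. (4).
    p1 = F.softmax(model(augment(x)), dim=1)        # f_theta(x_i)
    with torch.no_grad():                           # treat the second pass as a target
        p2 = F.softmax(model(augment(x)), dim=1)    # f_theta~(x~_i)
    return ((p1 - p2) ** 2).sum(dim=1).mean()
```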
Transductive learning solves a more specific problem. Instead of training a generic classifier able to classify new, yet unseen, examples, the goal is to use X and Y_L to infer labels for the examples in X_U. In this work, we adopt the graph-based approach of Zhou et al. [43] for transductive learning by diffusion.¹

Diffusion for transductive learning [43]. Let V = (v_1, ..., v_l, v_{l+1}, ..., v_n) be the descriptor set, where v_i corresponds to x_i as defined earlier. A symmetric adjacency matrix W ∈ R^{n×n} with zero diagonal is constructed, whose elements w_{ij} are non-negative pairwise similarities between v_i and v_j. Its symmetrically normalized counterpart is given by 𝒲 = D^{−1/2} W D^{−1/2}, where D := diag(W 1_n) is the degree matrix and 1_n is the all-ones n-vector. An n × c label matrix Y is defined with elements

Y_{ij} := 1 if i ∈ L ∧ y_i = j, and 0 otherwise,    (5)

that is, the rows of Y corresponding to labeled examples are one-hot encoded labels and the rest are zero. Diffusion amounts to computing the n × c matrix

Z := (I − α𝒲)^{−1} Y,    (6)

where α ∈ [0, 1) is a parameter. Finally, the class prediction for an unlabeled example x_i is

ŷ_i := arg max_j z_{ij},    (7)

where z_{ij} is the (i, j) element of matrix Z.

¹We first present the original approach and discuss our design choices in the following section.
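To make (5)-(7) concrete, a minimal NumPy sketch of the diffusion on a toy dense graph could look as follows (the helper name, the α value and the small floor on the degrees are placeholders rather than details of our implementation; the closed-form solve is only viable at toy scale).

```python
import numpy as np

def diffusion_toy(W, labels, alpha=0.99):
    # W: (n, n) symmetric non-negative affinity matrix with zero diagonal.
    # labels: length-n integer array, class index for labeled points and -1 otherwise.
    n, c = W.shape[0], labels.max() + 1
    # Symmetric normalization D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    W_norm = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # One-hot label matrix Y of Eq. (5); rows of unlabeled examples stay zero
    Y = np.zeros((n, c))
    Y[labels >= 0, labels[labels >= 0]] = 1.0
    # Closed-form diffusion Z = (I - alpha * W_norm)^{-1} Y of Eq. (6)
    Z = np.linalg.solve(np.eye(n) - alpha * W_norm, Y)
    return Z.argmax(axis=1)  # class predictions of Eq. (7)
```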
It is interesting to observe that matrix Z as defined by (6) is the minimizer of the following quadratic cost function

J(Z) := (α/2) Σ_{i,j=1}^{n} w_{ij} ‖ z_i/√d_{ii} − z_j/√d_{jj} ‖² + (1−α) ‖Y − Z‖_F²,    (8)

where z_i is the i-th row of matrix Z, d_{ii} is the i-th diagonal element of D and ‖·‖_F is the Frobenius norm. The first term encourages smoothness such that nearby examples get the same predictions, while the second attempts to maintain the predictions for the labeled examples [43].

4. Method

In the following, we begin by providing an overview of our approach. We then develop the main elements of our solution, put everything together in a concrete algorithm, and discuss how our approach is complementary to approaches using an unsupervised loss for SSL [38, 36]. Finally, we discuss the relation to prior work that encourages smoothness in deep networks.

Overview. We introduce a new iterative process for semi-supervised learning that can be summarized as follows. First, we construct a nearest neighbor graph and perform label propagation by transductive learning on the training set. Then, we estimate a weight reflecting the uncertainty of label propagation for each unlabeled example. Finally, we inject the obtained labels into the network training process. These ideas are developed below, while a graphical overview of the proposed approach is shown in Figure 2.

[Figure 2. Overview of the proposed approach. The network f_θ consists of a feature extractor φ_θ followed by FC + softmax. Phase 1: train for T epochs with L_s(X_L, Y_L; θ) (labeled examples only). Then, using φ_θ, perform label propagation by solving (10), and train for one epoch with L_w(X, Y_L, Ŷ_U; θ) (all examples).]

Nearest neighbor graph. Given a network with parameters θ, we construct the descriptor set V = (v_1, ..., v_l, v_{l+1}, ..., v_n), where v_i := φ_θ(x_i). A sparse affinity matrix A ∈ R^{n×n} with elements

a_{ij} := [v_i^⊤ v_j]_+^γ if i ≠ j ∧ v_i ∈ NN_k(v_j), and 0 otherwise,    (9)

is constructed, where NN_k denotes the set of k nearest neighbors in X, and γ is a parameter following recent work on manifold-based search [20]. Note that constructing the affinity matrix of the nearest neighbor graph is efficient even for large n [20], while constructing the full affinity matrix as in Zhou et al. is not tractable. Then, let W := A + A^⊤, which is indeed a symmetric non-negative adjacency matrix with zero diagonal.
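A hedged sketch of this graph construction follows (an illustration, not our actual implementation, which relies on a more efficient nearest-neighbor search): scikit-learn's NearestNeighbors stands in for the k-NN search, descriptors are assumed L2-normalized, and the values of k and γ are placeholders.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.neighbors import NearestNeighbors

def knn_affinity(V, k=50, gamma=3):
    # V: (n, d) L2-normalized descriptors from phi_theta.
    n = V.shape[0]
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(V)   # +1 because each point retrieves itself
    _, idx = nbrs.kneighbors(V)
    rows = np.repeat(np.arange(n), k)
    cols = idx[:, 1:].reshape(-1)                       # drop the self neighbor
    sims = np.sum(V[rows] * V[cols], axis=1)            # v_i^T v_j (cosine similarity for unit norm)
    vals = np.maximum(sims, 0.0) ** gamma               # [.]_+^gamma as in Eq. (9)
    A = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
    W = A + A.T                                         # symmetric, non-negative, zero diagonal
    # Symmetric normalization D^{-1/2} W D^{-1/2}, as used by the diffusion
    d = np.asarray(W.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    return d_inv_sqrt @ W @ d_inv_sqrt
```

Whether a_{ij} is defined over the neighbors of v_i or of v_j is immaterial here, since the symmetrization W = A + A^⊤ yields the same matrix either way.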
Label propagation. Estimating matrix Z by (6) is impractical for large n because the inverse matrix (I − α𝒲)^{−1} is not sparse. We rather use the conjugate gradient (CG) method to solve the linear system

(I − α𝒲) Z = Y,    (10)

which applies because matrix (I − α𝒲) is positive-definite. This solution is known to be faster than the iterative solution of Zhou et al. [43], and has been used in semi-supervised learning [44], interactive image segmentation [14], image retrieval [20] and semantic image segmentation [2]. Finally, we infer the pseudo-labels Ŷ_U = (ŷ_{l+1}, ..., ŷ_n), where ŷ_i is given by (7).

Pseudo-label certainty and class balancing. Inferring pseudo-labels from matrix Z by hard assignment has two undesired effects. First, we define pseudo-labels on all unlabeled examples, while clearly we do not have the same certainty for each example. Second, pseudo-labels may not be balanced over classes, which will impede learning.

To deal with the former issue, we associate with each pseudo-label a weight reflecting the certainty of the prediction. We use entropy, as a measure of uncertainty, to assign weight ω_i to example x_i, defined by

ω_i := 1 − H(ẑ_i) / log(c),    (11)

where Ẑ is the row-wise normalized counterpart of Z, i.e. ẑ_{ij} = z_{ij} / Σ_k z_{ik}, and function H : R^c → R is the entropy function. Weight ω_i is normalized in [0, 1] because log(c) is the maximum possible entropy in R^c.

To deal with the latter issue of class imbalance, we assign a weight ζ_j to class j that is inversely proportional to the class population, defined as ζ_j := (|L_j| + |U_j|)^{−1}, where L_j (resp. U_j) are the examples labeled (resp. pseudo-labeled) as class j.
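The following SciPy sketch shows how (10), (11) and the class weights fit together; it is an illustration only, with placeholder values for α and the number of CG iterations, and a pragmatic clipping of small negative values produced by the truncated solve.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def propagate_labels(W_norm, labels, n_classes, alpha=0.99, cg_iters=20):
    # W_norm: symmetrically normalized sparse affinity; labels: -1 marks unlabeled examples.
    n = W_norm.shape[0]
    A = sp.eye(n, format='csr') - alpha * W_norm          # positive-definite system matrix of Eq. (10)
    Z = np.zeros((n, n_classes))
    for j in range(n_classes):
        y_j = (labels == j).astype(np.float64)            # j-th column of Y, Eq. (5)
        Z[:, j], _ = cg(A, y_j, maxiter=cg_iters)         # conjugate-gradient solve
    Z = np.maximum(Z, 0.0)                                # clip tiny negatives before normalizing
    Z_hat = Z / np.maximum(Z.sum(axis=1, keepdims=True), 1e-12)
    pseudo = Z_hat.argmax(axis=1)                         # pseudo-labels, Eq. (7)
    # Certainty weight of Eq. (11): 1 - normalized entropy of the row
    entropy = -np.sum(Z_hat * np.log(np.maximum(Z_hat, 1e-12)), axis=1)
    omega = 1.0 - entropy / np.log(n_classes)
    # Class weights inversely proportional to the (pseudo-)label population
    assigned = np.where(labels >= 0, labels, pseudo)
    zeta = 1.0 / np.maximum(np.bincount(assigned, minlength=n_classes), 1)
    return pseudo, omega, zeta
```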
Given the above definitions of per-example and per-class weights, we associate the following weighted loss to the labeled and pseudo-labeled examples

L_w(X, Y_L, Ŷ_U; θ) := Σ_{i=1}^{l} ζ_{y_i} ℓ_s(f_θ(x_i), y_i) + Σ_{i=l+1}^{n} ω_i ζ_{ŷ_i} ℓ_s(f_θ(x_i), ŷ_i),    (12)

which is the sum of weighted versions of L_s (2) and L_p (3). In contrast to (3), pseudo-labels originate in diffusion rather than network predictions.

A toy example showing the result of label propagation and the estimated weights is shown in Figure 3.
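In a mini-batch setting, (12) amounts to a per-example weighted cross-entropy. A minimal PyTorch sketch (ours; the names, the mean reduction and the handling of the labeled/unlabeled split are assumptions):

```python
import torch
import torch.nn.functional as F

def weighted_pseudo_label_loss(logits, targets, is_labeled, omega, zeta):
    # logits: (B, c); targets: ground-truth labels for labeled examples, pseudo-labels otherwise;
    # is_labeled: (B,) boolean mask; omega: (B,) certainty weights; zeta: (c,) class weights.
    per_example = F.cross_entropy(logits, targets, reduction='none')   # l_s terms of Eq. (12)
    w = zeta[targets]                                                  # zeta_{y_i} or zeta_{y^_i}
    w = torch.where(is_labeled, w, w * omega)                          # omega_i only on pseudo-labels
    return (w * per_example).mean()
```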
Iterative training. Given the above definitions of the nearest neighbor graph, label propagation, example/class weighting and pseudo-label loss, we plug these components into an iterative learning process. We begin by randomly initializing the network parameters θ and we train the network for T epochs in a fully supervised manner on the l labeled examples X_L using the supervised loss term (2). The trained network then provides the starting point for the following iterative process. First, we extract descriptors V on the entire training set X and compute nearest neighbors to construct the adjacency matrix W. Second, we perform label propagation by solving the linear system (10) and assign pseudo-labels to the unlabeled examples X_U by (7). Finally, we train the network for one epoch on the entire training set X using the weighted loss L_w (12). We repeat this iterative process for T′ epochs. The above is summarized in Algorithm 1. Procedure OPTIMIZE() refers to the mini-batch optimization of the corresponding loss term for one epoch, i.e. all examples are fed to the network once. More details about batch construction are given in the implementation details.

Algorithm 1 Label propagation for deep SSL
1:  procedure LPDSSL(Training examples X, labels Y_L)
2:    θ ← initialize randomly
3:    for epoch ∈ [1, ..., T] do
4:      θ ← OPTIMIZE(L_s(X_L, Y_L; θ))               ▷ mini-batch optimization
5:    end for
6:    for epoch ∈ [1, ..., T′] do
7:      for i ∈ {1, ..., n} do v_i ← φ_θ(x_i)        ▷ extract descriptors
8:      for (i, j) ∈ {1, ..., n}² do a_{ij} ← affinity values (9)
9:      W ← A + A^⊤                                  ▷ symmetric affinity
10:     W ← D^{−1/2} W D^{−1/2}                      ▷ symmetrically normalized affinity
11:     Z ← solve (10) with CG                       ▷ diffusion
12:     for (i, j) ∈ U × C do ẑ_{ij} ← z_{ij} / Σ_k z_{ik}   ▷ normalize Z
13:     for i ∈ U do ŷ_i ← arg max_j ẑ_{ij}          ▷ pseudo-label
14:     for i ∈ U do ω_i ← certainty of ŷ_i (11)     ▷ pseudo-label weight
15:     for j ∈ C do ζ_j ← (|L_j| + |U_j|)^{−1}      ▷ class weight/balancing
16:     θ ← OPTIMIZE(L_w(X, Y_L, Ŷ_U; θ))            ▷ mini-batch optimization
17:   end for
18: end procedure
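For readers who prefer code to pseudocode, a rough PyTorch-style skeleton of Algorithm 1 is given below. It is a sketch only: the data loaders, the convention of marking unlabeled examples with -1 and the index-aware full_loader are hypothetical, the epoch counts are placeholders, and the graph, propagation and loss helpers refer to the illustrative sketches given earlier.

```python
import numpy as np
import torch
import torch.nn.functional as F

def lpdssl(model, phi, optimizer, labeled_loader, full_loader, labels, n_classes,
           T=10, T_prime=170, device='cuda'):
    # labels: length-n array over the full training set, with -1 marking unlabeled examples.
    # full_loader is assumed to yield (images, dataset_indices); T and T_prime are placeholders.

    # Phase 1: supervised warm-up on the labeled examples only (lines 3-5, loss (2)).
    for _ in range(T):
        for x, y in labeled_loader:
            optimizer.zero_grad()
            F.cross_entropy(model(x.to(device)), y.to(device)).backward()
            optimizer.step()

    for _ in range(T_prime):                                   # lines 6-17
        # Descriptor extraction, graph construction and label propagation (lines 7-15),
        # using the knn_affinity / propagate_labels sketches above.
        model.eval()
        with torch.no_grad():
            V = torch.cat([phi(x.to(device)).cpu() for x, _ in full_loader]).numpy()
        pseudo, omega, zeta = propagate_labels(knn_affinity(V), labels, n_classes)
        zeta_t = torch.as_tensor(zeta, dtype=torch.float32, device=device)

        # One epoch of mini-batch optimization with the weighted loss (12) (line 16).
        model.train()
        for x, idx in full_loader:
            i = idx.numpy()
            targets = torch.as_tensor(np.where(labels[i] >= 0, labels[i], pseudo[i]), device=device)
            is_labeled = torch.as_tensor(labels[i] >= 0, device=device)
            w = torch.as_tensor(omega[i], dtype=torch.float32, device=device)
            optimizer.zero_grad()
            loss = weighted_pseudo_label_loss(model(x.to(device)), targets, is_labeled, w, zeta_t)
            loss.backward()
            optimizer.step()
```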
Combination with other approaches. Our contribution falls in the case of a pseudo-label loss in the form of (3). It is orthogonal to approaches that use an unsupervised loss, for instance (4), applied to both labeled and unlabeled examples. Combination of the two comes in a straightforward way by adding term (4) to the total loss optimized in lines 4 and 16 of Algorithm 1. This is exactly the way we combine the proposed approach with the state-of-the-art Mean Teacher approach [38] in our experiments.
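In code, the combination is simply a sum of the two terms. The sketch below reuses the earlier illustrative helpers; the weighting coefficient lambda_u is a placeholder, not a value prescribed here.

```python
def combined_loss(model, x, augment, logits, targets, is_labeled, omega, zeta, lambda_u=1.0):
    # Pseudo-label term (12) plus a consistency term in the spirit of (4),
    # added inside the optimization steps of lines 4 and 16 of Algorithm 1.
    return (weighted_pseudo_label_loss(logits, targets, is_labeled, omega, zeta)
            + lambda_u * consistency_loss(model, x, augment))
```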
Discussion. In an inductive framework, if z_i/√d_{ii} is replaced by the network output f_θ(x_i) in the smoothness term of (8), then this becomes an unsupervised loss term, e.g. like (4), only now it encourages consistency between the predictions of nearby examples. Indeed, such a solution is adopted e.g. by Weston et al. [41]. This is not very efficient because the adjacency matrix is typically sparse, with non-zero elements only on nearest neighbors, and then the gradient of the smoothness term propagates from each example only to its neighbors at each iteration.
Our main idea therefore is that, instead of just encouraging nearby examples to get the same predictions, we encourage all examples to get predictions equal to the ones we would obtain by transductive learning according to the quadratic cost (8) and its solution Z (6). Computing Z is efficient because it is performed outside our main optimization process, i.e. it does not need iterating on mini-batches of data and backpropagating through the network. Then, given Z, the main optimization process drives all examples directly to that solution, as if they were all labeled.

5. Experiments

We present the datasets used in our experiments and the SSL setup that is followed. Then, we discuss the training details of our method and of the methods reproduced for fair comparison. Finally, we perform experiments to show the impact of the different components involved in the proposed method and to compare with the state of the art. All error rates reported are produced by our own implementation unless otherwise stated.

5.1. Datasets

We use three image classification datasets, namely CIFAR-10 [22], CIFAR-100 [22] and Mini-ImageNet [39]. Each dataset is used in an SSL setup where part of the training images are labeled and the rest are unlabeled. We evaluate the performance on an independent test set. Unless otherwise specified, error rate is reported in our experiments.

CIFAR-10. The training set consists of 50k images coming from 10 classes, while the test set consists of 10k images from the same 10 classes. All images have resolution 32 × 32. Evaluation is performed with 50, 100, 200, and 400 labeled images per class, corresponding to l = 500, 1k, 2k, and 4k labeled images in total. We use the same random selection of labeled images that is used in Mean Teacher [38] when available (1k, 2k and 4k labels). The selection process is repeated 10 times, resulting in 10 different dataset splits for SSL on CIFAR-10. We follow the common practice, which is to use each of them and report the mean error and standard deviation.

CIFAR-100. Similarly to CIFAR-10, CIFAR-100 has 50k training and 10k test images of resolution 32 × 32, coming from 100 classes. We follow a protocol equivalent to the one of CIFAR-10. We evaluate with 40 and 100 labeled images per class, corresponding to 4k and 10k labeled images in total. There are 3 such dataset splits; mean error and standard deviation are reported.

Mini-ImageNet. We introduce an SSL evaluation setup for Mini-ImageNet [39], which is a subset of the well-known ImageNet [6] dataset and has been previously used for few-shot learning [11]. We use the train/test splits created in the work of Ravi and Larochelle [33]. It consists of 100 classes with 600 images per class, of resolution 84 × 84. We randomly assign 500 images from each class to the training set, and 100 images to the test set. The result is a train and test set of 50k and 10k images, respectively. We create three dataset splits for the cases of 40 and 100 labeled images per class, which correspond to 4k and 10k labeled images in total. Mean error and standard deviation over the three dataset splits are reported.

5.2. Training

We list the reproduced baselines and provide training details per algorithm and dataset.

Implementation. We build our implementation on top of the publicly available PyTorch code for the Mean Teacher (MT) approach [38] (https://github.com/CuriousAI/mean-teacher/tree/master/pytorch). The fully supervised baseline and MT are reproduced identically to the original implementation. In all our experiments SGD optimization is used.

Networks. Experiments on CIFAR-10 and CIFAR-100 are performed with the "13-layer" network that is used in prior work [23, 38], while on Mini-ImageNet, ResNet-18 [18] is used. Both networks consist of a feature extractor φ_θ followed by an FC layer and softmax. We add an ℓ2-normalization layer right after φ_θ (before the FC layer), providing unit-norm descriptors for the graph construction. The same choice is also adopted in the fully supervised baseline. One exception is all variants of MT, as we observed that the ℓ2-normalization layer slightly harms performance. We normalize images to have channel-wise zero mean and unit variance over the entire training set. Unlike prior work [38], we do not normalize the input images with ZCA, nor add Gaussian noise to the input layer, which result in worse performance according to our experiments.

Hyper-parameters and training choices are adapted from the MT method and implementation. These are fixed
for all approaches (re)produced by this work. The training is performed for 180 epochs in total. The initial learning rate l_0 is decayed with cosine annealing [25] so that it would reach zero after 210 epochs, with l_0 = 0.05 on CIFAR-10, and l_0 = 0.2 on CIFAR-100 and Mini-ImageNet. Random data augmentation is performed by 4×4 random translations [38] followed by horizontal flip on CIFAR-10 and CIFAR-100. On Mini-ImageNet, each image is randomly rotated by 10 degrees before a random horizontal flip. The batch size is 100 for CIFAR-10 and 128 for CIFAR-100 and Mini-ImageNet. All other learning parameters remain unchanged from the MT implementation.

The fully supervised approach corresponds to training with (2) and labeled images only. MT uses the additional dual output trick with coefficient 0.01. Both these approaches are reproduced.

Our approach is performed with mini-batch size B = B_U + B_L, where B_L images are labeled and B_U images are originally unlabeled. We set B_L = 50 for CIFAR-10.

We additionally reproduce a variant of our approach where the pseudo-labels are not provided by diffusion but derived from the network with (1) or from GTG propagation [8] instead. Training is performed with (12), as with our method. This is in the spirit of pseudo-labeling in prior work [36, 24].
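As a small worked example of the schedule above (a sketch; whether the decay is applied per epoch or per iteration is left unspecified here), the learning rate at a given epoch is:

```python
import math

def cosine_lr(epoch, l0, horizon=210):
    # Cosine annealing that would reach zero at `horizon` = 210 epochs,
    # although training stops earlier, at 180 epochs.
    return l0 * 0.5 * (1.0 + math.cos(math.pi * epoch / horizon))

# e.g. cosine_lr(0, 0.05) = 0.05, cosine_lr(90, 0.05) ~ 0.031, cosine_lr(180, 0.05) ~ 0.0025
```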
Pseudo-labeling | ω_i | ζ_j | CIFAR-10
Diffusion (7)   |     |     | 36.53 ± 1.42
Diffusion (7)   |     |  ✓  | 36.17 ± 1.98
Diffusion (7)   |  ✓  |     | 33.32 ± 1.53
Diffusion (7)   |  ✓  |  ✓  | 32.40 ± 1.80
GTG [8]         |  ✓  |  ✓  | 35.20 ± 2.23
Network (1)     |  ✓  |  ✓  | 35.17 ± 2.46

Table 1. Impact of weights ω_i, class weights ζ_j, and pseudo-labeling by diffusion prediction (7) or network prediction (1). Error rate is reported on CIFAR-10 with 500 labels.

[Figure 4. Accuracy of predicted pseudo-labels according to ground-truth on CIFAR-10 with 500 labeled images, over training epochs. Diffusion predictions (7) are compared against network predictions (1).]

[Figure 6. Error rate versus number of labeled images on CIFAR-10 using different methods (ours, MT [38], MT + ours).]

[Figure 7. Examples of incorrectly pseudo-labeled images with highest ω_i in CIFAR-10. Predicted class and ω_i are shown below each image.]

Dataset: CIFAR-10
Nb. labeled images                        | 500          | 1000         | 2000         | 4000
Fully supervised                          | 49.08 ± 0.83 | 40.03 ± 1.11 | 29.58 ± 0.93 | 21.63 ± 0.38
TDCNN [36]†                               | -            | 32.67 ± 1.93 | 22.99 ± 0.79 | 16.17 ± 0.37
Network prediction (1) + weights          | 35.17 ± 2.46 | 23.79 ± 1.31 | 16.64 ± 0.48 | 13.21 ± 0.61
Ours: Diffusion prediction (7) + weights  | 32.40 ± 1.80 | 22.02 ± 0.88 | 15.66 ± 0.35 | 12.69 ± 0.29
VAT [26]†                                 | -            | -            | -            | 11.36
Π model [23]†                             | -            | -            | -            | 12.36 ± 0.31
Temporal Ensemble [23]†                   | -            | -            | -            | 12.16 ± 0.24
MT [38]†                                  | -            | 27.36 ± 1.30 | 15.73 ± 0.31 | 12.31 ± 0.28
MT [38]                                   | 27.45 ± 2.64 | 19.04 ± 0.51 | 14.35 ± 0.31 | 11.41 ± 0.25
MT + Ours                                 | 24.02 ± 2.44 | 16.93 ± 0.70 | 13.22 ± 0.29 | 10.61 ± 0.28

Table 2. Comparison with the state of the art on CIFAR-10. Error rate is reported. The "13-layer" network is used. The top part of the table corresponds to training with pseudo-labels, while the bottom part includes methods that are complementary to ours, as shown by the combination of our method with MT. † denotes scores reported in prior work.

5.3. Ablation Study

We investigate the impact of the different components of our method. First, we study the effectiveness of the weights introduced in the loss function (12). Table 1 shows the classification performance on the CIFAR-10 test set when using only 500 labeled examples for training, with the rest of the training set considered unlabeled. Different weighting schemes are evaluated by setting all ω_i to one, all ζ_j to one, or both to one. It is shown that both weights have positive contributions. We also show the benefit of predicting with diffusion over predicting with the trained network or GTG propagation. Pseudo-labeling by the network predictions uses examples
that the network can already classify, while diffusion allows for accurate predictions beyond those examples. In Figure 4, we report the progress of the pseudo-label accuracy on the unlabeled images X_U throughout the training. Diffusion predictions are consistently better than network predictions.

Figure 5 demonstrates how ω_i accurately estimates the certainty of the prediction. From the plots we observe that predictions become more accurate as the training evolves, while at the beginning most examples are misclassified. The proposed weighting mechanism is robust to incorrect pseudo-labels and prevents model collapse. Figure 7 shows some of the incorrectly pseudo-labeled images with high certainty ω_i. Most of the incorrect labels come from trucks labeled as automobiles or birds labeled as frogs.

5.4. Comparison with the state-of-the-art

We present a comparison with the state of the art on all 3 datasets in Tables 2 and 3. The comparison includes performance reported in prior work and our reproduced results. In the case of the work by Shi et al. [36], we only compare with their TDCNN variant, which refers to pseudo-labeling for network training. The other loss terms in their work are complementary to ours, similarly to MT. We additionally compare with our implementation of pseudo-labeling with network predictions combined with the proposed weights.

The proposed approach performs the best out of the pseudo-label based approaches on CIFAR-10. Results in Figure 6 show that our benefit is larger when the number of labels is reduced. The results on CIFAR-10 show that our approach is complementary to unsupervised loss, such as the one used by MT. This combination achieves the best performance on this dataset. The same holds for CIFAR-100 and Mini-ImageNet with 10k available labels. Our method also achieves a lower error rate than temporal ensembling (38.65 ± 0.51) and the Π-model (39.19 ± 0.36) on CIFAR-100 [23] with 10k labels. On Mini-ImageNet with 4k available labels, the best performance is achieved when using our method without combining with Mean Teacher.

6. Conclusions

Most recent approaches for deep SSL rely on training with an unsupervised loss on both labeled and unlabeled images. We have proposed an approach that relies on graph-based label propagation to infer pseudo-labels for the unlabeled images. An additional training set is formed with these pseudo-labels, which are shown to be more valuable than the pseudo-labels inferred by the network itself. Our method is in principle complementary to unsupervised loss terms, which is experimentally shown in this work.

Acknowledgments. This work is supported by the GAČR grant 19-23165S and the OP VVV funded project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics".
References

[1] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In ECCV, 2018.
[2] Siddhartha Chandra and Iasonas Kokkinos. Fast, exact and multi-scale inference for semantic image segmentation with deep Gaussian CRFs. In ECCV, 2016.
[3] Olivier Chapelle, Bernhard Scholkopf, and Alexander Zien. Semi-Supervised Learning. MIT Press, 2006.
[4] Dengxin Dai and Luc Van Gool. Ensemble projection for semi-supervised image classification. In ICCV, 2013.
[5] Carl Doersch, Abhinav Gupta, and Alexei A. Efros. Unsupervised visual representation learning by context prediction. In ICCV, 2015.
[6] Wei Dong, Richard Socher, Li Li-Jia, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
[7] Matthijs Douze, Arthur Szlam, Bharath Hariharan, and Hervé Jégou. Low-shot learning with large-scale diffusion. In CVPR, 2018.
[8] Ismail Elezi, Alessandro Torcinovich, Sebastiano Vascon, and Marcello Pelillo. Transductive label augmentation for improved deep network learning. arXiv preprint arXiv:1805.10546, 2018.
[9] Aykut Erdem and Marcello Pelillo. Graph transduction as a noncooperative game. Neural Computation, 24, 2012.
[10] Rob Fergus, Yair Weiss, and Antonio Torralba. Semi-supervised learning in gigantic image collections. In NIPS, 2009.
[11] Spyros Gidaris and Nikos Komodakis. Dynamic few-shot visual learning without forgetting. In CVPR, 2018.
[12] Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In ICLR, 2018.
[13] Albert Gordo, Jon Almazan, Jerome Revaud, and Diane Larlus. End-to-end learning of deep visual representations for image retrieval. IJCV, 124(2), 2017.
[14] Leo Grady. Random walks for image segmentation. IEEE Trans. PAMI, 28(11):1768–1783, 2006.
[15] Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In NIPS, 2005.
[16] Matthieu Guillaumin, Jakob Verbeek, and Cordelia Schmid. Multimodal semi-supervised learning for image classification. In CVPR, 2010.
[17] Philip Haeusser, Alexander Mordvintsev, and Daniel Cremers. Learning by association – a versatile semi-supervised training method for neural networks. In CVPR, 2017.
[18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[19] Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, and Ondrej Chum. Mining on manifolds: Metric learning without labels. In CVPR, 2018.
[20] Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Teddy Furon, and Ondrej Chum. Efficient diffusion on region manifolds: Recovering small objects with compact CNN representations. In CVPR, 2017.
[21] Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734, 2017.
[22] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
[23] Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. In ICLR, 2017.
[24] Dong-Hyun Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In ICMLW, 2013.
[25] Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. In ICLR, 2017.
[26] Takeru Miyato, Shin-ichi Maeda, Shin Ishii, and Masanori Koyama. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Trans. PAMI, 2018.
[27] Avital Oliver, Augustus Odena, Colin Raffel, Ekin D Cubuk, and Ian J Goodfellow. Realistic evaluation of deep semi-supervised learning algorithms. In ICLRW, 2018.
[28] Deepak Pathak, Ross B Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. Learning features by watching objects move. In CVPR, 2017.
[29] Siyuan Qiao, Wei Shen, Zhishuai Zhang, Bo Wang, and Alan Yuille. Deep co-training for semi-supervised image recognition. In ECCV, 2018.
[30] Filip Radenović, Giorgos Tolias, and Ondřej Chum. Fine-tuning CNN image retrieval with no human annotation. IEEE Trans. PAMI, 2018.
[31] Ilija Radosavovic, Piotr Dollar, Ross Girshick, Georgia Gkioxari, and Kaiming He. Data distillation: Towards omni-supervised learning. In CVPR, 2018.
[32] Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko. Semi-supervised learning with ladder networks. In NIPS, 2015.
[33] Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. In ICLR, 2016.
[34] Mehdi Sajjadi, Mehran Javanmardi, and Tolga Tasdizen. Mutual exclusivity loss for semi-supervised deep learning. In ICIP, 2016.
[35] Mehdi Sajjadi, Mehran Javanmardi, and Tolga Tasdizen. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In NIPS, 2016.
[36] Weiwei Shi, Yihong Gong, Chris Ding, Zhiheng Ma, Xiaoyu Tao, and Nanning Zheng. Transductive semi-supervised deep learning using min-max features. In ECCV, 2018.
[37] Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta. Constrained semi-supervised learning using attributes and comparative attributes. In ECCV, 2012.
[38] Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In NIPS, 2017.
[39] Oriol Vinyals, Charles Blundell, Tim Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In NIPS, 2016.
[40] Xiaolong Wang, Kaiming He, and Abhinav Gupta. Transitive invariance for self-supervised visual representation learning. In ICCV, 2017.
[41] Jason Weston, Frédéric Ratle, and Ronan Collobert. Deep learning via semi-supervised embedding. In ICML, 2008.
[42] Zhirong Wu, Yuanjun Xiong, Stella Yu, and Dahua Lin. Unsupervised feature learning via non-parametric instance-level discrimination. In CVPR, 2018.
[43] Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf. Learning with local and global consistency. In NIPS, 2003.
[44] Xiaojin Zhu, John Lafferty, and Ronald Rosenfeld. Semi-Supervised Learning with Graphs. PhD thesis, Carnegie Mellon University, Language Technologies Institute, School of Computer Science, Pittsburgh, PA, 2005.
[45] Xiaojin Zhu, John D Lafferty, and Zoubin Ghahramani. Semi-supervised learning: From Gaussian fields to Gaussian processes. Technical report, 2003.