
Semi-Supervised Variational Adversarial Active Learning via Learning to Rank and Agreement-Based Pseudo Labeling

Zongyao Lyu[0000−0002−7542−5818] and William J. Beksi[0000−0001−5377−2627]

The University of Texas at Arlington, Arlington TX 76019, USA

arXiv:2408.12774v1 [cs.LG] 23 Aug 2024

Abstract. Active learning aims to alleviate the amount of labor involved in data
labeling by automating the selection of unlabeled samples via an acquisition func-
tion. For example, variational adversarial active learning (VAAL) leverages an
adversarial network to discriminate unlabeled samples from labeled ones using
latent space information. However, VAAL has the following shortcomings: (i) it
does not exploit target task information, and (ii) unlabeled data is only used for
sample selection rather than model training. To address these limitations, we in-
troduce novel techniques that significantly improve the use of abundant unlabeled
data during training and take into account the task information. Concretely, we
propose an improved pseudo-labeling algorithm that leverages information from
all unlabeled data in a semi-supervised manner, thus allowing a model to ex-
plore a richer data space. In addition, we develop a ranking-based loss prediction
module that converts predicted relative ranking information into a differentiable
ranking loss. This loss can be embedded as a rank variable into the latent space of
a variational autoencoder and then trained with a discriminator in an adversarial
fashion for sample selection. We demonstrate the superior performance of our ap-
proach over the state of the art on various image classification and segmentation
benchmark datasets.

Keywords: Active Learning · Semi-Supervised Learning · Image Classification and Segmentation

1 Introduction
Deep learning has shown impressive results on computer vision tasks mainly due to an-
notated large-scale datasets. Yet, acquiring labeled data can be extremely costly or even
infeasible. To overcome this issue, active learning (AL) was introduced [6,31]. In AL, a
model is initialized with a relatively small set of labeled training samples. Then, an AL
algorithm progressively chooses samples for annotation that yield high classification
performance while minimizing labeling costs. By demonstrating a reduced requirement
for training instances, AL has been applied to various computer vision applications in-
cluding image categorization, image segmentation, text classification, and more.
Among the most prevalent AL strategies, pool-based approaches have access to a
huge supply of unlabeled data. This provides valuable information about the under-
lying structure of the whole data distribution, especially for small labeling budgets.
Nevertheless, many AL methods still fail to leverage valuable information within the

Fig. 1: An overview of SS-VAAL. First, a loss prediction module attached to the tar-
get model predicts losses on the input data. Next, the predicted losses along with the
actual target losses are transformed into ranking losses via a pretrained ranking func-
tion. Unlabeled samples are then passed to the target model and subsequently through
a k-means algorithm to acquire pseudo labels for additional training. Finally, a discrim-
inator following a variational autoencoder is trained in an adversarial manner to select
unlabeled samples for annotation.

unlabeled data during training. On the other hand, semi-supervised learning (SSL), in
particular the technique of pseudo labeling, thrives on utilizing unlabeled data. Pseudo
labeling is based on the concept whereby a model assigns “pseudo labels” to samples
that produce high-confidence scores. It then integrates these samples into the training
process. In contrast, AL typically selects only a handful of highly-informative samples
(i.e., samples with low prediction confidence) at each learning step and regularly seeks
user input. Although AL and pseudo labeling both aim to leverage a model’s uncer-
tainty, they look at different ends of the same spectrum. Hence, their combination can
be expected to achieve increased performance [14].
In light of this observation, we propose to exploit both labeled and unlabeled data
during model training by (i) predicting pseudo labels for unlabeled samples, and (ii)
incorporating these samples and their pseudo labels into the labeled training data in
every AL cycle. The idea of using unlabeled data for training is not new. Earlier work by
Wang et al. [36] showed promising results by applying entropy-based pseudo labeling to
AL. However, pseudo labeling can perform poorly in its original formulation. The sub-
par performance is attributed to inaccurate high-confidence predictions made by poorly
calibrated models. These predictions produce numerous incorrect pseudo labels [1]. To
tackle this issue, we introduce a novel agreement-based clustering technique that assists
in determining pseudo labels. Clustering algorithms can analyze enormous amounts of
unlabeled data in an unsupervised way [26,7], and cluster centers are highly useful for
querying labels from an oracle [18]. Our two-step process involves (i) separately clus-
tering labeled and unlabeled data, (ii) assigning each piece of unlabeled data an initial
pseudo label and a clustering label. A final pseudo label is confirmed only if these two
labels agree. The end result is a significant reduction in the number of incorrect pseudo
labels.
The second aspect of our work focuses on the sample selection strategy in AL.
We base our approach on the VAAL [34] framework. VAAL uses an adversarial dis-
criminator to discern between labeled and unlabeled data, which informs the sample
selection process. Later adaptations of VAAL (e.g., TA-VAAL [19]) incorporate a loss
prediction module, relaxing the task of exact loss prediction to loss ranking prediction.
Additionally, a ranking conditional generative adversarial network (RankCGAN) [29]
is employed to combine normalized ranking loss information into VAAL. To better in-
tegrate task-related information into the training process, we propose a learning-to-rank
method for VAAL. This decision is inspired by the realization that the loss prediction
can be interpreted as a ranking problem [23], a concept central to information retrieval.
We refine the loss prediction process by applying a contemporary learning-to-rank tech-
nique for approximating non-differentiable operations in ranking-based scores. The loss
prediction module estimates a loss for labeled input, converting the predicted loss and
actual target loss into a differentiable ranking loss. This ranking loss, along with la-
beled and unlabeled data, is provided as input into an adversarial learning process that
identifies unlabeled samples for annotation. Therefore, by explicitly exploiting the loss
information directly related to the given task, task-related information is integrated into
the AL process. The architecture of our proposed method, SS-VAAL, is depicted in
Fig. 1.
To summarize, our contributions are the following.

1. We create a novel agreement-based pseudo-labeling technique that optimally harnesses rich information from abundant unlabeled data in each AL cycle, capitalizing on the advantages of unsupervised feature learning.
2. We devise an enhanced loss prediction module that employs a learning-to-rank
method, yielding a more effective sample selection strategy. We develop a ranking
method that explicitly ranks the predicted losses by taking into account the entire
list of loss structures, as opposed to only the pairwise information considered in
prior approaches.
3. We highlight the superior efficacy of our approach through its application to com-
mon image classification and segmentation benchmarks.

Our source code is publicly available [35].

2 Related Work

2.1 Active Learning

AL methods operate on an iterative principle of constructing a training set. This involves (i) cyclically training the classifier on the current labeled training set, and (ii) once the
model converges, soliciting an oracle (e.g., human annotator) to label new points se-
lected from a pool of unlabeled data based on the utilized heuristic. This type of AL
belongs to pool-based AL, in which our methodology lies. Pool-based AL can be clas-
sified into three groups: (i) uncertainty (informativeness-based) methods [22,12,3], (ii)
diversity (representativeness-based) methods [30], and (iii) hybrid methods [17,39,2,38], based on the instance selection strategy they use. Among the various instance selection
strategies, uncertainty-based selection is the most prevalent. It measures the uncertain-
ties of new unlabeled samples using the predictions made by prior classifiers.
Diversity-based AL methods rely on selecting a few examples by increasing the
diversity of a given batch. The core-set technique [30] was proposed to minimize the
distance between the labeled and unlabeled data pool using the intermediate feature
information of a convolutional deep neural network (DNN) model. It was shown to
be an effective method for large-scale image classification tasks and was theoretically
proven to work best when the number of classes is small. However, as the number of
classes grows the performance deteriorates.
AL methods that combine uncertainty and diversity use a two-step process to select
high-uncertainty points as the most informative points in a batch. Li et al. [24] presented
an adaptive AL approach that combines an information density and uncertainty measure
together to label critical instances for image classification. Sinha et al. [34] observed
that the uncertainty-based batch query strategy often results in a lack of sample diversity
and is vulnerable to outliers. As a remedy, they proposed VAAL, a method that utilizes
an adversarial learning approach to distinguish the spatial coding features of labeled
and unlabeled data, thereby mitigating outlier interference. It also employs both labeled
and unlabeled data to jointly train a variational autoencoder (VAE) in a semi-supervised
fashion. Sample selection in VAAL is based on the prediction from the discriminator
adversarially trained with the VAE. While VAAL incorporates unlabeled data during
the adversarial learning process, it neglects this data during target task learning due to
its inherently task-agnostic nature. An extended version of VAAL [42] was proposed to
combine task-aware and task-agnostic approaches with an uncertainty indicator and a
unified representation for both labeled and unlabeled data.
Task-aware VAAL (TA-VAAL) [19] is an alternative extension of task-agnostic
VAAL that combines task-aware and task-agnostic approaches. TA-VAAL adapts VAAL
to consider the data distribution of both labeled and unlabeled pools by combining them
with a learning loss approach [40]. The learning loss is a task-agnostic method. It in-
cludes a loss prediction module that learns to predict the target loss of unlabeled data
and selects data with the highest predicted loss for labeling. TA-VAAL relaxes the task
of learning loss prediction to ranking loss prediction and employs RankCGAN to incor-
porate normalized ranking loss information into VAAL. However, the main difference
between VAAL and TA-VAAL is the use of task-related information for learning the
ranking function in conjunction with information from unlabeled data. Even so, unla-
beled data is not directly applied to target task learning. To rectify this, we propose a
novel pseudo-labeling technique that can be integrated into each AL cycle, enabling
the comprehensive utilization of the rich information contained within unlabeled data
for direct learning of the target task. Another recent method, multi-classifier adversarial
optimization for active learning (MAOAL) [13], employs multiple classifiers trained ad-
versarially to more precisely define inter-class decision boundaries while aligning fea-
ture distributions between labeled and unlabeled data. We demonstrate that our method
outperforms MAOAL in image classification tasks.

2.2 Semi-Supervised Learning


SSL is a strategy that leverages both labeled and unlabeled data for model training, with
an emphasis on utilizing abundantly available unlabeled data. Several techniques have
been proposed to exploit the relationship between labeled and unlabeled data to achieve
better performance. A notable technique is pseudo labeling [21] where a model, once
trained, is used to predict labels for unlabeled data. These pseudo-labeled data are then
used in subsequent training iterations. Other methods, such as multi-view training [33]
and consistency regularization [28], leverage the structure or inherent properties of the
data to derive meaningful information from the unlabeled portion.
Several efforts have been made to combine SSL and AL methods to make better use
of the unlabeled data during training [36,32,4]. A common strategy in this integrated
approach is to apply pseudo labeling techniques during each AL cycle. This enriches
the training set and improves model accuracy by combining SSL’s efficient use of unla-
beled data with AL’s selective querying, offering a cost-effective solution for scenarios
with limited labeled data. Although simple to implement, pseudo labeling can perform
relatively poorly in its original formulation. The underperformance of pseudo labeling
is generally attributed to incorrect high-confidence predictions from models that are not
properly calibrated. This causes a proliferation of wrong pseudo labels, thus resulting in
a noisy training process [27]. Our enhanced pseudo-labeling approach addresses this
limitation by incorporating unsupervised feature learning through the use of clustering.
Clustering algorithms are employed to group the unlabeled data, and the cluster centers
are used for verifying the predicted pseudo labels. This greatly reduces the number
of incorrect pseudo labels, as labels are assigned based on proximity to cluster centers, which represent the classes better than individual instances do.

3 Method
Let (XL, YL) be a pool of data and their labels, and XU the pool of unlabeled data. Training starts with K available labeled sample pairs (XL^K, YL^K). Given a fixed labeling budget in each AL cycle, b samples from the unlabeled pool are queried according to an acquisition function. Next, the samples are annotated by human experts and added to the labeled pool. The model is then iteratively trained on the updated labeled pool (XL^(K+b), YL^(K+b)), and this process is repeated until the labeling budget is exhausted.
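This pool-based loop can be made concrete with a short sketch. It is a minimal illustration, not the authors' implementation: train_model, acquisition_score, and oracle_label are hypothetical placeholders for the target-model training routine, the acquisition function, and the human annotator.

```python
import numpy as np

def pool_based_al(x_labeled, y_labeled, x_unlabeled, b, num_cycles,
                  train_model, acquisition_score, oracle_label):
    """Generic pool-based active learning loop: train, score the pool, query b samples, repeat."""
    model = None
    for _ in range(num_cycles):
        model = train_model(x_labeled, y_labeled)            # retrain on the current labeled pool
        scores = acquisition_score(model, x_unlabeled)       # higher score = more informative sample
        query_idx = np.argsort(-scores)[:b]                  # pick the b most informative samples
        y_new = oracle_label(x_unlabeled[query_idx])         # annotation by a human expert
        x_labeled = np.concatenate([x_labeled, x_unlabeled[query_idx]])
        y_labeled = np.concatenate([y_labeled, y_new])
        x_unlabeled = np.delete(x_unlabeled, query_idx, axis=0)
    return model, x_labeled, y_labeled
```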
SS-VAAL enhances the VAAL framework and its variant, TA-VAAL, as follows.
VAAL employs adversarial learning to distinguish features of labeled and unlabeled
data, which reduces outlier impact and leverages both labeled and unlabeled data in
a semi-supervised training scheme. TA-VAAL, building on the groundwork of VAAL,
utilizes global data structures and local task-related information for sample queries. Our
methodology improves upon these predecessors by harnessing the full potential of the
data distribution and model uncertainty, hence further refining the query strategy in the
AL process.

3.1 Clustering-Assisted Pseudo Labeling


Neither VAAL nor TA-VAAL fully uses unlabeled data in the target learning task.
Therefore, we propose to exploit both types of data during model training as follows.

Fig. 2: The detailed architecture of SS-VAAL. (Stage 1) A loss prediction module is


attached to the target model to predict losses on the input data. These predicted losses,
along with the actual losses obtained from the target model, are transformed into rank-
ing losses via a pretrained ranking function. Features of the labeled samples are ex-
tracted from the target model to fit a k-means algorithm. (Stage 2) Unlabeled samples
are processed through the target model to obtain initial pseudo labels. The k-means al-
gorithm, already fit with labeled features, is also applied to the unlabeled samples to
obtain clustering labels for them. Initial pseudo and clustering labels are combined to
determine the final pseudo labels. These unlabeled samples and their pseudo labels are
then used for additional training of the target model. (Stage 3) Both labeled and unla-
beled samples are fed into an encoder network to learn the latent variables. The learned
and rank variables are trained adversarially with a discriminator. Sample selection is
based on the predicted probability from the discriminator.

Given XL and XU for labeled and unlabeled examples, respectively, we apply a clas-
sifier f on the unlabeled data f (XU ), and select and assign pseudo labels ŷ for the
most certain predictions. Traditionally, the labeled set will be directly augmented by
y = y + ŷ for the next round of training. Nonetheless, pseudo labeling in its initial form
may produce high-confidence predictions that are incorrect, resulting in numerous er-
roneous pseudo labels and ultimately causing an unstable training process.
To mitigate this issue, we present a semi-supervised pre-clustering technique for
each pseudo label selection process that enhances robustness by reducing incorrect
pseudo labels. In each AL cycle, we first train a model on the available labeled data.
We modify the network to output both the probability score and the feature vector from
the last fully-connected layer before sending it to the softmax function. Then, we fit a
k-means clustering algorithm on the output features of the labeled training data. This
allows the algorithm to learn the structure of the labeled data and predict clusters each
of whose centroids corresponds to one of the classes of the dataset. Note that the cluster assignments will not necessarily correspond directly to the classes of the dataset being trained on. This is because clustering algorithms (e.g., k-means) have no inherent knowledge of class labels, and thus the cluster labels they assign carry no intrinsic meaning. To make them meaningful, we map the clustering labels to the actual classes so that they correspond to one another. This is done by assigning each cluster label
to the most frequent true class label within that cluster based on the labeled training
data.
Next, we apply the trained classifier to all unlabeled data to get the predicted probability vectors

$$p(y_i = j \mid x_i) = f(X_U), \qquad f : X_U \to \mathbb{R}^c, \tag{1}$$

where c is the total number of classes. We assign initial pseudo labels to the unlabeled data with the most certain predictions only when their associated probabilities are larger than a threshold τ (we set τ = 0.95 in the experiments), i.e.,

$$j^* = \max_{j}\, p(y_i = j \mid x_i), \qquad \hat{y}_i = \begin{cases} \arg\max_{j}\, p(y_i = j \mid x_i), & j^* > \tau \\ 0, & \text{otherwise.} \end{cases} \tag{2}$$

Then, we apply the k-means function learned on the labeled data to the unlabeled data
to predict the clusters they belong to. Each unlabeled sample is assigned to the nearest cluster and given the label to which that cluster's centroid corresponds.
Each unlabeled data point will now have both an initial pseudo label and a clustering label. Lastly, we compare the initial pseudo labels with the clustering labels and confirm a final pseudo label for each unlabeled sample only if the two labels agree. By doing so, we reduce the number of incorrect pseudo labels, thus taking full
advantage of the abundant unlabeled data for model training. Stage 2 in Fig. 2 shows
this agreement-based pseudo-labeling process. We demonstrate improvement over con-
ventional pseudo labeling through an ablation study in the supplementary material.
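As a concrete illustration, the agreement check can be sketched as follows. This is a minimal sketch under the assumption that the target model exposes penultimate-layer features and softmax probabilities; scikit-learn's KMeans stands in for the clustering component, and the variable names are ours, not the authors'.

```python
import numpy as np
from sklearn.cluster import KMeans

def agreement_pseudo_labels(feats_l, y_l, feats_u, probs_u, num_classes, tau=0.95):
    """Keep a pseudo label only when the classifier and the cluster assignment agree."""
    # Fit k-means on labeled features and map each cluster to its most frequent true class.
    km = KMeans(n_clusters=num_classes, n_init=10).fit(feats_l)
    cluster_to_class = np.full(num_classes, -1)
    for c in range(num_classes):
        members = y_l[km.labels_ == c]          # y_l: integer class labels of the labeled pool
        if len(members) > 0:
            cluster_to_class[c] = np.bincount(members).argmax()

    # Initial pseudo labels from high-confidence softmax predictions (Eq. 2).
    confidence = probs_u.max(axis=1)
    initial_pl = probs_u.argmax(axis=1)

    # Clustering labels for the unlabeled data, mapped to class labels.
    clustering_pl = cluster_to_class[km.predict(feats_u)]

    # A final pseudo label is kept only if the sample is confident and both labels agree.
    keep = (confidence > tau) & (initial_pl == clustering_pl)
    return keep, initial_pl

# feats_l, y_l: penultimate-layer features and labels of the labeled pool;
# feats_u, probs_u: features and softmax outputs of the unlabeled pool.
```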

3.2 Loss Prediction with Learning-to-Rank


In LL4AL [40], Yoo and Kweon designed a loss prediction module attached to the tar-
get network and jointly learned to predict the losses of unlabeled inputs. The loss is
predicted as a measure of uncertainty, directly guiding the sample selection process.
LL4AL has proven to be effective, yet the “loss-prediction loss” that is key to this ap-
proach is not trivial to calculate. The loss module adapts roughly to the scale changes
of the loss instead of fitting to the exact value. Similar to TA-VAAL, we incorporate
task-related information into the learning process by combining VAAL with the loss
prediction module. Unlike TA-VAAL, which employs a GAN-based ranking method to
address this issue, our approach integrates VAAL with a listwise learning-to-rank tech-
nique that explicitly ranks the predicted losses thus taking into account the entire list of
loss structures. This decision stems from the observation that learning the loss predic-
tion can be seen as a ranking problem. Additionally, the loss in TA-VAAL resembles the
original LL4AL as both only consider the neighboring data pairs and ignore the over-
all list structure. This motivates us to use a more appropriate listwise ranking scheme.
Ranking is crucial for many computing tasks, such as information retrieval, and it is
often addressed via a listwise approach (e.g., [5,25]). This involves taking ranked lists
of objects as instances and training a ranking function through the minimization of a
listwise loss function defined on the predicted and ground-truth lists [37].
SoDeep [10] is a method for approximating the non-differentiable sorting operation
in ranking-based losses. It uses a DNN as a sorter to approximate the ranking function
and it is pretrained separately on synthetic values and their ground-truth ranks. The
trained sorter can then be applied directly in downstream tasks by combining it with an
existing model (e.g., the loss prediction module) and converting the value list given by
the model into a ranking list. The ranking loss between the predicted and ground-truth
ranks can then be calculated and backpropagated through the differentiable sorter and
used to update the weights of the model. Fig. 3 illustrates the sorter architecture. We
find this process works well with the loss prediction task in the loss module. Therefore,
we apply SoDeep to the loss prediction module and learn to predict the ranking loss as
a variable that injects task-related information into the subsequent adversarial learning
process, which increases the robustness of the unlabeled sample selection. Concretely,
we substitute the loss prediction module into the sorter architecture as the DNN target
model to produce the predicted scores where the target losses are used as the ground-
truth scores.

Fig. 3: An overview of the SoDeep sorter architecture. A pretrained differentiable DNN sorter converts the raw scores given by the target model into ranks. A loss is then applied to the predicted ranks, backpropagated through the differentiable sorter, and used to update the weights.

The upper-right side of Fig. 2 displays the architecture of the modified loss learning
process. We retain the basic structure of the original loss prediction module. Given an
input, the target model generates a prediction, while the loss prediction module takes
multi-layer features as inputs that are extracted from multiple mid-level blocks of the
target model. These features are connected to multiple identical blocks each of which
consists of a global average pooling layer and a fully-connected layer. Then, the outputs
are concatenated and passed through another fully-connected layer to be converted to a scalar value as the predicted loss Lpred. The target prediction and annotation are used
to calculate a loss Ltarget , which assists in training the target model. This target loss
is treated as the ground-truth loss for the loss prediction module and used to compute
the loss-prediction loss. Specifically, the predicted loss and the target loss are passed
through the pretrained SoDeep sorter and converted to a differentiable ranking loss

$$\mathcal{L}_{\mathrm{ranking}} = \mathrm{SoDeep}(\mathcal{L}_{\mathrm{pred}}, \mathcal{L}_{\mathrm{target}}), \tag{3}$$

which can be used to update the weights of the model. The objective function of the
task learner with the ranking loss module is

$$\mathcal{L} = \mathcal{L}_{\mathrm{target}}(\hat{y}_L, y_L) + \lambda\, \mathcal{L}_{\mathrm{ranking}}, \tag{4}$$

where ŷL and yL are the predicted and ground-truth labels, respectively, and λ is a
scaling constant. This training process is illustrated as Stage 1 in Fig. 2. The learned
ranking loss is embedded as a task-related rank variable in the latent space of a VAE
for the subsequent adversarial learning process, which is described in detail in Sec. 3.3.
Stage 1 of the two-stage training is summarized in Alg. 1.

Algorithm 1 Target Model Training


Require: Labeled data pool (XL , YL ), unlabeled data pool XU , pretrained SoDeep sorter S,
initialized model θT , training epochs N , threshold τ
Ensure:
1: for i = 1 to N do
2: Train target model θT on labeled data (XL , YL ) to obtain features and target loss Ltarget
3: Obtain predicted loss Lpred through loss prediction module by fusing multi-level features
4: Lranking ← S(Ltarget , Lpred )
5: Fit features to k-means algorithm
6: Apply k-means on unlabeled data XU and predict clustering labels (CL)
7: Predict initial pseudo labels (IPL) ŷi for unlabeled data XU using (2)
8: Final pseudo labels ← IPL ∩ CL
9: Train model on labeled and pseudo labeled data
10: end for
11: return Trained model θT
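To make the loss prediction step in Stage 1 concrete, below is a hedged PyTorch sketch of the loss prediction head and the combined objective in Eq. (4). The feature-map channel sizes, the interface of the pretrained sorter, and the use of cross-entropy as the target loss are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LossPredictionModule(nn.Module):
    """GAP + FC per mid-level feature map, concatenated and mapped to a scalar loss estimate."""
    def __init__(self, feature_channels=(64, 128, 256, 512), hidden=128):
        super().__init__()
        self.fcs = nn.ModuleList([nn.Linear(c, hidden) for c in feature_channels])
        self.out = nn.Linear(hidden * len(feature_channels), 1)

    def forward(self, feature_maps):
        parts = []
        for fmap, fc in zip(feature_maps, self.fcs):
            pooled = F.adaptive_avg_pool2d(fmap, 1).flatten(1)   # global average pooling
            parts.append(F.relu(fc(pooled)))
        return self.out(torch.cat(parts, dim=1)).squeeze(1)      # predicted per-sample loss

def stage1_objective(logits, targets, pred_loss, sorter, lam=1.0):
    """Eq. (4): mean target loss plus the differentiable ranking loss from the pretrained sorter."""
    target_loss = F.cross_entropy(logits, targets, reduction="none")  # per-sample target losses
    ranking_loss = sorter(pred_loss, target_loss.detach())            # L_ranking = SoDeep(L_pred, L_target)
    return target_loss.mean() + lam * ranking_loss
```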

3.3 Joint Training with a Variational Autoencoder and Discriminator


For sample selection, we extend VAAL by utilizing a VAE and an adversarial network
(discriminator) to distinguish labeled from unlabeled data. Unlike VAAL, which only
considers the data distribution for adversarial learning, we incorporate task-related
information by embedding the ranking loss as a rank variable in the latent space for
training both the VAE and the discriminator. Let pθ and qϕ be the encoder and decoder
parameterized by θ and ϕ, zL and zU the latent variables generated from the encoder
for labeled and unlabeled data, and rL the rank variable for the labeled data. Let p(z) =
N (0, I) be the unit Gaussian prior. The transductive learning of the VAE to capture
latent representation information on both labeled and unlabeled data is characterized by

$$\mathcal{L}_{\mathrm{VAE}}^{\mathrm{trans}} = \mathbb{E}[\log q_\phi(x_L \mid z_L, r_L)] - \beta\, \mathrm{KL}(p_\theta(z_L \mid x_L)\,\|\,p(z)) + \mathbb{E}[\log q_\phi(x_U \mid z_U, \hat{l}_U)] - \beta\, \mathrm{KL}(p_\theta(z_U \mid x_U)\,\|\,p(z)), \tag{5}$$

where $\hat{l}_U$ is the predicted loss Lpred over the unlabeled data, β is the Lagrangian parameter, and E denotes the expectation [16].
With the latent representations zL and zU learned by the VAE of both the labeled
and unlabeled data, the objective function of the VAE in adversarial training is then

$$\mathcal{L}_{\mathrm{VAE}}^{\mathrm{adv}} = -\mathbb{E}[\log(D(p_\theta(z_L \mid x_L, r_L)))] - \mathbb{E}[\log(D(p_\theta(z_U \mid x_U, \hat{l}_U)))]. \tag{6}$$

Combining (5) and (6), the overall objective function of the VAE is

$$\mathcal{L}_{\mathrm{VAE}} = \mathcal{L}_{\mathrm{VAE}}^{\mathrm{trans}} + \eta\, \mathcal{L}_{\mathrm{VAE}}^{\mathrm{adv}}, \tag{7}$$

where η is a coefficient hyperparameter. The objective function of the discriminator D during adversarial training is

$$\mathcal{L}_{D}^{\mathrm{adv}} = -\mathbb{E}[\log(D(p_\theta(z_L \mid x_L, r_L)))] - \mathbb{E}[\log(1 - D(p_\theta(z_U \mid x_U, \hat{l}_U)))], \tag{8}$$

and the overall objective function of the adversarial training is

$$\min_{p_\theta} \max_{D}\; \mathbb{E}[\log(D(p_\theta(z_L \mid x_L, r_L)))] + \mathbb{E}[\log(1 - D(p_\theta(z_U \mid x_U, \hat{l}_U)))]. \tag{9}$$
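A hedged PyTorch sketch of how Eqs. (5)-(8) can be computed follows, expressing the expectations with binary cross-entropy over the discriminator output. The encoder/decoder/discriminator interfaces, the reconstruction term, and how the rank variable is fed to the networks are assumptions about one possible implementation, not the authors' code.

```python
import torch
import torch.nn.functional as F

def vae_discriminator_losses(vae, disc, x_l, r_l, x_u, l_hat_u, beta=1.0, eta=1.0):
    """Return the overall VAE loss (Eq. 7) and the discriminator loss (Eq. 8)."""
    # Encode both pools; mu/logvar parameterize p_theta(z | x).
    mu_l, logvar_l, z_l = vae.encode(x_l)
    mu_u, logvar_u, z_u = vae.encode(x_u)

    # Transductive VAE loss (Eq. 5): reconstruction conditioned on the latent code and the
    # rank variable (labeled) or predicted loss (unlabeled), plus beta-weighted KL terms.
    kl_l = -0.5 * torch.mean(1 + logvar_l - mu_l.pow(2) - logvar_l.exp())
    kl_u = -0.5 * torch.mean(1 + logvar_u - mu_u.pow(2) - logvar_u.exp())
    trans_vae = (F.mse_loss(vae.decode(z_l, r_l), x_l) + beta * kl_l
                 + F.mse_loss(vae.decode(z_u, l_hat_u), x_u) + beta * kl_u)

    # Adversarial VAE loss (Eq. 6): fool the discriminator into predicting "labeled" (1) for both pools.
    # disc(.) is assumed to output a sigmoid probability of being labeled.
    ones_l = torch.ones(x_l.size(0), 1, device=x_l.device)
    ones_u = torch.ones(x_u.size(0), 1, device=x_u.device)
    adv_vae = (F.binary_cross_entropy(disc(z_l, r_l), ones_l)
               + F.binary_cross_entropy(disc(z_u, l_hat_u), ones_u))

    vae_loss = trans_vae + eta * adv_vae                                 # Eq. (7)

    # Discriminator loss (Eq. 8): labeled -> 1, unlabeled -> 0 (latent codes detached).
    zeros_u = torch.zeros(x_u.size(0), 1, device=x_u.device)
    disc_loss = (F.binary_cross_entropy(disc(z_l.detach(), r_l), ones_l)
                 + F.binary_cross_entropy(disc(z_u.detach(), l_hat_u), zeros_u))
    return vae_loss, disc_loss
```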

Algorithm 2 Adversarial Training and Sample Selection


Require: Labeled data (XL , YL ), unlabeled data XU , rank variable (i.e., ranking loss) rL ,
trained model θT , initialized models θV AE and θD , training epochs N , labeling budget b
Ensure:
1: for i = 1 to N do
2: Compute L_VAE^trans, L_VAE^adv, and L_VAE using (5), (6), and (7), respectively
3: Compute L_D^adv using (8)
4: Update θV AE and θD using (9)
5: Select samples Xb with minb D(XU )
6: Query labels for Xb : Yb ← Oracle(Xb )
7: (XL , YL ) ← (XL , YL ) ∪ (Xb , Yb )
8: XU ← XU − Xb
9: end for
10: return Updated (XL , YL ), XU

The VAE and discriminator are trained in an adversarial manner. Specifically, the
VAE maps the labeled pθ (zL | xL ) and unlabeled pθ (zU | xU ) data into the latent space
with binary labels 1 and 0, respectively, and tries to trick the discriminator into classi-
fying all the inputs as labeled. On the other hand, the discriminator tries to distinguish
the unlabeled data from the labeled data by predicting the probability of each sample
being from the labeled pool. Thus, the adversarial network is trained to serve as the
sampling scheme via the discriminator by predicting the samples associated with the
latent representations of zL and zU to be from the labeled pool xL or the unlabeled
pool xU according to its predicted probability D(·). In short, sample selection is based
on the predicted probability of the discriminator adversarially trained with the VAE.
The smaller the probability, the more likely the sample will be selected for annotat-
ing. This adversarial training process is shown as Stage 3 in Fig. 2 and summarized in
Alg. 2.
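The selection step itself reduces to a few lines; the sketch below assumes the encoder and discriminator from the previous stage and is only illustrative of step 5 in Alg. 2.

```python
import torch

@torch.no_grad()
def select_for_annotation(vae, disc, x_unlabeled, l_hat_u, b):
    """Pick the b samples the discriminator is least confident came from the labeled pool."""
    _, _, z_u = vae.encode(x_unlabeled)
    prob_labeled = disc(z_u, l_hat_u).squeeze(1)                    # lower = more likely unlabeled
    query_idx = torch.topk(prob_labeled, b, largest=False).indices  # b smallest probabilities
    return query_idx                                                # indices to send to the oracle
```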

4 Experiments

To evaluate the proposed SS-VAAL framework, we carried out extensive experiments on two computer vision tasks: image classification and semantic segmentation.

4.1 Active Learning for Image Classification

Datasets. To evaluate SS-VAAL, we performed experiments on the following commonly used datasets: CIFAR-10, CIFAR-100 [20], Caltech-101 [11], and ImageNet
[9]. Both the CIFAR-10 and CIFAR-100 datasets consist of 50,000 training images
and 10,000 test images that are 32 × 32 in size. The Caltech 101 dataset contains 9,146
images, split between 101 different object categories. Each object category contains be-
tween 40 and 800 images, each of which is approximately 300 × 200 pixels. ImageNet
is a large-scale dataset with more than 1.2 million images from 1,000 classes.
Implementation details. We first trained a SoDeep sorter to rank the losses. Given the
close performance of several available sorter options, we opted for the LSTM sorter.
The sorter was trained with a sequence length of 128 for 300 epochs on synthetic data
consisting of vectors of generated scalars associated with their ground-truth rank vec-
tors. This training is separate from the AL process. After training was complete, the
sorter was applied to the loss prediction module to convert the predicted and target
losses into ranking losses for the AL process.
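A rough sketch of how such a sorter can be pretrained on synthetic data follows; the bidirectional-LSTM architecture, the uniform score distribution, and the L1 objective are assumptions consistent with the description above rather than the exact SoDeep recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMSorter(nn.Module):
    """Maps a vector of raw scores to a (soft, normalized) rank vector."""
    def __init__(self, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, scores):                        # scores: (batch, seq_len)
        h, _ = self.lstm(scores.unsqueeze(-1))
        return self.head(h).squeeze(-1)               # predicted ranks: (batch, seq_len)

def pretrain_sorter(epochs=300, seq_len=128, batch=64, lr=1e-3):
    sorter = LSTMSorter()
    opt = torch.optim.Adam(sorter.parameters(), lr=lr)
    for _ in range(epochs):
        scores = torch.rand(batch, seq_len)                           # synthetic scalar scores
        gt_ranks = scores.argsort(dim=1).argsort(dim=1).float()       # rank of each score in its sequence
        gt_ranks = gt_ranks / (seq_len - 1)                           # normalize ranks to [0, 1]
        loss = F.l1_loss(sorter(scores), gt_ranks)
        opt.zero_grad(); loss.backward(); opt.step()
    return sorter
```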
For CIFAR-10 and CIFAR-100, we applied the same data augmentation as the com-
pared methods, including a 32 × 32 random crop from 36 × 36 zero-padded images,
random horizontal flip, and normalization with the mean and standard deviation of the
training set. The target model underwent 200 epochs of training on labeled data with a
batch size of 128, then 100 epochs of semi-supervised training on pseudo-labeled data.
The initial learning rate was set to 0.1, and reduced to 0.01 and 0.001 at 160 and 240
epochs, respectively. For training, we employed ResNet-18 [15] as the target network
with the loss prediction module described in Sec. 3 using stochastic gradient descent
with the momentum set to 0.9 and a weight decay of 0.0005. Experiments began with
an initial labeled pool of 1000 / 2000 images from the CIFAR-10 / CIFAR-100 training
set, respectively. At each stage, the budget size was 1000 (CIFAR-10) / 2000 (CIFAR-
100) samples. The pool of unlabeled data consisted of the residual training set from
which samples were selected for labeling by an oracle. Upon labeling, these samples
were incorporated back into the initial training set and the process was carried out again
on the updated training set.
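The CIFAR training configuration above corresponds roughly to the following torchvision/PyTorch setup. The numeric values mirror the text; the normalization statistics and the scheduler choice are our assumptions for illustration.

```python
import torch
import torchvision.transforms as T

# Channel-wise mean/std of the CIFAR training set (commonly used values; recompute for your split).
MEAN, STD = (0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)

train_transform = T.Compose([
    T.RandomCrop(32, padding=2),       # 32x32 crop from a 36x36 zero-padded image
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(MEAN, STD),
])

def make_optimizer(model):
    """SGD with momentum 0.9 and weight decay 5e-4; lr 0.1 dropped by 10x at epochs 160 and 240."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[160, 240], gamma=0.1)
    return opt, sched
```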
For Caltech-101 and ImageNet, the images were resized to 224 × 224 and we initi-
ated the process with 10% of the samples from the dataset as labeled data with a budget
size equivalent to 5% of the dataset. All other settings remained the same as those used
for CIFAR-10 and CIFAR-100, except that the main task was trained for 100 epochs
for the ImageNet dataset. The effectiveness of our approach was assessed based on the
accuracy of the test data. We compared against a random sampling strategy baseline
and state-of-the-art methods including the core-set approach [30], LL4AL [40], VAAL
[34], TA-VAAL [19], and MAOAL [13].
Results. All results for the compared methods were averaged across 5 trials on the CIFAR-10, CIFAR-100, and Caltech-101 datasets, and across 2 trials on ImageNet. Fig. 4 and
Fig. 6 (see supplementary material) show the classification accuracy on the benchmark
datasets. The results obtained for the competing methods are largely in line with those
reported in the literature. Our comprehensive methodology, SS-VAAL, incorporates
both the ranking loss prediction module and the clustering-assisted pseudo labeling. The
empirical results consistently show that SS-VAAL surpasses all the competing methods
at each AL stage.

Fig. 4: Image classification comparison on the (a) CIFAR-10, (b) CIFAR-100, and (c)
Caltech-101 datasets.

4.2 Active Learning for Semantic Segmentation


Experimental setup. To evaluate the effectiveness of our AL approach in more com-
plex environments, we analyzed the task of semantic segmentation using Cityscapes
[8], a large-scale dataset of urban street scene videos. Consistent with the settings in
[34], we utilized the dilated residual network [41] as the semantic segmentation model.
Performance was measured by the mean intersection over union (mIoU) metric on the
Cityscapes validation set. All other experimental settings were kept consistent with
those used in the image classification experiments.
Results. All results for the compared methods were averaged across 3 trials and are shown
in Fig. 5. Our method consistently outperforms all the other methods on the task of se-
mantic segmentation on the Cityscapes dataset as evidenced by its higher mIoU scores.

Fig. 5: Semantic segmentation results on the Cityscapes dataset.

4.3 Ablation Study


To assess the impact of each proposed component, we executed an ablation study for the
classification task on the CIFAR-10, CIFAR-100, and Caltech-101 datasets. The results
are presented in the supplementary material; here we report the main observations. SS-
VAAL (w/ ranking only), which refers to the enhancement of VAAL by integrating the
ranking loss-based module, outperforms VAAL and LL4AL. This confirms the benefits
of considering task-related information in task learning. Moreover, it outperforms TA-
VAAL, indicating that our selection of the listwise ranking method more effectively
conveys task-related information than that of TA-VAAL (Fig. 7 - Fig. 9).
Conversely, SS-VAAL (w/ CAPL only), which entails implementing the proposed
clustering-assisted pseudo-labeling procedure at every stage of model training, yields a
noticeable improvement over all the other methods. This highlights the effectiveness of
exploiting unlabeled data during model training. It also offers a modest improvement
over the SS-VAAL (w/ ranking only) configuration, implying that leveraging unlabeled
data for training contributes more to the performance improvement than employing al-
ternative means for conveying task-related information (Fig. 10 - Fig. 12). Additionally,
we contrast this configuration with SS-VAAL (w/ PL only), which represents the use of
the conventional pseudo-labeling technique. The increase in performance underscores
the effectiveness of our refinement of this method (Fig. 13 - Fig. 15).

5 Conclusion
In this paper we developed key enhancements to both better optimize the use of vast
amounts of unlabeled data during training and incorporate task-related information. Our
approach, SS-VAAL, includes a novel pseudo-labeling algorithm that allows a model to
delve deeper into the data space, thus enhancing its representation ability by exploiting
all unlabeled data in a semi-supervised way in every AL cycle. SS-VAAL also incorpo-
rates a ranking-based loss prediction module that converts predicted losses into a differ-
entiable ranking loss. It can be inserted as a rank variable into VAAL’s latent space for
adversarial training. Evaluations on image classification and segmentation benchmarks demonstrate the increased performance of SS-VAAL over state-of-the-art techniques.

References
1. Arazo, E., Ortego, D., Albert, P., O’Connor, N.E., McGuinness, K.: Pseudo-labeling and
confirmation bias in deep semi-supervised learning. In: Proceedings of the International Joint
Conference on Neural Networks. pp. 1–8. IEEE (2020)
2. Ash, J.T., Zhang, C., Krishnamurthy, A., Langford, J., Agarwal, A.: Deep batch active learn-
ing by diverse, uncertain gradient lower bounds. In: Proceedings of the International Confer-
ence on Learning Representations (2020)
3. Beluch, W.H., Genewein, T., Nürnberger, A., Köhler, J.M.: The power of ensembles for
active learning in image classification. In: Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. pp. 9368–9377 (2018)
4. Buchert, F., Navab, N., Kim, S.T.: Exploiting diversity of unlabeled data for label-efficient
semi-supervised active learning. In: Proceedings of the International Conference on Pattern
Recognition. pp. 2063–2069 (2022)
5. Cao, Z., Qin, T., Liu, T.Y., Tsai, M.F., Li, H.: Learning to rank: From pairwise approach to
listwise approach. In: Proceedings of the International Conference on Machine learning. pp.
129–136 (2007)
6. Cohn, D.A., Ghahramani, Z., Jordan, M.I.: Active learning with statistical models. Journal
of Artificial Intelligence Research 4, 129–145 (1996)
7. Coletta, L.F., Ponti, M., Hruschka, E.R., Acharya, A., Ghosh, J.: Combining clustering and
active learning for the detection and learning of new image classes. Neurocomputing 358,
150–165 (2019)
8. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U.,
Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In:
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp.
3213–3223 (2016)
9. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierar-
chical image database. In: Proceedings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition. pp. 248–255 (2009)
10. Engilberge, M., Chevallier, L., Pérez, P., Cord, M.: Sodeep: a sorting deep net to learn rank-
ing loss surrogates. In: Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition. pp. 10792–10801 (2019)
11. Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training ex-
amples: An incremental bayesian approach tested on 101 object categories. In: Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. pp.
178–178 (2004)
12. Gal, Y., Islam, R., Ghahramani, Z.: Deep bayesian active learning with image data. In: Pro-
ceedings of the International Conference on Machine Learning. pp. 1183–1192 (2017)
13. Geng, L., Liu, N., Qin, J.: Multi-classifier adversarial optimization for active learning. In:
Proceedings of the AAAI Conference on Artificial Intelligence. vol. 37, pp. 7687–7695
(2023)
14. Gilhuber, S., Jahn, P., Ma, Y., Seidl, T.: Verips: Verified pseudo-label selection for deep active
learning. In: Proceedings of the IEEE International Conference on Data Mining. pp. 951–956
(2022)
15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Pro-
ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp.
770–778 (2016)
16. Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., Lerch-
ner, A.: beta-vae: Learning basic visual concepts with a constrained variational framework.
In: Proceedings of the International Conference on Learning Representations (2017)
17. Huang, S.J., Jin, R., Zhou, Z.H.: Active learning by querying informative and representative
examples. IEEE Transactions on Pattern Analysis and Machine Intelligence 10(36), 1936–
1949 (2014)
18. Huang, Z., He, Y., Vogt, S., Sick, B.: Uncertainty and utility sampling with pre-clustering.
In: Proceedings of the Workshop on Interactive Adaptive Learning (2021)
19. Kim, K., Park, D., Kim, K.I., Chun, S.Y.: Task-aware variational adversarial active learning.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
pp. 8166–8175 (2021)
20. Krizhevsky, A.: Learning multiple layers of features from tiny images. Tech. rep., University
of Toronto, Toronto, Ontario (2009)
21. Lee, D.H.: Pseudo-label: The simple and efficient semi-supervised learning method for deep
neural networks. In: Proceedings of the Workshop on Challenges in Representation Learning.
vol. 3, p. 896 (2013)
22. Lewis, D.D.: A sequential algorithm for training text classifiers: Corrigendum and additional
data. In: Proceedings of the ACM SIGIR Forum. vol. 29, pp. 13–19. ACM New York, NY, USA
(1995)
23. Li, M., Liu, X., van de Weijer, J., Raducanu, B.: Learning to rank for active learning: A
listwise approach. In: Proceedings of the International Conference on Pattern Recognition.
pp. 5587–5594 (2020)
24. Li, X., Guo, Y.: Adaptive active learning for image classification. In: Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 859–866 (2013)
25. Liu, T.Y.: Learning to rank for information retrieval. Foundations and Trends® in Informa-
tion Retrieval 3(3), 225–331 (2009)
26. Nguyen, H.T., Smeulders, A.: Active learning using pre-clustering. In: Proceedings of the
International Conference on Machine Learning. p. 79 (2004)
27. Rizve, M.N., Duarte, K., Rawat, Y.S., Shah, M.: In defense of pseudo-labeling: An
uncertainty-aware pseudo-label selection framework for semi-supervised learning. In: Pro-
ceedings of the International Conference on Learning Representations (2021)
28. Sajjadi, M., Javanmardi, M., Tasdizen, T.: Regularization with stochastic transformations and
perturbations for deep semi-supervised learning. In: Proceedings of the Advances in Neural
Information Processing Systems. vol. 29 (2016)
29. Saquil, Y., Kim, K.I., Hall, P.: Ranking cgans: Subjective control over semantic image at-
tributes. In: Proceedings of the British Machine Vision Conference (2018)
30. Sener, O., Savarese, S.: Active learning for convolutional neural networks: A core-set ap-
proach. In: Proceedings of the International Conference on Learning Representations (2018)
31. Settles, B.: Active learning literature survey. Tech. rep., University of Wisconsin-Madison
Department of Computer Science (2009)
32. Siméoni, O., Budnik, M., Avrithis, Y., Gravier, G.: Rethinking deep active learning: Using
unlabeled data at model training. In: Proceedings of the International Conference on Pattern
Recognition. pp. 1220–1227 (2020)
33. Sindhwani, V., Niyogi, P., Belkin, M.: Beyond the point cloud: From transductive to semi-
supervised learning. In: Proceedings of the International Conference on Machine Learning.
pp. 824–831 (2005)
34. Sinha, S., Ebrahimi, S., Darrell, T.: Variational adversarial active learning. In: Proceedings
of the IEEE/CVF International Conference on Computer Vision. pp. 5972–5981 (2019)
35. https://github.com/robotic-vision-lab/Semi-Supervised-Variational-Adversarial-Active-Learning
36. Wang, K., Zhang, D., Li, Y., Zhang, R., Lin, L.: Cost-effective active learning for deep image
classification. IEEE Transactions on Circuits and Systems for Video Technology 27(12),
2591–2600 (2016)
37. Xia, F., Liu, T.Y., Wang, J., Zhang, W., Li, H.: Listwise approach to learning to rank: theory
and algorithm. In: Proceedings of the International Conference on Machine Learning. pp.
1192–1199 (2008)
38. Yan, X., Nazmi, S., Gebru, B., Anwar, M., Homaifar, A., Sarkar, M., Gupta, K.D.: A
clustering-based active learning method to query informative and representative samples.
Applied Intelligence 52(11), 13250–13267 (2022)
39. Yang, Y., Ma, Z., Nie, F., Chang, X., Hauptmann, A.G.: Multi-class active learning by uncer-
tainty sampling with diversity maximization. International Journal of Computer Vision 113,
113–127 (2015)
40. Yoo, D., Kweon, I.S.: Learning loss for active learning. In: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition. pp. 93–102 (2019)
41. Yu, F., Koltun, V., Funkhouser, T.: Dilated residual networks. In: Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 472–480 (2017)
42. Zhang, B., Li, L., Yang, S., Wang, S., Zha, Z.J., Huang, Q.: State-relabeling adversarial active
learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. pp. 8756–8765 (2020)

Supplementary Material

In this supplement we provide image classification results on the ImageNet dataset and
additional experimental results for the ablation study to assess the impact of each SS-
VAAL component.

5.1 ImageNet Results

Fig. 6 presents a performance comparison of our full methodology against several main
competing approaches on the ImageNet dataset. The results clearly show that SS-VAAL
consistently outperforms the others in every iteration, demonstrating its efficacy and
scalability in handling large-scale datasets.

5.2 Ablation Study

We conducted an ablation study on the image classification task to assess the impact
of each proposed component. SS-VAAL (w/ ranking only) refers to the enhancement
of VAAL by integrating the ranking loss-based prediction module. According to Fig. 7
- Fig. 9, this configuration outperforms VAAL and LL4AL, confirming the benefits
of considering task-related information in task learning. Furthermore, this setting also
outperforms TA-VAAL, indicating that our selection of the listwise ranking method
more effectively conveys task-related information than that of TA-VAAL.
In contrast, SS-VAAL (w/ CAPL only) entails the implementation of the pro-
posed clustering-assisted pseudo-labeling procedure at every stage of model training.
This setup yields a noticeable improvement over all compared methods, highlighting
the effectiveness of exploiting unlabeled data during model training. It also offers a

Fig. 6: Image classification accuracy on the ImageNet dataset.

modest improvement over the SS-VAAL (w/ ranking only) configuration (Fig. 10 -
Fig. 12), implying that leveraging unlabeled data for training contributes more to per-
formance improvement than employing alternative means for conveying task-related
information. Additionally, we contrast this configuration with SS-VAAL (w/ PL only),
which represents the use of the conventional pseudo-labeling technique. The enhance-
ment in performance underscores the effectiveness of our refinement on this method
(Fig. 13 - Fig. 15).

Fig. 7: Ablation results on analyzing the effect of the ranking component on the CIFAR-
10 dataset.

Fig. 8: Ablation results on analyzing the effect of the ranking component on the CIFAR-
100 dataset.

Fig. 9: Ablation results on analyzing the effect of the ranking component on the Caltech-
101 dataset.

Fig. 10: Ablation results on analyzing the effect of each component on the CIFAR-10
dataset.

Fig. 11: Ablation results on analyzing the effect of each component on the CIFAR-100
dataset.

Fig. 12: Ablation results on analyzing the effect of each component on the Caltech-101
dataset.

Fig. 13: Ablation results on analyzing the effect of the pseudo-labeling component on
the CIFAR-10 dataset.

Fig. 14: Ablation results on analyzing the effect of the pseudo-labeling component on
the CIFAR-100 dataset.

Fig. 15: Ablation results on analyzing the effect of the pseudo-labeling component on
the Caltech-101 dataset.
