
Generative Adversarial Active Learning

arXiv:1702.07956v5 [cs.LG] 15 Nov 2017

Jia-Jie Zhu
Max Planck Institute for Intelligent Systems
Tübingen, Germany
[email protected]

Jose Bento
Department of Computer Science
Boston College
Chestnut Hill, Massachusetts, USA
[email protected]

Abstract

We propose a new active learning by query synthesis approach using Generative Adversarial Networks (GAN). Different from regular active learning, the resulting algorithm adaptively synthesizes training instances for querying to increase learning speed. We generate queries according to the uncertainty principle, but our idea can work with other active learning principles. We report results from various numerical experiments to demonstrate the effectiveness of the proposed approach. In some settings, the proposed algorithm outperforms traditional pool-based approaches. To the best of our knowledge, this is the first active learning work using GAN.

1 Introduction

One of the most exciting machine learning breakthroughs in recent years is the generative adversarial network (GAN) [20]. It trains a generative model by finding the Nash equilibrium of a two-player adversarial game. Its ability to generate samples in complex domains enables new possibilities for active learners to synthesize training samples on demand, rather than relying on choosing instances to query from a given pool.
In the classification setting, given a pool of unlabeled data samples and a fixed labeling budget, active learning algorithms typically choose training samples strategically from the pool to maximize the accuracy of the trained classifier. The goal of these algorithms is to reduce label complexity. Such approaches are called pool-based active learning and are illustrated in Figure 1 (a).
In a nutshell, we propose to use GANs to synthesize informative training instances that are adapted
to the current learner. We then ask human oracles to label these instances. The labeled data is added
back to the training set to update the learner. This protocol is executed iteratively until the label
budget is reached. This process is shown in Figure 1 (b).
The main contributions of this work are as follows:

• To the best of our knowledge, this is the first active learning framework using deep generative models¹.
• While we do not claim our method is always superior to previous active learners in terms of accuracy, in some cases it yields classification performance not achievable even by a fully supervised learning scheme. With enough capacity in the trained generator, our method gives us control over the generated instances that may not be available to previous active learners.

¹ The appendix of [37] mentioned three active learning attempts but did not report numerical results. Our approach is also different from those attempts.

Figure 1: (a) Pool-based active learning scenario. The learner selects samples for querying from
a given unlabeled pool. (b) GAAL algorithm. The learner synthesizes samples for querying using
GAN.

• We conduct experiments to compare our active learning approach with self-taught learning². The results are promising.
• This is the first work to report numerical results for active learning by synthesis in image classification; see [43, 30]. The proposed framework may inspire future GAN applications in active learning.
• The proposed approach should not be understood as a pool-based active learning method. Instead, it is active learning by query synthesis. We show that our approach can perform competitively when compared against pool-based methods.

2 Related Work

Our work is related to two different subjects, active learning and deep generative models.
Active learning algorithms can be categorized into stream-based, pool-based, and learning by query synthesis. Historically, stream-based and pool-based have been the two popular scenarios of active learning [43].
Our method falls into the category of query synthesis. Early active learning by query synthesis achieved good results only in simple domains such as X = {0, 1}³; see [1, 2]. In [30], the authors synthesized learning queries and used human oracles to train a neural network for classifying handwritten characters. However, they reported poor results because the images generated by the learner were sometimes unrecognizable to the human oracles. We will report results on similar tasks, such as differentiating 5 versus 7, showing the advancement of our active learning scheme. Figure 2 compares image samples generated by the method in [30] and our algorithm.

Figure 2: (Left) Image queries synthesized by a neural network for handwritten digits recognition.
Source: [30]. (Right) Image queries synthesized by our algorithm, GAAL.
The popular SVMactive algorithm from [45] is an efficient pool-based active learning scheme for SVM. Their scheme is a special instance of the uncertainty sampling principle, which we also employ. [28] reduces the exhaustive scanning through the database employed by SVMactive. Our algorithm shares the same advantage of not needing to test every sample in the database at each iteration of active learning, although we achieve this by not using a pool at all rather than by a clever indexing trick. [48] proposed active transfer learning, which is reminiscent of our experiments in Section 5.1; however, we do not consider collecting new labeled data in the target domains of transfer learning.
² See the supplementary document.

There have been some applications of generative models in semi-supervised learning and active
learning. Previously, [36] proposed a semi-supervised learning approach to text classification based
on generative models. [26] applied Gaussian mixture models to active learning. In that work, the
generative model served as a classifier. Compared with these approaches, we apply generative models to directly synthesize training data. This is a more challenging task.
One building block of our algorithm is the groundbreaking work of the GAN model in [20]. Our
approach is an application of GAN in active learning.
Our approach is also related to [44], which studied GAN in a semi-supervised setting. However, our task is active learning, which is different from the semi-supervised learning they discussed. Our work shares a common strength with the self-taught learning algorithm in [39], as both methods use unlabeled data to help with the task. In the supplementary document, we compare our algorithm with a self-taught learning algorithm.
In a way, the proposed approach can be viewed as an adversarial training procedure [21], where the classifier is iteratively trained on adversarial examples that the algorithm generates by solving an optimization problem. [21] focuses on adversarial examples generated by perturbing the original data within a small epsilon-ball, whereas we seek to produce examples using an active learning criterion.
To the best of our knowledge, the only previous mention of using GAN for active learning is in the appendix of [37]. The authors there discussed three attempts to reduce the number of queries. In the third attempt, they generated synthetic samples and sorted them by information content, whereas we adaptively generate new queries by solving an optimization problem. No active learning numerical results were reported in that work.

3 Background

We briefly introduce some important concepts in active learning and generative adversarial networks.

3.1 Active Learning

In the PAC learning framework [46], label complexity describes the number of labeled instances needed to find a hypothesis with error ε. The label complexity of passive supervised learning, i.e., using all the labeled samples as training data, is O(d/ε) [47], where d is the VC dimension of the hypothesis class H. Active learning aims to reduce the label complexity by choosing the most informative instances for querying while attaining a low error rate. For example, [24] proved that the active learning algorithm from [10] has the label complexity bound O(θ d log(1/ε)), where θ is the disagreement coefficient defined therein, thus reducing the theoretical bound on the number of labeled instances needed compared with passive supervised learning. Theoretically speaking, the asymptotic accuracy of an active learning algorithm cannot exceed that of a supervised learning algorithm. In practice, as we will demonstrate in the experiments, our algorithm may be able to achieve higher accuracy than passive supervised learning in some cases.
Stream-based active learning makes decisions on whether to query the streamed-in instances or not.
Typical methods include [5, 10, 14]. In this work, we will focus on comparing pool-based and query
synthesis methods.
In pool-based active learning, the learner selects unlabeled instances from an existing pool based on a certain criterion. Some pool-based algorithms make selections by using clustering techniques or by maximizing a diversity measure, e.g., [7, 50, 13, 35, 51, 25]. Another commonly used pool-based active learning principle is uncertainty sampling, which amounts to querying the most uncertain instances. For example, the algorithms in [45, 8] query the labels of the instances that are closest to the decision boundary of a support vector machine; Figure 3 (a) illustrates this selection process. Other pool-based works include [27], which proposes a Bayesian active learning by disagreement algorithm in the context of learning user preferences, and [22, 18], which study the submodular nature of sequential active learning schemes.
Mathematically, let P be the pool of unlabeled instances, and let f = W φ(x) + b be the separating hyperplane, where φ is the feature map induced by the SVM kernel. The SVMactive algorithm in [45] chooses a new instance to query by minimizing the distance (or a proxy for it) to the hyperplane,

    min_{x∈P} ‖W φ(x) + b‖.    (1)
This formulation can be justified by the version space theory in separable cases [45] or by other
analyses in non-separable cases, e.g., [8, 6]. This simple and effective method is widely applied in
many studies, e.g., [17, 49].
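As an illustration, this selection step can be sketched in a few lines of scikit-learn. The classifier choice, pool variables, and batch size below are illustrative assumptions rather than the exact setup of [45]:

import numpy as np
from sklearn.svm import SVC

def svm_active_query(X_labeled, y_labeled, X_pool, batch_size=10):
    """Pick the pool instances closest to the current SVM hyperplane,
    i.e. approximately minimizing ||W phi(x) + b|| over the pool (Eq. 1)."""
    clf = SVC(kernel="linear").fit(X_labeled, y_labeled)
    # decision_function returns a signed score proportional to W phi(x) + b
    margins = np.abs(clf.decision_function(X_pool))
    return np.argsort(margins)[:batch_size]   # indices of the most uncertain instances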
In the query synthesis scenario, an instance x is synthesized instead of being selected from an existing pool. Previous methods tend to work in simple low-dimensional domains [2] but fail in more complicated domains such as images [30]. Our approach aims to tackle this challenge.
For an introduction to active learning, readers are referred to [43, 12].

3.2 Generative Adversarial Networks


The generative adversarial network (GAN) is a novel generative model invented by [20]. It can be viewed as the following two-player minimax game between the generator G and the discriminator D,

    min_{θ2} max_{θ1} { E_{x∼p_data} [log D_{θ1}(x)] + E_z [log(1 − D_{θ1}(G_{θ2}(z)))] },    (2)

where p_data is the underlying distribution of the real data and z is a uniformly distributed random variable. D and G each have their own set of parameters, θ1 and θ2. By solving this game, a generator G is obtained; in the ideal scenario, given random input z, we have G(z) ∼ p_data. However, finding this Nash equilibrium is difficult in practice, and there is no theoretical guarantee of finding it due to the non-convexity of D and G. A gradient descent type algorithm is typically used to solve this optimization problem.
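For concreteness, the per-batch losses of this game can be sketched as follows. This is a minimal PyTorch sketch under the assumption that D outputs probabilities in (0, 1) and G maps noise to samples; network architectures and optimizer steps are omitted:

import torch

def gan_losses(D, G, x_real, z):
    """One evaluation of the two-player objective in Eq. (2)."""
    eps = 1e-8  # numerical stability for the logarithms
    # Discriminator ascends E[log D(x)] + E[log(1 - D(G(z)))]; we minimize its negation.
    d_loss = -(torch.log(D(x_real) + eps).mean()
               + torch.log(1.0 - D(G(z).detach()) + eps).mean())
    # Generator descends E[log(1 - D(G(z)))].
    g_loss = torch.log(1.0 - D(G(z)) + eps).mean()
    return d_loss, g_loss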
A few variants of GAN have been proposed since [20]. The authors of [38] use GAN with deep convolutional neural network structures for applications in computer vision (DCGAN). DCGAN yields good results and is relatively stable. Conditional GAN [16, 15, 34] is another variant in which the generator and discriminator can be conditioned on other variables, e.g., the labels of images. Such generators can be controlled to generate samples from a certain category. [9] proposed InfoGAN, which learns disentangled representations using unsupervised learning.
Further improved GAN models have also been proposed. [41] proposed several improved techniques for training GAN. Another potentially important improvement, Wasserstein GAN, was proposed by [3, 23]. The authors proposed an alternative way of training GAN that can avoid instabilities such as mode collapse, supported by theoretical analysis. They also proposed a metric to evaluate the quality of the generated samples, which may be useful for future GAN studies. Possible applications of Wasserstein GAN to our active learning framework are left for future work.
The invention of GAN has triggered various novel applications. [52] performed image inpainting using GAN. [53] proposed iGAN to turn sketches into realistic images. [33] applied GAN to single image super-resolution. [54] proposed CycleGAN for image-to-image translation using only unpaired training data.
Our study is the first GAN application to active learning.
For a comprehensive review of GAN, readers are referred to [19].

4 Generative Adversarial Active Learning


In this section, we introduce our active learning approach which we call Generative Adversarial
Active Learning (GAAL). It combines query synthesis with the uncertainty sampling principle.
The intuition of our approach is to generate instances that the current learner is uncertain about, i.e., to apply the uncertainty sampling principle. One particular choice for the loss function is based on the uncertainty sampling principle explained in Section 3.1. For a classifier with decision function f(x) = W φ(x) + b, the (proxy) distance to the decision boundary is ‖W φ(x) + b‖. Similar to the intuition behind (1), given a trained generator G, we formulate active learning synthesis as the following optimization problem,

    min_z ‖W⊤ φ(G(z)) + b‖,    (3)

Algorithm 1 Generative Adversarial Active Learning (GAAL)
1: Train generator G on all unlabeled data by solving (2)
2: Initialize the labeled training dataset S by randomly picking a small fraction of the data to label
3: repeat
4:   Solve optimization problem (3) according to the current learner by descending the gradient ∇_z ‖W⊤ φ(G(z)) + b‖
5:   Use the solutions {z1, z2, . . .} and G to generate instances for querying
6:   Label {G(z1), G(z2), . . .} by human oracles
7:   Add the labeled data to the training dataset S and re-train the learner, updating W and b
8: until the labeling budget is reached

where z is the latent variable and G is obtained by the GAN algorithm. Intuitively, minimizing this loss pushes the generated samples toward the decision boundary. Figure 3 (b) illustrates this idea. Compared with the pool-based active learning in Figure 3 (a), our hope is that it may be able to generate more informative instances than those available in the existing pool.


Figure 3: (a) SVMactive algorithm selects the instances that are closest to the boundary to query
the oracle. (b) GAAL algorithm synthesizes instances that are informative to the current learner.
Synthesized instances may be more informative to the learner than other instances in the existing
pool.

The solutions to this optimization problem, G(z), after being labeled, will be used as new training data for the next iteration. We outline our procedure in Algorithm 1. It is possible to use a state-of-the-art classifier, such as a convolutional neural network. To do this, we can replace the feature map φ in Equation (3) with the feed-forward function of a convolutional neural network; in that case, the linear SVM becomes the output layer of the network. In step 4 of Algorithm 1, one may also use a different active learning criterion. We emphasize that our contribution is the general framework rather than a specific criterion.
In training GAN, we follow the procedure detailed in [38]. Optimization problem (3) is non-convex
with possibly many local minima. One typically aims at finding good local minima rather than the
global minimum. We use a gradient descent algorithm with momentum to solve this problem. We
also periodically restart the gradient descent to find other solutions. The gradient of D and G is
calculated using back-propagation.
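As a minimal sketch of step 4 in Algorithm 1, the optimization over z can be carried out with automatic differentiation. The step size, momentum, restart count, and latent dimension below are illustrative assumptions; generator stands for the trained G, w and b for the current SVM weights and bias, and phi for the feature map (the identity for a linear SVM):

import torch

def synthesize_query(generator, w, b, phi=lambda x: x,
                     latent_dim=100, steps=200, lr=0.05,
                     momentum=0.9, restarts=5):
    """Approximately solve min_z ||w^T phi(G(z)) + b|| (Eq. 3) by gradient
    descent with momentum, restarting from fresh random z to find other solutions."""
    best_z, best_val = None, float("inf")
    for _ in range(restarts):
        z = torch.randn(1, latent_dim, requires_grad=True)
        opt = torch.optim.SGD([z], lr=lr, momentum=momentum)
        for _ in range(steps):
            opt.zero_grad()
            feat = phi(generator(z)).flatten(1)       # phi(G(z)) as a row vector
            loss = torch.abs(feat @ w + b).sum()      # proxy distance to the boundary
            loss.backward()
            opt.step()
        if loss.item() < best_val:                    # keep the best local solution
            best_val, best_z = loss.item(), z.detach()
    return generator(best_z).detach(), best_z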
Alternatively, we can incorporate diversity into our active learning principle. Some active learning approaches rely on maximizing diversity measures, such as the Shannon entropy. In our case, we can include in the objective function (3) a diversity measure such as those proposed in [51, 25], thus increasing the diversity of samples. The evaluation of this alternative approach is left for future work.

5 Experiments
We perform active learning experiments using the proposed approach. We also compare our approach to self-taught learning, a type of transfer learning method, in the supplementary document.
The GAN implementation used in our experiments is a modification of a publicly available TensorFlow DCGAN implementation³. The network architecture of DCGAN is described in [38].
³ https://github.com/carpedm20/DCGAN-tensorflow

In our experiments, we focus on binary image classification, although this can be generalized to multiple classes using a one-vs-one or one-vs-all scheme [29]. Recent advances in GAN research suggest that GAN could potentially model language as well [23], although those results are preliminary at the current stage. We use a linear SVM as our classifier of choice (with parameter γ = 0.001). Even though classifiers with much higher accuracy (e.g., convolutional neural networks) could be used, our purpose is not to achieve the highest absolute accuracy but to study the relative performance of different active learning schemes.
The following schemes are implemented and compared in our experiments.
• The proposed generative adversarial active learning (GAAL) algorithm as in Algorithm 1.
• Using regular GAN to generate training data. We refer to this as simple GAN.
• SVMactive algorithm from [45].
• Passive random sampling, which randomly samples instances from the unlabeled pool.
• Passive supervised learning, i.e., using all the samples in the pool to train the classifier.
• Self-taught learning from [39].
We initialize the training set with 50 randomly selected samples. The algorithms proceed with a
batch of 10 queries every time.
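A minimal harness for this protocol might look as follows, where query_fn stands for any of the schemes above (GAAL synthesis, SVMactive selection, or random sampling) and oracle is assumed to return labels for the queried instances; the seed size and batch size mirror the setup described here, while the budget is an illustrative value:

import numpy as np
from sklearn.svm import LinearSVC

def run_active_learning(X_pool, oracle, X_test, y_test,
                        query_fn, budget=350, seed_size=50, batch_size=10):
    """Generic active learning loop: seed randomly, query in batches,
    retrain the linear SVM, and record test accuracy after each batch."""
    rng = np.random.default_rng(0)
    seed_idx = rng.choice(len(X_pool), size=seed_size, replace=False)
    X_train, y_train = X_pool[seed_idx], oracle(X_pool[seed_idx])
    accuracies = []
    while len(X_train) < budget:
        clf = LinearSVC().fit(X_train, y_train)
        accuracies.append(clf.score(X_test, y_test))
        X_new = query_fn(clf, batch_size)            # synthesized or selected instances
        y_new = oracle(X_new)                        # human labels for the queries
        X_train = np.vstack([X_train, X_new])
        y_train = np.concatenate([y_train, y_new])
    return accuracies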
We use two datasets for training: MNIST and CIFAR-10. The MNIST dataset is a well-known image classification dataset with 60000 training samples. The training set and the test set follow the same distribution. We perform the binary classification experiment of distinguishing 5 and 7, which is reminiscent of [30]. The training set of the CIFAR-10 dataset consists of 50000 32 × 32 color images from 10 categories. One might speculate about the possibility of distinguishing cats and dogs by training on cat-like dogs or dog-like cats. In practice, our human labelers failed to confidently identify most of the generated cat and dog images. Figure 4 (Top) shows generated samples. The authors of [41] reported attempts to generate high-resolution animal pictures, but with the wrong anatomy. We leave this task for future studies, possibly with improved techniques such as [3, 23]. For this reason, we perform binary classification on the automobile and horse categories. It is relatively easy for human labelers to identify car and horse body shapes. Typical generated samples, which are presented to the human labelers, are shown in Figure 4.

Figure 4: Samples generated by GAAL (Top) Generated samples in cat and dog categories. (Bottom
Left) MNIST dataset. (Bottom Right) CIFAR-10 dataset.
5.1 Active Learning

We use all the images of 5 and 7 from the MNIST training set as our unlabeled pool to train the generator G. Different from traditional active learning, we do not select new samples from the pool after initialization. Instead, we apply Algorithm 1 to generate each training query. For the discriminator D and generator G, we follow the same network architecture as [38]. We use a linear SVM as our classifier, although other classifiers can be used, e.g. [45, 42, 43].
We first test the trained classifier on a test set that follows a distribution different from the training set. One purpose is to demonstrate the adaptive capability of the GAAL algorithm. In addition, because the MNIST test set and training set follow the same distribution, pool-based active learning methods have a natural advantage over active learning by synthesis, since they use real images drawn from the exact same distribution as the test set. It is thus reasonable to test on sets that follow different, albeit similar, distributions. To this end, we use the USPS dataset from [32] as the test set, with standard preprocessing. In reality, such settings are common, e.g., training autonomous driving systems on simulated data and testing on real vehicles, or training on handwritten characters and recognizing writing in different styles. This test setting is related to transfer learning, where the distribution of the training domain Ptr(x, y) is different from that of the target domain Pte(x, y).
Figure 5 (Top) shows the results of our first experiment.

[Figure 5 plots: classification accuracy versus number of labeled samples, comparing SVMactive, Fully Supervised, GAAL, Simple GAN, and Random Sampling.]
Figure 5: Active learning results. (Top) Train on MNIST, test on USPS. Classifying 5 and 7. The
results are averaged over 10 runs. (Bottom Left) Train on MNIST, test on MNIST. Classifying 5 and
7. (Bottom Right) CIFAR-10 dataset, classifying automobile and horse. The results are averaged
over 10 runs. The error bars represent the empirical standard deviation of the average values. The
figures are best viewed in color.
When using the full training set of 11000 training images, the fully supervised accuracy is 70.44%. The accuracy of the random sampling scheme steadily approaches that level. On the other hand, GAAL is able to achieve accuracies better than that of the fully supervised scheme. With 350 training samples, its accuracy improves over supervised learning and even SVMactive, an aggressive active learner [11, 45]. Obviously, the accuracies of both SVMactive and random sampling will eventually converge to the fully supervised learning accuracy. Note that for the SVMactive algorithm, an exhaustive scan through the training pool is not always practical; in such cases, the common practice is to restrict the selection pool to a small random subset of the original data.
For completeness, we also perform the experiments in settings where the training and test sets follow the same distribution. Figure 5 (Bottom) shows these results. Somewhat surprisingly, in Figure 5 (Bottom Left), GAAL's classification accuracy starts to drop after about 100 samples. One possible explanation is that GAAL may be generating points close to the boundary that are also close to each other. This is more likely to happen if the boundary does not change much from one active learning cycle to the next, which can occur when the test and training sets are identically distributed and simple, as with MNIST. Therefore, after a while, the training set may be filled with many similar points, biasing the classifier and hurting accuracy. In contrast, because of the finite and discrete nature of the pools in the given datasets, a pool-based approach such as SVMactive most likely explores points near the boundary that are substantially different, and it is forced to explore farther points once the close-by points have already been selected. In a sense, the strength of GAAL might in fact be hurting its classification accuracy. We believe this effect is less pronounced when the test and training sets differ, because the boundary then changes more significantly from one cycle to the next, which in turn induces some diversity in the generated samples.
To reach competitive accuracy when the training and test set follow the same distribution, we might
incorporate a diversity term into our objective function in GAAL. We will address this in future
work.
In the CIFAR-10 dataset, our human labelers noticed a higher chance of bad generated samples, e.g., instances that fail to represent either of the categories. This may be because of the significantly higher dimensionality compared with the MNIST dataset. In such cases, we asked the labelers to only label the samples they could distinguish. We speculate that recent improvements to GAN, e.g., [41, 3, 23], may help mitigate this issue if the cause is the instability of GAN training. Addressing this limitation is left to future studies.

5.2 Balancing exploitation and exploration


The proposed Algorithm 1 can be understood as an exploitation method, i.e., it focuses on generating the most informative training data based on the current decision boundary. On the other hand, it is often desirable for the algorithm to explore new areas of the data. To achieve this, we modify Algorithm 1 by simply executing random sampling every once in a while, which is a common practice in active learning [4, 40]. We use the same experimental setup as in the previous section. Figure 6 shows the results of this mixed scheme.
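A sketch of this schedule, assuming gaal_query and random_query each return a batch of instances to be labeled (the five-to-one ratio matches the mixed scheme of Figure 6):

def mixed_query(iteration, clf, batch_size, gaal_query, random_query):
    """Exploit with GAAL for five iterations, then run one iteration of
    random sampling from the unlabeled pool to explore."""
    if (iteration + 1) % 6 == 0:
        return random_query(batch_size)
    return gaal_query(clf, batch_size)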

[Figure 6 plot: classification accuracy versus number of labeled samples, comparing GAAL, Random Sampling, and GAAL + random sampling.]

Figure 6: Active learning results using a mixed scheme. The mixed scheme executes one iteration
of random sampling after every five iterations of GAAL algorithm. Train on MNIST, test on USPS.
Classifying 5 and 7. The results are averaged over 10 runs. The error bars represent the empirical
standard deviation of the average values. The figure is best viewed in color.

The mixed scheme is able to achieve better performance than either GAAL or random sampling alone. This implies that GAAL, as an exploitation scheme, performs even better in combination with an exploration scheme. A detailed analysis of such mixed schemes will be an interesting future topic.

6 Discussion and Future Work


In this work, we proposed a new active learning approach, GAAL, that employs generative adversarial networks. One possible explanation for GAAL not outperforming the pool-based approaches in some settings is that, in traditional pool-based learning, the algorithm will eventually exhaust all the points near the decision boundary and thus start exploring farther points. This is not the case in GAAL, as it can always synthesize points near the boundary, which may in turn cause the generation of similar samples and reduce effectiveness. We suspect that incorporating a diversity measure into the GAAL framework, as discussed at the end of Section 4, might mitigate this issue. This issue is related to the exploitation and exploration trade-off, which we explored briefly.
The results of this work are enough to inspire future studies of deep generative models in active learning. However, much work remains in establishing theoretical analysis and reaching better performance. We also suspect that GAAL can be modified to generate adversarial examples such as in [21]. The comparison of GAAL with transfer learning (see the supplementary document) is particularly interesting and worth further investigation. We also plan to investigate the possibility of using Wasserstein GAN in our framework.

References
[1] D Angluin. Queries and concept learning. Mach. Learn., 1988.
[2] D Angluin. Queries revisited. Int. Conf. Algorithmic Learn., 2001.

[3] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. jan 2017.
[4] Yoram Baram, Ran El Yaniv, and Kobi Luz. Online choice of active learning algorithms.
Journal of Machine Learning Research, 5(Mar):255–291, 2004.
[5] Alina Beygelzimer, Sanjoy Dasgupta, and John Langford. Importance Weighted Active Learn-
ing. Proc. 26th Annu. Int. Conf. Mach. Learn. ICML 09, abs/0812.4(ii):1–8, 2008.
[6] Antoine Bordes, Şeyda Ertekin, Jason Weston, and Léon Bottou. Fast Kernel Classifiers with
Online and Active Learning. J. Mach. Learn. Res., 6:1579–1619, 2005.
[7] Klaus Brinker. Incorporating Diversity in Active Learning with Support Vector Machines.
[8] Colin Campbell, Nello Cristianini, and Alex Smola. Query learning with large margin classi-
fiers. 17th Int. Conf. Mach. Learn., pages 111–118, 2000.
[9] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. In-
foGAN: Interpretable Representation Learning by Information Maximizing Generative Adver-
sarial Nets. 2016.
[10] David Cohn, Les Atlas, and Richard Ladner. Improving generalization with active learning.
Mach. Learn., 15(2):201–221, may 1994.
[11] Sanjoy Dasgupta. Analysis of a greedy active learning strategy. In Advances in neural infor-
mation processing systems, pages 337–344, 2005.
[12] Sanjoy Dasgupta. Two faces of active learning. Theor. Comput. Sci., 412:1767–1781, 2011.
[13] Sanjoy Dasgupta and Daniel Hsu. Hierarchical sampling for active learning. Proceedings of
the 25th international conference on Machine learning - ICML ’08, pages 208–215, 2008.
[14] Sanjoy Dasgupta, Daniel Hsu, and Claire Monteleoni. A general agnostic active learning
algorithm. Engineering, 20(2):1–14, 2007.
[15] Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, and Thomas Brox. Learn-
ing to Generate Chairs, Tables and Cars with Convolutional Networks. arXiv preprint
arXiv:1411.5928, pages 1–14, 2014.
[16] Jon Gauthier. Conditional generative adversarial nets for convolutional face generation. Class
Project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter
semester 2014, 2014.
[17] King-Shy Goh, Edward Y. Chang, and Wei-Cheng Lai. Multimodal concept-dependent active
learning for image retrieval. In Proc. 12th Annu. ACM Int. Conf. Multimed. - Multimed. ’04,
page 564, New York, New York, USA, 2004. ACM Press.
[18] Daniel Golovin and Andreas Krause. Adaptive submodularity: A new approach to active
learning and stochastic optimization. In COLT, pages 333–345, 2010.
[19] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[20] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil
Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in
neural information processing systems, pages 2672–2680, 2014.
[21] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adver-
sarial examples. arXiv preprint arXiv:1412.6572, 2014.
[22] Andrew Guillory and Jeff Bilmes. Interactive submodular set cover. arXiv preprint
arXiv:1002.3345, 2010.
[23] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville.
Improved training of wasserstein gans. arXiv preprint arXiv:1704.00028, 2017.
[24] Steve Hanneke. A bound on the label complexity of agnostic active learning. Proc. 24th Int.
Conf. Mach. Learn. - ICML ’07, pages 353–360, 2007.
[25] Steven C H Hoi, Rong Jin, Jianke Zhu, and Michael R Lyu. Semi-Supervised SVM Batch
Mode Active Learning with Applications to Image Retrieval. ACM Trans. Informations Syst.
ACM Trans. Inf. Syst. Publ. ACM Trans. Inf. Syst., 27(16):24–26, 2009.
[26] Timothy M. Hospedales, Shaogang Gong, and Tao Xiang. Finding rare classes: Active learning
with generative and discriminative models. IEEE Trans. Knowl. Data Eng., 25(2):374–386,
2013.

[27] Neil Houlsby, Ferenc Huszar, Zoubin Ghahramani, and Jose M Hernández-Lobato. Collabora-
tive gaussian processes for preference learning. In Advances in Neural Information Processing
Systems, pages 2096–2104, 2012.
[28] Prateek Jain, Sudheendra Vijayanarasimhan, and Kristen Grauman. Hashing hyperplane queries to near points with applications to large-scale active learning. IEEE Trans. Pattern Anal. Mach. Intell., 36(2), 2014.
[29] A.J. Joshi, F. Porikli, and N. Papanikolopoulos. Multi-class active learning for image classifi-
cation. IEEE Conf. Comput. Vis. Pattern Recognit., pages 2372–2379, 2009.
[30] Kevin J. Lang and Eric B Baum. Query Learning Can Work Poorly when a Human Oracle is
Used, 1992.
[31] Quoc V Le, Alexandre Karpenko, Jiquan Ngiam, and Andrew Y Ng. ICA with Reconstruction
Cost for Efficient Overcomplete Feature Learning.
[32] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel.
Backpropagation Applied to Handwritten Zip Code Recognition, 1989.
[33] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Aitken, Alykhan Tejani,
Johannes Totz, Zehan Wang, and Wenzhe Shi. Photo-Realistic Single Image Super-Resolution
Using a Generative Adversarial Network. arXiv, 2016.
[34] Mehdi Mirza and Simon Osindero. Conditional Generative Adversarial Nets. CoRR, pages
1–7, nov 2014.
[35] Hieu T Nguyen and Arnold Smeulders. Active Learning Using Pre-clustering.
[36] Kamal Nigam, Andrew Kachites Mccallum, Sebastian Thrun, and Tom Mitchell. Text Classi-
fication from Labeled and Unlabeled Documents using EM. Mach. Learn., 39:103–134, 2000.
[37] Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, and Kunal Talwar. Semi-
supervised knowledge transfer for deep learning from private training data. arXiv preprint
arXiv:1610.05755, 2016.
[38] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised Representation Learning with
Deep Convolutional Generative Adversarial Networks. nov 2015.
[39] Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y Ng. Self-taught
Learning : Transfer Learning from Unlabeled Data. Proc. 24th Int. Conf. Mach. Learn., pages
759–766, 2007.
[40] Jens Röder, Boaz Nadler, Kevin Kunzmann, and Fred A Hamprecht. Active learning with
distributional estimates. arXiv preprint arXiv:1210.4909, 2012.
[41] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen.
Improved Techniques for Training GANs. jun 2016.
[42] Andrew I. Schein and Lyle H. Ungar. Active learning for logistic regression: An evaluation,
volume 68. 2007.
[43] Burr Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2010.
[44] Jost Tobias Springenberg. Unsupervised and Semi-supervised Learning with Categorical Gen-
erative Adversarial Networks. arXiv, (2009):1–20, 2015.
[45] Simon Tong and Daphne Koller. Support Vector Machine Active Learning with Applications
to Text Classification. Proc. Int. Conf. Mach. Learn., 1(June):45–66, 2002.
[46] L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, nov 1984.
[47] Vladimir N. Vapnik. Statistical learning theory. Wiley, 1998.
[48] Xuezhi Wang, Tzu-Kuo Huang, and Jeff Schneider. Active transfer learning under model shift.
In International Conference on Machine Learning, pages 1305–1313, 2014.
[49] Manfred K Warmuth, Jun Liao, Gunnar Rätsch, Michael Mathieson, Santosh Putta, and Chris-
tian Lemmen. Active Learning with Support Vector Machines in the Drug Discovery Process.
2002.

[50] Z Xu, R Akella, and Y Zhang. Incorporating diversity and density in active learning for rele-
vance feedback. European Conference on Information Retrieval, 2007.
[51] Yi Yang, Zhigang Ma, Feiping Nie, Xiaojun Chang, and Alexander G Hauptmann. Multi-Class
Active Learning by Uncertainty Sampling with Diversity Maximization. Int. J. Comput. Vis.,
113(2):113–127, jun 2014.
[52] Raymond Yeh, Chen Chen, Teck Yian Lim, Mark Hasegawa-Johnson, and Minh N. Do. Se-
mantic Image Inpainting with Perceptual and Contextual Losses. jul 2016.
[53] Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A. Efros. Generative Visual
Manipulation on the Natural Image Manifold. pages 597–613. Springer, Cham, 2016.
[54] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image trans-
lation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593, 2017.

Appendix: Comparison with Self-taught Learning


One common strength of GAAL and self-taught learning [39] is that both utilize the unlabeled data
to help with the classification task. As we have seen in the MNIST experiment, our GAAL algorithm
seems to be able to adapt to the learner. The results in this experiment are preliminary and not meant
to be taken as comprehensive evaluations.
In this setting, the training domain is mostly unlabeled, so the method we compare with is self-taught learning [39]. Similar to the algorithm in [31], we use a Reconstruction Independent Component Analysis (RICA) model with a convolutional layer and a pooling layer; RICA is similar to a sparse autoencoder. Following the standard self-taught learning procedure, we first train RICA on the unlabeled pool dataset. We then use the trained RICA model as a feature extractor to obtain higher-level features from randomly selected MNIST images, concatenate these features with the original image data to train the classifier, and finally test the trained classifier on the USPS dataset. We test training set sizes of 250, 500, 1000, and 5000. The reason for doing so is that deep learning type techniques are known to thrive with abundant training data and may perform relatively poorly with a limited amount of training data, as in active learning scenarios. We run the experiments 100 times and average the results. We use the same setting for the GAAL algorithm as in Section 5.1, and the classifier is a linear SVM. Table 1 shows the classification accuracies of GAAL, self-taught learning, and baseline supervised learning on raw image data.

Table 1: Comparison of GAAL and self-taught learning

ALGORITHM       TRAINING SET SIZE    ACCURACY
GAAL            250                  76.42%
Self-taught     250                  59.68%
Supervised      250                  67.87%
Self-taught     500                  65.53%
Supervised      500                  69.22%
Self-taught     1000                 71.96%
Supervised      1000                 69.58%
Self-taught     5000                 78.08%
Supervised      5000                 72.00%
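For reference, the feature-concatenation step of this baseline can be sketched as follows; encoder stands in for the trained RICA model, and any unsupervised feature extractor with a transform method is assumed (this is not the exact implementation used here):

import numpy as np
from sklearn.svm import LinearSVC

def self_taught_accuracy(encoder, X_train, y_train, X_test, y_test):
    """Concatenate learned features with the raw pixels, train a linear SVM,
    and report test accuracy, as in the self-taught learning baseline."""
    def augment(X):
        raw = X.reshape(len(X), -1)                       # flatten raw pixels
        return np.hstack([raw, encoder.transform(raw)])   # raw + higher-level features
    clf = LinearSVC().fit(augment(X_train), y_train)
    return clf.score(augment(X_test), y_test)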

Using GAAL on the raw features achieves a higher accuracy than self-taught learning with the same training size of 250. In fact, self-taught learning performs worse than regular supervised learning when labeled data is scarce, which is possible for an autoencoder-type algorithm. However, when we increase the training size, self-taught learning starts to perform better; with 5000 training samples, it outperforms GAAL trained on 250 samples.
Based on these results, we suspect that GAAL also has the potential to be used as a self-taught algorithm⁴. In practice, the GAAL algorithm can also be applied on top of the features extracted by a self-taught algorithm. A comprehensive comparison with a more advanced self-taught learning method with a deeper architecture is beyond the scope of this work.
⁴ At this stage, self-taught learning has the advantage that it can utilize any unlabeled training data, i.e., not necessarily from the categories of interest. GAAL does not have this feature yet.

