Zhu and Bento - 2017 - Generative Adversarial Active Learning
Abstract
1 Introduction
One of the most exciting machine learning breakthroughs in recent years is the generative adversarial
network (GAN) [20]. A GAN trains a generative model by finding the Nash equilibrium of a two-player
adversarial game. Its ability to generate samples in complex domains enables new possibilities for
active learners to synthesize training samples on demand, rather than relying on choosing instances
to query from a given pool.
In the classification setting, given a pool of unlabeled data samples and a fixed labeling budget, active
learning algorithms typically choose training samples strategically from the pool to maximize the
accuracy of trained classifiers. The goal of these algorithms is to reduce label complexity. Such
approaches are called pool-based active learning and are illustrated in Figure 1 (a).
In a nutshell, we propose to use GANs to synthesize informative training instances that are adapted
to the current learner. We then ask human oracles to label these instances. The labeled data is added
back to the training set to update the learner. This protocol is executed iteratively until the label
budget is reached. This process is shown in Figure 1 (b).
The main contributions of this work are as follows:
• To the best of our knowledge, this is the first active learning framework using deep generative
models¹.
• While we do not claim our method is always superior to the previous active learners in
terms of accuracy, in some cases, it yields classification performance not achievable even by
a fully supervised learning scheme. With enough capacity from the trained generator, our
method allows us to have control over the generated instances which may not be available
to the previous active learners.
¹ The appendix of [37] mentioned three active learning attempts but did not report numerical results. Our
approach is also different from those attempts.
Figure 1: (a) Pool-based active learning scenario. The learner selects samples for querying from
a given unlabeled pool. (b) GAAL algorithm. The learner synthesizes samples for querying using
a GAN.
• We conduct experiments to compare our active learning approach with self-taught learning².
The results are promising.
• This is the first work to report numerical results in active learning synthesis for image
classification. See [43, 30]. The proposed framework may inspire future GAN applications
in active learning.
• The proposed approach should not be understood as a pool-based active learning method.
Instead, it is active learning by query synthesis. We show that our approach can perform
competitively when compared against pool-based methods.
2 Related Work
Our work is related to two different subjects, active learning and deep generative models.
Active learning algorithms can be categorized into stream-based, pool-based and learning by query
synthesis. Historically, stream-based and pool-based are the two popular scenarios of active learning
[43].
Our method falls into the category of query synthesis. Early active learning by query synthesis
achieves good results only in simple domains such as X = {0, 1}^3, see [1, 2]. In [30], the authors
synthesized learning queries and used human oracles to train a neural network for classifying hand-
written characters. However, they reported poor results due to the images generated by the learner
being sometimes unrecognizable to the human oracles. We will report results on similar tasks such
as differentiating 5 versus 7, showing the advancement of our active learning scheme. Figure 2
compares image samples generated by the method in [30] and our algorithm.
Figure 2: (Left) Image queries synthesized by a neural network for handwritten digits recognition.
Source: [30]. (Right) Image queries synthesized by our algorithm, GAAL.
The popular SVMactive algorithm from [45] is an efficient pool-based active learning scheme for
SVM. Their scheme is a special instance of the uncertainty sampling principle which we also employ.
[28] reduces the exhaustive scan through the database employed by SVMactive. Our algorithm
shares the same advantage of not needing to test every sample in the database at each iteration of
active learning, although we achieve this by not using a pool at all rather than by a clever trick. [48] proposed
active transfer learning, which is reminiscent of our experiments in Section 5.1. However, we do not
consider collecting new labeled data in target domains of transfer learning.
² See the supplementary document.
There have been some applications of generative models in semi-supervised learning and active
learning. Previously, [36] proposed a semi-supervised learning approach to text classification based
on generative models. [26] applied Gaussian mixture models to active learning. In that work, the
generative model served as a classifier. Compared with these approaches, we apply generative mod-
els to directly synthesize training data. This is a more challenging task.
One building block of our algorithm is the groundbreaking work of the GAN model in [20]. Our
approach is an application of GAN in active learning.
Our approach is also related to [44] which studied GAN in a semi-supervised setting. However, our
task is active learning which is different from the semi-supervised learning they discussed. Our work
shares the common strength with the self-taught learning algorithm in [39] as both methods use the
unlabeled data to help with the task. In the supplementary document, we compare our algorithm
with a self-taught learning algorithm.
In a way, the proposed approach can be viewed as an adversarial training procedure [21], where
the classifier is iteratively trained on adversarial examples generated by the algorithm by
solving an optimization problem. [21] focuses on adversarial examples that are generated by
perturbing the original data within a small epsilon-ball, whereas we seek to produce examples
using an active learning criterion.
To the best of our knowledge, the only previous mention of using GANs for active learning is in
the appendix of [37]. The authors discussed therein three attempts to reduce the number of queries.
In the third attempt, they generated synthetic samples and sorted them by the information content
whereas we adaptively generate new queries by solving an optimization problem. There were no
reported active learning numerical results in that work.
3 Background
We briefly introduce some important concepts in active learning and generative adversarial networks.
In the PAC learning framework [46], label complexity describes the number of labeled instances
needed to find a hypothesis with error $\epsilon$. The label complexity of passive supervised learning, i.e.,
using all the labeled samples as training data, is $O(d/\epsilon)$ [47], where d is the VC dimension of the
hypothesis class H. Active learning aims to reduce the label complexity by choosing the most
informative instances for querying while attaining low error rate. For example, [24] proved that the
active learning algorithm from [10] has the label complexity bound $O(\theta d \log(1/\epsilon))$, where $\theta$ is defined
therein as the disagreement coefficient, thus reducing the theoretical bound for the number of labeled
instances needed from passive supervised learning. Theoretically speaking, the asymptotic accuracy
of an active learning algorithm cannot exceed that of a supervised learning algorithm. In practice,
as we will demonstrate in the experiments, our algorithm may be able to achieve higher accuracy
than the passive supervised learning in some cases.
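To make the gap between the two bounds concrete, the following minimal sketch evaluates the leading terms d/ε and θ d log(1/ε) for a few error levels. The values of d and θ are hypothetical and chosen only for illustration, and the hidden constants of the O-notation are ignored, so the numbers indicate growth rates rather than actual label counts.

```python
import math

def passive_term(d: int, eps: float) -> float:
    """Leading term d / eps of the passive bound (constants omitted)."""
    return d / eps

def active_term(d: int, eps: float, theta: float) -> float:
    """Leading term theta * d * log(1 / eps) of the active bound (constants omitted)."""
    return theta * d * math.log(1.0 / eps)

if __name__ == "__main__":
    d, theta = 10, 2.0  # hypothetical VC dimension and disagreement coefficient
    for eps in (0.1, 0.01, 0.001):
        print(f"eps={eps}: passive ~ {passive_term(d, eps):.0f}, "
              f"active ~ {active_term(d, eps, theta):.0f}")
```

The passive term grows linearly in 1/ε, whereas the active term grows only logarithmically, which is the sense in which the bound from [24] is smaller when θ is moderate.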
Stream-based active learning makes decisions on whether to query the streamed-in instances or not.
Typical methods include [5, 10, 14]. In this work, we will focus on comparing pool-based and query
synthesis methods.
In pool-based active learning, the learner selects the unlabeled instances from an existing pool based
on a certain criterion. Some pool-based algorithms make selections by using clustering techniques
or maximizing a diversity measure, e.g. [7, 50, 13, 35, 51, 25]. Another commonly used pool-
based active learning principle is uncertainty sampling. It amounts to querying the most uncertain
instances. For example, algorithms in [45, 8] query the labels of the instances that are closest to
the decision boundary of the support vector machine. Figure 3 (a) illustrates this selection process.
Other pool-based works include [27], which proposes a Bayesian active learning by disagreement
algorithm in the context of learning user preferences, and [22, 18], which study the submodular nature
of sequential active learning schemes.
Mathematically, let P be the pool of unlabeled instances, and f = W φ(x) + b be the separating
hyperplane. φ is the feature map induced by the SVM kernel. The SVMactive algorithm in [45]
chooses a new instance to query by minimizing the distance (or its proxy) to the hyperplane

$$\min_{x \in P} \| W \phi(x) + b \|. \tag{1}$$
This formulation can be justified by the version space theory in separable cases [45] or by other
analyses in non-separable cases, e.g., [8, 6]. This simple and effective method is widely applied in
many studies, e.g., [17, 49].
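As an illustration of criterion (1), the sketch below ranks a pool by the absolute decision value and returns the closest instances. It assumes a linear kernel, so the feature map φ is the identity, and the function name `select_queries` is hypothetical rather than taken from [45].

```python
import numpy as np

def select_queries(pool: np.ndarray, w: np.ndarray, b: float, k: int = 1) -> np.ndarray:
    """Return indices of the k pool instances with the smallest |w . x + b|.

    |w . x + b| is a proxy for the distance to the hyperplane: the true
    distance divides by ||w||, which does not change the ranking.
    """
    scores = np.abs(pool @ w + b)
    return np.argsort(scores)[:k]

pool = np.array([[2.0, 0.0], [0.1, 0.0], [-1.0, 0.0], [0.5, 0.0]])
w = np.array([1.0, 0.0])
picked = select_queries(pool, w, b=0.0, k=2)
print(picked)
```

Note that every pool instance is scored at each call, which is the exhaustive scan that query synthesis avoids.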
In the query synthesis scenario, an instance x is synthesized instead of being selected from an ex-
isting pool. Previous methods tend to work in simple low-dimensional domains [2] but fail in more
complicated domains such as images [30]. Our approach aims to tackle this challenge.
For an introduction to active learning, readers are referred to [43, 12].
Algorithm 1 Generative Adversarial Active Learning (GAAL)
1: Train generator G on all unlabeled data by solving (2)
2: Initialize labeled training dataset S by randomly picking a small fraction of the data to label
3: repeat
4: Solve optimization problem (3) according to the current learner by descending the gradient
   $\nabla_z \| W^\top \phi(G(z)) + b \|$
5: Use the solution $\{z_1, z_2, \dots\}$ and G to generate instances for querying
6: Label $\{G(z_1), G(z_2), \dots\}$ by human oracles
7: Add labeled data to the training dataset S and re-train the learner, update W , b
8: until Labeling budget is reached
where z is the latent variable and G is obtained by the GAN algorithm. Intuitively, minimizing
this loss will push the generated samples toward the decision boundary. Figure 3 (b) illustrates this
idea. Compared with the pool-based active learning in Figure 3 (a), our hope is that GAAL may be able to
generate more informative instances than those available in the existing pool.
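The idea of pushing a generated sample toward the decision boundary can be sketched with a toy stand-in for the generator. The example below assumes a linear map G(z) = A z + c in place of a trained GAN generator, an identity feature map, and a squared margin in place of the norm (both have the same minimizers, but the square has a smooth gradient). All names and values are hypothetical.

```python
import numpy as np

def synthesize_query(A, c, w, b, z0, lr=0.1, steps=200):
    """Descend 0.5 * (w . G(z) + b)^2 over z for the toy generator G(z) = A z + c.

    The step size lr must be small relative to ||A^T w||^2 for the
    iteration to converge.
    """
    z = np.asarray(z0, dtype=float).copy()
    direction = A.T @ w                   # gradient of (w . (A z + c) + b) w.r.t. z
    for _ in range(steps):
        margin = w @ (A @ z + c) + b      # signed score of the current sample
        z -= lr * margin * direction      # gradient step on 0.5 * margin**2
    return z, A @ z + c

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # toy generator weights
c = np.zeros(3)
w = np.array([1.0, -1.0, 0.5])                        # toy classifier weights
b = 0.3
z_star, x_star = synthesize_query(A, c, w, b, z0=np.array([1.0, 1.0]))
print(abs(w @ x_star + b))                            # close to zero: on the boundary
```

With a real generator the gradient with respect to z would be obtained by back-propagation through the network rather than in closed form.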
Figure 3: (a) SVMactive algorithm selects the instances that are closest to the boundary to query
the oracle. (b) GAAL algorithm synthesizes instances that are informative to the current learner.
Synthesized instances may be more informative to the learner than other instances in the existing
pool.
The solution(s) to this optimization problem, G(z), after being labeled, will be used as new training
data for the next iteration. We outline our procedure in Algorithm 1. It is possible to use a state-of-
the-art classifier, such as convolutional neural networks. To do this, we can replace the feature map
φ in Equation 3 with a feed-forward function of a convolutional neural network. In that case, the
linear SVM will become the output layer of the network. In step 4 of Algorithm 1, one may also
use a different active learning criterion. We emphasize that our contribution is the general framework
rather than a specific criterion.
In training GAN, we follow the procedure detailed in [38]. Optimization problem (3) is non-convex
with possibly many local minima. One typically aims at finding good local minima rather than the
global minimum. We use a gradient descent algorithm with momentum to solve this problem. We
also periodically restart the gradient descent to find other solutions. The gradients with respect to D and G are
calculated using back-propagation.
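A minimal sketch of gradient descent with momentum and restarts is given below. It assumes that a restart means re-initializing the latent variable from a standard normal distribution, which the text does not specify, and it uses a toy non-convex loss with two minima instead of objective (3). The function names and hyperparameters are hypothetical.

```python
import numpy as np

def momentum_descent(grad, z0, lr=0.01, beta=0.9, steps=500):
    """Heavy-ball gradient descent starting from z0."""
    z = np.array(z0, dtype=float)
    v = np.zeros_like(z)
    for _ in range(steps):
        v = beta * v - lr * grad(z)   # accumulate velocity
        z = z + v
    return z

def descend_with_restarts(loss, grad, dim, n_restarts=8, seed=0, **kwargs):
    """Run momentum descent from several random starts; return (loss, z) pairs, best first."""
    rng = np.random.default_rng(seed)
    solutions = []
    for _ in range(n_restarts):
        z = momentum_descent(grad, rng.standard_normal(dim), **kwargs)
        solutions.append((float(loss(z)), z))
    solutions.sort(key=lambda pair: pair[0])
    return solutions

# Toy non-convex loss with minima at z = +1 and z = -1.
toy_loss = lambda z: float(np.sum((z ** 2 - 1.0) ** 2))
toy_grad = lambda z: 4.0 * z * (z ** 2 - 1.0)
solutions = descend_with_restarts(toy_loss, toy_grad, dim=1)
```

Each restart can land in a different local minimum, so a batch of distinct queries can be collected from the returned list.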
Alternatively, we can incorporate diversity into our active learning principle. Some active learning
approaches rely on maximizing diversity measures, such as the Shannon Entropy. In our case, we
can include in the objective function (3) a diversity measure such as proposed in [51, 25], thus
increasing the diversity of samples. The evaluation of this alternative approach is left for future
work.
5 Experiments
We perform active learning experiments using the proposed approach. We also compare our ap-
proach to self-taught learning, a type of transfer learning method, in the supplementary document.
The GAN implementation used in our experiment is a modification of a publicly available TensorFlow
DCGAN implementation³. The network architecture of DCGAN is described in [38].
³ https://github.com/carpedm20/DCGAN-tensorflow
In our experiments, we focus on binary image classification, although this can be generalized to
multiple classes using a one-vs-one or one-vs-all scheme [29]. Recent advancements in GAN research
show that GANs could potentially model language as well [23], although those results are preliminary at the
current stage. We use a linear SVM as our classifier of choice (with parameter γ = 0.001). Even
though classifiers with much higher accuracy (e.g., convolutional neural networks) can be used,
our purpose is not to achieve absolute high accuracy but to study the relative performance between
different active learning schemes.
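For readers unfamiliar with the one-vs-all scheme mentioned above, the sketch below combines several binary linear scorers into a multi-class prediction by taking the class with the largest score. This is a generic illustration under the assumption of linear scorers, not the multi-class procedure of [29], and the names are hypothetical.

```python
import numpy as np

def one_vs_all_predict(X: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Predict classes from per-class binary scorers.

    W has shape (n_classes, n_features) and b has shape (n_classes,);
    row i scores "class i versus the rest".
    """
    scores = X @ W.T + b
    return np.argmax(scores, axis=1)

W = np.eye(3)                      # three toy binary scorers
b = np.zeros(3)
X = np.array([[0.2, 0.9, 0.1], [1.0, 0.0, 0.0]])
predictions = one_vs_all_predict(X, W, b)
```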
The following schemes are implemented and compared in our experiments.
• The proposed generative adversarial active learning (GAAL) algorithm as in Algorithm 1.
• Using regular GAN to generate training data. We refer to this as simple GAN.
• SVMactive algorithm from [45].
• Passive random sampling, which randomly samples instances from the unlabeled pool.
• Passive supervised learning, i.e., using all the samples in the pool to train the classifier.
• Self-taught learning from [39].
We initialize the training set with 50 randomly selected samples. The algorithms proceed with a
batch of 10 queries every time.
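The labeling protocol described above can be sketched as a generic loop with pluggable components. The callbacks `query_fn`, `oracle_fn`, and `fit_fn` are hypothetical placeholders for the query strategy, the human labeler, and the classifier training routine, and the default budget of 250 is an assumption made for illustration.

```python
import numpy as np

def run_active_learning(init_X, init_y, query_fn, oracle_fn, fit_fn,
                        batch_size=10, budget=250):
    """Grow the labeled set in batches until the labeling budget is reached.

    Returns the final model and the training-set size after each round.
    """
    X = np.asarray(init_X, dtype=float)
    y = np.asarray(init_y)
    model = fit_fn(X, y)
    sizes = [len(y)]
    while len(y) + batch_size <= budget:
        queries = query_fn(model, batch_size)   # select or synthesize instances
        labels = oracle_fn(queries)             # ask the oracle for labels
        X = np.vstack([X, queries])
        y = np.concatenate([y, labels])
        model = fit_fn(X, y)                    # re-train the learner
        sizes.append(len(y))
    return model, sizes
```

Starting from 50 labeled samples with batches of 10, a budget of 250 corresponds to 20 query rounds. Swapping `query_fn` is enough to switch between random sampling, pool-based selection, and query synthesis.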
We use two datasets for training, the MNIST and CIFAR-10. The MNIST dataset is a well-known
image classification dataset with 60000 training samples. The training set and the test set follow the
same distribution. We perform the binary classification experiment distinguishing 5 and 7, which is
reminiscent of [30]. The training set of the CIFAR-10 dataset consists of 50000 32 × 32 color images
from 10 categories. One might speculate the possibility of distinguishing cats and dogs by training
on cat-like dogs or dog-like cats. In practice, our human labelers failed to confidently identify most
of the generated cat and dog images. Figure 4 (Top) shows generated samples. The authors of [41]
reported attempts to generate high-resolution animal pictures, but with the wrong anatomy. We leave
this task for future studies, possibly with improved techniques such as [3, 23]. For this reason, we
perform binary classification on the automobile and horse categories. It is relatively easy for human
labelers to identify car and horse body shapes. Typical generated samples, which are presented to
the human labelers, are shown in Figure 4.
Figure 4: Samples generated by GAAL. (Top) Generated samples in the cat and dog categories. (Bottom
Left) MNIST dataset. (Bottom Right) CIFAR-10 dataset.
5.1 Active Learning
We use all the images of 5 and 7 from the MNIST training set as our unlabeled pool to train the
generator G. Different from traditional active learning, we do not select new samples from the pool
after initialization. Instead, we apply Algorithm 1 to generate training queries. For the discriminator
D and the generator G, we follow the network architecture of [38]. We use a linear SVM as our classifier,
although other classifiers can be used, e.g., [45, 42, 43].
We first test the trained classifier on a test set that follows a distribution different from the training
set. One purpose is to demonstrate the adaptive capability of the GAAL algorithm. In addition,
because the MNIST test set and training set follow the same distribution, pool-based active learning
methods have a natural advantage over active learning by synthesis since they use real images
drawn from the exact same distribution as the test set. It is thus reasonable to test on sets that follow
different, albeit similar, distributions. To this end, we use the USPS dataset from [32] as the test set
with standard preprocessing. In reality, such settings are very common, e.g., training autonomous
drivers on simulated datasets and testing on real vehicles; training on handwritten characters and
recognizing writing in different styles, etc. This test setting is related to transfer learning, where
the distribution of the training domain Ptr (x, y) is different from that of the target domain Pte (x, y).
Figure 5 (Top) shows the results of our first experiment.
[Figure 5: plots not recoverable from the extracted text. Panel title: "Active Learning, 5 vs. 7". Y-axis: Classification Accuracy. Legend includes SVMactive.]
this issue given the cause is the instability of GANs. Addressing this limitation will be left to future
studies.
Figure 6: Active learning results using a mixed scheme. The mixed scheme executes one iteration
of random sampling after every five iterations of the GAAL algorithm. Train on MNIST, test on USPS.
Classifying 5 and 7. The results are averaged over 10 runs. The error bars represent the empirical
standard deviation of the average values. The figure is best viewed in color. (Axes: number of labeled
samples, 50 to 250, versus classification accuracy. Curves: GAAL, random sampling, GAAL + random sampling.)
A mixed scheme is able to achieve better performance than either GAAL or random sampling
alone. This suggests that GAAL, as an exploitation scheme, performs even better in combination
with an exploration scheme. A detailed analysis of such mixed schemes will be an interesting
future topic.
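The mixed scheme can be expressed as a simple schedule that interleaves one random-sampling iteration after every five GAAL iterations. The function name and the string labels below are hypothetical and serve only to illustrate the interleaving.

```python
def mixed_schedule(n_iterations: int, gaal_per_random: int = 5) -> list:
    """Return the query strategy for each iteration of the mixed scheme.

    Every (gaal_per_random + 1)-th iteration uses random sampling
    (exploration); all other iterations use GAAL (exploitation).
    """
    period = gaal_per_random + 1
    return ["random" if i % period == 0 else "gaal"
            for i in range(1, n_iterations + 1)]

schedule = mixed_schedule(12)
print(schedule)
```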
References
[1] D Angluin. Queries and concept learning. Mach. Learn., 1988.
[2] D Angluin. Queries revisited. Int. Conf. Algorithmic Learn., 2001.
[3] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. jan 2017.
[4] Yoram Baram, Ran El Yaniv, and Kobi Luz. Online choice of active learning algorithms.
Journal of Machine Learning Research, 5(Mar):255–291, 2004.
[5] Alina Beygelzimer, Sanjoy Dasgupta, and John Langford. Importance Weighted Active Learn-
ing. Proc. 26th Annu. Int. Conf. Mach. Learn. ICML 09, abs/0812.4(ii):1–8, 2008.
[6] Antoine Bordes, Şeyda Ertekin, Jason Weston, and Léon Bottou. Fast Kernel Classifiers with
Online and Active Learning. J. Mach. Learn. Res., 6:1579–1619, 2005.
[7] Klaus Brinker. Incorporating Diversity in Active Learning with Support Vector Machines.
[8] Colin Campbell, Nello Cristianini, and Alex Smola. Query learning with large margin classi-
fiers. 17th Int. Conf. Mach. Learn., pages 111–118, 2000.
[9] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. In-
foGAN: Interpretable Representation Learning by Information Maximizing Generative Adver-
sarial Nets. 2016.
[10] David Cohn, Les Atlas, and Richard Ladner. Improving generalization with active learning.
Mach. Learn., 15(2):201–221, may 1994.
[11] Sanjoy Dasgupta. Analysis of a greedy active learning strategy. In Advances in neural infor-
mation processing systems, pages 337–344, 2005.
[12] Sanjoy Dasgupta. Two faces of active learning. Theor. Comput. Sci., 412:1767–1781, 2011.
[13] Sanjoy Dasgupta and Daniel Hsu. Hierarchical sampling for active learning. Proceedings of
the 25th international conference on Machine learning - ICML ’08, pages 208–215, 2008.
[14] Sanjoy Dasgupta, Daniel Hsu, and Claire Monteleoni. A general agnostic active learning
algorithm. Engineering, 20(2):1–14, 2007.
[15] Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, and Thomas Brox. Learn-
ing to Generate Chairs, Tables and Cars with Convolutional Networks. arXiv preprint
arXiv:1411.5928, pages 1–14, 2014.
[16] Jon Gauthier. Conditional generative adversarial nets for convolutional face generation. Class
Project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter
semester 2014, 2014.
[17] King-Shy Goh, Edward Y. Chang, and Wei-Cheng Lai. Multimodal concept-dependent active
learning for image retrieval. In Proc. 12th Annu. ACM Int. Conf. Multimed. - Multimed. ’04,
page 564, New York, New York, USA, 2004. ACM Press.
[18] Daniel Golovin and Andreas Krause. Adaptive submodularity: A new approach to active
learning and stochastic optimization. In COLT, pages 333–345, 2010.
[19] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[20] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil
Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in
neural information processing systems, pages 2672–2680, 2014.
[21] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adver-
sarial examples. arXiv preprint arXiv:1412.6572, 2014.
[22] Andrew Guillory and Jeff Bilmes. Interactive submodular set cover. arXiv preprint
arXiv:1002.3345, 2010.
[23] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville.
Improved training of wasserstein gans. arXiv preprint arXiv:1704.00028, 2017.
[24] Steve Hanneke. A bound on the label complexity of agnostic active learning. Proc. 24th Int.
Conf. Mach. Learn. - ICML ’07, pages 353–360, 2007.
[25] Steven C H Hoi, Rong Jin, Jianke Zhu, and Michael R Lyu. Semi-Supervised SVM Batch
Mode Active Learning with Applications to Image Retrieval. ACM Trans. Inf. Syst.,
27(16):24–26, 2009.
[26] Timothy M. Hospedales, Shaogang Gong, and Tao Xiang. Finding rare classes: Active learning
with generative and discriminative models. IEEE Trans. Knowl. Data Eng., 25(2):374–386,
2013.
[27] Neil Houlsby, Ferenc Huszar, Zoubin Ghahramani, and Jose M Hernández-Lobato. Collabora-
tive gaussian processes for preference learning. In Advances in Neural Information Processing
Systems, pages 2096–2104, 2012.
[28] Prateek Jain, Sudheendra Vijayanarasimhan, and Kristen Grauman. Hashing Hyperplane
Queries to Near Points with Applications to Large-Scale
Active Learning. IEEE Trans. Pattern Anal. Mach. Intell., 36(2):2010, 2010.
[29] A.J. Joshi, F. Porikli, and N. Papanikolopoulos. Multi-class active learning for image classifi-
cation. IEEE Conf. Comput. Vis. Pattern Recognit., pages 2372–2379, 2009.
[30] Kevin J. Lang and Eric B Baum. Query Learning Can Work Poorly when a Human Oracle is
Used, 1992.
[31] Quoc V Le, Alexandre Karpenko, Jiquan Ngiam, and Andrew Y Ng. ICA with Reconstruction
Cost for Efficient Overcomplete Feature Learning.
[32] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel.
Backpropagation Applied to Handwritten Zip Code Recognition, 1989.
[33] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Aitken, Alykhan Tejani,
Johannes Totz, Zehan Wang, and Wenzhe Shi. Photo-Realistic Single Image Super-Resolution
Using a Generative Adversarial Network. arXiv, 2016.
[34] Mehdi Mirza and Simon Osindero. Conditional Generative Adversarial Nets. CoRR, pages
1–7, nov 2014.
[35] Hieu T Nguyen and Arnold Smeulders. Active Learning Using Pre-clustering.
[36] Kamal Nigam, Andrew Kachites Mccallum, Sebastian Thrun, and Tom Mitchell. Text Classi-
fication from Labeled and Unlabeled Documents using EM. Mach. Learn., 39:103–134, 2000.
[37] Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, and Kunal Talwar. Semi-
supervised knowledge transfer for deep learning from private training data. arXiv preprint
arXiv:1610.05755, 2016.
[38] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised Representation Learning with
Deep Convolutional Generative Adversarial Networks. nov 2015.
[39] Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y Ng. Self-taught
Learning : Transfer Learning from Unlabeled Data. Proc. 24th Int. Conf. Mach. Learn., pages
759–766, 2007.
[40] Jens Röder, Boaz Nadler, Kevin Kunzmann, and Fred A Hamprecht. Active learning with
distributional estimates. arXiv preprint arXiv:1210.4909, 2012.
[41] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen.
Improved Techniques for Training GANs. jun 2016.
[42] Andrew I. Schein and Lyle H. Ungar. Active learning for logistic regression: An evaluation,
volume 68. 2007.
[43] Burr Settles. Active learning literature survey. Computer Sciences Technical Report 1648,
University of Wisconsin–Madison, 2010.
[44] Jost Tobias Springenberg. Unsupervised and Semi-supervised Learning with Categorical Gen-
erative Adversarial Networks. arXiv, (2009):1–20, 2015.
[45] Simon Tong and Daphne Koller. Support Vector Machine Active Learning with Applications
to Text Classification. Proc. Int. Conf. Mach. Learn., 1(June):45–66, 2002.
[46] L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, nov
1984.
[47] V. N. Vapnik. Statistical learning theory. 1998.
[48] Xuezhi Wang, Tzu-Kuo Huang, and Jeff Schneider. Active transfer learning under model shift.
In International Conference on Machine Learning, pages 1305–1313, 2014.
[49] Manfred K Warmuth, Jun Liao, Gunnar Rätsch, Michael Mathieson, Santosh Putta, and Chris-
tian Lemmen. Active Learning with Support Vector Machines in the Drug Discovery Process.
2002.
[50] Z Xu, R Akella, and Y Zhang. Incorporating diversity and density in active learning for rele-
vance feedback. European Conference on Information Retrieval, 2007.
[51] Yi Yang, Zhigang Ma, Feiping Nie, Xiaojun Chang, and Alexander G Hauptmann. Multi-Class
Active Learning by Uncertainty Sampling with Diversity Maximization. Int. J. Comput. Vis.,
113(2):113–127, jun 2014.
[52] Raymond Yeh, Chen Chen, Teck Yian Lim, Mark Hasegawa-Johnson, and Minh N. Do. Se-
mantic Image Inpainting with Perceptual and Contextual Losses. jul 2016.
[53] Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A. Efros. Generative Visual
Manipulation on the Natural Image Manifold. pages 597–613. Springer, Cham, 2016.
[54] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image trans-
lation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593, 2017.
on the raw features achieves a higher accuracy than that of the self-taught learning with the same
training size of 250. In fact, self-taught learning performs worse than the regular supervised learn-
ing when labeled data is scarce. This is possible for an autoencoder type algorithm. However, when
we increase the training size, the self-taught learning starts to perform better. With 5000 training
samples, self-taught learning outperforms GAAL with 250 training samples.
Based on these results, we suspect that GAAL also has the potential to be used as a self-taught
algorithm⁴. In practice, the GAAL algorithm can also be applied on top of the features extracted
by a self-taught algorithm. A comprehensive comparison with a more advanced self-taught learning
method with deeper architecture is beyond the scope of this work.
⁴ At this stage, self-taught learning has the advantage that it can utilize any unlabeled training data, i.e., not
necessarily from the categories of interest. GAAL does not have this feature yet.