Active Learning for Deep Object Detection
Keywords: Active Learning, Deep Learning, Object Detection, YOLO, Continuous Learning, Incremental Learning.
Abstract: The great success that deep models have achieved in the past is mainly owed to large amounts of labeled
training data. However, the acquisition of labeled data for new tasks aside from existing benchmarks is both
challenging and costly. Active learning can make the process of labeling new data more efficient by selecting
unlabeled samples which, when labeled, are expected to improve the model the most. In this paper, we
combine a novel method of active learning for object detection with an incremental learning scheme (Käding
et al., 2016b) to enable continuous exploration of new unlabeled datasets. We propose a set of uncertainty-
based active learning metrics suitable for most object detectors. Furthermore, we present an approach to
leverage class imbalances during sample selection. All methods are evaluated systematically in a continuous
exploration context on the PASCAL VOC 2012 dataset (Everingham et al., 2010).
Brust, C., Käding, C. and Denzler, J. (2019). Active Learning for Deep Object Detection. In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2019), pages 181-190. DOI: 10.5220/0007248601810190
Figure 1: Our proposed system for continuous exploration scenarios. Unlabeled images are evaluated by a deep object detection method. The margins of predictions (i.e., the absolute difference of the highest and second-highest class score) are aggregated to identify valuable instances by combining the scores of individual detections.
… multi-scale detections (Lin et al., 2017). YOLO (Redmon et al., 2016) is a more recent deep learning-based object detector. Instead of using a CNN as a black-box feature extractor, it is trained in an end-to-end fashion. All detections are inferred in a single pass (hence the name "You Only Look Once") while detection and classification are capable of independent operation. YOLOv2 (Redmon and Farhadi, 2017) and YOLOv3 (Redmon and Farhadi, 2018) improve upon the original YOLO in several aspects. These include, among others, different network architectures, different priors for bounding boxes and considering multiple scales during training and detection. SSD (Liu et al., 2016) is a single-pass approach comparable to YOLO, introducing improvements like assumptions about the aspect ratio distribution of bounding boxes as well as predictions on different scales. As a result of a series of improvements, it is both faster and more accurate than the original YOLO. DSSD (Fu et al., 2017) further improves upon SSD by focusing more on context with the help of deconvolutional layers.

Active Learning for Object Detection. The authors of (Abramson and Freund, 2006) propose an active learning system for pedestrian detection in videos taken by a camera mounted on the front of a moving car. Their detection method is based on AdaBoost, while sampling of unlabeled instances is realized by hand-tuned thresholding of detections. Object detection using the generalized Hough transform in combination with randomized decision trees, called Hough forests, is presented in (Yao et al., 2012). Here, costs are estimated for annotations, and instances with the highest costs are selected for labeling. This follows the intuition that those examples are most likely to be difficult and are therefore considered most valuable. Another active learning approach for satellite images using sliding windows in combination with an SVM classifier and margin sampling is proposed in (Bietti, 2012). The combination of active learning for object detection with crowd sourcing is presented in (Vijayanarasimhan and Grauman, 2014). A part-based detector for SVM classifiers in combination with hashing is proposed for use in large-scale settings. Active learning is realized by selecting the most uncertain instances for labeling. In (Roy et al., 2016), object detection is interpreted as a structured prediction problem using a version space approach in the so-called "difference of features" space. The authors propose different margin sampling approaches estimating the future margin of an SVM classifier.

Like our proposed approach, most related methods presented above rely on uncertainty indicators like least confidence or 1-vs-2. However, they are designed for a specific type of object detection and therefore cannot be applied to deep object detection methods in general, whereas our method can. Additionally, our method does not propose single objects to the human annotator. It presents whole images at once and requests labels for every object.

Active Learning for Deep Architectures. In (Wang and Shang, 2014) and (Wang et al., 2016), uncertainty-based active learning criteria for deep models are proposed. The authors offer several metrics to estimate model uncertainty, including least confidence, margin or entropy sampling. Wang et al. additionally describe a self-taught learning scheme, where the model's prediction is used as a label for further training if uncertainty is below a threshold. Another type of margin sampling is presented in (Stark et al., 2015). The authors propose querying samples according to the quotient of the highest and second-highest class probability. The visual detection of defects using a ResNet is presented in (Feng et al., 2017). The authors propose two methods for querying unlabeled instances: uncertainty sampling (i.e., defect probability of 0.5) and positive sampling (i.e., selecting every positive sample since they are very rare), with a model update after labeling. Another work which presents uncertainty sampling is (Liu et al., 2017). In addition, a query-by-committee strategy as well as active learning involving weighted incremental dictionary learning are proposed.
In the work of (Gal et al., 2017), several uncertainty-related measures for active learning are presented. Since they use Bayesian CNNs, they can make use of the probabilistic output and employ methods like variance sampling, entropy sampling or maximizing mutual information. Finally, the authors of (Beluch et al., 2018) show that ensemble-based uncertainty measures are able to perform best in their evaluation. All of the works introduced above are tailored to active learning in classification scenarios. Most of them rely on model uncertainty, similar to our applied selection criteria.

Besides estimating the uncertainty of the model, further retraining-based approaches maximize the expected model change (Huang et al., 2016) or the expected model output change (Käding et al., 2016a) that unlabeled samples would cause after labeling. Since each bounding box inside an image has to be evaluated according to its active learning score, both measures would be impractical in terms of runtime without further modifications. A more complete overview of general active learning strategies can be found in (Kovashka et al., 2016; Settles, 2009).
2 PREREQUISITE: ACTIVE LEARNING

In active learning, a value or metric v(x) is assigned to any unlabeled example x to determine its possible contribution to model improvement. The current model's output can be used to estimate a value, as can statistical properties of the example itself. A high v(x) means that the example should be preferred during selection because of its estimated value for the current model.

In the following section, we propose a method to adapt an active learning metric for classification to object detection using an aggregation process. This method is applicable to any object detector whose output contains class scores for each detected object.
Classification. For classification, the model output for a given example x is an estimated distribution of class scores p̂(c|x) over the classes K. This distribution can be analyzed to determine whether the model made an uncertain prediction, a good indicator of a valuable example. Different measures of uncertainty are a common choice for selection, e.g., (Ertekin et al., 2007; Fu and Yang, 2015; Hoi et al., 2006; Jain and Kapoor, 2009; Kapoor et al., 2010; Käding et al., 2016c; Tong and Koller, 2001; Beluch et al., 2018).

For example, if the difference between the two highest class scores is very low, the example may be located close to a decision boundary. In this case, it can be used to refine the decision boundary and is therefore valuable. The metric is defined using the highest scoring classes c1 and c2:

$v_{1\mathrm{vs}2}(x) = 1 - \big( \max_{c_1 \in K} \hat{p}(c_1 \mid x) - \max_{c_2 \in K \setminus \{c_1\}} \hat{p}(c_2 \mid x) \big)^2$ .   (1)

This procedure is known as 1-vs-2 or margin sampling (Settles, 2009). We use 1-vs-2 as part of our methods since its operation is intuitive and it can produce better estimates than, e.g., least confidence approaches (Käding et al., 2016a). However, our proposed aggregation method could be applied to any other active learning measure.
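To make Eq. (1) concrete, it can be computed from a detector's class scores in a few lines. The following Python sketch is our own illustration; the function and variable names are not from the paper's code:

```python
import numpy as np

def margin_1vs2(class_scores):
    """1-vs-2 (margin sampling) score for a single detection, cf. Eq. (1).

    class_scores: array of estimated class scores p^(c|x) over all classes K.
    Returns a value in [0, 1]; higher values indicate a smaller margin
    between the two highest-scoring classes, i.e., higher uncertainty.
    """
    top_two = np.sort(np.asarray(class_scores))[-2:]
    margin = top_two[1] - top_two[0]  # highest minus second-highest score
    return 1.0 - margin ** 2


# Example: a near-tie between two classes yields a score close to 1.
print(margin_1vs2([0.05, 0.48, 0.47]))  # ~0.9999
```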
3 ACTIVE LEARNING FOR DEEP OBJECT DETECTION

Using a classification metric on a single detection is straightforward if class scores are available. However, aggregating the metrics of individual detections for a complete image can be done in many different ways. In the section below, we propose simple and efficient aggregation strategies. Afterwards, we discuss the problem of class imbalance in datasets.

3.1 Aggregation of Detection Metrics

Possible aggregations include calculating the sum, the average or the maximum over all detections. However, for some aggregations, it is not clear how an image without any detections should be handled.

Sum. A straightforward method of aggregation is the sum. Intuitively, this method prefers images with lots of uncertain detections in them. When aggregating detections using a sum, empty examples should be valued zero. It is the neutral element of addition, making it a reasonable value for an empty sum. A low valuation effectively delays the selection of empty examples until there are either no better examples left or the model has improved enough to actually produce detections on them. The value of a single example x can be calculated from its detections D in the following way:

$v_{\mathrm{Sum}}(x) = \sum_{i \in D} v_{1\mathrm{vs}2}(x_i)$ .   (2)
Average. Another possibility is averaging each detection's score. The average is not sensitive to the number of detections, which may make scores more comparable between images. If a sample does not contain any detections, it will be assigned a zero score. This is an arbitrary rule because there is no true neutral element w.r.t. averages. However, we choose zero to keep the behavior in line with the other metrics:

$v_{\mathrm{Avg}}(x) = \frac{1}{|D|} \sum_{i \in D} v_{1\mathrm{vs}2}(x_i)$ .   (3)
Maximum. Finally, individual detection scores can be aggregated by calculating the maximum. This can result in a substantial loss of information. However, it may also prove beneficial because of increased robustness to noise from many detections. For the maximum aggregation, a zero score for empty examples is valid. The maximum is not affected by zero-valued detections, because no single detection's score can be lower than zero:

$v_{\mathrm{Max}}(x) = \max_{i \in D} v_{1\mathrm{vs}2}(x_i)$ .   (4)
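A minimal sketch of the three aggregations, including the zero score for empty images discussed above; the names are our own and the paper's implementation may differ:

```python
def image_score(detection_scores, mode="sum"):
    """Aggregate per-detection 1-vs-2 scores into a whole-image score.

    Empty images receive a score of zero under all three strategies,
    mirroring the conventions behind Eqs. (2)-(4).
    """
    if not detection_scores:  # no detections: delay selection of this image
        return 0.0
    if mode == "sum":  # Eq. (2): prefers images with many uncertain detections
        return sum(detection_scores)
    if mode == "avg":  # Eq. (3): insensitive to the number of detections
        return sum(detection_scores) / len(detection_scores)
    if mode == "max":  # Eq. (4): robust to noise from many detections
        return max(detection_scores)
    raise ValueError("unknown aggregation mode: " + mode)


print(image_score([0.9, 0.2, 0.7], mode="avg"))  # 0.6
```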
4 EXPERIMENTS

Data. We use the PASCAL VOC 2012 dataset (Everingham et al., 2010) to assess the effects of our methods on learning. To specifically measure incremental and active learning performance, both the training and validation sets are split into parts A and B in two different random ways to obtain more general experimental results. Part B is considered "new" and is comprised of images with the object classes bird, cow and sheep (first way) or tvmonitor, cat and boat (second way). Part A contains all other 17 classes and is used for initial training. The training set for part B contains 605 and 638 images for the first and second way, respectively. The decision towards VOC in favor of recently published datasets was motivated by the conditions of the dataset itself. Since it mainly contains images showing fewer objects, it is possible to split the data into a known and an unknown part without having images containing classes from both parts of the splits.
Algorithm 1: Detailed description of the experimental protocol. Please note that in an actual continuous learning scenario, new examples are always added to U. The loop is never left because U is never exhausted. The described splitting process would have to be applied regularly.

Require: known labeled samples L, unknown samples U, initial model f0, active learning metric v
while U is not exhausted do
  …
  U ← U \ Ubest
  L ← L ∪ (Ubest, Ybest)
end while
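Only the pool updates of Algorithm 1 survive in this copy; the following Python sketch fills in the selection and update steps from the surrounding text. The batch size of ten (consistent with "five new batches" per 50 samples below) and the helpers annotate and incremental_update are assumptions for illustration:

```python
def continuous_exploration(f, L, U, v, batch_size=10):
    """Sketch of the experimental protocol around Algorithm 1.

    f: current detection model, L: labeled pool, U: unlabeled pool,
    v: active learning metric assigning a value to each unlabeled image.
    """
    while U:
        # Score all unlabeled images and pick the most valuable batch.
        ranked = sorted(U, key=lambda x: v(f, x), reverse=True)
        U_best = ranked[:batch_size]
        Y_best = annotate(U_best)                     # human annotation (assumed helper)
        f = incremental_update(f, L, U_best, Y_best)  # fine-tuning step (assumed helper)
        U = [x for x in U if x not in U_best]         # U <- U \ U_best
        L = L + list(zip(U_best, Y_best))             # L <- L ∪ (U_best, Y_best)
    return f, L
```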
… is not changed during an experimental run.

Evaluation. We report mean average precision (mAP) as described in (Everingham et al., 2010) and validate after every five new batches (i.e., 50 new samples). The result is averaged over five runs for each active learning metric and way of splitting, for a total of ten runs. As a baseline for comparison, we evaluate the performance of random selection, since there is no other work suitable for direct comparison without adjustments as of yet.
Setup – Object Detector. We use YOLO as the deep object detection framework (Redmon et al., 2016). More precisely, we use the YOLO-Small architecture as an alternative to larger object detection networks, because it allows for much faster training. Our initial model is obtained by fine-tuning the Extraction model (http://pjreddie.com/media/files/extraction.weights) on part A of the VOC dataset for 24,000 iterations using the Adam optimizer (Kingma and Ba, 2014), for each way of splitting the dataset into parts A and B, resulting in two initial models. The first half of initial training is completed with a learning rate of 0.0001. The second half and all incremental experiments use a lower learning rate of 0.00001 to prevent divergence. Other hyperparameters match (Redmon et al., 2016), including the augmentation of training data using random crops, exposure or saturation adjustments.
Setup – Incremental Learning. Extending an existing CNN without sacrificing performance on known data is not a trivial task. Fine-tuning exclusively on new data leads to a severe degradation of performance on previously learned examples (Kirkpatrick et al., 2016; Shmelkov et al., 2017). We use a straightforward, but effective fine-tuning method (Käding et al., 2016b) to implement incremental learning. With each gradient step, the mini-batch is constructed by randomly selecting from old and new examples with a probability of λ or 1 − λ, respectively. After completing the learning step, the new data is simply considered old data for the next step. This method can balance known and unknown data performance successfully. We use a value of 0.5 for λ to make as few assumptions as possible and perform 100 iterations per update. Algorithm 1 describes the protocol in more detail. The method can be applied to YOLO object detection with some adjustments. Mainly, the architecture needs to be changed when new classes are added. Because of the design of YOLO's output layer, we rearrange the weights to fit the new classes, adding 49 zero-initialized weights per class.
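The mini-batch construction described above is simple to reproduce. Here is a sketch under the paper's stated choice of λ = 0.5; the function name and the batch size are our own assumptions:

```python
import random

def balanced_minibatch(old_data, new_data, batch_size=64, lam=0.5):
    """Draw each mini-batch element from the old data with probability
    lam and from the new data with probability 1 - lam, following the
    fine-tuning scheme of (Käding et al., 2016b)."""
    batch = []
    for _ in range(batch_size):
        pool = old_data if random.random() < lam else new_data
        batch.append(random.choice(pool))
    return batch
```

In expectation, half of every mini-batch rehearses known data, which is what counteracts forgetting without any extra loss terms.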
4.1 Results

We focus our analysis on the new, unknown data. However, not losing performance on known data is also important. We analyze the performance on the known part of the data (i.e., part A of the VOC dataset) to validate our method. In the worst case, the mAP decreases from 36.7% initially to 32.1%, averaged across all experimental runs and methods, although three new classes were introduced. We can see that the incremental learning method from (Käding et al., 2016b) causes only minimal losses on known data. Such losses in performance are also referred to as "catastrophic forgetting" in the literature (Kirkpatrick et al., 2016). The method from (Käding et al., 2016b) does not require additional model parameters or adjusted loss terms for added samples like comparable approaches such as (Shmelkov et al., 2017) do, which is important for learning indefinitely.
Table 1: Validation results on part B of the VOC data (i.e., new classes only). Bold face indicates block-wise best results, i.e., best results with and without additional weighting (· + w). Underlining highlights overall best results.

            50 samples   100 samples   150 samples   200 samples   250 samples   All samples
            mAP / AULC   mAP / AULC    mAP / AULC    mAP / AULC    mAP / AULC    mAP / AULC
Baseline
Random      8.7 /  4.3   12.4 / 14.9   15.5 / 28.8   18.7 / 45.9   21.9 / 66.2   32.4 / 264.0
Our Methods
Max         9.2 /  4.6   12.9 / 15.7   15.7 / 30.0   19.8 / 47.8   22.6 / 69.0   32.0 / 269.3
Avg         9.0 /  4.5   12.4 / 15.2   15.8 / 29.2   19.3 / 46.8   22.7 / 67.8   33.3 / 266.4
Sum         8.5 /  4.2   14.3 / 15.6   17.3 / 31.4   19.8 / 49.9   22.7 / 71.2   32.4 / 268.2
Max + w     9.2 /  4.6   13.0 / 15.7   17.0 / 30.7   20.6 / 49.5   23.2 / 71.4   33.0 / 271.0
Avg + w     8.7 /  4.3   12.5 / 14.9   16.6 / 29.4   19.9 / 47.7   22.4 / 68.8   32.7 / 267.1
Sum + w     8.7 /  4.4   13.7 / 15.6   17.5 / 31.2   20.9 / 50.4   24.3 / 72.9   32.7 / 273.6
Figure 2: Value of examples of cow, sheep and bird as determined by the Sum, Avg and Max metrics using the initial model. The top seven selection is not affected by using our weighting method to counter training set class imbalances.
Performance of active learning methods is usually evaluated by observing points on a learning curve (i.e., performance over the number of added samples). In Table 1, we show the mAP for the new classes from part B of VOC at several intermediate learning steps as well as after exhausting the unlabeled pool. In addition, we show the area under the learning curve (AULC) to further improve comparability among the methods. In our experiments, the number of samples added equals the number of images.
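The paper does not spell out its numeric convention for the AULC. One simple reading that reproduces the values in Table 1 is the trapezoidal rule over the mAP measurements (taken every 50 samples, with unit step width), prepended with zero since the initial model knows none of the new classes. This is our reconstruction, not a stated definition:

```python
def aulc(map_values):
    """Area under the learning curve from mAP values measured every
    50 samples (unit step width), starting from 0 mAP on the new classes."""
    curve = [0.0] + list(map_values)
    return sum((a + b) / 2 for a, b in zip(curve, curve[1:]))


# Random baseline from Table 1: 8.7, 12.4, 15.5, 18.7, 21.9
print(aulc([8.7, 12.4, 15.5, 18.7, 21.9]))  # 66.25, cf. 66.2 in Table 1
```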
ples, Max and Avg are performing better than pas-
sive selection while the Sum metric is outperformed
Quantitative Results – Fast Exploration. Gaining marginally. When more and more samples are added
accuracy as fast as possible while minimizing the hu- (i.e., 100 to 250 samples), we observe a superior per-
man supervision is one of the main goals of active formance of the Sum aggregation. But also the two
learning. Moreover, in continuous exploration scena- other aggregation techniques are able to reach better
rios, like live camera feeds or other continuous auto- rates than mere random selection. We attribute the
matic measurements, it is assumed that new data is fast increase of performance for the Sum metric to its
always available. Hence, the pool of valuable exam- tendency to select samples with many object inside
ples will rarely be exhausted. To assess the perfor- which leads to more annotated bounding boxes. Ho-
mance of our methods in this fast exploration context, wever, the target application is a scenario where the
[Figure: model predictions initially ("Initial prediction") and after learning 50 samples ("After 50 samples").]
However, in the targeted application scenario the amount of unlabeled data is huge or new data is approaching continuously, and hence a complete evaluation by humans is infeasible. Here, we consider the number of images to be evaluated more critical than the time needed to draw single bounding boxes. Another interesting fact is the almost equal performance of Max and Avg, which can be explained as follows: the VOC dataset consists mostly of images with only one object in them. Therefore, both metrics lead to a similar score if objects are identified correctly.
We can also see that the proposed balance handling (i.e., · + w) causes slight losses in performance at very early stages, up to 100 samples. At subsequent stages, it helps to gain noticeable improvements. Especially the Sum method benefits from the sample weighting scheme. A possible explanation for this behavior would be the following: at early stages, the classifier has not seen many samples of each class and therefore suffers more from misclassification errors. Hence, the weighting scheme is not able to encourage the selection of rare class samples since the classifier decisions are still too unstable. At later stages, this problem becomes less severe and the weighting scheme is much more helpful than in the beginning. This could also explain the performance of Sum in general. Further details on learning pace are given later in a qualitative study on model evolution. Additionally, the Sum aggregation tends to select batches with many detections in them. Hence, it is natural that the improvement is most noticeable with this aggregation technique, since it helps to find batches with many rare objects in them.
Quantitative Results – All Available Samples. In our case, active learning only affects the sequence of unlabeled batches if we train until there is no new data available. Therefore, there are only very small differences between each method's results for mAP after training has completed. The small differences indicate that the chosen incremental learning technique (Käding et al., 2016b) is suitable for the faced scenario. In continuous exploration, it is usually assumed that there will be more new unlabeled data available than can be processed. Nevertheless, evaluating the long-term performance of our metrics is important to detect possible deterioration over time compared to random selection. In contrast to this, the differences in AULC arise from the improvements of the different methods during the experimental run and should therefore be considered a distinctive feature reflecting performance over the whole experiment. Having this in mind, we can still see that Sum performs best while the weighting generally leads to improvements.

Quantitative Results – Class-wise Analysis. To validate the efficacy of our sample weighting strategy as discussed in Section 3.2, it is important to measure not only overall performance, but also to look at metrics for individual classes. Fig. 4 shows the performance over time on the validation set for each individual class. For reference, we also provide the class distribution over the relevant part of the VOC dataset, indicated by the number of object instances in total as well as the number of images with at least one instance in them.

In the first row, we observe an advantage for the weighted method when looking at the performance of cow. Out of the three classes in this way of splitting, cow has the fewest instances in the dataset. The performance of tvmonitor in the second row shows a similar pattern, where it is also the class with the lowest number of object instances in the dataset. Analyzing bird and cat, the top classes by number of instances in each way of splitting, we observe only small differences in performance. Thus, we can show evidence that our balancing scheme is able to improve performance on rare classes while it does not affect performance on frequent classes.
[Figure 4: class-wise AP (%) over the number of added samples (0–500) for Sum and Sum + w (panels include sheep and boat), and the number of samples in the VOC dataset by class, counted as objects and as images.]

… on the remaining classes for the first way of splitting. Figure 2 shows those images that the three aggregation metrics consider the most valuable. Additionally, common zero-scoring images are shown. The least valuable images shown here are representative of all proposed metrics because they do not lead to any detections using the initial model. Note that there are more than seven images with zero score in the training dataset. The images shown in the figure have been selected randomly.

The most valuable images contain pristine examples of each object. They are well lit and isolated. The objects in the zero-scoring images are more noisy and hard to identify even for the human viewer, resulting in fewer reliable detections.

Qualitative Results – Model Evolution. Observing the change in model output as new data is learned can help estimate the number of samples needed to learn new classes and identify possible confusions.
REFERENCES

Kapoor, A., Grauman, K., Urtasun, R., and Darrell, T. (2010). Gaussian processes for object categorization. International Journal of Computer Vision (IJCV).

Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Kirkpatrick, J., Pascanu, R., Rabinowitz, N. C., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., Hassabis, D., Clopath, C., Kumaran, D., and Hadsell, R. (2016). Overcoming catastrophic forgetting in neural networks. arXiv preprint arXiv:1612.00796.

Kovashka, A., Russakovsky, O., Fei-Fei, L., and Grauman, K. (2016). Crowdsourcing in computer vision. Foundations and Trends in Computer Graphics and Vision.

Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017). Feature pyramid networks for object detection. In Computer Vision and Pattern Recognition (CVPR).

Liu, P., Zhang, H., and Eom, K. B. (2017). Active deep learning for classification of hyperspectral images. Selected Topics in Applied Earth Observations and Remote Sensing.

Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. (2016). SSD: Single shot multibox detector. In European Conference on Computer Vision (ECCV).

Papadopoulos, D. P., Uijlings, J. R. R., Keller, F., and Ferrari, V. (2016). We don't need no bounding-boxes: Training object class detectors using only human verification. In Computer Vision and Pattern Recognition (CVPR).

Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Computer Vision and Pattern Recognition (CVPR).

Redmon, J. and Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Computer Vision and Pattern Recognition (CVPR).

Redmon, J. and Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.

Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Neural Information Processing Systems (NIPS).

Roy, S., Namboodiri, V. P., and Biswas, A. (2016). Active learning with version spaces for object detection. arXiv preprint arXiv:1611.07285.

Settles, B. (2009). Active learning literature survey. Technical report, University of Wisconsin–Madison.

Shmelkov, K., Schmid, C., and Alahari, K. (2017). Incremental learning of object detectors without catastrophic forgetting. In International Conference on Computer Vision (ICCV).

Stark, F., Hazırbas, C., Triebel, R., and Cremers, D. (2015). Captcha recognition with active deep learning. In Workshop New Challenges in Neural Computation.

Tong, S. and Koller, D. (2001). Support vector machine active learning with applications to text classification. Journal of Machine Learning Research (JMLR).

Uijlings, J. R., Van De Sande, K. E., Gevers, T., and Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision (IJCV), 104(2):154–171.

Vijayanarasimhan, S. and Grauman, K. (2014). Large-scale live active learning: Training object detectors with crawled data and crowds. International Journal of Computer Vision (IJCV).

Wang, D. and Shang, Y. (2014). A new active labeling method for deep learning. In International Joint Conference on Neural Networks (IJCNN).

Wang, K., Zhang, D., Li, Y., Zhang, R., and Lin, L. (2016). Cost-effective active learning for deep image classification. Circuits and Systems for Video Technology.

Yao, A., Gall, J., Leistner, C., and Van Gool, L. (2012). Interactive object detection. In Computer Vision and Pattern Recognition (CVPR).