
Towards Open Set Deep Networks

Abhijit Bendale, Terrance E. Boult


University of Colorado at Colorado Springs
{abendale,tboult}@vast.uccs.edu
arXiv:1511.06233v1 [cs.CV] 19 Nov 2015

Abstract

Deep networks have produced significant gains for various visual recognition problems, leading to high impact academic and commercial applications. Recent work in deep networks highlighted that it is easy to generate images that humans would never classify as a particular object class, yet networks classify such images with high confidence as that given class – deep networks are easily fooled with images humans do not consider meaningful. The closed set nature of deep networks forces them to choose from one of the known classes, leading to such artifacts. Recognition in the real world is open set, i.e., the recognition system should reject unknown/unseen classes at test time. We present a methodology to adapt deep networks for open set recognition, by introducing a new model layer, OpenMax, which estimates the probability of an input being from an unknown class. A key element of estimating the unknown probability is adapting Meta-Recognition concepts to the activation patterns in the penultimate layer of the network. OpenMax allows rejection of "fooling" and unrelated open set images presented to the system; OpenMax greatly reduces the number of obvious errors made by a deep network. We prove that the OpenMax concept provides bounded open space risk, thereby formally providing an open set recognition solution. We evaluate the resulting open set deep networks using pre-trained networks from the Caffe Model-zoo on ImageNet 2012 validation data, and thousands of fooling and open set images. The proposed OpenMax model significantly outperforms the open set recognition accuracy of basic deep networks as well as deep networks with thresholding of SoftMax probabilities.

1 Introduction

Computer Vision datasets have grown from a few hundred images to millions of images and from a few categories to thousands of categories, thanks to research advances in vision and learning. Recent research in deep networks has significantly improved many aspects of visual recognition [26, 3, 11]. Co-evolution of rich representations, scalable classification methods and large datasets have resulted in many commercial applications [5, 28, 16, 6]. However, a wide range of operational challenges occur while deploying recognition systems in the dynamic and ever-changing real world. A vast majority of recognition systems are designed for a static closed world, where the primary assumption is that all categories are known a priori. Deep networks, like many classic machine learning tools, are designed to perform closed set recognition.

Recent work on open set recognition [20, 21] and open world recognition [1] has formalized processes for performing recognition in settings that require rejecting unknown objects during testing. While one can always train with an "other" class for uninteresting classes (known unknowns), it is impossible to train with all possible examples of unknown objects. Hence the need arises for designing visual recognition tools that formally account for the "unknown unknowns" [18]. Although a range of algorithms has been developed to address this issue [4, 20, 21, 25, 2], performing open set recognition with deep networks has remained an unsolved problem.

In the majority of deep networks [11, 26, 3], the output of the last fully-connected layer is fed to the SoftMax function, which produces a probability distribution over the N known class labels. While a deep network will always have a most-likely class, one might hope that for an unknown input all classes would have low probability and that thresholding on uncertainty would reject unknown classes. Recent papers have shown how to produce "fooling" [14] or "rubbish" [8] images that are visually far from the desired class but produce high-probability/confidence scores. This strongly suggests that thresholding on uncertainty is not sufficient to determine what is unknown. In Sec. 3, we show that extending deep networks to threshold SoftMax probability improves open set recognition somewhat, but does not resolve the issue of fooling images. Nothing in the theory/practice of deep networks, even with thresholded probabilities, satisfies the formal definition of open set recognition offered in [20]. This leads to the first question addressed in this paper: "how to adapt deep networks to support open set recognition?"
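To make the thresholding baseline concrete before it is evaluated in Sec. 3, here is a minimal sketch of the SoftMax-with-threshold approach. This is an illustrative NumPy implementation, not the authors' code, and the threshold value is an arbitrary placeholder.

```python
import numpy as np

def softmax_with_threshold(logits, threshold=0.5):
    """Baseline discussed above: compute SoftMax over the N known classes
    and reject the input as unknown when the winning probability is below
    a threshold. Returns (-1, probs) on rejection."""
    exp = np.exp(logits - np.max(logits))   # numerically stable SoftMax
    probs = exp / exp.sum()
    y_star = int(np.argmax(probs))
    return (-1, probs) if probs[y_star] < threshold else (y_star, probs)
```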
[Figure 1 image: activation-vector (AV) plots for four classes (baseball, hammerhead shark, great white shark, scuba diver), each with a model MAV, a real image, a fooling image and an open set image, plus an adversarial scuba diver constructed from a hammerhead. Example scores shown: Real: SM 0.94, OM 0.94 (baseball) and SM 0.57, OM 0.58 (hammerhead); Fooling: SM 1.0, OM 0.00 and SM 0.98, OM 0.00; Open set: SM 0.15, OM 0.17 and SM 0.25, OM 0.10; Adversarial scuba diver (from hammerhead): SM 0.32 scuba diver, OM 0.49 unknown; after blur, OM 0.79 hammerhead. Category index ranges (sharks, whales, dogs, fish, baseball) are marked along the bottom.]
Figure 1: Examples showing how an activation vector model provides sufficient information for our Meta-Recognition and OpenMax
extension of a deep network to support open-set recognition. The OpenMax algorithm measures distance between an activation vector
(AV) for an input and the model vector for the top few classes, adjusting scores and providing an estimate of probability of being unknown.
The left side shows activation vectors (AV) for different images, with different AVs separated by black lines. Each input image becomes
an AV, displayed as 10x450 color pixels, with the vertical dimension being one pixel for each of the 10 deep network channels' activation energies and the
horizontal dimension showing the response for the first 450 ImageNet classes. Ranges of various category indices (sharks, whales, dogs,
fish, etc.) are identified on the bottom of the image. For each of four classes (baseball, hammerhead shark, great white shark and scuba
diver), we show an AV for 4 types of images: the model, a real image, a fooling image and an open set image. The AVs show patterns of
activation in which, for real images, related classes are often responding together, e.g., sharks share many visual features, hence correlated
responses, with other sharks, whales, large fishes, but not with dogs or with baseballs. Visual inspection of the AVs shows significant
difference between the response patterns for fooling and open set images compared to a real image or the model AV. For example, note
the darker (deep blue) lines in many fooling images and different green patterns in many open set images. The bottom AV is from an
“adversarial” image, wherein a hammerhead image was converted, by adding nearly invisible pixel changes, into something classified as
scuba-diver. On the right are two columns showing the associated images for two of the classes. Each example shows the SoftMax (SM)
and OpenMax (OM) scores for the real image, the fooling image and the open set image that produced the AVs shown on the left. The red OM scores
imply the OM algorithm classified the image as unknown, but for completeness we show the OM probability of the baseball/hammerhead
class for which there was originally confusion. The bottom right shows the adversarial image and its associated scores – despite the
network classifying it as a scuba diver, the visual similarity to the hammerhead is clearly stronger. OpenMax rejects the adversarial image
as an outlier from the scuba diver class. As an example of recovery from failure, we note that if the image is Gaussian blurred, OpenMax
classifies it as a hammerhead shark with 0.79 OM probability.

The SoftMax layer is a significant component of the problem because of its closed nature. We propose an alternative, OpenMax, which extends the SoftMax layer by enabling it to predict an unknown class. OpenMax incorporates the likelihood of recognition system failure. This likelihood is used to estimate the probability of a given input belonging to an unknown class. For this estimation, we adapt the concept of Meta-Recognition [22, 32, 9] to deep networks. We use the scores from the penultimate layer of deep networks (the fully connected layer before SoftMax, e.g., FC8) to estimate if the input is "far" from known training data. We call the scores in that layer the activation vector (AV). This information is incorporated in our OpenMax model and used to characterize failure of the recognition system. By dropping the restriction that the probabilities for known classes sum to 1, and rejecting inputs far from known inputs, OpenMax can formally handle unknown/unseen classes during operation. Our experiments demonstrate that the proposed combination of OpenMax and Meta-Recognition ideas readily addresses open set recognition for deep networks and rejects high-confidence fooling images [14].

While fooling/rubbish images are, to human observers, clearly not from a class of interest, adversarial images [8, 27] present a more difficult challenge. These adversarial images are visually indistinguishable from a training sample but are designed so that deep networks produce high-confidence but incorrect answers. This is different from standard open space risk because adversarial images are "near" a training sample in input space, for any given output class.

A key insight in our opening of deep networks is noting that "open space risk" should be measured in feature space, rather than in pixel space. In prior work, open space risk is not measured in pixel space for the majority of problems [20, 21, 1]. Thus, we ask "is there a feature space, ideally a layer in the deep network, where these adversarial images are far away from training examples, i.e., a layer where unknown, fooling and adversarial images become outliers in an open set recognition problem?" In Sec. 2.1, we investigate the choice of the feature space/layer in deep networks for measuring open space risk. We show that an extreme-value meta-recognition inspired distance normalization process on the overall activation patterns of the penultimate network layer provides a rejection probability for OpenMax normalization for unknown images, fooling images and even for many adversarial images. In Fig. 1, we show examples of activation patterns for our model, input images, fooling images, adversarial images (that the system can reject) and open set images.

In summary, the contributions of this paper are:
1. Multi-class Meta-Recognition using Activation Vectors to estimate the probability of deep network failure
2. Formalization of open set deep networks using Meta-Recognition and OpenMax, along with a proof showing that the proposed approach manages open space risk for deep networks
3. Experimental analysis of the effectiveness of open set deep networks at rejecting unknown classes, fooling images and obvious errors from adversarial images, while maintaining accuracy on testing images

2 Open Set Deep Networks

A natural approach for opening a deep network is to apply a threshold on the output probability. We consider this as rejecting uncertain predictions, rather than rejecting unknown classes. It is expected that images from unknown classes will all have low probabilities, i.e., be very uncertain. This is true only for a small fraction of unknown inputs. Our experiments in Sec. 3 show that thresholding uncertain inputs helps, but is still a relatively weak tool for open set recognition. Scheirer et al. [20] defined open space risk as the risk associated with labeling data that is "far" from known training samples. That work provides only a general definition and does not prescribe how to measure distance, nor does it specify the space in which such distance is to be measured. In order to adapt deep networks to handle open set recognition, we must ensure they manage/minimize their open space risk and have the ability to reject unknown inputs.

Building on the concepts in [21, 1], we seek to choose a layer (feature space) in which we can build a compact abating probability model that can be thresholded to limit open space risk. We develop this model as a decaying probability model based on distance from a learned model. In the following section, we elaborate on the space and meta-recognition approach for estimating distance from known training data, followed by a methodology to incorporate such distance in the decision function of deep networks. We call our methodology OpenMax, an alternative for the SoftMax function as the final layer of the network. Finally, we show that the overall model is a compact abating probability model and, hence, it satisfies the definition for open set recognition.

2.1 Multi-class Meta-Recognition

Our first step is to determine when an input is likely not from a known class, i.e., we want to add a meta-recognition algorithm [22, 32] to analyze scores and recognize when deep networks are likely incorrect in their assessment. Prior work on meta-recognition used the final system scores, analyzed their distribution based on Extreme Value Theory (EVT) and found these distributions follow a Weibull distribution. Although one might use the per-class scores independently and consider their distribution using EVT, that would not produce a compact abating probability, because the fooling images show that the scores themselves were not from a compact space close to known input training data. Furthermore, a direct EVT fitting on the set of class post-recognition scores (SoftMax layer) is not meaningful with deep networks, because the final SoftMax layer is intentionally renormalized to follow a logistic distribution. Thus, we analyze the penultimate layer, which is generally viewed as a per-class estimation. This per-class estimation is converted by the SoftMax function into the final output probabilities.

We take the approach that the network values from the penultimate layer (hereafter the Activation Vector (AV)) are not an independent per-class score estimate, but rather provide a distribution of what classes are "related." In Sec. 2.2 we discuss an illustrative example based on Fig. 1. Our overall EVT meta-recognition algorithm is summarized in Alg. 1:

Algorithm 1 EVT Meta-Recognition Calibration for Open Set Deep Networks, with per-class Weibull fit to the η largest distances to the mean activation vector. Returns libMR models ρ_j, which include the parameter τ_j for shifting the data as well as the Weibull shape and scale parameters κ_j, λ_j.
Require: FitHigh function from libMR
Require: Activation levels in the penultimate network layer v(x) = v_1(x), ..., v_N(x)
Require: For each class j, let S_{i,j} = v_j(x_{i,j}) for each correctly classified training example x_{i,j}
1: for j = 1 ... N do
2:     Compute mean AV, µ_j = mean_i(S_{i,j})
3:     EVT Fit ρ_j = (τ_j, κ_j, λ_j) = FitHigh(‖Ŝ_j − µ_j‖, η)
4: end for
5: Return means µ_j and libMR models ρ_j
To recognize outliers using AVs, we adapt the concepts of Nearest Class Mean [29, 12] or Nearest Non-Outlier [1] and apply them per class within the activation vector, as a first approximation. While more complex models, such as nearest class multiple centroids (NCMC) [13] or NCM forests [17], could provide more accurate modeling, for simplicity this paper focuses on just using a single mean. Each class is represented as a point, a mean activation vector (MAV), with the mean computed over only the correctly classified training examples (line 2 of Alg. 1).

Given the MAV and an input image, we measure the distance between them. We could directly threshold distance, e.g., use the cross-class validation approach of [1] to determine an overall maximum distance threshold. In [1], the features were subject to metric learning to normalize them, which makes a single shared threshold viable. However, the lack of uniformity in the AV for different classes presents a greater challenge and, hence, we seek a per-class meta-recognition model. In particular, on line 3 of Alg. 1 we use the libMR [22] FitHigh function to do Weibull fitting on the largest of the distances between all correct positive training instances and the associated µ_i. This results in a parameter ρ_i, which is used to estimate the probability of an input being an outlier with respect to class i.
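The calibration can be sketched as follows. This is a minimal NumPy/SciPy illustration of Alg. 1, not the authors' released code: scipy.stats.weibull_min stands in for libMR's FitHigh (the shift parameter τ is folded in as the fitted location), and the function names and the `activations_by_class` input are illustrative assumptions.

```python
import numpy as np
from scipy.stats import weibull_min  # stand-in for libMR's FitHigh

def fit_class_evt_models(activations_by_class, tail_size=20):
    """Per-class EVT calibration (sketch of Alg. 1).

    activations_by_class: dict mapping class index j to an array of shape
      (num_correct_examples, N) of penultimate-layer activation vectors for
      training images the network classified correctly as class j.
    Returns per-class mean activation vectors and Weibull tail models.
    """
    models = {}
    for j, avs in activations_by_class.items():
        mav = avs.mean(axis=0)                      # line 2: mean AV, mu_j
        dists = np.linalg.norm(avs - mav, axis=1)   # distances to the MAV
        tail = np.sort(dists)[-tail_size:]          # eta largest distances
        # line 3: Weibull fit on the tail (shape kappa, location tau, scale lambda)
        kappa, tau, lam = weibull_min.fit(tail)
        models[j] = {"mav": mav, "shape": kappa, "loc": tau, "scale": lam}
    return models

def outlier_probability(model, av):
    """Weibull CDF of the distance to the class MAV: the probability that
    the input is an outlier with respect to this class."""
    d = np.linalg.norm(av - model["mav"])
    return weibull_min.cdf(d, model["shape"], loc=model["loc"], scale=model["scale"])
```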
Given ρ_i, a simple rejection model would be for the user to define a threshold that decides if an input should be rejected, e.g., ensuring that 90% of all training data will have probability near zero of being rejected as an outlier. While simple to implement, it is difficult to calibrate an absolute Meta-Recognition threshold because it depends on the unknown unknowns. Therefore, we choose to use this in the OpenMax algorithm described in Sec. 2, which has a continuous adjustment.

We note that our calibration process uses only correctly classified data, for which class j is rank 1. At testing, for input x, assume class j has the largest probability; then ρ_j(x) provides the MR-estimated probability that x is an outlier and should be rejected. We use one calibration for high-ranking classes (e.g., top 10), but as an extension separate calibration for different ranks is possible. Note that when there are multiple channels per example, we compute per-channel, per-class mean vectors µ_{j,c} and Weibull parameters ρ_{j,c}. It is worth remembering that the goal is not to determine the training class of the input; rather, this is a meta-recognition process used to determine if the given input is from an unknown class and hence should be rejected.

2.2 Interpretation of Activation Vectors

In this section, we present the concept of activation vectors and meta-recognition with illustrative examples based on Fig. 1.

Closed Set: Presume the input is a valid input of, say, a hammerhead shark, i.e., the second group of activation records from Fig. 1. The activation vector shows high scores for the AV dimension associated with a great white shark. All sharks share many direct visual features and many contextual visual features with other sharks, whales and large fish, which is why Fig. 1 shows multiple higher activations (bright yellow-green) for many ImageNet categories in those groups. We hypothesize that for most categories, there is a relatively consistent pattern of related activations. The MAV captures that distribution as a single point. The AVs present a space where we measure the distance from an input image in terms of the activation of each class; if it is a great white shark we also expect higher activations from, say, tiger and hammerhead sharks as well as whales, but very weak or no activations from birds or baseballs. Intuitively, this seems like the right space in which to measure the distance during training.

Open Set: First let us consider an open set image, i.e., a real image from an unknown category. These will always be mapped by the deep network to the class for which SoftMax provides the maximum response, e.g., the image of rocks in Fig. 1 is mapped to baseball and the fish on the right is mapped to a hammerhead. Sometimes open set images will have lower confidence, but the maximum score will still yield a corresponding class. Comparing the activation vector of the input with the MAV for the class for which the input produced the maximum response, we observe it is often far from the mean. However, for some open set images the response provided is close to the AV but still has an overall low activation level. This can occur if the input is an "unknown" class that is closely related to a known class, or if the object is small enough that it is not well distinguished. For example, if the input is from a different type of shark or large fish, it may provide a low activation, but the AV may not be different enough to be rejected. For this reason, it is still necessary for open set recognition to threshold uncertainty, in addition to directly estimating if a class is unknown.

Fooling Set: Consider a fooling input image, which was artificially constructed to make a particular class (e.g., baseball or hammerhead) have a high activation score and, hence, to be detected with high confidence. While the artificial construction increases the class of interest's probability, the image generation process did not simultaneously adjust the scores of all related classes, resulting in an AV that is "far" from the model AV. Examine the 3rd element of each class group in Fig. 1, which shows activations from fooling images. Many fooling images are visually quite different, and so are their activation vectors. The many regions of very low activation (dark blue/purple) are likely because one can increase the output of SoftMax for a given class by reducing the activation of other classes, which in turn reduces the denominator of the SoftMax computation.

Adversarial Set: Finally, consider an adversarial input image [8, 27, 31], which is constructed to be close to one class but is mislabeled as another. An example is shown
on the bottom right of Fig. 1. If the adversarial image is constructed toward a nearby class, e.g., from hammerhead to great white, then the approach proposed herein will fail to detect it as a problem – fine-grained category differences are not captured in the MAV. However, adversarial images can be constructed between any pair of image classes, see [27]. When the target class is far enough, e.g., the hammerhead and scuba example here, or even farther such as hammerhead and baseball, the adversarial image will have a significant difference in activation score and hence can be rejected. We do not consider adversarial images in our experiments because the outcome would be more a function of the adversarial images we chose to generate – and we know of no meaningful distribution for that. If, for example, we chose random class pairs (a, b) and generated adversarial images from a to b, most of those would have a large hierarchy distance and likely be rejected. If we chose the closest adversarial images, likely from nearby classes, the activations would be close and they would not be rejected.

The result of our OpenMax process is that open set as well as fooling or adversarial images will generally be rejected. Building a fooling or adversarial image that is not rejected means not only getting a high score for the class of interest, it means maintaining the relative scores for the 999 other classes. At a minimum, the space of adversarial/fooling images is significantly reduced by these constraints. Hopefully, any input that satisfies all the constraints is an image that also gets human support for the class label, as did some of the fooling images in Figure 3 of [14], and as one sees in adversarial image pairs of fine-grain separated categories such as bull and great white sharks.

One may wonder if a single MAV is sufficient to represent complex objects with different aspects/views. Future work should examine more complex models that can capture different views/exemplars, e.g., NCMC [13] or NCM forests [17]. If the deep network has actually achieved the goal of view-independent recognition, then the distribution of penultimate activations should be nearly view independent. While the open-jaw and side views of a shark are visually quite different, and a multi-exemplar model may be more effective in capturing the different features in different views, the open jaws of different sharks are still quite similar, as are their side views. Hence, each view may present a relatively consistent AV, allowing a single MAV to capture both. Intuitively, while image features may vary greatly with view, the relative strength of "related classes" represented by the AV should be far more view independent.

2.3 OpenMax

The standard SoftMax function is a gradient-log-normalizer of the categorical probability distribution – a primary reason that it is commonly used as the last fully connected layer of a network. The traditional definition has per-node weights in its computation. The scores in the penultimate network layer of Caffe-based deep networks [10], what we call the activation vector, have the weighting performed in the convolution that produced them. Let v(x) = v_1(x), ..., v_N(x) be the activation level for each class, y = 1, ..., N. After deep network training, an input image x yields activation vector v(x), and the SoftMax layer computes:

    P(y = j|x) = e^{v_j(x)} / Σ_{i=1}^{N} e^{v_i(x)}    (1)

where the denominator sums over all classes to ensure the probabilities over all classes sum to 1. However, in open set recognition there are unknown classes that will occur at test time and, hence, it is not appropriate to require the probabilities to sum to 1.

To adapt SoftMax for open set, let ρ be a vector of meta-recognition models for each class estimated by Alg. 1. In Alg. 2 we summarize the steps for OpenMax computation. For convenience we define the unknown unknown class to be at index 0. We use the Weibull CDF probability (line 3 of Alg. 2) on the distance between x and µ_i for the core of the rejection estimation. The model µ_i is computed using the images associated with category i, images that were classified correctly (top-1) during the training process. We expect the EVT function of distance to provide a meaningful probability only for the few top ranks. Thus, in line 3 of Alg. 2, we compute weights for the α largest activation classes and use them to scale the Weibull CDF probability. We then compute the revised activation vector with the top scores changed. We compute a pseudo-activation for the unknown unknown class, keeping the total activation level constant. Including the unknown unknown class, the new revised activation vector is used to compute the OpenMax probabilities as in Eq. 2.

Algorithm 2 OpenMax probability estimation with rejection of unknown or uncertain inputs.
Require: Activation vector v(x) = v_1(x), ..., v_N(x)
Require: means µ_j and libMR models ρ_j = (τ_j, λ_j, κ_j)
Require: α, the number of "top" classes to revise
1: Let s(i) = argsort(v_j(x)); let ω_j = 1
2: for i = 1, ..., α do
3:     ω_{s(i)}(x) = 1 − ((α−i)/α) e^{−(‖x−τ_{s(i)}‖/λ_{s(i)})^{κ_{s(i)}}}
4: end for
5: Revise activation vector v̂(x) = v(x) ◦ ω(x)
6: Define v̂_0(x) = Σ_i v_i(x)(1 − ω_i(x))
7:     P̂(y = j|x) = e^{v̂_j(x)} / Σ_{i=0}^{N} e^{v̂_i(x)}    (2)
8: Let y* = argmax_j P̂(y = j|x)
9: Reject input if y* == 0 or P̂(y = y*|x) < ε
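A minimal NumPy sketch of Alg. 2 follows, reusing the per-class models from the earlier calibration sketch. It follows the text's description of scaling the Weibull CDF of the distance to the MAV by the rank weight; scipy.stats.weibull_min again stands in for libMR, and the helper names are illustrative assumptions rather than the authors' released implementation.

```python
import numpy as np
from scipy.stats import weibull_min

def openmax(av, models, alpha=10, epsilon=0.0):
    """OpenMax recalibration (sketch of Alg. 2).

    av: activation vector v(x) of length N (penultimate-layer scores).
    models: per-class dict from fit_class_evt_models(), keyed 0..N-1.
    Returns (label, probs); label -1 means reject, and index 0 of
    `probs` is the unknown unknown class.
    """
    n = len(av)
    omega = np.ones(n)                       # line 1: omega_j = 1
    ranked = np.argsort(av)[::-1]            # classes sorted by activation
    for i, j in enumerate(ranked[:alpha]):   # revise only the top alpha classes
        m = models[j]
        d = np.linalg.norm(av - m["mav"])    # distance to the class MAV
        wscore = weibull_min.cdf(d, m["shape"], loc=m["loc"], scale=m["scale"])
        omega[j] = 1.0 - ((alpha - (i + 1)) / alpha) * wscore  # line 3

    v_hat = av * omega                       # line 5: revised activations
    v_unknown = np.sum(av * (1.0 - omega))   # line 6: pseudo-activation
    scores = np.concatenate(([v_unknown], v_hat))
    exp = np.exp(scores - scores.max())      # stable softmax over N+1 classes
    probs = exp / exp.sum()                  # Eq. 2

    y_star = int(np.argmax(probs))
    if y_star == 0 or probs[y_star] < epsilon:   # line 9: reject
        return -1, probs
    return y_star - 1, probs                 # shift back to 0-based class index
```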
OpenMax provides probabilities that support explicit rejection when the unknown unknown class (y = 0) has the largest probability. This Meta-Recognition approach is a first step toward the determination of unknown unknown classes, and our experiments show that a single MAV works reasonably well at detecting fooling images and is better than just thresholding on uncertainty. However, in any system that produces certainty estimates, thresholding on uncertainty is still a valid type of meta-recognition and should not be ignored. The final OpenMax approach thus also rejects unknown as well as uncertain inputs in line 9 of Alg. 2.

To select the hyper-parameters ε, η, and α, we can do a grid search calibration procedure using a set of training images plus a sampling of open set images, optimizing F-measure over the set. The goal here is basic calibration for overall scale/sensitivity selection, not to optimize the threshold over the space of unknown unknowns, which cannot be done experimentally.

Note that the computation of the unknown unknown class probability inherently alters all probabilities estimated. For a fixed threshold and inputs that have even a small chance of being unknown, OpenMax will reject more inputs than SoftMax. Fig. 2 shows the OpenMax and SoftMax probabilities for 100 example images, 50 training images and 50 open set images, as well as for fooling images. The more off-diagonal, the more OpenMax altered the probabilities. Threshold selection for uncertainty-based rejection (ε) would find a balance between keeping the training examples while rejecting open set examples. Fooling images were not used for threshold selection.

While not part of our experimental evaluation, note that OpenMax also provides meaningful rank ordering via its estimated probability. Thus OpenMax directly supports a top-5 class output with rejection. It is also important to note that because of the re-calibration of the activation scores v̂_i(x), OpenMax often does not produce the same rank ordering of the scores.

Figure 2: A plot of OpenMax probabilities vs SoftMax probabilities for the fooling (triangle), open set (square) and validation (circle) images for 100 categories from ImageNet 2012. The more off-diagonal a point, the more OpenMax altered the probabilities. Below the diagonal means OpenMax estimation reduced the input's probability of being in the class. For some inputs OpenMax increased the class's probability, which occurs when the leading class is partially rejected, thereby reducing its probability and increasing a second or higher ranked class. Uncertainty-based rejection threshold (ε) selection can optimize F-measure between correctly classifying the training examples while rejecting open set examples. (Fooling images are not used for threshold selection.) The number of triangles and squares below the diagonal means that uncertainty thresholding on OpenMax (vertical direction) is better than thresholding on SoftMax (horizontal direction).

2.4 OpenMax Compact Abating Property

While thresholding uncertainty does provide the ability to reject some inputs, it has not been shown to formally limit open space risk for deep networks. It should be easy to see that in terms of the activation vector, the positively labeled space for SoftMax is not restricted to be near the training space, since any increase in the maximum class score increases its probability while decreasing the probability of other classes. With a sufficient increase in the maximum directions, even large changes in other dimensions will still provide large activation for the leading class. While in theory one might say the deep network activations are bounded, the fooling images of [14] are convincing evidence that SoftMax cannot manage open space risk.

Theorem 1 (Open Set Deep Networks): A deep network extended using Meta-Recognition on activation vectors as in Alg. 2, with the SoftMax layer adapted to OpenMax, as in Eq. 2, provides an open set recognition function.

Proof. The Meta-Recognition probability (CDF of a Weibull) is a monotonically increasing function of ‖µ_i − x‖, and hence 1 − ω_i(x) is monotonically decreasing. Thus, they form the basis for a compact abating probability as defined in [21]. Since the OpenMax transformation is a weighted monotonic transformation of the Meta-Recognition probability, applying Theorems 1 and 2 of [1] yields that thresholding the OpenMax probability of the unknown manages open space risk as measured in the AV feature space. Thus it is an open set recognition function.

3 Experimental Analysis

In this section, we present experiments carried out in order to evaluate the effectiveness of the proposed OpenMax approach for open set recognition tasks with deep neural networks. Our evaluation is based on the ImageNet Large Scale Visual Recognition Competition (ILSVRC) 2012 dataset with 1K visual categories. The dataset contains around 1.3M images for training (with approximately 1K to 1.3K images per category), 50K images for validation and 150K
images for testing. Since test labels for ILSVRC 2012 are not publicly available, like others have done we report performance on the validation set [11, 14, 23]. We use a pre-trained AlexNet (BVLC AlexNet) deep neural network provided by the Caffe software package [10]. BVLC AlexNet is reported to obtain approximately 57.1% top-1 accuracy on the ILSVRC 2012 validation set. The choice of the pre-trained BVLC AlexNet is deliberate, since it is open source and one of the most widely used packages available for deep learning.

To ensure proper open set evaluation, we apply a test protocol similar to the ones presented in [21, 1]. During the testing phase, we test the system with all the 1000 categories from the ILSVRC 2012 validation set, fooling categories and previously unseen categories. The previously unseen categories are selected from ILSVRC 2010. It has been noted by Russakovsky et al. [19] that approximately 360 categories from ILSVRC 2010 were discarded and not used in ILSVRC 2012. Images from these 360 categories serve as the open set images, i.e., unseen or unknown categories.

Fooling images are generally totally unrecognizable to humans as belonging to the given category, but deep networks report with near certainty that they are from the specified category. We use fooling images provided by Nguyen et al. [14] that were generated by an evolutionary algorithm or by gradient ascent in pixel space. The final test set consists of 50K closed set images from ILSVRC 2012, 15K open set images (from the 360 distinct categories from ILSVRC 2010) and 15K fooling images (with 15 images each per ILSVRC 2012 category).

Figure 3: OpenMax and SoftMax-w/threshold performance shown as F-measure as a function of threshold on output probabilities. The test uses 80,000 images, with 50,000 validation images from ILSVRC 2012, 15,000 fooling images and 15,000 "unknown" images drawn from ILSVRC 2010 categories not used in 2012. The base deep network performance would be the same as threshold 0 of SoftMax-w/threshold. OpenMax performance gain is nearly 4.3% improvement in accuracy over SoftMax with optimal threshold, and 12.3% over the base deep network. Putting that in context, over the test set OpenMax correctly classified 3450 more images than SoftMax and 9847 more than the base deep network.

Training Phase: As discussed previously (Alg. 1), we consider the penultimate layer (fully connected layer 8, i.e., FC8) for computation of mean activation vectors (MAV). The MAV is computed for each class by considering the training examples that deep network training classified correctly for the respective class. The MAV is computed for each crop/channel separately. The distance between each correctly classified training example and the MAV for the particular class is computed to obtain a class-specific distance distribution. For these experiments we use a distance that is a weighted combination of normalized Euclidean and cosine distances. Supplemental material shows results with pure Euclidean and other measures that overall perform similarly. Parameters of the Weibull distribution are estimated on these distances. This process is repeated for each of the 1000 classes in ILSVRC 2012. The exact length of the tail size for estimating parameters of the Weibull distribution is obtained during a parameter estimation phase over a small set of hold-out data. This process is repeated multiple times to obtain an overall tail size of 20.

Testing Phase: During testing, each test image goes through the OpenMax score calibration process as discussed previously in Alg. 2. The activation vectors are the values in the FC8 layer for a test image, which consist of 1000x10 dimensional values corresponding to each class and each channel. For each channel in each class, the input is compared using a per-class MAV and per-class Weibull parameters. During testing, the distance with respect to the MAV is computed and revised OpenMax activations are obtained, including the new unknown class (see lines 5&6 of Alg. 2). The OpenMax probability is computed per channel, using the revised activations (Eq. 2), yielding an output of 1001x10 probabilities. For each class, the average over the 10 channels gives the overall OpenMax probability. Finally, the class with the maximum over the 1001 probabilities is the predicted class. This maximum probability is then subject to the uncertainty threshold (line 9). In this work we focus on strict top-1 predictions.
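A sketch of the per-channel testing procedure just described, assuming the `openmax` function from the earlier sketch and per-channel model dictionaries; the data layout is an illustrative assumption:

```python
import numpy as np

def predict_image(channel_avs, channel_models, alpha=10, epsilon=0.0):
    """channel_avs: list of 10 FC8 activation vectors (one per crop/channel).
    channel_models: matching list of per-class EVT model dicts.
    Averages the 10 per-channel OpenMax outputs into 1001 probabilities."""
    probs = np.mean(
        [openmax(av, models, alpha)[1]             # 1001-dim probs per channel
         for av, models in zip(channel_avs, channel_models)], axis=0)
    y_star = int(np.argmax(probs))                 # max over 1001 classes
    if y_star == 0 or probs[y_star] < epsilon:     # uncertainty threshold
        return -1, probs                           # reject as unknown
    return y_star - 1, probs
```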
distances. Supplemental material shows results with pure Evaluation: ILSVRC 2012 is a large scale multi-class
Euclidean and other measures that overall perform simi- classification problem and top-1 or top-5 accuracy is used
larly. Parameters of Weibull distribution are estimated on to measure the effectiveness of a classification algorithm
these distances. This process is repeated for each of the [19]. Multi-class classification error for a closed set system
1000 classes in ILSVRC 2012. The exact length of tail size can be computed by keeping track of incorrect classifica-
for estimating parameters of Weibull distribution is obtained tions. For open set testing the evaluation must keep track of
during parameter estimation phase over a small set of hold the errors that occur due to standard multi-class classifica-
out data. This process is repeated multiple times to obtain tion over known categories as well as errors between known
an overall tail size of 20. and unknown categories. As suggested in [25, 20] we use
Testing Phase: During testing, each test image goes F-measure to evaluate open set performance. For open
through the OpenMax score calibration process as dis- set recognition testing, F-measure is better than accuracy
cussed previously in Alg. 2. The activation vectors are because it is not inflated by true negatives.
Figure 4: Performance of OpenMax and SoftMax as detectors for fooling images and for open set test images. F-measure is computed for varying thresholds on OpenMax and SoftMax probability values. The proposed OpenMax approach performs very well at rejecting fooling images during the prediction phase.

[Figure 5 image: AV plots for the original image, the agama MAV, the jeep MAV, the crop 1 AV and the crop 2 AV, with the lizard and jeep category ranges marked.]

Figure 5: OpenMax also predicts failure during training, as in this example. The official class is agama, but the MAV for agama is rejected for this input, and the highest scoring class is jeep with probability 0.26. However, cropping out image regions can find a window where the agama is well detected and another where the jeep is detected. Crop 1 is the jeep region, crop 2 is the agama, and the crops' AVs clearly match the appropriate models and are accepted with probability 0.32 and 0.21 respectively.

For a given threshold on OpenMax/SoftMax probability values, we compute true positives, false positives and false negatives over the entire dataset. For example, when testing the system with images from the validation set, fooling set and open set (see Fig. 3), true positives are defined as the correct classifications on the validation set, false positives are incorrect classifications on the validation set, and false negatives are images from the fooling set and open set categories that the system incorrectly classified as known examples. Fig. 3 shows the performance of OpenMax and SoftMax for varying thresholds. Our experiments show that the proposed OpenMax approach consistently obtains higher F-measure on open set testing.
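The F-measure computation described above can be sketched as follows; the label conventions (−1 for unknowns and rejections) are illustrative assumptions, and treating rejected validation images as neither TP nor FP is one plausible reading of the protocol:

```python
def open_set_f_measure(predictions, labels):
    """predictions/labels: parallel lists. The true label is the class index
    for validation images and -1 for open set or fooling images; a prediction
    of -1 means the system rejected the input as unknown."""
    tp = sum(1 for p, t in zip(predictions, labels) if t >= 0 and p == t)
    fp = sum(1 for p, t in zip(predictions, labels) if t >= 0 and p not in (t, -1))
    # unknowns (open set or fooling) accepted as some known class
    fn = sum(1 for p, t in zip(predictions, labels) if t < 0 and p != -1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```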
4 Discussion

We have seen that with our OpenMax architecture, we can automatically reject many unknown open set and fooling images, as well as rejecting some adversarial images, while having only a modest impact on the true classification rate. One of the obvious questions when using Meta-Recognition is "what do we do with rejected inputs?" While that is best left up to the operational system designer, there are multiple possibilities. OpenMax can be treated as a novelty detector in the open world recognition scenario presented in [1], after which humans label the data and the system incrementally learns new categories. Or the detection can be used as a flag to bring in other modalities [24, 7].

A second approach, especially helpful with adversarial or noisy images, is to try to remove small noise that might have led to the misclassification. For example, the bottom right of Fig. 1 showed an adversarial image wherein a hammerhead shark image with noise was incorrectly classified by the base deep network as a scuba diver. OpenMax rejects the input, but with a small amount of simple Gaussian blur, the image can be reprocessed and is accepted as a hammerhead shark with probability 0.79.

We used non-test data for parameter tuning, and for brevity only showed performance variation with respect to the uncertainty threshold shared by both SoftMax with threshold and OpenMax. The supplemental material shows variation of a wider range of OpenMax parameters, e.g., one can increase open set and fooling rejection capability at the expense of rejecting more of the true classes. In future work, such increase in true class rejection might be mitigated by increasing the expressiveness of the AV model, e.g., moving to multiple MAVs per class. This might allow it to better capture different contexts for the same object, e.g., a baseball on a desk has a different context and, hence, may have different "related" classes in the AV than, say, a baseball being thrown by a pitcher.

Interestingly, we have observed that the OpenMax rejection process often identifies/rejects the ImageNet images that the deep network incorrectly classified, especially images with multiple objects. Similarly, many samples that are far away from training data have multiple objects in the scene. Thus, other uses of the OpenMax rejection can be to improve the training process and to aid in developing better localization techniques [30, 15]. See Fig. 5 for an example.
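The blur-and-retry recovery idea described above can be sketched as follows; the `extract_av` callback (mapping an image to its FC8 activation vector) and the blur sigma are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def classify_with_blur_retry(image, extract_av, models,
                             alpha=10, epsilon=0.0, sigma=2.0):
    """If OpenMax rejects the input, reprocess a Gaussian-blurred copy in
    an attempt to remove small adversarial noise, then classify again."""
    label, probs = openmax(extract_av(image), models, alpha, epsilon)
    if label == -1:  # rejected as unknown/uncertain: retry on blurred image
        blurred = gaussian_filter(image, sigma=(sigma, sigma, 0))  # blur H,W only
        label, probs = openmax(extract_av(blurred), models, alpha, epsilon)
    return label, probs
```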
5 Towards Open Set Deep Networks: Supplemental

In this supplement, we provide additional material to further the reader's understanding of the work on Open Set Deep Networks, Mean Activation Vectors, Open Set Recognition and the OpenMax algorithm. We present additional experiments on the ILSVRC 2012 dataset. First we present experiments to illustrate the performance of OpenMax for various parameters of the EVT calibration (Alg. 1, main paper), followed by the sensitivity of OpenMax to the total number of "top classes" (i.e., α in Alg. 2, main paper) considered for recalibrating SoftMax scores. We then present the different distance measures, namely euclidean and cosine distance, used for EVT calibration. We then illustrate the working of OpenMax with qualitative examples from the open set evaluation performed during the testing phase. Finally, we illustrate the distribution of Mean Activation Vectors with a class confusion map.

6 Parameters for OpenMax Calibration

6.1 Tail Sizes for EVT Calibration

In this section we present an extended analysis of the effect of the tail sizes used for EVT fitting in Alg. 1 of the main paper on the performance of the proposed OpenMax algorithm. We tried multiple tail sizes for estimating the parameters of the Weibull distribution (line 3, Alg. 1, main paper). We found that as the tail size increased, the OpenMax algorithm became very robust at rejecting images from the open set and fooling set. OpenMax continued to perform much better than SoftMax in this setting. The results of these experiments are presented in Fig. 6. However, beyond tail size 20, we saw a performance drop on the validation set. This phenomenon can be seen in Fig. 7, since the F-measure obtained with OpenMax starts to drop beyond tail size 20. Thus, there is an optimal balance to be maintained between rejecting images from the open set and fooling set, while maintaining the correct classification rate on the validation set of ILSVRC 2012.

6.2 Top Classes to be Considered for Revision (α)

In Alg. 2 of the main paper, we present a methodology to calibrate FC8 scores via OpenMax. In this process, we also incorporate a process to adjust class probabilities as well as estimating the probability for the unknown unknown class. For this purpose, in Alg. 2 (main paper), we consider "top" classes to revise (line 2, Alg. 2, main paper), which is controlled by the parameter α. We call this parameter the α rank, where the value of α gives the total number of "top" classes to revise. In our experiments we found that optimal performance is obtained when α = 10. At lower values of α we see a drop in F-measure performance. If we continue to increase α values beyond 10, we see almost no gain in F-measure performance or fooling/open set detection accuracy. The most likely reason for this lack of change in performance beyond α = 10 is that lower ranked classes have very small FC8 activations and do not provide any significant change in OpenMax probability. The results for varying values of α are presented in Figs. 8 and 9.

6.3 Distance Measures

We tried different distance measures to compute distances between the Mean Activation Vectors and the Activation Vector of an incoming test image. We tried cosine distance, euclidean distance and euclidean-cosine distance. Cosine distance and euclidean distance compared marginally worse to euclidean-cosine distance. Cosine distance does not provide for a compact abating property, hence it may not restrict open space for points that have a small degree of separation in terms of angle but are still far away in terms of euclidean distance. Euclidean-cosine distance finds the closest points in a hyper-cone, thus restricting open space and finding the closest points to the Mean Activation Vector. Euclidean distance and euclidean-cosine distance performed very similarly. In Fig. 10 we show the effect of the different distances on overall performance. We see that OpenMax still performs better than SoftMax, and euclidean-cosine distance performs the best of those tested.
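A sketch of the combined distance is below; the specific weights and normalization are assumptions, since the paper describes only "a weighted combination of normalized Euclidean and cosine distances" without giving them:

```python
import numpy as np

def euclidean_cosine_distance(av, mav, w_euc=1.0, w_cos=1.0):
    """Weighted combination of normalized Euclidean and cosine distances
    between an activation vector and a mean activation vector. The weights
    and the Euclidean normalization constant are illustrative assumptions."""
    euc = np.linalg.norm(av - mav) / np.linalg.norm(mav)   # normalized Euclidean
    cos = 1.0 - np.dot(av, mav) / (np.linalg.norm(av) * np.linalg.norm(mav))
    return w_euc * euc + w_cos * cos
```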
7 Qualitative Examples

It is often useful to look at qualitative examples of success and failure. Figs. 11 and 12 show examples where OpenMax failed to detect open set examples. Some of these were from classes in ILSVRC 2010 that were close but not identical to classes in ILSVRC 2012. Other examples are objects from distinct ILSVRC 2010 classes that were visually very similar to a particular object class in ILSVRC 2012. Finally, we show an example where OpenMax processed an ILSVRC 2012 validation image but reduced its probability; thus Caffe with SoftMax provides the correct answer, but OpenMax gets this example wrong.

8 Confusion Map of Mean Activation Vectors

Detection/rejection of unknown classes depends on the distance to the mean activation vector (MAV) of the highest scoring FC8 classes. Note this is different from finding the distance from the input to the closest MAV. We still find that for unknown classes that are only fine-grained variants of known classes, the system will not likely reject them. Similarly, for adversarial images, if an image is adversarially modified toward a "near-by" class, it is much less likely that OpenMax will reject/detect it. Thus, it is useful to consider the confusion between existing classes.
Figure 6: The graphs show fooling detection accuracy and open set detection accuracy for varying tail sizes of the EVT fitting (tail sizes 10, 20 (optimal), 25, 30, 40 and 50). The graphs plot accuracy vs varying uncertainty threshold values, with a different tail size in each graph. We observe that OpenMax consistently performs better than SoftMax for varying tail sizes. However, while increasing tail size increases OpenMax rejections for open set and fooling images, it also increases rejection of true images, thereby reducing accuracy on the validation set as well, see Fig. 7. These types of accuracy plots are often problematic for open set testing, which is why in Fig. 7 we use F-measure to better balance rejection and true acceptance. In the main paper, a tail size of 20 was used for all the experiments.

9 Comparison with the 1-vs-Set Algorithm

The main paper focused on direct extensions within deep networks. While we consider it tangential, reviewers might worry that applying other models, e.g., a linear 1-vs-set open set algorithm [20], to the FC8 data would provide better results. For completeness we ran these experiments. We used liblinear to train a linear SVM on the training samples from the 1000 classes. We also trained a 1-vs-set machine using the liblinear extension cited in [1], refining it on the training data for the 1000 classes. The 1-vs-set algorithm achieves an overall F-measure of only 0.407, which is much lower than the 0.595 of the OpenMax approach.
Figure 7: The graphs show the F-measure performance of OpenMax and SoftMax with open set testing (using validation, fooling and open set images for testing), for tail sizes 10, 20 (optimal), 25, 30, 40 and 50. Each graph shows F-measure plotted against varying uncertainty threshold values; tail size varies across plots. OpenMax reaches its optimal performance at tail size 20. For tail sizes larger than 20, though OpenMax becomes good at rejecting images from the fooling set and open set (Fig. 6), it also rejects true images, thus reducing accuracy on the validation set. Hence, we choose tail size 20 for our experiments in the main paper.
Figure 8: Performance of OpenMax and SoftMax as the number of top classes considered for recalibration is changed (tail size 20; alpha rank 5 vs alpha rank 10 (optimal)). In our experiments, we found the best performance when the top 10 classes (i.e., α = 10) were considered for recalibration.

Figure 9: Fooling detection and open set detection accuracy for varying alpha sizes (tail size 20; alpha rank 5 vs alpha rank 10 (optimal)). In our experiments, an alpha rank of 10 yielded the best results. Increasing the alpha value beyond 10 did not result in any performance gains.
Figure 10: Performance of OpenMax and SoftMax for different types of distance measures (cosine distance, Euclidean distance, and Euclidean-cosine distance, each with tail size 20 and alpha rank 10; note the scale difference in the Euclidean-cosine plot). We found the performance trend to be similar, with euclidean-cosine distance performing best.

Figure 11: Left is an image from ILSVRC 2010, "subway train", n04349306. OpenMax and SoftMax both classify it as n04335435 instead of "unknown": OpenMax predicts that the image on the left belongs to category "n04335435: streetcar, tram, tramcar, trolley, trolley car" from ILSVRC 2012 with an output probability of 0.6391 (Caffe probability 0.5225). Right is an example image from ILSVRC 2012, "streetcar, tram, tramcar, trolley, trolley car", n04335435. It is easy to see such mistakes are bound to happen, since open set classes from ILSVRC 2010 may have many related categories with different names that are semantically or visually very similar. This is why fooling rejection is much stronger than open set rejection.

(a) A validation example of OpenMax failure: SoftMax labels it correctly as n03977966 (police van/police wagon) with probability 0.6463, while OpenMax incorrectly labels it n02701002 (ambulance) with probability 0.4507. (b) Another validation failure example, where SoftMax classifies it as n13037406 with probability 0.9991, while OpenMax rejects it as unknown. n13037406 is a gyromitra, which is a genus of mushroom.

Figure 12: Examples of validation image misclassification by the OpenMax algorithm.

Figure 13: Confusion matrix of distances between the Mean Activation Vector (MAV) for each class in ILSVRC 2012 and the MAV of every other class. Lower distance values indicate that the MAVs for the respective classes are very close to each other, and higher distance values indicate classes that are far apart. The majority of misclassifications for OpenMax happen in fine-grained categorization, which is to be expected.

References

[1] A. Bendale and T. E. Boult. Towards open world recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1893–1902, June 2015.
[2] P. Bodesheim, A. Freytag, E. Rodner, and J. Denzler. Local novelty detection in multi-class recognition problems. In Winter Conference on Applications of Computer Vision (WACV), 2015 IEEE Conference on. IEEE, 2015.
[3] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In Proceedings of the British Machine Vision Conference (BMVC), 2014.
[4] Q. Da, Y. Yu, and Z.-H. Zhou. Learning with augmented class by exploiting unlabeled data. In AAAI Conference on Artificial Intelligence. AAAI, 2014.
[5] T. Dean, M. A. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, and J. Yagnik. Fast, accurate detection of 100,000 object classes on a single machine. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 1814–1821. IEEE, 2013.
[6] H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollar, J. Gao, X. He, M. Mitchell, P. John, L. Zitnick, and G. Zweig. From captions to visual concepts and back. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2015.
[7] A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, T. Mikolov, et al. Devise: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems, pages 2121–2129, 2013.
[8] I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations. Computational and Biological Learning Society, 2015.
[9] N. Jammalamadaka, A. Zisserman, M. Eichner, V. Ferrari, and C. Jawahar. Has my algorithm succeeded? An evaluator for human pose estimators. In Computer Vision – ECCV 2012. Springer, 2014.
[10] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 1097–1105, 2012.
[12] T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka. Metric learning for large scale image classification: Generalizing to new classes at near-zero cost. In ECCV, 2012.
[13] T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka. Distance-based image classification: Generalizing to new classes at near-zero cost. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(11):2624–2637, 2013.
[14] A. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. IEEE, 2015.
[15] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Is object localization for free? Weakly-supervised learning with convolutional neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[16] V. Ordonez, V. Jagadeesh, W. Di, A. Bhardwaj, and R. Piramuthu. Furniture-geek: Understanding fine-grained furniture attributes from freely associated text and tags. In Applications of Computer Vision (WACV), 2014 IEEE Winter Conference on, pages 317–324. IEEE, 2014.
[17] M. Ristin, M. Guillaumin, J. Gall, and L. Van Gool. Incremental learning of NCM forests for large scale image classification. In CVPR, 2014.
[18] D. Rumsfeld. Known and Unknown: A Memoir. Penguin, 2011.
[19] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), pages 1–42, April 2015.
[20] W. J. Scheirer, A. de Rezende Rocha, A. Sapkota, and T. E. Boult. Toward open set recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(7):1757–1772, 2013.
[21] W. J. Scheirer, L. P. Jain, and T. E. Boult. Probability models for open set recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 36(11):2317–2324, 2014.
[22] W. J. Scheirer, A. Rocha, R. J. Micheals, and T. E. Boult. Meta-recognition: The theory and practice of recognition score analysis. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(8):1689–1695, 2011. libMR code at http://metarecognition.com.
[23] K. Simonyan and A. Zisserman. Very deep convolutional networks for large scale image recognition. In International Conference on Learning Representations. Computational and Biological Learning Society, 2015.
[24] R. Socher, M. Ganjoo, C. D. Manning, and A. Ng. Zero-shot learning through cross-modal transfer. In Advances in Neural Information Processing Systems, pages 935–943, 2013.
[25] R. Socher, C. D. Manning, and A. Y. Ng. Learning continuous phrase representations and syntactic parsing with recursive neural networks. In Proceedings of the NIPS-2010 Deep Learning and Unsupervised Feature Learning Workshop, pages 1–9, 2010.
[26] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. IEEE, 2015.
[27] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations. Computational and Biological Learning Society, 2014.
[28] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 1701–1708. IEEE, 2014.
[29] R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. Diagnosis of multiple cancer types by shrunken centroids of gene expression. In Proceedings of the National Academy of Sciences. NAS, 2002.
[30] A. Vezhnevets and V. Ferrari. Object localization in ImageNet by looking out of the window. In Proceedings of the British Machine Vision Conference (BMVC), 2015.
[31] J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson. Understanding neural networks through deep visualization. In International Conference on Machine Learning, Workshop on Deep Learning, 2015.
[32] P. Zhang, J. Wang, A. Farhadi, M. Hebert, and D. Parikh. Predicting failures of vision systems. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014.
