
Understanding LUPI

(Learning using Privileged Information)


Ahmadreza Momeni, Kedar Tatwawadi
Stanford University,
Stanford, US
{amomenis,kedart}@stanford.edu

I. INTRODUCTION

The idea of using privileged information was first suggested by V. Vapnik and A. Vashist in [1], in which they tried to capture the essence of teacher-student learning, which is very effective in the case of human learning. More specifically, when a human is learning a novel notion, he exploits his teacher's comments, explanations, and examples to facilitate the learning procedure. Vapnik proposed the following framework: assume that we want to build a decision rule for determining some labels y based on some features X, but in the training stage, in addition to X, we are also provided with some additional information, denoted the "privileged information" $x^*$, which is not present in the testing stage.

In such a scenario, how can we utilize $X^*$ to improve the learning? In this project report, we try to understand the framework of LUPI through a variety of experiments. We also propose a new algorithm based on privileged information for neural networks, built on the intuition obtained from the experiments.

A. LUPI Framework

We first briefly describe the mathematical framework of LUPI. In classical binary classification problems, we are given $m$ pairs $(x_i, y_i)$, $i = 1, \dots, m$, where $x_i \in \mathcal{X}$, $y_i \in \{-1, +1\}$, and each pair is independently generated by some underlying distribution $P_{XY}$, which is unknown. The goal is to find a function $f : \mathcal{X} \to \{-1, +1\}$ in the function class $\mathcal{F}$ that assigns the labels with the lowest possible error averaged over the unknown distribution $P_{XY}$.

In the LUPI framework, the model is slightly different, as we are provided with triplets $(x_i, x_i^*, y_i)$, $i = 1, \dots, m$, where $x_i \in \mathcal{X}$, $x_i^* \in \mathcal{X}^*$, $y_i \in \{-1, +1\}$, with each triplet independently generated by some underlying distribution $P_{XX^*Y}$, which is again unknown. However, the goal is the same as before: we still aim to find a function $f : \mathcal{X} \to \{-1, +1\}$ in the function class $\mathcal{F}$ that assigns the labels with the lowest error possible.

The important question which Vapnik asks is: can the generalization performance be improved using the privileged information? Vapnik showed that this is true in the case of the SVM. We next briefly describe the SVM and the SVM+ LUPI-based framework proposed by Vapnik.

B. SVM and SVM+

We briefly describe the SVM and SVM+ methods for classification, which in this case amounts to finding some $\omega \in \mathcal{X}$ and $b \in \mathbb{R}$ to build the following predictor:

$$f(x) = \mathrm{sgn}\left[\langle \omega, x \rangle + b\right].$$

1) SVM: The SVM learning method (non-separable SVM) finds $\omega$ and $b$ by solving the following optimization problem:

$$\min_{\omega, b, \xi} \; \frac{1}{2}\langle \omega, \omega \rangle + C \sum_{i=1}^{m} \xi_i$$
$$\text{s.t.} \quad y_i\left[\langle \omega, x_i \rangle + b\right] \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \dots, m.$$

As a short remark, we should mention that $C$ is a parameter that needs tuning. In addition, if the slacks $\xi_i$ are all equal to zero, then we call the set of given examples separable; otherwise they are non-separable.

2) SVM+: In order to take into account the privileged information $X^*$, Vapnik modified the SVM formulation as follows:

$$\min_{\omega, b, \omega^*, b^*} \; \frac{1}{2}\left[\langle \omega, \omega \rangle + \gamma \langle \omega^*, \omega^* \rangle\right] + C \sum_{i=1}^{m} \left[\langle \omega^*, x_i^* \rangle + b^*\right]$$
$$\text{s.t.} \quad y_i\left[\langle \omega, x_i \rangle + b\right] \geq 1 - \left[\langle \omega^*, x_i^* \rangle + b^*\right], \quad i = 1, \dots, m,$$
$$\left[\langle \omega^*, x_i^* \rangle + b^*\right] \geq 0, \quad i = 1, \dots, m,$$

where $\omega^* \in \mathcal{X}^*$ and $b^* \in \mathbb{R}$. In this problem, $C$ and $\gamma$ are hyperparameters to be tuned.

Intuitively, we can think of the terms $\langle \omega^*, x_i^* \rangle + b^*$ as estimators for the slacks $\xi_i$ in the previous optimization problem. The reduced freedom and the better prediction of the slacks using the privileged information improve the learning. Another intuition is that, in some sense, the margins $\langle \omega^*, x_i^* \rangle + b^*$ capture the difficulty of the training examples in the privileged space. This difficulty information is then used to relax or tighten the SVM constraints to improve the learning.
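To make the SVM+ program above concrete, here is a minimal sketch that solves the linear SVM+ primal with the cvxpy modeling library. The helper name `svm_plus` and its defaults are our own illustrative choices, not the solver used in the report:

```python
import cvxpy as cp

def svm_plus(X, X_star, y, C=1.0, gamma=1.0):
    # X: (m, d) normal features; X_star: (m, d*) privileged features;
    # y: (m,) labels in {-1, +1}.
    w, b = cp.Variable(X.shape[1]), cp.Variable()
    w_star, b_star = cp.Variable(X_star.shape[1]), cp.Variable()

    # The slacks are estimated in the privileged space: <w*, x*_i> + b*.
    slack = X_star @ w_star + b_star

    objective = cp.Minimize(
        0.5 * (cp.sum_squares(w) + gamma * cp.sum_squares(w_star))
        + C * cp.sum(slack))
    constraints = [cp.multiply(y, X @ w + b) >= 1 - slack,  # relaxed margins
                   slack >= 0]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value  # test-time prediction: sgn(<w, x> + b)
```

Because the slacks are tied to a linear function of the privileged features rather than being free variables, the model has less freedom than with the standard SVM slacks, which is exactly the regularizing effect described above.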
We next describe some methodologies which use this intuition, relating the difficulty of examples to the construction of LUPI-based frameworks.

C. Weighted SVM and Margin Transfer SVMs

One way in which privileged information influences learning is by differentiating the easy examples from the really difficult ones. This understanding was later formalized in [2], where the authors argue that if the weights are chosen appropriately, then a Weighted SVM can always outperform SVM+. In weighted SVMs, the example weights themselves convey the difficulty/importance of the examples. Although [2] proved that weighted SVMs are better than SVM+, the difficulty arises from the fact that the weights are unknown. In some cases, though, there are heuristics to guess the weights which work quite well, and subject knowledge can often be utilized for this purpose.

We next describe a heuristic proposed in [3] to find the weights and solve a Weighted SVM (WSVM) problem.
1) Margin Transfer SVM: One way to exploit privileged information is proposed in [3], where the authors suggest first solving a classification problem using only the privileged information $x^*$, obtaining a classifier $f^*$ (note that there is no requirement for $f^*$ to be of the form $\langle \omega^*, x^* \rangle + b^*$). We then store the margins $\rho_i := y_i \cdot f^*(x_i^*)$. For our purpose, we put some threshold $\epsilon$ on the margins and define $\hat\rho_i := \max\{\rho_i, \epsilon\}$. Now we are equipped to solve the following optimization problem:

$$\min_{\omega, b, \xi} \; \frac{1}{2}\langle \omega, \omega \rangle + C \sum_{i=1}^{m} \hat\rho_i \, \xi_i$$
$$\text{s.t.} \quad y_i\left[\langle \omega, x_i \rangle + b\right] \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \dots, m.$$

Intuitively, the margins $\rho_i$ determine how difficult an example is. In the extreme case, if an example is too difficult ($\rho_i < 0$), its weight $\hat\rho_i$ is clipped to the small threshold $\epsilon$, effectively zero, which means that we are essentially eliminating that example from the training stage. This is similar to the human learning procedure: if an example is too hard, the teacher does not use it, because it would make the student diverge from learning the main subject and waste time on useless points.
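A minimal sketch of this recipe with scikit-learn follows; here the privileged-space classifier $f^*$ is a linear SVM for simplicity (any classifier with real-valued scores would do), and the clipped margins enter the weighted SVM through the standard `sample_weight` argument, which scales each example's slack penalty as in the $C\,\hat\rho_i\,\xi_i$ term above:

```python
import numpy as np
from sklearn.svm import SVC, LinearSVC

def margin_transfer_svm(X, X_star, y, C=1.0, eps=1e-3):
    # Step 1: learn f* from the privileged information alone.
    f_star = LinearSVC(C=C).fit(X_star, y)

    # Step 2: margins rho_i = y_i * f*(x*_i), clipped at the threshold eps
    # so that overly difficult examples (rho_i < 0) get near-zero weight.
    rho = y * f_star.decision_function(X_star)
    rho_hat = np.maximum(rho, eps)

    # Step 3: weighted SVM on the normal features; sample_weight scales
    # the per-example slack penalty, realizing the C * rho_hat_i * xi_i term.
    return SVC(kernel="linear", C=C).fit(X, y, sample_weight=rho_hat)
```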
We next describe the various experiments which we conducted to understand LUPI.

II. EXPERIMENTS

A. SVM+ v.s. SVM

The first experiment that we conducted was to compare the performance of SVM+ and SVM. We used the following datasets:

TABLE I
DATASETS

Data         Test set size   d^a   d*^b
Ionosphere   201             7     6
Ring         7250            10    10
Wine age     108             4     5

^a The number of normal features. ^b The number of privileged features.

In each of the above datasets, we chose some features as normal ones and some as privileged, and then trained the classifiers and computed the error on the corresponding test set. For all of the datasets, we used a linear kernel. The resulting error curves (test error vs. training set size) are shown in Figs. 1-3.

Fig. 1. SVM+ v.s. SVM: Ionosphere data
Fig. 2. SVM+ v.s. SVM: Ring data
Fig. 3. SVM+ v.s. SVM: Wine Age data
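For reference, the comparison loop behind these curves is conceptually as simple as the following sketch, which reuses the hypothetical `svm_plus` helper sketched in Section I-B; dataset loading and the hyperparameter tuning discussed below are elided:

```python
import numpy as np
from sklearn.svm import SVC

def error_vs_train_size(X, X_star, y, X_test, y_test, sizes):
    errors = []
    for m in sizes:
        # The baseline SVM sees only the normal features.
        svm = SVC(kernel="linear").fit(X[:m], y[:m])
        err_svm = np.mean(svm.predict(X_test) != y_test)
        # SVM+ additionally sees the privileged features, but only at
        # training time; prediction still uses the normal features.
        w, b = svm_plus(X[:m], X_star[:m], y[:m])
        err_plus = np.mean(np.sign(X_test @ w + b) != y_test)
        errors.append((m, err_svm, err_plus))
    return errors
```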
As a brief remark, we note that not only does SVM+ converge faster, but surprisingly, in some cases it converges to a better answer, which is observed very distinctly in the Ring experiment. We also observed that SVM+ needs a different solver than SVM and is quite sensitive to the hyperparameters, which makes it very difficult to get working on complex datasets.

B. Manually Weighted SVM v.s. SVM+

The second experiment that we conducted aimed to evaluate the performance of a Manually Weighted SVM. The aim of the experiment was to confirm the intuition that knowing the difficulty of examples helps in improving the learning. Thus, we considered the ease/difficulty of the training set itself as the privileged information. We used the following datasets:

TABLE II
DATASETS

Data       Test set size   d^a
Abalone    3178            7
Wine age   118             7

^a The number of normal features.
The difficulty levels are determined as follows (a minimal sketch of this weighting rule appears after the list):
• Abalone dataset: an abalone is assigned the label +1 if its age is above some threshold; otherwise the label is -1. We considered the examples whose age is equal to the threshold to be difficult.
• Wine age dataset: a label is +1 if the age of the wine is above some threshold; otherwise it is -1. We considered the examples whose age lies between the threshold and 0.25 times the standard deviation of the whole dataset's age to be difficult.
deviation of the whole dataset age to be difficult.
For both datasets, we used linear kernel. The resulted graphs
are as follows:

0.33
Manually Weighted SVM
0.32 SVM

0.31

0.3

0.29
Error

0.28

0.27

0.26

0.25

0.24

0.23
0 50 100 150
Training Set Size Fig. 6. Input Data (X,y)

In this specific example, we have a spiral dataset con-


Fig. 4. Manually Weighted SVM v.s. SVM+: Abalone data taining 3 classes (each denoted by different colors). The
aim is to use neural networks to perform classification. As the baseline neural network [Fig10], we observed improved
we observe, although the input dataset itself is complex, generalization performance improvement of on an average
the privileged information, which is captured by the polar 3%.
coordinates (unwarped) representation of the input dataset,
is much more easier to classify.

Fig. 9. Weighted NN training

Fig. 7. Privileged Information (X*,y)
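To make the setup concrete, here is a minimal sketch of how such a pair $(X, X^*)$ could be generated; the spiral parameters and the helper name `make_spiral` are our own illustrative choices, not taken from the report's code:

```python
import numpy as np

def make_spiral(n_per_class=100, n_classes=3, noise=0.2, seed=0):
    rng = np.random.default_rng(seed)
    X, X_star, y = [], [], []
    for c in range(n_classes):
        r = np.linspace(0.05, 1.0, n_per_class)              # radius
        t = np.linspace(c * 4.0, (c + 1) * 4.0, n_per_class) \
            + rng.normal(scale=noise, size=n_per_class)      # angle
        # Normal features: warped Cartesian coordinates (the spiral).
        X.append(np.column_stack([r * np.sin(t), r * np.cos(t)]))
        # Privileged features: the unwarped polar coordinates, in which
        # the classes are separated by simple angle bands.
        X_star.append(np.column_stack([r, t]))
        y.append(np.full(n_per_class, c))
    return np.vstack(X), np.vstack(X_star), np.concatenate(y)
```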

Our first step is to fit a 0-layer FCNN in the privileged space (X*, Y). The FCNN consists of a linear classifier followed by a softmax layer to determine the class probabilities. In the experiment, we determined the example weights based on the softmax probability of the correct class: the lower the probability, the harder the example, and vice versa [Fig. 8].

Fig. 8. Learning Weights

The weights obtained from the privileged information were then used to train a 1-layer neural network (Linear-ReLU-Linear-Softmax) for the problem [Fig. 9]. As compared with the baseline neural network [Fig. 10], we observed an improved generalization performance of 3% on average.

Fig. 9. Weighted NN training
Fig. 10. Reference training without privileged information
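A minimal numpy sketch of this weighted training idea follows. It assumes the per-example weights have already been computed from the privileged-space classifier (e.g., as the softmax probability of the correct class, so that easy examples get larger effective learning rates); the function name and this exact weight-to-learning-rate mapping are our own assumptions:

```python
import numpy as np

def weighted_softmax_step(W, b, X, y, weights, lr=0.1):
    # One gradient step on a linear-softmax layer, where each example's
    # gradient contribution is scaled by its privileged-information weight.
    m = X.shape[0]
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Cross-entropy gradient w.r.t. the logits, with per-example weights
    # acting as per-example learning rates.
    d_logits = probs.copy()
    d_logits[np.arange(m), y] -= 1.0
    d_logits *= (weights / m)[:, None]

    W -= lr * X.T @ d_logits
    b -= lr * d_logits.sum(axis=0)
    return W, b
```

The same reweighting applies unchanged to momentum or RMSProp updates, since it only rescales each example's contribution to the gradient.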

III. CONCLUSION

From the experiments, we gained a lot of intuition into how to use LUPI in practical scenarios. We were also able to formulate a LUPI algorithm for neural networks. However, more experiments with real-life data are necessary to confirm the performance of the heuristics applied.
IV. CODE

All the source code, including interactive MATLAB and IPython notebooks, is available at https://github.com/kedartatwawadi/LUPI. We plan to update the GitHub repo with more experiments/tutorials on LUPI.
REFERENCES

[1] V. Vapnik and A. Vashist, "A new learning paradigm: Learning using privileged information," Neural Networks, vol. 22, no. 5, pp. 544-557, 2009.
[2] M. Lapin, M. Hein, and B. Schiele, "Learning using privileged information: SVM+ and weighted SVM," Neural Networks, vol. 53, pp. 95-108, 2014.
[3] V. Sharmanska, N. Quadrianto, and C. H. Lampert, "Learning to transfer privileged information," CoRR, vol. abs/1410.0389, 2014.
