
Understanding LUPI

(Learning using Privileged Information)


Ahmadreza Momeni, Kedar Tatwawadi
Stanford University,
Stanford, US
{amomenis,kedart}@stanford.edu

I. INTRODUCTION

The idea of using privileged information was first suggested by V. Vapnik and A. Vashist in [1], in which they tried to capture the essence of teacher-student learning, which is very effective in the case of human learning. More specifically, when a human is learning a novel notion, he exploits his teacher's comments, explanations, and examples to facilitate the learning procedure. Vapnik proposed the following framework: assume that we want to build a decision rule for determining some labels y based on some features X, but in the training stage, in addition to X, we are also provided with some additional information, denoted the "privileged information" $x^*$, which is not present in the testing stage.

In such a scenario, how can we utilize $X^*$ to improve the learning? In this project report, we try to understand the framework of LUPI through a variety of experiments. We also propose a new algorithm based on privileged information for neural networks, built on the intuition obtained from the experiments.

A. LUPI Framework

We first briefly describe the mathematical framework of LUPI. In classical binary classification problems, we are given $m$ pairs $(x_i, y_i)$, $i = 1, \dots, m$, where $x_i \in \mathcal{X}$, $y_i \in \{-1, +1\}$, and each pair is independently generated by some underlying distribution $P_{XY}$, which is unknown. The goal is to find a function $f : \mathcal{X} \to \{-1, +1\}$ in the function class $\mathcal{F}$ that assigns the labels with the lowest possible error averaged over the unknown distribution $P_{XY}$.

In the LUPI framework, the model is slightly different, as we are provided with triplets $(x_i, x_i^*, y_i)$, $i = 1, \dots, m$, where $x_i \in \mathcal{X}$, $x_i^* \in \mathcal{X}^*$, $y_i \in \{-1, +1\}$, with each triplet independently generated by some underlying distribution $P_{XX^*Y}$, which is again unknown. However, the goal is the same as before: we still aim to find a function $f : \mathcal{X} \to \{-1, +1\}$ in the function class $\mathcal{F}$ that assigns the labels with the lowest error possible.

The important question which Vapnik asks is: can the generalization performance be improved using the privileged information? Vapnik showed that this is true in the case of the SVM. We next briefly describe the SVM and the SVM+ LUPI-based framework proposed by Vapnik.

B. SVM and SVM+

We briefly describe the SVM and SVM+ methods for classification, which in this case amounts to finding some $\omega \in \mathcal{X}$ and $b \in \mathbb{R}$ to build the following predictor:

$$f(x) = \mathrm{sgn}\left[\langle \omega, x \rangle + b\right].$$

1) SVM: The SVM learning method (non-separable SVM) finds $\omega$ and $b$ by solving the following optimization problem:

$$\min_{\omega, b, \xi} \; \frac{1}{2}\langle \omega, \omega \rangle + C \sum_{i=1}^{m} \xi_i$$
$$\text{s.t.} \quad y_i\left[\langle \omega, x_i \rangle + b\right] \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \dots, m.$$

As a short remark, we should mention that $C$ is a parameter that needs tuning. In addition, if the slacks $\xi_i$ are all equal to zero, then we call the set of given examples separable; otherwise they are non-separable.

2) SVM+: In order to take into account the privileged information $X^*$, Vapnik modified the SVM formulation as follows:

$$\min_{\omega, b, \omega^*, b^*} \; \frac{1}{2}\left[\langle \omega, \omega \rangle + \gamma \langle \omega^*, \omega^* \rangle\right] + C \sum_{i=1}^{m} \left[\langle \omega^*, x_i^* \rangle + b^*\right]$$
$$\text{s.t.} \quad y_i\left[\langle \omega, x_i \rangle + b\right] \geq 1 - \left[\langle \omega^*, x_i^* \rangle + b^*\right], \quad i = 1, \dots, m,$$
$$\left[\langle \omega^*, x_i^* \rangle + b^*\right] \geq 0, \quad i = 1, \dots, m,$$

where $\omega^* \in \mathcal{X}^*$ and $b^* \in \mathbb{R}$. In this problem, $C$ and $\gamma$ are hyperparameters to be tuned.

Intuitively, we can think of the terms $\langle \omega^*, x_i^* \rangle + b^*$ as estimators for the slacks $\xi_i$ in the previous optimization problem. The reduced freedom and the better prediction of the slacks using the privileged information improve the learning. Another intuition is that, in some sense, the margins $\langle \omega^*, x_i^* \rangle + b^*$ capture the difficulty of the training examples in the privileged space. This difficulty information is then used to relax or tighten the SVM constraints to improve the learning.
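To make the SVM+ program above concrete, here is a minimal sketch that solves the linear SVM+ primal with the cvxpy modeling library. The helper name `svm_plus` and its defaults are our own illustrative choices, not the solver used in the report:

```python
import cvxpy as cp

def svm_plus(X, X_star, y, C=1.0, gamma=1.0):
    # X: (m, d) normal features; X_star: (m, d*) privileged features;
    # y: (m,) labels in {-1, +1}.
    w, b = cp.Variable(X.shape[1]), cp.Variable()
    w_star, b_star = cp.Variable(X_star.shape[1]), cp.Variable()

    # The slacks are estimated in the privileged space: <w*, x*_i> + b*.
    slack = X_star @ w_star + b_star

    objective = cp.Minimize(
        0.5 * (cp.sum_squares(w) + gamma * cp.sum_squares(w_star))
        + C * cp.sum(slack))
    constraints = [cp.multiply(y, X @ w + b) >= 1 - slack,  # relaxed margins
                   slack >= 0]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value  # test-time prediction: sgn(<w, x> + b)
```

Because the slacks are tied to a linear function of the privileged features rather than being free variables, the model has less freedom than with the standard SVM slacks, which is exactly the regularizing effect described above.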
We next describe some methodologies which use this intuition, relating the difficulty of examples to the construction of LUPI-based frameworks.

C. Weighted SVM and Margin Transfer SVMs

One way in which privileged information influences learning is by differentiating the easy examples from the really difficult ones. This understanding was later formalized in [2], where the authors argue that if the weights are chosen appropriately, then a Weighted SVM can always outperform SVM+. In weighted SVMs, the example weights themselves convey the difficulty/importance of the examples. Although [2] proved that weighted SVMs are better than SVM+, the difficulty arises from the fact that the weights are unknown. In some cases, though, there are heuristics to guess the weights which work quite well, and subject knowledge can often be utilized for this purpose.

We next describe a heuristic proposed in [3] to find the weights and solve a Weighted SVM (WSVM) problem.
1) Margin Transfer SVM: One way to exploit privileged information is proposed in [3], where the authors suggest first solving a classification problem using only the privileged information $x^*$, obtaining a classifier $f^*$ (note that there is no requirement for $f^*$ to be of the form $\langle \omega^*, x^* \rangle + b^*$). We then store the margins $\rho_i := y_i \cdot f^*(x_i^*)$. For our purpose, we put some threshold $\epsilon$ on the margins and define $\hat\rho_i := \max\{\rho_i, \epsilon\}$. Now we are equipped to solve the following optimization problem:

$$\min_{\omega, b, \xi} \; \frac{1}{2}\langle \omega, \omega \rangle + C \sum_{i=1}^{m} \hat\rho_i \, \xi_i$$
$$\text{s.t.} \quad y_i\left[\langle \omega, x_i \rangle + b\right] \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \dots, m.$$

Intuitively, the margins $\rho_i$ determine how difficult an example is. In the extreme case, if an example is too difficult ($\rho_i < 0$), its weight $\hat\rho_i$ is clipped to the small threshold $\epsilon$, effectively zero, which means that we are essentially eliminating that example from the training stage. This is similar to the human learning procedure: if an example is too hard, the teacher does not use it, because it would make the student diverge from learning the main subject and waste time on useless points.
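A minimal sketch of this recipe with scikit-learn follows; here the privileged-space classifier $f^*$ is a linear SVM for simplicity (any classifier with real-valued scores would do), and the clipped margins enter the weighted SVM through the standard `sample_weight` argument, which scales each example's slack penalty as in the $C\,\hat\rho_i\,\xi_i$ term above:

```python
import numpy as np
from sklearn.svm import SVC, LinearSVC

def margin_transfer_svm(X, X_star, y, C=1.0, eps=1e-3):
    # Step 1: learn f* from the privileged information alone.
    f_star = LinearSVC(C=C).fit(X_star, y)

    # Step 2: margins rho_i = y_i * f*(x*_i), clipped at the threshold eps
    # so that overly difficult examples (rho_i < 0) get near-zero weight.
    rho = y * f_star.decision_function(X_star)
    rho_hat = np.maximum(rho, eps)

    # Step 3: weighted SVM on the normal features; sample_weight scales
    # the per-example slack penalty, realizing the C * rho_hat_i * xi_i term.
    return SVC(kernel="linear", C=C).fit(X, y, sample_weight=rho_hat)
```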
We next describe the various experiments which we conducted to understand LUPI.

II. EXPERIMENTS

A. SVM+ v.s. SVM

The first experiment that we conducted was to compare the performance of SVM+ and SVM. We used the following datasets:

TABLE I
DATASETS

Data         Test set size   d^a   d*^b
Ionosphere   201             7     6
Ring         7250            10    10
Wine age     108             4     5

^a The number of normal features. ^b The number of privileged features.

In each of the above datasets, we chose some features as normal ones and some as privileged, and then trained the classifiers and computed the error on the corresponding test set. For all of the datasets, we used a linear kernel. The resulting error curves (test error vs. training set size) are shown in Figs. 1-3.

Fig. 1. SVM+ v.s. SVM: Ionosphere data
Fig. 2. SVM+ v.s. SVM: Ring data
Fig. 3. SVM+ v.s. SVM: Wine Age data
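For reference, the comparison loop behind these curves is conceptually as simple as the following sketch, which reuses the hypothetical `svm_plus` helper sketched in Section I-B; dataset loading and the hyperparameter tuning discussed below are elided:

```python
import numpy as np
from sklearn.svm import SVC

def error_vs_train_size(X, X_star, y, X_test, y_test, sizes):
    errors = []
    for m in sizes:
        # The baseline SVM sees only the normal features.
        svm = SVC(kernel="linear").fit(X[:m], y[:m])
        err_svm = np.mean(svm.predict(X_test) != y_test)
        # SVM+ additionally sees the privileged features, but only at
        # training time; prediction still uses the normal features.
        w, b = svm_plus(X[:m], X_star[:m], y[:m])
        err_plus = np.mean(np.sign(X_test @ w + b) != y_test)
        errors.append((m, err_svm, err_plus))
    return errors
```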
As a brief remark, we note that not only does SVM+ converge faster, but surprisingly, in some cases it converges to a better answer, which is observed very distinctly in the Ring experiment. We also observed that SVM+ needs a different solver than SVM and is quite sensitive to the hyperparameters, which makes it very difficult to get working on complex datasets.

B. Manually Weighted SVM v.s. SVM+

The second experiment that we conducted aimed to evaluate the performance of a Manually Weighted SVM. The aim of the experiment was to confirm the intuition that knowing the difficulty of examples helps in improving the learning. Thus, we considered the ease/difficulty of the training set itself as the privileged information. We used the following datasets:

TABLE II
DATASETS

Data       Test set size   d^a
Abalone    3178            7
Wine age   118             7

^a The number of normal features.
The difficulty levels are determined as follows (a minimal sketch of this weighting rule appears after the list):
• Abalone dataset: an abalone is assigned the label +1 if its age is above some threshold; otherwise the label is -1. We considered the examples whose age is equal to the threshold to be difficult.
• Wine age dataset: a label is +1 if the age of the wine is above some threshold; otherwise it is -1. We considered the examples whose age lies between the threshold and 0.25 times the standard deviation of the whole dataset's age to be difficult.
deviation of the whole dataset age to be difficult.
For both datasets, we used linear kernel. The resulted graphs
are as follows:

0.33
Manually Weighted SVM
0.32 SVM

0.31

0.3

0.29
Error

0.28

0.27

0.26

0.25

0.24

0.23
0 50 100 150
Training Set Size Fig. 6. Input Data (X,y)

In this specific example, we have a spiral dataset con-


Fig. 4. Manually Weighted SVM v.s. SVM+: Abalone data taining 3 classes (each denoted by different colors). The
aim is to use neural networks to perform classification. As the baseline neural network [Fig10], we observed improved
we observe, although the input dataset itself is complex, generalization performance improvement of on an average
the privileged information, which is captured by the polar 3%.
coordinates (unwarped) representation of the input dataset,
is much more easier to classify.

Fig. 9. Weighted NN training

Fig. 7. Privileged Information (X*,y)
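To make the setup concrete, here is a minimal sketch of how such a pair $(X, X^*)$ could be generated; the spiral parameters and the helper name `make_spiral` are our own illustrative choices, not taken from the report's code:

```python
import numpy as np

def make_spiral(n_per_class=100, n_classes=3, noise=0.2, seed=0):
    rng = np.random.default_rng(seed)
    X, X_star, y = [], [], []
    for c in range(n_classes):
        r = np.linspace(0.05, 1.0, n_per_class)              # radius
        t = np.linspace(c * 4.0, (c + 1) * 4.0, n_per_class) \
            + rng.normal(scale=noise, size=n_per_class)      # angle
        # Normal features: warped Cartesian coordinates (the spiral).
        X.append(np.column_stack([r * np.sin(t), r * np.cos(t)]))
        # Privileged features: the unwarped polar coordinates, in which
        # the classes are separated by simple angle bands.
        X_star.append(np.column_stack([r, t]))
        y.append(np.full(n_per_class, c))
    return np.vstack(X), np.vstack(X_star), np.concatenate(y)
```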

Our first step is to fit a 0-layer FCNN in the privileged space (X*, Y). The FCNN consists of a linear classifier followed by a softmax layer to determine the class probabilities. In the experiment, we determined the example weights based on the softmax probability of the correct class: the lower the probability, the harder the example, and vice versa [Fig. 8].

Fig. 8. Learning Weights

The weights obtained from the privileged information were then used to train a 1-layer neural network (Linear-ReLU-Linear-Softmax) for the problem [Fig. 9]. As compared with the baseline neural network [Fig. 10], we observed an improved generalization performance of 3% on average.

Fig. 9. Weighted NN training
Fig. 10. Reference training without privileged information
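A minimal numpy sketch of this weighted training idea follows. It assumes the per-example weights have already been computed from the privileged-space classifier (e.g., as the softmax probability of the correct class, so that easy examples get larger effective learning rates); the function name and this exact weight-to-learning-rate mapping are our own assumptions:

```python
import numpy as np

def weighted_softmax_step(W, b, X, y, weights, lr=0.1):
    # One gradient step on a linear-softmax layer, where each example's
    # gradient contribution is scaled by its privileged-information weight.
    m = X.shape[0]
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Cross-entropy gradient w.r.t. the logits, with per-example weights
    # acting as per-example learning rates.
    d_logits = probs.copy()
    d_logits[np.arange(m), y] -= 1.0
    d_logits *= (weights / m)[:, None]

    W -= lr * X.T @ d_logits
    b -= lr * d_logits.sum(axis=0)
    return W, b
```

The same reweighting applies unchanged to momentum or RMSProp updates, since it only rescales each example's contribution to the gradient.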

III. CONCLUSION

From the experiments, we gained a lot of intuition into how to use LUPI in practical scenarios. We were also able to formulate a LUPI algorithm for neural networks. However, more experiments with real-life data are necessary to confirm the performance of the heuristics applied.
IV. CODE

All the source code, including interactive MATLAB and IPython notebooks, is available at https://github.com/kedartatwawadi/LUPI. We plan to update the GitHub repo with more experiments/tutorials on LUPI.
REFERENCES

[1] V. Vapnik and A. Vashist, "A new learning paradigm: Learning using privileged information," Neural Networks, vol. 22, no. 5, pp. 544-557, 2009.
[2] M. Lapin, M. Hein, and B. Schiele, "Learning using privileged information: SVM+ and weighted SVM," Neural Networks, vol. 53, pp. 95-108, 2014.
[3] V. Sharmanska, N. Quadrianto, and C. H. Lampert, "Learning to transfer privileged information," CoRR, vol. abs/1410.0389, 2014.
