
Semi-Supervised Learning

Lukas Tencer
PhD student @ ETS
Motivation
Image Similarity - Domain of origin



Face Recognition - Cross-race effect



Motivation in Machine Learning


Methodology
When to use Semi-Supervised Learning?
• Labelled data is hard to get and expensive
– Speech analysis:
• Switchboard dataset
• 400 hours of annotation time for 1 hour of speech
– Natural Language Processing:
• Penn Chinese Treebank
• 2 years for 4,000 sentences
– Medical applications:
• Require expert opinions, which might not be unique
• Unlabelled data is cheap



Types of Semi-Supervised Learning
• Transductive Learning
– Does not generalize to unseen data
– Produces labels only for the data at training time
• 1. Assume labels
• 2. Train classifier on assumed labels
• Inductive Learning
– Does generalize to unseen data
– Not only produces labels, but also the final classifier
– Manifold Assumption



Selected Semi-Supervised Algorithms

• Self-Training
• Help-Training
• Transductive SVM (S3VM)
• Multiview Algorithms
• Graph-Based Algorithms
• Generative Models
• …



Self-Training
• The Idea: If I am highly confident in the label of an example, I am probably right

• Given training set T = {(x_i, y_i)} and unlabelled set U = {u_j}:

1. Train f on T
2. Get predictions P = f(U)
3. If confidence P_j > α, add (u_j, f(u_j)) to T
4. Retrain f on T
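
A minimal sketch of this loop in Python, assuming a scikit-learn-style probabilistic classifier (GaussianNB here as a stand-in) and treating the threshold α and the iteration cap as free parameters:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def self_training(X_train, y_train, X_unlabeled, alpha=0.95, max_iter=10):
    """Grow the training set with high-confidence pseudo-labels, then retrain."""
    X_t, y_t, U = X_train.copy(), y_train.copy(), X_unlabeled.copy()
    clf = GaussianNB()
    for _ in range(max_iter):
        if len(U) == 0:
            break
        clf.fit(X_t, y_t)                      # 1. train f on T
        proba = clf.predict_proba(U)           # 2. predictions P = f(U)
        conf = proba.max(axis=1)
        mask = conf > alpha                    # 3. keep only confident examples
        if not mask.any():
            break                              # nothing confident enough; stop
        y_new = clf.classes_[proba[mask].argmax(axis=1)]
        X_t = np.vstack([X_t, U[mask]])        # add (u_j, f(u_j)) to T
        y_t = np.concatenate([y_t, y_new])
        U = U[~mask]
    clf.fit(X_t, y_t)                          # 4. final retrain on enlarged T
    return clf
```

scikit-learn also ships sklearn.semi_supervised.SelfTrainingClassifier, which wraps essentially this loop around any estimator that exposes predict_proba.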



Self-Training

• Advantages:
– Very simple and fast method
– Frequently used in NLP
• Disadvantages:
– Amplifies noise in the labelled data
– Requires an explicit definition of P(y|x)
– Hard to implement for discriminative classifiers (SVM)



Self-Training
1. Naïve Bayes classifier on Bag-of-Visual-Words features for 2 classes

2. Classify unlabelled data based on the learned classifier



Self-Training
3. Add the most confident images to the training set

4. Retrain and repeat



Help-Training
• The Challenge: How to make Self-Training work for discriminative classifiers (SVM)?
• The Idea: Train a generative helper classifier to get p(y|x)

• Given training set T = {(x_i, y_i)}, unlabelled set U = {u_j}, a generative classifier g, and a discriminative classifier f:

1. Train f and g on T
2. Get predictions P_g = g(U) and P_f = f(U)
3. If confidence P_g,j > α, add (u_j, f(u_j)) to T
4. Reduce α if no example satisfies P_g,j > α
5. Retrain f and g on T; repeat until U is empty
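
A rough sketch under the same scikit-learn assumptions (GaussianNB standing in for the generative helper g, SVC for the discriminative f); the slide only says to reduce α, so the geometric decay used here is an assumption:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def help_training(X_t, y_t, U, alpha=0.95, decay=0.9):
    """The generative helper g selects confident points; the discriminative
    classifier f provides the labels that are added to the training set."""
    g, f = GaussianNB(), SVC()
    while len(U) > 0:
        g.fit(X_t, y_t)                         # 1. train f and g on T
        f.fit(X_t, y_t)
        conf = g.predict_proba(U).max(axis=1)   # 2. P_g = g(U)
        mask = conf > alpha                     # 3. select by g's confidence
        if not mask.any():
            alpha *= decay                      # 4. relax alpha if nothing qualifies
            continue
        X_t = np.vstack([X_t, U[mask]])
        y_t = np.concatenate([y_t, f.predict(U[mask])])  # labels come from f
        U = U[~mask]
    f.fit(X_t, y_t)                             # 5. final discriminative classifier
    return f
```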
Transductive SVM (S3VM)
• The Idea: Find the largest-margin classifier such that the unlabelled data lie outside the margin as much as possible; use regularization over the unlabelled data

• Given training set T = {(x_i, y_i)} and unlabelled set U = {u_j}:


1. Find all possible labelings 𝑈1 ⋯ 𝑈𝑛 on 𝑈
2. For each 𝑇𝑘 = 𝑇 ∪ 𝑈𝑘 train a standard SVM
3. Choose the SVM with the largest margin

• What is the catch?


• An NP-hard problem; fortunately, approximations exist



Transductive SVM (S3VM)
• Solving a non-convex optimization problem:

J(θ) = ½ ‖w‖² + c_1 Σ_{x_i ∈ T} L(y_i f_θ(x_i)) + c_2 Σ_{x_i ∈ U} L(|f_θ(x_i)|)
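
The slide leaves L and f_θ abstract; a minimal sketch of the objective, assuming a linear classifier f_θ(x) = w·x + b and the usual hinge loss:

```python
import numpy as np

def hinge(z):
    """Hinge loss L(z) = max(0, 1 - z)."""
    return np.maximum(0.0, 1.0 - z)

def s3vm_objective(w, b, X_lab, y_lab, X_unl, c1=1.0, c2=0.1):
    """J(theta) for a linear S3VM: margin term, labelled loss, and the
    unlabelled term L(|f(x)|) that pushes unlabelled points out of the margin."""
    f_lab = X_lab @ w + b          # decision values on labelled data
    f_unl = X_unl @ w + b          # decision values on unlabelled data
    return (0.5 * w @ w
            + c1 * hinge(y_lab * f_lab).sum()
            + c2 * hinge(np.abs(f_unl)).sum())
```

The unlabelled term is what makes J non-convex: hinge(|f(x)|) is small only when a point lies far from the decision boundary, on either side.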

• Methods:
– Local Combinatorial Search
– Standard unconstrained optimization solvers (CG, BFGS…)
– Continuation Methods
– Concave-Convex procedure (CCCP)
– Branch and Bound



Transductive SVM (S3VM)

• Advantages:
– Can be used with any SVM
– Clear optimization criterion, mathematically well
formulated

• Disadvantages:
– Hard to optimize
– Prone to local minima (non-convex)
– Only a small gain under modest assumptions



Multiview Algorithms
• The Idea: Train 2 classifiers on 2 disjoint sets of features, then let each classifier label unlabelled examples and teach the other classifier

• Given training set T = {(x_i, y_i)} and unlabelled set U = {u_j}:

1. Split T into T_1 and T_2 along the feature dimension
2. Train f_1 on T_1 and f_2 on T_2
3. Get predictions P_1 = f_1(U) and P_2 = f_2(U)
4. Add the top k from P_1 to T_2 and the top k from P_2 to T_1
5. Repeat until U is empty
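
A simplified co-training sketch (assumed details: logistic regression for both views and a single shared labelled pool that both classifiers retrain on; classic co-training keeps separate per-view pools):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def teach(clf, X_self, X_other, y, U_self, U_other, k):
    """clf pseudo-labels its k most confident unlabelled points; the points
    (in both feature views) move into the shared labelled pool."""
    proba = clf.predict_proba(U_self)
    top = np.argsort(-proba.max(axis=1))[:k]          # top-k confident indices
    y_new = clf.classes_[proba[top].argmax(axis=1)]
    X_self = np.vstack([X_self, U_self[top]])
    X_other = np.vstack([X_other, U_other[top]])
    y = np.concatenate([y, y_new])
    keep = np.setdiff1d(np.arange(len(U_self)), top)  # drop moved points
    return X_self, X_other, y, U_self[keep], U_other[keep]

def co_training(X1, X2, y, U1, U2, k=5, max_iter=20):
    f1 = LogisticRegression(max_iter=1000)
    f2 = LogisticRegression(max_iter=1000)
    for _ in range(max_iter):
        if len(U1) == 0:
            break
        f1.fit(X1, y)                                 # view-1 classifier on T_1
        f2.fit(X2, y)                                 # view-2 classifier on T_2
        X1, X2, y, U1, U2 = teach(f1, X1, X2, y, U1, U2, k)  # f1 teaches f2
        if len(U2) == 0:
            break
        X2, X1, y, U2, U1 = teach(f2, X2, X1, y, U2, U1, k)  # f2 teaches f1
    f1.fit(X1, y)
    f2.fit(X2, y)
    return f1, f2
```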



Multiview Algorithms
• Application: Web-page Topic Classification
– 1. Classifier for Images; 2. Classifier for Text



Multiview Algorithms

• Advantages:
– Simple Method applicable to any classifier
– Can correct mistakes in classification between the 2
classifiers

• Disadvantages:
– Assumes conditional independence between the feature sets
– A natural split may not exist
– An artificial split may be complicated if there are only a few features


Graph-Based Algorithms
• The Idea: Create a connected graph from labelled and
unlabelled examples, propagate labels over the graph
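
As a concrete example, scikit-learn's LabelSpreading builds the graph from pairwise similarities and propagates the known labels over it; the toy data and kernel settings below are illustrative:

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

# Toy setup: two clusters, one labelled point each; -1 marks unlabelled examples.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [3.0, 3.0], [3.1, 2.8], [2.9, 3.2]])
y = np.array([0, -1, -1, 1, -1, -1])

model = LabelSpreading(kernel="rbf", gamma=2.0)  # graph from RBF similarities
model.fit(X, y)
print(model.transduction_)                       # labels after propagation
```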



Graph-Based Algorithms

• Advantages:
– Great performance if the graph fits the task
– Can be used in combination with any model
– Explicit mathematical formulation

• Disadvantages:
– Problems if the graph does not fit the task
– Hard to construct a graph in sparse spaces



Generative Models
• The Idea: Assume a distribution estimated from the labelled data, then update it using the unlabelled data

• A simple model: GMM + EM
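
A compact sketch of that combination: class-conditional Gaussians initialised by the labelled data, with EM updating soft assignments for the unlabelled points only (clamping the labelled responsibilities is one common variant, assumed here):

```python
import numpy as np
from scipy.stats import multivariate_normal

def ssl_gmm_em(X_l, y_l, X_u, n_iter=50):
    """EM for a class-conditional GMM: labelled points keep fixed one-hot
    responsibilities; unlabelled points get soft assignments each step."""
    classes = np.unique(y_l)
    K, n_l = len(classes), len(X_l)
    X = np.vstack([X_l, X_u])
    R = np.full((len(X), K), 1.0 / K)            # responsibilities
    R[:n_l] = 0.0
    R[np.arange(n_l), np.searchsorted(classes, y_l)] = 1.0  # clamp labelled rows
    for _ in range(n_iter):
        # M-step: weighted class priors, means, covariances
        pi = R.sum(axis=0) / len(X)
        mu = (R.T @ X) / R.sum(axis=0)[:, None]
        cov = [np.cov(X.T, aweights=R[:, k]) + 1e-6 * np.eye(X.shape[1])
               for k in range(K)]
        # E-step: recompute responsibilities for the unlabelled rows only
        like = np.column_stack([pi[k] * multivariate_normal.pdf(X, mu[k], cov[k])
                                for k in range(K)])
        R[n_l:] = like[n_l:] / like[n_l:].sum(axis=1, keepdims=True)
    return pi, mu, cov
```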



Generative Models

• Advantages:
– Nice probabilistic framework
– Instead of plain EM you can include a prior and do MAP estimation, or go fully Bayesian

• Disadvantages:
– EM finds only a local optimum
– Makes strong assumptions about class distributions



What could go wrong?
• Semi-Supervised Learning makes a lot of assumptions
– Smoothness
– Clusters
– Manifolds
• Some techniques (Co-Training) require a very specific setup
• Noisy labels are a frequent problem
• There is no free lunch



There is much more out there
• Structural Learning
• Co-EM
• Tri-Training
• Co-Boosting
• Unsupervised pretraining – deep learning
• Transductive Inference
• Universum Learning (my work)
• Active Learning + Semi-Supervised Learning
• …



Demo
Conclusion
• Play with Semi-Supervised Learning
• Basic methods are very simple to implement and can give you up to a 5 to 10% gain in accuracy
• You can cheat at competitions by using unlabelled data; often no assumption is made about external data
• Be careful when running Semi-Supervised Learning in a production environment; keep an eye on your algorithm
• If running in production, be aware that data patterns change, and old assumptions about labels may corrupt your new unlabelled data



Some more resources
Videos to watch:
• Semi-Supervised Learning Approaches – Tom Mitchell (CMU):
http://videolectures.net/mlas06_mitchell_sla/
• MLSS 2012: Graph-based semi-supervised learning – Zoubin Ghahramani (Cambridge):
https://www.youtube.com/watch?v=HZQOvm0fkLA

Books to read:
• Semi-Supervised Learning – Chapelle, Schölkopf, Zien
• Introduction to Semi-Supervised Learning – Zhu, Goldberg (series eds. Brachman, Dietterich)



THANKS FOR YOUR TIME
Lukas Tencer
[email protected]
http://lukastencer.github.io/
https://github.com/lukastencer
https://twitter.com/lukastencer

Graduating August 2015, looking for ML and DS opportunities
