
15-859(B) Machine Learning Theory
Semi-Supervised Learning
Avrim Blum
04/07/08

Semi-Supervised Learning

• The main models we have been studying (PAC, mistake-bound) are for supervised learning.
  – Given labeled examples S = {(xi,yi)}, try to learn a good prediction rule.
• But often labeled data is rare or expensive.
• On the other hand, often unlabeled data is plentiful and cheap.
  – Documents, images, OCR, web-pages, protein sequences, …
• Can we use unlabeled data to help?

Semi-Supervised Learning

Can we use unlabeled data to help?
• Unlabeled data is missing the most important info! But maybe it still has useful regularities that we can use. E.g., OCR.

Semi-Supervised Learning

Can we use unlabeled data to help?
• This is a question a lot of people in ML have been interested in. A number of interesting methods have been developed.

Today:
• Discuss several methods for trying to use unlabeled data to help.
• Extension of the PAC model to make sense of what's going on.

Plan for today

Methods:
• Co-training
• Transductive SVM
• Graph-based methods

Model:
• Augmented PAC model for SSL.

There's also a book, "Semi-Supervised Learning", on the topic.

Co-training

[Blum&Mitchell'98], motivated by [Yarowsky'95]

Yarowsky's Problem & Idea:
• Some words have multiple meanings (e.g., "plant"). Want to identify which meaning was intended in any given instance.
  "…nuclear power plant generated…"
• Standard approach: learn a function from local context to the desired meaning, from labeled data.
• Idea: use the fact that in most documents, multiple uses of a word have the same meaning. Use this to transfer confident predictions over.

Co-training

• Examples x can be written in two parts, x = ⟨x1, x2⟩.
• Either part alone is in principle sufficient to produce a good classifier.
• E.g., speech + video, image and context, web page contents and links.
• So if confident about the label for x1, can use it to impute a label for x2, and vice versa. Use each classifier to help train the other.

Example: classifying webpages

Actually, many problems have a similar characteristic.
• Co-training: agreement between two parts.
  – Examples contain two sets of features, i.e., an example is x = ⟨x1, x2⟩, and the belief is that the two parts of the example are sufficient and consistent, i.e., ∃ c1, c2 such that c1(x1) = c2(x2) = c(x).
  – E.g., for a webpage (such as "Prof. Avrim Blum" linked to as "My Advisor"): x = link info & text info, with x1 = link info and x2 = text info.
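A minimal sketch of the co-training loop just described, assuming two generic views and scikit-learn-style classifiers; the round count, the number k of confident examples added per round, and the helper name co_train are illustrative choices, not part of the original [BM98] procedure.

```python
# Sketch of a co-training loop: each view's classifier labels the unlabeled
# examples it is most confident about, and those imputed labels become
# training data for the other view as well.
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(X1_l, X2_l, y_l, X1_u, X2_u, rounds=10, k=5):
    """X1_l, X2_l: the two views of the labeled data; y_l: labels;
    X1_u, X2_u: the two views of the unlabeled data."""
    L1, L2, y = list(X1_l), list(X2_l), list(y_l)
    U = list(range(len(X1_u)))              # indices of still-unlabeled examples
    h1, h2 = LogisticRegression(), LogisticRegression()
    for _ in range(rounds):
        if not U:
            break
        h1.fit(np.array(L1), np.array(y))
        h2.fit(np.array(L2), np.array(y))
        for h, X_u in ((h1, X1_u), (h2, X2_u)):
            if not U:
                break
            probs = h.predict_proba(np.array([X_u[i] for i in U]))
            conf = probs.max(axis=1)        # confidence of each prediction
            picks = np.argsort(-conf)[:k]   # k most confident unlabeled examples
            for p in sorted(picks, reverse=True):
                i = U[p]
                L1.append(X1_u[i])
                L2.append(X2_u[i])
                y.append(h.classes_[probs[p].argmax()])  # imputed label feeds both views
                del U[p]
    return h1, h2
```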

Example: intervals

Suppose x1 ∈ R, x2 ∈ R, c1 = [a1,b1], c2 = [a2,b2].
[Figure: unlabeled points ⟨x1,x2⟩ with a few positives marked +.]

Co-Training Theorems

• [BM98] if x1, x2 are independent given the label: D = p(D1+ × D2+) + (1-p)(D1- × D2-), and if C is SQ-learnable, then can learn from an initial "weakly-useful" h1 plus unlabeled data.
• Def: h is weakly-useful if Pr[h(x)=1 | c(x)=1] > Pr[h(x)=1 | c(x)=0] + ε.
  (same as weak hyp if target c is balanced)
• E.g., say "syllabus" appears on 1/3 of course pages but only 1/6 of non-course pages.

Co-Training Theorems

• Use the weakly-useful h1's predictions as noisy labels. Like classification noise with potentially asymmetric noise rates α, β.
• Can learn so long as α + β < 1 - ε.
  (helpful trick: balance the data so observed labels are 50/50)
• [BB05] in some cases (e.g., LTFs), you can use this to learn from a single labeled example!
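One way to read the noise rates, filling in notation the slide leaves implicit (my assignment of α and β, not necessarily the lecture's):

$$\alpha = \Pr[h_1(x_1)=0 \mid c(x)=1], \qquad \beta = \Pr[h_1(x_1)=1 \mid c(x)=0].$$

With this reading, weak usefulness says Pr[h1(x1)=1 | c(x)=1] > Pr[h1(x1)=1 | c(x)=0] + ε, i.e., (1 − α) > β + ε, which is exactly the condition α + β < 1 − ε stated above.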

Co-Training Theorems

• [BB05] in some cases (e.g., LTFs), you can use this to learn from a single labeled example!
  – Repeat the process (below) multiple times.
  – Get 4 kinds of hypotheses: {close to c, close to ¬c, close to 1, close to 0}.

A really simple learning algorithm

Claim: if the data has a separator of margin γ, there's a reasonable chance a random hyperplane will have error ≤ ½ - γ/4. [all hyperplanes through origin]

Proof:
• Pick a (positive) example x. Consider the 2-d plane defined by x and the target w*.
• Pr_h(h⋅x ≤ 0 | h⋅w* ≥ 0) ≤ (π/2 - γ)/π = ½ - γ/π.
• So, E_h[err(h) | h⋅w* ≥ 0] ≤ ½ - γ/π.
• Since err(h) is bounded between 0 and 1, there must be a reasonable chance of success. QED
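The final "reasonable chance" step is only asserted on the slide; one standard way to fill it in (my reconstruction, not necessarily the argument intended in lecture) is a reverse-Markov calculation:

$$\Pr_h\!\bigl[\mathrm{err}(h) > \tfrac12 - \tfrac{\gamma}{4} \,\big|\, h\cdot w^* \ge 0\bigr] \;\le\; \frac{\tfrac12 - \tfrac{\gamma}{\pi}}{\tfrac12 - \tfrac{\gamma}{4}} \quad \text{(Markov, since } \mathrm{err}(h)\ge 0\text{)},$$

$$\text{so}\quad \Pr_h\!\bigl[\mathrm{err}(h) \le \tfrac12 - \tfrac{\gamma}{4} \,\big|\, h\cdot w^* \ge 0\bigr] \;\ge\; \frac{\tfrac{\gamma}{\pi} - \tfrac{\gamma}{4}}{\tfrac12 - \tfrac{\gamma}{4}} \;\ge\; \frac{\gamma(4-\pi)}{2\pi} \;=\; \Omega(\gamma).$$

Since Pr_h[h⋅w* ≥ 0] = ½ for a random hyperplane through the origin, the unconditional probability of drawing a hypothesis with error at most ½ − γ/4 is also Ω(γ).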

Co-Training Theorems

• [BM98] independence given the label (D = p(D1+ × D2+) + (1-p)(D1- × D2-)) plus SQ-learnability of C: can learn from an initial weakly-useful h1 plus unlabeled data.
• [BB05] in some cases (e.g., LTFs), you can use this to learn from a single labeled example.
• [BBY04] if you don't want to assume independence, and C is learnable from positive data only, then it suffices for D+ to have expansion.

Co-training and expansion

Want the initial sample to expand to the full set of positives after a limited number of iterations.
[Figure: bipartite graph between the two views X1 (link info) and X2 (text info), with a few nodes marked +.]

Transductive SVM [Joachims98]

• Suppose we believe the target separator goes through low-density regions of the space / has large margin.
• Aim for a separator with large margin w.r.t. the labeled and unlabeled data (L+U).
  [Figure: SVM on the labeled data only vs. Transductive SVM using the unlabeled points.]
• Unfortunately, the optimization problem is now NP-hard. The algorithm instead does local optimization.
  – Start with a large-margin separator over the labeled data. This induces labels on U.
  – Then try flipping labels in a greedy fashion.
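A minimal sketch of that local-search idea, not Joachims' actual TSVM implementation: fit a large-margin separator on the labeled set, use it to label U, then greedily swap pairs of opposite induced labels whenever that lowers a soft-margin objective. Labels are assumed to be ±1, and the function name, candidate-set size, and objective are illustrative stand-ins.

```python
# Local-search sketch in the spirit of a transductive SVM (illustrative only).
import numpy as np
from sklearn.svm import LinearSVC

def tsvm_local_search(X_l, y_l, X_u, C=1.0, max_passes=5, cand=20):
    """y_l must be in {-1, +1}."""
    y_u = LinearSVC(C=C).fit(X_l, y_l).predict(X_u)   # labels induced on U by the labeled-only SVM

    def objective(labels_u):
        # Soft-margin objective over L+U with the current induced labels (smaller is better).
        X = np.vstack([X_l, X_u])
        y = np.concatenate([y_l, labels_u])
        m = LinearSVC(C=C).fit(X, y)
        margins = y * m.decision_function(X)
        return 0.5 * np.sum(m.coef_ ** 2) + C * np.sum(np.maximum(0.0, 1.0 - margins))

    best = objective(y_u)
    for _ in range(max_passes):
        improved = False
        pos = np.where(y_u == 1)[0][:cand]            # small candidate sets keep the sketch cheap
        neg = np.where(y_u == -1)[0][:cand]
        for i in pos:
            for j in neg:
                y_try = y_u.copy()
                y_try[i], y_try[j] = -1, 1            # swap a pair so the class balance stays fixed
                val = objective(y_try)
                if val < best:
                    y_u, best, improved = y_try, val, True
        if not improved:
            break
    return LinearSVC(C=C).fit(np.vstack([X_l, X_u]), np.concatenate([y_l, y_u]))
```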

Graph-based methods

• Suppose we believe that very similar examples probably have the same label.
• If you have a lot of labeled data, this suggests a Nearest-Neighbor type of algorithm.
• If you have a lot of unlabeled data, it suggests a graph-based method.

Graph-based methods

• Transductive approach. (Given L + U, output predictions on U.)
• Construct a graph with edges between very similar examples.
• Solve for:
  – minimum cut
  – minimum "soft-cut" [ZGL]
  – spectral partitioning

Graph-based methods

• Suppose just two labels: 0 & 1.
• Solve for labels f(x) for the unlabeled examples x to minimize:
  – ∑_{e=(u,v)} |f(u) − f(v)|     [soln = minimum cut]
  – ∑_{e=(u,v)} (f(u) − f(v))²    [soln = electric potentials]
[Figure: similarity graph with a labeled + node, a labeled − node, and unlabeled nodes.]

How can we think about these approaches to using unlabeled data in a PAC-style model?
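A small sketch of the quadratic ("soft-cut") version: with the labeled values clamped, minimizing ∑_{e=(u,v)} (f(u)−f(v))² gives a linear system whose solution is the harmonic function f_U = (D_UU − W_UU)⁻¹ W_UL f_L, as in [ZGL]. The helper name and the toy graph below are illustrative.

```python
# Harmonic-function / "soft-cut" label propagation on a similarity graph (sketch).
import numpy as np

def harmonic_labels(W, labeled_idx, y_l):
    """W: symmetric similarity (adjacency) matrix over all n examples.
    labeled_idx: indices of labeled examples; y_l: their labels in {0, 1}.
    Returns soft labels f in [0, 1] for every node (labeled nodes keep their labels)."""
    n = W.shape[0]
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    D = np.diag(W.sum(axis=1))                        # degree matrix
    L = D - W                                         # graph Laplacian; f minimizes f^T L f
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    f = np.zeros(n)
    f[labeled_idx] = y_l                              # clamp the labeled values
    f[unlabeled_idx] = np.linalg.solve(L_uu, W_ul @ np.asarray(y_l, dtype=float))
    return f

# Example: a 4-node path graph with the ends labeled 0 and 1.
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
print(harmonic_labels(W, np.array([0, 3]), np.array([0, 1])))
```

On this path the interior nodes come out at 1/3 and 2/3: each unlabeled node takes the average of its neighbors' values, which is exactly the electric-potentials interpretation.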

Proposed Model [BB05]

• Augment the notion of a concept class C with a notion of compatibility χ between a concept and the data distribution.
• "Learn C" becomes "learn (C,χ)" (i.e., learn class C under compatibility notion χ).
• Express relationships that one hopes the target function and underlying distribution will possess.
• Idea: use unlabeled data & the belief that the target is compatible to reduce C down to just {the highly compatible functions in C}.
• To do this, need unlabeled data to allow us to uniformly estimate compatibilities well.
• Require that the degree of compatibility be something that can be estimated from a finite sample.

Proposed Model [BB05]

• Require χ to be an expectation over individual examples:
  – χ(h,D) = E_{x ∈ D}[χ(h,x)]   (compatibility of h with D, where χ(h,x) ∈ [0,1])
  – err_unl(h) = 1 − χ(h,D)      (incompatibility of h with D; the unlabeled error rate of h)

Margins, Compatibility

• Margins: the belief is that there should exist a large-margin separator.
  [Figure: labeled + and − points with a highly compatible (large-margin) separator.]
• Incompatibility of h and D (the unlabeled error rate of h) = the probability mass within distance γ of h.
• Can be written as an expectation over individual examples, χ(h,D) = E_{x ∈ D}[χ(h,x)], where:
  – χ(h,x) = 0 if dist(x,h) < γ
  – χ(h,x) = 1 if dist(x,h) ≥ γ
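Because χ is an average over individual examples, the unlabeled error rate can be estimated from an unlabeled sample; written out explicitly (standard empirical-average notation, not taken from the slide):

$$\widehat{\mathrm{err}}_{unl}(h) \;=\; \frac{1}{m_u}\sum_{i=1}^{m_u}\bigl(1-\chi(h,x_i)\bigr), \qquad x_1,\dots,x_{m_u} \sim D,$$

which, for the margin notion above, is just the fraction of unlabeled points that fall within distance γ of the separator h.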

Margins, Compatibility

• If you do not want to commit to γ in advance, define χ(h,x) to be a smooth function of dist(x,h).

Co-Training, Compatibility

• Co-training: examples come as pairs ⟨x1, x2⟩ and the goal is to learn a pair of functions ⟨h1, h2⟩.
• The hope is that the two parts of the example are consistent.
• Legal (and natural) notion of compatibility:
  – the compatibility of ⟨h1, h2⟩ and D,
  – which can be written as an expectation over examples.
• Illegal notion of compatibility: the largest γ s.t. D has probability mass exactly zero within distance γ of h.
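The formula for this compatibility notion did not survive the extraction; based on the [BB05] paper, it should be (my reconstruction):

$$\chi\bigl(\langle h_1,h_2\rangle, D\bigr) \;=\; \Pr_{\langle x_1,x_2\rangle \sim D}\bigl[h_1(x_1)=h_2(x_2)\bigr] \;=\; \mathbb{E}_{\langle x_1,x_2\rangle \sim D}\bigl[\mathbf{1}\{h_1(x_1)=h_2(x_2)\}\bigr],$$

so the unlabeled error rate of ⟨h1, h2⟩ is the probability that the two views disagree.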

Sample Complexity – Uniform convergence bounds

Finite hypothesis spaces, doubly realizable case:
• Define C_{D,χ}(ε) = {h ∈ C : err_unl(h) ≤ ε}.
• Theorem:
• E.g., can think of ln|C| as the # of bits to describe the target without knowing D, and ln|C_{D,χ}(ε)| as the # of bits to describe the target knowing a good approximation to D, given the assumption that the target has low unlabeled error rate.
• Bound the # of labeled examples as a measure of the helpfulness of D with respect to χ:
  – a helpful distribution is one in which C_{D,χ}(ε) is small.

Semi-Supervised Learning: Natural Formalization (PAC_χ)

• We will say an algorithm "PAC_χ-learns" if it runs in poly time using samples poly in the respective bounds.
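The statement of the theorem on the sample-complexity slide above was lost in extraction; as best I can reconstruct it from [BB05], it is roughly of the following form (constants and exact conditions may differ from the lecture's version):

$$\text{If } m_u \ge \frac{1}{\varepsilon}\Bigl[\ln|C| + \ln\frac{2}{\delta}\Bigr] \ \text{ and } \ m_l \ge \frac{1}{\varepsilon}\Bigl[\ln|C_{D,\chi}(\varepsilon)| + \ln\frac{2}{\delta}\Bigr],$$

then with probability at least 1 − δ, every h ∈ C that is consistent with the labeled sample and has zero empirical unlabeled error rate satisfies err(h) ≤ ε.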

Finite Hypothesis Spaces – target c* in C, but not fully compatible

Theorem:

Infinite Hypothesis Spaces / VC-dimension

• Assume χ(h,x) ∈ {0,1} and let χ(C) = {χ_h : h ∈ C}, where χ_h(x) = χ(h,x).
• C[m,D] = expected # of splits of m points from D with concepts in C.

ε-Cover-based bounds

• For algorithms that behave in a specific way:
  – first use the unlabeled data to choose a representative set of compatible hypotheses,
  – then use the labeled sample to choose among these.
• Theorem:
• Can result in a much better bound than uniform convergence!
• E.g., in the case of co-training linear separators with the independence assumption:
  – ε-cover of the compatible set = {0, 1, c*, ¬c*}
• E.g., Transductive SVM when the data is in two blobs.
  [Figure: two well-separated blobs of + and − points.]

Ways unlabeled data can help in this model

• If the target is highly compatible with D and we have enough unlabeled data to estimate χ over all h ∈ C, then we can reduce the search space (from C down to just those h ∈ C whose estimated unlabeled error rate is low).

• By providing an estimate of D, unlabeled data can allow a more refined distribution-specific notion of hypothesis space size (such as Annealed VC-entropy or the size of the smallest ε-cover).

• If D is nice, so that the set of compatible h ∈ C has a small ε-cover and the elements of the cover are far apart, then we can learn from even fewer labeled examples than the 1/ε needed just to verify a good hypothesis.
