Lecture 8-ml On-Line Learning
On-Line Learning
Mehryar Mohri
Courant Institute and Google Research
[email protected]
Motivation
PAC learning:
• distribution fixed over time (training and test).
• IID assumption.
On-line learning:
• no distributional assumption.
• worst-case analysis (adversarial).
• mixed training and test.
• Performance measure: mistake model, regret.
Halving(H)
  H_1 ← H
  for t ← 1 to T do
    Receive(x_t)
    ŷ_t ← MajorityVote(H_t, x_t)
    Receive(y_t)
    if ŷ_t ≠ y_t then
      H_{t+1} ← {c ∈ H_t : c(x_t) = y_t}
    else H_{t+1} ← H_t
  return H_{T+1}
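The Halving procedure can be sketched in Python as follows. This is a minimal illustration rather than part of the lecture: the hypothesis class `thresholds`, the `target`, and the input stream are hypothetical, labels are assumed to lie in {0, 1}, and ties in the vote are broken toward 1.

```python
def majority_vote(hypotheses, x):
    """Return the label predicted by most hypotheses (ties go to 1)."""
    ones = sum(h(x) for h in hypotheses)
    return 1 if 2 * ones >= len(hypotheses) else 0


def halving(hypotheses, stream):
    """Run Halving over (x, y) pairs; return (surviving hypotheses, mistakes)."""
    active = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        y_hat = majority_vote(active, x)
        if y_hat != y:
            mistakes += 1
            # Keep only the hypotheses consistent with the revealed label.
            active = [h for h in active if h(x) == y]
    return active, mistakes


# Hypothetical finite class: thresholds h_k(x) = 1 if x >= k, for k = 0..7.
thresholds = [lambda x, k=k: int(x >= k) for k in range(8)]
target = thresholds[5]  # realizable case: the target belongs to the class
stream = [(x, target(x)) for x in [0, 7, 3, 4, 5, 6, 2, 1]]

survivors, mistakes = halving(thresholds, stream)
```

Every mistake removes at least half of the active hypotheses, so in this realizable setting the mistake count cannot exceed log2 of the class size (here 3).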
• Halving algorithm: $\beta = 0$.
Mehryar Mohri - Foundations of Machine Learning page 11
Weighted Majority - Proof
Potential: $\Phi_t = \sum_{i=1}^{N} w_{t,i}$.
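\[
\begin{aligned}
&\le -\eta \mathop{\mathrm{E}}_{w_{t-1}}\big[L(y_{t,i}, y_t)\big] + \frac{\eta^2}{8} && \text{(Hoeffding's ineq.)}\\
&\le -\eta\, L\Big(\mathop{\mathrm{E}}_{w_{t-1}}[y_{t,i}],\, y_t\Big) + \frac{\eta^2}{8} && \text{(convexity of first arg. of $L$)}\\
&= -\eta\, L(\hat{y}_t, y_t) + \frac{\eta^2}{8}.
\end{aligned}
\]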
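The Hoeffding step can be checked numerically. The sketch below assumes an exponential update $w_{t,i} = w_{t-1,i}\, e^{-\eta L(y_{t,i}, y_t)}$ with losses in $[0, 1]$, so that the bounded quantity is the log-ratio of consecutive weight sums; the update rule, the weights, the losses, and the function names are illustrative assumptions, not taken from the slides.

```python
import math


def log_weight_ratio(weights, losses, eta):
    """log(sum_i w_i * exp(-eta * l_i) / sum_i w_i) for one exponential update."""
    total = sum(weights)
    updated = sum(w * math.exp(-eta * l) for w, l in zip(weights, losses))
    return math.log(updated / total)


def hoeffding_upper_bound(weights, losses, eta):
    """-eta * E_w[loss] + eta^2 / 8, valid when every loss lies in [0, 1]."""
    total = sum(weights)
    mean_loss = sum(w * l for w, l in zip(weights, losses)) / total
    return -eta * mean_loss + eta ** 2 / 8


# Hypothetical expert weights and per-expert losses for a single round.
weights = [0.5, 1.0, 2.0, 0.25]
losses = [0.0, 1.0, 0.3, 0.8]

checks = [
    (eta, log_weight_ratio(weights, losses, eta),
     hoeffding_upper_bound(weights, losses, eta))
    for eta in (0.1, 0.5, 1.0, 2.0)
]
```

Each tuple in `checks` holds the learning rate, the exact log-ratio, and the Hoeffding upper bound, so the first value should never exceed the second.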
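Thus,
\[
\begin{aligned}
L_T = \sum_{k=0}^{n} L_{I_k}
&\le \sum_{k=0}^{n} \min_{i=1}^{N} L_{I_k,i} + \sum_{k=0}^{n} \sqrt{2^k (\log N)/2}\\
&\le \min_{i=1}^{N} L_{T,i} + \sum_{k=0}^{n} 2^{k/2} \sqrt{(\log N)/2},
\end{aligned}
\]
with
\[
\sum_{k=0}^{n} 2^{k/2}
= \frac{\sqrt{2}^{\,n+1} - 1}{\sqrt{2} - 1}
= \frac{2^{(n+1)/2} - 1}{\sqrt{2} - 1}
\le \frac{\sqrt{2}\sqrt{T+1} - 1}{\sqrt{2} - 1}
\le \frac{\sqrt{2}(\sqrt{T} + 1) - 1}{\sqrt{2} - 1}
\le \frac{\sqrt{2}\sqrt{T}}{\sqrt{2} - 1} + 1.
\]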
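The geometric-sum bound can be verified numerically. The sketch below assumes the relation $T \ge 2^n - 1$ between the horizon $T$ and the number of periods, which is the usual doubling-trick setup but is not stated in the lines above; the function names are illustrative.

```python
import math

SQRT2 = math.sqrt(2)


def geometric_sum(n):
    """Direct evaluation of sum_{k=0}^{n} 2^(k/2)."""
    return sum(2 ** (k / 2) for k in range(n + 1))


def closed_form(n):
    """(2^((n+1)/2) - 1) / (sqrt(2) - 1)."""
    return (2 ** ((n + 1) / 2) - 1) / (SQRT2 - 1)


def upper_bound(T):
    """sqrt(2) * sqrt(T) / (sqrt(2) - 1) + 1."""
    return SQRT2 * math.sqrt(T) / (SQRT2 - 1) + 1


rows = []
for n in range(1, 11):
    T = 2 ** n - 1  # assumed: smallest horizon compatible with n
    rows.append((n, T, geometric_sum(n), closed_form(n), upper_bound(T)))
```

Each row lists the direct sum, its closed form, and the final bound in terms of $\sqrt{T}$, so the three columns can be compared for every `n`.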
Perceptron(w_0)
  w_1 ← w_0    ▷ typically w_0 = 0
  for t ← 1 to T do
    Receive(x_t)
    ŷ_t ← sgn(w_t · x_t)
    Receive(y_t)
    if (ŷ_t ≠ y_t) then
      w_{t+1} ← w_t + y_t x_t    ▷ more generally η y_t x_t, η > 0
    else w_{t+1} ← w_t
  return w_{T+1}
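A minimal Python sketch of the same update rule is given below. It assumes labels in {-1, +1} and the convention sgn(0) = +1, which the pseudocode leaves open; the four-point data set is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def sgn(z):
    """Sign with the (assumed) convention sgn(0) = +1."""
    return 1 if z >= 0 else -1


def perceptron(stream, dim, w0=None, eta=1.0):
    """One on-line pass over (x_t, y_t) pairs; return (weights, updates)."""
    w = [0.0] * dim if w0 is None else list(w0)
    mistakes = 0
    for x_t, y_t in stream:
        y_hat = sgn(dot(w, x_t))
        if y_hat != y_t:
            # Update only on a mistake: w <- w + eta * y_t * x_t.
            w = [w_j + eta * y_t * x_j for w_j, x_j in zip(w, x_t)]
            mistakes += 1
    return w, mistakes


# Hypothetical linearly separable sample.
points = [(2, 1), (1, 2), (-1, -2), (-2, -1)]
labels = [1, 1, -1, -1]

w, mistakes = perceptron(zip(points, labels), dim=2)
```

On this sample the only update happens at the third point, after which the returned weight vector classifies all four points correctly.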
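[Figure: two panels showing a separating hyperplane $w \cdot x = 0$ and its margin $\rho$, with the label $\frac{y_i (w \cdot x_i)}{\|w\|}$.]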
\[
\big[ M R^2 \big]^{1/2} = \sqrt{M}\, R. \qquad \text{(applying the same to previous $t$s in $I$)}
\]
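where $r = \sqrt{\sum_{t \in I} \|x_t\|^2}$.
and leads to
\[
M_T^2 \le \frac{r^2}{\rho^2} + 2\, \frac{\sqrt{M_T}\, \|L_\rho(u)\|_2\, r}{\rho} + M_T \|L_\rho(u)\|_2^2
= \Big( \frac{r}{\rho} + \sqrt{M_T}\, \|L_\rho(u)\|_2 \Big)^2 .
\]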
K: PDS kernel.
Kernel-Perceptron(α_0)
  α ← α_0    ▷ typically α_0 = 0
  for t ← 1 to T do
    Receive(x_t)
    ŷ_t ← sgn(Σ_{s=1}^{T} α_s y_s K(x_s, x_t))
    Receive(y_t)
    if (ŷ_t ≠ y_t) then
      α_t ← α_t + 1
  return α
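The kernelized variant can be sketched in Python as below. The Gaussian kernel is used only as one example of a PDS kernel; its bandwidth `sigma`, the convention sgn(0) = +1, and the XOR-style sample are assumptions made for illustration.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel, a standard example of a PDS kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def kernel_perceptron(points, labels, kernel):
    """One on-line pass; alpha[t] counts the updates made at round t."""
    T = len(points)
    alpha = [0] * T  # alpha_0 = 0
    for t in range(T):
        # alpha[s] is still 0 for rounds not yet seen, so the full sum is safe.
        score = sum(
            alpha[s] * labels[s] * kernel(points[s], points[t])
            for s in range(T)
        )
        y_hat = 1 if score >= 0 else -1  # assumed convention: sgn(0) = +1
        if y_hat != labels[t]:
            alpha[t] += 1
    return alpha


# Hypothetical XOR-style sample, not linearly separable in the input space.
points = [(0, 0), (1, 1), (0, 1), (1, 0)]
labels = [1, 1, -1, -1]

alpha = kernel_perceptron(points, labels, gaussian_kernel)
```

Only the dual coefficients `alpha` are stored, so the weight vector in feature space is never formed explicitly; predictions need kernel evaluations against the points where an update occurred.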