ML-W2L02 Supervised Learning Setup
References
Machine Learning for Intelligent Systems, CS4780/CS5780, Kilian Weinberger, Cornell University. https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote01_MLsetup.html
Supervised learning
$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \subseteq \mathcal{X} \times \mathcal{Y}$
where
$(x_i, y_i) \sim P(x, y)$
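To make the setup concrete, here is a minimal sketch in Python. The true joint distribution $P(x, y)$ is unknown in practice; the synthetic noisy-linear distribution below is purely an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_from_P(n):
    """Draw n i.i.d. pairs (x_i, y_i) ~ P(x, y).

    P here is a made-up stand-in: x uniform on [0, 10], y a noisy linear
    function of x. In reality P is unknown; we only ever see samples.
    """
    x = rng.uniform(0.0, 10.0, size=n)           # feature space X
    y = 3.0 * x + rng.normal(0.0, 1.0, size=n)   # label space Y
    return list(zip(x, y))

D = sample_from_P(100)   # D = {(x_1, y_1), ..., (x_n, y_n)} ⊆ X × Y
```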
How do we find a good hypothesis $h$ in the hypothesis space $\mathcal{H}$?
• Exhaustively? (see the sketch after this list)
• That would be very slow
• The space $\mathcal{H}$ is usually very large (if not infinite)
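A toy sketch of why exhaustive search fails to scale, assuming an invented dataset and the simplest possible class, constant predictors $h_c(x) = c$: even one parameter needs a grid of candidates, and $k$ parameters on the same grid already give $1000^k$ hypotheses to try.

```python
import numpy as np

D = [(1.0, 10.0), (2.0, 22.0), (3.0, 29.0), (4.0, 41.0)]   # toy data (made up)

def exhaustive_search(D, candidates, loss):
    """Try every candidate h; keep the one with the lowest average loss on D."""
    best_h, best_loss = None, float("inf")
    for h in candidates:
        avg = sum(loss(y, h(x)) for x, y in D) / len(D)
        if avg < best_loss:
            best_h, best_loss = h, avg
    return best_h, best_loss

grid = np.linspace(0.0, 100.0, 1000)            # 1,000 candidates for 1 parameter
candidates = [lambda x, c=c: c for c in grid]   # h_c(x) = c
h, l = exhaustive_search(D, candidates, lambda y, p: (y - p) ** 2)
print(h(0.0), l)   # best constant lands near the label mean (25.5)
```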
Example: comparing the square loss $(h(x) - y)^2$ with the absolute loss $|h(x) - y|$. On the left, a predictor that is off by one on every point but misses one point by 900; on the right, a predictor that outputs 0 everywhere except on the one point it matches exactly. A single large error dominates the average square loss.

     y      h(x)   Square loss   Abs loss  |       y      h(x)   Square loss   Abs loss
  100.00   101.00         1.00       1.00  |   100.00     0.00     10,000.00     100.00
   90.00    91.00         1.00       1.00  |    90.00     0.00      8,100.00      90.00
  100.00   101.00         1.00       1.00  |   100.00     0.00     10,000.00     100.00
   20.00    21.00         1.00       1.00  |    20.00     0.00        400.00      20.00
   30.00    29.00         1.00       1.00  |    30.00     0.00        900.00      30.00
   40.00    41.00         1.00       1.00  |    40.00     0.00      1,600.00      40.00
   30.00    31.00         1.00       1.00  |    30.00     0.00        900.00      30.00
   10.00    11.00         1.00       1.00  |    10.00     0.00        100.00      10.00
   12.00    13.00         1.00       1.00  |    12.00     0.00        144.00      12.00
   16.00    17.00         1.00       1.00  |    16.00     0.00        256.00      16.00
  100.00 1,000.00    810,000.00     900.00 | 1,000.00 1,000.00          0.00       0.00
  Overall             73,637.27      82.73 |  Overall              2,945.45      40.73
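The two "Overall" rows are just the column averages, which the snippet below reproduces; the arrays are copied straight from the tables.

```python
import numpy as np

# Left table: off by one everywhere, one miss of 900.
y_l = np.array([100, 90, 100, 20, 30, 40, 30, 10, 12, 16, 100], dtype=float)
h_l = np.array([101, 91, 101, 21, 29, 41, 31, 11, 13, 17, 1000], dtype=float)
# Right table: predicts 0 everywhere, exact on the last point.
y_r = np.array([100, 90, 100, 20, 30, 40, 30, 10, 12, 16, 1000], dtype=float)
h_r = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1000], dtype=float)

for name, y, h in [("left", y_l, h_l), ("right", y_r, h_r)]:
    print(name,
          round(np.mean((y - h) ** 2), 2),    # square loss: 73637.27 / 2945.45
          round(np.mean(np.abs(y - h)), 2))   # abs loss:       82.73 /   40.73
```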
The elusive h
$h = \operatorname*{argmin}_{h \in \mathcal{H}} L(h)$
where $L(h)$ is the average loss of $h$ over the training data $D$.
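To make the argmin concrete, a minimal sketch of this minimization over the (illustrative, not from the slides) class of constant predictors $h_c(x) = c$, using the labels from the right-hand table above. Under the square loss the best constant is the label mean; under the absolute loss it is the median, which is why the absolute loss is less sensitive to the outlier.

```python
import numpy as np

y  = np.array([100, 90, 100, 20, 30, 40, 30, 10, 12, 16, 1000], dtype=float)
cs = np.linspace(0.0, 1000.0, 10001)   # grid over H = {h_c : h_c(x) = c}

sq = ((y[None, :] - cs[:, None]) ** 2).mean(axis=1)   # avg square loss per c
ab = np.abs(y[None, :] - cs[:, None]).mean(axis=1)    # avg abs loss per c

print(cs[sq.argmin()], y.mean())        # ≈ 131.6, the mean of the labels
print(cs[ab.argmin()], np.median(y))    # 30.0, the median of the labels
```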
The memorizer!
A hypothesis that simply stores $D$ and returns the stored label $y_i$ whenever the query $x$ equals some training input $x_i$ achieves zero training loss, so it minimizes $L(h)$ trivially (see the sketch after this list).
• Why is it bad?
• How to prevent this from happening?
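A minimal sketch of the memorizer, assuming a tiny made-up dataset: it stores $D$ in a lookup table, scores perfectly on the training data, and outputs an arbitrary default everywhere else.

```python
def make_memorizer(D, default=0.0):
    """h(x) = y_i if some (x_i, y_i) in D has x == x_i, else a default value."""
    table = {x: y for x, y in D}
    return lambda x: table.get(x, default)

D = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]   # toy data (made up)
h = make_memorizer(D)

train_loss = sum((y - h(x)) ** 2 for x, y in D) / len(D)
print(train_loss)   # 0.0 -- a perfect score on D ...
print(h(1.5))       # 0.0 -- ... and no idea about anything outside D
```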
Generalization
$\epsilon = \mathbb{E}_{(x,y) \sim P}\left[\ell(x, y \mid h)\right]$
• The expected loss is taken over any data point sampled from the distribution $P$, not only the points present in $D$
• But how do we get a new data point $(x, y) \sim P$?
• All we have are the $n$ data points!
• We estimate $\epsilon$ by splitting $D$ into a training set $D_{TR}$ and a test set $D_{TE}$: learn $h$ on $D_{TR}$ only, then report the average loss on $D_{TE}$ as an estimate of $\epsilon$ (see the sketch below)
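A minimal sketch of the split, with illustrative choices not taken from the slides: a synthetic $P$, an 80/20 split, and a constant-mean predictor standing in for the learner. The key point is that $D_{TE}$ is never touched while choosing $h$, so the average test loss estimates $\epsilon$ on genuinely unseen samples from $P$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 3.0 * x + rng.normal(0.0, 1.0, size=200)   # stand-in for (x, y) ~ P

idx = rng.permutation(len(x))                  # shuffle before splitting
tr, te = idx[:160], idx[160:]                  # D_TR (80%), D_TE (20%)

c = y[tr].mean()                               # "train" on D_TR only: the best
                                               # constant under the square loss
eps_hat = np.mean((y[te] - c) ** 2)            # average test loss estimates ε
print(eps_hat)
```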