
Machine Learning

References
Kilian Weinberger, Machine Learning for Intelligent Systems (CS4780/CS5780), Cornell University. https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote01_MLsetup.html
Supervised learning
$D = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_n, y_n)\} \subseteq X \times Y$

where $(\mathbf{x}_i, y_i) \sim P(\mathbf{x}, y)$.

Learn a function $h \in H$ such that for a new instance $(\mathbf{x}, y) \sim P$,

$h(\mathbf{x}) \approx y$
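A minimal sketch of this setup in Python. The distribution $P$ (uniform inputs, linear targets plus noise), the one-dimensional features, and the linear hypothesis class are all illustrative assumptions, not part of the formal definition:

    import numpy as np

    rng = np.random.default_rng(0)

    # Sample a dataset D = {(x_i, y_i)} from an assumed distribution P:
    # here x ~ Uniform(-1, 1) and y = 2x + Gaussian noise, purely for illustration.
    n = 100
    X = rng.uniform(-1, 1, size=(n, 1))
    y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=n)

    # Choose h from a (linear) hypothesis space H via least squares.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    def h(x):
        return x @ w

    # For a fresh instance (x, y) ~ P we want h(x) ≈ y.
    print(h(np.array([0.5])))  # close to 2 * 0.5 = 1.0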
Hypothesis Space
$h \in H$
• $H$ can be thought of as containing classes of hypotheses that share sets of assumptions, such as:
  • Decision trees
  • Perceptrons
  • Neural networks
  • Support Vector Machines
How to choose $h$?
• Randomly
  • May not work well
  • Like using a random program to solve your sorting problem
  • May work if $H$ is constrained enough

• Exhaustively
  • Would be very slow
  • The space $H$ is usually very large (if not infinite)

• $H$ is usually chosen by data scientists (you!) based on their experience!
• $h \in H$ is estimated efficiently using various optimization techniques
How to evaluate $h$?
Loss functions
• Calculate the average error of $h$ in predicting $y$
• Smaller is better. For example, with the 0/1 loss in binary classification:
  • 0 loss: no error
  • 100% loss: could not get even one instance right
  • 50% loss: your $h$ is as informative as a coin toss
Loss functions
0/1 Loss
$L_{0/1}(h) = \frac{1}{n} \sum_{i=1}^{n} \delta_{h(\mathbf{x}_i) \neq y_i}, \quad \text{where } \delta_{h(\mathbf{x}_i) \neq y_i} = \begin{cases} 1, & \text{if } h(\mathbf{x}_i) \neq y_i \\ 0, & \text{otherwise} \end{cases}$
• Counts the fraction of mistakes made in predicting $y$
• Evaluated on the training set, this is the training error rate
• Non-continuous and non-differentiable
• Difficult to utilize in optimization
• Used to evaluate classifiers in binary/multiclass settings
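A minimal sketch of the 0/1 loss, assuming predictions and labels are held in NumPy-compatible arrays:

    import numpy as np

    def zero_one_loss(y_pred, y_true):
        # (1/n) * sum of indicators [h(x_i) != y_i]
        return np.mean(np.asarray(y_pred) != np.asarray(y_true))

    # One mistake in four predictions -> loss 0.25
    print(zero_one_loss([1, 0, 1, 1], [1, 0, 0, 1]))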
Squared loss
$L_{sq}(h) = \frac{1}{n} \sum_{i=1}^{n} \left( h(\mathbf{x}_i) - y_i \right)^2$
• Typically used in regression settings
• The loss is always non-negative
• The loss grows quadratically with the absolute magnitude of mis-prediction
• Strongly discourages predictions that are very far off
• If a prediction is very close to correct, its squared error is tiny, so little attention is given to that example when driving the error toward zero
Absolute loss
$L_{abs}(h) = \frac{1}{n} \sum_{i=1}^{n} \left| h(\mathbf{x}_i) - y_i \right|$
• Typically used in regression settings
• The loss is always non-negative
• The loss grows linearly with the absolute magnitude of mis-prediction
• Better suited for noisy data, since it is less sensitive to outliers
Comparison
y        h(x)       Square loss   Abs loss
100.00   101.00            1.00       1.00
90.00    90.01           0.0001       0.01
100.00   200.00       10,000.00     100.00
100.00   1,000.00    810,000.00     900.00
Overall (mean)       205,000.25     250.25

Predictions off by 1 everywhere, except one severe outlier:

y        h(x)       Square loss   Abs loss
100.00   101.00            1.00       1.00
90.00    91.00             1.00       1.00
100.00   101.00            1.00       1.00
20.00    21.00             1.00       1.00
30.00    29.00             1.00       1.00
40.00    41.00             1.00       1.00
30.00    31.00             1.00       1.00
10.00    11.00             1.00       1.00
12.00    13.00             1.00       1.00
16.00    17.00             1.00       1.00
100.00   1,000.00    810,000.00     900.00
Overall (mean)        73,637.27      82.73

Predicting 0 everywhere, with only the largest target predicted exactly:

y          h(x)       Square loss   Abs loss
100.00     0.00         10,000.00     100.00
90.00      0.00          8,100.00      90.00
100.00     0.00         10,000.00     100.00
20.00      0.00            400.00      20.00
30.00      0.00            900.00      30.00
40.00      0.00          1,600.00      40.00
30.00      0.00            900.00      30.00
10.00      0.00            100.00      10.00
12.00      0.00            144.00      12.00
16.00      0.00            256.00      16.00
1,000.00   1,000.00          0.00       0.00
Overall (mean)           2,945.45      40.73

• Takeaway: a single large outlier dominates the squared loss, while the absolute loss weighs all errors in proportion to their size
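Both regression losses are short enough to sketch directly. The example below (assumed NumPy representation) reproduces the "Overall" row of the first four-row table:

    import numpy as np

    def squared_loss(y_pred, y_true):
        # (1/n) * sum of (h(x_i) - y_i)^2
        return np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)

    def absolute_loss(y_pred, y_true):
        # (1/n) * sum of |h(x_i) - y_i|
        return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)))

    y_true = [100.00, 90.00, 100.00, 100.00]
    y_pred = [101.00, 90.01, 200.00, 1000.00]
    print(squared_loss(y_pred, y_true))   # 205000.250025
    print(absolute_loss(y_pred, y_true))  # 250.2525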
The elusive $h$
$h = \operatorname{argmin}_{h \in H} L(h)$

So we need an $h$ with a low loss on $D$?


How not to reduce the loss?

$h(\mathbf{x}) = \begin{cases} y_i, & \text{if } \exists\, (\mathbf{x}_i, y_i) \in D \text{ s.t. } \mathbf{x} = \mathbf{x}_i \\ 0, & \text{otherwise} \end{cases}$

• What would be the loss of this $h$ on the training set?
• What would be the loss of this $h$ on an unseen test set?

The memorizer!

• Why is it bad?
• How to prevent this from happening?
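A sketch of the memorizer (a hypothetical construction, for illustration only): it achieves zero training loss but gives near-arbitrary answers everywhere else.

    def make_memorizer(D):
        # h(x) = y_i if some (x_i, y_i) in D has x == x_i, else 0
        table = {x: y for x, y in D}
        return lambda x: table.get(x, 0)

    D = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
    h = make_memorizer(D)
    print(all(h(x) == y for x, y in D))  # True: zero training loss
    print(h(2.5))                        # 0 -- unseen inputs get a default answer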
Generalization
$\epsilon = \mathbb{E}_{(\mathbf{x},y) \sim P}[\ell(\mathbf{x}, y) \mid h]$
• The expected loss is calculated over any data point sampled from the distribution $P$, not only those present in $D$
• How to get a new datapoint 𝑥, 𝑦 ∼ 𝑃?
• All we have are the 𝑛 data points!
• We estimate $\epsilon$ by splitting $D$:

Training set, $D_{TR}$ | Test set, $D_{TE}$

• We train on $D_{TR}$ and test on $D_{TE}$ only once!
• Don't train on the test set inadvertently (e.g., through repeated testing)!
Training and Test Data
• Usually 80:20 or 70:30 splits
• Making sure the splits make sense
• Time series: Split by time
• i.i.d: Uniformly at random
  o Make sure you don't split the same data point between $D_{TR}$ and $D_{TE}$
  o Make sure the same data does not get repeated on both sides (e.g., near-duplicate spam emails)
• We never look at the test data
• We train only on $D_{TR}$ and use $D_{TE}$ only once
• Then the test error approximates the generalization loss
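A minimal uniformly-random split, as a sketch. It assumes i.i.d. data; the 80:20 default mirrors the slide:

    import numpy as np

    def train_test_split(X, y, test_frac=0.2, seed=0):
        # Uniformly random split -- only appropriate for i.i.d. data;
        # for time series, split by time instead.
        idx = np.random.default_rng(seed).permutation(len(X))
        n_test = int(len(X) * test_frac)
        te, tr = idx[:n_test], idx[n_test:]
        return X[tr], y[tr], X[te], y[te]

    X, y = np.arange(20).reshape(20, 1), np.arange(20)
    X_tr, y_tr, X_te, y_te = train_test_split(X, y)
    # Train only on (X_tr, y_tr); evaluate on (X_te, y_te) exactly once.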

• How do we evaluate the model if we do not have access to the test data while training?
Validation sets
Training set, $D_{TR}$ | Validation set, $D_{VA}$ | Test set, $D_{TE}$

• E.g., split 80:10:10
• Train on $D_{TR}$, tune parameters or calculate error on $D_{VA}$, and finally test once on $D_{TE}$
• Cross-validation: rotate the validation fold across the training data and average the validation errors (see the sketch below)
• Finally, train on the whole data once, before shipping out
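A sketch of k-fold cross-validation; `fit_and_score` is a hypothetical callback standing in for whatever training-plus-validation routine you use (e.g., train a regressor on the first pair of arrays and return its loss on the second):

    import numpy as np

    def k_fold_cv(X, y, fit_and_score, k=5, seed=0):
        # Rotate the validation fold across k splits and average the scores.
        idx = np.random.default_rng(seed).permutation(len(X))
        folds = np.array_split(idx, k)
        scores = []
        for i in range(k):
            va = folds[i]
            tr = np.concatenate([folds[j] for j in range(k) if j != i])
            scores.append(fit_and_score(X[tr], y[tr], X[va], y[va]))
        return float(np.mean(scores))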


Summary
Learning:

$h^* = \operatorname{argmin}_{h \in H} \frac{1}{|D_{TR}|} \sum_{(\mathbf{x},y) \in D_{TR}} \ell(\mathbf{x}, y \mid h)$
Evaluation:
$\epsilon_{TE} = \frac{1}{|D_{TE}|} \sum_{(\mathbf{x},y) \in D_{TE}} \ell(\mathbf{x}, y \mid h^*)$
• If the samples are drawn i.i.d. from the same distribution $P$, then the test loss is an unbiased estimator of the true generalization loss:
Generalization:
$\epsilon = \mathbb{E}_{(\mathbf{x},y) \sim P}[\ell(\mathbf{x}, y) \mid h^*]$
• $\epsilon_{TE} \to \epsilon$ as $|D_{TE}| \to +\infty$
• This is due to the weak law of large numbers, which says that the empirical average of data drawn from a distribution converges to its mean.
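A quick empirical check of this convergence. The distribution $P$ and the fixed $h^*$ below are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    h_star = lambda x: 2.0 * x  # a fixed, already-trained hypothesis

    # With y = 2x + N(0, 0.1), the true generalization squared loss of h_star
    # is E[noise^2] = 0.01; the test loss approaches it as |D_TE| grows.
    for n_te in (10, 1_000, 100_000):
        x = rng.uniform(-1, 1, size=n_te)
        y = 2.0 * x + rng.normal(scale=0.1, size=n_te)
        print(n_te, np.mean((h_star(x) - y) ** 2))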
