
Supervised Learning Setup

Course 4232: Machine Learning

Dept. of Computer Science


Faculty of Science and Technology

Week No: 2


Instructor: Dr. M M Manjurul Islam ([email protected])
Supervised Learning
 Training experience: a set of labeled examples of the form
x = ( x1, x2, …, xn, y )

 where xj are values for input variables and y is the output

 This implies the existence of a “teacher” who knows the right answers

 What to learn: a function f : X1 × X2 × … × Xn → Y, which maps the input variables into the output domain

 Goal: minimize the error (loss function) on the training examples


Supervised Learning Problem
 Given a data set D ⊆ X1 × X2 × … × Xn × Y, find a function

h : X1 × X2 × … × Xn → Y

such that h(x) is a good predictor for the value of y

h is called a hypothesis

 If Y is the set of real numbers, this problem is called regression


 If Y is a finite discrete set, this problem is called classification
 Binary classification if Y has 2 discrete values
 Multi-class classification if Y has more than 2
Supervised Learning Steps
 Decide what the training examples are
 Data collection
 Feature extraction or selection:
 Discriminative features
 Relevant and insensitive to noise
 Input space X, output space Y, and feature vectors
 Choose a model, i.e. representation for h;
 or, the hypothesis class H = {h1, …, hr}
 Choose an error function to define the best hypothesis
 Choose a learning algorithm: regression or classification method
 Training

 Evaluation = testing
EX: What Model or Hypothesis Space H ?

• Training examples:

ei = ⟨xi, yi⟩, for i = 1, …, 10


Linear Hypothesis

What Error Function ? Algorithm ?
 Want to find the weight vector w = (w0, …, wn) such that hw(xi) ≈ yi
 Should define the error function to measure the difference between
the predictions and the true answers
 Thus pick w such that the error is minimized
 Sum-of-squares error function:
   J(w) = (1/2) Σ_{i=1}^{m} [h_w(x_i) − y_i]²
 Compute w such that J(w) is minimal, that is, such that:
   ∂J(w)/∂w_j = 0,  j = 0, …, n
 A learning algorithm that finds w: the Least Mean Squares (LMS) method (a gradient-based sketch follows below)
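
A minimal sketch of minimizing the sum-of-squares error by gradient descent, assuming NumPy; the helper name lms_fit and the toy data are illustrative, and the classical LMS rule applies the same update one example at a time rather than over the full batch.

```python
import numpy as np

def lms_fit(X, y, lr=0.01, epochs=1000):
    """Gradient descent on J(w) = 1/2 * sum_i (h_w(x_i) - y_i)^2.
    X is the data matrix already augmented with a leading column of 1's."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        residuals = X @ w - y        # h_w(x_i) - y_i for every example
        grad = X.T @ residuals       # dJ/dw_j = sum_i (h_w(x_i) - y_i) * x_ij
        w -= lr * grad               # step opposite the gradient
    return w

# toy usage: noisy samples of y = 1 + 2x (hypothetical data)
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=20)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(0, 0.05, size=20)
print(lms_fit(X, y))   # should be close to [1.0, 2.0]
```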
Some Linear Algebra

Some Linear Algebra …

Some Linear Algebra - The Solution!

Example of Linear Regression - Data Matrices

XᵀX

XᵀY

Solving for w – Regression Curve

Linear Regression - Summary
 The optimal solution can be computed in polynomial time in the size of the
data set.
 Too simple for most real-valued problems
 The solution is w = (XᵀX)⁻¹XᵀY (see the sketch after this list), where
 X is the data matrix, augmented with a column of 1’s
 Y is the column vector of target outputs
 A very rare case in which an analytical exact solution is possible
 Nice math, closed-form formula, unique global optimum
 Problems when (XᵀX) does not have an inverse
 Possible solutions to this:
1. Include high-order terms in hw
2. Transform the input X to some other space X’, and apply linear regression on X’
3. Use a different but more powerful hypothesis representation
 Is Linear Regression enough ?
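
A minimal NumPy sketch of the closed-form solution; the function names fit_linear_regression and predict are illustrative, not from the slides.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Closed-form least squares: w = (X^T X)^{-1} X^T y.
    X holds one row per example; a column of 1's is prepended here,
    matching the augmented data matrix on the slides."""
    Xa = np.column_stack([np.ones(len(X)), X])
    # solve the normal equations instead of forming an explicit inverse;
    # this raises LinAlgError exactly when X^T X is singular
    return np.linalg.solve(Xa.T @ Xa, Xa.T @ y)

def predict(w, X):
    Xa = np.column_stack([np.ones(len(X)), X])
    return Xa @ w
```

Using np.linalg.solve rather than an explicit inverse is the usual numerical choice and surfaces the "XᵀX has no inverse" failure mode directly.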
Generalization Ability vs Overfitting
 Very important issue for any machine learning algorithms.
 Can your algorithm predict the correct target y of any unseen x ?
 Hypothesis may perfectly predict for all known x’s but not unseen x’s
 This is called overfitting
 Each hypothesis h has an unknown true error on the universe: JU(h)
 But we only measured the empirical error on the training set: JD(h)
 Let h1 and h2 be two hypotheses compared on training set D, such that
we obtained the result JD(h1) < JD(h2)
 If h2 is “truly” better, that is JU(h2) < JU(h1)
 Then your algorithm is overfitting, and won’t generalize to unseen data
 We are not interested in memorizing the training set
 In our examples, highest degree d hypotheses overfit (i.e. memorize) the data

 We need methods to overcome overfitting.


Overfitting
 We have overfitting when hypothesis h is more complex than the data
 Complexity of h = number of parameters in h
 Number of weight parameters in our example increases with degree d

 Overfitting = low error on training data but high error on unseen data
 Assume D is drawn from some unknown probability distribution
 Given the universe U of data, we want to learn a hypothesis h from the
training set 𝐷 ⊂ 𝑈 minimizing the error on unseen data 𝑈 ∖ 𝐷.
 Every h has a true error JU(h) on U, which is the expected error when the
data is drawn from the distribution
 We can only measure the empirical error JD(h) on D; we do not have U
 Then… How can we estimate the error JU(h) from D?
 Apply a cross-validation method on D
 Determining the hypothesis h that generalizes best is called model selection.
Avoiding Overfitting
• Red curve = Test set
• Blue curve = Training set

• What is the best h?


• Find the degree d such that JT(h) is minimal
• Training error decreases with complexity of h;
degree d in our example
• Testing error decreases initially then increases
• We need three disjoint subsets T, V, U of D
• Learn a potential h using the training set T
• Estimate error of h using the validation set V
• Report an unbiased error estimate of h using the test set U
Cross-Validation
 General procedure for estimating the true error of a learner.

 Randomly partition the data into three subsets:

1. Training Set T: used only to find the parameters of classifier, e.g. w.


2. Validation Set V: used to find the correct hypothesis class, e.g. d.
3. Test Set U: used to estimate the true error of your algorithm

 These three sets do not intersect, i.e. they are disjoint


 Repeat cross-validation many times
 Results are averaged to give true error estimate.

Cross-Validation and Model Selection
 How do we find the degree d which fits the data D best?
 Randomly partition the available data D into three disjoint sets;
training set T, validation set V, and test set U, then:
1. Cross-validation: For each degree d, perform a cross-
validation method using T and V sets for evaluating the
goodness of d.
 Some cross-validation techniques to be discussed later
2. Model Selection: Given the best d found in step 1, find hw,d
using T and V sets and report the prediction error of hw,d
using the test set U
 Some model selection approaches to be discussed later.
 The prediction error on U is an unbiased estimate of the true error
Leave-One-Out Cross-Validation
 For each degree d do:
1. for i ← 1 to m do:
1. Validation set Vi ← {ei = ( xi, yi )} ; leave the i-th sample out
2. Training set: Ti ← D \ Vi
3. wd,i ← Train(Ti, d) ; optimal wd,i using training set Ti
4. J(d, i) ← Test(Vi) ; validation error of wd,i on xi
; J(d, i) is an unbiased estimate of the true prediction error
2. Average validation error: J(d) ← (1/m) Σ_{i=1}^{m} J(d, i)
 d* ← arg min_d J(d) ; select the degree d with the lowest average error
; J(d*) is not an unbiased estimate since all data is used to find it.
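
A minimal sketch of this procedure for polynomial regression, assuming NumPy; numpy.polyfit stands in for Train, and the function name loocv_best_degree is illustrative.

```python
import numpy as np

def loocv_best_degree(x, y, degrees):
    """Leave-one-out cross-validation for choosing a polynomial degree d.
    x, y are 1-D arrays of m samples; returns d* and the averaged errors J(d)."""
    m = len(x)
    avg_err = {}
    for d in degrees:
        errs = []
        for i in range(m):                         # leave the i-th sample out
            mask = np.arange(m) != i
            w = np.polyfit(x[mask], y[mask], d)    # Train(T_i, d)
            pred = np.polyval(w, x[i])
            errs.append((pred - y[i]) ** 2)        # J(d, i): error on V_i = {e_i}
        avg_err[d] = float(np.mean(errs))          # J(d)
    d_star = min(avg_err, key=avg_err.get)         # arg min_d J(d)
    return d_star, avg_err
```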
Example: Estimating True Error for d = 1

Example: Estimation results for all d

 Optimal choice is d = 2
 Overfitting for d > 2
 Very high validation error for d = 8 and 9
Model Selection
 J(d*) is not unbiased since it was obtained using all m samples
 We chose the hypothesis class d* based on J(d) = (1/m) Σ_{i=1}^{m} J(d, i)
 We want both a hypothesis class and an unbiased true error
estimate
 If we want to compare different learning algorithms (or different
hypotheses) an independent test data U is required in order to
decide for the best algorithm or the best hypothesis
 In our case, we are trying to decide which regression model to
use, d=1, or d=2, or …, or d=11?
 And, which has the best unbiased true error estimate
k-Fold Cross-Validation
 Partition D into k disjoint subsets of same size and same
distribution, P1, P2, …, Pk
 For each degree d do:
 for i ← 1 to k do:
1. Validation set Vi ← Pi ; leave Pi out for validation
2. Training set Ti ← D \ Vi
3. wd,i ← Train(Ti, d) ; train on Ti
4. J(d, i) ← Test(Vi) ; compute validation error on Vi
 Average validation error: J(d) ← (1/k) Σ_{i=1}^{k} J(d, i)
 d* ← arg min_d J(d) ; return the optimal degree d
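
A minimal k-fold sketch in the same style, assuming NumPy; kfold_best_degree and the use of numpy.polyfit as the training routine are illustrative choices.

```python
import numpy as np

def kfold_best_degree(x, y, degrees, k=10, seed=0):
    """k-fold cross-validation for choosing a polynomial degree d.
    The indices are shuffled once and split into k folds P_1, ..., P_k."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    avg_err = {}
    for d in degrees:
        errs = []
        for i in range(k):
            val = folds[i]                                   # V_i = P_i
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            w = np.polyfit(x[train], y[train], d)            # Train(T_i, d)
            errs.append(np.mean((np.polyval(w, x[val]) - y[val]) ** 2))
        avg_err[d] = float(np.mean(errs))                    # J(d) averaged over k folds
    return min(avg_err, key=avg_err.get), avg_err            # d*, J(d)
```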
kCV-Based Model Selection
Partition D into k disjoint subsets P1, P2, …, Pk
1. For j ← 1 to k do:
1. Test set Uj ← Pj ; hold out fold Pj for testing
2. TrainingAndValidation set Dj ← D \ Uj
3. dj* ← kCV(Dj) ; find best degree d at iteration j
4. wj* ← Train(Dj, dj*) ; find associated w using full data Dj
5. J(hj*) ← Test(Uj) ; estimate unbiased predictive error of hj* on Uj
2. Performance of the method: E ← (1/k) Σ_{j=1}^{k} J(hj*)
; return E as the performance of the learning algorithm
3. Best hypothesis: hbest ← arg min_j J(hj*) ; final selected predictor
   ; several approaches can be used to come up with just one hypothesis
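
A sketch of this nested procedure, reusing the kfold_best_degree helper from the previous sketch (an assumption, not part of the slides); the outer loop holds out one fold Uj for testing while the inner cross-validation on the remaining data Dj selects dj*.

```python
import numpy as np

def nested_cv(x, y, degrees, k=10, seed=0):
    """kCV-based model selection: returns the averaged test error E
    and the best (degree, weights, error) triple across the outer folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    test_errs, models = [], []
    for j in range(k):
        test = folds[j]                                               # U_j
        rest = np.concatenate([folds[i] for i in range(k) if i != j]) # D_j
        d_star, _ = kfold_best_degree(x[rest], y[rest], degrees, k=k - 1)
        w_star = np.polyfit(x[rest], y[rest], d_star)                 # h_j* trained on all of D_j
        err = np.mean((np.polyval(w_star, x[test]) - y[test]) ** 2)   # J(h_j*) on U_j
        test_errs.append(err)
        models.append((d_star, w_star, err))
    E = float(np.mean(test_errs))                 # performance of the learning method
    h_best = min(models, key=lambda m: m[2])      # arg min_j J(h_j*)
    return E, h_best
```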
Variations of k-Fold Cross-Validation
 LOOCV: k-CV with k = m; i.e. m-fold CV
 Best but very slow on large D

 Holdout-CV: 2-fold CV with 50%T, 25%V, 25%U


 Advantage for large D but not good for small data set

 Repeated Random Sub-Sampling: k-CV with random V in each iteration


 for i ← 1 to k do:
1. Randomly select a fixed fraction αm, 0 < α < 1, of D as Vi
2. Train on D \ Vi and measure the error Ji on Vi
 Return E = (1/k) Σ_{i=1}^{k} Ji as the true error estimate (a sketch follows after this list)
 Usually k = 10 and α = 0.1
 Some samples may never be selected for validation and others may be selected many
times for validation

 k-CV is a good trade-off between true error estimate, speed, and data size
 Each sample is used for validation exactly once. Usually k = 10.
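
A compact sketch of repeated random sub-sampling for a fixed degree, assuming NumPy; the name repeated_random_subsampling is illustrative.

```python
import numpy as np

def repeated_random_subsampling(x, y, degree, k=10, alpha=0.1, seed=0):
    """In each of k rounds, hold out a random fraction alpha of D as V_i,
    train on the rest, and average the k validation errors."""
    rng = np.random.default_rng(seed)
    m = len(x)
    errs = []
    for _ in range(k):
        val = rng.choice(m, size=max(1, int(alpha * m)), replace=False)  # random V_i
        train = np.setdiff1d(np.arange(m), val)                          # D \ V_i
        w = np.polyfit(x[train], y[train], degree)
        errs.append(np.mean((np.polyval(w, x[val]) - y[val]) ** 2))      # J_i
    return float(np.mean(errs))                                          # E = (1/k) sum_i J_i
```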
Learning a Class from Examples
 Class C of a “family car”
 Prediction: Is car x a family car?
 Knowledge extraction: What do people expect from a
family car?
 Output:
Positive (+) and negative (–) examples
 Input representation:
x1: price, x2 : engine power



Training set X
X = {x^t, r^t}_{t=1}^{N}

r = 1 if x is a positive example; 0 if x is a negative example

x = [x1, x2]^T
Class C
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)



Hypothesis class H
h(x) = 1 if h says x is positive; 0 if h says x is negative

Error of h on X

E(h | X) = Σ_{t=1}^{N} 1(h(x^t) ≠ r^t)
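
A small sketch of this hypothesis class and its empirical error for the family-car example; the helper names and the toy (price, engine power) data are hypothetical.

```python
def make_rectangle_hypothesis(p1, p2, e1, e2):
    """Axis-aligned rectangle hypothesis:
    h(x) = 1 iff p1 <= price <= p2 and e1 <= engine power <= e2."""
    def h(x):
        price, power = x
        return int(p1 <= price <= p2 and e1 <= power <= e2)
    return h

def empirical_error(h, X, r):
    """E(h|X): the number of training examples where h(x^t) != r^t."""
    return sum(int(h(x) != label) for x, label in zip(X, r))

# hypothetical toy data: (price, engine power) with family-car labels
X = [(15000, 100), (22000, 150), (40000, 300), (9000, 60)]
r = [1, 1, 0, 0]
h = make_rectangle_hypothesis(10000, 30000, 80, 200)
print(empirical_error(h, X, r))   # 0 -> h is consistent with this toy set
```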



S, G, and the Version Space

most specific hypothesis, S


most general hypothesis, G

Any h ∈ H between S and G is consistent; such hypotheses make up the version space
(Mitchell, 1997)



Margin
 Choose h with largest margin



VC Dimension
 N points can be labeled in 2^N ways as +/–
 H shatters N points if, for each of these labelings, there exists an h ∈ H consistent with it: VC(H) = N

An axis-aligned rectangle shatters 4 points only !
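
A small brute-force sketch of what shattering means for axis-aligned rectangles; rectangle_consistent and shattered are illustrative helpers, and the point sets below are toy examples.

```python
from itertools import product

def rectangle_consistent(points, labels):
    """True if some axis-aligned rectangle classifies the labeled points
    perfectly: the bounding box of the positives must contain no negative."""
    pos = [p for p, lab in zip(points, labels) if lab == 1]
    neg = [p for p, lab in zip(points, labels) if lab == 0]
    if not pos:                       # a degenerate rectangle handles the all-negative labeling
        return True
    x_lo, x_hi = min(x for x, _ in pos), max(x for x, _ in pos)
    y_lo, y_hi = min(y for _, y in pos), max(y for _, y in pos)
    return not any(x_lo <= x <= x_hi and y_lo <= y <= y_hi for x, y in neg)

def shattered(points):
    """True if rectangles realize all 2^N labelings of the given points."""
    return all(rectangle_consistent(points, labels)
               for labels in product([0, 1], repeat=len(points)))

print(shattered([(0, 1), (1, 0), (0, -1), (-1, 0)]))           # True: these 4 points are shattered
print(shattered([(0, 1), (1, 0), (0, -1), (-1, 0), (0, 0)]))   # False: adding a fifth point breaks shattering
```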


Probably Approximately Correct (PAC)
Learning
 How many training examples N should we have, such that with probability at
least 1 ‒ δ, h has error at most ε ?
(Blumer et al., 1989)

 Each strip is at most ε/4


 Pr that we miss a strip: 1 − ε/4
 Pr that N instances miss a strip: (1 − ε/4)^N
 Pr that N instances miss 4 strips: 4(1 − ε/4)^N
 4(1 − ε/4)^N ≤ δ and (1 − x) ≤ exp(−x)
 4 exp(−εN/4) ≤ δ and N ≥ (4/ε) log(4/δ)
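
A one-function sketch of this bound; the name pac_sample_size is illustrative, and log is the natural logarithm, matching the (1 − x) ≤ exp(−x) step.

```python
import math

def pac_sample_size(eps, delta):
    """Smallest integer N with N >= (4/eps) * log(4/delta), the slide's
    sample-complexity bound for the axis-aligned rectangle class."""
    return math.ceil((4.0 / eps) * math.log(4.0 / delta))

# error at most 0.05 with probability at least 0.95
print(pac_sample_size(0.05, 0.05))   # 351
```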

Noise and Model Complexity
Use the simpler one because
 Simpler to use
(lower computational
complexity)
 Easier to train (lower
space complexity)
 Easier to explain
(more interpretable)
 Generalizes better (lower
variance - Occam’s razor)



Multiple Classes, Ci i=1,...,K
X = {x^t, r^t}_{t=1}^{N}

r_i^t = 1 if x^t ∈ Ci; 0 if x^t ∈ Cj, j ≠ i

Train hypotheses hi(x), i = 1, …, K:

hi(x^t) = 1 if x^t ∈ Ci; 0 if x^t ∈ Cj, j ≠ i
Regression

X = x , r
t

t N
t =1
g(x ) = w1x + w0
r t 
g(x ) = w2 x 2 + w1x + w0
r t = f (x t ) + 

1 N t
N t =1

E (g | X ) =  r − g (x )
t 2

1 N t
N t =1

E (w1 ,w0 | X ) =  r − (w1 x + w0 )
t 2

Model Selection & Generalization
 Learning is an ill-posed problem; data is not sufficient to find
a unique solution
 The need for inductive bias, assumptions about H
 Generalization: How well a model performs on new data
 Overfitting: H more complex than C or f
 Underfitting: H less complex than C or f

Triple Trade-Off

 There is a trade-off between three factors (Dietterich, 2003):


1. Complexity of H, c (H),
2. Training set size, N,
3. Generalization error, E, on new data
 As N increases, E decreases
 As c(H) increases, E first decreases and then increases



Cross-Validation
 To estimate generalization error, we need data unseen during
training. We split the data as
 Training set (50%)
 Validation set (25%)
 Test (publication) set (25%)
 Resampling when there is little data



Dimensions of a Supervised Learner
1. Model: g(x | θ)

2. Loss function: E(θ | X) = Σ_t L(r^t, g(x^t | θ))

3. Optimization procedure: θ* = arg min_θ E(θ | X)

Textbook/ Reference Materials

 Introduction to Machine Learning by Ethem Alpaydin


 Machine Learning: An Algorithmic Perspective by
Stephen Marsland
 Pattern Recognition and Machine Learning by
Christopher M. Bishop

