ML Lecture 5

Lecture 5 of CP-6107 Machine Learning covers the theoretical foundations of machine learning, focusing on PAC Learning and Agnostic PAC Learning. It discusses the importance of empirical risk minimization (ERM) with inductive bias, the role of probability in sampling training sets, and the definitions of PAC learnability and agnostic PAC learnability. Additionally, it emphasizes the significance of generalization and the challenges of finding consistent hypotheses in the context of learning problems.

CP-6107: Machine Learning

Lecture 5

Muhammad Majid ([email protected]) CP-6107 Machine Learning


Lecture 4: Summary
• Theoretical Foundation of Machine Learning
• Empirical Risk Minimization (ERM)
• ERM with Inductive Bias



Lecture 5: Outline
• Theoretical Foundation of Machine Learning
• PAC Learning
• Agnostic PAC Learning
• Generalized Loss Function



Basic Learner Strategy
ERM with Inductive Bias
Finite Hypothesis Classes
• There is randomness in the choice of the predictor h_S and in its risk L_{D,f}(h_S), because h_S depends on the training set S, and S is picked by a random process
• It is not realistic to expect that, with full certainty, S will suffice to direct the learner toward a good classifier
• There is always some probability that the sampled training data happens to be very non-representative of the underlying distribution D
• We will therefore address the probability of sampling a training set for which L_{D,f}(h_S) is not too large (a small simulation of this sampling variability is sketched below)
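To make this concrete, the following is a minimal simulation sketch; the distribution, labeling function, and hypothesis class are illustrative choices, not from the slides. It repeatedly draws a training set S, runs ERM over a small finite class of threshold classifiers, and records how the true risk L_{D,f}(h_S) varies from draw to draw.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: X = [0, 1], D uniform over X, true labels f(x) = 1[x >= 0.3],
# and a finite hypothesis class of threshold classifiers h_t(x) = 1[x >= t].
thresholds = np.linspace(0.0, 1.0, 21)          # the finite class H
f = lambda x: (x >= 0.3).astype(int)            # true labeling function

def erm(S_x, S_y):
    """Return the threshold with the smallest empirical risk on the sample S."""
    emp_risks = [np.mean((S_x >= t).astype(int) != S_y) for t in thresholds]
    return thresholds[int(np.argmin(emp_risks))]

def true_risk(t, n=100_000):
    """Monte Carlo estimate of L_{D,f}(h_t) under the uniform distribution D."""
    x = rng.uniform(0.0, 1.0, n)
    return np.mean((x >= t).astype(int) != f(x))

m = 20                                           # training-set size
risks = []
for _ in range(200):                             # 200 independent draws of S ~ D^m
    S_x = rng.uniform(0.0, 1.0, m)
    risks.append(true_risk(erm(S_x, f(S_x))))

# h_S (and hence its risk) changes with the random draw of S; a few draws are
# noticeably non-representative of D.
print(f"mean risk = {np.mean(risks):.3f}, worst of 200 draws = {np.max(risks):.3f}")
```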
Basic Learner Strategy
ERM with Inductive Bias
Finite Hypothesis Classes
• Usually, we denote the probability of getting a non-representative sample by δ, and call (1 − δ) the confidence parameter of our prediction
• We introduce another parameter for the quality of prediction, the accuracy parameter, commonly denoted by ϵ
• We interpret the event L_{D,f}(h_S) > ϵ as a failure of the learner, while if L_{D,f}(h_S) ≤ ϵ we view the output of the algorithm as an approximately correct predictor
• We are interested in upper bounding the probability of sampling an m-tuple of instances that leads to failure of the learner
Basic Learner Strategy
ERM with Inductive Bias
Finite Hypothesis Classes
• Let H be a finite hypothesis class, let δ ∈ (0,1) and ϵ > 0, and let m be an integer that satisfies

  m ≥ log(|H|/δ) / ϵ

• Then, for any labeling function, f, and for any distribution, D, for which the realizability assumption holds (that is, for some h ∈ H, L_{D,f}(h) = 0), with probability of at least 1 − δ over the choice of an i.i.d. sample S of size m, we have that for every ERM hypothesis, h_S, it holds that

  L_{D,f}(h_S) ≤ ϵ
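As a quick numeric check of the bound above, here is a minimal sketch (an illustrative helper; the function name and example numbers are mine): it computes the smallest integer m satisfying m ≥ log(|H|/δ)/ϵ for a given class size, accuracy, and confidence.

```python
import math

def sample_complexity_finite_realizable(h_size: int, epsilon: float, delta: float) -> int:
    """Smallest integer m with m >= log(|H| / delta) / epsilon
    (the finite-class, realizable-case bound quoted above)."""
    return math.ceil(math.log(h_size / delta) / epsilon)

# Example: |H| = 1000 hypotheses, accuracy eps = 0.05, confidence 1 - delta = 0.99
# requires about 231 training examples under this bound.
print(sample_complexity_finite_realizable(1000, epsilon=0.05, delta=0.01))
```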



Basic Learner Strategy
Probably Approximately Correct (PAC) Learning
PAC Learnability Definition
• A hypothesis class H is PAC learnable if there exists a function m_H : (0,1)² → ℕ and a learning algorithm with the following property: For every ϵ, δ ∈ (0,1), for every distribution D over X, and for every labeling function f : X → {0,1}, if the realizability assumption holds with respect to H, D, f, then when running the learning algorithm on m ≥ m_H(ϵ, δ) i.i.d. examples generated by D and labeled by f, the algorithm returns a hypothesis h such that, with probability of at least 1 − δ (over the choice of the examples), L_{D,f}(h) ≤ ϵ
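As an illustrative (and entirely assumed) sketch of this definition: take the finite domain X = {0, …, 9}, H = all Boolean functions on X, a realizable target f ∈ H, choose m from the finite-class bound for a given (ϵ, δ), and check empirically that the failure event L_{D,f}(h) > ϵ occurs with frequency at most δ.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy setup: X = {0, ..., 9}, D uniform over X, H = all 2^10 Boolean functions
# on X (so |H| = 1024), and a fixed target f in H (realizable case).
f_table = rng.integers(0, 2, size=10)            # the true labeling function as a lookup table

epsilon, delta = 0.1, 0.05
m = math.ceil(math.log(2**10 / delta) / epsilon) # sample size from the finite-class bound

def erm(S_x, S_y):
    """One valid ERM output over H: agree with the sample, predict 0 on unseen points."""
    h = np.zeros(10, dtype=int)
    h[S_x] = S_y
    return h

trials, failures = 2000, 0
for _ in range(trials):
    S_x = rng.integers(0, 10, size=m)            # S ~ D^m
    h = erm(S_x, f_table[S_x])
    failures += np.mean(h != f_table) > epsilon  # L_{D,f}(h) > eps is the failure event

# The PAC definition requires the failure probability to be at most delta.
print(f"empirical failure rate = {failures / trials:.4f} (delta = {delta})")
```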



Basic Learner Strategy
Agnostic PAC Learning
Agnostic PAC Learnability Definition
• A hypothesis class H is agnostic PAC learnable if there exists a function m_H : (0,1)² → ℕ and a learning algorithm with the following property: For every ϵ, δ ∈ (0,1) and for every distribution D over X × Y, when running the learning algorithm on m ≥ m_H(ϵ, δ) i.i.d. examples generated by D, the algorithm returns a hypothesis h such that, with probability of at least 1 − δ (over the choice of the m training examples),

  L_D(h) ≤ min_{h′ ∈ H} L_D(h′) + ϵ
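As an illustrative sketch of the agnostic guarantee (the distribution, noise level, and hypothesis class are assumptions of mine, not from the slides): with noisy labels no hypothesis has zero risk, but ERM over a small finite class of thresholds still lands within a small excess of the best hypothesis in the class.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setup: X = [0, 1], labels follow 1[x >= 0.3] but are flipped with probability 0.2,
# and H is a finite class of threshold classifiers h_t(x) = 1[x >= t].
thresholds = np.linspace(0.0, 1.0, 21)

def sample(m):
    x = rng.uniform(0.0, 1.0, m)
    y = ((x >= 0.3) ^ (rng.uniform(size=m) < 0.2)).astype(int)   # noisy labels
    return x, y

def zero_one_risk(t, x, y):
    return np.mean((x >= t).astype(int) != y)

# Estimate the true risks L_D(h_t) on a large sample from the same distribution D.
x_big, y_big = sample(200_000)
best_in_class = min(zero_one_risk(t, x_big, y_big) for t in thresholds)   # min over H of L_D(h')

# ERM on a modest training sample.
x_tr, y_tr = sample(500)
t_hat = thresholds[int(np.argmin([zero_one_risk(t, x_tr, y_tr) for t in thresholds]))]

# Agnostic PAC: with high probability, L_D(h_S) <= min_{h' in H} L_D(h') + epsilon.
print(f"L_D(h_S) ≈ {zero_one_risk(t_hat, x_big, y_big):.3f}, best in class ≈ {best_in_class:.3f}")
```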



Basic Learner Strategy
Scope of Learning Problems
• Multiclass Prediction
• Real Valued Prediction (Regression)
• General Loss Function



Basic Learner Strategy
Agnostic PAC Learning
Agnostic PAC Learnability with Generalized Loss Function
• A hypothesis class H is agnostic PAC learnable with respect to a set Z and a loss function ℓ : H × Z → ℝ₊ if there exists a function m_H : (0,1)² → ℕ and a learning algorithm with the following property: For every ϵ, δ ∈ (0,1) and for every distribution D over Z, when running the learning algorithm on m ≥ m_H(ϵ, δ) i.i.d. examples generated by D, the algorithm returns a hypothesis h ∈ H such that, with probability of at least 1 − δ (over the choice of the m training examples),

  L_D(h) ≤ min_{h′ ∈ H} L_D(h′) + ϵ

where L_D(h) = E_{z∼D}[ℓ(h, z)]
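To make the generalized loss concrete, here is a small sketch under assumptions of mine (Z = X × Y with real-valued labels, squared error as ℓ): the risk L_D(h) = E_{z∼D}[ℓ(h, z)] is approximated by averaging the loss over a large i.i.d. sample of points z.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed regression setting: z = (x, y) with x uniform on [0, 1] and y = 2x + Gaussian noise.
def draw_z(m):
    x = rng.uniform(0.0, 1.0, m)
    y = 2.0 * x + rng.normal(0.0, 0.1, m)
    return np.stack([x, y], axis=1)              # each row is one point z = (x, y)

def sq_loss(h, z):
    """Generalized loss l(h, z): squared error of predictor h on the point z = (x, y)."""
    x, y = z[:, 0], z[:, 1]
    return (h(x) - y) ** 2

h = lambda x: 1.8 * x + 0.05                     # some fixed hypothesis h in H

# L_D(h) = E_{z ~ D}[l(h, z)], approximated by the mean loss on a large i.i.d. sample.
z_sample = draw_z(100_000)
print(f"estimated L_D(h) ≈ {sq_loss(h, z_sample).mean():.4f}")
```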
Inductive Learning
Recap
• Induction
• Given a training set of examples of the form (x,f(x))
• x is the input, f(x) is the output
• Return a function ℎ that approximates f(x)
• ℎ is called the hypothesis
• Hypothesis space H
• Set of all hypotheses ℎ that the learner may
consider
• Learning is a search through hypothesis space
• Objective: Find ℎ that minimizes misclassification or
more generally some error/loss function with respect
to the training examples
• But what about unseen examples?
Inductive Learning
Generalization
• A good hypothesis will generalize well
– i.e., predict unseen examples correctly

• Usually …
– Any hypothesis ℎ found to approximate the target
function f well over a sufficiently large set of
training examples will also approximate the target
function well over any unobserved examples



Inductive Learning
Definition
• Goal: find an ℎ that agrees with f on the training set
• ℎ is consistent if it agrees with f on all training examples
• Finding a consistent hypothesis is not always possible
• Insufficient hypothesis space:
  • E.g., it is not possible to learn exactly f(x) = ax + b + x·sin(x) when H = the space of polynomials of finite degree (a small numeric sketch follows below)
• Noisy data:
  • E.g., in weather prediction, identical conditions may lead to both rainy and sunny days
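To put a number on the insufficient-hypothesis-space point, the sketch below (with a and b chosen arbitrarily for the demo) fits fixed-degree polynomials to noise-free samples of f(x) = ax + b + x·sin(x); the training error never reaches zero, so no hypothesis in this H is consistent with the training set.

```python
import numpy as np

rng = np.random.default_rng(4)

# The slide's target function, with a and b picked arbitrarily for the demo.
a, b = 1.5, 0.5
f = lambda x: a * x + b + x * np.sin(x)

# A noise-free training set of 200 examples (x, f(x)).
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = f(x)

# H = polynomials of one fixed finite degree: even the best least-squares fit leaves a
# residual training error, so no hypothesis in H agrees with f on every training example.
for degree in (1, 3, 5):
    coeffs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: training MSE = {train_mse:.4f}")
```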
Inductive Learning
Definition
• A learning problem is realizable if the hypothesis space contains the true function; otherwise it is unrealizable
• It is difficult to determine whether a learning problem is realizable, since the true function is not known
• It is possible to use a very large hypothesis space
  • For example: H = the class of all Turing machines
• But there is a tradeoff between the expressiveness of a hypothesis class and the complexity of finding a good hypothesis

