Lecture 5

VC Dimension and PAC

Prof. Subir Kumar Das, Dept. of CSE


Hypothesis Space
• The hypothesis space is the set of all legal hypotheses that you can describe using the features you have chosen and the language you have chosen.
• It is the set from which the learning algorithm will pick a hypothesis.
• A hypothesis space is represented by H, and the learning algorithm outputs a hypothesis h belonging to H; this h is the output of the learning algorithm.
• The inductive bias (also known as learning bias) of a learning algorithm is
the set of assumptions that the learner uses to predict outputs.
• In machine learning, one aims to construct algorithms that are able to learn to predict a certain target output.
• The hypothesis that an algorithm comes up with depends on the data and on the restrictions and bias imposed on it.

• All the legal possible ways in which the coordinate plane can be divided to predict the outcome of the test data compose the Hypothesis Space (H).
• Each individual possible way is known as a hypothesis (h).
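As a small, self-contained illustration of these ideas (not from the slides; the threshold hypotheses and the toy data below are hypothetical), a hypothesis space can be written down explicitly, and the learner simply picks one member h ∈ H that is consistent with the training examples:

```python
# A toy hypothesis space H of 1-D threshold classifiers: h_t(x) = 1 if x >= t else 0.
def make_hypothesis(t):
    """Return one hypothesis h in H, parameterized by its threshold t."""
    return lambda x: 1 if x >= t else 0

# Hypothesis space H: one hypothesis for every half-integer threshold in [0, 10].
H = [make_hypothesis(t / 2) for t in range(21)]

# Hypothetical training data: (feature value, label) pairs.
data = [(1.0, 0), (2.5, 0), (5.0, 1), (7.5, 1)]

# The "learning algorithm": output some h in H consistent with every training example.
h_learned = next(h for h in H if all(h(x) == y for x, y in data))
print(h_learned(2.0), h_learned(6.0))  # predictions of the chosen hypothesis on new points
```

The restriction to threshold functions is the inductive bias in this sketch: it is what lets the learner generalize from four examples to unseen points.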
VC Dimension
• A dichotomy is a split of a set into two mutually exclusive subsets whose
union is the original set.
• A dichotomous variable is a type of variable that only takes on two
possible values.
• Gender: Male or Female
• Coin Flip: Heads or Tails
• Property Type: Residential or Commercial
• A set of m points {x(1), …, x(m)} is shattered by H if, for any assignment of labels {y(1), …, y(m)}, there exists some h ∈ H so that h(x(i)) = y(i) for all i = 1, …, m.
• There are 2^m different ways to separate the sample into two sub-samples (dichotomies); see the sketch below.
• While the number of hypotheses in H can be ∞, the number of dichotomies H(X1, X2, ..., XN) induced on N points is at most 2^N.
• VC dimension, short for Vapnik-Chervonenkis dimension, is a measure of
the complexity of a machine learning model.
• It is named after the mathematicians Vladimir Vapnik and Alexey Chervonenkis, who developed the concept in the 1970s as part of their work on statistical learning theory.
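To make the 2^N counting concrete, the short sketch below (illustrative only; the point names are placeholders) enumerates every dichotomy of a small sample:

```python
from itertools import product

# Enumerate every dichotomy (two-way split) of a small sample of N points.
points = ["x1", "x2", "x3"]                       # N = 3
for labels in product([0, 1], repeat=len(points)):
    positives = {p for p, y in zip(points, labels) if y == 1}
    negatives = set(points) - positives
    print(labels, positives, negatives)
# Exactly 2**3 = 8 dichotomies are printed, matching the 2^N bound above.
```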
VC Dimension
• VC dimension is defined as the largest number of points that can be
shattered by a binary classifier without misclassification.
• In other words, it is a measure of the model’s capacity to fit arbitrary
labeled datasets.
• A subset S of instances of a set X is shattered by a collection of functions F if ∀ S' ⊆ S there is a function f ∈ F such that:
• f(x) = 1 if x ∈ S′, and f(x) = 0 if x ∈ S − S′.
• It is termed informally as a measure of a model’s capacity.
• It is used frequently to guide the model selection process while
developing machine learning applications.
• If there exists a set of n points that can be shattered by the classifier
and there is no set of n+1 points that can be shattered by the classifier,
then the VC dimension of the classifier is n.
• The VC dimension helps us compare models in terms of this bias-variance tradeoff.
• Bias represents the inherent error due to the model's assumptions, while variance measures the model's sensitivity to the training data.
Example
• The VC dimension of a linear classifier in the plane is at least 3, since it can shatter a configuration of 3 points in general position (not collinear).
• In each of the 2³ = 8 possible assignments of positive and negative labels, the classifier is able to perfectly separate the two classes.

• The VC dimension of a linear classifier in the plane is, however, lower than 4.


• In a configuration of 4 points (for example, the corners of a square with an XOR-style labeling), the classifier is unable to separate the positive and negative classes in at least one assignment.
• Two lines would be necessary to separate the two classes in this situation.
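The two configurations discussed above can also be checked programmatically. The sketch below is illustrative rather than definitive: it assumes NumPy and SciPy are available, uses a linear-programming feasibility test as the notion of "separable by a line", and the point coordinates (three points in general position, four corners of a square) are hypothetical stand-ins for the figures on the slide.

```python
from itertools import product
import numpy as np
from scipy.optimize import linprog

def linearly_realizable(points, labels):
    """Check, via an LP feasibility problem, whether some line w.x + b
    separates the points labeled 1 from the points labeled 0 with margin >= 1."""
    y = np.where(np.array(labels) == 1, 1.0, -1.0)
    X = np.asarray(points, dtype=float)
    # Constraint y_i * (w . x_i + b) >= 1  <=>  -y_i * [x_i, 1] . [w, b] <= -1
    A_ub = -y[:, None] * np.hstack([X, np.ones((len(X), 1))])
    b_ub = -np.ones(len(X))
    res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * 3, method="highs")
    return res.success

def shattered(points):
    """True if a linear classifier can realize every labeling of the points."""
    return all(linearly_realizable(points, labels)
               for labels in product([0, 1], repeat=len(points)))

# Hypothetical coordinates: 3 points in general position vs. 4 corners of a square.
three = [(0, 0), (1, 0), (0, 1)]
four  = [(0, 0), (1, 1), (0, 1), (1, 0)]
print(shattered(three))   # True  -> all 2^3 = 8 labelings are realizable
print(shattered(four))    # False -> at least one of the 2^4 labelings fails
```

Because every one of the 2³ labelings of the three points is realizable while at least one labeling of the square (the XOR-style one) is not, the code reports that the first set is shattered and the second is not, consistent with a VC dimension of 3 for lines in the plane.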
VC Dimension Application
• In most cases, the exact VC dimension of a classifier is not so important.
• Rather, it is used more to classify different types of algorithms by their complexities;
• for example, the class of simple classifiers could include basic shapes like lines, circles, or rectangles,
• whereas a class of complex classifiers could include classifiers such as multilayer perceptrons, boosted trees, or other nonlinear classifiers.
• The complexity of a classification algorithm, which is directly related to
its VC dimension, is related to the trade-off between bias and variance.
• A model with a higher VC dimension will require more training data to properly train, but will be able to identify more complex relationships in the data.
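• One commonly quoted form of this relationship (a standard VC generalization bound; it is not stated on the slide, and its exact constants should be read as an assumption here) is: with probability at least 1 − 𝛿, a classifier from a class with VC dimension d trained on N examples satisfies
• test error ≤ training error + sqrt( ( d·(ln(2N/d) + 1) + ln(4/𝛿) ) / N )
• so a larger d makes the second term larger, and more training data N is needed before the guaranteed gap between training and test error becomes small.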
Computational Learning Theory

• Computational learning theory (CoLT) is a branch of AI concerned with applying mathematical methods to the design and analysis of computer learning programs.
• It involves using mathematical frameworks for the purpose of quantifying
learning tasks and algorithms.
• It can be considered to be an extension of statistical learning theory (SLT),
that makes use of formal methods for the purpose of quantifying learning
algorithms.
• Computational Learning Theory (CoLT): Formal study of learning tasks.
• Statistical Learning Theory (SLT): Formal study of learning algorithms.
• This division of learning tasks vs. learning algorithms is arbitrary, and in
practice, there is quite a large degree of overlap between these two fields.
• It is essentially a sub-field of artificial intelligence (AI) that focuses on studying the design and analysis of machine learning algorithms.
Probably Approximately Correct Learning
• PAC learning, also known as Probably Approximately Correct learning, is a theoretical machine learning framework created by Leslie Valiant.
• PAC learning aims to quantify the difficulty involved in a learning task and
it might be considered to be the main sub-field of computational learning
theory.
• PAC learning is concerned with the amount of computational effort needed in order to identify a hypothesis (fit a model) that is a close match for the unknown target function.
• PAC learning offers guarantees on the true error, i.e. how different your hypothesis (the learned function) is from the concept (the target function, your task), given that you can only measure the empirical error, the one that you get from your training sample.
• PAC learning aims to determine whether a learning algorithm can, with
high probability, produce a hypothesis that is approximately correct.
• This means the algorithm should, with a probability of at least 1−𝛿, find a
hypothesis ℎ whose error rate is within 𝜖 of the best possible hypothesis.
• Here, 𝛿 represents the confidence parameter, and 𝜖 represents the
accuracy parameter.
• In the context of Machine Learning, a problem is PAC-learnable if there is an algorithm A that, when given some independently drawn samples, will produce a hypothesis with a small error for any distribution D and any concept c, and that too with a high probability.
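• Written compactly, the requirement described above is Pr[ error(h) ≤ ε ] ≥ 1 − 𝛿, where the probability is taken over the random draw of the m training samples from D (and any internal randomness of the algorithm).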
Formal Definition
• f is the function that we want to learn, the target function.
• F is the class of functions from which f can be selected. f is an element of
F.
• X is the set of possible individuals. It is the domain of f.
• N is the cardinality of X.
• D is a probability distribution on X; this distribution is used both when the
training set is created and when the test set is created.
• ORACLE(f,D), a function that in a unit of time returns a pair of the form
(x,f(x)), where x is selected from X according to D.
• H is the set of possible hypotheses.
• h is the specific hypothesis that has been learned. h is an element of H.
• m is the cardinality of the training set.
• error(h) = Probability[f(x) != h(x), x chosen from X according to D]
• Definition: A class of functions F is Probably Approximately Correct (PAC) Learnable if:
• there is a learning algorithm L that for all f in F, all distributions D on X,
• all epsilon (0 < ε < 1) and delta (0 < 𝛿 < 1), will produce a hypothesis h, such that
• the probability is at most 𝛿 that error(h) > ε.
• L has access to the values of ε and 𝛿, and to ORACLE(f,D)
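For the special case of a finite hypothesis class and a learner that outputs a hypothesis consistent with the training set, a standard sample-complexity bound (assumed here; it is not part of the slide) states that m ≥ (1/ε)·(ln|H| + ln(1/𝛿)) examples suffice. A minimal sketch of that calculation:

```python
import math

def pac_sample_size(hypothesis_count, epsilon, delta):
    """Standard bound for a finite hypothesis class H and a consistent learner:
    m >= (1/epsilon) * (ln|H| + ln(1/delta)) training examples suffice so that,
    with probability at least 1 - delta, the learned hypothesis has error <= epsilon."""
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)

# Hypothetical numbers: |H| = 10**6 hypotheses, epsilon = 0.05, delta = 0.01.
print(pac_sample_size(10**6, epsilon=0.05, delta=0.01))  # a few hundred examples
```

Note that the bound grows only logarithmically in |H| and 1/𝛿 but linearly in 1/ε, which is one concrete way a learner can stay polynomial in the quantities listed on the next slide.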
Probably Approximately Correct Learning
• F is Efficiently PAC Learnable if the running time of L is polynomial in 1/ε, 1/𝛿, and ln(N).
• It is Polynomial PAC Learnable if m is polynomial in 1/ε, 1/𝛿, and the size of the (minimal) descriptions of individuals and of the concept.
• In PAC learning, the learning process involves several key components:
• Concept Class (C): The set of all possible target functions.
• Hypothesis Class (H): The set of all possible hypotheses that the learning
algorithm can produce.
• Distribution (D): The probability distribution over the input space.
• Sample Size (m): The number of training examples drawn independently
from D.
• The goal is to find a hypothesis ℎ∈𝐻 that approximates the target concept
𝑐∈𝐶 well enough.



Applications and Limitations
• PAC learning provides a theoretical basis for designing and evaluating
machine learning algorithms.
• It helps in understanding the trade-offs between sample size, accuracy,
and confidence, guiding the development of efficient and reliable
algorithms.
• PAC learning aids in model selection by providing criteria for choosing
models that balance complexity and performance.
• It helps in determining the appropriate hypothesis class and regularization
techniques to avoid overfitting.
• PAC learning principles can be applied to determine the optimal querying
strategy to achieve the desired accuracy and confidence with fewer
labeled or unlabeled examples.
• PAC learning assumes that the data distribution is fixed (stationary), which may not hold in real-world scenarios where distributions can change over time.
• Calculating the exact VC dimension for complex hypothesis classes can be
computationally infeasible, limiting practical applications.
• The hypothesis class needs to be expressive enough to contain a good approximation of the target function, which can be challenging for complex problems.
Thank You

Prof. Subir Kumar Das, Dept. of CSE
