
Day & Time: Monday (10am-11am & 3pm-4pm)

Tuesday (10am-11am)
Wednesday (10am-11am & 3pm-4pm)
Friday (9am-10am, 11am-12pm, 2pm-3pm)
Dr. Srinivasa L. Chakravarthy
&
Smt. Jyotsna Rani Thota
Department of CSE
GITAM Institute of Technology (GIT)
Visakhapatnam – 530045
Email: [email protected] & [email protected]
EID 403: Machine Learning
Course objectives

● Explore about various disciplines connected with ML.


● Explore about efficiency of learning with inductive bias.
● Explore about identification of Ml algorithms like decision
tree learning.
● Explore about algorithms like Artificial Neural networks,
genetic programming, Bayesian algorithm, Nearest neighbor
algorithm, Hidden Markov chain model.



Learning Outcomes

● Identify the various applications connected with ML.
● Classify the efficiency of ML algorithms with the inductive bias
technique.
● Distinguish the purpose of each ML algorithm.
● Analyze an application and correlate it with the available ML
algorithms.
● Choose an appropriate ML algorithm to develop a project.



Syllabus




Reference book 1. Title: Machine Learning
Author: Tom M. Mitchell

Reference book 2. Title: Introduction to Machine Learning
Author: Ethem Alpaydin



Module 3 (Chapter 7)
It includes:

Chapter 6: Bayesian Learning

&

Chapter 9: Computational Learning Theory

● Probably learning an approximately correct hypothesis
● Sample complexity
● Finite and infinite hypothesis spaces
● Mistake bound model of learning
Introduction
This chapter introduces:

- A characterization of the difficulty of several types of ML problems.

- The capabilities of several types of ML algorithms.

- Two specific frameworks for analysing ML algorithms:

1. The probably approximately correct (PAC) framework, in which we identify
classes of hypotheses that can and cannot be learned from a polynomial
number of training examples.
2. The mistake bound framework, in which we examine the number of training
errors made by a learner before it determines the correct hypothesis.
Introduction (cont.)

The goal in this chapter is to answer questions from computational learning
theory such as:

1. Sample complexity: How many training examples are needed for a
learner to converge (with high probability) to a successful hypothesis?

2. Computational complexity: How much computational effort is needed for
a learner to converge (with high probability) to a successful hypothesis?

3. Mistake bound: How many training examples will the learner misclassify
before converging to a successful hypothesis?
Probably learning an approximately correct hypothesis-

The Problem Setting-

● A set of instances X, described by attributes such as age (young/old) and height (short/tall).
● A set of hypotheses H.
● A set of possible target concepts C (i.e., each c ∈ C is a boolean function c: X → {0, 1}).
● Training instances generated at random from X according to a fixed, unknown
probability distribution D.

For example, D might be the distribution of instances generated by observing
"people who walk out of the largest sports store."
Probably learning an approximately correct hypothesis(cont.)

The Problem Setting-(cont.)

The learner L considers some set H of possible hypotheses when attempting
to learn the target concept.

After observing a sequence of training examples of the target concept c, L must
output some hypothesis h from H, which is its estimate of c.

We evaluate the success of L by the performance of h over new instances
drawn randomly from X according to D.

Within this setting, we are interested in characterizing the performance of various
learners L using various hypothesis spaces H, when learning individual target concepts
drawn from various classes C.
Probably learning an approximately correct hypothesis(cont.)

Error of a Hypothesis-
To identify how closely the learner's output hypothesis h approximates the
actual target concept c, we define the true error of h with respect to c and
distribution D as the probability that h misclassifies an instance drawn at
random according to D:

error_D(h) ≡ Pr_{x~D} [ c(x) ≠ h(x) ]
Probably learning an approximately correct hypothesis(cont.)

Error of a Hypothesis-(cont.)

● Two notions of error: the training error of h is the fraction of training
examples misclassified by h, while the true error error_D(h) is the probability
that h will misclassify a single instance drawn at random from the
distribution D.
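A minimal sketch contrasting the two notions, assuming a toy uniform distribution and an illustrative concept c and hypothesis h (the slides leave X, D, c, and h abstract); the true error is approximated by Monte Carlo sampling:

```python
import random

def c(x):            # illustrative target concept (assumed for this sketch)
    return x < 50

def h(x):            # illustrative hypothesis, imperfect near the boundary
    return x < 55

def sample_error(h, c, examples):
    # Training (sample) error: fraction of the sample misclassified by h.
    return sum(h(x) != c(x) for x in examples) / len(examples)

def true_error_estimate(h, c, draw, n=100_000):
    # error_D(h) = Pr_{x~D}[c(x) != h(x)], approximated by sampling from D.
    mistakes = 0
    for _ in range(n):
        x = draw()
        mistakes += h(x) != c(x)
    return mistakes / n

draw = lambda: random.randrange(100)    # x ~ D: uniform over {0, ..., 99}
train = [draw() for _ in range(20)]
print(sample_error(h, c, train))        # varies with the sample drawn
print(true_error_estimate(h, c, draw))  # ~0.05 (x in 50..54 are misclassified)
```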


Probably learning an approximately correct hypothesis(cont.)

PAC Learnability-

Our aim is to characterize classes of target concepts that can be reliably
learned from a reasonable number of randomly drawn training examples.
What kinds of statements about learnability can we make?

Generally, two difficulties exist:

1. Unless we provide training examples corresponding to every possible
instance in X (an unrealistic assumption), there may be multiple hypotheses
consistent with the provided training examples, and the learner cannot be
certain to pick the one corresponding to the target concept.

2. Given that the training examples are drawn randomly, there is always a
nonzero probability that the training examples encountered by the learner
will be misleading.
Probably learning an approximately correct hypothesis(cont.)

PAC Learnability-(cont.)

To accommodate those difficulties:

1. We will not require that the learner output a zero-error hypothesis;
instead we require only that its error be bounded by some constant ε
that can be made arbitrarily small.
2. We will not require that the learner succeed for every sequence of
randomly drawn training examples; instead we require only that its
probability of failure be bounded by some constant δ that can be made
arbitrarily small.

In short, we require only that the learner probably learn a hypothesis that is
approximately correct; hence the term probably approximately correct (PAC)
learning.
Probably learning an approximately correct hypothesis(cont.)

PAC Learnability-(cont.)

Definition: Consider a concept class C defined over a set of instances X of
length n, and a learner L using hypothesis space H. C is PAC-learnable by L
using H if, for all c ∈ C, all distributions D over X, all ε with 0 < ε < 1/2, and
all δ with 0 < δ < 1/2, learner L will with probability at least (1 − δ) output a
hypothesis h ∈ H such that error_D(h) ≤ ε, in time that is polynomial in 1/ε,
1/δ, n, and size(c).

Here, n is the size of instances in X and size(c) is the encoding length of c in C.
For example, if C is the class of conjunctions of up to k boolean features, then
size(c) is the number of boolean features actually used to describe c.
Sample complexity for Finite hypothesis spaces-

PAC-learnability is largely determined by the number of training examples required by the learner.

The growth in the number of required training examples with problem size is called the
"sample complexity" of the learning problem.

In practice, the factor that most limits the success of a learner is the limited
availability of training data.

Here we present a general bound on the sample complexity for a very broad class of
learners, called consistent learners. A learner is consistent if it outputs hypotheses
that perfectly fit the training data, whenever possible.
Sample complexity for Finite hypothesis spaces-(cont.)

Recall that the version space VS_{H,D} is the set of all hypotheses h ∈ H that
correctly classify the training examples D.

Every consistent learner outputs a hypothesis belonging to the version space.

Therefore, to bound the number of examples needed by any consistent learner, we
need only bound the number of examples needed to assure that the version space
contains no unacceptable hypotheses.
Sample complexity for Finite hypothesis spaces-(cont.)
Exhausting the Version Space

Definition: The version space VS_{H,D} is said to be ε-exhausted with respect
to c and D if every hypothesis h in VS_{H,D} has true error less than ε with
respect to c and D.
Sample complexity for Finite hypothesis spaces-(cont.)
How many examples will ε-exhaust the VS?

Theorem (Haussler, 1988): If the hypothesis space H is finite, and D is a
sequence of m ≥ 1 independent randomly drawn examples of some target
concept c, then for any 0 ≤ ε ≤ 1, the probability that the version space
VS_{H,D} is not ε-exhausted is at most |H| e^(−εm).

Requiring this probability to be at most δ and solving for m yields the sample
complexity bound

m ≥ (1/ε)(ln|H| + ln(1/δ)).
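A small calculator for this bound; a sketch only, with an arbitrary illustrative |H|:

```python
import math

def m_finite_H(H_size, eps, delta):
    # m >= (1/eps) * (ln|H| + ln(1/delta)): examples sufficient for any
    # consistent learner to eps-exhaust VS_{H,D} with probability >= 1 - delta.
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / eps)

print(m_finite_H(973, eps=0.1, delta=0.05))   # -> 99
```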
Sample complexity for Finite hypothesis spaces-(cont.)
Conjunctions of Boolean Literals Are PAC-Learnable

Consider the class C of target concepts described by conjunctions of boolean
literals over n boolean variables. Here |H| = 3^n, because each variable can
appear in a hypothesis positively, appear negated, or be absent. Substituting
into the theorem above gives

m ≥ (1/ε)(n ln 3 + ln(1/δ)).

Since m grows polynomially in n, 1/ε, and 1/δ, and a consistent hypothesis can
be found in time polynomial in these quantities (e.g., by FIND-S), conjunctions
of boolean literals are PAC-learnable.
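Plugging in numbers, for example with n = 10, ε = 0.1, and δ = 0.05:

```python
import math

def m_boolean_conjunctions(n, eps, delta):
    # |H| = 3**n for conjunctions of boolean literals over n variables,
    # so m >= (1/eps) * (n ln 3 + ln(1/delta)).
    return math.ceil((n * math.log(3) + math.log(1 / delta)) / eps)

print(m_boolean_conjunctions(10, eps=0.1, delta=0.05))   # -> 140
```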
Sample complexity for Finite hypothesis spaces-(cont.)
Agnostic Learning

If H does not contain the target concept c, then a zero-error hypothesis cannot
always be found.

In this case, we might ask our learner to output the hypothesis from H that has the
minimum error over the training examples.

A learner that makes no assumption that the target concept is representable by H,
and that simply finds the hypothesis with minimum training error, is often called an
agnostic learner.
Sample complexity for Finite hypothesis spaces-(cont.)
Agnostic Learning (cont.)

By the Hoeffding bounds, Pr[error_D(h) > error_train(h) + ε] ≤ e^(−2mε²) for a
single hypothesis. Requiring that this failure probability, summed over all |H|
hypotheses, be at most δ and solving for m gives the sample complexity for
agnostic learning:

m ≥ (1/(2ε²))(ln|H| + ln(1/δ))

Note that m now grows as the square of 1/ε, rather than linearly as in the
consistent-learner case.
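The same calculator idea for the agnostic bound, reusing the illustrative |H| from above to show how much larger the requirement becomes:

```python
import math

def m_agnostic(H_size, eps, delta):
    # m >= (1 / (2 eps^2)) * (ln|H| + ln(1/delta)); note the 1/eps^2 growth.
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / (2 * eps ** 2))

print(m_agnostic(973, eps=0.1, delta=0.05))   # -> 494 (vs. 99 in the consistent case)
```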
Sample complexity for Infinite hypothesis spaces-

So far we have seen that the sample complexity for PAC learning grows as the
logarithm of the size of the hypothesis space. But there are two drawbacks to
characterizing sample complexity in terms of |H|:

1. It can lead to quite weak bounds.

2. In the case of infinite hypothesis spaces we cannot apply the bound
m ≥ (1/ε)(ln|H| + ln(1/δ)) at all.

Here we consider a second measure of the complexity of H, called the
Vapnik-Chervonenkis dimension, or VC dimension, written VC(H).

We use VC(H) instead of |H| to state bounds on sample complexity.

Sample complexity for Infinite hypothesis spaces-(cont.)
Shattering a set of Instances
The VC dimension measures the complexity of H not by the number of distinct
hypotheses |H|, but by the number of distinct instances from X that can be
completely discriminated using H.

Definition: A set of instances S is shattered by hypothesis space H if and only
if for every dichotomy of S (every possible partition of S into positive and
negative instances) there exists some hypothesis in H consistent with that
dichotomy.

[Figure: A set of 3 instances shattered by eight hypotheses. For every possible
dichotomy of the instances, there exists a corresponding hypothesis.]
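A brute-force shattering check, practical only for small cases; the threshold hypothesis class below is an illustrative assumption, not from the slides:

```python
from itertools import product

def shatters(hypotheses, instances):
    # True iff every dichotomy of `instances` is realized by some hypothesis;
    # `hypotheses` is a collection of callables x -> bool.
    realized = {tuple(h(x) for x in instances) for h in hypotheses}
    return all(d in realized for d in product([False, True], repeat=len(instances)))

# Illustrative class (an assumption): thresholds h_t(x) = (x >= t),
# which can realize only "upper" labelings of the number line.
thresholds = [lambda x, t=t: x >= t for t in range(11)]
print(shatters(thresholds, [5]))     # True  -> VC dimension >= 1
print(shatters(thresholds, [3, 7]))  # False -> no h labels 3 positive, 7 negative
```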
Sample complexity for Infinite hypothesis spaces-(cont.)
Vapnik-Chervonenkis Dimension

Definition: The Vapnik-Chervonenkis dimension, VC(H), of hypothesis space H
defined over instance space X is the size of the largest finite subset of X
shattered by H. If arbitrarily large finite subsets of X can be shattered by H,
then VC(H) ≡ ∞.

Note: For any finite H, VC(H) ≤ log₂|H|. To see this, suppose VC(H) = d. Then H
requires 2^d distinct hypotheses to shatter d instances; hence 2^d ≤ |H|, and
d = VC(H) ≤ log₂|H|.


Sample complexity for Infinite hypothesis spaces-(cont.)
Example: VC dimension for conjunctions of boolean literals

Suppose each instance in X is described by the values of exactly 3 boolean
literals, and suppose each hypothesis in H is a conjunction of up to 3 boolean
literals. What is VC(H)?

We can show VC(H) ≥ 3 as follows. Represent each instance by a 3-bit string
giving the values of the literals l1, l2, l3:

Instance1: 100
Instance2: 010
Instance3: 001

This set of three instances can be shattered by H: to construct a hypothesis
consistent with any given dichotomy, include the literal ¬l_i in the conjunction
for each instance i that is to be labeled negative. For example, the hypothesis
¬l1 ∧ ¬l3 covers instance 2 while excluding instances 1 and 3.
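A quick enumeration verifying that these three instances are shattered by all 3^3 = 27 conjunctions; a sketch with hypotheses encoded as per-literal constraints:

```python
from itertools import product

# Instances as 3-bit tuples: the values of literals l1, l2, l3.
instances = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

# Encode a conjunction as one constraint per literal: 1 (literal required
# true), 0 (required false), or None (literal absent from the conjunction).
def satisfies(x, conj):
    return all(c is None or xi == c for xi, c in zip(x, conj))

conjunctions = list(product([None, 0, 1], repeat=3))   # |H| = 3**3 = 27

realized = {tuple(satisfies(x, conj) for x in instances) for conj in conjunctions}
print(len(realized) == 2 ** len(instances))   # True: all 8 dichotomies realized
```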
Sample complexity for Infinite hypothesis spaces-(cont.)
Example: VC dimension for linear decision surfaces

The VC dimension of linear decision surfaces in the x, y plane is 3.

[Figure: (a) A set of 3 points that can be shattered using linear decision
surfaces. (b) A set of 3 collinear points that cannot be shattered.]

The definition requires only that some set of 3 points be shattered, and
VC(H) = 3 because no set of 4 points in the plane can be shattered by
linear decision surfaces.

NOTE: The VC dimension for conjunctions of n boolean literals is at least n,
by the argument on the previous slide. In fact it is exactly n, though showing
this is more difficult, because it requires demonstrating that no set of n + 1
instances can be shattered.
Sample complexity for Infinite hypothesis spaces-(cont.)
Sample complexity and VC Dimension
Earlier, we considered the question: how many randomly drawn examples
suffice to ε-exhaust VS_{H,D} with probability at least (1 − δ)?

The answer was m ≥ (1/ε)(ln|H| + ln(1/δ)); here the number of training
examples grows logarithmically in |H|.

Now, using VC(H) for the complexity of H, it is possible to derive an
alternative bound (Blumer et al., 1989):

m ≥ (1/ε)(4 log₂(2/δ) + 8 VC(H) log₂(13/ε))

Here the number of training examples grows log times linear in 1/ε.
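The same calculator pattern for the VC-based bound; the VC(H) = 3 value below is the linear-decision-surface example from earlier:

```python
import math

def m_vc(vc, eps, delta):
    # m >= (1/eps) * (4 log2(2/delta) + 8 * VC(H) * log2(13/eps))
    return math.ceil((4 * math.log2(2 / delta) + 8 * vc * math.log2(13 / eps)) / eps)

print(m_vc(3, eps=0.1, delta=0.05))   # -> 1899
```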
Mistake Bound model
So far we have considered learning settings that vary in:

1. how the training examples are generated;
2. noise in the data;
3. the definition of success (whether the target concept must be learned
exactly, or only probably and approximately);
4. the measures according to which the learner is evaluated.

Now consider the mistake bound model of learning, in which

the learner is evaluated by the total number of mistakes it makes before it
converges to the correct hypothesis.
Mistake Bound model(cont.)

The mistake bound learning problem may be studied in various specific settings.

For example, we might count the number of mistakes made before PAC-learning
the target concept.

In the examples below, we instead consider the number of mistakes made before
learning the target concept exactly, i.e., before converging to a hypothesis h
such that h(x) = c(x) for every x.
Mistake Bound for FIND-S Algorithm

FIND-S converges in the limit to a hypothesis that makes no errors, provided C ⊆ H and the training data is noise-free.


Mistake Bound for FIND-S Algorithm(cont.)

FIND-S begins with the most specific hypothesis and generalizes it as positive
training examples are observed.

Consequently, FIND-S can never mistakenly classify a negative example as
positive.

So, to bound the number of mistakes made by FIND-S, we need only count the
number of times it misclassifies a truly positive example as negative.

For target concepts that are conjunctions of up to n boolean literals, FIND-S
makes at most n + 1 mistakes: the first mistake (on the first positive example)
removes up to n of the 2n literals in the initial, most specific hypothesis, and
each subsequent mistake removes at least one more literal.
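A minimal FIND-S sketch for boolean conjunctions with a mistake counter; the target concept and example sequence are illustrative assumptions:

```python
# Instances are tuples of 0/1 values; a hypothesis is a set of literals
# (i, v) meaning "x[i] == v". The initial hypothesis holds all 2n literals
# (the most specific hypothesis, satisfied by no instance).
def find_s(examples, n):
    h = {(i, v) for i in range(n) for v in (0, 1)}
    mistakes = 0
    for x, label in examples:
        predicted = all(x[i] == v for i, v in h)
        if predicted != label:
            mistakes += 1                 # FIND-S errs only on positive examples
        if label:                         # generalize: drop unsatisfied literals
            h = {(i, v) for i, v in h if x[i] == v}
    return h, mistakes

# Illustrative target: l1 AND NOT l3 over n = 3 variables (an assumption).
examples = [((1, 0, 0), True), ((1, 1, 0), True),
            ((0, 1, 0), False), ((1, 1, 1), False)]
h, m = find_s(examples, 3)
print(sorted(h), m)   # [(0, 1), (2, 0)] 2  -- at most n + 1 = 4 mistakes
```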
Mistake Bound for Halving Algorithm

● The Halving algorithm predicts by majority vote: if the majority of version
space hypotheses classify a new instance as positive, then this prediction is
output by the learner; once the true classification is revealed, all inconsistent
hypotheses are removed from the version space.

The mistake bound of the Halving algorithm follows from the fact that it makes
a mistake only when the majority of hypotheses in its current version space
misclassify the new instance.

In that case, once the true classification is revealed, the version space is
reduced to at most half its current size, since the mistaken majority is
eliminated. Hence the Halving algorithm makes at most log₂|H| mistakes before
exactly learning the target concept.
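A minimal Halving-algorithm sketch; the threshold hypothesis class and example sequence are illustrative assumptions:

```python
# Predict by majority vote of the current version space, then remove every
# hypothesis inconsistent with the revealed label.
def halving(hypotheses, examples):
    vs = list(hypotheses)
    mistakes = 0
    for x, label in examples:
        votes = sum(h(x) for h in vs)
        predicted = 2 * votes >= len(vs)       # majority vote (ties -> positive)
        if predicted != label:
            mistakes += 1                      # wrong majority: |VS| at least halves
        vs = [h for h in vs if h(x) == label]  # keep only consistent hypotheses
    return vs, mistakes

# Illustrative class (an assumption): thresholds h_t(x) = (x >= t), target t = 7.
hyps = [lambda x, t=t: x >= t for t in range(11)]
vs, m = halving(hyps, [(5, False), (9, True), (6, False), (7, True)])
print(len(vs), m)   # 1 2  -- mistakes <= log2(11) ~ 3.46
```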
Optimal Mistake Bounds

For any concept class C, the optimal mistake bound Opt(C) is defined as the
minimum, over all possible learning algorithms A, of the worst-case mistake
bound M_A(C). From the analyses above, M_FIND-S(C) = n + 1 and
M_Halving(C) ≤ log₂(|C|); Littlestone (1987) showed that
VC(C) ≤ Opt(C) ≤ M_Halving(C) ≤ log₂(|C|).


END OF CHAPTER 7 & MODULE 3
