
Day & Time: Monday (10am-11am & 3pm-4pm)

Tuesday (10am-11am)
Wednesday (10am-11am & 3pm-4pm)
Friday (9am-10am, 11am-12pm, 2pm-3pm)
Dr. Srinivasa L. Chakravarthy
&
Smt. Jyotsna Rani Thota
Department of CSE
GITAM Institute of Technology (GIT)
Visakhapatnam – 530045
Email: [email protected] & [email protected]
EID 403: Machine Learning
Course objectives

● Explore about various disciplines connected with ML.


● Explore about efficiency of learning with inductive bias.
● Explore about identification of Ml algorithms like decision
tree learning.
● Explore about algorithms like Artificial Neural networks,
genetic programming, Bayesian algorithm, Nearest neighbor
algorithm, Hidden Markov chain model.



Learning Outcomes

● Identify the various applications connected with ML.
● Classify the efficiency of ML algorithms with the inductive bias
technique.
● Distinguish the purpose of each ML algorithm.
● Analyze an application and correlate it with the available ML
algorithms.
● Choose an appropriate ML algorithm to develop a project.



Syllabus




Reference book 1. Title: Machine Learning
Author: Tom M. Mitchell

Reference book 2. Title: Introduction to Machine Learning
Author: Ethem Alpaydin



Module 3 (Chapter 7)
It includes:

Chapter 6: Bayesian Learning

&

Chapter 9: Computational Learning Theory

● Probably learning an approximately correct hypothesis
● Sample complexity
● Finite and infinite hypothesis spaces
● Mistake bound model of learning
Introduction
This chapter introduces:

- A characterization of the difficulty of several types of ML problems.

- The capabilities of several types of ML algorithms.

- Two specific frameworks for analysing ML algorithms:

1. The probably approximately correct (PAC) framework, in which we identify
classes of hypotheses that can and cannot be learned from a polynomial
number of training examples.
2. The mistake bound framework, in which we examine the number of training
errors made by a learner before it determines the correct hypothesis.
Introduction (cont.)

The goal in this chapter is to answer questions from computational learning
theory such as:

1. Sample complexity: How many training examples are needed for a
learner to converge (with high probability) to a successful hypothesis?

2. Computational complexity: How much computational effort is needed for
a learner to converge (with high probability) to a successful hypothesis?

3. Mistake bound: How many training examples will the learner misclassify
before converging to a successful hypothesis?
Probably learning an approximately correct hypothesis-

The Problem Setting-

● A set of instances X, described by attributes such as age (young/old) and height (short/tall).
● A set of hypotheses H.
● A set of possible target concepts C (i.e., each c ∈ C is a boolean function c: X → {0, 1}).
● Training instances generated at random from X according to a fixed, unknown
probability distribution D.

For example, D might be the distribution of instances generated by observing
"people who walk out of the largest sports store."
Probably learning an approximately correct hypothesis(cont.)

The Problem Setting-(cont.)

The learner L considers some set H of possible hypotheses when attempting
to learn the target concept.

After observing a sequence of training examples of the target concept c, L must
output some hypothesis h from H, which is its estimate of c.

We evaluate the success of L by the performance of h over new instances
drawn randomly from X according to D.

Within this setting, we are interested in characterizing the performance of various
learners L using various hypothesis spaces H, when learning individual target concepts
drawn from various classes C.
Probably learning an approximately correct hypothesis(cont.)

Error of a Hypothesis-
To identify how closely the learner's output hypothesis h approximates the
actual target concept c, we define the true error of h with respect to c and
distribution D as the probability that h misclassifies an instance drawn at
random according to D:

error_D(h) ≡ Pr_{x~D} [ c(x) ≠ h(x) ]
Probably learning an approximately correct hypothesis(cont.)

Error of a Hypothesis-(cont.)

● Two notions of error: the training error of h is the fraction of training
examples misclassified by h, while the true error error_D(h) is the probability
that h will misclassify a single instance drawn at random from the
distribution D.
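A minimal sketch contrasting the two notions, assuming a toy uniform distribution and an illustrative concept c and hypothesis h (the slides leave X, D, c, and h abstract); the true error is approximated by Monte Carlo sampling:

```python
import random

def c(x):            # illustrative target concept (assumed for this sketch)
    return x < 50

def h(x):            # illustrative hypothesis, imperfect near the boundary
    return x < 55

def sample_error(h, c, examples):
    # Training (sample) error: fraction of the sample misclassified by h.
    return sum(h(x) != c(x) for x in examples) / len(examples)

def true_error_estimate(h, c, draw, n=100_000):
    # error_D(h) = Pr_{x~D}[c(x) != h(x)], approximated by sampling from D.
    mistakes = 0
    for _ in range(n):
        x = draw()
        mistakes += h(x) != c(x)
    return mistakes / n

draw = lambda: random.randrange(100)    # x ~ D: uniform over {0, ..., 99}
train = [draw() for _ in range(20)]
print(sample_error(h, c, train))        # varies with the sample drawn
print(true_error_estimate(h, c, draw))  # ~0.05 (x in 50..54 are misclassified)
```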


Probably learning an approximately correct hypothesis(cont.)

PAC Learnability-

Our aim is to characterize classes of target concepts that can be reliably
learned from a reasonable number of randomly drawn training examples.
What kinds of statements about learnability can we make?

Generally, two difficulties exist:

1. Unless we provide training examples corresponding to every possible
instance in X (an unrealistic assumption), there may be multiple hypotheses
consistent with the provided training examples, and the learner cannot be
certain to pick the one corresponding to the target concept.

2. Given that the training examples are drawn randomly, there is always a
nonzero probability that the training examples encountered by the learner
will be misleading.
Probably learning an approximately correct hypothesis(cont.)

PAC Learnability-(cont.)

To accommodate those difficulties:

1. We will not require that the learner output a zero-error hypothesis;
instead we require only that its error be bounded by some constant ε
that can be made arbitrarily small.
2. We will not require that the learner succeed for every sequence of
randomly drawn training examples; instead we require only that its
probability of failure be bounded by some constant δ that can be made
arbitrarily small.

In short, we require only that the learner probably learn a hypothesis that is
approximately correct; hence the term probably approximately correct (PAC)
learning.
Probably learning an approximately correct hypothesis(cont.)

PAC Learnability-(cont.)

Definition: Consider a concept class C defined over a set of instances X of
length n, and a learner L using hypothesis space H. C is PAC-learnable by L
using H if, for all c ∈ C, all distributions D over X, all ε with 0 < ε < 1/2, and
all δ with 0 < δ < 1/2, learner L will with probability at least (1 − δ) output a
hypothesis h ∈ H such that error_D(h) ≤ ε, in time that is polynomial in 1/ε,
1/δ, n, and size(c).

Here, n is the size of instances in X and size(c) is the encoding length of c in C.
For example, if C is the class of conjunctions of up to k boolean features, then
size(c) is the number of boolean features actually used to describe c.
Sample complexity for Finite hypothesis spaces-

PAC-learnability is largely determined by the number of training examples required by the learner.

The growth in the number of required training examples with problem size is called the
"sample complexity" of the learning problem.

In practice, the factor that most limits the success of a learner is the limited
availability of training data.

Here we present a general bound on the sample complexity for a very broad class of
learners, called consistent learners. A learner is consistent if it outputs hypotheses
that perfectly fit the training data, whenever possible.
Sample complexity for Finite hypothesis spaces-(cont.)

Recall that the version space VS_{H,D} is the set of all hypotheses h ∈ H that
correctly classify the training examples D.

Every consistent learner outputs a hypothesis belonging to the version space.

Therefore, to bound the number of examples needed by any consistent learner, we
need only bound the number of examples needed to assure that the version space
contains no unacceptable hypotheses.
Sample complexity for Finite hypothesis spaces-(cont.)
Exhausting the Version Space

Definition: The version space VS_{H,D} is said to be ε-exhausted with respect
to c and D if every hypothesis h in VS_{H,D} has true error less than ε with
respect to c and D.
Sample complexity for Finite hypothesis spaces-(cont.)
How many examples will ε-exhaust the VS?

Theorem (Haussler, 1988): If the hypothesis space H is finite, and D is a
sequence of m ≥ 1 independent randomly drawn examples of some target
concept c, then for any 0 ≤ ε ≤ 1, the probability that the version space
VS_{H,D} is not ε-exhausted is at most |H| e^(−εm).

Requiring this probability to be at most δ and solving for m yields the sample
complexity bound

m ≥ (1/ε)(ln|H| + ln(1/δ)).
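A small calculator for this bound; a sketch only, with an arbitrary illustrative |H|:

```python
import math

def m_finite_H(H_size, eps, delta):
    # m >= (1/eps) * (ln|H| + ln(1/delta)): examples sufficient for any
    # consistent learner to eps-exhaust VS_{H,D} with probability >= 1 - delta.
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / eps)

print(m_finite_H(973, eps=0.1, delta=0.05))   # -> 99
```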
Sample complexity for Finite hypothesis spaces-(cont.)
Conjunctions of Boolean Literals Are PAC-Learnable

Consider the class C of target concepts described by conjunctions of boolean
literals over n boolean variables. Here |H| = 3^n, because each variable can
appear in a hypothesis positively, appear negated, or be absent. Substituting
into the theorem above gives

m ≥ (1/ε)(n ln 3 + ln(1/δ)).

Since m grows polynomially in n, 1/ε, and 1/δ, and a consistent hypothesis can
be found in time polynomial in these quantities (e.g., by FIND-S), conjunctions
of boolean literals are PAC-learnable.
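Plugging in numbers, for example with n = 10, ε = 0.1, and δ = 0.05:

```python
import math

def m_boolean_conjunctions(n, eps, delta):
    # |H| = 3**n for conjunctions of boolean literals over n variables,
    # so m >= (1/eps) * (n ln 3 + ln(1/delta)).
    return math.ceil((n * math.log(3) + math.log(1 / delta)) / eps)

print(m_boolean_conjunctions(10, eps=0.1, delta=0.05))   # -> 140
```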
Sample complexity for Finite hypothesis spaces-(cont.)
Agnostic Learning

If H does not contain the target concept c, then a zero-error hypothesis cannot
always be found.

In this case, we might ask our learner to output the hypothesis from H that has the
minimum error over the training examples.

A learner that makes no assumption that the target concept is representable by H,
and that simply finds the hypothesis with minimum training error, is often called an
agnostic learner.
Sample complexity for Finite hypothesis spaces-(cont.)
Agnostic Learning (cont.)

By the Hoeffding bounds, Pr[error_D(h) > error_train(h) + ε] ≤ e^(−2mε²) for a
single hypothesis. Requiring that this failure probability, summed over all |H|
hypotheses, be at most δ and solving for m gives the sample complexity for
agnostic learning:

m ≥ (1/(2ε²))(ln|H| + ln(1/δ))

Note that m now grows as the square of 1/ε, rather than linearly as in the
consistent-learner case.
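The same calculator idea for the agnostic bound, reusing the illustrative |H| from above to show how much larger the requirement becomes:

```python
import math

def m_agnostic(H_size, eps, delta):
    # m >= (1 / (2 eps^2)) * (ln|H| + ln(1/delta)); note the 1/eps^2 growth.
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / (2 * eps ** 2))

print(m_agnostic(973, eps=0.1, delta=0.05))   # -> 494 (vs. 99 in the consistent case)
```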
Sample complexity for Infinite hypothesis spaces-

So far we have seen that the sample complexity for PAC learning grows as the
logarithm of the size of the hypothesis space. But there are two drawbacks to
characterizing sample complexity in terms of |H|:

1. It can lead to quite weak bounds.

2. In the case of infinite hypothesis spaces we cannot apply the bound
m ≥ (1/ε)(ln|H| + ln(1/δ)) at all.

Here we consider a second measure of the complexity of H, called the
Vapnik-Chervonenkis dimension, or VC dimension, written VC(H).

We use VC(H) instead of |H| to state bounds on sample complexity.

Sample complexity for Infinite hypothesis spaces-(cont.)
Shattering a set of Instances
The VC dimension measures the complexity of H not by the number of distinct
hypotheses |H|, but by the number of distinct instances from X that can be
completely discriminated using H.

Definition: A set of instances S is shattered by hypothesis space H if and only
if for every dichotomy of S (every possible partition of S into positive and
negative instances) there exists some hypothesis in H consistent with that
dichotomy.

[Figure: A set of 3 instances shattered by eight hypotheses. For every possible
dichotomy of the instances, there exists a corresponding hypothesis.]
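A brute-force shattering check, practical only for small cases; the threshold hypothesis class below is an illustrative assumption, not from the slides:

```python
from itertools import product

def shatters(hypotheses, instances):
    # True iff every dichotomy of `instances` is realized by some hypothesis;
    # `hypotheses` is a collection of callables x -> bool.
    realized = {tuple(h(x) for x in instances) for h in hypotheses}
    return all(d in realized for d in product([False, True], repeat=len(instances)))

# Illustrative class (an assumption): thresholds h_t(x) = (x >= t),
# which can realize only "upper" labelings of the number line.
thresholds = [lambda x, t=t: x >= t for t in range(11)]
print(shatters(thresholds, [5]))     # True  -> VC dimension >= 1
print(shatters(thresholds, [3, 7]))  # False -> no h labels 3 positive, 7 negative
```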
Sample complexity for Infinite hypothesis spaces-(cont.)
Vapnik-Chervonenkis Dimension

Definition: The Vapnik-Chervonenkis dimension, VC(H), of hypothesis space H
defined over instance space X is the size of the largest finite subset of X
shattered by H. If arbitrarily large finite subsets of X can be shattered by H,
then VC(H) ≡ ∞.

Note: For any finite H, VC(H) ≤ log₂|H|. To see this, suppose VC(H) = d. Then H
requires 2^d distinct hypotheses to shatter d instances; hence 2^d ≤ |H|, and
d = VC(H) ≤ log₂|H|.


Sample complexity for Infinite hypothesis spaces-(cont.)
Example: VC dimension for conjunctions of boolean literals

Suppose each instance in X is described by the values of exactly 3 boolean
literals, and suppose each hypothesis in H is a conjunction of up to 3 boolean
literals. What is VC(H)?

We can show VC(H) ≥ 3 as follows. Represent each instance by a 3-bit string
giving the values of the literals l1, l2, l3:

Instance1: 100
Instance2: 010
Instance3: 001

This set of three instances can be shattered by H: to construct a hypothesis
consistent with any given dichotomy, include the literal ¬l_i in the conjunction
for each instance i that is to be labeled negative. For example, the hypothesis
¬l1 ∧ ¬l3 covers instance 2 while excluding instances 1 and 3.
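A quick enumeration verifying that these three instances are shattered by all 3^3 = 27 conjunctions; a sketch with hypotheses encoded as per-literal constraints:

```python
from itertools import product

# Instances as 3-bit tuples: the values of literals l1, l2, l3.
instances = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

# Encode a conjunction as one constraint per literal: 1 (literal required
# true), 0 (required false), or None (literal absent from the conjunction).
def satisfies(x, conj):
    return all(c is None or xi == c for xi, c in zip(x, conj))

conjunctions = list(product([None, 0, 1], repeat=3))   # |H| = 3**3 = 27

realized = {tuple(satisfies(x, conj) for x in instances) for conj in conjunctions}
print(len(realized) == 2 ** len(instances))   # True: all 8 dichotomies realized
```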
Sample complexity for Infinite hypothesis spaces-(cont.)
Example: VC dimension for linear decision surfaces

The VC dimension of linear decision surfaces in the x, y plane is 3.

[Figure: (a) A set of 3 points that can be shattered using linear decision
surfaces. (b) A set of 3 collinear points that cannot be shattered.]

The definition requires only that some set of 3 points be shattered, and
VC(H) = 3 because no set of 4 points in the plane can be shattered by
linear decision surfaces.

NOTE: The VC dimension for conjunctions of n boolean literals is at least n,
by the argument on the previous slide. In fact it is exactly n, though showing
this is more difficult, because it requires demonstrating that no set of n + 1
instances can be shattered.
Sample complexity for Infinite hypothesis spaces-(cont.)
Sample complexity and VC Dimension
Earlier, we considered the question: how many randomly drawn examples
suffice to ε-exhaust VS_{H,D} with probability at least (1 − δ)?

The answer was m ≥ (1/ε)(ln|H| + ln(1/δ)); here the number of training
examples grows logarithmically in |H|.

Now, using VC(H) for the complexity of H, it is possible to derive an
alternative bound (Blumer et al., 1989):

m ≥ (1/ε)(4 log₂(2/δ) + 8 VC(H) log₂(13/ε))

Here the number of training examples grows log times linear in 1/ε.
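The same calculator pattern for the VC-based bound; the VC(H) = 3 value below is the linear-decision-surface example from earlier:

```python
import math

def m_vc(vc, eps, delta):
    # m >= (1/eps) * (4 log2(2/delta) + 8 * VC(H) * log2(13/eps))
    return math.ceil((4 * math.log2(2 / delta) + 8 * vc * math.log2(13 / eps)) / eps)

print(m_vc(3, eps=0.1, delta=0.05))   # -> 1899
```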
Mistake Bound model
So far we have considered learning settings that vary in:

1. how the training examples are generated;
2. noise in the data;
3. the definition of success (whether the target concept must be learned
exactly, or only probably and approximately);
4. the measures according to which the learner is evaluated.

Now consider the mistake bound model of learning, in which

the learner is evaluated by the total number of mistakes it makes before it
converges to the correct hypothesis.
Mistake Bound model(cont.)

The mistake bound learning problem may be studied in various specific settings.

For example, we might count the number of mistakes made before PAC-learning
the target concept.

In the examples below, we instead consider the number of mistakes made before
learning the target concept exactly, i.e., before converging to a hypothesis h
such that h(x) = c(x) for every x.
Mistake Bound for FIND-S Algorithm

FIND-S converges in the limit to a hypothesis that makes no errors, provided C ⊆ H and the training data is noise-free.


Mistake Bound for FIND-S Algorithm(cont.)

FIND-S begins with the most specific hypothesis and generalizes it as positive
training examples are observed.

Consequently, FIND-S can never mistakenly classify a negative example as
positive.

So, to bound the number of mistakes made by FIND-S, we need only count the
number of times it misclassifies a truly positive example as negative.

For target concepts that are conjunctions of up to n boolean literals, FIND-S
makes at most n + 1 mistakes: the first mistake (on the first positive example)
removes up to n of the 2n literals in the initial, most specific hypothesis, and
each subsequent mistake removes at least one more literal.
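A minimal FIND-S sketch for boolean conjunctions with a mistake counter; the target concept and example sequence are illustrative assumptions:

```python
# Instances are tuples of 0/1 values; a hypothesis is a set of literals
# (i, v) meaning "x[i] == v". The initial hypothesis holds all 2n literals
# (the most specific hypothesis, satisfied by no instance).
def find_s(examples, n):
    h = {(i, v) for i in range(n) for v in (0, 1)}
    mistakes = 0
    for x, label in examples:
        predicted = all(x[i] == v for i, v in h)
        if predicted != label:
            mistakes += 1                 # FIND-S errs only on positive examples
        if label:                         # generalize: drop unsatisfied literals
            h = {(i, v) for i, v in h if x[i] == v}
    return h, mistakes

# Illustrative target: l1 AND NOT l3 over n = 3 variables (an assumption).
examples = [((1, 0, 0), True), ((1, 1, 0), True),
            ((0, 1, 0), False), ((1, 1, 1), False)]
h, m = find_s(examples, 3)
print(sorted(h), m)   # [(0, 1), (2, 0)] 2  -- at most n + 1 = 4 mistakes
```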
Mistake Bound for Halving Algorithm

● The Halving algorithm predicts by majority vote: if the majority of version
space hypotheses classify a new instance as positive, then this prediction is
output by the learner; once the true classification is revealed, all inconsistent
hypotheses are removed from the version space.

The mistake bound of the Halving algorithm follows from the fact that it makes
a mistake only when the majority of hypotheses in its current version space
misclassify the new instance.

In that case, once the true classification is revealed, the version space is
reduced to at most half its current size, since the mistaken majority is
eliminated. Hence the Halving algorithm makes at most log₂|H| mistakes before
exactly learning the target concept.
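A minimal Halving-algorithm sketch; the threshold hypothesis class and example sequence are illustrative assumptions:

```python
# Predict by majority vote of the current version space, then remove every
# hypothesis inconsistent with the revealed label.
def halving(hypotheses, examples):
    vs = list(hypotheses)
    mistakes = 0
    for x, label in examples:
        votes = sum(h(x) for h in vs)
        predicted = 2 * votes >= len(vs)       # majority vote (ties -> positive)
        if predicted != label:
            mistakes += 1                      # wrong majority: |VS| at least halves
        vs = [h for h in vs if h(x) == label]  # keep only consistent hypotheses
    return vs, mistakes

# Illustrative class (an assumption): thresholds h_t(x) = (x >= t), target t = 7.
hyps = [lambda x, t=t: x >= t for t in range(11)]
vs, m = halving(hyps, [(5, False), (9, True), (6, False), (7, True)])
print(len(vs), m)   # 1 2  -- mistakes <= log2(11) ~ 3.46
```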
Optimal Mistake Bounds

For any concept class C, the optimal mistake bound Opt(C) is defined as the
minimum, over all possible learning algorithms A, of the worst-case mistake
bound M_A(C). From the analyses above, M_FIND-S(C) = n + 1 and
M_Halving(C) ≤ log₂(|C|); Littlestone (1987) showed that
VC(C) ≤ Opt(C) ≤ M_Halving(C) ≤ log₂(|C|).


END OF CHAPTER 7 & MODULE 3
