
UNIT-4

Computational Learning Theory: Sample Complexity for Finite Hypothesis spaces, Sample Complexity for
Infinite Hypothesis spaces, The Mistake Bound Model of Learning; INSTANCE-BASED LEARNING – k-
Nearest Neighbour Learning, Locally Weighted Regression, Radial basis function networks, Case-based
learning
❖ COMPUTATIONAL LEARNING THEORY:
• Computational Learning Theory (CLT) is a field of AI that studies the design and analysis of machine learning
algorithms in order to determine what sorts of problems are learnable.
• Its ultimate goals are to understand the theoretical foundations of learning programs, what makes
them work or not, and to improve their accuracy and efficiency.
• The field merges many disciplines, such as probability theory, statistics, optimization,
information theory, calculus and geometry.
• Computational learning theory is used to:
i.) Provide a theoretical analysis of learning.
ii.) Show when a learning algorithm can be expected to succeed.
iii.) Show when learning may be impossible.
• CLT comprises three main areas:
i.) Sample complexity: how many training examples are needed in order to find a good
hypothesis.
ii.) Computational complexity: how much computational effort is needed in order to find a good
hypothesis.
iii.) Mistake bound: how many mistakes the learner will make before finding a good hypothesis.

❖ SAMPLE COMPLEXITY FOR FINITE HYPOTHESIS SPACES:


1. The sample complexity of a machine learning algorithm represents the number of training samples that
it needs in order to successfully learn a target function.
2. Sample complexity is the number of training samples that we need to supply to the algorithm, so that
the function returned by the algorithm is within an arbitrarily small error of the best possible function,
with probability arbitrarily close to 1.
3. There are two variants of sample complexity:
a. The weak variant fixes a particular input-output distribution.
b. The strong variant takes the worst-case sample complexity over all input-output distributions.
4. Computational learning theory characterizes classes of learning problems or specific algorithms in terms of sample complexity, i.e.,
the number of training examples necessary or sufficient to learn hypotheses of a given accuracy.
5. Complexity of a learning problem depends on:
a. Size or expressiveness of the hypothesis space.
b. Accuracy to which target concept must be approximated.
c. Probability with which the learner must produce a successful hypothesis.
d. Manner in which training examples are presented, for example, randomly or by query to an oracle.
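
For a finite hypothesis space H and a learner that always outputs a hypothesis consistent with the training
examples, a standard PAC-learning bound states that m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice for the learner,
with probability at least 1 − δ, to output a hypothesis whose true error is at most ε. As a small illustration, the
Python sketch below simply evaluates this bound; the values of |H|, ε and δ are assumptions chosen only for the example.

import math

def sample_complexity(hypothesis_space_size, epsilon, delta):
    """PAC bound for a consistent learner over a finite hypothesis space:
    m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((1.0 / epsilon) *
                     (math.log(hypothesis_space_size) + math.log(1.0 / delta)))

# Illustrative values: |H| = 2^10 hypotheses, 5% error tolerance, 95% confidence.
m = sample_complexity(hypothesis_space_size=2**10, epsilon=0.05, delta=0.05)
print(m)  # number of training examples sufficient under the bound (199 here)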

❖ THE MISTAKE BOUND MODEL OF LEARNING:

1. An algorithm A learns a class C with mistake bound M iff Mistake(A, C) ≤ M.
2. In the mistake bound model, learning proceeds in rounds, one example at a time. Suppose Y = {−1, +1}.
3. At the beginning of round t, the learning algorithm A has a hypothesis ht; we see xt and predict ht(xt).
4. At the end of the round, yt is revealed and A makes a mistake if ht(xt) ≠ yt. The algorithm
then updates its hypothesis to ht+1, and this continues until time T.
5. Suppose the labels were actually produced by some function f in a given concept class C.
6. Then we bound the total number of mistakes the learner commits over the T rounds, in the worst case over all f in C.
7. A separate issue is the amount of computation A has to do in each round in order to update its hypothesis from ht to ht+1.
8. Setting this issue aside for a moment, we have a remarkably simple algorithm, Halving(C), that has
a mistake bound of log2(|C|) for any finite concept class C.
9. For a finite set H of hypotheses, define the hypothesis Majority(H) as the hypothesis that, on each input x,
predicts the label output by the majority of the hypotheses in H. Halving(C) predicts with Majority(Ct), where Ct
is the set of hypotheses in C still consistent with all labels seen so far, and removes every inconsistent
hypothesis after each round; a code sketch is given below.
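
The sketch below is a minimal Python illustration of the Halving algorithm just described: hypotheses are plain
functions, prediction is the majority vote of the current version space, and inconsistent hypotheses are discarded
after each round. The threshold concept class and the example stream are assumptions made only for this demonstration.

def halving_learner(hypotheses, stream):
    """Halving algorithm: predict by majority vote over the current
    version space, then discard hypotheses inconsistent with the label."""
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in stream:                       # rounds t = 1, 2, ..., T
        votes = [h(x) for h in version_space]
        prediction = 1 if votes.count(1) >= votes.count(-1) else -1
        if prediction != y:                   # on a mistake, at least half of the
            mistakes += 1                     # remaining hypotheses are eliminated
        version_space = [h for h in version_space if h(x) == y]
    return mistakes

# Illustrative concept class: threshold functions on the integers 0..7.
hypotheses = [lambda x, t=t: 1 if x >= t else -1 for t in range(8)]
stream = [(5, 1), (2, -1), (3, 1), (6, 1), (1, -1)]   # labels produced by threshold t = 3
print(halving_learner(hypotheses, stream))  # total mistakes <= log2(|C|) = 3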

❖ INSTANCE-BASED LEARNING:
Instance-based learning methods such as nearest neighbour and locally weighted regression are
conceptually straightforward approaches to approximating real-valued or discrete-valued target
functions.
• Learning in these algorithms consists of simply storing the presented training data. When a new query
instance is encountered, a set of similar related instances is retrieved from memory and used to classify
the new query instance.
• Instance-based approaches can construct a different approximation to the target function for each
distinct query instance that must be classified.

Advantages of Instance-based learning


1. Training is very fast.
2. Complex target functions can be learned.
3. No information from the training data is lost.
Disadvantages of Instance-based learning
1. The cost of classifying new instances can be high. This is because nearly all computation
takes place at classification time rather than when the training examples are first encountered.
2. Many instance-based approaches, especially nearest-neighbour approaches, typically
consider all attributes of the instances when attempting to retrieve similar training examples from
memory. If the target concept depends on only a few of the many available attributes, then the
instances that are truly most "similar" may well be a large distance apart.

❖ K-NEAREST NEIGHBOUR LEARNING


• K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on Supervised Learning
technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the
new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This
means that when new data appears, it can easily be classified into a well-suited category using the K-NN
algorithm.
• K-NN can be used for regression as well as classification, but it is mostly used for
classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any assumption on the underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately;
instead it stores the dataset and, at the time of classification, performs an action on the dataset.
• At the training phase the KNN algorithm just stores the dataset, and when it gets new data it classifies
that data into the category that is most similar to the new data.
• Example: Suppose we have an image of a creature that looks similar to a cat and a dog, but we want to
know whether it is a cat or a dog. For this identification we can use the KNN algorithm, as it works on a
similarity measure. Our KNN model will find the features of the new image that are similar to the cat and
dog images and, based on the most similar features, will put it in either the cat or the dog category.

Why do we need a K-NN Algorithm?


Suppose there are two categories, Category A and Category B, and we have a new data point x1; in which
of these categories will this data point lie? To solve this type of problem, we need a K-NN
algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point.

How does K-NN work?

The working of K-NN can be explained on the basis of the following steps (a small code sketch is given after the list):

o Step-1: Select the number K of the neighbors


o Step-2: Calculate the Euclidean distance between the new data point and the training points.

o Step-3: Take the K nearest neighbours as per the calculated Euclidean distance.
o Step-4: Among these K neighbours, count the number of data points in each category.
o Step-5: Assign the new data point to the category for which the number of neighbours is maximum.
o Step-6: Our model is ready.
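
As referenced above, the following is a minimal from-scratch Python sketch of these steps, using Euclidean
distance and a majority vote; the small training set and the query point are made up purely for illustration.

import math
from collections import Counter

def euclidean_distance(a, b):
    """Step-2: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(training_data, query, k=3):
    """Steps 1-5: find the K nearest training points and return the
    majority category among them."""
    nearest = sorted(training_data,
                     key=lambda item: euclidean_distance(item[0], query))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

# Illustrative training data: (feature vector, category).
training_data = [
    ((1.0, 1.2), "Category A"), ((1.5, 1.8), "Category A"),
    ((5.0, 5.2), "Category B"), ((5.5, 4.8), "Category B"),
    ((4.9, 5.9), "Category B"),
]
print(knn_classify(training_data, query=(5.1, 5.0), k=3))  # -> "Category B"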

Advantages of KNN Algorithm:

o It is simple to implement.
o It is robust to noisy training data.
o It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:

o The value of K always needs to be chosen, which can sometimes be difficult.
o The computation cost is high because the distance to every training sample must be calculated for each
new data point.

Example:
For the given query instance x with Maths = 6 and CS = 8, the resulting K-NN classification is PASS.
❖ LOCALLY WEIGHTED REGRESSION
• The phrase "locally weighted regression" is called local because the function is approximated based
only on data near the query point, weighted because the contribution of each training example is
weighted by its distance from the query point, and regression because this is the term used widely in
the statistical learning community for the problem of approximating real-valued functions.
• Given a new query instance xq, the general approach in locally weighted regression is to construct an
approximation 𝑓̂ that fits the training examples in the neighbourhood surrounding xq. This
approximation is then used to calculate the value 𝑓̂(xq), which is output as the estimated target value
for the query instance.
• Consider locally weighted regression in which the target function f is approximated near xq using a
linear function of the form

f̂(x) = w0 + w1 a1(x) + ⋯ + wn an(x)

where ai(x) denotes the value of the ith attribute of the instance x.
• Gradient descent can be used to choose weights that minimize the squared error summed over the set D
of training examples:

E = ½ Σx∈D (f(x) − f̂(x))²

which leads to the gradient descent training rule

Δwj = η Σx∈D (f(x) − f̂(x)) aj(x)

where η is a constant learning rate.

• We need to modify this procedure to derive a local approximation rather than a global one. The simple
way is to redefine the error criterion E to emphasize fitting the local training examples. Three possible
criteria are given below.

1. Minimize the squared error over just the k nearest neighbours of xq:

E1(xq) = ½ Σx ∈ k nearest nbrs of xq (f(x) − f̂(x))²

2. Minimize the squared error over the entire set D of training examples, while weighting the error of each
training example by some decreasing function K of its distance from xq:

E2(xq) = ½ Σx∈D (f(x) − f̂(x))² K(d(xq, x))

3. Combine 1 and 2:

E3(xq) = ½ Σx ∈ k nearest nbrs of xq (f(x) − f̂(x))² K(d(xq, x))

If we choose criterion three and re-derive the gradient descent rule, we obtain the following training rule:

Δwj = η Σx ∈ k nearest nbrs of xq K(d(xq, x)) (f(x) − f̂(x)) aj(x)

The differences between this new rule and the global rule given earlier are that the contribution of instance x to
the weight update is now multiplied by the distance penalty K(d(xq, x)), and that the error is summed over only the
k nearest training examples.
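
The following is a minimal Python sketch of locally weighted regression in the spirit of criterion 2 above: every
training example is weighted by a Gaussian kernel of its distance from the query xq, and, because the local model is
linear, the weighted squared error is minimized here in closed form rather than by gradient descent. The 1-D data,
the kernel width tau, and the function names are assumptions chosen for illustration.

import numpy as np

def locally_weighted_regression(X, y, x_query, tau=0.5):
    """Fit a local linear model around x_query.

    Each training example x is weighted by K(d(x_query, x)) = exp(-d^2 / (2*tau^2));
    the weighted least-squares solution then gives the local coefficients."""
    Xb = np.column_stack([np.ones(len(X)), X])          # add intercept term w0
    xq = np.array([1.0, x_query])
    d2 = (X - x_query) ** 2                              # squared distances to the query
    K = np.exp(-d2 / (2 * tau ** 2))                     # Gaussian kernel weights
    W = np.diag(K)
    # Solve (Xb^T W Xb) w = Xb^T W y for the local weight vector w.
    w = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return xq @ w                                        # estimated f^(x_query)

# Illustrative 1-D data sampled from a nonlinear target function.
X = np.linspace(0, 6, 40)
y = np.sin(X) + 0.1 * np.random.default_rng(0).normal(size=X.shape)
print(locally_weighted_regression(X, y, x_query=2.0))    # close to sin(2.0)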
❖ RADIAL BASIS FUNCTION NETWORKS
• One approach to function approximation that is closely related to distance-weighted regression and
also to artificial neural networks is learning with radial basis functions.
• In this approach, the learned hypothesis is a function of the form

f̂(x) = w0 + Σu=1..k wu Ku(d(xu, x))        (1)

• Here each xu is an instance from X, and the kernel function Ku(d(xu, x)) is defined so that it
decreases as the distance d(xu, x) increases.
• k is a user-provided constant that specifies the number of kernel functions to be included.
• Even though f̂ is a global approximation to f(x), the contribution from each of the Ku(d(xu, x)) terms is localized to a
region near the point xu.

A common choice is to take each function Ku(d(xu, x)) to be a Gaussian function centred at the point xu with some variance σu²:

Ku(d(xu, x)) = exp(−d²(xu, x) / (2σu²))

• The functional form of equation (1) can approximate any function with arbitrarily small error, provided a
sufficiently large number k of such Gaussian kernels is used and provided the width σ² of each kernel can be
separately specified.
• The function given by equation (1) can be viewed as describing a two-layer network where the first layer of
units computes the values of the various Ku(d(xu, x)) and the second layer computes a linear
combination of these first-layer unit values.

Example: Radial basis function (RBF) network

Given a set of training examples of the target function, RBF networks are typically trained in a two-
stage process.

1. First, the number k of hidden units is determined, and each hidden unit u is defined by choosing the
values of xu and σu² that define its kernel function Ku(d(xu, x)).
2. Second, the weights wu are trained to maximize the fit of the network to the training data, using the
global error criterion

E = ½ Σx∈D (f(x) − f̂(x))²

Because the kernel functions are held fixed during this second stage, the linear weight values wu can
be trained very efficiently.
Several alternative methods have been proposed for choosing an appropriate number of hidden units
or, equivalently, kernel functions.
• One approach is to allocate a Gaussian kernel function for each training example (xi, f(xi)), centring this
Gaussian at the point xi. Each of these kernels may be assigned the same width σ². Given this approach,
the RBF network learns a global approximation to the target function in which each training example
(xi, f(xi)) can influence the value of f̂ only in the neighbourhood of xi.
• A second approach is to choose a set of kernel functions that is smaller than the number of training
examples. This approach can be much more efficient than the first approach, especially when the
number of training examples is large.
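
The sketch below illustrates the two-stage training procedure under the second approach just described: the
centres xu are a randomly chosen subset of the training points with a shared width σ², and the linear output
weights are then fitted efficiently by least squares while the kernels stay fixed. The data and the values of k
and σ² are assumptions made only for the illustration.

import numpy as np

def gaussian_kernel(d2, sigma2):
    """K_u(d(x_u, x)) = exp(-d(x_u, x)^2 / (2*sigma_u^2))."""
    return np.exp(-d2 / (2 * sigma2))

def train_rbf(X, y, k=8, sigma2=0.5, rng=np.random.default_rng(0)):
    # Stage 1: choose k centres (here, a random subset of the training points).
    centres = X[rng.choice(len(X), size=k, replace=False)]
    # Hidden-layer activations for every training point, plus a bias column for w0.
    d2 = (X[:, None] - centres[None, :]) ** 2
    Phi = np.column_stack([np.ones(len(X)), gaussian_kernel(d2, sigma2)])
    # Stage 2: fit the linear output weights by least squares (kernels held fixed).
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centres, w

def rbf_predict(x, centres, w, sigma2=0.5):
    phi = np.concatenate([[1.0], gaussian_kernel((x - centres) ** 2, sigma2)])
    return phi @ w                      # f^(x) = w0 + sum_u w_u K_u(d(x_u, x))

# Illustrative 1-D regression problem.
X = np.linspace(0, 6, 60)
y = np.sin(X)
centres, w = train_rbf(X, y)
print(rbf_predict(2.0, centres, w))     # close to sin(2.0)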

❖ CASE-BASED LEARNING
• Case-based reasoning (CBR) is a learning paradigm based on lazy learning methods; it classifies
new query instances by analysing similar instances while ignoring instances that are very different from
the query.
• In CBR, instances are not represented as real-valued points; instead, a richer
symbolic representation is used.
• CBR has been applied to problems such as conceptual design of mechanical devices based on a stored
library of previous designs, reasoning about new legal cases based on previous rulings, and solving
planning and scheduling problems by reusing and combining portions of previous solutions to similar
problems.

A prototypical example of a case-based reasoning system

• The CADET system employs case-based reasoning to assist in the conceptual design of simple
mechanical devices such as water faucets.
• It uses a library containing approximately 75 previous designs and design fragments to suggest
conceptual designs to meet the specifications of new design problems.
• Each instance stored in memory (e.g., a water pipe) is represented by describing both its structure and
its qualitative function.
• New design problems are then presented by specifying the desired function and requesting the
corresponding structure.

The problem setting is illustrated in the figure below.


• The function is represented in terms of the qualitative relationships among the waterflow levels and
temperatures at its inputs and outputs.
• In the functional description, an arrow with a "+" label indicates that the variable at the arrowhead
increases with the variable at its tail. A "-" label indicates that the variable at the head decreases with
the variable at the tail.
• Here Qc refers to the flow of cold water into the faucet, Qh to the input flow of hot water, and Qm to
the single mixed flow out of the faucet.
• Tc, Th, and Tm refer to the temperatures of the cold water, hot water, and mixed water respectively.
• The variable Ct denotes the control signal for temperature that is input to the faucet, and Cf denotes
the control signal for waterflow.
• The controls Ct and Cf are intended to influence the water flows Qc and Qh, thereby indirectly influencing the
faucet's output flow Qm and temperature Tm.

• CADET searches its library for stored cases whose functional descriptions match the design problem. If
an exact match is found, indicating that some stored case implements exactly the desired function,
then this case can be returned as a suggested solution to the design problem. If no exact match occurs,
CADET may find cases that match various subgraphs of the desired functional specification.
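
Purely as an illustration, and not a description of CADET's actual data structures, the qualitative functional
description above can be encoded as a set of signed influence edges, and matching a design problem against a stored
case then amounts to a subgraph (subset) check, as sketched below in Python. The specific signs on the control edges
are assumed for the example.

# Signed influence edges (tail, head, sign): "+" means the head variable
# increases with the tail variable, "-" means it decreases.
faucet_function = {
    ("Ct", "Qc", "-"), ("Ct", "Qh", "+"),   # temperature control vs. input flows (illustrative signs)
    ("Cf", "Qc", "+"), ("Cf", "Qh", "+"),   # flow control increases both input flows
    ("Qc", "Qm", "+"), ("Qh", "Qm", "+"),   # input flows increase the mixed output flow
    ("Qc", "Tm", "-"), ("Qh", "Tm", "+"),   # cold flow lowers, hot flow raises, the mixed temperature
    ("Tc", "Tm", "+"), ("Th", "Tm", "+"),
}

def matches(problem_edges, stored_case_edges):
    """A stored case matches if the problem's functional description
    is a subgraph (subset of edges) of the case's functional description."""
    return problem_edges <= stored_case_edges

# A new design problem that only specifies part of the desired behaviour.
problem = {("Qc", "Qm", "+"), ("Qh", "Qm", "+")}
print(matches(problem, faucet_function))   # True: the stored case covers this subgraph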
