ML Unit-4
Computational Learning Theory: Sample Complexity for Finite Hypothesis spaces, Sample Complexity for
Infinite Hypothesis spaces, The Mistake Bound Model of Learning; INSTANCE-BASED LEARNING – k-
Nearest Neighbour Learning, Locally Weighted Regression, Radial basis function networks, Case-based
learning
❖ COMPUTATIONAL LEARNING THEORY:
• Computational Learning Theory (CLT) is a field of AI concerned with studying the design of machine learning
algorithms to determine what sorts of problems are learnable.
• The ultimate goal is to understand the theoretical foundations of learning programs, what makes
them work or not, while improving their accuracy and efficiency.
• This field merges many disciplines, such as probability theory, statistics, optimization,
information theory, calculus and geometry.
• Computational learning theory is used to:
i.) Provide a theoretical analysis of learning.
ii.) Show when a learning algorithm can be expected to succeed.
iii.) Show when learning may be impossible.
• CLT comprises three areas:
i.) Sample complexity: how many training examples we need in order to find a good hypothesis.
ii.) Computational complexity: how much computational power we need in order to find a good hypothesis.
iii.) Mistake bound: how many mistakes we will make before finding a good hypothesis.
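As a quick illustration of area (i), a standard result for consistent learners over a finite hypothesis space H is the bound m ≥ (1/ε)(ln|H| + ln(1/δ)): with at least this many training examples, the learner outputs, with probability at least 1 − δ, a hypothesis whose true error is at most ε. The short sketch below simply evaluates this bound; the hypothesis-space size and the values of ε and δ are illustrative, not taken from these notes.

```python
import math

def pac_sample_complexity(hypothesis_space_size, epsilon, delta):
    """Number of examples sufficient for a consistent learner over a finite
    hypothesis space to be probably (1 - delta) approximately (epsilon) correct:
    m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((1.0 / epsilon) *
                     (math.log(hypothesis_space_size) + math.log(1.0 / delta)))

# Illustrative numbers: |H| = 2^10 hypotheses, 5% error, 95% confidence.
print(pac_sample_complexity(2 ** 10, epsilon=0.05, delta=0.05))   # -> 199
```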
❖ THE MISTAKE BOUND MODEL OF LEARNING:
• One issue is the amount of computation the learner A has to do in each round in order to update its hypothesis from ht to ht+1.
• Setting this issue aside for a moment, there is a remarkably simple algorithm, HALVING(C), that has
a mistake bound of lg(|C|) for any finite concept class C.
• For a finite set H of hypotheses, define MAJORITY(H) to be the hypothesis that predicts, on each
instance, the value agreed on by the majority of the hypotheses in H.
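A minimal sketch of the HALVING idea above, assuming each hypothesis is represented as a callable returning 0 or 1 (the names and representation are illustrative): the learner predicts by majority vote over the hypotheses still consistent with the examples seen so far and eliminates the inconsistent ones after each example; every mistake removes at least half of the surviving hypotheses, so at most lg(|C|) mistakes are made when the target is in C.

```python
def halving_learner(hypotheses, labelled_stream):
    """Mistake-bound learning with the HALVING algorithm.

    hypotheses      : list of callables h(x) -> 0/1, the finite class C
    labelled_stream : iterable of (x, true_label) pairs presented one at a time
    Returns the surviving version space and the number of mistakes made.
    """
    version_space = list(hypotheses)
    mistakes = 0
    for x, true_label in labelled_stream:
        votes = [h(x) for h in version_space]
        prediction = 1 if 2 * sum(votes) >= len(votes) else 0   # majority vote
        if prediction != true_label:
            mistakes += 1                                        # majority was wrong
        # drop every hypothesis that disagreed with the observed label
        version_space = [h for h in version_space if h(x) == true_label]
    return version_space, mistakes
```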
❖ INSTANCE-BASED LEARNING:
• Instance-based learning methods such as nearest neighbour and locally weighted regression are
conceptually straightforward approaches to approximating real-valued or discrete-valued target
functions.
• Learning in these algorithms consists of simply storing the presented training data. When a new query
instance is encountered, a set of similar related instances is retrieved from memory and used to classify
the new query instance.
• Instance-based approaches can construct a different approximation to the target function for each
distinct query instance that must be classified.
The working of K-NN can be explained on the basis of the following algorithm:
o Step-1: Select the number K of neighbours.
o Step-2: Calculate the Euclidean distance between the new data point and each training data point.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these k neighbors, count the number of the data points in each category.
o Step-5: Assign the new data point to the category for which the number of neighbours is maximum.
o Step-6: Our model is ready.
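A minimal sketch of the steps above, using a small made-up table of (Maths, CS) marks labelled PASS/FAIL; the data, the value of K and the function names are illustrative assumptions, not taken from these notes.

```python
import math
from collections import Counter

def knn_classify(query, training_data, k=3):
    """Classify `query` using the K-NN steps above.

    query         : tuple of feature values, e.g. (maths_mark, cs_mark)
    training_data : list of (features, label) pairs
    """
    def euclidean(a, b):
        # Step-2: Euclidean distance between two points
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    # Step-3: keep the K training points closest to the query
    neighbours = sorted(training_data, key=lambda item: euclidean(query, item[0]))[:k]

    # Steps 4-5: count the labels among the K neighbours and take the majority
    counts = Counter(label for _, label in neighbours)
    return counts.most_common(1)[0][0]

# Illustrative marks data: ((Maths, CS), result)
data = [((4, 3), "FAIL"), ((6, 7), "PASS"), ((7, 8), "PASS"),
        ((5, 5), "FAIL"), ((8, 8), "PASS")]
print(knn_classify((6, 8), data, k=3))   # -> PASS
```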
Advantages of KNN:
o It is simple to implement.
o It is robust to noisy training data.
o It can be more effective if the training data is large.
Disadvantages of KNN:
o The value of K always needs to be determined, which may sometimes be complex.
o The computation cost is high because the distance between the query point and every training
sample must be calculated.
Example:
Hence, for the given data and the query x with Maths = 6 and CS = 8, the classification will be PASS.
❖ LOCALLY WEIGHTED REGRESSION
• The phrase "locally weighted regression" reflects that the method is local because the function is approximated based
only on data near the query point, weighted because the contribution of each training example is
weighted by its distance from the query point, and regression because this is the term used widely in
the statistical learning community for the problem of approximating real-valued functions.
• Given a new query instance xq, the general approach in locally weighted regression is to construct an
approximation 𝑓̂ that fits the training examples in the neighbourhood surrounding xq. This
approximation is then used to calculate the value 𝑓̂(xq), which is output as the estimated target value
for the query instance.
• Consider locally weighted regression in which the target function f is approximated near xq using a
linear function of the form
𝑓̂(x) = w0 + w1a1(x) + ⋯ + wnan(x)
where ai(x) denotes the value of the ith attribute of the instance x.
• Recall that weights can be chosen to minimize the squared error summed over the set D of training examples,
E ≡ ½ Σ(x∈D) (f(x) − 𝑓̂(x))²
using the gradient descent training rule
Δwj = η Σ(x∈D) (f(x) − 𝑓̂(x)) aj(x)
where η is a small constant learning rate.
• To obtain a local rather than global approximation for the query xq, the error criterion can be redefined in several ways:
1. Minimize the squared error over just the k nearest neighbours of xq:
E1(xq) ≡ ½ Σ(x ∈ k nearest nbrs of xq) (f(x) − 𝑓̂(x))²
2. Minimize the squared error over the entire set D of training examples, while weighting the error of each
training example by some decreasing function K of its distance from xq:
E2(xq) ≡ ½ Σ(x∈D) K(d(xq, x)) (f(x) − 𝑓̂(x))²
3. Combine 1 and 2:
E3(xq) ≡ ½ Σ(x ∈ k nearest nbrs of xq) K(d(xq, x)) (f(x) − 𝑓̂(x))²
If we choose criterion three and re-derive the gradient descent rule, we obtain the following training rule:
Δwj = η Σ(x ∈ k nearest nbrs of xq) K(d(xq, x)) (f(x) − 𝑓̂(x)) aj(x)
The differences between this new rule and the global rule given above are that the contribution of instance x to
the weight update is now multiplied by the distance penalty K(d(xq, x)), and that the error is summed over only the
k nearest training examples.
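A minimal sketch of locally weighted linear regression along the lines of criterion 2: for each query, a few gradient-descent steps fit the linear form w0 + w1a1(x) + ⋯, with every training example's error weighted by a Gaussian kernel of its distance from xq. The kernel width, learning rate, iteration count and sample data are illustrative assumptions.

```python
import numpy as np

def locally_weighted_prediction(x_query, X, y, sigma=0.5, lr=0.01, steps=500):
    """Predict f(x_query) by fitting w0 + w.x with kernel-weighted squared error.

    X : (m, n) array of training instances, y : (m,) array of target values.
    Each example's error is weighted by K(d) = exp(-d^2 / (2 sigma^2)),
    where d is its Euclidean distance from x_query.
    """
    m, n = X.shape
    d2 = np.sum((X - x_query) ** 2, axis=1)        # squared distances to the query
    K = np.exp(-d2 / (2.0 * sigma ** 2))           # Gaussian distance weights

    w0, w = 0.0, np.zeros(n)
    for _ in range(steps):                          # gradient descent on the weighted error
        error = K * (y - (w0 + X @ w))              # distance-penalised residuals
        w0 += lr * error.sum()
        w  += lr * (X.T @ error)
    return w0 + x_query @ w

# Illustrative 1-D data: noisy samples of a sine curve
X = np.linspace(0, 5, 20).reshape(-1, 1)
y = np.sin(X[:, 0]) + 0.1 * np.random.default_rng(0).normal(size=20)
print(locally_weighted_prediction(np.array([2.5]), X, y))   # close to sin(2.5)
```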
❖ RADIAL BASIS FUNCTION NETWORKS
• One approach to function approximation that is closely related to distance-weighted regression and
also to artificial neural networks is learning with radial basis functions.
• In this approach, the learned hypothesis is a function of the form
𝑓̂(x) = w0 + Σ(u=1 to k) wu Ku(d(xu, x))        ... equ(1)
• Where, each xu is an instance from X and where the kernel function Ku(d(xu, x)) is defined so that it
decreases as the distance d(xu, x) increases.
• Here k is a user-provided constant that specifies the number of kernel functions to be included.
• Even though 𝑓̂ is a global approximation to f(x), the contribution from each of the Ku(d(xu, x)) terms is localized to a
region near the point xu.
• It is common to choose each function Ku(d(xu, x)) to be a Gaussian function centred at the point xu with some variance
𝜎u²:
Ku(d(xu, x)) = e^(−d(xu, x)² / (2𝜎u²))
• The functional form of equ(1) can approximate any function with arbitrarily small error, provided a
sufficiently large number k of such Gaussian kernels is used and provided the width 𝜎² of each kernel can be
separately specified.
• The function given by equ(1) can be viewed as describing a two-layer network where the first layer of
units computes the values of the various Ku(d(xu, x)) and where the second layer computes a linear
combination of these first-layer unit values.
Given a set of training examples of the target function, RBF networks are typically trained in a two-
stage process.
1. First, the number k of hidden units is determined, and each hidden unit u is defined by choosing the
values of xu and 𝜎u² that define its kernel function Ku(d(xu, x)).
2. Second, the weights wu are trained to maximize the fit of the network to the training data, using the
global error criterion
E ≡ ½ Σ(xi∈D) (f(xi) − 𝑓̂(xi))²
Because the kernel functions are held fixed during this second stage, the linear weight values wu can
be trained very efficiently.
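A minimal sketch of this two-stage procedure: stage one fixes k Gaussian kernels (here centred on a randomly chosen subset of the training instances, with a common width 𝜎), and stage two fits the linear output weights with the kernels held fixed; an ordinary least-squares solve is used here in place of gradient descent, since with fixed kernels the problem is linear. The centre selection, data and names are illustrative assumptions.

```python
import numpy as np

def train_rbf(X, y, k=6, sigma=1.0, seed=0):
    """Two-stage RBF training.

    Stage 1: choose k centres xu (a random subset of the training instances)
             and a common kernel width sigma.
    Stage 2: with the kernels fixed, fit the output weights (plus a bias w0)
             by least squares on the global error criterion.
    """
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]

    def design_matrix(Xq):
        # Phi[i, u] = Ku(d(xu, xi)) = exp(-||xi - xu||^2 / (2 sigma^2))
        d2 = ((Xq[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        Phi = np.exp(-d2 / (2.0 * sigma ** 2))
        return np.hstack([np.ones((len(Xq), 1)), Phi])   # bias column for w0

    weights, *_ = np.linalg.lstsq(design_matrix(X), y, rcond=None)
    return lambda Xq: design_matrix(Xq) @ weights

# Illustrative 1-D data: samples of a smooth curve
X = np.linspace(-3, 3, 30).reshape(-1, 1)
y = np.sin(X[:, 0])
rbf = train_rbf(X, y, k=6, sigma=1.0)
print(np.round(rbf(np.array([[0.0], [1.5]])), 2))   # approximately sin(0), sin(1.5)
```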
Several alternative methods have been proposed for choosing an appropriate number of hidden units
or, equivalently, kernel functions.
• One approach is to allocate a Gaussian kernel function for each training example (xi, f(xi)), centring this
Gaussian at the point xi. Each of these kernels may be assigned the same width 𝜎². Given this approach,
the RBF network learns a global approximation to the target function in which each training example
(xi, f(xi)) can influence the value of 𝑓̂ only in the neighbourhood of xi.
• A second approach is to choose a set of kernel functions that is smaller than the number of training
examples. This approach can be much more efficient than the first approach, especially when the
number of training examples is large.
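In terms of the illustrative train_rbf sketch given earlier, the first approach corresponds to using every training instance as a centre (k equal to the number of training examples), while the second keeps k much smaller; a hypothetical comparison:

```python
# One kernel per training example (first approach) vs. a small set of kernels
# (second approach), reusing the illustrative train_rbf sketch from above.
rbf_full  = train_rbf(X, y, k=len(X), sigma=1.0)   # 30 hidden units
rbf_small = train_rbf(X, y, k=6, sigma=1.0)        # 6 hidden units, cheaper
```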
❖ CASE-BASED LEARNING
• Case-based reasoning (CBR) is a learning paradigm based on lazy learning methods: new query instances are classified
by analysing similar stored instances while ignoring instances that are very different from the query.
• In CBR, instances are not represented as real-valued points; instead, they use a rich
symbolic representation.
• CBR has been applied to problems such as conceptual design of mechanical devices based on a stored
library of previous designs, reasoning about new legal cases based on previous rulings, and solving
planning and scheduling problems by reusing and combining portions of previous solutions to similar
problems.
• The CADET system employs case-based reasoning to assist in the conceptual design of simple
mechanical devices such as water faucets.
• It uses a library containing approximately 75 previous designs and design fragments to suggest
conceptual designs to meet the specifications of new design problems.
• Each instance stored in memory (e.g., a water pipe) is represented by describing both its structure and
its qualitative function.
• New design problems are then presented by specifying the desired function and requesting the
corresponding structure.
• CADET searches its library for stored cases whose functional descriptions match the design problem. If
an exact match is found, indicating that some stored case implements exactly the desired function,
then this case can be returned as a suggested solution to the design problem. If no exact match occurs,
CADET may find cases that match various subgraphs of the desired functional specification.