ML Unit IV
In nearest-neighbor learning the target function may be either discrete-valued or real-valued.
Let us first consider learning discrete-valued target functions of the form f : ℝn → V, where V is the finite set {v1, ..., vs}.
The positive and negative training examples are shown by “+” and “-” respectively. A query point
xq is shown as well.
The 1-Nearest Neighbor algorithm classifies xq as a positive example in this figure, whereas the 5-
Nearest Neighbor algorithm classifies it as a negative example.
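A minimal sketch of discrete-valued k-NN classification, assuming Euclidean distance over numeric attribute vectors (the data and function names below are illustrative, not taken from the text):

import math
from collections import Counter

def euclidean(a, b):
    # Standard Euclidean distance between two attribute vectors.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def knn_classify(training, xq, k):
    # Majority vote among the k training examples nearest to xq.
    # `training` is a list of (attribute_vector, label) pairs.
    neighbors = sorted(training, key=lambda ex: euclidean(ex[0], xq))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# As in the figure, k = 1 and k = 5 can disagree on the same query:
examples = [([1.0, 1.0], '+'), ([1.2, 0.9], '-'), ([0.8, 1.1], '-'),
            ([3.0, 3.0], '-'), ([2.9, 3.1], '-')]
print(knn_classify(examples, [1.0, 1.0], 1))   # '+' (single nearest neighbor)
print(knn_classify(examples, [1.0, 1.0], 5))   # '-' (majority of five)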
The figure below shows the shape of the decision surface induced by 1-Nearest Neighbor over the entire instance space.
The decision surface is a combination of convex polyhedra surrounding each of the training
examples.
For every training example, the polyhedron indicates the set of query points whose classification will
be completely determined by that training example.
Query points outside the polyhedron are closer to some other training example. This kind of diagram
is often called the Voronoi diagram of the set of training examples.
The k-Nearest Neighbor algorithm for approximating a real-valued target function replaces the majority vote with the mean of the values of the k nearest neighbors:

𝑓̂(xq) ← ( f(x1) + f(x2) + ... + f(xk) ) / k
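A sketch of this real-valued variant, reusing the illustrative euclidean helper from the earlier snippet:

def knn_regress(training, xq, k):
    # Mean of the target values of the k nearest training examples.
    neighbors = sorted(training, key=lambda ex: euclidean(ex[0], xq))[:k]
    return sum(fx for _, fx in neighbors) / k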
Terminology
Regression means approximating a real-valued target function.
Residual is the error 𝑓̂(x) − f(x) in approximating the target function.
Kernel function is the function of distance that is used to determine the weight of each training
example.
In other words, the kernel function is the function K such that
wi = K(d(xi, xq))
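For example, with an inverse-square kernel (one common choice; the helper names are illustrative) the prediction becomes a kernel-weighted mean:

def inverse_square_kernel(d, eps=1e-12):
    # wi = K(d(xi, xq)) with K(d) = 1 / d^2; eps guards against d = 0.
    return 1.0 / (d ** 2 + eps)

def weighted_knn_regress(training, xq, k):
    # Weighted mean of the k nearest targets, weights wi = K(d(xi, xq)).
    neighbors = sorted(training, key=lambda ex: euclidean(ex[0], xq))[:k]
    pairs = [(inverse_square_kernel(euclidean(x, xq)), fx) for x, fx in neighbors]
    return sum(w * fx for w, fx in pairs) / sum(w for w, _ in pairs)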
In locally weighted linear regression, the target function is approximated near xq using a linear function of the form

𝑓̂(x) = w0 + w1 a1(x) + ... + wn an(x)

where ai(x) denotes the value of the ith attribute of the instance x.
Gradient descent methods can be used to choose weights that minimize the squared error summed over the set D of training examples.
We need to modify this procedure to derive a local approximation rather than a global one.
The simple way is to redefine the error criterion E to emphasize fitting the local training examples.
Three possible criteria are given below.
1. Minimize the squared error over just the k nearest neighbors:

E1(xq) = 1/2 Σ ( f(x) − 𝑓̂(x) )², summed over the k nearest neighbors of xq

2. Minimize the squared error over the entire set D of training examples, while weighting the error of each training example by some decreasing function K of its distance from xq:

E2(xq) = 1/2 Σx∈D ( f(x) − 𝑓̂(x) )² K(d(xq, x))

3. Combine 1 and 2:

E3(xq) = 1/2 Σ ( f(x) − 𝑓̂(x) )² K(d(xq, x)), summed over the k nearest neighbors of xq
If we choose criterion 3 and re-derive the gradient descent rule, we obtain the following training rule:

Δwj = η Σ K(d(xq, x)) ( f(x) − 𝑓̂(x) ) aj(x), summed over the k nearest neighbors of xq

The differences between this new rule and the rule given by Equation (3) are that the contribution of instance x to the weight update is now multiplied by the distance penalty K(d(xq, x)), and that the error is summed over only the k nearest training examples.
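A compact sketch of locally weighted regression that minimizes this same weighted squared-error criterion in closed form (weighted least squares) rather than by gradient descent; for this convex criterion the two approaches agree at the optimum. NumPy-based, with illustrative names:

import numpy as np

def locally_weighted_regression(X, y, xq, k, kernel):
    # Fit f_hat(x) = w0 + w1*a1(x) + ... + wn*an(x) around the query xq,
    # using only the k nearest examples, each weighted by K(d(xq, x)).
    # Assumes k >= n + 1 so the normal equations are well determined.
    dists = np.linalg.norm(X - xq, axis=1)
    nearest = np.argsort(dists)[:k]
    Xn = np.c_[np.ones(k), X[nearest]]          # prepend the bias attribute
    W = np.diag([kernel(d) for d in dists[nearest]])
    # Weighted normal equations: (Xn' W Xn) w = Xn' W y
    w = np.linalg.solve(Xn.T @ W @ Xn, Xn.T @ W @ y[nearest])
    return np.r_[1.0, xq] @ w                   # evaluate f_hat at xq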
One approach to function approximation that is closely related to distance-weighted regression is learning with radial basis functions. In this approach, the learned hypothesis is a function of the form

𝑓̂(x) = w0 + Σu=1..k wu Ku(d(xu, x))    ... equ (1)

where each xu is an instance from X and where the kernel function Ku(d(xu, x)) is defined so that it decreases as the distance d(xu, x) increases.
Here k is a user-provided constant that specifies the number of kernel functions to be included.
Even though 𝑓̂(x) is a global approximation to f(x), the contribution from each of the Ku(d(xu, x)) terms is localized to a region near the point xu.
It is common to choose each Ku to be a Gaussian function centered at the point xu with some variance 𝜎u²:

Ku(d(xu, x)) = e^( −d²(xu, x) / 2𝜎u² )

The functional form of equ (1) can approximate any function with arbitrarily small error, provided a sufficiently large number k of such Gaussian kernels and provided the width 𝜎² of each kernel can be separately specified.
The function given by equ (1) can be viewed as describing a two-layer network, where the first layer of units computes the values of the various Ku(d(xu, x)) and the second layer computes a linear combination of these first-layer unit values.
Example: Radial basis function (RBF) network:
Given a set of training examples of the target function, RBF networks are typically trained in a two-stage
process.
1. First, the number k of hidden units is determined, and each hidden unit u is defined by choosing the values of xu and 𝜎u² that define its kernel function Ku(d(xu, x)).
2. Second, the weights wu are trained to maximize the fit of the network to the training data, using the global error criterion

E = 1/2 Σx∈D ( f(x) − 𝑓̂(x) )²

Because the kernel functions are held fixed during this second stage, the linear weight values wu can be trained very efficiently.
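A minimal sketch of this two-stage procedure, assuming Gaussian kernels with a shared width and a linear least-squares fit for stage two (NumPy-based; the class and method names are illustrative):

import numpy as np

class RBFNetwork:
    def __init__(self, centers, sigma):
        # Stage 1: the centers xu and width sigma fix the kernel functions.
        self.centers = np.asarray(centers, dtype=float)
        self.sigma = sigma
        self.w = None

    def _design(self, X):
        # One column of ones (for w0), then one Gaussian unit per center.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=-1)
        return np.c_[np.ones(len(X)), np.exp(-d2 / (2 * self.sigma ** 2))]

    def fit(self, X, y):
        # Stage 2: with the kernels held fixed, fitting the weights is an
        # ordinary linear least-squares problem, hence very efficient.
        self.w, *_ = np.linalg.lstsq(self._design(np.asarray(X, dtype=float)),
                                     np.asarray(y, dtype=float), rcond=None)
        return self

    def predict(self, X):
        return self._design(np.asarray(X, dtype=float)) @ self.w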
Several alternative methods have been proposed for choosing an appropriate number of hidden units or,
equivalently, kernel functions.
One approach is to allocate a Gaussian kernel function for each training example (xi, f(xi)), centering this Gaussian at the point xi.
CASE-BASED LEARNING
6. WRITE ABOUT CASE BASED LEARNING. (PART – C)
In case-based learning, the system learns from individual instances or cases rather than relying solely on general rules or models.
It is a type of lazy learning in which the system stores specific instances and uses them to make predictions or decisions.
In the context of machine learning, case-based learning involves:
1. Case Representation: Each instance or case is represented by a set of attributes or features. These
attributes describe the characteristics of the case.
2. Case Retrieval: When a new query or instance is presented to the system, it searches through its stored
cases to find the most similar cases to the query.
3. Case Adaptation: The system adapts the information from retrieved cases to generate a prediction or
solution for the new query.
4. Case Base Maintenance: The system might update its case base over time by adding new cases or
removing outdated ones to improve its performance.
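A toy sketch of this retrieve-and-adapt cycle, with attribute vectors as the case representation; every name here is hypothetical and stands in for a domain-specific component:

def retrieve(case_base, query, similarity):
    # Case retrieval: the stored case most similar to the query.
    # `case_base` is a list of (attributes, solution) pairs.
    return max(case_base, key=lambda case: similarity(case[0], query))

def solve(case_base, query, similarity, adapt):
    attributes, solution = retrieve(case_base, query, similarity)
    # Case adaptation: adjust the retrieved solution to fit the query.
    new_solution = adapt(solution, attributes, query)
    # Case base maintenance: keep the solved query as a new case.
    case_base.append((query, new_solution))
    return new_solution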
Case-based learning is particularly useful when there is a lack of clear rules or patterns that can be learned
from data directly.
It’s often employed in areas where domain knowledge and context play a significant role in making
decisions.
The success of case-based learning heavily depends on the quality of the case representation, the similarity measure, and the adaptability of retrieved cases to new situations.
A prototypical example of case-based reasoning:
The CADET system employs case-based reasoning to assist in the conceptual design of simple mechanical
devices such as water faucets.
It uses a library containing approximately 75 previous designs and design fragments to suggest conceptual
designs to meet the specifications of new design problems.
Each instance stored in memory (e.g., a water pipe) is represented by describing both its structure and its qualitative function.
New design problems are then presented by specifying the desired function and requesting the corresponding
structure.
The problem setting is illustrated in the figure below:
The function is represented in terms of the qualitative relationships among the water-flow levels and temperatures at its inputs and outputs.
In the functional description, an arrow with a "+" label indicates that the variable at the arrow head increases
with the variable at its tail.
A "-" label indicates that the variable at the head decreases with the variable at the tail.
Here Qc refers to the flow of cold water into the faucet, Qh to the input flow of hot water, and Qm to the single mixed flow out of the faucet.
Tc, Th, and Tm refer to the temperatures of the cold water, hot water, and mixed water respectively.
The variable Ct denotes the control signal for temperature that is input to the faucet, and Cf denotes the control signal for water flow.
The controls Ct and Cf are to influence the water flows Qc and Qh, thereby indirectly influencing the faucet output flow Qm and temperature Tm.
CADET searches its library for stored cases whose functional descriptions match the design problem.
If an exact match is found, indicating that some stored case implements exactly the desired function, then this
case can be returned as a suggested solution to the design problem.
If no exact match occurs, CADET may find cases that match various subgraphs of the desired functional
specification.
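To make the matching step concrete, a functional description can be encoded as a set of labeled influence edges, with retrieval checking whether a stored case's graph contains the query's edges. The encoding and the signs below are illustrative only, not CADET's actual representation:

# Each edge (tail, head, sign) reads: the head variable increases ("+")
# or decreases ("-") with the tail variable, as in the arrow-label
# convention described above.
query_function = {("Cf", "Qc", "+"), ("Cf", "Qh", "+")}
stored_case = {("Cf", "Qc", "+"), ("Cf", "Qh", "+"),
               ("Ct", "Qc", "-"), ("Ct", "Qh", "+")}

def implements(case_edges, query_edges):
    # Exact match: the stored case realizes every queried relationship.
    return query_edges <= case_edges

print(implements(stored_case, query_function))  # True: suggest this case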
Reference:
1. Tom M. Mitchell, Machine Learning, McGraw-Hill Education (India) Private Limited, 2013.
PART – A (1 MARK)
1. ------------------- is an instance-based learner.
a) Eager Learner b) Lazy Learner c) Both (A) and (B) d) None of the Above
2. Machine Learning uses various function representations. Which of the following is not a numerical function?
a) Case-based b) Neural Network c) Linear Regression d) Support Vector Machines
3. Which of the following is true about k in k-Nearest Neighbor in terms of bias?
a) When you decrease k, the bias increases
b) When you increase k, the bias increases
c) Both (A) and (B)
d) None of the Above
4. Which of the following statements is false about k-Nearest Neighbor algorithm?
a) It stores all available cases and classifies new cases based on a similarity measure
b) It has been used in statistical estimation and pattern recognition
c) It cannot be used for regression
d) The input consists of the k closest training examples in the feature space
5. What are the advantages of the Nearest Neighbor algorithm?
a) Training is very fast b) Can learn complex target functions
c) Don’t lose information d) All of these
6. What if the target function is real-valued in the kNN algorithm?
a) Calculate the mean of the k nearest neighbors
b) Calculate the standard deviation of the k nearest neighbors
c) None of these
d) All of the above
7. What is/are advantage(s) of Locally Weighted Regression?
a) Pointwise approximation of complex target function
b) Earlier data has no influence on the new ones
c) Both A & B d) None of these
8. Which network is more accurate when the training set size is small to medium?
a) PNN/GRNN b) RBF c)K-means clustering d) None of these
9. What is/are true about RBF network?
a) A kind of supervised learning
b) Design of NN as curve fitting problem
c) Use of multidimensional surface to interpolate the test data
d) All of these
10. In the k-NN algorithm, given a set of training examples and a value of k < size of the training set (n), the algorithm predicts the class of a test example to be the:
a) Least frequent class among the classes of k closest training examples.
b) Most frequent class among the classes of k closest training examples.
c) Class of the closest point.
d) Most frequent class among the classes of the k farthest training examples.
PART – B (5 MARKS)
1. Write short notes on instance based learning. (Refer P. No.1, Q. No.1)
2. Write short notes on distance weighted nearest neighbor learning. (Refer P. No.1, Q. No.3)
3. Write down the steps involved in locally weighted regression. (Refer P. No.4, Q. No.4)
4. Give an example of radial basis function. (Refer P. No.5, Q. No.5)