Module 5
• Each time a new query instance is encountered, its relationship to the previously
stored examples is examined in order to assign a target function value for the new
instance.
• Instance-based learning also includes case-based reasoning methods that use more complex, symbolic
representations for instances.
• A key advantage of this kind of delayed, or lazy, learning is that instead of estimating
the target function once for the entire instance space, these methods can estimate it
locally and differently for each new instance to be classified.
Instance-based Learning
• Instance-based learning methods such as nearest neighbor and locally weighted
regression are conceptually straightforward approaches to approximating real-
valued or discrete-valued target functions.
• One disadvantage of instance-based approaches is that the cost of classifying new instances can be
high. This is due to the fact that nearly all computation takes place at classification time
rather than when the training examples are first encountered.
Sl. No. | Height | Weight | Target | Distance to query (157, 54) | Nearest point rank
1 | 150 | 50 | Medium | 8.06 | –
2 | 155 | 55 | Medium | 2.24 | 1
3 | 160 | 60 | Large | 6.71 | 3
4 | 161 | 59 | Large | 6.40 | 2
5 | 158 | 65 | Large | 11.05 | –
6 | 157 | 54 | ? | – (query instance) | –
• If k = 1, then the 1-Nearest Neighbor algorithm assigns to f̂(xq) the value f(xi), where xi is the
training example nearest to xq.
• For larger values of k, the algorithm assigns the most common value among the
k nearest training examples (a short code sketch follows below).
• (Figure: the positive and negative training examples are shown by “+” and “-” respectively.)
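The k-NN computation in the table above can be reproduced with a short script. The following is a minimal sketch (not part of the original slides): it computes Euclidean distances from the query instance (157, 54) and takes a majority vote over the k = 3 nearest neighbours.

```python
from collections import Counter
import math

# Training data from the table: (height, weight, class label)
train = [
    (150, 50, "Medium"),
    (155, 55, "Medium"),
    (160, 60, "Large"),
    (161, 59, "Large"),
    (158, 65, "Large"),
]
query = (157, 54)
k = 3

# Euclidean distance from each training example to the query
dists = sorted((math.dist((h, w), query), label) for h, w, label in train)

# Majority vote among the k nearest neighbours
votes = Counter(label for _, label in dists[:k])
print(dists[:k])                    # nearest 3: ~2.24 (Medium), 6.40 (Large), 6.71 (Large)
print(votes.most_common(1)[0][0])   # predicted class for the query: 'Large'
```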
Sl. No. | Height | Weight | Target | Distance to query (157, 54) | Nearest point rank
1 | 150 | 50 | 1.5 | 8.06 | –
2 | 155 | 55 | 1.2 | 2.24 | 1
3 | 160 | 60 | 1.8 | 6.71 | 3
4 | 161 | 59 | 2.1 | 6.40 | 2
5 | 158 | 65 | 1.7 | 11.05 | –
6 | 157 | 54 | ? | – (query instance) | –
Sl. No. | Height | Weight | Target | Distance | Weight = 1/Distance² | Nearest point rank
1 | 150 | 50 | 1.5 | 8.06 | – | –
2 | 155 | 55 | 1.2 | 2.24 | 0.200 | 1
3 | 160 | 60 | 1.8 | 6.71 | 0.022 | 3
4 | 161 | 59 | 2.1 | 6.40 | 0.024 | 2
5 | 158 | 65 | 1.7 | 11.05 | – | –
6 | 157 | 54 | ? | – (query instance) | – | –
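The predictions in the two tables above can be reproduced with the minimal sketch below (mine, not from the slides). The unweighted 3-NN average of the three nearest targets is (1.2 + 2.1 + 1.8)/3 = 1.7; the distance-weighted version uses weights 1/d² as in the table header.

```python
import math

# Training data from the table: (height, weight, real-valued target)
train = [
    (150, 50, 1.5),
    (155, 55, 1.2),
    (160, 60, 1.8),
    (161, 59, 2.1),
    (158, 65, 1.7),
]
query = (157, 54)
k = 3

# k nearest neighbours by Euclidean distance to the query
nearest = sorted((math.dist((h, w), query), t) for h, w, t in train)[:k]

# Distance-weighted k-NN: weighted average with weights 1 / d^2
weights = [1.0 / d**2 for d, _ in nearest]
prediction = sum(wgt * t for wgt, (_, t) in zip(weights, nearest)) / sum(weights)
print(round(prediction, 2))   # distance-weighted estimate, ~1.34
```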
– local because the function is approximated based only on data near the query
point,
– weighted because the contribution of each training example is weighted by its
distance from the query point, and
– regression because this is the term used widely in the statistical learning
community for the problem of approximating real-valued functions.
LOCALLY WEIGHTED REGRESSION
• Given a new query instance xq, the general approach in locally weighted regression is to
construct an approximation f̂ that fits the training examples in the neighborhood surrounding xq.
• This approximation is then used to calculate the value f̂(xq), which is output as the estimated
target value for the query instance.
• Consider the case where the target function is approximated near xq using a linear function of the form
f̂(x) = w0 + w1 a1(x) + ... + wn an(x)
• Where, ai(x) denotes the value of the ith attribute of the instance x
• In order to fit the weights w0 ... wn to the local training examples, we might choose one of the following error criteria:
1. Minimize the squared error over just the k nearest neighbors:
E1(xq) = ½ Σ x ∈ k nearest nbrs of xq ( f(x) − f̂(x) )²
2. Minimize the squared error over the entire set D of training examples, while weighting
the error of each training example by some decreasing function K of its distance from xq:
E2(xq) = ½ Σ x ∈ D ( f(x) − f̂(x) )² K(d(xq, x))
3. Combine 1 and 2, minimizing the squared error over just the k nearest neighbors while
weighting the error of each by K of its distance from xq:
E3(xq) = ½ Σ x ∈ k nearest nbrs of xq ( f(x) − f̂(x) )² K(d(xq, x))
• If we choose criterion three and re-derive the gradient descent rule, we obtain the
following training rule:
Δwj = η Σ x ∈ k nearest nbrs of xq K(d(xq, x)) ( f(x) − f̂(x) ) aj(x)
• The differences between this new rule and the rule given by Equation (3) are that the
contribution of instance x to the weight update is now multiplied by the distance penalty
K(d(xq, x)), and that the error is summed over only the k nearest training examples.
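A minimal sketch of this distance-weighted training rule is given below. The Gaussian kernel, the learning rate, and the toy data are my own illustrative choices, not specified in the slides.

```python
import numpy as np

def gaussian_kernel(d, sigma):
    """Kernel K: a decreasing function of the distance d."""
    return np.exp(-d**2 / (2 * sigma**2))

def locally_weighted_fit(X, y, x_q, k=5, eta=0.01, sigma=1.0, epochs=10000):
    """Fit a local linear model f_hat(x) = w0 + w1*a1(x) + ... around the query x_q
    using the distance-weighted gradient descent rule (criterion three)."""
    d = np.linalg.norm(X - x_q, axis=1)      # distances d(x_q, x) to the query
    idx = np.argsort(d)[:k]                  # indices of the k nearest training examples
    Xk, yk = X[idx], y[idx]
    Kk = gaussian_kernel(d[idx], sigma)      # distance penalties K(d(x_q, x))

    A = np.hstack([np.ones((k, 1)), Xk])     # a0(x) = 1 column for the bias weight w0
    w = np.zeros(A.shape[1])
    for _ in range(epochs):
        err = yk - A @ w                     # f(x) - f_hat(x) on the k neighbours
        w += eta * (A.T @ (Kk * err))        # delta w_j = eta * sum K * (f - f_hat) * a_j(x)
    return w

# Hypothetical toy data: the target is roughly 0.5 * x plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(20, 1))
y = 0.5 * X[:, 0] + rng.normal(0, 0.1, size=20)

x_q = np.array([4.0])
w = locally_weighted_fit(X, y, x_q)
print(w[0] + w[1] * x_q[0])   # local estimate f_hat(x_q); should be close to 0.5 * 4 = 2
```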
RADIAL BASIS FUNCTIONS
• One approach to function approximation that is closely related to distance-weighted
regression and also to artificial neural networks is learning with radial basis functions.
• In this approach, the learned hypothesis is a function of the form
f̂(x) = w0 + Σ u=1..k wu Ku(d(xu, x))      ... equ(1)
• Where, each xu is an instance from X and where the kernel function Ku(d(xu, x)) is defined
so that it decreases as the distance d(xu, x) increases.
• Here k is a user provided constant that specifies the number of kernel functions to be
included.
• Even though f̂(x) is a global approximation to f(x), the contribution from each of the Ku(d(xu, x)) terms is
localized to a region near the point xu.
RADIAL BASIS FUNCTIONS
• Choose each function Ku(d(xu, x)) to be a Gaussian function centred at the point xu with
some variance σu²:
Ku(d(xu, x)) = exp( − d²(xu, x) / (2σu²) )
• The functional form of equ(1) can approximate any function with arbitrarily small error,
provided a sufficiently large number k of such Gaussian kernels and provided the width σ² of
each kernel can be separately specified.
• The function given by equ(1) can be viewed as describing a two layer network where the
first layer of units computes the values of the various Ku(d(xu, x)) and where the second
layer computes a linear combination of these first-layer unit values
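A minimal sketch of such a two-layer RBF network is given below. The choices of kernel centres (evenly spaced training points), a common width, and fitting the second-layer weights by linear least squares are illustrative assumptions, not prescribed by the slides.

```python
import numpy as np

def rbf_design_matrix(X, centres, sigma):
    """First layer: Gaussian kernel activations K_u(d(x_u, x)) plus a bias column."""
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)   # pairwise distances
    K = np.exp(-d**2 / (2 * sigma**2))
    return np.hstack([np.ones((X.shape[0], 1)), K])                   # [1, K_1, ..., K_k]

# Hypothetical toy data: a noisy sine curve
rng = np.random.default_rng(1)
X = np.linspace(0, 2 * np.pi, 40)[:, None]
y = np.sin(X[:, 0]) + rng.normal(0, 0.05, size=40)

# Choose k kernel centres (here: evenly spaced training points) and a common width
k, sigma = 8, 0.7
centres = X[:: len(X) // k][:k]

# Second layer: linear weights w_0 .. w_k found by least squares
Phi = rbf_design_matrix(X, centres, sigma)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Evaluate f_hat at a query point
x_q = np.array([[np.pi / 2]])
print(rbf_design_matrix(x_q, centres, sigma) @ w)   # close to sin(pi/2) = 1
```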
CASE-BASED REASONING
• Instance-based methods such as k-NEAREST NEIGHBOR and locally weighted
regression share three key properties.
• First, they are lazy learning methods in that they defer the decision of how to
generalize beyond the training data until a new query instance is observed.
• Second, they classify new query instances by analyzing similar instances while
ignoring instances that are very different from the query.
• Third, they represent instances as real-valued points in an n-dimensional Euclidean space;
case-based reasoning shares the first two properties but represents instances by richer
symbolic descriptions.
• To get a prediction for a new example, those cases that are similar, or close to, the
new example are used to predict the value of the target features of the new
example.
• This is at one extreme of the learning problem where, unlike decision trees and
neural networks, relatively little work must be done offline, and virtually all of the
work is performed at query time.
1. Retrieve - Given a new case, retrieve similar cases from the case base.
2. Reuse - Adapt the retrieved cases to fit the new case.
3. Revise - Evaluate the solution and revise it based on how well it works.
4. Retain - Decide whether to retain this new case in the case base.
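A minimal sketch of the Retrieve step is shown below, using a hypothetical help-desk case base and a simple attribute-overlap similarity. All attribute names, cases, and solutions are illustrative, not taken from the slides.

```python
# A minimal retrieve step (illustrative only; attribute names and cases are hypothetical).
case_base = [
    {"problem": {"symptom": "no network", "os": "windows", "connection": "wifi"},
     "solution": "restart the wireless adapter"},
    {"problem": {"symptom": "no network", "os": "linux", "connection": "ethernet"},
     "solution": "renew the DHCP lease"},
    {"problem": {"symptom": "printer offline", "os": "windows", "connection": "wifi"},
     "solution": "reinstall the printer driver"},
]

def similarity(query, case):
    """Count how many attribute values the query shares with a stored case."""
    return sum(query.get(attr) == val for attr, val in case["problem"].items())

def retrieve(query, cases, n=1):
    """Return the n most similar cases from the case base."""
    return sorted(cases, key=lambda c: similarity(query, c), reverse=True)[:n]

query = {"symptom": "no network", "os": "windows", "connection": "wifi"}
print(retrieve(query, case_base)[0]["solution"])   # most similar case: restart the wireless adapter
```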
CASE-BASED REASONING – Example
• A common example of a case-based reasoning system is a help desk that users call with
problems to be solved.
• Case-based reasoning could be used by the diagnostic assistant to help users diagnose problems
on their computer systems.
• When users give a description of a problem, the closest cases in the case base are retrieved. The
diagnostic assistant could recommend some of these to the user, adapting each case to the
user’s particular situation.
• An example of adaptation is to change the recommendation based on what software the user
has, what method they use to connect to the Internet, and the model of the printer.
• If one of the adapted cases works, that case is added to the case base, to be used when another
user asks a similar question.
• In this way, all of the common cases will eventually be in the case base.
• If none of the cases found works, some other method is attempted to solve the problem,
perhaps by adapting other cases or having a human help diagnose the problem.
• When the problem is finally solved, the solution is added to the case base.
A prototypical example of case-based reasoning – the CADET system
• The CADET system employs case-based reasoning to assist in the conceptual design
of simple mechanical devices such as water faucets.
• Each stored case (e.g., a water pipe) is represented by describing both its structure and its
qualitative function.
• New design problems are then presented by specifying the desired function and
requesting the corresponding structure.
• If an exact match is found, indicating that some stored case implements exactly
the desired function, then this case can be returned as a suggested solution to
the design problem.
• If no exact match occurs, CADET may find cases that match various subgraphs of
the desired functional specification.
• Humans observe the environment through senses such as the eyes and ears.
• The brain then decides on actions, which are performed voluntarily or involuntarily.
• The kid interacts with the environment and gains valuable experience.
• For example, when a kid accidentally burns his hand in a fire, he learns to stay away from it in
the future and to be careful.
• Similarly, the kid learns to do good things repeatedly by the encouragement or gifts given
by parents or teachers.
• Thus, kids gain experience and learn many things through rewards and punishments
(negative rewards).
OVERVIEW OF REINFORCEMENT LEARNING
• Just as a kid executes many actions and receives a mix of positive and negative rewards,
gaining experience and learning, a computer program or robot can learn through the
experience of simulated scenarios.
• In a maze game, there may be a danger spot that leads to a loss.
• Negative rewards can be designed for such spots so that the agent does not visit that spot.
• Positive and negative rewards are simulated in reinforcement learning, say +10 for positive
reward and -10 for some danger or negative reward.
2. Absence of a model is a challenge - Games like chess have a fixed board and rules. But many games do not
have any fixed environment or rules, and there is no underlying model either. So, simulation must be done to
gather experience.
3. Partial observability of states - While many states are fully observable, imagine a weather forecasting
scenario where uncertainty or partial observability exists because complete information about the state is
simply not available.
4. Time consuming operations - Larger state spaces and more possible actions complicate the scenarios,
resulting in more time consumption.
5. Complexity - Many games like Go are complicated, with a much larger board configuration and many
possible actions. Moreover, labelled data is simply not available. This adds more complexity to the design of
reinforcement learning algorithms.
REINFORCEMENT LEARNING AS MACHINE LEARNING
• Differences between Reinforcement Learning and Supervised Learning
• The robot, or agent, has a set of sensors to observe the state of its environment, and a set
of actions it can perform to alter this state.
• Its task is to learn a control strategy, or policy, for choosing actions that achieve its goals.
• The goals of the agent can be defined by a reward function that assigns a numerical value
to each distinct action the agent may take from each distinct state.
• This reward function may be built into the robot, or known only to an external teacher
who provides the reward value for each action performed by the robot.
• The task of the robot is to perform sequences of actions, observe their consequences, and
learn a control policy.
• The control policy we desire is one that, from any initial state, chooses actions that maximize the
reward accumulated over time by the agent.
• The robot may have a goal of docking onto its battery charger whenever its battery level
is low.
• The goal of docking to the battery charger can be captured by assigning a positive reward
(e.g., +100) to state-action transitions that immediately result in a connection to the
charger and a reward of zero to every other state-action transition.
• Each time the agent performs an action at in some state st, it receives a real-valued
reward rt that indicates the immediate value of this state-action transition.
• This produces a sequence of states si, actions ai, and immediate rewards ri as shown in
the figure.
• The agent's task is to learn a control policy, π: S → A, that maximizes the expected sum of
these rewards, with future rewards discounted exponentially by their delay.
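A minimal sketch of this agent-environment loop is given below, using a small hypothetical one-dimensional environment (not the grid world discussed later in these slides): the agent observes a state, applies an action chosen by a policy π, and receives an immediate reward, accumulating the discounted return.

```python
import random

# A tiny hypothetical environment: states 0..4 on a line, goal state 4.
# Actions: -1 (left) and +1 (right). Reaching the goal gives reward +100, else 0.
def step(state, action):
    next_state = min(max(state + action, 0), 4)
    reward = 100 if next_state == 4 else 0
    return next_state, reward

# A fixed (not yet learned) policy pi: S -> A that mostly moves right
def pi(state):
    return 1 if random.random() < 0.8 else -1

# One episode: the agent observes s_t, applies a_t = pi(s_t), receives r_t
state, gamma, discounted_return = 0, 0.9, 0.0
for t in range(20):
    action = pi(state)
    next_state, reward = step(state, action)
    discounted_return += (gamma ** t) * reward   # accumulate gamma^t * r_t
    state = next_state
    if reward == 100:
        break
print(discounted_return)   # e.g. 0.9^3 * 100 = 72.9 if the goal is reached at t = 3
```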
Reinforcement learning problem characteristics
1. Delayed reward: The task of the agent is to learn a target function 𝜋 that maps from the current state
s to the optimal action a = π(s). In reinforcement learning, training information is not available in the form
(s, π(s)). Instead, the trainer provides only a sequence of immediate reward values as the agent
executes its sequence of actions. The agent, therefore, faces the problem of temporal credit
assignment: determining which of the actions in its sequence are to be credited with producing the
eventual rewards.
2. Exploration: In reinforcement learning, the agent influences the distribution of training examples by
the action sequence it chooses. This raises the question of which experimentation strategy produces
most effective learning. The learner faces a trade-off in choosing whether to favor exploration of
unknown states and actions, or exploitation of states and actions that it has already learned will yield
high reward. A common way to balance the two, ε-greedy action selection, is sketched after this list.
Reinforcement learning problem characteristics
3. Partially observable states: Although the agent's sensors can perceive the entire state of the environment at
each time step, in many practical situations sensors provide only partial information. In such cases,
the agent needs to consider its previous observations together with its current sensor data when
choosing actions, and the best policy may be one that chooses actions specifically to improve the
observability of the environment.
4. Life-long learning: A robot often needs to learn several related tasks within the same environment, using
the same sensors. For example, a mobile robot may need to learn how to dock on its battery charger,
how to navigate through narrow corridors, and how to pick up output from laser printers. This setting
raises the possibility of using previously obtained experience or knowledge to reduce sample
complexity when learning new tasks.
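As referenced in characteristic 2 above, a minimal sketch of ε-greedy action selection follows. The value estimates and state/action names are hypothetical; this is one common strategy, not one prescribed by the slides.

```python
import random

def epsilon_greedy(q_values, state, actions, epsilon=0.1):
    """With probability epsilon explore a random action, otherwise exploit the
    action with the highest estimated value for this state."""
    if random.random() < epsilon:
        return random.choice(actions)                                   # exploration
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))    # exploitation

# Hypothetical value estimates for a single state
q = {("s1", "left"): 2.0, ("s1", "right"): 5.0}
counts = {"left": 0, "right": 0}
for _ in range(1000):
    counts[epsilon_greedy(q, "s1", ["left", "right"])] += 1
print(counts)   # roughly {'left': 50, 'right': 950}: mostly exploits, occasionally explores
```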
1. One approach is to require the policy that produces the greatest possible cumulative
reward for the robot over time.
– To state this requirement more precisely, define the cumulative value Vπ(st) achieved by
following an arbitrary policy π from an arbitrary initial state st as follows:
Vπ(st) = rt + γ rt+1 + γ² rt+2 + ⋯ = Σ i=0 to ∞ γ^i rt+i,  where 0 ≤ γ < 1 is a constant that
determines the relative value of delayed versus immediate rewards.
• The quantity Vπ(st) is called the discounted cumulative reward achieved by policy π from
initial state st. It is reasonable to discount future rewards relative to immediate rewards
because, in many cases, we prefer to obtain the reward sooner rather than later (a small
worked example follows below).
• An alternative criterion considers the average reward per time step over the entire lifetime of the agent.
• V*(s) ≡ maxπ Vπ(s) gives the maximum discounted cumulative reward that the agent can obtain starting
from state s, i.e., the value obtained by starting in s and following an optimal policy π*.
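• As a small illustrative calculation (not from the slides): if, following some policy π from state st, the agent receives the reward sequence rt = 0, rt+1 = 0, rt+2 = 100 and nothing afterwards, then with γ = 0.9 the discounted cumulative reward is Vπ(st) = 0 + 0.9·0 + 0.9²·100 = 81; had the reward of 100 arrived one step earlier, the value would be 90, which is why sooner rewards are preferred.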
• The agent moves one cell to the right in its grid world and receives an
immediate reward of zero for this transition.
• Each arrow in the diagram represents a possible action the agent can take to move from
one state to another.
• The number associated with each arrow represents the immediate reward r(s, a) the
agent receives if it executes the corresponding state-action transition
• The immediate reward in this environment is defined to be zero for all state-action
transitions except for those leading into the state labelled G.
• The state G is the goal state, and the agent can receive reward only by entering this state.
Reinforcement learning
• Once the states, actions, and immediate rewards are defined, and a value is chosen for the
discount factor γ, we can determine the optimal policy π* and its value function V*(s).
• Let’s choose γ = 0.9. The diagram at the bottom of the figure shows one optimal policy for
this setting.
• Tracing the best sequences of states is as simple as following the links with the highest
values at each state.
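The optimal values V*(s) for a grid world of this kind can be computed by value iteration. The sketch below assumes a 2 x 3 deterministic grid with G in the top-right corner, reward 100 for every transition entering G, 0 otherwise, and γ = 0.9 (the exact layout of the slide's figure is an assumption here). A state n steps from G ends up with V* = 0.9^(n−1)·100, and tracing the highest-valued neighbouring states recovers an optimal policy, as described above.

```python
# Value iteration on a hypothetical 2 x 3 deterministic grid world (the exact
# layout of the slide's figure is not reproduced here). Entering the goal G
# yields reward 100; every other transition yields 0; gamma = 0.9.
GAMMA = 0.9
ROWS, COLS = 2, 3
GOAL = (0, 2)                                   # state G in the top-right corner (assumed)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right

def transitions(state):
    """Yield (next_state, immediate_reward) pairs for every legal action in a state."""
    if state == GOAL:                            # treat G as absorbing with no further reward
        return
    r, c = state
    for dr, dc in ACTIONS:
        nr, nc = r + dr, c + dc
        if 0 <= nr < ROWS and 0 <= nc < COLS:
            yield (nr, nc), (100 if (nr, nc) == GOAL else 0)

V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
for _ in range(50):                              # V(s) <- max_a [ r(s, a) + gamma * V(s') ]
    V = {s: max((rew + GAMMA * V[s2] for s2, rew in transitions(s)), default=0.0)
         for s in V}

for r in range(ROWS):
    print([round(V[(r, c)], 1) for c in range(COLS)])
# Expected output for this assumed layout (G itself stays at 0):
# [90.0, 100.0, 0.0]
# [81.0, 90.0, 100.0]
```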