
KON 426E

INTELLIGENT CONTROL SYSTEMS

LECTURE 2
24/02/2025
Learning
• The property that is of primary significance for a neural
network is the ability of the network to learn from its
environment and to improve its performance through
learning. The improvement in performance takes place over
time in accordance with some prescribed measure.
• A neural network learns about its environment through an
interactive process of adjustments applied to its synaptic
weights and bias levels.

* Reference: Haykin, S., Neural Networks and Learning Machines, 2009.


Parameters to be learned:
Weights and bias values
Definition of Learning
(Mendel and McClaren, 1970)

• Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place.

*Reference: Haykin, S., Neural Networks, 1994.


This definition of the learning process implies the following sequence of events:

1. The neural network is stimulated by an environment.
2. The neural network undergoes changes in its free parameters as a result of this stimulation.
3. The neural network responds in a new way to the environment because of the changes that have occurred in its internal structure.

These steps are repeated iteratively, and eventually the neural network will behave in the desired way.

*Reference: Haykin, S., Neural Networks, 1994.


• A prescribed set of well-defined rules for the solution of a learning problem is called a learning algorithm.

• There are various learning algorithms. They differ from each other in the way in which the adjustment to a synaptic weight of a neuron is formulated.

• Another factor to be considered is the manner in which a neural network relates to its environment. This leads us to the concept of a learning paradigm. A learning paradigm refers to a model of the environment in which the neural network operates.

*Reference: Haykin, S., Neural Networks, 1994.


Learning Paradigms

• Learning with a teacher (supervised learning)
• Learning without a teacher:
   Reinforcement learning
   Unsupervised learning


Supervised Learning (Öğreticili Öğrenme)

*Reference: Haykin, S., Neural Networks, 1994.


 The teacher has knowledge of the environment.
 This knowledge is represented in the form of input-output examples.
 These input-output examples constitute the training data set.
 However, the environment is unknown to the NN.
 Initially, the NN has randomly assigned weights and bias values.
 Both the teacher and the NN are exposed to a training vector from the environment (an example from the input-output data set).
 The teacher provides the correct output corresponding to the input vector. This output vector is the desired response that represents the optimum action to be performed by the NN.
 Given the input vector, the NN computes an output with its weights and bias values.
 An error signal is computed between the outputs of the teacher and the NN:
Error = Desired response - Actual response of the NN
 The network parameters are adjusted under the combined influence of the training vector and the error signal.
 This adjustment is carried out iteratively in a step-by-step fashion.
 Eventually, the NN emulates (behaves in the same way as) the teacher. The emulation is assumed to be optimal in some statistical sense.
 So, the knowledge of the environment available to the teacher is transferred to the NN through training as fully as possible.
 Now, the teacher can be replaced by the NN.

Example: System Identification
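The iterative adjustment described above can be illustrated with a minimal sketch for a single linear neuron, assuming a synthetic training set; the data, learning rate, and epoch count below are illustrative, not values from the lecture.

```python
import numpy as np

# Minimal sketch of supervised learning for a single linear neuron.
# The "teacher" is the training set of (input, desired output) pairs.
rng = np.random.default_rng(0)

# Hypothetical training data: the teacher's input-output examples.
X = rng.normal(size=(100, 3))            # input vectors
d = X @ np.array([1.5, -2.0, 0.5]) + 1   # desired responses (unknown to the NN)

w = rng.normal(size=3)   # randomly assigned initial weights
b = 0.0                  # bias
eta = 0.01               # learning rate

for epoch in range(50):
    for x, target in zip(X, d):
        y = w @ x + b        # actual response of the NN
        e = target - y       # error = desired response - actual response
        w += eta * e * x     # adjust weights from training vector and error
        b += eta * e         # adjust bias
```

After training, the NN can replace the teacher for new inputs drawn from the same environment.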
Reinforcement Learning
(Pekiştirmeli/Yinelemeli Öğrenme)

 Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward.

 The environment is typically stated in the form of a Markov decision process (MDP), and many reinforcement learning algorithms for this context use dynamic programming techniques.

*Reference: Wikipedia
 Suppose we want to build a machine that learns to play chess. In this case we cannot use a supervised learner, for two reasons.
 First, it is very costly to have a teacher that will take us through many games and indicate the best move for each position.
 Second, in many cases there is no such thing as the best move; the goodness of a move depends on the moves that follow.
 A single move does not count; a sequence of moves is good if, after playing them, we win the game.
 In chess, the only feedback comes at the end of the game, when we win or lose.

*Reference: Alpaydın, E., Introduction to Machine Learning, 2010.
• Another example is a robot that is placed in a maze.
• The robot can move in one of the four compass
directions and should make a sequence of movements
to reach the exit.
• As long as the robot is in the maze, there is no
feedback and the robot tries many moves until it
reaches the exit and only then does it get a reward.
• In this case there is no opponent, but we can have a
preference for shorter trajectories, implying that in this
case we play against time.

*Reference: Alpaydın, E., Introduction to Machine Learning, 2010.


• These two examples have a number of points in common:
There is a decision maker, called the agent, that is placed in an environment.
In chess, the game player is the decision maker and the environment is the board;
in the second case, the maze is the environment of the robot.
• At any time, the environment is in a certain state that is one of a set of possible states (for example, the state of the board, or the position of the robot in the maze).
• The decision maker has a set of possible actions: legal movement of pieces on the chess board, movement of the robot in possible directions without hitting the walls, and so forth.
• Once an action is chosen and taken, the state changes.
Elements of Reinforcement Learning

The learning decision maker is called the agent.
The agent interacts with the environment that includes everything outside the agent.
The agent has sensors to decide on its state in the environment and takes an action that modifies its state.
When the agent takes an action, the environment provides a reward.

*Reference: Alpaydın, E., Introduction to Machine Learning, 2010.


 Time is discrete, t = 0, 1, 2, . . ., and s_t ∈ S denotes the state of the agent at time t, where S is the set of all possible states.
 a_t ∈ A(s_t) denotes the action that the agent takes at time t, where A(s_t) is the set of possible actions in state s_t.
 When the agent in state s_t takes the action a_t, the clock ticks, reward r_{t+1} is received, and the agent moves to the next state s_{t+1}.
 The problem is modeled using a Markov decision process (MDP).
 The reward and next state are sampled from their respective probability distributions, p(r_{t+1} | s_t, a_t) and P(s_{t+1} | s_t, a_t).
 This is a Markov system where the state and reward in the next time step depend only on the current state and action.
 In some applications, reward and next state are deterministic, and for a certain state and action taken, there is one possible reward value and next state. In other applications, they may be stochastic.
 Depending on the application, a certain state may be designated as the initial state, and in some applications there is also an absorbing terminal (goal) state where the search ends.
 All actions in this terminal state transition to itself with probability 1 and without any reward.
 The sequence of actions from the start to the terminal state is an episode, or a trial.
 The policy, π, defines the agent's behavior and is a mapping from the states of the environment to actions: π : S → A.
 The policy defines the action to be taken in any state s_t: a_t = π(s_t).
 The value of a policy π, V^π(s_t), is the expected cumulative reward that will be received while the agent follows the policy, starting from state s_t.
Expected Reward

 In the finite-horizon or episodic model, the agent tries to maximize the expected reward for the next T steps:

V^π(s_t) = E[ r_{t+1} + r_{t+2} + ... + r_{t+T} ]

 In the infinite-horizon model, there is no sequence limit, but future rewards are discounted:

V^π(s_t) = E[ r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ... ] = E[ Σ_{i=1}^{∞} γ^{i-1} r_{t+i} ]

where 0 ≤ γ < 1 is the discount rate that keeps the return finite.

 γ is taken less than 1 because there generally is a time limit to the sequence of actions needed to solve the task. The agent may be a robot that runs on a battery. We prefer rewards sooner rather than later because we are not certain how long we will survive.
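As a quick numerical illustration of the discounted return, the sketch below sums γ^(i-1) r_{t+i} for a made-up reward sequence and an assumed γ = 0.9; neither value comes from the lecture.

```python
# Minimal sketch: discounted return for an illustrative reward sequence.
gamma = 0.9
rewards = [0.0, 0.0, 1.0, 0.0, 5.0]   # r_{t+1}, r_{t+2}, ...

# Return = sum over i of gamma^(i-1) * r_{t+i}
ret = sum(gamma ** i * r for i, r in enumerate(rewards))
print(ret)   # rewards received later are weighted down by gamma
```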
Optimal Policy

For each policy π there is an expected reward V^π(s_t).
We want to find the optimal policy π* such that:

V*(s_t) = max_π V^π(s_t), for all s_t

In some applications, for example in control, instead of working with the values of states, V(s_t), we prefer to work with the values of state-action pairs, Q(s_t, a_t).

• V(s_t) denotes how good it is for the agent to be in state s_t.
• Q(s_t, a_t) denotes how good it is to perform action a_t when in state s_t. This is called Q-learning.

Bellman's optimality principle suggests that in finding the solution of an optimization problem, regardless of the initial state and initial decision, the remaining decisions must constitute an optimal policy with respect to the state resulting from the initial decision.
Bellman's Equation

V*(s_t) = max_{a_t} ( E[r_{t+1}] + γ Σ_{s_{t+1}} P(s_{t+1} | s_t, a_t) V*(s_{t+1}) )

Similarly:

Q*(s_t, a_t) = E[r_{t+1}] + γ Σ_{s_{t+1}} P(s_{t+1} | s_t, a_t) max_{a_{t+1}} Q*(s_{t+1}, a_{t+1})

The optimal policy is chosen as:

π*(s_t) = arg max_{a_t} Q*(s_t, a_t)
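A minimal tabular Q-learning sketch follows, assuming a toy deterministic corridor environment (states 0 to 4, goal at state 4, actions left/right); the environment, learning rate, discount, exploration rate, and episode count are illustrative assumptions, not part of the lecture.

```python
import numpy as np

# Minimal tabular Q-learning sketch on a hypothetical 1-D corridor.
# Reaching the goal state gives reward 1; every other step gives 0.
n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
goal = 4
Q = np.zeros((n_states, n_actions))
eta, gamma, eps = 0.5, 0.9, 0.1     # learning rate, discount, exploration
rng = np.random.default_rng(0)

def step(s, a):
    """Deterministic transition of the toy environment."""
    s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
    r = 1.0 if s_next == goal else 0.0
    return s_next, r

for episode in range(200):
    s = 0                                    # initial state
    while s != goal:                         # run one episode (trial)
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        # Q-learning update toward the Bellman target r + gamma * max_a' Q(s', a')
        Q[s, a] += eta * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

policy = Q.argmax(axis=1)   # greedy policy: pi*(s) = argmax_a Q(s, a)
```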


Example: Cognitive Packet Network (CPN)

Öke, G., Loukas, G. (2007). A denial of service detector based on maximum likelihood detection and the random neural network. Computer Journal, 50(6), 717-727. doi:10.1093/comjnl/bxm066

A network consisting of 46 nodes connected with 100 Mbit/s links.
SwitchLAN backbone network topology: a network that provides service in Switzerland to all universities, two federal institutes of technology and the major research institutes.

CPN uses an intelligent and adaptive routing algorithm.
Smart Packets (SP) are sent to measure different metrics.

Reward:
Decision Threshold:
Unsupervised Learning (Öğreticisiz Öğrenme)

 There is no external teacher or critic.
 The system learns the statistical properties of the input data.
 It develops the ability to form internal representations of the input.
 New classes are formed automatically.
 Statistically similar inputs should be classified in the same class.
 A task-independent measure of the quality of the representation of the system is selected, and the free parameters of the system are optimized with respect to this measure.

*Reference: Haykin, S., Neural Networks, 1994.


• Unsupervised Learning is suitable for clustering and
classification problems.

• Some unsupervised learning algorithms are:

 K-means clustering
 K-nearest neighbour
 Kohonen's SOM
 Singular value decomposition
 Voronoi diagrams

*Reference: Haykin, S., Neural Networks, 1994.


SELF-ORGANIZING MAPS (SOM)
(Kohonen) – (Haykin, Chapter 9)
 They are a special class of artificial neural networks.
 These networks are based on competitive learning; the output neurons of the network compete among themselves to be activated or fired, with the result that only one output neuron, or one neuron per group, is on at any one time. An output neuron that wins the competition is called a winner-takes-all neuron, or simply a winning neuron.

 A self-organizing map is therefore characterized by the formation of a topographic map of the input patterns, in which the spatial locations (i.e., coordinates) of the neurons in the lattice are indicative of intrinsic statistical features contained in the input patterns.

*Reference: Haykin, S., Neural Networks, 1994.
The brain is organized in many places in such a way that different
sensory inputs are represented by topologically ordered
computational maps.
Outline of the SOM Algorithm

1. Initialization.
Choose random values for the initial weight vectors w_j(0), j = 1, 2, ..., l, where l is the number of neurons in the lattice.

2. Sampling.
Draw a sample x from the input space with a certain probability; the vector x represents the activation pattern that is applied to the lattice. The dimension of vector x is equal to m.

3. Similarity matching.
Find the best-matching (winning) neuron i(x) at time-step n by using the minimum-distance criterion:

i(x) = arg min_j || x(n) - w_j ||, j = 1, 2, ..., l

4. Updating.
Adjust the synaptic-weight vectors of all excited neurons by using the update formula

w_j(n+1) = w_j(n) + η(n) h_{j,i(x)}(n) [ x(n) - w_j(n) ]

where η(n) is the learning-rate parameter and h_{j,i(x)}(n) is the topological neighborhood function centered on the winning neuron i(x).

For example, a Gaussian neighborhood:

h_{j,i(x)}(n) = exp( - d²_{j,i(x)} / (2σ²(n)) )

where d_{j,i(x)} is the lateral distance between neuron j and the winning neuron i(x), and the width σ(n) shrinks with time.

5. Continuation.
Continue with step 2 until no noticeable changes in the feature map are observed.

*Reference: Haykin, S., Neural Networks, 1994.
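A minimal sketch of these five steps, assuming a 1-D lattice of 10 neurons, synthetic 2-D inputs, and illustrative decay schedules for η(n) and σ(n); none of these numbers come from the lecture.

```python
import numpy as np

# Minimal sketch of the SOM algorithm on a 1-D lattice.
rng = np.random.default_rng(0)
m, n_neurons, n_steps = 2, 10, 2000
X = rng.random((500, m))                 # input patterns
W = rng.random((n_neurons, m))           # 1. Initialization: random weights

eta0, sigma0, tau = 0.5, 3.0, 1000.0     # initial rate, width, decay constant

for n in range(n_steps):
    x = X[rng.integers(len(X))]                          # 2. Sampling
    i_win = np.argmin(np.linalg.norm(x - W, axis=1))     # 3. Similarity matching

    eta = eta0 * np.exp(-n / tau)        # decaying learning rate eta(n)
    sigma = sigma0 * np.exp(-n / tau)    # shrinking neighborhood width sigma(n)
    d = np.abs(np.arange(n_neurons) - i_win)             # lattice distances
    h = np.exp(-d**2 / (2 * sigma**2))   # Gaussian neighborhood function

    W += eta * h[:, None] * (x - W)      # 4. Updating all excited neurons
```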
VORONOI DIAGRAMS
• A Voronoi diagram is a partition of a plane into regions close to each of a
given set of objects. In the simplest case, these objects are just finitely
many points in the plane (called seeds, sites, or generators). For each seed
there is a corresponding region consisting of all points of the plane closer
to that seed than to any other. These regions are called Voronoi cells.

• The Voronoi diagram is named after Georgy Voronoy, and is also called
a Voronoi tessellation, a Voronoi decomposition, a Voronoi partition, or
a Dirichlet tessellation (after Peter Gustav Lejeune Dirichlet). Voronoi cells
are also known as Thiessen polygons. Voronoi diagrams have practical and
theoretical applications in many fields, mainly in science and technology,
but also in visual art.

(Reference: Wikipedia)
Euclidean distance: d(p, q) = sqrt( Σ_i (p_i - q_i)² )
Manhattan distance: d(p, q) = Σ_i |p_i - q_i|

(Reference: Wikipedia)
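As a small illustration, the sketch below assigns a query point to the Voronoi cell of its nearest seed under either distance; the seeds and query point are assumed for the example.

```python
import numpy as np

# Minimal sketch: assigning a point of the plane to a Voronoi cell
# by finding the nearest seed (site/generator).
seeds = np.array([[1.0, 1.0], [4.0, 2.0], [2.0, 5.0]])   # illustrative seeds

def voronoi_cell(p, seeds, metric="euclidean"):
    """Return the index of the seed whose Voronoi cell contains point p."""
    diff = seeds - p
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=1))   # sqrt(sum (p_i - q_i)^2)
    else:  # manhattan
        dist = np.abs(diff).sum(axis=1)           # sum |p_i - q_i|
    return int(np.argmin(dist))

p = np.array([3.0, 3.0])
print(voronoi_cell(p, seeds))                # cell under Euclidean distance
print(voronoi_cell(p, seeds, "manhattan"))   # cell under Manhattan distance
```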
Learning Algorithms

A prescribed set of well-defined rules for the solution of a learning problem is called a learning algorithm.

1. Error-Correction Learning
2. Hebbian Learning
3. Competitive Learning
4. Boltzmann Learning
5. Memory-Based Learning

*Reference: Haykin, S., Neural Networks, 1994.


Error-Correction Learning

n: discrete time
d_k(n): desired response / target output
y_k(n): actual output signal
e_k(n) = d_k(n) - y_k(n): error signal

Cost function / index of performance:

E(n) = ½ e_k²(n)

(instantaneous value of the error energy)

*Reference: Haykin, S., Neural Networks, 1994.


Widrow-Hoff Rule / Delta Rule (1960)

Δw_kj(n) = η e_k(n) x_j(n)
w_kj(n+1) = w_kj(n) + Δw_kj(n)

η: learning rate

The step-by-step adjustments to the synaptic weights of neuron k are continued until the system reaches a steady state (the synaptic weights are stabilized).
Case 1: If e_k(n) > 0 and x_j(n) > 0, then Δw_kj(n) > 0 and the weight is increased, which raises y_k(n) toward d_k(n).
This is what we want.

Case 2: If e_k(n) < 0 and x_j(n) > 0, then Δw_kj(n) < 0 and the weight is decreased, which lowers y_k(n) toward d_k(n).
This is what we want.

Verify Case 3 and Case 4 yourselves:

Case 3: If e_k(n) > 0 and x_j(n) < 0
Case 4: If e_k(n) < 0 and x_j(n) < 0
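A small sketch that checks the sign of Δw_kj = η e_k x_j for the four cases, with an assumed learning rate η = 0.1; the numeric inputs are illustrative only.

```python
# Minimal check of the delta rule's sign behavior, delta_w = eta * e * x.
eta = 0.1

def delta_w(e, x):
    """Weight change for error signal e and input (presynaptic) signal x."""
    return eta * e * x

# Case 1: e > 0, x > 0 -> delta_w > 0 (weight increases)
print(delta_w(+1.0, +0.5))                      #  0.05
# Case 2: e < 0, x > 0 -> delta_w < 0 (weight decreases)
print(delta_w(-1.0, +0.5))                      # -0.05
# Cases 3 and 4 (x < 0) can be verified the same way:
print(delta_w(+1.0, -0.5), delta_w(-1.0, -0.5))
```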
Hebbian Learning

Hebb was a neuropsychologist.
Hebb's Postulate of Learning, from The Organization of Behavior (1949):
"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
(There is physiological evidence of Hebbian learning in the hippocampus part of the brain.)

This is a neurobiological statement. It can be rephrased as follows:

1. If two neurons on either side of a synapse (connection) are activated simultaneously (synchronously), the strength of that synapse is selectively increased.
2. If two neurons on either side of a synapse are activated asynchronously, that synapse is selectively weakened or eliminated.

*Reference: Haykin, S., Neural Networks, 1994.


Such a synapse is called a Hebbian synapse.

A Hebbian synapse is a time-dependent, highly local, and strongly interactive mechanism to increase synaptic efficiency as a function of the correlation between the presynaptic and postsynaptic activities.

Learning Rule:

Δw_kj(n) = F( y_k(n), x_j(n) )

Hebb's Hypothesis:

Δw_kj(n) = η y_k(n) x_j(n)

This is also called the activity-product rule. The change in the weight is proportional to the product of the input (presynaptic signal) and the output (postsynaptic signal).

Covariance Hypothesis (Sejnowski, 1977):

Δw_kj(n) = η ( x_j(n) - x̄ ) ( y_k(n) - ȳ )

where x̄ and ȳ are the time-averaged values of the presynaptic and postsynaptic signals.

 Convergence to a nontrivial state, which is reached when x_j = x̄ or y_k = ȳ.
 The synaptic weight is enhanced if there are sufficient levels of presynaptic and postsynaptic activities, that is, the conditions x_j > x̄ and y_k > ȳ are both satisfied.
 The synaptic weight is depressed if there is either
x_j > x̄ and y_k < ȳ,
or x_j < x̄ and y_k > ȳ.
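A minimal sketch contrasting the activity-product rule with the covariance rule on synthetic presynaptic and postsynaptic signals; the signals and learning rate are assumptions for illustration.

```python
import numpy as np

# Minimal sketch: Hebb's activity-product rule vs. the covariance hypothesis.
rng = np.random.default_rng(0)
eta = 0.01
x = rng.normal(loc=1.0, size=1000)                 # presynaptic signal over time
y = 0.5 * x + rng.normal(scale=0.1, size=1000)     # correlated postsynaptic signal

# Hebb's hypothesis: delta_w = eta * y * x  (activity-product rule)
dw_hebb = eta * y * x

# Covariance hypothesis: delta_w = eta * (x - x_bar) * (y - y_bar)
dw_cov = eta * (x - x.mean()) * (y - y.mean())

print(dw_hebb.sum())   # keeps growing as long as both signals stay active
print(dw_cov.sum())    # driven only by the covariance of x and y
```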
Competitive Learning

 The output neurons of a neural network compete among themselves to become active (fired).
 Only a single output neuron is active at any one time.
 It is suitable for discovering statistical features to classify a set of input patterns.

Basic elements of competitive learning:

• A set of neurons that are the same except for randomly distributed synaptic weights, and which therefore respond differently to a given set of inputs.
• A limit imposed on the strength of each neuron.
• A mechanism that permits the neurons to compete for the right to respond to a given set of inputs, such that only one neuron is active at a time. That neuron is called a winner-takes-all neuron.

*Reference: Haykin, S., Neural Networks, 1994.

The network may include feedback connections to perform lateral inhibition.
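A minimal winner-takes-all sketch in which only the winning neuron's weight vector moves toward the input; the network size, learning rate, and data are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Minimal sketch of competitive (winner-takes-all) learning.
rng = np.random.default_rng(0)
n_neurons, dim, eta = 4, 2, 0.1
X = rng.random((1000, dim))              # input patterns
W = rng.random((n_neurons, dim))         # random initial synaptic weights

for x in X:
    winner = np.argmin(np.linalg.norm(x - W, axis=1))   # competition
    W[winner] += eta * (x - W[winner])   # only the winning neuron learns

print(W)   # weight vectors settle near statistical clusters of the inputs
```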
