Summary of Lecture 2: Locomotion Control: Biologically Inspired Artificial Intelligence (WS03: 410)

This document summarizes a lecture on biologically inspired artificial intelligence and locomotion control in robotics. It discusses 1) different types of legged robot locomotion control including problems that need to be solved, types of gaits, and control methods like PID controllers and ZMP control, and 2) learning algorithms covered in the lecture including CPG-and-reflex based control of locomotion, evolutionary algorithms, and reinforcement learning. It also provides an overview of related practical work involving different teams developing walking robots.


Biologically Inspired Artificial Intelligence (WS03: 410)
Lecture 3
Auke Jan Ijspeert
Swiss Federal Institute of Technology, Lausanne (EPFL)

Summary of lecture 2: Locomotion control

Topics:
1. Locomotion control in robotics: overview of different bio-inspired robots, wheeled versus legged robots, passive walkers
2. Legged robot locomotion control: problems that need to be solved, different types of gaits, PID controllers, trajectory generation, ZMP control, Virtual Model Control

Lecture 3: Learning algorithms

Topics:
1. CPG-and-reflex based control of locomotion (end of lecture 2)
2. Evolutionary algorithms
3. Reinforcement learning

Practical work:
• Registered teams:
  1. KInator: David Weber
  2. BruceTheShark: Tobias Reinhard and Thilo Tanner
  3. Barbie: Raoul Schmidiger
  4. RoDeBot: Dennis Brunner and Roman Flueckiger
  5. ElNino: Oliver Michel
  6. Muhammad_Ai: Alexander Kraehenbuehl and Juerg Schaefli
  7. Gogo_Yubari: Tsuyoshi Ito and Sascha Robert
  8. SickBoy: Marc Ziegler
  9. Sonovobic: Marc Breuer
• Nov 4: KInator went straight to number one, congratulations!

Practical work:
• Recent matches (Friday November 14th):
  judo_14_11_03\R8D8_vs_El_Nino.wva
  judo_14_11_03\chunky_vs_KInator.wva
  judo_14_11_03\chunky_vs_Coyote.wva

Practical work:
• For the next lecture (Dec. 1), all teams should imperatively have a robot capable of walking and, if possible, of standing up and participating in the competition.

Lecture 3: Learning algorithms

Topics:
1. CPG-and-reflex based control of locomotion (end of lecture 2)
2. Evolutionary algorithms
3. Reinforcement learning

CPG-and-reflex control
• Main idea: use oscillators and replicate the distributed control mechanisms found in vertebrates
[Diagram: visual system (visuomotor coordination) and vestibular system (balance control) modulate the CPG; the CPG and reflexes drive the actuators; proprioception feeds back to the reflexes and the CPG]

Concept of Limit Cycle
• A limit cycle is an oscillatory regime in a dynamical system.
• If the limit cycle is stable, the states of the system will return to it after perturbations.
[Figure: examples of limit cycles in state space]

CPG-and-reflex control
Two types of implementations:
• The CPG produces desired positions (see the sketch below):
  [CPG-and-reflex] --θ̃--> (Σ, + / -) --> [Feedback controller (PID)] --u--> [Robot] --θ--> (θ fed back to Σ)
• The CPG directly produces torques:
  [CPG-and-reflex] --u--> [Robot] --θ
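To make the first implementation concrete, here is a minimal sketch (not taken from the lecture; the gains, the CPG frequency, and the one-joint dynamics are illustrative assumptions) of a CPG that produces a desired position θ̃ which a PID feedback controller converts into a torque u:

```python
import numpy as np

def cpg_desired_angle(t, amplitude=0.5, freq_hz=1.0, offset=0.0):
    """Toy CPG output: a smooth rhythmic desired joint angle (rad)."""
    return offset + amplitude * np.sin(2.0 * np.pi * freq_hz * t)

def pid_torque(error, error_sum, error_prev, dt, kp=20.0, ki=0.5, kd=1.0):
    """PID feedback controller turning the tracking error into a torque u."""
    return kp * error + ki * error_sum + kd * (error - error_prev) / dt

# Crude one-joint simulation (unit inertia, viscous friction) tracking the CPG output
dt, theta, theta_dot = 0.001, 0.0, 0.0
error_sum, error_prev = 0.0, 0.0
for step in range(5000):
    t = step * dt
    error = cpg_desired_angle(t) - theta           # θ̃ - θ
    error_sum += error * dt
    u = pid_torque(error, error_sum, error_prev, dt)
    error_prev = error
    theta_dot += (u - 0.5 * theta_dot) * dt        # assumed joint dynamics
    theta += theta_dot * dt
```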

Taga's neuromechanical simulation
Neural oscillator (Taga 1994)
[Figure: Taga's neuro-musculo-skeletal model and the neural oscillator equations]

References:
• G. Taga. Emergence of bipedal locomotion through entrainment among the neuro-musculo-skeletal system and the environment. Physica D: Nonlinear Phenomena, 75(1-3):190-208, 1994.
• G. Taga. A model of the neuro-musculo-skeletal system for human locomotion. I. Emergence of basic gait. Biological Cybernetics, 73(2):97-111, 1995.

Taga's neuromechanical simulation
Walking gait: [animation of the simulated walking gait]

Interesting aspects:
• Locomotion seen as a limit cycle due to the global entrainment between the neuro-musculo-skeletal system and the environment
• Robustness against (small) variations in the environment (e.g. small slopes)

Cons:
• Hand-tuning of (many) parameters for obtaining satisfactory limit cycles

Nonlinear oscillators
Example: design of a locomotion controller inspired by the salamander CPG for the control of an amphibious robot.
[Figure: limit cycle]

Nonlinear oscillator model
Each oscillatory center is modeled with an oscillator that has explicit frequency and amplitude parameters.
[Figure: oscillator equations and the resulting limit cycle]
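The oscillator equations themselves appear only in the slide figure. As an illustration of an oscillator with explicit frequency and amplitude parameters and a stable limit cycle, here is a common amplitude-controlled phase oscillator (an assumption, not necessarily the exact model used in the lecture):

```python
import numpy as np

def oscillator_step(phase, r, dt, freq_hz=1.0, target_amp=1.0, gain=20.0):
    """One Euler step of an amplitude-controlled phase oscillator.
    The output r*cos(phase) converges to a sinusoid of the requested
    frequency and amplitude from any initial condition (stable limit cycle)."""
    phase += 2.0 * np.pi * freq_hz * dt          # constant phase velocity
    r += gain * (target_amp - r) * dt            # amplitude relaxes to its target
    return phase, r, r * np.cos(phase)

phase, r = 0.0, 0.1                              # start far from the limit cycle
for step in range(2000):
    phase, r, x = oscillator_step(phase, r, dt=0.001)
```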

Inter-oscillator coupling
• Two parameters (a_ij and b_ij) per coupling

Body CPG
• Model: 40 segments
• Assumptions:
  • Lamprey-like system: chain of oscillators
  • Two oscillators per segment
  • Closest-neighbour coupling
  • Double symmetry: left-right + per segment
• With these assumptions, 6 open coupling parameters remain (a sketch of such a coupled chain follows below)
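To illustrate how a chain of oscillators with closest-neighbour coupling can produce the traveling wave needed for swimming (next slide), here is a hedged sketch using simple phase oscillators; the coupling weight and imposed phase lag are illustrative stand-ins for the a_ij and b_ij parameters:

```python
import numpy as np

N = 40                        # number of segments, as in the body CPG model
freq_hz = 1.0
lag = 2.0 * np.pi / N         # desired phase lag between neighbouring segments (assumed)
w = 4.0                       # coupling weight (hypothetical value)

phases = np.random.uniform(0, 2 * np.pi, N)    # random initial phases
dt = 0.001
for step in range(20000):
    d = np.full(N, 2.0 * np.pi * freq_hz)      # intrinsic phase velocity
    for i in range(N):
        if i > 0:                              # coupling from the rostral neighbour
            d[i] += w * np.sin(phases[i - 1] - phases[i] - lag)
        if i < N - 1:                          # coupling from the caudal neighbour
            d[i] += w * np.sin(phases[i + 1] - phases[i] + lag)
    phases += d * dt
# After the transient, phases[i] - phases[i+1] ≈ lag: a wave travels along the chain.
```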

Generation of traveling waves for swimming
Example: [Figure: oscillator outputs along the chain, ~ time to stabilize]

Corresponding swimming gait: [animation]
Motoneuron signals (Delvolvé et al. 1997): EMG recordings in the axial musculature

Complete CPG
[Diagram: limb CPG coupled to the body CPG]

Generation of standing wave for walking
[Figures: swimming (traveling wave) versus walking (standing wave) patterns; EMG]

[Video slides: From swimming to walking; Real salamander: walking; Real salamander: from walking to swimming; Real salamander: swimming; Real salamander: from swimming to walking]

Simulation demo: Salamander applet
Outcomes
• Simple control signals for controlling the speed, direction, and type of gait (E_body_left, E_body_right, E_limb_left, E_limb_right, and τ)
• Robustness against noise and perturbations
• Entrainment between the CPG and the body through sensory feedback (work in progress)
• Nonlinear oscillators are more tractable than neural networks (fewer parameters)
• Problems: not yet a good methodology for setting the coupling weights

Quadruped robot controlled with a CPG-and-reflex based controller
Kimura Lab, National Univ. of Electro-Communications, Tokyo

Quadruped robot controlled with a CPG-and-reflex based controller
Kimura Lab, National Univ. of Electro-Communications, Tokyo
[Videos: knee-bending reflex; camera control for obstacle detection and avoidance]

CPG-and-reflex control: summary
• Pros:
  • Distributed control
  • Limit cycle behavior (controller-body-environment)
  • Robust against perturbations
  • Smooth trajectories due to the oscillators
• Cons:
  • Fewer mathematical tools than other methods
  • Not (yet) a clear design methodology; it is recommended to use learning algorithms

Lecture 3: Learning algorithms

Topics:
1. CPG-and-reflex based control of locomotion (end of lecture 2)
2. Evolutionary algorithms
3. Reinforcement learning

Evolutionary algorithms
There exist different types of learning:
• Evolution
• Supervised learning
• Learning by imitation
• Reinforcement learning
• Unsupervised learning
• …

We will start with an overview of evolutionary algorithms.

Evolutionary algorithms
Evolutionary algorithms are stochastic population-based optimization algorithms.

Three main mechanisms:
1. Reproduction
2. Mutation
3. The Darwinian principle of survival of the fittest

Mainly three different flavors:
1. Genetic Algorithms
2. Evolution Strategies
3. Genetic Programming

Genetic Algorithm (GA)
Developed by John Holland (1975)

Ingredients:
• Fitness function: function returning a real number describing how well a solution solves the given problem
• Chromosomes: candidate solutions
• Population: group of solutions (chromosomes)
• Genes: parameters of a chromosome
• Genetic operators: operators that modify the population of solutions

Main characteristics of the original GA: large populations, binary encoding, extensive use of the crossover operator.

GA: algorithm
[Flow diagram, forming a loop:]
1. Initial population
2. Parent selection
3. Crossover
4. Mutation
5. Fitness evaluation
6. Rejection
7. Ending criterion?

GA: encoding
Let's assume we would like to find the maximum of a fitness function f(x, y).
[Figure: fitness landscape over x ∈ [0, 5.0], y ∈ [0, 5.0], with one candidate solution at X_i = 2.30, Y_i = 1.03]
The typical GA has a binary encoding: chromosome <X_i, Y_i> = 011101 001101, where each bit string is a gene and each bit is an allele.
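A minimal sketch of this binary encoding, assuming 6 bits per gene over the range [0, 5.0] suggested by the slide's axes; with these assumptions the chromosome for X_i = 2.30, Y_i = 1.03 is exactly the 011101 001101 string above:

```python
BITS = 6                                   # bits per gene, as in the <Xi, Yi> example

def encode(value, bits=BITS, lo=0.0, hi=5.0):
    """Map a real value in [lo, hi] to a bit string (one gene)."""
    level = round((value - lo) / (hi - lo) * (2**bits - 1))
    return format(level, f"0{bits}b")

def decode(gene, bits=BITS, lo=0.0, hi=5.0):
    """Inverse mapping from a gene back to a real value."""
    return lo + int(gene, 2) / (2**bits - 1) * (hi - lo)

chromosome = encode(2.30) + encode(1.03)   # '011101' + '001101'
x, y = decode(chromosome[:BITS]), decode(chromosome[BITS:])
```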

GA: Initial population
• The typical GA has a fixed population size: N chromosomes
• The initial population is normally randomly generated
• In some cases, prior knowledge of the problem can be used to introduce particular solutions (chromosomes) into the population

GA: Parent selection
• New chromosomes are created by modifying parent chromosomes (only some chromosomes in the whole population)
• Parents are chosen depending on their fitness: the higher the fitness, the higher the chance of being chosen
• Different schemes are possible (see the sketch below):
  • Fitness-based selection: probability directly proportional to the fitness
  • Rank-based selection: probability inversely proportional to the rank (i.e. first, second, …)
  • Tournament selection: pick two potential parents and keep the best (repeat until you have enough parents)
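A sketch of two of these selection schemes; the bit-counting toy fitness is only there to make the snippet self-contained:

```python
import random

def tournament_select(population, fitness, k=2):
    """Tournament selection: pick k candidates at random and keep the best."""
    return max(random.sample(population, k), key=fitness)

def roulette_select(population, fitness):
    """Fitness-based selection: probability proportional to (non-negative) fitness."""
    weights = [fitness(c) for c in population]
    return random.choices(population, weights=weights, k=1)[0]

# Toy usage: chromosomes are bit strings, fitness counts the number of 1-bits
population = ["".join(random.choice("01") for _ in range(12)) for _ in range(50)]
ones = lambda c: c.count("1")
parents = [tournament_select(population, ones) for _ in range(len(population))]
```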

GA: Crossover operator
Crossover operator: recombination operator that swaps genetic material between two parent chromosomes.

One-point crossover:
  parents:  011101 001101   and   xyyxyx yyxyxy
  children: xyyx01 001101   and   0111yx yyxyxy

Two-point crossover:
  parents:  011101 001101   and   xyyxyx yyxyxy
  children: xyyx01 0011xy   and   0111yx yyxy01

!! The effectiveness of the crossover operator depends on the encoding, e.g. on the order in which parameters are encoded.

GA: Mutation operator
Mutation operator: each allele in a gene has a probability M of being mutated, e.g.:
  011101 001101  →  010111 011101
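Hedged sketches of both operators on bit-string chromosomes; the cut points are drawn at random and the mutation rate is an assumed value for M:

```python
import random

def one_point_crossover(p1, p2):
    """Swap genetic material after one random cut point."""
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point_crossover(p1, p2):
    """Swap the middle section between two random cut points."""
    i, j = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def mutate(chrom, rate=0.05):
    """Flip each allele independently with probability `rate` (the parameter M)."""
    return "".join(b if random.random() > rate else ("1" if b == "0" else "0")
                   for b in chrom)

c1, c2 = one_point_crossover("011101001101", "xyyxyxyyxyxy")
mutated = mutate("011101001101")
```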

GA: Fitness function
Fitness functions must be carefully designed. Some functions can have the same maxima but can be more or less difficult to optimize.
[Figure: two fitness landscapes with the same global optimum and a local optimum; one easier to optimize, one more difficult]

GA: Rejection operator
Different regimes of selection and rejection operators can be chosen (see the sketch below):
• Generational GA: the whole population is updated at each generation
• Steady-state GA: only part of the population is updated
[Diagram: current population → parents → children → new population, for the generational and the steady-state case]
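The two replacement regimes, sketched on a toy bit-string population; the make_child helper is a hypothetical stand-in for the selection, crossover, and mutation operators above:

```python
import random

def make_child(parents):
    """Placeholder variation step: pick a parent and flip one bit
    (stands in for selection + crossover + mutation)."""
    p = random.choice(parents)
    i = random.randrange(len(p))
    return p[:i] + ("1" if p[i] == "0" else "0") + p[i + 1:]

fitness = lambda c: c.count("1")
population = ["".join(random.choice("01") for _ in range(12)) for _ in range(20)]

# Generational GA: the whole population is replaced by children
parents = sorted(population, key=fitness, reverse=True)[:10]
population = [make_child(parents) for _ in range(len(population))]

# Steady-state GA: only the worst few chromosomes are replaced
population.sort(key=fitness)
population[:2] = [make_child(population[-10:]) for _ in range(2)]
```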

GA: Ending criterion
Different possibilities:
• A fixed number of generations has been reached
• The increase of maximum fitness per generation passes below a threshold
• The genetic diversity (e.g. standard deviations of gene values) passes below a threshold

Typical run: generation 0
[Figure: initial population scattered randomly over the fitness landscape, x, y ∈ [0, 5.0]]

Typical run: next generations
[Figures: the population concentrating around the optima over successive generations]

Typical run: convergence
[Figure: the population converged on the global optimum]

Typical run
[Plots: maximum, average, and minimum fitness versus generations; genetic diversity (e.g. sum of standard deviations of gene values within the population) versus generations]

GA: applications
GAs are useful for solving problems that are not well characterized mathematically, e.g. when no information concerning the gradient of the fitness function is available (i.e. gradient descent is not possible).

Example: evolution of a locomotion controller
[Diagram: Control → Body → Environment; parameters θ_i, fitness f(θ_1, …, θ_N)]

Because the body-environment system is a complex nonlinear system, the gradient

  ∇f(θ_1, …, θ_N) = [∂f/∂θ_1, …, ∂f/∂θ_N]

is usually impossible to compute analytically, and time-consuming to estimate numerically.

GA: applications
In robotics, GAs are used either to optimize parameters in a controller, e.g. a sinus-based controller or a finite-state machine (i.e. a set of if-then rules), or, more commonly, to optimize parameters such as synaptic weights in a neural network.

Lecture 2: sinus controller:  θ_i = θ_i0 + A_i sin(ν_i t + φ_i)

GA applications: Karl Sims' evolved creatures
• GA used to evolve both body shape and controller
• Fitness function: speed of locomotion
• Controller: special type of neural network (with some neurons producing sinusoidal signals)

Sims, K., "Evolving Virtual Creatures," Computer Graphics (Siggraph '94) Annual Conference Proceedings, July 1994, pp. 43-50.
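A sketch of how such a sinus-based controller could be encoded and evaluated by a GA; the fitness function here is only a placeholder, since the real one (e.g. distance walked) requires a simulation or a robot:

```python
import math
import random

def sinus_controller(params, t):
    """Joint angles at time t for the lecture-2 sinus controller
    θ_i = θ_i0 + A_i sin(ν_i t + φ_i). `params` holds one
    (offset, amplitude, frequency, phase) tuple per joint; in a GA these
    tuples would be the genes of a chromosome."""
    return [off + amp * math.sin(freq * t + phase)
            for (off, amp, freq, phase) in params]

def fitness(params):
    """Placeholder fitness: on a real robot or in simulation it would be,
    e.g., the distance walked in a fixed amount of time."""
    trajectory = [sinus_controller(params, 0.01 * k) for k in range(1000)]
    return sum(abs(a) for angles in trajectory for a in angles)  # stand-in only

# One random chromosome for a 4-joint robot
chromosome = [(random.uniform(-0.3, 0.3), random.uniform(0, 0.5),
               random.uniform(1, 6), random.uniform(0, 2 * math.pi))
              for _ in range(4)]
print(fitness(chromosome))
```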

GA applications: evolutionary robotics
• GA used to evolve neural networks
• Incremental evolution: evolution of obstacle avoidance, then homing behavior, then puck grasping, …

Urzelai, J., Floreano, D., Dorigo, M., and Colombetti, M. Incremental Robot Shaping, Connection Science, 10, 341-360, 1998.

Evolution Strategies (ES)
Developed by Rechenberg (1973) and Schwefel (1975)

Main characteristics of the original ES:
• Small populations (sometimes just one chromosome)
• Real-number encoding
• Extensive use of the mutation operator, evolution of the mutation range
• No crossover operator

ES: encoding
[Figure: fitness landscape over x, y ∈ [0, 5.0], with a candidate solution at X_i = 2.30, Y_i = 1.03]
The typical ES has a real-number encoding in which each gene carries its own standard deviation for the mutation operator:
  chromosome <X_i, Y_i> = 2.30, σ_x, 1.03, σ_y

ES: Mutation operator
At each generation, a gene x is mutated as follows:

  x^(t+1) = x^t + N_0(σ_x^t)

where N_0(σ^t) is a Gaussian random number with mean 0 and standard deviation σ^t.

Every n generations, the standard deviations are updated (the "1/5 success rule"):

  σ_x^(t+n) = c_d · σ_x^t   if p_s^t < 1/5
  σ_x^(t+n) = c_i · σ_x^t   if p_s^t > 1/5
  σ_x^(t+n) = σ_x^t         if p_s^t = 1/5

where p_s^t is the frequency of successful mutations measured over intervals of 10n mutations.
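A minimal (1+1)-ES sketch using this mutation rule; the constants c_d and c_i and the toy fitness are assumptions, and for simplicity the success rate is measured and applied once every 10n steps:

```python
import random

def fitness(x, y):
    """Toy fitness to maximize (assumed for illustration)."""
    return -(x - 2.5) ** 2 - (y - 2.5) ** 2

# (1+1)-ES: a single parent, Gaussian mutation, 1/5 success rule
x, y, sigma = 0.5, 0.5, 1.0
n, c_d, c_i = 10, 0.82, 1.0 / 0.82          # typical constants, assumed
successes = 0
for t in range(1, 1001):
    xm = x + random.gauss(0.0, sigma)        # x^(t+1) = x^t + N_0(sigma^t)
    ym = y + random.gauss(0.0, sigma)
    if fitness(xm, ym) > fitness(x, y):      # keep the mutant only if it improves
        x, y = xm, ym
        successes += 1
    if t % (10 * n) == 0:                    # measure p_s over intervals of 10n
        p_s = successes / (10 * n)
        if p_s < 0.2:
            sigma *= c_d                     # shrink the mutation range
        elif p_s > 0.2:
            sigma *= c_i                     # grow the mutation range
        successes = 0
```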

ES: Mutation operator
Because of the encoding and the mutation operator, a gene has a "memory" of good mutations.
[Figure: search steps climbing toward a maximum as the mutation range adapts]

Genetic Programming (GP)
Developed by John Koza (1992)

Main characteristics of the original GP:
• Chromosomes encode programs, e.g. in Lisp (rather than parameters)
• A chromosome has a tree structure
• Specific mutation and crossover operators to deal with the tree structure

GP: example of encoding
Symbolic (as opposed to parametric) fitting of a function:
• Functions: +, -, *, /, …
• Terminals: variables or numeric values
[Figure: expression trees built from these functions and terminals, e.g. for b*b − 2*2*a*c and −b / (2*a)]

GP: example of crossover operator
[Figure: two parent trees exchange subtrees to produce two children]
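A small sketch of tree-structured chromosomes and a (deliberately crude) subtree-swapping crossover; the example trees and the choice of crossover point are illustrative assumptions:

```python
# A GP chromosome is a tree: internal nodes are functions, leaves are terminals.
# Here a tree is a nested tuple (op, left, right), or a terminal string/number.
def evaluate(tree, env):
    if not isinstance(tree, tuple):
        return env.get(tree, tree)             # variable lookup or numeric constant
    op, left, right = tree
    l, r = evaluate(left, env), evaluate(right, env)
    return {"+": l + r, "-": l - r, "*": l * r,
            "/": l / r if r != 0 else 1.0}[op]  # protected division

def swap_subtrees(parent1, parent2):
    """Crude crossover: replace the left child of parent1's root with the
    right child of parent2's root (real GP picks random crossover points)."""
    op1, l1, r1 = parent1
    op2, l2, r2 = parent2
    return (op1, r2, r1)

# Example trees in the spirit of the slide: b*b - 2*2*a*c  and  (0 - b) / (2*a)
t1 = ("-", ("*", "b", "b"), ("*", ("*", 2, 2), ("*", "a", "c")))
t2 = ("/", ("-", 0, "b"), ("*", 2, "a"))
env = {"a": 1, "b": 3, "c": 2}
print(evaluate(t1, env))                        # 3*3 - 2*2*1*2 = 1
print(evaluate(swap_subtrees(t1, t2), env))
```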

GP: example of mutation operators
[Figure: a parent tree mutated in three ways: mutation of a function node, mutation of a parameter (terminal), and replacement of a subtree by another random one]

Evolutionary algorithms
• Note: there is less and less distinction between genetic algorithms, evolution strategies, and genetic programming
• E.g. many genetic algorithms use real-number encodings, many ES use large populations and crossover, …
• All these algorithms are part of a continuum of evolutionary algorithms

Evolutionary algorithms: summary
Pros:
• Robust optimization (does not get stuck in local optima too easily)
• Few restrictions on the type of fitness function (e.g. it does not need to be differentiable, nor continuous)
• Easy to implement
• Well adapted to implementation on parallel computers
• Can easily be combined with other approaches (e.g. starting with a GA, and then finishing with gradient-based hill climbing)

Cons:
• Slow
• Needs some adjustments that are problem-dependent (probabilities of mutation and crossover, number of children, …)
• Robotics: not well suited for online learning (too slow, must be run serially if there is a single robot, and some generated controllers can be harmful to the robot)

Lecture 3: Learning algorithms

Topics:
1. CPG-and-reflex based control of locomotion (end of lecture 2)
2. Evolutionary algorithms
3. Reinforcement learning

Reinforcement Learning
The next slides are adapted from Sutton and Barto's course (R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction).

Key Features of RL
• Trial-and-error learning, which is well suited for online learning on a robot, for instance
• The learner is not told which actions to take (i.e. no supervision), but receives a reward every so often
• Possibility of delayed reward:
  • sacrifice short-term gains for greater long-term gains
• Most RL learning algorithms can be seen as algorithms that estimate value functions and solve the Bellman equation (see next slides)

[Diagram: the agent sends an action to the environment; the environment returns a state and a reward]

Elements of RL
• Policy: what to do
• Reward: what is good
• Value: what is good because it predicts reward
• Model: what follows what
[Diagram: the agent contains a policy, a reward signal, a value function, and a model of the environment]

The Agent-Environment Interface
[Diagram: at each step the agent receives state s_t and reward r_t and emits action a_t; the environment returns r_t+1 and s_t+1]

Agent and environment interact at discrete time steps t = 0, 1, 2, …
• The agent observes the state at step t: s_t ∈ S
• produces an action at step t: a_t ∈ A(s_t)
• gets the resulting reward: r_t+1 ∈ ℝ
• and the resulting next state: s_t+1

  s_t --a_t--> r_t+1, s_t+1 --a_t+1--> r_t+2, s_t+2 --a_t+2--> r_t+3, s_t+3 --a_t+3--> …
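A minimal sketch of this interface; the one-dimensional toy environment and the random policy are assumptions made only to show the s_t, a_t, r_t+1 loop:

```python
import random

class Environment:
    """Toy 1-D grid environment used only to illustrate the interface:
    states 0..4, actions -1/+1, reward +1 when the goal state 4 is reached."""
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done           # s_{t+1}, r_{t+1}, episode end?

def policy(state):
    """A uniform random policy: every action is equally likely in every state."""
    return random.choice([-1, +1])

env = Environment()
s = env.reset()
for t in range(100):                              # discrete time steps t = 0, 1, 2, ...
    a = policy(s)                                 # a_t
    s, r, done = env.step(a)                      # s_{t+1}, r_{t+1}
    if done:
        break
```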

The Agent Learns a Policy
The policy at step t, π_t, is a mapping from states to action probabilities:

  π_t(s, a) = probability of taking action a_t = a when s_t = s

• Reinforcement learning methods specify how the agent changes its policy as a result of experience.
• Roughly, the agent's goal is to get as much reward as it can over the long run.

Policy
Stochastic environment: [Diagram: taking action a in state S leads to a stochastic transition to S']
Note: both the environment and the policy can be stochastic/probabilistic.

Getting the Degree of Abstraction Right
• Time steps need not refer to fixed intervals of real time.
• Actions can be low-level (e.g., voltages to motors) or high-level (e.g., move North, pick up an object, accept a job offer, …).
• States can be low-level "sensations" (e.g. distance sensor readings), or they can be abstract, symbolic, based on memory, or subjective (e.g., the state of being "surprised" or "lost").
• The environment is not necessarily unknown to the agent, only incompletely controllable.

Goals and Rewards
• The reward signal r_t is a scalar. This offers a flexible way of encoding the goal of a problem.
• A goal should specify what we want to achieve, not how we want to achieve it.
• The agent must be able to measure success:
  • explicitly;
  • frequently during its lifespan.

Returns
Suppose the sequence of rewards after step t is r_t+1, r_t+2, r_t+3, …
What do we want to maximize?

In general, we want to maximize the expected return, E{R_t}, for each step t.

Episodic tasks: the interaction breaks naturally into episodes, e.g., plays of a game, trips through a maze.

  R_t = r_t+1 + r_t+2 + … + r_T,

where T is a final time step at which a terminal state is reached, ending an episode.

Returns for Continuing Tasks
Continuing tasks: the interaction does not have natural episodes.

Discounted return:

  R_t = r_t+1 + γ r_t+2 + γ² r_t+3 + … = Σ_{k=0}^{∞} γ^k r_t+k+1,

where γ, 0 ≤ γ ≤ 1, is the discount rate.

  shortsighted  0 ← γ → 1  farsighted
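A small numerical sketch of the two kinds of return; the reward sequence is an arbitrary example:

```python
def episodic_return(rewards):
    """R_t for an episodic task: plain sum of rewards up to the terminal step T."""
    return sum(rewards)

def discounted_return(rewards, gamma=0.9):
    """R_t for a continuing task: sum of gamma**k * r_{t+k+1}."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

rewards = [0.0, 0.0, 1.0, 0.0, 5.0]            # example reward sequence (assumed)
print(episodic_return(rewards))                 # 6.0
print(discounted_return(rewards, gamma=0.9))    # 0.9**2 * 1 + 0.9**4 * 5 ≈ 4.09
```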

An Example
Avoid failure: the pole falling beyond a critical angle, or the cart hitting the end of the track.
• As an episodic task where the episode ends upon failure:
    reward = +1 for each step before failure  ⇒  return = number of steps before failure
• As a continuing task with discounted return:
    reward = −1 upon failure; 0 otherwise  ⇒  return = −γ^k, for k steps before failure
In either case, the return is maximized by avoiding failure for as long as possible.

Another Example
Get to the top of the hill as quickly as possible.
    reward = −1 for each step where not at the top of the hill
    ⇒  return = −(number of steps before reaching the top of the hill)
The return is maximized by minimizing the number of steps to reach the top of the hill.
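A tiny numerical check of the claim that both formulations of the first example reward postponing failure; the discount rate is an assumed value:

```python
def episodic_cartpole_return(k):
    """Episodic formulation: reward +1 for each of the k steps before failure."""
    return float(k)

def continuing_cartpole_return(k, gamma=0.99):
    """Continuing formulation: reward -1 at failure, 0 otherwise, discounted."""
    return -(gamma ** k)

for k in (10, 100, 1000):
    print(k, episodic_cartpole_return(k), continuing_cartpole_return(k))
# Both returns grow (toward 0 in the discounted case) as failure is postponed,
# so both are maximized by avoiding failure for as long as possible.
```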

Reinforcement learning algorithms
• We will continue with RL algorithms in Lecture 4.

End of lecture 3
