RL and ObC
Lecture 1: Introduction
1 Optimal Control
2 Adaptive Control
3 Reinforcement Learning
4 RL Applications
Optimal Control
• Minimizes a prescribed performance function
• Usually designed offline by solving the HJB equation
• Uses complete knowledge of the system
• Solving the nonlinear HJB equation is often hard or impossible

Adaptive Control
• Learns online via feedback
• Not usually designed to be optimal
• First identifies the system, then uses the model
Linear Quadratic Regulator (LQR)
The LQR is the most basic optimal controller for LTI systems. Consider the following system
ẋ = Ax(t) + Bu(t)
where the state x(t) ∈ R^n and the control input u(t) ∈ R^m. The system is associated with the infinite-horizon quadratic cost function
V(x(t_0), t_0) = ∫_{t_0}^{∞} ( x^T(τ) Q x(τ) + u^T(τ) R u(τ) ) dτ
The solution is given by u(t) = −Kx(t), where the gain matrix is
K = R^{−1} B^T P
and P is a positive definite solution of the Algebraic Riccati Equation (ARE)
A^T P + PA + Q − P B R^{−1} B^T P = 0
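As a numerical aside, the ARE and the LQR gain can be computed with SciPy. The A, B, Q, R values below (a double integrator) are chosen arbitrarily for illustration, not taken from the lecture.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative double-integrator example (A, B, Q, R chosen arbitrarily).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)          # state weighting
R = np.array([[1.0]])  # control weighting

# Solve A^T P + P A + Q - P B R^{-1} B^T P = 0 for P > 0.
P = solve_continuous_are(A, B, Q, R)

# LQR gain K = R^{-1} B^T P, giving the control law u = -K x.
K = np.linalg.solve(R, B.T @ P)

# The closed-loop matrix A - B K should be Hurwitz (eigenvalues in the left half-plane).
print("K =", K)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```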
Linear Quadratic Zero-sum Games
Now add a disturbance d(t) acting through the matrix D; the control minimizes and the disturbance maximizes the quadratic cost with attenuation level γ:
ẋ = Ax(t) + Bu(t) + Dd(t)
The saddle-point policies are
u(x) = −R^{−1} B^T P x = −Kx
d(x) = (1/γ²) D^T P x = Lx
where P is the solution to the game ARE
0 = A^T P + PA + Q − P B R^{−1} B^T P + (1/γ²) P D D^T P
• There exists a solution P > 0 if (A, B) is stabilizable, (A, √Q) is observable, and γ > γ*, the H-infinity gain.
• This is an offline solution that requires complete knowledge of the system dynamics (A, B, D).
• If the system dynamics (A, B, D) change or the performance index (Q, R, γ) varies, a new optimal control solution is needed.
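A possible numerical sketch for the game ARE: fold the disturbance term into the state weighting and repeatedly solve a standard ARE. Any fixed point of this iteration satisfies the game ARE; convergence is simply assumed here for γ above the H-infinity gain, and the residual is checked at the end. All matrices and the value of γ are made up for illustration.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative system matrices (chosen arbitrarily for this sketch).
A = np.array([[-1.0, 2.0],
              [0.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
D = np.array([[1.0],
              [0.5]])
Q = np.eye(2)
R = np.array([[1.0]])
gamma = 5.0  # assumed to be above the H-infinity gain gamma*

# Fixed-point sketch: fold the disturbance term into the state weighting and
# re-solve the standard ARE.  Any fixed point satisfies the game ARE;
# convergence is assumed here for large enough gamma.
P = solve_continuous_are(A, B, Q, R)
for _ in range(200):
    Q_eff = Q + (P @ D @ D.T @ P) / gamma**2
    P_next = solve_continuous_are(A, B, Q_eff, R)
    if np.linalg.norm(P_next - P) < 1e-10:
        P = P_next
        break
    P = P_next

K = np.linalg.solve(R, B.T @ P)   # control gain,     u = -K x
L = (D.T @ P) / gamma**2          # disturbance gain, d =  L x

# Residual of the game ARE should be ~0 if the iteration converged.
residual = (A.T @ P + P @ A + Q
            - P @ B @ np.linalg.solve(R, B.T @ P)
            + (P @ D @ D.T @ P) / gamma**2)
print("||game ARE residual|| =", np.linalg.norm(residual))
```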
2 Adaptive Control
Model Reference Adaptive Control (MRAC)
Consider the scalar plant
ẋ = ax + bu
where the state x(t) ∈ R, the control input u(t) ∈ R, and the input gain b > 0. It is desired for the plant state to follow the state of a reference model given by
ẋm = −am xm + bm r
where r(t) ∈ R is the reference input signal. Take the controller structure as
u = −kx + dr
which has a feedback term and a feedforward term. The gains k and d are unknown and are to be determined so that the state tracking error e(t) = x(t) − xm(t) goes to zero.
If the gains are updated according to the tuning laws
k̇ = αex,   ḋ = −βer
where α, β > 0 are tuning parameters, then the tracking error e(t) goes to zero with time.
• The feedback gain k is tuned by the product of the state x(t) and the tracking error e(t).
• The feedforward gain d is tuned by the product of the reference input r(t) and the tracking error e(t).
• The plant dynamics (a, b) are not needed in the tuning laws! (A small simulation sketch follows below.)
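A minimal Euler-integration sketch of these tuning laws; the plant and model parameters (a, b, am, bm, α, β) and the square-wave reference are arbitrary choices for illustration. Note that a and b are used only to simulate the plant, never inside the adaptation laws.

```python
import numpy as np

# Plant and reference-model parameters (illustrative values only).
a, b = 1.0, 2.0           # unknown to the controller, used only to simulate the plant
am, bm = 3.0, 3.0         # stable reference model:  xm_dot = -am*xm + bm*r
alpha, beta = 10.0, 10.0  # adaptation gains

dt, T = 1e-3, 20.0
x, xm, k, d = 0.0, 0.0, 0.0, 0.0

for step in range(int(T / dt)):
    t = step * dt
    r = np.sign(np.sin(0.5 * t))   # square-wave reference input
    u = -k * x + d * r             # controller: feedback + feedforward terms
    e = x - xm                     # tracking error

    # Adaptation laws: k_dot = alpha*e*x, d_dot = -beta*e*r (no knowledge of a, b).
    k += dt * alpha * e * x
    d += dt * (-beta * e * r)

    # Euler integration of plant and reference model.
    x += dt * (a * x + b * u)
    xm += dt * (-am * xm + bm * r)

print(f"final |e| = {abs(x - xm):.4f}, k = {k:.3f}, d = {d:.3f}")
```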
3 Reinforcement Learning
System terminology
• Agent → Controller or decision maker
• Action → Control or decision
• Environment → Dynamic system
Learning/Planning terminology
• Learning → Solving a problem with simulation
• Self-learning → Solving a problem with simulation-based policy iteration
• Planning vs Learning → Solving a problem with model-based vs. model-free simulation
Value Functions
• Value functions measure the goodness of a particular state or state/action pair: how good it is for the agent to be in a particular state, or to execute a particular action in a particular state, under a given policy.
• Optimal value functions measure the best possible goodness of states
or state/action pairs under all possible policies.
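A small illustration in the cost-minimization convention used on the following slides: the value of a fixed policy is obtained by solving the linear Bellman equations, and the optimal value function by value iteration. The 3-state, 2-action MDP, its transition probabilities, and its stage costs are all made up for this sketch.

```python
import numpy as np

# Toy MDP with 3 states and 2 actions (all numbers are made up for illustration).
# P[a][s, s'] = transition probability, C[a][s] = stage cost of action a in state s.
P = [np.array([[0.8, 0.2, 0.0],
               [0.0, 0.6, 0.4],
               [0.1, 0.0, 0.9]]),
     np.array([[0.2, 0.8, 0.0],
               [0.3, 0.0, 0.7],
               [0.0, 0.5, 0.5]])]
C = [np.array([1.0, 0.0, 2.0]),
     np.array([0.0, 1.5, 0.5])]
gamma = 0.9          # discount factor
policy = [0, 1, 0]   # a fixed deterministic policy: action chosen in each state

# Value of the fixed policy: J^pi solves the linear system J = C^pi + gamma * P^pi J.
P_pi = np.array([P[policy[s]][s] for s in range(3)])
C_pi = np.array([C[policy[s]][s] for s in range(3)])
J_pi = np.linalg.solve(np.eye(3) - gamma * P_pi, C_pi)

# Optimal value function via value iteration: J*(s) = min_a [ C(s,a) + gamma * sum_s' P(s'|s,a) J*(s') ].
J = np.zeros(3)
for _ in range(1000):
    J = np.min([C[a] + gamma * P[a] @ J for a in range(2)], axis=0)

print("J^pi =", J_pi)   # cost of the given policy from each state
print("J*   =", J)      # best achievable cost over all policies (J* <= J^pi elementwise)
```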
Optimal decision
• At the current state, apply the decision that minimizes
  Current stage cost + J*(Next state)
  where J*(Next state) is the optimal future cost, starting from the next state
• This defines the optimal policy: an optimal control to apply at each state
Principle of optimality
Let {u_0*, ..., u_{N−1}*} be an optimal control sequence with corresponding state sequence {x_0*, ..., x_N*}. Consider the tail subproblem that starts at x_k* at time k; then the tail control sequence {u_k*, ..., u_{N−1}*} is optimal for this tail subproblem.
By the principle of optimality, start with
J_N*(x_N) = g_N(x_N), for all x_N
and, for k = N − 1, ..., 0, let
J_k*(x_k) = min_{u_k ∈ U_k(x_k)} [ g_k(x_k, u_k) + J_{k+1}*(f_k(x_k, u_k)) ], for all x_k.
Then the optimal cost J*(x_0) is obtained at the last step: J_0*(x_0) = J*(x_0).
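A compact sketch of this backward recursion on a made-up finite problem; the state set, control set, dynamics, and costs are all illustrative.

```python
# Toy finite-horizon problem: state x in {0,...,4}, control u in {-1, 0, +1},
# dynamics f(x,u) = clip(x + u, 0, 4), stage cost g(x,u) = x^2 + |u|,
# terminal cost gN(x) = 10*|x - 2|.  All numbers are illustrative.
N = 5
states = range(5)
controls = (-1, 0, 1)

f = lambda x, u: min(max(x + u, 0), 4)
g = lambda x, u: x**2 + abs(u)
gN = lambda x: 10 * abs(x - 2)

# Backward DP recursion: J_N = gN, then J_k(x) = min_u [ g(x,u) + J_{k+1}(f(x,u)) ].
J = [None] * (N + 1)
policy = [None] * N
J[N] = {x: gN(x) for x in states}
for k in range(N - 1, -1, -1):
    J[k], policy[k] = {}, {}
    for x in states:
        costs = {u: g(x, u) + J[k + 1][f(x, u)] for u in controls}
        policy[k][x] = min(costs, key=costs.get)   # argmin over admissible controls
        J[k][x] = costs[policy[k][x]]

print("optimal cost from x0 = 4:", J[0][4])
print("optimal first control at x0 = 4:", policy[0][4])
```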
Constraints via Infinite Cost Values
A constrained optimal control problem can be rewritten as the equivalent unconstrained formulation
min_{s,a} Σ_{k=0}^{N−1} c̄(s_k, a_k) + Ē(s_N)
where the modified stage cost c̄ and terminal cost Ē are set to +∞ for any state/action pair that violates a constraint.
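A tiny illustration of this idea, reusing the style of the DP sketch above: the (made-up) constraint x ≥ 1 is enforced purely by assigning infinite cost values, and the backward recursion itself is unchanged.

```python
import math

# Same toy problem as above, but with an illustrative state constraint x >= 1,
# enforced purely through infinite values in the modified costs c_bar and E_bar.
N = 5
states = range(5)
controls = (-1, 0, 1)
f = lambda x, u: min(max(x + u, 0), 4)

def c_bar(x, u):
    # Infinite cost if the move leads to a constraint-violating state, ordinary cost otherwise.
    return math.inf if f(x, u) < 1 else x**2 + abs(u)

E_bar = lambda x: math.inf if x < 1 else 10 * abs(x - 2)   # modified terminal cost

# The backward DP recursion is unchanged; infeasible choices never win the minimization.
J = {x: E_bar(x) for x in states}
for _ in range(N):
    J = {x: min(c_bar(x, u) + J[f(x, u)] for u in controls) for x in states}

print(J)   # cost-to-go of each state under the constraint
```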
George Box
"All models are wrong but some models are useful."
4 RL Applications
Historical highlights
• Exact DP, Optimal Control - Bellman, Shannon, others 1950s
• AI/RL and Decision Making ideas - late 80s and early 90s
• Backgammon programs - Tesauro, 1992
• Algorithm era, analysis, applications, books - mid 90s
• Machine Learning, Big Data, Neural Networks - mid 2000s
• AlphaGo and AlphaZero - Deepmind, 2016, 2017
• DARPA AlphaDogFight against real F-16 pilots - 2019, 2020
https://www.youtube.com/watch?v=kopoLzvh5jY
• Baspinar, B., Koyuncu, E., "Survivability based Optimal Air Combat Mission Planning with Reinforcement Learning," IEEE Conference on Control Technology and Applications (CCTA), Copenhagen, Denmark, August 21-24, 2018.
• Baspinar, B., Koyuncu, E., "Assessment of Aerial Combat Game via Optimization-Based Receding Horizon Control," IEEE Access, vol. 8, pp. 35853-35863, 2020, doi: 10.1109/ACCESS.2020.2974792.
• Baspinar, B., Koyuncu, E., "Evaluation of Two-vs-One Air Combats Using Hybrid Maneuver-Based Framework and Security Strategy Approach," Journal of Aeronautics and Space Technologies, vol. 12, no. 1, pp. 95-107, January 2019.
• Baspinar, B., Koyuncu, E., "Differential Flatness-based Optimal Air Combat Maneuver Strategy Generation," AIAA Science and Technology Forum and Exposition (AIAA SciTech 2019), San Diego, California, 7-11 January 2019.
• Baspinar, B., Koyuncu, E., "Aerial Combat Simulation Environment for One-on-One Engagement," AIAA SciTech Forum and Exposition: Modelling and Simulation Technologies, Gaylord Palms, Kissimmee, FL, 8-12 January 2018.
https://www.youtube.com/watch?v=8IiLQFQ3V0E
• Hasanzade, M., Koyuncu, E., "A Dynamically Feasible Fast Replanning Strategy with Deep Reinforcement Learning," Journal of Intelligent and Robotic Systems, vol. 101, issue 1, 2021.