Lecture 5: Model-Free Control
David Silver
Outline
1 Introduction
2 On-Policy Monte-Carlo Control
3 On-Policy Temporal-Difference Learning
4 Off-Policy Learning
5 Summary
Introduction
Last lecture:
Model-free prediction
Estimate the value function of an unknown MDP
This lecture:
Model-free control
Optimise the value function of an unknown MDP
On-policy learning
“Learn on the job”
Learn about policy π from experience sampled from π
Off-policy learning
“Look over someone’s shoulder”
Learn about policy π from experience sampled from µ
On-Policy Monte-Carlo Control
Generalised Policy Iteration
[Figure: generalised policy iteration — starting from Q, π, repeated policy evaluation Q = qπ and greedy policy improvement π = greedy(Q) converge to q*, π*]
ε-Greedy Exploration
With probability 1 − ε choose the greedy action; with probability ε choose an action uniformly at random over the m = |A| actions:
π(a|s) = ε/m + 1 − ε   if a = argmax_{a′∈A} Q(s, a′)
π(a|s) = ε/m           otherwise
Theorem
For any ε-greedy policy π, the ε-greedy policy π′ with respect to qπ is an improvement, vπ′(s) ≥ vπ(s).

qπ(s, π′(s)) = Σ_{a∈A} π′(a|s) qπ(s, a)
             = ε/m Σ_{a∈A} qπ(s, a) + (1 − ε) max_{a∈A} qπ(s, a)
             ≥ ε/m Σ_{a∈A} qπ(s, a) + (1 − ε) Σ_{a∈A} [(π(a|s) − ε/m) / (1 − ε)] qπ(s, a)
             = Σ_{a∈A} π(a|s) qπ(s, a) = vπ(s)

Therefore vπ′(s) ≥ vπ(s) by the policy improvement theorem.
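As a concrete illustration, here is a minimal sketch of ε-greedy action selection over a tabular action-value function; the dictionary layout Q[(state, action)] and the name epsilon_greedy are illustrative assumptions rather than anything from the lecture:

import random

def epsilon_greedy(Q, state, actions, epsilon):
    """Pick an action epsilon-greedily from a tabular action-value function.

    Q: dict mapping (state, action) -> value estimate (illustrative layout)
    actions: list of the m available actions
    """
    if random.random() < epsilon:
        # explore: uniform random action, probability epsilon/m each
        return random.choice(actions)
    # exploit: greedy action, chosen with the remaining 1 - epsilon
    # (plus its epsilon/m share of the random choice)
    return max(actions, key=lambda a: Q[(state, a)])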
[Figure: generalised policy iteration with ε-greedy improvement — policy evaluation Q = qπ, policy improvement π = ε-greedy(Q), converging to q*, π*]
Monte-Carlo Control
[Figure: Monte-Carlo control — starting from Q, alternating Monte-Carlo policy evaluation and ε-greedy improvement π = ε-greedy(Q) converges to q*, π*]
Every episode:
Policy evaluation Monte-Carlo policy evaluation, Q ≈ qπ
Policy improvement ε-greedy policy improvement
GLIE
Definition
Greedy in the Limit with Infinite Exploration (GLIE)
All state-action pairs are explored infinitely many times,
lim_{k→∞} Nk(s, a) = ∞
The policy converges on a greedy policy,
lim_{k→∞} πk(a|s) = 1(a = argmax_{a′∈A} Qk(s, a′))
For example, ε-greedy is GLIE if ε reduces to zero at εk = 1/k
Theorem
GLIE Monte-Carlo control converges to the optimal action-value
function, Q(s, a) → q∗ (s, a)
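The GLIE Monte-Carlo control updates themselves are not reproduced above, so the following is a minimal tabular sketch under the usual choices (every-visit updates Q(St, At) ← Q(St, At) + (Gt − Q(St, At)) / N(St, At) and εk = 1/k); the env.reset()/env.step() interface returning (next_state, reward, done) and all identifiers are illustrative assumptions:

import random
from collections import defaultdict

def glie_mc_control(env, actions, num_episodes, gamma=1.0):
    """Sketch of GLIE Monte-Carlo control with epsilon_k = 1/k."""
    Q = defaultdict(float)   # Q[(s, a)] action-value estimates
    N = defaultdict(int)     # visit counts per (s, a)

    for k in range(1, num_episodes + 1):
        epsilon = 1.0 / k    # GLIE schedule: epsilon decays to zero

        # Sample the k-th episode with the current epsilon-greedy policy
        episode, state, done = [], env.reset(), False
        while not done:
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            episode.append((state, action, reward))
            state = next_state

        # Every-visit Monte-Carlo updates towards the return G_t
        G = 0.0
        for state, action, reward in reversed(episode):
            G = reward + gamma * G
            N[(state, action)] += 1
            Q[(state, action)] += (G - Q[(state, action)]) / N[(state, action)]

    return Q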
Blackjack Example

On-Policy Temporal-Difference Learning
MC vs. TD Control
[Figure: Sarsa backup — from (S, A), observe reward R and next state S′, then sample next action A′]
Q(S, A) ← Q(S, A) + α (R + γ Q(S′, A′) − Q(S, A))
[Figure: Sarsa-based generalised policy iteration — starting from Q, every time-step alternates Sarsa evaluation Q ≈ qπ and ε-greedy improvement π = ε-greedy(Q), converging to q*, π*]
Every time-step:
Policy evaluation Sarsa, Q ≈ qπ
Policy improvement ε-greedy policy improvement
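A minimal sketch of this on-policy Sarsa control loop, acting ε-greedily and applying the Sarsa update every time-step; the env.reset()/env.step() interface returning (next_state, reward, done) is an illustrative assumption:

import random
from collections import defaultdict

def sarsa_control(env, actions, num_episodes, alpha=0.1, gamma=1.0, epsilon=0.1):
    """Sketch of on-policy Sarsa control with a fixed epsilon (a GLIE schedule
    such as epsilon_k = 1/k is needed for the convergence theorem)."""
    Q = defaultdict(float)

    def pick(state):
        # epsilon-greedy behaviour w.r.t. the current Q
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    for _ in range(num_episodes):
        state = env.reset()
        action = pick(state)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            next_action = pick(next_state)
            # Sarsa update: bootstrap from the action actually taken next
            target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state, action = next_state, next_action

    return Q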
Convergence of Sarsa
Theorem
Sarsa converges to the optimal action-value function,
Q(s, a) → q∗ (s, a), under the following conditions:
GLIE sequence of policies πt (a|s)
Robbins-Monro sequence of step-sizes αt
Σ_{t=1}^∞ αt = ∞
Σ_{t=1}^∞ αt² < ∞
n-Step Sarsa
Consider the following n-step returns for n = 1, 2, ..., ∞:
n = 1 (Sarsa)   qt(1) = Rt+1 + γ Q(St+1, At+1)
n = 2           qt(2) = Rt+1 + γ Rt+2 + γ² Q(St+2, At+2)
...
n = ∞ (MC)      qt(∞) = Rt+1 + γ Rt+2 + ... + γ^{T−1} RT
Define the n-step Q-return
qt(n) = Rt+1 + γ Rt+2 + ... + γ^{n−1} Rt+n + γ^n Q(St+n, At+n)
n-step Sarsa updates Q(s, a) towards the n-step Q-return,
Q(St, At) ← Q(St, At) + α (qt(n) − Q(St, At))
Forward-view Sarsa(λ)
The qtλ return combines all n-step Q-returns qt(n), using weight (1 − λ) λ^{n−1}:
qtλ = (1 − λ) Σ_{n=1}^∞ λ^{n−1} qt(n)
Forward-view Sarsa(λ) updates Q(s, a) towards the λ-return:
Q(St, At) ← Q(St, At) + α (qtλ − Q(St, At))
Backward View Sarsa(λ)
Sarsa(λ) keeps one eligibility trace for each state-action pair:
E0(s, a) = 0
Et(s, a) = γλ Et−1(s, a) + 1(St = s, At = a)
Q(s, a) is updated for every state s and action a, in proportion to the TD-error δt and the eligibility trace Et(s, a):
δt = Rt+1 + γ Q(St+1, At+1) − Q(St, At)
Q(s, a) ← Q(s, a) + α δt Et(s, a)
Sarsa(λ) Algorithm
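The pseudocode figure for the Sarsa(λ) algorithm is not reproduced here; below is a minimal sketch of tabular backward-view Sarsa(λ) with accumulating traces, under the same hypothetical env.reset()/env.step() interface as the earlier sketches:

import random
from collections import defaultdict

def sarsa_lambda(env, actions, num_episodes, alpha=0.1, gamma=1.0,
                 lam=0.9, epsilon=0.1):
    """Sketch of backward-view Sarsa(lambda) with accumulating traces."""
    Q = defaultdict(float)

    def pick(state):
        # epsilon-greedy behaviour w.r.t. the current Q
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    for _ in range(num_episodes):
        E = defaultdict(float)          # eligibility traces, reset per episode
        state = env.reset()
        action = pick(state)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            next_action = pick(next_state)

            # TD-error: delta = R + gamma * Q(S', A') - Q(S, A)
            target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
            delta = target - Q[(state, action)]

            E[(state, action)] += 1.0   # accumulate trace for the visited pair

            # Update every (s, a) in proportion to its trace, then decay traces
            for sa in list(E):
                Q[sa] += alpha * delta * E[sa]
                E[sa] *= gamma * lam

            state, action = next_state, next_action

    return Q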
Off-Policy Learning
Evaluate target policy π(a|s) to compute vπ(s) or qπ(s, a), while following behaviour policy µ(a|s):
{S1, A1, R2, ..., ST} ∼ µ
Importance Sampling
Use TD targets generated from µ to evaluate π: weight the TD target R + γV(S′) by a single importance sampling correction,
V(St) ← V(St) + α ( [π(At|St) / µ(At|St)] (Rt+1 + γ V(St+1)) − V(St) )
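A minimal sketch of a single off-policy TD(0) update with this importance-sampling correction; target_policy(a, s) and behaviour_policy(a, s) returning π(a|s) and µ(a|s) are illustrative assumptions:

def off_policy_td_update(V, s, a, r, s_next, target_policy, behaviour_policy,
                         alpha=0.1, gamma=1.0):
    """One TD(0) update of V towards the target policy pi, from a transition
    (s, a, r, s_next) generated by the behaviour policy mu."""
    rho = target_policy(a, s) / behaviour_policy(a, s)   # importance ratio pi/mu
    td_target = rho * (r + gamma * V.get(s_next, 0.0))   # corrected TD target
    V[s] = V.get(s, 0.0) + alpha * (td_target - V.get(s, 0.0))
    return V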
Q-Learning
We now consider off-policy learning of action-values Q(s, a). The next action actually taken is chosen from the behaviour policy, At+1 ∼ µ(·|St+1), but Q(St, At) is updated towards the value of an alternative successor action A′ ∼ π(·|St+1):
Q(St, At) ← Q(St, At) + α (Rt+1 + γ Q(St+1, A′) − Q(St, At))
In Q-learning control both behaviour and target policies improve. The target policy π is greedy with respect to Q(s, a), so the Q-learning target simplifies:
Rt+1 + γ Q(St+1, A′)
= Rt+1 + γ Q(St+1, argmax_{a′} Q(St+1, a′))
= Rt+1 + γ max_{a′} Q(St+1, a′)
[Figure: Q-learning backup — from (S, A), observe R and S′, then back up over the maximising successor action a′]
Q(S, A) ← Q(S, A) + α (R + γ max_{a′} Q(S′, a′) − Q(S, A))
Theorem
Q-learning control converges to the optimal action-value function,
Q(s, a) → q∗ (s, a)
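A minimal sketch of tabular Q-learning control with an ε-greedy behaviour policy and a greedy target policy, under the same hypothetical env interface as the earlier sketches:

import random
from collections import defaultdict

def q_learning(env, actions, num_episodes, alpha=0.1, gamma=1.0, epsilon=0.1):
    """Sketch of off-policy Q-learning control: behaviour is epsilon-greedy,
    target is greedy w.r.t. the current Q."""
    Q = defaultdict(float)

    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            # Behaviour policy: epsilon-greedy action actually executed
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)

            # Target policy is greedy: back up over the maximising action
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state

    return Q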
Q-Learning Demo
[Table: relationship between DP (full backups) and TD (sample backups) — the Bellman expectation equation for vπ(s) gives iterative policy evaluation (DP) and TD learning (TD); for qπ(s, a) it gives Q-policy iteration (DP) and Sarsa (TD); the Bellman optimality equation for q*(s, a) gives Q-value iteration (DP) and Q-learning (TD)]
where x ←α y ≡ x ← x + α(y − x)
Summary
Questions?