Copyright Notice
These slides are distributed under the Creative Commons License.
DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.
For the rest of the details of the license, see https://creativecommons.org/licenses/by-sa/2.0/legalcode
Reinforcement Learning
Introduction
What is Reinforcement Learning?
Autonomous Helicopter
GPS
Accelerometers
Compass
Computer
How to fly it?
Autonomous Helicopter
[Thanks to Pieter Abbeel, Adam Coates and Morgan Quigley] For more videos: http://heli.stanford.edu.
Reinforcement Learning
state s: position of helicopter   →   action a: how to move the control sticks
reward function:
• positive reward: helicopter flying well
• negative reward: helicopter flying poorly
Robotic Dog Example
[Thanks to Zico Kolter]
Applications
• Controlling robots
• Factory optimization
• Financial (stock) trading
• Playing games (including video games)
Reinforcement Learning formalism
Mars rover example
Mars Rover Example
[Figure: a row of six states; states 1 and 6 are terminal states. From any other state the rover can move left or right.]
[Credit: Jagriti Agrawal, Emma Brunskill]
Reinforcement Learning formalism
The Return in reinforcement learning
Return
rewards:  100    0    0    0    0    40
state:      1    2    3    4    5    6
Return = R₁ + γR₂ + γ²R₃ + ⋯   (until the terminal state)
Discount factor γ
Example of Return
γ = 0.5
rewards:  100    0    0    0    0    40
state:      1    2    3    4    5    6
[Figure: the return from each state for different choices of actions, e.g. always going left vs. always going right.]
The return depends on the actions you take.
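A short sketch (ours, not from the slides) that reproduces these numbers: the six-state rover with rewards 100, 0, 0, 0, 0, 40, terminal states 1 and 6, and γ = 0.5.

```python
# Sketch (illustrative): returns for the 6-state Mars rover, gamma = 0.5.
REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
TERMINAL = {1, 6}

def discounted_return(path, gamma=0.5):
    # Return = R1 + gamma*R2 + gamma^2*R3 + ... over the states visited
    return sum((gamma ** t) * REWARDS[s] for t, s in enumerate(path))

def rollout(start, action, gamma=0.5):
    # States visited when repeatedly taking `action` until reaching a terminal state
    path, s = [start], start
    while s not in TERMINAL:
        s = s - 1 if action == "left" else s + 1
        path.append(s)
    return discounted_return(path, gamma)

print([rollout(s, "left") for s in range(1, 7)])   # states 1..6: 100, 50, 25, 12.5, 6.25, 40
print([rollout(s, "right") for s in range(1, 7)])  # states 1..6: 100, 2.5, 5, 10, 20, 40
```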
Reinforcement Learning formalism
Making decisions: Policies in reinforcement learning
Policy
[Figure: several possible policies for the six-state rover (rewards 100 and 40 at the ends), each shown as arrows from states to actions.]
A policy is a function π(s) = a mapping from states to actions, that tells you
what action a to take in a given state s.
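As a concrete illustration (names ours), a policy for the six-state rover can be written as a lookup table; this one goes left from states 2–4 and right from state 5, matching the returns 100, 50, 25, 12.5, 20, 40 shown later.

```python
# One possible policy pi(s) = a for the 6-state Mars rover (states 1 and 6 are terminal).
PI = {2: "left", 3: "left", 4: "left", 5: "right"}

def policy(state):
    return PI[state]
```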
The goal of reinforcement learning
Find a policy π that tells you what action (a = π(s)) to take in every state (s) so as to
maximize the return.
Reinforcement Learning formalism
Review of key concepts
                   Mars rover              Helicopter                     Chess
states             6 states                position of helicopter         pieces on board
actions            left, right             how to move control sticks     possible moves
rewards
discount factor γ
return             R₁ + γR₂ + γ²R₃ + ⋯    R₁ + γR₂ + γ²R₃ + ⋯           R₁ + γR₂ + γ²R₃ + ⋯
policy             find π(s) = a           find π(s) = a                  find π(s) = a
Markov Decision Process (MDP)
[Diagram: the agent picks an action a; the environment / world responds with a reward R and the next state s back to the agent.]
State-action value function
State-action value function definition
State-action value function (Q-function)
Q(s, a) = Return if you
• start in state s.
• take action a (once).
• then behave optimally after that.
[Figure: six-state Mars rover, rewards 100, 0, 0, 0, 0, 40; for the actions shown (go left from states 2–4, go right from state 5), the return from each state is 100, 50, 25, 12.5, 20, 40.]
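A sketch (ours) that computes these Q values for the Mars rover by repeatedly applying the Bellman equation introduced later; it reproduces the numbers on the next slide.

```python
# Sketch (illustrative): Q(s, a) for the 6-state rover via repeated Bellman updates,
# Q(s, a) = R(s) + gamma * max_a' Q(s', a'), with gamma = 0.5.
REWARDS = [100, 0, 0, 0, 0, 40]              # rewards for states 1..6
GAMMA, ACTIONS = 0.5, ("left", "right")

Q = {(s, a): 0.0 for s in range(1, 7) for a in ACTIONS}
for _ in range(20):                          # iterate until the values settle
    for s in range(2, 6):                    # states 1 and 6 are terminal
        for a in ACTIONS:
            s_next = s - 1 if a == "left" else s + 1
            best_next = (REWARDS[s_next - 1] if s_next in (1, 6)
                         else max(Q[(s_next, a2)] for a2 in ACTIONS))
            Q[(s, a)] = REWARDS[s - 1] + GAMMA * best_next

print({s: (Q[(s, "left")], Q[(s, "right")]) for s in range(2, 6)})
# {2: (50.0, 12.5), 3: (25.0, 6.25), 4: (12.5, 10.0), 5: (6.25, 20.0)}
```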
Picking actions
[Figure: six-state Mars rover, rewards 100, 0, 0, 0, 0, 40. Q values (left, right) for each state: (100, 100), (50, 12.5), (25, 6.25), (12.5, 10), (6.25, 20), (40, 40).]
Q(s, a) = Return if you
• start in state s.
• take action a (once).
• then behave optimally after that.
The best possible return from state s is max_{a} Q(s, a).
The best possible action in state s is the action a that gives max_{a} Q(s, a).
Q is also written Q* and called the optimal Q function.
State-action value function
State-action value function example
Jupyter Notebook
State-action value function
Bellman Equation
Bellman Equation
Q(s, a) = Return if you
• start in state s.
• take action a (once).
• then behave optimally after that.
s : current state                R(s) = reward of the current state
a : current action
s′ : state you get to after taking action a
a′ : action that you take in state s′
Bellman Equation
Q(s, a) = R(s) + γ max_{a′} Q(s′, a′)
[Figure: six-state Mars rover, rewards 100, 0, 0, 0, 0, 40; Q values (left, right): (100, 100), (50, 12.5), (25, 6.25), (12.5, 10), (6.25, 20), (40, 40).]
Explanation of Bellman Equation
Q(s, a) = Return if you
• start in state s.
• take action a (once).
• then behave optimally after that.
The best possible return from state s′ is max_{a′} Q(s′, a′).

Q(s, a) = R(s) + γ max_{a′} Q(s′, a′)
        = (reward you get right away) + γ × (return from behaving optimally starting from state s′)
Explanation of Bellman Equation
Q(s, a) = R(s) + γ max_{a′} Q(s′, a′)
[Figure: six-state Mars rover, rewards 100, 0, 0, 0, 0, 40; Q values (left, right): (100, 100), (50, 12.5), (25, 6.25), (12.5, 10), (6.25, 20), (40, 40).]
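Two worked cases with γ = 0.5 and the values above:
Q(2, right) = R(2) + 0.5 · max_{a′} Q(3, a′) = 0 + 0.5 · 25 = 12.5
Q(4, left) = R(4) + 0.5 · max_{a′} Q(3, a′) = 0 + 0.5 · 25 = 12.5
both matching the figure.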
State-action value function
Random (stochastic) environment (Optional)
Stochastic Environment
[Figure: six-state Mars rover, states 1–6.]
Expected Return
rewards:  100    0    0    0    0    40     (states 1–6)
Expected Return = Average( R₁ + γR₂ + γ²R₃ + γ³R₄ + ⋯ )
               = E[ R₁ + γR₂ + γ²R₃ + γ³R₄ + ⋯ ]
Expected Return
Goal of Reinforcement Learning:
Choose a policy π(s) = a that will tell us what action a to take in state s so as
to maximize the expected return.

Bellman Equation:  Q(s, a) = R(s) + γ E[ max_{a′} Q(s′, a′) ]
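When the environment is random, the expected return can be estimated by averaging many simulated rollouts. A small sketch (ours; the misstep probability 0.1 is an assumed value, not from the slide):

```python
import random

REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
TERMINAL = {1, 6}

def simulate(start, policy, gamma=0.5, misstep_prob=0.1):
    s, total, discount = start, 0.0, 1.0
    total += discount * REWARDS[s]
    while s not in TERMINAL:
        a = policy(s)
        if random.random() < misstep_prob:            # environment is random:
            a = "left" if a == "right" else "right"   # occasionally go the wrong way
        s += -1 if a == "left" else 1
        discount *= gamma
        total += discount * REWARDS[s]
    return total

go_left = lambda s: "left"
print(sum(simulate(4, go_left) for _ in range(10000)) / 10000)   # estimate of E[R1 + gamma*R2 + ...]
```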
Jupyter Notebook
Continuous State Spaces
Example of continuous state applications
Discrete vs Continuous State
Discrete state: one of six positions: 1, 2, 3, 4, 5, 6.
Continuous state: the position can be any number, e.g. anywhere from 0 to 6 km.
Autonomous Helicopter
Continuous State Spaces
Lunar Lander
Lunar Lander
Lunar Lander
actions:
do nothing
left thruster
main thruster
right thruster
Reward Function
• Getting to landing pad: 100 – 140
• Additional reward for moving toward the pad (negative reward for moving away from it).
• Crash: -100
• Soft landing: +100
• Leg grounded: +10
• Fire main engine: -0.3
• Fire side thruster: -0.03
Lunar Lander Problem
Learn a policy π that, given the state
s = [ x, y, ẋ, ẏ, θ, θ̇, l, r ]
picks the action a = π(s) so as to maximize the return.
γ = 0.985
Continuous State Spaces
Learning the state-value function
Deep Reinforcement Learning
Input x = (s, a): the 8 state numbers [x, y, ẋ, ẏ, θ, θ̇, l, r] plus the action a as a one-hot vector, e.g. [1, 0, 0, 0].
Neural network: 12 inputs → 64 units → 64 units → 1 unit, output Q(s, a).
In a state s, use the neural network to compute
Q(s, nothing), Q(s, left), Q(s, main), Q(s, right)
Pick the action a that maximizes Q(s, a)
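A minimal Keras sketch of this network (assuming TensorFlow/Keras, as in the course labs; the layer sizes come from the slide, the optimizer and loss here are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

# 12 inputs = 8 state numbers (x, y, x_dot, y_dot, theta, theta_dot, l, r)
#           + the action a as a one-hot vector of length 4. Output: Q(s, a).
q_network = Sequential([
    Input(shape=(12,)),
    Dense(64, activation="relu"),
    Dense(64, activation="relu"),
    Dense(1, activation="linear"),
])
q_network.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
```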
Bellman Equation
y = R(s) + γ max_{a′} Q(s′, a′)
x = (s, a)
Each experience tuple ( s⁽¹⁾, a⁽¹⁾, R(s⁽¹⁾), s′⁽¹⁾ ) gives one training example:
x⁽¹⁾ = ( s⁽¹⁾, a⁽¹⁾ ),   y⁽¹⁾ = R(s⁽¹⁾) + γ max_{a′} Q(s′⁽¹⁾, a′)
Learning Algorithm
Initialize the neural network randomly as a guess of Q(s, a).
Repeat {
    Take actions in the lunar lander. Get (s, a, R(s), s′).
    Store the 10,000 most recent (s, a, R(s), s′) tuples.
    Train the neural network:
        Create a training set of 10,000 examples using
        x = (s, a) and y = R(s) + γ max_{a′} Q(s′, a′).
        Train Q_new such that Q_new(s, a) ≈ y.
    Set Q = Q_new.
}
[Mnih et al., 2015, Human-level control through deep reinforcement learning]
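A sketch of the "create training set" step above (helper names are ours; terminal transitions would use y = R(s) alone, which is omitted here for brevity):

```python
import numpy as np

# Each stored experience is a tuple (s, a, R(s), s_next), as in the algorithm above.
def build_training_set(experiences, q_network, gamma=0.985, num_actions=4):
    one_hot = np.eye(num_actions)
    X, Y = [], []
    for s, a, reward, s_next in experiences:
        # Q(s_next, a') for every action a' (one network call per action with this architecture)
        q_next = [q_network.predict(np.concatenate([s_next, one_hot[a2]])[None, :], verbose=0)[0, 0]
                  for a2 in range(num_actions)]
        X.append(np.concatenate([s, one_hot[a]]))      # x = (s, a), action one-hot encoded
        Y.append(reward + gamma * max(q_next))         # y = R(s) + gamma * max_a' Q(s', a')
    return np.array(X), np.array(Y)                    # then train Q_new so that Q_new(x) ≈ y
```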
Continuous State Spaces
Algorithm refinement: improved neural network architecture
Deep Reinforcement Learning (architecture so far)
Input x = (s, a): the 8 state numbers plus the action a as a one-hot vector — 12 inputs → 64 units → 64 units → 1 unit, output Q(s, a).
In a state s, use the neural network to compute Q(s, nothing), Q(s, left), Q(s, main), Q(s, right), then pick the action a that maximizes Q(s, a).
Deep Reinforcement Learning
Input x = s: the 8 state numbers [x, y, ẋ, ẏ, θ, θ̇, l, r].
Neural network: 8 inputs → 64 units → 64 units → 4 units, outputting Q(s, nothing), Q(s, left), Q(s, main), Q(s, right) in a single forward pass.
In a state s, input s to the neural network.
Pick the action a that maximizes Q(s, a).
The target y = R(s) + γ max_{a′} Q(s′, a′) can now also be computed with a single forward pass over s′.
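A matching Keras sketch of the improved architecture (again, only the layer sizes come from the slide):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

# One forward pass now produces Q(s, a) for all four actions at once.
q_network = Sequential([
    Input(shape=(8,)),              # s = [x, y, x_dot, y_dot, theta, theta_dot, l, r]
    Dense(64, activation="relu"),
    Dense(64, activation="relu"),
    Dense(4, activation="linear"),  # [Q(s, nothing), Q(s, left), Q(s, main), Q(s, right)]
])
```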
Continuous State Spaces
Algorithm refinement: ε-greedy policy
Learning Algorithm
Initialize the neural network randomly as a guess of Q(s, a).
Repeat {
    Take actions in the lunar lander. Get (s, a, R(s), s′).
    Store the 10,000 most recent (s, a, R(s), s′) tuples.
    Train model:
        Create a training set of 10,000 examples using
        x = (s, a) and y = R(s) + γ max_{a′} Q(s′, a′).
        Train Q_new such that Q_new(s, a) ≈ y.
    Set Q = Q_new.
}
How to choose actions while still learning?
In some state s:
Option 1:
    Pick the action a that maximizes Q(s, a).
Option 2 (ε-greedy, with ε = 0.05):
    With probability 0.95, pick the action a that maximizes Q(s, a).
    With probability 0.05, pick an action a randomly.
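A sketch of Option 2 in code (assumes the 4-output network above; names are ours):

```python
import numpy as np

def pick_action(q_network, s, epsilon=0.05):
    # epsilon-greedy: mostly exploit the current Q estimate, occasionally explore
    if np.random.rand() < epsilon:
        return int(np.random.randint(4))                 # random action with probability 0.05
    q_values = q_network.predict(s[None, :], verbose=0)[0]
    return int(np.argmax(q_values))                      # greedy action with probability 0.95
```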
Continuous State Spaces
Algorithm refinement: mini-batch and soft update (optional)
How to choose actions while still learning?
x (size in feet²)    y (price in $1000’s)
2104                 400
1416                 232
1534                 315
852                  178
…                    …
3210                 870

J(w, b) = (1 / 2m) Σᵢ₌₁ᵐ ( f_{w,b}(x⁽ⁱ⁾) − y⁽ⁱ⁾ )²

repeat {
    w = w − α (∂/∂w) J(w, b)
    b = b − α (∂/∂b) J(w, b)
}
Mini-batch

x (size in feet²)    y (price in $1000’s)
2104                 400
1416                 232
1534                 315
852                  178
…                    …
3210                 870

Instead of using all m examples on every gradient step, each step uses only a small "batch" of them.
[Plot: price in $1000’s vs. size in feet², showing the training examples.]
Mini-batch
x (size in feet²)    y (price in $1000’s)
2104                 400
1416                 232
1534                 315
852                  178
…                    …
3210                 870

[Figures: the cost J(w, b) and the training data (size in feet², # bedrooms), comparing gradient descent on the full batch with gradient descent on mini-batches.]
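A numpy sketch of one mini-batch gradient descent step for the linear-regression example above (variable names, the learning rate, and the batch size are illustrative):

```python
import numpy as np

def minibatch_step(x, y, w, b, alpha=1e-7, batch_size=4):
    idx = np.random.choice(len(x), size=batch_size, replace=False)   # pick a small batch
    err = (w * x[idx] + b) - y[idx]                                   # f_wb(x) - y on the batch
    w -= alpha * np.mean(err * x[idx])                                # w = w - alpha * dJ/dw
    b -= alpha * np.mean(err)                                         # b = b - alpha * dJ/db
    return w, b

x = np.array([2104, 1416, 1534, 852, 3210], dtype=float)   # size in feet^2
y = np.array([400, 232, 315, 178, 870], dtype=float)        # price in $1000's
w, b = 0.0, 0.0
for _ in range(1000):
    w, b = minibatch_step(x, y, w, b)
```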
Learning Algorithm
Initialize the neural network randomly as a guess of Q(s, a).
Repeat {
    Take actions in the lunar lander. Get (s, a, R(s), s′).
    Store the 10,000 most recent (s, a, R(s), s′) tuples.
    Train model:
        Create a training set of 10,000 examples using
        x = (s, a) and y = R(s) + γ max_{a′} Q(s′, a′).
        Train Q_new such that Q_new(s, a) ≈ y.
    Set Q = Q_new.
}
Soft Update
Set Q = Q_new.
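The refinement is to blend the new parameters gradually into the old ones rather than overwriting them (a "soft" update). A minimal Keras sketch, where the blend factor 0.01 is illustrative:

```python
def soft_update(q_network, q_new_network, tau=0.01):
    # Q = tau * Q_new + (1 - tau) * Q, applied weight-by-weight
    blended = [tau * w_new + (1.0 - tau) * w
               for w, w_new in zip(q_network.get_weights(), q_new_network.get_weights())]
    q_network.set_weights(blended)
```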
Continuous State Spaces
The state of reinforcement learning
Limitations of Reinforcement Learning
• Much easier to get to work in a simulation than a real robot!
• Far fewer applications than supervised and unsupervised learning.
• But … exciting research direction with potential for future applications.
Conclusion
Summary and Thank you
Courses
• Supervised Machine Learning: Regression and Classification
Linear regression, logistic regression, gradient descent
• Advanced Learning Algorithms
Neural networks, decision trees, advice for ML
• Unsupervised Learning, Recommenders, Reinforcement Learning
Clustering, anomaly detection, collaborative filtering, content-based filtering, reinforcement learning