Lecture 2: Deep Reinforcement Learning (A)
• Associate Professor
• Electrical and Computer Engineering
• Newark College of Engineering
• New Jersey Institute of Technology
• https://tao-han-njit.netlify.app
Slides are based on Prof. Hung-yi Lee’s Machine Learning courses at National Taiwan University.
Supervised Learning → RL
In supervised learning, a human labels each training example (e.g., this image is a “Cat”).
For some tasks, even humans cannot easily label the best output (is the next move “3-3”?); this is where RL comes in.
Policy Gradient
Actor-Critic
Machine Learning ≈ Looking for a Function
In RL, the function we are looking for is the Actor:
Action = f(Observation)
The Environment provides the Observation (the function’s input), and the Actor returns the Action (the function’s output) to the Environment.
Example: Playing Video Game
• Space Invaders: the observation is the game screen; actions include moving the spaceship and firing; shields protect the spaceship.
• Score (reward): obtained by killing the aliens.
• Termination: all the aliens are killed, or your spaceship is destroyed.
Example: Playing Video Game
The Actor receives an Observation (the game screen) from the Environment and outputs an Action, e.g., “right”.
The Environment then returns a Reward: here, reward = 0.
Example: Playing Video Game
Find an actor maximizing expected reward.
Here the Actor outputs the Action “fire”, and the Environment returns reward = 5 if an alien is killed.
Machine Learning is so simple ……
The actor is a network that takes the game pixels as input and outputs a score for each action (e.g., “fire” 0.1); the action is then sampled from this distribution.
This is a classification task!!!
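A minimal PyTorch sketch of this idea (the frame size, hidden width, and the Actor name are illustrative assumptions, not the lecture's exact network): the actor scores each action like a classifier, and the action is sampled from the resulting distribution.

import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim=84 * 84, n_actions=3):  # e.g., left / right / fire
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs):
        return torch.softmax(self.net(obs), dim=-1)  # probability for each action

actor = Actor()
obs = torch.rand(1, 84 * 84)                              # flattened game pixels
probs = actor(obs)                                        # e.g., [0.7, 0.2, 0.1]
action = torch.distributions.Categorical(probs).sample()  # sample rather than argmax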
This is an episode: at each step the actor takes action a_t and obtains reward r_t; after many turns the game is over (spaceship destroyed).
Total reward (return): R = Σ_{t=1}^{T} r_t
What we want is to maximize this return.
Step 3: Optimization
Trajectory τ = {s_1, a_1, s_2, a_2, ⋯}: the environment produces s_1, the actor network outputs a_1, the environment then produces s_2, and so on.
R(τ) = Σ_{t=1}^{T} r_t
How to do the optimization here is the main challenge in RL (c.f. GAN).
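A sketch of collecting one trajectory and its return, assuming a Gymnasium-style environment API (reset/step) and the Actor sketched above; run_episode is a hypothetical helper name.

import torch

def run_episode(env, actor):
    # Roll out one episode: tau = s1, a1, s2, a2, ...  and  R(tau) = sum_t r_t
    trajectory, total_reward = [], 0.0
    obs, _ = env.reset()
    done = False
    while not done:
        x = torch.as_tensor(obs, dtype=torch.float32).flatten().unsqueeze(0)
        action = torch.distributions.Categorical(actor(x)).sample().item()
        next_obs, reward, terminated, truncated, _ = env.step(action)
        trajectory.append((obs, action, reward))
        total_reward += reward
        obs, done = next_obs, terminated or truncated
    return trajectory, total_reward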
Outline
Policy Gradient
Actor-Critic
How to control your actor
• Make it take (or don’t take) a specific action â given a specific observation s.
The actor θ takes s as input and outputs an action distribution a; the target â is a one-hot label (e.g., left 1, right 0, fire 0).
e = cross-entropy(a, â)
To make the actor take action â: L = e
To make it not take action â: L = −e
θ* = arg min_θ L
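A sketch of these two cases, reusing the Actor sketch above (so actor.net gives the pre-softmax scores); a_hat is the index of the target action, and the take flag switches between L = e and L = −e.

import torch
import torch.nn.functional as F

def control_loss(actor, s, a_hat, take=True):
    logits = actor.net(s)                  # unnormalized action scores for observation s
    e = F.cross_entropy(logits, a_hat)     # e = cross-entropy(actor output, a_hat)
    return e if take else -e               # L = e (take a_hat)  or  L = -e (don't take a_hat)

s = torch.rand(1, 84 * 84)
a_hat = torch.tensor([2])                  # e.g., "fire"
loss = control_loss(actor, s, a_hat, take=True)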
How to control your actor
Take action â given s (cross-entropy e_1); don’t take action â′ given s′ (cross-entropy e_2).
L = e_1 − e_2
θ* = arg min_θ L
How to control your actor
Version 0
Training data: pairs {(s_i, a_i)}, each with an evaluation A_i, collected from many episodes.
Version 0 sets A_i = r_i, the reward obtained right after taking a_i at s_i.
This is a short-sighted version! An action affects the subsequent observations and rewards: e.g., “right” earns r_1 = 0 but sets up the later “fire”, which earns r_2 = +5.
Version 1 (cumulated reward)
G_1 = r_1 + r_2 + r_3 + ⋯ + r_N
G_2 = r_2 + r_3 + ⋯ + r_N
G_3 = r_3 + ⋯ + r_N
In general, G_t = Σ_{t'=t}^{N} r_{t'}; use A_i = G_i.
Version 2 (discounted cumulated reward)
Is a reward obtained many steps later really also the credit of a_1? Discount it:
G'_1 = r_1 + γr_2 + γ²r_3 + ⋯ (discount factor γ < 1)
In general, G'_t = Σ_{t'=t}^{N} γ^{t'−t} r_{t'}; use A_i = G'_i.
Version 3 (baseline)
Good or bad reward is “relative”: if all r_t ≥ 10, then r_t = 10 is actually a bad outcome.
Subtract a baseline b: A_i = G'_i − b, where G'_t = Σ_{t'=t}^{N} γ^{t'−t} r_{t'}.
This makes G'_t take both positive and negative values. But how to choose b?
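A sketch of computing A_t for Versions 1–3 from one episode's rewards; here the baseline b is simply the mean return, an illustrative choice, while the following slides replace it with a learned critic.

import numpy as np

def advantages(rewards, gamma=0.99):
    # G'_t = r_t + gamma * G'_{t+1}, computed backwards through the episode
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G - G.mean()                    # A_t = G'_t - b, with b = mean(G') here

print(advantages([0.0, 5.0, 0.0, 1.0]))    # positive and negative values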
Policy Gradient
• Initialize actor network parameters θ^0
• For training iteration i = 1 to T:
  • Use actor θ^{i−1} to interact with the environment
  • Obtain data {s_1, a_1, s_2, a_2, …, s_N, a_N}
  • Compute A_1, A_2, …, A_N
  • Compute loss L
  • θ^i ← θ^{i−1} − η∇L
Data collection is inside the “for loop” of training iterations.
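A sketch of this loop under the earlier assumptions: run_episode and advantages are the hypothetical helpers sketched above, policy_gradient_loss is sketched after the next slide, and T, eta, and gamma are illustrative values.

import torch

def train(env, actor, T=1000, eta=1e-3, gamma=0.99):
    opt = torch.optim.SGD(actor.parameters(), lr=eta)    # theta_i <- theta_{i-1} - eta * grad L
    for i in range(T):
        # data collection sits inside the training loop: interact with the current actor
        trajectory, _ = run_episode(env, actor)
        rewards = [r for (_, _, r) in trajectory]
        A = advantages(rewards, gamma)                   # A_1, ..., A_N
        loss = policy_gradient_loss(actor, trajectory, A)
        opt.zero_grad()
        loss.backward()
        opt.step()                                       # one update, then collect new data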
Policy Gradient
Training data: {(s_1, a_1, A_1), (s_2, a_2, A_2), (s_3, a_3, A_3), ……}
The actor θ maps s to a; the loss is L = Σ_i A_i e_i, where e_i is the cross-entropy between the actor’s output at s_i and the recorded action a_i.
θ^i ← θ^{i−1} − η∇L
The data {(s_i, a_i, A_i)} collected with θ^{i−1} is used for only one update; then new data must be collected.
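A sketch of this loss under the same assumptions (an Actor whose .net produces pre-softmax scores): reduction="none" keeps the per-pair cross-entropies e_i so they can be weighted by A_i.

import torch
import torch.nn.functional as F

def policy_gradient_loss(actor, trajectory, A):
    obs = torch.stack([torch.as_tensor(s, dtype=torch.float32).flatten()
                       for (s, _, _) in trajectory])
    acts = torch.tensor([a for (_, a, _) in trajectory])
    e = F.cross_entropy(actor.net(obs), acts, reduction="none")   # e_1, ..., e_N
    return (torch.as_tensor(A, dtype=torch.float32) * e).sum()    # L = sum_i A_i * e_i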
Policy Gradient
Actor-Critic
Critic
Value function V^θ(s): when using actor θ, the discounted cumulated reward (G'_1 = r_1 + γr_2 + γ²r_3 + ⋯) expected to be obtained after seeing s.
The critic takes s as input and outputs a scalar V^θ(s).
Monte-Carlo (MC) approach: the critic watches the actor play many episodes.
• After seeing s_a, the cumulated reward until the end of the episode is G'_a, so V^θ(s_a) should be close to G'_a (here V^θ(s_a) is large).
• After seeing s_b, the cumulated reward until the end of the episode is G'_b, so V^θ(s_b) should be close to G'_b (here V^θ(s_b) is smaller).
How to estimate V^θ(s)
• Temporal-difference (TD) approach: learn from a single transition ⋯ s_t, a_t, r_t, s_{t+1} ⋯ (ignore the expectation here)
V^θ(s_t) = r_t + γr_{t+1} + γ²r_{t+2} + ⋯
V^θ(s_{t+1}) = r_{t+1} + γr_{t+2} + ⋯
⇒ V^θ(s_t) = γV^θ(s_{t+1}) + r_t
Train the critic so that V^θ(s_t) − γV^θ(s_{t+1}) is close to r_t.
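A sketch of the TD update for a single transition (s_t, a_t, r_t, s_{t+1}), using the same illustrative critic: the bootstrapped target r_t + gamma * V(s_{t+1}) is held fixed so that V(s_t) − gamma * V(s_{t+1}) is pushed toward r_t.

import torch
import torch.nn.functional as F

def td_update(critic, opt, s_t, r_t, s_next, gamma=0.99):
    with torch.no_grad():
        target = r_t + gamma * critic(s_next)   # bootstrap from the next state's value
    loss = F.mse_loss(critic(s_t), target)
    opt.zero_grad()
    loss.backward()
    opt.step()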
MC v.s. TD
• The critic has observed the following 8 episodes (taking γ = 1 for simplicity):
• s_a, r = 0, s_b, r = 0, END
• s_b, r = 1, END
• s_b, r = 1, END
• s_b, r = 1, END
• s_b, r = 1, END
• s_b, r = 1, END
• s_b, r = 1, END
• s_b, r = 0, END
V^θ(s_b) = 3/4 (six of the eight episodes that visit s_b end with reward 1)
V^θ(s_a) = ? 0? 3/4?
Monte-Carlo: V^θ(s_a) = 0 (the only episode containing s_a has total reward 0).
Temporal-difference: V^θ(s_a) = r + V^θ(s_b) = 0 + 3/4 = 3/4.
Version 3.5
How should the baseline b in A_i = G'_i − b be chosen? Use the critic: set b = V^θ(s_i).
A_i = G'_i − V^θ(s_i)
Version 3.5: A_t = G'_t − V^θ(s_t)
V^θ(s_t) is the expected cumulated reward after seeing s_t, averaged over the actions the actor may sample from its distribution (not necessarily a_t), e.g., G = 100, 3, 1, 2, −10.
G'_t is the cumulated reward obtained after actually taking a_t at s_t; it is just one sample.
A_t > 0: a_t is better than average.
A_t < 0: a_t is worse than average.
Version 4 (Advantage Actor-Critic)
Version 3.5 compares a single sample G'_t against the average V^θ(s_t). Replace the sample with its expectation r_t + V^θ(s_{t+1}):
A_t = r_t + V^θ(s_{t+1}) − V^θ(s_t)
• V^θ(s_t): the expected cumulated reward after s_t, averaged over the actions that may be sampled (not necessarily a_t), e.g., G = 100, 3, 1, 2, −10.
• r_t + V^θ(s_{t+1}): take a_t at s_t, obtain r_t, reach s_{t+1}, then average over what follows, e.g., G = 101, 4, 3, 1, −5.
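A sketch of the Version 4 advantage for a whole episode, assuming a stacked (N, obs_dim) tensor of states and an (N,) reward tensor; gamma defaults to 1.0 to match the formula on the slide, and V(s_{N+1}) is taken as 0 after the final step.

import torch

def v4_advantages(critic, states, rewards, gamma=1.0):
    with torch.no_grad():
        v = critic(states).squeeze(-1)                 # V(s_1), ..., V(s_N)
        v_next = torch.cat([v[1:], torch.zeros(1)])    # V(s_{t+1}), 0 after the last step
    return rewards + gamma * v_next - v                # A_t = r_t + V(s_{t+1}) - V(s_t)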
Tip of Actor-Critic
• The parameters of the actor and critic can be shared: the same front-end layers process the observation, with one head outputting the action probabilities (e.g., left / right / fire) and another head outputting the scalar V^θ(s).
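A sketch of such parameter sharing (layer sizes and names are illustrative): one shared front end processes the observation, an actor head produces action probabilities, and a critic head produces the scalar V(s).

import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim=84 * 84, n_actions=3):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())  # shared layers
        self.actor_head = nn.Linear(128, n_actions)   # e.g., left / right / fire
        self.critic_head = nn.Linear(128, 1)          # scalar V(s)

    def forward(self, obs):
        h = self.shared(obs)
        probs = torch.softmax(self.actor_head(h), dim=-1)
        return probs, self.critic_head(h).squeeze(-1)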
Policy Gradient
Actor-Critic