
CS 188: Artificial Intelligence

Reinforcement Learning

Instructor: Pieter Abbeel


University of California, Berkeley
[Slides by Dan Klein, Pieter Abbeel, Anca Dragan. http://ai.berkeley.edu.]
Reinforcement Learning
Reinforcement Learning

[Diagram: agent-environment loop. The agent observes state s and reward r from the environment and chooses actions a.]

o Basic idea:
o Receive feedback in the form of rewards
o Agent’s utility is defined by the reward function
o Must (learn to) act so as to maximize expected rewards
o All learning is based on observed samples of outcomes!
Reinforcement Learning
o Still assume a Markov decision process (MDP):
o A set of states s ∈ S
o A set of actions (per state) A
o A model T(s,a,s’)
o A reward function R(s,a,s’)
o Still looking for a policy π(s)

o New twist: don’t know T or R


o I.e. we don’t know which states are good or what the actions do
o Must actually try actions and states out to learn
Offline (MDPs) vs. Online (RL)

[Two panels: Offline Solution (MDPs) vs. Online Learning (RL)]


Example: Learning to Walk

[Three panels: Initial | A Learning Trial | After Learning (1K Trials)]

[Kohl and Stone, ICRA 2004]


Example: Learning to Walk

[Kohl and Stone, ICRA 2004]


Initial
[Video: AIBO WALK – initial]
Example: Learning to Walk

[Kohl and Stone, ICRA 2004]


Training
[Video: AIBO WALK – training]
Example: Learning to Walk

[Kohl and Stone, ICRA 2004]


Finished
[Video: AIBO WALK – finished]
The Crawler!

[Demo: Crawler Bot (L10D1)] [You, in Project 3]


Video of Demo Crawler Bot
DeepMind Atari (©Two Minute Lectures)

Reinforcement Learning -- Overview
o Passive Reinforcement Learning (= how to learn from experiences)
  o Model-based Passive RL
    o Learn the MDP model from experiences, then solve the MDP
  o Model-free Passive RL
    o Forgo learning the MDP model; directly learn V or Q:
      o Value learning – learns the value of a fixed policy; 2 approaches: Direct Evaluation & TD Learning
      o Q-learning – learns Q-values of the optimal policy (uses a Q version of TD Learning)

o Active Reinforcement Learning (= the agent also needs to decide how to collect experiences)
  o Key challenges:
    o How to explore efficiently?
    o How to trade off exploration vs. exploitation
  o Applies to both model-based and model-free. In CS188 we cover it only in the context of Q-learning.
Model-Based Reinforcement Learning
Model-Based Reinforcement Learning
o Model-Based Idea:
  o Learn an approximate model based on experiences
  o Solve for values as if the learned model were correct

o Step 1: Learn empirical MDP model
  o Count outcomes s’ for each s, a
  o Normalize to give an estimate of T̂(s, a, s’)
  o Discover each R̂(s, a, s’) when we experience (s, a, s’)

o Step 2: Solve the learned MDP
  o For example, use value iteration, as before

(and repeat as needed)

(A short code sketch of Step 1 follows the example below.)


Example: Model-Based RL
Input Policy π (gridworld states: A; B C D; E)        Assume: γ = 1

Observed Episodes (Training):
  Episode 1: B, east, C, -1 | C, east, D, -1 | D, exit, x, +10
  Episode 2: B, east, C, -1 | C, east, D, -1 | D, exit, x, +10
  Episode 3: E, north, C, -1 | C, east, D, -1 | D, exit, x, +10
  Episode 4: E, north, C, -1 | C, east, A, -1 | A, exit, x, -10

Learned Model:
  T(s, a, s’):                      R(s, a, s’):
    T(B, east, C) = 1.00              R(B, east, C) = -1
    T(C, east, D) = 0.75              R(C, east, D) = -1
    T(C, east, A) = 0.25              R(D, exit, x) = +10
    …                                 …
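To make the count-and-normalize step concrete, here is a minimal Python sketch (not from the slides) that estimates T̂ and R̂ from the four episodes above:

from collections import defaultdict

# Each episode is a list of (s, a, s', r) transitions, taken from the slide.
episodes = [
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", +10)],
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", +10)],
    [("E", "north", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", +10)],
    [("E", "north", "C", -1), ("C", "east", "A", -1), ("A", "exit", "x", -10)],
]

counts = defaultdict(int)   # counts[(s, a, s')] = times this outcome was observed
totals = defaultdict(int)   # totals[(s, a)]     = times (s, a) was taken
rewards = {}                # rewards[(s, a, s')] = observed reward (deterministic here)

for episode in episodes:
    for s, a, s2, r in episode:
        counts[(s, a, s2)] += 1
        totals[(s, a)] += 1
        rewards[(s, a, s2)] = r

# Normalize counts to get the empirical transition model T_hat.
T_hat = {(s, a, s2): c / totals[(s, a)] for (s, a, s2), c in counts.items()}

print(T_hat[("B", "east", "C")])   # 1.0
print(T_hat[("C", "east", "D")])   # 0.75
print(T_hat[("C", "east", "A")])   # 0.25

Step 2 would then run value iteration on these estimates exactly as in the MDP lectures.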
Analogy: Expected Age
Goal: Compute expected age of cs188 students

Known P(A):
  E[A] = Σ_a P(a) · a

Without P(A), instead collect samples [a1, a2, … aN]

Unknown P(A): “Model Based”
  P̂(a) = num(a) / N,  then  E[A] ≈ Σ_a P̂(a) · a
  Why does this work? Because eventually you learn the right model.

Unknown P(A): “Model Free”
  E[A] ≈ (1/N) Σ_i a_i
  Why does this work? Because samples appear with the right frequencies.
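A tiny Python comparison of the two estimators (the sampled ages are made up for illustration):

import random
from collections import Counter

random.seed(0)
ages = [random.choice([18, 19, 20, 21, 22]) for _ in range(1000)]  # hypothetical samples

# "Model-based": estimate P(a) from counts, then take the expectation under P_hat.
counts = Counter(ages)
p_hat = {a: c / len(ages) for a, c in counts.items()}
model_based = sum(p_hat[a] * a for a in p_hat)

# "Model-free": average the samples directly.
model_free = sum(ages) / len(ages)

print(model_based, model_free)

For this simple question the two estimates coincide (up to floating-point rounding); the distinction matters in RL because a learned model can be reused to answer other questions (i.e., to solve the whole MDP), while a model-free estimate answers only the quantity being averaged.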
Reinforcement Learning -- Overview
o Passive Reinforcement Learning (= how to learn from experiences)
  o Model-based Passive RL
    o Learn the MDP model from experiences, then solve the MDP
  o Model-free Passive RL
    o Forgo learning the MDP model; directly learn V or Q:
      o Value learning – learns the value of a fixed policy; 2 approaches: Direct Evaluation & TD Learning
      o Q-learning – learns Q-values of the optimal policy (uses a Q version of TD Learning)

o Active Reinforcement Learning (= the agent also needs to decide how to collect experiences)
  o Key challenges:
    o How to explore efficiently?
    o How to trade off exploration vs. exploitation
  o Applies to both model-based and model-free. In CS188 we cover it only in the context of Q-learning.
Passive Model-Free Reinforcement Learning
Passive Model-Free Reinforcement Learning
o Simplified task: policy evaluation
o Input: a fixed policy π(s)
o You don’t know the transitions T(s,a,s’)
o You don’t know the rewards R(s,a,s’)
o Goal: learn the state values

o In this case:
o Learner is “along for the ride”
o No choice about what actions to take
o Just execute the policy and learn from experience
o This is NOT offline planning! You actually take actions in the world.
Direct Evaluation

o Goal: Compute values for each state under π

o Idea: Average together observed sample values
  o Act according to π
  o Every time you visit a state, write down what the sum of discounted rewards turned out to be
  o Average those samples

o This is called direct evaluation


Example: Direct Evaluation
Input Policy π (gridworld states: A; B C D; E)        Assume: γ = 1

Observed Episodes (Training):
  Episode 1: B, east, C, -1 | C, east, D, -1 | D, exit, x, +10
  Episode 2: B, east, C, -1 | C, east, D, -1 | D, exit, x, +10
  Episode 3: E, north, C, -1 | C, east, D, -1 | D, exit, x, +10
  Episode 4: E, north, C, -1 | C, east, A, -1 | A, exit, x, -10

Output Values:
  V(A) = -10,  V(B) = +8,  V(C) = +4,  V(D) = +10,  V(E) = -2

If B and E both go to C under this policy, how can their values be different?
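Direct evaluation is easy to implement; here is a minimal Python sketch (not from the slides) that reproduces the output values above from the four training episodes:

from collections import defaultdict

gamma = 1.0
episodes = [
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", +10)],
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", +10)],
    [("E", "north", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", +10)],
    [("E", "north", "C", -1), ("C", "east", "A", -1), ("A", "exit", "x", -10)],
]

returns = defaultdict(list)  # returns[s] = sampled discounted returns observed from s

for episode in episodes:
    # For each visit, sum the (discounted) rewards from that point to the end.
    for i, (s, _, _, _) in enumerate(episode):
        g = sum((gamma ** k) * rk for k, (_, _, _, rk) in enumerate(episode[i:]))
        returns[s].append(g)

V = {s: sum(gs) / len(gs) for s, gs in returns.items()}
print(V)  # {'B': 8.0, 'C': 4.0, 'D': 10.0, 'E': -2.0, 'A': -10.0}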
Problems with Direct Evaluation
o What’s good about direct evaluation?
  o It’s easy to understand
  o It doesn’t require any knowledge of T, R
  o It eventually computes the correct average values, using just sample transitions

o What’s bad about it?
  o It wastes information about state connections
  o Each state must be learned separately
  o So, it takes a long time to learn

Output Values (from the example above): V(A) = -10, V(B) = +8, V(C) = +4, V(D) = +10, V(E) = -2
If B and E both go to C under this policy, how can their values be different?
Reinforcement Learning -- Overview
o Passive Reinforcement Learning (= how to learn from experiences)
  o Model-based Passive RL
    o Learn the MDP model from experiences, then solve the MDP
  o Model-free Passive RL
    o Forgo learning the MDP model; directly learn V or Q:
      o Value learning – learns the value of a fixed policy; 2 approaches: Direct Evaluation & TD Learning
      o Q-learning – learns Q-values of the optimal policy (uses a Q version of TD Learning)

o Active Reinforcement Learning (= the agent also needs to decide how to collect experiences)
  o Key challenges:
    o How to explore efficiently?
    o How to trade off exploration vs. exploitation
  o Applies to both model-based and model-free. In CS188 we cover it only in the context of Q-learning.
Temporal Difference Value Learning
Why Not Use Policy Evaluation?
o Simplified Bellman updates calculate V for a fixed policy:
    V_0^π(s) = 0
    V_{k+1}^π(s) ← Σ_{s’} T(s, π(s), s’) [ R(s, π(s), s’) + γ V_k^π(s’) ]
o Each round, replace V with a one-step-look-ahead layer over V

[Diagram: one-step look-ahead tree from s through (s, π(s)) to successors s’]

o This approach fully exploited the connections between the states
o Unfortunately, we need T and R to do it!

o Key question: how can we do this update to V without knowing T and R?

o In other words, how do we take a weighted average without knowing the weights?
Sample-Based Policy Evaluation?
o We want to improve our estimate of V by computing these averages:
    V_{k+1}^π(s) ← Σ_{s’} T(s, π(s), s’) [ R(s, π(s), s’) + γ V_k^π(s’) ]

o Idea: Take samples of outcomes s’ (by doing the action!) and average:
    sample_1 = R(s, π(s), s_1’) + γ V_k^π(s_1’)
    sample_2 = R(s, π(s), s_2’) + γ V_k^π(s_2’)
    …
    sample_n = R(s, π(s), s_n’) + γ V_k^π(s_n’)
    V_{k+1}^π(s) ← (1/n) Σ_i sample_i

Almost! But we can’t rewind time to get sample after sample from state s.
Temporal Difference Value Learning
o Big idea: learn from every experience!
  o Update V(s) each time we experience a transition (s, a, s’, r)
  o Likely outcomes s’ will contribute updates more often

o Temporal difference learning of values
  o Policy still fixed, still doing evaluation!
  o Move values toward value of whatever successor occurs: running average

Sample of V(s):   sample = R(s, π(s), s’) + γ V^π(s’)
Update to V(s):   V^π(s) ← (1 - α) V^π(s) + α · sample
Same update:      V^π(s) ← V^π(s) + α · (sample - V^π(s))
Exponential Moving Average
o Exponential moving average
  o The running interpolation update:  x̄_n = (1 - α) · x̄_{n-1} + α · x_n
  o Makes recent samples more important:
      x̄_n = [ x_n + (1 - α) x_{n-1} + (1 - α)² x_{n-2} + … ] / [ 1 + (1 - α) + (1 - α)² + … ]
  o Forgets about the past (distant past values were wrong anyway)

o Decreasing learning rate (alpha) can give converging averages
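A quick sketch of the running interpolation update (the sample values are arbitrary, chosen only to show that recent samples dominate):

def ema(samples, alpha=0.5):
    """Exponential moving average: x_bar <- (1 - alpha) * x_bar + alpha * x."""
    x_bar = samples[0]
    for x in samples[1:]:
        x_bar = (1 - alpha) * x_bar + alpha * x
    return x_bar

print(ema([0.0, 0.0, 10.0, 10.0]))  # 7.5: recent samples count more than old ones (plain average: 5.0)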


Example: Temporal Difference Value Learning
Assume: γ = 1, α = 1/2
States (gridworld): A; B C D; E        Initial values: V(A) = 0, V(B) = 0, V(C) = 0, V(D) = 8, V(E) = 0

Observed transition B, east, C, -2:
  sample = -2 + γ V(C) = -2,   so   V(B) ← (1 - α) · 0 + α · (-2) = -1

Observed transition C, east, D, -2:
  sample = -2 + γ V(D) = 6,    so   V(C) ← (1 - α) · 0 + α · 6 = 3

Resulting values: V(B) = -1, V(C) = 3, V(D) = 8 (A and E unchanged at 0).
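Here is a minimal Python sketch (not from the slides) of the TD value update, applied to the two transitions above:

gamma, alpha = 1.0, 0.5
V = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 8.0, "E": 0.0}  # initial values from the example

def td_update(V, s, r, s2):
    """Move V(s) toward the sampled value of the observed successor s2."""
    sample = r + gamma * V[s2]
    V[s] = (1 - alpha) * V[s] + alpha * sample

td_update(V, "B", -2, "C")   # V(B): 0 -> -1
td_update(V, "C", -2, "D")   # V(C): 0 -> 3
print(V["B"], V["C"])        # -1.0 3.0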
Problems with TD Value Learning

o TD value learning is a model-free way to do policy evaluation, mimicking Bellman updates with running sample averages
o However, if we want to turn values into a (new) policy, we’re sunk:
    π(s) = argmax_a Q(s, a)
    Q(s, a) = Σ_{s’} T(s, a, s’) [ R(s, a, s’) + γ V(s’) ]
  (extracting a policy from V still requires T and R)

o Idea: learn Q-values, not values
o Makes action selection model-free too!
Reinforcement Learning -- Overview
o Passive Reinforcement Learning (= how to learn from experiences)
  o Model-based Passive RL
    o Learn the MDP model from experiences, then solve the MDP
  o Model-free Passive RL
    o Forgo learning the MDP model; directly learn V or Q:
      o Value learning – learns the value of a fixed policy; 2 approaches: Direct Evaluation & TD Learning
      o Q-learning – learns Q-values of the optimal policy (uses a Q version of TD Learning)

o Active Reinforcement Learning (= the agent also needs to decide how to collect experiences)
  o Key challenges:
    o How to explore efficiently?
    o How to trade off exploration vs. exploitation
  o Applies to both model-based and model-free. In CS188 we cover it only in the context of Q-learning.
Q-Value Iteration
o Value iteration: find successive (depth-limited) values
  o Start with V_0(s) = 0, which we know is right
  o Given V_k, calculate the depth k+1 values for all states:
      V_{k+1}(s) ← max_a Σ_{s’} T(s, a, s’) [ R(s, a, s’) + γ V_k(s’) ]

o But Q-values are more useful, so compute them instead
  o Start with Q_0(s, a) = 0, which we know is right
  o Given Q_k, calculate the depth k+1 q-values for all q-states:
      Q_{k+1}(s, a) ← Σ_{s’} T(s, a, s’) [ R(s, a, s’) + γ max_{a’} Q_k(s’, a’) ]
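As a point of comparison with Q-learning on the next slide, here is a sketch of Q-value iteration when T and R are known. The `mdp` object, with `states()`, `actions(s)`, and `transitions(s, a)` returning (s2, prob, reward) triples, is a hypothetical interface for illustration, not the course code:

def q_value_iteration(mdp, gamma=0.9, iterations=100):
    # Q_0(s, a) = 0 for all q-states.
    Q = {(s, a): 0.0 for s in mdp.states() for a in mdp.actions(s)}
    for _ in range(iterations):
        newQ = {}
        for s in mdp.states():
            for a in mdp.actions(s):
                # One ply of expectimax backup using the known model T and R.
                newQ[(s, a)] = sum(
                    prob * (reward + gamma * max(
                        (Q[(s2, a2)] for a2 in mdp.actions(s2)), default=0.0))
                    for s2, prob, reward in mdp.transitions(s, a))
        Q = newQ
    return Q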
Q-Learning
o Q-Learning: sample-based Q-value iteration
    Q_{k+1}(s, a) ← Σ_{s’} T(s, a, s’) [ R(s, a, s’) + γ max_{a’} Q_k(s’, a’) ]

o Learn Q(s,a) values as you go
  o Receive a sample (s, a, s’, r)
  o Consider your old estimate: Q(s, a)
  o Consider your new sample estimate (no longer policy evaluation!):
      sample = R(s, a, s’) + γ max_{a’} Q(s’, a’)
  o Incorporate the new estimate into a running average (see the code sketch below):
      Q(s, a) ← (1 - α) Q(s, a) + α · sample

[Demo: Q-learning – gridworld (L10D2)]
[Demo: Q-learning – crawler (L10D3)]
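A minimal Python sketch of the tabular Q-learning update (not the Project 3 code; `legal_actions` is a hypothetical helper returning the actions available in a state):

from collections import defaultdict

gamma, alpha = 0.9, 0.5
Q = defaultdict(float)  # Q[(s, a)], defaults to 0.0 as in Q_0

def q_learning_update(s, a, s2, r, legal_actions):
    """Blend the old estimate with the sampled one-step look-ahead value."""
    sample = r + gamma * max((Q[(s2, a2)] for a2 in legal_actions(s2)), default=0.0)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample

The max over Q(s’, ·) is what makes this learn the optimal Q-values regardless of how the acting policy chooses actions, which is the off-policy property discussed on the next slide.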
Q-Learning Properties
o Amazing result: Q-learning converges to optimal policy --
even if you’re acting suboptimally!

o This is called off-policy learning

o Caveats:
o You have to explore enough
o You have to eventually make the learning rate
small enough
o … but not decrease it too quickly
o Basically, in the limit, it doesn’t matter how you select actions (!)
Video of Demo Q-Learning -- Gridworld
Video of Demo Q-Learning -- Crawler
Reinforcement Learning -- Overview
o Passive Reinforcement Learning (= how to learn from experiences)
  o Model-based Passive RL
    o Learn the MDP model from experiences, then solve the MDP
  o Model-free Passive RL
    o Forgo learning the MDP model; directly learn V or Q:
      o Value learning – learns the value of a fixed policy; 2 approaches: Direct Evaluation & TD Learning
      o Q-learning – learns Q-values of the optimal policy (uses a Q version of TD Learning)

o Active Reinforcement Learning (= the agent also needs to decide how to collect experiences)
  o Key challenges:
    o How to explore efficiently?
    o How to trade off exploration vs. exploitation
  o Applies to both model-based and model-free. In CS188 we cover it only in the context of Q-learning.
Active Reinforcement Learning
Active Reinforcement Learning
o Full reinforcement learning: optimal policies (like value
iteration)
o You don’t know the transitions T(s,a,s’)
o You don’t know the rewards R(s,a,s’)
o You choose the actions now
o Goal: learn the optimal policy / values

o In this case:
o Learner makes choices!
o Fundamental tradeoff: exploration vs. exploitation
o This is NOT offline planning! You actually take actions in the world
and find out what happens…
Exploration vs. Exploitation
Video of Demo Q-learning – Manual Exploration – Bridge Grid
How to Explore?
o Several schemes for forcing exploration
  o Simplest: random actions (ε-greedy; a code sketch follows below)
    o Every time step, flip a coin
    o With (small) probability ε, act randomly
    o With (large) probability 1-ε, act on current policy

o Problems with random actions?
  o You do eventually explore the space, but keep thrashing around once learning is done
  o One solution: lower ε over time
  o Another solution: exploration functions

[Demo: Q-learning – manual exploration – bridge grid (L10D5)]
[Demo: Q-learning – epsilon-greedy -- crawler (L10D3)]
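A minimal sketch (not the project code) of ε-greedy action selection over a Q-table like the one in the Q-learning sketch above:

import random

def epsilon_greedy(Q, s, legal_actions, epsilon=0.1):
    """With probability epsilon act randomly, otherwise act on the current Q-values."""
    if random.random() < epsilon:
        return random.choice(legal_actions)              # explore
    return max(legal_actions, key=lambda a: Q[(s, a)])   # exploit the current policy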
Video of Demo Q-learning – Epsilon-Greedy – Crawler
Exploration Functions
o When to explore?
  o Random actions: explore a fixed amount
  o Better idea: explore areas whose badness is not (yet) established, eventually stop exploring

o Exploration function
  o Takes a value estimate u and a visit count n, and returns an optimistic utility, e.g.  f(u, n) = u + k/n

  Regular Q-Update:   Q(s, a) ← (1 - α) Q(s, a) + α [ R(s, a, s’) + γ max_{a’} Q(s’, a’) ]
  Modified Q-Update:  Q(s, a) ← (1 - α) Q(s, a) + α [ R(s, a, s’) + γ max_{a’} f(Q(s’, a’), N(s’, a’)) ]

o Note: this propagates the “bonus” back to states that lead to unknown states as well! (A sketch of the modified update follows below.)

[Demo: exploration – Q-learning – crawler – exploration function (L10D4)]
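A sketch of the modified update, assuming a visit-count table N kept alongside Q. The constant k and the n + 1 in the denominator (to avoid dividing by zero for unvisited q-states) are illustrative choices, not from the course code:

from collections import defaultdict

gamma, alpha, k = 0.9, 0.5, 2.0
Q = defaultdict(float)  # Q[(s, a)]
N = defaultdict(int)    # N[(s, a)] = visit count

def f(u, n):
    """Optimistic utility: rarely visited q-states get an exploration bonus."""
    return u + k / (n + 1)

def explored_q_update(s, a, s2, r, legal_actions):
    N[(s, a)] += 1
    sample = r + gamma * max(
        (f(Q[(s2, a2)], N[(s2, a2)]) for a2 in legal_actions(s2)), default=0.0)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample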
Video of Demo Q-learning – Exploration Function – Crawler
Regret
o Even if you learn the optimal policy, you still make mistakes along the way!
o Regret is a measure of your total mistake cost: the difference between your (expected) rewards, including youthful suboptimality, and optimal (expected) rewards
o Minimizing regret goes beyond learning to be optimal – it requires optimally learning to be optimal
o Example: random exploration and exploration functions both end up optimal, but random exploration has higher regret
Reinforcement Learning -- Overview
o Passive Reinforcement Learning (= how to learn from experiences)
  o Model-based Passive RL
    o Learn the MDP model from experiences, then solve the MDP
  o Model-free Passive RL
    o Forgo learning the MDP model; directly learn V or Q:
      o Value learning – learns the value of a fixed policy; 2 approaches: Direct Evaluation & TD Learning
      o Q-learning – learns Q-values of the optimal policy (uses a Q version of TD Learning)

o Active Reinforcement Learning (= the agent also needs to decide how to collect experiences)
  o Key challenges:
    o How to explore efficiently?
    o How to trade off exploration vs. exploitation
  o Applies to both model-based and model-free. In CS188 we cover it only in the context of Q-learning.
Discussion: Model-Based vs Model-Free RL

