Lec 11
Reinforcement Learning II
o Caveats:
o You have to explore enough
o You have to eventually make the learning rate
small enough
o … but not decrease it too quickly
o Basically, in the limit, it doesn’t matter how you select actions (!)
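One common way to make the learning-rate caveats precise (an illustrative note, not from the slide): the sequence of learning rates $\alpha_t$ should shrink, but not too quickly, e.g. $\alpha_t = 1/t$:

    % Robbins-Monro-style step-size conditions (illustrative, not from the slide)
    \sum_{t=1}^{\infty} \alpha_t = \infty
    \qquad \text{and} \qquad
    \sum_{t=1}^{\infty} \alpha_t^2 < \infty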
o In this case:
o Learner makes choices!
o Fundamental tradeoff: exploration vs. exploitation
o This is NOT offline planning! You actually take actions in the world
and find out what happens…
Exploration vs. Exploitation
How to Explore?
o Several schemes for forcing exploration
o Simplest: random actions (ε-greedy)
o Every time step, flip a coin
o With (small) probability ε, act randomly
o With (large) probability 1−ε, act on current policy
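A minimal sketch of ε-greedy action selection (the q_values dictionary keyed by (state, action) and the argument names are illustrative assumptions, not the course code):

    import random

    def epsilon_greedy_action(state, legal_actions, q_values, epsilon=0.05):
        # With small probability epsilon, explore: pick a uniformly random action.
        if random.random() < epsilon:
            return random.choice(legal_actions)
        # Otherwise exploit: act greedily with respect to the current Q-values.
        return max(legal_actions, key=lambda a: q_values.get((state, a), 0.0))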
o Exploration function
o Takes a value estimate u and a visit count n, and
returns an optimistic utility, e.g. $f(u, n) = u + k/n$
Regular Q-Update: $Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\,[R(s,a,s') + \gamma \max_{a'} Q(s',a')]$
Modified Q-Update: $Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\,[R(s,a,s') + \gamma \max_{a'} f(Q(s',a'), N(s',a'))]$
o Note: this propagates the “bonus” back to states that lead to unknown states
as well! [Demo: exploration – Q-learning – crawler – exploration function (L10D4)]
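A hedged sketch of the modified update with a count-based bonus (the dict-based Q-table, the constant k, and the variant of f that avoids division by zero are illustrative assumptions, not the course code):

    def exploration_q_update(q, n_counts, s, a, r, s_prime, next_actions,
                             alpha=0.5, gamma=0.9, k=2.0):
        # Count this visit to (s, a).
        n_counts[(s, a)] = n_counts.get((s, a), 0) + 1

        # Exploration function: a variant of f(u, n) = u + k/n that stays
        # finite for never-visited pairs (n = 0).
        def f(u, n):
            return u + k / (n + 1)

        # Modified Q-update: back up optimistic utilities of the successor state.
        best_next = max((f(q.get((s_prime, a2), 0.0), n_counts.get((s_prime, a2), 0))
                         for a2 in next_actions), default=0.0)
        sample = r + gamma * best_next
        q[(s, a)] = (1 - alpha) * q.get((s, a), 0.0) + alpha * sample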
Video of Demo Q-learning – Exploration Function –
Crawler
Regret
o Even if you learn the optimal
policy, you still make mistakes
along the way!
o Regret is a measure of your total
mistake cost: the difference
between your (expected) rewards,
including youthful suboptimality,
and optimal (expected) rewards
o Minimizing regret goes beyond
learning to be optimal – it requires
optimally learning to be optimal
o Example: random exploration and
exploration functions both end up
optimal, but random exploration
has higher regret
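One illustrative way to formalize this (the notation is an assumption, not from the slide): write $r_t^*$ for the reward an optimal agent would collect at step $t$ and $r_t$ for the reward the learner actually collects; then the total regret after $T$ steps is

    % Total regret: cumulative gap between optimal and actual expected rewards
    \mathrm{Regret}_T = \sum_{t=1}^{T} \big( \mathbb{E}[r_t^*] - \mathbb{E}[r_t] \big)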
Reinforcement Learning -- Overview
o Passive Reinforcement Learning (= how to learn from experiences)
o Model-based Passive RL
o Learn the MDP model from experiences, then solve the MDP
o Model-free Passive RL
o Forego learning the MDP model, directly learn V or Q:
o Value learning – learns value of a fixed policy; 2 approaches: Direct Evaluation & TD Learning
o Q learning – learns Q values of the optimal policy (uses a Q version of TD Learning)
Approximate Q-Learning
o Q-learning with linear Q-functions: $Q(s,a) = w_1 f_1(s,a) + w_2 f_2(s,a) + \dots + w_n f_n(s,a)$
$\text{difference} = \big[ r + \gamma \max_{a'} Q(s',a') \big] - Q(s,a)$
Exact Q’s: $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\text{difference}]$
Approximate Q’s: $w_i \leftarrow w_i + \alpha\,[\text{difference}]\,f_i(s,a)$
o Intuitive interpretation:
o Adjust weights of active features
o E.g., if something unexpectedly bad happens, blame the features that were
on: disprefer all states with that state’s features
[Demo: approximate Q-learning pacman (L11D8)]
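A compact sketch of the approximate Q-update above (the dict of weights and a feature_fn returning a feature-name → value dict are illustrative assumptions, not the course code):

    def approx_q_update(weights, feature_fn, s, a, r, s_prime, next_actions,
                        alpha=0.05, gamma=0.9):
        # Linear Q-function: Q(s,a) = sum_i w_i * f_i(s,a)
        def q(state, action):
            feats = feature_fn(state, action)
            return sum(weights.get(name, 0.0) * val for name, val in feats.items())

        # difference = [r + gamma * max_a' Q(s',a')] - Q(s,a)
        best_next = max((q(s_prime, a2) for a2 in next_actions), default=0.0)
        difference = (r + gamma * best_next) - q(s, a)

        # Approximate Q's: nudge the weight of every active feature by the difference.
        for name, val in feature_fn(s, a).items():
            weights[name] = weights.get(name, 0.0) + alpha * difference * val
        return weights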
Video of Demo Approximate Q-Learning --
Pacman
DeepMind Atari (©Two Minute Papers)
approximate Q-learning with neural nets
Q-Learning and Least Squares
Linear Approximation: Regression
[Figure: linear regression with one feature (left) and two features (right)]
Prediction: $\hat{y} = w_0 + w_1 f_1(x)$
Prediction: $\hat{y}_i = w_0 + w_1 f_1(x_i) + w_2 f_2(x_i)$
Optimization: Least Squares
$\text{total error} = \sum_i \big( y_i - \hat{y}_i \big)^2 = \sum_i \Big( y_i - \sum_k w_k f_k(x_i) \Big)^2$
[Figure: a fitted line, with the vertical gap between an observation $y_i$ and the prediction $\hat{y}_i$ labeled “error” or “residual”]
Minimizing Error
Imagine we had only one point x, with features f(x), target value y, and weights w:
$\text{error}(w) = \tfrac{1}{2} \Big( y - \sum_k w_k f_k(x) \Big)^2$
$\frac{\partial\, \text{error}(w)}{\partial w_m} = - \Big( y - \sum_k w_k f_k(x) \Big) f_m(x)$
$w_m \leftarrow w_m + \alpha \Big( y - \sum_k w_k f_k(x) \Big) f_m(x)$
Approximate Q-update explained:
$w_m \leftarrow w_m + \alpha \Big[ \underbrace{r + \gamma \max_{a} Q(s',a)}_{\text{“target”}} - \underbrace{Q(s,a)}_{\text{“prediction”}} \Big] f_m(s,a)$
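A tiny sketch of that one-point gradient step on a linear model (the dict layouts and names are illustrative assumptions):

    def one_point_update(weights, features, y, alpha=0.1):
        # weights and features are dicts keyed by feature name.
        y_hat = sum(weights.get(k, 0.0) * v for k, v in features.items())  # "prediction"
        residual = y - y_hat                                               # "target" - "prediction"
        # Gradient step on error(w) = 0.5 * (y - y_hat)^2:
        #   w_m <- w_m + alpha * (y - y_hat) * f_m(x)
        for k, v in features.items():
            weights[k] = weights.get(k, 0.0) + alpha * residual * v
        return weights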
Overfitting: Why Limiting Capacity Can Help
[Figure: data points fit with a degree 15 polynomial, illustrating overfitting]
Reinforcement Learning -- Overview
o Passive Reinforcement Learning (= how to learn from experiences)
o Model-based Passive RL
o Learn the MDP model from experiences, then solve the MDP
o Model-free Passive RL
o Forego learning the MDP model, directly learn V or Q:
o Value learning – learns value of a fixed policy; 2 approaches: Direct Evaluation & TD Learning
o Q learning – learns Q values of the optimal policy (uses a Q version of TD Learning)
o Active Reinforcement Learning (= agent also needs to decide how to collect experiences)
o Key challenges:
o How to efficiently explore?
o How to trade off exploration vs. exploitation
o Applies to both model-based and model-free RL. In CS188 we’ll cover it only in the context of Q-learning
o Approximate Reinforcement Learning (= to handle large state spaces)
o Approximate Q-Learning
o Policy Search
Policy Search
o Problem: often the feature-based policies that work well (win games, maximize
utilities) aren’t the ones that approximate V / Q best
o E.g. your value functions from project 2 were probably horrible estimates of future rewards,
but they still produced good decisions
o Q-learning’s priority: get Q-values close (modeling)
o Action selection priority: get ordering of Q-values right (prediction)
o We’ll see this distinction between modeling and prediction again later in the course
o Solution: learn policies that maximize rewards, not the values that predict them
o Policy search: start with an ok solution (e.g. Q-learning) then fine-tune by hill
climbing on feature weights
Policy Search
o Simplest policy search:
o Start with an initial linear value function or Q-function
o Nudge each feature weight up and down and see if your policy is better than
before
o Problems:
o How do we tell the policy got better?
o Need to run many sample episodes!
o If there are a lot of features, this can be impractical
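A hedged sketch of this simplest hill-climbing loop (evaluate_policy, the step size, and the iteration budget are illustrative placeholders; evaluating a candidate means running many sample episodes, which is the expensive part noted above):

    import random

    def hill_climb_weights(weights, evaluate_policy, step=0.1, iterations=100):
        # evaluate_policy(weights) -> average return over many sample episodes.
        weights = list(weights)
        best_score = evaluate_policy(weights)
        for _ in range(iterations):
            i = random.randrange(len(weights))          # pick one feature weight
            for delta in (step, -step):                 # nudge it up, then down
                candidate = list(weights)
                candidate[i] += delta
                score = evaluate_policy(candidate)
                if score > best_score:                  # keep the nudge only if the
                    weights, best_score = candidate, score  # policy actually got better
                    break
        return weights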