Learning Classifier System
• Trial-and-error search
  – neither exploitation nor exploration can be pursued exclusively without failing at the task
• Life-long learning
  – on-going exploration
Reinforcement Learning
• Policy : S → A
• The agent repeatedly observes a state, takes an action and receives a reward:
  s0 –(a0, r0)→ s1 –(a1, r1)→ s2 –(a2, r2)→ ...
State value function, V

  State, s   V(s)
  s0         ...
  s1         10
  s2         15
  s3         6

• V(s) predicts the future total reward we can obtain by entering state s
• We can exploit V greedily, i.e. in s, choose the action a for which the following is largest:
  r(s, a) + Σ_{s' ∈ S} p(s, a, s') V(s')
• Example transitions from s0:
  r(s0, a1) = 2, p(s0, a1, s1) = 0.7, p(s0, a1, s2) = 0.3
  r(s0, a2) = 5, p(s0, a2, s2) = 0.5, p(s0, a2, s3) = 0.5
• Choosing a1: 2 + 0.7 × 10 + 0.3 × 15 = 13.5
• Choosing a2: 5 + 0.5 × 15 + 0.5 × 6 = 15.5
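A small sketch (variable names are mine, not from the slides) of this greedy exploitation of V, using the rewards and transition probabilities of the example above:

```python
# V(s) for the successor states in the example
V = {"s1": 10, "s2": 15, "s3": 6}

# p[(s, a)] maps successor states to probabilities; r[(s, a)] is the immediate reward
p = {
    ("s0", "a1"): {"s1": 0.7, "s2": 0.3},
    ("s0", "a2"): {"s2": 0.5, "s3": 0.5},
}
r = {("s0", "a1"): 2, ("s0", "a2"): 5}

def greedy_action(s, actions):
    """Choose the action maximising r(s,a) + sum over s' of p(s,a,s') * V(s')."""
    return max(actions,
               key=lambda a: r[(s, a)] + sum(prob * V[s2] for s2, prob in p[(s, a)].items()))

print(greedy_action("s0", ["a1", "a2"]))   # a2, since 15.5 > 13.5
```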
Action value function, Q

  State, s   Action, a   Q(s, a)
  s0         a1          13.5
  s0         a2          15.5
  s1         a1          ...
  s1         a2          ...

• Q(s, a) predicts the future total reward we can obtain by executing a in s
• We can exploit Q greedily, i.e. in s, choose the action a for which Q(s, a) is largest
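The Q values in the table come from the quantities on the previous slide; as a reminder of the relationship (assuming the same undiscounted, one-step lookahead used there):

```latex
Q(s,a) \;=\; r(s,a) + \sum_{s' \in S} p(s,a,s')\,V(s'),
\qquad \text{e.g. } Q(s_0,a_2) = 5 + 0.5 \times 15 + 0.5 \times 6 = 15.5
```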
Q Learning (Watkins 1989)
• For each (s, a), initialise Q(s, a) arbitrarily
• Observe current state, s
• Do until reach goal state
  – select an action a and execute it, receiving reward r and observing the new state s’
  – update Q(s, a)
  – s ← s’
• Exploration versus exploitation: the action-selection step must balance the two
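A minimal sketch of the loop above in code; the environment interface, learning rate alpha, discount gamma and ε-greedy exploration are assumptions, not specified on the slide:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                       # Q[(s, a)], initialised arbitrarily (here: 0)
    for _ in range(episodes):
        s = env.reset()                          # observe current state, s
        done = False
        while not done:                          # do until reach goal state
            # exploration versus exploitation: epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(env.actions(s))
            else:
                a = max(env.actions(s), key=lambda a_: Q[(s, a_)])
            s_next, r, done = env.step(a)        # execute a, receive r, observe s'
            best_next = 0.0 if done else max(Q[(s_next, a_)] for a_ in env.actions(s_next))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next                           # s <- s'
    return Q
```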
Backup Diagram for Q Learning
[Diagram: from state s, taking action a with value Q(s, a), the agent receives reward r and reaches s’; the backup then looks across the actions a’ available in s’ and their values Q(s’, a’)]
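Read as an update, the backup depicted is the standard Watkins rule (the learning rate α and discount γ are the usual parameters, not shown on the slide):

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \Bigl( r + \gamma \max_{a'} Q(s',a') - Q(s,a) \Bigr)
```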
Function Approximation
• Q can be represented by a table only if the number of states & actions is small
• Besides, this makes poor use of experience
• Hence, we use function approximation, e.g.
  – neural nets
  – weighted linear functions (sketched below)
  – case-based/instance-based/memory-based representations
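As an illustration of the weighted-linear-function option; the names, feature vector phi(s, a) and update scheme here are assumptions, not taken from the slides:

```python
import numpy as np

class LinearQ:
    """Q(s, a) approximated as a weighted linear function of features phi(s, a)."""
    def __init__(self, n_features, alpha=0.01):
        self.w = np.zeros(n_features)   # one weight per feature
        self.alpha = alpha

    def value(self, phi):
        """Q(s, a) = w . phi(s, a)"""
        return float(np.dot(self.w, phi))

    def update(self, phi, target):
        """Move the prediction towards a target such as r + gamma * max_a' Q(s', a')."""
        error = target - self.value(phi)
        self.w += self.alpha * error * phi
```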
Classifier Systems
• John Holland   • Stewart Wilson

Rules (condition : action)   P    E     F
  #011 : 01                  43   .01   99
  11## : 00                  32   .13    9
  #0## : 11                  14   .05   52
  001# : 01                  27   .24    3
  #0#1 : 11                  18   .02   92
  1#01 : 10                  24   .17   15

Match set (rules whose condition matches the current input):
  #011 : 01, #0## : 11, 001# : 01, #0#1 : 11

Prediction array (fitness-weighted prediction for each action):
  01 → 42.5    10 → –    11 → 16.5

Action set (match-set rules advocating the selected action, 01):
  #011 : 01, 001# : 01
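A small sketch (names are mine) of how the prediction array above is formed: each action's entry is the fitness-weighted average of the predictions P of the matching rules that advocate it.

```python
# Each rule: (condition, action, P, E, F), as in the match set above.
match_set = [
    ("#011", "01", 43, 0.01, 99),
    ("#0##", "11", 14, 0.05, 52),
    ("001#", "01", 27, 0.24, 3),
    ("#0#1", "11", 18, 0.02, 92),
]

def prediction_array(rules):
    """Fitness-weighted average prediction for each action advocated in the match set."""
    totals, weights = {}, {}
    for _, action, P, _, F in rules:
        totals[action] = totals.get(action, 0) + P * F
        weights[action] = weights.get(action, 0) + F
    return {a: totals[a] / weights[a] for a in totals}

print(prediction_array(match_set))   # {'01': 42.5..., '11': 16.5...}
```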
CBR/IBL/MBR for RL
[Diagram: CBR/IBL/MBR combined with Reinforcement Learning]
• Conventionally, the case has two parts
  – problem description, representing (s, a), i.e. state s & action a
  – solution, representing Q(s, a) (real-valued)
Case-Based XCS
[Diagram: Case-Based Reasoning combined with Reinforcement Learning; a query has problem = state s and an as-yet-unknown solution]
• Here the case gains a third part
  – outcome, representing Q(s, a)
• Given a new s, predict a, guided by case outcomes as well as similarities
Case Outcomes
• In CBR research, storing outcomes is not common, but neither is it new, e.g.
  – cases have three parts in [Kolodner 1993]
  – IB3’s classification records [Aha et al. 1991]
• They
  – influence retrieval and reuse
  – are updated in cases, based on performance
  – guide maintenance and discovery
Outcomes in Case-Based XCS
• Each case outcome is a record of
  – experience: how many times it has appeared in an action set
  – prediction of future reward, P: its estimate of Q(s, a)
  – prediction error, E: the average error in P
  – fitness, F: inversely related to E
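A minimal sketch of the case structure this implies (the class and field names are mine, not from the slides):

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    experience: int = 0        # times the case has appeared in an action set
    prediction: float = 0.0    # P, the case's estimate of Q(s, a)
    error: float = 0.0         # E, average error in P
    fitness: float = 0.0       # F, inversely related to E

@dataclass
class Case:
    problem: list              # the state, s (e.g. a real-valued feature vector)
    solution: str              # the action, a
    outcome: Outcome           # the record above, updated from experience
```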
Retrieval and Reuse
• The Match Set contains the k-nearest neighbours, but similarity is weighted by fitness
• Fitness, F, is accuracy (derived from the prediction error, E) relative to the total accuracies of the previous Action Set
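An illustrative sketch of fitness-weighted retrieval, reusing the Case class sketched earlier; the similarity measure and the way fitness weights the ranking are my assumptions:

```python
import math

def similarity(x, y):
    """Assumed similarity: inverse of Euclidean distance between state vectors."""
    return 1.0 / (1.0 + math.dist(x, y))

def match_set(cases, query_state, k=4):
    """Return the k nearest cases, ranking by similarity weighted by fitness."""
    scored = [(similarity(c.problem, query_state) * c.outcome.fitness, c) for c in cases]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]
```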
Deletion
• ‘Random’ deletion
– probability inversely related to fitness
[Plot: results over trials 1–496 for IB1 (498), IB2 (93), IB3 (82) and CBR-XCS (498); y-axis scale 0–60]
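A sketch of one way to realise ‘random’ deletion with probability inversely related to fitness; the slides do not give the exact scheme, so this uses roulette-wheel selection over inverse fitness:

```python
import random

def choose_case_to_delete(cases, eps=1e-6):
    """Roulette-wheel selection: lower fitness gives a higher probability of deletion."""
    weights = [1.0 / (c.outcome.fitness + eps) for c in cases]
    return random.choices(cases, weights=weights, k=1)[0]
```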
Recommender System Dialogs
• 1470 holidays; 8 descriptive attributes
• Leave-one-in experiments
  – each holiday in turn is the target holiday
  – questions are asked until the retrieval set contains 5 holidays or no questions remain
  – the simulated user answers a question with the value from the target holiday
  – 25-fold cross-validation (different orderings)
Users Who Always Answer
• Best policy is to choose the remaining question that has highest entropy
• State, s, records the entropies for each question
• k = 4; ε starts at 1 and, after ~150 steps, decays exponentially
• Delayed reward = −(numQuestionsAsked)³
• Multi-step backup
• No GA
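A sketch of the “By Entropy” idea referred to above: ask the remaining question whose attribute has the highest entropy over the current candidate holidays (the same entropies make up the state, s). The data layout (holidays as dicts of attribute values) is an assumption:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy of one attribute's value distribution over the candidate holidays."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def next_question(candidates, remaining_attributes):
    """'By Entropy' baseline: choose the remaining question with highest entropy."""
    return max(remaining_attributes,
               key=lambda attr: entropy([h[attr] for h in candidates]))
```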
Does the learned policy minimise dialog length?
[Plot: dialog length over trials 1–1441 for Random, By Entropy and CBR-XCS (126); y-axis scale 0–4]
Users Who Don’t Always Answer
• Schmitt 2002:
  – an entropy-like policy (simVar)
  – but also customer-adaptive (a Bayesian net predicts reaction to future questions based on reactions to previous ones)
• Suppose users feel there is a ‘natural’ question order
  – if the actual question order matches the natural order, users will always answer
  – if the actual question order doesn’t match the natural order, users may, with non-zero probability, not answer (sketched below)
• A trade-off
  – learning the natural order
    • to maximise the chance of getting an answer
  – learning to ask the highest-entropy questions
    • to maximise the chance of reducing the size of the retrieval set, if given an answer
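A sketch of the simulated user described above; the refusal probability and the notion of “matching the natural order so far” are assumptions of mine:

```python
import random

def user_answers(question, questions_asked_so_far, natural_order, p_refuse=0.5):
    """The user always answers if the questions so far have followed the natural order;
    otherwise they may decline to answer with some non-zero probability."""
    expected_prefix = natural_order[:len(questions_asked_so_far) + 1]
    follows_order = questions_asked_so_far + [question] == expected_prefix
    return True if follows_order else random.random() > p_refuse
```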
Does the learned policy find a good trade-off?
[Plot: dialog length over trials 1–1441 for Random, By Entropy, By the Ordering and CBR-XCS (104); y-axis scale 0–7]
Bridge’s 11 REs
[Diagram: a cycle of “RE” steps, including Retrieve, Reuse, Respond, Receive (sensory input, reward), Reap, Reinforce and Reflect]