Learning Intro Notes
S Luz
[email protected]
October 22, 2013
[Figure: architecture of a learning agent. Perception feeds an internal representation; a critic uses rewards/instruction to assess performance and inform the learner; the learner makes changes to the planner's action policy in line with the agent's goals; the planner selects actions, which the actuators carry out in interaction with the environment.]
Some questions
Could you give us some examples?
YES:
Draughts (checkers)
Noughts & crosses (tic-tac-toe)
Defining learning
ML has been studied from various perspectives (AI, control theory, statistics, information theory, ...) (Mitchell, 1997, p. 2)
Statistics, model-fitting, ...
Some examples
Data Mining
if Name = Corners & Energy < 25
then
    turn(91 - (Bearing - const))
    fire(3)
Recommendation services,
Bayes spam filtering
JIT information retrieval
Training experience: How will the system access and use data?
Target function: What exactly should be learned?
Hypothesis representation: How will we represent the concepts to be learnt?
Inductive inference: What specific algorithm should be used to learn the target concepts?
Determining the target function
Given a function $f : X \rightarrow C$ trained on a set of instances $D_c$ describing a concept $c$, we say that the inductive bias of $f$ is a minimal set of assertions $B$ such that, for every instance $x$ in $X$:
$$\forall x \in X \; (B \wedge D_c \wedge x \;\vdash\; f(x))$$
This should not be confused with estimation bias, which is a quantity (rather than a set of assumptions) measuring the average loss due to misclassification. Together with variance, this quantity determines the level of generalisation error of a learner (Domingos, 2012).
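As a point of reference (the standard result for squared loss, stated here rather than taken from the cited sources), expected generalisation error decomposes into these two quantities plus an irreducible noise term:
$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] \;=\; \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} \;+\; \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}} \;+\; \underbrace{\sigma^2}_{\text{noise}}$$
where $y = f(x) + \epsilon$ with $\mathrm{Var}(\epsilon) = \sigma^2$, and the expectations are taken over training sets.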
Choosing an algorithm
Induction task as search, in a large space of hypotheses, for a hypothesis (or model) that fits the data (the sample of the target function available to the learner).
The choice of learning algorithm is conditioned on the choice of representation.
Since the target function is not completely accessible to the learner, the algorithm needs to operate under the inductive learning assumption that:
an approximation that performs well over a sufficiently large set of instances will perform well on unseen data.
Reinforcement learning: noughts and crosses (Sutton and Barto, 1998)
[Figure: an example noughts-and-crosses board position.]
Task? (target function, data representation) Training experience? Performance measure?
A target for a draughts learner
$$\hat{f}(b) = w_0 + w_1\,bp(b) + w_2\,rp(b) + w_3\,bk(b) + w_4\,rk(b) + w_5\,bt(b) + w_6\,rt(b)$$
where:
bp(b): number of black pieces on board b
rp(b): number of red pieces on b
bk(b): number of black kings on b
rk(b): number of red kings on b
bt(b): number of red pieces threatened by black
rt(b): number of black pieces threatened by red
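A minimal sketch of this representation in Python (the Board container and its feature counts are assumptions made here purely for illustration, not part of the notes):

    # Linear evaluation function for draughts: a weighted sum of the six
    # board features listed above, plus a constant term w_0.
    from dataclasses import dataclass

    @dataclass
    class Board:
        bp: int  # black pieces on the board
        rp: int  # red pieces
        bk: int  # black kings
        rk: int  # red kings
        bt: int  # red pieces threatened by black
        rt: int  # black pieces threatened by red

    def features(b: Board) -> list[float]:
        # t_0 = 1 accounts for the constant term w_0
        return [1.0, b.bp, b.rp, b.bk, b.rk, b.bt, b.rt]

    def f_hat(weights: list[float], b: Board) -> float:
        # f_hat(b) = w_0 + w_1*bp(b) + ... + w_6*rt(b)
        return sum(w * t for w, t in zip(weights, features(b)))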
Training Experience
A simple rule for obtaining (estimating) training values is to use the current approximation at the successor position:
$$f_{train}(b) \leftarrow \hat{f}(Successor(b))$$
The LMS rule then updates each weight in proportion to the error and the corresponding feature value,
$$w_i \leftarrow w_i + \eta \; error(b) \; t_i, \qquad error(b) = f_{train}(b) - \hat{f}(b),$$
where $\eta$ is a small positive constant (the learning rate). LMS minimises the squared error between the training data and the current approximation:
$$E = \sum_{\langle b, f_{train}(b) \rangle \in D} (f_{train}(b) - \hat{f}(b))^2$$
Notice that if $error(b) = 0$ (i.e. target and approximation match) no weights change. Similarly, if $t_i = 0$ (i.e. feature $t_i$ does not occur) the corresponding weight does not get updated. This weight update rule can be shown to perform a gradient descent search for the minimal squared error (i.e. weight updates are proportional to $-\nabla E$, where $\nabla E = [\frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots]$).
That the LMS weight update rule implements gradient descent can be seen by differentiating $E$:
$$\begin{aligned}
\frac{\partial E}{\partial w_i} &= \frac{\partial}{\partial w_i} \sum_{\langle b, f_{train}(b) \rangle \in D} [f_{train}(b) - \hat{f}(b)]^2 \\
&= \sum_{b \in D} \frac{\partial}{\partial w_i} [f_{train}(b) - \hat{f}(b)]^2 \\
&= \sum_{b \in D} 2\,[f_{train}(b) - \hat{f}(b)]\; \frac{\partial}{\partial w_i} [f_{train}(b) - \hat{f}(b)] \\
&= \sum_{b \in D} 2\,[f_{train}(b) - \hat{f}(b)]\; \frac{\partial}{\partial w_i} \Big[ f_{train}(b) - \sum_j w_j t_j \Big] \\
&= -\sum_{b \in D} 2\; error(b)\; t_i
\end{aligned}$$
so updating $w_i$ by $\eta\, error(b)\, t_i$ moves the weights in the direction of $-\frac{\partial E}{\partial w_i}$, as required.
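Continuing the illustrative Python sketch above, one LMS training step might look as follows (the learning rate value and the successor helper are assumptions, not part of the notes):

    # One LMS update: move each weight in proportion to the prediction error
    # and the corresponding feature value (a gradient descent step).
    def lms_update(weights: list[float], b: Board, f_train: float,
                   eta: float = 0.01) -> list[float]:
        error = f_train - f_hat(weights, b)            # error(b)
        return [w + eta * error * t                    # w_i <- w_i + eta * error(b) * t_i
                for w, t in zip(weights, features(b))]

    # Estimating the training value from a successor position and updating:
    # (successor(b) is a hypothetical function returning the next board from
    # which the learner moves again)
    # f_train_b = f_hat(weights, successor(b))
    # weights = lms_update(weights, b, f_train_b)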
[Figure (from Mitchell, 1997): the sequence of design decisions for a learning system. Determine the type of training experience; determine the target function; determine the representation of the learned function (e.g. a linear function of six features, an artificial neural network, a polynomial, ...); determine the learning algorithm (e.g. gradient descent, linear programming, ...); completed design.]
These are some of the decisions involved in ML design. A number of other practical factors, such as evaluation, avoidance of overfitting and feature engineering, also come into play. See (Domingos, 2012) for a useful introduction and some machine learning folk wisdom.
[Figure: the learning agent architecture instantiated for the draughts learner. The critic turns rewards/instruction into training examples $(b, f_{train}(b)), \ldots$; the learner makes changes to $\hat{f}$ in line with the goals; the planner's action policy selects the move leading to the board $s$ that maximises $\hat{f}(s)$ ($\arg\max_s \hat{f}(s)$), starting from the initial board, and the actuators carry out the action.]
Reinforcement Learning
Basic concepts of Reinforcement Learning
The policy: defines the learning agent's way of behaving at a given time:
$$\pi : S \rightarrow A$$
The (long term) value function: the total amount of reward an agent can expect to accumulate over the future:
$$V : S \rightarrow \mathbb{R}$$
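A small tabular illustration in Python, with states and actions represented as strings and the policy and value function as dictionaries (all names and values here are hypothetical):

    # Tabular policy (pi : S -> A) and value function (V : S -> R).
    from typing import Dict

    State = str
    Action = str

    policy: Dict[State, Action] = {"s0": "a1", "s1": "a0"}
    value: Dict[State, float] = {"s0": 0.5, "s1": 0.9}

    def act(s: State) -> Action:
        # Behave according to the current policy.
        return policy[s]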
Theoretical background
Engineering: optimal control (dating back to the 50s)
Markov Decision Processes (MDPs)
Dynamic programming
Law of effect: "Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond." (Thorndike, 1911, p. 244, quoted by Sutton and Barto, 1998)
The selectional aspect means that the learner selects from a large pool of complete policies, overlooking individual states (e.g. the contribution of a particular move to winning the game).
The associative aspect means that the learner associates each action (move) with a value, but does not select (or compare) policies as a whole.
Example: Noughts and crosses
Assign values to each possible game state (e.g. the probability of winning
from that state):
[Figure: a sequence of noughts-and-crosses states $s_0, s_1, \ldots, s_k$, alternating the opponent's moves with our (greedy) moves and the occasional exploratory move. For greedy moves, the value of the earlier state is backed up towards the value of the later state using the TD learning update rule $V(s) \leftarrow V(s) + \alpha\,[V(s') - V(s)]$, where $\alpha$ is the step-size parameter (learning rate).]
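A minimal sketch of this update in Python; the string encoding of board states, the initial value of 0.5 and the step size are illustrative assumptions:

    # TD(0)-style value update for noughts and crosses:
    #   V(s) <- V(s) + alpha * (V(s_next) - V(s))
    from collections import defaultdict

    ALPHA = 0.1                                  # step-size parameter (learning rate)
    V = defaultdict(lambda: 0.5)                 # state -> estimated probability of winning

    def td_backup(s: str, s_next: str) -> None:
        # Back up the earlier state's value towards the later state's value
        # (applied after greedy moves).
        V[s] += ALPHA * (V[s_next] - V[s])

    # Example with hypothetical 9-character board encodings:
    td_backup("X..OX....", "X..OXO..X")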
References
Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87.
Mitchell, T. M. (1997). Machine Learning. McGraw-Hill.
Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.
Thorndike, E. L. (1911). Animal Intelligence. Macmillan.