
A CHECKERS LEARNING PROBLEM
“Machine Learning” by Tom Mitchell
PROBLEM
• Task T: playing checkers
• Performance measure P: percent of games won in the world
tournament
• Training experience E: games played against itself
APPROACH

1. The exact type of knowledge to be learned
2. A representation for this target knowledge
3. A learning mechanism

The type of training experience available can have a significant impact on the success or failure of the learner.
TARGET FUNCTION
• The goal is to reduce the problem of improving performance P at task T to the problem of learning some particular target function.
• Indirect training experience: sequences of legal moves and whether the game was won or lost.
• Direct training experience: for a given board, the moves required to win.

TARGET FUNCTION
• An evaluation function that assigns a numerical score to any given board state.
• V : B → ℝ denotes that V maps any legal board state from the set B to some real value (we use ℝ to denote the set of real numbers).

1. if b is a final board state that is won, then V(b) = 100
2. if b is a final board state that is lost, then V(b) = -100
3. if b is a final board state that is drawn, then V(b) = 0
4. if b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game (assuming the opponent plays optimally as well).
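
Case 4 makes the definition recursive but non-operational: evaluating it requires searching ahead to the end of the game. A minimal Python sketch of the recursion, assuming hypothetical helpers is_final, outcome, and successors standing in for a real checkers engine (none of these appear in the text):

def ideal_v(b, black_to_move=True):
    # Ideal target function V: searches to the end of the game, so it is
    # non-operational in practice. Scores are from black's perspective.
    if is_final(b):                       # hypothetical helper
        return outcome(b)                 # +100 won, -100 lost, 0 drawn
    values = [ideal_v(s, not black_to_move)
              for s in successors(b, black_to_move)]  # hypothetical helper
    # Both sides play optimally: black maximizes, red minimizes.
    return max(values) if black_to_move else min(values)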
TARGET FUNCTION
• An operational description of the ideal target function V is required.
• The learning algorithm is expected to acquire only some approximation to the target function; for this reason the process of learning the target function is often called function approximation.

On one hand, we wish to pick a very expressive representation to allow representing as close an approximation as possible to the ideal target function V. On the other hand, the more expressive the representation, the more training data the program will require in order to choose among the alternative hypotheses it can represent.
Problem Representation
• A simple representation: for any given board state, the function will be calculated as a linear combination of the following board features (a feature-extraction sketch follows the list).
• x1: the number of black pieces on the board
• x2: the number of red pieces on the board
• x3: the number of black kings on the board
• x4: the number of red kings on the board
• x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
• x6: the number of red pieces threatened by black
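
A minimal sketch of extracting these six features, assuming a hypothetical board encoding (a dict mapping squares to the labels "black", "red", "black_king", "red_king") and a hypothetical count_threatened helper; neither appears in the text:

def features(board):
    pieces = list(board.values())
    return [
        pieces.count("black"),               # x1: black pieces
        pieces.count("red"),                 # x2: red pieces
        pieces.count("black_king"),          # x3: black kings
        pieces.count("red_king"),            # x4: red kings
        count_threatened(board, by="red"),   # x5: black pieces threatened by red
        count_threatened(board, by="black"), # x6: red pieces threatened by black
    ]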
TARGET FUNCTION
• Thus, our learning program will represent V̂(b) as a linear function of the form

V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6

where w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm. Learned values for the weights w1 through w6 will determine the relative importance of the various board features in determining the value of the board, whereas the weight w0 will provide an additive constant to the board value.
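
A minimal sketch of evaluating this linear form; the weights below are illustrative values, not learned ones:

def v_hat(weights, x):
    # weights = [w0, w1, ..., w6]; x = [x1, ..., x6]
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))

# Opening position: 12 pieces per side, no kings, nothing threatened.
w = [0.0, 1.0, -1.0, 1.5, -1.5, -0.5, 0.5]   # illustrative weights
print(v_hat(w, [12, 12, 0, 0, 0, 0]))        # prints 0.0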
ESTIMATING TRAINING VALUES
• In order to learn the target function we require a set of training examples, each describing a specific board state b and the training value Vtrain(b) for b. In other words, each training example is an ordered pair of the form ⟨b, Vtrain(b)⟩.
• Rule for estimating training values:

Vtrain(b) ← V̂(Successor(b))
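
A hedged sketch of this rule, reusing the v_hat and features sketches above; successor(b) is a hypothetical helper returning the board after the program's move and the opponent's reply:

def estimate_training_value(weights, b):
    # Vtrain(b) <- V̂(Successor(b)): score b using the current
    # estimate of its successor state.
    return v_hat(weights, features(successor(b)))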
ADJUSTING THE WEIGHTS
• One common approach is to define the best hypothesis, or set of weights, as that which minimizes the squared error E between the training values and the values predicted by the hypothesis V̂:

E ≡ Σ⟨b, Vtrain(b)⟩ ∈ training examples (Vtrain(b) − V̂(b))²

Thus, we seek the weights, or equivalently the V̂, that minimize E for the observed training examples.
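
A minimal sketch of computing E, assuming each training example is a (feature-vector, training-value) pair:

def squared_error(weights, examples):
    # examples: list of (x, v_train) pairs
    return sum((v_train - v_hat(weights, x)) ** 2
               for x, v_train in examples)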
LMS Training
• The least mean squares, or LMS, training rule is one of several algorithms that incrementally refine the weights.

LMS weight update rule:
• For each training example ⟨b, Vtrain(b)⟩
• Use the current weights to calculate V̂(b)
• For each weight wi, update it as

wi ← wi + η (Vtrain(b) − V̂(b)) xi

where η is a small constant (e.g., 0.1) that moderates the size of the weight update.
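
A minimal sketch of one LMS step, reusing v_hat above; eta = 0.1 matches the small constant suggested in the book, and the bias weight w0 is updated with an implicit feature x0 = 1:

def lms_update(weights, x, v_train, eta=0.1):
    # Move each weight a small step in the direction that reduces
    # the error (Vtrain(b) - V̂(b)) on this example.
    error = v_train - v_hat(weights, x)
    weights[0] += eta * error                 # bias weight w0 (x0 = 1)
    for i, xi in enumerate(x, start=1):
        weights[i] += eta * error * xi
    return weights

Repeating this update over many training examples drives the weights toward values that minimize E.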
The Final Design
