ML Lec 03 Machine Learning Process
CS-13410
Introduction to Machine Learning
by
Mudasser Naseer
Machine Learning
Machine learning is a field of computer science that uses
statistical techniques to give computer systems the ability to
"learn" (e.g., progressively improve performance on a
specific task) with data, without being explicitly
programmed. (Wikipedia)
Previously we defined machine learning as “Optimize a
performance criterion using example data or past
experience”.
Now we will define the learning problem in terms of a
computer program.
2
Well-Posed Learning Problems
“A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P, if
its performance at tasks in T, as measured by P, improves with
experience E”.
To have a well-defined learning problem, we must identify
these three features: the class of tasks T, the measure of
performance to be improved P, and the source of experience E.
So we can define learning as
Learning = Improving with experience at some task
Improve over task T
With respect to performance measure P
Based on experience E
3
Learning Problems-Examples
For example, a computer program that learns to play
checkers might improve its performance as measured by
its ability to win at the class of tasks involving playing
checkers games, through experience obtained by playing
games against itself.
T : Play checkers
P : % of games won against opponents
E : playing practice games against itself
4
More Learning Problems
A handwriting recognition learning problem:
Task T:
recognizing and classifying handwritten letters/words
within images
Performance measure P:
percent of letters/words correctly classified
Training experience E:
a database of handwritten letters/words with given
classifications
5
A robot driving learning problem:
Task T:
driving on public four-lane highways using vision sensors
Performance measure P:
average distance traveled before an error (as judged by
human overseer)
Training experience E:
a sequence of images and steering commands recorded
while observing a human driver
6
Designing a Learning System
To illustrate some of the basic design issues and
approaches to machine learning, let us consider
designing a program to learn to play checkers, with the
goal of entering it in the world checkers tournament.
8
Type of Training Experience
Direct or indirect feedback regarding the choices made by the system
Direct: training examples consist of individual checkers board states and the
correct move for each.
Indirect: information consisting of the move sequences and final outcomes of
various games played. Here the learner faces an additional problem of credit
assignment (determining the degree to which each move in the sequence
deserves credit or blame for the final outcome).
Teacher or not
Teacher: the learner relies on the teacher to select informative board states and to
provide the correct move for each.
No teacher or assisted by a teacher: the learner might itself propose board states
that it finds confusing and ask the teacher for the correct move. Or the learner may
have complete control over both the board states and the (indirect) training
classifications, i.e. playing against itself with no teacher.
9
Type of Training Experience
A problem: is the training experience representative of the
performance goal?
In general, learning is most reliable when the training
examples follow a distribution similar to that of future test
examples.
Note: Most current theories of machine learning rest on
the crucial assumption that the distribution of training
examples is identical to the distribution of test examples.
However, this assumption is often violated in practice.
10
A checkers learning problem:
To proceed with our design, let us decide that our system
will train by playing games against itself.
Advantage: No external trainer needed and the system is
allowed to generate as much training data as time permits.
We now have a fully specified learning task.
Task T: playing checkers
Performance measure P: percent of games won in the world
tournament
Training experience E: games played against itself
11
A checkers learning problem:
In order to complete the design we must now choose
1. What exact type of knowledge should be learned?
2. How shall this knowledge be represented?
3. What specific algorithm should be used to learn it (i.e., the
learning mechanism)?
12
1. What exact type of knowledge should be learned?
(Choosing the Target Function)
Problem: what exact type of knowledge should be learned, and how will this
be used by the performance program?
Assume our checkers-playing program can generate all the legal moves
from any board state.
The program needs only to learn how to choose the best move from
among these legal moves.
This learning task is representative of a large class of tasks for which
the legal moves that define some large search space are known a priori,
but the best search strategy is not known.
Many optimization problems fall into this class, such as the problems of
scheduling and controlling manufacturing processes where the available
manufacturing steps are well understood, but the best strategy for
sequencing them is not.
13
Choosing the Target Function
Now the problem is transformed into learning a program, or
function, that chooses the best move for any given board
state. Say
ChooseMove : B → M
where B is the set of legal board states
and M is the set of legal moves.
Thus we have reduced the problem of improving
performance P at task T to the problem of learning some
particular target function, such as ChooseMove, which
accepts as input any board from the set of legal board
states B and produces as output some move from the set
of legal moves M.
14
Choosing the Target Function
Given the kind of indirect training experience available here, this
function will turn out to be very difficult to learn.
An alternative, easier target function would be an evaluation function that
assigns a numerical score to any given board state.
Let us call this target function V, such that
V : B → ℝ
where ℝ is the set of real numbers.
This target function is intended to assign higher scores to better board
states. After learning this target function V, the system can easily use it to
select the best move from any current board position. This can be
accomplished by generating the successor board state produced by
every legal move, then using V to choose the best successor state and
therefore the best legal move.
15
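As a minimal sketch (my own illustration, not from the slides), this is how a learned evaluation function V could be used to pick the best move; Board, Move, legal_moves and apply_move are hypothetical helpers assumed for illustration:

from typing import Callable, List

Board = tuple   # hypothetical encoding of a checkers board state
Move = tuple    # hypothetical encoding of a move

def legal_moves(b: Board) -> List[Move]:
    # Assumed helper: all legal moves available from board state b.
    raise NotImplementedError

def apply_move(b: Board, m: Move) -> Board:
    # Assumed helper: the successor board state after playing move m in b.
    raise NotImplementedError

def choose_best_move(b: Board, V: Callable[[Board], float]) -> Move:
    # ChooseMove realized through V: score every successor state and pick
    # the legal move whose successor receives the highest evaluation.
    return max(legal_moves(b), key=lambda m: V(apply_move(b, m)))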
Choosing the Target Function
Thus, we have reduced the learning task in this case to
the problem of discovering an operational description of
the ideal target function V.
It is very difficult in general to learn such an operational
form of V perfectly.
We often expect learning algorithms to acquire only
some approximation to the target function; for this reason
the process is often called function approximation. We will
call this approximation V̂ (read “V hat”).
16
A checkers learning problem:
In order to complete the design we must now choose
1. What exact type of knowledge should be learned?
2. How shall this knowledge be represented?
3. What specific algorithm should be used to learn it (i.e., the
learning mechanism)?
17
2. How shall this knowledge be represented?
(Choosing a Representation for the Target Function)
In order to represent V̂, we have many choices, such as a table of features, a
collection of rules against features of the board state, a quadratic polynomial, etc.
Let us choose a simple representation: a linear combination of the following board
features:
x1: the number of black pieces on the board
x2: the number of red pieces on the board
x3: the number of black kings on the board
x4: the number of red kings on the board
x5: the number of black pieces threatened by red (i.e., which can be captured on red's
next turn)
x6: the number of red pieces threatened by black
Thus, our learning program will represent V̂(b) as a linear function of the form
V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6
where w0 through w6 are numerical coefficients (weights) to be learned.
20
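As a small sketch (my own illustration; the feature-extraction helper is an assumption), the linear representation of V̂ could be written as:

from typing import List

def board_features(b) -> List[float]:
    # Assumed helper: returns the six features [x1, ..., x6] described above.
    raise NotImplementedError

def v_hat(b, weights: List[float]) -> float:
    # Linear evaluation function: V_hat(b) = w0 + w1*x1 + ... + w6*x6,
    # where weights = [w0, w1, ..., w6].
    w0, *ws = weights
    return w0 + sum(w * x for w, x in zip(ws, board_features(b)))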
3. What specific algorithm should be used to learn it?
(Choosing a Function Approximation Algorithm)
22
A simple and surprisingly successful approach is to assign
the training value V_train(b) of any intermediate board state b
to be V̂(Successor(b)), i.e. the rule for estimating training values is
V_train(b) ← V̂(Successor(b))
where Successor(b) denotes the next board state following b for
which it is again the program's turn to move.
23
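A brief sketch (again my own illustration, with hypothetical helpers) of this rule for estimating training values, reusing v_hat from the previous snippet:

def successor(b, game_trace):
    # Assumed helper: the next board state after b in the recorded game trace
    # at which it is again the program's turn to move.
    raise NotImplementedError

def training_value(b, game_trace, weights) -> float:
    # V_train(b) <- V_hat(Successor(b)), computed with the current weights.
    return v_hat(successor(b, game_trace), weights)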
3.2 Adjusting the weights
Now all that remains is to choose the weights w0, ..., w6 that
best fit the set of training examples.
We need an algorithm that incrementally refines the
weights as new training examples become available.
Several such algorithms are available, such as
(i) Least Mean Squares (LMS) (sketched below)
(ii) Gradient Descent
(iii) Linear Programming
24
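As a concrete sketch of one such algorithm (my own illustration, not the lecture's code), the LMS rule adjusts each weight in proportion to the prediction error and the corresponding feature value; eta is an assumed small learning-rate constant, and v_hat and board_features come from the earlier snippets:

def lms_update(b, v_train: float, weights, eta: float = 0.1):
    # One LMS step: w_i <- w_i + eta * (V_train(b) - V_hat(b)) * x_i.
    error = v_train - v_hat(b, weights)
    xs = [1.0] + board_features(b)   # x0 = 1 is paired with the bias weight w0
    return [w + eta * error * x for w, x in zip(weights, xs)]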
The Final Design
The final design of our checkers learning system is
described by the following four modules, which represent
the central components of many learning systems.
The Performance System
The Critic
The Generalizer
The Experiment Generator
25
The Performance System
The module that must solve the given performance task,
in this case playing checkers, by using the learned target
function(s).
It takes an instance of a new problem (a new game) as input
and produces a trace of its solution (the game history) as
output.
We expect its performance to improve as the evaluation
function V̂ becomes increasingly accurate.
26
The Critic
It takes as input the history or trace of the game and
produces as output a set of training examples of the
target function.
In our example, the Critic corresponds to the training rule
V_train(b) ← V̂(Successor(b))
27
Generalizer
It takes as input the training examples and produces an
output hypothesis that is its estimate of the target
function.
Generalizes from the specific training examples,
hypothesizing a general function that covers these
examples and other cases beyond the training examples.
In our example, the LMS algorithm is used as the Generalizer,
and the output hypothesis is the function V̂ described by the
learned weights w0, ..., w6.
28
The Experiment Generator
It takes as input the current hypothesis (the currently learned
function) and outputs a new problem (i.e., an initial board
state).
Its main role is to pick new practice problems that will
maximize the learning rate of the overall system.
It may follow a very simple strategy, such as always proposing
the same initial game board to begin a new game. More
sophisticated strategies may also be used to explore
particular regions of the state space.
29
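To show how the four modules could fit together, here is a rough self-play training loop (my own sketch reusing the hypothetical helpers from the earlier snippets; initial_board and game_over are additional assumed helpers, and this is not the lecture's implementation):

def run_training(num_games: int, weights):
    # Simplified interaction of the four modules of the final design.
    for _ in range(num_games):
        # Experiment Generator: propose a new practice problem
        # (here the simplest strategy: always the same initial board).
        board = initial_board()                      # assumed helper

        # Performance System: play one game against itself using V_hat
        # and record the trace (game history).
        trace = [board]
        while not game_over(board):                  # assumed helper
            move = choose_best_move(board, lambda b: v_hat(b, weights))
            board = apply_move(board, move)
            trace.append(board)

        # Critic: turn the game trace into training examples <b, V_train(b)>.
        examples = [(b, training_value(b, trace, weights)) for b in trace[:-1]]

        # Generalizer: refine the weights with the LMS rule.
        for b, v_train in examples:
            weights = lms_update(b, v_train, weights)
    return weights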
Final design of the checkers learning program
31
Some last minute discussion
In the design choices given above, we have constrained the
learning task in a number of ways.
The type of knowledge learned is restricted to a single linear
evaluation function.
The evaluation function depends on only the six specific
board features provided.
If we improve this representation, then our program has a
better chance of learning a good approximation of the true
target function.
32
Some real examples
IBM 701 — the first major commercial computer
In 1949, Arthur Samuel (1901 – 1990) joined IBM’s Poughkeepsie lab
and in 1952 he programmed IBM’s first major computer — IBM 701
— to play checkers. It was the first game-playing program to run on a
computer — it was the first checkers program.
By 1955, Samuel had done something groundbreaking; he had
created a program that could learn — something that no one had
done before — and this was demonstrated by playing checkers on
television on February 24, 1956.
He published a seminal paper in 1959, titled “Some Studies in
Machine Learning Using the Game of Checkers”, where he talked
about how a machine could look ahead “by evaluating the resulting
board positions much as a human player might do”. The computer
started out losing to Samuel and eventually beat him.
Samuel’s program was based on Shannon’s minimax strategy to find
the best move from a given current position.
33
Solving Checkers (2007)
After Samuel’s work on checkers, there was a false
impression that checkers was a “solved” game. As a result,
researchers moved on to chess and mostly ignored
checkers until Jonathan Schaeffer began working on
Chinook in 1989. Schaeffer’s goal was to develop a program
capable of defeating the best checkers player.
In 2007 Schaeffer and his team published a paper in the
journal Science titled Checkers Is Solved: the program
could no longer be defeated by anyone, human or
otherwise.
34
Solving Checkers (2007)
Marion Franklin Tinsley (February 3, 1927 – April 3, 1995) was an
American mathematician and checkers player. He is considered to be the
greatest checkers player who ever lived. Tinsley was world champion 1955–
1958 and 1975–1991 and never lost a world championship match, and lost
only seven games (two of them to the Chinook computer program) in his
45-year career.
Checkers is extremely complex: it has roughly 500 billion billion (5 × 10²⁰)
possible board positions, compared to about 10⁴⁷ for Chess and 10²⁵⁰ for Go.
Even though checkers is simpler than those games, it is still complex enough
that a brute-force-only approach is impractical.
Checkers is the largest game that has been solved to date, with a search
space of 5×10²⁰. “The number of calculations involved was 10¹⁴, which were
done over a period of 18 years. The process involved from 200 desktop
computers at its peak down to around 50”.
35
Backgammon
Tesauro (1992, 1995) reports a design similar to our final design for the
checkers learning program, for a program that learns to play the game of
Backgammon by learning a very similar evaluation function over states of
the game.
The program represents the learned
evaluation function using an
artificial neural network that
considers the complete description
of the board state rather than a
subset of board features.
After training on over one million
self-generated training games, his
program was able to play very
competitively with top-ranked
human Backgammon players.
36
A classic example of computer brilliance is Google’s “AlphaGo”, which can
beat professional players at the Chinese game Go. In recent years AlphaGo went
on a spree, defeating Go masters anonymously online. In October 2015, AlphaGo became
the first computer Go program to beat a human professional Go
player without handicaps on a full-sized 19×19 board.
At the beginning of 2017, it was revealed to the public, as well as to the
masters it had defeated, that this mysterious Go player was an AI all along.
Originally, the problem with solving Go
was the sheer number of possible board
positions (approximately 10¹⁷⁰, which
is more than the total number of atoms in the
universe).
A computer cannot rely on brute force alone for
a winning strategy, as a human will
have the upper hand thanks to
intuition.
37
Therefore, the Google team took a different approach and developed a system
that could almost be considered a form of synthetic intuition.
The system essentially runs different tasks, such as identifying important
moves, and strikes a balance between a policy network and a value network
(both deep neural networks) combined with tree search.
This balancing is similar to how the human brain makes decisions, where
different aspects of a situation are considered. For example,
A person playing chess does not always think of every possible move and plan
according to immediate consequences.
The player could allow moves that may seem bizarre to a computer, such as a self-sacrifice
which, in the short term, appears to make the player lose but in the long term could be
vital for a victory.
Other techniques include confusion, whereby a player who is losing may make “crazy”
moves that can confuse other players and keep their true strategy hidden.
38
AlphaGo uses a Monte Carlo tree search algorithm to find
its moves, guided by knowledge previously "learned"
by machine learning, specifically by an artificial neural
network (a deep learning method) trained extensively on
both human and computer play.
39
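As a very rough, simplified sketch of the idea (my own illustration; it is not AlphaGo's actual code, and the names, the Node fields, and the constant c_puct are assumptions), neural-network-guided tree search picks which move to explore by balancing the current value estimate Q against a prior P suggested by the policy network:

import math
from dataclasses import dataclass
from typing import Dict

@dataclass
class Node:
    # Statistics kept for one (state, action) edge of the search tree.
    prior: float              # P(s, a) suggested by the policy network
    visit_count: int = 0      # N(s, a)
    total_value: float = 0.0  # W(s, a), sum of values backed up through this edge

    @property
    def mean_value(self) -> float:
        # Q(s, a): average backed-up value of this edge.
        return self.total_value / self.visit_count if self.visit_count else 0.0

def select_action(children: Dict[int, Node], c_puct: float = 1.5) -> int:
    # PUCT-style rule: exploit high Q, but explore moves the policy network
    # rates highly and that have been visited rarely.
    total_visits = sum(child.visit_count for child in children.values())
    def score(item):
        _, child = item
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return child.mean_value + u
    return max(children.items(), key=score)[0]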
AlphaZero
AlphaZero is a computer program also developed by artificial
intelligence research company DeepMind to master the games
of chess, shogi and go. The algorithm uses an approach similar
to AlphaGo Zero.
On December 5, 2017, the DeepMind team released
a preprint introducing AlphaZero, which within 24 hours of
training achieved a superhuman level of play in these three
games by defeating world-champion programs Stockfish, elmo,
and the 3-day version of AlphaGo Zero.
40
AlphaZero
41
AlphaZero
AlphaZero was trained solely via "self-play" using 5,000 first-
generation Tensor Processing Units (TPUs) to generate the
games and 64 second-generation TPUs to train the neural
networks, all in parallel, with no access to opening
books or endgame tables.
After four hours of training, DeepMind estimated AlphaZero
was playing at a higher Elo rating than Stockfish 8; after 9
hours of training, the algorithm defeated Stockfish 8 in a time-
controlled 100-game tournament (28 wins, 0 losses, and 72
draws).
The trained algorithm played on a single machine with four
TPUs.
42