Unit-1Notes

The document provides an overview of Artificial Intelligence (AI), defining it as the study of creating machines that can perform tasks better than humans. It discusses the importance of AI, its applications across various fields, and the techniques used in AI research, emphasizing the need for knowledge representation. Additionally, it explores specific AI problems and methods, including game playing, natural language processing, and question answering systems.

Uploaded by

Grishma G
Copyright
© © All Rights Reserved

UNIT-1 Machine learning and Applications

1. What is Artificial Intelligence?


Data: Raw facts, unformatted information.

Information: It is the result of processing, manipulating and organizing data in response to a
specific need. Information relates to the understanding of the problem domain.

Knowledge: It relates to the understanding of the solution domain – what to do?

Intelligence: It is the knowledge in operation towards the solution – how to do? How to apply
the solution?

Artificial Intelligence: Artificial intelligence is the study of how to make computers do things
which, at the moment, people do better. It refers to intelligence exhibited by a computer.

One View of AI is
• About designing systems that are as intelligent as humans
• Computers can acquire abilities nearly equal to human intelligence
• How a system arrives at a conclusion, i.e. the reasoning behind its selection of actions
• How a system acts and performs, and not so much its reasoning process.

Why Artificial Intelligence?

• Making mistakes in real time can be costly and dangerous.
• Time constraints may limit the extent of learning in the real world.

The AI Problem
The following are some of the problems contained within AI.
1. Game playing and theorem proving share the property that people who do them well are
considered to be displaying intelligence.
2. Another important foray into AI focused on commonsense reasoning. It includes
reasoning about physical objects and their relationships to each other, as well as
reasoning about actions and their consequences.
3. To investigate this sort of reasoning, Newell, Shaw and Simon built the General Problem
Solver (GPS), which they applied to several commonsense tasks as well as to the problem
of performing symbolic manipulations of logical expressions. No attempt was made to
create a program with a large amount of knowledge about a particular problem domain.

School of Computer science and Engineering, Reva University



4. The following are the figures showing some of the tasks that are the targets of work in AI:

Only quite simple tasks were selected.


Perception of the world around us is crucial to our survival. Animals with much less intelligence
than people are capable of more sophisticated visual perception. Perception tasks are difficult
because they involve analog signals. A person who knows how to perform tasks from several of
the categories shown in the figure learns the necessary skills in a standard order.

First perceptual, linguistic and commonsense skills are learned. Later expert skills such as
engineering, medicine or finance are acquired.

What is Artificial Intelligence?

• It is a branch of Computer Science that pursues creating computers or machines as
intelligent as human beings.
• It is the science and engineering of making intelligent machines, especially intelligent
computer programs.

Definition:
Artificial Intelligence is the study of how to make computers do things, which, at the moment,
people do better. According to the father of Artificial Intelligence, John McCarthy, it is “The science
and engineering of making intelligent machines, especially intelligent computer programs”. It has
gained prominence recently due, in part, to big data, or the increase in speed, size and variety of
data businesses are now collecting.

AI Applications
AI has applications in all fields of human study, such as finance and economics, environmental
engineering, chemistry, computer science, and so on. Some of the applications of AI are listed
below:

Mundane Tasks
1. Perception
• Vision
• Speech
2. Natural Language
• Understanding
• Generation
• Translation
3. Common sense reasoning
4. Robot Control

Formal Tasks
1. Games
• Chess
• 8-puzzle
• Block world etc.
2. Mathematics
• Geometry
• Logic
• Integral Calculus

Expert Tasks
1. Engineering
• Design
• Manufacture Planning
• Fault Finding
2. Medical analysis
3. Financial Analysis

What is an AI Technique:
Artificial Intelligence research during the last three decades has concluded that intelligence
requires knowledge. To offset its one overwhelming asset, indispensability, knowledge possesses
some less desirable properties:

A. It is huge.

B. It is difficult to characterize correctly.

C. It is constantly varying.

D. It differs from data by being organized in a way that corresponds to its application.

E. It is complicated.

AI Methods

An AI technique is a method that exploits knowledge that is represented so that:

• The knowledge captures generalizations: situations that share properties are grouped together,
rather than each being given a separate representation.
• It can be understood by the people who must provide it, even though for many programs the bulk
of the data comes automatically from readings. In many AI domains, people must supply the
knowledge to a program in terms they themselves understand.
• It can be easily modified to correct errors and reflect changes in real conditions.
• It can be widely used even if it is incomplete or inaccurate.
• It can be used to help overcome its own sheer bulk by helping to narrow the range of
possibilities that must usually be considered.

What is an AI Technique?
Artificial Intelligence problems span a very broad spectrum. They appear to have very little in
common except that they are hard. There are techniques that are appropriate for the solution of
a variety of these problems. The results of AI research tell us that

Intelligence requires knowledge. Knowledge possesses some less desirable properties,
including:
• It is voluminous
• It is hard to characterize accurately
• It is constantly changing
• It differs from data by being organized in a way that corresponds to the ways it will
be used.

An AI technique is a method that exploits knowledge that should be represented in such a
way that:
• The knowledge captures generalizations. In other words, it is not necessary to
represent each individual situation. Instead, situations that share important
properties are grouped together.
• It can be understood by people who must provide it. Most of the knowledge a
program has must ultimately be provided by people in terms they understand.
• It can easily be modified to correct errors and to reflect changes in the world
and in our world view.
• It can be used in a great many situations even if it is not totally accurate or
complete.
• It can be used to help overcome its own sheer bulk by helping to narrow the
range of possibilities that must usually be considered.

It is possible to solve AI problems without using AI techniques. It is also possible to apply AI
techniques to the solution of non-AI problems.

Important AI Techniques:

• Search: Provides a way of solving problems for which no more direct approach
is available, as well as a framework into which any direct techniques that are
available can be embedded.
• Use of Knowledge: Provides a way of solving complex problems by exploiting
the structures of the objects that are involved.
• Abstraction: Provides a way of separating important features and variations
from the many unimportant ones that would otherwise overwhelm any process.

Games/Puzzles/Problems in AI
In order to characterize an AI technique let us consider initially OXO or tic-tac-toe and use a
series of different approaches to play the game. The programs increase in complexity, their use of
generalizations, the clarity of their knowledge and the extensibility of their approach. In this way
they move towards being representations of AI techniques.

Tic-Tac-Toe

Program-1

Data Structure

• In the first (simple) approach, the Tic-Tac-Toe board is a nine-element
vector called BOARD.
• Its elements correspond to the board positions numbered 1 to 9 in three rows:

1 2 3
4 5 6
7 8 9

• An element contains the value 0 for blank, 1 for X and 2 for O.
• A MOVETABLE vector of 19,683 elements (3^9) is also needed, where each
element is itself a nine-element vector.
• The contents of MOVETABLE are specially chosen to help the algorithm.

The Algorithm
The algorithm works as follows:
1. View the vector BOARD as a ternary number. Convert it to a decimal number.
2. Use the decimal number as an index into MOVETABLE and access the vector stored there.
3. Set BOARD to this vector, which indicates how the board looks after the move.

Comments about the algorithm

This approach is efficient in time, but it has several disadvantages:
• It consumes a lot of memory and requires considerable effort to calculate the decimal
numbers.
• Someone has to sit down and work out every entry of MOVETABLE by hand.
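
The index computation in steps 1 and 2 can be sketched in Python as follows. This is only an illustration: the function name `board_to_index` is an assumption, and because the notes do not give the contents of MOVETABLE, the table itself is left out.

```python
def board_to_index(board):
    """Read a nine-element board (0 = blank, 1 = X, 2 = O) as a base-3 number."""
    if len(board) != 9:
        raise ValueError("BOARD must have exactly nine elements")
    index = 0
    for cell in board:  # the first element is the most significant ternary digit
        index = index * 3 + cell
    return index


# Every board maps to a distinct index in the range 0 .. 3**9 - 1,
# which is why MOVETABLE needs 3**9 = 19,683 entries.
empty_board = [0] * 9
full_of_o = [2] * 9
print(board_to_index(empty_board), board_to_index(full_of_o))
```

A move would then be a single table lookup, for example `BOARD = MOVETABLE[board_to_index(BOARD)]`, which is what makes the approach fast but expensive in space.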

Program-2
In the second approach the data structure is as follows.

Data Structure:
BOARD: The Tic-Tac-Toe board is again a nine-element vector called BOARD, but we use 2
for a blank, 3 for an X and 5 for an O.

TURN: A variable called TURN indicates which move of the game is about to be played: 1 for the first move and 9 for the last.

The algorithm consists of three procedures/functions:

MAKE2: returns 5 if the centre square is blank; otherwise it returns any blank non-corner
square, i.e. 2, 4, 6 or 8.

POSSWIN (p): returns 0 if player p cannot win on the next move; otherwise it returns the
number of the square that gives a winning move. It checks each line using the product of its
three squares: 3*3*2 = 18 means X can win, 5*5*2 = 50 means O can win, and the winning move
is the square holding the blank.

GO (n): makes a move to square n, setting BOARD[n] to 3 or 5.
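
A minimal Python sketch of the three procedures is given below. The names `make2`, `posswin` and `go` follow the notes; the `LINES` table, the unused index 0 in the board list, and the convention that X moves on odd turns are assumptions made for this illustration.

```python
BLANK, X, O = 2, 3, 5

# The eight winning lines, using the square numbering 1..9 from Program-1.
LINES = [(1, 2, 3), (4, 5, 6), (7, 8, 9),
         (1, 4, 7), (2, 5, 8), (3, 6, 9),
         (1, 5, 9), (3, 5, 7)]


def new_board():
    """Index 0 is unused so that squares can be addressed as board[1]..board[9]."""
    return [0] + [BLANK] * 9


def make2(board):
    """Return 5 if the centre is blank, else a blank non-corner square (0 if none)."""
    if board[5] == BLANK:
        return 5
    for square in (2, 4, 6, 8):
        if board[square] == BLANK:
            return square
    return 0


def posswin(board, p):
    """Return the square where player p (X or O) wins on the next move, or 0."""
    target = p * p * BLANK  # 18 for X, 50 for O
    for line in LINES:
        a, b, c = line
        if board[a] * board[b] * board[c] == target:
            for square in line:
                if board[square] == BLANK:
                    return square
    return 0


def go(board, n, turn):
    """Place X on odd turns and O on even turns (assumed convention)."""
    board[n] = X if turn % 2 == 1 else O


board = new_board()
go(board, 1, turn=1)  # X takes square 1
go(board, 5, turn=2)  # O takes the centre
go(board, 2, turn=3)  # X takes square 2
print(posswin(board, X), posswin(board, O), make2(board))
```

Because 2, 3 and 5 are prime, a product of 18 can only come from two X's and one blank, and 50 only from two O's and one blank, so a single multiplication per line is enough to detect a possible win.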


The strategy used by this algorithm is as follows (X plays the odd-numbered turns, O the even-numbered ones):

Turn-1  Go(1)    (upper left corner)

Turn-2  If Board[5] is blank then
            Go(5)
        Else
            Go(1)

Turn-3  If Board[9] is blank then
            Go(9)
        Else
            Go(3)

Turn-4  If Posswin(X) is not 0 then
            Go(Posswin(X))    // i.e. block opponent's win
        Else
            Go(Make2)

Turn-5  If Posswin(X) is not 0 then
            Go(Posswin(X))    // i.e. win for X
        Else if Posswin(O) is not 0 then
            Go(Posswin(O))    // i.e. block win for O
        Else if Board[7] is blank then
            Go(7)
        Else
            Go(3)

Turn-6  If Posswin(O) is not 0 then
            Go(Posswin(O))
        Else if Posswin(X) is not 0 then
            Go(Posswin(X))
        Else
            Go(Make2)

Turn-7  If Posswin(X) is not 0 then
            Go(Posswin(X))
        Else if Posswin(O) is not 0 then
            Go(Posswin(O))
        Else
            Go(to any blank cell)

Turn-8  If Posswin(O) is not 0 then
            Go(Posswin(O))
        Else if Posswin(X) is not 0 then
            Go(Posswin(X))
        Else
            Go(to any blank cell)

Turn-9  Same as Turn-7


Tic-Tac-Toe Game Playing using a Magic Square (variant of Program 2)

Here, board positions are numbered as in a 3×3 magic square and stored as vector elements.

The sum of every row, column, and diagonal must be 15.

On its turn, the machine first checks its own chance to win:

• For each pair of squares it already holds, it computes the difference between 15 and the sum of the
two squares. If this difference is not positive or if it is greater than 9, then the original two squares
were not collinear and so can be ignored. Otherwise, if the square numbered by the difference is blank,
moving there wins.
• If it cannot win, it makes the same check for the opponent's squares and blocks the opponent's chance of winning.

Here, the objective is for the machine to win the maximum number of matches.


Example:

Now, the computer will check its possibility of winning the game.

First, calculate the difference between the 15 and the sum of two positions.

Diff = 15 – (5+4) = 6

6 is not empty, hence Computer can’t win the game.

Now, the computer checks the possibility of the opponent winning the match. If the opponent can win,
the computer blocks it.

Diff = 15 – (8+6) = 1

1 is empty, hence the human can win the game.

Hence Computer Blocks it.

Computer – go to 1


Now, it is the human player's turn.

After the human's move, the computer again checks its possibility of winning the game.

Diff = 15 – (5+4) = 6

6 is not empty, hence Computer can’t win the game.

Diff = 15 – (1+4) = 10

10 is greater than 9, hence Computer can’t win the game.

Diff = 15 – (1+5) = 9

9 is empty, hence Computer can win the game. Computer – go to 9
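
The check used in this example can be sketched in Python as follows. The layout in `MAGIC_SQUARE` is the standard 3×3 magic square and is an assumption, since the figure is not reproduced in these notes (it is consistent with the worked differences above). The function names are hypothetical, and the human's third square in the second call is a made-up value.

```python
from itertools import combinations

# Assumed layout (the standard 3x3 magic square); shown for reference only.
MAGIC_SQUARE = [[8, 3, 4],
                [1, 5, 9],
                [6, 7, 2]]


def winning_square(mine, theirs):
    """Return a blank square that completes a sum of 15 with two of `mine`, else None."""
    occupied = set(mine) | set(theirs)
    for a, b in combinations(sorted(mine), 2):
        diff = 15 - (a + b)
        if diff <= 0 or diff > 9:
            continue  # the two squares are not collinear
        if diff not in occupied:
            return diff
    return None


def choose_move(computer, human):
    """Win if possible, otherwise block; return None if neither applies."""
    move = winning_square(computer, human)
    if move is None:
        move = winning_square(human, computer)
    return move


# First position of the example: the computer holds 5 and 4, the human holds 8 and 6.
print(choose_move({5, 4}, {8, 6}))
# Later position: the computer also holds 1; the human's third square (7) is hypothetical.
print(choose_move({5, 4, 1}, {8, 6, 7}))
```

The design point is that collinearity never has to be tested explicitly: three distinct squares lie on a line exactly when their magic-square numbers sum to 15.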


Program 3: Tic-Tac-Toe using the Minimax procedure

Question Answering

Let us consider Question Answering systems that accept input in English and provide answers also in English.
This problem is harder than the previous one as it is more difficult to specify the problem properly. Another area
of difficulty concerns deciding whether the answer obtained is correct, or not, and further what is meant by
‘correct’.

For example, consider the following text:

Mary went shopping for a new coat. She found a red one she really liked. When she got home, she found that
it went perfectly with her favourite dress.

Q1. What did Mary go shopping for?
Q2. What did Mary find that she liked?
Q3. Did Mary buy anything?

Method 1
Data Structures used

Question Patterns: A set of templates that match common questions and produce patterns used to match
against inputs. Each template that matches a given question is associated with a corresponding pattern
that is used to find the answer in the input text. For example, the template “Who did x y” generates
the pattern “x y z”; if a match occurs, z is the answer to the question.

Text: The given text stored as character string

Question: The current question also stored as character string

Algorithm
Answering a question requires the following four steps:

1. Compare the templates against the question and store all successful matches to produce a set of text
patterns.
2. Pass these text patterns through a substitution process to change the person or voice and produce an
expanded set of text patterns.
3. Apply each of these patterns to the Text and collect all the answers.
4. Print the answers.

If we apply the algorithm to our text, the following answers are obtained:

Q1. We use the template “WHAT DID X Y”, which generates “Mary go shopping for z”. After substitution
we get two forms, “Mary goes shopping for z” and “Mary went shopping for z”, and z is matched to “a new
coat”.

Q2. We would need a very large number of templates and also a scheme to allow the insertion of ‘find’ before
‘that she liked’, the insertion of ‘really’ in the text, and the substitution of ‘she’ for ‘Mary’ in order to
obtain the answer ‘a red one’.

Q3. Cannot be answered.

Comments
This is a very primitive approach. It does not meet the criteria we set for intelligence, and it is
worse than the approaches used for the game.
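
Method 1 can be sketched with regular expressions in Python. This is a rough illustration under stated assumptions: only the single template “What did X Y” is encoded, and the `VERB_FORMS` substitution table and the function name `answer` are hypothetical.

```python
import re

TEXT = ("Mary went shopping for a new coat. She found a red one she really liked. "
        "When she got home, she found that it went perfectly with her favourite dress.")

# Hypothetical substitution table: verb in the question -> forms to try in the text.
VERB_FORMS = {"go": ["goes", "went"], "find": ["finds", "found"]}


def answer(question, text):
    """Apply the template 'What did X Y?' and return every z matched in the text."""
    template = re.fullmatch(r"What did (\w+) (\w+) (.+)\?", question)
    if template is None:
        return []  # no template matches this question
    x, verb, rest = template.groups()
    answers = []
    for form in VERB_FORMS.get(verb, [verb]):
        # Text pattern 'x y z': z is everything up to the end of the sentence.
        pattern = re.escape(f"{x} {form} {rest}") + r" ([^.]+)\."
        answers.extend(re.findall(pattern, text))
    return answers


for q in ("What did Mary go shopping for?",
          "What did Mary find that she liked?",
          "Did Mary buy anything?"):
    print(q, answer(q, TEXT))
```

With only this one template, Q1 is answered while Q2 and Q3 return nothing, which mirrors the limitations described above: the method matches surface strings and has no notion of pronouns or implied events.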

Method 2
Data Structures used

• English-Know: A structure called English-Know consists of a dictionary, a grammar and some
semantics about the vocabulary we are likely to come across. This data structure provides the
knowledge to convert English text into a storable internal form and also to convert the response back
into English.
• Input-Text: The input text in character form.
• Structured Text: The structured representation of the text is a processed form that defines the context
of the input text by making explicit all references such as pronouns. There are three types of such
knowledge representation systems:
  • production rules of the form ‘if x then y’,
  • slot and filler systems, and
  • statements in mathematical logic.

This example uses the slot and filler system. Consider, for example, the sentence: ‘She found a red one
she really liked’.

The Structured representation of above sentence is as follows:


Event2
    Instance: Finding
    Tense: Past
    Agent: Mary
    Object: Thing1

Thing1
    Instance: Coat
    Colour: Red

Event2
    Instance: Liking
    Tense: Past
    Modifier: Much
    Object: Thing1

Input-Question: The input question stored in character form

Struct-Question: A structured representation of the content of user’s questions

Algorithm

1. Convert the question to a structured form using English-Know, then use a marker to
indicate the part of the structure (corresponding to ‘who’ or ‘what’) that should be returned as an
answer.
2. Match the structured form against the Structured Text.
3. Return the segments of the text that match the marked part of the question.

Both questions 1 and 2 generate answers, viz. a new coat and a red coat respectively. Here
also, Question 3 cannot be answered, because the text contains no direct response.
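
The slot and filler matching of Method 2 can be sketched in Python with dictionaries. This is an illustration only: the conversion from English (the English-Know step) is skipped and the structures are written by hand, the `WANTED` marker and the function names are assumptions, and the liking event is keyed `Event3` here purely so that the dictionary keys stay unique.

```python
WANTED = "?"  # marker for the slot whose filler should be returned

FRAMES = {
    "Event2": {"instance": "Finding", "tense": "Past", "agent": "Mary", "object": "Thing1"},
    "Thing1": {"instance": "Coat", "colour": "Red"},
    # Renamed for this sketch so the keys are unique.
    "Event3": {"instance": "Liking", "tense": "Past", "modifier": "Much", "object": "Thing1"},
}


def match(question, frames):
    """Return the fillers of the marked slots for every frame that fits the question."""
    answers = []
    for frame in frames.values():
        fits = all(slot in frame and (value == WANTED or frame[slot] == value)
                   for slot, value in question.items())
        if fits:
            answers.extend(frame[slot] for slot, value in question.items()
                           if value == WANTED)
    return answers


def describe(name, frames):
    """Turn a 'thing' frame back into a short English phrase."""
    thing = frames[name]
    return f"a {thing['colour'].lower()} {thing['instance'].lower()}"


# Q2, 'What did Mary find that she liked?', written directly in structured form.
struct_question = {"instance": "Finding", "agent": "Mary", "object": WANTED}
print([describe(name, FRAMES) for name in match(struct_question, FRAMES)])
```

Because the pronoun 'one' has already been resolved to `Thing1` in the structured text, the answer comes back as a red coat rather than the literal string 'a red one'.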

Method 3 (Shopping Script technique)

Data structures used are

• English-Know
• Input-Text
• Input-Question
• Struct-Question

We also use a data structure called World-Model (with a shopping script) and Integrated-Text.

• World-Model: Contains background knowledge of the problem we are
dealing with. For instance, it contains a shopping script for a situation like
Mary doing shopping.

Here is a sample shopping script.

Notations used:
• C – Customer
• S – Salesperson
• M – Merchandise (coat)
• M’ – Red coat
• L – Location (store)

Steps of the script (flowchart):
1. C enters L
2. C looks around
3. C looks for a specific M
4. C looks for any interesting M
5. C asks S for help
7. C finds M’
8. C fails to find M
9. C leaves L
10. C buys M’
11. C leaves L
12. Go to step 2
13. C leaves L
14. C takes


Physical Symbol System Hypothesis


At the heart of research in artificial intelligence, the underlying assumptions about intelligence lie
in what Newell and Simon (1976) call the physical symbol system hypothesis. They define a
physical symbol system as follows:
1. Symbols
2. Expressions
3. Symbol Structure
4. System
A physical symbol system consists of a set of entities called symbols, which are physical
patterns that can occur as components of another type of entity called an expression (or symbol
structure). A symbol structure is composed of a number of instances (or tokens) of symbols
related in some physical way. At any instant of time the system will contain a collection of
these symbol structures. The system also contains a collection of processes that operate on
expressions to produce other expressions: processes of creation, modification, reproduction and
destruction.

They state the hypothesis as:

“A physical symbol system has the necessary and sufficient means for general ‘intelligent
actions’.”

This hypothesis is only a hypothesis. There appears to be no way to prove or disprove it on logical
grounds, so it must be subjected to empirical validation. We may find that it is false, or we may find
that the bulk of the evidence says that it is true, but the only way to determine its truth is by
experimentation.

Computers provide the perfect medium for this experimentation since they can be programmed
to simulate any physical symbol system we like. The importance of the physical symbol system
hypothesis is twofold. It is a significant theory of the nature of human intelligence and so is of
great interest to psychologists. It also forms the basis of the belief that it is possible to build
programs that can perform intelligent tasks now performed by people.

Overview of Artificial Intelligence

It was the ability of electronic machines to store large amounts of information and process it at
very high speeds that gave researchers the vision of building systems which could emulate
(imitate) some human abilities.

We will see the introduction of systems which equal or exceed human abilities, and see them
become an important part of most business and government operations as well as our daily
activities.

Definition of AI: Artificial Intelligence is a branch of computer science concerned with the study
and creation of computer systems that exhibit some form of intelligence, such as systems that
learn new concepts and tasks, systems that can understand a natural language or perceive and
comprehend a visual scene, or systems that perform other types of feats that require human
types of intelligence.

To understand AI, we should understand

• Intelligence
• Knowledge
• Reasoning
• Thought
• Cognition: gaining knowledge by thought, perception or learning

The definitions of AI vary along two main dimensions: thought processes and reasoning, and
behavior.

AI is not the study and creation of conventional computer systems. Nor is it the study of the mind, the
body, and languages as customarily found in the fields of psychology, physiology, cognitive
science, or linguistics.

In AI, the goal is to develop working computer systems that are truly capable of performing tasks
that require high levels of intelligence.

Unit-1 Machine learning and Applications

Unit- 1
Ever since computers were invented, we have wondered whether they might be made to learn.
If we could understand how to program them to learn, that is, to improve automatically with
experience, the impact would be dramatic.
• Imagine computers learning from medical records which treatments are most effective
for new diseases.
• Houses learning from experience to optimize energy costs based on the particular usage
patterns of their occupants.
• Personal software assistants learning the evolving interests of their users in order to
highlight especially relevant stories from the online morning newspaper.

A successful understanding of how to make computers learn would open up many new uses
of computers and new levels of competence and customization.

Some successful applications of machine learning


 Learning to recognize spoken words
 Learning to drive an autonomous vehicle
 Learning to classify new astronomical structures
 Learning to play world-class backgammon

Why is Machine Learning Important?

• Some tasks cannot be defined well, except by examples (e.g., recognizing people).
• Relationships and correlations can be hidden within large amounts of data. Machine
Learning/Data Mining may be able to find these relationships.
• Human designers often produce machines that do not work as well as desired in the
environments in which they are used.
• The amount of knowledge available about certain tasks might be too large for explicit
encoding by humans (e.g., medical diagnosis).
• Environments change over time.
• New knowledge about tasks is constantly being discovered by humans. It may be
difficult to continuously re-design systems “by hand”.


WELL-POSED LEARNING PROBLEMS

Definition: A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E.

To have a well-defined learning problem, three features need to be identified:

1. The class of tasks
2. The measure of performance to be improved
3. The source of experience

Examples
1. Checkers game: A computer program that learns to play checkers might improve its
performance as measured by its ability to win at the class of tasks involving playing
checkers games, through experience obtained by playing games against itself.

Fig: Checker game board


A checkers learning problem:
• Task T: playing checkers
• Performance measure P: percent of games won against opponents
• Training experience E: playing practice games against itself

2. A handwriting recognition learning problem:
• Task T: recognizing and classifying handwritten words within images
• Performance measure P: percent of words correctly classified
• Training experience E: a database of handwritten words with given
classifications

3. A robot driving learning problem:
• Task T: driving on public four-lane highways using vision sensors
• Performance measure P: average distance travelled before an error (as judged
by human overseer)
• Training experience E: a sequence of images and steering commands recorded
while observing a human driver
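
As a small illustration, the three features T, P and E of a well-posed learning problem can be recorded in a simple data structure. The class name `LearningProblem`, the helper `has_learned` and the win-rate numbers below are all assumptions made for this sketch; the numbers are invented, not measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearningProblem:
    task: str                  # T: the class of tasks
    performance_measure: str   # P: the measure of performance to be improved
    experience: str            # E: the source of experience


def has_learned(scores):
    """True if P, measured after increasing amounts of experience E, has improved."""
    return len(scores) >= 2 and scores[-1] > scores[0]


checkers = LearningProblem(
    task="playing checkers",
    performance_measure="percent of games won against opponents",
    experience="playing practice games against itself",
)

win_rates = [35.0, 48.0, 61.0]  # made-up values of P after successive rounds of E
print(checkers.task, has_learned(win_rates))
```

The check is deliberately crude (it only compares the first and last measurement); it is meant to make the definition concrete, not to serve as a real evaluation procedure.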


DESIGNING A LEARNING SYSTEM

The basic design issues and approaches to machine learning are illustrated by designing a
program to learn to play checkers, with the goal of entering it in the world checkers
tournament. The design steps are:
1. Choosing the Training Experience
2. Choosing the Target Function
3. Choosing a Representation for the Target Function
4. Choosing a Function Approximation Algorithm
   1. Estimating training values
   2. Adjusting the weights
5. The Final Design

1. Choosing the Training Experience

 The first design choice is to choose the type of training experience from which the
system will learn.
 The type of training experience available can have a significant impact on success or
failure of the learner.

There are three attributes of the training experience which impact the success or failure of the learner:

1. Whether the training experience provides direct or indirect feedback regarding the
choices made by the performance system.

For example, in the checkers game:

In learning to play checkers, the system might learn from direct training examples
consisting of individual checkers board states and the correct move for each.

Alternatively, it might have available only indirect training examples consisting of the move
sequences and final outcomes of various games played. In this case, information about the
correctness of specific moves early in the game must be inferred indirectly from the fact that
the game was eventually won or lost.

Here the learner faces an additional problem of credit assignment, or determining the
degree to which each move in the sequence deserves credit or blame for the final
outcome. Credit assignment can be a particularly difficult problem because the game
can be lost even when early moves are optimal, if these are followed later by poor
moves.
Hence, learning from direct training feedback is typically easier than learning from
indirect feedback.


2. The degree to which the learner controls the sequence of training examples

For example, in checkers game:


The learner might depend on the teacher to select informative board states and to
provide the correct move for each.

Alternatively, the learner might itself propose board states that it finds particularly
confusing and ask the teacher for the correct move.

The learner may have complete control over both the board states and (indirect) training
classifications, as it does when it learns by playing against itself with no teacher present.

3. How well it represents the distribution of examples over which the final system
performance P must be measured

For example, in the checkers game:

In the checkers learning scenario, the performance metric P is the percent of games the
system wins in the world tournament.

If its training experience E consists only of games played against itself, there is a danger
that this training experience might not be fully representative of the distribution of
situations over which it will later be tested.
In practice, it is often necessary to learn from a distribution of examples that is somewhat
different from those on which the final system will be evaluated.

2. Choosing the Target Function

The next design choice is to determine exactly what type of knowledge will be learned and
how this will be used by the performance program.

Let’s consider a checkers-playing program that can generate the legal moves from any board
state.
The program needs only to learn how to choose the best move from among these legal moves.
Since it must learn to choose among the legal moves, the most obvious choice for the type of
information to be learned is a program, or function, that chooses the best move for any given
board state.

1. Let ChooseMove be the target function, with the notation

ChooseMove : B → M

which indicates that this function accepts as input any board from the set of legal board
states B and produces as output some move from the set of legal moves M.


ChooseMove is an obvious choice for the target function in the checkers example, but this
function turns out to be very difficult to learn given the kind of indirect training
experience available to our system.

2. An alternative target function is an evaluation function that assigns a numerical score
to any given board state. Let this target function be V, with the notation

V : B → R

which denotes that V maps any legal board state from the set B to some real value.
We intend for this target function V to assign higher scores to better board states. If the
system can successfully learn such a target function V, then it can easily use it to select
the best move from any current board position.

Let us define the target value V(b) for an arbitrary board state b in B, as follows:
 If b is a final board state that is won, then V(b) = 100
 If b is a final board state that is lost, then V(b) = -100
 If b is a final board state that is drawn, then V(b) = 0
 If b is not a final state in the game, then V(b) = V(b'), where b' is the best final
board state that can be achieved starting from b and playing optimally until the end of
the game.
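The recursive definition of V(b) can be made concrete on a toy game tree. The sketch below is only an illustration under stated assumptions: the nested-list tree encoding and the name `target_value` are hypothetical, and "playing optimally" is read as both sides playing optimally (a minimax search). Real checkers boards are far too numerous for this to be practical, which is why the notes go on to approximate V.

```python
# Scores for final board states, taken from the definition of V(b) above.
SCORES = {"won": 100, "lost": -100, "drawn": 0}


def target_value(node, our_turn=True):
    """Return V(b) for a toy game tree.

    A node is either an outcome string (a final board state) or a list of
    child nodes (the boards reachable in one move). This encoding is a
    hypothetical stand-in for real checkers boards.
    """
    if isinstance(node, str):
        return SCORES[node]
    values = [target_value(child, not our_turn) for child in node]
    # We pick the best successor; the opponent picks the worst one for us.
    return max(values) if our_turn else min(values)


# Two moves for us, each followed by two possible replies from the opponent.
tree = [["won", "drawn"], ["lost", "won"]]
print(target_value(tree))  # 0
```

The first move guarantees at least a draw, while the second lets the opponent force a loss, so the value of the root is 0.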

3. Choosing a Representation for the Target Function

Let’s choose a simple representation: for any given board state, the function V̂ will be
calculated as a linear combination of the following board features:

 x1: the number of black pieces on the board
 x2: the number of red pieces on the board
 x3: the number of black kings on the board
 x4: the number of red kings on the board
 x5: the number of black pieces threatened by red (i.e., which can be captured on red's
next turn)
 x6: the number of red pieces threatened by black

Thus, the learning program will represent V̂(b) as a linear function of the form

V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6

Where,
 w0 through w6 are numerical coefficients, or weights, to be chosen by the learning
algorithm.
 Learned values for the weights w1 through w6 will determine the relative importance
of the various board features in determining the value of the board
 The weight w0 will provide an additive constant to the board value
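As a minimal sketch of this representation, the function below evaluates a board from its six features and seven weights. The weight values are hypothetical and chosen only to make the arithmetic easy to follow; in the notes they are chosen by the learning algorithm.

```python
from typing import Sequence


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation w0 + w1*x1 + ... + w6*x6 of one board state."""
    if len(weights) != len(features) + 1:
        raise ValueError("expected one more weight than features")
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


# Hypothetical weights w0..w6, for illustration only.
weights = [0.0, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]
# Features x1..x6: black has 3 pieces and 1 king, red has nothing left.
board = [3, 0, 1, 0, 0, 0]
print(v_hat(weights, board))  # 5.0
```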

4. Choosing a Function Approximation Algorithm

In order to learn the target function V we require a set of training examples, each describing a
specific board state b and the training value Vtrain(b) for b.

Each training example is an ordered pair of the form (b, Vtrain(b)).

For instance, the following training example describes a board state b in which black has won
the game (note x2 = 0 indicates that red has no remaining pieces) and for which the target
function value Vtrain(b) is therefore +100.

((x1=3, x2=0, x3=1, x4=0, x5=0, x6=0), +100)

Function Approximation Procedure

1. Derive training examples from the indirect training experience available to the learner
2. Adjust the weights wi to best fit these training examples

1. Estimating training values

A simple approach for estimating training values for intermediate board states is to
assign the training value of Vtrain(b) for any intermediate board state b to be
V̂ (Successor(b))

Where ,
 V̂ is the learner's current approximation to V
 Successor(b) denotes the next board state following b for which it is again the
program's turn to move

Rule for estimating training values

Vtrain(b) ← V̂ (Successor(b))
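The estimation rule can be sketched as a small function that turns one game trace into training examples. This is an illustration under assumptions: the trace is taken to be the list of feature vectors for the boards at which the program had to move, the last entry is assumed to be a final board whose value is the game outcome (+100, -100 or 0), and the names `training_examples` and `trace` are hypothetical.

```python
def v_hat(weights, features):
    """Current approximation: w0 + w1*x1 + ... + w6*x6."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def training_examples(trace, final_value, weights):
    """Build (b, Vtrain(b)) pairs from one game trace.

    Intermediate boards get Vtrain(b) = v_hat(Successor(b)); the last board
    in the trace is assumed to be final and gets the game outcome.
    """
    examples = []
    for i, board in enumerate(trace):
        if i + 1 < len(trace):
            target = v_hat(weights, trace[i + 1])
        else:
            target = final_value
        examples.append((board, target))
    return examples


# Hypothetical weights: value = (black pieces) - (red pieces).
weights = [0, 1, -1, 0, 0, 0, 0]
trace = [[3, 2, 0, 0, 0, 0], [3, 1, 0, 0, 0, 0], [3, 0, 1, 0, 0, 0]]
for board, target in training_examples(trace, 100, weights):
    print(board, target)
```

Note that each intermediate target is computed from the learner's own current weights, so the targets improve only as the weights do.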


2. Adjusting the weights


Specify the learning algorithm for choosing the weights wi to best fit the set of training
examples {(b, Vtrain(b))}.
A first step is to define what we mean by the best fit to the training data.
One common approach is to define the best hypothesis, or set of weights, as that which
minimizes the squared error E between the training values and the values predicted by
the hypothesis:

E = Σ (Vtrain(b) - V̂(b))², summed over all training examples (b, Vtrain(b))

Several algorithms are known for finding weights of a linear function that minimize E.
One such algorithm is called the least mean squares, or LMS training rule. For each
observed training example it adjusts the weights a small amount in the direction that
reduces the error on this training example

LMS weight update rule: For each training example (b, Vtrain(b))
  Use the current weights to calculate V̂(b)
  For each weight wi, update it as

  wi ← wi + η (Vtrain(b) - V̂(b)) xi

Here η is a small constant (e.g., 0.1) that moderates the size of the weight update.

Working of weight update rule

 When the error (Vtrain(b)- V̂ (b)) is zero, no weights are changed.


 When (Vtrain(b) - V̂ (b)) is positive (i.e., when V̂ (b) is too low), then each weight
is increased in proportion to the value of its corresponding feature. This will raise
the value of V̂ (b), reducing the error.
 If the value of some feature xi is zero, then its weight is not altered regardless of
the error, so that the only weights updated are those whose features actually occur
on the training example board.
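A minimal sketch of one LMS step follows. Two details are assumptions rather than something the notes state: the constant weight w0 is updated by treating it as having a feature x0 = 1, and the example uses η = 0.01 instead of 0.1 because the raw feature counts are not scaled.

```python
def v_hat(weights, features):
    """Current approximation: w0 + w1*x1 + ... + w6*x6."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def lms_update(weights, features, v_train, eta=0.1):
    """Return new weights after one LMS step on a single training example."""
    error = v_train - v_hat(weights, features)
    xs = [1.0] + list(features)  # assumed x0 = 1 so that w0 is updated too
    return [w + eta * error * x for w, x in zip(weights, xs)]


weights = [0.0] * 7
board = [3, 0, 1, 0, 0, 0]  # the winning board from the training example above
new_weights = lms_update(weights, board, v_train=100.0, eta=0.01)

print(new_weights)
print(v_hat(weights, board), "->", v_hat(new_weights, board))
```

Only w0, w1 and w3 change, because the other features are zero on this board, and the prediction moves from 0 toward the training value of 100.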


5. The Final Design


The final design of checkers learning system can be described by four distinct program modules
that represent the central components in many learning systems

1. The Performance System is the module that must solve the given performance task by
using the learned target function(s). It takes an instance of a new problem (new game)
as input and produces a trace of its solution (game history) as output.

2. The Critic takes as input the history or trace of the game and produces as output a set
of training examples of the target function

3. The Generalizer takes as input the training examples and produces an output
hypothesis that is its estimate of the target function. It generalizes from the specific
training examples, hypothesizing a general function that covers these examples and
other cases beyond the training examples.

4. The Experiment Generator takes as input the current hypothesis and outputs a new
problem (i.e., initial board state) for the Performance System to explore. Its role is to
pick new practice problems that will maximize the learning rate of the overall system.

The sequence of design choices made for the checkers program is summarized in the figure below.


PERSPECTIVES AND ISSUES IN MACHINE LEARNING

Issues in Machine Learning


The field of machine learning, and much of this book, is concerned with answering questions
such as the following
 What algorithms exist for learning general target functions from specific training
examples? In what settings will particular algorithms converge to the desired function,
given sufficient training data? Which algorithms perform best for which types of
problems and representations?
 How much training data is sufficient? What general bounds can be found to relate the
confidence in learned hypotheses to the amount of training experience and the character
of the learner's hypothesis space?


 When and how can prior knowledge held by the learner guide the process of generalizing
from examples? Can prior knowledge be helpful even when it is only approximately
correct?
 What is the best strategy for choosing a useful next training experience, and how does
the choice of this strategy alter the complexity of the learning problem?
 What is the best way to reduce the learning task to one or more function approximation
problems? Put another way, what specific functions should the system attempt to learn?
Can this process itself be automated?
 How can the learner automatically alter its representation to improve its ability to
represent and learn the target function?


CONCEPT LEARNING

 Learning involves acquiring general concepts from specific training examples. Example:
People continually learn general concepts or categories such as "bird," "car," "situations in
which I should study more in order to pass the exam," etc.
 Each such concept can be viewed as describing some subset of objects or events defined
over a larger set
 Alternatively, each concept can be thought of as a Boolean-valued function defined over this
larger set. (Example: A function defined over all animals, whose value is true for birds and
false for other animals).

Definition: Concept learning - Inferring a Boolean-valued function from training examples of
its input and output.

A CONCEPT LEARNING TASK

Consider the example task of learning the target concept "Days on which Aldo enjoys
his favorite water sport”

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

Table: Positive and negative training examples for the target concept EnjoySport.

The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the
values of its other attributes.

What hypothesis representation is provided to the learner?

 Let’s consider a simple representation in which each hypothesis consists of a
conjunction of constraints on the instance attributes.
 Let each hypothesis be a vector of six constraints, specifying the values of the six
attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.


For each attribute, the hypothesis will either

 Indicate by a "?" that any value is acceptable for this attribute,
 Specify a single required value (e.g., Warm) for the attribute, or
 Indicate by a "Φ" that no value is acceptable

If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive
example (h(x) = 1).

The hypothesis that Aldo enjoys his favorite sport only on cold days with high humidity
(independent of the values of the other attributes) is represented by the expression

(?, Cold, High, ?, ?, ?)

The most general hypothesis - that every day is a positive example - is represented by

(?, ?, ?, ?, ?, ?)

The most specific possible hypothesis - that no day is a positive example - is represented by

(Φ, Φ, Φ, Φ, Φ, Φ)
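This representation maps directly onto tuples of strings. The sketch below is one possible encoding, not the only one: the names `ANY`, `NONE` and `classify` are hypothetical, and a hypothesis is simply a 6-tuple in the attribute order Sky, AirTemp, Humidity, Wind, Water, Forecast.

```python
ANY = "?"    # any value is acceptable
NONE = "Φ"   # no value is acceptable


def classify(h, x):
    """Return 1 if instance x satisfies every constraint of hypothesis h, else 0."""
    return int(all(c != NONE and (c == ANY or c == v) for c, v in zip(h, x)))


cold_and_humid = (ANY, "Cold", "High", ANY, ANY, ANY)
most_general = (ANY,) * 6
most_specific = (NONE,) * 6

day1 = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
day3 = ("Rainy", "Cold", "High", "Strong", "Warm", "Change")

print(classify(cold_and_humid, day1))  # 0
print(classify(cold_and_humid, day3))  # 1
print(classify(most_general, day1))    # 1
print(classify(most_specific, day1))   # 0
```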

Notation

 The set of items over which the concept is defined is called the set of instances, which is
denoted by X.

Example: X is the set of all possible days, each represented by the attributes: Sky, AirTemp,
Humidity, Wind, Water, and Forecast

 The concept or function to be learned is called the target concept, which is denoted by c.
c can be any Boolean-valued function defined over the instances X

c: X → {0, 1}

Example: The target concept corresponds to the value of the attribute EnjoySport
(i.e., c(x) = 1 if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No).

 Instances for which c(x) = 1 are called positive examples, or members of the target concept.
 Instances for which c(x) = 0 are called negative examples, or non-members of the target
concept.
 We use the ordered pair (x, c(x)) to describe the training example consisting of the
instance x and its target concept value c(x).
 We use D to denote the set of available training examples.


 We use the symbol H to denote the set of all possible hypotheses that the learner may
consider regarding the identity of the target concept. Each hypothesis h in H represents a
Boolean-valued function defined over X
h: X → {0, 1}

The goal of the learner is to find a hypothesis h such that h(x) = c(x) for all x in X.

 Given:
 Instances X: Possible days, each described by the attributes
 Sky (with possible values Sunny, Cloudy, and Rainy),
 AirTemp (with values Warm and Cold),
 Humidity (with values Normal and High),
 Wind (with values Strong and Weak),
 Water (with values Warm and Cool),
 Forecast (with values Same and Change).

 Hypotheses H: Each hypothesis is described by a conjunction of constraints on the
attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast. The constraints may be
"?" (any value is acceptable), "Φ" (no value is acceptable), or a specific value.

 Target concept c: EnjoySport : X → {0, 1}


 Training examples D: Positive and negative examples of the target function

 Determine:
 A hypothesis h in H such that h(x) = c(x) for all x in X.

Table: The EnjoySport concept learning task.

The inductive learning hypothesis

Any hypothesis found to approximate the target function well over a sufficiently large set of
training examples will also approximate the target function well over other unobserved
examples.


CONCEPT LEARNING AS SEARCH

 Concept learning can be viewed as the task of searching through a large space of
hypotheses implicitly defined by the hypothesis representation.
 The goal of this search is to find the hypothesis that best fits the training examples.

Example:
Consider the instances X and hypotheses H in the EnjoySport learning task. The attribute Sky
has three possible values, and AirTemp, Humidity, Wind, Water, and Forecast each have two
possible values, so the instance space X contains exactly
3 · 2 · 2 · 2 · 2 · 2 = 96 distinct instances.
Each attribute constraint may also be "?" or "Φ", which gives
5 · 4 · 4 · 4 · 4 · 4 = 5120 syntactically distinct hypotheses within H.

Every hypothesis containing one or more "Φ" symbols represents the empty set of instances;
that is, it classifies every instance as negative. Therefore, the number of semantically
distinct hypotheses is only
1 + (4 · 3 · 3 · 3 · 3 · 3) = 973.
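The three counts can be checked with a few lines of arithmetic. The sketch assumes Python 3.8 or later for `math.prod`; the variable names are illustrative.

```python
from math import prod

# Number of possible values for Sky, AirTemp, Humidity, Wind, Water, Forecast.
values_per_attribute = [3, 2, 2, 2, 2, 2]

# Every combination of attribute values is one instance.
instances = prod(values_per_attribute)

# Syntactically, each constraint is a value, "?" or "Φ": n + 2 choices.
syntactic = prod(n + 2 for n in values_per_attribute)

# Semantically, all hypotheses containing "Φ" collapse into one empty
# hypothesis; the rest use a value or "?": n + 1 choices per attribute.
semantic = 1 + prod(n + 1 for n in values_per_attribute)

print(instances, syntactic, semantic)  # 96 5120 973
```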

General-to-Specific Ordering of Hypotheses

Consider the two hypotheses

h1 = (Sunny, ?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)

 Consider the sets of instances that are classified positive by h1 and by h2.
 Because h2 imposes fewer constraints on the instance, it classifies more instances as
positive. So, any instance classified positive by h1 will also be classified positive by h2.
Therefore, h2 is more general than h1.

Given hypotheses hj and hk, hj is more-general-than-or-equal-to hk if and only if any instance
that satisfies hk also satisfies hj.

Definition: Let hj and hk be Boolean-valued functions defined over X. Then hj is
more-general-than-or-equal-to hk (written hj ≥_g hk) if and only if

(∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]


 In the figure, the box on the left represents the set X of all instances, the box on the right
the set H of all hypotheses.
 Each hypothesis corresponds to some subset of X - the subset of instances that it classifies
positive.
 The arrows connecting hypotheses represent the more-general-than relation, with the
arrow pointing toward the less general hypothesis.
 Note that the subset of instances characterized by h2 subsumes the subset characterized by
h1; hence h2 is more-general-than h1.

FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS

FIND-S Algorithm

1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
     For each attribute constraint ai in h
       If the constraint ai is satisfied by x
       Then do nothing
       Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
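The steps above can be sketched in a few lines for the conjunctive representation used in these notes. The function name `find_s` and the tuple encoding are assumptions; "Ø" stands for "no value is acceptable" and "?" for "any value is acceptable".

```python
def find_s(examples, n_attributes=6):
    """Return the maximally specific hypothesis consistent with the positives."""
    h = ["Ø"] * n_attributes          # step 1: most specific hypothesis
    for x, label in examples:
        if label != "Yes":            # negative examples are ignored
            continue
        for i, value in enumerate(x):
            if h[i] == "Ø":           # first positive: adopt its value
                h[i] = value
            elif h[i] != "?" and h[i] != value:
                h[i] = "?"            # conflicting values: generalize to "?"
    return tuple(h)


# The four EnjoySport training examples.
EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]

print(find_s(EXAMPLES))
# ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```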


To illustrate this algorithm, assume the learner is given the sequence of training examples
from the EnjoySport task

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

 The first step of FIND-S is to initialize h to the most specific hypothesis in H

h0 = <Ø, Ø, Ø, Ø, Ø, Ø>

 Consider the first training example

x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +

Observing the first training example, it is clear that hypothesis h is too specific. None
of the "Ø" constraints in h are satisfied by this example, so each is replaced by the next
more general constraint that fits the example

h1 = <Sunny, Warm, Normal, Strong, Warm, Same>

 Consider the second training example

x2 = <Sunny, Warm, High, Strong, Warm, Same>, +

The second training example forces the algorithm to further generalize h, this time
substituting a "?" in place of any attribute value in h that is not satisfied by the new
example

h2 = <Sunny, Warm, ?, Strong, Warm, Same>

 Consider the third training example

x3 = <Rainy, Cold, High, Strong, Warm, Change>, -

Upon encountering the third training example, the algorithm makes no change to h. The FIND-S
algorithm simply ignores every negative example.

h3 = <Sunny, Warm, ?, Strong, Warm, Same>

 Consider the fourth training example

x4 = <Sunny, Warm, High, Strong, Cool, Change>, +

The fourth example leads to a further generalization of h

h4 = <Sunny, Warm, ?, Strong, ?, ?>


The key property of the FIND-S algorithm


 FIND-S is guaranteed to output the most specific hypothesis within H that is consistent
with the positive training examples
 FIND-S algorithm’s final hypothesis will also be consistent with the negative examples
provided the correct target concept is contained in H, and provided the training examples
are correct.

Questions left unanswered by FIND-S

1. Has the learner converged to the correct target concept?


2. Why prefer the most specific hypothesis?
3. Are the training examples consistent?
4. What if there are several maximally specific consistent hypotheses?


VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM

The key idea in the CANDIDATE-ELIMINATION algorithm is to output a description of the
set of all hypotheses consistent with the training examples.

Representation

Definition: consistent - A hypothesis h is consistent with a set of training examples D if and
only if h(x) = c(x) for each example (x, c(x)) in D.

Consistent(h, D) ≡ (∀ (x, c(x)) ∈ D) h(x) = c(x)

Note the difference between the definitions of consistent and satisfies:

 An example x is said to satisfy hypothesis h when h(x) = 1, regardless of whether x is
a positive or negative example of the target concept.
 An example x is said to be consistent with hypothesis h iff h(x) = c(x).

Definition: version space - The version space, denoted VS_H,D, with respect to hypothesis space
H and training examples D, is the subset of hypotheses from H consistent with the training
examples in D

VS_H,D ≡ {h ∈ H | Consistent(h, D)}

The LIST-THEN-ELIMINATE algorithm

The LIST-THEN-ELIMINATE algorithm first initializes the version space to contain all
hypotheses in H and then eliminates any hypothesis found inconsistent with any training
example.

1. VersionSpace ← a list containing every hypothesis in H
2. For each training example, (x, c(x))
     remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace

The LIST-THEN-ELIMINATE Algorithm

 LIST-THEN-ELIMINATE works in principle, so long as the hypothesis space H is finite.
 However, since it requires exhaustive enumeration of all hypotheses in H, it is not
feasible in practice.
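For EnjoySport the hypothesis space is small enough (973 semantically distinct hypotheses) that the algorithm can actually be run. The sketch below enumerates them and filters against the four training examples; the helper names are hypothetical, and labels are encoded as 1 for Yes and 0 for No.

```python
from itertools import product

# Attribute values in the order Sky, AirTemp, Humidity, Wind, Water, Forecast.
DOMAINS = [
    ("Sunny", "Cloudy", "Rainy"),
    ("Warm", "Cold"),
    ("Normal", "High"),
    ("Strong", "Weak"),
    ("Warm", "Cool"),
    ("Same", "Change"),
]

# (instance, c(x)) pairs from the EnjoySport table.
EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), 1),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), 0),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), 1),
]


def classify(h, x):
    """h(x): 1 if x satisfies all constraints of h, else 0."""
    return int(all(c == "?" or c == v for c, v in zip(h, x)))


def all_hypotheses():
    """Every hypothesis built from values and '?', plus the single empty one."""
    hs = list(product(*[values + ("?",) for values in DOMAINS]))
    hs.append(("Φ",) * len(DOMAINS))
    return hs


def list_then_eliminate(examples):
    version_space = all_hypotheses()
    for x, label in examples:
        # Keep only the hypotheses that agree with this example.
        version_space = [h for h in version_space if classify(h, x) == label]
    return version_space


vs = list_then_eliminate(EXAMPLES)
print(len(all_hypotheses()), len(vs))  # 973 6
```

Six hypotheses survive, which is the version space that the boundary-set representation describes more compactly.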


A More Compact Representation for Version Spaces

The version space is represented by its most general and least general members. These
members form general and specific boundary sets that delimit the version space within the
partially ordered hypothesis space.

Definition: The general boundary G, with respect to hypothesis space H and training data D,
is the set of maximally general members of H consistent with D

G ≡ {g ∈ H | Consistent(g, D) ∧ (¬∃g' ∈ H)[(g' >_g g) ∧ Consistent(g', D)]}

Definition: The specific boundary S, with respect to hypothesis space H and training data D,
is the set of minimally general (i.e., maximally specific) members of H consistent with D.

S ≡ {s ∈ H | Consistent(s, D) ∧ (¬∃s' ∈ H)[(s >_g s') ∧ Consistent(s', D)]}

Here g' >_g g means that g' is strictly more general than g.

Theorem: Version space representation theorem

Let X be an arbitrary set of instances and let H be a set of Boolean-valued
hypotheses defined over X. Let c: X → {0, 1} be an arbitrary target concept defined over X,
and let D be an arbitrary set of training examples {(x, c(x))}. For all X, H, c, and D such that S
and G are well defined,

VS_H,D = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥_g h ≥_g s)}

To prove:
1. Every h satisfying the right-hand side of the above expression is in VS_H,D
2. Every member of VS_H,D satisfies the right-hand side of the expression

Sketch of proof:
1. Let g, h, s be arbitrary members of G, H, S respectively, with g ≥_g h ≥_g s.
 By the definition of S, s must be satisfied by all positive examples in D. Because h ≥_g s,
h must also be satisfied by all positive examples in D.
 By the definition of G, g cannot be satisfied by any negative example in D, and because
g ≥_g h, h cannot be satisfied by any negative example in D. Because h is satisfied by all
positive examples in D and by no negative examples in D, h is consistent with D, and
therefore h is a member of VS_H,D.
2. This can be proven by assuming some h in VS_H,D that does not satisfy the right-hand side
of the expression, then showing that this leads to an inconsistency.


CANDIDATE-ELIMINATION Learning Algorithm

The CANDIDATE-ELIMINATION algorithm computes the version space containing all
hypotheses from H that are consistent with an observed sequence of training examples.

Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
• If d is a positive example
  • Remove from G any hypothesis inconsistent with d
  • For each hypothesis s in S that is not consistent with d
    • Remove s from S
    • Add to S all minimal generalizations h of s such that
      h is consistent with d, and some member of G is more general than h
    • Remove from S any hypothesis that is more general than another hypothesis in S

• If d is a negative example
  • Remove from S any hypothesis inconsistent with d
  • For each hypothesis g in G that is not consistent with d
    • Remove g from G
    • Add to G all minimal specializations h of g such that
      h is consistent with d, and some member of S is more specific than h
    • Remove from G any hypothesis that is less general than another hypothesis in G

The CANDIDATE-ELIMINATION algorithm using version spaces
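A compact sketch of the algorithm for the conjunctive EnjoySport representation is given below. It is an illustration under assumptions, not a general implementation: the more-general-than test is done syntactically, which is valid only for this conjunctive representation, each hypothesis has a single minimal generalization, and all function names are hypothetical.

```python
DOMAINS = [
    ("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
    ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change"),
]


def covers(h, x):
    """True if instance x satisfies hypothesis h."""
    return all(c == "?" or c == v for c, v in zip(h, x))


def more_general_or_equal(h1, h2):
    """Syntactic test for h1 >=_g h2 on conjunctive hypotheses."""
    if "Ø" in h2:          # h2 covers nothing, so everything is >= it
        return True
    return all(a == "?" or a == b for a, b in zip(h1, h2))


def min_generalization(s, x):
    """The unique minimal generalization of s that covers x."""
    return tuple(v if c == "Ø" else (c if c == v else "?") for c, v in zip(s, x))


def min_specializations(g, x):
    """Replace one '?' in g by a value that excludes the negative instance x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for v in DOMAINS[i] if v != x[i]]


def candidate_elimination(examples):
    G = {("?",) * len(DOMAINS)}
    S = {("Ø",) * len(DOMAINS)}
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}
            new_S = set()
            for s in S:
                h = s if covers(s, x) else min_generalization(s, x)
                if any(more_general_or_equal(g, h) for g in G):
                    new_S.add(h)
            # Drop members that are more general than another member of S.
            S = {s for s in new_S
                 if not any(s != t and more_general_or_equal(s, t) for t in new_S)}
        else:
            S = {s for s in S if not covers(s, x)}
            new_G = set()
            for g in G:
                candidates = [g] if not covers(g, x) else min_specializations(g, x)
                new_G.update(h for h in candidates
                             if any(more_general_or_equal(h, s) for s in S))
            # Drop members that are less general than another member of G.
            G = {g for g in new_G
                 if not any(g != h and more_general_or_equal(h, g) for h in new_G)}
    return S, G


EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(EXAMPLES)
print("S:", sorted(S))
print("G:", sorted(G))
```

On the four EnjoySport examples this yields a single hypothesis in S and two in G.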

An Illustrative Example

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes


The CANDIDATE-ELIMINATION algorithm begins by initializing the version space to the set of
all hypotheses in H:

Initialize the G boundary set to contain the most general hypothesis in H
G0 = {<?, ?, ?, ?, ?, ?>}

Initialize the S boundary set to contain the most specific (least general) hypothesis
S0 = {<Ø, Ø, Ø, Ø, Ø, Ø>}

 When the first training example is presented, the CANDIDATE-ELIMINATION algorithm
checks the S boundary and finds that it is overly specific: it fails to cover the positive
example.
 The boundary is therefore revised by moving it to the least more general hypothesis that
covers this new example:
S1 = {<Sunny, Warm, Normal, Strong, Warm, Same>}
 No update of the G boundary is needed in response to this training example because G0
correctly covers this example, so G1 = G0.

 When the second training example is observed, it has a similar effect of generalizing S
further to S2, leaving G again unchanged, i.e., G2 = G1 = G0:
S2 = {<Sunny, Warm, ?, Strong, Warm, Same>}


 Consider the third training example. This negative example reveals that the G boundary
of the version space is overly general, that is, the hypothesis in G incorrectly predicts
that this new example is a positive example.
 The hypothesis in the G boundary must therefore be specialized until it correctly
classifies this new negative example, while S is left unchanged:
S3 = S2
G3 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}

Given that there are six attributes that could be specified to specialize G2, why are there only
three new hypotheses in G3?
For example, the hypothesis h = (?, ?, Normal, ?, ?, ?) is a minimal specialization of G2
that correctly labels the new example as a negative example, but it is not included in G3.
The reason this hypothesis is excluded is that it is inconsistent with the previously
encountered positive examples: it does not cover the second example, whose Humidity is High.

 Consider the fourth training example.
 This positive example further generalizes the S boundary of the version space. It also
results in removing one member of the G boundary, because this member fails to
cover the new positive example:
S4 = {<Sunny, Warm, ?, Strong, ?, ?>}
G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

After processing these four examples, the boundary sets S4 and G4 delimit the version space
of all hypotheses consistent with the set of incrementally observed training examples.


INDUCTIVE BIAS

The fundamental questions for inductive inference

1. What if the target concept is not contained in the hypothesis space?


2. Can we avoid this difficulty by using a hypothesis space that includes every possible
hypothesis?
3. How does the size of this hypothesis space influence the ability of the algorithm to
generalize to unobserved instances?
4. How does the size of the hypothesis space influence the number of training examples
that must be observed?

These fundamental questions are examined in the context of the CANDIDATE-ELIMINATION
algorithm.

A Biased Hypothesis Space

 Suppose the target concept is not contained in the hypothesis space H. The obvious
solution is to enrich the hypothesis space to include every possible hypothesis.
 Consider the EnjoySport example in which the hypothesis space is restricted to include
only conjunctions of attribute values. Because of this restriction, the hypothesis space is
unable to represent even simple disjunctive target concepts such as
"Sky = Sunny or Sky = Cloudy."
 Given the following three training examples of this disjunctive concept, the algorithm would
find that there are zero hypotheses in the version space:

Example  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny   Warm     Normal    Strong  Cool   Change    Yes
2        Cloudy  Warm     Normal    Strong  Cool   Change    Yes
3        Rainy   Warm     Normal    Strong  Cool   Change    No

 If the CANDIDATE-ELIMINATION algorithm is applied, it ends up with an empty version space.
After the first two training examples,
S2 = <?, Warm, Normal, Strong, Cool, Change>

 Although this is the most specific hypothesis in H that covers the first two examples, it
is already overly general: it incorrectly covers the third (negative) training example.
So H does not include the target concept c.
 In this case, a more expressive hypothesis space is required.
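The failure can be reproduced in a few lines. The sketch below generalizes the most specific hypothesis over the two positive examples and then checks the negative one; the helper names `generalize` and `covers` are hypothetical.

```python
def covers(h, x):
    """True if instance x satisfies hypothesis h."""
    return all(c == "?" or c == v for c, v in zip(h, x))


def generalize(h, x):
    """Minimal conjunctive generalization of h that covers x."""
    return tuple(v if c == "Ø" else (c if c == v else "?") for c, v in zip(h, x))


positives = [
    ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change"),
    ("Cloudy", "Warm", "Normal", "Strong", "Cool", "Change"),
]
negative = ("Rainy", "Warm", "Normal", "Strong", "Cool", "Change")

s = ("Ø",) * 6
for x in positives:
    s = generalize(s, x)

print(s)                    # ('?', 'Warm', 'Normal', 'Strong', 'Cool', 'Change')
print(covers(s, negative))  # True
```

Since even the most specific hypothesis that covers both positives also covers the negative example, no conjunctive hypothesis is consistent with all three examples.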


An Unbiased Learner

 The solution to the problem of assuring that the target concept is in the hypothesis space H
is to provide a hypothesis space capable of representing every teachable concept, that is,
capable of representing every possible subset of the instances X.
 The set of all subsets of a set X is called the power set of X.

 In the EnjoySport learning task the size of the instance space X of days described by
the six attributes is 96 instances.
 Thus, there are 2^96 (approximately 10^28) distinct target concepts that could be defined
over this instance space and that the learner might be called upon to learn.
 The conjunctive hypothesis space is able to represent only 973 of these - a biased
hypothesis space indeed.

 Let us reformulate the EnjoySport learning task in an unbiased way by defining a new
hypothesis space H' that can represent every subset of instances
 The target concept "Sky = Sunny or Sky = Cloudy" could then be described as

(Sunny, ?, ?, ?, ?, ?) ∨ (Cloudy, ?, ?, ?, ?, ?)
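The gap between the two hypothesis spaces is easy to quantify, since a concept over 96 instances is just a subset of them. The sketch below is plain arithmetic with illustrative variable names.

```python
n_instances = 96

# Each instance is either in a concept or not: 2**96 possible subsets.
n_concepts = 2 ** n_instances

# Semantically distinct conjunctive hypotheses, as counted earlier.
n_conjunctive = 973

print(n_concepts)
print(f"fraction representable: {n_conjunctive / n_concepts:.3e}")
```

The conjunctive space covers a vanishingly small fraction of the power set, which is exactly what makes it biased.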

The Futility of Bias-Free Learning

Inductive learning requires some form of prior assumptions, or inductive bias

Definition:
Consider a concept learning algorithm L for the set of instances X.
 Let c be an arbitrary concept defined over X.
 Let D_c = {(x, c(x))} be an arbitrary set of training examples of c.
 Let L(x_i, D_c) denote the classification assigned to the instance x_i by L after training on
the data D_c.
 The inductive bias of L is any minimal set of assertions B such that for any target concept
c and corresponding training examples D_c

(∀x_i ∈ X) [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]


The figure below illustrates the following points:

 Modelling inductive systems by equivalent deductive systems.
 The input-output behavior of the CANDIDATE-ELIMINATION algorithm using a
hypothesis space H is identical to that of a deductive theorem prover utilizing the
assertion "H contains the target concept." This assertion is therefore called the inductive
bias of the CANDIDATE-ELIMINATION algorithm.
 Characterizing inductive systems by their inductive bias allows modelling them by their
equivalent deductive systems. This provides a way to compare inductive systems
according to their policies for generalizing beyond the observed training data.
