
Learning

Outline
• What is Learning?
• Rote learning
• Induction
• Decision Trees
• Neural Network
What is Learning?
• One of the most frequently heard criticisms of AI is that machines cannot be called
intelligent until they are able to learn to do new things and adapt to new
situations, rather than simply doing as they are told to do.
• Definition of Learning [Simon 1983]: changes in the system that are
adaptive in the sense that they enable the system to do the same task, or
tasks drawn from the same population, more efficiently and more
effectively the next time.

• Learning covers a wide range of phenomena:
– Skill refinement: practice makes skills improve. The more you play
tennis, the better you get.
– Knowledge acquisition: knowledge is generally acquired through
experience.
Various learning mechanisms
• Simple storing of computed information, or rote learning, is the most basic
learning activity.
– Many computer programs, e.g., database systems, can be said to learn in this sense.
• Another way we learn is through taking advice from others.
– Advice taking is similar to rote learning, but high-level advice may not be in a form
simple enough for a program to use directly in problem solving. The advice needs to be
operationalized.
• People also learn through their own problem-solving experience.
• Learning from examples: we often learn to classify things in the world
without being given explicit rules.
– Learning from examples usually involves a teacher who helps us classify things by
correcting us when we are wrong.
Rote Learning
• When a computer stores a piece of data, it is performing a
rudimentary form of learning.
 Pros
• In the case of data caching, we store computed values so that we do
not have to recompute them later.
• When computation is more expensive than recall, this strategy
can save a significant amount of time.
 Caching has been used in AI programs to produce some
surprising performance improvements.
 Such caching is known as rote learning.
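A minimal sketch of such caching in Python; the evaluate function and its toy body are illustrative stand-ins, not from the slides:

from functools import lru_cache

@lru_cache(maxsize=None)            # stores every computed value (rote learning)
def evaluate(position):
    # stand-in for an expensive computation, e.g. a deep game-tree search
    return sum(ord(c) for c in position) % 100

print(evaluate("board-42"))         # computed and stored
print(evaluate("board-42"))         # recalled from the cache, not recomputed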
Rote Learning: Example

• The program used a static evaluation function and used that score to
continue the search.

• When it finished searching and propagated the values backward, it had a
score for the position represented by the root.

• It also recorded the board position at the root and the backed-up score
that had just been computed for it.

• Later, instead of using the static evaluation function to compute a score
for position A, the stored value for A can be used.

[Figure: Storing Backed-up Values]
Learning from examples: Induction
• Classification is the process of assigning, to a particular input, the name of a class to
which it belongs.

• The classes from which the classification procedure can choose can be described in a
variety of ways.

• Their definition will depend on the use to which they are put.

• Classification is an important component of many problem solving tasks.

• Simplest case: a straightforward recognition task
– What letter of the alphabet is this?
• Often classification is embedded inside another operation, e.g. the rule:

If: the current goal is to get from place A to place B, and there is a WALL separating the two places,
then: look for a DOORWAY in the WALL and go through it.

– To use this rule successfully, the system’s matching routine must be able to identify an object as a wall.
– Without this, the rule can never be invoked.
– Then, to apply the rule, the system must be able to recognize a doorway.
Concept learning
• The idea of producing a classification program that can
evolve its own class definitions is called concept learning
or induction.
• The techniques used for this task depend on the way
that classes (concepts) are described.
• If classes are described by scoring functions, concept
learning can be done using the technique of coefficient
adjustment.
LEARNING
[Through Classification]

Priyam Biswas
Learning Algorithms
• Supervised
– Class information exists
[Figure: training examples grouped into Class 1, Class 2, Class 3 and Class 4]
• Unsupervised
– No class information
Learning Agents
• Learns a classifier from examples: training stage
• Classifies an unknown instance: testing or classification stage
Learning Algorithms
• Supervised
– Nearest Neighbor / k-Nearest Neighbor (a sketch follows this list)
– Decision Tree
– Neural Network / Artificial Neural Network
– Support Vector Machines
– Bayesian Theory
• Bayesian network / Belief Network
• Unsupervised
– Clustering Algorithms
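As a sketch of the Nearest Neighbor entry above, a minimal 1-nearest-neighbour classifier in Python (the function name and the toy data are illustrative):

def nearest_neighbour(query, examples):
    # examples: list of (feature_vector, class_label) pairs
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(examples, key=lambda ex: dist2(ex[0], query))[1]

train = [((0.0, 0.0), "Class 1"), ((1.0, 1.0), "Class 2")]
print(nearest_neighbour((0.9, 0.8), train))   # -> Class 2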
Decision Tree (DT)
• Easy to understand
• Can be expressed as if-then-else rules
• Builds a classification tree from examples
– Top down methodology
– Divide and conquer
• Uses the DT to classify unknown samples
Decision Tree: An example
• Classify these objects
• Objects are represented by color, texture (edge information) and shape
• Class information is given
Decision Tree: An example
[Figure: the tree built from the example objects]

colour?
  Red  -> texture?
            three edges -> Red Triangle   (Leaf 1)
            four edges  -> Red Square     (Leaf 2)
  Blue -> shape?
            Square      -> Blue Square    (Leaf 3)
            Circular    -> Blue Circle    (Leaf 4)


Rules from Decision Tree (DT)
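The rules read off the example tree above can be written as if-then-else code; here is a sketch in Python (the function name and the exact attribute encodings are illustrative):

def classify(colour, texture, shape):
    if colour == "Red":
        if texture == "three edges":
            return "Red Triangle"        # Leaf 1
        else:                            # four edges
            return "Red Square"          # Leaf 2
    else:                                # Blue
        if shape == "Square":
            return "Blue Square"         # Leaf 3
        else:                            # Circular
            return "Blue Circle"         # Leaf 4

print(classify("Red", "three edges", None))   # -> Red Triangle
print(classify("Blue", None, "Circular"))     # -> Blue Circle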
LEARNING
[With Neural Network]

Priyam Biswas
Artificial Neural Network
[Figure: input features x1, x2, x3, …, xm feed into a network whose outputs correspond to Class 1, Class 2, …, Class n; the network decides which class an unknown input belongs to]
ANN AS A LINEAR CLASSIFIER
Consider the AND and OR functions:

x1  x2 | AND  Class | OR  Class
 0   0 |  0     B   |  0    B
 0   1 |  0     B   |  1    A
 1   0 |  0     B   |  1    A
 1   1 |  1     A   |  1    A
ANN AS A LINEAR CLASSIFIER
Can we separate the two classes of the AND or OR
function (see the table above) using a straight line?

ANN AS A LINEAR CLASSIFIER
[Figure: the four input patterns plotted in the (x1, x2) plane, with a straight line separating class A from class B]

• The line x1 + x2 - 1/2 = 0 separates the two classes of the OR function:
a perceptron (neuron) with weights 1, 1 and bias -1/2 realizes the OR function.
• The line x1 + x2 - 3/2 = 0 separates the two classes of the AND function:
a perceptron (neuron) with weights 1, 1 and bias -3/2 realizes the AND function.
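A quick sketch in Python checking that the two lines above reproduce the AND and OR truth tables (the function name fires is illustrative):

def fires(w1, w2, w0, x1, x2):
    return 1 if w1 * x1 + w2 * x2 + w0 > 0 else 0   # hard threshold

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    or_out  = fires(1, 1, -0.5, x1, x2)   # x1 + x2 - 1/2 > 0  -> class A
    and_out = fires(1, 1, -1.5, x1, x2)   # x1 + x2 - 3/2 > 0  -> class A
    print((x1, x2), "OR:", or_out, "AND:", and_out)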
A 2 INPUT PERCEPTRON
[Figure: inputs x1 and x2, with weights w1 and w2 and bias w0, feed a summing unit Σ followed by a threshold function f]

• The decision boundary is
w1·x1 + w2·x2 + w0 = 0
i.e. wT·x + w0 = 0, a straight line.
An l INPUT PERCEPTRON
[Figure: inputs x1, x2, …, xl, with weights w1, w2, …, wl and bias w0, feed a summing unit Σ followed by a threshold function f]

• The decision boundary is
g(x) = wT·x + w0 = 0, a hyperplane.
THE MAIN ISSUE IN ANN
• Given samples from class ω1 and class ω2, how do we determine the
weight vector w that separates the classes?
Introduction of Perceptron
• In 1957, Rosenblatt and several other researchers
developed the perceptron, which used a network similar to the
one proposed by McCulloch and Pitts, together with a learning
rule for training the network to solve pattern recognition
problems.

• (*) But this model was later criticized by Minsky,
who proved that a single-layer perceptron cannot solve the XOR problem.
Introduction of Perceptron
[Figure: samples of classes ω1 and ω2 in the (x1, x2) plane; assume an initial linear classifier]
[Figure: the weight vector w is modified to improve the classifier]
Perceptron Training Process
1. Set up the network input nodes, layer(s), and connections
2. Randomly assign the weights Wij and the bias θj
3. Input the training sets Xi (preparing the targets Tj for verification)
4. Training computation:

netj = Σi Wij·Xi − θj

Yj = 1 if netj > 0
Yj = 0 if netj ≤ 0
The training process
4. Training computation (continued):
If (Tj − Yj) ≠ 0 then:
ΔWij = η·(Tj − Yj)·Xi
Δθj = −η·(Tj − Yj)

Update the weights and bias:
Wij(new) = Wij + ΔWij
θj(new) = θj + Δθj

5. Repeat steps 3–4 until every input pattern is satisfied, i.e.
(Tj − Yj) = 0 for all patterns
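A minimal sketch of this training loop in Python (the function and variable names are chosen here for illustration; the update rule is the one given above, ΔWij = η(Tj − Yj)Xi and Δθj = −η(Tj − Yj)):

def train_perceptron(patterns, targets, w, theta, eta=0.1, max_epochs=100):
    for _ in range(max_epochs):
        all_ok = True
        for x, t in zip(patterns, targets):
            net = sum(wi * xi for wi, xi in zip(w, x)) - theta   # net = W.X - theta
            y = 1 if net > 0 else 0                              # hard threshold
            err = t - y
            if err != 0:                                         # update only on error
                w = [wi + eta * err * xi for wi, xi in zip(w, x)]
                theta = theta - eta * err
                all_ok = False
        if all_ok:                                               # every pattern satisfied
            return w, theta
    return w, theta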
The recall process
After the network has been trained as described above,
any input vector X can be sent into the perceptron
network.
The trained weights Wij and the bias θj are used
to derive netj and, therefore, the output Yj can be
obtained for pattern recognition.
Exercise1: Solving the OR problem
• Let the training patterns be as follows.
X1  X2 | T
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 1

[Figure: a single-node perceptron with inputs X1 and X2, weights W11 and W21, threshold f and output Y; the plot shows the four patterns and the separating line]
• Let W11=1, W21=0.5, Θ=0.5
Let learning rate η=0.1
• The initial net function is:
net = W11•X1+ W21•X2 -Θ
i.e. net = 1•X1 + 0.5•X2 - 0.5
• Feed the input pattern into network one by one
(0,0), net= -0.5, Y=0, (T-Y)= 0 O.K.
(0,1), net= 0, Y=0, (T-Y)=1-0=1 Need to update weights
(1,0), net= 0.5, Y=1, (T-Y)= 0 O.K.
(1,1), net= 1, Y=1, (T-Y)= 0 O.K.
• Update weights for pattern (0,1):
ΔW11=(0.1)(1)(0)= 0, W11= W11+ΔW11=1
ΔW21=(0.1)(1)(1)= 0.1, W21=0.5+0.1=0.6
ΔΘ=-(0.1)(1)=-0.1, Θ=0.5-0.1=0.4
• Applying new weights to the net function:
net=1•X1+0.6•X2-0.4
• Verify the pattern (0,1) to see if it satisfies the expected output.
(0,1), net= 0.2, Y=1, (T-Y)=0 O.K.
• Feed the next input patterns, again, one by one
(1,0), net= 0.6, Y=1, (T-Y)=0 O.K.
(1,1), net= 1.2, Y=1, (T-Y)=0 O.K.
• Since the first pattern (0,0) has not been tested with the new
weights, feed it again.
(0,0), net=-0.4, Y=0, (T-Y)=0 O.K.
• Now all the patterns are satisfied. Hence, the network has been
successfully trained for the OR patterns.
The final network is: net = 1•X1 + 0.6•X2 - 0.4

(This is not the only solution, other solutions are possible.)


• The trained network is formed as follows:
[Figure: trained perceptron with weights W11 = 1, W21 = 0.6 and bias Θ = 0.4]

• Recall process:
Once the network is trained, we can apply any two-element
vector as a pattern and feed the pattern into the network for
recognition. For example, we can feed (1,0) into the
network:
(1,0), net= 0.6, Y=1
Therefore, this pattern is recognized as 1.
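Using the train_perceptron sketch given earlier (names are illustrative), Exercise 1 can be reproduced with the same initial values:

w, theta = train_perceptron(
    patterns=[(0, 0), (0, 1), (1, 0), (1, 1)],
    targets=[0, 1, 1, 1],
    w=[1.0, 0.5], theta=0.5, eta=0.1)
print(w, theta)   # -> weights of about [1, 0.6] and bias about 0.4,
                  #    i.e. net = 1*X1 + 0.6*X2 - 0.4 as derived above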
Exercise2: Solving the AND problem
• Let the training patterns be as follows.

X1  X2 | T
 0   0 | 0
 0   1 | 0
 1   0 | 0
 1   1 | 1

[Figure: the four patterns in the (X1, X2) plane with the separating line f1]
Sol:
Let W11=0.5, W21=0.5, Θ=1, and η=0.1
• The initial net function is:
net = 0.5X1 + 0.5X2 - 1
• Feed the input pattern into network one by one
(0,0), net=-1, Y=0, (T-Y)=0, O.K.
(0,1), net=-0.5, Y=0, (T-Y)=0, O.K.
(1,0), net=-0.5, Y=0, (T-Y)=0, O.K.
(1,1), net=0, Y=0, (T-Y)=1, Need to update weights
• Update weights for pattern (1,1), which does not
satisfy the expected output:
ΔW11=(0.1)(1)(1)= 0.1, W11=0.5+0.1=0.6
ΔW21=(0.1)(1)(1)= 0.1, W21=0.5+0.1=0.6
ΔΘ=-(0.1)(1)=-0.1, Θ=1-0.1=0.9
• Applying new weights to the net function:
net=0.6X1 + 0.6X2 - 0.9

• Verify the pattern (1,1) to see if it satisfies the expected output.
(1,1), net= 0.3, Y=1, (T-Y)=0 O.K.
• Since the previous patterns have not been tested with the new
weights, feed them again.
(0,0), net=-0.9, Y=0, (T-Y)=0 O.K.
(0,1), net=-0.3, Y=0, (T-Y)=0 O.K.
(1,0), net=-0.3, Y=0, (T-Y)=0 O.K.

• The pattern recognition function for the AND pattern
is: net = 0.6X1 + 0.6X2 - 0.9

[Figure: trained perceptron with weights W11 = 0.6, W21 = 0.6 and bias Θ = 0.9]
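With the same train_perceptron sketch (illustrative names), Exercise 2 gives the result above:

w, theta = train_perceptron(
    patterns=[(0, 0), (0, 1), (1, 0), (1, 1)],
    targets=[0, 0, 0, 1],
    w=[0.5, 0.5], theta=1.0, eta=0.1)
print(w, theta)   # -> weights of about [0.6, 0.6] and bias about 0.9,
                  #    i.e. net = 0.6*X1 + 0.6*X2 - 0.9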
Example: Solving the XOR problem
• Let the training patterns be as follows.
X1  X2 | T
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 0

[Figure: the four patterns P1(0,0), P2(0,1), P3(1,0), P4(1,1) in the (X1, X2) plane; no single line separates the two classes, but the two lines f1 and f2, combined by f3, do]
• Let W11=1.0, W21= -1.0, Θ=0.
If we choose a one-layer network, it can be shown that the
network will not converge. This is because the XOR
problem is a non-linear problem, i.e., one single linear
function is not enough to recognize the pattern. Therefore, the
solution is to add one hidden layer to provide extra functions.

[Figure: two-layer network - inputs X1 and X2 feed hidden nodes f1 (weights W11, W21, bias Θ1) and f2 (weights W12, W22, bias Θ2); the outputs of f1 and f2 feed the output node f3 (bias Θ3)]
• The following pattern table is formed:
X1  X2 | f1  f2 | T3
 0   0 |  0   0 |  0
 0   1 |  0   1 |  1
 1   0 |  0   1 |  1
 1   1 |  1   1 |  0
• So we solve f1 as an AND problem and f2 as an OR problem.
• Assume the weights we found are:
W11=0.3, W21=0.3, W12=1, W22=1, θ1=0.5, θ2=0.2
• Therefore the network functions for f1 and f2 are:
f1 = 0.3X1 + 0.3X2 - 0.5
f2 = 1•X1 + 1•X2 - 0.2
• Now we need to feed the inputs one by one to train the
network for f1 and f2 separately. This is to satisfy the
expected output for f1 using T1 and for f2 using T2.
• Finally, we use the outputs of f1 and f2 as the input pattern to train
the network for f3. We can derive the following result:
f3 = 1•X13 - 0.5•X23 + 0.1

[Figure: final two-layer network - f1 has weights 0.3, 0.3 and Θ = 0.5; f2 has weights 1, 1 and Θ = 0.2; the output node Y has weights 1 (from f1) and -0.5 (from f2) and Θ = -0.1]
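A minimal sketch of this two-layer network in Python. The hidden-node weights are the ones quoted above (f1 behaves like AND, f2 like OR); the output-node weights used here (-1 from f1, +1 from f2, bias -0.5) are one working choice picked for illustration and differ slightly from the numbers on the slide:

def step(net):
    return 1 if net > 0 else 0               # hard threshold, as in the training rule

def xor_net(x1, x2):
    f1 = step(0.3 * x1 + 0.3 * x2 - 0.5)      # hidden node f1 (AND-like)
    f2 = step(1.0 * x1 + 1.0 * x2 - 0.2)      # hidden node f2 (OR-like)
    return step(-1.0 * f1 + 1.0 * f2 - 0.5)   # output node: "f2 and not f1"

for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, xor_net(*p))                     # -> 0, 1, 1, 0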
