
ARTIFICIAL INTELLIGENCE
UNIT 5: LEARNING

Dr. Madhu Bala Myneni


Professor, CSE

TEXTBOOKS
 1. Artificial Intelligence A Modern Approach, Stuart Russell and Peter Norvig, 3rd
Edition, Pearson Education
 2. Artificial Intelligence, Kevin Knight, Elaine Rich, B. Shivashankar Nair, 2nd
Edition, 2008
 3. Artificial Neural Networks, B. Yagna Narayana, PHI

https://people.engr.tamu.edu/guni/csce421/files/AI_Russell_Norvig.pdf

2nd/3rd edition


UNIT-5: CONTENTS
LEARNING:
• What is learning
• Learning by Taking Advice
• Learning in Problem-solving
• Learning from example: induction
• Explanation-based learning(EBL)
• Introduction to Neural Networks
• Different types of Learning in Neural Networks
• Applications of Neural Networks
• Recurrent Networks

WHAT IS LEARNING?
• Learning is the improvement of performance with experience over time.
• The learning element is the portion of a learning AI system that decides how to
modify the performance element and implements those modifications.
• A system learns new knowledge through different methods, depending on the type of material to be learned, the amount of relevant expertise already possessed, and the environment in which the learning takes place.

• There are five methods of learning. They are,


1. Memorization: rote learning
2. Direct instruction: being told / taking advice
3. Analogy: problem solving
4. Induction: learning from examples - classification
5. Deduction: deriving new knowledge from existing facts by logical inference


WHAT IS LEARNING?
Learning by taking advice
• Initial state: high-level advice
• Final state: an operational rule
• Operators: unfolding definitions, case analysis, matching, etc.
Learning from examples
• Initial state: collection of positive and negative examples
• Final state: concept description
• Search algorithms: candidate elimination, induction of decision trees
Learning in problem-solving
• Initial state: solution traces to example problems
• Final state: new heuristics for solving new problems efficiently
• Heuristics for search: generalization, explanation-based learning, utility considerations
Discovery
• Initial state: some environment
• Final state: unknown
• Heuristics for search: interestingness, analogy, etc.

GENERAL LEARNING MODEL


• Learning can be accomplished using different methods, such as
• by memorizing facts,
• by being told, or
• by studying examples, such as problem solutions.
• Learning requires that new knowledge structures be created from some form of input
stimulus.
• This new knowledge must then be assimilated into a knowledge base and be tested in
some way for its utility.

• Testing means that the knowledge should be used in the performance of some task from which meaningful feedback can be obtained, where the feedback provides some measure of the accuracy and usefulness of the newly acquired knowledge.


GENERAL LEARNING MODEL


• The environment has been included as a part of the overall learner system.
• user working at a keyboard
• program modules to simulate a particular environment
• real physical sensors
• The environment may be regarded as
• a form of nature which produces random stimuli, or
• a more organized training source, such as a teacher, which provides selected training examples for the learner component.

GENERAL LEARNING MODEL


• The actual form of environment used will depend on the particular learning paradigm.
• Some representation language must be assumed for communication between the
environment and the learner.
• The same language is used in both the representation scheme and in the knowledge
base.
• Inputs to the learner component may be physical stimuli of some type, or descriptive, symbolic training examples.
• The information conveyed to the learner component is used to create and modify knowledge structures in the knowledge base.
• These structures are used by the performance component to carry out some task, such as solving a problem, playing a game, or classifying instances of some concept.
• Feedback is essential to the learner component to know
• if the knowledge structures in the knowledge base were improving or
• if they were adequate for the performance of the given tasks.
• The feedback may be a simple yes or no type of evaluation, or it may contain more
useful information describing why a particular action was good or bad.


MEMORIZATION (ROTE LEARNING)


• Rote learning is the basic learning activity.
• It is a memorization technique based on repetition.
• New knowledge is simply copied into the knowledge base without any modification.
• Because computed values are stored, this technique can save a significant amount of time.
• Rote learning can be used within complex learning systems, provided two capabilities are in place (see the sketch below): stored values must be retrievable faster than they can be recomputed, and generalization must keep the amount of stored information at a manageable level.
Example: a checkers-playing program that stores computed board values.
Capabilities:
• Organized storage of information: indexing must be maintained so that retrieving a stored value is faster than recomputing it.
• Generalization: the number of stored objects must be kept at a manageable level.
• The more one repeats the material, the more quickly one can recall it.
• Alternatives to rote learning include meaningful learning, associative learning, and active learning.
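The sketch below treats rote learning as simple value caching in Python. The stored_values table, the position strings, and the evaluate() function are illustrative assumptions, not the checkers program's real evaluator.

# Rote learning as value caching: computed results are copied into the
# knowledge base unchanged and recalled instead of being recomputed.
stored_values = {}   # knowledge base: position -> previously computed score

def evaluate(position):
    # Stand-in for an expensive static evaluation of a board position.
    return sum(ord(c) for c in position) % 100

def rote_evaluate(position):
    # Organized storage: indexing by position makes lookup faster than recomputation.
    if position in stored_values:
        return stored_values[position]
    score = evaluate(position)
    stored_values[position] = score   # stored without any modification
    return score

print(rote_evaluate("white-king-on-22"))   # computed, then stored
print(rote_evaluate("white-king-on-22"))   # recalled from the knowledge base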

LEARNING BY TAKING ADVICE


• Simple form of learning.
• Suppose a programmer writes a set of instructions to instruct the computer what
to do, the programmer is a teacher and the computer is a student. Once learned
(i.e. programmed), the system will be in a position to do new things.
• The advice may come from many sources: human experts, and internet sources.
• Requires more inference than rote learning.
• The knowledge must be transformed into an operational form before being stored
in the knowledge base.
• The reliability of the source of knowledge should be considered.
• The system should ensure that the new knowledge does not conflict with the existing knowledge.
• FOO (First Operational Operationaliser), for example, learns the game of Hearts.
• It converts advice in the form of principles, problems, and methods into effective executable (LISP) procedures (or knowledge). This knowledge is then ready to use.


LEARNING BY TAKING ADVICE


• FOO, Which accepts advice for playing hearts, a card game.
• A human user first translates the advice from English into a representation that
FOO can understand.
• For example, "Avoid taking points" becomes: (avoid (take_points me) (trick))
• FOO must operationalize this advice by turning it into an expression that contains
concepts and actions FOO can use when playing the game of hearts. One
strategy FOO can follow is to UNFOLD an expression by replacing some term
with its definition. By UNFOLDing the definition of avoid, FOO comes up with:
(achieve (not (during (trick) (take-points me))))
• FOO considers the advice to apply to the player called "me." Next, FOO
UNFOLDs the definition of trick:
(achieve (not (during
                (scenario
                  (each p1 (players) (play-card p1))
                  (take-trick (trick-winner)))
                (take-points me))))

LEARNING BY TAKING ADVICE


• In other words, the player should avoid taking points during the scenario
consisting of
(1) players playing cards and
(2) one player taking the trick.
• FOO then uses case analysis to determine which steps could cause one to take
points. It rules out step 1 on the basis that it knows of no intersection of the
concepts of take-points and play-card.
• But step 2 could affect taking points, so FOO UNFOLDs the definition of take-
points:
(achieve (not (there-exists c1 (cards-played)
(there-exists c2 (point-cards)
(during (take (trick-winner) c1)
(take me c2))))))


LEARNING BY TAKING ADVICE


• This advice says that the player should avoid taking point cards during the
process of the trick-winner taking the trick.
• The question for FOO now is:
Under what conditions does (take me c2) occur during (take (trick-winner) c1)?
• Using a partial match technique, FOO hypothesizes that points will be taken if
me = trick-winner and c2 = c1.

• It transforms the advice into:


(achieve (not (and (have-points (cards-played))
                   (= (trick-winner) me))))

LEARNING BY TAKING ADVICE


• This means "Do not win a trick that has points." We have not traveled very far
conceptually from "avoid taking points," but it is important to note that the current
vocabulary is one that FOO can understand in terms of actually playing the game
of hearts.
• Through several other transformations, FOO eventually settles on:
(achieve (>= (and (in-suit-led (card-of me))
                  (possible (trick-has-points)))
             (low (card-of me))))
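The UNFOLD step can be illustrated with a small Python sketch that replaces an operator in an expression (encoded here as nested lists) by its definition. The definitions table and list encoding are simplified assumptions; FOO's real representation is LISP.

# One illustrative definition: (avoid goal during) -> (achieve (not (during during goal)))
definitions = {
    "avoid": lambda goal, during: ["achieve", ["not", ["during", during, goal]]],
}

def unfold(expr):
    # Recursively replace any defined operator by the result of applying its definition.
    if not isinstance(expr, list):
        return expr
    head, *args = expr
    if head in definitions:
        return unfold(definitions[head](*[unfold(a) for a in args]))
    return [head] + [unfold(a) for a in args]

advice = ["avoid", ["take-points", "me"], ["trick"]]
print(unfold(advice))
# -> ['achieve', ['not', ['during', ['trick'], ['take-points', 'me']]]]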


LEARNING IN PROBLEM-SOLVING
Learning by Parameter Adjustment
• In a static evaluation function, the program uses a polynomial of the form
  c1t1 + c2t2 + ... + c16t16
• The t terms are the values of the sixteen features contributing to the evaluation. The c terms are the coefficients (weights) attached to each of these values.
• As learning progresses, the c values will change.

The most important questions in the design of a learning program based on parameter adjustment are:
1. "When should the value of a coefficient be increased and when should it be decreased?"
2. "By how much should the value be changed?"
Question 1: The coefficients of terms that predicted the final outcome well should be increased, while the coefficients of poor predictors should be decreased.
Question 2: The values should be adjusted gradually until system performance stops improving.
• In some domains, assigning this credit is easy.
• In pattern classification programs, coefficients can be adjusted as soon as a prediction is known to be correct or incorrect.
• In game-playing programs, credit cannot be assigned until the end of the game. A sketch of one adjustment step follows.
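The sketch below shows one adjustment step for a linear evaluation function, assuming a simple update rule (increase coefficients of terms that agreed with a winning outcome, decrease otherwise). The feature values and learning rate are illustrative.

def evaluate(features, coeffs):
    # score = c1*t1 + c2*t2 + ... for one position
    return sum(c * t for c, t in zip(coeffs, features))

def adjust(coeffs, features, outcome, rate=0.1):
    # outcome is +1 after a win and -1 after a loss (credit assigned at game end)
    return [c + rate * outcome * t for c, t in zip(coeffs, features)]

coeffs = [0.5, 0.5, 0.5]        # the c terms (weights)
features = [1.0, -2.0, 0.5]     # the t terms measured in one position
print(evaluate(features, coeffs))
coeffs = adjust(coeffs, features, outcome=+1)   # the position led to a win
print(evaluate(features, coeffs))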

LEARNING IN PROBLEM-SOLVING
Learning with Macro-Operators
• Sequences of actions that can be treated as a whole are called Macro-operations.
• Example: START-CAR is an automatic action, even though it consists of several actions
as macro operators like sitting down, adjusting the mirror, inserting the key, and turning
the key.
• Macro-operators were used in STRIPS, an early problem-solving system that included a learning component.
• After each problem-solving episode, the learning component takes the computed plan
and stores it as a macro-operator, or MACROP.
• A MACROP consists of a sequence of actions, not a single one.
• A MACROP's preconditions are the initial conditions of the problem just solved, and its
postconditions correspond to the goal just achieved.
• In its simplest form, storing previously computed plans in this way resembles rote learning.


LEARNING IN PROBLEM-SOLVING
Learning with Macro-Operators (cont…)
• Suppose that, in an initial blocks-world situation, ON(C, B) and ON(A, Table) are both true.
• STRIPS can achieve the goal ON(A, B) by devising a plan with the four steps
UNSTACK(C, B), PUTDOWN(C), PICKUP(A), STACK(A, B).
• STRIPS now builds MACROP with preconditions ON(C, B), ON(A,Table) and
postconditions ON(C, Table), ON(A, B).

• STRIPS can generalize the plan to the steps UNSTACK(x1, x2), PUTDOWN(x1), PICKUP(x3), STACK(x3, x2), where x1, x2, and x3 are variables.

• This plan can then be stored with preconditions ON(x1, x2), ON(x3, Table) and postconditions ON(x1, Table), ON(x3, x2).
• Such a MACROP can now apply in a variety of situations.

LEARNING IN PROBLEM-SOLVING
Learning with Macro-Operators(cont…)
• Suppose our domain included an operator called STACK-ON-B(x), with preconditions that both x
and B be clear, and with postcondition ON(x, B). Consider the same problem as above:

• STRIPS might come up with the plan UNSTACK(C, B), PUTDOWN(C), STACK-ON-B(A).
• Let’s generalize this plan and store it as a MACROP.
• The precondition becomes ON(x3, x2), the postcondition becomes ON(x1, x2), and the plan itself
becomes UNSTACK(x3, x2), PUTDOWN(x3), STACK-ON-B(x1).
• Now apply the generalized MACROP with x1 = A, x2 = C, and x3 = E. Its preconditions are satisfied, so we construct the plan UNSTACK(E, C), PUTDOWN(E), STACK-ON-B(A).
• But this plan does not work: it achieves ON(A, B), not the MACROP's stated postcondition ON(A, C). The postcondition of the MACROP is overly general.


LEARNING IN PROBLEM-SOLVING
Learning with Macro-Operators (cont…)
• In reality, STRIPS uses a more complex generalization procedure.
• First, all constants are replaced by variables.
• Then, for each operator in the parameterized plan, STRIPS reevaluates its preconditions.
• In our example, the preconditions of steps 1 and 2 are satisfied, but the only way to ensure that B is clear for step 3 is to assume that block x2, the block removed by the UNSTACK operator, is block B.
• By "re-proving" that the generalized plan works, STRIPS locates constraints of this kind.
• Macro-operators are especially suitable for domains with nonserializable subgoals.
• Nonserializability means that working on one subgoal will necessarily interfere with the previous solution to another subgoal.
• Macro-operators can be useful in such cases since one macro-operator can produce a
small global change in the world, even though the individual operators that make it up
produce many undesirable local changes.

LEARNING IN PROBLEM-SOLVING
Learning by Chunking
• Chunking is a process similar to macro-operators.
• The idea of chunking comes from the psychological literature on memory and problem-
solving.
• Its computational basis is in production systems, of the type used in the SOAR system.
• SOAR exploits chunking, so that its performance can increase with experience.
• In fact, the designers of SOAR hypothesize that chunking is a universal learning method,
i.e., it can account for all types of learning in intelligent systems.
• SOAR solves problems by firing productions, which are stored in long-term memory.
• Some of those firings turn out to be more useful than others.
• When SOAR detects a useful sequence of production firings, it creates a chunk, which is
essentially a large production that does the work of an entire sequence of smaller ones.
As in MACROPs, chunks are generalized before they are stored.

• Chunks are used to learn general search control knowledge in addition to operator
sequences.


LEARNING IN PROBLEM-SOLVING
Learning by Chunking(cont…)
For example, if SOAR tries several different operators, but only one leads to a useful path in the
search space, then SOAR builds productions that help it choose operators more wisely in the future.
Chunking can be used to replicate the macro-operator results: in solving the 8-puzzle,
• SOAR learns how to place a given tile without disturbing the previously placed tiles.
• Several chunks may encode a single macro-operator, and one chunk may participate in several
macro sequences.
• Chunks are generally applicable toward any goal state. This contrasts with macro tables, which
are structured toward reaching a particular goal state from any initial state.
• Also, chunking emphasizes how learning can occur during problem-solving, while macro tables
are usually built during a preprocessing stage.
• As a result, SOAR can learn within trials as well as across trials.
• Chunks learned during the initial stages of solving a problem are applicable in the later stages of
the same problem-solving episode. After a solution is found, the chunks remain in memory, ready
for use in the next problem.
• The price that SOAR pays for this generality and flexibility is high.
• Chunking is inadequate for duplicating the contents of large, directly-computed macro-operator
tables.

LEARNING FROM EXAMPLES: INDUCTION


• Classification is the process of assigning a label or class for each input.
• Classification is an important component of many problem-solving tasks.
• Example: "What letter of the alphabet is this?“
• Before classification can be done, the classes it will use must be defined.
• Producing a classification program that can construct class definitions is called concept learning,
or induction.
• The techniques used for this task depend on the way that classes (concepts) are described.
• If classes are described by scoring functions, then concept learning can be done using the technique of coefficient adjustment.
• Approach: isolate a set of features that are relevant to the task domain.
• Define each class by a weighted sum of values of these features.
• Each class is then defined by a scoring function that looks very similar to the scoring functions often used in other situations, such as game playing.
• The function has the form c1t1 + c2t2 + c3t3 + ..., where each t is a parameter/feature value and each c is a weight.
• Negative weights indicate features whose presence usually constitutes negative evidence for a given class.


LEARNING FROM EXAMPLES: INDUCTION


• Example 1: The task is weather prediction
• The parameters are measurements such as rainfall and the location of cold fronts.
• The function can be written to combine these parameters to predict sunny, cloudy, rainy,
or snowy weather.
• A second approach: isolate a set of features that are relevant to the task domain, and define each class as a structure composed of those features.
Example 2: The task is to identify animals
• The body of each type of animal can be stored as a structure, with various features
representing such things as color, length of neck, and feathers.
Techniques for learning class definition:
• Winston's Learning Program
• Version spaces
• Decision Tree

LEARNING FROM EXAMPLES: INDUCTION


Winston's Learning Program
Winston describes a structural concept learning program, which operates on a simple blocks-world domain.
Examples: House, Tent, Arch, etc.
Near miss: an object that is not an actual instance of the concept but is very similar to one.
Example: A structural description for the House: Node A
represents the entire structure, which is composed of
two parts: node B, a Wedge, and node C, a Brick.


LEARNING FROM EXAMPLES: INDUCTION


• Figures (b) and (c) show descriptions of the two Arch structures.

The basic approach to Winston's program:


1. Begin with a structural description of one known instance of the concept. Call that description the concept
definition.
2. Examine descriptions of other known instances of the concept. Generalize the definition to include them.
3. Examine descriptions of near misses of the concept. Restrict the definition to exclude these.
4. Steps 2 and 3 of this procedure can be interleaved.
5. Steps 2 and 3 of this procedure rely heavily on a comparison process by which similarities and differences
between structures can be detected.

LEARNING FROM EXAMPLES: INDUCTION


Winston's Learning Program


LEARNING FROM EXAMPLES: VERSION SPACES


• Mitchell describes another approach to concept learning called version spaces.
• The goal is to produce a description that is consistent with all positive examples but no
negative examples in the training set.
• Winston's system did this by evolving a single concept description.
• Version spaces work by maintaining a set of possible descriptions and evolving that set
as new examples and near misses are presented.
• A frame-based language is used. For example, a frame representing an individual car.

• The choice of features and values is called the bias of a learning system.
• A clear statement of the bias of a learning system is important to its evaluation.

LEARNING FROM EXAMPLES: VERSION SPACES


• x1, x2, x3 are variables.
• G is the set of the most general descriptions consistent with the training examples, and
• S is the set of the most specific descriptions consistent with the training examples.


LEARNING FROM EXAMPLES: VERSION SPACES


The algorithm for narrowing the version space is called the candidate elimination algorithm.

LEARNING FROM EXAMPLES: VERSION SPACES


Example: learning the concept "Japanese economy car" from 5 training examples (3 positive and 2 negative)



LEARNING FROM EXAMPLES: VERSION SPACES


Candidate elimination algorithm (Example: Japanese economy car)
Example 1: Initially, G and S both start as singleton sets. G contains a null description and S contains the first positive training example.
G = {(x1, x2, x3, x4, x5)}
S = {(Japan, Honda, Blue, 1980, Economy)}
Example 2: The G set must be specialized so that the negative example is no longer in the version space. Specialization involves replacing variables with constants. The available specializations:
G = {(x1, Honda, x3, x4, x5), (x1, x2, Blue, x4, x5), (x1, x2, x3, 1980, x5), (x1, x2, x3, x4, Economy)}
Example 3: Generalize the S set to include the new example by replacing constants with variables, and remove from the G set any inconsistent descriptions.
S = {(Japan, x2, Blue, x4, Economy)}
Now G = {(x1, x2, Blue, x4, x5), (x1, x2, x3, 1980, x5), (x1, x2, x3, x4, Economy)}
Translated into English: "The target concept may be as specific as 'Japanese, blue economy car,' or as general as either 'blue car' or 'economy car.'"
Example 4: Remove the negative example from G, along with any descriptions that have become inconsistent.
After the remaining examples, G = {(Japan, x2, x3, x4, Economy)} and S = {(Japan, x2, x3, x4, Economy)}.
S and G are both singletons, so the algorithm has converged on the target concept. No more examples are needed.
(A small sketch of the algorithm follows.)
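The sketch below is a simplified candidate elimination loop over fixed attribute slots, with "?" playing the role of a variable. It follows the Japanese-economy-car example loosely; the attribute values and training data are assumptions, and maximal-generality pruning of G is omitted for brevity.

def matches(h, example):
    return all(hi == "?" or hi == ei for hi, ei in zip(h, example))

def generalize_S(s, example):
    # Replace mismatching constants with '?' so the positive example is covered.
    return tuple(si if si == ei else "?" for si, ei in zip(s, example))

def specialize_G(g, example, s):
    # Replace one '?' with the value from S that excludes the negative example.
    out = []
    for i, gi in enumerate(g):
        if gi == "?" and s[i] != "?" and s[i] != example[i]:
            cand = list(g)
            cand[i] = s[i]
            out.append(tuple(cand))
    return out

def candidate_elimination(examples):
    S = next(e for e, label in examples if label)        # first positive example
    G = [("?",) * len(S)]
    for e, label in examples:
        if label:                                         # positive example
            S = generalize_S(S, e)
            G = [g for g in G if matches(g, e)]           # drop inconsistent general descriptions
        else:                                             # negative example
            new_G = []
            for g in G:
                new_G.extend(specialize_G(g, e, S) if matches(g, e) else [g])
            G = new_G
    return S, G

# (origin, maker, color, year, type); True = positive, False = negative
cars = [
    (("Japan", "Honda", "Blue", "1980", "Economy"), True),
    (("Japan", "Toyota", "Green", "1970", "Sports"), False),
    (("Japan", "Toyota", "Blue", "1990", "Economy"), True),
    (("USA", "Chrysler", "Red", "1980", "Economy"), False),
    (("Japan", "Honda", "White", "1980", "Economy"), True),
]
S, G = candidate_elimination(cars)
print("S =", S)   # ('Japan', '?', '?', '?', 'Economy')
print("G =", G)   # [('Japan', '?', '?', '?', 'Economy')] -- converged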

LEARNING FROM EXAMPLES: VERSION SPACES


Suppose we have learned a concept like "European car," where a European car is defined as a car whose origin is Germany, Italy, or Britain. Suppose we expand the number of discrete values the origin slot might take to include the values Europe and Imported, and suppose further that we have the following isa hierarchy at our disposal:

The diagram reveals facts such as "Japanese cars are a subset of imported cars" and "Italian cars are a subset of European cars." How could we modify the candidate elimination algorithm to take advantage of this knowledge?

Propose new methods of updating the sets G and S that would allow us to learn the concept of "European car" in one pass through a set of adequate training examples.


LEARNING FROM EXAMPLES: DECISION TREES


• A third approach to concept learning is the induction of decision trees, as exemplified by the ID3 program.
• ID3 uses a tree representation of concepts.
• To classify a particular input, we start at the top of the tree and answer questions until we reach a leaf, where the classification is stored.
• ID3 uses an iterative method to build up decision trees, preferring simple trees over complex ones, on the grounds that simple trees are more accurate classifiers of future inputs.
• It begins by choosing a random subset of the training examples, called the window.
• The algorithm builds a decision tree that correctly classifies all examples in the window. The tree is then tested on the training examples outside the window.
• If all the examples are classified correctly, the algorithm halts.
• Otherwise, it adds some of the misclassified training examples to the window and the process repeats.
• Empirical evidence indicates that the iterative strategy is more efficient than considering the whole training set at once. The attribute-selection step at the core of tree building is sketched below.
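The sketch below shows the attribute-selection step at the heart of ID3: pick the attribute whose split yields the largest information gain. The toy dataset and attribute names are illustrative assumptions, and the window/iteration machinery described above is omitted.

import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, attr):
    base = entropy(labels)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return base - remainder

# attributes: (outlook, windy); classes: play / stay
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["play", "stay", "play", "stay"]
best = max(range(2), key=lambda a: information_gain(rows, labels, a))
print("split on attribute", best)   # 1: 'windy' perfectly separates the classes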

THE UTILITY PROBLEM


• A major contribution of the work on EBL in PRODIGY was the identification of the utility problem in learning systems.
• While new search control knowledge can be of great benefit in solving future problems efficiently, there are also some drawbacks.
• The learned control rules can take up large amounts of memory, and the search program must take the time to consider each rule at each step during problem solving.
• Considering a control rule means checking whether its postconditions are desirable and whether its preconditions are satisfied.
• This is a time-consuming process.
• So while learned rules may reduce problem-solving time by directing the search more carefully, they may also increase problem-solving time by forcing the problem solver to consider them.
• If the goal is simply to minimize the number of node expansions in the search space, then learning more control rules is always better.
• If the goal is to minimize the total CPU time required to solve a problem, this trade-off must be considered.


EXPLANATION BASED LEARNING-THE UTILITY PROBLEM


• PRODIGY maintains a utility measure for each control rule.
• This measure takes into account the average savings provided by the rule, the frequency of its application, and the cost of matching it (see the sketch below).

• If a proposed rule has a negative utility, it is discarded (or "forgotten").

• If not, it is placed in long-term memory with the other rules.

• It is then monitored during subsequent problem solving. If its utility falls, the rule is discarded.
• Empirical experiments have demonstrated the effectiveness of keeping only those control rules with high utility.
• Utility considerations apply to a wide range of learning systems.
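A minimal sketch of such a utility measure, assuming the common formulation (estimated savings times application frequency, minus the average cost of matching the rule). The numbers are purely illustrative.

def rule_utility(avg_savings, application_freq, avg_match_cost):
    # Positive utility -> keep the control rule; negative -> discard ("forget") it.
    return avg_savings * application_freq - avg_match_cost

print(rule_utility(avg_savings=40.0, application_freq=0.10, avg_match_cost=1.5))   #  2.5 -> keep
print(rule_utility(avg_savings=40.0, application_freq=0.01, avg_match_cost=1.5))   # -1.1 -> forget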

EXPLANATION-BASED LEARNING
Consider a chess player who, as Black, has reached the
position shown in Fig. 17.14. The position is called a "fork"
because the white knight attacks both the black king and
the black queen. Black must move the king, thereby leaving
the queen open to capture.

• From this single experience, Black is able to learn quite a bit about the fork trap:
• the idea is that if any piece x attacks both the opponent's king and another piece
y, then piece y will be lost.
• We don't need to see dozens of positive and negative examples of fork positions
in order to draw these conclusions.
• From just one experience, we can learn to avoid this trap in the future and
perhaps to use it to our own advantage.


EXPLANATION-BASED LEARNING
What makes such single-example learning possible? The answer is knowledge.
• The chess player has plenty of domain-specific knowledge that can be brought to bear, including the rules of chess and any previously acquired strategies.
• That knowledge can be used to identify critical aspects of the training example.
• In the case of the fork, we know that the double simultaneous attack is important while the precise
position and type of the attacking piece is not.

Explanation-based Leaning (EBL) system attempts to learn from a single example x by explaining
why x is an example of the target concept.
The explanation is then generalized, and the system's performance is improved through the
availability of this knowledge.
• EBL programs accept the following as input:
• A Training Example - what the learning program "sees" in the world, e.g., the car of Fig. 17.7
• A Goal Concept - a high-level description of what the program is supposed to learn
• An Operationality Criterion - a description of which concepts are usable
• A Domain Theory - a set of rules that describe relationships between objects and actions in a domain

EXPLANATION-BASED LEARNING
EBL has two steps: (1) explain and (2) generalize.
Step 1 (explain): the domain theory is used to prune away all the unimportant aspects of the training example with respect to the goal concept.
• What is left is an explanation of why the training example is an instance of the goal concept.
• This explanation is expressed in terms that satisfy the operationality criterion.
Step 2 (generalize): the explanation is generalized as far as possible while still describing the goal concept.
Example - chess:
Step 1 (explain): ignore White's pawns, king, and rook, and construct an explanation in terms of White's knight, Black's king, and Black's queen, each in their specific positions. Operationality is ensured: all chess-playing programs understand the basic concepts of piece and position.
Step 2 (generalize): using domain knowledge, we determine that the situation remains bad for Black even when the pieces are moved to a different part of the board. We can also determine that pieces other than knights and queens can participate in fork attacks.


EXPLANATION-BASED LEARNING
Consider the concept Cup, and explain why Object23 is a cup.

Training Example:
owner(Object23, Ralph) ∧ has-part(Object23, Concavity12) ∧ is(Object23, Light) ∧ color(Object23, Brown) ∧ ...

Domain Knowledge:
is(x, Light) ∧ has-part(x, y) ∧ isa(y, Handle) → liftable(x)
has-part(x, y) ∧ isa(y, Bottom) ∧ is(y, Flat) → stable(x)
has-part(x, y) ∧ isa(y, Concavity) ∧ is(y, Upward-Pointing) → open-vessel(x)

Goal Concept: Cup
x is a Cup if x is liftable, stable, and open-vessel.

The resulting operational description of a cup:
has-part(x, y) ∧ isa(y, Concavity) ∧ is(y, Upward-Pointing) ∧ has-part(x, z) ∧ isa(z, Bottom) ∧ is(z, Flat) ∧ has-part(x, w) ∧ isa(w, Handle) ∧ is(x, Light)
(A sketch of the explain step follows.)
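The explain step can be sketched in Python by chaining the domain rules over the training-example facts. The handle and bottom facts stand in for the portion of the training example elided by "..." above, and the part names Handle16 and Bottom19 are purely hypothetical.

facts = {
    ("owner", "Object23", "Ralph"),
    ("has-part", "Object23", "Concavity12"),
    ("is", "Object23", "Light"),
    ("color", "Object23", "Brown"),
    # hypothetical facts standing in for the elided "...":
    ("has-part", "Object23", "Handle16"), ("isa", "Handle16", "Handle"),
    ("has-part", "Object23", "Bottom19"), ("isa", "Bottom19", "Bottom"),
    ("is", "Bottom19", "Flat"),
    ("isa", "Concavity12", "Concavity"), ("is", "Concavity12", "Upward-Pointing"),
}

def parts(x):
    return [y for (p, o, y) in facts if p == "has-part" and o == x]

def liftable(x):      # is(x, Light) ^ has-part(x, y) ^ isa(y, Handle)
    return ("is", x, "Light") in facts and any(("isa", y, "Handle") in facts for y in parts(x))

def stable(x):        # has-part(x, y) ^ isa(y, Bottom) ^ is(y, Flat)
    return any(("isa", y, "Bottom") in facts and ("is", y, "Flat") in facts for y in parts(x))

def open_vessel(x):   # has-part(x, y) ^ isa(y, Concavity) ^ is(y, Upward-Pointing)
    return any(("isa", y, "Concavity") in facts and ("is", y, "Upward-Pointing") in facts
               for y in parts(x))

def cup(x):           # goal concept: liftable ^ stable ^ open-vessel
    return liftable(x) and stable(x) and open_vessel(x)

print(cup("Object23"))   # True -- owner(...) and color(...) never enter the explanation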

DISCOVERY
• Discovery is a restricted form of learning in which one entity acquires knowledge without
the help of a teacher.
• Automated discovery systems are:
• AM: Theory-Driven Discovery
• BACON: Data-Driven Discovery
• Clustering


DISCOVERY
AM: Theory-Driven Discovery
• AM worked from a few basic concepts of set theory to discover a good deal of standard number theory.
• AM exploited a variety of general-purpose AI techniques.
• It used a frame system to represent mathematical concepts.
• One of the major activities of AM is to create new concepts and fill in their slots.

AM heuristics (examples):
• If f is a function from A to B and B is ordered, then consider the elements of A that are mapped into extremal elements of B. Create a new concept representing this subset of A.
• If some (but not most) examples of some concept X are also examples of another concept Y, create a new concept representing the intersection of X and Y.
• If very few examples of a concept X are found, then add to the agenda the task of finding a generalization of X.

DISCOVERY
(Figure: BACON's reasoning in tabular format.)

BACON: Data-Driven Discovery


• BACON is a model of data-driven scientific discovery
• BACON begins with a set of variables for a problem.
• For example, in the study of the behavior of gases, the variables are p, the pressure on the gas; V, the volume of the gas; n, the amount of gas in moles; and T, the temperature of the gas.
• The ideal gas law relates these variables, and BACON is able to derive this law on its own.
• First, BACON holds the variables n and T constant, performing experiments at different
pressures p1, p2, and p3.
• BACON notices that as the pressure increases, the volume V decreases. Therefore, it
creates a theoretical term pV. This term is constant.
• BACON systematically moves on to vary the other variables. It tries an experiment with
different values of T, and finds that pV changes.
• The two terms are linearly related with an intercept of 0, so BACON creates a new term
pV/T.
• Finally, BACON varies the term n and finds another linear relation between n and pV/T.
• For all values of n, p, V, and T, it finds that pV/nT = 8.32. (A small sketch of this style of discovery follows.)
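The flavour of BACON's data-driven search can be sketched on synthetic measurements generated so that pV/nT = 8.32. The experiment function and parameter values are assumptions for illustration.

R = 8.32

def run_experiment(n, T, p):
    # Stand-in for performing an experiment: returns the measured volume V.
    return n * R * T / p

measurements = [(n, T, p, run_experiment(n, T, p))
                for n in (1.0, 2.0) for T in (300.0, 320.0) for p in (100.0, 200.0)]

# BACON's chain of theoretical terms: pV (constant when n and T are fixed),
# then pV/T (constant when n is fixed), and finally pV/(nT).
term = [p * V / (n * T) for (n, T, p, V) in measurements]
print(max(term) - min(term) < 1e-9, round(term[0], 2))   # True 8.32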


DISCOVERY
Clustering
• Clustering is very similar to induction. In inductive learning, a program learns to classify objects based on the labelings provided by a teacher.
• In clustering, no class labelings are provided.
• The program must discover for itself the natural classes that exist for the objects, in addition to a method for classifying instances. (A minimal sketch follows.)
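A minimal sketch of clustering without labels: a tiny one-dimensional k-means that discovers two natural groups in unlabeled data. The data points, k, and initialization are illustrative assumptions.

def kmeans_1d(points, k=2, iters=10):
    centroids = points[:k]                          # naive initialization
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d([1.0, 1.2, 0.8, 9.8, 10.1, 10.3])
print(centroids)   # roughly [1.0, 10.07] -- two natural classes discovered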

ANALOGY
Analogy is a powerful inference tool.
Our language and reasoning are laden with analogies.
Consider the following sentences:
• Last month, the stock market was a roller coaster.
• Bill is like a fire engine.
• Problems in electromagnetism are just like problems in fluid flow.

Methods of analogical problem-solving:


• Transformational Analogy
• Derivational Analogy


ANALOGY
Transformational Analogy –
Example, points and line segments
A proof that the line segment RN is exactly as long as the line segment
OY, given that RO is exactly as long as NY.

• Now suppose the program must prove a theorem about angles, namely that angle BD is equivalent to angle CE, given that angles BC and DE are equivalent.
• The proof about line segments is retrieved and transformed into a proof about angles by substituting lines for points and angles for line segments: AB for R, AC for O, AD for N, and AE for Y.
• Whole solutions are viewed as states in a problem space called T-space. T-operators prescribe the methods of transforming solutions (states) into other solutions.
• Reasoning by analogy becomes a search in T-space: starting with an old solution, we use means-ends analysis, or some other method, to find a solution to the current problem.

ANALOGY-DERIVATIONAL ANALOGY
• Derivational analogy is a necessary component in the
transfer of skills in complex domains.
• For example, suppose you have coded a sorting routine in Pascal and are then asked to recode the routine in LISP.
• A line-by-line translation is inappropriate, but you can reuse the major structural and control decisions that were made when constructing the Pascal program.
• One way to model this behavior is to have a problem-solver "replay" the previous derivation and
modify it when necessary.
• If the original reasons and assumptions for a step's existence remain in the new problem, the step
is copied over.
• If some assumption is no longer valid, another assumption must be found.
• If one cannot be found, then we can try to find justification for some alternative stored in the
derivation of the original problem.
• Or perhaps try some step marked as leading to search failure in the original derivation, if the
reasons for failure conditions are not valid in the current derivation.


INTRODUCTION TO NEURAL NETWORK


Hopfield [1982] introduced a neural network that he proposed as a theory of memory.
A Hopfield network has the features:
• Distributed Representation-A memory is stored as a pattern of activation across a set of
processing elements. Furthermore, memories can be superimposed on one another; different
memories are represented by different patterns over the same set of processing elements.
• Distributed Asynchronous Control-Each processing element makes decisions based only on
its own local situation. All these local actions add up to a global solution.
• Content-Addressable Memory - A number of patterns can be stored in a network. To retrieve a pattern, we need only specify a portion of it. The network automatically finds the closest match.
• Fault Tolerance - If a few processing elements misbehave or fail completely, the network will still
function properly.

• Processing elements, or units, are always in one of two states: active (black) or inactive (white).
• Units are connected with weighted, symmetric connections.
• A positive weight means the two units tend to activate each other.
• A negative weight means an active unit tends to deactivate its neighbour.

INTRODUCTION TO NEURAL NETWORK


The simple Hopfield network operates as follows (a small sketch appears below):
• A random unit is chosen.
• If any of its neighbours are active, the unit computes the sum of the weights on the connections to those active neighbours.
• If the sum is positive, the unit becomes active; otherwise, it becomes inactive.
• Another random unit is chosen, and the process repeats until the network reaches a stable state, i.e., until no more units can change state.
• This process is called parallel relaxation.
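A minimal sketch of parallel relaxation in a three-unit Hopfield-style network with symmetric weights. The weight values, starting state, and fixed step count are illustrative assumptions (a faithful version would stop only when no unit can change state).

import random

weights = {   # symmetric connection weights; missing pairs count as 0
    (0, 1): 1.0, (1, 0): 1.0,
    (1, 2): 1.0, (2, 1): 1.0,
    (0, 2): -1.0, (2, 0): -1.0,
}
state = [1, 0, 1]   # 1 = active (black), 0 = inactive (white)

def relax(state, steps=100):
    for _ in range(steps):
        u = random.randrange(len(state))   # choose a random unit
        total = sum(weights.get((u, v), 0.0) for v in range(len(state)) if state[v])
        state[u] = 1 if total > 0 else 0   # positive input sum -> active, otherwise inactive
    return state

print(relax(state))   # settles into a stable pattern, e.g. [1, 1, 0] or [0, 1, 1]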


INTRODUCTION TO NEURAL NETWORK


Hopfield network

TYPES OF LEARNING NEURAL NETWORK

• Perceptron
• Back Propagation
• Generalization
• Boltzmann Machines
• Reinforcement Learning
• Unsupervised Learning
• The Kohonen Neural Network Model


LEARNING NEURAL NETWORK- PERCEPTRON


• The perceptron (Rosenblatt, 1962) was one of the earliest neural network models.
• It models a neuron by taking a weighted sum of its inputs and sending the output 1 if
the sum is greater than some adjustable threshold value (otherwise it sends 0).

LEARNING NEURAL NETWORK- PERCEPTRON


• Let x be an input vector (x1, x2, ..., xn).
• The weighted summation function g(x) and the output function o(x) can be defined as shown in the figure.
• The decision boundary is where g(x) is exactly zero; with two inputs,
  g(x) = w0 + w1x1 + w2x2 = 0
• x2 = -(w1/w2)x1 - (w0/w2), the equation for a line.
• The location of the line is determined by the weights w0, w1, and w2.
• If an input vector lies on one side of the line, the perceptron outputs 1; if it lies on the other side, it outputs 0.


LEARNING NEURAL NETWORK- PERCEPTRON


• Let w be the weight vector (w0, w1, ..., wn), and let X be the subset of training instances misclassified by the current set of weights.
• Then define the perceptron criterion function, J(w), to be the sum of the distances of the misclassified input vectors from the decision surface.

LEARNING NEURAL NETWORK - PERCEPTRON


LEARNING NEURAL NETWORK- PERCEPTRON


XOR can be viewed as a pattern classification problem in which four input patterns map to two possible outputs; the two classes are not linearly separable, so a single perceptron cannot solve it.

LEARNING NEURAL NETWORK- PERCEPTRON


Perceptron Learning Algorithm
Given: A classification problem with n input features (x1, x2, ..., xn) and two output classes.
Compute: A set of weights (w0, w1, w2, ..., wn) that will cause a perceptron to fire whenever the input falls into the first output class.
1. Create a perceptron with n+1 inputs and n+1 weights, where the extra input x0 is always set to 1.
2. Initialize the weights (w0, w1, ..., wn) to random real values.
3. Iterate through the training set, collecting all examples misclassified by the current set of weights.
4. If all examples are classified correctly, output the weights and quit.
5. Otherwise, compute the vector sum S of the misclassified input vectors, where each vector has the form (x0, x1, ..., xn). In creating the sum, add to S a vector x if x is an input for which the perceptron incorrectly fails to fire, but -x if x is an input for which the perceptron incorrectly fires. Multiply the sum S by a scale factor η.
6. Modify the weights (w0, w1, ..., wn) by adding the elements of the vector S to them.
7. Go to step 3.
(A sketch of this procedure appears below.)
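A sketch of the procedure above in Python, using a small linearly separable task (logical AND) as assumed training data; the learning rate, initialization range, and epoch limit are illustrative.

import random

def fires(weights, x):
    return sum(w * xi for w, xi in zip(weights, x)) > 0

def train_perceptron(examples, n, eta=0.5, max_epochs=500):
    weights = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]   # w0 is the threshold weight
    for _ in range(max_epochs):
        S = [0.0] * (n + 1)
        errors = 0
        for x, target in examples:                  # each x already includes x0 = 1
            if fires(weights, x) != target:
                errors += 1
                sign = 1 if target else -1          # +x if it failed to fire, -x if it fired wrongly
                S = [s + sign * xi for s, xi in zip(S, x)]
        if errors == 0:
            return weights                          # step 4: all examples classified correctly
        weights = [w + eta * s for w, s in zip(weights, S)]   # steps 5-6
    return weights

# fire only when both features are 1 (AND); inputs are (x0=1, x1, x2)
data = [((1, 0, 0), False), ((1, 0, 1), False), ((1, 1, 0), False), ((1, 1, 1), True)]
w = train_perceptron(data, n=2)
print([fires(w, x) for x, _ in data])   # [False, False, False, True]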

28
6/29/2024

LEARNING NEURAL NETWORK- BACK PROPAGATION


Backpropagation networks are a subclass of multilayer networks. They are:
• Fully connected
• Layered
• Feedforward
• Trained by backpropagation, using a different activation function that produces a real value between 0 and 1

LEARNING NEURAL NETWORK- BACK PROPAGATION


Algorithm: Given: A set of input-output vector pairs.
Compute: A set of weights for a three-layer network that maps inputs onto corresponding outputs.
1. Let A be the number of units in the input layer, as determined by the training input vectors. Let C be the number of units in the output layer. Now choose B, the number of units in the hidden layer. The input and hidden layers each have an extra unit used for thresholding; therefore, the units in these layers will sometimes be indexed by the ranges (0, ..., A) and (0, ..., B). We denote the activation levels of the units in the input layer by xi, in the hidden layer by hj, and in the output layer by oj. Weights from the input layer to the hidden layer are denoted by w1ij, where i indexes the input units and j indexes the hidden units. Likewise, weights connecting the hidden layer to the output layer are denoted by w2ij, with i indexing hidden units and j indexing output units.
2. Initialize the weights in the network. Each weight should be set randomly to a number between -0.1 and 0.1.
3. Initialize the activations of the thresholding units. The values of these thresholding units should never change:
x0 = 1.0, h0 = 1.0

29
6/29/2024

LEARNING NEURAL NETWORK- BACK PROPAGATION


4. Choose an input-output pair. Suppose the input vector is x and the target output vector is y. Assign activation levels to the input units.

5. Propagate the activations from the units in the input layer to the hidden layer using the activation function. Note that i ranges from 0 to A; w10j is the thresholding weight for hidden unit j (its propensity to fire irrespective of its inputs), and x0 is always 1.0.

6. Propagate the activations from the units in the hidden layer to the units in the output layer.
(A sketch of this forward pass appears below.)
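A sketch of the forward pass in steps 4-6, with a sigmoid activation and thresholding units x0 = h0 = 1.0. The layer sizes and random weights are illustrative assumptions, and the backward (error-propagation) phase is not shown.

import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

A, B, C = 2, 2, 1                                         # input, hidden, output layer sizes
w1 = [[random.uniform(-0.1, 0.1) for _ in range(B)] for _ in range(A + 1)]   # input -> hidden
w2 = [[random.uniform(-0.1, 0.1) for _ in range(C)] for _ in range(B + 1)]   # hidden -> output

def forward(x):
    xs = [1.0] + list(x)                                  # x0 is the thresholding unit
    h = [1.0] + [sigmoid(sum(xs[i] * w1[i][j] for i in range(A + 1))) for j in range(B)]
    o = [sigmoid(sum(h[i] * w2[i][j] for i in range(B + 1))) for j in range(C)]
    return o

print(forward((0.0, 1.0)))   # one real value between 0 and 1 per output unit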

LEARNING NEURAL NETWORK- BACK PROPAGATION


LEARNING NEURAL NETWORK-REINFORCEMENT LEARNING


It learns as follows:
(1) the network is presented with a sample input from the training set,
(2) the network computes what it thinks should be the sample output,
(3) the network is supplied with a real-valued judgment by the teacher,
(4) the network adjusts its weights, and the process repeats.
A positive value in step 3 indicates good performance, while a negative value indicates bad performance. The network seeks a set of weights that maximizes the positive feedback it receives from the teacher.

LEARNING NEURAL NETWORK- UNSUPERVISED LEARNING


Consider a group of ten animals, each described by its own set of features, that breaks down naturally into three groups: mammals, reptiles, and birds.
• We want to build a network that can learn which group a particular animal belongs to, and that generalizes so it can identify animals it has not yet seen.
• With a teacher, we could accomplish this with a six-input, three-output backpropagation network: present the network with an input, observe its output, and update its weights based on the errors it makes.
• Without a teacher, the error cannot be computed, so we must seek other methods.


LEARNING NEURAL NETWORK- UNSUPERVISED LEARNING


In competitive learning, output units fight for
control over portions of the input space.

A simple competitive learning algorithm is


the following:
1. Present an input vector.
2. Calculate the initial activation for each
output unit.
3. Let the output units fight until only one is
active.
4. Increase the weights on connections between the active input units and the winning output unit. This makes it more likely that the same output unit will win the next time the pattern is presented.

LEARNING NEURAL NETWORK- UNSUPERVISED LEARNING


Algorithm: Competitive Learning
Given: A network consisting of n binary-valued input units directly connected to any number of output
units.
Produce: A set of weights such that the output units become active according to some natural division
of the inputs.
1. Present an input vector, denoted (x1, x2, ..., xn).
2. Calculate the initial activation for each output unit as the weighted sum of its inputs.
3. Let the output units fight until only one is active.
4. Adjust the weights on the input lines that lead to the single active output unit, using
   Δwj = η (xj / m - wj)
where wj is the weight on the connection from input unit j to the active output unit, xj is the value of the jth input bit, m is the number of input units that are active in the input vector chosen in step 1, and η is the learning rate (some small constant). It is easy to show that if the weights on the connections feeding into an output unit sum to 1 before the weight change, then they will still sum to 1 afterward.
5. Repeat steps 1 to 4 for all input patterns for many epochs. (A sketch appears below.)
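A sketch of the loop above, assuming the weight update Δwj = η(xj/m - wj) and a simple "winner = largest activation" fight. The patterns, network sizes, and learning rate are illustrative.

import random

n_inputs, n_outputs, eta = 4, 2, 0.3
weights = [[1.0 / n_inputs] * n_inputs for _ in range(n_outputs)]   # each row sums to 1

patterns = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1], [0, 1, 0, 1]]

for _ in range(200):                                      # many epochs
    x = random.choice(patterns)
    m = sum(x)                                            # number of active input bits
    activations = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    winner = max(range(n_outputs), key=lambda j: activations[j])    # the units "fight"
    weights[winner] = [w + eta * (xi / m - w) for w, xi in zip(weights[winner], x)]

for row in weights:
    print([round(w, 2) for w in row], "sum =", round(sum(row), 2))  # each row still sums to 1.0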


LEARNING NEURAL NETWORK - KOHONEN NEURAL NETWORK MODEL
• The Kohonen neural network is a typical example of both self-organization and competitive learning.
• Self-organization is an unsupervised learning quality which organizes a neural network and makes it learn some meaningful information.
• Categorized as an unsupervised learning network, the Kohonen network is based on the concept of graded or reinforcement learning.
• Graded learning is similar to grading a class of students by conducting quizzes and using the scores obtained to reflect their performance.
• In our world several processes are learnt by grading, while many others are learnt unsupervised.
• A typical example of the latter is the way in which an infant learns to recognize the objects and people around it.
• A Kohonen network is a non-recurrent (feedforward) network comprising two layers: the input layer and an output layer, also referred to as the Kohonen layer, made up of Kohonen neurons.

LEARNING NEURAL NETWORK - KOHONEN NEURAL NETWORK MODEL (cont…)
• As can be seen in Fig. 18.20 (a) every neuron of the
input layer is connected to every Kohonen neuron.
• The Kohonen layer could be of different dimensions.

• Fig. 18.20 (b) depicts a one dimensional Kohonen layer


• Fig. 18.20 (c) shows the layer in a two-dimensional
configuration.
• Other geometrical shapes, such as a hexagon, are possible for the Kohonen layer.
• Each connection is associated with a variable weight wij.
• The network is presented with continuous values that
represent patterns that are to be learned.
• These inputs can be looked upon as a real-valued vector (x1, x2, ..., xn).
• The output is not specified as the learning is
unsupervised.


KOHONEN NEURAL NETWORK MODEL - ALGORITHM


1. Initialize the weights wij, for i = 1 to n (on the input side) and j = 1 to m (on the output side), with real random values. Two more parameters, the neighbourhood and the learning rate, are also initialized. The neighbourhood is defined as a radius r, as depicted in Fig. 18.20 (b) and (c); it is initialized to a higher value, say 3. The learning rate α is selected as a high value (around 0.8) in the interval (0, 1). For all the input vectors:
2. Select an input vector at random and present the selected input vector x = (x1, x2, ..., xn).
3. Find the Kohonen neuron j whose associated weight vector (w1j, w2j, ..., wnj) is closest to the input vector x. Closeness can be measured using a suitable distance function; the Kohonen neuron with the least distance is referred to as the winning neuron.
4. Modify the weights of all neurons in the neighbourhood of radius r of the winning neuron using
   wij ← wij + α (xi - wij)
Note that this modification causes the winning neuron and its neighbours to move closer to (learn) the input. Naturally, if the learning rate α were unity, the neurons would converge on the input. For all other neurons the weights remain unaltered.
5. Update α by reducing it gradually over the iterations. This reduces the rate at which the neurons converge on the input.
6. Reduce the neighbourhood radius r gradually at specified iterations. (A sketch appears below.)
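A sketch of the algorithm above for a small one-dimensional Kohonen layer. The grid size, input vectors, and decay schedules for α and r are illustrative assumptions.

import math
import random

n_inputs, n_neurons = 2, 5
weights = [[random.random() for _ in range(n_inputs)] for _ in range(n_neurons)]
alpha, r = 0.8, 2                                    # learning rate and neighbourhood radius

inputs = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.85], [0.95, 0.9]]

for it in range(1, 101):
    x = random.choice(inputs)
    distances = [math.dist(w, x) for w in weights]
    winner = distances.index(min(distances))         # the neuron closest to the input wins
    for j in range(max(0, winner - r), min(n_neurons, winner + r + 1)):
        weights[j] = [wj + alpha * (xi - wj) for wj, xi in zip(weights[j], x)]
    alpha *= 0.98                                     # reduce the learning rate gradually
    if it % 30 == 0 and r > 0:
        r -= 1                                        # shrink the neighbourhood at intervals

print([[round(v, 2) for v in w] for w in weights])    # neurons settle near the two input clusters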

APPLICATIONS OF NEURAL NETWORKS


Connectionist models can be divided into the following categories based on the
complexity of the problem and the network's behavior:
• Pattern recognizers and associative memories
• Pattern transformers
• Dynamic inferencers

Problems:
• Connectionist Speech
• Connectionist Vision
• Combinatorial Problems


APPLICATIONS OF NEURAL NETWORKS


(Figures: Connectionist Speech; Connectionist Problems.)

RECURRENT NEURAL NETWORKS


Jordan Network:
• The network's plan units stay constant. They
correspond to an instruction like "shoot a basket."
• The state units encode the current state of the network.
• The output units simultaneously give commands (e.g.,
move arm x to position y) and update the state units.
• The network never settles into a stable state; instead it
changes at each time step.

• Recurrent networks can be trained with the backpropagation algorithm.


• At each step, we compare the activations of the output units with the desired activations and
propagate errors backward through the network.
• When training is completed, the network will be capable of performing a sequence of actions.
• Features of backpropagation, such as automatic generalization, also hold.
• A few modifications are useful, such as encouraging the state units to change smoothly.
• Smoothness can be implemented as a change in the weight update rule: the "error" of an output unit becomes a combination of the real error and the magnitude of the change in the state units.
• Enforcing the smoothness constraint turns out to be very important for fast learning.


RECURRENT NEURAL NETWORKS


• A mental model is a mapping that relates the network's outputs to events in the world.
• The network learns two different things:
• the relationship between the plan and the network's output, and
• the relationship between the network's output and the real world.
• A recurrent network with a mental model is the same as a Jordan network except for the addition of two more layers: a hidden layer and a layer representing results as seen in the world.
• First, the latter portion of the network is trained (using backpropagation) on various pairs of outputs and targets, until the network has a model of how its outputs affect the real world.
• Once these rough weights are established, the rest of the network is trained using real-world feedback until it can perform accurately.

