AI Applications
References
1. Neural Networks and AI for Biomedical Engineering, Donna L. Hudson, 2003
2. Fuzzy Sets and Systems: Theory and Applications (ebook)
[Figure: branches of AI: Vision, Learning systems, Robotics, Expert systems, Neural networks, Natural language processing]
Overview of Expert Systems
Def: A program that uses available information, heuristics,
and inference to suggest solutions to problems in a particular
discipline
• It can:
– Explain its reasoning or suggested decisions
– Display intelligent behavior
– Draw conclusions from complex relationships
– Provide portable knowledge
• Expert system shell (def):
– A collection of software packages and tools used to develop
expert systems (it is an expert system without a knowledge
base, i.e., it is not restricted to a specific domain)
Components of an Expert System
• Knowledge base
– Stores all relevant information, data, rules, cases, and relationships
used by the expert system
– Rule: a conditional statement that links given conditions to
actions or outcomes
• Inference engine
– Seeks information and relationships from the knowledge base and
provides answers, predictions, and suggestions in the way a human
expert would
– In order to produce a reasoning, it is based on logic. There are several
kinds of logic: propositional logic, predicate logic of order 1 or more,
epistemic logic, modal logic, temporal logic, fuzzy logic, etc.
Components of an Expert System
• Explanation facility:
– A part of the expert system that allows a user or
decision maker to understand how the expert
system arrived at certain conclusions or results
• User interface
[Figure: Components of an Expert System. Experts feed the knowledge acquisition facility, which builds the knowledge base; the inference engine draws on the knowledge base; the explanation facility and the user interface connect the system to the user.]
Participants in Expert Systems
Development and Use
• Domain expert
– The individual or group whose expertise and knowledge is
captured for use in an expert system
• Knowledge user
– The individual or group who uses and benefits from the
expert system
• Knowledge engineer
– Someone trained or experienced in the design,
development, implementation, and maintenance of an
expert system
Schematic
[Figure: the domain expert's knowledge is captured by the knowledge engineer and encoded into the expert system, which serves the knowledge user.]
Evolution of Expert Systems Software
• Expert system shell
– Collection of software packages & tools to design,
develop, implement, and maintain expert systems
[Figure: ease of use, low to high, versus time. Traditional programming languages (before 1980), special and 4th-generation languages (1980s), and expert system shells (1990s) offer increasing ease of use.]
Expert Systems Development Alternatives
[Figure: development costs versus time to develop an expert system. Developing from scratch is the most costly and slowest option, developing from a shell is intermediate, and using an existing package is the cheapest and fastest.]
Overview of The Expert Systems
Development Life Cycle: The Knowledge
Engineering Process
1.1 Components of an Expert System
Expert systems, as defined before, are built to capture the expertise of human experts in a well-
defined, narrow area of knowledge. The early expert systems took 20 to 50 person-
years to build; today's complex expert systems are still apt to take about 10 person-
years. With the use of an expert system shell, however, expert systems can be built in
5 person-years and simple expert system prototypes in only 3 person-months.
[Figure: the three main components of an expert system: dialog structure, inference engine, and knowledge base.]
Inference Engine
The inference engine is a program that allows hypotheses to be generated based upon the
information in the knowledge base. It is the control structure that manipulates the
knowledge in the knowledge base to arrive at various solutions.
Three major methods are incorporated in the inference engine to search a space
efficiently for deriving hypotheses from the knowledge base: forward chaining,
backward chaining, and forward and backward processing combined. Later
chapters will explain each of these methods in detail.
Forward chaining, often described as event- or data-driven reasoning, is used for problem
solving when data or basic ideas are a starting point. With this method, the system does not
start with any particular goals defined for it. It works through the facts to arrive at
conclusions, or goals. One drawback of forward chaining is that one derives everything
possible, whether one needs it or not. Forward chaining has been used in expert systems for
data analysis, design, and concept formulation.
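The forward-chaining loop described above can be sketched in a few lines. The rules and facts below are invented purely for illustration; they are not from any real diagnostic system:

```python
# Hedged sketch of data-driven (forward) chaining: start from the facts, fire
# any rule whose IF-part is satisfied, and repeat until nothing new is derived.

rules = [
    ({"fever", "rash"}, "measles_suspected"),
    ({"measles_suspected"}, "refer_to_doctor"),
    ({"cough"}, "cold_suspected"),
]

def forward_chain(facts, rules):
    facts = set(facts)
    fired = True
    while fired:
        fired = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)   # derive everything possible,
                fired = True            # whether it is needed or not
    return facts

print(forward_chain({"fever", "rash", "cough"}, rules))
```

Note that the run also derives `cold_suspected`, even if only the measles question mattered, which illustrates the drawback mentioned above: forward chaining derives everything derivable.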
Backward chaining, called goal-directed reasoning, is another inference engine technique.
This method entails having a goal or a hypothesis as a starting point and then working
backward along certain paths to see if the conclusion is true. A problem with backward
chaining involves conjunctive subgoals in which a combinatorial explosion of possibilities
could result. Expert systems employing backward chaining are those used for diagnosis and
planning.
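Goal-directed chaining can be sketched as a recursion from the goal back toward known facts; again the rules and facts are made up for illustration:

```python
# Hedged sketch of goal-directed (backward) chaining: start from a goal and
# recurse into subgoals until known facts are reached.

rules = [
    ({"fever", "rash"}, "measles_suspected"),
    ({"measles_suspected"}, "refer_to_doctor"),
]
known_facts = {"fever", "rash"}

def backward_chain(goal, rules, facts):
    if goal in facts:
        return True
    for conditions, conclusion in rules:
        if conclusion == goal:
            # conjunctive subgoals: every condition must be proved, which is
            # exactly where the combinatorial explosion mentioned above arises
            if all(backward_chain(c, rules, facts) for c in conditions):
                return True
    return False

print(backward_chain("refer_to_doctor", rules, known_facts))  # True
```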
Forward and backward processing combined is another method used for search direction in
the inference engine. This approach is used for a large search space, so that bottom-up and
top-down searching can be appropriately combined. This combined search is applicable to
complex problems incorporating uncertainties, such as speech understanding.
Most inference engines have the ability to reason in the presence of uncertainty. Different
techniques have been used for handling uncertainty, namely Bayesian statistics, certainty
factors, and fuzzy logic. In many cases, one must deal with uncertainty in data or in
knowledge. Thus, an expert system should be able to handle this uncertainty.
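As a small illustration of the certainty-factor approach, the sketch below shows the MYCIN-style combination rule for the case where two rules both support the same conclusion with positive certainty factors (the numeric values are invented):

```python
# Sketch of certainty-factor combination: two independent rules that both
# support a conclusion reinforce each other, without the result exceeding 1.

def combine_cf(cf1, cf2):
    # standard combination when both certainty factors are positive
    return cf1 + cf2 * (1 - cf1)

cf = combine_cf(0.6, 0.5)
print(round(cf, 2))  # 0.8: two moderately confident rules reinforce each other
```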
Knowledge Base
The knowledge base is the last and most important component of an expert system. It is
composed of domain facts and heuristics based upon experience. According to Duda and
Gaschnig, the most powerful expert systems are those containing the most knowledge.
3.2 Building an Expert System
Building an expert system typically involves the rapid prototyping approach. This
approach entails “building a little and testing a little” until the knowledge base is
refined to meet the expected acceptance rate and users’ needs. Expert systems
development is an iterative process in which, after testing, knowledge is
reacquired, represented, encoded, and tested again until the knowledge base is
refined. A principal refinement for expert systems continues to be the need for
effective filtering out of the biases created by the particular values and views of the
experts who are the sources of the knowledge that constitutes knowledge bases.
The first step in building an expert system is to select the problem, define the
expert system’s goal(s), and identify the sources of knowledge. The criteria for
expert system problem selection can be broken down into three components:
problem criteria, expert criteria, and domain area personnel criteria. The following
criteria should be followed when selecting an expert system problem:
Problem Criteria
* The task involves mostly symbolic processing.
* Test cases are available.
* The problem task is well bounded.
* The task must be performed frequently.
* Written materials exist explaining the task.
* The task requires only cognitive skills.
* The experts agree on the solutions.
Expert Criteria
• An expert exists.
• The expert is cooperative.
• The expert is articulate.
• The expert's knowledge is based on experience, facts, and judgment.
• Other experts in the task area exist.
Domain Area Personnel Criteria
• A need exists to develop an expert system for that task.
• The task would be provided with the necessary financial support.
• Top management supports the project.
• The domain area personnel have realistic expectations for the use of an expert
system.
• Users would welcome the expert system.
• The knowledge is not politically sensitive or controversial.
Once these criteria are met and the problem is selected, the next step is to acquire the knowledge from
the expert in order to develop the knowledge base. Knowledge acquisition is an iterative process in
which many meetings with the expert are needed to gather all the relevant and necessary information
for the knowledge base. Before being able to acquire the knowledge from the expert, however, the
knowledge engineer must be familiar with the domain. Reading documentation and manuals and
observing experts in the domain are needed to obtain a fundamental background on which the
knowledge engineer's knowledge will be based. This background is needed so that the knowledge
engineer can ask the appropriate questions and understand what the expert is saying.
Knowledge can be acquired by various methods. The most commonly used method is to have the
knowledge engineer (i.e., the developer of the expert system) interview the expert. Various structured
and unstructured techniques can be used for interviewing. Some of the methods are as follows:
• Method of familiar tasks—analysis of the tasks that the expert usually performs.
• Limited information tasks—a familiar task is performed, but the expert is not given certain information
that is typically available.
• Constrained processing tasks—a familiar task is performed, but the expert must do so under time or
other constraints.
• Method of tough cases—analysis of a familiar task that is conducted for a set of data that
presents a tough case for the expert.
After the knowledge is acquired, the next step is to select a knowledge representation
approach. These approaches include predicate calculus, frames, scripts, semantic networks,
and production rules, as described earlier. According to Software Architecture and
Engineering, Inc., rule-based deduction would be an appropriate method to use if (1) the
underlying knowledge is already organized as rules, (2) the classification is predominantly
categorical, and (3) there is not much context dependence. Frames, scripts, and semantic
networks are best used when the knowledge preexists as descriptions.
The next step consists of programming the knowledge by using a text editor in an expert
system shell or by using LISP, Prolog, C, or some other appropriate programming language.
An expert system shell contains a generalized dialog structure and an inference engine. A
knowledge base can be designed for a specific problem domain and linked to the expert
system shell to form a new expert system for a particular application. Some of the more
popular expert system shells are these:
Figure 2.1 shows the Expert Choice hierarchy before pairwise comparisons (i.e., before weighting
takes place) for this expert system problem selection application.
After constructing this hierarchy, the evaluation process begins. Expert Choice will first
question the user in order to assign priorities (i.e., weights) to the criteria. Expert Choice allows
the user to provide judgments verbally, so that no numerical guesses are required (it also allows
the user to give a numerical answer). Thus, the first question would be: "With respect to the goal
of selecting an expert system problem, is criterion one (i.e., PRO TYPE) just as important as
criterion two (i.e., EXPERT)?" If the user's answer to the question is "Yes," then criterion one is
compared to criterion three (PRO TYPE vs. DOM PERS). If the answer is "No," as it is at the top of
Figure 2.2, then Expert Choice will ask, "Is PRO TYPE more important than EXPERT?" and it will ask
for the level of importance. The number of pairwise comparisons is shown in a triangle of dots, as
displayed in the right-hand corner of Figure 2.2. The relative importance, as shown at the top of
Figure 2.2, is moderately more important, strongly more important, very strongly more important,
extremely more important, or a degree within the range. Based upon the user's verbal judgments,
Expert Choice will calculate the relative importance on the following scale, based on Saaty's work:
1. Equal importance
3. Moderate importance of one over another
5. Essential or strong importance
7. Very strong importance
9. Extremely important
2,4,6,8. Intermediate values between the two adjacent judgments
This procedure is followed to obtain relative priorities of the criteria, in which eigenvalues are
calculated based upon pairwise comparisons of one criterion versus another, as discussed in
the previous section. These pairwise comparisons are made for each of the criteria and
subcriteria. Figures 2.2 and 2.3 show the priorities after the pairwise comparisons are made.
Also, an inconsistency index is calculated after each set of pairwise comparisons to show
how consistent the user's judgments are. An overall inconsistency index is calculated at the
end of the synthesis as well. This measure is zero when all judgments are perfectly
consistent with one another and becomes larger when the inconsistency is greater. The
inconsistency is tolerable if it is 0.10 or less.
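The priority and consistency calculation described above can be sketched as follows. The pairwise comparison matrix is invented for illustration; the random-index values used to normalize the consistency index follow Saaty's published table:

```python
# Hedged sketch of the AHP calculation: approximate the principal eigenvector
# of a pairwise comparison matrix by power iteration, then compute the
# consistency ratio (tolerable if 0.10 or less).

def ahp_priorities(matrix, iterations=100):
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):                    # power iteration
        w_new = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(w_new)
        w = [x / total for x in w_new]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(aw[i] / w[i] for i in range(n)) / n  # estimate of lambda_max
    ci = (lam - n) / (n - 1)                       # consistency index
    ri = {3: 0.58, 4: 0.90, 5: 1.12}[n]            # Saaty's random index
    return w, ci / ri                              # priorities, consistency ratio

# Illustrative judgments: PRO TYPE judged 3x as important as EXPERT and
# 5x as important as DOM PERS, etc.
A = [[1, 3, 5],
     [1 / 3, 1, 2],
     [1 / 5, 1 / 2, 1]]
weights, cr = ahp_priorities(A)
print([round(x, 3) for x in weights], round(cr, 3))
```

For this nearly consistent matrix the consistency ratio comes out well under the 0.10 threshold, so the judgments would be accepted.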
After the criteria and subcriteria are weighted, the next step involves
pairwise comparisons between the alternatives and the subcriteria. For example,
one question would be: "With respect to SYMBOLIC, are MACROECO and NUCLEAR
equally preferable?" After all the pairwise comparisons have been entered, Expert
Choice performs synthesis by adding the global priorities (global priorities indicate
the contribution to the overall goal) at each level of the tree hierarchy. Figure 2.5
shows the synthesis of the results and, finally, the ranking of the alternatives. In this
example, after taking all the pairwise comparisons into account, the best problem
to select for expert systems development is the BID/NO (i.e., develop an expert
system for determining whether a company should bid on a request for proposal).
Its priority is .578, followed by MACROECO (.266) and, last, NUCLEAR (.156). The
overall inconsistency index is 0.06, which is within the tolerable range.
Through this methodology, the best problem for expert systems
development, given these three alternatives, is to determine whether or not to
bid on a request for proposal. Expert Choice also allows for sensitivity
analysis if the user so desires.
Other Approaches for Expert System Problem Selection
If one does not want to use the AHP/Expert Choice approach to
select an expert system problem, another technique is to develop a checklist of
important problem criteria based on those in Section 2.1 and see how many of
them fit the problem under consideration. This is an unsophisticated approach, but
it is effective and is probably the one most knowledge engineers use when
selecting a possible problem task for expert systems development.
Cost-benefit analysis should also be conducted to determine
whether it is technically, economically, and operationally feasible and wise to
develop an expert system for a particular problem. Costs include the expert's time,
as well as that of the knowledge engineer [1,4]. Additional costs include possible
acquisition of hardware, possible acquisition of software like expert system shells,
overhead, the expert's travel and lodging expenses, and computing time. The
benefits of an expert system might include reduced costs, increased productivity,
increased training productivity and effectiveness, preservation of knowledge,
enhanced products or services, or even the development of new products and
services [4,26,27,28,29].
The costs and benefits of the status quo and the proposed expert
system could then be compared. Expert Choice could even be used for this cost-
benefit analysis to see if the expert system would be more cost-effective than the
status quo.
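The comparison described above can be sketched as a simple benefit-cost ratio; every cost and benefit figure below is invented for illustration:

```python
# Hedged sketch of a cost-benefit comparison for a proposed expert system,
# using the cost and benefit categories listed in the text with made-up values.

costs = {
    "expert_time": 20_000,
    "knowledge_engineer_time": 60_000,
    "shell_software": 5_000,
    "hardware_and_overhead": 15_000,
}
benefits = {
    "reduced_costs": 70_000,
    "training_effectiveness": 25_000,
    "knowledge_preservation": 20_000,
}

total_cost = sum(costs.values())
total_benefit = sum(benefits.values())
print(total_benefit / total_cost)  # a benefit-cost ratio above 1 favors development
```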
2.4 Conclusion
Problem selection is a critical step in the expert systems development process.
Selecting the wrong problem or failing to reduce the problem to a manageable size
will create problems later when constructing the expert system.
The guidelines for problem selection criteria presented in this chapter should
be followed closely when considering the type of problem, domain expert, and
domain area personnel. The AHP is a useful structured methodology for
incorporating these criteria in selecting an expert system problem. Expert Choice
easily facilitates the use of the AHP. Whether this structured methodology or some
other approach is used in the problem selection process, the important point is to
make sure that the method used will help to identify the right problem for expert
systems development.
Knowledge Acquisition
3.1 Knowledge Acquisition Problems
3.2 How to Reduce the Expert's Boredom
3.3 Possible Solutions to the Knowledge Acquisition Problems
3.4 In the Future
Acquiring knowledge from an expert for expert systems development is a
difficult process. One reason is that an expert, talking continuously, will produce
about 10,000 words an hour; this is equivalent to 300 to 500 pages of transcript
from a single day’s session with an expert. This voluminous information must
somehow be digested by the knowledge engineer and appropriately represented
for expert systems development. In addition, the knowledge engineer, when
interviewing the expert, must pay attention to the expert’s intonation and use of
words, which might influence the meaning of some of the information.
There are many problems in extracting information from an expert. This
chapter will survey the problems that may be encountered during the knowledge
acquisition process. Then some possible solutions to these problems will be
presented.
3.1 Knowledge Acquisition Problems
Knowledge acquisition is the most difficult part of expert systems development. There are
numerous problems associated with this process. The following are some of the major
problems that may be encountered:
• Human biases in judgment on the part of the knowledge engineer and expert, which might
inhibit transmission of the correct information. These biases include the following:
– Recency—people are influenced by the most recent events.
– Availability—people use only the information that is available to them.
– Imaginability—people use information only in the form that is presented to them.
– Correlation—people make correlations where none exist.
– Causality—people assign causes where none exist.
– Anchoring and adjustment—people use an anchor point and argue around that point.
– Statistical intuition—people do not fully understand the effects of variance and sample
size.
• The expert could cancel or defer knowledge acquisition appointments, fail to answer
questions, or neglect to supply information.
• It is difficult for the knowledge engineer to extract and the expert to convey
knowledge and heuristics that have been acquired over many years of professional
experience; for example, information that the expert considers common sense may
not be so to the knowledge engineer.
• The expert might, consciously or unconsciously, use the knowledge engineer to
experiment with different models of the knowledge domain that he or she has
developed.
• The process of knowledge elicitation requires many hours of an expert who is
already busy and has many demands on his or her time.
• Some knowledge engineers may not be good at interviewing, causing them to
interrupt, not listen to the way the expert uses knowledge, misinterpret
information, or not ask the right questions.
• Some knowledge engineers might start feeling expert and then think that they are
the experts.
• Even an expert may not be right 100 percent of the time.
• A knowledge engineer may not listen fully to the language that the expert uses to
represent his or her experience; the knowledge engineer should be sensitive to
auditory thoughts, visual thoughts, sensory memories, or feelings.
• The knowledge engineer may assume or project his or her favored modes of
thinking into the expert's verbal reports.
• The knowledge engineer may not be cognizant of the body language that the
expert is using.
• It might be difficult to uncover the expert's ability that is hidden at
the gut level.
• The knowledge engineer may not be organized in his or her approach
to eliciting knowledge from the expert.
• The knowledge engineer may not be skilled in or knowledgeable
about the different methods that can be used to extract the expert's
knowledge. Some of these methods are as follows:
– Method of familiar tasks—analyze the tasks that the expert usually performs.
– Structured and unstructured interviews—the expert is queried with regard to
knowledge of facts and procedures.
– Limited information tasks—a familiar task is performed, but the expert is not
given certain information that is typically available.
– Constrained processing tasks—a familiar task is performed, but the expert must
work under time or other constraints.
– Method of tough cases—analysis of a familiar task is conducted for a set of
data that presents a tough case for the expert.
• Some knowledge engineers may not be very familiar with the domain, and may
not know what questions to ask or understand what the expert is saying.
• The expert may not be cooperative or articulate.
These are typical problems that may surface during the knowledge
acquisition process. One way to help ensure success during the knowledge
acquisition process is to make sure that the problem selected is appropriate and
well scoped for expert systems development. Additionally, the knowledge engineer
should become knowledgeable about the domain before interacting frequently
with the expert. The expert selected should be willing to participate, cooperative,
and articulate.
The next sections survey solutions that are being developed to improve the
knowledge acquisition process.
3.2 How to Reduce the Expert’s Boredom
It is said that those closest to the technology may have a clouded view of where it is
going. They may be too close to the technology to step back and see
the forest for the trees. The same situation may be true of the knowledge engineer
during the sessions with the expert. The knowledge engineer may be caught up in
questioning the expert, without realizing that the expert is bored with the process.
This issue was recently raised by a domain expert who was being interviewed by a
first-time knowledge engineer, who kept telling the expert to give him information
in terms of IF-THEN rules. After about an hour of this exercise, the domain expert
became bored and frustrated, and eventually withdrew from the project.
The question, then, is how to reduce the chance of the expert’s becoming
bored during the knowledge acquisition process. There are 10 methods that may
be used:
1. The knowledge engineer needs to vary the methods used in
acquiring knowledge. Alternatives include scenario building, observation, and
limited information task interviewing to allow flexibility on the part of the expert to
explain his or her reasoning. Using a variety of methods should prevent the expert
from getting into a rut.
2. Each knowledge acquisition session should last no more than 2
hours. Studies have shown that this time limit is optimal.
3. It is helpful to have two knowledge engineers present when
interviewing the expert. Thus, during the questioning/listening process, the expert
can bounce ideas off both of them, instead of dealing with the same person every
day.
4. It might be helpful to deal with the expert away from the office, in
an informal setting. Meeting at a restaurant or pub might make the expert more
relaxed and at ease in answering the knowledge engineer's questions.
5. Let the expert explain his or her reasoning by running through
typical scenarios. Asking about familiar tasks will make the expert feel more
comfortable with the interviewing process. Later on, in order to obtain some of the
heuristics, the method of tough cases, constrained time, or limited information
tasks might be used.
6. Don’t require or force the expert to reason or talk in a certain
way, such as in the form of IF-THEN rules. This will be unnatural, awkward, and
annoying to the expert. The knowledge engineer should listen to the way the
expert is using his or her knowledge and should later determine the best way for
representing it.
7. Early on, show the expert the interactive expert system in order
to capture the expert’s attention. In this manner, the expert can better visualize the
expert system instead of looking at hard copies of rules. This will also show the
expert that his or her time is not being wasted; there is a substantive result of the
knowledge acquisition sessions. One caveat, however: be careful about showing
the system to the expert too early. If there is not much in the system, the expert
may consider it trivial and meaningless.
8. The knowledge engineer needs the expert to feel ownership of
the expert system. One way to do this is to name the system after the expert or
include the expert's name on the opening screen as the "expert consultant."
9. According to Earl Sacerdoti at Copernican, let the expert do his or
her normal daily activities, as well as spending up to 2 hours a session on
knowledge acquisition. With this arrangement, the expert will not feel that the
knowledge engineer is monopolizing his or her time.
10. As a corollary to the previous guideline, remember that, typically,
1 day of the expert's time for every 4 days of the knowledge engineer's time will be
needed to develop the expert system. This formula should be kept in mind in
projecting the expert's involvement in the expert system project.
Hopefully, by following these guidelines, the knowledge engineer will create a fruitful
and enjoyable relationship with the expert. Ultimately this should lead to an improved
chance of building and implementing a successful expert system.
1- Harry is a man.
MAN(HARRY)
2- Harry is a tennis player.
TENNISPLAYER(HARRY)
3- All tennis players are athletes.
(FORALL X) [TENNISPLAYER(X) --> ATHLETE(X)]
4- Bob is a coach.
COACH(BOB)
5- All athletes either obey or disobey the coach.
(FORALL X) [ATHLETE(X) --> OBEYS(X,COACH) OR DISOBEYS(X,COACH)]
6- Everyone is loyal to someone.
(FORALL X) (EXISTS Y) LOYALTO(X,Y)
7- Athletes only disobey coaches they aren't loyal to.
(FORALL X)(FORALL Y) [ATHLETE(X) AND COACH(Y) AND DISOBEYS(X,Y) --> NOT LOYALTO(X,Y)]
8- Harry was disobedient to Bob.
DISOBEYS(HARRY, BOB)
If we want to prove "Is Harry loyal to Bob?", the following proof could be done using
predicate calculus:
SHOW: NOT LOYALTO(HARRY, BOB)
By rule 7 with the substitution X=HARRY, Y=BOB, it suffices to show
ATHLETE(HARRY) AND COACH(BOB) AND DISOBEYS(HARRY, BOB).
ATHLETE(HARRY) follows from 2 and 3 by substitution, COACH(BOB) is fact 4,
and DISOBEYS(HARRY, BOB) is fact 8.
True, so Harry is not loyal to Bob.
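The proof above can be mechanized as backward chaining over ground (already substituted) facts and implications. This is a simplified sketch, not a full predicate-calculus prover: the quantified axioms 3 and 7 are written out only for the instantiation X=HARRY, Y=BOB, and predicates are encoded as tuples:

```python
# Sketch of the Harry/Bob proof as goal-directed search over ground terms.

facts = {("MAN", "HARRY"), ("TENNISPLAYER", "HARRY"),
         ("COACH", "BOB"), ("DISOBEYS", "HARRY", "BOB")}

# each rule: IF all antecedents THEN consequent (already instantiated)
rules = [
    ([("TENNISPLAYER", "HARRY")], ("ATHLETE", "HARRY")),              # axiom 3
    ([("ATHLETE", "HARRY"), ("COACH", "BOB"),
      ("DISOBEYS", "HARRY", "BOB")],
     ("NOT_LOYALTO", "HARRY", "BOB")),                                # axiom 7
]

def prove(goal):
    if goal in facts:
        return True
    return any(all(prove(a) for a in antecedents)
               for antecedents, consequent in rules if consequent == goal)

print(prove(("NOT_LOYALTO", "HARRY", "BOB")))  # True: Harry is not loyal to Bob
```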
According to Reggia and Perricone and Software Architecture and Engineering, there
are three criteria to use in selecting a knowledge representation approach: (1)
preexisting format of the knowledge, (2) type of classification desired, and (3)
context dependence of the inference process. Production rules are usually used
when the preexisting format of the knowledge is already organized as rules or is
expressed in terms of rules when the expert is explaining the task to the knowledge
engineer. In this case, by using production rules, the knowledge can be kept in the
same form presently used, thus creating intuitive appeal. Production rules are also
used when the classification of knowledge is predominantly categorical.
If most of the decisions in the expert system task can be answered by “yes” or “no,”
then production rules would be appropriate. Last, if the knowledge has little
context dependence, then production rules are a good form for representing it
because there is not much descriptive knowledge. For descriptive knowledge, other
knowledge representation methods, such as frames, are better.
There are several advantages to using production rules. First, rules are a
natural expression of what-to-do knowledge, that is, procedural knowledge.
Second, all knowledge for a problem is uniformly presented as rules. Third, rules
are comprehensible units of knowledge. Fourth, rules are modular units of
knowledge that can be easily deleted or added. Last, rules may be used to
represent how-to-do knowledge, that is, metaknowledge. Metaknowledge refers to
knowledge about knowledge and can be represented as metarules. A metarule is a
production rule that controls the application of object-level knowledge. It gives
another layer of sophistication to the expert system because it adds an additional
layer of control over the search space to help decide what to do next. A disadvantage of
production rules is that there is a limit to the amount of knowledge that can be
expressed conveniently in a single rule. This is not a severe limitation because, even
when using microcomputer-based expert system shells like Exsys, a rule can have
up to 126 conditions in the IF part and up to 126 conditions in its THEN part.
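The distinction between object-level rules and a metarule can be sketched as follows; the rule contents are invented for illustration:

```python
# Sketch of production rules plus a metarule: object-level rules act on facts,
# while the metarule controls which object-level rules are considered first.

rules = [
    {"name": "R1", "if": {"battery_dead"}, "then": "replace_battery",
     "topic": "electrical"},
    {"name": "R2", "if": {"tank_empty"}, "then": "refuel",
     "topic": "fuel"},
]

def metarule(facts, candidate_rules):
    # metaknowledge: if an electrical fault is reported, try electrical
    # rules before the others
    if "electrical_fault_reported" in facts:
        return sorted(candidate_rules,
                      key=lambda r: r["topic"] != "electrical")
    return candidate_rules

def run(facts):
    facts = set(facts)
    for rule in metarule(facts, rules):   # metarule orders the object rules
        if rule["if"] <= facts:
            facts.add(rule["then"])       # fire the production rule
    return facts

print(run({"battery_dead", "electrical_fault_reported"}))
```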
4.3 Frames
A third knowledge representation method used in expert systems is frames.
Frames, developed by Minsky and Kuipers, are used for declarative knowledge.
Declarative knowledge, in contrast to procedural knowledge, is knowledge that
can’t be immediately executed but can be retrieved and stored. Frames were
developed because there was evidence that people do not analyze new situations
from scratch and then build new knowledge structures to describe those situations.
Instead, people use analogical reasoning and take a large collection of structures,
available in memory, to represent previous experience with objects, locations,
situations, and people [2]. Frames [2] (1) contain information about many aspects
of the objects or situations they describe; (2) contain attributes that must be true
of objects that will be used to fill individual slots; and (3) describe typical instances
of the concepts they represent. Frames are used in situations where there is a large
amount of context dependence, implying the use of descriptive knowledge. Frames
are represented like cookbook recipes, where “slots” are
filled with the ingredients needed for the recipe, and then procedural attachments
(e.g., if-added, if-needed, and to-establish procedures) are used to manipulate the
data (i.e., to fill the slots) within and among the frames, such as going through the
steps on how to actually “prepare” the “recipe.” Default values may be provided
with frames.
Each frame corresponds to one entity and contains a number of
labeled slots for things pertinent to that entity. Slots, in turn, may be
blank, or may be specified by terminals referring to other frames, so
that the collection of frames is linked together in a network. This
keeps the knowledge modular and accessible.
Attempts to design general knowledge structures based on the frame
concept were made by Bobrow and Winograd via the Knowledge
Representation Language and by Roberts and Goldstein via the Frame
Representation Language.
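The slot-and-filler idea, default values, and an "if-needed" procedural attachment can be sketched as below; the frames and slot names are invented for illustration:

```python
# Sketch of a frame system: slots hold values, parents supply inherited
# defaults, and an if-needed procedure fills a slot only on demand.

class Frame:
    def __init__(self, name, parent=None, **slots):
        self.name, self.parent, self.slots = name, parent, slots
        self.if_needed = {}   # slot name -> procedure that fills it on demand

    def get(self, slot):
        if slot in self.slots:
            return self.slots[slot]
        if slot in self.if_needed:        # procedural attachment fires here
            return self.if_needed[slot](self)
        if self.parent:                   # inherit a default from the parent frame
            return self.parent.get(slot)
        return None

bird = Frame("bird", legs=2, can_fly=True)            # generic frame with defaults
tweety = Frame("tweety", parent=bird, color="yellow")  # instance frame
tweety.if_needed["wingspan_cm"] = lambda f: 15         # computed only when asked

print(tweety.get("legs"), tweety.get("can_fly"), tweety.get("wingspan_cm"))
# 2 True 15: defaults inherited, attachment fired on demand
```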
4.4 Scripts
A special kind of frame is sometimes called a script. Clusters of facts can
have useful special-purpose structures that exploit specific properties of their
restricted domains. A script, developed by Schank and Abelson in 1977, is such a
structure that describes a stereotyped sequence of events in a particular context.
The components of a script include the following:
• Entry conditions—conditions that must generally hold before the events in the
script can occur.
• Results—conditions that will generally be true after the events described in the
script have occurred.
• Props—slots representing objects.
• Roles—slots representing people.
• Track—specific variation on a more general pattern represented by a particular
script.
• Scenes—actual sequences of events that occur.
In part of a Restaurant Script, the track can be a coffee shop. The entry
conditions are given where the customer is hungry and has money. The scenes of
entering the coffee shop, ordering, eating, and exiting are displayed. The results of
this script are that the customer has less money, is not hungry (hopefully!), and is
pleased (optional). Another result is that the owner of the coffee shop has more
money. Scripts are helpful in situations where there are many causal relationships
between events. Figure 4.1 shows examples of the knowledge representation
methods discussed.
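The restaurant script above can be sketched as a frame-like structure. The following is a minimal Python sketch, assuming dict-based slots; the data structure and the helper function are illustrative, not from the text:

```python
# A frame-like representation of the restaurant script described above.
# Slot names follow the script components (track, entry conditions, props,
# roles, scenes, results); all values are illustrative.
restaurant_script = {
    "track": "coffee shop",
    "entry_conditions": ["customer is hungry", "customer has money"],
    "props": ["tables", "menu", "food", "money"],
    "roles": ["customer", "server", "owner"],
    "scenes": ["entering", "ordering", "eating", "exiting"],
    "results": ["customer has less money",
                "customer is not hungry",
                "owner has more money"],
}

def satisfies_entry(script, facts):
    """Check whether all entry conditions of the script hold in the given facts."""
    return all(cond in facts for cond in script["entry_conditions"])

facts = {"customer is hungry", "customer has money"}
print(satisfies_entry(restaurant_script, facts))  # True
```

The slots are simply labeled fields, so filling them (e.g., binding "customer" to a particular person) corresponds to instantiating the script for one concrete episode.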
4.5 Semantic Networks
The last major way of representing knowledge in an expert system is by
semantic networks. Semantic networks were introduced by Quillian and Raphael in
1968 and are used to represent declarative knowledge. With semantic networks,
knowledge is organized around the objects being described, but objects are
represented as nodes on a graph and relations among them are represented by
labeled arcs. A semantic network is a collection of nodes and arcs where the
following conditions occur:
• Nodes represent classes, objects, concepts, situations, events, and so on.
• Nodes have attributes with values that describe the characteristics of the thing they represent.
• Arcs represent relationships between nodes.
• Arcs allow us to organize knowledge within a network hierarchically.
For example, Figure 4.2 shows a fragment of a semantic network on computers.
This figure shows the following associations:
MICROCOMPUTER Isa COMPUTER
IBM PC Owner ME
ME Isa PERSON
This is only a fragment of a semantic network because we haven’t included all the
relevant nodes and arcs relating to microcomputers, nor have we indicated
those nodes and arcs associated with other kinds of computers, like
minicomputers, mainframes, superminicomputers, supercomputers, and
laptop-size computers.
The reasoning in a semantic network depends on the procedures used to
manipulate the network. The following steps are usually accomplished:
• Matching—match patterns against one or more nodes to retrieve information; heuristics may be needed to tell where to begin matching.
• Inference—derive general properties by examining a set of nodes for common features and relations.
• Deduction—follow paths through a set of nodes to derive a conclusion.
The main advantage of using semantic networks is that for each object,
event, or concept, all the relevant information is collected together. Semantic
networks are used to represent specific events or experiences, as well as for tasks
that have a large amount of context dependence.
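The fragment above (MICROCOMPUTER Isa COMPUTER, IBM PC Owner ME, ME Isa PERSON) can be sketched as a small labeled graph. A minimal Python sketch, with the dict encoding as an assumed representation, follows the Isa arcs to perform the deduction step:

```python
# A tiny semantic network: each node maps relation labels to target nodes.
# The facts mirror the fragment shown in the text.
network = {
    "MICROCOMPUTER": {"Isa": "COMPUTER"},
    "IBM PC": {"Isa": "MICROCOMPUTER", "Owner": "ME"},
    "ME": {"Isa": "PERSON"},
}

def isa_chain(node):
    """Deduction: follow Isa arcs from a node up to the top of the hierarchy."""
    chain = [node]
    while "Isa" in network.get(node, {}):
        node = network[node]["Isa"]
        chain.append(node)
    return chain

print(isa_chain("IBM PC"))  # ['IBM PC', 'MICROCOMPUTER', 'COMPUTER']
```

Because arcs are labeled, the same structure supports the other operations listed above: matching is a lookup of a relation in a node's map, and inference can scan several nodes for shared relation-value pairs.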
4.6 Case-Based Reasoning
Another paradigm of increasing interest is called case-based reasoning. Case-based
reasoning is built on the premise of analogical reasoning, whereby one relies on
past episodes or experiences and modifies an old plan to fit the new situation. It
assumes a memory model for representing, indexing, and organizing past cases
and a process model for retrieving and modifying old cases and assimilating new
ones.
Many computer programs use case-based reasoning for problem solving or
interpretation: MEDIATOR and PERSUADER use cases to resolve disputes. CLAVIER
and KRITIK use case-based reasoning for design, HYPO uses cases for legal
reasoning, and MEDIC utilizes case-based reasoning for diagnosis. With today’s
growing interest in case-based reasoning, several case-based reasoning shells are
being sold commercially. These include ReMind (Cognitive Systems), Esteem
(Esteem Software, Inc.), and CBR Express (Inference Corporation). Many research
issues still need to be resolved to advance case-based reasoning. These include the
representation of episodic knowledge, memory organization, indexing, case
modification, and learning.
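The retrieve-and-modify process model described above can be sketched in a few lines. This is a minimal illustration, not any real shell's algorithm; the case base, feature sets, and overlap-based similarity measure are all assumptions made for the example:

```python
# A minimal case-based reasoning sketch: retrieve the most similar stored
# case (by feature overlap) and reuse its solution for the new problem.
# Cases and solutions are invented for illustration only.
cases = [
    {"features": {"fever", "cough"}, "solution": "treat respiratory infection"},
    {"features": {"fever", "rash"}, "solution": "treat viral exanthem"},
]

def retrieve(problem_features):
    """Return the stored case with the largest feature overlap."""
    return max(cases, key=lambda c: len(c["features"] & problem_features))

def solve(problem_features):
    case = retrieve(problem_features)
    # 'Modification' here is trivial: reuse the old solution and report
    # which features matched; real systems adapt the plan to the new case.
    return case["solution"], case["features"] & problem_features

solution, matched = solve({"fever", "cough", "headache"})
print(solution)  # treat respiratory infection
```

Assimilating a new episode would simply append a new case dict to the case base, which is why indexing and memory organization are listed as open research issues.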
4.7 Object-Oriented Programming
Object-oriented programming (OOP) is another popular paradigm that is gaining
worldwide interest. Object-oriented representation of knowledge can be used in
expert systems. Some of the main features of OOP involve the use of objects,
inheritance and specialization, and methods and message passing.
An object is a data structure that contains both data and procedures. The data
structures contained within an object are called attributes (or slots or
variables). Inheritance is the technique that ensures that the child of any object will
include the data structures and methods associated with its parents. Inheritance
means that a developer does not have to re-create slots or methods that have been
created. Specialization is the idea that one can specialize (or override) information
that is inherited. A method is a function or procedure attached to an object. A
method is activated by a message sent to the object. Message passing (or binding)
is the process involved in sending messages to objects.
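The features just named (objects with slots, inheritance, specialization, and methods activated by messages) can be shown in a short sketch. The class names are illustrative, not from the text:

```python
# A minimal sketch of the OOP features described above.
class Neuron:                        # an object bundles data and procedures
    def __init__(self, threshold=0.5):
        self.threshold = threshold   # attribute (slot)

    def fire(self, activation):      # method, activated by 'sending a message'
        return activation > self.threshold

class InhibitoryNeuron(Neuron):      # inheritance: child reuses parent's slots
    def fire(self, activation):      # specialization: override the parent method
        return not super().fire(activation)

n = Neuron()
print(n.fire(0.9))                   # True  (message 'fire' sent to n)
print(InhibitoryNeuron().fire(0.9))  # False (overridden behavior)
```

Note that `InhibitoryNeuron` did not redeclare the `threshold` slot; it inherited it, which is exactly the "no need to re-create slots or methods" point made above.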
Pure object-oriented languages include Smalltalk, Simula 76, and Eiffel.
Hybrid object-oriented languages, which are built on an existing high-level
language, include C++, Turbo Pascal 5.5, and CLOS. It is expected that object-
oriented language and development tools, expert system tools, and current CASE
(computer-aided software engineering) tools will all merge into a single product by
the mid-1990s.
Soft Computing
Definition
According to Zadeh (who introduced this term), soft computing differs
from hard (conventional) computing in its tolerance of:
• Imprecision
• Uncertainty
• Partial truth
Methodology Attributes
The methodologies and their hybrids include neural networks (NN), genetic algorithms (GA), and fuzzy logic, along with hybrid approaches such as neuro-fuzzy and GA-fuzzy systems.
From AI to soft computing
• Conventional AI symbolism
• New trends of AI
Purpose/Goal
• The main purpose of modern CDSS is to assist clinicians at the point
of care. This means that a clinician would interact with a CDSS to
help determine diagnosis, analysis, etc. of patient data
• Previous theories of CDSS were to use the CDSS to literally make
decisions for the clinician.
• The new methodology of using CDSS requires the clinician to interact with the
CDSS, utilizing both his own knowledge and the CDSS to make a better
analysis of the patient's data than either human or CDSS could make on
their own.
Functions of Computer-Based Clinical Decision
Support Systems
Function: Example
• Alerting: highlighting out-of-range laboratory values
• Reminding: reminding the clinician to schedule a drug dose
• Interpreting: interpreting the electrocardiogram
• Predicting: predicting risk of mortality from severity of illness
• Diagnosing: listing a differential diagnosis for a patient with chest pain
• Assisting: tailoring the antibiotic choices for liver transplant and renal failure
• Suggesting: generating suggestions for adjusting the mechanical ventilator
Types of CDSS:
There are two types of CDSS
To close the gap between physicians and CDSSs, the evidence-based
approach appeared to be a perfect technique. It has proved to be a very powerful
tool for improving clinical care and patient outcomes. It has the
potential to improve quality and safety as well as to reduce cost.
Characteristics of a Successful Rule-based System for CDSS
• Clear and available interventions
• For high-volume areas in the hospital
• Addresses problems with high mortality or morbidity
• In clinical areas of high importance to the organization
• Cost-benefits are clear
• Health care professionals are not overloaded with alerts
• Easy for clinicians to satisfy the alerts (simply ordering a test)
• Willingness of clinicians to accept the alert
• Timeliness of the alert
• Rules differ depending on the audience (nurse or physician, for example)
• Easiest rules to implement are those that satisfy clinicians and maintain their interest
• Need for partnership between clinical and IT groups
What Clinicians Want in a CDSS
• Efficient
• Not time consuming
• Alerts are triggered only for eligible patients
• Exceptions can be indicated
• Repetition is minimized
• User-friendly interface and presentation
• Easy to see alerts
• Easy to respond to alerts
• Content is accurate and robust
• Easy to access additional information from the alert
• Integrated into the workflow
• Alert appears at an appropriate time
• Alert appears to the appropriate person
Introduction to Neural Networks
6.1 Biological Basis
6.2 Neurocomputing Fundamentals
6.3 Neural Network Architectures
6.4 Learning Paradigms
6.5 Advantages and Limitations of Neural Networks
2/5/2019 95
6.1 Biological Basis
Biological Neural Networks
• The brain is composed of over 100 different kinds of special cells called neurons.
• The number of neurons in the brain is estimated to range from 50 billion to over 100 billion.
• Neurons are divided into interconnected groups called networks and provide specialized functions.
• Each group contains several thousand neurons that are highly interconnected with each other.
• The brain can be viewed as a collection of neural networks. A portion of a network composed of two cells is shown in Figure 6.1.
• Human intelligence is used to understand the various visual features that are extracted and stored in memory.
• The ability to learn from and react to changes in our environment requires intelligence.
• An example is the optical path in visual systems. External stimuli are transformed via cone cells and rod cells into signals that map features of the visual image into internal memory.
• An artificial neural network (ANN) is a model that emulates a biological neural network.
• The nodes in an ANN are based on the simplistic mathematical representation of what we think real neurons look like.
• Today's neural computing uses a limited set of concepts from biological neural systems to implement software simulations of massively parallel processes involving processing elements (also called artificial neurons or neurodes) interconnected in a network architecture.
• The neurode is analogous to the biological neuron, receiving inputs that represent the electrical impulses that the dendrites of biological neurons receive from other neurons.
• The output of the neurode corresponds to a signal sent out from a biological neuron over its axon.
• The axon of the biological neuron branches to the dendrites of other neurons, and the impulses are transmitted over synapses. A synapse is able to increase or decrease its strength, thus affecting the level of signal propagation, and is said to cause excitation or inhibition of a subsequent neuron.
Artificial Neural Networks
• The state of the art in neural computing is inspired by our current understanding of biological neural networks.
• However, despite the extensive research in neurobiology and psychology, important questions remain about how the brain and the mind work.
The "basic" biological neuron
Artificial Neural Networks (cont.)
• Research and development in the area of ANN is producing interesting and useful systems that borrow some features from the biological systems, even though we are far from having an artificial brain-like machine. The field of neural computing is in its infancy, with much research and development required in order to mimic the brain and mind. However, many useful techniques inspired by the biological systems have already been developed and are finding use in real-world applications.
More recently, neural network development •
systems and tools have become commercially
available. As with expert systems, the
availability of a convenient development
method is allowing the spread of
neurocomputing and is putting
neurocomputing on the road to becoming part
of the standard repertoire of systems
developers.
6.2 Neurocomputing Fundamentals
• The key concepts needed to understand artificial neural networks will now be discussed.
Neurode
• An ANN is composed of basic units called artificial neurons, or neurodes, that are the processing elements (PEs) in a network. Each neurode receives input data, processes it, and delivers a single output. This process is shown in Figure 6.2. The input can be raw data or the output of other PEs. The output can be the final product or it can be an input to another neurode.
Figure 6.2 Models of the artificial neurode and network: inputs (analogous to dendrites) are weighted, summed, and passed through a function f to produce the output (analogous to the axon).
Networks
• An ANN is composed of a collection of interconnected neurons that are often grouped in layers; however, in general, no specific architecture should be assumed. The various possible neural network topologies are the subject of research and development.
Network Architectures
Networks
• In terms of layered architectures, two basic structures are shown in Figure 6.3. In part (a) we see two layers: input and output. In part (b) we see three layers: input, intermediate (called hidden), and output. An input layer receives data from the outside world and sends signals to subsequent layers.
• The output layer interprets signals from the previous layer to produce a result that is transmitted to the outside world; this result represents the network's understanding of the input data.
Figure 6.3 Taxonomy of ANN architectures and learning algorithms: architectures are recurrent (e.g., Hopfield, Boltzmann) or feed-forward (e.g., multilayer perceptron); learning algorithms are supervised (estimators, e.g., backpropagation) or unsupervised (extractors, e.g., SOFM, ART-1, ART-2), for discrete/binary or continuous data.
Input
• Each input corresponds to a single attribute of a pattern or other data in the external world. The network can be designed to accept sets of input values that are either binary-valued or continuously valued. For example, if the problem is to decide whether or not to approve a loan, an attribute can be income level, age, and so on. Note that in neurocomputing, we can only process numbers. Therefore, if a problem involves qualitative attributes or graphics, the information must be preprocessed to a numerical equivalent before it can be interpreted by the ANN.
• Examples of inputs to neural networks are pixel values of characters and other graphics, digitized images and voice patterns, digitized signals from monitoring equipment, and coded data from loan applications. In all cases, an important initial step is to design a suitable coding system so that the data can be presented to the neural networks, commonly as sets of 1s and 0s. For example, a 6x8-pixel character would be a 48-bit vector input to the network.
Output
• The output of the network is the solution to the problem. For example, in the loan application case it may be yes or no.
• The ANN, again, will assign numerical values (e.g., +1 means yes; zero means no). The purpose of the network is to compute the value of the output. In the supervised type of ANN, the initial output of the network is usually incorrect, and the network must be adjusted or tuned until it gives the proper output.
Hidden Layers
• In multilayered architectures, the inner (hidden) layers do not directly interact with the outside world, but rather add a degree of complexity to allow the ANN to operate on more interesting problems.
• The hidden layer adds an internal representation of the problem that gives the network the ability to deal robustly with inherently nonlinear and complex problems.
Weights
• The weights in an ANN express the relative strengths (or mathematical values) of the various connections that transfer data from layer to layer. In other words, the weights express the relative importance of each input to a PE.
• Weights are crucial to an ANN because they are the means by which we repeatedly adjust the network to produce desired outputs and thereby allow the network to learn.
• The objective in training a neural network is to find a set of weights that will correctly interpret all the sets of input values that are of interest for a particular problem. Such a set of weights is possible if the number of neurodes, the architecture, and the corresponding number of weights form a sufficiently complex system to provide just enough parameters to adjust (or "tune") to produce all the desired outputs.
Summation Function
• The summation function finds the weighted sum of all the input elements. A simple summation function will multiply each input value (X_j) by its weight (W_j) and total them for a weighted sum, S_i. The formula for N input elements is:

S_i = Σ_{j=1}^{N} W_j X_j
• The neurodes in a neural network thus have very simple processing requirements. Mainly, they need to monitor the incoming signals from other neurodes, compute the weighted sums, and determine a corresponding signal to send to other neurodes.
Transformation Function
• The summation function computes the internal stimulation or activation level of the neuron. Based on this level, the neuron may or may not produce an output. The relationship between the internal activation level and the output may be either linear or nonlinear. Such relationships are expressed by a transformation function. The sigmoid function, which is commonly and effectively used, is discussed here.
From Logical Neurons to Finite Automata
Threshold logic units can realize Boolean functions (e.g., AND with threshold 1.5, OR with threshold 0.5, and NOT with a weight of -1), and a Boolean net of such neurons corresponds to a finite automaton (Brains, Machines, and Mathematics, 2nd Edition, 1987).
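The logical neurons just described can be sketched directly. The AND and OR thresholds (1.5 and 0.5) come from the figure; the NOT unit's threshold of -0.5 is an assumption chosen so that a -1 weight inverts its input:

```python
# Threshold logic units: fire (1) when the weighted input sum reaches
# the threshold. AND/OR thresholds follow the figure; NOT's threshold
# of -0.5 is an assumed value that makes the -1 weight invert the input.
def threshold_unit(inputs, weights, threshold):
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

AND = lambda x, y: threshold_unit([x, y], [1, 1], 1.5)
OR  = lambda x, y: threshold_unit([x, y], [1, 1], 0.5)
NOT = lambda x:    threshold_unit([x], [-1], -0.5)

print(AND(1, 1), OR(0, 1), NOT(1))  # 1 1 0
```

Wiring such units together gives the Boolean nets mentioned above; adding state (feeding outputs back as inputs) yields the finite automata.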
Transformation Function (cont.)
• The selection of the specific transformation function is one of the variables considered in choosing a network architecture and learning paradigm. Although many different functions are possible, a very useful and popular nonlinear transfer function is the sigmoid (or logistic activation) function.
• Its formula is:

Y_T = 1 / (1 + e^(-S))
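A single neurode combining the summation function and the sigmoid transformation above can be sketched as follows; the input and weight values are illustrative:

```python
# One neurode: weighted sum S = sum(W_j * X_j), then sigmoid Y_T = 1/(1+e^-S).
import math

def neurode(inputs, weights):
    s = sum(w * x for w, x in zip(weights, inputs))  # summation function
    return 1.0 / (1.0 + math.exp(-s))                # sigmoid transformation

y = neurode([1.0, 0.5], [0.8, -0.2])
print(round(y, 3))  # sigmoid(0.7) ~ 0.668
```

Note how the sigmoid squashes any weighted sum into the interval (0, 1), which is the "bounded output value" property used in the learning discussion below.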
• The collective action of a neural network is like that of a committee or other group making a decision. Individuals interact and affect each other in the process of arriving at a group decision. The global average or consensus of the group is more significant than an individual opinion and can remain the same even if some individuals drop out. Also, a group can have different mechanisms for arriving at the collective decision.
Learning
• The sets of weight values for a given neural network represent different states of its memory or understanding of the possible inputs. In supervised networks, training involves adjustment of the weights to produce the correct outputs. Thus, the network learns how to respond to patterns of data presented to it. In other types of ANN, the networks self-organize and learn categories of input data (Figure 6.3).
• An important function of the artificial neuron is the evaluation of its inputs and the production of an output response. A weighted sum of the inputs from the simulated dendrites is evaluated to determine the level of the output on the simulated axon. Most artificial systems use threshold values, and a common activation function is the sigmoid function, which can squash the total input summation to a bounded output value, as shown in Figure 6.2.
• This model of the neuron, or basic perceptron, requires a learning algorithm for deriving the weights that correctly represent the knowledge to be stored. A fundamental concept in that regard is Hebbian learning, based on Donald Hebb's work in 1949 on biological systems, which postulates that active connections should be reinforced.
• This means that the strengths (weights) of the interconnections increase if the prior node repeatedly stimulates the subsequent node to generate an output signal. In some algorithms, the weights of connections may also be decreased if they are not involved in stimulating a node, and negative weights can also be used to represent inhibiting actions.
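The Hebbian rule just described can be sketched in a few lines. The learning rate and the small decay for uninvolved connections are assumed values for illustration:

```python
# A minimal Hebbian update: reinforce a connection when the prior (pre)
# node and the subsequent (post) node are active together; otherwise
# apply a mild decay. The rate and decay factor are assumptions.
def hebbian_update(w, pre, post, rate=0.1):
    if pre and post:
        return w + rate       # active connection: reinforce
    return w - rate * 0.1     # uninvolved connection: slight decrease

w = 0.5
for pre, post in [(1, 1), (1, 1), (0, 1)]:  # two co-activations, one miss
    w = hebbian_update(w, pre, post)
print(round(w, 3))  # 0.69
```

The weight grows on the two co-activations and shrinks slightly on the third step, matching the reinforce/decrease behavior described above.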
• For more complex neural computing applications, neurodes are combined in various architectures useful for information processing (Figure 6.4).
• A common arrangement has layers of neurodes with forward connections from each neurode to every neurode except those in the same or a prior layer. Useful applications require multiple (hidden) layers between the input and output neurodes and a correspondingly large number of connections.
• Information processing with neural computers consists of analyzing patterns of data, with learned information stored as neurode connection weights.
• A common characteristic is the ability of the system to classify streams of input data without the explicit knowledge of rules and to use arbitrary patterns of weights to represent the memory of categories.
• During the learning stages, the interconnection weights change in response to training data presented to the system. In contrast, during recall, the weights are fixed at the trained values.
• Although most applications use software simulations, neural computing will eventually use parallel networks of simple processors that use the strengths of the interconnections to represent memory.
• Each processor will compute node outputs from the weights and input signals from other processors.
• Together the network of neurons can store information that can be recalled to interpret and classify future inputs to the network.
6.3 Neural Network Architectures
• Many different neural network models and implementations are being developed and studied. Three representative architectures (with appropriate learning paradigms) are shown in Figure 6.4 and are discussed next.
Network Architectures
1. Associative memory systems. These systems correlate input data with information stored in memory. Information can be recalled from incomplete or noisy input, and the performance degrades slowly as neurons fail. Associative memory systems can detect similarities between new input and stored patterns. Most neural network architectures can be used as associative memories; a prime example is the Hopfield network.
Multi-layer Network
Multi-layer Perceptron Classifier
2. Multiple-layer systems. Associative memory systems can have one or more intermediate (hidden) layers. An example of a simple network is shown in Figure 6.4. The most common learning algorithm for this architecture is back propagation, which is a kind of credit-blame approach to correcting and reinforcing the network as it adjusts to the training data presented to it.
Another type, competitive filter associative memory, can learn by changing its weights in recognition of categories of input data without being provided examples by an external trainer. A leading example of such a self-organizing system for a fixed number of classes in the inputs is the Kohonen network.
3. Double-layer structures. A double-layer structure, exemplified by the adaptive resonance theory (ART) approach, does not require knowledge of a precise number of classes in the training data but uses feed-forward and feedback to adjust parameters as data is analyzed to establish arbitrary numbers of categories that represent the data presented to the system.
• Parameters can be adjusted to tune the sensitivity of the system and produce meaningful categories.
6.4 Learning Paradigms
• An important consideration in an ANN is the appropriate use of algorithms for learning. ANNs have been designed for different types of learning:
• Heteroassociation—mapping one set of data to another. This produces output that generally is different in form from the input pattern. It is used, for example, in stock market prediction applications.
• Autoassociation—storing patterns for error tolerance. It reproduces an output pattern similar to or exactly the same as the input pattern. It is used in optical character recognition systems.
• Regularity detection—looking for useful features in data (feature extraction). It is used in sonar signal identification systems.
• Reinforcement learning—acting on feedback. This is a supervised form of learning in which the teacher is more of a critic than an instructor. It is used in controllers for hypersonic spaceplanes.
• Two basic approaches to learning in an ANN exist: supervised and unsupervised. These approaches will now be discussed.
Supervised Learning
• In the supervised learning approach, we use a set of inputs for which the appropriate outputs are known.
• In one type, the difference between the desired and actual outputs is used to calculate corrections to the weights of the neural network (learning with a teacher).
• A variation on that approach simply acknowledges for each input trial whether or not the output is correct as the network adjusts weights in an attempt to achieve correct results (reinforcement learning).
Unsupervised Learning
• In unsupervised learning, the neural network self-organizes to produce categories into which a series of inputs fall. No knowledge is supplied about what classifications are correct, and those that the network derives may or may not be meaningful to the person using the network. However, the number of categories into which the network classifies the inputs can be controlled by varying certain parameters in the model. In any case, a human must examine the final categories to assign meaning and determine the usefulness of the results. Examples of this type of learning are the ART and the Kohonen self-organizing feature maps.
Perceptron Learning
• As a simple example of learning, consider a single neuron that learns the inclusive OR operation.
• The neuron must be trained to recognize the input patterns and classify them to give the corresponding outputs.
• The procedure is to present to the neuron the sequence of input patterns and adjust the weights after each one.
• This step is repeated until the weights converge to one set of values that allow the neuron to classify correctly each of the four inputs.
• The results shown in the following example were produced using Excel spreadsheet calculations.
• In this simple case of perceptron learning, the following example uses a step function to evaluate the summation of input values. After outputs are calculated, a measure of the error between the output and the desired values is used to update the weights, subsequently reinforcing correct results. At any step in the process,

delta = Z - Y

where Z and Y are the desired and actual outputs, respectively.
• Then the updated weights are w_i = w_i + alpha * delta * x_i, where alpha is a parameter that controls how rapidly the learning takes place.
• As shown in Table 6.1, each calculation uses one of the x1 and x2 pairs and the corresponding value for the OR operation, along with the initial values, w1 and w2, of the neurode weights. In this example, the weights are assigned random values at the beginning, and the learning rate, alpha, is set to be relatively low. The value Y is the result of the calculation using the equation just described, and delta is the difference between Y and the desired result. Delta is used to derive the final weights, which then become the initial weights in the next row.
• The initial values of weights for each input are transformed, using the previous equation, to values that are used with the next input. The threshold value causes the Y value to be 1 if the weighted sum of inputs is greater than 0.5; otherwise, the output is set to 0. In this example, after the first step, two of the four outputs are incorrect and no consistent set of weights has been found. In the subsequent steps, the learning algorithm produces a set of weights that can give the correct results. Once determined, a neuron with those weight values can quickly perform the OR operation.
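The procedure just described can be run directly in code (Table 6.1 is not reproduced here, so the initial weights and the learning rate below are assumed values; the 0.5 threshold and the update rule come from the text):

```python
# Perceptron learning of the inclusive OR: step-function output with
# threshold 0.5, update rule w_i = w_i + alpha * delta * x_i.
patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = [0.1, 0.3]           # assumed initial weights
alpha = 0.2              # assumed learning rate

def output(x):
    return 1 if w[0] * x[0] + w[1] * x[1] > 0.5 else 0

for _ in range(20):      # present the patterns until the weights converge
    for x, z in patterns:
        delta = z - output(x)                                # delta = Z - Y
        w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]

print([output(x) for x, _ in patterns])  # [0, 1, 1, 1]
```

After a few passes the weights stop changing, and the neuron classifies all four OR inputs correctly, matching the behavior described for the spreadsheet example.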
Back Propagation
• Although many supervised learning examples exist, other important cases, such as the exclusive OR, cannot be handled with a simple neural network. Patterns must be linearly separable—that is, in the x-y plot of pattern space, it must be possible to draw a straight line that divides the clusters of input-output points that belong to the desired categories.
• In the previous example, the input-output pairs (0,1), (1,0), and (1,1) are linearly separable from (0,0). Although the requirement of a linearly separable input pattern space caused initial disillusionment with neural networks, recent models such as back propagation in multilayer networks have greatly broadened the range of problems that can be addressed.
• Back propagation, a popular technique that is relatively easy to implement, requires training data to provide the network with experience before using it for processing other data. Externally provided correct patterns are compared with the neural network output during training, and feedback is used to adjust the weights until all training patterns are correctly categorized by the network. In some cases, a disadvantage of this approach is prohibitively large training times.
• For any output neuron j, the error is delta_j = (Z_j - Y_j) φ′, where Z_j and Y_j are the desired and actual outputs, respectively, and φ′ is the slope of the sigmoid function φ evaluated at the jth neuron. If φ is chosen to be the logistic function, then φ′ = dφ/dx = φ(1 - φ), where φ(x) = [1 + exp(-x)]^(-1) and x is proportional to the sum of the weighted inputs of the jth neuron. A more complicated expression can be derived to work backward from the output neurons through the inner layers to calculate the corrections to their associated weights.
• The procedure for executing the learning algorithm is as follows: initialize weights and other parameters to random values, read in the input vector and desired output, calculate the actual output via the calculations forward through the layers, and change the weights by calculating errors backward from the output layer through the hidden layers. This procedure is repeated for all the input vectors until the desired and actual outputs agree within some predetermined tolerance.
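The procedure above can be sketched on the exclusive OR, the very problem a single-layer network cannot separate. This is a minimal illustration under assumed choices (a 2-2-1 network of logistic units, a fixed random seed, learning rate 0.5), not a production implementation:

```python
# Minimal backpropagation sketch: a 2-2-1 network of logistic units
# trained on XOR. delta = (Z - Y) * phi', with phi' = phi * (1 - phi).
import math, random

random.seed(0)
def phi(x): return 1.0 / (1.0 + math.exp(-x))

w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # input->hidden (+bias)
w_o = [random.uniform(-1, 1) for _ in range(3)]                      # hidden->output (+bias)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
alpha = 0.5

def forward(x):
    h = [phi(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]  # forward pass
    y = phi(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
    return h, y

def epoch():
    err = 0.0
    for x, z in data:
        h, y = forward(x)
        delta_o = (z - y) * y * (1 - y)          # output error times phi'
        for j in range(2):                       # work backward to hidden layer
            delta_h = delta_o * w_o[j] * h[j] * (1 - h[j])
            w_h[j][0] += alpha * delta_h * x[0]
            w_h[j][1] += alpha * delta_h * x[1]
            w_h[j][2] += alpha * delta_h
        for j in range(2):
            w_o[j] += alpha * delta_o * h[j]
        w_o[2] += alpha * delta_o
        err += (z - y) ** 2
    return err

first = epoch()
for _ in range(5000):
    last = epoch()
print(first > last)  # True: training reduces the squared error
```

The per-sample weight changes implement the "errors backward from the output layer through the hidden layers" step; repeating epochs corresponds to cycling through all input vectors until the error is within tolerance.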
6.5 Advantages and Limitations of
Neural Networks
• Although ANNs offer exciting possibilities, they also have certain limitations.
• Traditional AI approaches have in their favor the more transparent mechanisms, often expressed in terms such as logic operations and rule-based representations, that are meaningful to us in our everyday lives.
• By comparison, ANNs do not use structured knowledge with symbols used by humans to express reasoning processes.
• Furthermore, ANNs have so far been used for classification problems and, although quite effective in that task, need to be expanded to other types of intelligent activities.
• An ANN's weights, even though quite effective, are just a set of numbers that in most cases have no obvious meaning to humans. Thus, an ANN is a black-box solution to problems, and an explanation system cannot be constructed to justify a given result. As noted before, another limitation can be excessive training times, for example, in ANNs using back propagation.
• Neurocomputing is a relatively new field, and continued research and development will surely minimize the limitations and find the further strengths of this approach. The fault-tolerance aspects of ANNs will be improved, allowing them to be effective as individual neurodes fail or have incorrect input. The exciting prospects of self-organizing networks will be exploited to produce systems that learn on their own how to categorize input data.
• Future systems will improve in the areas of generalization and abstraction, with the ability to go beyond the training data to interpret patterns not explicitly seen before. Finally, the collaboration between scientists in neurocomputing and neurobiology should lead to advances in each field as computers mimic what we understand about human thinking and as neuroscientists learn from computer simulations of the theories of human cognition.