
UNIT-4

Introduction, Approaches to Knowledge Representation


Knowledge is the information about a domain that can be used to solve problems in that
domain. Solving many problems requires much knowledge, and this knowledge must be
represented in the computer. As part of designing a program to solve problems, we must
define how the knowledge will be represented. A representation scheme specifies the form
in which knowledge is encoded in an agent; a representation of some piece of knowledge
is its internal encoding in that form; and a knowledge base is the representation of all
of the knowledge stored by an agent.
Knowledge and representation are two distinct entities. They play central but
distinguishable roles in intelligent systems.

 Knowledge is a description of the world. It determines a system's competence by what it knows.

 Representation is the way knowledge is encoded. It defines a system's performance in doing something.

 Different types of knowledge require different kinds of representation. Knowledge Representation models/mechanisms are often based on:

1. Logic: Propositional Logic, Predicate Logic

2. Rules: Production rules, sometimes called IF-THEN rules, are the most popular form of
KR. Production rules are simple but powerful, and they provide the flexibility of
combining declarative and procedural representation in a unified form (a minimal sketch
in code follows this list).
Examples of production rules :
1. IF condition THEN action
2. IF premise THEN conclusion
3. IF proposition p1 and proposition p2 are true THEN proposition p3 is true
3. Frames
4. Semantic Net
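
To make the IF-THEN idea concrete, below is a minimal forward-chaining sketch in Python. The rules and fact names (sneezes, has_cold, etc.) are illustrative assumptions, not part of any particular expert-system shell:

# Minimal forward chaining over IF-THEN production rules.
# Each rule is (premises, conclusion); 'facts' is the working memory.
# The rule contents here are illustrative assumptions.
rules = [
    ({"sneezes"}, "has_cold"),
    ({"has_cold", "no_fever"}, "take_rest"),
]
facts = {"sneezes", "no_fever"}

changed = True
while changed:                      # fire rules until nothing new is derived
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)   # fire the rule: assert its conclusion
            changed = True

print(facts)  # {'sneezes', 'no_fever', 'has_cold', 'take_rest'}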

Knowledge representation using semantic network

Components of a Semantic Network:


We can define a Semantic Network by specifying its fundamental components (a code sketch follows the list):
Lexical part:
nodes – denoting objects
links – denoting relations between objects
labels – denoting particular objects and relations
Structural part:
the links and nodes form directed graphs
the labels are placed on the links and nodes
Semantic part:
meanings are associated with the link and node labels
(the details will depend on the application domain)
Procedural part:
constructors allow the creation of new links and nodes
destructors allow the deletion of links and nodes
writers allow the creation and alteration of labels
readers can extract answers to questions
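
A minimal sketch of such a network in Python, stored as (node, link, node) triples; the nodes and relations are taken from essay question 1 below and are otherwise assumptions:

# A semantic net as (source, relation, target) triples.
# Node and relation names here are illustrative assumptions.
triples = {
    ("John", "is_a", "Human"),
    ("Human", "is_a", "LivingThing"),
    ("LivingThing", "needs", "Oxygen"),
}

def objects(subject, relation):
    """All targets reachable from subject via relation in one step."""
    return {t for (s, r, t) in triples if s == subject and r == relation}

def inherits(subject, relation):
    """Follow is_a links upward, collecting values of relation (inheritance)."""
    found = objects(subject, relation)
    for parent in objects(subject, "is_a"):
        found |= inherits(parent, relation)
    return found

print(inherits("John", "needs"))  # {'Oxygen'} – John inherits 'needs Oxygen'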

Semantic Networks as Knowledge Representations:


Using Semantic Networks for representing knowledge has particular advantages:
a. They allow us to structure the knowledge to reflect the structure of that part of the
world which is being represented.
b. The semantics, i.e. real world meanings, are clearly identifiable.
c. There are very powerful representational possibilities as a result of “is a” and “is a
part of” inheritance hierarchies.
d. They can accommodate a hierarchy of default values (for example, we can
assume the height of an adult male to be 178cm, but if we know he is a baseball
player we should take it to be 195cm).
e. They can be used to represent events and natural language sentences.
Knowledge representation using frames

The idea of semantic networks started out as a natural way to represent labeled
connections between entities. But, as the representations are expected to support
increasingly large ranges of problem solving tasks, the representation schemes
necessarily become increasingly complex. In particular, it becomes necessary to
assign more structure to nodes, as well as to links.
It is natural to use database ideas to keep track of everything, and the nodes and
their relations begin to look more like frames.
A frame consists of a selection of slots which can be filled by values, or
procedures for calculating values, or pointers to other frames.
A complete frame-based representation will consist of a whole hierarchy or
network of frames connected together by appropriate links/pointers.
Frames as a Knowledge Representation:
The simplest type of frame is just a data structure with similar properties and possibilities
for knowledge representation as a semantic network, with the same ideas of inheritance
and default values.
Frames become much more powerful when their slots can also contain instructions
(procedures) for computing things from information in other slots or in other frames.
The original idea of frames is due to Minsky (1975), who defined them as “data
structures for representing stereotyped situations”, such as going into a hotel room.
Frames of this kind are now generally referred to as scripts.

Converting between Semantic Networks and Frames:


It is easy to construct a frame for each node of a semantic net by reading off its links; the Mammal/Elephant frames below illustrate this.

Two types of frames:


• Individual frames: represent a single object like a person, part of a trip
• Generic frames: represent categories of objects, like students

Individual frames: An individual frame is a named list of buckets called slots.
What goes in a bucket is called the filler of the slot:
(frame-name
<slot-name1 filler1>
<slot-name2 filler2 > …)

Frames and semantic nets: Frames can be viewed as a structural representation of semantic
nets.
Examples: below are four frames for the entities "Mammal", "Elephant", "Clyde", and "Nellie"
(The symbol * means that the value of the feature is typical for the entity represented by the
frame.)
Mammal
  subclass: Animal
  warm_blooded: yes
Elephant
  subclass: Mammal
  * color: grey
  * size: large
Clyde
  instance: Elephant
  color: pink
  owner: Fred
Nellie
  instance: Elephant
  size: small
Components of a frame entity:
 Name - corresponds to a node in a semantic net
 Attributes (also called slots) filled with particular values
E.g., in the frame for Clyde, instance is the name of a slot and Elephant is the value of the slot.
Names of slots correspond to the links in semantic nets
Values of slots correspond to nodes. Hence each slot can be another frame.
Example:
Size
  instance: Slot
  single_valued: yes
  range: Size-set
Owner
  instance: Slot
  single_valued: no
  range: Person
The attribute value Fred (and even "large", "grey", etc) could be represented as a frame, e.g.:
Fred
  instance: Person
  occupation: Elephant-breeder
Frames have greater representational power than semantic nets
 Necessary attributes
 Typical attributes ("*" used to indicate attributes that are only true of a typical member of
the class, and not necessarily every member).
 Type constraints and default values of slots, overriding values.
 Slots and procedures: a slot may have an attached procedure that computes the value of
the slot when needed, e.g. computing an object's area from its size (see the sketch after this list).
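
A minimal sketch of frames with inheritance, defaults, and a procedural slot in Python, reusing the Mammal/Elephant/Clyde frames above; the area procedure and its numbers are purely illustrative assumptions:

# Frames as dictionaries; 'isa' points to the parent frame (subclass/instance).
# Typical (*) values live in the parent and act as inheritable defaults.
frames = {
    "Mammal":   {"isa": None,       "warm_blooded": "yes"},
    "Elephant": {"isa": "Mammal",   "color": "grey", "size": "large"},
    "Clyde":    {"isa": "Elephant", "color": "pink", "owner": "Fred"},
    "Nellie":   {"isa": "Elephant", "size": "small"},
}

def get_slot(frame, slot):
    """Return the slot's filler, inheriting a default from ancestors if absent."""
    while frame is not None:
        if slot in frames[frame]:
            return frames[frame][slot]
        frame = frames[frame]["isa"]   # climb the isa hierarchy
    return None

print(get_slot("Clyde", "color"))         # 'pink' – local value overrides the default
print(get_slot("Nellie", "color"))        # 'grey' – inherited typical value
print(get_slot("Clyde", "warm_blooded"))  # 'yes'  – inherited from Mammal

# A procedural slot: a callable computes the value when needed.
# The size-to-area mapping is a made-up assumption for illustration.
frames["Elephant"]["area"] = lambda f: {"large": 20, "small": 5}[get_slot(f, "size")]

def get_value(frame, slot):
    filler = get_slot(frame, slot)
    return filler(frame) if callable(filler) else filler

print(get_value("Nellie", "area"))  # 5 – computed from the size slot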

Introduction to Probability theory:

In many settings, we must try to understand what is going on in a system when we have
imperfect or incomplete information.

Two reasons why we might reason under uncertainty:

1. Laziness (modeling every detail of a complex system is costly)

2. Ignorance (we may not completely understand the system)

Probabilities quantify uncertainty regarding the occurrence of events.

Probability theory provides a mathematical foundation for concepts such as “probability”,
“information”, “belief”, “uncertainty”, “confidence”, “randomness”, “variability”, “chance” and
“risk”.

Probability theory is important to empirical scientists because it gives them a rational framework
to make inferences and test hypotheses based on uncertain empirical data.

Probability theory is also useful to engineers building systems that have to operate intelligently
in an uncertain world.
Conditional Probabilities:

Conditional probabilities are key for reasoning because they formalize the process of
accumulating evidence and updating probabilities based on new evidence.

If P(A|B) = 1, this is equivalent to the sentence in Propositional Logic B => A. Similarly, if
P(A|B) = 0.9, then this is like saying B => A with 90% certainty.

Given several measurements and other "evidence", E1, ..., Ek, we will formulate queries as P(Q |
E1, E2, ..., Ek) meaning "what is the degree of belief that Q is true given that we know E1, ..., Ek
and nothing else."

Conditional probability is defined as: P(A|B) = P(A ^ B)/P(B) = P(A,B)/P(B)

Some important rules related to conditional probability are listed below; a numeric check of Bayes's rule (rule 4) follows the list:

1. Rewriting the definition of conditional probability, we get the Product Rule: P(A,B) =
P(A|B)P(B)
2. Chain Rule: P(A,B,C,D) = P(A|B,C,D)P(B|C,D)P(C|D)P(D), which generalizes the
product rule for a joint probability of an arbitrary number of variables. Note that ordering
the variables results in a different expression, but all have the same resulting value.
3. Conditionalized version of the Chain Rule: P(A,B|C) = P(A|B,C)P(B|C)
4. Bayes's Rule: P(A|B) = (P(A)P(B|A))/P(B), which can be written as follows to more
clearly emphasize the "updating" aspect of the rule: P(A|B) = P(A) * [P(B|A)/P(B)] Note:
The terms P(A) and P(B) are called the prior (or marginal) probabilities. The term P(A|B)
is called the posterior probability because it is derived from or depends on the value of B.
5. Conditionalized version of Bayes's Rule: P(A|B,C) = P(B|A,C)P(A|C)/P(B|C)
6. Conditioning (aka Addition) Rule: P(A) = Sum{P(A|B=b)P(B=b)} where the sum is over
all possible values b in the sample space of B.
7. P(~B|A) = 1 - P(B|A)
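
A numeric check of Bayes's rule (rule 4) in Python, on a made-up two-variable joint distribution; the probabilities are illustrative assumptions:

# Joint distribution over A and B as P(A=a, B=b); all numbers are illustrative.
joint = {(True, True): 0.3, (True, False): 0.1,
         (False, True): 0.2, (False, False): 0.4}

p_B      = joint[(True, True)] + joint[(False, True)]   # marginal P(B) = 0.5
p_A      = joint[(True, True)] + joint[(True, False)]   # marginal P(A) = 0.4
p_A_if_B = joint[(True, True)] / p_B                    # definition: P(A,B)/P(B)
p_B_if_A = joint[(True, True)] / p_A                    # P(B|A)

# Bayes's rule: P(A|B) = P(A) * P(B|A) / P(B)
bayes = p_A * p_B_if_A / p_B
print(p_A_if_B, bayes)  # both 0.6 – the definition and Bayes's rule agree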

Bayesian Networks (aka Belief Networks):

 Bayesian Networks, also known as Bayes Nets, Belief Nets, Causal Nets, and Probability
Nets, are a space-efficient data structure for encoding all of the information in the full
joint probability distribution for the set of random variables defining a domain. That is,
from the Bayesian Net one can compute any value in the full joint probability distribution
of the set of random variables.
 Represents all of the direct causal relationships between variables
 Intuitively, to construct a Bayesian net for a given set of variables, draw arcs from cause
variables to immediate effects.
 Space efficient because it exploits the fact that in many real-world problem domains the
dependencies between variables are generally local, so there are a lot of conditionally
independent variables
 Captures both qualitative and quantitative relationships between variables
 Can be used to reason:
– Forward (top-down) from causes to effects -- predictive reasoning (aka causal reasoning)
– Backward (bottom-up) from effects to causes -- diagnostic reasoning
 Formally, a Bayesian Net is a directed, acyclic graph (DAG), where there is a node for
each random variable, and a directed arc from A to B whenever A is a direct causal
influence on B. Thus the arcs represent direct causal relationships and the nodes represent
states of affairs. The occurrence of A provides support for B, and vice versa. The
backward influence is called "diagnostic" or "evidential" support for A due to the
occurrence of B.
 Each node A in a net is conditionally independent of any subset of nodes that are not
descendants of A given the parents of A.
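
Concretely, the net encodes the full joint distribution through the standard factorization

P(X1, ..., Xn) = P(X1 | Parents(X1)) * ... * P(Xn | Parents(Xn))

so any entry of the joint distribution is the product of one conditional probability table entry (or prior) per node.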

Building a Bayesian Net:


Intuitively, "to construct a Bayesian Net for a given set of variables, we draw arcs from cause
variables to immediate effects. In almost all cases, doing so results in a Bayesian network [whose
conditional independence implications are accurate]." (Heckerman, 1996)

Algorithm for constructing a Bayesian Net:

1. Identify a set of random variables that describe the given problem domain
2. Choose an ordering for them: X1, ..., Xn
3. for i=1 to n do
4. Add a new node for Xi to the net
5. Set Parents(Xi) to be the minimal set of already added nodes such that we have
conditional independence of Xi and all other members of {X1, ..., Xi-1} given
Parents(Xi)
6. Add a directed arc from each node in Parents(Xi) to Xi
7. If Xi has at least one parent, then define a conditional probability table at Xi: P(Xi=x |
possible assignments to Parents(Xi)). Otherwise, define a prior probability at Xi: P(Xi)
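
A minimal sketch in Python of a net this algorithm might produce, for a hypothetical domain Rain -> WetGrass <- Sprinkler; all probabilities are made-up illustrative numbers:

# A tiny Bayesian net: Rain -> WetGrass <- Sprinkler. All numbers are illustrative.
P_rain = 0.2          # prior at the root node Rain
P_sprinkler = 0.3     # prior at the root node Sprinkler
# CPT at WetGrass: P(wet | rain, sprinkler) for each parent assignment.
P_wet = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.8, (False, False): 0.05}

def joint(rain, sprinkler, wet):
    """Full joint entry = product of each node's (conditional) probability."""
    p = P_rain if rain else 1 - P_rain
    p *= P_sprinkler if sprinkler else 1 - P_sprinkler
    pw = P_wet[(rain, sprinkler)]
    p *= pw if wet else 1 - pw
    return p

# Sanity check: the eight joint entries sum to 1.
total = sum(joint(r, s, w) for r in (True, False)
                           for s in (True, False)
                           for w in (True, False))
print(round(total, 10))  # 1.0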

Notes about this algorithm:

 There is not, in general, a unique Bayesian Net for a given set of random variables. But
all represent the same information, in the sense that every entry in the joint probability
distribution can be computed from any of the constructed nets.
 The "best" net is constructed if in Step 2 the variables are topologically sorted first. That
is, each variable comes before all of its children. So, the first nodes should be the roots,
then the nodes they directly influence, and so on.
 The algorithm will not construct a net that is illegal in the sense of violating the rules of
probability.

Certainty Factor Theory:

 A certainty factor (CF) is a numerical value that expresses a degree of subjective belief
that a particular item is true. The item may be a fact or a rule.
 The MYCIN developers realized that a Bayesian approach was intractable, as too much
data and/or suppositions/estimates are required.
 In addition, medical diagnosis systems based on Bayesian methods were not accepted
because they did not provide simple explanations of how they reached their
conclusions.
 Certainty Factors are similar to conditional probabilities, but somewhat different.

– Rather than representing the degree of probability of an outcome, they represent a measure of belief in the outcome.

– Where probabilities range from 0 (false) to 1 (true), CFs range from:

• -1: believed not to be the case

• 1: believed to be the case

– The absolute size of the CF measures the degree of belief.

– The sign indicates belief vs. disbelief.

Certainty Factors and Facts and Rules

 We can associate CFs with facts: – E.g., padre(John, Mary) with CF .90
 We can also associate CFs with rules: – (if (sneezes X) then (has_cold X) ) with CF 0.7

– where the CF measures our belief in the conclusion given the premise is observed.

Calculating Certainty Factors


• CFs are calculated using two other measures:

1. MB(H, E) – Measure of Belief: value between 0 and 1 representing the degree to which
belief in the hypothesis H is supported by observing evidence E.

2. MD(H, E) – Measure of Disbelief: value between 0 and 1 representing the degree to
which disbelief in the hypothesis H is supported by observing evidence E.

CF is calculated as the difference between MB and MD:

CF(H, E) = MB(H, E) - MD(H, E)


Calculating Certainty Factors: example

Calculate CF(Jac, Hip) given the following data:

P(H|E) = 0.21/0.54 = 0.3889

P(H) = 0.6

Since P(H|E) < P(H), the evidence disconfirms H, so:

• MB(H, E) = 0

• MD(H, E) = (P(H) - P(H|E))/P(H) = (0.6 - 0.3889)/0.6 = 0.3519

• CF(H, E) = MB - MD = 0 - 0.3519 = -0.3519
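
A small Python sketch of the standard textbook MB/MD formulas applied to this example; the normalized definitions below are assumed, since the notes omit them:

# MB, MD, CF from P(H) and P(H|E), following the standard textbook definitions.
def mb(p_h, p_h_e):
    """Measure of belief: positive only when evidence raises P(H)."""
    if p_h == 1:
        return 1.0
    return max(p_h_e - p_h, 0) / (1 - p_h)

def md(p_h, p_h_e):
    """Measure of disbelief: positive only when evidence lowers P(H)."""
    if p_h == 0:
        return 1.0
    return max(p_h - p_h_e, 0) / p_h

def cf(p_h, p_h_e):
    return mb(p_h, p_h_e) - md(p_h, p_h_e)

# The example above: P(H) = 0.6, P(H|E) = 0.21/0.54.
print(round(mb(0.6, 0.21/0.54), 4))  # 0.0
print(round(md(0.6, 0.21/0.54), 4))  # 0.3519
print(round(cf(0.6, 0.21/0.54), 4))  # -0.3519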

Dempster-Shafer Theory:

 Dempster-Shafer theory is an approach to combining evidence.

 Dempster (1967) developed a means for combining degrees of belief derived from
independent items of evidence.
 His student, Glenn Shafer (1976), developed a method for obtaining degrees of belief for
one question from subjective probabilities for a related question.
 People working in expert systems in the 1980s saw this approach as ideally suited to
such systems.
 Each fact has a degree of support, between 0 and 1:
– 0 No support for the fact

– 1 full support for the fact

 Differs from the Bayesian approach in that:

– Belief in a fact and its negation need not sum to 1.

– Both values can be 0 (meaning no evidence for or against the fact).

 Mass function m(A), where A is a member of the power set: the proportion of all evidence that supports this element of the power set.

“The mass m(A) of a given member of the power set, A, expresses the proportion
of all relevant and available evidence that supports the claim that the actual state
belongs to A but to no particular subset of A.” (Wikipedia)

“The value of m(A) pertains only to the set A and makes no additional claims
about any subsets of A, each of which has, by definition, its own mass.” (Wikipedia)
Properties of Mass function:

1. Each m(A) is between 0 and 1.
2. The masses m(A) over all members of the power set sum to 1.
3. m(Ø) = 0 – the actual state must belong to at least one non-empty subset of Θ.

Mass function m(A): example

 4 people (B, J, S and K) are locked in a room when the lights go out.
 When the lights come on, K is dead, stabbed with a knife.
 Not suicide (stabbed in the back)
 No-one entered the room.
 Assume only one killer.
 Θ = {B, J, S}
 P(Θ) = {Ø, {B}, {J}, {S}, {B,J}, {B,S}, {J,S}, {B,J,S}}
 Detectives, after reviewing the crime scene, assign mass probabilities to various elements
of the power set (a sketch of combining two such assignments follows).
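
The original mass table is not reproduced in these notes, so the sketch below invents masses for two detectives and combines them with Dempster's rule of combination (intersect focal elements, discard conflicting mass, renormalize); all numbers are assumptions:

# Dempster's rule of combination on the murder example.
# Frozensets are subsets of Theta = {B, J, S}; all mass numbers are made up.
m1 = {frozenset("B"): 0.4, frozenset("BJ"): 0.3, frozenset("BJS"): 0.3}
m2 = {frozenset("J"): 0.5, frozenset("JS"): 0.2, frozenset("BJS"): 0.3}

def combine(m1, m2):
    """Dempster's rule: intersect focal elements, drop conflict, renormalize."""
    raw = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                raw[inter] = raw.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb          # mass falling on the empty set
    return {s: w / (1 - conflict) for s, w in raw.items()}

m12 = combine(m1, m2)
for s, w in sorted(m12.items(), key=lambda kv: -kv[1]):
    print(set(s), round(w, 3))   # {'J'} gets the largest combined mass, 0.5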

Essay Questions:
1. Draw the semantic network representing the following knowledge:
Every living thing needs oxygen to live. Every human is a living thing. John is human.
Answer: John is a living thing; John needs oxygen to live.

2. Explain the different knowledge representation techniques.
3. Explain production rules in detail.
4. Develop a complete frame-based system for a hospital application.
5. Explain semantic networks in detail.
6. State and prove Bayes' theorem.
7. Explain Bayesian belief networks in detail.
8. Explain certainty factor theory.
9. Explain Dempster-Shafer theory.

Short answer questions:


1. What do you mean by Semantic Nets? Give Example.
2. Elaborate the architecture of Knowledge-Based Systems.
3. What are the problems faced in selecting Representation Techniques?
4. What are the characteristics of Good Representation Techniques?
5. What are the two commitments of logic? Define them.
6. What are the components of first-order logic?
7. What is the difference between the two quantifiers in logic?
8. Define probability.
9. State Bayes' theorem.
10. Give the joint distribution function for a BBN.
11. Define measure of belief and measure of disbelief.
12. Define the mass function in Dempster-Shafer theory.
