Unit 4
2. Rules: Production rules, sometimes called IF-THEN rules, are the most popular form of KR.
Production rules are simple but powerful forms of KR, and they provide the flexibility of
combining declarative and procedural representation in a unified form.
Examples of production rules:
1. IF condition THEN action
2. IF premise THEN conclusion
3. IF proposition p1 and proposition p2 are true THEN proposition p3 is true
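To make this concrete, below is a minimal forward-chaining sketch in Python (the rules and fact names are illustrative, not from any particular system): each rule fires when all of its premises are present in working memory, adding its conclusion as a new fact.

# Minimal forward-chaining production system (illustrative sketch).
# Each rule is (premises, conclusion): if every premise is a known
# fact, the conclusion is added to working memory.
rules = [
    ({"sneezes"}, "has_cold"),
    ({"has_cold", "fever"}, "has_flu"),
]

facts = {"sneezes", "fever"}

changed = True
while changed:                      # repeat until no rule fires
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)   # fire the rule
            changed = True

print(facts)   # {'sneezes', 'fever', 'has_cold', 'has_flu'}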
3. Frames
4. Semantic Net
The idea of semantic networks started out as a natural way to represent labeled
connections between entities. But as representations are expected to support an
increasingly large range of problem-solving tasks, the representation schemes
necessarily become increasingly complex. In particular, it becomes necessary to
assign more structure to nodes, as well as to links.
It is natural to use database ideas to keep track of everything, and the nodes and
their relations begin to look more like frames.
A frame consists of a selection of slots which can be filled by values, or
procedures for calculating values, or pointers to other frames.
Frames and semantic nets: frames can be viewed as a structural representation of semantic
nets.
Examples: below are four frames for the entities "Mammal", "Elephant", "Clyde", and "Nellie".
(The symbol * means that the value of the feature is typical for the entity represented by the
frame.)
Mammal
    subclass: Animal
    warm_blooded: yes

Elephant
    subclass: Mammal
    * color: grey
    * size: large

Clyde
    instance: Elephant
    color: pink
    owner: Fred

Nellie
    instance: Elephant
    size: small
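One way these four frames could be modelled in Python, shown here as a sketch (the dictionary encoding is an assumption for illustration): starred typical values behave as defaults that lookup inherits through the instance/subclass chain unless a frame overrides them.

# Frames as dictionaries; lookup walks the instance/subclass chain,
# so starred (typical) values act as inheritable defaults.
frames = {
    "Mammal":   {"subclass": "Animal", "warm_blooded": "yes"},
    "Elephant": {"subclass": "Mammal", "color": "grey", "size": "large"},
    "Clyde":    {"instance": "Elephant", "color": "pink", "owner": "Fred"},
    "Nellie":   {"instance": "Elephant", "size": "small"},
}

def get_slot(frame, slot):
    """Return a slot value, inheriting from parent frames if absent."""
    while frame in frames:
        f = frames[frame]
        if slot in f:
            return f[slot]
        frame = f.get("instance") or f.get("subclass")  # climb the hierarchy
    return None

print(get_slot("Clyde", "color"))          # pink  (overrides the default)
print(get_slot("Nellie", "color"))         # grey  (inherited default)
print(get_slot("Nellie", "warm_blooded"))  # yes   (inherited from Mammal)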
Components of a frame entity:
Name - corresponds to a node in a semantic net.
Attributes (also called slots), filled with particular values;
e.g., in the frame for Clyde, instance is the name of a slot, and Elephant is the value of the slot.
Names of slots correspond to the links in semantic nets.
Values of slots correspond to nodes; hence each slot value can itself be another frame.
Example:

Size
    instance: Slot
    single_valued: yes
    range: Size-set

Owner
    instance: Slot
    single_valued: no
    range: Person
The attribute value Fred (and even "large", "grey", etc.) could be represented as a frame, e.g.:

Fred
    instance: Person
    occupation: Elephant-breeder
Frames have greater representational power than semantic nets:
• Necessary attributes.
• Typical attributes ("*" is used to indicate attributes that are only true of a typical member of
the class, and not necessarily of every member).
• Type constraints and default values for slots, with overriding of inherited values.
• Slots and procedures: a slot may have an attached procedure to compute its value when
needed, e.g., computing an object's area given its size (see the sketch below).
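A minimal sketch of such an attached ("if-needed") procedure, with illustrative names: the area slot holds a procedure that is run only when the slot's value is requested.

# If-needed procedural attachment: a slot may hold a procedure that
# computes its value on demand instead of a stored constant.
square = {
    "size": 4,
    "area": lambda frame: frame["size"] ** 2,   # computed when needed
}

def fetch(frame, slot):
    value = frame[slot]
    return value(frame) if callable(value) else value

print(fetch(square, "size"))  # 4
print(fetch(square, "area"))  # 16, computed from size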
In many settings, we must try to understand what is going on in a system when we have
imperfect or incomplete information.
Probability theory is important to empirical scientists because it gives them a rational framework
to make inferences and test hypotheses based on uncertain empirical data.
Probability theory is also useful to engineers building systems that have to operate intelligently
in an uncertain world.
Conditional Probabilities:
Conditional probabilities are key for reasoning because they formalize the process of
accumulating evidence and updating probabilities based on new evidence.
Given several measurements and other "evidence", E1, ..., Ek, we will formulate queries as P(Q |
E1, E2, ..., Ek) meaning "what is the degree of belief that Q is true given that we know E1, ..., Ek
and nothing else."
1. Rewriting the definition of conditional probability, we get the Product Rule: P(A,B) =
P(A|B)P(B)
2. Chain Rule: P(A,B,C,D) = P(A|B,C,D)P(B|C,D)P(C|D)P(D), which generalizes the
product rule to a joint probability over an arbitrary number of variables. Note that a
different ordering of the variables yields a different factorization, but all orderings give
the same resulting value.
3. Conditionalized version of the Chain Rule: P(A,B|C) = P(A|B,C)P(B|C)
4. Bayes's Rule: P(A|B) = (P(A)P(B|A))/P(B), which can be written as follows to more
clearly emphasize the "updating" aspect of the rule: P(A|B) = P(A) * [P(B|A)/P(B)] Note:
The terms P(A) and P(B) are called the prior (or marginal) probabilities. The term P(A|B)
is called the posterior probability because it is derived from or depends on the value of B.
5. Conditionalized version of Bayes's Rule: P(A|B,C) = P(B|A,C)P(A|C)/P(B|C)
6. Conditioning (aka Addition) Rule: P(A) = Sum{P(A|B=b)P(B=b)} where the sum is over
all possible values b in the sample space of B.
7. P(~B|A) = 1 - P(B|A)
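As a short worked illustration of rules 4 and 6 (the numbers here are assumed for the example, not taken from the text), the following sketch updates belief in a hypothesis A after observing evidence B:

# Bayes's Rule: P(A|B) = P(A) * P(B|A) / P(B),
# with P(B) obtained by the conditioning rule (rule 6).
p_a = 0.01             # prior P(A)         (assumed)
p_b_given_a = 0.9      # likelihood P(B|A)  (assumed)
p_b_given_not_a = 0.2  # P(B|~A)            (assumed)

# Rule 6: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Rule 4: the posterior
p_a_given_b = p_a * p_b_given_a / p_b
print(round(p_a_given_b, 4))  # 0.0435: evidence B raises belief in A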
Bayesian Networks, also known as Bayes Nets, Belief Nets, Causal Nets, and Probability
Nets, are a space-efficient data structure for encoding all of the information in the full
joint probability distribution for the set of random variables defining a domain. That is,
from the Bayesian Net one can compute any value in the full joint probability distribution
of the set of random variables.
• Represents all of the direct causal relationships between variables.
• Intuitively, to construct a Bayesian net for a given set of variables, draw arcs from cause
variables to their immediate effects.
• Space efficient, because it exploits the fact that in many real-world problem domains the
dependencies between variables are generally local, so there are many conditionally
independent variables.
• Captures both qualitative and quantitative relationships between variables.
• Can be used to reason:
– Forward (top-down) from causes to effects: predictive reasoning (aka causal reasoning)
– Backward (bottom-up) from effects to causes: diagnostic reasoning
Formally, a Bayesian Net is a directed acyclic graph (DAG) where there is a node for
each random variable, and a directed arc from A to B whenever A is a direct causal
influence on B. Thus the arcs represent direct causal relationships and the nodes represent
states of affairs. The occurrence of A provides support for B, and vice versa; the
backward influence is called "diagnostic" or "evidential" support for A due to the
occurrence of B.
Each node A in a net is conditionally independent of any subset of nodes that are not
descendants of A given the parents of A.
1. Identify a set of random variables that describe the given problem domain
2. Choose an ordering for them: X1, ..., Xn
3. for i=1 to n do
4. Add a new node for Xi to the net
5. Set Parents(Xi) to be the minimal set of already added nodes such that we have
conditional independence of Xi and all other members of {X1, ..., Xi-1} given
Parents(Xi)
6. Add a directed arc from each node in Parents(Xi) to Xi
7. If Xi has at least one parent, then define a conditional probability table at Xi: P(Xi=x |
possible assignments to Parents(Xi)). Otherwise, define a prior probability at Xi: P(Xi)
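The sketch below applies this construction to a toy burglary/earthquake/alarm domain (the structure and numbers are textbook-style assumptions): each node stores P(Xi | Parents(Xi)), and any entry of the full joint distribution is the product of these local tables.

# Toy Bayesian net: Burglary -> Alarm <- Earthquake.
# Each node stores its CPT; a full-joint entry is the product
# P(B) * P(E) * P(A|B,E), per the chain-rule factorization.
p_b = {True: 0.001, False: 0.999}   # prior P(Burglary)    (assumed numbers)
p_e = {True: 0.002, False: 0.998}   # prior P(Earthquake)  (assumed numbers)
p_a = {                             # CPT: P(Alarm=True | B, E)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}

def joint(b, e, a):
    """P(B=b, E=e, A=a) computed from the local tables."""
    pa = p_a[(b, e)] if a else 1 - p_a[(b, e)]
    return p_b[b] * p_e[e] * pa

# e.g. probability the alarm sounds with no burglary and no earthquake:
print(joint(False, False, True))  # 0.999 * 0.998 * 0.001 ≈ 0.000997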
There is not, in general, a unique Bayesian Net for a given set of random variables, but
all of them represent the same information: from any net constructed, every entry in the
joint probability distribution can be computed.
The "best" net is constructed if in Step 2 the variables are topologically sorted first. That
is, each variable comes before all of its children. So, the first nodes should be the roots,
then the nodes they directly influence, and so on.
The algorithm will not construct a net that is illegal in the sense of violating the rules of
probability.
A certainty factor (CF) is a numerical value that expresses a degree of subjective belief
that a particular item is true. The item may be a fact or a rule.
The MYCIN developers realized that a Bayesian approach was intractable, as too much
data and too many suppositions/estimates are required.
In addition, medical diagnosis systems based on Bayesian methods were not accepted
because they did not provide simple explanations of how they reached their
conclusions.
Certainty Factors are similar to conditional probabilities, but somewhat different.
We can associate CFs with facts: – e.g., padre(John, Mary) with CF 0.90.
We can also associate CFs with rules: – (if (sneezes X) then (has_cold X)) with CF 0.7,
– where the CF measures our belief in the conclusion given that the premise is observed.
1. MB(H, E) – Measure of Belief: a value between 0 and 1 representing the degree to which
belief in the hypothesis H is supported by observing evidence E.
2. MD(H, E) – Measure of Disbelief: a value between 0 and 1 representing the degree to
which disbelief in H is supported by observing evidence E.
Worked example: with p(H) = 0.6 and p(H|E) = 0.21/0.54 ≈ 0.389, the evidence lowers belief
in H (p(H|E) < p(H)), so:
• MB(H,E) = 0
• MD(H,E) = (p(H) - p(H|E))/p(H) = (0.6 - 0.389)/0.6 ≈ 0.3519
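The same computation as a Python sketch; the MB/MD formulas below are the standard MYCIN definitions, and CF = MB - MD is one common way to combine them:

# MYCIN-style certainty factor from prior P(H) and posterior P(H|E).
def mb(p_h, p_h_e):
    """Measure of belief: nonzero only when E increases belief in H."""
    return max(p_h_e - p_h, 0) / (1 - p_h) if p_h < 1 else 1

def md(p_h, p_h_e):
    """Measure of disbelief: nonzero only when E decreases belief in H."""
    return max(p_h - p_h_e, 0) / p_h if p_h > 0 else 1

p_h, p_h_e = 0.6, 0.21 / 0.54      # values from the example above
print(round(mb(p_h, p_h_e), 4))    # 0.0
print(round(md(p_h, p_h_e), 4))    # 0.3519
cf = mb(p_h, p_h_e) - md(p_h, p_h_e)   # CF = MB - MD
print(round(cf, 4))                # -0.3519: net disbelief in H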
Dempster-Shafer theory: the mass function m(A) gives the proportion of all evidence that
supports this element of the power set.
"The mass m(A) of a given member of the power set, A, expresses the proportion
of all relevant and available evidence that supports the claim that the actual state
belongs to A but to no particular subset of A." (wikipedia)
"The value of m(A) pertains only to the set A and makes no additional claims
about any subsets of A, each of which has, by definition, its own mass." (wikipedia)
Properties of the mass function:
• m(Ø) = 0
• Sum{m(A)} = 1, where the sum is over all subsets A of Θ
Four people (B, J, S and K) are locked in a room when the lights go out.
When the lights come on, K is dead, stabbed with a knife.
Not suicide (stabbed in the back)
No-one entered the room.
Assume only one killer.
Θ = {B, J, S}
P(Θ) = { Ø, {B}, {J}, {S}, {B,J}, {B,S}, {J,S}, {B,J,S} }
Detectives, after reviewing the crime scene, assign mass probabilities to various elements
of the power set.
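As an illustration, the sketch below uses assumed masses (not the detectives' actual assignment) to show how belief Bel(A) and plausibility Pl(A) are computed from a mass function:

# Dempster-Shafer sketch with ASSUMED illustrative masses over
# Θ = {B, J, S}; unassigned mass goes to Θ itself (ignorance).
m = {
    frozenset({"B"}): 0.4,
    frozenset({"J"}): 0.2,
    frozenset({"B", "J"}): 0.1,
    frozenset({"B", "J", "S"}): 0.3,   # remaining mass on Θ
}
assert abs(sum(m.values()) - 1.0) < 1e-9   # masses must sum to 1

def belief(a):
    """Bel(A): total mass of all subsets of A."""
    return sum(v for s, v in m.items() if s <= a)

def plausibility(a):
    """Pl(A): total mass of all sets that intersect A."""
    return sum(v for s, v in m.items() if s & a)

suspect = frozenset({"B"})
print(belief(suspect))        # 0.4
print(plausibility(suspect))  # 0.4 + 0.1 + 0.3 = 0.8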
Essay Questions:
1. Draw the semantic network representing the following Knowledge.
Every living thing needs oxygen to live. Every human is a living thing. John is human.
Answer: John is a living thing, so John needs oxygen to live (inherited along the links from
Human to LivingThing).
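The same network and inference, encoded as a Python sketch with illustrative link names (is_a, instance_of, needs):

# Semantic net as (node, link, node) triples.
net = [
    ("LivingThing", "needs", "Oxygen"),
    ("Human", "is_a", "LivingThing"),
    ("John", "instance_of", "Human"),
]

def needs(entity):
    """Follow instance_of/is_a links upward, collecting 'needs' edges."""
    found, frontier = set(), {entity}
    while frontier:
        node = frontier.pop()
        for a, link, b in net:
            if a == node and link == "needs":
                found.add(b)
            if a == node and link in ("is_a", "instance_of"):
                frontier.add(b)   # inherit from the parent class
    return found

print(needs("John"))  # {'Oxygen'}: John is a living thing, so needs oxygen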