Unit 2
Expert Systems (CSE209)
Propositional Logic:
• Definition: Represents facts as simple, atomic propositions (true/false statements).
• Example: "The sky is blue" can be represented as a proposition like P, where P could
be true or false.
First-Order Logic (Predicate Logic):
• Definition: Extends propositional logic by including objects, properties of
objects, and relations between objects.
• Example: "All humans are mortal" could be represented as ∀x (Human(x) →
Mortal(x)).
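As an illustration, a universally quantified statement like this can be checked over a small finite domain. The entities and predicate definitions below are hypothetical, purely for demonstration:

```python
# A minimal sketch: checking ∀x (Human(x) → Mortal(x)) over a small, finite
# domain. The individuals and predicate extensions here are illustrative.
humans = {"socrates", "plato"}
mortals = {"socrates", "plato", "fido"}
domain = humans | mortals

def human(x):
    return x in humans

def mortal(x):
    return x in mortals

# ∀x (Human(x) → Mortal(x)): the implication must hold for every x in the domain.
all_humans_mortal = all((not human(x)) or mortal(x) for x in domain)
print(all_humans_mortal)  # True
```

Note that first-order logic over infinite domains cannot be checked by enumeration like this; the sketch only works because the domain is finite.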
Semantic Networks:
• Definition: Represents knowledge as a network of nodes (concepts) and
edges (relationships).
• Example: A network could represent the concept of a "Dog" with edges
linking to properties like "has fur", "is a mammal", etc.
Frames:
Definition: Structured data representations, often used to
represent stereotyped situations.
Example: A "House" frame might include slots like "rooms",
"roof type", "address", etc., each with possible values or sub-
frames.
Ontologies:
Definition: Structured frameworks for organizing information,
often hierarchical and used in knowledge-based systems.
Example: An ontology for animals might classify "Mammals"
as a subclass of "Animals", with further subdivisions into
species like "Dogs", "Cats", etc.
Rules:
Definition: Represent knowledge in the form of "if-then"
statements that define actions or conclusions.
Example: "If it is raining, then bring an umbrella" can be
represented as IF raining THEN bring(umbrella).
Bayesian Networks:
Definition: Graphical models representing probabilistic
relationships among variables.
Example: A Bayesian network might represent the
probability of having a disease given certain symptoms.
Fuzzy Logic:
Definition: Represents knowledge with degrees of truth
rather than binary true/false.
Example: "The weather is hot" could be 0.8 true, indicating
a high degree of truth, but not absolute.
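As a rough sketch, a fuzzy membership function might assign such degrees of truth to a temperature reading. The breakpoints (20 °C and 35 °C) below are illustrative assumptions:

```python
# A small sketch of fuzzy truth degrees: a membership function mapping
# temperature to the degree to which "the weather is hot" is true.
# The breakpoints (20 and 35 degrees C) are illustrative assumptions.
def hot_degree(temp_c: float) -> float:
    """Degree of truth in [0, 1] for 'the weather is hot'."""
    if temp_c <= 20:
        return 0.0
    if temp_c >= 35:
        return 1.0
    return (temp_c - 20) / 15  # linear ramp between 20 and 35 degrees

print(hot_degree(32))  # 0.8
```

Unlike classical logic, intermediate temperatures yield intermediate truth values rather than a hard true/false cutoff.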
Mappings
Mappings in knowledge representation refer to how real-world concepts and relationships
are translated into these representational formats. This involves defining the relationships
between concepts, how they interact, and the rules governing these interactions.
From Real World to Representation:
Example: Mapping a real-world scenario like "A dog chasing a cat" into a semantic
network could involve creating nodes for "dog", "cat", and "chase", and linking them
with directed edges.
Example: "Dog" → "Chases" → "Cat"
Visualization Example:
Nodes:
"Dog"
"Chase" (action)
"Cat"
Edges:
"Dog" → "Chase"
"Chase" → "Cat"
Inference:
Definition: The process of deriving new knowledge from existing representations.
Example: In a rule-based system, if we know "If A then B" and "A is true", we can
infer that "B is true".
In this graph (a simple concept hierarchy: Animal → Mammal → {Dog, Cat}):
1. Animal is the root concept.
2. Mammal is a subclass of Animal.
3. Dog and Cat are specific examples of Mammal.
Normalization Steps:
1. Ensure Unique Definitions: Verify that each concept like "Animal," "Mammal," and "Dog" has a unique and clear definition.
2. Proper Categorization: Place concepts into the correct hierarchical structure, reflecting their relationships accurately.
Approaches in
knowledge representation
Approaches in knowledge representation refer to the different methods and strategies
used to encode information in a way that AI systems can utilize to perform reasoning,
learning, and decision-making. Here are some common approaches:
1. Logical Representation
Propositional Logic: Encodes knowledge as simple, declarative statements that are
either true or false.
Example: "It is raining" is represented as a single proposition P.
First-Order Logic (Predicate Logic): Extends propositional logic by including quantifiers
and predicates that can represent relationships between objects.
Example: "All humans are mortal" can be represented as ∀x (Human(x) →
Mortal(x)).
Description Logic: A subset of first-order logic used primarily for defining and reasoning
about concepts and their relationships within ontologies.
Example: Describing the concept of "Animal" with properties like "hasOrganism",
"isLiving", etc.
2. Procedural Representation
Production Rules: Knowledge is represented as "if-then" rules, which are used
to infer new information or perform actions.
Example: "If it is raining, then bring an umbrella" is represented as IF raining
THEN bring(umbrella).
Scripts: Predetermined sequences of actions or events, often used to represent
stereotypical situations.
Example: A "restaurant script" might include steps like entering the
restaurant, ordering food, eating, and paying the bill.
3. Semantic Networks
Represents knowledge as a graph of nodes (concepts) and edges (relationships
between concepts). This approach is particularly effective for representing
hierarchical and associative relationships.
Example: A semantic network for "Bird" might include nodes for "Animal",
"Bird", "CanFly", with edges indicating relationships like "isA" or
"hasProperty".
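A minimal sketch of such a network in code, with properties looked up along the "isA" chain (the concepts and properties below are illustrative):

```python
# A semantic network as adjacency maps: "isA" links form a hierarchy, and
# "hasProperty" attaches properties to concepts. Names are illustrative.
is_a = {"Bird": "Animal", "Sparrow": "Bird"}
has_property = {"Bird": {"CanFly"}, "Animal": {"Breathes"}}

def properties(concept):
    """Collect properties by walking up the isA chain (inheritance)."""
    props = set()
    while concept is not None:
        props |= has_property.get(concept, set())
        concept = is_a.get(concept)
    return props

print(sorted(properties("Sparrow")))  # ['Breathes', 'CanFly']
```

The lookup shows why semantic networks handle hierarchical relationships well: a property stated once at "Animal" is available to every concept below it.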
4. Frame-Based Representation
Frames: Data structures that represent stereotyped situations, with
slots for filling in details. Frames can inherit properties from other
frames, enabling efficient organization of knowledge.
Example: A frame for a "Car" might include slots like "make",
"model", "color", with possible sub-frames for "Engine", "Wheels",
etc.
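A minimal sketch of frames with slot inheritance. The "Car" slots mirror the example above; the "Vehicle" parent frame and its "wheels" slot are illustrative assumptions:

```python
# Frames as dicts with slots, plus simple inheritance from a parent frame.
# "Vehicle" and its "wheels" slot are hypothetical additions for illustration.
frames = {
    "Vehicle": {"parent": None, "slots": {"wheels": 4}},
    "Car": {"parent": "Vehicle",
            "slots": {"make": "unknown", "model": "unknown", "color": "unknown"}},
}

def get_slot(frame_name, slot):
    """Look up a slot, falling back to parent frames (inheritance)."""
    while frame_name is not None:
        frame = frames[frame_name]
        if slot in frame["slots"]:
            return frame["slots"][slot]
        frame_name = frame["parent"]
    return None

print(get_slot("Car", "wheels"))  # 4  (inherited from Vehicle)
```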
5. Ontology-Based Representation
Ontologies: Structured frameworks that define the concepts,
categories, properties, and relationships within a specific domain.
Ontologies are used to ensure consistency and interoperability
between systems.
Example: An ontology for medical knowledge might define
relationships between diseases, symptoms, treatments, and
medical procedures.
6. Bayesian Networks
Represents probabilistic relationships between variables using directed
acyclic graphs (DAGs). Each node represents a variable, and the edges
represent probabilistic dependencies.
Example: A Bayesian network might model the probability of having a
disease based on various symptoms and risk factors.
7. Fuzzy Logic
Extends classical logic by allowing values to range between 0 and 1,
representing degrees of truth. This is particularly useful in handling
uncertainty and imprecision in knowledge representation.
Example: "The weather is somewhat hot" might be represented with a
truth value of 0.7 rather than a binary true/false.
8. Connectionist Models (Neural Networks)
Represents knowledge as patterns of activation across networks of
simple units (neurons). While this approach is more focused on learning
and pattern recognition, it can be used for knowledge representation in
the form of learned weights and connections.
Example: A neural network might learn to recognize images of cats
by adjusting the weights of connections between neurons based on
training data.
9. Hybrid Approaches
Combines two or more of the above approaches to leverage their
strengths and mitigate their weaknesses.
Example: A hybrid system might use frames for structured
knowledge representation and Bayesian networks to handle
uncertainty in decision-making processes.
Issues in knowledge representation
Knowledge representation is a critical aspect of artificial intelligence
and expert systems, but it comes with several challenges and issues
that can impact the effectiveness of AI systems. Here are some of the
key issues in knowledge representation:
1. Complexity
Representation of Complex Knowledge: Capturing and representing
complex, real-world knowledge in a structured format can be
challenging. Real-world scenarios often involve numerous interrelated
concepts, and accurately modeling these relationships requires
sophisticated representation techniques.
Example: Modeling the nuanced relationships between various
medical symptoms, diseases, and treatments in a healthcare
system.
2. Ambiguity
Handling Ambiguous Information: Language and real-world scenarios
often contain ambiguities that are difficult to represent and reason
about in a knowledge-based system. Ambiguity arises when a concept
or statement can have multiple interpretations.
Example: The word "bank" can refer to a financial institution or the
side of a river, depending on the context.
3. Incomplete Knowledge
Dealing with Incomplete Information: In many cases, the knowledge
available to a system is incomplete or missing important details.
Representing and reasoning with incomplete knowledge is a significant
challenge.
Example: An expert system might not have all the data needed to
make a precise medical diagnosis but still needs to provide
recommendations.
4. Uncertainty
Managing Uncertainty: Real-world knowledge often involves
uncertainty, such as the likelihood of an event occurring or the
probability of a particular outcome. Representing and reasoning with
uncertain knowledge is complex.
Example: A weather prediction system might need to represent the
probability of rain, which can never be stated with absolute certainty.
5. Scalability
Scalability of Representation: As the amount of knowledge grows, the
representation system must scale accordingly. Large knowledge bases
can become difficult to manage, leading to performance issues in
reasoning and retrieval.
Example: A knowledge base for a legal expert system might contain
millions of rules and facts, making efficient retrieval and reasoning
challenging.
6. Inference and Reasoning
Efficient Inference: Deriving conclusions from a large knowledge base can be
computationally expensive. Ensuring that the reasoning process is efficient
and scalable is a major issue.
Example: In a rule-based system, checking all possible rules to infer a new
piece of knowledge can become infeasible as the number of rules
increases.
7. Knowledge Acquisition
Difficulty in Acquiring Knowledge: Extracting and formalizing expert
knowledge is often a slow and labor-intensive process. Experts may struggle
to articulate their knowledge, and translating it into a formal representation
can introduce errors.
Example: Capturing the tacit knowledge of a medical expert about how to
diagnose rare diseases can be difficult and time-consuming.
8. Consistency
Ensuring Consistency: In large knowledge bases, maintaining consistency
between different pieces of knowledge can be challenging. Inconsistencies
can lead to incorrect inferences or system failures.
Example: An expert system that contains conflicting rules about the side
effects of a medication may give contradictory advice to users.
9. Context Sensitivity
Handling Context: Knowledge is often context-dependent, meaning that
the relevance or interpretation of a piece of information can change
depending on the situation. Representing and reasoning with context-
sensitive knowledge is complex.
Example: The statement "It is cold" can mean different things depending
on the location (e.g., "cold" in Alaska vs. "cold" in the tropics).
10. Interoperability
Interoperability between Systems: Different systems may use different
representations for similar concepts, leading to challenges in integrating
knowledge from multiple sources.
Example: Integrating knowledge from different medical databases that
use varying terminologies and ontologies can be difficult.
11. Balancing Expressiveness and Efficiency
More expressive knowledge representations can capture complex relationships but may require more computational resources. Finding a balance between expressiveness and computational efficiency is a key design issue.
Example: First-order logic is more expressive than propositional logic, but it is also computationally more expensive; propositional logic, by contrast, is decidable, so an algorithm can determine whether certain statements are true or false in all cases.
• Propositions:
• A proposition is a declarative statement that can be either true or false, but not both.
• Example: "It is raining" can be true or false, but not both.
Procedural versus declarative knowledge
Procedural and declarative knowledge are two fundamental types of knowledge that
describe how we understand and use information. They differ in both their nature and
their application in various fields, including artificial intelligence, cognitive science, and
education.
Procedural knowledge refers to the knowledge of how to do something. It involves
knowing the processes, methods, or steps required to accomplish tasks. It is sometimes
called "know-how" knowledge.
Declarative knowledge refers to facts, information, and concepts that one knows and can
explicitly state or declare. It is sometimes called "know-what" knowledge.
Characteristics of Procedural Knowledge
• Implicit: Procedural knowledge is often difficult to articulate fully in words; it’s
more about "doing" than "knowing." It's typically acquired through practice and
experience.
Example: Knowing how to ride a bicycle, tie your shoes, or play a musical
instrument.
• Dynamic: It involves sequences of actions or operations, often executed over time.
• Performance-Oriented: Procedural knowledge is directly related to performing
tasks or operations effectively.
• Less Easily Transferred: Since it’s implicit, procedural knowledge is harder to teach
or transfer to others without hands-on practice or demonstration.
Characteristics of Declarative knowledge
• Explicit: Declarative knowledge can be articulated in words, symbols, or images. It's the type of knowledge that can be stated directly.
Example: "Paris is the capital of France" or "A triangle has three sides."
• Static: Declarative knowledge represents facts or data that do not change in the context of its use.
• Easily Transferred: Since declarative knowledge is explicit, it can be easily communicated, taught, and learned.
Key Concepts in Logic Programming
• Queries:
• A query is a question posed to the logic programming system to find out whether certain facts are true or to find values that satisfy certain conditions.
• Example: ?- grandparent(john, X). asks the system to find all X such that John is a grandparent of X.
• Inference Engine:
• The inference engine is the core component of a logic programming system that applies inference rules to the facts and rules to derive conclusions. The most common inference method used in logic programming is backward chaining.
• Unification:
• Unification is the process of matching two logical terms by finding a substitution for variables that makes the terms identical.
• Example: To unify parent(X, mary) with parent(john, mary), the system identifies that X must be john.
• Recursion:
• Logic programming frequently uses recursion to define complex relationships or solve problems.
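The unification example above, matching parent(X, mary) against parent(john, mary), can be sketched as follows. This simplified version handles only flat terms (no nested structures) and uses the Prolog-style convention that names starting with an uppercase letter are variables:

```python
# A simplified unification sketch for flat terms like ("parent", "X", "mary").
# Uppercase-initial strings are treated as variables (Prolog convention).
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def unify(t1, t2, subst=None):
    """Return a substitution making t1 and t2 identical, or None on failure."""
    subst = dict(subst or {})
    if len(t1) != len(t2):
        return None
    for a, b in zip(t1, t2):
        a, b = subst.get(a, a), subst.get(b, b)  # apply bindings so far
        if a == b:
            continue
        if is_var(a):
            subst[a] = b
        elif is_var(b):
            subst[b] = a
        else:
            return None  # two distinct constants cannot unify
    return subst

print(unify(("parent", "X", "mary"), ("parent", "john", "mary")))  # {'X': 'john'}
```

Real Prolog unification also handles nested terms and performs an occurs check; this sketch omits both for brevity.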
Prolog: A Logic Programming Language
Prolog (Programming in Logic) is the most widely known logic programming
language. It was developed in the early 1970s and is used in fields like artificial
intelligence, computational linguistics, and symbolic reasoning.
1. Applications of Forward Chaining:
• Example: A healthcare expert system can diagnose diseases based on symptoms. The system starts with the known symptoms (facts) and applies rules until it reaches a diagnosis.
• Reason: Forward chaining is suitable here as it involves gathering data (symptoms) and using rules to arrive at a potential diagnosis.
• Example: In industrial automation, forward chaining is used in systems that monitor machine conditions. If certain conditions are met (e.g., high temperature), the system will trigger alarms or shut down machinery.
• Reason: The system continuously monitors sensor data and applies rules to take action as conditions change.
Example:
Let’s consider a medical diagnosis system.
Rules:
1.If the patient has a fever and cough, then the patient has the flu.
2.If the patient has the flu, then prescribe rest and fluids.
Facts: The patient has a fever and cough.
Process:
•Start with the known facts: the patient has a fever and cough.
•Apply Rule 1: Since the patient has both symptoms, we conclude the patient has the flu.
•Apply Rule 2: Since the patient has the flu, the system concludes to prescribe rest and
fluids.
In forward chaining, the process moves from facts to conclusions.
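The process above can be sketched as a small forward-chaining loop. Rules and facts mirror the medical example in the text:

```python
# A minimal forward-chaining sketch of the medical example above.
# Each rule is a (premises, conclusion) pair.
rules = [
    ({"fever", "cough"}, "flu"),
    ({"flu"}, "prescribe rest and fluids"),
]
facts = {"fever", "cough"}

# Repeatedly fire any rule whose premises are all known facts,
# until no new fact can be derived.
changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print("prescribe rest and fluids" in facts)  # True
```

The loop moves from facts to conclusions: Rule 1 fires first (adding "flu"), which then enables Rule 2, exactly as in the worked example.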
Backward Reasoning (Backward Chaining) (Goal-driven):
Backward reasoning, also known as backward chaining, is a goal-driven approach
where reasoning starts from a specific goal or hypothesis, and inference rules are
applied in reverse to determine if the goal can be satisfied by existing facts.
Key Characteristics:
•Goal-Driven: The process begins with a specific goal or conclusion that needs to be
proven or achieved.
•Rule Application: Rules are applied in reverse to see if the goal can be derived from
existing facts.
•Fact Validation: The system works backward from the goal, validating whether the facts
can support the goal.
Process:
1.Start with a Goal: Begin with a hypothesis or goal you want to prove or
achieve.
2.Apply Rules Backward: Identify rules that could lead to the goal and check if
their premises are satisfied.
3.Check Facts: For each premise, check if it can be supported by existing facts
or if it needs to be further broken down.
4.Goal Satisfied: The process continues until the goal is proven true by existing
facts or the goal is deemed unachievable.
2. Applications of Backward Chaining:
• Example: Used in medical systems to diagnose diseases by working from potential diseases (goals) and checking whether the known symptoms support them.
• Reason: Backward chaining is efficient when there are many possible diseases and the system works backward to verify which one fits the symptoms, focusing only on relevant possibilities.
•Example: In systems troubleshooting (e.g., IT or automotive), backward chaining helps identify the cause of a
failure. If a system isn't working (goal), backward chaining checks possible causes by verifying conditions step by
step.
•Reason: The system starts from the desired state (goal) and traces potential reasons until the actual fault is
identified.
Example:
Same medical diagnosis system as above.
Goal: Prescribe rest and fluids (i.e., we want to know if this is the correct prescription).
Rules:
1.If the patient has a fever and cough, then the patient has the flu.
2.If the patient has the flu, then prescribe rest and fluids.
Facts: The patient has a fever and cough.
Process:
•Start with the goal: prescribe rest and fluids.
•Check if Rule 2 applies: The rule states we can prescribe rest and fluids if the patient has the flu.
•Now, work backward to see if the patient has the flu.
•Check Rule 1: The patient will have the flu if they have a fever and cough.
•Since the patient has a fever and cough (known facts), we conclude the goal is satisfied, and rest and fluids
are prescribed.
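The same example can be sketched as a small backward-chaining function that works from the goal back to the facts:

```python
# A minimal backward-chaining sketch of the same medical example: start from
# the goal and recursively check whether rules and facts can establish it.
rules = [
    ({"fever", "cough"}, "flu"),
    ({"flu"}, "prescribe rest and fluids"),
]
facts = {"fever", "cough"}

def prove(goal):
    """True if the goal is a known fact or derivable via some rule."""
    if goal in facts:
        return True
    return any(all(prove(p) for p in premises)
               for premises, conclusion in rules if conclusion == goal)

print(prove("prescribe rest and fluids"))  # True
```

Note the direction of reasoning: prove() starts from the prescription goal, finds Rule 2, then recursively tries to prove "flu" via Rule 1, and bottoms out at the known facts.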
Aspect           | Forward Reasoning (Forward Chaining)                 | Backward Reasoning (Backward Chaining)
Starting Point   | Known facts or data                                  | A specific goal or hypothesis
Direction        | Moves from facts to conclusions                      | Moves from the goal to check if facts support it
Nature           | Data-driven                                          | Goal-driven
Process          | Apply rules to known facts to derive new facts       | Apply rules in reverse to prove the goal
Efficiency       | Can generate many intermediate conclusions, potentially inefficient if the goal is far from initial facts | More focused on the goal, potentially more efficient
Common Use Cases | Expert systems, data analysis, rule-based systems    | Problem-solving, theorem proving, diagnostics
Example          | Deriving that Socrates is mortal from known facts    | Proving Socrates is mortal by validating premises
Probability & Bayes' Theorem
Probability
Probability is the mathematical framework for quantifying uncertainty. It represents the
likelihood of an event occurring within a certain context.
•Probability of an Event (P(A)): The likelihood that event A will occur, where 0 ≤ P(A) ≤ 1. A
probability of 0 indicates the event will not occur, and 1 indicates the event is certain to
occur.
Probability of an Event:
Simple Event: An event with a single outcome.
Compound Event: An event with multiple outcomes.
Formula: For a simple event A in a sample space S,
P(A) = (Number of favorable outcomes) / (Total number of possible outcomes)
•Joint Probability: The probability of two events occurring together (e.g., P(A ∩ B) is the
probability that both A and B occur).
•Conditional Probability (P(A|B)): The probability that event A occurs given that event B
has already occurred. This is a key concept in Bayesian reasoning.
•Marginal Probability:
•The probability of an event occurring irrespective of other events. It is calculated by summing or
integrating over all possible values of the other variables.
•Independence:
•Two events A and B are independent if P(A∩B)=P(A)⋅P(B)
Examples of Conditional Probability
1.Medical Testing:
1. Scenario: Suppose there is a medical test for a disease with a known prevalence in the
population.
2. Events: Let A be the event that a person has the disease, and B be the event that the person tests positive.
3. Conditional Probability: P(A∣B) is the probability that a person has the disease given that
they tested positive. This is useful in assessing the reliability of the test.
1. Medical Diagnosis
Scenario: A doctor wants to determine the probability that a patient has a
rare disease given that they tested positive for it.
•Prior Probability P(D): The probability of having the disease before the test
is 0.01 (1% prevalence).
• Likelihood P(T∣D): The probability of testing positive given the disease is 0.95 (95% sensitivity).
•False Positive Rate P(T∣D′) : The probability of testing positive without the
disease is 0.05 (5% false positive rate).
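Putting these numbers through Bayes' theorem gives the probability the patient actually has the disease after a positive test:

```python
# Applying Bayes' theorem to the numbers stated above:
# 1% prevalence, 95% sensitivity, 5% false positive rate.
p_disease = 0.01          # prior P(D)
p_pos_disease = 0.95      # likelihood P(T|D), sensitivity
p_pos_healthy = 0.05      # false positive rate P(T|D')

# Total probability of a positive test, P(T).
p_pos = p_pos_disease * p_disease + p_pos_healthy * (1 - p_disease)

# Posterior P(D|T) = P(T|D) * P(D) / P(T).
p_disease_given_pos = p_pos_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.161
```

Even with a positive result from a fairly accurate test, the posterior is only about 16%, because the disease is rare: most positives come from the much larger healthy population.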
Example: Weather Forecasting
•Scenario: Historical data shows that 30% of the days are rainy. On rainy days,
80% of the time it is cloudy. On non-rainy days, 20% of the time it is cloudy.
•Events:
• A: It rains.
• B: It is cloudy.
Objective: Find P(A∣B), the probability it will rain given that it is cloudy.
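Working this example numerically with Bayes' theorem and the law of total probability:

```python
# The forecast example above: 30% of days are rainy; it is cloudy 80% of the
# time on rainy days and 20% of the time on non-rainy days.
p_rain = 0.30          # P(A)
p_cloudy_rain = 0.80   # P(B|A)
p_cloudy_dry = 0.20    # P(B|not A)

# Total probability of a cloudy day, P(B).
p_cloudy = p_cloudy_rain * p_rain + p_cloudy_dry * (1 - p_rain)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
p_rain_given_cloudy = p_cloudy_rain * p_rain / p_cloudy
print(round(p_rain_given_cloudy, 3))  # 0.632
```

So observing clouds roughly doubles the probability of rain, from the 30% prior to about 63%.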
Bayes' Theorem
Bayes' Theorem is a fundamental concept in probability theory that describes how to update the probability of an event based on new evidence. It provides a way to calculate the conditional probability of an event, given the occurrence of another event:

P(H∣E) = (P(E∣H) ⋅ P(H)) / P(E)
•P(H∣E) is the posterior probability: the probability of the hypothesis H given the evidence E.
•P(E∣H) is the likelihood: the probability of the evidence E given that the hypothesis H is true.
•P(H) is the prior probability: the initial probability of the hypothesis before considering the evidence.
•P(E) is the marginal likelihood: the total probability of the evidence under all possible hypotheses.
Example 1: Medical Diagnosis
Suppose a doctor is testing for a rare disease, which only affects 1% of the population. The test is 90% accurate, meaning it correctly identifies 90% of people with the disease (true positive rate), and it correctly identifies 90% of people without the disease (true negative rate).
Now, let's say a patient tests positive. We want to calculate the probability that the patient actually has the disease, given the positive test result.
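The calculation, sketched in code using the numbers above (1% prevalence, 90% true positive and true negative rates):

```python
# Bayes' theorem for the rare-disease example above.
p_disease = 0.01
p_pos_disease = 0.90     # true positive rate, P(T|D)
p_pos_healthy = 0.10     # false positive rate, 1 - true negative rate

# Total probability of testing positive.
p_pos = p_pos_disease * p_disease + p_pos_healthy * (1 - p_disease)

# Posterior probability of disease given a positive test.
p_disease_given_pos = p_pos_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.083
```

Despite the positive result, the probability of disease is only about 8.3%, again because false positives from the 99% healthy majority outnumber true positives.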
Bayesian Networks
Bayesian Networks (also known as Belief Networks) are graphical models that represent
the probabilistic relationships among a set of variables. Each node in the network
represents a variable, and the edges represent dependencies between them.
Key Features:
•Directed Acyclic Graph (DAG): The network is structured as a DAG, where each node
corresponds to a random variable, and edges indicate conditional dependencies.
•Conditional Probability Tables (CPTs): Each node has a CPT that specifies the probability
of the node given its parents in the network.
•Inference: Bayesian networks can be used to perform probabilistic inference, updating
beliefs about the state of the world as new evidence is introduced.
Components of Bayesian Networks
1.Nodes:
1. Represent random variables in the network. Each node can represent
discrete or continuous variables.
2. For example, in a medical diagnosis network, nodes might represent variables
like "Fever," "Cough," "Flu," and "Cold."
2.Edges:
1. Directed edges (arrows) between nodes represent probabilistic dependencies
or causal relationships.
2. An edge from node A to node B indicates that A has a direct influence on B.
3. Conditional Probability Tables (CPTs):
•Each node has an associated CPT that quantifies the effect of the parent nodes on
the node.
•For a node with discrete parents, the CPT provides the probability distribution of the
node given each combination of parent values.
•For example, if node B is influenced by nodes A and C, the CPT of B will give the
probability distribution P(B∣A,C).
Example 1: Medical Diagnosis
Scenario: A simple Bayesian Network for diagnosing a disease based on symptoms.
• Variables:
• D: Disease (e.g., flu)
• S: Symptom (e.g., cough)
• F: Fever

Structure:
      D
      |
   +--+--+
   |     |
   S     F

Description:
• D (Disease) influences both S (Symptom) and F (Fever).
• The presence of the disease affects whether the patient shows symptoms and has a fever.
Example 2: Weather Prediction
Scenario: A Bayesian Network to predict the likelihood of rain given the weather forecast and humidity levels.
• Variables:
• R: Rain
• F: Forecast (e.g., forecast predicts rain)
• H: Humidity

Structure:
   F
   |
   v
   R
   |
   v
   H

Description:
• F (Forecast) affects R (Rain).
• R (Rain) influences H (Humidity).
Inference in Bayesian Networks
Inference involves calculating the probabilities of certain variables given evidence about others. This can
be done through various methods:
1.Exact Inference:
1. Algorithms like Variable Elimination and Belief Propagation are used to perform exact inference
in Bayesian Networks.
2. These methods compute the marginal probability distribution of a subset of variables given
evidence.
2.Approximate Inference:
1. When exact inference is computationally infeasible, approximate methods such as Monte Carlo
Sampling (e.g., Gibbs Sampling) can be used to estimate probabilities.
Example of a Bayesian Network
Consider a simple Bayesian Network for a medical diagnosis:
•Nodes: "Cough," "Fever," and "Flu."
•Edges:
• "Flu" → "Cough"
• "Flu" → "Fever"
This network suggests that "Flu" is a cause of both "Cough" and "Fever."
1.Conditional Probability Tables:
1. P(Cough | Flu): Probability of having a cough given that the patient has the flu.
2. P(Fever | Flu): Probability of having a fever given that the patient has the flu.
3. P(Flu): Prior probability of having the flu.
2.Inference Example:
1. Given that a patient has a cough and fever, you can use the network to infer the
probability of having the flu.
2. Use the CPTs and evidence to update the belief about the probability of "Flu" given
the observed symptoms.
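A brute-force sketch of this inference: compute P(Flu | Cough = yes, Fever = yes) from the network's CPTs. The probability values below are illustrative assumptions, since the text does not give numbers:

```python
# Inference by enumeration in the Flu network above.
# All CPT values here are illustrative assumptions.
p_flu = 0.05                           # prior P(Flu)
p_cough = {True: 0.80, False: 0.10}    # P(Cough=yes | Flu)
p_fever = {True: 0.70, False: 0.05}    # P(Fever=yes | Flu)

def joint(flu: bool) -> float:
    """P(Flu=flu, Cough=yes, Fever=yes) via the chain rule for this DAG."""
    prior = p_flu if flu else 1 - p_flu
    return prior * p_cough[flu] * p_fever[flu]

# Normalize over Flu's two states to get the posterior.
posterior = joint(True) / (joint(True) + joint(False))
print(round(posterior, 3))  # 0.855
```

Observing both symptoms raises the belief in flu from the 5% prior to about 85% under these assumed CPTs; this is exactly the belief update the text describes.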
Example: A Bayesian network could model the probability
of a patient having a disease based on various symptoms
and test results, where each symptom and test result is a
node in the network.
For a network with three variables A, B, and C where
A→B→C, the joint probability distribution can be
written as:
P(A,B,C)=P(A)⋅P(B∣A)⋅P(C∣B)
Here:
P(A) is the probability of A.
P(B∣A) is the conditional probability of B given A.
P(C∣B) is the conditional probability of C given B.
Dempster-Shafer Theory (DST): Overview
Dempster-Shafer Theory is a mathematical framework for reasoning under uncertainty and combining evidence from different sources of information. It is often used in scenarios where evidence is gathered from multiple sources, and the goal is to combine this evidence to make a decision or estimate probabilities.
Unlike classical probability theory, which assigns probabilities directly to individual hypotheses, DST assigns "beliefs" that account for the evidence and its degree of uncertainty.
Key Concepts in DST
1. Frame of Discernment (Θ): A set of mutually exclusive hypotheses or propositions. For example, in a diagnosis task, Θ might be the set of all candidate diseases.
2. Mass Function (Belief Assignment): A function that assigns a belief mass to each subset of the frame of discernment. The mass assigned to a subset reflects the amount of evidence supporting that subset.
3. Belief (Bel): Represents the total belief supporting a particular hypothesis, accounting for all evidence that supports it.
4. Plausibility (Pl): Measures how plausible a hypothesis is, considering all the evidence that does not contradict it.
5. Conflict Factor (K): Represents the degree of conflict between two sets of evidence.
Example 1: Medical Diagnosis
• Scenario: You have two diagnostic tests for a disease. Each test provides evidence with some level of uncertainty.
• Sources:
• Test 1: Provides a BPA (basic probability assignment, i.e., a mass function) with some degree of belief that the patient has the disease.
• Test 2: Provides a BPA with some degree of belief that the patient has the disease.
• Combination: Use Dempster's Rule to combine the BPAs from both tests to get a more comprehensive belief about the patient's health.
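Dempster's rule of combination for this two-test scenario can be sketched as follows. The frame is {d (disease), h (healthy)}, and the mass values each test assigns are illustrative assumptions:

```python
# A sketch of Dempster's rule of combination for two BPAs over the frame
# {d (disease), h (healthy)}. The mass values are illustrative assumptions.
from itertools import product

theta = frozenset({"d", "h"})
m1 = {frozenset({"d"}): 0.6, theta: 0.4}   # BPA from Test 1
m2 = {frozenset({"d"}): 0.7, theta: 0.3}   # BPA from Test 2

def combine(m1, m2):
    """Combine two mass functions, renormalizing by the conflict factor K."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb        # mass falling on the empty set
    return {s: w / (1 - conflict) for s, w in combined.items()}

m = combine(m1, m2)
print(round(m[frozenset({"d"})], 2))  # 0.88
```

Here the two tests reinforce each other: the combined belief in "disease" (0.88) exceeds either test's individual mass, with the remaining mass staying on the full frame Θ (uncertainty).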
When integrating certainty factors (CFs) into a rule-based system, the goal is to combine the results
of different rules to make a final decision. Different strategies are used depending on whether the
rules reinforce each other or are in conflict. Here’s a breakdown of the two major concepts involved:
1. Aggregation: Combining the CFs of different rules using methods such as averaging or weighted sum. Aggregation involves combining CFs from multiple rules to determine the final CF for a conclusion.
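One common reinforcement rule, from MYCIN-style systems, combines two positive CFs so the result grows toward (but never exceeds) 1. A minimal sketch, with illustrative CF values:

```python
# MYCIN-style aggregation of two positive certainty factors:
# CF = CF1 + CF2 * (1 - CF1). The rule CFs below are illustrative.
def combine_positive(cf1: float, cf2: float) -> float:
    """Combine two reinforcing (positive) certainty factors."""
    return cf1 + cf2 * (1 - cf1)

cf = combine_positive(0.6, 0.5)
print(round(cf, 2))  # 0.8
```

Note the order of combination does not matter, and the result always stays in [0, 1] for positive inputs, which is why this rule is preferred over simple addition when rules reinforce each other.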