Unit 3 Knowledge Representation & Reasoning
Propositional logic, also known as sentential logic, deals with logical relationships
between propositions (statements that can either be true or false). In propositional logic,
we analyze how propositions combine to form compound propositions and how their truth
values are determined.
• Syntax: The formal rules governing how propositions and connectives are written. A
proposition is often represented by a letter (e.g., p, q, r).
o Well-formed formulas (WFF): Expressions that are syntactically correct.
o Connectives: ∧ (AND), ∨ (OR), ¬ (NOT), ⇒ (implies), ⇔ (if and only if).
• Semantics: The meaning or interpretation of a logical formula. This involves
assigning truth values to propositions and determining the truth value of compound
formulas.
o Truth Value: The truth or falsity of a proposition, either true (T) or false (F).
1. AND (Conjunction, ∧): The result is true only if both operands are true.
a. Example: p ∧ q is true if both p and q are true.
2. OR (Disjunction, ∨): The result is true if at least one operand is true.
a. Example: p ∨ q is true if either p or q is true.
3. NOT (Negation, ¬): The result is true if the operand is false, and vice versa.
a. Example: ¬p is true if p is false.
4. IMPLIES (Implication, ⇒): The result is false only if the first operand is true and the
second is false.
a. Example: p ⇒ q is false only if p is true and q is false.
5. IF AND ONLY IF (Biconditional, ⇔): The result is true if both operands have the
same truth value.
a. Example: p ⇔ q is true if p and q are both true or both false.
A truth table is a table that shows all possible truth values of a set of propositions and the
resulting truth value of a compound proposition.
• Logical Equivalence: Two logical expressions are equivalent if they always have the
same truth values for all possible assignments of truth values to their variables. For
example:
o p ⇒ q is logically equivalent to ¬p ∨ q.
o Tautology: A compound statement that is always true (e.g., p ∨ ¬p).
o Contradiction: A compound statement that is always false (e.g., p ∧ ¬p).
Truth table for p ⇒ q:
p   q   p ⇒ q
T   T     T
T   F     F
F   T     T
F   F     T
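The equivalence and truth-table ideas above can be checked mechanically. The short Python
sketch below (an illustrative addition, not part of the original notes) enumerates every
truth-value assignment for p and q, prints the truth table for p ⇒ q, and confirms that
p ⇒ q always agrees with ¬p ∨ q:

    from itertools import product

    def implies(p, q):
        # p => q is false only when p is true and q is false
        return (not p) or q

    print("p      q      p => q   not p or q")
    for p, q in product([True, False], repeat=2):
        print(f"{p!s:6} {q!s:6} {implies(p, q)!s:8} {((not p) or q)!s}")

    # The last two columns match on every row, so the two formulas are equivalent.
    assert all(implies(p, q) == ((not p) or q)
               for p, q in product([True, False], repeat=2))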
Propositional logic is used in various fields to model and reason about systems:
1. Computer Science:
a. Boolean Algebra: Used in circuit design, search algorithms, and software
development.
b. Formal Verification: Used to verify correctness of hardware and software
systems.
2. Artificial Intelligence:
a. Knowledge Representation: To model facts about the world in a logical
manner.
b. Automated Theorem Proving: Solving logical problems automatically.
3. Mathematics:
a. Mathematical Logic: Propositional logic forms the foundation of more
complex systems like predicate logic.
b. Set Theory: Logical operations are often used in the manipulation of sets
and relations.
4. Philosophy:
a. Used to model logical arguments and reasoning.
First-Order Logic (FOL), also known as predicate logic or first-order predicate calculus,
extends propositional logic by introducing quantifiers, predicates, and variables. It allows
for reasoning about objects, their properties, and relationships between objects.
• Syntax: The formal structure that defines how FOL expressions are written.
o Terms: Represent objects in the domain (e.g., constants, variables).
o Predicates: Represent relationships or properties (e.g., P(x) might mean "x is a
person").
o Atomic Formula: A predicate applied to terms (e.g., P(a), meaning "a is a person").
o Complex Formula: A combination of atomic formulas using logical connectives
(e.g., P(x) ∧ Q(x, y)).
• Semantics: The interpretation of terms and formulas.
o Domain: A set of objects that the variables can refer to.
o Interpretation: Assigns meanings to predicates, functions, and constants in
the context of a domain.
o Truth Value: The truth of a formula depends on the assignment of objects to
the variables and the interpretation of predicates and functions.
Quantifiers are used in FOL to express statements about all or some elements of the
domain: the universal quantifier ∀ ("for all") and the existential quantifier ∃ ("there exists").
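As an illustration (a hypothetical example, not from the notes), quantified statements over a
small finite domain can be evaluated directly in Python, where all() plays the role of the
universal quantifier ∀ and any() plays the role of the existential quantifier ∃:

    # Finite domain of objects and two predicates.
    domain = ["socrates", "plato", "rock"]
    humans = {"socrates", "plato"}

    def Human(x):          # predicate Human(x)
        return x in humans

    def Mortal(x):         # assumed rule: every human is mortal
        return Human(x)

    # Universal: for all x, Human(x) => Mortal(x)
    print(all((not Human(x)) or Mortal(x) for x in domain))   # True

    # Existential: there exists an x such that Human(x)
    print(any(Human(x) for x in domain))                      # True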
1. Expressiveness:
a. Propositional Logic: Deals only with whole propositions (true/false), with no
internal structure.
b. First-Order Logic: Allows for a detailed representation of individual objects,
their properties, and relationships.
2. Variables and Quantifiers:
a. Propositional Logic: Has no concept of variables or quantifiers.
b. First-Order Logic: Introduces variables (which can stand for objects) and
quantifiers (which express statements about all or some objects).
3. Scope:
a. Propositional Logic: Limited to true/false values for entire statements.
b. First-Order Logic: Provides a richer framework for expressing properties and
relationships between elements in a domain.
4. Reasoning:
a. Propositional Logic: Primarily uses logical connectives to form complex
statements.
b. First-Order Logic: Uses quantifiers and predicates to form statements that
can describe properties and relationships of individuals.
1. Artificial Intelligence:
a. Knowledge Representation: FOL is widely used in AI to represent
knowledge in a form that machines can reason about, e.g., expert systems,
rule-based systems.
b. Automated Theorem Proving: FOL is used to automatically prove
mathematical theorems by defining axioms and inference rules.
2. Database Querying:
a. SQL Queries: FOL forms the basis of structured query languages (SQL),
where queries can express relationships between data objects and their
properties.
b. Relational Databases: FOL is used to define constraints and rules in
database schemas.
3. Mathematics:
a. FOL is used to formalize mathematical reasoning and proofs. Mathematical
structures, like groups, rings, and sets, can be described using FOL.
4. Formal Verification:
a. FOL is used in the formal verification of hardware and software systems,
ensuring that certain properties hold true in all cases.
5. Natural Language Processing (NLP):
a. In NLP, FOL can be used to model the syntax and semantics of natural
languages, making it possible to analyze and generate sentences in a
structured way.
Inference mechanisms in First-Order Logic (FOL) are used to derive new information or
conclusions from known facts or premises. The primary goal of inference is to use logical
rules to prove or deduce new truths based on existing knowledge.
Inference in FOL is more complex than in propositional logic because it involves variables,
predicates, and quantifiers. The most common methods used for inference in FOL are
forward chaining, backward chaining, and resolution.
3.3.2 Forward and Backward Chaining
Forward Chaining and Backward Chaining are two essential inference strategies used in
rule-based reasoning systems (like expert systems).
1. Forward Chaining:
a. Data-driven approach.
b. Starts with known facts and applies inference rules to derive new facts until
the goal is reached.
c. Procedure:
i. Begin with a set of known facts.
ii. Apply inference rules (e.g., P ⇒ Q) to generate new facts.
iii. Repeat the process until the desired conclusion is reached or no new
facts can be derived.
d. Example: Given the facts "All humans are mortal" and "Socrates is a
human," forward chaining would infer "Socrates is mortal."
2. Backward Chaining:
a. Goal-driven approach.
b. Starts with the goal and works backwards, looking for the facts that support
it.
c. Procedure:
i. Start with a query (goal).
ii. Identify rules that could support the goal.
iii. Check if the premises of the rule are already known, otherwise work
backwards.
iv. Repeat until the premises are known facts or no solution is found.
d. Example: To prove "Socrates is mortal," backward chaining would check if
"Socrates is a human" and "All humans are mortal" are true.
• Unification: The process of finding a substitution of variables that makes two terms
or formulas identical.
o Example: Unifying P(x) with P(a) results in the substitution x ↦ a, meaning x is
replaced with a.
• Substitution: The process of replacing a variable with a specific term in a formula.
o Example: Substituting x with a in the formula P(x) results in P(a).
Unification and substitution are essential for making logical formulas match during the
application of inference rules, such as in forward and backward chaining.
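The sketch below (an illustrative implementation; the term encoding and the function names
substitute and unify are assumptions, not notation from the notes) shows substitution and a
simplified unification procedure. Variables are written with a leading "?", constants are plain
strings, and a compound term such as P(x) is encoded as the tuple ("P", "?x"); the occurs
check and full substitution composition are omitted for brevity:

    def is_variable(t):
        return isinstance(t, str) and t.startswith("?")

    def substitute(term, subst):
        # Apply a substitution {variable: term} to a term.
        if is_variable(term):
            return subst.get(term, term)
        if isinstance(term, tuple):
            return tuple(substitute(arg, subst) for arg in term)
        return term

    def unify(t1, t2, subst=None):
        # Return a substitution that makes t1 and t2 identical, or None on failure.
        subst = {} if subst is None else subst
        t1, t2 = substitute(t1, subst), substitute(t2, subst)
        if t1 == t2:
            return subst
        if is_variable(t1):
            return {**subst, t1: t2}
        if is_variable(t2):
            return {**subst, t2: t1}
        if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
            for a, b in zip(t1, t2):
                subst = unify(a, b, subst)
                if subst is None:
                    return None
            return subst
        return None

    print(unify(("P", "?x"), ("P", "a")))   # {'?x': 'a'}, i.e. the substitution x ↦ a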
Inference mechanisms in FOL are widely used in various domains, particularly in AI and
computer science. Some key applications include:
1. Expert Systems:
a. Inference engines in expert systems use FOL to derive new knowledge from
existing facts, supporting decision-making processes.
2. Automated Theorem Proving:
a. FOL is used to automatically prove mathematical theorems and verify the
correctness of statements in formal logic.
3. Knowledge Representation:
a. In AI, FOL is used for representing and reasoning about knowledge in a
structured way. Logical inference can derive new information from the facts
represented in the knowledge base.
4. Natural Language Processing (NLP):
a. FOL is applied in NLP for tasks such as semantic parsing, where sentences
are interpreted in terms of logical formulas and inferences are drawn.
5. Robotics:
a. Robots use logical inference to reason about their environment, plan
actions, and make decisions based on sensor data and previous knowledge.
6. Database Querying:
a. Logical inference mechanisms are used in databases to process queries and
derive answers from a set of known facts or data entries.
1. Forward Chaining: Starts from known facts and applies rules to reach the goal.
2. Backward Chaining: Starts with a goal and works backward to determine the facts
needed to prove it.
Both methods are crucial for reasoning within expert systems, allowing systems to
generate conclusions based on available knowledge.
3.4.2 Forward Chaining: Working from Facts to Goals
Forward Chaining is a data-driven reasoning approach that starts with the available facts
and applies rules to derive new facts until the goal is reached.
• Procedure:
o Start with known facts: Begin with a set of facts or known data points in the
system's knowledge base.
o Apply inference rules: Use the rules (typically in the form "If X, then Y") to
infer new facts.
o Repeat: Apply rules iteratively to the new facts generated, continually
expanding the knowledge base.
o Reach the goal: Continue the process until the desired conclusion or goal is
reached.
• Example:
o Given facts: "All humans are mortal" and "Socrates is a human."
o Rule: "If X is human, then X is mortal."
o Forward chaining would conclude: "Socrates is mortal." (See the sketch after
this list.)
• Key Features:
o Data-driven: It moves forward from known data.
o Exhaustive: Can generate all possible conclusions that follow from the
facts.
o May require extensive processing: If many rules are involved, forward
chaining may go through a large number of facts.
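A minimal forward-chaining sketch in Python (the string encoding of facts and the helper name
forward_chain are simplifying assumptions, not the notes' notation): facts are strings and each
rule is a pair (set of premises, conclusion). The loop keeps firing applicable rules until no new
facts can be derived:

    facts = {"human(socrates)"}
    # "All humans are mortal", instantiated for socrates
    rules = [({"human(socrates)"}, "mortal(socrates)")]

    def forward_chain(facts, rules):
        facts = set(facts)
        changed = True
        while changed:                        # repeat until no rule adds anything new
            changed = False
            for premises, conclusion in rules:
                if premises <= facts and conclusion not in facts:
                    facts.add(conclusion)     # fire the rule and record the new fact
                    changed = True
        return facts

    print(sorted(forward_chain(facts, rules)))
    # ['human(socrates)', 'mortal(socrates)']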
Backward Chaining is a goal-driven reasoning approach that begins with a goal and works
backward to find the facts that support it.
• Procedure:
o Start with a goal or query: Identify the goal that you want to prove (e.g., "Is
Socrates mortal?").
o Identify applicable rules: Look for rules that could lead to the goal.
o Check if premises are true: For each applicable rule, check whether the
premises (conditions) of the rule are known facts or need to be proven.
o Repeat: If premises are not known, recursively apply backward chaining to
those premises until facts are found.
o Goal satisfied: The process ends when all necessary facts are found to
prove the goal.
• Example:
o Goal: "Is Socrates mortal?"
o Rule: "If X is a human, then X is mortal."
o Premise to check: "Is Socrates a human?"
o If "Socrates is a human" is true, then "Socrates is mortal."
• Key Features:
o Goal-driven: It starts with the goal and looks for facts that support it.
o Efficient: Only the necessary facts and rules are explored.
o Recursive: The process works backward through the knowledge base until
the facts are found.
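A matching backward-chaining sketch (same hypothetical representation as the forward-chaining
example above): to prove a goal, either find it among the known facts or find a rule whose
conclusion is the goal and recursively prove all of its premises:

    facts = {"human(socrates)"}
    rules = [({"human(socrates)"}, "mortal(socrates)")]

    def backward_chain(goal, facts, rules):
        if goal in facts:                     # the goal is already a known fact
            return True
        for premises, conclusion in rules:
            if conclusion == goal:            # a rule could establish the goal...
                if all(backward_chain(p, facts, rules) for p in premises):
                    return True               # ...and every premise can itself be proven
        return False

    print(backward_chain("mortal(socrates)", facts, rules))   # True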
• Strengths:
o Forward Chaining: Ideal when the system needs to deduce multiple
conclusions from a set of facts.
o Backward Chaining: Efficient when working with specific goals, as it avoids
unnecessary processing by focusing only on relevant facts.
• Weaknesses:
o Forward Chaining: Can be computationally expensive, especially in large
knowledge bases.
o Backward Chaining: Might be less efficient if the goal is not specific enough
or if there are many potential paths to check.
Both Forward and Backward Chaining are widely used in expert systems and other AI-
based applications for decision-making and problem-solving.
1. Expert Systems:
a. In expert systems, forward chaining is typically used in diagnostic systems
or situations where the system needs to explore all possible consequences
of known facts (e.g., medical diagnosis systems).
b. Backward chaining is often used in rule-based systems where a goal needs
to be verified or specific conclusions are required (e.g., troubleshooting
systems).
2. Medical Diagnosis:
a. Forward chaining can be used to infer symptoms based on known diseases
and conditions.
b. Backward chaining can be used to trace symptoms back to possible
diseases or conditions.
3. Problem Solving in AI:
a. Expert systems use both methods to solve complex problems in fields like
law, engineering, and finance. In these systems, backward chaining is often
used to trace a specific solution path, while forward chaining is used for
exploring all possible solutions.
4. Planning and Scheduling:
a. Forward chaining can be used to plan actions and reach goals, while
backward chaining can be used to plan backward from a desired goal state
to determine the necessary steps.
5. Learning Systems:
a. Chaining methods can be used in learning environments to derive new
knowledge from existing data and conclusions, applying inference to learn
patterns or make predictions.
6. Search Algorithms:
a. In AI search problems, backward chaining can be used to explore paths
backward from the goal, while forward chaining can explore all potential
solutions from the initial state.
Resolution is a rule of inference used in First-Order Logic (FOL) and propositional logic
for proving the validity of logical statements. It is the primary method employed by
automated theorem provers to derive conclusions or refute hypotheses. Resolution is
based on the idea of refutation: a conclusion is inferred by proving that the negation of the
desired result leads to a contradiction.
The resolution process in FOL can be broken down into the following steps:
1. Negate the statement to be proved and add the negation to the set of premises.
2. Convert all formulas into Conjunctive Normal Form (CNF), i.e., a conjunction of clauses.
3. Repeatedly apply the resolution rule to pairs of clauses containing complementary
literals, using unification to match literals that involve variables.
4. If the empty clause is derived, a contradiction has been reached and the original
statement is proved; if no new clauses can be generated, the attempt fails.
• Key Concept: Unification involves substituting variables with terms so that two
literals match.
• Example:
o Given literals: P(x) and P(a), where x is a variable and a is a constant.
o The unification process would substitute x = a to make the two literals identical.
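To make the refutation idea concrete, here is a small propositional resolution sketch (an
illustrative addition; full FOL resolution would also need the unification step above). Clauses
are sets of literals, negation is written with a leading "~", and the goal q is proved from
p ⇒ q and p by refuting the clause set that includes the negated goal ~q:

    def negate(lit):
        return lit[1:] if lit.startswith("~") else "~" + lit

    def resolve(c1, c2):
        # All resolvents obtained by cancelling one complementary pair of literals.
        resolvents = []
        for lit in c1:
            if negate(lit) in c2:
                resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
        return resolvents

    def resolution_refutation(clauses):
        # True if the clause set is unsatisfiable (the empty clause is derivable).
        clauses = set(map(frozenset, clauses))
        while True:
            new = set()
            for a in clauses:
                for b in clauses:
                    if a == b:
                        continue
                    for r in resolve(a, b):
                        if not r:              # empty clause => contradiction found
                            return True
                        new.add(frozenset(r))
            if new <= clauses:                 # nothing new => refutation impossible
                return False
            clauses |= new

    # CNF clauses for (p => q), p, and the negated goal ~q: {~p, q}, {p}, {~q}
    print(resolution_refutation([{"~p", "q"}, {"p"}, {"~q"}]))   # True, so q is proved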
Resolution is widely used because of its simplicity and effectiveness in proving logical
statements. Key applications include automated theorem provers, logic programming
(Prolog's inference engine is based on a restricted form of resolution), and the formal
verification of hardware and software systems.
While resolution is a powerful tool for proving theorems, it comes with certain limitations
and challenges:
1. Computational Complexity:
a. The resolution process can be computationally expensive, especially in
cases involving a large number of clauses. The space and time complexity of
resolution can grow rapidly as the number of variables and clauses
increases.
2. Inefficiency with Large Knowledge Bases:
a. In large or complex knowledge bases, the number of possible resolutions
can explode, leading to inefficiency. Many unnecessary resolutions may be
performed, leading to combinatorial explosion.
3. Need for Conversion to CNF:
a. Converting formulas to CNF can be difficult and time-consuming, especially
for formulas that are not naturally in CNF. The process of conversion may
also increase the size of the formula, further complicating the resolution
process.
4. Ambiguity in Unification:
a. In some cases, unification can be ambiguous, especially when there are
multiple ways to unify terms. This ambiguity can lead to errors or
inefficiencies in the resolution process.
5. Failure to Find a Proof:
a. Resolution is a refutation method: it proves a statement by showing that the
negation of the statement, combined with the premises, leads to a
contradiction (the empty clause). If the statement does not actually follow
from the premises, no contradiction can be derived, and in first-order logic
the search may continue indefinitely without terminating.
6. Limited Expressiveness:
a. Resolution works well in propositional logic and first-order logic but is less
effective in more complex or higher-order logics, where more advanced
techniques may be necessary.
Probability theory is a branch of mathematics that deals with the likelihood of events
occurring. In AI, probabilistic reasoning is used to make decisions and predictions in
situations involving uncertainty.
• Bayes' Theorem: P(H|E) = P(E|H) · P(H) / P(E)
where:
o P(H|E) is the posterior probability of hypothesis H given evidence E.
o P(E|H) is the likelihood of observing E when H is true.
o P(H) is the prior probability of H, and P(E) is the probability of the evidence E.
• Conditional Probability: P(A|B) = P(A ∩ B) / P(B),
where P(A ∩ B) is the probability of both A and B occurring.
• Independence: Two events A and B are independent if the occurrence of one
does not affect the probability of the other, i.e., P(A ∩ B) = P(A) · P(B).
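A short worked example of Bayes' theorem (the numbers are illustrative assumptions, not data
from the notes): computing the probability of a disease H given a positive test result E.

    p_h = 0.01              # prior P(H): 1% of patients have the disease
    p_e_given_h = 0.95      # likelihood P(E|H): test sensitivity
    p_e_given_not_h = 0.05  # false-positive rate P(E|~H)

    # Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|~H)P(~H)
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

    # Posterior: P(H|E) = P(E|H)P(H) / P(E)
    p_h_given_e = p_e_given_h * p_h / p_e
    print(round(p_h_given_e, 3))   # ≈ 0.161 despite the positive test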
1. Medical Diagnosis:
a. Used to determine the likelihood of a disease given the observed symptoms
and patient history.
2. Robot Navigation:
a. In uncertain environments, robots use probabilistic models to infer their
position and make decisions about where to move next.
3. Speech Recognition:
a. Probabilistic models help identify the most likely words or phrases given
audio input.
4. Natural Language Processing:
a. Models like Hidden Markov Models (HMM) and Bayesian Networks are
used in text understanding, part-of-speech tagging, and language modeling.
5. Machine Learning:
a. Naive Bayes classifiers and Gaussian Mixture Models (GMMs) are
examples of machine learning techniques that use probabilistic reasoning.
3.7 Utility Theory
Utility theory is concerned with the modeling and quantification of preferences in decision-
making. In AI, utility theory helps to make rational decisions in uncertain environments by
assigning values (utilities) to different outcomes.
• Key Concept: Utility represents the satisfaction or value derived from a particular
outcome. The goal is to maximize the utility of a decision or action.
Rational decision-making involves choosing actions that maximize the expected utility,
based on a decision-maker's preferences. Preferences refer to the way a decision-maker
ranks different outcomes or alternatives.
• Assumptions:
o Completeness: A decision-maker can rank any two outcomes.
o Transitivity: If outcome A is preferred to B and B is preferred to C, then A is
preferred to C.
o Independence: If two options are equivalent in some respects, the decision
maker will make choices based on the remaining differences.
• Expected Utility: In decision theory, the expected utility is the weighted average of
the utilities of all possible outcomes, where each outcome is weighted by its
probability.
• Decision Theory uses expected utility to help make rational choices in the face of
uncertainty, balancing the probabilities of outcomes and their respective utilities.
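The following sketch (the action names, probabilities, and utility values are illustrative
assumptions) computes expected utilities as EU(action) = Σ P(outcome) · U(outcome) and
picks the action that maximizes them:

    actions = {
        # action: list of (probability, utility) pairs for its possible outcomes
        "take_umbrella": [(0.3, 60), (0.7, 80)],   # rain / no rain
        "no_umbrella":   [(0.3, 0),  (0.7, 100)],
    }

    def expected_utility(outcomes):
        return sum(p * u for p, u in outcomes)

    for action, outcomes in actions.items():
        print(action, expected_utility(outcomes))   # 74.0 and 70.0

    # A rational agent chooses the action with the highest expected utility.
    best = max(actions, key=lambda a: expected_utility(actions[a]))
    print("choose:", best)                          # choose: take_umbrella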
3.7.4 Risk and Uncertainty in Decision Making
• Risk: A situation where the probabilities of different outcomes are known, and
decision-makers can calculate the expected utility.
• Uncertainty: A situation where the probabilities of outcomes are unknown, making
it difficult to calculate expected utilities.
In AI, decision-makers often must make decisions under both risk and uncertainty, where
decision trees or Markov decision processes (MDPs) are used to model and evaluate
possible decisions.
A Hidden Markov Model (HMM) is a statistical model used to represent systems that are
modeled as a sequence of hidden states, with observable outputs that are probabilistically
dependent on the state. HMMs are widely used in time-series analysis and sequential data
problems, where the goal is to infer the hidden states based on observable data.
• Hidden States: These are the unobserved states of the system that generate the
observations.
• Observations: These are the observable variables that depend on the hidden
states.
1. States: The system has a finite number of hidden states. The states are typically
discrete and not directly observable.
2. Observations: These are the observable outputs that are generated by the hidden
states. In most applications, each state corresponds to a probability distribution
over the possible observations.
3. Transition Probabilities: These define the probability of transitioning from one
state to another. This is represented by a matrix where each element a_ij is the
probability of transitioning from state i to state j.
4. Emission Probabilities: These define the probability of observing a particular
observation from a given state. Each state has a distribution over possible
observations.
5. Initial Probabilities: The probabilities of starting in each state at the beginning of
the sequence.
The Forward Algorithm and Backward Algorithm are used for calculating the likelihood of
a sequence of observations given an HMM.
Both algorithms are essential for the efficient computation of probabilities in HMMs.
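A compact sketch of the forward algorithm (the model, its states, and all probabilities are
illustrative assumptions, not taken from the notes): the alpha values are carried forward one
observation at a time, and their final sum is the likelihood of the whole observation sequence.

    states = ["Rainy", "Sunny"]
    start_p = {"Rainy": 0.6, "Sunny": 0.4}                        # initial probabilities
    trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},             # transition probabilities a_ij
               "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
    emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},  # emission probabilities
              "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

    def forward(observations):
        # alpha[s] = P(observations so far, current hidden state = s)
        alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
        for obs in observations[1:]:
            alpha = {s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s]
                                             for prev in states)
                     for s in states}
        return sum(alpha.values())   # likelihood of the full sequence

    print(forward(["walk", "shop", "clean"]))   # ≈ 0.0336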
3.8.4 Training HMM: The Baum-Welch Algorithm
The Baum-Welch algorithm is an Expectation-Maximization (EM) procedure that estimates
an HMM's transition, emission, and initial probabilities from observed sequences by
alternating two steps:
• Expectation Step (E-step): Calculate the expected value of the hidden states given
the current model parameters.
• Maximization Step (M-step): Update the model parameters to maximize the
likelihood of the observed data.
The algorithm iteratively refines the model parameters to better fit the observed data.
HMMs are widely used in applications where the goal is to model sequential or time-
dependent data, such as:
1. Speech Recognition: HMMs are used to model the sequence of phonemes (basic
sound units) in speech, and the task is to recognize words from the spoken audio.
2. Pattern Recognition: HMMs are used in various pattern recognition tasks such as
handwriting recognition, gesture recognition, and bioinformatics (e.g., gene
sequence analysis).
3. Part-of-Speech Tagging: HMMs are employed in natural language processing
tasks, where the hidden states represent parts of speech, and the observations are
the words.
A Bayesian Network (also called a Belief Network or Bayes Net) is a graphical model that
represents probabilistic relationships among a set of variables. Each node in the network
represents a random variable, and the edges represent conditional dependencies between
these variables. Bayesian networks are particularly useful for representing complex
systems involving uncertainty and for performing probabilistic inference.
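A minimal sketch of a two-node Bayesian network (the structure and all probabilities are
illustrative assumptions): Rain is a parent of WetGrass, each variable has a conditional
probability table, the joint distribution factorizes along the edge, and a posterior is
computed by enumeration:

    p_rain = {True: 0.2, False: 0.8}                      # P(Rain)
    p_wet_given_rain = {True: {True: 0.9, False: 0.1},    # P(WetGrass | Rain)
                        False: {True: 0.2, False: 0.8}}

    # The joint probability factorizes along the edge: P(R, W) = P(R) * P(W | R)
    def joint(rain, wet):
        return p_rain[rain] * p_wet_given_rain[rain][wet]

    # Inference by enumeration: P(Rain = True | WetGrass = True)
    numerator = joint(True, True)
    evidence = joint(True, True) + joint(False, True)
    print(round(numerator / evidence, 3))   # ≈ 0.529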
3.9.2 Structure and Representation of Bayesian Networks
A Bayesian network is a directed acyclic graph (DAG): each node is a random variable, each
edge points from a parent variable to a variable that directly depends on it, and each node
stores a conditional probability table (CPT) specifying its distribution for every combination
of its parents' values. The full joint distribution then factorizes as the product of these
local conditional probabilities.
Learning Bayesian networks from data is computationally challenging, especially for large
networks, since both the graph structure and the CPT parameters must be estimated.