AI Unit 2 Notes
Depth-limited search
= depth-first search with depth limit l,
i.e., nodes at depth l have no successors
• Optimal? No
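As a rough illustration (not from the notes; successors and goal_test are hypothetical placeholders), depth-limited search can be sketched in Python:

# Depth-limited search sketch: depth-first search with depth limit `limit`.
def depth_limited_search(state, goal_test, successors, limit):
    """Return a solution path, 'cutoff', or None (failure)."""
    if goal_test(state):
        return [state]
    if limit == 0:
        return "cutoff"            # nodes at depth l have no successors
    cutoff_occurred = False
    for child in successors(state):
        result = depth_limited_search(child, goal_test, successors, limit - 1)
        if result == "cutoff":
            cutoff_occurred = True
        elif result is not None:
            return [state] + result
    return "cutoff" if cutoff_occurred else None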
Summary of algorithms
Repeated states
Failure to detect repeated states can turn a linear problem into an exponential one!
Graph search
Summary
• Problem formulation usually requires abstracting away real-world details to
define a state space that can feasibly be explored
• Variety of uninformed search strategies
• Iterative deepening search uses only linear space and not much more time
than other uninformed algorithms
11b) How is searching used to provide solutions? Also describe some real-world
problems.
together define the state space. The root of the search tree is a search node corresponding to the
initial state.
We continue choosing, testing, and expanding until either a solution is found or there are no
more states to be expanded. The choice of which state to expand is determined by the search
strategy. The general tree-search algorithm is described in the figure as follows:
The nodes in the search tree are defined using five components in data structure. They are
1. STATE: the state in the state space to which the node corresponds;
2. PARENT-NODE: the node in the search tree that generated this node;
3. ACTION: the action that was applied to the parent to generate the node;
4. PATH-COST: the cost, traditionally denoted by g(n), of the path from the initial state to the
node, as indicated by the parent pointers.
5. DEPTH: the number of steps along the path from the initial state.
The difference between nodes and states: a node is a bookkeeping data structure used to represent
the search tree, while a state corresponds to a configuration of the world.
The collection of nodes that have been generated but not yet expanded is called the fringe. Each
element of the fringe is a leaf node, that is, a node with no successors in the tree. The fringe could
be represented as a set of nodes; the search strategy would then be a function that selects the next
node to be expanded from this set. This could be computationally expensive, because the strategy
function might have to look at every element of the set to choose the best one. Alternatively, the
collection of nodes can be implemented as a queue. The queue operations are as follows:
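A rough Python sketch (illustrative only, not from the notes) of the node data structure and a FIFO-queue fringe; the field names mirror the five components above:

from collections import deque
from dataclasses import dataclass

@dataclass
class Node:
    state: object          # STATE: the state this node corresponds to
    parent: object = None  # PARENT-NODE: the node that generated this one
    action: object = None  # ACTION: the action applied to the parent
    path_cost: float = 0.0 # PATH-COST: g(n), cost of the path from the initial state
    depth: int = 0         # DEPTH: number of steps from the initial state

fringe = deque()                       # generated but not yet expanded nodes
fringe.append(Node(state="initial"))   # insert a node at the back of the queue
node = fringe.popleft()                # remove the first node: the next one to expand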
In AI, where the graph is represented implicitly by the initial state and successor function and is
frequently infinite, complexity is expressed in terms of three quantities: b, the branching factor
or maximum number of successors of any node; d, the depth of the shallowest goal node; and m,
the maximum length of any path in the state space.
Definition of branching factor (b): the maximum number of successors of any node in the
search tree. It is used to estimate the space and time complexity of a search strategy.
ii) Travelling salesperson problem
iii) VLSI layout: cell layout, channel routing
Alpha-Beta pruning
Pruning: The process of eliminating a branch of the search tree from consideration without
examining it is called pruning. The two parameters of the pruning technique are:
1. Alpha (α): the best choice for MAX along the path, i.e., a lower bound on the value that
a maximizing node may ultimately be assigned.
2. Beta (β): the best choice for MIN along the path, i.e., an upper bound on the value that a
minimizing node may ultimately be assigned.
Alpha-Beta Pruning: When the alpha and beta values are applied to a minimax tree, the search
returns the same move as minimax but prunes away branches that cannot possibly influence the
final decision. This is called Alpha-Beta pruning (or cutoff).
Consider the two-ply game tree from the figure.
Alpha-beta pruning can be applied to trees of any depth, and it is often possible to prune entire
subtrees rather than just leaves.
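An illustrative Python sketch (not from the notes; the nested-list encoding of the game tree is an assumption) of alpha-beta pruning:

# Minimal alpha-beta sketch. A "tree" is either a number (utility of a
# terminal state) or a list of subtrees (the legal moves).
import math

def alphabeta(tree, alpha=-math.inf, beta=math.inf, maximizing=True):
    if isinstance(tree, (int, float)):   # terminal state: return its utility
        return tree
    if maximizing:
        value = -math.inf
        for child in tree:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:            # beta cutoff: MIN will avoid this branch
                break
        return value
    else:
        value = math.inf
        for child in tree:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:            # alpha cutoff: MAX will avoid this branch
                break
        return value

# Two-ply example: MAX to move, three MIN nodes below.
print(alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))  # prints 3

The second and third MIN nodes are cut off early; the returned move is the same one plain minimax would choose.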
Informed search and exploration − Informed search strategies − Heuristic function − Local
search algorithms and optimization problems − Local search in continuous spaces − Online search
agents and unknown environments − Constraint Satisfaction Problems (CSP) − Backtracking
search and local search for CSP − Structure of problems − Adversarial search − Games −
Optimal decisions in games − Alpha-Beta pruning − Imperfect real-time decisions − Games that
include an element of chance.
Informed (Heuristic) search strategies
Informed search (Heuristic search):
The estimated path cost from the current state to the goal state is calculated, and the successor
with the minimum estimated cost is selected as the next state.
Additional problem-specific information can be used as a heuristic to help solve the problem.
E.g. a) Best first search
b) Greedy search
c) A* search
A general-purpose ontology should be applicable in more or less any special purpose domain.
• Add domain-specific axioms
In any sufficiently demanding domain different areas of
knowledge need to be unified.
• Reasoning and problem solving could involve several areas
simultaneously
What do we need to express?
Categories, Measures, Composite objects, Time, Space, Change,
Events, Processes, Physical Objects, Substances, Mental Objects,
Beliefs
Categories :
• KR requires the organisation of objects into categories
• Interaction at the level of the object
• Reasoning at the level of categories
• Categories play a role in predictions about objects
• Based on perceived properties
• Categories can be represented in two ways by FOL
• Predicates: apple(x)
• Reification of categories into objects: apples
• Category = set of its members
Measures :
• Objects have height, mass, cost, ....
Values that we assign to these are measures
• Combine Unit functions with a number:
Length(L1) = Inches(1.5) = Centimeters(3.81).
• Conversion between units:
∀ i Centimeters(2.54 x i)=Inches(i).
• Some measures have no scale:
Beauty, Difficulty, etc.
• Most important aspect of measures:
they are orderable.
• Don't care about the actual numbers.
(An apple can have deliciousness .9 or .1.)
PART B — (5 × 16 = 80 marks)
11. (a) Explain A* algorithm with a suitable example. State the limitations in the algorithm.
A* search expands the node on the least-cost solution path, using estimated cost and actual cost
as the evaluation function. It evaluates nodes by combining g(n), the cost to reach the node, and
h(n), the estimated cost to get from the node to the goal:
f(n) = g(n) + h(n).
Since g(n) gives the path cost from the start node to node n, and h(n) is the estimated cost of the
cheapest path from n to the goal, we have
f(n) = estimated cost of the cheapest solution through n.
Monotonicity (consistency): along any path from the root of the search tree, the f-cost never
decreases. This condition is true for almost all admissible heuristics. A heuristic that satisfies this
property is called monotonic (consistent).
Optimality: It is derived with two approaches. They are a) A* used with Tree-search b) A* used
with Graph-search.
(b) Explain the constraint satisfaction procedure to solve the cryptarithmetic problem.
Treating a problem as a CSP confers several important benefits. Because the representation of
states in a CSP conforms to a standard pattern—that is, a set of variables with assigned values—
the successor function and goal test can be written in a generic way that applies to all CSPs.
It is fairly easy to see that a CSP can be given an incremental formulation as a standard
search problem as follows:
Initial state: the empty assignment { }, in which all variables are unassigned.
Successor function: a value can be assigned to any unassigned variable, provided that it
does not conflict with previously assigned variables.
Goal test: the current assignment is complete.
O + O = R + 10 X1
X1 + W + W = U + 10 X2
X2 + T + T = O + 10 X3
X3 = F
where X1, X2, and X3 are auxiliary variables representing the digit (0 or 1) carried over into the
next column. Higher-order constraints can be represented in a constraint hypergraph, such as the
one shown in Figure(b). The sharp-eyed reader will have noticed that the Alldiff constraint can
be broken down into binary constraints— F != T, F != U, and so on.
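As an illustrative check of these column constraints (a hedged sketch in Python, not the backtracking CSP algorithm itself), one can enumerate digit assignments for TWO + TWO = FOUR directly:

# Brute-force solver for TWO + TWO = FOUR: try all digit assignments and
# keep those satisfying the arithmetic and the Alldiff constraint.
from itertools import permutations

for F, T, U, W, R, O in permutations(range(10), 6):
    if T == 0 or F == 0:          # leading digits must be nonzero
        continue
    two = 100 * T + 10 * W + O
    four = 1000 * F + 100 * O + 10 * U + R
    if two + two == four:
        print(f"{two} + {two} = {four}")   # e.g. 734 + 734 = 1468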
(b)Analyse the missionaries and Cannibals problem which is stated as follows. 3 missionaries and 3
cannibals are on one side of the river along with a boat that can hold one or two people. Find a way to
get everyone to the other side, without leaving a group of missionaries in one place outnumbered by the
cannibals in that place.
(i) Formulate a problem precisely making only those distinctions necessary to ensure a valid
solution. Draw a diagram of the complete state space.
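As a hedged sketch (not from the notes), the state space can be encoded as (missionaries on the left bank, cannibals on the left bank, boat side), with a successor function that filters out illegal states; breadth-first search then recovers the classic 11-crossing solution:

# State = (m_left, c_left, boat) with boat 'L' or 'R'; 3 of each in total.
from collections import deque

def safe(m, c):
    # On neither bank may missionaries be outnumbered (unless none are there).
    return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

def successors(state):
    m, c, boat = state
    for dm, dc in [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]:  # boat holds 1 or 2
        if boat == 'L':
            nm, nc, nb = m - dm, c - dc, 'R'
        else:
            nm, nc, nb = m + dm, c + dc, 'L'
        if 0 <= nm <= 3 and 0 <= nc <= 3 and safe(nm, nc):
            yield (nm, nc, nb)

def bfs(start=(3, 3, 'L'), goal=(0, 0, 'R')):
    frontier, parent = deque([start]), {start: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in successors(s):
            if t not in parent:
                parent[t] = s
                frontier.append(t)

print(bfs())   # 12 states, i.e. 11 crossings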
13.(a)Explain the concept of planning with state space search. How is it different from partial order
planning?
The concept of State Space Search is widely used in Artificial Intelligence. The idea is that a
problem can be solved by examining the steps which might be taken towards its solution. Each
action takes the solver to a new state.
The classic example is of the Farmer who needs to transport a Chicken, a Fox and some Grain
across a river one at a time. The Fox will eat the Chicken if left unsupervised. Likewise the
Chicken will eat the Grain.
In this case, the State is described by the positions of the Farmer, Chicken, Fox and Grain. The
solver can move between States by making a legal move (which does not result in something
being eaten). Non-legal moves are not worth examining.
The solution to such a problem is a list of linked States leading from the Initial State to the Goal
State. This may be found either by starting at the Initial State and working towards the Goal state
or vice-versa.
Closed States: States whose links have all been explored.
Open States: States which have been encountered, but have not been fully
explored.
The idea of a partial-order planner is to have a partial ordering between actions and only
commit to an ordering between actions when forced. This is sometimes also called a non-linear
planner, which is a misnomer because such planners often produce a linear plan.
The output is a partial ordering of
actions, such that any total ordering of the actions, consistent with the partial ordering, will solve
the goal from the initial state. Write act0 < act1 if action act0 is before action act1 in the partial
order. This means that action act0 must occur before action act1.
For uniformity, treat start as an action that achieves the relations that are true in the initial state,
and treat finish as an action whose precondition is the goal to be solved. The pseudo-action start is
before every other action, and finish is after every other action. The use of these as actions means
that the algorithm does not require special cases for the initial situation and for the goals. When
the preconditions of finish hold, the goal is solved.
An action, other than start or finish, will be in a partial-order plan to achieve a precondition of an
action in the plan. Each precondition of an action in the plan is either true in the initial state, and
so achieved by start, or there will be an action in the plan that achieves it.
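A toy Python sketch of this representation (the action names are illustrative assumptions, not from the notes): the plan is a set of actions plus a set of ordering pairs, and any topological sort yields a valid total ordering.

# Partial-order plan as actions + ordering constraints (illustrative).
actions = {"start", "pick_up_key", "unlock_door", "finish"}   # hypothetical steps
order = {("start", "pick_up_key"),        # start < every other action
         ("start", "unlock_door"),
         ("pick_up_key", "unlock_door"),  # commit to an ordering only when forced
         ("pick_up_key", "finish"),
         ("unlock_door", "finish")}       # finish > every other action

# Any topological sort consistent with `order` solves the goal.
import graphlib
preds = {a: {b for (b, c) in order if c == a} for a in actions}
print(list(graphlib.TopologicalSorter(preds).static_order()))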
(b)What are planning graphs? Explain the methods of planning and acting in the real
world.
A planning graph consists of a sequence of levels that correspond to time steps in the
plan, where level 0 is the initial state. Each level contains a set of literals and a set of actions.
Things go wrong
Incomplete information
o Unknown preconditions, e.g., Intact(Spare)?
o Disjunctive effects, e.g., Inflate(x) causes
o Inflated(x) ∨ SlowHiss(x) ∨ Burst(x) ∨ BrokenPump ∨…
Incorrect information
o Current state incorrect, e.g., spare NOT intact
o Missing/incorrect post conditions in operators
Qualification problem:
o can never finish listing all the required preconditions and possible conditional
outcomes of actions.
Solutions
14. (a) Explain the concept of Bayesian network in representing knowledge in an uncertain domain.
A statistical learning method begins with the simplest task: parameter learning with complete
data. A parameter learning task involves finding the numerical parameters for a probability model
whose structure is fixed.
Maximum-likelihood parameter learning: Discrete models
In fact, though, we have laid out one standard method for maximum-likelihood parameter learning:
1. Write down an expression for the likelihood of the data as a function of the parameter(s).
2. Write down the derivative of the log likelihood with respect to each parameter.
3. Find the parameter values such that the derivatives are zero.
A significant problem with maximum-likelihood learning in general: "when the data set is small
enough that some events have not yet been observed (for instance, no cherry candies), the
maximum-likelihood hypothesis assigns zero probability to those events".
The most important point is that, with complete data, the maximum-likelihood parameter learning
problem for a Bayesian network decomposes into separate learning problems, one for each
parameter. The second point is that the parameter values for a variable, given its parents, are just
the observed frequencies of the variable values for each setting of the parent values. As before, we
must be careful to avoid zeroes when the data set is small.
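A minimal sketch of this idea in Python (the candy data below are made-up numbers, echoing the cherry/lime example):

# Maximum-likelihood estimate for a discrete parameter: just the frequency.
from collections import Counter

data = ["cherry", "lime", "lime", "cherry", "lime"]    # hypothetical sample
counts = Counter(data)
theta_ml = counts["cherry"] / len(data)                # ML estimate = observed frequency
print(theta_ml)   # 0.4; note it would be 0.0 had no cherry been observed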
Naive Bayes models
Probably the most common Bayesian network model used in machine learning is the naïve Bayes
model. In this model, the "class" variable C (which is to be predicted) is the root and the "attribute"
variables Xi are the leaves. The model is "naive" because it assumes that the attributes are
conditionally independent of each other, given the class.
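A tiny prediction sketch (all probabilities below are invented for illustration): under naive Bayes, P(C | x1..xn) is proportional to P(C) times the product of P(xi | C).

priors = {"spam": 0.4, "ham": 0.6}                 # hypothetical P(C)
likelihood = {                                     # hypothetical P(word | C)
    "spam": {"offer": 0.8, "meeting": 0.1},
    "ham":  {"offer": 0.2, "meeting": 0.7},
}

def predict(words):
    scores = {}
    for c, prior in priors.items():
        score = prior
        for w in words:
            score *= likelihood[c].get(w, 1e-6)    # smoothing to avoid zeros
        scores[c] = score
    return max(scores, key=scores.get)

print(predict(["offer"]))   # 'spam' (0.32 vs 0.12 before normalization)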
Maximum-likelihood parameter learning: Continuous models
Continuous probability models, such as the linear-Gaussian model, follow the same principles:
maximum-likelihood learning is identical to the discrete case. Let us begin with a very simple case:
learning the parameters of a Gaussian density function on a single variable. That is, the data are
generated as follows:
The parameters of this model are the mean μ and the standard deviation σ.
The quantity (yj − (θ1xj + θ2)) is the error for (xj, yj), that is, the difference between the
actual value yj and the predicted value (θ1xj + θ2), so E is the well-known sum of squared errors.
This is the quantity that is minimized by the standard linear regression procedure. Now we can
understand why: minimizing the sum of squared errors gives the maximum likelihood straight-line
model, provided that the data are generated with Gaussian noise of fixed variance.
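A short worked sketch of this equivalence (the data points are made-up): the closed-form least-squares line is the maximum-likelihood line under fixed-variance Gaussian noise.

# Fit y ≈ theta1 * x + theta2 by minimizing the sum of squared errors.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]          # hypothetical noisy data near y = 2x + 1
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
theta1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
         / sum((x - mean_x) ** 2 for x in xs)
theta2 = mean_y - theta1 * mean_x
print(theta1, theta2)              # about 1.94 and 1.09, close to 2 and 1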
Bayesian parameter learning
The Bayesian approach to parameter learning places a hypothesis prior over the possible values of
the parameters and updates this distribution as data arrive. This formulation of learning and
prediction makes it clear that Bayesian learning requires no extra "principles of learning."
Furthermore, there is, in essence, just one learning algorithm, i.e., the inference algorithm for
Bayesian networks.
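A minimal sketch of Bayesian parameter learning (using the standard conjugate Beta-Bernoulli update; the observation sequence is invented):

# Beta(a, b) prior over theta = P(cherry), updated as candies are observed.
a, b = 1, 1                        # uniform prior Beta(1, 1)
for candy in ["cherry", "lime", "cherry", "cherry"]:   # hypothetical data
    if candy == "cherry":
        a += 1
    else:
        b += 1
posterior_mean = a / (a + b)       # mean of the Beta posterior
print(posterior_mean)              # 4/6 ≈ 0.667; never exactly 0 or 1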
Learning net structures
12. (a) Describe the A* search and give the proof of optimality of A*
A* search: Minimizing the total estimated solution cost
Expand the node on the least cost solution path using estimated cost and actual cost as the
evaluation function is called A* search. It evaluates nodes by combining g (n) , the cost to reach the
node, and h (n), the cost to get from the node to the goal:
f (n) = g (n) + h (n).
since g (n) gives the path cost from the start node to node n, and h (n) is the estimated cost of the
cheapest path from n to the goal, we have
f (n) = estimated cost of the cheapest solution through n.
A* search is both complete and optimal.
A* has the following properties: the tree-search version of A* is optimal if h(n) is admissible,
while the graph-search version is optimal if h(n) is consistent.
The first step is to establish the following: if h(n) is consistent, then the values of f(n) along any
path are nondecreasing. The proof follows directly from the definition of consistency.
Suppose n' is a successor of n; then g(n') = g(n) + c(n, a, n') for some action a, and we have
f(n') = g(n') + h(n') = g(n) + c(n, a, n') + h(n') ≥ g(n) + h(n) = f(n).
The next step is to prove that whenever A* selects a node n for expansion, the optimal path to
that node has been found. Were this not the case, there would have to be another frontier node n'
on the optimal path from the start node to n, by the graph separation property, as illustrated in
the following diagram:
where the frontier (white nodes) always separate the explored region (black nodes) from the
unexplored region (gray nodes). Because f is nondecreasing along any path, n' would have lower f-
cost than n and would have been selected first.
From the two preceding observations, it follows that the sequence of nodes expanded by A* using
Graph-Search is in nondecreasing order of f(n). Hence, the first goal node selected for expansion
must be an optimal solution because f is the true cost for goal nodes (which have h = 0) and all later
goals will be at least as expensive.
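An illustrative graph-search A* sketch in Python (the graph, costs, and heuristic values below are assumptions chosen to be consistent, not from the notes):

# Graph-search A*: expand nodes in nondecreasing f(n) = g(n) + h(n) order.
import heapq

def astar(start, goal, successors, h):
    frontier = [(h(start), 0, start, [start])]   # (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g
        for cost, nxt in successors(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None

# Toy graph with a consistent heuristic (h decreases by at most the step cost).
graph = {"A": [(1, "B"), (4, "C")], "B": [(2, "C"), (5, "D")],
         "C": [(1, "D")], "D": []}
h = {"A": 3, "B": 2, "C": 1, "D": 0}
print(astar("A", "D", lambda s: graph[s], lambda s: h[s]))
# (['A', 'B', 'C', 'D'], 4): the first goal expansion is the optimal solution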
(b) Give the algorithm for solving constraint satisfaction problems by local search?
Backtracking:
Backtracking search is a form of depth-first search that chooses values for one variable at a
time and backtracks when a variable has no legal values left to assign. The algorithm is shown in
the figure.
function BACKTRACKING-SEARCH(csp) returns a solution, or failure
return RECURSIVE-BACKTRACKING({ }, csp)
function RECURSIVE-BACKTRACKING(assignment, csp) returns a solution, or failure
if assignment is complete then return assignment
var ← SELECT-UNASSIGNED-VARIABLE(VARIABLES[csp], assignment, csp)
for each value in ORDER-DOMAIN-VALUES(var, assignment, csp) do
if value is consistent with assignment according to CONSTRAINTS[csp] then
add { var = value } to assignment
result ← RECURSIVE-BACKTRACKING(assignment, csp)
if result ≠ failure then return result
remove { var = value } from assignment
return failure
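A runnable Python version of the same recursive backtracking, sketched on a small map-coloring CSP (the variables and constraints here are illustrative assumptions):

# Recursive backtracking on a map-coloring CSP (illustrative).
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
             "SA": ["WA", "NT", "Q"], "Q": ["NT", "SA"]}
domain = ["red", "green", "blue"]

def consistent(var, value, assignment):
    return all(assignment.get(n) != value for n in neighbors[var])

def backtrack(assignment):
    if len(assignment) == len(neighbors):      # goal test: assignment complete
        return assignment
    var = next(v for v in neighbors if v not in assignment)
    for value in domain:
        if consistent(var, value, assignment):
            assignment[var] = value
            result = backtrack(assignment)
            if result is not None:
                return result
            del assignment[var]                # undo the assignment and backtrack
    return None                                # no legal value: failure

print(backtrack({}))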
13. (a). Illustrate the use of First order logic to represent the knowledge.
Inference rules for PL apply to FOL as well. For example, Modus Ponens, And-Introduction,
And-Elimination, etc.
New (sound) inference rules for use with quantifiers:
o Universal Elimination
If (Ax)P(x) is true, then P(c) is true, where c is a constant in the domain of x. For
example, from (Ax)eats(Ziggy, x) we can infer eats(Ziggy, IceCream). The
variable symbol can be replaced by any ground term, i.e., any constant symbol or
function symbol applied to ground terms only.
o Existential Introduction
If P(c) is true, then (Ex)P(x) is inferred. For example, from eats(Ziggy,
IceCream) we can infer (Ex)eats(Ziggy, x). All instances of the given constant
symbol are replaced by the new variable symbol. Note that the variable symbol
cannot already exist anywhere in the expression.
o Existential Elimination
From (Ex)P(x) infer P(c). For example, from (Ex)eats(Ziggy, x) infer eats(Ziggy,
Cheese). Note that the variable is replaced by a brand new constant that does not
occur in this or any other sentence in the Knowledge Base. In other words, we
don't want to accidentally draw other inferences about it by introducing the
constant. All we know is there must be some constant that makes this true, so we
can introduce a brand new one to stand in for that (unknown) constant.
Paramodulation
o Given two sentences (P1 v ... v PN) and (t=s v Q1 v ... v QM) where
each Pi and Qi is a literal (see definition below) and Pj contains a term t, derive
new sentence (P1 v ... v Pj-1 v Pj[s] v Pj+1 v ... v PN v Q1 v ... v
QM) where Pj[s] means a single occurrence of the term t is replaced by the
term s in Pj
o Example: From P(a) and a=b derive P(b)
Generalized Modus Ponens (GMP)
o Combines And-Introduction, Universal-Elimination, and Modus Ponens
o Example: from P(c), Q(c), and (Ax)(P(x) ^ Q(x)) => R(x), derive R(c)
o In general, given atomic sentences P1, P2, ..., PN, and implication sentence (Q1 ^
Q2 ^ ... ^ QN) => R, where Q1, ..., QN and R are atomic sentences,
and subst(Theta, Pi) = subst(Theta, Qi) for i=1,...,N, derive new
sentence: subst(Theta, R)
o subst(Theta, alpha) denotes the result of applying a set of substitutions defined
by Theta to the sentence alpha
o A substitution list Theta = {v1/t1, v2/t2, ..., vn/tn} means to replace all
occurrences of variable symbol vi by term ti. Substitutions are made in left-to-
right order in the list. Example: subst({x/IceCream, y/Ziggy}, eats(y,x)) =
eats(Ziggy, IceCream)
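A minimal Python sketch of applying a substitution list (sentences represented as nested tuples; this encoding is an illustrative assumption):

# Apply a substitution list Theta to an atomic sentence like ('eats', 'y', 'x').
def subst(theta, sentence):
    if isinstance(sentence, tuple):              # compound term: recurse
        return tuple(subst(theta, s) for s in sentence)
    return theta.get(sentence, sentence)         # variable if in Theta, else unchanged

theta = {"x": "IceCream", "y": "Ziggy"}
print(subst(theta, ("eats", "y", "x")))          # ('eats', 'Ziggy', 'IceCream')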
Automated inference using FOL is harder than using PL because variables can take on
potentially an infinite number of possible values from their domain. Hence there are
potentially an infinite number of ways to apply Universal-Elimination rule of inference
Gödel's Completeness Theorem says that FOL entailment is only semidecidable. That is,
if a sentence is true given a set of axioms, there is a procedure that will determine this.
However, if the sentence is false, then there is no guarantee that a procedure will ever
determine this. In other words, the procedure may never halt in this case.
The Truth Table method of inference is not complete for FOL because the truth table size
may be infinite
Natural Deduction is complete for FOL but is not practical for automated inference
because the "branching factor" in a search is too large, caused by the fact that we would
have to potentially try every inference rule in every possible way using the set of known
sentences
Generalized Modus Ponens is not complete for FOL
Generalized Modus Ponens is complete for KBs containing only Horn clauses
o A Horn clause is a sentence of the form:
(Ax) (P1(x) ^ P2(x) ^ ... ^ Pn(x)) => Q(x)
where there are 0 or more Pi's, and the Pi's and Q are positive (i.e., un-negated)
literals
o Horn clauses represent a subset of the set of sentences representable in FOL. For
example, P(a) v Q(a) is a sentence in FOL but is not a Horn clause.
o Natural deduction using GMP is complete for KBs containing only Horn clauses.
Proofs start with the given axioms/premises in KB, deriving new sentences using
GMP until the goal/query sentence is derived. This defines a forward
chaining inference procedure because it moves "forward" from the KB to the
goal.
Example: KB = All cats like fish, cats eat everything they like, and Ziggy
is a cat. In FOL, KB =
1. (Ax) cat(x) => likes(x, Fish)
2. (Ax)(Ay) (cat(x) ^ likes(x,y)) => eats(x,y)
3. cat(Ziggy)
o Backward-chaining deduction using GMP is complete for KBs containing only Horn
clauses. Proofs start with the goal query, find implications that would allow you to
prove it, and then prove each of the antecedents in the implication, continuing to work
"backwards" until we get to the axioms, which we know are true.
Example: Does Ziggy eat fish?
To prove eats(Ziggy, Fish), first see if this is known from one of the axioms directly. Here it is
not known, so see if there is a Horn clause that has the consequent (i.e., right-hand side) of the
implication matching the goal. Here,
Proof:
Goal matches RHS of Horn clause (2), so try and prove new sub-goals cat(Ziggy)
and likes(Ziggy, Fish) that correspond to the LHS of (2)
1. cat(Ziggy) matches axiom (3), so we've "solved" that sub-goal
2. likes(Ziggy, Fish) matches the RHS of (1), so try and prove cat(Ziggy)
3. cat(Ziggy) matches (as it did earlier) axiom (3), so we've solved this sub-goal
4. There are no unsolved sub-goals, so we're done. Yes, Ziggy eats fish
(b). Explain the Forward chaining and backward chaining algorithm with example.
Forward chaining
A forward-chaining algorithm for propositional definite clauses was already given. The idea is
simple: start with the atomic sentences in the knowledge base and apply Modus Ponens in the
forward direction, adding new atomic sentences, until no further inferences can be made.
First-order definite clauses
First-order definite clauses closely resemble propositional definite clauses: they are disjunctions
of literals of which exactly one is positive. A definite clause either is atomic or is an implication
whose antecedent is a conjunction of positive literals and whose consequent is a single positive
literal.
This knowledge base contains no function symbols and is therefore an instance of the class of
Datalog knowledge bases, that is, sets of first-order definite clauses with no function symbols.
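A hedged Python sketch of forward chaining to a fixpoint, reusing the Ziggy knowledge base from earlier but propositionalized for brevity (a real FOL version would unify variables):

facts = {"cat(Ziggy)"}
rules = [({"cat(Ziggy)"}, "likes(Ziggy, Fish)"),
         ({"cat(Ziggy)", "likes(Ziggy, Fish)"}, "eats(Ziggy, Fish)")]

changed = True
while changed:                       # apply rules until no new facts appear
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print("eats(Ziggy, Fish)" in facts)  # True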
Decision tree induction is one of the simplest, and yet most successful forms of learning
algorithm. It serves as a good introduction to the area of inductive learning, and is easy to
implement.
Decision trees as performance elements
A decision tree takes as input an object or situation described by a set of attributes and returns a
decision, the predicted output value for the input. The input attributes can be discrete or
continuous. For now, we assume discrete inputs. The output value can also be discrete or
continuous; learning a discrete-valued function is called classification learning; learning a
continuous function is called regression.
A decision tree reaches its decision by performing a sequence of tests. Each internal node in the
tree corresponds to a test of the value of one of the properties, and the branches from the node
are labeled with the possible values of the test. Each leaf node in the tree specifies the value to be
returned if that leaf is reached. The decision tree representation seems to be very natural for
humans; indeed, many "How To" manuals (e.g., for car repair) are written entirely as a single
decision tree stretching over hundreds of pages.
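A tiny hand-built decision tree as a performance element (an illustrative sketch; the attribute names echo the well-known restaurant example and are assumptions here):

# Internal nodes test an attribute; branches are labeled with its values;
# leaves give the decision to return.
tree = ("Patrons",
        {"None": "No",
         "Some": "Yes",
         "Full": ("Hungry", {"Yes": "Yes", "No": "No"})})

def decide(tree, example):
    if isinstance(tree, str):            # leaf: return the stored decision
        return tree
    attribute, branches = tree
    return decide(branches[example[attribute]], example)

print(decide(tree, {"Patrons": "Full", "Hungry": "Yes"}))   # Yes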
rules that guide the parser to select one parse over another. With this additional background
knowledge, CHILL can learn to achieve 70% to 85% accuracy on various database query tasks.
i) Information extraction.
Due to the difficulty of the problem, current approaches to IE focus on narrowly restricted
domains. An example is the extraction from news wire reports of corporate mergers, such as
denoted by the formal relation:
"Yesterday, New-York based Foo Inc. announced their acquisition of Bar Corp."
The global advantages of monotonicity should not be casually tossed aside, but at the same time
the computational advantages of nonmonotonic reasoning modes are hard to deny, and they are widely
used in the current state of the art. We need ways for them to co-exist smoothly.
In a JTMS, each sentence in the knowledge base is annotated with a justification consisting of the set of
sentences from which it was inferred. It is a simple TMS where one can examine the consequences of the
current set of assumptions. The meaning of sentences is not known.
A logic-based TMS (LTMS) is like a JTMS in that it reasons with only one set of current assumptions
at a time. It is more powerful than a JTMS in that it recognises the propositional semantics of sentences,
i.e., it understands the relations between p and ~p, p and q, and p & q, and so on.
An individual object of a certain class. While a class is just the type definition, an actual usage of a class
is called "instance". Each instance of a class can have different values for its instance variables.
PART B — (5 × 16 = 80 marks)
11. (a) (i) Give an example of a problem for which breadth-first search would work better than
depth-first search. (8)
In computer science, hill climbing is a mathematical optimization technique which belongs to the family
of local search. It is an iterative algorithm that starts with an arbitrary solution to a problem, then attempts
to find a better solution by incrementally changing a single element of the solution. If the change
produces a better solution, an incremental change is made to the new solution, repeating until no further
improvements can be found.
For example, hill climbing can be applied to the travelling salesman problem. It is easy to find an initial
solution that visits all the cities but will be very poor compared to the optimal solution. The algorithm
starts with such a solution and makes small improvements to it, such as switching the order in which two
cities are visited. Eventually, a much shorter route is likely to be obtained.
Hill climbing is good for finding a local optimum (a solution that cannot be improved by considering a
neighbouring configuration) but it is not guaranteed to find the best possible solution (the global
optimum) out of all possible solutions (the search space). The characteristic that only local optima are
guaranteed can be cured by using restarts (repeated local search), or by more complex schemes based
on iterations (like iterated local search), on memory (like reactive search optimization and tabu
search), or on memory-less stochastic modifications (like simulated annealing).
The relative simplicity of the algorithm makes it a popular first choice amongst optimizing algorithms. It
is used widely in artificial intelligence, for reaching a goal state from a starting node. Choice of next node
and starting node can be varied to give a list of related algorithms. Although more advanced algorithms
such as simulated annealing or tabu search may give better results, in some situations hill climbing works
just as well. Hill climbing can often produce a better result than other algorithms when the amount of time
available to perform a search is limited, such as with real-time systems. It is an anytime algorithm: it can
return a valid solution even if it's interrupted at any time before it ends.
In simple hill climbing, the first closer node is chosen, whereas in steepest ascent hill climbing all
successors are compared and the closest to the solution is chosen. Both forms fail if there is no closer
node, which may happen if there are local maxima in the search space which are not solutions. Steepest
ascent hill climbing is similar to best-first search, which tries all possible extensions of the current path
instead of only one.
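A minimal hill-climbing sketch in Python (the toy one-dimensional landscape and neighbor definition are illustrative assumptions):

# Hill climbing on integer states; neighbors are state ± 1.
def value(x):
    return -(x - 7) ** 2            # objective with a single peak at x = 7

def hill_climb(start):
    current = start
    while True:
        neighbor = max((current - 1, current + 1), key=value)
        if value(neighbor) <= value(current):
            return current          # peak reached: no neighbor is higher
        current = neighbor

print(hill_climb(0))                # 7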
h (n) = estimated cost of the cheapest path from node n to a goal node.
Greedy best-first search expands the node closest to the goal state, using the estimated cost h(n) as
the evaluation function.
A* search expands the node on the least-cost solution path, using estimated cost plus actual cost as
the evaluation function.
It is complete if the available memory is sufficient to store the deepest solution path.
It is optimal if enough memory is available to store the deepest solution path. Otherwise, it returns the
best solution that can be reached with the available memory.
Minimax is for deterministic games with perfect information. The minimax algorithm generates the
whole game tree and applies the utility function to each terminal state. Then it propagates the utility value
up one level and continues to do so until reaching the start node.
v ← MIN(v, MAX-VALUE(s))
return v
13.(a) Illustrate the use of predicate logic to represent the knowledge with suitable example.
The organization of objects into categories is a vital part of knowledge representation. Although
interaction with the world takes place at the level of individual objects, much reasoning takes place at the
level of categories. There are two choices for representing categories in first-order logic: predicates and
objects.
Measurements
In both scientific and commonsense theories of the world, objects have height, mass, cost, and so on. The
values that we assign for these properties are called measures. Ordinary quantitative measures are quite
easy to represent. We imagine that the universe includes abstract "measure objects," such as the length
that is the length of this line segment:
Actions are logical terms such as Forward and Turn (Right). For now, we will assume that the
environment contains only one agent. (If there is more than one, an additional argument can be inserted to
say which agent is doing the action.)
Situations are logical terms consisting of the initial situation (usually called S0) and all situations that are
generated by applying an action to a situation. The function Result(a, s) (sometimes called Do) names the
situation that results when action a is executed in situation s. Figure 3.11 illustrates this idea.
Fluents are functions and predicates that vary from one situation to the next, such as the location of the
agent or the aliveness of the wumpus. The dictionary says a fluent is something that flows, like a liquid.
In this use, it means flowing or changing across situations. By convention, the situation is always the last
argument of a fluent. For example, ¬Holding(G1, S0) says that the agent is not holding the gold G1 in
the initial situation S0. Age(Wumpus, S0) refers to the wumpus's age in S0.
Atemporal or eternal predicates and functions are also allowed. Examples include the predicate
Gold(G1) and the function LeftLegOf(Wumpus).
A situation calculus agent should be able to deduce the outcome of a given sequence of
actions; this is the projection task. With a suitable constructive inference algorithm, it should also be able
to find a sequence that achieves a desired effect; this is the planning task.
In the simplest version of situation calculus, each action is described by two axioms: a possibility axiom
that says when it is possible to execute the action, and an effect axiom that says what happens when a
possible action is executed.
The axioms have the following form:
The problem is that the effect axioms say what changes, but don't say what stays the same.
Representing all the things that stay the same is called the frame problem. We must find an efficient
solution to the frame problem because, in the real world, almost everything stays the same almost all the
time. Each action affects only a tiny fraction of all fluents.
One approach is to write explicit frame axioms that do say what stays the same.
We consider how each fluent predicate evolves over time. The axioms we use are called successor-state
axioms. They have the following form:
SUCCESSOR-STATE AXIOM:
Action is possible ⇒
(Fluent is true in result state ⇔ Action's effect made it true
∨ It was true before and the action left it alone).
The unique names axiom states a disequality for every pair of constants in the knowledge base.
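As a concrete instance (borrowing the standard wumpus-world Holding fluent; the exact symbols are an assumption, not from these notes), a successor-state axiom can be written in LaTeX as:

\forall a, s \;\; Poss(a, s) \Rightarrow
  \big( Holding(G_1, Result(a, s)) \Leftrightarrow
    a = Grab(G_1) \lor (Holding(G_1, s) \land a \neq Release(G_1)) \big)

The left disjunct is "the action's effect made it true"; the right disjunct is "it was true before and the action left it alone".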
Generalized events
A generalized event is composed from aspects of some "space-time chunk", a piece of this
multidimensional space-time universe. This abstraction generalizes most of the concepts we have seen so
far, including actions, locations, times, fluents, and physical objects.
14. (a) With an example explain the logics for nonmonotonic reasoning.
The definite clause logic is monotonic in the sense that anything that could be concluded before a
clause is added can still be concluded after it is added; adding knowledge does not reduce the set of
propositions that can be derived.
A logic is non-monotonic if some conclusions can be invalidated by adding more knowledge. The logic of
definite clauses with negation as failure is non-monotonic. Non-monotonic reasoning is useful for
representing defaults. A default is a rule that can be used unless it is overridden by an exception.
For example, to say that b is normally true if c is true, a knowledge base designer can write a rule of the
form
b ← c ∧ ∼ab_a.
where ab_a is an atom that means abnormal with respect to some aspect a. Given c, the agent can
infer b unless it is told ab_a. Adding ab_a to the knowledge base can prevent the conclusion of b. Rules
that imply ab_a can be used to prevent the default under the conditions of the body of the rule.
Example: Suppose the purchasing agent is investigating purchasing holidays. A resort may be adjacent to
a beach or away from a beach. This is not symmetric; if the resort was adjacent to a beach, the knowledge
provider would specify this. Thus, it is reasonable to have the clause
away_from_beach ← ∼on_beach.
This clause enables an agent to infer that a resort is away from the beach if the agent is not told it is
adjacent to a beach.
A cooperative system tries to not mislead. If we are told the resort is on the beach, we would expect that
resort users would have access to the beach. If they have access to a beach, we would expect them to be
able to swim at the beach. Thus, we would expect the following defaults:
beach_access ← on_beach ∧ ∼ab_beach_access.
swim_at_beach ← beach_access ∧ ∼ab_swim_at_beach.
A cooperative system would tell us if a resort on the beach has no beach access or if there is no
swimming. We could also specify that, if there is an enclosed bay and a big city, then there is no
swimming, by default:
ab_swim_at_beach ← enclosed_bay ∧ big_city ∧ ∼ab_no_swimming_near_city.
We could say that British Columbia is abnormal with respect to swimming near cities:
ab_no_swimming_near_city ← in_BC ∧ ∼ab_BC_beaches.
Given only the preceding rules, an agent infers away_from_beach. If it is then told on_beach, it can no
longer infer away_from_beach, but it can now infer beach_access and swim_at_beach. If it is also
told enclosed_bay and big_city, it can no longer infer swim_at_beach. However, if it is then told in_BC,
it can then infer swim_at_beach.
By having defaults of what is normal, a user can interact with the system by telling it what is abnormal,
which allows for economy in communication. The user does not have to state the obvious.
One way to think about non-monotonic reasoning is in terms of arguments. The rules can be used as
components of arguments, in which the negated abnormality gives a way to undermine arguments. Note
that, in the language presented, only positive arguments exist that can be undermined. In more general
theories, there can be positive and negative arguments that attack each other.
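A rough Python sketch of these defaults under negation as failure (the rule encoding is an illustrative assumption, not a standard library): a body literal ('not', a) holds exactly when a is not derivable.

def derivable(atom, facts, rules, seen=frozenset()):
    if atom in facts:
        return True
    if atom in seen:                    # avoid cyclic re-derivation
        return False
    for head, body in rules:
        if head == atom and all(
            not derivable(b[1], facts, rules, seen | {atom})
            if isinstance(b, tuple) and b[0] == "not"
            else derivable(b, facts, rules, seen | {atom})
            for b in body):
            return True
    return False

rules = [("away_from_beach", [("not", "on_beach")]),
         ("beach_access",   ["on_beach", ("not", "ab_beach_access")]),
         ("swim_at_beach",  ["beach_access", ("not", "ab_swim_at_beach")])]

print(derivable("away_from_beach", set(), rules))        # True by default
print(derivable("swim_at_beach", {"on_beach"}, rules))   # True: defaults apply

Telling the system ab_swim_at_beach (adding it to the facts) would undermine the second conclusion, which is exactly the non-monotonic behavior described above.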
(b) Explain how Bayesian statistics provides reasoning under various kinds of uncertainty.
A statistical learning method begins with the simplest task: parameter learning with complete
data. A parameter learning task involves finding the numerical parameters for a probability model
whose structure is fixed.
Maximum-likelihood parameter learning: Discrete models
In fact, though, we have laid out one standard method for maximum-likelihood parameter learning:
1. Write down an expression for the likelihood of the data as a function of the parameter(s).
2. Write down the derivative of the log likelihood with respect to each parameter.
3. Find the parameter values such that the derivatives are zero.
A significant problem with maximum-likelihood learning in general: "when the data set is small
enough that some events have not yet been observed (for instance, no cherry candies), the
maximum-likelihood hypothesis assigns zero probability to those events".
The most important point is that, with complete data, the maximum-likelihood parameter learning
problem for a Bayesian network decomposes into separate learning problems, one for each
parameter. The second point is that the parameter values for a variable, given its parents, are just
the observed frequencies of the variable values for each setting of the parent values. As before, we
must be careful to avoid zeroes when the data set is small.
Naive Bayes models
Probably the most common Bayesian network model used in machine learning is the naïve Bayes
model. In this model, the "class" variable C (which is to be predicted) is the root and the "attribute"
variables Xi are the leaves. The model is "naive" because it assumes that the attributes are
conditionally independent of each other, given the class.
Maximum-likelihood parameter learning: Continuous models
Continuous probability models, such as the linear-Gaussian model, follow the same principles:
maximum-likelihood learning is identical to the discrete case. Let us begin with a very simple case:
learning the parameters of a Gaussian density function on a single variable. That is, the data are
generated as follows:
The parameters of this model are the mean μ and the standard deviation σ.
rather than the depth; at each iteration, the cutoff value is the smallest f-cost of any node that exceeded the cutoff on the
previous iteration. The main disadvantage is that it can require more storage space in complex domains.
The two recent memory bounded algorithms are:
1. Recursive best-first search(RBFS)
2. Memory bounded A* search (MA*)
Hill-climbing search
A search technique that moves in the direction of increasing value to reach a peak state. It terminates when it reaches a peak
where no neighbor has a higher value. The hill-climbing search algorithm is shown in the figure.
function HILL-CLIMBING(problem) returns a state that is local maximum
inputs: problem, a problem
local variables: current, a node
neighbor, a node
current ← MAKE-NODE(INITIAL-STATE[problem])
loop do
neighbor ← a highest-valued successor of current
if VALUE[neighbor] ≤ VALUE[current] then return STATE[current]
current ← neighbor
Figure : The hill-climbing search algorithm
Drawbacks:
Local maxima (foothills): a local maximum is a peak that is higher than each of its neighboring states, but lower than the
global maximum.
Ridges: a sequence of local maxima, with a slope that gently moves toward a peak.
Plateaux (shoulder): an area of the state-space landscape where the evaluation function is flat. It can be a flat local
maximum.
Some of the variants of hill-climbing are:
Stochastic hill climbing: chooses at random from among the uphill moves.
First choice hill climbing: implements stochastic hill climbing by generating successors randomly until one is generated
that is better than the current state.
Random-restart hill climbing: overcomes local maxima; it is trivially complete.
Constraint satisfaction problems (CSP) is defined by a set of variables, X1, X2, …Xn, and a set of constraints, C1,
C2,…Cm. Each variable Xi has a nonempty domain Di of all possible values. A complete assignment is one in which
every variable is mentioned, and a solution to a CSP is a complete assignment that satisfies all the constraints. Some CSPs
also require a solution that maximizes an objective function.
Some examples for CSP‘s are:
The n-queens problem
A crossword problem
A map coloring problem
Constraint graph: A CSP is usually represented as an undirected graph, called constraint graph where the nodes are the
variables and the edges are the binary constraints.
A CSP can be given an incremental formulation as a standard search problem as follows:
Initial state: the empty assignment { }, in which all variables are unassigned.
Successor function: assign a value to an unassigned variable, provided that it does not conflict with previously assigned
variables.
Goal test: the current assignment is complete.
Path cost: a constant cost for every step.
Discrete variables:
1) Finite domains: For n variables with a finite domain size d, the complexity is O(d^n). Complete assignment is
possible. E.g. map-coloring problems.
2) Infinite domains: For n variables with infinite domain size such as strings, integers etc. E.g. set of strings and set of
integers.
Continuous variables: Linear constraints solvable in polynomial time by linear programming. E.g. start / end times for
Hubble space telescope observations.
Types of Constraints:
1. Unary constraints, which restricts the value of a single variable.
2. Binary constraints, involve pair of variables.
3. Higher order constraints, involve three or more variables.
A game can be defined by the initial state, the legal actions in each state, a terminal test and a utility function that applies to
terminal states.
In game playing to select the next state, search technique is required. The pruning technique allows us to ignore positions
of the search tree that make no difference to the final choice, and heuristic evaluation function allow us to find the utility
of a state without doing a complete search.
Optimal decisions in games
A game can be formally defined as a kind of search problem with the following components:
Initial state: This includes the board position and identifies the player to move.
A successor function (operators), which returns a list of (move, state) pairs, each indicating a legal move and the resulting
state.
A terminal test, which determines when the game is over.
A utility function (payoff function or objective function), which gives a numeric value for the terminal states.
The Minimax algorithm
The Minimax algorithm computes the minimax decision from the current state. It performs a complete depth-first
exploration of the game tree. If the maximum depth of the tree is m, and there are b legal moves at each point, then the time
complexity of the minimax algorithm is O(b^m).
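A plain minimax sketch in Python (no pruning; the nested-list game tree encoding is an illustrative assumption, matching the alpha-beta sketch earlier):

# A "tree" is a number (terminal utility) or a list of subtrees (legal moves).
def minimax(tree, maximizing=True):
    if isinstance(tree, (int, float)):
        return tree                          # utility of a terminal state
    values = [minimax(child, not maximizing) for child in tree]
    return max(values) if maximizing else min(values)

print(minimax([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))   # 3

Alpha-beta pruning returns the same value while skipping branches that cannot affect it.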
Debug the knowledge base
Learning from observations − Forms of learning − Inductive learning − Learning decision trees − Ensemble learning −
Knowledge in learning − Logical formulation of learning − Explanation based learning − Learning using relevant
information − Inductive logic programming − Statistical learning methods − Learning with complete data − Learning with
hidden variable − EM algorithm − Instance based learning − Neural networks − Reinforcement learning − Passive
reinforcement learning − Active reinforcement learning − Generalization in reinforcement
A learning agent consists of a performance element that decides what actions to take and a learning element that modifies
the performance element so that better decisions can be taken in the future. The design of a learning element is affected by
three major issues:
Which components of the performance element are to be learned?
Fundamentals of Language
A formal language is defined as a set of strings. Each string is a concatenation of terminal symbols, called words.
A grammar is a finite set of rules that specifies a language. The grammar is a set of rewrite rules.
Example: S – sentence
NP – Noun phrase
VP – Verb phrase
These are called non terminal symbols.
The Component steps of communication
A typical communication episode, in which speaker S wants to inform hearer H about proposition P using words W, is
composed of seven steps.
1. Intention
2. Generation
3. Synthesis
4. Perception
5. Analysis
6. Disambiguation
7. Incorporation
The following figure shows the seven processes involved in communication, using the example sentence "The wumpus is
dead".
S → NP VP
It defines how to combine words and phrases, using five nonterminal symbols. The different types of phrases are:
Sentence (S)
Noun phrase (NP)
Verb phrase (VP)
Prepositional phrase (PP)
Relative clause (RelClause)
Syntactic analysis is the step in which an input sentence is converted into a hierarchical structure that corresponds to the
units of meaning in the sentence. This process is called parsing.
Parsing can be seen as a process of searching for a parse tree. There are two extreme ways of specifying the search space.
1. Top-down parsing
2. Bottom-up parsing
1. Top-down parsing:
Begin with the start symbol and apply the grammar rules forward until the symbols at the terminals of the tree correspond
to the components of the sentence being parsed.
2. Bottom-up parsing
Begin with the sentence to be parsed and apply the grammar rules backward until a single tree whose terminals are the
words of the sentence and whose top node is the start symbol has been produced.
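A toy top-down (recursive descent) parser for S → NP VP in Python; the grammar, lexicon, and sentence are illustrative assumptions:

# NP -> Name, VP -> Verb; split the sentence and try the rules forward.
lexicon = {"Name": {"John", "Mary"}, "Verb": {"runs", "sleeps"}}

def parse_cat(cat, words):
    return len(words) == 1 and words[0] in lexicon[cat]

def parse_S(words):
    for i in range(1, len(words)):          # split point between NP and VP
        np, vp = words[:i], words[i:]
        if parse_cat("Name", np) and parse_cat("Verb", vp):
            return ("S", ("NP", np[0]), ("VP", vp[0]))
    return None

print(parse_S(["John", "runs"]))   # ('S', ('NP', 'John'), ('VP', 'runs'))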
IR is the task of finding documents that are relevant to a user's need for information.
E.g. Google, Yahoo, etc.
An IR system is characterized as
1. A document collection
Recall: is the proportion of all the relevant documents in the collection that are in the result.
Precision: is the proportion of documents in the result set that are actually relevant.
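A short worked sketch of these two measures in Python (the document sets are made-up):

relevant = {"d1", "d2", "d3", "d4"}       # all relevant docs in the collection
result = {"d1", "d2", "d9"}               # docs returned by the system

precision = len(result & relevant) / len(result)    # 2/3: returned docs that are relevant
recall = len(result & relevant) / len(relevant)     # 2/4: relevant docs that were returned
print(precision, recall)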
Other measures to evaluate IR
The other measures are
1. Reciprocal rank.
2. Time to answer
IR refinements
The unigram model treats all words as completely independent, but we know that some words are correlated.
E.g. the word "couch" has two closely related words:
couches
sofa
Implementing IR systems
IR systems are made efficient by two data structures. They are
1. Lexicon:
It lists all the words in the document collection.
It supports one operation
It is implemented by hash table.
2. Inverted index
It is similar to the index at the back of a book. It consists of a set of hit lists, which record the places where a word occurs.
In the unigram model, it is a list of pairs.
Machine translation is the automatic translation of text from one natural language (source) to another (target).
Types of translation
1. Rough translation
2. Restricted source translation
3. Pre-edited translation
4. Literary translation