Assignment 3: Artificial Intelligence

The document discusses an assignment submitted for an artificial intelligence course, which includes questions about breadth-first search, statistical approaches for solving AI problems, rule-based systems, and an explanation of Bayesian networks with an example. The questions are answered with multi-paragraph explanations about the various AI concepts and techniques.

ASSIGNMENT 3

CSE 452

ARTIFICIAL
INTELLIGENCE

Submitted to:- Miss Ruchi

Submitted by:- Uday Pratap
10804711
A1810A11
Q1:- Write pseudo code for BFS (breadth-first search).

Solution 1:-

# Pseudo code for breadth-first search:

i. Declare two empty lists: Open and Closed.
ii. Add the Start node to the Open list.
iii. While the Open list is not empty, loop the following:
        a. Remove the first node from the Open list.
        b. Check whether the removed node is the destination.
                i. If the removed node is the destination, add it to the
                   Closed list, break out of the loop, and return the
                   Closed list.
                ii. If the removed node is not the destination, continue
                    the loop (go to Step c).
        c. Extract the neighbors of the removed node. Add the neighbors
           to the end of the Open list, and add the removed node to the
           Closed list.
searchMethod = function () {

    var origin:Node = nodeOrigin;
    var destination:Node = nodeDestination;

    // Creating our Open and Closed lists
    var closedList:Array = new Array();
    var openList:Array = new Array();

    // Adding our starting point to the Open list
    openList.push(origin);

    // Loop while openList contains some data
    while (openList.length != 0) {

        // Remove the first node from the Open list
        var n:Node = Node(openList.shift());

        // Check if the node is the destination
        if (n == destination) {
            closedList.push(destination);
            trace("Closed!");
            break;
        }

        // Store n's neighbors in an array
        var neighbors:Array = n.getNeighbors();
        var nLength:Number = neighbors.length;

        // Add each neighbor to the end of the Open list
        for (var i:Number = 0; i < nLength; i++) {
            openList.push(neighbors[i]);
        }

        // Add the current node to the Closed list
        closedList.push(n);
    }
};
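The same search in runnable form (a Python sketch, since the ActionScript above depends on a Node class that is not shown; the graph is assumed to be an adjacency dictionary, and a seen set is added so nodes are not queued twice):

```python
from collections import deque

def bfs(graph, start, goal):
    """Breadth-first search over an adjacency-dict graph.
    Returns the list of expanded nodes (the Closed list), or None."""
    open_list = deque([start])   # FIFO queue: the Open list
    closed_list = []             # nodes already expanded: the Closed list
    seen = {start}               # guard against re-adding nodes
    while open_list:
        n = open_list.popleft()  # remove the first node from the Open list
        if n == goal:            # destination check
            closed_list.append(n)
            return closed_list
        closed_list.append(n)
        for neighbor in graph.get(n, []):
            if neighbor not in seen:
                seen.add(neighbor)
                open_list.append(neighbor)
    return None                  # Open list exhausted: no path found

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs(graph, "A", "D"))  # ['A', 'B', 'C', 'D']
```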

Q2:- List down the statistical approaches used to solve AI problems.

Solution 2:-

Statistical approaches are used in AI wherever reasoning must be carried out under
uncertainty. Commonly used approaches include: Bayesian inference and Bayesian
networks (see Solution 4), hidden Markov models, the Dempster-Shafer theory of
evidence (see Solution 5), fuzzy logic, statistical classification and regression
methods, and Monte Carlo (stochastic simulation) techniques.

Q3:- What are rule based systems?

Solution 3:-

In computer science, rule-based systems are used as a way to store and manipulate
knowledge to interpret information in a useful way. They are often used in artificial
intelligence applications and research.

A rule-based system consists of a rule-base (permanent data), an inference engine
(process), and a workspace or working memory (temporary data). Not part of the basic
reasoning process, but essential to applications, is the user interface.

In the cognitive models, the rule-base is usually equated with long-term memory
(LTM) and the workspace with short-term memory (STM). There will be restrictions
on these to correspond with assumptions about mental architecture: e.g. the limited
size of STM.

Knowledge is stored as rules in the rule-base (also known as the knowledge base).
Rules are of the form:

IF some condition THEN some action

The condition tests working memory, e.g. for the presence of certain symbols or
patterns of symbols. In many systems, the conditions are expressed logically as
conjunctions (occasionally, disjunctions) of predicates. In some systems, some
conditions correspond to sensor data. In a cognitive model, this would correspond to
direct access to sensory input, rather than to its representation in STM.

The action can be one of the following:

Another symbol or set of symbols to be added to STM. In many systems these will be
expressed logically as conjunctions of predicates.

Some other action on STM, e.g. "delete the symbols XyZ".

Some other action, e.g. turning a motor off, printing. In a cognitive model, this would
correspond to a motor command.

The inference engine applies the rules to working memory. There are various ways of
doing this, described later. (On why it is called an inference engine, see Crevier, p. 157.)

The user interface sits between the inference engine and the user. It translates the
system's answers from an internal representation to something the user can understand;
it passes questions from the system to the user and checks the replies (rejecting, for
example, a negative number as the answer to a request for your weight).

Introduction to Rule-Based Systems

Using a set of assertions, which collectively form the ‘working memory’, and a set of
rules that specify how to act on the assertion set, a rule-based system can be created.
Rule-based systems are fairly simplistic, consisting of little more than a set of if-then
statements, but provide the basis for so-called “expert systems” which are widely used
in many fields. The concept of an expert system is this: the knowledge of an expert is
encoded into the rule set. When exposed to the same data, the expert system AI will
perform in a similar manner to the expert.

Rule-based systems are a relatively simple model that can be adapted to any number of
problems. As with any AI, a rule-based system has its strengths as well as limitations
that must be considered before deciding if it’s the right technique to use for a given
problem. Overall, rule-based systems are really only feasible for problems for which
any and all knowledge in the problem area can be written in the form of if-then rules
and for which this problem area is not large. If there are too many rules, the system
can become difficult to maintain and can suffer a performance hit.
To create a rule-based system for a given problem, you must have (or create) the
following:

A set of facts to represent the initial working memory. This should be anything
relevant to the beginning state of the system.

A set of rules. This should encompass any and all actions that should be taken within
the scope of a problem, but nothing irrelevant. The number of rules in the system can
affect its performance, so you don’t want any that aren’t needed.

A condition that determines that a solution has been found or that none exists. This is
necessary to terminate some rule-based systems that find themselves in infinite loops
otherwise.

Theory of Rule-Based Systems

The rule-based system itself uses a simple technique: It starts with a rule-base, which
contains all of the appropriate knowledge encoded into If-Then rules, and a working
memory, which may or may not initially contain any data, assertions or initially known
information. The system examines all the rule conditions (IF) and determines a subset,
the conflict set, of the rules whose conditions are satisfied based on the working
memory. Of this conflict set, one of those rules is triggered (fired). Which one is
chosen is based on a conflict resolution strategy. When the rule is fired, any actions
specified in its THEN clause are carried out. These actions can modify the working
memory, the rule-base itself, or do just about anything else the system programmer
decides to include. This loop of firing rules and performing actions continues until one
of two conditions is met: there are no more rules whose conditions are satisfied, or a
rule is fired whose action specifies that the program should terminate.
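A minimal sketch of this recognize-act cycle (the rule format, using Python sets for the IF and THEN parts, is invented for illustration; each rule is suspended after firing so the loop terminates):

```python
# Minimal forward-chaining rule engine: match, resolve the conflict set
# (here: first-applicable), fire, and repeat until no rule applies.
def run(rules, facts):
    facts = set(facts)
    fired = set()          # suspended rules, so none fires twice
    while True:
        # conflict set: rules whose IF part is satisfied by working memory
        conflict_set = [
            (i, r) for i, r in enumerate(rules)
            if i not in fired and r["if"] <= facts
        ]
        if not conflict_set:
            return facts               # no applicable rules: terminate
        i, rule = conflict_set[0]      # first-applicable strategy
        facts |= rule["then"]          # carry out the THEN part
        fired.add(i)

rules = [
    {"if": {"has fur", "says woof"}, "then": {"is a dog"}},
    {"if": {"is a dog"}, "then": {"is a mammal"}},
]
print(sorted(run(rules, {"has fur", "says woof"})))
# ['has fur', 'is a dog', 'is a mammal', 'says woof']
```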

Which rule is chosen to fire is a function of the conflict resolution strategy. Which
strategy is chosen can be determined by the problem or it may be a matter of
preference. In any case, it is vital as it controls which of the applicable rules are fired
and thus how the entire system behaves. There are several different strategies, but here
are a few of the most common:
First Applicable: If the rules are in a specified order, firing the first applicable one
allows control over the order in which rules fire. This is the simplest strategy and has a
potential for a large problem: that of an infinite loop on the same rule. If the working
memory remains the same, as does the rule-base, then the conditions of the first rule
have not changed and it will fire again and again. To solve this, it is a common
practice to suspend a fired rule and prevent it from re-firing until the data in working
memory, that satisfied the rule’s conditions, has changed.

Random: Though it doesn’t provide the predictability or control of the first-applicable


strategy, it does have its advantages. For one thing, its unpredictability is an advantage
in some circumstances (such as games for example). A random strategy simply
chooses a single random rule to fire from the conflict set. Another possibility for a
random strategy is a fuzzy rule-based system in which each of the rules has a
probability such that some rules are more likely to fire than others.

Most Specific: This strategy is based on the number of conditions of the rules. From
the conflict set, the rule with the most conditions is chosen. This is based on the
assumption that if it has the most conditions then it has the most relevance to the
existing data.

Least Recently Used: Each of the rules is accompanied by a time or step stamp, which
marks the last time it was used. This maximizes the number of individual rules that are
fired at least once. If all rules are needed for the solution of a given problem, this is a
perfect strategy.

"Best" rule: For this to work, each rule is given a ‘weight,’ which specifies how much
it should be considered over the alternatives. The rule with the most preferable
outcomes is chosen based on this weight.
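The "most specific" strategy, for instance, can be sketched as a selection function over the conflict set (hypothetical rule representation, with IF parts as sets of conditions):

```python
def most_specific(conflict_set):
    """Pick the rule with the most conditions from the conflict set."""
    return max(conflict_set, key=lambda rule: len(rule["if"]))

conflict_set = [
    {"if": {"a"}, "then": {"x"}},
    {"if": {"a", "b", "c"}, "then": {"y"}},
    {"if": {"a", "b"}, "then": {"z"}},
]
print(most_specific(conflict_set)["then"])  # {'y'}
```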

Part B

Q4:- Explain Bayesian Network with example.

Solution 4:-

Bayesian Network.

A Bayesian network, belief network or directed acyclic graphical model is a
probabilistic graphical model that represents a set of random variables and their
conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian
network could represent the probabilistic relationships between diseases and
symptoms. Given symptoms, the network can be used to compute the probabilities of
the presence of various diseases.
Formally, Bayesian networks are directed acyclic graphs whose nodes represent
random variables in the Bayesian sense: they may be observable quantities, latent
variables, unknown parameters or hypotheses. Edges represent conditional
dependencies; nodes which are not connected represent variables which are
conditionally independent of each other. Each node is associated with a probability
function that takes as input a particular set of values for the node's parent variables and
gives the probability of the variable represented by the node. For example, if the
parents are m Boolean variables then the probability function could be represented by
a table of 2^m entries, one entry for each of the 2^m possible combinations of its
parents being true or false.

X is a Bayesian network with respect to G if its joint probability density function (with
respect to a product measure) can be written as a product of the individual density
functions, conditional on their parent variables:[1]

    P(x_1, ..., x_n) = ∏_v P(x_v | x_pa(v))

where pa(v) is the set of parents of v (i.e. those vertices pointing directly to v via a
single edge).

For any set of random variables, the probability of any member of a joint
distribution can be calculated from conditional probabilities using the chain
rule as follows:[1]

    P(x_1, ..., x_n) = ∏_v P(x_v | x_(v-1), ..., x_1)

Compare this with the definition above, which can be written as:

    P(x_1, ..., x_n) = ∏_v P(x_v | x_j for each X_j which is a parent of X_v)

The difference between the two expressions is the conditional
independence of the variables from any of their non-descendants, given
the values of their parent variables. To develop a Bayesian network, we
often first develop a DAG G such that we believe X satisfies the local
Markov property with respect to G. Sometimes this is done by creating
a causal DAG. We then ascertain the conditional probability
distributions of each variable given its parents in G. In many cases, in
particular in the case where the variables are discrete, if we define the
joint distribution of X to be the product of these conditional
distributions, then X is a Bayesian network with respect to G.

Example:-

Burglar Alarm Example

I’m at work. Is there a burglary at home?

Neighbour John calls to say my alarm is ringing.

Neighbour Mary does not call.

Sometimes the alarm is set off by minor earthquakes.

Boolean variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls.

Construct the network to reflect causal knowledge:

A burglar may set the alarm off.

An earthquake may set the alarm off.

The alarm may cause Mary to call.

The alarm may cause John to call.

Advantages of the network (with at most k parents per node, d values per variable):

Less space: O(n · d^k) numbers vs. O(d^n) for the full joint distribution

Faster to answer queries

Simpler to find CPTs
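The network above can be queried by enumerating the joint distribution, which factors as the product of each node's CPT entry. A sketch with illustrative CPT values (the assignment gives only the structure; these numbers follow the well-known textbook version of the example):

```python
import itertools

# Burglar-alarm network. CPT values are illustrative, not from the assignment.
P_B = 0.001                      # P(Burglary)
P_E = 0.002                      # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as a product over the nodes' CPTs."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

def query_burglary(j, m):
    """P(Burglary | JohnCalls=j, MaryCalls=m) by enumerating the joint."""
    num = sum(joint(True, e, a, j, m)
              for e, a in itertools.product([True, False], repeat=2))
    den = sum(joint(b, e, a, j, m)
              for b, e, a in itertools.product([True, False], repeat=3))
    return num / den

print(round(query_burglary(True, True), 3))  # 0.284
```

Even though both neighbours call, the posterior probability of a burglary stays below 0.3, because the priors for Burglary and Earthquake are so small.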


Q5:- How ‘Dempster Shafer Theory’ can be used to solve AI
problems?

Solution 5:-

Dempster-Shafer theory:-

The Dempster–Shafer theory (DST) is a mathematical theory of evidence[1]. It allows
one to combine evidence from different sources and arrive at a degree of belief
(represented by a belief function) that takes into account all the available evidence.
The theory was first developed by Arthur P. Dempster[2] and Glenn Shafer.

In a narrow sense, the term Dempster–Shafer theory refers to the original conception
of the theory by Dempster and Shafer. However, it is more common to use the term in
the wider sense of the same general approach, as adapted to specific kinds of
situations. In particular, many authors have proposed different rules for combining
evidence, often with a view to handling conflicts in evidence better.

Shafer's framework allows for belief about propositions to be represented as intervals,
bounded by two values, belief (or support) and plausibility:

    belief ≤ plausibility
Belief in a hypothesis is constituted by the sum of the masses of all sets enclosed
by it (i.e. the sum of the masses of all subsets of the hypothesis). It is the amount
of belief that directly supports a given hypothesis at least in part, forming a lower
bound. Plausibility is 1 minus the sum of the masses of all sets whose intersection
with the hypothesis is empty. It is an upper bound on the possibility that the
hypothesis could be true, i.e. it “could possibly be the true state of the system” up
to that value, because there is only so much evidence that contradicts that
hypothesis.

For example, suppose we have a belief of 0.5 and a plausibility of 0.8 for a
proposition, say “the cat in the box is dead”, meaning that the cat could either be
dead or alive. This interval represents the level of uncertainty based on the
evidence in your system.
Hypothesis                       Mass   Belief   Plausibility
Null (neither alive nor dead)    0      0        0
Alive                            0.2    0.2      0.5
Dead                             0.5    0.5      0.8
Either (alive or dead)           0.3    1.0      1.0

The null hypothesis is set to zero by definition (it corresponds to “no solution”).
The orthogonal hypotheses “Alive” and “Dead” have probabilities of 0.2 and 0.5,
respectively. This could correspond to “Live/Dead Cat Detector” signals, which
have respective reliabilities of 0.2 and 0.5. Finally, the all-encompassing “Either”
hypothesis (which simply acknowledges there is a cat in the box) picks up the
slack so that the sum of the masses is 1. The belief for the “Alive” and “Dead”
hypotheses matches their corresponding masses because they have no subsets;
belief for “Either” consists of the sum of all three masses (Either, Alive, and
Dead) because “Alive” and “Dead” are each subsets of “Either”. The “Alive”
plausibility is 1 − m (Dead) and the “Dead” plausibility is 1 − m (Alive). Finally,
the “Either” plausibility sums m(Alive) + m(Dead) + m(Either). The universal
hypothesis (“Either”) will always have 100% belief and plausibility; it acts as a
checksum of sorts.

Here is a somewhat more elaborate example where the behavior of belief and
plausibility begins to emerge. We're looking through a variety of detector systems
at a single faraway signal light, which can only be coloured in one of three
colours (red, yellow, or green):
Hypothesis         Mass   Belief   Plausibility
Null               0      0        0
Red                0.35   0.35     0.56
Yellow             0.25   0.25     0.45
Green              0.15   0.15     0.34
Red or Yellow      0.06   0.66     0.85
Red or Green       0.05   0.55     0.75
Yellow or Green    0.04   0.44     0.65
Any                0.1    1.0      1.0

Events of this kind would not be modeled as disjoint sets in probability space as
they are here in mass assignment space. Rather the event "Red or Yellow" would
be considered as the union of the events "Red" and "Yellow", and (see the axioms
of probability theory) P(Red or Yellow) ≥ P(Yellow), and P(Any)=1,
where Any refers to Red or Yellow or Green. In DST the mass assigned
to Any refers to the proportion of evidence that can't be assigned to any of the
other states, which here means evidence that says there is a light but doesn't say
anything about what color it is. In this example, the proportion of evidence that
shows the light is either Red or Green is given a mass of 0.05. Such evidence
might, for example, be obtained from a R/G color blind person. DST lets us
extract the value of this sensor's evidence. Also, in DST the Null set is considered
to have zero mass, meaning here that the signal light system exists and we are
examining its possible states, not speculating as to whether it exists at all.
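The belief and plausibility computations above can be sketched directly from a mass assignment (hypotheses modelled as frozensets over the frame {Alive, Dead}, as in the cat example):

```python
# Mass assignment from the cat example; the Null hypothesis has zero mass.
masses = {
    frozenset(): 0.0,                    # Null
    frozenset({"Alive"}): 0.2,
    frozenset({"Dead"}): 0.5,
    frozenset({"Alive", "Dead"}): 0.3,   # Either
}

def belief(h):
    """Sum of the masses of all non-empty subsets of hypothesis h."""
    return sum(m for s, m in masses.items() if s and s <= h)

def plausibility(h):
    """Sum of the masses of all sets whose intersection with h is non-empty
    (equivalently, 1 minus the mass of everything that contradicts h)."""
    return sum(m for s, m in masses.items() if s & h)

dead = frozenset({"Dead"})
print(round(belief(dead), 3), round(plausibility(dead), 3))      # 0.5 0.8
either = frozenset({"Alive", "Dead"})
print(round(belief(either), 3), round(plausibility(either), 3))  # 1.0 1.0
```

The results reproduce the first table: belief 0.5 and plausibility 0.8 for "Dead", and the universal hypothesis "Either" gets belief and plausibility 1.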

Q6:- Explain ATN in NLP with examples?


Solution 6:-

An augmented transition network (ATN) is a recursive transition network that can
perform tests and take actions during arc transitions.

An ATN uses a set of registers to store information.

A set of actions is defined for each arc, and the actions can look at and modify the
registers.

An arc may have a test associated with it. The arc is traversed (and its action is taken)
only if the test succeeds.
When a lexical arc is traversed, the word just consumed is put in a special variable (*)
that keeps track of the current word.
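A minimal sketch of an ATN-style parser (the grammar, lexicon and register names are all invented for illustration):

```python
# Toy ATN sketch. Each arc has a test (the word's lexical category) and an
# action (store the word in a register); the parse succeeds only if every
# test passes and all input is consumed.
LEXICON = {"the": "det", "dog": "noun", "runs": "verb"}

def parse_np(words, pos, regs):
    """NP sub-network: optional determiner followed by a noun."""
    if pos < len(words) and LEXICON.get(words[pos]) == "det":
        regs["det"] = words[pos]          # action: fill the DET register
        pos += 1
    if pos < len(words) and LEXICON.get(words[pos]) == "noun":
        regs["head"] = words[pos]         # action: fill the HEAD register
        return pos + 1
    return None                           # test failed: no noun found

def parse_sentence(words):
    """S network: an NP arc followed by a verb arc."""
    regs = {}                             # the ATN's register memory
    pos = parse_np(words, 0, regs)        # recurse into the NP sub-network
    if pos is None:
        return None
    if pos < len(words) and LEXICON.get(words[pos]) == "verb":
        regs["verb"] = words[pos]         # action: fill the VERB register
        pos += 1
    else:
        return None
    return regs if pos == len(words) else None

print(parse_sentence(["the", "dog", "runs"]))
# {'det': 'the', 'head': 'dog', 'verb': 'runs'}
```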

Natural language understanding is a subtopic of natural language processing in
artificial intelligence that deals with machine reading comprehension.

The process of disassembling and parsing input is more complex than the reverse
process of assembling output in natural language generation because of the occurrence
of unknown and unexpected features in the input and the need to determine the
appropriate syntactic and semantic schemes to apply to it, factors which are
predetermined when outputting language.

An ATN is simply an RTN that has been equipped with a memory and the ability to
augment arcs with actions and conditions that make reference to that memory (see
Chapter 3 for a detailed discussion of RTNs and ATNs). ATN-based parsers were
probably the most common kind of parser employed by computational linguists in the
1970s, but they have begun to fall out of favour in recent years. This is largely because
the augmentations destroy the declarative nature of the formalism and because a parser
using an ATN is limited in the search strategies it can employ (see Chapter 5 for a
comprehensive account of the relation between parsing and search).
A much larger range of search strategies become practical once a data structure known
as a chart is adopted for parsing, and chart parsers have now become one of the basic
tools of modern NLP. A chart is basically a data structure in which the parser records
its successful attempts to parse subconstituents of the string of words.

Once the parser has recorded the presence of a constituent in one part of the string, it
never needs to look for the same kind of constituent there again. This represents a
significant improvement on the backtracking algorithms used in most ATN systems.
The ability of the chart to record, in addition, the current goals of the parser leads to
the possibility of implementing very sophisticated algorithms (see Chapter 6 for a
detailed discussion of chart-based parsers).
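The chart idea can be illustrated with a tiny CKY-style recognizer, in which a triangular table records each constituent found over each span exactly once (toy grammar assumed, in Chomsky normal form):

```python
from itertools import product

# The table plays the role of the chart: table[i][j] holds every category
# the parser has recognized over the span words[i:j], so no span is re-parsed.
UNARY = {"the": {"Det"}, "dog": {"N"}, "runs": {"V"}}   # lexical rules
BINARY = {("Det", "N"): {"NP"}, ("NP", "V"): {"S"}}      # binary rules

def cky(words):
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):                 # fill spans of length 1
        table[i][i + 1] |= UNARY.get(w, set())
    for span in range(2, n + 1):                  # longer spans, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return table

words = ["the", "dog", "runs"]
print("S" in cky(words)[0][len(words)])  # True
```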

Prolog is an inherently declarative language and so it is not surprising that one of the
first of the new breed of declarative grammar formalisms emerged from that language.
Definite Clause Grammars (DCGs) were developed from ideas of Colmerauer and
have been quite widely used within the Prolog community. A DCG is essentially a
phrase structure grammar (see Chapter 4) annotated with Prolog variables which maps
straightforwardly into ordinary Prolog code. This total compatibility with Prolog is the
major attraction of DCGs. Even though they look like grammars, and are in fact
grammars, they can be used as parsers directly, given the way that Prolog works.
However, using them in this way can prove inefficient, since Prolog does not, by itself,
employ any analogue of the well-formed substring table or chart discussed in the
preceding section. The DCG
formalism is provably powerful enough to describe languages (both natural and
artificial) of arbitrary complexity. It is not, however, especially well-adapted for
providing elegant accounts of some of the complexities that show up in natural
languages (e.g., the unbounded dependency type of construction discussed in Chapter
4), although this has been ameliorated in some subsequent extensions of the
formalism.

Ambiguity is arguably the single most important problem in NLP (see Chapter 5).
Natural languages are riddled with ambiguities at every level of description, from the
phonetic level upward.
