
UNIT I

INTRODUCTION

1.1 Artificial Intelligence – Introduction


In today's world, technology advances at a fast pace, and we are constantly
exposed to new innovations.
Artificial Intelligence is one such developing computer science technology,
poised to usher in a new era by creating intelligent machines.
Artificial Intelligence is now all around us, and it spans a wide range of
subfields, from the broad to the specific: self-driving vehicles, chess playing,
theorem proving, music, painting, and so on.
AI is one of the most exciting and general areas of computer science, with a
bright future ahead of it. Its aim is to make a machine work like a human.
The term Artificial Intelligence is made up of the words Artificial and
Intelligence, where Artificial means "man-made" and Intelligence means
"thinking power"; AI therefore refers to "a man-made thinking power."
Definition
"Artificial Intelligence is an area of computer science by which we may
develop intelligent machines that can behave like a person, think like humans,
and make judgements."
Artificial Intelligence exists when a machine has human-like abilities such as
learning, reasoning, and solving problems.
With Artificial Intelligence you do not need to preprogram a machine for every
function; instead, you can design a machine with algorithms that let it work
with its own intelligence, and that is the greatness of AI.
Human Intelligence vs. Artificial Intelligence:
• Human intelligence is a natural process; artificial intelligence is
programmed by humans.
• Human intelligence is hereditary; artificial intelligence is not.
• Knowledge is required for human intelligence; a knowledge base and
electricity are required for a machine to generate output.
• No single human is an expert, and we may get better solutions from other
humans; expert systems aggregate the experience and ideas of many people.

(a) Intelligence - The ability to use information to improve performance in a
given situation.
(b) Artificial Intelligence - For a particular agent architecture, the research and
development of agent programmes that perform well in a specific environment.
(c) Agent - An entity that responds to environmental cues by taking action.
(d) Rationality - A property of a system that, given what it knows, does the
"right thing."
(e) Logical Reasoning - A method of generating new sentences from old ones,
such that the new sentences must be true if the old ones are true.
Four Approaches of Artificial Intelligence
 Acting humanly: The Turing test approach.
 Thinking humanly: The cognitive modelling approach.
 Thinking rationally: The laws of thought approach.
 Acting rationally: The rational agent approach.
1.1.1 Turing Test in AI

Alan Turing devised a test in 1950 to determine whether a machine can think
like a person. This test is known as the Turing Test. In this test, Turing
claimed that a machine is intelligent if it can duplicate human responses under
particular conditions.
Turing proposed the Turing Test in his 1950 paper "Computing Machinery
and Intelligence," which addressed the question "Can machines think?"

The Turing test is based on a modified version of the party game "Imitation
game." This game has three players: one is a computer, another is a human
responder, and the third is a human interrogator who is separated from the
other two players and whose job it is to figure out which of the two is a
machine.
Consider the following scenario: Player A is a computer, Player B is a human,
and Player C is a questioner. The interrogator is aware that one of them is a
machine, but he must determine this based on the questions and responses.
Because all players communicate via keyboard and screen, the outcome is
unaffected by the machine's capacity to transform words into speech.
The questions and answers can be like:
Interrogator: Are you a computer?
PlayerA (Computer): No
Interrogator: Multiply two large numbers such as (256896489*456725896)
Player A: Long pause and give the wrong answer.
If the interrogator is unable to distinguish between the machine and the human
in this game, the computer passes the test and the machine is said to be
intelligent and capable of thinking like a human. Hugh Loebner, a New York
businessman, announced a prize competition in 1991, promising $100,000 to
the first computer to pass the Turing test. However, no AI programme has ever
come close to passing the Turing test in its purest form.
1.1.2 Chatbots to attempt the Turing test
ELIZA: Joseph Weizenbaum designed ELIZA, a natural language processing
computer programme, to demonstrate communication between machines and
humans. It was one of the first chatterbots to attempt the Turing Test.
Parry: Kenneth Colby built Parry, a chatterbot, in 1972. Parry was created to
represent a person suffering from paranoid schizophrenia (most common
chronic mental disorder). "ELIZA with attitude" was how Parry was
described. In the early 1970s, Parry was put to the test using a variant of the
Turing Test.
Eugene Goostman: Eugene Goostman was a chatbot created in 2001 in Saint
Petersburg. This bot has taken part in a number of Turing Tests. In June 2012,
Goostman won the competition billed as the largest-ever Turing test contest,
convincing 29 percent of the judges that it was human. Goostman was
presented as a 13-year-old virtual boy.

1.1.3 The Chinese Room Argument


Many philosophers were strongly opposed to the concept of Artificial
Intelligence as a whole. "Chinese Room" was the most famous debate on this
list.
In his paper "Minds, Brains, and Programs," published in 1980, John Searle
presented the "Chinese Room" thought experiment, which argued against the
validity of the Turing Test. He claims that programming a computer to
manipulate a language may make it appear to understand that language, but it
will not produce genuine understanding of language or awareness in a machine.
He argued that while machines like ELIZA and Parry could pass the Turing
test by manipulating keywords and symbols, they lacked any true
understanding of language. As a result, this behaviour cannot be described as
"thinking" on the machine's part.
Features that a machine must have in order to pass the Turing test:
Natural language processing: To speak with the Interrogator in a general
human language like English, NLP is required.
Knowledge representation: During the test, information will be stored and
retrieved.
Automated reasoning: To answer the questions using previously recorded
information.
Machine learning: To adapt to new situations and recognise generalised
patterns.
Vision (For total Turing test): During a test, to recognise the interrogator's
activities and other objects.
Motor Control (For total Turing test): When objects are requested, to act on
them.

1.2 History of Artificial Intelligence


For academics, Artificial Intelligence is not a new term or a new technology.
This technology is actually much older than you might think. In Ancient
Greek and Egyptian mythologies, there are even tales of mechanical men. The
milestones in the history of AI that define the route from AI generation to
current development are listed below.
Maturation of Artificial Intelligence (1943-1952)
Year 1943: Warren McCulloch and Walter Pitts published the first work on
artificial intelligence (AI) in 1943. They proposed a model of artificial neurons.
Year 1949: Donald Hebb demonstrated a rule for modifying the strength of
connections between neurons. Hebbian learning is the name given to his rule.
Year 1950: In 1950, the English mathematician Alan Turing, a pioneer of
machine learning, proposed a test in his paper "Computing Machinery and
Intelligence." The Turing test can be used to determine whether a machine can
demonstrate intelligent behaviour comparable to human intelligence.
The birth of Artificial Intelligence (1952-1956)
Year 1955: The "first artificial intelligence software," dubbed "Logic
Theorist," was constructed by Allen Newell and Herbert A. Simon. This
programme verified 38 of 52 mathematical theorems, as well as discovering
new and more elegant proofs for several of them.
Year 1956: At the Dartmouth Conference, American computer scientist John
McCarthy coined the term "Artificial Intelligence." AI became an academic
field for the first time.
High-level computer languages like FORTRAN, LISP, and COBOL were
invented at the time. And there was a lot of interest in AI at the time.
Early enthusiasm in the golden years (1956-1974)
Year 1966: The focus of the researchers was on creating algorithms that could
solve mathematical issues. ELIZA, the first chatbot, was invented by Joseph
Weizenbaum in 1966.
Year 1972: WABOT-1, the world's first intelligent humanoid robot, was
created in Japan.
The first AI winter (1974-1980)
The first AI winter took place between 1974 and 1980. An AI winter refers to
a period when computer scientists faced a severe lack of government funding
for AI research, and public interest in artificial intelligence dropped.
A boom of AI (1980-1987)
Year 1980: After AI winter duration, AI came back with "Expert System".
Expert systems were programmed that emulate the decision-making ability of
a human expert.
The American Association of Artificial Intelligence hosted its first national
conference at Stanford University in 1980.
The second AI winter (1987-1993)
The AI winter occurred a second time between 1987 and 1993. Investors and
governments once again halted funding for AI research, citing excessive costs
and ineffective results. Even XCON, an expert system that had initially been
very cost-effective, proved expensive to maintain.
The emergence of intelligent agents (1993-2011)
Year 1997: In 1997, IBM's Deep Blue beat world chess champion Garry
Kasparov, becoming the first computer to defeat a reigning world chess
champion.
Year 2002: For the first time, AI entered the home, in the form of the Roomba
robot vacuum cleaner.
Year 2006: By 2006, AI had entered the business world. Companies like
Facebook, Twitter, and Netflix started using AI.
Deep learning, big data and artificial general intelligence (2011-present)
Year 2011: In 2011, IBM's Watson won Jeopardy!, a quiz show in which it
had to answer complicated questions and riddles. Watson demonstrated that it
could understand natural language and solve complex problems quickly.
Year 2012: Google launched the Android app feature "Google Now", which
could provide information to the user as predictions.
Year 2014: In 2014, the chatbot "Eugene Goostman" won a competition in the
famous "Turing test."
Year 2018: IBM's "Project Debater" debated difficult topics with two master
debaters and performed exceptionally well.
Google demonstrated an AI programme called "Duplex", a virtual assistant
that booked a hairdresser appointment over the phone; the woman on the other
end did not realise she was speaking with a computer.
AI has now progressed to a phenomenal level. Deep learning, big data, and
data science are all hot topics at the moment. Companies such as Google,
Facebook, IBM, and Amazon are now using AI to create incredible
technologies. Artificial Intelligence's future is exciting, and it will be highly
intelligent.
1.3 FUTURE OF ARTIFICIAL INTELLIGENCE
• Transportation: Autonomous cars will one day transport us from place to
place, despite the fact that perfecting them could take a decade or more.
• Manufacturing: Predictive analytical sensors keep equipment working
efficiently, and AI-powered robots work alongside humans to do a restricted
range of activities like assembly and stacking.
• Healthcare: Diseases are more quickly and reliably diagnosed, medication
discovery is sped up and streamlined, virtual nursing assistants monitor
patients, and big data analysis helps to provide a more personalised patient
experience in the comparably AI-nascent field of healthcare.
• Education: Early-stage virtual tutors assist human instructors, and facial
analysis gauges students' emotions to help determine who is struggling or
bored, and better tailor the experience to their individual needs.
• Media: Journalism is utilising AI as well, and will continue to do so in the
future. Bloomberg employs Cyborg technology to assist in the interpretation
of complex financial reports. The Associated Press uses Automated Insights'
natural language capabilities to publish 3,700 earnings reports stories per year,
approximately four times more than in the past.
• Customer Service: Last but not least, Google is developing an AI assistant
that can make human-like phone calls to schedule appointments at places like
your local hair salon. The technology comprehends context and nuance in
addition to words.
Characteristics of intelligent agents
The following features characterise intelligent agents:
• They have some autonomy, which permits them to complete certain tasks
independently.
• They have a learning ability that allows them to learn while performing
tasks.
• They can interact with other entities, such as agents, humans, and systems.
• They can incrementally accommodate new rules.
• They have a goal-oriented mindset.
• They are knowledge-based: they make use of information about
communications, procedures, and entities.
1.4 Typical Intelligent Agents
Agent
An agent can be defined as anything that uses sensors to observe its
environment and actuators to act on that environment.
Humans, for example, detect their surroundings through their sensory organs,
known as sensors, and perform actions using their actuators, which include
their hands, legs, and other body parts.
Diagrammatic Representation of an Agent

Agents interact with the environment through sensors and actuators

Intelligent Agent
A goal-directed agent is an intelligent agent. It sees its environment through its
sensors and acts on it through its actuators, based on observations and built-in
knowledge.
Rational Agent
A rational agent is one that takes the appropriate action in response to each
perception. As a result, the performance measure is maximised, allowing an
agent to be the most successful.
Omniscient Agent
An omniscient agent is one that foresees the real outcome of its actions. In the
real world, however, such agents are impossible to create.
Rational agents differ from omniscient agents: a rational agent seeks the best
possible result given its current percepts, which may be imperfect. A chess AI
is a good example of a rational agent, because it is impossible to predict every
conceivable outcome of the current move, whereas a tic-tac-toe AI can be
omniscient because it always knows the outcome in advance.
Software Agents
A software agent is a programme that operates in a dynamic environment.
Because all of a software agent's "body parts" are software, such agents are
also known as softbots. Examples include video games and flight simulators.
1.4.1 Behavior of an Agent
An agent's behaviour can be stated mathematically as:
Agent Function:
The agent function maps any given percept sequence to an action; it can be
written f : P* -> A, where P* is the set of percept sequences and A the set of
actions. It is an abstract mathematical description.
Agent Program:
The agent programme is the practical, physical implementation of the agent
function.
An automatic hand dryer, for example, uses sensors to detect signals (hands).
When we place our hands near the dryer, it activates the heating circuit and
starts blowing air. When the signal detection fades, the heating circuit is
broken and the air blower is turned off.
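As a rough illustration of this distinction, the hand-dryer agent programme above can be sketched in a few lines of Python; the function and action names here are our own, invented for the example:

# A minimal sketch of the hand-dryer agent programme described above.
# The percept is a single boolean: does the sensor detect hands?
def hand_dryer_agent(hands_detected: bool) -> str:
    """Map the current percept directly to an action."""
    if hands_detected:
        return "heat_and_blow"   # close the heating circuit, start the blower
    return "switch_off"          # break the circuit, stop the blower

# A short percept sequence and the resulting actions:
for percept in [True, True, False]:
    print(percept, "->", hand_dryer_agent(percept))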
Rationality of an agent
An intelligent agent is required to act in a way that maximises its performance
metric. As a result, an agent's rationality is determined by four factors:
 The criterion of success is defined by the performance metric.
 The agent's understanding of the environment is pre-programmed.
 The acts that the agent is capable of executing.
 Until now, the agent's percept sequence.
Exam results, for example, are determined by both the question paper and our
knowledge.
1.4.2 Task Environment
A task environment is a problem that a rational agent is built to solve. As a
result, Russell and Norvig introduced several ways to categorise task
environments in 2003. However, we need be cognizant of the following terms
before classifying the environments:
Performance Measure: It defines the criteria used to judge how successfully
the agent achieves its goal.
Environment: It describes how the agent interacts with various types of
environments.
Actuators: It describes how the agent impacts the environment by performing
predetermined activities.
Sensors: It describes how the agent obtains information from its surroundings.
PEAS is an abbreviation for these terms (Performance measure, Environment,
Actuators, Sensors). Let's take a closer look at each part in the following
example to better understand PEAS terminology:

PEAS summary for an automated taxi driver:
Agent Type: Taxi driver
Performance Measure: Safe, fast, correct destination
Environment: Roads, traffic
Actuators: Steering, horn, brakes
Sensors: Cameras, GPS, speedometer
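If it helps to keep PEAS descriptions machine-readable, they can be held in a small record type. This is only an illustrative sketch; the class and field names below are our own, not any standard API:

from dataclasses import dataclass
from typing import List

# An illustrative container for a PEAS description (names are ours).
@dataclass
class PEAS:
    agent_type: str
    performance: List[str]
    environment: List[str]
    actuators: List[str]
    sensors: List[str]

taxi = PEAS(
    agent_type="Automated taxi driver",
    performance=["safe", "fast", "correct destination"],
    environment=["roads", "traffic"],
    actuators=["steering", "horn", "brakes"],
    sensors=["cameras", "GPS", "speedometer"],
)
print(taxi)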


1.4.3 Properties/ Classification of Task Environment
Fully Observable vs. Partially Observable
The task environment is completely visible when an agent's sensors offer
access to the complete state of the environment at any given time, whereas the
task environment is partially observable when the agent lacks comprehensive
and relevant information about the environment.
Example: The agent in the Checkers game observes the environment fully,
whereas the agent in the Poker game observes it only partially, because it
cannot see the other agent's cards.
Single-agent vs. Multiagent
Single-agent systems are used when a single agent works to achieve a goal,
whereas Multiagent systems are used when two or more agents collaborate to
achieve a goal.
Example: Playing a crossword puzzle – single agent
Playing chess –multiagent (requires two agents)
Deterministic vs. Stochastic
The environment is deterministic if the agent's present state and action entirely
dictate the next state of the environment, whereas the environment is
stochastic if the next state cannot be known from the current state and action.
Example: Image analysis – Deterministic
Taxi driving – Stochastic (cannot determine the traffic behavior)
It may appear as stochastic if the environment is only partially observable.
Episodic vs. Sequential
The environment is episodic if the agent's episodes are broken into atomic
episodes and the next episode does not depend on prior state actions, whereas
the environment is sequential if present actions may influence future
decisions.
Example:  Part-picking robot – Episodic
Chess playing – Sequential
Static vs. Dynamic
A dynamic environment is one in which the environment varies with time;
otherwise, it is static.
Example: The physical world has a dynamic environment, but crossword
puzzles have a static one.
Discrete vs. Continuous
The environment is discrete if an agent has a finite number of actions and
states, otherwise it is continuous.
Example: There are only finitely many moves in a game of checkers – Discrete
A truck can make an endless number of moves on the way to its
destination – Continuous
Known vs. Unknown
The outcomes of the agent's actions are known in a known environment, but in
an unknown environment, the agent must learn from the environment in order
to make good decisions.
Example: A tennis player understands the rules and consequences of his or
her actions, whereas a video game player must learn the rules of a new game.
Note that a known environment can be partially observable, and an unknown
environment can be fully observable.
Structure of agents
Artificial intelligence aims to create an agent programme that performs an
agent function, such as mapping percepts to actions. The execution of a
programme necessitates the use of some computer equipment with physical
sensors and actuators, which is referred to as architecture.
As a result, an agent is the result of combining architecture and programming,
i.e. agent = architecture + programming.
The distinction between an agent programme and an agent function is that the
former accepts the current percept as input while the latter takes the whole
percept history.
Types of Agent Programs
The following four categories of agents exist, each with a different amount of
intelligence and task complexity:
Simple reflex agents: This is the most basic agent; it acts solely on the current
percept and ignores the rest of the percept history. This sort of agent function
uses the condition-action rule – "If condition, then action." It can only make
correct decisions if the environment is fully observable. When the
environment is only partially observable, these agents can fall into infinite
loops, which they may escape by randomising their actions.

Example: iDraw, a drawing robot that translates entered text into writing,
works without saving previous information.
Simple reflex agents do not keep track of internal state and do not consult the
percept history.
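A condition-action agent can be sketched as a lookup over rules. The sketch below uses invented vacuum-world percepts and actions purely for illustration:

# Condition-action rules: "if condition, then action".
# The percepts and actions here are hypothetical vacuum-world ones.
RULES = {
    "dirty": "suck",
    "clean": "move",
}

def simple_reflex_agent(percept: str) -> str:
    """Act on the current percept only; no percept history is kept."""
    return RULES.get(percept, "no_op")

print(simple_reflex_agent("dirty"))  # -> suck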
Model-based agents: By maintaining some internal state, these agents can
deal with partially observable environments. The internal state is determined
by the percept history and reflects at least some of the unobserved aspects of
the current state. The internal state must therefore be updated over time, which
requires two kinds of knowledge to be encoded in the agent programme: how
the world evolves on its own, and what effects the agent's actions have.

Example: When a person walks in a lane, he maps the pathway in his mind.
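A minimal sketch of this idea in Python, using an invented two-square vacuum world as the internal model (all names here are ours, chosen for illustration):

# A minimal model-based agent sketch. The internal state remembers
# which squares are believed to be dirty, based on the percept history.
class ModelBasedAgent:
    def __init__(self):
        self.believed_dirty = set()   # internal state

    def update_state(self, percept):
        location, status = percept    # e.g. ("A", "dirty")
        if status == "dirty":
            self.believed_dirty.add(location)
        else:
            self.believed_dirty.discard(location)

    def act(self, percept):
        self.update_state(percept)    # fold the new percept into the model
        location, _ = percept
        return "suck" if location in self.believed_dirty else "move"

agent = ModelBasedAgent()
print(agent.act(("A", "dirty")))  # -> suck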


Goal-based agents: Knowing the current state of the environment is not
enough unless a goal has been set. A goal-based agent therefore chooses, from
a variety of options, a path that will help it achieve its goal.
Utility-based agents: These agents are concerned with the performance
measure: the agent chooses the actions that maximise the performance
measure on the way to the goal.
The ultimate goal of chess is to checkmate the king, but the player must first
accomplish various smaller objectives.
Utility-based agents keep track of their surroundings and, before attaining
their main goal, complete the multiple smaller goals that may appear along the
way.
Learning agents: The primary goal of these agents is to teach the agent
machines how to operate in an unfamiliar environment while also gaining as
much knowledge as possible. There are four conceptual components to a
learning agent:
Learning element: This component is in charge of making improvements.
Performance element: It is in charge of choosing external actions based on
the perceptions it has.
Critic: It gives the learning agent feedback on how well the agent is
performing, perhaps increasing the performance measure in the future.
Problem Generator: It offers acts that may lead to novel and educational
experiences.
Example: Humans are not born knowing how to speak; they learn to speak.
The goal of a learning agent is to improve the agent's overall performance.

 
Working of an agent program’s components
The purpose of agent components is to provide answers to basic queries such
as "What is the state of the world right now?" and "What do my actions
accomplish?"
We can represent the agent's surroundings in a variety of ways by
discriminating on an axis of increasing expressive strength and complexity, as
shown below:
Atomic Representation: Each state of the world is treated as indivisible; it
has no internal structure. The atomic representation is used in search and
game-playing, Hidden Markov Models, and Markov decision processes.
Factored Representation: Each state is divided into a fixed set of attributes
or variables, each of which has a value. This allows us to represent
uncertainty. The factored representation is used in constraint satisfaction,
propositional logic, Bayesian networks, and machine learning techniques.
Some variables, such as the current GPS location, can be shared by two
separate factored states, but not by two different atomic states.
Structured Representation: Here, we can express the diverse and varying
relationships that exist between distinct objects in the world. Structured
representation underlies relational databases, first-order logic, first-order
probability models, and natural language understanding.
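As a quick illustrative sketch (the particular variables are invented), the difference between an atomic and a factored state looks like this in Python:

# Atomic: the state is an opaque label with no internal structure.
atomic_state = "city_B"

# Factored: the state is a fixed set of named variables with values,
# so two different states can share a variable (e.g. the GPS location).
factored_state = {"gps": (13.08, 80.27), "fuel": 0.6, "door_open": False}
other_state = {"gps": (13.08, 80.27), "fuel": 0.4, "door_open": True}

print(factored_state["gps"] == other_state["gps"])  # shared variable -> True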

PROBLEM SOLVING APPROACH TO TYPICAL AI PROBLEMS


Reflex agents are described as the simplest agents because they map states
directly to actions. Unfortunately, these agents cannot function well in
situations where the mapping is too large to store and learn. Goal-based
agents, on the other hand, consider future actions as well as the desired
outcomes. The problem-solving agent is a kind of goal-based agent that uses
an atomic representation, with no internal states visible to the problem-solving
algorithms.
Problem-solving agents
Search techniques are universal problem-solving methods in Artificial
Intelligence. These search strategies or algorithms were generally employed
by rational agents or problem-solving agents in AI to solve a given problem
and provide the best outcome. Goal-based agents that use atomic
representation are problem-solving agents. We will learn a variety of problem-
solving search methods in this area.
According to computer science, problem-solving is a subset of artificial
intelligence that includes a variety of ways to solve a problem, such as
algorithms and heuristics.
As a result, a problem-solving agent is a goal-driven agent who is solely
concerned with achieving the goal.
Steps performed by Problem-solving agent
Goal Formulation: It is the first and most basic stage in fixing a problem. It
organises the steps/sequence needed to construct a single goal from many
goals, as well as the actions needed to achieve that goal. The agent's
performance measure and the current condition are used to formulate goals
(discussed below).
Problem Formulation: It is the most crucial phase in problem-solving since it
determines which actions should be followed to attain the stated goal. In the
formulation of a problem, there are five elements to consider:
Initial State: It is the agent's starting state or first step toward its aim.
Actions: It is a list of the various options available to the agent.
Transition Model: It explains what each action accomplishes.
Goal Test: It decides whether or not the current state is a goal state.
Path cost: Each path that leads to the goal is given a numerical cost. A cost
function is chosen by the problem-solving agent to reflect its performance
measure. Remember that the optimal option has the cheapest path cost of all
the alternatives.
The state-space of the problem is implicitly defined by the initial state, actions,
and transition model. A problem's state-space is a collection of all states that
can be achieved by following any series of activities from the beginning state.
The state-space is represented as a directed map or graph, with nodes
representing states, links between nodes representing actions, and a path
representing a series of states connected by a series of actions.
Search: It determines the best potential sequence of actions to get from the
current condition to the goal state. It receives an issue as input and returns a
solution as output.
Solution: It selects the best algorithm from a set of algorithms that can be
demonstrated to be the most optimal solution.
Execution: It uses the best optimal solution found by the searching algorithms
to get from the present state to the goal state.

Example Problems
Normally, there are two types of problem approaches:
Toy Problem: Researchers use toy problems to compare the performance of
algorithms, because they provide a succinct and exact description of the
problem.
Real-world Problem: These are problems that need to be solved in the real
world. Unlike a toy problem, a real-world problem does not come with a
single standard description, but we can still give a generic formulation of the
problem.
Some Toy Problems
8 Puzzle Problem: A 3x3 matrix holds movable tiles numbered 1 to 8 and
one blank square. A tile adjacent to the blank square can be slid into it. The
goal is to reach a goal state such as the one indicated in the diagram below.
Our goal is to slide digits into the blank space in the figure to change the
current state to the goal state.
By sliding digits into the vacant space in the above diagram, we can change
the current (Start) state to the desired state.
The problem formulation is as follows:
States: It shows where each numbered tile and the blank tile are located.
Initial State: Any state can be used as the starting point.
Actions: The blank space's actions are defined here, i.e., left, right, up, or
down.
Transition Model: It returns the final state, which is determined by the
provided state and actions.
Goal test: It determines if we have arrived at the correct goal-state.
Path cost: The path cost is the number of steps in a path, where each step
costs one unit. The 8-puzzle problem is a form of sliding-block problem that is
used to evaluate new artificial intelligence search algorithms.
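To make the formulation elements concrete, here is a small Python sketch of the 8-puzzle; the flat-tuple state encoding and the particular goal layout are our own choices for illustration:

# Sketch of the 8-puzzle formulation. A state is a tuple of 9 entries
# read row by row; 0 marks the blank square.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)
MOVES = {"up": -3, "down": 3, "left": -1, "right": 1}

def actions(state):
    """Legal moves of the blank square in a 3x3 grid."""
    i = state.index(0)
    legal = []
    if i >= 3: legal.append("up")
    if i <= 5: legal.append("down")
    if i % 3 != 0: legal.append("left")
    if i % 3 != 2: legal.append("right")
    return legal

def result(state, action):
    """Transition model: swap the blank with the neighbouring tile."""
    i = state.index(0)
    j = i + MOVES[action]
    s = list(state)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def goal_test(state):
    return state == GOAL

# Path cost: each step costs one unit, so a path's cost is its length.
print(actions((1, 2, 3, 4, 5, 6, 7, 8, 0)))  # blank at corner -> ['up', 'left']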
8-queens problem: The goal of this issue is to arrange eight queens on a
chessboard in such a way that none of them can attack another queen. A queen
can attack other queens in the same row and column or diagonally.
We can grasp the problem and its correct solution by looking at the diagram
below.
As can be seen in the diagram above, each queen is put on the chessboard in
such a way that no other queen is placed diagonally, in the same row or
column. As a result, it is one viable solution to the eight-queens dilemma.
There are two primary types of formulations for this problem:
Incremental formulation: It begins with an empty state and progresses in
steps, with the operator augmenting a queen at each step.
Following steps are involved in this formulation:
States: On the chessboard, arrange any number of queens from 0 to 8.
Initial State: A chessboard with no pieces
Actions: Fill any empty box with a queen.
Transition model: The chessboard is returned with the queen in a box.
Goal test: Checks if eight queens can be positioned on the chessboard without
being attacked.
Path cost: Because only final states are counted, there is no requirement for
path cost.
There are approximately 1.8 x 10^14 potential sequences to analyse in this
formulation.
Complete-state formulation: It begins with all eight queens on the
chessboard and moves them around the board, avoiding attacks.
Following steps are involved in this formulation
States: Each of the eight queens is arranged in a column, with no queen
assaulting the other.
Actions: Move the queen to a secure spot away from the attackers.
This formulation is superior to the incremental formulation since it shrinks the
state space from 1.8 x 10^14 states to 2,057 and makes finding solutions much
easier.
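A hedged sketch of the incremental formulation in Python follows; placing one queen per row (a common simplification) and searching depth-first are our own choices here:

# Incremental 8-queens sketch: a state is a tuple of column indices,
# one per row already filled; placing one queen per row is a common
# simplification that already prunes the state space.
def safe(cols, col):
    """Can a queen be placed in the next row at column `col`?"""
    row = len(cols)
    return all(c != col and abs(c - col) != row - r
               for r, c in enumerate(cols))

def place(cols=(), n=8):
    """Depth-first expansion: add one queen per step, never attacking."""
    if len(cols) == n:
        return cols                       # goal test: all queens placed
    for col in range(n):
        if safe(cols, col):
            solution = place(cols + (col,), n)
            if solution:
                return solution
    return None

print(place())  # e.g. (0, 4, 7, 5, 2, 6, 1, 3)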
Some Real-world problems
Traveling salesperson problem (TSP): This is a touring problem, in which
the salesman must visit each city exactly once. The goal is to discover the
shortest such tour.
VLSI Layout problem: Millions of components and connections are placed
on a chip in order to reduce area, circuit delays, and stray capacitances while
increasing manufacturing yield.
The layout problem is split into two parts:
Cell layout: The circuit's primitive components are arranged into cells, each
of which performs a distinct purpose. Each cell is the same size and shape.
The goal is to arrange the cells on the chip so that they do not overlap.
Channel routing: It determines a unique path through the spaces between the
cells for each wire.
Protein Design: The goal is to develop an amino acid sequence that will fold
into a 3D protein with the ability to treat an illness.
Searching for solutions
We've seen a lot of issues. There is now a need to look for solutions to these
problems.
In this section, we'll look at how the agent can use searching to solve an issue.
For solving different kinds of problem, an agent makes use of different
strategies to reach the goal by searching the best possible algorithms. This
process of searching is known as search strategy.
Measuring problem-solving performance
Before delving into other search tactics, it's important to assess an algorithm's
performance. As a result, there are four ways to evaluate an algorithm's
performance:
Completeness: It assesses whether the algorithm is guaranteed to discover a
solution (if any solution exists).
Optimality: It determines whether the approach finds the best solution.
Time Complexity: The amount of time it takes for an algorithm to find a
solution.
Space Complexity: The amount of memory needed to conduct a search.
The algorithm's complexity is influenced by the branching factor b (the
maximum number of successors of any node), the depth d of the shallowest
goal node (i.e., the number of steps along the path from the root), and the
maximum length m of any path in the state space.

Application of AI
Artificial intelligence is used in a variety of ways in today's society. It is
becoming increasingly important in today's world because it can efficiently
handle complicated problems in a variety of areas, including healthcare,
entertainment, banking, and education. Our daily lives are becoming more
comfortable and efficient as a result of artificial intelligence.
The following are some of the areas where Artificial Intelligence is used:
1. AI (Astronomy)
Artificial Intelligence (AI) can be extremely helpful in resolving complicated
challenges in the universe. AI technology can assist in gaining a better
understanding of the cosmos, including how it operates, its origin, and so on.
2. AI (Healthcare)
In the previous five to ten years, AI has become more beneficial to the
healthcare business and is expected to have a big impact.
AI is being used in the healthcare industry to make better and faster diagnoses
than humans. AI can assist doctors with diagnosis and can alert doctors when a
patient's condition is deteriorating so that medical assistance can be provided
before the patient is admitted to the hospital.
3. AI (Gaming)
AI can be employed in video games. AI machines can play strategic games
like chess, in which the system must consider a vast number of different
options.
4. AI (Finance)
The banking and AI businesses are the ideal complements to each other.
Automation, chatbots, adaptive intelligence, algorithm trading, and machine
learning are all being used in financial activities.
5. AI (Data Security)
Data security is critical for every business, and cyber-attacks are on the rise in
the digital age. AI can help you keep your data safe and secure. Some
examples are the AEG bot and the AI2 Platform, which are used to better
determine software bugs and cyber-attacks.
6. AI (Social Media)
Facebook, Twitter, and Snapchat, for example, have billions of user accounts
that must be kept and handled in a very efficient manner. AI has the ability to
organise and manage large volumes of data. AI can go through a large amount
of data to find the most recent trends, hashtags, and user requirements.
7. AI (Travel & Transport)
For the travel industry, AI is becoming increasingly important. AI is capable
of doing a variety of travel-related tasks, including making travel
arrangements and recommending hotels, flights, and the best routes to
customers. The travel industry is utilising AI-powered chatbots that can
engage with clients in a human-like manner to provide better and faster
service.
8. AI (Automotive Industry)
Some automotive companies are utilising artificial intelligence to provide a
virtual assistant to their users in order to improve performance. Tesla, for
example, has released TeslaBot, an intelligent virtual assistant.
Various industries are presently working on self-driving automobiles that will
make your ride safer and more secure.
9. AI (Robotics)
In Robotics, Artificial Intelligence plays a significant role. Typically,
conventional robots are programmed to execute a repetitive task; but, using
AI, we may construct intelligent robots that can perform tasks based on their
own experiences rather than being pre-programmed.
Humanoid Robots are the best instances of AI in robotics; recently, the
intelligent Humanoid Robots Erica and Sophia were built, and they can
converse and behave like people.
10. AI (Entertainment)
We already use AI-based applications in our daily lives with entertainment
providers like Netflix and Amazon. These services display software or show
recommendations using machine learning/artificial intelligence (ML/AI)
algorithms.
11. AI (Agriculture)
Agriculture is a field that necessitates a variety of resources, including effort,
money, and time, in order to get the greatest results. Agriculture is becoming
more computerised these days, and AI is becoming more prevalent in this
industry. AI is being used in agriculture in the form of agricultural robotics,
soil and crop monitoring, and predictive analysis. AI in agriculture has the
potential to be extremely beneficial to farmers.
12. AI (E-commerce)
AI is giving the e-commerce industry a competitive advantage, and it is
becoming increasingly demanded in the market. Shoppers can use AI to find
related products in their preferred size, colour, or brand.
13. AI (Education)
Grading can be automated with AI, giving the instructor more time to educate.
As a teaching assistant, an AI chatbot can communicate with students.
In the future, AI could serve as a personal virtual tutor for pupils, available at
any time and from any location.
UNIT II
PROBLEM SOLVING METHODS

PROBLEM-SOLVING METHODS IN ARTIFICIAL INTELLIGENCE


Reflex agents are described as the simplest agents because they map states
directly to actions. Unfortunately, these agents cannot function well in
situations where the mapping is too large to store and learn. Goal-based
agents, on the other hand, consider future actions as well as the desired
outcomes.
Problem-solving agent
The problem-solving agent works by precisely defining problems and their
various solutions.
According to psychology, "problem solving refers to a state in which we aim
to accomplish a defined objective from a current state or condition."
According to computer science, problem-solving is a subset of artificial
intelligence that includes a variety of ways to solve a problem, such as
algorithms and heuristics.
As a result, a problem-solving agent is a goal-driven agent who is solely
concerned with achieving the goal.
Steps performed by Problem-solving agent
Goal Formulation: It is the first and most basic stage in fixing a problem. It
organises the steps/sequence needed to construct a single goal from many
goals, as well as the actions needed to achieve that goal. The agent's
performance measure and the current condition are used to formulate goals
(discussed below).
Problem Formulation: It is the most crucial phase in problem-solving since it
determines which actions should be followed to attain the stated goal. In the
formulation of a problem, there are five elements to consider:
Initial State: It is the agent's starting state or first step toward its aim.
Actions: It is a list of the various options available to the agent.
Transition Model: It explains what each action accomplishes.
Goal Test: It decides whether or not the current state is a goal state.
Path cost: Each path that leads to the goal is given a numerical cost. A cost
function is chosen by the problem-solving agent to reflect its performance
measure. Remember that the optimal option has the cheapest path cost of all
the alternatives.
Search: It determines the best potential sequence of actions to get from the
current condition to the goal state. It receives an issue as input and returns a
solution as output.
Solution: It selects the best algorithm from a set of algorithms that can be
demonstrated to be the most optimal solution.
Execution: It uses the best optimal solution found by the searching algorithms
to get from the present state to the goal state.
Example Problems
In general, there are two sorts of problem approaches:
Toy Problems: Researchers use these to compare the performance of
algorithms, because they offer a brief and exact description of the problem.
Real-world Problems: These are problems that need to be solved in the real
world. Unlike a toy problem, a real-world problem does not come with a
single standard description, but we can still give a generic formulation of the
problem.
Some Toy Problems
8 Puzzle Problem: A 3x3 matrix holds movable tiles numbered 1 to 8 and
one blank square. A tile adjacent to the blank square can be slid into it.
The goal is to achieve a goal state that is similar to the one indicated in the
diagram below.
Our goal in the diagram below is to slide digits into the vacant space to change
the current state to the goal state.
By sliding digits into the vacant space in the above diagram, we can change
the current (Start) state to the desired state.
The problem formulation is as follows:
States: It shows where each numbered tile and the blank tile are located.
Initial State: Any state can be used as the starting point.
Actions: The blank space's actions are defined here, i.e., left, right, up, or
down.
Transition Model: It returns the final state, which is determined by the
provided state and actions.
Goal test: It determines if we have arrived at the correct goal-state.
Path cost: The path cost is the number of steps in a path, where each step
costs one unit. The 8-puzzle problem is a form of sliding-block problem that is
used to evaluate new artificial intelligence search algorithms.
8-queens problem: The goal of this issue is to arrange eight queens on a
chessboard in such a way that none of them can attack another queen. A queen
can attack other queens in the same row and column or diagonally.
We can grasp the problem and its correct solution by looking at the diagram
below.
As can be seen in the diagram above, each queen is put on the chessboard in
such a way that no other queen is placed diagonally, in the same row or
column. As a result, it is one viable solution to the eight-queens dilemma.
There are two primary types of formulations for this problem:
Incremental formulation: It begins with an empty state and progresses in
steps, with the operator augmenting a queen at each step.
Following steps are involved in this formulation:
States: On the chessboard, arrange any number of queens from 0 to 8.
Initial State: A chessboard with no pieces
Actions: Fill any empty box with a queen.
Transition model: The chessboard is returned with the queen in a box.
Goal test: Checks if eight queens can be positioned on the chessboard without
being attacked.
Path cost: Because only final states are counted, there is no requirement for
path cost.
There are approximately 1.8 x 10^14 potential sequences to analyse in this
formulation.
Complete-state formulation: It begins with all eight queens on the
chessboard and moves them around the board, avoiding attacks.
Following steps are involved in this formulation
States: Each of the eight queens is arranged in a column, with no queen
assaulting the other.
Actions: Move the queen to a secure spot away from the attackers.
This formulation is superior to the incremental formulation since it shrinks the
state space from 1.8 x 10^14 states to 2,057 and makes finding solutions much
easier.
Some Real-world problems
Traveling salesperson problem (TSP): This is a touring problem, in which
the salesman must visit each city exactly once. The goal is to discover the
shortest such tour.
VLSI Layout problem: Millions of components and connections are placed
on a chip in order to reduce area, circuit delays, and stray capacitances while
increasing manufacturing yield.
The layout problem is split into two parts:
Cell layout: The circuit's primitive components are arranged into cells, each
of which performs a distinct purpose. Each cell is the same size and shape.
The goal is to arrange the cells on the chip so that they do not overlap.
Channel routing: It determines a unique path through the spaces between the
cells for each wire.
Protein Design: The goal is to develop an amino acid sequence that will fold
into a 3D protein with the ability to treat an illness.
Searching for solutions
We've seen a lot of issues. There is now a need to look for solutions to these
problems.
In this section, we'll look at how the agent can use searching to solve an issue.
For solving different kinds of problem, an agent makes use of different
strategies to reach the goal by searching the best possible algorithms. This
process of searching is known as search strategy.
Measuring problem-solving performance
Before delving into other search tactics, it's important to assess an algorithm's
performance. As a result, there are four ways to evaluate an algorithm's
performance:
Completeness: It assesses whether the algorithm is guaranteed to discover a
solution (if any solution exists).
Optimality: It determines whether the approach finds the best solution.
Time Complexity: The amount of time it takes for an algorithm to find a
solution.
Space Complexity: The amount of memory needed to conduct a search.
The algorithm's complexity is influenced by the branching factor b (the
maximum number of successors of any node), the depth d of the shallowest
goal node (i.e., the number of steps along the path from the root), and the
maximum length m of any path in the state space.

Search Algorithms in Artificial Intelligence


Problem-solving agents
Search techniques are universal problem-solving methods in Artificial
Intelligence. These search strategies or algorithms were generally employed
by rational agents or problem-solving agents in AI to solve a given problem
and provide the best outcome. Goal-based agents that use atomic
representation are problem-solving agents. We will learn a variety of problem-
solving search methods in this area.
Search Algorithm Terminologies
Search: A step-by-step procedure for solving a search problem in a given
search space is known as searching. There are three main factors that can
contribute to a search problem:
Search Space: A search space is a collection of possible solutions that a
system could have.
Start State: It's the starting point for the agent's quest.
Goal test: It's a function that looks at the current state and returns whether or
not the goal state has been reached.
Search tree: Search tree is a tree representation of a search problem. The root
node, which corresponds to the initial state, is at the top of the search tree.
Actions: It provides the agent with a list of all available actions.
Transition model: A transition model is a description of what each action does.
Path Cost: It's a function that gives each path a numerical cost.
Solution: It is an action sequence that connects the start and end nodes.
Optimal Solution: If a solution is the cheapest of all the options.
Properties of Search Algorithms
The four most important properties of search algorithms to compare their
efficiency are as follows:
Completeness: If a search method guarantees to return a solution if at least one
solution exists for any random input, it is said to be complete.
Optimality: When a solution for an algorithm is guaranteed to be the best
solution (lowest route cost) among all alternative solutions, it is referred to as
an optimum solution.
Time Complexity: The time complexity of an algorithm is a measure of how
long it takes for it to perform its goal.
Space Complexity: It is the maximum amount of storage space necessary at
any point throughout the search due to the problem's complexity.
Types of search algorithms
We can divide search algorithms into uninformed (Blind search) and informed
(Heuristic search) algorithms based on the search problems.
Uninformed/Blind Search
The uninformed search has no domain knowledge, such as proximity or the
goal's location. It works in a brute-force manner since it simply contains
instructions on how to traverse the tree and locate leaf and goal nodes.
Uninformed search is sometimes known as blind search since it searches the
search tree without any knowledge of the search space, such as the initial
state, operators, and goal tests. It goes through the tree node by node until it
reaches the destination node.
It is broken into five categories:
Breadth-first search
Uniform cost search
Depth-first search
Iterative deepening depth-first search
Bidirectional Search
Informed Search
Informed search algorithms use domain knowledge. In an informed search,
problem information is available and can help steer the search. Informed
search strategies are more likely to find a solution efficiently than uninformed
strategies. Heuristic search is another name for informed search.
A heuristic is a method for finding a good answer in a reasonable amount of
time, even if the optimal solution is not always guaranteed.
Informed search can solve many complicated problems that cannot be solved
any other way.
The travelling salesman problem is a classic example of a problem tackled
with informed search. Examples of informed search algorithms include:
Greedy Search
A* Search
Uninformed Search Algorithms
Uninformed search is a type of general-purpose search method that uses brute
force to find results. Uninformed search algorithms have no other information
about the state or search space except how to traverse the tree, which is why
it's also known as blind search.
The various forms of uninformed search algorithms are as follows:
Breadth-first Search
Depth-first Search
Depth-limited Search
Iterative deepening depth-first search
Uniform cost search
Bidirectional Search
Breadth-first Search
Breadth-first search (BFS) is the most common search approach for
traversing a tree or graph; the algorithm is so named because it searches
breadthwise.
The BFS algorithm starts at the tree's root node and expands all successor
nodes at the current level before moving on to nodes of the next level.
The breadth-first search algorithm is an example of a general-graph search
algorithm.
Breadth-first search is implemented using a FIFO queue data structure.
Advantages
If a solution is available, BFS will provide it.
If there are multiple answers to a problem, BFS will present the simplest
solution with the fewest steps.
Disadvantages
It necessitates a large amount of memory since each level of the tree must be
saved into memory before moving on to the next.
If the solution is located far from the root node, BFS will take a long time.
Example
We've showed how to traverse the tree using the BFS method from the root
node S to the destination node K in the diagram below. Because the BFS
search algorithm traverses in layers, it will follow the dotted arrow's path, and
the travelled path will be:
S---> A--->B---->C--->D---->G--->H--->E---->F---->I---->K 

Time Complexity: The time complexity of BFS is measured by the number of
nodes traversed until the shallowest goal node, where d is the depth of the
shallowest solution and b is the branching factor at each state:
T(b) = 1 + b + b^2 + b^3 + ... + b^d = O(b^d)
Space Complexity: The space complexity of the BFS algorithm is determined
by the memory size of the frontier, which is O(b^d).
Completeness: BFS is complete, which implies it will discover a solution if
the shallowest target node is at a finite depth.
Optimality: BFS is optimal if the path cost is a non-decreasing function of the
node depth.
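A hedged sketch of BFS over an explicit graph in Python follows; the adjacency dict below is a made-up example, not the exact tree from the figure:

from collections import deque

def bfs(graph, start, goal):
    """Breadth-first search with a FIFO frontier; returns a path or None."""
    frontier = deque([[start]])          # queue of paths, shallowest first
    explored = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for child in graph.get(node, []):
            if child not in explored:
                explored.add(child)
                frontier.append(path + [child])
    return None

graph = {"S": ["A", "B"], "A": ["C"], "B": ["D"], "C": [], "D": ["K"]}
print(bfs(graph, "S", "K"))  # -> ['S', 'B', 'D', 'K']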
Depth-first Search
A recursive approach for traversing a tree or graph data structure is depth-first
search.
The depth-first search is named after the fact that it begins at the root node and
follows each path to its greatest depth node before moving on to the next path.
DFS is implemented using a stack data structure.
The DFS algorithm works in a similar way as the BFS method.
Advantages
Because DFS only needs to store a stack of nodes on the path from the root
node to the current node, it uses extremely little memory.
DFS reaches the goal node in less time than BFS (if it traverses along the
right path).
Disadvantages
There's a chance that many states will recur, and there's no certainty that a
solution will be found.
The DFS algorithm performs deep searching and may occasionally enter an
infinite cycle.
Example
The flow of depth-first search is depicted in the search tree below, and it will
proceed in the following order:
root node—>left node ----> right node.
It will begin its search from root node S and traverse A, B, D, and E; after
traversing E, it will backtrack the tree because E has no successors and the
goal node has yet to be discovered. It will retreat to node C and then to node
G, where it will end because it has reached the goal node.
Completeness: Because it expands every node within a constrained search
tree, the DFS search method is complete within finite state space.
Time Complexity: DFS's time complexity is proportional to the number of
nodes traversed by the algorithm:
T(n) = 1 + n + n^2 + n^3 + ... + n^m = O(n^m)
where m is the maximum depth of any node, which can be much greater than
d (the depth of the shallowest solution).
Space Complexity: Because the DFS method only needs to store a single path
from the root node, its space complexity equals the size of the fringe set,
which is O(b·m).
Optimal: The DFS search algorithm is not optimal, since it can take a large
number of steps or incur a high cost to reach the goal node.
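A hedged Python sketch of DFS with an explicit stack follows; the adjacency dict mirrors the traversal order described above but is our own invention:

def dfs(graph, start, goal):
    """Depth-first search with an explicit stack; returns a path or None."""
    frontier = [[start]]                 # stack of paths, deepest first
    explored = set()
    while frontier:
        path = frontier.pop()
        node = path[-1]
        if node == goal:
            return path
        if node in explored:
            continue
        explored.add(node)
        for child in reversed(graph.get(node, [])):
            frontier.append(path + [child])
    return None

graph = {"S": ["A", "C"], "A": ["B", "E"], "B": ["D"], "C": ["G"],
         "D": [], "E": [], "G": []}
print(dfs(graph, "S", "G"))  # explores A, B, D, E, then backtracks -> ['S', 'C', 'G']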
Depth-Limited Search Algorithm
A depth-limited search algorithm works similarly to a depth-first search but
with a limit. The problem of the endless path in the Depth-first search can be
overcome by depth-limited search. The node at the depth limit will be treated
as if it has no additional successor nodes in this procedure.
Two conditions of failure can be used to end a depth-limited search:
Standard failure value: It denotes the absence of a solution to the situation.
Cutoff failure value: Within a specified depth limit, it defines no solution to
the problem.
Advantage
Memory is saved by using a depth-limited search.
Disadvantage
Incompleteness is another drawback of depth-limited search.
If there are multiple solutions to an issue, it may not be the best option.
Example:

Completeness: The DLS search procedure is complete if the solution lies
within the depth limit.
Time Complexity: The time complexity of the DLS algorithm is O(b^ℓ).
Space Complexity: Space complexity for DLS algorithm is O(b×ℓ).
Optimal: Depth-limited search can be noted as a special case of DFS, and it is
also not optimal even if ℓ>d.
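A minimal recursive sketch of depth-limited search in Python, distinguishing the two failure values described above (the graph is a made-up example):

CUTOFF = "cutoff"

def depth_limited_search(graph, node, goal, limit):
    """Recursive DLS: returns a path, None (standard failure) or CUTOFF."""
    if node == goal:
        return [node]
    if limit == 0:
        return CUTOFF                    # cutoff failure: depth limit reached
    cutoff_occurred = False
    for child in graph.get(node, []):
        result = depth_limited_search(graph, child, goal, limit - 1)
        if result == CUTOFF:
            cutoff_occurred = True
        elif result is not None:
            return [node] + result
    return CUTOFF if cutoff_occurred else None  # standard failure: no solution

graph = {"S": ["A", "B"], "A": ["C"], "B": ["D"], "C": [], "D": []}
print(depth_limited_search(graph, "S", "D", limit=2))  # -> ['S', 'B', 'D']
print(depth_limited_search(graph, "S", "D", limit=1))  # -> cutoff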
Uniform-cost Search Algorithm
A searching algorithm for traversing a weighted tree or graph is uniform-cost
search. When a separate cost is provided for each edge, this algorithm is used.
The uniform-cost search's main purpose is to discover the shortest path to the
goal node with the lowest cumulative cost. Uniform-cost search grows nodes
from the root node based on their path costs. It can be used to solve any graph
or tree in which the best cost is required. The priority queue employs a
uniform-cost search algorithm. It gives the lowest total cost the highest
priority. If the path cost of all edges is the same, uniform cost search is
comparable to BFS algorithm.
Advantage
Because the path with the lowest cost is chosen at each state, uniform cost
search is the best option.
Disadvantage
It is unconcerned with the number of steps involved in the search and is just
concerned with the expense of the path. As a result, this algorithm may
become stuck in an endless cycle.
Completeness:
The uniform-cost search is complete, so if a solution exists, UCS will discover
it.
Time Complexity:
Let C* be the cost of the optimal solution, and ε the minimum cost of a step
toward the goal node. Then the number of steps is C*/ε + 1 (we add 1 because
we start from state 0 and end at C*/ε).
Hence, the worst-case time complexity of uniform-cost search is
O(b^(1 + ⌊C*/ε⌋)).
Space Complexity:
The same logic applies for space complexity, so the worst-case space
complexity of uniform-cost search is O(b^(1 + ⌊C*/ε⌋)).
Optimal:
Uniform-cost search is always the best option because it only chooses the
cheapest path.
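A minimal sketch of uniform-cost search, assuming edges carry numeric costs
and using Python's heapq module as the priority queue (the graph layout is a
hypothetical assumption, not from the text):

import heapq

def uniform_cost_search(graph, start, goal):
    # graph: {node: [(successor, edge_cost), ...]}
    frontier = [(0, start, [start])]     # priority queue ordered by path cost
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)   # cheapest path first
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for successor, step_cost in graph.get(node, []):
            heapq.heappush(frontier,
                           (cost + step_cost, successor, path + [successor]))
    return None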
Iterative deepening depth-first Search
The iterative deepening algorithm is a blend of DFS and BFS algorithms. This
search technique determines the appropriate depth limit by gradually raising it
until a goal is discovered.
This algorithm searches in depth first up to a specific "depth limit," then
increases the depth limit for each iteration until the objective node is
discovered.
This search algorithm combines the speed of breadth-first search with the
memory efficiency of depth-first search.
When the search space is huge and the depth of the goal node is unknown, the
iterative search technique is useful for uninformed search.
Advantages
In terms of quick search and memory efficiency, it combines the advantages of
the BFS and DFS search algorithms.
Disadvantages:
The biggest disadvantage of IDDFS is that it duplicates all of the preceding
phase's work.
Example
The iterative deepening depth-first search is seen in the tree structure below.
The IDDFS algorithm iterates until it can't find the goal node any longer. The
algorithm's iteration is described as follows:

1'st Iteration-----> A
2'nd Iteration----> A, B, C
3'rd Iteration------>A, B, D, E, C, F, G
4'th Iteration------>A, B, D, H, I, E, C, F, K, G
The method will find the goal node in the fourth iteration.
Completeness:
If the branching factor is finite, this procedure is complete.
Time Complexity:
If b is the branching factor and d is the depth of the goal, the worst-case
time complexity is O(b^d).
Space Complexity:
The space complexity of IDDFS is O(bd), i.e., proportional to b·d.
Optimal:
If path cost is a non-decreasing function of node depth, the IDDFS algorithm
is optimal.
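A minimal sketch of iterative deepening, reusing the depth_limited_search
function from the depth-limited search sketch above (the max_depth bound is
an arbitrary safeguard, not part of the algorithm):

def iterative_deepening_search(graph, start, goal, max_depth=50):
    for limit in range(max_depth):
        result = depth_limited_search(graph, start, goal, limit)
        if result != 'cutoff':
            return result            # a path, or None if no solution exists
    return None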
Bidirectional Search Algorithm
To discover the goal node, the bidirectional search algorithm does two
simultaneous searches, one from the initial state (forward-search) and the
other from the goal node (backward-search). Bidirectional search splits a
single search graph into two small subgraphs, one starting from a beginning
vertex and the other from the destination vertex. When these two graphs
intersect, the search comes to an end.
BFS, DFS, DLS, and other search algorithms can be used in bidirectional
search.
Advantages
Searching in both directions is quick.
It takes less memory to do a bidirectional search.
Disadvantages
The bidirectional search tree is challenging to implement.
In bidirectional search, the objective state should be known ahead of time.
Example
The bidirectional search technique is used in the search tree below. One
graph/tree is divided into two sub-graphs using this approach. In the forward
direction, it begins at node 1 and in the reverse direction, it begins at goal node
16.
The process comes to a halt at node 9, when two searches collide.
Completeness: If we use BFS in both searches, we get a complete bidirectional
search.
Time Complexity: The time complexity of bidirectional search using BFS
is O(b^(d/2)), since each search only needs to proceed to half the depth.
Space Complexity: The space complexity of bidirectional search is O(b^(d/2)).
Optimal: Bidirectional search is Optimal.
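A minimal sketch of bidirectional search using BFS from both ends, assuming
an undirected graph given as an adjacency dictionary; for brevity it returns
only the meeting node (like node 9 in the example) rather than reconstructing
the full path:

from collections import deque

def bidirectional_search(graph, start, goal):
    if start == goal:
        return start
    frontier_f, frontier_b = deque([start]), deque([goal])
    visited_f, visited_b = {start}, {goal}
    while frontier_f and frontier_b:
        # advance each search by one node per round
        for frontier, own, other in ((frontier_f, visited_f, visited_b),
                                     (frontier_b, visited_b, visited_f)):
            node = frontier.popleft()
            for successor in graph.get(node, []):
                if successor in other:
                    return successor     # the two searches have met here
                if successor not in own:
                    own.add(successor)
                    frontier.append(successor)
    return None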
Informed Search Algorithms
So far, we've discussed uninformed search algorithms that scoured the search
space for all possible answers to the problem without having any prior
knowledge of the space. However, an educated search algorithm includes
information such as how far we are from the objective, the cost of the trip, and
how to get to the destination node. This knowledge allows agents to explore
less of the search area and discover the goal node more quickly.
For huge search spaces, the informed search algorithm is more useful. Because
the informed search algorithm is based on the concept of heuristics, it is also
known as heuristic search.
Heuristics function: Informed Search employs a heuristic function to
determine the most promising path. It takes the agent's current state as input
and outputs an estimate of how near the agent is to the goal. The heuristic
method, on the other hand, may not always provide the optimum solution, but
it guarantees that a good solution will be found in a fair amount of time. A
heuristic function determines how close a state is to the desired outcome. It
calculates the cost of an ideal path between two states and is represented by
h(n). The heuristic function's value is always positive.
Admissibility of the heuristic function is given as:
h(n) <= h*(n)
Here h(n) is the heuristic cost, and h*(n) is the true optimal cost. Hence
the heuristic cost should be less than or equal to the true cost, i.e., the
heuristic never overestimates.
Pure Heuristic Search
The simplest type of heuristic search algorithm is pure heuristic search. It
expands nodes according to their heuristic value h(n). It maintains two
lists: an OPEN list and a CLOSED list. Nodes that have already been expanded
are placed in the CLOSED list, and nodes that have not yet been expanded go
in the OPEN list.
Each iteration, the lowest heuristic value node n is extended, and all of its
successors are generated, and n is added to the closed list. The algorithm keeps
running until a goal state is discovered.
We shall cover two main algorithms in the informed search, which are listed
below:
Best First Search Algorithm(Greedy search)
A* Search Algorithm
Best-first Search Algorithm (Greedy Search)
The greedy best-first search algorithm always chooses the path that appears to
be the most appealing at the time. It's the result of combining depth-first and
breadth-first search algorithms. It makes use of the heuristic function as well
as search. We can combine the benefits of both methods with best-first search.
At each step, we can use best-first search to select the most promising node.
We expand the node that is closest to the goal node in the best-first search
method, and the closeness is estimated using a heuristic function, i.e.
f(n) = h(n)
where h(n) = estimated cost from node n to the goal.
The priority queue implements the greedy best first algorithm.
Best first search algorithm:
Stage 1: Place the starting node into the OPEN list.
Stage 2: If the OPEN list is empty, Stop and return failure.
Stage 3: Remove the node n, from the OPEN list which has the lowest value of
h(n), and places it in the CLOSED list.
Stage 4: Expand the node n, and generate the successors of node n.
Stage 5: Check each of node n's descendants to see if any of them is a goal
node. Return success and end the search if any successor node is a goal node;
otherwise, proceed to Stage 6.
Stage 6: The algorithm looks for the evaluation function f(n) for each
successor node, then determines if the node has been in the OPEN or
CLOSED list. Add the node to the OPEN list if it isn't already on both lists.
Stage 7: Return to Stage 2.
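The stages above can be condensed into a short sketch. Here is a minimal
greedy best-first search in Python, assuming the heuristic values h(n) are
supplied as a dictionary and using heapq so that the OPEN list always yields
the node with the lowest h(n):

import heapq

def greedy_best_first_search(graph, h, start, goal):
    open_list = [(h[start], start, [start])]   # ordered by heuristic value
    closed = set()
    while open_list:
        _, node, path = heapq.heappop(open_list)
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for successor in graph.get(node, []):
            if successor not in closed:
                heapq.heappush(open_list,
                               (h[successor], successor, path + [successor]))
    return None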
Advantages
By combining the benefits of both algorithms, best first search may transition
between BFS and DFS.
This method outperforms the BFS and DFS algorithms in terms of efficiency.
Disadvantages:
In the worst-case scenario, it can act like an unguided depth-first search.
As with DFS, it's possible to get stuck in a loop.
This algorithm isn't the best.
Example:
Consider the search problem below, which we'll solve with greedy best-first
search. Each node is extended at each iteration using the evaluation function
f(n)=h(n), as shown in the table below.
We're going to use two lists in this example: OPEN and CLOSED Lists. The
iterations for traversing the aforementioned example are listed below.
Expand the nodes of S and put them in the CLOSED list:
Initialization: Open [A, B], Closed [S]
Iteration 1:    Open [A], Closed [S, B]
Iteration 2:    Open [E, F, A], Closed [S, B]
                Open [E, A], Closed [S, B, F]
Iteration 3:    Open [I, G, E, A], Closed [S, B, F]
                Open [I, E, A], Closed [S, B, F, G]
Hence the final solution path will be: S----> B----->F----> G
Time Complexity: The worst-case time complexity of greedy best-first search
is O(b^m).
Space Complexity: The worst-case space complexity of greedy best-first search
is O(b^m), where m is the maximum depth of the search space.
Complete: Greedy best-first search is incomplete, even if the given state
space is finite.
Optimal: Greedy best-first search is not optimal.
A* Search Algorithm
The most well-known type of best-first search is the A* search. It employs
the heuristic function h(n) together with g(n), the cost of reaching node n
from the start state. It solves the problem efficiently by combining UCS and
greedy best-first search features. Using the heuristic function, the A*
search algorithm finds the shortest path through the search space. This
search algorithm expands a smaller search tree and delivers the best results
faster. The A* method is similar to UCS, but instead of g(n) it uses
g(n) + h(n).
We employ a search heuristic as well as the cost to reach the node in the A*
search algorithm. As a result, we can add both costs together as follows, and
this total is referred to as the fitness number:
f(n) = g(n) + h(n)
Only the nodes with the lowest value of f(n) are extended at each point in the
search space, and the procedure ends when the goal node is located.
Algorithm of A* search
Stage1: Place the starting node in the OPEN list.
Stage 2: Check if the OPEN list is empty or not, if the list is empty then return
failure and stops.
Stage 3: Select the node from the OPEN list which has the smallest value of
evaluation function (g+h), if node n is goal node then return success and stop,
otherwise
Stage 4: Expand node n and generate all of its successors, and put n into the
closed list. For each successor n', check whether n' is already in the OPEN or
CLOSED list, if not then compute evaluation function for n' and place into
Open list.
Stage 5: Else if node n' is already in OPEN and CLOSED, then it should be
attached to the back pointer which reflects the lowest g(n') value.
Stage 6: Return to Stage 2.
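A minimal A* sketch under the same assumptions as the earlier searches
(adjacency dictionary with edge costs, heuristic values in a dictionary); it
orders the OPEN list by f(n) = g(n) + h(n):

import heapq

def a_star_search(graph, h, start, goal):
    # frontier entries: (f, g, node, path) with f = g + h(node)
    open_list = [(h[start], 0, start, [start])]
    best_g = {start: 0}
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return g, path
        for successor, step_cost in graph.get(node, []):
            new_g = g + step_cost
            if new_g < best_g.get(successor, float('inf')):
                best_g[successor] = new_g
                heapq.heappush(open_list, (new_g + h[successor], new_g,
                                           successor, path + [successor]))
    return None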
Advantages
The A* search algorithm outperforms all other search algorithms.
The A* search algorithm is ideal and comprehensive.
This method is capable of resolving extremely difficult issues.
Disadvantages
Because it is primarily reliant on heuristics and approximation, it does not
always yield the shortest path.
The A* search algorithm has some concerns with complexity.
The fundamental disadvantage of A* is that it requires a lot of memory
because it maintains all created nodes in memory, which makes it unsuitable
for a variety of large-scale issues.
Example
We'll use the A* method to explore the given graph in this example. We'll
calculate the f(n) of each state using the formula f(n)= g(n) + h(n), where g(n)
is the cost of reaching any node from the start state.
We'll use the OPEN and CLOSED lists here.
Solution
Initialization: {(S, 5)}
Iteration1: {(S--> A, 4), (S-->G, 10)}
Iteration2: {(S--> A-->C, 4), (S--> A-->B, 7), (S-->G, 10)}
Iteration3: {(S--> A-->C--->G, 6), (S--> A-->C--->D, 11), (S--> A-->B, 7),
(S-->G, 10)}
Iteration 4 will give the final result, as S--->A--->C--->G it provides the
optimal path with cost 6.
Points to remember:
The A* algorithm returns the path found first; it does not search all
remaining paths.
The efficiency of the A* algorithm depends on the quality of the heuristic.
The A* algorithm expands all nodes which satisfy the condition f(n) < C*,
where C* is the cost of the optimal solution.
Complete: A* algorithm is complete as long as:
Branching factor is finite.
Cost at every action is fixed.
Optimal: A* search algorithm is optimal if it follows below two conditions:
Admissible: The first requirement for optimality is that h(n) be an
admissible heuristic in A* tree search. An admissible heuristic is
optimistic: it never overestimates the true cost.
Consistency: For A* graph search only, the second required condition is
consistency.
A* tree search will always find the least-cost path if the heuristic function
is admissible.
Time Complexity: The A* search algorithm's time complexity depends on the
heuristic function, and the number of nodes expanded is exponential in the
depth of the solution d. So, where b is the branching factor, the time
complexity is O(b^d).
Space Complexity: The space complexity of the A* search algorithm is O(b^d).
Heuristic Functions in Artificial Intelligence
Heuristic Functions in AI: As we've already seen, an informed search makes
use of heuristic functions in order to get closer to the goal node. As a result,
there are various ways to get from the present node to the goal node in a
search tree. It is undeniably important to choose a decent heuristic function.
The usefulness of a heuristic function is determined by its efficiency: the
more information it captures about the problem, the less time the search
takes.
A heuristic function can help solve some toy problems more efficiently, such
as 8-puzzle, 8-queen, tic-tac-toe, and so on. Let's have a look at how:
Consider the eight-puzzle issue below, which has a start and a target state. Our
goal is to slide the tiles from the current/start state into the goal state in the
correct order. There are four possible movements: left, right, up, and down.
There are various ways to transform the current/start state to the desired state,
but we can solve the problem more efficiently by using the heuristic function
h(n).
A heuristic function for the 8-puzzle problem is defined below:
h(n) = number of tiles out of place (compared with the goal state).
So, there are three tiles out of place, namely 6, 5, and 4 (the empty tile in
the goal state is not counted). Hence h(n) = 3 in this case. The value of
h(n) must now be minimised until it reaches 0.
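A minimal sketch of this heuristic; states are 9-tuples read row by row with
0 for the blank, and the concrete start layout is a hypothetical stand-in for
the figure, chosen so that h(n) = 3:

def misplaced_tiles(state, goal):
    # h(n): count tiles (ignoring the blank, 0) that differ from the goal
    return sum(1 for tile, target in zip(state, goal)
               if tile != 0 and tile != target)

goal  = (1, 2, 3, 4, 5, 6, 7, 8, 0)
start = (1, 2, 3, 4, 5, 0, 6, 7, 8)    # hypothetical start layout
print(misplaced_tiles(start, goal))    # prints 3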
To reduce the h(n) value to 0, we can build a state-space tree as shown below:
The objective state is minimised from h(n)=3 to h(n)=0, as seen in the state
space tree above. However, depending on the requirement, we can design and
employ a number of heuristic functions. A heuristic function h(n) can
alternatively be defined as the knowledge needed to solve a problem more
efficiently, as shown in the previous example. The information can be related
to the nature of the state, the cost of changing states, the characteristics of
target nodes, and so on, and is stated as a heuristic function.
Properties of a Heuristic search Algorithm
The following qualities of a heuristic search algorithm result from the use of
heuristic functions in a heuristic search algorithm:
Admissible Condition: If an algorithm gives an optimal solution, it is said to
be acceptable.
Completeness: If an algorithm ends with a solution, it is said to be complete (if
the solution exists).
Dominance Property: If A1 and A2 are both admissible heuristic algorithms
with h1 and h2 heuristic functions, A1 is said to dominate A2 if h1 is better
than h2 for all node n values.
Optimality Property: If an algorithm is complete, acceptable, and dominates
other algorithms, it is the best and will almost always produce the best result.
Local Search Algorithms and Optimization Problem
The informed and uninformed search expands the nodes in two ways: by
remembering different paths and selecting the best suitable path, which leads
to the solution state required to reach the destination node. But, in addition to
these "classical search algorithms," there are some "local search algorithms"
that ignore path cost and focus just on the solution-state required to reach the
destination node.
Instead of visiting numerous paths and following the neighbours of a single
current node, a local search algorithm completes its mission by following the
neighbours of that node in general.
Although local search algorithms are not systematic, still they have the
following two advantages:
 Because they only work on a single path, local search algorithms
consume a little or constant amount of memory.
 In huge or infinite state spaces, where classical or systematic
algorithms fail, they frequently discover a suitable solution.
Is it possible to use the local search algorithm to solve a pure optimised
problem?
Yes, for pure optimised issues, the local search technique works. A pure
optimization problem is one that can be solved by all nodes. However,
according to the objective function, the goal is to discover the optimal state out
of all of them. Unfortunately, the pure optimization issue fails to identify good
solutions for getting from the current condition to the objective state.
In various contexts of optimization issues, an objective function is a function
whose value is either minimised or maximised. An objective function in
search algorithms can be the path cost to the goal node, for example.
Working of a Local search algorithm
Let's look at how a local search algorithm works with the help of an example:
Consider the state-space landscape below, which includes both:
Location: It is defined by the state.
Elevation: The value of the objective function or heuristic cost function
defines it.
The local search algorithm explores the above landscape by locating the
following two points:
Global Minimum: If elevation corresponds to cost, the aim is to find the
lowest valley, which is the Global Minimum.
Global Maxima: If elevation corresponds to an objective function, the aim is
to find the highest peak, which is the Global Maxima.
In the Hill-climbing search, we will gain a deeper understanding of how these
points work.
Here are some examples of different kinds of local searches:
Hill-climbing Search
Simulated Annealing
Local Beam Search
Hill Climbing Algorithm in AI
Hill Climbing Algorithm: Hill climbing is a local search algorithm. The goal
of the hill climbing search is to climb a hill and reach its highest peak or
summit. It is based on the heuristic search strategy, in which the person
ascending the hill estimates the direction that will take him to the highest
peak.
State-space Landscape of Hill climbing algorithm
Consider the landscape below, which represents the objective state/peak and
the climber's current state, to grasp the concept of a hill climbing algorithm.
The geographical regions depicted in the diagram can be described as follows:
Global Maximum: It is the highest point on the hill, which is the goal state.
Local Maximum: It is the peak higher than all other peaks but lower than the
global maximum.
Flat local maximum: It is the flat area over the hill where it has no uphill or
downhill. It is a saturated point of the hill.
Shoulder: It is also a flat area where the summit is possible.
Current state: It is the current position of the person.
Types of Hill climbing search algorithm
There are following types of hill-climbing search:
Simple hill climbing
Steepest-ascent hill climbing
Stochastic hill climbing
Random-restart hill climbing
Simple hill climbing search
The most basic approach for climbing a hill is simple hill climbing. The goal
is to reach the mountain's highest peak. The climber moves one step at a
time: if the next step is better than the previous one, he moves ahead,
otherwise he remains in the same state. This search only compares the current
state with its immediate successor.
Simple hill climbing Algorithm
Create a CURRENT node, a NEIGHBOUR node, and a GOAL node.
If the CURRENT node = GOAL node, return GOAL and terminate the search.
Else, if the NEIGHBOUR node is better than the CURRENT node, move to the
NEIGHBOUR node.
Loop until the goal is reached or no better neighbour can be found.
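A minimal sketch of simple hill climbing, assuming caller-supplied neighbours
and value functions (both hypothetical); it accepts the first improving
neighbour and stops when none exists:

def simple_hill_climbing(initial, neighbours, value):
    # accept the first neighbour that improves the objective value
    current = initial
    while True:
        improved = False
        for candidate in neighbours(current):
            if value(candidate) > value(current):
                current = candidate
                improved = True
                break
        if not improved:
            return current     # no better neighbour: a peak (possibly local)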
Steepest-ascent hill climbing
Simple hill climbing search is not the same as steepest-ascent hill climbing.
Unlike a traditional hill-climbing search, it considers all subsequent nodes,
compares them, and selects the node that is closest to the answer. Because it
focuses on each node instead of just one, steepest hill climbing search is akin
to best-first search.
Note: When no better node exists, both simple and steepest-ascent hill
climbing searches fail.
Steepest-ascent hill climbing algorithm
 Create a CURRENT node and a GOAL node.
 If the CURRENT node=GOAL node, return GOAL and terminate the
search.
 Loop while a better successor node can be found to reach the solution.
 If there is any better successor node present, expand it.
 When the GOAL is attained, return GOAL and terminate.
Stochastic hill climbing
The focus of stochastic hill climbing is not on all of the nodes. It chooses one
node at random and determines whether it should be extended or replaced.
Random-restart hill climbing
The try-and-try approach is used in the random-restart algorithm. It restarts
the search repeatedly from new states, selecting the best option at each
step, until the goal is found. The shape of the hill is the most important
factor in determining
success. It is easier to reach the target if there are few plateaus, local maxima,
and ridges.
Limitations of Hill climbing algorithm
The hill climbing algorithm is a lightning-fast method. It quickly determines
the solution state because improving a bad state is relatively simple. However,
this search has the following limitations:
Local Maxima: It is the mountain peak that is higher than all of its neighbours
but lower than the global maxima. Because there is another peak higher than
it, it is not the objective peak.
Plateau: It is a place with a flat surface and no uphill. The climber finds it
challenging to determine the direction he should travel in order to reach the
goal point. The person may become disoriented in the flat region.
Ridges: It is a difficult situation in which a person frequently finds two or
more local maxima of the same height. It becomes tough for the person to find
the correct direction and remain focused on it.
Simulated Annealing
Simulated annealing is a variant of the hill climbing algorithm that combines
efficiency with completeness. Instead of always choosing the best move, it
chooses a random move. If the move improves the current situation, it is
always accepted as a step toward the solution state; otherwise, the move is
accepted with a probability smaller than one. This search method was first
used to tackle VLSI layout challenges in the 1980s. It is also used for plant
planning and other large-scale optimization projects.
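A minimal sketch of the simulated-annealing acceptance rule, assuming a
caller-supplied random_neighbour function and a cooling schedule (both
hypothetical); worsening moves are accepted with probability e^(ΔE/T):

import math
import random

def simulated_annealing(initial, random_neighbour, value, schedule):
    # schedule(t) gives the temperature at time t, e.g. lambda t: 100 * 0.95**t
    current, t = initial, 0
    while True:
        T = schedule(t)
        if T <= 1e-6:
            return current               # frozen: stop
        candidate = random_neighbour(current)
        delta = value(candidate) - value(current)
        # always accept improvements; accept worse moves with prob. e^(delta/T)
        if delta > 0 or random.random() < math.exp(delta / T):
            current = candidate
        t += 1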
Local Beam Search
Random-restart search is not the same as local beam search. Instead of simply
one state, it keeps track of k. It chooses k randomly generated states and
expands them one at a time. The search ends with success if any state is a
desired state. Otherwise, it chooses the top k successors from the entire list
and continues the procedure. While each search process works independently
in random-restart search, the essential information is shared throughout the
parallel search processes in local beam search.
Disadvantages of Local Beam search
The absence of diversity among the k states may make this search difficult.
It's a more expensive version of the hill climbing search.
Note: Stochastic Beam Search is a version of Local Beam Search that
chooses k successors at random rather than the best k successors.
Searching with Partial Observations
The key concept is the belief state: it represents the agent's current belief
about the possible physical states, given the sequence of actions and
percepts up to that point. There are 3 scenarios:
– Searching with no observation
– Searching with observations
- Solving partially observable problems
Searching with no observations: Sensorless
Sensorless = conformant
Example: Vacuum world with no sensors:
– Agent knows geography of environment
– Agent doesn’t know its location or distribution of dirt
– Initial state: {1, 2, 3, 4, 5, 6, 7, 8}
– Action outcomes:
• [Right]: possible successor states: {2, 4, 6, 8}
• [Right, Suck]: {4, 8}
• [Right, Suck, Left, Suck]: {7}
Sensorless problems: Search in the Space of Belief States
Beliefs are fully observable.
Belief states: every possible set of physical states; N physical states give
2^N belief states.
• Initial state: Typically the set of all physical states
• Actions: Either the union or intersection of the legal actions for the current
belief states
• Transition model: set of all possible states that could result from taking any
of the actions in any of the belief states
• Goal test: all states in current belief set are goal states
• Path cost: application-specific (it depends)
Challenge: size of belief state
Example: the belief state for a 10 x 10 vacuum world has 100 × 2^100 ≈ 10^32
physical states!
Alternatives:
Better (more compact) representation
Solving problem incrementally (incremental belief-state search)
E.g.,
solve for first state,
see if it works for other states;
if not, find another solution for first state,
and iterate
But, generally tough to solve w/o sensors!
Belief-state space for deterministic, sensorless vacuum world:
there are 2^8 = 256 possible belief states, but only 12 reachable belief
states.

Searching with observations
Define PERCEPT(s) that returns the agent’s percept, given the state s
• E.g., In Vacuum world, PERCEPT(state1)=[A,Dirty]
Special cases:
– Fully observable problems: PERCEPT(s) = s
– Sensorless problems: PERCEPT(s) = Null
Vacuum World Examples
• PERCEPT = [A,Dirty] yields belief state {1, 3}
• PERCEPT = [B,Clean] yields belief state {4, 8}
Example Transitions (Grey circles represent belief states)
Deterministic world: Fig A
Slippery World : Fig B
Searching with observations
Prediction: given action a in belief state b, predict the resulting belief
state: b̂ = PREDICT(b, a)
• Observation prediction: determine the set of percepts o that could be
observed in the predicted belief state:
POSSIBLE-PERCEPTS(b̂) = {o : o = PERCEPT(s) and s ∈ b̂}
• Update: determine the belief state that results from each possible percept
(i.e., which set of states in b̂ could have produced the percept):
b' = UPDATE(b̂, o) = {s : o = PERCEPT(s) and s ∈ b̂}
• Then, obtain the possible belief states resulting from the action and each
subsequent possible percept:
RESULTS(b, a) = {b' : b' = UPDATE(PREDICT(b, a), o) and
                 o ∈ POSSIBLE-PERCEPTS(PREDICT(b, a))}
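These three operations translate almost directly into code. A minimal sketch,
assuming caller-supplied result (the transition model) and percept functions
(both hypothetical) and belief states represented as Python sets:

def predict(b, a, result):
    # b-hat = PREDICT(b, a): states reachable by taking action a in any s of b
    return {result(s, a) for s in b}

def possible_percepts(b_hat, percept):
    # all percepts that could be observed in the predicted belief state
    return {percept(s) for s in b_hat}

def update(b_hat, o, percept):
    # states of b-hat that could have produced percept o
    return {s for s in b_hat if percept(s) == o}

def results(b, a, result, percept):
    # all belief states reachable via action a and one subsequent percept
    b_hat = predict(b, a, result)
    return [update(b_hat, o, percept) for o in possible_percepts(b_hat, percept)]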
Online search problems
Until now, there was offline search, in which a complete solution was
generated before anything in the physical world changed.
• Online search:
– Interleaves computation and action
• “Solved” by an agent executing actions
– Useful for dynamic environments
– Useful for nondeterministic environments, since it allows agent to focus on
contingencies that actually arise
– not just those that might arise
– Necessary for unknown environments
Online search problems (cont'd.)
Agent only knows:
– Actions(s) – list of actions allowed in state s
– Step-cost function c(s, a, s’) can only be determined after s’ discovered
– Goal-Test(s)
Agent might know: admissible heuristic to determine distance to goal state
• Cost: total path cost of the path the agent actually travels
• Competitive ratio: ratio of actual cost to optimal cost (i.e., the best
case if the agent knew the search space in advance)
– a competitive ratio of 1 means the actual cost is optimal
Irreversible actions: fall off cliff!
• Dead-end state: locked in a freezer!
• No algorithm can avoid dead ends in all state spaces
• Easier: assume state space is safely explorable, where some goal is reachable
from every state
Constraint Satisfaction Problems in Artificial Intelligence
We've seen a variety of strategies, such as local search and adversarial
search, used to solve various problems. Every problem-solving technique has a
single goal: to find a solution that allows you to achieve your goal. In
adversarial search and local search, however, there were no constraints on
the agents while solving the problems and reaching their solutions.
In this section, we'll look at the Constraint Satisfaction Technique, which is a
form of problem-solving technique. Constraint satisfaction, as the name
implies, is the process of solving a problem while adhering to specific
constraints or norms.
Constraint satisfaction is a problem-solving strategy in which the values of a
problem satisfy specific restrictions or criteria. A strategy like this leads to a
better grasp of the problem's structure and complexity.
Constraint satisfaction is determined by three factors:
X: It is a set of variables.
D: It is a set of domains where the variables reside. There is a specific domain
for each variable.
C: It is a set of constraints which are followed by the set of variables.
Domains are the spaces where the variables live, subject to the problem-
specific constraints. These are the three major components of a constraint
satisfaction technique. Each constraint is a pair {scope, rel}: the scope is
a tuple of variables that participate in the constraint, and rel is a
relation that lists the values those variables may take to satisfy the
problem's constraints.
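As a small illustration of X, D, and C, here is a hypothetical three-region
map-coloring instance in Python; the scope of each constraint is a pair of
variables and the relation requires their values to differ:

# X: the variables, D: their domains, C: the constraints (scope pairs whose
# relation requires different values); all names here are hypothetical
variables   = ['WA', 'NT', 'SA']
domains     = {v: ['red', 'green', 'blue'] for v in variables}
constraints = [('WA', 'NT'), ('WA', 'SA'), ('NT', 'SA')]

def satisfies(assignment):
    # consistent if every constraint with both variables assigned holds
    return all(assignment[a] != assignment[b]
               for a, b in constraints
               if a in assignment and b in assignment)

print(satisfies({'WA': 'red', 'NT': 'green', 'SA': 'blue'}))   # True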
Solving Constraint Satisfaction Problems
The following are the prerequisites for solving a constraint satisfaction
problem (CSP):
 A state-space
 The notion of the solution.
Assigning values to some or all variables, such as {X1=v1, X2=v2,} and so
on, defines a state in state-space.
There are three methods for assigning values to a variable:
Consistent or Legal Assignment: A consistent or legal assignment is one that
does not break any constraints or rules.
Complete Assignment: An assignment in which each variable has a value and
the CSP solution remains consistent. The term "complete assignment" refers to
such a task.
A partial assignment is one in which only some of the variables are assigned
values. Partial assignments are the name for this type of assignment.
Types of Domains in CSP
The variables are divided into two sorts of domains:
Discrete Domain: It is an infinite domain with countably many values; for
example, a start state that can be assigned to a variable in infinitely many
ways.
Finite Domain: It is a domain with a finite set of values for a single
variable. A domain whose values form a continuum is instead called a
continuous domain.
Constraint Types in CSP
With respect to the variables, basically there are following types of
constraints:
Unary Constraints: It is the simplest type of constraints that restricts the value
of a single variable.
Binary Constraints: It is the constraint type which relates exactly two
variables; for example, a constraint that x1 and x2 must take different
values.
Global Constraints: It is the constraint type which involves an arbitrary
number of variables.
Some special types of solution algorithms are used to solve the following
types of constraints:
Linear Constraints: These type of constraints are commonly used in linear
programming where each variable containing an integer value exists in linear
form only.
Non-linear Constraints: These type of constraints are used in non-linear
programming where each variable (an integer value) exists in a non-linear
form.
Note: A special type of constraint which arises in real-world problems is
known as a preference constraint.
Constraint Propagation
In local state-spaces, there is just one option, which is to look for a solution.
However, in CSP, we have two options: we can either look for a solution or
we can create one.
Constraint propagation is a type of inference that we can undertake.
Constraint propagation is a sort of reasoning that aids in limiting the number
of legal values for variables. Constraint propagation is based on the concept of
local consistency.
Variables are represented as nodes in local consistency, and each binary
constraint is handled as an arc in the given issue. The following local
consistency issues are mentioned further down:
Node Consistency: If all of the values in a variable's domain fulfil the unary
restrictions on the variables, the variable is said to be node consistent.
Arc Consistency: If every value in a variable's domain satisfies the
variable's binary constraints, the variable is said to be arc consistent (a
sketch of enforcing arc consistency follows this list).
Path Consistency: When the evaluation of a set of two variables in relation to a
third variable can be extended to another variable while still meeting all binary
restrictions. It's comparable to the concept of arc consistency.
k-consistency: The concept of stronger kinds of propagation is defined by this
type of consistency. The k-consistency of the variables is investigated here.
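Arc consistency, for example, is commonly enforced with the AC-3 algorithm. A
minimal sketch, assuming binary constraints supplied as a hypothetical
consistent(x, vx, y, vy) predicate and domains stored as mutable lists:

from collections import deque

def ac3(variables, domains, neighbours, consistent):
    # queue of arcs (x, y); domains is {var: list of values}
    queue = deque((x, y) for x in variables for y in neighbours[x])
    while queue:
        x, y = queue.popleft()
        revised = False
        for vx in list(domains[x]):
            # remove vx if no value of y is consistent with it
            if not any(consistent(x, vx, y, vy) for vy in domains[y]):
                domains[x].remove(vx)
                revised = True
        if revised:
            if not domains[x]:
                return False           # a domain emptied: inconsistency
            for z in neighbours[x]:
                if z != y:
                    queue.append((z, x))   # re-examine arcs into x
    return True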
CSP Problems
Constraint satisfaction refers to problems that have some constraints that must
be met in order to be solved. The following issues are included in CSP:
Graph Coloring: The constraint here is that no two adjacent regions can have
the same colour.
Sudoku Playing: The game has a rule that no number from 1 to 9 can be
repeated in the same row or column.
n-queen problem: The constraint in the n-queen problem is that no two queens
may be placed in the same row, column, or diagonal.
Crossword: The constraint in a crossword puzzle is that the words must be
correctly formed and meaningful.
Latin square Problem: The goal is to arrange the numbers so that each number
appears exactly once in every row and every column; the rows may be shuffled,
but they contain the same set of numbers.
Cryptarithmetic Problem
Cryptarithmetic Problem is a form of constraint fulfilment problem where the
game is about digits and their unique substitution either with alphabets or
other symbols. The digits (0-9) are substituted by some conceivable alphabets
or symbols in a cryptarithmetic problem. In a cryptarithmetic problem, the
goal is to replace each digit with an alphabet to achieve an arithmetically
correct solution.
The following are the rules or constraints for a cryptarithmetic problem:
 A unique digit should be substituted for a unique alphabet.
 The outcome must adhere to the predetermined arithmetic rules, such
as 2+2 = 4, and nothing else.
 Only digits from 0 to 9 should be used.
 When conducting an addition operation on a problem, there should
only be one carry forward.
 The problem can be approached from either the left-hand side (L.H.S)
or the right-hand side (R.H.S).
Let's use an example to better grasp the cryptarithmetic problem and its
constraints:
S E N D + M O R E = M O N E Y is a cryptarithmetic problem.
In this case, adding the terms S E N D and M O R E gives M O N E Y.
To break down the given problem into its component pieces, follow the
procedures below:
S and M are the terms that begin on the left hand side (L.H.S). Assign a digit
that will produce an acceptable result. Assign S to 9 and M to 1.
Adding up these terms gives a satisfactory result and yields the assignment
O -> 0.
Continue to the next terms E and O to obtain N as the output.
Adding E and O gives 5 + 0 = 5, so N would also have to be 5; this is not
allowed, because cryptarithmetic constraints prohibit assigning the same
digit to two letters. As a result, we must think further and assign a
different value.
We will obtain one carry if we solve further, and if we apply it, the answer
will be satisfied.
Furthermore, adding the next two terms N and R should give E. However, E -> 5
has already been assigned, and we would obtain a different value for E, so
the result does not satisfy the constraints; we must think further. After
solving the entire problem we obtain a carry on this column, and with it the
solution is satisfied, where 1 is carried through to the column above.
Let's get started.
We get Y as a result of adding the final two terms, i.e., the rightmost terms D
and E.
Here 1 will be carried forward to the column above.
Keeping all the constraints in mind, the final resultant is as follows:
S = 9, E = 5, N = 6, D = 7, M = 1, O = 0, R = 8, Y = 2
(so 9567 + 1085 = 10652).
Multiplication can also be performed on cryptarithmetic problems.
Constraint propagation
Although forward checking catches a large number of discrepancies, it does
not catch them all. The phrase "constraint propagation" refers to the process of
spreading the effects of a constraint on one variable to other variables.
Arc Consistency
k-Consistency
Local Search for CSPs
The Structure of Problems
Independent Subproblems
Tree-Structured CSPs
Backtracking Search for CSPs
Backtracking search refers to a depth-first search that selects values for one
variable at a time and then backtracks when there are no more legal values to
assign to that variable. Figure 1 depicts the algorithm.
For the constraint fulfilment problem, a simple backtracking technique is used.
The recursive depth-first search is used to model the algorithm.
For the map-coloring problem, part of the search tree was constructed using
simple backtracking.
Forward checking
Forward checking is a technique for making better use of constraints during
search. When a variable X is assigned, the forward checking procedure
examines each unassigned variable Y that is linked to X by a constraint and
deletes any value from Y's domain that is incompatible with the value chosen
for X. The progress of a map-coloring search with forward checking is shown
in the diagram below.
The progress of a forward-checked map-coloring search: WA = red is assigned
initially, and then red is removed from the domains of the adjoining
variables NT and SA via forward checking. After Q = green, green is removed
from the domains of NT, SA, and NSW. After V = blue is assigned, blue is
removed from the domains of NSW and SA, leaving SA with no legal values.
Adversarial Search
Adversarial search is a type of search in which we look at the issue that
develops when we try to plan ahead of the world while other agents plan
against us.
We've looked at search methods that are solely linked with a single agent that
attempts to find a solution, which is commonly stated as a series of actions, in
prior subjects.
However, there may be times when more than one agent is searching for the
same answer in the same search space, which is common in game play.
The environment with more than one agent is referred to as a multi-agent
environment, in which each agent is an adversary to the other and competes
against them. Each agent must think about the actions of other agents and how
they affect their own performance.
Adversarial searches, often known as Games, are searches in which two or more
players with opposing aims explore the same search space for a solution.
Games are modelled as a Search problem and a heuristic evaluation function,
which are the two primary variables that aid in the modelling and solving of
games in AI.
Types of Games in AI
                          Deterministic                     Chance Moves
Perfect information       Chess, Checkers, Go, Othello      Backgammon,
                                                            Monopoly
Imperfect information     Battleships, blind tic-tac-toe    Bridge, poker,
                                                            scrabble,
                                                            nuclear war
Perfect information: A game with perfect information is one in which agents
are able to look at the entire board. Agents have access to all game
information and can watch each other's movements. Chess, Checkers, Go, and
other games are examples.
Imperfect information: Such games as tic-tac-toe, Battleship, blind, Bridge,
and others are known as games with incomplete information since the agents
do not have all of the knowledge about the game and are unaware of what is
going on.
Deterministic games: Deterministic games are those that follow a strict pattern
and set of rules, with no element of chance. Chess, Checkers, Go, tic-tac-toe,
and other games are examples.
Non-deterministic games: Non-deterministic games are those with a variety of
unpredictable events and a chance or luck aspect. Dice or cards are used to
introduce the element of chance or luck. These are unpredictably generated,
and each action reaction is unique. These games are sometimes known as
stochastic games.
Example: Backgammon, Monopoly, Poker, etc.
Zero-Sum Game
Zero-sum games are adversarial searches in which there is only one winner.
Each agent's gain or loss of utility in a zero-sum game is exactly balanced by
the losses or gains of another actor.
One player attempts to maximise a single value, while the other attempts to
minimise it.
A ply is a single move made by one player in the game.
A Zero-sum game is something like chess or tic-tac-toe.
Zero-sum game: Embedded thinking
In the Zero-sum game, one agent or player is trying to find out what to do.
What factors should be considered when making a decision?
He must also consider his opponent.
The opponent is likewise considering what to do.
Each player is trying to figure out how his opponent will react to his moves.
To address gaming problems in AI, this necessitates embedded thinking or
backward reasoning.
Formalization of the problem
A game can be characterised as a sort of AI search that consists of the
following elements:
Initial state: It specifies how the game is set up at the start.
Player(s): It specifies which player has moved in the state space.
Action(s): It returns the set of legal moves in state space.
Result(s, a): It is the transition model, which specifies the result of moves in
the state space.
Terminal-Test(s): If the game is over, the terminal test is true; otherwise, it is
false. Terminal states are the states in which the game comes to a finish.
Utility(s, p): For a game that ends in terminal states s for player p, a utility
function returns the final numeric number. It's also known as the payout
function. Chess has three possible outcomes: win, defeat, or draw, with payoff
values of +1, 0, and 1/2. Utility values for tic-tac-toe are +1, -1, and 0.
Game tree
A game tree is a tree in which the nodes represent game states and the edges
represent player moves. Initial state, actions function, and result function are
all part of the game tree.
Example: Tree of tic-tac-toe games:
The following diagram depicts a portion of the tic-tac-toe game's game tree.
The following are some of the game's significant points:
MAX and MIN are the two players.
Each player takes a turn and begins with MAX.
MAX maximises the game tree's result, whereas MIN minimises it.
Example Explanation:
 MAX has 9 possible moves from the start because he is the first player.
Both players alternately place x and o until we reach a leaf node where
one player has three in a row or all squares are filled.
 Both players will compute the best possible utility versus an optimum
adversary for each node, called the minimax value.
 Assume that both players are well-versed in tic-tac-toe and are playing
their best game. Each player is trying everything he can to keep the
other from winning. In the game, MIN is working against Max.
 So, in the game tree, we have a Max layer, a MIN layer, and each layer
is referred to as Ply. The game proceeds to the terminal node, with
Max placing x and MIN placing o to prevent Max from winning.
 Either MIN or MAX wins, or the game ends in a tie. This game-tree
represents the entire search space of possibilities in which MIN and
MAX play tic-tac-toe, taking turns alternately.
As a result, the process for adversarial Search for the Minimax is as follows:
 Its goal is to figure out the best way for MAX to win the game.
 It employs a depth-first search strategy.
 The ideal leaf node in the game tree could appear at any level of the
tree.
 Minimax values should be propagated up the tree until the terminal
node is found.
The optimal strategy in a particular game tree can be determined by examining
the minimax value of each node, written MINIMAX(n). MAX prefers to move to a
state of maximum value and MIN prefers a state of minimum value, so:
MINIMAX(s) =
    UTILITY(s)                                 if TERMINAL-TEST(s)
    max over a of MINIMAX(RESULT(s, a))        if PLAYER(s) = MAX
    min over a of MINIMAX(RESULT(s, a))        if PLAYER(s) = MIN
Mini-Max Algorithm
 In decision-making and game theory, the mini-max algorithm is a
recursive or backtracking method. It suggests the best move for the
player, provided that the opponent is likewise playing well.
 The Mini-Max algorithm searches the game tree using recursion.
 In AI, the Min-Max algorithm is mostly employed for game play.
Chess, checkers, tic-tac-toe, go, and other two-player games are
examples. This Algorithm calculates the current state's minimax
choice.
 The game is played by two players, one named MAX and the other
named MIN, in this algorithm.
 Both players are adversaries in the sense that each seeks the greatest
profit for himself while leaving the opponent the smallest benefit.
 Both players in the game are adversaries, with MAX selecting the
maximum value and MIN selecting the minimum value.
 For the investigation of the entire game tree, the minimax method uses
a depth-first search strategy.
 The minimax algorithm descends all the way to the tree's terminal
node, then recursively backtracks the tree.
Pseudo-code for MinMax Algorithm
function minimax(node, depth, maximizingPlayer) is
    if depth == 0 or node is a terminal node then
        return static evaluation of node
    if maximizingPlayer then                 // for Maximizer Player
        maxEva = -infinity
        for each child of node do
            eva = minimax(child, depth-1, false)
            maxEva = max(maxEva, eva)        // gives maximum of the values
        return maxEva
    else                                     // for Minimizer player
        minEva = +infinity
        for each child of node do
            eva = minimax(child, depth-1, true)
            minEva = min(minEva, eva)        // gives minimum of the values
        return minEva
Initial call:
Minimax(node, 3, true)
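The pseudocode above translates directly into runnable Python. The tree below
is a hypothetical 3-ply game whose leaf utilities match the worked example
that follows, so the call returns 4:

def minimax_value(node, depth, maximizing_player, children, evaluate):
    # children(node) returns successor states; evaluate(node) is the static
    # evaluation function applied at the depth limit or at terminal nodes
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing_player:
        return max(minimax_value(c, depth - 1, False, children, evaluate)
                   for c in kids)
    return min(minimax_value(c, depth - 1, True, children, evaluate)
               for c in kids)

# hypothetical tree whose leaf utilities match the worked example below
tree = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G'],
        'D': [-1, 4], 'E': [2, 6], 'F': [-3, -5], 'G': [0, 7]}
children = lambda n: tree.get(n, []) if isinstance(n, str) else []
evaluate = lambda n: n if isinstance(n, int) else 0
print(minimax_value('A', 3, True, children, evaluate))   # prints 4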
Working of Min-Max Algorithm
 A simple example can be used to explain how the minimax algorithm
works. We've included an example of a game-tree below, which
represents a two-player game.
 There are two players in this scenario, one named Maximizer and the
other named Minimizer.
 Maximizer will strive for the highest possible score, while Minimizer
will strive for the lowest possible score.
 Because this algorithm uses DFS, we must go all the way through the
leaves to reach the terminal nodes in this game-tree.
The terminal values are given at the terminal node, so we'll compare them and
retrace the tree till we reach the original state. The essential phases in solving
the two-player game tree are as follows:
Step-1: The algorithm constructs the full game-tree in the first phase, then
applies the utility function to obtain the utility values for the terminal states.
Let's assume A is the tree's initial state in the diagram below. Assume that the
maximizer takes the first turn with a worst-case initial value of -infinity, and
the minimizer takes the second turn with a worst-case initial value of +infinity.
Step 2: Now, we'll locate the Maximizer's utilities value, which is -, and
compare each value in the terminal state to the Maximizer's initial value to
determine the upper nodes' values. It will select the best option from all of
them.
For node D:  max(-1, -∞) => max(-1, 4) = 4
For node E:  max(2, -∞)  => max(2, 6) = 6
For node F:  max(-3, -∞) => max(-3, -5) = -3
For node G:  max(0, -∞)  => max(0, 7) = 7
Step 3: In the next step, it's a turn for minimizer, so it will compare all nodes
value with +∞, and will find the 3rd layer node values.
For node B= min(4,6) = 4
For node C= min (-3, 7) = -3
Step 4: Now it's Maximizer's turn, and it'll choose the maximum value of all
nodes and locate the root node's maximum value. There are only four layers in
this game tree, so we can go to the root node right away, but there will be
more layers in real games.
For node A max(4, -3)= 4
That was the complete workflow of the minimax two player game.
Properties of Mini-Max algorithm
Complete- Min-Max algorithm is Complete. It will definitely find a solution
(if exist), in the finite search tree.
Optimal- Min-Max algorithm is optimal if both opponents are playing
optimally.
Time complexity- As it performs DFS on the game-tree, the time complexity of
the Min-Max algorithm is O(b^m), where b is the branching factor of the
game-tree, and m is the maximum depth of the tree.
Space Complexity- The space complexity of the Mini-max algorithm is also
similar to DFS, which is O(bm).
Limitation of the minimax Algorithm
The biggest disadvantage of the minimax algorithm is that it becomes
extremely slow while playing complex games like chess or go. This style of
game contains a lot of branching, and the player has a lot of options to choose
from.
Alpha-Beta Pruning
A modified variant of the minimax method is alpha-beta pruning. It's a way
for improving the minimax algorithm.
As we saw with the minimax search method, the number of game states that the
algorithm must examine grows exponentially with the depth of the tree. We
can't get rid of the exponent, but we can cut it in half. There is a
technique known as pruning that allows us to compute the correct minimax
choice without having to inspect every node of the game tree. It is named
alpha-beta pruning because it involves two threshold parameters, alpha and
beta, for future expansion. Alpha-Beta Algorithm is another name for it.
Alpha-beta pruning can be done at any depth in a tree, and it can sometimes
prune the entire sub-tree as well as the tree leaves.
The two-parameter can be defined as:
a. Alpha: The best (highest-value) choice we have found so far at
any point along the path of Maximizer. The initial value of
alpha is -∞.
b. Beta: The best (lowest-value) choice we have found so far at
any point along the path of Minimizer. The initial value of beta
is +∞.
Applying alpha-beta pruning to a standard minimax algorithm produces the same
result as the regular approach, but it prunes those nodes that aren't really
affecting the final decision and are only slowing down the procedure. Pruning
these nodes therefore speeds up the process.
Condition for Alpha-beta pruning
The main condition which required for alpha-beta pruning is:
α>=β  
Key points about alpha-beta pruning
Only the value of alpha will be updated by the Max player.
Only the beta value will be updated by the Min player.
Instead of alpha and beta values, node values will be sent to upper nodes while
retracing the tree.
Only the alpha and beta values will be passed to the child nodes.
Pseudo-code for Alpha-beta Pruning
function minimax(node, depth, alpha, beta, maximizingPlayer) is
    if depth == 0 or node is a terminal node then
        return static evaluation of node
    if maximizingPlayer then                 // for Maximizer Player
        maxEva = -infinity
        for each child of node do
            eva = minimax(child, depth-1, alpha, beta, false)
            maxEva = max(maxEva, eva)
            alpha = max(alpha, maxEva)
            if beta <= alpha
                break
        return maxEva
    else                                     // for Minimizer player
        minEva = +infinity
        for each child of node do
            eva = minimax(child, depth-1, alpha, beta, true)
            minEva = min(minEva, eva)
            beta = min(beta, eva)
            if beta <= alpha
                break
        return minEva
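A runnable Python version of the same pseudocode, using the children/evaluate
helpers from the earlier minimax sketch; the break statements implement the
α and β cut-offs:

def alphabeta(node, depth, alpha, beta, maximizing_player, children, evaluate):
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing_player:
        value = float('-inf')
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if beta <= alpha:
                break          # prune: MIN above will never allow this branch
        return value
    value = float('inf')
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, children, evaluate))
        beta = min(beta, value)
        if beta <= alpha:
            break              # prune: MAX above will never allow this branch
    return value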
Working of Alpha-Beta Pruning
To better understand how Alpha-beta pruning works, consider a two-player
search tree.
Stage 1: At the first step the, Max player will start first move from node A
where α= -∞ and β= +∞, these value of alpha and beta passed down to node B
where again α= -∞ and β= +∞, and Node B passes the same value to its child
D.
Stage 2: At Node D, the value of α will be calculated as its turn for Max. The
value of α is compared with firstly 2 and then 3, and the max (2, 3) = 3 will be
the value of α at node D and node value will also 3.
Stage 3: Now algorithm backtrack to node B, where the value of β will change
as this is a turn of Min, Now β= +∞, will compare with the available
subsequent nodes value, i.e. min (∞, 3) = 3, hence at node B now α= -∞, and
β= 3.
In the next step, algorithm traverse the next successor of Node B which is
node E, and the values of α= -∞, and β= 3 will also be passed.
Stage 4: At node E, Max will take its turn, and the value of alpha will change.
The current value of alpha will be compared with 5, so max (-∞, 5) = 5, hence
at node E α= 5 and β= 3, where α>=β, so the right successor of E will be
pruned, and algorithm will not traverse it, and the value at node E will be 5.
Stage 5: At the next step, the algorithm again backtracks the tree, from node
B to node A. At node A the value of alpha is changed to the maximum available
value, 3, as max(-∞, 3) = 3, and β = +∞; these two values are now passed to
the right successor of A, which is Node C.
At node C, α=3 and β= +∞, and the same values will be passed on to node F.
Stage 6: At node F, again the value of α will be compared with left child
which is 0, and max(3,0)= 3, and then compared with right child which is 1,
and max(3,1)= 3 still α remains 3, but the node value of F will become 1.
Stage 7: Node F returns the node value 1 to node C, at C α= 3 and β= +∞, here
the value of beta will be changed, it will compare with 1 so min (∞, 1) = 1.
Now at C, α=3 and β= 1, and again it satisfies the condition α>=β, so the next
child of C which is G will be pruned, and the algorithm will not compute the
entire sub-tree G.
Stage 8: C now returns the value of 1 to A here the best value for A is max (3,
1) = 3. Following is the final game tree which is the showing the nodes which
are computed and nodes which has never computed. Hence the optimal value
for the maximizer is 3 for this example.
Move Ordering in Alpha-Beta pruning
The order in which each node is reviewed has a significant impact on the
success of alpha-beta pruning. The order in which moves are made is crucial
in alpha-beta pruning.
There are two types of it:
Worst way to order: In some circumstances, the alpha-beta pruning method
does not trim any of the tree's leaves and works in the same way as the
minimax algorithm. Because of the alpha-beta factors, it also takes more time
in this scenario; this type of trimming is known as worst ordering. The optimal
move is on the right side of the tree in this situation. For such an order,
the time complexity is O(b^m).
Ideal ordering: When there is a lot of pruning in the tree and the best moves
happen on the left side, the perfect ordering for alpha-beta pruning occurs. We
use DFS because it searches the left side of the tree first, and can go twice
as deep as the minimax algorithm in the same amount of time. In perfect
ordering, the complexity is O(b^(m/2)).
Rules to find good ordering
The following are some guidelines for finding effective alpha-beta pruning
ordering:
The optimal move should be made from the shallowest node.
In the tree, sort the nodes so that the best ones are checked first.
When deciding on the right step, make use of your subject knowledge. For
example, in Chess, attempt the following order: captures first, threats second,
forward moves third, backward moves fourth.
We can keep track of the states because there's a chance they'll happen again.
Stochastic Games in Artificial Intelligence
In real life, many unpredictable external events can put us in unfavourable
situations. To represent this unpredictability, many games, such as dice
tossing, include a random element. Stochastic games are what they're called.
Backgammon is a classic game in which skill and luck are combined. The
permitted movements are established by rolling dice at the start of each
player's turn. In the backgammon scenario described below, white, for
example, has rolled a 6–5 and has four options.
This is a typical backgammon setup. The goal is to remove all of one's pieces
from the board as rapidly as possible. White progresses toward 25 in a
clockwise direction, while Black moves toward 0 in a counterclockwise
direction. A piece can progress to any position unless there are numerous
opponent pieces; if there is just one opponent, it is caught and must start
over. White has rolled a 6–5 and must choose between four legitimate
moves: (5–10,5–11), (5–11,19–24), (5–10,10–16), and (5–11,11–16), where
the notation (5–11,11–16) denotes moving one piece from 5 to 11 and then
another from 11 to 16.
Stochastic game tree for a backgammon position
White is aware of his or her legal options, but has no idea how Black will
roll, and so has no idea what Black's legal options will be. That means White
will be unable to construct a standard game tree, such as in chess or tic-tac-
toe. A game tree in backgammon must have chance nodes in addition to MAX
and MIN nodes. Chance nodes are depicted as circles in the diagram
below. The branches extending from each chance node reflect the various
dice rolls; each branch is labelled with the throw and its probability. There
are 36 potential ways to roll two dice, each equally likely, but because a 6–5
is the same as a 5–6, there are only 21 distinct rolls. Because each of the six
doubles (1–1 through 6–6) has a probability of 1/36, P (1–1) = 1/36. Each of
the remaining 15 rolls has a 1/18 chance of occurring.

The next step is to learn how to make wise decisions. Naturally, we want to
make the decision that will put us in the best possible situation. The
minimum and maximum values for positions, on the other hand, are not
specified. Instead, we can only calculate the expected value of a position,
which is the average of all possible outcomes of the chance nodes.
As a result, for games with chance nodes, we can generalise the
deterministic minimax value to an expected-minimax value. Terminal nodes,
and MAX and MIN nodes for which the dice roll is already known, work
exactly the same way as before. For chance nodes, the expected value is the
sum of the values of all outcomes, weighted by the probability of each
chance action:

EXPECTIMINIMAX(s) =
  UTILITY(s)                               if TERMINAL-TEST(s)
  max_a EXPECTIMINIMAX(RESULT(s, a))       if PLAYER(s) = MAX
  min_a EXPECTIMINIMAX(RESULT(s, a))       if PLAYER(s) = MIN
  Σ_r P(r) EXPECTIMINIMAX(RESULT(s, r))    if PLAYER(s) = CHANCE
where r is a possible dice roll (or other random events) and RESULT(s,r)
denotes the same state as s, but with the addition that the dice roll’s result is
r.
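As a rough illustration, the expectiminimax recursion can be sketched in a
few lines of Python (the node encoding below is invented for the example):
MAX and MIN nodes take the max/min of their children, while a chance node
returns the probability-weighted average of its outcomes.

def expectiminimax(node):
    # Terminal node: a bare number is its utility.
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # Chance node: children are (probability, subtree) pairs.
    return sum(p * expectiminimax(c) for p, c in children)

# A MAX node whose two moves lead to coin-flip chance nodes:
tree = ('max', [('chance', [(0.5, 3), (0.5, 5)]),
                ('chance', [(0.5, 1), (0.5, 9)])])
print(expectiminimax(tree))   # prints 5.0: the second move has the
                              # higher expected value (5.0 vs 4.0)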
UNIT III
KNOWLEDGE REPRESENTATION

First-Order Logic in Artificial intelligence


We learned how to represent statements using propositional logic in the
topic of Propositional Logic. Unfortunately, in propositional logic we can
only represent facts that are either true or false. PL is insufficient to
represent complicated sentences or natural language statements; its
expressive power is quite restricted. Consider the sentences below, which
we cannot adequately represent using PL:
"Some humans are intelligent", or
"Sachin likes cricket."
Because PL is insufficient to capture the above statements, we need a
stronger logic, such as first-order logic.
First-Order logic
 In artificial intelligence, first-order logic is another method of
knowledge representation. It is an extension of propositional logic.
 FOL has enough expressiveness to convey natural language statements
succinctly.
 Predicate logic or First-order predicate logic are other names for first-
order logic. First-order logic is a sophisticated language that makes it
easier to build information about objects and to articulate relationships
between them.
 First-order logic (like natural language) assumes not just that the world
includes facts, as does propositional logic, but also that the world has
the following things:
Objects: A, B, people, numbers, colors, wars, theories, squares, pits,
wumpus, ......
Relations: It can be a unary relation such as red, round, or nearby, or an
n-ary relation such as sister of, brother of, has colour, or comes between.
Function: Father of, best friend, third inning of, end of, ......
First-order logic contains two basic elements as a natural language:
 Syntax
 Semantics
Syntax of First-Order logic
In first-order logic, the syntax of FOL determines which set of symbols
represents a logical expression. Symbols are the core syntactic constituents of
first-order logic. In FOL, we use short-hand notation to write statements.
Basic Elements of First-order logic
Following are the basic elements of FOL syntax

Constant: 1, 2, A, John, Mumbai, cat, ....

Variables: x, y, z, a, b, ....

Predicates: Brother, Father, >, ....

Function: sqrt, LeftLegOf, ....

Connectives: ∧, ∨, ¬, ⇒, ⇔

Equality: ==

Quantifier: ∀, ∃

Atomic sentences
Atomic sentences are the most basic sentences of first-order logic. These
sentences are formed from a predicate symbol followed by a parenthesised
sequence of terms.
We can represent atomic sentences as Predicate (term1, term2, ......, term n).
Example: Ravi and Ajay are brothers: => Brothers(Ravi, Ajay).
                Chinky is a cat: => cat (Chinky).
Complex Sentences
Connectives are used to join atomic sentences to form complex sentences.
A first-order logic statement is divided into the following two parts:
Subject: Subject is the main part of the statement.
Predicate: A predicate can be defined as a relation, which binds two atoms
together in a statement.
Consider the following statement: "x is an integer." It has two parts: the first
component, x, is the statement's subject, and the second part, "is an integer," is
the statement's predicate.

Quantifiers in First-order logic


Quantification specifies the quantity of specimens in the universe of
discourse, and a quantifier is the linguistic element that generates
quantification.
These are the symbols that allow you to determine or identify the variable's
range and scope in a logical expression. There are two different kinds of
quantifiers:
 Universal Quantifier, (for all, everyone, everything)
 Existential quantifier, (for some, at least one).
Universal Quantifier
A universal quantifier is a logical symbol that indicates that a statement inside
its range is true for everything or every instance of a specific thing.
A symbol ∀ that resembles an inverted A is used to represent the Universal
quantifier.
If x is a variable, then ∀x is read as:
 For all x
 For each x
 For every x.
Example
All men drink coffee.
Let x be a variable that ranges over men; then the statement can be
represented in the UOD as follows:

∀x man(x) → drink (x, coffee).


It will be read as: For all x, where x is a man, x drinks coffee.
Existential Quantifier
Existential quantifiers are a sort of quantifier that expresses that a statement is
true for at least one instance of something within its scope.
It is represented by the logical operator ∃, which resembles a reversed E.
When used with a predicate variable, it is called an existential
quantifier.
If x is a variable, the existential quantifier is either ∃x or ∃(x). It will also be
read as
There exists a 'x.'
For some 'x.'
For at least one 'x.'
Example
Some boys are intelligent.

∃x: boys(x) ∧ intelligent(x)


It will be read as: There are some x where x is a boy who is intelligent.
Features
The main connective for universal quantifier ∀ is implication →.
The main connective for existential quantifier ∃ is and ∧.
Properties of Quantifiers
 In universal quantifier, ∀x∀y is similar to ∀y∀x.
 In Existential quantifier, ∃x∃y is similar to ∃y∃x.
 ∃x∀y is not similar to ∀y∃x.
Some Examples of FOL using quantifier:
1. All birds fly.
The predicate in this question is "fly(bird)."
Because all birds are able to fly, it will be portrayed as follows.
              ∀x bird(x) →fly(x).
2. Every man respects his parents.
The predicate in this question is "respect(x, y)," where x=man and y=parent.
Since the statement applies to every man, we shall use ∀, and it will be
represented as follows:
              ∀x man(x) → respects(x, parent).
3. Some of the youngsters enjoy cricket.
The predicate in this question is "play(x, y)," where x=boys and y=game.
We'll use ∃ because the statement holds only for some boys, and since the
main connective for ∃ is ∧, it will be written as:
              ∃x boys(x) ∧ play(x, cricket).
4. Not every student enjoys both math and science.
The predicate in this question is "like(x, y)," where x denotes student and y
denotes subject.
Since not all students like both subjects, we shall use ∀ with negation, as
shown in the following representation:
              ¬∀ (x) [ student(x) → like(x, Mathematics) ∧ like(x, Science)].
5. Only one student received a failing grade in mathematics.
The predicate in this question is "failed(x, y)," where x represents a student
and y represents a topic.
We shall adopt the following representation because there is just one student
who failed Mathematics:
              ∃(x) [ student(x) → failed (x, Mathematics) ∧ ∀ (y) [¬(x==y) ∧
student(y) → ¬failed (y, Mathematics)]].
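Over a finite universe of discourse, these quantifiers behave exactly like
Python's built-in all() and any(), which also makes the choice of main
connective visible. The tiny domain below is invented purely for
illustration:

# A made-up finite universe of discourse.
people = [{'name': 'Ravi',   'man': True,  'drinks_coffee': True},
          {'name': 'Ajay',   'man': True,  'drinks_coffee': True},
          {'name': 'Chinky', 'man': False, 'drinks_coffee': False}]

# Universal quantifier with implication:  ∀x man(x) → drink(x, coffee)
print(all((not p['man']) or p['drinks_coffee'] for p in people))   # True

# Existential quantifier with conjunction:  ∃x man(x) ∧ drink(x, coffee)
print(any(p['man'] and p['drinks_coffee'] for p in people))        # True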
Free and Bound Variables
The quantifiers interact with variables that exist in the right order. First-order
logic has two sorts of variables, which are listed below:
Free Variable: In a formula, a variable is said to be a free variable if it
occurs outside the scope of the quantifier.
          Example: ∀x ∃(y) [P(x, y, z)], where z is a free variable.
Bound Variable: In a formula, a variable is said to be a bound variable if it
appears within the scope of the quantifier.
          Example: ∀x [A(x) B(y)], here x and y are the bound variables.
Knowledge Engineering
Knowledge engineering is the process of building a knowledge base in first-
order logic. In knowledge engineering, a knowledge engineer is someone who
researches a given domain, learns significant domain concepts, and develops a
formal representation of the objects.
In this topic, we'll look at the knowledge engineering process in the context of
an electronic circuit, which we're already familiar with. This method is best
suited for developing specialised knowledge bases.
The knowledge-engineering process:
The knowledge-engineering process is broken down into the following steps.
We will construct a knowledge foundation that will allow us to reason about
the digital circuit (One-bit full adder) that is shown below using these
techniques.

1. Identify the task:


The initial stage in the procedure is to determine the task, and there are several
reasoning tasks for the digital circuit.
We'll look at the circuit's functionality at the highest level, or first level:
 Does the circuit add properly?
 What will be the output of gate A2, if all the inputs are high?
At the second level, we'll look into the specifics of the circuit structure, such
as:
 Which gate is connected to the first input terminal?
 Does the circuit have feedback loops?
2. Assemble the relevant knowledge
In the second step, we will put together the necessary information for digital
circuits. So, in order to understand digital circuits, we must have the following
understanding.
 Wires and gates combine to form logic circuits.
 Signals flow through wires to the gate's input terminal, and each gate
produces an output that flows in the opposite direction.
 The gates AND, OR, XOR, and NOT are employed in this logic
circuit.
 One output terminal and two input terminals are shared by all of these
gates (except NOT gate, it has one input terminal).
3. Decide on vocabulary
Selecting functions, predicates, and constants to represent circuits, terminals,
signals, and gates is the next phase in the process. To begin, we'll separate the
gates from one another and from other items. Each gate is represented as an
object with a unique name, such as Gate (X1). Each gate's functionality is
determined by its type, which is represented by constants such as AND, OR,
XOR, and NOT. A predicate will be used to identify circuits: Circuit (C1).
For the terminal, we will use predicate: Terminal(x).
The function In(1, X1) will be used to denote the first input terminal of the
gate, and Out will be used to denote the output terminal (1, X1).
The function Arity(c, i, j) indicates that circuit c has i inputs and j
outputs.
Predicate Connect(Out(1, X1), In(1, X1)) can be used to express the
connections between gates.
We use a unary predicate On (t), which is true if the signal at a terminal is on.
4. Encode general knowledge about the domain
We'll need the following principles to encode general information about logic
circuits:
If two terminals are connected, then they have the same signal, which may
be expressed as:
∀ t1, t2 Terminal(t1) ∧ Terminal(t2) ∧ Connect(t1, t2) → Signal(t1) =
Signal(t2).
 Every terminal's signal will have a value of 0 or 1, and will be
represented as:
∀  t Terminal (t) →Signal (t) = 1 ∨Signal (t) = 0.  
 Connect predicates are commutative:
∀  t1, t2 Connect(t1, t2)  →  Connect (t2, t1).       
 Representation of types of gates:
∀  g Gate(g) ∧ r = Type(g) → r = OR ∨r = AND ∨r = XOR ∨r = NOT.   
 The output of the AND gate is 0 if and only if any of its inputs is 0:
∀  g Gate(g) ∧ Type(g) = AND →Signal (Out(1, g))= 0 ⇔  ∃n Signal (In(n, g)
)= 0.   
 Output of OR gate is 1 if and only if any of its input is 1:
∀  g Gate(g) ∧ Type(g) = OR → Signal (Out(1, g))= 1 ⇔  ∃n Signal (In(n, g))
= 1   
 Output of XOR gate is 1 if and only if its inputs are different:
∀  g Gate(g) ∧ Type(g) = XOR → Signal (Out(1, g)) = 1 ⇔  Signal (In(1, g)) 
≠ Signal (In(2, g)).  
 Output of NOT gate is invert of its input:
∀  g Gate(g) ∧ Type(g) = NOT →   Signal (In(1, g)) ≠ Signal (Out(1, g)).  
 All the gates in the above circuit have two inputs and one output
(except NOT gate).
∀  g Gate(g) ∧ Type(g) = NOT →   Arity(g, 1, 1)   
∀  g Gate(g) ∧ r =Type(g)  ∧ (r= AND ∨r= OR ∨r= XOR) →  Arity (g, 2, 1).   
 All gates are logic circuits:
∀  g Gate(g) → Circuit (g).
5. Encode a description of the problem instance
 To encode the problem instance of circuit C1, we must first categorise
the circuit and its gate components. This phase is simple if the
problem's ontology has already been decided; this step entails writing
simple atomic sentences about instances of the concepts in the ontology.
 We may encode the issue instance in atomic phrases for the given
circuit C1 as follows:
 Because there are two XOR gates, two AND gates, and one OR gate in
the circuit, the atomic sentences for these gates will be:
 For XOR gate: Type(x1)= XOR, Type(X2) = XOR  
 For AND gate: Type(A1) = AND, Type(A2)= AND  
 For OR gate: Type (O1) = OR. 
 And then represent the connections between all the gates.
6. Pose queries to the inference procedure and get answers
We'll find all of the possible values for all of the terminals in the adder circuit
in this stage. The first question will be: What input combination should yield a
0 for the first output of circuit C1 and a 1 for the second output?
∃ i1, i2, i3 Signal (In(1, C1))=i1  ∧  Signal (In(2, C1))=i2  ∧ Signal (In(3, C1))
= i3  
∧ Signal (Out(1, C1)) =0 ∧ Signal (Out(2, C1))=1 
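Since the vocabulary above fully pins down the circuit's behaviour, the
same query can also be answered by brute force: simulate the one-bit full
adder over all 2³ input combinations and keep those where the first output
is 0 and the second is 1. The Python sketch below assumes the usual
full-adder wiring (X1, X2 = XOR gates, A1, A2 = AND gates, O1 = OR gate);
the function name is illustrative:

from itertools import product

def full_adder(i1, i2, i3):
    x1 = i1 ^ i2          # Out(1, X1)
    out1 = x1 ^ i3        # Out(1, X2) = Out(1, C1), the sum bit
    a1 = x1 & i3          # Out(1, A1)
    a2 = i1 & i2          # Out(1, A2)
    out2 = a1 | a2        # Out(1, O1) = Out(2, C1), the carry bit
    return out1, out2

for i1, i2, i3 in product([0, 1], repeat=3):
    if full_adder(i1, i2, i3) == (0, 1):
        print(i1, i2, i3)   # prints 1 1 0, 1 0 1 and 0 1 1

That is, exactly the input combinations with two 1-bits give sum 0 and
carry 1, which is what the inference procedure should return.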
7. Debug the knowledge base
We'll now debug the knowledge base, which is the final step in the process. In
this stage, we'll try to figure out what's wrong with the knowledge base.
We may have missed statements like 1 ≠ 0 in the knowledge base.
Inference in First-Order Logic
In First-Order Logic, inference is used to derive new facts or sentences from
existing ones. Before we get into the FOL inference rule, it's important to
understand some basic FOL terminology.
Substitution:
Substitution is a basic procedure that is applied to terms and formulations. It
can be found in all first-order logic inference systems. When there are
quantifiers in FOL, the substitution becomes more complicated. If we
write F[a/x], so it refers to substitute a constant "a" in place of variable "x".
Equality
In First-Order Logic, atomic sentences are formed not only via the use of
predicate and words, but also through the application of equality. We can do
this by using equality symbols, which indicate that the two terms relate to the
same thing.
Brother (John) = Smith, for example.
The thing referred to by the Brother (John) is comparable to the object referred
to by Smith in the example above. The negation symbol can also be used
alongside the equality symbol to show that two terms are not the same.
Example: ¬(x=y) which is equivalent to x ≠y.
FOL inference rules for quantifier
First-order logic has inference rules similar to propositional logic, therefore
here are some basic inference rules in FOL:
 Universal Generalization
 Universal Instantiation
 Existential Instantiation
 Existential introduction
1. Universal Generalization:
Universal generalisation is a valid inference rule that asserts that if premise
P(c) is true for every arbitrary element c in the universe of discourse, we can
draw the following conclusion: ∀ x P(x).

It can be represented as: from P(c), for an arbitrary element c, infer
∀ x P(x).


If we want to prove that every element has a similar property, we can apply
this rule.
x must not be used as a free variable in this rule.
Example: Let's represent, P(c): "A byte contains 8 bits", so for ∀ x P(x) "All
bytes contain 8 bits.", it will also be true.
2. Universal Instantiation:
A valid inference rule is universal instantiation, often known as universal
elimination or UI. It can be used to add additional sentences many times.
The new knowledge base is logically equal to the existing knowledge base.
According to UI, we can infer any sentence by substituting a ground term
for the variable. The UI rule states that we can infer any sentence P(c) by
substituting a ground term c (a constant within domain x) for the variable
in ∀ x P(x), for any object in the universe of discourse.

It can be represented as: from ∀ x P(x), infer P(c).


Example: 1.
IF "Every person likes ice-cream" => ∀x P(x), then we can infer that
"John likes ice-cream" => P(c).
Example: 2.
Let's take a famous example,
"All kings who are greedy are Evil." So let our knowledge base contains this
detail as in the form of FOL:
∀x king(x) ∧ greedy (x) → Evil (x),
So from this information, we can infer any of the following statements using
Universal Instantiation
King(John) ∧ Greedy (John) → Evil (John),
King(Richard) ∧ Greedy (Richard) → Evil (Richard),
King(Father(John)) ∧ Greedy (Father(John)) → Evil (Father(John)),
3. Existential Instantiation
Existential instantiation is also known as Existential Elimination, and it is a
legitimate first-order logic inference rule.
It can only be used to replace the existential sentence once.
Although the new KB is not logically equivalent to the old KB, it will be
satisfiable if the old KB was satisfiable.
This rule states that one can infer P(c) from the formula given in the form of
∃x P(x) for a new constant symbol c.
The restriction with this rule is that c used in the rule must be a new term for
which P(c ) is true.

It can be denoted as: from ∃x P(x), infer P(c), where c is a new constant
symbol.


Example:
From the given sentence: ∃x Crown(x) ∧ OnHead(x, John),
So we can infer: Crown(K) ∧ OnHead( K, John), as long as K does not appear
in the knowledge base.
The sign K used above is a constant symbol known as the Skolem constant.
Skolemization process is a specific case of Existential instantiation.
4. Existential introduction
An existential generalisation is a valid inference rule in first-order logic that is
also known as an existential introduction.
This rule argues that if some element c in the universe of discourse has the
property P, we can infer that something in the universe has the attribute P.
It can be represented as: from P(c), infer ∃ x P(x).
Example: Let's say that,
"Priyanka got good marks in English."
"Therefore, someone got good marks in English."
Generalized Modus Ponens Rule
In FOL, we use a single inference rule called Generalized Modus Ponens for
the inference process. It's a modified form of Modus ponens.
"P implies Q, and P is declared to be true, hence Q must be true," summarises
Generalized Modus Ponens.
According to Generalized Modus Ponens, for atomic sentences pi, pi', and q,
where there is a substitution θ such that SUBST(θ, pi') = SUBST(θ, pi) for
all i, the rule can be represented as:
(p1', p2', ..., pn', (p1 ∧ p2 ∧ ... ∧ pn ⇒ q)) ⟹ SUBST(θ, q)
Example
We will apply this rule to "All greedy kings are evil": we will find some x
such that x is a king and x is greedy, so we can infer that x is evil.
Here let say, p1' is king(John)        p1 is king(x)  
p2' is Greedy(y)                       p2 is Greedy(x)  
θ is {x/John, y/John}                  q is evil(x)  
SUBST(θ,q). 
Propositional logic in Artificial intelligence
The simplest kind of logic is propositional logic (PL), in which all statements
are made up of propositions. The term "proposition" refers to a declarative
statement that can be true or false. It's a method of expressing knowledge in
logical and mathematical terms.
Example
a) It is Sunday.  
b) The Sun rises from West (False proposition)  
c) 3+3= 7(False proposition)  
d) 5 is a prime number.
The following are some fundamental propositional logic facts:
 Because it operates with 0 and 1, propositional logic is also known as
Boolean logic.
 In propositional logic, symbolic variables are used to express the logic,
and any symbol can be used to represent a proposition, such as A, B,
C, P, Q, R, and so on.
 Propositions can be either true or false, but not both at the same time.
 An object, relations or functions, and logical connectives make up
propositional logic.
 Logical operators are another name for these connectives.
 The basic elements of propositional logic are propositions and
connectives.
 Connectives are logical operators that link two sentences together.
 A proposition formula that is always true is called a tautology, also
known as a valid sentence.
 Contradiction is a proposition formula that is always false.
 Statements that are questions, commands, or opinions, such as "Where
is Rohini," "How are you," and "What is your name," are not
propositions.
Syntax of propositional logic
The allowed sentences for knowledge representation are defined by the syntax
of propositional logic. Propositions are divided into two categories:
 Atomic Propositions
 Compound propositions
Atomic Proposition: Atomic propositions are simple statements. They consist
of a single proposition symbol. These sentences must be either true or
false.
Example
a) 2+2 is 4, it is an atomic proposition as it is a true fact.  
b) "The Sun is cold," as well as being a false fact, is a proposition.
Compound propositions are made up of simpler or atomic propositions joined
together with parenthesis and logical connectives.
a) "It is raining today, and street is wet."  
b) "Ankit is a doctor, and his clinic is in Mumbai."
Logical Connectives
Logical connectives are used to link two simpler ideas or to logically represent
a statement. With the use of logical connectives, we can form compound
assertions. There are five primary connectives, which are listed below:
Negation: A sentence such as ¬ P is called negation of P. A literal can be
either Positive literal or negative literal.
Conjunction: A sentence which has ∧ connective such as, P ∧ Q is called a
conjunction.
Example: Rohan is intelligent and hardworking. It can be written as,
P= Rohan is intelligent,
Q= Rohan is hardworking. → P∧ Q.
Disjunction: A sentence which has ∨ connective, such as P ∨ Q. is called
disjunction, where P and Q are the propositions.
Example: "Ritika is a doctor or Engineer",
Here P= Ritika is a doctor, Q= Ritika is an engineer, so we can write it as P ∨ Q.
Implication: A sentence such as P → Q, is called an implication. Implications
are also known as if-then rules. It can be represented as
If it is raining, then the street is wet.
Let P= It is raining, and Q= Street is wet, so it is represented as P → Q
Biconditional: A sentence such as P ⇔ Q is a biconditional sentence.
Example: If I am breathing, then I am alive.
P= I am breathing, Q= I am alive; it can be represented as P ⇔ Q.
Following is the summarized table for Propositional Logic Connectives:
Truth Table
We need to know the truth values of propositions in all feasible contexts in
propositional logic. With logical connectives, we can combine all possible
combinations, and the representation of these combinations in a tabular
manner is known as a truth table. The truth table for all logical connectives is
as follows:
P      Q      ¬P     P∧Q    P∨Q    P→Q    P⇔Q
True   True   False  True   True   True   True
True   False  False  False  True   False  False
False  True   True   False  True   True   False
False  False  True   False  False  True   True
Truth table with three propositions
A proposition can be constructed by combining three propositions: P, Q, and
R. Because we use three proposition symbols, this truth table is made up of
2³ = 8 rows (tuples).

Precedence of connectives
Propositional connectors or logical operators, like arithmetic operators, have a
precedence order. When evaluating a propositional problem, this order should
be followed. The following is a list of the operator precedence order:

Precedence Operators

First Precedence Parenthesis

Second Precedence Negation

Third Precedence Conjunction(AND)

Fourth Precedence Disjunction(OR)


Fifth Precedence Implication

Sixth Precedence Biconditional

Logical equivalence
Logical equivalence is one of the characteristics of propositional logic.
Two statements are said to be logically equivalent if and only if their
columns in the truth table are identical.
Let's take two propositions A and B; logical equivalence is written A⇔B. In
the truth table below, the columns for ¬A∨B and A→B are identical, hence
¬A∨B is equivalent to A→B.

Properties of Operators
Commutativity:
P∧ Q= Q ∧ P, or
P ∨ Q = Q ∨ P.
Associativity:
(P ∧ Q) ∧ R= P ∧ (Q ∧ R),
(P ∨ Q) ∨ R= P ∨ (Q ∨ R)
Identity element:
P ∧ True = P,
P ∨ True= True.
Distributive:
P∧ (Q ∨ R) = (P ∧ Q) ∨ (P ∧ R).
P ∨ (Q ∧ R) = (P ∨ Q) ∧ (P ∨ R).
DE Morgan's Law:
¬ (P ∧ Q) = (¬P) ∨ (¬Q)
¬ (P ∨ Q) = (¬ P) ∧ (¬Q).
Double-negation elimination:
¬ (¬P) = P.
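All of these equivalences and laws can be checked mechanically by
enumerating the truth table, which is what the short Python sketch below
does (the helper name equivalent is illustrative):

from itertools import product

def equivalent(f, g, n=2):
    # Two formulas are logically equivalent iff they agree on every row
    # of the truth table, i.e. on every assignment of truth values.
    return all(f(*row) == g(*row) for row in product([False, True], repeat=n))

# De Morgan's law: ¬(P ∧ Q) = (¬P) ∨ (¬Q)
print(equivalent(lambda p, q: not (p and q),
                 lambda p, q: (not p) or (not q)))     # True

# P → Q is equivalent to its contrapositive ¬Q → ¬P
# (implication a → b written in Python as (not a) or b).
implies = lambda a, b: (not a) or b
print(equivalent(lambda p, q: implies(p, q),
                 lambda p, q: implies(not q, not p)))  # True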
Limitations of Propositional logic
With propositional logic, we can't represent relations like ALL, SOME, or
NONE. Example:
 All the girls are intelligent.
 Some apples are sweet.
The expressive power of propositional logic is restricted.
We can't explain propositions in propositional logic in terms of their qualities
or logical relationships.
Rules of Inference in Artificial intelligence
Inference
In artificial intelligence, we need intelligent computers that can
construct new logic from old logic or from evidence; inference is the
process of drawing conclusions from evidence and facts.
Inference rules
The templates for creating valid arguments are known as inference rules. In
artificial intelligence, inference rules are used to generate proofs, and a proof
is a series of conclusions that leads to the intended outcome.
The implication among all the connectives is vital in inference rules. Some
terms relating to inference rules are as follows:
Implication: It is one of the logical connectives which can be represented as P
→ Q. It is a Boolean expression.
Converse: The converse of implication, which means the right-hand side
proposition goes to the left-hand side and vice-versa. It can be written as Q →
P.
Contrapositive: The negation of converse is termed as contrapositive, and it
can be represented as ¬ Q → ¬ P.
Inverse: The negation of implication is called inverse. It can be represented as
¬ P → ¬ Q.
Some of the compound statements above are equivalent to each other, which
we can establish using a truth table:

Hence from the above truth table, we can prove that P → Q is equivalent to ¬
Q → ¬ P, and Q→ P is equivalent to ¬ P → ¬ Q.
Types of Inference rules
1. Modus Ponens:
One of the most essential laws of inference is the Modus Ponens rule, which
asserts that if P and P →Q are both true, we can infer that Q will be true as
well. It's written like this:
P→Q, P ∴ Q
Example:
Statement-1: "If I am sleepy then I go to bed" ==> P→ Q
Statement-2: "I am sleepy" ==> P
Conclusion: "I go to bed." ==> Q.
Hence, we can say that, if P→ Q is true and P is true then Q will be true.
Proof by Truth-Table:
2. Modus Tollens:
The Modus Tollens rule states that if P→Q is true and ¬Q is true, then ¬P
will also be true. It can be represented as:
P→Q, ¬Q ∴ ¬P
Example:
Statement-1: "If I am sleepy then I go to bed" ==> P→Q
Statement-2: "I do not go to the bed." ==> ¬Q
Conclusion: "I am not sleepy" ==> ¬P
3. Hypothetical Syllogism
The Hypothetical Syllogism rule states that if P→Q is true and Q→R is true,
then P→R will also be true. It can be represented with the following
notation:
P→Q, Q→R ∴ P→R
Example:
Statement-1: If you have my home key then you can unlock my home. P→Q
Statement-2: If you can unlock my home then you can take my money. Q→R
Conclusion: If you have my home key then you can take my money. P→R
Proof by truth table:

4. Disjunctive Syllogism:
The Disjunctive Syllogism rule states that if P∨Q is true and ¬P is true,
then Q will be true. It can be represented as:
P∨Q, ¬P ∴ Q
Example:
Statement-1: Today is Sunday or Monday. ==>P∨Q
Statement-2: Today is not Sunday. ==> ¬P
Conclusion: Today is Monday. ==> Q
Proof by truth-table:
5. Addition:
The Addition rule is one of the common inference rules, and it states that
if P is true, then P∨Q will be true. It can be represented as:
P ∴ P∨Q
Example:
Statement-1: I have a vanilla ice-cream. ==> P
Statement-2: I have chocolate ice-cream. ==> Q
Conclusion: I have vanilla or chocolate ice-cream. ==> (P∨Q)
Proof by Truth-Table:

6. Simplification:
The Simplification rule states that if P∧Q is true, then P and Q will each
be true. It can be represented as:
P∧Q ∴ P (and similarly P∧Q ∴ Q)
Proof by Truth-Table:
7. Resolution:
The Resolution rule states that if P∨Q and ¬P∨R are true, then Q∨R will
also be true. It can be represented as:
P∨Q, ¬P∨R ∴ Q∨R
Proof by Truth-Table:

Wumpus world
The Wumpus world is a simple world that demonstrates the worth of a
knowledge-based agent and illustrates how knowledge is represented. It was
inspired by Hunt the Wumpus, Gregory Yob's 1973 video game.
The Wumpus world is a cave with 4x4 rooms and pathways connecting them.
As a result, there are a total of 16 rooms that are interconnected. We now have
a knowledge-based AI capable of progressing in this world. There is an area in
the cave with a beast named Wumpus who eats everybody who enters. The
agent can shoot the Wumpus, but he has only a single arrow. There are some
bottomless pit rooms in the Wumpus world, and if an agent falls into one,
he will be stuck there indefinitely. The intriguing thing about
this cave is that there is a chance of finding a gold heap in one of the rooms.
So the agent's mission is to find the gold and get out of the cave without
getting eaten by Wumpus or falling into Pits. If the agent returns with gold, he
will be rewarded, but if he is devoured by Wumpus or falls into the pit, he will
be penalised.
A sample diagram for portraying the Wumpus planet is shown below. It
depicts some rooms with Pits, one room with Wumpus, and one agent in the
world's (1, 1) square position.

There are some elements that can assist the agent in navigating the cave. The
following are the components:
 The rooms adjacent to the Wumpus room are stinky, thus there is a
stench there.
 The room next to PITs has a breeze, so if the agent gets close enough
to PIT, he will feel it.
 If and only if the room contains gold, there will be glitter.
 If the agent is facing the Wumpus, the agent can kill it, and the
Wumpus will cry horribly, which can be heard throughout the cave.
PEAS description of Wumpus world
To explain the Wumpus world we have given PEAS description as below:
Performance measure:
 +1000 reward points if the agent comes out of the cave with the gold.
 -1000 points penalty for being eaten by the Wumpus or falling into the
pit.
 -1 for each action, and -10 for using an arrow.
 The game ends if either the agent dies or comes out of the cave.
Environment:
 A 4*4 grid of rooms.
 The agent initially in room square [1, 1], facing toward the right.
 The locations of the Wumpus and the gold are chosen randomly, except
for the first square.
 Each square of the cave can be a pit with probability 0.2 except the
first square.
Actuators
 Left turn,
 Right turn
 Move forward
 Grab
 Release
 Shoot.
Sensors:
 The agent will perceive a stench in the room containing the Wumpus and
in the directly adjacent rooms. (Not on a diagonal.)
 If the agent is in the room directly adjacent to the Pit, he will feel a
breeze.
 The agent will notice the gleam in the room where the gold is located.
 If the agent walks into a wall, he will feel the bump.
 When the Wumpus is shot, it lets out a horrifying cry that can be heard
throughout the cave.
 These perceptions can be expressed as a five-element list in which
each sensor will have its own set of indicators.
 For instance, if an agent detects smell and breeze but no glitter, bump,
or shout, it might be represented as [Stench, Breeze, None, None,
None].
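In code, such a percept is naturally represented as a five-element list;
the encoding below is a trivial illustrative sketch, not a fixed API:

# [Stench, Breeze, Glitter, Bump, Scream]; None means 'not perceived'.
percept = ['Stench', 'Breeze', None, None, None]

stench, breeze, glitter, bump, scream = percept
if breeze:
    print('Some adjacent square contains a pit.')
if glitter:
    print('The gold is here: grab it.')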
The Wumpus world Properties
 Partially observable: The Wumpus world is partially observable
because the agent can only perceive the close environment such as an
adjacent room.
 Deterministic: It is deterministic, as the result and outcome of the
world are already known.
 Sequential: The order is important, so it is sequential.
 Static: It is static as Wumpus and Pits are not moving.
 Discrete: The environment is discrete.
 One agent: The environment is a single agent as we have one agent
only and Wumpus is not considered as an agent.
Exploring the Wumpus world
Now we'll investigate the Wumpus universe and see how the agent will
achieve its goal through logical reasoning.
The first step for an agent is to:
Initially, the agent is in the first room, or square [1,1], and we already know
that this room is safe for the agent, thus we will add the sign OK to the below
diagram (a) to represent that room is safe. The agent is represented by the
letter A, the breeze by the letter B, the glitter or gold by the letter G, the
visited room by the letter V, the pits by the letter P, and the Wumpus by the
letter W.
At Room [1,1] agent does not feel any breeze or any Stench which means the
adjacent squares are also OK.
Agent's second Step:
Now the agent must move forward; it will go to either [1, 2] or [2, 1]. Let's
say agent enters room [2, 1], where he detects a breeze, indicating Pit is
present nearby. Because the pit might be in [3, 1] or [2, 2], we'll add the
sign P? to those squares to indicate a suspected pit.
Now the agent will pause and consider his options before doing any
potentially destructive actions. The agent will return to room [1, 1]. The agent
visits the rooms [1,1] and [2,1], thus we'll use the symbol V to symbolise the
squares he's been to.
Agent's third step:
The agent will now proceed to the room [1,2], which is fine. Agent detects a
stink in the room [1,2], indicating the presence of a Wumpus nearby.
However, according to the rules of the game, Wumpus cannot be in the room
[1,1], and he also cannot be in [2,2]. (Agent had not detected any stench when
he was at [2,1]). As a result, the agent infers that Wumpus is in the room [1,3],
and there is no breeze at the moment, implying that there is no Pit and no
Wumpus in [2,2]. So that's safe, and we'll designate it as OK, and the agent
will advance [2,2].
Agent's fourth step:
Because there is no stench and no breeze in room [2,2], let's assume the
agent decides to move to room [2,3]. In room [2,3] the agent detects
glitter, so it should grab the gold and climb out of the cave.
Unification
 Unification is the process of finding a substitute that makes two
separate logical atomic expressions identical. The substitution process
is necessary for unification.
 It accepts two literals as input and uses substitution to make them
identical.
 Let Ψ1 and Ψ2 be two atomic sentences and 𝜎 be a unifier such
that, Ψ1𝜎 = Ψ2𝜎, then it can be expressed as UNIFY(Ψ1, Ψ2).
 Example: Find the MGU for Unify{King(x), King(John)}
 Let Ψ1 = King(x), Ψ2 = King(John),
 Substitution θ = {John/x} is a unifier for these atoms and applying this
substitution, and both expressions will be identical.
 For unification, the UNIFY algorithm is employed, which takes two
atomic sentences and returns a unifier for them (if one exists).
 All first-order inference techniques rely heavily on unification.
 If the expressions do not match, the result is failure.
 The returned substitution is referred to as the MGU (Most General
Unifier).
E.g. Let's assume P(x, y) and P(a, f(z)) are two different expressions.
In this case, we must make both of the preceding assertions identical. We'll
make the substitution in this case.
  P(x, y)......... (i)
  P(a, f(z))......... (ii)
 In the first statement, replace x with a and y with f(z), and the result
will be a/x and f(z)/y.
 The first expression will be equal to the second expression with both
replacements, and the substitution set will be [a/x, f(z)/y].
Conditions for Unification
Following are some basic conditions for unification:
 Atoms or expressions with different predicate symbols can never be
unified.
 Both expressions must have the same number of arguments.
 If two similar variables appear in the same expression, unification
will fail.
Unification Algorithm
Algorithm: Unify(Ψ1, Ψ2)
Step. 1: If Ψ1 or Ψ2 is a variable or constant, then:
a) If Ψ1 or Ψ2 are identical, then return NIL.
b) Else if Ψ1is a variable,
a. then if Ψ1 occurs in Ψ2, then return FAILURE
b. Else return { (Ψ2/ Ψ1)}.
c) Else if Ψ2 is a variable,
a. If Ψ2 occurs in Ψ1 then return FAILURE,
b. Else return {( Ψ1/ Ψ2)}.
d) Else return FAILURE.
Step.2: If the initial predicate symbols in Ψ1 and Ψ2 are not the same,
then return FAILURE.
Step. 3: IF Ψ1 and Ψ2 have a different number of arguments, then return
FAILURE.
Step. 4: Set Substitution set(SUBST) to NIL.
Step. 5: For i=1 to the number of elements in Ψ1.
a) Call Unify function with the ith element of Ψ1 and ith element of
Ψ2, and put the result into S.
b) If S = failure then returns Failure
c) If S ≠ NIL then do,
a. Apply S to the remainder of both L1 and L2.
b. SUBST= APPEND(S, SUBST).
Step.6: Return SUBST.
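As a rough Python rendering of this algorithm (the term encoding is
invented for the sketch: variables are strings beginning with '?', and a
compound term such as knows(Richard, x) is the tuple
('knows', 'Richard', '?x')):

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def occurs(v, t):
    # Occurs check: ?x must not be unified with a term containing ?x.
    return v == t or (isinstance(t, tuple) and any(occurs(v, a) for a in t))

def unify(x, y, s=None):
    # Returns the MGU of x and y as a dict of bindings, or None on failure.
    if s is None:
        s = {}
    if x == y:
        return s
    if is_var(x):
        return unify_var(x, y, s)
    if is_var(y):
        return unify_var(y, x, s)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):        # same symbol, same number of arguments
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None                       # e.g. different predicate symbols

def unify_var(v, t, s):
    if v in s:
        return unify(s[v], t, s)
    if is_var(t) and t in s:
        return unify(v, s[t], s)
    if occurs(v, t):
        return None
    return {**s, v: t}

print(unify(('knows', 'Richard', '?x'), ('knows', 'Richard', 'John')))
# -> {'?x': 'John'}, matching example 6 below
print(unify(('p', '?X', '?X'), ('p', '?Z', ('f', '?Z'))))
# -> None: fails the occurs check, matching example 3 below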
Implementation of the Algorithm
Step.1: Initialize the substitution set to be empty.
Step.2: Recursively unify atomic sentences:
 Check for Identical expression match.
 If one expression is a variable vi, and the other is a term ti which does
not contain variable vi, then:
 Substitute ti / vi in the existing substitutions
 Add ti /vi to the substitution setlist.
 If both the expressions are functions, then function name must be
similar, and the number of arguments must be the same in both the
expression.
 For each pair of the following atomic sentences find the most general
unifier (If exist).
1. Find the MGU of {p(f(a), g(Y)) and p(X, X)}
            Sol: S0 => Here, Ψ1 = p(f(a), g(Y)), and Ψ2 = p(X, X)
                  SUBST θ= {f(a) / X}
                  S1 => Ψ1 = p(f(a), g(Y)), and Ψ2 = p(f(a), f(a))
                  SUBST θ= {f(a) / g(Y)} is not possible, since g(Y) and
f(a) have different function symbols; unification failed.
Unification is not possible for these expressions.
2. Find the MGU of {p(b, X, f(g(Z))) and p(Z, f(Y), f(Y))}
Here, Ψ1 = p(b, X, f(g(Z))) , and Ψ2 = p(Z, f(Y), f(Y))
S0 => { p(b, X, f(g(Z))); p(Z, f(Y), f(Y))}
SUBST θ={b/Z}
S1 => { p(b, X, f(g(b))); p(b, f(Y), f(Y))}
SUBST θ={f(Y) /X}
S2 => { p(b, f(Y), f(g(b))); p(b, f(Y), f(Y))}
SUBST θ= {g(b) /Y}
S2 => { p(b, f(g(b)), f(g(b)); p(b, f(g(b)), f(g(b))} Unified Successfully.
And Unifier = { b/Z, f(Y) /X , g(b) /Y}.
3. Find the MGU of {p (X, X), and p (Z, f(Z))}
Here, Ψ1 = {p (X, X), and Ψ2 = p (Z, f(Z))
S0 => {p (X, X), p (Z, f(Z))}
SUBST θ= {X/Z}
S1 => {p (Z, Z), p (Z, f(Z))}
SUBST θ= {f(Z) / Z}, Unification Failed.
Hence, unification is not possible for these expressions.
4. Find the MGU of UNIFY(prime (11), prime(y))
Here, Ψ1 = {prime(11) , and Ψ2 = prime(y)}
S0 => {prime(11) , prime(y)}
SUBST θ= {11/y}
S1 => {prime(11) , prime(11)} , Successfully unified.
 Unifier: {11/y}.
5. Find the MGU of Q(a, g(x, a), f(y)), Q(a, g(f(b), a), x)}
Here, Ψ1 = Q(a, g(x, a), f(y)), and Ψ2 = Q(a, g(f(b), a), x)
S0 => {Q(a, g(x, a), f(y)); Q(a, g(f(b), a), x)}
SUBST θ= {f(b)/x}
S1 => {Q(a, g(f(b), a), f(y)); Q(a, g(f(b), a), f(b))}
SUBST θ= {b/y}
S1 => {Q(a, g(f(b), a), f(b)); Q(a, g(f(b), a), f(b))}, Successfully Unified.
Unifier: {f(b)/x, b/y}.
6. UNIFY(knows(Richard, x), knows(Richard, John))
Here, Ψ1 = knows(Richard, x), and Ψ2 = knows(Richard, John)
S0 => { knows(Richard, x); knows(Richard, John)}
SUBST θ= {John/x}
S1 => { knows(Richard, John); knows(Richard, John)}, Successfully Unified.
Unifier: {John/x}.
Forward Chaining and backward chaining in AI
Forward and backward chaining is an essential topic in artificial intelligence,
but before we go into forward and backward chaining, let's look at where these
two phrases come from.
Inference engine:
In artificial intelligence, the inference engine is a component of the intelligent
system that applies logical rules to the knowledge base to infer new
information from known facts. The expert system included the first inference
engine. Inference engines often operate in one of two modes:
 Forward chaining
 Backward chaining
Horn Clause and Definite clause
Horn clause and definite clause are sentence types that allow the knowledge
base to apply a more limited and efficient inference procedure. Forward and
backward chaining techniques are used in logical inference algorithms, and
they both need KB in the form of a first-order definite sentence.
A definite clause, sometimes known as a strict horn clause, is a clause that is a
disjunction of literals with exactly one affirmative literal.
Horn clause: A horn clause is a clause that is a disjunction of literals with at
most one positive literal. As a result, every definite clause is a horn clause.
Example: (¬ p V ¬ q V k). It has only one positive literal k.
It is equivalent to p ∧ q → k.
Forward Chaining
When employing an inference engine, forward chaining is also known as
forward deduction or forward reasoning. Forward chaining is a type of
reasoning that starts with atomic sentences in a knowledge base and uses
inference rules (Modus Ponens) to extract more material in the forward
direction until a goal is attained.
The Forward-chaining algorithm begins with known facts, then activates all
rules with satisfied premises and adds their conclusion to the known facts.
This process continues until the issue is resolved.
Properties of Forward-Chaining
 It is a bottom-up approach, as it moves from bottom to top.
 It is a method of arriving at a conclusion based on known facts or data
by starting at the beginning and working one's way to the end.
 Forward-chaining is also known as data-driven since it allows us to
achieve our goal by utilising existing data.
 Expert systems, such as CLIPS, business, and production rule systems,
frequently employ the forward-chaining approach.
Consider the following well-known example, which we'll apply to both ways.
Example
"It is illegal for an American to sell weapons to unfriendly countries,
according to the law. Country A, an American foe, has a few missiles, all of
which were sold to it by Robert, an American citizen."
Demonstrate that "Robert is a criminal."
To answer the problem, we'll turn all of the facts above into first-order definite
clauses, then utilise a forward-chaining procedure to get to the goal.
Facts Conversion into FOL
 It is a crime for an American to sell weapons to hostile nations. (Let's
say p, q, and r are variables)
American (p) ∧ weapon(q) ∧ sells (p, q, r) ∧ hostile(r) → Criminal(p)  
...(1)
 Country A has some missiles: ∃p Owns(A, p) ∧ Missile(p). It can be
written as two definite clauses by using Existential Instantiation,
introducing a new constant T1.
Owns(A, T1)             ......(2)
Missile(T1)             .......(3)
 All of the missiles were sold to country A by Robert.
∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A)       ......(4)
 Missiles are weapons.
Missile(p) → Weapons (p)             .......(5)
 Enemy of America is known as hostile.
Enemy(p, America) →Hostile(p)             ........(6)
 Country A is an enemy of America.
Enemy (A, America)             .........(7)
 Robert is American
American(Robert).             ..........(8)
Forward chaining proof
Step-1:
In the first step we will start with the known facts and will choose the
sentences which do not have implications, such as: American(Robert),
Enemy(A, America), Owns(A, T1), and Missile(T1). All these facts will be
represented as below.
Step-2:
At the second step, we will see those facts which infer from available facts and
with satisfied premises.
Rule-(1) does not have its premises satisfied, so it will not be added in
the first iteration.
Rule-(2) and (3) are already added.
Rule-(4) is satisfied with the substitution {p/T1}, so Sells(Robert, T1, A)
is added, which is inferred from the conjunction of Rule-(2) and (3).
Rule-(6) is satisfied with the substitution {p/A}, so Hostile(A) is added,
which is inferred from Rule-(7).

Step-3:
At step-3, as we can check Rule-(1) is satisfied with the
substitution {p/Robert, q/T1, r/A}, so we can add Criminal(Robert) which
infers all the available facts. And hence we reached our goal statement.

Hence it is proved that Robert is Criminal using forward chaining approach.
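The forward-chaining loop itself is tiny: keep firing any rule whose
premises are all among the known facts until the goal appears or nothing
new can be added. Below is a propositionalised Python sketch of the same
proof, with clauses (1)-(8) ground out by hand for p=Robert, q=T1, r=A (an
illustrative simplification of true first-order forward chaining):

facts = {'American(Robert)', 'Missile(T1)', 'Owns(A,T1)', 'Enemy(A,America)'}
rules = [
    ({'Missile(T1)'}, 'Weapon(T1)'),                                   # (5)
    ({'Missile(T1)', 'Owns(A,T1)'}, 'Sells(Robert,T1,A)'),             # (4)
    ({'Enemy(A,America)'}, 'Hostile(A)'),                              # (6)
    ({'American(Robert)', 'Weapon(T1)', 'Sells(Robert,T1,A)',
      'Hostile(A)'}, 'Criminal(Robert)'),                              # (1)
]

changed = True
while changed:                    # fire rules until a fixed point is reached
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print('Criminal(Robert)' in facts)   # -> True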


Backward Chaining
When employing an inference engine, backward-chaining is also known as
backward deduction or backward reasoning. A backward chaining algorithm is
a type of reasoning that begins with the objective and goes backward, chaining
via rules to discover known facts that support it.
Properties of backward chaining
 It is known as a top-down strategy.
 The modus ponens inference rule is used in backward-chaining.
 In backward chaining, the goal is broken down into sub-goals in order
to prove the facts true.
 It's known as a goal-driven strategy because a list of objectives
determines which rules are chosen and implemented.
 In game theory, automated theorem proving tools, inference engines,
proof helpers, and other AI applications, the backward-chaining
algorithm is used.
 For proofs, the backward-chaining method primarily uses a depth-first
search strategy.
Example:
In backward-chaining, we will use the same above example, and will rewrite
all the rules.
American (p) ∧ weapon(q) ∧ sells (p, q, r) ∧ hostile(r) → Criminal(p) ...(1)
Owns(A, T1)                 ........(2)
Missile(T1)                 .......(3)
∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A)           ......(4)
Missile(p) → Weapons (p)                 .......(5)
Enemy(p, America) →Hostile(p)                 ........(6)
Enemy (A, America)                 .........(7)
American(Robert).                 ..........(8)
Backward-Chaining proof
In Backward chaining, we will start with our goal predicate, which
is Criminal(Robert), and then infer further rules.
Step-1:
At the first step, we will take the goal fact. And from the goal fact, we will
infer other facts, and at last, we will prove those facts true. So our goal fact is
"Robert is Criminal," so following is the predicate of it.

Step-2:
At the second step, we will infer other facts form goal fact which satisfies the
rules. So as we can see in Rule-1, the goal predicate Criminal (Robert) is
present with substitution {Robert/P}. So we will add all the conjunctive facts
below the first level and will replace p with Robert.
Here we can see American (Robert) is a fact, so it is proved here.

Step-3: At step-3, we will extract the further fact Missile(q), which is
inferred from Weapon(q) as it satisfies Rule-(5). Weapon(q) is also true
with the substitution of the constant T1 for q.
Step-4:
At step-4, we can infer the facts Missile(T1) and Owns(A, T1) from
Sells(Robert, T1, r), which satisfies Rule-(4) with the substitution of A
in place of r. So these two statements are proved here.
Step-5:
At step-5, we can infer the fact Enemy(A, America) from Hostile(A) which
satisfies Rule- 6. And hence all the statements are proved true using backward
chaining.
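Backward chaining over the same propositionalised rule base is a recursive,
depth-first search: a goal holds if it is a known fact, or if some rule
concludes it and every premise of that rule can itself be proved. A minimal
sketch (same illustrative facts and rules as in the forward-chaining
snippet; no loop check is needed because this rule base is acyclic):

facts = {'American(Robert)', 'Missile(T1)', 'Owns(A,T1)', 'Enemy(A,America)'}
rules = [({'Missile(T1)'}, 'Weapon(T1)'),
         ({'Missile(T1)', 'Owns(A,T1)'}, 'Sells(Robert,T1,A)'),
         ({'Enemy(A,America)'}, 'Hostile(A)'),
         ({'American(Robert)', 'Weapon(T1)', 'Sells(Robert,T1,A)',
           'Hostile(A)'}, 'Criminal(Robert)')]

def backward_chain(goal):
    if goal in facts:                 # the goal is already a known fact
        return True
    # Otherwise try every rule that concludes the goal and prove its premises.
    return any(conclusion == goal and all(backward_chain(p) for p in premises)
               for premises, conclusion in rules)

print(backward_chain('Criminal(Robert)'))   # -> True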
Difference between backward chaining and forward chaining

 Forward chaining begins with known facts and moves forward by
applying inference rules to extract more data until it reaches the goal,
whereas backward chaining begins with the goal and moves backward
by applying inference rules to determine the facts that satisfy the goal.
 A data-driven inference technique is forward chaining, and a goal-
driven inference strategy is backward chaining.
 Forward chaining is known as the bottom-up approach, and backward
chaining is known as the top-down approach.
 Forward chaining employs a breadth-first approach, whereas backward
chaining employs a depth-first approach.
 The Modus ponens inference rule is used in both forward and
backward chaining.
 Forward chaining is useful for jobs like planning, design process
monitoring, diagnosis, and classification, whereas backward chaining
is useful for tasks like categorization and diagnosis.
 Backward chaining aims to avoid the needless path of reasoning,
whereas forward chaining can be like an exhaustive search.
 There may be a variety of ASK questions from the knowledge base in
forward-chaining, whereas there may be fewer ASK questions in
backward-chaining.
 Forward chaining is sluggish because it tests all of the rules, whereas
backward chaining is quick since it simply checks the rules that are
required.

1. Forward chaining starts from known facts and applies inference rules to
extract more data until it reaches the goal. Backward chaining starts from
the goal and works backward through inference rules to find the required
facts that support the goal.

2. Forward chaining is a bottom-up approach; backward chaining is a
top-down approach.

3. Forward chaining is known as a data-driven inference technique, as we
reach the goal using the available data; backward chaining is known as a
goal-driven technique, as we start from the goal and divide it into
sub-goals to extract the facts.

4. Forward chaining reasoning applies a breadth-first search strategy;
backward chaining reasoning applies a depth-first search strategy.

5. Forward chaining tests all the available rules; backward chaining tests
only the few required rules.

6. Forward chaining is suitable for planning, monitoring, control, and
interpretation applications; backward chaining is suitable for diagnostic,
prescription, and debugging applications.

7. Forward chaining can generate an infinite number of possible
conclusions; backward chaining generates a finite number of possible
conclusions.

8. Forward chaining operates in the forward direction; backward chaining
operates in the backward direction.

9. Forward chaining is aimed at any conclusion; backward chaining is aimed
only at the required data.

Resolution in FOL
Resolution is a method of theorem proving that involves constructing
refutation proofs, i.e., proofs by contradiction. It was invented in 1965
by the mathematician John Alan Robinson.
When several statements are supplied and we need to prove a conclusion from
those claims, we employ resolution. In proofs by resolutions, unification is a
crucial idea. Resolution is a single inference rule that can work on either the
conjunctive normal form or the clausal form efficiently.
Clause: Disjunction of literals (an atomic sentence) is called a clause. It is also
known as a unit clause.
Conjunctive Normal Form: A sentence represented as a conjunction of clauses
is said to be conjunctive normal form or CNF.

The resolution inference rule


The resolution rule for first-order logic is simply a lifted version of the
propositional resolution rule. Resolution can resolve two clauses if they
contain complementary literals, which are assumed to be standardised apart
so that they share no variables.
(l1 ∨ ··· ∨ lk),   (m1 ∨ ··· ∨ mn)
⟹ SUBST(θ, l1 ∨ ··· ∨ l(i−1) ∨ l(i+1) ∨ ··· ∨ lk ∨ m1 ∨ ··· ∨ m(j−1) ∨ m(j+1) ∨ ··· ∨ mn)
where UNIFY(li, ¬mj) = θ, and li and mj are complementary literals.


This rule is also called the binary resolution rule because it only resolves
exactly two literals.
Example:
We can resolve two clauses which are given below:
[Animal(g(x)) V Loves(f(x), x)]       and       [¬Loves(a, b) V ¬Kills(a, b)]
Where the two complementary literals are: Loves(f(x), x) and ¬Loves(a, b)
These literals can be unified with the unifier θ = [a/f(x), b/x], and this
will generate the resolvent clause:
[Animal(g(x)) V ¬Kills(f(x), x)].
Steps for Resolution:
 Conversion of facts into first-order logic.
 Convert FOL statements into CNF
 Negate the statement which needs to prove (proof by contradiction)
 Draw resolution graph (unification).
To better understand all the above steps, we will take an example in which we
will apply resolution.
Example:
 John likes all kind of food.
 Apple and vegetable are food
 Anything anyone eats and not killed is food.
 Anil eats peanuts and still alive
 Harry eats everything that Anil eats.
Prove by resolution that:
 John likes peanuts.
Step-1: Conversion of Facts into FOL
In the first step we will convert all the given statements into its first order
logic.

Step-2: Conversion of FOL into CNF


In First order logic resolution, it is required to convert the FOL into CNF as
CNF form makes easier for resolution proofs.
Eliminate all implication (→) and rewrite
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ [eats(x, y) Λ ¬ killed(x)] V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x¬ [¬ killed(x) ] V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
Move negation (¬)inwards and rewrite
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ eats(x, y) V killed(x) V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x killed(x) V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
Rename variables or standardize variables
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀y ∀z ¬ eats(y, z) V killed(y) V food(z)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀w¬ eats(Anil, w) V eats(Harry, w)
f. ∀g killed(g) V alive(g)
g. ∀k ¬ alive(k) V ¬ killed(k)
h. likes(John, Peanuts).
Eliminate existential instantiation quantifier by elimination.
In this step, we will eliminate existential quantifier ∃, and this process is
known as Skolemization. But in this example problem since there is no
existential quantifier so all the statements will remain same in this step.
Drop Universal quantifiers.
In this step we will drop all universal quantifier since all the statements are not
implicitly quantified so we don't need it.
a. ¬ food(x) V likes(John, x)
b. food(Apple)
c. food(vegetables)
d. ¬ eats(y, z) V killed(y) V food(z)
e. eats(Anil, Peanuts)
f. alive(Anil)
g. ¬ eats(Anil, w) V eats(Harry, w)
h. killed(g) V alive(g)
i. ¬ alive(k) V ¬ killed(k)
j. likes(John, Peanuts).
Distribute conjunction ∧ over disjunction ∨.
This step will not make any change in this problem.
Step-3: Negate the statement to be proved
In this statement, we will apply negation to the conclusion statements, which
will be written as ¬likes(John, Peanuts)
Step-4: Draw Resolution graph:
Now in this step, we will solve the problem by resolution tree using
substitution. For the above problem, it will be given as follows:
Hence the negation of the conclusion has been proved as a complete
contradiction with the given set of statements.

Explanation of Resolution graph


In the first step of resolution graph, ¬likes(John, Peanuts) , and likes(John,
x) get resolved(canceled) by substitution of {Peanuts/x}, and we are left
with ¬ food(Peanuts)
In the second step of the resolution graph, ¬ food(Peanuts) , and food(z) get
resolved (canceled) by substitution of { Peanuts/z}, and we are left with ¬
eats(y, Peanuts) V killed(y) .
In the third step of the resolution graph, ¬ eats(y, Peanuts) and eats (Anil,
Peanuts) get resolved by substitution {Anil/y}, and we are left
with Killed(Anil) .
In the fourth step of the resolution graph, Killed(Anil) and ¬ killed(k) get
resolve by substitution {Anil/k}, and we are left with ¬ alive(Anil) .
In the last step of the resolution graph ¬ alive(Anil) and alive(Anil) get
resolved.
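At the propositional level the resolution step itself is very small: two
clauses (sets of literals) containing a complementary pair resolve into
their union minus that pair, and the proof succeeds when the empty clause
appears. The Python sketch below uses an invented encoding ('~' marks
negation) and a deliberately tiny, hand-grounded clause set in the spirit
of the proof above:

from itertools import combinations

def resolve(c1, c2):
    # A clause is a frozenset of literals; returns all resolvents of c1, c2.
    resolvents = []
    for lit in c1:
        comp = lit[1:] if lit.startswith('~') else '~' + lit
        if comp in c2:
            resolvents.append((c1 - {lit}) | (c2 - {comp}))
    return resolvents

def refutes(clauses):
    # True iff saturating the clause set derives the empty clause,
    # i.e. the set (KB plus negated goal) is contradictory.
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:             # empty clause: contradiction found
                    return True
                new.add(r)
        if new <= clauses:            # nothing new: no contradiction
            return False
        clauses |= new

# 'John likes peanuts' in miniature: KB plus the negated goal must clash.
kb = [frozenset({'~Food(Peanuts)', 'Likes(John,Peanuts)'}),  # clause a, grounded
      frozenset({'Food(Peanuts)'}),
      frozenset({'~Likes(John,Peanuts)'})]                   # negated conclusion
print(refutes(kb))   # -> True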
Knowledge representation
Humans excel at comprehending, reasoning, and interpreting information.
Humans have knowledge about things and use that knowledge to accomplish
various activities in the real world. Knowledge representation and
reasoning deals with how machines can achieve these same things. Knowledge
representation can be described as follows:
 Knowledge representation and reasoning (KR, KRR) is a branch of
artificial intelligence that studies how AI agents think and how their
thinking influences their behaviour.
 It is in charge of encoding information about the real world in such a
way that a computer can comprehend it and use it to solve complicated
real-world problems like diagnosing a medical condition or conversing
with humans in natural language.
 It's also a means of describing how artificial intelligence can represent
knowledge. Knowledge representation is more than just storing data in
a database; it also allows an intelligent machine to learn from its
knowledge and experiences in order to act intelligently like a person.

Representation
Following are the kind of knowledge which needs to be represented in AI
systems:
 Object: All the facts about objects in our world domain. E.g., guitars
have strings, trumpets are brass instruments.
 Events: Events are the actions which occur in our world.
 Performance: It describe behavior which involves knowledge about
how to do things.
 Meta-knowledge: It is knowledge about what we know.
 Facts: Facts are the truths about the real world and what we represent.
 Knowledge-Base: The knowledge base is the most important part of
the knowledge-based agents. It's abbreviated as KB. The Sentences are
grouped together in the Knowledgebase (Here, sentences are used as a
technical term and not identical with the English language).
 Knowledge: Knowledge is the awareness or familiarity of facts, data,
and circumstances gained through experiences. The types of
knowledge in artificial intelligence are listed below.
Types of knowledge
Following are the various types of knowledge
1. Declarative Knowledge:
Declarative knowledge is the ability to understand something.
It contains ideas, facts, and objects.
It's also known as descriptive knowledge, and it's communicated using
declarative statements.
It is less complicated than procedural knowledge.
2. Procedural Knowledge
It's sometimes referred to as "imperative knowledge."
Procedure knowledge is a form of knowledge that entails knowing how to do
something.
It can be used to complete any assignment.
It has rules, plans, procedures, and agendas, among other things.
The use of procedural knowledge is contingent on the job at hand.
3. Meta-knowledge:
Knowledge about the other types of knowledge is called Meta-knowledge.
4. Heuristic knowledge:
Heuristic knowledge is the sum of the knowledge of a group of specialists in a
certain field or subject.
Heuristic knowledge refers to rules of thumb that are based on prior
experiences, awareness of methodologies, and are likely to work but not
guaranteed.
5. Structural knowledge:
Structural knowledge is basic knowledge used in problem solving.
It describes the relationships between concepts and objects, such as kind-of,
part-of, and grouping.
The relation between knowledge and intelligence
Real-world knowledge is essential for intelligence, and artificial intelligence
is no exception. When it comes to exhibiting intelligent behaviour in AI
entities, knowledge is crucial. Only when an agent has some knowledge or
experience of a given input can it act appropriately on it.
Consider what you would do if you encountered someone who spoke to you in
a language you did not understand. The same can be said for the agents'
intelligent behaviour.
As shown in the diagram below, a decision maker acts by sensing the
environment and applying its knowledge. If the knowledge component is
missing, however, it cannot display intelligent behaviour.
AI knowledge cycle
An Artificial intelligence system has the following components for displaying
intelligent behavior:
Perception
Learning
Knowledge Representation and Reasoning
Planning
Execution
The diagram above depicts how an AI system interacts with the real
environment and what components assist it in displaying intelligence.
Perception is a component of an AI system that allows it to gather information
from its surroundings. It can be in the form of visual, aural, or other sensory
input. The learning component is in charge of gaining knowledge from the
data collected by the Perception component. The main components of the
entire cycle are knowledge representation and reasoning; these two elements
are what allow a machine to display human-like intelligence. They are
distinct components, but they are closely linked. Planning and execution
both rely on the analysis performed by knowledge representation and
reasoning.
Approaches to knowledge representation:
There are mainly four approaches to knowledge representation, which are
given below:
1. Simple relational knowledge:
It is the most basic technique of storing facts that use the relational method,
with each fact about a group of objects laid out in columns in a logical order.
This method of knowledge representation is often used in database systems to
express the relationships between various things.
This method leaves minimal room for inference.
Example: The following is a simple relational knowledge representation.

Player     Weight   Age
Player1    65       23
Player2    58       18
Player3    75       24
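As an illustration, such a table might be held directly as rows in Python (a
minimal sketch; the relational method itself prescribes no particular
language, and the rows simply follow the table above):

# Simple relational knowledge: facts stored as flat rows, as in a
# database table. Lookup is easy, but little inference is possible.
players = [
    ("Player1", 65, 23),
    ("Player2", 58, 18),
    ("Player3", 75, 24),
]

heaviest = max(players, key=lambda row: row[1])   # select by Weight column
print(heaviest[0])   # Player3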
2. Inheritable knowledge:
All data must be kept in a hierarchy of classes in the inheritable knowledge
approach.
All classes should be organised in a hierarchical or generic fashion.
We use the inheritance property in this method.
 Elements inherit values from other members of their class.
The instance relation is a type of inheritable knowledge that illustrates a
relationship between an instance and a class.
Each individual frame might indicate a set of traits as well as their value.
 In this technique, objects and values are represented in boxed nodes.
Arrows are used to connect objects to their values.
Example: (figure: a hierarchy of classes, with instance and subclass arrows
connecting boxed objects and values)
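Since the example is a class hierarchy, the idea can be conveyed with a
minimal Python sketch (the Person/Player hierarchy and the attribute values
here are illustrative, not taken from the figure):

# Inheritable knowledge: classes form a hierarchy, and instances
# inherit default attribute values from the classes above them.
class Person:
    legs = 2                     # a default for all persons

class Player(Person):
    sport = "unspecified"

class CricketPlayer(Player):
    sport = "cricket"

p = CricketPlayer()
print(p.legs, p.sport)   # 2 cricket -- legs is inherited from Person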
3. Inferential knowledge:
Knowledge is represented in the form of formal logics in the inferential
knowledge approach.
More facts can be derived using this method.
It guarantees correctness.
Example: Let's suppose there are two statements:
a. Marcus is a man
b. All men are mortal
Then these can be represented as:
man(Marcus)
∀x man(x) ⇒ mortal(x)
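A sketch of how the Marcus example can be mechanised as a single
forward-chaining step (illustrative only, not a full theorem prover):

# Inferential knowledge: derive new facts from formal rules.
facts = {("man", "Marcus")}
rules = [("man", "mortal")]          # encodes: ∀x man(x) ⇒ mortal(x)

for premise, conclusion in rules:
    for pred, arg in list(facts):
        if pred == premise:
            facts.add((conclusion, arg))

print(("mortal", "Marcus") in facts)   # True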
4. Procedural knowledge:
Small programmes and codes are used in the procedural knowledge approach
to specify how to do specific things and how to proceed.
One significant rule employed in this method is the If-Then rule.
With this approach we may employ various programming languages, such as
LISP and Prolog.
Using this method, we can readily represent heuristic or domain-specific
information.
However, not all cases can be represented in this approach, which is its
drawback.
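A minimal If-Then sketch in Python (the thermostat rules below are
illustrative, not from the text):

# Procedural knowledge as If-Then rules: each rule pairs a condition
# with the procedure to run when that condition holds.
rules = [
    (lambda s: s["temperature"] > 25, lambda s: print("Turn cooling on")),
    (lambda s: s["temperature"] < 18, lambda s: print("Turn heating on")),
]

state = {"temperature": 28}
for condition, action in rules:
    if condition(state):
        action(state)        # prints: Turn cooling on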
Requirements for knowledge Representation system:
A good knowledge representation system must possess the following
properties.
1. Representational Accuracy:
The KR system should have the ability to represent all kinds of required
knowledge.
2. Inferential Adequacy:
The KR system should have the ability to manipulate the representational
structures so as to produce new knowledge corresponding to the existing
structures.
3. Inferential Efficiency:
The ability to direct the inferential mechanism in the most productive
directions by storing appropriate guides.
4. Acquisitional Efficiency:
The ability to acquire new knowledge easily using automatic methods.
ONTOLOGICAL ENGINEERING
Events, Time, Physical Objects, and Beliefs are examples of concepts that
occur in many different domains. Ontological engineering is the process of
representing such abstract, general concepts.
Because of the habit of showing graphs with the general concepts at the top
and the more specialised concepts below them, the overall framework of
concepts is called an upper ontology, as seen in Figure.
Categories and Objects
The categorization of objects is an important aspect of knowledge
representation. Although much reasoning takes place at the level of categories,
much engagement with the environment takes place at the level of particular
things.
A shopper's goal, for example, would generally be to purchase a basketball
rather than a specific basketball, such as BB9. In first-order logic, there are
two ways to represent categories: predicates and objects. That is, we can use
the predicate Basketball(b), or we can reify the category as an object,
Basketballs.
To state that b is a member of the category of basketballs, we may say
Member(b, Basketballs), which we abbreviate as b ∈ Basketballs. Because
Basketballs is a subcategory of Balls, we say Subset(Basketballs, Balls),
abbreviated as Basketballs ⊂ Balls. Through inheritance, categories help to
organise and simplify the information base. We can deduce that every apple is
edible if we state that all instances of the category Food are edible and that
Fruit is a subclass of Food and Apples is a subclass of Fruit. Individual apples
are said to inherit the quality of edibility from their membership in the Food
category. By connecting things to categories or quantifying over their
members, first-order logic makes it simple to state truths about categories.
Here are some instances of different types of facts:
• An object is a member of a category.
BB9 ∈ Basketballs
• A category is a subclass of another category. Basketballs ⊂ Balls
• All members of a category have some properties.
(x∈ Basketballs) ⇒ Spherical (x)
• Members of a category can be recognized by some properties. Orange(x) ∧
Round (x) ∧ Diameter(x)=9.5 ∧ x∈ Balls ⇒ x∈ Basketballs
• A category as a whole has some properties.
Dogs ∈ DomesticatedSpecies
Because Dogs is a category and is also a member of DomesticatedSpecies, the
latter must be a category of categories. Categories can also be formed by
establishing membership requirements that are both required and sufficient. A
bachelor, for example, is a single adult male:
x∈ Bachelors ⇔ Unmarried(x) ∧ x∈ Adults ∧ x∈ Males
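These category facts translate directly into simple data structures; a sketch
(BB9 and the category names follow the text, while the Apple1 instance is
added for the inheritance example):

# Categories and objects: membership, subclass links, and inheritance.
subclass_of = {"Basketballs": "Balls", "Apples": "Fruit", "Fruit": "Food"}
member_of   = {"BB9": "Basketballs", "Apple1": "Apples"}
properties  = {"Food": ["edible"], "Basketballs": ["spherical"]}

def inherited_properties(obj):
    props, category = [], member_of[obj]
    while category is not None:          # walk up the subclass chain
        props += properties.get(category, [])
        category = subclass_of.get(category)
    return props

print(inherited_properties("BB9"))      # ['spherical']
print(inherited_properties("Apple1"))   # ['edible'], via Fruit and Food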
Physical Composition
To declare that one thing is part of another, we use the general PartOf
relation. Objects can be grouped into PartOf hierarchies, similar to the
Subset hierarchy:
PartOf (Bucharest, Romania)
PartOf (Romania, EasternEurope)
PartOf(EasternEurope, Europe)
PartOf (Europe, Earth)
The PartOf relation is transitive and reflexive; that is,
PartOf (x, y) ∧ PartOf (y, z) ⇒ PartOf (x, z)
PartOf (x, x)
Therefore, we can conclude PartOf (Bucharest, Earth).
For example, if the apples are Apple1, Apple2, and Apple3, then
BunchOf ({Apple1,Apple2,Apple3})
denotes the composite object with the three apples as parts (not elements). We
can define BunchOf in terms of the PartOf relation. Obviously, each element
of s is part of BunchOf (s):
∀x x ∈ s ⇒ PartOf (x, BunchOf (s))
Furthermore, BunchOf (s) is the smallest object satisfying this condition. In
other words, BunchOf (s) must be part of any object that has all the elements
of s as parts:
∀y [∀x x ∈ s ⇒ PartOf (x, y)] ⇒ PartOf (BunchOf (s), y)
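The transitivity axiom can be mechanised by computing the transitive closure
of the stored facts; a sketch (the facts follow the Bucharest example, and
reflexivity is omitted for brevity):

# PartOf is transitive: close the relation, then answer PartOf queries.
part_of = {("Bucharest", "Romania"), ("Romania", "EasternEurope"),
           ("EasternEurope", "Europe"), ("Europe", "Earth")}

def transitive_closure(pairs):
    closed, changed = set(pairs), True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

print(("Bucharest", "Earth") in transitive_closure(part_of))   # True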
Measurements
In both scientific and commonsense theories of the world, objects have height,
mass, cost, and so on. The values that we assign for these properties are called
measures. For example,
Length(L1) = Inches(1.5) = Centimeters(3.81)
Conversion between units is done by equating multiples of one unit to another:
Centimeters(2.54 × d) = Inches(d)
Similar axioms can be written for pounds and kilograms, seconds and days,
and dollars and cents. Measures can be used to describe objects as follows:
Diameter (Basketball12) = Inches(9.5)
ListPrice(Basketball12) = $(19)
d ∈ Days ⇒ Duration(d) = Hours(24)
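The conversion axiom amounts to expressing every length in one base unit; a
sketch (centimetres are chosen as the base unit here, an arbitrary choice):

# Measures: store quantities in a base unit (cm) and convert on demand.
CM_PER_INCH = 2.54

def inches(d):
    return d * CM_PER_INCH     # Centimeters(2.54 × d) = Inches(d)

def centimeters(d):
    return d                   # already in the base unit

print(round(inches(9.5), 2))                        # 24.13, the diameter
print(round(inches(1.5), 2) == centimeters(3.81))   # True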
Time Intervals
Event calculus opens up the possibility of talking about time, and about time
intervals. We will consider two kinds of time intervals: moments and extended
intervals. The distinction is that only moments have zero duration:
Partition({Moments, ExtendedIntervals}, Intervals)
i ∈ Moments ⇔ Duration(i) = Seconds(0)
The functions Begin and End pick out the earliest and latest moments in an
interval, and the function Time delivers the point on the time scale for a
moment. The function Duration gives the difference between the end time and
the start time:
Interval (i) ⇒ Duration(i) = (Time(End(i)) − Time(Begin(i)))
Time(Begin(AD1900)) = Seconds(0)
Time(Begin(AD2001)) = Seconds(3187324800)
Time(End(AD2001)) = Seconds(3218860800)
Duration(AD2001) = Seconds(31536000)
Two intervals Meet if the end time of the first equals the start time of the
second. The complete set of interval relations, as proposed by Allen (1983), is
shown graphically in Figure 12.2 and logically below:
Meet(i, j) ⇔ End(i) = Begin(j)
Before(i, j) ⇔ End(i) < Begin(j)
After (j, i) ⇔ Before(i, j)
During(i, j) ⇔ Begin(j) < Begin(i) < End(i) < End(j)
Overlap(i, j) ⇔ Begin(i) < Begin(j) < End(i) < End(j)
Begins(i, j) ⇔ Begin(i) = Begin(j)
Finishes(i, j) ⇔ End(i) = End(j)
Equals(i, j) ⇔ Begin(i) = Begin(j) ∧ End(i) = End(j)
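These relations are easy to mechanise once intervals are represented as
(begin, end) pairs; a sketch:

# Allen's interval relations over (begin, end) pairs.
def meet(i, j):     return i[1] == j[0]
def before(i, j):   return i[1] < j[0]
def during(i, j):   return j[0] < i[0] and i[1] < j[1]
def overlap(i, j):  return i[0] < j[0] < i[1] < j[1]
def begins(i, j):   return i[0] == j[0]
def finishes(i, j): return i[1] == j[1]
def equals(i, j):   return i[0] == j[0] and i[1] == j[1]

morning, afternoon = (9, 12), (12, 17)
print(meet(morning, afternoon))     # True
print(before((9, 10), afternoon))   # True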
EVENTS
Fluents and events are reified in the event calculus. For example, the fluent
At(Shankar, Berkeley) is an object that refers to the fact that Shankar is in
Berkeley, but by itself it does not say whether this is true. The predicate T,
as in T(At(Shankar, Berkeley), t), is used to assert that a fluent is true at
some point in time. Events are described as instances of event categories. The
event E1 of Shankar flying from San Francisco to Washington, D.C. is
described as
E1 ∈ Flyings ∧ Flyer (E1, Shankar) ∧ Origin(E1, SF) ∧ Destination (E1, DC)
Alternatively, we can define a three-argument version of the category of
flying events and say
E1 ∈ Flyings(Shankar, SF, DC)
We then use Happens(E1, i) to say that the event E1 took place over the time
interval i, and we say the same thing in functional form with Extent(E1) = i.
We represent time intervals by a (start, end) pair of times; that is, i = (t1, t2)
is the time interval that starts at t1 and ends at t2. The complete set of
predicates for one version of the event calculus is:
T(f, t)               Fluent f is true at time t
Happens(e, i)         Event e happens over the time interval i
Initiates(e, f, t)    Event e causes fluent f to start to hold at time t
Terminates(e, f, t)   Event e causes fluent f to cease to hold at time t
Clipped(f, i)         Fluent f ceases to be true at some point during interval i
Restored (f, i)       Fluent f becomes true sometime during interval i
We assume a distinguished event, Start, that describes the initial state by
saying which fluents are initiated or terminated at the start time. We define T
by saying that a fluent holds at a point in time if the fluent was initiated by an
event at some time in the past and was not made false (clipped) by an
intervening event. A fluent does not hold if it was terminated by an event and
not made true (restored) by another event. Formally, the axioms are:
Happens(e, (t1, t2)) ∧ Initiates(e, f, t1) ∧ ¬Clipped(f, (t1, t)) ∧ t1 < t ⇒ T(f, t)
Happens(e, (t1, t2)) ∧ Terminates(e, f, t1) ∧ ¬Restored (f, (t1, t)) ∧ t1 < t ⇒ ¬T(f, t)
where Clipped and Restored are defined by
Clipped(f, (t1, t2)) ⇔ ∃ e, t, t3 Happens(e, (t, t3)) ∧ t1 ≤ t < t2 ∧ Terminates(e, f, t)
Restored (f, (t1, t2)) ⇔ ∃ e, t, t3 Happens(e, (t, t3)) ∧ t1 ≤ t < t2 ∧ Initiates(e, f, t)
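The T axiom can be sketched directly in Python, assuming events are stored as
(event, start, end) triples and Initiates/Terminates as simple lookup tables
(the Shankar fluent and the event names are illustrative):

# Event calculus sketch: a fluent holds at time t if it was initiated by
# an earlier event and not clipped (terminated) in the meantime.
happens    = [("E1", 1, 2), ("E2", 5, 6)]
initiates  = {"E1": "At(Shankar, Berkeley)"}
terminates = {"E2": "At(Shankar, Berkeley)"}

def T(fluent, t):
    for e, t1, t2 in happens:
        if initiates.get(e) == fluent and t1 < t:
            clipped = any(terminates.get(e2) == fluent and t1 <= t3 < t
                          for e2, t3, t4 in happens)
            if not clipped:
                return True
    return False

print(T("At(Shankar, Berkeley)", 3))   # True
print(T("At(Shankar, Berkeley)", 7))   # False: clipped by E2 at time 5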
MENTAL EVENTS AND MENTAL OBJECTS
We require a model of the mental objects (or knowledge base) in someone's
head, as well as the mental processes that change those mental things. It is not
necessary to have a detailed model. We don't need to know how many
milliseconds it will take for a specific agent to make a deduction. We will be
content merely to be able to conclude that mother knows whether or not she is
sitting. We begin with the propositional attitudes that an agent can have
toward mental objects: attitudes such as Believes, Knows, Wants, Intends,
and Informs. The issue is that these attitudes behave differently than
"ordinary" predicates. For example, suppose we try to assert that Lois knows
that Superman can fly:
Knows(Lois, CanFly(Superman))
One minor issue with this is that we normally think of CanFly(Superman) as a
sentence, but here it appears as a term. That issue can be patched up just by
reifying CanFly(Superman), making it a fluent. A more serious problem is
that, if it is true that Superman is Clark Kent, then we must conclude that Lois
knows that Clark can fly:
(Superman = Clark) ∧ Knows(Lois, CanFly(Superman)) |= Knows(Lois, CanFly(Clark))
Modal logic is designed to address this problem. Regular logic is concerned
with a single modality, the modality of truth, allowing us to express "P is
true." Modal logic includes special modal operators that take sentences (rather
than terms) as arguments. For example, KA P is the notation for "A knows P,"
where K is the modal operator for knowledge. It takes two arguments: a
statement and an agent (written as the subscript). Modal logic has the
same syntax as first-order logic, with the exception that modal operators can
also be used to build sentences. A model in first-order logic has a set of
objects as well as an interpretation that translates each name to the correct
item, relation, or function. We want to be able to consider both the possibility
that Superman's secret identity is Clark and the chance that it isn't in modal
logic. As a result, we'll need a more complex model, one that includes a
collection of alternative worlds rather than a single real world. The worlds are
connected in a graph by accessibility relations, one relation for each modal
operator. We say that world w1 is accessible from world w0 with respect to
the modal operator KA if everything in w1 is consistent with what A knows in
w0, and we write this as Acc(KA,w0,w1). In diagrams below we show
accessibility as an arrow between possible worlds. In general, a knowledge
atom KAP is true in world w if and only if P is true in every world accessible
from w. The truth of more complex sentences is derived by recursive
application of this rule and the normal rules of first-order logic. That means
that modal logic can be used to reason about nested knowledge sentences:
what one agent knows about another agent’s knowledge. For example, we can
say that, even though Lois doesn't know whether Superman's secret identity is
Clark Kent, she does know that Clark knows:
KLois [KClark Identity(Superman, Clark) ∨ KClark ¬Identity(Superman, Clark)]
The figure below shows some possible worlds for this domain, with
accessibility relations for Lois and Superman.
In the TOP-LEFT diagram, it is common knowledge that Superman knows his
own identity, and neither he nor Lois has seen the weather report. So in w0 the
worlds w0 and w2 are accessible to Superman; maybe rain is predicted, maybe
not. For Lois all four worlds are accessible from each other; she doesn’t know
anything about the report or if Clark is Superman. But she does know that
Superman knows whether he is Clark, because in every world that is
accessible to Lois, either Superman knows I, or he knows ¬I. Lois does not
know which is the case, but either way she knows Superman knows. In the
TOP-RIGHT diagram it is common knowledge that Lois has seen the weather
report. So in w4 she knows rain is predicted and in w6 she knows rain is not
predicted. Superman does not know the report, but he knows that Lois knows,
because in every world that is accessible to him, either she knows R or she
knows ¬ R. In the BOTTOM diagram we represent the scenario where it is
common knowledge that Superman knows his identity, and Lois might or
might not have seen the weather report. We represent this by combining the
two top scenarios, and adding arrows to show that Superman does not know
which scenario actually holds. Lois does know, so we don’t need to add any
arrows for her. In w0 Superman still knows I but not R, and now he does not
know whether Lois knows R. From what Superman knows, he might be in w0
or w2, in which case Lois does not know whether R is true, or he could be in
w4, in which case she knows R, or w6, in which case she knows ¬R.
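The evaluation rule for knowledge atoms is short enough to sketch directly
(the worlds, facts, and accessibility relation below loosely mirror the
TOP-LEFT scenario; they are illustrative):

# Possible worlds: an agent knows P in world w iff P is true in every
# world accessible to that agent from w.
facts = {"w0": {"I"}, "w2": {"I", "R"}}       # I: identity, R: rain predicted
acc   = {("Superman", "w0"): {"w0", "w2"}}    # accessibility relation

def knows(agent, prop, world):
    return all(prop in facts[w] for w in acc[(agent, world)])

print(knows("Superman", "I", "w0"))   # True: I holds in w0 and w2
print(knows("Superman", "R", "w0"))   # False: R fails in w0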
REASONING SYSTEMS FOR CATEGORIES
This section discusses systems that are specifically built for categorical
organisation and reasoning. There are two families of systems: semantic
networks and description logics. Semantic networks provide graphical aids for
visualising a knowledge base and efficient algorithms for inferring properties
of an object based on its category membership, while description logics
provide a formal language for constructing and combining category definitions
and efficient algorithms for deciding subset and superset relationships between
categories.
SEMANTIC NETWORKS
Semantic networks come in a variety of shapes and sizes, but they may all
represent individual objects, categories of objects, and relationships between
things. A typical graphical notation shows object or category names in ovals
or boxes, with labelled links connecting them.
For example, the figure below has a MemberOf link between Mary and
FemalePersons, corresponding to the logical assertion Mary ∈ FemalePersons;
similarly, the SisterOf link between Mary and John corresponds to the
assertion SisterOf (Mary, John). We can connect categories using SubsetOf
links, and so on. We know that persons have female persons as mothers, so
can we draw a HasMother link from Persons to FemalePersons? The answer is
no, because HasMother is a relation between a person and his or her mother,
and categories do not have mothers. For this reason, we have used a special
notation—the double-boxed link—in Figure 12.5. This link asserts that
∀x x ∈ Persons ⇒ [∀y HasMother (x, y) ⇒ y ∈ FemalePersons]
We might also want to assert that persons have two legs—that is,
∀x x ∈ Persons ⇒ Legs(x, 2)
The semantic network notation makes it convenient to perform inheritance
reasoning. For example, by virtue of being a person, Mary inherits the
property of having two legs. Thus, to find out how many legs Mary has, the
inheritance algorithm follows the MemberOf link from Mary to the category
she belongs to, and then follows SubsetOf links up the hierarchy until it finds
a category for which there is a boxed Legs link—in this case, the Persons
category.
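That inheritance algorithm is a short loop over the links; a sketch (Mary, the
categories, and the Legs link follow the figure):

# Semantic-network inheritance: follow the MemberOf link, then SubsetOf
# links upward, until a category carries a boxed value for the attribute.
member_of = {"Mary": "FemalePersons"}
subset_of = {"FemalePersons": "Persons"}
boxed     = {"Persons": {"Legs": 2}}

def lookup(obj, attr):
    category = member_of[obj]
    while category is not None:
        if attr in boxed.get(category, {}):
            return boxed[category][attr]
        category = subset_of.get(category)
    return None

print(lookup("Mary", "Legs"))   # 2, inherited from Persons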
Inheritance becomes complicated when an object can belong to more than one
category, or when a category can be a subset of more than one other category;
this is called multiple inheritance. The fact that links between bubbles
represent only binary relations is a drawback of semantic network notation
compared to first-order logic. For example, the sentence Fly(Shankar,
NewYork, NewDelhi, Yesterday) cannot be asserted directly in a semantic
network. Nonetheless, the effect of n-ary assertions can be obtained by
reifying the proposition as an event belonging to a suitable event category.
The semantic network structure for this event is depicted in the diagram
below. Note that the restriction to binary relations forces the creation of a
large ontology of reified concepts.
The capacity of semantic networks to represent default values for categories is
one of its most essential features. When looking closely at Figure 3.6, one can
see that John only has one leg, despite the fact that he is a human, and all
people have two legs.
In a strictly logical KB, this would be a contradiction, but in a semantic
network, the assertion that all persons have two legs has only
default status; that is, a person is assumed to have two legs unless this is
contradicted by more specific information
We also include the negated goal ¬Criminal (West). The resolution proof is
shown below.
Take note of the structure: a single "spine" starts with the goal clause and
resolves against clauses from the knowledge base until the empty clause is
produced. This is typical of resolution over Horn clause knowledge bases. The
clauses along the main spine correspond to the successive values of the goals
variable in the backward-chaining algorithm of the figure. This is because, in
backward chaining, we always choose to resolve with a clause whose positive
literal unifies with the leftmost literal of the "current" clause on the spine.
Backward chaining is thus just a special case of resolution with a particular
control strategy for deciding which resolution to perform next.
UNIT IV
SOFTWARE AGENTS
DEFINITION
Agent architectures, like software architectures, are formal descriptions of the
components that make up a system and how they communicate. These
components can also be defined by patterns with certain constraints.
 Two typical architectural patterns are pipe-and-filter and the layered
architecture.
 These patterns define the interconnections between components.
 In the pipe-and-filter pattern, data is transported through a series of one
or more components, each of which performs a transformation.
 Layered simply indicates that the system is made up of a series of
layers, each of which provides a distinct set of logical capabilities, and
that connectivity is typically limited to layers that are next to one
another.
TYPES OF ARCHITECTURES
A variety of agent architectures are available to assist, depending on the goals
of the agent application. This section will go through some of the most
common architecture types as well as some of the applications that they can be
utilised for.
1. Reactive architectures
2. Deliberative architectures
3. Blackboard architectures
4. Belief-desire-intention (BDI) architecture
5. Hybrid architectures
6. Mobile architectures
REACTIVE ARCHITECTURES
 The simplest architecture for agents is a reactive architecture.
 Agent behaviours are just a mapping between stimuli and reaction in
this design.
 The agent has no decision-making abilities; it simply reacts to its
surroundings.
 The agent merely reads the environment and maps the state of the
environment to one or more actions. Given the circumstances, more
than one action may be suitable, and the agent must choose one.
 Reactive architectures have the advantage of being very fast.
 This type of design is simple to implement in hardware and quick to
look up in software.
 Reactive designs have the drawback of being limited to basic
situations.
 Sequences of actions require the presence of state, which is not
encoded into the mapping function.
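A minimal sketch of such a stimulus-to-action mapping (the percepts and
actions are illustrative):

# Reactive agent: a pure mapping from perceived state to action,
# with no internal state and no deliberation.
policy = {
    "obstacle_ahead": "turn_left",
    "clear_path":     "move_forward",
    "goal_visible":   "approach_goal",
}

def react(percept):
    return policy.get(percept, "stop")   # a default action

print(react("obstacle_ahead"))   # turn_left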
DELIBERATIVE ARCHITECTURES
 As the name implies, a deliberative architecture incorporates some
deliberation about which action to take given the present set of inputs.
 Rather than mapping sensors to actuators directly, the deliberative
architecture evaluates the sensors, state, previous consequences of
specific actions, and other data to determine the appropriate action to
take.
 The mechanism for action selection is left unspecified; it may be a
production system, a neural network, or any other intelligent
algorithm.
 The deliberative architecture has the advantage of being able to address
far more complex problems than the reactive architecture.
 It is capable of planning and carrying out sequences of activities in
order to reach a goal.
 The disadvantage is that it is slower than the reactive architecture,
because of the deliberation required to select an action.
Figure: Reactive architecture defines a simple agent.
Figure: A deliberative agent architecture considers its actions.
BLACKBOARD ARCHITECTURES
 The blackboard architecture is a highly common and also extremely
intriguing design.
 HEARSAY-II, a speech interpretation system, was the first blackboard
architecture. This structure is based on a global work area known as
the blackboard.
 The blackboard is a shared workspace for a group of agents that are
working together to solve an issue.
 As a result, the blackboard not only carries environmental information,
but also intermediate work outputs from cooperating agents.
 In this example, two independent agents interface with the
environment: one samples it using the available sensors (the sensor
agent), and the other drives the available actuators (the action agent).
 The sensor agent constantly updates the current state of the
environment in the blackboard, and when an action (as stated in the
blackboard) can be done, the action agent converts this action into
control of the actuators.
 One or more reasoning agents are in charge of controlling the agent
system.
 These agents collaborate to fulfil the objectives, which are also posted
on the blackboard.
 The goal definition behaviours might be implemented by the first
reasoning agent, while the planning element could be implemented by
the second reasoning agent (to translate goals into sequences of
actions).
 Because the blackboard is a shared workspace, coordination is required
to ensure that agents do not step on one another.
 As a result, agents are scheduled according to their requirements.
Agents, for example, can keep an eye on the blackboard and request
the ability to operate when information is updated.
 The scheduler can then determine which agents want to work on the
blackboard and subsequently summon them.
 The blackboard architecture, with its globally available work area, is
simple to implement in a multi-threaded system, as sketched below.
 Each agent becomes one or more threads in the system. From this
standpoint, the blackboard architecture is fairly common for both agent
and non-agent systems.
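A minimal sketch of that shared-workspace arrangement (the agents and the
trivial fixed schedule are illustrative):

# Blackboard architecture: cooperating agents read and write one
# globally shared workspace under a simple scheduler.
blackboard = {"environment": None, "action": None}

def sensor_agent(bb):       # posts the current state of the environment
    bb["environment"] = "temperature_high"

def reasoning_agent(bb):    # turns the posted state into a proposed action
    if bb["environment"] == "temperature_high":
        bb["action"] = "start_fan"

def action_agent(bb):       # executes whatever action has been posted
    if bb["action"]:
        print("executing", bb["action"])

for agent in (sensor_agent, reasoning_agent, action_agent):
    agent(blackboard)       # prints: executing start_fan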
BELIEF-DESIRE-INTENTION (BDI) ARCHITECTURE
 The BDI architecture, which stands for Belief-Desire-Intention, is
based on Michael Bratman's theory of human reasoning.
 Belief represents the agent's perspective on the world (what it believes
to be the state of the environment in which it exists). Desires are the
objectives that define the agent's motivation (what it wants to achieve).
 The agent's desires may be numerous, but they must be consistent.
 Finally, intentions are the acts that the agent selects, on the basis of its
beliefs and desires, in order to fulfil those desires.
 The BDI architecture, as previously stated, describes the fundamental
architecture of any deliberative agent. It maintains a set of goals
(wishes), stores a representation of the state of the environment
(beliefs), and finally, an intentional element that maps desires to beliefs
(to deliver one or more actions that affect the state of the environment
based on the agent's needs).
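A sketch of the resulting control flow, with beliefs, desires, and intentions
as plain data (the robot domain is illustrative):

# BDI sketch: beliefs (world state) and desires (goals) are deliberated
# over to produce intentions (the actions the agent commits to).
beliefs = {"battery_low": True, "at_dock": False}
desires = ["stay_charged", "patrol"]

def deliberate(beliefs, desires):
    if "stay_charged" in desires and beliefs["battery_low"]:
        return ["go_to_dock", "charge"]   # the adopted intention
    return ["patrol_area"]

intentions = deliberate(beliefs, desires)
print(intentions)   # ['go_to_dock', 'charge']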
HYBRID ARCHITECTURES
 Most architectures are hybrids, as is the case with traditional software
architecture.
 The architecture of a network stack, for example, combines a pipe-
and-filter architecture with a layered architecture.
 As there are global features that are visible and used by each
component of the design, this stack also shares certain elements of a
blackboard architecture.
 Agent architectures are no different. Different architectural aspects can
be chosen to satisfy the needs of the agent system based on those
needs.
MOBILE ARCHITECTURES
 The mobile agent architecture is the last architectural pattern we'll look
at.
 The possibility for agents to migrate between hosts is introduced by
this architectural design. The mobility part of the agent architecture
allows an agent to travel from one host to another.
 Any host that uses the mobile framework can move an agent.
 The mobile agent framework includes a protocol that allows hosts to
communicate with one another during agent migration.
 To prevent a mobile agent framework from becoming a virus conduit,
this architecture also requires authentication and security. A
mechanism for discovery is also included in the mobile agent
framework.
 Which hosts, for example, are available for migration and what
services do they offer? Agents can communicate with one another on
the same host or across hosts in preparation for migration, so
communication is implicit.
 The mobile agent architecture is beneficial since it allows for the
creation of intelligent distributed systems. In effect, it yields a dynamic
distributed system whose configuration and loading are defined by the
agents themselves.

ARCHITECTURE DESCRIPTIONS
1. Subsumption Architecture (Reactive Architecture)
2. Behavior Networks (Reactive Architecture)
3. ATLANTIS (Deliberative Architecture)
4. Homer (Deliberative Arch)
5. BB1 (Blackboard)
6. Open Agent Architecture (Blackboard)
7. Procedural Reasoning System (BDI)
8. Aglets (Mobile)
9. Messengers (Mobile)
10. Soar (Hybrid)
SUBSUMPTION ARCHITECTURE (REACTIVE ARCHITECTURE)
 Rodney Brooks established the Subsumption architecture in the late
1980s as a result of his research in behavior-based robots.
 Subsumption is based on the premise that intelligent behaviour may be
generated from a collection of simple behaviour modules.
 These modules of behaviour are grouped into layers. The actions that
are reflexive in nature are at the bottom, and the behaviours that are
more sophisticated are at the top. Take a look at the abstract model in
Figure.
 The reflexive behaviours (such as obstacle avoidance) are found at the
bottom (level 0). If these behaviours are required, level 0 consumes the
inputs and produces an action at the output. If there are no obstacles,
however, the next layer up is free to take over.
 Depending on the condition of the environment, a collection of
behaviours with distinct purposes compete for control at each level.
 To support this, levels can be inhibited (in other words, their outputs
are disabled). Sensor inputs can also be directed to higher layers by
suppressing lower levels, as depicted in the diagram.
 Subsumption is a parallel and distributed sensor and actuator
management framework. The underlying notion is that we start with a
basic set of behaviours and, once we've mastered them, we progress to
higher levels of behaviours.
 For instance, we start with obstacle avoidance and work our way up to
object finding. The architecture adopts a more evolutionary design
approach from this perspective.
 Subsumption is not without flaws. It is simple, but it does not appear
to scale well. As more layers are added, they tend to conflict with one
another, and the issue becomes how to layer the behaviours such that
each has the ability to take control when the moment is right.
 Subsumption is also reactive, which means that at the end of the day,
the architecture is still just mapping inputs to behaviours (no planning
occurs, for example). Subsumption does provide a way to pick which
behaviour is appropriate for a given situation.
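A sketch of that layering in Python (two illustrative layers; lower layers win
whenever their preconditions fire):

# Subsumption: the reflexive level 0 subsumes higher layers when needed.
def level0_avoid(percept):        # reflexive layer: obstacle avoidance
    return "turn_away" if percept["obstacle"] else None

def level1_wander(percept):       # higher-level behaviour
    return "wander"

def act(percept):
    return level0_avoid(percept) or level1_wander(percept)

print(act({"obstacle": True}))    # turn_away: level 0 takes over
print(act({"obstacle": False}))   # wander: the next layer up is free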
BEHAVIOR NETWORKS (REACTIVE ARCHITECTURE)
 Behavior networks, invented by Pattie Maes in the late 1980s, are
another distributed reactive design. The question of which action is
most appropriate for a given scenario is addressed by behaviour
networks.
 Behavior networks, as the name implies, are networks of behaviours
that incorporate activation and inhibition relationships.
 Figure is an example behaviour network for a game agent. Behaviors
are rectangles that specify the actions that the agent can take, as
indicated in the legend (attack, explore, reload, etc.).
 The ovals define the preconditions for selecting actions, which are
environmental inputs.
 Preconditions are linked to behaviours via activation links (which
stimulate the behaviour to be performed) or inhibition links (which
prevent the behaviour from being performed).
 After sampling the environment, the agent's behaviour is chosen based
on the present condition of the environment. The activation and
inhibition linkages are the first thing to notice. When an agent's health
is low, for example, assault and exploration are disabled, leaving the
agent to seek shelter. Additionally, the agent may come across medkits
or ammunition while exploring; if one is discovered, it is used.
 The competency modules
of Maes' algorithm featured preconditions (that must be met before the
module may activate), actions to be performed, and a level of
activation.
 The activation level is a criterion for determining when a competency
module can activate.
 The method also incorporates decay, which causes activations to fade
away over time. Behavior networks, like subsumption architecture, are
examples of Behavior-Based Systems (BBS). These systems' primitive
actions are all behaviours that are based on the condition of the
environment.
 Networks of behaviour are not without flaws. Because the architecture
is reactive, it does not support planning or higher-level behaviours.
When behaviours are extremely interdependent, the architecture can
suffer as well. With so many competing aims, the behaviour modules
may need to expand considerably in order to achieve the desired
results. This approach, on the other hand, is well suited to smaller
architectures, such as the FPS gaming agent shown in the figure below.
ATLANTIS (Deliberative Architecture)
 ATLANTIS (A Three-Layer Architecture for Navigating Through
Intricate Situations) was designed to produce a robot that could
navigate through dynamic and imperfect settings while pursuing
specific high-level goals.
 The purpose of ATLANTIS was to show that a goal-oriented robot
could be constructed using a hybrid architecture combining lower-level
reactive and higher-level deliberative behaviours.
 ATLANTIS assumes that these behaviours are not mutually exclusive,
unlike the subsumption design, which permits layers to subsume
control. The lowest layer can respond to the immediate needs of the
environment, but the top layer can facilitate planning and more goal-
oriented activities.
 Control in ATLANTIS is done from the ground up. The reactive
behaviours are found at the lowest level (the control layer).
 Based on the condition of the environment, these primitive level
activities can be conducted first. The sequencing layer is the following
layer. This layer is in charge of carrying out the deliberative layer's
plans.
 The deliberative layer keeps an internal representation of the
environment and devises strategies to achieve objectives.
 Depending on the status of the environment, the sequencing layer may
or may not fulfil the plan. This leaves the computation-intensive tasks
to the deliberative layer. This is another example of a hybrid
architecture.
 The controller layer integrates lower-level behavior-based approaches
with higher-level classical AI principles (in the deliberative layer).
Surprisingly, the deliberative layer does not direct the sequencing
layer; instead, it only advises it on possible action sequences.
 The low-level reactive layer and the higher-level deliberate layers are
asynchronous in this architecture, which is an advantage. This means
that the agent is immune to the dynamic environment while
deliberative strategies are being developed. This is because, whereas
deliberative layer planning can take time, the controller can deal with
random events in the environment.
HOMER (DELIBERATIVE ARCH)
 Homer is another fascinating modular and integrated deliberative
architecture. Vere and Bickmore designed Homer in 1990 as a
deliberative architecture with some notable differences from prior
systems.
 A memory separated into two pieces is at the heart of the Homer
architecture. The first section covers general information (such as
knowledge about the environment). The second section is known as
episodic knowledge, and it is utilised to record environmental
experiences (perceptions and actions taken).
 The natural language processor takes human input via a keyboard,
parses it, and uses a phrase generator to produce replies.
generates dynamic plans to achieve predefined objectives, and it can
replan if the situation necessitates it.
 A plan executor (or interpreter) is also included in the architecture,
which is utilised to carry out the plan at the actuators. A number of
monitor processes were also integrated in the architecture. The main
concept of Homer was to create a general intelligence architecture.
 Regular English language input would be possible via the keyboard,
and created English language sentences would be shown on a terminal.
As a result, the user may use the terminal to communicate with Homer,
setting goals and receiving feedback.
 Homer could record his observations of the world, complete with
timestamps, in order to engage in discussion with the user and provide
sensible responses to questions. Homer can add or remove knowledge
from his episodic memory through reflective (monitor) procedures.
 Homer is a fascinating architecture that incorporates a variety of
intriguing concepts, ranging from natural language processing to
planning and reasoning. One problem with Homer is that as the
episodic memory grows larger, it slows down the agent's overall
operation.

BB1 (BLACKBOARD)
 Barbara Hayes- Roth developed BB1, a domain-independent
blackboard design for AI systems. The architecture facilitates problem-
solving control as well as the explanation of its operations. In addition,
the architecture has the ability to learn new domain knowledge.
 The domain blackboard, which serves as the global database, and the
control blackboard, which is utilised to generate a solution to the given
control problem, are both included in BB1.
 The ability of BB1 to plan progressively is its fundamental feature.
BB1 generates a plan dynamically and adjusts to changes in the
environment, rather than developing a comprehensive plan for a
particular goal and then executing it. This is critical in dynamic
contexts, as unanticipated changes might result in brittle plans that fail.
PROCEDURAL REASONING SYSTEM (BDI)
 The Procedural Reasoning System (PRS) is a general-purpose
architecture that's suited for reasoning environments where established
processes can specify actions (action sequences).
 PRS is also a BDI architecture, simulating human reasoning theory.
PRS is a distributed architecture that combines reactive and goal-
directed deliberative processing.
 Through interaction with ambient sensors, the architecture is able to
construct a world-model of the environment (beliefs).
 Intentions can also be used to carry out actions. An interpreter (or
reasoner) sits at the heart of the system, which chooses a goal to pursue
(based on current beliefs) and then retrieves a plan to carry out in order
to attain that objective. During the execution of the plan, PRS
iteratively tests the plan's assumptions. This means it can function in
dynamic contexts where traditional planners would fail.
 Plans (also known as knowledge areas) in PRS are predetermined
actions that can be taken in the environment. This simplifies the
architecture because no plans must be generated; instead, they must be
selected based on the environment and the objectives to be reached.
 Even though planning is more about selection than search or
generation, the interpreter guarantees that changes in the environment
do not result in inconsistencies in the plan; when they would, a new
plan is selected to meet the objectives.
 When all necessary operations can be predefined, PRS is a helpful
design. Due to the lack of plan generation, it is also incredibly
efficient. As a result, PRS is an excellent agent architecture for
creating agents that drive mobile robots.
AGLETS (MOBILE)
 Aglets is a mobile agent framework created in the 1990s by IBM
Tokyo. Aglets is built using the Java programming language, which is
ideal for a mobile agent framework. First, the programmes are portable
to any machine that can run a Java Virtual Machine (both
homogeneous and heterogeneous) (JVM). Second, a Java Virtual
Machine is an excellent platform for migration services.
 Serialization, or the aggregation of a Java application's programme and
data into a single object that may be restarted, is supported by Java.
 The Java application is restarted on a fresh JVM in this situation. A
safe environment (sandbox) is also provided by Java to ensure that a
mobile agent framework does not become a virus distribution system.
The Aglets framework is depicted in the diagram above. The JVM is
located at the very bottom of the framework (the virtual machine that
interprets the Java byte codes). Following that is the agent runtime
environment and the mobility protocol. Aglet Transport Protocol (or
ATP) is a mobility protocol that allows agents to be serialised and then
transported to a host that the agent has previously established.
 The agent API is at the top of the stack, and it contains a variety of API
classes focused on agent operation in typical Java fashion. Last but not
least, there are the numerous agents that interact with the framework.
 A mobile agent framework relies on a variety of services provided by
the agent API and runtime environment. Agent management,
communication, and security are three of the most crucial functions. To
allow communication with outside agents, agents must be able to
register themselves on a specific host.
 Security measures must be implemented to ensure that the agent has
the authority to execute on the framework in order to support
communication.
 Aglets provides a mobile agent architecture with a number of essential
features, including mobility, communication, security, and
confidentiality. Aglets enable only rudimentary migration, in that the
agents can only migrate at random locations inside the code (such as
with the dispatch method).
MESSENGERS (MOBILE)
 Messengers is a runtime environment that allows you to migrate
processes (mobile agency).
 Strong migration, or the ability to migrate at arbitrary locations inside
the mobile application, is a distinguishing strength of the messengers
environment.
 The hop statement in the messengers environment specifies when and
where to move to a new destination.
 The messengers agent in the application resumes after the migration is
complete, at the point following the previous hop statement. As a
result, rather than employing a message system to send data to the
agent, the application travels to the data.
 When the data collection is huge and the migration links are slow, this
has clear advantages. The authors dubbed the messengers model
Navigational Programming, as well as Distributed Sequential
Computing (DSC).
 These ideas are intriguing because they provide a common
programming paradigm that is equivalent to the typical flow of
sequential programmes. This makes them simpler to create and
comprehend.
 As an example of DSC in the messengers environment, consider an
application that operates on enormous matrices stored across the
memory of a number of hosts.
SOAR (HYBRID)
 Soar is a symbolic cognitive architecture that was originally an
abbreviation for State-Operator-And-Result.
 Soar is a general-purpose AI system that includes a cognitive model
and an implementation of that model.
 Soar is inspired by Newell's unified theories of cognition. Soar is one
of the most extensively utilised designs, with applications ranging from
human behaviour research to gaming agent design in first-person
shooter games.
 The Soar architecture aims to create systems that embody general
intelligence. While Soar has many characteristics that support this
goal (for example, procedural, episodic, and declarative formats for
representing knowledge), it also lacks some key features, such as a
full episodic memory and a model of emotion.
 The problem-solving technique in Soar is built on a production system
(expert system).
 Behaviour is encoded as rules in the familiar if-then form. Problem
solving in Soar is best described as a search of a problem space (for a
goal node). If this method of problem solving fails, other methods,
such as hill climbing, are employed.
 When a solution is found, Soar employs a technique known as
chunking to learn a new rule based on the new information. If the agent
faces the problem again, it can utilise the rule to choose an action
rather than solving the problem again.
AGENT COMMUNICATION
Communication is a crucial feature in multi-agent systems since it allows for
both coordination and information transfer. Agents must also be able to
communicate their actions or plans. However, the manner in which
communication takes place is determined by the aim.
 Agents communicate in order to better achieve their own goals or
those of the society/system in which they exist.
 Communication allows agents to coordinate their activities and
behaviours, resulting in more coherent systems.
 Coordination is a property of a system of agents performing some
activity in a shared environment.
 The degree of coordination is the extent to which the agents avoid
unnecessary activity by reducing resource contention, avoiding
livelock and deadlock, and maintaining applicable safety conditions.
 Negotiation is coordination among competing or simply self-interested
agents, whereas cooperation is coordination among non-antagonistic
agents.
 To collaborate effectively, each agent must typically maintain a model
of the other agents as well as construct a model of future interactions.
This necessitates the presence of sociability. Coherence refers to how
well a system functions as a whole. The ability of a multiagent system
to maintain global coherence without explicit global control is a
challenge. In this situation, the agents must be able to define shared
goals, assign common responsibilities, avoid avoidable disputes, and
pool knowledge and evidence on their own. It is advantageous if the
agents are organised in some way.
Dimensions of Meaning
The formal study of communication has three components: syntax (how the
communication symbols are structured), semantics (what the symbols denote),
and pragmatics (how the symbols are used and interpreted).
The semantics and pragmatics of meaning are intertwined. Because agents
communicate in order to comprehend and be comprehended, it is critical to
analyse the various aspects of meaning related with communication.
 Descriptive vs. Prescriptive. Some messages describe events, while
others instruct on how to behave. Human comprehension relies on
descriptions, which are difficult for machines to replicate. Most agent
communication languages, then, are appropriately built for the sharing
of information regarding activities and behaviour.
 Personal vs. Conventional Meaning. An agent may have its own
interpretation of a message, which may differ from the interpretation
shared by the other agents with whom the agent interacts. Multiagent
systems should use conventional meanings as much as feasible,
especially since they are often open settings in which new agents can
be deployed at any time.
 Subjective vs. Objective Meaning: A message often has an explicit
effect on the environment, which can be perceived objectively (similar
to conventional meaning, where meaning is determined externally to an
agent). The effect may differ from what the sender or receiver of the
communication perceives internally, i.e. subjectively.
 Speaker's vs. Hearer's vs. Society's Perspective: A message can be
expressed according to the viewpoint of the speaker, hearer, or other
observers, regardless of its conventional or objective meaning.
 Semantics vs. Pragmatics: The pragmatics of a message refers to how
communicators use it. This includes considerations of the
communicators' mental states and the environment in which they
reside, as well as issues that are unrelated to the communication's
syntax and semantics.
 Contextuality: Messages can't be deciphered in isolation; they need to
be interpreted in light of the agents' mental states, the current condition
of the environment, and the environment's history: how it got to where
it is now. Previous communications and behaviours of the agents have
a direct impact on interpretations.
 Coverage: Smaller languages are more manageable, but they must be
large enough that an agent can convey the meanings it intends.
 Identity: When agents communicate, the meaning of the
communication is determined by the identities and functions of the
agents engaged, as well as how the agents are specified. A message
could be sent to a specific agent or to all agents who meet a set of
criteria.
 Cardinality: A communication transmitted privately to one agent
would be interpreted differently from a message broadcast to the
general public.
MESSAGE TYPES
 It is critical for agents with varying skills to be able to communicate
with one another. As a result, communication must be defined at
various levels, with the lowest level being utilised for contact with the
least capable agent.
 The agents must be able to communicate with one another in order to
be of interest to one another. They can play either an active, passive, or
both roles in this discourse, allowing them to act as a master, slave, or
peer, respectively.
 We suppose that an agent may send and receive messages through a
communication network, in accordance with the previous definition
and assumptions about an agent.
 The messages can take a variety of forms, as described below.
 Assertions and inquiries are the two most common message types.
Every agent, whether active or passive, needs to be able to accept data.
This information is sent to the agent from an external source via an
assertion in its most basic form. An agent must also be able to answer
inquiries in order to play a passive part in a dialogue, which means it
must be able to 1) accept a query from an external source and 2)
respond to the source by producing an assertion. It's worth noting that
from the perspective of the communication network, there's no
difference between an uninvited assertion and one made in response to
an inquiry.
 An agent must be able to ask questions and make statements in order to
take an active role in a conversation. With these capabilities, the agent
may be able to command another agent to answer to the query or
accept the asserted facts. This control method can also be used to
control subagents like neural networks and databases.
 In a dialogue, one agent acting as a peer with another agent can play
both active and passive roles. Both assertions and inquiries must be
able to be made and accepted.
SPEECH ACTS
The model for communication among computational agents is spoken human
conversation. Speech act theory is a prominent framework for evaluating
human communication. Human natural language is viewed as acts, such as
requests, proposals, commitments, and responses, according to the speech act
theory. When you request something, for example, you're not just making a
statement; you're actually making the request. When a jury finds a defendant
guilty, something happens: the defendant's social position is altered.
A speech act has three aspects:
1. Locution, the physical utterance by the speaker
2. Illocution, the intended meaning of the utterance by the speaker
3. Perlocution, the action that results from the locution.
KQML (KNOWLEDGE QUERY AND MANIPULATION LANGUAGE)
 From a variety of perspectives, the KQML is an intriguing example of
communication. Communication, for example, necessitates the ability
to locate and converse with a peer (communication layer).
 The messages must next be packaged (messaging layer), followed by
an internal format that reflects the messages and is expressive enough
to convey not only information but also requests, responses, and plans
(content layer).
 There are programmes to support communication in a network of
KQML-speaking agents. These are facilitators who can act as name
servers for KQML components and assist in the discovery of other
agents who can fulfil a given agent's request.
 A KQML router is a front-end to a specific KQML agent that performs
message routing. KQML's message representation follows LISP's
balanced-parentheses style, because KQML was originally implemented
in Common LISP.
 A KQML message consists of a performative and a set of arguments
for that performative, and can be sent over any transport (such as
sockets). The performative is the speech act that establishes the
message's intent (assertion, command, request, etc.).
 The performative-name specifies the sort of message to be conveyed
(evaluate, ask-if, stream-about, reply, tell, deny, standby, advertise,
etc.). The sender and receiver fields give the unique names of the
agents in the interaction. The information contained in the content
field is specific to the performative being performed.
 This content is described by a language (how to represent it) and an
ontology (the vocabulary (and meaning) of the content). Finally, in
order to correlate the request with the response, the agent can attach a
context to the response (in-reply-to). The structure of a KQML message is:
(performative-name
: sender X
: receiver Y
: content Z
: language L
: ontology Y
: reply-with R
: in-reply-to Q
)
Let's have a look at a dialogue between two KQML agents as an example. An
agent, in this case, asks for the current value of a temperature sensor in a
system: the temperature of TEMP_SENSOR_1A, as sampled by the
temperature-server agent. The content is the request, expressed in the Prolog
language. The requesting agent is named thermal-control-appl.
(ask-one
:sender thermal-control-appl
:receiver temperature-server
:language prolog
:ontology CELSIUS-DEGREES
:content “temperature(TEMP_SENSOR_1A ?temperature)”
:reply-with request-102
)
The temperature-server would then respond to our agent, giving the
temperature of the sensor of interest:
(reply
:sender temperature-server
:receiver thermal-control-appl
:language prolog
:ontology CELSIUS-DEGREES
:content “temperature(TEMP_SENSOR_1A 45.2)”
:in-reply-to request-102
)
KQML provides many features for communicating information as well as
higher-level requests that deal with the communication layer. A small list of
some of the other KQML performatives is shown in the table below.
KQML PERFORMATIVES

Performative     Description
evaluate         Evaluate the content of the message
ask-one          Request the answer to a question
reply            Communicate a reply to a question
stream-about     Provide multiple responses to a question
sorry            Return an error (cannot respond)
tell             Inform an agent of a sentence
achieve          A request for the receiver to achieve something
advertise        Advertise the ability to process a performative
subscribe        Subscribe to changes of information
forward          Route a message
 KQML is a valuable language for communicating not only data, but
also the meaning of data (in terms of a language and ontology).
 KQML has a wide range of features, from simple speech acts to more
advanced acts such as data streaming and control of information
transfer.
ACL (FIPA AGENT COMMUNICATION LANGUAGE)
 The FIPA ACL is a consortium-based language for agent
communication, whereas KQML is a language established in the
setting of a university.
 The Foundation for Intelligent Physical Agents consortium
standardised ACL, which stands for Agent Communication Language.
ACL is a speech-act language described by a set of per formatives,
similar to KQML.
 The Foundation for Intelligent Physical Agents, or FIPA, is a non-
profit organisation dedicated to the advancement of agent-based
systems. It creates specifications to ensure that agent systems are as
portable as possible (including their ability to communicate using the
ACL).
 The FIPA ACL is quite similar to KQML in terms of message
composition, even using the inner and outer content stacking (meaning
and content).
 Certain speech-acts, or performatives, are also clarified by the ACL.
Communication primitives, for example, are classified as
communicative acts, which are distinct from performative activities.
 The formal language for defining ACL semantics in the FIPA ACL is
the Semantic Language, or SL. This enables BDI themes to be
supported (beliefs, desires, intentions). To put it another way, SL
supports the depiction of long-term goals (intentions), as well as
propositions and objects. Each agent language has a purpose, and while
they are distinct, they can also be considered complementary.
XML
 XML stands for Extensible Markup Language and is a data and meta-
data format (meaning of the data). This is accomplished through the
use of a representation that comprises tags that enclose the data.
 The tags state clearly what the data is about. Consider the KQML ask-
one request as an example. This can be expressed in XML as in the sketch
below.
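The original rendering is not reproduced in this copy; the following is a
minimal illustrative sketch of how the earlier ask-one message might look in
XML (the tag names are assumptions, not a standard):

<ask-one>
  <sender>thermal-control-appl</sender>
  <receiver>temperature-server</receiver>
  <language>prolog</language>
  <ontology>CELSIUS-DEGREES</ontology>
  <content>temperature(TEMP_SENSOR_1A ?temperature)</content>
  <reply-with>request-102</reply-with>
</ask-one>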
 There are obvious parallels between XML and KQML. The tags are
present in KQML, though their syntax differs from that of XML. A
significant difference is the layering (nesting) of tags permitted by XML.
 It's worth noting that the outer tag corresponds to the performative, and
the nested tags to its arguments. XML is a fairly versatile format that
allows for quite sophisticated data and meta-data structures.
 XML is used in a variety of protocols, including XML-RPC (Remote
Procedure Call) and SOAP (Simple Object Access Protocol). Each of
these is transported over the Hypertext Transfer Protocol (HTTP).
TRUST AND REPUTATION
It depends on the level at which we apply it:
User confidence
• Can we trust the user behind the agent?
– Is he/she a trustworthy source of some kind of knowledge? (e.g. an expert in
a field)
– Does he/she act in the agent system (through his agents) in a trustworthy
way?
Trust of users in agents
• Issues of autonomy: the more autonomy, the less trust
• How to create trust?
– Reliability testing for agents
– Formal methods for open MAS
– Security and verifiability
Trust of agents in agents
• Reputation mechanisms
• Contracts
• Norms and Social Structures
Trust - Definition
 Cooperation among agents is incorporated into the design process in
closed environments:
 The multi-agent system is typically constructed by a single developer
or a single team of developers, and one possibility for reducing
complexity is for the chosen developers to include cooperation among
the agents they build as a key system requirement.
 An agent ai seeking information or a specific service from agent aj can
be confident that if aj has the capabilities and resources required, aj
will respond; otherwise, aj will inform ai that it is unable to perform
the requested action.
 It may be stated that trust is implicit in closed situations.
Trust can be computed as
 A binary value (1=‘I do trust this agent’, 0=‘I don’t trust this agent’)
 A set of qualitative values or a discrete set of numerical values (e.g.
'trust always', 'trust conditional to X', 'no trust'; or '2', '1', '0',
'-1', '-2')
 A continuous numerical value (e.g. [-300..300])
 A probability distribution
 Degrees over underlying beliefs and intentions (cognitive approach)
HOW TO COMPUTE TRUST
Trust values can be externally defined
• by the system designer: the trust values are pre-defined
• by the human user: he can introduce his trust values about the humans
behind the other agents
Trust values can be inferred from some existing representation about the
interrelations between the agents
• Communication patterns, cooperation history logs, e-mails, webpage
connectivity mapping...
Trust values can be learnt from current and past experiences
• Increase the trust value for agent ai if it behaves properly with us
• Decrease the trust value for agent ai if it fails us / defects
Trust values can be propagated or shared through a MAS
• Recommender systems, Reputation mechanisms.
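As an illustration of the experience-based approach, here is a minimal Python
sketch of a trust update rule; the learning rate, value range, and update form
are illustrative assumptions, not a specific published model:

# Sketch: learn a trust value in [-1, 1] from direct experiences.
# alpha (the update rate) and the clamping range are assumptions.
def update_trust(trust, outcome_good, alpha=0.1):
    """Move trust towards +1 after a good experience, towards -1 after a bad one."""
    target = 1.0 if outcome_good else -1.0
    trust = trust + alpha * (target - trust)
    return max(-1.0, min(1.0, trust))

trust_in_ai = 0.0                                 # neutral prior for agent ai
trust_in_ai = update_trust(trust_in_ai, True)     # ai behaved properly: trust rises
trust_in_ai = update_trust(trust_in_ai, False)    # ai defected: trust falls
print(trust_in_ai)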
TRUST AND REPUTATION
1. In writing, most authors combine trust and reputation.
2. Some authors distinguish between the two.
3. Trust is an individual measure of the confidence that a given agent has in
other agent(s).
4. Reputation is a social measure of the confidence that a group of agents or a
society has in other agents or organisations. One method of calculating
(individual) trust is through reputation.
I am more likely to trust a reputable agency.
My reputation has a direct impact on the level of trust that others have in me.
In social groups, reputation can play a sanctioning role: a bad reputation can
be very costly to one's future transactions.
5. Most authors combine (individual) trust with some form of (social)
reputation in their models.
6. Examples: recommender systems, reputation mechanisms.
Direct experiences are the most relevant and reliable information source for
individual trust/reputation.
1. Type 1: Experience based on direct interaction with the partner
a. Used by almost all models
b. How to:
• the trust value for that partner increases with good experiences,
• it decreases with bad ones
c. Problem: how to compute trust if there is no previous interaction?
2. Type 2: Experience based on observed interaction of other members
a. Used only in scenarios prepared for this.
b. How to: depends on what an agent can observe
• agents can access the log of past interactions of other agents
• agents can access some feedback from agents about their past interactions
(e.g., in eBay)
c. Problem: one has to introduce some noise handling or a confidence level on
this information
3. Prior-derived: agents bring with them prior beliefs about strangers
a. Used by some models to initialize trust/reputation values
b. How to:
• the designer or human user assigns prior values
• a uniform distribution for reputation priors is set
• assign the lowest possible reputation value to new agents, so that once an
agent's reputation falls below its starting value there is no incentive to
discard that cyber-identity and start over
• assume neither a good nor a bad reputation for unknown agents
• avoid giving new, valid agents the lowest reputation, as this is an obstacle
for other agents to realize that they are valid
4. Group-derived:
• Group models can be extended to yield prior reputation estimates for agents
in social groups.
• A link is established between a stranger's initial individual reputation and
the group from which he or she originates.
• Problem: extremely domain- and model-dependent.
5. Propagated:
• Based on information gathered from others in the surroundings, the agent can
attempt to estimate the stranger's reputation. This is known as word of mouth.
• Problem: combining the diverse reputation values is often an ad hoc approach
with no social foundation.
TRUST AND REPUTATION MODELS
1. Not really for MAS, but applicable to MAS.
2. Idea: for serious life or business decisions, you want the opinion of a
trusted expert.
3. If an expert is not personally known, you want to find a reference to one
via a chain of friends and colleagues.
4. The referral-chain provides:
• a way to judge the quality of the expert's advice
• a reason for the expert to respond in a trustworthy manner
• Finding good referral-chains is slow and time-consuming, but vital,
according to business gurus on "networking"
• The set of all possible referral-chains = a social network
5. The model integrates information from
• official organizational charts (online)
• personal web pages (+ crawling)
• external publication databases
• internal technical document databases
6. It builds a social network based on referral chains
• Each node is a recommender agent
• Each node provides reputation values for specific areas, e.g. Frieze is good
in mathematics
• Searches in the referral network are made by area, e.g. browsing the
network's "mathematics" recommendation chains
7. Trust Model Overview
• 1-to-1 asymmetric trust relationships
• Direct trust and recommender trust
• Trust categories and trust values [-1, 0, 1, 2, 3, 4]
8. Conditional transitivity:
• Alice trusts Bob & Bob trusts Cathy => Alice trusts Cathy
• Alice trusts.rec Bob & Bob says Bob trusts Cathy => Alice may trust Cathy
• Alice trusts.rec Bob value X & Bob says Bob trusts Cathy value Y => Alice
may trust Cathy value f(X, Y)
9. Recommendation protocol
1. Alice -> Bob: RRQ(Eric)
2. Bob -> Cathy: RRQ(Eric)
3. Cathy -> Bob: Rec(Eric, 3)
4. Bob -> Alice: Rec(Eric, 3)
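As an illustration, here is a minimal Python sketch of how Alice might
discount a recommendation about Eric by her recommender trust in Bob; the
combination function f(X, Y) shown is an assumption, not part of the model:

# Sketch: combine recommender trust X with a recommended value Y.
# Trust values lie in [-1, 4] as above; this particular f is illustrative.
MAX_TRUST = 4.0

def f(x, y):
    """Scale the recommended value y by the normalised recommender trust x."""
    weight = max(x, 0.0) / MAX_TRUST  # distrusted recommenders contribute nothing
    return weight * y

alice_rec_trust_bob = 3.0   # Alice trusts.rec Bob value 3
rec_value_eric = 3.0        # Rec(Eric, 3) relayed through Bob
print(f(alice_rec_trust_bob, rec_value_eric))  # Alice may trust Eric value 2.25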
Direct Trust
In terms of information reliability, ReGreT assumes that there is no difference
between direct interaction and direct observation; both count as direct
experiences.
The outcome is the most essential component in calculating direct trust.
A dialogue between two agents can result in one of the following outcomes:
• an initial contract to take a particular course of action and the actual
result of the actions taken, or
• an initial contract to fix the terms and conditions of a transaction and the
actual values of the terms of the transaction.
Reputation Model: Witness reputation
a. The first stage in calculating a witness reputation is to determine which
witnesses will be considered by the agent during the computation.
b. The initial set of potential witnesses might be the set of all agents that
have interacted with the target agent in the past. However, because this set
can be quite large, the information provided by its members is likely to
suffer from the correlated evidence problem.
c. The next step is to aggregate these values into a single value for the
witness reputation. The weight of each piece of information in the final
reputation value is proportional to the witness's credibility.
Reputation Model: Witness reputation
a. Two methods to evaluate witness credibility:
i. ReGreT employs fuzzy rules to determine how the structure of social
relationships affects the information's believability. The kind and degree of a
social link (the edges in a sociogram) is the antecedent of each rule, and the
credibility of the witness from the perspective of that social relation is the
consequent.
ii. The second method employed by the ReGreT system to determine a witness's
credibility is to assess the correctness of previous pieces of information
supplied to the agent by that witness. The agent uses the direct trust value
to assess the veracity of the information provided by witnesses.
Reputation Model: Neighbourhood Reputation
a. In a MAS, neighbourhood is defined by the relationships formed through
interaction rather than the geographical proximity of the agents.
b. The key notion is that the behaviour of these neighbours, as well as the type
of relationship they have with the target agent, might provide some insight
into the target agent's behaviour.
c. The ReGreT system employs fuzzy rules to calculate a neighbourhood
reputation.
i. The antecedents of these rules are one or more direct trusts associated
with various behavioural aspects, together with the relationship between the
target agent and the neighbour.
ii. The consequent is the value of a specific reputation (which may or may not
be associated with the same behavioural aspect as the trust values).
Reputation Model: System Reputation
a. System reputation is the process of assigning default reputations to agents
based on common knowledge about social groups and the role that the agent
plays in society.
b. ReGreT presupposes that members of these groups have one or more
observable characteristics that clearly identify them as such.
c. We consider an agent to be playing a single role every time it performs an
action.
For example, an agent can act as both a buyer and a seller, but when selling a
product, only the seller's position is significant.
 The system reputations are determined using a table for each social
group, with the rows representing the roles that the agent can perform
for that group and the columns representing the behavioural features.
Reputation Model: Default Reputation
a. To the preceding reputation classes, we must add a fourth: the default
reputation, which is the reputation attributed to a third-party agent when no
information is available.
b. This is usually a fixed value.
Reputation Model: Combining reputations
a. Each reputation type has its own set of features, and there are a variety of
algorithms for combining the four reputation values into a single,
representative reputation value.
b. In ReGreT, this heuristic is based on each type's default and computed
reliability.
c. Assuming we have sufficient data to calculate all of the reputation types,
they are considered in the following order of priority:
i. witness reputation
ii. neighbourhood reputation
iii. system reputation
iv. default reputation
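The following Python sketch illustrates one way the four reputation values
might be combined, weighting each by a reliability score; the weighted-average
rule and the concrete numbers are illustrative assumptions, not ReGreT's
actual heuristic:

# Illustrative sketch: combine reputation types weighted by their reliability.
def combine_reputations(values, reliabilities):
    """values/reliabilities: dicts keyed by reputation type."""
    total = sum(reliabilities.values())
    if total == 0:
        return values["default"]
    return sum(values[k] * reliabilities[k] for k in values) / total

values = {"witness": 0.8, "neighbourhood": 0.5, "system": 0.2, "default": 0.0}
reliabilities = {"witness": 0.9, "neighbourhood": 0.6, "system": 0.3, "default": 0.1}
print(combine_reputations(values, reliabilities))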
Main criticisms of trust and reputation research
a. Proliferation of ad hoc models weakly grounded in social theory
b. No general, cross-domain model of reputation
c. Lack of integration between models
i. Comparison between models is unfeasible
ii. Researchers are trying to address this through, e.g., the ART competition
NEGOTIATION
1. Negotiation is a common type of interaction that occurs between agents
with diverse agendas.
2. Negotiation is a procedure in which two or more agents reach a shared
conclusion while each attempting to achieve a separate aim or objective. The
agents first present their opposing viewpoints, which may be at odds, and then
attempt to reach an agreement by making concessions or looking for
alternatives.
3. The three major elements of negotiation are the language used by the
participating agents, the protocol followed by the agents when negotiating,
and the decision process used by each agent to determine its positions,
concessions, and criteria for agreement.
4. Many communities have devised negotiation methods and techniques. These can
be focussed on the environment or on the agents. The designers of
environment-centered approaches ask: "How can the rules of the environment be
designed so that the agents in it, regardless of their origin, capabilities,
or intentions, will interact productively and fairly?" The resulting
negotiation mechanism should have the following characteristics:
• Efficiency: the agents should not waste resources in coming to an agreement.
• Stability: no agent should have an incentive to deviate from agreed-upon
strategies.
• Simplicity: the negotiation mechanism should impose low computational and
bandwidth demands on the agents.
• Distribution: the mechanism should not require a central decision maker.
• Symmetry: the mechanism should not be biased against any agent for arbitrary
or inappropriate reasons.
5. These themes are treated in an eloquent and interesting manner in the
literature. Three sorts of environments have been recognised: worth-oriented
domains, state-oriented domains, and task-oriented domains.
6. A task-oriented domain is one in which agents have a set of tasks to do, the
resources required to complete the tasks are available, and the agents are able
to complete the tasks independently of one another. The agents, on the other
hand, can gain from sharing part of the work. The "Internet downloading
domain," for example, is a list of documents that each agent must access
through the Internet. Downloading has a cost, which each agent would like to
keep as low as possible. If numerous agents need to view the same document,
they can save money by accessing it once and then sharing it.
7. The environment could include the following minimal negotiating
mechanisms and constraints:
(1) each agent declares the documents it wants,
(2) documents found to be common to two or more agents are assigned to agents
based on the toss of a coin,
(3) agents pay for the documents they download, and
(4) agents are granted access to the documents they download, as well as any
in their common sets.
This mechanism is simple, symmetric, distributed, and efficient (no document
is downloaded twice). To determine stability, the agents' strategies must be
considered.
8. The best strategy for an agent is to declare the true set of documents that
it needs, regardless of the strategies or documents of the other agents. The
mechanism is stable because there is no incentive for an agent to deviate from
this strategy.
9. In the first method, negotiation protocols and their components are
formalised using speech-act classifiers and a hypothetical world semantics.
This clarifies the satisfaction criteria for various types of communications.
To provide a flavor of this approach, the following rule shows how the
commitments that an agent might make as part of a negotiation can be
formalized (the formula itself is not reproduced here).
10. This rule states that an agent forms and maintains its commitment to
achieve ø individually iff (1) it has not precommitted itself to another agent to
adopt and achieve ø, (2) it has a goal to achieve ø individually, and (3) it is
willing to achieve ø individually. The chapter on "Formal Methods in DAI"
provides more information on such descriptions.
11. The second strategy is based on the premise that the agents are
economically rational. Furthermore, the number of agents must be limited,
they must speak the same language, have the same problem abstraction, and
arrive at the same conclusion. Rosenschein and Zlotkin created a unified
negotiation protocol based on these assumptions. Agents who follow this
process form a bargain, which is a collaborative plan that satisfies all of the
agents' objectives. For an agent, the utility of a deal is the amount he is willing
to pay less the agreement's cost. Each agent seeks to increase his or her own
utility.
The agents talk about a negotiation set, which is a collection of all deals with a
positive utility for each of them.
In formal terms, a task-oriented domain under this approach becomes a tuple
<T, A, c>, where T is the set of tasks, A is the set of agents, and c(X) is a
monotonic function giving the cost of executing the set of tasks X. A deal is
a redistribution of tasks. The utility of deal d for agent k is
Uk(d) = c(Tk) − c(dk)
The conflict deal D occurs when the agents cannot reach a deal. A deal d is
individually rational if d > D. Deal d is pareto optimal if there is no deal
d' > d. The set of all deals that are individually rational and pareto optimal
is the negotiation set, NS. There are three possible situations:
1. conflict: the negotiation set is empty
2. compromise: agents prefer to be alone, but since they are not, they will
agree to a negotiated deal
3. cooperative: all deals in the negotiation set are preferred by both agents over
achieving their goals alone. When there is a conflict, then the agents will not
benefit by negotiating—they are better off acting alone. Alternatively, they
can "flip a coin" to decide which agent gets to satisfy its goals.
In the other two circumstances, negotiation is the best option.
Because the agents have some execution autonomy, they can fool or mislead
one another in theory. As a result, developing protocols or societies that can
limit the impacts of deception and misinformation is an attractive study
subject. Another component of the research topic is to create procedures that
make it sensible for agents to be truthful to one another. The links between
economic techniques and human-centered negotiation and argumentation have
yet to be extensively explored.
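As a concrete illustration of the definitions above, here is a minimal Python
sketch that enumerates the deals of a tiny two-agent task-oriented domain and
filters out the negotiation set; the tasks, the unit cost function, and each
agent's task set are toy assumptions:

# Sketch of a two-agent task-oriented domain <T, A, c>.
from itertools import chain, combinations

T = {"doc1", "doc2", "doc3"}                          # all tasks
own = {"a": {"doc1", "doc2"}, "b": {"doc2", "doc3"}}  # Tk: each agent's own tasks

def c(X):
    """Monotonic cost function: one unit per document downloaded."""
    return len(X)

def U(agent, deal):
    """Uk(d) = c(Tk) - c(dk)."""
    return c(own[agent]) - c(deal[agent])

def deals():
    """Every redistribution of the tasks between the two agents."""
    ts = sorted(T)
    subsets = chain.from_iterable(combinations(ts, r) for r in range(len(ts) + 1))
    for sa in subsets:
        yield {"a": set(sa), "b": set(ts) - set(sa)}

# Individually rational: no worse than the conflict deal (utility >= 0 for both).
rational = [d for d in deals() if U("a", d) >= 0 and U("b", d) >= 0]
# Pareto optimal: no other deal is at least as good for both and better for one.
NS = [d for d in rational
      if not any(U("a", e) >= U("a", d) and U("b", e) >= U("b", d) and
                 (U("a", e) > U("a", d) or U("b", e) > U("b", d))
                 for e in rational)]
for d in NS:
    print(d, U("a", d), U("b", d))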
BARGAINING
A bargaining problem is defined as a pair (S, d). A bargaining solution is a
function f that maps every bargaining problem (S, d) to an outcome in S, i.e.,
f : (S, d) → S
Thus the solution to a bargaining problem is a pair in R^2. It gives the
values of the game to the two players and is generated through a function
called the bargaining function, which maps the set of possible outcomes to the
set of acceptable ones.
Bargaining Solution
A surplus is formed in a transaction when the seller and the buyer value a
product differently. A negotiation solution is a means for buyers and sellers to
agree on how to split the surplus. Consider the case of a house built by
builder A. It cost him Rs. 10 lakhs. A potential buyer is interested in the
house and values it at Rs. 20 lakhs. This transaction can therefore generate a
surplus of Rs. 10 lakhs. The builder and the buyer must now agree on a price.
The seller knows the house is worth more than 10 lakhs, and the buyer knows it
costs less than 20 lakhs. Both are trying to maximise their profits: the buyer
would like to buy the house for 10 lakhs, while the seller would like to sell
it for 20 lakhs. They bargain over the price, and either trade or disagree.
Trade generates the surplus, whereas no surplus is created in the case of
no-trade.
Bargaining Solution provides an acceptable way to divide the surplus between
the two parties. Formally, a Bargaining Solution is defined as F : (X, d) → S,
where X ⊆ R^2 and S, d ∈ R^2. X represents the utilities of the players in the
set of possible bargaining agreements, and d represents the point of
disagreement. In the above example, the price-bargaining set is simply
x + y ≤ 10, x ≥ 0, y ≥ 0. A point (x, y) represents the case when the seller
gets a surplus of x and the buyer gets a surplus of y, i.e. the seller sells
the house at 10 + x and the buyer pays 20 − y.
A bargaining problem (S, d) thus specifies:
1. the set of payoff allocations that are jointly feasible for the two players
in the process of negotiation or arbitration, and
2. the payoffs they would expect if negotiation or arbitration were to fail to
reach a settlement.
Based on these assumptions, Nash generated a list of axioms that a reasonable
solution ought to satisfy. These axioms are as follows:
Axiom 1 (Individual Rationality) This axiom asserts that the bargaining
solution should give neither player less than what it would get from
disagreement, i.e., f(S, d) ≥ d.
Axiom 2 (Symmetry) The solution should be independent of the players'
names, i.e., who is named a and who is named b, according to this principle.
This indicates that if the utility functions and disagreement utilities of the
participants are equal, they will receive equal shares. As a result, any
asymmetry in the final payout should be related to variations in their utility
functions or the results of their disagreements.
Axiom 3 (Strong Efficiency) This axiom asserts that the bargaining solution
should be feasible and Pareto optimal.
Axiom 4 (Invariance) The solution should not change as a result of linear
changes in either player's utility, according to this principle. As a result, if a
player's utility function is doubled by 2, the solution should remain
unchanged. Only the player will place a higher value on what it receives.
Axiom 5 (Independence of Irrelevant Alternatives) This axiom states that
removing possible options (other than the disagreement point) that would not
have been picked should have no bearing on the solution, i.e. for any closed
convex set.
Nash demonstrated that the unique bargaining solution satisfying these five
axioms is the one that maximises the product of the players' gains over their
disagreement payoffs (the Nash product):
f(S, d) = arg max over (x1, x2) ∈ S with x1 ≥ d1, x2 ≥ d2 of (x1 − d1)(x2 − d2)
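A minimal Python sketch of the Nash product for the house example above, using
a grid search over splits of the 10-lakh surplus (the grid granularity is an
assumption):

# Maximise the Nash product (x - d1)(y - d2) with disagreement point d = (0, 0).
best, best_split = -1.0, None
steps = 1000
for i in range(steps + 1):
    x = 10 * i / steps    # seller's surplus
    y = 10 - x            # buyer's surplus (x + y = 10 on the frontier)
    product = x * y       # Nash product, since d1 = d2 = 0
    if product > best:
        best, best_split = product, (x, y)
print(best_split)  # (5.0, 5.0): the symmetric split, as the Symmetry axiom predicts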
NON-COOPERATIVE MODELS OF SINGLE-ISSUE NEGOTIATION

Cooperative Game                              Non-Cooperative Game
The players are allowed to communicate        Each player independently chooses
before choosing their strategies and          its strategy.
playing the game.
The basic modeling unit is the group.         The basic modeling unit is the
                                              individual.
Players can enforce cooperation in the        The cooperation between
group through a third party.                  individuals is self-enforcing.
GAME-THEORETIC APPROACHES FOR MULTI-ISSUE NEGOTIATION
The following are the four key procedures for bargaining over multiple issues:
 Global bargaining: here, the negotiating agents handle the global
problem directly, addressing all of the issues at the same time. In
non-cooperative theory the global bargaining technique is also known as
the package deal procedure. In this procedure, an offer from one agent
to the other specifies how each of the issues is to be resolved.
 Independent/separate bargaining: individual issues are dealt with in a
completely separate and autonomous manner, with no influence on one
another. This would be the situation if each of the two parties hired m
agents (to negotiate over the m issues), each of whom was in charge of
one issue. In negotiations between two countries, for example, each
issue may be resolved by representatives from both countries who are
exclusively concerned with their own issue.
 Sequential bargaining with independent implementation: In this
scenario, each party considers one topic at a time. For example, they
could start with the first issue and then move on to the second after
reaching an agreement, and so on. The parties are not allowed to
negotiate another problem until the prior one has been settled. The
sequential procedure might take various different shapes. The agenda
and the implementation rule are used to define them. The agenda3 for
sequential bargaining determines the sequence in which the subjects
will be discussed. The implementation rule indicates when a deal on a
specific topic becomes effective. The rule of independent
implementation and the rule of simultaneous implementation are the
two types of implementation rules.
 Sequential bargaining with simultaneous implementation: This is
identical to the prior situation, except that now an agreement on one
topic does not become effective until all following issues have been
resolved.
Co-operative Models of Multi-Issue Negotiation
 Simultaneous implementation agenda independence: according to this
axiom, global bargaining and sequential bargaining with simultaneous
implementation produce the same agreement.
 Independent implementation agenda independence: according to this
axiom, global bargaining and sequential bargaining with independent
implementation produce the same agreement.
 Separate/global equivalence: this axiom states that global bargaining
and separate bargaining yield the same agreement.
Non-Cooperative Models of Multi-Issue Negotiation
An agent's cumulative utility is linear and additive. The functions Ua and Ub
give the cumulative utilities for a and b respectively at time t. Since
utility is linear and additive, each is a weighted sum over the m issues:
Ua = Σ (c = 1..m) wa_c · ua_c    Ub = Σ (c = 1..m) wb_c · ub_c
where wa ∈ R+^m denotes an m-element vector of constants for agent a and
wb ∈ R+^m such a vector for b. These vectors indicate how the agents value the
different issues. For example, if wa_c > wa_(c+1), then agent a values issue c
more than issue c + 1; likewise for agent b.
ARGUMENTATION
➢ A rational verbal and social action aiming at enhancing (or decreasing) the
acceptability of a problematic viewpoint for the listener or reader by
presenting a set of propositions (i.e. arguments) intended to defend (or refute)
the viewpoint in front of a reasonable judge.
➢ Argumentation can be defined as an activity aimed at convincing of the
acceptability of a standpoint by putting forward propositions justifying or
refuting the standpoint.
➢ Argument: Reasons / justifications supporting a conclusion
➢ Represented as: support ->conclusion
– Informational arguments: Beliefs -> Belief, e.g. If it is cloudy, it might
rain.
– Motivational arguments: Beliefs, Desires -> Desire, e.g. If it is cloudy and
you want to go out, then you don't want to get wet.
– Practical arguments: Beliefs, Sub-Goals -> Goal, e.g. If it is cloudy and
you own a raincoat, then put on the raincoat.
– Social arguments: Social commitments -> Goal, Desire, e.g. I will stop at
the corner because the law says so. e.g. I can't do that, I promised my mother
that I won't.
Process of Argumentation
 Constructing arguments (in favor of / against a “statement”) from
available information.
A: “Tweety is a bird, so it flies”
B: “Tweety is just a cartoon!”
 Determining the different conflicts among the arguments: "Since Tweety
is a cartoon, it cannot fly!" (B attacks A)
 Evaluating the acceptability of the different arguments: "Since we have
no reason to believe otherwise, we'll assume Tweety is a cartoon."
(accept B) "But then, this means despite being a bird he cannot fly."
(reject A)
 Concluding, or defining the justified conclusions.
“We conclude that Tweety cannot fly!”
Computational Models of Argumentation
1. Given a definition of arguments over a content language (and its logic),
the models allow one to:
• compute interactions between arguments: attacks, defeat, support, ...
• evaluate arguments: assign weights to arguments in order to compare them,
either the intrinsic value of an argument or its interaction-based value
2. Selection of the acceptable arguments (conclusions)
• Individual acceptability
• Collective acceptability
OTHER ARCHITECTURES
LAYERED ARCHITECTURES
Given the requirement that an agent be capable of both reactive and proactive
behaviour, a natural decomposition is to create distinct subsystems to handle
these two sorts of behaviour. This concept naturally leads to a class of
architectures in which the various subsystems are organised into a hierarchy
of interacting layers. In this part we'll look at some general aspects of
layered architectures, then consider two examples of such designs: INTERRAP
and TOURINGMACHINES.
There will usually be at least two layers: one for reactive behaviour and one
for proactive behaviour. In principle there is no reason why there should not
be many more layers. Regardless of how many layers there are, the information
and control flows within them provide a useful typology for such designs.
Within layered systems, we may distinguish two forms of control flow (see
Figure):
• Horizontal layering. The software layers in horizontally tiered systems
(Figure (a)) are all directly related to the sensory input and action output. In
effect, each layer functions as an agent, generating recommendations for what
action to take.
• Vertical layering. Sensory input and action output are each handled with by
at most one layer in vertically tiered designs (Figures (b) and (c)).
Horizontally layered systems have a significant advantage in terms of
conceptual simplicity: if we need an agent to exhibit n different forms of
behaviour, we simply design n different layers.
However, because the layers are effectively competing with one another for
action ideas, there is a risk that the agent's overall behaviour will be
inconsistent. A mediator function is usually included in horizontally stacked
systems to ensure consistency. It makes judgments regarding which layer has
"control" of the agent at any particular time.
The requirement for central control is problematic since it implies that the
designer must consider all conceivable interactions between layers. If the
architecture has n layers and each layer is capable of suggesting m possible
actions, then there are m^n possible interactions to evaluate. In any but the
simplest systems, this is clearly challenging from a design standpoint. The
adoption of a central control system also creates a bottleneck in the agent's
decision making.
ABSTRACT ARCHITECTURE
 We can easily formalize the abstract view of agents presented so far.
First, we will assume that the state of the agent's environment can be
characterized as a set S = {s1, s2, …} of environment states.
 At any given instant, the environment is assumed to be in one of these
states. The effectoric capability of an agent is assumed to be
represented by a set A = {a1, a2, …} of actions. Abstractly, then, an
agent can be viewed as a function
action : S* → A
which maps sequences of environment states to actions. A standard agent
is an agent that is modelled by a function of this type. The assumption
is that an agent selects what action to take based on its history — its
previous experiences. These experiences are represented as a sequence of
environment states — those that the agent has encountered thus far.
 The (non-deterministic) behaviour of an environment can be modelled as
a function
env : S × A → ℘(S)
which takes the current state of the environment s and an action a
(performed by the agent), and maps them to a set of environment states
env(s, a) — those that could result from performing action a in state s.
If all the sets in the range of env are singletons (i.e., if the result
of performing any action in any state is a set containing a single
member), then the environment is deterministic, and its behaviour can be
accurately predicted.
 We can represent the interaction of agent and environment as a history.
A history h is a sequence
h : s0 --a0--> s1 --a1--> s2 --a2--> s3 --a3--> … --a(u−1)--> su --au--> …
where s0 is the initial state of the environment (i.e., its state when
the agent starts executing), au is the uth action that the agent chose
to perform, and su is the uth environment state (which is one of the
possible results of executing action a(u−1) in state s(u−1)). If
action : S* → A is an agent, env : S × A → ℘(S) is an environment, and
s0 is the initial state of the environment, then the history must
satisfy au = action((s0, s1, …, su)) and s(u+1) ∈ env(su, au) for all u.
 The characteristic behaviour of an agent action : S* → A in an
environment env is the set of all histories that satisfy these
properties. If some property holds of all these histories, it can be
regarded as an invariant property of the agent in the environment. For
example, if our agent is a nuclear reactor controller (i.e., the
environment is a nuclear reactor), and in all possible histories of the
controller/reactor the reactor does not blow up, then this can be
regarded as a (desirable) invariant property. We will denote by
hist(agent, environment) the set of all histories of agent in
environment. Two agents ag1 and ag2 are said to be behaviorally
equivalent with respect to environment env iff hist(ag1, env) =
hist(ag2, env), and simply behaviorally equivalent iff they are
behaviorally equivalent with respect to all environments.
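To make these definitions concrete, here is a minimal Python sketch of the
agent/environment interaction loop for a toy deterministic environment; the
states, actions, and the particular policy are illustrative assumptions:

# Sketch of the abstract model: action : S* -> A and env : S x A -> powerset(S).
# A toy two-state environment; every name here is illustrative.
def env(state, action):
    """Non-deterministic in general; here every result set is a singleton,
    so this environment is deterministic."""
    return {"on"} if action == "switch-on" else {"off"}

def agent(history):
    """A standard agent: maps the state sequence seen so far to an action."""
    return "switch-on" if history[-1] == "off" else "switch-off"

history = ["off"]                    # s0: the initial environment state
for _ in range(4):
    a = agent(history)               # a_u = action((s0, ..., su))
    (s_next,) = env(history[-1], a)  # s_(u+1) is drawn from env(su, au)
    history.append(s_next)
print(history)                       # one history in hist(agent, environment)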
CONCRETE ARCHITECTURES FOR INTELLIGENT AGENTS
1. So far, agents have been considered only in the abstract. While we've
looked at the properties of agents that maintain state and those that don't,
we haven't considered what this state might look like. Similarly, we've
modelled an agent's decision making as an abstract function, action, that
somehow indicates which action to take — but we haven't discussed how this
function might be implemented. This section addresses that omission. We'll
look at four different types of agents:
• logic based agents—in which decision making is realized through logical
deduction;
• reactive agents—in which decision making is implemented in some form of
direct mapping from situation to action;
• belief-desire-intention agents—in which decision making depends upon the
manipulation of data structures representing the beliefs, desires, and intentions
of the agent; and finally,
• layered architectures—in which decision making is realized via various
software layers, each of which is more-or-less explicitly reasoning about the
environment at a different level of abstraction.
In each of these circumstances, we're moving away from an abstract view of
agents and toward more concrete commitments regarding their underlying
structure and behaviour. Each section describes the nature of these
commitments, the assumptions that the architectures are based on, and the
relative benefits and drawbacks of each.
UNIT V
APPLICATIONS
Applications of AI
Artificial intelligence is used in a variety of ways in today's society. It is
becoming increasingly important in today's world because it can efficiently
handle complicated problems in a variety of areas, including healthcare,
entertainment, banking, and education. Our daily lives are becoming more
comfortable and efficient as a result of artificial intelligence.
The following are some of the areas where Artificial Intelligence is used:
1. AI (Astronomy)
Artificial Intelligence (AI) can be extremely helpful in resolving complicated
challenges in the universe. AI technology can assist in gaining a better
understanding of the cosmos, including how it operates, its origin, and so on.
2. AI (Healthcare)
In the previous five to ten years, AI has become more beneficial to the
healthcare business and is expected to have a big impact.
AI is being used in the healthcare industry to make better and faster diagnoses
than humans. AI can assist doctors with diagnosis and can alert doctors when a
patient's condition is deteriorating so that medical assistance can be provided
before the patient is admitted to the hospital.
3. AI (Gaming)
AI can be employed in video games. AI machines can play strategic games
like chess, in which the system must consider a vast number of different
options.
4. AI (Finance)
The banking and AI businesses are the ideal complements to each other.
Automation, chatbots, adaptive intelligence, algorithm trading, and machine
learning are all being used in financial activities.
5. AI (Data Security)
Data security is critical for every business, and cyber-attacks are on the rise in
the digital age. AI can help you keep your data safe and secure. Some
examples are the AEG bot and the AI2 Platform, which are used to better
determine software bugs and cyber-attacks.
6. AI (Social Media)
Facebook, Twitter, and Snapchat, for example, have billions of user accounts
that must be kept and handled in a very efficient manner. AI has the ability to
organise and manage large volumes of data. AI can go through a large amount
of data to find the most recent trends, hashtags, and user requirements.
7. AI (Travel & Transport)
For the travel industry, AI is becoming increasingly important. AI is capable
of doing a variety of travel-related tasks, including making travel
arrangements and recommending hotels, flights, and the best routes to
customers. The travel industry is utilising AI-powered chatbots that can
engage with clients in a human-like manner to provide better and faster
service.
8. AI (Automotive Industry)
Some automotive companies are utilising artificial intelligence to provide a
virtual assistant to their users in order to improve performance. Tesla, for
example, has released TeslaBot, an intelligent virtual assistant.
Various industries are presently working on self-driving automobiles that will
make your ride safer and more secure.
9. AI (Robotics)
Artificial Intelligence plays a significant role in robotics. Conventional
robots are typically programmed to perform a repetitive task, but with AI we
can build intelligent robots that perform tasks based on their own experience
rather than being pre-programmed.
Humanoid Robots are the best instances of AI in robotics; recently, the
intelligent Humanoid Robots Erica and Sophia were built, and they can
converse and behave like people.
10. AI (Entertainment)
We already use AI-based applications in our daily lives with entertainment
providers like Netflix and Amazon. These services display software or show
recommendations using machine learning/artificial intelligence (ML/AI)
algorithms.
11. AI (Agriculture)
Agriculture is a field that necessitates a variety of resources, including effort,
money, and time, in order to get the greatest results. Agriculture is becoming
more computerised these days, and AI is becoming more prevalent in this
industry. AI is being used in agriculture in the form of agricultural robotics,
soil and crop monitoring, and predictive analysis. AI in agriculture has the
potential to be extremely beneficial to farmers.
12. AI (E-commerce)
AI is giving the e-commerce industry a competitive advantage, and it is
becoming increasingly demanded in the market. Shoppers can use AI to find
related products in their preferred size, colour, or brand.
13. AI (Education)
Grading can be automated with AI, giving the instructor more time to educate.
As a teaching assistant, an AI chatbot can communicate with students.
In the future, AI could serve as a personal virtual tutor for pupils, available at
any time and from any location.
Machine Learning
"Machine learning" is defined by Simon. "Learning signifies adaptive
modifications in the system that enable the system to perform the same task or
tasks selected from the same population more successfully the next time".
Decision Tree Example (figure not reproduced)
Optimizations
 ACO (Ant Colony Optimization)
 Swarm intelligence
 Genetic Algorithm
Impact Applications
1. IBM built Deep Blue, a chess-playing computer.
2. IBM's DeepQA research generated Watson, an artificially intelligent
computer system capable of answering questions presented in natural
language.
3. Deep learning is a class of machine learning algorithms that learns layered
representations of inputs, typically using neural networks.
LANGUAGE MODELS
Language can be defined as a set of strings; "print(2+2)" is a legal program
in the language Python, whereas "2) + (2 print" is not. Since there are an
infinite number of legal programs, they cannot be enumerated; instead they are
specified by a set of rules called a grammar. Formal languages also have rules
that define the meaning (semantics) of a program; for example, the rules say
that the "meaning" of "2 + 2" is 4, and the meaning of "1/0" is that an error
is signaled.
1. Natural languages, such as English or Spanish, cannot be described as a
definitive set of sentences. For example, while everyone agrees that "not to
be invited is sad" is an English sentence, opinions differ on the
grammaticality of "to be not invited is sad." As a result, rather than
defining a natural language model as a definitive set, it is more fruitful to
characterise it as a probability distribution over sentences. Instead of
asking whether a string of words is a member of the set defining the language,
we ask for P(S = words) — the probability that a random sentence would be
words. Natural languages are ambiguous as well: "He saw her duck" can refer to
either a waterfowl belonging to her or a movement she made to avoid something.
As a result, rather than a single meaning for a sentence, we must speak of a
probability distribution over alternative interpretations.
2. Finally, natural language is challenging to deal with due to its huge size and
rapid change. As a result, our language models are only a rough
approximation. We begin with the simplest possible approximation and work
our way up.
Let's begin with the task of computing P(w|H) — the probability of a word w
given some history H.
Suppose H is 'its water is so transparent that', and we want to know the
probability that the next word is 'the': P(the | its water is so transparent
that).
One way to estimate this probability is from relative frequency counts: take a
large corpus, count the number of times we see 'its water is so transparent
that', and count the number of times this is followed by 'the':

P(the | its water is so transparent that) =
    C(its water is so transparent that the) / C(its water is so transparent that)
While this method of estimating probabilities straight from counts works well
in many circumstances, it turns out that the web isn't large enough to provide us
with accurate predictions in the vast majority of cases. Why? Because language
is dynamic, and new sentences are introduced on a daily basis that we will
never be able to count.
As a result, we need to incorporate more sophisticated methods for calculating
the likelihood of word w given history H.
To represent the probability of a particular random variable Xi taking on the
value "the", or P(Xi = "the"), we will use the simplification P(the). We'll
represent a sequence of n words either as w1 … wn or w1:n (so the expression
w1:n−1 means the string w1, w2, …, wn−1). For the joint probability of each
word in a sequence having a particular value P(X1 = w1, X2 = w2, …, Xn = wn)
we'll use P(w1, w2, …, wn).
How do we compute the probability of an entire sequence? — Using the chain
rule of probability:

P(w1:n) = P(w1) P(w2 | w1) P(w3 | w1:2) … P(wn | w1:n−1)
        = Π (k = 1..n) P(wk | w1:k−1)
The chain rule demonstrates the relationship between computing the conditional
probability of a word given the prior words and computing the joint
probability of a sequence. However, there is a catch — we don't know any way
to compute the exact probability of a word given a long string of preceding
words.
The bigram model uses only the conditional probability of the single preceding
word to approximate the probability of a word given all previous words:

P(wn | w1:n−1) ≈ P(wn | wn−1)

Markov models are the class of probabilistic models that assume we can predict
the probability of a future unit without looking too far into the past.
The bigram (which looks back one word) can be generalised to the trigram
(which looks back two words) and therefore to the n-gram (which looks back n
- 1 words).
We can compute the probability of a complete word sequence by substituting the
bigram assumption into the chain rule:

P(w1:n) ≈ Π (k = 1..n) P(wk | wk−1)
What method do we use to calculate bi-gram (or n-gram) probabilities? —
Maximum likelihood estimation, or MLE, is a simple method for estimating
probabilities. By taking counts from a corpus and normalising them so that they
fall between 0 and 1, we may acquire the MLE estimate for the parameters of
an n-gram model.
To compute a specific bigram probability of a word y given a previous word x,
we count the bigram C(xy) and normalise by the sum of all the bigrams that
share the same first word x:

P(wn | wn−1) = C(wn−1 wn) / Σ_w C(wn−1 w)


Because the sum of all bigram counts that begin with a specific word wn−1 must
equal the unigram count for that word wn−1, we can simplify this equation:

P(wn | wn−1) = C(wn−1 wn) / C(wn−1)
Maximum likelihood estimation, or MLE, is an example of using relative
frequencies to estimate probabilities. Although computed mechanically, the
bigram statistic does capture linguistic phenomena: some of the bigram
probabilities above encapsulate facts that we consider to be strictly
syntactic.
We used bigram models for pedagogical purposes, but in practise we use trigram
or 4-gram models. For computing probabilities in language modelling, we use
the log format — log probabilities. Because each probability is less than 1,
multiplying many of them together (as for a 4-gram or 5-gram) makes the
product ever smaller and risks numerical underflow.
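A minimal Python sketch of MLE bigram estimation and log-probability scoring
over a toy corpus (the corpus and the whitespace tokenisation are illustrative
assumptions):

import math
from collections import Counter

corpus = "i am sam . sam i am . i do not like green eggs and ham .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(prev, word):
    """MLE estimate: P(word | prev) = C(prev word) / C(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

def log_prob(sentence):
    """Summing log probabilities avoids numerical underflow."""
    words = sentence.split()
    return sum(math.log(p_bigram(p, w)) for p, w in zip(words, words[1:]))

print(p_bigram("i", "am"))   # 2/3 on this toy corpus
print(log_prob("i am sam"))  # log(2/3) + log(1/2)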
Evaluating Language Models
The easiest approach to assess a language model's performance is to embed it
in an application and track how much the application improves. Extrinsic
evaluation refers to this type of end-to-end assessment. Unfortunately, running
large NLP systems from start to finish is frequently prohibitively expensive.
Instead, we devised a score that may be used to swiftly assess prospective
language model enhancements. An intrinsic evaluation metric is one that
assesses a model's quality without regard to its application.
A test set is required for an intrinsic evaluation of a language model. The
probabilities of an n-gram model, like many other statistical models in our
field, are determined by the corpus on which it is trained, often known as the
training set or training corpus. The performance of an n-gram model on unseen
data, referred to as the test set or test corpus, can subsequently be used to assess
its quality.
If we have a corpus of text and wish to compare two distinct n-gram models,
we divide the data into training and test sets, train both models' parameters on
the training set, and then compare how well the two trained models match the
test set.
A better model is one that gives a greater probability to the test set — that is,
one that more precisely predicts the test set.
In practise, we usually divide our data into three parts: 80% training, 10%
development, and 10% test.
Perplexity
The perplexity (abbreviated as PP) of a language model on a test set is the test
set's inverse probability, normalised by the number of words.
PP(W) = P(w1 w2 … wN)^(−1/N)
(general formula of perplexity: the Nth root of the inverse probability of the
test set)

Expanding the probability with the chain rule:

PP(W) = ( Π (i = 1..N) 1 / P(wi | w1 … wi−1) )^(1/N)

As a result, if we use the bigram model to compute perplexity:

PP(W) = ( Π (i = 1..N) 1 / P(wi | wi−1) )^(1/N)
Because of the inverse, the smaller the perplexity, the higher the conditional
probability of the word sequence: minimising perplexity is equivalent to
maximising the test set probability according to the language model.
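Continuing the sketch above, perplexity follows directly from the log
probability (the toy numbers reuse the corpus assumed earlier):

import math

def perplexity(log_probability, n_words):
    """Inverse probability of the test set, normalised by the number of words."""
    return math.exp(-log_probability / n_words)

lp = math.log(2 / 3) + math.log(1 / 2)  # log-probability of "i am sam" above
print(perplexity(lp, 3))                # lower perplexity = better model fit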
Smoothing
What do we do with words that are in our vocabulary (so they are not unknown)
but appear in a test set in an unseen context? To prevent a language model
from assigning zero probability to these unseen events, we'll have to shave a
bit of probability mass off the more frequent events and give it to the events
we've never seen. This kind of adjustment is called smoothing or discounting.
Laplace Smoothing
Before we normalise the bigram counts into probabilities, the simplest
approach to smooth them is to add one to all of them. All counts that were
previously zero will now have a count of one, counts of one will have a count
of two, and so on. Laplace smoothing is the name of the algorithm.
Although Laplace smoothing does not perform well enough to be employed in
recent n-gram models, it does provide an excellent introduction to many of the
concepts found in other smoothing algorithms.
The unsmoothed maximum likelihood estimate of the unigram probability of the
word wi is its count ci normalised by the total number of word tokens N:

P(wi) = ci / N
Since there are V words in the vocabulary and each one's count was
incremented, we must also update the denominator to account for the extra V
observations:

P_Laplace(wi) = (ci + 1) / (N + V)
It is convenient to describe how a smoothing algorithm affects the numerator
by defining an adjusted count c*. Because we're only modifying the numerator
by adding 1, we also need to multiply by a normalisation factor N/(N + V):

c*_i = (ci + 1) · N / (N + V)
The adjusted count can then be normalised by N words, turning it into a
probability. For bigrams, Laplace smoothing gives:

P_Laplace(wn | wn−1) = (C(wn−1 wn) + 1) / (C(wn−1) + V)
Smoothing can also be viewed as discounting (reducing) some non-zero counts to
obtain the probability mass that will be allocated to the zero counts. Rather
than discounted counts c*, we might describe a smoothing algorithm in terms of
a relative discount dc, the ratio of the discounted counts to the original
counts:

dc = c* / c
Add-k smoothing
One alternative to add-one smoothing is to move a bit less of the probability
mass from the seen to the unseen events. Instead of adding 1 to each count, we
add a fractional count k (0.5? 0.05? 0.01?). This algorithm is therefore
called add-k smoothing:

P_Add-k(wn | wn−1) = (C(wn−1 wn) + k) / (C(wn−1) + kV)
Add-k smoothing requires that we have a method for choosing k; this can be
done, for example, by optimizing on a devset. Although add-k is useful for
some tasks (including text classification), it turns out that it still doesn’t work
well for language modeling.
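A small Python sketch of add-k smoothing on toy bigram counts; with k = 1 this
reduces to Laplace (add-one) smoothing (the corpus is an assumption):

from collections import Counter

corpus = "i am sam . sam i am .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)  # vocabulary size

def p_smoothed(prev, word, k=1.0):
    """P(word | prev) = (C(prev word) + k) / (C(prev) + k * V)."""
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * V)

print(p_smoothed("i", "am"))   # seen bigram: discounted but still large
print(p_smoothed("i", "sam"))  # unseen bigram: small but non-zero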
Backoff and Interpolation
If we are trying to compute P(wn|wn−2, wn−1) but have no examples of the
particular trigram wn−2, wn−1, wn, we can instead estimate its probability by
using the bigram probability P(wn|wn−1). Similarly, if we don't have counts to
compute P(wn|wn−1), we can look to the unigram P(wn). Sometimes using less
context is a good thing, helping to generalize more for contexts that the
model hasn't learned much about.
There are two ways to use this n-gram "hierarchy":
i. Backoff: if the evidence is sufficient, we use the trigram; otherwise we
use the bigram; otherwise, the unigram.
ii. Interpolation: we always blend the probability estimates from all the
n-gram estimators, weighing and mixing the trigram, bigram, and unigram
counts.
In simple linear interpolation, we merge different order n-grams by linearly
interpolating the models. Thus, we estimate the trigram probability by
combining the unigram, bigram, and trigram probabilities:
P̂(wn | wn−2 wn−1) = λ1 P(wn) + λ2 P(wn | wn−1) + λ3 P(wn | wn−2 wn−1)

such that the λs sum to 1: Σ_i λi = 1.
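A minimal sketch of simple linear interpolation, assuming unigram, bigram, and
trigram estimators are already available as functions; the lambda weights are
illustrative and would normally be tuned on a held-out development set:

# Linear interpolation of unigram, bigram, and trigram estimates.
# The lambdas must sum to 1; these particular values are illustrative.
LAMBDAS = (0.1, 0.3, 0.6)

def p_interpolated(w, prev2, prev1, p_uni, p_bi, p_tri):
    """p_uni(w), p_bi(w, prev1), p_tri(w, prev2, prev1) are assumed given."""
    l1, l2, l3 = LAMBDAS
    return l1 * p_uni(w) + l2 * p_bi(w, prev1) + l3 * p_tri(w, prev2, prev1)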
N-GRAM WORD MODEL
1. n-gram models can be applied to words instead of characters.
2. The word and character models follow the same procedure.
3. The most significant difference is that the vocabulary—the set of symbols
that make up the corpus and model—is larger.
4. Most languages have only around 100 characters, and we sometimes build
character models that are even more restricted, such as treating "A" and "a"
as the same symbol or treating all punctuation as the same symbol. Word
models, on the other hand, have tens of thousands, if not millions, of
symbols. The vast range is due to the ambiguity of what constitutes a word.
5. Word n-gram models must cope with terms that are not in the vocabulary
(out-of-vocabulary words).
6. Because with word models there is always the possibility of a new word that
was not observed in the training corpus, we must model this explicitly in our
language model.
7. This can be achieved by simply adding one new term to the vocabulary:
<UNK>, which stands for the unknown word.
8. Sometimes multiple unknown-word symbols are used for different classes. For
example, any string of digits might be replaced with <NUM>, or any email
address with <EMAIL>.
INFORMATION RETRIEVAL
The task of retrieving materials that are relevant to a user's desire for
information is known as information retrieval. Search engines on the World
Wide Web are the most well-known instances of information retrieval systems.
When a Web user types [AI book] into a search engine, a list of relevant pages
appears. We'll look at how such systems are put together in this part. An
information retrieval system (henceforth referred to as IR) can be
characterized by:
1. A corpus of documents. Each system must decide what it wants to treat as a
document: a paragraph, a page, or a multipage text.
2. Queries posed in a query language. A query is a statement that expresses
what the user wants to know. The query language can be just a list of terms,
like [AI book]; or it can define a phrase of words that must be adjacent, like
["AI book"]; it can include Boolean operators, like [AI AND book]; or it can
include non-Boolean operators, like [AI NEAR book] or [AI book
site:www.aaai.org].
3. A result set: This is the subset of documents deemed relevant to the query
by the IR system. By relevant, we mean material that is likely to be useful to
the individual who asked the question for the specific information requirement
specified in the inquiry.
4. A presentation of the result set: This can be as simple as a ranked list of
document titles or as complicated as a rotating colour map of the result set
projected into a three-dimensional space and displayed in two dimensions. A
Boolean keyword model was used in the first IR systems. Each word in the
document collection is handled as a Boolean feature, which is true if the term
appears in the document and false otherwise.
Advantage
 It's easy to explain and put into practise.
Disadvantages
 There is no direction on how to organise the relevant materials for
presentation because the degree of relevance of a document is a single
bit.
 Users who are not programmers or logicians are unfamiliar with
Boolean expressions. Users find it inconvenient that in order to learn
about farming in Kansas and Nebraska, they must use the query
[farming (Kansas OR Nebraska)].
 Even for a seasoned user, formulating a good query can be difficult.
Suppose we try [information AND retrieval AND models AND
optimization] and get nothing; we could try [information OR retrieval
OR models OR optimization], but if that yields too many results, it is
hard to determine where to go next.
IR SCORING FUNCTIONS
1. Most IR systems have abandoned the Boolean model in favour of models based
on word-count statistics.
2. A scoring function takes a document and a query and returns a numeric
score; the documents with the highest scores are considered the most relevant.
3. In the BM25 scoring function, the score is a linear weighted combination of
scores for each of the words in the query.
4. A query term's weight is influenced by three factors:
• First, the frequency with which a query term appears in a document (also
known as TF, for term frequency). For the query [farming in Kansas],
documents that mention "farming" frequently will have higher scores.
• Second, the term's inverse document frequency, or IDF. Because the word "in"
appears in practically every document, it has a high document frequency and
hence a low inverse document frequency, making it less important to the query
than "farming" or "Kansas".
• Finally, the length of the document. A million-word document will probably
mention all the query words, but may not really be about the query. A short
document that mentions all of the terms is a much better candidate.
The BM25 function takes all three of these into account.
BM25(dj, q1:N) = Σ (i = 1..N) IDF(qi) · TF(qi, dj) · (k + 1) /
                 ( TF(qi, dj) + k · (1 − b + b · |dj| / L) )

where |dj| is the length of document dj in words, and L is the average
document length in the corpus, L = Σi |di| / N. We have two parameters, k and
b, that can be tuned by cross-validation; typical values are k = 2.0 and
b = 0.75. IDF(qi) is the inverse document frequency of word qi, which can be
computed as

IDF(qi) = log( (N − DF(qi) + 0.5) / (DF(qi) + 0.5) )

where DF(qi) is the number of documents that contain word qi. Systems create
an index ahead of time that lists, for each vocabulary word, the documents
that contain it; this is called the hit list for the word. Then, given a
query, we intersect the hit lists of the query words and score only the
documents in the intersection.
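A compact Python sketch of the BM25 score as given above; k, b, and the toy
document collection are assumptions:

import math

def bm25_score(query_terms, doc, docs, k=2.0, b=0.75):
    """Score one document (a list of words) against the query terms."""
    N = len(docs)
    L = sum(len(d) for d in docs) / N        # average document length
    score = 0.0
    for q in query_terms:
        df = sum(1 for d in docs if q in d)  # document frequency DF(q)
        idf = math.log((N - df + 0.5) / (df + 0.5))
        tf = doc.count(q)                    # term frequency TF(q, doc)
        score += idf * tf * (k + 1) / (tf + k * (1 - b + b * len(doc) / L))
    return score

docs = [["farming", "in", "kansas"], ["cooking", "news"],
        ["sports", "news"], ["python", "guide"]]
print(bm25_score(["farming", "kansas"], docs[0], docs))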
IR SYSTEM EVALUATION
Consider the case when an IR system has returned a result set for a single
query, for which we know which documents are relevant and which are not, from
a corpus of 100 documents. The counts in each category are:

                    In result set    Not in result set
Relevant                 30                 20
Not relevant             10                 40

1. Precision measures the proportion of documents in the result set that are
actually relevant. In our example, the precision is 30/(30 + 10) = 0.75. The
false positive rate is 1 − 0.75 = 0.25.
2. Recall measures the proportion of all the relevant documents in the
collection that are in the result set. In our example, recall is
30/(30 + 20) = 0.60. The false negative rate is 1 − 0.60 = 0.40.
3. Recall is difficult to compute in a big document collection, such as the
World Wide Web, because there is no practical way to review every page on the
Web for relevance.
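In code, the two measures are one-liners; the counts below are simply the
ones from the table above.

tp, fp, fn = 30, 10, 20           # relevant retrieved, irrelevant retrieved, relevant missed
precision = tp / (tp + fp)        # 30/40 = 0.75
recall = tp / (tp + fn)           # 30/50 = 0.60
print(precision, recall)
print(1 - precision, 1 - recall)  # the false positive and false negative rates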
IR REFINEMENTS
1. A better model of the effect of document length on relevance is a common
refinement.
2. Although the BM25 scoring system utilises a word model that treats all
terms as fully independent, we know that some words are connected: "couch"
is related to both "couches" and "sofa." Many IR systems try to account for
correlations like these.
3. For example, if the query is [couch], it would be a shame to eliminate pages
that mention "COUCH" or "couches" but not "couch" from the result set. In
both the query and the documents, most IR systems case fold "COUCH" to
"couch," and others utilise a stemming method to reduce "couches" to the stem
form "couch."
4. Next, look for synonyms for words like "couch," such as "sofa." This, like
stemming, has the potential to improve recall but at the expense of precision.
In dictionaries, or by looking for correlations in documents or searches, you
can find synonyms and related words.
5. As a last refinement, metadata—data outside of the document's text—can be
used to improve IR. Human-supplied keywords and publication data are two
examples. Hypertext linkages between documents are an important source of
information on the Internet.
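To make the case-folding and stemming refinements concrete, here is a
deliberately naive Python sketch. The suffix rules and the tiny synonym table
are invented for illustration; a real system would use a proper stemmer (such
as Porter's algorithm) and curated synonym resources.

def normalize(token):
    # Case-fold and apply a (very naive) plural-stripping rule.
    token = token.lower()               # "COUCH" -> "couch"
    if token.endswith("es") and len(token) > 4:
        token = token[:-2]              # "couches" -> "couch"
    elif token.endswith("s") and len(token) > 3:
        token = token[:-1]
    return token

synonyms = {"sofa": "couch"}            # hypothetical hand-built synonym table

def canonical(token):
    t = normalize(token)
    return synonyms.get(t, t)

print([canonical(w) for w in ["COUCH", "couches", "sofa"]])
# -> ['couch', 'couch', 'couch']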
PAGERANK ALGORITHM
PageRank (PR) is a Google Search algorithm that ranks web pages in search
engine results. Larry Page, one of Google's founders, was the inspiration for
Page Rank. PageRank is a metric for determining how important a website's
pages are. According to Google, PageRank calculates the importance of a
website by counting the quantity and quality of links that point to it. The basic
premise is that more important websites are more likely to gain links from
other websites.
The definition is recursive, but the recursion bottoms out: the ranks can be
computed by iterating from any initial assignment until the values converge.
We assume page A has pages T1, …, Tn which point to it (i.e., are citations).
The parameter d is a damping factor which can be set between 0 and 1; it is
usually set to 0.85. C(A) is defined as the number of links going out of page
A. The PageRank of page A is then given by:

PR(A) = (1 − d) + d · (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
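The formula can be computed by simple iteration, as in the following Python
sketch. The toy link graph and the iteration count are illustrative
assumptions.

def pagerank(links, d=0.85, iterations=50):
    # links: dict mapping each page to the list of pages it links to.
    pages = list(links)
    pr = {p: 1.0 for p in pages}          # initial guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over all pages T that link to this page.
            incoming = sum(pr[t] / len(links[t])
                           for t in pages if page in links[t])
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}   # hypothetical tiny web
print(pagerank(web))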
THE HITS ALGORITHM
1. Another prominent link-analysis method is the Hyperlink-Induced Topic
Search algorithm, also known as "Hubs and Authorities" or HITS.
2. There are various ways in which HITS differs from PageRank. To begin
with, it is a query-dependent metric: it ranks pages in relation to a query.
3. HITS first finds a set of pages that are relevant to the query. It
accomplishes this by intersecting the hit lists of the query words.
4. It then adds pages in the link neighbourhood of these pages—pages that
link to or are linked from one of the pages in the relevant set.
5. To the extent that other pages in the relevant set refer to it, each page in this
set is regarded an authority on the query. When a page points to other
authoritative pages in the relevant set, it is considered a hub.
6. Similar to PageRank, we don't just want to measure the quantity of links;
we want to give high-quality hubs and authorities greater weight.
7. We iterate a procedure, similar to PageRank, that changes a page's authority
score to be the total of the hub scores of the pages that point to it, and the hub
score to be the sum of the authority scores of the pages it points to.
8. PageRank and HITS both contributed to our growing understanding of Web
information retrieval. As search engines improve their methods of collecting
finer indications of search relevance, these algorithms and their expansions are
utilised to score billions of queries every day.
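A Python sketch of the iterative update from points 6 and 7 above. The
relevant set is assumed to be given as a small link graph; normalization
keeps the scores from growing without bound.

import math

def hits(links, iterations=20):
    # links: dict mapping each page in the relevant set to the pages it links to.
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages pointing to it.
        auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
        # Hub score: sum of the authority scores of the pages it points to.
        hub = {p: sum(auth[q] for q in links[p]) for p in pages}
        # Normalize so the scores stay bounded across iterations.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return auth, hub

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}   # hypothetical relevant set
authorities, hubs = hits(web)
print(authorities, hubs)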
INFORMATION EXTRACTION
Information extraction is the process of acquiring knowledge by skimming a
text and looking for occurrences of a particular class of object and for
relationships among objects. Extracting addresses from Web pages, with
database columns for street, city, state, and zip code; or extracting storms from
weather reports, with fields for temperature, wind speed, and precipitation, is a
common activity. This can be done with high accuracy in a small domain.
FINITE-STATE AUTOMATA FOR INFORMATION EXTRACTION
1. An attribute-based information extraction system assumes that the entire
text belongs to a single item, and the job is to extract attributes of that object.
2. Relational extraction systems, which deal with many objects and their
relationships, are a step up from attribute-based extraction methods.
3. FASTUS, which handles news reports about corporate mergers and
acquisitions, is an example of a relational-based extraction system.
4. A sequence of cascaded finite-state transducers can be used to create a
relational extraction system.
5. That is, the system is made up of a sequence of small, efficient finite-state
automata (FSAs), each of which takes in text as input, converts it to a new
format, and then sends it on to the next automaton. FASTUS is divided into
five stages: (1) tokenization, (2) handling of complex words, (3) handling of
basic groups, (4) handling of complex phrases, and (5) merging of structures.
6. Tokenization is the initial stage of FASTUS, which divides the stream of
characters into tokens (words, numbers, and punctuation). Tokenization in
English is rather straightforward; simply separating characters at white space
or punctuation suffices. Some tokenizers can handle markup languages like
HTML, SGML, and XML as well.
7. The second stage deals with more complicated phrases like "set up" and
"joint venture," as well as formal names like "Bridgestone Sports Co." A
combination of lexical elements and finite-state grammar rules is used to
identify these.
8. The third stage deals with basic groups, such as noun groups and verb groups. The
objective is to break them down into manageable chunks for later phases.
9. In the fourth level, the fundamental groupings are combined to form
complex phrases. The goal is to have finite-state rules that can be processed
rapidly and that produce unambiguous (or nearly unambiguous) output
phrases. Domain-specific events are dealt with by one type of combination
rule.
10. The fifth and last stage merges the structures created in the preceding
stages. If the next sentence says, "The joint venture will begin production
in January," this step will note that there are two references to a joint
venture, which should be combined into one. This is an instance of the
problem of identity uncertainty.
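A toy Python sketch of the first two cascaded stages (tokenization, then
complex-word handling). The regular expression and the multiword table are
stand-ins for FASTUS's actual lexicon and finite-state rules.

import re

def tokenize(text):
    # Stage 1: split the character stream into word, number, and punctuation tokens.
    return re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", text)

def complex_words(tokens):
    # Stage 2 (toy): merge known multiword expressions like "joint venture".
    merged, i = [], 0
    while i < len(tokens):
        if tokens[i:i + 2] == ["joint", "venture"]:
            merged.append("joint_venture"); i += 2
        else:
            merged.append(tokens[i]); i += 1
    return merged

print(complex_words(tokenize("Bridgestone Sports Co. set up a joint venture.")))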
PROBABILISTIC MODELS FOR INFORMATION EXTRACTION
 The hidden Markov model, or HMM, is the most basic probabilistic
model for sequences containing hidden states.
 In terms of extraction, HMMs have two significant benefits over FSAs.
 For starters, HMMs are probabilistic and thus noise-tolerant. When a
single expected character is absent in a regular expression, the regex
fails to match; with HMMs, missing characters/words are gracefully
degraded, and we get a probability indicating the degree of match, not
just a Boolean match/fail.
 Second, HMMs may be trained using data rather than costly template
engineering, making it easier to keep them up to date as text changes
over time.
Figure: Hidden Markov model for the speaker of a talk announcement
 After the HMMs have been learned, we can apply them to a text, using the
Viterbi algorithm to find the most likely path through the HMM states.
One approach is to apply each attribute HMM separately; in this case you
can expect most of the HMMs to spend most of their time in background
states. This method is appropriate when the extraction is sparse, that is,
when the number of extracted words is small compared to the length of the
text.
 The alternative is to merge all of the individual qualities into a single
large HMM, which would then search for a path that travels over
several target attributes, first locating a speaker target, then a date
target, and so on. When we predict only one of each characteristic in a
text, several HMMs are better, and when the texts are more freeform
and rich with attributes, a single large HMM is better.
 HMMs have the advantage of providing probability numbers to aid in
decision-making. If any targets are missing, we must determine
whether or not this is a valid instance of the desired relation, or
whether the targets discovered are false positives. This decision can be
made using a machine learning system.
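The following Python sketch shows the Viterbi computation over a toy
two-state extraction HMM (background vs. speaker). All probabilities and
vocabulary are invented for illustration, and the 1e-9 floor stands in for a
proper smoothing of unseen words.

def viterbi(obs, states, start_p, trans_p, emit_p):
    # Most likely hidden-state path through an HMM for an observation sequence.
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 1e-9), [s]) for s in states}]
    for o in obs[1:]:
        row = {}
        for s in states:
            prob, path = max(
                (V[-1][prev][0] * trans_p[prev][s] * emit_p[s].get(o, 1e-9),
                 V[-1][prev][1]) for prev in states)
            row[s] = (prob, path + [s])
        V.append(row)
    return max(V[-1].values())

states = ["background", "speaker"]
start_p = {"background": 0.9, "speaker": 0.1}
trans_p = {"background": {"background": 0.8, "speaker": 0.2},
           "speaker": {"background": 0.4, "speaker": 0.6}}
emit_p = {"background": {"talk": 0.3, "by": 0.3, "dr": 0.05},
          "speaker": {"dr": 0.4, "smith": 0.4}}
print(viterbi(["talk", "by", "dr", "smith"], states, start_p, trans_p, emit_p))
# -> the "dr smith" tokens are tagged as speaker states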
ONTOLOGY EXTRACTION FROM LARGE CORPORA
1. Building a vast knowledge base or ontology of facts from a corpus is
another application of extraction technology. In three ways, this is
distinct:
• For starters, it's open-ended—we're looking for information on a
variety of areas, not just one.
• Second, with a large corpus, this task is dominated by precision rather
than recall, just as question answering on the Web is.
• Third, rather than being extracted from one specific text, the results
can be statistical aggregates gathered from multiple sources.
2. Here is one of the most productive templates: NP such as NP (, NP)* (,)?
((and | or) NP)?.
3. The strong words and commas must appear in the text exactly as written,
although the parentheses are for grouping, the asterisk indicates zero or more
repetitions, and the question mark indicates that anything is optional.
4. A noun phrase is represented by the variable NP.
5. This template matches texts such as "diseases such as rabies affect your
dog" and "supports network protocols such as DNS," implying that rabies is a
disease and DNS is a network protocol.
6. The key words "including," "particularly," and "or other" can be used to
create similar templates. Of course, many crucial passages, such as "Rabies is
a sickness," will be missed by these templates. This is done on purpose.
7. The "NP is an NP" template does suggest a subclass relationship on
sometimes, but it frequently implies something different, such as "There is a
God" or "She is a little fatigued." We can afford to be choosy with a huge
corpus, using only high-precision templates.
8. We'll miss a lot of subcategory relationship statements, but we'll almost
certainly discover a paraphrase of the statement someplace else in the corpus
that we can utilise.
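A rough Python approximation of the template above, using regular
expressions. Treating a noun phrase as a single word is a simplifying
assumption made here; a real system would use a noun-phrase chunker.

import re

def hearst_matches(text):
    # Find 'NP such as NP (, NP)* ((and|or) NP)?' patterns (one-word NPs only).
    hits = []
    for m in re.finditer(
            r"(\w+) such as ((?:\w+)(?:, \w+)*(?:,? (?:and|or) \w+)?)", text):
        category = m.group(1)
        members = re.split(r", | and | or |,", m.group(2))
        hits.extend((member.strip(), category)
                    for member in members if member.strip())
    return hits

print(hearst_matches("diseases such as rabies affect your dog"))
# -> [('rabies', 'diseases')]
print(hearst_matches("supports network protocols such as DNS, FTP and HTTP"))
# -> [('DNS', 'protocols'), ('FTP', 'protocols'), ('HTTP', 'protocols')]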
AUTOMATED TEMPLATE CONSTRUCTION
Templates need not be built by hand; they can be learned from a handful of
examples. Suppose we are given a few (author, title) pairs. These are clearly
examples of the author–title relationship, but the learning system has no
idea who the writers are or what their titles are. The words in these samples
were used in a Web corpus search, which yielded 199 matches.
Each match is defined as a tuple of seven strings (Author, Title, Order, Prefix,
Middle, Postfix, URL), with Order being true if the author came first and false
if the title came first, Middle being the characters between the author and title,
Prefix being the 10 characters before the match, Suffix being the 10 characters
after the match, and URL being the Web address where the match was made.
1. Each template has the same seven components as a match.
2. The Author and Title are regexes with any characters (but beginning and
ending in letters) and a length ranging from half the minimum to twice the
maximum length of the samples.
3. Only literal strings, not regexes, are allowed in the prefix, middle, and
postfix.
4. The middle is the most straightforward to grasp: each individual middle
string in the collection of matches is a separate candidate template. The
template's Prefix is then defined as the longest common suffix of all the
prefixes in the matches, and the template's Postfix is defined as the longest
common prefix of all the postfixes in the matches for each such candidate.
5. The template is rejected if one of these has a length of zero.
6. The template's URL is defined as the longest prefix of the matches' URLs.
Sensitivity to noise is the major flaw of this method. Errors can spread
quickly if one of the initial few templates is faulty. One method to mitigate
this issue is to require many templates to verify a new example, and to
require a new template to discover numerous examples that are also discovered
by existing templates.
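The prefix/postfix computation in point 4 is easy to sketch in Python; the
match strings below are invented placeholders, not actual corpus data.

import os

def common_prefix(strings):
    return os.path.commonprefix(strings)

def common_suffix(strings):
    # Longest common suffix = reversed common prefix of the reversed strings.
    return common_prefix([s[::-1] for s in strings])[::-1]

# Hypothetical prefixes/postfixes collected from a set of matches.
prefixes = ["review of <", "copy of <"]
postfixes = ["> is sold", "> is rare"]
print(repr(common_suffix(prefixes)))   # ' of <'  -> the template's Prefix
print(repr(common_prefix(postfixes)))  # '> is '  -> the template's Postfix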
MACHINE READING
1. Unlike a traditional information extraction system, which is targeted at a
few predefined relationships, a machine reading system behaves more like a
human reader who learns from the text itself; the field is known as machine
reading because of this.
2. TEXTRUNNER is an example of a machine-reading system.
TEXTRUNNER employs cotraining to improve its performance, but it
requires a starting point.
3. Because TEXTRUNNER is domain-agnostic, it can't rely on established
noun and verb lists.
On a huge Web corpus, TEXTRUNNER obtains a precision of 88 percent and a
recall of 45 percent (F1 of 60 percent). From a corpus of half a billion Web
pages, TEXTRUNNER has extracted hundreds of millions of facts.
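As a check on those figures, F1 is the harmonic mean of precision and recall:
F1 = 2PR/(P + R) = (2 × 0.88 × 0.45)/(0.88 + 0.45) ≈ 0.60, which matches the
reported 60 percent.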
Natural Language Processing
Natural Language Processing (NLP) is an area of AI that allows machines to
interpret human language. Its purpose is to create systems that can understand
text and conduct activities like translation, spell check, and topic classification
automatically.
Natural Language Processing (NLP) is a technique that allows computers to
comprehend human language. NLP analyses the grammatical structure of
phrases and the particular meanings of words behind the scenes, then applies
algorithms to extract meaning and produce results. In other words, it
understands human language so that it can accomplish various activities
automatically.
Virtual assistants like Google Assist, Siri, and Alexa are probably the most
well-known instances of NLP in action. NLP recognises written and spoken
text such as "Hey Siri, where is the nearest gas station?" and converts it to
numbers that machines can comprehend.
Chatbots are another well-known application of NLP. They assist support
teams in resolving issues by automatically interpreting and responding to
typical language queries.
You've undoubtedly come across NLP in many other applications that you use
daily without even realising it: text suggestions when drafting an email, an
offer to translate a Facebook post written in another language, or the
filtering of unwanted promotional emails into your junk folder.
In a word, the goal of Natural Language Processing is to make machines
understand human language, which is complex, ambiguous, and immensely
diverse.
Figure: Vauquois triangle, a schematic diagram for machine translation systems
Difference - NLP, AI, Machine Learning
Natural language processing (NLP), artificial intelligence (AI), and machine
learning (ML) are all terms that are sometimes used interchangeably, so it's
easy to get them mixed up.
The first thing to understand is that natural language processing and machine
learning are both subsets of AI.
Artificial intelligence (AI) is a broad term that refers to machines that can
mimic human intelligence. Systems that simulate cognitive abilities, such as
learning from examples and solving problems, are included in AI. From self-
driving automobiles to predictive systems, this covers a wide spectrum of
applications.
Natural Language Processing (NLP) is the study of how computers
comprehend and translate human speech. Machines can understand written or
spoken material and execute tasks such as translation, keyword extraction,
topic classification, and more using natural language processing (NLP).
However, machine learning will be required to automate these procedures and
provide reliable results. Machine learning is the process of teaching machines
how to learn and develop without being explicitly programmed through the
use of algorithms.
NLP is used by AI-powered chatbots to interpret what users say and what they
mean to accomplish, while machine learning is used by AI-powered chatbots
to automatically offer more correct responses by learning from previous
interactions.
NLP Techniques
Syntactic and semantic analysis are two techniques used in Natural Language
Processing (NLP) to assist computers interpret text.
Syntactic Analysis
Syntactic analysis, often known as parsing, examines text using basic
grammatical rules to determine sentence structure, word organisation, and
word relationships.
The following are some of the important sub-tasks:
Tokenization is the process of breaking down a text into smaller pieces called
tokens (which can be phrases or words) in order to make it easier to work
with.
Tokens are labelled as verb, adverb, adjective, noun, and so on using part of
speech tagging (PoS tagging). This aids in deducing a word's meaning (for
example, the word "book" has multiple meanings depending on whether it is
employed as a verb or a noun).
Lemmatization and stemming are techniques for simplifying the analysis of
inflected words by reducing them to their base form.
Stop-word elimination eliminates often used terms that have no semantic
value, such as I, they, have, like, yours, and so on.
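A sketch of these syntactic sub-tasks using the NLTK library, assuming nltk
is installed and its data packages (punkt, averaged_perceptron_tagger,
stopwords, wordnet) have been downloaded; the example sentence is invented.

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

sentence = "She was reading the books quickly"

tokens = nltk.word_tokenize(sentence)            # tokenization
tagged = nltk.pos_tag(tokens)                    # part-of-speech tagging
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
stems = [stemmer.stem(t) for t in tokens]        # e.g. "reading" -> "read"
lemmas = [lemmatizer.lemmatize(t) for t in tokens]
content = [t for t in tokens
           if t.lower() not in stopwords.words("english")]  # stop-word removal

print(tagged)    # [('She', 'PRP'), ('was', 'VBD'), ('reading', 'VBG'), ...]
print(content)   # words like 'was' and 'the' are removed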
Semantic Analysis
The goal of semantic analysis is to capture the meaning of text. It begins by
looking at the meaning of each individual word (lexical semantics). The
programme then examines the word combination and what it means in context.
The following are the primary sub-tasks of semantic analysis:
The goal of word sense disambiguation is to figure out which sense a word is
being used in a given situation.
Relationship extraction tries to figure out how entities (places, people,
organisations, and so on) in a text relate to one another.
5 Use Cases of NLP in Business
Companies are using natural language processing (NLP) techniques to better
understand how their customers perceive them across all channels of
communication, including emails, product reviews, social media posts,
surveys, and more.
AI solutions may be used to automate monotonous and time-consuming
operations, boost efficiency, and allow people to focus on more meaningful
duties in addition to understanding online interactions and how customers
communicate about firms.
Here are some of the most common business applications of NLP:
Sentiment Analysis
 Sentiment analysis is a technique for detecting emotions in text and
categorising them as positive, negative, or neutral.
 Companies can acquire insight into how customers feel about brands or
products by monitoring social media posts, product reviews, or online
polls. You could, for example, examine tweets mentioning your
business in real time to spot furious consumer remarks straight away.
 Perhaps you might conduct a survey to learn how people feel about
your customer service. You may discover which areas of your
customer service earn good or negative feedback by evaluating open-
ended replies to NPS questionnaires.
Language Translation
 Over the last few years, machine translation technology has advanced
significantly, with Facebook's translations approaching superhuman
performance in 2019.
 Businesses can use translation technologies to communicate in a
variety of languages, which can help them strengthen their worldwide
communication or break into new markets.
 Translation systems can also be trained to grasp specialised language
in any area, such as finance or health. As a result, you won't have to
worry about the inaccuracies that come with generic translation
software.
Text Extraction
 You can extract pre-defined information from text using text
extraction. This programme assists you in recognising and extracting
relevant keywords and attributes (such as product codes, colours, and
specifications) and named entities from enormous amounts of data
(like names of people, locations, company names, emails, etc).
 Text extraction can be used to find key terms in legal documents,
identify the essential words cited in customer service tickets, and
extract product specs from a paragraph of text, among other things.
Chatbots
 Chatbots are artificial intelligence systems that communicate with
humans via text or speech.
 Chatbots are increasingly being used for customer service because of
their capacity to provide 24/7 support (shortening response times),
manage many requests at once, and free up human agents from
answering repetitive questions.
 Chatbots actively learn from each contact and improve their grasp of
human intent, allowing you to trust them with routine and easy tasks. If
they come across a client question they can't answer, they'll forward it
to a human representative.
Topic Classification
 Topic classification aids in the categorization of unstructured text. It's a
terrific approach for businesses to learn from customer feedback.
 Assume you want to examine hundreds of open-ended NPS survey
responses. How many people mention your customer service in their
responses? What percentage of clients bring up the topic of "pricing"?
You'll have all your data categorised in seconds with this topic
classifier for NPS feedback.
 Topic classification can also be used to automate the process of
labelling incoming support issues and routing them to the appropriate
individual.
Closing
 The branch of AI known as Natural Language Processing (NLP)
explores how machines interact with human language. NLP is used to
improve technologies that we use every day, such as chatbots, spell-
checkers, and language translators.
 NLP, when combined with machine learning algorithms, results in
systems that learn to do tasks on their own and improve over time.
Among other things, NLP-powered solutions can help you identify
social media posts by emotion or extract identified entities from
business correspondence.
Machine Translation

Automated translation is known as machine translation (MT). It is the process
of converting a text from one natural language (such as English) to another
(such as Spanish) using computer software.
The meaning of a text in the original (source) language must be fully restored
in the target language, i.e. the translation, in order to process any translation,
whether human or automated. While it appears simple on the surface, it is
significantly more complicated. Translation is more than just a word-for-word
replacement. A translator must be able to evaluate and analyse all of the text's
aspects, as well as understand how each word influences the others. This
necessitates considerable knowledge of the source and target languages'
grammar, syntax (sentence structure), semantics (meanings), and other
aspects, as well as familiarity with each local region.
Both human and machine translation have their own set of difficulties. For
example, no two translators can create identical translations of the same
content in the same language pair, and customer satisfaction may require
numerous rounds of revisions. The major challenge, though, is determining
how machine translation can produce translations of publishable quality.
Rule-Based Machine Translation Technology
For each language pair, rule-based machine translation relies on a large
number of built-in linguistic rules and millions of bilingual dictionaries.
The software parses text and generates a transitional representation from
which the target language text can be generated. This approach necessitates
enormous sets of rules and extensive lexicons with morphological, syntactic,
and semantic information. These sophisticated rule sets are used by the
programme, which subsequently translates the source language's grammatical
structure to the target language.
Large dictionaries and complex language rules are used to create translations.
By incorporating their own terminology into the translation process, users
can increase the quality of the out-of-the-box translation: they create
custom dictionaries that override the system's default options.
In most circumstances, there are two steps: an initial investment that improves
quality considerably at a low cost, and a continuous investment that improves
quality progressively. While rule-based MT can help firms get to and beyond
the quality threshold, the quality improvement process can be lengthy and
costly.
Statistical Machine Translation Technology
Statistical machine translation makes use of statistical translation models
whose parameters are derived through monolingual and bilingual corpora
analysis. The process of creating statistical translation models is swift, but the
technique is strongly reliant on existing multilingual corpora. For a specific
domain, a minimum of 2 million words is necessary, with considerably more
for general language. Although it is theoretically possible to meet the quality
criterion, most companies lack the appropriate multilingual corpora to create
the necessary translation models. Furthermore, statistical machine translation
uses a lot of CPU power and necessitates a lot of hardware to run translation
models at average performance levels.
Rule-Based MT vs. Statistical MT
Rule-based MT has good out-of-domain quality and is predictable by nature.
Customization based on dictionaries ensures higher quality and adherence to
corporate terminology. However, the fluency that readers expect from a
translation may be lacking. The customising cycle required to attain the
quality criteria might be lengthy and costly in terms of expenditure. Even on
ordinary hardware, the performance is excellent.
When big and qualified corpora are available, statistical MT produces good
results. The translation is fluent, which means it reads smoothly and so
satisfies the needs of the user. The translation, on the other hand, is neither
predictable nor consistent. Good corpora training is automated and less
expensive. However, training on broad language corpora, or text outside of the
defined domain, is ineffective. Statistical machine translation also necessitates
a lot of hardware to construct and manage huge translation models.
Rule-Based MT                                   Statistical MT
+ Consistent and predictable quality            – Unpredictable translation quality
+ Out-of-domain translation quality             – Poor out-of-domain quality
+ Knows grammatical rules                       – Does not know grammar
+ High performance and robustness               – High CPU and disk space requirements
+ Consistency between versions                  – Inconsistency between versions
– Lack of fluency                               + Good fluency
– Hard to handle exceptions to rules            + Good for catching exceptions to rules
– High development and customization costs      + Rapid and cost-effective development,
                                                  provided the required corpus exists

Figure: Machine translation paradigms
SPEECH RECOGNITION
Speech recognition is the task of identifying a sequence of words uttered by
a speaker, given the acoustic signal. It has become one of the most widely
used AI applications.
1. Example: When spoken quickly, the phrase "recognize speech" sounds almost
identical to "wreck a nice beach." Even this brief example demonstrates a few
of the challenges that make speech recognition difficult.
2. First, segmentation: written words in English have spaces between them,
but in fast speech there are no pauses in "wreck a nice" that would
distinguish it as a multiword phrase as opposed to the single word
"recognize".
3. Second, coarticulation: when speaking quickly the "s" sound at the end of
"nice" merges with the "b" sound at the beginning of "beach", yielding
something close to "sp". A further problem that does not show up in this
example is homophones: words like "to", "too" and "two" that sound the same
but differ in meaning.
Speech recognition can be framed as finding the most likely word sequence
given the sounds:

argmax_{word_1:t} P(word_1:t | sound_1:t) = argmax_{word_1:t} P(sound_1:t | word_1:t) P(word_1:t)

Here P(sound_1:t | word_1:t) is the acoustic model. It describes the sound of
words, for instance that "ceiling" begins with a soft "c" and sounds the same
as "sealing". P(word_1:t) is known as the language model. It specifies the
prior probability of each utterance, for example that "ceiling fan" is about
500 times more likely as a word sequence than "sealing fan".
4. Once we define the acoustic and language models, we can solve for the
most likely sequence of words using the Viterbi algorithm.
Acoustic Model
1. The magnitude of the current – which approximates the amplitude of
the sound wave – is measured by an analog-to-digital converter at
discrete intervals called sampling rate.
2. The quantization factor determines the precision of each measurement;
speech recognizers commonly keep 8 to 12 bits. A low-end system
sampling at 8 kHz with 8-bit quantization would require roughly half a
megabyte of speech every minute.
3. A phoneme is the smallest unit of sound that has a specific meaning for
language speakers. The "t" in "stick," for example, sounds similar
enough to the "t" in "tick" that English people assume them to be the
same phoneme. A vector of features summarises each frame. The
phone model is depicted in the image below.
Figure: Translating the acoustic signal into a sequence of frames. In this
diagram each frame is described by the discretized values of three acoustic
features; a real system would have dozens of features.
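A quick sanity check of the storage figure quoted in point 2 above:

sampling_rate = 8000        # samples per second (8 kHz)
bytes_per_sample = 1        # 8-bit quantization
seconds = 60
megabytes = sampling_rate * bytes_per_sample * seconds / 1e6
print(megabytes)            # 0.48, i.e. roughly half a megabyte per minute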
Building a speech recognizer
1. A voice recognition system's quality is determined by the quality of all of its
components, including the language model, word-pronunciation models,
phone models, and the signal processing techniques used to extract spectral
information from acoustic signals.
2. The most accurate systems train different models for each speaker,
incorporating variances in dialect, gender, and other factors. Because this type
of training can take several hours of contact with the speaker, the most widely
used systems do not develop speaker-specific models.
3. A system's accuracy is determined by a variety of elements. First, the signal
quality is important: a high-quality directional microphone focused at a
stationary mouth in a cushioned room will perform far better than a cheap
microphone broadcasting a signal across phone lines from a car stuck in traffic
with the radio on. The word error rate is below 0.5 percent when recognising
digit strings with an 11-word vocabulary (1-9 plus "oh" and "zero"), but it
rises to about 10 percent on news items with a 20,000-word vocabulary and to
20 percent on a corpus with a 64,000-word vocabulary. The task matters too:
when the system is trying to accomplish a specific task, such as booking a
flight or giving directions to a restaurant, the task can often be
accomplished perfectly even with a word error rate of 10 percent or more.
A diagram depicting the various options for a machine translation system. At
the top, we have English text. An interlingua-based system parses English first
into a syntactic form, then into a semantic representation and an interlingua
representation, and last into a semantic, syntactic, and lexical form in French
through generation. The dashed lines are used as a shortcut in a transfer-based
system. Different systems convey data at different points; some even do so
several times.
ROBOT
1. Robots are physical agents who manipulate the physical world to fulfil
tasks.
2. They are outfitted with effectors like as legs, wheels, joints, and grippers in
order to accomplish this.
3. The sole purpose of effectors is to exert physical forces on the environment.
4. Robots also have sensors that enable them to perceive their surroundings.
5. Modern robotics uses a variety of sensors, such as cameras and lasers to
monitor the environment and gyroscopes and accelerometers to track the
robot's own movements.
6. The majority of today's robots fall into one of three groups.
Manipulators, sometimes known as robot arms, are physically anchored to their
work environment, such as a factory assembly line or the International Space
Station. Mobile robots move about their environment using wheels or legs, and
the third group, mobile manipulators such as humanoid robots, combines
mobility with manipulation.
Robot Hardware
1. Sensors serve as a perceptual link between the robot and its surroundings.
2. Passive sensors, such as cameras, are actual environmental observers,
capturing signals created by other sources in the environment.
3. Active sensors, like sonar, emit energy into the surroundings and rely on
the fact that this energy is reflected back to the sensor. Active
sensors provide more information than passive sensors, but at the cost of
higher power consumption and the risk of interference when numerous active
sensors are employed simultaneously. Sensors can be classified into three sorts
based on whether they sense the environment, the robot's location, or the
robot's internal setup, whether active or passive.
4. Range finders are sensors that measure the distance between things in the
immediate vicinity. Robots were widely equipped with sonar sensors in the
early days of robotics. Sonar sensors emit directional sound waves that are
reflected by objects, with some of the sound making it back to the sensor.
5. Stereo vision uses many cameras to capture the surroundings from slightly
different perspectives, then analyses the parallax in the images to compute the
range of nearby objects. Sonar and stereo vision are no longer commonly
employed in mobile ground robots due to their inaccuracy.
6. Laser beams and special 1-pixel cameras are used in other range sensors,
which can be guided using sophisticated mirror configurations or spinning
parts. Scanning lidars are the name for these sensors (short for light detection
and ranging).
7. Radar, which is generally the sensor of choice for UAVs, is another
common range sensor. Radar sensors are capable of measuring distances of
several kilometres. Tactile sensors, such as whiskers, bump panels, and touch-
sensitive skin, are on the other end of the range sensing spectrum. These
sensors use physical contact to determine range and can only be used to detect
items that are very close to the robot.
8. Location sensors are a second significant type of sensor. To detect position,
most location sensors use range sensing as a main component. Outside, the
most popular solution to the problem of location is the Global Positioning
System (GPS).
9.Proprioceptive sensors, which notify the robot of its own motion, are the
third major class. Motors are frequently equipped with shaft decoders that
count the revolution of motors in minute increments to measure the exact
configuration of a robotic joint.
10. Inertial sensors, such as gyroscopes, rely on mass resistance to velocity
change. They can aid in the reduction of uncertainty.
11.Force and torque sensors are used to measure other crucial aspects of the
robot's status. When robots handle fragile goods or objects whose exact shape
and placement are unknown, these are essential.
Robotic Perception
1. Perception is the conversion of sensor data into internal representations of
the environment by robots. Sensors are noisy, and the environment is partially
viewable, unexpected, and frequently dynamic, making perception difficult. In
other words, robots face all of the issues associated with state estimation (or
filtering)
2. Good internal representations for robots have three properties as a rule of
thumb: they contain enough information for the robot to make good decisions,
they are structured so that they can be updated efficiently, and they are natural
in the sense that internal variables correspond to natural state variables in the
physical world.
Figure: Robot perception as a dynamic Bayes network
Monte Carlo Algorithm
There may be times when no map of the environment is available. The robot
will then need to obtain a map. This is a bit of a chicken-and-egg situation: the
navigating robot will have to figure out where it is in relation to a map it
doesn't know while also generating the map while not knowing where it is.
This subject has been extensively explored under the name simultaneous
localization and mapping, abbreviated as SLAM, and it is vital for many robot
applications.
Many alternative probabilistic approaches, like the extended Kalman filter
outlined above, are used to tackle SLAM difficulties.
Figure: Localization using the extended Kalman filter

Machine learning in robot perception
Robot perception relies heavily on machine learning. This is especially true
when the most appropriate internal representation is unknown. Unsupervised
machine learning algorithms are commonly used to map high-dimensional
sensor streams into lower-dimensional areas. Low-dimensional embedding is a
term for this method.
Another machine learning technique allows robots to adapt to large changes in
sensor measurements in real time.
Robots can adjust to such changes thanks to adaptive perception systems.
Self-supervised methods are those that allow robots to acquire their own
training data (complete with labels!). In this case, the robot is using machine
learning to transform a short-range sensor that is good for terrain
categorization into a sensor that can see considerably beyond.
PLANNING TO MOVE
1. A robot's entire reasoning boils down to deciding how to move effectors.
2. The goal of point-to-point motion is to get the robot or its end effector to a
specific spot.
3. The compliant motion problem, in which a robot moves while in physical
contact with an obstacle, poses a greater challenge.
4. A robot manipulator screwing in a light bulb or a robot pushing a box over a
table top are examples of compliant motion. We start by determining an
appropriate representation for describing and solving motion-planning
problems. The configuration space—the space of robot states defined by
location, orientation, and joint angles—turns out to be a better place to operate
than the original 3D workspace.
5. In configuration space, the path planning problem is to find a path from one
configuration to the next. Throughout this book, we've seen many variations of
the path-planning problem; the challenge introduced by robotics is that path
planning encompasses continuous spaces.
6. Cell decomposition and skeletonization are the two basic techniques. The
continuous path-planning problem is reduced to a discrete graph-search
problem in each case. In this section, we assume that the robot's motion is
deterministic and that its localization is precise. The assumptions in the
following sections will be relaxed.
7. The principle of skeletonization underpins the second main family of path-
planning algorithms. These algorithms simplify the planning problem by
reducing the robot's free area to a one-dimensional representation. A skeleton
of the configuration space is a lower-dimensional representation of the
configuration space.
Configuration Space
1. Consider a robot arm with two independently moving joints. Moving the
joints changes the (x, y) coordinates of the elbow and the gripper. (The arm
cannot move in the z direction.) This suggests that the robot's configuration
may be represented
using a four-dimensional coordinate system: (xe, ye) for the elbow's placement
in relation to the environment, and (xg, yg) for the gripper's location. Clearly,
these four coordinates represent the robot's entire state. They make up what
is referred to as the workspace representation.
2. Configuration spaces come with their own set of issues. A robot's task is
frequently specified in workspace coordinates rather than configuration space
coordinates. The topic of how to convert workspace coordinates to
configuration space arises as a result.
3. For prismatic joints, these transformations are linear; for revolute joints,
they are trigonometric. Kinematics is the name for this series of coordinate
transformations.
Inverse kinematics is the inverse problem of computing the configuration of a
robot whose effector location is defined in workspace coordinates. The
configuration space can be divided into two subspaces: the space of all
possible robot configurations (often referred to as free space) and the space of
unattainable configurations (usually referred to as occupied space).
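The trigonometric relationship between joint angles and workspace coordinates
for the two-joint arm described above can be sketched in the forward
direction as follows; the link lengths l1 and l2 are assumed values.

import math

def forward_kinematics(shoulder, elbow, l1=1.0, l2=0.8):
    # Map joint angles (radians) of a two-link planar arm to the workspace
    # coordinates of the elbow and the gripper.
    xe = l1 * math.cos(shoulder)
    ye = l1 * math.sin(shoulder)
    xg = xe + l2 * math.cos(shoulder + elbow)
    yg = ye + l2 * math.sin(shoulder + elbow)
    return (xe, ye), (xg, yg)

print(forward_kinematics(math.pi / 4, -math.pi / 6))
# Inverse kinematics would solve the reverse problem: from (xg, yg) back to angles.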
Cell Decomposition Methods
1. A uniformly spaced grid is the simplest cell decomposition.
2. The value of each free-space grid cell—that is, the cost of the shortest path
from that cell to the goal—is indicated by grayscale shading.
3. There are a variety of ways to improve cell decomposition methods in order
to relieve some of these issues. The first method permits the mixed cells to be
further subdivided—perhaps utilising cells half the original size. This can be
repeated until a path is determined that is fully made up of free cells. (Of
course, the method only works if it is possible to determine whether a given
cell is a mixed cell, which is simple only if the configuration space boundaries
have relatively simple mathematical descriptions.)
4. This method is complete provided there is a bound on the smallest
passageway through which a solution must pass. One algorithm that implements
this idea is hybrid A*.
Modified Cost Functions
1. A potential field is a function defined over state space whose value
increases as the robot gets closer to the nearest obstacle.
2. In the shortest-path calculation, the potential field can be used as an
additional cost term.
3. This results in an interesting trade-off. On the one hand, the robot aims
to minimize the length of the path to the goal. On the other, it strives to
stay away from obstacles by minimizing the potential function.
4. The cost function can be changed in a variety of ways. It may be useful, for
example, to smooth the control settings across time.
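A sketch of such a modified cost function in Python. The path is taken to be
a list of (x, y) waypoints, obstacles are points, and the weight parameter
controls the trade-off described in point 3; all names here are illustrative.

def path_cost(path, obstacles, weight=1.0):
    # Path length plus a repelling potential that grows near obstacles.
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    length = sum(dist(p, q) for p, q in zip(path, path[1:]))
    # Potential term: large when a waypoint is close to its nearest obstacle
    # (obstacles must be non-empty; the 1e-6 floor avoids division by zero).
    potential = sum(1.0 / max(min(dist(p, o) for o in obstacles), 1e-6)
                    for p in path)
    return length + weight * potential

print(path_cost([(0, 0), (1, 0), (2, 0)], obstacles=[(1, 1)]))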
Skeletonization methods
1. The skeletonization concept underpins the second main family of path-
planning algorithms.
2. These techniques simplify the planning problem by reducing the robot's free
area to a one-dimensional representation.
3. A skeleton of the configuration space is a lower-dimensional representation
of the configuration space.
4. It's a free-space Voronoi graph, or the collection of all points that are
equidistant from two or more obstacles. To use a Voronoi graph for path
planning, the robot must first transform its current configuration to a point on
the graph.
5. It's simple to demonstrate that a straight-line motion in configuration space
can always do this. Second, the robot moves along the Voronoi graph until it
reaches the configuration that is closest to the target. After that, the robot
moves away from the Voronoi graph and toward the destination. In
configuration space, this final phase requires straight-line mobility once more.
Figure: (a) A repelling potential field pushes the robot away from obstacles.
(b) Path found by simultaneously minimizing path length and the potential.
Figure: (a) The Voronoi graph is the set of points equidistant from two or
more obstacles in configuration space. (b) A probabilistic roadmap, composed
of 100 randomly chosen points in free space.
Robust methods
1. A robust method assumes a bounded level of uncertainty in each component
of a problem, but assigns no probabilities to values inside the allowable
interval.
2. A robust solution is one that works regardless of the actual values as long as
they stay within the assumed ranges.
3. Conformant planning is an extreme form of robust method.
Figure: A two-dimensional environment, velocity uncertainty cone, and
envelope of possible robot motions. The intended velocity is v, but with
uncertainty the actual velocity could be anywhere in the cone Cv, resulting
in a final configuration somewhere in the motion envelope, which means we
wouldn't know whether we hit the hole or not.
Figure: The first motion command and the resulting envelope of possible robot
motions. No matter what the error, we know the final configuration will be to
the left of the hole.