Artificial Intelligence Notes


Artificial Intelligence is composed of two words, Artificial and Intelligence, where "artificial" means "man-made" and "intelligence" means "thinking power"; hence AI means "man-made thinking power."

❖ Definition of AI
AI is a branch of computer science by which we can create intelligent machines that can behave like humans, think like humans, and make decisions.

Artificial Intelligence exists when a machine has human-like skills such as learning, reasoning, and problem solving.
With Artificial Intelligence you do not need to pre-program a machine for every task; instead, you can create a machine with algorithms that let it work with its own intelligence, and that is the power of AI.
AI is not a new idea: according to some accounts of Greek myth, there were mechanical men in early days that could work and behave like humans.

❖ Why Artificial Intelligence?


Before learning about Artificial Intelligence, we should understand why AI is important and why we should learn it. Following are some main reasons to learn about AI:
• With the help of AI, you can create software or devices that solve real-world problems easily and accurately, in areas such as health, marketing, and traffic.
• With the help of AI, you can create your own personal virtual assistant, such as Cortana, Google Assistant, or Siri.
• With the help of AI, you can build robots that can work in environments where human survival is at risk.
• AI opens a path to other new technologies, new devices, and new opportunities.

❖ Goals of Artificial Intelligence


Following are the main goals of Artificial Intelligence:
• Replicate human intelligence
• Solve knowledge-intensive tasks
• An intelligent connection of perception and action
• Building a machine that can perform tasks requiring human intelligence, such as:
✓ Proving a theorem
✓ Playing chess
✓ Planning a surgical operation
✓ Driving a car in traffic
• Creating systems that can exhibit intelligent behaviour, learn new things by themselves, demonstrate, explain, and advise their users.

❖ What Comprises to Artificial Intelligence?


Artificial Intelligence is not just a part of computer science; it is vast and draws on many other fields. To create AI, we should first know how intelligence is composed: intelligence is an intangible capacity of our brain that combines reasoning, learning, problem solving, perception, language understanding, etc.
To achieve these capabilities in a machine or software, Artificial Intelligence requires the following disciplines:
• Mathematics
• Biology
• Psychology
• Sociology
• Computer Science
• Neuroscience
• Statistics
❖ Advantages of Artificial Intelligence
Following are some main advantages of Artificial Intelligence:
• High accuracy with fewer errors: AI machines or systems are less prone to errors and offer high accuracy, as they take decisions based on prior experience or information.
• High speed: AI systems can be very fast at decision making; because of this, an AI system can beat a chess champion at chess.
• High reliability: AI machines are highly reliable and can perform the same action many times with consistent accuracy.
• Useful for risky areas: AI machines can help in situations such as defusing a bomb or exploring the ocean floor, where employing a human would be risky.
• Digital assistance: AI can provide digital assistance to users; for example, AI technology is currently used by various e-commerce websites to show products matching customer requirements.
• Useful as a public utility: AI can be very useful in public services, such as self-driving cars that make journeys safer and hassle-free, facial recognition for security, and natural language processing for communicating with humans in human language.

❖ Disadvantages of Artificial Intelligence


Every technology has some disadvantages, and the same goes for Artificial Intelligence. Advantageous as it is, AI still has some drawbacks which we need to keep in mind while creating an AI system. Following are the disadvantages of AI:
• High cost: The hardware and software requirements of AI are very costly, as AI systems need a lot of maintenance to meet current-world requirements.
• Can't think outside the box: Even though we are making machines smarter with AI, they still cannot work outside the box; a robot will only do the work for which it is trained or programmed.
• No feelings and emotions: An AI machine can be an outstanding performer, but it does not have feelings, so it cannot form any emotional attachment with humans and may sometimes be harmful to users if proper care is not taken.
• Increased dependency on machines: As technology advances, people are becoming more dependent on devices and are losing some of their mental capabilities.
• No original creativity: Humans are creative and can imagine new ideas, but AI machines cannot match this power of human intelligence; they cannot be truly creative and imaginative.

❖ Applications of AI
Artificial Intelligence has various applications in today's society. It is becoming essential for our time because it can solve complex problems in an efficient way across multiple industries, such as healthcare, entertainment, finance, and education. AI is making our daily life more comfortable and faster.
Following are some sectors where Artificial Intelligence is applied:

1. AI in Astronomy
Artificial Intelligence can be very useful for solving complex problems about the universe. AI technology can help in understanding the universe: how it works, its origin, and so on.
2. AI in Healthcare
In the last five to ten years, AI has become more advantageous for the healthcare industry and is going to have a significant impact on it.
Healthcare industries are applying AI to make better and faster diagnoses than humans. AI can help doctors with diagnoses and can warn when a patient is worsening, so that medical help can reach the patient before hospitalization.
3. AI in Gaming
AI can be used for gaming. AI machines can play strategic games like chess, where the machine needs to think about a large number of possible moves.
4. AI in Finance
AI and the finance industry are a great match for each other. The finance industry is implementing automation, chatbots, adaptive intelligence, algorithmic trading, and machine learning in financial processes.
5. AI in Data Security
The security of data is crucial for every company, and cyber-attacks are growing very rapidly in the digital world. AI can be used to make your data safer and more secure. Some examples, such as the AEG bot and the AI2 platform, are used to detect software bugs and cyber-attacks more effectively.
6. AI in Social Media
Social media sites such as Facebook, Twitter, and Snapchat contain billions of user profiles, which need to be stored and managed in a very efficient way. AI can organize and manage massive amounts of data, and it can analyze that data to identify the latest trends, hashtags, and the requirements of different users.
7. AI in Travel & Transport
AI is becoming highly in demand in the travel industry. AI is capable of doing various travel-related tasks, from making travel arrangements to suggesting hotels, flights, and the best routes to customers. Travel industries are using AI-powered chatbots which can interact with customers in a human-like way for better and faster responses.
8. AI in Automotive Industry
Some automotive companies are using AI to provide virtual assistants to their users for better performance; for example, Tesla has introduced TeslaBot, an intelligent virtual assistant.
Various companies are currently working on developing self-driving cars, which can make your journey safer and more secure.
9. AI in Robotics:
Artificial Intelligence has a remarkable role in robotics. Usually, general robots are programmed to perform some repetitive task, but with the help of AI we can create intelligent robots which can perform tasks from their own experience without being pre-programmed.
Humanoid robots are the best examples of AI in robotics; recently, the intelligent humanoid robots named Erica and Sophia have been developed, and they can talk and behave like humans.
10. AI in Entertainment
We currently use AI-based applications in our daily life through entertainment services such as Netflix and Amazon. With the help of ML/AI algorithms, these services show recommendations for programs and shows.
11. AI in Agriculture
Agriculture is an area which requires various resources, labour, money, and time for the best result. Nowadays agriculture is becoming digital, and AI is emerging in this field. Agriculture is applying AI in the form of agricultural robotics, soil and crop monitoring, and predictive analysis. AI in agriculture can be very helpful for farmers.
12. AI in E-commerce
AI is providing a competitive edge to the e-commerce industry, and it is increasingly in demand in e-commerce business. AI helps shoppers discover associated products in their recommended size, colour, or even brand.
13. AI in Education:
AI can automate grading so that tutors have more time to teach. An AI chatbot can communicate with students as a teaching assistant.
In the future, AI could work as a personal virtual tutor for students, easily accessible at any time and any place.

❖ History of Artificial Intelligence


Artificial Intelligence is not a new word and not a new technology for researchers. This technology is much older than you might imagine; there are even myths of mechanical men in ancient Greek and Egyptian stories. Following are some milestones in the history of AI, tracing the journey from AI's beginnings to its development today.
❖ Maturation of Artificial Intelligence (1943-1952)
• Year 1943: The first work which is now recognized as AI was done by Warren McCulloch and Walter Pitts in 1943. They proposed a model of artificial neurons.
• Year 1949: Donald Hebb demonstrated an updating rule for modifying the connection strength between neurons. His rule is now called Hebbian learning.
• Year 1950: Alan Turing, an English mathematician, pioneered machine learning in 1950. Turing published "Computing Machinery and Intelligence", in which he proposed a test that checks a machine's ability to exhibit intelligent behaviour equivalent to human intelligence, now called the Turing test.

❖ The birth of Artificial Intelligence (1952-1956)


• Year 1955: Allen Newell and Herbert A. Simon created the first artificial intelligence program, which was named the "Logic Theorist". This program proved 38 of 52 mathematics theorems and found new and more elegant proofs for some of them.
• Year 1956: The term "Artificial Intelligence" was first adopted by the American computer scientist John McCarthy at the Dartmouth Conference. For the first time, AI was coined as an academic field.
At that time, high-level computer languages such as FORTRAN, LISP, and COBOL were invented, and enthusiasm for AI was very high.

❖ The golden years-Early enthusiasm (1956-1974)


• Year 1966: Researchers emphasized developing algorithms which can solve mathematical problems. Joseph Weizenbaum created the first chatbot in 1966, which was named ELIZA.
• Year 1972: The first intelligent humanoid robot, named WABOT-1, was built in Japan.

❖ The first AI winter (1974-1980)


• The period between 1974 and 1980 was the first AI winter. An AI winter refers to a time period in which computer scientists dealt with a severe shortage of government funding for AI research.
• During AI winters, public interest in artificial intelligence decreased.

❖ A boom of AI (1980-1987)
• Year 1980: After the AI winter, AI came back with "expert systems". Expert systems were programs that emulate the decision-making ability of a human expert.
• In the year 1980, the first national conference of the American Association for Artificial Intelligence was held at Stanford University.

❖ The second AI winter (1987-1993)


• The period between 1987 and 1993 was the second AI winter.
• Again, investors and governments stopped funding AI research, due to high costs but inefficient results. Even successful expert systems such as XCON proved extremely expensive to maintain.

❖ The emergence of intelligent agents (1993-2011)


• Year 1997: In 1997, IBM's Deep Blue beat the world chess champion, Garry Kasparov, and became the first computer to beat a world chess champion.
• Year 2002: For the first time, AI entered the home in the form of Roomba, a vacuum cleaner.
• Year 2006: AI came into the business world by 2006, when companies like Facebook, Twitter, and Netflix started using AI.

❖ Deep learning, big data and artificial general intelligence (2011-present)


• Year 2011: In 2011, IBM's Watson won Jeopardy!, a quiz show in which it had to answer complex questions as well as riddles. Watson proved that it could understand natural language and answer tricky questions quickly.
• Year 2012: Google launched an Android app feature, "Google Now", which could provide information to the user as predictions.
• Year 2014: In 2014, the chatbot "Eugene Goostman" won a competition based on the famous "Turing test".
• Year 2018: IBM's "Project Debater" debated complex topics with two master debaters and performed extremely well.
• Google demonstrated an AI program, "Duplex", a virtual assistant which made a hairdresser appointment over the phone, and the person on the other end did not notice that she was talking with a machine.
Now AI has developed to a remarkable level. The concepts of deep learning, big data, and data science are booming. Nowadays, companies like Google, Facebook, IBM, and Amazon are working with AI and creating amazing devices. The future of Artificial Intelligence is inspiring and will bring high intelligence.

❖ Agents in Artificial Intelligence


In artificial intelligence, an agent is a computer program or system that is designed to perceive
its environment, make decisions and take actions to achieve a specific goal or set of goals. The
agent operates autonomously, meaning it is not directly controlled by a human operator.
Artificial intelligence is defined as the study of rational agents. A rational agent could be
anything that makes decisions, such as a person, firm, machine, or software. It carries out an
action with the best outcome after considering past and current percepts (agent’s perceptual
inputs at a given instance). An AI system is composed of an agent and its environment. The
agents act in their environment. The environment may contain other agents.
An agent is anything that can be viewed as:
• Perceiving its environment through sensors and
• Acting upon that environment through actuators

Fig. Interaction of Agents with the Environment


Autonomous Agents: An agent that is not under the immediate control of humans.
▪ An agent whose behaviour depends completely on built-in knowledge lacks autonomy.
▪ An agent whose behaviour is determined only by its own experience has complete autonomy.
o With no experience, however, this requires too much random activity!

❖ Structure of an AI Agent
To understand the structure of Intelligent Agents, we should be familiar with Architecture and
Agent programs. Architecture is the machinery that the agent executes on. It is a device with
sensors and actuators, for example, a robotic car, a camera, and a PC. An agent program is an
implementation of an agent function. An agent function is a map from the percept sequence
(history of all that an agent has perceived to date) to an action.
Agent = Architecture + Agent Program
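The agent-function idea above (a map from the percept sequence to an action) can be sketched as a small agent program. The table, percepts, and action names below are hypothetical illustrations, not part of any real library:

```python
class TableDrivenAgent:
    """Agent program: maps the percept sequence (history) to an action."""

    def __init__(self, table):
        self.table = table          # maps percept-sequence tuples to actions
        self.percepts = []          # percept sequence perceived so far

    def program(self, percept):
        self.percepts.append(percept)
        # Look up the full history; fall back to a default action.
        return self.table.get(tuple(self.percepts), "NoOp")

# Usage: a toy vacuum world with two locations, A and B.
table = {
    (("A", "Dirty"),): "Suck",
    (("A", "Clean"),): "Right",
    (("A", "Clean"), ("B", "Dirty")): "Suck",
}
agent = TableDrivenAgent(table)
print(agent.program(("A", "Clean")))   # Right
print(agent.program(("B", "Dirty")))   # Suck
```

The table-driven form is impractical for real problems (the table grows with every possible history), which is exactly why the agent types below compress it into rules, models, goals, or utilities.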
There are many examples of agents in artificial intelligence. Here are a few:
• Intelligent personal assistants: These are agents that are designed to help users with
various tasks, such as scheduling appointments, sending messages, and setting
reminders. Examples of intelligent personal assistants include Siri, Alexa, and Google
Assistant.
• Autonomous robots: These are agents that are designed to operate autonomously in the
physical world. They can perform tasks such as cleaning, sorting, and delivering goods.
Examples of autonomous robots include the Roomba vacuum cleaner and the Amazon
delivery robot.
• Gaming agents: These are agents that are designed to play games, either against human
opponents or other agents. Examples of gaming agents include chess-playing agents
and poker-playing agents.
• Fraud detection agents: These are agents that are designed to detect fraudulent
behaviour in financial transactions. They can analyse patterns of behaviour to identify
suspicious activity and alert authorities. Examples of fraud detection agents include
those used by banks and credit card companies.
• Traffic management agents: These are agents that are designed to manage traffic flow
in cities. They can monitor traffic patterns, adjust traffic lights, and reroute vehicles to
minimize congestion. Examples of traffic management agents include those used in
smart cities around the world.
• A software agent has keystrokes, file contents, and received network packets acting as sensors, and screen displays, files, and sent network packets acting as actuators.
• A human agent has eyes, ears, and other organs which act as sensors, and hands, legs, mouth, and other body parts which act as actuators.
• A robotic agent has cameras and infrared range finders which act as sensors, and various motors which act as actuators.

Fig. Characteristics of an Agent


❖ Types of Agents
Agents can be grouped into the following classes based on their degree of perceived intelligence and capability:
• Simple Reflex Agents
• Model-Based Reflex Agents
• Goal-Based Agents
• Utility-Based Agents
• Learning Agents
• Multi-Agent Systems

Simple Reflex Agents


Simple reflex agents ignore the rest of the percept history and act only on the basis of the
current percept. Percept history is the history of all that an agent has perceived to date. The
agent function is based on the condition-action rule. A condition-action rule is a rule that maps
a state i.e., a condition to an action. If the condition is true, then the action is taken, else not.
This agent function only succeeds when the environment is fully observable. For simple reflex
agents operating in partially observable environments, infinite loops are often unavoidable. It
may be possible to escape from infinite loops if the agent can randomize its actions.
Problems with Simple reflex agents are:
• Very limited intelligence.
• No knowledge of non-perceptual parts of the state.
• Usually too big to generate and store.
• If there occurs any change in the environment, then the collection of rules needs to be
updated.

Fig. Simple Reflex Agents
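The condition-action rules described above can be sketched for a hypothetical two-location vacuum world; the function acts only on the current percept and ignores the percept history entirely:

```python
def simple_reflex_vacuum_agent(percept):
    """Condition-action rules on the current percept only."""
    location, status = percept
    if status == "Dirty":      # condition true -> take the action
        return "Suck"
    elif location == "A":
        return "Right"
    else:
        return "Left"

print(simple_reflex_vacuum_agent(("A", "Dirty")))  # Suck
print(simple_reflex_vacuum_agent(("B", "Clean")))  # Left
```

Note that because the agent keeps no state, in a partially observable version of this world it could shuttle between A and B forever, illustrating the infinite-loop problem mentioned above.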


Model-Based Reflex Agents
It works by finding a rule whose condition matches the current situation. A model-based agent
can handle partially observable environments by the use of a model about the world. The agent
has to keep track of the internal state which is adjusted by each percept and that depends on the
percept history. The current state is stored inside the agent which maintains some kind of
structure describing the part of the world which cannot be seen.
Updating the state requires information about:
• How the world evolves independently of the agent.
• How the agent's actions affect the world.

Fig. Model-Based Reflex Agents
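The internal-state idea can be sketched by extending the vacuum example (hypothetical names): each percept updates a model of the world, and the stored state lets the agent act on parts of the world it cannot currently see:

```python
class ModelBasedVacuumAgent:
    def __init__(self):
        # Internal model: last known status of each location.
        self.model = {"A": None, "B": None}

    def program(self, percept):
        location, status = percept
        self.model[location] = status          # update internal state
        if status == "Dirty":
            return "Suck"
        # The model remembers the unseen square: stop once both are clean.
        if self.model["A"] == self.model["B"] == "Clean":
            return "NoOp"
        return "Right" if location == "A" else "Left"

agent = ModelBasedVacuumAgent()
print(agent.program(("A", "Clean")))  # Right
print(agent.program(("B", "Clean")))  # NoOp
```

Unlike the simple reflex version, this agent halts once its model says the whole (partially observable) world is clean.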


Goal-Based Agents
These kinds of agents take decisions based on how far they are currently from their goal
(description of desirable situations). Their every action is intended to reduce their distance from
the goal. This allows the agent a way to choose among multiple possibilities, selecting the one
which reaches a goal state. The knowledge that supports its decisions is represented explicitly
and can be modified, which makes these agents more flexible. They usually require search and
planning. The goal-based agent’s behavior can easily be changed.
Fig. Goal-Based Agents

Utility-Based Agents
Utility-based agents choose actions based on a preference (utility) for each state. When there are multiple possible alternatives, utility-based agents are used to decide which one is best. Sometimes achieving the desired goal is not enough: we may look for a quicker, safer, or cheaper trip to reach a destination. Agent "happiness" should be taken into consideration, and utility describes how "happy" the agent is. Because of the uncertainty in the world, a utility agent chooses the action that maximizes the expected utility. A utility function maps a state onto a real number which describes the associated degree of happiness.

Fig. Utility-Based Agents
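The expected-utility rule above can be shown with a toy trip-planning choice. The action names, probabilities, and utilities are made-up numbers for illustration:

```python
def expected_utility(action_outcomes):
    # action_outcomes: list of (probability, utility) pairs for one action
    return sum(p * u for p, u in action_outcomes)

actions = {
    "fast_route": [(0.7, 10), (0.3, -20)],   # quick, but risky if it fails
    "safe_route": [(1.0, 4)],                # slower, but certain
}
# Choose the action that maximizes expected utility:
# EU(fast_route) = 0.7*10 + 0.3*(-20) = 1.0; EU(safe_route) = 4.0
best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)  # safe_route
```

Even though the fast route has the single best outcome (utility 10), the uncertainty drags its expected utility below the safe route's, which is exactly the trade-off the text describes.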

Learning Agent
A learning agent in AI is an agent that can learn from its past experiences; that is, it has learning capabilities. It starts acting with basic knowledge and then adapts automatically through learning. A learning agent has mainly four conceptual components, which are:
1. Learning element: It is responsible for making improvements by learning from the
environment.
2. Critic: The learning element takes feedback from critics which describes how well the
agent is doing with respect to a fixed performance standard.
3. Performance element: It is responsible for selecting external action.
4. Problem Generator: This component is responsible for suggesting actions that will lead
to new and informative experiences.

Fig. Learning Agent

Multi-Agent Systems
These agents interact with other agents to achieve a common goal. They may have to coordinate
their actions and communicate with each other to achieve their objective.
A multi-agent system (MAS) is a system composed of multiple interacting agents that are
designed to work together to achieve a common goal. These agents may be autonomous or
semi-autonomous and are capable of perceiving their environment, making decisions, and
taking action to achieve the common objective.
MAS can be used in a variety of applications, including transportation systems, robotics, and
social networks. They can help improve efficiency, reduce costs, and increase flexibility in
complex systems. MAS can be classified into different types based on their characteristics,
such as whether the agents have the same or different goals, whether the agents are cooperative
or competitive, and whether the agents are homogeneous or heterogeneous.
In a homogeneous MAS, all the agents have the same capabilities, goals, and behaviours.
In contrast, in a heterogeneous MAS, the agents have different capabilities, goals, and
behaviours.
This can make coordination more challenging but can also lead to more flexible and robust
systems.
Cooperative MAS involves agents working together to achieve a common goal, while
competitive MAS involves agents working against each other to achieve their own goals. In
some cases, MAS can also involve both cooperative and competitive behaviour, where agents
must balance their own interests with the interests of the group.
MAS can be implemented using different techniques, such as game theory, machine learning,
and agent-based modelling. Game theory is used to analyse strategic interactions between
agents and predict their behaviour. Machine learning is used to train agents to improve their
decision-making capabilities over time. Agent-based modelling is used to simulate complex
systems and study the interactions between agents.
Overall, multi-agent systems are a powerful tool in artificial intelligence that can help solve
complex problems and improve efficiency in a variety of applications.

❖ Problem Solving Agents


A goal-based agent whose goal is to solve a particular problem
✓ Define problem
✓ Identify solution states - Goal
✓ Task: Find sequence of actions that will allow the agent to go from current state to goal
state - Search
✓ Execute action sequence that the search returns as solution

❖ Well-defined Problems
A problem can be defined formally by four components:
▪ The initial state that the agent starts in.
▪ A description of the possible actions available to the agent.
– Commonly done using a successor function. Given a particular state x,
SUCCESSOR-FN(x) returns a set of (action, successor) ordered pairs.
– The initial state and successor function implicitly define the state space of the
problem - the set of all states reachable from the initial state.
▪ The goal test determines whether a given state is a goal state.
– Explicit set of possible goal states or specified by an abstract property
▪ A path cost function that assigns a numeric cost to each path.
– The agent chooses a cost function that reflects its own performance measure
– The agent attempts to minimize the cost function
❖ Problem Formulation
A solution to a problem is a path from the initial state to a goal state.
• Solution quality is measured by the path cost.
• An optimal solution has the lowest path cost among all solutions.
• Abstraction: the process of removing detail from a representation.
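The four components (initial state, successor function, goal test, path cost) can be sketched as a small problem class. The route-finding graph and class name below are hypothetical examples:

```python
class RouteProblem:
    def __init__(self, graph, initial, goal):
        self.graph = graph       # {state: {successor: step_cost}}
        self.initial = initial   # the state the agent starts in
        self.goal = goal

    def successors(self, state):
        # SUCCESSOR-FN(x): set of (action, successor) pairs; here the
        # action is simply "go <neighbour>".
        return [(f"go {s}", s) for s in self.graph[state]]

    def goal_test(self, state):
        return state == self.goal

    def path_cost(self, path):
        # Additive path cost: sum of the step costs along the path.
        return sum(self.graph[a][b] for a, b in zip(path, path[1:]))

graph = {"S": {"A": 2, "B": 5}, "A": {"G": 4}, "B": {"G": 1}, "G": {}}
problem = RouteProblem(graph, "S", "G")
print(problem.successors("S"))             # [('go A', 'A'), ('go B', 'B')]
print(problem.path_cost(["S", "A", "G"]))  # 6
```

Here S-A-G costs 6 while S-B-G costs 6 as well, so both are optimal solutions; the search algorithms in the following sections take exactly this kind of problem object as input.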

❖ Search Space Definitions


State
– A description of a possible state of the world.
– Includes all features of the world that are pertinent to the problem.

Initial State
– Description of all pertinent aspects of the state in which the agent starts the search.

Goal Test
– Conditions the agent is trying to meet.

Goal State
– Any state which meets the goal condition.

Problem Formulation
– Describe a general problem as a search problem.

Solution
– Sequence of actions that transitions the world from the initial state to a goal state.

Solution cost (additive)


– Sum of the costs of the operators
– Alternative: sum of distances, number of steps, etc.

Search
– Process of looking for a solution.
– A search algorithm takes a problem as input and returns a solution.
– We are searching through a space of possible states.
Execution
– Process of executing the sequence of actions (the solution).

❖ Searching for Solutions
The problem thus defined can be solved by searching the state space.
• Search for goal states in a search tree generated from the initial state using the successor function.
• Search strategy: the choice of which action to take in order to continue the search for the goal state
– Or, which node to expand next in the collection of nodes that have been generated (the fringe)
• Information in a search node: state; parent node; action; path cost; depth.
• Branching factor: the maximum number of successor nodes of any node.
• Effectiveness of a search can be determined by:
– Search cost: time taken to reach the goal state
– Total cost

❖ Searching for Solutions: Visualization


• States are nodes.
• Actions are edges.
• Initial state is root
• Solution is path from root to goal node
• Edges sometimes have associated costs
• States resulting from operator are children

❖ Uninformed search strategies


Uninformed search is a class of general-purpose search algorithms which operate in a brute-force way. Uninformed search algorithms have no additional information about the state or search space other than how to traverse the tree, so uninformed search is also called blind search.
Following are the various types of uninformed search algorithms:
1. Breadth-first Search
2. Depth-first Search
3. Depth-limited Search
4. Iterative deepening depth-first search
5. Uniform cost search
6. Bidirectional Search
1. Breadth-first Search:
• Breadth-first search is the most common search strategy for traversing a tree or graph. This algorithm searches breadthwise in a tree or graph, so it is called breadth-first search.
• The BFS algorithm starts searching from the root node of the tree and expands all successor nodes at the current level before moving to the nodes of the next level.
• The breadth-first search algorithm is an example of a general graph-search algorithm.
• Breadth-first search is implemented using a FIFO queue data structure.
Advantages:
• BFS will provide a solution if any solution exists.
• If there is more than one solution for a given problem, then BFS will provide the minimal solution, i.e., the one requiring the least number of steps.
Disadvantages:
• It requires lots of memory since each level of the tree must be saved into memory to
expand the next level.
• BFS needs lots of time if the solution is far away from the root node.
Example:
In the below tree structure, we have shown the traversal of the tree using the BFS algorithm from the root node S to the goal node K. The BFS algorithm traverses in layers, so it will follow the path shown by the dotted arrow, and the traversed path will be:

S---> A--->B---->C--->D---->G--->H--->E---->F---->I---->K
Time Complexity: The time complexity of the BFS algorithm is given by the number of nodes traversed in BFS until the shallowest goal node, where d = depth of the shallowest solution and b = branching factor (number of successors of every state):
T(b) = 1 + b + b^2 + b^3 + ....... + b^d = O(b^d)
Space Complexity: The space complexity of the BFS algorithm is given by the memory size of the frontier, which is O(b^d).
Completeness: BFS is complete, which means that if the shallowest goal node is at some finite depth, then BFS will find a solution.
Optimality: BFS is optimal if the path cost is a non-decreasing function of the depth of the node.
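BFS with a FIFO queue can be sketched as follows; the graph below is a small hypothetical example, not the tree from the figure:

```python
from collections import deque

def bfs(graph, start, goal):
    frontier = deque([[start]])          # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()        # expand the shallowest node first
        node = path[-1]
        if node == goal:
            return path
        for successor in graph.get(node, []):
            if successor not in visited:
                visited.add(successor)
                frontier.append(path + [successor])
    return None                          # no solution exists

graph = {"S": ["A", "B"], "A": ["C"], "B": ["D"], "C": [], "D": ["G"]}
print(bfs(graph, "S", "G"))  # ['S', 'B', 'D', 'G']
```

Because whole levels are kept in the queue, the memory use grows as O(b^d), matching the space complexity noted above.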

More Examples:
Problem 1 Solution

Path Traversed = [A – B – C- D – E – F- G – H]

Problem 2 Solution
Path Traversed = [S - A – B – C- D – G – H - E – F-
I-K]
2. Depth-first Search
• Depth-first search is a recursive algorithm for traversing a tree or graph data structure.
• It is called the depth-first search because it starts from the root node and follows each
path to its greatest depth node before moving to the next path.
• DFS uses a stack data structure for its implementation.
• The process of the DFS algorithm is similar to the BFS algorithm.
Advantages:
• DFS requires much less memory, as it only needs to store a stack of the nodes on the path from the root node to the current node.
• It takes less time to reach the goal node than the BFS algorithm (if it traverses the right path).
Disadvantages:
• There is the possibility that many states keep re-occurring, and there is no guarantee of finding a solution.
• The DFS algorithm goes deep down into the search and may sometimes go into an infinite loop.

Example:
In the below search tree, we have shown the flow of depth-first search, and it will follow the order:
Root node ---> Left node ---> Right node.
It will start searching from root node S and traverse A, then B, then D and E; after traversing E, it will backtrack the tree, as E has no other successors and the goal node has not yet been found. After backtracking, it will traverse node C and then G, and here it will terminate, as it has found the goal node.
Completeness: The DFS algorithm is complete within a finite state space, as it will expand every node within a limited search tree.
Time Complexity: The time complexity of DFS is equivalent to the number of nodes traversed by the algorithm. It is given by:
T(b) = 1 + b + b^2 + ......... + b^m = O(b^m)
where m = maximum depth of any node, which can be much larger than d (the shallowest solution depth).
Space Complexity: The DFS algorithm needs to store only a single path from the root node, hence the space complexity of DFS is equivalent to the size of the fringe set, which is O(bm).
Optimality: The DFS algorithm is non-optimal, as it may generate a large number of steps or a high cost to reach the goal node.

Example:
Problem 1: Solution

Path Traversed = [A - B – D – G - E – C – F - H]

Problem 2: Solution

Path Traversed = [A - B – D – G - E – C – F - H]
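The recursive DFS procedure described above can be sketched in Python. The adjacency list below is a small hypothetical tree, not one of the figures from these notes.

```python
def dfs(graph, node, goal, visited=None):
    """Recursive depth-first search; returns a path to goal, or None."""
    if visited is None:
        visited = set()
    visited.add(node)                      # avoid re-occurring states
    if node == goal:
        return [node]
    for successor in graph.get(node, []):
        if successor not in visited:
            path = dfs(graph, successor, goal, visited)
            if path is not None:
                return [node] + path       # prepend while unwinding
    return None

# Hypothetical tree: S expands to A and B; the goal G sits under B.
graph = {'S': ['A', 'B'], 'A': ['D', 'E'], 'B': ['C', 'G']}
print(dfs(graph, 'S', 'G'))  # ['S', 'B', 'G'] after backtracking out of A's subtree
```

The visited set here addresses the re-occurring-states drawback mentioned above.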
3. Depth-Limited Search Algorithm:
A depth-limited search algorithm is similar to depth-first search with a predetermined limit.
Depth-limited search can solve the drawback of the infinite path in depth-first search. In
this algorithm, the node at the depth limit is treated as if it has no further successors.
Depth-limited search can be terminated with two conditions of failure:
• Standard failure value: It indicates that the problem does not have any solution.
• Cutoff failure value: It indicates that there is no solution for the problem within the given depth limit.
Advantages:
Depth-limited search is Memory efficient.

Disadvantages:
o Depth-limited search also has a disadvantage of incompleteness.
o It may not be optimal if the problem has more than one solution.

Example:

Completeness: DLS search algorithm is complete if the solution is above the depth-limit.
Time Complexity: Time complexity of DLS algorithm is O(b^ℓ).
Space Complexity: Space complexity of DLS algorithm is O(b×ℓ).
Optimal: Depth-limited search can be viewed as a special case of DFS, and it is also not
optimal even if ℓ>d.
Problem Solution

For depth (d) =2;


Path Traversed = [X - A – C – D - B – I – J]
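The two failure conditions can be made concrete in a Python sketch. The tree below is reconstructed from the traversal order in the problem above (X expanding to A and B, and so on), so its exact structure is an assumption.

```python
def depth_limited_search(graph, node, goal, limit):
    """DFS that treats nodes at the depth limit as having no successors.
    Returns a path, 'cutoff' (cutoff failure), or None (standard failure)."""
    if node == goal:
        return [node]
    if limit == 0:
        return 'cutoff'                    # depth limit reached
    cutoff_occurred = False
    for successor in graph.get(node, []):
        result = depth_limited_search(graph, successor, goal, limit - 1)
        if result == 'cutoff':
            cutoff_occurred = True
        elif result is not None:
            return [node] + result
    return 'cutoff' if cutoff_occurred else None

graph = {'X': ['A', 'B'], 'A': ['C', 'D'], 'B': ['I', 'J']}
print(depth_limited_search(graph, 'X', 'J', 2))  # ['X', 'B', 'J']
print(depth_limited_search(graph, 'X', 'J', 1))  # 'cutoff': goal lies below the limit
```

Distinguishing 'cutoff' from None lets a caller know whether raising the limit could still help.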

4. Uniform-cost Search Algorithm:


Uniform-cost search is a searching algorithm used for traversing a weighted tree or graph. This
algorithm comes into play when a different cost is available for each edge. The primary goal
of the uniform-cost search is to find a path to the goal node which has the lowest cumulative
cost. Uniform-cost search expands nodes according to their path costs from the root node. It
can be used to solve any graph/tree where the optimal cost is in demand. A uniform-cost search
algorithm is implemented by the priority queue. It gives maximum priority to the lowest
cumulative cost. Uniform cost search is equivalent to BFS algorithm if the path cost of all edges
is the same.
Advantages:
• Uniform cost search is optimal because at every state the path with the least cost is
chosen.
Disadvantages:
• It does not care about the number of steps involved in searching and is only concerned
with path cost, due to which this algorithm may get stuck in an infinite loop.
Example:
Completeness:
Uniform-cost search is complete: if there is a solution, UCS will find it.

Time Complexity:
Let C* be the cost of the optimal solution, and ε the smallest step cost toward the goal node. Then
the number of steps is 1 + ⌊C*/ε⌋. Here we have taken +1, as we start from state 0 and end at
C*/ε.
Hence, the worst-case time complexity of Uniform-cost search is O(b^(1 + ⌊C*/ε⌋)).

Space Complexity:
The same logic holds for space complexity, so the worst-case space complexity of Uniform-cost
search is O(b^(1 + ⌊C*/ε⌋)).
Optimal:
Uniform-cost search is always optimal as it only selects a path with the lowest path cost.

Problem 1 Solution

For G is the goal node.


Path Traversed = [A – C – B - E – G]
Total cost =12

For H is the goal node.


Path Traversed = [A – C – B - E – G -
E – F – E – D – H]
Total cost =12
Steps in Uniform Cost Search Algorithm
This algorithm assumes that all the operators have a cost.
1. Initialize: Set OPEN = {s}, CLOSED = {}, set C(s) = 0.
2. Fail: If OPEN = {}, terminate and fail.
3. Select: Select the minimum cost state n from OPEN and save n to CLOSED.
4. Terminate: If n ∈ G, terminate with success.
5. Expand: Generate the successors of n using O.
For each successor m:
If m ∉ [OPEN ∪ CLOSED]
set C(m) = C(n) + C(n, m) and insert m in OPEN.
If m ∈ [OPEN ∪ CLOSED]
set C(m) = min{C(m), C(n) + C(n, m)};
if C(m) has decreased and m ∈ CLOSED, move it to OPEN.
6. Go to Step 2.
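The steps above can be sketched in Python using a priority queue (heapq) as the OPEN list; the weighted graph below is a hypothetical example.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """UCS over a weighted graph given as {node: [(successor, edge_cost), ...]}.
    Returns (total_cost, path) or None."""
    open_list = [(0, start, [start])]      # priority queue ordered by path cost
    closed = set()
    while open_list:
        cost, node, path = heapq.heappop(open_list)
        if node == goal:
            return cost, path
        if node in closed:
            continue                       # stale duplicate entry
        closed.add(node)
        for successor, edge_cost in graph.get(node, []):
            if successor not in closed:
                heapq.heappush(open_list,
                               (cost + edge_cost, successor, path + [successor]))
    return None

# Hypothetical weighted graph (not the figure from the notes).
graph = {'S': [('A', 1), ('B', 5)], 'A': [('B', 2), ('G', 9)], 'B': [('G', 2)]}
print(uniform_cost_search(graph, 'S', 'G'))  # (5, ['S', 'A', 'B', 'G'])
```

Leaving stale duplicates in the heap and skipping them on pop is a common simplification of the "decrease and move to OPEN" bookkeeping in Step 5.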

5. Iterative deepening depth-first Search:


The iterative deepening algorithm is a combination of DFS and BFS algorithms. This search
algorithm finds out the best depth limit and does it by gradually increasing the limit until a goal
is found.
This algorithm performs depth-first search up to a certain “depth limit”, and it keeps increasing
the depth limit after each iteration until the goal node is found. Generally, this is the preferred
uninformed search method when there is a large search space and the depth of the solution is
not known.
This Search algorithm combines the benefits of Breadth-first search's fast search and depth-
first search's memory efficiency.

Advantages:
• It combines the benefits of BFS and DFS search algorithm in terms of fast search and
memory efficiency.
Disadvantages:
• The main drawback of IDDFS is that it repeats all the work of the previous phase.
Example:
The following tree structure shows the iterative deepening depth-first search. The IDDFS
algorithm performs iterations until it finds the goal node. The iterations performed by the
algorithm are given as:
1'st Iteration-----> A
2'nd Iteration----> A, B, C
3'rd Iteration------>A, B, D, E, C, F, G
4'th Iteration------>A, B, D, H, I, E, C, F, K, G
In the fourth iteration, the algorithm will find the goal node.

Completeness:
This algorithm is complete if the branching factor is finite.
Time Complexity:
Let's suppose b is the branching factor and the depth is d; then the worst-case time complexity is
O(b^d).
Space Complexity:
The space complexity of IDDFS will be O(b×d).
Optimal:
IDDFS algorithm is optimal if path cost is a non-decreasing function of the depth of the node.
Problem Solution

Starting node S, Goal Node G


Using DFS
1st Iteration, d = 0, [S] – starting node
2nd Iteration, d = 1, [S – A - C]
3rd Iteration, d = 2, [S – A – D – B – C – E - G]

Using IDDFS
Performs a depth-limited search by gradually increasing the depth_limit until the goal at the
shallowest depth is found.
1. Perform Depth limited search for depth_limit = 0
2. If a solution is found, exit; else go to 3
3. Increase depth_limit by 1
4. Go back to 1
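The four-step procedure can be sketched in Python. The tree below is reconstructed from the iterations shown above, so its exact structure is an assumption.

```python
def iddfs(graph, start, goal, max_depth=50):
    """Repeated depth-limited DFS with the limit growing from 0 upward."""
    def dls(node, limit):
        if node == goal:
            return [node]
        if limit == 0:
            return None                    # treat limit nodes as leaf nodes
        for successor in graph.get(node, []):
            path = dls(successor, limit - 1)
            if path is not None:
                return [node] + path
        return None

    for depth in range(max_depth + 1):     # steps 1-4: raise the limit each round
        path = dls(start, depth)
        if path is not None:
            return path
    return None

graph = {'S': ['A', 'C'], 'A': ['D', 'B'], 'C': ['E', 'G']}
print(iddfs(graph, 'S', 'G'))  # ['S', 'C', 'G'], found once the limit reaches 2
```

The shallow levels are re-searched on every iteration, which is the repeated-work drawback noted above.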

6. Bidirectional Search Algorithm:


Bidirectional search algorithm runs two simultaneous searches, one from the initial state (called
the forward search) and the other from the goal node (called the backward search), to find the
goal node. Bidirectional search replaces one single search graph with two small subgraphs, in
which one starts the search from the initial vertex and the other starts from the goal vertex. The
search stops when these two graphs intersect each other.
Bidirectional search can use search techniques such as BFS, DFS, DLS, etc.
Advantages:
• Bidirectional search is fast.
• Bidirectional search requires less memory
Disadvantages:
• Implementation of the bidirectional search tree is difficult.
• In bidirectional search, one should know the goal state in advance.

Steps in Bidirectional search


1. Run two simultaneous searches
– one forward from the initial state
– the other backward from the goal
2. Stop when the two searches meet
– check each node before it is expanded to see if it is in the fringe of the other
search tree
3. Searching backwards is not so easy
– it requires a Predecessor(x) function that computes all states that have x as a successor.
Example:
In the below search tree, bidirectional search algorithm is applied. This algorithm divides one
graph/tree into two sub-graphs. It starts traversing from node 1 in the forward direction and
starts from goal node 16 in the backward direction.
The algorithm terminates at node 9 where two searches meet.

Completeness: Bidirectional Search is complete if we use BFS in both searches.


Time Complexity: Time complexity of bidirectional search using BFS is O(b^(d/2)), since each
search only needs to reach half of the solution depth.
Space Complexity: Space complexity of bidirectional search is O(b^(d/2)).
Optimal: Bidirectional search is Optimal.

Problem 1 Solution

For G is the goal node.


Path Traversed =
[A – B – C - D – F – E - G]
Total cost =12

Problem 2
Solution

For G is the goal node.

Path Traversed = [A – B – D – C – E – F – H – J – L – I – Y – K - G]
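A minimal Python sketch of the two-frontier idea, assuming an undirected adjacency-dict graph; the numbered graph below is hypothetical, not the figure from the notes.

```python
from collections import deque

def bidirectional_search(graph, start, goal):
    """BFS forward from start and backward from goal; stops when the two
    frontiers meet. Assumes an undirected adjacency-dict graph."""
    if start == goal:
        return [start]
    parents_f, parents_b = {start: None}, {goal: None}   # also act as visited sets
    queue_f, queue_b = deque([start]), deque([goal])

    def build_path(meet):
        path, node = [], meet
        while node is not None:                          # walk back to start
            path.append(node)
            node = parents_f[node]
        path.reverse()
        node = parents_b[meet]
        while node is not None:                          # walk forward to goal
            path.append(node)
            node = parents_b[node]
        return path

    while queue_f and queue_b:
        for queue, parents, others in ((queue_f, parents_f, parents_b),
                                       (queue_b, parents_b, parents_f)):
            if not queue:
                continue
            node = queue.popleft()
            for successor in graph[node]:
                if successor not in parents:
                    parents[successor] = node
                    if successor in others:              # the searches meet here
                        return build_path(successor)
                    queue.append(successor)
    return None

# Hypothetical undirected graph: 1-2, 1-3, 2-4, 3-4, 4-5.
graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
print(bidirectional_search(graph, 1, 5))  # [1, 2, 4, 5]
```

Each newly generated node is checked against the other search's frontier before expansion, as in Step 2 above.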

❖ Informed Search Algorithms


So far we have talked about the uninformed search algorithms, which looked through the search
space for all possible solutions of the problem without having any additional knowledge about the
search space. But an informed search algorithm has additional knowledge, such as how far we are
from the goal, the path cost, how to reach the goal node, etc. This knowledge helps agents to
explore less of the search space and find the goal node more efficiently.
The informed search algorithm is more useful for large search space. Informed search
algorithm uses the idea of heuristic, so it is also called Heuristic search.

Heuristics function: Heuristic is a function which is used in Informed Search, and it finds the
most promising path. It takes the current state of the agent as its input and produces the
estimation of how close the agent is to the goal. The heuristic method, however, might not
always give the best solution, but it is guaranteed to find a good solution in a reasonable time.
Heuristic function estimates how close a state is to the goal. It is represented by h(n), and it
calculates the cost of an optimal path between the pair of states. The value of the heuristic
function is always positive.
Admissibility of the heuristic function is given as:
h(n) <= h*(n)
Here h(n) is the heuristic (estimated) cost, and h*(n) is the actual cost. Hence the heuristic cost
should be less than or equal to the actual cost.

Pure Heuristic Search:


Pure heuristic search is the simplest form of heuristic search algorithms. It expands nodes based
on their heuristic value h(n). It maintains two lists, OPEN and CLOSED list. In the CLOSED
list, it places those nodes which have already expanded and in the OPEN list, it places nodes
which have yet not been expanded.
On each iteration, the node n with the lowest heuristic value is expanded, generating all its
successors, and n is placed in the CLOSED list. The algorithm continues until a goal state is found.
In the informed search we will discuss two main algorithms which are given below:
• Best First Search Algorithm (Greedy search)
• A* Search Algorithm

1. Best-first Search Algorithm (Greedy Search):


Greedy best-first search algorithm always selects the path which appears best at that moment.
It is the combination of depth-first search and breadth-first search algorithms, guided by a
heuristic function. Best-first search allows us to take the advantages of both
algorithms. With the help of best-first search, at each step, we can choose the most promising
node. In the best first search algorithm, we expand the node which is closest to the goal node
and the closest cost is estimated by heuristic function, i.e.
f(n) = h(n)
Where, h(n) = estimated cost from node n to the goal.
The greedy best first algorithm is implemented by the priority queue.
Best first search algorithm:
• Step 1: Place the starting node into the OPEN list.
• Step 2: If the OPEN list is empty, Stop and return failure.
• Step 3: Remove the node n, from the OPEN list which has the lowest value of h(n),
and places it in the CLOSED list.
• Step 4: Expand the node n, and generate the successors of node n.
• Step 5: Check each successor of node n, and find whether any node is a goal node or
not. If any successor node is goal node, then return success and terminate the search,
else proceed to Step 6.
• Step 6: For each successor node, algorithm checks for evaluation function f(n), and
then check if the node has been in either OPEN or CLOSED list. If the node has not
been in both lists, then add it to the OPEN list.
• Step 7: Return to Step 2.
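The steps above can be sketched in Python with a priority queue ordered by h(n); the graph and heuristic table below are hypothetical.

```python
import heapq

def greedy_best_first(graph, h, start, goal):
    """Always expand the OPEN node with the lowest heuristic value h(n)."""
    open_list = [(h[start], start, [start])]
    closed = set()
    while open_list:
        _, node, path = heapq.heappop(open_list)   # lowest h(n) first
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for successor in graph.get(node, []):
            if successor not in closed:
                heapq.heappush(open_list,
                               (h[successor], successor, path + [successor]))
    return None

# Hypothetical graph and heuristic table (not the figure from the notes).
graph = {'S': ['A', 'B'], 'A': ['E', 'F'], 'B': ['G']}
h = {'S': 10, 'A': 4, 'B': 7, 'E': 8, 'F': 2, 'G': 0}
print(greedy_best_first(graph, h, 'S', 'G'))  # ['S', 'B', 'G']
```

Note how node F (h = 2) is expanded before B even though it is a dead end: the search is guided purely by h(n), which is why greedy best-first search is not optimal.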
Advantages:
• Best first search can switch between BFS and DFS by gaining the advantages of both
the algorithms.
• This algorithm is more efficient than BFS and DFS algorithms.
Disadvantages:
• It can behave as an unguided depth-first search in the worst-case scenario.
• It can get stuck in a loop as DFS.
• This algorithm is not optimal.
Example:
Consider the below search problem, and we will traverse it using greedy best-first search. At
each iteration, each node is expanded using evaluation function f(n)=h(n), which is given in
the below table.

In this search example, we are using two lists which are OPEN and CLOSED Lists. Following
are the iteration for traversing the above example.

Expand the nodes of S and put in the CLOSED list


Initialization: Open [A, B], Closed [S]
Iteration 1: Open [A], Closed [S, B]
Iteration 2: Open [E, F, A], Closed [S, B]: Open [E, A], Closed [S, B, F]
Iteration 3: Open [I, G, E, A], Closed [S, B, F]: Open [I, E, A], Closed [S, B, F, G]
Hence the final solution path will be: S----> B----->F----> G
Time Complexity: The worst-case time complexity of Greedy best first search is O(b^m).
Space Complexity: The worst-case space complexity of Greedy best first search is O(b^m).
Where, m is the maximum depth of the search space.
Complete: Greedy best-first search is also incomplete, even if the given state space is finite.
Optimal: Greedy best first search algorithm is not optimal.

Problem 1 Solution

For G is the goal node.


Path Traversed =
[A – C - F - G]
Heuristic cost = 44

Problem 2 Solution

For G is the goal node.

Path Traversed =

[S – B - F - G]

2. A* Search Algorithm:
A* search is the most commonly known form of best-first search. It uses heuristic function
h(n), and cost to reach the node n from the start state g(n). It has combined features of UCS
and greedy best-first search, by which it solves the problem efficiently. A* search algorithm
finds the shortest path through the search space using the heuristic function. This search
algorithm expands less search tree and provides optimal result faster. A* algorithm is similar
to UCS except that it uses g(n)+h(n) instead of g(n).
In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. Hence,
we can combine both costs as follows, and this sum is called the fitness number:
f(n) = g(n) + h(n)

Algorithm of A* search:
Step1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not, if the list is empty then return failure and stops.
Step 3: Select the node from the OPEN list which has the smallest value of evaluation function
(g+h), if node n is goal node then return success and stop, otherwise
Step 4: Expand node n and generate all of its successors, and put n into the closed list. For
each successor n', check whether n' is already in the OPEN or CLOSED list, if not then
compute evaluation function for n' and place into Open list.
Step 5: Otherwise, if node n' is already in OPEN or CLOSED, then attach it to the back
pointer which reflects the lowest g(n') value.
Step 6: Return to Step 2.
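A minimal A* sketch in Python. The edge costs and heuristic values below are assumptions chosen so that the f-values match the iterations in the worked example later in this section; they are not taken from a figure.

```python
import heapq

def a_star(graph, h, start, goal):
    """A* over {node: [(successor, cost), ...]}; the OPEN list is ordered
    by f(n) = g(n) + h(n)."""
    open_list = [(h[start], 0, start, [start])]           # (f, g, node, path)
    best_g = {start: 0}
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return g, path
        for successor, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(successor, float('inf')):  # found a cheaper route
                best_g[successor] = g2
                heapq.heappush(open_list,
                               (g2 + h[successor], g2, successor, path + [successor]))
    return None

# Assumed edge costs and heuristic values.
graph = {'S': [('A', 1), ('G', 10)], 'A': [('C', 1), ('B', 4)], 'C': [('D', 3), ('G', 4)]}
h = {'S': 5, 'A': 3, 'B': 2, 'C': 2, 'D': 6, 'G': 0}
print(a_star(graph, h, 'S', 'G'))  # (6, ['S', 'A', 'C', 'G'])
```

The best_g dictionary plays the role of the back-pointer update in Step 5: a successor is re-inserted only when a cheaper g(n') is found.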
Advantages:
• A* search algorithm is better than other search algorithms.
• A* search algorithm is optimal and complete.
• This algorithm can solve very complex problems.
Disadvantages:
• It does not always produce the shortest path, as it is mostly based on heuristics and
approximation.
• A* search algorithm has some complexity issues.
• The main drawback of A* is memory requirement as it keeps all generated nodes in the
memory, so it is not practical for various large-scale problems.
Example:
In this example, we will traverse the given graph using the A* algorithm. The heuristic value
of all states is given in the below table so we will calculate the f(n) of each state using the
formula f(n)= g(n) + h(n), where g(n) is the cost to reach any node from start state.
Here we will use OPEN and CLOSED list.

Solution:

Initialization: {(S, 5)}


Iteration1: {(S--> A, 4), (S-->G, 10)}
Iteration2: {(S--> A-->C, 4), (S--> A-->B, 7), (S-->G, 10)}
Iteration3: {(S--> A-->C--->G, 6), (S--> A-->C--->D, 11), (S--> A-->B, 7), (S-->G, 10)}
Iteration 4 will give the final result, as S--->A--->C--->G it provides the optimal path with
cost 6.
Points to remember:
• A* algorithm returns the path which occurred first, and it does not search for all
remaining paths.
• The efficiency of A* algorithm depends on the quality of heuristic.
• A* algorithm expands all nodes which satisfy the condition f(n) ≤ C*, where C* is the cost of the optimal solution.
Complete: A* algorithm is complete as long as:
• Branching factor is finite.
• Cost at every action is fixed.
Optimal: A* search algorithm is optimal if it follows below two conditions:
• Admissible: the first condition requires for optimality is that h(n) should be an
admissible heuristic for A* tree search. An admissible heuristic is optimistic in nature.
• Consistency: Second required condition is consistency for only A* graph-search.
If the heuristic function is admissible, then A* tree search will always find the least cost path.
Time Complexity: The time complexity of A* search algorithm depends on heuristic function,
and the number of nodes expanded is exponential in the depth of the solution d. So, the time
complexity is O(b^d), where b is the branching factor.
Space Complexity: The space complexity of A* search algorithm is O(b^d).

Problem 1 Solution

For D is the goal node, S is the starting node.


Path Traversed = [S – A - C - D]
Cost = 9.

❖ Branch and bound


Branch and bound is one of the techniques used for problem solving. It is used for solving the
optimization problems and minimization problems. These problems are typically exponential in
terms of time complexity and may require exploring all possible permutations in worst case. The
Branch and Bound Algorithm technique solves these problems relatively quickly. If we are
given a maximization problem, then we can apply the Branch and Bound technique by simply
converting the problem into a minimization problem.
It is a way to combine the space saving of depth-first search with heuristic information. The optimal
solution is chosen among many solution paths. The main idea is to maintain the lowest-cost
path to a GOAL found so far, and its cost. It is typically used with depth-first search.
Branch: Several choices are found.
Bound: Setting bound on solution quality.
Pruning: Trimming of branches where solution quality is poor.

Problem Solution

Tree
Bound 1: 16 [S – B – C- G]

Bound 2: 13 [S – B – D - G]
Optimal path

❖ Beam Search
It is a heuristic search algorithm that explores a graph by expanding the most promising
node in a limited set. Beam search is an optimization of best-first search that reduces its
memory requirements. Best-first search is a graph search which orders all partial solutions
(states) according to some heuristic. But in beam search, only a predetermined number of
best partial solutions are kept as candidates. It is thus a greedy algorithm.
Beam search uses breadth-first search to build its search tree. At each level of the tree, it
generates all successors of the states at the current level, sorting them in increasing order
of heuristic cost. However, it only stores a predetermined number β of best states at each
level (called the beam width). Only those states are expanded next. The greater the beam
width, the fewer states are pruned. With an infinite beam width, no states are pruned and
beam search is identical to breadth-first search. The beam width bounds the memory
required to perform the search. Since a goal state could potentially be pruned, beam search
sacrifices completeness (the guarantee that an algorithm will terminate with a solution, if
one exists). Beam search is not optimal (that is, there is no guarantee that it will find the
best solution).
Example:

For beta =2, the path traversed in the direction is [A-C-F-G]
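A Python sketch of beam search keeping only the β best states per level. The graph and heuristic below are hypothetical, but chosen so that β = 2 reproduces the path [A-C-F-G] from the example.

```python
def beam_search(graph, h, start, goal, beta=2):
    """Level-by-level search keeping only the beta best states per level."""
    beam = [[start]]
    while beam:
        candidates = []
        for path in beam:
            if path[-1] == goal:
                return path
            for successor in graph.get(path[-1], []):
                candidates.append(path + [successor])
        # sort by heuristic cost and keep only the beam-width best states
        candidates.sort(key=lambda p: h[p[-1]])
        beam = candidates[:beta]
    return None

# Hypothetical graph and heuristic; with beta=2 the worst branch is pruned each level.
graph = {'A': ['B', 'C', 'D'], 'B': ['E'], 'C': ['F'], 'D': ['H'], 'F': ['G']}
h = {'A': 9, 'B': 6, 'C': 4, 'D': 5, 'E': 7, 'F': 2, 'H': 8, 'G': 0}
print(beam_search(graph, h, 'A', 'G'))  # ['A', 'C', 'F', 'G']
```

Here node B is pruned at the first level because it is the third-best candidate, illustrating how a goal reachable only through a pruned state would be lost (the completeness sacrifice described above).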

❖ Hill Climbing Algorithm in Artificial Intelligence


• Hill climbing algorithm is a local search algorithm which continuously moves in the
direction of increasing elevation/value to find the peak of the mountain or best solution
to the problem. It terminates when it reaches a peak value where no neighbor has a
higher value.
• Hill climbing algorithm is a technique which is used for optimizing the mathematical
problems. One of the widely discussed examples of Hill climbing algorithm is
Traveling-salesman Problem in which we need to minimize the distance travelled by
the salesman.
• It is also called greedy local search as it only looks to its good immediate neighbor state
and not beyond that.
• A node of hill climbing algorithm has two components which are state and value.
• Hill Climbing is mostly used when a good heuristic is available.
• In this algorithm, we don't need to maintain and handle the search tree or graph as it
only keeps a single current state.
Features of Hill Climbing:
Following are some main features of Hill Climbing Algorithm:
• Generate and Test variant: Hill Climbing is the variant of Generate and Test method.
The Generate and Test method produces feedback which helps to decide which direction
to move in the search space.
• Greedy approach: Hill-climbing algorithm search moves in the direction which
optimizes the cost.
• No backtracking: It does not backtrack the search space, as it does not remember the
previous states.
State-space Diagram for Hill Climbing:
The state-space landscape is a graphical representation of the hill-climbing algorithm, showing
a graph between the various states of the algorithm and the objective function/cost.
On Y-axis we have taken the function which can be an objective function or cost function, and
state-space on the x-axis. If the function on Y-axis is cost then, the goal of search is to find the
global minimum and local minimum. If the function of Y-axis is Objective function, then the
goal of the search is to find the global maximum and local maximum.

Different regions in the state space landscape:


Local Maximum: Local maximum is a state which is better than its neighbour states, but there
is also another state which is higher than it.
Global Maximum: Global maximum is the best possible state of state space landscape. It has
the highest value of objective function.
Current state: It is a state in a landscape diagram where an agent is currently present.
Flat local maximum: It is a flat space in the landscape where all the neighbour states of current
states have the same value.
Shoulder: It is a plateau region which has an uphill edge.
Types of Hill Climbing Algorithm:
• Simple hill Climbing:
• Steepest-Ascent hill-climbing:
• Stochastic hill Climbing:
1. Simple Hill Climbing:
Simple hill climbing is the simplest way to implement a hill climbing algorithm. It evaluates
one neighbour node state at a time and selects the first one which improves the current cost,
setting it as the current state. It checks only one successor state, and if that successor is better
than the current state, it moves there; otherwise it stays in the same state. This algorithm has the
following features:
• Less time consuming
• Less optimal solution and the solution is not guaranteed
Algorithm for Simple Hill Climbing:
• Step 1: Evaluate the initial state, if it is goal state then return success and Stop.
• Step 2: Loop Until a solution is found or there is no new operator left to apply.
• Step 3: Select and apply an operator to the current state.
• Step 4: Check new state:
o If it is goal state, then return success and quit.
o Else if it is better than the current state then assign new state as a current state.
o Else if not better than the current state, then return to step2.
• Step 5: Exit.
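The algorithm above can be sketched in Python on a toy one-dimensional objective (an assumption for illustration).

```python
def simple_hill_climbing(neighbours, value, start):
    """Move to the FIRST neighbour that improves on the current state;
    stop when no generated successor is better."""
    current = start
    while True:
        moved = False
        for candidate in neighbours(current):
            if value(candidate) > value(current):   # first improvement wins
                current = candidate
                moved = True
                break
        if not moved:
            return current                          # no better successor found

# Toy objective: maximize f(x) = -(x - 3)^2 over integer states.
value = lambda x: -(x - 3) ** 2
neighbours = lambda x: [x - 1, x + 1]
print(simple_hill_climbing(neighbours, value, 0))  # 3 (the global maximum here)
```

On this smooth toy objective the climb reaches the global maximum; on a landscape with local maxima it would stop at the first peak it finds.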

2. Steepest-Ascent hill climbing:


The steepest-Ascent algorithm is a variation of simple hill climbing algorithm. This algorithm
examines all the neighboring nodes of the current state and selects one neighbor node which is
closest to the goal state. This algorithm consumes more time as it searches for multiple
neighbors.
Algorithm for Steepest-Ascent hill climbing:
• Step 1: Evaluate the initial state, if it is goal state then return success and stop, else
make current state as initial state.
• Step 2: Loop until a solution is found or the current state does not change.
• Let SUCC be a state such that any successor of the current state will be better
than it.
• For each operator that applies to the current state:
▪ Apply the new operator and generate a new state.
▪ Evaluate the new state.
▪ If it is goal state, then return it and quit, else compare it to the SUCC.
▪ If it is better than SUCC, then set new state as SUCC.
▪ If the SUCC is better than the current state, then set current state to
SUCC.
• Step 5: Exit.
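The steepest-ascent variant differs only in examining all neighbours before moving; a sketch on the same kind of toy objective (again an assumption for illustration):

```python
def steepest_ascent(neighbours, value, start):
    """Examine ALL neighbours and move to the best one (SUCC), stopping
    when no neighbour beats the current state."""
    current = start
    while True:
        succ = max(neighbours(current), key=value)   # best successor, i.e. SUCC
        if value(succ) <= value(current):
            return current                           # no neighbour is better
        current = succ

# Toy objective: maximize f(x) = -(x - 3)^2 over integer states.
value = lambda x: -(x - 3) ** 2
neighbours = lambda x: [x - 2, x - 1, x + 1, x + 2]
print(steepest_ascent(neighbours, value, 0))  # 3
```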
3. Stochastic hill climbing:
Stochastic hill climbing does not examine all of its neighbours before moving. Rather, this search
algorithm selects one neighbour node at random and decides whether to choose it as the current
state or examine another state.

Problems in Hill Climbing Algorithm:


1. Local Maximum: A local maximum is a peak state in the landscape which is better than
each of its neighboring states, but there is another state also present which is higher than the
local maximum.
Solution: Backtracking technique can be a solution of the local maximum in state space
landscape. Create a list of the promising path so that the algorithm can backtrack the search
space and explore other paths as well.

2. Plateau: A plateau is a flat area of the search space in which all the neighbour states of the
current state contain the same value; because of this, the algorithm does not find any best direction
in which to move. A hill-climbing search might get lost in the plateau area.
Solution: The solution for the plateau is to take big steps or very little steps while searching.
Alternatively, randomly select a state which is far away from the current state, so that it is
possible for the algorithm to find a non-plateau region.
3. Ridges: A ridge is a special form of the local maximum. It has an area which is higher than
its surrounding areas, but itself has a slope, and cannot be reached in a single move.
Solution: With the use of bidirectional search, or by moving in different directions, we can
improve this problem.

❖ Memory bounded heuristic search


The problem with A* as presented is that it needs to keep track of all fringe and closed nodes.
Thus, it tends to run out of space before time.
Some solutions to problem
IDA* (Iterative Deepening A*)
Fix an ε. Modify expand-node so that it only adds nodes to the fringe that are of cost less than
the current threshold value.
Do A* search for thresholds < ε, < 2ε, < 3ε, … until you find a solution

Recursive Best-First search


- similar to recursive depth-first search.
Idea: keep expanding nodes along the “best” path, keeping track of the nodes seen so far. When we
back up over a node, we replace the cost on each node of the path with the cost of its current best
child. For example, if a backed-up value (say 7) is not the smallest f-value seen so far, we backtrack
to the smallest alternative and forget the subtree that we backed up over.
Simplified memory bounded A* (SMA*)
-Do A* until we run out of memory.
When we don’t have enough memory to add a new node to the fringe, discard the worst-cost
node from the closed or fringe set.

❖ AO* Search Strategy


Best-first search is what the AO* algorithm does. The AO* method divides any given
difficult problem into a smaller group of problems that are then resolved using the AND-
OR graph concept. AND OR graphs are specialized graphs that are used in problems that can
be divided into smaller problems. The AND side of the graph represents a set of tasks that
must be completed to achieve the main goal, while the OR side of the graph represents
different methods for accomplishing the same main goal.

The above figure is an example of a simple AND-OR graph, in which buying a car is broken down
into smaller problems or tasks that can be accomplished to achieve the main goal. One option is
to steal a car, and the other is to use your own money to purchase a car; either accomplishes the
main goal.
The AND symbol is used to indicate the AND part of the graphs, which refers to the need
that all subproblems containing the AND to be resolved before the preceding node or issue
may be finished.
The start state and the target state are already known in the knowledge-
based search strategy known as the AO* algorithm, and the best path is identified by
heuristics. The informed search technique considerably reduces the algorithm’s time
complexity. The AO* algorithm is far more effective in searching AND-OR trees than the
A* algorithm.
Working of AO* algorithm:
The evaluation function in AO* looks like this:
f(n) = g(n) + h(n)
here,
f(n) = the estimated total cost of traversal,
g(n) = the actual cost from the initial node to the current node,
h(n) = the estimated cost from the current node to the goal state.

Difference between the A* Algorithm and AO* algorithm


• A* algorithm and AO* algorithm both work on best-first search.
• They are both informed searches and work on given heuristic values.
• A* always gives the optimal solution but AO* doesn’t guarantee to give the optimal
solution.
• Once AO* finds a solution, it does not explore all possible paths, but A* explores all paths.
• When compared to the A* algorithm, the AO* algorithm uses less memory.
• Opposite to the A* algorithm, the AO* algorithm cannot go into an endless loop.

Steps in AO* algorithm


Step-1: Create an initial graph with a single node (start node).
Step-2: Traverse the graph following the current path, accumulating nodes that have not yet
been expanded or solved.
Step-3: Select any of these nodes and explore it. If it has no successors, then assign it the value
FUTILITY; else calculate f'(n) for each of its successors.
Step-4: If f'(n)=0, then mark the node as SOLVED.
Step-5: Change the value of f'(n) for the newly created node to reflect its successors by
backpropagation.
Step-6: Whenever possible use the most promising routes, if a node is marked as SOLVED
then mark the parent node as SOLVED.
Step-7: If the starting node is SOLVED or value is greater than FUTILITY then stop else
repeat from Step 2.
Example:

Solution: AO* Graph

❖ Simulated Annealing:
A hill-climbing algorithm which never makes a move towards a lower value is guaranteed to be
incomplete because it can get stuck on a local maximum. And if the algorithm applies a random
walk, moving to a randomly chosen successor, then it may be complete but not efficient. Simulated
Annealing is an algorithm which yields both efficiency and completeness.
In mechanical terms, Annealing is a process of heating a metal or glass to a high temperature and
then cooling it gradually, which allows the material to reach a low-energy crystalline state. The
same process is used in simulated annealing, in which the algorithm picks a random move,
instead of picking the best move. Simulated annealing can be used to find solutions to
optimization problems by slowly changing the values of the variables in the problem until a
solution is found. If the random move improves the state, then the algorithm follows that path.
Otherwise, it accepts the worsening move only with a probability of less than 1, or it rejects the
move and chooses another path.
The advantage of simulated annealing over other optimization methods is that it is less likely
to get stuck in a local minimum, where the solution is not the best possible but is good enough.
This is because simulated annealing allows for small changes to be made to the solution, which
means that it can escape from local minima and find the global optimum.
Simulated annealing is not a guaranteed method of finding the best solution to an optimization
problem, but it is a powerful tool that can be used to find good solutions in many cases.
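A minimal Python sketch of the acceptance rule described above: accept worse moves with probability exp(-Δ/T) under a geometric cooling schedule. The toy objective, neighbour function, and parameter values are all assumptions for illustration.

```python
import math
import random

def simulated_annealing(neighbour, cost, start, t0=10.0, cooling=0.95, steps=500):
    """Pick a random move each step; always accept improvements, and accept
    worse moves with probability exp(-delta/T), where T cools gradually."""
    current = best = start
    temperature = t0
    for _ in range(steps):
        candidate = neighbour(current)
        delta = cost(candidate) - cost(current)
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            current = candidate
        if cost(current) < cost(best):
            best = current               # remember the best state seen so far
        temperature *= cooling           # geometric cooling schedule
    return best

# Toy problem: minimize a bumpy 1-D function that has several local minima.
random.seed(0)                           # reproducible run
cost = lambda x: (x - 2) ** 2 + 3 * math.sin(5 * x)
neighbour = lambda x: x + random.uniform(-0.5, 0.5)
result = simulated_annealing(neighbour, cost, start=-5.0)
print(cost(result) < cost(-5.0))  # True: the annealed state improves on the start
```

Early on, the high temperature lets the search climb out of the local minima near the start; as T shrinks, only downhill moves are accepted and the search settles into a basin.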
Benefits of simulated annealing
Simulated annealing is a powerful tool for solving optimization problems. It is especially well-
suited for problems that are difficult to solve using traditional methods, such as those with
many local optima.
Simulated annealing works by starting with a random solution and then slowly improving it
over time. The key is to not get stuck in a local optimum, which can happen if the search moves
too slowly.
The benefits of using simulated annealing include:
1. The ability to find global optima.
2. The ability to escape from local optima.
3. The ability to handle constraints.
4. The ability to handle noisy data.
5. The ability to handle discontinuities.
6. The ability to find solutions in a fraction of the time required by other methods.
7. The ability to find solutions to problems that are difficult or impossible to solve using other
methods.
Drawbacks of simulated annealing
Simulated annealing is a technique used in AI to find solutions to optimization problems. It is
based on the idea of slowly cooling a material in order to reach the lowest-energy state, i.e. the
most optimal solution.
However, simulated annealing can be slow and may not always find the best solution.
Additionally, it can be difficult to tune the parameters of the algorithm (the initial temperature
and the cooling schedule), which can lead to sub-optimal results.
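The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the objective function, step size, cooling schedule, and seed below are arbitrary choices made only for the example:

```python
import math
import random

def simulated_annealing(f, x0, temp=10.0, cooling=0.999, steps=5000, seed=0):
    """Minimize f with simulated annealing (a minimal sketch).

    A random neighbour is always accepted if it improves the solution;
    a worse neighbour is accepted with probability exp(-delta / temp),
    so the search can escape local minima while the temperature is high.
    """
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    for _ in range(steps):
        candidate = x + rng.uniform(-1.0, 1.0)   # pick a random move
        fc = f(candidate)
        delta = fc - fx
        # Accept improvements always; accept worse moves with probability < 1.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x, fx = candidate, fc
            if fx < fbest:
                best, fbest = x, fx
        temp *= cooling                          # cool gradually
    return best, fbest

# An illustrative function with several local minima; its global minimum is at x = 0.
bumpy = lambda x: x * x + 10 * math.sin(x) ** 2
x, fx = simulated_annealing(bumpy, x0=8.0)
```

Starting from x = 8, a pure greedy hill-climber would settle in the nearest basin; the occasional uphill acceptances give the search a chance to move to a better one.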
❖ Min-Max Algorithm
• Mini-max algorithm is a recursive or backtracking algorithm which is used in decision-
making and game theory. It provides an optimal move for the player, assuming that the
opponent is also playing optimally.
• Mini-Max algorithm uses recursion to search through the game tree.
• Min-Max algorithm is mostly used for game playing in AI, such as Chess, Checkers,
Tic-Tac-Toe, Go, and various other two-player games. The algorithm computes the minimax
decision for the current state.
• In this algorithm two players play the game; one is called MAX and the other is called
MIN.
• Both players fight it out so that the opponent gets the minimum benefit while they
get the maximum benefit.
• Both players of the game are opponents of each other, where MAX will select the
maximized value and MIN will select the minimized value.
• The minimax algorithm performs a depth-first search for the exploration of
the complete game tree.
• The minimax algorithm proceeds all the way down to the terminal nodes of the tree, then
backs the values up the tree as the recursion unwinds.
Working of Min-Max Algorithm:
The working of the minimax algorithm can be easily described using an example. Below we
take an example game tree representing a two-player game.
In this example there are two players, one called Maximizer and the other called Minimizer.
The Maximizer will try to get the maximum possible score, and the Minimizer will try to get the
minimum possible score.
This algorithm applies DFS, so in this game tree we have to go all the way down through the
tree to reach the terminal nodes.
At the terminal nodes the terminal values are given, so we compare those values and
backtrack up the tree until the initial state is reached. Following are the main steps involved in
solving the two-player game tree:
Example:
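As a concrete, runnable sketch of the backing-up process, the following Python function evaluates a small 3-ply game tree. The tree encoding (nested lists) and the terminal values are hypothetical choices made for illustration:

```python
def minimax(node, maximizing):
    """Return the minimax value of a game tree given as nested lists.

    A leaf is a number (the terminal utility); an internal node is a
    list of child sub-trees. The search is depth-first, backing values
    up from the terminal nodes exactly as described above.
    """
    if not isinstance(node, list):        # terminal node: return its value
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# A 3-ply game tree: MAX at the root, MIN below it, MAX below that.
tree = [[[-1, 4], [2, 6]], [[-3, -5], [0, 7]]]
best = minimax(tree, maximizing=True)     # MAX moves first
```

Tracing it by hand: the lowest MAX nodes back up 4, 6, -3 and 7; the MIN nodes then choose 4 and -3; and the root MAX chooses 4.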
Properties of Mini-Max algorithm:
• Complete- The Min-Max algorithm is complete. It will definitely find a solution (if one
exists) in a finite search tree.
• Optimal- The Min-Max algorithm is optimal if both opponents are playing optimally.
• Time complexity- As it performs DFS on the game tree, the time complexity of the
Min-Max algorithm is O(b^m), where b is the branching factor of the game tree and m is
the maximum depth of the tree.
• Space Complexity- The space complexity of the Mini-Max algorithm is the same as for
DFS, which is O(b·m).
Limitation of the minimax Algorithm:
The main drawback of the minimax algorithm is that it gets really slow for complex games
such as Chess and Go. These games have a huge branching factor, and the player has many
choices to consider. This limitation of the minimax algorithm can be mitigated by alpha-beta
pruning, which is discussed in the next topic.

❖ Alpha-beta pruning

• Alpha-beta pruning is a modified version of the minimax algorithm. It is an
optimization technique for the minimax algorithm.
• As we have seen in the minimax search algorithm, the number of game states it has
to examine is exponential in the depth of the tree. We cannot eliminate the exponent,
but we can cut it roughly in half. There is a technique by which we can compute the
correct minimax decision without checking each node of the game tree, and this technique
is called pruning. It involves two threshold parameters, Alpha and Beta, for future
expansion, so it is called alpha-beta pruning. It is also called the Alpha-Beta
Algorithm.
• Alpha-beta pruning can be applied at any depth of a tree, and sometimes it prunes not
only tree leaves but also entire sub-trees.
• The two parameters can be defined as:
o Alpha: The best (highest-value) choice we have found so far at any point along
the path of the Maximizer. The initial value of alpha is -∞.
o Beta: The best (lowest-value) choice we have found so far at any point along
the path of the Minimizer. The initial value of beta is +∞.
• Alpha-beta pruning returns the same move as the standard minimax algorithm does,
but it removes all the nodes which do not really affect the final decision and only make
the algorithm slow. Hence, by pruning these nodes, it makes the algorithm fast.
Condition for Alpha-beta pruning:

The main condition required for alpha-beta pruning is: α >= β
Key points about alpha-beta pruning:

• The Max player will only update the value of alpha.
• The Min player will only update the value of beta.
• While backtracking the tree, the node values (not the alpha and beta values) are passed
up to the parent nodes.
• The alpha and beta values are passed only down to the child nodes.
Pseudo-code for Alpha-beta Pruning:

function minimax(node, depth, alpha, beta, maximizingPlayer) is
    if depth == 0 or node is a terminal node then
        return static evaluation of node
    if maximizingPlayer then                // for Maximizer Player
        maxEva = -infinity
        for each child of node do
            eva = minimax(child, depth-1, alpha, beta, false)
            maxEva = max(maxEva, eva)
            alpha = max(alpha, maxEva)
            if beta <= alpha then
                break                       // beta cut-off
        return maxEva
    else                                    // for Minimizer player
        minEva = +infinity
        for each child of node do
            eva = minimax(child, depth-1, alpha, beta, true)
            minEva = min(minEva, eva)
            beta = min(beta, minEva)
            if beta <= alpha then
                break                       // alpha cut-off
        return minEva
Example:
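As a runnable illustration of the pseudo-code, here is a Python version applied to a small 3-ply tree. The nested-list tree encoding and its terminal values are hypothetical choices for the example; the function returns the same value plain minimax would, while cutting off branches as soon as beta <= alpha:

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing):
    """Alpha-beta pruning over a game tree given as nested lists.

    A number is a terminal value; a list is an internal node.
    """
    if depth == 0 or not isinstance(node, list):
        return node                        # static evaluation of a leaf
    if maximizing:
        max_eva = -math.inf
        for child in node:
            eva = alphabeta(child, depth - 1, alpha, beta, False)
            max_eva = max(max_eva, eva)
            alpha = max(alpha, max_eva)
            if beta <= alpha:              # prune the remaining children
                break
        return max_eva
    else:
        min_eva = math.inf
        for child in node:
            eva = alphabeta(child, depth - 1, alpha, beta, True)
            min_eva = min(min_eva, eva)
            beta = min(beta, min_eva)
            if beta <= alpha:              # prune the remaining children
                break
        return min_eva

tree = [[[-1, 4], [2, 6]], [[-3, -5], [0, 7]]]
value = alphabeta(tree, depth=3, alpha=-math.inf, beta=math.inf, maximizing=True)
```

On this tree the left MIN node backs up 4, so once the right MIN node's first child returns -3 (<= alpha = 4), its remaining sub-tree is pruned without changing the root value.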
Move Ordering in Alpha-Beta pruning:
The effectiveness of alpha-beta pruning is highly dependent on the order in which each node
is examined. Move order is an important aspect of alpha-beta pruning.
It can be of two types:
• Worst ordering: In some cases, the alpha-beta pruning algorithm does not prune any of
the leaves of the tree and works exactly like the minimax algorithm. In this case it also
consumes more time because of the alpha-beta bookkeeping; such an ordering is called
worst ordering. In this case, the best move occurs on the right side of the tree. The time
complexity for such an order is O(b^m).
• Ideal ordering: The ideal ordering for alpha-beta pruning occurs when lots of pruning
happens in the tree and the best moves occur on the left side of the tree. We apply DFS,
so it searches the left of the tree first and can go twice as deep as the minimax algorithm
in the same amount of time. The complexity with ideal ordering is O(b^(m/2)).
Rules to find good ordering:

Following are some rules to find good ordering in alpha-beta pruning:
• Try the best move from the shallowest node first.
• Order the nodes in the tree such that the best nodes are checked first.
• Use domain knowledge while finding the best move. For example, in Chess try the order:
captures first, then threats, then forward moves, then backward moves.
• We can bookkeep the states, as there is a possibility that states may repeat.
❖ Knowledge based representation
Humans achieve intelligence not through purely reflex mechanisms but through processes of
reasoning that operate on internal representations of knowledge. In AI, these techniques for
intelligence are embodied in Knowledge-Based Agents.
✓ Knowledge-Based Systems
• A knowledge-based system is a system that uses artificial intelligence techniques to store
and reason with knowledge. The knowledge is typically represented in the form of rules
or facts, which can be used to draw conclusions or make decisions.
• One of the key benefits of a knowledge-based system is that it can help to automate
decision-making processes. For example, a knowledge-based system could be used to
diagnose a medical condition, by reasoning over a set of rules that describe the symptoms
and possible causes of the condition.
• Another benefit of knowledge-based systems is that they can be used to explain their
decisions to humans. This can be useful, for example, in a customer service setting, where
a knowledge-based system can help a human agent understand why a particular decision
was made.
• Knowledge-based systems are a type of artificial intelligence and have been used in a
variety of applications including medical diagnosis, expert systems, and decision support
systems.
✓ Knowledge-Based System in Artificial Intelligence

• An intelligent agent needs knowledge about the real world in order to make decisions
and reason efficiently.
• Knowledge-based agents are those agents that can maintain an internal state of
knowledge, reason over that knowledge, update their knowledge after observations,
and take actions. These agents can represent the world in some formal representation
and act intelligently.
✓ Why use a knowledge base?

• A knowledge base and inference are required for an agent to update its knowledge,
learn from experience, and take actions according to that knowledge.
• Inference means deriving new sentences from old ones. The inference-based system allows us
to add a new sentence to the knowledge base. A sentence is a proposition about the world.
The inference system applies logical rules to the KB to deduce new information.
• The inference system generates new facts so that an agent can update the KB. An
inference system works mainly with two rules, which are:
• Forward chaining
• Backward chaining
✓ Various levels of knowledge-based agents
A knowledge-based agent can be viewed at different levels which are given below:
1. Knowledge level
The knowledge level is the first level of a knowledge-based agent. At this level, we need to
specify what the agent knows and what the agent's goals are. With these specifications, we
can fix its behavior. For example, suppose an automated taxi agent needs to go from a station
A to station B, and it knows the way from A to B; this comes at the knowledge level.

2. Logical level
At this level, we consider how the knowledge is represented and stored. At this level,
sentences are encoded into different logics; at the logical level, an encoding of
knowledge into logical sentences occurs. At the logical level, we can expect the automated
taxi agent to reach destination B.

3. Implementation level
This is the physical representation of logic and knowledge. At the implementation level, the
agent performs actions as per the logical and knowledge levels. At this level, the automated
taxi agent actually implements its knowledge and logic so that it can reach the destination.
Knowledge-based agents have an explicit representation of knowledge that can be reasoned
over. They maintain an internal state of knowledge, reason over it, update it and perform
actions accordingly. These agents act intelligently according to requirements.
Knowledge-based agents describe the current situation in the form of sentences. They have
complete knowledge of the current situation of their mini-world and its surroundings. These
agents manipulate knowledge to infer new things at the "knowledge level".
✓ A knowledge-based system has the following features
• Knowledge base (KB): It is the key component of a knowledge-based agent. It deals
with real facts of the world. It is a collection of sentences expressed in a knowledge
representation language.
• Inference Engine (IE): It is the engine of a knowledge-based system, used to infer new
knowledge in the system.
✓ Actions performed by an agent

The inference system is used when we want to update some information (sentences) in the
knowledge-based system and to query the information already present. This mechanism is
carried out by the TELL and ASK operations. They involve inference, i.e. producing new
sentences from old ones. Inference must satisfy the requirement that when one ASKs a
question of the KB, the answer should follow from what has previously been TOLD to the KB.
The agent also has a KB, which initially contains some background knowledge. Whenever the
agent program is called, it performs the following actions:
1. It TELLS the knowledge base what it perceived from the environment and what it needs
to know.
2. It ASKS the knowledge base what action it should perform, and gets the answer from the
knowledge base.
3. It TELLS the knowledge base which action was selected, and then the agent executes
that action.

Given a percept, the agent adds it to the KB, then asks the KB for the best action, and then tells
the KB that it has in fact taken that action.
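The TELL/ASK loop above can be sketched as a tiny Python class. The knowledge base here is just a set of tagged tuples, and the action-selection stub is an illustrative stand-in (hypothetical names, not a standard API); a real agent would run logical inference at the ASK step:

```python
class KBAgent:
    """A minimal sketch of the TELL/ASK agent loop described above."""

    def __init__(self):
        self.kb = set()     # background knowledge would be loaded here
        self.t = 0          # time counter

    def agent_program(self, percept):
        # 1. TELL the KB what the agent perceived.
        self.kb.add(("percept", percept, self.t))
        # 2. ASK the KB which action is best given everything told so far.
        action = self.ask_for_action()
        # 3. TELL the KB that this action was selected, then execute it.
        self.kb.add(("did", action, self.t))
        self.t += 1
        return action

    def ask_for_action(self):
        # A real agent runs inference here; this stub just reacts to the
        # most recent percept (purely illustrative behaviour).
        last = max((s for s in self.kb if s[0] == "percept"),
                   key=lambda s: s[2])
        return "brake" if last[1] == "obstacle" else "drive"

agent = KBAgent()
```

Calling `agent.agent_program("obstacle")` runs one full TELL/ASK/TELL cycle and returns the chosen action.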
Figure: Knowledge based Systems
✓ A knowledge-based system's behavior can be designed using the following approaches:
• Declarative Approach: Beginning with an empty knowledge base, the agent is TOLD
sentences one after another until the agent has the knowledge of how to work with its
environment. This is known as the declarative approach: the required information is
stored in an initially empty knowledge base.
• Procedural Approach: This approach converts the required behaviors directly into
program code. It is the contrasting approach to the declarative one: here the behavior
of the system is designed by coding it.
❖ Propositional Logic
What is Logic?
Logic is the basis of all mathematical reasoning, and of all automated reasoning. The rules
of logic specify the meaning of mathematical statements. These rules help us understand and
reason with statements such as

∃x ∀a ∀b (x ≠ a² + b²)   (with the variables ranging over the integers)

which in simple English means "There exists an integer that is not the sum of two squares".
Importance of Mathematical Logic: The rules of logic give precise meaning to mathematical
statements. These rules are used to distinguish between valid and invalid mathematical
arguments. Apart from its importance in understanding mathematical reasoning, logic has
numerous applications in computer science, ranging from the design of digital circuits to the
construction of computer programs and the verification of program correctness.
What is a proposition? A proposition is the basic building block of logic. It is defined as a
declarative sentence that is either True or False, but not both. The truth value of a
proposition is True (denoted T) if it is a true statement, and False (denoted F) if it is a
false statement. For example:
1. The sun rises in the East and sets in the West.
2. 1 + 1 = 2
3. 'b' is a vowel.
All of the above sentences are propositions, where the first two are Valid (True) and the third
one is Invalid (False). Some sentences that do not have a truth value, or may have more than
one truth value, are not propositions. For example:
1. What time is it?
2. Go out and play.
3. x + 1 = 2
The above sentences are not propositions, as the first two do not have a truth value, and the
third may be true or false depending on x. To represent propositions, propositional variables
are used. By convention, these variables are denoted by lowercase letters such as 𝑝, 𝑞, 𝑟, 𝑠.
The area of logic which deals with propositions is called propositional calculus or
propositional logic. It also includes producing new propositions from existing ones.
Propositions constructed using one or more propositions are called compound propositions.
The propositions are combined together using Logical Connectives or Logical Operators.
Truth Table
To find the truth value of a proposition in all possible scenarios, we consider all the possible
combinations of the propositions which are joined together by logical connectives to form the
given compound proposition. This compilation of all possible scenarios in a tabular format is
called a truth table. Most common logical connectives:
• Negation: If p is a proposition, then the negation of p is denoted by ¬𝒑, which when
translated into simple English means "It is not the case that p", or simply "not p". The
truth value of ¬𝒑 is the opposite of the truth value of p. The truth table of ¬𝒑 is:

p    ¬p
T    F
F    T

For example, the negation of "It is raining today" is "It is not the case that it is raining today",
or simply "It is not raining today".
• Conjunction: For any two propositions p and q, their conjunction is denoted by
𝒑 ⋀ 𝒒, which means "p and q". The conjunction 𝒑 ⋀ 𝒒 is True when both p and q are
True, and False otherwise. The truth table of 𝒑 ⋀ 𝒒 is:

p    q    p ∧ q
T    T      T
T    F      F
F    T      F
F    F      F

For example, the conjunction of the propositions p – "Today is Friday" and q – "It is raining
today", p ∧ q, is "Today is Friday and it is raining today". This proposition is true only
on rainy Fridays and is false on any other rainy day or on Fridays when it does not rain.
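Truth tables like these can be generated mechanically. The following Python sketch enumerates every row for any connective supplied as a function (the helper name is an illustrative choice):

```python
from itertools import product

def truth_table(connective, arity=2):
    """Return the truth table of a connective as a list of rows.

    Each row is a tuple of the input truth values followed by the
    connective's result for those inputs.
    """
    rows = []
    for values in product([True, False], repeat=arity):
        rows.append(values + (connective(*values),))
    return rows

# Negation and conjunction, matching the tables shown above.
neg_table  = truth_table(lambda p: not p, arity=1)
conj_table = truth_table(lambda p, q: p and q)
```

`neg_table` has two rows (one per value of p) and `conj_table` has four, with the conjunction true only in the row where both inputs are true.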
❖ Rules of Inference in Artificial intelligence

Inference:
In artificial intelligence, we need intelligent computers which can create new logic from old
logic or from evidence, so generating conclusions from evidence and facts is termed
Inference.
Inference rules:
Inference rules are the templates for generating valid arguments. Inference rules are applied to
derive proofs in artificial intelligence, and a proof is a sequence of conclusions that leads
to the desired goal.
In inference rules, the implication among all the connectives plays an important role. Following
are some terminologies related to inference rules:
• Implication: It is one of the logical connectives, which can be represented as P → Q. It
is a Boolean expression.
• Converse: The converse of an implication swaps the two sides: the right-hand side
proposition goes to the left-hand side and vice versa. It can be written as Q → P.
• Contrapositive: The negation of the converse is termed the contrapositive, and it can be
represented as ¬Q → ¬P.
• Inverse: The negation of the implication is called the inverse. It can be represented as
¬P → ¬Q.
From the above terms, some of the compound statements are equivalent to each other, which
we can prove using a truth table:

P    Q    P → Q    ¬Q → ¬P    Q → P    ¬P → ¬Q
T    T      T         T          T         T
T    F      F         F          T         T
F    T      T         T          F         F
F    F      T         T          T         T

Hence from the above truth table, we can prove that P → Q is equivalent to ¬Q → ¬P, and
Q → P is equivalent to ¬P → ¬Q.
❖ Types of Inference rules:
1. Modus Ponens:
The Modus Ponens rule is one of the most important rules of inference. It states that if P
and P → Q are true, then we can infer that Q will be true. It can be represented as:
P → Q, P ⊢ Q
Example:
Statement-1: "If I am sleepy then I go to bed" ==> P → Q
Statement-2: "I am sleepy" ==> P
Conclusion: "I go to bed." ==> Q
Hence, we can say that if P → Q is true and P is true, then Q will be true.
Proof by Truth table:
2. Modus Tollens:
The Modus Tollens rule states that if P → Q is true and ¬Q is true, then ¬P will also be true. It
can be represented as:
P → Q, ¬Q ⊢ ¬P
Statement-1: "If I am sleepy then I go to bed" ==> P → Q
Statement-2: "I do not go to the bed." ==> ¬Q
Statement-3: Which infers that "I am not sleepy" ==> ¬P
Proof by Truth table:
3. Hypothetical Syllogism:
The Hypothetical Syllogism rule states that if P → Q is true and Q → R is true, then P → R is
also true. It can be represented with the following notation:
P → Q, Q → R ⊢ P → R
Example:
Statement-1: If you have my home key then you can unlock my home. P → Q
Statement-2: If you can unlock my home then you can take my money. Q → R
Conclusion: If you have my home key then you can take my money. P → R
Proof by truth table:
4. Disjunctive Syllogism:
The Disjunctive Syllogism rule states that if P ∨ Q is true and ¬P is true, then Q will be true. It
can be represented as:
P ∨ Q, ¬P ⊢ Q
Example:
Statement-1: Today is Sunday or Monday. ==> P ∨ Q
Statement-2: Today is not Sunday. ==> ¬P
Conclusion: Today is Monday. ==> Q
Proof by truth-table:
5. Addition:
The Addition rule is one of the common inference rules. It states that if P is true, then P ∨ Q
will be true. It can be represented as:
P ⊢ P ∨ Q
Example:
Statement: I have a vanilla ice-cream. ==> P
Conclusion: I have vanilla or chocolate ice-cream. ==> (P ∨ Q)
Proof by Truth-Table:
6. Simplification:
The Simplification rule states that if P ∧ Q is true, then P (and likewise Q) will also be true. It
can be represented as:
P ∧ Q ⊢ P
Proof by Truth-Table:
7. Resolution:
The Resolution rule states that if P ∨ Q and ¬P ∨ R are true, then Q ∨ R will also be true. It can
be represented as:
P ∨ Q, ¬P ∨ R ⊢ Q ∨ R
Proof by Truth-Table:
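Each of these inference rules corresponds to a tautology, which can be checked mechanically by enumerating the truth table. A small Python sketch (the helper names are illustrative; resolution is checked in its standard clause form P ∨ Q, ¬P ∨ R ⊢ Q ∨ R):

```python
from itertools import product

def is_tautology(formula, n_vars):
    """Confirm a formula is true in every row of its truth table."""
    return all(formula(*vals) for vals in product([True, False], repeat=n_vars))

implies = lambda a, b: (not a) or b

# Modus Ponens:          ((P -> Q) and P) -> Q
mp = is_tautology(lambda p, q: implies(implies(p, q) and p, q), 2)
# Modus Tollens:         ((P -> Q) and not Q) -> not P
mt = is_tautology(lambda p, q: implies(implies(p, q) and not q, not p), 2)
# Disjunctive Syllogism: ((P or Q) and not P) -> Q
ds = is_tautology(lambda p, q: implies((p or q) and not p, q), 2)
# Resolution:            ((P or Q) and (not P or R)) -> (Q or R)
res = is_tautology(lambda p, q, r: implies((p or q) and (not p or r), q or r), 3)
```

All four flags come out True, which is exactly what "proof by truth table" means: no row of inputs makes the premises true and the conclusion false.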
❖ First-Order Logic in Artificial intelligence

In propositional logic, we have seen how to represent statements. Unfortunately, in
propositional logic we can only represent facts, which are either true or false. PL is not
sufficient to represent complex sentences or natural-language statements; it has very limited
expressive power. Consider the following sentences, which we cannot represent using PL:

• "Some humans are intelligent", or
• "Sachin likes cricket."

To represent the above statements, PL is not sufficient, so we require a more powerful logic,
such as first-order logic.
First-Order logic:

• First-order logic is another way of knowledge representation in artificial intelligence.
It is an extension of propositional logic.
• FOL is sufficiently expressive to represent natural-language statements in a concise
way.
• First-order logic is also known as Predicate logic or First-order predicate logic.
First-order logic is a powerful language that describes information about objects in a
natural way and can also express the relationships between those objects.
• First-order logic (like natural language) does not only assume that the world contains
facts, as propositional logic does, but also assumes the following things in the world:

▪ Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus, ...
▪ Relations: unary relations such as red, round, is adjacent, or n-ary relations
such as the sister of, brother of, has color, comes between
▪ Functions: father of, best friend, third inning of, end of, ...

• Like a natural language, first-order logic also has two main parts:
a. Syntax
b. Semantics
Syntax of First-Order logic:

The syntax of FOL determines which collections of symbols form logical expressions in
first-order logic. The basic syntactic elements of first-order logic are symbols. We write
statements in short-hand notation in FOL.

Basic Elements of First-order logic:

Following are the basic elements of FOL syntax:
Constants:    1, 2, A, John, Mumbai, cat, ...
Variables:    x, y, z, a, b, ...
Predicates:   Brother, Father, >, ...
Functions:    sqrt, LeftLegOf, ...
Connectives:  ∧, ∨, ¬, ⇒, ⇔
Equality:     =
Quantifiers:  ∀, ∃
Atomic sentences:
• Atomic sentences are the most basic sentences of first-order logic. These sentences are
formed from a predicate symbol followed by a parenthesis with a sequence of terms.

• We can represent atomic sentences as Predicate (term1, term2, ......, term n).
Complex Sentences:
• Complex sentences are made by combining atomic sentences using connectives.
First-order logic statements can be divided into two parts:
• Subject: Subject is the main part of the statement.
• Predicate: A predicate can be defined as a relation, which binds two atoms together in
a statement.
Consider the statement "x is an integer". It consists of two parts: the first part, x, is the
subject of the statement, and the second part, "is an integer", is known as the predicate.
Quantifiers in First-order logic:
• A quantifier is a language element which generates quantification, and quantification
specifies the quantity of specimens in the universe of discourse.
• These are the symbols that permit us to determine or identify the range and scope of a
variable in a logical expression. There are two types of quantifier:
a. Universal Quantifier (for all, everyone, everything)
b. Existential Quantifier (for some, at least one).
Universal Quantifier:
A universal quantifier is a symbol of logical representation which specifies that the statement
within its range is true for everything, or for every instance of a particular thing.
The universal quantifier is represented by the symbol ∀, which resembles an inverted A.
If x is a variable, then ∀x is read as:
• For all x
• For each x
• For every x.
Example:
• Statement: All men drink coffee.
• In FOL: ∀x man(x) → drink(x, coffee).
• It is read as: For all x, if x is a man, then x drinks coffee.
Existential Quantifier:
Existential quantifiers are the type of quantifiers which express that the statement within their
scope is true for at least one instance of something.
It is denoted by the logical operator ∃, which resembles a reversed E. When it is used with a
predicate variable, it is called an existential quantifier.
If x is a variable, then the existential quantifier is written ∃x or ∃(x), and it is read as:
• There exists an 'x.'
• For some 'x.'
• For at least one 'x.'

Example:
• Statement: Some boys are intelligent.
• In FOL: ∃x boys(x) ∧ intelligent(x).
• It is read as: There exists some x such that x is a boy and x is intelligent.

Points to remember:
• The main connective for universal quantifier ∀ is implication →.
• The main connective for existential quantifier ∃ is and ∧.
Properties of Quantifiers:
• In universal quantifier, ∀x∀y is similar to ∀y∀x.
• In Existential quantifier, ∃x∃y is similar to ∃y∃x.
• ∃x∀y is not similar to ∀y∃x.
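Over a small finite domain, these properties can be checked directly in Python, with all() playing the role of ∀ and any() the role of ∃. The domain and the relation below are illustrative choices:

```python
# A toy domain and relation: each element "loves" only itself.
domain = range(1, 6)
loves = lambda x, y: y == x

# Swapping two universal (or two existential) quantifiers changes nothing:
forall_xy = all(all(loves(x, y) for y in domain) for x in domain)
forall_yx = all(all(loves(x, y) for x in domain) for y in domain)

# But ∃x∀y is NOT the same as ∀y∃x:
exists_forall = any(all(loves(x, y) for y in domain) for x in domain)
forall_exists = all(any(loves(x, y) for x in domain) for y in domain)
```

Here `forall_exists` is True (every y is loved by someone, namely itself) while `exists_forall` is False (no single x loves everyone), demonstrating the third property above.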
Free and Bound Variables:

The quantifiers interact with the variables that appear in a formula. There are two types of
variables in first-order logic, which are given below:
Free Variable: A variable is said to be a free variable in a formula if it occurs outside the scope
of any quantifier binding it.
Example: ∀x ∃y [P(x, y, z)], where z is a free variable.
Bound Variable: A variable is said to be a bound variable in a formula if it occurs within the
scope of a quantifier binding it.
Example: ∀x [A(x) ∧ B(x)], here x is a bound variable.
❖ Unification in First Order Predicate Logic
• Unification is the process of making two different logical atomic expressions identical by
finding a substitution. Unification depends on the substitution process.
• It takes two literals as input and makes them identical using substitution.
• Let Ψ1 and Ψ2 be two atomic sentences and 𝜎 be a unifier such that Ψ1𝜎 = Ψ2𝜎; then this is
expressed as UNIFY(Ψ1, Ψ2).
• Example: Find the MGU for Unify{King(x), King(John)}.

Let Ψ1 = King(x), Ψ2 = King(John).

The substitution θ = {John/x} is a unifier for these atoms; applying this substitution makes both
expressions identical.
• The UNIFY algorithm is used for unification. It takes two atomic sentences and
returns a unifier for those sentences (if any exists).
• Unification is a key component of all first-order inference algorithms.
• It returns fail if the expressions do not match with each other.
• The most general such substitution is called the Most General Unifier, or MGU.
E.g. consider two different expressions, P(x, y) and P(a, f(z)); here the substitution
{a/x, f(z)/y} unifies them.
Conditions for Unification:

Following are some basic conditions for unification:
• The predicate symbols must be the same; atoms or expressions with different predicate
symbols can never be unified.
• The number of arguments in both expressions must be identical.
• Unification will fail if a variable would have to be bound to a term that contains that
same variable (the occurs-check).

Implementation of the Algorithm

Step 1: Initialize the substitution set to be empty.
Step 2: Recursively unify atomic sentences:
a. Check for an identical expression match.
b. If one expression is a variable vi, and the other is a term ti which does not contain
variable vi, then:
i. Substitute ti/vi in the existing substitutions.
ii. Add ti/vi to the substitution set.
c. If both the expressions are functions, then the function names must be the same, and
the number of arguments must be the same in both expressions.
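The steps above can be implemented compactly. The following Python sketch uses an illustrative term encoding (lowercase strings as variables, capitalised strings as constants, tuples as predicates or functions); it is a teaching sketch rather than a full FOL implementation:

```python
def is_variable(t):
    return isinstance(t, str) and t[:1].islower()

def occurs(var, term, subst):
    """Occurs-check: does var appear inside term (under subst)?"""
    if var == term:
        return True
    if is_variable(term) and term in subst:
        return occurs(var, subst[term], subst)
    if isinstance(term, tuple):
        return any(occurs(var, a, subst) for a in term[1:])
    return False

def unify_var(var, term, subst):
    if var in subst:
        return unify(subst[var], term, subst)
    if occurs(var, term, subst):          # fail rather than build x = f(x)
        return None
    return {**subst, var: term}

def unify(x, y, subst=None):
    """Return a substitution dict unifying x and y, or None on failure."""
    if subst is None:
        subst = {}
    if x == y:                            # identical expression match
        return subst
    if is_variable(x):
        return unify_var(x, y, subst)
    if is_variable(y):
        return unify_var(y, x, subst)
    if isinstance(x, tuple) and isinstance(y, tuple):
        if x[0] != y[0] or len(x) != len(y):   # same symbol, same arity
            return None
        for a, b in zip(x[1:], y[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

# Unify King(x) with King(John): the MGU is {John/x}.
theta = unify(("King", "x"), ("King", "John"))
```

The example from the text returns the substitution binding x to John, and expressions with different predicate symbols correctly fail.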
❖ Forward Chaining and backward chaining in AI

Inference engine:
The inference engine is the component of an intelligent system in artificial intelligence which
applies logical rules to the knowledge base to infer new information from known facts. The
first inference engine was part of an expert system. An inference engine commonly proceeds
in two modes, which are:
a. Forward chaining
b. Backward chaining
Horn Clause and Definite clause:

Horn clauses and definite clauses are forms of sentences that enable a knowledge base to
use a more restricted and efficient inference algorithm. Logical inference algorithms use
forward and backward chaining approaches, which require the KB in the form of first-order
definite clauses.
Definite clause: A clause which is a disjunction of literals with exactly one positive literal is
known as a definite clause or strict Horn clause.
Horn clause: A clause which is a disjunction of literals with at most one positive literal is
known as a Horn clause. Hence all definite clauses are Horn clauses.
Example: (¬p ∨ ¬q ∨ k) has only one positive literal, k.
It is equivalent to p ∧ q → k.
A. Forward Chaining
Forward chaining is also known as forward deduction or the forward reasoning method when
using an inference engine. Forward chaining is a form of reasoning which starts with the atomic
sentences in the knowledge base and applies inference rules (Modus Ponens) in the forward
direction to extract more data until a goal is reached.
The forward-chaining algorithm starts from known facts, triggers all rules whose premises are
satisfied, and adds their conclusions to the known facts. This process repeats until the problem
is solved.
Properties of Forward-Chaining:
• It is a bottom-up approach, as it moves from the bottom (the facts) to the top (the goal).
• It is a process of making conclusions based on known facts or data, starting from
the initial state and reaching the goal state.
• The forward-chaining approach is also called data-driven, as we reach the goal using
the available data.
• The forward-chaining approach is commonly used in expert systems such as CLIPS, and
in business and production rule systems.
Consider the following famous example which we will use in both approaches:
Example:
“As per the law, it is a crime for an American to sell weapons to hostile nations. Country
A, an enemy of America, has some missiles, and all the missiles were sold to it by Robert,
who is an American citizen.”
Prove that “Robert is criminal.”
To solve the above problem, first, we will convert all the above facts into first-order definite
clauses, and then we will use a forward-chaining algorithm to reach the goal.
Solution:
• It is a crime for an American to sell weapons to hostile nations. (Let p, q, and r
be variables.)
American(p) ∧ Weapon(q) ∧ Sells(p, q, r) ∧ Hostile(r) → Criminal(p) ...(1)
• Country A has some missiles: ∃p Owns(A, p) ∧ Missile(p). This can be written as two
definite clauses by using Existential Instantiation, introducing the new constant T1.
Owns(A, T1) ......(2)
Missile(T1) .......(3)
• All of the missiles were sold to country A by Robert.
∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ......(4)
• Missiles are weapons.
Missile(p) → Weapon(p) .......(5)
• An enemy of America is known as hostile.
Enemy(p, America) → Hostile(p) ........(6)
• Country A is an enemy of America.
Enemy(A, America) .........(7)
• Robert is American.
American(Robert) ..........(8)
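For a runnable illustration, the rules above can be propositionalised by hand for the single missile T1 and fed to a naive forward-chaining loop. This is a sketch of the repeat-until-fixpoint idea, not the full first-order algorithm (the variable bindings are done manually here):

```python
def forward_chain(rules, facts, goal):
    """Naive forward chaining over propositional definite clauses.

    Each rule is (premises, conclusion). Any rule whose premises are all
    known fires and adds its conclusion, until nothing new can be derived.
    """
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)
                changed = True
    return goal in facts

# The crime example, ground-instantiated for missile T1 (sketch only).
rules = [
    ({"American(Robert)", "Weapon(T1)", "Sells(Robert,T1,A)", "Hostile(A)"},
     "Criminal(Robert)"),                                   # rule (1)
    ({"Missile(T1)", "Owns(A,T1)"}, "Sells(Robert,T1,A)"),  # rule (4)
    ({"Missile(T1)"}, "Weapon(T1)"),                        # rule (5)
    ({"Enemy(A,America)"}, "Hostile(A)"),                   # rule (6)
]
facts = ["Owns(A,T1)", "Missile(T1)", "Enemy(A,America)", "American(Robert)"]
proved = forward_chain(rules, facts, "Criminal(Robert)")
```

The first pass derives Sells, Weapon, and Hostile; the second pass then fires rule (1) and derives Criminal(Robert), matching the proof sketched in the text.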
❖ Backward Chaining:
Backward chaining is also known as backward deduction or the backward reasoning method
when using an inference engine. A backward-chaining algorithm is a form of reasoning which
starts with the goal and works backward, chaining through rules to find known facts that
support the goal.

Properties of backward chaining:

• It is known as a top-down approach.
• Backward chaining is based on the Modus Ponens inference rule.
• In backward chaining, the goal is broken into one or more sub-goals to prove the facts
true.
• It is called a goal-driven approach, as a list of goals decides which rules are selected
and used.
• The backward-chaining algorithm is used in game theory, automated theorem-proving
tools, inference engines, proof assistants, and various AI applications.
• The backward-chaining method mostly uses a depth-first search strategy for proof.
Example:

In backward-chaining, we will use the same above example, and will rewrite all the rules.

• American (p) ∧ weapon(q) ∧ sells (p, q, r) ∧ hostile(r) → Criminal(p) ...(1)


Owns(A, T1) ........(2)
• Missile(T1)
• ?p Missiles(p) ∧ Owns (A, p) → Sells (Robert, p, A) ......(4)
• Missile(p) → Weapons (p) .......(5)
• Enemy(p, America) →Hostile(p) ........(6)
• Enemy (A, America) .........(7)
• American(Robert). ..........(8)
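The goal-driven search can be sketched in the same style as the forward chainer: start from the query Criminal(Robert) and recursively reduce it to sub-goals until only known facts remain. As before, this is an illustrative brute-force version that instantiates variables over the known constants rather than performing proper unification.

```python
from itertools import product

# Known facts: clauses (2), (3), (7), (8)
facts = {("American", "Robert"), ("Missile", "T1"),
         ("Owns", "A", "T1"), ("Enemy", "A", "America")}

# Implications (1), (4), (5), (6) as (premises, conclusion);
# tokens starting with '?' are variables.
rules = [
    ([("American", "?p"), ("Weapon", "?q"), ("Sells", "?p", "?q", "?r"),
      ("Hostile", "?r")], ("Criminal", "?p")),
    ([("Missile", "?p"), ("Owns", "A", "?p")], ("Sells", "Robert", "?p", "A")),
    ([("Missile", "?p")], ("Weapon", "?p")),
    ([("Enemy", "?p", "America")], ("Hostile", "?p")),
]
constants = {"Robert", "T1", "A", "America"}

def substitute(atom, binding):
    return tuple(binding.get(t, t) for t in atom)

def prove(goal):
    """Backward chaining: a ground goal holds if it is a known fact, or if
    some rule instantiation concludes it and all its premises can be proved
    (a depth-first search over sub-goals)."""
    if goal in facts:
        return True
    for premises, conclusion in rules:
        vars_ = sorted({t for atom in premises + [conclusion]
                        for t in atom if t.startswith("?")})
        for combo in product(constants, repeat=len(vars_)):
            binding = dict(zip(vars_, combo))
            if substitute(conclusion, binding) == goal:
                if all(prove(substitute(p, binding)) for p in premises):
                    return True
    return False

print(prove(("Criminal", "Robert")))  # True
```

Proving Criminal(Robert) spawns the sub-goals American(Robert), Weapon(T1), Sells(Robert, T1, A), and Hostile(A), each of which bottoms out in the facts, mirroring the backward proof tree built by hand.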
❖ Deductive vs Inductive Reasoning

Reasoning in artificial intelligence has two important forms: inductive reasoning and deductive reasoning. Both forms have premises and conclusions, but they differ in how the conclusion follows from the premises. Following is a comparison between inductive and deductive reasoning:
• Deductive reasoning uses available facts, information, or knowledge to deduce a valid
conclusion, whereas inductive reasoning involves making a generalization from
specific facts, and observations.
• Deductive reasoning uses a top-down approach, whereas inductive reasoning uses a
bottom-up approach.
• Deductive reasoning moves from generalized statement to a valid conclusion, whereas
Inductive reasoning moves from specific observation to a generalization.
• In deductive reasoning, the conclusions are certain, whereas, in Inductive reasoning,
the conclusions are probabilistic.
• Deductive arguments can be valid or invalid; in a valid argument, if the premises are
true, the conclusion must be true. Inductive arguments can be strong or weak; the
conclusion may be false even if the premises are true.

Basis for comparison:
• Definition: Deductive reasoning is a form of valid reasoning that deduces new information or a conclusion from known, related facts and information. Inductive reasoning arrives at a conclusion by the process of generalization, using specific facts or data.
• Approach: Deductive reasoning follows a top-down approach. Inductive reasoning follows a bottom-up approach.
• Starts from: Deductive reasoning starts from premises. Inductive reasoning starts from specific observations.
• Validity: In deductive reasoning, the conclusion must be true if the premises are true. In inductive reasoning, the truth of the premises does not guarantee the truth of the conclusion.
• Usage: Deductive reasoning is harder to use, as we need facts that must be true. Inductive reasoning is fast and easy, as we need evidence instead of established facts; we often use it in our daily life.
• Process: Deductive: theory → hypothesis → patterns → confirmation. Inductive: observations → patterns → hypothesis → theory.
• Argument: In deductive reasoning, arguments may be valid or invalid. In inductive reasoning, arguments may be weak or strong.
• Structure: Deductive reasoning reaches from general facts to specific conclusions. Inductive reasoning reaches from specific facts to general conclusions.
❖ Probabilistic reasoning in Artificial intelligence
Uncertainty:
Till now, we have learned knowledge representation using first-order logic and propositional logic with certainty, which means we were sure about the predicates. With this knowledge representation we might write A→B, meaning if A is true then B is true. But consider a situation where we are not sure whether A is true or not; then we cannot express this statement. This situation is called uncertainty.
So, to represent uncertain knowledge, where we are not sure about the predicates, we need
uncertain reasoning or probabilistic reasoning.

Causes of uncertainty:
Following are some leading causes of uncertainty to occur in the real world.
1. Information occurred from unreliable sources.
2. Experimental Errors
3. Equipment fault
4. Temperature variation
5. Climate change.

Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where we apply the concept of
probability to indicate the uncertainty in knowledge. In probabilistic reasoning, we combine
probability theory with logic to handle the uncertainty.
We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that results from laziness (too many conditions to enumerate) and ignorance (incomplete knowledge of the domain).
In the real world, there are lots of scenarios, where the certainty of something is not confirmed,
such as “It will rain today,” “behavior of someone for some situations,” “A match between two
teams or two players.” These are probable sentences for which we can assume that it will
happen but not sure about it, so here we use probabilistic reasoning.

Need of probabilistic reasoning in AI:


• When there are unpredictable outcomes.
• When the specifications or possibilities of predicates become too large to handle.
• When an unknown error occurs during an experiment.
In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
• Bayes’ rule
• Bayesian Statistics

As probabilistic reasoning uses probability and related terms, so before understanding


probabilistic reasoning, let's understand some common terms:
Probability: Probability can be defined as the chance that an uncertain event will occur. It is the numerical measure of the likelihood that an event will occur. The value of a probability always lies between 0 and 1:
• 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
• P(A) = 0 indicates total uncertainty in an event A (it cannot occur).
• P(A) = 1 indicates total certainty in an event A (it must occur).
We can find the probability of an uncertain event by using the below formula:
P(A) = Number of favourable outcomes / Total number of outcomes
• P(¬A) = probability of event A not happening.
• P(¬A) + P(A) = 1.

Event: Each possible outcome of a variable is called an event.


Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in the real
world.
Prior probability: The prior probability of an event is probability computed before observing
new information.
Posterior probability: The probability calculated after all evidence or information has been taken into account. It is a combination of the prior probability and the new information.

Conditional probability:
Conditional probability is the probability of an event occurring given that another event has already happened.
Let's suppose we want to calculate the probability of event A when event B has already occurred, “the probability of A under the condition B”. It can be written as:
P(A|B) = P(A⋀B) / P(B)
Where P(A⋀B) = joint probability of A and B
P(B) = marginal probability of B.
If the probability of A is given and we need to find the probability of B, then it will be given as:
P(B|A) = P(A⋀B) / P(A)
It can be explained using a Venn diagram: once B has occurred, the sample space is reduced to set B, and we can calculate event A given B by dividing the probability P(A⋀B) by P(B).

Example:
In a class, 70% of the students like English and 40% of the students like both English and mathematics. What percentage of the students who like English also like mathematics?

Solution:
Let A be the event that a student likes Mathematics, and
B be the event that a student likes English.
P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 = 0.57
Hence, 57% of the students who like English also like Mathematics.
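The arithmetic above can be checked with a couple of lines of Python; the variable names are my own, chosen to match the events A and B.

```python
p_B = 0.70            # P(B): student likes English
p_A_and_B = 0.40      # P(A ∧ B): student likes both English and mathematics

# Conditional probability: P(A|B) = P(A ∧ B) / P(B)
p_A_given_B = p_A_and_B / p_B
print(round(p_A_given_B * 100))  # → 57 (percent)
```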

❖ Bayes’ theorem in Artificial intelligence


Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which
determines the probability of an event with uncertain knowledge.
In probability theory, it relates the conditional probability and marginal probabilities of two
random events.
Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian
inference is an application of Bayes' theorem, which is fundamental to Bayesian statistics.
It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
Bayes' theorem allows updating the probability prediction of an event by observing new
information of the real world.

Example: If the risk of cancer is related to a person's age, then by using Bayes' theorem we can determine the probability of cancer more accurately with the help of age.
Bayes' theorem can be derived using product rule and conditional probability of event A with
known event B:
As from the product rule we can write:
• P(A ⋀ B) = P(A|B) P(B)
Similarly, the probability of event B with known event A:
• P(A ⋀ B) = P(B|A) P(A)
Equating the right-hand sides of both equations, we get:
P(A|B) = P(B|A) P(A) / P(B) ........(a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of hypothesis A given that evidence B has occurred.
P(B|A) is called the likelihood: assuming the hypothesis is true, it is the probability of the evidence.
P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability: the pure probability of the evidence.
In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai); hence Bayes' rule can be written as:
P(Ai|B) = P(Ai) P(B|Ai) / Σk P(Ak) P(B|Ak)
Where A1, A2, A3, ....., An is a set of mutually exclusive and exhaustive events.
Applying Bayes' rule:
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A). This is very useful when we have good estimates of these three terms and want to determine the fourth one. Suppose we want to perceive the effect of some unknown cause and compute that cause; then Bayes' rule becomes:
P(cause|effect) = P(effect|cause) P(cause) / P(effect)

Example-1:
Question: what is the probability that a patient has diseases meningitis with a stiff neck?
Given Data:
A doctor is aware that disease meningitis causes a patient to have a stiff neck, and it occurs
80% of the time. He is also aware of some more facts, which are given as follows:
o The Known probability that a patient has meningitis disease is 1/30,000.
o The Known probability that a patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient has meningitis, so we can calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' rule:
P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.0013
Hence, we can assume that about 1 patient out of 750 patients has meningitis disease with a stiff neck.
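The same calculation in Python confirms the 1-in-750 figure; the variable names are illustrative shorthand for the propositions a and b.

```python
p_stiff_given_men = 0.8      # P(a|b): stiff neck given meningitis
p_men = 1 / 30000            # P(b): prior probability of meningitis
p_stiff = 0.02               # P(a): prior probability of a stiff neck

# Bayes' rule: P(b|a) = P(a|b) * P(b) / P(a)
p_men_given_stiff = p_stiff_given_men * p_men / p_stiff
print(round(1 / p_men_given_stiff))  # → 750, i.e. about 1 patient in 750
```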

Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability
that the card is king is 4/52, then calculate posterior probability P(King|Face), which
means the drawn face card is a king card.
Solution:

P(King): probability that the card is a King = 4/52 = 1/13
P(Face): probability that the card is a face card = 12/52 = 3/13
P(Face|King): probability that a card is a face card given that it is a King = 1
By Bayes' theorem:
P(King|Face) = P(Face|King) P(King) / P(Face) .........(i)
Putting all values in equation (i), we get:
P(King|Face) = 1 × (1/13) / (3/13) = 1/3
Following are some applications of Bayes' theorem:


o It is used to calculate the next step of the robot when the already executed step is given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.

❖ Bayesian Belief Network in artificial intelligence


A Bayesian belief network is a key computer technology for dealing with probabilistic events and for solving problems that involve uncertainty. We can define a Bayesian network as:
"A Bayesian network is a probabilistic graphical model which represents a set of variables and
their conditional dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian networks are probabilistic, because these networks are built from a probability
distribution, and also use probability theory for prediction and anomaly detection.
Real world applications are probabilistic in nature, and to represent the relationship between
multiple events, we need a Bayesian network. It can also be used in various tasks
including prediction, anomaly detection, diagnostics, automated insight, reasoning, time
series prediction, and decision making under uncertainty.
A Bayesian network can be used for building models from data and experts' opinions, and it consists of two parts:
• Directed Acyclic Graph
• Table of conditional probabilities.
The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and Arcs (directed links), where:

• Each node corresponds to a random variable, and a variable can be continuous or discrete.
• Arc or directed arrows represent the causal relationship or conditional probabilities
between random variables. These directed links or arrows connect the pair of nodes in
the graph.
• These links represent that one node directly influences the other node; if there is no
directed link, the nodes are independent of each other.

• In the example diagram, A, B, C, and D are random variables represented by the nodes of the network graph.
• If we consider node B, which is connected to node A by a directed arrow, then node A is called the parent of node B.
• Node C is independent of node A.

The Bayesian network has mainly two components:


• Causal Component
• Actual numbers
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which determines the effect of the parents on that node.
Bayesian network is based on Joint probability distribution and conditional probability. So let’s
first understand the joint probability distribution:
Joint probability distribution:
If we have variables x1, x2, x3, ....., xn, then the probabilities of the different combinations of x1, x2, x3, ....., xn are known as the joint probability distribution.
P[x1, x2, x3, ....., xn] can be written in terms of the joint probability distribution as follows:
= P[x1 | x2, x3, ....., xn] P[x2, x3, ....., xn]
= P[x1 | x2, x3, ....., xn] P[x2 | x3, ....., xn] .... P[xn-1 | xn] P[xn]
In general for each variable Xi, we can write the equation as:
P(Xi|Xi-1,........., X1) = P(Xi |Parents(Xi ))

Explanation of Bayesian network:


Let's understand the Bayesian network through an example by creating a directed acyclic graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably to a burglary but also responds to minor earthquakes. Harry has two neighbours, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused with the phone ringing and calls then too. On the other hand, Sophia likes to listen to loud music, so sometimes she misses hearing the alarm. Here we would like to compute the probability of the burglar alarm.

Problem:
Calculate the probability that alarm has sounded, but there is neither a burglary, nor an
earthquake occurred, and David and Sophia both called the Harry.
Solution:
• The Bayesian network for the above problem is given below. The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, whereas David's and Sophia's calls depend only on the alarm probability.
• The network represents that our neighbours do not directly perceive the burglary, do not notice the minor earthquake, and do not confer with each other before calling.
• The conditional distributions for each node are given as a conditional probability table, or CPT.
• Each row in the CPT must sum to 1, because the entries in a row represent an exhaustive set of cases for the variable.
• In a CPT, a boolean variable with k boolean parents has 2^k rows, one for each combination of parent values. Hence, if there are two parents, the CPT will contain 4 rows of probability values.

List of all events occurring in this network:


• Burglary (B)
• Earthquake(E)
• Alarm(A)
• David Calls(D)
• Sophia calls(S)

We can write the events of the problem statement in the form of the probability P[D, S, A, B, E], and rewrite this probability statement using the joint probability distribution:
P[D, S, A, B, E] = P[D | S, A, B, E] . P[S, A, B, E]
= P[D | S, A, B, E] . P[S | A, B, E] . P[A, B, E]
= P[D | A] . P[S | A, B, E] . P[A, B, E]
= P[D | A] . P[S | A] . P[A | B, E] . P[B, E]
= P[D | A] . P[S | A] . P[A | B, E] . P[B | E] . P[E]
(Since Burglary and Earthquake are independent, P[B | E] = P[B].)

Let's take the observed probability for the Burglary and earthquake component:
P(B= True) = 0.002, which is the probability of burglary.
P(B= False)= 0.998, which is the probability of no burglary.
P(E= True)= 0.001, which is the probability of a minor earthquake
P(E= False)= 0.999, Which is the probability that an earthquake not occurred.
We can provide the conditional probabilities as per the below tables:

Conditional probability table for Alarm A:


The Conditional probability of Alarm A depends on Burglar and earthquake:
B E P(A= True) P(A= False)
True True 0.94 0.06
True False 0.95 0.05
False True 0.31 0.69
False False 0.001 0.999

Conditional probability table for David Calls:


The conditional probability that David will call depends on the probability of the Alarm.
A P(D= True) P(D= False)
True 0.91 0.09
False 0.05 0.95

Conditional probability table for Sophia Calls:


The conditional probability that Sophia calls depends on its parent node “Alarm.”
A P(S= True) P(S= False)
True 0.75 0.25
False 0.02 0.98
From the formula of joint distribution, we can write the problem statement in the form of
probability distribution:
P(S, D, A, ¬B, ¬E) = P (S|A) *P (D|A)*P (A|¬B ^ ¬E) *P (¬B) *P (¬E).
= 0.75* 0.91* 0.001* 0.998*0.999
= 0.00068045.
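The hand computation above can be checked directly in Python. The sketch below encodes only the P(=True) columns of the CPTs from the tables (the complement gives the P(=False) entries); the variable names are my own.

```python
# Priors and CPTs from the tables above.
P_B = 0.002                                          # P(Burglary = True)
P_E = 0.001                                          # P(Earthquake = True)
P_A = {(True, True): 0.94, (True, False): 0.95,
       (False, True): 0.31, (False, False): 0.001}   # P(Alarm=True | B, E)
P_D = {True: 0.91, False: 0.05}                      # P(David calls | A)
P_S = {True: 0.75, False: 0.02}                      # P(Sophia calls | A)

def joint(d, s, a, b, e):
    """P(D=d, S=s, A=a, B=b, E=e) via the chain-rule factorization
    P[D|A] P[S|A] P[A|B,E] P[B] P[E]."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pd = P_D[a] if d else 1 - P_D[a]
    ps = P_S[a] if s else 1 - P_S[a]
    return pd * ps * pa * pb * pe

# P(S, D, A, ¬B, ¬E): alarm sounded and both neighbours called,
# but there was neither a burglary nor an earthquake.
p = joint(d=True, s=True, a=True, b=False, e=False)
print(round(p, 8))  # ≈ 0.00068045
```

Summing `joint` over all 32 truth assignments would give 1, which is a quick sanity check that the CPT entries are consistent.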
Hence, a Bayesian network can answer any query about the domain by using Joint
distribution.

The semantics of Bayesian Network:


There are two ways to understand the semantics of a Bayesian network, which are given below:
1. To understand the network as a representation of the joint probability distribution.
It is helpful to understand how to construct the network.
2. To understand the network as an encoding of a collection of conditional independence statements.
It is helpful in designing inference procedures.
